> But then the corollary is, sometimes the data is good.
Yes, but in my limited experience with ML, "the data is good" usually isn't the most likely explanation :-)
> This is what happens with loan or crime data sets, and it made plenty of noise in the news recently - but only the kind of noise in which people say it's obviously the algorithm that's broken, because it doesn't fit the "polite fiction" they'd like to believe.
People say the algorithm is broken because it's illegal to discriminate on the basis of race, and these algorithms were sneaking "discriminate by race" in through the back door.
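To make the "back door" concrete, here's a minimal sketch with entirely synthetic data (the variable names, correlation strengths, and base rates are all made up for illustration): drop the race column, keep a correlated proxy like zip code, and the model's scores still split cleanly along racial lines.

```python
# Minimal sketch (synthetic data): even with the race column dropped,
# a correlated proxy like zip code lets a model "discriminate by race"
# through the back door. All numbers here are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: race is binary, and zip code matches race 90%
# of the time (a stand-in for residential segregation).
race = rng.integers(0, 2, size=n)
zip_code = np.where(rng.random(n) < 0.9, race, 1 - race)

# Historical labels that are themselves correlated with race
# (think biased arrest data), independent of any individual merit.
label = (rng.random(n) < np.where(race == 1, 0.4, 0.1)).astype(int)

# Train WITHOUT the race feature -- only the zip proxy.
X = zip_code.reshape(-1, 1)
model = LogisticRegression().fit(X, label)
scores = model.predict_proba(X)[:, 1]

# The model never saw race, but its scores split along it anyway.
print("mean score, race=0:", scores[race == 0].mean())
print("mean score, race=1:", scores[race == 1].mean())
```

Deleting the protected attribute doesn't help, because the proxy carries essentially the same signal.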
Calling the law a polite fiction won't get you very far when the judge issues an injunction against using your product because it's racially biased.
And the judge doesn't care how machine learning researchers define bias. He cares how the law defines bias.
So, if you're diddling around on your computer in your own time, then I guess the algorithm isn't broken. But if you're building a product you want to sell to courts or insurance companies, the algorithm very much is broken.
That is, unless you think "the law is a polite fiction and bias means only what ML researchers say it means" would be a winning argument to lift an injunction against your product. If you're going to make those arguments in front of a real judge, let me know; I want to see the bulge in that judge's forehead :-)
No. People say the algorithm is broken because it obviously is broken.
> So you're basically saying the algorithm is broken because it discovers illegal correlations, even though they may be true.
No. Discovering those correlations is completely legal. Making certain decisions based upon those correlations is illegal. Mostly because by making a decision, you make a tacit assumption about causation that's borderline impossible to prove and can have a huge impact on people's lives.
Like I said, if it's just you in your home office having fun, go at it. But if you then bake that model into certain products, you have a very serious bug.
> Well, that's precisely "polite fiction".
Except in this case, it's not even that!
The observations that certain crime statistics are highly correlated with race, and that race is correlated with zip code, are not new. ML did not usher in some brave new world here. Just because we call it "AI in 2018" instead of "John from the Actuarial dept. in 1960" doesn't change the moral, ethical, or legal landscape. (The article literally makes exactly this point.)
And despite your characterization, this particular truth (race correlates to crime correlates to zip) isn't even a politically incorrect observation! It's something everyone already knows, and I've never seen someone attacked for pointing out this correlation. In fact, it's a favorite talking point of social justice types! Pointing out this correlation is not impolite.
The impolite assertion is that there's a causative link between race and crime. That's an assertion that models (tacitly) make when their users shift from truth-seeking to decision-making. See the "question you thought you asked vs. question you actually asked" portion of the essay.
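A toy illustration of that tacit causal assumption (again, completely made-up data): a feature can be an excellent predictor of an outcome while having zero causal effect on it, because both are driven by a common cause. Prediction answers the question you thought you asked; acting on the prediction assumes an answer to the question you actually asked.

```python
# Toy sketch (made-up data): X predicts Y well, but only because both
# are driven by a confounder C. Acting on X -- the tacit causal
# reading -- would accomplish nothing.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 10_000
C = rng.normal(size=n)             # hidden common cause
X = C + 0.1 * rng.normal(size=n)   # X merely reflects C
Y = C + 0.1 * rng.normal(size=n)   # Y is caused by C, not by X

# Truth-seeking: "does X predict Y?" Yes, very well.
r2 = LinearRegression().fit(X.reshape(-1, 1), Y).score(X.reshape(-1, 1), Y)
print("R^2 of Y ~ X:", round(r2, 3))

# Decision-making: "if I change X, does Y change?" No. Once C is
# held fixed, X carries no information about Y, so its coefficient
# collapses to roughly zero.
XC = np.column_stack([X, C])
coef_x = LinearRegression().fit(XC, Y).coef_[0]
print("coefficient on X given C:", round(coef_x, 3))
```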
Now, if ML algorithms discovered some genetic, racial, causal theory of crime, then you might have a point about ML exposing polite fictions in this case. But they didn't, so you don't. COMPAS isn't being censored from sharing a politically incorrect truth. It's being prevented from ruining people's lives with a really, really lazy application of statistics.
I often joke that racial discrimination laws are one of the few examples where "being bad at math" is not just criminal, but unconstitutional.
> ...laws are not designed as truth-seeking tools
Again, in cases where bias becomes illegal, these models are NOT just being used to seek truth. They're being used to make decisions.
You seem to have misconstrued the fundamental thesis of the article. The author isn't calling for death to polite fictions. And there's a concrete example in the article of this point of departure between your perspective and his. Namely, his response to people-as-gorillas was not "fuck you, the math is right". And the crescendo of the piece is "AI is just a tool, not a divine oracle, and there's nothing new under the sun".