“It’s concerning that it can do that when it shouldn’t be able to.”
Such was the insightful comment of one of my coworkers on learning that Artificial Intelligence systems can determine the race of a patient from his X-rays, matching the patient’s self-reported race with 80-99% accuracy.
He wasn’t alone in thinking this. That is, emoting this; little to no rational thought was involved. Of the dozen people in the meeting, half or more spoke up, saying that the AI must have been trained wrong, that the race must have been encoded on the X-ray, or that it was the quality of the X-ray (with nonwhites of course being scanned by older equipment). Something. Not stated was that race doesn’t exist and is purely a social construct; that’s revealed truth to almost all of my coworkers and need not be spoken unless someone utters heresy.
“I know something is true and what I’m seeing doesn’t match what I know is true so what I’m seeing must not be true.” Or, to paraphrase Chico Marx, “Who are you going to believe, your beliefs or your lying eyes?”
Most people will deny what their eyes are telling them. They’ll deny truth, reality, in favor of their beliefs. This of course applies to a lot more than an AI determining race “when it shouldn’t be able to”, but I’ll focus on AIs because that’s what came to my attention and because I know more than most about AI systems. (And to limit the amount of time spent in writing this; as usual I have many more things to do than time to do them. Proof of this: it’s taken me four months to get an uninterrupted hour to finish writing this essay.)
AIs have useful results in many fields, completely unrelated to one another. They’ve been trained to sift through mountains of noisy astronomical data to find exoplanets, a task largely beyond humans because the data is too great for our brains.
AIs have been used to predict recidivism, and seem to be much better than humans, at 90% versus little better than guessing. This has led to complaints by the accused (claiming bias, presumably because the AI wasn’t trained to be sensitive about disparate impact) and by judges (who inform us that these matters are more an art than a science).
AIs have been used by HR departments to make hiring decisions or at least to prioritize candidates. But there’s a problem. Two problems. First, the criteria for determining whether the AI (or the human) made a good decision are squishy. In predicting recidivism, regardless of your feelings about racial bias or disparate impact, if an offender is arrested again within a year, that’s a solid data point.
When it comes to employee performance, HR departments often try to apply objective criteria such as number of trouble tickets resolved, but those are seldom useful for more than an overview of one facet of the job. Most of the employee evaluation is squishy, and seldom completely honest for any number of reasons.
And that gets us to the other problem with using AIs for hiring decisions: Very often, racial minorities are underrepresented in the selected candidates, regardless of the job. In the computer fields, minority under-representation is even worse, when chosen by AIs. Women are also under-selected.
The obvious explanation, of course, is that the AI was trained wrong. It’s probably deliberate bias, because almost everyone involved in creating and training AIs is a white man. (That’s not true. In the US, men of European descent are a minority of people involved in building, training, and running AIs.) Or unconscious bias, because the training data is not fully representative of all job seekers or convicted criminals or hospital patients. Or some other kind of bias because the data contains patterns that no one’s noticed before and it’s throwing off the results.
That last one is a reasonable concern. In the example above of determining race from X-rays, it’s possible that a hospital serving almost only blacks used one model of X-ray machine and another hospital serving almost only whites used another model. When the AI was trained using data from these two hospitals, it could have picked up the relationship between X-ray machine model and patient race. “Deep learning” systems are notoriously opaque, so something can sneak in and not be found for a while. (Though human thought processes are not exactly open for scrutiny, either, and very seldom do humans realize that their decisions are influenced by how recently they’ve eaten or the similarity of a job candidate to an old girlfriend.)
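This kind of shortcut learning is easy to demonstrate with a toy example. Everything below — the machine models, the patients, the “learner” — is invented for illustration; the point is only that a proxy feature which perfectly correlates with the label in training data produces perfect training accuracy and then collapses when the correlation breaks:

```python
# Toy illustration of a confounded training set: in this made-up data,
# the X-ray machine model is a perfect proxy for patient race, so a
# naive learner can "predict" race without looking at anatomy at all.

train = [
    # (machine_model, self_reported_race) -- all values invented
    ("ModelA", "black"), ("ModelA", "black"),
    ("ModelB", "white"), ("ModelB", "white"),
]

# "Train": memorize the majority race seen for each machine model.
seen = {}
for model, race in train:
    seen.setdefault(model, []).append(race)
rule = {m: max(set(races), key=races.count) for m, races in seen.items()}

# Perfect accuracy on the confounded training data...
train_acc = sum(rule[m] == r for m, r in train) / len(train)

# ...but the shortcut collapses at a hospital where both groups
# are scanned on the same machine.
test = [("ModelA", "black"), ("ModelA", "white")]
test_acc = sum(rule[m] == r for m, r in test) / len(test)
```

A holdout set drawn from a different hospital, as in `test` above, is exactly the kind of check that exposes this: training accuracy stays perfect while out-of-distribution accuracy falls to chance.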
On the other hand, data scientists and AI developers are trained to be aware of this issue, to recognize the signs of a spurious correlation, and to actively look for problems. Furthermore, many of the announced, unacceptable results have been replicated many times with consistent findings. It’s not impossible but is very unlikely that the same error appears in different systems created by different teams using different data.
Even on projects in which only a single AI has been trained for some problem, claims that the training data must be biased are not accompanied by examples of problems in the data. The claim of bias comes by backward reasoning from the results which must be wrong, rather than from any direct evidence.
When AIs are trained to solve real-world problems using real-world data and they repeatedly come up with unacceptable results, there’s something wrong. Something major, something systemic.
It’s possible that the problem is in the way the AIs are designed and built and trained. Despite what I said a few paragraphs ago, errors do creep in. It’s conceivable that the same bias crept in and threw off the hiring recommendations of half a dozen HR AIs, all in the same racially and sexually biased fashion, or that the identification of hospital patients needing more intensive post-release follow-up was thrown off by richer (i.e., white) patients getting more expensive treatment than poorer (i.e., black) patients. (That did happen. It was corrected as soon as the problem was noticed, but that mistake has become one of the go-to examples of racial bias in AIs. One of the very few examples that anyone can point at, I’ll note.)
I don’t believe this is a systemic problem, though, because AIs have proved themselves useful and accurate in any number of areas, from optimizing warehouse layout in a manner no human would have thought of (and showing actual benefits in the amount of time needed to collect items) to finding risk factors for elderly people to fall and require hospitalization. (One of my coworkers found that a couple years ago, via “deep learning” examination of dozens of demographic and clinical factors. What popped out of the AI were factors such as age and weight, which were known to doctors and which served as a good check of the AI’s function, as well as a few surprises like a blood test showing some hormone over some threshold. Sorry about not remembering the details; I’m not a doc and they meant nothing to me. However, the docs put the findings to work and wound up with improved patient outcomes, showing that the AI’s findings were valid.)
As mentioned above, the claim that the demographics of the teams developing AIs result in systemic bias, somehow, is frequently made. The mechanism of this bad result is never detailed, merely hand-waved as a “Well, what else can it be?” As with the claims about biased training data, this is reasoning backward from a result which cannot be accepted to the conclusion that something must be wrong in the way the AI was built because of the people who built it.
The factor which is missing from “analyses” that the AIs must be wrong is the possibility that the AIs are right and that the common wisdom is wrong. That is, when there’s a discrepancy between A and B, why do they always assume that A is right and B is wrong?
It’s known that judges do a poor job in predicting recidivism. The good ones hitting maybe 60% is nothing to brag about.
It’s known that HR departments are bad at hiring, retention, and raise decisions because they apply criteria other than objective measures of expected or actual performance.
If what you care about is good results, then you should compare the real-world effectiveness of what humans recommend versus what the AI recommends and go with whichever gets better results.
If the AI is giving bad results, you probably need to look at what it was told to do. More than that, you need to look at how you are deciding that the AI is wrong or unacceptable. By what criteria does the AI’s recommendation result in a bad result?
So far as I can see, only certain kinds of AI results produce butthurt. Not even radiologists complain when an AI detects early breast cancer at 99% accuracy, easily beating out the most experienced radiologists and reading a hundred times as many images in a day. It’s only when the AIs’ results trample on shibboleths of race or sex or other social factors that people complain.
There’s a saying which is common among some groups: Reality is that which doesn’t go away when you don’t believe in it. There’s an addendum to that: When religion collides with reality, reality wins but the truly faithful won’t admit it. (And they often use the unwelcome reality to strengthen their faith.)
The truth which the True Believers don’t want to admit is that a machine can do a better job of objectively seeing reality than humans can. The machine is more honest in looking at the world through whatever lens it’s told to look through.
AIs do a good job of finding patterns for what they’re optimized for. They give “wrong” results because there’s a mismatch between what people say is important and what is really important to them — eg, avoiding appearance of racial bias is more important than likelihood of repaying a loan.
If you want to make sure that “enough” black families get bank loans to meet some quota, program that into the AI. (The most straightforward way would be to order that the percentage of black loan recipients must exceed the percentage of local population which is black and let the AI figure out how to make that happen, but that may be too straightforward and honest for executives and managers to accept.)
If you want to believe that there’s no such thing as race, I don’t know what to tell you. Machines, which are not told in advance that there is no such thing as race, keep finding clusters of physiology or behavior or preference which bear a shocking similarity to race as understood by most people. Netflix came under fire for racism in their recommendations. Recommendations were based solely on the syllogism “Most people who liked X also liked Y. You liked X. We recommend Y to you.” But that was unacceptable because white people tended to like different things than black people. I don’t know how Netflix resolved that controversy, but I suspect that it was enough to simply point out that their customer sign-up form doesn’t ask about race.
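The “people who liked X also liked Y” syllogism described above is simple enough to sketch in a few lines. The titles and viewing histories below are invented, and this is a bare co-occurrence counter rather than anything a real recommender system would ship, but it shows the essential point: demographic data never enters the computation.

```python
# Minimal "people who liked X also liked Y" recommender.
# Viewing histories and titles are invented for illustration.
histories = [
    {"ShowA", "ShowB"},
    {"ShowA", "ShowB", "ShowC"},
    {"ShowC", "ShowD"},
]

def recommend(liked, histories):
    # Count how often other titles co-occur with titles the user liked.
    counts = {}
    for h in histories:
        if h & liked:  # this viewer shares at least one liked title
            for title in h - liked:
                counts[title] = counts.get(title, 0) + 1
    # Recommend co-occurring titles, most frequent first.
    return sorted(counts, key=counts.get, reverse=True)

recs = recommend({"ShowA"}, histories)
```

Note that `recommend` takes nothing but viewing sets as input; any demographic pattern in its output can only have come from patterns already present in the viewing data itself.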
Whatever way you go, be honest about how you’re deciding if an AI’s results are good. And stop attributing your own racism and sexism to the AI programmers.
I’ve cross-posted this over at Daily Pundit. There’s a lot of overlap in the readership but not total overlap.
Very good post. The true believers in “race is just a social construct” don’t want to listen to anything that contradicts their religious dogma. But any normal person knows that race is very real, and expresses in many ways. Small children know this, even; it takes years of indoctrination and social pressure to train them to pretend it is not true.
While it is likely to be horrifically nasty (especially for the professional race baiters and their pets), they are pushing for a race war and seem likely to eventually get one. It will not go as they anticipate.
My favorite statement on the subject is “Race is just a social construct until you need an organ transplant.”
Of <i>course</i> AI can determine race. Sex too. Those parameters can be derived from thousands of datasets, and you can’t lie about bone structure and density.
Which is why Bruce Jenner’s x-rays will never be a female, and Rachel Dolezal’s will never come back as negro, no matter what either of them wish it were otherwise. Ratifying delusion is not a function of AI.
And you can program AI to tell you, based on color return, whether it’s looking at chocolate, vanilla, or mint chocolate chip ice cream, with near 100% accuracy.
What it cannot do is make subjective assessments, like “Which one tastes better?”
You also can’t quantify a “bad” employee.
You could, perhaps, specify a chronically late one, or a time thief, or an unproductive one, but that would only take the boss looking at their timecard <i>after the fact</i>, and not require gigabytes of mainframe analysis, nor can such be inferred in advance with any worthwhile accuracy.
And you can’t tell me who’s going to re-offend.
You can suggest who’s <i>more</i> or <i>less</i> likely to do so, and based on a large enough dataset, you might could get to 90% reliability. But you can’t make decisions like that where 1 out of 10 times, you’re screwing the pooch, either by turning a monster loose, or punishing someone safe to release. That’s a vanilla/chocolate taste test, not a vanilla/chocolate flavor test. Machines simply cannot perform that, because the calculations simply do not compute.
Find one set of fingerprints out of 8 billion? Sure, with enough access, and brute force computing to crunch the data. Tell me which ones are going to go all Aloha Snackbar terrorist? Not in a million years.
That was the entire point of <i>Minority Report</i>. Such precision is quite simply impossible, and attempting it is tyrannical. It would be like issuing COVID passports based on AI analysis of who’s <i>likely</i> to get the virus. How’s that working out for anyone, about now?
You can program AI to stay inside the lines. But evidently, if no one considered that people wander onto active highways by stepping over them without looking, your AI will run them over until it’s told not to, and you can capture a driverless car simply by simulating a white line with a pound of table salt poured around the car in a circle, effectively immobilizing it, because AI tells it not to cross a solid white line.
Tech is bullshit, and AI is a myth, in this regard. This is why globull warmism is, and always will be, utter bullshit: the number of variables is infinite, and infinitely unknowable, hence unpredictable. Ever. Meteorologists with actual degrees can’t even tell me the weather 10 days from now, but warmist cultists think they can predict the climate of the entire planet a century from now. It’s sheer insanity to suggest it’s even possible, and the effort is only explainable by an agenda, not science.
Eeeeeeek! #Triggered #LiterallyShaking
AIs can not predict the future. (At least until Skynet builds that time machine.) But they could be trained to estimate likelihoods for sufficiently large groups that have sufficient amounts of accurate data about them. As with any such statistical analysis (human performed or AI), trying to apply broad conclusions to an individual member of the group is a mistake. But for flagging which members of the group might need some additional attention and further investigation, I suspect an AI could be trained to a fairly high level of usefulness.
The usual suspects will of course scream “RAYCISSSSS!!!!” and talk about disparate impact and systemic oppression and such. But profiling (if done well and with awareness of its limitations) works. Which is why pretty much every human ever born does it, unless rigorously indoctrinated not to.
I don’t really need an AI to tell me which criminals are more likely to reoffend.
One measure of an analysis method’s quality is to compare it to other methods’ results rather than against perfection. If judges, using their years of experience and their gut, have 52% success and a complicated AI which cost millions to create and probably several dollars per decision for amortized cost has 90%, what could we do with something simple? Such as get the offender’s IQ, marital status, parental marital status, and, though heads will explode, race. Something simple like that could be computed by a human in under a minute and might well have an accuracy of 75%. Is that good enough to be worth doing? (Probably.) Would it result in shrieks of outrage? (Probably.)
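A “computable in under a minute” score like the one described might look like the sketch below. The weights, threshold, and the added prior-offense factor are entirely invented and calibrated against nothing; the point is only that a checklist this simple is mechanical enough to run by hand or by machine:

```python
# Hypothetical back-of-the-envelope risk score. Feature choices,
# weights, and threshold are invented for illustration and are NOT
# calibrated against any real data.

def simple_score(iq, married, parents_married, prior_offenses):
    score = 0
    score += 2 if iq < 90 else 0
    score += 1 if not married else 0
    score += 1 if not parents_married else 0
    score += min(prior_offenses, 3)  # cap the contribution of priors
    return score  # 0 (lowest risk) .. 7 (highest risk)

def flag_for_followup(score, threshold=4):
    # The threshold turns a score into a yes/no flag; where you set it
    # trades false positives against false negatives.
    return score >= threshold

s = simple_score(iq=85, married=False, parents_married=False, prior_offenses=2)
```

Whether a crude score like this hits 75% is an empirical question that would have to be answered the same way you’d validate the expensive AI: against actual re-arrest data.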
Would you accept that level of likelihood to convict for capital crimes?
If you were the defendant?
I’ll get back to you on that when convictions and sentencing are done algorithmically.
Nice shifting of the goalposts, by the way.
0 for 2.
I did no such goalpost shifting; and you seemed perfectly willing to inflict a standard you would not live under until called on the point.
This is why algorithmic convictions and sentencing will never be acceptable.
Justice is always bespoke, and can never be machine-derived, because machines have no ability to judge, which has nothing to do with mere measurement, and concerns things which are by definition, essentially intangible.
When Solomon heard the case of two women claiming the same baby, he ordered the child to be cut in half, in order to force the actual mother to drop her claim. Which worked.
Algorithmic “justice” would actually cut the baby in half, and think it had arrived at the actual solution post-division.
“Would you accept that level of likelihood to convict for capital crimes?
If you were the defendant?”
Perhaps you could clarify that – AI at 90% correct Vs 52% correct from the judges? Or the 75% computed by humans?
I’m not real sure what your beef is.
How happy would you be rolling the dice as to whether you were part of the 10% wrongly misclassified by such a system?
AI cannot make value judgements. Why would anyone want to try to use it for such? Tyranny by algorithm is still tyranny. It doesn’t become friendlier just because you substitute microchips for dice.
What you seem to be making the case for (and I’m not sure, which is why I ask) is that “rolling the dice” with humans at 48% wrongly misclassified is better. Which it’s clearly not. And yes, I know the 48% is just a number pulled out of the air, as is 90%.
There are many things that when taken out of human hands have improved outcomes. We do it every day when we automate all manner of actions.
“Judy Gichoya, a radiologist and assistant professor at Emory University who worked on the new study, says the revelation that image algorithms can “see” race in internal scans likely primes them to also learn inappropriate associations.”
Why do I find it funny that Judy thinks AI will learn “inappropriate associations”?
How does an association become inappropriate? The association exists. AI cannot make the association appropriate or inappropriate.
This Judy idiot seems to think that she gets to decide what is “inappropriate” and what isn’t. But reality always gets the final vote. You can ignore consequences for a while, maybe even a long while. But you can not avoid them entirely.
Judy’s not the only one. Kai-Fu Lee, a researcher and businessman in the AI field, said in an interview, “Generally we want to balance the need to remove what we know is bad with the need to have more data to train on.” Which raises the question of what’s “bad” and who determines it. An adult human’s weight being ten pounds or ten thousand pounds is a data entry in his medical records. Three quarters of non-white inmates having IQ lower than 80, welllll, the IQ tests are biased and don’t mean anything anyway and they’re based on ability to read and systemic racism in the schools and …
Lee also said, later in the interview, “I wouldn’t throw out all the data just because I humanly think it’s not useful, but you might want to throw out data that you think is contaminating.” And later said that the AI makes decisions on a purely mathematical basis and we humans with our limited brains can’t understand it. So, really, I think he was talking out both sides of his mouth in order to hint at the truth but not get cancelled. -shrug- Or maybe he’s internalized doublethink so well that he doesn’t notice the contradictions.
I think your last sentence is probably true. Few people are more devoted to the practice of double-think than academics.
I just shake my head at people like this. The AI is trained on the data set to spot things humans can’t…so let’s start by manipulating the data set! Just like the climate models, it is amazing how the end result exactly matches the assumptions and biases of the people creating the algorithm. Garbage in, gospel out, and all the while they pat themselves on the back for believing in The Holy Science(tm).
“Garbage in, gospel out…”
Oh, that’s good Haz. Consider that stolen.
I can’t claim credit for that one, Barry. Can’t recall just where I saw it, but definitely not mine.
Along with “Shits and Giggles”, you are doing a good job with the English….