There was a deathly silence in that vast hall of a thousand-plus data mining and artificial intelligence practitioners in New York City recently when the speaker made the following statement: ‘If you reject a consumer loan application and the consumer asks why her loan was rejected, you will get into regulatory trouble if you say, “I don’t know, the algorithm did it”.’
Considering that the conclusions that algorithms make, be it in loan approvals or in deciding which stock a hedge fund should buy or sell, or in health care, are treated as sacred truths, this sudden demand for ‘explanations’ sounded like a thunderbolt from the sky.
In viewing this emerging debate, it is important to go back to the foundational years of statistics, whose methods lie at the heart of today’s Machine Learning and Artificial Intelligence, and to Karl Pearson, in the Britain of the 1880s.
His ‘correlation coefficient’ is the first baby step that anyone who studies statistics takes, even at the school level. It helps you determine, for example, whether glucose level in humans increases with age.
Given the blood glucose levels of, say, a dozen people of different ages, it helps us calculate whether age and glucose level are ‘correlated’.
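The calculation itself is simple enough to sketch in a few lines. Here is a minimal from-scratch version in Python; the twelve (age, glucose) pairs below are invented purely for illustration, not real medical data:

```python
# Pearson's correlation coefficient, computed from scratch for the
# age-vs-glucose example. The data points are made up for illustration.
import math

ages =    [23, 29, 34, 38, 42, 47, 51, 55, 60, 64, 68, 72]
glucose = [81, 84, 83, 88, 86, 91, 93, 90, 97, 95, 99, 102]

def pearson_r(xs, ys):
    """r = covariance(x, y) / (std dev of x * std dev of y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r(ages, glucose)
print(round(r, 2))  # close to +1: glucose rises with age in this sample
```

A value near +1 says the two quantities rise together; near −1, that one falls as the other rises; near 0, that no linear relationship is visible.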
Pearson went on to define a great many other things that serve as the foundation of the science of statistics and of its contemporary avatars, Machine Learning and Artificial Intelligence, such as principal component analysis, the chi-squared test and the histogram.
These statistical tools that we revere so much even today were originally used to give a scientific veneer to ‘eugenics’: the belief that human beings come from different races, that there are ‘inferior’ races and ‘superior’ races, and that no amount of training or education could improve a person of an ‘inferior’ race.
From those early days, as the 20th century progressed, statistics was enlisted for many other causes, including the computation of the gross domestic product, that one number which is nowadays used to conclude how well or badly a country and its government are doing.
By the 1950s, statisticians were sitting in the highest policy-making circles and helped create five-year plans.
With the general disenchantment with economic planning in the late 1980s, and the fervour for ‘free markets’ and ‘competition’, statistics and jobs for statisticians took a back seat. In our contemporary era, however, young men and women with felicity in statistics command the highest starting salaries after graduation, and with the advent of Machine Learning and Artificial Intelligence (the contemporary high-sounding names for statistics) this trend has accentuated many-fold.
At the core of these ultra-fashionable disciplines lies the work of Pearson and his contemporaries of the 1890s: correlations, regressions and so on.
But, just as these late-19th-century tools have returned to prominence, so have questions about ‘explainability’: spelling out the reasons for a conclusion, say, why a loan application was rejected, in a way that goes beyond saying ‘the algorithm says so’.
Mridul Mishra of Fidelity Investments, at the same conference, offered some suggestions about what ‘explainability’ could be, that is, what makes a ‘good’ explanation for the conclusions of an algorithm. First, he said, try ‘contrastiveness’: check whether the conclusion changes when an input into the algorithm changes.
For example, if the percentage of salary that a loan applicant saves every month increases by, say, 10 per cent, does the algorithm spit out a ‘loan approved’ conclusion?
If yes, the ‘explanation’ for the loan rejection by the algorithm is that the loan applicant is not saving enough.
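This contrastive test can itself be sketched in a few lines of Python. The `loan_model` below is a hypothetical toy scoring rule invented for this illustration, not any real lender’s algorithm; the point is only the mechanism of perturbing one input and checking whether the conclusion flips:

```python
# A minimal sketch of a 'contrastive' explanation: bump one input and see
# whether the algorithm's conclusion flips from 'rejected' to 'approved'.
# loan_model is a made-up toy rule, used only to illustrate the idea.

def loan_model(applicant):
    """Toy rule: approve if savings rate and income together clear a cut-off."""
    score = 2.0 * applicant["savings_rate"] + 0.00001 * applicant["income"]
    return "approved" if score >= 1.0 else "rejected"

def contrastive_explanation(model, applicant, feature, bump):
    """If bumping `feature` by `bump` turns a rejection into an approval,
    report that feature as the reason for the rejection."""
    if model(applicant) == "approved":
        return "no explanation needed: already approved"
    changed = dict(applicant, **{feature: applicant[feature] + bump})
    if model(changed) == "approved":
        return f"rejected because {feature} is too low"
    return f"{feature} alone does not explain the rejection"

applicant = {"savings_rate": 0.30, "income": 30000}
print(contrastive_explanation(loan_model, applicant, "savings_rate", 0.10))
# prints: rejected because savings_rate is too low
```

In this toy example, raising the savings rate by 10 percentage points flips the decision, so the contrastive explanation for the rejection is the low savings rate, exactly the reasoning described above.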
He spelt out several other ways of providing a ‘good’ explanation (for the record: counterfactual explanations, bin-based explanations, Shapley value explanations and so on). In other words, creating ‘explanations’ for the output of an algorithm is itself becoming an industry!
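Of these, Shapley value explanations are perhaps the most studied: each input feature is credited with its average marginal contribution to the algorithm’s output, averaged over all subsets of the other features. Here is a brute-force sketch; the linear `credit_score` is a hypothetical model invented for illustration, and real systems approximate this computation (exact enumeration is exponential in the number of features):

```python
# Exact Shapley-value attribution for a tiny model, by brute-force
# enumeration over feature subsets. credit_score is a made-up linear
# example; absent features are replaced by baseline values.
from itertools import combinations
from math import factorial

def credit_score(savings, income, debts):
    return 2.0 * savings + 0.5 * income - 1.0 * debts

def shapley_values(f, actual, baseline):
    n = len(actual)

    def v(subset):
        # Model output with features in `subset` at their actual values
        # and all other features at the baseline.
        args = [actual[i] if i in subset else baseline[i] for i in range(n)]
        return f(*args)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (v(set(S) | {i}) - v(set(S)))
        phi.append(total)
    return phi

actual = (0.4, 6.0, 2.0)    # applicant's savings rate, income, debts
baseline = (0.0, 0.0, 0.0)  # an all-zero reference applicant
print(shapley_values(credit_score, actual, baseline))
```

The attributions always sum to the gap between the model’s output for the applicant and for the baseline, which is what makes them attractive as ‘explanations’ of an individual decision.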
While one cannot quarrel with the desire for ‘good’ explanations, I can anticipate some interesting debates in the immediate future. For example, if an IIM applicant’s CAT exam score is in the 85th percentile and he is rejected, what answer can we give him if he asks for an ‘explanation’ — that there is statistical evidence that the higher a person’s CAT exam score, the better he will be as a manager when he graduates from an IIM? (Having studied this issue, I can safely tell you that no such correlation exists.)
Similar questions may get raised about all the various ‘weeding out’ exam scores that we use in our country for promoting kids in schools, admitting them to colleges and so on.
What will be the ‘good’ explanations? That high school exam score is correlated with the literacy level and a minimum income level of parents?
If that turns out to be true, are high-school exam scores merely a measure of the social origin of a kid?
Such debates are, at present, confined to the esoteric world of tech conferences, but I can see interesting debates (and battles) ahead, once our courts start backing the demand for ‘good explanations’ for conclusions made by algorithms.