This article is reproduced in full:
Thanks to Reddit user MJGee, we now have evidence of misuse of data which is causing Australia’s social security program Centrelink to issue false debts to people. The recipients of these false debts are informed of a supposed discrepancy with the income they reported many years ago, leading to debts of thousands of dollars.
As suspected, an automated system matches fortnightly Centrelink reporting periods to annual income from the Australian Taxation Office. The income from the ATO data is divided by 26 to obtain fortnightly income. The issue for MJGee and other innocent Australians is that for someone who is on benefits for only part of the year, the income they earned while not receiving benefits is treated as income they earned while receiving benefits. For many Australians, especially those struggling to get by, income varies throughout the year, and yet Centrelink’s automated system works on the assumption that income is perfectly stable.
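To make the averaging problem concrete, here is a minimal sketch in Python. The person and every dollar figure in it are invented for illustration, and the function name is hypothetical. It is not Centrelink's actual code, only the divide-by-26 logic as described above.

```python
FORTNIGHTS_PER_YEAR = 26


def averaged_fortnightly_income(annual_income: float) -> float:
    """Spread an annual figure evenly across 26 fortnights (the flawed assumption)."""
    return annual_income / FORTNIGHTS_PER_YEAR


# Hypothetical person: on benefits and earning nothing for 6 fortnights,
# then employed (and off benefits) for 20 fortnights at $2,600 each.
benefit_fortnights = 6
actual_income = [0.0] * benefit_fortnights + [2600.0] * 20

annual_income = sum(actual_income)                      # what the ATO sees
averaged = averaged_fortnightly_income(annual_income)   # what averaging assumes

# Income truthfully reported while on benefits.
reported_while_on_benefits = sum(actual_income[:benefit_fortnights])

# Income the averaging approach attributes to the same fortnights.
assumed_while_on_benefits = averaged * benefit_fortnights

apparent_discrepancy = assumed_while_on_benefits - reported_while_on_benefits

print(f"Annual income: ${annual_income:,.2f}")
print(f"Averaged fortnightly income: ${averaged:,.2f}")
print(f"Apparent unreported income: ${apparent_discrepancy:,.2f}")
```

In this made-up case the person reported their income correctly, yet averaging attributes $12,000 of income to the fortnights in which they earned nothing.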
To contest these false debts, those affected are being asked to submit payslips from as far back as six years ago. The victims may be unable to contest the debt because of troubles with the online system (these debts cannot be contested in person or over the phone) or the inability to produce old payslips. Some may even pay the debt either out of hopelessness, or because they do not realise that it is false. Others may only find out about the debt after it has been passed off to a debt collector.
Reddit users are claiming malice on the part of Centrelink and Government ministers, which is a claim I’m hesitant to make. A far simpler explanation is that these parties are using data they simply don’t understand. It’s unfortunate that the victims of this data illiteracy are likely to be the most vulnerable among us.
This bungle is a perfect example of how data can be misused and how data can hurt people. These experiences may even demonstrate the limits of automation, and the need for some processes to always require manual human verification.
I don’t know what happened inside Centrelink’s offices when this process was designed and implemented, but I can speculate how such a misuse of data came about. Suppose a Centrelink leader approaches one of their analysts for a data solution. They want to verify reported fortnightly income using ATO data. Except, for one reason or another, the ATO data is annual, with no finer granularity. How does the analyst tell a data-illiterate Centrelink leader that their solution doesn’t exist; that you can’t join fortnightly data to annual data in the way they want? Here again comes the myth that data is magic, and that a data solution always exists. When it comes to data, “It can’t be done” is not an acceptable answer.
Every analyst knows that there’s no such thing as a “set and forget” model. You need to test your model both before and after it’s implemented. Except the political quagmire on which Centrelink precariously floats only allows for one success metric: every dollar “clawed back” is a success.
Note that the mathematical elements of Centrelink’s wrongdoing are actually quite simple: You can’t get fortnightly data from annual data. There are no complicated statistical concepts at work here. The challenges are entirely human.
There’s also a lesson here in the limits of automation. A $24,000 false debt was issued because the recipient reported their employer’s name two different ways, and so the automated system assumed that there were two employers. Perhaps a human being would’ve realised this at a glance and not issued the debt? In the past, when I worked on a model in a similar domain, I would always stress that a computer is a tool for doing lots of calculations very quickly; it cannot make judgements.
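As a rough illustration of how exact string matching can turn one employer into two, consider the sketch below. The employer names, the `normalise_employer` helper and the list of suffixes are all hypothetical. Nothing here reflects how Centrelink's system actually compares names.

```python
import re

# Hypothetical company suffixes to ignore when comparing names.
IGNORED_WORDS = {"pty", "ltd", "limited"}


def normalise_employer(name: str) -> str:
    """Lower-case a name, strip punctuation and drop common company suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    words = [word for word in cleaned.split() if word not in IGNORED_WORDS]
    return " ".join(words)


# The same (invented) employer, written two different ways.
reported_names = ["Acme Cleaning Pty Ltd", "ACME CLEANING"]

# Exact matching treats every distinct string as a separate employer.
employers_by_exact_match = set(reported_names)

# Normalising first collapses both spellings into one employer.
employers_after_normalising = {normalise_employer(n) for n in reported_names}

print(len(employers_by_exact_match))     # 2
print(len(employers_after_normalising))  # 1
```

Even a rule like this only catches the variations someone thought of in advance. A misspelling or a trading name would still slip through, which is why a mismatch is better treated as a prompt for a person to look than as grounds for a debt.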
I hope that this issue generates enough buzz that Centrelink and the ministers responsible are forced to concede. These parties should heed a very important lesson from this: you need to understand data before you make decisions with it.