The findings have implications for social media sites, where judgments about whether a particular post violates a site's terms of use are often made by AI algorithms.
The researchers found that the models often do not replicate human decisions about rule violations and, if trained on the wrong data, are likely to make harsher judgments than humans would.
Marzyeh Ghassemi, an MIT assistant professor and author of the study, said: “I think most artificial intelligence/machine-learning researchers assume that the human judgments in data and labels are biased, but this result is saying something worse. These models are not even reproducing already-biased human judgments because the data they’re being trained on has a flaw:
Humans would label the features of images and text differently if they knew those features would be used for a judgment. This has huge ramifications for machine learning systems in human processes.”
According to Ghassemi, the correct datasets for training these models are those labeled by humans who were explicitly asked whether items violate a certain rule.
But data used to train machine-learning models are typically labeled descriptively. For example, humans could be asked to identify factual features, such as the presence of fried food in a photo.
If this “descriptive data” is then used to train models that judge rule violations, those models tend to over-predict violations, the researchers noted. The findings were published recently in Science Advances.
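The distinction can be made concrete with a small sketch. The code below is purely illustrative and uses synthetic data and invented feature and label definitions, not the study's dataset or models: the same classifier is trained once on descriptive labels (is fried food present in the photo?) and once on normative labels (do human judges say a hypothetical "no fried food" rule is violated?), and the descriptively trained model ends up flagging more items.

```python
# Illustrative sketch only: synthetic data and hypothetical label definitions,
# not the study's actual dataset, features, or models.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy scenario: each item (e.g., a photo) is described by two made-up features.
n = 2000
X = rng.normal(size=(n, 2))

# Descriptive labels: does the photo contain fried food? (a factual feature)
descriptive = (X[:, 0] > 0).astype(int)

# Normative labels: do human judges say a "no fried food" rule is violated?
# Assume judges excuse borderline cases, so fewer items are labeled as violations.
normative = ((X[:, 0] > 0.5) & (X[:, 1] > -0.5)).astype(int)

clf_desc = LogisticRegression().fit(X, descriptive)
clf_norm = LogisticRegression().fit(X, normative)

# On new items, the descriptively trained model flags a larger share as violations.
X_test = rng.normal(size=(1000, 2))
print("flag rate (descriptive labels):", clf_desc.predict(X_test).mean())
print("flag rate (normative labels):  ", clf_norm.predict(X_test).mean())
```

In this toy setup, the gap between the two flag rates mirrors the over-prediction pattern the researchers describe: a model trained on factual, descriptive annotations treats every detected feature as a violation, while one trained on explicit rule judgments does not.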