Big Data Gone Bad for Fighting Crime


Nearly two years ago, we featured a story about a software system called PredPol, a data analytics tool that promised to deliver lower crime rates through algorithms designed to recognize the patterns behind a series of crimes and anticipate where to position police officers to intervene in the activities of criminals.

The results from the pilot studies of the predictive software were promising, enough so that it received a wider real-life test, being applied to more regions within the cities where it was already deployed. In October 2016, Kristian Lum and William Isaac of the Human Rights Data Analysis Group released the results of their study into how well PredPol was performing in reducing crime. What they found gives cause for concern about the data being used to direct police activities in the U.S. cities that have adopted the data analytics approach to fighting crime.

Lum and Isaac discovered that the software had a serious shortcoming: biases in the police records that were being fed into it.

While police data often are described as representing "crime," that's not quite accurate. Crime itself is a largely hidden social phenomenon that happens anywhere a person violates a law. What are called "crime data" usually tabulate specific events that aren't necessarily lawbreaking – like a 911 call – or that are influenced by existing police priorities, like arrests of people suspected of particular types of crime, or reports of incidents seen when patrolling a particular neighborhood.

Neighborhoods with lots of police calls aren't necessarily the same places the most crime is happening. They are, rather, where the most police attention is – though where that attention focuses can often be biased by gender and racial factors.

Focusing on Oakland, California, Lum and Isaac tested PredPol for bias using race and income level as observable characteristics, applied to crimes involving illegal drug use, whose incidence studies have indicated is relatively uniform across the racial and income demographics of Oakland's population.

But you wouldn't know that from the results of PredPol's predictive software using Oakland's police data.

Our recent study... found that predictive policing vendor PredPol's purportedly race-neutral algorithm targeted black neighborhoods at roughly twice the rate of white neighborhoods when trained on historical drug crime data from Oakland, California. We found similar results when analyzing the data by income group, with low-income communities targeted at disproportionately higher rates compared to high-income neighborhoods.

The reason for that turned out to be embedded directly in the data the race-neutral PredPol software used to direct police activities. Because the data reflected the elevated level of law enforcement activity that already existed in the city's primarily black and low-income neighborhoods, the software directed police to increase their intervention efforts in the areas where they were already disproportionately focusing their attention.

That increased attention would then be reinforced in an adverse feedback loop: the police records generated by the newly increased activity would further amplify the already disproportionate law enforcement presence in these areas, with real consequences for a police department already accused of practicing racial profiling in law enforcement.
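The mechanics of that feedback loop can be illustrated with a minimal simulation. In the sketch below, all numbers (the initial record counts, the patrol budget, the detection rate) are hypothetical illustrations, not figures from the Lum and Isaac study: two neighborhoods have identical true offense rates by construction, but because neighborhood A starts with twice as many historical records, a model that allocates patrols in proportion to recorded incidents keeps sending twice the patrols there, and the records it generates never correct the disparity.

```python
TRUE_RATE = 0.05                         # identical true offense rate in both neighborhoods
INCIDENTS_PER_PATROL = TRUE_RATE * 20    # expected incidents one patrol records (hypothetical)

# Historical records: neighborhood A is already policed twice as heavily,
# even though the true offense rates are equal by construction.
records = {"A": 120.0, "B": 60.0}

for year in range(5):
    total = sum(records.values())
    # A predictive model allocates a fixed budget of 100 patrols in
    # proportion to past recorded incidents -- not to true crime.
    patrols = {n: 100 * records[n] / total for n in records}
    for n in records:
        # New records scale with patrol presence; since true rates are
        # equal, the only driver of the disparity is where police look.
        records[n] += patrols[n] * INCIDENTS_PER_PATROL
    share_a = records["A"] / sum(records.values())
    print(f"year {year}: patrols A={patrols['A']:.0f}, "
          f"B={patrols['B']:.0f}, share of records in A={share_a:.2f}")
```

Each year the model confirms its own prior: neighborhood A keeps receiving about two-thirds of the patrols, and its share of the recorded incidents stays locked at two-thirds, despite equal underlying crime. The data the model trains on measures police attention, not crime.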

The software-directed adverse feedback loop would also open the city up to the "squeezing balloon" problem we noted in our previous coverage, where increasing police pressure in one area would result in increases in the incidence of crime in other areas, which would now be more likely to escape both detection and intervention.

What the results indicate is that using data analytics to effectively reduce crime is more complicated than simply removing race and income from the software packages used to maximize the return on police investment. Such tools can be useful, but the GIGO principle (garbage in, garbage out) definitely applies.