Total number of stops and searches

Grouped by ethnicity

I start by plotting the total number of stops and searches since May 2017 (the earliest date in the dataset), grouped by ethnicity.

From this chart, a simplistic conclusion would be that white people are searched significantly more than other ethnicities, so there is no racism in the system. This is clearly bad reasoning, as we need to account for the size of each group in the underlying population.

Including population

Population data is taken from here. I use this data to produce the following chart. Note that I grouped the population figures in the same way I grouped ethnicities when producing the ethnicity column.
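The grouping step can be sketched roughly as follows. This is only an illustration of the idea, not the exact code I ran: the detailed category labels, the counts, and the choice to fold mixed categories into 'Other' are all assumptions made up for the example.

```python
import pandas as pd

# Invented detailed categories and counts, purely for illustration;
# the real labels and figures come from the population source.
detailed = pd.Series({
    "White - British": 50,
    "White - Other": 10,
    "Black - African": 8,
    "Black - Caribbean": 6,
    "Asian - Indian": 12,
    "Mixed - Any": 8,
    "Other - Any": 6,
})

GROUPS = ["White", "Black", "Asian"]

def to_group(label):
    """Map a detailed label to a broad group by its prefix (assumed scheme)."""
    for group in GROUPS:
        if label.startswith(group):
            return group
    return "Other"  # assumption: everything else, including mixed, is 'Other'

# Series.groupby accepts a function, which is applied to each index label
grouped = detailed.groupby(to_group).sum()

# convert to percentages and put the rows in the order used in the charts
grouped_pct = (grouped / grouped.sum() * 100).loc[["White", "Black", "Asian", "Other"]]
print(grouped_pct)
```

Applying the same mapping function to both the population data and the stop-and-search data is what keeps the two sets of groups comparable.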

Now things look bad. There is clearly a discrepancy between each group's share of the population and its share of stop and searches.
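One way to put a number on this kind of discrepancy is to divide each group's share of stop and searches by its share of the population. A ratio of 1 would mean a group is stopped in proportion to its size. The sketch below assumes a dataframe laid out like the sas_eth_pop dataframe in the Code section (both columns in percent); the group labels and figures are invented placeholders, not values from the dataset, and the function name is my own.

```python
import pandas as pd

# Placeholder groups and percentages, NOT taken from the real data
shares = pd.DataFrame(
    {"population": [70.0, 20.0, 10.0], "sas": [56.0, 14.0, 30.0]},
    index=["Group A", "Group B", "Group C"],
)

def disproportionality(df):
    """Share of stop and searches divided by share of population, per group."""
    return df["sas"] / df["population"]

ratio = disproportionality(shares)
print(ratio.round(2))
```

With these made-up numbers, Group C would be stopped at three times the rate its population share suggests, while the other two groups fall below 1.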

To visualise this discrepancy more clearly, I decided to create a Sankey diagram using Plotly.

The diagram makes the discrepancy quite plain to see. Black people are stopped disproportionately more than other ethnic groups. There is evidently a big problem here.

Unfortunately, this diagram does not tell us where exactly the problem lies. Is the problem with the police, or is there a deeper problem? Are the police racist for stopping black people more often, or is this a reflection of crime rates and underlying social issues?

Some people would look at the above diagram and wonder how this is not conclusive evidence of police racism. To illustrate why a disproportion on its own is not conclusive, consider the following two charts:

The majority of people would not look at these charts and conclude that the police are sexist or ageist, so one should not use the chart above for ethnicity to automatically conclude the police are racist.

To try to shed some light on the question of racism, I will take into account the outcome of the stop-and-search.

Including outcomes

The following stacked bar chart shows the breakdown of outcomes for each ethnicity.

This is not at all what I was expecting. I was expecting black people to have a higher share of false stop and searches (those leading to no further action) than white people. It is shocking how consistent the ratio is across ethnicities, almost suspiciously so. There is some discrepancy if you look closely, but dramatically less than what the Sankey diagram above suggested.
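To check "consistent" with numbers rather than by eye, one option is to compute the outcome percentages per ethnicity and look at the gap between the highest and lowest group for each outcome. The sketch below uses pd.crosstab with normalize='index' as a compact alternative to the groupby and pivot steps in the Code section. The column names mirror the sas dataframe, but the rows and outcome labels are invented toy data.

```python
import pandas as pd

# Toy records with invented values; only the column names match `sas`
toy = pd.DataFrame({
    "ethnicity": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "outcome": ["none", "none", "none", "major",
                "none", "none", "minor", "major"],
})

def outcome_shares(df):
    """Percentage of each outcome within each ethnicity (rows sum to 100)."""
    return pd.crosstab(df["ethnicity"], df["outcome"], normalize="index") * 100

shares = outcome_shares(toy)

# gap in percentage points between the highest and lowest group, per outcome
spread = shares.max() - shares.min()
print(shares)
print(spread)
```

A small spread for every outcome would back up what the stacked chart appears to show. It would not by itself say anything about why the stops happened in the first place.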

Conclusions

My main goal was to gain a better understanding of crime data and of the process of cleaning and summarising data. To my surprise, it seems from this simple analysis that police stop-and-search is not inherently racist, but there is a high chance I have not accounted for something or that my process is over-simplistic. Of course, you should refer to more authoritative sources for conclusions on these complex issues, and not base your opinions on an amateur blog.

Some key lessons I learnt:

Code

Here I provide a sample of the code used to produce the charts.

Below is the code to produce the first bar chart.

import matplotlib.pyplot as plt
import seaborn as sns

# RGBA colours on a 0-255 scale, rescaled to the 0-1 range matplotlib expects
colours_255 = [(66, 133, 244, 255), (234, 67, 53, 255), (251, 188, 5, 255), (52, 168, 83, 255)]
colours = [tuple(n / 255 for n in colour) for colour in colours_255]

# sas_ethnicity is a Series of stop-and-search counts indexed by ethnicity
plt.figure()
sns.barplot(x = sas_ethnicity.index, y = sas_ethnicity,
            order = ['White', 'Black', 'Asian', 'Other'],
            palette = colours)
plt.grid(True, axis = 'y')
plt.title('Stop and Searches since May 2017, by Ethnicity')
plt.xlabel('Ethnicity')
plt.ylabel('Number of Stop and Searches')
plt.tight_layout()
plt.savefig('sas3_sas_eth.png')

Here is the code to produce Sankey diagrams.

import pandas as pd
import plotly.graph_objects as go

# create function that plots Sankey diagram given appropriate dataframe
def create_sankey(df, title):
    n = df.shape[0]  # number of groups (avoids shadowing the built-in len)

    fig = go.Figure(data=[go.Sankey(
        node = dict(
            pad = 15,
            thickness = 20,
            line = dict(color = "black", width = 0.5),
            label = ['Proportion of Population'] + list(df.index) + ['Proportion of Stop and Searches'],
            color = "blue"
        ),
        link = dict(
            # node 0 -> each group, then each group -> final node
            source = [0]*n + list(range(1, n+1)),
            target = list(range(1, n+1)) + [n+1]*n,
            # Series.append was removed in pandas 2.0, so use pd.concat
            value = pd.concat([df.iloc[:, 0], df.iloc[:, 1]]).tolist()
        ))])

    fig.update_layout(title_text=title, font_size=15)
    fig.show()

# create dataframe containing population and stop and search data by ethnicity
sas_eth_pop = pd.DataFrame({'population': population, 'sas': sas_ethnicity}, index = sas_ethnicity.index)
sas_eth_pop = sas_eth_pop.loc[['White', 'Black', 'Asian', 'Other']]

# convert stop and search counts into percentages of the total
sas_eth_pop['sas'] = sas_eth_pop['sas'] / sas_eth_pop['sas'].sum() * 100

# create sankey diagram
create_sankey(sas_eth_pop, 'Stop and Searches by Ethnicity')

Here is the code to produce the stacked bar charts at the end:

# group data by ethnicity and outcome. 
sas_eth_out = pd.DataFrame(sas.groupby(['ethnicity', 'outcome']).outcome.count())
sas_eth_out.rename(columns = {'outcome': 'frequency'}, inplace = True)
sas_eth_out.reset_index(inplace = True)

# convert frequencies into percentages
sas_eth_total = sas_eth_out.groupby(['ethnicity']).frequency.sum()
sas_eth_out['total'] = sas_eth_out.ethnicity.map(sas_eth_total)
sas_eth_out['percentage'] = sas_eth_out.frequency / sas_eth_out.total * 100

# pivot table, and re-order the rows
sas_new = pd.pivot_table(sas_eth_out, values = 'percentage', columns = 'outcome', index = 'ethnicity')
sas_new = sas_new.loc[['White', 'Black', 'Asian', 'Other']]

# plot the graph
sas_new.plot.bar(stacked = True)
plt.xlabel('Ethnicity')
plt.ylabel('Percent of Stop and Searches')
plt.title('Breakdown of Outcomes of Stop and Searches')
plt.legend(labels = ['False / no further action',
                    'Minor further action',
                    'Major further action'],
           loc='center left',
           bbox_to_anchor=(1, 0.5))
plt.tight_layout()
plt.savefig('sas3_outcome.png')