Chicago Murder Rate

I’ve seen a lot of articles talking about Chicago’s high murder rate recently, and I’m going to complain about them. More constructively, I’ll post some graphs.

Most recently, the articles were triggered by the very high number of murders in January 2013. Before that, they were triggered by 2012 reaching the round number of 500 murders. Annoyed at those articles, I went to Google, which found me an article on the high murder rate from October 2011 through March 2012. That third topic is a serious one, though I don’t know why there was a crime wave then. Maybe the mild winter? But the other two were non-events. Monthly homicide counts vary by huge amounts, and a single month doesn’t mean anything. There were very few murders in December 2012, followed by very many in January 2013. So what? Despite the low (but meaningless) rate in December 2012, reporters were writing about a high rate, because 2012 had reached the round number of 500. That was 15% higher than in 2011, but the difference was entirely due to the very high rate early in the year, so by then the story was over. The first half of the year was a big deal, yet reporters were not talking about that; they were making the false claim that 2012 had been uniformly violent.
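To put a rough scale on "huge amounts", here is a back-of-the-envelope sketch. It assumes murders arrive as a Poisson process at a constant rate, which is my simplification and not something the data source claims; the only number taken from above is the 2012 total of about 500. Real counts are seasonal and probably more variable than Poisson, so this likely understates the noise.

```r
# Rough scale of month-to-month noise, ASSUMING a constant-rate Poisson process.
# The only input from the post is the 2012 total of about 500 murders.
annual_total <- 500
monthly_mean <- annual_total / 12      # about 41.7 murders per month
monthly_sd   <- sqrt(monthly_mean)     # for a Poisson count, variance equals the mean

# Size of a one-standard-deviation swing relative to the monthly mean
relative_sd <- monthly_sd / monthly_mean
relative_sd

# Six years of simulated months at that constant rate (synthetic, not real data)
set.seed(1)
simulated <- rpois(72, monthly_mean)
range(simulated)
```

Under that assumption a one-standard-deviation swing in a single month is about 15% of the monthly mean, so one month that is well above or below average is unremarkable on its own.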

Also, many of these stories compared Chicago to New York. New York is the exception, not Chicago. Yes, Chicago could learn something from New York, but so could all American cities. Chicago has done worse than the national trend over the past five years, but I expect that about half of cities are doing worse than the trend and half better.

Before I get to the graphs, what did I learn?

  • monthly murder rates are noisy
  • murders are seasonal, peaking in the summer
  • loess won’t automatically detect seasonal patterns in a multi-year time series; more generally, I need to understand it better.

I got Chicago homicide data from a journalism project that maps homicides, modeled on a similar LA project. They get their data from the weekly police blotter. It differs from the final police figures by 2% in most years, in either direction, but the 2010 final number was 4% higher. I clicked on the Google spreadsheet links to download the data, then ran this R code:

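library(plyr)
library(ggplot2)
library(Hmisc)      # for monthDays()
library(lubridate)

# Load the downloaded spreadsheet export (same file as in the second script below)
all <- read.csv("all.csv")

# Parse the Date column (month/day/year) and extract year and month
all <- mutate(all, date=as.Date(mdy(Date)), year=year(date), month=month(date))
# Keep rows before 2013 plus any January, which drops February 2013 onward
all <- subset(all, month<2|year<2013)

# Count murders per (month, year); ddply names the count column V1
summary<-rename( ddply(all,.(month,year),nrow), c("V1"="murders") )
# month2 numbers months consecutively, with January 2007 as 1; rate is murders per day
summary<-mutate(summary, month2=month+12*(year-2007),
   rate=murders/monthDays(as.Date(paste(year,month,1,sep="-"))) )

# Seasonal pattern: one smoothed curve per year
qplot(month,rate,color=factor(year),data=summary)+geom_smooth(se=FALSE)
# Long-term trend: all months in sequence
qplot(month2,rate,color=year,data=summary)+geom_smooth()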

The first graph shows the seasonal pattern of murders and compares different years.

The second graph shows the long term trend.

What did I learn from this?
From the first graph, there is a seasonal trend.
Second, from either graph, there is a lot of noise.
Third, the second half of 2012 is typical, while the first half is very bad, though I already deduced that from the numbers in the papers.

The loess on the whole time series did not pick up the seasonal pattern. It is probably right to ignore such a high-frequency effect (period 12 on the monthly discretization). So I tried graphing the number of murders per day. That is very noisy, so I’m only going to show curves, not scatterplots. I didn’t know how to get ddply to count murders for every day, so I used a for loop. Also, R’s : operator doesn’t work well with dates.

library(plyr)
library(ggplot2)
library(lubridate)

all <- read.csv("all.csv")
all <- mutate(all,date=as.Date(mdy(Date)),year=year(date) )

start<-as.Date("2007-01-01")
end  <-as.Date("2013-01-31")
# Count murders on each day from start to end, including days with none
summary <- data.frame()
for(i in 0:as.integer(end-start)) {   # loop over day offsets rather than over Dates
  d <- start+i
  df <- data.frame(date=d, murders=nrow(subset(all, date==d)))
  summary <- rbind(summary,df)
  }
summary <- mutate(summary, year=year(date), day=yday(date))

# Six-year trend: a single loess curve over all days
qplot(date,murders,data=summary,geom="smooth",method="loess")
# Seasonal pattern: one smoothed curve per year, by day of year
qplot(day,murders,data=subset(summary,year<2013),geom="smooth",color=factor(year),se=FALSE)
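The for loop could probably be avoided. Here is a minimal sketch in base R, using a tiny made-up vector `murder_dates` in place of the real `all$date` column and a shortened date range; both are illustrative assumptions. The idea is that `seq()` works directly on dates, and `table()` on a factor whose levels are every day in the range reports zero for days with no murders.

```r
# Sketch: daily counts without a for loop, base R only.
# murder_dates is a small made-up vector standing in for all$date.
murder_dates <- as.Date(c("2007-01-01", "2007-01-01", "2007-01-03"))
start <- as.Date("2007-01-01")
end   <- as.Date("2007-01-05")

# seq() handles Date objects directly, which sidesteps the : problem
days <- seq(start, end, by = "day")

# Make the dates a factor whose levels are every day in the range,
# so table() includes a zero count for days with no murders
counts <- table(factor(as.character(murder_dates), levels = as.character(days)))

daily <- data.frame(date = days, murders = as.vector(counts))
daily
```

With the real data, `murder_dates` would be `all$date` and `start` and `end` would be the values used in the loop. This also avoids growing a data frame one row at a time with `rbind`.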

The first graph, of the six-year trend, doesn’t look any different from the one based on monthly discretization, so I’m not posting it. The second is only slightly different from the one based on the monthly discretization. The most obvious difference I see is that in some years, the discretization broadens the peak across two months.

So one lesson is that monthly discretization is not so bad, for looking at the data a year at a time.

The second lesson is that I don’t understand loess. I was hoping that with the daily data, the global loess would pick up the seasonal cycle, but it doesn’t. It gives more weight to distant observations than I had thought. I should learn how it works and what options there are for tweaking it. Probably I can force it to be more local. By doing loess one year at a time, I’m throwing out information at the year breaks, yielding wide standard errors at the ends (visible in other graphs). I tried moving the year breaks, but the results weren’t that interesting.
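One knob that makes loess more local is its `span` argument, the fraction of points used in each local fit, which defaults to 0.75 in R's `loess()`. On a six-year series that means each local fit draws on several years of data, which would explain why the yearly cycle gets averaged away. The sketch below uses a synthetic series (a yearly sine wave plus noise, with made-up amplitude and noise level), not the Chicago data, to compare the default with a much smaller span.

```r
# Synthetic daily series: a yearly sine cycle plus noise (NOT the Chicago data)
set.seed(42)
n_days <- 6 * 365
day    <- seq_len(n_days)
y      <- 1.2 + 0.4 * sin(2 * pi * day / 365) + rnorm(n_days, sd = 1)

# Default span = 0.75: each local fit uses 75% of all points, i.e. several years
fit_wide <- loess(y ~ day)

# Smaller span: each local fit uses about 5% of the points, roughly 110 days
fit_narrow <- loess(y ~ day, span = 0.05)

# Spread of each fitted curve: the narrow fit should track the yearly cycle,
# while the wide fit should be nearly flat
sd_wide   <- sd(fitted(fit_wide))
sd_narrow <- sd(fitted(fit_narrow))
c(wide = sd_wide, narrow = sd_narrow)
```

In ggplot2 the same parameter can be passed as `geom_smooth(method = "loess", span = 0.05)`. The value 0.05 is only a guess at something that spans a few months; too small a span would start chasing the daily noise.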
