By Elliott Ramos

Chicago Transit Authority president Forrest Claypool had some biting words for the Chicago Sun-Times — on its own pages.

On Thursday, the CTA chief penned a letter to the editor, chastising the newspaper’s article on CTA crime that ran on Tuesday.

On Monday evening, the tabloid released an article online, utilizing data analysis about CTA crime.  The front page of its Tuesday print edition ran with the headline: HIDE YOUR iPHONES.

Monday evening, the CTA countered with a release criticizing the analysis as flawed. The CTA’s main point of consternation was the Sun-Times claim of a 21 percent increase in crime.  

The sub-headline of the front page story read: “CTA rail stations hit by 21% spike even with high-tech surveillance.”

In his letter, Claypool said the suggestions are “false and misleading.”

The CTA did not dispute whether crimes actually happened, but rather which types of crime should be measured and how they were measured.

The premise of the story is that crime increased on or around CTA rail stations despite an increased use of security cameras from 2010 to 2012.

In order to determine that, reporters had to assess which crimes happened near the cameras. The methodology that defined which numbers were used is at the heart of why the CTA says one thing and the Sun-Times says another.

Given WBEZ’s commitment to data reporting that helps Chicagoans make sense of their city, we’re hoping to demystify the numbers… with a hilariously long-winded post on data.

The CTA crime story has made its way from online and print to subsequent TV reports.

And before sheer repetition turns this story’s claim into a commonly accepted idea among Chicagoans, we decided to look a little closer at the numbers.

If you don’t want to get into the weeds of data, numbers and variables, then the gist is this:

  • The Sun-Times analysis weeded out more than half of the locations identified as CTA crime to account for areas with cameras, yet it excluded buses, all of which have cameras. It also excluded trains (which are partially fitted with cameras).
  • The Sun-Times also factored in all crime. CTA says violent crime is down. (Although some batteries are actually up.)
  • Theft overall is up because of smartphones, but the problem is not unique to CTA or Chicago. It’s a nationwide trend given the ease of stealing a small, expensive device.
  • The analysis does not take into account ridership numbers, which could vary dramatically depending on time of day.

NERD ALERT: The following contains a whole lot of numbers…

The Sun-Times looked at 2012 crime numbers, “and compared with 2010 — well before most of the CTA’s current 3,600 rail station cameras were installed — station crime was up 32 percent,” according to the paper.

In its release, the CTA countered, saying that “in 2012, the number of robberies and aggravated battery incidents reported on CTA were down by 21 percent and nearly 12 percent, respectively, compared to 2011.”

The back-and-forth essentially amounts to a how-the-numbers-are-used debate. How you refine the crime data can create huge differences.

Total amount of reported crime incidents with descriptions matching (CTA) in 2010: 6,173.

In 2010, there were 6,173 reported crime “incidents” at locations identified as belonging to CTA. 

“Incidents” reported by the Chicago Police Department are not always arrests, just a record of any time a police report was filed.

Arrests are another story (3,739 non-arrests and 2,434 arrests for 2010), but ignore that for now.

Where do the numbers come from?

The data set used in the analysis likely came from the city’s data portal site, specifically my personal favorite data set (Crimes: 2001-present).

The set description offers the following:

“This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department’s CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified.”

One quirk about this public data is that it’s automated, meaning the data are fluid and prone to change, and any analysis is a snapshot of the reported data at that time.

If crimes are retroactively recategorized, the dataset will be updated to reflect changes. 

This has the potential to throw off analysis and, while not common, happens often enough for data users to take into consideration.

It’s probably one of many reasons the city and the Chicago Police Department say the data should not be used for research or annual comparisons. In legalese, it’s essentially saying: ‘here’s the data, but it’s not our official number.’ 

Still, for anyone who would rather not rely on a sanctioned number from a body that may have a stake in the outcome, public data, if cleaned up correctly, can accurately convey trends.

(Full disclosure: I’ve published several stories utilizing an annual breakdown of crime.)

Also worth considering: a very, very small number of crime incidents can be withheld from public view, and short of a (non-denied) Freedom of Information Act request, the data would not reflect those instances, according to CPD sources.

So back to that 6,173 number in 2010. Where did it come from? When you take the total amount of crime incidents on the public data portal, and apply a filter by year (2010) and by location (any instance containing “CTA”) you get that number. Hurray!

Now, we should be able to apply the same criteria to 2012 and get: 7,632.
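For readers who want to try this at home, here is a minimal Python sketch of that two-step filter (year, then any location containing “CTA”). The column names `Year` and `Location Description` and the handful of sample rows are assumptions for illustration, not real records; check them against the headers in your own export of the data.

```python
import csv
import io

# Toy rows standing in for a CSV export of the crime data set.
# Column names and values are illustrative assumptions, not real records.
SAMPLE_CSV = """Year,Primary Type,Location Description
2010,THEFT,CTA PLATFORM
2010,BATTERY,STREET
2012,ROBBERY,CTA BUS
2012,THEFT,CTA TRAIN
2012,NARCOTICS,SIDEWALK
"""


def count_cta_incidents(csv_file, year):
    """Count rows for one year whose location description contains 'CTA'."""
    total = 0
    for row in csv.DictReader(csv_file):
        in_year = row["Year"].strip() == str(year)
        on_cta = "CTA" in row["Location Description"].upper()
        if in_year and on_cta:
            total += 1
    return total


for year in (2010, 2012):
    print(year, count_cta_incidents(io.StringIO(SAMPLE_CSV), year))
```

Swapping `io.StringIO(SAMPLE_CSV)` for an opened file handle is all it should take to run the same count against a real export.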

Back to some elementary school math: calculating the change in the annual amount of crime should take 7,632 – 6,173 = 1,459. Now, 1,459/6,173 = .2363 * 100 = 23.63%.

Yay! 23.63% increase in crime from 2010 to 2012. Done. 
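The same arithmetic as a small Python sketch, using the two totals from this post. The function name is my own; the one thing it is meant to show is that the earlier year belongs in the denominator.

```python
def percent_change(old, new):
    """Percent change from old to new, using the earlier value as the base."""
    if old == 0:
        raise ValueError("percent change is undefined when the base value is 0")
    return (new - old) / old * 100


cta_2010 = 6173  # incidents matching "CTA" in 2010, from this post
cta_2012 = 7632  # incidents matching "CTA" in 2012, from this post

print(f"2010 as the base: {percent_change(cta_2010, cta_2012):.1f}%")

# The easy mistake: dividing the difference by the later year instead.
wrong_base = (cta_2012 - cta_2010) / cta_2012 * 100
print(f"2012 as the base: {wrong_base:.1f}%")
```

The gap between the two results is more than four percentage points, which is plenty to change a headline.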

Wait. What?

It wasn’t going to be that easy. And I sympathize with any reporter digging through data, Sun-Times included. 

Here’s why:

If you sort all 2012 crime on ALL CTA properties, roughly 7,632, you get “types” of crime. Here’s a massive breakdown of 2010 crime on all locations identified as belonging to CTA.

If you sort crime by type, you get THIS:

Crime type | 2010 | 2012
Theft | 1,873 | 2,467
Deceptive Practice | 1,027 | 2,010
Battery | 863 | 901
Robbery | 766 | 762
Narcotics | 633 | 489
Criminal Damage | 330 | 280
Assault | 284 | 280
Criminal Trespass | 164 | 191
Sex Offense | 50 | 41
Other Offense | 50 | 31
Public Peace Violation | 48 | 53
Weapons Violation | 22 | 26
Motor Vehicle Theft | 13 | 21
Liquor Law Violation | 9 | 6
Interfere With Public Officer | 8 | 30
Stalking | 7 | 4
Crim Sexual Assault | 6 | 8
Gambling | 6 | 4
Offense Involving Children | 4 | 5
Burglary | 3 | 10
Arson | 2 | 0
Kidnapping | 2 | 4
Prostitution | 1 | 1
Intimidation | 0 | 4
Public Indecency | 0 | 3
Obscenity | 0 | 1

Sources: CTA-identified crime for: 2010 | 2011 | 2012
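One way to read a breakdown like this is to rank the types by how much each one moved. The Python sketch below does that with the counts transcribed from the list above; the function name is my own. One wrinkle to be aware of: the 2010 rows as listed add up to 6,171, two short of the 6,173 total, so the net change computed here is 1,461 rather than 1,459.

```python
# (2010, 2012) counts by crime type, transcribed from the breakdown above.
COUNTS = {
    "Theft": (1873, 2467),
    "Deceptive Practice": (1027, 2010),
    "Battery": (863, 901),
    "Robbery": (766, 762),
    "Narcotics": (633, 489),
    "Criminal Damage": (330, 280),
    "Assault": (284, 280),
    "Criminal Trespass": (164, 191),
    "Sex Offense": (50, 41),
    "Other Offense": (50, 31),
    "Public Peace Violation": (48, 53),
    "Weapons Violation": (22, 26),
    "Motor Vehicle Theft": (13, 21),
    "Liquor Law Violation": (9, 6),
    "Interfere With Public Officer": (8, 30),
    "Stalking": (7, 4),
    "Crim Sexual Assault": (6, 8),
    "Gambling": (6, 4),
    "Offense Involving Children": (4, 5),
    "Burglary": (3, 10),
    "Arson": (2, 0),
    "Kidnapping": (2, 4),
    "Prostitution": (1, 1),
    "Intimidation": (0, 4),
    "Public Indecency": (0, 3),
    "Obscenity": (0, 1),
}


def changes_by_type(counts):
    """Return (type, 2012 minus 2010) pairs, largest increase first."""
    deltas = {name: after - before for name, (before, after) in counts.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


ranked = changes_by_type(COUNTS)
net_change = sum(delta for _, delta in ranked)

for name, delta in ranked[:3]:
    print(f"{name}: {delta:+d}")
print(f"Net change across listed types: {net_change:+d}")
```

The two biggest movers, Deceptive Practice and Theft, together rose by more than the net change, which means everything else combined actually went down.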

Now that, you say, is pretty interesting, but wait, it gets better. The FBI and the Chicago Police Department have to track violent vs. nonviolent crime.

That’s fair, right? A person drinking a 40 on the Red Line after what was possibly a tough day at the office should not be reasonably regarded with the same weight as a sex offense or an aggravated battery with a handgun, police parlance for: SHOOTING SOMEONE.


To get those numbers, we now have to sort by descriptions or a secondary set of classifications. Luckily, the police data include both NIBRS, National Incident-Based Reporting (FBI code), and the CPD method of subcategories, UCR, Uniform Crime Reporting.

Wait, aren’t they the same? Of course not. 

If I sort 2010 crimes by UCR, I get… 133 subcategories. Er… trust me, the list is quite extensive.

Why are the subcategories different and necessary?

Well, take thefts vs. robberies, for instance. Theft, often categorized as a crime against property, is traditionally a non-violent crime. Robbery is another matter once you take into account aggravated robbery with a knife or a handgun, or strong-armed robbery (being beaten up). That’s pretty violent.

The Chicago Police Department does us the favor of meticulously outlining the subcategories and their descriptions here.

That said, you’d still be stuck classifying thousands of crimes into almost two hundred categories. Even with Excel and Google Refine, it’s a ton of work.
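If you would rather script that chore than do it by hand, the idea can be sketched in a few lines of Python: keep a lookup table from subcategory to bucket, then tally. Everything in the lookup below (the description labels and which bucket each lands in) is a hypothetical example I made up for illustration; a real analysis would build the table from the police department’s published definitions.

```python
from collections import Counter

# Hypothetical lookup: (primary type, description) -> bucket.
# Labels and bucket choices are assumptions for illustration only.
BUCKETS = {
    ("ROBBERY", "ARMED: HANDGUN"): "violent",
    ("ROBBERY", "STRONGARM - NO WEAPON"): "violent",
    ("BATTERY", "AGGRAVATED: HANDGUN"): "violent",
    ("THEFT", "POCKET-PICKING"): "nonviolent",
    ("DECEPTIVE PRACTICE", "THEFT OF LABOR/SERVICES"): "nonviolent",
}


def tally(incidents, buckets):
    """Count incidents per bucket; anything not in the lookup is 'unclassified'."""
    counts = Counter()
    for primary_type, description in incidents:
        counts[buckets.get((primary_type, description), "unclassified")] += 1
    return counts


# Toy incidents, not real records.
sample = [
    ("ROBBERY", "ARMED: HANDGUN"),
    ("THEFT", "POCKET-PICKING"),
    ("DECEPTIVE PRACTICE", "THEFT OF LABOR/SERVICES"),
    ("GAMBLING", "GAME/DICE"),
]
print(dict(tally(sample, BUCKETS)))
```

Sending unknown subcategories to an “unclassified” bucket, instead of silently dropping them, makes it obvious how much of the data the lookup has not covered yet.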

Why are the categories important?

To put it bluntly: fear.

Chicagoans have been bombarded with crime stats, b-roll of crime scenes, front pages, murder stories and radio stories. It’s not unlike a country responding to terror attacks with a color-coded system that tells you when it’s OK to pack the 3 oz lotion on your trip to Florida.

Think: flash mobs. Remember those?

So when Chicagoans taking the CTA see an XX% jump in a crime number, that can be interpreted as pretty alarming, until you realize there are variants, with many instances being non-violent.

The CTA, beleaguered at times with criticism, responded by saying that one of the largest reported spikes in crime was attributed to those jumping turnstiles. Turnstile jumping is classified as “Theft of labor/services | UCR: 1210.” However, that classification can very well include other crimes, such as boarding a bus without paying.

In 2012, there were 1,861 of those, up 108.63% from 2010, which only saw 892.
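To see how much that single subcategory moves the headline number, here is a rough Python sketch that recomputes the 2010-to-2012 change with and without it. It assumes the 892 and 1,861 incidents are all included in the 6,173 and 7,632 CTA totals, which this post does not confirm outright.

```python
def percent_change(old, new):
    """Percent change from old to new, using the earlier value as the base."""
    return (new - old) / old * 100


all_cta = {2010: 6173, 2012: 7632}         # all CTA-identified incidents
labor_services = {2010: 892, 2012: 1861}   # "Theft of labor/services" incidents

# Assumption: the subcategory counts are subsets of the CTA totals.
without = {year: all_cta[year] - labor_services[year] for year in all_cta}

print(f"All incidents:        {percent_change(all_cta[2010], all_cta[2012]):.1f}%")
print(f"Labor/services only:  {percent_change(labor_services[2010], labor_services[2012]):.1f}%")
print(f"Everything else:      {percent_change(without[2010], without[2012]):.1f}%")
```

Under that assumption, the increase for everything else is less than half the size of the increase for all incidents.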

Here’s a SUPER breakdown of crimes by Types/UCRs:

OK, so why is the math still off?

Location matters. Choosing one can shift the numbers.

The crimes are sorted by one of five possible CTA locations:

Location | 2010 | 2011 | 2012
CTA Bus stop | 659 | 692 | 802
CTA Train | 1,486 | 1,665 | 1,689
CTA Platform | 2,029 | 2,066 | 2,658
CTA Garage/Other property | 747 | 949 | 1,045
CTA Bus | 1,252 | 1,392 | 1,438
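A quick Python sketch shows how far the 2010-to-2012 change swings depending on which of these locations you keep. The counts are the ones listed above; the groupings are just examples I picked, not the Sun-Times selection or the CTA’s.

```python
# (2010, 2012) incident counts by location, from the list above.
LOCATIONS = {
    "CTA Bus stop": (659, 802),
    "CTA Train": (1486, 1689),
    "CTA Platform": (2029, 2658),
    "CTA Garage/Other property": (747, 1045),
    "CTA Bus": (1252, 1438),
}


def change_for(names):
    """Percent change from 2010 to 2012 for the chosen group of locations."""
    old = sum(LOCATIONS[name][0] for name in names)
    new = sum(LOCATIONS[name][1] for name in names)
    return (new - old) / old * 100


# Example groupings only; pick your own to see the number move.
groupings = {
    "All five locations": list(LOCATIONS),
    "Platforms only": ["CTA Platform"],
    "Trains and buses": ["CTA Train", "CTA Bus"],
}

for label, names in groupings.items():
    print(f"{label}: {change_for(names):.1f}%")
```

Same data set, same years, and the answer ranges from the mid-teens to the low thirties.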

Listing all of the locations would save a lot of time and effort. However, there is the issue of narrative. The question posed by the Sun-Times is: do security cameras deter crime?

To establish that, you would have to refine the results by areas where CTA has cameras.

Are there cameras on trains? Yes, but only in the recently issued 5,000-series train cars (the new ones with aisle-facing seats and electronic signs) as well as some older cars retrofitted with cameras, according to the CTA.

Are cameras on buses? Yes, actually, all of them, according to CTA spokesman Brian Steele.

Is tracking down instances of reported crimes on buses and trains while en route possible? Maybe, but that would require obtaining thousands of police reports, which may (but more than likely will not) indicate route numbers that can be matched against a bus or a train.

It’s quite possible these variables led the Sun-Times to exclude those totals, which its methodology indicates it did.

That leaves:

  • Bus Stop
  • Train Platform
  • CTA Garage/Other Property.

Bus stops were also excluded, presumably because of the lack of cameras. It’s also logical because crime can happen to just about anybody on a public way, which bus stops are on. 

We’re now left with:

  • CTA Platforms
  • CTA Garage/Other Property

The two remaining locations reflect 2,776 CTA crimes for 2010, out of 6,173 reported incidents, or 44.97% of reported CTA crimes (both violent and nonviolent).
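Spelled out as a short Python sketch, using the 2010 location counts from this post:

```python
def share(part, whole):
    """Part as a percentage of whole."""
    return part / whole * 100


total_2010 = 6173      # all CTA-identified incidents in 2010
platform_2010 = 2029   # CTA Platform
other_2010 = 747       # CTA Garage/Other property

kept = platform_2010 + other_2010
dropped = total_2010 - kept

print(f"Kept:    {kept:,} incidents, {share(kept, total_2010):.2f}% of the total")
print(f"Dropped: {dropped:,} incidents, {share(dropped, total_2010):.2f}% of the total")
```

The dropped share is the “more than half” mentioned in the summary at the top of this post.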

Presumably, those locations were chosen to determine whether or not cameras were a crime deterrent. (And one worthy of the sizable cost taxpayers have poured into the far-reaching security apparatus.) CTA has repeatedly cited the spending on cameras as a potential cause for a decrease in violence, essentially justifying the expense of the Big Brother-ish system.

There are even more caveats that need to be considered – as there always are – in reporting with data.

The first, and particularly painful, is that police reports are supposed to reflect the location whenever possible, but many in the police department and those who work with data, myself included, have found instances where a crime occurred on a train, but the location is documented at the station. (Likely where a victim notified CTA to contact police.)

The data are flawed. Regardless of how well an individual or agency cleans them up, they fundamentally rely on the data reporters (officers) accurately documenting cases as told to them by victims (data sources) at the time of the incident.

While such errors are rare enough to fall within a comfortable margin of error, they do happen.

This is probably what precipitated the use of the following criterion by the Sun-Times: “Crimes listed as occurring on CTA property that were within two blocks of a station were assigned to that station.”
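I don’t know how the Sun-Times implemented that rule, but one plausible way to do it is a nearest-station lookup with a distance cutoff. The Python sketch below uses the haversine formula for distance. The 400-meter cutoff is my own stand-in for “two blocks” (roughly a quarter mile on Chicago’s grid), and the station names and coordinates are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000
TWO_BLOCKS_M = 400  # assumption: about two standard Chicago blocks


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def assign_station(incident, stations, max_m=TWO_BLOCKS_M):
    """Return the name of the nearest station within max_m, or None."""
    best_name, best_dist = None, max_m
    for name, (lat, lon) in stations.items():
        dist = haversine_m(incident[0], incident[1], lat, lon)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


# Hypothetical stations and incidents, not real locations.
stations = {"Station A": (41.8800, -87.6300), "Station B": (41.8900, -87.6300)}
print(assign_station((41.8810, -87.6300), stations))  # about 111 m from Station A
print(assign_station((41.8850, -87.6300), stations))  # over 500 m from both
```

Because the public data only locate incidents to the block level, the cutoff you choose can easily decide whether an incident counts toward a station at all.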

CTA has other properties, most, but not all of which, are where passengers commute. To give you an idea, think bus depots, garages, tunnels, utility facilities and even elevated tracks. Those possible locations had a total of 747 reported crime incidents in 2010.

However, when mapped out, this category, which essentially amounts to a “CTA other” category shows areas over or near a train station or over a bus stop – or in areas near neither. Why wasn’t a bus stop categorized as such?

Well, there is that previously mentioned margin of error. An officer likely checked “CTA property” instead of “CTA bus stop.”

Also, the location itself is imprecise. Typical police reports always strive to get an exact address of an incident. When none is available, the nearest intersection is used. Both of these location types are geo-coded automatically by the city, not by the officers. This leaves a fair amount of ambiguity on where crime happens, but is almost always accurate enough to pin down overall trends down to the block level.

When the locations or points of coordinates are mapped, using GIS applications such as Google Earth or Fusion Tables, points with identical coordinates are automatically spaced apart for a visual representation of occurrence. It does not offer a foot by foot measure of location, but probably a radius of roughly 200 feet to as wide as a city block.

And so… determining the line of sight for a security camera juxtaposed to coordinates that are only as precise as the original entry might be impossible.

This does not mean that it wasn’t in the public interest to assess whether the CTA cameras were worth taxpayer dollars. But the math is fuzzy.

Given all the caveats, I personally cannot reproduce the Sun-Times results, even with the criteria outlined in their methodology. One would have to apply all the given factors listed above, account for a margin of error, then apply the same for 2012 to give an accurate account of crime on the CTA, as related to locations with security cameras.

Even when I take into account batteries by location, the numbers shift:

All Batteries | 2010 | 2011 | 2012
Platform | 201 | 209 | 212
Trains | 139 | 138 | 154
Bus | 333 | 377 | 358
Bus Stop | 130 | 110 | 130
Other CTA Prop. | 60 | 53 | 47

All Robberies | 2010 | 2011 | 2012
Platform | 167 | 197 | 122
Trains | 304 | 319 | 227
Bus | 67 | 124 | 141
Bus Stop | 185 | 191 | 239
Other CTA Prop. | 43 | 29 | 33

This stuff is complex. 

Even though there have been increases in theft, batteries and robberies, it varies greatly by location.  Another vital factor has to be taken into account: per capita.

The number of riders has increased rapidly, year over year. This is an especially important statistic for buses, trains and platforms, as some crimes may increase as the population of an area increases.
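Adjusting for ridership is simple division once you have the ride counts. In the Python sketch below, the incident totals come from this post, but the ridership figures are placeholders I invented purely to show the mechanics. They are NOT actual CTA ridership numbers; swap in the real ones before drawing any conclusion.

```python
def incidents_per_million_rides(incidents, rides):
    """Incident rate normalized to one million rides."""
    if rides <= 0:
        raise ValueError("rides must be positive")
    return incidents / rides * 1_000_000


incidents = {2010: 6173, 2012: 7632}  # CTA-identified incidents, from this post

# PLACEHOLDER ridership, invented for illustration; not real CTA figures.
rides = {2010: 500_000_000, 2012: 550_000_000}

for year in (2010, 2012):
    rate = incidents_per_million_rides(incidents[year], rides[year])
    print(f"{year}: {rate:.2f} incidents per million rides")
```

The point of the exercise: a raw count can rise while the rate per ride rises more slowly, holds steady or even falls, depending on how fast ridership grew.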

However, if location is set aside, or folks at least agree to exclude bus stops and other properties and work with buses, trains and “L” platforms, then everyone will be working off the same numbers.

If there is a mutually agreed set of crime categories that directly affect public safety that all parties can agree to use (CPD does), then the numbers and consequent analysis would and should be consistent.

The takeaway, though, which the Sun-Times pointed out, is the prevalence of cellphone theft.

I encountered similar upticks when reporting annual crime in Lakeview last month. (Having the benefit of grouping a community area, which the city automatically does, is not nearly as painful as figuring out which crimes happened near areas with CTA cameras.)

While cellphones have been in use for over two decades, the recent reliance on smartphones has presented law enforcement with a Pandora’s box of assured theft.

While the data do not explicitly say these are smartphone thefts, one can make an educated guess as to where the thefts are coming from given the value amounts.


If riders are tweeting, checking Facebook, emailing work or checking bus times, then they are not checking around them to see if somebody saw them pull out a phone worth $300-$600.

The attention of CTA passengers is often on their smartphone screens. If you want to see for yourself, take a look at this time-lapse video taken as part of the Race: Out Loud series done by WBEZ last summer to demonstrate segregation via mass transit in Chicago.

Absent the phenomenon of cellphone theft, some crimes are down, while others have seen slight increases. But a contrarian or data nut would argue the year-over-year changes could be interpreted as breaking even, or even as decreased crime, when ridership increases are taken into account.



This post mistakenly used 2012 as the denominator when calculating an example figure of an increase in crime from 2010, resulting in a 19.11% increase instead of 23.63%. Jon Markel of the Field Museum pointed out the error. Thanks, Jon! Also, the irony of getting a figure wrong in a post about numbers is not lost. Cue Alanis Morissette.
