Rayid Ghani has been one of the big names in the U.S. civic data movement. As chief scientist for Obama for America, he helped make analytics a popular topic for governments, non-profits and other groups that never realized the potential value their information held.

Now, as the Chief Data Scientist for the University of Chicago’s Urban Center for Computation and Data, Ghani continues to push data scientists to take an interest in societal problems.

Last year he founded Data Science for Social Good, a fellowship through UChicago that matches students who have data skills with organizations looking for help solving problems with data.

In its second year, DSSG now has 48 fellows working with groups not only in Chicago but throughout the country and even an organization in Mexico. A second DSSG group has even started in Atlanta.

We visited DSSG offices recently and spoke with Ghani about what data science really means, what makes Chicago’s data community different and where he sees the field going.

One of the fellows mentioned to me that on the first day you put statements on the board and they had to agree or disagree, and one was “Data science is not a real field, it’s just statistics done by people with weird hair.” For you, how do you define data science and is it a relevant term or a buzzword?

There are going to be trendy buzzwords for every area that’s in demand. In my mind, data makes organizations and people rational, and whatever you call it, that way of rational decision making is not hype, it’s not a buzzword. It’s always been there, it’s just not as widespread because a lot of organizations were not collecting enough data. Now that they’re collecting data, of course they want to allocate their resources more efficiently, of course they want to do things better, and data is a really good way to start down that path.

The phrase data science may be a buzzword right now, but the people behind it aren’t new people just coming up and saying we do data science. It’s people who were using computational and analytical tools to make better decisions, and now they’re calling themselves data scientists. The phrase may be hype, but there’s a lot of solid science behind it that’s not hype.

Matt Gee mentioned at Open Gov Hack Night that you have an Atlanta group starting. How do you see this growing over the next few years?

The Atlanta group is a great start. A single program in Chicago can only grow so big, so the only way to scale this is a lot of people who are doing this in a distributed way, but connected in a community where they can share. We had a Skype session with them. Some of the people applied to our program, we asked if they want to go to their program, some of the projects in both programs can be shuffled around. I see a larger setup happening where there are lots of these programs, not just summer programs but semester-long classes or informal research groups at universities or meetup groups like Code for America trying to do similar things.

There is a core that is interested in these problems, a larger informal network that starts to come together where people have the same goals, they have different skills, so how do you share the overhead? That wasn’t initially the goal, but as we see interest from everybody else we want to make sure we can help them grow these programs and build this larger network.

The fellows have been out at a lot of different meet ups and groups. What do you see as the fellowship’s role in the larger Chicago data community?

Chicago has an interesting data community. People who are in Chicago and interested in data are really interested in deeper, more tangible problems. They’re not as interested in building a web app to find you a date. That’s sort of the typical data use. They’re more interested in problems that are deeper, tangible, rooted in lots of data.

One of the things we wanted to do was make that more known. The rest of the country doesn’t really realize that. Part of the goal of bringing these fellows here is that they’ll see the community that is starting in Chicago and choose this as a place to do their work. One of the reasons I’m doing the program downtown and not in Hyde Park, where the university is, is that I want these fellows to interact with the local tech community, local nonprofits, local government communities, go to meetup groups. We do happy hours every Friday. The idea is to get them exposed and mingling. Then they become part of this community.

As someone who has been involved in the civic data community for a while, looking at your project and all the things that are popping up there seems to be a lot of attention around it. One, is it moving in the right direction, and two, is it moving fast enough to solve the problems you want to solve?

It’s never going to be fast enough. Problems will always grow faster than the solutions until you reach a certain tipping point. Right now what’s happening is there’s a lot of fascination with data, and not enough fascination with problems. That’s very natural in any new community; you get fascinated by the new shiny things which are data. The problems are old. We still have problems with education, problems in health care, problems with sustainability, problems with community development and crime, safety. The problems are not new, the data is the new thing and I think initially people get attracted to new things, but when that initial hype settles down it comes down to solving the real problems that you have.

We’re still in the data fascination stage. We’re moving to the problem phase, where what’s really important right now is for people who have the problems to make sure that people with the skills to solve these problems know about these problems. If you’re not telling people about your problems, you can’t get mad at them for not solving them. If they’re building a web app that’s not very useful to you, maybe you should tell them what your problems are.

A lot of the time I spend with nonprofits and cities is to try and elicit what are your biggest problems. How do I communicate to, not just the fellows here, but meetup groups and different people who have the skills to solve these problems, how do I translate these problems to them so they find them exciting and motivating and start working on them. Because without that guidance they’ll start working on problems that interest them and may not be the most useful problems for society in general.

The enterprising people at DataMade just released Election Money, where you can download 170MB of bulk campaign finance records from the Illinois Board of Elections. My copy is still downloading, but I’m excited to start digging through it.

The city of Chicago released a bunch of new sets to the city data portal recently. From the city’s Chicago Digital blog:

The City of Chicago has released a handful of new datasets which pertain to several parts of daily life in Chicago. The public will be able to explore the water quality at Chicago beaches, find who and which vehicles are licensed to carry passengers, activities for Chicago’s Micro-Market Recovery Program, and the geographic areas targeted by the City’s Broadband Innovation Challenge.

The most interesting aspect of the new batch is the addition of pedicab licenses to the list of licensed Public Chauffeurs. The city started regulating the pedicab industry in June. In addition to requiring a license, pedicabs were banned from operating in the Loop.

This release has the first 25 approved pedicab licenses, but also the first four denied applications and 10 more inactive ones.

Not surprisingly, the first license issued went to T.C. O’Rourke, a Chicago Pedicab Association board member. O’Rourke told Streetsblog after the ordinance passed he was in favor of the license regulations but not the geographic restrictions. He also gave his thoughts on what it all meant for his business. Go and read it.

Another interesting note: Of all the applicants, only one is female. That would be Joanne Marie Werling, who had her license approved June 11.

Switching gears (please forgive me the pun), the city’s post at Digital Chicago has a graph showing all the models of active cabs in Chicago (spoiler: cabbies love Camrys). That led to a long time sorting and filtering the makes and models of the cabs and livery vehicles.

While sorting through vehicle manufacturers I noticed Tesla listed. Indeed, there are two Teslas registered as livery vehicles, though the data portal has them coded as gasoline vehicles. Need to check in on that, but it may be an incorrect categorization.

While resources like the city of Chicago data portal have a lot of great information, it’s also good to step back and think about where the numbers came from.

The city posts all its 311 call data, including reports of abandoned vehicles. Citizens can call 311 or fill out an online form reporting a car that needs to be moved.

Like most 311 sets, the report has the date it was filed, when it was completed (if it was) and the location of the report. Abandoned vehicles also have some novel categories, such as make and model, license plate information and even the color of the car.

Those are relatively straightforward (though there are 50+ ways of noting a car doesn’t have plate info). A car is a Honda or a Ford. It’s tan or red.

Less clear is the “How many days has the vehicle been reported as parked?” field. On its face it seems like we could just average all the numbers and get the average number of days cars sit in each neighborhood.

The numbers range from zero (typically if a city worker finds a car without a report) to 10,000,000, which would be approximately 27,397 years.

No matter how long it may seem to someone on the block, no car has been abandoned in Chicago for 27,397 years.

While that’s likely a simple typo (or someone expressing their annoyance at said vehicle), there are some other interesting patterns in how long people think the cars have been left.

The most common time reported is 30 days, and it’s not close. Of the nearly 60,000 completed incidents with a days parked reported, a quarter are for 30 days.

Here’s the list of all days reported with at least 1,000 mentions:

Basically that top five can be read as: one month, one week, two weeks, three weeks, two months. After that are round numbers, numbers divisible by 30 and numbers divisible by 5. When asked, people estimate time frames they know. It’s why the 3 and the zero on my microwave always wear out first.

If you’re using city data it’s important to know how the data were gathered and what the possible biases could be. In this case numbers are more like a survey with a margin of error than an actual measurement. While that shouldn’t stop someone from using it as a guide, it’s important not to draw too much from it without asking more questions first.
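For anyone who wants to run the same kind of check, here is a minimal Python sketch of the steps described above: convert a reported day count to years, set aside implausible values and tally the most common answers. The function names, the 10-year cutoff and the sample values are all assumptions made up for illustration; they are not pulled from the data portal.

```python
from collections import Counter

DAYS_PER_YEAR = 365  # simple conversion that ignores leap years


def days_to_years(days):
    """Convert a reported day count to years."""
    return days / DAYS_PER_YEAR


def summarize_days_parked(values, max_plausible_days=3650):
    """Tally reported day counts and set aside implausible ones.

    max_plausible_days is an arbitrary cutoff chosen for this sketch.
    """
    plausible = [v for v in values if 0 <= v <= max_plausible_days]
    outliers = [v for v in values if v < 0 or v > max_plausible_days]
    counts = Counter(plausible)
    # Share of plausible answers that land on a multiple of 5.
    round_share = (
        sum(1 for v in plausible if v % 5 == 0) / len(plausible)
        if plausible
        else 0.0
    )
    return {
        "most_common": counts.most_common(5),
        "outliers": outliers,
        "share_multiple_of_5": round_share,
    }


# Made-up sample values, not drawn from the city's 311 data.
sample = [30, 30, 7, 14, 30, 21, 60, 3, 10_000_000, 30, 7, 45]
summary = summarize_days_parked(sample)

print(round(days_to_years(10_000_000)))  # 27397
print(summary["most_common"][0])         # (30, 4)
print(summary["outliers"])               # [10000000]
```

Separating the outliers before averaging matters here: a single 10,000,000-day report would swamp the average for a whole neighborhood.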

Some of my favorite bits of data journalism this year have come out of the MIT “You Are Here” project. Recently they took a look at Chicago transportation, calculating how long it takes to get from each spot in the city to every other. The map then tells you whether it would be faster to walk, ride, drive or take public transit. You can even see the L and major bus lines start to appear as you click around the map.

Here’s the statement from Illinois Gov. Pat Quinn on his veto of HB3796, which would have added restrictions for large FOIA requests. The bill also spelled out a fee structure for electronic requests based on the size of the file, averaging about $10/MB. After the bill passed, the Chicago Headline Club and the Citizen Advocacy Center came out and asked Quinn to veto the bill.


The map above shows the change in the 17 and younger population in the largest school districts in Illinois from 2007 to 2012. Darker red is a larger drop, darker green a larger gain.

This seemed important as Chicago Public Schools just announced around 1,000 layoffs yesterday, citing drops in attendance.

To be clear, these aren’t attendance numbers, but population figures from the Census Bureau’s American Community Survey. While not a direct measure, it allows for comparisons across districts.

Over the past five years Chicago has seen an 8 percent drop for those 17 and younger, compared to a 1.25 percent drop in the total population.

That ranks 10th among Illinois school districts over that time:

We embedded the interactive version of the map below, so you can check the numbers on all the Illinois school districts with at least 10,000 residents.

Used in this post:

2007 and 2012 American Community Survey, 3-year samples.
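
The comparison in this post comes down to a percent change for each district, then a sort. Here is a minimal Python sketch of that arithmetic. The district names and counts are placeholders invented for illustration, not ACS estimates, and the function names are hypothetical.

```python
def percent_change(old, new):
    """Percent change from old to new; a negative result is a drop."""
    if old == 0:
        raise ValueError("old value must be nonzero")
    return (new - old) * 100 / old


def rank_by_drop(districts):
    """Sort districts from the largest drop to the largest gain.

    districts maps a district name to a (2007 count, 2012 count) pair.
    """
    changes = {
        name: percent_change(old, new)
        for name, (old, new) in districts.items()
    }
    return sorted(changes.items(), key=lambda item: item[1])


# Placeholder figures for illustration only; these are not ACS estimates.
example = {
    "District A": (10_000, 9_200),
    "District B": (5_000, 5_250),
    "District C": (8_000, 7_000),
}

for name, change in rank_by_drop(example):
    print(f"{name}: {change:+.1f}%")
```

Because ACS figures are survey estimates with margins of error, small differences in rank between districts should be read loosely.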

When we spoke to Josh Kalov in April, he was just finishing his first month as a civic data consultant with Cook County. Part of that process was creating an inventory of what data already existed in the county and where it lived. As part of that he put together this great list of tools and data for property in Cook County. Sounds like there will be more coming, as well.

Possible design for a sensor in the Array of Things project. Source: Urban Center for Computation and Data.

Charlie Catlett has a vision of the future of Chicago where citizens can talk directly with the city itself.

Imagine walking down the street and getting a text letting you know there’s ice ahead. Or even getting help walking around at night.

“It’s 10 p.m., it’s dark, you don’t know the city very well,” said Catlett, the director of the Urban Center for Computation and Data. “You’d be able to pull up an app on your phone that shows you where there’s the most foot traffic. That’s the route I want to take.”

This summer UCCD is taking a step to make those interactions possible through its “Array of Things” project. The project will place sensors around the Loop – around 50 this year with a total of closer to 400 over the next few years – that will detect light, sound, air quality and other measures, all made available for citizens to use for free.

Map of locations for the first Array of Things sensors. Source: Urban Center for Computation and Data.

“By making this data public, we can imagine people writing all sorts of applications taking advantage of the data, including, hopefully, ones we never would have thought of,” Catlett said.

Catlett said the project was driven by a simple question: How could you change the way people interact with the built environment if the built environment were smart?

“Could you imagine crowd-sourced infrastructure?” he said. “We’re putting these devices out into the community and it’s important to us to do that in a way that engages people to participate.”

The sensors will be placed on light poles around the Loop to start. The technology is similar to the small Raspberry Pi computers, wrapped in a weather-proof case. Around that is a shell designed by the School of the Art Institute of Chicago to make the sensors more visually inviting.

“We realized early on if we were going to put a new object into the city at infrastructural scale, it would have to be understood as positive,” said Doug Pancoast, School of the Art Institute of Chicago associate professor and UCCD member.

That’s also why starting in July UCCD will hold community workshops to talk about the sensors and what citizens would like to see – or not see – from the project.

To Catlett, the platform UCCD is building now isn’t about the sensors themselves, but the trust developed between them and the public.

“The sensors we put in now will all be replaced within three years,” he said. “That accountability is the foundation. The sensors are just the things we happen to have in there at any given time.”

The conversations are made more important because all the information collected will be automatically posted online for anyone to see and use. That makes the project a great resource for civic hackers but increases the need for privacy protections.

“The fact we’re saying the data immediately goes public means that nothing we collect can have any privacy breaches to it,” Catlett said. “It means that privacy in this instrument isn’t something that’s designed in in the end, but something that’s part of its nature and part of its architecture.”

To head that off, Catlett and Pancoast said that while the devices can detect light and sound, they will never have cameras or microphones. While they can detect the presence of Bluetooth devices (like mobile phones), they won’t do anything more than count the number of responses.
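
To make the “count only” idea concrete, here is a minimal Python sketch of a counter that tallies responses without keeping anything that identifies a device. It illustrates the principle Catlett describes and is not the Array of Things software; the class and method names are hypothetical.

```python
class PresenceCounter:
    """Counts responses in a time window without keeping identifiers."""

    def __init__(self):
        self._count = 0

    def record_response(self, device_address):
        # The address is deliberately discarded; only the tally changes.
        del device_address
        self._count += 1

    def flush(self):
        """Return the count for the current window and reset it."""
        count, self._count = self._count, 0
        return count


counter = PresenceCounter()
for address in ["AA:BB:CC:01", "AA:BB:CC:02", "AA:BB:CC:01"]:
    counter.record_response(address)

print(counter.flush())  # 3
```

One consequence of this design is that a device responding twice is counted twice, because nothing is stored that could tell two responses apart.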

In addition to the community conversations, Catlett also said the project will get approval from an independent set of experts before adding any new technology to the sensors.

“It won’t just be scientists, but people from the community,” he said. “So maybe they don’t understand the computer science or the material science that goes into it, but maybe they can get a sense of it and understand if this is preserving privacy.”

Example of a possible use for air quality data. Source: Urban Center for Computation and Data.

As municipal sensors like the Array of Things spread – there are already smart lights and trashcans in other cities – Pancoast sees designers using it to inform how cities are actually built.

“I can anticipate that data about urban interaction will be considered another form of input, another form of material that really has to be understood and shaped and applied to the design processes,” he said. “We have to know a lot about steel and glass, and we have to know a lot about data and human interaction to make the environments and objects we think will be useful.”