This blog will be dedicated to examining and promoting civic data in Chicago, Cook County and Illinois.
WBEZ is partnering with the Smart Chicago Collaborative to promote civic data. This blog will be part of that collaboration.
We'll post original data sets at @OpenSocrata.
Last week a group from Chicago’s Urban Center for Computation and Data released a new tool to simplify the process of working with civic data. It was announced at the Code for America Summit.
WBEZ’s Chris Hagan recently spoke with former Chicago Chief Information Officer Brett Goldstein and discusses what he learned about the project.
Here’s the conversation we had on the Afternoon Shift about Plenario, along with my conversation with former Chicago CIO Brett Goldstein.
Many conversations around data are marked by a feeling that there is more information available than society can handle or properly analyze.
The same feeling goes for open government data. Repositories such as the city of Chicago data portal hold a large supply of information, but joining the different sets together to find relationships – especially across agencies and sources – can be time consuming and a barrier to entry for some.
This week a group headed by Brett Goldstein and Charlie Catlett at the University of Chicago Computation Institute’s Urban Center for Computation and Data and Argonne National Laboratory released Plenario, a tool they hope will help those interested in civic data access it more easily.
Plenario works on the idea of joining multiple sources through two lenses – time and geography. Choose a time and place, and Plenario will show you all the public data that’s available.
Plenario in action, courtesy Urban Center for Computation and Data
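To make the time-and-place idea concrete, here is a minimal sketch of how a query against a Plenario-style API might be built. The endpoint path and parameter names below are illustrative assumptions for this example, not the documented Plenario API – check the project’s own docs for the real interface.

```python
# Sketch of a time-and-place query against an open-data API like Plenario's.
# NOTE: the endpoint and parameter names here are assumptions for illustration.
from urllib.parse import urlencode

def build_query(base_url, start_date, end_date, lat, lon, radius_m):
    """Build a query URL filtering by a date range and a point-plus-radius."""
    params = {
        "obs_date__ge": start_date,  # observations on or after this date
        "obs_date__le": end_date,    # observations on or before this date
        # hypothetical geographic filter: lat, lon, radius in meters
        "location__near": f"{lat},{lon},{radius_m}",
    }
    return f"{base_url}?{urlencode(params)}"

# Example: everything near the Loop in the first half of 2014
url = build_query("http://plenar.io/v1/api/datasets",
                  "2014-01-01", "2014-06-30", 41.88, -87.63, 500)
print(url)
```

The point of the design is that a consumer supplies only a time window and a place; the service handles joining across the underlying datasets.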
Goldstein, the city of Chicago’s former and first Chief Data Officer, is scheduled to formally introduce the tool at the Code for America Summit in San Francisco Tuesday. WBEZ caught up with him on the phone before the announcement, while he was in an Uber on the way to the summit.
How did the idea for Plenario come together?
I was at the Aspen Institute for the open government session and we were talking about open data. And one of the frustrations I expressed was that it’s great that we’re opening this data and it’s great we’re being transparent, but I feared that all we were doing at that point was putting spreadsheets on the web. And I had an epiphany: what if we moved beyond the transparency play and beyond just spreadsheets of data on the web – how can we unify the data? From there, it became how can we build technology that will bring together all of this data to make it more useful.
How do you see people using this in the wild once they get used to the platform?
This is all about getting an architecture in place so people don’t have to worry about the underlying system. Right now you have open data from the city of Chicago, the state, the federal government, the county. You have all of this data all over the place and all of these application programming interfaces, which are different ways people connect with the data. What we’ve done here with Plenario is one-stop-shopping, where through a single API you get the total story of place. And this gets at all the constituencies, from researchers to the media to community activists, where applications can be built on top of this one API to get the total story. Beyond that is this idea that as we build tools and as we offer solutions, they’re portable, so what we build in New York will work in Chicago.
One of the stories I hear, and I think this goes back to the Aspen Institute, was that open data was too hard to use, and this was from the perspective of a journalist. I don’t want data to be hard to use. I want people to focus on the information. By bringing it together, making it easy and doing the hard work on the underbelly, I hope we’ve done this.
What has the reaction been so far? You’re out at the Code for America Summit in San Francisco, talking with a lot of open data people, what have you heard from people in the industry?
People are enormously excited by this. I’ve talked to researchers and they’re astounded by the idea that they don’t have to do enormous data work. Let’s take a case in point: weather data. If you talk to people, they understand NOAA has weather data, but historically it’s very hard to work with and join with other data sets. How do you bring these two things together so you can work on your meaningful research problem? We have solved that. When researchers hear that the data is ready to go, that they can focus on their hypothesis and testing their models, there’s a level of excitement that we haven’t seen before, coupled with this idea that wow, maybe all these different datasets aren’t that different. My focus and the focus of Charlie Catlett have been on spatial data. At the end of the day, all of these jurisdictions have data that is spatially enabled. When you accept that common value and you tie it all together, you realize that the problems aren’t all that different.
People are even more excited that we’ve open sourced it. Later today I’m announcing it at the Code for America Summit. It’s an alpha version, but there’s genuine excitement in the community that this is the direction we should go, and that we should all collaborate and move our research, our media efforts, our community efforts from data work to solution providing and information gathering.
What do you see as the relationship between Plenario and the municipalities themselves? Is this a replacement for the city data portal, or do they work together?
They are actually quite synergistic. Let’s take, for example, the city of Chicago. When they post spatially enabled data to the portal, Plenario can then consume all of it and the city doesn’t have to worry about it. Imagine a situation where a tool is built on top of Plenario that provides whatever analytics the city would like: all the city has to do is upload their data to the portal, submit it to Plenario and get the benefit of the tool. This is a layer that brings things together and also allows for the creation of common tools. This is something that offers enormous benefit to the city of Chicago, but also all the other cities. When you have municipalities with finite budgets, finite tools and finite resources, this is an opportunity for them to get more while at the same time being more transparent, something the city of Chicago has done very well.
Looking forward, what is your hope for how this grows and evolves?
The first thing we need to do is get past the alpha. We’re planning on scaling quite large. The initial things that worry me are how we get to that enormous scale. Architecturally we’re confident in our design, but much more than that, and where the true value comes, are some of the research problems we’ll be focused on. And this really gets into the smarter city realm. Back when I was the CIO and CDO of Chicago, I was very focused on predictive analytics and how we move into prevention instead of reaction. And you can clearly see that we can begin to develop those tools on top of Plenario which will enable that smarter precision. And the beauty behind it is that we really have the technology. Clearly there is a movement behind big data and analytics, and all these tools are waiting to be used, but we’ve been so bogged down by getting the data into a usable form. But beyond just usability is sustainability. And part of Plenario, which is key, is the constant updating of data, so we avoid those one-hit-wonder solutions and instead create a platform that can offer real-time decision making. So the next few years are enormously exciting.
Looking at the larger civic data community, what do you see as the opportunities for this kind of work over the next few years?
When I was CIO, there were certain apps we would not have been in the business of building. SweepAround.us by Scott Robbin, dealing with the street sweeper schedule, was a great example of the community stepping in to build something the city might not be able to do. This is a continuation of that, where we can get the civic community engaged and say: we brought all of this data together, help us build visualizations, help us build tools to make our city better and our residents more aware, because cities aren’t necessarily good at things like visualizations.
At the same time, this is a wonderful platform where you have companies in Chicago and throughout the US that are dealing with advanced analytics, and within these companies there are people who want to be civically engaged. When I was a kid, my parents sent me around the block with a can to collect for charity, and I thought that was really important. Now we have a cohort of super smart data scientists who want to know how to give back. We’re giving them the platform to work on those really hard problems that are key to our cities and key to smarter government and it’s waiting there for them to work on it.
One of the things I found enormously exciting when I joined Mayor Emanuel’s new administration as the first Chief Data Officer was that there was a commitment to data like we hadn’t seen before in Chicago. We like to talk about the city of Big Data. We pushed really hard for a couple of years, and it turned into an enormous open data program and advanced analytics. Now, with the engagement of the University of Chicago, companies like DataMade, and the Computation Institute, we have government engaged with academia and the private sector, which has really put Chicago at the front. It’s exciting to see what we’ve done, but it’s more exciting to see what we do in the months and years to come, so I’m really excited for Chicago in this regard.
This week the city of Chicago added new information on traffic violations to the data portal with the addition of location for speed and red light cameras, to go with violation data they posted last month.
For example, here’s a map of all the speed camera locations in Chicago with total and mean violation counts.
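Totals and means like those on the map can be computed directly from the portal’s violation data. Below is a minimal sketch of that aggregation; the column names ("CAMERA ID", "VIOLATIONS") and sample rows are assumptions about the export format, so adjust them to match the actual download.

```python
# Sketch: total and mean daily violations per speed camera.
# Column names and sample rows are assumptions, not the portal's exact schema.
from collections import defaultdict

def summarize(rows):
    """Return {camera_id: (total_violations, mean_violations_per_row)}."""
    counts = defaultdict(list)
    for row in rows:
        counts[row["CAMERA ID"]].append(int(row["VIOLATIONS"]))
    return {cam: (sum(v), sum(v) / len(v)) for cam, v in counts.items()}

# Invented sample rows standing in for csv.DictReader output
rows = [
    {"CAMERA ID": "CHI010", "VIOLATIONS": "42"},
    {"CAMERA ID": "CHI010", "VIOLATIONS": "58"},
    {"CAMERA ID": "CHI045", "VIOLATIONS": "12"},
]
print(summarize(rows))
```

With the real CSV from the portal, the same function works on `csv.DictReader` rows.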
This issue got a huge amount of attention recently with the Chicago Tribune’s investigation into red light camera tickets, which showed that many drivers were getting suspicious tickets during anomalous spikes.
The data on violations only goes back to July 1, so it’s not possible to go through the Tribune’s process with these numbers. Of course it is possible with the Tribune’s own data, which they’ve made available for download.
Cook County also released new data this week, covering foreclosures and mortgages, and three sets around tax codes.
The tax data sets include listings of tax codes, agencies and rates for the more than 1,800 individual taxing districts in Cook County (which is crazy on its own), with rate information going back to 2006. The housing data covers listings of foreclosures, mortgages and quitclaim deeds from 2013 to August 2014.
The new release is part of Cook County’s ongoing push to add more information to their data portal.
Over the past few years there has been more and more pressure on government to treat technology more like the private sector does. Cook County took a step in that direction in April when it named Simona Rollinson Chief Information Officer after 15 years with Follett Software. Her task: update an antiquated web of IT systems, something often easier said than done in county government. She joins us to discuss how she’s approaching the overhaul.
Want to know how many grenade launchers law enforcement agencies in your county have? There’s a database for that now.
This week NPR released a database of military surplus purchases by local law enforcement agencies around the country from the Pentagon’s Law Enforcement Support Office. Included are purchase orders for guns, ordnance robots and MRAPs (that would be mine-resistant, ambush-protected vehicles), among other things. There’s also construction equipment, portable generators and musical instruments.
Purchases are broken down by county, not agency, so it’s not clear whether a purchase went to the city of Chicago, Cook County or another municipality, but it does give an idea of how much equipment is available in the area for law enforcement. (Agency data is available for a few states, including Indiana but not Illinois, based on separate information requests the NPR team submitted.)
For example, there are 19 MRAPs in Illinois and Indiana, with three in Cook County alone. If you’re curious what exactly an MRAP is, here’s a video of a few rolling down Lake Shore Drive.
The database also includes what the military originally paid for the item, though not what the new owner paid to get it. When the Northwest Indiana Regional SWAT team got its MRAP, the agency paid $1, though the original price tag to the military was $412,000.
In Illinois, Lake and Cook counties have received the most value, ranking 20th and 21st nationally. Both sit behind Clark County, Indiana, though. The southeastern Indiana county has acquired nearly $14 million in military surplus equipment since 2006. That’s $127 worth for each of the county’s 110,100 residents.
Still, that doesn’t mean Clark County (and specifically the sheriff’s department, which received the majority of the equipment) is awash in high-tech weaponry. Weapons actually account for only 0.4 percent of the total value.
Instead, Clark County is stocking up on vehicles, construction and materials handling equipment, and tractors. Those four categories account for the majority of the military surplus value they’ve received since 2006.
If you’re curious and want to dig into the data yourself, the files are available on Google Docs, and NPR also has a GitHub repo explaining how to set up your own database.
This week Divvy, Chicago’s bikeshare, released its second set of data, covering the first half of 2014. The first release in February covered the service’s first six months, so now there’s a whole year’s worth of information to look through.
Since 2001, fewer and fewer Chicago students have attended their neighborhood elementary school. The increase in school choice – including charter, gifted and magnet schools – has meant that all schools are part of the choice system in CPS. Over that time, the percentage of CPS students who attended their neighborhood school dropped from 74 to 62 percent.
Yesterday we published a story on the trend, including a map charting the change around the city over the past decade. In addition to all that, you can also download all the data we used in putting the story together.
The spreadsheet has information since 2001 on nearly 600 schools – including those that have opened and closed in that time – on the number of children attending and counts on how many are from the school’s attendance boundaries and how many aren’t.
Specifically, here’s what’s included:
Name of School: Common name of the school
Address: Most recent address for this school. Some schools have changed location over this time period so this does not reflect its address for all years.
UNIT: CPS unit number
CPS School ID: School ID assigned by Chicago Public Schools
Total Attending: Total number of students enrolled in the school.
Residing Attending: Number of students attending the school who reside in the school’s attendance boundary. Schools without an attendance boundary will display 0.
Attending Not Residing: Number of students attending the school who do not reside in the school’s attendance boundary. For schools without an attendance boundary, this is equal to Total Attending.
Residing Not Attending: Number of CPS students who reside in a school’s attendance boundary but do not attend the school. Schools without an attendance boundary will display 0.
Total Residing: Total number of CPS students residing in the school’s attendance boundary.
Residing Attending/Total Attending: Percentage of the school’s students who reside in the attendance area.
Attending Not Residing/Total Attending: Percentage of the school’s students who do not reside in the attendance area.
Residing Attending/Total Residing: Percentage of CPS students in the school’s attendance area who attend that school.
Residing Not Attending/Total Residing: Percentage of CPS students in the school’s attendance area who do not attend that school.
Our story and map focused on the percentage of CPS students in an area choosing to go to their neighborhood school as a proxy for community buy-in to the school, but there are other questions you could look at with the data. The other side to the question we looked at is the percentage of students at a school who live in the area (Residing Attending/Total Attending).
The data also capture the opening and closing of schools, as well as the increase in charters and other schools without attendance boundaries. You can find those schools by filtering for schools that have a Total Attending number (meaning the school is open that year) but no Total Residing number (which means no attendance boundary).
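That filter is simple to express in code. The sketch below uses the column names listed above; the sample records themselves are invented for illustration.

```python
# Sketch of the filter described above: schools open in a given year
# (Total Attending present) but with no attendance boundary (no Total Residing).
# Field names match the spreadsheet columns; the sample records are invented.
def no_boundary_schools(records):
    return [r["Name of School"] for r in records
            if r.get("Total Attending") and not r.get("Total Residing")]

records = [
    {"Name of School": "Neighborhood Elementary",
     "Total Attending": 480, "Total Residing": 600},
    {"Name of School": "Example Charter",
     "Total Attending": 350, "Total Residing": None},
]
print(no_boundary_schools(records))  # prints ['Example Charter']
```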
We hope others interested in the topic will find questions and answers we hadn’t even thought of.
As of today the midterm elections are only three months away. Between now and November 4th, campaigns will be frantically raising and spending as much money as they can. While candidates do need to disclose their spending, this information isn’t that easy to find, and it can be even harder to analyze. That process got a little easier last week with the launch of ElectionMoney.org.
The site collects information on candidates, contributions, spending and more, and makes it free to download. It’s intended for people who are serious about finding out information about campaign finance, especially journalists, researchers and analysts.
Rayid Ghani has been one of the big names in the U.S. civic data movement. As chief scientist for Obama for America, he helped make analytics a popular topic for governments, non-profits and other groups that never realized the potential value their information held.
Now as the Chief Data Scientist for the University of Chicago’s Urban Center for Computation and Data, Ghani has continued to push and get data scientists interested in societal problems.
Last year he founded Data Science for Social Good, a fellowship through UChicago that matches students who have data skills with organizations looking for help solving problems with data.
In its second year, DSSG now has 48 fellows working with groups not only in Chicago but throughout the country and even an organization in Mexico. A second DSSG group has even started in Atlanta.
We visited DSSG offices recently and spoke with Ghani about what data science really means, what makes Chicago’s data community different and where he sees the field going.
One of the fellows mentioned to me that on the first day you put statements on the board and they had to agree or disagree, and one was “Data science is not a real field, it’s just statistics done by people with weird hair.” For you, how do you define data science and is it a relevant term or a buzzword?
There are going to be trendy buzzwords for every area that’s in demand. In my mind, data makes organizations and people rational, and whatever you call it, that way of rational decision making is not hype, it’s not a buzzword. It’s always been there, it’s just not as widespread because a lot of organizations were not collecting enough data. Now that they’re collecting data, of course they want to allocate their resources more efficiently, of course they want to do things better, and data is a really good way to start down that path.
The phrase data science may be a buzzword right now, but the people behind it aren’t new people just coming up and saying we do data science. It’s people who were using computational and analytical tools to do better decision making; now they’re calling it data science. The phrase may be hype, but there’s a lot of solid science behind it that’s not hype.
The Atlanta group is a great start. A single program in Chicago can only grow so big, so the only way to scale this is a lot of people doing this in a distributed way, but connected in a community where they can share. We had a Skype session with them. Some of the people who applied to our program, we asked if they wanted to go to their program, and some of the projects in both programs can be shuffled around. I see a larger setup happening where there are lots of these programs – not just summer programs but semester-long classes, informal research groups at universities, or meetup groups like Code for America trying to do similar things.
There is a core that is interested in these problems, a larger informal network that starts to come together where people have the same goals, they have different skills, so how do you share the overhead? That wasn’t initially the goal, but as we see interest from everybody else we want to make sure we can help them grow these programs and build this larger network.
The fellows have been out at a lot of different meetups and groups. What do you see as the fellowship’s role in the larger Chicago data community?
Chicago has an interesting data community. People who are in Chicago and interested in data are really interested in deeper, more tangible problems. They’re not as interested in building a web app to find you a date – those are the typical data uses. They’re more interested in problems that are deeper, tangible, rooted in lots of data.
One of the things we wanted to do was make that more known; the rest of the country doesn’t really realize it. Part of the goal of bringing these fellows here is that they’ll see the community that is starting in Chicago and choose this as a place to do their work. One of the reasons I’m running the program downtown and not in Hyde Park, where the university is, is that I want these fellows to interact with the local tech community, local nonprofits, local government, and go to meetup groups. We do happy hours every Friday. The idea is to get them exposed and mingling. Then they become part of this community.
As someone who has been involved in the civic data community for a while, looking at your project and all the things that are popping up, there seems to be a lot of attention around it. One, is it moving in the right direction, and two, is it moving fast enough to solve the problems you want to solve?
It’s never going to be fast enough. Problems will always grow faster than the solutions until you reach a certain tipping point. Right now what’s happening is there’s a lot of fascination with data, and not enough fascination with problems. That’s very natural in any new community; you get fascinated by the new shiny thing, which is data. The problems are old. We still have problems with education, problems in health care, problems with sustainability, problems with community development and crime, safety. The problems are not new, the data is the new thing, and I think initially people get attracted to new things, but when that initial hype settles down it comes down to solving the real problems that you have.
We’re still in the data fascination stage. We’re moving to the problem phase, where what’s really important right now is for people who have the problems to make sure that people with the skills to solve these problems know about them. If you’re not telling people about your problems, you can’t get mad at them for not solving them. If they’re building a web app that’s not very useful to you, maybe you should tell them what your problems are.
A lot of the time I spend with nonprofits and cities is trying to elicit their biggest problems. How do I communicate to not just the fellows here, but meetup groups and different people who have the skills to solve these problems – how do I translate these problems so they find them exciting and motivating and start working on them? Because without that guidance they’ll start working on problems that interest them, which may not be the most useful problems for society in general.
The enterprising people at DataMade just released Election Money, where you can download 170MB of bulk campaign finance records from the Illinois Board of Elections. My copy is still downloading, but I’m excited to start digging through this.