'Agile, DevOps, GitHub:' FCC's Big Data Chief on Improving Capabilities
Posted: 01/14/2013 12:00:00 AM EST
What advice would you have for other large federal agencies looking to improve their data management capabilities and ultimately to reduce costs? In this interview, Greg Elin of the Federal Communications Commission examines the role of a 'Chief Data Officer' and provides insight into managing large-scale data (or big data). Read on...
The Federal Communications Commission, or the FCC, is the first and only agency to include the role of Chief Data Officer, which is you. What advantages does this position give your agency?
I think the role of the Chief Data Officer is part steward and part data evangelist. There’s no question that data is growing incredibly year over year, and our resources are not necessarily growing. One of the aspects is to begin to think about data as an asset, and to think about better ways to manage more and more data with the same resources that you have. And that becomes a very creative process.
I think part of that growth of data is the realization by more people that data has value outside of the system or the process in which it was originally created. That you can actually draw data from different parts of the enterprise and the organization, and you can combine that data in novel ways to help you understand important trends and novel things about the world.
We have Chief Data Officers in all of our bureaus along with my role as Chief Data Officer for the organization, and so you have to have a team and a skill set that’s looking at data as an asset itself, beyond just a warehouse or beyond just economists. This is something that we have as an asset, and we need to groom it, make it more useful, and make it more consumable as an asset outside of these different systems.
I think one of the advantages that it gives the Federal Communications Commission is the ability to begin to focus on moving the data closer and closer to the subject matter experts, who may not understand or have familiarity with all of the new tools that are available to work with data, to do data visualization or to do different types of analysis or look at data in novel ways. What we’re beginning to see at the FCC are more and more parts of the organization following common good practices on how we collect the data, how we manage the data, and how we share the data.
It’s a little bit like herding cats, but as people begin to work with data the same way and as their data literacy increases, what happens is that when we have questions from a policy point of view, it’s much easier for us to get the data, and it’s much easier to tap different parts of the organization for analysis of that data.
Start by highlighting some recent successes of the FCC with regards to managing large-scale data, or big data. Why is it important to easily and quickly access this?
I think one of the successes that we’ve had at the FCC, both with smaller-scale data and large-scale data, is that we are moving to better and better ways of gathering that data from the organizations and industries that have to give it to us. We’re lowering the cost and the burden associated with the collecting of that data. And that has a huge advantage for all parties concerned. We’ve gone much more digital, we’re starting to create some interesting tools that make it easier for the parties that are giving us the data to provide the data, and so it lowers their expense.
When it comes to the success that the FCC has had with large-scale data, it’s really our mapping efforts that are leading the way here at the FCC, headed up by our geographic information officer Michael Burns. One of the projects that we did over the past few years is a national broadband map, which represented the first ever map of where broadband was in the United States, where it was not, and where the providers were. It was a large-scale collection of data in that we coordinated with each of the fifty states and several territories to collect this information as part of the Recovery Act. We put together a map that provided data, and that data has been a significant asset to the FCC as well as to many researchers in the industry.
A related project that we did is called “Measuring Broadband America,” in which we crowd-sourced the placement of devices in people’s homes: volunteers signed up, and the devices measured the performance of ISPs’ broadband to the network drop in the home. We’re taking multiple measurements several times a day. So not only was that the first kind of scientific measurement of day-to-day broadband performance that had been undertaken in the United States, it also provided an opportunity for government and industry to get together and ask some questions that had not been asked before, along the lines of, “Well, how do we actually define broadband performance, and how do we successfully measure that across different types of architectures?” In other words, prior to undertaking that project, industry did not have a common way to actually measure what they were selling to individuals in terms of actual performance. Everything was advertised rates. And so it’s that group coordination of how we define terms, of how we work out different issues, and putting everybody on the same page, so that data begins to have meaning for a whole community.
When it comes to large-scale data, I don’t think that the FCC is operating at the same big-data scale that some real-time collectors of data are, such as NASA or NOAA, or some groups that are working with information like the human genome. But what is happening is we are beginning to work with data that is large-scale to us, data that is, in terms of storage and in terms of number of rows, two or three orders of magnitude larger than the data sets that we have had before. And so some of our success is simply being able to work with that larger data set, and what that begins to give us is the ability to do analysis across an entire domain much more easily and faster, rather than looking only at representative samples of the domain and projecting the analysis across the entire domain. I think an excellent example of that would be the low power FM map that we recently published. The FCC is allowing more use of low power FM stations, and we’ve done a very interesting analysis where we’re looking at the entire country and finding all the different areas that could support low power FM, and making that available in the form of a map and in related data assets.
Part of your data philosophy is that if agencies are innovating with data, they should also be innovating their processes in order to make agencies more efficient. Expand on what you mean here.
Well, I think that agencies right now, and an agency like the Federal Communications Commission, which has been collecting data for many, many years before the web revolution, have created internal processes that were built around paper-based collections. So you don’t have to look much further than the name of the Paperwork Reduction Act to understand that a lot of the institutional habits for dealing with information kind of assume a paper-based or a form-oriented means of collecting information, when in fact a lot of information is available in digital form and it’s available to be collected passively. So one of the projects that we did here at the FCC was the accessibility clearinghouse, which we had to put together because of a mandate from Congress. For it, we had to gather a list of accessible communication devices, such as mobile phones, and we needed to have information about their accessibility features.
Ordinarily, agencies would create a rulemaking and a data collection and go through the PRA process so that we could gather this information from manufacturers. What we did instead is we worked with the Mobile Manufacturers Forum, who had already compiled much of this information for their members, and we worked with them so that they would produce a Creative Commons data feed of the information that they had already collected. Since they produced that data feed and it was Creative Commons, we were able to simply read that data feed and provide on our website the information about those mobile devices.
That’s a complete reworking of the ordinary process where we’re achieving our mandated goals of having the data and sharing that data, but we’re doing it in a way that is much less time-consuming and produces higher quality than going through a formal data collection process.
I think another lesson, especially about big data, has to do with how agencies look at their own performance. So here at the FCC we have a variety of economists and we have some individuals that are doing very sophisticated optimization studies when it comes to looking at how the spectrum could be used better and when it comes to looking at different options. So we do a lot of simulation, they develop algorithms that run and try out different things. It would be terrific to apply those types of data analysis techniques and data visualization techniques to how we do our internal budgeting process at the FCC, or how other agencies do their internal performance and budgeting processes. Right now we follow a process which has been defined by OMB, which was defined many years ago and is starting to but is not really taking advantage of some of the new tool sets and practices for working with large data.
What advice would you have for other large federal agencies looking to improve their data management capabilities and ultimately to reduce costs?
I’m going to first throw out a few buzzwords because I want people to go check these things out and learn about them. So I’m going to throw out the buzzwords: “Agile,” “DevOps,” “GitHub.” I think that there are some very interesting trends right now through virtualization of computers, where we’re seeing the idea that infrastructure can be treated as code and that infrastructure can be virtualized, whether it’s computers or other types of configurations, making our computing power as flexible as what we are now used to when working with Word documents.
I think similarly, those ideas can be applied to data. We can treat data as software. We don’t have to have our data locked inside of databases. We can have our data out in .csv files or .xml files. We can track all changes in the data using versioning techniques so that we have a rich history of how data has evolved over time, and we can make that data available in highly consumable ways.
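As an illustration of what tracking changes in data can look like, here is a minimal Python sketch that compares two versions of a plain text data file and reports which records were added, removed, or changed. It assumes the data is kept as CSV with a unique key column; the station records, column names, and the `diff_versions` helper are hypothetical and are not taken from any FCC data set or tool.

```python
import csv
import io


def load_rows(csv_text, key):
    """Parse CSV text into a dict mapping each row's key value to the row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_versions(old_text, new_text, key):
    """Report which record keys were added, removed, or changed between versions."""
    old = load_rows(old_text, key)
    new = load_rows(new_text, key)
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data: two successive versions of the same file.
V1 = "station_id,city,status\n1,Austin,pending\n2,Boise,licensed\n"
V2 = "station_id,city,status\n1,Austin,licensed\n3,Camden,pending\n"

if __name__ == "__main__":
    print(diff_versions(V1, V2, "station_id"))
```

In practice, keeping such files in a version control system like Git gives the same kind of history for free, since every committed version of the file can be compared with any other.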
I think that if you’re at an agency and you’re publishing data or you’re creating APIs, you have to think from the end user backwards. You have to think about, “How can I make this data really easy to consume?” So instead of giving people a dump of the database and documentation of the fields, you publish the actual scripts that allow people, with a couple of clicks, to load the data that you’re publishing into their database, or you publish tool kits that go along with your APIs. Because it’s not only external users: your internal users inside your agency will also be able to work faster and more productively if you don’t have individual after individual re-grooming the data and getting it ready for use.
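A loader script of the kind described here could be as small as the following Python sketch, which reads a published CSV file and loads it into a local SQLite database using only the standard library. The table name, the sample device records, and the `load_csv_into_sqlite` helper are assumptions made for illustration; this is not an actual FCC script, and a real one would likely declare proper column types rather than treating everything as text.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_text):
    """Create a table from the CSV header and insert every data row as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Hypothetical sample of a published data file.
SAMPLE = (
    "device,maker,feature\n"
    "Phone A,Acme,screen reader\n"
    "Phone B,Acme,hearing aid compatible\n"
)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    count = load_csv_into_sqlite(conn, "devices", SAMPLE)
    print(f"Loaded {count} rows")
```

Publishing a script like this next to the data file means each consumer, inside or outside the agency, starts from a working database instead of repeating the same clean-up work.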
Tell us a little bit about your history within the world of data management.
Well, I got started with data more than ten years ago, after working very much in communications and with the web. I became very fascinated with how different people would look at the same data in different ways, and how different parts of an organization would use data in different ways. The kind of databases that an organization maintained really defined the type of knowledge about the world that the organization had and the type of questions that the organization could ask. That brought me more and more into data and data modeling, and I became very interested in the relationship between user interface and data. Then over the last half dozen years I’ve gotten very involved with government data, first through the Sunlight Foundation, and over the past three years working inside government at the Federal Communications Commission.