The Data Liberation Project

What

The Data Liberation Project is an initiative to identify, obtain, reformat, clean, document, publish, and disseminate government datasets of public interest.

Why

Vast troves of government data are inaccessible to the people and communities who need them most. These datasets are inaccessible because they’ve never been made public, because they’re published in obscure formats, or because they’re published without the documentation necessary to properly interpret them.

How

Identify: Through its own research, as well as through consultations with journalists, community groups, scholars, government-data experts, and others, the Data Liberation Project aims to identify a large number of datasets worth pursuing.

Obtain: The Data Liberation Project plans to use a wide range of methods to obtain the datasets, including Freedom of Information Act (FOIA) requests, intervention in lawsuits, web scraping, and advanced document parsing. To improve public knowledge about government data systems, the Data Liberation Project also files FOIA requests for essential metadata, such as database schemas, record layouts, data dictionaries, user guides, and glossaries.

Reformat: Many datasets are delivered to journalists and the public in difficult-to-use formats. Some, for instance, follow arcane conventions or require proprietary software to open. The Data Liberation Project will convert these datasets into open formats and restructure them so that they can be more easily examined.

Clean: The Data Liberation Project will not alter the raw records it receives. But when the messiness of datasets inhibits their usefulness, the project will create secondary, “clean” versions of datasets that fix these problems.

Document: Datasets are meaningless without context, and practically useless without documentation. The Data Liberation Project will gather official documentation for each dataset into a central location. It will also fill observed gaps in the documentation through its own research, interviews, and analysis.

Disseminate: The Data Liberation Project will not expect reporters and other members of the public simply to stumble upon these datasets. Instead, it will reach out to the newsrooms and communities that stand to benefit most from the data. The project will host hands-on workshops, webinars, and other events to help others to understand and use the data.
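
As a rough illustration of the Reformat and Clean steps above, the sketch below parses a small fixed-width extract into rows, then builds a separate cleaned copy and writes it out as CSV. It is a minimal example under stated assumptions, not the project's actual tooling: the record layout, column names, sample records, and function names are all hypothetical.

```python
import csv
import io

# Hypothetical record layout: (column name, start offset, end offset).
LAYOUT = [("facility_id", 0, 6), ("state", 6, 8), ("inspected", 8, 18)]

# Hypothetical raw extract in a fixed-width format.
RAW = "A00012ny03/07/2023\nB00345CA11/21/2022\n"


def reformat(raw_text, layout=LAYOUT):
    """Slice fixed-width lines into dicts, leaving the values untouched."""
    rows = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append({name: line[start:end] for name, start, end in layout})
    return rows


def clean(rows):
    """Return a new list of cleaned rows; the input rows are not modified."""
    cleaned = []
    for row in rows:
        # Assumes dates arrive as MM/DD/YYYY; convert to ISO 8601.
        month, day, year = row["inspected"].strip().split("/")
        cleaned.append({
            "facility_id": row["facility_id"].strip(),
            "state": row["state"].strip().upper(),
            "inspected": f"{year}-{month.zfill(2)}-{day.zfill(2)}",
        })
    return cleaned


def to_csv(rows):
    """Serialize rows to CSV text, an open format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


raw_rows = reformat(RAW)      # faithful to the original records
clean_rows = clean(raw_rows)  # secondary, "clean" version
print(to_csv(clean_rows))
```

Keeping `raw_rows` and `clean_rows` as separate objects mirrors the commitment above: the raw records stay as received, and any fixes live only in the secondary version.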

When

The Data Liberation Project launched in September 2022.

Where

The Data Liberation Project operates online, with a focus on the United States. If you’d like to bring the project’s model to other countries or to a specific US state, get in touch.

Who

As of September 2024, the Data Liberation Project is run by MuckRock and Big Local News.

Past staff

Jeremy Singer-Vine, founder and former director. Jeremy launched the Data Liberation Project in September 2022 and ran it until September 2024, when he handed it over to the current leadership. Before that, from 2014 until early 2022, Jeremy served as the founding data editor for BuzzFeed News, where he championed the publication of open, reproducible data analyses, and contributed to a range of award-winning investigations. Previously, he worked at The Wall Street Journal, where he was named a Pulitzer Prize co-finalist for National Reporting. He also publishes Data Is Plural, a weekly newsletter of useful/curious datasets. He builds and maintains open-source software, including pdfplumber, a tool for liberating data from PDFs.
Mackenzie Peluso, from The New School, interned with the Data Liberation Project for the fall of 2023.
Jake Zucker, from the University of Chicago, interned with the Data Liberation Project for the summer of 2023.
Anna Wilder, from the University of Florida, interned with the Data Liberation Project through the Brechner Center for Freedom of Information from January to April 2023.

Volunteers

The Data Liberation Project is grateful for the contributions of the following volunteers and FOIA-collaborators (in alphabetical order): Sayli Benadikar, Coline Bodenreider, Arianna Brisco, Kevin Bryson, Gustav Cappaert, Kat Chua, Marc DaCosta, Marco Dalla Stella, Adrienne Damicis, Madeline Everett, Corin Faife, Todd Feathers, Rebecka Flynn, Mike Gartner, Alec Gitelman, Aaron Gordon, Forest Gregg, Rohan Jinturkar, Emily Keller-O’Donnell, Julia Kieserman, Betsy Ladyzhets, Molly Longman, David Mamaril Horowitz, Evelyn Martin, Asako Mikami, Kate Miller, Kim Nguyen, Saul Pwanson, Rob Reid, Joerg Rings, Joseph Andrew Risi, Jake J. Smith, Molly Taylor, Victoria Tran, Urvashi Uberoy, Maddy Varner, and Iris Zhong.

If you’d like to be involved, read more here and get in touch.

Legal assistance

The project is also grateful to have received pro bono legal assistance from the Cornell Law School First Amendment Clinic.
