What

The Data Liberation Project is an initiative to identify, obtain, reformat, clean, document, publish, and disseminate government datasets of public interest.

Why

Vast troves of government data are inaccessible to the people and communities who need them most. These datasets are inaccessible because they’ve never been made public, because they’re published in obscure formats, or because they’re published without the documentation necessary to properly interpret them.

How

Identify: Through its own research, as well as through consultations with journalists, community groups, scholars, government-data experts, and others, the Data Liberation Project aims to identify a large number of datasets worth pursuing.

Obtain: The Data Liberation Project plans to use a wide range of methods to obtain the datasets, including Freedom of Information Act (FOIA) requests, interventions in lawsuits, web scraping, and advanced document parsing. To improve public knowledge about government data systems, the Data Liberation Project also files FOIA requests for essential metadata, such as database schemas, record layouts, data dictionaries, user guides, and glossaries.

Reformat: Many datasets are delivered to journalists and the public in difficult-to-use formats. Some may follow arcane conventions or require proprietary software to access, for instance. The Data Liberation Project will convert these datasets into open formats, and restructure them so that they can be more easily examined.

Clean: The Data Liberation Project will not alter the raw records it receives. But when the messiness of datasets inhibits their usefulness, the project will create secondary, “clean” versions of datasets that fix these problems.

Document: Datasets are meaningless without context, and practically useless without documentation. The Data Liberation Project will gather official documentation for each dataset into a central location. It will also fill observed gaps in the documentation through its own research, interviews, and analysis.

Disseminate: The Data Liberation Project will not expect reporters and other members of the public simply to stumble upon these datasets. Instead, it will reach out to the newsrooms and communities that stand to benefit most from the data. The project will host hands-on workshops, webinars, and other events to help others understand and use the data.
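
As a rough illustration of the Reformat and Clean steps above, here is a minimal Python sketch. It is not the project's actual tooling. The pipe-delimited input, the column names, and the function names (reformat_pipe_delimited, clean_rows, to_csv) are all hypothetical, chosen only to show the idea of converting a raw delivery into an open format while keeping the raw records unaltered and producing a separate "clean" version.

```python
import csv
import io

# Hypothetical raw delivery: pipe-delimited text with stray whitespace.
RAW = "FACILITY |STATE\n Acme Plant |ny \n"


def reformat_pipe_delimited(raw_text):
    """Parse pipe-delimited text into row dicts, leaving every value as received."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter="|")
    return [dict(row) for row in reader]


def clean_rows(rows):
    """Return NEW rows with whitespace trimmed; the input rows are not modified."""
    return [
        {key.strip(): value.strip() for key, value in row.items()}
        for row in rows
    ]


def to_csv(rows):
    """Serialize row dicts as CSV text, an open and widely readable format."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


raw_rows = reformat_pipe_delimited(RAW)  # faithful copy of the raw records
clean = clean_rows(raw_rows)             # secondary, "clean" version

print(to_csv(raw_rows))
print(to_csv(clean))
```

Keeping raw_rows and clean as separate objects mirrors the distinction drawn above: the raw records stay exactly as received, and any fixes live only in the secondary version.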

When

The Data Liberation Project launched in September 2022.

Where

The Data Liberation Project operates online, with a focus on the United States. If you’d like to bring the project’s model to other countries or to a specific US state, get in touch.

Who

Staff

Past staff

Volunteers

The Data Liberation Project is grateful for the contributions of the following volunteers and FOIA-collaborators (in alphabetical order): Sayli Benadikar, Coline Bodenreider, Arianna Brisco, Gustav Cappaert, Kat Chua, Marc DaCosta, Adrienne Damicis, Madeline Everett, Corin Faife, Rebecka Flynn, Aaron Gordon, Forest Gregg, Rohan Jinturkar, Emily Keller-O’Donnell, Julia Kieserman, Betsy Ladyzhets, Molly Longman, David Mamaril Horowitz, Evelyn Martin, Asako Mikami, Kate Miller, Kim Nguyen, Saul Pwanson, Rob Reid, Joerg Rings, Joseph Andrew Risi, Urvashi Uberoy, and Maddy Varner.

If you’d like to be involved, read more here and get in touch.

The project is also grateful to have received pro bono legal assistance from the Cornell Law School First Amendment Clinic.