Code for Charlottesville Teams Up with Civil Rights Advocates
Author: Jonathan Kropko is an Assistant Professor at the School of Data Science. His research interests include civic technology, remote environmental sensing, survival and time series analysis, and missing data imputation. He also leads Code for Charlottesville, the local chapter of Code for America that invites the community to volunteer on important issues.
In the U.S., it is unconstitutional for someone to be tried multiple times for the same crime. So why, then, are people with criminal records punished again and again for past convictions, and even for past charges that did not result in conviction?
Any time an individual charged with a crime appears in a district or circuit court, the charge creates a criminal record that the general public can find. In Virginia, these records can be accessed online in a matter of seconds, which makes criminal background checks routine in decisions about employment, housing, banking, and other basic services. Schiavo (1969, p. 540) calls this practice “multiple social jeopardy”: although a defendant cannot constitutionally stand trial more than once for the same charge, a person with a criminal record is punished by society over and over again through the withholding of basic services and opportunities. The result is a permanent underclass of people who are denied the resources and pathways they need to rebuild their livelihoods.
A growing movement, led by legal aid societies such as the Legal Aid Justice Center in Charlottesville, Virginia, and nonprofit organizations such as Nolef Turns, advocates for these criminal records to be destroyed (through a process called criminal record expungement) or hidden from public view (what’s known as record sealing). Both expungement and record sealing have been shown to reduce recidivism, which is ostensibly a central goal of the justice and corrections systems.
Prior to 2021, only dismissals and cases of mistaken identity were eligible for criminal record sealing in Virginia. Even then, a qualifying individual had to complete a lengthy and costly petition process. Virginia enacted a law in 2021 that for the first time provided for automatic sealing of criminal records and extended eligibility for sealing to certain low-level convictions, such as possession of marijuana. The law goes into effect in 2025.
While the law represents real progress, it also comes with many restrictions and caveats: an individual can have no more than two records sealed over their lifetime; they must have no arrests or charges in the past three years; they must have no prior convictions; they must wait seven years with no additional convictions in order for the record to be sealed; and more.
All of which raises the question: how many people will actually qualify to have their records sealed once the law takes effect? The answer would help advocates decide where and how to focus their lobbying efforts, so that the new law applies to as many people as possible whose records deserve to be expunged or sealed.
Code for Charlottesville, a volunteer group of tech professionals and students that I lead, worked with the Legal Aid Justice Center and Nolef Turns to apply the tools of public interest technology to help answer this question.
Our task was simple, but not easy: collect all public records from the Virginia district and circuit criminal courts between 2009 and 2020; anonymize the records; and then count the number of records that would qualify for automatic sealing or petition sealing.
For any public interest technology project, it’s important to ask what data is available, how it was collected, and whether there are any privacy concerns.
We used bulk data scraped from the web by Ben Schoenfeld, a computer engineer and civic tech enthusiast. While the current Online Case Information System 2.0 bans web scraping, Ben collected the data from version 1.0 of the system, which had no such restriction, and replaced individual defendants’ names and dates of birth with an anonymized numeric ID. This allowed us to use the entirety of a defendant’s record without knowing the defendant’s identity. Because the data was anonymized, we were confident that the solutions we built would not cause further harm to the people in the database.
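To illustrate the anonymization step, here is a minimal Python sketch, assuming each raw record carries a name and a date of birth that should be replaced by a numeric ID. The article does not describe how the IDs were actually generated; the `RawRecord` and `AnonRecord` types, the `anonymize` function, and the name normalization are all hypothetical. Exact matching on name plus date of birth is itself an assumption with known failure modes: it can merge two different people who share both, or split one person whose name is spelled inconsistently across courts.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class RawRecord:
    name: str
    dob: str     # date of birth as text, e.g. "1980-04-12"
    charge: str
    court: str


@dataclass(frozen=True)
class AnonRecord:
    person_id: int  # numeric ID that stands in for name and date of birth
    charge: str
    court: str


def anonymize(records: list[RawRecord]) -> list[AnonRecord]:
    """Replace each (name, DOB) pair with a sequential numeric ID.

    Records with the same normalized name and DOB receive the same ID, so a
    defendant's full history can still be grouped. The name-to-ID table is
    local to this call and is discarded when the function returns.
    """
    ids: dict[tuple[str, str], int] = {}
    next_id = count(1)
    anonymized = []
    for r in records:
        # Hypothetical normalization: lowercase and collapse whitespace.
        key = (" ".join(r.name.lower().split()), r.dob)
        if key not in ids:
            ids[key] = next(next_id)
        anonymized.append(AnonRecord(ids[key], r.charge, r.court))
    return anonymized
```

Because the output type simply has no name or date-of-birth fields, downstream code cannot accidentally read them.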
In total, the data contains more than 9 million individual court records covering more than 3 million different defendants. Code for Charlottesville volunteers built a decision-tree-based classifier that translates all of the restrictions in the law into logical conditions a computer can evaluate quickly. The classifier takes in all of a person’s court records and outputs a list identifying which records would qualify for automatic sealing, which would be eligible for sealing by petition, and which would be ineligible for sealing.
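The shape of such a classifier can be sketched as hand-written decision rules applied to one person's records. This is a simplified illustration, not Code for Charlottesville's code or a reading of the statute. It encodes only the restrictions listed earlier: a three-year lookback for new charges, no prior convictions, a seven-year conviction-free wait, and a lifetime cap of two sealed records. The `offense_class` field, skipping the conviction checks for dismissals, applying the cap to automatic sealing only, and the fallback to petition sealing are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Outcome(Enum):
    AUTOMATIC = "automatic"
    PETITION = "petition"
    INELIGIBLE = "ineligible"


@dataclass(frozen=True)
class CourtRecord:
    charge_date: date
    disposition_date: date
    convicted: bool
    offense_class: str  # hypothetical offense category: "auto", "petition" or "none"


LOOKBACK_YEARS = 3  # no arrests or charges within this many years
WAIT_YEARS = 7      # conviction-free wait before a conviction can be sealed
MAX_SEALED = 2      # lifetime cap (this sketch applies it to automatic sealing only)


def years_between(start: date, end: date) -> int:
    """Whole calendar years elapsed from start to end."""
    return end.year - start.year - ((end.month, end.day) < (start.month, start.day))


def classify(records: list[CourtRecord], as_of: date) -> list[Outcome]:
    """Label each of one person's records; results follow the input order."""
    recent_charge = any(years_between(r.charge_date, as_of) < LOOKBACK_YEARS for r in records)
    outcomes = [Outcome.INELIGIBLE] * len(records)
    sealed = 0
    # Visit records oldest first so the lifetime cap is spent on the earliest ones.
    for i in sorted(range(len(records)), key=lambda k: records[k].disposition_date):
        r = records[i]
        if r.offense_class == "none":
            continue  # offense not covered by the law: stays INELIGIBLE
        auto_ok = r.offense_class == "auto" and not recent_charge and sealed < MAX_SEALED
        if auto_ok and r.convicted:
            waited = years_between(r.disposition_date, as_of) >= WAIT_YEARS
            # Another conviction before this one, or during its wait period, blocks it.
            blocked = any(
                j != i and o.convicted and (
                    o.disposition_date < r.disposition_date
                    or years_between(r.disposition_date, o.disposition_date) < WAIT_YEARS
                )
                for j, o in enumerate(records)
            )
            auto_ok = waited and not blocked
        if auto_ok:
            outcomes[i] = Outcome.AUTOMATIC
            sealed += 1
        else:
            # Simplifying assumption: a covered offense that misses the automatic
            # criteria is counted as eligible for sealing by petition.
            outcomes[i] = Outcome.PETITION
    return outcomes


if __name__ == "__main__":
    history = [
        CourtRecord(date(2012, 1, 5), date(2012, 3, 1), True, "auto"),
        CourtRecord(date(2015, 6, 1), date(2015, 8, 1), False, "auto"),
        CourtRecord(date(2016, 2, 1), date(2016, 4, 1), False, "auto"),
        CourtRecord(date(2014, 1, 1), date(2014, 2, 1), False, "none"),
    ]
    print([o.value for o in classify(history, as_of=date(2025, 7, 1))])
    # prints ['automatic', 'automatic', 'petition', 'ineligible']
```

Under these sketch rules, the old conviction and the first dismissal are sealed automatically, the second dismissal is routed to petition because the two-record cap is already used, and the uncovered offense stays ineligible. Running the function once per anonymized person ID and tallying the labels gives the kind of totals reported below.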
According to our findings, more than 1.4 million records from 2009 to 2020 will immediately qualify for automatic record sealing once the law is implemented in 2025. More than 1 million additional records will become eligible if the individuals with these records avoid any convictions for the remainder of a waiting period. And 3 million more cases will, immediately or after a waiting period, be eligible for sealing by petition.
We used our model to calculate how many more people would be eligible for record sealing if specific restrictions were loosened or removed. We even broke these counts down to the level of the Virginia House of Delegates or Senate district so that the Legal Aid Justice Center could show a delegate or senator the results for their district, making the impact directly visible to the decision makers.
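Estimating the effect of loosening a restriction amounts to evaluating the eligibility rule twice, once as written and once relaxed, and differencing the counts per district. The sketch below uses a deliberately reduced toy rule (only the charge lookback and the prior-conviction limit) and assumes each record already carries a district label; the article does not say how records were mapped to House of Delegates or Senate districts. `Case`, `eligible`, `count_by_district`, `gain_by_district`, and the district codes are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    person_id: int                # anonymized defendant ID
    district: str                 # hypothetical legislative district label
    years_since_last_charge: int  # for this person, as of the evaluation date
    prior_convictions: int


def eligible(case: Case, lookback_years: int, max_priors: int) -> bool:
    """Toy rule: no charge inside the lookback window and few enough prior convictions."""
    return (case.years_since_last_charge >= lookback_years
            and case.prior_convictions <= max_priors)


def count_by_district(cases: list[Case], **rule) -> Counter:
    """Number of eligible records in each district under one rule setting."""
    return Counter(c.district for c in cases if eligible(c, **rule))


def gain_by_district(cases: list[Case], baseline: dict, relaxed: dict) -> dict[str, int]:
    """Extra eligible records per district when the rule is relaxed."""
    before = count_by_district(cases, **baseline)
    after = count_by_district(cases, **relaxed)
    # Counter returns 0 for missing keys, so districts absent on one side are safe.
    return {d: after[d] - before[d] for d in sorted(before.keys() | after.keys())}


if __name__ == "__main__":
    cases = [
        Case(1, "HD-57", 5, 0),
        Case(2, "HD-57", 2, 0),
        Case(3, "HD-10", 4, 1),
        Case(4, "HD-10", 1, 0),
    ]
    baseline = {"lookback_years": 3, "max_priors": 0}  # restrictions as listed for the 2021 law
    relaxed = {"lookback_years": 2, "max_priors": 1}   # hypothetical loosened version
    print(gain_by_district(cases, baseline, relaxed))
    # prints {'HD-10': 1, 'HD-57': 1}
```

Keeping the rule parameters as plain keyword arguments means each proposed amendment is just another dictionary, so many "what if" scenarios can be compared side by side per district.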
The Legal Aid Justice Center (LAJC) used our results in discussions with the Virginia House and Senate to advocate for specific changes to the 2021 law that would extend record sealing to even more people. This project demonstrates how public interest technology, even when the team is small, can provide right-sized tech tools that support democracy and advance justice.