The Legacies We Create with Data and AI
Data Science & AI
Renée Cummings is a Professor of Practice at the University of Virginia focused on the responsible, ethical, and equitable use of data and artificial intelligence (AI) technology. She has spoken to groups around the country and the world, from local school districts to the EU Parliament, about the risks and rewards of AI.
She recently sat down with PITUNiverse Editor Kip Dooley to reflect on a big year and share the frameworks that her audiences have found most helpful for understanding data science and AI.
Kip Dooley: Renée, you’ve had a busy year of speaking engagements with everyone from local school districts to the World Economic Forum. What topics and areas of expertise have you been counseling people about?
Renée Cummings: So much is happening around AI and data science right now, and so quickly. The tools are rolling out with such frequency and ferocity that I have been called upon to discuss everything from generative AI to oversight, compliance, regulation, governance, and enforcement.
I always emphasize thinking about the “three R’s”: the risks, the rewards, and of course, the rights, whether talking about how we integrate AI into education systems, social systems, business — you name it.
In terms of your approach to data science and your professional journey, what do you think has prepared you to offer helpful advice at this moment?
I bring what I call an interdisciplinary imagination to data science. My work is not only in criminal justice, but also in psychology, trauma studies, disability rights, therapeutic jurisprudence, risk management, crisis communication, media, and more. My work is about future-proofing justice, fairness, and equity as we reimagine society, systems, and social relationships with AI. My work is also about public education, building awareness around the impact of data on society and democratizing data so we understand the power of data and how to use that power responsibly and wisely and in the interest of the public good as we design responsible AI.
I focus not only on doing good data science, but also on leveraging data science in ways that are equitable and ethical, using data science to build equitable and enduring legacies of success. How can we use this technology to ensure that groups and communities thrive? The goal is to build more equity into our systems, and in the ways in which we design data-driven solutions. It’s really about using data science to build more sustainable and resilient legacies.
“Legacy” is an interesting choice of word when talking about AI and data science. Tell me more about why you use that word in particular.
When we design an algorithm, we have the opportunity to use these tools of measurement as a means to enhance access, opportunity, and resources for communities — now, and for generations to come. Data is like DNA. How it’s collected and used determines the kinds of algorithms we can design, which are more and more determining access to resources and opportunities.
Unfortunately, what we’ve seen for the most part are algorithms that deny legacies, that deny access to resources and opportunities for particular communities, because of issues like the lack of diversity in technology. What we find is that bias, discrimination, and systemic racism are amplified, and certain communities don’t get equal access to resources. What data does is shift power, particularly at the level of systems. Data is about power. Data doesn’t have the luxury of historical amnesia.
What are some examples that illustrate this idea, that data is about power?
We can start with the mere fact that most of the world’s data is owned by five companies. Those companies have created thriving legacies, billion-dollar legacies. They are the ones that governments need to negotiate with over tech regulation, compliance, and enforcement. Furthermore, they set the agenda for what we talk about. We’re all talking about generative AI now, and how it could change — or already is changing — the game, from industry to education. It’s all about power.
Looking at the mad rush to acquire data, from a criminal justice perspective, we’re starting to consider data brokers as cartels, traffickers, smugglers. Think of how companies scrape all kinds of data from the internet to feed large language models. This is creating new systems of power, placing a small group of individuals and companies at the helm of decision making around who has access to resources.
You’ve been studying these systems for a long time. How have the questions shifted with the sudden onset of generative AI and the explosion of generative AI applications?
Generative AI is just a tool, and it can do us some good. But who has access to it? I just had a speaking engagement at a university in Kentucky where many students do not have internet access at home. So when we’re deploying technologies like generative AI, or we’re building smart cities, we’re only focusing on certain geographical spaces that have access to them. Is it going to widen the digital divide? The conversation happening in the U.S. and Europe about how to legislate generative AI is not engaging the Global South.
We also have to ask whether or not it’s just a lot of hype, because we see the many contractual issues with adopting generative technologies into corporations, or the federal government, because of intellectual property rights. The [Federal Trade Commission] recently was very direct and instructive, talking about generative AI and deception and disinformation.
I think that primarily it has amplified the questions we have been asking for a very long time rather than created new ones, although it does pose a new threat. And there’s just so much conversation, so much to keep track of, that a lot of people are overwhelmed at the moment.
For the people who are overwhelmed, what are some things you try to help them reorient toward in order to make the problems and the questions feel a little bit more manageable or at least digestible?
I often say that AI is a new language, and it’s important to become literate in that language. There’s a certain level of digital proficiency we’ll need to be able to function in society as these technologies continue to spread.
It’s also important to understand that this is not a new technology; it’s nearly 67 years old. Machine learning, deep learning, and neural networks have advanced within the past 10 years and have brought forth very successful iterations, but it’s not a new concept.
These tools can assist you and bring an extraordinary amount of effectiveness or even excellence in the way you do your work. But there are challenges: accountability, transparency, explainability. We’re not able to truly audit these technologies. We’ve got to enter into this space with a certain amount of sobriety instead of being totally overwhelmed.
I often tell people to just breathe and to play with the tools. Use curiosity. Be curious enough to know about the technology and how to use it, with the knowledge that it’s changing the world around you. How is technology changing your world? This is the backdrop we can use to discuss the need for more regulation and governance in the tech space more broadly.
In this environment, where it feels like the public has little or no say in how these technologies are designed and governed, what are the areas or levers that you see as promising areas of intervention?
It’s important to remember that we have a very solid community of AI ethicists and activists working in this space who have the capability and competency to design rigorous and robust guardrails. But a lot of people, the public, don’t understand that AI ethics, tech ethics, and data ethics even exist and that we all have rights in this digital space. Many of the technologies being developed and deployed impact our civil rights, our civil liberties, our human rights.
When we bring rights to the fore, people wake up. When people understand there’s a technology making decisions about them, and they don’t have an opportunity to participate in those decisions, they start to think about what they can do and what they need to do. They start to think about the lack of agency and autonomy. We all have a right to privacy, to self-actualization, self-expression, self-determination. We also have the right to equal opportunities. These are hard-won rights that people usually are not so willing to give up.
Again, that concept of the “three R’s” — risks, rewards, and rights — can bring us back to these key questions.
To what extent can algorithms create equity? Where do you see positive gains that have been made, or possibilities, for algorithmic systems to protect rights and create equity?
One area is government corruption and procurement. Through algorithms, you can account for every dollar and track fraud and corruption through government systems. Every dollar that is stolen through corruption is a dollar that taxpayers don’t have access to, that children don’t have access to.
Algorithms can help us visualize data around human trafficking and migration, and crisis intervention in times of war and national emergencies. We’ve seen really solid work being done around natural disasters like hurricanes and volcanic eruptions. There’s been some research looking at the effects of earthquake aftershocks in Haiti — encoding buildings and visualizing where and how destruction could take place.
One other area I can point to, given my background in criminology, is organizations like the Innocence Project looking at ways to deploy algorithms to find cases where there could be wrongful convictions, or records that should be expunged. At UVA, through my data activism residency, we’re developing a tool called the Digital Force Index, which will help people see how much surveillance technology is being deployed in their communities.
Unfortunately, in policing and the criminal legal/criminal justice system, tools like predictive policing have really not delivered on their promises. We hope that tools like the Digital Force Index will spur a more informed, community-led conversation around police budgets, the right to know how much is spent on surveillance tools, where in communities they are being deployed, and whether these tools are truly enhancing public safety or are simply vanity projects. The Digital Force Index is the heart of public interest technology.
What questions or best practices would you like to see technology designers take on as part of their responsibilities?
What always wakes my students up is that concept of legacy: What is the legacy you are designing and deploying? That brings them back to their social responsibility. As data scientists, whether we’re working on services, systems, or products on behalf of the collective, we are designing futures. What is your legacy? What is the legacy of your family, your community, your generation?
They start to think about questions around diversity and inclusion, equity, trauma, and justice. How are we traumatizing certain groups with technology? How can we bring a trauma-informed and justice-oriented approach to the ways we’re using data? We have to understand that different communities experience data differently. We don’t want to do data science in ways that will replicate past pain and trauma.
Data carries memory, a troublesome inheritance for particular communities. Those painful past decisions are trapped in the memory of the data, opening some deep social wounds as we attempt to use data to resolve very pressing social challenges and social questions. If we use historical data sets to build tools like large language models, which have been developed with toxic data scraped off the internet, what we risk doing is retraumatizing, revictimizing, groups that have tried so hard to find ways to heal. I’m always trying to get students to ask how we can use data to help communities heal, thrive, and build resilient and sustainable legacies.