An Interdisciplinary Approach to AI Ethics Training

Data Science & AI

May, 2023

Sina Fazelpour

Sina Fazelpour is an assistant professor of philosophy and computer science at Northeastern University. His research centers on questions concerning values in complex sociotechnical systems that underpin our institutional decision making. He is a core member of the Institute for Experiential AI and co-founded the Intelligence, Data, Ethics and Society (IDEAS) summer institute for undergraduate students.

Sina recently sat down with PITUNiverse Editor Kip Dooley to share progress on the IDEAS summer institute, where undergraduate students learn from world experts on data science, ethics, computer science, philosophy and law about responsible development of data science and AI. The IDEAS institute is supported in its second year in part through a PIT-UN Challenge grant.

Kip Dooley: Sina, you’re about to run the second cohort of an interdisciplinary summer institute on AI. How did the IDEAS institute come about? 

Sina Fazelpour: The motivations were twofold. First, I have both a technical background in engineering and a philosophical background in the values of technology. AI is a sweet spot for me as a practitioner and educator because AI systems very clearly create both benefits and burdens, whether in the context of allocating medical resources or hiring or some other domain. It is always going to be a complicated issue. Technologists working on AI need to be able to ensure that these systems simultaneously work in ways that respect privacy, lead to just and fair outcomes, and are robust in their performance. This is a very complex task, and we really don’t yet have good models for how to do it well. 

One of the key things missing from the puzzle is an interdisciplinary perspective. We cannot approach these problems from solely a technical perspective, nor solely a humanistic or philosophical perspective. A technical assessment without ethical considerations is insufficient, and you really can’t assess these systems well ethically without knowing at least some of the technical details. Interdisciplinarity is a key skill we need to cultivate for public interest technologists, but our institutions, generally speaking, are behind on this. 

When engineering students take ethics, it’s usually focused on what not to do.

Most undergraduates interested in technology don’t receive the type of instruction that will prepare them to approach issues from an interdisciplinary perspective. Engineering students have to take an ethics course, but it’s usually focused on how you, as a professional engineer, can avoid breaking rules. They focus on what not to do. They don’t teach you what you ought to do in your practice as an engineer. What values should you consider when designing a product? What ethical processes should you embed in the entire process? We don’t train people how to do this, and that’s extremely problematic.

As a result, when we try to convene interdisciplinary teams (in academia or in industry), people often lack a shared language to even talk to each other. And perhaps even more fundamentally, they don’t know when they have to talk to each other. Engineers might come to a product launch thinking they are all done, only to find that some kind of ethicist or regulator is telling them how the product can or cannot be used. The engineers haven’t considered that throughout the design and development, they have made choices — their own choices! — that are permeated with certain values and ethical assumptions.

So the first motivation for the IDEAS institute was to make sure that we introduced this type of interdisciplinary way of thinking about values and technology at an earlier stage of development for our students, so that interdisciplinary thinking and dialogue is second nature for them by the time they graduate.

The second motivation was about broadening participation in the field of AI and technology development more generally. We know there are significant issues of underrepresentation of different groups, both in scientific disciplines and in the humanities. Both fields need to become more inclusive, and the environments more welcoming to different identities, value sets, and experiences. 

Why? Well, if you pay attention to the headlines, you’ll know that the harms of technology are not equally distributed. They disproportionately fall on members of historically disadvantaged groups. We want to make sure that people who are particularly affected by emerging technologies are among those making the decisions about how they are developed, deployed, and governed. This could mean making technical decisions, making philosophical decisions, legal decisions, regulatory decisions — technology touches every aspect of society, which is what public interest technology is trying to grapple with. We want to enrich the decision-making pipeline.

The IDEAS Summer Institute will take place in two locations this summer: Northeastern and UC San Diego.

For sourcing the guest speakers and creating this interdisciplinary program, did you already have connections with people in different disciplines? How did you bring together people from such a range of disciplines?

Coming from a very interdisciplinary background really helps. In my Ph.D. program at the University of British Columbia, I was in the Philosophy Department, but I was working with neuroscientists and computer scientists. My postdoc at Carnegie Mellon was in philosophy, but I had a secondary appointment in machine learning. So those relationships proved very helpful both in terms of guest speakers and in shaping the program. 

But to be honest, in the first year when funding was scarce, I just invited a bunch of my computer science and philosophy friends to come stay at my place for the week. It was really thanks to the generosity of my friends, who were willing to spend their own money to travel here and stay with me. 

 

We all need a little help from our friends. … How will the program be different this year? What do you hope to build on from the pilot?

On the final day last year, the students were so excited to take what they’d learned and write a paper, make a video for social media, or design a product. I thought, “OK, the program needs to be two weeks.” The first week will provide the necessary technical background, as well as the philosophical background on fairness, justice, and privacy; in the second week, they can work on group projects and presentations.

The Network Challenge funding will allow us to do two full weeks. It will be more impactful in terms of training, because the students will actually get to do something with the theoretical background.

We’ll also look to enrich the mentorship piece this year. Last year, we just had guest faculty; this year we’ll also have graduate students who will serve as mentors. Throughout the two weeks, the students will have time to talk to their mentors about their projects and also ask questions about what life looks like in academia or industry. They’ll have the opportunity to build networks. 

We’ll also be inviting faculty from other PIT-UN schools, particularly ones that don’t have programs like this. Here at Northeastern, we have one of the highest densities of people working on the ethics of artificial intelligence. We want to share with others how to run these kinds of sessions, so they can create their own courses and programs and distribute this multidisciplinary ethics training across different types of institutions, not just the ones with a specialty like ours. 

 

To learn more about the IDEAS Institute, visit their website, or the website of Sina Fazelpour.

Code for Charlottesville Teams up with Civil Rights Advocates

Code for Charlottesville volunteers present findings at the Data Justice Academy

Data Science & AI 

May, 2023

Jonathan Kropko, professor of data science at the University of Virginia

Author: Jonathan Kropko is an Assistant Professor at the School of Data Science. His research interests include civic technology, remote environmental sensing, survival and time series analysis, and missing data imputation. He also leads Code for Charlottesville, the local chapter of Code for America that invites the community to volunteer on important issues.

The Problem

In the U.S., it is unconstitutional for someone to be tried multiple times for the same crime. So why then are people with criminal records punished again and again for past convictions — and even for past charges that did not result in conviction?

Anytime an individual charged with a crime appears in a district or circuit court, the charge creates a criminal record that can be found by the general public. In Virginia, these records can be accessed online in a matter of seconds, facilitating widespread criminal background checks in employment, housing, banking, and other decisions about whether to provide basic services. Schiavo (1969, p. 540) calls this practice “multiple social jeopardy” because although it is unconstitutional for a defendant to stand trial multiple times for the same charge, a person with a criminal record is punished by society over and over again through the withholding of basic services and opportunities. The result is a permanent underclass of people who are denied access to the resources and pathways they need to rebuild their livelihoods. 

A growing movement, led by legal aid societies such as the Legal Aid Justice Center in Charlottesville, Virginia, and nonprofit organizations such as Nolef Turns, advocates for these criminal records to be destroyed (through a process called criminal record expungement) or hidden from public view (what’s known as record sealing). Both expungement and record sealing have been shown to reduce recidivism, which is, ostensibly, an ultimate goal of the justice and corrections systems. 

Prior to 2021, only dismissals and cases of mistaken identity were eligible for criminal record sealing in Virginia. Even then, a qualifying individual had to complete a lengthy and costly petition process. Virginia enacted a law in 2021 that for the first time provided for automatic sealing of criminal records and extended eligibility for sealing to certain low-level convictions, such as possession of marijuana. The law goes into effect in 2025.

While the law represents real progress, it also comes with many restrictions and caveats: an individual can have no more than two records sealed over their lifetime; they must have no arrests or charges in the past three years; they must have no prior convictions; they must wait seven years with no additional convictions in order for the record to be sealed; and more.

All of which raises the question: How many people will actually qualify to have their records sealed once the law takes effect? Answering this question would help advocates decide where and how to focus their lobbying efforts, to ensure that the new law will in fact apply to the maximum number of people whose records deserve to be expunged or sealed.

The Project

Code for Charlottesville, a volunteer group of tech professionals and students that I lead, worked with the Legal Aid Justice Center and Nolef Turns to apply the tools of public interest technology to help answer this question. 

Our task was simple, but not easy: collect all public records from the Virginia district and circuit criminal courts between 2009 and 2020; anonymize the records; and then count the number of records that would qualify for automatic sealing or petition sealing. 

For any PIT project, it’s important to ask what data is available, how it was collected, and if there are any privacy concerns. 

Code for Charlottesville volunteers present findings at the University of Virginia Data Justice Academy

We used bulk data scraped from the web by Ben Schoenfeld, a computer engineer and civic tech enthusiast. While the current Online Case Information System 2.0 bans web scraping, Ben collected the data from version 1.0 of the system, which had no such restriction, and replaced individual defendants’ names and dates of birth with an anonymized numeric ID. This allowed us to use the entirety of a defendant’s record without knowing the defendant’s identity. Because the data was anonymized, we were confident that the solutions we built would not cause further harm to the people in the database.
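For readers curious what that kind of anonymization looks like in practice, here is a minimal sketch in Python with pandas, using made-up records. It is not Ben Schoenfeld’s actual pipeline, just an illustration of mapping each name and date of birth to a stable numeric ID and then dropping the identifiers.

```python
import pandas as pd

# Toy records standing in for scraped court data (all values are hypothetical).
records = pd.DataFrame({
    "name":        ["Jane Doe", "John Roe", "Jane Doe"],
    "dob":         ["1980-01-01", "1975-06-30", "1980-01-01"],
    "charge":      ["Possession of marijuana", "Petit larceny", "Trespassing"],
    "disposition": ["Dismissed", "Guilty", "Dismissed"],
})

# Give every (name, dob) pair one stable numeric ID, then drop the identifiers,
# so all records of the same defendant share an ID without revealing who they are.
identity = records["name"] + "|" + records["dob"]
records["defendant_id"] = identity.astype("category").cat.codes
anonymized = records.drop(columns=["name", "dob"])

print(anonymized)
```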

In total, the data contains more than 9 million individual court records and more than 3 million different defendants. Code for Charlottesville volunteers built a decision tree-based classifier that translates all of the restrictions in the law into logical conditions that can be evaluated quickly in code. This function takes in all of a person’s court records and outputs a list that identifies which of the records would qualify to be automatically sealed, which would be eligible to be sealed by petition, and which would be ineligible for sealing.
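As a rough illustration of how such rule logic can be expressed in code, the sketch below classifies a defendant’s records using just two simplified conditions: dismissals qualify automatically, and low-level convictions qualify by petition after a seven-year clean period. The field names, dates, and thresholds are assumptions for illustration; the actual law contains many more restrictions (lifetime caps, recent-arrest checks, prior-conviction bars, and so on) than shown here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Record:
    """Simplified stand-in for one court record; the real data has many more fields."""
    charge_date: date
    charge_type: str   # e.g. "misdemeanor" or "felony"
    disposition: str   # e.g. "Dismissed" or "Guilty"

def classify_records(records, as_of=date(2025, 7, 1)):
    """Label each record AUTOMATIC, PETITION, or INELIGIBLE (illustrative rules only)."""
    last_conviction = max(
        (r.charge_date for r in records if r.disposition == "Guilty"),
        default=None,
    )
    labels = []
    for r in records:
        if r.disposition == "Dismissed":
            labels.append("AUTOMATIC")            # dismissals seal automatically
        elif (
            r.disposition == "Guilty"
            and r.charge_type == "misdemeanor"
            and last_conviction is not None
            and (as_of - last_conviction).days >= 7 * 365
        ):
            labels.append("PETITION")             # low-level conviction after a clean wait period
        else:
            labels.append("INELIGIBLE")
    return labels

history = [
    Record(date(2012, 3, 1), "misdemeanor", "Dismissed"),
    Record(date(2014, 9, 15), "misdemeanor", "Guilty"),
]
print(classify_records(history))  # ['AUTOMATIC', 'PETITION']
```

Running this kind of function over every defendant in the anonymized data is what produces the statewide eligibility counts described below.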

The Impact

According to our findings, more than 1.4 million records from 2009 to 2020 will immediately qualify for automatic record sealing once the law is implemented in 2025. More than 1 million additional records will become eligible if the individuals with these records avoid any convictions for the remainder of a waiting period. And 3 million more cases will, immediately or pending a waiting period, be eligible for sealing by petition.

We used our model to calculate how many more people would be eligible for record sealing if specific restrictions were loosened or removed. We even broke these counts down to the level of the Virginia House of Delegates or Senate district so that the Legal Aid Justice Center could show a delegate or senator the results for their district, making the impact directly visible to the decision makers.

The LAJC used our results in discussions with the Virginia House and Senate to advocate for specific changes to the 2021 law that would expand record sealing access to even more people. This project demonstrates how public interest technology — even when the group of workers is small — can provide right-sized tech tools that support democracy and advance justice.

GAEIA: A Global Collaboration to Grow Tech Ethics

Cal Poly's Digital Transformation hub

GAEIA: Building the Future of AI Ethics

Data Science & AI

May, 2023

Søren Jørgensen, co-founder of the Global Alliance for Ethics and Impacts of Advanced Technologies

Author: Søren Jørgensen is a Fellow at the Center for Human Rights and International Justice at Stanford University, and a co-founder of GAEIA. He founded the strategy firm ForestAvenue, which is based in Silicon Valley, Brussels, and Copenhagen, and previously served as the Consul General of Denmark for California.

Elise St. John, co-founder of the Global Alliance for Ethics and Impacts of Advanced Technologies

Author: Elise St. John heads Academic Programs and Partnerships at California Polytechnic State University’s New Programs and Digital Transformation Hub, and is a co-founder of GAEIA. She builds and manages cross-disciplinary teams, and designs and leads research and innovation projects that utilize advanced technologies across sectors.

Since ChatGPT’s release in November 2022, public awareness of AI ethics and implications has exploded. As companies and lawmakers grasp for resources to meet this moment with clear and comprehensible strategies for weighing AI’s risks and rewards, what do we in the academy have to offer them?

In 2021, we (Søren Juul Jørgensen, Stanford, and Elise St. John, Cal Poly) launched the Global Alliance for Ethics and Impacts of Advanced Technologies (GAEIA), an interdisciplinary and multicultural collaboration to help companies and governments systematically consider the risks and benefits of AI. We’re excited to share with our PIT-UN colleagues some insights and resources from our journey with GAEIA, and possible directions for growth and expansion.

Each year, GAEIA convenes a cohort of international researchers to collaborate with industry experts to investigate new, pressing ethical considerations in technology use and to develop methodologies and training tools for weighing risks and benefits. Our work is guided by a few key principles:

  • Changing cultures and norms within industries and companies is just as important as developing strong oversight and regulation of the tech industry.
  • Diversity of geography, culture, race/ethnicity, gender, and values is of paramount importance in creating our methodologies and training tools.
  • Interdisciplinary collaboration is key to our work and to the future of ethical technology development, deployment, and governance.

Here is what these principles have looked like in action.

 

Culture Change

I (Søren Jørgensen) worked in and alongside tech startups during the “move fast and break things” era of Silicon Valley’s early 2010s. Having experienced firsthand how damaging this ethos could be, I moved into a fellowship at Stanford, doing research and advising companies on ethics considerations. A German insurance company CEO said something to me in one of my early conversations at Stanford that really stuck with me: “Please, no more guidelines!”

Of course we need guidelines, but his point was that guidelines without culture change are just another set of rules for corporate compliance. How do you develop a company culture where people care about and understand the risks of technology? Our hypothesis with GAEIA is that companies need simple, iterative processes for collaborative ethical assessment and learning. 

Guidelines without culture change are just another set of rules for corporate compliance.

The first tool we developed is a simple template to iteratively assess the ethics of a technology by asking the kinds of questions that public interest technology prompts us to consider:

  • What is the problem we’re trying to solve with this technology?
  • How does the technology work, in simple terms?
  • How is data being collected and/or used?
  • Who is at risk, and who stands to gain?
  • What is our business interest here?
  • Is it fair? Is it right? Is it good?
  • What action should we take, and how will we communicate our actions?
  • How will we evaluate the impact and apply these insights?

To effectively pressure test this model, my colleague Elise St. John and I knew we needed a diverse, interdisciplinary global cohort of collaborators to mitigate against the kinds of bias and reductive thinking that cause so many tech-based harms in the first place.

The Need for Diversity

I (Elise St. John) joined Søren in 2021 to help organize and operationalize the first global network of collaborators, which would focus on the use of AI and advanced technologies in the financial sector. My background is in education policy research with a focus on issues of equity and the unintended outcomes of well-meaning policies. It lent itself quite well, actually, to the examination of unintended impacts of advanced technologies. At Cal Poly, I work in digital innovation and convene cross-disciplinary student groups to work on real-world public sector challenges through Cal Poly’s Digital Transformation (Dx)Hub.

Images courtesy of Cal Poly

Cal Poly's Digital Transformation hub

When I reviewed the literature and explored the various academic groups studying tech ethics and the social impacts of financial technology at the time, it became apparent how very Western-centric this work was. Because public interest technology asks us to engage the voices and perspectives of those most exposed to and impacted by technological harms, we knew that the network we convened needed to be international and multicultural. This consideration is especially urgent vis-a-vis AI systems because they have the capacity to marginalize and silence entire populations and cultures, and to exacerbate existing inequalities, in totally automated and indiscernible ways. 

Our first cohort consisted of over 50 M.A.- and Ph.D.-level researchers representing Africa, the Americas, Asia, and Europe. Using the DxHub model, we broke them up into five groups, each of which worked with an industry adviser to consider real-world ethical dilemmas that companies are facing, using the GAEIA template. In biweekly meetings, the scholars and industry advisers discussed both new and potential ethical dilemmas that fintech services and novel data sources, for example, might inadvertently create. The advisers also spanned geographical regions, further diversifying the ethical frameworks and industry perspectives brought to the conversation, and then we also came together in monthly inspiration sessions to meet with other leading thinkers on ethics, AI, and fintech. 

Public interest technology asks us to engage the voices and perspectives of those most exposed to and impacted by technological harms.

The value of a truly global and diverse cohort was evident at several points. For example, one of the students introduced an ethical dilemma associated with “buy now/pay later” services. The consensus among many of the Western participants was that such services carry too much risk for users and are inherently prone to exploitation. A student from one of the African nations pushed back on this assessment, though, pointing out the opportunities that these systems could hold for the roughly 45% of people in sub-Saharan Africa who are unbanked. This opened up space for weighing the pros and cons of such a technology in different cultural and economic contexts, and it led to further conversations about the role of regulation vs. innovation, for example. These were very humbling and important moments, and they were exactly the kinds of conversations that need to become the norm in technology development, deployment, and governance.

Participants from Kenya, Brazil, and India, countries that are highly exposed to climate disasters, also developed a Global South working group. In our current cohort, students in Turkey and Ukraine who are living through natural disasters and war have also built connections and held separate meetings to explore how AI tools might provide swift and effective trauma relief in future crises.

Tech's Future Must Be Interdisciplinary

We intentionally recruited participants from across disciplines. Our two cohorts have featured M.A. and Ph.D. students from engineering, finance, law, philosophy, psychology, and more. Fundamentally, we want our students to be able to speak across many disciplinary languages. Technology is not just the domain of computer programmers. It is embedded in all aspects of society and the organizations where we work. Human resources managers have to understand how to communicate with engineers; product managers have to know enough about psychology to ask the right questions about enticement and deception; entrepreneurs need to be able to consult sociologists about the impacts of technologies on different communities. The list goes on. 

We believe that an interdisciplinary approach is not a “nice to have” but a “need to have” for businesses going forward. There’s a growing understanding of the potential risks that businesses face when they don’t have robust ethical decision-making processes: high fines (especially in the European Union), reputational risk among consumers and investors, and the demand from current and prospective employees that companies do no harm and live out good values. 

Having worked with hundreds of organizations during our careers, we can say with confidence that most of them don’t want to do bad things. They fundamentally want to understand risks and avoid them, which is why we’re designing the GAEIA resources and platform within the aspirational frameworks of learning and culture change, not corporate compliance. You can find good examples of how this approach has worked in the education sector. When educators are encouraged to develop genuine inquiry-oriented approaches to data use and systems change in response to accountability measures, they become invested in the accountability process and changing outcomes. Similarly, we want leaders and employees to be invested in ethical decision making, to set real metrics that not only ensure legal compliance but also lead to products and services that are profitable while at the same time aligning with the public interest.

What's Next for our Global Cohort

This work started as a project during the COVID-19 pandemic. At the outset, we didn’t know it would turn into a recurring cohort-based model and that we would further develop the model with the formation of GAEIA. In the first year, students were Zooming in from lockdown and quarantine and were sharing their diverse experiences as the waves of COVID-19 spanned the globe. 

The project’s goal was to break down institutional and sector-specific silos, and bring together a cross-disciplinary, global group of scholars to develop a pipeline of leaders versed in the ethics of advanced technology use. We got that and so much more. 

We are currently collaborating with people at the Center for Financial Access, Inclusion and Research at Tec de Monterrey (Mexico), who have expressed interest in forming a GAEIA chapter for undergraduates, and we are working now with Strathmore University Business School in Kenya on the development of a certification program. There is an emerging network not unlike PIT-UN that can help universities around the world build capacity and support for PIT research and curricula. 

We should also mention the inherent value of building a tech ethicist community across cultures and geographies. The students independently set up social hours on Zoom that were structured around simple, fun aspects of culture like favorite foods and music. Students from China, Kenya, Germany, and the U.S. would show up on Zoom, whether it was 6 a.m. or 6 p.m. locally, with their favorite beverage. Getting to know more about each other’s lived realities, and bonding over simple human activities, even while far away, is the ground for understanding how AI and advanced technologies affect each of us in distinct ways.

Grounding Principles for Understanding and Regulating AI

Data Science & AI

May, 2023

Author: Maria Fillippelli is the Data Director for the Southern Economic Advancement Project and a former Public Interest Technology Census Fellow with New America. As a PIT Fellow, she developed and led a strategy to help dozens of national, state, and local organizations and governments navigate the technical changes to the 2020 Census.

The full piece, excerpted below, is available on the New America website.

A few weeks ago my yoga instructor asked me after class about the hype surrounding ChatGPT and generative AI. Did I think it really was a watershed moment in humanity? It was early in the day, and my immediate response was that only history can determine if this is a watershed moment. However, I added, the actions we take now to understand and weigh the pros and cons of generative AI are incredibly important.

He nodded thoughtfully, and seemed to be gathering his thoughts for a follow-up question, but it didn’t come. As the day wore on, I realized that my answer was clear but probably insufficient. My yoga instructor wasn’t really looking for a single answer; he was, like many of us, looking for a framework to sort through the immense swirl of claims, counterclaims, hype, and critique about generative AI that has been unleashed since ChatGPT’s release in November 2022.

And I realized that I hadn’t seen much in the way of useful frameworks for experts and nonexperts alike to evaluate generative AI products…[continue reading on New America’s website].

Higher Education and Generative AI: Evolving Lessons from the Field

Online Event  |  April 20, 2023

Like many fields, Higher Education is grappling with the immense challenges posed by Generative AI tools like ChatGPT. While the rapidly unfolding changes to teaching, learning, publishing, and career readiness may feel overwhelming and unprecedented, lessons from the field of public interest technology (PIT) provide opportunities to reframe the challenges across multiple sectors.

In this one-hour webinar, experts from the Public Interest Technology University Network (PIT-UN) share insights from history, computer science, sociology, and pedagogy to contextualize our current moment and prompt innovative thought and action.

Below the video is an excerpted transcript, which you can download for use in educational settings here.

Renée Cummings (Moderator) teaches at the University of Virginia School of Data Science, and has counseled many organizations, from local school districts to the World Economic Forum, about how to understand and respond to artificial intelligence. 

Meredith Broussard is a leading AI researcher and data journalist who teaches at NYU. Her newest book, More Than a Glitch, takes a deep dive into the myths and assumptions that lead to tech-enabled bias.

Todd Richmond leads the Tech + Narrative Lab at Pardee RAND. He invites us to imagine how we can transform our teaching and learning methodologies to suit the interdisciplinary nature of how we work, learn, and live our lives.

Vanessa Parli is Research Director at Stanford’s Institute for Human-Centered AI (HAI), where she leads an interdisciplinary effort to map emerging trends in AI development and foster holistic understanding of this emerging technology.

Resources from our panelists:

Meredith Broussard

In the classroom, we need to clarify for our students what AI is and isn’t. That’s where all of our conversations need to start. 

AI is math. It’s very complicated, beautiful math. But it is just that – math. Real AI is not what we see on screens, coming out of Hollywood. Real AI is not fantasies of artificial general intelligence. Today’s generative AI systems are computer systems that work in specific ways. Generally, you feed a pile of data into a computer, the computer makes a model of the mathematical patterns in the data, and then you can use that model to make new decisions. It’s complicated, but it is possible to understand the process. So we need to start with a shared understanding of what’s real and what’s imaginary.
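A toy version of that loop, using scikit-learn on made-up numbers, makes the feed-data-in, model-the-patterns, reuse-the-model sequence concrete. The data and model choice here are purely illustrative, not how any production AI system is built.

```python
from sklearn.linear_model import LogisticRegression

# Made-up numeric features and labels standing in for "a pile of data."
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 1], [6, 0]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression().fit(X, y)    # the computer models the patterns in the data
print(model.predict([[2, 1], [5, 0]]))    # the model is then used to make new decisions
```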

We also need to think about the current moment as part of a larger history of technology. We’re in a hype cycle right now around AI, especially around generative AI, and people are saying things like “oh, it’s going to change everything. It’s going to make new jobs. It’s going to eliminate jobs. It’s going to change education forever.” It is going to change a couple of things. But this is not the invention of fire. It’s just another AI program.

The hype cycle does not have room for ambiguity, and ambiguity and bias and social problems are everywhere when it comes to AI. Social problems like structural discrimination, racism, sexism, ableism – all of these problems are reflected in the data that we’re using to train AI systems. Then these problems get crystalized in code, and become very hard to see and hard to eliminate. Which is why we need more investment in the work of algorithmic accountability reporters. Algorithmic accountability journalism is a newer kind of journalism that is about holding algorithms and their makers accountable.

In terms of what people should be thinking about for syllabi, I would be remiss if I didn’t suggest my own books, which explain AI in plain language and introduce some of the complex social issues we’re discussing in the public interest technology space. Those titles are More Than A Glitch: Confronting Race, Gender, and Ability Bias in Tech and Artificial Unintelligence: How Computers Misunderstand the World. In addition, the first place that I always send people is the resource list at the Center for Critical Race and Digital Studies.

I also really like The Tech That Comes Next: How Changemakers, Philanthropists, and Technologists Can Build an Equitable World by Amy Sample Ward and Afua Bruce. This PIT reading would pair nicely with a generative AI activity to get students thinking about larger ethical implications aside from the ubiquitous narrative of “this is new technology, isn’t it so exciting?”

 

Todd Richmond

I take a little bit of a different viewpoint than Meredith. I do think this is a sea change, just like the internet was a sea change for humanity. And I think it’s time for a fundamental rethink of what constitutes a human endeavor.

If education is supposed to prepare our students for the future, we need to understand what the future is going to look like, and what metrics for success are going to look like. How do we prepare them for that world? Six months ago, we were not having these conversations about generative AI, or at least not to the level that we are having them now. You can attribute some of that to the hype cycle, but you can also attribute it to the fact that the rate of change is astronomical.

Back in January, we did a quick and dirty task analysis with three of my graduate students to assess how generative AI could be used in their work. Our scorecard was that 97 percent of the tasks that our graduate students do can have some generative AI aspect. Some of them are using it to summarize dense academic articles they read for classes.

I think about mechanical, operational, and conceptual tasks. ChatGPT is pretty darn good at some mechanical and operational things. It can do calculus. The conceptual part, however, is a weakness. I describe generative AI as a food processor that has access to every vegetable. You ask it to make pico de gallo, it’ll make pico de gallo. But you don’t know whose recipe it pulled, and it doesn’t know what pico de gallo tastes like – but it knows how other people have described it.

With this in mind, do students need to know higher math? My economics colleague would argue that yes, they need to know calculus. But again, is that just a mechanical skill they need, or one that helps them develop a conceptual skill? You can go back to the arguments around calculators in the classroom, but this is a much bigger fish to fry.

Finally, we have writing. Is writing, and by extension reading, going away? That’s a question that we have to ask, especially given the fact that I have very smart grad students who are very skilled, who are using ChatGPT to do some of their reading. This is the really important philosophical question: what are human endeavors? We have seen that as technologies come in, if they are compelling and convenient, they will replace what came before it. So this is an opportunity to really think deeply about “what, how and why” humans do the things that they do.

 

Vanessa Parli 

At Stanford’s Institute for Human-Centered Artificial Intelligence, we believe that interdisciplinary collaboration is essential in ensuring these technologies benefit all of us. That interdisciplinary mindset is reflected in our faculty leadership, which comes from medicine, science, engineering, the humanities, and the social sciences. It is also reflected in all of our programming. The annual AI Index, a report covering many topics summarizing the state of AI, is produced by a steering committee of experts from academia, industry, and government.

While the report is over 300 pages, I will provide some report highlights to perhaps spark interest. The majority of AI systems are developed in the U.S., Canada, the EU and China. What might this mean for the rest of the world? Certain values, cultures, norms are embedded within these technologies, and then they’re distributed across the world, where not everyone has the same culture, norms, etc. Do we want to be doing that? How can we build these systems so that norms can be adjusted or modified, depending on where you are in the world? That is one of the many reasons we need diverse perspectives participating in all phases of development of these technologies.

Computer science Ph.D. students are gradually becoming more diverse, and undergraduate students even more so; however, I would personally say that this is not good enough, and we still have a long way to go. The proportion of women among new AI Ph.D. students has remained at around 20 percent, and women are making up a growing share of computer science faculty. But again, I personally don’t think this is good enough.

 Meredith Broussard

One of the things that gets lost in the hype around generative AI is the incredibly toxic nature of the training data used to create these systems. In this case, it’s data scraped from the open web, which has a lot of really wonderful stuff and a lot of really toxic stuff.

A recent analysis by the Washington Post looks at which sites, specifically, make up the Common Crawl, the data set that is used to feed ChatGPT and other big generative AI systems. They’re all drinking from the same well. There aren’t that many places to get massive data sets, and everybody’s pretty much using the same stuff.

Generative AI systems that use Common Crawl are being fed with text from 4Chan, with data scraped from StormFront, and other sites that publish hate speech. There’s a lot of toxic material in the training data, and a lot of copyrighted material. You really need to be careful about trusting these generative AI systems. 

I think that ethical use of generative AI starts with emphasizing that this is just a tool. It has zero foundation in truth. It is given to hallucinations; when a generative AI writes something, there is no guarantee that it is generating information that is true. It makes up citations, for example. So even though it looks like it’s working, it’s not necessarily working the way a student needs it to in order to learn class material. People have a lot of trouble with that. If you use generative AI to summarize a complex scholarly article, great – that’s a good way to get started on a challenging task. But you should still go back and read the original scholarly article, because there is no guarantee that the summary is accurate.

 

Todd Richmond

I think “entirely” is the key word. I don’t think that everything that ChatGPT spits out is nonsense, because you can fact-check it. You can’t trace back to see where the original sources were, but the graduate student who used it for his reading did a sanity check on it, and found that it was a pretty accurate distillation of the original source material.

One area where we need to think through equity and ethics is intellectual property and fair use. Someone recently created a fake Drake song using generative AI that went viral. Almost all of these generative AI companies are making the same claims that the search engine companies made, which is “well, the stuff on the Internet is out there. It’s free to use, for anything we want to use it for.” For search engines, it was one thing to index it. It’s a completely different thing to use that information to build an algorithm which will now essentially create derivative work of the original work. That’s much more problematic.

There needs to be an entirely new field (here’s a framework we use at Pardee RAND) focused on creating emerging technology that has equity and ethics at its core, instead of toxicity and disinformation.

 

Vanessa Parli 

I would add that ethics needs to start from the very beginning. Computer science graduates working on these systems also need to be trained in ethics. At Stanford there’s a program called Embedded Ethics, where all the computer science students in their core courses are taught ethics modules in the hope that that impacts their thinking as they go on and develop these technologies.

For the grant funding we do at HAI, applicants need to write an ethics statement as part of their application. They have to show that they’re thinking about the ethical implications, the societal implications. If this technology were to be ubiquitous, how might it play out, and what adjustments are made in the research approach to be sure outcomes are positive? Those statements are reviewed by an interdisciplinary panel of experts, again from medicine, philosophy, computer science, etc., and a lot of times there is iteration on the research methodology. Sometimes it’s decided that part of the research maybe should not go forward.

Meredith Broussard

It’s important to keep in mind the context where AI is used. Let’s take facial recognition AI. A low-risk use might be using facial recognition to unlock your phone. A high-risk use might be law enforcement using facial recognition on real-time video feeds as part of surveillance, because it’s going to misidentify people with darker skin, thereby contributing to harassment and over-policing. The context is key.

I would really like to see more algorithmic auditing. Algorithmic auditing is something that we talk a lot about in PIT circles. It’s the work of opening up black boxes, interrogating algorithms to look at where the biases are. All you have to do is look for biases, and you’ll find them. I would love to see algorithmic auditing integrated into regular ethics reviews, and to have ongoing monitoring of technology as new iterations are rolled out. We need to monitor the technologies to make sure not only that the technologies are not biased to begin with, but that the bias that is mathematically mediated is then not added back in in future iterations. So that’s a kind of technical feature of algorithmic governance that is going to help in implementing high-level policies.
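One simple ingredient of such an audit is comparing a model’s positive decision rates across demographic groups and flagging large gaps. The sketch below uses made-up decisions and the common “80 percent rule” heuristic; real audits examine many more metrics and, as noted above, the context in which the system is used.

```python
import pandas as pd

# Made-up audit log: which group each person belongs to and whether the model approved them.
decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B"],
    "approved": [1,   1,   0,   1,   0,   0,   0],
})

rates = decisions.groupby("group")["approved"].mean()   # approval rate per group
ratio = rates.min() / rates.max()                       # disparate impact ratio

print(rates)
print(f"Disparate impact ratio: {ratio:.2f} (values well below 0.8 are a red flag)")
```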

 

Todd Richmond

The key question is, can you trace it back? Can you open up the black box, and can you start to trace how the algorithms are working? Because when you have algorithms rewriting themselves, the people who set them into motion don’t really know what’s going on under the hood.

We’re big fans of red teaming, and I was heartened to see that OpenAI is doing red teaming on their algorithms. The red team process is usually done to try and figure out where your vulnerabilities are, and commercial companies do it for a competitive advantage.

A few years ago, we started arguing for narrative red teaming. LA City announced that they were going to release all their 311 data to the public as part of a transparency effort, and I was immediately horrified because the 311 data, taken out of context, allows you to construct very toxic narratives when you combine it with demographic census data. We have to think about how data might be weaponized. With narrative red-teaming, if you’re going to release a report or you’re going to release data, you work through how people could weaponize that data, and then prepare messaging in advance, and maybe conduct narrative inoculation. We have a grad student working on this with Russian troll posts and how you inoculate against disinformation campaigns.

For the governance piece, the thing that I find most challenging is that most of us are sitting in the U.S., and we have a very U.S.-centric viewpoint of this. This is a global phenomenon. At Pardee RAND, we do a lot of national security work, where we worry about adversaries. Not all of the countries that are developing these technologies have the same moral compass that we have. There’s an asymmetric governance problem. It’s great if we want to pause AI development, but the problem is that other people are not going to pause it. It’s a very messy, complex global problem set when you talk about governance for these emerging technologies.

 

Vanessa Parli 

There’s also this aspect of developing community norms. These technologies are moving so fast. Our governments don’t move so fast. What do we as a community of researchers and computer scientists want for this technology? There’s the example of CRISPR, where once those in the development phase of that technology realized its impact, they developed their own community norms, which had nothing to do with the federal government. When we develop these technologies, what is appropriate to release? What types of documentation should be released with these models?

 

Todd Richmond

One thing I just want to tag on is that we should be very careful when we say “community”. We need to cast a wide net for our stakeholders that includes artists. My wife is faculty at Cal Arts, and I’ve spoken to her students about generative image algorithms. They stand to lose a lot in this, and that community of practice needs to not just be the computer scientist and the technical folks, but it needs to bring in the arts and the humanities because they are very real stakeholders in this equation.

Todd Richmond

Plagiarism is nothing new, and what we’re seeing in schools is a supersonic version of plagiarism. Humans do what humans do. So we can look to the past to see how humans behave badly. It’s just that. The problem is that digital technology scales in a way we’ve never seen before. So the speed at which those problems propagate and the scope of those problems is drastically different now. It’s going to be faster, and it’s going to be on a bigger scale.

 

Meredith Broussard 

I don’t know if I agree with the idea that it’s scaling in a way we’ve never seen before. We’ve been doing this for 30 years now. Digital technology scales, yes – it’s not dramatic, it’s not new.

But in terms of plagiarism, you’re absolutely right, plagiarism is nothing new. I’ve seen some interesting work about how to inoculate students against cheating with ChatGPT, like creating iterative assignments. So, you can do in-class writing, and have assignments that build on the work done previously. That’s a little bit more challenging. 

Another assignment that I’ve seen a lot is when instructors have the students use ChatGPT and then critique the output. That’s been a really useful exercise for critical thinking around technology. I don’t think we can eliminate cheating entirely. 

We can reevaluate what we are trying to do by asking students to have closed book exams. Are we requiring them to memorize something for the sake of memorizing it? Can we design experiments and exams that are, say, open book? Can we design assignments that acknowledge that there is generative AI out there and that the students use it?

Todd Richmond

I won’t comment on what needs to be done, but I do want to point out a resource that Stanford’s Center for Research on Foundation Models has recently developed, called ecosystem graphs, where they try to map each of these different generative AI systems to find where the data is coming from and what systems they are built upon, so that you can see what’s going on, and perhaps better identify where some of the bias, etc. might be.

 

Meredith Broussard 

If you want to know what’s in GPT-3, you can go and read the academic papers. What you’ll find there is that this is trained on Common Crawl, and for its self-censorship it’s using RealToxicityPrompts to find bad words in the data set. The information is not super secret. People like to pretend that it’s super secret, but it’s not all that secret.

 

Todd Richmond

Well, the weighting is the secret sauce.

 

Meredith Broussard 

The weighting is secret. But most people are not messing around with weights, most people are just interested in what generative AI is being trained on. The fact that it’s being trained on Reddit data is really helpful to know, because you can look at Reddit, and you can say, “oh that’s a cesspool. It’s pretty interesting, but it’s also a cesspool. So maybe this generative AI is going to spew some filth that I do not agree with.” That’s the level of transparency that I think a lot of people would be pretty happy with.

Todd Richmond

Not only our students, but our researchers are using it to write code. It is very good at Python, and if it works, if it’s good enough, then it will get adopted. Some students are using it to do literature reviews. Sometimes it’s good at lit reviews, and sometimes it’s not good. They’re using it to do outlining for their dissertations, to help them think through how they sequence things. Some of them are using it for summation analysis. You can put text in and ask it, “What are the takeaways?” And if ChatGPT doesn’t synthesize your prose the way you expect, then maybe you didn’t write very clearly in the first place. So it’s kind of like having another writer to check your work. 

We’ve also been connecting it to agent-based modeling systems. You can run really interesting experiments with agent-based models that are driven by ChatGPT inputs and outputs. So there’s a lot of really interesting stuff that can be done, and we’re just scratching the surface at this point.
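A bare-bones sketch of that wiring might look like the loop below, where each agent’s next move comes from a language model prompted with what the other agents just did. The `llm` function here is a stand-in you would replace with a real chat-completion call, and the cooperate-or-defect scenario is an assumption chosen only to keep the example small.

```python
import random

def llm(prompt: str) -> str:
    # Placeholder for a real chat API call; returns a random choice so the sketch runs offline.
    return random.choice(["cooperate", "defect"])

agents = {"agent_1": [], "agent_2": []}   # each agent keeps a history of its moves

for step in range(3):
    for name, history in agents.items():
        others = {k: v[-1] for k, v in agents.items() if k != name and v}
        prompt = f"You are {name}. Last moves by the other agents: {others}. Choose cooperate or defect."
        history.append(llm(prompt))

print(agents)
```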

Todd Richmond

It holds the promise of doing tasks that are boring, rote, and not stimulating, and giving humans more free time. That said, I have yet to see “more free time” as an outcome of any technology advancement in the past. 

We’re seeing amazing capabilities. When I was a biochemist, I did protein structure-function work. We wished that we could imagine, given a DNA sequence, what a protein structure looked like. AI is solving those problems. Is it 100 percent correct? No. But it’s figured out thousands of structures, and it gives us a starting point where the humans can come in and do really interesting work that builds upon that. It has the power to do a lot of stuff that we wish we were able to do, but don’t have the time or don’t have the patience to do.

The challenge will be, is it accurate? Is it equitable? And is it good enough for the humans to then use and build on? That’s an open question.

 

Meredith Broussard

I’ll push back on that notion a little bit. We are about 30 years into the technological era, so we need to add nuance to our assumptions about technology. I wouldn’t assume that there is a “thing” about AI that is going to be helpful for society. I would not assume that AI is going to be all good or all bad. I would just encourage people to add nuance to their understanding of it. It’s not about binaries anymore. And in general, I’m really optimistic about the field of public interest technology as a way of helping people understand all of the nuances and all of the potential implications of new technologies.

 

Vanessa Parli

There is a lot of promise, but we don’t know what we don’t know. We really need to think about how we want to use these tools. What are humans not good at, that maybe the tech is better? And vice versa, what are humans better at? How do we want to pair up and use these tools to create exciting work and opportunities? We should be thinking in that way especially since there’s a lot of hype out there, as Meredith said.

How Public Interest Tech Principles Can Shape the Future of Data Science and Artificial Intelligence

Data Science & AI

May, 2023

Public Interest Technologist Afua Bruce

Author: Afua Bruce is the founder of the ANB Advisory Group, co-author of The Tech That Comes Next and former Director of Engineering at New America Public Interest Technology. In early 2023, ANB Advisory Group conducted a scan of data science for impact programs at PIT-UN member institutions, and also conducted a review of data science projects that have received PIT-UN Challenge funding.

It has been more than a decade since Harvard Business Review declared the profession of data scientist to be the “sexiest job of the 21st century.” Since then, we have seen industry embrace data science as businesses seek ways to differentiate themselves using insights and predictions based on data about their consumers, their markets, and their own organizations. Accordingly, research into data science has increased, and academic institutions have created a number of credentialed programs and research institutes for students and faculty. Data science’s ability to improve the speed and efficiency of organizations’ operations has been proven. However, as many scholars, practitioners, and advocates have pointed out, that same speed and efficiency can also magnify social inequities and public harms.

Higher Education and Generative AI
An April 2023 PIT-UN webinar explored challenges and opportunities in higher education posed by generative AI

At the same time, the field of artificial intelligence has greatly expanded, as has its embrace by industry and the general public. AI now streamlines how organizations take notes, process payroll, recommend products to clients, and much, much more. Recent product releases and headlines about artificial general intelligence (the theoretical possibility that AI could perform any task humans can perform) have spurred a new round of conversations about how AI could transform human society — or destroy it altogether, depending on one’s perspective.

With widespread use of AI, the workforce will certainly shift as some tasks and perhaps even entire jobs will be performed by AI systems. Many colleges have made significant investments in AI research programs. Many institutions have recognized the importance of training students in how to design and develop AI systems, as well as how to operate in a world where AI is prevalent. And once again, many scholars, practitioners, and advocates have warned that without more intentional and ethical designs, AI systems will harm, erase, or exclude marginalized populations.

The Intersection of Data Science, AI and Public Interest Technology

Data science and artificial intelligence are two separate, but related, computational fields. As Rice University’s Computer Science department describes:

While there is debate about the definitions of data science vs. artificial intelligence, AI is a sub-discipline of computer science focused on building computers with flexible intelligence capable of solving complex problems using data, learning from those solutions, and making replicable decisions at scale.

Data scientists contribute to the growth and development of AI. They create algorithms designed to learn patterns and correlations from data, which AI can use to create predictive models that generate insight from data. Data scientists also use AI as a tool to understand data and inform business decision-making.

Practically, at some institutions, data science and artificial intelligence programs are sometimes seen as competitors for talent and funding, sometimes seen as collaborators, and sometimes remain organizationally separate. As both data science and artificial intelligence garner more and more attention from universities, students, and employers, we must ask ourselves how to balance the promise and excitement of these fields with the need to develop the associated algorithms responsibly. When systems can automatically have an impact on who is eligible to be hired or promoted, who gets access to housing, or who can receive medical treatments, those designing the systems must understand how to approach problems with not just efficiency and profitability in mind, but also equity, justice and the public good. 

Public interest technology provides a framework to tackle these challenges. “By deliberately aiming to protect and secure our collective need for justice, dignity, and autonomy, PIT asks us to consider the values and codes of conduct that bind us together as a society,” reads an excerpt from PIT-UN’s core documents. 

What could it mean for designers and implementers of data science and AI to “advance the public interest in a way that generates public benefits and promotes the public good”? Public interest technology provides a way to ask, research, and address the following key questions:

  • How do technologists ensure the tools they design are deployed and governed responsibly within business, government, and wider societal contexts?
  • What data sets and training data are being used to design these systems? Do they represent the nuance of human populations and lived experience? Are they representative enough to ensure that analyses or predictions based on the data will be fair and just?
  • How do decisions made early in the data science life cycle affect the ultimate efficacy and responsiveness of systems?
  • How will acceptable accuracy rates be determined for different applications? 
  • Are there ways to turn the algorithms on and off as needed?
  • What accountability structures and auditing systems can be built to ensure the fairness of data science and AI algorithms across industries? (A minimal sketch of one such check follows this list.)
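
As one hedged illustration of the last two questions above, the following Python sketch compares a model's accuracy across demographic groups and flags a large gap. The data frame, the column names ("group", "label", "prediction"), and the ten-percentage-point tolerance are hypothetical placeholders, not a prescribed auditing standard.

    # A minimal sketch of one auditing check implied by the questions above:
    # comparing a model's accuracy across demographic groups.
    # The data, column names, and threshold are hypothetical placeholders.
    import pandas as pd

    audit = pd.DataFrame({
        "group":      ["A", "A", "A", "B", "B", "B"],
        "label":      [1, 0, 1, 1, 0, 0],
        "prediction": [1, 0, 0, 1, 1, 0],
    })

    # Accuracy within each group: how often predictions match the true labels.
    per_group = (audit["label"] == audit["prediction"]).groupby(audit["group"]).mean()
    print(per_group)

    # Flag the result when the gap between groups exceeds a chosen tolerance.
    if per_group.max() - per_group.min() > 0.10:
        print("Warning: accuracy gap across groups exceeds 10 percentage points")

A real audit would run on genuine evaluation data, consider metrics beyond accuracy, and sit inside a broader accountability process, but even a small check like this turns the questions above into something a team can run and act on.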

Examples of Public Interest Data Science and AI

Over the past several years, an increasing number of academic institutions have recognized the importance of applying data science and AI in the public interest. They have created extracurricular clubs, classroom and experiential courses, and certificate and degree programs that train students to consider how data science and AI affect communities in different ways, and how these tools can be designed and deployed in new, beneficial ways.

A field scan by the ANB Advisory Group shows that students at PIT-UN institutions are learning vital historical context, working on interdisciplinary teams, and translating data insights into language that policymakers, community organizations, and businesses can understand.

For example, in Boston University’s BU Spark!, five program staff members assign students to teams and manage semester-to-semester relationships with government agencies and nonprofit organizations. Students have used data science to conduct sentiment analysis of Twitter feeds for a national civil rights organization and regularly provide data analysis for the Boston City Council. Over 3,000 students have learned how to work with real-world, messy data, and how solving data problems can contribute to solving larger organizational or societal problems. In addition to technical courses, students learn critical sociological skills, such as how to understand race in the context of using census data. BU Spark! is one of many programs across PIT-UN member institutions demonstrating that labs (including summer programs and practical courses) are an effective way for students to learn public interest tech ideas in real-world contexts and to practice co-design and co-development with affected community partners.

Penn State’s “AI for Good, Experiential Learning & Innovation for PIT” program was one of a handful of PIT-UN grantees to train both college students and working professionals in the ethics and techniques of artificial intelligence. The program developed a new slate of experiential learning opportunities for college students, along with an online microcredential course for professionals in any sector. While it is important to train the next generation of technologists, we must also consider how to train today’s leaders and decision makers. 

Similarly, Carnegie Mellon University launched a Public Interest Technology Certificate program in 2022. Geared toward employees at all levels of government, the six-month program trained its first cohort in data management, digital innovation, and AI leadership “to create a more efficient, transparent, and inclusive government.” Training mid-career professionals while also building a PIT network that can inform and support their work can lead to real-world impact well beyond the walls of the university.

Key Lessons & Recommendations

These are just a few of the many projects across PIT-UN applying a public interest framework to data science and AI challenges. And universities can do even more. Although the development and use of data science and AI differ, some of the application settings and opportunities to effect change have similar underlying challenges. Therefore, the following three recommendations apply to both data science and AI programs.

1. Produce recommendations for policy work

As federal, state, and local policies and initiatives encourage greater use of data, government agencies will seek not just support in accessing data, but also access to advanced data science tools to make that data actionable. Miami Dade College, for example, worked with the nonprofit Code for South Florida, Microsoft, and the city of Miami to create a participatory web app that helps Miami residents become informed contributors to the city’s budget. In their 2019-2020 PIT-UN Challenge project, MDC created a GIS certificate course for underrepresented students to contribute to mapping the impacts of climate change.

Using data science to make clear policy recommendations or create policy products — especially in collaboration with other stakeholders — is a great way to provide students with experiential learning opportunities while also increasing the reach and impact of public interest tech’s core ideas. 

2. Define PIT competencies for data science and AI

As colleges and universities create and expand both data science and AI programs, both students and professors seek courses grounded in strong research and clear outcomes. Projects such as Georgia State’s Public Interest Data Literacy Initiatives have created individual courses that offer PIT frameworks for data science and AI. We are at a point where PIT-UN schools could collaborate to create an inclusive set of standard competencies. Such standardization could lend more credibility and visibility to PIT degrees and could serve as a prototype for standards required of all data science and AI practitioners, regardless of sector.

3. Structure meaningful internships & experiential programs

Students — and even faculty — seek practical experience that they can put on their resumes, describe to potential employers, and use to forge cross-sector partnerships. PIT-UN has consistently funded experiential learning projects to strengthen the pipeline of technologists who understand how to apply data science and AI in the public interest. 

Columbia University and Lehman College’s Public Interest Technology Data Science Corps placed college students on teams that used data science to support New York City agency projects aimed at improving the lives of local residents. Ohio State University placed PIT fellows in state government to encourage young technologists to consider public service, while fostering a culture of collaboration between the public sector and academia. These are just two examples of how meaningful internships and experiential learning speak to the interests of students and faculty while growing PIT’s public reputation.

Our Task Going Forward

The sustained interest in and excitement about both data science and artificial intelligence bode well for the future of academic programs dedicated to these fields. More significantly, the ways in which industries and community organizations operate will change, and be changed, by advances in these technologies.

Making these changes more positive than negative, and actively reducing adverse disparities, will require sustained work, new ways of training practitioners, and usable recommendations and tools to shape a more just technology ecosystem. Public interest technology’s emphasis on equity and justice provides the necessary lens to guide the development and use of these technologies. As PIT-UN Program Manager Brenda Mora Perea reminds us, it is our job to keep these concepts at the center of all we do and to advocate for social responsibility at every stage of technology design, deployment, and governance.