The PIT-UN Members React series invites faculty, researchers, and students from PIT-UN institutions to foster collaboration and share ideas on trending topics and recent work within the PIT sphere.
This paper outlines a socio-technical approach to auditing automated hiring tools. The authors develop a matrix for auditing algorithmic decision-making systems (ADSs) used in the hiring domain, which provides a method for inspecting the assumptions that underpin a system and how they are operationalized technically.
The tool is a socio-technical assessment of hiring ADSs that is aimed at surfacing the underlying assumptions that justify the use of an algorithmic tool and the forms of knowledge or insight they purport to produce. These underlying assumptions, it is argued, are crucial for assessing not only whether an ADS works “as intended,” but also whether the intentions with which the tool was designed are well founded.
Earlier this year, Mona Sloane of New York University, a PIT-UN member institution, and co-authors Emanuel Moss of Cornell Tech and Data & Society Research Institute and Rumman Chowdhury, director of Machine Learning Ethics at Twitter, made the case for a systematic approach to developing socio-technical assessments of hiring algorithmic decision-making systems (ADSs). Their paper, “A Silicon Valley Love Triangle: Hiring Algorithms, Pseudo-Science, and the Quest for Auditability,” was published in several journals.
PIT-UN invited members to spur a dialogue about the findings introduced in the paper. Two PIT practitioners and former graduate students from Carnegie Mellon University, Krystal Alex Jackson and Alisar Mustafa, asked the authors to expand on their ideas in the following in-depth Q&A.
Postdoctoral Researcher at Data & Society and Cornell Tech Digital Life Initiative
Sociologist at New York University and University of Tübingen AI Center
EMANUEL MOSS: Just because playbooks and guidelines already exist doesn’t mean that they are what will be transformed into enforceable regulations. Rather, the regulations will determine what the playbooks and guidelines need to be in order to produce compliance.
EM: Researchers, advocates, and policymakers need to think carefully about what kind of relationships produce accountability in ways that center the public interest. For example, a relationship in which a regulator is tasked with accepting or rejecting an audit is a different kind of relationship than one in which a regulator requires a company to conduct an audit without stipulating what the components of that audit should be. That said, existing playbooks and guidelines offer a useful road map for thinking about what points in the development pipeline require scrutiny and what components of a system need testing. However, much more work is needed to determine what makes for good evidence and what constitutes sound methods.
RUMMAN CHOWDHURY: We’re currently bombarded with various playbooks and guidelines. Researchers, advocates, and policymakers can collaborate with the people at companies to determine which of these playbooks and guidelines are feasible and capture the goals of all the organizations involved.
EM: It depends on what your goal is. If your goal is to encourage developers to think about the implications of what they build, then self-assessment through internal audits may accomplish that goal. But if your goal is to provide stronger guarantees that the systems themselves receive robust scrutiny — in ways that prioritize interests that might be external to the developer — then independent audits might accomplish that goal. While it is certainly easier for companies to assess their own systems, from an access standpoint, that places a lot of emphasis on the good faith of developers and does not provide a check on any entity that would engage in an audit in less than good faith.
RC: I consider them to be on a spectrum in terms of value add, and serving different purposes. Individuals within companies have access, information, and skills that individuals outside companies do not have. Individuals outside companies also have different incentive structures than those inside companies. I view neither as better or worse than the other, but complementary.
EM: It depends on what “enforcement” means, but there should be some entity that can pass judgment on whether an audit was done properly and contains all the necessary information. Beyond that, some form of judgment might need to be made about whether [the system] that is being audited is acceptable, from the point of view of the public interest, and who would have the power to say whether or not [the system] should be used, or under what circumstances it can be used. Any number of governmental agencies could be in charge of that, and it makes some sense to empower relevant existing agencies to enforce audits for things that are used in domains they already regulate. This would mean that the Food and Drug Administration might enforce audits of health care ADS, for example.
RC: Fortunately, precedent exists in other industries for audit enforcement. Usually this takes the form of a third-party body and regulatory agency that requires an assessment. What is interesting is how existing regulatory structures will impact who owns enforcement and what they own enforcement of. For example, for countries that have a data protection authority or widespread regulatory ownership of technology, there is the possibility of more sweeping audit regulation. For countries that do not, the process may end up more atomized (e.g., a financial regulatory agency performs financial algorithmic audits, and so on).
EM: I’m not entirely sure I understand this question, but I would say that an ADS doesn’t drive its own values. It drives the values that are designed into it, and one aspect of audit and assessment is to make sure that it behaves in accordance with intended design specifications. Companies should not be deploying ADS they do not have direct control over. Now, many ADS are sold from one company to another, so you can imagine one company using an ADS it didn’t design, and might not have control over, and in that case there is a shared responsibility between the developer and the operator over the impacts of such an ADS.
RC: I’m not sure if I’m answering the question, but the individuals who develop and deploy algorithms choose everything from scope to design to implementation, and therefore they are the ones who drive what values the system has.
EM: This is a really good question, and the epistemological roots portion of the framework is meant to connect exactly what you point to: the long-held practices of resume gathering and evaluation with assumptions about what makes a good candidate. An ADS that parses resumes and sorts candidates who went to highly ranked schools is making an assumption about the connection between college rankings and good employees.
Hiring managers know how to navigate an assumption like this if it’s already part of their everyday hiring practices. But an ADS that purports to examine microexpressions to rank good candidates is making a different set of assumptions, which may be more tenuous or have other implications that hiring managers are not aware of. These assumptions should be made visible through a sociotechnical audit, so that hiring managers can incorporate them into their work practices in ways that are informed by such implications.
RC: The intent of assessing the epistemological roots is to give hiring managers the insights they need to make smart decisions about which systems to use. Today, they don’t have that information or knowledge, and these systems are sold as scientifically sound, when, in fact, some of them are based on pseudoscience.
MONA SLOANE: The question remains whether hiring has, in fact, ever been particularly fair. And the jury is still out on the question of whether AI-driven hiring tools, in legal terms, discriminate. So this is a tricky question. Professional recruiters are under enormous pressure to bring in talent and retain it. They often have to deal with large numbers of applications, and technology has become crucial for managing the growing pool of applications they receive per job posting. That, however, does not warrant the use of systems that, for example, have been proven to have eugenicist roots. In my opinion, these types of systems should be banned while others, such as resume parsers or large applicant databases, should be iterated upon so that they create maximum possible transparency for recruiters and job seekers and therefore a pathway for recourse, but also more efficiency.
RC: One of the intentions of audits is to determine whether these systems are a threat to fair and legal hiring practices. A weakness in many of the laws that are emerging today is a failure to discuss exactly what an audit is and what the passing criteria are. This leads to audits based on disparate impact, which nearly all hiring companies using ADS have already passed.
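To make the idea of a disparate-impact "passing criterion" concrete, here is a minimal sketch of one common check. It assumes the four-fifths rule of thumb from the U.S. Uniform Guidelines on Employee Selection Procedures, under which a group whose selection rate falls below 80 percent of the highest group's rate is flagged. The interview does not name this rule, and the group labels and counts below are hypothetical.

```python
from collections import Counter

FOUR_FIFTHS = 0.8  # assumed threshold; not specified in the interview


def selection_rates(outcomes):
    """Map each group to its selection rate from (group, selected) pairs."""
    applied = Counter()
    selected = Counter()
    for group, was_selected in outcomes:
        applied[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / applied[group] for group in applied}


def impact_ratios(rates):
    """Divide each group's selection rate by the highest group's rate."""
    top = max(rates.values())
    if top == 0:
        # Nobody was selected, so there is no rate gap to measure.
        return {group: 1.0 for group in rates}
    return {group: rate / top for group, rate in rates.items()}


# Hypothetical applicant pool: 100 applicants in each of two groups.
outcomes = (
    [("A", True)] * 50 + [("A", False)] * 50
    + [("B", True)] * 30 + [("B", False)] * 70
)

rates = selection_rates(outcomes)
ratios = impact_ratios(rates)
flagged = [group for group, ratio in ratios.items() if ratio < FOUR_FIFTHS]

print(rates)
print(ratios)
print(flagged)
```

A check like this looks only at outcome rates. It says nothing about whether the assumptions behind the tool are well founded, which is the gap the socio-technical matrix is meant to address.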
MS: That is a great question. Two things can and must happen: [First,] we will see more regulation kick in that will require checks and balances in the form of AI audits. [And second,] organizations will have to find ways to comply. The matrix can help them to assess a system holistically, and, based on that, make more informed decisions about how to comply with regulation. Internally that means that an organization must decide who takes ownership of the matrix, and what power that individual or entity is equipped with to enforce their assessment and decision, and how they can enforce it.
Table 1. Examples of the type of information and ways of obtaining information for each element of the socio-technical matrix
Type of information, questions, and method
Type of information: name of hiring ADS. Question: What is the name of the hiring ADS? Method: Identify from sales copy.
Type of information: select from Bogen and Rieke. Question: At what stage does this company’s hiring ADS operate? Method: Identify from sales copy and align with funnel list.
Question: What is the hiring ADS intended to be used for? Method: Identify from sales copy; interview developers and hiring managers who operate the hiring ADS.
Type of information: inventory of data types, datasets, and benchmarking datasets. Question: What data, and what types of data, are used in training, testing, and operating the hiring ADS? Method: Interview developers and hiring managers who operate the hiring ADS; inspect data directly.
Type of information: narrative description, machine learning models, and metadata about models. Question: How does the hiring ADS work and what is it optimizing for? Method: Interview developers and hiring managers who operate the hiring ADS; inspect models, metadata, and product directly.
Question: Why is the hiring ADS useful, what is the assumed relationship between data about an applicant and the goals of the hiring manager, and how does the hiring ADS inform the hiring process? Method: Interview developers and hiring managers who operate the hiring ADS.
Question: Where do the assumptions made by the hiring ADS come from, what is their intellectual lineage, and what are the critiques of this lineage? Method: Archival research; interviews with developers and hiring managers who operate the hiring ADS; ethnographic study of hiring managers and developers.
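As an illustration only, the rows of Table 1 could be recorded as a small data structure so that an auditor can see which questions have at least one method that does not involve inspecting the system directly. The sketch below is not from the paper. The class name, the method labels, the abbreviated question text, and the choice to treat only "direct inspection" as requiring system access are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical labels for the methods listed in Table 1.
SALES_COPY = "sales copy"
INTERVIEWS = "interviews"
DIRECT_INSPECTION = "direct inspection"
ARCHIVAL = "archival research"
ETHNOGRAPHY = "ethnographic study"

# Assumption: only direct inspection needs access to the system or its data.
NEEDS_SYSTEM_ACCESS = {DIRECT_INSPECTION}


@dataclass(frozen=True)
class MatrixRow:
    question: str   # abbreviated from Table 1
    methods: tuple  # ways of obtaining the answer


MATRIX = [
    MatrixRow("What is the name of the hiring ADS?", (SALES_COPY,)),
    MatrixRow("At what stage does the hiring ADS operate?", (SALES_COPY,)),
    MatrixRow("What is the hiring ADS intended to be used for?",
              (SALES_COPY, INTERVIEWS)),
    MatrixRow("What data are used in training, testing, and operation?",
              (INTERVIEWS, DIRECT_INSPECTION)),
    MatrixRow("How does the hiring ADS work and what does it optimize for?",
              (INTERVIEWS, DIRECT_INSPECTION)),
    MatrixRow("Why is the hiring ADS useful?", (INTERVIEWS,)),
    MatrixRow("Where do the assumptions come from?",
              (ARCHIVAL, INTERVIEWS, ETHNOGRAPHY)),
]


def methods_without_access(row):
    """Return the row's methods that do not require system access."""
    return tuple(m for m in row.methods if m not in NEEDS_SYSTEM_ACCESS)


for row in MATRIX:
    print(row.question, "->", ", ".join(methods_without_access(row)))
```

Under these assumptions, every row keeps at least one method that does not depend on inspecting the system, although interviews still depend on the cooperation of developers and hiring managers.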
MS: Actually, the main focus of the matrix is on data that can be gathered without access to a system or a dataset. The goal here was to facilitate the establishment of stealth audits that can be conducted without requesting access — a shift in the power relationship between organization and auditor.
As for the regulation question, recent and impending regulation with transparency mandates includes the EU AI Act, the Algorithmic Accountability Act of 2022, and more local regulation, such as New York City’s bill on the sale of automated systems used in hiring and employment.
MS: This is an urgent question. My take is that if there is proven discrimination for certain populations, for example, because certain ADS discriminate against people with certain disabilities, then yes, there should be an option to be accommodated. We need to be very clear about what that must look like.
RC: Ideally, yes; practically speaking, it would be difficult to see this done well. Opting out will, by definition, mean the candidate has less information available about them than others who opted in (putting aside the question of whether or not the information is accurate). How do we reconcile this? Also, given the inherent power dynamic, even if there were regulatory assurances, I imagine most candidates would opt in simply to appear more agreeable to their potential employer.
MS: All data is biased, so this problem is not specific to hiring. What needs to be noted, however, is that some of the ADS used in hiring are not even trained on data that has been collected in the context of hiring, for example, social media data for personality assessment. Again, that is not necessarily special and it happens, for example, in the context of precision medicine. What we need to do moving forward, whether it is in the context of hiring or another context, is to use more holistic approaches, such as the one we propose, to consider not just a technology and its impact, but also how it is embedded into social and professional practices. This socio-technical approach is the only way in which we can avoid a technosolutionist approach to addressing algorithmic harm and bias. And that requires interdisciplinary collaboration and stakeholder empowerment, both of which need to be rewarded and funded better.