mechanism and organism to describe the debate.5 Mechanism names an aspiration for an integrative formal approach to audit, which holds out the promise of an algorithmic knowledge base. Organism assumes that the whole is always greater than the parts, and that the specificity of knowledge places limits on the mechanistic world view. In recent years, auditing firms have been increasingly pulled toward the former approach as they seek to standardize their offerings and manage human resources.6

Structure and judgement may appear to be at odds from the perspective of a company that performs external financial audits, but the dichotomy presented by Power and other scholars actually proves useful in the context of algorithmic auditing. For an individual attempting to evaluate a machine learning algorithm for bias, both approaches have merits. In framing the systematic investigation of bias risks present in machine learning tools as an audit, it is worth noting that organizations may initially view the exercise as intrusive or punitive. As the above discussion of auditing reveals, productive evaluations require collaboration from the people who built and use the model, so it is important to actively combat this perception. Rather, the audit-ready organization is one that understands there are genuine benefits to having an objective extra set of eyes look for bias risks in a model. A few cultural aspects can help facilitate this productive exchange, including clear consensus around goals and cross-disciplinary inputs into the development process.

Principle 2: Examining outputs, as well as inputs

Another auditing principle that proves useful for evaluating algorithms is the notion that a system must be comprehensively assessed for integrity. In some ways, this framing actually contradicts the training of data scientists. As statistician Leo Breiman once wrote, "Predictive accuracy on test sets is the criterion for how good the model quality is."7 The focus on reducing the test error rate of a model, which represents performance on data that was not included in the training set, has shaped much of the progress made in machine learning over the past two decades. As one publication from the UC Berkeley School of Information notes, "A system with poor quality controls may produce good outputs by chance, but there may be a high risk of the system producing an error unless the controls are improved."8 Accordingly, audits cannot assume that a seemingly correct output from a model is sufficient evidence that the appropriate inputs were used, particularly when the goal is to minimize systematic bias.

Indeed, an over-emphasis on test error rates is particularly problematic from the perspective of mitigating algorithmic bias. Consider what happens if a developer is building a model to predict the likelihood that an individual will repay a loan they are issued. During the testing process, the developer splits the population by demographic background and notices that the model is more likely to predict a positive outcome for certain groups. One possible reason for this discrepancy could be that the training data, which consists primarily of the credit records of individuals and their demographic characteristics, disproportionately represents one group. To improve the error rate for other subgroups, the developer might mechanically adjust some of the model's parameters so that it performs better across all groups. While this practice works as a band-aid solution for the existing population, it also means that the developer never has to question the structural features of the data that are feeding the biased results. This means such bias could reemerge as the model is deployed over new demographic subgroups.

The preceding example does not imply that the process of adjusting model parameters is inappropriate in and of itself. Rather, the practice draws attention to the fact that algorithmic audits are meant to identify sources of bias that might not receive much attention during development. As such, it is just as important for an audit to examine inputs and their processing as it is to measure outputs (predictions) for accuracy.
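To make the developer's subgroup check concrete, the sketch below reports error and positive-prediction rates broken out by demographic group instead of a single aggregate test error rate. It is a minimal illustration under assumed conventions, not a method from the article: the record layout and the field names (group, label, prediction) are hypothetical.

```python
# Minimal sketch of a disaggregated evaluation: rather than one overall
# test error rate, break predictions out by demographic group so gaps
# like the one in the loan example become visible.
# The record format and field names below are hypothetical.

from collections import defaultdict

def per_group_rates(records):
    """records: iterable of dicts with 'group', 'label', and 'prediction' keys.
    Returns {group: (error_rate, positive_rate, n)}."""
    stats = defaultdict(lambda: {"errors": 0, "positives": 0, "n": 0})
    for r in records:
        s = stats[r["group"]]
        s["n"] += 1
        s["errors"] += int(r["prediction"] != r["label"])
        s["positives"] += int(r["prediction"] == 1)
    return {
        g: (s["errors"] / s["n"], s["positives"] / s["n"], s["n"])
        for g, s in stats.items()
    }

# Tiny illustrative test set: 1 = predicted/actual repayment.
test_set = [
    {"group": "A", "label": 1, "prediction": 1},
    {"group": "A", "label": 0, "prediction": 1},  # false positive
    {"group": "B", "label": 1, "prediction": 0},  # false negative
    {"group": "B", "label": 0, "prediction": 0},
]

for group, (err, pos, n) in per_group_rates(test_set).items():
    print(f"group {group}: error rate {err:.2f}, positive rate {pos:.2f} (n={n})")
```

Note that a check like this only measures outputs; in the spirit of the principle above, an audit would pair it with questions about the inputs, such as how each group is represented in the training data in the first place.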
Principle 3: Relying on robust internal documentation

Finally, the tenet of auditing that is perhaps the most important for evaluating algorithms is also the most difficult and controversial. Namely, this is the idea that in order for audits to occur, the organization in question must make an effort to document its activities for the purposes of later review. Power describes this process as verifiability, or "the attribute of information which allows qualified individuals working independently of one another to develop essentially similar measures or conclusions from an examination of the same evidence, data or records."9 With respect to machine learning applications, verifiability might require keeping track of everything ranging from how the data is cleaned to how individuals are trained to interpret and act upon the model's results; one possible shape for such a record is sketched below.

At a high level, there are two types of challenges related to producing auditable machine learning applications. First,
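As a concrete illustration of the kind of record verifiability points toward, the sketch below logs a fingerprint of the dataset, the cleaning steps applied, and the model version behind a set of results, so that reviewers working independently can start from the same evidence. This is a minimal sketch under assumed conventions: the schema, field names, and model identifier are hypothetical, not a standard or anything specified in the article.

```python
# Minimal sketch of an audit-trail record for a model release. Every
# field name here is hypothetical; real documentation practices would
# be far more extensive.

import hashlib
import json
from datetime import datetime, timezone

def fingerprint(rows):
    """Stable hash of the training records, so two independent reviewers
    can confirm they are examining the same data."""
    blob = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

audit_record = {
    "recorded_at": datetime.now(timezone.utc).isoformat(),
    "model_version": "loan-risk-0.3.1",  # hypothetical identifier
    "training_data_fingerprint": fingerprint([{"id": 1, "income": 52000}]),
    "cleaning_steps": [  # applied in this order before training
        "dropped rows with missing income",
        "capped debt-to-income ratio at the 99th percentile",
    ],
    # How people are instructed to interpret and act on outputs.
    "reviewer_guidance": "scores above 0.7 require manual review",
}

# Persist the record for later, independent examination.
with open("audit_record.json", "w") as f:
    json.dump(audit_record, f, indent=2)
```

Even a lightweight record like this captures both halves of the earlier principle: what went into the model and how its outputs are meant to be used.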