This article is reposted from "The Power of the Machine".

About the author: Tong Chao is product director of Chuangxin Qizhi, responsible for product planning and management. He holds an MBA from George Washington University and a Master of Science degree from the Hong Kong University of Science and Technology.

Recently, AI ethics scientist Timnit Gebru was fired by Jeff Dean, a heavyweight in the field, and the dispute between the two has reached no clear conclusion. Judging from conversations with friends, though, what interests people is not the rights and wrongs of the matter but Gebru's position: AI ethics scientist. What kind of role is this, and what does it do? What is the relationship between AI and "ethics", and what should an "ethical" AI look like?


Why does AI intersect with the social science of ethics

AI is a natural science, and ethics is a relatively pure social science. Readers with some research background will notice that when the two intersect, we are likely to face a philosophical problem. Here we will not go into the tangled, brain-burning philosophical reasoning; we only discuss why AI raises problems of this kind that traditional technology does not.


The nature of "uncertainty" makes AI's decision-making inherently controversial

If we open the box of an AI (machine learning) algorithm, we find that the output of any model can be decomposed into a series of decimals: probabilities. For example, when an AI system reports that someone's credit card transaction is fraudulent, what the model actually outputs is P(fraud) = 0.971 (possibly a conditional probability). This kind of output is clearly different from the "ironclad proof" and "hard facts" we generally accept. This inherent uncertainty means that whenever AI participates in, or even makes, important decisions, those decisions will be doubted and even disputed. So when we use AI's capabilities, how we explain or accept this uncertainty matters a great deal. More importantly, understanding and accepting it is not only a technical problem but also a cognitive one that requires thorough and comprehensive thinking. A controversial example: even a company as strong as Google still made the serious mistake of labeling Black people as gorillas. How much public relations effort did Google need to make up for that one "uncertain" output?
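One way to make this uncertainty explicit is to treat the model's score as a probability rather than a verdict, and to send borderline cases to a person. The sketch below is illustrative only: the function name `decide` and both thresholds are assumptions chosen for the example, not values from any real fraud system.

```python
def decide(p_fraud: float, block_at: float = 0.95, review_at: float = 0.60) -> str:
    """Map a model's fraud probability to an action instead of a yes/no verdict.

    The thresholds are hypothetical; in practice they would be set from the
    cost of a missed fraud versus the cost of wrongly blocking a customer.
    """
    if not 0.0 <= p_fraud <= 1.0:
        raise ValueError("p_fraud must be a probability in [0, 1]")
    if p_fraud >= block_at:
        return "block"          # confident enough for the system to act alone
    if p_fraud >= review_at:
        return "human_review"   # uncertain band: a person makes the call
    return "allow"


print(decide(0.971))  # prints: block
```

Keeping an explicit "uncertain" band makes the probability visible to whoever is accountable for the decision, instead of hiding it behind a single cut-off.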


The emergence of AI has transformed the decision-making body and will inevitably bring about ethical challenges

When we take our hands and feet off the controls and hand ourselves over to an L4 self-driving vehicle, the decision-maker on the road has clearly shifted: from the driver to the vehicle, or more precisely, to the AI. We should also be aware that a shift in the decision-maker brings a shift in responsibility. Whether for vehicle safety, road safety, or pedestrian safety, the AI will bear, or should bear, that responsibility. Suppose a driverless vehicle has an accident on a highway and hits a pedestrian or a deer. Who bears the ultimate responsibility? Is the vehicle manufacturer responsible? Should the company that developed the AI be held accountable? Are the individuals who collected and labeled the data responsible? If we extend this logic to any industry, especially to scenarios involving critical human activities, the rapid substitution of AI for people will undoubtedly force a re-examination of ethics and legislation, so that we can live alongside AI safely and without worry.


Extra – Regarding the Executive Order of the U.S. Government on Trustworthy AI

On December 3, 2020, the White House issued an executive order on AI applications (for the definition of an executive order, please look it up yourself):

"Executive Order on Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government"


In summary, the White House requires all AI applications related to the U.S. government budget to follow these principles:

(1) Lawful and respectful of our Nation's values: serves national interests

(2) Purposeful and performance-driven: practicality first

(3) Accurate, reliable, and effective: excellent performance

(4) Safe, secure, and resilient: safety guaranteed

(5) Understandable: users, especially subject matter experts (SMEs), can understand it

(6) Responsible and traceable: clear assignment of authority and responsibility for AI

(7) Regularly monitored: AI can be tracked throughout the entire process

(8) Transparent: 100% transparent to oversight

(9) Accountable: organizational safeguards around AI

These principles all set a new level of requirements for the design, construction, and operation of AI. AI must not only be usable and easy to use; people must also dare to use it, use it with confidence, and use it fairly. What is clear is that in the ongoing technology competition, the United States is already laying out a more complete AI system.


Reliable AI system composition

Here we use a typical AI workflow to see which elements a trustworthy AI system needs to include, and expand on each of them in the following subsections.


Trusted AI system: starting from data

Speaking of Timnit Gebru, we might as well start from one of her papers to explore how to build a trustworthy data resource that AI can use, which also illustrates part of the work of an AI ethics scientist. The structure below partly follows Gebru's paper "Datasheets for Datasets"; specific borrowings are not marked individually:

Data is to machine learning what food is to humans. When we start an AI project, the first step must be to identify the data resources involved and to establish a complete life-cycle record for the dataset we need. This helps everyone involved in the AI project (including data engineers, algorithm engineers, and business experts) reach a comprehensive and consistent understanding of this most important asset. Here we record the information in a static way (which is also a current limitation of the method): a series of questions elicits the required information and gradually forms a datasheet for the dataset. Following the process of data use, what needs to be covered is:


(1) Planning stage: Before the dataset is used, the purpose of the data and its composition must be clearly described, so that future users of the data have the most direct understanding of why it was created, and so that the "transparency" of the whole AI process improves.

Purpose of use:

  • Was the dataset created for a specific task? Describe the task in one paragraph;
  • Who is responsible for creating the dataset, and which department or organization do they represent?
  • Who is the provider of the dataset, or of the resources used to create it?

Data composition:

  • What entities make up the dataset? Pictures, files, information about natural persons, or a combination of several entity types? List all entity types;
  • How many entities of each type does the dataset contain?
  • Does the dataset contain every record that could be obtained, or is it a subset extracted from the complete records? If a subset, how was it extracted?
  • An example of one raw record from the dataset;
  • Does the data carry labels or target variables?
  • Are fields or information missing from any records? Why are they missing?
  • Are there significant connections between records in the dataset, such as social network links?
  • Is there a recommended splitting strategy: what proportions and methods for the training, validation, and test sets?
  • Does the dataset contain recording errors, inherent noise, or redundant information?
  • Is the dataset self-contained? If it relies on external sources, what are they, will the external data continue to be supplied, and is there a risk of losing access?
  • Does the dataset contain confidential information? If yes, list the confidential fields and describe them;
  • Does the dataset contain personal information? If yes, describe the information related to individuals;
  • Does the dataset contain demographic information, such as age or gender?
  • Can individuals be identified and located from the dataset by any method, such as combining several pieces of information?
  • Does the dataset reveal other sensitive information, such as national security or corporate revenue?
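The question of whether individuals can be identified by combining several fields can be probed with a simple count. The sketch below is one possible check (in the spirit of k-anonymity), not a procedure taken from the Datasheets paper; the function name `smallest_group` and the toy field names are assumptions made for illustration.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records that share the same values for
    the given fields. A result of 1 means someone is unique on that combination.
    Assumes `records` is a non-empty list of dicts."""
    counts = Counter(
        tuple(rec[field] for field in quasi_identifiers) for rec in records
    )
    return min(counts.values())


# Toy records; field names and values are made up for illustration.
records = [
    {"age_band": "30-39", "gender": "F", "zip3": "100"},
    {"age_band": "30-39", "gender": "F", "zip3": "100"},
    {"age_band": "30-39", "gender": "M", "zip3": "100"},
    {"age_band": "40-49", "gender": "M", "zip3": "200"},
    {"age_band": "40-49", "gender": "M", "zip3": "200"},
]

k = smallest_group(records, ["age_band", "gender", "zip3"])
print(k)  # prints: 1, so one person is unique on this combination of fields
```

A single field may look harmless (here `age_band` alone gives groups of at least 2), while the combination singles someone out, which is why the datasheet asks about combinations explicitly.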

(2) Use stage: In the use stage, clear and detailed information about the data can significantly shorten the cold-start time of data engineers and algorithm engineers, and improve communication within and between teams.

Data collection:

  • How was each record collected: directly from a business system, or derived indirectly from other datasets?
  • Was each record collected through sensors, software systems, APIs, or manual entry? Can the validity and security of these methods be verified?
  • If the dataset was sampled from raw data, what was the sampling strategy?
  • Who took part in collecting the dataset? How were they paid?
  • Over what time period was the dataset collected?
  • Has the collection process been reviewed and verified by an institution or organization?
  • If the dataset involves individuals, answer the following questions (otherwise skip them):
    • Was the personal data obtained directly with the authorization of the person concerned, or through a third-party channel? What were the channels?
    • If it was obtained directly from individuals, do they know and agree that the data will be collected and used by the model?
    • If they agreed, is there a mechanism that lets them revoke this authorization at any time?
    • Has the impact on individuals of the AI model that uses their personal data been analyzed?


Data preprocessing:

  • Which preprocessing or labeling techniques were applied to the dataset, such as missing-value handling, feature extraction, or data binning?
  • After preprocessing, is the raw dataset also kept?
  • Was any software or tool used for preprocessing or labeling? What is the tool's name?

Data usage:

  • Has the dataset been used in other AI tasks? If yes, list and describe them;
  • A list of papers or systems that use the dataset;
  • Is the dataset likely to be used for other tasks?
  • Will the composition of the dataset or the records from the preprocessing stage affect later use? Specifically, could modeling lead to unfair treatment of particular groups or individuals?
  • Are there tasks for which the dataset should not be used?

(3) Operation stage: At this stage, the dataset, after analysis or modeling, directly serves users or scenarios. Continuously updating and tracking the dataset makes it possible to manage the effectiveness and performance of the data and the model in a timely way, and also ensures the credibility of the application.

Data distribution:

  • Will the dataset be licensed for distribution to third parties, and who are they?
  • How will the dataset be distributed: tarball, GitHub, API, or something else?
  • When will the dataset be distributed?
  • Will distribution of the dataset involve intellectual property rights, terms of use, or agreements?
  • When the dataset is distributed, are there intellectual property restrictions attached to the original data?
  • When the dataset is distributed, is there an expert or responsible person who supervises the process and answers for its results?

Data maintenance:

  • Who is responsible for maintaining the dataset? What is their contact information?
  • Does the dataset have errata, and how can they be accessed?
  • Will the dataset be updated, and what will be updated?
  • If personal data is involved, is there a time limit or other restriction on its use?
  • Is the dataset versioned? How are new and old versions managed, and do old versions still need to be maintained?
  • For updates and maintenance, what mechanism lets contributors contribute to the dataset?

For a complete example, see the sample in the paper.

The questions above are as detailed as possible so that, when we examine and use data, we can analyze its credibility across the whole AI process from multiple dimensions: technology, business, law, and ethics.
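Because a datasheet is a static record of questions and answers, it can be kept as a small structured object and checked for gaps before a project moves on. The following is a minimal sketch under stated assumptions: the `Datasheet` class, the `REQUIRED` question keys, and the sample answers are all hypothetical and cover only a tiny subset of the questions listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Datasheet:
    """A static record of answers, grouped by datasheet section."""
    name: str
    answers: dict = field(default_factory=dict)  # section -> {question: answer}

    def answer(self, section: str, question: str, text: str) -> None:
        self.answers.setdefault(section, {})[question] = text

    def unanswered(self, required: dict) -> list:
        """Return (section, question) pairs from `required` with no answer yet."""
        missing = []
        for section, questions in required.items():
            for question in questions:
                if not self.answers.get(section, {}).get(question, "").strip():
                    missing.append((section, question))
        return missing


# Hypothetical subset of required questions, keyed by short names.
REQUIRED = {
    "purpose": ["task", "owner"],
    "composition": ["entity_types", "contains_personal_data"],
    "maintenance": ["maintainer_contact"],
}

sheet = Datasheet(name="card-transactions-2020")
sheet.answer("purpose", "task", "Detect fraudulent credit card transactions.")
sheet.answer("composition", "contains_personal_data", "Yes: hashed card number.")

print(sheet.unanswered(REQUIRED))
# prints: [('purpose', 'owner'), ('composition', 'entity_types'), ('maintenance', 'maintainer_contact')]
```

A check like this could run as part of project review, so that a dataset cannot be handed to modeling while planning-stage questions are still blank.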


Trusted AI system: the "black box" of the model

When it comes to trust in AI, beyond scrutiny of the raw data, what is even more elusive is the seemingly mysterious and complex machine learning model. If people cannot understand or accept these models, the results they output will inevitably be questioned too. The problem is especially acute in public matters involving social resources, such as employment, medical care, financial services, and law enforcement. There is an example in the United States: some district court judges have begun to use AI models to predict sentences for suspects. Setting aside for now whether this application is just, and assuming it is legally and morally acceptable, a core condition for judges to use it with confidence is whether they can understand the AI model and can regard its judgments as consistent with their own reasoning.


Here I will mention Gebru again. In their research, her Google team and scholars from the University of Toronto jointly proposed Model Cards, a method similar to Datasheets. A model card is a one-to-two-page record attached to a released machine learning model. It explains the model's information along multiple dimensions to the relevant readers, in order to open the "black box" of the AI model.

A model card can be divided into the following elements:


Model details:

  • Model developer (organization): the responsible person or team;
  • Model release date: the date and time the model was released and went live;
  • Model version: the version number at release;
  • Model type: traditional machine learning model, deep model, etc.;
  • References: papers the model is based on or reproduces, or open source tools and other sources;
  • Citation details for the model;
  • License for using the model;
  • Where to send feedback on the model (contact information);

Model use cases:

  • Use cases or scenarios in which the model is applied, such as real-time fraud detection for credit card transactions;
  • The users (groups) the model is intended for;
  • Restricted or prohibited use cases, for example, groups on which face recognition must not be used;

Model elements:

  • Groups the model applies to: especially where natural persons are involved, clearly define the applicable groups, and describe model performance and how it differs between groups;
  • External equipment or systems the model applies to: for example, in some computer vision (CV) scenarios, different camera models may cause differences in model performance;
  • External environment the model is suited to: again in CV scenarios, different lighting conditions, ambient temperature, and humidity may cause the model's results to drift or even fail;
  • Other "silent" elements: beyond the above, elements hidden beneath the surface may also affect the model's results;
  • Evaluation elements: does the design of certain evaluation indicators affect how comprehensively the model is examined?

Model indicators:

  • Which indicators were selected for the model, and why?
  • Indicator thresholds: what are the reasons for adjusting the default thresholds of key indicators?
  • How are uncertainty and variability handled?

Evaluation data:

  • A description of the dataset used for evaluation and the reason for choosing it;
  • Was this dataset preprocessed? If so, how?

Training data:

  • Refer to the Datasheets content above, either by citation or as a condensed summary;

Quantitative Analysis:

  • Go back to the framework of model elements, report quantitative results for each element, and record them in the model card;
  • Report quantitative results across combinations of different model elements;
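Reporting results per element means computing the same metric separately for each group or condition rather than once overall. The sketch below assumes binary labels and uses camera model as the element; the function `rates_by_group` and the toy rows are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def rates_by_group(rows):
    """Compute accuracy and false positive rate separately for each group.

    Each row is (group, true_label, predicted_label) with labels 0 or 1.
    """
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "fp": 0, "neg": 0})
    for group, y_true, y_pred in rows:
        s = stats[group]
        s["n"] += 1
        s["correct"] += int(y_true == y_pred)
        if y_true == 0:
            s["neg"] += 1
            s["fp"] += int(y_pred == 1)
    return {
        group: {
            "accuracy": s["correct"] / s["n"],
            # None when the group has no negative examples to measure against
            "false_positive_rate": s["fp"] / s["neg"] if s["neg"] else None,
        }
        for group, s in stats.items()
    }


# Toy evaluation rows; the groups and labels are made up for illustration.
rows = [
    ("camera_A", 0, 0), ("camera_A", 1, 1), ("camera_A", 0, 0), ("camera_A", 1, 1),
    ("camera_B", 0, 1), ("camera_B", 1, 1), ("camera_B", 0, 0), ("camera_B", 1, 0),
]

report = rates_by_group(rows)
for group, metrics in report.items():
    print(group, metrics)
```

In this toy data the overall accuracy is 0.75, which hides the fact that the model is perfect on `camera_A` and only 0.5 accurate on `camera_B`; the per-element breakdown is what makes that gap visible in the model card.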

Ethical considerations:

  • Was private data used during development?
  • Will the model be applied to matters critical to people, such as safety or medical care?
  • Does use of the model pose risks or harm to some people?
  • Were any risk-reduction measures taken during model development?
  • Are any of the listed use cases controversial?

Reminders & Suggestions:

  • List any important information about the model that is not covered above;

Example model card: [figure]


The model card carries the Datasheets approach over from data to models. Through a series of questions it forms a practical framework for thinking that connects the different AI stakeholders, so that together they improve the transparency and credibility of AI. In my opinion, this is also the most down-to-earth solution that AI ethics research has produced so far. It is hard to push forward, but for any mature AI application this homework will have to be done sooner or later.


Trusted AI system: human-machine collaboration

From a technical perspective, AI has always pursued "unmanned" operation and "automation", and this is obviously what we hope AI can achieve. But from the perspective of a product that solves problems, or from a social perspective, the better way to apply AI must be human-machine collaboration. Humans without AI simply go back to the earlier way of life, with less efficiency and more tedious work; but AI without humans is likely to face failure, abuse, or even rebellion.


In the collaboration between humans and AI, a complete loop (Human-in-the-loop AI) is formed so that these two objects of thinking can form a synergy to solve common problems faster and more accurately.

From the human-in-the-loop (HITL) perspective, to create a trustworthy AI we must also try to answer these questions:

(1) Do humans and AI share a common task to solve, and can we ensure during design and development that the interests of both sides are aligned?

(2) Can people take part in the AI's inputs and outputs at every step, starting from the data, so that they can intervene and supply information at any time?

(3) Can the AI be designed to respond to people's correct suggestions and adjustments promptly, or even in real time, and optimize the algorithm's objectives accordingly?

(4) Can people promptly extract new insights or knowledge from this human-machine system and absorb them into their own body of knowledge?

Implementing a human-machine collaborative AI system is not easy. But with such collaboration, credibility can be built into every operation and suggestion in the process and becomes a process variable. Trust will also accumulate over time, lowering the threshold and shortening the cycle for establishing "credibility".
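The loop described in this section can be sketched in a few lines: the model handles the cases it is confident about, a person handles the rest, and the person's answers are kept as feedback for later retraining. Everything here is an illustrative assumption, including the function names, the 0.9 confidence cut-off, and the toy model and reviewer.

```python
def run_with_human_in_loop(items, model, ask_human, min_confidence=0.9):
    """Let the model decide confident cases and route the rest to a person.

    Returns the final decisions plus the human-reviewed cases, which can be
    fed back to the model as new training data.
    """
    decisions, feedback = {}, []
    for item in items:
        label, confidence = model(item)
        if confidence < min_confidence:
            label = ask_human(item, label)  # the person may accept or override
            feedback.append((item, label))
        decisions[item] = label
    return decisions, feedback


# Stand-ins for a real model and a real reviewer (both hypothetical).
def toy_model(amount):
    if amount > 5000:
        return "fraud", 0.97
    return "ok", (0.6 if amount > 1000 else 0.99)


def toy_reviewer(amount, suggested):
    return "fraud" if amount == 3000 else suggested


decisions, feedback = run_with_human_in_loop(
    [50, 3000, 1500, 9000], toy_model, toy_reviewer
)
print(decisions)  # prints: {50: 'ok', 3000: 'fraud', 1500: 'ok', 9000: 'fraud'}
print(feedback)   # prints: [(3000, 'fraud'), (1500, 'ok')]
```

The `feedback` list is where trust can accumulate: each reviewed case is a record of where the person and the model agreed or disagreed.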

Trusted AI system: third-party certification

Ever since the notion of credit came into being, credit has needed an institution to endorse it, from the state down to expert certification, to keep the credit system running. So for AI applications too, anyone who wants to build a trustworthy system needs third-party endorsement that verifies its credibility from the outside, so that consumers or users of the AI can obtain that confirmation as quickly as possible. At present there are several possible routes:

(1) GDPR (General Data Protection Regulation): GDPR is a mandatory compliance requirement for the companies it applies to, not a qualification certificate. There is currently no unified method of certification or accreditation for GDPR compliance. We can use certain certifications in the ISO family, together with expert guidance, to approximate a "certification" within the GDPR framework;

(2) CCPA (California Consumer Privacy Act): Passed in 2018, California's law is in the same position as GDPR: it cannot be certified against directly, so existing certification methods have to be mapped to the CCPA's requirements in a similar way;

(3) Classified protection certification (dengbao): the information system security level protection certification administered by China's Ministry of Public Security, which is probably the most direct authoritative certification that companies in China can obtain;

(4) ISO system certification;

(5) Industry conventions or agreements: such as the Basel Accords in the banking industry;


1. "Datasheets for Datasets", Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, Kate Crawford

2. "Model Cards for Model Reporting", Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, Timnit Gebru

3. "Executive Order on Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government"

4. "GDPR Regulations"

5. "CCPA Fact Sheet"