Center for Human-Compatible AI

The Center for Human-Compatible AI (CHAI) is an academic research centre at the University of California, Berkeley that carries out technical and advocacy work to help ensure the safety of AI systems and build the field of future AI researchers.

What problem are they trying to solve?

In building advanced machine intelligence, we would forfeit our position as the most intelligent force on the planet, and we are currently doing so without a clear plan. Given the potential benefits we could enjoy if the transition to advanced general AI goes well, successfully navigating the transition to advanced AI seems to be one of the most important challenges we will face.

Artificial intelligence research is concerned with the design of machines capable of intelligent behaviour, i.e., behaviour likely to be successful in achieving objectives. The long-term outcome of AI research seems likely to include machines that are more capable than humans across a wide range of objectives and environments. This raises a problem of control: given that the solutions developed by such systems are intrinsically unpredictable by humans, it may occur that some such solutions result in negative and perhaps irreversible outcomes for us. CHAI’s goal is to ensure that this eventuality cannot arise, by refocusing AI away from the capability to achieve arbitrary objectives and towards the ability to generate provably beneficial behaviour. Because the meaning of beneficial depends on properties of humans, this task inevitably includes elements from the social sciences in addition to AI.

CHAI is led by Stuart Russell, Anca Dragan, Pieter Abbeel and other Faculty Principal Investigators. Professor Stuart Russell is a co-author of Artificial Intelligence: A Modern Approach, the leading AI textbook^¹ used in over 1,500 universities, and the author of Human Compatible: Artificial Intelligence and the Problem of Control, a 2019 popular science book that discusses threats to humanity from artificial intelligence. CHAI’s mission is to ensure that AI systems are provably beneficial for humans.

In recent years, machine learning approaches to AI development have made strong progress in a number of domains. AI systems now surpass humans at image recognition and at games such as chess, Go and poker; they have made huge progress in areas such as translation; and they have even made novel scientific discoveries, such as predicting how proteins will fold. Figure 1 illustrates the rapid recent improvements in AI image generation: AIs are now able to produce synthetic images that are nearly indistinguishable from photographs, whereas only a few years ago the images they produced were crude and unrealistic.

Figure 1. Improvement in AI-generated synthetic images, 2014 to 2017.

Source: Brundage et al., The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation (2018): p.15. In order, the images are from papers by Goodfellow et al. (2014), Radford et al. (2015), Liu and Tuzel (2016), and Karras et al. (2017).

What do they do?

CHAI’s goal is to develop the conceptual and technical wherewithal to reorient the general thrust of AI research towards provably beneficial systems. CHAI aims to do this by developing a “new model” of AI, in which (1) the machine’s objective is to help humans realize the future we prefer; (2) the machine is explicitly uncertain about those human preferences; and (3) human behavior provides evidence of human preferences. This is unlike the standard model for AI, in which the objective is assumed to be known completely and correctly. CHAI’s research focuses on the following areas related to AI safety:

Value alignment through, e.g., inverse reinforcement learning from multiple sources (such as text and video).
Value functions defined by partially observable and partially defined terms (e.g. “health,” “death”).
The structure of human value systems, and the implications of computational limitations and human inconsistency.
Conceptual questions including the properties of ideal value systems, trade-offs among humans, and the long-term stability of values.
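
To make the three principles of the “new model” more concrete, the following is a minimal toy sketch, not CHAI’s actual method or code. It assumes a machine that holds a probability distribution over two hypothetical candidate reward functions, treats observed human choices as evidence via a Boltzmann-rational choice model (a common modelling assumption, chosen here for illustration), and then acts to maximize expected reward under its remaining uncertainty. All names and numbers are invented for the example.

```python
import math

# Hypothetical hypotheses about what the human values (illustrative numbers only).
CANDIDATE_REWARDS = {
    "prefers_tea": {"tea": 1.0, "coffee": 0.0, "water": 0.2},
    "prefers_coffee": {"tea": 0.0, "coffee": 1.0, "water": 0.2},
}


def choice_likelihood(choice, options, reward, rationality=3.0):
    """Assumed Boltzmann-rational human: P(choice) is proportional to exp(rationality * reward)."""
    weights = {o: math.exp(rationality * reward[o]) for o in options}
    return weights[choice] / sum(weights.values())


def update_beliefs(prior, choice, options):
    """Principle 3: apply Bayes' rule after observing one human choice."""
    unnormalized = {
        name: prior[name] * choice_likelihood(choice, options, CANDIDATE_REWARDS[name])
        for name in prior
    }
    total = sum(unnormalized.values())
    return {name: p / total for name, p in unnormalized.items()}


def expected_reward(beliefs, outcome):
    """Principle 1: score an outcome by the human's reward, averaged over the machine's beliefs."""
    return sum(p * CANDIDATE_REWARDS[name][outcome] for name, p in beliefs.items())


options = ["tea", "coffee", "water"]

# Principle 2: the machine starts out explicitly uncertain about the human's preferences.
beliefs = {"prefers_tea": 0.5, "prefers_coffee": 0.5}

# The machine watches the human choose tea twice and updates its beliefs each time.
for observed in ["tea", "tea"]:
    beliefs = update_beliefs(beliefs, observed, options)

best_action = max(options, key=lambda o: expected_reward(beliefs, o))
print(beliefs, best_action)
```

In this sketch the machine never assumes it knows the objective outright: it only becomes confident as evidence accumulates, which is the contrast with the standard model described above.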

CHAI’s research spans computer science, psychology, economics and other areas, and CHAI faculty and Principal Investigators include academics at various world-leading American universities. CHAI is one of the few academic research centers that is focused solely on the safety of advanced AI. Given that the field of AI safety is controversial, especially in relation to global catastrophic risk, CHAI’s ability to afford the topic academic legitimacy is likely to be especially valuable. CHAI faculty members and Principal Investigators have a very strong reputation in the field.

Why do we recommend them?

Open Philanthropy, our research partner, recommends CHAI as one of the highest-impact organizations working on AI alignment in the world.
CHAI faculty, affiliates and students have already produced an impressive amount of high-quality technical AI safety research.
The field of AI safety is still young and CHAI is actively involved in building it, for instance, by offering internships to potential AI safety researchers, hosting workshops on AI safety and funding graduate students to do AI safety research.
CHAI actively engages in positively shaping public policy and public opinion surrounding AI safety. For example, Professor Stuart Russell authored a popular science book on AI safety, Human Compatible: Artificial Intelligence and the Problem of Control, and has independent observer status as a member of the Global Partnership on AI, an international body with 15 member states, including the US and the EU.

CHAI has only been operating since 2016, but it has already had significant success in its three main pathways to impact: research, field-building and thought leadership.

Technical AI safety research

Technical research on AI safety is CHAI’s primary focus. This generally involves publishing papers in top journals and presenting papers at major conferences. According to the expert we defer to in this area, Daniel Dewey, there is some evidence that CHAI is having a positive impact on the growth of the technical AI safety field.^² Since 2016, dozens of papers have been published by CHAI faculty, affiliates and students (CHAI maintains a full list of these papers). One of their most potentially significant achievements is the introduction of Cooperative Inverse Reinforcement Learning and the proof that it leads to provably beneficial AI systems that are necessarily deferential to humans.^³ This paper and two other papers by CHAI researchers have been accepted by NeurIPS, a major conference on machine learning, making these some of the best-received AI safety papers among machine learning researchers.^⁴ For an extensive list of CHAI’s technical AI safety papers and other achievements, see their Progress Report.

Building the AI safety field

The field of AI safety research is immature and is dwarfed by efforts to improve the capabilities of AI systems. CHAI works to ensure that there is a robust pipeline of future safety researchers, and a thriving community to support them. For example, CHAI usually hosts around seven interns per year (these are sometimes students and are sometimes people working in technical jobs who want to pivot to contributing to AI safety research) and for the 2020-21 academic year, CHAI is funding and training 25-30 PhD students. CHAI alumni have placed very well, accepting positions at Stanford, Princeton, MIT and DeepMind.

CHAI hosts an annual workshop to bring together experts on AI safety and related areas from around the world. At the third workshop in May 2019, CHAI hosted 85 attendees. The 2020 workshop was held online due to the COVID-19 pandemic and had 150 participants. These meetings may be valuable for relationship building and the generation of ideas. The estimated cost for the in-person workshop is about $1,000 per attendee.

Influencing public policy and opinion

CHAI Principal Investigators are frequently invited to give talks, share insights and provide input on national AI strategy. CHAI’s work in this area has included:

Professor Stuart Russell’s popular science book Human Compatible: Artificial Intelligence and the Problem of Control.
Dozens of talks, including Professor Stuart Russell’s popular TED talk on ‘3 principles for creating safer AI’ and multiple talks at the World Economic Forum in Davos and the Nobel Week Dialogues in Stockholm and Tokyo.
A variety of media articles, including “Yes, We Are Worried About the Existential Risk of Superintelligence” in MIT Technology Review.
Invitations to advise the governments of numerous countries.
Professor Stuart Russell originated and co-wrote “Slaughterbots,” a short video campaigning against the development and use of autonomous weapons, which has been viewed more than 70 million times.^⁵
Professor Stuart Russell has independent observer status as a member of the Global Partnership on AI, an international body with 15 member states, including the US and the EU.

Given the controversial status of AI risk in some quarters, we believe that this kind of work is especially valuable.

Why do we trust this organization?

For this recommendation, we are grateful to be able to utilize the in-depth expertise of, and background research conducted by, current and former staff at Open Philanthropy, the world’s largest grant-maker on global catastrophic risk. Open Philanthropy identifies high-impact giving opportunities, makes grants, follows the results and publishes its findings. (Disclosure: Open Philanthropy has made several unrelated grants to Founders Pledge.)

CHAI is one of the few academic centers devoted to the development of provably safe AI systems. CHAI is especially well-placed to produce reliably positive impact because it is based in a world-leading university and because its faculty members and Principal Investigators have an excellent reputation in the field. Professor Stuart Russell, in particular, is more focused on the extreme downside risk of transformative AI systems than any other comparably senior mainstream researcher.^⁶ As a co-author of one of the leading AI textbooks, Professor Russell has an excellent reputation in the field.

CHAI researchers have shown the ability to produce high-quality and widely respected research in a short space of time, and to communicate the potential downside risks of AI to wider audiences. CHAI has faculty members across numerous major universities, including UC Berkeley, Cornell and the University of Michigan, and there is potential for branches to be opened in other world-leading universities. We therefore believe that, for those concerned about AI safety, CHAI is one of the best donation opportunities available.

Message from the organization

Since its founding in 2016, CHAI has grown from a small group of PIs and PhD students to 9 faculty investigators, 18 affiliate faculty, around 30 additional graduate and postdoctoral researchers (including roughly 25 PhD students), many undergraduate researchers and interns, and a staff of 5. Our research is the primary focus of our work.

CHAI’s research output includes foundational work to re-frame AI on the basis of a new model that factors in uncertainty about human preferences, in contrast to the standard model for AI in which the objective is assumed to be known completely and correctly. Our work includes topics such as misspecified objectives, inverse reward design, assistance games with humans, obedience, preference learning methods, social aggregation theory, interpretability, and vulnerabilities of existing methods.

Given the massive resources worldwide devoted to research within the standard model of AI, CHAI’s undertaking also requires engaging with this research community to adopt and further develop AI based on this new model. In addition to academic outreach, CHAI strives to reach general audiences through publications and media. We also advise governments and international organizations on policies relevant to ensuring AI technologies will benefit society, and offer insight on a variety of individual-scale and societal-scale risks from AI, such as those pertaining to autonomous weapons, the future of employment, and public health and safety.

The October 2019 release of the book Human Compatible explained the mission of the Center to a broad audience just before most of us were forced to work from home. The pandemic has made evident our dependence on AI technologies to understand and interact with each other and with the world outside our windows. The work of CHAI is crucial now, not just in some future in which AI is more powerful than it is today.

Thank you for your interest in CHAI. We are grateful for the generous sponsors who have made our work possible, and would be pleased to meet with new prospective sponsors.

CHAI leadership.

More resources

Stuart Russell

BBC Radio 4 Today Programme: Filmed interview, October 7, 2019.
New York Times: How to Stop Superhuman A.I. Before It Stops Us (adapted excerpt), October 8, 2019.
The Economist: The promise and peril of AI, interview by Kenneth Cukier, October 9, 2019.
Financial Times: Stuart Russell on losing control of AI, Techtonic podcast interview by John Thornhill, October 20, 2019.
The Guardian: AI and our future, review by Ian Sample, October 24, 2019.
Vox: AI could be a disaster for humanity. A top computer scientist thinks he has the solution, interview by Kelsey Piper, October 26, 2019.
Sunday Times Magazine: The end of humanity: will artificial intelligence free us, enslave us or exterminate us?, interview by Danny Fortson, October 27, 2019.
C-SPAN: Human Compatible, televised interview with Richard Waters (Financial Times), November 13, 2019.
Wall Street Journal: "Human Compatible" and "Artificial Intelligence" Review: Learn Like a Machine, by Matthew Hutson, November 19, 2019.
Forbes Magazine: Leading AI Luminary Has An Idea To Ensure Humans Remain In Control, by Peter High, January 13, 2020.
McKinsey Global Institute: How to ensure artificial intelligence benefits society: A conversation with Stuart Russell and James Manyika, January 2020.
Slate Star Codex: Book Review: Human Compatible, by Scott Alexander, January 30, 2020.
Quanta Magazine: Artificial Intelligence Will Do What We Ask. That's a Problem., by Natalie Wolchover, January 31, 2020.
Future of Life Institute podcast: Steven Pinker and Stuart Russell on the Foundations, Benefits, and Possible Existential Threat of AI, June 15, 2020.
Carnegie Council for Ethics in International Affairs: The Future of Artificial Intelligence, with Stuart J. Russell, interview by Alex Woodson, February 24, 2020.
New York Times: Killer Robots Aren't Regulated. Yet., by Jonah Kessel, December 13, 2019. Includes short video "Stuart Russell on The Inevitability of general AI"
Human Compatible is one of the best books of the year in The Daily Telegraph, The Financial Times, The Guardian, Forbes Magazine, and El Tiempo.

Notes

1. Anna Nowogrodzki, ‘Mining the Secrets of College Syllabuses’, Nature 539, no. 7627 (November 2016): 125–26, https://doi.org/10.1038/539125a.
2. “Suggestions for Individual Donors from Open Philanthropy Project Staff - 2017.”
3. Dylan Hadfield-Menell et al., “Cooperative Inverse Reinforcement Learning,” June 9, 2016, https://arxiv.org/abs/1606.03137/.
4. Dylan Hadfield-Menell et al., “Inverse Reward Design,” in Advances in Neural Information Processing Systems, 2017, 6765–6774; Nishant Desai, Andrew Critch, and Stuart J. Russell, “Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making,” in Advances in Neural Information Processing Systems, 2018, 4717–4725.
5. “Following on the success of the Slaughterbots video at the end of 2017, which won a Gold Medal at the Cannes Corporate Media & TV Awards for best viral film of the year and now has an estimated 70 million views across various social media platforms, we continued working in support of a global ban on LAWS.” Future of Life Institute, Annual Report 2018. Note that this estimate is from 2018, so the total number of views today is likely higher.
6. “UC Berkeley — Center for Human-Compatible AI,” Open Philanthropy Project, May 23, 2016, https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/uc-berkeley-center-human-compatible-ai/.
