
Arguably the most difficult open problem in the development of Artificial Intelligence is the ‘Alignment Problem’: roughly, how do we ensure that AI processes and outputs align with our best considered judgments about what is and is not valuable? Current strategies for achieving alignment, ranging from regulation to design interventions such as reinforcement learning from human feedback, prompt engineering, data scrubbing, and the like, appear promising but are also riddled with spectacular failures. Proponents of such strategies hold out the hope that more data, more learning, and more resource-draining computing power will eventually iron out alignment difficulties.
Common to all these approaches is the strategy of ‘fixing’ AI after it has been created. Instead of imposing such ad hoc guardrails, I suggest a more radical approach to Alignment: a reexamination of the fundamentals of AI design. In particular, I identify two fundamentally mistaken assumptions about human values made in current AI design. These mistakes suggest that i) no amount of data scrubbing, prompt engineering, or reinforcement learning from human feedback can achieve alignment, given the nature of human values, and ii) an alternative AI design that corrects for these two mistakes could go a long way toward achieving value alignment. I propose such an alternative, the ‘Parity Model’, which has two distinctive features: it is a ‘values-based’ approach to AI design, and it recognizes that machine-human value alignment requires the development of what I call ‘small ai’.
Speaker
Prof. Ruth Chang
Professor of Jurisprudence, University of Oxford
Online
No registration is required.
Meeting ID: 973 4712 4276
Link: https://cuhk.zoom.us/j/97347124276
Face-to-face
Register by 23 March 2025
Link: https://cloud.itsc.cuhk.edu.hk/webform/view.php?id=13705343
Enquiry:
Tel: (852) 3943 7135
Fax: (852) 2603 5323
Website: http://phil.arts.cuhk.edu.hk/