
Arguably the most difficult open problem in the development of Artificial Intelligence is the ‘Alignment Problem’: roughly, how do we ensure that AI processes and outputs align with our best considered judgments about what is and is not valuable? Current strategies for achieving alignment, ranging from regulation to design interventions such as reinforcement learning from human feedback, prompt engineering, data scrubbing, and the like, appear promising but are also riddled with spectacular failures. Proponents of such strategies hold out the hope that more data, more learning, and more resource-draining computing power will eventually iron out alignment difficulties.
Common to all these approaches is the strategy of ‘fixing’ AI after it has been created. Instead of imposing such ad hoc guardrails, I suggest a more radical approach to Alignment: a reexamination of the fundamentals of AI design. In particular, I identify two fundamentally mistaken assumptions about human values made in current AI design. These mistakes suggest that i) no amount of data scrubbing, prompt engineering, or reinforcement learning from human feedback can achieve alignment, given the nature of human values, and ii) an alternative AI design that corrects for these two mistakes could go a long way toward achieving value alignment. I propose such an alternative, the ‘Parity Model’, which has two distinctive features: it is a ‘values-based’ approach to AI design, and it recognizes that machine-human value alignment requires the development of what I call ‘small ai’.
Speaker
Prof. Ruth Chang
Professor of Jurisprudence, University of Oxford
Online
No registration is required.
Meeting ID: 973 4712 4276
Link: https://cuhk.zoom.us/j/97347124276
Face-to-face
Register by 23 March 2025
Link: https://cloud.itsc.cuhk.edu.hk/webform/view.php?id=13705343
Enquiry:
Tel: (852) 3943 7135
Fax: (852) 2603 5323
Website: http://phil.arts.cuhk.edu.hk/