PHIL7004: AI Safety & Security

Spring 2026

Course Overview

Advanced artificial intelligence systems offer the potential for unprecedented economic and scientific growth, but also present a unique class of risks. Beyond the immediate challenges of bias or labor displacement, a growing body of philosophical and technical literature argues that AI poses an "existential risk"—the possibility that misaligned advanced systems could permanently curtail humanity's future or lead to human extinction. This course investigates the arguments supporting and opposing these concerns.

Specifically, this seminar tackles six core questions:

Instrumental Convergence: Will sufficiently intelligent agents necessarily seek power and resources, regardless of the goals we give them?
The Singularity: Is a rapid "intelligence explosion" inevitable, or will compute bottlenecks and economic friction slow progress?
Misalignment: Can we prevent AI from engaging in deception or power-seeking behavior when its internal objectives diverge from our own?
Gradual Disempowerment: Will an AI takeover happen suddenly, as in "hard takeoff" scenarios, or will humanity slowly hand over the reins of civilization?
Lethal Autonomous Weapons: Is it ethically permissible to delegate lethal force to autonomous systems?
AI Rights: If AI systems become sentient or sufficiently sophisticated in other ways, do we owe them moral rights—and does granting those rights compromise or promote human safety?

Course Schedule (Spring 2026)

January 19

Introduction to the course

Required Reading

Greaves - Concepts of Existential Catastrophe

Bales et al - Artificial Intelligence: Arguments for Catastrophic Risk

D'Alessandro and Kirk-Giannini - Artificial Intelligence: Approaches to Safety

Recommended Reading / Other Resources

CAIS - Overview of Catastrophic AI Risk (Sections 1.1-1.6)

January 26

Instrumental Convergence

Required Reading

Bostrom - The Superintelligent Will

Southan et al - A timing problem for instrumental convergence

Sharadin - Promotionalism, orthogonality, and instrumental convergence

Recommended Reading / Other Resources

Gallow - Instrumental Divergence

February 2

The Singularity Hypothesis

Required Reading

Thorstad - Against the Singularity Hypothesis

Kirk-Giannini and Davidson - Rebooting the Singularity

Recommended Reading / Other Resources

Davidson - Will Compute Bottlenecks Prevent a Software Intelligence Explosion?

Chalmers - The Singularity: A Philosophical Analysis

Eth and Davidson - Will AI R&D Automation Cause a Software Intelligence Explosion?

David Thorstad's Blog Series on the Singularity Hypothesis

February 9

Misalignment

Required Reading

Ngo and Bales - Deceit and Power: Machine Learning and Misalignment

Carlsmith - Existential Risk from Power-Seeking AI

Recommended Reading / Other Resources

David Thorstad's Blog Series on Instrumental Convergence & Power-Seeking

Thorstad - What Power-Seeking Theorems Do Not Show

Gabriel and Keeling - A matter of principle

February 16 NO CLASS (Lunar New Year)

February 23 NO CLASS (Lunar New Year)

March 2

Midterm Exam (Study Guide)

March 9 NO CLASS (Reading Week)

March 16 NO CLASS (General Holiday)

March 23

Gradual Disempowerment

Required Reading

Kulveit et al - Gradual Disempowerment

Bales - AI takeover and human disempowerment

Recommended Reading / Other Resources

Kasirzadeh - Two Types of AI X-Risk: Decisive and Accumulative

March 30

Lethal Autonomous Weapons (LAWs)

Required Reading

Sparrow - Killer Robots

Simpson and Muller - Just War and Robots' Killings

Recommended Reading / Other Resources

Muller - Autonomous Killer Robots are Probably Good News

Longpre et al - LAWS and AI - Trends

April 6 NO CLASS (General Holiday)

April 13

AI Rights & Safety

Required Reading

Salib and Goldstein - AI Rights for Human Safety

LoPucki - Algorithmic Entities

April 20

Power, Trust, and Technology

Required Reading

Whittaker - The Steep Cost of Capture

O'Neill - Linking Trust to Trustworthiness

Winner - Do Artifacts Have Politics?

April 27

Presentations

Signup here

May 4

Presentations

Signup here

May 11

Individual & Small Group Paper Meetings

Signup here

Assessments

Midterm Exam (30%)

Midterm Exam Study Guide

Date: March 2

Format: In-class. A combination of multiple-choice questions (MCQs) and short-answer questions.

Description: This exam evaluates your technical precision and conceptual understanding of the material covered from Weeks 1 through 5. The MCQs will test your grasp of specific definitions, distinctions, and logic (e.g., distinguishing between instrumental convergence and orthogonality), while the short-answer questions will require you to succinctly explain core arguments or reconstruct a specific dialectic from the readings.

Weekly Reading Response (15%)

Due: 24 hours before class (via the submission link below).

Format: 300–500 words.

Description: For 5 of the 6 instruction weeks, you must submit a short analytic response to one of the required readings for that week. These responses must not merely summarize the text; instead, they should either reconstruct a specific argument in logical form or raise a focused philosophical objection to a specific premise. These submissions will serve as starting points for seminar discussions. For more details, see the Taxonomy of Responses.

Paper Abstract & Annotated Bibliography (10%)

Due: March 23 (first class following the break).

Format: 500-word extended abstract + 5–8 annotated sources. Submit via Moodle.

Description: You will submit a formal proposal for your Final Paper consisting of an extended 500-word abstract and an annotated bibliography. The abstract must go beyond a simple topic description by outlining the logical progression of your intended argument and clearly stating the conclusion you aim to defend. The bibliography must include sources outside of the provided syllabus to demonstrate independent research capability.

Presentation (15%)

Date: April 27, May 4, or May 11

Signup: Presentation Signups

Format: Strict 15-minute slot (10-minute pitch + 5-minute Q&A).

Description: This is not a summary of your entire paper. You must deliver a high-intensity, 10-minute pitch focusing on a single idea or specific premise from your upcoming paper. You will be cut off strictly at the 10-minute mark. The goal of this format is to crowdsource objections and stress-test the most vulnerable or complex part of your argument against class feedback.

Final Paper (30%)

Due: May 13

Format: 2,500 – 3,000 words. Submit via Moodle.

Description: A substantial research paper on a topic related to AI Safety & Security. You are encouraged to synthesize the research done for your Abstract and the feedback received during your Presentation. The paper should do more than review the literature; it should make a philosophical intervention, for example by defending a view against a recent counterargument, exposing a hidden assumption in an argument, or applying a specific ethical framework to a novel AI capability.

Course Policies

This course follows all relevant HKU policies, e.g., on plagiarism, academic freedom, and research integrity. Weekly responses inform class discussion and may be used to improve course materials and teaching methods. Please also review the Mental Health and Well-Being Statement.

Submit / Contact

Submit

Submit a paper or other longer assignment via Moodle @ HKU.

Contact

If you want to contact me about something else, please email me (sharadin@hku.hk) or book an appointment during my office hours.