PHIL7004: AI Safety & Security
Spring 2026
Advanced artificial intelligence systems offer the potential for unprecedented economic and scientific growth, but also present a unique class of risks. Beyond the immediate challenges of bias or labor displacement, a growing body of philosophical and technical literature argues that AI poses an "existential risk"—the possibility that misaligned advanced systems could permanently curtail humanity's future or lead to human extinction. This course investigates the arguments supporting and opposing these concerns.
Specifically, this seminar tackles six core questions:
Required Reading
Recommended Reading / Other Resources
CAIS - Overview of Catastrophic AI Risk (Sections 1.1-1.6)
Required Reading
Recommended Reading / Other Resources
Davidson - Will Compute Bottlenecks Prevent a SIE
Chalmers - The Singularity - A Philosophical Analysis
Eth and Davidson - Will AI R&D Automation cause a Software Intelligence Explosion
David Thorstad's Blog Series on the Singularity Hypothesis
Required Reading
Recommended Reading / Other Resources
David Thorstad's Blog Series on Instrumental Convergence & Power Seeking
Thorstad - What Power Seeking Theorems do not Show
Gabriel and Keeling - A matter of principle
Signup here
Date: March 2
Format: In-class. A combination of Multiple Choice Questions (MCQ) and Short Answer questions.
Description: This exam evaluates your technical precision and conceptual understanding of the material covered from Weeks 1 through 5. MCQs will test your grasp of specific definitions, distinctions, and logic (e.g., distinguishing between instrumental convergence and orthogonality), while Short Answers will require you to succinctly explain core arguments or reconstruct a specific dialectic from the readings.
Due: 24 hours before class (via Submission link, below).
Format: 300–500 words.
Description: For 5 of the 6 instruction weeks, you must submit a short analytic response to one of the required readings for that week. These responses must not merely summarize the text; instead, they should either reconstruct a specific argument in logical form or raise a focused philosophical objection to a specific premise. These submissions will serve as starting points for seminar discussions. For more details, see the Taxonomy of Responses.
Due: March 23 (First class following the break).
Format: 500-word extended abstract + 5–8 annotated sources. Submit via Moodle.
Description: You will submit a formal proposal for your Final Paper consisting of an extended 500-word abstract and an annotated bibliography. The abstract must go beyond a simple topic description by outlining the logical progression of your intended argument and clearly stating the conclusion you aim to defend. The bibliography must include sources outside of the provided syllabus to demonstrate independent research capability.
Date: April 27, May 4, or May 11
Signup: Presentation Signups
Format: Strict 15-minute slot (10-minute pitch + 5-minute Q&A).
Description: This is not a summary of your entire paper. You must deliver a high-intensity, 10-minute pitch focusing on one single idea or specific premise from your upcoming paper. You will be cut off strictly at the 10-minute mark. The goal of this format is to crowd-source objections and stress-test the most vulnerable or complex part of your argument against class feedback.
Due: May 13
Format: 2,500–3,000 words. Submit via Moodle.
Description: A substantial research paper on a topic related to AI Safety & Security. You are encouraged to synthesize the research done for your Abstract and the feedback received during your Presentation. The paper should do more than review the literature; it should make a philosophical intervention, such as defending a view against a recent counterargument, exposing a hidden assumption in an argument, or applying a specific ethical framework to a novel AI capability.
This course follows all relevant HKU policies, e.g., on plagiarism, academic freedom, and research integrity. Weekly responses inform class discussion and may be used to improve course materials and teaching methods. Please also review the Mental Health and Well-Being Statement.
Submit a paper or other longer assignment via Moodle @ HKU.
If you want to contact me about something else, please email me (sharadin@hku.hk) or book an appointment during my office hours.