PHIL7004: AI Safety & Security
Spring 2026
Advanced artificial intelligence systems offer the potential for unprecedented economic and scientific growth, but also present a unique class of risks. Beyond the immediate challenges of bias or labor displacement, a growing body of philosophical and technical literature argues that AI poses an "existential risk"—the possibility that misaligned advanced systems could permanently curtail humanity's future or lead to human extinction. This course investigates the arguments supporting and opposing these concerns.
Specifically, this seminar tackles six core questions:
Required Reading
Recommended Reading / Other Resources
CAIS - Overview of Catastrophic AI Risk (Sections 1.1-1.6)

Required Reading
Recommended Reading / Other Resources
Davidson - Will Compute Bottlenecks Prevent a SIE?
Chalmers - The Singularity: A Philosophical Analysis
Eth and Davidson - Will AI R&D Automation Cause a Software Intelligence Explosion?
David Thorstad's Blog Series on the Singularity Hypothesis

Required Reading
Recommended Reading / Other Resources
David Thorstad's Blog Series on Instrumental Convergence & Power Seeking
Thorstad - What Power-Seeking Theorems Do Not Show
Gabriel and Keeling - A Matter of Principle

Midterm Exam
Date: March 2
Format: In-class; a combination of multiple-choice questions (MCQs) and short-answer questions.
Description: This exam evaluates your technical precision and conceptual understanding of the material covered in Weeks 1 through 5. MCQs will test your grasp of specific definitions, distinctions, and logic (e.g., distinguishing between instrumental convergence and the orthogonality thesis), while short answers will require you to succinctly explain core arguments or reconstruct a specific dialectic from the readings.

Weekly Reading Responses
Due: 24 hours before class (via email/LMS).
Format: 300–500 words.
Description: For 5 of the 7 instruction weeks (excluding Intro and Presentations), you must submit a short analytic response to one of the required readings for that week. These responses must not merely summarize the text; instead, they should either reconstruct a specific argument in logical form or raise a focused philosophical objection to a specific premise. These submissions will serve as starting points for seminar discussions.

Final Paper Proposal
Due: March 23 (first class following the break).
Format: 500-word extended abstract + 5–8 annotated sources.
Description: You will submit a formal proposal for your Final Paper consisting of an extended 500-word abstract and an annotated bibliography. The abstract must go beyond a simple topic description by outlining the logical progression of your intended argument and clearly stating the conclusion you aim to defend. The bibliography must include sources outside of the provided syllabus to demonstrate independent research capability.

In-Class Presentation
Date: April 20 or April 27
Format: Strict 10-minute slot (5-minute pitch + 5-minute Q&A).
Description: This is not a summary of your entire paper. You must deliver a high-intensity, 5-minute pitch focusing on one single idea or specific premise from your upcoming paper. You will be cut off strictly at the 5-minute mark. The goal of this format is to crowd-source objections and stress-test the most vulnerable or complex part of your argument against class feedback.

Final Paper
Due: May 7
Format: 3,500–4,500 words.
Description: A substantial research paper on a topic related to AI Safety & Security. You are encouraged to synthesize the research done for your Abstract and the feedback received during your Presentation. The paper should not merely review the literature; it must make a novel philosophical intervention, such as defending a view against a recent counterargument, exposing a hidden assumption in a standard safety argument, or applying a specific ethical framework to a novel AI capability.