The hub for AI safety at the University of Pennsylvania, by students, for students
We collaborate with industry and academic institutions to mitigate risks from AI systems, advancing robustness, monitoring, alignment, transparency, and systemic safety through applied and empirical research on state-of-the-art deep learning systems.
<aside>
🗞️ Newsletter: Stay updated on Penn’s AI and safety research at pennai.substack.com
</aside>
<aside>
💬 Slack: Fill out the application form to get access to our Slack.
</aside>
Publications
Recent Updates
- November 22, 2024: Leonard Tang (co-founder of Haize Labs) speaks at Wu & Chen Auditorium, hosted by Safe AI @ Penn.
- September 26, 2024: Publication accepted to NeurIPS ‘24 (Safetywashing, Richard Ren).
- May 15, 2024: Publication accepted to ACL ‘24 (Language Models Don’t Learn the Physical Manifestation of Language, Bruce Lee & Jason Lim).
- April 27, 2024: Jan Kirchner (OpenAI) speaks to the Safe AI @ Penn research group.
- March 16, 2024: Technique introduced in research paper (control vectors) added to llama.cpp (Representation Engineering, Richard Ren).
- March 15, 2024: Publication accepted to NAACL ‘24 (Instruction Tuning with Human Curriculum, Bruce Lee).
- December 14, 2023: Paper cited by OpenAI’s Superalignment team on its Fast Grants page (Representation Engineering, Richard Ren).
- May 1, 2023: Publication accepted to ACL ‘23 (Explanation-based Finetuning Makes Models More Robust to Spurious Cues, Josh Ludan).
- February 27, 2023: Publication accepted to CVPR ‘23 (Zero-Shot Model Diagnosis, Jinqi Luo).
FAQ