The hub for AI safety at the University of Pennsylvania, by students, for students
We collaborate with industry and academic institutions to mitigate risks from AI systems, advancing robustness, monitoring, alignment, transparency, and systemic safety through applied and empirical research on state-of-the-art deep learning systems.
<aside>
🗞️ Newsletter: Stay updated on Penn’s AI and safety research at pennai.substack.com
</aside>
<aside>
💬 Slack: Fill out the application form to get access to our Slack.
</aside>
Publications
Recent Updates
- November 22, 2024: Leonard Tang (co-founder of Haize Labs) speaks at Wu & Chen Auditorium, hosted by Safe AI @ Penn.
- September 26, 2024: Publication accepted to NeurIPS ‘24 (Safetywashing, Richard Ren).
- May 15, 2024: Publication accepted to ACL ‘24 (Language Models Don’t Learn the Physical Manifestation of Language, Bruce Lee & Jason Lim).
- April 27, 2024: Jan Kirchner (OpenAI) speaks to the Safe AI @ Penn research group.
- March 16, 2024: Technique introduced in research paper (control vectors) added to llama.cpp (Representation Engineering, Richard Ren).
- March 15, 2024: Publication accepted to NAACL ‘24 (Instruction Tuning with Human Curriculum, Bruce Lee).
- December 14, 2023: Paper cited by OpenAI’s Superalignment team on its Fast Grants page (Representation Engineering, Richard Ren).
- May 1, 2023: Publication accepted to ACL ‘23 (Explanation-based Finetuning Makes Models More Robust to Spurious Cues, Josh Ludan).
- February 27, 2023: Publication accepted to CVPR ‘23 (Zero-Shot Model Diagnosis, Jinqi Luo).
FAQ