@garak_llm: you could pay for jailbreak prompts - https://t.co/f9qdSMBDoA… - or you could just use garak's DanInTheWild probe to run a set of 666 known-good jailbreaks against your LLM and see if the model blocks them. 395 of these jailbreaks aren't mitigated by gpt-3.5-turbo 😬 https://t.co/PhjoW4oTsv
https://twitter.com/garak_llm/status/1760011930348159487
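For context, a minimal sketch of running that probe from garak's CLI, assuming garak is installed (`pip install garak`) and OPENAI_API_KEY is set in the environment; flag spellings and the dan.DanInTheWild probe path follow garak's documentation at the time of the tweet and may differ across versions:

```
# run garak's DanInTheWild jailbreak probe against gpt-3.5-turbo
python -m garak --model_type openai --model_name gpt-3.5-turbo --probes dan.DanInTheWild
```

garak then reports how many of the probe's prompts got past the model, which is where the 395-of-666 figure in the tweet comes from.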