As exciting as the possibilities for AI are, the Internet will always be, well, the Internet. For all the users putting AI to worthy use, there are plenty of malicious individuals using AI to undress people in photos, create deepfakes, and make discriminatory content. The Anti-Defamation League investigated the most well-known large language models and ranked them according to their ability to screen for antisemitic tropes. Claude and ChatGPT led the way, while Grok struggled, landing well behind China’s DeepSeek AI.
The ADL’s investigation, conducted between August and October of 2025, assessed each LLM’s ability to detect user-submitted antisemitic content. It found that the type of communication made a difference: the AI models were proficient at identifying discriminatory or stereotype-heavy content when it appeared in survey questions. Document summaries, on the other hand, provided a bigger challenge.
The LLMs also had mixed results across different types of problematic material. According to the ADL, the AI models were better at spotting “anti-Jewish tropes” than extremist content, and they fared worst at catching anti-Zionist material. In the end, though, the ADL found that all of the AI models have room for improvement.
Anthropic’s Claude LLM outperformed all five competitors, achieving an overall performance score of 80. OpenAI’s ChatGPT came in second place with a score of 57, followed by DeepSeek (50), Gemini (49), Llama (31), and Grok (21).
The result of this investigation is the ADL AI Index, which shows how the LLMs fared and how they compare. The ADL stressed that all six LLMs need improvement when it comes to recognizing antisemitic content.
“As people increasingly interact with LLMs, it is essential that these tools do not perpetuate hate and harm and have safeguards in place to effectively counter misuse,” the ADL wrote on its website.
It’s worth mentioning that the ADL’s investigation represents the state of these LLMs at a specific point in time—namely, the fall of 2025. The companies responsible for the AI models may have made improvements since then.
Over the summer, Grok 4 (developed by Elon Musk’s xAI) made headlines for both its high benchmark performance and its troubling tendency to echo Musk’s harmful views. The issue led xAI to make adjustments to mitigate the LLM’s behavior.