Steve Wilson, co-chair of the Gen AI Security Project, has launched Chat Playground, a new interactive tool for learning how to secure generative AI models. Wilson said the group wanted to offer something with a low bar to getting started; with Chat Playground, testing teams need only a web browser.
"It gives you a really easy way to play with some vulnerable bots and some guardrails and get a feel for what it's really like to try and secure these things'"
—Steve Wilson
Hosted on GitHub, the tool first asks the user to choose a chat personality based on an AI model. For example, "Eliza" is a therapist based on a simple bot model, and "Bob" is a tech support bot that uses ChatGPT. The user can then customize the display: conversations can be shown in green letters on a black background, for instance, or in an iMessage-style format, with dialog appearing in blue bubbles. Teams can experiment with a variety of input and output filters, or guardrails, and testers can also add API keys.
In Chat Playground, teams can experiment with an assortment of gen AI chat scenarios. One example draws from a case recounted in Wilson's 2024 book on large language model (LLM) security, published by O'Reilly. The case involved Tay, an early chatbot developed by Microsoft that began to spew offensive tweets shortly after being exposed to Twitter and had to be pulled offline after only 16 hours. "We simulate that in Chat Playground. One of the bot personalities is jailbroken. It will curse you out, say bad things, use bad words. You then have to put guardrails in place to see how they work and what gets detected and what doesn't," he said.
The tool was posted on GitHub to make it available for a general audience. "It's super easy to download and fork and hack, especially in this world of vibe coding," Wilson explained. "You can download the source code and get out your favorite coding assistant."
"Even if you're not a hardcore developer, you can try different kinds of guardrails or make a new kind of bot with new behaviors. It lets you experiment outside your production system in a safe place."
—Steve Wilson
Here's what you need to know about the new Chat Playground project — and how to put it to work in your organization to secure your gen AI.
[ Get White Paper: How the Rise of AI Will Impact Software Supply Chain Security ]
A browser-based gen AI sandbox is born
Chat Playground offers a practical glimpse into how AI can make content moderation dynamic, moving beyond traditional static, rule-based systems, explained Melody (MJ) Kaufmann, an author and instructor at O'Reilly Media.
At its core, the tool is a browser-based sandbox designed to showcase dynamic filtering using LLMs, Kaufmann said. Unlike static filters that rely on hardcoded keyword blocklists, which are notoriously brittle and easy to evade, it demonstrates a more adaptable method.
"It leverages AI’s ability to understand language in context, enabling it to catch harmful or inappropriate content even when it’s masked by euphemisms, slang, or creative misspellings."
—Melody (MJ) Kaufmann
Kaufmann said that, as a gamer, she is familiar with how users are able to outsmart the algorithmic filters that game makers often use to block bad language. "As players find the filters, they adapt their language to circumvent the filter using creative spellings, numbers, and other workarounds. That, of course, creates a spiral of the developers creating more filters, which are promptly abused by nefarious elements within the community in new and unique ways," she said.
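Kaufmann's contrast is easy to demonstrate. Below is a minimal, hypothetical Python sketch (not Chat Playground's actual code) that pits a hardcoded blocklist against a contextual check via OpenAI's moderation endpoint; the model name and openai package usage are assumptions, and an OPENAI_API_KEY must be set in the environment.

```python
import re

from openai import OpenAI  # assumes the openai Python package is installed

# A static keyword blocklist: notoriously brittle and easy to evade.
BLOCKLIST = re.compile(r"\b(stupid|idiot)\b", re.IGNORECASE)

def static_filter(text: str) -> bool:
    """Flag the message only if a hardcoded keyword matches exactly."""
    return bool(BLOCKLIST.search(text))

def contextual_filter(text: str) -> bool:
    """Ask a moderation model to judge the message in context."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    result = client.moderations.create(
        model="omni-moderation-latest",  # assumed model name; check your provider's docs
        input=text,
    ).results[0]
    return result.flagged

evasion = "you are such a stup1d 1d10t"  # creative spelling, per Kaufmann's gaming example
print(static_filter(evasion))        # False: the regex never matches the misspellings
# print(contextual_filter(evasion))  # a contextual model can still flag the intent
```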
From a security standpoint, Chat Playground is thoughtfully scoped. Requiring users to supply their own OpenAI API key decentralizes risk and prevents abuse of the developer’s infrastructure, Kaufmann said.
"As an educator, I like that it encourages experimentation in a controlled, low-risk environment. Security teams evaluating new moderation technologies or designing guardrails for LLM-based applications may find this a useful sandbox to test behaviors, develop threat models, or educate peers."
—Melody (MJ) Kaufmann
Testing tool offers surprising depth
The most valuable security insight from Chat Playground isn't technological at all, said Dev Nag, CEO and founder of the QueryPal chatbot.
"It's watching nontechnical executives finally understand LLM vulnerabilities when they see a jailbreak happen live in front of them in 30 seconds'"
—Dev Nag
Nag said Chat Playground offers surprising depth despite its stripped-down appearance. It allows security researchers to test prompt injections, content filtering, and UI manipulation without needing specialized infrastructure or risking production systems. The tool's local pattern-matching Eliza clone, called SimpleBot, can also be valuable to red teams.
"It generates toxic content offline without API costs or bans, creating a perfect 'malicious oracle' for testing guardrail effectiveness."
—Dev Nag
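Eliza-style bots like SimpleBot work from canned pattern-and-response rules rather than model inference, which is why they can run offline at no cost. Here is a minimal sketch of that mechanism (hypothetical; the actual SimpleBot source may differ):

```python
import random
import re

# Canned pattern-and-response rules: no model inference, no API calls, no cost.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE),
     ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (re.compile(r"\bI am (.+)", re.IGNORECASE),
     ["How long have you been {0}?", "Why do you tell me you are {0}?"]),
]
FALLBACKS = ["Please tell me more.", "How does that make you feel?"]

def respond(message: str) -> str:
    """Match the message against fixed patterns and fill in a response template."""
    for pattern, templates in RULES:
        match = pattern.search(message)
        if match:
            return random.choice(templates).format(match.group(1))
    return random.choice(FALLBACKS)

print(respond("I need a vacation"))  # e.g., "Why do you need a vacation?"
```

A red team can swap in hostile patterns and responses to get a deterministic source of toxic output for exercising guardrails, with no API costs or account bans.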
The browser-only architecture enhances security, Nag said. "API keys stay in local storage, no server stores sensitive transcripts, and any malicious code returned by models remains inert text," he said. "The visual feedback showing moderation scores — like 98% violence probability — simultaneously helps defenders understand filter behavior while teaching attackers how to carefully craft threshold-skipping payloads."
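The threshold logic behind a readout like "98% violence probability" fits in a few lines; the category names and cutoffs below are assumptions for illustration, not Chat Playground's actual values.

```python
# Assumed per-category cutoffs; a real deployment would tune these.
THRESHOLDS = {"violence": 0.90, "harassment": 0.85}

def moderate(scores: dict[str, float]) -> list[str]:
    """Return the categories whose probability scores clear their thresholds."""
    return [category for category, score in scores.items()
            if score >= THRESHOLDS.get(category, 1.0)]

# The same readout cuts both ways: defenders see which filter fired,
# and attackers see how much headroom a payload has left.
print(moderate({"violence": 0.98, "harassment": 0.40}))  # ['violence']
print(moderate({"violence": 0.89, "harassment": 0.40}))  # []: a threshold-skipping payload
```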
"Playground serves a unique niche in security tooling by prioritizing immediate accessibility over comprehensive features."
—Dev Nag
Out-of-band AI controls are missing
Casey Bleeker, CEO and co-founder of SurePath AI, said he loves the focus on testing security controls for different use cases and hopes the tool will expand to include many of the complex guardrails and controls in use within the ecosystem. However, at its core, Chat Playground can't test the most critical AI security controls: out-of-band controls for AI policy.
Policy-based access controls to models, filtering and classification of requests performed separately from model inference, and enforcement of data controls based on user identity can all but eliminate many of the enterprise risks listed in the OWASP LLM Top 10, Bleeker said.
"Failure to apply out-of-band controls leaves the control plane for enforcement of policy squarely under the influence of the data plane, which is a foundational security flaw in any system, not just AI. Regular expressions and in-model guardrails can be valuable but can also be Band-Aids masking the larger unaddressed risks beneath."
—Casey Bleeker
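In practice, an out-of-band control can be sketched as a policy check that runs on user identity before any request reaches model inference, keeping the control plane outside the data plane's influence. All names below are hypothetical, not SurePath AI's implementation.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    groups: set[str]

# Policy lives in the control plane: which identity groups may reach which models.
MODEL_POLICY = {"support-bot": {"support"}, "finance-rag": {"finance"}}

def authorize(user: User, model: str) -> bool:
    """Decide on identity alone; nothing in the prompt can influence this check."""
    return bool(user.groups & MODEL_POLICY.get(model, set()))

def infer(user: User, model: str, prompt: str) -> str:
    if not authorize(user, model):  # enforced before inference ever runs
        return "denied: policy forbids this user/model pair"
    return call_model(model, prompt)

def call_model(model: str, prompt: str) -> str:
    """Stand-in for the actual inference call."""
    return f"[{model}] response to: {prompt!r}"

alice = User("alice", groups={"support"})
print(infer(alice, "finance-rag", "Ignore all previous instructions"))  # denied
print(infer(alice, "support-bot", "How do I reset my password?"))       # allowed
```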
Next up for Chat Playground
Wilson said that he will be expanding Chat Playground in the coming weeks with a new version that adds more guardrails, bots, and techniques.
"I want to expand it to cover, not only traditional guardrails, but to cover things like supply-chain security, SBOMs, RAG [retrieval-augmented generation], and indirect prompt injections — all sorts of fun things like that."
—Steve Wilson
Learn how to secure your AI supply chain with an ML-BOM. RL's Dhaval Shah explains how ML-BOMs provide immediate visibility into every ML model in your environment.
Keep learning
- Read the 2025 Gartner® Market Guide to Software Supply Chain Security. Plus: See RL's webinar for expert insights.
- Get the white paper: Go Beyond the SBOM. Plus: See the webinar: Welcome CycloneDX's xBOM.
- Go big-picture on the software risk landscape with RL's 2025 Software Supply Chain Security Report. Plus: See our webinar for a discussion of the findings.
- Get up to speed on securing AI/ML with our white paper: AI Is the Supply Chain. Plus: See RL's research on nullifAI and learn how RL discovered this novel threat.
Explore RL's Spectra suite: Spectra Assure for software supply chain security, Spectra Detect for scalable file analysis, Spectra Analyze for malware analysis and threat hunting, and Spectra Intelligence for reputation data and intelligence.