RL Blog
|

Red teamers take on AI at DEF CON 31

It takes a village... Researchers play capture the flag to find vulns in tools like ChatGPT — with a White House assist.

Richi Jennings
Blog Author

Richi Jennings, Independent industry analyst, editor, and content strategist. Read More...

def-con-ai-village-richixbw--bernard-spragg--cc0

At this year’s DEF CON, large language models (LLMs) come under scrutiny. Infosec researchers can compete to find vulnerabilities in the new generation of generative AIs.

From bias, to hallucination and jailbreaks, expect much egg on face this summer. In this week’s Secure Software Blogwatch, we prime for prompt action.

Your humble blogwatcher curated these bloggy bits for your entertainment. Not to mention: Onions.

Near the Tannhäuser Gate

What’s the craic? Elias Groll reports — “Coming to DEF CON 31: Hacking AI models”:

Safety and security concerns
Attendees at the premier hacking conference held annually in Las Vegas in August will be able to attack models from Anthropic, Google, Hugging Face, Microsoft, NVIDIA, OpenAI and Stability AI in an attempt to find vulnerabilities. [It’s] part of a White House initiative to address the security risks posed by the rapidly advancing technology.

In the rush to launch these models … experts are concerned that companies are moving too quickly … without properly addressing … safety and security concerns. … The event hosted at the AI Village is expected to draw thousands of security researchers. … Any bugs discovered will be disclosed using industry-standard responsible disclosure practices.

There’s no need to feel down. Jessica Lyons Hardcastle predicts “a weekend of red-teaming by infosec's village people”:

Capture the flag
The collaborative event, which AI Village organizers describe as "the largest red teaming exercise ever for any group of AI models," will host "thousands" of people, including "hundreds of students from overlooked institutions and communities," all of whom will be tasked with finding flaws. … Think: traditional bugs in code, but also problems more specific to machine learning.

There will be a capture-the-flag-style point system. … Whoever gets the most points wins a high-end Nvidia GPU. The event is also supported by the White House Office of Science, Technology, and Policy; America's National Science Foundation's Computer and Information Science and Engineering (CISE) Directorate; and the Congressional AI Caucus.

What’s this about jailbreaks? Benj Edwards explains — “White House challenges hackers”:

Difficult to lock down
LLMs such as ChatGPT … come with inherent risks. [They] have proven surprisingly difficult to lock down — in part due to a technique called "prompt injection," … a technique that can derail a language model into performing actions not intended by its creator.

DEF CON 31 will take place on August 10–13, 2023.

So, basically, we’re seeking Blade Runners? namaria thinks it’ll all end in tears (in rain):

Non-deterministic computer programming (which is conceptually what ML/AI is) … sounds great and can truly unlock whole areas of problem and solutions spaces we didn't even know existed. Until you realize you're just creating infrastructure where you cannot possibly hope to determine behavior — something that is fundamental to debugging and security.

In fact, this is a legit use of the word “hackers.” As wub explains:

I feel that to find the nasty corners in ML, as in anything, we need folks with the hacker spirit. Coloring inside the lines isn't going to expose the obscure problems.

The more the merrier. … We can always use more diverse viewpoints banging on these things.

Is it going to tell us more than we already know? Asvarduil offers this colorful metaphor:

The findings will be something along the lines of, "Has more holes than Swiss cheese." This is an immature technology.

What I'm more concerned about is the stupid stuff a red-team test won't find. With businesses and organizations already considering using A"I" … in decision-making in fields like medicine, which can have life or death consequences for real people … the FDA and other regulators need to be involved. … I'm not sure this effort is going to be thorough enough.

Can’t wait? grapescheesee is going into hiding in mid-August:

I can wait on the mainstream fear-porn and out of context excerpts.

What happens at DEF CON stays at DEF CON? Mizagorn makes an oblique Vegas reference:

A high-end GPU? That's it, with all those sponsors? … Low stakes.

Meanwhile, dtich imagines things you people wouldn’t believe:

Can Bing take over a sewage treatment plant's PLCs? Can it social engineer its way into the Pentagon communication bunker? What happens when ChatGPT rewrites its own prompt filter to allow everything that the human minders decided against? Can Stable Diffusion create a fake passport that passes for real? If given internet access and a credit card can one of these systems have a Chinese Military Supplier build a drone and give it the remote codes? Will one of these systems be able to post a fake election video to YouTube that is taken for genuine and foments a popular revolt?

And Finally:

DoD alliums

 

Previously in And finally


You have been reading Secure Software Blogwatch by Richi Jennings. Richi curates the best bloggy bits, finest forums, and weirdest websites … so you don’t have to. Hate mail may be directed to @RiCHi or ssbw@richi.uk. Ask your doctor before reading. Your mileage may vary. Past performance is no guarantee of future results. Do not stare into laser with remaining eye. E&OE. 30.

Image sauce: Bernard Spragg (cc:0; leveled and cropped)

Keep learning


Explore RL's Spectra suite: Spectra Assure for software supply chain security, Spectra Detect for scalable file analysis, Spectra Analyze for malware analysis and threat hunting, and Spectra Intelligence for reputation data and intelligence.

More Blog Posts

Do More With Your SOAR

Do More With Your SOAR

Running an SOC is complex — and running without the best tools makes it more difficult. Learn how RL File Enrichment can automate and bolster your SOC.
Read More