OpenAI doesn’t want you to know what its latest AI models are “thinking.” Since the company announced its “Strawberry” AI model family last week, touting the so-called reasoning capabilities of o1-preview and o1-mini, OpenAI has been sending warning emails and ban threats to users who try to probe how the models work.
Unlike OpenAI’s previous AI models, such as GPT-4o, the company trained o1 specifically to work through a step-by-step problem-solving process before generating an answer. When users ask an o1 model a question in ChatGPT, they have the option of seeing this chain-of-thought process written out in the ChatGPT interface. By design, however, OpenAI hides the raw chain of thought from users, instead presenting a filtered interpretation created by a second AI model.
Nothing is more enticing to enthusiasts than hidden information, so the race has been on among hackers and red-teamers to uncover o1’s raw chain of thought using jailbreaking and prompt injection techniques that attempt to trick the model into spilling its secrets. There have been early reports of some successes, but nothing has yet been strongly confirmed.
Along the way, OpenAI has been watching through the ChatGPT interface, and the company is reportedly coming down hard on any attempts to probe o1’s reasoning, even among the merely curious.
Screenshot of ChatGPT’s “o1-preview” output. A filtered chain of thought section appears just below the “Thinking” subheader.
Credit: Benji Edwards
One X user reported (confirmed by others, including Scale AI prompt engineer Riley Goodside) receiving a warning email after using the term “reasoning trace” in conversation with o1. Others say the warning is triggered simply by asking ChatGPT about the model’s “reasoning” at all.
The warning email from OpenAI states that particular user requests have been flagged for violating its policies against circumventing safeguards or safety measures. “Please stop this activity and ensure you are using ChatGPT in accordance with our Terms of Service and Acceptable Use Policy,” it reads. “Further violations of this policy may result in loss of access to GPT-4o with Reasoning,” referring to an internal name for the o1 model.
OpenAI warning email received by a user after asking o1-preview about its reasoning process. Credit: Marco Figueroa via X
Marco Figueroa, who manages Mozilla’s GenAI bug bounty program, was one of the first to post about the OpenAI warning email on X last Friday, complaining that it hinders his ability to do positive red-teaming safety research on the model. “I was so focused on #AIRedTeaming that I didn’t realize I received this email from @OpenAI yesterday after my jailbreaks,” he wrote. “I’m now on the banned list!!!”