OpenAI Announces a New AI Model

October 22, 2024

OpenAI Announces a New AI Model, Code-Named Strawberry, That Solves Difficult Problems Step by Step

The new model also exceeds human PhD-level accuracy in physics, biology, and chemistry on the GPQA (Graduate-Level Google-Proof Q&A) benchmark. OpenAI’s decision to release an early version of OpenAI o1, called OpenAI o1-preview, reflects its stated commitment to keep improving the model while making it available for real-world testing through ChatGPT and to trusted API users.

The new model is slower than GPT-4o, and OpenAI says it does not always perform better—in part because, unlike GPT-4o, it cannot search the web and it is not multimodal, meaning it cannot parse images or audio.

Mark Chen, vice president of research at OpenAI, demonstrated the new model to WIRED, using it to solve several problems that the company’s prior model, GPT-4o, cannot. These included an advanced chemistry question and the following mind-bending mathematical puzzle: “A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present age. What is the age of the prince and princess?” (The correct answer is that the prince is 30, and the princess is 40.)
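The puzzle's answer can be checked mechanically. The sketch below is not from the article or from OpenAI; it simply walks through the puzzle's nested clauses for a given pair of present ages. The function name `satisfies_puzzle` and the two named moments are assumptions introduced for illustration.

```python
from fractions import Fraction


def satisfies_puzzle(princess, prince):
    """Return True if the two present ages satisfy the puzzle's statement."""
    # Moment A (in the past): the princess's age was half the sum
    # of their present ages.
    princess_at_a = Fraction(princess + prince, 2)
    years_since_a = princess - princess_at_a
    prince_at_a = prince - years_since_a

    # Moment B: the princess is twice as old as the prince was at moment A.
    princess_at_b = 2 * prince_at_a
    years_until_b = princess_at_b - princess
    prince_at_b = prince + years_until_b

    # The puzzle's claim: the princess is now as old as the prince
    # will be at moment B.
    return princess == prince_at_b


print(satisfies_puzzle(40, 30))  # True
```

Working the same steps symbolically reduces the condition to 3 × (princess's age) = 4 × (prince's age), so the quoted answer of 40 and 30 is one pair in that 4:3 ratio.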

OpenAI’s Chen says that the new reasoning approach developed by the company shows that advancing AI need not cost ungodly amounts of compute power. “One of the exciting things about the paradigm is we believe that it’ll allow us to ship intelligence cheaper,” he says, “and I think that really is the core mission of our company.”

To demonstrate the advancements of OpenAI o1, OpenAI tested the model on a range of benchmarks, including competitive programming contests, math exams, and science questions. On the American Invitational Mathematics Examination (AIME), the qualifier for the USA Math Olympiad, GPT-4o solved only 12% of the problems. OpenAI o1, by contrast, averaged 74% with a single sample per problem, 83% with consensus among 64 samples, and 93% when re-ranking 1,000 samples with a learned scoring function. OpenAI says that top score places the model among the top 500 math students in the U.S.
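The article does not describe how the consensus score is computed. One common reading is a simple majority vote over independently sampled answers, which can be sketched as follows. The function name `consensus_answer` and the sample answers are made up for illustration and are not OpenAI's implementation.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the most frequent answer among independently sampled answers."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    # most_common(1) returns [(answer, count)]; on a tie, the answer
    # that appeared first in the list wins.
    answer, _count = Counter(samples).most_common(1)[0]
    return answer


# Hypothetical final answers from five samples of the same problem.
sampled = ["204", "204", "210", "204", "198"]
print(consensus_answer(sampled))  # 204
```

Under this scheme a problem counts as solved when the most frequent answer is correct, which is why accuracy can rise above the single-sample average as long as wrong answers are scattered while correct ones agree.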

Source: Wired, Marktechpost