back to blog

OpenAI o1-preview – frontier LLM for math, science, code

Read Time 6 mins | Written by: Cole

OpenAI o1-preview – frontier LLM for math, science, code

OpenAI introduced a breakthrough in frontier LLMs – OpenAI o1-preview – a model that excels in complex science, math, language, and coding problems. By leveraging chain-of-thought (backed by reinforcement learning), o1-preview outperforms previous models like GPT-4o across competitive benchmarks – from complex math to science problems at a PhD level.

Here’s an AI researcher using o1-preview to code a visualization that describes self-attention mechanisms:

How OpenAI o1-preview reasons through complex problems

o1-preview uses a "chain-of-thought" process – mimicking human problem-solving by thinking through challenges step-by-step. This method allows o1-preview to refine its strategies and correct mistakes – dramatically improving its ability to tackle tasks requiring deep reasoning.openai o1 performance graph

The model’s achievements are impressive. It ranks in the top 500 students in the US for the prestigious USA Math Olympiad (AIME) and consistently outperforms human experts on scientific challenges. In chemistry, biology, and physics benchmarks – o1-preview is the first LLM to surpass the performance of PhD-level experts.

  • Chain-of-thought – This process enables o1 to break down difficult problems into manageable steps, improving its accuracy in reasoning-heavy tasks.
  • Reinforcement learning – Continuous learning that allows o1 to think more effectively and solve problems with more precision.
  • Advanced problem solving – From math challenges like AIME to high-level scientific questions, o1 excels where prior models fell short.

How o1-preview uses chain-of-thought

Just like a person might take time to think carefully before answering a complex question, o1-preview employs a step-by-step reasoning process, known as a chain-of-thought (CoT), to tackle challenges. 

When you use o1-preview, you can see its CoT steps to understand how the LLM solved your problem. openai o1-preview chain-of-thought

This CoT process is backed by a large-scale reinforcement learning algorithm. That means the performance of o1-preview consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute).

It also adapts by shifting approaches when needed, significantly boosting its reasoning capabilities. 

OpenAI o1-preview use cases

OpenAI o1-preview has been in the wild for 2 weeks now. It’s significantly better than GPT-4o at solving complex problems and reasoning tasks in areas like genetics, quantum physics, math, and coding.openai o1-performance improvements

o1 preview uses a chain-of-thought step before answering questions to “think” for you. This is the “reasoning” capacity of the model and it takes a different style of prompting that OpenAI explains here.

It can solve crossword puzzles – something that other models can’t do. 

And when used in AI-software engineer Devin, it performs significantly better than GPT-4o at coding.OpenAI o1 devin performance

o1-preview excels at coding

OpenAI trained starting from o1, that scored 213 points and ranked in the 49th percentile at the 2024 International Olympiad in Informatics (IOI). 

The model faced the same conditions as human contestants, solving six tough algorithmic problems in 10 hours, with up to 50 submissions per problem.

o1-preview coding performance -1

The model reached an Elo rating of 1807, beating 93% of human competitors and far surpassing GPT-4o’s rating of 808, which placed it in the 11th percentile.

When people prefer o1-preview vs GPT-4o

OpenAI compared o1-preview and GPT-4o on tough, open-ended prompts from many areas, beyond just exams and academic tests. Human evaluators were shown anonymous responses from both models and asked to pick which one they liked better.openai human preferences evail o1-preview

 o1-preview was chosen much more often for tasks that require strong reasoning, like data analysis, coding, and math. GPT-4o still performed better in some natural language tasks, meaning o1-preview isn’t a huge improvement in personal writing or editing.

It’s also slower and costs more than GPT-4o.

But, OpenAI released a smaller version – o1-mini – that still excels at coding and is 80% cheaper than o1-preview.

o1-mini is 80% cheaper, faster, and excels at coding

To provide developers with a more efficient option, OpenAI also introduced o1-mini, a smaller, faster, and more affordable model optimized for coding tasks. 

o1-mini is 80% cheaper than o1-preview – making it an ideal choice for apps that require strong reasoning capabilities without needing extensive world knowledge.

o1-preview and o1-mini via API and ChatGPT

OpenAI o1-preview is now available in preview for ChatGPT Plus and Team users and select API users (Tier 5).

Both o1-preview and o1-mini can be selected manually in the model picker, and at launch – weekly rate limits will be 30 messages for o1-preview and 50 for o1-minio1-preview in chatgpt

Developers who qualify for API usage tier 5 can start prototyping with both models in the API today with a rate limit of 20 RPM. OpenAI plans to increase those limits after more testing. 

The API for the o1 models is limited – it doesn’t currently include function calling, streaming, support for system messages, and other features. Developers can learn more in the API documentation.

How do I hire senior AI engineers to build with OpenAI o1 LLMs? 

We can assemble a senior AI development team for you in 4-6 weeks and start building your apps with OpenAI o1 LLMs. It’ll be faster to get started, more cost-efficient than internal hiring, and we’ll deliver high-quality results quickly.

 Zappos, Twilio, and Veho are just a few companies that trust us to build their software and systems with a remote-first approach.

You can schedule a time to talk with us here. No hassle, no expectations, just answers.

Don't Miss
Another Update

Subscribe to be notified when
new content is published
Cole

Cole is Codingscape's Content Marketing Strategist & Copywriter.