Most powerful LLMs (Large Language Models)
Read Time 34 mins | Written by: Cole
[Last updated July 2024]
The LLMs (Large Language Models) under the hood of ChatGPT, Claude, Gemini, and other generative AI tools are the tech your company needs to understand. LLMs make chatbots possible (internal and customer-facing), can help developers code more efficiently, and are the driving force behind Nvidia becoming the most valuable company in the world.
Model size, context window size, performance, cost, and availability of these LLMs determine what you can build and how expensive it is to run.
Here are the important stats of the most powerful LLMs available – from the GPT-4o API to the world’s best open-source models.
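To make the context window and max output stats concrete, here is a minimal Python sketch of a pre-flight check. It assumes the prompt and the reserved output budget have to fit in the context window together, and it uses a rough 4-characters-per-token estimate instead of a real tokenizer. The heuristic and the function names are illustrative assumptions, not any vendor's API.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; a real tokenizer will give different counts."""
    return max(1, round(len(text) / chars_per_token))


def fits_in_context(prompt: str, context_window: int, max_output: int) -> bool:
    """Return True if the prompt plus the reserved output budget fits the window."""
    return estimate_tokens(prompt) + max_output <= context_window


prompt = "Summarize this contract. " * 1000  # 25,000 characters
print(estimate_tokens(prompt))                                              # 6250
print(fits_in_context(prompt, context_window=128_000, max_output=4_000))   # True
print(fits_in_context(prompt, context_window=8_000, max_output=4_000))     # False
```

The same prompt that fits comfortably in a 128k window overflows an 8k window once you reserve 4k tokens for the answer, which is why these two stats are listed for every model below.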
LLMs (Large Language Models) for enterprise systems
OpenAI LLMs
ChatGPT and OpenAI are household names when it comes to large language models (LLMs). They started the generative AI firestorm with $10 billion in Microsoft funding and their GPT models have been at the top of the best LLMs available ever since.
Last time we updated this article, GPT-5 hadn’t launched, but Sam Altman had already told Stanford students that GPT-4 would be the “dumbest” model anyone would ever have to use again.
GPT-4o Mini – GPT-4o Mini is the most affordable frontier model available. It’s faster and cheaper than most full-sized models and has a lot of GPT-4o’s smarts – scoring 82% on MMLU. At 15 cents per million input tokens and 60 cents per million output tokens, it's an order of magnitude more affordable than previous frontier models and more than 60% cheaper than GPT-3.5 Turbo.
- Model Size: 1.76 trillion parameters (unconfirmed by OpenAI)
- Context Window Size: 128k tokens
- Max Output: 16k tokens
- Vision: Yes
- Audio: No
- Knowledge Cutoff: Oct 2023
- Performance: LMSYS Chatbot Arena Leaderboard
- Availability: Developer API, ChatGPT Plus, ChatGPT Enterprise, Azure OpenAI Service
- Cost: Input: $0.15 per 1M tokens | Output: $0.60 per 1M tokens
GPT-4o – GPT-4o is faster, cheaper, and more human than GPT-4 Turbo (and other leading models). GPT-4o has a 128K context window, is multimodal, and generates text 2x faster. GPT-4o is 50% cheaper than GPT-4 Turbo, across both input tokens ($5 per million) and output tokens ($15 per million).
- Model Size: 1.76 trillion parameters (unconfirmed by OpenAI)
- Context Window Size: 128k tokens
- Max Output: 4k tokens
- Vision: Yes
- Audio: Yes
- Knowledge Cutoff: Oct 2023
- Performance: LMSYS Chatbot Arena Leaderboard
- Availability: Developer API, ChatGPT Plus, ChatGPT Enterprise, Azure OpenAI Service
- Cost: Input: $5 per 1M tokens | Output: $15 per 1M tokens
GPT-4 Turbo – GPT-4 Turbo is faster, has a bigger context window (128k tokens), and is significantly cheaper than GPT-4. On top of being one of the best LLMs available for developers via API, GPT-4 Turbo also has vision capabilities. It’s not as good as GPT-4 at complex logic, but it hallucinates less, is more stable, and is better for real-time interactions.
- Model Size: 1.76 trillion parameters (unconfirmed by OpenAI)
- Context Window Size: 128k tokens
- Max Output: 4k tokens
- Vision: Yes
- Audio: No
- Knowledge Cutoff: Dec 2023
- Performance: LMSYS Chatbot Arena Leaderboard
- Availability: Developer API, ChatGPT Plus, ChatGPT Enterprise, Azure OpenAI Service
- Cost: Input: $10.00 per 1M tokens | Output: $30.00 per 1M tokens
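To see what these per-token prices mean for a real bill, here is a small Python sketch that applies the prices listed above to a workload. The prices are copied from the listings; the workload (10,000 requests of 2,000 input and 500 output tokens each) and the function name are made up for illustration.

```python
# USD per 1M tokens (input, output), taken from the listings above.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (5.00, 15.00),
    "gpt-4-turbo": (10.00, 30.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10,000 requests, each 2,000 input and 500 output tokens.
for model in PRICES:
    total = 10_000 * request_cost(model, 2_000, 500)
    print(f"{model}: ${total:,.2f}")
```

For this workload the totals come out to $6.00 for GPT-4o Mini, $175.00 for GPT-4o, and $350.00 for GPT-4 Turbo, which matches the "50% cheaper than GPT-4 Turbo" claim for GPT-4o.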
GPT-4 – GPT-4 is more expensive than GPT-4 Turbo and better at complex tasks. Compared to GPT-3.5 Turbo, it’s more advanced in logic, math, and general applications – making it better at code generation. GPT-4 also has vision capabilities and is one of the most popular, high-performing LLMs available for developers via API.
- Model Size: 1.76 trillion parameters (unconfirmed by OpenAI)
- Context Window Size: 32k tokens
- Max Output: 4k tokens
- Vision: Yes
- Knowledge Cutoff: April 2023
- Performance: OpenAI GPT-4 benchmarks
- Technical documentation: GPT-4 technical sheet
- Availability: Developer API, ChatGPT Plus, ChatGPT Enterprise, Azure OpenAI Service
- Cost: Input: $60.00 per 1M tokens | Output: $120.00 per 1M tokens
GPT-3.5 Turbo – GPT-3.5 Turbo is the most affordable LLM from OpenAI. It’s not as good at logic, doesn’t include vision, and doesn’t sound as human as GPT-4o or 4 Turbo.
- Model Size: 175 billion parameters
- Context Window Size: 16k tokens
- Max Output: 4k tokens
- Vision: No
- Knowledge Cutoff: September 2021
- Performance: OpenAI GPT-3.5 benchmarks
- Availability: Developer API, ChatGPT Plus, ChatGPT Enterprise, Azure OpenAI Service
- Cost: Input: $0.50 per 1M tokens | Output: $1.50 per 1M tokens
Anthropic LLMs
Anthropic was founded by ex-OpenAI VPs who wanted to prioritize safety and reliability in AI models. They moved slower than OpenAI, but their Claude 3 family of LLMs was the first to take the crown from OpenAI's GPT-4 on the leaderboards in early 2024. Anthropic then released Claude 3.5 Sonnet, which outperforms GPT-4o and all of their own Claude 3 models in intelligence, speed, and cost.
Claude 3.5 Sonnet – The fastest, most cost-efficient, and highest-performing LLM as of 6.20.24 – Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus. It’s especially good at code generation – Claude 3.5 Sonnet solved 64% of coding problems, outperforming Claude 3 Opus which solved 38%.
It also introduces a new way to use Claude that isn’t just a chat UI called Artifacts. When you generate content like code snippets, text documents, or website designs, these Artifacts appear in a dedicated window alongside the conversation.
- Model Size: Unknown
- Context Window Size: 200k tokens
- Max Output: 4k tokens
- Vision: Yes
- Knowledge cutoff: April 2024
- Ethical AI feature: Claude 3.5 Sonnet follows Anthropic’s Constitutional AI framework and has dedicated teams that track and mitigate a broad spectrum of risks.
- Performance: Claude 3.5 benchmarks
- Tech documentation: Claude 3.5 Sonnet
- Availability: Anthropic API, Claude Pro, Claude Team, Amazon Bedrock, Google Vertex
- Cost: Input: $3 per 1M tokens / Output: $15 per 1M tokens
Claude 3 Opus – This is Anthropic’s most powerful LLM for highly complex tasks – from vision functions to code generation. It was the first model to beat GPT-4 on many benchmarks – including undergraduate level knowledge and graduate level reasoning.
Its 200k context window is matched with near-perfect recall in needle in a haystack (NIAH) scenarios. Claude 3 Opus was the first LLM to beat GPT-4 Turbo on the Chatbot Arena Leaderboard.
- Model Size: 500 billion - 2 trillion parameters (unconfirmed by Anthropic)
- Context Window Size: 200k tokens (up to 1 million)
- Max Output: 4k tokens
- Vision: Yes
- Knowledge cutoff: August 2023
- Ethical AI feature: Claude 3 Opus follows Anthropic’s Constitutional AI framework and has dedicated teams that track and mitigate a broad spectrum of risks.
- Performance: Claude 3 model family benchmarks
- Tech documentation: Claude 3 Model Card
- Availability: Anthropic API, Claude Pro, Claude Team, Amazon Bedrock, Google Vertex
- Cost: Input: $15 per 1M tokens / Output: $75 per 1M tokens
Claude 3 Opus use cases
- Task automation: plan and execute complex actions across APIs and databases, interactive coding
- R&D: research review, brainstorming and hypothesis generation, drug discovery
- Strategy: advanced analysis of charts & graphs, financials and market trends, forecasting
Claude 3 Sonnet – Sonnet hits the sweet spot for Anthropic enterprise customers with an ideal balance of intelligence and speed. It’s designed for large-scale deployments with strong, reliable performance at lower costs than Opus.
Also high on the leaderboard, Claude 3 Sonnet ranks alongside GPT-4, Command R+, Llama 3 and Nemotron 340B instruct.
- Model Size: ~70 billion (unconfirmed by Anthropic)
- Context Window Size: 200k tokens (up to 1 million)
- Max Output: 4k tokens
- Vision: Yes
- Knowledge Cutoff: August 2023
- Ethical AI feature: Claude 3 Sonnet follows Anthropic’s Constitutional AI framework and has dedicated teams that track and mitigate a broad spectrum of risks.
- Performance: Claude 3 model family benchmarks
- Tech documentation: Claude 3 Model Card
- Availability: Anthropic API, Claude Pro, Claude Team, Amazon Bedrock, Google Vertex
- Cost: Input: $3 per 1M tokens / Output: $15 per 1M tokens
Claude 3 Sonnet use cases
- Data processing: RAG or search & retrieval over vast amounts of knowledge
- Sales: product recommendations, forecasting, targeted marketing
- Time-saving tasks: code generation, quality control, parse text from images
Claude 3 Haiku – Compact, with near-instant responses, Claude 3 Haiku excels at customer interaction, content moderation, and cost-saving automations. For its low cost, high speed, and accuracy, it ranks surprisingly close to some of the most powerful LLMs on the Chatbot Arena Leaderboard.
- Model Size: ~20 billion (unconfirmed by Anthropic)
- Context Window Size: 200k tokens (up to 1 million)
- Max Output: 4k tokens
- Vision: Yes
- Knowledge cutoff: August 2023
- Ethical AI feature: Claude 3 Haiku follows Anthropic’s Constitutional AI framework and has dedicated teams that track and mitigate a broad spectrum of risks.
- Performance: Claude 3 model family benchmarks
- Tech documentation: Claude 3 Model Card
- Availability: Anthropic API, Claude Pro, Claude Team, Amazon Bedrock, Google Vertex
- Cost: Input: $0.25 per 1M tokens / Output: $1.25 per 1M tokens
Claude 3 Haiku use cases
- Customer interactions: quick and accurate support in live interactions, translations
- Content moderation: catch risky behavior or customer requests
- Cost-saving tasks: optimized logistics, inventory management, extract knowledge from unstructured data
Google LLMs
Google was notoriously far behind on commercial LLMs – even though a Google team developed the revolutionary transformer technology that makes LLMs possible. They’ve since caught up in capabilities with the Gemini family of multimodal models and their 1-2 million token context windows.
Gemini 1.0 Ultra – This is Google’s most capable and largest model for highly-complex tasks. It’s a multimodal model that works with images, audio, video, and code.
- Model Size: ~1.56 trillion parameters (unconfirmed by Google)
- Context Window Size: 32k tokens
- Max Output: 4k tokens
- Vision: Yes
- Knowledge cutoff: Connected to internet
- Performance: Claude 3 model family benchmarks
- Tech documentation: Gemini 1 report
- Availability: Developer preview
- Cost: Not available via API (as of 5.8.24)
Gemini 1.5 Pro – Google’s best model for scaling across a wide range of tasks. It’s a multimodal model that works with images, audio, video, and code.
- Model Size: ~500 billion (unconfirmed by Google)
- Context Window Size: 128k tokens (up to 2 million tokens)
- Max Output: 4k tokens
- Vision: Yes
- Knowledge cutoff: Connected to internet
- Performance: Gemini MMLU scores
- Tech documentation: Gemini 1.5 whitepaper
- Availability: Google API
- Cost: Input: $7 per 1M tokens / Output: $21 per 1M tokens
Mistral LLMs
Mistral Large – Mistral Large achieves strong results on commonly used benchmarks, making it the world's second-ranked model generally available through an API (next to GPT-4). It can be used for complex multilingual reasoning tasks, including text understanding, transformation, and code generation.
- Model Size: Unknown
- Context Window Size: 32k tokens
- Max Output: 4k tokens
- Vision: No
- Knowledge cutoff:
- Performance:
- Tech documentation: API doc
- Availability: Azure, Amazon Bedrock
- Cost: Input: $4 per 1M tokens / Output: $12 per 1M tokens
01.AI Yi LLMs
Yi Large – The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI. Yi Large is available commercially through the 01.AI API and quickly jumped into the top 10 on the LMSYS Leaderboard.
- Model Size: Unknown
- Context Window Size: 16k tokens
- Max Output: 4k tokens
- Vision: No
- Knowledge cutoff: Unknown
- Performance: LMSYS Leaderboard
- Tech documentation: API and model info
- Availability: 01.AI API
- Cost: Input: $2.5 per 1M tokens / Output: $10 per 1M tokens
Cohere LLMs
Cohere Command R+ – Command R+ is an instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models. It is best suited for complex RAG workflows and multi-step tool use.
It’s listed here as a paid model with prices through Cohere’s API, but it’s also available as one of the best open source models.
- Model Size: 104 billion parameters
- Context Window Size: 128k tokens
- Max Output: 4k tokens
- Vision: No
- Knowledge cutoff:
- Performance: LMSYS Leaderboard
- Tech documentation: Model card
- Availability: Cohere API, Hugging Face, Azure AI, Amazon Bedrock
- Cost: Input: $3 per 1M tokens / Output: $15 per 1M tokens
Command R+ use cases
- Advanced Retrieval Augmented Generation (RAG) with citation to reduce hallucinations
- Multilingual coverage in 10 key languages to support global business operations
- Tool Use to automate sophisticated business processes
Open source LLMs for enterprise
Nvidia LLMs
Nvidia is known for their GPUs, but they have a whole enterprise AI ecosystem – from dev tools to their NIM microservices platform. They had early entries into the LLM space with ChatRTX and StarCoder 2, but their most powerful LLM offering is the Nemotron-4 340B model family.
Nemotron-4 340B Base – An LLM that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs. This model has 340 billion parameters and supports a context length of 4,096 tokens. It was pre-trained on a total of 9 trillion tokens, consisting of a diverse assortment of English-based texts, 50+ natural languages, and 40+ coding languages.
- Model Size: 340 billion parameters
- Context Window Size: 4,096 tokens
- Max Output: 4k tokens
- Knowledge cutoff: June 2023
- Performance: LMSYS Chatbot Arena Leaderboard
- Availability: Nvidia NGC, Hugging Face
- License Type: NVIDIA Open Model License
Nemotron-4 340B Instruct – An LLM that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs. It is a fine-tuned version of the Nemotron-4-340B-Base model, optimized for English-based single and multi-turn chat use cases. It supports a context length of 4,096 tokens.
- Model Size: 340 billion parameters
- Context Window Size: 4,096 tokens
- Max Output: 4k tokens
- Knowledge cutoff: June 2023
- Performance: LMSYS Chatbot Arena Leaderboard
- Availability: Nvidia NGC, Hugging Face
- License Type: NVIDIA Open Model License
Nemotron-4 340B Reward – A multidimensional reward model (it outputs multiple scalar values) that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs. Built on the Nemotron-4-340B-Base model, it supports a context length of up to 4,096 tokens.
- Model Size: 340 billion parameters
- Context Window Size: 4,096 tokens
- Max Output: 4k tokens
- Knowledge cutoff: June 2023
- Performance: LMSYS Chatbot Arena Leaderboard
- Availability: Nvidia NGC, Hugging Face
- License Type: NVIDIA Open Model License
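Here is a rough Python sketch of how a multidimensional reward model can slot into a synthetic data pipeline: generate candidate samples, score each one on several attributes, and keep only the ones that clear every threshold. The attribute names, thresholds, and the `toy_score` function are all hypothetical stand-ins. In a real pipeline `score_fn` would call the reward model, and this is not Nvidia's actual pipeline or API.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]


def filter_synthetic(
    samples: List[str],
    score_fn: Callable[[str], Scores],
    thresholds: Scores,
) -> List[str]:
    """Keep samples whose score meets the threshold on every listed attribute."""
    kept = []
    for sample in samples:
        scores = score_fn(sample)
        if all(scores.get(name, float("-inf")) >= minimum
               for name, minimum in thresholds.items()):
            kept.append(sample)
    return kept


def toy_score(sample: str) -> Scores:
    """Stand-in for a reward model call; NOT how a real reward model scores text."""
    return {
        "helpfulness": min(len(sample.split()) / 5, 4.0),
        "coherence": 4.0 if sample.endswith(".") else 1.0,
    }


candidates = [
    "Paris is the capital of France and its largest city.",
    "paris france capital",
    "Yes.",
]
kept = filter_synthetic(candidates, toy_score, {"helpfulness": 2.0, "coherence": 3.0})
print(kept)  # only the first candidate clears both thresholds
```

Having one score per attribute, instead of a single number, lets you tune each threshold separately for the kind of training data you want.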
Meta Llama 3.1 LLMs
While Meta is commonly known for being the champion of open source in AI, their models are open weights and not true open source according to many. Either way, open weights still means you can run these models locally – which you can't do with OpenAI LLMs.
Llama 3.1 405B: The first open weights model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation.
- Model Size: 405 billion parameters
- Context Window Size: 128k tokens
- Max Output: 4k tokens
- Knowledge cutoff: December 2023
- Performance: LMSYS Chatbot Arena Leaderboard
- Availability: Meta, Amazon Bedrock, Azure, Hugging Face
- License type: Custom commercial license
Llama 3.1 70B: Llama 3.1 70B excels in grasping language nuances and understanding context. It’s adept at handling complex tasks like translation and generating dialogues. Enhanced scalability and performance allow it to manage multi-step tasks with ease. Post-training refinements have significantly reduced the rate of false refusals – giving more accurate and diverse answers. Has capabilities in reasoning, code generation, and following instructions.
- Model Size: 70 billion parameters
- Context Window Size: 128k tokens
- Max Output: 4k tokens
- Knowledge cutoff: December 2023
- Performance: LMSYS Chatbot Arena Leaderboard
- Availability: Meta, Amazon Bedrock, Azure, Hugging Face
- License type: Custom commercial license
Llama 3 8B: Llama 3 8B shows what is possible when you train a relatively small model on a huge number of tokens (15 trillion). That’s a training dataset 7x larger than the one used for Llama 2, including 4x more code. Llama 3 8B consistently beats out similar-sized models like Gemma 7B and Mistral 7B.
- Model Size: 8 billion parameters
- Context Window Size: 8k tokens
- Max Output: 4k tokens
- Knowledge cutoff: March 2023
- Performance: LMSYS Chatbot Arena Leaderboard
- Availability: Meta, Amazon Bedrock, Azure, Hugging Face
- License Type: Custom commercial license
Yi series LLM
Yi-34B-Chat – Trained on 3T multilingual tokens. Ideal for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. Yi-34B model ranked first among all existing open-source models (such as Falcon-180B, Llama-70B, Claude) in both English and Chinese on various benchmarks – including Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval (based on data available up to November 2023).
- Model Size: 34 billion parameters
- Context Window Size: 32k-200k tokens
- Max Output: 4k tokens
- Knowledge cutoff: June 2023
- Performance: LMSYS Chatbot Arena Leaderboard
- Availability: Hugging Face
- License Type: Apache 2.0
Qwen LLMs
Qwen refers to the LLM family built by Alibaba Cloud.
Qwen2-72B-Instruct – Qwen2 has generally surpassed most open source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc.
- Model Size: 72 billion parameters
- Context Window Size: 131k tokens
- Max Output: 4k tokens
- Knowledge cutoff: June 2024
- Performance: LMSYS Chatbot Arena Leaderboard
- Tech documentation: Qwen 2 details
- Availability: Hugging Face
- License Type: Qianwen License
Qwen 1.5 110B Chat – The first 100B+ parameter model of the Qwen1.5 series, it’s comparable with Meta-Llama3-70B performance. This LLM is multilingual – supports English, Chinese, French, Spanish, German, Russian, Korean, Japanese, Vietnamese, Arabic, etc.
- Model Size: 110 billion parameters
- Context Window Size: 32k tokens
- Max Output: 4k tokens
- Knowledge cutoff: April 2024
- Performance: LMSYS Chatbot Arena Leaderboard
- Tech documentation: Qwen 1.5 details
- Availability: Hugging Face
- License Type: Qianwen License
Mistral LLMs
Mixtral 8x22B – A sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its sparse activation patterns make it faster than any dense 70B model, while being more capable than any other open-weight model (distributed under permissive or restrictive licenses). It’s also one of the most cost-effective and has strong mathematics and coding capabilities.
- Model Size: 141 billion parameters (39 billion active)
- Context Window Size: 64k tokens
- Max Output: 4k tokens
- Knowledge cutoff: April 2024
- Performance: LMSYS Chatbot Arena Leaderboard
- Tech documentation: Mixtral 8x22B model card
- Availability: Hugging Face
- License Type: Apache 2.0
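The "39B active parameters out of 141B" figure in the Mixtral 8x22B description is worth a quick back-of-the-envelope calculation. This Python sketch works out the share of parameters used per token and a rough memory footprint for the weights. The 2 bytes per parameter figure is an assumption (16-bit weights), and the estimate ignores the KV cache, activations, and any quantization.

```python
TOTAL_PARAMS = 141e9   # total parameters, from the description above
ACTIVE_PARAMS = 39e9   # parameters used per token, from the description above
BYTES_PER_PARAM = 2    # assumption: 16-bit weights (fp16/bf16)

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
weight_memory_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9

print(f"Active per token: {active_fraction:.1%}")        # Active per token: 27.7%
print(f"Weights in memory: ~{weight_memory_gb:.0f} GB")  # Weights in memory: ~282 GB
```

So compute per token looks like a roughly 39B model, but in a typical deployment you still need enough memory to hold all 141B parameters, since any expert may be selected for a given token.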
Other open source LLMs
Falcon 180B – Falcon is an LLM developed by the Technology Innovation Institute (TII) and hosted on the Hugging Face hub.
- Model Size: 180 billion parameters
- Context Window Size: 4k tokens
- Max Output: 4k tokens
- Knowledge cutoff: Dec 2022
- Performance: LMSYS Chatbot Arena Leaderboard
- Availability: Hugging Face, Azure
- License Type: Falcon-180B TII License
New LLMs specific for software development
Claude 3 Opus – Claude 3 is winning over developers for code generation when compared to GPT-4 and GitHub Copilot. Its 200k context window size makes it ideal for pasting large samples of code, refactoring code, and general coding tasks.
Opus outperforms other models on most of the common evaluation benchmarks for AI systems, including undergraduate level expert knowledge (MMLU), graduate level expert reasoning (GPQA), basic mathematics (GSM8K), and more.
Code Llama – In benchmark testing, Code Llama outperformed state-of-the-art publicly available LLMs on code tasks. It has the potential to make workflows faster and more efficient for developers and lower the barrier to entry for people learning to code.
Code Llama is available in three models:
- Code Llama: the foundational code model
- Code Llama Python: specialized for Python
- Code Llama Instruct: fine-tuned for understanding natural language instructions (e.g., code me a website in HTML with these features)
Code Llama supports many of the most popular languages being used today – including Python, C++, Java, PHP, TypeScript (JavaScript), C#, and Bash. It can also be used for code completion and debugging.
Three sizes of Code Llama are being released with 7B, 13B, and 34B parameters, respectively. Each of these models is trained with 500B tokens of code and code-related data. The 7B and 13B base and instruct models have also been trained with fill-in-the-middle (FIM) capability, allowing them to insert code into existing code.
Code Llama is free for research and commercial use.
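Fill-in-the-middle (FIM) is easier to picture with an example. The idea is that the model sees the code before and after a gap and generates what belongs in between. This Python sketch only shows how such a prompt can be assembled. The `<PRE>`, `<SUF>`, and `<MID>` sentinel strings, their spacing, and the function name are illustrative assumptions. The exact tokens depend on the model's tokenizer, so check the model card before using them.

```python
def build_fim_prompt(prefix: str, suffix: str,
                     pre: str = "<PRE>", suf: str = "<SUF>", mid: str = "<MID>") -> str:
    """Arrange the code before and after a gap so a FIM-trained model fills the middle."""
    return f"{pre} {prefix} {suf}{suffix} {mid}"


# The gap is the body of add(); the model would be asked to generate it.
prefix = "def add(a, b):\n    "
suffix = "\n\nprint(add(2, 3))\n"

prompt = build_fim_prompt(prefix, suffix)
print(prompt)
```

The text the model generates after the final sentinel is then spliced between `prefix` and `suffix`, which is how an editor plugin can insert code into the middle of an existing file.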
StarCoder 2 – Nvidia released this family of open-source LLMs for code generation in collaboration with BigCode (backed by ServiceNow and Hugging Face). StarCoder 2 supports hundreds of programming languages and delivers best-in-class accuracy. It helps advanced developers build apps faster with code completion, auto-fill, advanced code summarization, and relevant code snippet retrieval.
The StarCoder2 family includes 3B, 7B, and 15B parameter models, giving flexibility to pick the one that fits your use case and meets your compute resources. StarCoder 2 has a context length of 16,000 tokens – letting it handle longer sections of code. The models have been trained responsibly, with 1 trillion tokens on permissively licensed data from GitHub.
GitHub Copilot – GitHub Copilot is the most recognized name in code generation – increasing developer productivity by up to 55%. You can use it to start a conversation about your codebase – whether you’re hunting down a bug or designing a new feature. It can help you improve code quality and security.
GitHub Copilot is trained on all languages that appear in public repositories. For each language, the quality of suggestions you receive may depend on the volume and diversity of training data for that language.
For example, JavaScript is well-represented in public repositories and is one of GitHub Copilot’s best supported languages.
How do I hire a senior AI development team that knows LLMs?
You could spend the next 6-18 months planning to recruit and build an AI team that knows LLMs. Or you could engage Codingscape.
We can assemble a senior AI development team for you in 4-6 weeks. It’ll be faster to get started, more cost-efficient than internal hiring, and we’ll deliver high-quality results quickly.
Zappos, Twilio, and Veho are just a few companies that trust us to build their software and systems with a remote-first approach.
You can schedule a time to talk with us here. No hassle, no expectations, just answers.
Cole
Cole is Codingscape's Content Marketing Strategist & Copywriter.