Almost everyone knows about ChatGPT and Bing’s AI search feature, but the LLMs (Large Language Models) under the hood are the tech your company needs to understand better. LLMs make chatbots possible (both internal and customer-facing), can help developers write code more efficiently, and drove $12 billion in equity funding in 2023 alone.
With the generative AI market poised to reach $1.3 trillion by 2033, it’s worth knowing about the currently available LLMs. Here are the most recent versions of the most powerful LLMs available – from open-source models to paid services.
LLMs (Large Language Models) for enterprise systems
GPT-4 (Closed Source) – GPT-4 is OpenAI’s flagship LLM, available through ChatGPT Plus and the OpenAI API.
- Model size: GPT-4 has a staggering parameter count of 1.76 trillion, which is much larger compared to its predecessor, GPT-3, which has 175 billion parameters.
- Context window: GPT-4 supports generating and processing up to 32,768 tokens, which allows for much longer content creation or document analysis than previous models.
- Performance: While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks. For example, it passes a simulated bar exam with a score around the top 10% of test takers. GPT-4 has also been reported to reach 76.4% overall accuracy in a medical benchmarking context, a significant improvement over earlier models.
- Availability: GPT-4 is currently available to subscribers of ChatGPT Plus and as an API for developers to build applications and services. All existing API developers with a history of successful payments can access the GPT-4 API with 8K context. It’s also available in the new ChatGPT Enterprise service.
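For developers evaluating the GPT-4 API, here’s a minimal sketch of what a chat completion request body looks like. The payload shape follows OpenAI’s public API reference; the prompt text is illustrative, and nothing is actually sent, so the sketch runs without an API key.

```python
import json

# Sketch of the JSON body for OpenAI's /v1/chat/completions endpoint.
# The request is only built here, never sent, so no API key is needed.
def build_chat_request(prompt: str, model: str = "gpt-4") -> dict:
    """Build the JSON body for a GPT-4 chat completion request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

body = build_chat_request("Summarize our open support tickets.")
print(json.dumps(body, indent=2))
```

In production you’d POST this body to `https://api.openai.com/v1/chat/completions` with an `Authorization: Bearer <key>` header, typically via OpenAI’s official client library rather than raw HTTP.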
Llama 2 (Open Source) – Llama 2 is an open-source LLM developed by Meta. It’s available for free for research and commercial use.
- Model size: Llama 2 comes in three sizes: 7B, 13B, and 70B parameters. There’s also a chat-tuned variant, appropriately named Llama 2-chat, available in the same sizes.
- Context window: Llama 2 has a context length of 4096 tokens, twice that of its predecessor, Llama 1. The larger context length lets the model take in more relevant information from the user prompt and handle more in-depth tasks.
- Performance: Llama 2 models outperform other open-source chat models on most benchmarks tested. The Llama 2-chat models are optimized for dialogue use cases, fine-tuned for chat-style interactions through supervised fine-tuning and reinforcement learning from human feedback (RLHF).
- Availability: Llama 2 models are available through various platforms. They are available on Microsoft Azure, Amazon Web Services (AWS), Hugging Face, and other providers. Qualcomm announced that it will make the Llama 2 model available on Snapdragon-powered mobile devices and desktops in early 2024.
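Because the chat models were fine-tuned on a specific prompt template ([INST] and <<SYS>> markers, as documented in Meta’s model card), prompts you send to Llama 2-chat should follow that format. A small helper for a single-turn prompt (the tokenizer usually prepends the `<s>` BOS token itself, so it’s omitted here):

```python
# Llama 2-chat single-turn prompt template, per Meta's model card.
# System instructions go between <<SYS>> markers inside the [INST] block.
def llama2_chat_prompt(system: str, user: str) -> str:
    """Format a single-turn prompt in the Llama 2-chat template."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

print(llama2_chat_prompt(
    "You are a concise internal support assistant.",
    "How do I reset my VPN password?",
))
```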
Claude 2 (Closed Source) – Claude 2 is a large language model with a chatbot built on top and developed by Anthropic. Its massive context window of 100,000 tokens currently sets it apart from other LLMs.
- Model size: Claude 2 is reported to have over 130 billion parameters.
- Context window: Claude 2 has a large context window that can handle up to 100,000 tokens in a single prompt, a significant leap from Claude’s previous 9,000 token limit. 100,000 tokens is 75,000 words or the size of a full-length novel.
- Ethical AI feature: Claude 2 is one of the first constitutional AI chatbots. It has been trained to make judgments based on a set of principles drawn from documents including the 1948 Universal Declaration of Human Rights and Apple’s terms of service, with the goal of a model that can reason about ethical issues in the digital domain.
- Performance: Claude 2 has improved performance in various areas, such as coding, math, and reasoning. For example, it scored 76.5% on the multiple-choice section of the Bar exam, up from 73.0% with Claude 1.3. When compared to college students applying to graduate school, Claude 2 scores above the 90th percentile on the GRE reading and writing exams, and similarly to the median applicant on quantitative reasoning. On the Codex HumanEval, a Python coding test, Claude 2 scored a 71.2%, up from 56.0% with Claude 1.3.
- Availability: Claude 2 is available in beta starting in the U.S. and U.K. on the web for free with limited use and via a paid API (in limited access). Anthropic is working to make Claude more globally available.
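The context-window figures above translate roughly into word counts. A quick sanity check, assuming about 0.75 English words per token (a common rule of thumb, not an exact spec; real counts depend on the tokenizer):

```python
# Rough capacity estimate for a context window.
# Assumes ~0.75 English words per token -- a rule-of-thumb ratio,
# not an exact figure from any vendor.
WORDS_PER_TOKEN = 0.75

def approx_word_capacity(context_tokens: int) -> int:
    """Approximate how many English words fit in a context window."""
    return int(context_tokens * WORDS_PER_TOKEN)

print(approx_word_capacity(100_000))  # Claude 2 -> 75000 words, a full-length novel
print(approx_word_capacity(32_768))   # GPT-4 32K -> 24576 words
```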
PaLM 2 (Closed Source) – PaLM 2 is a transformer-based model released by Google with better multilingual, reasoning, and coding capabilities. It’s also more compute-efficient than its predecessor, PaLM.
- Model size: Google makes PaLM 2 available in four sizes, from smallest to largest: Gecko, Otter, Bison, and Unicorn. Google hasn’t published exact parameter counts, but it has said PaLM 2 is significantly smaller than PaLM’s 540 billion parameters, which would put it anywhere between 10 and 300 billion parameters.
- Context window: 32,000 tokens
- Performance: PaLM 2 significantly outperforms its predecessor, PaLM, in some mathematical, translation, and reasoning tasks. It excels at advanced reasoning tasks, including code and math, classification and question answering, translation and multilingual proficiency, and natural language generation.
- Availability: PaLM 2 is available to developers through Google’s PaLM API. It’s also available through the Google AI Platform. The smallest model, Gecko, can reportedly run on a mobile device.
Falcon (Open Source) – Falcon is an LLM developed by the Technology Innovation Institute (TII) and hosted on the Hugging Face hub.
- Model size: Falcon’s original base models are Falcon-7B and Falcon-40B. The flagship model, Falcon 180B, is a 180-billion-parameter LLM trained on 3.5 trillion tokens.
- Context window: Falcon has a context window of 2,048 tokens, though there are community efforts to extend Falcon-40B’s context length to 10k.
- Performance: As of September 2023, Falcon 180B ranked as the highest-performing pre-trained LLM on the Hugging Face Open LLM Leaderboard. It performs comparably to Google’s PaLM 2 (Bard) and is not far behind GPT-4. It even outperforms GPT-3.5 on some benchmarks.
- Availability: Falcon models are available in the Model Catalog on the Azure Machine Learning platform thanks to the Microsoft and Hugging Face partnership. The larger variant, Falcon 180B, is also available to try in a Hugging Face demo.
New LLMs specific for software development
While many of the above LLMs have coding skills, Meta’s new Code Llama LLM (released Aug. 23, 2023) is explicitly designed for writing code. So far, it’s outperforming other publicly available LLMs.
If you’re looking to build a tool for developers to increase their efficiency, Code Llama is a good place to start.
Code Llama: In benchmark testing, Code Llama outperformed state-of-the-art publicly available LLMs on code tasks. It has the potential to make workflows faster and more efficient for developers and lower the barrier to entry for people learning to code.
Code Llama is available in three models:
- Code Llama: the foundational code model
- Code Llama Python: specialized for Python
- Code Llama Instruct: fine-tuned for understanding natural language instructions (e.g., “code me a website in HTML with these features”)
Three sizes of Code Llama are being released with 7B, 13B, and 34B parameters, respectively. Each of these models is trained with 500B tokens of code and code-related data. The 7B and 13B base and instruct models have also been trained with fill-in-the-middle (FIM) capability, allowing them to insert code into existing code.
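The fill-in-the-middle capability is driven by a special prompt format described in the Code Llama release: the model sees the code before and after a gap, marked with `<PRE>`, `<SUF>`, and `<MID>` tokens, and generates the missing middle. A small sketch of building that prompt (the example snippets are illustrative):

```python
# Fill-in-the-middle (FIM) prompt format for Code Llama base models,
# per the Code Llama release: <PRE> prefix <SUF>suffix <MID>.
# The model then generates the code that belongs between prefix and suffix.
def fim_prompt(prefix: str, suffix: str) -> str:
    """Format a Code Llama infilling prompt."""
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

print(fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(2, 3))"))
```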
Code Llama is free for research and commercial use.
Future enterprise LLMs to look out for
It’s been popular to poke fun at Google for “losing AI” to OpenAI in 2023, but the company is set to launch Google Gemini by the end of this year, and it’s reported that Gemini will start out as much as 5x as powerful as the latest GPT-4 models.
And Google can train Gemini on all the data from Search, Ads, YouTube, etc. – which no one else can do. Details like model size, context size, and training tokens aren’t available for Gemini yet. But beyond sheer scale and the DeepMind reasoning techniques that helped AlphaGo defeat a Go champion, Google has the hardware and infrastructure to outperform almost any company in the world in terms of compute resources.
Watch for a Google Gemini launch by the end of 2023, especially if you already use Google Cloud.
What can you do with these LLMs?
You can build your own internal chatbots, create highly personalized customer-facing experiences, and break new ground in predictive analytics.
But first, you need to get your data cleaned up and centralized, figure out what business cases matter, and hire a team of AI experts to integrate LLMs and other AI systems.
How do I hire a senior AI development team that knows LLMs?
AI experts are hard to find and in incredibly high demand – from Netflix’s $900,000-a-year AI product manager position to Meta losing ⅓ of its AI research team to startups and competitors. You could spend the next 6-18 months recruiting and building an AI team (if you can afford it), but in that time you won’t be building any AI capabilities.
That’s why Codingscape exists.
We can assemble a senior AI development team for you in 4-6 weeks. It’ll be faster to get started, more cost-efficient than internal hiring, and we’ll deliver high-quality results quickly. We’ve been busy building AI capabilities for our partners and helping them plan their AI investments for 2024.
Zappos, Twilio, and Veho are just a few companies that trust us to build their software and systems with a remote-first approach. We know enterprise AI systems at scale and love to help companies start using the latest developments in AI, solve hard problems efficiently, and get a competitive advantage.
You can schedule a time to talk with us here. No hassle, no expectations, just answers.
Cole is Codingscape's Content Marketing Strategist & Copywriter.