You know what an LLM is by now, but what’s RAG? What’s a transformer? A GPU? Knowing these AI technologies is key to building custom AI solutions or choosing new enterprise AI products and services as they launch.
Here’s a guide to the AI terms you need to understand to start building AI capabilities and integrate AI into your enterprise systems.
Key high-level AI terms
Artificial intelligence (AI)
A field of computer science aiming to create machines capable of intelligent behavior. AI encompasses the creation of algorithms and systems that perform tasks which typically require human intelligence – e.g. understanding language, recognizing images, solving problems, and learning.
The goal is not just to mimic human intelligence but to potentially solve problems in ways humans do not.
Machine learning (ML)
A subset of AI, where computers learn from data to improve their performance. Instead of being explicitly programmed, machines use data and algorithms to emulate the learning process. The focus is on developing programs that can access data and learn independently.
Sound like AI to you? Learn more about the difference between AI and machine learning here.
Neural networks
Inspired by biological neural networks, these are a series of algorithms that capture relationships in data. Deep learning, a subset of machine learning, involves neural networks with many layers that can learn increasingly abstract data features.
Deep learning (DL)
A subset of ML, deep learning uses neural networks with three or more layers. These layers can learn increasingly abstract representations of the data. Deep learning is particularly effective for complex tasks like speech recognition or image classification.
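To make "layers" concrete, here is a minimal sketch of data flowing through a three-layer network. The weights are hand-picked, hypothetical values chosen for illustration; in a real network they would be learned during training.

```python
def relu(values):
    # Common activation function: negative values become zero.
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    # weights[j] holds the weights feeding output unit j.
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    # Pass the input through each layer in turn.
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:  # no activation after the final layer
            activations = relu(activations)
    return activations


# Hypothetical weights: 2 inputs -> 3 units -> 2 units -> 1 output.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], [0.0, 0.0]),
    ([[1.0, -1.0]], [1.0]),
]

output = forward([2.0, 1.0], LAYERS)
print(output)  # [0.5]
```

Each layer transforms the output of the one before it, which is how deeper layers can represent more abstract features than earlier ones.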
Generative AI
Specialized in generating new creations by learning from existing examples. These AI systems can create new content, like texts, images, or music.
ChatGPT, Stable Diffusion, Claude 2, and Cohere's models are popular generative AI examples. Generative AI is where most current attention and investment are focused.
Under the hood of generative AI
Transformers
Transformers are the foundation of LLMs that make services like ChatGPT possible.
A transformer is a breakthrough architecture in ML that uses a mechanism called "attention" to weigh how relevant each part of a sequence is to every other part. This significantly improves the efficiency of predicting sequences, and the architecture underpins most current state-of-the-art AI language systems.
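As a rough illustration of attention, here is a minimal sketch of scaled dot-product attention in plain Python. It is a simplification: the three token vectors are made up, and the same vectors serve as queries, keys, and values, whereas a real transformer derives each from learned projections.

```python
import math


def softmax(scores):
    # Turn raw scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    # For each query, score every key, then blend the values by those scores.
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical 2-dimensional vectors standing in for three tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(tokens, tokens, tokens)
```

Each output row is a weighted blend of all the token vectors, with the most weight going to the tokens most similar to the one being processed.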
Get a thorough and simple-to-understand visual walkthrough of how transformers work here. Or a technical breakdown with a comprehensive catalog of transformers and their attributes here.
Python
A high-level, interpreted programming language known for its simplicity and readability, Python is already a favorite for developers in the realms of web development, data analysis, artificial intelligence, and scientific computing.
It serves as a versatile tool, much like a Swiss Army knife for programmers, due to its extensive libraries and frameworks that cover a vast range of applications. Python is quickly becoming the foundation of AI-native software development.
Generative Pre-trained Transformer (GPT)
Yes, GPT as in ChatGPT. This artificial intelligence model is known for its ability to generate human-like text.
GPT models are trained on a very large body of text, much of it from the internet, collected up to a cutoff date. To work with more recent or private information, they must be connected to web browsing or external information sources through Retrieval Augmented Generation (RAG).
Diffusion models
A class of generative models that craft new data (especially images or sounds) by gradually altering a random distribution of noise into structured patterns. This process is somewhat like a sculptor shaping a formless block into a detailed statue.
GLIDE, DALL-E 3, Imagen, and Stable Diffusion are all image generators based on diffusion models.
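Turning noise into structure requires a trained neural network, which is too much for a short example. What can be sketched is the opposite direction, which diffusion models learn to reverse: gradually burying a signal in noise. The values, the noise level `beta`, and the step count below are all illustrative assumptions.

```python
import math
import random


def add_noise_step(values, beta, rng):
    # Shrink the signal slightly and mix in a little Gaussian noise.
    keep = math.sqrt(1.0 - beta)
    spread = math.sqrt(beta)
    return [keep * v + spread * rng.gauss(0.0, 1.0) for v in values]


def forward_diffusion(values, beta, steps, seed=0):
    # Record every intermediate state from clean signal to near-pure noise.
    rng = random.Random(seed)
    trajectory = [list(values)]
    for _ in range(steps):
        trajectory.append(add_noise_step(trajectory[-1], beta, rng))
    return trajectory


def signal_retained(beta, steps):
    # Fraction of the original signal's scale that survives after `steps`.
    return (1.0 - beta) ** (steps / 2)


trajectory = forward_diffusion([1.0, -1.0, 0.5, 0.0], beta=0.05, steps=100)
```

With these settings, less than 10% of the original signal's scale remains after 100 steps. A diffusion model is trained to undo steps like these one at a time, which is what lets it start from pure noise and work back toward an image.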
Generative Adversarial Networks (GANs)
These are AI architectures where two neural networks contest with each other in a game. Conceptualized as a forger and a detective, one network generates content (the forger), while the other evaluates it (the detective). The generator creates increasingly convincing fake data, while the discriminator learns to get better at distinguishing the fake data from real data.
GANs are used in image generation, video generation, and voice generation. They’re especially good at generating realistic human faces.
Convolutional Neural Networks (CNNs)
A specialized kind of neural network used primarily to process grid-like data, such as images. CNNs employ layers of convolutions, which are mathematical operations that filter input data to extract features like edges and shapes.
They function like a team of artists, each skilled in recognizing different aspects of an image, from basic textures to complex objects. As data passes through each layer, the network builds a more detailed understanding, allowing it to recognize and categorize visual information precisely.
CNNs are used to detect objects in self-driving cars, for image analysis in medicine, and for facial recognition.
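To show what a convolution does, here is a minimal sketch that slides a small filter over a tiny grayscale "image". The image and the vertical-edge filter are hand-made for illustration; a real CNN learns its filter values during training.

```python
def convolve2d(image, kernel):
    # Slide the kernel over the image and sum the element-wise products.
    # (Like most deep learning libraries, this does not flip the kernel.)
    size = len(kernel)
    out_height = len(image) - size + 1
    out_width = len(image[0]) - size + 1
    output = []
    for r in range(out_height):
        row = []
        for c in range(out_width):
            total = 0
            for i in range(size):
                for j in range(size):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x5 image: dark on the left (0), bright on the right (1).
IMAGE = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Filter that responds where brightness increases from left to right.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(IMAGE, VERTICAL_EDGE)
print(feature_map)  # [[0, 3, 3], [0, 3, 3]]
```

The output is zero over the flat dark region and high where the filter overlaps the dark-to-bright boundary, which is how a convolution layer extracts a feature like an edge.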
Natural language
The set of languages that humans use for daily communication, including spoken, written, and signed forms. Distinct from artificial or constructed languages, natural language evolves over time through culture and social behaviors.
Natural language is what you would say to an AI system – e.g. “Alexa, order me the tasty paleo dinner plan for Friday night.” or “Write Python code that performs this task.”
Natural language processing (NLP)
Natural language processing enables machines to understand and respond to text or voice data.
NLP combines computational linguistics—rule-based modeling of human language—with statistical, machine learning, and deep learning models.
Text-to-speech (TTS)
A technology that converts written text into spoken words using artificial voices. This technology acts as a vocal interpreter, giving voice to written words.
TTS is used in various applications like virtual assistants, navigation systems, and accessibility tools for those with reading difficulties.
LLMs, LMMs, vision models and their components
Large Language Models (LLMs)
GPT-4 Turbo, Llama 2, PaLM 2, Falcon, Cohere Command, and Yi-34B are among the world’s leading LLMs. They’re highly advanced, scaled-up transformer models with billions or even trillions of parameters, trained on very large text datasets.
LLMs can process and generate human-like text by learning from a massive corpus of literature, web pages, and other text sources. They must be fine-tuned or given more information through Retrieval Augmented Generation (RAG) to perform specific industry tasks well.
Here are the specs for the most powerful LLMs in 2023.
Vision models
Just as the human visual system identifies objects, faces, and scenes, vision models perform tasks such as image classification, object detection, and scene reconstruction.
They are the backbone of various applications, from medical imaging analysis to autonomous vehicle navigation – acting as the "eyes" for AI systems to perceive and make sense of visual information.
Large Multimodal Models (LMMs)
AI systems that integrate and process more than one type of data input, such as text, images, and audio. They can interpret complex queries that include a combination of different data types.
For example, a multimodal model could analyze a social media post by looking at the image, the text caption, and the tone of any accompanying audio to understand the sentiment better.
Training
The stage in machine learning where models 'learn' by adjusting their parameters to predict or classify data accurately. This involves feeding the model a large amount of training data so it can learn to make decisions or predictions.
Training a new LLM from scratch requires large clusters of specialized hardware, which makes it prohibitively expensive for most organizations.
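At a tiny scale, "adjusting parameters to fit data" looks like the sketch below: gradient descent fitting a straight line to four made-up data points. The data, learning rate, and epoch count are illustrative assumptions, and an LLM applies the same basic idea to billions of parameters.

```python
def train(samples, learning_rate=0.05, epochs=2000):
    # Start with uninformed parameters and improve them step by step.
    weight, bias = 0.0, 0.0
    count = len(samples)
    for _ in range(epochs):
        grad_weight = 0.0
        grad_bias = 0.0
        for x, y in samples:
            error = (weight * x + bias) - y  # prediction minus target
            grad_weight += 2 * error * x / count
            grad_bias += 2 * error / count
        # Nudge each parameter in the direction that reduces the error.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias


# Hypothetical training data generated from y = 2x + 1.
SAMPLES = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

weight, bias = train(SAMPLES)
print(round(weight, 3), round(bias, 3))  # 2.0 1.0
```

The model is never told the rule behind the data. It recovers a weight near 2 and a bias near 1 purely by reducing its prediction error.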
Parameters
In machine learning, these are the variables that the learning algorithm adjusts through the training process. Parameters are the part of the model that is learned from historical training data.
They are critical to the model's ability to represent the relationship between input features and the target predictions. GPT-3 has 175 billion parameters. OpenAI has not published a figure for GPT-4; unofficial reports put it at around 1.7 trillion.
While models with more parameters can be more powerful, they’re also more resource intensive and there is an effort to find models that can be more effective in smaller sizes.
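To get a feel for parameter counts and why they drive resource use, here is a small sketch. The layer sizes are a hypothetical example, and the memory estimate assumes each parameter is stored in 2 bytes (16-bit precision), which is an assumption rather than a detail of any specific model.

```python
def count_dense_parameters(layer_sizes):
    # Each fully connected layer has inputs * outputs weights plus one bias
    # per output unit.
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        total += inputs * outputs + outputs
    return total


def weight_memory_gb(parameter_count, bytes_per_parameter=2):
    # Memory needed just to hold the weights, in gigabytes (10^9 bytes).
    return parameter_count * bytes_per_parameter / 1e9


# Hypothetical small network: 784 inputs -> 128 hidden units -> 10 outputs.
small_model = count_dense_parameters([784, 128, 10])
print(small_model)  # 101770

# A 175-billion-parameter model at 2 bytes per parameter.
print(weight_memory_gb(175_000_000_000))  # 350.0
```

Even before any computation happens, holding 175 billion parameters at this precision takes about 350 GB, far more than a single typical GPU can store.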
Tokens
The building blocks of text in NLP, similar to atoms in molecules. A token can be a word, part of a word, or a punctuation mark. When a model processes text, it breaks down the input into tokens, allowing it to understand and generate language at a granular level.
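Here is a minimal sketch of tokenization. It splits on words and punctuation for simplicity; production LLMs typically use subword tokenizers (such as byte-pair encoding), so treat the splitting rule and vocabulary below as illustrative assumptions.

```python
import re


def tokenize(text):
    # Split into lowercase words and standalone punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocabulary(tokens):
    # Assign each distinct token an integer ID in order of first appearance.
    vocabulary = {}
    for token in tokens:
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary)
    return vocabulary


tokens = tokenize("Write Python code, then test the code.")
vocabulary = build_vocabulary(tokens)
token_ids = [vocabulary[token] for token in tokens]

print(tokens)     # ['write', 'python', 'code', ',', 'then', 'test', 'the', 'code', '.']
print(token_ids)  # [0, 1, 2, 3, 4, 5, 6, 2, 7]
```

The model works with the integer IDs, not the raw text. Note that both occurrences of "code" map to the same ID.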
Context window
In AI, this refers to the span of understanding: how much information, measured in tokens, a model can consider at once. This limit affects the model's ability to make coherent and contextually relevant responses.
For example, GPT-4 Turbo has a context window of 128k tokens, or just over 300 pages of text.
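One practical consequence is that applications have to decide what to drop when a conversation outgrows the limit. Here is a minimal sketch of one common approach, keeping the most recent messages that fit. The word-based token counter and the limit of 20 are stand-in assumptions; a real application would use the model's own tokenizer and limit.

```python
def fit_to_context(messages, context_limit, count_tokens):
    # Walk backwards from the newest message, keeping as many as fit.
    kept = []
    used = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > context_limit:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


def rough_token_count(text):
    # Crude stand-in: one token per whitespace-separated word.
    return len(text.split())


history = [
    "first question about pricing",  # 4 words
    "a long answer " * 5,            # 15 words
    "follow up question",            # 3 words
]

window = fit_to_context(history, context_limit=20, count_tokens=rough_token_count)
print(len(window))  # 2
```

The oldest message is dropped because including it would push the total to 22 tokens, over the limit of 20. Anything outside the window is invisible to the model.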
Fine-tuning
An optimization process that takes an existing model, which has already been trained on a large dataset (a pre-trained model), and makes it perform better for a specific task. Fine-tuning needs far less compute than training from scratch, but fine-tuning a large model can still require significant GPU resources.
Fine-tuning uses a smaller, often task-specific dataset, which could be related to a particular language, domain, or problem set for which the pre-trained model was not originally trained.
RLHF (Reinforcement Learning from Human Feedback)
This approach fine-tunes AI behavior by integrating human judgment into the reinforcement learning cycle. Humans provide feedback on the AI's actions, effectively teaching the model what is desirable or undesirable in a given context.
RLHF evolves an AI's capabilities beyond initial programming, incorporating nuanced human preferences and ethical considerations, not unlike an apprentice learning a craft under the watchful eye of a master.
RAG (Retrieval Augmented Generation)
RAG gives LLMs access to external data without the expense of fine-tuning and retraining the whole model on new datasets. When a question comes in, the system retrieves relevant documents and passes them to the LLM along with the question. With RAG, you could build an internal LLM with access to secure company data that could improve productivity for everyone. Employees could query the LLM about business intelligence, process, or customer data and generate new material through the LLM for their role or department.
It’s like consulting a library of books to gather information before writing an essay – ensuring the content is both relevant and rich in detail.
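The retrieve-then-generate flow can be sketched in a few lines. This example ranks documents by simple word overlap and stops at building the prompt; the documents, the scoring rule, and the prompt wording are all illustrative assumptions, and a production system would typically use vector search and then send the prompt to an LLM.

```python
import re


def words(text):
    # Lowercased set of words, ignoring punctuation.
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, top_k=2):
    # Rank documents by how many words they share with the question.
    question_words = words(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & words(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=2):
    # Put the retrieved text in front of the question for the LLM to use.
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical internal knowledge base.
DOCUMENTS = [
    "Refunds are processed within 5 business days.",
    "The office is closed on public holidays.",
    "Refunds require the original receipt.",
]

prompt = build_prompt("How long do refunds take?", DOCUMENTS)
print(prompt)
```

Only the two refund-related documents end up in the prompt. The model itself is unchanged; it simply answers with the retrieved text in front of it.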
Vector databases
Specialized storage systems designed to efficiently index and retrieve high-dimensional vectors, which are often the output of machine learning models, especially in tasks related to similarity search.
These databases enable quick and accurate retrieval of items—such as images, text, or audio—by converting them into vectors and comparing their 'distance' to find matches. It's akin to finding the closest match in a sea of data points based on their positions on a multidimensional map. Vector databases are a crucial part of RAG architecture.
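Here is a minimal sketch of the core operation, similarity search over stored vectors. The `TinyVectorStore` class and the 3-dimensional vectors are hypothetical; real embeddings have hundreds or thousands of dimensions, and real vector databases use specialized indexes rather than comparing the query against every item.

```python
import math


def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Brute-force in-memory store, for illustration only."""

    def __init__(self):
        self.items = []

    def add(self, label, vector):
        self.items.append((label, vector))

    def search(self, query, top_k=1):
        # Score every stored vector against the query and return the best.
        scored = [(cosine_similarity(query, vector), label) for label, vector in self.items]
        scored.sort(reverse=True)
        return [label for _, label in scored[:top_k]]


store = TinyVectorStore()
store.add("cat photo", [0.9, 0.1, 0.0])
store.add("dog photo", [0.7, 0.3, 0.0])
store.add("invoice", [0.0, 0.1, 0.9])

nearest = store.search([1.0, 0.0, 0.0], top_k=2)
print(nearest)  # ['cat photo', 'dog photo']
```

The query vector points in nearly the same direction as the two photo vectors, so they rank first, while the invoice vector points elsewhere and is left out.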
Human-facing AI products
Chatbots
Interactive programs like ChatGPT and Claude 2 are designed to simulate conversation with human users, employing natural language processing to interpret and respond to queries. They operate like a digital concierge, assisting with tasks ranging from answering FAQs to more complex activities like shopping assistance or customer service.
The sophistication of a chatbot can vary from simple rule-based systems to advanced AI-driven agents capable of learning and personalization over time.
Autonomous Agents or GPTs
Systems or software that perform tasks without human intervention guided by a set of policies or learning algorithms. They can be virtual, like chatbots or personal assistants, or physical, like robots and self-driving cars.
Think of them as independent entities that can observe their environment, make decisions, and act to achieve specific goals, akin to a self-sufficient explorer navigating unknown territory. OpenAI's GPT Store is full of GPTs, which are a kind of digital autonomous agent.
Prompt engineering
The craft of designing and formulating inputs (prompts) to effectively communicate with AI models, particularly those involved in natural language processing, to elicit desired outputs or behaviors.
With thoughtful design, these prompts can harness an AI's vast knowledge base to produce specific and relevant outcomes, much like using the right keywords can yield the most helpful search results.
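In practice, much of prompt engineering comes down to assembling a clear role, task, constraints, and examples. Here is a minimal sketch of a prompt template. The `compose_prompt` helper and its wording are hypothetical, and the structure that works best varies by model.

```python
def compose_prompt(role, task, constraints, examples):
    # Assemble a structured prompt from its parts.
    lines = [f"You are {role}.", f"Task: {task}"]
    if constraints:
        lines.append("Constraints:")
        lines.extend(f"- {item}" for item in constraints)
    # Worked examples show the model the expected input/output format.
    for sample_input, sample_output in examples:
        lines.extend(["", f"Input: {sample_input}", f"Output: {sample_output}"])
    return "\n".join(lines)


prompt = compose_prompt(
    role="a support assistant for an online store",
    task="Classify the customer message as 'billing', 'shipping', or 'other'.",
    constraints=["Reply with one word.", "Do not explain your answer."],
    examples=[("Where is my package?", "shipping")],
)
print(prompt)
```

Keeping the parts separate makes it easy to change one element, such as a constraint or an example, and compare how the model's output changes.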
AI hardware and compute
Graphics Processing Units (GPUs)
Initially designed to render graphics, these devices are now pivotal in AI development. They can perform many parallel operations, making them ideal for the matrix and vector computations that are common in machine learning.
Many of the biggest companies in the world have GPUs on backorder. These physical processors, mostly made by Nvidia, are the backbone of most AI services and a major limiting factor for the whole AI market.
Tensor Processing Units (TPUs)
Custom-designed chips by Google – optimized specifically for the tensor calculations used in neural networks. They speed up machine learning and offer high throughput for AI workloads.
Field-Programmable Gate Arrays (FPGAs)
Chips that can be reprogrammed to suit different purposes and tasks, used to accelerate specific AI processes or workflows.
Cloud computing
Cloud computing provides the infrastructure necessary for storing massive datasets and the computational power needed to train complex models.
Platforms like AWS, Google Cloud, and Azure provide on-demand AI services and infrastructure, allowing for scalable compute resources that can be increased or decreased as needed.
Want help building custom AI capabilities?
You could spend the next 6-18 months planning to recruit and build an AI team, but in that time, you won’t be building any AI capabilities. That’s why Codingscape exists.
We can assemble a senior AI development team for you in 4-6 weeks. It’ll be faster to get started, more cost-efficient than internal hiring, and we’ll deliver high-quality results quickly. We’ve been busy building AI capabilities for our partners and helping them plan their AI investments for 2024.
Zappos, Twilio, and Veho are just a few companies that trust us to build their software and systems with a remote-first approach.
You can schedule a time to talk with us here. No hassle, no expectations, just answers.
Cole is Codingscape's Content Marketing Strategist & Copywriter.