In artificial intelligence, Generative AI has emerged as a transformative force, capable of creating original content that mirrors human ingenuity. At the heart of this revolution lie large language models (LLMs), sophisticated AI systems trained on vast amounts of text data to comprehend and generate human language.
What is a Language Model?
Before you even get into the nuances and mechanics of a large language model, you need to understand what a language model is.
A language model is a model that estimates the probability of a token or sequence of tokens occurring within a longer sequence of tokens. For example, consider the following sentence:
When my car stops accelerating, it must be the ________.
If we assume that a token is a word, then a language model determines the probabilities of different words or sequences of words to replace that underscore. For example, a language model might determine the following probabilities:
- accelerator (64%)
- gas (22%)
- brakes (8%)
- driver (6%)
A “sequence of tokens” could be an entire sentence or a series of sentences. That is, a language model could calculate the likelihood of different entire sentences or blocks of text.
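To make the idea concrete, here is a minimal sketch of a word-level language model that looks only at the previous word (a "bigram" model). The corpus, the `<s>` start marker, and the function names are all invented for illustration; real language models use far more context and far more data.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real language model is trained on vastly more text.
corpus = [
    "the car stops",
    "the car starts",
    "the car stops",
    "the driver stops",
]

# Count how often each word follows each preceding word.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split()  # "<s>" marks the start of a sentence
    for prev, nxt in zip(words, words[1:]):
        follow_counts[prev][nxt] += 1


def next_word_probs(prev):
    """Return P(next word | previous word), estimated from raw counts."""
    counts = follow_counts[prev]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def sequence_prob(sentence):
    """Probability of a whole sentence: the product of each step's probability."""
    words = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        prob *= next_word_probs(prev).get(nxt, 0.0)
    return prob


print(next_word_probs("car"))          # "stops" is twice as likely as "starts"
print(sequence_prob("the car stops"))  # approximately 0.5 for this corpus
```

The first function answers the fill-in-the-blank question for a single token, and the second extends the same estimate to an entire sentence by multiplying the step-by-step probabilities.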
Context Improves Predictions: Tokens Provide Context
Our brains can keep relatively long contexts. For instance, while watching Act 3, you can remember a character introduced in Act 1. Similarly, the punchline of a long joke makes you laugh because you can remember the context from the setup.
In language modeling, a token is a chunk of text, the smallest unit the model can see.
In some models, the smallest unit of text the model can see might be whole words. In other models, it might be characters. Most transformer language models today use “subword” tokens, which are chunks of text that vary from single characters to whole words. Most common words are represented by a single token, but less common words get split up into multiple chunks.
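As a rough illustration of subword splitting, the sketch below uses greedy longest-match against a small hand-written vocabulary. The vocabulary, the `<unk>` marker, and the function name are assumptions made for this example; production tokenizers learn their vocabularies from data and differ in the details.

```python
import string

# Hypothetical vocabulary: a few whole words and fragments, plus single letters.
VOCAB = set(string.ascii_lowercase) | {"the", "car", "stop", "accelerat", "ing", "or"}


def tokenize_word(word, vocab=VOCAB):
    """Split one word into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # Nothing matched, not even the single character: mark it unknown.
            tokens.append("<unk>")
            start += 1
    return tokens


print(tokenize_word("car"))           # ['car']
print(tokenize_word("accelerating"))  # ['accelerat', 'ing']
```

The common word "car" stays whole, while the less common "accelerating" is split into two chunks that can be reused in words such as "accelerator".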
For common English-language text, one token generally corresponds to about 4 characters (so 100 tokens is roughly 75 words), but tokenization is also language-dependent. Text in Spanish, or in a language written in Cyrillic script such as Russian, would probably produce different token counts.
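These rules of thumb can be turned into a quick estimator. The sketch below is only a back-of-the-envelope calculation for English text; the function names are made up here, and an exact count requires running the specific model's tokenizer.

```python
import math


def estimate_tokens_from_chars(text, chars_per_token=4):
    """Estimate tokens using the rule of thumb of about 4 characters per token."""
    return math.ceil(len(text) / chars_per_token)


def estimate_tokens_from_words(text, words_per_100_tokens=75):
    """Estimate tokens using the rule of thumb that 100 tokens is about 75 words."""
    word_count = len(text.split())
    return math.ceil(word_count * 100 / words_per_100_tokens)


sample = "When my car stops accelerating, it must be the accelerator."
print(estimate_tokens_from_chars(sample))
print(estimate_tokens_from_words(sample))
```

The two estimates will usually disagree slightly, which is a reminder that both are approximations.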
Tokens are traditionally associated with language modeling but are now being successfully applied to computer vision and audio generation tasks.
LLMs are not merely parroting existing text; they can learn from the nuances of language, absorbing the intricacies of grammar, syntax, and semantics. This enables them to produce human-quality text, ranging from creative writing to informative summaries, with remarkable fluency and coherence.
LLMs: Really Big Neural Language Models
The size of a very large neural network model is typically described by the number of parameters it uses. The more parameters you have, the more “complex” your network will be.
A Point of Clarification: A common source of confusion for those new to this area is the difference between parameter counts and token counts. Some models are described as “trained on 1 trillion tokens” while others are described as “a 340B parameter model”. The distinction is that:
- Parameters = weights
- Tokens = pieces of training data
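A small sketch can make the difference tangible. It counts the parameters of a plain fully connected network (weights plus bias terms, which are also learned) and, separately, the tokens in a toy dataset. The layer sizes, the whitespace tokenization, and the function names are assumptions chosen for illustration.

```python
def count_dense_parameters(layer_sizes):
    """Count the learned values in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input/output connection
        total += n_out         # one bias per output unit
    return total


def count_training_tokens(documents):
    """Count tokens in a dataset, treating each whitespace-separated word as a token."""
    return sum(len(doc.split()) for doc in documents)


# A property of the model: 784 inputs, 128 hidden units, 10 outputs.
print(count_dense_parameters([784, 128, 10]))  # prints 101770

# A property of the data: how much text the model sees during training.
print(count_training_tokens(["the car stops", "the driver brakes"]))  # prints 6
```

The first number stays fixed once the architecture is chosen; the second grows with the size of the training corpus.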
A larger parameter count or training-token count does not necessarily make one LLM superior to another. In fact, smaller language models can be ideal for specialized use cases and significantly cheaper and more efficient than a massive LLM. Recent research even indicates that training smaller models on more data could significantly boost performance.
If size is not indicative of performance, why are tech firms pushing to grow language models to such large sizes?
Many factors contribute to model growth.
First, these large models would not have been possible without the ever-increasing size of computer memory over the past six decades. Additionally, accelerator processors (GPUs and TPUs) are designed with matrix manipulation in mind, which is the core operation in neural networks.
The effectiveness of models such as recurrent neural networks (RNN) and convolutional neural networks (CNN) for working with sequential data and images, respectively, encouraged deeper and wider networks.
However, the most recent and largest contributing factor is the 2017 paper “Attention Is All You Need”, which introduced the transformer (encoder/decoder) architecture for processing sequences such as text. The transformer is built around self-attention, which allows for parallelism at a scale previously unattainable.
From this point on, the size of language models grew exponentially.
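For readers who want to see the core operation, below is a simplified sketch of scaled dot-product self-attention in plain Python. It assumes queries, keys, and values are all the raw input vectors; a real transformer first applies learned projection matrices and uses multiple attention heads, both omitted here. The function names are illustrative only.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = input."""
    dim = len(vectors[0])
    outputs = []
    # Each position's output depends only on the inputs, not on the outputs
    # of other positions, so all positions could be computed in parallel.
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(tokens):
    print(row)
```

Unlike a recurrent network, which must process token 1 before token 2, nothing in the loop above depends on a previous iteration. That independence is what lets accelerators process every position of a sequence at once.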
Large Language Model Benefits
Larger transformer models are extremely effective at predicting and generating results on tasks they have been trained on.
But more than that, according to the 2022 paper Emergent Abilities of Large Language Models, there appears to be a model size threshold above which the model can perform tasks that it was not explicitly trained on. Models this large are said to have emergent abilities.
Examples of Generative AI Large Language Models
Several prominent LLMs have garnered attention in the AI landscape, each with its unique strengths and capabilities. We’ve highlighted some of the most in-demand LLMs of 2023, but here’s a snapshot:
- GPT-4 by OpenAI: This is the latest and most advanced multimodal large language model from OpenAI. Debuting in March 2023, it reportedly has 1.7 trillion parameters and can perform a wide range of tasks, including text generation, translation, and even tutoring. GPT-4 scored in the 90th percentile on the Uniform Bar Exam and the 99th percentile for the Biology Olympiad. That put the whole world on notice: Generative AI is for real.
- PaLM 2 by Google: This is one of the largest LLMs in the world, reportedly with 340 billion parameters trained on 3.6 trillion tokens. It can perform a wide range of tasks, including text generation, translation, and question-answering. Google has announced betas of specialized variants of PaLM 2 for cybersecurity (Sec-PaLM 2) and medical (Med-PaLM 2) use.
- Claude by Anthropic: Anthropic is quickly rising, and Claude is a large reason why. Claude can help with use cases including summarization, search, creative and collaborative writing, Q&A, coding, and more.
Potential Use Cases for Large Language Models
The applications of LLMs extend far beyond mere text generation; their potential impact is felt across diverse industries and domains:
- Content Creation: LLMs can revolutionize how we create content, from generating compelling marketing copy and engaging social media posts to crafting captivating scripts and producing original literary works.
- Customer Service / Virtual Agent: LLMs can power virtual assistants or “chatbots” capable of providing personalized customer support, answering questions, resolving issues, offering recommendations, enhancing customer satisfaction, and reducing operational costs.
- Education / AI Tutor: LLMs can transform education by providing personalized learning experiences, tailoring instruction to individual needs and learning styles, and offering real-time feedback and guidance.
- Language Translation: LLMs can break down language barriers, enabling seamless translation of documents, conversations, and real-time communication, fostering global collaboration and understanding.
- Research and Development: LLMs can accelerate research efforts by analyzing vast amounts of data, identifying patterns and insights, and generating new hypotheses, leading to breakthroughs in various fields.
As LLMs continue to evolve, their potential applications are limited only by our imagination.
While LLMs have significant utility, there are many myths about what generative AI can do. Media hype, disingenuous influencers on social media, and a poor understanding of the technical requirements all spread misinformation and oversell the technology.
Perhaps the greatest utility the average consumer will get from LLMs will come from the integration of generative AI into Microsoft 365 and Google Workspace. Increasing efficiency and productivity for repetitive tasks such as resume generation, requests for proposals, and summarization could yield enormous savings across global enterprises.