The Rise of Small Language Models (SLMs)

This article, originally published on February 14, 2024, has been updated.

Large language models (LLMs) have grown impressively powerful over the last couple of years. These versatile AI-powered tools are deep learning artificial neural networks trained on massively large datasets, capable of leveraging billions of parameters (or machine learning variables) to perform various natural language processing (NLP) tasks.

These tasks run the gamut from generating, analyzing and classifying text, to producing rather convincing images from a text prompt, translating content into different languages, and powering chatbots that hold human-like conversations. Well-known LLMs include proprietary models like OpenAI’s GPT-4, as well as a growing roster of open source contenders like Meta’s LLaMA.

But despite their considerable capabilities, LLMs can present some significant disadvantages. Their sheer size often means they require hefty computational resources and energy to run, which can put them out of reach for smaller organizations without the deep pockets to bankroll such operations. Larger models also carry the risk of algorithmic bias introduced via datasets that are not sufficiently diverse, leading to faulty or inaccurate outputs, including the dreaded “hallucinations,” as they’re known in the industry.

What Are Small Language Models?

These issues may be among the many factors behind the recent rise of small language models, or SLMs.

Small language models are slimmed-down versions of their larger cousins. For smaller enterprises with tighter budgets, SLMs are becoming a more attractive option: they are generally easier to train, fine-tune and deploy, and cheaper to run.

How Small Language Models Stack Up Next to LLMs

Small language models are essentially more streamlined versions of LLMs, with smaller neural networks and simpler architectures.

Compared to LLMs, SLMs have fewer parameters and don’t need as much data or time to be trained: think minutes or a few hours of training time, versus the many hours or even days needed to train an LLM. Because of their smaller size, SLMs are generally more efficient and more straightforward to deploy on-site, or on smaller devices.
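
To make the resource gap concrete, here is a rough back-of-the-envelope sketch of how much memory just storing model weights requires at 16-bit precision. The parameter counts are illustrative assumptions, not official figures:

```python
# Rough memory needed just to hold model weights in 16-bit (fp16) precision.
# Parameter counts below are illustrative assumptions, not official figures.
BYTES_PER_PARAM_FP16 = 2

def weight_memory_gb(num_params: float) -> float:
    """Approximate weight storage in gigabytes at fp16 precision."""
    return num_params * BYTES_PER_PARAM_FP16 / 1e9

for name, params in [("2.7B-parameter SLM", 2.7e9),
                     ("70B-parameter LLM", 70e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights")

# A ~5 GB model fits on a single consumer GPU; a ~140 GB model generally
# requires multiple datacenter-class accelerators.
```

And this is only weight storage; training and serving add further overhead for activations, gradients and optimizer state, which widens the gap even more.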

How Small Language Models Work

Like their larger cousins, small language models use a type of deep learning neural network architecture known as the transformer model. Introduced by Google researchers back in 2017 in a paper titled “Attention Is All You Need,” transformers have revolutionized natural language processing (NLP) over the last few years, paving the way for the generative pre-trained transformers (GPTs) that underlie some of today’s most massive and powerful large language models.

Generally, these are the basic building blocks of the transformer model architecture:

- Embeddings, which convert input tokens into numerical vectors the network can work with.
- Positional encoding, which injects information about the order of tokens in a sequence.
- Self-attention (typically multi-head attention), which weighs how relevant each token is to every other token in the sequence; a minimal code sketch of this step follows the list.
- Feed-forward layers, which further transform each token’s representation.
- Layer normalization and residual connections, which keep training stable as these layers are stacked.
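
To illustrate the self-attention step named above, here is a minimal NumPy sketch of scaled dot-product attention as defined in the “Attention Is All You Need” paper; the random inputs are placeholders:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of values

# Toy example: 4 tokens, each an 8-dimensional vector (random placeholders).
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```

Real transformers run many of these attention operations in parallel “heads” and stack dozens of such layers; an SLM simply uses fewer heads, smaller vectors and shallower stacks.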

How Small Language Models Are Created

Small language models are typically made from large language models using an approach called model compression, which results in smaller models that are more resource-efficient and performant, yet still relatively accurate.

Some techniques of model compression include:

- Pruning, which removes weights or whole network components that contribute little to the model’s outputs.
- Quantization, which stores weights and activations at lower numerical precision, such as 8-bit integers instead of 32-bit floating point.
- Knowledge distillation, in which a smaller “student” model is trained to mimic the outputs of a larger “teacher” model; a minimal sketch of the standard distillation loss follows the list.
- Low-rank factorization, which approximates large weight matrices as products of much smaller ones.
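
As a concrete illustration of one of these techniques, here is a minimal PyTorch sketch of the standard knowledge distillation loss, a KL divergence between temperature-softened teacher and student outputs. The logits and temperature below are placeholder assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature**2

# Toy example: a batch of 4 predictions over a 10-way vocabulary (placeholders).
teacher_logits = torch.randn(4, 10)
student_logits = torch.randn(4, 10, requires_grad=True)

loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow into the student only; the teacher is frozen
print(float(loss))
```

In practice, this distillation term is usually combined with the ordinary cross-entropy loss on ground-truth labels, so the student learns both from the data and from the teacher’s softened predictions.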

Benefits and Limitations of Small Language Models

On the benefits side, SLMs are cheaper to train and operate, faster to fine-tune, and compact enough to run on-premises or on edge devices, which can also help keep sensitive data local. The trade-off is capacity: with fewer parameters, SLMs have a narrower knowledge base and tend to struggle with the kind of complex, open-ended reasoning that the largest LLMs handle well.

Examples of Small Language Models

Nevertheless, despite these potential limitations, some SLMs, like Microsoft’s recently introduced 2.7 billion-parameter Phi-2, demonstrate state-of-the-art performance in mathematical reasoning, common sense, language understanding and logical reasoning that is remarkably comparable to, and in some cases exceeds, that of much heftier LLMs. According to Microsoft, the efficiency of the transformer-based Phi-2 makes it an ideal choice for researchers who want to improve safety, interpretability and ethical development of AI models.
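
For readers who want to experiment, here is a minimal sketch of loading and prompting Phi-2 with the Hugging Face transformers library. The checkpoint name is the publicly published "microsoft/phi-2", but treat the exact loading details as assumptions that may change between library versions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the public "microsoft/phi-2" checkpoint on the Hugging Face Hub.
model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "Explain why smaller language models can be cheaper to deploy."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At fp16 precision the 2.7 billion-parameter model needs only around 5 to 6 GB of memory for its weights, which is why it can run on a single consumer GPU rather than a datacenter cluster.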

Other SLMs of note include:

- Google’s Gemma, a family of lightweight open models available in 2 billion and 7 billion-parameter sizes.
- Mistral AI’s Mistral 7B, a 7.3 billion-parameter open model.
- TinyLlama, a compact 1.1 billion-parameter open source model.
- Stability AI’s StableLM family of small models.

Use Cases for Small Language Models

Because of their smaller size and reduced computational and operational cost, businesses and institutions can easily fine-tune and tailor small language models to specific uses.

For instance, SLMs could be used as chatbots to offer timely customer service, or to summarize content or create calendar events for users. These smaller models could also translate foreign languages in real time, generate programming code, or monitor and perform preventive maintenance on devices linked to the Internet of Things (IoT). Within automotive systems, SLMs can go a long way in offering real-time traffic updates for smarter road navigation, or in improving voice commands and hands-free calling.
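
As one concrete example of these use cases, here is a minimal sketch of content summarization using the Hugging Face transformers pipeline API. The model choice here is an assumption; any suitably compact summarization checkpoint would work:

```python
from transformers import pipeline

# Assumes a small, publicly available summarization checkpoint;
# swap in whichever compact model fits your hardware budget.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = (
    "Small language models are slimmed-down versions of large language models. "
    "They have fewer parameters, train faster, and are cheaper to run, which "
    "makes them attractive for on-device and budget-conscious deployments."
)
print(summarizer(article, max_length=40, min_length=10)[0]["summary_text"])
```

Because a model of this size can run on a laptop CPU or a modest edge device, the same pattern extends to the other use cases above, from customer service chat to IoT monitoring.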

The Future Ahead for Small Language Models

Ultimately, the emergence of small language models signals a potential shift from expensive and resource-heavy LLMs to more streamlined and efficient language models, arguably making it easier for more businesses and organizations to adopt and tailor generative AI technology to their specific needs. As language models evolve to become more versatile and powerful, it seems that going small may be the best way to go.
