January 27, 2024

Many Lightweight Models as an Alternative to a Single Heavyweight

Assessing a new Language Model strategy.

by Roshni Ramnani

In this article, I will introduce what I consider a new set of techniques, aimed at helping us extract more from language models for building applications.

Leveraging multiple diverse language models to respond to a user’s query.

Background:

When it comes to building and leveraging large language models, we have seen important contributions built on three core ideas or techniques:

  1. The Very Large Language Model: This term refers to a large language model trained on a vast amount of data in a process known as pretraining. Such a model, with or without fine-tuning, can perform a multitude of tasks given simple instructions (prompt engineering) or a handful of exemplars (few-shot learning/in-context learning). Examples include GPT-3, GPT-4, Falcon, Llama 2, and Claude 2. The challenge, of course, lies in the computational power required to build and use such large models, and in the complexity of updating them with new knowledge.
  2. Using Large Models to Train Smaller Models: This approach involves training a smaller language model (e.g., Phi-1, Phi-1.5, Phi-2) on high-quality textbook-style data and/or synthetic data generated by an LLM [1]. These smaller models have been reported to match or approach the performance of considerably larger models on a range of benchmarks.
  3. Using External Knowledge (RAG, Retrieval-Augmented Generation): This approach uses external sources such as files or knowledge graphs to provide the language model with specialized domain or company-specific information. For each query, a search retrieves the relevant content from a knowledge store, and the language model then uses this content as context to generate a grounded answer. RAG is currently among the most popular methods for helping LLMs gain traction in business applications.
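To make the retrieve-then-generate flow of the third technique concrete, here is a minimal sketch. It assumes a toy word-overlap retriever in place of a real vector search, and the names `retrieve`, `build_prompt`, and `STOPWORDS` are hypothetical helpers invented for this illustration. The final call to a language model is left out, since any model could consume the resulting prompt.

```python
import re
from collections import Counter

# Assumption: a tiny stopword list so common words do not dominate the score.
STOPWORDS = {"the", "is", "a", "an", "what", "of", "on", "in", "to"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for real search)."""
    query_counts = Counter(tokenize(query))

    def overlap(doc: str) -> int:
        doc_counts = Counter(tokenize(doc))
        return sum(min(count, doc_counts[tok]) for tok, count in query_counts.items())

    return sorted(documents, key=overlap, reverse=True)[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question as context."""
    passages = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{passages}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
    "Refund requests need the original receipt.",
]
question = "What is the refund policy?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)
```

In a real system the overlap score would typically be replaced by embedding similarity, but the shape of the pipeline (search, then prompt assembly, then generation) stays the same.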

The Art of Blending:

Let us consider a possible emerging category of techniques: one that involves ‘blending’ together multiple diverse language models. The key idea is to present the same question or context to each model and to use one or more of the resulting answers to respond to the user. This blending approach treats each model as a ‘black box,’ meaning the models can have different architectures and be trained on different data sets.

Now the key question is: why bother to utilize multiple models?

One core reason is that although very large LLMs (175+ billion parameters), such as the GPT-3 family behind ChatGPT, can be applied to many tasks with high accuracy, they are simply very expensive to use.

Second, LLM leaderboards are dominated by proprietary models, and it is in everyone’s interest to find optimal ways to leverage open-source models.

Interestingly, it has been found that combining multiple models can indeed be better than simply picking any one of them for open-domain chat. In LLM-Blender [2], the authors evaluated various models on a custom-defined set of 5,000 questions and found that the best answer came from different models for different questions (see Fig 1), with no single model emerging as the clear winner. The real complexity, of course, lies in automatically selecting the best answer for each question and, potentially, combining the top answers into one superior response. The downside I see here is the additional response time, since every model must answer before a response can be chosen.
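The selection step can be sketched as a pairwise tournament: compare every pair of candidate answers and rank the candidates by number of wins. This is only an illustration of the general idea, not the LLM-Blender implementation. The function `rank_candidates` is a hypothetical name, and `toy_comparator` is a placeholder heuristic (word overlap with the question) standing in for a trained ranking model.

```python
import re
from itertools import combinations
from typing import Callable

# A comparator returns True when answer `a` is judged better than answer `b`.
Comparator = Callable[[str, str, str], bool]

def rank_candidates(question: str, candidates: list[str],
                    a_beats_b: Comparator) -> list[str]:
    """Order candidate answers by how many pairwise comparisons each one wins."""
    wins = {i: 0 for i in range(len(candidates))}
    for i, j in combinations(range(len(candidates)), 2):
        if a_beats_b(question, candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    order = sorted(wins, key=lambda i: wins[i], reverse=True)
    return [candidates[i] for i in order]

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_comparator(question: str, a: str, b: str) -> bool:
    """Placeholder judge (assumption): prefer the answer sharing more words
    with the question. A real system would use a learned ranker here."""
    q = words(question)
    return len(q & words(a)) > len(q & words(b))

# In practice each candidate would come from a different language model.
candidates = [
    "I do not know.",
    "The capital of France is Paris.",
    "France is a country.",
]
ranked = rank_candidates("capital of France", candidates, toy_comparator)
print(ranked[0])
```

The number of comparisons grows quadratically with the number of models, which adds to the response-time cost. The top few ranked answers could also be handed to a separate fusion model to produce a single combined response.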

Also, we can use blending to generate more diverse responses by randomly selecting which language model produces the response at each turn of a conversation. In a specific set of experiments, this approach was found to increase user retention and engagement [3].
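A minimal sketch of this random-selection idea follows. The two "models" are stub functions standing in for real chat models, and `blended_turn` is a hypothetical helper name. The only point illustrated is that a single model is picked at random for each turn while all models share the same conversation history.

```python
import random
from typing import Callable, Sequence

# Assumption: a chat model is any callable mapping the history to a reply.
ChatModel = Callable[[list[str]], str]

def formal_bot(history: list[str]) -> str:
    return f"formal: seen {len(history)} messages"

def casual_bot(history: list[str]) -> str:
    return f"casual: seen {len(history)} messages"

def blended_turn(history: list[str], user_message: str,
                 models: Sequence[ChatModel], rng: random.Random) -> str:
    """Append the user message, let one randomly chosen model reply,
    and record that reply in the shared history."""
    history.append(f"user: {user_message}")
    model = rng.choice(models)
    reply = model(history)
    history.append(f"assistant: {reply}")
    return reply

rng = random.Random(0)  # seeded only to make the demo repeatable
history: list[str] = []
for message in ["hello", "tell me more", "thanks"]:
    print(blended_turn(history, message, [formal_bot, casual_bot], rng))
```

Because only one model runs per turn, this variant costs about the same per response as using a single small model, unlike approaches that query every model and then rank the answers.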

As a potential further benefit, we could use outliers among the answers of the different LLMs to help identify and mitigate hallucinations and biases, thus providing a better overall user experience.
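One simple way to look for such outliers is to measure how much each answer agrees with the others and to flag the ones with low agreement. The sketch below is my own illustration, not a method taken from the cited papers. It assumes Jaccard similarity over word sets as a crude agreement measure, and the `threshold` value is arbitrary.

```python
import re

def token_set(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def flag_outliers(answers: list[str], threshold: float = 0.3) -> list[int]:
    """Return indices of answers whose mean similarity to the other
    answers falls below the threshold."""
    sets = [token_set(answer) for answer in answers]
    flagged = []
    for i, current in enumerate(sets):
        scores = [jaccard(current, other) for j, other in enumerate(sets) if j != i]
        if scores and sum(scores) / len(scores) < threshold:
            flagged.append(i)
    return flagged

answers = [
    "The Eiffel Tower is in Paris.",
    "The Eiffel Tower is located in Paris.",
    "It stands in Rome next to the Colosseum.",
]
print(flag_outliers(answers))
```

Disagreement is only a signal, not proof: the outlier may be the one correct answer, so a flagged response is better treated as a candidate for further checking than as a confirmed hallucination.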

I am personally curious whether blending can be used for some degree of personalization, such as shaping responses to the user’s reading level or knowledge.

Fig 1: Percentage of best answers per LLM. Source: LLM-Blender [2]

Final Thoughts:

The concept of using multiple models for a single output is well established in machine learning, with several ensemble techniques already mainstream, such as combining many decision trees in Random Forest or AdaBoost, or combining diverse model types through mechanisms like voting. A detailed explanation is available in various sources, including Wikipedia [4]. These techniques improve performance in classification and regression tasks. The idea of ensembling multiple language models for natural language processing tasks therefore appears to be a natural extension of these existing methods.
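For readers unfamiliar with classical ensembling, hard voting is the simplest case: several classifiers each predict a label and the most common label wins. The sketch below uses three deliberately naive rule-based spam detectors as stand-ins for trained models. All function names are hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import Callable, Sequence

Classifier = Callable[[str], str]

def has_link(text: str) -> str:
    return "spam" if "http" in text else "ham"

def shouts(text: str) -> str:
    return "spam" if text.isupper() else "ham"

def mentions_prize(text: str) -> str:
    return "spam" if "prize" in text.lower() else "ham"

def majority_vote(predictions: Sequence[str]) -> str:
    """Return the most frequent label among the predictions."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(text: str, classifiers: Sequence[Classifier]) -> str:
    """Ask every classifier for a label and combine the labels by hard voting."""
    return majority_vote([classify(text) for classify in classifiers])

classifiers = [has_link, shouts, mentions_prize]
print(ensemble_predict("Claim your prize at http://example.com", classifiers))
print(ensemble_predict("See you at lunch", classifiers))
```

Voting works cleanly here because the outputs are discrete labels. Free-form text from language models rarely matches exactly, which is why blending LLMs tends to need ranking or fusion steps rather than a simple vote.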

How useful can blending be? To judge its utility accurately, we need to see much more work in this area, similar to the extensive research conducted on RAG. Perhaps the way to match the performance of large language models is a combination of techniques applied to smaller models. For example, blending smaller models, RAG, and active learning together might provide an effective solution.

References:

[1] Phi-2: The surprising power of small language models: https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

[2] LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion: https://arxiv.org/pdf/2306.02561.pdf

[3] Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM: https://arxiv.org/abs/2401.02994

[4] Ensemble learning (Wikipedia): https://en.wikipedia.org/wiki/Ensemble_learning