Select Your LLM

Large language models (LLMs) like GPT-3 and PaLM have sparked a new era of AI-generated text. These models can produce remarkably human-like writing and unlock exciting possibilities for businesses. However, the LLM landscape is filled with options. How do you determine which model best fits your specific use case?

A basic understanding of how LLMs and deep learning work will help guide your selection process. At a high level, LLMs are trained on massive text datasets and learn to generate coherent text by predicting the next word in a sequence. Knowledge of the underlying architecture (the transformer), training approach (self-supervised learning), and parameter count will allow you to evaluate different models better.
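To make the next-word objective concrete, here is a minimal sketch using the Hugging Face transformers library; GPT-2 is used purely as a small, freely downloadable stand-in for a modern LLM.

```python
# A minimal sketch of next-token prediction, the core operation an LLM is trained on.
# Assumes `pip install transformers torch`; "gpt2" is only a small stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Large language models are trained to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # a score for every token in the vocabulary

next_token_id = int(logits[0, -1].argmax())  # most likely continuation of the prompt
print(tokenizer.decode(next_token_id))
```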

In this comprehensive guide, we’ll outline the key factors to consider when selecting an LLM while referring to the core deep learning concepts that power them. A thorough understanding of your needs and the models' capabilities will help you choose the right LLM for your goals.

Defining Your Needs and Use Case:

The first step is clearly defining what you want to achieve with an LLM. Ask yourself the following questions:

  • What are the main applications? Creative content, conversational AI, code generation?
  • Do you need longer, highly coherent text generation or concise responses?
  • Is accuracy on factual answers critical?
  • Does the model need to be fine-tuned to a niche domain?

Having defined objectives will inform the size, architecture, and capabilities required in a model.

Evaluating Model Architecture:

LLMs come in different architectures like GPT, BERT, and BART. Knowing how Transformer models process language helps you choose the best structure. For example:

  • GPT models excel at text generation, creativity, and open-ended tasks.
  • BERT models are better suited to question answering and search.
  • BART combines auto-encoding and auto-regressive capabilities, making it well suited for summarization.
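As a rough illustration of how these architecture families map to different tasks, here is a minimal sketch using Hugging Face pipelines; the checkpoints named below are just small, publicly available examples, not recommendations.

```python
# A rough sketch mapping architecture families to tasks via Hugging Face pipelines.
# The checkpoints below are small, publicly available examples, not recommendations.
from transformers import pipeline

# GPT-style (decoder-only): open-ended text generation
generate = pipeline("text-generation", model="gpt2")
print(generate("The future of AI is", max_new_tokens=20)[0]["generated_text"])

# BERT-style (encoder-only): extractive question answering
answer = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
print(answer(question="What architecture powers LLMs?",
             context="Large language models are built on the transformer architecture."))

# BART-style (encoder-decoder): summarization
summarize = pipeline("summarization", model="facebook/bart-large-cnn")
article = ("Large language models are trained on massive text datasets to predict the "
           "next word in a sequence, which lets them generate fluent text, answer "
           "questions, and summarize documents for a wide range of business use cases.")
print(summarize(article, max_length=30, min_length=10)[0]["summary_text"])
```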

Assessing Model Size:

As model size grows into the billions of parameters, so does the knowledge capacity and coherence of the generated text. However, the computing power required scales massively. More compact models, in the 6 billion parameter range, have some limits in quality but are far more feasible to run. Find the balance between your text quality needs, the size implications, and your budget; with thoughtful prioritization, you can land on a generative model that fits your aims.

Many open-source models are now available on HuggingFace, some with 100B+ parameters.
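Before committing to a model, it helps to estimate what its parameter count implies for memory. Below is a minimal sketch that counts a checkpoint's parameters and roughly projects its weights-only footprint at a few precisions; GPT-2 is only a small stand-in, and the bytes-per-parameter figures are approximations that ignore activations and optimizer state.

```python
# A rough sketch: count a model's parameters and estimate its in-memory footprint.
# "gpt2" is only a small stand-in; swap in any checkpoint you are evaluating.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
num_params = sum(p.numel() for p in model.parameters())

# Approximate weights-only memory: 4 bytes/param in fp32, 2 in fp16, ~0.5 in 4-bit.
for label, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int4 (approx.)", 0.5)]:
    gib = num_params * bytes_per_param / 1024**3
    print(f"{num_params/1e6:.0f}M parameters ≈ {gib:.2f} GiB of weights in {label}")
```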

More parameters certainly do not always mean better results; results depend on the problem domain. Most models are trained on generic data sources, so to work well in your domain they need to be fine-tuned to that context. Then comes the question, “What is the benefit of fine-tuning a 70B model vs a 7B one?” Often, the smaller model is enough. The model only needs the fundamental knowledge to understand a language like English; once it is good at the language, it is all about inputs and outputs.

If we consider an AI model as a function, the inputs are the domain data and the task, and the output is the expected result. These inputs are fed at different points: domain data during fine-tuning, and the task during inference. The output is produced from these two inputs, so focus more on your inputs than on the parameter count. The larger the number of parameters, the larger the model's size, resulting in complex hardware requirements and enormous costs.

Leveraging Fine-Tuning:

Most LLMs benefit immensely from fine-tuning on domain-specific data related to your use case. Opt for models and platforms that allow transfer learning and customization; this tailors the model to your needs.
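As a rough sketch of what such customization can look like, here is a minimal LoRA fine-tuning example built on the Hugging Face transformers and peft libraries; the base checkpoint, the tiny inline dataset, and every hyperparameter are illustrative placeholders, not a recommended recipe.

```python
# A minimal LoRA fine-tuning sketch with Hugging Face transformers + peft.
# Assumes `pip install transformers peft datasets`; the checkpoint, the tiny inline
# dataset, and all hyperparameters below are placeholders for illustration only.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "gpt2"  # stand-in for the base model you actually choose
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with small trainable LoRA adapters instead of updating all weights.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Your domain data would replace these toy examples.
texts = ["Example sentence from your domain.", "Another domain-specific example."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda row: tokenizer(row["text"], truncation=True, max_length=128))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", num_train_epochs=1,
                           per_device_train_batch_size=2, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")  # saves only the small adapter weights
```

Because only the small adapter weights are trained and saved, this style of transfer learning keeps the cost of tailoring a model to a niche domain manageable.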

APIs vs Self-Hosted Models:

Deciding whether to use APIs or self-hosted models is a big decision. From experience, three major factors drive it: compliance and legal needs (such as HIPAA), data security, and cost.

While APIs offer pay-as-you-go pricing, matching that model on self-hosted infrastructure is still challenging. An ordinary microservice can be deployed serverless with pay-as-you-go billing and no upfront cost. With an LLM, however, provisioning a machine and running inference on demand is cumbersome and impractical, as it results in delays and inefficiencies. Simply put, serverless for LLMs is only an option once we go with tiny models that can run in containers (yes, it is possible to run models in containers). Choosing a suitable model, whether a smaller specialized one or a large one, is a mix of art and science.
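To make the trade-off concrete, here is a minimal sketch of the same request served by a hosted API and by a self-hosted model; the model names, and the OPENAI_API_KEY environment variable the API client relies on, are assumptions for illustration rather than recommendations.

```python
# A minimal sketch contrasting a hosted API call with self-hosted inference.
# Assumes `pip install openai transformers torch` and an OPENAI_API_KEY in the
# environment; both model names are illustrative, not recommendations.

# Option 1: hosted API, pay-as-you-go, no infrastructure to manage.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
api_reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize HIPAA in one sentence."}],
)
print(api_reply.choices[0].message.content)

# Option 2: self-hosted model, data never leaves your infrastructure,
# but you provision and pay for the hardware yourself.
from transformers import pipeline

local_llm = pipeline("text-generation", model="gpt2")  # stand-in for your own model
print(local_llm("Summarize HIPAA in one sentence:", max_new_tokens=60)[0]["generated_text"])
```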

Assessing Vendor Reputation & Responsibility:

When using a third-party LLM vendor, ensure they have a strong reputation and track record in developing quality models responsibly. Vet their capabilities, support channels, and commitments to AI ethics.

Responsibility and Ethics:

With an open-source stack, you define content filters, ethical constraints, and data practices. Proprietary models embed the values of their creators.

Start Your LLM Journey Today:

With clear objectives, an understanding of model capabilities, and a thoughtful selection process, you'll be prepared to find the ideal large language model to take your AI initiatives to the next level.

To identify a suitable model, you first have to understand the problem well; that alone solves roughly 20% of it (a well-understood problem is a half-solved problem). Once the model is identified, having clean, contextual domain data that meets your data quality standards covers the next 30%. The remaining 50% comes from practices like prompt engineering and tuning the model's hyperparameters.

In our experience, we produced results comparable to large models like OpenAI GPT-3.5 using smaller models like Llama2-7B with calibration and hyperparameter tuning. We could even run these models (fine-tuning + inference) on consumer-grade machines using techniques like LoRA, and deploy them on smaller machines using quantization, which drastically reduced the model's size. The best model is not always the largest model. Do not get carried away too much by the name of an LLM ... :) Meet you in the next blog.
