The growing popularity of large language models (LLMs) has also created interest in embedding models, deep learning systems that compress the features of different data types into numerical representations.
Embedding models are a key component of retrieval-augmented generation (RAG), one of the most important enterprise applications of LLMs. But the potential of embedding models goes beyond current RAG applications. The past year has seen impressive advances in embedding applications, and 2024 promises to have even more in store.
How embeddings work
The basic idea of embeddings is to transform a piece of data, such as an image or text document, into a list of numbers representing its most important features. Embedding models are trained on large datasets to learn which features are most relevant for telling different pieces of data apart.
For example, in computer vision, embeddings can represent important features such as the presence of certain objects, shapes, colors, or other visual patterns. In text applications, embeddings can encode semantic information such as concepts, geographical locations, persons, companies, objects, and more.
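To make this concrete, here is a minimal sketch of turning text into embeddings. The sentence-transformers library and the all-MiniLM-L6-v2 model are illustrative choices, not tools any of the article's sources specifically use:

```python
# A minimal embedding sketch, assuming the sentence-transformers library.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The quarterly earnings report exceeded expectations.",
    "Revenue for Q3 was higher than analysts predicted.",
    "The new office opens in Berlin next month.",
]

# Each sentence becomes a fixed-length vector (384 dimensions for this model).
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384)

# Semantically similar sentences get similar vectors, unrelated ones do not:
print(cos_sim(embeddings[0], embeddings[1]))  # relatively high
print(cos_sim(embeddings[0], embeddings[2]))  # relatively low
```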
In RAG applications, embedding models are used to encode the features of a company’s documents. The embedding of each document is then stored in a vector store, a database that specializes in recording and comparing embeddings. At inference time, the application computes the embedding of a new prompt and sends it to the vector database to retrieve the documents whose embeddings are closest to the prompt’s. The content of the relevant documents is then inserted into the prompt, and the LLM is instructed to generate its response based on those documents.
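A minimal sketch of that retrieval step, with plain NumPy cosine similarity standing in for a real vector database (the function names here are illustrative, not any particular product's API):

```python
import numpy as np

def retrieve(prompt_embedding, doc_embeddings, k=3):
    """Return indices of the k documents whose embeddings are closest
    (by cosine similarity) to the prompt embedding."""
    norms = np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(prompt_embedding)
    sims = doc_embeddings @ prompt_embedding / norms
    return np.argsort(sims)[::-1][:k]

def build_rag_prompt(question, documents, top_idx):
    """Stitch the retrieved documents into the prompt sent to the LLM."""
    context = "\n\n".join(documents[i] for i in top_idx)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```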
This simple mechanism goes a long way toward customizing LLMs to respond based on proprietary documents or information that was not included in their training data. It also helps address problems such as hallucinations, where LLMs generate false statements due to a lack of proper information.
Beyond basic RAG
While RAG has been an important addition to LLMs, the benefits of retrieval and embeddings go beyond matching prompts to documents.
“Embeddings are primarily used for retrieval (and maybe for nice visualizations of concepts),” Jerry Liu, CEO of LlamaIndex, told TechForgePulse. “But retrieval itself is actually quite broad and extends beyond simple chatbots for question-answering.”
Retrieval can be a core step in any LLM use case, Liu says. LlamaIndex has been creating tools and frameworks to allow users to match LLM prompts to other types of tasks and data, such as sending commands to SQL databases, extracting information from structured data, long-form generation, or agents that can automate workflows.
“[Retrieval] is a core step towards augmenting the LLM with relevant context, and I imagine most enterprise LLM use cases will need to have retrieval in at least some form,” Liu said.
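As one hedged illustration of retrieval beyond question answering (a generic sketch, not LlamaIndex's actual API), a prompt can be routed to the most semantically similar task by embedding the prompt and short task descriptions:

```python
# Hypothetical task descriptions; `embed` is assumed to return
# unit-normalized vectors, so a dot product equals cosine similarity.
TASKS = {
    "text_to_sql": "Translate a natural-language question into a SQL query",
    "summarize": "Produce a long-form summary of one or more documents",
    "extract": "Pull structured fields out of semi-structured text",
}

def route(prompt, embed):
    """Pick the task whose description is most similar to the prompt."""
    p = embed(prompt)
    scores = {name: float(embed(desc) @ p) for name, desc in TASKS.items()}
    return max(scores, key=scores.get)
```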
Embeddings can also be used in applications beyond simple document retrieval. For example, in a recent study, researchers at the University of Illinois at Urbana-Champaign and Tsinghua University used embedding models to reduce the costs of training coding LLMs. They developed a technique that uses embeddings to choose the smallest subset of a dataset that is still diverse and representative of the different types of tasks the LLM must accomplish. This allowed them to train a high-quality model with fewer examples, as sketched below.
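The paper's exact selection procedure may differ from this, but a common embedding-based way to pick a small, diverse, representative subset is greedy k-center (farthest-point) sampling:

```python
import numpy as np

def farthest_point_sample(embeddings, budget):
    """Greedy k-center selection: repeatedly pick the example farthest
    from everything chosen so far, yielding a small but diverse subset."""
    selected = [0]  # start from an arbitrary example
    # Distance of every point to its nearest selected point so far.
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    while len(selected) < budget:
        idx = int(np.argmax(dists))
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[idx], axis=1))
    return selected
```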
Embeddings for enterprise applications
“Vector embeddings introduced the possibility of working with any unstructured and semi-structured data. Semantic search—and, to be honest, RAG is a type of semantic search application—is just one use case,” Andre Zayarni, CEO of Qdrant, told TechForgePulse. “Working with data other than textual (image, audio, video) is a big topic, and new multimodal transformers will make it happen.”
Qdrant is already providing services for using embeddings in different applications, including anomaly detection, recommendation, and time-series processing.
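As a generic illustration of the anomaly detection case (a sketch over raw embeddings, not Qdrant's API), items whose embeddings sit far from all of their nearest neighbors can be flagged as outliers:

```python
import numpy as np

def anomaly_scores(embeddings, k=5):
    """Score each item by the mean distance to its k nearest neighbors
    in embedding space; unusually isolated items score highest."""
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore self-distances
    knn = np.sort(dists, axis=1)[:, :k]
    return knn.mean(axis=1)

# A simple cutoff, e.g. scores above mean + 3 standard deviations,
# marks anomaly candidates for review.
```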
“In general, there are a lot of untapped use cases, and the number will grow with upcoming embedding models,” Zayarni said.
More companies are exploring the use of embedding models to examine the large amounts of unstructured data they are generating. For example, embeddings can help companies categorize millions of customer feedback messages or social media posts to detect trends, common themes, and sentiment changes.
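A minimal sketch of that kind of theme detection, assuming scikit-learn and precomputed embeddings for each message:

```python
from sklearn.cluster import KMeans

def cluster_feedback(embeddings, messages, n_themes=10):
    """Group feedback messages into recurring themes by clustering their
    embeddings; inspect a few messages per cluster to name each theme."""
    km = KMeans(n_clusters=n_themes, n_init=10, random_state=0).fit(embeddings)
    themes = {}
    for label, msg in zip(km.labels_, messages):
        themes.setdefault(int(label), []).append(msg)
    return themes
```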
“Embeddings are ideal for enterprises looking to sort through huge amounts of data to identify trends and develop insights,” Nils Reimers, Embeddings Lead at Cohere, told TechForgePulse.
Fine-tuned embeddings
2023 saw a lot of progress around fine-tuning LLMs with custom datasets. However, fine-tuning remains a challenge, and so far only a few companies with strong data and in-house expertise are doing it.
“I think there will always be a funnel from RAG to finetuning; people will start with the easiest thing to use (RAG), and then look into fine-tuning as an optimization step,” Liu said. “I anticipate more people will do finetuning this year for LLMs/embeddings as open-source models themselves also improve, but this number will be smaller than the number of people that do RAG unless we somehow have a step-change in making fine-tuning super easy to use.”
Fine-tuning embeddings also has its challenges. For example, embeddings are sensitive to data shifts. If you train them on short search queries, they will not do as well on longer queries, and vice versa. Similarly, if you train them on “what” questions, they will not perform as well on “why” questions.
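Mechanically, fine-tuning an embedding model on in-domain pairs looks something like the sketch below, which uses the sentence-transformers training API with a contrastive loss; mixing query styles (short and long, “what” and “why”) in the training pairs is one way to blunt the shift sensitivity described above. The example texts are illustrative:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# (query, relevant passage) pairs; real training needs many thousands.
train_examples = [
    InputExample(texts=["what is a vector database?",
                        "A vector database stores and compares embeddings..."]),
    InputExample(texts=["why are embeddings sensitive to data shifts?",
                        "Models trained on one query style underperform on others..."]),
]

loader = DataLoader(train_examples, shuffle=True, batch_size=16)
# Pulls matching pairs together, pushes apart other pairs in the batch.
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
```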
“Currently, enterprises would need very strong in-house ML teams to make embedding finetuning effective, so it’s usually better to use out-of-the-box options, in contrast to other facets of LLM use cases,” Reimers said.
Nonetheless, there have been advances in making the training process for embedding models more efficient. For example, a recent study by Microsoft shows that pre-trained LLMs such as Mistral-7B can be fine-tuned for embedding tasks with a small dataset generated by a strong LLM. This is much simpler than the traditional multi-step process that requires heavy manual labor and expensive data acquisition.
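The paper's pipeline is more elaborate, but the core idea of LLM-generated training data can be sketched roughly as follows; the client, model name, and prompt here are illustrative assumptions, not details from the study:

```python
from openai import OpenAI

client = OpenAI()  # any strong instruction-following LLM would do

PROMPT = (
    "Write a short search query about the topic '{topic}', then a passage "
    "that answers it. Return the query on the first line and the passage "
    "on the following lines."
)

def synthesize_pair(topic, model="gpt-4o-mini"):
    """Ask a strong LLM to produce one (query, passage) training pair."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(topic=topic)}],
    )
    query, _, passage = resp.choices[0].message.content.partition("\n")
    return query.strip(), passage.strip()

# Pairs like these can then feed a contrastive fine-tuning run
# such as the one sketched above.
```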
Given the pace at which LLMs and embedding models are advancing, we can expect more exciting developments in the coming months.