Toronto-based AI startup Cohere has launched Embed V3, the latest iteration of its embedding model, designed for semantic search and applications leveraging large language models (LLMs).

Embedding models, which transform data into numerical representations called “embeddings,” have gained significant attention due to the rise of LLMs and their potential use cases for enterprise applications.

Embed V3 competes with OpenAI’s Ada and various open-source options, promising superior performance and enhanced data compression. This advancement aims to reduce the operational costs of enterprise LLM applications.

Embeddings and RAG

Embeddings play a pivotal role in various tasks, including retrieval augmented generation (RAG), a key application of large language models in the enterprise sector.

RAG enables developers to provide context to LLMs at runtime by retrieving information from sources such as user manuals, email and chat histories, articles, or other documents that weren’t part of the model’s original training data.

To perform RAG, companies must first create embeddings of their documents and store them in a vector database. Each time a user queries the model, the AI system calculates the prompt’s embedding and compares it to the embeddings stored in the vector database. It then retrieves the documents that are most similar to the prompt and adds their content to the user’s prompt, providing the LLM with the necessary context.
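The retrieval loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Cohere's implementation: toy_embed is a hypothetical stand-in that counts words instead of calling a real embedding model, an in-memory list plays the role of the vector database, and the sample documents are invented for the example.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, index, top_k=2):
    """Embed the query and return the top_k most similar stored documents."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def build_prompt(query, docs):
    """Prepend the retrieved documents to the user's question as context."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

# Step 1: embed the documents once and store them (the "vector database").
documents = [
    "reset the router by holding the power button for ten seconds",
    "the warranty covers hardware defects for two years",
    "invoices are emailed on the first day of each month",
]
index = [(doc, toy_embed(doc)) for doc in documents]

# Step 2: at query time, embed the prompt, retrieve, and augment the prompt.
query = "how do I reset the router"
top_docs = retrieve(query, index, top_k=1)
prompt = build_prompt(query, top_docs)
print(prompt)
```

In a production system, a real embedding model and a dedicated vector database would replace toy_embed and the list, but the order of operations stays the same: index once, then embed, compare, retrieve and augment on every query.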

Solving new challenges for enterprise AI

RAG can help solve some of the challenges of LLMs, including lack of access to up-to-date information and the generation of false information, sometimes referred to as “hallucinations.”

However, as with other search systems, a significant challenge of RAG is to find the documents that are most relevant to the user’s query.

Previous embedding models have struggled with noisy data sets, where some documents may not have been correctly crawled or don’t contain useful information. For instance, if a user queries “COVID-19 symptoms,” older models might rank a less informative document higher simply because it includes the phrase “COVID-19 has many symptoms.”

Cohere’s Embed V3, on the other hand, demonstrates superior performance in matching documents to queries by providing more accurate semantic information on the document’s content.

In the “COVID-19 symptoms” example, Embed V3 would rank a document discussing specific symptoms such as “high temperature,” “continuous cough,” or “loss of smell or taste,” higher than a document merely stating that COVID-19 has many symptoms. 

According to Cohere, Embed V3 outperforms other models, including OpenAI’s ada-002, in standard benchmarks used to evaluate the performance of embedding models. 

Embed V3 is available in different embedding sizes and includes a multilingual version capable of matching queries to documents across languages. For example, it can locate French documents that match an English query. Moreover, Embed V3 can be configured for various applications, such as search, classification and clustering. 

Advanced RAG 

According to Cohere, Embed V3 has demonstrated superior performance on advanced use cases, including multi-hop RAG queries. When a user’s prompt contains multiple queries, the model must identify these queries separately and retrieve the relevant documents for each of them.

This usually requires multiple steps of parsing and retrieval. Embed V3’s ability to provide higher-quality results within its top-10 retrieved documents reduces the need to make multiple queries to the vector database.

Embed V3 also improves reranking, a feature Cohere added to its API a few months ago. Reranking allows search applications to re-sort existing search results based on their semantic similarity to the query.

“Rerank is especially strong for queries and documents that address multiple aspects, something embedding models struggle with due to their design,” a spokesperson for Cohere told TechForgePulse. “However, Rerank requires that an initial set of documents is passed as input. It is critical that the most relevant documents are part of this top list. A better embedding model like Embed V3 ensures that no relevant documents are missed in this shortlist.”
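The spokesperson's point about the shortlist can be illustrated with a toy two-stage pipeline. The sketch below is an assumption-laden illustration, not Cohere's Rerank API: the function names are hypothetical, and both score tables are made-up numbers chosen so that the document the second stage likes best (doc_c) scores poorly in the first stage.

```python
def shortlist(embed_scores, k):
    """First stage: keep the k document ids with the highest embedding similarity."""
    return sorted(embed_scores, key=embed_scores.get, reverse=True)[:k]

def rerank(candidates, rerank_scores):
    """Second stage: reorder only the shortlisted candidates by a finer-grained score."""
    return sorted(candidates, key=lambda doc_id: rerank_scores[doc_id], reverse=True)

# Made-up scores for illustration only.
embed_scores = {"doc_a": 0.81, "doc_b": 0.78, "doc_c": 0.40, "doc_d": 0.75}
rerank_scores = {"doc_a": 0.30, "doc_b": 0.55, "doc_c": 0.95, "doc_d": 0.60}

narrow = rerank(shortlist(embed_scores, 2), rerank_scores)
wide = rerank(shortlist(embed_scores, 4), rerank_scores)

print(narrow)  # doc_c never reaches the reranker
print(wide)    # doc_c is in the shortlist, so the reranker can promote it
```

The reranker can only reorder what it is given. If the first-stage embedding model leaves the most relevant document out of the shortlist, no amount of reranking brings it back, which is why a stronger embedding model raises the ceiling for the whole pipeline.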

Moreover, Embed V3 can help reduce the costs of running vector databases. The model underwent a three-stage training process, including a special compression-aware training method. “A major cost factor, often 10x-100x higher than computing the embeddings, is the cost for the vector database,” the spokesperson said. “Here, we performed a special compression-aware training, that makes the models suitable for vector compression.”

According to Cohere’s blog, this compression stage ensures the models work well with vector compression methods. This compatibility significantly reduces vector database costs, potentially by a factor of several, while maintaining up to 99.99% search quality.
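To make the idea of vector compression concrete, here is a minimal sketch of one common technique, int8 scalar quantization. The article does not say which compression methods Cohere targets, so the choice of technique is an assumption, and the sample vector is invented for illustration.

```python
import math

def quantize_int8(vec):
    """Map each float to an integer in [-127, 127] using a single per-vector scale."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # fall back to 1.0 for an all-zero vector
    return [round(x / scale) for x in vec], scale

def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Invented example vector; real embeddings have hundreds or thousands of dimensions.
vec = [0.12, -0.53, 0.98, 0.04, -0.27]
codes, scale = quantize_int8(vec)
approx = dequantize(codes, scale)

similarity = cosine(vec, approx)
bytes_float32 = 4 * len(vec)  # 4 bytes per float32 component
bytes_int8 = 1 * len(vec)     # 1 byte per int8 component (plus one scale per vector)

print(codes)
print(f"cosine(original, dequantized) = {similarity:.6f}")
print(f"storage: {bytes_float32} bytes -> {bytes_int8} bytes")
```

Storing one byte per dimension instead of four cuts storage roughly fourfold, at the price of a small rounding error in each component. Compression-aware training, as Cohere describes it, is aimed at keeping search quality high once vectors are compressed.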
