Make Sense of your Data

Empowering your Workforce with Knowledge

A company may be rich in information, much of it proprietary, distributed across many thousands of documents such as emails, Word, PowerPoint, Excel and PDF files. The challenge is that, although the company is rich in information, it lacks the ability to readily retrieve that information in response to a query from a user.

The solution to this challenge is Retrieval Augmented Generation (RAG), which has only now become practical with the availability of Large Language Models (LLMs) and vector databases.

In the very simplest terms, you can think of RAG as a Google keyword search on steroids, with the emphasis very much on the steroids. A keyword search, sometimes described as a lexical search, looks for literal matches of the keyword without any understanding of the overall meaning of the query. Suppose I ran a keyword search for the word ‘car’. If the referenced material had no occurrence of the word ‘car’ but had used the term ‘vehicle’, then no results would be returned.

RAG systems perform a semantic search based on meaning, and would understand that the terms ‘car’ and ‘vehicle’ are closely associated with one another. Such searches seek to improve search accuracy by understanding the intent of the question and the contextual meaning of terms as they appear in the searchable space. This searchable space could include the world wide web; however, in this post we will assume that it refers to a closed system containing the company’s intellectual property.
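The ‘car’ versus ‘vehicle’ example can be sketched in a few lines of Python. The three-dimensional vectors below are hand-assigned purely for illustration; a real system would obtain them from an embedding model.

```python
import math

def keyword_search(query: str, documents: list[str]) -> list[str]:
    """Lexical search: return documents containing the literal query word."""
    return [d for d in documents if query.lower() in d.lower()]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Measure how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

docs = ["The vehicle was parked outside.", "Bananas are rich in potassium."]

# Hand-made toy embeddings: 'car' and 'vehicle' point in similar directions.
embeddings = {
    "car":                             [0.9, 0.1, 0.0],
    "The vehicle was parked outside.": [0.8, 0.2, 0.1],
    "Bananas are rich in potassium.":  [0.0, 0.1, 0.9],
}

# Lexical search finds nothing: the literal word 'car' never appears.
print(keyword_search("car", docs))  # []

# Semantic search ranks the 'vehicle' sentence far above the banana one.
for d in docs:
    print(d, round(cosine_similarity(embeddings["car"], embeddings[d]), 2))
```

The keyword search comes back empty, while the cosine scores clearly rank the ‘vehicle’ sentence as the closest match to ‘car’.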

RAG systems can do more than just answer questions: they can also be asked to summarise information and present it in a format that is suitable for the user. For example, a response that is returned to an adult is probably not appropriate for a teenager.

Use Cases

Question Answering
RAG systems excel in answering complex questions by leveraging a large corpus of external knowledge. Instead of relying solely on pre-existing training data, they dynamically fetch and incorporate information from relevant sources to provide accurate and up-to-date answers.

Content Creation
In fields like journalism, content marketing, or technical writing, RAG systems can assist in generating articles, reports, or summaries by retrieving pertinent information from diverse sources. This ensures that the content is comprehensive, well-researched, and aligned with current developments.

Customer Support and Chatbots
RAG systems can enhance chatbot capabilities by retrieving specific information related to customer queries. This allows chatbots to provide more accurate and contextually relevant responses, improving customer satisfaction and efficiency in handling inquiries.

Naive RAG

The major components of a simple RAG system are:

Document Loaders
Typically, information is stored in a variety of file formats. The objective of loaders is to convert this data into a more structured format called ‘documents’. Such documents contain both the content of the original files and metadata such as source and timestamps.
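A loader for plain text files can be sketched as below. The `page_content`/`metadata` field names are illustrative, not the API of any particular library; real loaders also handle PDF, Word and other formats.

```python
import datetime
import pathlib

def load_text_file(path: str) -> dict:
    """Toy loader: read a text file and wrap it as a 'document' record
    holding both the content and metadata (source, load timestamp)."""
    p = pathlib.Path(path)
    return {
        "page_content": p.read_text(encoding="utf-8"),
        "metadata": {
            "source": str(p),
            "loaded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        },
    }
```

Keeping the source path in the metadata is what later lets a RAG system cite where an answer came from.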

Transform and Embed
Large language models (LLMs) have a context window: the amount of text a model can handle at any one point in time when generating a response. Thus, in practical terms it is necessary to break a document down into chunks. However, this is not a straightforward process, as ideally you want to keep semantically linked pieces of text together.

Embeddings are used to create a vector representation of the text. As stated earlier, this vector representation is key to supporting a semantic search.

Store
The previously created embeddings are stored in a vector database. With the explosion of AI and the demand for such technology, there are numerous offerings available. These may be pure vector databases, or conventional SQL/NoSQL databases that have been enhanced to provide vector support. Examples of the latter include PostgreSQL, Cassandra, MongoDB and ClickHouse.
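What a vector database does can be illustrated with a toy in-memory store: it keeps (vector, chunk) pairs and, given a query vector, returns the chunks whose vectors are closest by cosine similarity. This is a sketch of the idea only; a real database adds persistence, indexing (e.g. approximate nearest-neighbour search) and scale.

```python
import math

class InMemoryVectorStore:
    """Toy vector store: linear scan over (vector, chunk) pairs,
    ranked by cosine similarity to the query vector."""

    def __init__(self):
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], chunk: str) -> None:
        self._items.append((vector, chunk))

    def search(self, query_vector: list[float], k: int = 3) -> list[str]:
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))
        ranked = sorted(self._items, key=lambda it: cos(query_vector, it[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]

# Usage with hand-made two-dimensional vectors:
store = InMemoryVectorStore()
store.add([1.0, 0.0], "a chunk about cars")
store.add([0.0, 1.0], "a chunk about bananas")
print(store.search([0.9, 0.1], k=1))  # ['a chunk about cars']
```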

User Question and Answer Session
The user writes a query in their natural language. An embedding model converts the question into vector form; this vector is passed to the vector database, which retrieves the chunks that most closely match the query. The answer itself is generated by a large language model (LLM), which is typically, but not exclusively, cloud based. Examples of cloud-based LLMs are ChatGPT and Bard. However, various open-source projects, such as LLaMA, now offer commercially licensed models that rival the cloud-based offerings and can be hosted in the cloud or on-premise.

Finally, the retrieved chunks are passed to the LLM, together with an optional prompt, which causes the chunks to be reconstituted into a natural language answer. The purpose of the prompt is to give the LLM additional information about how you would like the question to be answered: for example, ‘Summarise the answer in 100 words’, or ‘The response should be suitable for my teenager to understand’.
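This final step can be sketched as a prompt-assembly function: the retrieved chunks become the context, and the optional instruction steers the style of the answer. The template below is illustrative; the string that results would be sent to whichever LLM you use.

```python
def build_prompt(question: str,
                 chunks: list[str],
                 instruction: str = "Answer using only the context below.") -> str:
    """Assemble the final prompt for the LLM: an optional style
    instruction, the retrieved chunks as numbered context, and the
    user's question."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"{instruction}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"

# Usage: steer the answer's style via the instruction.
prompt = build_prompt(
    "How do our products differ?",
    ["chunk about product A", "chunk about product B"],
    instruction="Summarise the answer in 100 words, suitable for a teenager.",
)
print(prompt)
```

Grounding the LLM in the retrieved context in this way is what lets a RAG system answer from the company's own documents rather than from the model's training data alone.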