
Learning Is a Lifelong Task – Also for AI Language Models?

Learning is a lifelong task, and it is the only way to keep up with today’s stream of information, changing conditions and ever-new challenges.

Lifelong learning also applies to AI language models, which also need further training to stay current. After all, a language model only has the knowledge that it has acquired through training.

In this blog post, we describe how to enrich language models with up-to-date and reliable information without retraining them, and how we at SEEBURGER use this for the benefit of our customers. The watchword is: Retrieval Augmented Generation.

What are AI language models and what challenges do they involve?

AI language models let you easily generate texts using instructions in natural language – prompts – and can provide customized text responses to user queries. With the sudden mainstreaming of language models through the release of the chatbot “ChatGPT”, the advantages of these models have come to the fore. However, as the various language models become more widespread, their disadvantages are also becoming more obvious. A language model only knows what it has learned in its training. It also cannot identify the source of its information and therefore cannot verify the accuracy of a statement. This is particularly problematic because language models are sometimes prone to hallucinations and giving factually incorrect responses.

After initial extensive training, a language model is often fine-tuned to a specific domain. This fine-tuning uses a smaller, more specialized data set and is an iterative process aimed at improving the model’s performance in a particular field without losing the knowledge acquired during the initial training.

Training and fine-tuning a language model is expensive, resource-intensive, and requires appropriate training data. Since the underlying knowledge changes frequently, keeping a language model current through fine-tuning is also time-consuming. Furthermore, fine-tuning cannot solve the problems of lack of transparency or occasional hallucinations.

How can we address these challenges and enrich a language model with up-to-date and reliable information? The answer to this question is Retrieval Augmented Generation, or RAG for short.

What is artificial intelligence?

There are many definitions of the term “artificial intelligence” (AI). The understanding of artificial intelligence is flavored by what we understand by “intelligence”. If our measure of intelligence is the way humans think and act, then the goal of artificial intelligence is to imitate human behavior or thought processes. On the other hand, a more idealistic view of intelligence is defined by rationality, with the goal of achieving optimal results beyond human capabilities. Definitions of intelligence also differ in whether it is described as the ability to think or as externally observable behavior. This means that intelligence can be measured by both the decision-making process and the outcome of a decision.1

A very broad definition – and as we currently understand it a fitting definition – of artificial intelligence, is the ability of computer-based systems to act rationally. Artificial intelligence in the broadest sense therefore includes any technology that allows computers to mimic rational behavior and reproduce or surpass human decisions to solve complex tasks independently or with minimal human intervention.2

 

How does artificial intelligence work?

An intelligent system is capable of solving certain tasks. To do so, it must know what to do and how to do it. For complex problems, it would be extremely difficult even for experts to formulate these rules. Instead of relying on programmed processes, machine intelligence grows through machine learning.3 The goal of machine learning is the automated generation of analytical models to solve cognitive tasks.4

This relationship can be illustrated by the following comparison: Imagine trying to explain to a child what a cat looks like. You could describe its fur, its size, its ears, and its eyes. However, a child who has never seen a cat would have a difficult time visualizing the animal despite your detailed explanations. But if you happened to see a cat walking by and showed it to the child, she would recognize its features and compare them to her previous experiences. The difficult task of explaining a cat to a child can be compared to programming a sequence of actions. It is much easier for a machine to learn patterns on its own from training data – much like a child learning what a cat looks like.

 

 

Figure 1: It is just as difficult to program instructions for complex problems as it is to explain a cat to a child.

Figure 2: A child that sees a cat learns what it looks like by itself. Machines can also independently learn knowledge and behaviors through machine learning.

How do machines learn?

Machine learning algorithms allow computers to learn iteratively from problem-specific training data, finding hidden knowledge and complex patterns in that data. General rules are abstracted from concrete examples and can be applied to unknown situations. In general, machine learning means that the performance of a computer program improves as it gains experience with specific tasks and performance measures. There are many types of machine learning algorithms and concepts. One very specific machine learning concept that is popular because of its applicability to text, image, video, speech, and audio data is deep learning.5
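This idea – a program that improves at a task through experience with data rather than through hand-written rules – can be illustrated with a minimal sketch. The hidden rule, the data, and the learning rate below are all invented for illustration; this is not a production algorithm:

```python
# A minimal sketch of machine learning: the program is never told the
# rule y = 2x + 1. It only sees example data and iteratively adjusts
# its parameters to reduce its prediction error.

# Training examples that secretly follow y = 2x + 1
examples = [(0, 1), (1, 3), (2, 5), (3, 7)]

w, b = 0.0, 0.0        # model parameters, starting with no knowledge
learning_rate = 0.05

for epoch in range(2000):            # iterate over the data many times
    for x, y in examples:
        prediction = w * x + b
        error = prediction - y       # performance measure: how wrong are we?
        w -= learning_rate * error * x  # nudge parameters to reduce the error
        b -= learning_rate * error

print(round(w, 2), round(b, 2))  # parameters approach the hidden rule
```

After enough iterations, the learned parameters approximate the hidden rule (w near 2, b near 1), and the abstracted rule generalizes to x values the program has never seen.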

 

What is deep learning?

Deep Learning is a machine learning concept based on deep neural networks. For many use cases, deep learning is better suited than conventional machine learning or traditional approaches to data analysis. Deep learning is particularly useful in areas with large and high-dimensional data, which is why deep neural networks outperform shallow machine learning algorithms in most applications involving text, image, video, voice, or audio.6

Artificial neural networks are inspired by the way information is processed in the human brain. They consist of mathematical representations of interconnected processing units called artificial neurons. Like synapses in the brain, each connection between neurons transmits signals whose strength can be amplified or attenuated by a weighting that is constantly adjusted during the learning process. Neurons are typically organized in multi-layer networks. An input layer usually receives the input data and an output layer produces the final result. In between, there are usually several hidden layers that are responsible for learning a non-linear relationship between input and output. Deep neural networks, as used in deep learning, are organized in deeply nested networks.7
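The structure described above can be sketched in a few lines. The toy network below has an input layer, one hidden layer, and an output neuron; the weights are fixed by hand purely for illustration, whereas in a real network they would be adjusted during training:

```python
import math

# A toy feed-forward network: 2 inputs -> 2 hidden neurons -> 1 output.
# The hand-picked weights are illustrative stand-ins for values that
# would normally be learned, like synaptic strengths in the brain.

def sigmoid(x):
    # Non-linear activation: this is what lets hidden layers learn a
    # non-linear relationship between input and output
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    # Each hidden neuron sums its weighted inputs, then applies sigmoid
    hidden = [sigmoid(sum(w * i for w, i in zip(ws, inputs)))
              for ws in hidden_weights]
    # The output neuron does the same over the hidden activations
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

hidden_weights = [[4.0, -4.0], [-4.0, 4.0]]  # illustrative values
output_weights = [5.0, 5.0]

result = forward([1.0, 0.0], hidden_weights, output_weights)
print(result)
```

A deep network as used in deep learning simply stacks many such hidden layers, so that each layer builds on the representations computed by the previous one.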

 

How does AI-generated language work?

AI-based systems that can generate new content such as text, images, or audio from training data are generally referred to as generative intelligence. For example, a generative language model learns statistical information about natural language from training data and uses this information to interpret and generate natural language.8
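A heavily simplified sketch can make this concrete. The bigram model below learns which word tends to follow which from a tiny invented training text, then samples from those statistics to generate new text. Real generative language models use deep neural networks over far larger corpora, but the principle – learn the statistics of language, then use them to generate – is the same:

```python
import random
from collections import defaultdict

# A toy statistical language model: learn bigram statistics (which word
# follows which) from a tiny training text, then generate new text by
# repeatedly sampling a plausible next word.

training_text = ("the cat sat on the mat the cat saw the dog "
                 "the dog sat on the mat").split()

# Learn: record which words follow each word in the training data
follows = defaultdict(list)
for current, nxt in zip(training_text, training_text[1:]):
    follows[current].append(nxt)

# Generate: start with a word and repeatedly sample a likely successor
random.seed(0)
word, sentence = "the", ["the"]
for _ in range(6):
    word = random.choice(follows[word])  # sample from learned statistics
    sentence.append(word)

print(" ".join(sentence))
```

The generated sequence is new, yet every word transition in it was observed in the training data – which also hints at why such models can only reproduce what their training data contains.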

Language models can be classified not only under the umbrella of generative artificial intelligence, but also under the umbrella of natural language processing (NLP). Natural language processing refers to computer-based techniques for the automatic processing and recognition of natural language. These techniques are used, for example, in machine language translation and text analysis.9 These techniques are particularly interesting when they enable seamless communication between humans and computers based on natural language.

 

Figure 3: A Venn diagram of Large Language Models

 

Large Language Models (LLM) have proven to be particularly suitable for this application. Large language models are special deep learning models that can perform a variety of language processing tasks. Compared to conventional language models, they are characterized by their enormous parameter size, which gives them increased performance and special abilities known as emergent abilities.10 A particularly well-known example of a Large Language Model is GPT-4 from OpenAI, the engine behind ChatGPT.

 

How can you augment pre-trained language models with vetted knowledge?

Large language models have been shown to perform well in natural language processing tasks. For example, they can formulate precise responses to user queries. They are also able to acquire and reproduce extensive knowledge from data without access to external memory. However, this ability to retrieve and accurately process learned knowledge is limited. This can cause large language models to hallucinate and invent incorrect answers. They are also unable to base their decisions on reliable sources. It is possible to fine-tune the knowledge base of a pre-trained language model. This involves retraining the basic model on a smaller, problem-specific data set. However, this retraining is very time-consuming, requires a suitable training data set, and is therefore unsuitable for frequent tuning. This means that a large language model may be using out-of-date information to answer queries.11

Instead, the issues above can be solved by using Retrieval Augmented Generation.

What is Retrieval Augmented Generation?

Retrieval Augmented Generation (RAG) extends a model with parametric memory, such as a large language model, with an additional external information store. This store is accessed by a pre-trained neural retriever.12

When processing input, the retriever retrieves appropriate documents for the large language model to use to generate its output.13
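The flow can be sketched in a few lines. For readability, the neural retriever is replaced here by a simple word-overlap scorer, and the document store contains three invented example documents; a real system would use learned embeddings over a genuine knowledge base:

```python
# A minimal sketch of the RAG flow: a retriever selects the most relevant
# document from the information store, and that document is placed into
# the prompt from which the language model generates its answer. The
# word-overlap retriever and the documents are illustrative stand-ins.

documents = [
    "Room 204 is reserved for guest Miller from May 2 to May 4.",
    "The hotel restaurant is open daily from 6 pm to 10 pm.",
    "Checkout time is 11 am; late checkout can be requested at the desk.",
]

def retrieve(query, docs):
    # Score each document by how many words it shares with the query
    query_words = set(query.lower().split())
    def score(doc):
        return len(query_words & set(doc.lower().split()))
    return max(docs, key=score)

query = "When is checkout time?"
context = retrieve(query, documents)

# The retrieved document augments the query; a large language model
# would then generate its answer from this combined prompt.
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```

Because the answer is generated from the retrieved context rather than from the model's parameters alone, the source of the information remains visible and can be updated without retraining.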

Figure 4: This is how Retrieval Augmented Generation works

The following comparison illustrates how a RAG system works: When you book a room at a hotel and go to the front desk, the receptionist assigns you a room. Since it is impossible for the receptionist to keep track of all the bookings and last-minute changes made by hotel guests, the receptionist looks up your room number in the hotel’s reservation system. The reservation system provides the correct and up-to-date information. The receptionist’s task is then to communicate this information to you in an appropriate manner.

Similar to a reservation system’s database, a RAG system’s memory contains the specialist information needed. The retriever provides the information from this memory that is relevant to answering a query. Like the receptionist, the large language model then uses this information to provide an appropriate answer to the query.

The use of Retrieval Augmented Generation makes it possible to provide more specific, fact-based responses using large language models. Because its information store is separate from that of the language model, it is quick and easy to update the information it contains. A RAG system not only reduces LLM hallucinations, it also provides more checks and greater interpretability.

How does RAG contribute to better language models?

Retrieval Augmented Generation retrieves contextual information for the LLM to answer a query. The user does not see this background process; all the user notices is that the information the retriever pulled from the information store has resulted in a more accurate, contextual response.

A RAG system can therefore help provide answers based on correct and up-to-date information, particularly for specific use cases or subject areas. In addition, the information on the basis of which queries are answered is transparent and comprehensible for the user.

RAG systems are particularly flexible because the knowledge base is independent of the language model used. This means that it can be easily exchanged, updated or otherwise modified. It is also possible to operate the language model locally and independently of third-party vendors. This also makes RAG systems attractive in terms of data security and privacy.

How do you maintain a RAG system?

In order to be able to successfully answer as many user queries as possible, a RAG system needs an extensive, high-quality information database.

A large part of developing such a system therefore involves selecting, pre-processing and expanding the information for the database.
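One common pre-processing step is splitting long documents into overlapping chunks, so the retriever can return focused passages rather than entire documents. The sketch below illustrates the idea; the chunk size and overlap values are invented for illustration, and good values depend on the retriever and the domain:

```python
# Sketch of a pre-processing step for a RAG information database:
# split a long document into overlapping word chunks. The overlap keeps
# sentences that straddle a chunk boundary retrievable from either side.

def chunk_words(text, chunk_size=50, overlap=10):
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break  # the last chunk reached the end of the document
    return chunks

document = "word " * 120  # stand-in for a real document of 120 words
chunks = chunk_words(document, chunk_size=50, overlap=10)
print(len(chunks))  # chunks start at words 0, 40 and 80
```

Each chunk would then be indexed in the information store, where the retriever can match it against incoming queries.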

In some domains, it is useful to combine a RAG system with fine-tuning techniques to achieve even more precise and accurate results. For example, some fields require a specific vocabulary that can be achieved through fine-tuning.

Although RAG systems offer a number of advantages and even allow the integration of external data sources, their use does not preclude such additional customization. For example, SEEBURGER uses a RAG system enhanced with SEEBURGER-specific documents to help SEEBURGER users perform their tasks.

Figure 5: A RAG system with SEEBURGER information helps answer questions about SEEBURGER

 

Figure 6: A RAG system with information on SEEBURGER products helps you use SEEBURGER solutions

 

The flexibility of Retrieval Augmented Generation lies in the ability to easily replace, update or delete the information used by a language model, as well as in its independence from the language model used. Retrieval Augmented Generation is therefore not only ideal for adapting pre-trained language models to specific use cases and contexts, but also allows an LLM to keep learning and stay current throughout its lifetime, so that its users always have access to the latest information.

About SEEBURGER

SEEBURGER is an integration service and software provider. One central platform, one experience, all integrations, all deployment models. Our BIS platform enables the seamless networking of applications, people and processes, whether in the cloud, a hybrid environment or on-premises. Integrate applications and technologies for the secure exchange of data, automate and digitalize processes and enable innovation through perfect integration.

Family-owned since 1986, SEEBURGER now has over 1,200 employees worldwide. Over 14,000 customers rely on integration expertise from SEEBURGER every day.


Source: https://blog.seeburger.com/learning-is-a-lifelong-task-also-for-ai-language-models/
