The Andi Copilot Early Adopter Program: How LLMs Are Powering the Copilot

Welcome to Part Four of our ongoing in-depth look at the Andi Copilot Early Adopter (EA) program and its first major area of focus, the commercial credit and lending sector. You can catch up on the story via this link – which will take you to the previous posts. 

Part three outlined how user feedback guided us to our first Andi Copilot GenAI use case – verifying accuracy between the credit memos and commercial loan agreements.

But while identifying the first big job for Andi Copilot was important, it was just the first step on a long journey. As anyone who works in GenAI and Large Language Models (LLMs) will tell you, the field is littered with shuttered companies that put together flashy demos but then failed when it came to turning that potential into concrete reality.

In today’s post we’ll outline some of the major challenges the Q2 Andi team faced in turning an LLM into a tangible, valuable tool for bankers—and how Andi Copilot overcame them.

Reminder: We hope you enjoy the post, but the best way to understand how Andi Copilot and GenAI can help your institution is by joining the early adopter program. 

Join the Program

Challenge No. 1: Breaking the job up into smaller pieces for the LLM

There’s been a lot of recent attention on the growing size of LLMs’ “context window” - the amount of text the model can receive as input when generating or understanding language. In theory this sounds great: the more instructions you give the LLM, the more tasks you can ask the copilot to perform. In reality, LLM attention spans are still very limited, regardless of context window size. Give the model one page of text prompts and it will do very well. But by the third page, the LLM may start neglecting what was mentioned on page one and focus only on the latter part of the document.

When dealing with lengthy loan agreements and credit memos – documents with many different pieces of information to check – the short attention span of the LLM can result in poor accuracy and efficiency. We needed a way to break those documents down into smaller chunks of data that the LLM can digest and act upon before moving on to the next chunk.
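To make the idea concrete, here is a minimal Python sketch of page-level chunking. The function names and the `ask_llm` client are illustrative placeholders, not Andi Copilot’s actual code:

```python
# Hypothetical sketch: group consecutive pages into chunks small enough for
# the LLM to handle reliably, then run one focused prompt per chunk.

def chunk_document(pages: list[str], max_chars: int = 4000) -> list[str]:
    """Combine consecutive pages into chunks that stay under a character budget."""
    chunks, current = [], ""
    for page in pages:
        if current and len(current) + len(page) > max_chars:
            chunks.append(current)
            current = ""
        current += page
    if current:
        chunks.append(current)
    return chunks

def extract_loan_terms(chunks: list[str], ask_llm) -> list[str]:
    """One extraction prompt per chunk; `ask_llm` stands in for any LLM client."""
    return [
        ask_llm(f"List the loan terms (amounts, rates, covenants) in this text:\n{chunk}")
        for chunk in chunks
    ]
```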

The solution to this challenge is called a “knowledge graph.” Knowledge graphs create a map of the document by grouping information into nodes and connecting each node, based on its relationship, to other nodes of information. The result presents the information to the LLM in a structure that makes it easier to find what it’s looking for when it receives a query.

For example, a bank might require that a borrowing company’s headquarters be located in a city where the bank has a branch. A knowledge graph can create two information nodes - one listing the cities with branches and the other holding the location of the company’s headquarters - making it easy for the LLM to check them against each other.
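As a rough illustration (the node names and structure here are ours, not Andi Copilot’s schema), that check might reduce to something like this:

```python
# Hypothetical knowledge-graph fragment: one node for the bank's branch cities,
# one node for the borrower's headquarters, and a simple check between them.

knowledge_graph = {
    "bank:branch_cities": {
        "type": "requirement",
        "value": {"Austin", "Dallas", "Houston"},
    },
    "borrower:headquarters_city": {
        "type": "extracted_fact",
        "value": "Dallas",
        "source": "credit memo, p. 3",
    },
}

def headquarters_check(graph: dict) -> bool:
    """Check the borrower's headquarters node against the branch-cities node."""
    branches = graph["bank:branch_cities"]["value"]
    headquarters = graph["borrower:headquarters_city"]["value"]
    return headquarters in branches

print(headquarters_check(knowledge_graph))  # True: Dallas is a branch city
```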

Challenge No. 2: Scaling up with agents and classic algorithms

Knowledge graphs help an LLM answer multiple queries efficiently and accurately. But there is still the matter of scale.

Documents like credit memos and loan agreements – densely packed with information that needs to be compared and confirmed, and often as long as 100 pages – can require very large knowledge graphs, made up of subgraphs for each page in the document.

To solve this scalability problem, the Q2 Andi team needed to do three things (sketched in the example after this list):

a)    Create all those knowledge graphs.
b)    Merge all those graphs into one single knowledge graph for each document.
c)    Compare the single credit memo knowledge graph to the single loan agreement graph.
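In very simplified form (the data structures below are ours for illustration, not Andi Copilot’s internals), those three steps look something like this:

```python
# Hypothetical sketch of the create / merge / compare pipeline. Each subgraph
# is modeled as a dict of {node_name: value} pairs extracted from one page.

def build_page_subgraph(page_text: str, extract_facts) -> dict:
    """Step (a): build a small subgraph for one page; `extract_facts` stands in
    for an LLM-backed extraction call."""
    return extract_facts(page_text)

def merge_subgraphs(subgraphs: list[dict]) -> dict:
    """Step (b): merge the per-page subgraphs into one graph per document."""
    merged: dict = {}
    for subgraph in subgraphs:
        merged.update(subgraph)
    return merged

def compare_documents(memo_graph: dict, agreement_graph: dict) -> dict:
    """Step (c): compare the credit-memo graph to the loan-agreement graph and
    return the nodes whose values disagree."""
    mismatches = {}
    for node, memo_value in memo_graph.items():
        agreement_value = agreement_graph.get(node)
        if agreement_value is not None and agreement_value != memo_value:
            mismatches[node] = (memo_value, agreement_value)
    return mismatches
```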

To do this, we built “agents” within the Andi Copilot platform. Agents are autonomous helper systems that use LLMs to perform specific, smaller tasks that are part of the larger workflow for comparing two large, complex documents.

But even with AI-powered agents, there are still limits to what a pure LLM system can do. In this case, comparing a loan agreement to a credit memo would simply take too long with an LLM alone. We had to use non-AI “classic algorithms” to supplement and speed up the process.
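As one hypothetical example of that mix (not the actual Andi Copilot code), cheap string matching from Python’s standard library can settle the easy comparisons so that only the ambiguous ones are escalated to an LLM call:

```python
from difflib import SequenceMatcher

def compare_field(memo_value: str, agreement_value: str, ask_llm) -> str:
    """Return 'match', 'mismatch', or the LLM's judgment for borderline cases."""
    if memo_value.strip().lower() == agreement_value.strip().lower():
        return "match"      # exact match: no LLM call needed
    similarity = SequenceMatcher(None, memo_value, agreement_value).ratio()
    if similarity < 0.5:
        return "mismatch"   # clearly different: no LLM call needed
    # Only borderline cases reach the slower, more expensive LLM.
    return ask_llm(
        "Do these two clauses say the same thing?\n"
        f"Credit memo: {memo_value}\nLoan agreement: {agreement_value}"
    )
```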

Challenge No. 2.5: Switching LLMs

As we wrestled with how to solve the scale issue, we also realized we needed to reevaluate our LLM.

The Q2 Andi team began this project using OpenAI’s GPT-4 (the engine behind ChatGPT). But as we began creating and merging knowledge graphs, GPT-4 proved to be too slow, and GPT-3.5 was fast enough but too inaccurate. Anthropic’s LLM, Claude 3, while a smaller system, proved to be faster than GPT-4 and nearly as accurate. And unlike GPT-4, it could run multiple queries in parallel rather than one at a time.

Switching to Claude 3 gave us the capacity to scale Andi Copilot for more customers, without losing quality or speed.   
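A minimal sketch of what running comparison checks in parallel can look like, assuming the Anthropic Python SDK’s async client and Messages API (the model name and prompts are illustrative, not Andi Copilot’s configuration):

```python
import asyncio

from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

async def ask(question: str) -> str:
    """Send one comparison question to Claude."""
    response = await client.messages.create(
        model="claude-3-sonnet-20240229",
        max_tokens=512,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

async def ask_all(questions: list[str]) -> list[str]:
    """Fan the questions out concurrently instead of one at a time."""
    return await asyncio.gather(*(ask(q) for q in questions))

# Example:
# answers = asyncio.run(ask_all([
#     "Does the headquarters city in the memo match the agreement?",
#     "Do the loan amounts match?",
# ]))
```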

Challenge No. 3: Minimizing the Risk

Of course, none of this matters if Andi Copilot fails to handle the document comparison with a high level of accuracy. To make sure the job is done well, we developed a robust model comparison framework that ensures the ongoing accuracy and reliability of our product across different versions and updates.

As we continue to develop Andi Copilot, our team remains focused on model risk management. We systematically evaluate each iteration of the AI model and, most importantly, each new version of our prompts against a diverse set of test cases, historical data, and real-world scenarios. This process not only helps us maintain high standards of performance but also allows us to identify errors or inconsistencies that might arise as the model evolves.
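A hypothetical sketch of that kind of regression check (the structure is ours for illustration, not Q2’s actual evaluation framework): every new model or prompt version is scored against a fixed suite of labeled test cases before it replaces the current one.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    memo_excerpt: str
    agreement_excerpt: str
    expected: str  # e.g. "match" or "mismatch"

# A candidate is any model/prompt combination wrapped as a callable.
Candidate = Callable[[str, str], str]

def accuracy(candidate: Candidate, cases: list[TestCase]) -> float:
    """Fraction of test cases the candidate gets right."""
    correct = sum(
        candidate(c.memo_excerpt, c.agreement_excerpt) == c.expected for c in cases
    )
    return correct / len(cases)

def should_promote(candidate: Candidate, baseline: Candidate,
                   cases: list[TestCase], margin: float = 0.0) -> bool:
    """Only promote a new model or prompt if it is at least as accurate as the baseline."""
    return accuracy(candidate, cases) >= accuracy(baseline, cases) + margin
```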

By prioritizing model risk management, we’re able to continuously improve the accuracy and efficiency of our loan document automation tool and instill confidence in our clients.

The bottom line

LLMs have massive potential, but users should be aware of the limitations a generic LLM or copilot can have when it comes to handling bank-specific tasks. Making LLMs work for a specific banking use case takes a lot of time, development, and trial and error. And that’s exactly the value of the Andi Copilot EA Program!

If you’re looking to improve the speed and accuracy of your lending process, or if you’re more generally looking for ways your institution can leverage AI, we encourage you to join the Andi Copilot EA program. To make your voice heard and have a say in shaping Andi Copilot, join today!

Join the Program