How Conversational Finance reliably converts language to code using LLMs

April 11, 2024

The problem

It’s clear today that businesses of all shapes and sizes are turning to LLMs to increase productivity. The use cases vary, and the variety of problems solved by LLMs today represents just a fraction of the future opportunities for the technology.

Yet most solutions today remain confined to “chatbots” over unstructured data. While these can provide value, we’ve found in conversations with business users, IT professionals and others across the enterprise that enabling natural-language questions against structured data would be a massive unlock for meaningful productivity gains.

Structured data is foundational to any enterprise, from operations to sales and marketing. However, extracting insights and value from this data in today’s paradigm requires multiple departments to contribute to its retrieval, processing, and interpretation.

For example, take the process of dashboard creation. With an existing solution, a company or department must first engage with a vendor. The vendor then onboards the client, defines the project, configures the solution, builds out the infrastructure and software, constructs a data pipeline, replicates the dashboard’s data into a data lake, builds data models, constructs the dashboards based on the starting solution and then turns them over for use. This multi-step process can take nine months to complete. And at the end of this three-quarter-long build, the dashboards still require significant effort to change.

LLMs promise to vastly shorten this process and make it dynamic by translating a question into code and returning the answer, ideally as a chart. Yet without additional technology components and tools built into a ready-made application, such as our own Conversational Finance solution powered by our hila platform, getting an LLM to operate properly comes with its own difficulties:

  1. Performance: It often takes a long time to receive an answer from the LLM.
  2. Reliability: Responses can be inconsistent; even the same question can yield different answers.
  3. Security: The best-performing models are often public, but the data in a system of record, for example, is confidential and should not be shared externally.
  4. Scalability: Getting decent performance from one table in a system of record is very different from getting good answers from 20 (or 2,000) tables.

Overcoming the challenges

Let’s focus specifically on one of the predominant use cases inside of an enterprise — text to SQL.

The most performant models and papers on Yale’s Spider Leaderboard have a credible 91.2 percent accuracy, but that remains too low for an enterprise. Indeed, using a combination of models and chain-of-thought reasoning, the next closest techniques, from the Alibaba Group, achieved only 86.6 percent accuracy.

For an enterprise to trust a system, it needs to be at 99 percent accuracy or greater. We have achieved this through a combination of techniques. To start, we process the question to verify that it can be answered by the available dataset. We then cross-reference the query against a vast database of knowledge that is both domain-specific (such as finance-specific for an ERP system) and company-specific (such as which months constitute the fiscal year for a particular company). After the LLM returns the SQL, the response is post-processed to ensure that the SQL answers the initial query. These guardrails around any model, public, private or fine-tuned, raise its SQL generation accuracy above 99 percent.
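
As a rough illustration, a guardrail pipeline along these lines could be sketched as follows. This is a minimal, hypothetical sketch rather than our production code: the llm and knowledge_store objects are assumed stand-ins, and sqlglot is simply one open-source SQL parser that could perform the validation step.

    import sqlglot  # open-source SQL parser, used here to sanity-check the output

    def answer_question(question: str, llm, schema: dict, knowledge_store) -> str:
        """Guarded text-to-SQL flow: pre-process, enrich, generate, validate."""
        # 1. Pre-process: check the question is answerable from the available
        #    tables (a naive keyword match; a real system would do far more).
        relevant = [t for t, cols in schema.items()
                    if any(c.lower() in question.lower() for c in cols)]
        if not relevant:
            raise ValueError("Question cannot be answered from the available dataset.")

        # 2. Cross-reference domain- and company-specific knowledge
        #    (e.g. which months constitute the company's fiscal year).
        context = knowledge_store.lookup(question)

        # 3. Generate SQL with the retrieved knowledge in the prompt.
        prompt = (f"Tables: {schema}\n"
                  f"Business rules and golden SQL: {context}\n"
                  f"Question: {question}\nSQL:")
        sql = llm.complete(prompt)

        # 4. Post-process: parse the SQL and reject references to unknown tables.
        parsed = sqlglot.parse_one(sql)
        used = {t.name for t in parsed.find_all(sqlglot.exp.Table)}
        unknown = used - set(schema)
        if unknown:
            raise ValueError(f"Generated SQL references unknown tables: {unknown}")
        return sql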

The key to our process is what we call “golden SQL”: a robust set of knowledge and key SQL statements that guide the LLM in making the correct calls. This knowledge, as referenced above, is based on both domain-specific understanding and customer-specific data. Both contribute to the synthetic data generation, or golden SQL, that constitutes the backbone of our system.
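
To make the idea concrete, golden SQL can be thought of as question-to-SQL pairs that are retrieved by semantic similarity and handed to the model as guidance. Below is an illustrative sketch using the open-source sentence-transformers library; the example pairs, table names and model choice are assumptions for demonstration, not our actual knowledge base.

    from sentence_transformers import SentenceTransformer, util

    # Illustrative golden SQL pairs; a real knowledge base would hold many more.
    GOLDEN_SQL = [
        {"question": "What was total revenue last fiscal year?",
         "sql": "SELECT SUM(revenue) FROM sales WHERE fiscal_year = 2023"},
        {"question": "Who were the top 5 customers by spend this quarter?",
         "sql": "SELECT customer, SUM(amount) AS spend FROM orders "
                "WHERE quarter = 'Q1-2024' "
                "GROUP BY customer ORDER BY spend DESC LIMIT 5"},
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    golden_embeddings = model.encode([g["question"] for g in GOLDEN_SQL])

    def retrieve_golden_sql(question: str, top_k: int = 2) -> list[dict]:
        """Return the golden SQL pairs most similar to the incoming question."""
        scores = util.cos_sim(model.encode(question), golden_embeddings)[0]
        ranked = sorted(zip(scores.tolist(), GOLDEN_SQL), key=lambda x: -x[0])
        return [pair for _, pair in ranked[:top_k]]

The retrieved pairs are then placed in the prompt ahead of the user’s question, which is what keeps the model’s output consistent from one run to the next.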

In this way, Conversational Finance, our primary application for text to SQL, addresses the concerns head-on:

  1. Performance: We can achieve high-accuracy SQL generation with smaller and faster models — such as fine-tuned models or smaller public models.
  2. Reliability: The knowledge we embed covers a vast swath of the potential queries inside of a company and ensures consistency in responses.
  3. Security: Because we do not rely on public models, the models themselves can be fine-tuned and put on a private cloud or on-premises.
  4. Scalability: The knowledge we embed in the models can include complex tasks, such as table joins (see the sketch after this list). We can also handle structured and unstructured data in simple queries.
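
For instance, one golden SQL entry might encode a multi-table join so the model never has to rediscover how tables relate. A hypothetical entry, with invented table and column names, could look like this:

    # Hypothetical golden SQL entry encoding a multi-table join.
    # Table and column names are invented for illustration.
    JOIN_EXAMPLE = {
        "question": "What is the total invoice amount per sales region?",
        "sql": (
            "SELECT r.region_name, SUM(i.amount) AS total_amount "
            "FROM invoices i "
            "JOIN customers c ON i.customer_id = c.customer_id "
            "JOIN regions r ON c.region_id = r.region_id "
            "GROUP BY r.region_name"
        ),
    }

Retrieving an entry like this alongside the user’s question means the model sees the join pattern ready-made rather than inferring it from the schema alone.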

In addition, hila can monitor both LLMs and traditional models at scale, including models that process billions of data points per day.

As we are already working with several enterprises in these areas, we can apply the knowledge sets and common questions gained in previous engagements to future ones. This enables us to go from onboarding to production use in 90 days.

Interested in learning more about how we are helping enterprises tap into structured data in real time?

Get in touch here.