Structured and Unstructured Data in AI Systems - Complexities of Integrating LLMs with Traditional Data Sources

Vianai Editorial Team
December 6, 2023

Gartner estimates that 80 to 90 percent of data inside of a company is unstructured, or textual, often locked in the contracts, emails, memos, policies and the like that keep a business functioning.

Generative AI offers a way to unlock this unstructured data, and with it significant value. To be truly data agnostic, however, a solution should span both structured and unstructured data.

The problem, though, is twofold. First, LLMs and generative AI are notoriously bad at the kinds of tasks that traditional machine learning and AI are good at, such as mathematics and predictive modeling. Second, each kind of data has its own demands and systems for making it work in a business-specific way: techniques such as retrieval-augmented generation (RAG) help supply the right unstructured information to an LLM, while SQL generation helps in accessing traditional databases.
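To make the retrieval half of RAG concrete, here is a minimal sketch of the idea: rank passages by similarity to the question, then place the best ones in the prompt. Everything in it is an assumption for illustration. The word-count `embed` function is a toy stand-in for a real embedding model, and `DOCS`, `retrieve` and `build_prompt` are hypothetical names, not part of hila Enterprise or any particular library.

```python
import math
import re
from collections import Counter

# Hypothetical passages standing in for a company's unstructured documents.
DOCS = [
    "The supplier contract with Acme expires on 2024-03-31.",
    "Employees accrue 20 days of paid leave per year.",
    "Marketing budget requests are due each quarter.",
]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the text that would be sent to an LLM (no model is called here)."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

if __name__ == "__main__":
    question = "When does the Acme contract expire?"
    print(build_prompt(question, retrieve(question, DOCS, k=1)))
```

A production system would replace the word counts with learned embeddings and a vector database, but the shape of the step (embed, rank, assemble the prompt) stays the same.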

Take the following use cases:

  • Finding insights in large corpuses of unstructured data: For a large financial institution, hila Enterprise continuously processes various unstructured sources of information, which allows exploration and discovery of new insights on topics the institution cannot otherwise pay close attention to, from sources that are often crowded with noise as well as signal (such as social media). This kind of processing has to run regularly in any case, and includes steps such as vectorizing the unstructured data and storing it in a database, identifying key metadata, or mapping a hierarchical structure among the documents.
  • Creating structured systems from unstructured documents: For a large energy corporation, we use advanced information extraction techniques (using fine-tuned LLMs) to classify and model their contract information. The preprocessed data is then loaded into a database and queried using text-to-SQL techniques that convert natural language questions into SQL queries. This brings the benefits of structured querying and mitigates problems typical of unstructured systems, such as incomplete data and duplicates. In this way, all contracts set to expire in the next three months, say, could be identified quickly.
  • Analyzing unstructured data at scale: For a very large bank, we take the vectorized documents from their internal use cases and enable them to analyze the contents and find answers much more rapidly. This kind of chat assistant aids them in writing loans and conducting business. This type of AI also allows parts of a business, such as HR, Legal and Marketing, to use AI in ways that they have not before. This significantly elevates the operating efficiency of a company's people.
  • Generating charts on the fly: With the Conversational Finance solution, hila Enterprise connects to a client's system of record and then translates a natural language question into code to get a response from the system. For example, with the data hila receives, the system can build dashboards within minutes.
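The contracts use case above can be sketched in a few lines. This is an illustration under stated assumptions, not the hila Enterprise implementation: the table layout, the sample rows in `EXTRACTED`, and the function names are all hypothetical, and `EXPIRING_SQL` is hand-written to show the kind of query a text-to-SQL step might produce for "which contracts expire in the next three months?". It uses Python's built-in `sqlite3` module and SQLite's `date()` function.

```python
import sqlite3

# Hypothetical rows standing in for fields an extraction model might return.
EXTRACTED = [
    ("C-001", "Acme Energy", "2024-02-10"),
    ("C-002", "Borealis Gas", "2024-09-30"),
    ("C-003", "Cobalt Power", "2024-04-01"),
    ("C-001", "Acme Energy", "2024-02-10"),  # the same contract extracted twice
]

def load_contracts(rows):
    """Load extracted fields into an in-memory table, dropping duplicate IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contracts ("
        " contract_id TEXT PRIMARY KEY,"
        " counterparty TEXT NOT NULL,"
        " expiry_date TEXT NOT NULL)"  # ISO dates sort correctly as text
    )
    # The primary key plus OR IGNORE keeps only the first copy of each contract.
    conn.executemany("INSERT OR IGNORE INTO contracts VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

# The kind of SQL a text-to-SQL step might generate for the question above.
EXPIRING_SQL = """
SELECT contract_id, counterparty, expiry_date
FROM contracts
WHERE expiry_date >= :today
  AND expiry_date < date(:today, '+3 months')
ORDER BY expiry_date
"""

def expiring_within_three_months(conn, today):
    """Run the query for a given ISO date string such as '2024-01-15'."""
    return conn.execute(EXPIRING_SQL, {"today": today}).fetchall()

if __name__ == "__main__":
    conn = load_contracts(EXTRACTED)
    for row in expiring_within_three_months(conn, "2024-01-15"):
        print(row)
```

Passing the reference date as a parameter rather than calling `date('now')` keeps the query reproducible, which matters when generated SQL has to be reviewed or tested.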

It has become clear to us, in our many conversations with clients, prospects and investors, that modern business is awash in data, and that the volume is growing all the time. The promise of these AI systems is to provide a layer of intelligence, using natural language and LLMs, on top of any and all data, as a breakthrough new way to access, unlock and understand the information available today.

The complexities of realizing this lofty promise are immense. That's why we are working closely with partners (Cognizant, KPMG) to bring the expertise, product capabilities, underlying tools and technologies, talent, and implementation execution needed to help enterprises navigate these complexities and drive tangible business value from AI. Read more about hila Enterprise.