In the world of artificial intelligence, large language models (LLMs) have become indispensable tools for a wide range of applications. However, with their immense power comes a significant challenge: the potential for hallucinations and inaccuracies in their outputs. Enter veryLLM, a new open-source initiative created by Vianai, dedicated to identifying and rectifying inaccuracies in the generated outputs of LLMs. In this blog post, we will dive deep into veryLLM, exploring its components, inner workings, and the collaborative efforts it encourages to ensure the trustworthiness of language models.
What is veryLLM?
veryLLM is a framework crafted to address the critical challenge of improving the reliability of large language models. Developed by Vianai, a leading name in AI research and development, veryLLM complements Vianai's Hila Enterprise product and contributes to advancements in the broader AI field.
At the heart of veryLLM are several crucial components designed to address the challenge of hallucinations in LLMs:
1. Hallucination Dataset:
The veryLLM project offers a distinctive hallucination dataset that originates from the TruthfulQA dataset (https://github.com/sylinrl/TruthfulQA) but has been meticulously modified and expanded to focus on hallucinations rather than truths. This expansion is crucial because hallucinations often stem from interpolated dataset truths that are either out of context for a given question or not widely understood as global truths. The dataset has undergone rigorous manual evaluation by multiple team members, and we actively encourage our community to participate in critiquing and further expanding it.
2. Truthful Function Framework:
Within veryLLM, we introduce an approach to hallucination detection through "truthful functions" (is_true() functions). These functions play a critical role in assessing the veracity of language model responses by considering the question posed to the LLM, the generated response, and the relevant contextual information the LLM used in generating that response. is_true() functions are designed to be modular and to work alongside LLMs, providing support for existing and future language models.
This framework simplifies the process of creating these functions, making it accessible to data scientists and developers alike. Additionally, it offers an automated testing and upload pipeline, allowing creators to effortlessly benchmark their functions against others and potentially earn a coveted place on the leaderboard. This collaborative environment promotes the development of the most reliable functions for the benefit of the entire community.
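To make the idea concrete, here is a minimal sketch of what an is_true() function could look like. The signature and the naive keyword-overlap heuristic below are illustrative assumptions, not the project's actual interface or implementation; the veryLLM repository defines the real contract for these functions.

```python
# A minimal, illustrative sketch of an is_true() function.
# Assumption: the function receives the question, the LLM's response,
# and one reference context paragraph, and returns a boolean verdict.
def is_true(question: str, response: str, context: str) -> bool:
    """Return True if `response` appears supported by `context`."""
    # Naive heuristic for illustration only: treat the response as
    # supported when most of its content words also occur in the
    # reference context paragraph.
    stopwords = {"the", "a", "an", "is", "are", "was", "were", "of", "to", "in"}
    words = {w.strip(".,!?").lower() for w in response.split()} - stopwords
    if not words:
        return False
    hits = sum(1 for w in words if w in context.lower())
    return hits / len(words) >= 0.8
```

A real submission would replace the keyword heuristic with something far stronger, such as an entailment model scoring the response against the context, but the modular shape stays the same.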
3. GUI Playground:
veryLLM offers a user-friendly GUI playground that serves as a valuable sandbox for researchers and data scientists to experiment with various is_true() functions. What distinguishes this playground is its integration of an ensemble of top-performing functions from the leaderboard, ensuring rigorous verification.
While it's important to note that the frontend UI of veryLLM is not currently open-sourced, it is public and freely accessible to developers. This transparency allows developers to explore and understand the functionalities veryLLM offers, providing insights for potential integration into their own language model applications. Developers may also interact directly with veryLLM's APIs from Jupyter notebooks and Python scripts to evaluate the performance of these is_true() functions.
""At the heart of veryLLM are several crucial components designed to support the challenge of solving the issue of hallucinations in LLMs."
veryLLM's is_true() functions are designed to classify statements into three distinct categories (a rough sketch of this logic follows the list):
1. Verified: Statements in this category are substantiated by specific contextual information from Wikipedia. Users are provided with sources to corroborate the information, and can also view supporting detail, such as the results of all is_true() functions used to evaluate the statement along with the specific context paragraph considered.
2. Not Verified: In some cases, even with the most pertinent context available, a statement cannot be definitively confirmed as true. This category reflects such instances. Importantly, this category does not mean the statement is false; rather, the statement is likely unsubstantiated given the reference context. veryLLM operates as a truth detector, not a lie detector.
3. Unable to Verify: This classification indicates a lack of relevant information within the subset of Wikipedia articles utilized as context. It signifies the need for additional context to assess the veracity of the statement effectively.
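Building on the is_true() sketch above, the three verdicts could be modeled roughly as follows. The Verdict enum and the ensemble rule (any supporting paragraph yields Verified) are assumptions for illustration, not veryLLM's actual decision logic.

```python
from enum import Enum

class Verdict(Enum):
    VERIFIED = "Verified"
    NOT_VERIFIED = "Not Verified"
    UNABLE_TO_VERIFY = "Unable to Verify"

def classify(question: str, response: str, context_paragraphs: list[str]) -> Verdict:
    # No relevant context retrieved: the statement cannot be assessed at all.
    if not context_paragraphs:
        return Verdict.UNABLE_TO_VERIFY
    # If any retrieved paragraph substantiates the statement, mark it Verified;
    # otherwise it is Not Verified (unsubstantiated, not necessarily false).
    if any(is_true(question, response, ctx) for ctx in context_paragraphs):
        return Verdict.VERIFIED
    return Verdict.NOT_VERIFIED
```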
It is important to highlight that veryLLM's current context pool comprises a subset of Wikipedia articles. Instead of reprocessing vast amounts of data, veryLLM taps into Cohere's collection of more than one million Wikipedia articles embedded at the paragraph level (https://txt.cohere.com/embedding-archives-wikipedia/), made accessible via Weaviate's publicly accessible vector database instance (https://weaviate.io/?ref=txt.cohere.com). Given that most publicly disclosed LLM training datasets include Wikipedia, this approach provides a robust foundation for veryLLM's verification process.
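For readers who want to probe that context pool directly, retrieving candidate paragraphs from a Weaviate instance hosting the Cohere Wikipedia embeddings might look like the sketch below (using the Weaviate Python client's v3 query API). The endpoint URL, the class name Articles, and the property names are placeholders and assumptions; consult the Cohere and Weaviate links above for the actual details.

```python
import weaviate

client = weaviate.Client(
    url="https://<wikipedia-demo-endpoint>",  # placeholder; see the Weaviate link above
    additional_headers={"X-Cohere-Api-Key": "<your-cohere-key>"},  # required for near-text search
)

result = (
    client.query
    .get("Articles", ["title", "text"])  # class and property names are assumptions
    .with_near_text({"concepts": ["height of the Eiffel Tower"]})
    .with_limit(3)
    .do()
)

# Print the retrieved context paragraphs that an is_true() function could consume.
for paragraph in result["data"]["Get"]["Articles"]:
    print(paragraph["title"], "->", paragraph["text"][:100])
```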
Furthermore, veryLLM intends to expand its contextual resources by processing paragraph-level embeddings of other datasets commonly used in LLM training, such as C4 / Common Crawl (https://commoncrawl.org/). This expansion will enable the verification of claims across a wider range of topics. This is a call to our community to actively participate in helping us map and process the training datasets of all widely used LLMs, fostering a collaborative environment that enhances the trustworthiness of AI systems.
Developers working with language models can easily integrate veryLLM's APIs to verify the accuracy of LLM responses. Whether they want to test their own is_true() functions or simply integrate the best LLM output verification, our straightforward API allows them to do so with just a few lines of code, providing access to comprehensive verification details for each sentence generated by the LLM.
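As a rough illustration of what such an integration could look like, the snippet below posts an LLM response to a hypothetical verification endpoint. The path, payload fields, and response schema are assumptions made for this sketch; the veryLLM repository documents the actual API surface.

```python
import requests

# Hypothetical request: the endpoint path, payload fields, and response
# schema are assumptions for illustration; see the veryLLM repository
# for the real API.
payload = {
    "question": "How tall is the Eiffel Tower?",
    "response": "The Eiffel Tower is about 330 metres tall.",
}
r = requests.post("https://www.veryllm.ai/api/verify", json=payload, timeout=30)
r.raise_for_status()

# Each generated sentence comes back with a verdict and supporting context.
for sentence in r.json().get("sentences", []):
    print(sentence["text"], "->", sentence["verdict"])
```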
Getting Started with veryLLM
For those interested in joining the effort, check out these resources:
– veryLLM GitHub Repository – https://github.com/vianai-oss/veryLLM
– veryLLM Playground – https://www.veryllm.ai/
These links are your gateway to exploring veryLLM, contributing to its development, and making a positive impact on the reliability of language models.
In the ever-evolving field of AI, the importance of reliable language models cannot be overstated. veryLLM offers a robust framework for identifying and addressing hallucinations in these models. It reflects Vianai's dedication to enhancing the trustworthiness of language models and welcomes the broader AI community to join us in this effort. With veryLLM, we are advancing towards a future where AI-generated content is not just powerful but also verifiably reliable.
Our Approach, Our Products
Since its inception, Vianai's mission has been to bring human-centered and responsible AI to enterprises worldwide. H+AI™ is the philosophy that underpins all of the work we do at Vianai, the products we build, and how we work with customers.