Integrating OpenAPIs and LLMs for RAG Services

Introduction

What if you had an AI assistant that knows not only how to interpret human language and intent but also has a seamless access to enterprise APIs and other structured data sources? This assistant could provide real-time insights combining its natural language capabilities with rich structured data of your enterprise. This is the promise of Retriever Augmented Generation (RAG) services, a cutting edge approach for combining capabilities of Large Language Models (LLMs) with the rich, structured data powering your enterprise to unlock never before seen levels of decision-making and automation. Recently, in March 2024, Klarna [1] captured AI news headlines with their AI assistant that in the first month of operation autonomously handled a total of 2.3 million conversational chats — doing the job equivalent to 700 full-time agents.

In the ever-evolving field of natural language processing (NLP), RAG has traditionally focused on ingesting unstructured text data into LLMs’ text generation processes. However, as LLM capabilities continue to expand, there is a growing need to incorporate the wealth of structured data readily available through OpenAPI services — data that is often underutilized but holds great potential for enhancing the accuracy and relevance of language model outputs.

The recently released version 2.0 of the NLP framework Haystack [2] redefines the integration of RAG services by providing an efficient and scalable solution to combine structured data with LLMs by fully embracing the OpenAPI specification (OAS) [3] standard.

The Case for RAG Services

Going back to the Klarna case, imagine you are a stakeholder in a typical enterprise where you already employ OAS-like services to track customer orders, requests, and general customer information. Using RAG services, your sales employees could then simply ask, “What are the top-selling products in Berlin, Germany?” or “How has customer feedback trended for our new product line over the last quarter?”. And that’s not where the benefits of this approach end. A customer service conversational chatbot could be integrated to handle customer requests autonomously. With traditional RAG that relies on unstructured data sources, these features seem far-fetched. However, with RAG services, an AI assistant can seamlessly connect to these enterprise data sources, combine them with natural language understanding, and deliver fast and accurate data-driven insights.

Recently, AI and LLM application developers have already attempted to include structured data sources in their systems. They have used special per-service connector classes commonly known as custom service wrappers, primarily programmed in Python. While it provides a functional solution, this method has considerable limitations, scalability issues, and other drawbacks.

Challenges with Custom Code Wrappers

Currently, enterprises are often forced to develop custom code wrappers to integrate diverse services into the LLM ecosystem. This approach, by default, brings along several systemic challenges, such as limited scalability, complex maintenance, and considerable inconsistencies, all of which hurt the robustness and reliability of RAG services.

As the set of enterprise services grows, so do scalability issues. Custom service wrappers, initially designed for only a few services, rapidly become bottlenecks as the number of services increases. Adaptation to new services or API changes requires ongoing modifications or additions, resulting in a bloated and rigid codebase that hinders integration and adaptability.

Another daunting task regarding custom wrappers is their maintenance. Keeping the wrapper code up to date with the latest API changes, security authentication changes, and system upgrades requires significant time and expertise.

These challenges amplify the need for a standardized, scalable, and maintainable method of RAG service integration. The following section will explore how we address these issues, pointing to an innovative pathway to a more efficient and dependable implementation of RAG services.

Haystack 2.0 and OAS: RAG Services

The advent of Haystack 2.0 can be considered a paradigm shift in how Retrieval-Augmented Generation services are implemented and scaled. This version addresses the challenges of custom code wrappers by using OpenAPI Specification (OAS) at the core of its design. Let’s see how:

Step 1: Using function-calling to generate OAS service payload

In manual LLM service integration done previously, a large amount of labor had to be invested in integrating each new service into the RAG framework and in developing custom wrapper code.

We shift this paradigm using the OpenAPIServiceToFunctions component and OpenAPI specification. OpenAPIServiceToFunctions dynamically translates any given OpenAPI specification into OpenAI’s function-calling definitions [4]. These definitions, with the help of LLMs, translate human queries into relevant function-calling payloads for OAS-specified services.

Fig 1. Function Calling Step — Image by author

An example can be a weather service query like, “What is the forecast for San Francisco for the next three days?” In function calling, we don’t actually do any function calling per se. In fact, the process is much simpler: LLMs utilize the OpenAI function-calling definitions to understand the structures and parameters of the query, as well as the API requirements of the hypothetical service. This enables them to create a precise function-calling payload for a service.

Going back to our weather example and the query: “What is the forecast for San Francisco for the next three days?” This human text query gets translated into a JSON payload:

{ "type": "function", "name": "weather_forecast", "arguments": { "location": "San Francisco, CA", "num_days": 3 }}

Step 2: REST invocation and data retrieval

In the next step, the OpenAPIServiceConnector, given the OpenAPI specification and function calling payload from the previous step, invokes the specific endpoint of the target REST service.

Now, although the OpenAPI specification helps with precise service definitions, invoking a service behind an HTTP endpoint is no easy task. We need to provide support for all endpoint invocation variants. Are the parameters passed as a query or URL path? Do we pack the complex payloads as JSON in the body of the request? How do we enable authentication and all its variants? What do we do with rate limits and error handling? The OpenAPIServiceConnector handles all of these details of setting up HTTP invocation on our behalf.

Having received this invocation from the OpenAPIServiceConnector component, the REST server processes the invocation and responds with structured data in JSON format.

Fig 2. REST invocation — Image by author

Step 3: Response Formulation

In the final step, the structured REST data response is synthesized into a coherent and informative answer. The JSON response from the REST server is paired with a system message (prompt), contextualizing the information and formatting it under the end-user’s or application’s needs. This structured service response data together with the system prompt is then passed to the LLM for inferencing, generating the final, enriched response.

Fig 3. Response Formulation — Image by author

By focusing on OAS specification and abstracting the complexities of API communication not only overcomes the limitations of custom code wrappers but also strengthens the efficiency, scalability, and dependability of RAG services to a large extent.

Unlocking the Future of RAG Services

As LLMs, and in particular their function-calling capabilities, continue to improve, the Retrieval-Augmented Generation (RAG) services provide a sneak peek of an era of innovation and efficiency — if not complete enterprise automation. RAG services, with the assistance of up-and-coming LLMs, will become dynamic and adaptable, capable of providing relevant data in response to user queries in all modalities (text, voice, etc.).

In addition, such AI systems are likely to learn and adapt, continually improving their service selection, ultimately providing a highly responsive and intelligent data ecosystem that supports an ever-growing list of tasks and decision-making processes.

However, RAG services’ benefits extend far beyond automating individual services. The dynamic nature of such implementation lays the foundation for sophisticated intelligent agents — systems capable of performing complex tasks and interactions by effortlessly selecting and integrating multiple services in real-time. AI systems will provide highly responsive and intelligent data ecosystems that support a growing list of tasks. For example, imagine an agent that can grasp the intent of a user’s query about a recent order, dynamically extracting user data, order history, and current logistics information to formulate an exhaustive response or suggest follow-up actions.

The integration of structured data, the boost of service capabilities, and the development of intelligent agents are just the beginning. With each step forward, we move closer to a world where AI becomes an integral part of our interactive decision-making processes, rather than just a tool for text interpretation and generation.

Conclusion

In this article, we’ve explored how Retrieval Augmented Services (RAG) services can effectively integrate structured data sources into the LLM orchestration frameworks like Haystack. By leveraging the existing OpenAPI Specification (OAS) we removed the necessity for previously used glue code tool wrappers and have significantly eased the maintenance cost for integration of these services.

The integration of OAS and LLMs involves a three step process: utilizing the existing LLM function calling for invocation payload generation, REST invocation and JSON response retrieval, and finally the contextualized response formulation using system messages and LLMs.

As the capabilities of function calling LLM models continue to expand, we can expect to see an increase in RAG services applications that push the boundaries of what’s possible with AI and automation. Practical applications of this technology can already be seen in open-source GitHub Actions — PR Auto[5], and Reno Auto [6], which showcase the potential of RAG services in real-world scenarios of collaborative software development automation.

While the full potential and impact of RAG services in enterprise automation remains to be seen, we believe it is as impactful as RAG with unstructured text. Just as traditional RAG ushered us in a new era of natural language understanding and text processing, RAG service has the potential to automate enterprise processes and decision making.