
Real AI ROI In Action: Custom AI for Grocery Wholesale Operations


The Problem & Goal


Our partners at a top-ranked grocery chain launched the wholesale portion of their business last year, aimed at selling larger quantities of products to other businesses such as restaurants and hotels. A problem their merchandising team regularly encountered was being unable to identify why a product wasn’t available for sale in the wholesale store. Because the e-commerce platform was built in-house, answering these questions usually required developer intervention: roughly a half-dozen variables affected product availability, and diagnosing an issue often meant queries to remote APIs and multiple Elasticsearch calls. Turnaround was typically 30-60 minutes, and every request pulled a software engineer away from their regular day-to-day work. This was a classic case of brilliant people spending time on repetitive tasks instead of building for the future, something we've seen all too often in retail tech.


Since the process for answering merchandiser questions was well documented while the inputs from merchandisers were highly variable, we determined this was a great use case for GenAI: we could equip an agent with tools and a runbook to work toward a solution. Commerce Architects got right to work exploring possible approaches. What if merchandisers could get answers in seconds instead of waiting an hour while developers interrupted their actual work? That was the challenge we couldn't resist tackling.


Navigating Bleeding-Edge Technologies


The team was heavily invested in GCP-based solutions, which meant the LLM we used had to be a model available through GCP Vertex. As the team worked predominantly with Java-based applications, we opted to use LangChain4J to implement the agentic workflow, and we chose Claude Sonnet because it was available in the Model Garden. The first major issue we encountered was the lack of LangChain4J support for Claude on Vertex. Furthermore, at the time of implementation, Claude on Vertex didn’t support tool/function calling, which would be critical for our application. Lastly, no official Java API client for Vertex was available during implementation.


Custom ChatLanguageModel Implementation

Luckily, the LangChain4J framework makes excellent use of interfaces, enabling us to implement a custom ChatLanguageModel that could be easily plugged into the rest of the framework's capabilities. A ChatLanguageModel in LangChain4J is ultimately how you interact with an LLM provider; its equivalent in Python LangChain is a BaseChatModel.
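For illustration, here is a minimal sketch of what such an implementation can look like, assuming a pre-1.0 LangChain4J API (where implementing generate(List<ChatMessage>) is sufficient) and a hypothetical VertexAnthropicClient wrapper around the raw Vertex endpoint; this is not our production code:

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.output.Response;
import java.util.List;
import java.util.stream.Collectors;

public class VertexClaudeChatModel implements ChatLanguageModel {

    private final VertexAnthropicClient client; // hypothetical HTTP wrapper around Vertex

    public VertexClaudeChatModel(VertexAnthropicClient client) {
        this.client = client;
    }

    @Override
    public Response<AiMessage> generate(List<ChatMessage> messages) {
        // Flatten the chat history into the single text field the basic
        // inference endpoint accepts.
        String prompt = messages.stream()
                .map(ChatMessage::text)
                .collect(Collectors.joining("\n\n"));

        String completion = client.complete(prompt); // raw model text back

        // Wrap the raw completion so memory, AiServices, and tool handling
        // can consume it like any other provider's response.
        return Response.from(AiMessage.from(completion));
    }
}

Because the rest of the framework only ever sees the ChatLanguageModel interface, a custom implementation like this can be swapped in without touching anything downstream.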


Within a few days, Commerce Architects had implemented a basic Google Vertex API client and ChatLanguageModel that supported a simple inference call. But a problem remained: how could we support tool/function calls when the provider didn’t officially support them?


Unlocking Function Calls Without Provider Support

When a provider officially supports function calling, the request/response models include separate fields reserved specifically for tools: tool schema descriptions on send, and a request to invoke a tool on receive. In our scenario, we had only the single field used for a basic inference call. The solution was to pack everything into that simple messages field: the original user text, JSON describing the equipped tools, a prompt instructing the LLM on how to invoke a tool, and an example invocation.
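As a purely illustrative sketch (the wording, tool name, and schema format below are stand-ins, not our production prompt), the packed message might be assembled like this:

// Everything a provider would normally carry in dedicated tool fields is
// packed into the one plain-text field we had available.
String userText = "Why is product 12345 not available in the wholesale store?";

String toolsJson = """
        [{"name": "getProductAvailability",
          "description": "Explains why a product is or is not sellable in the wholesale store",
          "parameters": {"productId": "string"}}]
        """;

String packedPrompt = """
        You have access to the following tools, described as JSON:
        %s

        If a tool is needed, reply ONLY with a fenced code block containing
        JSON of the form {"name": "...", "arguments": {...}}.
        Otherwise, answer the question directly.

        User question:
        %s
        """.formatted(toolsJson, userText);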


The overall response flow looked like the following:

After receiving the response from the LLM, we would first check whether it contained a ``` code block. If one was present, we would strip the backticks and attempt to deserialize the contained text into a ToolExecutionRequest model. If that deserialization succeeded, we knew a valid tool execution request had been made, and we would hand it off to the LangChain4J framework to invoke our annotated tool methods.
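A minimal sketch of that parsing step, assuming Jackson for JSON handling; the names and error handling here are illustrative rather than our production logic:

import com.fasterxml.jackson.databind.ObjectMapper;
import dev.langchain4j.agent.tool.ToolExecutionRequest;
import java.util.Optional;

static Optional<ToolExecutionRequest> extractToolCall(String llmReply) {
    int start = llmReply.indexOf("```");
    if (start < 0) {
        return Optional.empty();                 // no code block: plain answer
    }
    int end = llmReply.indexOf("```", start + 3);
    if (end < 0) {
        return Optional.empty();                 // unterminated block: ignore
    }
    String json = llmReply.substring(start + 3, end).trim();
    if (json.startsWith("json")) {
        json = json.substring(4).trim();         // drop an optional language tag
    }
    try {
        var node = new ObjectMapper().readTree(json);
        if (node.hasNonNull("name")) {
            return Optional.of(ToolExecutionRequest.builder()
                    .name(node.get("name").asText())
                    .arguments(node.path("arguments").toString())
                    .build());
        }
    } catch (Exception e) {
        // Not valid JSON: fall through and treat the reply as plain text.
    }
    return Optional.empty();
}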


Building The Agentic Workflow


Now that we had unlocked the unsupported capabilities (LangChain4J with Claude on Vertex, plus function calling), it was time to get to work building the agentic workflow. While we had built agentic workflows internally within our CA Labs R&D program, this was our first real-world scenario, so we added some guard rails. Instead of equipping a single LLM with a pile of tools, we created multiple instances of the LLM, each tasked with a different portion of the process and each working from an isolated context. In simplified form, the workflow was: a planning agent mapped the merchandiser's query to an existing plan, a viability check confirmed that enough information had been provided, and an executor agent carried out the plan using its equipped tools.

By splitting up the roles and responsibilities of each LLM instance, we were able to mitigate risks like prompt injection attacks. If the first agent couldn't map the user query to an existing plan, the process exited rather than moving on to the next agent. That meant that if a malicious user tried to override the prompt of the first agent, the malicious text would never reach the executor, because no plan could ever be selected.
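In rough terms (the agent names here are hypothetical, not our production classes), the gate between stages looks like this:

// Each agent wraps its own LLM instance with an isolated context. Free-form
// user text only reaches the executor if a pre-written plan was selected.
Optional<Plan> plan = plannerAgent.selectPlan(userQuery);             // context #1
if (plan.isEmpty()) {
    return "Sorry, I couldn't match your question to a known runbook entry.";
}
if (!viabilityAgent.hasEnoughInformation(userQuery, plan.get())) {    // context #2
    return "Please include the missing details (for example, the product id).";
}
return executorAgent.execute(plan.get(), userQuery);                  // context #3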


SELF-DISCOVER & Planning


We utilized a modified version of DeepMind's SELF-DISCOVER algorithm for plan formation. The original paper deals with creating a plan for novel problems. With SELF-DISCOVER, an LLM is guided through a three-phase process:


  1. SELECT – It evaluates the user query and selects a way to approach a problem from a pre-written list.

  2. ADAPT – It adapts the selected problem approach to the task at hand.

  3. IMPLEMENT – It takes the adapted plan and forms an actionable step-by-step plan.


In the wholesale scenario, the problems weren’t novel: runbook entries already existed for developers to use when solving these tasks. We adapted those runbook entries into plans stored within the application, essentially borrowing the SELECT and ADAPT phases, and added another step to verify that the user had provided enough information to solve the problem. For example, a user might report that a product is unavailable but forget to include the product id. When that happened, the correct plan would be selected but there wasn't enough information to solve the problem, which often resulted in LLM hallucinations because we were asking the model to use information that wasn't available in its context.
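One way to picture the adapted plans and the added sufficiency check (the record and field names below are hypothetical, for illustration only):

// A runbook entry adapted into an application-level plan, with the inputs
// it needs declared up front.
record Plan(String id, String description, List<String> requiredInputs) {}

Plan productUnavailable = new Plan(
        "wholesale-product-unavailable",
        "Determine why a product is not sellable in the wholesale store",
        List.of("productId"));

// Before execution, a dedicated LLM instance confirms every required input
// is actually present in the user's question. A query like "a product is
// missing from the wholesale store" fails here, so we ask for the product id
// instead of letting the executor hallucinate one.
boolean viable = viabilityAgent.confirmsInputsPresent(userQuery, productUnavailable);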


Mitigating LLM Variability


Base LLM models have improved dramatically since we first implemented this project, and the types of issues we were mitigating are much rarer now. At the time, however, the LLM instance tasked with determining the viability of a request would sometimes yield false positives or false negatives. Luckily, this occurred less than 50% of the time.


Our approach was to create a simple voting mechanism for the viability determination. At this node in the agentic workflow, we would spin up three viability evaluators and make three concurrent requests to Vertex. This resulted in a 45% increase in accuracy, bringing the viability step to yield correct results over 95% of the time.
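A minimal sketch of that vote, assuming a ViabilityEvaluator wrapper around a single LLM call (the names are ours, not production code):

import java.util.List;
import java.util.concurrent.CompletableFuture;

static boolean majorityVoteViability(String userQuery, ViabilityEvaluator evaluator) {
    // Three independent evaluations run concurrently against Vertex.
    List<CompletableFuture<Boolean>> votes = List.of(
            CompletableFuture.supplyAsync(() -> evaluator.isViable(userQuery)),
            CompletableFuture.supplyAsync(() -> evaluator.isViable(userQuery)),
            CompletableFuture.supplyAsync(() -> evaluator.isViable(userQuery)));

    long yes = votes.stream()
            .map(CompletableFuture::join)   // wait for all three answers
            .filter(Boolean::booleanValue)
            .count();

    return yes >= 2;                        // majority (2 of 3) carries the decision
}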



Conclusion

We were able to deliver this project to production and save developers countless hours of fielding merchandiser questions. 


  • Reduced the time to answer availability questions from hours to minutes.

  • Delivered accurate results over 94% of the time.

    • When it couldn’t answer a question, the response included every step taken, allowing developers to quickly pick up where the agent left off.

  • Saved developers countless hours of answering merchandiser questions.

  • Allowed developers to focus on harder tasks.


The rollout was strategic: the team first exposed the new Slack agent to developers, then gradually expanded the pool of users, eventually opening access to merchandisers. The capabilities of Claude on Vertex, as well as the LangChain4J framework, have greatly improved since the time of implementation. This project highlights a crucial lesson for teams working in the generative AI space: don't let gaps in tooling or documentation slow you down. By creating custom implementations and workarounds, we were able to deliver value months before official support became available, proving that agility and technical ingenuity are as important as the underlying AI technology itself.

