
The AI Agent Tsunami Is Here: Hype Versus Transformation

Architecture in Copenhagen | GETTY

Few technologies have been subject to as much hype, misrepresentation, and speculation as AI. Some people say it’s bigger than electricity; others say it’s massively overhyped. One simple metaphor I use to recognize when to pay attention to a technology is to look for the “Trojan Horse”: when, where, and how a technology creates value for an internal user or an external customer. When a technology creates real value, even in just a niche application, it kickstarts the cycle of learning by doing, which in turn dramatically accelerates development. For example, when Facebook renamed itself Meta and promoted the Metaverse, it was hard to see the Trojan Horse. By contrast, when Tesla launched the Model S, the thrill and excitement of a comfortable, fast, software-driven car that could be updated remotely created real value.


We are in the middle of a tectonic shift in AI, from asking Gen AI questions to AI agents taking action. This could be a transformative moment in technology history, and the field is moving fast. But it can be hard to tell whether there is a real Trojan Horse or whether it is mostly hype. Plenty of fluffy articles suggest you can use AI agents to make a zillion dollars, but in reality AI agents, and particularly multi-agent systems composed of many agents working together, have been unstable and hard to implement.


But recently a friend shared the work they are doing with AI agents, and the results are both fascinating and terrifying. The core challenge of Gen AI is hallucinations, or false responses, which by some accounts are getting worse, with models sometimes even lying. Prompt engineering is one approach to improving results, for example by introducing multi-step reasoning. RAG, or retrieval-augmented generation, which connects an LLM to external search or knowledge databases, also helps. But at best the accuracy of these approaches reaches 70% or 80%. In multi-agent systems, we can split tasks into separate activities, each performed by a specialized agent that does a single task very well, and the outputs are then synthesized, validated, and checked by other agents, a bit akin to earlier adversarial networks that tested and improved each other. Behind closed doors, this approach appears to be driving accuracy up to remarkably high levels and threatening to replace large-scale activities that are repetitive in nature.
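To make the idea concrete, here is a minimal sketch, in Python, of what a specialist-plus-checker loop grounded by retrieval might look like. Every name in it (retrieve_context, call_llm, and so on) is a hypothetical placeholder of my own, not any particular vendor's or framework's API.

```python
# A minimal sketch of a specialist/checker agent pair grounded by RAG.
# All function names here are hypothetical placeholders; a production
# system would call a real model provider and retrieval backend.

def retrieve_context(task: str) -> str:
    """Hypothetical RAG step: look up grounding documents for the task."""
    return f"[documents retrieved for: {task}]"

def call_llm(role: str, prompt: str) -> str:
    """Stand-in for a model call; returns a canned string so the sketch runs."""
    return f"({role}) response to: {prompt[:40]}..."

def specialist(task: str) -> str:
    # One narrow agent does exactly one task, grounded in retrieved context.
    context = retrieve_context(task)
    return call_llm("specialist", f"Context: {context}\nTask: {task}")

def checker(task: str, draft: str) -> bool:
    # A second agent only validates the first agent's output.
    verdict = call_llm("checker", f"Is this a correct answer to '{task}'? {draft}")
    return "yes" in verdict.lower()   # placeholder acceptance test

def solve(task: str, max_rounds: int = 3) -> str:
    # Cycle rejected drafts back to the specialist, adversarial-style.
    draft = specialist(task)
    for _ in range(max_rounds):
        if checker(task, draft):
            break
        draft = specialist(task)
    return draft   # a real system would escalate persistent failures to a human

print(solve("Extract the renewal date from this contract."))
```

The key design choice is that no single agent both produces and approves an answer: the checker's only job is to reject, which is what pushes accuracy above what one model can reach alone.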


For example, one large company the team works with has a group of 70 employees who do nothing but process contracts once someone inside the company decides to purchase a product or service. Contracts can come through as an Excel file, a PDF, or an email. Instead of one LLM agent poorly interpreting all three types, they have constructed a multi-agent system that routes each file type to a separate, specialized LLM-based agent trained to process only that file type. This first agent parses all the information from the contract but does nothing more. A second LLM-based agent checks whether the information is correct and, if not, cycles it back. If it is correct, a third LLM-based agent enters the contract into the system. Yet another agent generates a response to the humans involved. Yet another agent monitors the process, performing accuracy and security checks, and so on. They are finding that by giving very concrete, simple tasks to separate, specialized agents in a multi-agent system, backed by a RAG system, accuracy rates reach 95% and higher. This means that of 10,000 contracts, 9,500 are processed correctly. Of the 500 that are not, 100 fail because one step in the decision tree is missing, for which a new agent can be introduced, pushing accuracy even higher. The results are profound for the development of AI but also for the eventual impact on people. There is no doubt that the contracting group will shrink from 70 to fewer than 7 people.
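As a rough illustration of that division of labor, the pipeline might be wired together like the sketch below. The agent functions, file-type labels, and retry logic are my own hypothetical stand-ins, not the company's actual implementation.

```python
# A hedged sketch of the contract pipeline described above: route by file
# type, parse, validate, enter, notify, monitor. Each function is a
# hypothetical placeholder standing in for a specialized LLM-based agent.

PARSERS = {"xlsx": "excel_parser", "pdf": "pdf_parser", "eml": "email_parser"}

def parse(parser: str, document: dict) -> dict:
    # Parsing agent: extracts contract fields and does nothing more.
    return {"fields": f"fields extracted by {parser}", "source": document["name"]}

def validate(contract: dict) -> bool:
    # Checking agent: verifies the extraction; placeholder test always passes.
    return "fields" in contract

def enter_into_system(contract: dict) -> None:
    print(f"entered: {contract['source']}")                  # entry agent

def notify_humans(contract: dict) -> None:
    print(f"notified requester for: {contract['source']}")   # response agent

def audit(contract: dict) -> None:
    print(f"audited: {contract['source']}")                  # monitoring agent

def process_contract(document: dict, max_retries: int = 2) -> bool:
    parser = PARSERS[document["type"]]        # route each format to its specialist
    for _ in range(max_retries + 1):
        contract = parse(parser, document)
        if validate(contract):                # cycle back if the check fails
            enter_into_system(contract)
            notify_humans(contract)
            audit(contract)
            return True
    return False                              # unhandled case: human review

process_contract({"type": "pdf", "name": "vendor_agreement.pdf"})
```

Note how the 95% figure emerges from the structure: each agent's scope is so narrow that its failure modes are easy to spot, and the residual failures often point to exactly one missing branch, which is why adding one more agent keeps pushing accuracy up.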


In the past, I have typically believed AI would be a complement to how we work: many thought AI would replace radiologists, for example, but it has instead evolved into a tool that helps radiologists improve their work. However, the rapid progress of AI agents suggests that this may only be the case for people who are curious and proactively engaged in using these tools to improve. For people doing repetitive, low-thinking, low-dexterity tasks, the rapid progress of multi-agent systems could be immensely disruptive. How do we encourage people to experiment with AI to learn and improve? How do we help people who are afraid to even try for fear that their digital abilities fall short? How will we create ethical pathways for the people displaced by these tools? What will it mean for the future, and in what ways, perhaps ones we cannot yet imagine, will the world evolve? I’m curious about asking: what would it look like if we got this right? For example, could it lead to a rebirth of people following their true passions, such as skillful artisanship or more ecologically healthy farming?


These questions are as important to consider as being curious and riding the wave of discovery ourselves, so that we become the shepherds of AI rather than the ones substituted by it. Whatever the case may be, this looks like a Trojan Horse, and we can expect the cycle of learning by doing to accelerate development even faster as we move into the future.
