“Pre-trained model” is the name I will use to refer to AI models made accessible by companies like OpenAI, with capabilities including text, image, and embedding generation. You can contrast these pre-trained models with custom models trained to solve a specific task using “big data.”

It’s 2023, and there are still growing start-ups with billion-dollar valuations that rely on a single pre-trained model.

For example, companies like Jasper and Copy.ai have focused on designing user interfaces and prompt engineering to differentiate using a single type of pre-trained model, the generative pre-trained transformer, GPT for short. Fortunately, both are still young companies. So both are likely to have the runway, momentum, and ambitious talent essential to adapt using the strategies in this article.

But these companies have painted a seemingly simple version of the AI future. And that sparked a skills debate. Which skill will be most important for our new AI-enabled future? You can find arguments for and against both frontend development and prompt engineering as the most important skill.

However, companies focused on either of those two domains, frontend development or prompt engineering, see copycat products emerging as fast as they can build. And that insecurity has prompted an even bigger debate than the one around whose skills will be most valuable.

“The Moat Question”

A bigger debate has emerged: how to build a competitive advantage, also known as a moat, around businesses with AI-enabled products.

It’s one of the biggest questions, if not the biggest, that executives, investors, and founders still have about our AI future. People are engaging in fiery debates about whether such a moat is even possible! And many people throw in the towel and accept that no one seems to have a satisfying answer to “the moat question.”

A Level Playing Field

Since pre-trained models have leveled the playing field for shipping AI-enabled products, it is clear that you will need more than development around a single pre-trained model to sustain a competitive edge, no matter how pretty the design or how advanced the prompt is.

Nor will the “data moat,” the strategy that has sufficed for over a decade, be a significant enough competitive advantage. It too will fail, as its castle falls into disrepair, overshadowed by a new skyline, with the king wondering where everyone, and the proceeds from their data tax, has gone.

A New Opportunity

Naturally, entrepreneurs, leaders, and product designers must search for new ways. They must design more complex products than simply building a frontend application around an API endpoint.

They need to design systems that create a new type of competitive advantage while leveraging the capabilities of the latest AI technology. They must find synergy between pre-trained models and build robust yet flexible systems that can evolve to provide increased value with each advancement in the underlying technology.

The “Data Moat”

For the past decade, many companies that succeeded with AI products maintained market dominance by collecting troves of data, a “data moat,” which they used to train their product consisting of a proprietary custom model. This “data moat” allowed them to perform better, and prevent competition, simply by hoarding data.

But the “data moat” has dried up, and the crocodiles are nowhere to be found.

Highly-capable pre-trained models, made available by companies like OpenAI, have ended the era of the “data moat” for nearly all companies, with maybe the exception of those providing the pre-trained models as a service.

In this new era, a single pre-trained model could be used to quickly build products that outperform those from businesses that previously relied on a “data moat.”

I predict that companies continuing to believe in their “data moat” will be recognized as the first to fall in this recent wave of pre-trained models. Some may even do so quite spectacularly. It will be challenging for such a company to realize that its entire strategy is hoarding value instead of creating it.

While the fall from the top is never easy, this level playing field brought on by the accessibility of pre-trained models will allow new value creators to benefit us all.

Note: Don’t feel bad for the “data moat.” I’m holding back here, but I want to say the “data moat” is a bad strategy. Not bad as in unprofitable, but bad as in harmful to the ecosystem. In no uncertain terms, a “data moat” is monopolistic.

The First Wave

Some people may believe that the recent wave of pre-trained models is a new phenomenon. However, you can compare the pre-trained models to the natural language processing (NLP) services that began to pop up over a decade ago.

These NLP services, also available via API, were the first wave of off-the-shelf AI components. They had specific functions such as keyword extraction, summarization, and sentiment analysis.

Although each NLP service had a limited function on its own, when stacked together, they could form impressive systems that are still in use today.

These NLP services, the original pre-trained models, were simple enough that it would have seemed like poor judgment if I had just repackaged them individually to sell as a new product with a shinier user interface.

Instead, by designing a system that implemented sequences of these NLP services, I built an autonomous, value-creating system that grew a significant audience and continues to inspire others to try to develop similar strategies today.

Case Study: Top Feed

Going back to around 2013, an early start-up I worked on enabled businesses to capture value from curated content. I used the product to build an audience around bleeding-edge technologies in the frontend development space. But curating content took a lot of time and was something I knew I could automate.

While it started with a simple script and keyword matching, my curating system quickly improved in efficiency and autonomy as I discovered the various NLP services that were becoming available.

My system, code-named “Top Feed,” continually improved how it curated niche content. I designed “Top Feed” to deconstruct content based on contextual indicators like keywords and sentiment sourced from NLP services, the original pre-trained models.

Using a Bayesian network approach, the system kept a score for each indicator relative to the niche. When a piece of content performed well, based on clicks and engagement, the future relevance of its contextual indicators increased.

New content was regularly added to the system using a Twitter search of keywords that the system determined most relevant. When a piece of content was deemed sufficiently relevant, another NLP service would summarize it before distributing it to the audience.
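A minimal sketch of that score-keeping loop might look like the following. This is not the original “Top Feed” code, and it deliberately swaps the Bayesian network for something much simpler: a smoothed engagement rate per indicator. The class and method names (`IndicatorScores`, `record`, `relevance`, `top_indicators`) are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, List


class IndicatorScores:
    """Tracks how often content carrying each indicator earned engagement.

    Assumption: a simplified stand-in for a Bayesian network. Each indicator
    keeps a smoothed engagement rate, starting from a neutral prior.
    """

    def __init__(self, prior_hits: float = 1.0, prior_misses: float = 1.0):
        self.prior_hits = prior_hits
        self.prior_misses = prior_misses
        self.hits: Dict[str, float] = {}
        self.misses: Dict[str, float] = {}

    def record(self, indicators: Iterable[str], engaged: bool) -> None:
        """Update every indicator found in a distributed piece of content."""
        counts = self.hits if engaged else self.misses
        for indicator in indicators:
            counts[indicator] = counts.get(indicator, 0.0) + 1.0

    def score(self, indicator: str) -> float:
        """Smoothed share of past content with this indicator that performed well."""
        h = self.hits.get(indicator, 0.0) + self.prior_hits
        m = self.misses.get(indicator, 0.0) + self.prior_misses
        return h / (h + m)

    def relevance(self, indicators: Iterable[str]) -> float:
        """Average indicator score for a new piece of content."""
        items = list(indicators)
        if not items:
            return 0.0
        return sum(self.score(i) for i in items) / len(items)

    def top_indicators(self, n: int = 3) -> List[str]:
        """Highest-scoring indicators, e.g. to seed the next keyword search."""
        known = set(self.hits) | set(self.misses)
        return sorted(known, key=self.score, reverse=True)[:n]


scores = IndicatorScores()
scores.record(["javascript", "positive"], engaged=True)
scores.record(["javascript"], engaged=True)
scores.record(["php"], engaged=False)
print(scores.top_indicators(2))
```

The feedback loop is the point here: engagement raises an indicator’s score, and the highest-scoring indicators decide what the system searches for next.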

My “Top Feed” operated for years after I stopped developing it and enabled me to grow audience numbers to the hundreds of thousands in the JavaScript developer niche. And I used a similar system to grow a WFH audience to almost a million followers since the pandemic.

I did all this without using custom-trained models, which were simultaneously becoming more accessible and easier to develop. Instead, my “Top Feed” succeeded by utilizing carefully arranged off-the-shelf NLP services, the original pre-trained models, if you will, to create synergy and produce a valuable system.

The Current Wave

It’s easy to misunderstand this latest wave of advancements because the abilities of pre-trained models, like OpenAI’s ChatGPT and Bing’s Sydney, seem nearly infinite in scope.

But, while they may seem infinitely robust, each pre-trained model available today has limitations. That’s true for “GPT-3.5” and will remain true for the inevitable GPT-4 and whatever comes next!

The truth is each of these pre-trained models serves a particular function you can learn to understand. It will take some time since they function much differently than our standard computational components. And the reason these recent examples seem so capable has more to do with their ancillary systems, including human feedback loops, than the underlying pre-trained models themselves.

Building Blocks

Understanding the limits and possibilities is necessary to begin exploring how these pre-trained models may be used together in a system. You can imagine these pre-trained models as building blocks. And knowledge of the building block materials is an essential first step in architecting a durable, long-lasting, competitive advantage.

By “stacking” together multiple pre-trained models, creating a significantly more advanced and robust system becomes possible. So, while it may be tempting to focus on a single pre-trained model by fine-tuning and prompt engineering, it’s important to remember that the actual value lies in designing the system. Notably, a system that creates synergy amplifies each component’s output. With this approach, you can gain a competitive advantage. Your “moat.”

Prompt Chains

If you’re reading this, you’re familiar with the concept of prompt chains. At the time of this writing, a product with a prompt chain is far ahead of its competition that continues to focus on engineering a single prompt.

These products containing “prompt chains” may utilize pre-trained models with different capabilities, prompts, and fine-tuning to maximize the accuracy and efficiency of the results they provide users.
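As a rough illustration, a prompt chain can be reduced to a list of steps where each step’s output becomes the next step’s input. In the sketch below, the “models” are stand-in Python functions rather than calls to any real API, and the names `Step` and `run_chain` are hypothetical.

```python
from typing import Callable, List

# Assumption: a "model" is any function that maps a prompt to a completion.
# In a real product, each one would wrap a call to a hosted pre-trained model.
Model = Callable[[str], str]


class Step:
    """One link in the chain: a prompt template paired with a model."""

    def __init__(self, template: str, model: Model):
        self.template = template
        self.model = model

    def run(self, text: str) -> str:
        # Fill the template with the previous output, then call the model.
        return self.model(self.template.format(input=text))


def run_chain(steps: List[Step], text: str) -> str:
    """Feed the output of each step into the next one."""
    for step in steps:
        text = step.run(text)
    return text


# Stand-in models so the sketch runs without any external service.
def last_line_model(prompt: str) -> str:
    return prompt.split("\n")[-1]


def shout_model(prompt: str) -> str:
    return prompt.split("\n")[-1].upper()


chain = [
    Step("Extract the key claim:\n{input}", last_line_model),
    Step("Rewrite as a headline:\n{input}", shout_model),
]
print(run_chain(chain, "pre-trained models level the field"))
```

Because each step carries its own template and model, different steps can use different pre-trained models, prompts, or fine-tunes without changing the chain itself.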

In hindsight, we will recognize prompt chains as the infancy of developing AI systems, akin to the first multi-cellular organisms when all previous life was single-celled. While trying not to minimize this vital development, I predict that future systems will simultaneously see an increase in prompts and a decrease in the ratio of components that utilize prompts.

Embeddings Systems

Embeddings systems provide a way to organize memories for the AI. They are also used to numerically represent non-numerical data, like text and multi-media, for efficient comparison using traditional computations.

More recently than prompt chains, products that integrate embeddings systems have exploded in numbers.

OpenAI lists a half-dozen ways you can use embeddings in a system, including search, clustering, recommendations, anomaly detection, diversity measurement, and classification. Furthermore, you can embed content in nearly as many ways as you can write a prompt to achieve a similar result with similarly varying accuracy and cost.
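To make the search use case concrete, here is a small sketch of embeddings-based ranking using cosine similarity. The three-dimensional vectors and document names are invented for illustration; in practice the vectors would come from an embedding model and have hundreds or thousands of dimensions.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: Vector, index: Dict[str, Vector], top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank stored items by similarity to the query vector."""
    ranked = sorted(
        ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy index: made-up vectors standing in for real embeddings.
index = {
    "react-hooks": [0.9, 0.1, 0.0],
    "vue-composition": [0.8, 0.3, 0.1],
    "remote-work-tips": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
for doc_id, similarity in search(query, index):
    print(doc_id, round(similarity, 3))
```

The same similarity function underlies the other uses in the list: clustering groups nearby vectors, and anomaly detection flags vectors far from everything else.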

My prediction is that, as useful as prompt chains have become, a chain of embeddings may become even more helpful. And mature AI systems will implement embeddings before and after every generative pre-trained model, whether for adding context or criticizing outputs.
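One way to picture embeddings on both sides of a generative model is the sketch below: an embedding step selects context before generation, and another embedding comparison critiques the draft afterward. Everything here is an assumption made for illustration, including the function name `answer_with_checks`, the word-count `toy_embed`, the stub generators, and the 0.5 threshold.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Embed = Callable[[str], Vector]
Generate = Callable[[str], str]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def answer_with_checks(question: str, documents: List[str], embed: Embed,
                       generate: Generate, min_similarity: float = 0.5) -> Dict[str, object]:
    if not documents:
        raise ValueError("at least one document is required")
    # Before generation: use embeddings to pick the most relevant context.
    q_vec = embed(question)
    context = max(documents, key=lambda d: cosine(q_vec, embed(d)))
    draft = generate(f"Context: {context}\nQuestion: {question}")
    # After generation: use embeddings to check the draft stays close to the context.
    accepted = cosine(embed(draft), embed(context)) >= min_similarity
    return {"context": context, "draft": draft, "accepted": accepted}


# Stand-ins so the sketch runs offline; a real system would call hosted models.
VOCAB = ["moat", "data", "system", "cat"]


def toy_embed(text: str) -> Vector:
    # Counts occurrences of a few vocabulary words; not a real embedding.
    return [float(text.lower().count(word)) for word in VOCAB]


def echo_context(prompt: str) -> str:
    return prompt.split("\n")[0].replace("Context: ", "")


def off_topic(prompt: str) -> str:
    return "cats are great"


docs = ["a system moat combines models", "cats sleep all day"]
print(answer_with_checks("what is a system moat?", docs, toy_embed, echo_context))
```

Swapping `echo_context` for `off_topic` makes the post-generation check reject the draft, which is the “criticizing outputs” role described above.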

The “System Moat”

“System Moat” is the name I use to describe an AI system consisting of multiple pre-trained models such that their arrangement becomes valuable as a design.

The “System Moat” is the competitive technological advantage you can achieve with pre-trained models. Unlike designs for many other products, “System Moats” are unlikely to benefit from being patented, though they should remain a corporate secret.

Instead of gaining value from a single patented design, the “System Moat” should gain strength from its flexibility. As the AI ecosystem advances, new pre-trained model components become part of and improve the system’s capabilities.
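A hedged sketch of what that flexibility could mean in code: the arrangement of components is fixed as a design, while each component sits behind a small interface and can be replaced when a better pre-trained model arrives. The `System` class, its `swap` method, and the lambda components are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List

# Assumption: every component, model-backed or not, maps text to text.
Component = Callable[[str], str]


class System:
    """A named arrangement of components that can be upgraded one at a time."""

    def __init__(self, order: List[str], components: Dict[str, Component]):
        missing = [name for name in order if name not in components]
        if missing:
            raise KeyError(f"no component registered for: {missing}")
        self.order = list(order)
        self.components = dict(components)

    def swap(self, name: str, replacement: Component) -> None:
        """Replace one component without touching the rest of the design."""
        if name not in self.components:
            raise KeyError(name)
        self.components[name] = replacement

    def run(self, text: str) -> str:
        for name in self.order:
            text = self.components[name](text)
        return text


# Stand-in components; real ones would wrap calls to pre-trained models.
system = System(
    order=["summarize", "stylize"],
    components={
        "summarize": lambda text: text[:10],
        "stylize": lambda text: text.upper(),
    },
)
print(system.run("pre-trained models"))

# A newer "summarize" model arrives: swap it in, keep the arrangement.
system.swap("summarize", lambda text: text[:5])
print(system.run("pre-trained models"))
```

In this framing, the valuable asset is the `order` and how the pieces feed each other, not any individual component.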

The system’s complexity may develop until it’s necessary to organize sub-systems. Then, you may form development teams around these sub-systems, and only principal architects may understand the complete system, similar to complex software applications today.

A “System Moat” visual may look like the flow chart or diagram from any other software application. The difference is, where in past systems components were made of computational functions, components of the “System Moat” consist of pre-trained models. Traditional computational processes become transient requirements to hold the AI system together. They are most valuable when they’re easy to replace and reshape to make room for future renovations.

Ultimately, the “System Moat” will evolve into a high-level organizational chart for the business. Not only will the pre-trained model components replace traditional computational functions, but they will also begin to replace human elements.

The rise of the “System Moat” will redefine technology’s role in business. The “System Moat” will see a company’s technology evolve from being the product to becoming the backbone of its brand. And I predict that “Network Effects” will take on a new definition as “System Moats” become sufficiently complex.

Other Moats

Many start-up founders seem to forget that technology, or intellectual property, is just one of the possible ways to build a competitive advantage, or a “moat,” around their business. However, as we continue to witness the decrease in the value of the “data moat,” in addition to building a “System Moat,” it will be essential to focus on the other types of competitive advantages.

A well-oiled machine will continue to provide a competitive advantage to AI-enabled businesses through economies of scale. The “System Moat” may even be viewed as a derivative of this competitive advantage.

Brand & Relationships

Relationships will likely regain importance, with the current lack of importance epitomized by the SaaS industry and their efforts to make ranting on social media the best solution to getting customer support.

With AI enabling the personification of software, like conversational interfaces, users’ expectations will continue to grow. As a result, users will begin to build empathy for the personable chat interfaces and develop a deeper trust. However, with this deeper trust comes responsibility. Like in any relationship, the more trust has grown, the more painful it will feel and the more damage it will do if broken.

To prevent breaking trust and losing customers, companies will have to design their systems to remain flexible enough to improve with the advancing technology, so that their promises to customers, “being the best this for that,” stay true. A breach in that trust will usher users to newer competitors, with even more entering the market as barriers continue to fall.

Network Effects

Network effects will continue to play a significant role in providing a competitive advantage for technology companies. Moreover, the types of network effects will continue to grow. The current view is that human users are the drivers of network effects. But that may soon change. For example, when will pre-trained models become advanced enough to bring as much value to the network as a user?

It’s also generally accepted that network effects create value through user interactions. But what if users interacting with the AI created the value of the network?

What’s Next

First, developing a deep understanding of the shifting paradigm from training custom models to utilizing pre-trained models is essential. You must understand everything from the technology’s capabilities to recognizing the types of talent necessary for your product development (hint: why didn’t I mention data science until now?).

When you’re ready to begin product development, emphasizing continuous iteration and flexibility is more important than ever. Unfathomable developments in AI technology will continue at a rapid pace. The ability to incorporate those developments, rather than compete with them, will define companies and strengthen those with a “System Moat.”

Key Points

  • There are a growing number of components available to build AI systems. These components are known as pre-trained models, and each comes with varying levels of capabilities and cost.

  • With an effective strategy, you can organize pre-trained models into a system that maximizes utility and accuracy while minimizing cost.

  • The method for implementing pre-trained models into a synergistic arrangement, a “System Moat,” becomes a company’s proprietary information. Maintaining this “System Moat” strategy as a corporate secret becomes a competitive advantage.

  • Systems that maximize flexibility will allow increasing sophistication and become the hardest to reproduce.