It’s all about the large language models (for now)
When the average person talks about AI, they’re generally referring to generative AI: those cool little interfaces that let you produce reams of text or a dazzling picture from a simple text prompt. And that’s all well and good. But if you take a peek under the hood of your average generative AI tool (be it OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, or xAI’s Grok) you’ll see that it is underpinned by what’s known as a large language model - or LLM, as we’ll say from now on. These LLMs are the beating heart of generative AI tools.

But what exactly are they? This is a drastic simplification, but LLMs are deep-learning models that can perform a variety of natural language processing tasks. They ingest vast amounts of data (be it written or visual) and effectively ‘work out’ the relationships between different data points. This allows them to recognise the relationships between words and string words together in the statistically most likely, or most relevant, sequence. They are built on artificial neural networks - architectures loosely inspired by the human brain - which is partly why their output can feel so remarkably human.

As we’ve seen over the past two years, the capabilities of these LLMs - and their attendant generative AI platforms - have been impressive, finding use in myriad industries from sales and marketing through to engineering and healthcare. However, it’s possible that we’re reaching a ‘ceiling’ in LLM development. Which brings me on to my next point…

The limits of LLM development?
Based on my fairly extensive reading of experts in this field, I would suggest that there are signs that the development of LLMs is stalling. Why? Well, there are a number of reasons.

A lack of new training data
LLMs must be trained on data. In the same way that humans are (arguably) born as a tabula rasa and must be educated into fully functional adults, LLMs must be trained on data in order to provide their life-like feedback and responses. In fact, it is becoming clear that performance gains for LLMs come (almost solely) from the data they are trained on. Speaking to TechCrunch, Kyle Lo, a senior applied research scientist at the Allen Institute for AI (AI2), summed up the situation by comparing two new LLMs: Meta’s Llama 3, a text-generating model released earlier this year, outperforms AI2’s own OLMo model despite being architecturally very similar - and Llama 3 was trained on significantly more data, which Lo believes explains its superiority. As another researcher, James Betker of OpenAI, put it: “Trained on the same data set for long enough, pretty much every model converges to the same point”.

When you think about it, this makes sense. If you were to make every school-age pupil in the UK read exactly the same books, you’d end up with a group of people with shared assumptions, thought patterns and social norms. The same is true of LLMs. It’s all about the data.

The thing is, the LLMs of all the major players have now ingested nearly all the world’s data (or at least the entirety of the scrapable Internet). Everything from the great works of literature to obscure manifestos sits within the bowels of the great LLM beasts (which is also why a lot of guardrails have had to be implemented to stop ChatGPT regurgitating 4chan screeds in response to GCSE essay questions). There’s not a great deal of data left to be ingested - which means these LLMs have effectively reached the pinnacle of actualisation (if machines can truly ‘actualise’ in the sense that Maslow meant it).
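The ‘statistically modelled sequence’ point above can be made concrete with a toy example. What follows is a drastic simplification of what an LLM does (a bigram word model rather than a deep neural network, trained on a three-sentence corpus invented for this example), but it shows how next-word prediction falls directly out of the training data:

```python
from collections import Counter, defaultdict

# A toy illustration (not a real LLM): learn next-word statistics
# from a tiny invented corpus, then pick the statistically most
# likely continuation of a given word.
corpus = (
    "the cat sat on the mat . "
    "the cat chased the mouse . "
    "the dog sat on the rug ."
).split()

# Count how often each word follows each other word (bigrams).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" - it follows "the" most often
print(predict_next("sat"))  # "on"
```

Note that a second model trained on this same corpus would learn exactly the same statistics and make the same predictions - Betker’s convergence point in miniature.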
It’s for this reason that the big AI players have, over the past six months, engaged in the great ‘training data hunt’. They’re not exactly wearing pith helmets and carrying blunderbusses, but they do appear to be engaging in some fairly frantic efforts to obtain (and/or create) new training data to feed the ever-hungry maws of their LLMs. This has led to a situation where AI leaders like OpenAI are paying hundreds of millions of dollars to obtain new sources of training data from the likes of news publishers, book publishers, stock media libraries and even institutional archives. (Idea - want to make some serious money in the next decade? Become an AI data broker; the market is expected to be worth at least $30 billion by 2035.)

One response to this training data ceiling has been to see whether it’s possible to train LLMs on ‘synthetic data’. In a move which brings to mind something akin to Soylent Green, companies are now turning to synthetic data to train their models. This is - as its name suggests - data which has been created artificially (via decision trees, deep learning, and graphics engines) to mimic the characteristics of real-world data. This isn’t a new development, by the way. Synthetic data has been around for a long time - computer simulations being the archetypal example. However, academics are suggesting that the use of synthetic data to train LLMs may not be feasible in the long run. Why? Because of our next point…

The lessons of Habsburg’s jaw
There is a famous portrait of Charles II, the King of Spain from 1665 to 1700. Painted by Juan Carreño de Miranda, it sits in the Kunsthistorisches Museum in Vienna. It depicts - as you would expect of a royal portrait - a man garbed in finery, with a regal bearing. However, one element stands out to the careful observer: the subject’s prominent lower jaw and generally lopsided face. Charles II wasn’t the only member of his lineage to bear such an unusual jawline (known as mandibular prognathism) - in fact, so many of his relatives had the same feature that it garnered the name the ‘Habsburg jaw’.

Why am I telling this story about a long-deceased monarch in an essay about AI? It has been firmly established that Charles II’s distinct jawline was a result of inbreeding. Charles II had an inbreeding coefficient of 0.25, which is phenomenally high - to put it into context, it is the coefficient you would expect in the offspring of two siblings. The result wasn’t just a prominent jaw, but other health issues including epilepsy, an overly large tongue, infertility, gastrointestinal problems and more.

I raise this point because LLMs are facing their own ‘inbreeding’ problem. To be more specific, they must avoid ingesting data that has been produced by other LLMs. Such recursively generated data can induce what academics term ‘model collapse’ - a situation which has been nicknamed ‘Habsburg AI’.

AI companies now find themselves in something of a conundrum. Having initially been able to mine the Internet for data to train their LLMs, they no longer have this option - what with vast amounts of data online now having been produced by various LLMs. The authors of the Nature study, ‘AI models collapse when trained on recursively generated data’, summed up the situation as follows: “In our work, we demonstrate that training on samples from another generative model can induce a distribution shift, which - over time - causes model collapse.
This in turn causes the model to mis-perceive the underlying learning task. To sustain learning over a long period of time, we need to make sure that access to the original data source is preserved and that further data not generated by LLMs remains available over time. The need to distinguish data generated by LLMs from other data raises questions about the provenance of content that is crawled from the Internet: it is unclear how content generated by LLMs can be tracked at scale”.

That last sentence is the key point. The Internet is essentially ‘dead’ as a source of new data for LLM training. From now on, AI companies will need to rely on institutional (and, most importantly, proprietary) sources of knowledge and data for training purposes. If you have a large source of data that a) you own, and b) has never been uploaded to the Internet - now’s the time to make an approach to an AI company.

AI’s energy quandary
As I’ve written previously, LLMs (and the AI platforms they underpin) consume truly vast amounts of electricity; I’m talking country-sized amounts of power. LLMs use electricity in two stages. The first is the training stage: as Alex de Vries pointed out in his journal article, GPT-3 consumed 1,287 MWh of electricity during training. That’s enough energy to power 390 UK homes for an entire year.

The second stage in which LLMs use energy is the inference stage. This is the energy the LLM uses to provide responses to inputs (i.e. the energy it uses to stay running after training). It has been estimated that ChatGPT consumes 564 MWh per day just to run and provide answers. That’s the same amount of energy that 170 UK homes use in a year - every. single. day. Indulge me, but I just want to restate that: every day, ChatGPT uses the same amount of energy as 170 UK homes use in an entire year (and, in reality, this number is likely to be much higher).

As you can imagine, this all comes at an enormous cost. In fact, it’s so costly that AI firms are racing to find a solution. The current solutions on the table include:
- Bringing shuttered power plants back online. Microsoft, for instance, wants to bring the infamous Three Mile Island nuclear power plant back online.
- Some AI companies - such as Elon Musk’s xAI - are going ‘off grid’ and building their own gas-fired power stations for their AI data centres.
- Other AI pioneers - such as OpenAI’s Sam Altman - are being hugely ambitious and pouring money into nuclear fusion (which, cynics have pointed out, is ‘the energy of the future - and always will be’).
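For the record, the home-equivalence figures above are easy to sanity-check. The quick calculation below assumes the commonly quoted figure of roughly 3.3 MWh of electricity per UK household per year (the exact per-home figure varies by source):

```python
# Sanity-check the article's home-equivalence claims, assuming
# ~3.3 MWh of electricity per UK household per year.
MWH_PER_UK_HOME_PER_YEAR = 3.3

training_mwh = 1287           # GPT-3's training consumption (de Vries)
inference_mwh_per_day = 564   # estimated daily ChatGPT consumption

homes_training = training_mwh / MWH_PER_UK_HOME_PER_YEAR
homes_per_day = inference_mwh_per_day / MWH_PER_UK_HOME_PER_YEAR

print(round(homes_training))  # 390 homes powered for a year
print(round(homes_per_day))   # ~171: a year's use for ~170 homes, every day
```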
What’s next for LLMs in 2025?
Okay, so we’ve seen that LLMs, and the AI tools they power, have made huge progress over the last two years. However, they’re facing a number of challenges. It’s my contention that these challenges will be solved - but LLMs (and the broader AI industry) are going to look very different by the end of this year.

The data solution
How are AI companies going to solve the data issue facing LLMs, then? There are a number of possible solutions. The first is simple brute-force capitalism. OpenAI has just completed the largest venture capital deal of all time: in its latest funding round, the generative AI progenitor raised $6.6 billion at a valuation of $157 billion. That valuation makes OpenAI the third most valuable VC-backed company in the world, surpassed only by SpaceX and ByteDance (the owner of TikTok).

OpenAI now has the funds to buy any data it wants. The company is now in a position, at least in theory, to buy the entire back catalogue of Penguin Random House or the rights to tens of thousands of movies. This isn’t mere idle conjecture, either. Earlier this year Meta (the parent company of Facebook and Instagram) briefly mulled over purchasing Simon & Schuster. In short, AI companies may solve their current dearth of data with cold, hard cash. I won’t discount that these companies may also solve the Habsburg AI problem, but in the near term I suspect they’ll prefer to use their capital to acquire new data sources.

The domain-specific LLM solution
Okay, so we’re onto the crux of this article. There is an emerging type of LLM that can potentially leapfrog the current obstacles: domain-specific LLMs. Domain-specific LLMs are language models which have been trained on a certain knowledge domain; they are explicitly designed to excel within particular areas of expertise. Ask them the sort of general questions that you would put to ChatGPT, and they’ll likely fail. But ask them a question in the domain in which they have been trained, and you’ll get a superb answer. Domain-specific LLMs are already extant and being used ‘in the wild’. Examples include:
- ClimateBERT - an LLM that is trained on climate-related information.
- Med-PaLM 2 - an LLM from Google designed to answer medical questions.
- BloombergGPT - an LLM that has been built and trained for financial tasks, from sentiment analysis through to forecasting.
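In practice, domain-specific models are often deployed alongside a general-purpose model, with each query routed to whichever model fits it best. Here is a minimal sketch of that routing pattern - the model names and keyword lists are entirely illustrative, and real systems typically use a trained classifier (or an LLM itself) rather than keyword matching:

```python
# Toy router: pick a domain-specific "model" based on keywords in
# the query. Model names and keyword lists are illustrative only.
ROUTES = {
    "climate-llm": {"climate", "emissions", "warming"},
    "medical-llm": {"diagnosis", "symptom", "patient"},
    "finance-llm": {"stock", "earnings", "forecast"},
}

def route(query: str) -> str:
    """Return the name of the model that should handle `query`."""
    words = set(query.lower().split())
    for model, keywords in ROUTES.items():
        if words & keywords:  # any domain keyword present?
            return model
    return "general-llm"  # fall back to a general-purpose model

print(route("What are global emissions trends?"))  # climate-llm
print(route("Write me a limerick"))                # general-llm
```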
