Beyond the Generalist Chatbot: The Specialist AI Year

Many organisations still treat AI like enterprise software. Pick one platform, sign a contract, and roll it out across the business. GPT, Copilot, Claude, or Gemini is expected to handle everything: writing scripts, modelling risk, drafting policy. The appeal is straightforward. One vendor. One set of rules. One compliance check. What rarely gets tested is the assumption beneath it: that a system trained on broad internet text can perform with equal reliability across finance operations, legal operations, and marketing workflows. It cannot.

Heavy users show the reality is different. They switch models within the same day: GPT-5 Pro to probe analysis, Gemini for brainstorming, Claude 4.1 for long documents, NotebookLM to connect themes. Ethan Mollick (2025) described this directly: using NotebookLM to digest his book, GPT-5 Pro to critique a peer-reviewed paper and find an error, and Claude to restructure a financial model and create a pitch deck. These aren’t mere preferences. They are demonstrations that architecture and training shape capability.

Training choices leave marks. A model tuned for mathematics can excel on reasoning benchmarks such as DeepMath-Creative (Chen et al., 2023) or AceMath (Zhang et al., 2023), yet the same system can falter on open-ended tasks. Gains in one direction often cost performance elsewhere. These are not bugs; they are the outcome of limited compute and deliberate optimisation.

Why One Model Isn’t Enough

Inside organisations, the drive is to simplify: approve one platform, sign one contract, and run it everywhere. Procurement and governance become easier, but capability does not scale in the same way. In finance operations, reconciliation tools tuned on ledger structures outperform general models at spotting breaks or anomalies. In customer operations, models trained on millions of call-centre transcripts handle escalation cues and compliance scripts more reliably than a broad chatbot API. In supply chain, demand-forecasting systems built on SKU and regional patterns produce sharper predictions than generic predictive text wrapped around spreadsheets.

The same divide shows up in HR and compliance. Domain-tuned models trained on regulatory language and case histories catch risks that general systems miss. Even in marketing, where creativity is prized, copy models trained on sector data and brand tone are more consistent than one-size-fits-all generators.

The pattern is clear. General models can stretch across functions, but performance drops when operational detail matters. Specialist systems earn their place because they are trained where the stakes sit.

What It Takes to Run Many Models

Running specialist tools works, but building the discipline around them is harder. Access control, monitoring, integration, and security are not add-ons; they are foundational. Data must flow between tools without losing context. If one model suggests a marketing angle and another produces the associated images, someone still has to stitch the output together, ensure consistency, and catch mismatched tone or brand voice. At small scale that can be manual. At larger scale, with many assets, multiple campaigns, and tight deadlines, the cracks show. Klarna, for instance, reduced external image production costs significantly by using generative AI tools, but only after streamlining the handoffs and standardising brand rules across tools (Reuters, 2024).

Then there’s the human cost. It’s not enough to train staff to use the tools. They must develop judgement: knowing when a specialist tool, rather than the general platform, is the right fit. Marketing teams at Headway assign different tools to different parts of their process: one for video production, another for ad scriptwriting and translation. The shift gives them flexibility but also means more stages to coordinate and more review cycles. Mistakes multiply when roles and handovers aren’t clearly defined (Business Insider, 2024).

Finally, many organisations are learning that splintered toolsets require new governance. ASML’s legal and compliance team, for example, brought in Harvey and Copilot not to replace human judgement but to speed up routine contract review and compliance checks. They saw a 15–20% speed increase when tasks were well defined and oversight was tight. That improvement comes with trade-offs: aligning contract standards, defining which AI outputs need a human check, and preserving traceability (Financial Times, 2024).

When Central IT Can’t Keep Control

Central IT still pushes for standard platforms and uniform policies. That approach cracks when legal, marketing, and finance each demand tools tailored to their own priorities: compliance, tone, speed. Many organisations respond by embedding AI capability inside functions: legal runs contract-analysis systems; marketing builds its own content engines; finance invests in compliance or risk-focussed tools.

The result is fragmented infrastructure. Instead of one enterprise licence, firms end up with dozens of vendor contracts, each with its own onboarding, security checklist, and review process. Tool sprawl begins. Security tooling already shows the pattern: a Kaspersky study of UK companies found 74% operate multi-vendor security ecosystems, and 36% reported overlapping tools that inflate cost and cause inefficiencies (Kaspersky, 2023). SDxCentral has documented departments acquiring different AI solutions to solve similar problems, creating duplicate functionality, data silos, and incompatible systems (SDxCentral, 2024).

The governance burden becomes real. Vendor management grows heavier. Security reviews multiply. Compliance checks are duplicated. The hidden costs follow: integration friction, degraded performance across uncoordinated tools, and duplicated spend. Many firms admit the “invisible cost” of maintaining so many overlapping AI tools is eroding the value AI was supposed to deliver (Medium, 2024).

Where Specialists Outperform and Why It Matters

In coding for operations, models tuned on repositories reduce bugs and generate more reliable scripts than general-purpose chatbots. GitHub’s 2024 State of AI in Software Development report found that developers using AI code assistants trained on large, domain-specific repositories reduced debugging time by 30% compared with generic LLMs (GitHub, 2024).

In legal and compliance, contract-review systems trained on regulatory corpora flag risks that broad models miss. A Wolters Kluwer (2024) study showed domain-trained legal review AI reduced review times by 45% while increasing accuracy compared with general systems.

In supply chains, forecasting models tuned on SKU-level and regional patterns improve accuracy over generic predictive tools. McKinsey’s Supply Chain AI Survey (2024) reported specialist forecasting engines cut stockouts by up to 35% compared with broad systems (McKinsey, 2024).

In customer operations, call-centre conversation models trained on transcripts outperformed general LLMs in escalation handling. Gartner’s Customer Service AI Benchmark (2024) found specialist models improved first-call resolution by 25% compared with general systems (Gartner, 2024).

In marketing, AI tuned on sector and brand data delivers more effective campaigns. Klarna reported saving $10 million annually by using AI systems for marketing copy and imagery, while EdTech firm Headway saw a 40% boost in ad performance and reduced production costs by using AI tuned for campaign generation (Reuters, 2024; Business Insider, 2024).

The market consequence is clear: reliance on one generalist system risks weaker outcomes. Specialists bring reliability and depth, but they also fragment the landscape. Organisations face harder choices on integration, governance, and workflow design, even as the competitive edge moves to those who pick and manage the right mix.

The Practical Costs of Specialisation

Running multiple systems is not free. Workflows break when different models handle different stages of a process. Context is lost as data moves across tools. Staff need training not only in how to operate each system, but in how to judge which one fits the task in front of them. That kind of judgement does not come from a manual; it builds over time.

One way forward is to map workflows before deployment. Break down the steps, define the level of accuracy needed, and note the constraints. Place specialist systems where they add value instead of stretching a single model across the chain. Parallel testing helps. Run the same review through a general system and a specialist, then compare speed, accuracy, and usability. Overlaps and gaps become visible quickly.
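The parallel test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a benchmark: `general_review` and `specialist_review` are hypothetical stand-ins for calls to real systems, the four sample clauses and their labels are invented for the example, and the keyword rules are deliberately simplistic, so the scores they produce say nothing about any actual product. Usability is not measured here; it still needs human assessment.

```python
from dataclasses import dataclass
from statistics import mean
from time import perf_counter
from typing import Callable, Sequence


@dataclass
class Case:
    text: str      # the clause or document under review
    expected: str  # the label a human reviewer assigned


@dataclass
class Result:
    name: str
    accuracy: float      # share of cases matching the human label
    mean_seconds: float  # average wall-clock time per case


def run_parallel_test(
    systems: dict[str, Callable[[str], str]],
    cases: Sequence[Case],
) -> list[Result]:
    """Run every system over the same cases and rank by accuracy."""
    results = []
    for name, review in systems.items():
        correct = 0
        timings = []
        for case in cases:
            start = perf_counter()
            answer = review(case.text)
            timings.append(perf_counter() - start)
            correct += answer == case.expected
        results.append(Result(name, correct / len(cases), mean(timings)))
    return sorted(results, key=lambda r: r.accuracy, reverse=True)


# Hypothetical stand-ins: replace with calls to the systems under test.
def general_review(text: str) -> str:
    return "risk" if "penalty" in text.lower() else "ok"


def specialist_review(text: str) -> str:
    lowered = text.lower()
    terms = ("penalty", "indemnif", "auto-renew")
    return "risk" if any(term in lowered for term in terms) else "ok"


# Invented sample cases; a real test would use labelled work from the team.
CASES = [
    Case("Late delivery incurs a penalty of 2% per week.", "risk"),
    Case("The supplier shall indemnify the buyer without limit.", "risk"),
    Case("This agreement will auto-renew for successive terms.", "risk"),
    Case("Invoices are payable within 30 days.", "ok"),
]

if __name__ == "__main__":
    systems = {"general": general_review, "specialist": specialist_review}
    for r in run_parallel_test(systems, CASES):
        print(f"{r.name:<10} accuracy={r.accuracy:.0%} "
              f"mean_time={r.mean_seconds:.6f}s")
```

The useful part is the shape rather than the toy rules: every system sees the same cases, the labels come from human reviewers, and accuracy and speed are recorded side by side so overlaps and gaps are easy to see.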

The skills for managing distributed systems are different. Integration alone is not enough. Organisations need people who can weigh trade-offs, choose the right system for the moment, and recognise when the supposed gains of standardisation are actually losses in quality.

Budgets change too. A single licence is replaced by a patchwork of subscriptions. Who pays? Should legal cover the cost of its own contract AI, or should IT absorb it? How should a finance lead compare the return from faster contract review with the return from quicker customer service? These are not accounting details. They go to the heart of how firms value efficiency and risk.

The shift to specialisation reconfigures governance, capability, and cost structures. Centralised platform control will be hard to sustain. But the advantages of specialist systems are already strong enough that early adopters will force the rest to follow or accept weaker AI in their core functions.
