Model Volatility: The AI Reproducibility Problem

It wasn't a process. It was a time-stamped response.

A cloud-hosted language model presents a stable surface over a changing interior. The screen the user works with stays the same. The model answering behind it is retrained and reconfigured on the provider's schedule, at every layer the buyer cannot see: the weights, the instructions it runs under, its safety behaviour, and the routing that carries each request. The organisation that has built its workflows on top is rarely told when any of this has happened.

The mismatch breaks an assumption that most enterprise work depends on without ever stating it: that the tool you tested is the tool you are using. An output received on a Tuesday reflects one configuration. The same prompt sent a month later reaches a different one and returns something coherent that diverges in the details that matter. AI-assisted work becomes dated in the literal sense. It is tied to the configuration present on the day it was produced, and it cannot be reliably reproduced.

The governance problem

Audit assumes that work can be reproduced. The standards built to make AI governable, ISO/IEC 42001 among them, assume that a controlled change to a service is a documented change, communicated to the people who depend on it. A hosted model updates continuously and quietly. Board papers, risk assessments, and regulatory filings are produced with tools whose behaviour changed after the validation that approved them, and the change leaves no trace in the organisation's own records.

When the output drifts, the organisation looks inward first. It examines the prompt, the training given to staff, the internal data feeding the model, because those are visible and within reach. The change on the vendor's side is neither, so it goes unexamined, even when it is the cause of the drift. The prompt library that worked in March becomes a record of how the model behaved in March. It is treated as a current toolset while functioning as a historical document.

The contract offers no remedy, because it was never written to provide one. Cloud AI agreements reserve the vendor's right to modify the service and define that service through access, uptime, and usage limits. Behaviour is left out of the guarantee. Organisations build their workflows on behaviour anyway: on the model's tone, its refusal patterns, and the way it signals uncertainty. When those change, there is nothing in the agreement to point at. The service was sold as something that keeps working, and what keeps working is the access.

The shadow system problem

A consequence follows that rarely appears in the governance papers. The sanctioned tool is the governed one, which makes it the constrained one, and constraint reaches the user as friction. The unsanctioned assistant, used quietly in a personal channel, is more useful for difficult work precisely because nothing is watching it. Decision quality migrates toward the place that is easier to think in, which is the one outside the record. Accountability stays with the approved tool while the reasoning happens elsewhere.

That exposure becomes concrete in a dispute. The claim that AI was used only to tidy the grammar survives until someone asks how the first set of options was narrowed, and on which version of the model. A final document can read as unassailable while the path to it ran through a configuration that no longer exists and cannot be produced on request. The assumption that the approved tool is where the analytical work happened is the one under the most pressure, and the one an organisation may most need to prove.

What reproducibility actually requires

Reproducibility in AI-assisted work would require, at minimum, the ability to identify the exact model version in use at the time of production, the system instructions under which it was operating, and the routing decisions that determined which configuration answered which request. None of these are routinely disclosed by providers. None are routinely recorded by organisations. The absence is not negligence on either side; it reflects that reproducibility was never part of the design.
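The three items named above can be pictured as a record kept alongside each piece of AI-assisted output. What follows is a minimal sketch in Python, not a description of any provider's API. The class `ProvenanceRecord`, the function `make_record`, every field name, and the model label in the usage line are hypothetical, chosen only for illustration. Each item is optional in the sketch because, as noted, providers do not routinely disclose them.

```python
from __future__ import annotations

import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


def sha256_hex(text: str) -> str:
    """Fingerprint a text so it can be matched later without storing it here."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record of the conditions under which an output was produced."""

    produced_at: str                           # UTC timestamp, ISO 8601
    model_version: Optional[str]               # exact version, if the provider exposes one
    system_instructions_sha256: Optional[str]  # fingerprint of the instructions, if known
    routing_id: Optional[str]                  # which configuration answered, if known
    prompt_sha256: str
    output_sha256: str

    def undisclosed_fields(self) -> list[str]:
        """Names of the fields that could not be filled in at production time."""
        return [name for name, value in asdict(self).items() if value is None]


def make_record(
    prompt: str,
    output: str,
    model_version: Optional[str] = None,
    system_instructions: Optional[str] = None,
    routing_id: Optional[str] = None,
) -> ProvenanceRecord:
    """Build a record from whatever was actually observable when the output arrived."""
    instructions_hash = (
        sha256_hex(system_instructions) if system_instructions is not None else None
    )
    return ProvenanceRecord(
        produced_at=datetime.now(timezone.utc).isoformat(),
        model_version=model_version,
        system_instructions_sha256=instructions_hash,
        routing_id=routing_id,
        prompt_sha256=sha256_hex(prompt),
        output_sha256=sha256_hex(output),
    )


# Usage: only a model label is known; instructions and routing stay blank.
record = make_record(
    prompt="Summarise the Q3 risk register.",
    output="Draft summary text.",
    model_version="example-model-2024-03",  # hypothetical label, not a real model
)
print(record.undisclosed_fields())
```

In this sketch the gaps are stored as gaps. A record whose instruction and routing fields are empty does not make the output reproducible; it only makes visible what could not be known on the day the work was produced.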

The organisation that has committed to AI-assisted governance now operates with a tool that cannot demonstrate consistency over time. It can demonstrate output. It cannot demonstrate that the output would have been the same last month, or that it will be the same next month. In regulatory environments that require demonstration, this is not a minor limitation. It is a structural one that the current governance frameworks were not built to address.
