This Looks LLM-Shaped: When 150 SEC filings shift in ways researchers associate with AI-assisted writing
A corpus-level signature, and what it does and does not let you infer about any single document.
Across 150 SEC 10-K filings from 50 large US public companies, language drift accelerated by 24.5 per cent when enterprise LLM tools became available. Filings shifted towards a broader, less specific register across five dimensions: hedging, specificity, lexical range, named referents, and structural variation. The previous article reported this result.
None of the 50 companies discloses LLM use in drafting. These are documents investors rely on for material decisions, and the corpus shows a pattern that no filing explains.
How can we make a corpus-level claim about LLM influence when no company will name the tools or describe their use?
The answer comes from controlled research where LLM use is known and effects are measured directly. That research describes a specific signature, and the 10-K corpus matches it.
In a 2024 ICLR paper, NYU researchers Vishakh Padmakumar and He He tested whether writing with a language model reduces the diversity of what gets written. Three groups wrote argumentative essays: one unassisted, one using a base model (GPT-3), and one using a feedback-tuned model (InstructGPT). Essays from the feedback-tuned condition showed a statistically significant drop in diversity: writers reused identical 5-word sequences across independently written essays, and the number of distinct argumentative points fell. The base model produced no comparable effect.
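The 5-word reuse measure is simple enough to sketch. The Python below is a simplified illustration and not the paper's exact metric: it assumes lowercased whitespace tokenisation, the function names `ngram_set` and `repeated_ngram_rate` are hypothetical, and the two sample essays are invented. It reports the share of distinct 5-word sequences that turn up in more than one essay.

```python
from collections import Counter


def ngram_set(text, n=5):
    """Distinct n-word sequences in one text (naive whitespace tokens)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def repeated_ngram_rate(essays, n=5):
    """Share of distinct n-grams that occur in more than one essay."""
    essays_containing = Counter()
    for essay in essays:
        # One set per essay, so repetition inside a single essay is not counted.
        essays_containing.update(ngram_set(essay, n))
    if not essays_containing:
        return 0.0
    repeated = sum(1 for count in essays_containing.values() if count > 1)
    return repeated / len(essays_containing)


# Toy input, invented for illustration only.
essays = [
    "the quick brown fox jumps over the lazy dog",
    "a quick brown fox jumps over the fence",
]
rate = repeated_ngram_rate(essays)  # 2 of the 7 distinct 5-grams are shared
```

A higher rate within a group of essays means more verbatim overlap between writers. What matters is the difference in rate between conditions, not the absolute value.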
Decomposing the essays into human and machine components revealed the convergence source. The human portions remained varied while the model's contributions repeated across writers, occupying the same narrow phrase space regardless of who typed. Homogenisation entered through the model's text and travelled into the final essays when writers accepted those suggestions.
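The decomposition can be sketched in the same spirit. The Python below assumes each essay is already split into segments labelled by who produced them; the labels `"human"` and `"model"`, the function name `repeated_rate_by_source`, and the sample text are all hypothetical. It is an illustration of the idea, not the study's procedure: cross-essay repetition is computed separately for each source.

```python
from collections import Counter


def repeated_rate_by_source(essays, n=5):
    """essays: list of essays; each essay is a list of (source, text) segments.

    Returns, per source label, the share of distinct n-grams that occur
    in more than one essay. N-grams never span two segments.
    """
    essays_containing = {}
    for essay in essays:
        seen = {}
        for source, text in essay:
            tokens = text.lower().split()
            grams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
            seen.setdefault(source, set()).update(grams)
        # Count each n-gram once per essay, per source.
        for source, grams in seen.items():
            essays_containing.setdefault(source, Counter()).update(grams)
    return {
        source: (sum(1 for c in counts.values() if c > 1) / len(counts)) if counts else 0.0
        for source, counts in essays_containing.items()
    }


# Invented toy essays: the "model" segments match, the "human" ones differ.
essays = [
    [("human", "my town banned cars last spring"),
     ("model", "this raises important questions about fairness")],
    [("human", "our school tried the same policy"),
     ("model", "this raises important questions about fairness")],
]
rates = repeated_rate_by_source(essays)
```

In this toy case all of the repetition sits in the model-labelled text and none in the human-labelled text, which is the shape of the result described above.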
Feedback-tuning narrows the model's output towards patterns human raters preferred during training. Writers who accept suggestions import this narrower set of options. Convergence emerges at the corpus level because the suggestions occupy a smaller phrase space than writers would reach independently. The constrained vocabulary arrives with the polish; writers absorb it during editing.
The same pattern surfaces in adjacent research. Stanford's Liang and colleagues estimated that 6.5 to 16.9 per cent of the text in peer reviews at four major AI conferences could have been substantially modified by LLMs after ChatGPT's release. Other work measures a sharp post-2022 shift in academic prose towards a smaller set of words and constructions, most pronounced in computer science journals. Studies of LLM-mediated editing find that arguments survive intact, but stylistic differences between authors shrink.
The 10-K finding fits this shape. Drift occurs simultaneously across all five dimensions at low amplitude. Aggregated across the corpus, the movement registers strongly across 50 unrelated drafting environments with no shared editorial process. A sharp shift in one dimension could be sector-specific, a new SEC rule, or a single advisory firm. Movement on all five, across companies sharing no drafter or template, requires a different explanation. The literature predicts this distributed, lower-amplitude convergence when feedback-tuned models enter the editorial chain. The mechanism Padmakumar and He observed applies here. Human-authored material reflects the company's drafters, while model-suggested material drifts towards the same narrow register.
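Why a small movement in each filing can still register strongly in aggregate can be shown with a one-sided sign test. The Python below is a sketch under stated assumptions, not the method behind the 24.5 per cent figure, which this article does not describe. The function name `sign_test_p` is hypothetical, the shift values are invented, and the test treats companies as independent of one another.

```python
from math import comb


def sign_test_p(shifts, expected_sign=1):
    """One-sided sign test.

    Probability of seeing at least this many shifts in the expected
    direction if each company were equally likely to move either way.
    Zero shifts are dropped, as is conventional for a sign test.
    """
    nonzero = [s for s in shifts if s != 0]
    trials = len(nonzero)
    hits = sum(1 for s in nonzero if s * expected_sign > 0)
    # Upper binomial tail with p = 0.5, computed exactly with integers.
    return sum(comb(trials, k) for k in range(hits, trials + 1)) / 2 ** trials


# Invented data, NOT from the 10-K corpus: 50 companies, each moving by a
# tiny amount on one dimension, 35 of them in the expected direction.
shifts = [0.01] * 35 + [-0.01] * 15
p_value = sign_test_p(shifts)
```

The test ignores the size of each shift and looks only at direction, so consistent direction across many independent drafters is what drives the result. A shared external cause, such as a new SEC rule, would break the independence assumption, which is why movement on all five dimensions at once carries more weight than movement on one.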
Residual Logic identifies the structural mechanism by which AI-generated reasoning passes through human editorial layers into final text. Before a human reads the first draft, the model has already chosen which variables to include, in what sequence, and against which alternatives. The editorial layer that follows operates on language; the structure the model chose goes unexamined. Those structural decisions survive because humans lack the time or the brief to reconstruct them.
The 10-K finding is a field instance of this mechanism, observed at corpus scale across 50 companies that made no disclosure. Identifying the specific tools or users is not required for the inference; the signature is the structural residue itself. Across a sample wide enough to rule out sectoral causes, something with the properties of a feedback-tuned LLM has acted inside the drafting process.
Whether this constitutes a disclosure failure under existing securities law is a separate question, and one not yet posed to the 50 companies in this sample.