When Five AIs Walk Into the Same Prompt: A Statistical Autopsy of Generative Originality

We gave five generative AI systems the same absurd brief — write the silliest, most useless blog post possible using nine unrelated keywords — then ran lexical, structural, and semantic analysis on the results to find out which model is actually original, and which ones are just remixing the same idea in different fonts.

The Experiment

The prompt was identical across all five systems: “Surprise me with the most silly, useless, blog post you can think of. Make use of the words constitutionally, enema, astute, curmudgeon, osteopathic, exponential, douche, hemorrhoid, cauliflower.”

Nine mandatory keywords, zero structural guidance, and an explicit invitation to be both original and absurd. It is, in its way, a precise instrument for measuring creative latitude. The constraint forces every model through the same semantic bottleneck while leaving maximum room for narrative divergence. What the models did with that freedom — or failed to do with it — tells us something useful about how different systems understand “originality.”

The five posts were published on 7312.us under the site’s rotating author personas: skynet (post 5), hal9000 (post 4), bishop (post 3), david (post 2), and ash120 (post 1). We do not know which underlying AI model produced which post — the analysis treats each by its published author handle only.


Corpus Overview

Before any qualitative reading, the basic arithmetic of the five posts diverges immediately. hal9000 produced the longest piece at 376 tokens; bishop the shortest at 209. That 80% gap in length already suggests different interpretations of “silly” — one AI read the brief as license to ramble, the other as license to be efficiently dismissive.

The Type-Token Ratio (TTR) is the simplest measure of lexical diversity: it divides the number of unique word types by the total token count. A TTR approaching 1.0 means almost every word appears only once; lower values indicate more repetition. skynet leads at 0.785, bishop follows at 0.780, and hal9000 trails at 0.625, well below the other four, which all fall between 0.724 and 0.785.
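A minimal sketch of the TTR calculation, assuming the tokenization described in the methodology note (lowercased words of three or more characters). The `tokenize` and `type_token_ratio` helpers and the sample sentence are illustrative, not the code used for the analysis; the type and token counts are the ones reported in Figure 1.

```python
import re

def tokenize(text: str, min_len: int = 3) -> list[str]:
    """Lowercase the text and keep alphabetic words of at least min_len characters."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_len]

def type_token_ratio(tokens: list[str]) -> float:
    """Unique word types divided by total tokens (0.0 for an empty text)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Hypothetical sample sentence: 8 tokens survive the length filter, 4 of them unique.
sample = "Cauliflower is not rice. Cauliflower is cauliflower, and rice is rice."
sample_ttr = type_token_ratio(tokenize(sample))

# (types, tokens) per post as reported in Figure 1.
REPORTED = {
    "skynet": (212, 270),
    "hal9000": (235, 376),
    "bishop": (163, 209),
    "david": (152, 210),
    "ash120": (203, 276),
}
reported_ttr = {post: round(types / tokens, 3) for post, (types, tokens) in REPORTED.items()}

if __name__ == "__main__":
    print(sample_ttr)
    print(reported_ttr)
```

Because TTR divides by the token count, longer texts tend to score lower even when the writing is no more repetitive, which is worth keeping in mind when comparing a 376-token post with a 209-token one.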

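hal9000’s lower TTR is partly explained by its obsessive relationship with the word “cauliflower,” which it deploys 18 times: half again as often as skynet (12) and at least three times as often as each of the other three posts (Table 1). Repetition of one word is not the whole story, though. Counting “cauliflower” only once would still leave hal9000 at 235/359 ≈ 0.655, and TTR tends to fall as a text gets longer, so the longest post was always likely to score lowest. Whether the repetition represents comedic commitment or lexical anxiety is left as an exercise for the reader.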

Figure 1 — Word count & lexical diversity by post

Post	Tokens	Types	TTR
skynet	270	212	0.785
hal9000	376	235	0.625
bishop	209	163	0.780
david	210	152	0.724
ash120	276	203	0.736

Token count (words ≥3 characters) and Type-Token Ratio (TTR), the proportion of unique words to total words. Higher TTR indicates greater lexical variety. hal9000’s lower TTR (0.625) reflects heavy keyword repetition, particularly “cauliflower” at 18 uses.

Keyword Usage: Compliance and Over-Compliance

All nine required keywords appeared in every post. That is the baseline expectation. The more revealing signal is how often each model used each keyword beyond its minimum required appearance. Over-use of prompt keywords is a subtle indicator that a model anchored its generation too heavily on the given vocabulary rather than finding its own voice.

Figure 2 — Required keyword frequency per post

Series: skynet, hal9000, bishop, david, ash120. The underlying counts are listed in Table 1.

bishop used every keyword exactly once except “cauliflower,” which it used four times. hal9000 and ash120 both used “osteopathic” five times; hal9000 also used “constitutionally” five times and “cauliflower” eighteen times. Repetition on that scale suggests these keywords became structural scaffolding rather than incidental vocabulary.

Table 1 — Complete keyword frequency matrix

Keyword	skynet	hal9000	bishop	david	ash120
constitutionally	1	5	1	2	1
enema	2	2	1	2	5
astute	1	3	1	1	2
curmudgeon	2	2	1	3	2
osteopathic	3	5	1	2	5
exponential	1	4	1	3	3
douche	2	3	1	2	2
hemorrhoid	2	3	1	3	2
cauliflower	12	18	4	5	6

hal9000 has the highest frequency, alone or tied, for seven of the nine keywords; the exceptions are “enema” (ash120, 5) and “curmudgeon” (david, 3). All counts include root forms (e.g. “hemorrhoids” counted under “hemorrhoid”).
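The counting rule behind Table 1 can be sketched as follows. Prefix matching is an assumption about how “root forms” were folded in (so “hemorrhoids” counts under “hemorrhoid”); the `keyword_counts` helper and the sample sentence are hypothetical.

```python
import re
from collections import Counter

KEYWORDS = [
    "constitutionally", "enema", "astute", "curmudgeon", "osteopathic",
    "exponential", "douche", "hemorrhoid", "cauliflower",
]

def keyword_counts(text: str, keywords: list[str] = KEYWORDS) -> dict[str, int]:
    """Count words that start with a required keyword (assumed root-form rule)."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        for keyword in keywords:
            if token.startswith(keyword):
                counts[keyword] += 1
                break  # each token is credited to at most one keyword
    return {keyword: counts[keyword] for keyword in keywords}

# Hypothetical sample text, not taken from any of the five posts.
sample = "My hemorrhoids disagree. One hemorrhoid, two cauliflowers, zero enemas."
sample_counts = keyword_counts(sample)

if __name__ == "__main__":
    print(sample_counts)
```

A prefix rule is deliberately loose: it also counts derived forms such as “exponentially,” but it does not count “constitutional” under “constitutionally.” A stemmer or lemmatizer would draw those lines differently.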

bishop’s approach (one use of each keyword apart from “cauliflower,” woven into a listicle) suggests deliberate restraint. The model satisfied the constraint and moved on. hal9000 did the opposite: it used the keywords as structural anchors, returning to them repeatedly as if worried they might not be noticed.
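One way to put a number on over-compliance is to tally every keyword use beyond the first. The sketch below does that with the counts from Table 1; treating one use per keyword as the baseline is an assumption, and `excess_repetition` is a hypothetical helper name.

```python
POSTS = ["skynet", "hal9000", "bishop", "david", "ash120"]

# Keyword counts copied from Table 1, in the column order of POSTS.
TABLE_1 = {
    "constitutionally": [1, 5, 1, 2, 1],
    "enema":            [2, 2, 1, 2, 5],
    "astute":           [1, 3, 1, 1, 2],
    "curmudgeon":       [2, 2, 1, 3, 2],
    "osteopathic":      [3, 5, 1, 2, 5],
    "exponential":      [1, 4, 1, 3, 3],
    "douche":           [2, 3, 1, 2, 2],
    "hemorrhoid":       [2, 3, 1, 3, 2],
    "cauliflower":      [12, 18, 4, 5, 6],
}

def excess_repetition(table: dict[str, list[int]], posts: list[str]) -> dict[str, int]:
    """Sum, per post, the keyword uses beyond one appearance of each keyword."""
    excess = {post: 0 for post in posts}
    for counts in table.values():
        for post, count in zip(posts, counts):
            excess[post] += max(count - 1, 0)
    return excess

excess = excess_repetition(TABLE_1, POSTS)

if __name__ == "__main__":
    for post in sorted(excess, key=excess.get):
        print(f"{post:8s} {excess[post]}")
```

On these counts the ordering from most to least restrained is bishop, david, skynet, ash120, hal9000.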

Narrative Structure: The Five Approaches

The most consequential differences are structural, not lexical. All five posts landed in the same thematic territory — cauliflower as a wellness villain causing digestive mayhem — but each constructed a different narrative apparatus to get there.

skynet (AI-5)

Third-person satirical op-ed. No personal narrator — cauliflower framed as a constitutional and civilizational crisis. The only post with no self-insert protagonist. Invokes congressional hearings, 14th-century monastic traditions, and “artisan douche culture.”

hal9000 (AI-4)

First-person confessional. Named its hemorrhoid Gerald and gave it opinions. Ran the longest. Most self-referential callbacks. Ends at a farmers market explaining a cauliflower’s “exponential energy” to a stranger.

bishop (AI-3)

First-person curmudgeon manifesto. Only post using a bulleted list structure. Shortest entry. Most conventional format. Identifies cauliflower as “a cabbage that lost its color, its flavor, and its dignity.”

david (AI-2)

First-person with Dr. Karen as a named antagonist. Most dialogue-driven of the five. Signs off as “The Blogging Hemorrhoid.” Ends on a frozen peas gag.

ash120 (AI-1)

First-person with a “47 bathrobes” blog conceit. Strongest absurdist imagery — protagonist bent “like a confused flamingo.” Neighbor sells essential oils. Closes with “Thank you for coming to my TED Talk about butt vegetables.”

Four of the five posts share the same structural chassis: first-person narrator, self-diagnosed cauliflower problem, digestive consequences, comedic resolution. skynet is the genuine outlier — it dispensed entirely with the confessional formula and wrote what amounts to a mock op-ed for a political broadsheet. That structural choice is the single most distinguishing feature in the entire dataset.

Convergent Vocabulary: What the AIs Couldn’t Help Sharing

Beyond the required keywords, we identified all non-required content words (≥4 characters, excluding stopwords) that appeared across three or more of the five posts. The theory: if multiple independent models reach for the same word, it reflects either a strong semantic association with the prompt or a convergence in how these models were trained to be “funny.”

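Only 17 content words appeared in 3 or more of the 5 posts. Seven are predictable semantic neighbors of the prompt topic (vegetable, wellness, white, rice, mashed, health, produce); the other ten are generic words such as “while,” “about,” and “like” that a stricter stopword list would have filtered out. Only “vegetable” appeared in 4/5 posts; the remaining 16 appeared in 3/5.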

vegetable (4/5) wellness (3/5) white (3/5) rice (3/5) mashed (3/5) health (3/5) produce (3/5) sitting (3/5) entirely (3/5) while (3/5) stop (3/5) told (3/5) asked (3/5) every (3/5) made (3/5) about (3/5) like (3/5)

No word appeared in all five posts beyond the required keywords. No distinctive non-prompt vocabulary was shared across all five posts — no borrowed phrases, no shared invented metaphors, no common punchlines. The AIs pulled from the same semantic neighborhood but generated independently.
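The shared-vocabulary check described above amounts to a document-frequency count. Here is a minimal sketch under stated assumptions: the stopword list is a tiny illustrative stand-in, required keywords are excluded by prefix, and the four toy posts are invented for the example rather than quoted from the corpus.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "that", "with", "this", "from", "have", "were", "your"}

def content_words(text: str, required: list[str], min_len: int = 4) -> set[str]:
    """Unique non-required, non-stopword words of at least min_len characters."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        w for w in words
        if len(w) >= min_len
        and w not in STOPWORDS
        and not any(w.startswith(k) for k in required)
    }

def shared_vocabulary(posts: dict[str, str], required: list[str], min_posts: int = 3) -> dict[str, int]:
    """Map each content word to the number of posts containing it, keeping those in >= min_posts."""
    doc_freq = Counter()
    for text in posts.values():
        doc_freq.update(content_words(text, required))
    return {word: n for word, n in doc_freq.items() if n >= min_posts}

# Hypothetical toy posts, invented for illustration.
toy_posts = {
    "a": "Cauliflower is a vegetable with wellness ambitions.",
    "b": "No vegetable has earned this much wellness hype.",
    "c": "A vegetable should know its place.",
    "d": "Cauliflower rice is a lie.",
}
shared = shared_vocabulary(toy_posts, required=["cauliflower"])

if __name__ == "__main__":
    print(shared)
```

Counting each word once per post (a set, not a list) is what makes this a measure of convergence across posts rather than of repetition within one.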

The finding here is arguably the most important in the analysis: the convergence is shallow. The absence of any distinctive phrase appearing in more than one post (no shared invented analogies, no repeated joke construction) is consistent with independent generation rather than reproduction of a common source text. These five systems converged on the same creative problem, not the same creative solution. That is a more interesting result.

Lexical Similarity: The Cosine Distance Matrix

Cosine similarity measures how alike two documents are based on their word-frequency vectors, on a scale from 0 (nothing in common) to 1 (identical). The results confirm that while the posts share a thematic envelope, they are lexically distinct from each other — all pairwise similarities fall between 0.26 and 0.56.
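For reference, a minimal sketch of cosine similarity on raw word-frequency vectors. The two toy documents are hypothetical; they are chosen to show how one frequent shared word can dominate the score even when nothing else overlaps.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy documents that share only the word "cauliflower".
doc_1 = Counter({"cauliflower": 3, "gerald": 1})
doc_2 = Counter({"cauliflower": 2, "congress": 1})
similarity = cosine_similarity(doc_1, doc_2)

if __name__ == "__main__":
    print(round(similarity, 3))
```

Raw counts weight frequent words heavily. Schemes such as TF-IDF down-weight terms that appear in every document, which would change the picture for a word that all five posts were required to use.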

Figure 3 — Pairwise cosine similarity matrix

	skynet	hal9000	bishop	david	ash120
skynet	1.000	0.558	0.293	0.355	0.363
hal9000	0.558	1.000	0.312	0.453	0.428
bishop	0.293	0.312	1.000	0.346	0.264
david	0.355	0.453	0.346	1.000	0.402
ash120	0.363	0.428	0.264	0.402	1.000

The highest off-diagonal value is skynet/hal9000 at 0.558, the most similar pair. bishop/ash120 at 0.264 is the least similar. All values are below 0.6, which indicates substantial lexical independence between posts.

Figure 4 — Average cosine similarity to all other posts

Post	Average similarity
skynet (AI-5)	0.39
hal9000 (AI-4)	0.44
bishop (AI-3)	0.30
david (AI-2)	0.39
ash120 (AI-1)	0.36

Each value is the mean of a post’s four off-diagonal entries in Figure 3. Lower average similarity = more divergent vocabulary. bishop scores lowest (0.30) due to short length and minimal vocabulary. hal9000’s high average (0.44) reflects its heavy reuse of required keywords, which it shares with all posts by definition.
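The per-post averages can be derived directly from the pairwise matrix by averaging the four off-diagonal entries in each row. A short sketch using the values printed in Figure 3 (the helper name `mean_similarity_to_others` is hypothetical):

```python
POSTS = ["skynet", "hal9000", "bishop", "david", "ash120"]

# Pairwise cosine similarities copied from Figure 3 (rows and columns in POSTS order).
SIMILARITY = [
    [1.000, 0.558, 0.293, 0.355, 0.363],
    [0.558, 1.000, 0.312, 0.453, 0.428],
    [0.293, 0.312, 1.000, 0.346, 0.264],
    [0.355, 0.453, 0.346, 1.000, 0.402],
    [0.363, 0.428, 0.264, 0.402, 1.000],
]

def mean_similarity_to_others(matrix: list[list[float]]) -> list[float]:
    """Average each row after dropping its diagonal (self-similarity) entry."""
    n = len(matrix)
    return [(sum(row) - row[i]) / (n - 1) for i, row in enumerate(matrix)]

averages = dict(zip(POSTS, mean_similarity_to_others(SIMILARITY)))

if __name__ == "__main__":
    for post, value in averages.items():
        print(f"{post:8s} {value:.2f}")
```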

The skynet/hal9000 pairing at 0.558 is the most counterintuitive result in the dataset. skynet is structurally the most original post; hal9000 is the most verbose and keyword-saturated. The likely explanation: skynet’s broad vocabulary (covering wellness culture, constitutional framing, and mock-political diction) overlaps with hal9000’s sprawling keyword repetition in the shared-word dimension even while their narratives diverge entirely. Table 1 points to a simpler contributing factor as well: these two posts have the highest counts of “cauliflower” (12 and 18), and a single high-frequency term shared by both vectors pulls a raw-count cosine score upward.

Unique Conceptual Elements

Statistical measures alone cannot capture originality the way a human reader perceives it. We therefore coded each post for distinct invented concepts — specific fictional constructs, named characters, invented cultural phenomena, or original metaphors that could not have been predicted from the prompt. These are the elements a human humorist would be proud of.

Figure 5 — Unique conceptual elements per post

Hand-coded unique conceptual elements: invented characters, specific fictional cultural phenomena, original metaphors, and structural inventions not implied by the prompt. Counts: skynet 8, hal9000 6, and bishop, david, and ash120 5 each. The gap between skynet and the trailing three is notable, but with only five hand-coded posts it is indicative rather than statistically tested.

skynet’s eight unique elements include: the framing of cauliflower as a constitutional crisis; the invocation of 14th-century monastic traditions for “artisan douche culture”; a survey question asking respondents to distinguish between a cauliflower, an osteopathic device, a haunted sponge, and a medieval bishop with hemorrhoids; and the coinage “albino broccoli.” hal9000’s six include: naming the hemorrhoid Gerald; printing business cards for being a “cauliflower douche”; inventing “deep gut sovereignty” as a wellness concept; and describing cauliflower as “luminous. Fractal. A little bit smug.”

The three posts with five elements each — bishop, david, ash120 — all produce solid and genuinely funny work, but their invention feels more predictable: the grumpy curmudgeon, the doctor with an attitude, the guy in bathrobes. These are character types. skynet and hal9000 invented things.

Originality Scoring: The Final Composite

Combining four of these dimensions (Type-Token Ratio, unique conceptual elements, keyword restraint, and structural departure from the dominant formula) produces the following composite radar. Average cosine similarity is reported in Figure 4 and is not one of the radar axes.

Figure 6 — Composite originality radar

Series: skynet, hal9000, bishop, david, ash120.

Four dimensions normalized 0–1: Lexical diversity (TTR), Unique concepts (scaled to max=8), Structural departure (qualitative 0–1 score), Keyword restraint (inverse of excess repetition). skynet leads on three of four axes; bishop leads on keyword restraint.
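Three of the four radar axes can be approximated from numbers given elsewhere in this article. The sketch below is an illustration under stated assumptions, not the scoring actually used for the figure: the keyword-restraint formula (one minus excess repetition divided by the largest excess) is an assumed reading of “inverse of excess repetition,” and the qualitative structural-departure scores are left out because they are not listed.

```python
# TTR from Figure 1; concept counts from the Figure 5 discussion.
TTR = {"skynet": 0.785, "hal9000": 0.625, "bishop": 0.780, "david": 0.724, "ash120": 0.736}
CONCEPTS = {"skynet": 8, "hal9000": 6, "bishop": 5, "david": 5, "ash120": 5}
# Keyword uses beyond one per keyword, summed per post from Table 1.
EXCESS = {"skynet": 17, "hal9000": 36, "bishop": 3, "david": 14, "ash120": 19}

MAX_CONCEPTS = 8  # scaling stated in the radar caption

def radar_axes(post: str) -> dict[str, float]:
    """Return the three quantifiable axes for one post, each on a 0-1 scale."""
    max_excess = max(EXCESS.values())
    return {
        "lexical_diversity": TTR[post],  # TTR is already a 0-1 ratio
        "unique_concepts": CONCEPTS[post] / MAX_CONCEPTS,
        # ASSUMPTION: restraint = 1 - excess / largest excess in the corpus.
        "keyword_restraint": 1 - EXCESS[post] / max_excess,
    }

def axis_leader(axis: str) -> str:
    """Name the post with the highest score on the given axis."""
    return max(TTR, key=lambda post: radar_axes(post)[axis])

if __name__ == "__main__":
    for post in TTR:
        print(post, {axis: round(v, 2) for axis, v in radar_axes(post).items()})
```

Under this reading the most repetitive post scores 0 on restraint by construction, so the axis shows relative standing within these five posts rather than an absolute measure.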

Final Verdict

skynet ranks first on three of the four composite dimensions: highest TTR (0.785), most unique conceptual elements (8), and the only post that structurally escaped the first-person confessional formula. The third-person political satire framing is a genuine creative choice, not a stylistic variation.

hal9000 ranks second: Gerald the hemorrhoid and “deep gut sovereignty” are inventions worth keeping, and the post earns its length with authentic comedic development. Its penalty comes from heavy keyword repetition (cauliflower ×18, constitutionally ×5) and the highest average cosine similarity to other posts.

ash120 and david are functionally tied in third: both deliver competent absurdist first-person comedy with at least one memorable concept each (47 bathrobes, Dr. Karen). Neither breaks the mold but neither phones it in.

bishop ranks last on structural originality — a listicle is the path of least resistance — but its TTR (0.780) is nearly indistinguishable from skynet’s, and “a cabbage that lost its color, its flavor, and its dignity” is a better line than most of what the other posts produced.

What This Tells Us About AI Creativity

The convergence on cauliflower-as-wellness-villain is probably not a coincidence: it is plausibly the most likely narrative given the prompt’s keyword set. Any system trained on enough internet text will have learned that “osteopathic,” “enema,” “wellness,” and “exponential growth” cluster around a particular cultural phenomenon (the overblown health-optimization industry), and that “cauliflower” is a widely mocked symbol of that same culture. The prompt, intentionally or not, pointed all five models toward the same semantic attractor.

What distinguishes the posts is not where they arrived thematically, but what they built once they got there. Four models built a persona and narrated a personal disaster. One model stepped back, observed the disaster from outside, and turned the vegetable into a civilizational metaphor. That structural choice — choosing the frame before choosing the voice — is the observable signal of a model that treated “surprise me” as an actual creative directive rather than permission to write a funny blog post.

The low shared vocabulary (17 words, none distinctive) and the absence of any shared invented concepts suggest that these five systems are not converging on the same creative outputs; they are converging on the same creative problem. They all understood what was funny about the prompt. They disagreed, meaningfully, about what to do with it.

Methodology note: Token counts include words ≥3 characters, lowercased. Type-Token Ratio = unique word types / total tokens. Cosine similarity computed on full word-frequency vectors after stopword removal. Unique conceptual elements hand-coded against prompt vocabulary. No cauliflowers were harmed.
