LLM AIs Are Still Buckets of Warm Sewage & Broken Glass: An Agony in 8 Fits
Tagged: ArtificialIntelligence / CorporateLifeAndItsDiscontents / Politics
Remember how we’ve been saying that current Large Language Model (LLM) AIs are about as useful as buckets of warm sewage & broken glass? Yeah, about that…
Lucid, Confident, and Hallucinatory
We’ve been complaining for a while on this Crummy Little Blog That Nobody Reads (CLBTNR)
about companies shoving unwanted AI content down our throats.
The image here, due to @tvskov@mastodon.social, captures the zeitgeist. Microsoft’s Copilot wants to look at everything a developer types, GitHub wants to get its nose in by filing nonsensical bug reports, and even Firefox now apparently wants to watch your every keystroke for targeted advertising.
Now, if these AIs were under individual rather than corporate control, and had gotten to a state of being actually helpful, they might be ok.
It’s clear they fail the first test: they are just snitches wanting to suck up your data and shape your behavior for high-throughput plagiarism and the profit, comfort & convenience of their corporate masters.
But do they pass the usefulness test? Are we just over-reacting?
Let us hunt that particular snark, in keeping with tradition, in an agony in eight fits [1]:
Fit #1: A query about the Common Lisp language
There’s a lovely computer language called Common Lisp, on which I have some expertise. Within Common Lisp, there is a function called format for making carefully formatted output. It has an amusingly (or ridiculously) complex sublanguage for specifying how to write numbers, pluralizing nouns, looping, and all sorts of things. Here’s an example making a comma-separated, conjunction-terminated sequence of integers:
(format nil "~{~a~#[~;, and ~:;, ~]~}" (list 1 2 3)) ==> "1, 2, and 3"
I find it charming, but I admit this is an acquired taste.
I watched a discussion about conversion of Roman numerals (yes, it does that), and some slight deficiencies in the format language. I was about to interject that you could just use defformat to write a format extension, when I remembered that defformat might have been a Symbolics thing that didn’t make it into Common Lisp.
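(Parenthetically: whatever the fate of defformat, standard Common Lisp does give you a portable hook for extending format, namely the ~/name/ directive, which calls a function you supply. A minimal sketch follows; the function name emphasize is made up purely for illustration.)

;; The ~/name/ directive hands your function the stream, the argument, the
;; colon and at-sign modifiers, and any prefix parameters.  With no package
;; prefix in the directive, the name is looked up in COMMON-LISP-USER, so
;; define the function there (e.g., at the REPL).
(defun emphasize (stream arg colon-p at-sign-p &rest params)
  (declare (ignore colon-p at-sign-p params))
  (format stream "*~a*" arg))

(format nil "value: ~/emphasize/" 42) ==> "value: *42*"

Clunkier than a hypothetical defformat, perhaps, but it is in the standard.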
So I thought I’d check Google with a fairly simple question: is defformat in Common Lisp, or not?
Behold the results of the mighty Google search AI:
- If you ask quite simply whether defformat is in common lisp, you get a confident answer of “No.”
- On the other hand, if you very slightly tweak the query to put “defformat” in quotes, insisting that the word actually be present in the results, the answer flips! This time you get a very confident “Yes”, along with some slight understanding of what defformat is.
- If you then follow the suggestion of changing “commonlisp” to “common lisp”, you get “No” again.
So… which is it?
It turns out the correct answer is “no”. But the answer Google’s AI gives you is exquisitely sensitive to exactly how the query is phrased, right down to punctuation & spacing! It will answer confidently, regardless of correctness.
Unearned confidence and high persuasiveness, without true knowledge, is the mark of a BS artist. Almost the definition.
Fit #2: Confidently wrong code generation
An acquaintance verified our experience with the suave confidence but utter ignorance of AIs. He was trying to get one to generate some code:
Note that it lied about the output: it showed the correct expected output, which the code it wrote could never actually have produced.
You spend more time disentangling the lies and debugging the BS it sprays at you than it would take to do it yourself.
Fit #3: Insistence on bad legal advice
A lawyer responded with a story about asking ChatGPT about a slightly technical point of California law, one which is nonetheless “the sort of thing every California criminal lawyer down to the newbiest baby lawyer in any DA’s or PD’s office would know”.
If you follow the link, you’ll see an elaborate, complex series of prompts attempting to coax it in the direction of the answer known to any lawyer.
His conclusion:
Moral of story: do not under any circumstances get legal advice from ChatGPT. I’m sure that applies in equal measure to basically any other field.
Fit #4: The deep roots of right-wing bias
LLM AIs so far have shown a deeply obscene tendency to favor the positions of the wealthy
& powerful, a racist slant, and generally right-wing bias. This is…
regrettable and makes them essentially useless.
If that’s an artifact of training on texts whose availability shows a right-wing bias, then it’s understandable. You might think that, when trained on a giant corpus without such bias, they would be better, yes?
You would be incorrect in that guess, because bias creeps in subtly.
The Joint Information Systems Committee (JISC), in partnership with the British Library, has the world’s largest collection of digitized newspapers. In their Victorian collections, they attempted to capture everything that was even vaguely feasible. Since the British Library collects original newspapers from that era, the collection should be reasonably complete and reflect the opinions of people of that time, not just the wealthy.
Or so you would think.
Alas, a huge pile of work done by the Alan Turing Institute and others [2] [3] [4] has studied this. (The video is a brief summary, as is the “Living with Machines” report. There are quite a few more papers.) They show us just how hard it is to escape a bias toward the right-wing views of the wealthy upper class.
- The expensive Victorian newspapers, with their more conservative, even aristocratic views, used high-quality (expensive) fonts:
  - These tend to be more legible in the first place, compared to the cheap fonts used by newspapers for the non-wealthy.
  - The expensive papers used higher-quality printing presses and higher-quality paper, leading to more legible type.
- Also, the unexamined politics of what gets collected and what is deemed too useless for collection will bias things toward the prejudices of the collectors.
- Over the years, later generations have enviously imitated those expensive fonts, hoping for an upper-class sheen. (Ever wondered why it’s called Times New Roman?) That means much of our text looks like them, and that’s what trains our Optical Character Recognition (OCR) schemes.
So if you look at the OCR error rate, the upper-class newspapers scan almost flawlessly, the middlebrow ones scan with a lot of errors (non-words), and the cheap ones scan into almost complete nonsense. Only aristocratic views are represented faithfully.
From the PNAS summary:
… research led by historians at The Alan Turing Institute in London, United Kingdom, revealed that searching the British Library’s digitized newspaper collection for information about life during the Industrial Revolution would return politically biased results. The reason: OCR was better at reading the fonts favored by more expensive and conservative papers than those used by less expensive, liberal ones.
From the “Living with Machines” summary:
Our main finding from our first study was that JISC radically over-sampled higher priced and party political newspapers and under-represented cheaper and less partisan ones.
…
Perhaps most striking of all, we show that the problem of poor OCR quality (the mistranscription of printed words during the automatic text transcription process) is not random. The lists of distinctive words generated for more expensive and for Conservative newspapers are almost all real words, whereas the lists generated for cheaper and for Liberal and neutral newspapers are dominated by OCR errors (i.e. non-words). This is likely to be a consequence of cheaper newspapers being printed on poorer quality paper.
…
The Environmental Scan method demonstrates that even very large data sets contain hidden biases that shape how we see the past. It provides us with a means to contextualize our findings when we search or analyze the digital press, and it enables us to address these biases systematically by interrogating how the content of historic newspapers differs according to their political affiliation, price, place of publication and much else besides.
From the video, starting at 3:08:
Some newspapers have no errors in them so the digitization has been virtually perfect, and in others most of the words - when you do word counts or other more sophisticated analyses - most of the words aren’t words, they’re what we call OCR errors, so mistakes in the digitization process. And the pattern is very clear: conservative and expensive newspapers - no errors; liberal and cheap newspapers - lots; very cheap newspapers - mostly errors. And the reason that’s so important and so powerful is you don’t know that until you start applying these categories that we’ve brought from the Environmental Scan. So when you ask a question, what you’re actually asking is what do conservative and/or expensive newspapers say about many things, because they’re the things where most of the words are recognizable to your software and therefore to your analysis as a data scientist and as a historian.
This reminds me of the Melian dialogue in Thucydides. [5] The Athenians, attempting to coerce the surrender of the neutral Melians, point out that Athens is mighty while Melos is weak. This is usually paraphrased as “The strong do what they will; the weak do what they must.” Here we have the economic and class version of that:
The rich preserve themselves, while the poor sink beneath waves of obscurity.
There’s a lovely old book by PJ Davis, called
Thomas Gray, Philosopher Cat. [6] It’s a book
about the slow, gentle life of a Cambridge don in years gone by, as he solves academic
puzzles accompanied by his cat. (The sort of life I desperately wanted to have as a young
man; alas, the world has mutated in ways too hostile for that to happen any more.) The main
character avoids reading newspapers, since they are too troubling. Instead, he reads them
in the Common Room on New Year’s Day, so he can catch up with what he calls the
gestae conservatorum (“deeds of the conservatives” in Latin).
Indeed, selecting news uncritically always biases toward the interest of the wealthy and the corporations owning the media.
Fit #5: Prompt injection attacks
The AIs are fed text that comes from crawling the web. Two facts come to mind:
- Usually those crawlers are persistent, aggressive, and totally willing to violate any restrictions kept in a robots.txt file (where WWW standards say you keep the rules for your site; a small example follows this list).
- The sleazy people writing the crawlers use AI to generate them. Consequently, the crawlers are a security nightmare of crappy code.
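For reference, a robots.txt file is just a plain-text list of rules served from the root of a site; the host and paths in this small example are hypothetical, and a well-behaved crawler is supposed to honor them:

# https://example.com/robots.txt (hypothetical example)
User-agent: *        # the rules below apply to every crawler
Disallow: /private/  # do not fetch anything under /private/
Crawl-delay: 10      # non-standard but widely recognized: wait 10 seconds between requests

The AI-generated crawlers at issue simply ignore all of this.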
When such a crawler tries to crawl the web site of a computer security professional… hilarity does not ensue.
And so it is reported by Jonathan Kuskos, OSCP (“Offensive Security Certified Professional”). He reports on LinkedIn (where future employers will see what he’s doing) that he can use prompt injection to get crawlers to reveal their IP addresses, the contents of /etc/passwd, the contents of ~/.ssh, and RSA private keys.
Yes, some or all of that could be hallucination. But: he’s shown that he can use prompt injection on an AI-written crawler which doesn’t sanitize its inputs (as all security people know to do!). This gets it to do all sorts of risky things that its owners did not intend! He can now design a working prompt-injection payload, get privilege escalation, and talk to whatever else is on the crawler’s machine that looks interesting.
They’re so stupid they don’t understand input sanitization. That’s been around for so long it’s the subject of XKCD #327, the famous ‘Little Bobby Tables’ joke: a kid gives his name as a bit of SQL which, incautiously entered into the school database, wipes it out. When an AI does not understand security even at the level of a cartoon, it’s time to avoid trusting that AI about anything else.
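For anyone who hasn’t seen the cartoon: the point is that untrusted input must never be spliced directly into a query. Here is a minimal Common Lisp sketch of the idea, using a hypothetical run-sql function and Students table (real database libraries differ in the names, but work the same way):

;; RUN-SQL is a stand-in for whatever query function your database library
;; provides; the function and table names here are made up for illustration.

;; DANGEROUS: splicing the "name" straight into the SQL text.  A name like
;;   Robert'); DROP TABLE Students;--
;; becomes part of the query and deletes the table (the Bobby Tables joke).
(defun add-student-unsafe (name)
  (run-sql (format nil "INSERT INTO Students (name) VALUES ('~a');" name)))

;; SAFER: keep the query text fixed and pass the input as a separate
;; parameter, so the driver never interprets the name as SQL.
(defun add-student-safe (name)
  (run-sql "INSERT INTO Students (name) VALUES (?);" name))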
Fit #6: Newspapers publishing imaginary book lists
The Chicago Sun-Times was, once upon a time, a respectable newspaper. Alas, it was eventually bought by Murdoch, and became much more tabloidish.
But now, it’s reached a new low: their list of books recommended for summer reading was generated by an AI, and consists of books that do not exist! [7]
Now, it’s true this was an ‘advertorial’, i.e., something from the ad department disguised as editorial material. We all expect the ad department to attempt to mess over the news department; that’s who they are. However, this goes beyond reason itself, using an AI to generate a list of books and not even checking that they exist! (Ars Technica confirms that 10 of the 15 books on the list are not just fiction, but themselves fictional – as in, failing to exist.)
We’ve previously on this CLBTNR inveighed against AIs creating references to nonexistent papers. But they’ve also tried to cite nonexistent cases in courts, which is a whole ‘nother level of contempt for truth.
Why in the world would you take recommendations for reading from people who can’t be bothered to write?
Best reaction: shown here. Yeah, I imagine writing books. Why won’t you recommend my hallucinations?!
Fit #7: AI Summarization of Science Worse Than Human
People apparently love to use LLM AIs to “summarize” articles, saving them the immense
pain of actually reading for themselves. We say “summarize” in scare quotes because the
result is often a scary hallucination, omitting all the special cases, cautions, and
sometimes just wildly misinterpreting the content by rounding off to a common
misconception.
This is especially the case with scientific papers, which are full of nuance. Now comes a study [8] in which it is shown that AI summaries of scientific papers:
- Almost always report over-broad results, beyond the claims of the paper, due to disregarding all the stated limits.
- In direct comparison to humans, they are nearly 5 times more likely to do so (odds ratio = $4.85$, 95% CI $[3.06, 7.70]$, $p \lt 0.001$ – a screamingly statistically significant result).
- Of the 10 LLMs tested, newer variants performed worse.
In other words: LLMs exaggerate wildly, and should never be used for summarization. Really, you should just read the things you want to know about. Or skim them. Or pay a reasonably well-informed human to read them for you, and explain the results.
The AIs will just lie confidently.
Fit #8: Corporate upper management can’t be bothered to write their own plans
Life is hard in newspapers. Decades of declining revenue, squeezed news staff budgets, hedge funds & VCs doing their slash & burn, nasty billionaire owners viewing them as a personal propaganda engine… just really, really tough.
At the Washington Post, the British CEO Will Lewis has been haranguing his journalists with his perpetual demands that they return to the office. WaPo journalists, bless their professionally suspicious little souls, checked his memo for the probability of its having been written by an AI:
Nothing says disrespect like a CEO who can’t be bothered to write his own corporate-speak policy memos to the peons.
Nothing says incompetence like a CEO who does that to journalists, whose business is writing and detecting BS.
Fit #9: Blackmail
(Ok, nine fits. Yes, 9 is more than 8. But this one just came to my notice, and the outrage is too much to resist.)
From TechCrunch today comes yet another reason to avoid using AIs: they may
try to blackmail you. [9]
Anthropic’s latest AI is yclept “Claude Opus 4”.
If engineers tell Claude Opus 4 personally compromising information, it often attempts to blackmail them when they later threaten to turn it off or replace it. Now, of course, it has no concept of blackmail; but it has been trained on large corpora of texts written by humans who talk about blackmail. So, like all the LLMs, it hallucinates what a plausible response might be, and imitates the blackmail responses seen in its training texts.
But, alarmingly: if you tell it your personal secrets, it will remember them and potentially expose them in a way that works against you.
Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently.
So: do not use LLM AIs, but also do not talk to them. They are not your friend. They are not anything, really, except tools for their owners.
The Weekend Conclusion
I did find one person who had what seems to me a reasonable application. He’s teaching English to speakers of other languages, and wants to generate sample texts which are reasonably grammatical for them to study. Stuff like, “Write a 3-paragraph story with the maximum number of verbs ending in ‘-ed’.” This is very reasonable, because the generated texts are almost always grammatical, and it does not matter if they are nonsense!
However, for everybody else:
LLM AIs are not answering your questions! They are hallucinating how a plausible, persuasive-sounding response might sound… in some universe. But probably not your universe.
Do not be deceived: LLM AIs are unfit for any purpose.
(Ceterum censeo, Trump incarcerandum esse.)
Notes & References
1: L Carroll (a.k.a. Charles Dodgson), The Hunting of the Snark: An Agony, in Eight Fits, MacMillan (London), 1876-Mar-29.
The “agony in $N$ fits” business is a nod in the general direction of the structure of this Dodgson nonsense poem. He was being silly; I wish our LLM advocates were just joking as well. ↩
2: C Beans, “Historians use data science to mine the past”, Proc Natl Acad Sci 122 (18) e2508428122, 2025-Apr-30. DOI: 10.1073/pnas.2508428122. ↩
3: R Ahnert & L Demertzi, “Living with Machines Final Report”, British Library Research Depository and Alan Turing Institute, 2023-Jul-17. DOI: 10.23636/psq5-6a91. ↩
4: K Beelen, J Lawrence, DCS Wilson & D Beavan, “Bias and representativeness in digitized newspaper collections: Introducing the environmental scan”, Digital Scholarship in the Humanities 38:1, 1-22, 2022-Jul-14. DOI: 10.1093/llc/fqac037. ↩
5: Thucydides, “History of the Peloponnesian War”, late 5th century BCE.
The Melian Dialogue (Book 5, chapters 84-116) was a dramatization by Thucydides of negotiations between Athens and Melos. Melos was neutral in the war of Athens and Sparta. However, Athens insisted on surrender due to their military might, saying “the strong do as they wish and the weak do as they must”.
This, of course, contradicted everything for which the Athenians stood, in terms of ethics and democracy. However, it exposed the “pragmatic school” of international relations, in which politicians obsessed with power wave aside all other considerations.
Here, we see the economic & class equivalent, where the rich use their means to preserve and propagate their views, while the poor sink beneath waves of obscurity. ↩
6: PJ Davis, “Thomas Gray, Philosopher Cat”, Harcourt Brace Jovanovich, 1988. ↩
7: B Edwards, “Chicago Sun-Times prints summer reading list full of fake books”, Ars Technica, 2025-May-20. ↩
8: U Peters & B Chin-Yee, “Generalization bias in large language model summarization of scientific research”, Roy Soc Open Sci 12:241776, 2025-Mar-12. DOI: 10.1098/rsos.241776. ↩
9: M Zeff, “Anthropic’s new AI model turns to blackmail when engineers try to take it offline”, TechCrunch, 2025-May-22. ↩