Wed 2023-Jun-14

On using ChatGPT for Peer Review

Tagged: ArtificialIntelligence / NotableAndQuotable / Sadness / SomebodyAskedMe / Statistics / ϜΤΦ

Somebody submitted a paper to a journal. The journal sent it out for peer review. A reviewer, skimping on their job, used ChatGPT to write the review. This went about as well as you probably think.

How Publishing an Academic Paper Works (or, Is Supposed To Work!)

Let’s look at the way things are supposed to work (and often do work, very approximately):

  • You, after many long years, finally codify your Grand Unified Theory of Pastry into a simple 10-pager that lets mundanes in on your particular gnosis.
  • You send this off to a prominent journal in your field, the Journal of Post-Modern Psycho-Ceramics.
  • The editor of Jnl PoMo P-C, the eminent Josiah Stinkney Carberry, gives it a quick once-over and thinks it might be worth looking into. So your paper goes into their tracking system and then out to a panel of peer reviewers.
  • The peer reviewers are a panel of several other scholars in your field, just as crazy as you. They each independently, and ideally without knowing you are the author, review the paper. They assess it:
    • Does this even make sense?
    • Is it right, or at least very likely right?
    • Is it timely, i.e., solving a problem about which other scholars even care?
    • Is it reasonably clear?
    • Is it relevant to the readership of Jnl PoMo P-C?
  • If that comes up mostly “yes”, then your paper is published. Done!
  • Otherwise, the editor guesses whether there’s any hope of making it better:
    • If not, your paper is finally rejected, with the unspoken suggestion that you never darken their door ever again.
    • If it’s kinda ok, but not on topic, they may suggest a different journal for you to try, or just tell you to find one yourself. Beware if they suggest the Journal of Irreproducible Results.
    • If it’s kinda ok but just needs some improvement, you get it back with “suggested” changes, some of which come from the reviewers.
  • You look the situation over, and either give up or “revise & resubmit.”

This is supposed to result in scientific literature that is up-to-date, accurate, understandable, and relevant. You can have various opinions about this.

Today’s example

Now, you might understand that here at Château Weekend we have a somewhat jaundiced view of these Large Language Model gizmos, and their tendency to hallucinate, resulting in convincing, bogus text. [1] (Best take heard since then: even when they’re right, it’s by accident since they have no notion of truth. So they’re always hallucinating, nonstop.)

Today’s example comes from a LinkedIn post by an academic named Robin Bauwens [2]:

Bauwens @ LinkedIn: Abuse of GPT in peer review

Yeesh… let’s unpack this. So Bauwens submitted a paper, which the journal sent out for peer review.

So far, so good.

But one of the reviewers rejected the paper, “suggesting” Bauwens familiarize himself with several other papers before trying again. This can be meant helpfully, or it can be snarky as in “You’re too ignorant to publish here, so let me send you on a wild-goose chase by way of tutorial.”

In this case, it was even worse: the references, of course, did not exist! This is exactly what we documented before, with regard to hallucinated references. Since then, we’ve heard from academics frustrated with people requesting papers they never wrote. One particularly clever wag pointed out that given how LLMs work, hallucinations are normal and it would be astounding if they came up with real references!

So, suspicious that this might be what the reviewer did, Bauwens wisely checked. The suggested tutorial references were all fabrications of GPT-2.
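For the curious, here’s a minimal sketch of the kind of check Bauwens could do by hand: query the (real) Crossref REST API for each suspicious citation and see whether anything plausible comes back. The example citation and the score threshold below are made up for illustration; a low relevance score or an empty result is a hint, not proof, that a reference was fabricated.

```python
# Minimal sketch: sanity-check suspicious citations against the Crossref REST API.
# The citation text and threshold here are hypothetical placeholders, not the
# actual references from the review in question.
import requests

def reference_exists(citation: str, threshold: float = 60.0) -> bool:
    """Return True if Crossref finds a plausibly matching published work."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    # Crossref attaches a relevance score to each hit; no hits, or a very low
    # score, suggests the citation may not correspond to any real paper.
    return bool(items) and items[0].get("score", 0.0) >= threshold

suspect_citations = [
    "Doe, J. (2021). A Grand Unified Theory of Pastry. Jnl of Post-Modern Psycho-Ceramics.",
]
for c in suspect_citations:
    print(("found" if reference_exists(c) else "NOT FOUND"), "-", c)
```

None of this replaces reading the papers, of course; it just flags citations that are worth a closer, human look.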

The reviewer had, in fact, not reviewed the paper. He’d fed it to something like ChatGPT and asked it to write a rejection letter!

Ever-so-slightly gratifyingly, Bauwens reports that when he showed this to the journal editor, that aberrant peer reviewer was dropped:

Bauwens @ LinkedIn: Cheating reviewer dropped

It’s actually too bad the cheating reviewer wasn’t outed by name, for public shaming. Academics (a) should know better about LLMs, given their students are faking homework with them, and (b) should fulfill their review duties honestly in the first place.

The Weekend Conclusion

Look, just don’t use ChatGPT or other LLM engines for anything serious! They’re absolutely great for playing around, or for generating short texts that you’re going to fact-check at the level of each and every word, but nothing else.

Consult your cat, who probably has some excellent, if sarcastic, advice on this subject. Or you could read what my cat, the Weekend Publisher, had to say. [3]


Notes & References

1: Weekend Editor, “On ChatGPT and Its Ilk”, Some Weekend Reading blog, 2023-Feb-15.

2: R Bauwens, “Untitled report of GPT-2 use in peer review”, LinkedIn, 2023-Apr.

3: Weekend Editor, “ChatGPT and Francophone Misbranding”, Some Weekend Reading blog, 2023-Mar-25.
