Wed 2021-Oct-20

State of the Blog at 1 Year

Tagged: About / Statistics / TheDivineMadness / ϜΤΦ

This crummy little blog that nobody reads has been around for a little more than a year. It’s time to look at the numbers and see how we’ve been doing.

It’s been how long?

Fiat blog was on 2020-Jul-01, my first day of retirement. Today is 2021-Oct-20. According to the duration calculator, that is 477 days, inclusive. So we’ve been blogging for:

\[\frac{477 \mbox{ days}}{365.24 \mbox{ days/yr}} = 1.306 \mbox{ yr}\]

(Yeah, I missed the first blogiversary. The line forms to the left for a chance to demand a refund of your blog subscription fee.)
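If you don’t trust a web calculator, the same arithmetic can be sketched in a few lines of shell. This assumes GNU date (the -d option; BSD and macOS date take different flags), and the function names days_inclusive and days_to_years are made up here for illustration:

```shell
#!/bin/sh
# Sketch: inclusive day count between two ISO dates (YYYY-MM-DD).
# Assumes GNU date (the -d flag); BSD/macOS date takes different options.
days_inclusive() {
    start=$(date -u -d "$1" +%s)   # seconds since the epoch at 00:00 UTC
    end=$(date -u -d "$2" +%s)
    echo $(( (end - start) / 86400 + 1 ))   # +1 counts both endpoints
}

# Convert a day count to years, using the same 365.24 days/yr as above.
days_to_years() {
    awk -v d="$1" 'BEGIN { printf "%.3f\n", d / 365.24 }'
}
```

Running days_inclusive 2020-07-01 2021-10-20 prints 477, and days_to_years 477 prints 1.306, matching the numbers above.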

It seems like it’s time for a bit of retrospective introspection, speculation, and haruspication. Or words to that effect.

Using GitHub as Content Management System and GitHub Pages as Host

As you can see from the orange & white “merit badges” at the bottom of each page, this blog is hosted at GitHub. (Also this blog cares so much about HTML & CSS correctness that you can check it for yourself against the canonical HTML, CSS, and hyperlink validators.)

GitHub’s worked out more or less fine for me. If you’re not comfortable with software tools, though, it’s probably not for you and you’d like WordPress better.

For those of you asking, “Why not just use WordPress like everybody else?” Mostly, I wanted to have finer control over things, use the absolute bare minimum of icky, intrusive JavaScript, and be relatively robust against the various WordPress hacks. I’m willing to pay a significant price of time & effort in figuring out how to do lots of things (like how to get comments to work with StaticMan). I haven’t yet used much of that fine control, e.g., to style the front page, but I will in the by-and-by, perhaps imitating the MinimalMistakes theme for Jekyll blogs hosted on GitHub Pages.

One of the amusing side-effects of using GitHub is that there are a number of software tools for examining what’s in the repository, gathering ongoing statistics about it, and generating reports. I’m going to be pretty primitive here and just examine the clone of the repository I have on my laptop, since that’s sufficient for now:

  • All the posts are in GitHub-flavored markdown files in a directory called _posts.
  • All the comments are in yaml files in a directory called _data/comments.

That means I can use even just elementary Unix command-line tools to collect statistics about posts and comments. For example, to count the number of posts:

$ find ./_posts -iname "*.md" -a -type f -a -print | wc -l

I started this blog with the goal to have fun writing; that’s been achieved.

It was specifically not a goal to be monetized or to become an “influencer” with a huge following. I suspect both of those have also been successfully avoided; this is, after all, just a crummy little blog that nobody reads.

Let’s see how the numbers say we’ve been doing.

The Basic Numbers

  • Number of posts:
    $ find ./_posts -iname "*.md" -a -type f -a -print | wc -l
  • Number of comments submitted: This is harder to get programmatically, so I went to the GitHub web UI and got it. (In the future, I should use the API to count pull requests programmatically.)

    GitHub: Number of pull requests (submitted comments)

    Comments work like this: a remote process creates a branch containing your comment, plus a pull request asking me whether to merge your comment or delete it. Since pull requests here aren’t used for anything else, the number of pull requests is the number of comments submitted. Go to the Pull requests tab, set the filter to just is:pr, and see that there have been 281 pull requests submitted since the fiat blog event (all currently closed).

  • Number of accepted comments: There are several possibly interesting numbers here: the total number of comments that made it past moderation, the ones from people who are not me, and the ones from me (generally my replies to comments). We are also interested in the number of unique outside commentators (and who they are, but I’m not publishing that).
    • Total accepted comments:
      $ find ./_data/comments/ -iname "*.yml" -a -type f -a -exec grep "^name: .*" \{\} \; | wc -l
    • Total accepted comments not from me:
      $ find ./_data/comments/ -iname "*.yml" -a -type f -a -exec grep "^name: .*" \{\} \; | grep -v "Weekend Editor$" | wc -l
    • Total accepted comments from me:
      $ find ./_data/comments/ -iname "*.yml" -a -type f -a -exec grep "^name: Weekend Editor" \{\} \; | wc -l
    • Total unique commenters: Really only 10 after removing myself and collapsing spelling variations on the names of people I know. (Hi, guys. Good to see you.)
      $ find ./_data/comments/ -iname "*.yml" -a -type f -a -exec grep "^name: .*" \{\} \; | grep -v "Weekend Editor$" | sort | uniq | wc -l
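As for counting pull requests programmatically: here’s a minimal sketch, not something this blog actually runs. It assumes GitHub’s REST search endpoint (/search/issues with an is:pr qualifier, which reports a total_count field), curl on the path, and a placeholder OWNER/REPO in place of the real repository name. The function names are made up, and the JSON is picked apart with grep rather than a real parser, so treat it as a starting point:

```shell
#!/bin/sh
# Sketch only: count pull requests through GitHub's REST search API.
# OWNER/REPO is a placeholder; the real repository name is not given here.

# Pull the integer out of the "total_count" field of a search response
# arriving on stdin. A crude text match avoids depending on jq.
pr_count_from_json() {
    grep -o '"total_count": *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Ask the search endpoint for all PRs (open or closed) in one repository.
# per_page=1 keeps the response small; only total_count is needed.
count_prs() {
    repo="$1"   # e.g. OWNER/REPO (hypothetical)
    curl -s -H "Accept: application/vnd.github+json" \
        "https://api.github.com/search/issues?q=repo:${repo}+is:pr&per_page=1" |
        pr_count_from_json
}
```

Calling count_prs OWNER/REPO should print a single integer, which ought to agree with what the Pull requests tab shows. Unauthenticated requests to the search API are rate-limited, so this is for occasional use only.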

Post Frequency

I started out with a goal to be “weekend reading”, i.e., posting about 1ce/week. I think I’ve achieved that, since the average post frequency has been:

\[\mbox{Post Frequency} = \frac{477 \mbox{ days}}{124 \mbox{ posts}} = 3.85 \mbox{ days/post}\]

… or just a hair under 2 posts/week. (Whether or not they’re quality posts, well… that’s another matter!)
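The same division can be scripted, so it’s easy to rerun next year. A small sketch using awk for the floating-point part; the function name post_frequency is hypothetical, and the two arguments are the day count and post count from above:

```shell
#!/bin/sh
# Sketch: average posting cadence from a day count and a post count.
# Prints days/post and posts/week, each to two decimals.
post_frequency() {
    awk -v days="$1" -v posts="$2" 'BEGIN {
        printf "%.2f days/post\n", days / posts
        printf "%.2f posts/week\n", 7 * posts / days
    }'
}
```

With the current numbers, post_frequency 477 124 prints 3.85 days/post and 1.82 posts/week.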

Spam and Nastygrams

We get a lot of spam here:

  • In the early days, it was people trying to sell “generic Viagra”, i.e., trying to sell illegal drugs on the blog of a retired drug researcher. That’s… special.
  • Then there were people trying to sell term papers for college students. I wonder if they know every professor in the developed world has software to spot that sort of plagiarism? Or if they just thought their dumb student customers didn’t know that, and they were willing to tank their customer’s education? Either way… ick.
  • There was a surprisingly small amount of attempted porn advertising.
  • Then suddenly, almost everything became Russian: invitations to participate in micro-lending, a small bit of icky sex stuff, and a lot of crap about casinos. Even attempts to sell me software to spam comments into other blogs! I mean… who in the world is gonna get involved with a Russian casino over the web, or buy spam software from spammers?!

But they don’t stop trying, especially with some older posts that somehow came to their notice. There are half a dozen posts that collect > 90% of the spam. No idea why.

We’ve also gotten a couple nastygrams, both from the same guy.

He didn’t have anything constructive or even interesting to add, so I blocked them. He just had a head full of the usual conservative claptrap, and wanted to call me names. Not even original names: socialist (yeah, probably… so?), liberal (absolutely), communist (really?) and some vague obscenities. He just wanted to say I’m wrong, dumb, and a bad person who should feel bad. (Look, dude: I’ve had drug-resistant clinical depression my entire life. I already know that.) He wasn’t even being original! Had he been original, I might have accepted the nastygram comment and replied with a thoughtful and helpful critique of his command of invective. But they weren’t even competent insults.

The probability of spam or nastygram is kind of interesting (“PR” = “pull request” = “attempted comment”). The point estimate is:

\[\begin{align*} \mbox{Outside PRs} & = \mbox{PRs} - \mbox{Comments by me} \\ & = 281 - 19 \\ & = 262 \\ \mbox{Spam or Nasty Prob} & = 100\% \times \frac{\mbox{Outside PRs} - \mbox{Outside Comments}}{\mbox{Outside PRs}} \\ & = 100\% \times \frac{262 - 33}{262} \\ & = 87.4\% \end{align*}\]

The 95% credible interval on the spam/nasty probability, using a uniform prior and the resulting Beta posterior, is easy to calculate, too:

> 100.0 * round(qbeta(c(0.025, 0.500, 0.975), 262 - 33 + 1, 33 + 1), digits = 3)
[1] 82.8 87.2 90.9

Thus the Bayesian posterior Beta distribution gives us an estimate of the spam/nastygram probability: median 87.2% (95% credible interval: 82.8% – 90.9%).
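For anyone wondering where the two arguments to qbeta come from, here is a sketch of the standard conjugate update. It assumes each outside PR is an independent trial with the same spam probability p, which is an idealization; the symbols n, k, and p are introduced here just for this derivation:

\[\begin{align*} p & \sim \mbox{Beta}(1, 1) \\ k \mid p & \sim \mbox{Binomial}(n, p), \quad n = 262, \quad k = 262 - 33 = 229 \\ p \mid k & \sim \mbox{Beta}(k + 1, n - k + 1) = \mbox{Beta}(230, 34) \end{align*}\]

The first line is the uniform prior, the second is the likelihood of seeing k spam/nasty PRs out of n, and the third is the posterior whose quantiles qbeta reports. Its mean is 230 / 264, or about 87.1%, a bit below the 87.4% point estimate because the posterior is skewed to the left.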

(I should probably write a script to do all that. And another to collect all the page view counts into a table.)
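A first cut at that script might look like the sketch below. It covers only the point estimate (the Beta quantiles are easier to leave in R), the function name spam_percent is made up, and the three counts are passed in by hand rather than gathered automatically:

```shell
#!/bin/sh
# Sketch: point estimate of the spam/nastygram percentage.
# Arguments: total PRs, comments by me, accepted outside comments.
spam_percent() {
    awk -v prs="$1" -v mine="$2" -v outside="$3" 'BEGIN {
        outside_prs = prs - mine            # PRs not submitted by me
        rejected = outside_prs - outside    # outside PRs that failed moderation
        printf "%.1f\n", 100 * rejected / outside_prs
    }'
}
```

With today’s counts, spam_percent 281 19 33 prints 87.4. The last two arguments could be filled in from the find and grep pipelines in the Basic Numbers section.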

So… yeah, the spam is tiresome and voluminous. Maybe some of you actual readers could comment once in a while, to give me an idea of how the articles go over?

Comment Rate

The comment rate is pretty low:

\[\mbox{Comment Rate} = \frac{33 \mbox{ outside comments}}{124 \mbox{ posts}} = 0.266 \mbox{ comments/post}\]

…or about 1 comment every 3.76 posts. I have gotten some emails as well, mostly from people who don’t want to use the comment system, or can’t figure it out.

Google Search Console

We can also use Google Search Console to see things like how often we come up in Google searches, what the search queries were, how often people clicked through, and what other web pages link to us.

Google Search Console: Impressions and clicks 2020-Sep-19 to 2021-Oct-20

Since we had very little search presence before September 2020, let’s go back 16 months. The plot shows the number of times we appeared in a Google search (purple line, right-hand vertical axis) and the number of times there was a click-through (blue line, left-hand vertical axis).

We have a very low click-through rate of 1.9%, which means as far as Google searchers are concerned, this really is a crummy little blog that nobody reads. And I’m ok with that.

I’m also intrigued by the sudden drop in search appearances at the end of June. Since I’m doing absolutely no SEO, perhaps this is a change to Google’s ranking algorithm? The (1) along the horizontal axis in mid-August is one such event; Search Console reports when changes to ranking might affect your search appearances. But there’s no corresponding note for the much bigger drop in July, so… I dunno.

The search queries that got to us are kind of interesting, when sorted by what actually provoked a click-through:

  • The first, second, and sixth place queries were “hank green vaccine”, “hank green covid vaccine”, and “hank green covid”. Clearly I need to blog more about internet-famous people like Hank Green!
  • The others that provoked click-throughs were “filibuster statistics by party”, “#googletranslate”, and various queries about the Moderna COVID-19 vaccine.

The pages to which people clicked through corresponded pretty much to the queries. The highest click-through rate was on the front page, though. No idea why.

The countries were first the Anglosphere (US, UK, Australia, Canada… but not NZ?), followed by various European countries and then India. Only 4 click-throughs from France, so I guess my former colleagues in France aren’t reading this much. About as expected?

As far as devices, it’s almost evenly divided between desktops and mobile, with only a few hardy tablet users. Again, about as expected. This blog is tagged as mobile-friendly by Google, but every time I’ve tried it the result was much better on a real desktop screen or on a tablet, compared to a dinky phone screen.

Google Search Console: Top linked pages

When we ask about other web pages that link to us, the top link is to the front page, and then a few others about vaccine stuff that apparently interests people. All in all, not much linkage, as expected.

Google Search Console: Top linking sites

I don’t do any promotion for this blog: no Twittage, no Instagrammaton, no FaceBorg, nothing. The only things I do are (a) mention it to people in conversation or email when it’s relevant, and (b) very occasionally leave comments on somebody else’s blog. The linking sites confirm this, being mostly places I’ve left comments on other blogs.

Google Search Console: Top linking text

The text people use to link here is kind of amusing. Some of it’s just my nom de blog, or the name of the blog itself, or the ubiquitous “here”. But the “fda declares war on america” guy is… probably not paying attention to what I have to say. Or linking to me as someone with whom they disagree, maybe? I didn’t bother to track down the reference, so I dunno. But good luck to you, whoever you are.

The Weekend Conclusion

This is still a crummy little blog that nobody reads.

And I’m still ok with that.

There are a few links, mostly from the comment sections of a few blogs where I’ve dropped in to say something. I’m not interested in doing promotion work, or monetization. I might look into Google Ads and some minor promotion someday, once I get the stylesheet stuff straightened out, but also maybe not. So don’t hold your breath on that.

To my spammers: You’re hopeless. You’ll never make it past moderation. Move along.

To my readers (all 3 of you, excluding my spouse, my cat, and myself): Thanks for reading. I’m gratified at the couple of you that have expressed interest. Please feel free to leave comments; it makes me happy to engage with thoughtful people.

Notes & References

1: Nope. Not today.

Written Wed 2021-Oct-20

Gestae Commentaria

I enjoy your blog. I especially enjoy the stats :-) It’s also interesting to see some of what’s going on in the US from the inside. Plus, it’s nice to have a connection, however tenuous, with an old friend.

Weekend Editor, Fri 2021-Oct-22 00:13

I enjoy your blog.

Louise, it’s so nice to hear from you! I had no idea you were reading your way through this madness.

I especially enjoy the stats :-)

Coming from someone with your qualifications, I really appreciate that. I have occasional nightmares that somebody like you who really knows their stuff will come by and tell me I’m doing posterior Beta distributions wrong. :-)

It’s also interesting to see some of what’s going on in the US from the inside.

The US seen from the inside is a slightly ugly sight, I’m afraid. At least nowadays, from a lefty perspective. OTOH, my only insight into the UK from an internal perspective is Charlie Stross’s blog and his Laundry Files novels… at least some of which I hope is fiction.

Plus, it’s nice to have a connection, however tenuous, with an old friend.

Good to hear from you too. I thought of you recently when I heard jokes about people attempting to use Excel to manage data science workflows, about which I’m sure you’d have a few very tart remarks!

I enjoy the blog, and did not know it was such a recent undertaking. I confess I don’t always fully understand all the math you use. (And I’m okay with that.)

Also, nice bot trap on the comments system.

Weekend Editor, Wed 2021-Oct-20 23:33

I enjoy the blog, and did not know it was such a recent undertaking. I confess I don’t always fully understand all the math you use. (And I’m okay with that.)

And of course I’m more than ok with your willingness to read despite the math. :-)

Also, nice bot trap on the comments system.

Yeah, if only it actually worked a bit more to repel boarders; several spambots have figured it out. I may have to get crazy on them.