Thursday, March 7, 2019

Dots, more dots, most dots

Welcome back to another part of a series that I'm not really calling "ways in which servers completely fail to deliver correct Gopher pages". This post is all about the last line of each menu. A Gopher menu is the most common type of page you'll visit: a list of links, files and information. Each menu is supposed to end with a line consisting of a single dot.


BNF snippet for the Gopher Protocol, RFC 1436
The picture shows a part of RFC 1436, the Gopher RFC. Most internet protocols are defined by one or more "Requests for Comments", a series that goes back to the very early days of the Arpanet, the network from which the Internet was created.

This particular snippet shows what the "LastLine" should be. The LastLine ::= means that we're going to define a new part of the protocol. On the right-hand side of the ::= is the actual definition: '.' CR-LF. This means that the last line should be a period (the '.') followed by a CR-LF. CR-LF is defined earlier; it's an ASCII carriage return followed by a line feed.
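To make the rule concrete, here is a minimal Python sketch of a strict check. The names `LAST_LINE` and `ends_with_last_line` are invented for this example, and it assumes the whole server response has already been read into a bytes object. Requiring the line before the dot to end in CR-LF as well is my own strict reading, not something this snippet of the RFC spells out.

```python
CR_LF = b"\r\n"            # ASCII carriage return followed by line feed
LAST_LINE = b"." + CR_LF   # LastLine ::= '.' CR-LF


def ends_with_last_line(response: bytes) -> bool:
    """Return True if the response ends with a line holding only a dot.

    Assumption: the dot must either be the whole response or follow a
    line that itself ended in CR-LF.
    """
    return response == LAST_LINE or response.endswith(CR_LF + LAST_LINE)
```

A menu such as `b"iHello\t\t\t0\r\n.\r\n"` passes this check; the same menu without the final `.\r\n` does not.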

And now the big question: how many menus actually follow this pattern? The answer, of course, is "most, but not all". Here are the numbers from a recent Gopher crawl of part of the Gopherverse.

Numerically, the results are:


Result                Menus   Percent
Correct LastLine       1453    69.26%
No LastLine             633    30.17%
Dot and then close       10     0.48%
Dot then CR or LF         2     0.10%


Conclusion: if you're writing a Gopher parser, you have to handle the presence or absence of the dotted line, and several different ways the last line can be messed up.
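One way a tolerant parser might cope is sketched below in Python. The function names are hypothetical, and the way I map byte patterns onto the four categories is an assumption on my part: "dot and then close" is taken to mean a dot with no line ending at all, and "dot then CR or LF" a dot followed by only one of the two characters. It also assumes the full response is already in memory and that the line before the dot ends in a line feed.

```python
def classify_last_line(data: bytes) -> str:
    """Classify how a raw Gopher menu response ends (assumed categories)."""
    if data == b".\r\n" or data.endswith(b"\n.\r\n"):
        return "correct"
    if data in (b".\n", b".\r") or data.endswith((b"\n.\n", b"\n.\r")):
        return "dot then CR or LF"
    if data == b"." or data.endswith(b"\n."):
        return "dot and then close"
    return "no last line"


def strip_last_line(data: bytes) -> bytes:
    """Return the menu body with any recognised terminator removed."""
    # Try the longest terminator first so ".\r\n" is not mistaken for "."
    for ending in (b".\r\n", b".\n", b".\r", b"."):
        if data == ending:
            return b""
        if data.endswith(b"\n" + ending):
            return data[: -len(ending)]
    # No terminator at all: the whole response is menu content.
    return data
```

With something like this, the rest of the parser can call `strip_last_line` first and then split the result into menu lines without caring which kind of ending the server sent.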

