BNF snippet for the Gopher Protocol, RFC 1436
The picture shows part of RFC 1436, the Gopher RFC. Most internet protocols are defined by one or more "Requests for Comments" (RFCs), a document series that goes back to the very early days of the Arpanet, the network from which the Internet was created.
This particular snippet shows what the "LastLine" should be. The Lastline ::= means that we're defining a new part of the protocol. On the right-hand side of the ::= is the actual definition: '.' CR-LF. This means that the last line should be a period (the '.') followed by a CR-LF. CR-LF is defined earlier in the RFC; it's an ASCII carriage return followed by a line feed.
And now the big question: how many menus actually follow this pattern? The answer, of course, is "most, but not all". Here are the numbers from a recent Gopher crawl of part of the Gopherverse.
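To make the definition concrete, here is a minimal Python sketch of a strict check for it. The names LAST_LINE and has_strict_last_line are mine, not from the RFC, and I'm assuming the whole server response has already been read into a bytes object.

```python
# Literal bytes of the LastLine definition: '.' followed by CR-LF.
LAST_LINE = b".\r\n"


def has_strict_last_line(response: bytes) -> bool:
    """Return True if the response ends with a line that is exactly '.' CR-LF."""
    # The dot has to be a line of its own: either it is the whole response,
    # or the bytes just before it are the CR-LF that ends the previous line.
    return response == LAST_LINE or response.endswith(b"\r\n" + LAST_LINE)
```

Note that a menu line that merely ends in a period (say, an info line reading "The end.") does not count, because the dot is not on a line of its own.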
Numerically, the results are:
LastLine form | Menus | Percent |
Correct LastLine | 1453 | 69.26% |
No LastLine | 633 | 30.17% |
Dot and then close | 10 | 0.48% |
Dot then CR or LF | 2 | 0.10% |
Conclusion: if you're writing a Gopher parser, you have to handle the presence or absence of the dotted line, and several different ways the last line can be messed up.
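Here is a rough sketch of what that tolerant handling might look like in Python. It is not the code used for the crawl; the function name parse_menu and the ending labels (borrowed from the table) are my own, and I'm assuming that "Dot and then close" means a bare '.' with nothing after it, and that "Dot then CR or LF" means a '.' followed by a lone CR or a lone LF.

```python
# Terminator variants, tried from the longest suffix to the shortest.
# The labels mirror the table rows; they are illustrative, not from the RFC.
_TERMINATORS = [
    (b".\r\n", "Correct LastLine"),
    (b".\r", "Dot then CR or LF"),
    (b".\n", "Dot then CR or LF"),
    (b".", "Dot and then close"),
]


def parse_menu(response: bytes) -> tuple[list[bytes], str]:
    """Split a raw menu response into its lines and a label for how it ended."""
    for suffix, label in _TERMINATORS:
        if response.endswith(suffix):
            body = response[: len(response) - len(suffix)]
            # Only treat the dot as a terminator if it sits on its own line,
            # so a menu line that happens to end in '.' is left alone.
            if body == b"" or body.endswith((b"\r", b"\n")):
                return body.splitlines(), label
    # No recognizable terminator: keep every line we received.
    return response.splitlines(), "No LastLine"
```

For example, parse_menu(b"iHello\r\n.") returns the single line b"iHello" together with the label "Dot and then close", while a response with no dotted line at all comes back untouched with "No LastLine". Splitting with splitlines() also means that menu lines ending in a bare CR or a bare LF are still separated.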