Saturday, September 21, 2019

GOPHER and the Internet for Dummies Quick Reference 1994

Another trip down memory lane (aka the latest find in a used book store). The Internet For Dummies, 1994 edition, has this to say about Gopher:

If you're looking for a particular file or piece of information, like a copy of The Wonderful Wizard of Oz or the current weather in Key West, Gopher is the quickest way to find it. If your goal is something obscure, WAIS is your best bet because -- unlike the other two -- WAIS doesn't depend on someone already having indexed the information. If you just want to browse in a general topic area, WWW is the quickest way to have a look around.

Times have sure changed. The text lists Gopher URLs on pages 79 to 90; here's the current status of each of the Gopher servers listed: 100% of them are no longer functional.

US public Gopher servers
infoslug.ucsc.edu
infopath.ucsc.edu
grits.valdosta.peachnet.edu
ux1.cso.uiuc.edu
gopher.netsys.edu
panda.uiowa.edu
inform.umd.edu
seymour.md.gov
gopher.ora.com
wsuaix.csc.wsu.edu
gopher.msu.edu
consultant.micro.umn.edu
nicol.jvnc.net
sunsite.unc.edu
twosocks.ces.ncsu.edu
cat.ohiolink.edu
envirolink.hss.cmu.edu
ecosys.drdr.viginia.edu
telnet.wiscinfo.wisc.edu


International public Gopher servers
info.anu.edu.au
finfo.tu-graz.ac.at
nstn.ns.ca
camsrv.camosun.bc.ca
tolten.puc.cl
gopher.denet.dk
ecnet.ec
gopher.th-darmstadt.de
gopher.isnet.is
siam.mi.cnr.it
gopher.torun.edu.pl
gopher.uv.es
info.sunet.se
gopher.chalmers.se
huni.ub2.lu.se
gopher.brad.ac.uk

None of these Gopher servers still exists. The most common result is "no such host"; a very small number actively refused the connection, or accepted it and started to read but never sent any data.

And just for fun, here's the list of Gopher programs:
Unix Gopher program "gopher"
Commands are
  • Enter to select
  • + - (or u) m = go to the next/previous/main menu pages
  •   = go to the selected menu item
  •  / = search menu for a string
  •  n = search for next item
  • q = quit
  •  = = describe current item
There are some bookmark commands: a=add item A=add menu v=view d=delete
And there are some file commands m=mail s=save p=print D=download

Also mentioned is hgopher for MS Windows, xgopher for X Windows.

Saturday, September 7, 2019

A Network Reading List

I get to be a mentor to an incoming program manager where I work. Most newcomers aren't super familiar with networking, and many people aren't familiar with the actual history of our part of computer technology.

Here's a hopefully useful curated list of links to source material on the creation of the internet.

The first internet was created for ARPA by BBN

The American Department of Defense "Advanced Research Projects Agency" (ARPA) asked for the internet in this original request. The winning proposal was from Bolt, Beranek and Newman (BBN). About a decade later, BBN wrote the Completion Report.


A ton of BBN history is online.

RFCs and Protocols everyone should know

The internet is governed (as much as it is governed) by a set of "Request for Comments" (RFCs). There's also this handy list for all 8649 current RFCs. These are all numbered; there's a society that curates them. However, anyone can be part of the RFC process. In addition to RFCs, there are also "IEN" papers with their own numbering.

One of the most accessible papers (and a great way to start) is the Endian paper, written by Danny Cohen of USC's Information Sciences Institute and published as IEN 137. A good introduction to network humor is RFC 1149.
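To make the byte-order question concrete, here's a minimal C++ sketch of the two orderings the Endian paper argues about. The function names are made up for this example and don't come from IEN 137 or from any library.

```cpp
#include <array>
#include <cstdint>

// Illustrative helpers; the names are hypothetical, not from IEN 137.

// Most significant byte first. This is the "network byte order"
// used in IP and TCP headers.
std::array<std::uint8_t, 4> toBigEndian(std::uint32_t value) {
    return {
        static_cast<std::uint8_t>(value >> 24),
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value),
    };
}

// Least significant byte first. This is how x86 processors
// lay out integers in memory.
std::array<std::uint8_t, 4> toLittleEndian(std::uint32_t value) {
    return {
        static_cast<std::uint8_t>(value),
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 24),
    };
}

// Reassemble a value from four big-endian bytes.
std::uint32_t fromBigEndian(const std::array<std::uint8_t, 4>& bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
           static_cast<std::uint32_t>(bytes[3]);
}
```

Because the helpers use shifts rather than casting pointers, they give the same answer no matter which byte order the machine running them uses.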

As you might guess, creating an RFC can be a long, slow slog: starting with basic proposals in a proposed state, getting consensus from experts, converging on a draft standard, and finally being voted in as an official internet standard. You might also guess that many important protocols have gone all the way through the entire process, but here you would be wrong. The IMAP email standard, RFC 3501 (from 2003), is still "proposed".

You might also think that conforming to the official standards is critical. Actually, not so much. 

Internet Mail

Email was one of the earliest super-popular uses of the Internet. There are two main ways to read email. The first is the Post Office Protocol (POP) family: POP (RFC 918), POP2 (RFC 937), and the current and most popular POP3 (RFC 1939). The second is the much more powerful and complex IMAP (currently IMAP4rev1, RFC 3501 from 2003). Sending email is done with SMTP, which is a different protocol, and calendars use CalDAV (or other protocols).

The Internet before HTTP

Telnet (RFC 318) lets you type into a computer here and have it be accepted over there. It has now been replaced by SSH.

Gopher (RFC 1436) is kind of like HTTP/HTML, but is more of a "menu" oriented system. There still exist Gopher servers, and there's even a yearly conference. Amazingly, there's only one Gopher program for Windows in the Windows Store (and I wrote it)

Usenet News (RFC 977) is like a distributed Reddit but without reputation points.

FTP (RFC 959) is how we used to download files. Unlike many other protocols, two sockets are used: one carries the commands and their replies, and the other carries the data being transferred. This works really badly with firewalls. FTP is also a great example of RFCs being updated: the original RFC is 114, and the series then runs through 172, 265, 354, 542, 765 and 959. After that there is merely a string of updates. Also notice that the original email protocols were handled by FTP.

Finger (RFC 742) is what Facebook was back then. People would write information in their .plan file and it would be retrieved.

Internet Relay Chat (IRC) (RFC 1459) is like Teams (or text messages). Unlike some of these other protocols, plenty of people still use IRC.

The internet and HTTP + HTML

HTML is the markup language; HTTP is the protocol for sending HTML over the internet. What goes over the wire is the HTML plus a bunch of headers. HTTP and HTML were first created in 1991 by Tim Berners-Lee. The NeXT computer he used when creating them is on display at the Science Museum in London.

HTTP is defined originally in RFC 1945 which also defines Uniform Resource Locators (URLs)
HTML is defined in RFC 1866 and heavily updated since then.

Security and the internet

A common thread through all of the original RFCs is a complete lack of care about security. For example, RFC 475 says:

It has been suggested that FTP specification should require that mail
   function (for receiving mail) should be "free", i.e., FTP servers
   should not require the user to "login" (send the USER, PASS, and ACCT
   commands).

Furthermore, note that all communications are sent in the clear with no encryption whatsoever. Amazingly, a recent Senate proposal for all states to email a number of sensitive documents to a Senate committee proposed using a completely in-clear mail system!

Reading the Transport Layer Security (TLS) RFCs is not recommended. They are very complex, and won't help you understand the protocol. You should know that the current common TLS version is 1.2 and the new one is 1.3 (but it's not in common use yet). TLS 1.0 is barely acceptable for security and is expected to be completely broken before long. The old Secure Sockets Layer (SSL) is completely broken and must never be used if you want security.

VPN and WiFi access points often use RADIUS (RFC 2865, originally RFC 2058) for authentication and authorization. One neat thing about RADIUS is that it was designed to be secure, but thanks to advances in breaking encryption, it's not at all secure on the internet.

You'll see tons of stuff about X.509 certificates. The most important thing to know is that all certificates are X.509. Other than that:
  1. A certificate is just a thin wrapper for a security key with some extra goo
  2. People handled their certificates incorrectly so often (e.g., using the code signing cert for TLS) that they are now marked with the intended purpose.
  3. The Public Key Infrastructure (PKI) is a way to 'chain' from one certificate to the next. This is always done so that there are just a few super-trusted certs and everything else flows down from there.
  4. There are custom mechanisms for trusting a certificate. Azure uses a custom mechanism inside the datacenter; for their purposes it's much faster, more robust and more secure than using the standard PKI
  5. Combining #3 and #4: the bar for inventing a new scheme to validate certificates is spending three months with the security experts.
  6. "Browser level" certificate checking is pretty secure. The entire chain is examined; each cert along the chain is validated for expiration and to make sure that each cert correctly signed its part of the chain.
  7. Adding to browser-level certificate check is safe (security-wise). 
  8. Bypassing the browser-level certificate check is not safe. People screw this up constantly.
Read The Most Dangerous Code in the World for how common security mess-ups are.


Monday, May 20, 2019

INLINE elements and GOPHER for the gopher-of-things

INLINE Gopher links and the Gopher of things!

INLINE links are a simple but powerful Gopher addition that I've implemented in my "Simple Gopher Client" program. This was a small change, but it makes a giant improvement when using the Gopher protocol to control devices as part of the "Gopher-of-things". Here's a screenshot:



The little array of buttons are all commands for a Skoobot device (left, right, etc.). It's all Unicode, which is why the "rover" mode is a little picture of a satellite. When I press a button on the client, the Gopher command goes to a simple 70-line custom gateway program that can both display a Gopher page for the Skoobot and also can control a Skoobot robot using Bluetooth.

A common IOT configuration is that you have some set of Bluetooth (or similar) sensors connected to a hub, and then you want to have a control program that can access the hub. Often you want to make a quick, simple prototype before you go to the trouble of building the full thing. If only there was some widely-used very simple protocol that could make a simple UI...

Hey wait! That perfectly describes Gopher! And that's why my Best Calculator, IOT edition now supports Gopher-of-things: you can make a simple "gateway" program that controls your sensor and still make a nice little user interface!

But there's something missing. Gopher works page-by-page: when you click a link (a menu entry, in Gopher terms), you get an entire new menu back (or a file or an image). There's no provision for simply updating some existing page.

That's where my INLINE extension comes in. When a link (directory entry type '1') is seen, if the fifth column includes the word INLINE, then the link is turned into a button! When the button is pressed, the resulting Gopher menu (directory) is fetched, and the first element of the returned menu replaces the first non-button item on the current page.

You can see that in the image. When I press the STOP button, it sends a command to the gateway, which does the command and sends back just a little Gopher menu (directory) with the results of the command ("stop").

Technical details of the INLINE

1. Links (Gopher menu entry type 1) can be set to be INLINE, at which point they are called BUTTONS
2. The INLINE style is added to column 5 (which is also where the Gopher-plus "+" is added).
3. The INLINE style is part of a list of styles (e.g., "BLUE;INLINE") where styles are split with a semicolon (;). Never put a space in (it's "BLUE;INLINE" not "BLUE; INLINE")
4. Multiple BUTTON items can be placed on the same line if the client wants to.
5. The first menu entry from a button result will replace the first non-button item in the original menu. This means that you can have a block of buttons followed by a single data element.
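
The style rules above can be sketched in a few lines of C++. This is only an illustration and not the Simple Gopher Client's actual code: the function names are made up, and it assumes the style names are matched exactly (case-sensitive, no trimming), which is what the "no spaces" rule suggests.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a column-5 style list such as
// "BLUE;INLINE" on semicolons. Empty pieces are dropped.
std::vector<std::string> splitStyles(const std::string& column5) {
    std::vector<std::string> styles;
    std::stringstream stream(column5);
    std::string style;
    while (std::getline(stream, style, ';')) {
        if (!style.empty()) styles.push_back(style);
    }
    return styles;
}

// Hypothetical helper: a type '1' entry is treated as a BUTTON
// when its style list contains the exact word INLINE.
bool isInlineButton(char entryType, const std::string& column5) {
    if (entryType != '1') return false;
    for (const std::string& style : splitStyles(column5)) {
        if (style == "INLINE") return true;
    }
    return false;
}
```

With this sketch, "BLUE;INLINE" is a button, while "BLUE; INLINE" is not, because the second style comes out as " INLINE" with a leading space.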

Sunday, April 28, 2019

Working Herbert Televox robot model




The world's first "robot" was arguably Herbert Televox, built by Westinghouse. At the time, Herbert's ability to listen to commands over the phone and report back data was ground-breaking. Herbert Televox itself was really a standard electric box with a complex assortment of switches, timing and buzzers. The box was placed into a robot-shaped cutout when demonstrating its capabilities.






Herbert Televox. The box in its stomach is an advanced (for the 1920's) control unit; the robot body is simply the 1920's version of cheap plywood.


I built a scale model of Herbert Televox for a talk with early-in-career programmers where I work. The scale model had the ability to detect a phone being picked up and measure the depth of a water reservoir. I wanted a live demo with a phone connected to the scale model Herbert Televox, and a water sensor from the Herbert Televox into a little model reservoir. The reservoir is really just a small glass dish. Then I'd pick up the phone, actually dial some numbers, and then Herbert Televox would buzz once, twice, or three times based on the reservoir depth.


What I needed for a demo 



To power the smarts of my Herbert Televox I used a Particle Photon, an Arduino-based microcontroller with Wi-Fi capability. After struggling with debugging with just an LED, I added a small Adafruit_SSD1306 OLED display. In the order that I added them, the Herbert Televox abilities are:

  1. Can detect a phone going off-hook and light the blue LED
  2. Can write to an OLED display with debugging information
  3. Can read the water sensor and split it into one of three different water levels
  4. Can ring a small buzzer based on the level of the water
  5. Can wait a period of time after the phone goes off-hook before buzzing the buzzer



The Arduino base of the Particle Photon makes some of this very easy. For example, it was easy to set up an analog to digital converter (ADC) for the phone off-hook detection. But the nature of programming the Arduino made handling timing awkward; I'll show my solution.



Connecting a phone and writing to an OLED







Diagram of a rotary phone showing the handset, the cradle in which it's placed, and the on-hook/off-hook switch




Although a phone jack is large enough for 6 wires, and is normally wired up with four wires (the inner four), a classic rotary phone works off of just two wires: one red, and the other green. The two wires form a loop with a certain resistance. Every operation of the phone (taking the handset off-hook, dialing, and talking) will change the resistance across the wires in a way that can be easily detected by the Arduino.



To connect my demo phone to the Particle Photon Arduino, I used an RJ11 right-angle jack breakout board which I bought from Amazon, but it's also available for less straight from the maker along with an enticing set of other boards. I plugged the phone into the jack and ran some wires from the nice terminal strips on the breakout board to the Arduino.


CZH Labs D-1039 Phone Breakout board (<$10)

Wiring setup for measuring phone resistance




In particular, the connections are



  1. Put the Particle Photon into a breadboard
  2. Terminal 3 (green) to ground
  3. Terminal 4 (red) to a spare row on the breadboard
  4. From the same row, connect a 1K resistor to Vcc
  5. From the same row, connect a wire to A0 on the Photon



The resistance of the phone and the fixed 1K resistor form a voltage divider that runs from Vcc to ground. When the phone is on-hook, it has a rather high resistance, so the voltage at A0 will be close to Vcc and an analogRead of the pin will return close to 4095 (the maximum value of the 12-bit converter). When the phone is off-hook, analogRead of pin A0 will read about 1680 with a variation of about 50.
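
The divider arithmetic is easy to check with a little C++. This is a sketch under stated assumptions: a 12-bit converter with a full-scale reading of 4095, the sensor on the ground side of the divider, and function names invented for this example.

```cpp
// Assumed full-scale reading of a 12-bit ADC.
constexpr double kAdcMax = 4095.0;

// Reading expected at the mid-point of the divider when the sensor
// (phone or water probe) sits between the mid-point and ground and
// the fixed resistor sits between the mid-point and Vcc.
double expectedReading(double sensorOhms, double fixedOhms) {
    return kAdcMax * sensorOhms / (sensorOhms + fixedOhms);
}

// The same formula solved for the sensor resistance. A reading at
// full scale means an open circuit, so the result is infinite.
double sensorOhmsFromReading(double reading, double fixedOhms) {
    return fixedOhms * reading / (kAdcMax - reading);
}
```

Plugging in the off-hook reading of about 1680 with the 1K resistor gives a phone resistance of roughly 700 ohms, which is in the same ballpark as the fixed resistor.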



The actual code to read the phone off-hook starts at line 57





  // To blink the LED, first we'll turn it on...
  digitalWrite(bluePin, HIGH);
  phone = analogRead(phonePin);      // phone voltage divider
  measure = analogRead(measurePin);  // water sensor voltage divider
  speed = phone / 64;                // LED flash delay in milliseconds
  // on hook = about 4095, off hook = 1680 +- 50 or so
  int isOffHook = (phone < 2500);

  // …
  // A BUNCH OF OTHER CODE
  // …

  // Report the analog phone value via blue LED
  delay(speed);
  if (isOffHook) hookTimeDelta += speed;
  digitalWrite(bluePin, LOW);
  delay(speed);
  if (isOffHook) hookTimeDelta += speed;









When the phone is off-hook (a person has picked it up), I track the amount of time that the phone has been off-hook. That's because I need to delay actually buzzing the buzzer for 2 seconds or so after the phone is picked up. When the phone is not off-hook, hookTimeDelta is reset to zero (it gets reset to zero a lot, but that’s OK).



I get a speed to flash the blue LED based on the phone analog value; I use a constant 64 to make for a "nice" flash speed (not too fast, not too slow). This value was picked based on observation.



After mucking around trying to debug the analog values using just the LED flashes, I got smart and wired up an OLED so that I'd get a little display. This was super useful; debugging these microcontrollers without a decent debugger is pretty painful. On each loop, I erase the OLED and print the phone analog reading; that's how I know the exact range of analog readings for the phone.



Quick and easy water sensor (list #3)


The water sensor is just a length of insulated twisted-pair telephone wire (I snagged 25 feet of 50-pair telephone wire from a dumpster 20 years ago, and have been using it as my go-to wire ever since). I scraped off 1-cm long sections of insulation in three places: at the very end and then twice more spaced apart by about 3 cm. Then I taped the wires together so that the bare sections are right next to each other just a few millimeters apart.



This crude sensor was then read in by analog pin A1 in the exact same way that the phone analog value is read: the water sensor is part of a voltage divider. One of the water sensor wires is at ground, the other is the mid-point, and then there's a resistor from the mid-point to Vcc. I used a 1K resistor because it was supplied as part of the Particle Photon kit. I then wrote the resulting values into the OLED debugging screen, and discovered that it wasn’t nearly sensitive enough.



The easy way to pick a fixed resistor for these simple voltage dividers is to use a resistor that's approximately the same as the variable resistance that you're measuring. For the water sensor, 1K was much too low, so I scrounged up a 10K resistor from a different kit.



Once I had a set of analog readings from the water sensor that were far enough apart to be clearly distinguished, I simply dipped and undipped the water sensor to get some usable split-points and then set an nbuzz variable to be the number of buzzes that I wanted.
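
Here's a minimal sketch of turning a water sensor reading into a buzz count. The two split-point values are placeholders, not the ones from the demo; the real ones come from watching the OLED while dipping the sensor. The sketch also assumes that deeper water bridges more bare sections, which lowers the resistance and so lowers the reading.

```cpp
// Hypothetical split-points; replace with values observed on the OLED.
constexpr int kShallowAbove = 3000;  // readings above this: shallow water
constexpr int kMiddleAbove = 2000;   // readings above this: middle depth

// Map one analog reading to a number of buzzes (1, 2 or 3).
// Assumes a lower reading means deeper water.
int buzzCountForReading(int measure) {
    if (measure > kShallowAbove) return 1;
    if (measure > kMiddleAbove) return 2;
    return 3;
}
```

In the loop, the result of something like buzzCountForReading(measure) would be what gets stored in nbuzz.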



Ringing a buzzer (list #4)


The Particle Photon kit I have includes a small buzzer. I followed the instructions, connecting it to digital pin D0 and initializing that pin with the pinMode function to be an OUTPUT pin. There are two commands, tone and noTone, to turn the buzzer on and off. After a few experiments, I settled on buzzing the buzzer at 40 Hz for 200 milliseconds with a 500 millisecond gap between buzzes. At the same time I set up a didBeep variable which starts at 0 (false) and is set to 1 (true) when I beep. The variable is also set back to 0 whenever the phone is on-hook. This ensures that I only beep once per off-hook event.



Waiting to buzz (list #5)


It's not good just to buzz as soon as the phone is picked up. For an effective demo, the presenter will need to pick up the phone, dial, and then pause, all while explaining what the device will do. I solved this problem with two variables: the hookTimeDelta time count that's increased by the time value passed in to each delay() function, and a didBeep variable that says whether I've done the beep for any particular off-hook event.



The actual code for deciding to buzz is then just

if (isOffHook == 1 && hookTimeDelta > 2000 && !didBeep)
{
  // ... buzz here ...
  didBeep = 1;
}



This will guarantee that the buzzes only happen once, and after a delay. Dialing is a different story. I decided against actually detecting the phone dial digits for the much simpler detection of whether the phone is dialing at all. Detecting the phone dialing is trivial: when rotary (technical: pulse-dial) phones are dialing, internally the phone is just going on and off hook. The existing code that detects on and off hook is sufficient to pause the buzzer while dialing.



The way the delay works is that every time the phone is seen to be on-hook (which includes the pulses from dialing), the hookTimeDelta value is reset to zero. This means that the buzzing will be automatically delayed every time the phone is dialed!
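
The whole delay-and-reset behaviour can be pulled into one small piece of plain C++ that is easy to test off the device. This is a sketch and not the code from the demo: the BuzzTimer struct and its update function are invented for this example, while hookTimeDelta, didBeep and the 2000 millisecond threshold come from the description above.

```cpp
// Hypothetical wrapper around the two variables described above.
struct BuzzTimer {
    int hookTimeDelta = 0;  // milliseconds of continuous off-hook time
    bool didBeep = false;   // true once we have buzzed for this pick-up

    // Call once per pass through loop() with the current hook state
    // and the time spent in delay() during that pass. Returns true
    // on the single pass where the buzzing should start.
    bool update(bool isOffHook, int elapsedMs) {
        if (!isOffHook) {
            // On-hook, including each dial pulse: start over.
            hookTimeDelta = 0;
            didBeep = false;
            return false;
        }
        hookTimeDelta += elapsedMs;
        if (hookTimeDelta > 2000 && !didBeep) {
            didBeep = true;
            return true;
        }
        return false;
    }
};
```

Feeding it a short on-hook pulse in the middle of an off-hook stretch shows the effect of dialing: the count restarts, so the buzz only fires about 2 seconds after the last pulse.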



From a presenter's point of view, the demo involves picking up the handset and talking while dialing. So long as the dialing keeps happening, the buzzing is delayed. The presenter can choose when to stop dialing and let the buzzing happen!



The good and the bad




Three great parts of using the Particle Photon Arduino Wi-Fi controller:

  1. The over-the-air downloads worked very smoothly. I could edit my code in their editor and press the "download now" button to flash the chip; this was as fast as could be expected.
  2. Debugging by writing to an OLED was a real time saver, especially since there is no other obvious debugger in their environment.
  3. The simple A2D conversions were super useful for connecting the water sensor and phone.



Some less than good parts:

  1. The default Photon setup is that connecting to the development Wi-Fi is required for the program to run. I learned this on the day of my presentation, far from my development area. The actual code didn't work for the demo at all.
    The solution is to put
    SYSTEM_THREAD(ENABLED); in your code
  2. The Arduino concept of "setup" and "loop" was awkward in practice. This little robot really needs a state machine, but that's not the simple path for Arduino.
  3. The Particle Photon documentation didn't mention what the actual OLED type was, even though it's part of the Particle Photon dev kit. I had to poke around for too long to find the libraries and then figure out how to use the libraries. Turns out it's an Adafruit SSD1306.
    #include
     



Conclusion


Herbert Televox was an awesome robot, even if it was just the 1920's version of cheap plywood. Making a modern version was an effective prop for a bigger presentation.







You can see the original Herbert Televox at the http://www.themansfieldmuseum.com/

Saturday, March 30, 2019

Any port in a storm!

Gopher, of course, runs off of port 70, and has since the Gopher RFC 1436 from March, 1993. Since then, many protocols have embraced using SSL (or TLS in its more modern and secure form), often preferring to send all SSL-protected and encrypted data over some other port. HTTP, for example, uses port 80 for unencrypted traffic and 443 for encrypted traffic.

Heat map for the most popular ports in GopherSpace


So what about poor Gopher? How do Gophers in the wild handle SSL/TLS? I've seen a gopher/s protocol on the internet where if a port number is > 100,000 then it's assumed to be TLS. That has the problem that although it's a technically valid URL, some programs (like OneNote) don't seem to much like them.

Alternatively, khzae.net treats port 105 (normally the CSO port, and therefore pretty much unused in the real world) as the SSL port for Gopher.

What I want to know, of course, is the real-world distribution of port numbers in the existing GopherSpace. As always, I'm using a bunch of data from an earlier Gopher crawl.

No surprise, the most common specifically mentioned port is port 70, the gopher port. There was just a single reference to a port > 100105, so the new standard of using very large ports to indicate SSL/TLS hasn't taken off yet.

The file entry overwhelmingly uses port 70; failing that, port 9999 is popular.
The directory entry is also mostly port 70 (really, not a surprise at all), with port 9999 the second-most popular port along with a smattering of 7070 7006 and 7005 ports.
The HTTP (h) entry is also mostly port 70 with port 80 (the official HTTP port) being the runner-up
The info (i) entry, unsurprisingly, has a lot of variance. Since the info entry isn't a selectable entry, the port and host for it aren't actually used; developers can pick whatever random values they want. One common value is no value at all.
Lastly, the Image (I) entry is once again most commonly on port 70 with essentially no variation. Interestingly, the GIF (g) entry is also essentially always on port 70.

What can we learn from all this? My biggest takeaway is that there are enough users of non-standard ports that any gopher client that's worth anything should use the given port numbers. I suspect that I'd get a lot more secure gopher (perhaps on port 105 like khzae.net does it) showing up on my Gopher scan if the scanner actually supported TLS/SSL Gopher -- right now, it won't correctly follow most of the secure Gopher links.

What I don't know how to do is to correctly follow a secure link. If I'm at a secure Gopher site (like maybe the gophers://khzae.net:105 site), should I simply assume that all links are secure links? Or should I assume that all non-70 links are? Or just the 105 links? I suspect that the only way to know is to try to pull data from each Gopher directory in both TLS/SSL mode and plain text mode, and see what works.

Saturday, March 9, 2019

All my character sets

Yet another post about Gopher. In yet another diversion, I'm looking at what character sets are used on Gopher menu pages (the Gopher type 1 directory listings).
In the table, you can see that plain ASCII is the clear winner; almost every directory entry is printable ASCII characters (i.e., with no control characters, escape, DEL or any type of 8-bit character).

What are the others? UTF8 is about twice as popular as LATIN1 even though the Gopher spec is pretty clear that LATIN1 is the correct encoding. After that comes ASCII with some control characters. 

The last two categories are a little weird: they represent ASCII, but some of the space characters are actually character 160, the LATIN1 "Non-breaking space" character. There were 41 entries where the text was otherwise perfectly ordinary ASCII characters. 

Even more weirdly, the entries that were ASCII but included some kind of control character, the control characters are most often 1a (17), 7f (8) and 1b (5). I don't know why SUB and DEL are so popular. At least the 1b char (ESC) is understandable; it's used for fancy graphics.

There's often no way to automatically prove that a string is UTF8 or LATIN1. It is possible to prove that a string is not legal UTF8, but as we all know, sometimes strings are malformed, and what's supposed to be UTF8 is instead only nearly UTF8. What I do is to look at each string; if there are any characters with the 8th bit set, then I look to see if it's perfect UTF8. If it's not perfect UTF8, I assume that it's LATIN1.
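
Here's a minimal C++ sketch of that rule: no high-bit bytes means ASCII, strictly valid UTF8 means UTF8, and anything else is assumed to be LATIN1. It is not the code used for the crawl (that was C#); the names are invented for this example, and the validator is a hand-written strict check that rejects overlong forms, surrogates and truncated sequences.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

enum class Charset { Ascii, Utf8, Latin1 };

// Strict check that every byte sequence is well-formed UTF-8.
bool isValidUtf8(const std::string& text) {
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        if (lead < 0x80) { ++i; continue; }  // plain ASCII byte
        std::size_t extra = 0;       // continuation bytes expected
        std::uint32_t minValue = 0;  // smallest value allowed for this length
        std::uint32_t codePoint = 0;
        if ((lead & 0xE0) == 0xC0) { extra = 1; minValue = 0x80; codePoint = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; minValue = 0x800; codePoint = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; minValue = 0x10000; codePoint = lead & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        // A sequence that ends exactly at the end of the buffer is fine;
        // only a sequence that runs past the end is an error.
        if (n - i - 1 < extra) return false;
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(text[i + k]);
            if ((c & 0xC0) != 0x80) return false;
            codePoint = (codePoint << 6) | (c & 0x3Fu);
        }
        if (codePoint < minValue) return false;                        // overlong form
        if (codePoint > 0x10FFFF) return false;                        // out of range
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return false;  // surrogate
        i += extra + 1;
    }
    return true;
}

// ASCII if no byte has the 8th bit set, UTF8 if strictly valid,
// otherwise assumed to be LATIN1.
Charset classify(const std::string& text) {
    bool hasHighBit = false;
    for (unsigned char c : text) {
        if (c & 0x80) hasHighBit = true;
    }
    if (!hasHighBit) return Charset::Ascii;
    return isValidUtf8(text) ? Charset::Utf8 : Charset::Latin1;
}
```

The length check is written so that a string ending with a complete multi-byte sequence still counts as valid, which is the edge case that is easy to get wrong.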

Yes, there are other character sets in existence. I'm rather hoping that there aren't any in GopherSpace, because I'm not sure how I'd figure out which was which.

For C# programmers, I've learned two important things about UTF8 conversions. Firstly, many of the "Utf8 check" libraries are copies of each other and they have a common bug where if a buffer ends with a UTF8 sequence, the sequence is incorrectly flagged as being incorrect.

Secondly, the UWP Encoding.UTF8 has the pernicious habit of sometimes throwing on bad UTF8 sequences and sometimes not. I could understand them making a choice either way, but being in an in-between state is just plain programmer-unfriendly.

TL;DR: you have to handle UTF8 and LATIN1 char sets, and as a programmer, you have to double-check your conversion library.

Thursday, March 7, 2019

Dots, more dots, most dots

Welcome back to another part in a series I'm not really entitling, "ways in which servers completely fail to deliver correct Gopher pages". This post is all about the last line of each menu, where a gopher menu is the common type of page that you go to. Each menu has a list of links, files and information. And each menu is supposed to end with a single line consisting of a single dot.


BNF snippet for the Gopher Protocol, RFC 1436
The picture shows a part of RFC 1436, the Gopher RFC. Most internet protocols are defined by one or more "Request for Comments" starting with the very early days of the Arpanet, the network from which the Internet was created.

This particular snippet shows what the "LastLine" should be. The Lastline ::= means that we're going to define a new part of the protocol. On the right-hand side of the ::= is the actual definition: '.' CR-LF. This means that the last line should be a period (the '.') followed by a CR-LF. CR-LF is defined earlier; it's an ASCII carriage return followed by a line feed.

And now the big question: how many menus actually follow this pattern? The answer, of course, is "most" but also "but not all". Here are the numbers from a recent Gopher crawl of part of the Gopherverse.

Numerically, the results are:


Correct LastLine 1453 69.26%
No LastLine 633 30.17%
Dot and then close 10 0.48%
Dot then CR or LF 2 0.10%


Conclusion: if you're writing a Gopher parser, you have to handle the presence or absence of the dotted line, and several different ways the last line can be messed up.
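
Here's a rough C++ sketch of sorting a raw menu into those four buckets. The enum and function names are made up for this example, and the exact rules the crawler used for each bucket are an assumption; in particular, the sketch only counts a dot as a last line when it sits at the start of a line.

```cpp
#include <cstddef>
#include <string>

enum class LastLine { Correct, Missing, DotThenClose, DotThenBareCrOrLf };

// True when the dot at dotIndex starts a line (or starts the data).
static bool startsLine(const std::string& menu, std::size_t dotIndex) {
    return dotIndex == 0 || menu[dotIndex - 1] == '\n' || menu[dotIndex - 1] == '\r';
}

// Look only at the tail of the raw bytes received from the server.
LastLine classifyLastLine(const std::string& menu) {
    const std::size_t n = menu.size();
    // '.' CR LF, exactly as RFC 1436 asks for.
    if (n >= 3 && menu.compare(n - 3, 3, ".\r\n") == 0 && startsLine(menu, n - 3)) {
        return LastLine::Correct;
    }
    // '.' followed by a lone CR or a lone LF.
    if (n >= 2 && menu[n - 2] == '.' && (menu[n - 1] == '\r' || menu[n - 1] == '\n') &&
        startsLine(menu, n - 2)) {
        return LastLine::DotThenBareCrOrLf;
    }
    // '.' and then the connection simply closes.
    if (n >= 1 && menu[n - 1] == '.' && startsLine(menu, n - 1)) {
        return LastLine::DotThenClose;
    }
    return LastLine::Missing;
}
```

A parser can strip whichever form it finds and then treat what is left as ordinary menu lines.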


Wednesday, March 6, 2019

Gopher: Carriage Returns, Line Feeds and Tabs (oh my)

Part of the Gopher menu screens (aka directory listing) is that the protocol carefully specifies the line endings (CR-LF, a carriage-return followed by line-feed). The last line should be just a period (.) followed by a CR-LF. Each line is supposed to have exactly three tabs so that a single directory entry is

Type User_Name Selector Host Port
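
As a sketch of what that layout means for a parser, here's some C++ that splits one menu line (with its CR-LF already removed) into those fields. The struct and function names are invented for this example. The port is kept as text because real-world menus don't always put a number there, and any columns after the fourth are kept rather than rejected.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for one directory entry.
struct MenuEntry {
    char type = '\0';
    std::string userName;
    std::string selector;
    std::string host;
    std::string port;                // kept as text; may not be numeric
    std::vector<std::string> extra;  // any columns after the port
};

// Split on tab characters, keeping empty fields.
std::vector<std::string> splitOnTabs(const std::string& text) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t tab = text.find('\t', start);
        if (tab == std::string::npos) {
            fields.push_back(text.substr(start));
            break;
        }
        fields.push_back(text.substr(start, tab - start));
        start = tab + 1;
    }
    return fields;
}

// Returns false when the line is empty or has fewer than three tabs.
bool parseMenuLine(const std::string& line, MenuEntry& entry) {
    if (line.empty()) return false;
    const std::vector<std::string> fields = splitOnTabs(line.substr(1));
    if (fields.size() < 4) return false;
    entry.type = line[0];
    entry.userName = fields[0];
    entry.selector = fields[1];
    entry.host = fields[2];
    entry.port = fields[3];
    entry.extra.assign(fields.begin() + 4, fields.end());
    return true;
}
```

For example, the line "0About this server", tab, "/about.txt", tab, "gopher.example.org", tab, "70" (a made-up host) parses into type '0' with the four fields filled in.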

Let's see, from the current Gopher survey data, how many menu (directory) pages match these requirements!

As an FYI, CR, carriage return and \r all refer to the same character. The same goes for LF, line feed and \n. Just to make life extra confusing, in the C programming language a string with a \n in is often called a new-line and will be "expanded" to a \r\n on some operating systems depending on how the file is written.

First let's check out menus with incorrect line endings. Out of 2098 menus with some data,

  • 93% (1941) were completely correct; all lines ended with CR LF
  • 5% (95) ended with just LF and not CR
  • 3% (54) had a mix of CRLF and either LF or CR line endings
  • .3% (7) are a confusing mix of line endings
  • 1 had no line endings at all. This data was seemingly garbage but might be a TN3270 telnet session. Or it might not be!
  • 0 ended with just CR and no LF
Just for fun I also counted up the number of menus that included any LF CR pairs (where the developer got the line endings the wrong way around). There are 9 such menus.

This mix of line endings makes life for Gopher clients more complicated, of course :-(
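
Here's a small C++ sketch of counting the different line endings in a menu. The struct and function names are made up for this example, and it isn't the code used for the survey. A client can use counts like these to decide how tolerant its line splitter needs to be.

```cpp
#include <cstddef>
#include <string>

// Hypothetical tally of the line endings found in one menu.
struct EndingCounts {
    int crlf = 0;    // CR immediately followed by LF
    int bareLf = 0;  // LF with no CR in front of it
    int bareCr = 0;  // CR with no LF after it
};

EndingCounts countLineEndings(const std::string& data) {
    EndingCounts counts;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] == '\r') {
            if (i + 1 < data.size() && data[i + 1] == '\n') {
                ++counts.crlf;
                ++i;  // skip the LF that belongs to this pair
            } else {
                ++counts.bareCr;
            }
        } else if (data[i] == '\n') {
            ++counts.bareLf;
        }
    }
    return counts;
}

// A menu is completely correct when it has at least one CR LF
// ending and no bare CR or bare LF anywhere.
bool allCrLf(const EndingCounts& counts) {
    return counts.crlf > 0 && counts.bareLf == 0 && counts.bareCr == 0;
}
```

Note that a backwards LF CR pair shows up here as one bare LF plus one bare CR.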

Next up: that TAB analysis I promised at the start of this post!


Tuesday, March 5, 2019

Directory entry says what? Current Gopher type field types

Ready to dig into the details of the modern Gopher ecosystem?  

The Gopher network protocol is like the precursor to modern HTML web browsers. Like HTML, a Gopher client can display links and text, download files, and display images. Unlike HTML, a Gopher client is very spare, with no opportunity for colorful, interactive displays.

A typical Gopher screen is a directory, a menu-like list of lines, each of which does one thing. A line might display some text, or be a link to a file to view or download, or an image, or might be  a link to another Gopher directory, possibly on another server.  
The Amadeus Gopher Server, served up by my own Simple Gopher Client

I started looking more into modern Gopher sites and how they actually work when creating my own Simple Gopher Client for the Windows Store.

The Gopher RFC 1436 lists 14 different directory entry type fields, each of which is given a single-character identifier. 0, for example, means that the entry refers to a text file that can be displayed; 1 stands for a link to another Gopher directory. The 'g' type field is for a GIF file and 'I' is for an image type (but the type isn't explicitly given). Uniquely, directory entries can point to other protocols: '8' for a Telnet server, '2' for a particular phone-lookup protocol, '7' for a search engine, and 'T' for an IBM 3270-style terminal connection.

Over the years other type fields have been informally added to the list. I recently did a crawl of Gopher space as it existed in February 2019 to see which directory entry type fields are actually in use.
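The counting step of such a crawl can be quite small. This is a sketch of the general idea only, not the code I used for the crawl: `tally_type_fields` is a hypothetical helper, and it assumes the menus have already been downloaded and decoded to text.

```python
from collections import Counter
from typing import Iterable


def tally_type_fields(menus: Iterable[str]) -> Counter:
    """Count the first character of every directory line in the given menus."""
    counts: Counter = Counter()
    for menu in menus:
        # str.splitlines() accepts CR LF, LF and CR as line breaks.
        for line in menu.splitlines():
            if line == ".":
                break  # a lone period marks the end of a menu
            if "\t" not in line:
                continue  # skip lines that cannot be directory entries
            counts[line[0]] += 1
    return counts


if __name__ == "__main__":
    # Made-up menus; example.org is a placeholder host.
    menus = [
        "iWelcome\tfake\t(NULL)\t0\r\n1Docs\t/docs\texample.org\t70\r\n.\r\n",
        "0Readme\t/readme.txt\texample.org\t70\n",
    ]
    for type_field, count in tally_type_fields(menus).most_common():
        print(type_field, count)
```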

Many Gopher files are served up with a generic type field rather than a more precise one. Type 9, "Binary file", is the most common, followed by type 'd', document (a modern addition that's not part of the official Gopher spec). These two account for 87% of the non-image files served up by Gopher. The third most common type is type 5, DOS Binary, followed by BinHex, PDF and UUEncoded.


Something similar happens with image formats. The generic 'I' image type field is used about 30 times more often than the more specific 'g' GIF type field.
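Both figures can be re-derived from the counts in the table at the end of this post. The sketch below assumes that "non-image files" means types 9, d, 5, 4, P and 6; that grouping is a guess based on the types named in the paragraph above and is not spelled out anywhere.

```python
# Counts copied from the type field table at the end of this post.
FILE_COUNTS = {
    "9": 4799,  # File (binary)
    "d": 1590,  # File (document)
    "5": 631,   # File (DOS Binary)
    "4": 223,   # File (BinHex)
    "P": 26,    # PDF File
    "6": 3,     # File (UUEncoded)
}
IMAGE_COUNTS = {"I": 3300, "g": 102}

# Share of the two generic types among the assumed non-image file types.
generic_share = (FILE_COUNTS["9"] + FILE_COUNTS["d"]) / sum(FILE_COUNTS.values())

# How many generic 'I' entries there are for every specific 'g' entry.
image_ratio = IMAGE_COUNTS["I"] / IMAGE_COUNTS["g"]

if __name__ == "__main__":
    print(f"types 9 and d: {generic_share:.1%} of non-image files")
    print(f"'I' entries per 'g' entry: {image_ratio:.1f}")
```

With the table's numbers the share comes out a little under 88% and the ratio at roughly 32, consistent with the figures quoted above.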


Looking at the most popular field types, the 0=file, i=information and 1=directory are the top three field types by far, accounting for about 90% of the field types. 




Some of the original field types are hardly present. The 'T' type field, which indicates an IBM TN3270-style interaction, is entirely missing. The '2' CSO phone book lookup type field is present on just 4 pages in total, and most of those seem to be samples of what a CSO phone book would look like, not real phone books. There are actual type '3' error entries and, no surprise, they seem to come from scripts that generate Gopher pages correctly reporting their errors. There are also no Duplicated Server '+' type field entries.


(Note: I excluded from these numbers any test pages whose purpose is to validate Gopher clients.)

Type Fields (Alphabetical order) 
I'll finish this blog post with a handy table of existing Gopher type fields. Every type field that was found at least 10 times, or is part of the official RFC, is listed here.

Field  Count  Type                            Status
;      11     Video
+      0      Duplicated Server               RFC
0      60976  File                            RFC
1      29335  Directory                       RFC
2      5      CSO Phone                       RFC
3      36     Error                           RFC
4      223    File (BinHex)                   RFC
5      631    File (DOS Binary)               RFC
6      3      File (UUEncoded)                RFC
7      257    Index-search server (Veronica)  RFC
8      479    Telnet                          RFC
9      4799   File (binary)                   RFC
D      12     Some kind of binary file?
d      1590   File (document)
g      102    Image (gif)                     RFC
H      4
h      3914   HTML Link
I      3300   Image                           RFC
i      13216  Information
M      115    Mail file?
P      26     PDF File
p      15     Image (PNG)
s      278    Sound
T      0      IBM TN3270                      RFC
w      9      Wiki edit link