WIN32S: a programmer's dream from the Windows 3.1 era
Gather 'round, young 'uns, and let me share some wisdom from the days of the Win32s compatibility library. In those days Windows was a 16-bit system, meaning that pointers were short and it was painful to address much memory. I'll also share a key lesson about technical decisions: how to make an early decision whose ramifications you can't yet know.
Quick aside: the people who made the 8086, the compilers, the PC and the OS knew this, so pointers were "actually" 20 bits (1 meg) thanks to segment registers (which became "selectors" in protected mode) and the corresponding weird compiler switches for near and far pointers. Plus there was weird back-compat with the "A20" gate, and of course there was additional weirdness for bank switching.
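The arithmetic behind those "20 bits" is simple: in the 8086's real mode, an address is a 16-bit segment plus a 16-bit offset, and the CPU computes segment × 16 + offset. The helper below is a hypothetical illustration of that calculation only (it models real mode, not the protected-mode selectors Windows 3.1 used), including the wraparound at 1 MB that the A20 gate controls on later CPUs.

```python
def real_mode_address(segment: int, offset: int, a20_enabled: bool = False) -> int:
    """Physical address for an 8086-style segment:offset pair (hypothetical helper).

    The 16-bit segment is shifted left 4 bits (multiplied by 16) and added to
    the 16-bit offset. On the 8086, results past 1 MB wrap around to zero;
    on later CPUs the A20 gate decides whether that extra address bit is kept.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    linear = (segment << 4) + offset
    # With A20 masked off, only the low 20 bits survive (8086 behaviour).
    return linear if a20_enabled else linear & 0xFFFFF


if __name__ == "__main__":
    print(hex(real_mode_address(0x1234, 0x0005)))                    # 0x12345
    print(hex(real_mode_address(0xFFFF, 0x0010)))                    # 0x0 (wrapped)
    print(hex(real_mode_address(0xFFFF, 0x0010, a20_enabled=True)))  # 0x100000
```

The last two lines show why A20 needed a back-compat shim at all: some old code relied on the 8086 wraparound, so later PCs had to be able to switch that 21st address line off.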
Enter the RS/1 statistical program, which I helped port from its VAX/VMS and Unix heritage to the PC running Windows 3.1 and NT. It was a "big" program that had been in active development by a team of capable developers for ten years or more, with a ton of features. It had originally been designed to work on constrained computers like the PDP-11 (also a 16-bit machine with its own pointer weirdness).
A key early decision was to use the Win32s library on Windows 3.1. This library let us use 32-bit pointers in our code, which was a big plus. The catch was that we had no way to know what its downsides might be. Marketing material then, as now, talks big about how great new technology is and says little about possible issues.
But there was an incredibly tiny flaw in the Win32s implementation of "unlink" that was to have enormous implications. To explain the bug, you have to know two things:
Firstly, the unlink call is used to "delete" a file. It's called unlink because technically the file isn't "deleted": it's merely removed from a directory. If the file is only present in one directory and no program is using it, the file is also deleted.
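To make the "removed from a directory" point concrete, here is a small sketch using Python's os module, whose os.link and os.unlink are thin wrappers over the POSIX link() and unlink() calls. It assumes a POSIX-style filesystem that supports hard links; the file names are made up for illustration.

```python
import os
import tempfile
from typing import Tuple


def unlink_one_of_two_names() -> Tuple[int, bytes]:
    """Give one file two directory entries, unlink one of them, and return
    the surviving entry's link count plus the data read through it."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first.dat")
        second = os.path.join(d, "second.dat")
        with open(first, "wb") as f:
            f.write(b"table data")
        os.link(first, second)          # a second name for the same file
        os.unlink(first)                # removes the name "first.dat" only
        with open(second, "rb") as f:   # the data is still reachable
            data = f.read()
        return os.stat(second).st_nlink, data


if __name__ == "__main__":
    print(unlink_one_of_two_names())    # (1, b'table data')
```

Only when the last name is unlinked (and no program still has the file open) does the system actually free the file's storage.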
Secondly, RS/1 kept track of every "table" of data as a separate file, and it used tables for pretty much everything. This was actually awesome, and I think more programming environments would benefit from a table-first approach. Some tables were permanent, but others were "temporary" and might live only in memory or might have a backing file, depending on the available memory. Remember that RS/1 was designed for low-memory environments, so this automatic shuffling of data between disk and memory was a critical part of the program.
The bug
The bug in unlink was that calling it on a file that didn't exist returned the wrong value. Specifically, it returned success, whereas POSIX requires failure (a return value of -1 with errno set to ENOENT). This was critical to the underpinnings of RS/1: it's how the program knew whether a temporary table was just in memory or was actually backed by a file on disk. Because of the wrong return value, some internal bookkeeping got confused, which would eventually cause a crash.
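RS/1's actual code isn't shown here, so the following is only a hypothetical sketch of the pattern: a cleanup routine that decides where a temporary table lived by trying to unlink its backing file. In Python, os.unlink reports the POSIX ENOENT failure by raising FileNotFoundError instead of returning -1. The names release_temp_table and win32s_style_unlink are made up, and the second one merely simulates the Win32s behaviour described above.

```python
import os
import tempfile
from typing import Callable


def release_temp_table(path: str, unlink: Callable[[str], None] = os.unlink) -> str:
    """Drop a temporary table's backing file and report where the table lived.

    Hypothetical bookkeeping: rather than tracking whether the table was ever
    spilled to disk, try to unlink its backing file and let the result decide.
    A conforming unlink fails with ENOENT (FileNotFoundError in Python) when
    there is no such file.
    """
    try:
        unlink(path)
    except FileNotFoundError:
        return "memory"  # no backing file: the table only ever lived in RAM
    return "disk"        # a backing file existed and has now been removed


def win32s_style_unlink(path: str) -> None:
    """Simulated bug: report success even when the file does not exist."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass  # the bug: the missing-file error is swallowed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        never_spilled = os.path.join(d, "never_spilled.tmp")
        print(release_temp_table(never_spilled))                       # memory
        print(release_temp_table(never_spilled, win32s_style_unlink))  # disk (wrong)
```

With the buggy unlink, an in-memory table is misreported as having been on disk, which is exactly the kind of quiet bookkeeping error that only surfaces as a crash much later.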
This was caught, BTW, by the incredibly good $systemtest() function that RS/1 shipped with. Any user, at any time, could run $systemtest() and the program would do a pretty solid job of verifying that it was all running correctly.
On a Windows 3.1 machine, the program crashing also meant that the whole computer crashed, which was not ideal for debugging. I finally tracked it down by running RS/1 in a debugger on two machines: one running Windows 3.1 (where it used the Win32s library and crashed) and one running Windows NT (where the native implementation worked perfectly). I then did a sort of binary search, tracing where the two systems diverged as each ran $systemtest().
Key takeaways for making early technical decisions
Bet on the future, not the past. We could have just made a 16-bit app, but the future of Windows was clearly 32-bit.
Everything that isn't mainstream has bugs. Your schedule should include time for them. A constant in my technical career is that libraries that don't get much use have bugs, and there isn't much management resolve to fix them. In our case, the mainstream was either 16-bit code on Windows or 32-bit code on VMS or Unix. Win32s was a weird little library (as was the "PharLap" DOS extender that an earlier PC version used).
Workarounds are better than hoping for a bugfix. If your library has an issue, it's best to figure out a workaround. Waiting for a fix that might never come will delay shipping, possibly forever.
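As one illustration of such a workaround (a sketch under stated assumptions, not necessarily what RS/1 actually did), you can restore the POSIX contract yourself by checking for the file before delegating to the untrustworthy call. Both function names below are hypothetical, and buggy_unlink only simulates the Win32s behaviour.

```python
import errno
import os
import tempfile
from typing import Callable


def buggy_unlink(path: str) -> None:
    """Stand-in for the Win32s behaviour: never reports a missing file."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


def checked_unlink(path: str, unlink: Callable[[str], None] = buggy_unlink) -> None:
    """Restore the POSIX contract on top of an unlink that wrongly reports
    success for missing files: check first, then delegate."""
    # lexists() is also true for broken symlinks, matching what unlink() removes.
    if not os.path.lexists(path):
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), path)
    unlink(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        try:
            checked_unlink(os.path.join(d, "missing.tmp"))
        except FileNotFoundError as e:
            print("caught:", e.strerror)
```

The check and the unlink are not atomic, so another process could create or delete the file in between. For a program's own private temporary files that race is usually acceptable; for shared files it would need more care.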