Wednesday, July 18, 2007

Command-line parsing: more details

In the June 26, 2007 post "CMD.EXE Parsing - Splitting into Arguments" I listed the rules that Microsoft's C RTL uses when console apps split their command line into arguments. Rule number two was "2. The first argument (the program name) is parsed specially".

The actual rule for the first argument isn't terribly complex. As with regular arguments, you're either in 'inquote' mode or not; while in 'inquote' mode the only thing that ends the argument is the NUL character (but you can switch out of 'inquote' mode). Unlike the regular rules, there are no escape characters -- a backslash is just a backslash, and there's no way to embed a double-quote. The double-quotes that switch in and out of 'inquote' mode are, of course, not considered part of the argument.

When not in 'inquote' mode, a space or tab ends the argument.
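
To make the rule concrete, here is a minimal sketch of it in C. This is my own illustration, not the code from the C RTL: the function name parse_program_name, its parameters, and the choice to return a pointer to the character that stopped the scan are all assumptions made for the example.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not the RTL's own routine): copies the program
 * name from cmdline into out, which holds outsize bytes including the
 * terminating NUL. Returns a pointer to the first character that was
 * not consumed: the space or tab that ended the name, or the NUL.
 */
const char *parse_program_name(const char *cmdline, char *out, size_t outsize)
{
    int inquote = 0;   /* nonzero while inside double-quotes */
    size_t n = 0;      /* number of characters stored in out so far */
    const char *p = cmdline;

    while (*p != '\0') {
        char c = *p;

        if (c == '"') {
            /* A double-quote only toggles the mode; it is never copied. */
            inquote = !inquote;
            p++;
            continue;
        }

        /* Outside quotes, a space or tab ends the program name. */
        if (!inquote && (c == ' ' || c == '\t'))
            break;

        /* Everything else, backslashes included, is copied literally.
           Characters that do not fit in out are silently dropped. */
        if (outsize > 0 && n + 1 < outsize)
            out[n++] = c;
        p++;
    }

    if (outsize > 0)
        out[n] = '\0';
    return p;
}
```

With this sketch, the command line "C:\Program Files\app.exe" arg1 (quotes included) yields the name C:\Program Files\app.exe, and my" "prog.exe arg yields my prog.exe, because the quoted space is kept while the quotes themselves are dropped.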

By the way -- Microsoft actually publishes the underlying source code for all of this. When you compile in DEBUG mode, you can even set breakpoints in the C RTL startup code that runs before your own code. In the "Visual Studio 8" directory (which is what they call 'Visual Studio 2005'), look for VC\crt\src\stdargv.c