A blast from the past: Disassembling DOS

Andrew Schulman

from Undocumented DOS: A Programmer's Guide to Reserved MS-DOS Functions and Data Structures (2nd edition, 1994)

Copyright (c) Andrew Schulman 1994-2020. All rights reserved.

[This nearly-ancient text (along with others from Undocumented DOS and Undocumented Windows) is being presented as a case study in some methodologies of software reverse engineering, applied to mass-market software. Note that this chapter appeared in the 2nd edition of the book, not in the 1st edition.]

The previous chapter showed that it is possible to discover a lot about a program without resorting to what is often called reverse engineering. Simply by examining a program's outward behavior, a utility such as INTRSPY shows, for example, that Windows uses the undocumented DOS Get SysVars function, and that Microsoft's QuickC makes the weird SetPSP(0) and SetPSP(-1) calls that are discussed in chapter 4.

But such external examination of a program's behavior can take us only so far. INTRSPY can't tell us why Windows calls Get SysVars (that is, which fields it uses in the SysVars data structure), nor can INTRSPY tell us why QuickC passes the illegal values 0 and -1 to the DOS Set PSP function. To figure out why a program behaves in a certain way, you need to actually get inside the program. This requires disassembly.

Disassembly is particularly important to understanding what goes on inside MS-DOS itself. What does DOS actually do when a program calls the Get SysVars function, for example? How does DOS carry out an INT 21h AH=4Bh EXEC request? How do DOS 5.0 and 6.0 interact with Windows? To answer questions like these, there's no substitute for looking at the DOS code. Though Microsoft does produce a DOS OEM Adaptation Kit (OAK) that we discuss later in this chapter, source code to MS-DOS is not widely available. For those of us without the DOS source code, understanding DOS requires disassembling it.

The goal of this chapter is to acquire an understanding of DOS internals, that is, to get an intuitive feel for what goes on when a program makes an INT 21h DOS call. Chapter 2 briefly presented a disassembly of two DOS functions, INT 21h AH=0Eh (Set Default Drive; see listings 2-7 and 2-8) and INT 21h AH=19h (Get Default Drive). But how did we find the code for these functions in the first place? A key purpose of this chapter is to present a close look at the key part of MS-DOS, the INT 21h handler, with its function dispatch table, which contains pointers to the code that handles each individual INT 21h function. Armed with this table, you can readily consult the code for any particular DOS function whose implementation interests you. You can apply the same technique to other pieces of code, such as DR DOS or the INT 21h hook in Novell NetWare's NETX.COM (see chapter 4).

The resident DOS code is found in two files, IO.SYS and MSDOS.SYS—sometimes named IBMBIO.COM and IBMDOS.COM. DOS 6.0 and higher also has DBLSPACE.BIN, which Microsoft usually considers a third member of the DOS kernel. While there are various ways to examine the code in these files on disk, this chapter instead examines the INT 21h handler in memory, using Microsoft's own DEBUG, a primitive though handy tool that comes with MS-DOS.

Part of the reason for using DEBUG, rather than a more sophisticated debugger or disassembly tool, is to underline the point that Microsoft itself provides the means for reverse engineering DOS. Since programmers frequently have questions about the legalities of disassembly, this chapter also briefly discusses the law surrounding reverse engineering and trade secrets.

Of course, there is more to DOS than just IO.SYS and MSDOS.SYS. We also look briefly at the disassembly of external programs such as COMMAND.COM, MSCDEX.EXE, and PRINT.COM, which is probably the most heavily disassembled DOS utility and the one on which many TSR writers figured out their craft.

Whether or not you disassemble DOS depends of course on what interests you. The examination of the INT 21h dispatch code in this chapter may provide all you ever wanted to know about how DOS functions internally. On the other hand, if you absolutely, positively must know exactly what is going on inside MS-DOS and you have the money to pay for this information, you may want to license Microsoft's DOS OEM Adaptation Kit, which includes assembly language and C source code for many parts of DOS, as well as .OBJ files with full symbolic information for those parts where direct source code is not provided. We take a quick look at the OAK contents later on.

What is MS-DOS?

MS-DOS is a bit like pornography. Everyone knows what it is when they see it, but almost no one can define it.

First of all, MS-DOS is not the C> prompt. While that infamous user interface seems practically synonymous with MS-DOS, it is not actually a necessary part of DOS. The C> prompt is provided by COMMAND.COM, which (as chapter 10 explains in more detail) anyone can easily replace. As indicated by the shell= statement in CONFIG.SYS, COMMAND.COM is just a shell around the DOS kernel. Other shells, such as 4DOS or the MKS Korn shell, are widely available. Get rid of COMMAND.COM, and you still have MS-DOS.

From a programmer's perspective, MS-DOS seems like a collection of INT 21h functions. But this isn't quite accurate either. While the INT 21h functions are the most important service provided by DOS, DOS and INT 21h are not synonymous. Several application wrappers in chapter 2 (listings 2-20 and 2-21) already showed how easy it is for a normal program to fiddle with INT 21h calls before or after DOS itself gets them. That a piece of code handles INT 21h doesn't necessarily make it part of DOS.

So if DOS ain't necessarily the C> prompt or the INT 21h interface, what then is it? And where is it?

The "what" part is difficult to answer, except to note that DOS is in many ways what textbooks on operating systems call a microkernel. DOS provides a small bare minimum of services, on top of which other, more sophisticated, services can be built. Think of DOS as a software motherboard, into which the user is free to plug in various extensions. These extensions come not only from Microsoft but also from key third-party vendors such as Novell, Quarterdeck, Qualitas, Symantec, Central Point, and Phar Lap. DOS is the arena in which all these companies' products must both compete and work together.

Well, that was vague enough!

Mercifully, the "where" part at least is easy to answer. MS-DOS consists of two files, IO.SYS and MSDOS.SYS. In both IBM PC-DOS and Novell's DR DOS, these files are called IBMBIO.COM and IBMDOS.COM. Despite the .SYS file names, these are not device drivers, but binary images. In MS-DOS 6.0, there is a third file, DBLSPACE.BIN, which Microsoft generally considers a full-fledged third member of the DOS kernel—the SYS and FORMAT /S commands in DOS 6.0 copy DBLSPACE.BIN over to a floppy, along with IO.SYS and MSDOS.SYS. Take these two or three files, and you've got DOS. Of course, you'll also need a shell such as COMMAND.COM in order to get much work done.

Among other things, MSDOS.SYS contains the DOS dispatch function, which is DOS's handler for INT 21h calls. There are other DOS functions, such as INT 25h, 26h, and 2Fh, that MSDOS.SYS and IO.SYS handle as well.

IO.SYS consists of two parts, a loader (MSLOAD.COM) and BIOS support code (MSBIO.BIN); Microsoft creates IO.SYS by concatenating these two files:

      copy /b msload.com+msbio.bin io.sys

IO.SYS is not "the BIOS," as books on DOS programming frequently claim, but merely the DOS interface to the BIOS. IO.SYS contains the standard device drivers such as CON, AUX, LPT1, and COM1 (see chapter 7). These device drivers are implemented using BIOS calls. For example, the CON driver built into IO.SYS (more precisely, MSBIO.BIN) makes INT 10h and INT 16h calls to the ROM BIOS video and keyboard routines.

The MSLOAD.COM portion of IO.SYS contains a famous set of routines called SYSINIT, which is responsible for the bootstrap loading of DOS.

We won't discuss SYSINIT here, as it has already been covered elsewhere (see "How MS-DOS Is Loaded" in chapter 2 of Ray Duncan's Advanced MS-DOS Programming, and "The Components of MS-DOS" in Duncan's MS-DOS Encyclopedia). And practically every other book on DOS programming seems to repeat this same basic material on SYSINIT. Presumably this is not just because the bootstrap loading of DOS is an interesting subject, but also because Microsoft already documents SYSINIT in the DOS OAK. Geoff Chappell provides a far more original and useful description of DOS startup in his DOS Internals, chapters 1 ("The System Configuration"), 2 ("The System Footprint"), and 3 ("The Startup Sequence"). For example, Chappell is the first author to make the connection between SYSINIT and the List of Lists structure (whose actual name in the DOS source code is SysInitVars).

So the DOS boot sequence is fairly well known. What hasn't been provided before, amazingly, is any description of what DOS looks like once it is up and running. This primarily requires a description of DOS's INT 21h handler and the INT 21h dispatch table. In other words, what code runs when you make an INT 21h call to DOS? Scores of DOS programming books of course describe what this or that DOS function call does, but few describe how any of these function calls work; and none to our knowledge (aside from a brief discussion of DOS stack switching in Microsoft's MS-DOS Encyclopedia, pp. 353-355) describes the DOS function call mechanism itself. This seems far more important than providing yet another standard description of how DOS boots up or how SYSINIT moves segments around in memory.

One of our tech reviewers writes that "parts of the boot sequence are NOT well known! In DOS 6.0 and up, there's the mechanism that IO.SYS uses to load DBLSPACE.BIN. And in DOS 7.0 (Chicago), if CONFIG.SYS contains the setting DOS=ENHANCED, there is code in IO.SYS that loads DOS386.EXE, which is a big executable similar to WIN386.EXE."

Disassembling IO.SYS and MSDOS.SYS

The choice between describing SYSINIT or describing the INT 21h handler is an important one, because the portion of DOS which one is interested in looking at largely determines how one goes about disassembling DOS.

To look at DOS initialization, you either have to acquire the DOS OAK (which provides assembly language source code to IO.SYS, including the SYSINIT modules), or you have to disassemble the actual IO.SYS and MSDOS.SYS files on disk. These files are hidden system files, which however can be easily unhidden:

      C:\UNDOC\CHAP6 > attrib -h -s \*.sys

IO.SYS is about 32K, and MSDOS.SYS is about 37K. Once unhidden, these two files can be disassembled, even with the u (unassemble) command in the primitive DEBUG utility that comes with DOS. After running ATTRIB to unhide MSDOS.SYS or IO.SYS, type DIR to find the file's size. DEBUG loads the file at address 100h, so add 100h to the file size (converted to hexadecimal) to yield the disassembly end-range. For example, if MSDOS.SYS is 37,506 (9282h) bytes:

    C:\UNDOC2\CHAP6>type msdos.scr
    u 0100 9382
    q

    C:\UNDOC2\CHAP6>debug \msdos.sys < msdos.scr > msdos.lst
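
The end-range arithmetic is easy to get wrong by hand. As a minimal sketch (the helper names `debug_end_offset` and `format_unassemble_cmd` are ours, not part of DEBUG or DOS), the following portable C performs the same calculation for any file small enough to fit in one 64K segment:

```c
#include <stdio.h>

/* Hypothetical helper: DEBUG loads a file at offset 100h, so the
   end of the disassembly range is the file size plus 100h.
   Returns 0 if the result would not fit in a 16-bit offset. */
unsigned debug_end_offset(unsigned long file_size)
{
    unsigned long end = file_size + 0x100UL;
    if (end > 0xFFFFUL)
        return 0;
    return (unsigned) end;
}

/* Writes the one-line "u" command for a DEBUG script into buf and
   returns the number of characters snprintf reports. */
int format_unassemble_cmd(char *buf, size_t len, unsigned long file_size)
{
    return snprintf(buf, len, "u 0100 %04X", debug_end_offset(file_size));
}
```

For the 37,506-byte MSDOS.SYS used in the example, `debug_end_offset` returns 9382h, which matches the range typed into MSDOS.SCR.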

The resulting MSDOS.LST is about one megabyte in size; if you use a disassembler such as Sourcer, the file is about 800K. In some ways, the output from such a straightforward disassembly of MSDOS.SYS looks quite useful. For example, you can quite plainly see DOS's INT 21h handler inspecting the caller's function number in AH. This is the DOS code called whenever a program generates an INT 21h:

    6A76:040B FA            CLI
    6A76:040C 80FC6C        CMP AH,6C       ; is function > 6Ch?
    6A76:040F 77D2          JA  03E3        ; yes: error
    6A76:0411 80FC33        CMP AH,33
    6A76:0414 7218          JB  042E
    6A76:0416 74A2          JZ  03BA
    6A76:0418 80FC64        CMP AH,64
    ; ... etc. ...

Likewise the MSDOS.SYS INT 2Fh handler is also visible. IO.SYS has its own INT 2Fh handler, and in the last line of the code fragment below, you can see the INT 2Fh handler in MSDOS.SYS jump to the one in IO.SYS, using a hard-wired address:

    1C53:07B9 FB            STI
    1C53:07BA 80FC11        CMP AH,11
    1C53:07BD 750A          JNZ 07C9
    ;;; Go to 07BFh if an INT 2Fh call belonging to an external
    ;;; program such as a redirector, SHARE, or NLSFUNC, ends up
    ;;; in MSDOS.SYS. This means the external program isn't loaded.
    1C53:07BF 0AC0          OR  AL,AL       ; is AL=0?
    ; ... error handling ...
    1C53:07C9 80FC10        CMP AH,10       ; INT 2Fh AH=10h? (SHARE)
    1C53:07CC 74F1          JZ  07BF        ; got here, so SHARE not loaded
    1C53:07CE 80FC14        CMP AH,14       ; INT 2Fh AH=14h? (NLSFUNC)
    1C53:07D1 74EC          JZ  07BF        ; got here, so NLSFUNC not loaded
    1C53:07D3 80FC12        CMP AH,12       ; INT 2Fh AH=12h?
    1C53:07D6 7503          JNZ 07DB
    1C53:07D8 E99701        JMP 0972        ; handle DOS internal functions
    1C53:07DB 80FC16        CMP AH,16       ; INT 2Fh AH=16h? (Windows)
    1C53:07DE 740D          JZ  07ED        ; might be Windows broadcast
    1C53:07E0 80FC46        CMP AH,46       ; INT 2Fh AH=46h? 
    1C53:07E3 7503          JNZ 07E8
    1C53:07E5 E93E01        JMP 0926
    1C53:07E8 EA05007000    JMP 0070:0005   ; see if IO.SYS wants it
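
The compare-and-branch chain in this fragment can be read as a switch on AH. The following portable C sketch restates only what the listing shows; the enum labels and the function name `route_int2f` are ours, and each label is annotated with the branch target it stands for:

```c
/* Labels are ours; each value names the branch taken in the
   disassembly, with its target address noted alongside. */
enum int2f_route {
    ROUTE_NOT_LOADED,   /* 07BFh: AH=10h, 11h, or 14h reached MSDOS.SYS */
    ROUTE_DOS_INTERNAL, /* 0972h: AH=12h, DOS internal functions        */
    ROUTE_WINDOWS,      /* 07EDh: AH=16h, possible Windows broadcast    */
    ROUTE_AH46,         /* 0926h: AH=46h                                */
    ROUTE_IO_SYS        /* 0070:0005: everything else goes to IO.SYS    */
};

/* Returns the route the MSDOS.SYS INT 2Fh handler takes for a given AH. */
enum int2f_route route_int2f(unsigned char ah)
{
    switch (ah) {
    case 0x10:
    case 0x11:
    case 0x14: return ROUTE_NOT_LOADED;
    case 0x12: return ROUTE_DOS_INTERNAL;
    case 0x16: return ROUTE_WINDOWS;
    case 0x46: return ROUTE_AH46;
    default:   return ROUTE_IO_SYS;
    }
}
```

Written this way, it is easier to see that any multiplex number MSDOS.SYS does not recognize is handed to the hard-wired IO.SYS address.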

But while at first this looks useful, after a few minutes it becomes clear that the quality of the unassembly is unfortunately quite poor. Much better versions of these INT 21h and INT 2Fh handlers are shown later in figures 6-7 and 6-13. For example, the most important part of the INT 21h handler uses the function number in AH as an index into a dispatch table:

    ;;; previously moved AH func number into BX
    6A76:04FE 8B9FA73E      MOV BX,[BX+3EA7]
    6A76:0502 36871EEA05    XCHG BX,SS:[05EA]
    6A76:0507 368E1EEC05    MOV DS,SS:[05EC]
    6A76:050C 36FF16EA05    CALL SS:[05EA]

Unfortunately, if you now go and look at 3EA7h, presumably the address of the all-important INT 21h function dispatch table, there turns out instead to be perfectly valid-looking code at that address, and not a table at all. Likewise, 05ECh and 05EAh are, in this context, totally bogus. This isn't a problem with DEBUG, however. A straight disassembly on disk of MSDOS.SYS or IO.SYS, even with a more sophisticated disassembler such as Sourcer, doesn't produce much better results.
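
The indexing idiom in the fragment is the usual one for a table of 16-bit near pointers: the function number is presumably doubled to form a byte offset into the table, and the word found there is the handler's offset. A minimal sketch in portable C, meant to be run against a made-up table rather than real DOS data (the names `lookup_handler` and `MAX_FUNC` are ours; 6Ch is the upper limit tested by the handler fragment shown earlier):

```c
#include <stddef.h>

#define MAX_FUNC 0x6C   /* highest function number the handler accepts */

/* Reads the 16-bit little-endian handler offset for function 'ah'
   from a dispatch table located at 'table_base' inside 'image'.
   Returns -1 for an out-of-range function or a table entry that
   would run past the end of the image. */
long lookup_handler(const unsigned char *image, size_t image_len,
                    size_t table_base, unsigned char ah)
{
    size_t entry = table_base + 2u * (size_t) ah;   /* AH * 2 */

    if (ah > MAX_FUNC || entry + 1 >= image_len)
        return -1;
    return (long) image[entry] | ((long) image[entry + 1] << 8);
}
```

If `table_base` is wrong, as 3EA7h is in the on-disk image, a lookup like this still returns a value; it is just a meaningless one, which is why the static disassembly looks plausible and yet leads nowhere.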

The problem is that the SYSINIT process (as described in the MS-DOS Encyclopedia) moves segments around in memory and relies heavily on segment arithmetic. Address cross-references often won't match up properly in a static disassembly of DOS on disk. To get a good disassembly of the core DOS interrupt handlers, it is much easier to disassemble DOS in memory, after the DOS initialization segment movement (which might include the DOS=HIGH movement of the DOS kernel to the high memory area, or HMA) is complete.

The only problem with disassembling DOS out of memory, rather than in the system files on disk, is that this misses the SYSINIT code, which is discarded from memory when the initialization is complete. However, as noted earlier, SYSINIT and the DOS bootstrap process have already been adequately covered elsewhere.

Again, a tech reviewer writes, "NO! You're forgetting all the 'preload' stuff that IO.SYS does starting in DOS 6.0. Also, taking apart IO.SYS really isn't that difficult. To link up data with the code that uses it, you just need to subtract some fixed amount, which is easy to figure out once you have one code/data pair. Just look at the code in IO.SYS that preloads DBLSPACE.BIN." Hmm, it seems we ought to take a look at this...

Examining How IO.SYS Preloads DBLSPACE.BIN

It turns out that static disassembly of IO.SYS is actually pretty easy, even though at first glance the results produced by a disassembler such as Sourcer look inadequate. It's true that references to data don't match up with the actual locations of the data in the file, but once you match up just one piece of data in the file with code that references it, you can figure out everything else.

For example, a Sourcer disassembly of IO.SYS from MS-DOS 6.0 contains the following data item:

    54BF:8138  5C 44 42 4C 53 50 41 43    db    '\DBLSPACE.BIN'
    54BF:813E  45 2E 42 49 4E 00

This is followed shortly by code that, based on the surrounding context (the code calls the INT 21h AX=4B03h Load Overlay function), is probably loading DBLSPACE.BIN. However, the code does not reference offset 8138h. Instead, it references CS:3B62h:

    54BF:8153  0E           push cs
    54BF:8154  1F           pop ds
    54BF:8155  BE 3B62      mov si,3B62h

If you subtract 3B62h from 8138h, you get 45D6h. If the code at 54BF:8155 really is referencing the '\DBLSPACE.BIN' string at offset 8138h, then 45D6h is the amount which you must add to other data references in this version of IO.SYS in order to locate the data itself. To confirm if this amount is accurate, just look for another data reference, and see if adding the amount onto it yields a likely-looking address. For example, a little further on in the file, IO.SYS produces an error message:

    54BF:81E9  0E           push cs
    54BF:81EA  1F           pop ds
    54BF:81EB  BA 5823      mov dx,5823h
    54BF:81EE  B4 09        mov ah,9
    54BF:81F0  CD 21        int 21h     ; DOS Services  ah=function 09h
                                        ;  display char string at ds:dx

From the helpful comment supplied by Sourcer on how INT 21h AH=9 works, it is clear that 5823h must be the offset within CS of a string. Adding 45D6h to 5823h yields 9DF9h and there, indeed, is the error message:

    54BF:9DF9  57 72 6F 6E 67 20    db  'Wrong DBLSPACE.BIN version', 0Dh
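
The bookkeeping behind this check is plain 16-bit arithmetic. As a small sketch in portable C (the macro and function names are ours; the constants are the offsets quoted from the Sourcer listing of this particular version of IO.SYS):

```c
/* Offsets taken from the Sourcer listing of MS-DOS 6.0 IO.SYS. */
#define STRING_IN_LISTING 0x8138u  /* where '\DBLSPACE.BIN' appears */
#define STRING_IN_CODE    0x3B62u  /* what the code loads into SI   */

/* The fixed amount to add to a data reference found in the code. */
unsigned listing_delta(void)
{
    return STRING_IN_LISTING - STRING_IN_CODE;
}

/* Maps an offset used by the code to its place in the listing,
   wrapping at 64K the way 16-bit offsets do. */
unsigned code_ref_to_listing(unsigned ref)
{
    return (ref + listing_delta()) & 0xFFFFu;
}
```

Here `listing_delta()` yields 45D6h, and `code_ref_to_listing(0x5823)` yields 9DF9h, the address at which the error message was found. A different build of IO.SYS would need its own pair of matching offsets.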

Thus, we really can pick apart IO.SYS on disk. This lets us examine the DOS boot process, in particular the recent additions such as the preloading of DBLSPACE.BIN in DOS 6 and the apparent ability to preload DOS386.EXE in DOS 7. "Preloading" means that IO.SYS looks for and loads these external programs before processing any DEVICE= statements in CONFIG.SYS. Chapter 1 discussed how Stacker 3.1 uses this interface to get itself preloaded under DOS 6. By examining IO.SYS, you can see how the interface works.

For example, after calling INT 21h AX=4B03h to load DBLSPACE.BIN, IO.SYS looks for a function pointer at offset 14h in DBLSPACE.BIN:

    54BF:819F  E8 FBD6              call    LOAD_OVERLAY   ; subr. does 21/4B03
    ; ...
    54BF:81C6  2E: C7 06 0387 0014  mov word ptr cs:[387h],14h ; get func ptr from
    54BF:81CD  2E: 8C 06 0389       mov word ptr cs:[389h],es  ;   offset 14h
    ; ...                                                      ;   in DBLSPACE.BIN

IO.SYS saves away the function pointer provided by DBLSPACE.BIN, and then calls it:

    54BF:81DA  0E                   push cs         ; IO.SYS passes DBLSPACE.BIN 
    54BF:81DB  07                   pop es          ;    a pointer to a buffer:
    54BF:81DC  BB 036A              mov bx,36Ah     ; 36Ah+45D6h=4940h (see below)
    54BF:81DF  B8 0006              mov ax,6        ; DOS version
    54BF:81E2  2E: FF 1E 0387       call dword ptr cs:[387h] ; call DBLSPACE.BIN
    ; ...                                                    ;    function ptr

    54BF:8228  BB 0004              mov bx,4                 ; subfunction 4
    54BF:822B  2E: FF 1E 0387       call dword ptr cs:[387h]
    ; ...

    54BF:4940  18 00                db   18h, 00h  ; a communications buffer

IO.SYS also checks for a 2E2Ch signature at offset 12h in DBLSPACE.BIN. A hex dump of DBLSPACE.BIN reveals the presence of this signature (stored low byte first, as 2C 2E):

    C:\UNDOC2\CHAP6>dump \dos\dblspace.bin -bytes 32
    0000 | FF FF FF FF 42 48 41 08 8B 08 01 44 42 4C 53 50 | ....BHA....DBLSP
    0010 | 41 43 2C 2E E9 B2 59 00 00 EA 41 08 00 00 EA 8B | AC,...Y...A.....
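
The same two checks can be tried against the 32 bytes in the dump. This is only a sketch in portable C: the function names are ours, the array is copied from the dump, and nothing here is taken from IO.SYS itself. The one extra fact relied on is that opcode E9h is a near JMP whose 16-bit displacement is relative to the following instruction.

```c
#include <stddef.h>

#define SIG_OFFSET   0x12     /* signature word, per the dump       */
#define ENTRY_OFFSET 0x14     /* where IO.SYS's function ptr points */
#define PRELOAD_SIG  0x2E2Cu

/* First 32 bytes of DBLSPACE.BIN, copied from the dump. */
static const unsigned char dblspace_head[32] = {
    0xFF, 0xFF, 0xFF, 0xFF, 0x42, 0x48, 0x41, 0x08,
    0x8B, 0x08, 0x01, 0x44, 0x42, 0x4C, 0x53, 0x50,
    0x41, 0x43, 0x2C, 0x2E, 0xE9, 0xB2, 0x59, 0x00,
    0x00, 0xEA, 0x41, 0x08, 0x00, 0x00, 0xEA, 0x8B
};

static unsigned read_word(const unsigned char *p)
{
    return (unsigned) p[0] | ((unsigned) p[1] << 8);  /* little-endian */
}

/* Returns 1 if the image carries the 2E2Ch signature at offset 12h. */
int has_preload_signature(const unsigned char *image, size_t len)
{
    return len >= SIG_OFFSET + 2
        && read_word(image + SIG_OFFSET) == PRELOAD_SIG;
}

/* If the entry at offset 14h is a near JMP (opcode E9h), returns the
   offset it jumps to; otherwise returns -1. */
long entry_jump_target(const unsigned char *image, size_t len)
{
    if (len < ENTRY_OFFSET + 3 || image[ENTRY_OFFSET] != 0xE9)
        return -1;
    return (long) ((ENTRY_OFFSET + 3
                    + read_word(image + ENTRY_OFFSET + 1)) & 0xFFFFu);
}
```

For the dumped bytes the signature test succeeds, and the entry at offset 14h decodes as a jump to offset 59C9h within DBLSPACE.BIN.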

Further discussion of this interface, and its possible role in the ongoing battle between Microsoft and Stac Electronics, appears in chapter 1. Here, the point is simply that all existing descriptions of the DOS boot process will need to be rewritten to take account of new additions to DOS such as DBLSPACE.BIN (and, in DOS 7, DOS386.EXE).

In any case, one topic that hasn't been covered at all is the INT 21h dispatch code, which is executed every time a program makes a DOS call (except when another program that hooks INT 21h has completely intercepted the call, without chaining). As we'll see, there are many important aspects to the INT 21h dispatch code, including stack switching, use of the current PSP, incrementing and decrementing the InDOS flag, handling of critical sections, Ctrl-Break, and critical errors, checking the machine's A20 line when DOS=HIGH, and special casing for Windows Enhanced mode.

Interrupt Vectors and Chaining

Studying DOS internals requires finding the code in DOS that handles software interrupts such as INT 21h and INT 2Fh. As we just saw, trying to do this with IO.SYS and MSDOS.SYS on disk can produce inadequate results. In memory, however, it seems like it should be trivial to find DOS's INT 21h and INT 2Fh handlers. As every PC programmer knows, there is a documented DOS function, INT 21h AH=35h, which returns (in ES:BX) a far pointer to the code that handles the interrupt given in AL.

Finding the current handlers for INT 21h and INT 2Fh is thus a simple matter of calling INT 21h AX=3521h and AX=352Fh and looking at the returned far pointer, or vector, as it is called. This can be wrapped up in a simple program to print out interrupt vectors. Add a little extra smarts, such as trying to figure out the owner of each interrupt vector and disassembling some frequently encountered instructions at the beginning of the interrupt handler, and the result is INTVECT.C, shown in listing 6-1; listing 6-2 shows MAP.C, which attempts to figure out owners.

Listing 6-1: INTVECT.C

    /* build: bcc intvect.c map.c */

    #include <stdlib.h>
    #include <stdio.h>
    #include <dos.h>

    typedef unsigned char BYTE;
    typedef unsigned short WORD;
    typedef unsigned long DWORD;

    #define MK_LIN(fp)  ((((DWORD) FP_SEG(fp)) << 4) + FP_OFF(fp))

    extern char *find_owner(DWORD lin_addr);    // in map.c

    #define ARPL    0x63
    #define IRET    0xCF
    #define JMPF    0xEA
    #define JMP8    0xEB
    #define JMP16   0xE9

    BYTE far *get_vect(int intno)   // call INT 21h AH=35h
    {
        _asm push es
        _asm mov al, byte ptr intno
        _asm mov ah, 35h
        _asm int 21h
        _asm mov dx, es
        _asm mov ax, bx
        _asm pop es
        // return value in DX:AX
    }

    void print_vect(int intno)
    {
        char *s;
        BYTE far *fp = get_vect(intno);
        printf("INT %02Xh   %Fp   ", intno, fp);
        if (fp == 0)
        {
            printf("unused\n");     // null vector: nothing to examine
            return;
        }
        s = find_owner(MK_LIN(fp));
        printf("%-08s   ", s? s: " ");

        switch (*fp)    // see if first instruction of interrupt handler
        {               // is anything really obvious
            case ARPL:  printf("arpl -- Windows V86 breakpoint"); break;
            case IRET:  printf("iret -- NOP function"); break;
            case JMP8:  printf("jmp %Fp",   // 8-bit displacement is signed
                ((BYTE far *) fp) + (signed char) fp[1] + 2); break;
            case JMP16: printf("jmp %Fp",
                ((BYTE far *) fp) + *((WORD far *) &fp[1]) + 3); break;
            case JMPF:  printf("jmp %Fp", 
                *((void far * far *) &fp[1])); break;
        }
        printf("\n");
    }

    int main(int argc, char *argv[])
    {
        char *end;
        int intno, i;
        if (argc < 2)
            for (intno=0; intno< 256; intno++)
                print_vect(intno);
        else for (i=1; i< argc; i++)
            print_vect((int) strtoul(argv[i], &end, 16));
        return 0;
    }

For example:

    C:\UNDOC2\CHAP6>intvect 21 28 29 2f
    INT 21h   C0B6:0942              
    INT 28h   18D4:0615   PRINT
    INT 29h   0070:0762   IO         
    INT 2Fh   1A82:000D   NLSFUNC
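
INTVECT hands each vector to the owner lookup as a linear address, using the MK_LIN macro. The same arithmetic, written for plain integers so that it can be checked with any C compiler (the function name `linear_address` is ours):

```c
/* Real-mode linear address: segment * 16 + offset. Both inputs are
   masked to 16 bits, as segment and offset registers would be. */
unsigned long linear_address(unsigned seg, unsigned off)
{
    return ((unsigned long) (seg & 0xFFFFu) << 4) + (off & 0xFFFFu);
}
```

For the INT 21h vector shown, C0B6:0942 works out to linear address C14A2h. The address FFFF:0010 works out to 100000h, the first byte above the one-megabyte line and so the start of the HMA.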

INTVECT and Windows

If you run INTVECT without command line parameters, it dumps out the vectors for all 256 interrupts. This is useful, for example, in determining which interrupts Windows Enhanced mode takes over; you can run INTVECT > TMP.TMP, start Windows, run INTVECT > TMP.2 from inside a DOS box, and then use diff or a similar utility to compare the files TMP.TMP and TMP.2. The difference between these two files reveals the interrupts that Windows Enhanced mode hooks using the low memory interrupt vector table (it also hooks some interrupts using the protected mode interrupt descriptor table). Where < points to the pre-Windows DOS output from INTVECT, and > points to the output under Windows, part of the output from diff might look like this (the complete output also shows changes to INT 0, 3, 8, 10h, 15h, 1Ch, 22h, 23h, 24h, 67h, and 68h):

    [diff output not preserved in this copy of the text]

INT 28h is the DOS idle interrupt, and the Virtual DMA Services (VDS) use INT 4Bh. As you can see, INTVECT examines the first byte of an interrupt handler looking for code such as the ARPL instruction, which Windows Enhanced mode uses as a V86 breakpoint, to force a transition from user (Ring 3) code to VMM (Ring 0) code. The seeming location of the Windows V86 breakpoints inside DBLSSYS$ (DoubleSpace) is misleading; this has to do with the way Windows implements V86 breakpoints (see Chappell, DOS Internals, chapter 2).
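
Besides ARPL and IRET, INTVECT recognizes handlers that begin with a JMP and prints the jump target. Relative jumps encode a displacement measured from the address of the next instruction, and the one-byte displacement of opcode EBh is signed, so a backward jump must be sign-extended. A sketch of that decoding in portable C, working on offsets into a byte buffer instead of far pointers (the function name `jmp_target` is ours):

```c
#include <stddef.h>

/* Decodes the target offset of a relative JMP found at offset 'at'
   in 'code'. Handles EBh (8-bit signed displacement) and E9h (16-bit
   displacement); the result wraps within a 64K segment. Returns -1
   for any other opcode or a buffer too short to hold the operand. */
long jmp_target(const unsigned char *code, size_t len, unsigned at)
{
    if (at >= len)
        return -1;
    if (code[at] == 0xEB && at + 1 < len) {
        int disp = code[at + 1];
        if (disp >= 0x80)
            disp -= 0x100;              /* sign-extend the byte */
        return (long) ((at + 2 + disp) & 0xFFFF);
    }
    if (code[at] == 0xE9 && at + 2 < len) {
        unsigned disp = code[at + 1] | ((unsigned) code[at + 2] << 8);
        return (long) ((at + 3 + disp) & 0xFFFF);
    }
    return -1;
}
```

The two-byte sequence EB FE at offset 0, for instance, decodes to target 0: a jump to itself.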

To build INTVECT, INTVECT.C should be linked with MAP.C (listing 6-2). MAP.C attempts to provide the owner's name for each interrupt vector, using code that is explained in detail in chapter 7 (see UDMEM.C, listing 7-XX). MAP.C will be reused with another program later in this chapter, INTCHAIN.C (listing 6-5). MAP can also be compiled with -DTESTING to produce a standalone program. For example, running MAP on one machine happened to produce the following output, which shows that this machine is running DoubleSpace, MSCDEX, SMARTDRV (loaded high), DOSKEY (also loaded high), and XMS and EMM servers:

        00000700   000009A0   IO
        000009A0   00001E80   DOS
        00001E80   00002010   D:
        00002010   00005780   MS$MOUSE
        00005780   00007EA0   MSCD001 
        00007EA0   00012FA0   DBLSSYS$
        00012FA0   000131F0   SETVERXX
        000131F0   00013670   XMSXXXX0
        00013670   00014950   EMMXXXX0
        00014950   000188A0   MSCDEX
        000189A0   0002A7E0   MAP
        000CAA30   000CBBA0   COMMAND
        000CBBD0   000D2C60   SMARTDRV
        000CDDA2   000CDDB4   M:
        000CDDB4   000DE470   J:
        000DE470   000DF4A0   DOSKEY
        00100000   0010FFEE   HMA
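
A table of this shape turns the owner question into a range search. The following portable C sketch hard-codes a few rows of the sample output; the names `struct block`, `sample_map`, and `owner_of` are ours, and the half-open range test is a choice made for this sketch so that an address on a boundary belongs to the block that starts there:

```c
#include <stddef.h>

struct block {
    unsigned long start, end;   /* linear addresses */
    const char *name;
};

/* A few rows copied from the sample MAP output. */
static const struct block sample_map[] = {
    { 0x00000700UL, 0x000009A0UL, "IO" },
    { 0x000009A0UL, 0x00001E80UL, "DOS" },
    { 0x00005780UL, 0x00007EA0UL, "MSCD001" },
    { 0x00007EA0UL, 0x00012FA0UL, "DBLSSYS$" },
    { 0x00100000UL, 0x0010FFEEUL, "HMA" }
};

/* Returns the name of the block containing lin_addr, or NULL if the
   address falls in none of the sample rows. */
const char *owner_of(unsigned long lin_addr)
{
    size_t i;

    for (i = 0; i < sizeof sample_map / sizeof sample_map[0]; i++)
        if (lin_addr >= sample_map[i].start && lin_addr < sample_map[i].end)
            return sample_map[i].name;
    return NULL;
}
```

With these rows, linear address 9A0h is reported as belonging to DOS rather than IO, and an address in a gap between rows, such as 3000h, has no owner.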

Listing 6-2: MAP.C

        bcc intvect.c map.c
        bcc intchain.c map.c
        bcc -DTESTING map.c
        #include <stdlib.h>
        #include <stdio.h>
        #include < string.h >
        typedef unsigned char BYTE;
        typedef unsigned short WORD;
        typedef unsigned long DWORD;
        typedef void far *FP;
        #ifndef MK_FP
        #define MK_FP(s,o)      ((((DWORD) s) << 16) + (o))
        #pragma pack(1)
        typedef struct {
            DWORD start, end;
            char name[9];
            } BLOCK;
        static BLOCK *map;
        static int num_block = 0;
        int cmp_func(const void *b1, const void *b2)    
            if (((BLOCK *) b1)->start < ((BLOCK *) b2)->start)       return -1;
            else if (((BLOCK *) b1)->start > ((BLOCK *) b2)->start)  return 1;
            else                                                     return 0;
        typedef struct {
            BYTE type;          /* 'M'=in chain; 'Z'=at end */
            WORD owner;         /* PSP of the owner */
            WORD size;          /* in 16-byte paragraphs */
            BYTE unused[3];
            BYTE name[8];       /* in DOS 4+ */
            } MCB;
        #define IS_PSP(mcb)     (FP_SEG(mcb) + 1 == (mcb)->owner)
        WORD get_first_mcb(void)
            _asm mov ah, 52h
            _asm int 21h
            _asm mov ax, es:[bx-2]
            // retval in AX
        typedef struct DEV {
            struct DEV far *next;
            WORD attr, strategy, intr;
            union {
                BYTE name[8], blk_cnt;
                } u;
            } DEV;
        #define IS_CHAR_DEV(dev)    ((dev)->attr & (1 << 15))
        DEV far *get_nul_dev(void)
            _asm mov ah, 52h
            _asm int 21h
            _asm mov dx, es
            _asm lea ax, [bx+22h]
            // retval in DX:AX
        int get_num_block_dev(DEV far *dev)
            // can't rely on # block devices at SysVars[20h]?
            // walk once through dev chain just to count # blk devs
            int num_blk = 0;
            do {
                if (! IS_CHAR_DEV(dev))
                    num_blk += dev->u.blk_cnt;
                dev = dev->next;
            } while(FP_OFF(dev->next) != (WORD) -1);
            return num_blk;
        WORD get_umb_link(void)
            _asm mov ax, 5802h
            _asm int 21h
            _asm xor ah, ah
            // return value in AX
        WORD set_umb_link(WORD flag)
            _asm mov ax, 5803h
            _asm mov bx, flag
            _asm int 21h
            _asm jc error
            _asm xor ax, ax
            // return 0 or error code in AX
        WORD get_dos_ds(void)
            _asm push ds
            _asm mov ax, 1203h
            _asm int 2fh
            _asm mov ax, ds
            _asm pop ds
            // retval in AX
        /* find IO.SYS segment with built-in drivers */
        WORD get_io_seg(DEV far *dev)
            WORD io_seg = 0;
            do {
                if (IS_CHAR_DEV(dev))
                    if (_fstrncmp(dev->u.name, "CON     ", 8) == 0)
                        io_seg = FP_SEG(dev);   // we'll take the last one
                dev = dev->next;
            } while(FP_OFF(dev->next) != (WORD) -1);
            return io_seg;
        static int did_init = 0;
        void do_init(void)
        {
            MCB far *mcb;
            DEV far *dev = get_nul_dev();
            WORD dos_ds, io_seg, mcb_seg, next_seg, save_link;
            BLOCK *block;
            int blk, i;
            map = (BLOCK *) calloc(100, sizeof(BLOCK));
            block = map;
            io_seg = get_io_seg(dev);
            block->start = ((DWORD) io_seg) << 4; block->end = (DWORD) -1;
            strcpy(block->name, "IO"); block++;
            dos_ds = get_dos_ds();
            block->start = ((DWORD) dos_ds) << 4; block->end = (DWORD) -1;
            strcpy(block->name, "DOS"); block++;
            // should really check if there IS an HMA!
            block->start = 0x100000L;   block->end =  0x10FFEEL;
            strcpy(block->name, "HMA"); block++;
            num_block = 3;
            /* walk MCB chain, looking for PSPs, interrupt owners */
            if (_osmajor >= 4)
            {
                mcb_seg = get_first_mcb();
                mcb = (MCB far *) MK_FP(mcb_seg, 0);
                if (_osmajor >= 5)  // be lazy; see ch. 7 for DOS < 5
                {
                    save_link = get_umb_link();
                    set_umb_link(1);    // access UMBs too
                }
                for (;;)
                {
                    next_seg = mcb_seg + mcb->size + 1;
                    if (IS_PSP(mcb))
                    {
                        block->start = ((DWORD) mcb_seg) << 4;
                        block->end = ((DWORD) next_seg) << 4;
                        _fstrncpy(block->name, mcb->name, 8);
                        block->name[8] = '\0';
                        block++; num_block++;
                    }
                    mcb_seg = next_seg;
                    if (mcb->type == 'M')
                        mcb = (MCB far *) MK_FP(next_seg, 0);
                    else
                        break;          // last MCB in chain
                }
            }
            /* walk device chain looking for non-builtin drivers */
            blk = get_num_block_dev(dev);
            do {
                MCB far *dev_mcb;
                if ((FP_SEG(dev) != dos_ds) && (FP_SEG(dev) != io_seg))
                {
                    block->start = (((DWORD) FP_SEG(dev)) << 4) + FP_OFF(dev);
                    dev_mcb = (MCB far *) MK_FP(FP_SEG(dev)-1,0);
                    if (dev_mcb->owner == 8)
                    {
                        dev = dev->next;
                        continue;
                    }
                    if (dev_mcb->type == 'M')
                        block->end = block->start + ((DWORD) dev_mcb->size << 4);
                    else
                        block->end = (DWORD) -1;
                    if (IS_CHAR_DEV(dev))
                    {
                        _fstrncpy(block->name, dev->u.name, 8);
                        block->name[8] = '\0';
                    }
                    else
                    {
                        blk -= dev->u.blk_cnt; // block drivers in reverse order
                        block->name[0] = blk + 'A';
                        block->name[1] = ':';
                        block->name[2] = '\0';
                    }
                    block++; num_block++;
                }
                dev = dev->next;
            } while(FP_OFF(dev->next) != (WORD) -1);
            if (_osmajor >= 5)
                set_umb_link(save_link);    // restore original UMB link state
            qsort(map, num_block, sizeof(BLOCK), cmp_func);
            for (i=0, block=map; i< num_block-1; i++, block++)
                if (block->end == (DWORD) -1)
                    block->end = map[i+1].start;
            if (block->end == (DWORD) -1)   // last one
                block->end = 0xFFFFFL;
            did_init = 1;
        }
        char *find_owner(DWORD lin_addr)
        {
            BLOCK *block;
            int i;
            if (! did_init) do_init();
            for (i=0, block=map; i < num_block; i++, block++)
                if ((lin_addr >= block->start) &&
                    (lin_addr <= block->end))
                    return block->name;
            /* still here */
            return (char *) 0;
        }
        #ifdef TESTING
        int main(void)
        {
            BLOCK *block;
            int i;
            do_init();
            for (i=0, block=map; i < num_block; i++, block++)
                printf("%08lX   %08lX   %s\n",
                    block->start, block->end, block->name);
            return 0;
        }
        #endif

With the exception of unused interrupt vectors and those (such as INT 1Eh) that point to data rather than code, you can take addresses displayed by INTVECT and unassemble them to see how a given interrupt is handled. As an example, Figure 6-1 shows INT 29h, which is the undocumented Fast Console Output function, located by default in the CON driver provided by IO.SYS.

Figure 6-1: Default Implementation of INT 29h

        C:\UNDOC2\CHAP6>intvect 29
        INT 29h   0070:0762   IO         

        -u 70:762
        0070:0762 50            PUSH  AX
        0070:0763 56            PUSH  SI
        0070:0764 57            PUSH  DI
        0070:0765 55            PUSH  BP
        0070:0766 53            PUSH  BX
        0070:0767 B40E          MOV   AH,0E
        0070:0769 BB0700        MOV   BX,0007
        0070:076C CD10          INT   10
        0070:076E 5B            POP   BX
        0070:076F 5D            POP   BP
        0070:0770 5F            POP   DI
        0070:0771 5E            POP   SI
        0070:0772 58            POP   AX
        0070:0773 CF            IRET        

That is very straightforward. INT 29h here is just a wrapper around INT 10h AH=0Eh, which is the ROM BIOS function to write a character in teletype mode.

Of course, things are never quite that simple. For example, if you install ANSI.SYS, which is a replacement CON driver, INT 29h points somewhere else:

        C:\UNDOC2\CHAP6>intvect 29
        INT 29h   0070:0762

        C:\UNDOC2\CHAP6>\undoc2\chap7\devlod \dos\ansi.sys

        C:\UNDOC2\CHAP6>intvect 29
        INT 29h   6EB3:0510   DEVLOD 

Because we loaded ANSI.SYS using DEVLOD, the INTVECT program shows DEVLOD as the owner of the interrupt vector; the owner, of course, is actually the new CON driver in ANSI.SYS. Now the code at 6EB3:0510 is no longer just a wrapper around an INT 10h call. Instead, it directly manipulates video memory at segment B800h and contains special handling for ANSI escape control codes. Showing the code here would take us too far afield, even for a chapter such as this that rambles more-or-less aimlessly through the DOS code. The point anyway is merely that the INTVECT program, simple as it is, can help us point DEBUG at useful segment:offset addresses to unassemble.
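The address arithmetic behind this kind of owner lookup is easy to check by hand. The following is only a sketch in portable C, not the INTVECT source: the Region type and the linear_addr() and owner_of() names are invented for illustration, and ordinary 32-bit integers stand in for the far pointers that real-mode code would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one entry in an owner map. */
typedef struct {
    uint32_t start;     /* first linear address of the region */
    uint32_t end;       /* last linear address of the region */
    const char *name;   /* owner name shown to the user */
} Region;

/* Real-mode address arithmetic: linear = segment * 16 + offset. */
uint32_t linear_addr(uint16_t seg, uint16_t off)
{
    return ((uint32_t) seg << 4) + off;
}

/* Return the name of the first region that contains lin, or NULL. */
const char *owner_of(const Region *map, size_t n, uint32_t lin)
{
    size_t i;
    assert(map != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (lin >= map[i].start && lin <= map[i].end)
            return map[i].name;
    return NULL;
}
```

For instance, linear_addr(0x0070, 0x0762) is 00E62h and linear_addr(0x6EB3, 0x0510) is 6F040h. Given a made-up map in which a region named IO spans 00700h through 0115Fh, owner_of() would report IO for the first address and NULL for the second.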

But there's a major problem here. Recall that we are interested in looking at the DOS INT 21h and INT 2Fh handlers. INTVECT can of course print out the addresses of the INT 21h and INT 2Fh handlers:

        C:\UNDOC2\CHAP6>intvect 21 2f
        INT 21h   0F93:32B6   MSCDEX
        INT 2Fh   1305:0285   DOSKEY  

However, as INTVECT indicates, these interrupt vectors point, not to DOS, but to DOS add-ins such as MSCDEX and DOSKEY. In fact, it is practically guaranteed that, except on the lamest, freshly booted, stripped-down system with no AUTOEXEC.BAT or CONFIG.SYS file, INT 21h, INT 2Fh, and many other DOS interrupt vectors won't point into DOS. The INT 21h and INT 2Fh vectors are pointing at one of the plug-in subsystems rather than at the DOS motherboard.

Of course, if you're interested in examining MSCDEX's INT 21h handler or DOSKEY's INT 2Fh handler, the INTVECT results are very useful. They provide all the information needed by a debugger such as DEBUG or SYMDEB (a handy debugger that Microsoft once included with the Windows SDK). For example, by using DEBUG or SYMDEB to unassemble the 1305:0285 address displayed by INTVECT for INT 2Fh, we can see that DOSKEY watches for the Windows and task-switcher initialization broadcasts (INT 2Fh AX=1605h and AX=4B05h). DOSKEY clearly uses the same piece of code (here, at offset 0299h) to handle both calls. We can also see confirmation that, as documented in Microsoft's MS-DOS Programmer's Reference, DOSKEY responds to INT 2Fh AH=48h calls:

        C:\UNDOC2\CHAP6>intvect 2f
        INT 2Fh   1305:0285   DOSKEY
        -u 1305:0285
        1305:0285 3D0516         CMP    AX,1605
        1305:0288 740F           JZ     0299
        1305:028A 3D054B         CMP    AX,4B05
        1305:028D 740A           JZ     0299
        1305:028F 80FC48         CMP    AH,48
        1305:0292 741B           JZ     02AF
        1305:0294 2EFF2E5F02     JMP    FAR CS:[025F]
        ; ...

But if, for example, we want to see MSCDEX's INT 2Fh handler rather than DOSKEY's, and if DOSKEY is loaded after MSCDEX, INTVECT is of no use. (Note, however, that unlike MSDOS.SYS and IO.SYS, programs such as MSCDEX.EXE and DOSKEY.EXE are easy to disassemble on disk with a program such as Sourcer from V Communications.)

More important, INTVECT doesn't help us get the address of what we might call The One True INT 21h Handler inside MSDOS.SYS. Nor does it help with finding the original INT 2Fh handlers inside MSDOS.SYS and IO.SYS.

Why? Because interrupts are handled in a kind of last-in, first-out (LIFO) stack. The point was made at the beginning of this chapter that the DOS philosophy is to provide the bare minimum operating system services, along with facilities for extending DOS. As discussed in greater detail in chapter 9 on TSRs, one of the keys to extending DOS is INT 21h AH=25h, the DOS Set Interrupt Vector function. Along with the Get Interrupt Vector function (AH=35h), the Set Vector function allows the creation of what are called interrupt chains, which are essentially linked lists (or LIFO stacks) of code. An interrupt chain consists of two or more pieces of code that handle the same interrupt. The following code fragment, adapted from the FUNC0E32 and DOSVER programs in listings 2-20 and 2-21, illustrates this:

        void (interrupt far *prev)();           // ptr to previous handler in chain
        prev = _dos_getvect(0x21);              // call 21/35 -- get previous
        _dos_setvect(0x21, my_int21_handler);   // call 21/25 -- set new
        // ...
        void interrupt far my_int21_handler(REG_PARAMS r)
        {
            // look at AH to see if we're interested
            // ...
            _chain_intr(prev);  // pass interrupt down to previous owner in chain
        }

The _chain_intr() function does a far JMP to the previous interrupt handler in the chain, without returning. It is important to note that sometimes interrupt handlers CALL, rather than JMP to, the previous handler. This allows a handler to post-process the interrupt after the previous handler has done its work, rather than pre-processing the interrupt beforehand, which is what happens in the more typical JMP style of interrupt chaining. Sometimes the JMP-style code is called a front-end handler, and the CALL-style code is called a back-end handler.
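To make the linked-list picture concrete, here is a small model of an interrupt chain in portable C. It is a sketch under stated assumptions rather than real DOS code: Handler, vector21, hook(), dispatch(), and chain_tail() are hypothetical names, a plain pointer stands in for the interrupt vector, and only JMP-style (front-end) handlers are modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one handler in an interrupt chain. */
typedef struct Handler Handler;
struct Handler {
    const char *name;
    int wants_ah;       /* function number this handler intercepts, or -1 */
    Handler *prev;      /* previous owner of the vector, saved when hooking */
};

Handler *vector21 = NULL;   /* stands in for the INT 21h vector */

/* Model of 21/35 followed by 21/25: save the old head, become the new head. */
void hook(Handler *h)
{
    assert(h != NULL);
    h->prev = vector21;
    vector21 = h;
}

/* Deliver a call: each handler either claims it or chains to its predecessor.
   Returns the name of the handler that ends up servicing the call. */
const char *dispatch(int ah, int *hops)
{
    Handler *h;
    *hops = 0;
    for (h = vector21; h != NULL; h = h->prev) {
        (*hops)++;
        if (h->wants_ah == ah || h->prev == NULL)
            return h->name;     /* claimed, or reached the tail of the chain */
    }
    return NULL;                /* nothing installed */
}

/* The tail of the chain is the first handler ever installed. */
const Handler *chain_tail(void)
{
    const Handler *h = vector21;
    while (h != NULL && h->prev != NULL)
        h = h->prev;
    return h;
}
```

If three handlers are hooked in the order DOS, SMARTDRV, MSCDEX, then vector21 points at MSCDEX (the head), chain_tail() returns DOS, and a function that no add-in claims passes through all three handlers before it is serviced. Reading the vector only ever yields the head; reaching the tail means following each saved prev pointer, which in real code is private to each handler.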

It is especially important that INT 21h AH=25h and 35h allow even INT 21h itself to be hooked. This is a source of tremendous flexibility in DOS, but it also makes it difficult for us to find The One True INT 21h Handler. Calling INT 21h AX=3521h returns the head of the INT 21h linked list, that is, the address of the most recently installed INT 21h handler. This might conceivably be the genuine DOS INT 21h handler, but more likely it belongs to MSCDEX, NETX, or perhaps even something as dumb as the FUNC0E32 or DOSVER programs from chapter 2. INT 21h AH=35h simply returns the head of an interrupt chain. Finding the original INT 21h or INT 2Fh handler belonging to DOS usually requires finding the chain's tail. (Usually rather than always, because there might be back-end handlers.)

How can we find the actual INT 21h and INT 2Fh handlers provided by DOS itself, when all we have is the address of the head of the INT 21h or INT 2Fh interrupt chain? There is unfortunately no function that returns the tail of an interrupt chain. And while there is an undocumented DOS function (INT 2Fh AX=1203h) to return the DOS data segment, there is no equivalent function that returns the DOS code segment (which, remember, may well be in the HMA).

One solution would of course be to boot on an absolutely bare-bones system and hope that INT 21h and INT 2Fh point to the original MS-DOS handlers, thereby bypassing the whole problem of how to follow interrupt chains. Or you could write a device driver to keep track of interrupts, and install it very early in DOS initialization. But this is ridiculous! Clearly there must be some way to follow the interrupt chain, as the processor does this many times a second.

Unfortunately, there is no standard mechanism for interrupt chaining. IBM and Microsoft at one point put forward a specification for this purpose (David Thielen described it in detail in Microsoft Systems Journal, July 1991, pp. 24-25), but no one seems to use it. Ralf Brown has proposed an INT 2Dh protocol (described in the Interrupt List on disk) to combat the extremely long interrupt chains that currently plague INT 2Fh, but again you can't rely on programs to do the right thing and use this protocol.

Tracing a DOS INT 21h Call

It turns out that Microsoft provides, with every copy of DOS, an almost perfect solution to the problem of finding the actual DOS INT 21h and INT 2Fh handlers. The solution is none other than DEBUG.

Like most debuggers, DEBUG has an a command to assemble instructions on the fly, and a t command for tracing into (as opposed to stepping over) instructions. Even better, unlike some otherwise more sophisticated debuggers, the t command in DEBUG can trace into an INT instruction. For the purposes of trace, in other words, DEBUG does not treat INT as an atomic operation:

        C:\UNDOC2\CHAP6>intvect 21
        INT 21h   0F93:32B6   MSCDEX
        19B5:0100 mov ah, 62
        19B5:0102 int 21
        19B5:0104 ret

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000  
        DS=19B5  ES=19B5  SS=19B5  CS=19B5  IP=0102   NV UP EI PL NZ NA PO NC 
        19B5:0102 CD21          INT 21                                 

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFE8  BP=0000  SI=0000  DI=0000  
        DS=19B5  ES=19B5  SS=19B5  CS=0F93  IP=32B6   NV UP DI PL NZ NA PO NC 
        0F93:32B6 80FC60        CMP AH,60                              

Notice that pressing t at the INT 21h instruction took us into the first line of the handler at 0F93:32B6, rather than over it to the RET instruction at 19B5:0104. This is exactly what one might expect from pressing t rather than p (proceed); yet because of the way the single step interrupt works on Intel processors (see INTCHAIN.C at listing 6-5 later in this chapter), most debuggers don't behave this way; it's useful that every copy of DOS comes with one that does.

We can use this facility to follow the INT 21h or INT 2Fh chain down into the bowels of DOS itself. (Yuck!) All we must do is keep tracing (either by continuously pressing t or by telling DEBUG with a command such as t 16 to trace a certain number of instructions) until the segment:offset returns to DEBUG and our RET instruction (which, in the example above, is located at 19B5:0104). This will surely locate the actual DOS INT 21h or INT 2Fh handler.

However, the astute reader may wish to interject right now, before we go any further, that using DEBUG to trace into INT 21h "won't work" because DEBUG itself uses DOS, and DOS, as we all know, is not reentrant. This is absolutely true; a debugger that does not use DOS, such as Nu-Mega's Soft-ICE, is better suited than DEBUG to tracing through DOS.

However, there are a handful of DOS functions that are reentrant, at least for the purposes of tracing with DEBUG. By examining the DOS code for INT 21h, we will soon see precisely what this reentrancy or lack thereof means. In the meantime, simply take it on faith that the DOS INT 21h functions shown below in table 6-1 are (with an important caveat that we'll get to) reentrant, and thus traceable using DEBUG, SYMDEB, or any other debugger that uses DOS. With the exception of the undocumented INT 21h AH=64h, note that these are among the INT 21h functions that Microsoft (MS-DOS Programmer's Reference, chapter 7) lists as callable from a critical error handler.

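Table 6-1: Reentrant MS-DOS Functions

        INT 21h AH=33h   Get/Set BREAK= flag
        INT 21h AH=50h   Set PSP
        INT 21h AH=51h   Get PSP
        INT 21h AH=62h   Get PSP
        INT 21h AH=64h   (undocumented)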

It is desirable for MS-DOS to single out the Get and Set PSP functions for special treatment, because this means that interrupt handlers can freely call these process-manipulation functions (see chapter 9 on TSRs). But it is not at all obvious why functions 33h and 64h merit this special attention. It would seem that other functions, such as AH=25h and AH=35h to get and set interrupt vectors, might be more useful. On the other hand, including function 33h here means that interrupt handlers can freely get and set the DOS BREAK= flag.

Let us now use DEBUG to trace into a call to one of these functions, INT 21h AH=62h (Get PSP), and see exactly what occurs when this function is called under DOS 6.0, in a configuration with a few standard DOS TSRs such as MSCDEX and DOSKEY. The documentation states that function 62h takes no parameters other than the number 62h in AH, and that the function returns the current PSP in BX. You can probably guess that the DOS implementation for this function is rather simple, doing little more than loading BX from the CURR_PSP location in the DOS data segment. This location corresponds to offset 10h in the Swappable Data Area (SDA; see INT 21h AX=5D06h in the appendix). However, as you'll see, the processor executes a lot of code before DOS eventually gets to the point of carrying out the otherwise simple Get PSP operation.

As noted earlier, the key facility DEBUG provides here is that (unlike SYMDEB, for example) it traces into the INT instruction. In Figure 6-2, comments have been added to the DEBUG output, using ;;; to make them stand out.

Figure 6-2: Starting to Trace a Call to INT 21h AH=62h

        19B5:0100 mov ah, 62
        19B5:0102 int 21
        19B5:0104 ret

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000  
        DS=19B5  ES=19B5  SS=19B5  CS=19B5  IP=0102   NV UP EI PL NZ NA PO NC 
        19B5:0102 CD21          INT 21

        ;;; We have to keep tracing until the segment:offset comes back to
        ;;; our own code, the RET instruction at 19B5:0104.

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFE8  BP=0000  SI=0000  DI=0000
        DS=19B5  ES=19B5  SS=19B5  CS=0F93  IP=32B6   NV UP DI PL NZ NA PO NC
        0F93:32B6 80FC60        CMP AH,60

        ;;; Running MEM /D showed that above is MSCDEX. This is consistent
        ;;; with output from INTVECT program. Apparently MSCDEX is interested
        ;;; in the undocumented DOS INT 21h AH=60h (Truename) function. Note that
        ;;; we were running MSCDEX /S (for network sharing); usually MSCDEX doesn't
        ;;; care about the INT 21h AH=60h call.

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFE8  BP=0000  SI=0000  DI=0000  
        DS=19B5  ES=19B5  SS=19B5  CS=0F93  IP=32B9   NV UP DI PL NZ NA PO NC 
        0F93:32B9 7405          JZ  32C0

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFE8  BP=0000  SI=0000  DI=0000  
        DS=19B5  ES=19B5  SS=19B5  CS=0F93  IP=32BB   NV UP DI PL NZ NA PO NC 
        0F93:32BB 2E            CS:
        0F93:32BC FF2EB232      JMP FAR [32B2]                         CS:32B2=15FA

        ;;; MSCDEX decided it's not interested in our call to 21/62, so it chains
        ;;; to the previous handler, whose address it earlier retrieved (by
        ;;; calling 21/35) and saved away (apparently in CS:32B2) before installing 
        ;;; (with 21/25) its own INT 21h handler.

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFE8  BP=0000  SI=0000  DI=0000  
        DS=19B5  ES=19B5  SS=19B5  CS=07F9  IP=15FA   NV UP DI PL NZ NA PO NC 
        07F9:15FA 80FC3F        CMP AH,3F

        ;;; We're now in the previous INT 21h handler. MEM /D shows that
        ;;; 07F9:15FA is SMARTDRV. Here, it's (reasonably enough) interested in
        ;;; whether we've called INT 21h AH=3Fh to read from a file (SMARTDRV
        ;;; wants to see if the data we want from the file is actually 
        ;;; in its cache). But we called 21/62 not 21/3F so...

Well, you get the idea. Running DEBUG this way is a bit tedious, and saving its output to a file is difficult. As an improvement, you can drive DEBUG with input scripts, such as 2162.SCR in listing 6-3, and redirect its output to a file. (For a lengthy discussion of DEBUG scripts, see PC Magazine DOS Power Tools, 2nd edition, chapter 9.) Furthermore, rather than repeatedly hitting t to trace (single step) the next instruction, you can give the trace command a numeric parameter (for example, t 16 or t 32 ) to trace a series of instructions.

Listing 6-3: 2162.SCR

        C:\UNDOC2\CHAP6>type 2162.scr
        a
        mov ah, 62
        int 21
        ret
        ; blank line below is crucial to leave assembly mode!

        t 100
        q

The only problem is in guessing how many instructions to trace; if you ask DEBUG to trace too far, it starts executing garbage. You only want to trace until you return to the RET instruction you assembled, or at least not much past it. The best bet is to try t 16, examine DEBUG's output to see if the traced instructions come back, then try t 32, examine the output again, and so on. In any case, t 100 happened to work here; a larger number would be needed on machines with more TSRs that hook INT 21h installed.

Figure 6-3 shows a complete trace into an INT 21h AH=62h call, from the time we issued the INT 21h until DOS returns to us with the current PSP in BX. Normally all that you see (or want to see!) of an INT 21h call is your input and its output. But figure 6-3 views the DOS call "through the looking glass," as it were. Instead of looking down at DOS, you'll be inside DOS looking up at the INT 21h call. This can be slightly disorienting at first, but in the end you'll have a much better understanding of what DOS is all about.

Figure 6-3: Tracing a Call to INT 21h AH=62h

        C:\UNDOC2\CHAP6>debug < 2162.scr > 2162.out

        C:\UNDOC2\CHAP6>type 2162.out
        19B5:0100 mov ah, 62
        19B5:0102 int 21
        19B5:0104 ret
        -t 106

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFEE  BP=0000  SI=0000  DI=0000  
        DS=19B5  ES=19B5  SS=19B5  CS=19B5  IP=0102   NV UP EI PL NZ NA PO NC 
        19B5:0102 CD21          INT 21

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFE8  BP=0000  SI=0000  DI=0000  
        DS=19B5  ES=19B5  SS=19B5  CS=0F93  IP=32B6   NV UP DI PL NZ NA PO NC 
        0F93:32B6 80FC60        CMP AH,60

        ;;; As before (figure 6-2), we're in MSCDEX /S now.

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFE8  BP=0000  SI=0000  DI=0000  
        DS=19B5  ES=19B5  SS=19B5  CS=0F93  IP=32B9   NV UP DI PL NZ NA PO NC 
        0F93:32B9 7405          JZ  32C0 

        ;;; The AX=xxxx BX=xxxx etc. dump that DEBUG shows each time usually
        ;;; isn't important here, so from now on we'll omit it (and blank lines)
        ;;; except when the register dump is useful.

        0F93:32BB 2E            CS:                                    
        0F93:32BC FF2EB232      JMP FAR [32B2]                         CS:32B2=15FA
        07F9:15FA 80FC3F        CMP AH,3F

        ;;; As before, we're in SMARTDRV now.

        07F9:15FD 7414          JZ  1613
        07F9:15FF 80FC0D        CMP AH,0D
        07F9:1602 7426          JZ  162A
        07F9:1604 3D1325        CMP AX,2513
        07F9:1607 7451          JZ  165A
        07F9:1609 80FC68        CMP AH,68
        07F9:160C 7442          JZ  1650

        ;;; Above provides a catalog of the DOS INT 21h function calls that
        ;;; SMARTDRV cares about:  3Fh (read file), 0Dh (disk reset), 2513h
        ;;; (set INT 13h vector), 68h (commit file). All this makes sense.
        ;;; For example, SMARTDRV uses 21/0D as a signal to flush the cache.
        ;;; For some calls such as 21/0D, SMARTDRV doesn't JMP to the previous
        ;;; handler; instead, it does a far CALL and examines the 21/0D on
        ;;; the way back.

        07F9:160E 2E            CS:                                    
        07F9:160F FF2E1423      JMP FAR [2314]                         CS:2314=0800

        ;;; We called 21/62, SMARTDRV doesn't care, so SMARTDRV chains to
        ;;; previous handler, C801:0800, which SMARTDRV earlier got from 
        ;;; calling 21/35 before installing its own 21 handler with 21/25, and 
        ;;; which is stored in CS:2314.

        C801:0800 9C            PUSHF

        ;;; Was running with DOS=UMB, so some INT 21h handlers are running
        ;;; in upper memory. Don't know who the owner of this is!

        C801:0801 FB            STI
        C801:0802 3D0258        CMP AX,5802
        C801:0805 7413          JZ  081A
        C801:0807 3D0358        CMP AX,5803
        C801:080A 7431          JZ  083D
        C801:080C 80FC31        CMP AH,31
        C801:080F 7503          JNZ 0814
        C801:0814 9D            POPF

        ;;; We can see that this handler cares about calls to INT 21h functions
        ;;; 5802h (Get UMB Link), 5803h (Set UMB Link), 31h (TSR). Wonder why.
        ;;; Anyway, we called 21/62, the handler isn't interested in that, so it
        ;;; chains to the previous handler.

        C801:0815 2E            CS:                                    
        C801:0816 FF2ECE01      JMP FAR [01CE]                         CS:01CE=0023
        0255:0023 EA8E052ECC    JMP CC2E:058E                          

        ;;; DEV shows that seg 0255h is a block-mode device driver for
        ;;; D: through I: -- it is a low-memory stub for DoubleSpace, located in
        ;;; high memory. Stacker uses the same area; both have signatures at
        ;;; 0255:0000. DEV also shows that CC2E:058E is DBLSSYS$ (DoubleSpace).

        CC2E:058E 9C            PUSHF
        CC2E:058F FB            STI
        CC2E:0590 FC            CLD
        CC2E:0591 1E            PUSH    DS
        CC2E:0592 0E            PUSH    CS
        CC2E:0593 1F            POP DS
        CC2E:0594 C606C20700    MOV BYTE PTR [07C2],00                 DS:07C2=00
        CC2E:0599 53            PUSH    BX
        CC2E:059A 8ADC          MOV BL,AH
        CC2E:059C 80FB6C        CMP BL,6C
        CC2E:059F 7759          JA  05FA
        CC2E:05A1 32FF          XOR BH,BH
        CC2E:05A3 8A9F1305      MOV BL,[BX+0513]                       DS:0575=00
        CC2E:05A7 FFA78005      JMP [BX+0580]                          DS:0580=05FA

        ;;; DoubleSpace is sufficiently tied into DOS that it uses a jump table to
        ;;; store a handler for every DOS function. The table at CC2E:0513 holds
        ;;; byte offsets into code at CC2E:0580. Most DOS functions (including
        ;;; our 21/62 call) are just passed on. Examining the table with the FTAB
        ;;; program from later in this chapter shows that DoubleSpace cares
        ;;; about the following INT 21h functions:  00, 0A, 0D, 10, 13, 17, 25, 31, 
        ;;; 36, 39, 3A, 3E, 41, 43, 4B, 4C, 56, 57, 5D, 68. We know this from 
        ;;; running "ftab cc2e:0513 6d DSI21 1 | grep -v 00". For example, it hooks
        ;;; 21/25 because (like SMARTDRV) it wants to know whenever someone sets the
        ;;; INT 13h (BIOS Disk) vector.

        CC2E:05FA 5B            POP BX
        CC2E:05FB 1F            POP DS
        CC2E:05FC 9D            POPF
        CC2E:05FD 2E            CS:
        CC2E:05FE FF2E0005      JMP FAR [0500]                         CS:0500=109E

        ;;; Trivial handling for our 21/62 call. Just pass it on to previous
        ;;; handler for INT 21h...

        0116:109E 90            NOP

        ;;; MEM /D shows that 0116h is MS-DOS. Finally!

        0116:109F 90            NOP
        0116:10A0 E8CC00        CALL    116F

        ;;; Hmm, DOS is calling some subroutine (which we've traced into):

        0116:116F 9C            PUSHF
        0116:1170 1E            PUSH    DS
        0116:1171 06            PUSH    ES
        0116:1172 51            PUSH    CX
        0116:1173 56            PUSH    SI
        0116:1174 57            PUSH    DI

        ;;; We need to see the registers for the next few instructions.
        ;;; Note what happens to DS and ES

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFDA  BP=0000  SI=0000  DI=0000  
        DS=19B5  ES=19B5  SS=19B5  CS=0116  IP=1175   NV UP DI NG NZ AC PE CY 
        0116:1175 2E            CS:                                    
        0116:1176 C5366711      LDS SI,[1167]                          CS:1167=0080

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFDA  BP=0000  SI=0080  DI=0000  
        DS=0000  ES=19B5  SS=19B5  CS=0116  IP=117A   NV UP DI NG NZ AC PE CY 
        0116:117A 2E            CS:                                    
        0116:117B C43E6B11      LES DI,[116B]                          CS:116B=0090

        AX=6200  BX=0000  CX=0000  DX=0000  SP=FFDA  BP=0000  SI=0080  DI=0090  
        DS=0000  ES=FFFF  SS=19B5  CS=0116  IP=117F   NV UP DI NG NZ AC PE CY 
        0116:117F B90400        MOV CX,0004

        0116:1182 FC            CLD
        0116:1183 F3            REPZ
        0116:1184 A7            CMPSW
        0116:1185 7407          JZ  118E

        ;;; DOS has just compared 8 bytes (4 words) at DS:SI (0000:0080) and
        ;;; ES:DI (FFFF:0090). If they are identical, DOS jumps somewhere.
        ;;; What is this?! This particular run of DEBUG was conducted with
        ;;; DOS=HIGH. DOS is in the HMA, which is only reachable when the
        ;;; machine's A20 address line is enabled. DOS is comparing 0000:0080
        ;;; and FFFF:0090 because, if the 8 bytes at these two addresses are
        ;;; identical, it assumes that memory addresses are wrapping around, and
        ;;; therefore that A20 is off. DOS can't call routines in the HMA if A20
        ;;; is off. Thus, even when DOS=HIGH there must be a low-memory stub; the
        ;;; code at 0116:109E is that stub, which ensures that A20 is enabled before
        ;;; calling DOS in the HMA. Here, A20 was already on (0000:0080 and
        ;;; FFFF:0090 were different), but if A20 had been off, we would
        ;;; have jumped to the subroutine at 0116:118E, whose job
        ;;; is to enable A20 (by calling XMS function 5, Local Enable A20).
        ;;; If that function call succeeds, DOS will jump back here, just as if
        ;;; A20 had been enabled all along. If that function call fails, we're
        ;;; in big trouble: DOS uses INT 10h AH=0Eh to display "A20 Hardware
        ;;; Error", and goes into a dynamic halt. We'll come back to this
        ;;; code later. Right now, A20 is enabled so...

        0116:1187 5F            POP DI
        0116:1188 5E            POP SI
        0116:1189 59            POP CX
        0116:118A 07            POP ES
        0116:118B 1F            POP DS
        0116:118C 9D            POPF
        0116:118D C3            RET
        0116:10A3 2E            CS:                                    
        0116:10A4 FF2E6A10      JMP FAR [106A]                         CS:106A=40F8

        ;;; The low-memory stub for DOS knows it can jump to DOS in the HMA, and
        ;;; here we go:

        FDC8:40F8 FA            CLI

        ;;; We are now in The One True INT 21h Handler. That this is at
        ;;; FDC8:40F8 in this particular configuration is the one piece of
        ;;; information we're after here, because now we can go and unassemble
        ;;; (rather than trace) at that address. Static unassembly is 
        ;;; generally easier than dynamic tracing. But let's see the thing
        ;;; through, to learn exactly how 21/62 is handled...

        FDC8:40F9 80FC6C        CMP AH,6C
        FDC8:40FC 77D2          JA  40D0

        ;;; Any INT 21h function > 6Ch is an error. ("In DOS 7.0,
        ;;; the upper limit is 72h," writes one tech reviewer.)

        FDC8:40FE 80FC33        CMP AH,33
        FDC8:4101 7218          JB  411B

        ;;; Any INT 21h function < 33h will be handled at FDC8:411B.

        FDC8:4103 74A2          JZ  40A7

        ;;; 21/33 is special:  it is handled at FDC8:40A7 (in this configuration)

        FDC8:4105 80FC64        CMP AH,64
        FDC8:4108 7711          JA  411B

        ;;; Any INT 21h function > 64h will also be handled at FDC8:411B; 
        ;;; seems like 411B is the handler for "normal" DOS calls.

        FDC8:410A 74B5          JZ  40C1

        ;;; 21/64 is another special function, handled here at FDC8:40C1

        FDC8:410C 80FC51        CMP AH,51
        FDC8:410F 74A4          JZ  40B5
        FDC8:4111 80FC62        CMP AH,62
        FDC8:4114 749F          JZ  40B5

        ;;; Finally! DOS sees our 21/62 call, and will handle it by jumping to
        ;;; FDC8:40B5. Notice that the same code also handles calls to 21/51, which 
        ;;; makes sense, since the two functions are documented as being identical.

        FDC8:40B5 1E            PUSH    DS
        FDC8:40B6 2E            CS:
        FDC8:40B7 8E1EE73D      MOV DS,[3DE7]                          CS:3DE7=0116

        ;;; DOS DS (0116h) is stored in a variable kept at CS:3DE7. This is 
        ;;; the segment where things like SysVars and SDA live. This value is
        ;;; also returned from 2F/1203 (see appendix).

        FDC8:40BB 8B1E3003      MOV BX,[0330]                          DS:0330=1408

        ;;; Believe it or not, the line above is actually the Get PSP function!
        ;;; We know that DOS keeps the current PSP at SDA+10h. In this 
        ;;; configuration, 21/5D06 (Get SDA) returns 0116:0320. The Get PSP
        ;;; function just moves the WORD at 0116:0330 into BX. In other words, 
        ;;; 21/62 (and 21/51) just return the WORD from SDA+10h. Duh.

        FDC8:40BF 1F            POP DS
        FDC8:40C0 CF            IRET

        ;;; DOS IRETs back to our code running in DEBUG

        19B5:0104 C3            RET

        ;;; This is the RET statement in our DEBUG script.

        19B5:0000 CD20          INT 20
        0116:1094 90            NOP

        ;;; Our script has already returned to DEBUG, which did an INT 20h return
        ;;; to DOS. At this point, we start tracing all sorts of things we don't
        ;;; care about. If we trace too far, we start to make DEBUG execute
        ;;; garbage, which can hang the machine.

The most noticeable feature of the INT 21h trace in figure 6-3 is the way that DOS extensions such as SMARTDRV and MSCDEX become indistinguishable from DOS itself. If any non-Microsoft DOS extensions such as Novell NetWare or Stacker had been running, they too would have appeared in the INT 21h chain, looking not a bit different from any of the Microsoft-provided software in the chain. The walk through the INT 21h chain in figure 6-3 thus presents an excellent illustration of what DOS really is.

Unassembling the Get/Set PSP Functions

As you can see, under normal circumstances with a few TSRs loaded, you have to wade through a lot just to get to the single line of code that actually performs the DOS Get PSP function. It should now be clear why INT 21h is called an interrupt "chain." As you'll see later, the INT 2Fh chain is typically much longer than the INT 21h chain. Given the overhead of INT 21h on a typical machine, programmers might even consider writing their own Get PSP calls to bypass this long interrupt chain. Seeing how DOS implements Get PSP (when it eventually gets there!), you can also see how to implement your own:

        // uses get_sda() from GETSDA.C (listing 3-4a)
        WORD my_get_psp(void)
        {
            static WORD far *psp_ptr = (WORD far *) 0;
            if (! psp_ptr)                  // one-time init
                psp_ptr = (WORD far *) (get_sda() + 0x10);
            return *psp_ptr;                // current PSP lives at SDA+10h
        }

Of course, this would cut out any TSRs or drivers that might actually need to see and respond to DOS Get PSP calls.

Having already seen the code that handles the Get PSP function (INT 21h AH=51h and 62h), we might as well also examine the code for Set PSP, though we can guess what it's going to look like (we'll see later in figure 6-7 where the 40A9h address comes from):

Figure 6-4: Implementation of INT 21h AH=50h (Set PSP) in MS-DOS 6.0

        -u fdc8:40a9
        FDC8:40A9 1E            PUSH    DS          ; save caller's DS
        FDC8:40AA 2E            CS:
        FDC8:40AB 8E1EE73D      MOV DS,[3DE7]       ; switch to DOS DS
        FDC8:40AF 891E3003      MOV [0330],BX       ; put caller's BX into CURR_PSP
        FDC8:40B3 1F            POP DS              ; restore caller's DS
        FDC8:40B4 CF            IRET                ; done!

In other words, the Get and Set PSP functions just manipulate this word at offset 330h in the DOS data segment (offset 10h in the SDA). This provides a small taste of how DOS internally uses such externally-visible structures as SysVars and the SDA. Thus:

        void my_set_psp(WORD psp)
        {
            static WORD far *psp_ptr = (WORD far *) 0;
            if (! psp_ptr)                  // one-time init
                psp_ptr = (WORD far *) (get_sda() + 0x10);
            *psp_ptr = psp;                 // overwrite current PSP at SDA+10h
        }

Unassembling INT 21h AH=33h

A glance towards the end of the DEBUG output in figure 6-3 shows that MS-DOS special-cases a handful of functions: 33h, 51h, 62h, 64h, and (not shown in figure 6-3) 50h. These functions correspond to the reentrant DOS functions listed in table 6-1 above. While we're still not quite in a position to understand what makes these functions different from all other DOS functions, we do at any rate now have a bunch of addresses that we can unassemble. Recall that this was our goal in tracing through DOS.

INT 21h AH=33h, for example, is an omnibus function with a number of subfunctions relating to Ctrl-Break, the Boot Drive, and the DOS Version. Setting BREAK=ON, for instance, ends up calling INT 21h AX=3301h with DL=1. In this configuration, code at FDC8:40A7 handles this function:

        FDC8:40FE 80FC33        CMP AH,33
        FDC8:4101 7218          JB  411B
        FDC8:4103 74A2          JZ  40A7

We can now unassemble (rather than trace) at this address, using DEBUG or any other DOS debugger. Comments have been added to the output in figure 6-5, which has also been cleaned up slightly.

Figure 6-5: Implementation of INT 21h AH=33h in MS-DOS 6.0

        -u fdc8:40a7
        FDC8:40A7 EBA9          JMP 4052

        -u fdc8:4052
        FDC8:4052 3C06          CMP AL,06       ; functions 3300h through 3306h
        FDC8:4054 7603          JBE 4059
        FDC8:4056 B0FF          MOV AL,FF       ; error: subfunction number too high
        FDC8:4058 CF            IRET
        FDC8:4059 1E            PUSH    DS      ; save caller's DS
        FDC8:405A 2E            CS:                                    
        FDC8:405B 8E1EE73D      MOV DS,[3DE7]   ; switch to DOS's DS; hmm, not truly
        FDC8:405F 50            PUSH    AX      ;     reentrant after all!
        FDC8:4060 56            PUSH    SI
        FDC8:4061 BE3703        MOV SI,0337     ; offset of break flag:  SDA+17h
        FDC8:4064 32E4          XOR AH,AH       ; see if subfunct 0
        FDC8:4066 0BC0          OR  AX,AX
        FDC8:4068 7504          JNZ 406E
        FDC8:406A 8A14          MOV DL,[SI]     ; 21/3300 -- get break flag
        FDC8:406C EB35          JMP 40A3
        FDC8:406E 48            DEC AX          ; see if subfunct 1
        FDC8:406F 7507          JNZ 4078                               
        FDC8:4071 80E201        AND DL,01                              
        FDC8:4074 8814          MOV [SI],DL     ; 21/3301 -- set break flag
        FDC8:4076 EB2B          JMP 40A3
        FDC8:4078 48            DEC AX          ; see if subfunct 2
        FDC8:4079 7507          JNZ 4082
        FDC8:407B 80E201        AND DL,01                              
        FDC8:407E 8614          XCHG DL,[SI]    ; 21/3302 (UNDOC) -- get/set brk flg
        FDC8:4080 EB21          JMP 40A3        ;   as single atomic operation: XCHG
        FDC8:4082 3D0300        CMP AX,0003 ; see if subfnc 5 (already subtracted 2)
        FDC8:4085 7506          JNZ 408D                               
        FDC8:4087 8A166900      MOV DL,[0069]   ; 21/3305 -- get startup drive
        FDC8:408B EB16          JMP 40A3
        FDC8:408D 3D0400        CMP AX,0004 ; see if subfnc 6 (already subtracted 2)
        FDC8:4090 7511          JNZ 40A3
        FDC8:4092 BB0600        MOV BX,0006     ; 21/3306 -- MS-DOS version 6.0
        FDC8:4095 B200          MOV DL,00
        FDC8:4097 32F6          XOR DH,DH                              
        FDC8:4099 803E111200    CMP BYTE PTR [1211],00  ; is DOS=HIGH?
        FDC8:409E 7403          JZ  40A3                               
        FDC8:40A0 80CE10        OR  DH,10       ; DOSINHMA flag
        FDC8:40A3 5E            POP SI          ; done: restore caller's regs
        FDC8:40A4 58            POP AX
        FDC8:40A5 1F            POP DS
        FDC8:40A6 CF            IRET            ; return to caller

In addition to showing how DOS happens to handle function 33h, the code in figure 6-5 also provides many snippets of information that can be used to understand the disassembly listing of other parts of MS-DOS. For example, Microsoft documents INT 21h AX=3306h as returning the DOSINHMA flag in DH. The end of figure 6-5 shows DOS using the byte at DOS_DS:[1211h] to set DH. Therefore, DOS_DS:[1211h] must be the DOS=HIGH indicator. This is not important by itself, but you can use this factoid to help you understand other parts of the code: anywhere you see DOS_DS:[1211h], you now know that this is the DOSINHMA flag.

Similarly, functions 3300h and 3301h are known to get and set the Ctrl-C flag; figure 6-5 shows these functions manipulating the byte at offset 0337h in the DOS data segment; this byte must then be the Ctrl-C (or break) flag. (Later on, at step 11 in figure 6-7, we'll see how DOS uses this flag.) Finally, Microsoft documents INT 21h AX=3305h as returning the startup drive in DL, and the code in figure 6-5 clearly shows DOS setting DL from DOS_DS:[0069h]. Therefore, anywhere else in the code where you see DOS_DS:[0069h], you can now translate this to STARTUP_DRIVE. Q.E.D.
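The control flow of figure 6-5 can be restated in C. The following is only a sketch of the logic as read from the disassembly, not Microsoft's source: the names DosData, Regs33, and int21_fn33 are hypothetical, and the three data bytes stand in for DOS_DS:[0337h], DOS_DS:[0069h], and DOS_DS:[1211h] in this particular configuration.

```c
#include <stdint.h>

/* Hypothetical model of the DOS data-segment bytes touched in figure 6-5. */
typedef struct {
    uint8_t break_flag;     /* DOS_DS:[0337h], SDA+17h */
    uint8_t startup_drive;  /* DOS_DS:[0069h] */
    uint8_t dos_in_hma;     /* DOS_DS:[1211h], nonzero if DOS=HIGH */
} DosData;

/* Hypothetical view of the caller's registers that function 33h uses. */
typedef struct {
    uint8_t  al;            /* subfunction number on entry */
    uint16_t bx;
    uint8_t  dl, dh;
} Regs33;

/* Sketch of INT 21h AH=33h as disassembled in figure 6-5. */
void int21_fn33(DosData *dos, Regs33 *r)
{
    uint8_t old;

    if (r->al > 0x06) {             /* CMP AL,06 / JBE 4059 */
        r->al = 0xFF;               /* error: subfunction number too high */
        return;
    }
    switch (r->al) {
    case 0x00:                      /* 21/3300: get break flag */
        r->dl = dos->break_flag;
        break;
    case 0x01:                      /* 21/3301: set break flag */
        r->dl &= 1;                 /* AND DL,01 also alters caller's DL */
        dos->break_flag = r->dl;
        break;
    case 0x02:                      /* 21/3302: exchange (XCHG DL,[SI]) */
        old = dos->break_flag;
        dos->break_flag = r->dl & 1;
        r->dl = old;
        break;
    case 0x05:                      /* 21/3305: get startup drive */
        r->dl = dos->startup_drive;
        break;
    case 0x06:                      /* 21/3306: version plus DOSINHMA flag */
        r->bx = 0x0006;
        r->dl = 0x00;
        r->dh = dos->dos_in_hma ? 0x10 : 0x00;
        break;
    default:                        /* 3303h, 3304h: nothing is changed */
        break;
    }
}
```

Note that, as in the disassembly, subfunctions 3 and 4 pass the range check but fall straight through to the exit code without touching any register.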

Examining the Low-Memory Stub for DOS=HIGH

Another interesting location to examine is the function that DOS's low-memory stub calls when DOS=HIGH, but the A20 line is disabled. The processor's A20 address line accesses memory above one megabyte. PCs based on 286 and higher processors disable A20 in order to emulate address wraparound on 8088 PCs. If DOS=HIGH but A20 is off, DOS must enable A20 before it can reach its code in the HMA above one megabyte. But if DOS's code is located above one megabyte, how can it check A20 in the first place? With a function that it keeps in low memory when DOS=HIGH. Earlier (figure 6-3) you saw this was located at 0116:118E; figure 6-6 shows what this function actually does.

Figure 6-6: DOS Function Called When DOS=HIGH But A20 Is Off

        -u 116:118e
        0116:118E 53            PUSH    BX
        0116:118F 50            PUSH    AX
        0116:1190 8CD0          MOV     AX,SS
        0116:1192 2E            CS:
        0116:1193 A38610        MOV     [1086],AX
        0116:1196 2E            CS:
        0116:1197 89268810      MOV     [1088],SP   ; save caller's stack
        0116:119B 8CC8          MOV     AX,CS       ; switch to a DOS stack; hmm, not
        0116:119D 8ED0          MOV     SS,AX       ;    reentrant at all if A20 off!
        0116:119F BCA007        MOV     SP,07A0     ; SDA+480h=end of Crit Err Stack
        0116:11A2 B405          MOV     AH,05       ; XMS func 5 = Local Enable A20
        0116:11A4 2E            CS:
        0116:11A5 FF1E6311      CALL    FAR [1163]  ; XMS address from 2F/4310
        0116:11A9 0BC0          OR      AX,AX
        0116:11AB 740F          JZ      11BC        ; failed: can't turn A20 on!!

        ;;; okay:
        0116:11AD 2E            CS:
        0116:11AE A18610        MOV AX,[1086]
        0116:11B1 8ED0          MOV SS,AX
        0116:11B3 2E            CS:
        0116:11B4 8B268810      MOV SP,[1088]   ; switch back to caller's stack
        0116:11B8 58            POP AX
        0116:11B9 5B            POP BX
        0116:11BA EBCB          JMP 1187  ; jump back into normal code (fig. 6-3)
                                    ; as if A20 had been enabled all along.

        ;;; fail:
        0116:11BC B40F          MOV AH,0F       ; come here if couldn't enable A20
        0116:11BE CD10          INT 10          ; get video mode
        0116:11C0 3C07          CMP AL,07
        0116:11C2 7406          JZ  11CA
        0116:11C4 32E4          XOR AH,AH
        0116:11C6 B002          MOV AL,02       ; set normal text mode
        0116:11C8 CD10          INT 10
        0116:11CA B405          MOV AH,05
        0116:11CC 32C0          XOR AL,AL       ; set display page 0
        0116:11CE CD10          INT 10
        0116:11D0 BEB812        MOV SI,12B8     ; 12B8 -> "\nA20 Hardware Error\n$"
        0116:11D3 0E            PUSH    CS
        0116:11D4 1F            POP DS
        0116:11D5 FC            CLD
        0116:11D6 AC            LODSB
        0116:11D7 3C24          CMP AL,24       ; look for '$'
        0116:11D9 7409          JZ  11E4
        0116:11DB B40E          MOV AH,0E       ; write in TTY mode (use BIOS
        0116:11DD BB0700        MOV BX,0007     ;    since can't make DOS calls
        0116:11E0 CD10          INT 10          ;    here!)
        0116:11E2 EBF2          JMP 11D6
        0116:11E4 FB            STI
        0116:11E5 EBFD          JMP 11E4        ; tight little loop (INTs on)

        -d 116:12b8
        0116:12B0                         -0D 0A 41 32 30 20 48 61          ..A20 Ha
        0116:12C0  72 64 77 61 72 65 20 45-72 72 6F 72 0D 0A 24 36  rdware Error..$6

Notice, by the way, that DOS leaves the A20 line on. This reduces the overhead of keeping the DOS code in the HMA: DOS probably doesn't have to call the low-memory stub in figure 6-6 very often.
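To see why the INT 21h handler at FDC8:40F8 is unreachable with A20 off while the stub at 0116:118E is always reachable, it helps to work out the linear addresses. The sketch below assumes the usual real-mode rule (segment times 16 plus offset) and models a disabled A20 line as a wrap at one megabyte; the function names are hypothetical.

```c
#include <stdint.h>

/* Linear address of a real-mode seg:off pair with the A20 line enabled. */
uint32_t linear_a20_on(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* With A20 disabled, address bit 20 is forced to zero, so any address
   at or above 1 MB wraps around to low memory, as on an 8088. */
uint32_t linear_a20_off(uint16_t seg, uint16_t off)
{
    return linear_a20_on(seg, off) & 0xFFFFFUL;
}

/* Nonzero if seg:off lies in the HMA and so needs A20 enabled. */
int in_hma(uint16_t seg, uint16_t off)
{
    return linear_a20_on(seg, off) >= 0x100000UL;
}
```

For FDC8:40F8 this gives FDC80h + 40F8h = 101D78h, just past the one-megabyte line; with A20 off the same seg:off would land at 01D78h instead, which is why the stub has to live in low memory.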

That all calls to DOS in the HMA are guarded with this low-memory stub brings up an interesting question: what about data in the HMA? MS-DOS doesn't put internal data structures such as the Current Directory Structure (CDS) and System File Tables (SFT) up in the HMA, because this would break too many third-party applications that peek and poke these ostensibly-internal structures and that wouldn't know to ensure that A20 is enabled. However, DOS does keep its BUFFERS in the HMA. If a program such as BUFFERS.C in chapter 8 (see listing 8-8) accesses the DOS sector buffers ("or if some future version of DOS has FILESHIGH or LASTDRIVEHIGH statements that use HMA," adds one tech reviewer), the program would need to check and possibly reenable A20, just as DOS does in figure 6-6. But since, from what we've just seen, any trivial DOS call will ensure that A20 is turned on, perhaps a program that accesses data in the HMA merely needs to preface that access with a trivial DOS call: DOS will take care of checking the A20 state and, if necessary, calling XMS function 5 to enable A20. But any TSR could turn it off! How frequently should programs that access the HMA check the A20 state? How much of a problem is this? Is the extra few kbytes gained by putting data in the HMA worth this kind of uncertainty? ("Ouch! This makes my head hurt," says one of the tech reviewers.)

Examining the INT 21h Dispatch Function

Of all the addresses we found through tracing the INT 21h call, the most important is that of DOS's INT 21h handler, seen above in figure 6-3 at FDC8:40F8. This is really the piece of information we wanted all along. To see exactly what happens during an INT 21h call, we can now disassemble at this address. By tracing an INT 21h AH=62h, we saw only those snippets that happen to get executed when calling the Get PSP function; we can now look at the entire function. Here it is (figure 6-7), the DOS INT 21h handler (this time we've used SYMDEB and added some labels as well as comments). In Microsoft's source code, this all-important function, located in MSDISP.ASM, is called COMMAND.

Figure 6-7: MS-DOS 6.0 INT 21h Dispatch Function

        -u fdc8:40f8
        FDC8:40F8 FA             CLI                    ; disable interrupts
        FDC8:40F9 80FC6C         CMP    AH,6C
        FDC8:40FC 77D2           JA     40D0            ; invalid function number

        ; step 1
        FDC8:40FE 80FC33         CMP    AH,33
        FDC8:4101 7218           JB     411B            ; normal DOS function
        FDC8:4103 74A2           JZ     40A7            ; do 21/33 (fig. 6-5)
        FDC8:4105 80FC64         CMP    AH,64
        FDC8:4108 7711           JA     411B            ; normal DOS function
        FDC8:410A 74B5           JZ     40C1            ; do 21/64
        FDC8:410C 80FC51         CMP    AH,51
        FDC8:410F 74A4           JZ     40B5            ; do Get PSP
        FDC8:4111 80FC62         CMP    AH,62
        FDC8:4114 749F           JZ     40B5            ; do Get PSP (51==62)
        FDC8:4116 80FC50         CMP    AH,50
        FDC8:4119 748E           JZ     40A9            ; do Set PSP (fig. 6-4)

        ; step 2
        ; caller's flags, CS, and IP of course already pushed on the stack by INT
        FDC8:411B 06             PUSH   ES   ; 10h   ; Save regs on caller's stack.
        FDC8:411C 1E             PUSH   DS   ; 0Eh   ; The order is important, as
        FDC8:411D 55             PUSH   BP   ; 0Ch   ; later on different INT 21h
        FDC8:411E 57             PUSH   DI   ; 0Ah   ; functions will access the
        FDC8:411F 56             PUSH   SI   ; 08h   ; caller's original registers
        FDC8:4120 52             PUSH   DX   ; 06h   ; by treating this stack frame
        FDC8:4121 51             PUSH   CX   ; 04h   ; as a structure. See 2f/1218.
        FDC8:4122 53             PUSH   BX   ; 02h   ; For example, caller's BX
        FDC8:4123 50             PUSH   AX   ; 00h   ; is at offset 2, ES at 10h.

        ; step 3
        FDC8:4124 8CD8           MOV    AX,DS
        FDC8:4126 2E8E1EE73D     MOV    DS,CS:[3DE7]       ; get DOS DS
        FDC8:412B A3EC05         MOV    [05EC],AX          ; save caller's DS
        FDC8:412E 891EEA05       MOV    [05EA],BX          ; save caller's BX
        FDC8:4132 A18405         MOV    AX,[0584]   ; SDA+264h = ptr to stack frame
        FDC8:4135 A3F205         MOV    [05F2],AX   ;    containing user registers
        FDC8:4138 A18605         MOV    AX,[0586]   ;    on INT 21h
        FDC8:413B A3F005         MOV    [05F0],AX

        ; step 4
        FDC8:413E 33C0           XOR    AX,AX              ; set AX=0
        FDC8:4140 A27205         MOV    [0572],AL
        FDC8:4143 F606301001     TEST   Byte Ptr [1030],01 ; Is Win3 Enh running?
        FDC8:4148 7503           JNZ    414D
        ; following line only if Windows 3 Enhanced mode not running!
        FDC8:414A A33E03         MOV    [033E],AX          ; set machine ID to zero

        ; step 5
        FDC8:414D FE062103       INC    Byte Ptr [0321]    ; increment InDOS flag

        ; step 6
        FDC8:4151 89268405       MOV    [0584],SP          ; SDA+264h
        FDC8:4155 8C168605       MOV    [0586],SS          ; save current stack ptr
        FDC8:4159 A13003         MOV    AX,[0330]          ; get current PSP
        FDC8:415C A33C03         MOV    [033C],AX          ; SDA+1Ch = SHARE, NET PSP
        FDC8:415F 8ED8           MOV    DS,AX              ; point DS at caller's PSP
        FDC8:4161 58             POP    AX
        FDC8:4162 50             PUSH   AX                 ; get back caller's AX 
        FDC8:4163 89262E00       MOV    [002E],SP          ; save current stack ptr
        FDC8:4167 8C163000       MOV    [0030],SS          ;    in caller's PSP
        FDC8:416B 2E8E16E73D     MOV    SS,CS:[3DE7]  
        ; INT 21h AX=5D00h (Server Function Call) jumps to here
                                                    ; switch stack to 07A0h-SDA =
        FDC8:4170 BCA007         MOV    SP,07A0  ;   SDA+480h=end of Crit Err Stk

        ; step 7
        FDC8:4173 FB             STI                        ; reenable interrupts
        FDC8:4174 8CD3           MOV    BX,SS
        FDC8:4176 8EDB           MOV    DS,BX               ; point DS at DOS_DS
        FDC8:4178 93             XCHG   AX,BX               ; caller's AX into BX
        FDC8:4179 33C0           XOR    AX,AX
        FDC8:417B 36A2F605       MOV    SS:[05F6],AL        ; extended open off?
        FDC8:417F 36812611060008 AND    Word Ptr SS:[0611],0800
        FDC8:4186 36A25703       MOV    SS:[0357],AL        ; set different vars to 0
        FDC8:418A 36A24C03       MOV    SS:[034C],AL
        FDC8:418E 36A24A03       MOV    SS:[034A],AL
        FDC8:4192 40             INC    AX
        FDC8:4193 36A25803       MOV    SS:[0358],AL        ; okay to do INT 28h

        ; step 8
        FDC8:4197 93             XCHG   AX,BX               ; get back caller's AX
        FDC8:4198 8ADC           MOV    BL,AH               ; DOS func num into BL
        FDC8:419A D1E3           SHL    BX,1    ; make DOS func number into word ofs

        ; step 9
        FDC8:419C FC             CLD
        FDC8:419D 0AE4           OR     AH,AH
        FDC8:419F 7417           JZ     41B8            ; AH=0 (terminate process)
        FDC8:41A1 80FC59         CMP    AH,59
        ; if 21/59 (get critical error), bypass code that turns off critical error!
        FDC8:41A4 7444           JZ     41EA            ; AH=59h (get extended error)
        FDC8:41A6 80FC0C         CMP    AH,0C
        FDC8:41A9 770D           JA     41B8            ; AH > 0Ch

        ; step 10
        FDC8:41AB 36803E200300   CMP    Byte Ptr SS:[0320],00  ; critical error set?
        FDC8:41B1 7537           JNZ    41EA    ; if so, stay with crit error stack
        FDC8:41B3 BCA00A         MOV    SP,0AA0 ; SDA+780h=end of Char I/O Stack
        FDC8:41B6 EB32           JMP    41EA

        INT21_ABOVE_0C: ;;; except (normally) 33h, 50h, 51h, 59h, 62h, 64h
        ; step 11
        FDC8:41B8 36A33A03       MOV    SS:[033A],AX
        FDC8:41BC 36C606230301   MOV    Byte Ptr SS:[0323],01  ; crit err locus
        FDC8:41C2 36C606200300   MOV    Byte Ptr SS:[0320],00  ; turn off crit error
        FDC8:41C8 36C6062203FF   MOV    Byte Ptr SS:[0322],FF  ; crit err drive#

        ; Windows Enhanced mode patches next four lines into a far call!
        FDC8:41CE 50             PUSH   AX
        FDC8:41CF B482           MOV    AH,82
        FDC8:41D1 CD2A           INT    2A                    ; End crit section
        FDC8:41D3 58             POP    AX

        FDC8:41D4 36C606580300   MOV    Byte Ptr SS:[0358],00   ; no INT 28h
        FDC8:41DA BC2009         MOV    SP,0920        ; SDA+600h = end of Disk Stack
        FDC8:41DD 36F6063703FF   TEST   Byte Ptr SS:[0337],FF ; SDA+17h=break flag
        FDC8:41E3 7405           JZ     41EA
        FDC8:41E5 50             PUSH   AX                      ; BREAK=ON, so
        FDC8:41E6 E8964E         CALL   907F                    ; check ctrl-break
        FDC8:41E9 58             POP    AX

        ; step 12
        ;;; next four lines are the key; call through dispatch table
        ;;; BX holds caller's INT 21h function number SHL 1 (word offset)
        FDC8:41EA 2E8B9F9E3E     MOV    BX,CS:[BX+3E9E]  ; get func handler addr
        FDC8:41EF 36871EEA05     XCHG   BX,SS:[05EA]     ; move func ptr into var
        FDC8:41F4 368E1EEC05     MOV    DS,SS:[05EC]  ; switch to caller's saved DS
        FDC8:41F9 36FF16EA05     CALL   SS:[05EA]        ; call func handler addr!
        ;;; we've just called the DOS function for the specific DOS function in AH

        ; step 13
        ;;; now into cleanup preparatory to returning to caller
        FDC8:41FE 3680268600FB   AND    Byte Ptr SS:[0086],FB
        FDC8:4204 FA             CLI
        FDC8:4205 2E8E1EE73D     MOV    DS,CS:[3DE7]     ; switch back to DOS DS
        FDC8:420A 803E850000     CMP    Byte Ptr [0085],00
        FDC8:420F 7527           JNZ    4238
        FDC8:4211 FE0E2103       DEC    Byte Ptr [0321]         ; decrement InDOS
        FDC8:4215 8E168605       MOV    SS,[0586]        ; switch back to caller's
        FDC8:4219 8B268405       MOV    SP,[0584]        ;   stack
        FDC8:421D 8BEC           MOV    BP,SP
        FDC8:421F 884600         MOV    [BP+00],AL
        FDC8:4222 A1F205         MOV    AX,[05F2]
        FDC8:4225 A38405         MOV    [0584],AX        ; caller's SP
        FDC8:4228 A1F005         MOV    AX,[05F0]
        FDC8:422B A38605         MOV    [0586],AX        ; caller's SS
        FDC8:422E 58             POP    AX               ; put back caller's
        FDC8:422F 5B             POP    BX               ; registers, including
        FDC8:4230 59             POP    CX               ; any changes the DOS
        FDC8:4231 5A             POP    DX               ; function made to them
        FDC8:4232 5E             POP    SI
        FDC8:4233 5F             POP    DI
        FDC8:4234 5D             POP    BP
        FDC8:4235 1F             POP    DS
        FDC8:4236 07             POP    ES
        FDC8:4237 CF             IRET

The dispatch function in figure 6-7 is the heart of DOS. It is executed every time a program issues an INT 21h call. The dispatch function is the DOS equivalent of the function syscall() in UNIX, which has been examined in books such as Bach's Design of the UNIX Operating System (pp. 165-168) and Andleigh's UNIX System Architecture (pp. 21-23). The discussions of syscall() in these and other UNIX books provide useful background for understanding the INT 21h dispatch code. However, in UNIX there is a clear separation between applications and the operating system. The discussions of syscall() emphasize the transition from user mode to kernel mode. As you can see, there is nothing like this in DOS, though DOS extenders such as Windows do maintain a separation between the application running in protected mode and DOS running in real mode. Actually, there is one important separation. DOS usually switches from the application's stack to one of its own. This important aspect of DOS will be discussed in detail below.

Near the top of the function (commented "step 1"), you see how DOS picks off a handful of special functions (33h, 64h, 51h, 62h, and 50h). These of course are none other than what we've been calling the reentrant DOS functions. Here, reentrancy simply means that, while the above code is executing (after it has passed the initial CLI, and before it has executed the closing IRET), it could be interrupted by an interrupt handler, and the interrupt handler could call one of these five functions. These five functions are reentrant simply in the sense that DOS handles them before switching stacks and incrementing the InDOS flag. Thus, an interrupt handler can call these functions, even if the InDOS or critical error flag is set.

In a larger sense, of course, these functions aren't really reentrant, given the way that, for example, the Set PSP function writes to a global variable (see figure 6-4). MS-DOS's extensive reliance on global variables makes it completely non-reentrant. Furthermore, if DOS=HIGH and the A20 line is off, DOS, as figure 6-6 showed, has to switch stacks. But in any case, it should now be clear why we picked INT 21h AH=62h to trace with DEBUG and not, say, INT 21h AH=52h; DOS handles the latter function only after switching stacks.
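The chain of compares and jumps in step 1 of figure 6-7 can be summarized as a small classifier. This is a sketch of the decision order only; the enum and function names are hypothetical, and the 6Ch upper limit is the one seen in this MS-DOS 6.0 configuration.

```c
#include <stdint.h>

/* Hypothetical labels for the paths taken out of step 1. */
typedef enum {
    FN_INVALID,   /* AH > 6Ch: jump to 40D0 */
    FN_NORMAL,    /* falls into step 2 at 411B */
    FN_33,        /* handled at 40A7 (figure 6-5) */
    FN_64,        /* handled at 40C1 */
    FN_GET_PSP,   /* 51h and 62h, handled at 40B5 */
    FN_SET_PSP    /* 50h, handled at 40A9 (figure 6-4) */
} FnClass;

/* Mirrors the order of the CMP/Jcc instructions at FDC8:40F9..4119. */
FnClass classify_int21(uint8_t ah)
{
    if (ah > 0x6C)  return FN_INVALID;
    if (ah < 0x33)  return FN_NORMAL;
    if (ah == 0x33) return FN_33;
    if (ah > 0x64)  return FN_NORMAL;
    if (ah == 0x64) return FN_64;
    if (ah == 0x51) return FN_GET_PSP;
    if (ah == 0x62) return FN_GET_PSP;
    if (ah == 0x50) return FN_SET_PSP;
    return FN_NORMAL;
}
```

Only the range 33h through 64h pays for the extra compares; everything below 33h or above 64h leaves the chain after at most four instructions.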

Next (step 2), the INT 21h dispatch code pushes the caller's registers on to the caller's stack. The caller is of course simply whatever program issued the INT 21h call. This can be slightly disorienting because, of course, we're used to thinking about INT 21h from the caller's perspective and now we're looking at it from DOS's point of view. These pushed registers form a structure that many DOS functions use later on. Undocumented INT 2Fh AX=1218h (Get Caller's Registers; see appendix) returns a pointer to this structure.
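The stack frame built in step 2 can be written down as a C structure, using the offsets given in the comments of figure 6-7. The structure name Int21Frame is hypothetical; the last three fields are the IP, CS, and flags that the INT instruction itself pushed before DOS got control.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the caller's registers as they sit on the
   caller's stack after step 2. Lowest address (last pushed) first. */
typedef struct {
    uint16_t ax;      /* 00h */
    uint16_t bx;      /* 02h */
    uint16_t cx;      /* 04h */
    uint16_t dx;      /* 06h */
    uint16_t si;      /* 08h */
    uint16_t di;      /* 0Ah */
    uint16_t bp;      /* 0Ch */
    uint16_t ds;      /* 0Eh */
    uint16_t es;      /* 10h */
    uint16_t ip;      /* 12h: pushed by the INT instruction */
    uint16_t cs;      /* 14h: pushed by the INT instruction */
    uint16_t flags;   /* 16h: pushed by the INT instruction */
} Int21Frame;

/* Number of bytes an INT 21h call consumes on the caller's stack. */
size_t int21_frame_bytes(void)
{
    return sizeof(Int21Frame);
}
```

Nine pushed registers plus the three words from INT add up to 24 bytes, or 18h.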

At step 3 in figure 6-7, DOS saves away the caller's DS and BX again, and switches from the caller's DS to its own DS. DOS keeps DS in a variable accessible through DOS's CS. It is also available by calling INT 2Fh AX=1203h (see get_dos_ds() in listing 6-2). Note that, even though DOS=HIGH and the DOS code is in the HMA, the data segment is still in low memory. This is necessary because many existing DOS programs rely on the ability to reach DOS internal data structures and wouldn't know to check the status of the A20 line. Microsoft has to introduce improvements such as DOS=HIGH without breaking existing applications.

The next interesting thing the code does (step 4) is check a variable at 1030h to see whether Windows 3.x Enhanced mode (or Windows/386 2.x) is running. Since most of us think of Windows as something that runs "on top of" DOS, it is a bit disconcerting at first to learn that DOS 5.0 and higher know about Windows. As discussed in chapter 1, however, this part at least of the intricate DOS/Windows connection is implemented using documented functionality. In its INT 2Fh handler, MSDOS.SYS monitors the AX=1605h Windows initialization and AX=1606h exit broadcasts; the code for AX=1605h sets the variable at 1030h (actually, just the byte at 1031h), and the code for AX=1606h clears it. This variable thus serves as a kind of InWindows flag. It's important to underline that this is for Enhanced mode only; DOS doesn't care one way or the other about Standard mode.

If Windows Enhanced mode is not running, then DOS zeroes out a variable at 033Eh (SDA+1Eh), used by DOS as the machine ID. If Windows Enhanced mode is running, the DOSMGR VxD (as explained in chapter 1) has smacked a virtual machine ID in here. DOS uses this VM ID to manage SFTs.

Next (step 5), the code increments the InDOS flag, which is simply a variable at 0321h (SDA+1) in the DOS data segment. The until-recently-undocumented function INT 21h AH=34h (Get InDOS Flag Address) returns a pointer to this variable.
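The listing comments switch freely between offsets in the DOS data segment (0321h, 0330h) and offsets relative to the SDA (SDA+1, SDA+10h). A pair of trivial helpers makes the conversion explicit. This is a sketch that assumes the SDA starts at offset 0320h in the DOS data segment, as INT 21h AX=5D06h reported in this configuration; the base will differ in other DOS versions, and the names are hypothetical.

```c
#include <stdint.h>

/* Offset part of 0116:0320, the SDA address in this configuration only. */
#define SDA_BASE 0x0320u

/* Convert an SDA-relative offset (e.g. 10h) to a DOS_DS offset (0330h). */
uint16_t sda_to_ds(uint16_t sda_rel)
{
    return (uint16_t)(SDA_BASE + sda_rel);
}

/* Convert a DOS_DS offset (e.g. 07A0h) to an SDA-relative offset (480h). */
uint16_t ds_to_sda(uint16_t ds_off)
{
    return (uint16_t)(ds_off - SDA_BASE);
}
```

With these, the InDOS flag (SDA+1) comes out at 0321h, the current PSP (SDA+10h) at 0330h, the break flag (SDA+17h) at 0337h, and the machine ID (SDA+1Eh) at 033Eh, matching the addresses in figures 6-4, 6-5, and 6-7.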

The InDOS flag has been set, so we're now "in DOS"! Of course, we were in DOS before, but the significance of this spot is that DOS is about to switch stacks. Switching stacks requires a guard or semaphore, namely the InDOS flag. Notice, however, that while DOS increments the InDOS flag, it does not check it before proceeding. Thus, InDOS is not a true semaphore. If the processor is interrupted in the middle of this code (or, rather, a little further on when DOS reenables interrupts with an STI instruction), the code can be reentered.

In other words, DOS does nothing to enforce its requirement that only one caller at a time execute inside the INT 21h code. Obeying the InDOS flag is merely a convention. But it is vital that programs do observe this convention, because making an INT 21h call when InDOS is set will almost always cause problems. For one thing, DOS relies on many global variables. If, for example, DOS were working with a particular hard-disk cluster to service an INT 21h file I/O call, and an interrupt handler that ignored the InDOS flag made a file I/O call to DOS before DOS had finished with the first one, DOS would mistakenly use the second caller's cluster to satisfy (not!) the first caller's request. Global variables do not work like a last-in/first-out stack. It is vital that interrupt handlers check InDOS before issuing INT 21h calls. (So why did it take Microsoft so long to document InDOS and the INT 21h AH=34h function that returns a pointer to it?)

Ignoring InDOS can cause another problem. Because the code at step 5 in figure 6-7 increments InDOS, reentering DOS means that InDOS will take on a value of two or greater. This is bad, because the internal DOS function that checks for Ctrl-C only does so when CMP Byte Ptr IN_DOS, 01. Thus, if InDOS is 2 or greater, DOS won't check Ctrl-C, even if BREAK=ON.
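The InDOS behavior described in the last few paragraphs can be modeled in a few lines. This is a toy model under stated assumptions, not DOS code: the DosFlags structure and all function names are hypothetical, and the Ctrl-C rule simply restates the "CMP Byte Ptr IN_DOS, 01" test mentioned above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the InDOS byte at SDA+1. */
typedef struct {
    uint8_t in_dos;
} DosFlags;

/* Step 5 of figure 6-7: DOS increments InDOS without testing it first. */
void dos_enter(DosFlags *f)
{
    f->in_dos++;
}

/* Step 13 of figure 6-7: DOS decrements InDOS on the way out. */
void dos_leave(DosFlags *f)
{
    f->in_dos--;
}

/* The convention an interrupt handler is expected to follow before
   issuing a non-reentrant INT 21h call; DOS itself enforces nothing. */
int safe_to_call_dos(const DosFlags *f)
{
    return f->in_dos == 0;
}

/* DOS checks for Ctrl-C only when BREAK=ON and InDOS is exactly 1. */
int ctrl_c_checked(const DosFlags *f, uint8_t break_flag)
{
    return break_flag != 0 && f->in_dos == 1;
}
```

Calling dos_enter() twice without an intervening dos_leave() models a handler that ignored the convention: InDOS becomes 2 and ctrl_c_checked() returns zero even with BREAK=ON.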

There is a method by which DOS can be safely reentered: if the entire DOS state (including all three DOS stacks) is saved and restored by each caller, and if each such caller observes the DOS critical sections by hooking INT 2Ah. The SDA TSR technique put forward in chapter 9 is an approximation to this method, though only an approximation because the SDA does not include the entire DOS state.

Returning to step 6 in figure 6-7, you can see the beginnings of the stack switching code. How does DOS switch away from the user's stack to one of its own? First, it saves away the caller's current SS:SP. Next, DOS gets the current PSP (at 0330h, or SDA+10h) and uses it to save the caller's SS:SP at offset 2Eh in the caller's PSP. Finally, it sets SS:SP to a DOS stack. Depending on the DOS function number, it may switch again to a different DOS stack; see below.

What is the purpose of stack switching? Why not just use the caller's stack? Wouldn't that make DOS much more reentrant? Yes, it would. As it is, making an INT 21h call already uses 18h bytes on the caller's stack (see table 6-2). If the caller could be relied upon to provide a large enough stack, DOS could even be multithreaded. Unfortunately, DOS has to accommodate programs with unknown stack sizes. This complicates DOS tremendously and helps make it non-reentrant.

At the very end of step 6, where DOS points SP at the Critical Error stack, is a location (called Redisp in the source code) to which undocumented INT 21h AX=5D00h (Server Function Call) jumps. This function is a backdoor into the INT 21h dispatcher. If a network-aware program hooks this call, it can be used by one machine to do remote INT 21h calls on another machine (or perhaps to another Windows virtual machine).

Skipping over a bunch of the code in step 7, which zeroes out several variables in the DOS data segment, we come to step 8, where the code takes the caller's AH (with the crucial DOS function number) and turns it into a word offset in BX. This will be important later on.

Next (step 9), DOS examines the function number in AH. If AH=59h (Get Extended Error) is being called, DOS proceeds directly to step 12, where the code for function 59h will be called. It stays on the Critical Error Stack, bypassing more stack-switching code in steps 10 and 11, and bypassing code that would obliterate information pertaining to any pending Critical Error.

If one of the CP/M-based character I/O functions (INT 21h AH=1 through AH=0Ch) is being called, DOS (step 10) points SP at 0AA0h, which is the top of the character I/O stack, located in the Swappable Data Area (see appendix). However, if there is a pending critical error, DOS stays with the Critical Error stack that was set initially. This is not surprising, since Microsoft documents these functions (MS-DOS Programmer's Reference) as callable from a critical error handler. Notice that DOS does not turn off critical error information for functions 1 through 0Ch. As you can see, much of the core DOS code accommodates critical errors.

Finally, if the DOS function number is 0 (Terminate Program), or anything greater than 0Ch, but not 59h, and not one of the special functions that were picked off earlier in step 1 and which DOS already processed on the caller's stack, DOS (step 11) switches to the disk stack. Thus, there are three DOS stacks:

Critical Error (or auxiliary), used for function 59h and for functions 1 through 0Ch when a critical error is pending, and used temporarily for any DOS function call if DOS=HIGH but A20 is off.
Character I/O Stack, used for functions 1 through 0Ch in the absence of a critical error
Disk Stack, used for everything else. Calling any of the special functions with the INT 21h AX=5D00h indirect function call also ends up using this stack (though in practice indirectly calling the special functions via 21/5D00 crashes the machine).
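
The stack choice described in steps 9 through 11 can be summarized as a small decision function. This is a sketch in portable C rather than anything from DOS itself. The enum, the function names, and in particular the set of step-1 functions are assumptions: only 50h, 51h, and 62h are used here as stand-ins for the calls that stay on the caller's stack, and the real list in figure 6-7 may well be longer. The temporary use of the Critical Error stack when A20 is off, and the AX=5D00h indirect call, are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Where a DOS call ends up running, following the three-stack list above. */
typedef enum {
    STACK_CALLER,          /* step-1 functions never leave the caller's stack */
    STACK_CRITICAL_ERROR,
    STACK_CHAR_IO,
    STACK_DISK
} dos_stack;

/* ASSUMPTION: stand-in for the functions picked off in step 1. */
static bool is_step1_special(uint8_t ah)
{
    return ah == 0x50 || ah == 0x51 || ah == 0x62;
}

dos_stack select_dos_stack(uint8_t ah, bool critical_error_pending)
{
    if (is_step1_special(ah))
        return STACK_CALLER;
    if (ah == 0x59)                      /* Get Extended Error (step 9) */
        return STACK_CRITICAL_ERROR;
    if (ah >= 0x01 && ah <= 0x0C)        /* character I/O (step 10) */
        return critical_error_pending ? STACK_CRITICAL_ERROR : STACK_CHAR_IO;
    return STACK_DISK;                   /* function 0 and all the rest (step 11) */
}

/* A few cases read straight off the list above. */
void stack_examples(void)
{
    assert(select_dos_stack(0x59, false) == STACK_CRITICAL_ERROR);
    assert(select_dos_stack(0x02, false) == STACK_CHAR_IO);
    assert(select_dos_stack(0x02, true)  == STACK_CRITICAL_ERROR);
    assert(select_dos_stack(0x3F, false) == STACK_DISK);
}
```

Note that a pending critical error changes the answer only for functions 1 through 0Ch; a disk-stack function such as 3Fh goes to the disk stack either way.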

For the majority of functions running on the disk stack, the code (step 11) carries out a number of tasks, turning off critical error, calling undocumented INT 2Ah AH=82h to end any critical sections (see below), and checking the Ctrl-C Check flag at SDA+17h. In figure 6-5, you saw the code for INT 21h AX=3301h that sets this flag when a user types BREAK=ON. Now you can see where DOS actually uses this flag. If BREAK=ON (that is, if the flag at SDA+17h is non-zero), DOS calls a subroutine (here located at 907Fh) to check Ctrl-C for the functions that come through here. Otherwise, DOS (elsewhere) only checks Ctrl-C for functions 1 through 0Ch. As noted earlier, the DOS internal function to check Ctrl-C will only do so if IN_DOS == 1.

What is this call to INT 2Ah AH=82h? Normally, the INT 2Ah handler in DOS does an immediate IRET, performing no operation. However, other programs can take over INT 2Ah and/or patch DOS. Windows Enhanced mode, in particular, uses INT 2Ah critical sections because it runs preemptively multitasked DOS boxes on top of a single copy of MS-DOS. Because the InDOS flag is instanced per VM (that is, each DOS box gets its own copy), it cannot be used to control access to DOS by different VMs. Nor would you want the InDOS flag to do that, as different VMs could be in different parts of DOS at the same time. What different parts? Different critical sections can be set and cleared with INT 2Ah AH=80h and 81h (see appendix). DOS's call to INT 2Ah AH=82h is a signal that a multitasking extension to DOS, such as Windows or networking software, can restart any task (VM) that was suspended because it was waiting for a critical section. For additional information on critical sections, see chapter 1 and chapter 9 (see CRITSECT.C in listing 9-XX). Also see Microsoft's MS-DOS 6 Technical Reference (p. 41), which briefly discusses critical sections in the context of the MRCI specification.

As discussed later in this chapter, the DOSMGR VxD in Windows Enhanced mode patches this INT 2Ah AH=82h in the INT 21h dispatch, turning it into a far call into Windows. When Windows exits, of course it (hopefully) puts back the original code.

With all this talk of critical errors, Ctrl-Break, and critical sections—which do dominate the DOS dispatch code—it is important not to lose sight of the main goal, which is that a program wants to call a DOS function! As is typical of software, DOS accomplishes this main goal in only a few lines, while rarer situations such as critical errors and so on occupy the bulk of the code.

Having switched to an appropriate stack, saved the caller's registers, and so on, step 12 in figure 6-7 is the simplest and the most important. Recall that step 8 moved the function number in AH into BX and multiplied by two. DOS will now use this value as an offset into an array of function pointers, one for each DOS function. Here, the table is at CS:3E9E, so that for example the address for DOS function 0 is at CS:3E9E, function 1's address is at CS:3EA0, function 2's address is at CS:3EA2, and so on. Since this array holds two-byte words, not far pointers, you can't use it to hook individual DOS functions. All handlers must be located in a single segment (here, FDC8h). We come back to this array of function pointers momentarily; it is very important to us. In any case, having retrieved the address of the handler for the DOS function being called, DOS calls the handler. Ta da! The function the user wanted has now been called.

In step 13, having invoked the appropriate handler for the DOS function, DOS decrements the InDOS flag, switches back to the caller's stack, and pops back the caller's registers from the register image created on the stack back in step 2. As you'll see in a moment, the handler for the specific DOS function probably modifies the register image. Finally, DOS returns to the caller with an IRET. Since IRET pops the flags off the stack, the specific DOS functions have to set or clear the carry flag by modifying its image that the processor pushed on the stack as part of the initial INT (see the comment to step 2).
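
Because IRET reloads the flags from the stack, a handler reports success or failure by editing the saved flags word rather than the live carry flag. The idea can be sketched in a few lines of portable C; the function name and the plain pointer to the flags image are illustrative assumptions, not DOS identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CARRY_FLAG 0x0001u   /* bit 0 of the 8086 FLAGS register */

/* Edit the flags word that IRET will pop, instead of the CPU's own flags. */
void set_carry_image(uint16_t *flags_image, int error)
{
    assert(flags_image != NULL);
    if (error)
        *flags_image |= CARRY_FLAG;              /* caller will see CF=1 */
    else
        *flags_image &= (uint16_t)~CARRY_FLAG;   /* caller will see CF=0 */
}
```

Any STC or CLC executed inside the handler would be lost the moment IRET restored the caller's flags, which is why the image is the thing to change.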

Seeing the DOS dispatch code in figure 6-7, it should now be clear why a DEBUG trace through an INT 21h AH=62h call works, but tracing for example through a call to INT 21h AH=52h wouldn't. A call to AH=52h would involve switching stacks, mucking with global variables in the DOS data segment, and so on. DEBUG itself uses DOS, so you would end up instead tracing through one of the DOS calls that DEBUG would be making to display our information. A complete mess! One alternative of course is to use a debugger that bypasses DOS, such as Soft-ICE (or SERMON from the first edition of Undocumented DOS which, however, did not support tracing through INT instructions).

Examining the INT 21h Dispatch Table

However, we really don't need to trace through INT 21h any more. We now have the address of COMMAND (The One True INT 21h Handler) and the address of the function pointer array (called Dispatch in the DOS source code) and can thus unassemble at leisure, rather than trace under pressure, so to speak.

To find the code that handles each specific DOS function, you need do nothing more than dump out the Dispatch table, which you can see from step 12 in figure 6-7 is located at FDC8:3E9E. This table of two-byte words is conveniently dumped with SYMDEB's dw command:

        -dw fdc8:3e9e
        FDC8:3E9E  A1F6 54E0 54E9 559F 55BC 55C2 541C 544B
        FDC8:3EAE  51BA 5214 5220 55D6 55E0 4DA1 4C78 5CCC
        FDC8:3EBE  5688 5DDF 5E73 5625 5DCB 5DD0 5DB1 56F9
        FDC8:3ECE  440D 4C73 4C68 4D2D 4D2F 440D 440D 4D71
        FDC8:3EDE  440D 5DD5 5DDA 5639 560D 4C9A 4EB6 5DC6
        FDC8:3EEE  5DC1 4D22 4839 4856 4876 4887 4A46 4C54
        FDC8:3EFE  4A1C A19A 4D73 4052 4D59 4C8A 4C2B 4CC9
        FDC8:3F0E  4A4D 60E1 6029 6065 AFE6 AF0F A72A A839
        ; ... etc. ...

You can double-check that all is in order by looking for a known function. Let's see what the table shows for function 62h (although we know it usually gets picked off in step 1 of figure 6-7, only coming through this table in the unlikely event of an INT 21h AX=5D00h indirect function call of AH=62h):

        -dw fdc8:(3e9e+62*2)
        FDC8:3F62  40B5 ......

        -u fdc8:40b5
        FDC8:40B5 1E             PUSH   DS
        FDC8:40B6 2E8E1EE73D     MOV    DS,CS:[3DE7]
        FDC8:40BB 8B1E3003       MOV    BX,[0330]
        FDC8:40BF 1F             POP    DS
        FDC8:40C0 CF             IRET

That's it! So you now have the DOS dispatch table and can examine at will the code for any DOS function you're interested in.
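
The slot arithmetic behind step 12, and behind the dw fdc8:(3e9e+62*2) lookup above, is simple enough to write down. The following portable C sketch uses the table offset and the first eight entries from the SYMDEB dump for this one configuration; the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DISPATCH_BASE 0x3E9Eu   /* table offset in this particular configuration */

/* First eight entries of the table, copied from the SYMDEB dump above. */
static const uint16_t dispatch_head[8] = {
    0xA1F6, 0x54E0, 0x54E9, 0x559F, 0x55BC, 0x55C2, 0x541C, 0x544B
};

/* Offset, within the DOS code segment, of the table slot for function ah. */
uint16_t dispatch_slot(uint8_t ah)
{
    return (uint16_t)(DISPATCH_BASE + (uint16_t)ah * 2u);
}

/* Near offset of the handler for function ah, read from a copy of the table. */
uint16_t dispatch_handler(const uint16_t *table, size_t count, uint8_t ah)
{
    assert(table != NULL && ah < count);
    return table[ah];
}
```

For function 62h the slot works out to 3E9Eh + C4h = 3F62h, which is the address SYMDEB displayed.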

However, examination of this table and others like it is made easier with a short C program, FTAB.C, shown in listing 6-4. FTAB can display tables of bytes (1), words (2), or dwords (4).

Listing 6-4: FTAB.C

        /* FTAB.C */

        #include <stdlib.h>
        #include <stdio.h>
        #include <dos.h>

        typedef unsigned char BYTE;
        typedef unsigned short WORD;
        typedef unsigned long DWORD;

        void fail(const char *s) { puts(s); exit(1); }

        int main(int argc, char *argv[])
        {
            char *prefix;
            void far *tab;
            BYTE far *btab;
            WORD far *wtab;
            DWORD far *dtab;
            WORD seg, ofs;
            int num_func, size, i;

            if (argc < 3)       /* prefix and size are optional */
                fail("usage: ftab <seg:ofs> <num_func | ?> [prefix] [size]");

            sscanf(argv[1], "%04X:%04X", &seg, &ofs);
            tab = (void far *) MK_FP(seg, ofs);
            if (argv[2][0] == '?')
            {
                num_func = *((BYTE far *) tab);     /* first BYTE is #func */
                tab = ((BYTE far *) tab + 1);       /* then array of WORDs */
            }
            else
                sscanf(argv[2], "%04X", &num_func);
            prefix = (argc > 3) ? argv[3] : "func";
            size = (argc > 4) ? atoi(argv[4]) : 2;  /* default to WORD table */

            /* The printf format strings are reconstructed; the WORD case
               matches the output shown in figure 6-8. */
            switch (size)
            {
                case 1:
                    for (i=0, btab=(BYTE far *)tab; i < num_func; i++, btab++)
                        printf("%02X\t%s_%02X\n",
                            *btab, prefix, i);
                    break;
                case 2:
                    for (i=0, wtab=(WORD far *)tab; i < num_func; i++, wtab++)
                        printf("%04X:%04X\t%s_%02X\n",
                            seg, *wtab, prefix, i);
                    break;
                case 4:
                    for (i=0, dtab=(DWORD far *)tab; i < num_func; i++, dtab++)
                        printf("%08lX\t%s_%02X\n",
                            *dtab, prefix, i);
                    break;
                default:
                    fail("size only 1 (byte), 2 (word), 4 (dword)");
            }

            return 0;
        }

To generate a list of the 6Dh (0 through 6Ch) different DOS INT 21h function handlers ("72h, not 6Ch, is the highest function number in the DOS 7.0 component of Chicago," says one tech reviewer), run FTAB on the table at FDC8:3E9E. Figure 6-8 shows sample output from FTAB.

Figure 6-8: The INT 21h Dispatch Table Displayed by FTAB

        C:\UNDOC2\CHAP6>ftab fdc8:3e9e 6D int21 2
        FDC8:A1F6   int21_00
        FDC8:54E0   int21_01
        FDC8:54E9   int21_02
        FDC8:559F   int21_03
        ; ...
        FDC8:4C9A   int21_25
        ; ...
        FDC8:4D59   int21_34
        FDC8:4C8A   int21_35
        ; ...
        FDC8:AF0F   int21_3D
        FDC8:A72A   int21_3E
        FDC8:A839   int21_3F
        FDC8:A89F   int21_40
        FDC8:B038   int21_41
        FDC8:A8A4   int21_42
        ; ... 
        FDC8:40A9   int21_50
        FDC8:40B5   int21_51
        FDC8:4D65   int21_52
        FDC8:4DD6   int21_53
        FDC8:4A41   int21_54
        FDC8:4EA5   int21_55
        FDC8:B05E   int21_56
        FDC8:A90C   int21_57
        FDC8:A448   int21_58
        FDC8:4CDD   int21_59
        FDC8:B0E9   int21_5A
        FDC8:B0D1   int21_5B
        FDC8:B2D8   int21_5C
        FDC8:A531   int21_5D
        FDC8:AA49   int21_5E
        FDC8:A9AA   int21_5F
        FDC8:AEA8   int21_60
        FDC8:440D   int21_61
        FDC8:40B5   int21_62
        ; ... etc. ...

Confirming that this table is correct, you can see that int21_51 and int21_62 are located at the same address (FDC8:40B5), as they should be.

Get SysVars and the Caller's Registers

To check that the FTAB output in figure 6-8 is really correct, examine another function that should be simple, INT 21h AH=52h, which returns a far pointer in ES:BX to SysVars. According to the FTAB output, the code to handle function 52h should be at FDC8:4D65, so you can use SYMDEB or DEBUG to unassemble at that address. Figure 6-9 shows the results.

Figure 6-9: MS-DOS 6.0 Implementation of INT 21h AH=52h (Get SysVars)

        -u fdc8:4d65
        FDC8:4D65 E81AF5         CALL   4282
        FDC8:4D68 C744022600     MOV    Word Ptr [SI+02],0026
        FDC8:4D6D 8C5410         MOV    [SI+10],SS
        FDC8:4D70 C3             RET

In fact, calling Get SysVars in this particular configuration does return 0116:0026, so the hardwired 0026h above does look correct. But what is going on here?! How come we don't see SS:0026 being moved into ES:BX? What are [SI+02] and [SI+10h]?

To answer these questions, let's examine the subroutine being called at 4282h:

        -u fdc8:4282
        FDC8:4282 2E8E1EE73D     MOV    DS,CS:[3DE7]
        FDC8:4287 C5368405       LDS    SI,[0584]
        FDC8:428B C3             RET

CS:3DE7h is just our old friend, the DOS data segment, whose value DOS keeps in its code segment. (DOS stores the value of DS in its code segment because, when an INT occurs, DS isn't known, but CS is.) So this subroutine is setting itself up to use DOS's DS, just as the code did back in Figures 6-3, 6-4, and 6-7 for Get PSP, Set PSP, and the INT 21h dispatch.

The subroutine then loads DS:SI with something at DOS:584h. In step 6 of the INT 21h dispatch code in figure 6-7, you saw DOS set the dword at DOS:584h to the caller's SS:SP. In other words, DOS:584h contains a pointer to the caller's stack, with all the registers that were pushed upon it during step 2 of figure 6-7 (and earlier, as part of the actual INT instruction). Sure enough, the comments to step 3 point out that DOS:584h in this configuration happens to be SDA+264h, which the appendix identifies as "a pointer to the stack frame containing the user registers on entry to the INT 21h call."

So the subroutine at FDC8:4282 loads DS:SI with a pointer to the caller's pushed register structure. Given the order in which step 2 pushes registers, it won't surprise you to learn that the client register structure has the format shown below in table 6-2.

Table 6-2: MS-DOS Caller's Register Structure

        00h     AX
        02h     BX
        04h     CX
        06h     DX
        08h     SI
        0Ah     DI
        0Ch     BP
        0Eh     DS
        10h     ES
        12h     IP
        14h     CS
        16h     flags

In figure 6-9, the code for function 52h at FDC8:4D65 moves 26h into [SI+2] and DOS's SS (DS) into [SI+10h]. DS:SI points at the caller's register structure, where offset 2 is BX and offset 10h is ES. Thus, the code is actually setting an image of the caller's ES:BX to DOS_DS:0026. The register image gets popped into the actual CPU registers in the series of POPs at the end (step 13) of the INT 21h dispatch in figure 6-7. So, this is how INT 21h function 52h returns SysVars in ES:BX. (If you want to see how DOS creates SysVars in the first place, you need to disassemble the DOS initialization code on disk.)
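
Table 6-2 translates directly into a C structure, and the two MOV instructions in figure 6-9 into two assignments. This is only a portable sketch for checking the offsets: the structure and function names are invented, and a real handler would of course reach the frame through DS:SI rather than through a C pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Caller's register image, in the order given in table 6-2. */
typedef struct {
    uint16_t ax, bx, cx, dx, si, di, bp, ds, es;   /* pushed by DOS in step 2 */
    uint16_t ip, cs, flags;                        /* pushed by the INT itself */
} caller_regs;

/* Mimics the function 52h handler: patch the saved ES:BX, not the live registers. */
void get_sysvars(caller_regs *frame, uint16_t dos_ds)
{
    assert(frame != NULL);
    frame->bx = 0x0026;   /* MOV Word Ptr [SI+02],0026 */
    frame->es = dos_ds;   /* MOV [SI+10],SS */
}
```

With a DOS data segment of 0116h, the frame ends up holding ES:BX = 0116:0026, the value Get SysVars returned in this configuration. The structure is 18h bytes long, matching the stack usage mentioned earlier.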

The reader may have noted from the appendix that there is an internal DOS function, INT 2Fh AX=1218h, to get the caller's register structure; it returns a pointer to the structure in DS:SI. This sounds a lot like the subroutine you viewed above at FDC8:4282. In fact, they are one and the same function. DOS calls this subroutine through a near call rather than through an INT 2Fh. You'll see in a few moments that, in a table of INT 2Fh AH=12h subfunctions, 4282h duly appears as the handler for subfunction 18h.

A Very Brief Glance at File I/O

Next, let's look at a more interesting function. From Figure 6-8, the code for INT 21h AH=3Fh (Read File) is supposed to be located at FDC8:A839. The code for this function is too extensive to examine much of it here, so let's just look at the first two lines:

        -u FDC8:A839
        FDC8:A839 BEFD71         MOV    SI,71FD     ; offset of internal Read func
        FDC8:A83C E82DFE         CALL   A66C        ; see below
        ; ...

You know that function 3Fh expects a file handle in BX; you know furthermore that file handles are associated with the current PSP. Examining the subroutine called at A66Ch shows how DOS uses the passed-in file handle:

        -u FDC8:A66C
        FDC8:A66C 2E8E06E73D     MOV    ES,CS:[3DE7]    ; get DOS DS
        FDC8:A671 268E063003     MOV    ES,ES:[0330]    ; ES <- current PSP
        FDC8:A676 263B1E3200     CMP    BX,ES:[0032]    ; PSP[32h] = # max open files
        FDC8:A67B 7204           JB     A681
        FDC8:A67D B006           MOV    AL,06           ; 6 = invalid handle error
        FDC8:A67F F9             STC                    ; set carry flag
        FDC8:A680 C3             RET
        FDC8:A681 26C43E3400     LES    DI,ES:[0034]    ; PSP[34h] -> file handle tbl
        FDC8:A686 03FB           ADD    DI,BX           ; use file handle as offset
        FDC8:A688 C3             RET

In other words, this subroutine uses the current PSP to convert the passed-in file handle into a pointer to the caller's Job File Table (see chapter 8). Dereferencing this pointer yields an index into the System File Table. From the SFT entry, the DOS Read function can determine what type of file the caller wants to read from. With a network file, for example, the Read function must pass the call down to a redirector (see chapter 8), whereas with a normal file, a device driver must handle the call. Of course, a Read call may never get here in the first place, having already been picked off by a disk cache such as SMARTDRV. That, after all, is the whole point of a disk cache.
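
The handle check and table lookup performed by the subroutine at A66Ch can be restated in portable C. This is a sketch under obvious simplifications: psp_view holds only the two PSP fields the subroutine reads, the far pointer at PSP[34h] becomes an ordinary pointer, and all names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ERR_INVALID_HANDLE 6

/* Just the two PSP fields the subroutine reads (real offsets in comments). */
typedef struct {
    uint16_t max_files;   /* PSP[32h]: number of entries in the Job File Table */
    uint8_t *jft;         /* PSP[34h]: pointer to the JFT */
} psp_view;

/* Returns a pointer to the JFT byte for handle, or NULL with *err set to 6. */
uint8_t *jft_entry(const psp_view *psp, uint16_t handle, int *err)
{
    assert(psp != NULL && err != NULL);
    if (handle >= psp->max_files) {   /* CMP BX,ES:[0032] / JB */
        *err = ERR_INVALID_HANDLE;    /* MOV AL,06 / STC */
        return NULL;
    }
    *err = 0;
    return psp->jft + handle;         /* LES DI,ES:[0034] / ADD DI,BX */
}
```

The byte the returned pointer refers to is the System File Table index; looking up the SFT entry itself is a separate step that this sketch leaves out.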

The subroutine at FDC8:A66C is none other than the handler for the internal DOS function INT 2Fh AX=1220h (Get Job File Table Entry; see appendix). You saw earlier that many DOS functions use a near call to the handler for INT 2Fh AX=1218h to get a pointer to the client register structure. And the "MOV DS,CS:[3DE7]" code you've seen so many times sounds a lot like what INT 2Fh AX=1203h (Get DOS Data Segment) must do. We keep on running into these INT 2Fh AH=12h subfunctions; it's time to take a closer look.

Tracing a DOS INT 2Fh Call

To examine the code for INT 2Fh AH=12h, we're going to unassemble the DOS INT 2Fh handler, just as we did for INT 21h. Recall that we used DEBUG to trace through a simple INT 21h call so we could find the DOS INT 21h handler. We could do the same thing again for a simple call such as INT 2Fh AX=1200h (DOS internal services installation check; see appendix). But is there any way to automate what DEBUG did? Can you perhaps trace through interrupts and locate an entire interrupt chain without DEBUG's help?

How Does DEBUG Trace Through an INT?

Yes, but you have to understand a little of how the DEBUG trace command works. The first edition of Undocumented DOS had an entire chapter by Tim Paterson on debugging, with extensive source code examples on the accompanying disk (\UNDOC\CHAP7\*.ASM). This is an excellent place to turn for a general understanding of how DEBUG works.

The trace command in debuggers such as DEBUG and SYMDEB uses the single-step feature built into all Intel 80x86 microprocessors. When the processor's trace flag (TF) is enabled, the processor issues an INT 1 for every instruction it executes. A debugger can install an INT 1 (Single Step) handler and get the effect of having a breakpoint on every instruction.

However, a single step handler contains code too, and leaving TF enabled on entry to the single step handler would produce an endless loop. For this reason, the processor temporarily disables the trace flag when it issues an interrupt, and reenables tracing when the interrupt handler returns. In fact, the processor disables single step for all interrupts.

This is why most debuggers won't trace into an INT. To trace through an INT, a debugger must do something like set a breakpoint at the first instruction of the interrupt handler, and then re-enable single-step after the breakpoint is hit (see Crawford and Gelsinger, Programming the 80386). This is what DEBUG does. Unfortunately, the MON family of debuggers included with the first edition of Undocumented DOS happened not to trace through INT instructions.
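
The interaction between the trace flag and interrupts can be modeled without any real hardware. The sketch below is a portable C toy, not debugger code: cpu_model and its three functions are invented names, and the small array merely stands in for the flags words that the processor pushes on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_TF 0x0100u   /* trace (single-step) flag */
#define FLAG_IF 0x0200u   /* interrupt-enable flag */

typedef struct {
    uint16_t flags;
    uint16_t saved_flags[8];   /* stand-in for flags words on the stack */
    int depth;
} cpu_model;

/* What the processor does to FLAGS when it vectors through any interrupt. */
void enter_interrupt(cpu_model *cpu)
{
    assert(cpu != NULL && cpu->depth < 8);
    cpu->saved_flags[cpu->depth++] = cpu->flags;     /* push flags */
    cpu->flags &= (uint16_t)~(FLAG_TF | FLAG_IF);    /* TF and IF are cleared */
}

/* IRET pops the saved flags, so tracing resumes if TF was set beforehand. */
void iret(cpu_model *cpu)
{
    assert(cpu != NULL && cpu->depth > 0);
    cpu->flags = cpu->saved_flags[--cpu->depth];
}

/* A debugger that wants to trace inside the handler sets TF again itself. */
void reenable_trace(cpu_model *cpu)
{
    assert(cpu != NULL);
    cpu->flags |= FLAG_TF;
}
```

After enter_interrupt the model is no longer single-stepping, which is the reason a plain trace command steps over an INT; reenable_trace corresponds to what the debugger does once its breakpoint at the handler's entry has been hit.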


We can incorporate this knowledge into a program that single-steps through an interrupt handler. INTCHAIN.C, shown in listing 6-5, installs an INT 1 single-step handler, turns on the trace flag, calls an interrupt function specified on the program's command line, and turns off the trace flag. Because INTCHAIN.C uses a far CALL rather than an INT, the processor calls the single-step handler for each instruction in the other interrupt handler; the handler saves away CS:IP whenever CS changes, as a likely indication that the interrupt function is chaining to the previous handler. When the interrupt function returns and INTCHAIN has turned off the trace flag, INTCHAIN prints out the interrupt chain as saved by the single-step handler.

For example, consider the point made earlier in figure 6-3 that SMARTDRV does back-end handling of the DOS Disk Reset function (INT 21h AH=0Dh). This is plainly visible in an INTCHAIN trace of a call to this function, shown in figure 6-10a.

Figure 6-10a: INTCHAIN Display for INT 21h AH=0Dh (Disk Reset)

        C:\UNDOC2\CHAP6>intchain 21/0d00
        1387 instructions
        Skipped over 4 INT

        0B94:32B6   MSCDEX
        07FA:15FA   SMARTDRV
        0255:0023   J:
        CC2C:058E   DBLSSYS$
        0116:109E   DOS
        FDC8:40F8   HMA
        0070:06F5   IO
        FDC8:8653   HMA
        0070:0700   IO
        FFFF:0043   HMA
        CC2C:0623   DBLSSYS$
        07FA:1631   SMARTDRV

Notice that, after being processed by MSDOS.SYS, IO.SYS, and DBLSSYS$, the call winds up back in SMARTDRV.

For a direct comparison with the DEBUG trace in figure 6-3, Figure 6-10b presents sample INTCHAIN output when tracing an INT 21h AH=62h call.

Figure 6-10b: INTCHAIN Display for INT 21h AH=62h

        C:\UNDOC2\CHAP6>intchain 21/6200
        77 instructions

        0F93:32B6   MSCDEX
        07F9:15FA   SMARTDRV
        0255:0023   D:
        CC2E:058E   DBLSSYS$
        0116:109E   DOS
        FDC8:40F8   HMA

Sure enough, this matches the interrupt chain we so laboriously traced back in figure 6-3. INTCHAIN uses MAP.C from listing 6-2 to try to match up CS:IP addresses with the names of resident TSRs and drivers. The addresses displayed by INTCHAIN can be passed to SYMDEB or DEBUG for unassembly (this is the whole point of the program).
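
Matching a CS:IP pair to an owner starts with turning it into a linear address, which is what the MK_LIN macro in INTCHAIN.C does. The portable C sketch below shows that arithmetic and why two quite different-looking segments, FDC8h and FFFFh, both show up as HMA. The function names are invented, and the HMA test assumes the A20 line is enabled; with A20 off, addresses above 1 MB wrap around to low memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real-mode segment:offset to linear address (the arithmetic behind MK_LIN). */
uint32_t linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* The HMA is the 64K-minus-16 bytes just above 1 MB: 100000h through 10FFEFh. */
bool in_hma(uint16_t seg, uint16_t off)
{
    uint32_t lin = linear(seg, off);
    return lin >= 0x100000UL && lin <= 0x10FFEFUL;
}

/* Addresses taken from figure 6-10a. */
void check_figure_addresses(void)
{
    assert(in_hma(0xFDC8, 0x40F8));     /* shown as HMA */
    assert(in_hma(0xFFFF, 0x0043));     /* shown as HMA */
    assert(!in_hma(0x0116, 0x109E));    /* shown as DOS */
}
```

FDC8:40F8 is linear 101D78h and FFFF:0043 is linear 100033h, so both lie just past the 1 MB line even though neither segment value looks alike.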

INTCHAIN can also trace through an XMS function or an arbitrary segment:offset pointer. Actually the program has little to do with interrupt chains as such. Rather than generate an actual INT instruction and then have to mess with setting a breakpoint, the program just turns an INT XXh into a far call (and PUSHF) to the handler for INT XXh. Thus, INTCHAIN won't trace any INT generated inside the handler (such as the INT 2Ah call made by the INT 21h dispatch in figure 6-7); this is generally what you want anyway.

Listing 6-5: INTCHAIN.C

        /*
        INTCHAIN.C
        Andrew Schulman, May 1993
        Copyright (C) 1993 Andrew Schulman. All rights reserved.

        bcc intchain.c map.c

        Uses single-step to trace through interrupt chains
        usage:   intchain intno/ax/bx/cx/dx
        example: intchain 21/6200
        */

        #include <stdlib.h>
        #include <stdio.h>
        #include <string.h>
        #include <dos.h>

        typedef unsigned char BYTE;
        typedef unsigned short WORD;
        typedef unsigned long DWORD;

        #ifdef __cplusplus
        typedef void interrupt (far *INTRFUNC)(...);
        #else
        typedef void (interrupt far *INTRFUNC)(void);
        #endif

        typedef void (far *FARFUNC)(void);

        #ifndef MK_FP
        #define MK_FP(s,o)  ((((DWORD) s) << 16) + (o))
        #endif

        #define MK_LIN(fp)  ((((DWORD) FP_SEG(fp)) << 4) + FP_OFF(fp))

        #pragma pack(1)

        typedef struct {
        #ifdef __TURBOC__
            WORD bp,di,si,ds,es,dx,cx,bx,ax;
        #else
            WORD es,ds,di,si,bp,sp,bx,dx,cx,ax;     /* same as PUSHA */
        #endif
            WORD ip,cs,flags;
            } REG_PARAMS;

        #define INT_INSTR       0xCD
        #define TRACE_FLAG      0x100

        extern char *find_owner(DWORD lin_addr);    // in map.c

        void fail(const char *s) { puts(s); exit(1); }

        #define MAX_ADDR        512

        static WORD volatile instr = 0, int_instr = 0;
        static WORD prev_seg = 0, my_seg = 0;
        static void far * *addr;
        static int num_addr = 0;

        void interrupt far single_step(REG_PARAMS r)    // INT 1 handler
        {
            WORD seg;
            BYTE far *fp;
            if ((seg = r.cs) == my_seg)          // ignore my own code
                return;
            instr++;                             // count traced instructions
            fp = (BYTE far *) MK_FP(r.cs, r.ip);
            if (fp[0] == INT_INSTR)              // count INTs
                int_instr++;
            if (seg != prev_seg)                 // if segment changed,
            {                                    // assume we've chained
                if (num_addr < MAX_ADDR)
                    addr[num_addr++] = (void far *) fp;
                prev_seg = seg;
            }
        }

        #define GET_FLAGS(reg)  _asm { pushf } ; _asm { pop reg }
        #define SET_FLAGS(reg)  _asm { push reg } ; _asm { popf }

        void set_flag(unsigned mask)
        {
            GET_FLAGS(ax);
            _asm or ax, word ptr mask
            SET_FLAGS(ax);
        }

        void clear_flag(unsigned mask)
        {
            GET_FLAGS(ax);
            _asm mov bx, word ptr mask
            _asm not bx
            _asm and ax, bx
            SET_FLAGS(ax);
        }

        FARFUNC get_xms(void)
        {
            _asm mov ax, 4300h
            _asm int 2fh
            _asm cmp al, 80h
            _asm je present
            fail("XMS not present!");
        present:
            _asm mov ax, 4310h
            _asm int 2fh
            _asm mov ax, bx
            _asm mov dx, es
            // retval in DX:AX
        }

        int main(int argc, char *argv[])
        {
            static int intrfunc = 0;    /* make sure not in a register */
            INTRFUNC old_sstep;
            FARFUNC func = (FARFUNC) 0;
            FARFUNC xms_func = (FARFUNC) 0;
            void far *fp;
            char *s;
            WORD intno, _ax, _bx, _cx, _dx;
            int a20off = 0;
            int i;

            puts("INTCHAIN 1.0 -- Walks interrupt chains");
            puts("From \"Undocumented DOS\", 2nd edition (Addison-Wesley, 1993)");
            puts("Copyright (C) 1993 Andrew Schulman. All rights reserved.\n");
            if (argc < 2)
                fail("usage: intchain [-a20off] <intno|xms|seg:ofs>/ax/bx/cx/dx");
            if (strcmp(strupr(argv[1]), "-A20OFF") == 0)
            {
                a20off++;
                xms_func = get_xms();
                argv++; argc--;         /* consume the switch */
            }
            // Figure out what code they want to generate:
            // an XMS call
            if (strncmp(strupr(argv[1]), "XMS", 3) == 0)
            {
                func = get_xms();
                sscanf(argv[1], "XMS/%04X/%04X/%04X/%04X",
                    &_ax, &_bx, &_cx, &_dx);
                printf("Tracing XMS at %Fp\n", func);
            }
            // ... or a far (segment:offset) CALL
            else if (strchr(argv[1], ':'))
            {
                WORD seg, ofs;
                sscanf(argv[1], "%04X:%04X/%04X/%04X/%04X/%04X",
                    &seg, &ofs, &_ax, &_bx, &_cx, &_dx);
                func = (FARFUNC) MK_FP(seg, ofs);
                printf("Tracing function at %Fp\n", func);
            }
            // ... or an INT XXh
            else
            {
                sscanf(argv[1], "%02X/%04X/%04X/%04X/%04X",
                    &intno, &_ax, &_bx, &_cx, &_dx);

                /* single-step doesn't go through INT, so turn the INT into
                a PUSHF and far CALL */
                if (! (func = (FARFUNC) _dos_getvect(intno)))
                    fail("INT unused");
                intrfunc++;     // so do PUSHF when call func
                printf("Tracing INT %02X AX=%04X\n", intno, _ax);
            }
            if (! (addr = (void far **) calloc(MAX_ADDR, sizeof(void far *))))
                fail("insufficient memory");
            fp = (void far *) main;
            my_seg = FP_SEG(fp);
            old_sstep = _dos_getvect(1);
            _dos_setvect(1, (INTRFUNC) single_step);
            if (a20off)
            {
                _asm mov ah, 6
                (*xms_func)();  // local disable A20 line
            }

            set_flag(TRACE_FLAG);
            /* call the code */
            _asm mov ax, _ax
            _asm mov bx, _bx
            _asm mov cx, _cx
            _asm mov dx, _dx
            if (intrfunc)
                _asm pushf
            (*func)();
            clear_flag(TRACE_FLAG);
            _dos_setvect(1, old_sstep);
            printf("%u instructions\n", instr);
            if (int_instr)
                printf("Skipped over %u INT\n", int_instr);
            for (i=0; i< num_addr; i++)
            {
                s = find_owner(MK_LIN(addr[i]));
                printf("%Fp\t%s\n", addr[i], s? s: " ");
            }
            if (num_addr == MAX_ADDR)
                fail("Overflow: very long INT chain!");
            return 0;
        }

Examining The INT 2Fh Chain

You can now use INTCHAIN to trace through a call to INT 2Fh AX=1200h, without using DEBUG. Figure 6-11 shows sample results. Note that the configuration was somewhat different from the one used to produce the INTCHAIN output for INT 21h AH=62h in figure 6-10b.

Figure 6-11: INTCHAIN Display for INT 2Fh AX=1200h

        C:\UNDOC2\CHAP6>intchain 2f/1200
        174 instructions
        Skipped over 1 INT

        1248:0007   NLSFUNC
        109A:0980   PRINT
        0F16:0943   SHARE
        DB18:0285   DOSKEY
        0B94:308D   MSCDEX
        07FA:1368   SMARTDRV
        0726:019F   COMMAND
        0725:0135   COMMAND
        0725:01BD   COMMAND
        FFFF:DFD8   HMA
        0255:002D   J:
        CC2C:25ED   DBLSSYS$
        0255:0028   J:
        CC2C:0116   DBLSSYS$
        0116:10C6   DOS
        FDC8:44BD   HMA

Towards the goal of disassembling DOS, the essential piece of information here is the very last line, as this gives the address (FDC8:44BD) of MSDOS.SYS's INT 2Fh handler. We will come back to this in a few moments.

The most noticeable feature of figure 6-11 is the very long interrupt chain. NLSFUNC, PRINT, SHARE, DOSKEY, MSCDEX, SMARTDRV, COMMAND.COM, and DoubleSpace all take a crack at processing the call. Processing even what is (as you'll see) an absolutely trivial INT 2Fh AX=1200h call requires that every TSR and device driver camped out on INT 2Fh inspect the call to see if it interests them. INT 2Fh chains can be extremely long; they are particularly bad when any interrupt handlers written in C (such as the wrappers from chapter 2) are involved. As noted earlier, Ralf Brown has suggested an alternate INT 2Dh protocol in an attempt to shorten the long chains of handlers waiting around for INT 2Fh calls to appear.

Naturally, you can pass any of the addresses displayed by INTCHAIN to a debugger such as DEBUG or SYMDEB. For example, take the 0B94:308D handler for INT 2Fh, which INTCHAIN shows belongs to the Microsoft CD-ROM Extensions:

        -u b94:308d
        0B94:308D 9C             PUSHF   
        0B94:308E 80FC11         CMP    AH,11 
        0B94:3091 7503           JNZ    3096 
        0B94:3093 EB6B           JMP    3100 
        0B94:3095 90             NOP     
        0B94:3096 80FC15         CMP    AH,15 
        0B94:3099 7503           JNZ    309E 
        0B94:309B EB09           JMP    30A6 
        0B94:309D 90             NOP     
        0B94:309E 80FC05         CMP    AH,05 
        ; ...

You can see MSCDEX checking for calls to INT 2Fh AH=11h. This makes perfect sense, as INT 2Fh AH=11h is the network redirector protocol, and MSCDEX is a network redirector (see chapter 8). MSCDEX next looks for calls to INT 2Fh AH=15h, which again makes sense since this is the documented MSCDEX API (see Ray Duncan, MS-DOS Extensions). What about INT 2Fh AH=05h? As explained in the appendix, this is an undocumented interface that allows resident programs (network redirectors in particular) to expand critical error numbers into strings. External DOS programs such as COMMAND.COM issue INT 2Fh AH=05h calls; network redirectors such as MSCDEX handle the calls and provide the caller with strings to display (such as "CDR101: Not ready reading drive D" when you try to DIR a recording of Handel's Messiah).
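
The cost of a long INT 2Fh chain is easy to see in a simulation. The portable C sketch below models each resident program as a list of AH values it is interested in; a call walks the chain until some handler recognizes it. The structure and function names are invented, and "claims" is a simplification: a real redirector such as MSCDEX may look at an AH=11h call and still pass it on if the drive isn't its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One link in a simulated INT 2Fh chain. */
typedef struct {
    const char *name;
    const uint8_t *claimed;   /* AH values this handler looks for */
    size_t n_claimed;
} mux_handler;

/* Walks the chain front to back. Returns the index of the first handler that
   recognizes ah, or -1, and reports how many handlers inspected the call. */
int mux_dispatch(const mux_handler *chain, size_t n, uint8_t ah, size_t *inspected)
{
    assert(chain != NULL && inspected != NULL);
    *inspected = 0;
    for (size_t i = 0; i < n; i++) {
        ++*inspected;                               /* CMP AH,xx in each TSR */
        for (size_t j = 0; j < chain[i].n_claimed; j++)
            if (chain[i].claimed[j] == ah)
                return (int)i;
    }
    return -1;
}
```

With MSCDEX (11h, 15h, 05h) placed ahead of DOS (12h), an AH=15h call stops at the first link, while an AH=12h call has to be inspected by every link before it reaches DOS at the end of the chain.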

How about INT 2Fh under Windows? Figure 6-12 shows INTCHAIN output for the same configuration as figure 6-11, except that INTCHAIN is running in a DOS box under Windows Enhanced mode:

Figure 6-12: INTCHAIN Display for INT 2Fh AX=1200h under Windows Enhanced Mode

        Tracing INT 2F AX=1200
        175 instructions

        14D4:02A7   win
        12E4:0D68   WINICE
        1248:0007   NLSFUNC
        109A:0980   PRINT
        0F16:0943   SHARE
        DB18:0285   DOSKEY
        0B94:308D   MSCDEX
        07FA:1368   SMARTDRV
        0726:019F   COMMAND
        0725:0135   COMMAND
        1580:0045   win386
        0725:01BF   COMMAND
        FFFF:DFD8   HMA
        0255:002D   J:
        CC2C:25ED   DBLSSYS$
        0255:0028   J:
        CC2C:0116   DBLSSYS$
        0116:10C6   DOS
        FDC8:44BD   HMA

Not only have WIN.COM and WINICE (the Soft-ICE/Windows debugger) added themselves to the front of the INT 2Fh chain, but notice that WIN386 has insinuated itself into the middle of the chain. This, however, isn't the half of it. Windows Enhanced mode executes large amounts of code to service interrupts from DOS boxes that never show up in INTCHAIN, at least in its present form. Many instructions, such as STI and CLI, cause a jump into the Windows Virtual Machine Manager, running in 32-bit protected mode. This jump is invisible to a real mode DOS program like INTCHAIN. In particular, Windows Enhanced mode hooks INT 2Fh using the protected mode interrupt descriptor table (IDT). A more sophisticated version of INTCHAIN would need to be written to deal with Windows Enhanced mode. The same goes for INTVECT (listing 6-1), which does however at least recognize the ARPL instruction that Windows uses as a V86 breakpoint.

The MSDOS.SYS and IO.SYS INT 2Fh Handlers

However, you already have the information you want, which is the last line in figures 6-11 and 6-12. (In the next to last line, you see the low-memory stub for INT 2Fh when DOS=HIGH.) In this configuration, the MSDOS.SYS INT 2Fh handler is located at FDC8:44BD; this code is shown with comments in figure 6-13.

Figure 6-13: MSDOS.SYS INT 2Fh Handler from MS-DOS 6.0

        -u fdc8:44bd
        FDC8:44BD FB          STI    
        FDC8:44BE 80FC11      CMP   AH,11       ; 2F/11 network redirector 
        FDC8:44C1 750A        JNZ   44CD        ; no 

        ;; Unsupported functions come here. Some external program like SHARE,
        ;; NLSFUNC, or a redirector is supposed to handle these. If we got here,
        ;; the external program must not be loaded, so it's an error -- except
        ;; if the caller is doing a 2F/??/00 install check, in which case DOS
        ;; will just return AX unchanged to indicate the software isn't installed.

        FDC8:44C3 0AC0        OR    AL,AL       ; 2F/??/00 install check?
        FDC8:44C5 7403        JZ    44CA        ; yes: unsupported func; AX unchanged
        FDC8:44C7 E8DCFF      CALL  44A6        ; no -- set carry flag for error
        FDC8:44CA CA0200      RETF  0002        ; sort-of IRET without changing flags

        FDC8:44CD 80FC10      CMP   AH,10       ; 2F/10 SHARE call?
        FDC8:44D0 74F1        JZ    44C3        ; yes: error
        FDC8:44D2 80FC14      CMP   AH,14       ; 2F/14 NLSFUNC call?
        FDC8:44D5 74EC        JZ    44C3        ; yes: error
        FDC8:44D7 80FC12      CMP   AH,12       ; 2F/12 DOS internal 
        FDC8:44DA 7503        JNZ   44DF        ; no: keep checking
        FDC8:44DC E91102      JMP   46F0        ; yes: goto fig. 6-15a
        FDC8:44DF 80FC16      CMP   AH,16       ; 2F/16 Windows call or broadcast?
        FDC8:44E2 740D        JZ    44F1        ; yes: DOS communicate with Windows
        FDC8:44E4 80FC46      CMP   AH,46       ; 2F/46: misc. DOS/Windows func?
        FDC8:44E7 7503        JNZ   44EC        ; no: jump to IO.SYS INT 2Fh handler
        FDC8:44E9 E9B801      JMP   46A4        ; yes: goto 2F/46 handler
        FDC8:44EC EA05007000  JMP   0070:0005   ; pass to IO.SYS (fig. 6-14)

At the very end of figure 6-13, you can see a hardwired jump to 0070:0005. Here, MSDOS.SYS has decided that it doesn't handle a particular INT 2Fh call, so it passes it down to IO.SYS, which has its own INT 2Fh handler. Geoff Chappell discusses these two DOS INT 2Fh handlers at greater length in his DOS Internals, but since we're here, we might as well steal a brief glance at the IO.SYS INT 2Fh handler, which is shown in figure 6-14. Note that when DOS=HIGH, IO.SYS can assume that A20 is already on because the only path into IO.SYS's INT 2Fh handler is through the one in MSDOS.SYS, which already took care of enabling A20 in its low memory stub (located at 0116:10C6 in figures 6-11 and 6-12).
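The routing decisions in figure 6-13 can be summarized in a few lines. The following Python sketch is only a model of the control flow as read off the disassembly, not DOS code; the function name msdos_int2f_route and the descriptive strings it returns are invented for illustration.

```python
def msdos_int2f_route(ah, al):
    """Model where the MSDOS.SYS INT 2Fh handler of figure 6-13 sends a call."""
    if ah in (0x11, 0x10, 0x14):
        # Redirector, SHARE, NLSFUNC: no external program claimed the call.
        if al == 0x00:
            return "return, AX unchanged (install check: not installed)"
        return "return with carry set (error)"
    if ah == 0x12:
        return "jump to 46F0 (DOS internal services)"
    if ah == 0x16:
        return "jump to 44F1 (Windows communication)"
    if ah == 0x46:
        return "jump to 46A4 (misc. DOS/Windows functions)"
    return "jump to 0070:0005 (IO.SYS handler)"

for ax in (0x1100, 0x1000, 0x1401, 0x1200, 0x1605, 0x4A01):
    print(f"AX={ax:04X}: {msdos_int2f_route(ax >> 8, ax & 0xFF)}")
```

Anything the model does not recognize falls through to IO.SYS, mirroring the final far JMP in the figure.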

Figure 6-14: IO.SYS INT 2Fh Handler from MS-DOS 6.0

        -u 70:5
        0070:0005 EA93087000     JMP    0070:0893

        -u 70:893
        0070:0893 2EFF2EE606     JMP    FAR CS:[06E6]

        -dd 70:6e6 6e6
        0070:06E6  FFFF:1302

        -u ffff:1302
        FFFF:1302 80FC13         CMP    AH,13   ; 2F/13 (set INT 13h handler) call?
        FFFF:1305 7413           JZ     131A    ; yes: do it
        FFFF:1307 80FC08         CMP    AH,08   ; 2F/08 DRIVER.SYS call?
        FFFF:130A 743B           JZ     1347    ; yes: do it
        FFFF:130C 80FC16         CMP    AH,16   ; 2F/16 Windows call?
        FFFF:130F 7479           JZ     138A    ; yes: IO.SYS also handles these!
        FFFF:1311 80FC4A         CMP    AH,4A   ; 2F/4A (misc. undoc func) call?
        FFFF:1314 7503           JNZ    1319    ; no: return unchanged
        FFFF:1316 E9A700         JMP    13C0    ; yes: do it 
        FFFF:1319 CF             IRET
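The HMA labels in INTCHAIN's output, and the remark about A20 above, follow from ordinary real-mode address arithmetic: a segment:offset pair maps to the linear address segment*16 + offset, and anything at or above 100000h can be reached only with A20 enabled. A small Python sketch (the helper names are illustrative) checks the addresses seen so far:

```python
def linear(seg, off):
    """Real-mode linear address of seg:off, assuming A20 is on (no wraparound)."""
    return (seg << 4) + off

def in_hma(seg, off):
    """True if seg:off lies in the high memory area, 100000h through 10FFEFh."""
    return 0x100000 <= linear(seg, off) <= 0x10FFEF

for seg, off in ((0xFDC8, 0x44BD), (0xFFFF, 0x1302), (0x0070, 0x0005)):
    print(f"{seg:04X}:{off:04X} -> {linear(seg, off):06X}  HMA={in_hma(seg, off)}")
```

Both the MSDOS.SYS handler at FDC8:44BD and the IO.SYS handler at FFFF:1302 land above the 1 MB line, while the 0070:0005 entry point stays in low memory.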

There are many interesting side roads we could explore here, including the Set INT 13h Handler (INT 2Fh AH=13h) function, and the several different AH=16h subfunctions that MSDOS.SYS and IO.SYS use to communicate with Windows. Sadly, however, we have to drive by if we are to have any chance of making it to our goal of disassembling DOS. As noted already, to do this, we must find where DOS handles the INT 2Fh AH=12h internal functions.

Examining the MSDOS.SYS Handler for INT 2Fh AH=12h

In figure 6-13, it is clear that FDC8:46F0 is the handler for these functions. As usual, we can pass this address to DEBUG or SYMDEB for unassembly; figure 6-15 shows the results.

Figure 6-15a: MSDOS.SYS Handler for INT 2Fh AH=12h

        -u fdc8:46f0
        FDC8:46F0 2EFF36783F     PUSH   CS:[3F78]   ; word at FDC8:3F78 = 44CAh
        FDC8:46F5 2EFF367A3F     PUSH   CS:[3F7A]   ; word at FDC8:3F7A = 3F7Ch
        FDC8:46FA 50             PUSH   AX          ; push function/subfunction
        FDC8:46FB 55             PUSH   BP 
        FDC8:46FC 8BEC           MOV    BP,SP 
        FDC8:46FE 8B460E         MOV    AX,[BP+0E]  ; put possible stack arg into AX
        FDC8:4701 5D             POP    BP 
        FDC8:4702 E84509         CALL   504A        ; call subroutine (fig. 6-15b)

        FDC8:4705 E9BFFD         JMP    44C7 

Hmm, not very promising looking. What is this subroutine at 504A?

        -u fdc8:504a
        FDC8:504A 55             PUSH   BP 
        FDC8:504B 8BEC           MOV    BP,SP 
        FDC8:504D 53             PUSH   BX 
        FDC8:504E 8B5E06         MOV    BX,[BP+06]      ; address of subfunc table
        FDC8:5051 2E8A1F         MOV    BL,CS:[BX]      ; number of valid subfuncs
        FDC8:5054 385E04         CMP    [BP+04],BL      ; caller's subfunction number
        FDC8:5057 7317           JNB    5070            ; if too high, error
        FDC8:5059 8A5E04         MOV    BL,[BP+04]      ; get subfunction
        FDC8:505C 32FF           XOR    BH,BH 
        FDC8:505E D1E3           SHL    BX,1            ; turn into word offset
        FDC8:5060 43             INC    BX              ; skip past # subfunctions
        FDC8:5061 035E06         ADD    BX,[BP+06]      ; add in address of table
        FDC8:5064 2E8B1F         MOV    BX,CS:[BX]      ; pull out func ptr
        FDC8:5067 895E06         MOV    [BP+06],BX      ; push on stack, RET to it
        FDC8:506A 5B             POP    BX 
        FDC8:506B 5D             POP    BP 
        FDC8:506C 83C404         ADD    SP,+04 
        FDC8:506F C3             RET                    ; call subfunc via RET
        FDC8:5070 5B             POP    BX              ; invalid subfuncs come here
        FDC8:5071 5D             POP    BP 
        FDC8:5072 C20600         RET    0006 

Despite the heading, the code in figure 6-15b is not specifically related to INT 2Fh AH=12h; other functions that have subfunctions use this same subroutine. For example, the handler for INT 21h AH=5Dh calls this same subroutine. The top of figure 6-15a shows that calling this subroutine involves pushing several values on the stack, including AX, which holds the function and subfunction that the caller wants (for example, 1200h) and the address of a table of function pointers. This table's first byte holds the number of valid subfunctions; the rest of the table is an array of near function pointers to the appropriate handlers for each subfunction.

The subroutine in figure 6-15b takes the caller's subfunction number (for example, the 00h in 1200h) and compares it against the first byte of the table to see if it is within range. If it is, the code shifts the subfunction number left by one bit to turn it into a word offset and adds the address of the table; the value is also incremented by 1 to skip past the table's first byte. The subroutine then pulls the function pointer out of the table, stores it on the stack in place of the table address, and "returns" to it.

Locating the INT 2Fh AH=12h Dispatch Table

This is somewhat difficult to follow, but for our purposes, the key piece of information is simply the location of the table, as this holds pointers to every INT 2Fh AH=12h subfunction. At the top of figure 6-15a, there is a comment indicating that, in this configuration, the table is at FDC8:3F7C. The first byte of this table is the number of subfunctions. This is followed immediately by an array of 30h words, holding function pointers to the various INT 2Fh AH=12h subfunctions:

        -db fdc8:3f7c 3f7c
        FDC8:3F70                                 30                       0

        -dw fdc8:3f7d
        FDC8:3F7D  470E 6E2E 4CBE 4708 9066 54EB 9342 98EA
        FDC8:3F8D  6F2F 9A9F B38F 6B6A 6B53 48CE 5030 98E3
        FDC8:3F9D  5030 4FF9 5011 9011 9927 9A76 A6A3 AB12
        FDC8:3FAD  4282 AABB AECD 4978 4A12 496C 4FD7 A9FC
        ; ...

Let's see if this is really the INT 2Fh AH=12h dispatch table. Earlier, it was noted that the subroutine at 4282h that DOS is so fond of calling is actually the code for INT 2Fh AX=1218h (Get Caller's Registers). Using SYMDEB to dump the table entry #18h confirms that this is correct:

        -dw fdc8:3f7d+(18*2)
        FDC8:3FAD  4282 ....
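The lookup that the subroutine in figure 6-15b performs on this table can be modeled in a few lines of Python. This is a sketch, not a translation of the DOS code: the table is rebuilt here as a byte string from the offsets in the dump above and in figure 6-16, and the names HANDLERS, TABLE, and dispatch are made up for the example.

```python
import struct

# Near offsets of the 30h handlers in this configuration (see figure 6-16).
HANDLERS = [
    0x470E, 0x6E2E, 0x4CBE, 0x4708, 0x9066, 0x54EB, 0x9342, 0x98EA,
    0x6F2F, 0x9A9F, 0xB38F, 0x6B6A, 0x6B53, 0x48CE, 0x5030, 0x98E3,
    0x5030, 0x4FF9, 0x5011, 0x9011, 0x9927, 0x9A76, 0xA6A3, 0xAB12,
    0x4282, 0xAABB, 0xAECD, 0x4978, 0x4A12, 0x496C, 0x4FD7, 0xA9FC,
    0xA66C, 0xAEA8, 0x4434, 0x8147, 0x5030, 0x501F, 0x50D4, 0xA72A,
    0x50DA, 0xA839, 0x5094, 0x5117, 0x5106, 0x5134, 0x5139, 0x440D,
]

# Count byte first, then the little-endian word array, as at FDC8:3F7C.
TABLE = bytes([len(HANDLERS)]) + struct.pack(f"<{len(HANDLERS)}H", *HANDLERS)

def dispatch(table, subfunc):
    """Return the near handler offset for subfunc, or None if out of range."""
    count = table[0]                  # first byte: number of valid subfunctions
    if subfunc >= count:              # CMP [BP+04],BL / JNB: too high, error
        return None
    pos = subfunc * 2 + 1             # SHL BX,1 / INC BX: word index past count
    (handler,) = struct.unpack_from("<H", table, pos)
    return handler

print(f"{dispatch(TABLE, 0x18):04X}")
```

The real subroutine does not return the pointer; it plants it on the stack and executes a RET to reach the handler. The arithmetic for finding the pointer is the same.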

The FTAB program from listing 6-4 can produce a nicer display of this same table. In fact, FTAB has an option to display tables such as this that keep the number of subfunctions as their first byte. The two commands shown in figure 6-16 are thus equivalent. So that you have a handy 2F/12 crib sheet to refer to, the entire table is shown, along with comments indicating the purpose of each subfunction.

Figure 6-16: INT 2Fh AH=12h Dispatch Table Displayed by FTAB

        C:\UNDOC2\CHAP6>ftab fdc8:3f7d 30 int2f12

        C:\UNDOC2\CHAP6>ftab fdc8:3f7d ? int2f12
        FDC8:470E   int2f12_00      ; install check
        FDC8:6E2E   int2f12_01      ; close current file
        FDC8:4CBE   int2f12_02      ; get interrupt addr
        FDC8:4708   int2f12_03      ; get dos data seg
        FDC8:9066   int2f12_04      ; normalize path separator
        FDC8:54EB   int2f12_05      ; output char
        FDC8:9342   int2f12_06      ; invoke crit err
        FDC8:98EA   int2f12_07      ; make disk buff most recently used
        FDC8:6F2F   int2f12_08      ; decrement sft ref count
        FDC8:9A9F   int2f12_09      ; flush and free disk buff
        FDC8:B38F   int2f12_0A      ; perform crit err interrupt
        FDC8:6B6A   int2f12_0B      ; signal share violation
        FDC8:6B53   int2f12_0C      ; set fcb file's owner
        FDC8:48CE   int2f12_0D      ; get date and time
        FDC8:5030   int2f12_0E      ; mark all disk buffer unreferenced
        FDC8:98E3   int2f12_0F      ; make buffer most recently used
        FDC8:5030   int2f12_10      ; find unreferenced disk buffer
        FDC8:4FF9   int2f12_11      ; normalize asciiz filename
        FDC8:5011   int2f12_12      ; strlen
        FDC8:9011   int2f12_13      ; toupper
        FDC8:9927   int2f12_14      ; _fstrcmp
        FDC8:9A76   int2f12_15      ; flush buffer
        FDC8:A6A3   int2f12_16      ; get address of SFT entry
        FDC8:AB12   int2f12_17      ; set working drive
        FDC8:4282   int2f12_18      ; get caller's registers
        FDC8:AABB   int2f12_19      ; set drive
        FDC8:AECD   int2f12_1A      ; get file's drive
        FDC8:4978   int2f12_1B      ; set year, length of February
        FDC8:4A12   int2f12_1C      ; checksum memory
        FDC8:496C   int2f12_1D      ; sum memory
        FDC8:4FD7   int2f12_1E      ; compare filenames
        FDC8:A9FC   int2f12_1F      ; build CDS
        FDC8:A66C   int2f12_20      ; get JFT entry
        FDC8:AEA8   int2f12_21      ; truename
        FDC8:4434   int2f12_22      ; set extended err info
        FDC8:8147   int2f12_23      ; check if char dev
        FDC8:5030   int2f12_24      ; delay
        FDC8:501F   int2f12_25      ; strlen
        FDC8:50D4   int2f12_26      ; open file
        FDC8:A72A   int2f12_27      ; close file
        FDC8:50DA   int2f12_28      ; move file pointer (lseek)
        FDC8:A839   int2f12_29      ; read file
        FDC8:5094   int2f12_2A      ; set fastopen entry point
        FDC8:5117   int2f12_2B      ; ioctl
        FDC8:5106   int2f12_2C      ; get dev chain
        FDC8:5134   int2f12_2D      ; get extended err code
        FDC8:5139   int2f12_2E      ; get/set error table addresses
        FDC8:440D   int2f12_2F      ; nop

The whole reason for looking at INT 2Fh AH=12h was that we expected that many of the near function calls that DOS makes internally would show up here. Indeed, you can now see clearly that the CALL 4282h that has continually popped up in these explorations is actually INT 2Fh AX=1218h. Similarly, as promised earlier, CALL A66C is actually INT 2Fh AX=1220h (Get JFT Entry). DOS internally makes extensive use of the functions in figure 6-16, but as already noted, it does so using a near CALL rather than an INT. DOS provides the INT form mostly for use by redirectors (see chapter 8). So, having this table of obscure INT 2Fh AH=12h functions definitely makes it much easier to understand the code for the INT 21h functions in which you are presumably interested.

Recall that, in figures 6-11 and 6-12, the process of locating this table started by having the INTCHAIN program call INT 2Fh AX=1200h. This function, the DOS internal services install check, does nothing more than return with AL=FFh to indicate that the services are present. The table indicates that FDC8:470E is the handler for this function. Let's unassemble at this address to check that the table makes sense:

        -u fdc8:470e
        FDC8:470E B0FF       MOV    AL,FF 
        FDC8:4710 C3         RET     

How about INT 2Fh AX=1203h, which is supposed to return with the DOS data segment in DS?

        -u fdc8:4708
        FDC8:4708 2E8E1EE73D     MOV    DS,CS:[3DE7]
        FDC8:470D C3             RET

The table seems to be accurate, so let's look at a more interesting function. According to the appendix, INT 2Fh AX=1217h sets DOS's working drive; the caller must push a zero-based drive number on the stack before calling the function. According to figure 6-16, this function is located at FDC8:AB12. Figure 6-17 shows a commented SYMDEB unassembly of this code.

Figure 6-17: INT 2Fh AX=1217h (Set Working Drive) in MS-DOS 6.0

        -u fdc8:ab12
        ;;; SS points at DOS DS
        ;;; Here, SysVars is at DOS:0026. So DOS:0047 is SysVars+21h
        FDC8:AB12 363A064700     CMP    AL,SS:[0047]    ; SysVars+21h = LASTDRIVE
        FDC8:AB17 7202           JB     AB1B            ; is drive < LASTDRIVE?
        FDC8:AB19 F9             STC                    ; no: set carry flag, fail
        FDC8:AB1A C3             RET     
        FDC8:AB1B 53             PUSH   BX              ; yes
        FDC8:AB1C 50             PUSH   AX 
        FDC8:AB1D 36C5363C00     LDS    SI,SS:[003C]    ; SysVars+16h = CDS ptr
        FDC8:AB22 B358           MOV    BL,58           ; 58h = size of CDS entry
        FDC8:AB24 F6E3           MUL    BL 
        FDC8:AB26 03F0           ADD    SI,AX           ; DS:SI = ptr to drive's CDS
        ;;; Here, SDA at DOS:0320, so DOS:05A2 is SDA+282h
        FDC8:AB28 368936A205     MOV    SS:[05A2],SI    ; move drive's CDS ptr into
        FDC8:AB2D 368C1EA405     MOV    SS:[05A4],DS    ;    DOS SDA+282h
        FDC8:AB32 58             POP    AX 
        FDC8:AB33 5B             POP    BX 
        FDC8:AB34 F8             CLC     
        FDC8:AB35 C3             RET

But if this function is called with the drive number on the stack, you may wonder how the code starts off with the drive number in AL. Looking back at figure 6-15a, note that the generic INT 2Fh AH=12h handler took a word off the stack (BP+0Eh, located after the caller's CS:IP and flags) and moved it into AX. In the case of those functions that don't expect a parameter on the stack, AX holds ignorable garbage. Thus, when this chapter says in various places that DOS makes an INT 2Fh AX=12xxh call, this is just a shorthand way of saying that DOS issues a near call to the code for INT 2Fh AX=12xxh, and that any parameter which, in the INT 2Fh version, would appear on the stack (see the appendix) actually appears in AX.
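To see why the stack argument turns up at BP+0Eh, it helps to list the words on the stack at the moment MOV AX,[BP+0E] executes in figure 6-15a. The Python sketch below is just bookkeeping; it assumes the handler was reached by an INT 2Fh with nothing extra left on the stack by earlier handlers in the chain, and the names FRAME and word_at are invented for the example.

```python
# Words on the stack, from the top of the stack (lowest address) upward.
# Each entry is one 16-bit word, so entry i lives at [BP + 2*i].
FRAME = [
    "saved BP",                  # PUSH BP
    "caller's AX",               # PUSH AX
    "table address (3F7C)",      # PUSH CS:[3F7A]
    "return address (44CA)",     # PUSH CS:[3F78]
    "caller's IP",               # pushed by the INT 2Fh instruction
    "caller's CS",
    "caller's flags",
    "caller's stack argument",   # pushed by the caller before the INT
]

def word_at(bp_offset):
    """Name the word found at [BP+bp_offset] once BP has been set to SP."""
    return FRAME[bp_offset // 2]

for off in range(0, 2 * len(FRAME), 2):
    print(f"[BP+{off:02X}]  {word_at(off)}")
```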

Everything else in this function involves fairly straightforward manipulation of DOS internal structures. The function checks the drive number against the internal value of LASTDRIVE in SysVars. If the drive number is valid, the function uses it as an index into the CDS array, a pointer to which is also contained in SysVars. The function then moves a pointer to the CDS entry into a DOS global variable. Changing this variable is basically what it means to set DOS's working drive.
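The same logic, stripped of registers and segment overrides, can be written as a short Python model. It is only a sketch of figure 6-17: the function name and the sample LASTDRIVE and CDS values are hypothetical, and 58h is the CDS entry size seen in this particular DOS version.

```python
CDS_ENTRY_SIZE = 0x58    # size of one CDS entry, from MOV BL,58 in figure 6-17

def set_working_drive(drive, lastdrive, cds_offset):
    """Return the offset of the drive's CDS entry, or None where the
    real code sets the carry flag and returns."""
    if drive >= lastdrive:                 # CMP AL,LASTDRIVE / JB not taken
        return None
    # MUL BL / ADD SI,AX, with 16-bit wraparound as in the real registers
    return (cds_offset + drive * CDS_ENTRY_SIZE) & 0xFFFF

# Hypothetical values: LASTDRIVE=5 (A: through E:), CDS array at offset 0.
print(set_working_drive(2, 5, 0x0000))     # drive C:
print(set_working_drive(7, 5, 0x0000))     # drive H: is past LASTDRIVE
```

In DOS itself the result is not returned but stored, together with the CDS segment, in the far pointer at SDA+282h.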

It is useful to see how DOS internally uses the LASTDRIVE variable in SysVars, the CDS, and other undocumented DOS features. Discussions of undocumented DOS are often (as in the first edition of the book) disconnected from any consideration of DOS internals. But the CDS, SFT, List of Lists, and other structures are not provided for our entertainment, like the hidden "gang screens" that software hobbyists and enthusiasts seem to enjoy finding. In fact, the CDS and so on are not so much undocumented DOS features, as internal DOS features that happen to be externally accessible through an undocumented interface. That undocumented DOS is often discussed without the surrounding context of DOS internals tends to obscure the real purpose of these structures. It is important to realize that the "true" form is the internal one, not the undocumented one.

For example, even though this chapter has often referred to the location of variables such as CURR_PSP as SDA+10h, or BREAK_FLAG as SDA+17h, within DOS there really is no such thing as the Swappable Data Area. The SDA is merely an externally-visible interface that Microsoft added at a rather late date on top of the DOS data segment (see "Origins of the SDA" in chapter 8). Likewise, the INT 2Fh AH=12h functions are just an undocumented external interface provided on top of some internal DOS functions, for the convenience mostly of network redirectors. The internal near-call form of these functions is the true one.

What have we accomplished here? Basically, by locating the INT 2Fh AH=12h dispatch table, we have now acquired names for 30h different internal DOS functions. Our earlier uncovering of the INT 21h dispatch table gave us names for 6Dh different locations in DOS. Rather than keep picking at disassembly of individual functions here and there, we can now turn around and do a full-blown disassembly of this entire code segment.

Really Disassembling DOS

Everything we've looked at in DOS is in the same code segment, which in this particular configuration happens to be FDC8h. Of course there are other parts of DOS, but this segment seems like a good place to start. How can you disassemble the entire code segment at once, but still keep track of where the individual functions are located? For example, in a monster disassembly of segment FDC8h, you would like to know where the Set PSP function is handled, where Exec is handled, and so on.

You can use DEBUG or SYMDEB to produce a disassembly of this DOS code segment, and use the FTAB program to produce labels indicating the location of key functions within the segment. To merge the FTAB output with the disassembly, and, while we're about it, clean up and improve the disassembly in various ways, we will use a program named NICEDBG, written in AWK, a C-like pattern-matching language that is excellent for text processing tasks like this.

To unassemble the main DOS code segment, you need to know where to tell DEBUG to start and stop unassembly. You can make a preliminary stab at finding the proper unassembly range by taking the FTAB outputs for the INT 21h dispatch table (figure 6-8) and the INT 2Fh AH=12h dispatch table (figure 6-16), combining them, and sorting them by address:

        C:\UNDOC2\CHAP6>type tmp.bat
        @echo off
        ftab fdc8:3e9e 6d INT21 > int212f.tmp
        ftab fdc8:3f7d 30 INT2F_12 >> int212f.tmp
        sort < int212f.tmp > int212f.log


        C:\UNDOC2\CHAP6>type int212f.log
        FDC8:4052   INT21_33
        FDC8:40A9   INT21_50
        FDC8:40B5   INT21_51
        FDC8:40B5   INT21_62
        FDC8:40C1   INT21_64
        FDC8:4282   INT2F_12_18
        FDC8:440D   INT21_18
        ; ... etc. ...
        FDC8:B0E9   INT21_5A
        FDC8:B183   INT21_6C
        FDC8:B2D8   INT21_5C
        FDC8:B38F   INT2F_12_0A

From the first and last lines of INT212F.LOG, it is clear that you want DEBUG or SYMDEB to unassemble starting at FDC8:4052 and ending somewhere a bit after FDC8:B38F. B500h is probably a good place to stop. You will probably need to adjust the unassembly range later, and rerun DEBUG, but this is fine for now. You can put the unassembly command into a tiny script file, feed it to the debugger, and redirect the debugger's output to a file:

        C:\UNDOC2\CHAP6>type int212f.scr
        u fdc8:4052 b500

        C:\UNDOC2\CHAP6>debug < int212f.scr > int212f.out
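The sort-and-pick-the-ends step is easy to automate. The Python sketch below parses FTAB-style "SEG:OFF NAME" lines, sorts them numerically, and proposes a range; the function name and the slack of 200h bytes past the last label are arbitrary choices for illustration, and the input is assumed to contain at least one valid line.

```python
def unassembly_range(lines, slack=0x200):
    """Sort FTAB-style 'SEG:OFF NAME' lines and suggest start/stop offsets."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or ":" not in fields[0]:
            continue                      # skip blank or malformed lines
        seg, off = fields[0].split(":")
        entries.append((int(seg, 16), int(off, 16), fields[1]))
    entries.sort()                        # by segment, then offset
    start = entries[0][1]
    stop = min(entries[-1][1] + slack, 0xFFFF)
    return entries, start, stop

sample = [
    "FDC8:B38F   INT2F_12_0A",
    "FDC8:4052   INT21_33",
    "FDC8:4282   INT2F_12_18",
]
entries, start, stop = unassembly_range(sample)
print(f"u {entries[0][0]:04x}:{start:04x} {stop:04x}")
```

The DOS SORT command gets the same ordering without any parsing, because FTAB prints fixed-width uppercase hex, for which text order and numeric order coincide.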

Using SYMDEB rather than DEBUG produces nicer results. SYMDEB puts segment overrides in their proper place, rather than on a separate line like DEBUG. But you must use the SYMDEB /X command line to suppress SYMDEB's [more] prompt, which you wouldn't see if you redirected output to a file:

        C:\UNDOC2\CHAP6>symdeb /x < int212f.scr > int212f.out

This takes a minute or so to run. The INT212F.OUT file will be about 870k bytes—much smaller if you use SYMDEB—and won't yet look very interesting. For example, there aren't yet any labels indicating where each DOS function starts. One of the things NICEDBG can do is merge the INT212F.OUT file produced by DEBUG or SYMDEB with the INT212F.LOG file that you produced using FTAB.

Windows Patches MS-DOS

Actually, there's one interesting thing you can do with the raw unassembly output from DEBUG or SYMDEB. Run the DEBUG unassembly script once under MS-DOS; then start Windows Enhanced mode and rerun the DEBUG script from inside a DOS box. Redirect DEBUG's output to a different file. This sequence gives you an easy way to examine the patches that Windows applies to MS-DOS. Just compare the two files, using diff or a similar utility. Any differences in this DOS code segment are the result of Windows patches.

        C:\UNDOC2\CHAP6>debug < int212f.scr > int212f.out


        ;;; from inside DOS box:
        C:\UNDOC2\CHAP6>debug < int212f.scr > int212f.win

        C:\UNDOC2\CHAP6>diff int212f.out int212f.win > int212f.dif

The list of Windows patches in INT212F.DIF is incomplete, because it shows only one DOS code segment. Still, it does provide some idea of what is going on:

        ;; original MS-DOS code in INT 21h dispatch (see figure 6-7)
        < FDC8:41CE 50            PUSH  AX
        < FDC8:41CF B482          MOV   AH,82
        < FDC8:41D1 CD2A          INT   2A
        < FDC8:41D3 58            POP   AX

        ;; patched by Windows; 15AD belongs to WIN386.EXE
        > FDC8:41CE 9A0A00AD15    CALL  15AD:000A
        > FDC8:41D3 90            NOP

        ;; original DOS code in a frequently called internal Begin Crit 01 function
        < FDC8:514B B80180        MOV   AX,8001                            
        < FDC8:514E CD2A          INT   2A                                 

        ;; patched by Windows 
        > FDC8:514B 9A4300AD15    CALL  15AD:0043

        ;; original DOS code in a frequently called internal End Crit 01 function
        < FDC8:516B B80181        MOV   AX,8101                            
        < FDC8:516E CD2A          INT   2A                                 

        ;; patched by Windows
        > FDC8:516B 9A7900AD15    CALL  15AD:0079

        ; ... etc. ...

The DOSMGR VxD built into WIN386.EXE applies these patches. When Windows exits, DOSMGR of course backs its changes out, restoring the original DOS code. As you can see, these patches have to do with DOS critical sections; DOSMGR wants DOS to call into the Windows VMM Begin_Critical_Section and End_Critical_Section functions. It's important to note that DOSMGR scans for the INT 2Ah instructions to patch, rather than using hardwired addresses. Thus, these patches should at least theoretically also work with a different vendor's DOS.
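If no diff utility is handy, the before-and-after comparison can be done by keying each unassembly line on its address. The Python sketch below is one way to do that; the function names are invented, and the sample lines are the first patch shown above.

```python
def parse_listing(text):
    """Map 'SEG:OFF' to the rest of each unassembly line, whitespace-normalized."""
    listing = {}
    for line in text.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and ":" in fields[0]:
            listing[fields[0]] = " ".join(fields[1].split())
    return listing

def patched_addresses(before_text, after_text):
    """Addresses whose instruction differs, or that appear in only one listing."""
    before = parse_listing(before_text)
    after = parse_listing(after_text)
    return sorted(addr for addr in before.keys() | after.keys()
                  if before.get(addr) != after.get(addr))

dos = """FDC8:41CE 50            PUSH  AX
FDC8:41CF B482          MOV   AH,82
FDC8:41D1 CD2A          INT   2A
FDC8:41D3 58            POP   AX"""
win = """FDC8:41CE 9A0A00AD15    CALL  15AD:000A
FDC8:41D3 90            NOP"""
print(patched_addresses(dos, win))
```

Addresses such as FDC8:41CF show up because the five-byte far CALL swallows the instructions that used to start there.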

The same before and after technique can be used to find DOS patches applied by other programs, such as MSCDEX. Programs that patch DOS can only be safely unloaded by a MARK/RELEASE type of program that knows enough about these patches to back them out.


To run NICEDBG, feed it output from DEBUG or SYMDEB. Optionally, you can supply a symbol-table file of code name/address pairs such as FTAB produces. You can also supply NICEDBG with an optional file of data name/address pairs (see below). For example:

        debug < int212f.scr > int212f.out
        ftab fdc8:3e9e 6d INT21 > int212f.log
        ftab fdc8:3f7d 30 INT2F_12 >> int212f.log
        nicedbg int212f.out int212f.log int212f.dat > int212f.lst

NICEDBG can make many improvements to the output from DEBUG or SYMDEB. The program makes several passes over the DEBUG file, replacing calls and jumps to meaningless-looking addresses such as 4282h with calls and jumps to meaningful labels supplied by the user, such as INT2F_12_18. The program also creates semi-useful labels for any other addresses that are the target of calls, loops, or jumps. If the target address itself contains a RET or JMP, NICEDBG changes the label to reflect this. The program also generates a list of cross-references to each location.

For example, a sample of output from DEBUG looks like this:

        FDC8:5126 9C            PUSHF
        FDC8:5127 36            SS:
        FDC8:5128 803E0C0D00    CMP BYTE PTR [0D0C],00
        FDC8:512D 740F          JZ  513E
        FDC8:512F EB01          JMP 5132
        FDC8:5131 CF            IRET
        FDC8:5132 0E            PUSH    CS
        FDC8:5133 E8FBFF        CALL    5131
        FDC8:5136 50            PUSH    AX
        FDC8:5137 B80180        MOV AX,8001
        FDC8:513A CD2A          INT 2A
        FDC8:513C 58            POP AX
        FDC8:513D C3            RET
        FDC8:513E EB01          JMP 5141
        FDC8:5140 CF            IRET
        FDC8:5141 0E            PUSH    CS
        FDC8:5142 E8FBFF        CALL    5140
        FDC8:5145 C3            RET

This is not very promising looking. But NICEDBG can transform this raw disassembly listing into something much more readable and useful:

        ; xref: FDC8:4304 FDC8:438B FDC8:4D7A
        FDC8:5126   9C              PUSHF 
        FDC8:5127   36803E0C0D00    CMP BYTE PTR SS:[0D0C],00 
        FDC8:512D   740F            JZ jmp_513E  -> loc_5141
        FDC8:512F   EB01            JMP loc_5132 

        ; xref: FDC8:5133 
        FDC8:5131   CF              IRET 

        ; xref: FDC8:512F 
        FDC8:5132   0E              PUSH CS 
        FDC8:5133   E8FBFF          CALL ret_5131 
        FDC8:5136   50              PUSH AX 
        FDC8:5137   B80180          MOV AX,8001 
        FDC8:513A   CD2A            INT 2A 
        FDC8:513C   58              POP AX 
        FDC8:513D   C3              RET 

        ; xref: FDC8:512D 
        FDC8:513E   EB01            JMP loc_5141 

        ; xref: FDC8:5142 
        FDC8:5140   CF              IRET 

        ; xref: jmp_513E 
        FDC8:5141   0E              PUSH CS 
        FDC8:5142   E8FBFF          CALL ret_5140 
        FDC8:5145   C3              RET 
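One of the transformations visible here is the folding of DEBUG's stand-alone segment override line (36 SS:) into the instruction that follows it. Listing 6-6 is the authority on how NICEDBG actually does this; the Python sketch below merely illustrates the idea under simple assumptions (the override always precedes a line with an operand field, and the function name merge_overrides is invented).

```python
import re

PREFIX = re.compile(r"^[CDES]S:$")        # a lone CS:, DS:, ES:, or SS:

def merge_overrides(lines):
    """Fold DEBUG's stand-alone 'SS:' style lines into the next instruction."""
    out, pending = [], None
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and PREFIX.match(fields[2]):
            pending = fields              # remember address, opcode byte, prefix
            continue
        if pending and len(fields) >= 3:
            addr, byte, prefix = pending
            rest = " ".join(fields[2:])
            if "[" in rest:               # attach prefix to the memory operand
                rest = rest.replace("[", prefix + "[", 1)
            else:                         # no bracket: keep prefix in front
                rest = prefix + " " + rest
            line = f"{addr}   {byte + fields[1]}    {rest}"
            pending = None
        out.append(line)
    return out

raw = [
    "FDC8:5126 9C            PUSHF",
    "FDC8:5127 36            SS:",
    "FDC8:5128 803E0C0D00    CMP BYTE PTR [0D0C],00",
]
print("\n".join(merge_overrides(raw)))
```

The merged instruction keeps the address of the prefix byte, since that is where the instruction really begins.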

Here are some of the changes that NICEDBG made at various offsets in the code:

NICEDBG uses loc_ to specify targets of jumps, func_ to specify targets of CALLs, loop_ to specify targets of LOOPs, ret_ to specify code that immediately returns via either RET or IRET, and jmp_ to specify code that does an unconditional JMP. If the user supplies a symbol-table file of name/address pairs such as generated by FTAB, NICEDBG will use this as a source of labels.

NICEDBG.AWK (listing 6-6) is the source code for this postprocessor for output from DEBUG or SYMDEB.

What is AWK?

Since the reader is likely to be unfamiliar with AWK, a brief explanation of listing 6-6 is probably called for. AWK reads in each line of text in one or more files and splits the line into fields. You can change the delimiters that AWK uses to decide where fields start and end, but it defaults to using white space, which is exactly what we need here. The fields are available to the program as $1, $2, and so on, up to $NF (NF is a built-in AWK variable that holds the number of fields); $0 is the original line. For example, the line "FDC8:440D INT21_1D" is $0, "FDC8:440D" is $1, and "INT21_1D" is $2 (and $NF).

Note too that AWK handles regular expressions (as also found in utilities such as grep); for example the regular expression "/[CDES]S\:/" matches "CS:", "DS:", "ES:", or "SS:", and "/\[.*\]/" matches anything within square brackets. AWK also has associative arrays (just built-in hash tables, really) that can be indexed with strings (for example, array["string"]) as well as numbers. The presence of an item in an associative array can be tested with the in operator; for example, if ("string" in array).

The standard reference is The AWK Programming Language by Alfred Aho, Brian Kernighan, and Peter Weinberger (from the first letters of whose last names the language got its name). The high-level pattern-matching and array features of AWK make it possible to implement NICEDBG in about 200 lines of code.

NICEDBG.EXE on the accompanying disk was produced with the excellent AWK compiler from Thompson Automation. You can run the program without having AWK or understanding anything about it; but to modify the program, you would of course need Thompson AWK or another AWK interpreter or compiler. The popular MKS Toolkit comes with AWK, and many BBSs carry MAWK, a freely available, fast AWK interpreter by Mike Brennan.

NICEDBG processes each line in the DEBUG file. For example, consider the following line from a DEBUG listing:

        FDC8:512D 740F JZ 513E

AWK breaks this line into fields, delimited by spaces. The nth field is referred to as $n:

        $1    FDC8:512D          Address of the instruction
        $2    740F               Instruction opcode bytes 
        $3    JZ                 Instruction operator
        $4    513E               Instruction operand

Of course, not every instruction looks quite like this. For example:

        $1        $2            $3  $4   $5  $6
        FDC8:5126 9C            PUSHF
        FDC8:5127 36            SS:
        FDC8:5128 803E0C0D00    CMP BYTE PTR [0D0C],00

In any case, NICEDBG.AWK can rely on $1 as the address of the instruction, and $3 as either the instruction operator or (when using DEBUG rather than SYMDEB) something like a segment override.
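
To make the field layout concrete, here is a minimal Python sketch that pulls the same pieces out of a DEBUG line. The function name parse_debug_line and the dictionary keys are hypothetical; NICEDBG itself simply refers to $1, $2, $3, and so on.

```python
def parse_debug_line(line):
    """Break one DEBUG unassembly line into its parts.

    Returns None for lines that do not start with a seg:ofs address.
    Everything after the operator is kept as a list, because the number
    of operand fields varies from instruction to instruction.
    """
    fields = line.split()
    if len(fields) < 3 or ":" not in fields[0]:
        return None
    return {
        "addr": fields[0],        # $1: address of the instruction
        "bytes": fields[1],       # $2: opcode bytes
        "op": fields[2],          # $3: operator, or a segment override
        "operands": fields[3:],   # $4 .. $NF
    }

print(parse_debug_line("FDC8:512D 740F JZ 513E"))
```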

Before processing the DEBUG file, NICEDBG reads in the optional symbol-table and data files. NICEDBG uses INT212F.LOG (or any similarly formatted file) to build a table of names (called ftab) corresponding to segment:offset locations; the program runs through each line in INT212F.OUT, or any unassembly listing produced by DEBUG or SYMDEB, to see if the line's segment:offset address is in the table.

NICEDBG makes three passes over the DEBUG file:

Pass 1 : NICEDBG looks for any calls, jumps, or loops in the code, and adds the target of the call, jump, or loop to ftab, which it will later use to generate labels. Simplifying considerably, the AWK code looks like this:

        if ($3 ~ /CALL/) ftab[$4] = "func_" $4;
        if ($3 ~ /LOOP/) ftab[$4] = "loop_" $4;
        if ($3 ~ /J.*/)  ftab[$4] = "loc_" $4;

In pass 1, NICEDBG also constructs the jmptab, for resolving JMPs to JMPs:

        if ($3 ~ /JMP/)  jmptab[$1] = $4;   # jmptab[SOURCE] = TARGET
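
For comparison, pass 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than a port of the listing: the name pass1 is made up, and the test that the operand is exactly four hex digits is a tightening added here so that register-indirect calls such as CALL DI are skipped.

```python
import re

HEX4 = re.compile(r"[0-9A-F]{4}")        # a plain near target such as 513E

def pass1(lines, seg):
    """Collect label names (ftab) and JMP source/target pairs (jmptab)."""
    ftab, jmptab = {}, {}
    for line in lines:
        f = line.split()
        if len(f) < 4:
            continue
        addr, op, target = f[0], f[2], f[3]
        if not re.match(r"CALL|LOOP|J", op):
            continue                      # not a call, loop, or jump
        if not HEX4.fullmatch(target):
            continue                      # skip [xxxx], xxxx:yyyy, FAR, CALL DI
        if op == "JMP":
            jmptab[addr.split(":")[1]] = target   # jmptab[SOURCE] = TARGET
        key = seg + ":" + target          # ftab is indexed by seg:ofs
        if key not in ftab:
            if op == "CALL":
                prefix = "func_"
            elif op.startswith("LOOP"):
                prefix = "loop_"
            else:
                prefix = "loc_"
            ftab[key] = prefix + target
    return ftab, jmptab

sample = [
    "FDC8:A845 E8E100 CALL A929",
    "FDC8:A848 7302 JNB A84C",
    "FDC8:A84A EB9E JMP A7EA",
    "FDC8:4DD0 CD2F INT 2F",
]
ftab, jmptab = pass1(sample, "FDC8")
```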

Pass 2 : The second time through the DEBUG file, NICEDBG builds its xref table, and also improves some of the labels generated in pass 1. A label such as jmp_XXXX or ret_XXXX, indicating that location XXXX does an unconditional JMP or (I)RET, is generally more useful than a label such as loc_XXXX, indicating that XXXX is the target of a jump. Thus, if pass 1 assigned a location a name, and if this location does a JMP or a (I)RET, NICEDBG changes ftab to reflect this:

        if (($3 ~ /I*RET/) && ($1 in ftab)) ftab[$1] = "ret_" $1;
        if (($3 ~ /JMP/) && ($1 in ftab))   ftab[$1] = "jmp_" $1;

Also in pass 2, NICEDBG looks for code that may be "not reached," that is, not accessible from any other location in the listing (of course, the code might be called from some other place that happened not to be in the disassembly range). If the previous line of code did an unconditional JMP or (I)RET, and if there are no labels at the current address (i.e., ftab[$1] is empty, indicating that $1 is not the target of a jump, call, or loop), NICEDBG adds $1 to a not_reached array:

        if ((did_jmpret == 1) && (! ($1 in ftab))) not_reached[$1]++;
        did_jmpret = 0;
        if ($3 ~ /I*RET|JMP/) did_jmpret = 1;
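
The "not reached" test is easy to try in isolation. The Python sketch below applies the same rule to a few lines taken from figure 6-18; the function name find_not_reached is hypothetical, and the operator match is stricter than the AWK pattern (it accepts only JMP, RET, RETF, and IRET).

```python
import re

def find_not_reached(lines, ftab):
    """Return addresses that follow a JMP or (I)RET and carry no label."""
    not_reached = []
    did_jmpret = False
    for line in lines:
        f = line.split()
        if len(f) < 3:
            continue
        addr, op = f[0], f[2]
        if did_jmpret and addr not in ftab:
            not_reached.append(addr)      # nothing falls into or jumps to here
        did_jmpret = bool(re.fullmatch(r"JMP|RETF?|IRET", op))
    return not_reached

code = [
    "FDC8:4D9B 32C0 XOR AL,AL",
    "FDC8:4D9D C3 RET",
    "FDC8:4D9E B0FF MOV AL,FF",
    "FDC8:4DA0 C3 RET",
    "FDC8:4DA1 B0FF MOV AL,FF",
]
labels = {"FDC8:4D9E": "loc_4D9E"}        # only the jump target is known
suspects = find_not_reached(code, labels)
```

With only loc_4D9E in the table, offset 4DA1 is flagged, even though it is really the entry point for INT 21h AH=0Dh. This is the "called from some other place" case: once the symbol table supplies a name for that address, the warning goes away.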

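Pass 3 : In its final pass over the DEBUG listing, NICEDBG prints out the new, improved listing, with the labels and cross-references from ftab and xref in place. If the optional data file was supplied, NICEDBG also replaces bracketed offsets such as [0330] with names from that file. A data file for the DOS data segment, such as INT212F.DAT, contains lines like these: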

        0069        STARTUP_DRV
        0330        CURR_PSP
        0337        BRK_FLAG
        3DE7        DOS_DS
        1030        IN_WIN3E
        033E        MACHINE_ID
        0321        IN_DOS
        0584        USER_SP
        0586        USER_SS
        0320        CRIT_ERR
        1211        DOS_HIGH

But note that NICEDBG's replacements of, for example, [0330] with CURR_PSP are very simple-minded: the program merely does a blind global search and replace. Thus, you should be conservative about what you put in a NICEDBG .DAT file.
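
A small Python sketch shows why caution is warranted. It mimics the match-then-substitute step from listing 6-6 (the function name replace_data_refs is invented here) and, like the original, pays no attention to which segment the bracketed offset actually refers to.

```python
import re

def replace_data_refs(line, data):
    """Replace a bracketed offset with its name, if the offset is in data."""
    m = re.search(r"\[(.*)\]", line)      # text inside the square brackets
    if m and m.group(1) in data:
        return line[:m.start()] + data[m.group(1)] + line[m.end():]
    return line

data = {"0330": "CURR_PSP", "0321": "IN_DOS"}

print(replace_data_refs("MOV BX,[0330]", data))
print(replace_data_refs("MOV AX,ES:[0330]", data))   # ES may not be DOS's DS!
```

The second call still produces CURR_PSP, although nothing guarantees that ES points at the DOS data segment at that moment.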

If DEBUG rather than SYMDEB was used to produce NICEDBG's input, NICEDBG saves away any segment override on the current line ($3 ~ /[CDES]S\:/) and uses the AWK sub() substitution function to smack it into its proper place on the next line.
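
The override handling amounts to holding one line back and merging it into the next. The following Python sketch shows the idea on the three DEBUG lines used earlier; the function name merge_overrides is hypothetical, and unlike NICEDBG it normalizes white space to single spaces.

```python
import re

SEG_OVERRIDE = re.compile(r"[CDES]S:")

def merge_overrides(lines):
    """Fold a stand-alone segment-override line into the line after it."""
    merged = []
    pending = None                       # (addr, byte, override) held back
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and SEG_OVERRIDE.fullmatch(fields[2]):
            pending = tuple(fields)      # print nothing yet
            continue
        if pending and len(fields) >= 3:
            addr, byte, override = pending
            pending = None
            # put the override in front of the first "[" in the operands
            rest = " ".join(fields[2:]).replace("[", override + "[", 1)
            fields = [addr, byte + fields[1], rest]
        merged.append(" ".join(fields))
    return merged

listing = [
    "FDC8:5126 9C            PUSHF",
    "FDC8:5127 36            SS:",
    "FDC8:5128 803E0C0D00    CMP BYTE PTR [0D0C],00",
]
merged = merge_overrides(listing)
```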

Listing 6-6: NICEDBG.AWK

        # NICEDBG.AWK -- Produces nicer output from DEBUG input and symbol table
        # Copyright (c) 1993 Andrew Schulman. All rights reserved.
        # usage: nicedbg dbgfile [symtab] [datfile] > lstfile
        # example: nicedbg int212f.out int212f.log > int212f.lst

        # get offset from seg:ofs
        function get_off(addr)      { split(addr, so, ":"); return so[2]; }

        function mk_fp(ofs)         { return seg ":" ofs; }  # make seg:ofs farptr

        function get_ftab_name(addr) {  # get name from table
            if (addr !~ SEG_OFS)
                addr = mk_fp(addr);     # table indexed by seg:ofs
            if (! (addr in ftab))
                return addr;            # not there -- return unchanged
            split(ftab[addr], label, ",");
            return label[1];            # just return first name if > 1
        }

        function resolve_jmp_jmp(src) { # JMP to JMP to ...
            if (! (src in jmptab))
                return src;             # not a JMP -- nothing to resolve
            if (done[src])
                return done[src];
            # if get here, haven't seen this one yet
            target = target2 = jmptab[src];
            while (target in jmptab)    {
                target2 = jmptab[target];
                if (target2 == target)      # endless loop
                    break;
                if (target2 == src)         # cycle
                    break;
                if (target2 in done) {      # we've seen this part already
                    target2 = done[target2];
                    break;
                }
                target = target2;
            }
            done[src] = target2;
            return target2;
        }

        function hex(x)     { return 0 + ("0x" x); }    # relies on Thompson AWK

        BEGIN {
            print "NICEDBG -- Makes nicer output from DEBUG input and symbol table";
            print "From \"Undocumented DOS\", 2nd edition (Addison-Wesley, 1993)";
            print "Copyright (C) 1993 Andrew Schulman. All rights reserved.\n";
            if (ARGC < 2)  {
                print "usage: nicedbg dbgfile [symtab] [datfile] > lstfile" ;
                print "example: nicedbg int212f.out int212f.log > int212f.lst" ;
                did_anything = 0;
                exit;
            }
            else did_anything = 1;
            # commonly-used regular expressions
            SQ_BRACK = /\[.*\]/;              # anything within square brackets
            SEG_OFS = /\:/;                   # has a : in it
            SEG_OVERRIDE = /[CDES]S\:/;       # CS: or DS: or ES: or SS:
            CALL_OR_JUMP = /CALL|LOOP|J.*/;   # CALL, LOOP, JMP, J*

            # read in optional symbol-table file
            # lines in symtab file look like:  xxxx:yyyy    name
            if (ARGC > 2) {
                while (getline < ARGV[2])       # for each line in symbol table
                    ftab[$1] = ftab[$1] $2 ","; # put name into table for seg:ofs
            }

            # read in optional data file
            # lines in data file look like:    xxxx   name
            # example:                         0321   IN_DOS
            if (ARGC > 3)   {
                while (getline < ARGV[3])
                    data[$1] = $2;
            }

            ARGC = 2;                           # finished with sym, dat file
            dbgfile = ARGV[1];                  # switch over to DEBUG file

            # debug file looks like:   xxxx:yyyy   XXXXXX   op operands
            # example:                 FDC8:4052   3C06     CMP AL,06    ; comments
            while (getline < dbgfile)   {       # make pass 1 through debug file
                if ($1 ~ SEG_OFS) {
                    split($1, so, ":");
                    if (! seg)  {
                        seg = so[1];           # get segment for later use
                        start = hex(so[2]);
                    }
                    stop = so[2];              # take last one
                }
                if ($3 ~ CALL_OR_JUMP)  {
                    if ($4 ~ /\:|\[.*\]|FAR/)  # don't do [xxxx] or xxxx:yyyy etc.
                        continue;
                    # should also ignore e.g. CALL DI
                    if ($3 ~ /JMP/)
                        jmptab[get_off($1)] = $4;   # jmptab for resolving JMP JMP
                    if (! (mk_fp($4) in ftab))      # put call/jmp target into table
                        ftab[mk_fp($4)] = (($3 ~ /CALL/) ? "func_" : 
                                        ($3 ~ /LOOP/) ? "loop_" : "loc_") $4;
                }
            }
            stop = hex(stop);

            # pass 2: build cross-ref table, improve some label names, etc.
            close(dbgfile);            # rewind: pass 1 read to end of file
            while (getline < dbgfile)   {
                if ((did_jmpret == 1) && (! ($1 in ftab)))
                    not_reached[$1]++; # prev line did JMP/RET, but no label, so
                did_jmpret = 0;        #    "not reached"; may be data or dead code

                if ($3 ~ /I*RET|JMP/)   {
                    did_jmpret = 1;
                    if ($1 in ftab)     # if target is a ret/jmp, change label name
                        ftab[$1] = (($3 ~ /JMP/) ? "jmp_" : "ret_") get_off($1);
                    # oops, this will also replace labels supplied in sym file!
                }
                # below *not* "else if" -- JMP handled both places
                # build xref table and outside-range table
                if (($3 ~ CALL_OR_JUMP) && ($4 !~ SQ_BRACK) && ($5 !~ SQ_BRACK)) {
                    if ($4 ~ /FAR/)
                        outside[$5]++;          # far target follows FAR keyword
                    else if ($4 ~ SEG_OFS)
                        outside[$4]++;          # xxxx:yyyy target
                    else {
                        off = hex($4);
                        if ((off < start) || (off > stop))
                            outside[off]++;     # near target beyond disasm range
                    }
                    if ($4 !~ /\:|FAR/)      # don't do [xxxx] or xxxx:yyyy
                        xref[mk_fp($4)] = xref[mk_fp($4)] get_ftab_name($1) " ";
                }
            }
        }

        {                                   # pass 3: for each line in dbg file
            while (! ($1 ~ SEG_OFS)) {      # ignore any lines without xxxx:yyyy
                print; getline;
                if (! $0) exit;
            }

            jmpline = "";

            # indicate if this is possible unreached (dead) code; show 
            # cross-reference (xref) table; show all labels for this address
            if ($1 in not_reached)  {              # possible dead code
                print ""
                print ";;; not reached?";
            }
            else if ($1 in ftab) {                 # if segment:offset in table
                print ""
                if (xref[$1]) 
                    print "; xref: " xref[$1]      # show xref
                nf = split(ftab[$1], label, ",");
                for (i=1; i<=nf; i++)
                    if (label[i])                  # show all labels for this addr
                        printf("%24s%s:\n", " ", label[i]);
                ftab_found[$1] = 1;
            }

            # if a CALL, LOOP, or some kind of JMP, show eventual destination
            # of any JMP JMP, and possibly replace number address with string name
            if ($3 ~ CALL_OR_JUMP)  {
                if ($4 !~ /FAR/)    {
                    if ($4 in jmptab)
                        jmpline = " -> " get_ftab_name(resolve_jmp_jmp($4));
                    $4 = get_ftab_name($4);     # replace number with name
                }
            }

            # cheap replacement of [xxxx] with names from data file
            if (match($0, SQ_BRACK))            # match sets RSTART, RLENGTH
                if ((addr = substr($0, RSTART+1, RLENGTH-2)) in data)
                    sub(SQ_BRACK, data[addr], $0);  # sub() does substitution

            # get rid of DEBUG segment override ugliness
            if ($3 ~ SEG_OVERRIDE) {
                ovride_addr = $1;               # save to use on next line
                byte = $2;
                override = $3;
            }
            else if (ovride_addr)   {
                $1 = ovride_addr; ovride_addr = "";
                $2 = byte $2; 
                sub(/\[/, override "[", $0);    # plug in override:
            }

            # print out (possibly altered) line
            if (! ovride_addr)  {
                printf("%s\t%-15s\t", $1, $2);
                for (i=3; i<=NF; i++)
                    printf("%s ", $i);
                if (jmpline)
                    printf("%s", jmpline);
                printf("\n");
            }
        }

        # print list of CALL, JMP, etc. references outside disasm range
        END {
            if (did_anything) {
                printf("\n;; outside range %s:%04X-%04X:\n", seg, start, stop);
                for (x in outside)
                    printf(";; " ((x ~ SEG_OFS) ? "%s" : "%04X") "\n", x);
                # should suppress following if within a not-reached block?
                printf("\n;; possible unresolved labels:\n");
                for (x in ftab)
                    if (! (x in ftab_found))
                        printf(";; %s\n", ftab[x]);
            }
        }
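
The JMP-to-JMP resolution in resolve_jmp_jmp() is perhaps the least obvious part of the listing. Here is a simplified Python sketch of the same idea: follow the chain of unconditional JMPs until it reaches a location that is not itself a JMP, and stop if the chain loops back on itself. The name resolve_jmp_chain is made up, and the sketch leaves out the memoization that the AWK version does with its done array.

```python
def resolve_jmp_chain(src, jmptab):
    """Follow jmptab[SOURCE] = TARGET links to the final destination."""
    seen = {src}
    target = src
    while target in jmptab:
        nxt = jmptab[target]
        if nxt in seen:          # endless loop or cycle: give up here
            break
        seen.add(nxt)
        target = nxt
    return target

# the chain noted in the comments to figure 6-20
jmptab = {
    "A858": "A7EA", "A7EA": "A7D8", "A7D8": "A7D4",
    "A7D4": "A716", "A716": "A6FB", "A6FB": "43ED",
}
final = resolve_jmp_chain("A858", jmptab)
```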

With output from DEBUG in INT212F.OUT, a symbol table produced by FTAB in INT212F.LOG, and the optional data file INT212F.DAT, you can produce a nice looking disassembly of the main MSDOS.SYS code segment, INT212F.LST, with:

        nicedbg int212f.out int212f.log int212f.dat > int212f.lst

We will examine this INT212F.LST file in more detail momentarily, but the following excerpt provides some idea of what NICEDBG produces:

        FDC8:4282 2E8E1EE73D    MOV DS,CS:DOS_DS 
        FDC8:4287 C5368405      LDS SI,USER_SP 
        FDC8:428B C3            RET 
        ; ...
        FDC8:4D59 E826F5        CALL INT2F_12_18
        FDC8:4D5C C744022103    MOV WORD PTR [SI+02],IN_DOS
        FDC8:4D61 8C5410        MOV [SI+10],SS
        FDC8:4D64 C3            RET

This is quite usable. You can see that INT 21h AH=34h (Get InDOS Flag Address) calls the code for INT 2Fh AX=1218h (Get Caller's Registers) and then moves DOS_DS:0321 into the caller's ES:BX registers. This is just as you would expect.

You could make this even more readable by going into INT212F.LOG and taking the only partially useful names, such as INT21_34 and INT2F_12_18 produced by FTAB, and replacing them with more evocative names, such as GET_INDOS_34 and GET_STACKPTR_1218. But this is left as an exercise for the reader (who may in any case know all the DOS function numbers by heart and not require such a crutch). The point is simply that you can manually change or add to INT212F.LOG as you discover new functions. For example, you can add the following two functions that you already know about from running INTCHAIN:

        FDC8:40F8   INT21_DISPATCH
        FDC8:44BD   INT2F_DISPATCH

Please note that INT212F.LST is not included on the accompanying disk, as redistributing a large piece of MS-DOS would obviously violate Microsoft's copyright! However, it should be easy for readers to produce their own personal copies, given the instructions in this chapter. Let us quickly summarize the steps involved in producing INT212F.LST:

  1. INTCHAIN 21/6200 and use last line to locate DOS INT 21h handler.
  2. DEBUG or SYMDEB to unassemble INT 21h handler; locate dispatch table.
  3. Run FTAB on INT 21h dispatch table > tmpfile.
  4. INTCHAIN 2F/1200 and use last line to locate DOS INT 2Fh handler.
  5. DEBUG or SYMDEB to unassemble INT 2Fh handler; locate dispatch table.
  6. Run FTAB on INT 2Fh dispatch table >> tmpfile.
  7. SORT < tmpfile > symfile.
  8. Inspect top and bottom of symfile to create script for DEBUG or SYMDEB.
  9. DEBUG < script > outfile.
  10. Optionally create datafile.
  11. Optionally change and add to symfile.
  12. NICEDBG outfile symfile [datafile] > lstfile
  13. Check "outside range" comment at end of lstfile. Possibly alter script, and goto step 9.

The last point needs an explanation. Because code and data are intermixed within DOS, DEBUG and SYMDEB are likely to encounter data that they will misinterpret as code. This invalid code can throw off the unassembly of valid code further on in memory. The result is that INT212F.LST may contain, for example, several CALLs to func_9024 but, instead of showing code at offset 9024h, there is instead some bogus-looking instruction at offset 9023h. NICEDBG will list such possibly unresolved labels at the end of the listing; you can use this to split the DEBUG or SYMDEB u command into two or more parts. For example, let's say that there are valid-looking calls to func_9024, but no func_9024 itself. If the original DEBUG script contained the following command:

        u fdc8:4052 b500

you can split this in two, making DEBUG restart unassembly at offset 9024h:

        u fdc8:4052 9024
        u fdc8:9024 b500  
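
If the listing reports many unresolved labels, splitting the u command by hand gets tedious. The Python sketch below generates the split commands from a list of restart offsets. It is a convenience sketched for this discussion, not a tool from the accompanying disk, and the function name split_unassemble is hypothetical.

```python
def split_unassemble(seg, start, stop, restart_points):
    """Return DEBUG u commands that restart unassembly at each given offset."""
    # keep only restart points strictly inside the range, in ascending order
    cuts = sorted(p for p in set(restart_points) if start < p < stop)
    commands = []
    lo = start
    for hi in cuts + [stop]:
        commands.append("u %s:%04x %04x" % (seg, lo, hi))
        lo = hi
    return commands

for cmd in split_unassemble("fdc8", 0x4052, 0xB500, [0x9024]):
    print(cmd)
```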

At this point, of course, you may find the idea of postprocessing DEBUG output a little ridiculous. You may want to switch to a genuine disassembler such as V Communications' Sourcer.

Remember that we've disassembled just one MSDOS.SYS code segment. You can apply the same techniques to other parts of MS-DOS (the outside range list produced by NICEDBG is helpful here), to DR DOS, or to NetWare's NETX code.

Examining a Few DOS Functions

Let's look at a small portion of the MS-DOS 6.0 disassembly produced by DEBUG with a little help from FTAB, INTCHAIN, and NICEDBG. Figure 6-18 below shows the code for a few simple DOS functions.

Figure 6-18: MS-DOS 6.0 Code for Functions 34h, 52h, 1Fh, 32h, and 0Dh

        FDC8:4D59   E826F5          CALL INT2F_12_18 
        FDC8:4D5C   C744022103      MOV Word Ptr [SI+02],0321 
        FDC8:4D61   8C5410          MOV [SI+10],SS 
        FDC8:4D64   C3              RET 

        FDC8:4D65   E81AF5          CALL INT2F_12_18 
        FDC8:4D68   C744022600      MOV Word Ptr [SI+02],0026 
        FDC8:4D6D   8C5410          MOV [SI+10],SS 
        FDC8:4D70   C3              RET 

        FDC8:4D71   B200            MOV DL,00 

        FDC8:4D73   16              PUSH SS 
        FDC8:4D74   1F              POP DS 
        FDC8:4D75   8AC2            MOV AL,DL 
        FDC8:4D77   E8415D          CALL INT2F_12_19 
        FDC8:4D7A   7222            JB loc_4D9E 
        FDC8:4D7C   C43EA205        LES DI,[05A2] 
        FDC8:4D80   26F6454480      TEST Byte Ptr ES:[DI+44],80 
        FDC8:4D85   7517            JNZ loc_4D9E 
        FDC8:4D87   E8B003          CALL func_513A 
        FDC8:4D8A   E83749          CALL func_96C4 
        FDC8:4D8D   E8CA03          CALL func_515A 
        FDC8:4D90   720C            JB loc_4D9E 
        FDC8:4D92   E8EDF4          CALL INT2F_12_18 
        FDC8:4D95   896C02          MOV [SI+02],BP 
        FDC8:4D98   8C440E          MOV [SI+0E],ES 
        FDC8:4D9B   32C0            XOR AL,AL 
        FDC8:4D9D   C3              RET 

        ; xref: FDC8:4D7A FDC8:4D85 FDC8:4D90 
        FDC8:4D9E   B0FF            MOV AL,FF 
        FDC8:4DA0   C3              RET 

        FDC8:4DA1   B0FF            MOV AL,FF 
        FDC8:4DA3   16              PUSH SS 
        FDC8:4DA4   1F              POP DS 
        FDC8:4DA5   E89203          CALL func_513A 
        FDC8:4DA8   830E110604      OR Word Ptr [0611],+04 
        FDC8:4DAD   E8844C          CALL func_9A34 
        FDC8:4DB0   83261106FB      AND Word Ptr [0611],-05 
        FDC8:4DB5   C706B50D0000    MOV Word Ptr [0DB5],0000 
        FDC8:4DBB   BBFFFF          MOV BX,FFFF 
        FDC8:4DBE   891E2000        MOV [0020],BX 
        FDC8:4DC2   891E1E00        MOV [001E],BX 
        FDC8:4DC6   E89103          CALL func_515A 
        FDC8:4DC9   B8FFFF          MOV AX,FFFF 
        FDC8:4DCC   50              PUSH AX 
        FDC8:4DCD   B82011          MOV AX,1120 
        FDC8:4DD0   CD2F            INT 2F 
        FDC8:4DD2   58              POP AX 
        FDC8:4DD3   C3              RET 

First off, notice our old friends INT 21h AH=34h and 52h. Except for the clarity of the code displayed in figure 6-18, these hold no surprises for us. The functions are nearly identical. They both get the caller's register structure, and return different values into the caller's BX. Perhaps NICEDBG could be improved to recognize the caller's register structure and, where appropriate (which would be the difficult part) replace expressions such as [SI+02] and [SI+10] with something like CALLER_BX and CALLER_ES. That's for version 2.0!
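
As a rough idea of what such a replacement might look like, here is a Python sketch. It covers only the offsets that appear in this chapter's listings (AX at 00h, BX at 02h, DX at 06h, DS at 0Eh, ES at 10h); the CALLER_xx names and the function name are invented, and the sketch ducks the difficult part entirely by assuming that SI really does point at the caller's register structure.

```python
import re

# offsets into the caller's register structure seen in this chapter's listings
CALLER_REGS = {
    0x00: "CALLER_AX",
    0x02: "CALLER_BX",
    0x06: "CALLER_DX",
    0x0E: "CALLER_DS",
    0x10: "CALLER_ES",
}

def name_caller_regs(text):
    """Replace [SI] and [SI+xx] with CALLER_xx where the offset is known."""
    def repl(m):
        off = int(m.group(1), 16) if m.group(1) else 0
        return CALLER_REGS.get(off, m.group(0))   # unknown offset: leave as-is
    return re.sub(r"\[SI(?:\+([0-9A-F]{2}))?\]", repl, text)

print(name_caller_regs("MOV Word Ptr [SI+02],0321"))
print(name_caller_regs("MOV [SI+10],SS"))
```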

More interesting is the code that appears next in figure 6-18 for INT 21h functions 1Fh and 32h. These Disk Parameter Block functions have been around for a while, but Microsoft only documented them starting in DOS 5.0. Note that the code for function 1Fh simply sets DL=0 and falls into the code for function 32h. This makes sense, since function 1Fh is Get Default DPB, and function 32h is Get DPB. Get DPB takes a drive number in DL and returns the DPB in DS:BX.

Where does the DPB come from? The Get DPB code calls several subfunctions not shown here, but armed with the NICEDBG output, you can examine the code for each of these subfunctions fairly easily. In essence, INT 21h AH=1Fh and AH=32h call the internal Set Drive function (INT 2Fh AX=1219h), which in turn calls the INT 2Fh AX=1217h function that we examined in figure 6-17. As noted there, this function sets the working Current Directory Structure field at DOS:05A2h (SDA+282h). Note that this is not the same as changing drives; it merely sets up a working area in the DOS data segment. When INT 2Fh AX=1219h has returned, Get DPB pulls the CDS pointer out of the working CDS field where the INT 2Fh function just put it. It then calls a subroutine that gets the DPB pointer from offset 45h in the CDS. Having examined the different subroutines that Get DPB calls, we can decorate the code with comments:

Figure 6-19: MS-DOS 6.0 Code for DPB Functions 1Fh and 32h

        FDC8:4D71   B200            MOV DL,00           ; 0 = default drive
                                                    ; fall through!
        FDC8:4D73   16              PUSH SS 
        FDC8:4D74   1F              POP DS              ; get DOS DS
        FDC8:4D75   8AC2            MOV AL,DL 
        FDC8:4D77   E8415D          CALL INT2F_12_19    ; Set Drive, like 2f/1217
        FDC8:4D7A   7222            JB loc_4D9E 
        FDC8:4D7C   C43EA205        LES DI,[05A2]       ; SDA+282h = curr CDS ptr 
        FDC8:4D80   26F6454480      TEST Byte Ptr ES:[DI+44],80 ; CDS[43-44h] = flags
        FDC8:4D85   7517            JNZ loc_4D9E        ; if net/redir drive, fail
        FDC8:4D87   E8B003          CALL func_513A      ; enter crit #1 (2A/8001)
        FDC8:4D8A   E83749          CALL func_96C4      ; ES:BP get DPB from CDS[45h]
        FDC8:4D8D   E8CA03          CALL func_515A      ; exit crit #1 (2A/8101)
        FDC8:4D90   720C            JB loc_4D9E         ; fail?
        FDC8:4D92   E8EDF4          CALL INT2F_12_18    ; get caller's regs
        FDC8:4D95   896C02          MOV [SI+02],BP      ; caller's BX
        FDC8:4D98   8C440E          MOV [SI+0E],ES      ; caller's DS
        FDC8:4D9B   32C0            XOR AL,AL           ; al = 0 for success
        FDC8:4D9D   C3              RET 

The final function to examine back in figure 6-18 is INT 21h AH=0Dh (Disk Reset). The function does its real work inside the call to func_9A34 (not shown), which loops over all buffers, calling the internal Flush Buffer function (INT 2Fh AX=1215h). But note in figure 6-18 that Disk Reset also calls INT 2Fh AX=1120h, which is the network redirector Flush All Disk Buffers function. This provides a good illustration of how the network redirector works as a series of hooks in DOS. At various key moments, DOS issues an INT 2Fh AH=11h call; any installed redirector can pick up the call and do what it needs (see chapter 8).

One of the things that probably isn't clear from the DOS code shown in this chapter, but which becomes clear from examining the INT212F.LST file, is that hooks play an important role in DOS. In addition to the INT 2Fh AH=11h redirector interface, DOS also checks the SHARE hooks. These, however, are implemented in a totally different manner from the redirector (see SHARHOOK.C at listing 8-XX in chapter 8). Of course, many DOS functions get passed down to installable device drivers; the DOS code calls these drivers using the Strategy and Interrupt pointers in the device driver header (see chapter 7).

Remember also that external programs probably hook many of these DOS calls. You saw earlier, for example, that SMARTDRV and DBLSPACE hook the Disk Reset call. Thus, it is a little misleading to view the INT 21h AH=0Dh handler in MSDOS.SYS in isolation. When examining the code for a DOS function, it is important to remember that DOS isn't just the code in MSDOS.SYS and IO.SYS, but it is the sum total of the interactions of this code with all the DOS extensions you are likely to find on a user's machine. This not only means understanding the role of programs such as Windows, SMARTDRV, MSCDEX, DOSKEY, and DBLSPACE, but also understanding where non-Microsoft programs such as Stacker, NetWare, and 386MAX come in. A good example of this, as we saw in chapter 4, is the way that the trivially-simple Set PSP function suddenly takes on new meaning and complexity when Novell NetWare is running.

Examining the DOS Lseek Function

As a more extensive, but still relatively self-contained, example, let's examine the DOS Move File Pointer function (INT 21h AH=42h), frequently known after its C/Unix equivalent as lseek. We had occasion to examine the DOS code for this function while working on chapter 8 of this book. An early draft of the network-redirector specification in chapter 8, in discussing the redirector INT 2Fh AX=1121h Seek From End function, asserted that "DOS never calls this function." Since this was based merely on empirical evidence (we had never seen 2F/1121 called), it made sense to examine the DOS code to verify that DOS did not contain a call to INT 2Fh AX=1121h.

To our surprise, the DOS code for lseek did contain a call to this INT 2Fh function. It turns out that DOS only calls the redirector's Seek From End function under a special set of circumstances having to do with network FCBs and various SHARE modes. Frankly, we still don't quite understand this. In any case, the rest of the code for INT 21h AH=42h is fairly straightforward, yet long enough to be a little more interesting than the feeble little examples we've seen so far. In addition, there is some interesting Windows-related code in DOS that we'll encounter along the way.

Before we examine the disassembly listing for INT 21h AH=42h, recall that the function has the following specification:

        Move File Pointer
            AH = 42h
            AL = method (0 = from beginning; 1 = from current pos; 2 = from end)
            BX = file handle
            CX:DX = hi:lo offset from beginning, current, or end
            INT 21h
        Output success:
            Carry clear
            DX:AX = new hi:lo position
        Output failure:
            Carry set
            AX = error value (1 = invalid function; 6 = invalid handle)

Microsoft's DOS programmer's reference further notes that "A program should never attempt to move the file pointer to a position before the start of the file. Although this action does not generate an error during the move, it does generate an error on a subsequent read or write operation. A program can move the file position beyond the end of the file. On a subsequent write operation, MS-DOS writes data to the given position in the file, filling the gap between the previous end of the file and the given position with undefined data. This is a common way to reserve file space without writing to the file."

This tends to suggest that almost any CX:DX parameters to lseek are valid. Indeed, as we're about to see, the code does little more than move the CX:DX parameter into the file's SFT entry. The hard part is getting the SFT entry. To make sense of the code listing, you'll need to know the following offsets in the SFT (for further information, see the appendix under INT 21h AH=52h):

        02h     WORD        open mode
        05h     WORD        device info word
        11h     DWORD       file size
        15h     DWORD       current file position
        2Fh     WORD        machine number (Windows VM ID)
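
To make these offsets concrete, the Python sketch below reads the same fields out of a raw byte buffer standing in for an SFT entry. The field names, the function read_sft_field, and the 40h-byte buffer are assumptions made for the example; only the offsets and sizes come from the table above.

```python
import struct

# name: (offset, struct format); "<H" is a little-endian WORD, "<I" a DWORD
SFT_FIELDS = {
    "open_mode":  (0x02, "<H"),
    "dev_info":   (0x05, "<H"),
    "file_size":  (0x11, "<I"),
    "file_pos":   (0x15, "<I"),
    "machine_id": (0x2F, "<H"),
}

def read_sft_field(sft, name):
    """Return one field from a bytes-like image of an SFT entry."""
    offset, fmt = SFT_FIELDS[name]
    return struct.unpack_from(fmt, sft, offset)[0]

# build a fake entry: 1000-byte file, positioned at 00012345h, VM ID 2
entry = bytearray(0x40)
struct.pack_into("<I", entry, 0x11, 1000)
struct.pack_into("<I", entry, 0x15, 0x00012345)
struct.pack_into("<H", entry, 0x2F, 2)

print(read_sft_field(entry, "file_size"), hex(read_sft_field(entry, "file_pos")))
```

Because the position is a DWORD, 16-bit code has to touch it as two words: the low word at offset 15h and the high word at 17h, which is why the lseek code refers to both [DI+15] and [DI+17].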

Figure 6-20 shows the DOS code for INT 21h AH=42h (Move File Pointer). Many explanatory comments were added by hand to the code generated by NICEDBG.

Figure 6-20: MS-DOS 6.0 Code for INT 21h AH=42h (lseek)

        ; xref: FDC8:50D5 FDC8:9D52 FDC8:9DC1 FDC8:9E9C 
        FDC8:A845   E8E100          CALL func_A929 ; TURNS BX HANDLE INTO
                                            ; ES:DI SFT (see fig. 6-21)
        ; xref: FDC8:A8B4 
        FDC8:A848   7302            JNB loc_A84C
        FDC8:A84A   EB9E            JMP jmp_A7EA  -> loc_43ED   ; couldn't: fail!

        ; xref: loc_A848 
                        loc_A84C:              ; ES:DI=valid SFT entry
        FDC8:A84C   3C02            CMP AL,02          ; which move method?
        FDC8:A84E   760A            JBE loc_A85A 
        FDC8:A850   36C606230301    MOV Byte Ptr SS:[0323],01 ; SDA+3=error locus
        FDC8:A856   B001            MOV AL,01                 ; 1=invalid function

        ; note many jmp jmp in DOS code:
        ;   A858 -> A7EA -> A7D8 -> A7D4 -> A716 -> A6FB -> 43ED
        ; usually to use short jmp, but is it still worth it?
        ; but can it ever be changed??

        ; xref: jmp_A8AB 
        FDC8:A858   EB90            JMP jmp_A7EA  -> loc_43ED  ; fail!

        ; xref: FDC8:A84E 
        FDC8:A85A   3C01            CMP AL,01 
        FDC8:A85C   720A            JB loc_A868             ; below = 0
        FDC8:A85E   771B            JA loc_A87B             ; above = 2

                            ; handling seek method #1: from current pos
        FDC8:A860   26035515        ADD DX,ES:[DI+15]       ; SFT->file_pos
        FDC8:A864   26134D17        ADC CX,ES:[DI+17]
                            ; fall through to method #0

        ; xref: FDC8:A85C FDC8:A88A 
                        loc_A868:           ; #0: from beginning
        FDC8:A868   8BC1            MOV AX,CX
        FDC8:A86A   92              XCHG AX,DX              ; DX:AX <- CX:DX

        ; xref: FDC8:A8A9 
        FDC8:A86B   26894515        MOV ES:[DI+15],AX   ; update SFT->file_pos
        FDC8:A86F   26895517        MOV ES:[DI+17],DX 
        FDC8:A873   E8FF99          CALL INT2F_12_18    ; get caller's regs
        FDC8:A876   895406          MOV [SI+06],DX      ; move into caller's DX
                            ;;; later on, loc_43FD does MOV [SI], AX
                            ;;; see table 6-2 for caller reg struct

        ; xref: jmp_A8EF 
        FDC8:A879   EBA7            JMP jmp_A822  -> loc_43E4   ; success!

        ; xref: FDC8:A85E 
                        loc_A87B:                   ; #2: from end
        FDC8:A87B   26F6450680      TEST Byte Ptr ES:[DI+06],80 ; dev info: NETWORK
        FDC8:A880   750A            JNZ loc_A88C 

        ; xref: FDC8:A891 FDC8:A8A2 
        FDC8:A882   26035511        ADD DX,ES:[DI+11]       ; SFT->file_size
        FDC8:A886   26134D13        ADC CX,ES:[DI+13]       ; CX:DX += file_size
        FDC8:A88A   EBDC            JMP loc_A868            ; go to method #0

        ; xref: FDC8:A880 
                        loc_A88C:               ; this is a network drive!

        ;;; This is seek method #2 (from end of file), and network bit is set
        ;;; in SFT. DOS may call a network redirector's 2F/1121 Seek From End
        ;;; handler, but only if some strange conditions are met: it can't
        ;;; be an FCB open, and certain SHARE bits must be set.

        FDC8:A88C   26F6450380      TEST Byte Ptr ES:[DI+03],80 ; open mode: FCB!
        FDC8:A891   75EF            JNZ loc_A882                ; an FCB open
                                        ;;; this is not an FCB open ;;;
        FDC8:A893   268B4502        MOV AX,ES:[DI+02]           ; open mode
        FDC8:A897   25F000          AND AX,00F0 
        FDC8:A89A   3D4000          CMP AX,0040     ; OPEN_SHARE_DENYNONE
        FDC8:A89D   7405            JZ DO_2F_1121   ; redir seek from end
        FDC8:A89F   3D3000          CMP AX,0030     ; OPEN_SHARE_DENYREAD
        FDC8:A8A2   75DE            JNZ loc_A882    ; no: update caller's regs

        ; xref: FDC8:A89D 
        FDC8:A8A4   B82111          MOV AX,1121  ; Call network redirector's
        FDC8:A8A7   CD2F            INT 2F       ; Seek from End function
        FDC8:A8A9   73C0            JNB loc_A86B ; update caller's DX:AX from SFT

        ; xref: jmp_A8F9 
        FDC8:A8AB   EBAB            JMP jmp_A858  -> loc_43ED       ; fail!

Leaving aside the network special case, the logic in figure 6-20 can be summarized in pseudocode:

        sft = get_sft(handle)  // see below
        if (seek from begin) then set sft->file_pos = new_pos
        if (seek from end) then (signed) new_pos += file_size; goto seek from begin
        if (seek from current) then new_pos += sft->file_pos; goto seek from begin
        set caller's new_pos (DX:AX) = sft->file_pos
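
The pseudocode above is nearly runnable as it stands. The following Python sketch is one way to model it, not a transcription of the DOS code. The dictionary keys `file_pos` and `file_size` are illustrative names for the two SFT fields involved, the network-redirector path (2F/1121) is left out entirely, and raising `ValueError` for a bad method is an assumption of the sketch, since the error path for an invalid method is not part of this excerpt.

```python
MASK32 = 0xFFFFFFFF  # CX:DX is a 32-bit quantity

def lseek(sft, method, offset):
    """Return the new 32-bit file position, following the pseudocode above.

    sft    -- dict with 'file_pos' and 'file_size' (illustrative names)
    method -- 0 = from beginning, 1 = from current position, 2 = from end
    offset -- the caller's CX:DX value treated as one 32-bit number
    """
    if method == 1:
        offset += sft["file_pos"]      # seek from current position
    elif method == 2:
        offset += sft["file_size"]     # seek from end of file
    elif method != 0:
        raise ValueError("invalid seek method")  # assumption of this sketch
    # ADD/ADC on CX:DX wraps at 32 bits, so an offset of 0xFFFFFFFF acts
    # as -1 and moves backwards from the end or the current position.
    sft["file_pos"] = offset & MASK32
    return sft["file_pos"]             # what the caller gets back in DX:AX
```

With a 100-byte file, `lseek(sft, 2, 0xFFFFFFFF)` leaves the position at 99, which is the "(signed)" behavior the pseudocode mentions for seeks from the end.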

We haven't explained the very first line of the INT 21h AH=42h handler, however, where DOS calls a subroutine, here called func_A929, to turn the caller's BX file handle into an SFT entry in ES:DI. This is shown in figure 6-21 below. The code for func_A929 turns out to be very interesting, because it shows some of MS-DOS's interaction with Windows. As indicated in the xref generated by NICEDBG, this same subroutine is also called by other parts of DOS, including the code for functions 3Eh and 68h:

Figure 6-21: MS-DOS 6.0 Code to Verify SFT Virtual Machine ID

        ; xref: INT21_3E INT21_68 FDC8:A7E5 INT21_42 FDC8:A8B1 FDC8:A907 
                                            ; func_A62A turns BX handle
        FDC8:A929   E8FEFC          CALL func_A62A ; into ES:DI SFT (fig. 6-22)
        FDC8:A92C   721C            JB ret_A94A    ; percolate error up
        ; valid handle, but it could be for another DOS box!
        FDC8:A92E   50              PUSH AX 
        FDC8:A92F   36F606301001    TEST Byte Ptr SS:IN_WIN3E,01 
        FDC8:A935   7404            JZ loc_A93B 
        FDC8:A937   33C0            XOR AX,AX 
        FDC8:A939   EB08            JMP loc_A943 

        ; xref: FDC8:A935 
                        loc_A93B:                   ; Windows running
        FDC8:A93B   36A13E03        MOV AX,SS:MACHINE_ID 
        FDC8:A93F   263B452F        CMP AX,ES:[DI+2F]       ; SFT->share_machine

        ; xref: FDC8:A939 
                        loc_A943:                   ; okay
        FDC8:A943   58              POP AX 
        FDC8:A944   7501            JNZ loc_A947 
        FDC8:A946   C3              RET 

        ; xref: FDC8:A944 
                        loc_A947:                   ; failure
        FDC8:A947   B006            MOV AL,06               ; "invalid handle"
        FDC8:A949   F9              STC 

        ; xref: FDC8:A92C 
        FDC8:A94A   C3              RET 

This code deals with the fact that, under Windows Enhanced mode, it is possible to have multiple processes in different DOS boxes that happen to have the same PSP ID (though note that SYSTEM.INI has a UniqueDOSPSP= setting). Normally, the current PSP and a file handle are sufficient to specify an open file. Under Windows Enhanced mode, the current virtual machine (VM) ID is also needed to specify an open file.

In this subroutine, DOS (a) checks if Windows Enhanced mode is running (see chapter 1 to see how DOS initially sets the IN_WIN3E flag); (b) gets the current VM ID (see chapter 1 to see how the DOSMGR VxD patches DOS's MACHINE_ID word with the current VM ID); and (c) compares the current VM ID against the machine ID field at offset 2Fh in the SFT. If the SFT's machine ID doesn't match the current VM, lseek fails with error code 6, as if the handle in BX were invalid. It wasn't invalid per se, but it belonged to another process that happened to have the same PSP in another DOS box.

We still haven't seen, though, how DOS turns a file handle in BX into an SFT entry in ES:DI. This is accomplished by func_A62A in figure 6-22 below, which turns the BX handle (which is really an index into the current PSP's Job File Table) into a JFT pointer (equivalent to INT 2Fh AX=1220h), then turns the JFT pointer into an SFT index, and then turns the SFT index into an SFT entry (equivalent to INT 2Fh AX=1216h). The disassembly below starts off with DOS's INT 2Fh AX=1220h handler; func_A62A appears in the middle of the listing.

Figure 6-22: MS-DOS 6.0 Code to Turn File Handle into SFT Pointer

        ; xref: FDC8:4F01 func_A62A loc_A671 loc_A6EA loc_A7DD FDC8:A90F FDC8:A924 
        FDC8:A60D   2E8E06D73D      MOV ES,CS:DOS_DS        ; get DOS_DS
        FDC8:A612   268E063003      MOV ES,ES:CURR_PSP      ; use current PSP       
        FDC8:A617   263B1E3200      CMP BX,ES:[0032]        ; # files in JFT
        FDC8:A61C   7204            JB loc_A622 
        FDC8:A61E   B006            MOV AL,06               ; invalid handle

        ; xref: FDC8:A637 
                        loc_A620:                   ; fail
        FDC8:A620   F9              STC 
        FDC8:A621   C3              RET 

        ; xref: FDC8:A61C 
                        loc_A622:                   ; file handle < # files
        FDC8:A622   26C43E3400      LES DI,ES:[0034]        ; JFT ptr in PSP
        FDC8:A627   03FB            ADD DI,BX               ; add on BX handle

        ; xref: FDC8:A62D 
        FDC8:A629   C3              RET                     ; return ptr -> SFT ndx

        ;;; code to turn handle in BX into SFT entry in ES:DI ;;;
        ; xref: FDC8:4EDC INT21_4400_01 INT21_4402_03 FDC8:61DD INT21_440A \
        ; FDC8:757B func_A929 FDC8:B27B 
        FDC8:A62A   E8E0FF          CALL INT2F_12_20    ; turn BX handle->ES:DI JFT
        FDC8:A62D   72FA            JB ret_A629 
        FDC8:A62F   26803DFF        CMP Byte Ptr ES:[DI],FF ; unused!
        FDC8:A633   7504            JNZ loc_A639 
        FDC8:A635   B006            MOV AL,06               ; invalid handle
        FDC8:A637   EBE7            JMP loc_A620            ; fail

        ; xref: FDC8:A633 
        FDC8:A639   53              PUSH BX 
        FDC8:A63A   268A1D          MOV BL,ES:[DI]          ; JFT entry -> SFT index
        FDC8:A63D   32FF            XOR BH,BH           
        FDC8:A63F   E80200          CALL INT2F_12_16        ; SFT index -> SFT ES:DI
        FDC8:A642   5B              POP BX 
        FDC8:A643   C3              RET 

        ; xref: FDC8:6DF1 FDC8:A516 FDC8:A63F FDC8:A686 
                        INT2F_12_16:                ; SFT ndx -> ES:DI SFT
        FDC8:A644   2E8E06D73D      MOV ES,CS:DOS_DS        ; get DOS DS
        FDC8:A649   26C43E2A00      LES DI,ES:[002A]        ; SysVars+4 -> first SFT

        ; xref: FDC8:A65E 
                        loc_A64E:                   ; walk SFT chain
        FDC8:A64E   263B5D04        CMP BX,ES:[DI+04]       ; SFT # files
        FDC8:A652   720E            JB loc_A662             ; in this table!
        FDC8:A654   262B5D04        SUB BX,ES:[DI+04]       ; subtract #files this SFT
        FDC8:A658   26C43D          LES DI,ES:[DI]          ; follow linked list
        FDC8:A65B   83FFFF          CMP DI,-01              ; end of SFTs?
        FDC8:A65E   75EE            JNZ loc_A64E            ; loop to next SFT
        FDC8:A660   F9              STC                     ; invalid SFT index
        FDC8:A661   C3              RET                     ; fail!

        ; xref: FDC8:A652 
                        loc_A662:                   ; in this SFT
        FDC8:A662   50              PUSH AX 
        FDC8:A663   B83B00          MOV AX,003B             ; size of each SFT entry
        FDC8:A666   F6E3            MUL BL                  
        FDC8:A668   03F8            ADD DI,AX               ; offset of this entry
        FDC8:A66A   58              POP AX 
        FDC8:A66B   83C706          ADD DI,+06              ; skip past SFT header
        FDC8:A66E   C3              RET 

The basic sequence here is: BX handle -> JFT entry (2F/1220) -> SFT ndx -> SFT entry (2F/1216).

Recall that the file handle in BX is really an index into the current PSP's JFT. Thus, the code for INT 2Fh AX=1220h gets the current PSP from the familiar global DOS variable, and checks PSP:0032 (which holds the maximum number of file handles available to this PSP). If the handle in BX is less than the file-handle maximum (i.e., the JFT size), then this code gets a far pointer to the JFT from PSP:0034 and adds BX onto the JFT pointer, yielding a far pointer in ES:DI to the file's JFT entry.

Each JFT entry is a single byte that holds an index into the SFT, or FFh to indicate an unused entry. The code above ensures that the caller hasn't passed in a file handle whose corresponding JFT entry is unused.

If DOS has a valid SFT index, it passes it to a function (equivalent to INT 2Fh AX=1216h), which returns a pointer to the corresponding SFT entry. From the listing above, we can see how this code works: DOS gets a pointer to the first SFT from SysVars+4, and walks the SFT chain, comparing the SFT index against the number of files in each SFT (and subtracting that count as it moves on) until it finds the right one. DOS then multiplies the remaining SFT index by 3Bh (the size of an SFT entry), adds it onto the start of this SFT, and skips the 6-byte SFT header, to form a pointer to the SFT entry.
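
The whole handle-to-SFT sequence can be modeled in a few lines. The Python sketch below is only an illustration of the steps in figure 6-22, under stated assumptions: the JFT is represented as a plain list of bytes, the SFT chain as a list of per-table entry counts, and the function names are invented for this example. Where DOS returns a far pointer in ES:DI, the sketch returns a table number and a byte offset within that table, and where DOS returns error 6 (or just the carry flag), the sketch returns None.

```python
SFT_HEADER_SIZE = 6      # bytes before the first entry (ADD DI,+06)
SFT_ENTRY_SIZE = 0x3B    # bytes per entry in the DOS 6.0 listing
UNUSED = 0xFF            # JFT byte that marks an unused handle

def jft_entry(jft, handle):
    """2F/1220 analogue: handle -> SFT index byte from the JFT, or None."""
    if handle >= len(jft):           # CMP BX,ES:[0032] fails the JB test
        return None
    return jft[handle]

def sft_locate(tables, index):
    """2F/1216 analogue: SFT index -> (table number, byte offset), or None.

    tables holds the entry count of each SFT block, standing in for the
    linked list that starts at SysVars+4.
    """
    for table_no, count in enumerate(tables):
        if index < count:            # the entry lives in this table
            return table_no, SFT_HEADER_SIZE + index * SFT_ENTRY_SIZE
        index -= count               # SUB BX,ES:[DI+04], then follow link
    return None                      # ran off the end of the chain

def sft_from_handle(jft, tables, handle):
    """func_A62A analogue: combine both steps and reject unused slots."""
    index = jft_entry(jft, handle)
    if index is None or index == UNUSED:
        return None
    return sft_locate(tables, index)
```

For example, with two SFT blocks of 5 and 15 entries and a JFT in which handle 4 maps to SFT index 7, the lookup lands on entry 2 of the second block, at offset 6 + 2 * 3Bh within that block.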

That's it. We've now examined the DOS code for lseek in its entirety. We've seen how the specification for INT 21h AH=42h is actually implemented in working code, how DOS gets from a file handle in BX to an SFT entry in ES:DI, and how it can use this SFT to get and set the current file position and size, and also to check the Windows VM ID. But remember that this is DOS, so it is possible and even likely that some important third-party extensions such as NetWare hook the lseek function. Our disassembly of the DOS kernel of course neglects to deal with whatever changes these might make to the behavior of lseek.

We have only presented a fairly random selection of extremely simple DOS functions, viewed in isolation from key third-party DOS extensions. Just to properly discuss this simple DEBUG disassembly of 30 kbytes of DOS code would require an entire book. In fact, properly explaining each function, examining its interactions with resident software such as SmartDrv, Windows, and NetWare could easily be the subject of several books. For further in-depth discussions of this code, see Chappell's "DOS Internals" and Mike Podanoffsky's "DOS: The Source" (this forthcoming book is described in more detail below).

Other Parts of DOS

As noted earlier, NICEDBG places an "outside range" list at the end of a disassembly listing. This list indicates locations that are called or jumped to in the listing, but which don't themselves appear in the listing. This list provides additional addresses for unassembly by DEBUG or SYMDEB.

For example, the disassembly of the MSDOS.SYS code segment includes the function INT2F_DISPATCH. As you know from the earlier investigation in figure 6-13, the INT 2Fh handler in MSDOS.SYS jumps to the handler in IO.SYS. Here is how this shows up in the INT212F.LST file produced by NICEDBG:

        ; xref: FDC8:44DA FDC8:462F FDC8:4687 FDC8:46E0 
        FDC8:44DF   EA05007000      JMP 0070:0005 

        ; ...

        ;; outside-range FDC8:4045-B800:
        ;; 0070:0005
        ; ...

You can use this one address, 0070:0005, as the starting point for a disassembly of the IO.SYS code:

        -u 0070:0005 0005
        0070:0005 EA93087000     JMP    0070:0893 
        -u 0070:0893 0893
        0070:0893 2EFF2EE606     JMP    FAR CS:[06E6] 
        -dd 0070:06e6 06e6
        0070:06E6  FFFF:1302
        -u ffff:1302
        FFFF:1302 80FC13         CMP    AH,13 
        FFFF:1305 7413           JZ 131A 
        FFFF:1307 80FC08         CMP    AH,08 
        FFFF:130A 743B           JZ 1347 
        FFFF:130C 80FC16         CMP    AH,16 
        FFFF:130F 7479           JZ 138A 
        FFFF:1311 80FC4A         CMP    AH,4A                         ;'J' 
        FFFF:1314 7503           JNZ    1319 
        FFFF:1316 E9A700         JMP    13C0 
        FFFF:1319 CF             IRET    

        C:\UNDOC2\CHAP6>type io.scr
        u ffff:1302 1319

        C:\UNDOC2\CHAP6>symdeb /x < io.scr > io.out

        C:\UNDOC2\CHAP6>nicedbg io.out > io.lst

        C:\UNDOC2\CHAP6>type io.lst
        ; ....
        ;; outside range FFFF:1302-1319:
        ;; 131A
        ;; 1347
        ;; 138A
        ;; 13C0

Now, of course, we expand the unassembly range for SYMDEB, based on the addresses in the outside range list. Also, we can start to create a file with symbolic names.

We continue in this way until no unresolved references remain. As noted earlier, sometimes DEBUG and SYMDEB get thrown off track because of data residing in the middle of a code segment. Based on the NICEDBG "unresolved label" list, you may need to split a single u command in a DEBUG script into two or more separate u commands.
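
To make the idea of an "outside range" list concrete, here is a small Python sketch that computes one from raw DEBUG or SYMDEB unassemble output. This is not NICEDBG and makes no claim about how NICEDBG works internally. It is a hypothetical stand-in that assumes the line layout shown in the listings above, looks only at near jumps and calls with a bare four-digit target, and ignores far and indirect targets.

```python
import re

# One unassemble line: "SEG:OFF  hexbytes  MNEMONIC operands"
LINE = re.compile(r"^[0-9A-F]{4}:([0-9A-F]{4})\s+[0-9A-F]+\s+(\w+)\s*(.*)$")
NEAR = re.compile(r"^[0-9A-F]{4}$")   # a bare 4-digit near target

def outside_range(listing):
    """Return sorted near jump/call targets that fall outside the listing."""
    offsets, targets = [], set()
    for raw in listing.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue                  # debugger prompts, blank lines, etc.
        offsets.append(int(m.group(1), 16))
        mnemonic = m.group(2)
        operand = m.group(3).split(";")[0].strip()  # drop trailing comment
        is_branch = mnemonic.startswith("J") or mnemonic in ("CALL", "LOOP")
        if is_branch and NEAR.match(operand):
            targets.add(int(operand, 16))
    if not offsets:
        return []
    lo, hi = min(offsets), max(offsets)
    return sorted(t for t in targets if not lo <= t <= hi)

# The IO.SYS fragment unassembled earlier in this section.
SAMPLE = """\
FFFF:1302 80FC13         CMP    AH,13
FFFF:1305 7413           JZ 131A
FFFF:1307 80FC08         CMP    AH,08
FFFF:130A 743B           JZ 1347
FFFF:130C 80FC16         CMP    AH,16
FFFF:130F 7479           JZ 138A
FFFF:1311 80FC4A         CMP    AH,4A                         ;'J'
FFFF:1314 7503           JNZ    1319
FFFF:1316 E9A700         JMP    13C0
FFFF:1319 CF             IRET
"""

print([format(t, "04X") for t in outside_range(SAMPLE)])
```

Run on the sample, the sketch reports 131A, 1347, 138A, and 13C0, the same four addresses as the outside range list shown earlier; the JNZ target 1319 is omitted because it lies inside the listing. Feeding the widened range back through the same function, and repeating until the result is empty, is the loop described above.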

Of course, the techniques shown here for disassembly in memory of MSDOS.SYS and IO.SYS also work for any other resident software. In figure 6-11, for example, we saw SMARTDRV, MSCDEX, DOSKEY, SHARE, PRINT, COMMAND.COM, and so on, all camped out on the INT 2Fh chain. You can submit any of the addresses displayed by INTCHAIN to DEBUG or SYMDEB for disassembly, and process the resulting output with NICEDBG.

However, it is much easier to disassemble separate programs such as SMARTDRV, MSCDEX, COMMAND, and PRINT on disk rather than in memory, because these programs don't involve the segment-moving contortions of the DOS kernel. PRINT in particular is probably the most-disassembled piece of DOS, as this was how many TSR writers learned their craft. You can use a disassembler such as Sourcer to examine these programs.

Given the ability to reverse engineer DOS, an almost infinite amount of information on DOS programming is readily available, at your fingertips: to answer some question about DOS, look at the code running on your machine. But one obvious problem with this approach is that what is true in one configuration may not be true in another. Applications patch DOS; DOS changes (though not much, in truth) from one version to the next. Describing software based on its source code (whether supplied or disassembled) can either be the only accurate way to find out what the software really does, or it can be dangerous, relying on features that may change. There are no certainties here. Your best bet is to examine the source code, but to realize how it may change, either because of future versions, or because of unforeseen interactions with other software.

Am I Going to Jail For This?

[This was written in 1993, long before the author became an attorney, and in any case should not be construed as legal advice.]

Among many programmers there seem to be some doubts about the legality of what we've been doing in this chapter. Programmers frequently think that disassembling Microsoft's code is illegal, and even that it is somehow a full-blown criminal (rather than civil) offense, punishable by a stiff prison sentence! We had better look into this now.

The following discussion of the legalities of disassembly was not written by an attorney, and should not in any way be viewed as legal advice. However, I have benefited enormously from discussions with Gene K. Landy, a partner at the law firm of Shapiro, Israel & Weiner, P.C. in Boston. Any errors and misconceptions of course remain mine.

Landy is the author of a superb book/disk package, The Software Developer's and Marketer's Legal Companion, published by Addison-Wesley (1993), which includes several extremely useful discussions of reverse engineering. Chapter 1 discusses reverse engineering in the context of copyright, including the important Sega v. Accolade case. Chapter 2 discusses software trade secrets and confidentiality agreements. Chapter 11 covers shrink-wrap licenses and warranties, and the standard shrink-wrap license limitation on reverse engineering, noting the important case of Vault v. Quaid. This is a fine book that every software developer will want to have in these troubled, legally complex times.

Why the typical programmer's idea that you can wind up behind bars just for having seen the CLI instruction at the beginning of the INT 21h dispatch code? Quite simply, because the standard license agreement that comes with all Microsoft products states, as plain as day:

OTHER RESTRICTIONS.... You may not reverse engineer, decompile, or disassemble the software.

The very top of the license agreement states that, "This is a legal agreement between you (either an individual or an entity) and Microsoft Corporation. By opening the sealed software packet(s) you are agreeing to be bound by the terms of this Agreement."

Well, that settles it, doesn't it? If you use any Microsoft software, you have entered into a binding legal agreement not to disassemble it, even if disassembly were otherwise a legitimate activity, right?

No. Attorneys have long questioned whether shrink-wrap licenses are binding, because of the mechanism they use. The few court cases that have decided issues of shrink-wrap licenses have spread further doubt about their effectiveness. As Landy explains in his chapter on shrink-wrap licenses,

The central concept of a shrink wrap license is its system of acceptance or rejection: If you accept the contract, you tear open the envelope; if you reject it, you return the package for a refund. But does this "tear open" concept work? Does the law really allow the licensor to force the user to this choice? ... A fundamental idea in contract law, from its eighteenth century roots to the present, is the bargain -- what lawyers sometimes call a "meeting of the minds." In a classic contract, the terms are bargained out, then the sale takes place as agreed. While the sale of goods in all states (except Louisiana) is now governed by a state statute, the Uniform Commercial Code, the same concept has carried over. A contract and its terms are agreed before or at time of the sale. The problem with the Shrink Wrap License is that the retail software sale is over and done with before the customer is presented with the one-sided terms of the Shrink Wrap license. After the sale is already made, it is too late to try to impose adverse terms.

Similarly, Raymond T. Nimmer's excellent textbook, The Law of Computer Technology, notes that "The attempt to alter the expectations of the common purchaser by virtue of a printed form included within the product package is unlikely to be successful."

How about the specific shrink-wrap license limitation against disassembly and reverse engineering? A number of important cases have held that shrink-wrap or tear-me-open license agreements cannot be used to outlaw reverse engineering. Both Landy's book and Nimmer's discuss the important case of Vault v. Quaid (1987-1988). The state of Louisiana had enacted special legislation to validate various aspects of shrink-wrap licenses, including the restriction on reverse engineering. Vault (a California corporation) took Quaid (a Canadian corporation) to court in Louisiana to try to take advantage of this exceptional law. Unfortunately for Vault, but fortunately for those who think that disassembly is an important consumer right, the court ruled that the Louisiana statute was preempted by federal law.

So Microsoft's shrink-wrap license limitation against disassembly probably isn't worth the paper it's printed on.

How about the law of "trade secrets"? To begin with, reverse engineering is actually one of the few legitimate ways to discover a trade secret. The Uniform Trade Secrets Act (UTSA), adopted in the mid-1980s by almost all states, says explicitly that discovery through reverse engineering is a proper means of gaining access to non-patented trade secrets. Choosing one of the many books on intellectual property more or less at random, we find (Roger E. Schechter, Unfair Trade Practices and Intellectual Property, pp. 135-136, italics added):

REVERSE ENGINEERING IS NOT IMPROPER MEANS Many products are manufactured pursuant to plans or with technologies that are trade secrets and then sold to the public at large. In some cases the method of manufacture of these items may be discovered by careful study of the object. Typical methods of discovery include taking the product apart or performing experiments on it. This process of analysis is usually called "reverse engineering." Numerous cases hold that reverse engineering is not an improper means of learning a trade secret. Risk of discovery by reverse engineering is a risk that a firm takes when it chooses to rely on trade secret protection for a valuable commercial asset. Note that if a firm secures patent protection for a new device or manufacturing process it is protected against "reverse engineering." This is one of the most important differences between patent and trade secret protection.

Given that MS-DOS is not patented (the two patent numbers, 4,955,066 and 5,109,433, in the front of all Microsoft's manuals are for data compression, as used for example in Microsoft's help compilers), it then all seems to be quite straightforward: as far as trade secret law is concerned, reverse engineering is okay. The rationale here is that trade secret law is basically about the loyalty of employees or others who receive important business information in confidence. You violate trade secret law by committing, inducing or exploiting violations of trust. One does not violate anyone's trust by disassembling a product purchased on the open market.

So far, the shrink-wrap license statement against disassembling seems ineffective, and trade secrets law says disassembly is okay. What about the fact that MS-DOS is copyrighted? Does copyright law permit us to study how DOS works internally, and then build products based on this new-found knowledge? For example, does it violate Microsoft's copyright to figure out how IO.SYS preloads DBLSPACE.BIN in MS-DOS 6, and then write a replacement for DBLSPACE.BIN that supports the same interface?

Disassembly is sometimes regarded as a form of copying (translation from one medium to another, or one language to another), and therefore as possible copyright infringement. However, disassembly for the purposes of achieving compatibility is generally regarded as "fair use." An important decision by the Court of Appeals for the Ninth Circuit in Sega v. Accolade (August 1992), overturning a lower court's ruling, held that Accolade's use of knowledge reverse-engineered from the Sega Genesis system did not violate Sega's copyright and constituted fair use. According to the court (as quoted in UNIX Review, May 1993),

We conclude that where disassembly is the only way to gain access to the ideas and functional elements embodied in a copyrighted computer program and where there is a legitimate reason for seeking such access, disassembly is a fair use of the copyrighted work, as a matter of law.

The importance of Sega v. Accolade was underlined in a comment in Microprocessor Report (December 9, 1992): "For the industry, many can breathe a deep sigh of relief. No longer are we unwitting copyright violators because we need to understand the parameters to an undocumented `Int 21' call."

Naturally, not all members of the industry breathed a sigh of relief upon hearing the appeals court's ruling. In particular, a group calling itself the Business Equipment Manufacturers, which includes IBM, Intel, and Microsoft, is seeking stronger protection against reverse engineering. Arguing on the other side, for greater protection of the right to reverse engineer, is the so-called American Committee for Interoperable Systems, which includes Sun Microsystems, Amdahl, and Chips & Technologies (see "Reverse Engineering Reversals," Upside, May 1993).

If disassembly for the purposes of achieving compatibility is okay (and this, by the way, is also true in Europe under article 6 of the EC's directive on software protection), then how about this book's quotations from disassembly listings? Have we violated Microsoft's copyright by reprinting several chunks of code from MS-DOS and Windows in this book?

Again, no. For purposes of copyright, computer programs are considered to be "literary works." While it is a bogus notion that a compiled program without its source code merits being called a literary work, if the phrase "literary work" means anything at all in the context of computer software, it must include the possibility for literary criticism. Our inclusion of brief excerpts from disassembly listings is essentially a form of scholarly quotation, which is one of the oldest forms of fair use (see William S. Strong, The Copyright Book, 4th edition, chapter 8).

Remember too that throughout this chapter we have relied on DEBUG, a tool which Microsoft provides with every copy of MS-DOS. Microsoft has made no effort to secure MS-DOS against disassembly, especially given DEBUG's ability to trace into an INT 21h or INT 2Fh call.

Use the Source, Luke!

Is there any alternative to disassembly? One alternative is of course to rely entirely on the vendor's documentation and not consider whether this documentation is an accurate reflection of the actual software. But as the reader has probably figured out by now, relying on vendor documentation has as many risks as does relying on undocumented behavior that has been discerned through disassembly.

Depending on what you are interested in, there may be another, better alternative to disassembly: source code.

For example, programmers' questions sometimes aren't really about how the operating system behaves in a certain circumstance, but about what their compiler's run-time library (RTL) does. There is a persistent confusion among many programmers about the difference between a FILE* in C and a DOS file handle. Programmers often call the DOS Set Handle Count function (INT 21h AH=67h) and then wonder why the C fopen() function still fails. Confusion such as this can be cleared up by a careful study of the RTL source code. Both Microsoft C and Borland C++ come with RTL source code.

Sometimes, rather than having specific questions about MS-DOS, programmers are just curious about how operating systems work in general. In this case, the best approach is probably to study one of the several excellent books available on the design and implementation of UNIX. Some of these, such as Bach's Design of the UNIX Operating System and Andleigh's UNIX System Architecture, present detailed pseudocode for UNIX. Others, such as Tanenbaum's wonderful Operating Systems: Design and Implementation (MINIX) and Comer's Operating System Design: The XINU Approach, come with complete source code for UNIX workalikes. Despite the numerous differences between DOS and UNIX, these books should be required reading for anyone planning to delve into DOS internals. DOS's handling of memory, processes, files, devices, and so on, can often best be understood by contrasting it with the design and implementation of a well-understood system such as UNIX.

For a more specifically DOS-like approach to operating system design and implementation, another alternative to disassembly of MS-DOS is to examine the source code that is available for several DOS workalikes. Embedded DOS from General Software (Redmond WA) has Steve Jones's superb documentation on DOS internals (for an excellent discussion of making a fully-reentrant DOS, see Steve's article "DOS Meets Real-Time" in the February 1992 Embedded Systems Programming). General Software's Utility SDK and Device Driver SDK come with complete source code in C for versions of utilities such as CHKDSK, FORMAT, FDISK, DISKCOPY. ROM DOS 5 from Datalight (Arlington WA) is also available with source code.

Last, but not least, Mike Podanoffsky (mikep@world.std.com) has written RxDOS, an inexpensive DOS available with fully commented, assembly language source code. Podanoffsky is currently writing a full-length book on RxDOS, tentatively titled DOS: The Source, that will be available in 1994. While obviously not identical to the MS-DOS source, this source code may be more than adequate for your needs. For example, figure 6-23 below shows the implementation of INT 21h functions 50h, 51h, and 52h from RXDOS.ASM:

Figure 6-23: RxDOS Implementation of INT 21h AH=50h, 51h, and 52h

                ;  50h Set PSP Address                                          ;
                ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -;
                ;  bx      contains PSP address to use                          ;
                mov word ptr [ _RxDOS_CurrentPSP ], bx   ; Seg Pointer to current PSP

                ;  51h Get PSP Address                                          ;
                ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -;
                ;  bx      contains PSP address to use                          ;
                mov bx, word ptr [ _RxDOS_CurrentPSP ]  ; Seg Pointer of current PSP
                RetCallersStackFrame es, si
                mov word ptr es:[ _BX  ][ si ], bx

                ;  52h Get Dos Data Table Pointer                               ;
                ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -;
                ;  es:bx returns pointer to dos device parameter block          ;
                ; --- DOS Undocumented Feature -------------------------------- ;
                RetCallersStackFrame es, si
                mov word ptr es:[ _ExtraSegment ][ si ], ds
                mov word ptr es:[ _BX ][ si ], offset _RxDOS_pDPB

There are no big surprises here (really, how else could Get and Set PSP be implemented, anyway?), but we can see that this accurately reflects MS-DOS, and that having this code earlier in the chapter might have saved us a lot of trouble.

More interesting, figure 6-24 below shows the RxDOS implementation of lseek, the MS-DOS implementation of which we saw earlier, in figure 6-20. The RxDOS code provides a useful guide to the MS-DOS disassembly.

Figure 6-24: RxDOS Implementation of INT 21h AH=42h (lseek)

                ;  42h Lseek (Move) File Pointer                                ;
                ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -;
                ;  al      move method                                          ;
                ;  bx      handle                                               ;
                ;  cx:dx   distance to move pointer                             ;
                def _method, ax
                def _handle, bx
                ddef _moveDistance, cx, dx
                ddef _newPosition

                mov ax, bx                          ; handle
                call MapAppToSysHandles             ; map to internal handle info
                call FindSFTbyHandle                ; get corresponding SFT (es: di )
                jc _moveFilePointer_36              ; if could not find -->

                getdarg cx, dx, _moveDistance
                mov ax, word ptr [ _method ][ bp ]
                Goto SEEK_BEG,   _moveFilePointer_beg
                Goto SEEK_CUR,   _moveFilePointer_cur
                Goto SEEK_END,   _moveFilePointer_end
                SetError -1,     _moveFilePointer_36

        ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        ;  seek from end
        ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                add dx, word ptr es:[ sftFileSize. _low  ][ di ]
                adc cx, word ptr es:[ sftFileSize. _high ][ di ]
                jmp short _moveFilePointer_beg

        ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        ;  seek from current position
        ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                add dx, word ptr es:[ sftFilePosition. _low  ][ di ]
                adc cx, word ptr es:[ sftFilePosition. _high ][ di ]
            ;  jmp short _moveFilePointer_beg
        ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        ;  seek from beginning
        ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                mov word ptr es:[ sftFilePosition. _low  ][ di ], dx
                mov word ptr es:[ sftFilePosition. _high ][ di ], cx

        ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        ;  Return
        ;- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                RetCallersStackFrame ds, bx
                mov word ptr [ _AX ][ bx ], dx
                mov word ptr [ _DX ][ bx ], cx
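
The arithmetic in figure 6-24 may be easier to follow in C. The following is only a sketch, not RxDOS code: SftModel, add_with_carry, and model_lseek are hypothetical names, the handle lookup (MapAppToSysHandles, FindSFTbyHandle) is left out, and the -1 return merely stands in for the listing's SetError path. It shows how the ADD/ADC pair builds a 32-bit sum from 16-bit halves, and how the seek-from-end and seek-from-current cases both end at the same store that seek-from-beginning uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two SFT fields the listing touches. */
typedef struct {
    uint32_t file_size;      /* models sftFileSize     */
    uint32_t file_position;  /* models sftFilePosition */
} SftModel;

enum { MODEL_SEEK_BEG = 0, MODEL_SEEK_CUR = 1, MODEL_SEEK_END = 2 };

/* Add the 32-bit value hi:lo to base the way the listing does:
   ADD on the low words, then ADC on the high words. */
uint32_t add_with_carry(uint16_t hi, uint16_t lo, uint32_t base)
{
    uint32_t low_sum = (uint32_t)lo + (base & 0xFFFFu);   /* add dx, low word  */
    uint16_t carry   = (uint16_t)(low_sum >> 16);         /* carry flag        */
    uint16_t new_lo  = (uint16_t)low_sum;
    uint16_t new_hi  = (uint16_t)(hi + (uint16_t)(base >> 16) + carry); /* adc cx */
    return ((uint32_t)new_hi << 16) | new_lo;
}

/* Returns 0 and stores the new position, or -1 for an unknown method. */
int model_lseek(SftModel *sft, int method, uint16_t cx, uint16_t dx,
                uint32_t *new_pos)
{
    uint32_t pos;

    assert(sft != NULL && new_pos != NULL);
    switch (method) {
    case MODEL_SEEK_END:                       /* distance + file size        */
        pos = add_with_carry(cx, dx, sft->file_size);
        break;
    case MODEL_SEEK_CUR:                       /* distance + current position */
        pos = add_with_carry(cx, dx, sft->file_position);
        break;
    case MODEL_SEEK_BEG:                       /* distance is the position    */
        pos = ((uint32_t)cx << 16) | dx;
        break;
    default:
        return -1;                             /* stands in for SetError      */
    }
    sft->file_position = pos;                  /* the two MOVs into the SFT   */
    *new_pos = pos;                            /* returned to caller in DX:AX */
    return 0;
}
```

A negative distance is passed in two's complement form, so seeking -10 from the end of a 1000-byte file means CX:DX = FFFFh:FFF6h; the carry out of the high word is simply discarded, leaving a position of 990.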

If you want a disassembly of genuine MS-DOS, but don't want to DIY (do it yourself), and for some reason would be happy with a disassembly of DOS 1.1 or 2.1, Information Modes (Denton TX) sells inexpensive disassembly listings of these early versions of DOS. Imodes used the information gleaned from its long-ago disassembly project as part of its well-known product, The $25 Network ("Skeptical? We make believers! Over 15,000 sold"). For example, figure 6-25 below shows Imodes' rendition of the Get and Set PSP functions from D1.ASM, a disassembly dated April 1987 (it is an interesting reflection on the state of knowledge about DOS internals at the time that function 52h is labelled "get device driver list").

Figure 6-25: Imodes Disassembly of DOS 2.1 Set and Get PSP

        ;........................... Set current PSP ......................... Fn 50
            MOV CS:L0191,BX            ;current PSP seg

        ;........................... Get current PSP ......................... Fn 51
            CALL L0C1A                 ;ds:si--> user's stack
            PUSH CS:L0191              ;
            POP [SI+2]                 ;return in bx

Figure 6-26 below shows the Imodes interpretation of the lseek function from DOS 2.1, which you can compare against the MS-DOS 6.0 disassembly in figure 6-20 and the RxDOS implementation in figure 6-24.

Figure 6-26: Imodes Disassembly of DOS 2.1 INT 21h AH=42h (lseek)

        ;........................... Lseek (handle) .......................... Fn 42
                                    ;bx = handle
                                    ;cx_dx = hi_low dword offset
                                    ;al = seek mode,  0 - from file start
                                    ;                 1 - from current position
                                    ;                 2 - from file end
                                    ;return: cy=0, dx_ax = new position (from start)
                                    ;    - or -
                                    ;return: cy=1, ax = 1 - invalid function (mode)
                                    ;                   6 - invalid handle
        CMP AL,3                   ;is method in range 0..2 ?
        JC L3BDD                   ;no:      yes-->
        MOV AL,1                   ;err = invalid function

        JMP SHORT L3BD3            ;dos error return

        PUSH SS
        POP DS
        CALL L38FB                 ;with bx=handle, get handle defn.
        PUSH ES
        POP DS
        JC L3BD1                   ;if handle bad--> ret, invalid handle
        TEST BYTE PTR [DI+1Bh],80h ;is char device?
        JZ L3BF2                   ;yes:   no-->
        XOR AX,AX                  ;record = 0 always
        XOR DX,DX
        JMP SHORT L3C08            ;--> set random record fields

        DEC AL                     ;is method 0, from file start ?
        JL L3C05                   ;no:   yes-->
        DEC AL                     ;is method 1, from current position ?
        JL L3C18                   ;no:   yes-->

        ;. . . . . . . . . . . . . . method 2, from end of file
        XCHG DX,AX                 ;ax = LSWord
        XCHG DX,CX                 ;dx = MSWord
        ADD AX,[DI+13h]            ;add fcb's file size
        ADC DX,[DI+15h]            ;
        JMP SHORT L3C08            ;--> set fields

        ;. . . . . . . . . . . . . . method 0, from start of file
        XCHG DX,AX                 ;ax = LSWord
        XCHG DX,CX                 ;dx = MSWord

As with the PSP functions, this disassembly of lseek in DOS 2.1 bears many similarities to the disassembly of lseek in DOS 6.0. On the other hand, the DOS 2.1 version of course does not do Windows, and doesn't contain any network-redirector code.
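
The control flow of figure 6-26 can also be modeled in a few lines of C. This is a sketch under stated assumptions rather than a translation: HandleModel and model_lseek21 are hypothetical names, a NULL pointer stands in for a failed handle lookup, and the method 1 arithmetic is assumed, because the code at L3C18 falls outside the excerpt. The sketch keeps the two details that stand out in the listing: the range check on the method before anything else happens, and the DEC AL / JL pair that picks out methods 0 and 1 by decrementing until the value goes negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ERR_INVALID_FUNCTION = 1, ERR_INVALID_HANDLE = 6 };

/* Hypothetical stand-in for the per-handle data the listing reads. */
typedef struct {
    int      is_char_device;  /* models bit 80h tested at [DI+1Bh]       */
    uint32_t file_size;       /* models the words at [DI+13h], [DI+15h]  */
    uint32_t position;        /* assumed: the excerpt never shows it     */
} HandleModel;

/* Returns 0 on success, otherwise the DOS error code from the listing. */
int model_lseek21(HandleModel *h, uint8_t method, uint32_t offset,
                  uint32_t *new_pos)
{
    int8_t   al;
    uint32_t pos;

    assert(new_pos != NULL);
    if (method >= 3)                    /* CMP AL,3 / JC: method must be 0..2 */
        return ERR_INVALID_FUNCTION;
    if (h == NULL)                      /* stands in for the failed lookup    */
        return ERR_INVALID_HANDLE;

    if (h->is_char_device) {
        pos = 0;                        /* XOR AX,AX / XOR DX,DX              */
    } else {
        al = (int8_t)method;
        if (--al < 0)                   /* first DEC AL / JL: method 0        */
            pos = offset;
        else if (--al < 0)              /* second DEC AL / JL: method 1       */
            pos = h->position + offset; /* assumed; L3C18 is not shown        */
        else                            /* fall through: method 2             */
            pos = h->file_size + offset;
    }
    h->position = pos;                  /* assumed effect of "set fields"     */
    *new_pos = pos;
    return 0;
}
```

Unsigned 32-bit wraparound plays the role of the ADD/ADC pair here, so a negative offset such as (uint32_t)-10 moves the pointer backward.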

Microsoft's DOS OEM Adaptation Kit (OAK)

But perhaps you care deeply and desperately about getting the genuine article: commented source code from Microsoft for MS-DOS 5.0 and higher. Microsoft does not publicize the product a great deal, but it will sell you an OEM Adaptation Kit once you sign a license agreement. Microsoft's OAK comes on an oddly formatted tape cartridge, but a version on normal PC diskettes is available from Annabooks (San Diego CA).

The contents of the OAK are Microsoft confidential, so unfortunately we cannot reproduce any of it here, but we can give you some idea of its contents:


As you can see from this very partial directory tree, Microsoft supplies some components of the OAK as .ASM source code and others only as .OBJ files. The idea, of course, is that the OEM will change parts of IO.SYS but not MSDOS.SYS, so IO.SYS comes with source, but MSDOS.SYS comes only with .OBJ files. Having .OBJ files is almost as good as having source code, though, since .OBJ files contain names for functions and variables. An .OBJ disassembler such as WDISASM (included with Watcom C) can basically regenerate the source code, missing only comments (which are probably out-of-date and misleading anyway).
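
To see why an .OBJ file gives away so much, it helps to look at how names are stored. The sketch below assumes the object files are in the standard Intel/Microsoft OMF format used by DOS-era assemblers and compilers (the text above does not say so explicitly), and it handles only the 16-bit PUBDEF record (type 90h). The names omf_list_publics, skip_index, and OMF_NAME_BUF are hypothetical, and the checksum byte is skipped rather than verified. It walks the records in a buffer and collects the public symbol names in plain text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OMF_PUBDEF16 0x90u   /* PUBDEF record with 16-bit offsets */
#define OMF_NAME_BUF 64      /* sketch-only limit on copied name length */

/* Step over one OMF index field (1 byte, or 2 when the high bit is set).
   Returns the new position, or 0 if the field would run past end. */
static size_t skip_index(const uint8_t *obj, size_t p, size_t end,
                         unsigned *value)
{
    size_t width;

    if (p >= end) return 0;
    width = (obj[p] & 0x80u) ? 2 : 1;
    if (p + width > end) return 0;
    if (value != NULL)
        *value = (width == 2)
               ? (((unsigned)(obj[p] & 0x7Fu) << 8) | obj[p + 1])
               : obj[p];
    return p + width;
}

/* Copies up to max_names public names out of an in-memory .OBJ image
   and returns how many were found. */
size_t omf_list_publics(const uint8_t *obj, size_t len,
                        char names[][OMF_NAME_BUF], size_t max_names)
{
    size_t pos = 0, count = 0;

    assert(obj != NULL && names != NULL);
    while (len - pos >= 3) {
        uint8_t type   = obj[pos];                 /* record type byte      */
        size_t  reclen = (size_t)obj[pos + 1] | ((size_t)obj[pos + 2] << 8);
        size_t  body   = pos + 3;
        size_t  end;                               /* index of the checksum */

        if (reclen == 0 || reclen > len - body) break;   /* truncated image */
        end = body + reclen - 1;
        if (type == OMF_PUBDEF16) {
            unsigned segment = 0;
            size_t p = skip_index(obj, body, end, NULL);        /* base group   */
            if (p != 0) p = skip_index(obj, p, end, &segment);  /* base segment */
            if (p != 0 && segment == 0) p += 2;                 /* base frame   */
            while (p != 0 && p < end && count < max_names) {
                size_t n    = obj[p];              /* length-prefixed name  */
                size_t copy = (n < OMF_NAME_BUF - 1) ? n : OMF_NAME_BUF - 1;
                if (p + 1 + n + 2 > end) break;    /* name + 16-bit offset  */
                memcpy(names[count], obj + p + 1, copy);
                names[count][copy] = '\0';
                count++;
                p = skip_index(obj, p + 1 + n + 2, end, NULL);  /* type index */
            }
        }
        pos = body + reclen;                       /* next record           */
    }
    return count;
}
```

A disassembler that reads these records, together with the matching external-name and fixup records, can label every routine and variable with the name the original programmer gave it, which is why a disassembled .OBJ file reads so much better than a disassembled binary.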

Examination of the OAK contents mostly confirms what has already been known for many years as a result of reverse engineering. However, it is sometimes interesting to know the actual names for undocumented functions as they appear in Microsoft's source code. For example, the undocumented structure generally called the List of Lists is called SysInitVars in the DOS source because the structure is actually intended for use by SYSINIT. INT 21h AH=52h, which returns a pointer to this structure, and which is generally called Get List of Lists or Get SysVars, is called GET_IN_VARS in the DOS source. It turns out that there is little correspondence between the documented names for INT 21h functions and their actual names in the DOS source. For example, AH=1Bh is Get Default Drive Data and AH=1Ch is Get Drive Data in the MS-DOS Programmer's Reference, but in the code they are called SLEAZEFUNC and SLEAZEFUNCDL.

Looking over the OAK contents, it seems a shame that source code for MS-DOS and Windows isn't more widely available. In the same way that the old IBM PC and IBM AT technical references (for example, IBM, Technical Reference—Personal Computer AT , 1985) greatly promoted the development of innovative new software and hardware by publishing complete assembly-language listings of the system ROM BIOS, likewise Microsoft could promote greater understanding of DOS and Windows by making the source code for these fundamental technologies available. This isn't as ridiculous as it may sound. Consider that just a few years ago, compiler run-time library source code was kept proprietary too. Now almost all compilers come with RTL source.

Microsoft did at one point make some attempt at opening up DOS to closer inspection. The original MS-DOS (Versions 1.0-3.2) Technical Reference Encyclopedia (1986), one of the few books ever to be subject to a recall from the publisher, tried to describe not only each DOS function's inputs and outputs but also its internal operation. Each function was accompanied by a flowchart titled "How It Works." While the idea was excellent, the execution was flawed. Some functions (such as INT 21h AH=48h Allocate Memory) were described in great detail, with the flowchart running for many pages, while others were described in only the vaguest terms, such as "call internal function". The Microsoft encyclopedia carried the following warning:

Note: These flowcharts were written for MS-DOS Version 3.2. This in no way means that all future or past versions of MS-DOS will behave in the same manner. You should take care not to write programs that make use of the specific structure of the function routine, because this could result in lack of compatibility with other versions of DOS. Microsoft guarantees only that if you input the values in the registers in the specified way, you will get back the specified values. How the function actually accomplishes a task is subject to change.

In addition to the generally vague and misleading flowcharts for each DOS function, the Microsoft encyclopedia also carried an extremely detailed flowchart for COMMAND.COM. It is not clear whether the book was recalled because of the embarrassing errors it contained or because of any information on DOS internals that it inadvertently provided.

Conclusion: what have we learned here? DOS is the foundation upon which everything else (including Windows 3.x) rests. You don't want to think about DOS internals every time you use a high-level C or C++ call to read or write a file. But without at least some basic understanding, you have only a vague notion of how your code works, and how it interacts with other software. Solution: "know and forget."

[Many thanks to Samuel Okei from Texas Tech Univ. for his skilled conversion and reformatting of what was a complex 25-year-old file with obscure and obsolete typesetting codes, into HTML.]