Extending DOS Executables
by Digital Alchemist
The reason behind this essay is to show how techniques first developed by virus
writers can be used for benevolent purposes. It is my opinion that all
knowledge is good and viral techniques are certainly no exception. I will lead
you through the development of a program called DOSGUARD which benignly
modifies DOS executables, both COM and EXE.
DESCRIPTION OF DOSGUARD
DOSGUARD is a DOS COM program which I developed in order to restrict access to
certain programs on my computer. DOSGUARD modifies all of the COM and EXE
files in the current directory, adding code to each one that requires the user
to correctly enter a password before running the original program.
DOSGUARD, while sufficient for this article, could use a little work in the
realm of user friendliness. It needs more user feedback and a better way to
specify which files are to be modified. In addition, I have written a version
of DOSGUARD that uses simple XOR encryption to improve security.
DOSGUARD was written using Turbo Assembler.
STRUCTURE OF COM FILES
In a COM file, unlike in the EXE format, the programmer has no say over the
segment layout. Every COM file consists of a single segment, with no predefined
distinction between data and code. After DOS finishes some preparatory work,
the COM file is loaded at offset 100h of that segment. The first 256 bytes of
the segment hold the Program Segment Prefix (PSP). Located at offset 80h is an
important data structure called the DTA, or Data Transfer Area. The DTA is
important, but most of the rest of the PSP can be ignored by the programmer.
Before actually starting execution of the COM program, DOS sets up the stack at
the top of the segment (the highest memory address).
OUTLINE OF COM MODIFICATION
1. Open the file and read the first 5 bytes.
2. Make sure the file is not really an EXE file because after DOS 6.0 some
files ending in ".com" were really EXEs.
3. Check to see if the file has already been modified by DOSGUARD by checking
if the values of the 4th and 5th bytes match the DOSGUARD identification
string of "CG".
4. Make sure the file is not so large that it would exceed the 64k segment
size once DOSGUARD adds its code.
5. If the file passes 2-4 then it's OK to modify, so DOSGUARD writes its code
to the end of the file.
6. Calculate the size of the jump to the code we added and write the jump
instruction along with the identification string to the beginning of the
file.
I'll go over each of these steps in a little more detail with code snippets
where necessary. The complete source code for DOSGUARD can be found at the
end of the article and at my web page. Hopefully, the comments will be enough
to explain any areas I don't discuss in detail.
Essentially, the way DOSGUARD modifies COM files is by inserting a jump at the
beginning of the file which goes straight to the password authentication code,
located at the end of the file. If the user enters the correct password, the
inserted code restores the 5 bytes that were overwritten by the jump and the
identification string and executes the program as if DOSGUARD had never been
there.
COM MODIFICATION - STEP 1
Once we've found a COM file, the first thing to do is open it. Then, after
running some tests on the file, we can determine if it is suitable for
modification. But first, we need to read the first 5 bytes because we'll
need them later.
mov ax, 3D02h ;Open file R/W
mov dx, 9Eh ;Filename, stored in DTA
int 21h
mov bx, ax ;Save file handle in bx
mov ax, 3F00h ;Read first 5 bytes from file
mov cx, 5
mov dx, offset obytes
int 21h
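The constant 9Eh is not magic: it falls out of the layout of the find-first
record that DOS writes into the DTA at PSP offset 80h. As a rough illustration
(in Python rather than assembly, with a made-up sample record and helper names
of my own choosing), the record can be unpacked like this:

```python
import struct

DTA_BASE = 0x80  # default DTA location inside the PSP

# Find-first record: 21 reserved bytes, attribute byte, time word, date word,
# 32-bit file size, then a 13-byte NUL-terminated 8.3 file name.
FIND_RECORD = struct.Struct("<21sBHHI13s")

def parse_find_record(dta: bytes) -> dict:
    """Unpack the fields of a find-first record (hypothetical helper)."""
    _reserved, attr, _time, _date, size, raw_name = FIND_RECORD.unpack(
        dta[:FIND_RECORD.size])
    name = raw_name.split(b"\0", 1)[0].decode("ascii")
    return {"attr": attr, "size": size, "name": name}

# Hypothetical sample record for a 1234-byte file named TEST.COM
sample = FIND_RECORD.pack(b"\0" * 21, 0x20, 0, 0, 1234, b"TEST.COM")
info = parse_find_record(sample)

# Where the fields land inside the PSP segment
size_offset = DTA_BASE + 21 + 1 + 2 + 2   # file size field
name_offset = size_offset + 4             # file name field
```

The two computed offsets come out to 9Ah for the file size and 9Eh for the
file name, which is why the assembly can address them directly through ds.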
COM MODIFICATION - STEP 2
After DOS 6.0, some files with the COM extension are actually EXEs.
COMMAND.COM, for instance, is one of these. If we try to modify an EXE file as
if it were a COM file, then we're going to really screw things up. To prevent
this, we make sure that the string "MZ" doesn't appear in the first two bytes
of the file. "MZ" is the string which tells DOS that a file is an EXE.
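;Check to see if file is really an EXE
cmp word ptr [obytes], 'ZM' ;bytes "MZ", read as a little-endian word
;equal: the file is an EXE and must not be treated as a COM file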
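The comparison is against 'ZM' rather than 'MZ' because of byte order. As far
as I know, the assembler turns a two-character constant into a word with the
first character in the high byte, while the CPU stores the low byte first in
memory, so the on-disk bytes "MZ" read back as the word 5A4Dh. A small Python
model of the same check (the function names are mine, not DOSGUARD's):

```python
import struct

def first_word(data: bytes) -> int:
    """Read the first two bytes as a little-endian word, as `cmp word ptr` sees them."""
    return struct.unpack_from("<H", data, 0)[0]

# Value of the two-character constant 'ZM': 'Z' in the high byte, 'M' in the low byte
ZM = (ord("Z") << 8) | ord("M")

def looks_like_exe(data: bytes) -> bool:
    """True when the file starts with the bytes "MZ"."""
    return len(data) >= 2 and first_word(data) == ZM
```

The same reasoning explains why the identification string "CG" is tested
against the constant 'GC'.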
COM MODIFICATION - STEP 3
If the file had been previously altered by DOSGUARD, then the 4th and 5th bytes
will contain the identification string "CG". We need to make sure we skip files
that have this identification string.
;Check to see if file is already infected
;if it is, then skip it
cmp word ptr [obytes + 3], 'GC'
COM MODIFICATION - STEP 4
Another thing to watch out for is the file's size. If the file will exceed
one segment in size when we add our code, then the file is too big to modify.
;Make sure file isn't too large
mov ax, ds:[009Ah] ;Size of file from DTA
add ax, offset ENDGUARD - offset COMGUARD + 100h
jc NO_INFECT ;If ax overflows then don't infect
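The carry flag does the range check for free: if the 16-bit addition wraps,
the result would not fit in one segment. Here is a hedged Python model of that
test; guard_len stands in for "offset ENDGUARD - offset COMGUARD" and the
numbers used with it are invented for illustration:

```python
def fits_in_segment(file_size: int, guard_len: int) -> bool:
    """Model `add ax, guard_len + 100h` followed by `jc`.

    Returns True when the 16-bit add would NOT set the carry flag, i.e.
    host + inserted code + the 100h-byte PSP still fit in 64k.
    """
    low_word = file_size & 0xFFFF          # the code reads only the low word of the size
    total = low_word + guard_len + 0x100   # same sum the add instruction forms
    return total <= 0xFFFF                 # carry is set exactly when this fails
```

Note that, like the assembly, this looks only at the low word of the file
size, which is reasonable for a genuine COM file since it can never be larger
than one segment.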
COM MODIFICATION - STEP 5
If the file is a suitable candidate for modification, then we simply write our
code to the end of the file. Also, we have to save the original first 5 bytes
from the file somewhere in our code. In DOSGUARD's case, the 5 bytes are
already saved in the proper place because "obytes" is located within the code
which we are about to write.
xor cx, cx ;cx = 0
xor dx, dx ;dx = 0
mov ax, 4202h ;Move file pointer to the end of file
int 21h
mov ax, 4000h ;Write the code to the end of file
mov dx, offset COMGUARD
mov cx, offset ENDGUARD - offset COMGUARD
int 21h
COM MODIFICATION - STEP 6
The final step is to calculate the size of the jump to our code and write the
opcode for the jump and the identification string over the first 5 bytes of the
file.
mov ax, 4200h ;Move file pointer to beginning of
xor cx, cx ; file to write jump
xor dx, dx
int 21h
;Prepare the jump instruction to be written to beginning of file
xor ax, ax
mov byte ptr [bytes], 0E9h ;opcode for near jmp
mov ax, ds:[009Ah] ;size of the file, from the DTA
sub ax, 3 ;size of the jump instruction
mov word ptr [bytes + 1], ax ;displacement of the jump
;Write the jump ([bytes + 3] is assumed to already hold "CG")
mov cx, 5 ;size to be written
mov dx, offset bytes
mov ax, 4000h
int 21h
mov ah, 3Eh ;Close file
int 21h
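The "sub ax, 3" deserves a word. A near jump (opcode E9h) takes a displacement
measured from the instruction that follows it. The jump sits at 100h and is 3
bytes long, and our code starts at 100h plus the original file size, so the
displacement is simply the file size minus 3. The following Python sketch
checks that arithmetic; it only builds the 5 bytes in memory, and the function
names are illustrative rather than taken from DOSGUARD:

```python
import struct

def build_com_header(file_size: int) -> bytes:
    """Return the 5 bytes written at the start of the host: jmp + "CG"."""
    disp = (file_size - 3) & 0xFFFF        # displacement relative to 103h
    return b"\xE9" + struct.pack("<H", disp) + b"CG"

def jump_target(header: bytes, load_offset: int = 0x100) -> int:
    """Where the jump lands once the COM file is loaded at load_offset."""
    disp = struct.unpack_from("<H", header, 1)[0]
    return (load_offset + 3 + disp) & 0xFFFF

# Hypothetical host of 200h bytes: the appended code begins at 100h + 200h
header = build_com_header(0x200)
target = jump_target(header)
```

For the 200h-byte example the header is E9 FD 01 followed by "CG", and the
jump lands at 300h, the first byte past the original host.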
RESPONSIBILITIES OF INSERTED CODE
There are two problems which the inserted code has to deal with. First, since
the code could be located at any arbitrary offset within the segment, it cannot
depend on the compiled absolute addresses of its data labels. To solve this
problem we use a technique virus writers call the delta offset. The delta
offset is the difference between the actual and compiled addresses of data.
Anytime our code accesses data in memory it adds the delta offset to the data's
compiled address. The following piece of code finds the delta offset.
call GET_START
GET_START:
pop bp ;bp = actual address of GET_START
sub bp, offset GET_START ;bp = delta offset
The "call" pushes the current ip onto the stack, which is the actual address of
the label "GET_START." Subtract the compiled address from the actual one and
there's our delta offset.
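To make the arithmetic concrete, here is a tiny Python model of the delta
offset with 16-bit wraparound. All addresses in it are invented for the
example; only the subtraction and addition mirror what the inserted code does:

```python
def delta_offset(actual_addr: int, compiled_addr: int) -> int:
    """Difference between where a label really is and where it was assembled."""
    return (actual_addr - compiled_addr) & 0xFFFF

def runtime_address(compiled_addr: int, delta: int) -> int:
    """Address to use at run time for data assembled at compiled_addr."""
    return (compiled_addr + delta) & 0xFFFF

# Hypothetical numbers: GET_START was assembled at 0110h, but in this host
# the popped return address turns out to be 0910h.
delta = delta_offset(0x0910, 0x0110)

# A data label assembled at 0150h is therefore really at:
obytes_addr = runtime_address(0x0150, delta)
```

Because everything is done modulo 64k, the same two lines work whether the
code ended up higher or lower in the segment than where it was assembled.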
The second problem is to make sure the first 5 bytes of the host are restored
to their original values before we return from our jump and execute the host.
STRUCTURE OF EXE FILES
The EXE file format is much more complicated than the COM format. The big
difference is that EXE files allow the program to specify how it wants its
segments to be laid out in memory, allowing programs to exceed one 64k segment
in size. Most EXEs will have separate code, data, and stack segments.
All of this information is stored in the EXE Header. Here's a brief rundown of
what the header looks like:
Offset  Size  Field
   0     2    Signature. Will always be 'MZ'.
   2     2    Last Page Size. Number of bytes on the last
              page of memory.
   4     2    Page Count. Number of 512 byte pages in the file.
   6     2    Relocation Table Entries. Number of items in the
              relocation pointer table.
   8     2    Header Size. Size of header in paragraphs,
              including the relocation pointer table.
  10     2    Minalloc
  12     2    Maxalloc
  14     2    Initial Stack Segment.
  16     2    Initial Stack Pointer.
  18     2    Checksum. (Usually ignored)
  20     2    Initial Instruction Pointer
  22     2    Initial Code Segment
  24     2    Relocation Table Offset. Offset to the start of
              the relocation pointer table.
  26     2    Overlay Number. Primary executables (the ones we
              wish to modify) always have this set to zero.
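Since every field is a 16-bit little-endian word, the whole 28-byte header can
be read in one go. The sketch below does so in Python; the field names are my
own shorthand for the table entries, and the sample header is fabricated just
to exercise the parser:

```python
import struct

# "MZ" signature followed by the 13 words listed in the table (28 bytes total)
MZ_HEADER = struct.Struct("<2s13H")

FIELDS = ("last_page_size", "page_count", "reloc_entries", "header_paragraphs",
          "minalloc", "maxalloc", "init_ss", "init_sp", "checksum",
          "init_ip", "init_cs", "reloc_table_offset", "overlay_number")

def parse_mz_header(data: bytes) -> dict:
    """Decode the fixed part of an EXE header into a dictionary."""
    signature, *words = MZ_HEADER.unpack_from(data, 0)
    if signature != b"MZ":
        raise ValueError("not an MZ executable")
    return dict(zip(FIELDS, words))

# Fabricated header: 3 pages, 1 relocation entry, 2-paragraph header,
# relocation table at offset 1Ch, overlay number 0
sample = MZ_HEADER.pack(b"MZ", 0x50, 3, 1, 2, 0, 0xFFFF,
                        0x10, 0x100, 0, 0, 0, 0x1C, 0)
header = parse_mz_header(sample)
```

A tool like DOSGUARD would look at overlay_number, reloc_entries and
reloc_table_offset first, since those decide whether and how the file can be
extended.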
Following the EXE header is the relocation pointer table, with a variable
amount of blank space between the header and the start of the table. The
relocation table is a table of offsets. These offsets are combined with
starting segment values calculated by DOS to point to a word in memory where
the final segment address is written. Essentially, the relocation pointer
table is DOS's way to handle the dynamic placement of segments into physical
memory. This isn't a problem with COM files because there is only one segment
and the program isn't aware of anything else. Following the relocation pointer
table is another variable amount of reserved space and finally the program
body.
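What DOS does with each table entry can be modeled in a few lines. The sketch
below assumes the usual MZ entry layout of an offset word followed by a
segment word (the text above does not spell the layout out), and it works on a
toy in-memory image rather than a real program:

```python
import struct

def apply_relocations(image: bytearray, entries, load_segment: int) -> None:
    """Add the load segment to every word named by the relocation table.

    entries is a list of (offset, segment) pairs, both relative to the
    start of the loaded program image.
    """
    for offset, segment in entries:
        pos = segment * 16 + offset                     # location inside the image
        (value,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (value + load_segment) & 0xFFFF)

# Toy image: the word at 0001:0002 holds segment value 0001h before loading
image = bytearray(32)
struct.pack_into("<H", image, 0x12, 0x0001)
apply_relocations(image, [(0x0002, 0x0001)], load_segment=0x2000)
patched = struct.unpack_from("<H", image, 0x12)[0]
```

After the call the word holds 2001h, the segment value adjusted for where the
program was actually loaded.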
To successfully add code to an EXE file requires careful manipulation of the EXE
header and relocation pointer table.
OUTLINE OF EXE MODIFICATION
1. Open the file and read the 1st 2 bytes(DOSGUARD actually reads 5).
2. Check for EXE signature "MZ".
3. Read the EXE header.
4. Check the file for previous infection.
5. Make sure that the Overlay Number is 0.
6. Make sure the file is a DOS EXE.
7. If the file passes 2-6 then it is OK to modify. The first step is to check
the relocation pointer table to see if there is room to add 2 pointers. If
there is room, then jump to step 9.
8. If there isn't enough room in the relocation pointer table, then DOSGUARD
has to make room. It reads in everything in the file after the relocation
pointer table and writes it back out one paragraph (16 bytes) further on.
9. Save the original ss, sp, cs, and ip.
10. Adjust the file length to paragraph boundary.
11. Write code to the end of the file.
12. Adjust the EXE header to reflect the new starting segments and file size.
13. Write out the header.
14. Modify the relocation pointer table.
The easiest way to think about EXE modification is to imagine that we are
adding a complete COM program to the end of the file. Our code will occupy its
own segment located just after the host. This one segment will serve as a code,
data, and stack segment just like in a COM program. Instead of inserting a jump
to take us there, we will simply adjust the starting segment values in the EXE
header to point to our segment.
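The bookkeeping behind steps 10 and 12 is mostly integer arithmetic on the
file length. The Python sketch below shows one way it could be worked out
under simplifying assumptions of mine: the host has no data beyond its load
image, and the new segment number is counted from the start of the load image
(the file minus its header). The IP and SP values DOSGUARD actually uses are
not modeled here.

```python
def align_to_paragraph(length: int) -> int:
    """Round a length up to the next multiple of 16 bytes."""
    return (length + 15) & ~15

def plan_append(file_len: int, header_paragraphs: int, added_len: int) -> dict:
    """Work out padding, the appended code's segment, and new size fields."""
    aligned = align_to_paragraph(file_len)
    total = aligned + added_len
    return {
        "padding": aligned - file_len,                    # bytes added in step 10
        "new_segment": aligned // 16 - header_paragraphs, # relative to load image
        "page_count": (total + 511) // 512,               # 512-byte pages, rounded up
        "last_page_size": total % 512,                    # bytes used on the last page
    }

# Hypothetical host: 1000 bytes long, 2-paragraph header, 300 bytes appended
plan = plan_append(1000, 2, 300)
```

For this example the host is padded by 8 bytes, the appended code starts at
relative segment 61, and the header would describe 3 pages with 284 bytes on
the last one.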
EXE MODIFICATION - STEP 1
The same as with COM files, except that the only bytes we actually need are the
first two. With EXE files we will use different methods for determining
previous modification (I try to avoid using the viral term "infection") and for
transferring execution to our code.
EXE MODIFICATION - STEP 2
Check the first two bytes for the EXE signature "MZ". If the file doesn't
start with "MZ," then it isn't a DOS EXE.
cmp word ptr [obytes], 'ZM' ;bytes "MZ", read as a little-endian word