This article contains the following executables: UCRASM.ARC
Randy is the designer of numerous hardware and software projects, including assemblers. In addition to consulting, he is currently an instructor in computer science at California Polytechnic University, Pomona and UC Riverside. He can be contacted at 16217 Sunset Trail, Riverside, CA 92506.
Given the obvious benefits of space and speed, it's sad that assembly language has such a bad reputation. Some people cite portability, others complain about how hard it is to learn, but the real reason people don't write more code in assembly language can be summed up in one word: effort. Assembly language will never gain ground against high-level languages unless we can make it easier to program.
There are many reasons why people feel that programming in assembly language is much more difficult than programming in a high-level language such as C. A few points, however, top the list:
- Assembly language programmers usually "start from scratch" on new projects.
- Good assembly language programming requires a completely new mind-set, and most programmers are uncomfortable with the steep learning curve.
- There is no standardized set of library routines supplied with all assemblers (such as the standard C library). Commercial packages are available, but cost often prevents their universal acceptance.
I've had quite a bit of success teaching students assembly language at the University of California Riverside using this library. It definitely eases the transition from a high-level language to assembly. Of course, the best assembly language programs are not "C programs written with MOV instructions," but my experience shows that getting students to write real programs in assembly language as soon as possible improves the quality of their education. For professional assembly language programmers, the UCR Standard Library package provides the opportunity to write code in assembly which would be too kludgey in a high-level language (yes, assembly is more appropriate for many applications), but require too much unwritten support code in a typical assembly language application.
What's in the Library
While space restrictions prevent me from describing each entry in detail, a few examples will give you an idea of the types of routines available in the library. See Table 1, which lists the names and brief descriptions of all the routines currently in the library. I must emphasize that current means as of the writing of this article. There may be even more routines available by the time you read this. The library is in a state of constant flux.
Table 1: A brief description of the routines available in the UCR StdLib Package: The standard library routines are divided into nine distinct classes: input, output, file I/O, conversions, utility, string, memory management, character set, and floating point.
Input Routines Getc Reads a single character from the (library) standard input. GetcStdIn Reads a single character from the DOS standard input. GetcBIOS Reads a single character from the keyboard by calling BIOS. SetInAdrs } GetInAdrs } Maintain pointers to the current input routine. By changing PushInAdrs } them around, you can redirect the input obtained through PopInAdrs } the Getc routine. Gets (m) Reads a string from the keyboard. Scanf Formatted input routine, similar to C. Output Routines Putc Writes a single character to the (library) standard output. PutCR Writes a CR/LF pair using Putc. PutcStdOut Calls DOS to print a character to the DOS standard output. PutcBIOS Calls BIOS to print a character to the display. GetOutAdrs, } SetOutAdrs, } Like the corresponding routines above, these routines PushOutAdrs, } maintain the stdlib standard output pointers. PopOutAdrs- } Puts Prints string pointed at by ES:DI. Puth Prints value in AL as two hex digits. Putw Prints value in AX as four hex digits. Puti Prints value in AX as signed decimal number. Putu Prints value in AX as unsigned decimal number. Putl Prints 32-bit value in DX:AX as signed decimal integer. Putul Prints 32-bit value in DX:AX as unsigned decimal integer. PutlSize Prints integer value in AX using CX print positions. PutUSize Prints unsigned value in AX using CX print positions. PutLSize Prints long integer value in DX:AX using CX print positions, PutULSize Prints unsigned long value in DX:AX using CX print positions, Print Prints zero-terminated literal string constant following the call. Printf Similar to the corresponding C routine. File I/O Routines Fcreate Creates (and opens for writing) a new file. ES:DI points at the name. FOpen Opens an existing file for reading or writing. ES:DI points at the name. FClose Closes a file. FReadOn, } Redirect the standard input routines so that they take FReadOff } their input from a file. FWriteOn, } Redirect the standard output routines so that they send FWriteOff } their output to a file. FSeek Moves the file pointer around in the file. DOSHandle Converts STDLIB file handle to DOS file handle. FDelete Deletes a file. FRename Renames a file. Conversion Routines Atol (2) Converts ASCII string to long integer returned in DX:AX. Atoul (2) Converts ASCII string to unsigned 32-bit value in DX:AX. Atou (2) Converts ASCII string to 16-bit unsigned integer in AX. Atoh (2) Converts ASCII string of hex digits to 16-bit value in AX. Atolh (2) Converts ASCII string of hex digits to 32-bit value in DX:AX. Atoi (2) Converts ASCII string to 16-bit signed integer in AX. Itoa (2,m) Converts signed integer value in AX to string. Utoa (2,m) Converts unsigned value in AX to string, similar to Itoa/2. Htoa (2,m) Converts binary value in AL to exactly two hexadecimal characters. Wtoa (2,m) Converts binary value in AX to exactly four hexadecimal characters. Ltoa (2,m) Converts 32-bit signed value in DX:AX to string, like Itoa. Ultoa (2,m) Converts 32-bit unsigned value in DX:AX to string. Sprintf (m) Performs in-core formatting operation, like its C counterpart. Sscanf Like the C sscanf routine. ToLower If character in AL is upper case, converts it to lower case. ToUpper Converts lower-case chars in AL to upper case. Utility Routines ISize On input, AX contains signed decimal integer. On output, AX contains number of print positions this integer would require (including sign). USize Returns output size of unsigned value in AX. LSize Returns output size of 32-bit signed value in DX:AX. ULSize Returns output size of 32-bit unsigned value in DX:AX. IsAINum Returns Z-flag set if character is alphanumeric. IsXDigit Returns Z-flag set if character is a hex digit. IsDigit Returns Z-flag set if character is a decimal digit. IsAlpha Returns Z-flag set if character is alphabetic. IsLower Returns Z-flag set if character is a lowercase alphabetic. IsUpper Returns Z-flag set if character is an uppercase alphabetic. String Routines Strcpy (I) Copies string from ES:DI to DX:SI (including 0 byte). StrDup (I) Copies string from ES:DI to newly allocate storage on heap. Returns the pointer to the new string in ES:DI. StrLen Computes the length of a string. Strcat (m, I, mI) Concatenates two strings. StrChr Searches for a character within a string. StrStr (I) Searches for a substring within a string. Strcmp (I) Compares two strings. Strupr (m) Converts all lowercase characters in string to upper case. Strlwr (m) Converts all uppercase characters in string to lower case. Strset (m) Overwrites all characters in string with character passed in AL. Strspan (I) Skips over characters in string which belong to a specified set. Returns number of skipped characters in CX. Strcspan (I) Like the routines above, but skips characters not in the set. Strins (m, I,mI) Inserts one string into another. Strdel (m) Deletes characters from a string. StrRev (m) Reverses characters in string (submitted by Mike Blaszczak). Memory Management Routines MemInit Initializes the memory manager. Malloc Allocates blocks of memory. Realloc Resizes an allocated block. Free Deallocates a block allocated via malloc. Dupptr Informs the memory manager that two different pointers refer to the same block. Free will not deallocate the block until you call it once for each duplicated pointer. IsInHeap Returns true if a pointer points anywhere within the heap area. IsPtr Returns true if ES:DI points at a currently allocated object. Character Set Routines Set Macro which lets you define up to eight character sets. CreateSets Allocates storage and intializes up to eight sets on the heap. EmptySet Sets a character set to the empty set. RangeSet Builds character set from lower-bound and upper-bound characters. AddStr (I) Adds the characters in a string to a character set. RmvStr (I) Removes characters specified in string from a character set. AddChar Adds a single character (in AL) to a character set. RmvChar Removes a single character from a character set. Member Sets Z-flag if character in AL is a member of specified character set. CopySet Copies the bits from one set to another. SetUnion Computes the union of two character sets. SetIntersect Computes the intersection of two a character sets. SetDifference Computes the set difference of two character sets. NextItem Returns the ASCII code of the first item found in the set. RmvItem Removes next available item from set and returns this value. Floating-Point Routines lsfpa Loads floating-point accumulator (fpacc) with 32-bit single precision value. ssfpa Stores fpacc into a single precision variable. ldfpa Loads fpacc from a 64-bit double precision variable. sdfpa Stores fpacc from a double precision variable. lefpa (I) Loads fpacc from an 80-bit extended precision variable. sefpa Stores fpacc into an extended precision variable. lsfpo Loads floating-point operand (fpop) from single precision variable. ldfpo Loads fpop from a double precision variable. lefpo (I) Loads fpop from an extended precision variable. itof Converts 16-bit integer to floating-point value in fpacc. utof Converts unsigned 16-bit value to floating point. ultof Converts unsigned 32-bit value to floating point. ltof Converts signed 32-bit value to floating point. fpadd Adds fpop to fpacc. fpsub Subtracts fpop from fpacc. fpcmp Compares fpacc to fpop. fpmul Multiplies fpacc by fpop. fpdiv Divides fpacc by fpop. ftoa Converts fpacc to a decimal string. etoa Converts fpacc to ASCII string in exponent form.
Notes: Routines with the "2" suffix imply that the routine does not preserve the DI register. The routine returns DI pointing at the first character beyond the string. If the routine does not have the "2" suffix, then it preserves the ES:DI registers.
The "m" suffix implies that the routines automatically allocate storage for their string results on the heap by calling malloc. These routines return a pointer to the new string in ES:DI. Routines without the "m" suffix expect you to pass the address of a sufficiently large buffer in the ES:DI register pair.
Routines with the "I" suffix accept literal string constants in the code stream (that is, the string constant appears after the call to the specified routine).
If the routine you want isn't in the library, write it up, send it in, and we'll make sure it appears in the next release!
The library itself is currently divided into nine categories: input, output, file I/0, conversions, utilities, string operations, memory management, character-set operations, and floating-point operations. Most of these routines are high level; that is, they are nontrivial and typically require many lines of code and considerable knowledge to implement.
The printf routine is a good example of a high-level routine. It's something C programmers take for granted, but you rarely see it in typical assembly language programs. A typical call to the printf routine is shown in Example 1. As you can see, this code is close to that used by C and much more convenient than the usual methods assembly language programmers employ. Except for the DB and DD directives, this probably doesn't look a whole lot like assembly language, though. The printf statement invokes a macro found in the distribution stdlib.a file. Among other things, this macro issues a call to the sl_printf subroutine in the library. I use macros to invoke the library routines, not because I'm trying to be cute, but as a way to provide "smart linking" facilities in MASM 5.1. See Listing One (page 80) for a sample macro definition from the library.
Example 1: A typical call to the UCR printf routine
printf db "The value of I is %3d, string s is %s\n",O dd I, S
Not all routines in the UCR StdLib completely mimic their C counterparts. For example, itoa accepts an integer value in AX and converts it to a zero-terminated string of characters, storing the string starting at ES:DI. Like C, ES:DI must point at an array large enough to hold the result, or bad things may happen. However, it's often inconvenient to declare an array and load its address into ES:DI prior to calling itoa. Therefore, a second routine, itoam, is available to simplify the calling sequence. You pass itoam an integer value in AX and it returns a pointer to the converted string in ES:DI. You do not have to preallocate storage for the call; itoam does this automatically by calling malloc. At other times, it would be nice if itoa returned a pointer to the end of the converted string rather than to the beginning of it. (This is useful, for example, when building long strings via successive calls to routines such as itoa.) A third routine, itoa2, provides this facility. Having different routines such as itoa, itoa2, and itoam reduces the size of a program because you don't need as many "glue" instructions between library calls to massage the parameters.
The string routines should look familiar to C programmers. Almost all C string-handling routines (and then some!) are present in the UCR StdLib. Like itoa, there are various forms of some of the string functions, mainly for convenience. For example, strcat takes four different forms: strcat, strcatl, strcatm, and strcatml. These variants provide different ways of passing parameters to and from the strcat routine. strcat concatenates two strings: It copies the string pointed at by DX:SI to the end of the string pointed at by ES:DI. The destination string (ES:DI) must have sufficient memory allocated to hold the concatenation. strcatm also concatenates the string pointed at by ES:DI and DX:SI, but it does not treat the string pointed at by ES:DI as the target location. Instead, it (essentially) does what's shown in Example 2. strcatl links a literal string constant (the "I" suffix stands for "literal") to the end of the string referenced by ES:DI. Likewise, strcatml copies the source string (ES:DI) onto the heap, appends the literal string constant following the call, and returns the address of this new string in ES:DI. For example, if filename points at a DOS filename and you want to append ".EXE" to the end of it, you could use the code in Example 3.
Example 2: strcatm concatenates the string pointed at by ES:DI and DX:SI, but it does not treat the string pointed at by ES:DI as the target location.
temp = malloc( strlen(es:di) + strlen(dx:si) +1); strcpy(temp, es:di); strcpy(temp + strlen(es:di), dx:si); es:di = temp;
Example 3: Use this code to append .EXE to the end of filename.
les di, filename strcatml db ".EXE", 0 mov word ptr FullFileName, di mov word ptr FullFileName+2, es
Listing Two (page 80) provides a flavor of UCR Stdlib routine use. This is a short program which computes various statistics about the information in a text file and prints the results. The distribution package also includes test.asm, a file that demonstrates use of each routine in the library. Hopefully, Listing One will show you that it's worthwhile to obtain a copy of the library, and the test file will exemplify the use of each routine in the library.
Conclusion
To keep the library growing, I assign my assembly language students at UC Riverside a final project of adding one new, useful routine to the library. My Commercial Software Development class gets charged with testing, documenting, and otherwise cleaning up the work of the assembly language students. As such, new routines appear in the library on a periodic basis. The text box entitled "Obtaining and Licensing the UCR Standard Library" provides details on obtaining the latest version of the library.
The UCR Standard Library has proven to be of considerable aid in getting students up to speed with 80x86 assembly language. Beginners are finding assembly language easier to deal with than ever before, particularly when combined with MASM 6.0's high-level constructs. Advanced assembly language programmers use the library to help them quickly develop new programs in record time. Given that the library is free for all to use, it is definitely worth adding to your personal development suite if you use or are interested in learning how to use 80x86 assembly language.
Obtaining and Licensing the UCR Standard Library
There are two "official" distribution points for the UCR StdLib distribution package: BIX and Internet. If you have access to the BIX telecommunications service, you may obtain a copy of "UCRSTDLB.ZIP" from the IBM.PC/LISTINGS area. If you have ftp access to the Internet, you may obtain a copy of UCRSTDLB.ZIP via anonymous ftp to ucrmath.ucr.edu. After logging on, switch to the "PC" subdirectory (this is UNIX, so case counts!) and the rest should be fairly obvious.
In addition to the two official distribution points, the UCR StdLib routines are also available from many other telecommunications services, BBSs, and shareware distributors, including M&T Online and the DDJ Forum on CompuServe. However, there is a version control problem and I cannot guarantee that any service other than the ones mentioned above will have the latest version of the software. Given the way shareware and freeware software propagate around the world, I do not see a reasonable solution to this problem.
All replies and comments should be sent to the following address:
Randall Hyde Department of Computer Science UC Riverside, Riverside, CA 92521 Internet e-mail: [email protected] BIXmail: rhyde
_THE UCR STANDARD ASSEMBLY LANGUAGE LIBRARY_ by Randall Hyde[LISTING ONE]
<a name="0086_000d"> Printf macro ; The following extrn declaration only occurs if this is the first time ; printf appears in the source file. ; ifndef sl_printf stdlib segment para public 'slcode' extrn sl_printf:far stdlib ends endif ; ; Perform the call to the actual sl_printf routine: call far ptr sl_printf endm ; <a name="0086_000e"> <a name="0086_000f">[LISTING TWO]
<a name="0086_000f"> ;************************************************************************** ; TS: A "Text Statistics" package which demonstrates the use of the UCR ; Standard Library Package. ; Note: The purpose of this program is not to demonstrate blazing fast ; assembly code (it's not particularly fast) but rather to demonstrate how ; easy it is to write assembly code using the standard library and MASM 6.0. ; Randall Hyde ;************************************************************************** ; The following include must be outside any segment and before the ; ZZZZZZSEG segment. It includes all the macro definitions for the ; UCR Standard Library. include stdlib.a ;Links into the UCR Standard includelib stdlib.lib ; Library package. ; dseg segment para public 'data' ; WordCount dw 0 ;Holds file word count value LineCnt dw 0 ;Holds # of lines in file ControlCnt dw 0 ;Counts # of control characters Punct dw 0 ;Counts # of punctuation characters AlphaCnt dw 0 ;Counts # of alphabetic characters NumericCnt dw 0 ;Counts numeric digits in file Other dw 0 ;Counts other chars in file MemorySize dw 0 ;# of paragraphs of free memory Chars dw 0 ;Total number of chars in file TotalChars dq 0.0 ;FP version of the above FileHandle dw 0 ;STDLIB file handle Const100 dq 100.0 ; ; Create some sets to use in this program: set CharSet,Alphabetic,Punctuation,Control ; ; Character Counter array. CharCnt [ch] contains the number of "ch" ; characters appearing in the file. CharCnt dw 256 dup (0) ; ; Boolean flag to denote in/not in a word: InWord db 0 ; dseg ends ; cseg segment para public 'code' assume cs:cseg, ds:dseg ; ; LESI is a macro which loads a 32-bit immediate value into ES:DI. lesi macro adrs mov di, seg adrs mov es, di lea di, adrs endm ; ; ldxi loads a 32-bit immediate value into dx:si: ldxi macro adrs mov dx, seg adrs lea si, adrs endm ; ; Some useful constants- cr equ 13 lf equ 10 EOFError equ 8 ; public PSP ;DOS Program Segment Prefix PSP dw ? ;Needed by StdLib MemInit routine ; ; Okay, here's the main program which does the job: TS proc mov cs:PSP, es ;Save pgm seg prefix mov ax, seg dseg ;Set up the segment registers mov ds, ax mov es, ax ; ; Initialize the memory manager, giving all free memory to the heap. mov dx, 0 meminit mov MemorySize, cx ;Save # of available paragraphs. ; ; Set up the character sets: ; First, build the Alphabetic set: mov al, "A" mov ah, "Z" lesi Alphabetic RangeSet AddStrL db "abcdefghijklmnopqrstuvwxyz",0 ; ; Create the set with the punctuation characters: lesi Punctuation AddStrL db "!@#$%^&*()_-+={[}]|\':;<,>.?/~`", '"', 0 ; ; Create the control character set: lesi Control mov al,0 mov ah, 1fh RangeSet mov al, 7fh AddChar ; ; Print the amount of available memory and prompt the user to enter a file name. printf db "Text Statistics Program",cr,lf db "There are %d paragraphs of memory available",cr,lf db lf db "Enter a filename:",0 dd MemorySize ; getsm ;Get the filename. ; ; Open the file. mov al, 0 ;Open for reading. fopen ;Open the file. jnc GoodOpen ; ; If the carry flag comes back set, we've got an error, print an appropriate ; message and quit: print db "DOS error #",0 puti ;Error code is in AX. putcr jmp Return2DOS ; ; If the carry flag comes back clear, we've successfully opened the file. AX ; contains the STDLIB filehandle, ES:DI still points at the filename ; allocated on the heap: GoodOpen: mov FileHandle, AX ;Save STDLIB file handle. print db "Computing text statistics for ",0 puts ;Print filename free ;Dispose of space on heap putcr putcr ; ; The following loops check for transitions between words and delimiters. Each ; time we go from "not a word" -> "word" this code bumps up word count by one. mov ax, FileHandle fReadOn ; TSLoop: getc jnc NoError jmp ReadError ; ; See if the character is alphabetic NoError: lesi Alphabetic ;Set contains A-Z, a-z Member jz NotAlphabetic inc AlphaCnt jmp StatDone ; ; See if the character is a digit: NotAlphabetic: cmp al, "0" jb NotNumeric cmp al, "9" ja NotNumeric inc NumericCnt jmp StatDone ; ; See if the character is a punctuation character NotNumeric: lesi Punctuation Member jz NotPunctuation inc Punct jmp StatDone ; ; See if this is a control character: NotPunctuation: lesi Control Member jz NotControl inc ControlCnt jmp StatDone ; NotControl: inc Other StatDone: mov bl, al ;Use char as index into CharCnt mov bh, 0 shl bx, 1 ;Convert word index to byte index inc CharCnt [bx] ; ; Count lines and characters here: cmp al, lf jne NotNewLine inc LineCnt ; NotNewLine: inc Chars ; Count words down here cmp InWord, 0 ;See if we're in a word. je NotInAWord cmp al, " " ja WCDone mov InWord, 0 ;Just left a word jmp WCDone ; NotInAWord: cmp al, " " jbe WCDone mov InWord, 1 ;Just entered a word inc WordCount ; WCDone: ; ; Okay, or the current character into the character set so we can keep ; track of the characters which appear in this file. lesi CharSet AddChar jmp TSLoop ; ; Come down here on EOF or other read error. ReadError: cmp ax, EOFError je Quit print db "DOS Error ",0 puti putcr jmp Return2DOS ; ; Return to DOS. Quit: freadoff mov ax, FileHandle fclose printf db cr,lf,lf db "Number of words in this file is %d",cr,lf db "Number of lines in this file is %d",cr,lf db "Number of control characters is %d",cr,lf db "Number of punctuation characters is %d",cr,lf db "Number of alphabetic characters is %d",cr,lf db "Number of numeric characters is %d",cr,lf db "Number of other characters is %d",cr,lf db "Total number of characters is %d",cr,lf db lf, 0 dd WordCount,LineCnt,ControlCnt,Punct dd AlphaCnt,NumericCnt,Other,Chars ; ; Print the characters that actually appeared in the file. lesi CharSet EC64: mov cx, 64 ;Chars/line on output. EachChar: RmvItem cmp al, 0 je CSDone cmp al, " " jbe EachChar putc loop EachChar putcr jmp EC64 ; CSDone: print db cr,lf,lf db "Press any key to continue:",0 getc putcr putcr ; ; Now print the statistics for each character: mov ax, Chars ;Get character count, utof ; convert it to a floating lesi TotalChars ; point value, and save this sdfpa ; value in "TotalChars". ; ; Print out each character, the number of occurrences, and the ratio of ; this character's count to the total number of characters. mov bx, " "*2 ;Start output with spaces. ComputeRatios: cmp CharCnt[bx], 0 je SkipThisChar mov ax, bx shr ax, 1 ;Convert index to character putc ; and print it. print db " = ",0 mov ax, CharCnt [bx] mov cx, 4 putisize print db " Percentage of total is ",0 ; utof ; ; Divide by the total number of characters in the file: lesi TotalChars ldfpo fpdiv ; ; Multiply by 100 to get a percentage lesi Const100 ldfpo fpmul ; ; Print the ratio: mov al, 7 mov ah, 3 ftoam puts free print db "%",cr,lf,0 ; SkipThisChar: inc bx inc bx cmp bx, 200h jb ComputeRatios putcr ; Return2DOS: mov ah, 4ch int 21h ; TS endp ; cseg ends ; ; Allocate a reasonable amount of space for the stack (2k). sseg segment para stack 'stack' stk db 256 dup ("stack ") sseg ends ; ; zzzzzzseg must be the last segment that gets loaded into memory! ; The UCR Standard Library package uses this segment to determine where ; the end of the program lies. zzzzzzseg segment para public 'zzzzzz' LastBytes db 16 dup (?) zzzzzzseg ends end TS
Copyright © 1992, Dr. Dobb's Journal