input file with "magic numbers" (DEADCAFEh, etc.) into array given output from DumpPE -disasm: get Win32 filename from first line of dumppe output ignore everything until see "Disassembly" for each line in disassembly if it's a function label if opstring length > minlength and if opstring contains at least one ret or jmp if opstring length is maximum, might be junk so: chop it off at the first ret or jmp output filename, previous function name, and opstring set up for next opstring: set new function name; clear opstring; oplength = 0 else if it's an instruction line, and opstring isn't at maxlength if line contains a large hex operand if the hex operand is found in the "magic" array add mnemonic "_" magic number to opstring else if it's a Windows API call add API name to opstring without trailing 'W' or 'A' else if it's a branch target add "loc" to opstring else if it's common (mov, push, pop, add esp) if it's an API mov add "mov_" and API name to opstring else do nothing else if it's junk code (nop, int 3, etc.) do nothing else if it's data ; relying on DumpPE to separate code/data do nothing else if we've seen this same thing many times in a row do nothing else if mnemonic has prefix (rep, lock, etc.) add prefix "_" mnemonic to opstring else add mnemonic to opstring if added to opstring oplength++ stop processing when see hex dump output last one send output to mkmd5db
Figure 4: Pseudocode for the Opstring program.