Decompile Once, Run Anywhere (Web Techniques, Sep 1997)



January 01, 2002
URL:http://drdobbs.com/decompile-once-run-anywhere-web-techniqu/184414282

Figure 1

Source code and bytecode displayed using Visual J++'s disassembly window.

Figure 2

IceBreaker, used by some decompiler hackers, lets the hacker iterate back to the original source by comparing decompiled and recompiled bytecode with the original.


Decompile Once, Run Anywhere

Protecting Your Java Source

By Godfrey Nolan

Partly because of the media controversy over Internet fraud and partly because of the language's history, Java is one of the most secure languages for Internet applications. Applets cannot read from or write to the hard disk of the machine running the browser, call native code, or connect to any host other than the one they were downloaded from. Programmers have no access to memory pointers, so their code can't stray outside the bounds of the applet or application.

That's not to say that Java has no security holes, or that security isn't a moving target, but JavaSoft is working hard to limit hostile applets. JavaBeans, for example, will build on the sandbox concept: signed applets will not be executed until the user is satisfied with their credentials. Quite rightly, Java is seen as a much safer option than ActiveX controls.

The number of applets appearing on Web pages is increasing rapidly. Sun claims that there are now tens of thousands of Web pages with Java applets. With the advent of the JavaStation and cross-platform office suites from Lotus and Corel, Java has begun to shift away from the Web site and onto the desktop. But is an applet or application safe from prying eyes? No. The focus has always been on safeguarding the end user's security; the developer has largely been ignored.

One reason the number of Web pages grew exponentially, and their design improved by leaps and bounds, was that the HTML or client-side scripting behind most interesting Web effects could easily be copied by viewing the source. It was taken for granted, however, that Java applets, ActiveX controls, and any server-side scripting code would not be visible.

Now, several decompilers have been written that take advantage of the large amount of symbolic information contained within an applet. In most cases, the decompiled code is missing only the comments, and it is often tidier than the original. This may well level the playing field once again.

The Truth about Decompiling

Java's strength is its portability, which stems from a two-stage process. Source code is compiled into intermediate bytecode stored in a "class file," which can be either an applet or an application; a Java Virtual Machine (JVM) then interprets the class file. The class file is platform independent; only the JVM differs from platform to platform. JVMs have been built for most of the major and many of the minor operating systems, making Java class files very portable. But what has often been touted as Java's greatest advantage may well turn out to be its Achilles' heel.
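To make "the JVM interprets the class file" concrete, here is a toy interpreter, a sketch only, for the handful of opcodes in the loop at offsets 15 to 38 of Listing Two. It is illustrative rather than faithful: instructions are assumed to be pre-decoded into a mnemonic plus integer operands, jump targets are list indices rather than byte offsets, and `getstatic` plus `invokevirtual println` are modeled by a placeholder receiver and a list of printed lines. A real JVM decodes binary opcodes, resolves constant-pool references, and verifies the code first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ToyStackMachine {

    /** A pre-decoded instruction: mnemonic plus up to two int operands (assumed, simplified encoding). */
    static final class Insn {
        final String op;
        final int a;
        final int b;
        Insn(String op, int a, int b) { this.op = op; this.a = a; this.b = b; }
        Insn(String op, int a) { this(op, a, 0); }
        Insn(String op) { this(op, 0, 0); }
    }

    /** Offsets 15-38 of Listing Two; jump operands are indices into this list, not byte offsets. */
    static List<Insn> listingTwoLoop() {
        List<Insn> p = new ArrayList<>();
        p.add(new Insn("iconst", 0));     // 0:  15 iconst_0
        p.add(new Insn("istore", 2));     // 1:  16 istore_2
        p.add(new Insn("goto", 9));       // 2:  17 goto 32
        p.add(new Insn("getstatic"));     // 3:  20 getstatic System.out
        p.add(new Insn("aload", 1));      // 4:  23 aload_1
        p.add(new Insn("iload", 2));      // 5:  24 iload_2
        p.add(new Insn("aaload"));        // 6:  25 aaload
        p.add(new Insn("println"));       // 7:  26 invokevirtual println
        p.add(new Insn("iinc", 2, 1));    // 8:  29 iinc 2 1
        p.add(new Insn("iload", 2));      // 9:  32 iload_2
        p.add(new Insn("aload", 1));      // 10: 33 aload_1
        p.add(new Insn("arraylength"));   // 11: 34 arraylength
        p.add(new Insn("if_icmplt", 3));  // 12: 35 if_icmplt 20
        p.add(new Insn("return"));        // 13: 38 return
        return p;
    }

    /** Runs the program with local 1 bound to the array and returns the lines it "printed". */
    static List<String> run(List<Insn> program, String[] welcome) {
        Object[] locals = new Object[3];  // 0 = args (unused), 1 = welcome, 2 = i
        locals[1] = welcome;
        Deque<Object> stack = new ArrayDeque<>();
        List<String> printed = new ArrayList<>();
        int pc = 0;
        while (true) {
            Insn in = program.get(pc++);
            switch (in.op) {
                case "iconst":    stack.push(in.a); break;
                case "istore":    locals[in.a] = stack.pop(); break;
                case "iload":
                case "aload":     stack.push(locals[in.a]); break;
                case "iinc":      locals[in.a] = (Integer) locals[in.a] + in.b; break;
                case "goto":      pc = in.a; break;
                case "getstatic": stack.push("System.out"); break;   // placeholder receiver
                case "aaload": {
                    int index = (Integer) stack.pop();
                    Object[] array = (Object[]) stack.pop();
                    stack.push(array[index]);
                    break;
                }
                case "arraylength": stack.push(((Object[]) stack.pop()).length); break;
                case "println": {
                    Object arg = stack.pop();
                    stack.pop();                                     // discard the receiver
                    printed.add(String.valueOf(arg));
                    break;
                }
                case "if_icmplt": {
                    int right = (Integer) stack.pop();
                    int left = (Integer) stack.pop();
                    if (left < right) pc = in.a;
                    break;
                }
                case "return": return printed;
                default: throw new IllegalStateException("unsupported opcode " + in.op);
            }
        }
    }

    public static void main(String[] args) {
        String[] welcome = { "Hello Web Techniques", "From Dublin, Ireland" };
        run(listingTwoLoop(), welcome).forEach(System.out::println);
    }
}
```

Every instruction only pushes, pops, stores, or jumps, which is a large part of why the mapping from bytecode back to source is quick to learn.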

To get a better idea of what exactly is in a class file, we need to turn to the Java disassembler, javap, which comes with Sun's original JDK. javap -p prints the methods and variables of the class, whereas javap -c disassembles the bytecode into JVM instructions annotated with the methods and variables. Listing One shows the Java source code for a simple "hello world" example. Listing Two is a partial listing of the output from javap -c. Source code and bytecode can also be displayed together using Visual J++'s disassembly window; see Figure 1.
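javap reads member names and signatures directly from the class file. A rough in-program analogue, assuming a modern JDK, is the Reflection API: the sketch below prints one line per declared field, constructor, and method, loosely in the style of `javap -p`. The output format is this example's own invention, and the `Hello` class is Listing One nested inside the file so that it is self-contained.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

public class MemberLister {

    /** Listing One, nested here so the example is self-contained. */
    public static class Hello {
        public static void main(String[] args) {
            String welcome[] = new String[2];
            welcome[0] = "Hello Web Techniques";
            welcome[1] = "From Dublin, Ireland";
            for (int i = 0; i < welcome.length; i++)
                System.out.println(welcome[i]);
        }
    }

    private static String parameters(Class<?>[] types) {
        StringJoiner joined = new StringJoiner(", ", "(", ")");
        for (Class<?> t : types) joined.add(t.getSimpleName());
        return joined.toString();
    }

    /** One line per declared, non-synthetic field, constructor, and method, sorted alphabetically. */
    static List<String> describe(Class<?> c) {
        List<String> lines = new ArrayList<>();
        for (Field f : c.getDeclaredFields())
            if (!f.isSynthetic())
                lines.add(Modifier.toString(f.getModifiers()) + " " + f.getType().getSimpleName() + " " + f.getName());
        for (Constructor<?> k : c.getDeclaredConstructors())
            if (!k.isSynthetic())
                lines.add(Modifier.toString(k.getModifiers()) + " " + c.getSimpleName() + parameters(k.getParameterTypes()));
        for (Method m : c.getDeclaredMethods())
            if (!m.isSynthetic())
                lines.add(Modifier.toString(m.getModifiers()) + " " + m.getReturnType().getSimpleName()
                        + " " + m.getName() + parameters(m.getParameterTypes()));
        Collections.sort(lines);
        return lines;
    }

    public static void main(String[] args) {
        describe(Hello.class).forEach(System.out::println);
    }
}
```

For Listing One this reports `public Hello()` (the default constructor javac adds, which also appears in Listing Two) and `public static void main(String[])`. The parameter name `args` does not appear: without debugging information, parameter and local-variable names are not stored in the class file, which is presumably why Mocha had to invent `astring1` in Listing Three.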

Unfortunately for the developer, the JVM instruction set is limited and the stack machine is relatively straightforward. It doesn't take long to understand how simple source code is compiled into bytecode and, consequently, how to work from someone else's bytecode back to their source code. To encourage companies to build JVMs, the VM specification is freely available on the JavaSoft FTP server and in book form from Addison-Wesley. With this specification, programmers can write their own simple decompiler to reverse-engineer Listing Two into Listing One. But why write your own decompiler when several are already available?
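As an illustration of how such a homemade decompiler might start, the hypothetical matcher below recognizes the counted-loop pattern at offsets 15 to 38 of Listing Two (initialize the counter, jump to the test, run the body, `iinc`, compare against `arraylength`, branch back) and emits the equivalent `for` header. It assumes javap-style text input and handles only short-form opcodes such as `istore_2`; real decompilers build a control-flow graph rather than matching one fixed sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoopIdiomMatcher {

    /** Offsets 15-38 of Listing Two, one "offset mnemonic operands" string per instruction. */
    static final List<String> LISTING_TWO_LOOP = Arrays.asList(
        "15 iconst_0", "16 istore_2", "17 goto 32",
        "20 getstatic #9 <Field java.lang.System.out Ljava/io/PrintStream;>",
        "23 aload_1", "24 iload_2", "25 aaload",
        "26 invokevirtual #10 <Method java.io.PrintStream.println(Ljava/lang/String;)V>",
        "29 iinc 2 1", "32 iload_2", "33 aload_1", "34 arraylength",
        "35 if_icmplt 20", "38 return");

    /** One disassembled instruction: its byte offset plus mnemonic and operands. */
    static final class Insn {
        final int offset;
        final String[] parts;                                   // e.g. {"goto", "32"}
        Insn(String line) {
            String[] tokens = line.trim().split("\\s+");
            offset = Integer.parseInt(tokens[0]);
            parts = Arrays.copyOfRange(tokens, 1, tokens.length);
        }
        String op() { return parts[0]; }
    }

    /** Slot number in a short-form mnemonic such as "istore_2", or -1 if the prefix doesn't match. */
    private static int slot(Insn insn, String prefix) {
        String op = insn.op();
        return op.startsWith(prefix + "_") ? Integer.parseInt(op.substring(prefix.length() + 1)) : -1;
    }

    private static int indexOfOffset(List<Insn> code, int offset) {
        for (int i = 0; i < code.size(); i++)
            if (code.get(i).offset == offset) return i;
        return -1;
    }

    /**
     * Looks for   iconst_0; istore_N; goto TEST; BODY...; iinc N 1;
     *       TEST: iload_N; aload_M; arraylength; if_icmplt BODY
     * and returns the matching Java loop header, or null. names[slot] supplies variable names.
     */
    static String matchCountedLoop(List<String> listing, String[] names) {
        List<Insn> code = new ArrayList<>();
        for (String line : listing) code.add(new Insn(line));
        for (int k = 0; k + 3 < code.size(); k++) {
            if (!code.get(k).op().equals("iconst_0")) continue;
            int counter = slot(code.get(k + 1), "istore");
            Insn jump = code.get(k + 2);
            if (counter < 0 || !jump.op().equals("goto")) continue;
            int t = indexOfOffset(code, Integer.parseInt(jump.parts[1]));
            if (t < 1 || t + 3 >= code.size()) continue;
            Insn inc = code.get(t - 1);
            Insn branch = code.get(t + 3);
            int array = slot(code.get(t + 1), "aload");
            boolean isCountedLoop =
                    inc.op().equals("iinc") && inc.parts[1].equals(String.valueOf(counter)) && inc.parts[2].equals("1")
                    && slot(code.get(t), "iload") == counter
                    && array >= 0
                    && code.get(t + 2).op().equals("arraylength")
                    && branch.op().equals("if_icmplt")
                    && Integer.parseInt(branch.parts[1]) == code.get(k + 3).offset;
            if (isCountedLoop) {
                String i = names[counter], a = names[array];
                return "for (int " + i + " = 0; " + i + " < " + a + ".length; " + i + "++)";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // No local-variable debug information, so names are invented, as Mocha did in Listing Three.
        System.out.println(matchCountedLoop(LISTING_TWO_LOOP, new String[] { "astring1", "astring2", "i" }));
    }
}
```

With the names Mocha invented, `main` prints `for (int i = 0; i < astring2.length; i++)`, the same loop header that appears in Listing Three.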

Decompiler tools have long been available for languages from C to Visual Basic. For example, dcc, from the University of Queensland's Department of Computer Science, decompiles .exe files into C programs. Hexadecimal editors have also been around for a long time, and generations of hackers have used hex dumps to patch PC-based software to get around licensing restrictions. (See also "Online.")

Java decompilers are particularly effective for three reasons: applets are typically small, so the reverse-engineered code is easy to understand even without comments; they must contain a lot of symbolic information to satisfy the JVM; and users are actively encouraged to download them at no cost. All that was needed was a tool that took advantage of Java's design.
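Most of the symbolic information the JVM needs lives in the class file's constant pool. The sketch below follows the class-file layout in the JVM specification: it checks the `0xCAFEBABE` magic number, skips the version fields, and walks the constant pool, collecting the `CONSTANT_Utf8` entries that hold class, field, and method names, type descriptors, and string literals. It assumes the class was loaded from an ordinary directory or JAR, so that its own `.class` file can be read back as a resource, and it stops after the constant pool rather than parsing the rest of the file.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ConstantPoolDump {

    /** A literal that lands in the constant pool, like the strings in Listing Two. */
    static final String GREETING = "Hello Web Techniques";

    /** Reads a class file's header and constant pool; returns its CONSTANT_Utf8 entries in pool order. */
    static List<String> utf8Constants(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.readUnsignedShort();                          // minor_version
        in.readUnsignedShort();                          // major_version
        int count = in.readUnsignedShort();              // constant_pool_count; valid indices are 1..count-1
        List<String> utf8 = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:                                     // Utf8: u2 length + modified UTF-8, what readUTF expects
                    utf8.add(in.readUTF());
                    break;
                case 7: case 8: case 16: case 19: case 20:   // Class, String, MethodType, Module, Package
                    skip(in, 2);
                    break;
                case 15:                                    // MethodHandle
                    skip(in, 3);
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(in, 4);                            // Integer, Float, member refs, NameAndType, Dynamic, InvokeDynamic
                    break;
                case 5: case 6:                             // Long and Double occupy two constant-pool slots
                    skip(in, 8);
                    i++;
                    break;
                default:
                    throw new IOException("unknown constant-pool tag " + tag);
            }
        }
        return utf8;
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    /** Applies utf8Constants to this class's own .class file, located through the classpath. */
    static List<String> ownConstants() {
        try (InputStream in = ConstantPoolDump.class.getResourceAsStream("ConstantPoolDump.class")) {
            if (in == null) throw new IllegalStateException("class file not found on classpath");
            return utf8Constants(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(GREETING);
        ownConstants().forEach(System.out::println);
    }
}
```

Run on its own class file, the output includes `utf8Constants`, `GREETING`, and `Hello Web Techniques`: names and literals survive compilation intact, which is exactly the raw material a decompiler works from.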

Decompiler Tools

The best-known Java decompiler is Mocha, written by Hanpeter van Vliet. Van Vliet intended Mocha to be used for learning purposes rather than for using applets without the original programmer's permission. Nevertheless, it caused quite a controversy (see the sidebar "A Tercentennial"). Mocha, a set of Java classes run from the Java command-line interpreter, is about as easy to install as the JDK and surprisingly effective for a beta shareware program. Of ten sample applets downloaded from the Web, Mocha decompiled six completely and recovered a significant amount of code from the remaining four. Listing Three shows how Mocha dealt with our example in Listing One.

Other decompilers, including commercial ones, are available. IBM wrote a decompiler called Jive, but the whitepaper "Jive: A Java Decompiler" has been classified and taken off IBM's Web site. Java2Rose, a tool that originally worked with Rational Rose C++, has now been built into Rational's Java compiler, Rose/Java. Rational Rose2Java was about as effective as Mocha on our sample of applets. But IceBreaker from Breaker Technologies was positively deadly when used in conjunction with Mocha.

IceBreaker, which is not readily available, consists of three panes; see Figure 2. The top pane contains source code, typically the output of a failed Mocha run. The bottom two panes contain bytecode: compiling the source in the top pane generates the bytecode in the bottom-left pane, and the target bytecode is in the bottom-right pane. Differences between the two sets of bytecode are shown in red. After examining the differences, the hacker guesses at the source code, recompiles, and gradually reduces the number of red lines. As Martin Lambert of Breaker Technologies puts it, "build frequently to see the red lines going away." IceBreaker is much easier to use than it looks, because the JVM is a simple stack machine with a limited number of possible bytecodes. With a little practice, a hacker can use IceBreaker to decompile almost any small applet.
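How IceBreaker computes its red lines isn't described here. As a hedged sketch of the idea, a standard longest-common-subsequence diff over two instruction listings reports which target instructions the recompiled guess fails to reproduce. The listings are assumed to contain mnemonics only, because byte offsets shift whenever a guess changes the length of the code, and the two short listings in `main` are hand-written for illustration, not real compiler output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BytecodeDiff {

    /**
     * Returns the target instructions the candidate fails to reproduce (the "red lines"),
     * using a longest-common-subsequence table over mnemonic-only listings.
     */
    static List<String> redLines(List<String> candidate, List<String> target) {
        int n = candidate.size(), m = target.size();
        int[][] lcs = new int[n + 1][m + 1];          // lcs[i][j]: LCS length of candidate[i..] and target[j..]
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i][j] = candidate.get(i).equals(target.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);

        List<String> red = new ArrayList<>();
        int i = 0, j = 0;
        while (j < m) {
            if (i < n && candidate.get(i).equals(target.get(j))) {
                i++;                                   // instruction reproduced
                j++;
            } else if (i < n && lcs[i + 1][j] >= lcs[i][j + 1]) {
                i++;                                   // extra instruction in the guess; skip it
            } else {
                red.add(target.get(j++));              // target instruction not reproduced
            }
        }
        return red;
    }

    public static void main(String[] args) {
        // Hand-written listings: the guess tested i <= length - 1 instead of i < length.
        List<String> target = Arrays.asList("iload_2", "aload_1", "arraylength", "if_icmplt", "return");
        List<String> guess = Arrays.asList("iload_2", "aload_1", "arraylength", "iconst_1", "isub", "if_icmple", "return");
        System.out.println(redLines(guess, target));   // prints [if_icmplt]
    }
}
```

The hacker's loop is then exactly the one Lambert describes: edit the guess, recompile, and rerun the diff until the list comes back empty.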

Implications

Many applets use a combination of getCodeBase, getDocumentBase, hash codes, and/or server authentication to stop anyone from downloading an applet and then using it on another Web server. But once the applet is decompiled, any licensing restrictions can be removed before it is used elsewhere. Applets can be lifted without anyone knowing. Even if the decompiled applet doesn't compile completely, useful code can be merged into another applet for later use.
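A host lock of this kind can be very small. The sketch below is hypothetical: `LICENSED_HOST` is a made-up value, and the check takes the code base as a string (in an applet, something like `getCodeBase().toString()`) so that it runs without a browser. Once the class is decompiled, the whole protection amounts to one `if` statement that can simply be deleted before recompiling.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

public class HostLock {

    /** Hypothetical licensed host; a real applet might hash or otherwise disguise this value. */
    static final String LICENSED_HOST = "www.example.com";

    /** True if the code base URL names the licensed host (case-insensitive). */
    static boolean isLicensed(String codeBase) {
        try {
            String host = new URI(codeBase).getHost();
            return host != null && host.toLowerCase(Locale.ROOT).equals(LICENSED_HOST);
        } catch (URISyntaxException e) {
            return false;                                  // malformed code base: refuse to run
        }
    }

    public static void main(String[] args) {
        String codeBase = args.length > 0 ? args[0] : "http://www.example.com/applets/";
        if (!isLicensed(codeBase)) {                       // the single check a decompiler user deletes
            System.out.println("Not licensed for " + codeBase);
            return;
        }
        System.out.println("Licensed; starting applet");
    }
}
```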

Microsoft is well aware of Java's decompilation problems, but the company insists that ActiveX and Java are compatible, not competitive, technologies. ActiveX controls do not have the same security restrictions as Java. They rely on the Authenticode system, in which each control is digitally signed. If a signed ActiveX control does anything malicious, the software publisher can theoretically be tracked down and shot (sorry, prosecuted). An unsigned control is flagged as dangerous, but nothing prevents it from being downloaded. For all ActiveX's lack of security, it is much more difficult to reverse-engineer an ActiveX control. Expect to see an increase in ActiveX controls as more developers become aware of this issue and ActiveX becomes available on more platforms.

Last year, Corel paid Novell $170 million for the WordPerfect product suite. WordPerfect, Quattro, and a personal information manager have now all been completely rewritten in Java and will hopefully become the first of many new programs available in a single cross-platform version. By making a prerelease copy of Office for Java available on its Web site, Corel has essentially allowed Web users to download huge chunks of its source code. Eric Lefebvre of Corel contends that the company is "not worried about the disassembling of Java," that it is "well aware of Mocha" but "will continue to offer Corel Office for Java from our Web site to all."

Protect Yourself!

As a first step, you might try compiling your applet or application as a release version rather than a debug version in products like Symantec Cafe. A release build leaves out debugging information such as local-variable names, so decompilers have to supply their own; but as you can see in Listing Three, the code is still understandable. You might also consider writing code that takes advantage of things that decompilers don't like, such as casts. Try adding random code segments that don't do anything; these might act as a fingerprint and form the basis for legal proceedings at a later date. You can compile Java source code into an executable or object code using Asymetrix's SuperCede, but this will render the application nonportable.
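One hedged way to make such a fingerprint checkable later is to embed a distinctive string constant that the program never really needs. String literals are stored verbatim in the class file's constant pool, and obfuscators that only rename identifiers leave them alone, so a suspect class file can simply be scanned for the mark. The mark text, `fingerprint()`, and `containsMark` are hypothetical names for this sketch; note that a shrinking tool that strips unused code could also strip such a mark.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Watermark {

    /** Hypothetical fingerprint; plain ASCII, so its bytes in the class file equal its UTF-8 bytes. */
    static final String MARK = "fingerprint:hello-applet:5eed";

    /** A code segment that does nothing useful; it exists only to carry the mark. */
    static int fingerprint() {
        return MARK.length();
    }

    /** True if the mark's bytes occur anywhere in the class-file bytes (naive substring search). */
    static boolean containsMark(byte[] classFile, String mark) {
        byte[] needle = mark.getBytes(StandardCharsets.UTF_8);
        outer:
        for (int i = 0; i + needle.length <= classFile.length; i++) {
            for (int j = 0; j < needle.length; j++)
                if (classFile[i + j] != needle[j]) continue outer;
            return true;
        }
        return false;
    }

    /** Reads this class's own .class file from the classpath. */
    static byte[] ownClassBytes() {
        try (InputStream in = Watermark.class.getResourceAsStream("Watermark.class")) {
            if (in == null) throw new IllegalStateException("class file not found on classpath");
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mark present: " + containsMark(ownClassBytes(), MARK));
    }
}
```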

To hinder or stop Java decompilation, you can use van Vliet's Java class obfuscator, Crema, which scrambles the variable and method names so that the decompiled source code is unintelligible gibberish. Obfuscated or not, however, the resultant decompiled code is still source code, which can always be manipulated back into a readable form.
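Crema's internals aren't documented in this article beyond van Vliet's remark in the sidebar that it substitutes numbers and keywords for names. The hypothetical sketch below shows only the bookkeeping step an obfuscator of that kind needs: a consistent old-to-new name map that skips names the runtime looks up by name, such as `main`, constructors (`<init>`), and, for applets, lifecycle methods like `init` and `paint`. The `KEEP` set is an assumption, and rewriting the class file with the new names is out of scope.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RenameMap {

    /** Names the JVM or browser looks up by name, so they must survive (an assumed, minimal set). */
    static final Set<String> KEEP = Set.of("main", "<init>", "<clinit>", "init", "start", "stop", "destroy", "paint");

    /** Maps each distinct renamable identifier, in first-seen order, to a meaningless numeric name. */
    static Map<String, String> build(List<String> identifiers) {
        Map<String, String> renames = new LinkedHashMap<>();
        int next = 1;
        for (String id : identifiers) {
            if (KEEP.contains(id) || renames.containsKey(id)) continue;
            renames.put(id, String.valueOf(next++));   // "1", "2", ...: allowed in a class file, not in Java source
        }
        return renames;
    }

    /** Applies the map to a list of references; kept names pass through unchanged. */
    static List<String> apply(Map<String, String> renames, List<String> references) {
        String[] out = new String[references.size()];
        for (int i = 0; i < out.length; i++)
            out[i] = renames.getOrDefault(references.get(i), references.get(i));
        return Arrays.asList(out);
    }

    public static void main(String[] args) {
        // Hypothetical member names from a small applet.
        List<String> members = Arrays.asList("messages", "init", "nextMessage", "index", "paint", "nextMessage");
        Map<String, String> renames = build(members);
        System.out.println(renames);                  // {messages=1, nextMessage=2, index=3}
        System.out.println(apply(renames, members));  // [1, init, 2, 3, paint, 2]
    }
}
```

Because one map is applied everywhere, every reference to `nextMessage` becomes `2` and the program still links; only the names stop meaning anything to a reader.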

Running our example from Listing One through Crema made it impossible for Mocha to decompile; Mocha gave up with a java.lang.NullPointerException. Not bad for a language that doesn't support pointers. Nevertheless, the obfuscated application executes as normal.

JOBE, another obfuscator, written by Erik Jokipii of the University of California at San Diego, claims to remove symbolic information from class files and to be aware of a package's class hierarchy, making it safe to use on large projects.
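Class-hierarchy awareness matters because an overriding method must keep the name of the method it overrides. If that method is declared in a library class the obfuscator cannot rewrite, such as anything in `java.*`, the override has to keep its original name. The sketch below is an assumption about how such a check might look, not JOBE's method: it uses reflection to walk a class's supertypes and reports which of its methods match a library method by name and parameter types. It ignores subtleties such as package-private library methods, so it errs on the side of keeping names; `Ticker` is a made-up example class.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

public class OverrideCheck {

    /** Hypothetical application class: run() and toString() override library methods; tick() does not. */
    public static class Ticker implements Runnable {
        int ticks;
        void tick() { ticks++; }
        @Override public void run() { tick(); }
        @Override public String toString() { return "Ticker(" + ticks + ")"; }
    }

    /** Assumed rule: platform classes cannot be rewritten, so their method names are fixed. */
    static boolean isLibrary(Class<?> c) {
        String name = c.getName();
        return name.startsWith("java.") || name.startsWith("javax.");
    }

    private static boolean declares(Class<?> type, Method m) {
        try {
            type.getDeclaredMethod(m.getName(), m.getParameterTypes());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Names of c's instance methods that match a method declared in some library supertype. */
    static Set<String> mustKeepNames(Class<?> c) {
        Set<String> keep = new TreeSet<>();
        for (Method m : c.getDeclaredMethods()) {
            int mod = m.getModifiers();
            if (m.isSynthetic() || Modifier.isStatic(mod) || Modifier.isPrivate(mod)) continue;
            Deque<Class<?>> supertypes = new ArrayDeque<>();
            if (c.getSuperclass() != null) supertypes.add(c.getSuperclass());
            Collections.addAll(supertypes, c.getInterfaces());
            while (!supertypes.isEmpty()) {
                Class<?> s = supertypes.poll();
                if (isLibrary(s) && declares(s, m)) {
                    keep.add(m.getName());
                    break;
                }
                if (s.getSuperclass() != null) supertypes.add(s.getSuperclass());
                Collections.addAll(supertypes, s.getInterfaces());
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        System.out.println(mustKeepNames(Ticker.class));   // prints [run, toString]
    }
}
```

Only `tick` is free to be renamed; an obfuscator that renamed `run` would silently break the `Runnable` contract.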

But as Bob Foster of Symantec Internet Tools remarks:

If you have code you want to hide, [then] run it on your server. Don't publish it (bytecodes, object code, obfuscated, whatever) or others will be able to read it.

Unfortunately, splitting code between the Web server and the browser adds more load to the Web server, attracts unwanted visitors, and is not very efficient for small applets.

The only sure-fire way of safeguarding an applet is to encrypt it. Breaker Technologies is currently promoting SoftSeal, which enables content providers to seal their files within a layer of encryption, along with details of where the files were originally created and where licenses may be purchased. Decompilation is almost impossible because of the encryption, and altering individual bytes is useless because of the digital signature. However, SoftSeal requires a SoftSeal-enabled Web browser. Neither Internet Explorer nor Navigator is currently SoftSeal enabled, and it is unclear whether the applet is vulnerable once it is finally decrypted.
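SoftSeal's format isn't described here. A common general technique with a similar shape, sketched below under stated assumptions, ships a class file only in encrypted form and decrypts it inside a custom `ClassLoader` immediately before `defineClass`. AES-GCM (which also rejects tampered bytes), the in-memory key, and the nested `Payload` class are illustrative choices; a real scheme would have to deliver the key to the client, which is the hard part. The sketch also shows why the open question above matters: the plaintext bytecode has to exist in memory at the moment `defineClass` is called, where anyone able to instrument the loader can copy it. It assumes the class files are on the classpath so they can be read back as resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.function.Supplier;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class SealedClassDemo {

    /** Stand-in for the applet whose bytecode is shipped only in encrypted form. */
    public static class Payload implements Supplier<String> {
        public Payload() { }
        @Override public String get() { return "Hello Web Techniques"; }
    }

    /** Decrypts sealed class bytes and defines the class; the plaintext exists here, in memory. */
    static final class DecryptingLoader extends ClassLoader {
        private final SecretKey key;

        DecryptingLoader(SecretKey key, ClassLoader parent) {
            super(parent);
            this.key = key;
        }

        Class<?> defineSealed(String name, byte[] iv, byte[] sealed) throws GeneralSecurityException {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] plain = cipher.doFinal(sealed);                 // throws if any byte was altered
            return defineClass(name, plain, 0, plain.length);
        }
    }

    /** Reads a (possibly nested) class's .class file from the classpath. */
    static byte[] classBytes(Class<?> c) {
        String file = c.getName().substring(c.getName().lastIndexOf('.') + 1) + ".class";
        try (InputStream in = c.getResourceAsStream(file)) {
            if (in == null) throw new IllegalStateException("class file not found for " + c.getName());
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Seals Payload's bytecode, reloads it through the decrypting loader, and calls it. */
    static String runSealed() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            SecretKey key = generator.generateKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] sealed = cipher.doFinal(classBytes(Payload.class));   // what would be shipped

            DecryptingLoader loader = new DecryptingLoader(key, SealedClassDemo.class.getClassLoader());
            Class<?> loaded = loader.defineSealed(Payload.class.getName(), iv, sealed);
            @SuppressWarnings("unchecked")
            Supplier<String> applet = (Supplier<String>) loaded.getDeclaredConstructor().newInstance();
            return applet.get();
        } catch (ReflectiveOperationException | GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runSealed());
    }
}
```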

Conclusion

Melodramatic as it may sound, putting an applet on a Web page is currently equivalent to distributing your source code. Asking the user to sign a licensing agreement or passing the applet through an obfuscator will not deter everyone from decompiling your code. More efficient decompilers, written in Java or in other languages that are more adept at pattern matching, will appear, if they don't already exist.

A new version of Mocha was promised, but it never appeared because of the author's untimely death. According to his wife, van Vliet did sell the source code for both Mocha and Crema before he died; one or both of these products are likely to be incorporated into Borland's JBuilder. Other, less effective decompilers, such as WingDis and DejaVu, already exist, and more are likely to appear, perhaps even Mocha 2 as part of JBuilder.

Decompiling other kinds of executable code is not impossible. Anyone who can read assembler can reverse-engineer code into a usable format, and anyone who has worked with compilers and debuggers can begin to spot patterns and start guessing code. But the sheer size of a typical project and the lack of intelligible variable names normally make decompiling programs written in other languages too difficult. Unfortunately, this doesn't apply to most Java applets or applications. However, obfuscation, combined with some of the other ideas I've mentioned, may just make decompiling your code too irritating for any rational human being to attempt.



Godfrey is the technical director of Internet Business Ireland. He is also the Internet-security columnist for Software Futures magazine and is writing a book on Web servers for Windows 95, due out sometime this year. You can reach him at godfrey.ibi.ie.

"Decompile Once, Run Anywhere"
By Godfrey Nolan
Web Techniques, Sept 1997

Web Techniques grants permission to use these listings for private or 
commercial use provided that credit to Web Techniques and the author is 
maintained within the comments of the source. For questions, contact
[email protected].
NOLAN

LISTING ONE


public class Hello
{
    public static void main(String[] args)
    {
        // simple example to demonstrate
        // Mocha decompiler

        String welcome[] = new String[2];
        welcome[0] = "Hello Web Techniques";
        welcome[1] = "From Dublin, Ireland";

        int i;

        for (i=0; i < welcome.length; i++)
            System.out.println(welcome[i]);
    }
}



LISTING TWO



Compiled from Hello.java
public class Hello extends java.lang.Object {
    public static void main(java.lang.String []);
    public Hello();

Method void main(java.lang.String [])
   0 iconst_2
   1 anewarray class #6 <Class java.lang.String>
   4 astore_1
   5 aload_1
   6 iconst_0
   7 ldc #1 <String "Hello Web Techniques">
   9 aastore
  10 aload_1
  11 iconst_1
  12 ldc #2 <String "From Dublin, Ireland">
  14 aastore
  15 iconst_0
  16 istore_2
  17 goto 32
  20 getstatic #9 <Field java.lang.System.out Ljava/io/PrintStream;>
  23 aload_1
  24 iload_2
  25 aaload
  26 invokevirtual #10 <Method java.io.PrintStream.println(Ljava/lang/String;)V>
  29 iinc 2 1
  32 iload_2
  33 aload_1
  34 arraylength
  35 if_icmplt 20
  38 return

Method Hello()
   0 aload_0
   1 invokenonvirtual #8 <Method java.lang.Object.<init>()V>
   4 return

}



LISTING THREE


/* Decompiled by Mocha from Hello.class */
/* Originally compiled from Hello.java */

import java.io.PrintStream;

public class Hello
{
    public static void main(String astring1[])
    {
        String astring2[] = new String[2];
        astring2[0] = "Hello Web Techniques";
        astring2[1] = "From Dublin, Ireland";
        for (int i = 0; i < astring2.length; i++)
            System.out.println(astring2[i]);
    }

    public Hello()
    {
    }
}

Online


Dcc

www.cs.uq.edu.au/groups/csm/dcc.html

Obfuscators

HoseMocha

www.math.gatech.edu/~mladue/HoseMocha.java

Crema

java.cern.ch:80/CremaE1/DOC/index.html

HashJava

www.sbktech.org/hashjava.html

Decompilers

Mocha

web.inter.NL.net/users/H.P.van.Vliet

WingDis

www.wingsoft.com/products.shtm

DejaVu

www.isg.de/OEW/Java/


Sidebar


A Tercentennial


By Hanpeter van Vliet

Hanpeter van Vliet was the author of Mocha, the controversial Java decompiler. The first beta version of Mocha was released in June 1996, to no great fanfare. However, when its existence was reported in C|net in August, a furor arose in the Java-development community.

Van Vliet subsequently removed the decompiler from his site and wrote "A Tercentennial," a manifesto of sorts which he published in The Local, a "virtual pub" located at the Java UK Experience site (java.motiv.co.uk).

Van Vliet then held a vote to determine whether Mocha should be re-posted. The response was overwhelmingly in favor of its return, and it reappeared on his site along with Crema, an obfuscator designed to protect class files against Mocha.

Not long afterward, many Mocha links inexplicably "went 404." Attempts to track down subsequent versions of Mocha and Crema came up short. Finally, it became known that Hanpeter had succumbed to the cancer he had been battling. He passed away on December 31, 1996 at the age of 34.

Mocha and Crema are still available at web.inter.NL.net/users/H.P.van.Vliet/. "A Tercentennial" is reprinted here in its entirety with permission from Motiv Systems, Ltd.

The Dutch have a reputation for stealing coffee. Exactly three centuries ago, in 1696, my ancestors stole a coffee plant from the heavily guarded plantations of Mocha (Yemen). They shipped it to their East Indian colony and cultivated it into a unique and successful species that would become known as "Java."

So what could be more appropriate to celebrate this than to release Mocha, the Java decompiler, this year? And isn't it apt that both the Java compiler and decompiler were written by Dutchmen? Not everybody seems to agree.

I should have known. By American standards, three hundred years ago is prehistoric. Coffee should be weak and instant, and in an oversized mug (with plenty of free refills). A cup of Mocha was bound to upset some stomachs. Whoever brewed that cup was liable for damages.

What's the fuss? Mocha is a Java decompiler, a program that reconstructs source code from binary classes. Although there are decompilers for many languages (Visual Basic, C, Clipper, Smalltalk, to name a few) the situation with Java is rather unique.

First of all, by design, Java's compiled classes contain an exceptional amount of symbolic information. Class names, field names, method names, and method signatures are necessary for the runtime linking of classes. In addition, data types and exceptions are required for the bytecode-verification process to ensure that downloaded programs play by the rules of the language. More symbolic information also means more meaningful decompiler output. And because Java programs—applets—are typically small, the absence of comments in the source code is hardly an obstacle to understanding.

Secondly, compiled Java programs are free. In fact, you get them without asking for them. This obviously does not help to convince the recipient that they have any value. Like the plastic toys you find in a cereal box (if the kids give you a chance), they may look cute, but they always seem worthless. It can be tempting to use the interesting parts of that free stuff in creative ways.

Last but not least, Java is in a rather explosive phase. Many companies are attempting to stake out a part of that expanding market. A small advantage in know-how could prove essential in establishing yourself over your competitors. Making your source code available to the world is not the smartest move, but in a way that's what you're doing if you distribute binary classes.

All in all, there is a very low threshold to "borrowing" code, and at the same time, the "tactical" value of a few lines of code is, apparently, considered enormous.

Enough reason for two companies to threaten to sue me for damages. Not immediately recognizing this as a knee-jerk reaction, I responded by temporarily removing Mocha from my site. I needed some time to figure out whether the claims could be substantiated, and I wanted to give developers a time-out to get over the shock. In the meantime, I have discovered that only a small (but vocal) minority objects to Java decompilers, and that their moaning has no legal basis.

In other words, Mocha will be back. In fact, it will be meaner than ever.

I'm not going to defend Mocha. It defends itself. Even if its existence cannot be justified in other ways, it at least drives home the point that compilation is not a good way to hide your secrets. That is a valuable insight both for implementers of security (remember the hole in Netscape's first SSL implementation?) and for commercial applet developers. Attempting to ban decompilers--to the extent that they are only available to criminals--is ostrich policy.

A smarter response is to find ways to deal with decompilers. In the area of cryptography, the answer was found long ago in public-key algorithms. Knowledge of the algorithm (and the key) simply does not help you to break the cipher. Hence, publication of the algorithm (whether explicitly or via reverse engineering) is safe.

Commercial software developers have two options. Whether they use Java or not, the only way to keep their algorithms absolutely confidential is not to distribute them. Partitioning an application into client and server modules is one way to do that, and with Java this is becoming easier than ever. But it has some obvious drawbacks: it increases load on the server, and it is overkill for little applets.

The alternative is to accept the risk of reverse engineering (as we've done for years) but to make it as hard as possible. To that end, it would help if the difference in abstraction between source code and object code were large: the more complex the transformation done by the compiler, the more difficult the reverse transformation. Unfortunately, the Java language is not very abstract, and the bytecode was designed to be close to the language.

With Java, it seems that the best thing you can do is to remove as much symbolic information as possible from your program. Or better yet, replace the symbolic information with invalid identifiers like numbers and keywords. This does not stop a decompiler, but it does make its output unintelligible to both humans and compilers. This is a good opportunity to plug my Java obfuscator, called "Crema." I used an early version of this program to protect Mocha itself against decompilation. If you happen to have a copy of Mocha, you'll notice that most of its classes have numbers rather than names. Decompilation of such a class results in an interesting potpourri of numbers, not in a valid Java source. Crema has since been refined in many ways, and will soon be released.
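Numeric and keyword names defeat recompilation because the JVM's naming rules are looser than the Java language's. Under the current JVM specification, an unqualified field or method name must be non-empty and must not contain `.`, `;`, `[`, or `/` (method names also may not contain `<` or `>`, apart from `<init>` and `<clinit>`), so `1` or `if` is acceptable in a class file, while the `javax.lang.model.SourceVersion` checks used below reject both as Java source identifiers. The sketch compares the two rules for method names; it is a simplification, not Crema's code.

```java
import javax.lang.model.SourceVersion;

public class NameCheck {

    /** Simplified JVM-specification rule for a method name in a class file. */
    static boolean isLegalJvmMethodName(String name) {
        if (name.equals("<init>") || name.equals("<clinit>")) return true;
        if (name.isEmpty()) return false;
        for (char ch : name.toCharArray())
            if (ch == '.' || ch == ';' || ch == '[' || ch == '/' || ch == '<' || ch == '>') return false;
        return true;
    }

    /** True if the name can appear as an identifier in recompilable Java source. */
    static boolean isLegalJavaIdentifier(String name) {
        return SourceVersion.isIdentifier(name) && !SourceVersion.isKeyword(name);
    }

    public static void main(String[] args) {
        for (String name : new String[] { "nextMessage", "1", "if", "a.b" })
            System.out.printf("%-12s class file: %-5b Java source: %b%n",
                    name, isLegalJvmMethodName(name), isLegalJavaIdentifier(name));
    }
}
```

A decompiler can still print a method called `1`, but the resulting source will not compile until someone renames every such identifier by hand.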



