

Decompile Once, Run Anywhere (Web Techniques, Sep 1997)



Decompile Once, Run Anywhere

Protecting Your Java Source

By Godfrey Nolan

Partly because of the media controversy over Internet fraud and partly because of the language's history, Java is one of the most secure languages for Internet applications. Applets cannot read or write to the hard disk of the machine running the browser, call native code, or connect to any server other than the one they were downloaded from. Programmers have no access to memory pointers, so their code can't stray outside the bounds of its applet or application.

That's not to say that Java has no security holes, or that security isn't a moving target, but JavaSoft is working hard to limit hostile applets. JavaBeans, for example, will use the sandbox concept, in which signed applets cannot be executed until the user is happy with their credentials. Quite rightly, Java is seen as a much safer option than ActiveX controls.

The number of applets appearing on Web pages is increasing rapidly. Sun claims that there are now tens of thousands of Web pages with Java applets. With the advent of the JavaStation and cross-platform office suites from Lotus and Corel, Java has begun to shift away from the Web site and onto the desktop. But is an applet or application safe from prying eyes? No. The focus has always been on safeguarding the end user's security; the developer has largely been ignored.

One reason the number of Web pages increased exponentially, and Web design improved by leaps and bounds, was that the HTML or client-side scripting behind most interesting Web effects could easily be copied by viewing the source. It was taken for granted, however, that Java applets, ActiveX controls, and any server-side scripting code would not be visible.

Now, several decompilers have been written that take advantage of the large amount of symbolic information contained within an applet. In most cases, the decompiled code is missing only the comments, and often, the new code is tidier than the original. Possibly, this will level the playing field once again.

The Truth about Decompiling

Java's strength is its portability, which stems from its two-stage process: source code is compiled into intermediate bytecode, stored in a "class file," which can be either an applet or an application. A Java Virtual Machine (JVM) then interprets the class file. The class file is platform independent; it's the JVM that changes from platform to platform. JVMs have been built for most of the major and many of the minor operating systems, making Java class files very portable. But what has often been touted as Java's greatest advantage may well turn out to be its Achilles' heel.
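Because the class file is the same on every platform, even its first bytes are fixed: the magic number 0xCAFEBABE, then a minor and a major version number, all big-endian. The sketch below (the class and method names are our own, not part of any tool mentioned here) reads that header. Class files produced by JDK 1.0.2 and 1.1 report major version 45.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassHeader {
    /** The fixed-format fields at the start of every class file. */
    static final class Header {
        final long magic;
        final int minor;
        final int major;

        Header(long magic, int minor, int major) {
            this.magic = magic;
            this.minor = minor;
            this.major = major;
        }

        boolean isClassFile() {
            return magic == 0xCAFEBABEL;
        }
    }

    /** Reads the big-endian magic number and version from the first 8 bytes. */
    static Header parseHeader(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            long magic = in.readInt() & 0xFFFFFFFFL; // treat as an unsigned 32-bit value
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            return new Header(magic, minor, major);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Inspect this program's own compiled class file, if the class loader exposes it.
        String name = ClassHeader.class.getSimpleName() + ".class";
        try (InputStream in = ClassHeader.class.getResourceAsStream(name)) {
            if (in == null) {
                System.out.println("class file not available as a resource");
                return;
            }
            Header h = parseHeader(in.readNBytes(8));
            System.out.printf("magic=%x version=%d.%d%n", h.magic, h.major, h.minor);
        }
    }
}
```

The same eight bytes open a class file whether it was compiled on Solaris, Windows, or a Mac, which is what lets one decompiler handle them all.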

To get a better idea of what exactly is in a class file, we need to turn to the Java disassembler, javap, which comes with Sun's original JDK. javap -p prints the methods and variables of the class, whereas javap -c disassembles the bytecode into JVM instructions annotated with the methods and variables. Listing One shows the Java source code for a simple "hello world" example, and Listing Two is a partial listing of the output from javap -c. Source code and bytecode can also be displayed together using Visual J++'s disassembly window; see Figure 1.
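javap is a command-line tool, but the same symbolic information can be seen from inside Java through reflection. As a rough stand-in for what javap -p lists (this is reflection, not how javap itself works), the sketch below prints every declared field and method, private members included, of a hypothetical LicensedWidget class. Local variable names are not exposed this way; javap -l shows them only when the class was compiled with -g.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SymbolDump {
    // Hypothetical class standing in for a downloaded applet.
    static class LicensedWidget {
        private String secretKey = "XYZZY";
        private int visitCount;

        private boolean checkLicense(String host) {
            return host.endsWith("example.com");
        }

        public void paintBanner() {
        }
    }

    // Modifier keywords followed by a space, or nothing for package-private members.
    private static String mods(int modifiers) {
        String s = Modifier.toString(modifiers);
        return s.isEmpty() ? "" : s + " ";
    }

    /** Lists declared fields and methods, private ones included, roughly as javap -p does. */
    static List<String> describe(Class<?> c) {
        List<String> out = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            out.add(mods(f.getModifiers()) + f.getType().getSimpleName() + " " + f.getName());
        }
        for (Method m : c.getDeclaredMethods()) {
            String params = Arrays.stream(m.getParameterTypes())
                    .map(Class::getSimpleName)
                    .collect(Collectors.joining(", "));
            out.add(mods(m.getModifiers()) + m.getReturnType().getSimpleName()
                    + " " + m.getName() + "(" + params + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        describe(LicensedWidget.class).forEach(System.out::println);
    }
}
```

Names such as secretKey and checkLicense survive compilation because other classes and the JVM must be able to link to them, and that is exactly the information a decompiler starts from.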

Unfortunately for the developer, the JVM instruction set is limited and the stack machine is relatively straightforward. It doesn't take long to understand how simple source code is compiled into bytecode and, consequently, how someone else's bytecode maps back to source code. To encourage companies to build JVMs, the VM specification is freely available on the JavaSoft FTP server and in book form from Addison-Wesley. With this specification, programmers can write their own simple decompiler to reverse-engineer Listing Two into Listing One. But why write your own decompiler when several are already available?
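To see why a simple decompiler is feasible, note that javac turns an expression into stack operations in a predictable order. The sketch below is our own illustration, supporting only a handful of int opcodes written as javap -c prints them: it runs the operand stack symbolically, pushing source text instead of values, and so recovers a return statement from straight-line bytecode. The opcode names are real JVM instructions; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class TinyDecompiler {
    // Source operators for a few int arithmetic opcodes.
    private static final Map<String, String> BINARY =
            Map.of("iadd", "+", "isub", "-", "imul", "*", "idiv", "/", "irem", "%");

    /**
     * Rebuilds a return statement from straight-line int bytecode by simulating
     * the operand stack with source-code strings instead of values.
     */
    static String decompile(List<String> code, String[] locals) {
        Deque<String> stack = new ArrayDeque<>();
        for (String line : code) {
            String[] parts = line.trim().split("\\s+");
            int i = parts[0].endsWith(":") ? 1 : 0; // skip a javap-style "5:" offset
            String op = parts[i];
            if (op.startsWith("iload_")) {
                stack.push(locals[Integer.parseInt(op.substring("iload_".length()))]);
            } else if (op.startsWith("iconst_")) {
                stack.push(op.substring("iconst_".length()).replace("m1", "-1"));
            } else if (op.equals("bipush") || op.equals("sipush")) {
                stack.push(parts[i + 1]);
            } else if (BINARY.containsKey(op)) {
                String right = stack.pop(); // the top of the stack is the right operand
                String left = stack.pop();
                stack.push("(" + left + " " + BINARY.get(op) + " " + right + ")");
            } else if (op.equals("ireturn")) {
                return "return " + stack.pop() + ";";
            } else {
                throw new IllegalArgumentException("unsupported opcode: " + op);
            }
        }
        throw new IllegalStateException("no ireturn found");
    }

    public static void main(String[] args) {
        // javac compiles "static int area(int w, int h) { return w * h + 2; }" to:
        List<String> code = List.of("0: iload_0", "1: iload_1", "2: imul",
                "3: iconst_2", "4: iadd", "5: ireturn");
        System.out.println(decompile(code, new String[] {"w", "h"}));
        // prints: return ((w * h) + 2);
    }
}
```

A real decompiler also reconstructs control flow, drops redundant parentheses, and parses the binary class file rather than text, but the core idea is the same.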

Decompiler tools have long been available for languages from C to Visual Basic. For example, dcc, from the University of Queensland's Department of Computer Science, decompiles .exe files into C programs. Hexadecimal editors have also been around for a long time, and generations of hackers have used hex dumps to patch PC-based software to get around licensing restrictions. (Also see " Online".)

Java decompilers are particularly effective for three reasons: applets are typically small, so the reverse-engineered code is easy to understand even without comments; they must contain a lot of symbolic information to satisfy the JVM; and users are actively encouraged to download them at no cost. All that was needed was a tool that took advantage of Java's design.

Decompiler Tools

The best-known Java decompiler is Mocha, written by Hanpeter van Vliet. Van Vliet intended Mocha to be used for learning purposes rather than for using applets without the original programmer's permission. Nevertheless, it caused quite a controversy; see the sidebar entitled "A Tercentennial." Mocha, a set of Java classes run from the Java command-line interpreter, is about as easy to install as the JDK and surprisingly effective for a beta shareware program. Of ten sample applets downloaded from the Web, Mocha decompiled six completely and recovered a significant amount of code from the remaining four. Listing Three shows how Mocha dealt with our example in Listing One.

Commercial decompilers are also available. IBM wrote a decompiler called Jive, but its whitepaper, "Jive: A Java Decompiler," has been classified and taken off IBM's Web site. Java2Rose, a tool that originally worked with Rational Rose C++, has now been built into Rational's Java compiler, Rose/Java. Rational Rose2Java was about as effective as Mocha on our sample of applets. But IceBreaker from Breaker Technologies was positively deadly when used in conjunction with Mocha.

IceBreaker, which is not readily available, consists of three windows; see Figure 2. The top window contains the source code, typically generated by a failed attempt using Mocha. The bottom two panes contain bytecodes: compiling the source code in the top window generates the bytecode in the bottom-left pane, and the target bytecode is in the bottom-right pane. Differences between the two sets of bytecodes are shown in red. After examining the differences, the hacker guesses the source code, recompiles, and gradually reduces the number of red lines. As Martin Lambert of Breaker Technologies puts it, "build frequently to see the red lines going away." IceBreaker is much easier to use than it looks, because the JVM is a simple stack machine with a limited number of possible bytecodes. With a little practice, IceBreaker will decompile almost any small applet.
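IceBreaker's internals aren't described here, so as an assumption the sketch below approximates its "red lines" with a longest-common-subsequence diff: any instruction in either listing that the diff cannot pair up counts as red. The listings are assumed to be normalized (offsets and constant-pool indices stripped), and the names RedLines and redLines are hypothetical.

```java
import java.util.List;

public class RedLines {
    /**
     * Counts instructions that a longest-common-subsequence diff cannot pair up
     * between the candidate's bytecode and the target bytecode.
     */
    static int redLines(List<String> candidate, List<String> target) {
        int n = candidate.size(), m = target.size();
        int[][] lcs = new int[n + 1][m + 1]; // lcs[i][j]: LCS of the suffixes starting at i and j
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = candidate.get(i).equals(target.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        return (n - lcs[0][0]) + (m - lcs[0][0]);
    }

    public static void main(String[] args) {
        List<String> target = List.of("iload_0", "iload_1", "imul", "iconst_2", "iadd", "ireturn");
        // First guess at the source: "return w + h * 2;"
        List<String> guess1 = List.of("iload_0", "iload_1", "iconst_2", "imul", "iadd", "ireturn");
        // Second guess: "return w * h + 2;"
        List<String> guess2 = List.of("iload_0", "iload_1", "imul", "iconst_2", "iadd", "ireturn");
        System.out.println(redLines(guess1, target)); // prints 2
        System.out.println(redLines(guess2, target)); // prints 0
    }
}
```

Each recompile of a better guess drives the count toward zero, which is the feedback loop Lambert describes.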

Implications

Many applets use a combination of getCodeBase, getDocumentBase, hash codes, and/or server authentication to stop anyone from downloading an applet and then running it on another Web server. But if the applet is decompiled, any licensing restrictions can be removed before it is used elsewhere. Applets can be lifted without anyone knowing. Even if the applet doesn't decompile completely, useful code can be merged into another applet for later use.
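No particular scheme is given here, so the sketch below assumes a simple one: the host in the applet's code base must hash, via String.hashCode, to a key compiled into the class. The names HostLock and isLicensed are hypothetical, and a plain URL string parsed with java.net.URI stands in for the URL an applet would get from getCodeBase().

```java
import java.net.URI;
import java.util.Locale;

public class HostLock {
    /**
     * Hypothetical license check: the applet runs only if the host it was loaded
     * from hashes to the key compiled into it. In a real applet, codeBase would
     * come from getCodeBase().toString().
     */
    static boolean isLicensed(String codeBase, int licenseKey) {
        String host = URI.create(codeBase).getHost(); // null for e.g. file: URLs
        return host != null && host.toLowerCase(Locale.ROOT).hashCode() == licenseKey;
    }

    public static void main(String[] args) {
        // In a shipped applet this would be an int literal, not a computed value.
        int licenseKey = "www.example.com".hashCode();
        if (!isLicensed("http://mirror.example.org/applets/", licenseKey)) {
            // After decompiling, deleting this one if-block is all it takes to lift the applet.
            System.out.println("This applet is not licensed for this server.");
            return;
        }
        System.out.println("Applet running.");
    }
}
```

However the key is computed, the check ends up as a single conditional in the decompiled source, which is why it offers so little protection.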

Microsoft is well aware of Java's decompilation problems, but it insists that ActiveX and Java are compatible, not competitive, technologies. ActiveX controls do not have the same security restrictions as Java. They rely on the Authenticode system, where each control is digitally signed. If a signed ActiveX control does anything malicious, the software publisher can theoretically be tracked down and shot (sorry, prosecuted). An unsigned control is flagged as dangerous, but nothing prevents it from being downloaded. For all ActiveX's lack of security, it is much more difficult to reverse-engineer an ActiveX control. Expect to see an increase in ActiveX controls as more developers become aware of this issue and ActiveX becomes available on more platforms.

Last year, Corel paid Novell $170 million for the WordPerfect product suite. WordPerfect, Quattro, and a personal information manager have now all been completely rewritten in Java and will hopefully become the first of many new programs available in a single cross-platform version. By making a prerelease copy of Office for Java available on its Web site, Corel has essentially allowed Web users to download huge chunks of its source code. Eric Lefebvre of Corel contends that the company is "not worried about the disassembling of Java," that it is "well aware of Mocha" but "will continue to offer Corel Office for Java from our Web site to all."

Protect Yourself!

As a first step, you might try compiling your applet or application as a release version rather than a debug version in products like Symantec Cafe. A release build typically omits the debugging tables that record local variable names, so decompilers have to supply their own names; but as you can see in Listing Three, the code is still understandable. You might also consider writing code that takes advantage of things that decompilers don't like, such as casts. Try adding random code segments that don't do anything. This might act as a fingerprint and form the basis for legal proceedings at a later date. You can compile the Java source code into an executable or object code using Asymetrix's SuperCede, but this will render the application nonportable.
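A minimal sketch of the fingerprinting idea, assuming a hypothetical class and an arbitrary 64-bit mark of our own choosing: javac keeps unused fields and emits do-nothing loops, so both survive compilation and are reproduced in a decompiled copy. The checker looks for the mark in any static long field, so simply renaming the field (as an obfuscator would) does not hide it, although anyone who recognizes the field as dead code can delete it.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class Fingerprinted {
    // Hypothetical fingerprint: a distinctive constant that the program never reads.
    // javac keeps unused fields, so it stays in the class file.
    private static final long FINGERPRINT = 0x5EEDC0DE19970901L;

    // A do-nothing code segment: the result is discarded, but javac still emits
    // the loop, so its odd constants also appear in javap -c output.
    @SuppressWarnings("unused")
    private static void watermark() {
        int x = 0;
        for (int i = 0; i < 3; i++) {
            x ^= 0x2A17 + i;
        }
    }

    /** Scans a suspect class's static long fields, whatever their names, for the mark. */
    static boolean hasFingerprint(Class<?> suspect, long mark) {
        for (Field f : suspect.getDeclaredFields()) {
            if (f.getType() != long.class || !Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            if (!f.trySetAccessible()) {
                continue; // e.g. fields of JDK classes in other modules
            }
            try {
                if (f.getLong(null) == mark) {
                    return true;
                }
            } catch (IllegalAccessException e) {
                // not readable; keep looking
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasFingerprint(Fingerprinted.class, 0x5EEDC0DE19970901L)); // prints true
    }
}
```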

To hinder or stop Java decompilation, you can use van Vliet's Java class obfuscator, Crema, which scrambles the variable and method names so that the decompiled source code is unintelligible gibberish. Obfuscated or not, however, the resultant decompiled code is still source code, which can always be manipulated back into a readable form.
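Crema works on class files, and its exact renaming scheme isn't described here. Purely to illustrate the principle, the sketch below builds a consistent table mapping meaningful identifiers to short meaningless ones (a, b, ..., z, aa, ...) and applies it to a line of source text in a single pass. The class and method names are hypothetical; a real obfuscator must also leave alone names that the JVM or other classes depend on, such as main, public API methods, and overridden library methods.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RenameSketch {
    /** Maps 0, 1, ..., 25, 26, 27, ... to a, b, ..., z, aa, ab, ... */
    static String shortName(int index) {
        StringBuilder sb = new StringBuilder();
        for (int n = index + 1; n > 0; n = (n - 1) / 26) {
            sb.append((char) ('a' + (n - 1) % 26));
        }
        return sb.reverse().toString();
    }

    /** Builds a consistent old-name to new-name table, as an obfuscator must. */
    static Map<String, String> renameTable(List<String> names) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String name : names) {
            table.putIfAbsent(name, shortName(table.size()));
        }
        return table;
    }

    /** Applies the table to source text in one pass, matching whole identifiers only. */
    static String apply(String source, Map<String, String> table) {
        if (table.isEmpty()) {
            return source;
        }
        String alternatives = table.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern p = Pattern.compile("\\b(?:" + alternatives + ")\\b");
        return p.matcher(source).replaceAll(m -> Matcher.quoteReplacement(table.get(m.group())));
    }

    public static void main(String[] args) {
        Map<String, String> table = renameTable(List.of("total", "price", "quantity"));
        System.out.println(apply("int total = price * quantity;", table)); // prints int a = b * c;
    }
}
```

The single-pass replacement matters: renaming one identifier at a time could rename a name that an earlier replacement had just produced. As the paragraph above notes, the result is still source code; only the hints a human reader relies on are gone.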

Running our example in Listing One through Crema made it impossible for Mocha to decompile; it gave up with a java.lang.NullPointerException error. Not bad for a language that doesn't support pointers. Nevertheless, the application executes as normal.

JOBE, another obfuscator, written by Erik Jokipii of the University of California at San Diego, claims to remove symbolic information from class files and to be aware of a package's class hierarchy, making it safe to use on large projects.

But as Bob Foster of Symantec Internet Tools remarks:

If you have code you want to hide, [then] run it on your server. Don't publish it (bytecodes, object code, obfuscated, whatever) or others will be able to read it.

Unfortunately, splitting code between the Web server and the browser adds more load to the Web server, attracts unwanted visitors, and is not very efficient for small applets.

The only sure-fire way of safeguarding an applet is to encrypt it. Breaker Technologies is currently promoting SoftSeal, which enables content providers to seal their files within a layer of encryption along with details of where they were originally created and where licenses may be purchased. Decompilation is almost impossible because of the encryption; altering individual bytes is also useless because of the digital signature. However, SoftSeal needs a SoftSeal-enabled Web browser. Neither Internet Explorer nor Navigator is currently SoftSeal enabled, and it is unclear whether the applet is vulnerable once it is eventually decrypted.
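SoftSeal's file format isn't documented here, so the following is a generic sketch of the idea using the standard JCE rather than SoftSeal itself: class-file bytes are encrypted with AES in GCM mode, whose authentication tag makes decryption fail if any byte has been altered. A GCM tag proves integrity only to holders of the same secret key; it is not a public-key digital signature of the kind described above. The class name SealSketch and the sample bytes are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class SealSketch {
    private static final int IV_BYTES = 12;  // standard GCM nonce length
    private static final int TAG_BITS = 128; // authentication tag length

    /** Encrypts class-file bytes; the output is nonce followed by ciphertext-with-tag. */
    static byte[] seal(byte[] classBytes, SecretKey key) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] body = cipher.doFinal(classBytes);
        byte[] sealed = Arrays.copyOf(iv, IV_BYTES + body.length);
        System.arraycopy(body, 0, sealed, IV_BYTES, body.length);
        return sealed;
    }

    /** Decrypts and verifies; any altered byte makes doFinal throw AEADBadTagException. */
    static byte[] unseal(byte[] sealed, SecretKey key) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_BYTES));
        return cipher.doFinal(sealed, IV_BYTES, sealed.length - IV_BYTES);
    }

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Stand-in for a real class file: just its magic number and a version.
        byte[] classBytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 3, 0, 45};
        SecretKey key = newKey();
        byte[] sealed = seal(classBytes, key);

        // A sealing-aware loader would hand these plaintext bytes to defineClass.
        System.out.println(Arrays.equals(unseal(sealed, key), classBytes)); // prints true

        sealed[sealed.length - 1] ^= 1; // tamper with a single byte
        try {
            unseal(sealed, key);
            System.out.println("tampering went unnoticed");
        } catch (AEADBadTagException e) {
            System.out.println("tampering detected");
        }
    }
}
```

Note the last step any sealing-aware browser must take: the decrypted bytes have to reach ClassLoader.defineClass in plaintext, which is exactly where the open question about the unencrypted applet arises.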

Conclusion

Melodramatic as it may sound, putting an applet on a Web page is currently equivalent to distributing your source code. Asking the user to sign a licensing agreement or passing the applet through an obfuscator will not deter everyone from decompiling your code. More efficient decompilers, written in Java or in other languages more adept at pattern matching, will appear or possibly already exist.

A new version of Mocha was promised, but this never happened because of the author's untimely death. According to his wife, van Vliet did sell the source code for both Mocha and Crema before he died; one or both of these products are likely to be incorporated into Borland's JBuilder. Other, less effective decompilers already exist, such as WingDis and DejaVu, and it's likely that more will appear, perhaps even Mocha 2 as part of JBuilder.

Decompiling other kinds of executable code is not impossible. Anyone who can read assembler can reverse-engineer code into a usable format. Anyone who has worked with compilers and debuggers can begin to spot patterns and start guessing code. But the sheer size of a project and the lack of intelligible variable names normally make it too difficult to decompile other languages. Unfortunately, this doesn't apply to most Java applets or applications. However, obfuscating your code and applying some of the other ideas I've mentioned may just make it too irritating for any rational human being to attempt to decompile it.



Godfrey is the technical director of Internet Business Ireland. He is also the Internet-security columnist for Software Futures magazine and is writing a book on Web servers for Windows 95, due out sometime this year. You can reach him at godfrey.ibi.ie.

