Technical Paper

Fast Virtual Machine Architecture


Contents

» Introduction
» Benefits of the Fast VM
» High Performance Capabilities
» Architecture

» Overview
» Command Processor
» Runtime Subsystem
» Reusable Components

» Object Factory
» Operating System Services
» Compiler and Symbol Table

» Garbage Collection
» Mark-and-Sweep
» Copying
» Mostly Copying
» Summary

INTRODUCTION

The Fast Virtual Machine (Fast VM) is the hp® next-generation Just-In-Time (JIT) compiler, designed to increase Java application performance. By generating native code for methods as they are invoked, Fast VM typically runs an application 4 to 5 times faster than the same application run with the conventional Java Development Kit (JDK) Classic JVM combined with other JIT compilers.

This Fast VM release targets the Alpha microprocessor running on hp Tru64 UNIX® systems.

This document highlights the Fast VM's benefits, describes its high performance capabilities, and presents a detailed view into the Fast VM architecture and technology.


 

BENEFITS OF THE FAST VM

Tru64 UNIX customers have come to rely on hp's timely SDK releases containing the Classic JVM and JIT compiler. Now, with Fast VM, Tru64 UNIX users will enjoy enhanced product features including:

  • High Performance. Efficient object format and allocation, runtime optimizations, and a Java execution environment highly tuned for the Alpha platform take the runtime performance of Java applications to the next level, virtually eliminating performance as a roadblock to deploying Java applications.
     
  • Java Compatibility. The Fast VM implements the full JDK and passes all the Java Compatibility Kit (JCK) tests. This is in contrast to research projects that have demonstrated excellent Java performance but implement only a subset of the JDK.
     
  • JDK Class File and Shared Library Support. Rather than using its own modified versions, Fast VM takes advantage of the thoroughly tested JDK class files and shared libraries.
     
  • Ease of Use. Users are presented with a single integrated java command. Fast VM is invoked by typing "java -fast".


 

HIGH PERFORMANCE CAPABILITIES

Fast VM performance enhancements include direct execution of Java methods, efficient object format and allocation, a Java execution environment highly tuned for the Alpha architecture, and runtime optimizations. These techniques were designed to make compilation time negligible to the Java user.

  • Direct Execution

    When the Classic JVM executes a Java program, it reads and then interprets the bytecodes. The result is a JVM easily ported to numerous architectures but with slow execution when compared to conventional programming languages. A technique used to address the poor performance of Java bytecode interpretation is the integration of a JIT compiler with the interpreter. Instead of interpreting the bytecodes, the JVM passes them to the JIT compiler, which translates them into native code for the platform on which it is running. Although JIT compilers significantly increase performance, they are constrained by the interpreter. For example, method invocation returns control back to the interpreter to perform stack management rather than calling the method directly.

    Fast VM takes the approach of a conventional compiler and translates Java bytecodes directly into native machine code. Thus, every Java method is compiled. Java code executes as if it were written in a conventional programming language: there is a single stack per thread and calls are direct and conform to the Alpha calling standard. To the operating system, a Java method appears just like a procedure written in a conventional programming language.

  • Efficient Object Format

    The Fast VM eliminates the performance bottleneck resulting from the Classic JVM's representation of objects via handles.

    Over the past decade, modern reduced instruction set computer (RISC) systems have become prevalent. The speeds of these processors have increased at much faster rates than corresponding memory systems. For many applications, memory references rather than execution speeds become the performance bottleneck. The Classic JVM represents an object as a pointer to a data structure called a handle, which contains a pointer to the instance data. This object layout results in an unnecessary memory reference for every object access. Although handles have a number of desirable qualities (especially related to garbage collection and portability), the additional memory reference may result in significantly degraded performance on modern RISC processors.

    A complication faced by Fast VM is that some native methods in the JDK assume that an object reference points to a handle. Fast VM solves this by keeping that infrequent case working (a native method reaching an object instance through a handle) while its own compiled code reaches the same instance with only a single level of indirection. This is accomplished by allocating the handle and the instance data adjacent to each other. Instance data is then accessed either by adding an offset to the object's address or by double indirection through the handle. The Fast VM's object format is as follows:

    The Fast VM Object Format

    Word   Purpose
    0      [Handle] Pointer to instance data. Contains the address of word 4 of this structure.
    1      [Handle] Pointer to Sun metadata
    2      Pointer to class object and garbage collector bits
    3      Monitor and array length information
    4 …    [Instance Data] Actual data
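
The two access paths described above can be illustrated with a small model. This is only a sketch, not Fast VM code: memory is a Python list of words, an address is a list index, and the names allocate_at, read_field_direct, and read_field_via_handle are hypothetical.

```python
# Toy model: memory is a flat list of words and an "address" is a word index.
HEADER_WORDS = 4  # words 0..3 precede the instance data in the layout above

memory = [0] * 64


def allocate_at(base, class_ref, field_values):
    """Lay out one object starting at word index `base` (hypothetical helper)."""
    memory[base + 0] = base + HEADER_WORDS  # handle: pointer to instance data
    memory[base + 1] = 0                    # handle: metadata pointer (unused here)
    memory[base + 2] = class_ref            # class pointer and GC bits
    memory[base + 3] = 0                    # monitor and array length information
    for i, value in enumerate(field_values):
        memory[base + HEADER_WORDS + i] = value
    return base


def read_field_direct(ref, index):
    """Compiled-code path: fixed offset from the reference, one memory read."""
    return memory[ref + HEADER_WORDS + index]


def read_field_via_handle(ref, index):
    """Legacy native-method path: load the data pointer from the handle first."""
    data_pointer = memory[ref]
    return memory[data_pointer + index]


obj = allocate_at(8, class_ref=1234, field_values=[10, 20, 30])
```

Both readers return the same value for the same field; the handle path simply costs one extra memory read.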


     

  • Fast Object Allocation

    Tru64 UNIX provides an efficient implementation of native threads and quick access to thread local storage, which allows Fast VM to perform fast object allocation. Each thread is given its own memory area from which to allocate objects. In the normal case, object allocation is accomplished by incrementing a pointer and requires no synchronization with other threads.
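
A minimal sketch of pointer-bump allocation from a per-thread area follows. The class name ThreadLocalArea, the 8-byte alignment, and the use of None to signal the slow path are assumptions made for illustration; the paper does not describe these details.

```python
class ThreadLocalArea:
    """A private allocation area owned by exactly one thread (hypothetical name)."""

    def __init__(self, start, size):
        self.top = start            # next free address
        self.limit = start + size   # first address past the area

    def allocate(self, nbytes):
        """Return the address of a new block, or None if the area is exhausted."""
        aligned = (nbytes + 7) & ~7          # assumed 8-byte alignment
        if self.top + aligned > self.limit:
            return None                      # slow path: new area or collection needed
        address = self.top
        self.top += aligned                  # the normal case is just this increment
        return address
```

Because only the owning thread touches top and limit, the normal path needs no lock or atomic operation.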

  • Fast Monitors

    An attraction of the Java programming language is that it makes it easy for programmers to write multi-threaded applications. In order to ensure the consistency of a set of related data structures, synchronization primitives are available to the programmer. These primitives are also used extensively by the JDK libraries so that these libraries can be safely invoked by multi-threaded applications.

    In the common case that only one thread tries to lock a given object, synchronization is accomplished without operating system intervention. The thread obtains a spin lock located in the object header, updates the header, and releases the spin lock. This results in monitor synchronization that is not a performance bottleneck for most real world applications.
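
The uncontended path can be modeled roughly as below. This is a sketch under stated assumptions: the header fields owner and count, the function names, and the use of a non-blocking threading.Lock to stand in for a spin-lock bit are all hypothetical, and the contended case simply returns False where a real VM would fall back to the operating system.

```python
import threading


class ObjectHeader:
    """Stand-in for the monitor word of an object header (hypothetical fields)."""

    def __init__(self):
        self.spin = threading.Lock()  # models a spin-lock bit in the header
        self.owner = None             # identifier of the owning thread, if any
        self.count = 0                # recursion depth of the owner


def _spin_acquire(header):
    # Busy-wait until the header spin lock is obtained.
    while not header.spin.acquire(blocking=False):
        pass


def monitor_enter(header):
    """Return True on the fast path, False if another thread owns the monitor."""
    me = threading.get_ident()
    _spin_acquire(header)
    try:
        if header.owner is None or header.owner == me:
            header.owner = me
            header.count += 1
            return True
        return False  # contended: a real VM would involve the operating system here
    finally:
        header.spin.release()


def monitor_exit(header):
    me = threading.get_ident()
    _spin_acquire(header)
    try:
        if header.owner != me:
            raise RuntimeError("current thread does not own this monitor")
        header.count -= 1
        if header.count == 0:
            header.owner = None
    finally:
        header.spin.release()
```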

  • Optimization of Runtime Checks

    One of the appeals the Java programming language holds for programmers is that it is strongly typed and provides automatic array bounds checking. Fast VM performs extensive analysis to minimize any performance penalty resulting from these runtime checks. For example, many array bound checks are redundant and can be eliminated. If an array bounds check is required, Fast VM performs a highly optimized code sequence that checks the lower and upper bounds with a single comparison instruction.

    Fast VM emits no additional instructions to detect a NULL pointer dereference. Instead, optimized code is emitted, and in the infrequent event that a NULL pointer is dereferenced, the operating system raises a signal, which Fast VM catches and translates into a NullPointerException. Thus, only programs that actually dereference NULL pointers run slower because of this safety feature.
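
One well-known way to test both array bounds with a single comparison is to compare the index and the length as unsigned values, so that a negative index wraps to a very large number. The sketch below models 32-bit unsigned comparison with a mask; whether Fast VM emits exactly this sequence is an assumption, and in_bounds is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF  # models reinterpreting a 32-bit signed int as unsigned


def in_bounds(index, length):
    """True when 0 <= index < length, using one unsigned comparison."""
    return (index & MASK32) < (length & MASK32)
```

A negative index such as -1 becomes 4294967295 under the mask, which is larger than any legal Java array length, so the single test rejects it.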

  • Optimized Method Calls

    The Fast VM monitors program execution and optimizes method calls based on the changing environment. A key benefit of this approach is that users avoid performance penalties due to features they are not using.

    For example, if a method is not overridden, the method is called directly. However, when the method is overridden, the direct call is replaced by a call using a virtual function table (this action involves an extra memory reference).
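
The switch from a direct call to a table-based call can be sketched as follows. Everything here is a toy model: CallSite, Runtime, on_class_loaded, and the Shape and Square classes are hypothetical, and Python attribute lookup stands in for the virtual function table.

```python
class CallSite:
    """One call instruction in compiled code (hypothetical model)."""

    def __init__(self, declaring_class, method_name):
        self.method_name = method_name
        self.direct_target = getattr(declaring_class, method_name)
        self.is_direct = True  # no override seen yet, so call directly

    def invoke(self, receiver, *args):
        if self.is_direct:
            return self.direct_target(receiver, *args)
        # Virtual dispatch: look the method up through the receiver's class.
        return getattr(type(receiver), self.method_name)(receiver, *args)


class Runtime:
    def __init__(self):
        self.call_sites = []  # (declaring class, call site) pairs

    def make_call_site(self, declaring_class, method_name):
        site = CallSite(declaring_class, method_name)
        self.call_sites.append((declaring_class, site))
        return site

    def on_class_loaded(self, new_class):
        """Patch every direct call site whose target the new class overrides."""
        for declaring_class, site in self.call_sites:
            overrides = (new_class is not declaring_class
                         and issubclass(new_class, declaring_class)
                         and site.method_name in vars(new_class))
            if overrides:
                site.is_direct = False


class Shape:
    def area(self):
        return 0


class Square(Shape):  # "loading" is simulated by calling on_class_loaded
    def area(self):
        return 4
```

Until on_class_loaded(Square) is called, a site created for Shape.area pays nothing for dispatch; afterwards it takes the slower lookup path, which is the only correct choice once an override exists.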

  • Runtime Machine Specific Optimizations

    An advantage that virtual machines have over conventional compilers involves their knowledge of the runtime execution environment. For example, only later versions of the Alpha processors have byte manipulation instructions. Fast VM recognizes the type of Alpha processor it is executing on and emits processor specific code patterns.


 

ARCHITECTURE

Overview

The Fast VM is written in a portable subset of C++ using high level object-oriented abstractions and consists of reusable components. The following illustration provides an overview of the Fast VM architecture:

  • Command Processor
    • Fast VM Executable
  • Runtime Interface
    • RTL Entry Points: called directly from generated code
    • Java Native Interface (JNI)
    • Exception Handlers
    • Native methods overriding JDK-provided versions
    • Glue: interpreter routines called by native methods
  • Reusable Components
    • Object Factory
    • Garbage Collector
    • Compiler and Symbol Table
    • System Services

 

Architecture – Command Processor

The Fast VM begins execution when the user invokes the java command with the appropriate switch or environment variable set. The Command Processor performs the following actions:

  • Parses and interprets the specified switches and environment variables. One of the command options is the name of the class containing the "main" method to be executed.
  • Loads the specified class resulting in the following:
    • Allocation of the class's static variables.
    • Production of stub code for each method of the class. Invoking the stub code results in the method being compiled and executed. Additionally, future invocations of the method go directly to the compiled code.
    • Compilation and execution of the class's static initializer.
  • Invokes the class's "main" method.
  • Returns control to the user after the main method completes.
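
The stub mechanism in the list above (compile on first invocation, then go straight to compiled code) can be sketched as below. This is an illustration only: the Method class and its fields are hypothetical, and a Python callable stands in for both the bytecodes and the generated native code.

```python
class Method:
    """A method whose entry point starts out as a compile-on-demand stub."""

    def __init__(self, name, bytecode):
        self.name = name
        self.bytecode = bytecode   # here: a Python callable standing in for bytecodes
        self.entry = self._stub    # every method initially points at its stub
        self.compile_count = 0

    def _compile(self):
        self.compile_count += 1
        return self.bytecode       # stand-in for native code generation

    def _stub(self, *args):
        compiled = self._compile()
        self.entry = compiled      # future invocations bypass the stub
        return compiled(*args)

    def invoke(self, *args):
        return self.entry(*args)


square = Method("square", lambda x: x * x)
```

Calling square.invoke twice compiles the method only once; the second call reaches the compiled code directly through the updated entry point.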

 

Architecture – Runtime Subsystem

The most common method for returning control to Fast VM is through the RTL Entry Points. These consist of approximately 25 entry points that the compiled code invokes directly. The entry points include:

  • Mathematical routines for integer divide, integer remainder, floating point divide, floating point remainder, or conversion.
  • Object creation routines.
  • Monitoring routines (enter and exit).
  • Checking routines for array store and cast operations.
  • Exception handling routines for throwing or catching exceptions.
  • Compilation routines for compiling methods into native machine code.

Another way control is returned to the Fast VM occurs when the user invokes a C or C++ routine that uses the Java Native Interface (JNI). JNI provides the Java programmer a JVM-independent mechanism for writing native methods. The function prototypes are provided in a file called "jni.h" and the JVM provides the implementation of these functions. JNI routines perform functions such as reading and writing a field, invoking a method, etc.

Rather than add expensive checks to generated code, Fast VM establishes exception handlers to catch certain conditions. For example, instead of prefixing every pointer dereference with a check for a NULL pointer, Fast VM emits code without checks and establishes a signal handler which throws a NullPointerException after the program dereferences address zero.

Both the Classic and Fast VMs use most JDK native methods. However, certain native methods depend upon the Classic JVM's object layout. In these cases, Fast VM provides an alternative implementation of the native method. The alternative native methods are collected in the Native Methods subsystem. One example is java_lang_Object_hashCode, which returns the hashcode of a given object. The Classic JVM's implementation depends on references being pointers to immovable handles. In contrast, the Fast VM eliminates the separate handle, so references are pointers to objects that the garbage collector may move freely.

Some of the native methods in the JDK reference static variables or call routines exported by the Classic interpreter. In order to use these native methods, Fast VM provides an implementation of the required interpreter entry points and collects them together in the Glue subsystem. An example is a routine called SignalError that is frequently called by native methods implementing the Abstract Windowing Toolkit (AWT).

 

Architecture – Reusable Components

Object Factory

The Object Factory is the heart of the JVM. It contains C++ classes responsible for creating and manipulating objects. For example, the class JavaObject exports methods such as Create, MonitorEnter, and MonitorExit that operate on instances of java.lang.Object. The architecture is such that changing the format of an object involves modifying only this one class. This class is "extended" to provide specific subclasses such as JavaClassObject (instances of java.lang.Class) and JavaArrayObject (array objects).

Operating System Services

The Operating System Services module contains routines providing portable system services. Examples of these services include thread management, exception processing, and file system operations.

Compiler and Symbol Table

The Compiler and Symbol Table module is responsible for loading and verifying classes, compiling methods and providing access to the symbol table.

An integral part of the Fast VM is the Garbage Collector, described in the next section.


 

GARBAGE COLLECTION

Two common garbage collection algorithms are termed mark-and-sweep and copying. The Fast VM garbage collector combines techniques of the conservative mark-and-sweep and the accurate copying collectors.

Mark-and-Sweep

A mark-and-sweep collection typically consists of two phases. In the first phase, each object known to be reachable is visited and marked as live and then scanned for references to other objects, which in turn are visited. During the second phase, memory is linearly traversed, or swept, and unmarked objects are added to a free list. An optional third phase involves the compaction of marked objects.
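
The two mandatory phases can be sketched as follows. This is a generic textbook mark-and-sweep over Python objects, not the Fast VM collector; the Node class and the function names are hypothetical, and the optional compaction phase is omitted.

```python
class Node:
    """A heap object with outgoing references and a mark bit."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Phase 1: visit every object reachable from the roots and mark it live."""
    pending = list(roots)
    while pending:
        node = pending.pop()
        if node.marked:
            continue
        node.marked = True
        pending.extend(node.refs)  # scan for references to other objects


def sweep(heap):
    """Phase 2: traverse the heap linearly; unmarked objects go on the free list."""
    survivors, free_list = [], []
    for node in heap:
        if node.marked:
            node.marked = False    # reset the mark for the next collection
            survivors.append(node)
        else:
            free_list.append(node)
    return survivors, free_list
```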

Copying

A copying collector divides memory into two areas, referred to as fromspace and tospace. Objects are allocated in fromspace. When this area runs out, live objects are copied into tospace. The tospace area is then re-designated as fromspace and the area formerly occupied by fromspace is re-designated as tospace. Empirical studies show that most Java objects are short lived and consequently a large percentage of fromspace is not copied.
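
A copying collection can be sketched in the style of Cheney's algorithm, in which tospace doubles as the work queue. This is a generic illustration with hypothetical names (Cell, collect); Python lists stand in for the two memory areas, and a forward field records where an object was copied to.

```python
class Cell:
    """A heap object; `forward` is set once the cell has been copied."""

    def __init__(self, value, refs=None):
        self.value = value
        self.refs = refs or []
        self.forward = None


def collect(roots):
    """Copy everything reachable from `roots` into a fresh tospace."""
    tospace = []

    def copy(cell):
        if cell.forward is None:                 # not copied yet
            cell.forward = Cell(cell.value, list(cell.refs))
            tospace.append(cell.forward)
        return cell.forward                      # new address of the cell

    new_roots = [copy(root) for root in roots]

    scan = 0
    while scan < len(tospace):                   # scan copied cells in order
        cell = tospace[scan]
        cell.refs = [copy(ref) for ref in cell.refs]  # update to new addresses
        scan += 1
    return new_roots, tospace
```

Unreachable cells are never visited, which is why the cost of a collection is proportional to the live data rather than to the size of fromspace.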

One key piece of data required by a copying collector is precise information that determines whether a given memory location contains a reference to an object. If the referenced object is copied, the memory location must be updated with the new address. Because the Fast VM supports the JDK, it must also support the old Native Method Interface (NMI). Therefore, when a program is executing a native method that is accessed via NMI instead of the newer JNI, the collector cannot distinguish between an integer whose value is coincidentally an address within the heap and an actual reference to an object within that same heap. The collector has precise information regarding references within Java frames, within objects, and within JNI native methods.

Mostly Copying

Mostly-Copying is effectively a hybrid conservative and copying collector. It copies objects that are known to be alive and are pointed to only by precise references. Objects that are referenced via imprecise pointers are not physically moved, but rather added to tospace using a sophisticated bookkeeping algorithm. It is worth noting that at a given collection point only a small percentage of the objects are referenced by imprecise pointers, and typically at the next collection point a different set of objects is identified as being imprecisely referenced. Another way of looking at this is that at a given collection point only a small number of threads are likely to be in a non-JNI native method, and at the next collection point that native method has probably completed.
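
The hybrid behavior can be sketched by extending a copying collector with a set of ambiguous roots whose targets are kept in place. This is a deliberately simplified model: it pins individual objects, whereas a real mostly-copying collector typically tracks larger units such as pages, and the paper does not detail Fast VM's bookkeeping. The names Item and mostly_copying_collect are hypothetical.

```python
class Item:
    def __init__(self, value):
        self.value = value
        self.refs = []
        self.forward = None  # where this item lives after collection


def mostly_copying_collect(precise_roots, ambiguous_roots):
    tospace = []

    # Objects hit by imprecise pointers join tospace without moving.
    for item in ambiguous_roots:
        if item.forward is None:
            item.forward = item          # forwards to itself: address unchanged
            tospace.append(item)

    def copy(item):
        if item.forward is None:         # precise-only object: really copy it
            clone = Item(item.value)
            clone.refs = list(item.refs)
            item.forward = clone
            tospace.append(clone)
        return item.forward

    new_precise_roots = [copy(root) for root in precise_roots]

    scan = 0
    while scan < len(tospace):           # precise references inside objects
        item = tospace[scan]
        item.refs = [copy(ref) for ref in item.refs]
        scan += 1

    for item in tospace:                 # unpin for the next collection
        item.forward = None
    return new_precise_roots, tospace
```

The pinned object keeps its identity, so an integer that merely looks like its address is never rewritten, while every precise reference to a moved object is updated.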


 

SUMMARY

The Fast VM provides users of hp Tru64 UNIX with one of the fastest Virtual Machines available today. This paper presents an architectural overview of a modern JVM and emphasizes how hp customers benefit from Fast VM and its high performance capabilities.


 


Trademarks

HP and the names of hp products referenced herein are either trademarks and/or service marks or registered trademarks and/or service marks of hp.

UNIX is a registered trademark in the United States and other countries, licensed exclusively through The Open Company.

Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc., in the U.S. and other countries.