Java Virtual Machine - Kirkwood Community

Post on 17-Mar-2018


Java Virtual Machine

part 1

One more architecture: the Java Virtual Machine

• JVM is a virtual machine that executes Java Byte Code (JBC)

– start with Java source code (.java text file)

– Java compiler (javac) creates JBC file with .class extension

• standardized binary format

• single program consists of one or more class files

• multiple class files may be packaged together as a .jar file for program distribution

JVM execution environment

• JVM is represented by an executable program called java

– emulates JVM instruction set

– interpreter

– JBC is stack based, so JVM uses stack architecture

Why a “virtual machine?”

• Idea didn’t originate with Java – idea comes from time-sharing systems (circa 1960s)

• VM advantages:

– platform independence

– transcends physical limits

– ease of updates

– security safeguards

Platform independence

• Java compiler is platform-independent: makes no assumptions about characteristics of underlying hardware

• JVM required to run Java byte code

• Works as a wrapper around a real machine’s architecture, so the JVM itself is extremely platform dependent

Java environment vs. “traditional” HLL environment

[Diagram. Traditional HLL path: source code, compiler/linker, machine language, hardware platform. Java path: source code, java compiler, class file (JBC), java virtual machine, hardware platform.]

How it works

• Java compiler translates source code into JBC

• JVM acts as interpreter - translates specific byte codes into machine instructions specific to the hardware platform it’s running on

• Acts like giant switch/case structure: each bytecode instruction triggers jump to a specific block of code that implements the instruction in the architecture’s native machine language

JVM’s superpower: transcends physical limits

• No hardware costs (both $ and resource tradeoffs)

• Because of multithreading, can have (seemingly) unlimited processor power

• No backward compatibility issues

• Can be adapted to optimize hardware resources of specific platform

• Designed from scratch in mid-90s: several generations of engineering experience led to design superior to most physical chips

JVM and security issues

• A virtual machine can be (and JVM is) configured to run in secure environment

• VM can intervene if a program tries to do something it shouldn’t – can enforce stricter security policies than those of OS

• JVM bytecode is verifiable

– most security flaws happen by accident

– byte code is checked by both compiler and JVM

– result: improved software quality & reliability

Downside of virtual machines

• Before Java, virtual machines relatively uncommon because of performance issues

• Takes about 1000 times longer to do an operation in software instead of hardware

– hardware advances & compiler improvements mitigate this

– in practical terms, the speed difference ranges from about 2x slower down to roughly 6% slower

• VM doesn’t provide direct control over hardware available with native low-level language

Characteristics of JVM

• Not just a virtual machine; something like a virtual operating system

• Case in point: output statement printf()

– Compiled C program calls the operating system’s write() function

– Compiled Java program makes similar call, but to a JVM routine which then calls the “real” write() function

Java & threading

• Because the JVM is a virtual machine, it is free of some of the constraints of a real machine architecture

• Java threads exemplify this

– separate threads of execution within one program, appearing to run in parallel

– simulates a multi-processor environment independent of actual platform
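A minimal sketch of two Java threads splitting one job. The class and method names (ThreadDemo, sumRange, parallelSum) are invented for illustration; the same code runs whether the host has one core or many, because the JVM and the OS decide how the threads are scheduled.

```java
class ThreadDemo {
    // Sums the integers from..to inclusive; each thread works on its own range.
    static long sumRange(int from, int to) {
        long total = 0;
        for (int i = from; i <= to; i++) {
            total += i;
        }
        return total;
    }

    // Starts two threads, waits for both, and combines their partial results.
    static long parallelSum() {
        long[] partial = new long[2];
        Thread t1 = new Thread(() -> partial[0] = sumRange(1, 500));
        Thread t2 = new Thread(() -> partial[1] = sumRange(501, 1000));
        t1.start();
        t2.start();
        try {
            t1.join();   // join() makes each thread's write visible to the caller
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return partial[0] + partial[1];
    }

    public static void main(String[] args) {
        System.out.println(parallelSum());
    }
}
```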

Characteristics of JVM

• Stack-based language & machine

• Each thread within a program has its own stack

• 32-bit word size

• Relatively small instruction set (about 200 instructions)

Characteristics of JVM: registers

• 4 registers (sort of):

– program counter (PC)

– optop: points to top of operand stack for currently-active method

– frame: points to stack frame for current method

– vars: points to start of local variables for current method

• Each program (or thread) has these, as well as its own stack

Characteristics of JVM: registers

• No general-purpose registers

– means more memory fetches, detrimental to performance

– tradeoff is high degree of portability

• Most instructions access stack

Characteristics of JVM: stack memory

• Each method call produces its own stack frame, which is pushed on the thread’s stack; a return instruction pops the stack

• Stack frame includes:

– local variables section

– operand stack section

Local variables section of stack frame

• Consists of set of word-size slots, each of which holds a single variable (long and double variables take two consecutive slots); includes

– parameters & locally-declared data, in order of declaration;

– if method is non-static, first slot (slot 0) contains pointer to “this”
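The slot layout can be checked with a little reflection. This is a sketch, not how the JVM itself does it: LocalSlots and its methods are hypothetical names, and the helper simply counts one slot for "this" on instance methods, two slots for each long or double parameter, and one for everything else.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class LocalSlots {
    // static: a in slot 0, b in slots 1-2, c in slots 3-4
    static int scale(int a, long b, double c) { return 0; }

    // instance: this in slot 0, a in slot 1, b in slots 2-3
    int offset(int a, long b) { return 0; }

    // Counts the slots taken by "this" (if any) plus the parameters.
    static int parameterSlots(Method m) {
        int slots = Modifier.isStatic(m.getModifiers()) ? 0 : 1;
        for (Class<?> p : m.getParameterTypes()) {
            slots += (p == long.class || p == double.class) ? 2 : 1;
        }
        return slots;
    }

    // Looks up a method of this class by name and counts its parameter slots.
    static int slotsFor(String name) {
        for (Method m : LocalSlots.class.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                return parameterSlots(m);
            }
        }
        throw new IllegalArgumentException("no such method: " + name);
    }
}
```

Locally-declared variables follow the parameters, so the first local of scale would start at slot 5.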

Operand stack section of stack frame

• Operand stack section is where method’s instructions operate - “the” stack referred to when talking about instructions operating on the stack

• Maximum depth of operand stack is determined at compile time

• Current stack depth is determined by number & type of operands on stack:

– double and long values take up two slots

– all other data types take one slot

JVM Method area

• Stores classes used by executing program; includes:

– bytecode & access types of methods

– values & access types of static variables

• PC points to this area – location of next instruction

• Method area also includes constant pool – storage for literal values used in program

JVM Heap

• Memory allocated for objects from this area

• Holds object’s instance values and pointer to object’s class in the Method area

JVM instruction set

• Instructions consist of one-byte opcode followed by 0 or more operands

• Instruction types include:

– load/store of local variables & object fields

– array

– arithmetic and logical

– type conversion

– control

– method call/return

Java Byte Code

• As assembly language code is to most HLLs, JBC is to Java

• Although most programmers work at the high level, a thorough understanding of the lower level helps us achieve better performing, lower cost software

JVM instructions*

• A JVM instruction consists of a one-byte opcode specifying the operation to be performed, followed by zero or more operands supplying arguments or data that are used by the operation

• Many instructions have no operands and consist only of an opcode

* This and the next several slides are almost verbatim from the official reference on all things JVM; the quoted parts are in purple:

http://java.sun.com/docs/books/jvms/second_edition/html/

The JVM Loop

Ignoring exceptions, the inner loop of a Java virtual machine interpreter is effectively

    do {
        fetch an opcode;
        if (operands) fetch operands;
        execute the action for the opcode;
    } while (there is more to do);
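The loop can be sketched as a toy interpreter. This is an illustration under stated assumptions, not the real JVM: MiniInterpreter is a made-up class that handles only four instructions (the opcode values are the real ones for bipush, iadd, imul and ireturn) and uses a Deque as the operand stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MiniInterpreter {
    // Real JVM opcode values for the four instructions handled here.
    static final int BIPUSH = 0x10, IADD = 0x60, IMUL = 0x68, IRETURN = 0xac;

    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xff;           // fetch an opcode
            switch (opcode) {                         // execute the action for the opcode
                case BIPUSH:
                    stack.push((int) code[pc++]);     // fetch operand: one signed byte
                    break;
                case IADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case IMUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case IRETURN:
                    return stack.pop();               // nothing more to do
                default:
                    throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // bipush 2; bipush 3; iadd; bipush 4; imul; ireturn  computes (2 + 3) * 4
        byte[] code = {0x10, 2, 0x10, 3, 0x60, 0x10, 4, 0x68, (byte) 0xac};
        System.out.println(run(code));
    }
}
```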

Opcodes and operands

• The number and size of the operands are determined by the opcode

• If an operand is more than one byte in size, then it is stored in big-endian order

• The bytecode instruction stream is only single-byte aligned

• Not assuming data alignment means that immediate data larger than a byte must be constructed from bytes at run time on many machines
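Building a two-byte operand from the instruction stream might look like this. A minimal sketch: BigEndianOperand, readU2 and readS2 are hypothetical helper names, not part of any JVM API.

```java
class BigEndianOperand {
    // Assembles an unsigned 16-bit operand from two consecutive bytes, high byte first.
    static int readU2(byte[] code, int offset) {
        return ((code[offset] & 0xff) << 8) | (code[offset + 1] & 0xff);
    }

    // Reads the same two bytes as a signed 16-bit value,
    // the way sipush or a branch offset would interpret them.
    static int readS2(byte[] code, int offset) {
        return (short) readU2(code, offset);
    }
}
```

The masking with 0xff matters because Java bytes are signed; without it a high byte such as 0xff would sign-extend and corrupt the result.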

JBC Data Types

• Correspond closely to Java types; conspicuous for its absence is boolean, which in JBC is stored as an int

• This is because it is no more (and is likely to be less) efficient in most real architectures to access a single bit as opposed to a single (32-bit) word – so boolean values are stored as 1 or 0

• Other sub-word storage types (byte, short and char) are promoted to word type for arithmetic operations (implicit promotion, to us) – but that’s in the stack, not in memory

• Operations on these types are, effectively, int operations
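The promotion is visible at the source level too. In this sketch (SubWordPromotion is an invented name) adding two bytes yields an int, and getting a byte back takes an explicit cast; the instruction names in the comments are what javac normally emits.

```java
class SubWordPromotion {
    // Both bytes are already ints on the operand stack, so the sum is a plain iadd.
    static int widen(byte a, byte b) {
        return a + b;
    }

    // Same iadd, followed by i2b to truncate the int result back to 8 bits.
    static byte narrow(byte a, byte b) {
        return (byte) (a + b);
    }
}
```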

JBC data types

    Data type   JBC code   Explanation
    int         i          32-bit signed integer
    float       f          32-bit IEEE 754 floating point number
    long        l          64-bit signed integer (takes 2 stack slots)
    double      d          64-bit IEEE 754 floating point number (2 stack slots)
    byte        b          8-bit signed integer
    short       s          16-bit signed integer
    char        c          16-bit unsigned integer or Unicode (UTF-16) character
    address     a          Objects (references)

JVM stack frames

• Recall that each thread or program has its own JVM stack to store frames

• Frames are created when methods are invoked

• Frame consists of:

– operand stack

– local variable table (array)

– pointer to the runtime constant pool of the current method’s class

JVM stack frames

• Size of both operand stack and local variable table are determined at compile time

• Operand stack stores:

– operands for opcode instructions

– operation results

– return values from methods

JVM instructions

• Data typing in Java requires type-specific instructions; thus for example, the add instruction comes in four different flavors:

– iadd: adds integers

– ladd: adds longs

– fadd: adds floats

– dadd: adds doubles

• Similar instructions exist for other arithmetic operations
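The four flavors side by side, as a sketch: AddFlavors is an invented class, and the comments list the instructions javac typically emits for each body (javap -c on the compiled class shows the actual output).

```java
class AddFlavors {
    static int addInts(int a, int b) {
        return a + b;          // iload_0, iload_1, iadd, ireturn
    }

    static long addLongs(long a, long b) {
        return a + b;          // lload_0, lload_2, ladd, lreturn (b starts at slot 2)
    }

    static float addFloats(float a, float b) {
        return a + b;          // fload_0, fload_1, fadd, freturn
    }

    static double addDoubles(double a, double b) {
        return a + b;          // dload_0, dload_2, dadd, dreturn
    }
}
```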

Arithmetic instructions

• Each arithmetic instruction works as follows:

– top 2 elements are popped off stack

– result is computed

– result is pushed to stack

• “Elements” may be one or two words in size:

– ints and floats: 32 bits, single word

– doubles and longs: 64 bits, two words each

Arithmetic instructions

• The remainder operation exists for all four numeric types: irem, lrem, frem and drem

• On the high level, the % operator on float and double types compiles to frem or drem; the result keeps its floating-point type and need not be a whole number (5.5 % 2.0 is 1.5)
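A quick check of the % operator on integer and floating-point operands. A minimal sketch with invented names (RemainderDemo, intRem, doubleRem); the instruction names in the comments are what javac normally emits for these expressions.

```java
class RemainderDemo {
    static int intRem(int a, int b) {
        return a % b;           // irem: result takes the sign of the dividend
    }

    static double doubleRem(double a, double b) {
        return a % b;           // drem: operands and result stay double
    }
}
```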

Data typing and arithmetic instructions

• Mixed-type expressions must have all operands converted to single data type for evaluation

• Unary conversion operations exist to facilitate this:

– i2f: converts int to float

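– i2b: converts int to byte

– etc.

• Always possible to convert between the 4 basic types (i, f, l and d); an int can also be narrowed to byte, char or short (i2b, i2c, i2s), but no b2i-style instructions exist because sub-word values are already ints on the stack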
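Casts and mixed-type expressions in source code map onto these conversion instructions. A sketch with invented names (Conversions and its methods); the comments show the instructions javac typically emits.

```java
class Conversions {
    static double mixed(int i, double d) {
        return i + d;           // iload_0, i2d, dload_1, dadd: the int is widened first
    }

    static int truncate(double d) {
        return (int) d;         // d2i: rounds toward zero
    }

    static float toFloat(int i) {
        return i;               // i2f: implicit widening in the source, explicit in bytecode
    }

    static char toChar(int i) {
        return (char) i;        // i2c: keeps the low 16 bits
    }
}
```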

Logical & shift operations

• Operate on integer types only

• Logical operations include and, or and xor; examples:

– land: and on 2 longs

– ixor: xor on 2 ints

• Shift operations on ints include ishl, ishr and iushr (unsigned shift right); similar operations exist for longs
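The difference between the two right shifts shows up only on negative values. A small sketch (BitOps is an invented name), with the corresponding instruction noted beside each operator.

```java
class BitOps {
    static int arithmeticShift(int x) {
        return x >> 1;          // ishr: the sign bit is copied in from the left
    }

    static int logicalShift(int x) {
        return x >>> 1;         // iushr: a zero is shifted in from the left
    }

    static long maskLow(long x) {
        return x & 0xffL;       // land
    }

    static int toggle(int x, int mask) {
        return x ^ mask;        // ixor
    }
}
```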

Data access operations

• Load/store instructions exist for the 4 primitive types; transfer values between local variable table & operand stack

– iload: push int variable on stack

– dstore: store stack value in local double variable

– const versions (e.g. iconst_1, dconst_0) push common small literal values onto the stack; there is no matching store

• Can also load/store objects using aload/astore
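How values move between the local variable table and the operand stack, as a sketch: LoadStore is an invented class, and the comments list the instructions javac typically emits for each statement (javap -c shows the actual output).

```java
class LoadStore {
    // Parameters occupy the first slots: x in slot 0, y in slot 1.
    static int sumOfSquares(int x, int y) {
        int a = x * x;          // iload_0, iload_0, imul, istore_2
        int b = y * y;          // iload_1, iload_1, imul, istore_3
        return a + b;           // iload_2, iload_3, iadd, ireturn
    }

    static int literals() {
        int small = 5;          // iconst_5, istore_0
        int medium = 1000;      // sipush 1000, istore_1 (too large for an iconst form)
        return small + medium;  // iload_0, iload_1, iadd, ireturn
    }
}
```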

Object creation & manipulation

• Although both class instances and arrays are objects, the Java virtual machine creates and manipulates class instances and arrays using distinct sets of instructions:

• Create a new class instance: new.

• Create a new array: newarray, anewarray, multianewarray.

Object creation & manipulation

• Access static fields and instance variables: getfield, putfield, getstatic, putstatic

• Load an array component onto the operand stack: iaload, laload, faload, daload, aaload, etc.

• Store a value from the operand stack as an array component: iastore, etc.

• Get the length of array: arraylength
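The array instructions seen from the source side. A sketch with invented names (ArrayOps and its methods); each comment names the instruction javac typically emits for that expression.

```java
class ArrayOps {
    static int sumFirstAndLast() {
        int[] data = new int[3];            // newarray int
        data[0] = 7;                        // iastore
        data[2] = 5;                        // iastore
        int last = data.length - 1;         // arraylength
        return data[0] + data[last];        // iaload, iaload, iadd
    }

    static String[] names() {
        return new String[2];               // anewarray (array of references)
    }

    static int[][] grid() {
        return new int[2][3];               // multianewarray
    }
}
```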

Stack manipulation & method handling

• Several instructions operate directly on the operand stack, including pop, pop2, swap, and several others

• Method invocations are handled by instructions specific to the type of method:

– invokevirtual: starts an instance method

– invokestatic: starts a static method

– invokeinterface: starts a method specified by an interface

• Various return instructions are used to return values from methods (ireturn, dreturn, etc. – also just return for void methods)
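Which invoke instruction is chosen depends on how the method is declared and on the static type of the receiver. A sketch with invented types (Shape, Square, InvokeDemo); the comments name the instruction javac typically emits. The constructor call uses a further instruction, invokespecial, not listed above.

```java
interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;                   // putfield
    }

    public double area() {
        return side * side;                 // getfield, getfield, dmul, dreturn
    }
}

class InvokeDemo {
    static double twice(double x) {
        return 2 * x;
    }

    static double demo() {
        Square sq = new Square(3.0);        // new, dup, then invokespecial for the constructor
        Shape sh = sq;
        double a = sq.area();               // invokevirtual: receiver typed as a class
        double b = sh.area();               // invokeinterface: receiver typed as an interface
        return twice(a) + b;                // invokestatic
    }
}
```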

Java class files

• Consists of stream of bytes

• Class file data types describe the various fields in the class file format:

– u1: unsigned 1-byte number

– u2: unsigned 2-byte

– u4: unsigned 4-byte

• The next several slides describe the class file format in depth

ClassFile structure

    ClassFile {
        u4 magic;
        u2 minor_version;
        u2 major_version;
        u2 constant_pool_count;
        cp_info constant_pool[constant_pool_count-1];
        u2 access_flags;
        u2 this_class;
        u2 super_class;
        u2 interfaces_count;
        u2 interfaces[interfaces_count];
        u2 fields_count;
        field_info fields[fields_count];
        u2 methods_count;
        method_info methods[methods_count];
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

Fields in Classfile structure

• magic: used to identify this file as a class; the magic number is the hex value CAFEBABE (no, I’m not kidding)

• minor_version and major_version are class file versions; values must fall within a range of numbers (defined by Sun) in order to be runnable on a particular JVM

Fields in Classfile structure

• constant_pool [] and constant_pool_count:

– constant_pool is an array of string literals, class, interface and field descriptors that are referenced in the Classfile structure

– constant_pool_count is the number of entries in constant_pool plus one; valid indexes run from 1 to constant_pool_count-1

• access_flags: set of flags indicating access information (public, private, etc.) about the class or interface
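Reading the first few ClassFile fields takes only a handful of big-endian reads. A sketch under stated assumptions: ClassHeader and parse are invented names, and the input in main is a hand-built ten-byte header rather than a real class file.

```java
import java.nio.ByteBuffer;

class ClassHeader {
    // Returns {minor_version, major_version, constant_pool_count};
    // throws if the bytes do not start with the class-file magic number.
    static int[] parse(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);     // ByteBuffer reads big-endian by default
        int magic = in.getInt();                    // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = in.getShort() & 0xffff;         // u2 minor_version
        int major = in.getShort() & 0xffff;         // u2 major_version
        int cpCount = in.getShort() & 0xffff;       // u2 constant_pool_count
        return new int[] {minor, major, cpCount};
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52, 0, 16};
        int[] h = parse(header);
        System.out.println(h[1] + "." + h[0] + ", constant_pool_count=" + h[2]);
    }
}
```

The & 0xffff masks turn Java's signed short into the unsigned u2 value the format specifies.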

Fields in Classfile structure

• this_class: must be valid index to constant_pool; entry at that index is structure describing (very briefly) the current class

• super_class: 0 only for class Object, which has no superclass; otherwise, must be valid index to constant_pool; entry at index describes the superclass

Fields in Classfile structure

• interfaces[] and interfaces_count: the latter is the number of superinterfaces of the current class; the former is an array whose entries are valid indexes to the constant_pool, where the entries are structures describing all of this class’s superinterfaces

• fields[], fields_count, methods[], methods_count, attributes[] and attributes_count: more of the same

Example descriptor field

• A method descriptor has the following format:

    method_info {
        u2 access_flags;
        u2 name_index;
        u2 descriptor_index;
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

Digging in a little deeper …

• The method_info structure (which is itself an entry in the methods[] array of the ClassFile) contains an array of attribute_info structures, attributes[]

• As the name suggests, this array holds the method's attributes; field_info and ClassFile carry attribute arrays of the same form

• Attributes include constant values (on fields), code and exceptions (on methods) and several others; we will confine our discussion to the first two

ConstantValue attribute

• fixed-length structure representing value of a static constant; descriptor is:

    ConstantValue_attribute {
        u2 attribute_name_index;
        u4 attribute_length;
        u2 constantvalue_index;
    }

• Both indexes refer to the constant_pool, where they must match legitimate entries; value of attribute_length for ConstantValue is 2

Code_attribute

• Contains actual JVM instructions for a method

• Descriptor:

    Code_attribute {
        u2 attribute_name_index;
        u4 attribute_length;
        u2 max_stack;
        u2 max_locals;
        u4 code_length;
        u1 code[code_length];
        u2 exception_table_length;
        {   u2 start_pc;
            u2 end_pc;
            u2 handler_pc;
            u2 catch_type;
        } exception_table[exception_table_length];
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

Code_attribute fields

• max_stack and max_locals give the size of the operand stack and local variable table for the method’s frame

• code_length is the number of bytes in code[], the array holding the method's bytecode (opcodes and operands); since instructions vary in length, it is not an instruction count

Code_attribute fields

• exception_table[] is an ordered array of exception handler descriptors; exception_table_length is the array size

• start_pc and end_pc are indexes into the code[] array that define the range within which the exception handler is active (start_pc inclusive, end_pc exclusive)
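Finding the handler for a given code index can be sketched as a linear scan in table order. This is a simplification, not the full JVM rule: ExceptionTableLookup and its Entry class are invented names, and the sketch ignores catch_type, which the real lookup also checks against the thrown exception's class.

```java
class ExceptionTableLookup {
    // One exception_table entry; field names follow the Code_attribute layout.
    static final class Entry {
        final int startPc, endPc, handlerPc, catchType;

        Entry(int startPc, int endPc, int handlerPc, int catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns handler_pc of the first entry whose range covers pc, or -1 if none does.
    // The range is start_pc inclusive, end_pc exclusive.
    static int findHandler(Entry[] table, int pc) {
        for (Entry e : table) {
            if (pc >= e.startPc && pc < e.endPc) {
                return e.handlerPc;
            }
        }
        return -1;
    }
}
```

Because the table is ordered, listing an inner try range before the outer one that encloses it makes the inner handler win.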

Even imagination has its limits

• The JVM, because it isn’t real, has access to theoretically unlimited resources

• Of course, it still has to exist in the real world, and both hardware and the JVM spec itself impose constraints

• The next several slides describe some of these

Size matters

• Consider the following ClassFile attributes:

– u2 constant_pool_count;

– u2 fields_count;

– u2 methods_count;

– u2 attributes_count;

• All of these are type u2: unsigned 2-byte integer; the maximum value of an unsigned 2-byte integer is 65,535

• Thus, each of the arrays described by the various count attributes is limited to ~64K entries

Size matters (again)

• The Code_attribute descriptor includes these fields:

– u2 max_stack;

– u2 max_locals;

– u4 code_length;

• Thus the operand stack and local variable table for a method are subject to the same 64K limit

• The u4 code_length field suggests a method's code could exceed that limit, but it cannot: the exception_table indexes (start_pc, end_pc, handler_pc) are u2, so the code of a single method is limited to about 64K bytes

Last word on sizes

• 255 is the maximum number of:

– array dimensions

– parameters to a single method

• 65,535 is the maximum length (in bytes, as encoded in the class file) of:

– identifiers

– String literals