Codesigned Virtual Machines, Part II
2006.10.18, Yu, Young Jin
DCSLAB
Contents
• Introduction
• Case Study (1)
  – Transmeta Crusoe
• Case Study (2)
  – IBM AS/400
Applying Codesigned VMs
• Advantages (performance, power efficiency, flexibility) can be achieved
  – At the macro level: entirely new ISAs
    • VLIW: Transmeta Crusoe, IBM Daisy/BOA
    • OO source ISA: IBM AS/400
  – At the micro level
    • The implementation of specific performance enhancements
    • Instruction reordering, …
Case Study (1):
Transmeta Crusoe
Introduction
• In January 2000, Transmeta Corp. introduced the Crusoe processors.
  – Remarkably low power consumption
• Perhaps unexpectedly, the new technology is fundamentally software-based.
  – The power savings come from replacing large numbers of transistors with software.
The Crusoe Processor
• Consists of a hardware engine logically surrounded by a software layer.
  – H/W: the engine
    • is a VLIW CPU capable of executing up to four operations in each clock cycle.
    • bears no resemblance to the x86 instruction set.
  – S/W: Code Morphing Software (CMS)
    • dynamically “morphs” x86 instructions into VLIW instructions.
The Crusoe Processor
• CMS technology changes the entire approach to designing microprocessors.
  – It demonstrates that practical microprocessors can be implemented as HW-SW hybrids.
  – It expands the design space.
  – Development teams may enlist software experts, working in parallel with hardware engineers, to bring products to market faster.
Technology Perspective
• CMS decouples the x86 ISA from the underlying processor hardware.
  – Each new CPU design only requires a new version of the Code Morphing Software to translate x86 instructions to the new CPU’s native instruction set.
• Because the CMS typically resides in standard Flash ROMs on the motherboard, improved versions can even be downloaded into the processor in the field.
x86 vs. Crusoe
Crusoe Processor Fundamentals
• VLIW engine
  – Two integer units, a floating-point unit, a memory (load/store) unit, and a branch unit
  – Molecule: a long (64- or 128-bit) instruction word containing up to four RISC-like instructions, called atoms
  – All atoms within a molecule are executed in parallel, and the molecule format directly determines how atoms get routed to functional units.
    • This greatly simplifies the decode and dispatch hardware.
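Why the molecule format simplifies dispatch can be sketched in a few lines. This is a minimal model, not Crusoe's real encoding: the unit names (ALU0, ALU1, FPU, MEM, BR), the `Atom` type, and the `dispatch` helper are hypothetical. Each atom already names its unit, so dispatch is pure routing with no dependency checks; the translator is assumed to have guaranteed independence.

```python
from dataclasses import dataclass

# Hypothetical names for the five functional units listed on the slide.
UNITS = {"ALU0", "ALU1", "FPU", "MEM", "BR"}
MAX_ATOMS = 4  # a molecule holds up to four atoms


@dataclass(frozen=True)
class Atom:
    op: str    # RISC-like operation, e.g. "add" or "ld"
    unit: str  # functional unit this atom is routed to


def dispatch(molecule):
    """Route every atom of one molecule to its unit; all issue together.

    No dependency checking or reordering happens here: the translator
    is assumed to have placed only independent atoms in the molecule.
    """
    if len(molecule) > MAX_ATOMS:
        raise ValueError("a molecule holds at most four atoms")
    routed = {}
    for atom in molecule:
        if atom.unit not in UNITS:
            raise ValueError(f"unknown unit {atom.unit}")
        if atom.unit in routed:
            raise ValueError(f"two atoms target {atom.unit}")
        routed[atom.unit] = atom.op
    return routed


print(dispatch([Atom("add", "ALU0"), Atom("ld", "MEM"), Atom("br", "BR")]))
# {'ALU0': 'add', 'MEM': 'ld', 'BR': 'br'}
```

Compare this with a superscalar front end, which must discover at run time which instructions are independent before it can route them.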
Crusoe Processor Fundamentals
• The integer register file
  – Has 64 registers, %r0 through %r63
  – CMS allocates some registers to hold x86 state, while others contain state internal to the system or are used as temporary registers.
Crusoe Processor Fundamentals
• To keep the processor running at full speed, molecules are packed as fully as possible with atoms.
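Packing molecules “as fully as possible” is a scheduling problem. Below is a minimal greedy list scheduler in that spirit. It is a sketch and not CMS's actual algorithm. It assumes one-cycle latency for every atom, one atom per unit per molecule, and a hypothetical `(name, unit, deps)` atom format.

```python
def pack(atoms, width=4):
    """Greedily pack atoms into molecules, preferring earlier atoms.

    atoms: list of (name, unit, deps) tuples in original program order,
    where deps is the set of atom names whose results this atom needs.
    Assumes one-cycle latency: an atom may go in the molecule right
    after the last molecule containing one of its dependencies.
    """
    placed = {}        # atom name -> index of the molecule it landed in
    molecules = []
    remaining = list(atoms)
    while remaining:
        cycle = len(molecules)
        current, used_units = [], set()
        for atom in list(remaining):
            name, unit, deps = atom
            # `placed` only holds atoms from earlier molecules.
            ready = all(d in placed for d in deps)
            if ready and unit not in used_units and len(current) < width:
                current.append(name)
                used_units.add(unit)
                remaining.remove(atom)
        if not current:
            raise ValueError("unsatisfiable dependency in input")
        for name in current:
            placed[name] = cycle
        molecules.append(current)
    return molecules


# Hypothetical atom stream: d needs a and b; e (a branch) needs d.
atoms = [
    ("a", "ALU0", set()),
    ("b", "MEM", set()),
    ("c", "ALU1", set()),
    ("d", "ALU0", {"a", "b"}),
    ("e", "BR", {"d"}),
]
print(pack(atoms))  # [['a', 'b', 'c'], ['d'], ['e']]
```

Independent atoms (a, b, c) share the first molecule. The dependence chain d -> e forces the later molecules to stay partly empty, which is exactly the waste the translator tries to minimize.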
Conventional superscalar…
• This type of processor hardware is much more complex than the Crusoe processor’s simple VLIW engine.
Code Morphing Software
• CMS
  – is fundamentally a dynamic translation system;
  – in this case, x86 ISA -> VLIW ISA.
  – The x86 ISA is the only thing x86 code sees.
    • The only program written directly for the VLIW engine is the Code Morphing Software itself.
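At its core, a dynamic translation system is a loop that looks up the current x86 address in a translation cache, translates on a miss, and runs the result. Below is a minimal sketch under stated assumptions: a “translation” is modeled as a Python callable that updates a guest-state dict and returns the next x86 address, and `CodeMorpher` and `toy_translate` are hypothetical names.

```python
class CodeMorpher:
    """Minimal translate-once, execute-many loop (hypothetical model).

    translate(pc) returns a "translation": a callable that runs one x86
    block against the guest state and returns the next x86 address.
    """

    def __init__(self, translate):
        self.translate = translate
        self.cache = {}              # translation cache: x86 address -> translation
        self.translations_made = 0

    def run(self, pc, state, halt):
        while pc != halt:
            code = self.cache.get(pc)
            if code is None:         # first visit: translate and remember
                code = self.translate(pc)
                self.cache[pc] = code
                self.translations_made += 1
            pc = code(state)         # later visits reuse the cached translation
        return state


def toy_translate(pc):
    """Stand-in translator for one hypothetical x86 loop block at 0x100."""
    if pc != 0x100:
        raise KeyError(hex(pc))

    def block(state):                # roughly: add eax, ecx / dec ecx / jnz 0x100
        state["eax"] += state["ecx"]
        state["ecx"] -= 1
        return 0x100 if state["ecx"] != 0 else 0x200

    return block


cm = CodeMorpher(toy_translate)
print(cm.run(0x100, {"eax": 0, "ecx": 5}, halt=0x200), cm.translations_made)
# {'eax': 15, 'ecx': 0} 1
```

The loop body executes five times but is translated only once. That reuse is what lets the x86 code see nothing but the x86 ISA while the VLIW engine runs native translations.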
Hierarchy
Crusoe’s VLIW instr. Scheduling
Code Morphing Software
CMS Memory Layout
CMS: Drawing the HW-SW line
• Choosing which functions to implement in HW and which in SW is a major engineering challenge,
  – involving issues such as cost and complexity, overall performance, and power consumption.
  – For example, the HW-SW line might be drawn differently for a high-end server processor.
CMS: Decoding and Scheduling
• Code Morphing can translate an entire group of x86 instructions at once,
  – whereas a superscalar x86 translates single instructions in isolation.
• The Code Morphing approach can amortize the cost of translation over many executions,
  – allowing it to use much more sophisticated translation and scheduling algorithms.
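The amortization argument is simple arithmetic. Translating once costs T cycles, and each later execution saves the difference between interpreted and translated cost. Translation therefore pays off after n executions once T + n·c_native ≤ n·c_interp. The cycle counts below are invented for illustration. They are not Crusoe measurements.

```python
import math


def break_even_runs(translate_cost, interp_cost_per_run, native_cost_per_run):
    """Smallest n with translate_cost + n * native <= n * interp, or None."""
    saving = interp_cost_per_run - native_cost_per_run
    if saving <= 0:
        return None  # translation never pays off
    return math.ceil(translate_cost / saving)


# Hypothetical costs (cycles): a cheap translator vs. an expensive optimizer.
print(break_even_runs(10_000, 500, 50))   # 23
print(break_even_runs(40_000, 500, 20))   # 84
```

The optimizing translator costs four times as much up front. For hot code that runs thousands of times, it still wins, which is why amortization makes sophisticated algorithms affordable.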
CMS: Caching
• The translation cache resides in a separate memory space that is inaccessible to x86 code.
• As an application executes,
  – Code Morphing “learns” more about the program and improves its translations, so the program executes faster and faster.
• As a result, some benchmarks (for example, short-running ones that finish before translations are reused) do not accurately predict the performance of the Crusoe processor!
CMS: Filtering
• The translation system needs to
  – choose carefully how much effort to spend on translating and optimizing a given piece of x86 code.
• A wide choice of execution modes
  – Interpretation only (no translation)
  – Simple-minded code generation
  – Highly optimized code generation
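One common way to choose among these modes is to count executions and promote a block to a more expensive mode as it proves hot. Below is a sketch of that filtering policy. The thresholds, the `Filter` class, and the counter-per-address design are assumptions for illustration. The slide does not give CMS's actual policy.

```python
from enum import Enum


class Mode(Enum):
    INTERPRET = "interpretation only"
    SIMPLE = "simple-minded code generation"
    OPTIMIZED = "highly optimized code generation"


# Hypothetical promotion thresholds (execution counts).
SIMPLE_THRESHOLD = 50
OPTIMIZE_THRESHOLD = 5000


class Filter:
    def __init__(self):
        self.counts = {}   # x86 block address -> times executed

    def mode_for(self, pc):
        """Record one more execution of the block at pc and pick its mode."""
        n = self.counts.get(pc, 0) + 1
        self.counts[pc] = n
        if n >= OPTIMIZE_THRESHOLD:
            return Mode.OPTIMIZED
        if n >= SIMPLE_THRESHOLD:
            return Mode.SIMPLE
        return Mode.INTERPRET


f = Filter()
modes = [f.mode_for(0x40) for _ in range(5000)]
print(modes[0], modes[49], modes[4999])
```

Code that runs only a few times never pays any translation cost. Only the hottest blocks reach the expensive optimizer.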
CMS: Prediction and Path Selection
• CMS can gather feedback
  – Instrumentation profiling
    • The translator adds code to collect information.
  – This data can be used later to decide when and what to optimize and translate.
    • For example, if a given branch is highly biased, …
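The instrumentation idea can be sketched as per-branch counters that translated code bumps, plus a query that reports whether a branch is biased enough to specialize for. The slide is cut off before saying what CMS then does with a biased branch, so this sketch stops at detecting the bias. `BranchProfile` and the 90% threshold are hypothetical.

```python
# Hypothetical bias threshold: fraction of executions going one way.
BIAS_THRESHOLD = 0.9


class BranchProfile:
    """Counters that instrumented translations would update at each branch."""

    def __init__(self):
        self.taken = {}
        self.total = {}

    def record(self, branch_pc, was_taken):
        self.total[branch_pc] = self.total.get(branch_pc, 0) + 1
        if was_taken:
            self.taken[branch_pc] = self.taken.get(branch_pc, 0) + 1

    def predicted_path(self, branch_pc):
        """'taken' or 'not-taken' if the branch is highly biased, else None."""
        total = self.total.get(branch_pc, 0)
        if total == 0:
            return None                       # never observed
        taken = self.taken.get(branch_pc, 0)
        if taken / total >= BIAS_THRESHOLD:
            return "taken"
        if (total - taken) / total >= BIAS_THRESHOLD:
            return "not-taken"
        return None                           # not biased enough


p = BranchProfile()
for i in range(100):
    p.record(0x10, i % 20 != 0)   # taken 95 times out of 100
    p.record(0x20, i % 2 == 0)    # taken half the time
    p.record(0x30, False)         # never taken
print(p.predicted_path(0x10), p.predicted_path(0x20), p.predicted_path(0x30))
# taken None not-taken
```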
CMS: Making a Translation
[Figure: translation stages: Front end -> Well-known optimizations -> Scheduling]
• The molecules explicitly encode the instruction-level parallelism, so they can be executed by a simple VLIW engine.
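HW Support for Code Morphing
• Exceptions
  – The “precise exception” problem: an operation moved earlier by the scheduler may trap “too soon”, so the x86 state at the trap does not match any precise point in the original instruction order.
* Solution: use shadow registers!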
HW Support for Code Morphing
• All registers holding x86 state are shadowed (working/shadow copy).
  – Normal atoms only update the working copy of the register.
  – “commit” operation: working -> shadow registers
  – “rollback” operation: shadow -> working registers
• Undoing changes to memory
  – Store data is held in a “gated store buffer”.
  – Commit / rollback
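Commit and rollback can be modeled with two copies of the register state and a buffer of not-yet-visible stores. This is a minimal sketch with hypothetical names (`SpeculativeState`, `write_reg`, `store`). Letting a load see the translation's own pending stores is an assumption of the model and is not stated on the slide.

```python
class SpeculativeState:
    """Working/shadow register copies plus a gated store buffer (hypothetical)."""

    def __init__(self, regs, memory):
        self.shadow = dict(regs)     # committed x86 state
        self.working = dict(regs)    # what normal atoms update
        self.memory = memory         # committed memory
        self.store_buffer = []       # gated stores (addr, value), not yet visible

    def write_reg(self, name, value):
        self.working[name] = value

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Assumption of this model: pending stores are forwarded (newest wins).
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def commit(self):
        """End of a translation: working -> shadow, drain gated stores."""
        self.shadow = dict(self.working)
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()

    def rollback(self):
        """Exception inside a translation: shadow -> working, drop gated stores."""
        self.working = dict(self.shadow)
        self.store_buffer.clear()


s = SpeculativeState({"eax": 1}, {100: 7})
s.write_reg("eax", 2)
s.store(100, 9)
s.rollback()                 # e.g. an atom trapped "too soon"
print(s.working, s.memory)   # {'eax': 1} {100: 7}
```

After a rollback, the registers and memory again match the x86 state at the start of the translation. From there, CMS can re-execute that region more carefully to locate the precise faulting instruction.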
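HW Support for Code Morphing
• Alias hardware
  – When the translator moves a load operation ahead of a store operation,
  – it converts the load into a load-and-protect (ldp) and the store into a store-under-alias-mask (stam).
  – This makes it always safe to reorder memory loads and stores: if the store turns out to overlap a protected load, the hardware raises an exception and the CMS can roll back.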
HW Support for Code Morphing
• Alias hardware

<Original Code>
  St   0(r1), r2
  …
  Ld   r3, 0(r4)
  …
  St   0(r5), r6
  …
  Ld   r7, 0(r8)
  Add  r9, r3, r7

<Rescheduled Code> - Unsafe
  Ld   r3, 0(r4)
  Ld   r7, 0(r8)
  St   0(r1), r2
  …
  …
  St   0(r5), r6
  …
  Add  r9, r3, r7

<Rescheduled Code> - Protected
  Ldp  r3, 0(r4)      x     (checked by Stam 0(r1))
  Ldp  r7, 0(r8)      x x   (checked by both Stams)
  Stam 0(r1), r2
  …
  …
  Stam 0(r5), r6
  …
  Add  r9, r3, r7

* The ldp/stam pair is an excellent example of the interplay between the codesigned hardware and software in a codesigned VM.
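The ldp/stam interplay can be simulated directly. In this hypothetical model, each ldp records its address range in a numbered protection entry. Each stam carries a mask naming the entries it must check, which matches the x marks above: Ld r3 was hoisted only above St 0(r1), and Ld r7 was hoisted above both stores. Entry numbering, 4-byte accesses, and the `AliasFault` name are assumptions.

```python
class AliasFault(Exception):
    """A stam overlapped a protected load; CMS would then roll back."""


class AliasHardware:
    def __init__(self, n_entries=8):
        self.entries = [None] * n_entries   # protected (addr, size) ranges

    def ldp(self, memory, addr, entry, size=4):
        """Load-and-protect: do the load and remember its address range."""
        self.entries[entry] = (addr, size)
        return memory.get(addr, 0)

    def stam(self, memory, addr, value, mask, size=4):
        """Store-under-alias-mask: check masked entries before storing."""
        for entry in mask:
            prot = self.entries[entry]
            if prot is not None:
                p_addr, p_size = prot
                if addr < p_addr + p_size and p_addr < addr + size:
                    raise AliasFault(f"store to {addr:#x} aliases entry {entry}")
        memory[addr] = value


# Hypothetical addresses: r4 = 0x100, r8 = 0x200, r1 = 0x300, r5 = 0x200.
mem = {0x100: 1, 0x200: 2}
hw = AliasHardware()
r3 = hw.ldp(mem, 0x100, entry=0)        # Ldp r3, 0(r4)
r7 = hw.ldp(mem, 0x200, entry=1)        # Ldp r7, 0(r8)
hw.stam(mem, 0x300, 5, mask={0, 1})     # Stam 0(r1), r2: no overlap
try:
    hw.stam(mem, 0x200, 9, mask={1})    # Stam 0(r5), r6: aliases Ld r7
except AliasFault as e:
    print("alias detected:", e)
```

The hardware does only the cheap address comparison. The software decides which loads each store must be checked against, and what to do when a check fails.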
HW Support for Code Morphing
• Coping with self-modifying code
  – x86 instructions in memory get overwritten, either
    • because the OS is loading a new program, or
    • because an application is using self-modifying code.
  – When this happens to code that has already been translated,
    • the CMS needs to be notified, to keep it from erroneously executing a translation of the old code.
HW Support for Code Morphing
• Coping with self-modifying code
  – Whenever the system translates a block of x86 code, it write-protects the page.
    • It does so by setting a dedicated “translated” bit in that page’s entry in the processor’s memory management unit.
    • That bit is invisible to x86 software.
  – When a protected page is written to, the simplest remedy is to invalidate the affected translations.
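The “translated” bit scheme can be sketched as a set of protected page numbers consulted on every guest store. This shows the simplest remedy from the slide: drop every translation on a written page. The 4 KiB page size and the `TranslationManager` name are assumptions for the sketch.

```python
PAGE_SIZE = 4096  # assumed page size


class TranslationManager:
    def __init__(self):
        self.translated_pages = set()   # pages whose "translated" bit is set
        self.translations = {}          # x86 block start -> (end, translation)

    def add_translation(self, start, end, code):
        """Record a translation of x86 bytes [start, end) and protect its pages."""
        self.translations[start] = (end, code)
        for page in range(start // PAGE_SIZE, (end - 1) // PAGE_SIZE + 1):
            self.translated_pages.add(page)

    def on_write(self, addr):
        """Called for a guest store; a hit on a protected page invalidates."""
        page = addr // PAGE_SIZE
        if page not in self.translated_pages:
            return  # ordinary data page: no CMS involvement
        # Simplest remedy: drop every translation touching this page.
        for start in list(self.translations):
            end, _ = self.translations[start]
            if start // PAGE_SIZE <= page <= (end - 1) // PAGE_SIZE:
                del self.translations[start]
        self.translated_pages.discard(page)


tm = TranslationManager()
tm.add_translation(0x1000, 0x1040, "block A")
tm.add_translation(0x3000, 0x3010, "block B")
tm.on_write(0x8000)                  # data page: nothing happens
tm.on_write(0x1FF0)                  # same page as block A
print(sorted(hex(a) for a in tm.translations))   # ['0x3000']
```

Because the check is keyed on a page bit, ordinary data stores pay nothing. Only stores to pages that hold translated code trap into the CMS.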
Example: A complex translation
Case Study (2):
IBM AS/400
From IBM’s homepage…
• The accelerating rate of change of both hardware and software technologies necessitates that the system you select has been designed with the future in mind.
  – “We believe that the IBM AS/400 will be the number one choice!”
Introduction
• The design of the AS/400 insulates application programs from changing hardware characteristics through a layer of microcode.
  – The interface: TIMI (Technology Independent Machine Interface)
  – The microcode layer: LIC (Licensed Internal Code)
• In 1995, the AS/400 changed its processor technology (CISC -> 64-bit RISC).
  – No recompiling or rewriting was required.
  – Not only did existing programs run, but they were fully 64-bit programs.
AS/400 architecture
The TIMI layer separates the hardware and LIC from the OS.
TIMI instructions are translated to a specific hardware instruction set as part of the back end of the compilation process.
AS/400 architecture
• TIMI is a virtual instruction set.
  – All user-mode programs are stored as TIMI instructions.
  – Conceptually somewhat similar to the VM architectures of programming environments such as Smalltalk, Java, and .NET
  – Stored within the final program object
  – Object-based ISA
Memory Architecture
• The TIMI has a memory architecture composed of objects.
  – The objects are completely isolated from one another and can only be accessed via pointers.
  – Actual address values contained in pointers are not made visible to SW above the TIMI.
  – The implementation of the object-based memory is done entirely below the TIMI.
Memory Architecture
• Protecting the integrity of pointers is an essential part of any object-based system.
  – The object pointers are encoded in 128 bits.
    • Upper 64 bits: type info, authorization, …
    • Lower 64 bits: 64-bit PowerPC virtual address
  – Significant extension to the PowerPC memory architecture
    • Added protection for object pointers
      – Load/store-pointer instructions
      – A 65th bit indicating whether the location contains a pointer
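The 65th-bit idea can be sketched as tagged memory. Only the store-pointer path sets tags, and any ordinary store clears the tag on the word it overwrites, so a pointer forged or altered by normal data writes fails at load-pointer time. The slide only says a 65th bit marks pointer locations. The word-level tag granularity, the `PermissionError` on failure, and the `TaggedMemory` API are assumptions.

```python
MASK64 = (1 << 64) - 1


class TaggedMemory:
    """Hypothetical model: 64-bit words, each with an extra (65th) tag bit."""

    def __init__(self):
        self.words = {}   # 8-byte-aligned address -> 64-bit value
        self.tags = {}    # address -> True if the word belongs to a valid pointer

    def store_word(self, addr, value):
        """Ordinary store: always clears the tag of the word it writes."""
        self.words[addr] = value & MASK64
        self.tags[addr] = False

    def store_pointer(self, addr, type_info, vaddr):
        """Store-pointer: a 128-bit pointer written as two tagged words."""
        self.words[addr] = type_info & MASK64   # upper 64 bits: type/authority
        self.words[addr + 8] = vaddr & MASK64   # lower 64 bits: virtual address
        self.tags[addr] = self.tags[addr + 8] = True

    def load_pointer(self, addr):
        """Load-pointer: succeed only if both halves still carry their tag."""
        if not (self.tags.get(addr, False) and self.tags.get(addr + 8, False)):
            raise PermissionError(f"no valid pointer at {addr:#x}")
        return (self.words[addr] << 64) | self.words[addr + 8]


m = TaggedMemory()
m.store_pointer(0x40, type_info=0xABCD, vaddr=0x12345678)
p = m.load_pointer(0x40)
print(hex(p >> 64), hex(p & MASK64))   # 0xabcd 0x12345678
m.store_word(0x48, 0x9999)             # overwrite the address half with data
```

After the last line, `m.load_pointer(0x40)` raises, because the overwritten half lost its tag.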
Instruction Set
• TIMI instruction format (all fields after the opcode are marked optional)
    opcode (2 bytes) | opcode extend (2 bytes) | operand1 … operandN (3 bytes each) | dest1 … dest4 (3 bytes each)
• Multiway conditional branch
  – This is the “architected representation”.
  – It is translated to an implementation-dependent form, and it does the work of multiple RISC instructions.
  – Example: Addn & branch | extend: Eq 0 Gt 0 0 0 | operands: sum, addend1, addend2 | destinations: dest1, dest2
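To see why one TIMI instruction “does the work of multiple RISC instructions,” here is a sketch of what “Addn & branch” might compute. The semantics are a hypothetical reading of the slide's “Eq 0 Gt 0” condition fields: add the addends, store the sum, then go to dest1 if the sum equals zero, to dest2 if it is greater than zero, and otherwise fall through. The function name and return shape are illustrative only.

```python
def addn_and_branch(addend1, addend2, dest1, dest2, fallthrough):
    """Hypothetical semantics: sum = addend1 + addend2, then a multiway
    branch: dest1 if sum == 0, dest2 if sum > 0, else fall through.

    On a RISC back end this one instruction might expand to roughly:
    add; compare with zero; branch-if-equal; branch-if-greater.
    """
    total = addend1 + addend2
    if total == 0:
        target = dest1
    elif total > 0:
        target = dest2
    else:
        target = fallthrough
    return total, target


print(addn_and_branch(2, -2, "L1", "L2", "next"))   # (0, 'L1')
print(addn_and_branch(3, 4, "L1", "L2", "next"))    # (7, 'L2')
```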
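Instruction Set
[Figure: the instructions “addn 34, 32, 31” and “muln 36, 34, 37” name their operands by ODT number; the ODT direction vector (entries 1, 31–37, …) maps each number into the ODT entry string, whose entries describe operands such as const, Binary(2), and Binary(4)]
• Add numeric (addn) and multiply numeric (muln) are generic: the opcode does not fix the operand types.
• Entries in the ODT (object definition table) indicate the types of operands and the data flow.
• The actual storage locations are assigned only after the TIMI is translated.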
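Because addn and muln are generic, the translator must consult the ODT to pick a type-specific operation. Below is a sketch of that lookup. The ODT contents are invented and do not reproduce the figure's actual entries. Handling only binary operands and using the widest operand's width are simplifying assumptions, and `specialize` is a hypothetical helper name.

```python
# Hypothetical ODT: operand number -> (type, length in bytes).
ODT = {
    31: ("binary", 2),
    32: ("binary", 2),
    34: ("binary", 4),
}


def specialize(opcode, operands, odt):
    """Turn a generic numeric instruction into a type-specific one.

    Assumptions: only binary operands are handled, and the operation
    is performed at the width of the widest operand.
    """
    types = [odt[n] for n in operands]
    if any(kind != "binary" for kind, _ in types):
        raise NotImplementedError("only binary operands in this sketch")
    width = max(length for _, length in types)
    return f"{opcode}.b{width}"


print(specialize("addn", [34, 32, 31], ODT))   # addn.b4
```

The same `addn` opcode would specialize differently if the ODT described, for example, packed-decimal operands. That is what keeps the TIMI program independent of how types are implemented in hardware.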
Input/Output
• The presence of IOPs (I/O processors) simplifies the task of pushing the device-dependent aspects out of the central processor.
Input/Output
• At the level of the TIMI,
  – there is no secondary (disk) storage; rather, it is part of the unified memory architecture.
    • All disk management SW, drivers, etc. exist in the implementation-dependent part of the system.
• The OS interacts with SW below the TIMI level (and with I/O devices)
  – through instructions that operate on TIMI-level objects.
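“No secondary storage at the TIMI level” means software above the interface sees one address space. Moving pages between disk and main memory happens underneath. Below is a minimal sketch of such a single-level store. A dict stands in for the disk, and the two-frame memory, FIFO eviction, and `SingleLevelStore` name are assumptions chosen only to keep the example small.

```python
class SingleLevelStore:
    """Hypothetical single-level store: callers see one address space;
    paging between "disk" and main memory happens below the interface."""

    PAGE = 4096

    def __init__(self, disk, frames=2):
        self.disk = disk          # page number -> bytearray (secondary storage)
        self.resident = {}        # page number -> bytearray in main memory
        self.frames = frames
        self.page_faults = 0

    def _page(self, number):
        if number not in self.resident:
            self.page_faults += 1
            if len(self.resident) >= self.frames:   # evict the oldest page (FIFO)
                victim = next(iter(self.resident))
                self.disk[victim] = self.resident.pop(victim)
            self.resident[number] = self.disk.setdefault(number, bytearray(self.PAGE))
        return self.resident[number]

    def read(self, addr):
        return self._page(addr // self.PAGE)[addr % self.PAGE]

    def write(self, addr, value):
        self._page(addr // self.PAGE)[addr % self.PAGE] = value


s = SingleLevelStore({})
s.write(0, 7)
s.write(4096, 8)
s.write(8192, 9)       # third page: page 0 is moved out to "disk"
print(s.read(0), s.page_faults)   # 7 4
```

The caller never issues a disk operation. The value written at address 0 survives eviction because the store layer pages it back in transparently.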
Input/Output
• TIMI-Supported Objects
  – Access group, Context, …
  – Authorization List, User Profile, …
  – Dictionary, Index, …
  – Queue, Mode descriptor, …
  – Logical unit descriptor, …
  – Module, Program, …
Code Translation & Concealment
• HLL -> Template (TIMI + ODT) -> Program Object
• The contents of the program object cannot be directly observed above the TIMI level.
• Materialization
  – gives the program back to the user in its original, machine-independent form.
  – The platform switch is transparent to the user.
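The flow from template to program object, materialization, and retranslation on a new platform can be sketched with two small records. This is illustrative only. `translate` is a stand-in back end that just tags each instruction with the platform name, and the ODT contents are invented. Python cannot enforce the concealment the TIMI provides, so here it is a convention.

```python
from dataclasses import dataclass


@dataclass
class Template:
    timi: list   # machine-independent TIMI instructions
    odt: dict    # object definition table: operand number -> description


@dataclass
class ProgramObject:
    template: Template   # kept inside the object; not observable above the TIMI
    native: list         # implementation-dependent executable code
    platform: str


def translate(template, platform):
    # Stand-in back end: a real translator would emit target machine code.
    return [f"{platform}: {ins}" for ins in template.timi]


def create_program(template, platform):
    return ProgramObject(template, translate(template, platform), platform)


def materialize(program):
    """Hand back a copy of the original, machine-independent form."""
    return Template(list(program.template.timi), dict(program.template.odt))


def retarget(program, new_platform):
    """Platform switch: retranslate from the stored template, no recompiling."""
    return create_program(program.template, new_platform)


t = Template(["addn 34, 32, 31"], {34: "Binary(4)", 32: "Binary(2)", 31: "Binary(2)"})
old = create_program(t, "cisc")
new = retarget(old, "powerpc")
print(new.native)             # ['powerpc: addn 34, 32, 31']
print(materialize(new) == t)  # True
```

Keeping the template inside the program object is what makes the 1995 CISC-to-RISC switch possible without source code: the system retranslates from the stored TIMI.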
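Code Translation & Concealment
[Figure: an HLL program (space object) is compiled into a template (TIMI, ODT) held in a space object; at the TIMI level, “create program” (source: the template; result: the program object) invokes the translator, producing a program object that contains the template (TIMI, ODT) plus implementation-dependent executable code]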