Emulation: Interpretation and Binary Translation

Post on 15-Jan-2016

185 views 12 download

description

Emulation: Interpretation and Binary Translation. Haiyu Li lihaiyu@capp.snu.ac.kr. Content. Emulation Interpreter Decode-and-Dispatch Interpreter Threaded Interpretation Direct Threaded Interpretation with Predecoding Comparison - PowerPoint PPT Presentation

Transcript of Emulation: Interpretation and Binary Translation

CAPP

Emulation: Interpretation and Binary Translation

Haiyu Li

lihaiyu@capp.snu.ac.kr

Content

Emulation Interpreter

Decode-and-Dispatch Interpreter Threaded Interpretation Direct Threaded Interpretation with Predecoding

Comparison Interpreting a Complex Instruction Set (CISC)

Emulation

IA-32 program

FX!32 emulator

Alpha ISA HostTarget ISA

Host

GuestSource ISA

emulator

Emulation

Emulation : Interpretation, binary translation

Interpretation:

Fetching analyzing performing

next

Interpreter

Code

Data

Stack

Program Counter

Condition Codes

Reg 0

Reg 1

……

Reg n-1

Interpreter Code

Source Memory StateSource Context

Block

Interpreter Overview

Decode-and-dispatch interpreter

decodes an instruction

dispatches it to an interpretation routine based on the type of instruction.

Decode-and-dispatch interpreter

While (!halt&&interrupt){

inst=code[PC];

opcode=extract(inst,31,6);

switch(opcode){

case LoadWordAndZero:LoadWordAndZero(inst);

case ALU:ALU(inst);

case Branch: Branch(inst);

· · · · ·

}

Code for Interpreting the PowerPC Instruction Set Architecture.

Decode-and-dispatch interpreter

Instruction function list

LoadWordAndZero(inst){

RT=extract(inst,25,5);

RA=extract(inst,20,5);

displacement=

source=regs[RA];

address=source+displacement;

regs[RT]=(data[address]<<32)>>32

PC=PC+4;

}

ALU(inst){

RT=extract(inst,25,5);

RA=extract(inst,20,5);

RB=extract(inst,15,5);

source1=

source2=

extended_opcode=extract(inst,10,10);

switch(extended_opcode){

case Add:Add(inst);

case AddCarrying:

· · · · · · ·}

PC=PC+4;

}

Decode-and-dispatch interpreter

Advantage: Low memory requirements Zero star-up time

Disadvantage: Steady-state performance is slow- A source instruction must be parsed each time

it is emulated- Put a lot of pressure on the cache- Branches .

Decode-and-dispatch interpreter

Switch(opcode)

case

return

1.Switch statement->case Indirect

2.ALU(inst) direct

3.Return from the routine Indirect

4.Loop back-edge direct

While (!halt&&interrupt){ switch(opcode){ case ALU:ALU(inst); · · · · ·}

Threaded Interpretation

Instruction function list Add: RT=extract(inst,25,5); RA=extract(inst,20,5); RB=extract(inst,15,5); source1=regs[RA]; source2=regs[RB]; sum=source1+source2; regs[RT]=sum; PC=PC+4; If (halt || interrupt) goto exit; inst=code[PC]; opcode=extract(inst,31,6); extended_opcode=extract(inst,10,10); routine=dispatch[opcode,extended_opcode]; goto *routine;}

Switch(opcode)

case

case

case

Put the dispatch code to the end of each of the instruction interpretation routines.

Threaded Interpretation

Advantage: low Memory requirements (more than decode-

and-dispatch interpretation) Start-up time is zero

Disadvantage: Steady-state performance is slow (better than

decode-and-dispatch interpretation) - Indirect branch

Predecoding

lwz r1, 8(r2) ;load word and zeroAdd r3, r3, r1 ; r3=r3+r1stw r3, 0(r4) ;store word

LoadWordAndZero(inst) RT=extract(inst,25,5); RA=extract(inst,20,5); RB=extract(inst,15,5);

Parsing an instruction, putting it in a form (intermediate form)

07

1 2 08

Intermediate form

RT= code[TPC].dest;RA= code[TPC].src1;displacement=code[TPC].src2

Predecoding

Struct instruction{

unsigned long op;

unsigned char dest;

unsigned char src1;

unsigned int src2;

} code [CODE_SIZE

Fast interpretation, but Memory requirements are high.

Direct Threaded Interpretation

the instruction codes contained in the intermediate code can be replaced with the actual addresses of the interpreter routines.

001048d0

1 2 08

07

1 2 08

If (halt || interrupt) goto exit; opcode= code[TPC].op; routine=dispatch [opcode]; goto *routine;

If (halt || interrupt) goto exit; routine= code[TPC].op; goto *routine;

Comparison

Source code source code Interpreter routines Source code Interpreter routines

dispatch loop

(a) (b ) ( c)

Comparison Intermediate

code

Interpreter routines

Source code

(d)

Predecoder

ComparisonDecode-and-Dispatch Interpretation

Indirect Threaded Interpreter

Direct Threaded Interpreter with Predecoding

Memory requirements

Low Low(more than the first one)

High

Start-up performance

Fast Fast Slow

Steady-state

performance Slow Slow

(better than the first one)

Medium

Code portability

Good Good Medium

RISC ISA & CISC ISA RISC ISA (Power PC) 32 bit register. 32bit

length.

Op Rd Rs1 Rs2 OpxRegister-register

Op Rd Rs1 Const

31 25 20 15 10 0

31 25 20 15 0

Register-immediate

Op Const opx

2

Jump/call

Interpreting a Complex Instruction Set

Prefixes Opcode ModR/M SIB Displacement Immediate

CISC instruction sets have wide variety of formats, variable instruction lengths, and even variable field lengths

Up to fourPrefixes of 1 byte each(optional)

1-,2-,or 3-byteopcode

1byte(if required)

1byte(if required)

AddressDisplacement

Of 1,2,or 4Bytes or none

Immediate data

Of 1,2,or 4Bytes or none

Mod Reg/

Opcode

R/M Scale Index Base

7 6 5 3 2 0 7 6 5 3 2 0

IA-32 Instruction Format

Jaemok Lee
X86 명령어는 피연산자로서, 메모리에 있는 값을 사용할 수 있는 경우가 상당히 많습니다.즉... 다음과 같은 명령어가 가능합니다.ADD AX, [SI]이건 SI 레지스터가 가르키고 있는 주소에 있는 값을, AX 라는 레지스터에 더하는 명령입니다.RISC 명령으로 표현하자면,LD R1, [SI]ADD AX, AX, R1이렇게 되겠죠...이렇게 피연산자가 Register 인 경우도 있고, Memory 인 경우도 있습니다. 그것을 구별하기 위해서 ModR/M 이라는 플래그가 있습니다.

IA-32 Instruction set

ModR/M:e.g. Mod=10 R/M=000 REG=001 ModR/M=10001000=88H Effective Address = [EAX]+disp32

IA-32 Instruction set

SIB e.g. SS=01=2H; Index=000; Base=001; SIB value=01000001=41H; Scales Index=[EAX*2] Mod bits Effective Address 00 [scales index]+disp32 01 [scales index]+disp8+[EBP] 10 [scales index]+disp32+[EBP]

IA-32 Instruction set

Add AX, 8[ESI]; Effective addr=ESI+disp Disp=8 AX=AX+mem[8+[ESI]];

Prefixes Opcode ModR/M SIB Displacement Immediate

0 ADD 01 000 110 000000…….1000

AX

Mod=01;

R/M=110;

Reg=000;

ADD Mod Reg R/M Displacement

Interpreting a Complex Instruction Set

Program Flow

for a Basic CISC ISA Interpreter Figure 2.12 (code) General

Decode(fill-in instruction

Structure)

Dispatch

Inst.1Specialized

routine

Inst.1Specialized

routine

Inst.1Specialized

routine

Interpreting a Complex Instruction Set Interpreter Based on

Optimization of Common Cases

SimpleInst.1

Specializedroutine

SimpleInst.m

Specializedroutine

ComplexInst.m+1

Specializedroutine

ComplexInst.m+1

Specializedroutine

PrefixSet flags

Dispatch On

first byte

Shared routines

Interpreting a Complex Instruction Set

Threaded Interpreter for a CISC ISA

SimpleInstructionSpecialized

routine

SimpleDecode/Dispatch

SimpleInstructionSpecialized

routine

SimpleDecode/Dispatch

SimpleInstructionSpecialized

routine

SimpleDecode/Dispatch

SimpleDecode/Dispatch

SimpleInstructionSpecialized

routine

ComplexDecode/Dispatch