
Post on 05-Jan-2016


Introduction to Language Processing Technology

Natawut Nupairoj, Ph.D.

Department of Computer Engineering

Chulalongkorn University

Outline

• Level of Programming Languages.
• Language Processors.
• Specification of Programming Languages.

Level of Programming Languages

• High level: C / Java / Pascal
• Low level: Assembly / Bytecode
• Machine language

High-level language (C):

void swap(int v[], int k)
{
    int temp;

    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}

The C compiler translates this into assembly language:

swap:
    muli $2, $5, 4
    add  $2, $4, $2
    lw   $15, 0($2)
    ...

The assembler translates the assembly into machine language:

000010001101101100110000
000010001101101100110000
000010001101101100110000
000010001101101100110000
...

High-Level Language Characteristics

• Expressions: a = b + (c - d)/2;
• Data types: integer, character, boolean; record, array.
• Control structures: selective, iterative.
• Declarations: an identifier can denote a constant, variable, procedure, function, or type.
• Abstraction: an object-oriented concept; concerned with what is done, not how.
• Encapsulation: an object-oriented concept; information hiding.

Language Processors

A language processor is a program that manipulates programs expressed in some programming language.

Examples: editor, translator / compiler, interpreter.

Translator

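Translates a “source” program into an “equivalent” “object” program.

[Figure: source program → Translator → object program, with error messages as a second output. Example source languages: C, C++, FORTRAN, Java, VB. Example object languages: Assembly, C, Bytecode, p-code.]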

Tombstone Diagrams

Ordinary program

Program P, written in language L: the tombstone shows P on top and L below.

[Figure: example tombstones: Sort written in Java, Sort in x86 code, Web Browser in x86 code.]

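Tombstone Diagrams

Machine

Machine M: the tombstone is labeled M.

[Figure: tombstones for the machines x86 and SPARC, and a Web Browser program in x86 code placed on an x86 machine.]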

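Tombstone Diagrams

Translator

S → T over L: S is translated to T, and the translator itself is written in language L.

[Figure: example translator tombstones: Java → x86 written in C, Java → x86 written in x86, Java → C written in C++.]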

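Tombstone Diagrams

Compilation

[Figure: Sort written in C is fed to a C → x86 compiler that is written in x86 and runs on an x86 machine. The result is Sort in x86 code, which then runs on the x86 machine.]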
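The compilation diagram can be read as a pair of matching rules: a translator written in L can only run on machine L, and it only accepts programs written in its source language S. A minimal Python sketch of those rules follows; the names `Program`, `Translator` and `translate` are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str       # what the program does, e.g. "Sort"
    language: str   # language it is written in, e.g. "C"


@dataclass(frozen=True)
class Translator:
    source: str     # S: language accepted
    target: str     # T: language produced
    language: str   # L: language the translator itself is written in


def translate(program: Program, translator: Translator, machine: str) -> Program:
    """Run `translator` on `machine` over `program`, enforcing the tombstone rules."""
    if translator.language != machine:
        raise ValueError(
            f"translator written in {translator.language} cannot run on {machine}")
    if program.language != translator.source:
        raise ValueError(
            f"translator expects {translator.source}, got {program.language}")
    # The program keeps its meaning; only its language changes.
    return Program(program.name, translator.target)


c_compiler = Translator(source="C", target="x86", language="x86")
sort_x86 = translate(Program("Sort", "C"), c_compiler, machine="x86")
print(sort_x86)  # Program(name='Sort', language='x86')
```

Cross compilation and two-stage compilation fit the same check: only the `target` of the translator, or the number of `translate` calls, changes.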

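Tombstone Diagrams

Cross Compilation

[Figure: Sort written in C is fed to a C → SPARC compiler that is written in x86 and runs on an x86 machine. The result is Sort in SPARC code, which runs on a SPARC machine.]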

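Tombstone Diagrams

Two-stage compilation

[Figure: Sort written in Java is translated to Sort in C by a Java → C translator written in x86. Sort in C is then translated to Sort in x86 code by a C → x86 compiler written in x86. Both translators run on an x86 machine.]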

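Tombstone Diagrams

Compiling a compiler

[Figure: a Pascal → x86 compiler written in C is fed to a C → x86 compiler that is written in x86 and runs on an x86 machine. The result is a Pascal → x86 compiler in x86 code.]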

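Tombstone Diagrams

Interpreter

S over L: interprets source language S; the interpreter is written in language L.

[Figure: example interpreter tombstones: a Basic interpreter written in x86 and an SQL interpreter written in SPARC. A Sort program written in Basic runs on the Basic interpreter, which runs on an x86 machine.]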

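Tombstone Diagrams

Abstract machine = hardware emulator: an interpreter for a low-level language.

[Figure: a 370 interpreter written in C is compiled by a C → x86 compiler on an x86 machine, giving a 370 interpreter in x86 code. That interpreter running on the x86 machine is equivalent to a 370 machine, so the program HW1 in 370 code runs on it as it would on a real 370.]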

Tombstone Diagrams

Java

• Portable environment: write once, run anywhere.
• Interpretive compiler.

[Figure: tombstones for a Java → JVM compiler written in M and a JVM interpreter written in M, where JVM = bytecode.]

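Tombstone Diagrams

[Figure: Sort written in Java is compiled to Sort in JVM code by a Java → JVM compiler written in x86 and running on an x86 machine. Sort in JVM code then runs on a JVM interpreter written in x86 on an x86 machine, or on a JVM interpreter written in SPARC on a SPARC machine.]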

Tombstone Diagrams

Bootstrapping: a compiler for language L that is itself written in L.

• Full bootstrap: start from nothing.
• Half bootstrap: start from another machine.

[Figure: tombstone of a C → NNP compiler written in NNP.]

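Tombstone Diagrams

Full Bootstrap

[Figure: first stages of a full bootstrap on machine NNP. Tombstones shown: a Csub → NNP compiler written in NNP, a Csub → NNP compiler written in Csub, and a C → NNP compiler written in Csub. Each compiler written in Csub is compiled on NNP by a Csub → NNP compiler, giving a version in NNP code.]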

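Tombstone Diagrams

[Figure: a C → NNP compiler written in C is compiled on NNP by a C → NNP compiler in NNP code, giving a new C → NNP compiler in NNP code.]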

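Tombstone Diagrams

[Figure: the complete full-bootstrap sequence on machine NNP: Csub → NNP written in Csub, then C → NNP written in Csub, then C → NNP written in C, each compiled on NNP by the compiler produced in the previous stage.]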

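Tombstone Diagrams

Half Bootstrap

[Figure: starting from a C → x86 compiler in x86 code on an x86 machine. Stage 1: a C → NNP compiler written in C is compiled by the C → x86 compiler, giving a C → NNP compiler in x86 code. Stage 2: the same C → NNP compiler written in C is compiled on the x86 machine by the stage-1 result, giving a C → NNP compiler in NNP code.]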
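The two stages of the half bootstrap can be traced with a small model in which compiling a compiler changes only the language it is written in. This is a sketch under that assumption; `Compiler` and `compile_compiler` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Compiler(NamedTuple):
    source: str      # language accepted
    target: str      # language produced
    written_in: str  # language the compiler itself is expressed in


def compile_compiler(subject: Compiler, tool: Compiler, machine: str) -> Compiler:
    """Compile `subject` using `tool`, with `tool` running on `machine`."""
    if tool.written_in != machine:
        raise ValueError("the tool must be executable on the machine")
    if subject.written_in != tool.source:
        raise ValueError("the tool must accept the language the subject is written in")
    # What the subject translates stays the same; only its own code changes language.
    return subject._replace(written_in=tool.target)


native_x86 = Compiler("C", "x86", written_in="x86")  # existing compiler on the old machine
nnp_in_c = Compiler("C", "NNP", written_in="C")      # new compiler, written in C

# Stage 1: build a cross compiler that runs on x86 but emits NNP code.
cross = compile_compiler(nnp_in_c, native_x86, machine="x86")
# Stage 2: recompile the same source with the cross compiler, still on x86.
native_nnp = compile_compiler(nnp_in_c, cross, machine="x86")

print(cross)
print(native_nnp)
```

Both stages run on the x86 machine; only the final result, a C → NNP compiler in NNP code, is moved to the new machine.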

Specification of Programming Languages

• Syntax: defines the symbols and structure of the language; given by a grammar.
• Contextual constraints: constraints beyond the grammar; rules of the language such as scope rules and type rules.
• Semantics: the meaning of a program, that is, its behavior when run; how a sentence S of the language L is translated to machine code on M.

Syntax

Context-free grammar:

• Terminals.
• Non-terminals / variables.
• Start symbol.
• Production rules.

Usually expressed in BNF notation.

BNF Notation

Backus-Naur Form. Given the production rules:

N → α
N → β

they can be written as:

N ::= α | β
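BNF collects all production rules that share a left-hand non-terminal into one `::=` rule whose alternatives are separated by `|`. The sketch below shows that grouping on a small made-up rule set; the `Digits` / `Digit` rules and the function name `to_bnf` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical production rules, each as (non-terminal, right-hand side symbols).
productions = [
    ("Digits", ["Digit"]),
    ("Digits", ["Digits", "Digit"]),
    ("Digit", ["0"]),
    ("Digit", ["1"]),
]


def to_bnf(rules):
    """Group rules sharing a left-hand side into one 'N ::= a | b' line each."""
    grouped = defaultdict(list)
    for lhs, rhs in rules:
        grouped[lhs].append(" ".join(rhs))
    # dicts keep insertion order, so non-terminals appear in order of first use.
    return [f"{lhs} ::= {' | '.join(alts)}" for lhs, alts in grouped.items()]


for line in to_bnf(productions):
    print(line)
# Digits ::= Digit | Digits Digit
# Digit ::= 0 | 1
```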

Example: Mini-Triangle Program

! This is a comment. It continues to the end-of-line.
let
    const m ~ 7;
    var n: Integer
in
    begin
        n := 2 * m * m;
        putint(n)
    end

Terminals:

begin const do else end if in let then var while
; : := ~ ( )
+ - * / < > = \
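Splitting a Mini-Triangle program into these terminals, plus identifiers and integer literals, is the job of a scanner. The following is a minimal regular-expression sketch, not the scanner of any particular compiler; the token kind names and the decision to accept upper-case letters (needed for `Integer`) are assumptions.

```python
import re

KEYWORDS = {"begin", "const", "do", "else", "end", "if",
            "in", "let", "then", "var", "while"}

TOKEN_RE = re.compile(r"""
    (?P<comment>![^\n]*)              # comment: from ! up to the end of the line
  | (?P<space>\s+)                    # white space between tokens
  | (?P<int>[0-9]+)                   # Integer-Literal
  | (?P<word>[A-Za-z][A-Za-z0-9]*)    # keyword or Identifier
  | (?P<symbol>:=|[;:~()+\-*/<>=\\])  # := is tried before the single :
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, spelling) pairs, skipping comments and blanks."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        kind, spelling = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue
        if kind == "word":
            kind = "keyword" if spelling in KEYWORDS else "identifier"
        tokens.append((kind, spelling))
    return tokens


source = """! This is a comment.
let const m ~ 7; var n: Integer
in begin n := 2 * m * m; putint(n) end
"""
tokens = tokenize(source)
print(tokens[:6])
```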

Mini-Triangle Syntax

Program ::= Command

Command ::= single-Command

| Command ; single-Command

single-Command ::= V-name := Expression

| Identifier ( Expression )

| if Expression then single-Command

else single-Command

| while Expression do single-Command

| let Declaration in single-Command

| begin Command end

Mini-Triangle Syntax

Expression ::= primary-Expression

| Expression Operator primary-Expression

primary-Expression ::= Integer-Literal

| V-name

| Operator primary-Expression

| ( Expression )

V-name ::= Identifier

Declaration ::= single-Declaration

| Declaration ; single-Declaration

single-Declaration ::= const Identifier ~ Expression

| var Identifier : Type-denoter

Mini-Triangle Syntax

Type-denoter ::= Identifier

Operator ::= + | - | * | / | < | > | = | \

Identifier ::= Letter | Identifier Letter

| Identifier Digit

Integer-Literal ::= Digit | Integer-Literal Digit

Comment ::= ! Graphic* eol

Letter ::= a | b | … | z

Digit ::= 0 | 1 | 2 | … | 9
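The Expression rules can be checked with a small recursive-descent recognizer. Because `Expression ::= Expression Operator primary-Expression` is left-recursive, the sketch below rewrites it as a primary-Expression followed by any number of `Operator primary-Expression` pairs, a standard transformation that accepts the same strings. The function names and the simplified lexer are assumptions, and keywords are not distinguished from identifiers here.

```python
import re

OPERATORS = set("+-*/<>=\\")
OPERAND_RE = re.compile(r"[0-9]+|[A-Za-z][A-Za-z0-9]*")


def tokens_of(text):
    # Simplified lexer: integer literals, identifiers, then any other single character.
    return re.findall(r"[0-9]+|[A-Za-z][A-Za-z0-9]*|\S", text)


def is_expression(text):
    """Return True if `text` can be derived from the Expression rule."""
    toks = tokens_of(text)
    pos = 0

    def primary():
        nonlocal pos
        if pos >= len(toks):
            return False
        tok = toks[pos]
        pos += 1
        if tok in OPERATORS:                  # Operator primary-Expression
            return primary()
        if tok == "(":                        # ( Expression )
            if not expression():
                return False
            if pos < len(toks) and toks[pos] == ")":
                pos += 1
                return True
            return False
        # Integer-Literal or V-name (an Identifier)
        return OPERAND_RE.fullmatch(tok) is not None

    def expression():
        nonlocal pos
        if not primary():
            return False
        # Left recursion replaced by iteration: (Operator primary-Expression)*
        while pos < len(toks) and toks[pos] in OPERATORS:
            pos += 1
            if not primary():
                return False
        return True

    # The whole input must be consumed for the text to be an Expression.
    return expression() and pos == len(toks)


print(is_expression("d + 10 * n"))   # True
print(is_expression("d +"))          # False
```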

Syntax Tree

An ordered tree in which:

• Internal nodes are non-terminals.
• Leaf nodes are terminals.

An N-tree of grammar G is a syntax tree with non-terminal N as its root.

Mini-Triangle Syntax Tree

Expression ::= primary-Expression
| Expression Operator primary-Expression

primary-Expression ::= Integer-Literal
| V-name
| Operator primary-Expression
| ( Expression )

V-name ::= Identifier
…

Syntax tree (an Expression-tree) for d + 10 * n:

Expression
├─ Expression
│  ├─ Expression
│  │  └─ primary-Expression
│  │     └─ V-name
│  │        └─ Identifier: d
│  ├─ Operator: +
│  └─ primary-Expression
│     └─ Integer-Literal: 10
├─ Operator: *
└─ primary-Expression
   └─ V-name
      └─ Identifier: n
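The tree for d + 10 * n can also be built programmatically. Since the grammar has a single Operator class and a left-recursive Expression rule, there is no operator precedence: the expression groups as (d + 10) * n. The sketch below builds the tree as nested tuples of the form (label, child, ...); it assumes input without parentheses or unary operators, and the function names are illustrative.

```python
import re


def parse_expression(text):
    """Build a syntax tree for an Expression without parentheses or unary operators."""
    toks = re.findall(r"[0-9]+|[A-Za-z][A-Za-z0-9]*|[+\-*/<>=\\]", text)

    def primary(tok):
        if tok.isdigit():
            return ("primary-Expression", ("Integer-Literal", tok))
        return ("primary-Expression", ("V-name", ("Identifier", tok)))

    tree = ("Expression", primary(toks[0]))
    # Each further "Operator primary-Expression" pair wraps the tree built so far,
    # so the earliest operator ends up deepest in the tree.
    for op, operand in zip(toks[1::2], toks[2::2]):
        tree = ("Expression", tree, ("Operator", op), primary(operand))
    return tree


def show(node, indent=0):
    """Return the tree as indented lines, one node per line."""
    if isinstance(node, str):
        return [" " * indent + node]
    lines = [" " * indent + node[0]]
    for child in node[1:]:
        lines.extend(show(child, indent + 2))
    return lines


tree = parse_expression("d + 10 * n")
print("\n".join(show(tree)))
```

The root has three children (Expression, Operator `*`, primary-Expression), and the `+` sits one level lower, matching the diagram.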