Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Page 1: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Perl 6 Internals

Dan Sugalski

TPC 5.0

“Here there be dragons”

Page 2: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The big goals of perl 6's internals

Speed

Extendibility

Cleanliness

Compatibility

Modularity

Thread Safety

Flexibility

Page 3: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Some global decisions

The core will be in C. (Like it or not, it's appropriate for code at this level)

The core must be modular, so pieces can be swapped out without rebuilding

It must be fast

Long-term binary compatibility is a must

Your average perl coder or extension writer shouldn't need any info about the guts

Things should generally be thought out, documented, and engineered

Page 4: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The quick overview

Parser

Compiler

Optimizer

Runtime engine

Page 5: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

[Diagram: the stages Parser, Compiler, Optimizer, Interpreter, with the labels Syntax Tree, Unoptimized Bytecode, Optimized Bytecode, Fully-laden Interpreter, and Precompiled Bytecode]

Page 6: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The parser

Where the whole thing starts

Generally takes source of some sort and turns it into a syntax tree

Page 7: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The Bytecode Compiler

Turns a syntax tree into bytecode

Performs some simple optimization

Page 8: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The optimizer

Takes the plain bytecode from the compiler and abuses it heavily

An optional step, generally skipped for compile-and-go execution

Should be able to work on small parts of a program for JIT optimization

Page 9: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The Interpreter

Takes compiled (and possibly optimized) bytecode and does something with it

Generally that something is to execute it, but it might also be:

Save to disk

Translate to another format (.NET, Java bytecode)

Compile to machine code

Page 10: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The Parser

“Double, double, toil and trouble
Fire burn, and cauldron bubble”

Page 11: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Parser goals

Extendible in perl

More powerful than what we have now

Retargetable

Self-contained and removable

Page 12: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Parsing perl isn't easy

May well be one of the toughest languages to properly parse

If we get perl right other languages are easy. Or at least easier

We have the full power of perl to draw on to do the parsing (including the regex engine and Damian's Bizarre Idea du Jour)

Page 13: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The parser will be in C

We will be using C for the parser

A full set of callbacks will be available to hook into the parser in lots of places

Adding new parsing rules (probably with regexes describing them) will be easy

The parser will be extendable via perl code

Page 14: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The Compiler

“Mmmmm, tasty!”

Page 15: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

From syntax tree to bytecode

The compiler takes a syntax tree and turns it into bytecode

Very little optimization is done here

Optimization is expensive and optional

Pretty straightforward: this isn't rocket science

Page 16: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The Optimizer

“We can rebuild it. Make it better, faster, stronger”

Page 17: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The Optimizer

Takes plain bytecode and makes it faster

Does all the sorts of things that you expect an optimizer to do: code motion, loop unrolling, common subexpression work, etc.

Will be an iterative process

This will be interesting, as perl's a pain to optimize

An optional step, of course

Page 18: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Things that make optimizing perl tough

Active data

Runtime redefinitions of everything

Really, really late binding (Waiting for Godot late)

Perl programmers are used to more predictable runtime characteristics than, say, C programmers.

Page 19: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The Interpreter

“Polly want a cracker?”

Page 20: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Interpreter goals

Fast

Tuned for perl

Language neutral where possible

Event capable

Sandboxable

Asynchronous I/O built in

Built with an eye towards TIL and/or native code compilation

Better debugging support than perl 5

Page 21: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The perl 6 interpreter is a software CPU

Complete with registers and an assembly language

This can make translating perl 6 bytecode into native machine code easier

There's a lot of literature on building optimizing compilers that can be leveraged

While more complex than a pure stack-based machine, it's also faster

Opcode dispatch needs to be faster than perl 5

Opcode functions can be written in perl

Page 22: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

CPU specs

64 int, float, string, and PMC registers

A segmented multiple stack architecture

Interrupt-capable (for events)

Pretty much completely position independent: everything is referenced via register, pad entry, or name
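To make the "software CPU" idea concrete, here is a minimal sketch of a register machine with a switch-based dispatch loop. It covers integer registers only. The opcode names (`OP_SET`, `OP_ADD`, `OP_HALT`), the `Interp` type, and the flat `long` bytecode encoding are all invented for illustration and are not taken from the talk; only the register count comes from the slide.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_INT_REGS 64  /* register count as stated on the slide */

/* Hypothetical opcodes, invented for this sketch. */
enum { OP_HALT, OP_SET, OP_ADD };

typedef struct {
    long int_reg[NUM_INT_REGS];
} Interp;

/* Execute a flat array of opcodes and operands until OP_HALT. */
void run(Interp *in, const long *code) {
    size_t pc = 0;
    for (;;) {
        switch (code[pc]) {
        case OP_HALT:
            return;
        case OP_SET:   /* SET Rd, constant */
            in->int_reg[code[pc + 1]] = code[pc + 2];
            pc += 3;
            break;
        case OP_ADD:   /* ADD Rd, Ra, Rb */
            in->int_reg[code[pc + 1]] =
                in->int_reg[code[pc + 2]] + in->int_reg[code[pc + 3]];
            pc += 4;
            break;
        default:
            assert(!"unknown opcode");
            return;
        }
    }
}

/* Small driver: loads 2 and 40 into registers 0 and 1, adds them into 2. */
long demo_add(void) {
    static const long program[] = {
        OP_SET, 0, 2,
        OP_SET, 1, 40,
        OP_ADD, 2, 0, 1,
        OP_HALT
    };
    Interp in = {{0}};
    run(&in, program);
    return in.int_reg[2];
}
```

Operands name registers directly, so an add is one dispatch instead of the push, push, add, pop sequence a pure stack machine would need. That is one reading of the "faster than a stack machine" claim above.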

Page 23: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The regex engine

The regex engine is going to be part of the perl 6 CPU, not separate as it is now

A good incentive to get opcode dispatch fast

Makes expanding the regex engine a bit easier

Details will be hidden as a set of regex opcodes

Page 24: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

A few words on the stack system

Each register file has an associated stack

All registers of a particular type can be pushed onto or popped off the stack in one go

Individual registers or groups of registers can be pushed or popped

The stacks are all segmented so we're not relying on finding contiguous chunks of memory for them

There's also a set of call and scratch stacks
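The segmented register stack can be sketched as a linked list of fixed-size chunks, each holding a few saved register files. This is only an illustration under assumptions: the type names, `push_all`/`pop_all`, and the chunk size are hypothetical, and only the integer register file is shown.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_REGS 64          /* register count from the CPU specs slide */
#define FRAMES_PER_CHUNK 4   /* arbitrary chunk size for this sketch */

typedef struct { long reg[NUM_REGS]; } IntRegs;

/* One stack segment; segments are linked, so no large contiguous block is needed. */
typedef struct Chunk {
    struct Chunk *prev;
    int used;
    IntRegs frame[FRAMES_PER_CHUNK];
} Chunk;

typedef struct { Chunk *top; } RegStack;

/* Push all integer registers in one go. Returns 0 on success, -1 on out of memory. */
int push_all(RegStack *s, const IntRegs *r) {
    if (s->top == NULL || s->top->used == FRAMES_PER_CHUNK) {
        Chunk *c = malloc(sizeof *c);
        if (c == NULL) return -1;
        c->prev = s->top;
        c->used = 0;
        s->top = c;
    }
    s->top->frame[s->top->used++] = *r;
    return 0;
}

/* Restore the most recently saved register file. Returns -1 if the stack is empty. */
int pop_all(RegStack *s, IntRegs *r) {
    if (s->top == NULL) return -1;
    assert(s->top->used > 0);
    *r = s->top->frame[--s->top->used];
    if (s->top->used == 0) {   /* segment drained: unlink and free it */
        Chunk *dead = s->top;
        s->top = dead->prev;
        free(dead);
    }
    return 0;
}
```

Growing the stack only ever needs one more chunk-sized allocation, which is the point of segmenting it.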

Page 25: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Bytecode

“Could you say that a little differently?”

Page 26: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

What is bytecode?

A distilled version of a program

Machine language for the PVM

Can contain a lot of 'extra' information, including full source

Designed to be platform independent

Should be mostly mappable as shared data (modulo the fixup sections)

Page 27: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Data Structures

“Vtables and strings and floats, oh my!”

Page 28: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Variables

Vtable Pointer

Data Pointer

Integer Value

Float Value

Flags

Synchronization

GC Data

Generically called a PMC

Bigger than Perl 5's base data structure

Synchronization data built-in

Same for all variable types

GC data is part of base structure

Page 29: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Scalars

Built off the base PMC structure

Use the integer and float areas as caches

Data pointer points off to string, large int, or large float

Vtable functions determine how it all works

Page 30: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Arrays

Built off the base PMC structure

Data pointer points to array data

All perl 6 arrays are typed

May have an array of scalars, strings, integers, or floats

Arrays only take up enough memory to hold their types

Page 31: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Hashes

Built off the base PMC structure

Data pointer points to array data

All perl 6 hashes are typed

May have a hash of scalars, strings, integers, or floats

Hashes only take up enough memory to hold their types

Hashing function is overridable

Page 32: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Strings

Encoding

Type

Buffer Start

Buffer Length

String Length

String Size

Flags

Unused

Strings are sort of abstract

Perl 6 can mix and match string data (Unicode, ASCII, EBCDIC, etc.)

New string types can be loaded on the fly

Page 33: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

String handling

Perl 6 has no 'built-in' string support—all string support is via loadable libraries

There'll be Unicode, ASCII, and EBCDIC support provided (at least) to start

Page 34: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Numbers

Bigints and bigfloats share the same header

Arbitrary-length floating point and integer numbers are supported

Perl automagically upgrades ints and floats when needed

Buffer Pointer

Length

Exponent

Flags

Page 35: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Vtables

All variable data access is done through a table of functions that the variable carries around with it

This allows us faster access, since code paths are specialized for just the functions they need to perform

Isolates us from the implementation of variables internally

Allows special purpose behaviour (like perl 5's magic) to be attached without cost to the rest of perl

Page 36: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Vtables (cont'd)

Makes thread safety easier

A little bit more overhead because of the extra level of indirection, but the smaller functions make up for that

Vtable functions can be written in perl. (Each class with objects blessed into it will have at least one)

There may be more than one vtable per package

Page 37: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Vtables hide data manipulation

Pretty much all the code to handle data manipulation will be done via variable vtables

This allows the variable implementation to change without perl needing to know

Allows far more flexibility in what you can make a variable do

Shortens the code path for data functions and trims out extraneous conditionals

Page 38: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

For example: Fetching the string value of a scalar

For scalars with strings:

String *get_str(PMC *my_PMC) {
    /* The string already exists, so just hand it back */
    return my_PMC->data_pointer;
}

For int-only scalar:

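String *get_str(PMC *my_PMC) {
    /* Build the string from the integer and keep it in the data pointer */
    my_PMC->data_pointer = make_string(my_PMC->integer);
    /* Swap vtables so later calls take the string path shown above */
    my_PMC->vtable = int_and_string_vtable;
    return my_PMC->data_pointer;
}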
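For readers who want to compile the example, here is one way the surrounding pieces might look. It is a sketch under assumptions: the `String`, `VTable`, and `PMC` definitions are cut down to what the example touches, `make_string` is a stand-in written for this sketch, and the two functions are renamed `get_str_cached` and `get_str_int_only` so both can live in one file. Here the vtable slot holds a pointer, hence the `&` when swapping.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct PMC PMC;

/* Hypothetical minimal string type; the header on the Strings slide has more fields. */
typedef struct { char *buf; size_t len; } String;

/* A one-entry vtable; a real one would carry many more functions. */
typedef struct {
    String *(*get_str)(PMC *);
} VTable;

struct PMC {
    const VTable *vtable;
    void *data_pointer;
    long integer;
};

String *get_str_cached(PMC *p);
String *get_str_int_only(PMC *p);

const VTable int_and_string_vtable = { get_str_cached };
const VTable int_only_vtable = { get_str_int_only };

/* Stand-in helper: build a String holding the decimal form of n (NULL on failure). */
String *make_string(long n) {
    char tmp[32];
    int len = snprintf(tmp, sizeof tmp, "%ld", n);
    assert(len > 0 && len < (int)sizeof tmp);
    String *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->buf = malloc((size_t)len + 1);
    if (s->buf == NULL) { free(s); return NULL; }
    memcpy(s->buf, tmp, (size_t)len + 1);
    s->len = (size_t)len;
    return s;
}

/* Scalar that already has a string: no conditionals, just return it. */
String *get_str_cached(PMC *p) {
    return p->data_pointer;
}

/* Int-only scalar: make the string once, then switch to the cached vtable. */
String *get_str_int_only(PMC *p) {
    String *s = make_string(p->integer);
    if (s == NULL) return NULL;   /* leave the PMC unchanged on failure */
    p->data_pointer = s;
    p->vtable = &int_and_string_vtable;
    return s;
}
```

A caller always writes `p.vtable->get_str(&p)`. The first call on an int-only scalar pays for the conversion, and every later call goes straight to the one-line version.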

Page 39: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Memory Management

“Now where did I put that?”

Page 40: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Getting headers

All the fixed-size things (PMCs, string/number headers) get allocated from arenas

All headers, with the exception of PMCs (maybe) are moveable by the garbage collector

Non-PMC header allocation is very fast

PMC allocation is only mostly fast
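Arena allocation of fixed-size headers is fast because it reduces to popping a free list. The sketch below shows the general technique, not the talk's actual allocator; `Header`, `Arena`, the function names, and the arena size are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SLOTS 128   /* arbitrary size for this sketch */

/* Hypothetical fixed-size header; a free slot reuses its storage as a list link. */
typedef union Header {
    union Header *next_free;
    struct { void *buf; size_t len; unsigned flags; } hdr;
} Header;

typedef struct {
    Header slot[ARENA_SLOTS];
    Header *free_list;
} Arena;

/* Thread every slot onto the free list. */
void arena_init(Arena *a) {
    for (size_t i = 0; i + 1 < ARENA_SLOTS; i++)
        a->slot[i].next_free = &a->slot[i + 1];
    a->slot[ARENA_SLOTS - 1].next_free = NULL;
    a->free_list = &a->slot[0];
}

/* O(1): pop the first free slot, or return NULL when the arena is full. */
Header *arena_alloc(Arena *a) {
    Header *h = a->free_list;
    if (h != NULL) a->free_list = h->next_free;
    return h;
}

/* O(1): push the slot back on the free list. */
void arena_free(Arena *a, Header *h) {
    assert(h >= a->slot && h < a->slot + ARENA_SLOTS);
    h->next_free = a->free_list;
    a->free_list = h;
}
```

Because every slot is the same size, there is no searching or splitting, and a freed slot is immediately reusable.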

Page 41: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Buffer Management

Anything that isn't a fixed size gets allocated from the buffer pools

All buffered data, with the exception of data allocated in special pools, is moveable by the garbage collector

Because of GC, allocation is very quick
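One way to read "because of GC, allocation is very quick": if the collector keeps the pool compact, allocating a buffer is just advancing an offset. A minimal sketch of that bump allocation follows, with a hypothetical `Pool` type and `pool_alloc` function that are not from the talk.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BYTES 4096   /* arbitrary pool size for this sketch */

typedef struct {
    unsigned char mem[POOL_BYTES];
    size_t top;           /* offset of the first unused byte */
} Pool;

/* Allocate n bytes by advancing the top offset.
   A NULL return means the pool is full, which is when a collection would run. */
void *pool_alloc(Pool *p, size_t n) {
    size_t aligned = (n + 7u) & ~(size_t)7u;   /* round up to a multiple of 8 */
    assert(p->top <= POOL_BYTES);
    if (aligned < n || aligned > POOL_BYTES - p->top) return NULL;
    void *out = p->mem + p->top;
    p->top += aligned;
    return out;
}
```

There is no free list to walk and no per-buffer bookkeeping, since reclaiming space is left entirely to the collector.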

Page 42: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Garbage Collection

“Bring out yer dead!”

Page 43: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The perl 6 GC is a copying collector

Everything except PMCs is moveable in Perl 6

PMCs might be moveable too

We get a compact memory heap out of this, which allows for fast allocation

Perl 6 will release empty memory back to the system when it can

Refcounts are used only to note object lifetimes, not for GC

Refcounts, for the most part, are dead
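The copying step itself can be sketched in a few lines. The version below assumes a two-space heap and a table of buffer headers whose `live` flag has already been set by a mark phase, which is not shown. All names here are hypothetical and the talk does not describe the collector at this level of detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPACE_BYTES 256   /* arbitrary size for this sketch */

typedef struct {
    unsigned char *buf;   /* current location of the data */
    size_t len;
    int live;             /* set by a mark phase, not shown here */
} BufHeader;

typedef struct {
    unsigned char space[2][SPACE_BYTES];
    int from;             /* index of the space currently in use */
    size_t top;           /* bytes used in the current space */
} Heap;

/* Copy every live buffer into the other space, packed end to end,
   and update each header to point at the new location. */
void collect(Heap *h, BufHeader *hdrs, size_t n) {
    int to = 1 - h->from;
    size_t top = 0;
    for (size_t i = 0; i < n; i++) {
        if (!hdrs[i].live) {          /* dead data is simply left behind */
            hdrs[i].buf = NULL;
            hdrs[i].len = 0;
            continue;
        }
        assert(top + hdrs[i].len <= SPACE_BYTES);
        memcpy(h->space[to] + top, hdrs[i].buf, hdrs[i].len);
        hdrs[i].buf = h->space[to] + top;
        top += hdrs[i].len;
    }
    h->from = to;
    h->top = top;
}
```

Only the headers know where a buffer lives, so moving data means updating one pointer per buffer. After the copy the free space is a single contiguous region.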

Page 44: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

GC considerations for Objects

Garbage collection and object death are now separate things

Perl's guarantee of timely object death is stronger

We still don't guarantee perfect collection (but it sucks less)

We still refcount for real perl references, but only 2 bits are used

Objects with more than two simultaneous references won't get collected until a full dead variable scan is made
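A 2-bit reference count can be sketched as a saturating counter: 0, 1, and 2 are exact, and the fourth value means "more than two, exact count lost". This is one plausible reading of the slide, not a confirmed design; `ObjHeader`, `ref_inc`, `ref_dec`, and `RC_MANY` are hypothetical names.

```c
#include <assert.h>

enum { RC_MANY = 3 };   /* saturated value of a 2-bit counter */

typedef struct {
    unsigned refs : 2;  /* the only refcount storage: two bits */
} ObjHeader;

/* Count a new reference; once saturated the count stays saturated. */
void ref_inc(ObjHeader *o) {
    if (o->refs < RC_MANY) o->refs++;
}

/* Drop a reference. Returns 1 when the object can be destroyed right away,
   0 otherwise. A saturated count is never decremented, so such objects
   have to wait for a full scan. */
int ref_dec(ObjHeader *o) {
    if (o->refs == RC_MANY) return 0;   /* count is no longer exact */
    assert(o->refs > 0);
    o->refs--;
    return o->refs == 0;
}
```

Objects that never exceed two references still die promptly. Anything that has ever had a third reference is handed over to the collector's scan.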

Page 45: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Extensions beware!

Since we have no refcounts, extensions must tell perl when they hold on to PMCs

Not a huge deal, as we piggy-back on the cross-interpreter PMC tracking we use for threads

No more struct PMC; in extensions...

Page 46: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Extending Perl 6

Page 47: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Extensions Made Easier

Perl 6 will have a real API

The API is multilevel

Simple for embedders

More complex for extension authors

Pretty messy for vtable or opcode writers

Binary compatibility is a very strong consideration

Page 48: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Embedding

Guaranteed stable and binary compatible for the life of perl 6

Very simple API

Create interpreter

Destroy interpreter

Parse source

Run code

Register native functions

Page 49: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Extensions

Much simpler interface to perl's internals

The gory details are hidden

Stable binary compatibility is a very strong goal

We may add functions or options, but we won't take them away

Extensions built for perl 6.0.1 should still run with perl 6.8.12 without rebuilding

Manipulating perl data should be much easier

If you have to resort to Inline to wrap a library then it means we've not got it right

Page 50: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Extensions (cont)

Inline, or something like it, is probably going to be the standard for extending perl

XS, when you have to resort to it, will be far less nasty than it is now

Page 51: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Homegrown Opcodes and Vtables

This is part of the grubby inside of perl 6

You can use any of the internal routines of perl

If you do, though, you may run into backward-compatibility issues at some point. (If it's not part of the embedding, utility, or extension API, we make no promises)

There's no guarantee that calling conventions won't change.

No guarantees that perl 6.4 will even use vtables or opcodes

Page 52: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Utility library

Perl 6 will provide a set of utility routines to handle common tasks

String manipulation

Encoding changes (Shift-JIS to Unicode, EBCDIC to ASCII)

Conversion routines (string to int or float)

Extended precision math (int and float)

These will be stable, like the rest of the API

Page 53: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

Variations on a Theme

“Toccata and Fugue in perl minor by Wall”

Page 54: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The source doesn't have to be perl

The parser isn't obligated to be parsing perl

Input source could be Python, Ruby, Java, or INTERCAL

The full perl parser is optional

Page 55: Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”

The interpreter doesn't have to interpret

The interpreter is the destination for bytecode, but it doesn't have to interpret it

It might save directly to disk

It might translate the bytecode into an alternate form: Java bytecode, .NET code, or executable code, for example

The interpreter might translate to machine code on the fly, as a sort of JIT compiler. (Well, really a TIL, but...)