JavaScript on the GPU
jarred-nicholls
![Page 1: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/1.jpg)
If you don’t get this ref...shame on you
![Page 3: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/3.jpg)
Work @ Sencha
Web Platform Team
Doing webkitty things...
![Page 4: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/4.jpg)
WebKit Committer
![Page 5: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/5.jpg)
Co-Author, W3C Web Cryptography API
![Page 6: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/6.jpg)
JavaScript on the GPU
![Page 7: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/7.jpg)
What’s to come...
What I’ll blabber about today:
Why JavaScript on the GPU
Running JavaScript on the GPU
![Page 8: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/8.jpg)
Why JavaScript on the GPU?
Better question: Why a GPU?
A: They’re fast! (well, at certain things...)
![Page 11: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/11.jpg)
GPUs are fast b/c...
Totally different paradigm from CPUs
Data parallelism vs. task parallelism
Stream processing vs. sequential processing
GPUs can divide-and-conquer
Hardware capable of a large number of “threads”, e.g. ATI Radeon HD 6770M: 480 stream processing units == 480 cores
Typically very high memory bandwidth
Many, many GigaFLOPs
![Page 12: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/12.jpg)
GPUs don’t solve all problems
Not all tasks can be accelerated by GPUs
Tasks must be parallelizable, i.e.:
Side-effect free
Homogeneous and/or streamable
Overall tasks will become limited by Amdahl’s Law
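Amdahl’s Law makes that limit concrete: if a fraction p of the work parallelizes perfectly across N processors, the overall speedup is S = 1 / ((1 - p) + p / N), which can never exceed 1 / (1 - p). A minimal sketch in C (the function name `amdahl_speedup` is ours, not from Lateral):

```c
#include <assert.h>

/* Amdahl's Law: overall speedup when a fraction p (0..1) of the work is
   spread across n processors and the remaining (1 - p) stays serial. */
double amdahl_speedup(double p, double n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1.0);
    return 1.0 / ((1.0 - p) + p / n);
}
```

For example, a task that is 90% parallel gets only about 9.8× out of the 480 stream processing units on the Radeon HD 6770M mentioned earlier, and can never beat 10× no matter how many cores are added.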
![Page 13: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/13.jpg)
![Page 14: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/14.jpg)
Let’s find out...
![Page 15: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/15.jpg)
Experiment
Code Name “LateralJS”
![Page 16: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/16.jpg)
LateralJS
Our Mission: To make JavaScript a first-class citizen on all GPUs and take advantage of hardware-accelerated operations & data parallelization.
![Page 17: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/17.jpg)
Our Options

| | OpenCL | Nvidia CUDA |
|---|---|---|
| Vendors | AMD, Nvidia, Intel, etc. | Nvidia only |
| Language | A shitty version of C99 | C++ (C for CUDA) |
| Dynamic memory | No | Yes |
| Recursion | No | Yes |
| Function pointers | No | Yes |
| Dev. tooling | Terrible | Great |
| Maturity | Immature (arguably) | More mature (arguably) |
![Page 19: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/19.jpg)
Why not a Static Compiler?
We want full JavaScript support:
Object / prototype
Closures
Recursion
Functions as objects
Variable typing
Type inference limitations:
Reasonably limited to size and complexity of “kernel-esque” functions
Not nearly insane enough
![Page 20: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/20.jpg)
![Page 21: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/21.jpg)
Why an Interpreter?
We want it all baby - full JavaScript support!
Most insane approach
Challenging to make it good, but holds a lot of promise
![Page 22: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/22.jpg)
OpenCL Headaches
![Page 23: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/23.jpg)
![Page 24: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/24.jpg)
Oh the agony...
Multiple memory spaces - pointer hell
No recursion - all inlined functions
No standard libc libraries
No dynamic memory
No standard data structures - apart from vector ops
Buggy ass AMD/Nvidia compilers
![Page 25: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/25.jpg)
![Page 26: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/26.jpg)
Multiple Memory Spaces
In order from fastest to slowest:

| space | description |
|---|---|
| private | very fast; stream processor cache (~64KB); scoped to a single work item |
| local | fast; ~= L1 cache on CPUs (~64KB); scoped to a single work group |
| global, constant | slow, by orders of magnitude; ~= system memory over slow bus; available to all work groups/items; all the VRAM on the card (MBs) |
![Page 27: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/27.jpg)
Memory Space Pointer Hell

```c
global uchar* gptr = (global uchar*) 0x1000;
local uchar* lptr = (local uchar*) gptr; // FAIL!
uchar* pptr = (uchar*) gptr;             // FAIL! private is implicit
```

0x1000 points to something different depending on the address space (global, local, or private)!
![Page 28: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/28.jpg)
Memory Space Pointer Hell
Pointers must always be fully qualified
Macros to help ease the pain

```c
#define GPTR(TYPE) global TYPE*
#define CPTR(TYPE) constant TYPE*
#define LPTR(TYPE) local TYPE*
#define PPTR(TYPE) private TYPE*
```
![Page 29: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/29.jpg)
No Recursion!?!?!?
No call stack
All functions are inlined to the kernel function

```c
uint factorial(uint n) {
    if (n <= 1)
        return 1;
    else
        return n * factorial(n - 1); // compile-time error
}
```
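The usual workaround is to rewrite the recursion as a loop so that no call stack is needed. A minimal sketch in plain C (OpenCL C would spell the type `uint`); this is the generic technique, not code taken from Lateral:

```c
#include <assert.h>

/* Recursion-free factorial: all state lives in two local variables,
   so nothing needs a call stack. Overflows 32 bits for n > 12. */
unsigned int factorial(unsigned int n) {
    unsigned int result = 1;
    for (unsigned int i = 2; i <= n; i++)
        result *= i;
    return result;
}
```

Recursion that can't be flattened this simply needs an explicit stack kept in ordinary memory, which is exactly what a stack-based interpreter does.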
![Page 30: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/30.jpg)
No standard libc libraries
memcpy? strcpy? strcmp? etc...
![Page 31: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/31.jpg)
No standard libc libraries
Implement our own

```c
#define MEMCPY(NAME, DEST_AS, SRC_AS) \
    DEST_AS void* NAME(DEST_AS void*, SRC_AS const void*, uint); \
    DEST_AS void* NAME(DEST_AS void* dest, SRC_AS const void* src, uint size) { \
        DEST_AS uchar* cDest = (DEST_AS uchar*)dest; \
        SRC_AS const uchar* cSrc = (SRC_AS const uchar*)src; \
        for (uint i = 0; i < size; i++) \
            cDest[i] = cSrc[i]; \
        return (DEST_AS void*)cDest; \
    }

PTR_MACRO_DEST_SRC(MEMCPY, memcpy)
```

Produces:
memcpy_g, memcpy_l, memcpy_p
memcpy_gc, memcpy_gl, memcpy_gp
memcpy_lc, memcpy_lg, memcpy_lp
memcpy_pc, memcpy_pg, memcpy_pl
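The slides don't show `PTR_MACRO_DEST_SRC`. One plausible shape, sketched here as an assumption, is an X-macro that invokes `MEMCPY` once per (destination, source) pair; the suffix convention (destination letter, then source letter, a single letter when both match, never `constant` as a destination) is inferred from the list of produced names. To compile it as plain C on the host, the address-space qualifiers are defined away; in OpenCL C they would stay, giving each variant distinct pointer types.

```c
#include <assert.h>

/* Host-side stand-ins (assumption): plain C has no address spaces, so the
   OpenCL qualifiers are defined as nothing and the OpenCL typedefs added. */
#define global
#define local
#define private
#define constant
typedef unsigned char uchar;
typedef unsigned int uint;

/* The MEMCPY macro from the slide, unchanged. */
#define MEMCPY(NAME, DEST_AS, SRC_AS) \
    DEST_AS void* NAME(DEST_AS void*, SRC_AS const void*, uint); \
    DEST_AS void* NAME(DEST_AS void* dest, SRC_AS const void* src, uint size) { \
        DEST_AS uchar* cDest = (DEST_AS uchar*)dest; \
        SRC_AS const uchar* cSrc = (SRC_AS const uchar*)src; \
        for (uint i = 0; i < size; i++) \
            cDest[i] = cSrc[i]; \
        return (DEST_AS void*)cDest; \
    }

/* Hypothetical PTR_MACRO_DEST_SRC: one MACRO expansion per legal pair. */
#define PTR_MACRO_DEST_SRC(MACRO, NAME) \
    MACRO(NAME##_g,  global,  global)   \
    MACRO(NAME##_l,  local,   local)    \
    MACRO(NAME##_p,  private, private)  \
    MACRO(NAME##_gc, global,  constant) \
    MACRO(NAME##_gl, global,  local)    \
    MACRO(NAME##_gp, global,  private)  \
    MACRO(NAME##_lc, local,   constant) \
    MACRO(NAME##_lg, local,   global)   \
    MACRO(NAME##_lp, local,   private)  \
    MACRO(NAME##_pc, private, constant) \
    MACRO(NAME##_pg, private, global)   \
    MACRO(NAME##_pl, private, local)

/* Defines memcpy_g ... memcpy_pl as twelve separate functions. */
PTR_MACRO_DEST_SRC(MEMCPY, memcpy)
```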
![Page 32: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/32.jpg)
No dynamic memory
No malloc()
No free()
What to do...
![Page 33: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/33.jpg)
Yes! dynamic memory
Create a large buffer of global memory - our “heap”
Implement our own malloc() and free()
Create a handle structure - “virtual memory”
P(T, hnd) macro to get the current pointer address

```c
GPTR(handle) hnd = malloc(sizeof(uint));
GPTR(uint) ptr = P(uint, hnd);
*ptr = 0xdeadbeef;
free(hnd);
```
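A minimal sketch of the handle idea in plain C, assuming a bump allocator over a fixed byte array that stands in for the global-memory heap. `heap_alloc`, `heap_free`, `heap_compact` and the `handle` layout are hypothetical, not Lateral's actual allocator. It shows why the indirection pays off: callers hold handles rather than raw pointers, so the allocator may move blocks (here, by compacting the heap) and `P()` still resolves to the right address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEAP_SIZE 1024   /* stands in for the large global-memory buffer */
#define MAX_HANDLES 64

/* Hypothetical handle: the "virtual memory" indirection to a block. */
typedef struct {
    unsigned int offset; /* current byte offset of the block in heap[] */
    unsigned int size;   /* block size, rounded up to 4 bytes */
    int used;
} handle;

static _Alignas(16) unsigned char heap[HEAP_SIZE];
static handle handles[MAX_HANDLES];
static unsigned int heap_top; /* bump pointer: first free byte */

/* P(T, hnd): recompute the block's current address from its handle. */
#define P(T, hnd) ((T*)(heap + (hnd)->offset))

handle* heap_alloc(unsigned int size) {
    size = (size + 3u) & ~3u;           /* keep every block 4-byte aligned */
    if (size == 0) size = 4;            /* never hand out empty blocks */
    if (size > HEAP_SIZE - heap_top) return NULL;
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (!handles[i].used) {
            handles[i] = (handle){heap_top, size, 1};
            heap_top += size;
            return &handles[i];
        }
    }
    return NULL;                        /* out of handles */
}

void heap_free(handle* hnd) { hnd->used = 0; }

/* Slide live blocks down over freed holes. Only handles change, so data
   reached through P() after compaction is still found. */
void heap_compact(void) {
    unsigned int dest = 0;
    for (;;) {
        handle* next = NULL; /* lowest-offset live block not yet moved */
        for (int i = 0; i < MAX_HANDLES; i++)
            if (handles[i].used && handles[i].offset >= dest &&
                (next == NULL || handles[i].offset < next->offset))
                next = &handles[i];
        if (next == NULL) break;
        memmove(heap + dest, heap + next->offset, next->size);
        next->offset = dest;
        dest += next->size;
    }
    heap_top = dest;
}
```

Pointers should be re-derived with `P()` after anything that may compact; a raw pointer cached across compaction would go stale.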
![Page 34: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/34.jpg)
![Page 35: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/35.jpg)
Ok, we get the point... FYL!
![Page 36: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/36.jpg)
High-level Architecture
Host: V8, Esprima Parser, Data Serializer & Marshaller, Device Mgr
GPUs: Stack-based Interpreter, Data Heap, Garbage Collector
Flow: eval(code); → Build JSON AST → Serialize AST (JSON => C Structs) → Ship to GPU to Interpret → Fetch Result
![Page 41: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/41.jpg)
AST Generation
![Page 42: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/42.jpg)
AST Generation
JavaScript Source → Esprima in V8 → JSON AST (v8::Object) → Lateral AST (C structs)
![Page 43: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/43.jpg)
Embed esprima.js
Resource Generator

```shell
$ resgen esprima.js resgen_esprima_js.c
```
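The resgen tool itself isn't shown. Below is a hypothetical sketch of what such a resource generator does: emit a file's bytes as a NUL-terminated C array in the same shape as resgen_esprima_js.c. `resgen_format` and its buffer-based interface are assumptions for illustration (a real tool would read the input file and write the output file).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical resgen-style generator: writes `data` as a C array named
   `name`, 12 bytes per line, followed by a terminating 0 so the embedded
   text can be used as a C string. Returns the length written (excluding
   the final NUL), or 0 if `cap` is too small. */
size_t resgen_format(char* out, size_t cap, const char* name,
                     const unsigned char* data, size_t len) {
    assert(out != NULL && name != NULL);
    size_t pos = 0;
    int n = snprintf(out, cap, "const unsigned char %s[] = {\n   ", name);
    if (n < 0 || (size_t)n >= cap) return 0;
    pos = (size_t)n;
    for (size_t i = 0; i < len; i++) {
        n = snprintf(out + pos, cap - pos, " 0x%02x,%s", (unsigned)data[i],
                     (i % 12 == 11) ? "\n   " : "");
        if (n < 0 || (size_t)n >= cap - pos) return 0;
        pos += (size_t)n;
    }
    n = snprintf(out + pos, cap - pos, " 0\n};\n");
    if (n < 0 || (size_t)n >= cap - pos) return 0;
    return pos + (size_t)n;
}
```

The trailing `0` is what lets the embedded script be handed to V8 as an ordinary C string.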
![Page 44: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/44.jpg)
Embed esprima.js
resgen_esprima_js.c

```c
const unsigned char resgen_esprima_js[] = {
    0x2f, 0x2a, 0x0a, 0x20, 0x20, 0x43, 0x6f, 0x70, 0x79, 0x72,
    0x69, 0x67, 0x68, 0x74, 0x20, 0x28, 0x43, 0x29, 0x20, 0x32,
    ...
    0x20, 0x3a, 0x20, 0x2a, 0x2f, 0x0a, 0x0a, 0
};
```
![Page 45: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/45.jpg)
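Embed esprima.js
ASTGenerator.cpp

```cpp
// Must match the definition in resgen_esprima_js.c: a C array, not a single char
extern "C" const unsigned char resgen_esprima_js[];

void ASTGenerator::init()
{
    HandleScope scope;
    s_context = Context::New();
    s_context->Enter();
    // Compile and run esprima.js so `esprima` exists on the context's global object
    Handle<Script> script = Script::Compile(String::New((const char*)resgen_esprima_js));
    script->Run();
    s_context->Exit();
    s_initialized = true;
}
```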
![Page 46: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/46.jpg)
Build JSON AST
e.g.

```cpp
ASTGenerator::esprimaParse("var xyz = new Array(10);");
```
![Page 47: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/47.jpg)
Build JSON AST

```cpp
Handle<Object> ASTGenerator::esprimaParse(const char* javascript)
{
    if (!s_initialized)
        init();

    HandleScope scope;
    s_context->Enter();

    // Look up esprima.parse, loaded into the context by init()
    Handle<Object> global = s_context->Global();
    Handle<Object> esprima = Handle<Object>::Cast(global->Get(String::New("esprima")));
    Handle<Function> esprimaParse = Handle<Function>::Cast(esprima->Get(String::New("parse")));

    // Call esprima.parse(javascript) with a properly typed argument array
    Handle<Value> args[] = { String::New(javascript) };
    Handle<Object> ast = Handle<Object>::Cast(esprimaParse->Call(esprima, 1, args));

    s_context->Exit();
    return scope.Close(ast);
}
```
![Page 48: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/48.jpg)
Build JSON AST

```json
{
  "type": "VariableDeclaration",
  "declarations": [
    {
      "type": "VariableDeclarator",
      "id": { "type": "Identifier", "name": "xyz" },
      "init": {
        "type": "NewExpression",
        "callee": { "type": "Identifier", "name": "Array" },
        "arguments": [ { "type": "Literal", "value": 10 } ]
      }
    }
  ],
  "kind": "var"
}
```
![Page 49: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/49.jpg)
Lateral AST structs
Structs shared between Host and OpenCL

```c
#ifdef __OPENCL_VERSION__
#define CL(TYPE) TYPE
#else
#define CL(TYPE) cl_##TYPE
#endif

typedef struct ast_type_st {
    CL(uint) id;
    CL(uint) size;
} ast_type;

typedef struct ast_program_st {
    ast_type type;
    CL(uint) body;
    CL(uint) numBody;
} ast_program;

typedef struct ast_identifier_st {
    ast_type type;
    CL(uint) name;
} ast_identifier;
```
![Page 50: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/50.jpg)
Lateral AST structs
v8::Object => ast_type, expanded

```c
// Children are copied into their parent's blob, so each child can be freed
// as soon as the parent node has been built.
ast_type* vd1_1_init_id = (ast_type*)astCreateIdentifier("Array");
ast_type* vd1_1_init_args[1];
vd1_1_init_args[0] = (ast_type*)astCreateNumberLiteral(10);
ast_type* vd1_1_init = (ast_type*)astCreateNewExpression(vd1_1_init_id, vd1_1_init_args, 1);
free(vd1_1_init_id);
for (int i = 0; i < 1; i++)
    free(vd1_1_init_args[i]);
ast_type* vd1_1_id = (ast_type*)astCreateIdentifier("xyz");
ast_type* vd1_decls[1];
vd1_decls[0] = (ast_type*)astCreateVariableDeclarator(vd1_1_id, vd1_1_init);
free(vd1_1_id);
free(vd1_1_init);
ast_type* vd1 = (ast_type*)astCreateVariableDeclaration(vd1_decls, 1, "var");
for (int i = 0; i < 1; i++)
    free(vd1_decls[i]);
```
![Page 51: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/51.jpg)
Lateral AST structs
astCreateIdentifier

```c
ast_identifier* astCreateIdentifier(const char* str) {
    // rnd(n, 4) presumably rounds n up to a multiple of 4, keeping nodes 4-byte aligned
    CL(uint) size = sizeof(ast_identifier) + rnd(strlen(str) + 1, 4);
    ast_identifier* ast_id = (ast_identifier*)malloc(size);

    // copy the string right after the struct
    strcpy((char*)(ast_id + 1), str);

    // fill the struct
    ast_id->type.id = AST_IDENTIFIER;
    ast_id->type.size = size;
    ast_id->name = sizeof(ast_identifier); // offset of the string from the node start

    return ast_id;
}
```
![Page 52: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/52.jpg)
Lateral AST structs: astCreateIdentifier(“xyz”)

| offset | field | value |
|---|---|---|
| 0 | type.id | AST_IDENTIFIER (0x01) |
| 4 | type.size | 16 |
| 8 | name | 12 (offset) |
| 12 | str[0] | ‘x’ |
| 13 | str[1] | ‘y’ |
| 14 | str[2] | ‘z’ |
| 15 | str[3] | ‘\0’ |
![Page 53: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/53.jpg)
Lateral AST structs
astCreateNewExpression

```c
ast_expression_new* astCreateNewExpression(ast_type* callee, ast_type** arguments, int numArgs) {
    CL(uint) size = sizeof(ast_expression_new) + callee->size;
    for (int i = 0; i < numArgs; i++)
        size += arguments[i]->size;

    ast_expression_new* ast_new = (ast_expression_new*)malloc(size);
    ast_new->type.id = AST_NEW_EXPR;
    ast_new->type.size = size;

    CL(uint) offset = sizeof(ast_expression_new);
    char* dest = (char*)ast_new;

    // copy callee
    memcpy(dest + offset, callee, callee->size);
    ast_new->callee = offset;
    offset += callee->size;

    // copy arguments
    if (numArgs) {
        ast_new->arguments = offset;
        for (int i = 0; i < numArgs; i++) {
            ast_type* arg = arguments[i];
            memcpy(dest + offset, arg, arg->size);
            offset += arg->size;
        }
    } else
        ast_new->arguments = 0;
    ast_new->numArguments = numArgs;

    return ast_new;
}
```
![Page 54: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/54.jpg)
Lateral AST structs: new Array(10)

| offset | field | value |
|---|---|---|
| 0 | type.id | AST_NEW_EXPR (0x308) |
| 4 | type.size | 52 |
| 8 | callee | 20 (offset) |
| 12 | arguments | 40 (offset) |
| 16 | numArguments | 1 |
| 20 | callee node | ast_identifier (“Array”) |
| 40 | arguments node | ast_literal_number (10) |
![Page 55: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/55.jpg)
Lateral AST structs
Shared across the Host and the OpenCL runtime
Host writes, Lateral reads
Constructed on Host as contiguous blobs
Easy to send to GPU: memcpy(gpu, ast, ast->size);
Fast to send to GPU: a single buffer write
Simple to traverse w/ pointer arithmetic
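To check the offset tables and the pointer-arithmetic claim, here is a hypothetical host-side sketch of the same layout strategy. It mirrors `astCreateIdentifier` and `astCreateNewExpression` with `CL(uint)` spelled `uint32_t`; the `AST_NUMBER_LITERAL` id and a 4-byte float literal are assumptions, chosen because they reproduce the 52-byte `new Array(10)` node from the table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* AST_NUMBER_LITERAL's value is an assumption; the other ids match the tables. */
enum { AST_IDENTIFIER = 0x01, AST_NUMBER_LITERAL = 0x02, AST_NEW_EXPR = 0x308 };

typedef struct { uint32_t id, size; } ast_type;
typedef struct { ast_type type; uint32_t name; } ast_identifier;
typedef struct { ast_type type; float value; } ast_literal_number; /* assumed layout */
typedef struct { ast_type type; uint32_t callee, arguments, numArguments; } ast_expression_new;

static uint32_t rnd4(uint32_t n) { return (n + 3u) & ~3u; } /* round up to 4 bytes */

ast_identifier* make_identifier(const char* str) {
    uint32_t size = (uint32_t)sizeof(ast_identifier) + rnd4((uint32_t)strlen(str) + 1);
    ast_identifier* id = calloc(1, size); /* zeroed, so padding bytes are defined */
    if (id == NULL) return NULL;
    strcpy((char*)(id + 1), str);         /* string follows the struct */
    id->type.id = AST_IDENTIFIER;
    id->type.size = size;
    id->name = sizeof(ast_identifier);    /* offset from the node start */
    return id;
}

ast_literal_number* make_number(float v) {
    ast_literal_number* lit = calloc(1, sizeof *lit);
    if (lit == NULL) return NULL;
    lit->type.id = AST_NUMBER_LITERAL;
    lit->type.size = sizeof *lit;
    lit->value = v;
    return lit;
}

/* Header, then a copy of the callee, then copies of each argument,
   all in one contiguous allocation. */
ast_expression_new* make_new_expr(const ast_type* callee, const ast_type** args, uint32_t numArgs) {
    uint32_t size = (uint32_t)sizeof(ast_expression_new) + callee->size;
    for (uint32_t i = 0; i < numArgs; i++) size += args[i]->size;
    ast_expression_new* e = calloc(1, size);
    if (e == NULL) return NULL;
    char* dest = (char*)e;
    uint32_t offset = sizeof(ast_expression_new);
    e->type.id = AST_NEW_EXPR;
    e->type.size = size;
    memcpy(dest + offset, callee, callee->size);
    e->callee = offset;
    offset += callee->size;
    e->arguments = numArgs ? offset : 0;
    for (uint32_t i = 0; i < numArgs; i++) {
        memcpy(dest + offset, args[i], args[i]->size);
        offset += args[i]->size;
    }
    e->numArguments = numArgs;
    return e;
}

/* Children are found by adding a stored offset to the node's own address,
   so the blob can be copied anywhere (e.g. into a GPU buffer) unchanged. */
const ast_type* child_at(const void* node, uint32_t offset) {
    return (const ast_type*)((const char*)node + offset);
}
```

Because every link is a relative offset rather than a host pointer, the whole tree stays valid after a single `memcpy` into device memory.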
![Page 56: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/56.jpg)
Stack-basedInterpreter
![Page 57: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/57.jpg)
Building Blocks
Lateral State
Heap
Symbol/Ref Table
AST Traverse Stack
Call/Exec Stack
Return Stack
Scope Stack
JS Type Structs
AST Traverse Loop
Interpret Loop
![Page 58: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/58.jpg)
Kernels

```c
#include "state.h"
#include "jsvm/asttraverse.h"
#include "jsvm/interpreter.h"

// Setup VM structures
kernel void lateral_init(GPTR(uchar) lateral_heap) {
    LATERAL_STATE_INIT
}

// Interpret the AST
kernel void lateral(GPTR(uchar) lateral_heap, GPTR(ast_type) lateral_ast) {
    LATERAL_STATE

    ast_push(lateral_ast);
    while (!Q_EMPTY(lateral_state->ast_stack, ast_q) ||
           !Q_EMPTY(lateral_state->call_stack, call_q)) {
        // drain the AST stack first, then run a single interpret step
        while (!Q_EMPTY(lateral_state->ast_stack, ast_q))
            traverse();
        if (!Q_EMPTY(lateral_state->call_stack, call_q))
            interpret();
    }
}
```
![Page 59: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/59.jpg)
Let’s interpret...

```javascript
var x = 1 + 2;
```

```json
{
  "type": "VariableDeclaration",
  "declarations": [
    {
      "type": "VariableDeclarator",
      "id": { "type": "Identifier", "name": "x" },
      "init": {
        "type": "BinaryExpression",
        "operator": "+",
        "left": { "type": "Literal", "value": 1 },
        "right": { "type": "Literal", "value": 2 }
      }
    }
  ],
  "kind": "var"
}
```

Stack contents at each step (each stack listed bottom to top):

| Step | AST | Call | Return |
|---|---|---|---|
| start | | | |
| 1 | VarDecl | | |
| 2 | VarDtor | | |
| 3 | Ident, Binary | VarDtor | |
| 4 | Ident, Literal, Literal | VarDtor, Binary | |
| 5 | Ident, Literal | VarDtor, Binary, Literal | |
| 6 | Ident | VarDtor, Binary, Literal, Literal | |
| 7 | | VarDtor, Binary, Literal, Literal, Ident | |
| 8 | | VarDtor, Binary, Literal, Literal | “x” |
| 9 | | VarDtor, Binary, Literal | “x”, 1 |
| 10 | | VarDtor, Binary | “x”, 1, 2 |
| 11 | | VarDtor | “x”, 3 |
| 12 | | | |
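The same walk can be reproduced in plain C. This is a hypothetical sketch, not Lateral's interpreter: the node kinds, the index-based AST and the one-entry symbol table are assumptions. The `run` loop mirrors the `lateral` kernel (drain the AST stack, then run one interpret step) and passes through the same stack states as the slides, ending with x = 3.

```c
#include <assert.h>
#include <string.h>

typedef enum { VAR_DECL, VAR_DTOR, IDENT, BINARY_ADD, LITERAL } kind;

typedef struct {
    kind k;
    int child[2];     /* child node indices in source order, -1 if unused */
    const char* name; /* IDENT only */
    double value;     /* LITERAL only */
} node;

/* var x = 1 + 2; */
static const node ast[] = {
    /* 0 */ {VAR_DECL,   {1, -1},  NULL, 0},
    /* 1 */ {VAR_DTOR,   {2, 3},   NULL, 0},
    /* 2 */ {IDENT,      {-1, -1}, "x",  0},
    /* 3 */ {BINARY_ADD, {4, 5},   NULL, 0},
    /* 4 */ {LITERAL,    {-1, -1}, NULL, 1},
    /* 5 */ {LITERAL,    {-1, -1}, NULL, 2},
};

typedef struct { const char* name; double num; } value; /* name set => reference */

#define STACK_MAX 16
static int ast_stack[STACK_MAX];
static int ast_top;
static int call_stack[STACK_MAX];
static int call_top;
static value ret_stack[STACK_MAX];
static int ret_top;

static const char* sym_name; /* one-variable "symbol table" */
static double sym_value;

/* Pop an AST node, queue it for interpretation, push its children. */
static void traverse(void) {
    int n = ast_stack[--ast_top];
    if (ast[n].k != VAR_DECL)   /* a declaration only expands */
        call_stack[call_top++] = n;
    for (int i = 0; i < 2; i++)
        if (ast[n].child[i] >= 0)
            ast_stack[ast_top++] = ast[n].child[i];
}

/* Pop one queued node and apply it to the return stack. */
static void interpret(void) {
    const node* n = &ast[call_stack[--call_top]];
    switch (n->k) {
    case IDENT:   ret_stack[ret_top++] = (value){n->name, 0}; break;
    case LITERAL: ret_stack[ret_top++] = (value){NULL, n->value}; break;
    case BINARY_ADD: {
        double right = ret_stack[--ret_top].num;
        double left = ret_stack[--ret_top].num;
        ret_stack[ret_top++] = (value){NULL, left + right};
        break;
    }
    case VAR_DTOR: {
        double v = ret_stack[--ret_top].num;
        sym_name = ret_stack[--ret_top].name;
        sym_value = v;
        break;
    }
    default: break;
    }
}

void run(int root) {
    ast_stack[ast_top++] = root;
    while (ast_top > 0 || call_top > 0) {
        while (ast_top > 0) traverse();
        if (call_top > 0) interpret();
    }
}
```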
![Page 73: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/73.jpg)
Benchmark
![Page 74: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/74.jpg)
Benchmark
Small loop of FLOPs

```javascript
var input = new Array(10);
for (var i = 0; i < input.length; i++) {
    input[i] = Math.pow((i + 1) / 1.23, 3);
}
```
![Page 75: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/75.jpg)
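Execution Time: Lateral

| Runtime | Hardware | Execution time |
|---|---|---|
| Lateral, GPU CL | ATI Radeon 6770M | 116.571533 ms |
| Lateral, CPU CL | Intel Core i7, 4x2.4GHz | 0.226007 ms |
| V8 | Intel Core i7, 4x2.4GHz | 0.090664 ms |

On the GPU, Lateral ran roughly 1,286× slower than V8 (116.571533 / 0.090664) and about 516× slower than the same OpenCL code on the CPU.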
![Page 77: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/77.jpg)
![Page 78: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/78.jpg)
What went wrong?
Everything:
Stack-based AST interpreter, no optimizations
Heavy global memory access, no optimizations
No data or task parallelism
![Page 79: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/79.jpg)
Stack-based Interpreter
Slow as molasses
Memory hog, Eclipse style
Heavy memory access: “var x = 1 + 2;” == 30 stack hits alone! (counting pushes and pops in the walkthrough: 12 on the AST stack, 10 on the call stack, 8 on the return stack)
Too much dynamic allocation
No inline optimizations, just following the yellow brick AST
Straight up lazy
Replace with something better!
Bytecode compiler on Host
Bytecode register-based interpreter on Device
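For contrast, a minimal sketch of a register-based bytecode interpreter, assuming a hypothetical four-opcode instruction set (`OP_LOADK`, `OP_ADD`, `OP_STORE`, `OP_HALT` are illustrative, not Lateral's). A host-side compiler could lower `var x = 1 + 2;` to five instructions that address registers directly, with no per-node push/pop traffic; keeping the register file small is what would let it live in private memory on a device.

```c
#include <assert.h>

typedef enum { OP_LOADK, OP_ADD, OP_STORE, OP_HALT } opcode;
typedef struct { opcode op; int a, b, c; } insn;

/* Registers are a small fixed array instead of heap-allocated stacks. */
double run_bytecode(const insn* code, const double* consts, double* vars) {
    double r[8] = {0};
    for (const insn* pc = code;; pc++) {
        switch (pc->op) {
        case OP_LOADK: r[pc->a] = consts[pc->b]; break;       /* r[a] = K[b] */
        case OP_ADD:   r[pc->a] = r[pc->b] + r[pc->c]; break; /* r[a] = r[b] + r[c] */
        case OP_STORE: vars[pc->a] = r[pc->b]; break;         /* var[a] = r[b] */
        case OP_HALT:  return r[pc->a];                       /* result in r[a] */
        }
    }
}

/* What a host-side compiler might emit for: var x = 1 + 2; */
static const double k_consts[] = {1.0, 2.0};
static const insn k_program[] = {
    {OP_LOADK, 0, 0, 0},
    {OP_LOADK, 1, 1, 0},
    {OP_ADD,   2, 0, 1},
    {OP_STORE, 0, 2, 0}, /* x is variable slot 0 */
    {OP_HALT,  2, 0, 0},
};
```

A compiler that also folded constants could reduce this to a single load and store.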
![Page 80: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/80.jpg)
![Page 81: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/81.jpg)
Too much global access
Everything is dynamically allocated to global memory
Register-based interpreter & bytecode compiler can make better use of local and private memory
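Optimizing memory access yields crazy results

```c
// 11.1207 seconds
// (kernel names and int buffer types are illustrative; the slide shows only the bodies)
kernel void count_global(global const int* a, global int* b, global int* c) {
    size_t tid = get_global_id(0);
    c[tid] = a[tid];
    while (b[tid] > 0) { // touch global memory on each loop
        b[tid]--;        // touch global memory on each loop
        c[tid]++;        // touch global memory on each loop
    }
}

// 0.0445558 seconds!! HOLY SHIT!
kernel void count_private(global const int* a, global const int* b, global int* c) {
    size_t tid = get_global_id(0);
    int tmp = a[tid];                // temp private variable
    for (int i = b[tid]; i > 0; i--)
        tmp++;                       // touch private variables on each loop
    c[tid] = tmp;                    // touch global memory one time (b is left unmodified)
}
```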
![Page 82: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/82.jpg)
No data or task parallelism
Everything being interpreted in a single “thread”
We have hundreds of cores available to us!
Build in heuristics:
Identify side-effect free statements
Break into parallel tasks - very magical

```javascript
var input = new Array(10);
for (var i = 0; i < input.length; i++) {
    input[i] = Math.pow((i + 1) / 1.23, 3);
}
```

```javascript
input[9] = Math.pow((9 + 1) / 1.23, 3);
input[1] = Math.pow((1 + 1) / 1.23, 3);
input[0] = Math.pow((0 + 1) / 1.23, 3);
...
```
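A sketch in plain C of why the benchmark loop splits cleanly: each iteration reads only its own index and writes only its own slot, so it can become an independent work item. `pow_work_item` and the reverse-order launch loop are stand-ins (an assumption) for an OpenCL kernel and an NDRange enqueue; `Math.pow(t, 3)` is written as `t * t * t` to keep the sketch free of libm.

```c
#include <assert.h>

#define NUM_ITEMS 10

/* Sequential version of the benchmark loop. */
void pow_sequential(double* input) {
    for (unsigned int i = 0; i < NUM_ITEMS; i++) {
        double t = (i + 1) / 1.23;
        input[i] = t * t * t;
    }
}

/* The same body as a data-parallel work item: no item reads anything
   another item writes, so items may run in any order or simultaneously. */
static void pow_work_item(double* input, unsigned int gid) {
    double t = (gid + 1) / 1.23;
    input[gid] = t * t * t;
}

/* Stand-in for a device launch: items run here in reverse order to show
   that the result does not depend on ordering. */
void pow_parallel(double* input) {
    for (unsigned int gid = NUM_ITEMS; gid-- > 0;)
        pow_work_item(input, gid);
}
```

On a device, each `gid` would map to one of the hundreds of available cores instead of one iteration of a host loop.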
![Page 83: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/83.jpg)
What’s in store
Acceptable performance on all CL devices
V8/Node extension to launch Lateral tasks
High-level API to perform map-reduce, etc.
Lateral-cluster...mmmmm
![Page 84: JavaScript on the GPU](https://reader037.fdocuments.in/reader037/viewer/2022102814/549547a9b4795938368b458d/html5/thumbnails/84.jpg)
Thanks!
Jarred Nicholls
@jarrednicholls