A 1 Cycle-Per-Byte XML Accelerator

2010-2-19, University of Toronto. Zefu Dai, Nick Ni and Jianwen Zhu; presented by Zefu Dai.


Transcript of A 1 Cycle-Per-Byte XML Accelerator

Page 1: A 1 Cycle-Per-Byte XML Accelerator


A 1 Cycle-Per-Byte XML Accelerator

Zefu Dai, Nick Ni and Jianwen Zhu

Presented by Zefu Dai

University of Toronto

Page 2: A 1 Cycle-Per-Byte XML Accelerator


What is XML?

Extensible Markup Language

A platform-independent format for data exchange and representation

Widely used in:
- Web services
- Database systems
- Scientific applications
- …

<?xml version="1.0" encoding="UTF-8"?>
<!-- this is an example xml document -->
<University>
  <Department name="ECE">
    <Students>
      <freshman>310</freshman>
      <sophomore>298</sophomore>
      <junior>213</junior>
      <senior>178</senior>
      <graduate>86</graduate>
      …
    </Students>
    <Professors>
      <professor name="Mike" field="network"/>
      …
    </Professors>
  </Department>
  …
</University>

Page 3: A 1 Cycle-Per-Byte XML Accelerator


Performance Threat: XML Parsing

- 70 minutes to load a 3 GB XML file, 26x slower than loading plain text
- More than 1 s per bank transaction: how many transactions can be handled per day?
- On average, 175K instructions to parse 1 KB of XML data (IBM XML4C)
- With network speeds reaching tens of Gbps, XML parsing speed, rather than network speed, becomes the performance bottleneck

Page 4: A 1 Cycle-Per-Byte XML Accelerator


Previous Work

Cycle Per Byte (CPB) = average number of clock cycles spent processing each byte of XML data

Multi-core acceleration
- Requires a pre-parsing pass, which is done sequentially
- 30 CPB on a 4-core processor

SIMD acceleration
- Without in-memory tree construction or validation
- 6-15 CPB

Hardware accelerators
- Most commercial products do not disclose performance metrics or design details
- 10-40 CPB

Page 5: A 1 Cycle-Per-Byte XML Accelerator


Our Design

Causes of the parsing slowdown:
- Text-based data stream
- Variable-length string comparison
- Poor memory performance due to streaming and memory back-tracing

An XML parsing accelerator implemented on an FPGA:
- Fixed-length string operations
- Optimized circuits for string comparison
- A stallable pipeline optimized for the common case
- Data structures for high-bandwidth on-chip memory

Achieves a processing speed of 1 CPB and saturates a 1 Gbps Ethernet link while running at 125 MHz
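The headline figure follows from simple arithmetic. At 1 CPB the accelerator consumes one byte per clock, so a 125 MHz clock gives 125 MB/s, or 1 Gbps. The Python sketch below makes that conversion explicit and also shows the CPB budget a 2.5 GHz CPU would have to meet to keep up with the same link. The function names are hypothetical.

```python
def throughput_gbps(clock_hz: float, cycles_per_byte: float) -> float:
    """Sustained throughput (Gbps) of a parser that spends
    `cycles_per_byte` clock cycles on every input byte."""
    bytes_per_second = clock_hz / cycles_per_byte
    return bytes_per_second * 8 / 1e9          # 8 bits per byte


def required_cpb(clock_hz: float, target_gbps: float) -> float:
    """Largest CPB that still sustains `target_gbps` at `clock_hz`."""
    return clock_hz * 8 / (target_gbps * 1e9)


print(throughput_gbps(125e6, 1.0))   # accelerator: 125 MHz at 1 CPB -> 1.0
print(required_cpb(2.5e9, 1.0))      # CPB budget for a 2.5 GHz core -> 20.0
```

In other words, a software parser on a 2.5 GHz core would need 20 CPB or better to fill a 1 Gbps link.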

Page 6: A 1 Cycle-Per-Byte XML Accelerator


Outline

- Background
- High-level architecture
- Design details
- Evaluation

Page 7: A 1 Cycle-Per-Byte XML Accelerator


Tasks of an XML Parser

Well-formed checking
- Check whether the document conforms to the XML syntax rules

Schema validation
- Check whether the document conforms to the semantic rules specified in DTD or Schema files

DOM construction
- Capture the parent-child relationships between elements and attributes, and store them in memory in Document Object Model (DOM) format

Page 8: A 1 Cycle-Per-Byte XML Accelerator


Well-formed Checking Example

- Has a unique root element

<?xml version="1.0" encoding="UTF-8"?>
<!-- this is an example xml document -->
<University>
  <Department name="ECE">
    <Students>
      <freshman>310</freshman>
      <sophomore>298</sophomore>
      <junior>213</junior>
      <senior>178</senior>
      <graduate>86</graduate>
      …
    </Students>
    <Professors>
      <professor name="Mike" field="network"/>
      …
    </Professors>
  </Department>
  …
</University>

Page 9: A 1 Cycle-Per-Byte XML Accelerator


Well-formed Checking Example

- Has a unique root element
- Elements must be closed and nested properly


Page 10: A 1 Cycle-Per-Byte XML Accelerator


Well-formed Checking Example

- Has a unique root element
- Elements must be closed and nested properly
- Attributes must be unique within an element

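The three well-formedness rules above can be checked in a single pass. The pass keeps a stack of open element names and checks each element's attribute names for duplicates. The Python sketch below is illustrative only. It works on a hypothetical, already-tokenized event list rather than on raw characters. An empty-element tag such as `<professor .../>` is represented as a start event immediately followed by an end event.

```python
def check_well_formed(events):
    """Return None if `events` is well-formed, else a short error message.

    Each event is ("start", name, [attribute names]) or ("end", name).
    """
    open_elems = []          # stack of currently open element names
    roots = 0                # number of top-level elements seen so far
    for event in events:
        if event[0] == "start":
            _, name, attrs = event
            if not open_elems:
                roots += 1
                if roots > 1:                            # rule 1: unique root
                    return f"second root element <{name}>"
            if len(set(attrs)) != len(attrs):            # rule 3: unique attributes
                return f"duplicate attribute in <{name}>"
            open_elems.append(name)
        else:
            _, name = event
            if not open_elems or open_elems[-1] != name:  # rule 2: proper nesting
                return f"mismatched end tag </{name}>"
            open_elems.pop()
    if open_elems:
        return f"unclosed element <{open_elems[-1]}>"
    if roots == 0:
        return "no root element"
    return None


doc = [("start", "University", []),
       ("start", "Department", ["name"]),
       ("start", "professor", ["name", "field"]), ("end", "professor"),
       ("end", "Department"),
       ("end", "University")]
print(check_well_formed(doc))   # None
```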

Page 11: A 1 Cycle-Per-Byte XML Accelerator


XML Schema Example

- Specify permitted child elements/attributes

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="University">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Department" minOccurs="2" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Students">
                <xs:complexType>
                  <xs:all>
                    <xs:element name="freshman" type="xs:string"/>
                    <xs:element name="sophomore" type="xs:string"/>
                    <xs:element name="junior" type="xs:string"/>
                    <xs:element name="senior" type="xs:string"/>
                    <xs:element name="graduate" type="xs:string"/>
                  </xs:all>
                </xs:complexType>
              </xs:element>
              <xs:element name="Professors" type="professorType"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Page 12: A 1 Cycle-Per-Byte XML Accelerator


XML Schema Example

- Specify permitted child elements/attributes
- Specify the type of content


Page 13: A 1 Cycle-Per-Byte XML Accelerator


XML Schema Example

- Specify permitted child elements/attributes
- Specify the type of content
- Specify occurrence limits

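The three kinds of schema rule listed above (permitted children, content type, occurrence limits) can be pictured as a lookup table keyed by the parent element. The Python sketch below hand-distills the example schema into such a table and checks only the permitted-children and occurrence-limit rules. The `RULES` layout and the `check_children` name are hypothetical. The sketch also ignores the ordering imposed by `xs:sequence` and the content types.

```python
from collections import Counter

# Hypothetical rule table distilled by hand from the example schema:
# parent element -> {permitted child element: (minOccurs, maxOccurs)},
# where maxOccurs None means "unbounded".
RULES = {
    "University": {"Department": (2, None)},
    "Department": {"Students": (1, 1), "Professors": (1, 1)},
    "Students": {name: (1, 1) for name in
                 ("freshman", "sophomore", "junior", "senior", "graduate")},
}


def check_children(parent, children):
    """List the rule violations among the child element names of `parent`.

    Only membership and occurrence limits are checked in this sketch.
    """
    allowed = RULES.get(parent, {})
    counts = Counter(children)
    errors = [f"<{c}> not permitted in <{parent}>" for c in counts if c not in allowed]
    for child, (lo, hi) in allowed.items():
        n = counts[child]
        if n < lo or (hi is not None and n > hi):
            errors.append(f"<{child}> occurs {n} time(s) in <{parent}>")
    return errors


print(check_children("University", ["Department"]))
```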

Page 14: A 1 Cycle-Per-Byte XML Accelerator


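DOM Construction

- Create an in-memory tree structure for the XML document
- Provide applications with access through tree operations

[Figure: example DOM tree. Root → Element University → Element Department (two Department elements are shown). One Department has Attribute Name → Text "ECE", Element Students (containing Element junior → Text "213", …) and Element Professors (containing Element professor with Attribute Name → Text "Mike" and Attribute Field → Text "network", …).]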
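The parent-child bookkeeping behind DOM construction can be sketched with a stack of open elements. Each start event creates a node under the current parent and descends into it. Each end event climbs back up. This minimal Python sketch does not reproduce the accelerator's memory layout. The event tuples and the `Node` class are hypothetical stand-ins for the token stream and the in-memory node format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "root", "element", "attribute" or "text"
    label: str = ""                # element/attribute name, or text content
    children: list = field(default_factory=list)


def build_dom(events):
    """Build a DOM-like tree from ("start", name, [(attr, value), ...]),
    ("text", content) and ("end", name) events."""
    root = Node("root")
    parents = [root]                                  # parents[-1] = current parent
    for event in events:
        if event[0] == "start":
            _, name, attrs = event
            elem = Node("element", name)
            for attr_name, value in attrs:
                elem.children.append(Node("attribute", attr_name, [Node("text", value)]))
            parents[-1].children.append(elem)
            parents.append(elem)                      # descend into the new element
        elif event[0] == "text":
            parents[-1].children.append(Node("text", event[1]))
        else:                                         # ("end", name): climb back up
            parents.pop()
    return root


events = [("start", "Department", [("name", "ECE")]),
          ("start", "Students", []), ("start", "junior", []), ("text", "213"),
          ("end", "junior"), ("end", "Students"), ("end", "Department")]
dept = build_dom(events).children[0]
print(dept.label, dept.children[0].children[0].label)   # Department ECE
```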

Page 15: A 1 Cycle-Per-Byte XML Accelerator


Outline

- Background
- High-level architecture
- Design details
- Evaluation

Page 16: A 1 Cycle-Per-Byte XML Accelerator


Top Level Diagram

[Figure: top-level block diagram. Input: XML document over 1 Gbps Ethernet; output: to DRAM. Three stages: Well-formed Checking, Schema Validation and DOM Construction. Blocks shown: XML Cycle Buffer, Character Scanner, Token Extractor, Token Handler, Rule Match Unit, Rule Check Unit, DOM Constructor, Write Buffer and Memory Controller, plus tables labelled RNT, RHT and RCT and two FIFOs. Labelled datapath widths: 8b, 32b, 64b, 128b and 256b.]

Page 17: A 1 Cycle-Per-Byte XML Accelerator


Top Level Diagram

[Figure: top-level block diagram with the same blocks as on Page 16; animation frame showing the input fragment <Elem attr='xyz'>content</Elem>.]

Page 18: A 1 Cycle-Per-Byte XML Accelerator


Top Level Diagram

[Figure: top-level block diagram with the same blocks as on Page 16; animation frame showing the input fragment <Elem attr='xyz'>content</Elem>.]

Page 19: A 1 Cycle-Per-Byte XML Accelerator


Top Level Diagram

[Figure: top-level block diagram with the same blocks as on Page 16; animation frame showing the input fragment <Elem attr='xyz'> content </Elem>.]

Page 20: A 1 Cycle-Per-Byte XML Accelerator


Top Level Diagram

[Figure: top-level block diagram with the same blocks as on Page 16; animation frame showing the fragment split into the tokens Elem, attr, xyz and content.]

Page 21: A 1 Cycle-Per-Byte XML Accelerator


Top Level Diagram

[Figure: top-level block diagram with the same blocks as on Page 16; animation frame showing the tokens Elem, attr, xyz and content, the hash codes H(Elem) and H(attr), and the labels "rule name" and "rule content".]

Page 22: A 1 Cycle-Per-Byte XML Accelerator


Top Level Diagram

[Figure: top-level block diagram with the same blocks as on Page 16; animation frame showing the tokens Elem, attr, xyz and content passed on to the downstream units, with the labels "rule name" and "rule content".]
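The animation frames above split the fragment `<Elem attr='xyz'>content</Elem>` into the tokens Elem, attr, xyz and content. The minimal Python sketch below performs that scanning step one character per loop iteration, loosely mirroring a byte-per-cycle scanner. It assumes a simplified XML subset: start/end tags, quoted attribute values and text only, with no comments, CDATA, entities or empty-element tags. The token kinds are hypothetical labels chosen for illustration, not the accelerator's token format.

```python
def extract_tokens(doc):
    """Split a tiny XML subset into (kind, text) tokens, one character per step."""
    tokens, buf = [], ""
    state, quote = "content", None

    def flush(kind):
        # Emit the buffered characters as one token (skipping pure whitespace).
        nonlocal buf
        if buf.strip():
            tokens.append((kind, buf.strip()))
        buf = ""

    for ch in doc:
        if state == "content":
            if ch == "<":
                flush("text")
                state = "tag_open"
            else:
                buf += ch
        elif state == "tag_open":                 # first character after '<'
            if ch == "/":
                state = "end_name"
            else:
                buf, state = ch, "start_name"
        elif state in ("start_name", "end_name"):
            if ch in " >":
                flush("start" if state == "start_name" else "end")
                state = "content" if ch == ">" else "in_tag"
            else:
                buf += ch
        elif state == "in_tag":                   # between attributes
            if ch == ">":
                buf, state = "", "content"
            elif ch == "=":
                flush("attr")
            elif ch in "'\"":
                quote, state = ch, "attr_value"
            elif ch != " ":
                buf += ch
        elif state == "attr_value":
            if ch == quote:
                tokens.append(("value", buf))
                buf, state = "", "in_tag"
            else:
                buf += ch
    flush("text")
    return tokens


print(extract_tokens("<Elem attr='xyz'>content</Elem>"))
# [('start', 'Elem'), ('attr', 'attr'), ('value', 'xyz'), ('text', 'content'), ('end', 'Elem')]
```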

Page 23: A 1 Cycle-Per-Byte XML Accelerator


Outline

- Background
- High-level architecture
- Design details
- Evaluation

Page 24: A 1 Cycle-Per-Byte XML Accelerator


Recurring Idioms (Dwarfs)

Identified three recurring computational idioms (referred to as Dwarfs):
- One-to-one string matching
- One-to-many string membership test
- One-to-many string search

These idioms are one of the major reasons for low parsing performance

Page 25: A 1 Cycle-Per-Byte XML Accelerator


Dwarf I: One-to-one String Matching

Tests whether a subject string equals a reference string

Example: checking correct nesting (an end tag must match its start tag)

The strings are variable-length
- Not efficient on conventional architectures

Solution: memory stack
- Converts variable-length string comparison into fixed-length character comparison
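The slide does not give the circuit, so the following is only one plausible software reading of the memory-stack idea. Each open start-tag name is stored byte by byte in a flat memory, and the position where each name begins is recorded. An end tag is then checked by comparing one character per step against that memory. The `TagStack` class and its methods are hypothetical.

```python
class TagStack:
    """Character-level stack of open tag names (a sketch, not the RTL).

    Start-tag names are written byte by byte into one flat memory; an end
    tag is compared against the innermost name one character per step,
    so every comparison is a fixed-width (single-byte) operation.
    """

    def __init__(self):
        self.chars = []    # flat character memory holding all open names
        self.bases = []    # offset where each open name starts in `chars`

    def push_name(self, name):
        self.bases.append(len(self.chars))
        self.chars.extend(name)

    def match_end(self, name):
        """Return True and pop if `name` closes the innermost open element."""
        if not self.bases:
            return False
        base = self.bases[-1]
        for i, ch in enumerate(name):               # one byte compare per step
            if base + i >= len(self.chars) or self.chars[base + i] != ch:
                return False
        if base + len(name) != len(self.chars):     # end tag was a strict prefix
            return False
        self.bases.pop()
        del self.chars[base:]
        return True


s = TagStack()
s.push_name("University")
s.push_name("Department")
print(s.match_end("Department"), s.match_end("Universit"), s.match_end("University"))
# True False True
```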

Page 26: A 1 Cycle-Per-Byte XML Accelerator


Dwarf II: One-to-many String Membership Test

Tests whether a subject string equals any member of a set of reference strings

Example: unique attributes within an element

Requires string comparison against all previously arrived attributes of the same element
- Expensive memory back-tracing

Solution: Bloom filter
- Achieved in one memory lookup

<student name="john" gender="m" hobby="guitar" field="math">

Page 27: A 1 Cycle-Per-Byte XML Accelerator


Dwarf III: One-to-many String Search

Finds a subject string among a set of reference strings (as opposed to merely testing membership)

Example: searching for the corresponding schema rule

Requires string comparison against all candidates
- Nondeterministic lookup time

Solution: Balanced Routing Table scheme
- Achieved in one memory lookup
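The slides do not describe how the balanced routing table is organized, so the sketch below substitutes a different, simpler technique with the same goal. A seeded-hash table is built offline so that every schema rule name lands in its own slot. Each search then costs one table read plus one confirming compare. The seed search, the CRC-32 hash and all function names are assumptions for illustration, not the paper's scheme.

```python
import zlib


def slot(name, seed, size):
    # CRC-32 seeded with `seed`, reduced to a table index
    return zlib.crc32(name.encode(), seed) % size


def build_table(rules, size):
    """Find a seed that maps every rule name to its own slot (no collisions)."""
    for seed in range(1 << 16):
        table = [None] * size
        for name, rule in rules.items():
            s = slot(name, seed, size)
            if table[s] is not None:
                break                      # collision: try the next seed
            table[s] = (name, rule)
        else:
            return seed, table
    raise ValueError("no collision-free seed found; use a larger table")


def lookup(name, seed, table):
    """One table read, then a single check that the entry is really `name`."""
    entry = table[slot(name, seed, len(table))]
    return entry[1] if entry is not None and entry[0] == name else None


rules = {"University": "R0", "Department": "R1", "Students": "R2", "Professors": "R3"}
seed, table = build_table(rules, 16)
print(lookup("Department", seed, table), lookup("Dean", seed, table))   # R1 None
```

Moving the collision handling offline is what makes the lookup time deterministic. The build step may try many seeds, but a search never probes more than one slot.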

Page 28: A 1 Cycle-Per-Byte XML Accelerator


Dwarf II: Bloom Filter

Example: attribute name uniqueness checking

Common case: the attribute name is unique
- Filter out the obvious cases using a Bloom filter
- Look up a bit array instead of comparing strings

Uncommon case: the attribute name may already exist
- Stall the entire design
- Do all necessary string comparisons to confirm whether the incoming string already exists
- Assumption: this case occurs rarely (it is costly)

Page 29: A 1 Cycle-Per-Byte XML Accelerator


Solution II: Bloom Filter

For each attribute name:

- Generate N independent hash codes

- Look up the bit array

- Update the bit array

0 0 0 0 0 0 0 0 0 0

Current set = {}

<student name="john" gender="m" hobby="guitar" field="math">

Page 30: A 1 Cycle-Per-Byte XML Accelerator


Solution II: Bloom Filter

For each attribute name:

- Generate N independent hash codes

- Look up the bit array

- Update the bit array

0 1 0 0 0 0 1 0 0 0

Current set = {name}

<student name="john" gender="m" hobby="guitar" field="math">

Page 31: A 1 Cycle-Per-Byte XML Accelerator


Solution II: Bloom Filter

For each attribute name:

- Generate N independent hash codes

- Look up the bit array

- Update the bit array

0 1 0 1 0 0 1 0 1 1

Current set = {name, gender, hobby}

<student name="john" gender="m" hobby="guitar" field="math">

Page 32: A 1 Cycle-Per-Byte XML Accelerator


Solution II: Bloom Filter

For each attribute name:

- Generate N independent hash codes

- Look up the bit array

- Update the bit array

Unique!

0 1 0 1 0 0 1 0 1 1

Current set = {name, gender, hobby}

Input = field

<student name="john" gender="m" hobby="guitar" field="math">

Page 33: A 1 Cycle-Per-Byte XML Accelerator


Solution II: Bloom Filter

For each attribute name:

- Generate N independent hash codes

- Look up the bit array

- Update the bit array

False Positive!

0 1 0 1 0 0 1 0 1 1

Current set = {name, gender, hobby}

Input = field

<student name="john" gender="m" hobby="guitar" field="math">
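The walkthrough above can be sketched in Python under two assumptions. First, the bit array is cleared at each start tag. Second, a positive result falls back to an exact string comparison, mirroring the stall path described for the uncommon case. The class name, the SHA-256-derived hash codes and the default sizes are illustrative choices, not the accelerator's.

```python
import hashlib


class AttrNameFilter:
    """Bloom filter for attribute-name uniqueness within one element (a sketch).

    `m_bits` and `k` are illustrative; k <= 8 because each hash code uses
    4 bytes of a single 32-byte SHA-256 digest.
    """

    def __init__(self, m_bits=64, k=2):
        self.m, self.k = m_bits, k
        self.reset()

    def reset(self):
        """Clear state; intended to be called at every new start tag."""
        self.bits = [0] * self.m
        self.names = []         # exact copy, read only on the slow path

    def _indexes(self, name):
        digest = hashlib.sha256(name.encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def check_and_add(self, name):
        """Return (is_duplicate, took_slow_path) for the incoming name."""
        idx = self._indexes(name)
        maybe_seen = all(self.bits[i] for i in idx)        # all bits set: possible hit
        is_duplicate = maybe_seen and name in self.names   # slow path ("stall")
        for i in idx:
            self.bits[i] = 1
        self.names.append(name)
        return is_duplicate, maybe_seen


f = AttrNameFilter()
for attr in ["name", "gender", "hobby", "field"]:
    print(attr, f.check_and_add(attr))
print(f.check_and_add("name"))   # (True, True): a real duplicate
```

A result of `(False, True)` corresponds to the "False Positive!" case above: the bit array claims a hit, and only the slow exact comparison clears the name.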

Page 34: A 1 Cycle-Per-Byte XML Accelerator


Bloom Filter Implementation

Implement the Bloom filter algorithm as a pipeline
- An attribute name usually has multiple characters
- Allow multiple processing cycles for each attribute name

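[Figure: three-stage Bloom filter pipeline. Hash Code Generating Stage: a Hash Code Generator takes one input character at a time and produces hash codes h1, h2, …, hk. Bit Array Indexing Stage: a bit array with index labels 0-31 and 0-k. Matching Stage: produces the output. Signals shown: attribute name end, Addr_valid, Data_valid, update and positive.]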
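The pipeline above accumulates the hash codes while the attribute name streams in, rather than after the whole name has been buffered. The minimal sketch below assumes simple polynomial (Horner-style) hashes with hypothetical multipliers. These are not the actual Hash Code Generator functions, and they are merely different rather than truly independent. Each of the k registers is updated once per input character and read out as a bit-array index when the name ends.

```python
# Hypothetical per-hash-function multipliers (distinct odd constants chosen
# for illustration; a real design would pick them to spread bits well).
MULTIPLIERS = (31, 131, 1313)
MASK32 = 0xFFFFFFFF


def streaming_hash_codes(name, m_bits, multipliers=MULTIPLIERS):
    """Update k hash registers once per input character; read them out as
    bit-array indexes when the attribute name ends."""
    regs = [0] * len(multipliers)              # one register per hash function
    for ch in name:                            # one character per step
        for j, mult in enumerate(multipliers):
            regs[j] = (regs[j] * mult + ord(ch)) & MASK32
    return [r % m_bits for r in regs]          # issued at "attribute name end"


print(streaming_hash_codes("name", 64))
```

Because each register is a Horner evaluation, the streamed result equals the hash of the complete string. The per-character update therefore adds no accuracy cost, only pipeline latency.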

Page 35: A 1 Cycle-Per-Byte XML Accelerator


Outline

- Background
- High-level architecture
- Design details
- Evaluation

Page 36: A 1 Cycle-Per-Byte XML Accelerator


Experimental Setup

Software XML parser tests
- Hardware and software platform: Intel Core 2 Quad Q9300 (2.5 GHz, 6 MB L2 cache), 2 GB DDR2-800 memory, Debian Linux 2.6.18-6 x86-64, GNU C 4.1.2
- Tested XML parsing libraries: Xerces-c 2.8.0 x86-64, Libxml2, DOM4J 1.6, Java API for XML Processing (JAXP) 1.6.0

XML parsing accelerator testbed

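[Figure: accelerator testbed on a Xilinx Virtex-5 XC5VSX50T. A laptop sends XML documents over UDP through a 1 Gbps SGMII link. On-chip blocks: Ethernet MAC, asyn_fifo, XML Engine, memory controller (MC) attached to DDR2 memory, and a UART connected to a display; cmd and data signals are also shown. Labelled clocks: 125 MHz and 200 MHz; labelled datapaths: 8b.]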

Page 37: A 1 Cycle-Per-Byte XML Accelerator


Benchmarks

Group              Benchmark    XML Size (KB)  XSD Size (KB)  Source
DOM Parsing        Security     3              -              Intel Corporation
DOM Parsing        Structure    12             -              codesynthesis
DOM Parsing        Tpox         15             -              tpox
DOM Parsing        Hl7          136            -              hl7-testharness
DOM Parsing        Qedeq        211            -              qedeq.org
DOM Parsing        Xmark        116,000        -              xml-benchmark
Schema Validation  CustomInfo   1              2              Intel Corporation
Schema Validation  CDCatalog    105            2              w3schools
Schema Validation  Workflow     13             10             qedeq.org

Page 38: A 1 Cycle-Per-Byte XML Accelerator


Test Results

Metric: raw throughput (Gbps)

Benchmark     JAXP   DOM4J  Libxml2  Xerces-c  XPA    XPAmax
Security      0.199  0.059  0.294    0.100     1.000  1.040
Structure     0.274  0.110  0.202    0.091     1.000  1.040
Tpox          0.292  0.099  0.264    0.124     1.000  1.040
Hl7           0.415  0.189  0.360    0.128     1.000  1.040
Qedeq         0.481  0.221  0.338    0.133     1.000  1.040
Xmark         0.550  0.256  0.416    0.187     1.000  1.040
Average_par   0.373  0.158  0.314    0.127     1.000  1.040
CustomInfo    0.062  -      0.107    0.054     1.000  1.040
CDCatalog     0.128  -      0.232    0.113     1.000  1.040
Workflow      0.227  -      0.396    0.185     1.000  1.040
Average_vld   0.161  -      0.283    0.134     1.000  1.040
Average_all   0.267  0.158  0.299    0.131     1.000  1.040

Page 39: A 1 Cycle-Per-Byte XML Accelerator


Test Results

Metric: Cycle Per Byte (CPB)

Benchmark     JAXP   DOM4J  Libxml2  Xerces-c  XPA
Security      100.6  339.7  67.9     201.0     1.0
Structure     73.1   181.3  99.1     220.5     1.0
Tpox          68.5   201.3  75.9     161.0     1.0
Hl7           48.2   106.0  55.6     155.8     1.0
Qedeq         41.5   90.4   59.2     150.6     1.0
Xmark         36.4   78.0   48.0     106.7     1.0
Average_par   53.6   126.9  63.6     157.2     1.0
CustomInfo    321.8  -      186.2    373.7     1.0
CDCatalog     156.5  -      86.3     176.8     1.0
Workflow      88.3   -      50.4     108.3     1.0
Average_vld   124.4  -      70.6     148.8     1.0
Average_all   75.0   126.9  66.9     152.9     1.0
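The two tables appear to be related by CPB = clock × 8 / throughput. This uses the 2.5 GHz host clock for the software parsers and the 125 MHz clock for XPA. The relation is an inference from the numbers, not something the slides state. The following Python check compares a few cells.

```python
CPU_CLOCK_HZ = 2.5e9     # Core 2 Quad Q9300 clock from the setup slide
XPA_CLOCK_HZ = 125e6     # accelerator clock


def cpb_from_gbps(gbps, clock_hz=CPU_CLOCK_HZ):
    """Cycles per byte implied by a throughput of `gbps` at `clock_hz`."""
    return clock_hz * 8 / (gbps * 1e9)


# (throughput in Gbps, CPB) pairs copied from the two tables above
samples = {"JAXP/Security": (0.199, 100.6), "DOM4J/Security": (0.059, 339.7),
           "Libxml2/Security": (0.294, 67.9), "Xerces-c/Xmark": (0.187, 106.7)}
for label, (gbps, cpb) in samples.items():
    print(f"{label}: table {cpb}, derived {cpb_from_gbps(gbps):.1f}")
print(cpb_from_gbps(1.0, XPA_CLOCK_HZ))   # 1.0
```

The derived values agree with the CPB table to within about 1%, which is consistent with the throughput figures having been rounded to three decimals.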

Page 40: A 1 Cycle-Per-Byte XML Accelerator


Scalability Examination

Bloom filter efficiency
- Test the attribute name uniqueness circuit with generated test files
- Count the number of false positives

Number of false positives:

Hash functions  Bit array  Google key words      Wikipedia key words
                           4k     8k     16k     4k     8k     16k
2               64b        1      66     509     6      129    502
2               256b       0      5      60      1      8      56
2               1kb        0      1      6       1      2      2
2               2kb        0      0      1       0      0      0
3               256b       0      0      14      1      3      9
3               1kb        0      0      1       0      0      0
3               2kb        0      0      0       0      0      0
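The textbook approximation for a Bloom filter's false-positive probability may help when reading the table. It assumes ideal independent hash functions, $k$ hash functions, an $m$-bit array and $n$ names already inserted. Here $n$ is presumably the number of attribute names seen so far in the current element; the slides do not state how the test files map onto $n$.

$$
p \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad k_{\text{opt}} = \frac{m}{n}\ln 2
$$

For example, take a hypothetical $n = 8$ names in a 256-bit array. With $k = 2$, $p \approx (1 - e^{-0.0625})^2 \approx 0.0037$. With $k = 3$, $p \approx (1 - e^{-0.09375})^3 \approx 0.0007$. This follows the table's trend of fewer false positives with more hash functions and larger bit arrays.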

Page 41: A 1 Cycle-Per-Byte XML Accelerator


Implementation Cost

Target device: Xilinx Virtex-5 XC5VSX50T

Logic utilization:

Module                 Slice Registers  Slice LUTs   Block RAMs
XPA                    4455 (13%)       6594 (20%)   13 (11%)
MC                     1960 (6%)        1683 (5%)    5 (3%)
EMAC                   927 (2%)         712 (2%)     3 (2%)
UART                   151 (1%)         187 (1%)     2 (1%)
TOTAL                  7493 (22%)       9176 (28%)   23 (17%)
Available (XC5VSX50T)  32640            32640        132

Page 42: A 1 Cycle-Per-Byte XML Accelerator


Conclusion

FPGAs are a valid contender for XML processing:

- Low clock frequency requirement to achieve high throughput

- Scalable to process large XML documents

- Moderate hardware cost to achieve high performance

Future work
- Full conformance to the XML specification

Page 43: A 1 Cycle-Per-Byte XML Accelerator


Questions?
