A 1 Cycle-Per-Byte XML Accelerator

Post on 08-Jan-2016



2010-2-19 University of Toronto 1

A 1 Cycle-Per-Byte XML Accelerator

Zefu Dai, Nick Ni and Jianwen Zhu

Presented by Zefu Dai

University of Toronto

What is XML

Extensible Markup Language

A platform-independent tool for data exchange and representation

Widely used in:
- Web services
- Database systems
- Scientific applications
- …

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- this is an example xml document -->
    <University>
      <Department name="ECE">
        <Students>
          <freshman>310</freshman>
          <sophomore>298</sophomore>
          <junior>213</junior>
          <senior>178</senior>
          <graduate>86</graduate>
          …
        </Students>
        <Professors>
          <professor name="Mike" field="network"/>
          …
        </Professors>
      </Department>
      …
    </University>

Performance Threat: XML Parsing

- 70 minutes to load a 3 GB XML file, 26x slower than loading plain text
- More than 1 s per bank transaction: how many transactions per day?
- On average, 175 K instructions to parse 1 KB of XML data (IBM XML4C)
- With network speeds reaching tens of Gbps, XML parsing has overtaken the network as the performance bottleneck

Previous Work

Cycles Per Byte (CPB) = average number of cycles to process each byte of XML data

Multi-core acceleration
- Requires a pre-parsing pass, done sequentially
- 30 CPB on a 4-core processor

SIMD acceleration
- Without in-memory tree construction and validation
- 6-15 CPB

Hardware accelerators
- Most commercial products do not reveal performance metrics or design details
- 10-40 CPB

Our Design

Causes of the parsing slowdown
- Text-based data stream
- Variable-length string comparison
- Poor memory performance due to streaming and memory back-tracing

An XML parsing accelerator implemented on an FPGA
- Fixed-length string operations
- Optimized circuits for string comparison
- Stallable pipeline optimized for the common case
- Data structures for high-bandwidth on-chip memory

Achieves 1 CPB processing speed and saturates a 1 Gbps Ethernet link while running at 125 MHz
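The link between the three numbers on this slide (1 CPB, 125 MHz, 1 Gbps) is simple arithmetic. The sketch below spells it out; the function name `throughput_gbps` is only an illustrative helper, not something from the slides.

```python
def throughput_gbps(clock_hz: float, cycles_per_byte: float) -> float:
    """Sustained throughput in Gbit/s for a given clock rate and CPB."""
    bytes_per_second = clock_hz / cycles_per_byte
    return bytes_per_second * 8 / 1e9


# One byte per cycle at 125 MHz is 125 MB/s, i.e. 1 Gbit/s.
print(throughput_gbps(125e6, 1.0))  # 1.0
```

At 1 CPB the 8-bit input datapath consumes exactly one byte per clock, which is why a 125 MHz design is enough to keep up with a 1 Gbps link.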

2010-2-19 University of Toronto 6

Outlines BackgroundHigh-level architectureDesign DetailsEvaluation

2010-2-19 University of Toronto 7

Tasks of XML ParserWell-formed Checking

- Check if the document confirms to XML syntax rules

Schema Validation- Check if the document confirms to XML semantic rules

specified in DTD or Schema files

DOM Construction- Capture the parental relationship between elements and

attributes and store them into memory in Document Object Model (DOM) format

2010-2-19 University of Toronto 8

Well-formed Checking exampleHas an unique

root element

<?xml version = “1.0” encoding = “UTF-8” ?><!-- this is an example xml document --><University> <Department name = “ECE”> <Students> <freshman>310</freshman> <sophomore>298</sophomore> <junior>213</junior> <senior>178</senior> <graduate>86</graduate> … </Students> <Professors> <professor name=“Mike” field=“network”/> … </Professors> </Department> …</University>

2010-2-19 University of Toronto 9

Well-formed Checking exampleHas an unique

root element

Elements must be closed and nested properly

<?xml version = “1.0” encoding = “UTF-8” ?><!-- this is an example xml document --><University> <Department name = “ECE”> <Students> <freshman>310</freshman> <sophomore>298</sophomore> <junior>213</junior> <senior>178</senior> <graduate>86</graduate> … </Students> <Professors> <professor name=“Mike” field=“network”/> … </Professors> </Department> …</University>

2010-2-19 University of Toronto 10

Well-formed Checking exampleHas an unique root

element

Elements must be closed and nested properly

Unique attributes within an element

<?xml version = “1.0” encoding = “UTF-8” ?><!-- this is an example xml document --><University> <Department name = “ECE”> <Students> <freshman>310</freshman> <sophomore>298</sophomore> <junior>213</junior> <senior>178</senior> <graduate>86</graduate> … </Students> <Professors> <professor name=“Mike” field=“network”/> … </Professors> </Department> …</University>
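To make the three rules concrete, here is a minimal software sketch that uses Python's standard `xml.etree.ElementTree` parser as a stand-in for the well-formed checking stage. The helper name `is_well_formed` and the small test documents are illustrative assumptions, not taken from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = '<University><Department name="ECE"/></University>'
two_roots = '<University/><College/>'                         # no unique root
bad_nesting = '<Students><junior>213</Students></junior>'     # improper nesting
duplicate_attr = '<professor name="Mike" name="Bob"/>'        # repeated attribute

for doc in (good, two_roots, bad_nesting, duplicate_attr):
    print(is_well_formed(doc), doc)
```

Only the first document is accepted; each of the others violates exactly one of the rules listed above.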

XML Schema Example

- Specify permitted child elements/attributes
- Specify type of content
- Specify occurrence limit

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="University">
        <xs:complexType>
          <xs:element name="Department" minOccurs="2">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="Students">
                  <xs:complexType>
                    <xs:all>
                      <xs:element name="freshman" type="xs:string"/>
                      <xs:element name="sophomore" type="xs:string"/>
                      <xs:element name="junior" type="xs:string"/>
                      <xs:element name="senior" type="xs:string"/>
                      <xs:element name="graduate" type="xs:string"/>
                    </xs:all>
                  </xs:complexType>
                </xs:element>
                <xs:element name="Professors" type="professorType"/>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:complexType>
      </xs:element>
    </xs:schema>

2010-2-19 University of Toronto 14

DOM ConstructionCreate in-memory tree

structure for XML

Provide application accesses through tree operations

Root

University

ElementDepartment

Element

Department

AttributeName

ECE

Text

Element

Students

Element

Professors

Elementjunior

Text213

Elementprofessor

AttributeName

AttributeField

… …

Mike

Text

network

Text
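As a software analogy for "access through tree operations", the following sketch builds a DOM for a trimmed version of the example document with Python's standard `xml.dom.minidom` and reads a few nodes back. The helper `describe` and the trimmed document are assumptions made for illustration; the accelerator itself writes its DOM to DRAM rather than using this library.

```python
from xml.dom import minidom

EXAMPLE = (
    '<University><Department name="ECE">'
    '<Students><junior>213</junior></Students>'
    '<Professors><professor name="Mike" field="network"/></Professors>'
    '</Department></University>'
)


def describe(xml_text: str):
    """Walk the DOM tree and return a few values via tree operations."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement                                  # Element: University
    department = root.getElementsByTagName("Department")[0]
    junior = department.getElementsByTagName("junior")[0]
    professor = department.getElementsByTagName("professor")[0]
    return (
        root.tagName,
        department.getAttribute("name"),   # attribute node value
        junior.firstChild.data,            # text node under <junior>
        professor.getAttribute("field"),
    )


print(describe(EXAMPLE))
```

Each lookup follows parent-to-child links in the tree, which is the relationship the DOM construction stage has to capture while the document streams in.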


Top Level Diagram

[Figure: top-level block diagram of the accelerator. Labels recoverable from the figure:
- Input: XML document over 1 Gbps Ethernet
- Stages: Well-formed Checking Stage, Schema Validation Stage, DOM Construction Stage
- Blocks: Character Scanner, Token Extractor, Token Handler, Rule Match Unit, Rule Check Unit, DOM Constructor, Write Buffer, XML Cycle Buffer, two FIFOs, Memory Controller (to DRAM)
- Tables: RNT, RHT, RCT
- Datapath widths: 8b, 32b, 64b, 128b, 256b]

[Animation on the Top Level Diagram: the example input <Elem attr='xyz'>content</Elem> is traced through the pipeline. Labels shown at successive steps: the raw text; the extracted tokens Elem, attr, xyz, content; the hashed names H(Elem) and H(attr); and the schema rule fields rule name and rule content.]


Recurring Idioms (Dwarfs)

Identified 3 recurring computational idioms (referred to as Dwarfs)
- One-to-one String Matching
- One-to-many String Membership Test
- One-to-many String Search

These idioms are one of the major reasons for low parsing performance

Dwarf I: One-to-one String Matching

Tests whether a subject string equals a reference string

Example: correct nesting

The strings are variable-length
- Not efficient on conventional architectures

Solution: memory stack
- Converts variable-length string comparison into fixed-length character comparison
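One way to picture the memory-stack idea is sketched below: open-tag names are pushed character by character, and each character of a closing tag is compared against exactly one stored character. This is a software model under assumptions; the class `TagStack`, its byte-array layout and the event format are hypothetical and not the circuit described in the talk.

```python
class TagStack:
    """Stack of open tag names stored as raw characters (illustrative model)."""

    def __init__(self):
        self.chars = bytearray()   # characters of all currently open tag names
        self.starts = []           # start offset of each open name

    def push(self, name: bytes) -> None:
        self.starts.append(len(self.chars))
        self.chars.extend(name)

    def close(self, name: bytes) -> bool:
        """Compare a closing tag with the top name, one character at a time."""
        if not self.starts:
            return False
        start = self.starts.pop()
        stored_len = len(self.chars) - start
        ok = stored_len == len(name)
        for i, byte in enumerate(name):
            # Each step is a fixed-width (single character) comparison.
            if i >= stored_len or self.chars[start + i] != byte:
                ok = False
                break
        del self.chars[start:]
        return ok


def properly_nested(events) -> bool:
    """events: iterable of ("open" | "close", tag name as bytes)."""
    stack = TagStack()
    for kind, name in events:
        if kind == "open":
            stack.push(name)
        elif not stack.close(name):
            return False
    return not stack.starts


print(properly_nested([("open", b"Students"), ("open", b"junior"),
                       ("close", b"junior"), ("close", b"Students")]))  # True
```

Because the closing tag arrives one byte per cycle anyway, comparing it against one stored byte per cycle adds no extra passes over the name, whatever its length.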

Dwarf II: One-to-many String Membership Test

Tests whether a subject string equals any member of a set of reference strings

Example: unique attribute within an element

    <student name="john" gender="m" hobby="guitar" field="math">

String comparison against all previously arrived attributes belonging to the same element
- Expensive memory back-tracing

Solution: Bloom filter
- Achieved in one memory lookup

Dwarf III: One-to-many String Search

"Finds" a subject string among a set of reference strings (different from just "testing" membership)

Example: search for the corresponding schema rule

String comparison against all candidates
- Non-deterministic lookup time

Solution: Balance Routing Table scheme
- Achieved in one memory lookup

Dwarf II: Bloom Filter

Example: attribute name uniqueness checking

Common case: attribute name is unique
- Filter out obvious cases using the Bloom filter
- Look up a bit array instead of comparing strings

Uncommon case: attribute name may already exist
- Stall the entire design
- Do all necessary string comparisons to confirm the existence of the incoming string
- Assumption: low occurrence rate (high cost)

Solution II: Bloom Filter

For each attribute name:
- Generate N independent hash codes
- Look up the bit array
- Update the bit array

Example: <student name="john" gender="m" hobby="guitar" field="math">

    Current set              Bit array
    {}                       0 0 0 0 0 0 0 0 0 0
    {name}                   0 1 0 0 0 0 1 0 0 0
    {name, gender, hobby}    0 1 0 1 0 0 1 0 1 1

Input = field, looked up against the set {name, gender, hobby}:
- Unique! At least one indexed bit is 0, so the name cannot be in the set
- False positive! All indexed bits are already 1 although the name is not in the set
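The common-case and uncommon-case behaviour can be sketched in software as follows. This is a minimal model under assumptions: SHA-256 stands in for the hardware hash functions, the 256-bit array and 2 hash functions are just one configuration, and the names `AttributeNameFilter`, `add_if_unique` and `attributes_unique` are hypothetical.

```python
import hashlib


class AttributeNameFilter:
    """Bloom filter with an exact-compare slow path (illustrative model)."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits
        self.seen = []              # exact names, consulted only on the slow path
        self.slow_path_checks = 0   # models how often the pipeline would stall

    def _indexes(self, name: str):
        # Assumption: salted SHA-256 replaces the hardware hash code generator.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add_if_unique(self, name: str) -> bool:
        idx = list(self._indexes(name))
        if all(self.bits[i] for i in idx):
            # Positive: either a real duplicate or a false positive.
            self.slow_path_checks += 1
            if name in self.seen:       # full string comparisons
                return False
        for i in idx:
            self.bits[i] = 1            # update the bit array
        self.seen.append(name)
        return True


def attributes_unique(names) -> bool:
    """Check one element's attribute names with a fresh filter."""
    flt = AttributeNameFilter()
    return all(flt.add_if_unique(n) for n in names)


print(attributes_unique(["name", "gender", "hobby", "field"]))  # True
print(attributes_unique(["name", "gender", "name"]))            # False
```

A negative lookup needs no string comparison at all; only a positive triggers the expensive exact check, so the final answer stays correct even when the filter reports a false positive.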

2010-2-19 University of Toronto 34

Bloom Filter ImplementationImplement the Bloom Filter algorithm in a pipeline

- Attribute name usually has multiple characters- Allow multiple processing cycles for each attribute name

HashCodeGenerator

Input character

0 31

0

k

h2

h1

hk

… … … …

Attribute name end Addr_valid Data_valid

update

positive

Bit ArrayIndexing Stage

Hash code Generating Stage

Matching Stage

Output


Experimental Setup

Software XML parsers test

Hardware and software platform:
- Intel Core 2 Quad Q9300 (2.5 GHz, 6 MB L2 cache)
- 2 GB DDR2-800 memory
- Debian Linux 2.6.18-6 x86-64
- GNU C 4.1.2

Tested XML parsing libraries:
- Xerces-c 2.8.0 x86-64
- Libxml2
- DOM4J-1.6
- Java API for XML Processing (JAXP) 1.6.0

XML Parsing Accelerator testbed

[Figure: testbed on a Xilinx Virtex-5 XC5VSX50T. A laptop sends data over 1 Gbps SGMII (UDP) to the Ethernet MAC; other labelled blocks are asyn_fifo, XML Engine, MC, DDR2 Memory, UART (cmd, data) and Display. Datapaths are labelled 8b; clock labels are 125 MHz and 200 MHz.]

Benchmarks

    Group              Benchmark    XML Size (KB)   XSD Size (KB)   Source
    DOM Parsing        Security     3               -               Intel Corporation
    DOM Parsing        Structure    12              -               codesynthesis
    DOM Parsing        Tpox         15              -               tpox
    DOM Parsing        Hl7          136             -               hl7-testharness
    DOM Parsing        Qedeq        211             -               qedeq.org
    DOM Parsing        Xmark        116,000         -               xml-benchmark
    Schema Validation  CustomInfo   1               2               Intel Corporation
    Schema Validation  CDCatalog    105             2               w3schools
    Schema Validation  Workflow     13              10              qedeq.org

Test Results

Metric: Raw Throughput (Gbps)

    Benchmark     JAXP    DOM4J   Libxml2   Xerces-c   XPA     XPAmax
    Security      0.199   0.059   0.294     0.100      1.000   1.040
    Structure     0.274   0.110   0.202     0.091      1.000   1.040
    Tpox          0.292   0.099   0.264     0.124      1.000   1.040
    Hl7           0.415   0.189   0.360     0.128      1.000   1.040
    Qedeq         0.481   0.221   0.338     0.133      1.000   1.040
    Xmark         0.550   0.256   0.416     0.187      1.000   1.040
    Average_par   0.373   0.158   0.314     0.127      1.000   1.040
    CustomInfo    0.062   -       0.107     0.054      1.000   1.040
    CDCatalog     0.128   -       0.232     0.113      1.000   1.040
    Workflow      0.227   -       0.396     0.185      1.000   1.040
    Average_vld   0.161   -       0.283     0.134      1.000   1.040
    Average_all   0.267   0.158   0.299     0.131      1.000   1.040

Test Results

Metric: Cycles Per Byte

    Benchmark     JAXP    DOM4J   Libxml2   Xerces-c   XPA
    Security      100.6   339.7   67.9      201.0      1.0
    Structure     73.1    181.3   99.1      220.5      1.0
    Tpox          68.5    201.3   75.9      161.0      1.0
    Hl7           48.2    106.0   55.6      155.8      1.0
    Qedeq         41.5    90.4    59.2      150.6      1.0
    Xmark         36.4    78.0    48.0      106.7      1.0
    Average_par   53.6    126.9   63.6      157.2      1.0
    CustomInfo    321.8   -       186.2     373.7      1.0
    CDCatalog     156.5   -       86.3      176.8      1.0
    Workflow      88.3    -       50.4      108.3      1.0
    Average_vld   124.4   -       70.6      148.8      1.0
    Average_all   75.0    126.9   66.9      152.9      1.0
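The two result tables are related by the clock rate of each platform. The sketch below assumes the software CPB figures were derived from the 2.5 GHz clock of the test machine and the XPA figure from its 125 MHz clock; under that assumption it reproduces the table entries to within rounding of the published throughput values. The function name `cycles_per_byte` is illustrative.

```python
def cycles_per_byte(clock_hz: float, throughput_gbps: float) -> float:
    """Convert a raw throughput in Gbit/s into cycles per byte."""
    bytes_per_second = throughput_gbps * 1e9 / 8
    return clock_hz / bytes_per_second


# JAXP on the Security benchmark: 0.199 Gbps on a 2.5 GHz CPU (table lists 100.6).
print(f"{cycles_per_byte(2.5e9, 0.199):.1f}")   # 100.5
# XPA: 1.000 Gbps at 125 MHz.
print(f"{cycles_per_byte(125e6, 1.000):.1f}")   # 1.0
```

The small gap between 100.5 and the listed 100.6 is consistent with the throughput having been rounded to three decimals before publication.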

Scalability Examination

Bloom filter efficiency
- Test the Attribute Name Uniqueness circuit with generated test files
- Count the number of false positives

    Bloom Filter                 Google Key Words        Wikipedia Key Words
    Hash Func.   Bit_Array       4k     8k     16k       4k     8k     16k
    2            64b             1      66     509       6      129    502
    2            256b            0      5      60        1      8      56
    2            1kb             0      1      6         1      2      2
    2            2kb             0      0      1         0      0      0
    3            256b            0      0      14        1      3      9
    3            1kb             0      0      1         0      0      0
    3            2kb             0      0      0         0      0      0

Implementation Cost

Target device: Xilinx Virtex-5 XC5VSX50T

    Logic Utilization   Slice Register   Slice LUT     Block RAM
    XPA                 4455 (13%)       6594 (20%)    13 (11%)
    MC                  1960 (6%)        1683 (5%)     5 (3%)
    EMAC                927 (2%)         712 (2%)      3 (2%)
    UART                151 (1%)         187 (1%)      2 (1%)
    TOTAL               7493 (22%)       9176 (28%)    23 (17%)
    XC5VSX50T           32640            32640         132

Conclusion

An FPGA is a valid contender in XML processing
- Low clock frequency requirement to achieve high throughput
- Scalable to large XML documents
- Moderate hardware cost to achieve high performance

Future work
- Full conformance to the XML specification


Questions?
