A 1 Cycle-Per-Byte XML Accelerator

2010-2-19, University of Toronto

Zefu Dai, Nick Ni and Jianwen Zhu

Presented by Zefu Dai

University of Toronto

What is XML

Extensible Markup Language

A platform-independent tool for data exchange and representation

Widely used in:
- Web services
- Database systems
- Scientific applications

<?xml version="1.0" encoding="UTF-8"?>
<University>
  <Department name="ECE">
    <Students>
      <freshman>310</freshman>
      <sophomore>298</sophomore>
      <junior>213</junior>
      <senior>178</senior>
      <graduate>86</graduate>
      …
    </Students>
    <Professors>
      <professor name="Mike" field="network"/>
      …
    </Professors>
  </Department>
  …
</University>

Performance Threat: XML Parsing

70 minutes to load a 3 GB XML file, 26x slower than loading plain text

>1 s per bank transaction: how many transactions per day?

An average of 175 K instructions to parse 1 KB of XML data (IBM XML4C)

With network speeds reaching tens of Gbps, XML parsing, not the network, becomes the performance bottleneck

Previous Work

Cycle Per Byte (CPB) = average number of cycles to process each byte of XML data

Multi-core acceleration
- Requires a pre-parsing pass, done sequentially
- 30 CPB on a 4-core processor

SIMD acceleration
- Without in-memory tree construction and validation
- 6-15 CPB

Hardware accelerators
- Most commercial products do not reveal performance metrics or design details
- 10-40 CPB

Our Design

Causes of the parsing slowdown
- Text-based data stream
- Variable-length string comparison
- Poor memory performance due to streaming and memory back-tracing

An XML parsing accelerator implemented in FPGA
- Fixed-length string operations
- Optimized circuits for string comparison
- Stallable pipeline optimized for the common case
- Data structures for high-bandwidth on-chip memory

Achieves 1 CPB processing speed and saturates a 1 Gbps Ethernet link while running at 125 MHz

Outline
- Background
- High-level architecture
- Design details
- Evaluation

Tasks of an XML Parser

Well-formed Checking
- Check whether the document conforms to XML syntax rules

Schema Validation
- Check whether the document conforms to the XML semantic rules specified in DTD or Schema files

DOM Construction
- Capture the parent-child relationships between elements and attributes and store them in memory in Document Object Model (DOM) format

Well-formed Checking Example

Has a unique root element

Elements must be closed and nested properly

Attributes must be unique within an element
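The three rules above can be tried out in software. The sketch below is only an illustration using Python's standard `xml.etree.ElementTree` parser, not the accelerator's logic; the helper name `is_well_formed` and the sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML (software check only)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule on the slide (hypothetical documents).
samples = {
    "ok": "<a><b/></a>",
    "two roots": "<a/><b/>",
    "bad nesting": "<a><b></a></b>",
    "duplicate attribute": '<a x="1" x="2"/>',
}

for label, doc in samples.items():
    print(label, is_well_formed(doc))
```

Only the first sample is accepted; each of the others violates exactly one of the listed rules.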

XML Schema Example

A schema can:
- Specify permitted child elements/attributes
- Specify type of content
- Specify occurrence limit

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="University">
    <xs:complexType>
      <xs:element name="Department" minOccurs="2">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Students">
              <xs:complexType>
                <xs:all>
                  <xs:element name="freshman" type="xs:string"/>
                  <xs:element name="sophomore" type="xs:string"/>
                  <xs:element name="junior" type="xs:string"/>
                  <xs:element name="senior" type="xs:string"/>
                  <xs:element name="graduate" type="xs:string"/>
                </xs:all>
              </xs:complexType>
            </xs:element>
            <xs:element name="Professors" type="professorType"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:complexType>
  </xs:element>
</xs:schema>
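To make the idea of permitted children and occurrence limits concrete, here is a minimal sketch of a rule check in Python. It is not an XSD validator and not the accelerator's rule format: the `RULES` table, the `validate` function, and the shortened sample documents are all hypothetical, and the table only loosely mirrors the schema on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table (an assumption for illustration):
# element name -> (permitted child names, minimum occurrences per child)
RULES = {
    "University": ({"Department"}, {"Department": 2}),
    "Department": ({"Students", "Professors"}, {"Students": 1, "Professors": 1}),
    "Students": ({"freshman", "sophomore", "junior", "senior", "graduate"}, {}),
    "Professors": ({"professor"}, {}),
}


def validate(elem, rules=RULES):
    """Check permitted children and minimum occurrences, recursively."""
    allowed, min_occurs = rules.get(elem.tag, (set(), {}))
    counts = {}
    for child in elem:
        if child.tag not in allowed:
            return False  # child element not permitted here
        counts[child.tag] = counts.get(child.tag, 0) + 1
        if not validate(child, rules):
            return False
    # Every child with an occurrence limit must appear often enough.
    return all(counts.get(name, 0) >= n for name, n in min_occurs.items())


DEPT = "<Department><Students><junior>213</junior></Students><Professors/></Department>"
valid = ET.fromstring(f"<University>{DEPT}{DEPT}</University>")
too_few = ET.fromstring(f"<University>{DEPT}</University>")

print(validate(valid))    # True: two Department elements
print(validate(too_few))  # False: minOccurs="2" is not met
```

Content types such as `xs:string` are left out of this sketch to keep it short.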

DOM Construction

Create an in-memory tree structure for the XML document

Provide application access through tree operations

University (Element)
  Department (Element)
  Department (Element)
    Name (Attribute)
    Students (Element)
      junior (Element)
        213 (Text)
      …
    Professors (Element)
      professor (Element)
        Name (Attribute)
        Field (Attribute)
          network
      …
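As a software analogue of the tree above, the following sketch builds a DOM with Python's standard `xml.dom.minidom` and reads it back through tree operations. The shortened document `DOC` is an assumption derived from the University example; it says nothing about the memory layout the accelerator writes to DRAM.

```python
from xml.dom import minidom

# Shortened version of the University example (assumed for illustration).
DOC = """<University>
  <Department name="ECE">
    <Students><junior>213</junior></Students>
    <Professors><professor name="Mike" field="network"/></Professors>
  </Department>
</University>"""

dom = minidom.parseString(DOC)
root = dom.documentElement                        # Element: University
dept = root.getElementsByTagName("Department")[0]
junior = dept.getElementsByTagName("junior")[0]
prof = dept.getElementsByTagName("professor")[0]

print(root.tagName)                # University
print(dept.getAttribute("name"))   # ECE
print(junior.firstChild.data)      # 213 (a Text node under junior)
print(prof.getAttribute("field"))  # network
```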

Top Level Diagram

[Figure: top-level architecture. Input: XML Doc over Ethernet. Output: to DRAM through the Memory Controller.]

Pipeline stages:
- Well-formed Checking Stage
- Schema Validation Stage
- DOM Construction Stage

Blocks: Character Scanner, Token Extractor, Token Handler, Rule Match Unit, Rule Check Unit, DOM Constructor, Write Buffer, XML Cycle Buffer, Memory Controller, the RNT, RHT and RCT tables, and two FIFOs

Bus widths labelled in the figure: 8b, 8b, 64b; 64b, 64b, 64b; 256b, 128b, 64b; 32b, 32b

Animation steps, following one element through the pipeline:
- Input: <Elem attr='xyz'>content</Elem>
- Tokens: Elem, attr, xyz, content
- Labels: H(Elem), H(attr); rule name, rule content
- Final labels: Elem, attr; xyz, content; attr, content; rule name; rule content
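The token step of the walk-through, where `<Elem attr='xyz'>content</Elem>` becomes Elem, attr, xyz, content, can be sketched in software as follows. This is a regex-based illustration only; the token kinds (`START_TAG`, `ATTR_NAME`, and so on) and the function `tokenize` are assumptions, and the sketch ignores self-closing tags, comments, and processing instructions.

```python
import re

# A start or end tag, or a run of text between tags.
TOKEN_RE = re.compile(
    r"<(?P<close>/)?(?P<tag>[A-Za-z_][\w.-]*)(?P<attrs>[^>]*)>|(?P<text>[^<]+)"
)
# name='value' or name="value" inside a start tag.
ATTR_RE = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*(?:'([^']*)'|"([^"]*)")""")


def tokenize(xml):
    """Split a small XML fragment into (kind, value) tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(xml):
        if m.group("text") is not None:
            text = m.group("text").strip()
            if text:
                tokens.append(("TEXT", text))
        elif m.group("close"):
            tokens.append(("END_TAG", m.group("tag")))
        else:
            tokens.append(("START_TAG", m.group("tag")))
            for name, single, double in ATTR_RE.findall(m.group("attrs")):
                tokens.append(("ATTR_NAME", name))
                tokens.append(("ATTR_VALUE", single or double))
    return tokens


for token in tokenize("<Elem attr='xyz'>content</Elem>"):
    print(token)
```

The hardware scans one character per cycle rather than matching whole patterns, so this only shows which tokens come out, not how they are produced.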

Recurring Idioms (Dwarfs)

Identified 3 recurring computational idioms (referred to as Dwarfs)
- One-to-one String Matching
- One-to-many String Membership Test
- One-to-many String Search

They are among the major reasons for low parsing performance

Dwarf I: One-to-one String Matching

Tests whether a subject string equals a reference string

Example: correct nesting

The string is variable-length
- Not efficient on conventional architectures

Solution: memory stack
- Convert variable-length string comparison to fixed-length character comparison
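One possible reading of the memory-stack idea, sketched in Python: open tag names are kept in a flat character buffer, and an end tag is checked against the top entry one character at a time. The class `TagStack`, the function `properly_nested`, and the event format are hypothetical; the slides do not give the actual circuit.

```python
class TagStack:
    """Stack of open tag names stored in one flat character buffer."""

    def __init__(self):
        self.chars = []   # flat character memory
        self.starts = []  # start offset of each open tag name

    def push(self, name):
        self.starts.append(len(self.chars))
        self.chars.extend(name)

    def pop_and_match(self, name):
        """Compare `name` with the top entry, one character per step."""
        if not self.starts:
            return False
        start = self.starts.pop()
        stored = self.chars[start:]
        del self.chars[start:]
        if len(stored) != len(name):
            return False
        return all(a == b for a, b in zip(stored, name))


def properly_nested(events):
    """events: sequence of ("start", name) or ("end", name) pairs."""
    stack = TagStack()
    for kind, name in events:
        if kind == "start":
            stack.push(name)
        elif not stack.pop_and_match(name):
            return False
    return not stack.starts  # every opened element must be closed


print(properly_nested([("start", "a"), ("start", "b"), ("end", "b"), ("end", "a")]))  # True
print(properly_nested([("start", "a"), ("start", "b"), ("end", "a"), ("end", "b")]))  # False
```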

Dwarf II: One-to-many String Membership Test

Tests whether a subject string equals any member of a set of reference strings

Example: unique attribute within an element

String comparison against all previously arrived attributes belonging to the same element
- Expensive memory back-tracing

Solution: Bloom Filter
- Achieved in one memory lookup

Dwarf III: One-to-many String Search

"Finds" a subject string among a set of reference strings (as opposed to just "testing" for it)

Example: search for the corresponding schema rule

String comparison against all candidates
- Non-deterministic lookup time

Solution: Balance Routing Table scheme
- Achieved in one memory lookup

Dwarf II: Bloom Filter

Example: attribute name uniqueness checking

Common case: the attribute name is unique
- Filter out the obvious cases using a Bloom Filter
- Look up a bit array instead of comparing strings

Uncommon case: the attribute name may already exist
- Stall the entire design
- Do all necessary string comparisons to confirm the existence of the incoming string
- Assumption: low occurrence rate (high cost)

Solution II: Bloom Filter

For each attribute name:
- Generate N independent hash codes
- Look up the bit array
- Update the bit array

Example: <student name="john" gender="m" hobby="guitar" field="math">

Bit array              Current set              Result
0 0 0 0 0 0 0 0 0 0    {}
0 1 0 0 0 0 1 0 0 0    {name}
0 1 0 1 0 0 1 0 1 1    {name, gender, hobby}

Input = field, two possible outcomes:
0 1 0 1 0 0 1 0 1 1    {name, gender, hobby}    Unique! (at least one looked-up bit is 0)
0 1 0 1 0 0 1 0 1 1    {name, gender, hobby}    False Positive! (all looked-up bits are already 1)
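The look-up-then-update procedure can be sketched in Python as below. The hash functions are an assumption (salted SHA-256 stands in for whatever hash circuit the design uses), as are the names `BloomFilter`, `test_and_add`, and `has_unique_attributes`. The fallback to exact comparison models the stall path taken in the uncommon case.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=64, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _indexes(self, name):
        # Assumption: derive N hash codes by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def test_and_add(self, name):
        """Return True if `name` may already be present, then record it."""
        idx = list(self._indexes(name))
        maybe_seen = all(self.bits[i] for i in idx)  # look up the bit array
        for i in idx:
            self.bits[i] = 1                         # update the bit array
        return maybe_seen


def has_unique_attributes(names, num_bits=64, num_hashes=2):
    """Common case: the filter says "new". Uncommon case: confirm by comparing."""
    bloom = BloomFilter(num_bits, num_hashes)
    seen = []
    for name in names:
        if bloom.test_and_add(name) and name in seen:  # slow path on a positive
            return False
        seen.append(name)
    return True


print(has_unique_attributes(["name", "gender", "hobby", "field"]))  # True
print(has_unique_attributes(["name", "gender", "name"]))            # False
```

A false positive only costs time here: the exact comparison on the slow path still gives the right answer.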

Bloom Filter Implementation

Implement the Bloom Filter algorithm in a pipeline
- An attribute name usually has multiple characters
- Allow multiple processing cycles for each attribute name

[Figure: Bloom Filter pipeline. Stages: Hash Code Generating Stage, Indexing Stage, Matching Stage, Output. Blocks: Hash Code Generator, Bit Array. Signals: input character, attribute name end, Addr_valid, Data_valid, update, positive.]

Experimental Setup

Software XML parsers test

Hardware and software platform:
- Intel Core 2 Quad Q9300 (2.5 GHz, 6 MB L2 cache)
- 2 GB DDR2-800 memory
- Debian Linux 2.6.18-6 x86-64
- GNU C 4.1.2

Tested XML parsing libraries:
- Xerces-c 2.8.0 x86-64
- Libxml2
- DOM4J-1.6
- Java API for XML Processing (JAXP) 1.6.0

XML Parsing Accelerator testbed

[Figure: testbed on a Xilinx Virtex-5 XC5VSX50T. Labels: XML Engine, Ethernet M, asyn_fifo, Display, DDR2 Memory; clocks 125 MHz and 200 MHz; Laptop connected over 1 Gbps SGMII.]

Benchmarks

Group              Benchmark    XML Size (KB)   XSD Size (KB)   Source
DOM Parsing        Security     3               -               Intel Corporation
                   Structure    12              -               codesynthesis
                   Tpox         15              -               tpox
                   Hl7          136             -               hl7-testharness
                   Qedeq        211             -               qedeq.org
                   Xmark        116,000         -               xml-benchmark
Schema Validation  CustomInfo   1               2               Intel Corporation
                   CDCatalog    105             2               w3schools
                   Workflow     13              10              qedeq.org

Test Results

Metric: Raw Throughput (Gbps)

Benchmark     JAXP    DOM4J   Libxml2   Xerces-c   XPA     XPAmax
Security      0.199   0.059   0.294     0.100      1.000   1.040
Structure     0.274   0.110   0.202     0.091      1.000   1.040
Tpox          0.292   0.099   0.264     0.124      1.000   1.040
Hl7           0.415   0.189   0.360     0.128      1.000   1.040
Qedeq         0.481   0.221   0.338     0.133      1.000   1.040
Xmark         0.550   0.256   0.416     0.187      1.000   1.040
Average_par   0.373   0.158   0.314     0.127      1.000   1.040
CustomInfo    0.062   -       0.107     0.054      1.000   1.040
CDCatalog     0.128   -       0.232     0.113      1.000   1.040
Workflow      0.227   -       0.396     0.185      1.000   1.040
Average_vld   0.161   -       0.283     0.134      1.000   1.040
Average_all   0.267   0.158   0.299     0.131      1.000   1.040

Test Results

Metric: Cycle Per Byte

Benchmark     JAXP    DOM4J   Libxml2   Xerces-c   XPA
Security      100.6   339.7   67.9      201.0      1.0
Structure     73.1    181.3   99.1      220.5      1.0
Tpox          68.5    201.3   75.9      161.0      1.0
Hl7           48.2    106.0   55.6      155.8      1.0
Qedeq         41.5    90.4    59.2      150.6      1.0
Xmark         36.4    78.0    48.0      106.7      1.0
Average_par   53.6    126.9   63.6      157.2      1.0
CustomInfo    321.8   -       186.2     373.7      1.0
CDCatalog     156.5   -       86.3      176.8      1.0
Workflow      88.3    -       50.4      108.3      1.0
Average_vld   124.4   -       70.6      148.8      1.0
Average_all   75.0    126.9   66.9      152.9      1.0
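The CPB figures follow from clock frequency and throughput. The short calculation below is a sketch of that arithmetic, assuming CPB = clock frequency divided by bytes processed per second; the function name `cycles_per_byte` is made up for this example, and the clock rates are the 125 MHz accelerator clock and the 2.5 GHz CPU from the setup slide.

```python
def cycles_per_byte(clock_hz, throughput_gbps):
    """CPB = cycles per second / bytes per second."""
    bytes_per_second = throughput_gbps * 1e9 / 8
    return clock_hz / bytes_per_second


xpa = cycles_per_byte(125e6, 1.000)           # accelerator at 125 MHz, 1 Gbps
jaxp_security = cycles_per_byte(2.5e9, 0.199)  # JAXP on the Security benchmark

print(xpa)                      # 1.0
print(round(jaxp_security, 1))  # 100.5
```

The JAXP value comes out at about 100.5 rather than the 100.6 in the table; the small gap is presumably due to rounding in the published throughput.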

Scalability Examination

Bloom Filter efficiency
- Test the Attribute Name Uniqueness circuit with generated test files
- Count the number of false positives

                            Google Key Words       Wikipedia Key Words
Hash functions  Bit_Array   4k     8k     16k      4k     8k     16k
2 Hash Func.    64b         1      66     509      6      129    502
                256b        0      5      60       1      8      56
                1kb         0      1      6        1      2      2
                2kb         0      0      1        0      0      0
3 Hash Func.    256b        0      0      14       1      3      9
                1kb         0      0      1        0      0      0
                2kb         0      0      0        0      0      0

Implementation Cost

Target Device: Xilinx Virtex-5 XC5VSX50T

Logic Utilization   Slice Register   Slice LUT     Block RAM
XPA                 4455 (13%)       6594 (20%)    13 (11%)
MC                  1960 (6%)        1683 (5%)     5 (3%)
EMAC                927 (2%)         712 (2%)      3 (2%)
UART                151 (1%)         187 (1%)      2 (1%)
TOTAL               7493 (22%)       9176 (28%)    23 (17%)
XC5VSX50T           32640            32640         132

Conclusion

FPGA is a valid contender in XML processing
- Low clock frequency requirement to achieve high throughput
- Scalable to process large XML documents
- Moderate hardware cost to achieve high performance

Future work
- Full conformance to the XML specification

Questions?

