Post on 27-Mar-2015
e-Science Data Information and Knowledge Transformation
BinX – A Tool for Binary File Access
eDIKT project team
Ted Wen tedwen@edikt.org
Robert Carroll robert.carroll@edikt.org
www.edikt.org

Agenda
About the BinX project
Introduction to the BinX language
Introduction to the BinX library
Example application
Overview of the BinX API
Discussion

The problem
Most scientific data are in binary files
Binary data files are not all standardized
Binary data files are platform-dependent
XML is useful to represent metadata
Scientific datasets can be too large in XML

What is BinX?
Binary in XML
– Annotation language
    Using XML
    Descriptive
    Low-level
– Software components
    BinX library
    Generic utilities
    API

How and Why BinX is used
[Diagram: binary data files read by special application programs, contrasted with binary data files described by <dataset>… …</dataset> documents and read by application programs through the BinX library.]

The BinX Language
Annotating a binary data stream
Mark up data types
Mark up sequences
Mark up arrays
Complex structures

Data elements
Primitive data elements
– Byte, character, integer, real
Complex data elements
– Arrays, struct, union
User-defined data elements

Primitive Data Types
Character
– <character-8>
– <string> (fixed length, variable length and delimited)
Integer
– <byte-8>
– <short-16>, <unsignedShort-16>
– <integer-32>, <unsignedInteger-32>
– <long-64>, <unsignedLong-64>
Real
– <float-32>
– <double-64>
– <quadruple-128>

Primitive Data Types
Mark up data types
1. <short-16 byteOrder="littleEndian">32767</short-16>
2. <integer-32 byteOrder="bigEndian">2147483647</integer-32>
3. <float-32 byteOrder="littleEndian">100.0</float-32>
4. <float-32 byteOrder="bigEndian">100.0</float-32>
Bytes in the stream for each marked-up value:
1: FF 7F
2: 7F FF FF FF
3: 00 00 C8 42
4: 42 C8 00 00
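
The four marked-up values can be checked by decoding the byte stream by hand. The following is a minimal C++ sketch, not the BinX library's own code: the names ByteOrder, readUnsigned, bitCopy, readShort16, readInteger32 and readFloat32 are hypothetical, and readFloat32 assumes the host stores float as a 32-bit IEEE 754 value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical enum mirroring the byteOrder attribute values on the slide.
enum class ByteOrder { LittleEndian, BigEndian };

// Assemble an unsigned integer from `size` bytes (1 to 8) in the given order.
// Bytes are combined by shifting, so the result does not depend on the host.
std::uint64_t readUnsigned(const unsigned char* bytes, std::size_t size,
                           ByteOrder order) {
    assert(size >= 1 && size <= 8);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        // Visit the most significant byte first.
        std::size_t index = (order == ByteOrder::BigEndian) ? i : size - 1 - i;
        value = (value << 8) | bytes[index];
    }
    return value;
}

// Reinterpret the bits of one type as another type of the same size.
template <typename To, typename From>
To bitCopy(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

std::int16_t readShort16(const unsigned char* bytes, ByteOrder order) {
    return bitCopy<std::int16_t>(
        static_cast<std::uint16_t>(readUnsigned(bytes, 2, order)));
}

std::int32_t readInteger32(const unsigned char* bytes, ByteOrder order) {
    return bitCopy<std::int32_t>(
        static_cast<std::uint32_t>(readUnsigned(bytes, 4, order)));
}

// Assumption: float is a 32-bit IEEE 754 value on the host.
float readFloat32(const unsigned char* bytes, ByteOrder order) {
    return bitCopy<float>(
        static_cast<std::uint32_t>(readUnsigned(bytes, 4, order)));
}
```

Reading the 14 bytes in sequence with the byte orders given in the markup yields 32767, 2147483647, 100.0 and 100.0.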

Abstract “struct” types
Mark up a sequence
<struct>
    <unsignedShort-16 />
    <unsignedShort-16 />
    <byte-8 />
    <byte-8 />
    <byte-8 />
</struct>
Screen descriptor in GIF:
Screen width: unsigned short;
Screen height: unsigned short;
Packed field: byte;
Background colour index: byte;
Pixel aspect ratio: byte
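
To show what the <struct> markup describes at the byte level, here is a minimal C++ sketch that reads the same five fields from 7 bytes. It is an illustration only, not BinX library code: ScreenDescriptor, kScreenDescriptorSize and parseScreenDescriptor are hypothetical names. The slide's markup gives no byteOrder; the sketch reads the two 16-bit fields as little-endian because GIF stores multi-byte values least significant byte first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory form of the sequence marked up on the slide.
struct ScreenDescriptor {
    std::uint16_t width;                  // <unsignedShort-16 />
    std::uint16_t height;                 // <unsignedShort-16 />
    std::uint8_t packedField;             // <byte-8 />
    std::uint8_t backgroundColourIndex;   // <byte-8 />
    std::uint8_t pixelAspectRatio;        // <byte-8 />
};

// 2 + 2 + 1 + 1 + 1 bytes, with no padding in the file.
constexpr std::size_t kScreenDescriptorSize = 7;

// Decode the fields one by one instead of copying raw memory,
// so compiler padding and host byte order do not matter.
ScreenDescriptor parseScreenDescriptor(const unsigned char* bytes,
                                       std::size_t length) {
    assert(length >= kScreenDescriptorSize);
    ScreenDescriptor d;
    d.width = static_cast<std::uint16_t>(bytes[0] | (bytes[1] << 8));
    d.height = static_cast<std::uint16_t>(bytes[2] | (bytes[3] << 8));
    d.packedField = bytes[4];
    d.backgroundColourIndex = bytes[5];
    d.pixelAspectRatio = bytes[6];
    return d;
}
```

For example, the bytes 80 02 E0 01 F7 00 00 decode to a width of 640 and a height of 480.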

Abstract “array” types
Mark up an array
<arrayFixed>
    <integer-32 />
    <dim indexTo="99">
        <dim indexTo="9" />
    </dim>
</arrayFixed>
A 2-dimensional array containing 10-by-100 32-bit integers
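
The size of the data block follows directly from the <dim> elements. A small C++ sketch of the arithmetic, assuming every index starts at 0 so that a dimension with indexTo="N" holds N + 1 entries (elementCount and byteSize are hypothetical helper names, not BinX API calls):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of elements described by a list of indexTo values,
// assuming each index runs from 0 to indexTo inclusive.
std::size_t elementCount(const std::vector<std::size_t>& indexTo) {
    assert(!indexTo.empty());
    std::size_t count = 1;
    for (std::size_t last : indexTo) {
        count *= last + 1;
    }
    return count;
}

// Total size in bytes for elements of a fixed width, e.g. 4 for <integer-32 />.
std::size_t byteSize(const std::vector<std::size_t>& indexTo,
                     std::size_t elementBytes) {
    return elementCount(indexTo) * elementBytes;
}
```

With indexTo values 99 and 9 and 4-byte elements, this gives 100 × 10 = 1000 integers occupying 4000 bytes. The sketch deliberately says nothing about which dimension varies fastest in the file.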

Embedded abstract types
Complex structures
<struct>
    <short-16 />
    <arrayFixed>
        <byte-8 />
        <dim indexTo="7" />
    </arrayFixed>
    <struct>
        <integer-32 />
        <float-32 />
        <double-64 />
    </struct>
</struct>

User-defined metadata
Label the data types and structures
<struct varName="Data Sample">
    <short-16 varName="ID" />
    <arrayFixed varName="List of 10 complex numbers">
        <struct varName="Complex">
            <float-32 varName="Real" />
            <float-32 varName="Imaginary" />
        </struct>
        <dim indexTo="9" />
    </arrayFixed>
</struct>

Reusable type definitions
Define macros for reuse
<definitions>
    <defineType typeName="FourCC">
        <arrayFixed>
            <character-8 />
            <dim count="4" />
        </arrayFixed>
    </defineType>
</definitions>
<struct varName="Wave_Header">
    <useType typeName="FourCC" varName="Keyword" />
    <integer-32 varName="Chunk_Size" />
</struct>

Linking to binary data
Reference the binary data file
<definitions>
    <defineType typeName="Header">… …</defineType>
    <defineType typeName="Format_Chunk">… …</defineType>
    <defineType typeName="Data_Chunk">… …</defineType>
</definitions>
<dataset src="myfile.wav">
    <useType typeName="Header" />
    <useType typeName="Format_Chunk" />
    <useType typeName="Data_Chunk" />
</dataset>

The BinX document
<?xml version="1.0"?>
<binx xmlns="http://www.edikt.org/binx">
    <dataset src="binary.bin" byteOrder="littleEndian">
        <short-16/>
        <integer-32/>
        <double-64/>
    </dataset>
</binx>

A BinX document
<binx byteOrder="bigEndian">
    <definitions>
        <defineType typeName="myTyp">
            <arrayFixed>
                <character-8/>
                <dim indexTo="9"/>
            </arrayFixed>
        </defineType>
    </definitions>
    <dataset src="myfile.bin">
        <useType typeName="myTyp"/>
        <integer-32 varName="X" />
    </dataset>
</binx>
Callouts: root element (<binx>), data class section (<definitions>), data instance section (<dataset>), abstract data type (<arrayFixed>).

DataBinX
DataBinX = BinX with Data
BinX document:
<dataset src="myfile.bin">
    <struct>
        <short-16 />
        <long-64 />
        <double-64 />
    </struct>
    <arrayFixed>
        <integer-32 />
        <dim count="2" />
    </arrayFixed>
</dataset>
Corresponding DataBinX:
<dataset>
    <struct>
        <short-16>100</short-16>
        <long-64>1000</long-64>
        <double-64>5.257</double-64>
    </struct>
    <arrayFixed>
        <dim> <integer-32>1</integer-32> </dim>
        <dim> <integer-32>2</integer-32> </dim>
    </arrayFixed>
</dataset>
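
The step from BinX plus binary data to DataBinX can be illustrated for this one fixed layout. The C++ sketch below is not how the BinX library is implemented; it hard-codes the struct and array from the slide, assumes the binary file is little-endian (the slide's dataset gives no byteOrder) and that double is a 64-bit IEEE 754 value. The names takeLittleEndian, bitCopy and toDataBinX are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>

// Read `size` little-endian bytes at `offset`, then advance `offset`.
std::uint64_t takeLittleEndian(const unsigned char* data, std::size_t length,
                               std::size_t& offset, std::size_t size) {
    assert(size <= 8 && offset + size <= length);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        value |= static_cast<std::uint64_t>(data[offset + i]) << (8 * i);
    }
    offset += size;
    return value;
}

// Reinterpret the bits of one type as another type of the same size.
template <typename To, typename From>
To bitCopy(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Emit DataBinX-style text for: struct { short-16, long-64, double-64 }
// followed by an array of two integer-32 values (26 bytes in total).
std::string toDataBinX(const unsigned char* data, std::size_t length) {
    std::size_t offset = 0;
    std::ostringstream xml;
    auto s = bitCopy<std::int16_t>(
        static_cast<std::uint16_t>(takeLittleEndian(data, length, offset, 2)));
    auto l = bitCopy<std::int64_t>(takeLittleEndian(data, length, offset, 8));
    auto d = bitCopy<double>(takeLittleEndian(data, length, offset, 8));
    xml << "<dataset>\n  <struct>\n"
        << "    <short-16>" << s << "</short-16>\n"
        << "    <long-64>" << l << "</long-64>\n"
        << "    <double-64>" << d << "</double-64>\n"
        << "  </struct>\n  <arrayFixed>\n";
    for (int i = 0; i < 2; ++i) {
        auto v = bitCopy<std::int32_t>(static_cast<std::uint32_t>(
            takeLittleEndian(data, length, offset, 4)));
        xml << "    <dim><integer-32>" << v << "</integer-32></dim>\n";
    }
    xml << "  </arrayFixed>\n</dataset>\n";
    return xml.str();
}
```

A generic tool would walk the parsed BinX document instead of hard-coding the field order; the sketch only shows how each annotated element consumes its bytes in sequence.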

The BinX Library
Core library
Utilities
Applications

Output from the library
DataBinX
– Combined data and BinX document
SchemaBinX
Binary data stream
DataBinX = SchemaBinX + Binary data

BinX Components
The library has core functionality to support generic utilities and applications.
[Diagram: layered stack with Applications on top, Utilities in the middle and the BinX library core at the bottom.]
BinX core functionality
– Parse/generate BinX document
– Read/write binary data
– Parse/generate DataBinX
Generic tools
– DataBinX pack/unpack
– Extractor
Applications
– Domain-specific

BinX application models
Data manipulation model
Data transportation model
Data service model
Data query model
Data catalogue model

Data manipulation model
Extraction
– Subset of a dataset
Combination
– Merge several datasets
Transformation
– Conversion of data types
– Change of sequence order
– Transposition of array dimensions
Transparency
– Automatic change of byte order
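
As an illustration of one of the transformations listed, transposition of array dimensions, here is a short generic C++ sketch. It is not taken from the BinX library: the function name transpose is hypothetical, and the array is assumed to be held in memory row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose a rows-by-cols array stored row by row
// into a cols-by-rows array, also stored row by row.
template <typename T>
std::vector<T> transpose(const std::vector<T>& in, std::size_t rows,
                         std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<T> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Element (r, c) of the input becomes element (c, r) of the output.
            out[c * rows + r] = in[r * cols + c];
        }
    }
    return out;
}
```

A 2-by-3 array holding 1 2 3 / 4 5 6 becomes the 3-by-2 array 1 4 / 2 5 / 3 6.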

Data transportation model
DataBinX as interlingua
[Diagram: on both the sending and the receiving side, an XML document, DataBinX, SchemaBinX with binary data (BinX + Binary) and a ZIP (MIME) package are linked by XSLT, the BinX utility and a ZIP tool; the ZIP package is what is sent and received.]

Data service model
Publishing logical datasets in BinX
[Diagram: a client accesses BinX documents that describe binary data held in a DB and on the Grid.]
Dataset from one binary file
Dataset from several binary files
Dataset from multiple data sources

Data query model
Create DataBinX
– From Binary and BinX
Query DataBinX
– Use XPath
Create new DataBinX
– Results from query
Parse DataBinX
– Create new Binary and BinX
[Diagram: BinX + Binary → DataBinX → XPath → New DataBinX → BinX + Binary]

Data catalogue model
Primary storage
– Binary data files
Metadata
– Syntactic annotation
– Semantic annotation
Classification
– Domain specific
Cross-reference
– XLink
[Diagram: a hierarchy of BinX documents numbered 1, 1.1, 1.2, 1.2.1, 1.2.2 and 1.2.3 above the binary data files, running from BINARY (detailed) to METADATA (abstract).]

Application in Astronomy
Case Study
Data Conversion
Between FITS and VOTable

Application in astronomy
FITS and VOTable conversion
[Diagram: a FITS file (SIMPLE = T … … END header followed by binary data) and a VOTable document (<VOTABLE>… …</VOTABLE>) are converted in both directions by the DataBinX utility, which is built on the BinX library core.]

FITS file
Primary HDU
Header (character columns 0 to 79):
SIMPLE = T / file does conform to FITS standard
BITPIX = 8 / number of bits per data pixel
NAXIS = 1 / number of data axes
… …
END
Data:
3D 4A 14 0F 1C FE 25 04 … …
Extension
Header:
XTENSION= 'BINTABLE' / binary table extension
BITPIX = 8 / 8-bit bytes
NAXIS = 2 / 2-dimensional binary table
… …
END
Data:
7B 3E 40 2C 16 70 E7 6F … …
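
A FITS preprocessor has to read the header cards before it can describe the data block. The following C++ sketch splits one card into keyword, value and comment. It is an illustration under stated assumptions, not the eDIKT preprocessor: FitsCard, trim and parseCard are hypothetical names, and the sketch relies on the standard FITS card layout (keyword in columns 0 to 7, "= " in columns 8 and 9 for cards that carry a value, "/" introducing the comment). The extracted slide text above has lost that column padding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical holder for the three parts of a header card.
struct FitsCard {
    std::string keyword;
    std::string value;
    std::string comment;
};

// Strip leading and trailing blanks.
std::string trim(const std::string& s) {
    const char* blanks = " \t";
    std::size_t begin = s.find_first_not_of(blanks);
    if (begin == std::string::npos) return "";
    std::size_t end = s.find_last_not_of(blanks);
    return s.substr(begin, end - begin + 1);
}

// Split one header card (up to 80 characters) into its parts.
FitsCard parseCard(const std::string& card) {
    assert(card.size() <= 80);
    FitsCard result;
    result.keyword = trim(card.substr(0, 8));
    // Cards such as END or COMMENT have no "= " value indicator.
    if (card.size() < 10 || card.compare(8, 2, "= ") != 0) {
        if (card.size() > 8) result.comment = trim(card.substr(8));
        return result;
    }
    std::string rest = card.substr(10);
    // Find the first "/" that is not inside a quoted string value.
    bool inQuotes = false;
    std::size_t slash = std::string::npos;
    for (std::size_t i = 0; i < rest.size(); ++i) {
        if (rest[i] == '\'') {
            inQuotes = !inQuotes;
        } else if (rest[i] == '/' && !inQuotes) {
            slash = i;
            break;
        }
    }
    result.value = trim(rest.substr(0, slash));
    if (slash != std::string::npos) {
        result.comment = trim(rest.substr(slash + 1));
    }
    return result;
}
```

String values keep their surrounding quotes, so the XTENSION card yields the value 'BINTABLE' including the quote characters.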

VOTable
<VOTABLE>
    <RESOURCE>
        <PARAM name="Obs" value="Bob"/>
        <TABLE name="Stars">
            <FIELD name="Star-name" datatype="char" arraysize="10" />
            <FIELD name="RA" datatype="float" />
            <FIELD name="Dec" datatype="float" />
            <FIELD name="Counts" datatype="int" arraysize="2x3x*" />
            <DATA>
                <TABLEDATA>
                    <TR>
                        <TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD>
                        <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
                    </TR>
                </TABLEDATA>
            </DATA>
        </TABLE>
    </RESOURCE>
</VOTABLE>

FITS → DataBinX → VOTable
FITS to VOTable conversion
[Diagram: FITS → Preprocessor → SchemaBinX → DataBinX utility → DataBinX → XSLT transformer (with an XSLT stylesheet) → VOTable]

VOTable → DataBinX → FITS
VOTable to FITS conversion
[Diagram: VOTable → XSLT transformer (with an XSLT stylesheet) → DataBinX → DataBinX utility → SchemaBinX and binary data → Postprocessor (with the FITS header) → FITS]

Support
Information and software download:
– http://www.edikt.org/binx
Questions:
– support@edikt.org
Requirements and suggestions:
– tedwen@edikt.org
– robertc@edikt.org

BinX API

Parsing a BinX document
BxBinxFile* pReader = new BxBinxFile();
if (pReader->parse("mybinx.xml"))
{
    BxDataset* pDataset = pReader->getDataset();
}

Reading a BinX document
// Get an array object, either by index...
BxArrayFixed* pArray = pDataset->getArray(0);
// ...or by name (an alternative to the line above)
// BxArrayFixed* pArray = pDataset->getArray("fixed");
// Get a struct from the array
BxDataset* pStruct = pArray->get(0, 0);

Reading a BinX document (continued)
BxFloat32* pReal = pStruct->getFloat("Real");
// Get the data value
float real = pReal->getFloat();

Creating a BinX document
// Create an object to write out the document
BxBinxFileWriter* pWriter = new BxBinxFileWriter();
// Create a new dataset (in-memory BinX document)
BxDataset* pData = new BxDataset();
BxShort16* i16 = new BxShort16(100);
pData->addDataObject(i16);

Creating a BinX document (continued)
// Create a new binary file
BxBinaryFile* pbf = new BxBinaryFile();
// Create a link to the BinX document
pbf->setDatasetPointer(pData);
// Save the BinX document
pWriter->setBinaryFilePtr(pbf);
pWriter->save("TestDataset.xml");

Merge binary data
BxBinxFileReader* pFile1 = new BxBinxFileReader("file1.xml");
BxBinxFileReader* pFile2 = new BxBinxFileReader("file2.xml");
BxDataset* pDataset1 = pFile1->getDataset();
BxDataset* pDataset2 = pFile2->getDataset();
BxArray* pArray1 = pDataset1->getArray(0);
BxArray* pArray2 = pDataset2->getArray(0);
BxDataObject* pData1 = pArray1->getNext();
BxDataObject* pData2 = pArray2->getNext();
// Write both data objects, one after the other, to a single binary file
FILE* fo = fopen("output.dat", "wb");
pData1->toStreamBinary(fo);
pData2->toStreamBinary(fo);
fclose(fo);

Summary
One BinX document can describe many binary files
Generate BinX document from code
Easy to use interfaces
Flexible