Inserts At Drive Speed
![Page 1: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/1.jpg)
Inserts At Drive Speed
Ben Haley, Research Director, NetQoS
![Page 2: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/2.jpg)
Overview
- Introduction
- Our problem
- Why use a storage engine?
- How to implement a read-only storage engine
- Optimization

Goal: Provide a new tool that might help solve your issues.
![Page 3: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/3.jpg)
Who is NetQoS?
- Commercial software vendor
- Network Traffic Analysis
  - Who is on the network?
  - What applications are they using?
  - Where is the traffic going?
  - How is the network running?
  - Can the users get their work done?
- Built on MySQL
![Page 4: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/4.jpg)
Who Are Our Customers?
![Page 5: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/5.jpg)
Problem Domain
- Collected, analyzed and reported on network data
- Each collector received >100k records/second
- Data was stored for the top IP addresses, applications, ToS for each interface
- Data was stored at 15-minute resolution
- Kept data for 6 weeks to 13 months
![Page 6: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/6.jpg)
What Did Customers Want?
- Greater resolution
- New ways to look at data
- More detail
- Use existing hardware
![Page 7: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/7.jpg)
Key Observations
- Information was in optional or temporary files
- Data is unchanging
- Large data volumes (100s of GB/day)
- Data collectors scattered over the enterprise
- Expensive to pull data to a central analysis box
- Most analysis focused on short timeframes
- Small subset of the data was interesting
- Hierarchical data
- Flexible formats
![Page 8: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/8.jpg)
1st Approach: Custom Service
- C++ service to query data
- Create a result set to pull back to reporting console
- Advantages
  - Fast
  - Leveraged existing software
- Issues
  - Not very flexible
  - Only access through the console UI
![Page 9: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/9.jpg)
2nd Approach: Traditional DB
- Insert data into database
- Reporting console queries database
- Advantages
  - Easy
  - Somewhat flexible
  - Access from standard DB tools
- Issues
  - Hard to maintain insert/delete rates
  - Database load operations tax CPU and I/O
  - Not as flexible as desired
![Page 10: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/10.jpg)
3rd Approach: Storage Engine
- Manage data outside the database
- Create storage engine to retrieve data into MySQL
- Advantages
  - Fast
  - Extremely flexible
  - Only pay CPU and I/O overhead in queries
  - Access from standard DB tools
- Issues
  - Learning curve
  - Multiple moving parts
![Page 11: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/11.jpg)
What Does This Look Like?
[Diagram labels: MySQL; storage engines MyISAM, InnoDB, Archive, …, Custom; Data Files; Queries; Data Collection and Management]
![Page 12: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/12.jpg)
Collector Provides
- Collect data
- Create data files
- Age out old data
- Indexing
- Compression

Collector manages data
![Page 13: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/13.jpg)
MySQL Provides
- Remote Connectivity
- SSL Encryption
- SQL Support
  - Queries (select)
  - Aggregations (group by)
  - Sorting (order by)
  - Integration with other data (join operations)
  - Functions
  - UDF Support

MySQL gives us a SQL stack
![Page 14: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/14.jpg)
Storage Engine Provides
- Map data into MySQL
- Provides optimization information on indexes
- Efficient data extraction
- Flatten data structure
- Decompression

Storage engine provides the glue between collector and MySQL
![Page 15: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/15.jpg)
How To
- Great document for storage engines: http://forge.mysql.com/wiki/MySQL_Internals_Custom_Engine#Writing_a_Custom_Storage_Engine
- I am going to concentrate on where our approach diverges from that document
![Page 16: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/16.jpg)
Overview of our approach
- Singleton data storage
- Storage engine maps to the data storage
- Table schema is a view into storage
- Table name for unique view
- Column names map to data elements
- Indices may be real or virtual
![Page 17: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/17.jpg)
Storage Management
- External process creates/removes data
- Storage engine indicates the data
- During query, storage engine locks data range
![Page 18: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/18.jpg)
Simple Create Table Example

    CREATE TABLE `testtable` (
      `Router` int(10) unsigned,
      `Timestamp` int(10) unsigned,
      `Srcaddr` int(10) unsigned,
      `Dstaddr` int(10) unsigned,
      `Inpkts` int(10) unsigned,
      `Inbytes` int(10) unsigned,
      index `routerNDX`(`Router`)
    ) ENGINE=NFA;
![Page 19: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/19.jpg)
Behind the Scenes
- MySQL creates a .frm file (defines table)
- Storage engine validates the DDL
- No data tables are created
- No indices are created
- Table create/delete is almost free
![Page 20: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/20.jpg)
Validation Example: Static Format
- Restricted to specific table names
- Each table name maps to a subset of data
- Fixed set of columns for table name
- Fixed index definitions
![Page 21: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/21.jpg)
Validation Example: Dynamic Format
- Table name can be anything
- Column names must match known definitions
  - Physical Columns
  - Virtual Columns
- Indices may be real or artificial
  - Realized Indices
  - Virtual Indices
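
The dynamic-format check can be sketched in Python as follows. This is only an illustration under stated assumptions, not the NFA engine's code: the column registry, the `REALIZED_INDEX_COLUMNS` set, and the function names `validate_table` and `classify_index` are all hypothetical.

```python
# Hypothetical registries; the slides do not list the engine's real column set.
PHYSICAL_COLUMNS = {"Router", "Timestamp", "Srcaddr", "Dstaddr", "Inpkts", "Inbytes"}
VIRTUAL_COLUMNS = {"ipMaskBits", "ipSubnet"}
# Assumption: the data files are ordered by these columns, so an index
# that leads with one of them can be backed by the file layout.
REALIZED_INDEX_COLUMNS = {"Timestamp"}


def validate_table(columns, indexes):
    """Return a list of problems; an empty list means the DDL is accepted.

    columns: column names from the CREATE TABLE statement.
    indexes: dict mapping index name -> list of indexed column names.
    """
    known = {c.lower() for c in PHYSICAL_COLUMNS | VIRTUAL_COLUMNS}
    declared = {c.lower() for c in columns}
    problems = []
    for col in columns:
        if col.lower() not in known:
            problems.append(f"unknown column: {col}")
    for name, cols in indexes.items():
        for col in cols:
            if col.lower() not in declared:
                problems.append(f"index {name} uses undeclared column: {col}")
    return problems


def classify_index(cols):
    """Label an index 'realized' if its leading column matches the file order."""
    realized = {c.lower() for c in REALIZED_INDEX_COLUMNS}
    return "realized" if cols and cols[0].lower() in realized else "virtual"
```

The table name is never inspected, which matches the "table name can be anything" rule; only column and index definitions are checked.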
![Page 22: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/22.jpg)
Virtual Columns
- Alternate representations of the data, or derived values
- Provides a shortcut instead of using functions
- Examples:
  - Actual columns
    - ipAddress: IP address
    - ipMask: subnet CIDR mask (0-32)
  - Virtual columns
    - ipMaskBits: bit pattern described by ipMask
    - ipSubnet: ipAddress & ipMaskBits
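
The two virtual columns above can be computed from the actual columns as in this minimal Python sketch. It assumes IPv4 addresses stored as unsigned 32-bit integers; the function names are illustrative, not part of the engine.

```python
import ipaddress


def ip_mask_bits(ip_mask):
    """Bit pattern for a CIDR prefix length (0-32) as an unsigned 32-bit integer."""
    if not 0 <= ip_mask <= 32:
        raise ValueError("ipMask must be in 0..32")
    return (0xFFFFFFFF << (32 - ip_mask)) & 0xFFFFFFFF


def ip_subnet(ip_address, ip_mask):
    """ipSubnet = ipAddress & ipMaskBits."""
    return ip_address & ip_mask_bits(ip_mask)


addr = int(ipaddress.IPv4Address("10.1.2.3"))
print(hex(ip_mask_bits(24)))                        # 0xffffff00
print(ipaddress.IPv4Address(ip_subnet(addr, 24)))   # 10.1.2.0
```

Exposing these as columns lets a query filter or group on the subnet directly instead of repeating the bit arithmetic in every statement.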
![Page 23: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/23.jpg)
Optimizing Columns
- Our storage engine supports many columns
- Storage engines have to return the entire row defined in the table
- MySQL uses only columns referenced in select statement
- Table acts as view, so make view narrower
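
A rough Python sketch of why a narrower table helps: if the engine must fill every column the table declares, declaring fewer columns means decoding fewer fields per record. The fixed-width record layout below is an assumption for illustration only (six unsigned 32-bit fields named after the example table), as is the `read_rows` function.

```python
import struct

# Hypothetical on-disk record: six little-endian unsigned 32-bit fields.
FIELDS = ["Router", "Timestamp", "Srcaddr", "Dstaddr", "Inpkts", "Inbytes"]
RECORD = struct.Struct("<6I")
OFFSET = {name: i * 4 for i, name in enumerate(FIELDS)}


def read_rows(buf, table_columns):
    """Yield one tuple per record, decoding only the columns the table declares."""
    for start in range(0, len(buf), RECORD.size):
        yield tuple(
            struct.unpack_from("<I", buf, start + OFFSET[col])[0]
            for col in table_columns
        )


# Two made-up records.
data = (RECORD.pack(1, 1240185600, 167838211, 167838212, 10, 1500)
        + RECORD.pack(2, 1240185660, 167838213, 167838214, 3, 180))

# A table declared with only two columns touches two fields per record.
narrow = list(read_rows(data, ["Router", "Inbytes"]))
print(narrow)  # [(1, 1500), (2, 180)]
```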
![Page 24: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/24.jpg)
Index Optimization Options
- MySQL parser
  - Define indices in schema
  - Provide guidance to MySQL
- Roll your own
  - Limits table interoperability
  - Best left to the experts
![Page 25: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/25.jpg)
Real Index
- Expose the internal file format
- Example:
  - Data organized by timestamp, srcAddr
  - Query: select router, count(*) where srcAddr=inet_aton('10.1.2.3') and timestamp > '2009-04-20';
  - Add index timestamp
  - Storage engine walks data by timestamp
![Page 26: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/26.jpg)
Real Index #2
- Example:
  - Data organized by timestamp, srcAddr
  - Query: select router, count(*) where srcAddr=inet_aton('10.1.2.3') and timestamp > '2009-04-20';
  - Add index timestamp, srcAddr
  - Storage engine still walks data by timestamp
- Database is unable to leverage the full index!
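
One way to see why the second key part goes unused: with a range condition on the leading column, the rows that match the srcAddr equality are scattered through the range rather than contiguous. The Python sketch below uses made-up rows sorted by (timestamp, srcAddr); the `scan` function is illustrative only.

```python
from bisect import bisect_right

# Rows sorted the way the data is organized: (timestamp, srcAddr). Values are made up.
rows = sorted([
    (100, 7), (100, 9), (200, 7), (200, 8), (300, 7), (300, 9), (400, 8),
])


def scan(rows, min_ts, src):
    """Seek past min_ts using the sort order, then filter srcAddr row by row."""
    start = bisect_right(rows, (min_ts, float("inf")))
    examined = rows[start:]
    matches = [r for r in examined if r[1] == src]
    return examined, matches


examined, matches = scan(rows, 100, 7)
print(len(examined))  # 5 rows examined after the timestamp seek
print(matches)        # [(200, 7), (300, 7)]
```

The timestamp seek narrows the scan to one contiguous block, but the two matching rows sit at positions 0 and 2 inside it, so every row in the block still has to be examined.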
![Page 27: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/27.jpg)
Virtual Index
- Not completely supported, but good enough
- Example:
  - Data organized by timestamp, srcAddr
  - Query: select router, count(*) where srcAddr=inet_aton('10.1.2.3') and timestamp > '2009-04-20';
  - Add index srcAddr, timestamp
  - Storage engine still walks data by timestamp, but filters on srcAddr
  - Would fail on range scan of srcAddr
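
A minimal Python sketch of the virtual-index idea, under assumptions: the index on (srcAddr, timestamp) does not exist on disk, so the engine serves an equality lookup by walking the time-ordered rows and filtering. The `virtual_index_read` function and its row layout are hypothetical. Refusing the srcAddr range case reflects one plausible reading of the slide: rows come back in timestamp order, which only coincides with index order when srcAddr is fixed to a single value.

```python
def virtual_index_read(rows, src_equals=None, src_range=None, min_ts=None):
    """Serve a lookup on a virtual (srcAddr, timestamp) index from time-ordered rows.

    rows: iterable of (timestamp, srcAddr, router) tuples in timestamp order.
    """
    if src_range is not None:
        # Rows cannot be produced in srcAddr order from a time-ordered walk.
        raise NotImplementedError("range scan on srcAddr is not supported")
    for ts, src, router in rows:
        if min_ts is not None and ts <= min_ts:
            continue
        if src_equals is not None and src != src_equals:
            continue
        yield ts, src, router


rows = [(100, 7, 1), (200, 8, 1), (300, 7, 2)]
print(list(virtual_index_read(rows, src_equals=7, min_ts=100)))  # [(300, 7, 2)]
```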
![Page 28: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/28.jpg)
Virtual Index #2
- Example:
  - Data organized by timestamp, srcAddr
  - Query: select router, count(*) where srcAddr=inet_aton('10.1.2.3') and timestamp > '2009-04-20';
  - Add index timestamp
  - Add index srcAddr
  - Storage engine still walks data by timestamp, but filters on srcAddr
![Page 29: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/29.jpg)
Index Optimization
- Leverage storage format
- Add virtual index support where helpful
- Don’t overanalyze
  - Be accurate if fast
  - Estimates are fine
  - Heuristics are often great
  - Be careful about mixing approaches
![Page 30: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/30.jpg)
Index Heuristics Example
- Start with large estimate for number of rows returned
- Adjust estimate based on expected value of column constraints
  - Time: great, because files are organized by time
  - srcAddr
    - Good for equality
    - Terrible for range
  - Bytes: poor
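
The heuristic above might look like the following Python sketch. The starting estimate and the divisors are invented for illustration; the slides give only the qualitative ranking (time is great, srcAddr is good for equality and terrible for range, bytes is poor). In MySQL's handler interface, an estimate of this kind is what a storage engine hands back to the optimizer through `records_in_range()`.

```python
# Hypothetical numbers; only their relative order reflects the slide.
LARGE_ESTIMATE = 1_000_000_000
DIVISORS = {
    ("timestamp", "eq"): 1000,     # great: files are organized by time
    ("timestamp", "range"): 1000,
    ("srcAddr", "eq"): 100,        # good for equality
    ("srcAddr", "range"): 1,       # terrible for range: no reduction
    ("bytes", "eq"): 2,            # poor
    ("bytes", "range"): 2,
}


def estimate_rows(constraints):
    """Start from a large row estimate and shrink it per column constraint.

    constraints: list of (column, kind) pairs, kind being "eq" or "range".
    """
    estimate = LARGE_ESTIMATE
    for column, kind in constraints:
        estimate //= DIVISORS.get((column, kind), 1)
    return max(1, estimate)


print(estimate_rows([("timestamp", "range"), ("srcAddr", "eq")]))  # 10000
print(estimate_rows([("srcAddr", "range")]))                       # 1000000000
```

Integer division keeps the estimate cheap and deterministic, in line with the advice that estimates are fine and accuracy is only worth it when it is fast.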
![Page 31: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/31.jpg)
Typical Query Pattern
- Create temp table
  - Specify only necessary columns
  - Specify optimal indices for where clause and engine
- Select …
- Drop table
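
The create/select/drop pattern could be driven from a reporting tool roughly as in this Python sketch, which only builds the SQL text. The helper name `query_statements`, the table name `q1`, the index naming scheme, and the assumption that every column is `int(10) unsigned` (as in the earlier example table) are all illustrative; `ENGINE=NFA` is taken from that example.

```python
def query_statements(table, columns, index_columns, select_sql, engine="NFA"):
    """Build the three statements of the pattern: create, select, drop."""
    col_defs = [f"`{c}` int(10) unsigned" for c in columns]
    if index_columns:
        keys = ", ".join(f"`{c}`" for c in index_columns)
        col_defs.append(f"index `{table}NDX`({keys})")
    create = f"CREATE TABLE `{table}` ({', '.join(col_defs)}) ENGINE={engine};"
    return [create, select_sql, f"DROP TABLE `{table}`;"]


stmts = query_statements(
    "q1",
    ["Router", "Timestamp", "Srcaddr"],   # only the columns the query needs
    ["Timestamp"],                        # index that matches the file order
    "SELECT Router, COUNT(*) FROM `q1` "
    "WHERE Srcaddr = INET_ATON('10.1.2.3') "
    "AND Timestamp > UNIX_TIMESTAMP('2009-04-20') GROUP BY Router;",
)
for s in stmts:
    print(s)
```

Because table creation writes only the definition and no data or indices, issuing a fresh table per query is cheap.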
![Page 32: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/32.jpg)
KISS
- Only support what you have to:
  - Do you need multiple datasets?
  - Do you need flexible table definitions?
  - Do you need insert/delete/alter support?
  - How will data be accessed?
- It’s OK to limit functionality to just solving your problem
![Page 33: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/33.jpg)
Conclusion
- Why we used a storage engine
- Storage engine pattern
- Optimization
  - Columns
  - Indices
  - Virtual indices
![Page 34: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/34.jpg)
Application
- Log analysis
- Transaction records
- ETL alternative
- Custom database
![Page 35: Inserts At Drive Speed](https://reader035.fdocuments.in/reader035/viewer/2022081604/56816694550346895dda75a1/html5/thumbnails/35.jpg)
Questions