Guy Hadash with Metadata Search Boosting the Power of Swift … · 2019. 2. 26. · E.g.,...
Boosting the Power of Swift with Metadata Search
Presenters: Dean Hildebrand, Eran Rom, Nilesh Bhosale
Joint work with Paula Ta-Shma, Guy Hadash
Agenda
▪ What is Object Metadata?
▪ What is Metadata Search?
▪ Use Cases
▪ Demo
▪ Implementation Details
▪ Future Work
What is Metadata?
▪ User-defined metadata
▪ Unique feature of object storage compared to other storage systems
▪ Swift and S3 metadata are compatible through Swift3 middleware
▪ Metadata is the structured data about the unstructured object
▪ Who, what, when, where, and why of account, container, object
▪ Perfect for indexing and searching
Metadata Examples
▪ Biomedical: Age, Biomarkers, Developmental Stage, Cell Surface Markers, Cell Type/Cell Line, Disease State, Extract Molecule, Genetic Characteristics, Immunoprecipitation Antibody, Organism, Platform, Sex, Strain, Time Point, Tissue Type, Treatment Compound
▪ Astronomy & Astrophysics
▪ Geospatial
▪ Image
▪ Music
What Swift Metadata Exists and How do I use it?
▪ User metadata can be added to or removed from accounts, containers, and objects
▪ E.g., X-Container-Meta-{name}, X-Remove-Container-Meta-{name}
▪ System metadata also exists; some can even be set by the user
▪ E.g., Content-Type, Last-Modified
▪ PUT and POST metadata semantics
▪ Account/Container: new user metadata is added to the existing list of metadata
▪ Object: new user metadata overwrites all existing user metadata
▪ COPY retains existing metadata unless new metadata is specified
▪ HEAD returns metadata only
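The difference between container and object update semantics can be sketched in a few lines of Python. This is an in-memory illustration only, not Swift code: the function names `apply_container_post` and `apply_object_post` are hypothetical, and the dictionaries stand in for the metadata Swift would store.

```python
def apply_container_post(existing, headers):
    """Container/account semantics: new user metadata is merged into the existing set."""
    updated = dict(existing)
    for name, value in headers.items():
        lower = name.lower()
        if lower.startswith("x-remove-container-meta-"):
            # X-Remove-Container-Meta-{name} deletes a single item.
            updated.pop(lower[len("x-remove-container-meta-"):], None)
        elif lower.startswith("x-container-meta-"):
            # X-Container-Meta-{name} adds or updates a single item.
            updated[lower[len("x-container-meta-"):]] = value
    return updated


def apply_object_post(existing, headers):
    """Object semantics: the new user metadata replaces all existing user metadata."""
    prefix = "x-object-meta-"
    return {
        name.lower()[len(prefix):]: value
        for name, value in headers.items()
        if name.lower().startswith(prefix)
    }


container = {"city": "Rome"}
container = apply_container_post(container, {"X-Container-Meta-Owner": "alice"})
merged = dict(container)            # {'city': 'Rome', 'owner': 'alice'}
container = apply_container_post(container, {"X-Remove-Container-Meta-City": "x"})

obj = {"city": "Rome", "time": "Day"}
obj = apply_object_post(obj, {"X-Object-Meta-Time": "Night"})
print(merged, container, obj)
```

Note how the object update drops `city` because it was not resent, while the container update keeps everything it was not asked to remove.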
What is Metadata Search?
▪ Automatically index and catalog Swift user and system metadata
▪ Provide REST-API for searching for objects based on their metadata
▪ Currently available in IBM SoftLayer Swift object storage service
Why is Metadata Search Valuable?
▪ Imagine Internet without Google
▪ Swiftly find needles in the OpenStack
▪ Help users and administrators perform Data Analytics
▪ Metadata can be on highest tier (SSD) while data resides on lower tier (Disk/Tape)
General Use Cases
▪ Data Mining
▪ Data Warehousing
▪ Selective data retrieval, data backup, data archival, data migration
▪ Management/Reporting
Sample Use Cases: Advanced Photo Album
photo1.jpg: City: Rome, Time: Day
photo2.jpg: City: Rome, Time: Night
photo3.jpg: City: Haifa, Time: Day
photo4.jpg: City: Tokyo, Time: Night
GET /MyPhotoSpace?query=city='Rome' AND time='Day'
GET /MyPhotoSpace?query=time='Night'
* Schematic, not complete syntax
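To make the two schematic photo-album queries concrete, here is a minimal in-memory sketch of equality matching over the four photos' metadata. The `search` helper is hypothetical and only mimics the behaviour of the queries; it is not the service's implementation.

```python
# Metadata of the four photos in the MyPhotoSpace container.
PHOTOS = {
    "photo1.jpg": {"city": "Rome", "time": "Day"},
    "photo2.jpg": {"city": "Rome", "time": "Night"},
    "photo3.jpg": {"city": "Haifa", "time": "Day"},
    "photo4.jpg": {"city": "Tokyo", "time": "Night"},
}


def search(index, **criteria):
    """Return the names of objects whose metadata equals every given criterion (AND)."""
    return sorted(
        name
        for name, meta in index.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )


rome_by_day = search(PHOTOS, city="Rome", time="Day")
by_night = search(PHOTOS, time="Night")
print(rome_by_day)  # ['photo1.jpg']
print(by_night)     # ['photo2.jpg', 'photo4.jpg']
```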
Media use case - Complex Searches
Search Query
GET /MyPhotoSpace?query=tags ~ 'John' OR tags ~ 'Bob' OR tags ~ 'Alice' AND date > 2/12/2012 AND date < 3/12/2013 AND num_views > 10000
What did we search for?
▪ Date range search
▪ Free Text matching
▪ Integer comparison
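The three kinds of criteria in this query can be sketched as a single Python predicate. Several things here are assumptions rather than facts from the slides: the three tag matches are grouped together before the AND clauses, the dates are read as month/day/year, the `~` operator is approximated by a substring test, and the sample objects are invented for illustration.

```python
from datetime import date


def matches(meta):
    """Evaluate the media query against one object's metadata (assumed grouping)."""
    # Free text matching: tags ~ 'John' OR tags ~ 'Bob' OR tags ~ 'Alice'
    tagged = any(name in meta["tags"] for name in ("John", "Bob", "Alice"))
    # Date range search: date > 2/12/2012 AND date < 3/12/2013 (read as month/day/year)
    in_range = date(2012, 2, 12) < meta["date"] < date(2013, 3, 12)
    # Integer comparison: num_views > 10000
    popular = meta["num_views"] > 10000
    return tagged and in_range and popular


# Hypothetical objects, not taken from the presentation.
videos = {
    "clip1.mxf": {"tags": "John at the beach", "date": date(2012, 6, 1), "num_views": 25000},
    "clip2.mxf": {"tags": "Alice birthday", "date": date(2011, 1, 1), "num_views": 50000},
    "clip3.mxf": {"tags": "Bob concert", "date": date(2012, 9, 9), "num_views": 120},
}

hits = [name for name, meta in videos.items() if matches(meta)]
print(hits)  # ['clip1.mxf']
```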
Metadata Enrichment
[Diagram: myvideo.mxf (data plus metadata) is uploaded to the Swift object store, where a Storlet generates enriched metadata that is stored alongside the object's data.]
Metadata Search with Enriched Metadata – Developed with RAI Italy
Finding objects by their metadata values
[Diagram: a client asks Swift to get objects whose loudness is faulty; the Metadata Search Facility of the object store finds the faulty objects, e.g. myvideo.mxf.]
Analytics Use Case
Analyze IoT data efficiently and cost effectively
– Treat Swift as a long term store for semi-structured IoT data
– Store in Parquet format
– Queryable via Apache Spark SQL
– Optimized predicate pushdown
- Implemented a custom Spark SQL external data source driver
- Uses metadata indexes
- Searches for Swift objects whose min/max values overlap requested ranges
Get all data for morning traffic:
SELECT codigo, intensidad, velocidad FROM madridtraffic WHERE tf >= '08:00:00' AND tf <= '12:00:00'
Brute force method: 13245 Swift requests
Optimized predicate pushdown: 616 Swift requests
21.5 times improvement
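The pruning idea behind the predicate pushdown can be sketched as an interval-overlap test on per-object min/max metadata. This is a simplified stand-in, not the Spark SQL driver itself: the object names and their min/max values for `tf` are hypothetical, and the index is a plain dictionary rather than the metadata search service.

```python
def overlaps(obj_min, obj_max, lo, hi):
    """True if the object's [obj_min, obj_max] range intersects the requested [lo, hi]."""
    return obj_min <= hi and obj_max >= lo


def select_objects(index, lo, hi):
    """Keep only objects that can contain rows in the requested range."""
    return [name for name, (mn, mx) in index.items() if overlaps(mn, mx, lo, hi)]


# Hypothetical min/max of the tf column per object (zero-padded HH:MM:SS compares as text).
index = {
    "part-0000.parquet": ("00:00:00", "07:59:59"),
    "part-0001.parquet": ("07:30:00", "09:15:00"),
    "part-0002.parquet": ("09:15:01", "13:00:00"),
    "part-0003.parquet": ("13:00:01", "23:59:59"),
}

selected = select_objects(index, "08:00:00", "12:00:00")
print(selected)  # ['part-0001.parquet', 'part-0002.parquet']

# Request counts reported on the slide: 13245 brute force vs 616 with pushdown.
improvement = round(13245 / 616, 1)
print(improvement)  # 21.5
```

Only the selected objects need to be fetched and scanned; the others cannot contain matching rows, which is where the reduction in Swift requests comes from.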
IoT Analytics Use Case Example Metadata
IoT Use Case - EMT Madrid Bus Service
▪ Search capability allows understanding traffic at a given time slot, helps plan better for future events
▪ Historical data about bus trips, generated by IoT devices mounted on the EMT buses
▪ Data ingested into Object Store, along with relevant metadata
Data Collected from EMT Buses
Bus Data Continuously Uploaded to Object Store
Kafka + Secor: groups into objects, uploads at regular intervals
Storlets generate metadata:
1. Storlet converts GPS coordinates from UTM to lat, long
2. Storlet calculates GPS bounding box and stores as metadata
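The bounding-box step can be sketched as follows, assuming the coordinates have already been converted to (lat, long) pairs. The header names come from the query example in this deck, but the "lat,long" value format and the helper names are assumptions, and the trace points are made up for illustration.

```python
def bounding_box(points):
    """Return (top_left, bottom_right) corners of a list of (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    top_left = (max(lats), min(lons))       # northernmost latitude, westernmost longitude
    bottom_right = (min(lats), max(lons))   # southernmost latitude, easternmost longitude
    return top_left, bottom_right


def bbox_headers(points):
    """Format the bounding box as object user metadata (value format is assumed)."""
    (tl_lat, tl_lon), (br_lat, br_lon) = bounding_box(points)
    return {
        "X-Object-Meta-Top-Left-G": f"{tl_lat},{tl_lon}",
        "X-Object-Meta-Bottom-Right-G": f"{br_lat},{br_lon}",
    }


# Hypothetical GPS trace of one bus, already in (lat, lon).
trace = [(40.41, -3.70), (40.45, -3.68), (40.43, -3.72)]
headers = bbox_headers(trace)
print(headers)
```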
Demo
Behind the Scenes of Metadata Search
▪ Metadata search involves two flows:
▪ Indexing objects’ metadata
▪ Serving search queries
Indexing Objects’ Metadata
[Diagram, built up over four slides]
Generic components: storage system input data path, indexer, queue, index/search
Swift components: Swift proxy pipeline with indexer middleware, Swift storage tier, Rabbit (queue), Elastic Search (index/search)
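The indexing flow can be sketched as a small WSGI middleware that sits in front of the storage application and publishes each successfully written object's user metadata to a queue. This is an illustration under stated assumptions, not the actual indexer: Python's `queue.Queue` stands in for RabbitMQ, nothing is sent to Elasticsearch, the class name `IndexerMiddleware` and the account `AUTH_test` are hypothetical, and the wrapped app is assumed to call `start_response` before returning.

```python
import json
import queue

META_PREFIX = "HTTP_X_OBJECT_META_"  # WSGI form of X-Object-Meta-{name}


class IndexerMiddleware:
    """Pass requests through and enqueue user metadata of successful writes."""

    def __init__(self, app, out_queue):
        self.app = app
        self.out_queue = out_queue

    def __call__(self, environ, start_response):
        captured = {}

        def capturing_start_response(status, headers, exc_info=None):
            captured["status"] = status
            return start_response(status, headers, exc_info)

        body = self.app(environ, capturing_start_response)
        is_write = environ.get("REQUEST_METHOD") in ("PUT", "POST")
        if is_write and captured.get("status", "").startswith("2"):
            metadata = {
                key[len(META_PREFIX):].lower(): value
                for key, value in environ.items()
                if key.startswith(META_PREFIX)
            }
            message = {"path": environ.get("PATH_INFO", ""), "metadata": metadata}
            self.out_queue.put(json.dumps(message, sort_keys=True))
        return body


def fake_swift_app(environ, start_response):
    """Stand-in for the rest of the proxy pipeline; always reports success."""
    start_response("201 Created", [("Content-Length", "0")])
    return [b""]


messages = queue.Queue()
app = IndexerMiddleware(fake_swift_app, messages)
environ = {
    "REQUEST_METHOD": "PUT",
    "PATH_INFO": "/v1/AUTH_test/MyPhotoSpace/photo1.jpg",
    "HTTP_X_OBJECT_META_CITY": "Rome",
    "HTTP_X_OBJECT_META_TIME": "Day",
}
app(environ, lambda status, headers, exc_info=None: None)
event = json.loads(messages.get_nowait())
print(event)
```

Decoupling the data path from the index through a queue means a slow or unavailable index does not block uploads; a separate consumer would read these messages and update the search index.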
Serving Search Requests
[Diagram: Swift proxy pipeline with MD Search middleware, which queries Elastic Search]
Overall Architecture
[Diagram: HTTP Swift requests pass through a load balancer to the Swift proxy nodes of the Swift object store. Each proxy node runs the proxy service with the indexer and search middleware, plus Rabbit. Also shown: the Swift storage nodes and the ElasticSearch cluster.]
Query API
Example:
GET http://iotserver.example.com/v1/AUTH_...2357c/busData?
query=X-Object-Meta-Top-Left-G in [40.7,22.5],[39.9,22.1] AND
X-Object-Meta-Bottom-Right-G in [40.7,22.5],[39.9,22.1]
X-Context: search
▪ Query features:
1. Multiple criteria possible
2. Supports various operators
• =, !=, <, <=, in, ~, ...
3. Supports metadata data types
• strings, integers, floats, dates, geo-points, free text
• Allows comparisons and range searches
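A client could assemble such a search request along the following lines. This sketch only builds the request and does not send it. The helper name `build_search_request` and the account name `AUTH_example` are hypothetical (the real account ID is elided in the example), and percent-encoding the query string is an assumption about what the service accepts.

```python
from urllib.parse import quote, unquote
from urllib.request import Request


def build_search_request(endpoint, account, container, query):
    """Build a GET request that asks the container for objects matching `query`."""
    url = f"{endpoint}/v1/{account}/{container}?query={quote(query)}"
    # The X-Context header marks the request as a metadata search.
    return Request(url, headers={"X-Context": "search"}, method="GET")


query = (
    "X-Object-Meta-Top-Left-G in [40.7,22.5],[39.9,22.1] AND "
    "X-Object-Meta-Bottom-Right-G in [40.7,22.5],[39.9,22.1]"
)
req = build_search_request("http://iotserver.example.com", "AUTH_example", "busData", query)
print(req.get_method(), req.full_url)
```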
Where Do We Go From Here?
▪ Extend to support file-based (NFS/SMB) attributes
▪ Standardize search API
▪ Standardize back-end APIs to allow support for any queuing and/or database systems
▪ Work on visualizing information through Kibana, etc.
▪ Collaborate with OpenStack community efforts
▪ Swift Event Notification Mechanism
▪ OpenStack Searchlight
■ Also built on Elastic Search and RabbitMQ
■ Work to standardize search API
Will be Available with IBM Spectrum Scale - 4Q15
[Diagram: Spectrum Scale object store. Search and Swift requests pass through a load balancer to the proxy services, which run the middleware together with ES and RMQ. Also shown: object services on Spectrum Scale, the Keystone authentication service, Swift services and additional services in the cluster, and the metadata index DB.]
[Diagram: Middleware, ES, RMQ]
1. Pre-installed and configured virtual appliance
2. Roll-your-own solution
○ White paper to be released describing how to set up and configure
○ Will include a source tarball
○ Fine tune as per your requirements