Having Your Cake and Eating It Too

With Apache OODT and Apache Solr, by Andrew F. Hart and Paul M. Ramirez


Page 1: Having Your Cake and Eating It Too

Having Your Cake and Eating It Too

With Apache OODT and Apache Solr

Andrew F. Hart

Paul M. Ramirez

Page 2: About Myself
• Software Engineer
  – NASA Jet Propulsion Laboratory
  – “Data Management”
• Committer: OODT, SIS, Gora, Streams (Incubating)
• Mentor: Streams (Incubating)

Page 3: What We’ll Cover
• Overview of the OODT & Solr projects
• Strategies for combining OODT and Solr
• Detailed deployment/configuration example
• Where to learn more & participate

Page 4: Apache OODT
• Object Oriented Data Technology
• Origin in NASA mission data systems
• Components for:
  – Information integration
  – Data cataloging and archiving
  – Configurable workflow processing

Page 5: Apache OODT
• OODT @ Apache
  – Incubation: 2010, Graduation: 2011
  – 29 committers
  – Latest release: 0.5 (Dec. 26, 2012)

Page 6: Apache OODT
• Karoo Array Telescope (KAT-7)

Page 7: Apache OODT
• Virtual Pediatric Intensive Care Unit

Page 8: Apache OODT
• Regional Climate Model Evaluation System

Page 9: Apache OODT
• Commonalities between systems:
  – Lots of data
  – Defined processing steps / algorithms
• Archives are important (… so search is important too)

Page 10: Apache OODT
• Strengths of OODT for the above use cases:
  – Loosely coupled components
  – Standard protocols, well-defined interfaces
  – Highly configurable
  – Vetted, reliable code

Page 11: Apache Solr
• Search + web services
  – Powerful features
  – Flexible formats
  – Highly configurable

Page 12: Apache Solr
• The White House

Page 13: Apache Solr
• Netflix

Page 14: Apache Solr
• NASA Planetary Data System

Page 15: OODT & Solr
• Why use these projects together?
• Archives often need search capability
• Similarities / compatibilities:
  – XML-based configuration
  – Environment (Java, Tomcat)

Page 16: Example Integration
• “Standard” data archive pipeline

Page 17: Example Integration
• “Standard” data archive pipeline + search

Page 18: OODT Products
• Typically one-to-one with files
• Each uniquely identifiable (GUID)
• Support for a higher-level “ProductType”
  – A way to define collections

Page 19: OODT Metadata
• Annotations for products
• Key: {Val | Multival}
• Common across all OODT components
• Two general classes:
  – System
  – User
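A minimal shell sketch of the Key:{Val|Multival} idea follows. It assumes a made-up flat-file layout (one Key=Value pair per line, with a repeated key giving that key several values) that is not OODT's own serialization, and met_get is a hypothetical helper name.

```shell
# Hypothetical layout for illustration only: one "Key=Value" pair per line;
# repeating a key gives it several values (Key:{Val|Multival}).

# met_get KEY: read metadata on stdin and print every value of KEY, one per line.
met_get() {
  awk -v k="$1" '{
    i = index($0, "=")                       # split on the first "=" only
    if (i > 0 && substr($0, 1, i - 1) == k) print substr($0, i + 1)
  }'
}

# Example: "Keyword" is multivalued, "Filename" is single-valued.
met_get Keyword <<'EOF'
ProductType=urn:some:ProductType
Filename=obs_20121226.dat
Keyword=telescope
Keyword=calibration
EOF
```

Run as shown, it prints telescope and then calibration, the two values of the multivalued Keyword key. Splitting on the first "=" only means values may themselves contain "=".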

Page 20: OODT Metadata
• System metadata
  – Added automatically by OODT components
  – Used to track state
  – Used to encode relationships between data

Page 21: OODT Metadata
• User metadata
  – Specified as “policy”
  – Can be product-level or productType-level
  – Used to extract and persist information from files as they are ingested (i.e., as they become products)
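To illustrate user metadata being pulled out of a file at ingest time, here is a hypothetical extractor written as a shell function. The extract_met name, the output keys, and the obs_YYYYMMDD naming policy are all assumptions for illustration; OODT itself handles this through its configured metadata extraction and policy, not through a script like this.

```shell
# extract_met FILE: hypothetical extractor emitting "Key=Value" metadata lines.
# The obs_YYYYMMDD.* naming policy below is an assumption for illustration.
extract_met() {
  local file name size date
  file=$1
  name=$(basename "$file")
  size=$(wc -c < "$file" | tr -d ' ')      # byte count; tr strips BSD padding
  printf 'Filename=%s\n' "$name"
  printf 'FileSize=%s\n' "$size"
  # Policy: pull an observation date out of names like obs_20121226.dat.
  date=$(printf '%s\n' "$name" | sed -n 's/^obs_\([0-9]\{8\}\)\..*$/\1/p')
  if [ -n "$date" ]; then
    printf 'ObservationDate=%s\n' "$date"
  fi
}
```

For a 5-byte file named obs_20121226.dat it prints Filename=obs_20121226.dat, FileSize=5 and ObservationDate=20121226, one per line; files that do not follow the naming policy simply get no ObservationDate.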

Page 22: OODT Metadata
• Metadata (policy) example (external)

Page 23: Solr Schema
• XML document
• Defines what will be indexed (“fields”)
• Provides high-level context hints
  – Data type, behavior, pre-processing
• Extremely flexible, extensible

Page 24: Solr Schema
• Solr schema example (external)

Page 25: Making the Connection
• SolrIndexer tool
  – Part of the File Manager component tools
  – Maps OODT metadata to Solr fields
  – Creates Solr documents from OODT products
  – Note: only product metadata is indexed, not file contents
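The SolrIndexer performs this metadata-to-document mapping itself. To show what the result looks like on the Solr side, here is a minimal shell sketch that turns one product's metadata (assumed to arrive as Key=Value lines) into a Solr XML update message. It assumes the schema's unique key field is named id; to_solr_xml is a hypothetical helper, not part of OODT.

```shell
# to_solr_xml PRODUCT_ID: read "Key=Value" metadata lines on stdin and print a
# Solr XML <add> message with one <doc>. Repeated keys become repeated <field>
# elements (the schema must declare such fields multiValued="true").
# Assumes the schema's uniqueKey field is named "id".
to_solr_xml() {
  awk -v id="$1" '
    function esc(s) {                        # escape XML special characters
      gsub(/&/, "\\&amp;", s); gsub(/</, "\\&lt;", s)
      gsub(/>/, "\\&gt;", s);  gsub(/"/, "\\&quot;", s)
      return s
    }
    BEGIN { print "<add><doc>"; printf "  <field name=\"id\">%s</field>\n", esc(id) }
    {
      i = index($0, "=")
      if (i > 0)
        printf "  <field name=\"%s\">%s</field>\n", esc(substr($0, 1, i - 1)), esc(substr($0, i + 1))
    }
    END { print "</doc></add>" }'
}
```

The message could then be posted to Solr's XML update handler, e.g. printf 'Keyword=telescope\n' | to_solr_xml 19bcb4b8-7999-11e1-b581-8b771498975d | curl 'http://localhost:8080/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary @- . Sending every metadata key straight through only works if the schema defines matching (or dynamic) fields, which is why the SolrIndexer maps OODT keys to Solr fields rather than copying them blindly.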

Page 26: SolrIndexer Tool
• org.apache.oodt.cas.filemgr.tools
• Available since the 0.4 release
• Recommend using 0.5+, which added some stability improvements
• Several modes of operation

Page 27: SolrIndexer Tool

Page 28: SolrIndexer Tool
• Invocation example: index all products from the specified File Manager instance

java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
  -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
  org.apache.oodt.cas.filemgr.tools.SolrIndexer \
  --all \
  --fmUrl http://localhost:9000 \
  --solrUrl http://localhost:8080/solr

Page 29: SolrIndexer Tool
• Invocation example: index all products of the specified ProductType(s)

java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
  -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
  org.apache.oodt.cas.filemgr.tools.SolrIndexer \
  --types urn:some:ProductType \
  --fmUrl http://localhost:9000 \
  --solrUrl http://localhost:8080/solr

Page 30: SolrIndexer Tool
• Invocation example: index a single product by its unique product ID (the bracketed --delete flag is optional)

java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
  -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
  org.apache.oodt.cas.filemgr.tools.SolrIndexer \
  --product 19bcb4b8-7999-11e1-b581-8b771498975d \
  [--delete] \
  --fmUrl http://localhost:9000 \
  --solrUrl http://localhost:8080/solr

Page 31: SolrIndexer Tool
• Invocation example: force optimization of the Solr index

java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
  -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
  org.apache.oodt.cas.filemgr.tools.SolrIndexer \
  --optimize \
  --solrUrl http://localhost:8080/solr
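Because every invocation above repeats the same JVM options, a small shell function can hold them once and pass only the mode flags through. This is a convenience sketch; solr_indexer, INDEXER_CONFIG and FILEMGR_LIB are hypothetical names, and the paths are the placeholders from the slides.

```shell
# Hypothetical defaults; override these for your deployment.
INDEXER_CONFIG=${INDEXER_CONFIG:-/path/to/indexer.properties}
FILEMGR_LIB=${FILEMGR_LIB:-/path/to/cas/filemgr/lib/}

# solr_indexer ARGS...: run the SolrIndexer with the shared JVM options,
# passing the mode flags (--all, --types, --product, --optimize, ...) through.
solr_indexer() {
  java -DSOLR_INDEXER_CONFIG="$INDEXER_CONFIG" \
    -Djava.ext.dirs="$FILEMGR_LIB" \
    org.apache.oodt.cas.filemgr.tools.SolrIndexer \
    "$@"
}
```

For example, solr_indexer --optimize --solrUrl http://localhost:8080/solr reproduces the Page 31 command. The -Djava.ext.dirs option is taken from the slides; the extension mechanism was removed in Java 9, so on newer JVMs the lib directory would need to go on the classpath instead (for example -cp '/path/to/cas/filemgr/lib/*').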

Page 32: indexer.properties
• Configuration file for the SolrIndexer
• Specifies the mapping between OODT product metadata and Solr fields
• Additional “pre-processing” features
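The mapping idea can be sketched in shell. The mapping-file format here (one OODTKey=solr_field pair per line, with # comments) is a hypothetical simplification, not the real indexer.properties syntax, which is documented with the SolrIndexer; apply_field_map is likewise a hypothetical name.

```shell
# apply_field_map MAPFILE: read "Key=Value" metadata on stdin and print it with
# each OODT key renamed to its Solr field, per MAPFILE lines "OODTKey=solr_field".
# Unmapped keys are dropped. (Hypothetical format, not indexer.properties.)
apply_field_map() {
  awk '
    NR == FNR {                                  # first input: the mapping file
      if ($0 !~ /^#/ && (i = index($0, "=")) > 0)
        map[substr($0, 1, i - 1)] = substr($0, i + 1)
      next
    }
    (i = index($0, "=")) > 0 {                   # second input: metadata lines
      k = substr($0, 1, i - 1)
      if (k in map) print map[k] "=" substr($0, i + 1)
    }' "$1" -
}
```

With a mapping file containing ProductType=product_type, the metadata line ProductType=urn:some:ProductType comes out as product_type=urn:some:ProductType, while keys with no entry (Filename, say) are dropped. Dropping unmapped keys is one possible policy; passing them through unchanged would be the other obvious choice.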

Page 33: indexer.properties
• Example indexer.properties file (external)

Page 34: Use Case I
• Building a searchable data archive
• “Long-term” / “lights-out” archive
• Products & metadata immutable
• Many NASA mission data systems use this model
• Want to make it easily searchable

Page 35: Use Case I
• “Standard” data archive pipeline + search

Page 36: Use Case II
• Building an interactively editable, searchable data archive
• Data and metadata mutable
• Want to dynamically select product(s) to edit based on metadata
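Selecting products by metadata maps naturally onto a Solr query. The sketch below calls Solr's standard select handler through curl; it assumes the product ID is indexed in a field named id, and solr_find_ids is a hypothetical helper name. It returns at most the first 100 matches.

```shell
# solr_find_ids SOLR_URL FIELD VALUE: ask Solr for the ids of products whose
# FIELD matches VALUE, returning JSON. Assumes the product id is indexed in a
# field named "id"; VALUE must not itself contain double quotes.
solr_find_ids() {
  curl -sG "$1/select" \
    --data-urlencode "q=$2:\"$3\"" \
    --data-urlencode "fl=id" \
    --data-urlencode "wt=json" \
    --data-urlencode "rows=100"          # first 100 matches only
}
```

For example, solr_find_ids http://localhost:8080/solr Keyword calibration would return JSON listing the id of every product tagged calibration, which an editing UI could then offer to the user. The -G flag makes curl append the URL-encoded parameters to the query string of a GET request.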

Page 37: Use Case II
• Interactively editable data archive pipeline + search

Page 38: Use Case II
• Interactively editable data archive pipeline + search
• Solr catalog out of sync!

Page 39: Synchronization
• Two ways (at least) to solve this:
  A. Modify the OODT Curator services
  B. Treat the OODT Curator services as a “black box” and write a “wrapper” service that invokes the Curator services AND updates Solr (via a scripted call to SolrIndexer, for example)

Page 40: Modify Curator Services
• Services implemented in JAX-RS
• /curator/src/main/java/org/apache/oodt/cas/curation/service
• [curator_url]/services/metadata/update
• Options:
  – Utilize the Solr Java API
  – Wrap a call to the OODT SolrIndexer tool

Page 41: Use Case II-A
• Modified Curator services to simultaneously update Solr

Page 42: Example
• Interactive event tagging

Page 43: Wrap Curator Services
• Curator service/API is a “black box”
• Develop a custom service that:
  – Issues a POST request to the Curator service
  – Updates the Solr index via, e.g.:
    • the Solr Java API
    • a wrapped call to the OODT SolrIndexer tool
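A minimal shell sketch of such a wrapper follows. It assumes the Curator's [curator_url]/services/metadata/update endpoint accepts a form-encoded POST that identifies the product with an id field; that parameter name is hypothetical, so check the Curator service for your OODT version. The re-index step reuses the single-product SolrIndexer invocation from Page 30, and the function and variable names are hypothetical.

```shell
# Hypothetical endpoints for this sketch; adjust to your deployment.
CURATOR_URL=${CURATOR_URL:-http://localhost:8080/curator}
FM_URL=${FM_URL:-http://localhost:9000}
SOLR_URL=${SOLR_URL:-http://localhost:8080/solr}

# update_and_reindex PRODUCT_ID [CURL_ARGS...]: 1) POST a metadata update to the
# Curator service; 2) only if that succeeds, re-index the product in Solr.
# The "id" form field is an assumption; pass the actual update fields as extra
# curl arguments (e.g. --data-urlencode ...).
update_and_reindex() {
  local product_id=$1
  shift
  # -f makes curl return non-zero on HTTP errors, so Solr is left untouched.
  curl -sf "$CURATOR_URL/services/metadata/update" \
    --data-urlencode "id=$product_id" "$@" || return 1
  java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
    -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
    org.apache.oodt.cas.filemgr.tools.SolrIndexer \
    --product "$product_id" \
    --fmUrl "$FM_URL" --solrUrl "$SOLR_URL"
}
```

Ordering matters here: Solr is only updated after the Curator call succeeds, so a failed edit cannot leave the search index ahead of the catalog. The reverse failure (Curator succeeds, re-indexing fails) can still leave Solr stale, so a production wrapper would want to retry or report that case.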

Page 44: Use Case II-B
• Wrapping OODT curation services with custom UI & services

Page 45: Example

Page 46: Lessons
• Solr complements the OODT File Manager
• RESTful interfaces (Solr + OODT Curator) allow great flexibility in designing services and UIs
• The “best” approach depends on the situation

Page 47: Next Steps
• Develop a “SolrCatalog” for the OODT File Manager?
  – Pros: reduction in “moving parts”
  – Cons: restrictive?
• Implement Use Case II-A as an optional mode for the Curator web service layer

Page 48: Learning More
• Solr
  – http://lucene.apache.org/solr
  – [email protected]
• OODT
  – http://oodt.apache.org
  – https://cwiki.apache.org/confluence/display/OODT/Home
  – [email protected]

Page 49: Thanks!
• Questions?