Filling the Data Lake - Strata + HadoopWorld San Jose 2016 Preview Presentation
Filling the Data Lake
Chuck Yarbrough, Director, Pentaho Solutions
Mark Burnette, Enterprise Sales Engineer
Preview presentation for Strata + HadoopWorld San Jose 2016 session Thursday, March 31 at 11:50 am, room 230B
© 2015, Pentaho. All rights reserved. pentaho.com. Worldwide +1 (866) 660-7555
Hadoop is Hard…
Things that can help ease the pain:
Empower team members to integrate and process Hadoop data
Establish a modern data onboarding process that is flexible and scalable
Deliver governed analytic insights for large production user bases
Proper Care and Feeding of the Data Lake
How do we effectively scale data pipelines to accommodate exploding data sources, volumes, and complexity?
More Data, More Problems
Have you ever had the pleasure of…
Migrating hundreds of tables between databases?
Enabling business users to onboard a variety of data themselves?
Ingesting hundreds of changing data sources into Hadoop?
More Data, More Problems
Modern data onboarding is more than just “connecting” or “loading” – it includes:
Managing a changing array of data sources
Establishing repeatable processes at scale
Maintaining control and governance
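One way to picture these requirements together is a metadata-driven onboarding loop: a registry describes each source, and one generic process handles all of them, instead of a hand-built job per source. The sketch below is a minimal, tool-agnostic illustration in Python; the registry entries, source names, and sample payloads are all hypothetical.

```python
import csv
import io

# Hypothetical source registry: in practice this might live in a database
# or config store; each entry describes one source to onboard.
SOURCES = [
    {"name": "orders", "format": "csv", "delimiter": ","},
    {"name": "clicks", "format": "csv", "delimiter": "\t"},
]

# Simulated raw payloads keyed by source name (stand-ins for real files).
RAW = {
    "orders": "id,amount\n1,9.99\n2,19.50\n",
    "clicks": "id\turl\n1\t/home\n",
}

def ingest(source, raw_text):
    """Parse one source according to its registry metadata."""
    reader = csv.DictReader(io.StringIO(raw_text),
                            delimiter=source["delimiter"])
    return list(reader)

# One generic loop replaces a hand-written job per source; adding a new
# source means adding a registry entry, not writing new code.
results = {s["name"]: ingest(s, RAW[s["name"]]) for s in SOURCES}
print(results["orders"][0])
```

The same pattern extends naturally to governance: because every source passes through the one loop, logging, validation, and lineage capture can be added in a single place.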
[Diagram: big data onboarding. Disparate data sources (CSV, RDBMS, Avro) flow through dynamic integration processes, ingest procedures, and dynamic transformations into Hadoop.]
Continuous Big Data Onboarding Blueprint
Streamline data ingest from a wide variety of data sources
Reduce dependence on hard-coded data movement procedures
Simplify regular data movement at scale into Hadoop
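A concrete piece of such a blueprint is handling sources whose schemas change over time. The sketch below, a hypothetical illustration not tied to any specific tool, checks an incoming file's header against the expected column set before loading, so drifted files can be routed to review rather than silently breaking downstream loads; the column names are made up for the example.

```python
import csv
import io

# Hypothetical expected schema for one source, captured when it was
# first onboarded.
EXPECTED_COLUMNS = {"id", "amount", "ts"}

def detect_drift(raw_text, expected, delimiter=","):
    """Compare an incoming file's header to the expected column set.

    Returns (added, missing) column sets, so the pipeline can flag
    drifted files instead of loading them blindly.
    """
    header = next(csv.reader(io.StringIO(raw_text), delimiter=delimiter))
    actual = set(header)
    return actual - expected, expected - actual

# A file arrives with a new 'region' column and without 'ts'.
added, missing = detect_drift("id,amount,region\n1,9.99,west\n",
                              EXPECTED_COLUMNS)
print(added, missing)
```

A check like this is what turns "hundreds of changing data sources" from a maintenance nightmare into a routing decision made once, in one place.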