Not Your Father's Database by Vida Ha


Not Your Father’s Database: How to Use Apache Spark Properly in Your Big Data Architecture

Spark Summit East 2016


About Me

2005 Mobile Web & Voice Search

2012 Reporting & Analytics

2014 Solutions Engineering

Is this your Spark infrastructure?

This system talks like a SQL Database…

But the performance is very different…

[Diagram: a SQL interface layered over files in HDFS]

Just in Time Data Warehouse w/ Spark

[Diagram: Spark serving as a just-in-time data warehouse over HDFS, and more…]

Today’s Goal

Know when to use other data stores besides file systems.

Good: General Purpose Processing

Types of data sets to store in file systems:
• Archival data
• Unstructured data
• Social media and other web datasets
• Backup copies of data stores

Types of workloads:
• Batch workloads
• Ad hoc analysis
  – Best practice: use in-memory caching
• Multi-step pipelines
• Iterative workloads

Benefits:
• Inexpensive storage
• Incredibly flexible processing
• Speed and scale

Bad: Random Access

sqlContext.sql(
  "select * from my_large_table where id = 2134823")

Will this command run in Spark?

Yes, but it’s not very efficient: Spark may have to go through all of your files to find your row.

Solution: If you frequently randomly access your data, use a database.

• For traditional SQL databases, create an index on your key column.
• Key-value NoSQL stores retrieve the value of a key efficiently out of the box.
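To illustrate the indexing advice, here is a minimal standalone sketch using SQLite rather than Spark; the table name matches the slide's example, and the index name is hypothetical.

```python
import sqlite3

# In-memory database standing in for a traditional SQL store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_large_table (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO my_large_table VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(10_000)],
)

# Without an index, an equality predicate on id forces a full table scan.
# With an index on the key column, the lookup is a tree search instead.
conn.execute("CREATE INDEX idx_id ON my_large_table (id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM my_large_table WHERE id = 2134823"
).fetchone()
print(plan[-1])  # the query plan mentions idx_id instead of a full scan
```

A file-system-backed Spark table has no equivalent of `idx_id`, which is why the same query pattern degrades to scanning every file.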

Bad: Frequent Inserts

sqlContext.sql(
  "insert into TABLE myTable select fields from my2ndTable")

Each insert creates a new file:
• Inserts are reasonably fast.
• But querying will be slow…

Solution:
• Option 1: Use a database to support the inserts.
• Option 2: Routinely compact your Spark SQL table files.
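Option 2 (compaction) can be sketched outside Spark as plain file merging; the directory layout and `part-*` naming here are hypothetical stand-ins for a table's data files. (Within Spark itself, rewriting the table with fewer partitions has the same effect.)

```python
import glob
import os
import tempfile

def compact(table_dir, compacted_name="compacted-0.txt"):
    """Merge many small part files into one file, so later scans
    open a single file instead of hundreds of tiny ones."""
    parts = sorted(glob.glob(os.path.join(table_dir, "part-*")))
    with open(os.path.join(table_dir, compacted_name), "w") as out:
        for path in parts:
            with open(path) as f:
                out.write(f.read())
            os.remove(path)  # drop the small file once merged

# Simulate many tiny files, one per insert, then compact them.
table_dir = tempfile.mkdtemp()
for i in range(5):
    with open(os.path.join(table_dir, f"part-{i:05d}"), "w") as f:
        f.write(f"row {i}\n")

compact(table_dir)
print(os.listdir(table_dir))  # → ['compacted-0.txt']
```

After compaction a query opens one file instead of one file per insert, which is where the slow-query cost came from.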

Good: Data Transformation/ETL

Use Spark to slice and dice your data files any way you need.

File storage is cheap: it is not an anti-pattern to store duplicate copies of your data.

Bad: Frequent/Incremental Updates

Update statements are not supported yet.

Why not?
• Random access: locate the row(s) in the files.
• Delete & insert: delete the old row and insert a new one.
• Update: file formats aren’t optimized for updating rows.

Solution: Many databases support efficient update operations.

Use case: Up-to-date, live views of your SQL tables.

Tip: Use ClusterBy for fast joins, or bucketing in Spark 2.0.

[Diagram: Database Snapshot + Incremental SQL Query = up-to-date view]
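The snapshot-plus-incremental-query pattern can be sketched in plain Python (no Spark needed to see the idea); the row shapes and values here are made up for illustration.

```python
# A periodic snapshot of the table, keyed by primary key.
snapshot = {
    1: {"id": 1, "status": "open"},
    2: {"id": 2, "status": "open"},
}

# Rows inserted or changed since the snapshot was taken
# (what the incremental SQL query would return).
incremental = [
    {"id": 2, "status": "closed"},  # an update to an existing row
    {"id": 3, "status": "open"},    # a brand-new row
]

# Merge: newer rows win, giving an up-to-date "live" view
# without ever issuing an UPDATE statement against the files.
live_view = dict(snapshot)
for row in incremental:
    live_view[row["id"]] = row

print(sorted(live_view))             # → [1, 2, 3]
print(live_view[2]["status"])        # → closed
```

This is why the approach sidesteps the missing-update problem: the snapshot files are never modified, only overlaid.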

Good: Connecting BI Tools

Tip: Cache your tables for optimal performance.

[Diagram: BI tools querying Spark over HDFS]

Bad: External Reporting w/ load

Too many concurrent requests will overload Spark.

Solution: Write out to a DB to handle the load.

[Diagram: Spark processes data from HDFS and writes results to a DB, which serves the reports]

Good: Machine Learning & Data Science

Use MLlib, GraphX and Spark packages for machine learning and data science.

Benefits:
• Built-in distributed algorithms.
• In-memory capabilities for iterative workloads.
• Data cleansing, featurization, training, testing, etc.

Bad: Searching Content w/ load

sqlContext.sql(
  "select * from mytable where name like '%xyz%'")

Spark will go through each row to find results.
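To see why the `like '%xyz%'` scan is expensive under load, compare it with an inverted index, the structure a dedicated search system maintains. This is a minimal Python sketch with made-up documents.

```python
from collections import defaultdict

docs = {
    1: "spark summit east",
    2: "not your fathers database",
    3: "apache spark architecture",
}

# Full scan: touch every row, as Spark must for like '%spark%'.
scan_hits = [doc_id for doc_id, text in docs.items() if "spark" in text]

# Inverted index: map each term to the rows containing it, built once.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# Lookup is now a single dictionary access instead of a scan.
index_hits = sorted(index["spark"])

print(scan_hits)   # → [1, 3]
print(index_hits)  # → [1, 3]
```

Both approaches return the same rows, but the scan's cost grows with table size on every query, while the index pays that cost once at build time.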

Thank you