BigData Workshop Introduction Session - Ahmedabad Java Meetup


For Ahmedabad Java Meetup Group (300+ members strong now!)

Big Data Workshop: An introduction and workshop launch session

May, 2014
Dhruv Gohil
From Ishi systems

Welcome!
Why a workshop and not a presentation?

What should you do in the workshop?

What is expected from you in this session?

What should you expect from this session?

What will the upcoming sessions be like?

Seems too serious?

Now, this is much better!

So, let's change the font!

OK... So what are we gonna do today?

Workshop setup and series introduction
Already done! (See, it's easy!)

Big data is not only big.

Why do we need 'Big data'?

What 'Big data' is NOT?

Fear of Big data? Kick it off!

Let me tell you a story...

http://en.wikipedia.org/wiki/Information_Management_System

If you still think about 'Entities' and 'Tables'

Everything you have been taught in college about databases is ALL WRONG.

http://slideshot.epfl.ch/play/suri_stonebraker

Big Data is...

http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Big Data is not only big

Volume, Velocity, Variety
GB/TB vs PB/EB
Centralized vs Distributed
Structured vs Semi-Structured/Unstructured
Data Model vs Schema
Known relationships vs Flexible associations

What 'Big data' is NOT?

Big data ≠ Hadoop, Hadoop ≠ Big data!

What 'Big data' is NOT?

Applying for a job here?
Hadoop!

What 'Big data' is NOT?

Why does Hadoop always come to mind with big data?
What else should we know?
Tools vs. methodologies
Being too futuristic vs. being practical/economical

Big Data in your organization

http://www.fakingnews.firstpost.com/2014/04/transcript-of-rahul-gandhis-interview-for-job-of-a-c-programmer/
We brought RTSC. Right To Source Code.
Now, deal with it.

Big Data in your organization

Cost of tools/software decreases, but cost of knowledge increases

Being agile is the only way to deal with competition

Are you working with...

Social networking and media

Mobile devices

Internet transactions

Networked devices and sensors

Big Data in your product/service

You have to shift your thinking: access vs. storage

Design based on when/where data is used vs. when/where data is produced.

Use redundancy, trading it off against storage cost

Understand NoSQL = Not Only SQL

Streams

In-memory analytics

Massively parallel processing (Data crunching)
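
To make the "data crunching" item above concrete, here is a minimal single-machine sketch in Java 8. It is not how Hadoop or any particular big data tool does it: it only shows the map-then-reduce shape using a parallel stream. The class name, the method name and the "deviceId,reading" line format are assumptions made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ParallelCrunchSketch {

    // Counts events per device id. Each line is assumed to look like
    // "<deviceId>,<reading>" (a hypothetical format, for illustration only).
    static Map<String, Long> countByDevice(List<String> lines) {
        return lines.parallelStream()
                // "map" step: pull the grouping key out of each record
                .map(line -> line.split(",")[0])
                // "reduce" step: count records per key; partial maps built by
                // different threads are merged by the collector
                .collect(Collectors.groupingBy(
                        key -> key,
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "sensor-a,21.5", "sensor-b,19.0", "sensor-a,22.1",
                "sensor-c,30.2", "sensor-a,21.9");
        System.out.println(countByDevice(lines));
    }
}
```

The same two steps (extract a key, then aggregate per key) are what distributed tools spread over many machines; on one machine the parallel stream only spreads them over CPU cores.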

Big Data in your project

Random research says... 99% of your clients who asked for a Big Data project ended up with fewer paying customers in total than you have fingers.

A project hits business scalability limits much, much earlier than technical scalability limits.

Big Data for your clients

Business first - technology second

Current reality for client projects:

Use big data tools that work at small scale :-)

Design with the domain in mind, not the database the client suggests.

Always design with read optimization in mind (the golden rule)
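
One way to read the golden rule, sketched in plain Java with in-memory collections standing in for a real data store: do a little extra work on every write so that the most common read becomes a single lookup. The class, the method names and the order/customer example are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReadOptimizedSketch {

    // Source of truth: every order as it was written, as {orderId, customerId}.
    private final List<String[]> orders = new ArrayList<>();

    // Redundant, read-optimized view: order ids already grouped per customer.
    private final Map<String, List<String>> orderIdsByCustomer = new HashMap<>();

    // The write path maintains both structures.
    void placeOrder(String orderId, String customerId) {
        orders.add(new String[] {orderId, customerId});
        orderIdsByCustomer
                .computeIfAbsent(customerId, k -> new ArrayList<>())
                .add(orderId);
    }

    // The read path is one map lookup: no scan over all orders, no join.
    List<String> ordersOf(String customerId) {
        return orderIdsByCustomer.getOrDefault(customerId, Collections.emptyList());
    }

    public static void main(String[] args) {
        ReadOptimizedSketch store = new ReadOptimizedSketch();
        store.placeOrder("o-1", "alice");
        store.placeOrder("o-2", "bob");
        store.placeOrder("o-3", "alice");
        System.out.println(store.ordersOf("alice"));
    }
}
```

The price is duplicated data and a slightly slower write, which is the redundancy vs. storage cost trade-off: acceptable when reads far outnumber writes.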

Big Data project for small data customers

If you can do it in PostgreSQL, then do it in PostgreSQL (the blue elephant rule)

A few important tips...

The CAP theorem: the basics of NoSQL databases

Read a lot about a database's design before using any non-traditional database. Or read good negative posts to learn when NOT to use it.

e.g.: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

Now... the good parts!

It's your time to speak now!

Workshop session: practical selection of technology and design for real-world use cases.

All references used in the workshop

Basic Hadoop introductory material: http://www.coreservlets.com/hadoop-tutorial/

Evaluate Hadoop without installation: http://go.cloudera.com/cloudera-live.html

PostgreSQL good parts: http://www.slideshare.net/Aveic/postgresql-34323147

PostgreSQL as NoSQL column store: http://postgresguide.com/sexy/hstore.html

PostgreSQL for basic Elasticsearch functionality: http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/

Good big data compatible OSS software: http://netflix.github.io/

Practical HBase usage: https://www.facebook.com/UsingHbase

Using Cassandra for write-heavy applications: http://www.datastax.com/1-million-writes

Online analytics in Storm: http://hortonworks.com/hadoop/storm/

E-commerce domain-specific use case: http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376

Good use case of selecting a data store based on a proper understanding of the CAP theorem: http://tech-blog.flipkart.net/2013/01/nosql-for-a-user-engagement-platform/

Recommendation engine in Big Data scenarios: http://www.slideshare.net/hava101/recommendations-play-flipkart-14115791

High-volume log processing: http://www.splunk.com/view/product-tour/SP-CAAAAGV
Open source alternatives: http://logstash.net/ and http://graylog2.org/
