The concept of a Data Lake with Hadoop

The concept of Data Lake: Data processing in Hadoop

avkash@bigdataperspective.com | @avkashchauhan

Agenda this hour:

- Data lake concept through live examples
- Data processing in Hadoop

Data lake or data graveyard…

Data Warehouses and Databases

ETL

Database

Concerns with existing approach

- Significant time investment in data prototyping
- Cost of data storage and of repetitive processing
- Time to get value out of data varies with several factors
- Several point solutions are needed to process data differently
- Real users do not have access to data immediately, and when they do it is a subset of the real data
- Knowledge gap between those who want to use the data and those who actually process it
- One team (IT/CO) managing everyone's needs causes significant bottlenecks

Concept of Data Lake with HDFS

Data Map

Technical Definition of Data Lake

- Store all information without any modification (see the ingest sketch after this list)
- Data is stored without consideration of its type
- Each team develops and manages its own point solutions
- Security and governance are applied only at data access or other critical points
- IT or corporate views the data the same as any other team
- Unstructured information is still information
- Depending on demand, scaling up or down is possible
- Data redundancy guarantees data availability
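The first point above, landing the data as-is, is what separates a lake from a warehouse's ETL pipeline. The deck contains no code, so here is a minimal ingest sketch using Hadoop's Java FileSystem API; the NameNode URI and the local/lake paths are placeholders, not taken from the slides:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RawIngest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder URI; in practice this is resolved from core-site.xml
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        // Copy the source file byte-for-byte: no parsing, no schema, no transformation
        Path local = new Path("/tmp/clickstream-2015-06-27.log");                   // hypothetical input
        Path lake  = new Path("/datalake/raw/clickstream/2015/06/27/clickstream.log"); // hypothetical lake path
        fs.copyFromLocalFile(/* delSrc */ false, /* overwrite */ true, local, lake);

        fs.close();
    }
}
```

No structure is imposed at ingest time; whatever schema the data has is applied later, by whichever team reads it.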

Hadoop is the Answer…

How is Hadoop the answer?

- Hadoop has HDFS as its storage layer for unstructured data
- HDFS provides fast and reliable data access to applications
- Applications designed to support a line of business (LOB) can process data on disk or in memory while accessing the full volume of data in real time

HDFS: Quick Intro…

Data processing in Hadoop

- Unstructured data is processed through the MapReduce programming paradigm (a word-count sketch follows this list)
- Data is stored as regular files in a supported file format; if a file format is not supported yet, support can be added programmatically
- Data is accessed through any kind of program; once the data is read, the program can process it any way it chooses within the MapReduce framework
- Result data is stored back into HDFS for later use
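As a concrete sketch of that read → process → write-back cycle, here is the canonical word-count job in the Hadoop MapReduce Java API. The input and output paths are hypothetical lake locations, not from the slides:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: read raw text lines, emit (word, 1) for each token
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts for each word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Both paths live in HDFS; the result is written back into the lake
        FileInputFormat.addInputPath(job, new Path("/datalake/raw/text"));      // placeholder
        FileOutputFormat.setOutputPath(job, new Path("/datalake/derived/wc"));  // placeholder
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The same pattern generalizes: the mapper decides how to interpret the raw bytes, the reducer aggregates, and the output lands back in HDFS where any other team can pick it up.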

What is HDFS?

- HDFS is a fault-tolerant and self-healing distributed file system designed to turn a cluster of industry-standard servers into a massively scalable pool of storage.
- It was developed specifically for large-scale data-processing workloads where scalability, flexibility, and throughput are critical.
- HDFS accepts data in any format regardless of schema, optimizes for high-bandwidth streaming, and scales to proven deployments of 100 PB and beyond (a short read sketch follows this list).
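To make the high-bandwidth streaming point concrete, here is a small sketch that streams a file out of HDFS with the same FileSystem API; again, the URI and path are placeholders:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        // Placeholder URI; normally resolved from core-site.xml
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());

        // HDFS streams the file's blocks, which are replicated across DataNodes
        Path p = new Path("/datalake/raw/clickstream/2015/06/27/clickstream.log"); // hypothetical
        try (FSDataInputStream in = fs.open(p);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```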

Conclusion:

The data lake concept is designed to use Hadoop as a single data storage system, available for instant data processing by any individual team at any given time. It is designed to:

- Reduce engineering overhead
- Provide faster access to data for real users
- Remove repetitive processing

Thanks…