Big Data Tools


Big Data tools: HBase, Hive, ZooKeeper, Pig

Hadoop random access databases
Applications such as HBase, Cassandra, CouchDB, Dynamo, and MongoDB are some of the databases that store huge amounts of data and access the data in a random manner.

HBase :-
HBase is a distributed, column-oriented database built on top of the Hadoop file system (HDFS).
It is an open-source project and is horizontally scalable.
It is a part of the Hadoop ecosystem that provides random, real-time read/write access to data in the Hadoop file system.

HBase architecture :-
One can store the data in HDFS either directly or through HBase.

HBase has three major components: the client library, the master server, and the region servers. Region servers can be added or removed as required.

Components :-
Master server
Assigns regions to the region servers, with the help of Apache ZooKeeper.
Handles load balancing of the regions across region servers: it unloads the busy servers and shifts regions to less occupied servers.
Is responsible for schema changes and other metadata operations, such as the creation of tables and column families.
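The load-balancing duty of the master server can be pictured with a small Python sketch. It is only an illustration of "unload the busy servers and shift regions to less occupied ones": the server names, region names, and the simple count-based rule are assumptions made for this example, not HBase's actual balancer.

```python
def rebalance(assignments):
    """Move regions from the busiest server to the least busy one
    until the region counts differ by at most one.

    `assignments` maps a hypothetical server name to its list of regions.
    The input is left untouched; a rebalanced copy is returned.
    """
    servers = {name: list(regions) for name, regions in assignments.items()}
    moves = []
    while True:
        busiest = max(servers, key=lambda s: len(servers[s]))
        idlest = min(servers, key=lambda s: len(servers[s]))
        if len(servers[busiest]) - len(servers[idlest]) <= 1:
            break  # already balanced
        region = servers[busiest].pop()
        servers[idlest].append(region)
        moves.append((region, busiest, idlest))
    return servers, moves


# Made-up cluster: rs1 is overloaded, rs3 is empty.
cluster = {
    "rs1": ["r1", "r2", "r3", "r4", "r5"],
    "rs2": ["r6"],
    "rs3": [],
}
balanced, moves = rebalance(cluster)
for region, src, dst in moves:
    print(f"move {region}: {src} -> {dst}")
```

After the run, each of the three servers holds two regions.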

Regions
Regions are nothing but tables that are split up and spread across the region servers.

Region server
Communicates with the client and handles data-related operations.
Handles read and write requests for all the regions under it.

ZooKeeper :-
ZooKeeper is an open-source project that provides services like maintaining configuration information, naming, providing distributed synchronization, etc.
Clients use ZooKeeper to locate the region servers they need to communicate with.
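Because regions are slices of a table spread across region servers, a request for a given row has to be routed to the server that holds that row's region. The sketch below shows the idea in Python, assuming each region covers a contiguous range of sorted row keys. The start keys and server names are invented for illustration, and this is a toy lookup table, not the HBase client API.

```python
import bisect

# Hypothetical layout: each region starts at a row key and runs up to
# (but not including) the next start key. "" means "from the beginning".
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs1", "rs3"]  # server hosting each region


def locate(row_key):
    """Return (region index, server name) for the region containing row_key."""
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return idx, REGION_SERVERS[idx]


for key in ["apple", "grape", "zebra"]:
    print(key, "->", locate(key))
```

A binary search over the sorted start keys keeps the lookup cheap even when a table has many regions.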

Hive
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
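To give a feel for the kind of summarizing query a data warehouse tool is meant to make easy, here is a small Python sketch. Hive itself is not used: Python's built-in sqlite3 module stands in for it, and the page_views table with its columns is invented for this example. Hive's own query language is SQL-like, so a comparable query there would also use SELECT with GROUP BY.

```python
import sqlite3

# In-memory database as a stand-in for a warehouse table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [("u1", "IN", 3), ("u2", "IN", 5), ("u3", "US", 2), ("u4", "US", 7), ("u5", "DE", 1)],
)

# Summarize the raw rows: users and total views per country.
summary = conn.execute(
    "SELECT country, COUNT(*) AS users, SUM(views) AS total_views "
    "FROM page_views GROUP BY country ORDER BY total_views DESC"
).fetchall()

for country, users, total_views in summary:
    print(country, users, total_views)
```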

Features of Hive
It stores the schema in a database and the processed data in HDFS.
It is designed for OLAP.
It provides an SQL-type language for querying, called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.

Pig
Pig was initially developed at Yahoo! to allow people using Hadoop to focus more on analyzing large data sets and spend less time having to write mapper and reducer programs.
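To see what "writing mapper and reducer programs" involves, here is a minimal word-count sketch in plain Python. It only mimics the map, shuffle, and reduce phases in memory. The function names and the sample input are assumptions for illustration, and no Hadoop API is used.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Combine all counts emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Tiny in-memory stand-in for the map, shuffle, and reduce phases."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)  # shuffle: group values by key
    return dict(reducer(word, counts) for word, counts in grouped.items())


result = run_job(["pig hive pig", "hbase pig"])
print(result)
```

Even for a task this small, the logic is split across two functions plus the grouping step, which is the kind of boilerplate Pig is meant to hide.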

Pig components
Language: called Pig Latin.
Runtime environment: where Pig Latin programs are executed. Think of the relationship between a Java Virtual Machine (JVM) and a Java application.

The programming language
The first step in a Pig program is to LOAD the data you want to manipulate from HDFS.
Then you run the data through a set of transformations.
Finally, you DUMP the data to the screen or you STORE the results in a file somewhere.
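The load, transform, then dump-or-store flow can be mirrored in a short Python sketch. This is not Pig Latin: a temporary local CSV file stands in for HDFS, and the file names, columns, and sample rows are assumptions made for the example. The filter and group steps are loosely comparable to Pig's FILTER and GROUP operators.

```python
import csv
import os
import tempfile
from collections import Counter

workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "visits.csv")            # hypothetical input
out_path = os.path.join(workdir, "visits_by_country.csv")  # hypothetical output

# Create a small input file so the sketch is self-contained.
with open(in_path, "w", newline="") as f:
    csv.writer(f).writerows([["u1", "IN", "3"], ["u2", "US", "0"], ["u3", "IN", "5"]])

# LOAD: read the raw records.
with open(in_path, newline="") as f:
    records = [(user, country, int(views)) for user, country, views in csv.reader(f)]

# Transformations: drop rows with no views, then total views per country.
active = [r for r in records if r[2] > 0]
totals = Counter()
for _, country, views in active:
    totals[country] += views

# DUMP: print the result to the screen.
for country, total in sorted(totals.items()):
    print(country, total)

# STORE: write the result to a file.
with open(out_path, "w", newline="") as f:
    csv.writer(f).writerows(sorted(totals.items()))
```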
