Big Data Final Presentation

BIG DATA Final Presentation By: Hemanth Aroumougam Friday, April 4, 14

This is a presentation about big data that I gave during my internship at Fujitsu.

Transcript of Big Data Final Presentation

Page 1: Big Data Final Presentation

BIG DATA Final Presentation

By: Hemanth Aroumougam

Page 2: Big Data Final Presentation

During the first generation....

Page 3: Big Data Final Presentation

Employees in companies started entering data into computer systems

Page 4: Big Data Final Presentation

As the second generation comes...

Page 5: Big Data Final Presentation

Page 6: Big Data Final Presentation

But now as generations move on there is a third one to this list

and it is...

Page 7: Big Data Final Presentation

Nowadays, even machines automatically enter data into computer systems.

Page 8: Big Data Final Presentation

Page 9: Big Data Final Presentation

Page 10: Big Data Final Presentation

BIG DATA is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Page 11: Big Data Final Presentation

• Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured.

Page 12: Big Data Final Presentation

Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big or it moves too fast or it exceeds current processing capacity.

Page 13: Big Data Final Presentation

In BIG DATA there are 3Vs, which are the defining properties and dimensions of Big Data.

Page 14: Big Data Final Presentation

The 3Vs are...

Page 15: Big Data Final Presentation

•Volume

•Variety

•Velocity

Page 16: Big Data Final Presentation

Volume: Big volume covers both simple SQL analytics and complex non-SQL analytics. In other words, volume refers to the amount of data.

Page 17: Big Data Final Presentation

SQL

• SQL stands for Structured Query Language.

• SQL is a standardized query language for requesting information from a database.

• SQL first appeared in a commercial database system in 1979, released by Relational Software, Inc., the company that later became Oracle Corporation.

• Historically, SQL has been the favorite query language for database management systems running on minicomputers and mainframes.
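As a small illustration of requesting information from a database with SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name employees and its rows are made up for this example and are not from the presentation.

```python
import sqlite3

# In-memory database; the table and rows are hypothetical example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000), ("Ben", "Sales", 48000), ("Chen", "IT", 61000)],
)

# A SQL query: average salary per department, sorted by department name.
rows = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()
print(rows)
conn.close()
```

The query describes what result is wanted (an average per department) and leaves the how to the database engine.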

Page 18: Big Data Final Presentation

Volume

Petabyte (PB)

Terabyte (TB)

Gigabyte (GB)

Megabyte (MB)

Kilobyte (KB)
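To make the unit ladder above concrete, here is a minimal sketch that formats a byte count in the largest fitting unit. It assumes decimal units (each step is a factor of 1000); some tools use 1024 instead. The function name human_readable is just an illustration.

```python
# Decimal (SI) units: each step up is a factor of 1000 (an assumption;
# binary units based on 1024 are also common).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]

def human_readable(num_bytes: int) -> str:
    """Format a byte count using the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000

print(human_readable(1_500_000))   # 1.5 MB
print(human_readable(2 * 10**15))  # 2.0 PB
```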

Page 19: Big Data Final Presentation

Variety: A large number of diverse data sources to integrate. In other words, variety refers to the number of different types of data.

Page 20: Big Data Final Presentation

VARIETY

Structured Data

Unstructured Data

Semi structured Data

Page 21: Big Data Final Presentation

Structured Data

• Structured data is data that resides in a fixed field within a record or file. This includes data contained in relational databases and spreadsheets. Structured data has the advantage of being easily entered, stored, queried and analyzed.

Page 22: Big Data Final Presentation

EXAMPLES OF STRUCTURED DATA

• Library catalogues (date, author, place, subject, etc.)

• Census records (birth, income, employment, place, etc.)

• Phone numbers (and the phone book)

• Economic data (GDP, PPI, ASX, etc.)

• XML-TEI (bringing structure to the text by tagging particular elements, such as versions of the word “canal” in 17th-century Dutch)

• Databases

• Data warehouses

• Enterprise systems (CRM, ERP, etc.)

Page 23: Big Data Final Presentation

Semi structured Data

• Semi-structured data is a form of structured data that does not conform to the formal structure of the data models associated with relational databases or other forms of data tables.

Page 24: Big Data Final Presentation

EXAMPLES OF SEMI STRUCTURED DATA

• Web pages

• Information integration

• XML

Page 25: Big Data Final Presentation

Unstructured Data

• Unstructured data refers to information that either does not have a pre-defined data model or is not organized in a predefined manner. Unstructured information is typically text-heavy. In other words, unstructured data sits at the other end of the spectrum. It might be in any form: text, audio, video. We cannot tell what the data means just by looking at it, unless we apply human understanding to it.

Page 26: Big Data Final Presentation

EXAMPLES OF UNSTRUCTURED DATA

• Book

• Story

• Heavy text

• audio

• video

• RSS Feeds

• Word documents

• Excel Spreadsheets

• Email messages
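The three kinds of data described above can be contrasted with a short sketch. It shows the same made-up fact as a CSV record (structured), a JSON document (semi-structured) and a free-text sentence (unstructured). All names and values are illustrative assumptions.

```python
import csv
import io
import json

# Structured: every record has the same fixed fields (CSV with a header row).
structured = io.StringIO("name,city\nAsha,Chennai\n")
record = next(csv.DictReader(structured))

# Semi-structured: self-describing, but fields can vary and nest (JSON).
semi = json.loads('{"name": "Asha", "contacts": {"email": "asha@example.com"}}')

# Unstructured: free text; the meaning has to be extracted by interpretation.
unstructured = "Asha moved to Chennai last year and can be reached by email."

print(record["city"])             # direct field lookup
print(semi["contacts"]["email"])  # lookup by walking the nested structure
print("Chennai" in unstructured)  # only a crude text search is possible
```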

Page 27: Big Data Final Presentation

Velocity: Velocity refers to the speed at which the data is processed.

Page 28: Big Data Final Presentation

TYPES OF VELOCITY

REAL TIME ANALYSIS

NEAR REAL TIME

PERIODIC

BATCH

Page 29: Big Data Final Presentation

Benefits of Batch Processing

• It can shift the time of job processing to when the computing resources are less busy.

• It avoids idling the computing resources with minute-by-minute manual intervention and supervision.

• By keeping the overall rate of utilization high, it amortizes the cost of the computer, especially an expensive one.

• It allows the system to use different priorities for batch and interactive work.

• Rather than running one program multiple times to process one transaction each time, batch processes will run the program only once for many transactions, reducing system overhead.
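The last benefit above can be sketched in a few lines. Here setup() is a hypothetical stand-in for per-run overhead (starting a program, opening a connection); the sketch counts how often that overhead is paid in each style. The transaction amounts are made up.

```python
setup_calls = 0

def setup():
    """Stand-in for per-run overhead such as starting a program."""
    global setup_calls
    setup_calls += 1

def run_per_transaction(transactions):
    # The program is started once for every single transaction.
    total = 0
    for amount in transactions:
        setup()
        total += amount
    return total

def run_batch(transactions):
    # The program is started once and handles all transactions together.
    setup()
    return sum(transactions)

transactions = [120, 75, 300, 45]

setup_calls = 0
per_tx_total = run_per_transaction(transactions)
per_tx_setups = setup_calls

setup_calls = 0
batch_total = run_batch(transactions)
batch_setups = setup_calls

print(per_tx_total, per_tx_setups)  # 540 4
print(batch_total, batch_setups)    # 540 1
```

Both styles produce the same total, but the batch run pays the setup cost once instead of once per transaction.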

Page 30: Big Data Final Presentation

Page 31: Big Data Final Presentation

Page 32: Big Data Final Presentation

Page 33: Big Data Final Presentation

ORACLE BIG DATA SOLUTION

• Oracle is the first vendor to offer a complete and integrated solution to address the full spectrum of enterprise big data requirements. Oracle’s big data strategy is centered on the idea that you can extend your current enterprise information architecture to incorporate big data. New big data technologies, such as Hadoop and Oracle NoSQL database, run alongside your Oracle data warehouse to deliver business value and address your big data requirements.

Page 34: Big Data Final Presentation

Page 35: Big Data Final Presentation

Advantages and Disadvantages of BIG DATA

Page 36: Big Data Final Presentation

ADVANTAGES

• Data mining makes it easier to find correlations

• Analysis is more calculated, so accuracy is higher

• Data is now combined into one big mass, which allows links to be found

• For example: a company with decades of information can use Big Data and data analysis to create competitive advantages and open new business opportunities

• Big Data started because companies were finding it hard to manage all their data

• Creates new growth opportunities and lots of jobs

Page 37: Big Data Final Presentation

DISADVANTAGES

• Big risks for security and privacy

• Challenges arise: it is expensive, and a lot must be spent to get it working

• A lot of analysis is needed: uncovering patterns, applying algorithms, finding connections and relationships

• Specialized analysts are still needed; it is hard to find the right skill set

Page 38: Big Data Final Presentation

BIG DATA Software

Page 39: Big Data Final Presentation

• Hadoop: Apache Software Foundation

• MongoDB: MongoDB, Inc.

Page 40: Big Data Final Presentation

• Apache Hadoop is an open-source framework for the storage and large-scale processing of data sets on clusters of commodity hardware. It is licensed under the Apache License 2.0. The Apache Hadoop framework is composed of the following modules:

• Hadoop Common – contains libraries and utilities needed by other Hadoop modules.

• Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.

• Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.

• Hadoop MapReduce – a programming model for large scale data processing.

• Hadoop is written in Java.
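The MapReduce programming model named above can be sketched with a word count, the usual introductory example. This is plain Python running on one machine, not the Hadoop API; the function names map_phase, shuffle and reduce_phase are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

lines = ["big data is big", "data moves fast"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different machines, each working on its own part of the data; the sketch only shows the shape of the computation.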

Page 41: Big Data Final Presentation

• MongoDB is big data software whose name comes from the word “humongous”. MongoDB is a cross-platform document-oriented database. A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. It is classified as NoSQL. A NoSQL database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.

• MarkLogic is an American software company that makes a NoSQL database.

• Written in: C++
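To give a feel for the document-oriented model described above, here is a minimal sketch using plain Python dictionaries. It is not the MongoDB API; the find helper and the sample documents are hypothetical, written only to show that documents in one collection need not share the same fields.

```python
# Each document is a self-contained record; documents in one collection
# do not have to share the same fields (hypothetical sample data).
collection = [
    {"_id": 1, "name": "Asha", "tags": ["sales"], "address": {"city": "Chennai"}},
    {"_id": 2, "name": "Ben", "phone": "555-0100"},
]

def find(docs, **criteria):
    """Return documents whose top-level fields match all given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

matches = find(collection, name="Ben")
with_address = [d["_id"] for d in collection if "address" in d]

print(matches)
print(with_address)
```

A relational table would force both records into the same columns; here the second document simply omits the address and adds a phone number.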

Page 42: Big Data Final Presentation

Page 43: Big Data Final Presentation

•Enterprise NoSQL Database Technology

•Best Big Data Search

•Real-time Your Hadoop

Page 44: Big Data Final Presentation

Enterprise NoSQL Database Technology

• For more than a decade, MarkLogic has delivered a powerful, agile, and trusted enterprise-grade NoSQL (Not Only SQL) database that enables organizations to turn all data into valuable and actionable information. Key features include ACID transactions, horizontal scaling, real-time indexing, high availability, disaster recovery, government-grade security, and more.

Page 45: Big Data Final Presentation

Best Big Data Search

• Search all data for more value. Bring all relevant content back to users – unstructured and structured, internal and public.

• Real-time updates. Real-time results. When documents are updated or inserted, they are available for search immediately.

• Able to query all types of data. Structured, semi-structured, and unstructured content are all supported within the same queries.

• Real-time alerts for fast response. MarkLogic has the highest performance alerting engine available, capable of running millions of custom queries on each and every change to the document repository – no polling required.

• Search you can bank on. Businesses that count on revenue through paid content search and retrieval trust MarkLogic to deliver.

MarkLogic’s scale-out, real-time platform is more than a search engine linked to a content repository – it is the most complete platform for building search-oriented applications.

Page 46: Big Data Final Presentation

Real-time Your Hadoop

Get more power out of Hadoop. Hadoop and MarkLogic together can allow you to tackle problems that would be difficult or impossible to address by either technology alone.

Save money by leveraging common infrastructure. Using MarkLogic and Hadoop Distributed File System (HDFS) enables common batch-processing infrastructure to be used across many different projects and applications.

Enterprise-class support for Hadoop. Our partnership with Intel provides a strong, supported platform for building secure, enterprise-class Big Data Applications with Apache Hadoop.

Seamlessly combine the power of MapReduce with MarkLogic’s real-time, interactive analysis and indexing on a single, unified platform.

Page 47: Big Data Final Presentation

Page 48: Big Data Final Presentation

What can you accomplish with BIG DATA?

Page 49: Big Data Final Presentation

Dialogue with Consumers

• Today’s consumers are a tough nut to crack. They look around a lot before they buy. You want to persuade customers to buy your products.

• Big Data allows you to profile these increasingly vocal and fickle little ‘tyrants’ in a far-reaching manner so that you can engage in an almost one-on-one, real-time conversation with them. This is not actually a luxury. If you don’t treat them the way they want to be treated, they will leave you in the blink of an eye.

Page 50: Big Data Final Presentation

Re-develop your Products

• Big Data can also help you understand how others perceive your products so that you can adapt them.

• Analysis of unstructured social media text allows you to uncover the sentiments of your customers and even segment those in different geographical locations or among different demographic groups.
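As a toy illustration of uncovering sentiment in social media text and segmenting it by location, the sketch below scores posts against two tiny word lists. The word lists, posts and regions are invented for the example, and real sentiment analysis uses far richer models than word counting.

```python
from collections import Counter, defaultdict

# Tiny hand-made word lists (an assumption for illustration only).
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "slow", "broken"}

def sentiment(text):
    """Label text by counting positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical posts tagged with the region they came from.
posts = [
    ("Tokyo", "Love the new model, great battery!"),
    ("Tokyo", "Delivery was slow."),
    ("Delhi", "Screen arrived broken and support is slow."),
]

# Segment the sentiment labels by region.
by_region = defaultdict(Counter)
for region, text in posts:
    by_region[region][sentiment(text)] += 1

for region, counts in by_region.items():
    print(region, dict(counts))
```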

Page 51: Big Data Final Presentation

Perform Risk Analysis

• Success does not depend only on how you run your company. Social and economic factors are crucial for your accomplishments as well. Predictive analytics, fueled by Big Data, allows you to scan and analyze newspaper reports or social media feeds so that you always keep up to speed on the latest developments in your industry and its environment.

• Detailed health checks on your suppliers and customers are another benefit that comes with Big Data. They allow you to take action when one of them is at risk of defaulting.

Page 52: Big Data Final Presentation

Keeping your data safe

• You can map the entire data landscape across your company with Big Data tools, thus allowing you to analyze the threats that you face internally.

• You will be able to detect potentially sensitive information that is not protected in an appropriate manner and make sure it is stored according to regulatory requirements.

Page 53: Big Data Final Presentation

Page 54: Big Data Final Presentation

Where is BIG DATA used, and how?

Page 55: Big Data Final Presentation

Big Data is used in many fields like....

Page 56: Big Data Final Presentation

Car Makers

• Fault logging and cost predictions: Car makers place hundreds of sensors on components around the car, which constantly log data on performance and faults. All of this data can be used to re-engineer designs for more efficient products and to predict how much strain warranty repairs are likely to place on cost and manpower.

Page 57: Big Data Final Presentation

Page 58: Big Data Final Presentation

TOYOTA

WHERE: From factories and from sensors

Data Center (Headquarters)

NEEDS: Safety and quality analysis

BENEFITS: Feedback from design

Page 59: Big Data Final Presentation

Finance

• B2B supplier profiling: Finance professionals can use big data to check on the ‘health’ of their suppliers and business partners. They can monitor a variety of indicators, including when creditors pay their bills and whether there is any change.

• Fraud detection: Companies like Visa are using big data to create fraud detection models which can flag potential fraudsters.
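The idea of a model that flags potential fraud can be hinted at with a very simple rule: flag a transaction whose amount is far from a customer's usual spending. This is only a sketch under that assumption and is not how Visa or any real system works; the function name flag_unusual, the threshold and the amounts are all made up.

```python
from statistics import mean, stdev

def flag_unusual(history, new_amount, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # All past amounts are identical: anything different is unusual.
        return new_amount != mu
    return abs(new_amount - mu) / sigma > threshold

# Hypothetical past transaction amounts for one customer.
history = [42.0, 38.5, 51.0, 45.0, 40.0]

print(flag_unusual(history, 47.0))   # close to usual spending
print(flag_unusual(history, 900.0))  # far outside usual spending
```

Real fraud models combine many signals (location, merchant, timing) rather than the amount alone.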

Page 60: Big Data Final Presentation

VISA

WHERE: Wherever customers make purchases

Data Center (Headquarters)

NEEDS: Detect fraud, understand customers’ behavior

BENEFITS: Personal recommendations

Page 61: Big Data Final Presentation

General Manufacturing

• Simulations: Manufacturers can take real data from their products on the market and then run simulations of what would happen if they changed one particular component or design aspect. They can then find ways to make the product cheaper, more reliable or more environmentally friendly. Formula 1 racing teams are particularly adept in this area, as are advanced aerospace companies.

• Expanded product design modeling: Similarly, with new big-data-enabled computer-aided design programs, product designers can substitute components or materials from huge databases and then access in-depth information on how this affects the final product, including the ramifications for cost, production processes, environmental effects, legislative requirements, supply chain and so on.

Page 62: Big Data Final Presentation

Page 63: Big Data Final Presentation

GM

WHERE: Several branches

Data Center (GM Headquarters in Gurgaon)

NEEDS: Safety and quality analysis

BENEFITS: Awareness and indication of what to fix

Page 64: Big Data Final Presentation

Policing

• Suspect tracking: By combining CCTV images, facial recognition software, travel trends and identifiers on travel cards, police forces can capture criminals by automatically linking people to their likely destinations on buses and metro systems. This allows police to catch those that they miss at the scene of the crime and also to control arrest statistics, meeting targets for arrests in one London borough, for instance, as needed.

Page 65: Big Data Final Presentation

Page 66: Big Data Final Presentation

CBI

WHERE: Several branches

Data Center (CBI Headquarters in Delhi)

NEEDS: To identify a person’s behavior and actions

BENEFITS: Gives awareness of what that person is likely to do next and what their next plan is

Page 67: Big Data Final Presentation

Utilities (oil & gas)

• Asset monitoring: As with the machines in manufacturing plants, utilities companies use big data to keep track of all of their assets spread across a country, a continent or the globe. This enables them to fix any broken asset (such as a sewage cleansing plant, a leaking pipe or a gas pump), perform pre-emptive running maintenance or isolate areas in which repair actions have been ineffective.

Page 68: Big Data Final Presentation

Page 69: Big Data Final Presentation

CHEVRON

WHERE: From the machines in the manufacturing plants

Data Center (Chevron Headquarters)

NEEDS: To keep track of what is going on in the manufacturing plant, such as broken pipes and leaks

BENEFITS: This gives them feedback on designs, so they know how to improve the construction of the manufacturing plant, which is their main source of oil and gas

Page 70: Big Data Final Presentation

Retail and Marketing

• Mood mapping: Retailers use feeds from social networks to build an understanding of how their products and company reputation are seen by the public. With the constant streams of opinions from Facebook, Twitter, Google+ and the like, companies are able to cheaply and quickly gather large samples of customer opinion.

Page 71: Big Data Final Presentation

Page 72: Big Data Final Presentation

Page 73: Big Data Final Presentation

Air Jordan

WHERE: From social media networking sites

Data Center (Air Jordan Headquarters)

NEEDS: Customers’ behavior; helps to find out opinions, feelings and feedback about their brand

BENEFITS: This gives them feedback on what customers think about their product, which they can use to improve it

Page 74: Big Data Final Presentation

THANK YOU !!!
