Assignment on Business Analytics


7/21/2019 Assignment on Business Analytics

http://slidepdf.com/reader/full/assignment-on-business-analytics 1/6

 

ASSIGNMENT OF BUSINESS ANALYTICS

Topic: Unstructured Data Analytics

Submitted To: Navpreet Kaur

Submitted By: Vinay Goyal, MBA 2nd Year (2216)


Business analytics (BA) refers to the skills, technologies and practices for
continuous, iterative exploration and investigation of past business performance
to gain insight and drive business planning. Business analytics focuses on
developing new insights and understanding of business performance based on data
and statistical methods. In contrast, business intelligence traditionally
focuses on using a consistent set of metrics to both measure past performance
and guide business planning, which is also based on data and statistical methods.

Examples of BA uses include:

  Exploring data to find new patterns and relationships (data mining)

  Explaining why a certain result occurred (statistical analysis, quantitative analysis)

  Experimenting to test previous decisions (A/B testing, multivariate testing)

  Forecasting future results (predictive modeling, predictive analytics)

Unstructured Data

Unstructured data is a generic label for describing any data that is not
contained in a database or some other type of data structure. Unstructured data
can be textual or non-textual. Textual unstructured data is generated in media
like email messages, PowerPoint presentations, Word documents, collaboration
software and instant messages. Non-textual unstructured data is generated in
media like JPEG images, MP3 audio files and Flash video files.

If left unmanaged, the sheer volume of unstructured data that’s generated

each year within an enterprise can be costly in terms of storage. 


Unmanaged data can also pose a liability if information cannot be located
in the event of a compliance audit or lawsuit. The information contained in
unstructured data is not always easy to locate. It requires that data in both
electronic and hard copy documents and other media be scanned so a
search application can parse out concepts based on words used in specific
contexts. This is called semantic search; in an enterprise setting it is often
referred to as enterprise search.

In customer-facing businesses, the information contained in unstructured

data can be analyzed to improve customer relationship management and

relationship marketing. As social media applications like Twitter and

Facebook go mainstream, the growth of unstructured data is expected to

far outpace the growth of structured data. According to the "IDC

Enterprise Disk Storage Consumption Model" report released in Fall 2009,

while transactional data is projected to grow at a compound annual growth

rate (CAGR) of 21.8%, it's far outpaced by a 61.7% CAGR prediction for

unstructured data.
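The effect of the two growth rates can be worked through with a short calculation. The sketch below only applies the standard compound growth formula to the two CAGR figures quoted above; the starting volume of 100 units and the five-year horizon are assumptions chosen for illustration and do not come from the IDC report.

```python
def project(initial: float, cagr: float, years: int) -> float:
    """Value after `years` of compound growth at annual rate `cagr`."""
    return initial * (1.0 + cagr) ** years


TRANSACTIONAL_CAGR = 0.218  # 21.8%, the figure quoted in the text
UNSTRUCTURED_CAGR = 0.617   # 61.7%, the figure quoted in the text

# Assumption for illustration: both kinds of data start at 100 units.
START = 100.0

for year in range(0, 6):
    t = project(START, TRANSACTIONAL_CAGR, year)
    u = project(START, UNSTRUCTURED_CAGR, year)
    print(f"year {year}: transactional {t:8.1f}  unstructured {u:8.1f}  ratio {u / t:5.2f}")
```

Because growth compounds, the gap between the two volumes widens every year rather than staying constant.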

Unstructured data (or unstructured information) refers to information that

either does not have a pre-defined data model or is not organized in a pre-

defined manner. Unstructured information is typically text-heavy, but may

contain data such as dates, numbers, and facts as well. This results in

irregularities and ambiguities that make it difficult to understand using

traditional computer programs as compared to data stored in fielded form

in databases or annotated (semantically tagged) in documents.

Dealing with unstructured data

Techniques such as data mining, natural language processing (NLP), text
analytics and noisy-text analytics provide different methods to find
patterns in, or otherwise interpret, this information. Common techniques
for structuring text usually involve manual tagging with metadata or
part-of-speech tagging for further text mining-based structuring. The
Unstructured Information Management Architecture (UIMA) provides a common
framework for processing this information to extract meaning and create
structured data about the information.
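As a rough illustration of creating structured data about a piece of text, the sketch below tags a free-text note with a few metadata fields. It is a minimal example using only regular expressions from the Python standard library, not part-of-speech tagging and not UIMA; the function name `structure_text`, the patterns and the sample note are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration; real text analytics needs sturdier rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # ISO-style dates only
WORD_RE = re.compile(r"[a-z']+")


def structure_text(text: str, top_n: int = 3) -> dict:
    """Turn free text into a small structured record of metadata tags."""
    words = WORD_RE.findall(text.lower())
    return {
        "emails": EMAIL_RE.findall(text),
        "dates": DATE_RE.findall(text),
        "word_count": len(words),
        "top_terms": [w for w, _ in Counter(words).most_common(top_n)],
    }


note = "Refund requested on 2019-07-21. Contact jane.doe@example.com about the refund."
record = structure_text(note)
print(record)
```

The resulting record has fixed fields, so it could be stored in a database row alongside the original text.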


The phrase "unstructured data" usually refers to information that doesn't

reside in a traditional row-column database. As you might expect, it's the

opposite of structured data -- the data stored in fields in a database.

Unstructured data files often include text and multimedia content.

Examples include e-mail messages, word processing documents, videos,
photos, audio files, presentations, webpages and many other kinds of
business documents. Note that while these sorts of files may have an

internal structure, they are still considered "unstructured" because the data

they contain doesn't fit neatly in a database.
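An e-mail message is a convenient example of this mix. In the sketch below, which uses Python's standard `email` package, the header fields are regular enough to become database columns, while the body remains free text. The sample message and its addresses are invented for illustration.

```python
from email import message_from_string

# Invented sample message for illustration.
RAW = """\
From: customer@example.com
To: support@example.com
Subject: Order arrived damaged
Date: Sun, 21 Jul 2019 10:15:00 +0000

Hi, the box was crushed and the lamp inside is broken.
Please let me know how to get a replacement.
"""

msg = message_from_string(RAW)

# Header fields are regular and map naturally onto database columns.
row = {name: msg[name] for name in ("From", "To", "Subject", "Date")}

# The body is free text: no fixed column captures what it means.
body = msg.get_payload()

print(row)
print(repr(body))
```

The headers are the "internal structure" of the file; the body is the part that needs text analytics before it can be queried.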

Features of “unstructured” data 

Does not reside in traditional databases and data warehouses

May have an internal structure, but does not fit a relational data model

Generated by both humans and machines

  Textual and multimedia content

  Machine-to-machine communication

Examples include:

  Personal messaging – email, instant messages, tweets, chat

  Business documents – business reports, presentations, survey responses

  Web content – web pages, blogs, wikis, audio files, photos, videos

  Sensor output – satellite imagery, geolocation data, scanner transactions


Implementing Unstructured Data Management

Organizations use a variety of software tools to help them
organize and manage unstructured data. These can include the following:

  Big data tools: Software like Hadoop can process stores of both
unstructured and structured data that are extremely large, very complex
and changing rapidly.

  Business intelligence software: Also known as BI, this is a broad category
of analytics, data mining, dashboards and reporting tools that help
companies make sense of their structured and unstructured data for the
purpose of making better business decisions.

  Data integration tools: These tools combine data from disparate sources so
that they can be viewed or analyzed from a single application. They
sometimes include the capability to unify structured and unstructured data.

  Document management systems: Also called "enterprise content
management systems," a DMS can track, store and share unstructured data
that is saved in the form of document files.

  Information management solutions: This type of software tracks structured
and unstructured enterprise data throughout its lifecycle.

  Search and indexing tools: These tools retrieve information from
unstructured data files such as documents, Web pages and photos.
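To give a feel for what a search and indexing tool does with text files, the sketch below builds a very small inverted index: a mapping from each word to the documents that contain it. This is a simplified illustration in plain Python and not the design of any particular product; the file names and contents are invented, and real tools add features such as stemming, ranking and support for non-text formats.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Invented documents for illustration.
docs = {
    "memo.txt": "Quarterly sales report for the retail division",
    "mail.txt": "Customer complaint about a late retail delivery",
    "page.html": "Sales tips for new customers",
}
index = build_index(docs)
print(search(index, "retail"))
print(search(index, "retail sales"))
```

Building the index once means each later search only looks up a few words instead of rereading every document.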

Unstructured Data Technology

  A group called the Organization for the Advancement of Structured
Information Standards (OASIS) has published the Unstructured Information
Management Architecture (UIMA) standard. The UIMA "defines
platform-independent data representations and interfaces for software
components or services called analytics, which analyze unstructured
information and assign semantics to regions of that unstructured
information."

  Many industry watchers say that Hadoop has become the de facto industry

standard for managing Big Data. This open source project is managed by

the Apache Software Foundation. 


Unstructured Data Analysis

Unstructured data is estimated to represent up to 80% of the data within an
organization. You can use InfoSphere Warehouse to extract structured
information out of previously untapped business text. The business value can
be immense, e.g., enabling fraud detection and better customer profiling.

InfoSphere Warehouse Unstructured Data Analysis augments Dynamic Warehouse with
the ability to extract structured information out of previously untapped
business text and correlate it with structured data to gain business insight.

The InfoSphere Warehouse Unstructured Data Analysis Design Studio tooling is
targeted towards the ETL specialist who uses text analysis in the context of a
larger data warehouse project and who is not an expert on text analysis or the
UIMA framework. It contains a basic set of functions to configure and use a
fixed set of configurable analysis engines that are shipped with the product.
It also provides a function to use (but not modify) third-party analysis
engines that are UIMA 1.4.x compliant.