Unstructured BI in pharmaceutical company

K6221 Business Intelligence 2011-2012 Mini Assignment

Sesagiri Raamkumar Aravind (G1101761F)

Mane Shivaji Dilip Kumar (G1101841A)

“Enterprises today have access to large amounts of information from internal as well as external sources. The information typically comes in either structured or less structured forms. However, enterprises generally do not make the best use of the information they have access to, tending instead to focus on just internal structured data generated by core transactional systems.”

Statement Elucidation

As per the problem statement, although enterprises have access to a plethora of relevant information, they make good use only of the data coming from traditional OLTP systems, which is restricted to structured content. Internal and external unstructured data is not leveraged for making business decisions. Wittles (n.d.) asserts that only 20% of an organization’s data is structured and ready for use in BI data analysis; the remaining 80% is unstructured. The significance of unstructured data is therefore highly underestimated in most enterprises.

Scenario

The authors critically discuss the problem statement through a particular scenario: the “Marketing Director” of a major pharmaceutical company monitoring the performance of a newly launched potential blockbuster drug in the Asia Pacific region (excluding Japan).

Discussion

Large enterprises today rely on enormous and complicated information systems to fuel their growth and to support their daily operations and sustainability. In certain companies, the amount spent on such systems reaches billions. In our scenario, pharmaceutical companies have to rely on unstructured data to stay ahead of competitors, as studies show that the average company makes decisions based on data that is 14 months old. It has become clear that companies that can make faster decisions will lead their particular market. Strategic adoption of IT systems is critical because it has a direct impact on the research, development and sales of drugs (Dave). Enterprises have reached a stable stage in setting up BI infrastructure that can handle internal data extracted from different sources such as ERP and CRM systems. Enterprise data warehouses (EDW) are updated daily with transactional data coming from different regions. Data from the EDW cascades to region- or domain-specific data marts and operational data stores (ODS) to meet local reporting needs. Overall, the EDW provides a good foundation for the transactional and historical reporting needs of MIS, ESS and DSS systems.

A product launch is a make-or-break event for a pharmaceutical company, which is under pressure to generate revenue through short-term and long-term strategies in order to fund further R&D activities. A marketing director cannot afford to rely entirely on transactional data for making sound business decisions, which are made to increase the visibility and saleability of the new drug in a particular market. As part of the job, the marketing director would expect to get information about several different aspects. Table 1.1 provides the details.


Sl.No | Information | Source | Type | Readily Available | Remarks
1 | Sales of drug in each market (split-up by day, region, distributor etc.) | Internal | Structured | Y | Assumption that internal DSS has data from all markets at required frequency
2 | Marketing cost in each market (by media); this includes free samples | Internal | Structured | Y | Assumption that internal DSS is integrated with CRM systems
3 | Perception about the drug from doctors, sales personnel, marketing staff, other internal staff and general public | Internal and External | Unstructured | N | Can be got only after collation from different sources
4 | Market share of new drug by value and volume by each market in comparison to other competitor drugs from same therapeutic area | External | Structured | N | Can be got at end of every quarter from market intelligence firms such as IMS
5 | Actuals vs Budget and Actuals vs Forecast comparison by each market | Internal | Structured | Y | Assumption that internal DSS has data from all markets at required frequency
6 | Details about dept level decisions recorded in documents | Internal | Unstructured | Y and N | 'Y' because readily available in repository and 'N' because not in integrated state

Table 1.1: Valuable information for pharmaceutical company during drug launch

It is clear that information about some important aspects is in unstructured form. Examples of unstructured data in an enterprise are HTML content (e.g. web chat, blogs and web pages), documents (e.g. memos, research papers, MoMs and articles), forms (e.g. patent applications), emails, SMS content and multimedia content (audio, video, images) (Ferguson, 2011; McCallum, 2005; SPSS, 2003).

Decision makers in a company have to rely on facts to make sound business decisions, and the availability of sufficient and timely facts helps in the process. In this case, the Marketing Director should be able to pull the required data, and the system should also have a mechanism to push specific information. A distinction is made between data and information because only information should be pushed to a user, who will not have time to analyze plain facts without any context. Typical examples applicable to this case are listed below.

Pull data: sales and expenses data, market share, and supply chain inventory data.

Push information: supply chain deficiencies, summarized content from analytics systems on sentiment and opinion about the new drug from internal and external social media platforms, flash updates on sales, and libel cases on the new drug from the FDA and other sources.
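
The pull/push distinction above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sales figures, the function names pull_sales and push_alerts, and the rule that a day-on-day drop of more than half triggers a flash update. It is not taken from any real BI product.

```python
from dataclasses import dataclass

# Hypothetical daily sales records: (market, day, units_sold).
SALES = [
    ("Singapore", "2012-01-09", 120),
    ("Singapore", "2012-01-10", 30),
    ("India", "2012-01-09", 300),
    ("India", "2012-01-10", 310),
]


def pull_sales(market):
    """Pull: the user asks for plain data on demand."""
    return [(day, units) for m, day, units in SALES if m == market]


@dataclass
class Alert:
    market: str
    message: str


def push_alerts(drop_threshold=0.5):
    """Push: the system volunteers information when a rule fires.

    Illustrative rule (an assumption): flag a market whose latest daily
    sales fell below drop_threshold times the previous day's sales.
    """
    alerts = []
    for market in sorted({m for m, _, _ in SALES}):
        history = sorted(pull_sales(market))  # ISO dates sort chronologically
        if len(history) < 2:
            continue
        (_, previous), (day, latest) = history[-2], history[-1]
        if previous > 0 and latest < drop_threshold * previous:
            pct = round(100 * (1 - latest / previous))
            alerts.append(Alert(market, f"Sales down {pct}% on {day}"))
    return alerts
```

Here pull_sales returns raw rows that the user still has to interpret, whereas push_alerts delivers a short message that already carries its context, which is the sense in which only information, not data, should be pushed.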

Push information is mostly unstructured, which justifies its importance. The characteristics of unstructured data are visibly and intrinsically different from those of transactional data. The differentiating factors mainly relate to representation, source, context, understandability, timeliness and shelf-life. In general, the characteristics of unstructured data are:



- Does not reside in relational database tables.
- Has no predefined structure or format.
- Not arranged in any order.
- Difficult to categorize for use in BI.
- Resides in several documents over multiple sources:
  - Internal (data within an organization)
  - External (data outside the organization)

These characteristics make it difficult for technical personnel to store and catalog unstructured data in an Enterprise Data Warehouse (EDW), in addition to the inherent difficulty of capturing the required data. The heterogeneous nature of the sources adds to the complexity. Typical sources of unstructured data include email archives, call center transcripts, customer feedback databases, enterprise intranets, enterprise content management systems, file systems, document management systems, social networking sites and RSS newsfeeds (Ferguson, 2011, p. 6).

Techniques exist for capturing and utilizing unstructured data. Crawlers can be used to capture relevant information from the enterprise data ecosystem, social media sites and the WWW. The captured information is then tagged and indexed for retrieval. The final stage is knowledge discovery, which involves text mining and web mining (popularly called content analytics) to derive insight for business benefit.
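
The tag, index and mine stages described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the crawling stage is skipped and three hard-coded snippets stand in for crawler output, the tag vocabulary TAGS is invented, and "mining" is reduced to counting how many documents carry each tag. Real content analytics tools do far more.

```python
import re
from collections import Counter, defaultdict

# Hypothetical captured snippets (stand-ins for crawler output).
DOCS = {
    "email-1": "Doctors report the new drug works well with few side effects.",
    "blog-7": "Patients complain about nausea and other side effects of the drug.",
    "memo-3": "Distributor stock is low in the region; supply must improve.",
}

# Hypothetical tag vocabulary: tag -> trigger words.
TAGS = {
    "safety": {"nausea", "side", "effects"},
    "supply": {"stock", "supply", "distributor"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tag_document(text):
    """Tagging: attach every tag whose trigger words appear in the text."""
    words = set(tokenize(text))
    return sorted(tag for tag, triggers in TAGS.items() if words & triggers)


def build_index(docs):
    """Indexing: map each word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def tag_counts(docs):
    """Mining (very simplified): count how many documents carry each tag."""
    return Counter(tag for text in docs.values() for tag in tag_document(text))
```

With this sample, looking up a word in the index returns the documents that mention it, and the tag counts give a crude picture of which themes dominate the captured content.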

An ideal BI system should provide the ability to create enterprise mashups. Mashups integrate information and functionality from different sources to create new services. Such applications suit agile development projects and therefore fit our scenario, in which data from different sources is examined together to support decisions. However, there are a few challenges: choosing the right information sources among unstructured data and devising content-sifting mechanisms are some known ones. Mashups are an emerging trend that is here to stay, as they provide a one-stop shop for decision makers.
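
As a rough illustration of the mashup idea, the Python sketch below combines a structured feed (units sold per market) with an unstructured feed (free-text comments) into one view per market. The data, the tiny POSITIVE and NEGATIVE word lists and the function names are all assumptions made for this example; a word-list count is only a crude stand-in for real sentiment analysis.

```python
# Hypothetical structured feed, e.g. from the data warehouse: units sold per market.
SALES_BY_MARKET = {"India": 610, "Singapore": 150}

# Hypothetical unstructured feed: (market, free-text comment).
COMMENTS = [
    ("India", "Very effective, patients are happy"),
    ("India", "Good results so far"),
    ("Singapore", "Bad side effects reported"),
    ("Singapore", "Poor availability, but effective"),
]

# Tiny illustrative word lists, not a real sentiment lexicon.
POSITIVE = {"effective", "happy", "good"}
NEGATIVE = {"bad", "poor"}


def sentiment_score(text):
    """Count positive words minus negative words in one comment."""
    words = [w.strip(",.").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def mashup():
    """Join the two feeds on market to give one combined view per market."""
    view = {}
    for market, units in SALES_BY_MARKET.items():
        scores = [sentiment_score(c) for m, c in COMMENTS if m == market]
        view[market] = {
            "units_sold": units,
            "net_sentiment": sum(scores),
            "comments": len(scores),
        }
    return view
```

The point of the sketch is the join itself: neither feed alone shows that a market with lower sales also has weaker sentiment, but the combined view puts both facts side by side for the decision maker.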

Future considerations for handling unstructured data

- Ensure that user content is accurately tagged.
- Ensure that content is up to date and relevant.
- Validate content sources.
- Identify business drivers to get the best solution.
- Allocate adequate processing power to analytics to address scalability issues.

Figure 1 gives a pictorial representation of the current usage of BI in pharmaceutical companies and of the neglected blue-ocean segment of unstructured data BI.


Fig 1: Usage of Business Intelligence in a pharmaceutical company

Conclusion

Enterprises are aware of the importance of unstructured data today, but they fail to leverage it because of technical (capturing and storing) and logical (classification and integration) constraints. This situation is bound to improve with best practices and simpler technical processes. Investment in content analytics and enterprise mashups should pay off in the long run.

References

Wittles, G. (n.d.). Unstructured data offers a vast store of untapped BI value. Retrieved from http://www.themanager.org/strategy/Unstructured_data.htm

Dave, W. (n.d.). Unstructured data in life sciences. Retrieved from http://blogs.hds.com/storagestat/2011/11/unstructured-data-in-life-sciences.html

Ferguson, M. (2011). Integrating and analyzing unstructured data. Info 360 BI Conference. Washington DC.

McCallum, A. (2005). Information extraction. Retrieved 17 February 2011 from http://people.cs.umass.edu/~mccallum/papers/acm-queue-ie.pdf

SPSS. (2003). Meeting the challenge for text: Making text ready for predictive analysis. Chicago.

Grimes, S. (n.d.). Nimble intelligence: Enterprise BI mashup best practices. Retrieved from http://www.jackbe.com/downloads/nimblebi_grimes.pdf
