Final Report (Phase 5 and 6)

University of Indianapolis

A Friend in Rome Data Analysis

CIS 351-01 - Information Systems Management & Unstructured Data

Carla Sikacho

Jeffrey Gardner

Justin Jones

A Friend in Rome Data Analysis 1

Table of Contents

Business Objective
Analytics Objectives
Key Variables
Data Collection and Quality
Competence Test
Results
    Overview of all the Data
        Diagram O1
        Diagram O2
    Caption of Feedback
    Demographics
    Reviews Concepts Tree
        Positive Concept Tree
        Complete Concepts Tree
    Negative Concepts
        Diagram N1
        Diagram N2
    Positive Concepts
        Diagram P1
        Diagram P2
    Unknown Concepts
        Diagram U1
        Diagram U2
    Country of Origin Distribution
        Diagram C1
        Diagram C2
Reflections
Recommendations
References

Business Objective

Our primary objective is to give A Friend in Rome feedback that will enable it to provide tourists with a genuine and unparalleled touring experience when they visit Rome. This should increase the company’s popularity and, ultimately, make it more profitable. We will look at the data available on TripAdvisor, which includes both textual and numerical data. Through text, data, and predictive analytics, we aim to identify the areas in which customers are satisfied and the areas in which they are dissatisfied. In addition, we aim to identify the most popular destinations and the countries from which travelers are coming. By establishing where the company is succeeding and where it is encountering drawbacks, we can recommend ways in which it can improve moving forward, making A Friend in Rome the premier choice for all travelers. As requested by Silvia from A Friend in Rome, we will also seek ways to entice returning visitors to Rome into using the company’s services. Its current clients are primarily first-time visitors, and she would like more second-time visitors to use the service.

Analytics Objective

Through text mining, we will examine the reviews on TripAdvisor and look for trends in customer satisfaction and dissatisfaction. We will also be able to see which places account for the majority, and which for a minority, of A Friend in Rome’s past customers. Through data mining, we will identify the most reliable reviews by assessing the quality of each reviewer based on his or her reviewing history. We will also see how A Friend in Rome has performed on the five-point scale that TripAdvisor uses to evaluate companies. We will use predictive analysis to determine whether there are popular times of the year for people to use the services provided by A Friend in Rome. We can also use the data to determine where the majority of A Friend in Rome’s customers are coming from and, from this, predict where they will likely continue to come from.

Key Variables in Our Analysis

Rating: This variable is the reviewer’s rating, shown as a number of circles out of 5.

1 - Terrible

2 - Poor

3 - Average

4 - Very Good

5 - Excellent

This is key because it gives us a score for how A Friend in Rome performed. Based on this score, we can then look at the review to find key words and the reason or reasons why the reviewer gave that rating. A Friend in Rome can then use this information to tailor its future tours and advertising in a way that will hopefully attract more customers.

Country: This variable identifies the country the reviewer is from. This is key because it gives us an idea of where the majority of A Friend in Rome’s customers are coming from. With this information, we can then set up marketing schemes tailored to specific areas.

Review: This variable contains the reviewer’s detailed description of his or her experience with A Friend in Rome. This is key because it tells us exactly what was great or poor about that experience. This may be our most important variable because it identifies specifically what the strengths of A Friend in Rome are, so that the company can continue and enhance those aspects. It also identifies the ways in which A Friend in Rome can improve.
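As an illustration only, the three key variables can be pictured as one record per review. The sketch below is written in Python, which was not one of the tools used in this analysis; the names Review and RATING_LABELS and the two sample rows are hypothetical and are not taken from the TripAdvisor data.

```python
from dataclasses import dataclass
from typing import Optional

# Labels for the five-point scale, as listed in this report.
RATING_LABELS = {1: "Terrible", 2: "Poor", 3: "Average", 4: "Very Good", 5: "Excellent"}


@dataclass
class Review:
    """One review row holding the three key variables (hypothetical layout)."""
    rating: int             # 1 to 5 circles
    country: Optional[str]  # None when the reviewer gave no country
    text: str               # the written review

    def __post_init__(self):
        # Reject ratings outside the five-point scale.
        if self.rating not in RATING_LABELS:
            raise ValueError(f"rating must be 1-5, got {self.rating}")

    @property
    def rating_label(self) -> str:
        return RATING_LABELS[self.rating]


# Made-up sample rows, for illustration only.
sample = [
    Review(5, "United States", "Excellent tour, would recommend."),
    Review(4, None, "Very helpful guide."),
]
average_rating = sum(r.rating for r in sample) / len(sample)
print(sample[0].rating_label, average_rating)
```

Keeping the rating, country, and review text together in one record makes it straightforward to move from a score to the wording that explains it.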

Data Collection and Quality

All of the data used in our analysis came from the reviews of A Friend in Rome on TripAdvisor. This data was manually extracted and exported into an Excel spreadsheet. Of the data we collected, a number of variables did not end up being as useful as we had originally anticipated. For this reason, we have focused our analysis on the key variables identified in the Key Variables section of this report. We were provided access to A Friend in Rome’s Facebook profile; however, the data there consisted primarily of the company’s own posts and did not give us any insight into its customers. Thus, we used only the information available on TripAdvisor.

Competence Test

For our text analytics, we took the time to review the concepts classified as positive and negative to confirm that they were in fact positive and negative. We discovered that some of the results IBM SPSS Modeler classified as negative did not actually pertain to the performance of A Friend in Rome. Just under half of our data was reported as unknown. This is because much of the substance of the reviews was neither positive nor negative but consisted of filler words that were not of great value to our analysis.

Results

Overview of All the Data:

Diagram O1

Diagram O1 depicts the underlying terms of selected concepts. The system selected 75 concepts for scoring out of a total of 215. This selection is based on the dictionary chosen at an early stage of this analysis. As shown, the word excellent was identified 93 times, in 5.9% of the documents. The same word occurred in 84% of the documents in which positive concepts were considered.
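The two kinds of figures quoted for a concept, a raw occurrence count and a share of documents, can be computed as in the minimal Python sketch below. It is not how IBM SPSS Modeler works internally; the function name concept_stats and the four sample sentences are hypothetical.

```python
import re


def concept_stats(documents, term):
    """Return (total occurrences, documents containing the term, share of documents)."""
    # Match the term as a whole word or phrase, ignoring case.
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    occurrences = 0
    docs_with_term = 0
    for doc in documents:
        hits = len(pattern.findall(doc.lower()))
        occurrences += hits
        if hits:
            docs_with_term += 1
    share = docs_with_term / len(documents) if documents else 0.0
    return occurrences, docs_with_term, share


# Made-up sentences, for illustration only.
sample_docs = [
    "Excellent guide and an excellent day.",
    "We had a good time.",
    "Would recommend, excellent tour.",
    "Too late to book.",
]
print(concept_stats(sample_docs, "excellent"))
```

The occurrence count and the document share answer different questions: a word repeated within one review raises the first figure but not the second.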

Diagram O2

Diagram O2 is an interactive workbench that IBM SPSS Modeler produces to display data in an understandable form. There are four segments in this diagram; we will first focus on the top left portion, which shows the scope of the data provided. The data contains concepts framed into categories around family, trips, and Southern Europe. Next, the workbench displays the categories in more detail and presents this data in a category web. You will notice that 26 documents mention Southern Europe and trip or guide, 18 mention family or family structure, and another 14 use the word trip.

Moving to the bottom half of Diagram O2, we can discuss the 17 types of associations. The first point to note is the importance of the “unknown” type: words without any meaning for this interpretation of the data. This is pertinent in itself and leads to additional discoveries related to the data. 57% of the entire dataset contained words such as friend, husband, and time, which have no significance in this analysis. Positive content made up 34% of the entire dataset, and words and phrases such as recommended, pleasure, and the best help determine an unambiguous interpretation. It is also worth noting that IBM SPSS Modeler recognizes words such as Rome and Vatican and associates these terms with location. Location makes up 16% of the entire dataset. Most importantly, the software assigns a negative connotation to words and phrases like difficult and too late. These terms are not interpreted in the context in which they were used in the review, which produces a false negative, that is, a negative label for text that is not actually critical of the company. Such terms occur in only 4% of the entire dataset.
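The way a context-free dictionary lookup produces this kind of false negative can be shown with a small Python sketch. This is a simplified stand-in and not IBM SPSS Modeler’s actual extraction logic; the LEXICON contents are drawn from the example words in this report, while the function name tag_concepts and the sample sentence are hypothetical.

```python
import re

# Tiny type dictionary built from example words mentioned in this report.
LEXICON = {
    "positive": ["would recommend", "recommended", "excellent", "helpful", "pleasure"],
    "negative": ["too late", "difficult", "wrong"],
    "location": ["rome", "vatican"],
}


def tag_concepts(text):
    """Return {type: [matched terms]} using whole-word lookup with no context."""
    lowered = text.lower()
    found = {}
    for label, terms in LEXICON.items():
        hits = [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)]
        if hits:
            found[label] = hits
    return found


# Made-up sentence: the reviewer is praising the guide, yet "wrong" is tagged negative.
sample_sentence = "It was wrong of us not to book Silvia for our whole stay in Rome."
print(tag_concepts(sample_sentence))
```

Because the lookup sees only the word and not the sentence around it, a manual review of each negative hit, like the one our team carried out, is still needed.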

Caption of Who Provided Feedback:

This is a graph board showing who provided the feedback our team analyzed. This section identifies the status of each reviewer. As you may notice, reviewers with Jr status provided the most reviews. Jr status indicates that the reviewer has provided fewer than three reviews. Very few reviewers have no reviewer status.

Demographics:

This is a visual representation of the gender ratio of those who provided an assessment of A Friend in Rome. Over 70 women and about 30 men commented on the services performed. Nearly 60 people did not provide their gender.
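To put these counts in proportion, the short Python calculation below converts them to shares. The counts are the rounded figures quoted above (“over 70”, “about 30”, “nearly 60”), not exact values from the dataset, so the resulting percentages are approximate.

```python
# Rounded counts taken from the text above; the exact values are not stated.
approx_counts = {"women": 70, "men": 30, "not provided": 60}

total = sum(approx_counts.values())
shares = {group: n / total for group, n in approx_counts.items()}

# Share of women among reviewers who did state a gender.
stated = approx_counts["women"] + approx_counts["men"]
women_among_stated = approx_counts["women"] / stated

for group, share in shares.items():
    print(f"{group}: {share:.1%}")
print(f"women among those who stated a gender: {women_among_stated:.0%}")
```

On these rounded figures, more than a third of reviewers gave no gender, so any gender-based conclusion rests on roughly 100 reviews.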

Reviews - Original concepts excellent tree:

This graphic reflects how IBM SPSS Modeler defined what it considered to be positive concepts. Based on concepts around customer satisfaction, the software successfully extracted words and phrases such as helpful, like, would recommend, and excellent. It found these words to be similar and associated them with a positive connotation. It also associated words like tour, Rome, and Vatican with location. This left many terms, such as history, guide, and day, with no significance in this particular analytic process.

Reviews - Positive concept excellent’s tree map:

The decision tree above shows the words used to determine a positive correlation, which also served as the basis of the positive concepts in preparing this report. Excellent is the strongest positive concept, and words and phrases such as would recommend, helpful, and like are correlated with it.

Reviews - Complete concepts tree:

This graphic reflects how IBM SPSS Modeler defined what it considered to be negative concepts. Based on concepts around customer satisfaction, the software identifies a single word as having a negative connotation: wrong. Our team found that this negative was taken out of context.

Reviews - Negative concepts:

Diagram N1

Diagram N2

The graphics in red in Diagram N1 show the negative concepts and the small amount of content within this dataset that has a negative connotation. IBM SPSS Modeler identified 368 concepts, but these words do not reflect the performance of A Friend in Rome; rather, they describe other events leading up to the travelers’ experience. Our team kept these words true to the context in which they were used. The highlighted word “wrong” is actually associated with “mistake,” and in this particular instance the “mistake” was the tourist not hiring A Friend in Rome for the entire visit. We understand that these are the types of customers Silvia would like to use her services. This is evident in Diagram N2.

Reviews - Positive concepts:

Diagram P1

Diagram P2

The graphics displayed in green in Diagram P1 show the positive concepts and the amount of content within this dataset that has an unambiguously positive connotation. IBM SPSS Modeler identified 158 concepts, and these words and phrases directly reflect the performance of A Friend in Rome. Our group was able to keep these words true to the context in which they were used. The highlighted phrase “would recommend” comes from a tourist stating that they would most definitely recommend Silvia, in particular, to avoid group tours and “experience the sites of Rome with an articulate, amicable and informed guide.” Again, this identifies the experience that A Friend in Rome aims to offer. This is evident in Diagram P2.

Reviews - Unknown concepts:

Diagram U1

Diagram U2

The graphics in gold in Diagram U1 show the unknown concepts and the much larger amount of content within this dataset that has an unknown connotation. IBM SPSS Modeler identified 89 concepts, but these words cannot be tied to the performance of A Friend in Rome; rather, they are inconclusive in describing the traveler’s experience. Our team kept these words true to the context in which they were used. The highlighted word “tour” comes, in this particular instance, from a reviewer mentioning that Silvia was “courteous and prompt with responses, and the value of the tours she offered blew everyone else away.” We understand that responses like this are favorable to A Friend in Rome, but IBM SPSS Modeler classifies them as unknown. This is shown in Diagram U2.

Country of Origin Distribution:

Diagram C1

Diagram C2

The graphs above were produced in IBM Cognos (Diagram C1) and IBM SPSS Modeler (Diagram C2). We looked at which countries A Friend in Rome’s customers are coming from. Both tools showed that just over 50% of the people who wrote reviews on TripAdvisor came from the United States of America. This information is very useful, as A Friend in Rome will now be able to design its marketing in a way that appeals to Americans in order to increase its clientele. The company could also direct its marketing to places where Americans typically vacation or pass through. An example would be an airport terminal where flights from America arrive. A Friend in Rome could also reevaluate its current marketing scheme to explore why it is not getting more customers from other countries. The blue section without a country name in Diagram C2 is the percentage of reviewers who did not provide a country of residence.
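A country-of-origin distribution like the one in Diagrams C1 and C2 can be tallied in a few lines of Python. This is only a sketch of the calculation, not the Cognos or SPSS Modeler procedure; the function name country_shares, the “Not provided” label, and the sample list are hypothetical.

```python
from collections import Counter


def country_shares(countries):
    """Return {country: share of reviewers}, most common first.

    Missing or blank entries are grouped under "Not provided".
    """
    labels = [c.strip() if c and c.strip() else "Not provided" for c in countries]
    counts = Counter(labels)
    total = len(labels)
    return {country: count / total for country, count in counts.most_common()}


# Made-up country column, for illustration only.
sample_countries = [
    "United States", "United States", "Canada", None,
    "United States", "", "Australia", "United States",
]
for country, share in country_shares(sample_countries).items():
    print(f"{country}: {share:.1%}")
```

Grouping blank entries under their own label keeps reviewers who gave no country visible in the totals instead of silently dropping them.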

Reflection

We found success with our text mining, as it provided us with a great deal of positive feedback. It is apparent that A Friend in Rome has done a great job of serving the customers it has had thus far. This is encouraging; however, it would have been helpful if some of the reviews had provided constructive feedback instead of constant compliments. We were also successful with our data analysis, particularly with regard to the countries. It provided us with very useful trends for recommending marketing strategies for A Friend in Rome. Unfortunately, we did not have the data needed to discover ways in which A Friend in Rome can acquire repeat customers. To do the predictive analysis, we would have needed data on customers who have already been repeat customers; we could then have looked at those customers specifically and found the trends that led to them becoming repeat customers. Thus, we did not completely achieve our objective; however, we did make some intriguing discoveries.

Recommendations for Future Data Collection

We recommend collecting as much data about your customers as you can while you interact with them. When we analyzed your Facebook page, it offered little in the way of feedback from previous clients. One possible avenue for collecting data, as TripAdvisor has done, would be to develop a mobile application for Android and iPhone devices for A Friend in Rome. This could create an incredible platform for tracking and monitoring users and for advertising based directly on their activities as captured in this data. Because all of the data we collected came from TripAdvisor, it does not reflect the entirety of your customers. Only certain types of people use and write reviews on websites like TripAdvisor. As such, collecting your own data firsthand would yield higher-quality and more comprehensive data. Examples of data that would be useful to know include where exactly your customers are coming from, where they are staying while in Rome, the purpose of their travels, and their age. These are just a few details that would be of great benefit to future analysis of A Friend in Rome.
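One possible shape for such firsthand records is sketched below in Python. Every field name is hypothetical, and the visited_rome_before flag is our own assumption, added because the repeat-visitor question could not be answered from the TripAdvisor data; the three sample rows are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomerRecord:
    """One customer, as collected firsthand (hypothetical layout)."""
    home_city: str
    home_country: str
    lodging_area: Optional[str]   # where the customer is staying in Rome
    trip_purpose: Optional[str]   # e.g. leisure, business, study
    age: Optional[int]
    visited_rome_before: bool = False  # assumed field for repeat-visitor analysis


def repeat_visitor_share(records):
    """Share of customers who had visited Rome before (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(r.visited_rome_before for r in records) / len(records)


# Made-up records, for illustration only.
records = [
    CustomerRecord("Indianapolis", "United States", "Trastevere", "leisure", 34),
    CustomerRecord("Toronto", "Canada", None, "leisure", 52, visited_rome_before=True),
    CustomerRecord("Chicago", "United States", "Monti", "study", None),
    CustomerRecord("Sydney", "Australia", "Prati", "leisure", 41),
]
print(f"repeat visitors: {repeat_visitor_share(records):.0%}")
```

With a flag like this recorded for every customer, future analysis could compare first-time and returning visitors directly.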

References

"Cognos Insight." IBM. Web. 26 Apr. 2015. <http://www03.ibm.com/software/products/en/cogninsi>.

"IBM SPSS Modeler Features." IBM SPSS Modeler Features. Web. 26 Apr. 2015. <http://www-01.ibm.com/software/analytics/spss/products/modeler/features.html>.