Hyatt Hotel Group Project

IST 659 PROJECT IMPLEMENTATION REPORT: GROUP-1

Project Implementation Report:

Net Promoter Score (NPS) Analysis

Hyatt Hotel Group

By Group-1:

ERIK BEBERNES

AVINASH BHAMBANI

WANER LI

APURVA PATIL

SHRADDHA RAO

RITWICK CHATTERJEE

SEEPANA MOHIT RAO


Table of Contents

Introduction

Major Data Questions, Methods & Justifications

Results, Interpretation and Recommendations

Other Eminent Analysis Performed

Reflections


INTRODUCTION:

The purpose of this project is to analyze the Grand Hyatt Hotel data set and produce reliable, actionable insights for Hyatt representatives. The data set includes Net Promoter Score (NPS) values for hotels in many locations around the world. NPS measures “customer experience and predicts business growth.” The metric has become a common way for businesses to gauge customer satisfaction.

The NPS calculation places each customer response in one of three categories: Promoters, Passives and Detractors.

● Promoters give a likelihood-to-recommend score of 9-10. They are loyal enthusiasts who keep coming back to the hotel and refer others, promoting business growth.

● Passives give a score of 7-8. They are satisfied but unenthusiastic customers who may be looking for better offerings.

● Detractors give a score of 0-6 and are unsatisfied with the hotel’s services. These customers harm the business because they may give the hotel negative publicity and steer potential customers away.
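These categories feed the standard NPS formula: the percentage of promoters minus the percentage of detractors, which gives a score between -100 and 100. Below is a minimal base-R sketch of that calculation. The function names `nps_type()` and `nps()` and the sample scores are hypothetical; they are not taken from the project code.

```r
# Classify a 0-10 likelihood-to-recommend score into an NPS category
nps_type <- function(score) {
  ifelse(score >= 9, "Promoter",
         ifelse(score >= 7, "Passive", "Detractor"))
}

# NPS = % promoters - % detractors; passives count only in the denominator
nps <- function(score) {
  score <- score[!is.na(score)]   # drop missing responses
  type <- nps_type(score)
  100 * (mean(type == "Promoter") - mean(type == "Detractor"))
}

ltr <- c(10, 9, 9, 8, 7, 6, 3, 10, NA, 10)  # hypothetical scores
table(nps_type(ltr[!is.na(ltr)]))  # Detractor 2, Passive 2, Promoter 5
nps(ltr)                           # (5 - 2) / 9 * 100 = 33.3
```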

The data set contained 13 months of data, divided by month, from Feb 2014 to Jan 2015. Given the size and breadth of the data set, we focused only on the February 2014 data. Although this is a small sample for the analysis, we had to limit ourselves to it because of the time constraints of the course.

As a team of 7 members, we held frequent meetings on topics such as the major data questions, data cleansing, grouping columns, model feasibility, visualizations and actionable insights. The first step was to formulate data questions, which we did by repeatedly revisiting the purpose of the project. Each of us proposed questions that needed to be answered to obtain actionable insights; we then pooled them and removed any that were repetitive or irrelevant. Next, everyone chose 2-3 data questions to analyze. Because there were 7 of us, we further split into groups of 2-3 to work on the relevant questions.

Once everyone had finished their analysis, we met again to review the methods used and to discuss other ways to analyze the data. We then incorporated all relevant improvements and settled on the most reliable approach. We also turned our attention to data visualization, to present the data in a form useful to Hyatt officials. These visualizations included geographic maps with different indicators, bar graphs and other charts that make the analysis and its findings easier to understand.


Major Data Questions, Methods & Justifications:

● Identifying customer patterns on Valentine’s Day and the overall tendency of customers across the United States

Method Adopted: First, we made a data frame with only the postal code and arrival date columns, and then created a table of arrival frequencies for each zip code on February 7th and February 14th only. Since the table had the same number of rows for each date, we split it into one table for 2/7 (the first 317 rows) and another for 2/14 (the last 317 rows). We used the cbind function to combine the columns of these two tables, so that the frequency of each zip code for both dates appeared in separate columns, and then cleaned the zip codes using the zip code package. Finally, we created a new column called percent that divides the 2/14 frequency by the 2/7 frequency. We used the values in this column to make the graph (ggplot).

Justification: A simpler version of this analysis would plot the number of check-ins for each zip code, but the result would be skewed: larger hotels in more populated areas naturally have more check-ins per day than smaller hotels. That is why we calculated the relative change (Valentine’s Day arrivals divided by arrivals one week earlier) instead of using raw counts.
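The arrival comparison can be sketched in base R as below. This illustrates the idea under stated assumptions and is not the team’s script. The toy `arrivals` data frame and its `zip`/`arrival` columns are hypothetical. The sketch joins the two dates with `merge()` by zip code rather than splitting rows and using `cbind`, so it does not depend on both tables being in the same row order. It also shows how the ratio relates to a percent change.

```r
# Hypothetical arrivals: one row per stay (hotel zip code, arrival date)
arrivals <- data.frame(
  zip     = c("02116", "02116", "02116", "60601", "60601", "60601", "60601", "94105"),
  arrival = as.Date(c("2014-02-07", "2014-02-14", "2014-02-14",
                      "2014-02-07", "2014-02-07", "2014-02-14", "2014-02-14",
                      "2014-02-07")),
  stringsAsFactors = FALSE
)

# Count arrivals per zip code on one date
count_on <- function(day) {
  d <- arrivals[arrivals$arrival == as.Date(day), ]
  as.data.frame(table(zip = d$zip), stringsAsFactors = FALSE)
}
feb7  <- count_on("2014-02-07")
feb14 <- count_on("2014-02-14")

# Join by zip code instead of relying on row order (inner join)
both <- merge(feb7, feb14, by = "zip", suffixes = c("_feb7", "_feb14"))

# Ratio used in the report, and the equivalent percent change
both$ratio      <- both$Freq_feb14 / both$Freq_feb7
both$pct_change <- 100 * (both$ratio - 1)
both
```

Zip codes are kept as character strings so leading zeros survive. Zip codes with no arrivals on one of the two dates (here 94105) are dropped by the inner join.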

● Identifying the types of guests that are more likely to be promoters

Method Adopted: To start our demographics analysis, we ran an association rules model with guest country, guest state, gender, age range, purpose of visit, language and likelihood to recommend as factors. With support set at .3 and confidence set at .7, the model returned 23 rules. However, almost all of the rules involved English as the language and the United States as the country. We suspected this was because most of the data came from the United States, so we ran a summary of the countries and found that 55,585 of our 68,455 observations (about 81%) were from the U.S. To develop more interesting rules, we removed Country and Language and replaced Likelihood to Recommend with NPS type, which has three levels instead of the full 0-10 scale while carrying essentially the same information. After further adjusting support and confidence, we obtained 31 rules that gave us interesting insight into which guests are most likely to be promoters.

Justification: We chose association rules because they are a reliable way to identify strong relationships among multiple factors simultaneously.
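To make the support and confidence thresholds concrete, here is a base-R sketch of the three measures reported for a single rule. It is not the `arules` implementation; the toy `guests` data and the helper `rule_metrics()` are hypothetical. Support is the share of rows matching both sides of the rule, confidence divides that by the share matching the left-hand side, and lift divides the confidence by the baseline share of the right-hand side.

```r
# Hypothetical guest records (levels are illustrative only)
guests <- data.frame(
  age      = c("56-65", "56-65", "26-35", "56-65", "36-45", "26-35"),
  purpose  = c("Leisure", "Leisure", "Business", "Leisure", "Business", "Leisure"),
  nps_type = c("Promoter", "Promoter", "Passive", "Promoter", "Detractor", "Promoter"),
  stringsAsFactors = FALSE
)

# Support, confidence and lift of the rule lhs => rhs; each side is a named
# list of column = value conditions that must all hold
rule_metrics <- function(df, lhs, rhs) {
  matches <- function(cond) {
    Reduce(`&`, lapply(names(cond), function(col) df[[col]] == cond[[col]]))
  }
  in_lhs <- matches(lhs)
  in_rhs <- matches(rhs)
  support    <- mean(in_lhs & in_rhs)   # share of rows with both sides
  confidence <- support / mean(in_lhs)  # P(rhs | lhs)
  lift       <- confidence / mean(in_rhs)  # relative to the baseline rate of rhs
  c(support = support, confidence = confidence, lift = lift)
}

m <- rule_metrics(guests,
                  lhs = list(age = "56-65", purpose = "Leisure"),
                  rhs = list(nps_type = "Promoter"))
m  # support 0.5, confidence 1, lift 1.5

# Keep the rule only if it clears the report's initial thresholds
keep <- m[["support"]] >= 0.3 && m[["confidence"]] >= 0.7
```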


● Finding correlations between nightly rate, length of stay, purpose of visit and likelihood to recommend

Method Adopted: We analyzed these variables visually, using a scatterplot with nightly rate on the x-axis and length of stay on the y-axis. The shape of each dot identified the purpose of visit, and likelihood to recommend was shown as a color gradient.

Justification: A graph was the easiest way to spot trends among three numeric variables and one factor at the same time. As with a supply-demand graph, we expected to see fewer long stays at higher nightly rates. A linear model could have confirmed a strong relationship, but a single fitted slope would not have shown how the relationship changes across the range of nightly rates (for example, stays could also be longer at the highest nightly rates). The graph also let us incorporate purpose of visit and likelihood to recommend at the same time, which would have been difficult with association rules or linear modeling.

● Identifying the relationship between length-of-stay, nightly rate and revenue

Method Adopted: To analyze the relationship between length of stay and revenue for each room type, we used ggplot to generate a fitted line plot. At first, we tried to analyze all room types in all countries where Hyatt hotels are located, but the data set was too large to produce a readable line chart. We therefore narrowed the countries down to the United States, China, India, Japan and Australia, since they have the most customer traffic. For room types, we chose Guest Room Queen, Guest Room King, Guest Room Double and Guest Room Twin as our sample, since these four types are the most common in those five countries.

Justification: We chose geom_point(), geom_smooth(method = 'lm') and geom_smooth(method = 'loess', span = 0.8) because they are well suited to showing the relationship between two variables and revealing trends. The results helped us identify potential discount packages and pricing strategies that could help Hyatt further expand its market share. We believe that offering the right discount packages to targeted groups of customers could greatly benefit NPS scores.
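The two smoothers can be compared outside ggplot2 using base R’s `lm()` and `loess()`, which are the fits that `geom_smooth(method = 'lm')` and `geom_smooth(method = 'loess', span = 0.8)` draw. The revenue curve in this sketch is synthetic, an assumed levelling-off with length of stay. It was chosen only to show why a flexible loess line can reveal a bend that a straight line averages away.

```r
# Synthetic data: revenue per stay grows with nights but levels off
stay <- data.frame(nights = 1:20)
stay$revenue <- 200 * stay$nights - 4 * stay$nights^2

# Straight-line fit, as drawn by geom_smooth(method = 'lm')
lin <- lm(revenue ~ nights, data = stay)
# Local regression with the report's span, as drawn by geom_smooth(method = 'loess', span = 0.8)
lo  <- loess(revenue ~ nights, data = stay, span = 0.8)

# Residual sum of squares: the loess curve follows the bend the line misses
rss <- c(lm = sum(residuals(lin)^2), loess = sum(residuals(lo)^2))
rss
```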

● Influence of guest satisfaction metrics on NPS

Method Adopted: For this question, we applied association rules to all guest satisfaction metrics. We created a separate data frame containing only the 8 satisfaction metrics. To make the results easier to interpret, we added NPS Type and applied the apriori algorithm from the “arules” library. The model returned 41 rules.

Justification: The results are easy to evaluate because each rule comes with “Lift”, a meaningful metric that measures the relative strength of a given combination.


● Correlation of early check-in and late check-out with NPS score, purpose of visit and age

Method Adopted: Before starting our analysis of check-in and check-out, we plotted histograms of check-in and check-out hours, and proceeded only after seeing clear trends of customers checking in early or checking out late. To see how these behaviors related to numerical features such as stay duration, number of adults and number of kids, we plotted histograms of each feature’s frequency distribution with the check-in and check-out behaviors shown. We calculated the percentages of Promoters and Detractors showing early check-in and late check-out, and studied their distributions. We also looked for association rules in the data, using “lapply” to convert the data frame columns into factors.

Justification: Check-in times between 7am and 2pm were considered early, since regular check-in at Hyatt hotels starts at 3pm. Check-out times between 1pm and 4pm were considered late, since regular check-out at Hyatt hotels ends at 12 noon. Purpose of visit was coded as Leisure = 1, Business = -1 and Combination = 0 so that we could build linear regression models to predict the probability of a customer checking in early or checking out late. Customers on leisure trips showed more of a tendency toward these behaviors than guests on business trips. Because the data included far more promoters than detractors, in addition to studying the relationship of NPS with early check-in and late check-out, we calculated the percentage of promoters and the percentage of detractors showing these behaviors and checked whether those percentages differed. We also created association rules to see which combinations of features are most associated with these behaviors, using lift to compare combinations of features.
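The following is a minimal base-R sketch of the flag definitions, the purpose-of-visit coding and the per-group percentages described above. The column names and toy records are hypothetical. The sketch assumes check-in and check-out times are stored as integer hours of the day, so “between 7am and 2pm” becomes hours 7 to 14 and “between 1pm and 4pm” becomes hours 13 to 16.

```r
# Hypothetical toy records: integer hour of day for check-in and check-out
guests <- data.frame(
  checkin_hour  = c(9, 15, 13, 16, 10, 15),
  checkout_hour = c(12, 14, 11, 13, 15, 12),
  purpose  = c("Leisure", "Business", "Combination", "Leisure", "Business", "Leisure"),
  nps_type = c("Promoter", "Detractor", "Promoter", "Promoter", "Detractor", "Detractor"),
  stringsAsFactors = FALSE
)

# Early check-in: 7am-2pm, before the regular 3pm check-in
guests$early_checkin <- guests$checkin_hour >= 7 & guests$checkin_hour <= 14
# Late check-out: 1pm-4pm, after the regular 12 noon check-out
guests$late_checkout <- guests$checkout_hour >= 13 & guests$checkout_hour <= 16

# Purpose of visit coded as in the report: Leisure = 1, Business = -1, Combination = 0
purpose_code <- c(Leisure = 1, Business = -1, Combination = 0)
guests$purpose_num <- unname(purpose_code[guests$purpose])

# Percentage of each NPS group that shows a behaviour
pct_by_group <- function(flag, group) 100 * sapply(split(flag, group), mean)
early_pct <- pct_by_group(guests$early_checkin, guests$nps_type)
late_pct  <- pct_by_group(guests$late_checkout, guests$nps_type)
early_pct  # Detractor 33.3, Promoter 66.7
late_pct   # Detractor 66.7, Promoter 33.3
```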


Results, Interpretation and Recommendations:

Identifying customer patterns on Valentine’s Day and the overall tendency of customers across the United States

Results & Interpretation:

● The size of each dot represents the percent change in the number of arrivals by zip code, so larger dots indicate hotels that are more popular with guests coming specifically for Valentine’s Day.


Recommendations:

○ Hyatt should offer, or continue to offer, Valentine’s-themed packages at hotels with large percentage increases in arrivals on Valentine’s Day.

○ Hyatt should consider offering a free or discounted night before or after Valentine’s Day at these hotels, since they will have an influx of guests who otherwise would not be there.

Identifying which types of guests are likely to be promoters


● We then made a heat map with age range on the x-axis and purpose of visit on the y-axis; the color of each square represents likelihood to recommend.

● The graph confirms what the association rules told us: older guests visiting for both leisure and business have a very high likelihood to recommend. It also shows something we did not notice in the association rules output: young professionals are promoters as well.


Recommendations:

○ Hyatt should offer nightly rate discounts to guests above 60 and to young professionals. Lower nightly rates will give them an incentive to choose Hyatt over competing hotels and should ultimately lead to more promoters. Attracting more guests at the extreme ends of the age spectrum will help Hyatt.

Further Analysis on Demographics and NPS: Purpose of Visit by Age

Results & Interpretations:

● A large majority of Hyatt guests are middle-aged. As the heat map above shows, attracting more guests at the extreme ends of the age spectrum would help Hyatt. It is also worth noting that middle-aged guests are more likely to be visiting for business than for leisure, while the opposite is true for guests 66+.


Correlation between nightly rate, length of stay, purpose of visit and likelihood to recommend


● The above graph addresses a few questions: how does nightly rate affect length of stay, how do length of stay and nightly rate relate to likelihood to recommend, and why are these guests visiting? Stays are long at both low nightly rates and very high nightly rates. Long stays at high nightly rates are most often business visits. These guests are highly valuable to Hyatt, because they not only bring in a lot of revenue, but most of them also have a high likelihood to recommend.

Recommendations:

○ Hyatt should market to, and offer promotions to, the companies that send a significant number of guests.

○ Hyatt should develop a balanced formula for setting nightly rates. We observed that low nightly rates generally have the longest stays, with a gradual decline in length of stay as the nightly rate increases.


Relationship between length of stay, nightly rate and revenue


● In China, the most popular room types for long-term travelers are Guest Room Double and Guest Room King.

● Hyatt could also consider forming long-term partnerships with transnational corporations and offering discounts on Guest Room Double to encourage them to choose it more often.

● As the graphs show, few customers choose Guest Room Queen. We recommend that Hyatt offer larger discounts on Guest Room Queen to attract short-term travelers and put more effort into advertising Guest Room Queen in China during the holiday season.

● The nightly rates of the room types can be ranked as Double > Twin > King > Queen. Hyatt in China might consider slightly increasing the nightly rates for Guest Room King and Guest Room Queen to generate more profit.


Recommendations:

○ Based on our analysis, the suggested price for Guest Room Twin in China is around 100 RMB per night.

○ Hyatt should consider setting 200 RMB/night for Guest Room King, since demand at around 200 RMB is strong. Similarly, Hyatt should consider setting 150-200 RMB/night for Guest Room Double.


Correlation of early check-in and late check-out with NPS score, purpose of visit and age


● The following graphs individually show the relationship between early check-in and the features we suspected to be correlated with it.

● Guests aged 36-65 are more likely to check in early.

● Guests traveling for leisure tend to check in early.

Late Check-Out Results

● The following graphs individually show the relationship between late check-out and the features we suspected to be correlated with it.

● Guests aged 36-65 are more likely to check out late.

● Guests traveling for leisure tend to check out late.


Early Check-In and Late Check-Out by NPS

● Below are the plots of early check-in and late check-out by NPS type.

● 82.52% of Promoters check in early, compared with 82.22% of Detractors.

● 37.25% of Promoters check out late, compared with 37.19% of Detractors.

OTHER EMINENT ANALYSIS PERFORMED

Because every team member contributed results, we had to decide how to include them all in the report. Deciding which results are important is subjective and may not be fair to the work done or the approach chosen, so we include all the prominent analyses and their explanations in this section.

LOCATION ANALYSIS:

When beginning our location analysis, our goal was to identify patterns in likelihood to recommend. We wanted to find large regions with clear evidence that likelihood to recommend was significantly higher or lower. Unfortunately, linear regressions on country, state and city all produced very low R-squared values.

Linear Model: Country as a predictor of Likelihood to Recommend

country <- lm(Likelihood_Recommend_H ~ Country_PL, data = febdata2)
summary(country)

Residual standard error: 1.953 on 68402 degrees of freedom
Multiple R-squared: 0.004445, Adjusted R-squared: 0.003688
F-statistic: 5.873 on 52 and 68402 DF, p-value: < 2.2e-16
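These summary statistics can be checked by hand, because adjusted R-squared and the overall F-statistic both follow from R-squared, the number of observations n and the number of estimated terms p. The sketch below uses a hypothetical helper, `fit_stats()`, to reproduce the country model’s reported values, with n = 68,455 implied by the 52 and 68402 degrees of freedom. The same check applies to the internet model reported later (R-squared .3362 on 14 and 30355 DF). The calculation also shows that, with this many observations, even a tiny R-squared can be highly statistically significant.

```r
# Adjusted R-squared and overall F-statistic from R-squared, the number of
# observations n and the number of model terms p (excluding the intercept)
fit_stats <- function(r2, n, p) {
  df_resid <- n - p - 1
  c(adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid,
    f      = (r2 / p) / ((1 - r2) / df_resid))
}

# Country model: R^2 = 0.004445 on 52 and 68402 DF, so n = 68455
country_stats <- fit_stats(r2 = 0.004445, n = 68455, p = 52)
country_stats   # adj_r2 about 0.003688, f about 5.873

# Internet model: R^2 = 0.3362 on 14 and 30355 DF, so n = 30370
internet_stats <- fit_stats(r2 = 0.3362, n = 30370, p = 14)
internet_stats  # f about 1098
```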

Linear Model: State as a predictor of Likelihood to Recommend

state <- lm(Likelihood_Recommend_H ~ State_PL, data = febdata2)
summary(state)




Linear Model: City as a predictor of Likelihood to Recommend

city <- lm(Likelihood_Recommend_H ~ City_PL, data = febdata2)
summary(city)




These values show that location alone explains very little of the variation in likelihood to recommend: the effects are statistically significant (p < 2.2e-16 for the country model) but tiny in size. It is worth noting, though, that City has a higher R-squared than Country and State. To analyze location further, we created a map of likelihood to recommend by zip code.


Graph of Likelihood to Recommend by Zip code

There are a few anomalies in likelihood to recommend (such as the green dot near Houston, Texas), but there is no nationwide pattern where, for example, zip codes in a certain state are noticeably lower. To examine the zip codes further, we also made maps zoomed in on high-population areas (Southern California and the Northeast).


Graph of Likelihood to Recommend by Zip code- Southern California

When zoomed in on Southern California, likelihood to recommend by zip code appears slightly more variable than in other areas of the country. Our advice to Hyatt would be to identify any major differences between hotels with high and low likelihood to recommend, and to make changes accordingly.


Graph of Likelihood to Recommend by Zip code- New York City

Like the Southern California graph, this one gives a clearer picture of a region where Hyatt has many hotels. Overall, Hyatt appears to be doing well in this area. There are no green dots, and the lowest likelihood to recommend appears to be near Manhattan (the brown dot is a 6 on a 0-10 scale). As stated above, Hyatt should try to find out whether anything about this particular hotel leads to a lower likelihood to recommend. Otherwise, the hotel may simply attract guests who are harder to please.

AMENITIES ANALYSIS

Our analysis of hotel amenities relied heavily on linear modeling and visualization. We ran several rounds of linear regression to identify the amenities that are most statistically significant in predicting likelihood to recommend, but low R-squared values made it difficult to extract worthwhile insights for Hyatt. Nonetheless, here is the final model. It includes only amenities that are statistically significant predictors of likelihood to recommend (all p-values are less than .01).

lm(formula = Likelihood_Recommend_H ~ Bell.Staff_PL + Laundry_PL +
    Mini.Bar_PL + Self.Parking_PL + Shuttle.Service_PL, data = febamenities)

Residuals:
    Min      1Q  Median      3Q     Max
-8.1159 -0.5583  0.5925  1.2778  1.7776

Coefficients: (4 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          8.79993    0.01691 520.346  < 2e-16 ***
Bell.Staff_PLN       0.25823    0.04100   6.299 3.02e-10 ***
Bell.Staff_PLY      -0.11418    0.03678  -3.104  0.00191 **
Laundry_PLN         -0.09878    0.02451  -4.030 5.59e-05 ***
Laundry_PLY               NA         NA      NA       NA
Mini.Bar_PLN        -0.23718    0.03158  -7.512 5.93e-14 ***
Mini.Bar_PLY              NA         NA      NA       NA
Self.Parking_PLN    -0.12741    0.02710  -4.701 2.59e-06 ***
Self.Parking_PLY          NA         NA      NA       NA
Shuttle.Service_PLN  0.05773    0.02089   2.764  0.00572 **
Shuttle.Service_PLY       NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1




Our initial model contained all amenities, and we gradually narrowed it down by eliminating variables that were not statistically significant. If we were to offer advice based strictly on p-values, the remaining variables (bell staff, laundry, mini bar, self-parking and shuttle service) are all amenities that Hyatt should consider when managing its hotels. The presence of these amenities, with the exception of bell staff and shuttle service (whose “N” coefficients are positive), is associated with a greater likelihood to recommend. However, the model’s R-squared is only .008, so these variables have minimal effect. A deeper look into amenities is needed for any actionable insight.
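The NA rows and the note “(4 not defined because of singularities)” appear when a dummy column is an exact linear combination of other columns, so R cannot estimate it separately and reports NA. Below is a small base-R illustration using hypothetical amenity flags that always coincide. One plausible cause in the survey data, which we have not verified, is that rows missing one amenity field are also missing the others.

```r
# Hypothetical hotels where the two amenity flags are always identical
hotels <- data.frame(
  likelihood = c(9, 7, 10, 6, 8, 9),
  laundry    = factor(c("Y", "N", "Y", "N", "Y", "N")),
  mini_bar   = factor(c("Y", "N", "Y", "N", "Y", "N"))
)
fit <- lm(likelihood ~ laundry + mini_bar, data = hotels)
coef(fit)   # mini_barY is NA: its column duplicates laundryY
alias(fit)  # reports which term is a linear combination of the others
```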

➢ Are guests satisfied with the internet?

Another amenity, internet, had ten columns related to guest satisfaction. To determine how important internet is in predicting likelihood to recommend, we ran a linear model with all the internet-related variables.

internet <- lm(Likelihood_Recommend_H ~ Internet_Sat_H + Internet_Dissat_Lobby_H +
    Internet_Dissat_Slow_H + Internet_Dissat_Wired_H + Internet_Dissat_Other_H +
    Internet_Dissat_Billing_H + Internet_Dissat_Expensive_H +
    Internet_Dissat_Connectivity_H + TV_Internet_General_H + Room_Dissat_Internet_H,
    data = aprildata4)
summary(internet)


(38085 observations deleted due to missingness)


F-statistic: 1098 on 14 and 30355 DF, p-value: < 2.2e-16

Compared with our models of the other amenities, the R-squared of this model (.3362) is much higher. Our recommendation to Hyatt would therefore be to invest heavily in internet service and to ensure guests can connect easily from all areas of the hotel.

One last aspect of internet that we thought was worth examining was age. We predicted that older guests (66+) put less of a premium on how well the internet works, mostly because they are less likely to use it frequently. The following boxplot supports our hypothesis:

plot(febdata2$Age_Range_H, febdata2$Internet_Sat_H)


As expected, internet satisfaction for guests 65 and younger was generally lower than for guests 66+. Note that older guests are more likely to be visiting for leisure, which may explain why internet is less important to them.

➢ What are guests paying for amenities?

Another idea we had was that if a hotel offers an amenity that is free for guests to use, guests should be paying more per night. Conversely, if the hotel offers an amenity that guests must pay extra to use, the nightly rate should be lower. Letting guests save money through a lower nightly rate leaves them more to spend on amenities, which enhances their overall experience, and a better overall experience should increase their likelihood to recommend. These boxplots (arranged with grid.arrange) compare average nightly rates with and without each amenity.


After examining these boxplots, we grouped each amenity into one of four categories:

1. The amenity costs the guest nothing extra, and the nightly rate is higher when it is present: Spa, Outdoor pool, possibly bell staff.

2. The amenity costs the guest nothing extra, and the nightly rate is lower when it is present: Indoor pool, Fitness center.

3. The amenity must be paid for, and the nightly rate is higher when it is present: Golf, Mini bar.

4. The amenity must be paid for, and the nightly rate is lower when it is present: None.
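The four-way grouping can be expressed as a small rule: compare the mean nightly rate with and without the amenity, then combine the result with whether the amenity costs extra. The sketch below uses hypothetical toy hotels and a hand-written `paid` lookup, which is an assumption mirroring the categories above. The helper `classify_amenity()` is illustrative only.

```r
# Hypothetical hotel-level toy data ("Y" = amenity present, "N" = absent)
hotels <- data.frame(
  nightly_rate = c(320, 180, 250, 150, 400, 210),
  Spa_PL       = c("Y", "N", "Y", "N", "Y", "N"),
  Mini.Bar_PL  = c("Y", "N", "N", "N", "Y", "Y"),
  stringsAsFactors = FALSE
)
# Assumed lookup: does the guest pay extra to use the amenity?
paid <- c(Spa_PL = FALSE, Mini.Bar_PL = TRUE)

# Return category 1-4 as numbered in the list above
classify_amenity <- function(df, col, is_paid) {
  rate_with    <- mean(df$nightly_rate[df[[col]] == "Y"])
  rate_without <- mean(df$nightly_rate[df[[col]] == "N"])
  higher <- rate_with > rate_without
  if (!is_paid && higher)  return(1L)  # free amenity, higher rate
  if (!is_paid && !higher) return(2L)  # free amenity, lower rate
  if (is_paid && higher)   return(3L)  # paid amenity, higher rate
  4L                                   # paid amenity, lower rate
}

categories <- sapply(names(paid), function(a) classify_amenity(hotels, a, paid[[a]]))
categories  # Spa_PL 1, Mini.Bar_PL 3
```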

Based on these conclusions, we recommend that Hyatt decrease nightly rates when a hotel has a golf course or a room has a mini bar. This gives guests a greater incentive to use these amenities and could raise NPS scores.


MARKETING STRATEGY TO INCREASE NPS

➢ Length of Stay VS. Revenue

United States

From this graph, we can roughly rank the nightly rates of the room types as Double > King > Twin > Queen. The two most popular room types for business travelers are Guest Room Double and Guest Room King. Hyatt could consider building ongoing partnerships with international corporations and offering them annual discounts on these two room types. Normally, the nightly rate of Guest Room Queen is higher than that of Guest Room Twin, so Hyatt may need to consider increasing the nightly rate of Guest Room Queen in the U.S.


India

The graph suggests that Hyatt does not offer Guest Room Queen in India. In addition, the revenue generated from Guest Room Twin is quite low. Our team recommends that Hyatt discontinue Guest Room Twin in India and replace those rooms with Guest Room Double or Guest Room King.


Japan

From this graph, we can tell that the most popular room type in Japan is Guest Room Twin, possibly because space in Japan is limited. The nightly rates of the room types can be ranked as Twin > Queen > Double > King. Long-term travelers choose Guest Room King more frequently, so Hyatt might need to increase the nightly rate of Guest Room King.


➢ Nightly Rate VS Revenue

Japan

From this graph, we can tell that Guest Room Twin is price sensitive: as the price of a Twin room rises, revenue decreases because demand falls. The ideal price for Guest Room Twin in Japan would be 100-200 JPY per night. Guest Room Queen is not price sensitive, because revenue increases as the price increases. Hyatt could consider setting 300 JPY/night for Guest Room Queen. Similarly, Hyatt could set 500 JPY/night for Guest Room King and 250 JPY/night for Guest Room Double.
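“Price sensitive” here is a statement about price elasticity of demand. Revenue (price times rooms sold) falls when price rises only if bookings drop proportionally more than the price rises, that is, when the absolute value of the elasticity exceeds 1. Below is a base-R sketch using the midpoint (arc) formula with made-up prices and booking counts. The numbers are not from the Hyatt data.

```r
# Arc (midpoint) price elasticity of demand between two price points
arc_elasticity <- function(p1, q1, p2, q2) {
  pct_q <- (q2 - q1) / ((q1 + q2) / 2)
  pct_p <- (p2 - p1) / ((p1 + p2) / 2)
  pct_q / pct_p
}

# Hypothetical: raising a Twin room from 100 to 200 per night
# cuts nightly bookings from 50 to 20
e <- arc_elasticity(p1 = 100, q1 = 50, p2 = 200, q2 = 20)
e  # about -1.29, so |e| > 1: demand is elastic ("price sensitive")

revenue <- c(before = 100 * 50, after = 200 * 20)
revenue  # revenue falls from 5000 to 4000 as price rises
```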


Australia

From this graph, we can tell that Guest Room Twin, Guest Room Double and Guest Room King in Australia are price sensitive. As the price of a Twin room rises, revenue decreases dramatically because demand falls. The ideal price for Guest Room Twin in Australia would be 100 AUD per night. Since Guest Room Double and Guest Room King are also price sensitive, Hyatt could consider setting 150-200 AUD/night for Guest Room Double and 200-250 AUD/night for Guest Room King.


India

From this graph, we can tell that Guest Room Twin, Guest Room Double and Guest Room King are not price sensitive, because revenue for each room type does not change much with price. The ideal price for Guest Room Twin in India would be less than 800 INR per night, with 1000-3000 INR/night for Guest Room King and 1500-2500 INR/night for Guest Room Double.


Check-In and Check-Out

● The following graph shows the frequency of check-ins and check-outs over the hours of the day, showing that the majority of customers check in early or check out late.

● The following graphs individually show the relationship between early check-in and the features we suspected to be correlated with it.

● The following graphs individually show the relationship between late check-out and the features we suspected to be correlated with it.

These are some additional trials:

Below are the visualizations of the association rules for early check-in:

Below are the visualizations of the association rules for late check-out:


➢ Interpretation of Results

➔ More than 80% of all guests check in early.

➔ More than 70% of all guests check out late.

➔ Given how little the share of each group showing these behaviors varies, our team could not reject the null hypothesis that these behaviors are unrelated to NPS type.

➔ However, with a larger volume of data, this relationship might turn out to be significant. Several of Hyatt’s competitors, including Sheraton, Four Seasons and Wyndham, have moved to models that charge customers for early check-in and late check-out. This also aligns with the fundamentals of Receptive Programmed Decision Making and gives customers a receptive choice.

➔ As part of the same analysis, our team also found that, using numerical features such as stay duration and the number of adults and kids, and categorical features such as age group and purpose of visit, it should be feasible to build linear regression models that approximately predict a customer’s chances of requesting early check-in or late check-out once he or she makes a booking and these details become available to the hotel. Customers whose predicted chances exceed a certain threshold could be proactively emailed, making them aware of this service and asking whether they want to use it, which could further improve NPS.

➔ Guests aged 36-65 are more likely to check in early or check out late.

➔ Guests traveling for leisure tend to check in early or check out late.

➔ Guests staying fewer days, in smaller groups and with fewer kids tend to check in early or check out late.

➔ Guests aged 26-35 traveling for leisure with no kids and staying a single day have the highest chance of checking in early, and are usually Promoters.

➔ Single guests with no family, traveling for leisure and staying more than 6 days, have the greatest chance of a normal check-out.
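The proposed workflow could look like the sketch below. It fits a linear probability model (ordinary least squares on a 0/1 outcome, matching the “linear regression” idea above) on past stays, scores new bookings, and flags those above a cut-off. All data, column names and the 0.6 threshold are hypothetical. A logistic regression (`glm(..., family = binomial)`) is a common alternative that keeps predictions between 0 and 1 without clamping.

```r
# Hypothetical past stays: early = 1 if the guest checked in early, else 0
history <- data.frame(
  early       = c(1, 1, 0, 1, 0, 0, 1, 0),
  stay_nights = c(1, 2, 5, 1, 4, 6, 2, 3),
  adults      = c(2, 1, 2, 2, 1, 1, 2, 1),
  purpose_num = c(1, 1, -1, 0, -1, -1, 1, 0)  # Leisure = 1, Business = -1, Combination = 0
)

# Linear probability model: ordinary least squares on the 0/1 outcome
lpm <- lm(early ~ stay_nights + adults + purpose_num, data = history)

# Hypothetical new bookings to score
new_bookings <- data.frame(
  booking_id  = c("A", "B", "C"),
  stay_nights = c(1, 6, 2),
  adults      = c(2, 1, 1),
  purpose_num = c(1, -1, 0),
  stringsAsFactors = FALSE
)

# Linear predictions can fall outside [0, 1], so clamp them to read as chances
pred <- predict(lpm, newdata = new_bookings)
new_bookings$p_early <- pmin(pmax(pred, 0), 1)

threshold  <- 0.6  # hypothetical cut-off for a proactive email
to_contact <- new_bookings$booking_id[new_bookings$p_early > threshold]
to_contact
```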

Booking Channel

● The booking channel data contains a large number of promoters.

➔ Creating a data frame of NPS_Type and Booking Channel

library(sqldf)
library(ggplot2)

febBookChannel <- sqldf('select Booking_Channel, NPS_Type from febData')
febBookChannel.freq <- as.data.frame(table(febBookChannel))
ggplot(febBookChannel.freq, aes(x = Booking_Channel, y = Freq, fill = NPS_Type)) +
  geom_bar(stat = "identity") + xlab('Booking Channel') + ylab('Frequency') +
  guides(fill = guide_legend(title = "NPS Type"))


➢ Comparing Booking Channel vs. POV (Purpose of Visit) for Promoters

promoterFebPOV <- sqldf("select Booking_Channel, POV_H from febData where NPS_Type = 'Promoter'")
promoterFebPOV.freq <- as.data.frame(table(promoterFebPOV))
ggplot(promoterFebPOV.freq, aes(x = Booking_Channel, y = Freq, fill = POV_H)) +
  geom_bar(stat = "identity") + xlab('Booking Channel') + ylab('Frequency') +
  guides(fill = guide_legend(title = "Purpose of Visit"))

➔ Comparing Booking Channel vs. POV for Passives

passiveFebPOV <- sqldf("select Booking_Channel, POV_H from febData where NPS_Type = 'Passive'")
passiveFebPOV.freq <- as.data.frame(table(passiveFebPOV))
ggplot(passiveFebPOV.freq, aes(x = Booking_Channel, y = Freq, fill = POV_H)) +
  geom_bar(stat = "identity") + xlab('Booking Channel') + ylab('Frequency') +
  guides(fill = guide_legend(title = "Purpose of Visit"))


Grouping by POV_H

➔ Comparing Booking_Channel VS NPS_Type

businessFebData <- sqldf("select Booking_Channel, NPS_Type from febData where POV_H = 'Business'")
businessFebData.freq <- as.data.frame(table(businessFebData))
ggplot(businessFebData.freq, aes(x = Booking_Channel, y = Freq, fill = NPS_Type)) +
  geom_bar(stat = "identity") + xlab('Booking Channel for Business') + ylab('Frequency') +
  guides(fill = guide_legend(title = "NPS Type"))

leisureFebData <- sqldf("select Booking_Channel, NPS_Type from febData where POV_H = 'Leisure'")
leisureFebData.freq <- as.data.frame(table(leisureFebData))
ggplot(leisureFebData.freq, aes(x = Booking_Channel, y = Freq, fill = NPS_Type)) +
  geom_bar(stat = "identity") + xlab('Booking Channel for Leisure') + ylab('Frequency') +
  guides(fill = guide_legend(title = "NPS Type"))


# Keep the filter value on one line: a line break inside the quoted value
# would become part of the value and match no rows
combinationFebData <- sqldf("select Booking_Channel, NPS_Type from febData
                             where POV_H = 'Combination of both business and leisure'")
combinationFebData.freq <- as.data.frame(table(combinationFebData))

ggplot(combinationFebData.freq, aes(x = Booking_Channel, y = Freq)) +
  geom_bar(aes(fill = NPS_Type), stat = "identity") +
  labs(x = 'Booking Channel for Combination', y = 'Frequency') +
  guides(fill = guide_legend(title = "NPS Type"))


➢ Guest satisfaction is collected through feedback, so we asked which guest satisfaction metrics are more relevant than others in boosting NPS.

● We applied association rules over all the guest satisfaction metrics. We created a new data frame containing all 8 metrics and, to make the results easier to interpret, added NPS_Type to it. We then applied the apriori algorithm from the “arules” library. This produced 41 rules, of which 6 were relevant to understanding the result.

➢ Why association rules for this?

● The results are easy to evaluate because “Lift” is a meaningful measure of whether a given combination is relevant: a lift above 1 means the items appear together more often than they would if they were independent.
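A minimal base-R sketch of how support, confidence and lift are computed for a single rule, using a small made-up data frame `resp` (hypothetical; one row per guest, TRUE when the guest has that item). The helper names `support` and `rule_metrics` are illustrative only, not part of arules, although the `apriori()` output reports these same three measures.

```r
# Hypothetical toy responses (made up): each row is one guest;
# TRUE means the guest gave a 10 / was classified as a Promoter
resp <- data.frame(
  Customer_SVC_10 = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  Promoter        = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# support(X): fraction of rows that contain X
support <- function(x) mean(x)

# Measures for the rule lhs => rhs
rule_metrics <- function(lhs, rhs) {
  s_both <- support(lhs & rhs)      # support of the whole rule
  conf   <- s_both / support(lhs)   # estimated P(rhs | lhs)
  lift   <- conf / support(rhs)     # > 1: rhs is more likely given lhs
  c(support = s_both, confidence = conf, lift = lift)
}

m <- rule_metrics(resp$Customer_SVC_10, resp$Promoter)
print(m)  # support 0.375, confidence 0.75, lift 1.5
```

Here half of all guests are promoters, but three quarters of guests with a top customer service score are, so the lift is 0.75 / 0.5 = 1.5.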

Output:

     lhs                                          rhs                     support    confidence    lift
[1]  {}                                        => {NPS_Type=}             0.94149455 0.9414946  1.00000
[2]  {Overall_Sat_H=10}                        => {Condition_Hotel_H=10}  0.02052541 0.8759209 31.19884
[3]  {Condition_Hotel_H=10}                    => {Overall_Sat_H=10}      0.02052541 0.7310807 31.19884
[4]  {Overall_Sat_H=10}                        => {Customer_SVC_H=10}     0.02174757 0.9280764 29.27289
[5]  {Customer_SVC_H=10}                       => {Overall_Sat_H=10}      0.02174757 0.6859500 29.27289
[6]  {Overall_Sat_H=10}                        => {NPS_Type=Promoter}     0.02322954 0.9913196 24.68251
[7]  {NPS_Type=Promoter}                       => {Overall_Sat_H=10}      0.02322954 0.5783840 24.68251
[8]  {Guest_Room_H=10}                         => {Condition_Hotel_H=10}  0.02340047 0.9105724 32.43306
[9]  {Condition_Hotel_H=10}                    => {Guest_Room_H=10}       0.02340047 0.8334855 32.43306
[10] {Guest_Room_H=10}                         => {Customer_SVC_H=10}     0.02217233 0.8627823 27.21341
[11] {Customer_SVC_H=10}                       => {Guest_Room_H=10}       0.02217233 0.6993476 27.21341
[12] {Guest_Room_H=10}                         => {NPS_Type=Promoter}     0.02406625 0.9364794 23.31707
[13] {NPS_Type=Promoter}                       => {Guest_Room_H=10}       0.02406625 0.5992169 23.31707
[14] {Condition_Hotel_H=10}                    => {Customer_SVC_H=10}     0.02398505 0.8543075 26.94610
[15] {Customer_SVC_H=10}                       => {Condition_Hotel_H=10}  0.02398505 0.7565236 26.94610
[16] {Condition_Hotel_H=10}                    => {NPS_Type=Promoter}     0.02577727 0.9181431 22.86052
[17] {NPS_Type=Promoter}                       => {Condition_Hotel_H=10}  0.02577727 0.6418190 22.86052
[18] {Customer_SVC_H=10}                       => {NPS_Type=Promoter}     0.02858652 0.9016606 22.45013
[19] {NPS_Type=Promoter}                       => {Customer_SVC_H=10}     0.02858652 0.7117656 22.45013
[20] {Overall_Sat_H=10, Condition_Hotel_H=10}  => {NPS_Type=Promoter}     0.02038610 0.9932129 24.72965
[21] {NPS_Type=Promoter, Overall_Sat_H=10}     => {Condition_Hotel_H=10}  0.02038610 0.8775938 31.25842
     {…, Condition_Hotel_H=10}                 => {Overall_Sat_H=10}      0.02038610 0.7908557 33.74974
[23] {Overall_Sat_H=10, Customer_SVC_H=10}     => {NPS_Type=Promoter}     0.02158860 0.9926904 24.71665
     {…, Overall_Sat_H=10}                     => {Customer_SVC_H=10}     0.02158860 0.9293598 29.31337
     {…, Customer_SVC_H=10}                    => {Overall_Sat_H=10}      0.02158860 0.7552021 32.22822
[26] {Guest_Room_H=10, Condition_Hotel_H=10}   => {Customer_SVC_H=10}     0.02077924 0.8879839 28.00831
     {…, Customer_SVC_H=10}                    => {Condition_Hotel_H=10}  0.02077924 0.9371699 33.38042
[28] {Condition_Hotel_H=10, Customer_SVC_H=10} => {Guest_Room_H=10}       0.02077924 0.8663412 33.71156
     {…, Condition_Hotel_H=10}                 => {NPS_Type=Promoter}     0.02216891 0.9473703 23.58824
     {…, Guest_Room_H=10}                      => {Condition_Hotel_H=10}  0.02216891 0.9211620 32.81025
     {…, Condition_Hotel_H=10}                 => {Guest_Room_H=10}       0.02216891 0.8600179 33.46550
     {…, Guest_Room_H=10}                      => {Customer_SVC_H=10}     0.02142963 0.8904436 28.08589
[35] {Condition_Hotel_H=10, …

From the output we infer that promoters are most strongly associated with a top score of 10 on the following metrics:

● Quality of Customer Service
● Hotel Condition
● Guest Room

● One important observation: if a customer is a promoter, there is a high chance that he or she is also satisfied with both the hotel condition and the quality of customer service.
● A promoter usually gives a full rating of 10 to hotel condition (64% of promoters, rule [17]) and to quality of customer service (71%, rule [19]).
● The rules with NPS_Type=Promoter on the right-hand side have lift values of roughly 22 to 25, against a maximum lift of about 33.7 in this output. A guest who gives a 10 on any one of these metrics is therefore more than 20 times as likely to be a promoter as a guest picked at random, which suggests that promoters are strongly tied to top scores on almost all of the metrics.
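To make the lift figures concrete, the standard definitions of confidence and lift can be checked against rules [6] and [7] of the output, which share the itemset {Overall_Sat_H=10, NPS_Type=Promoter} (support 0.02322954). Because rule [7] has NPS_Type=Promoter as its left-hand side, its confidence yields the overall promoter share, and dividing rule [6]'s confidence by that share reproduces the reported lift:

```latex
\begin{aligned}
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)},
\qquad
\operatorname{lift}(X \Rightarrow Y) = \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)} \\[6pt]
\operatorname{supp}(\mathit{Promoter}) &= \frac{\operatorname{supp}(\{\mathit{Overall\_Sat\_H}=10,\ \mathit{Promoter}\})}{\operatorname{conf}([7])}
  = \frac{0.02322954}{0.5783840} \approx 0.04016 \\[6pt]
\operatorname{lift}([6]) &= \frac{\operatorname{conf}([6])}{\operatorname{supp}(\mathit{Promoter})}
  = \frac{0.9913196}{0.04016} \approx 24.68
\end{aligned}
```

So only about 4% of all transactions are promoters, yet about 99% of transactions with Overall_Sat_H=10 are; that ratio is the lift. The small 4% baseline is consistent with rule [1], which shows that about 94% of transactions carry an empty NPS_Type value.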

➢ Which locations across the globe have a larger share of reservation revenue?

We applied three steps to answer this. First, we used “tapply” to group February reservation revenue by the cities where Hyatt hotels are located worldwide. Then we combined that result with the “world.cities” data set imported from the web, which has 4 columns: country, city, latitude and longitude. Finally, we plotted the merged result on a world map, with each dot representing a location and its color representing that location's revenue.

➢ Why map and tapply?

tapply is quick and efficient for grouping data by a column. A map is a more interactive way to visualize the results and is easy to understand.
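A minimal base-R sketch of the first two steps, assuming hypothetical stand-ins `feb` (for the February data) and `cities` (for world.cities); the column names and all values are made up for illustration. It also assumes the per-city figure is a mean, which the report does not state, so swap `mean` for `sum` if totals are wanted. The map step is left out because it depends on a mapping package.

```r
# Hypothetical stand-ins (made-up values) for the February data and world.cities
feb <- data.frame(
  City    = c("Paris", "Paris", "Chicago", "Chicago", "Shanghai"),
  Revenue = c(400, 500, 200, 300, 120),
  stringsAsFactors = FALSE
)
cities <- data.frame(
  city    = c("Paris", "Chicago", "Shanghai"),
  country = c("France", "USA", "China"),
  lat     = c(48.86, 41.84, 31.23),
  long    = c(2.34, -87.68, 121.47),
  stringsAsFactors = FALSE
)

# Step 1: group revenue by city (mean is an assumption; sum works the same way)
rev_by_city <- tapply(feb$Revenue, feb$City, mean)

# Step 2: turn the named vector into a data frame and join on the city name
rev_df <- data.frame(city = names(rev_by_city),
                     revenue = as.vector(rev_by_city),
                     stringsAsFactors = FALSE)
geo <- merge(rev_df, cities, by = "city")
print(geo)

# Step 3 (not shown): plot geo$long against geo$lat on a world map,
# colouring each point by geo$revenue
```

`merge()` keeps only cities present in both tables by default, so any hotel city missing from the lookup table silently drops out; adding `all.x = TRUE` would keep such cities with missing coordinates.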

Output: Result of tapply and merge


➢ Mapping result globally

The output shows that the USA generates the most reservation revenue, largely because half of all Hyatt locations are in the USA.

● Of 370 locations worldwide, 185 are in the USA.
● 30% of total reservation revenue comes from the USA.

Within the USA:

● Hyatt Regency Clearwater Beach Resort and Spa leads the US with the maximum revenue of $827.137.

● Hyatt Place Lincoln/Downtown-Haymarket has the lowest revenue in the US, at $129.34.

➔ Get a comparative analysis of worldwide reservation revenue across countries, so that Hyatt knows where it may need to enhance operations.

The data frame generated above by merging “world.cities” with the “tapply” result is used in this step. To find where Hyatt needs to enhance operations, we used the “sqldf” library, which runs SQL queries on data frames.

➢ Why sqldf?

Once all the data we need is in a data frame, SQL makes the analysis easy to get, so the sqldf library is a convenient and quick way to generate the desired result.
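A minimal base-R sketch of the kind of country-level GROUP BY this step relies on, assuming a hypothetical merged data frame `geo` with columns `city`, `country` and `revenue` (names and values are made up). The SQL text in the comment shows roughly what the equivalent sqldf query would look like; the report does not give its exact query.

```r
# Hypothetical merged city-level data (made-up values), shaped like the
# result of the tapply/merge step: one row per hotel city
geo <- data.frame(
  city    = c("Paris", "Nice", "Chicago", "Denver", "Shanghai"),
  country = c("France", "France", "USA", "USA", "China"),
  revenue = c(450, 350, 250, 150, 120),
  stringsAsFactors = FALSE
)

# Base-R version of a query such as
#   select country, avg(revenue) as revenue, count(*) as locations
#   from geo group by country
by_country <- aggregate(revenue ~ country, data = geo, FUN = mean)
counts <- table(geo$country)
by_country$locations <- as.vector(counts[as.character(by_country$country)])

# Rank countries from lowest to highest mean revenue
ranked <- by_country[order(by_country$revenue), ]
print(ranked)
```

Reporting the location count next to the mean is what makes a comparison like "high revenue from only a few locations" visible in a single table.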


Global View:

• China has the lowest reservation revenue, with a mean value of just $127.5412, below the global average.

• Although the USA accounts for close to 30% of reservation revenue, France is the highest revenue generator per location for Hyatt, with just 5 locations against 185 in the USA.

From the worldwide reservation revenue we infer that Hyatt should consider expanding in France: with just 5 locations it is already the highest revenue generator, and it could be an upcoming market that Hyatt management should think about. At the same time, management should look into its China operations. As China is the lowest revenue generator, Hyatt should study the factors that are crucial to boosting operations there.


Reflections:

This whole project has been a great learning experience for all of us. Several aspects of it were novel and challenging: as none of us had used R before, it was quite a challenge to work out the logic behind the code and come up with meaningful, useful results. In addition, the most significant takeaways from this project are as follows:

● A data question is the basis of any analysis. As soon as the project description was given to us, most of us jumped directly into producing fancy graphs and charts depicting correlations between different attributes. Once we had the graphs, we could not see their utility for business growth, as they did not give us any conclusive insights. So we went back to the start, came up with major data questions, and worked on those individually. Without data questions, there is a high chance of getting stranded in numerous analyses with no results or conclusions.

● Any assumptions made should be examined carefully. Assumptions are made in every type of analysis, and they can be very critical in data analysis. While doing several analyses in search of meaningful results, we faced many setbacks because we made assumptions based on our individual understanding. On further examination we noticed that some analyses were based on a false premise and were not supported by the data. Thus, it is very important to observe what the data conveys and not what you assume it conveys.

● Sample size is crucial in data analysis. As mentioned above, we worked on the February data for all of our analysis. While brainstorming and performing analysis, we tried several different models and correlations, but some of them were not reliable because the results contained many anomalies and discrepancies. We came to appreciate that an adequate sample size gives results the necessary reliability, but a big data set also brings cleansing and handling issues. Thus, the sample size must be chosen carefully.

● Data cleansing and grouping should be done prudently. Throughout the project there were several instances where we had to go back to the original data set and add columns that had been removed or not included in the working data frames. Reckless handling of a data set can be disastrous, as recovering lost data can be another colossal task. Thus, keep an original copy intact and work on a copy of it.

● Purpose is paramount. With 7 people on the team, there were many intense discussions about individual analyses and how they served the purpose of the project. As all of us worked hard to come up with meaningful and reliable insights, we were all a little possessive of our work and wanted it included in the final project. But we made sure to give unbiased opinions on each other's work, which led us to identify and select the best work. This is essential to achieving a bigger purpose.
