IST 659 PROJECT IMPLEMENTATION REPORT: GROUP 1

Project Implementation Report:

Net Promoter Score (NPS) Analysis


Hyatt Hotel Group
By
Group 1
ERIK BEBERNES
AVINASH BHAMBANI
WANER LI
APURVA PATIL
SHRADDHA RAO
RITWICK CHATTERJEE
SEEPANA MOHIT RAO

Table of Contents

Introduction
Major Data Questions, Methods & Justifications
Results, Interpretation and Recommendations
Other Eminent Analysis Performed
Reflections

INTRODUCTION:
The purpose of this project is to analyze the Grand Hyatt hotel data set and provide reliable, actionable insights for Hyatt representatives. The data set includes Net Promoter Score (NPS) values for hotels in many locations around the world. NPS measures customer experience and is used to predict business growth, and it has become a widely used way to gauge customer satisfaction.
NPS is calculated by sorting survey respondents into three categories based on their likelihood-to-recommend score: Promoters, Passives and Detractors.
- Promoters give a likelihood-to-recommend score of 9-10. They are loyal enthusiasts who keep coming back to the hotel and refer others, which promotes business growth.
- Passives give a score of 7-8. They are satisfied but unenthusiastic customers who may be looking for better offerings.
- Detractors give a score of 0-6 and are unsatisfied with the hotel's services. These customers harm the business because they may spread negative word of mouth about the hotel and steer potential customers away.
The score itself is the percentage of Promoters minus the percentage of Detractors.
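The sketch below shows how the categories turn into a single score. It is a minimal Python example (the team's own analysis was done in R). It assigns each likelihood-to-recommend score to an NPS category, then computes NPS as the percentage of Promoters minus the percentage of Detractors. The function names and sample scores are hypothetical and exist only for illustration.

```python
# Minimal sketch: classify 0-10 likelihood-to-recommend scores into NPS
# categories and compute NPS = % Promoters - % Detractors.
# Function names and the toy scores are illustrative, not from the report.
from collections import Counter


def nps_type(score: int) -> str:
    """Map a 0-10 likelihood-to-recommend score to its NPS category."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score >= 9:
        return "Promoter"
    if score >= 7:
        return "Passive"
    return "Detractor"


def net_promoter_score(scores: list[int]) -> float:
    """Return NPS on a -100..100 scale for a non-empty list of scores."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(nps_type(s) for s in scores)
    return 100 * (counts["Promoter"] - counts["Detractor"]) / len(scores)


if __name__ == "__main__":
    toy_scores = [10, 9, 9, 8, 7, 6, 3, 10]  # toy data, not Hyatt responses
    print(net_promoter_score(toy_scores))  # prints 25.0
```

Passives count toward the total but not toward the numerator. The score therefore ranges from -100 (all Detractors) to +100 (all Promoters).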
The data set was split into monthly files covering February 2014 through January 2015. Given the size and scope of the full data set, we focused on the February 2014 data only. This is a smaller sample, but we kept to it because of the course's time constraints.

With seven team members, we held frequent meetings on topics such as:
- the major data questions
- data cleansing and grouping columns
- model feasibility
- visualizations
- actionable insights

The first step was to define the data questions. We revisited the purpose of the project repeatedly, and each of us proposed questions that had to be answered to produce actionable insights. We then merged all the questions and removed any that were repetitive or irrelevant.

Next, each member chose 2-3 data questions to analyze. Because the team had seven members, we worked in sub-groups of 2-3 on related questions. Once everyone had finished, we met again to review the methods used and discuss other ways to analyze the data. We then incorporated all relevant improvements and settled on the most reliable approach for each question.

Finally, we turned to data visualization, so that the findings would be presented in a form suited to Hyatt officials. These graphs and charts include geographical maps with different indicators, bar graphs and other plots that make the analysis and its findings easier to understand.

Major Data Questions, Methods & Justifications:


- Identifying customer patterns on Valentine's Day and the overall tendency of customers across the United States
Method Adopted: We took the following steps:
- We made a data frame with only the postal code and arrival date columns.
- We created a table restricted to February 7th and February 14th that showed the number of arrivals for each zip code.
- The table had the same number of rows for each date, so we split it into one table for 2/7 (the first 317 rows) and another for 2/14 (the last 317 rows).
- We used the cbind function to combine the columns of these two tables, so that each zip code's frequency on both dates appeared in separate columns.
- We cleaned the zip codes using the zipcode package.
- Finally, we created a new column called percent that divides the 2/14 frequency by the 2/7 frequency. A value above 1 means more arrivals on Valentine's Day than a week earlier.
We used the values of this column to make the graph (ggplot).
Justification: A simpler version of this analysis would plot the number of check-ins for each zip code, but this would skew the results. Hotels with more rooms in higher-population areas naturally tend to have more check-ins per day than smaller hotels. That is why we measured the relative change in arrivals (the 2/14 count divided by the 2/7 count) instead of using raw counts.
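The zip-code comparison above can be sketched in Python; the team itself worked in R with table, cbind and the zipcode package. This is a hedged illustration under an assumed input format: each arrival is a (postal code, arrival date) pair. clean_zip is a hypothetical, simplified stand-in for the zipcode package's cleaning.

Unlike splitting a table into its first and last 317 rows, this version matches the two dates by zip code. A zip code that appears on only one date therefore cannot shift the rows out of alignment.

```python
# Hypothetical sketch of the Valentine's Day comparison, in Python rather than R.
# Assumes each record is a (postal_code, arrival_date) pair; this layout is
# illustrative and is not the data set's actual schema.
from collections import Counter
from datetime import date

BASELINE_DAY = date(2014, 2, 7)
VALENTINES_DAY = date(2014, 2, 14)


def clean_zip(raw: str) -> str:
    """Simplified zip cleaning: drop a ZIP+4 suffix and restore leading zeros."""
    return raw.strip().split("-")[0].zfill(5)


def arrival_ratio(records):
    """Return {zip: arrivals on 2/14 / arrivals on 2/7} for zips with 2/7 arrivals.

    Counts are matched by zip code rather than by row position, so the two
    dates do not need to list zip codes in the same order.
    """
    base = Counter(clean_zip(z) for z, d in records if d == BASELINE_DAY)
    vday = Counter(clean_zip(z) for z, d in records if d == VALENTINES_DAY)
    # Every zip in `base` has at least one 2/7 arrival, so no division by zero.
    return {z: vday[z] / base[z] for z in base}


if __name__ == "__main__":
    toy = [  # toy records, not Hyatt data
        ("02134", BASELINE_DAY),
        ("2134", VALENTINES_DAY),
        ("02134-1234", VALENTINES_DAY),
        ("10001", BASELINE_DAY),
        ("10001", BASELINE_DAY),
    ]
    for zip_code, ratio in sorted(arrival_ratio(toy).items()):
        # ratio - 1 is the percent change: 2.0 means +100%, 0.0 means -100%.
        print(zip_code, ratio, f"{(ratio - 1) * 100:+.0f}%")
```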
- Identifying the types of guests that are more likely to be promoters
Method Adopted: To start our demographics analysis, we ran an association rules model with these factors: guest country, guest state, gender, age range, purpose of visit, language and likelihood to recommend. With support set at 0.3 and confidence at 0.7, the model returned 23 rules. However, almost all of the rules involved English as the language and the United States as the country.

We suspected this was because most of the data came from the United States. A summary of the countries confirmed it: 55,585 of our 68,455 observations were from the U.S. To obtain more informative rules, we removed Country and Language. We also replaced Likelihood to Recommend with NPS type, which has three levels instead of ten and carries essentially the same information. After further adjusting support and confidence, we obtained 31 rules that gave us useful insight into which guests are most likely to be promoters.
Justification: We chose association rules because they are a reliable way to find strong relationships among multiple factors simultaneously.
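The support and confidence thresholds above decide which rules are kept. The hedged Python sketch below shows what the two numbers mean for a single candidate rule. Each guest is represented as a set of "attribute=value" items, which is an assumed encoding. This does not reproduce the Apriori search in R's arules package; it only evaluates one rule against the thresholds.

```python
# Hypothetical sketch: support and confidence of one candidate rule
# lhs => rhs, where each transaction is a set of "attribute=value" items.
# This only evaluates a given rule; it is not the Apriori search in arules.


def support_confidence(transactions, lhs, rhs):
    """Return (support, confidence) of the rule lhs => rhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


def passes_thresholds(transactions, lhs, rhs, min_support=0.3, min_confidence=0.7):
    """Keep a rule only if it meets both thresholds (0.3 and 0.7 as in the first model)."""
    support, confidence = support_confidence(transactions, lhs, rhs)
    return support >= min_support and confidence >= min_confidence
```

This also helps explain the first run's output. An item held by about 81% of guests, such as Country=United States (55,585 of 68,455), clears a 0.3 support threshold almost automatically. That is one reason so many of the first 23 rules involved it.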


- Finding correlations between nightly rate, length of stay, purpose of visit and likelihood to recommend
Method Adopted: We analyzed these variables visually with a scatterplot that has nightly rate on the x-axis and length of stay on the y-axis. The shape of each dot identifies the purpose of visit, and likelihood to recommend is shown as a color gradient.
Justification: A graph was the easiest way to spot trends among three numeric variables and one factor at the same time. Much like a supply-demand graph, we expected to see fewer long stays at higher nightly rates. A linear model could have confirmed a strong relationship, but it would not have shown how the relationship varies across the range of nightly rates (for example, whether some long stays occur at high nightly rates). The graph also let us incorporate purpose of visit and likelihood to recommend, which would have been harder to show with association rules or linear modeling.

- Identifying the relationship between length of stay, nightly rate and revenue
Method Adopted: To analyze the relationship between length of stay and revenue for each room type, we used ggplot to generate fitted line plots. At first, we tried to analyze all room types from every country where Hyatt hotels are located. However, we could not produce a readable line chart from such a large data set. We therefore narrowed the countries down to the United States, China, India, Japan and Australia, since they have the most customer traffic. For room types, we chose Guest Room Queen, Guest Room King, Guest Room Double and Guest Room Twin as our sample, since these four are the most common in those five countries.
Justification: We chose geom_point(), geom_smooth(method = 'lm') and geom_smooth(method = 'loess', span = 0.8) because they are well suited to showing the relationship between two variables and revealing trends. The results helped us identify potential discount packages and pricing strategies that could help Hyatt further expand its market share. We believe that offering the right discount packages to targeted groups of customers would greatly benefit NPS scores.
- Influence of guest satisfaction metrics on NPS
Method Adopted: For our next question, we applied association rules to all guest satisfaction metrics. We created a separate data frame containing only the 8 satisfaction metrics and added NPS Type to make the results easier to interpret. We then applied the apriori algorithm from the arules library, and the model returned 41 rules.
Justification: The results are easy to evaluate because each rule comes with lift, a meaningful metric that indicates the relative strength of a given combination.
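For reference, here is the standard definition of lift in terms of support and confidence. arules reports this same quantity.

```latex
\mathrm{lift}(X \Rightarrow Y)
  = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}
  = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}
```

A lift above 1 means X and Y occur together more often than they would if they were independent. As a hypothetical example, suppose 60% of guests are Promoters and a rule {high score on some satisfaction metric} => {Promoter} has confidence 0.9. Its lift is then 0.9 / 0.6 = 1.5.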


- Correlation of early check-in and late check-out with NPS score, purpose of visit and age
Method Adopted: We took the following steps:
- Before starting the analysis, we plotted histograms of check-in and check-out hours. We proceeded only after seeing clear trends of customers checking in early or checking out late.
- To relate these behaviors to numerical features such as stay duration, number of adults and number of children, we plotted histograms of each feature's frequency distribution, split by check-in and check-out behavior.
- We calculated the percentage of Promoters and of Detractors who checked in early or checked out late, and studied those distributions.
- Finally, we mined association rules from the data, using lapply to convert the data frame columns into factors.
Justification: Check-in times between 7am and 2pm were considered early, since regular check-in at Hyatt hotels starts at 3pm. Check-out times between 1pm and 4pm were considered late, since regular check-out at Hyatt hotels ends at 12 noon.

Purpose of visit was coded as Leisure = 1, Business = -1 and Combination = 0. This let us build linear regression models to predict the probability of a customer checking in early or checking out late. Customers on leisure trips showed more of a tendency to request these than guests on business trips.

The data included far more Promoters than Detractors, so raw counts would have been misleading. In addition to studying the relationship of NPS with early check-in and late check-out, we therefore calculated the percentage of Promoters and the percentage of Detractors showing these behaviors, and checked whether those fractions differed. We also created association rules to see which combinations of features are most associated with these behaviors, using lift to compare combinations of multiple features.
Results, Interpretation and Recommendations:


Identifying customer patterns on Valentine's Day and the overall tendency of customers across the United States
Results & Interpretation:
- The size of each dot represents the percent change in the number of arrivals by zip code. Larger dots indicate hotels that are more popular with guests coming specifically for Valentine's Day.


Recommendations:
- Hyatt should offer, or continue to offer, Valentine's Day packages at hotels with large percentage increases in arrivals on Valentine's Day.
- Hyatt should consider offering a free or discounted night before or after Valentine's Day, since these hotels will have an influx of guests who otherwise would not be there.

Identifying which types of guests are likely to be promoters

Results & Interpretation:
- We followed up with a heat map. The x-axis is age range, the y-axis is purpose of visit, and the color of each square represents likelihood to recommend.
- The graph confirms what the association rules told us: older guests visiting for both leisure and business have a very high likelihood to recommend. It also shows something we did not notice in our association rules output: young professionals are promoters as well.

Recommendations:
- Hyatt should offer nightly rate discounts to guests over 60 and to young professionals. Lower nightly rates will give them an incentive to choose Hyatt over competing hotels and should ultimately lead to more promoters. Attracting more guests at both ends of the age spectrum will help Hyatt.

Further Analysis on Demographics and NPS: Purpose of Visit by Age
Results & Interpretations:
- A large majority of Hyatt guests are middle-aged. As the heat map above shows, attracting more guests at both ends of the age spectrum will help Hyatt. It is also worth noting that middle-aged guests are more likely to be visiting for business than leisure, while the opposite is true for guests 66+.
Correlation between nightly rate, length of stay, purpose of visit and likelihood to recommend
Results & Interpretations:
- The graph addresses a few questions:
  - How does nightly rate affect length of stay?
  - How do length of stay and nightly rate relate to likelihood to recommend?
  - Why are these guests visiting?
  Length of stay is long at both low nightly rates and very high nightly rates. Guests with long stays at high nightly rates are most often visiting for business. These guests are highly valuable to Hyatt: they bring in a lot of revenue, and most of them also have a high likelihood to recommend.
Recommendations:
- Hyatt should market to, and offer promotions for, the companies that send a significant number of guests.
- Hyatt should develop a balanced formula for setting nightly rates. Low nightly rates generally have the longest stays, and length of stay declines gradually as the nightly rate increases.

Relationship between length of stay, nightly rate and revenue
Results & Interpretations:
- In China, the most popular room types for long-term travelers are Guest Room Double and Guest Room King.
- Hyatt could also consider forming long-term partnerships with multinational corporations and offering discounts on Guest Room Double to encourage them to choose it more often.
- The graphs also show that few customers choose Guest Room Queen. We recommend that Hyatt offer larger discounts on Guest Room Queen to attract short-term travelers, and advertise Guest Room Queen more in China during the holiday season.
- The nightly rates by room type can be summarized as Double > Twin > King > Queen. Hyatt hotels in China might consider slightly increasing the nightly rates for Guest Room King and Guest Room Queen to generate more profit.

Recommendations:
- Based on our analysis, the suggested ideal price for Guest Room Twin in China is around 100 RMB per night.
- Hyatt should consider setting 200 RMB/night for Guest Room King, since demand remains healthy at around 200 RMB. Similarly, Hyatt should consider setting 150-200 RMB/night for Guest Room Double.


Correlation of early check-in and late check-out with NPS score, purpose of visit and age
Results & Interpretation:
- The following graphs show the relationship between early check-in and each feature suspected of being correlated with it.
- Ages 36-65 are more likely to check in early.
- Guests traveling for leisure tend to check in early.

Late Check-Out Results
- The following graphs show the relationship between late check-out and each feature suspected of being correlated with it.
- Ages 36-65 are more likely to check out late.
- Guests traveling for leisure tend to check out late.

Early Check-In and Late Check-Out by NPS

- Below are the plots of early check-in and late check-out by NPS type.

- 82.52% of Promoters check in early, compared with 82.22% of Detractors.
- 37.25% of Promoters check out late, compared with 37.19% of Detractors.
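Percentages such as 82.52% versus 82.22% divide each group's count of guests showing the behavior by that group's own size. This matters here because Promoters far outnumber Detractors. A minimal Python sketch, assuming each guest is given as a (NPS type, flag) pair (a hypothetical layout):

```python
# Sketch: the percentage of each NPS group showing a behavior (for example,
# early check-in). Comparing within-group percentages avoids being misled by
# Promoters far outnumbering Detractors. Input layout is hypothetical.
from collections import defaultdict


def percent_by_group(rows):
    """rows: iterable of (nps_type, showed_behavior) -> {nps_type: percent}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, showed in rows:
        totals[group] += 1
        hits[group] += 1 if showed else 0
    return {group: 100 * hits[group] / totals[group] for group in totals}
```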
OTHER EMINENT ANALYSIS PERFORMED
Because every team member contributed analyses, we debated how to include all of the results in the project report. Deciding which results are important is subjective and may not do justice to the work done or the approach chosen. We therefore include all the prominent analyses we performed, with their explanations, in this section.

LOCATION ANALYSIS:
When we began our location analysis, our goal was to identify patterns in likelihood to recommend. We wanted to find large regions with clear evidence that likelihood to recommend was significantly higher or lower. Unfortunately, linear regressions on country, state and city all produced very low R-squared values.
Linear Model: Country as a predictor of Likelihood to Recommend
> country <- lm(Likelihood_Recommend_H ~ Country_PL, data = febdata2)
> summary(country)
Residual standard error: 1.953 on 68402 degrees of freedom
Multiple R-squared: 0.004445, Adjusted R-squared: 0.003688
F-statistic: 5.873 on 52 and 68402 DF, p-value: < 2.2e-16

Linear Model: State as a predictor of Likelihood to Recommend
> state <- lm(Likelihood_Recommend_H ~ State_PL, data = febdata2)
> summary(state)
Residual standard error: 1.953 on 68407 degrees of freedom
Multiple R-squared: 0.004811, Adjusted R-squared: 0.004127
F-statistic: 7.036 on 47 and 68407 DF, p-value: < 2.2e-16

Linear Model: City as a predictor of Likelihood to Recommend
> city <- lm(Likelihood_Recommend_H ~ City_PL, data = febdata2)
> summary(city)
Residual standard error: 1.933 on 68086 degrees of freedom
Multiple R-squared: 0.02954, Adjusted R-squared: 0.0243
F-statistic: 5.633 on 368 and 68086 DF, p-value: < 2.2e-16
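These values show that location alone explains very little of the variation in likelihood to recommend. All three F-tests are statistically significant (p < 2.2e-16), but every R-squared value is below 0.03. It is worth noting, though, that City has a higher R-squared than Country and State.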
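The adjusted R-squared values in the summaries above can be reproduced from three inputs:
- R-squared
- the number of observations, n
- the number of model columns, k

In each summary, k is the first F-statistic degree of freedom and n - k - 1 is the residual degrees of freedom. For the country model, that gives n = 52 + 68,402 + 1 = 68,455, which matches the February row count. For factor predictors, k counts dummy columns: the city factor contributes 368, so its adjusted R-squared is trimmed more (0.0295 to 0.0243) than Country's or State's. A small Python check; printed values agree with the R output up to rounding of the inputs:

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictor columns fit to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than model columns")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R-squared and degrees of freedom copied from the three summaries above.
models = {
    "Country": (0.004445, 52, 68402),
    "State": (0.004811, 47, 68407),
    "City": (0.02954, 368, 68086),
}
for name, (r2, k, resid_df) in models.items():
    n = k + resid_df + 1  # observations = model columns + residual df + intercept
    print(name, n, round(adjusted_r_squared(r2, n, k), 6))
```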
To further analyze location, we created a map that displayed likelihood to recommend by zip code.

Graph of Likelihood to Recommend by Zip Code

There are a few anomalies in likelihood to recommend (such as the green dot near Houston, Texas). However, there is no clear nationwide pattern in which, for example, zip codes in a certain state are noticeably lower. To examine the zip codes further, we also made maps that zoom in on high-population areas (Southern California and the Northeast).
Graph of Likelihood to Recommend by Zip Code: Southern California

When zoomed in on Southern California, likelihood to recommend by zip code appears somewhat more variable than in other areas of the country. Our advice to Hyatt upon observing these graphs would be to identify any major differences between hotels with high and low likelihood to recommend, and to make changes accordingly.

Graph of Likelihood to Recommend by Zip Code: New York City


Like the Southern California graph, this gives a clearer picture of a region where Hyatt has many hotels. Overall, Hyatt appears to be doing well in this area. There are no green dots, and the lowest likelihood to recommend appears to be near Manhattan (the brown dot is a 6 on a 1-10 scale). As stated above, it would be wise for Hyatt to find out whether anything about this particular hotel leads to a lower likelihood to recommend. Otherwise, the hotel may simply attract guests who are harder to please.
AMENITIES ANALYSIS
Our analysis of hotel amenities relied heavily on linear modeling and visualization. We performed several rounds of linear regression to identify the amenities that are most statistically significant in predicting likelihood to recommend. However, low R-squared values made it difficult to draw any worthwhile insight for Hyatt. Nonetheless, here is the final model. It includes only amenities that are statistically significant in predicting likelihood to recommend (all p-values are less than .01).
lm(formula = Likelihood_Recommend_H ~ Bell.Staff_PL + Laundry_PL +
    Mini.Bar_PL + Self.Parking_PL + Shuttle.Service_PL, data = febamenities)

Residuals:
    Min      1Q  Median      3Q     Max
-8.1159 -0.5583  0.5925  1.2778  1.7776

Coefficients: (4 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          8.79993    0.01691 520.346  < 2e-16 ***
Bell.Staff_PLN       0.25823    0.04100   6.299 3.02e-10 ***
Bell.Staff_PLY      -0.11418    0.03678  -3.104  0.00191 **
Laundry_PLN         -0.09878    0.02451  -4.030 5.59e-05 ***
Laundry_PLY               NA         NA      NA       NA
Mini.Bar_PLN        -0.23718    0.03158  -7.512 5.93e-14 ***
Mini.Bar_PLY              NA         NA      NA       NA
Self.Parking_PLN    -0.12741    0.02710  -4.701 2.59e-06 ***
Self.Parking_PLY          NA         NA      NA       NA
Shuttle.Service_PLN  0.05773    0.02089   2.764  0.00572 **
Shuttle.Service_PLY       NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.978 on 55478 degrees of freedom
Multiple R-squared: 0.008219, Adjusted R-squared: 0.008112
F-statistic: 76.62 on 6 and 55478 DF, p-value: < 2.2e-16
Our initial model contained all amenities, and we gradually narrowed it down by eliminating variables that were not statistically significant. Judged strictly by p-values, the remaining variables (bell staff, laundry, mini bar, self-parking and shuttle service) are all amenities that Hyatt should consider when managing its hotels. The coefficients suggest that having laundry, a mini bar or self-parking is associated with a greater likelihood to recommend, while the estimates for bell staff and shuttle service point the other way. But as previously mentioned, an R-squared of only .008 means that these variables have minimal effect. A deeper look into amenities is needed for any actionable insight.
- Are guests satisfied with the internet?
Another amenity, internet, had ten columns relating to guest satisfaction with it. To determine how important internet is in predicting likelihood to recommend, we ran a linear model with all internet-related variables.
> internet <- lm(Likelihood_Recommend_H ~ Internet_Sat_H + Internet_Dissat_Lobby_H +
    Internet_Dissat_Slow_H + Internet_Dissat_Wired_H + Internet_Dissat_Other_H +
    Internet_Dissat_Billing_H + Internet_Dissat_Expensive_H +
    Internet_Dissat_Connectivity_H + TV_Internet_General_H +
    Room_Dissat_Internet_H, data = aprildata4)
> summary(internet)
Residual standard error: 1.505 on 30355 degrees of freedom
  (38085 observations deleted due to missingness)
Multiple R-squared: 0.3362, Adjusted R-squared: 0.3359
F-statistic: 1098 on 14 and 30355 DF, p-value: < 2.2e-16
Compared with our models of the other amenities, the R-squared of this model (.3362) is far higher. Because of this, our recommendation to Hyatt would be to invest heavily in internet services and ensure that guests can easily connect from all areas of the hotel.
One last internet-related aspect that we thought was worth examining was age. We predicted that older guests (66+) put less of a premium on how well the internet is working, mostly because they are less likely to use it frequently. The following boxplot supports our hypothesis:
> plot(febdata2$Age_Range_H, febdata2$Internet_Sat_H)%

As we expected, internet satisfaction for guests 65 and younger was generally lower than for guests 66+. It should be noted that older guests are more likely to be visiting for leisure, which may explain why internet is less important to them.
- What are guests paying for amenities?
Another idea we had about amenities was that if a hotel offers an amenity that guests can use for free, guests should be paying more per night. Conversely, if the hotel offers an amenity that guests must pay extra to use, the nightly rate should be lower. Letting guests save money through a lower nightly rate gives them more to spend on amenities, which should enhance their overall experience. A better overall experience should in turn increase their likelihood to recommend. These boxplots (arranged with grid.arrange) compare nightly rates at hotels with and without each amenity.

After examining these plots, we can place each amenity into one of four categories:
1. The amenity costs the customer nothing extra, and the nightly rate is higher when it exists: spa, outdoor pool, possibly bell staff.
2. The amenity costs the customer nothing extra, and the nightly rate is lower when it exists: indoor pool, fitness center.
3. The amenity must be paid for, but the nightly rate is higher when it exists: golf, mini bar.
4. The amenity must be paid for, and the nightly rate is lower when it exists: none.
Based on these conclusions, we recommend that Hyatt decrease nightly rates when the hotel has a golf course or the room has a mini bar. This gives guests a greater incentive to use these amenities and may raise NPS scores.
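The four-way classification above can be written down as a hedged Python sketch. Because boxplots display medians, it compares median nightly rates with and without an amenity. That choice, the function names and the treatment of a zero difference are assumptions for illustration, not the team's exact procedure.

```python
# Hypothetical sketch of the four-way amenity classification above.
# Boxplots show medians, so this compares median nightly rates with and
# without an amenity; the inputs and names are illustrative assumptions.
from statistics import median


def rate_difference(rates_with, rates_without):
    """Median nightly rate with the amenity minus the median rate without it."""
    return median(rates_with) - median(rates_without)


def amenity_category(costs_extra: bool, diff: float) -> int:
    """Map an amenity to categories 1-4 as numbered in the list above.

    A difference of exactly zero is treated as "lower" here, an arbitrary
    tie-breaking choice.
    """
    if not costs_extra:
        return 1 if diff > 0 else 2
    return 3 if diff > 0 else 4
```

Under the team's reasoning, category 3 is the one that calls for a pricing change, which is why golf and the mini bar are singled out in the recommendation.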

MARKETING STRATEGY TO INCREASE NPS
- Length of Stay vs. Revenue
United States

From this graph, we can roughly summarize the nightly rates by room type as Double > King > Twin > Queen. The two most popular room types for business travelers are Guest Room Double and Guest Room King. Hyatt could therefore consider establishing ongoing partnerships with international corporations and offering them annual discounts on these two room types. Typically, a Guest Room Queen is priced higher than a Guest Room Twin, so Hyatt may want to consider increasing the nightly rate of Guest Room Queen in the U.S.
India
The graph suggests that Hyatt does not offer Guest Room Queen in India. In addition, the revenue generated by Guest Room Twin is quite low. Our team recommends that Hyatt consider replacing Guest Room Twin in India with Guest Room Double or Guest Room King.
Japan
From this graph, we can tell that the most popular room type in Japan is Guest Room Twin, possibly because space in Japan is limited. The nightly rates by room type can be summarized as Twin > Queen > Double > Twin. Long-term travelers choose Guest Room King more often, so Hyatt might consider increasing the nightly rate of Guest Room King.
- Nightly Rate vs. Revenue

Japan
From this graph, we can tell that Guest Room Twin is price sensitive: as its price rises, revenue decreases because demand falls. The ideal price for Guest Room Twin in Japan is 100-200 JPY per night. Guest Room Queen is not price sensitive, because revenue increases as the price increases, so Hyatt could consider setting 300 JPY/night for Guest Room Queen. Similarly, Hyatt could set 500 JPY/night for Guest Room King and 250 JPY/night for Guest Room Double.
Australia

From this graph, we can tell that Guest Room Twin, Guest Room Double and Guest Room King in Australia are price sensitive. As the price of Guest Room Twin rises, revenue decreases dramatically because demand falls. The ideal price for Guest Room Twin in Australia is 100 AUD per night. Because Guest Room Double and Guest Room King are also price sensitive, Hyatt could consider setting 150-200 AUD/night for Guest Room Double and 200-250 AUD/night for Guest Room King.
India
From this graph, we can tell that Guest Room Twin, Guest Room Double and Guest Room King are not price sensitive, because revenue for each room type does not change much. The ideal price for Guest Room Twin in India is less than 800 INR per night, with 1000-3000 INR/night for Guest Room King and 1500-2500 INR/night for Guest Room Double.
Check-In and Check-Out


- The following graph shows the frequency of check-ins and check-outs by hour of the day. The majority of customers check in early or check out late.

- The following graphs show the relationship between early check-in and each feature suspected of being correlated with it.
- The following graphs show the relationship between late check-out and each feature suspected of being correlated with it.

These are some additional trials:
Below are the visualizations of the association rules for early check-in:

Below are the visualizations of the association rules for late check-out:

Interpretation of Results
• More than 80% of all guests check in early.
• More than 70% of all guests check out late.
• Given the small variation across guest groups in the percentage of guests who show these behaviors, our team treated this as a null hypothesis.
• However, if the volume of data we analyzed were increased, this null hypothesis might turn into an accepted phenomenon. Several of Hyatt's competitors, including Sheraton, Four Seasons and Wyndham among others, have moved to paid early check-in and late check-out models, in which customers pay for this benefit. This also aligns with the fundamentals of Receptive Programmed Decision Making and gives the customer a receptive choice.
• As part of the same analysis, our team also found that, based on numerical features such as stay duration and the number of adults and kids, and categorical features such as age group and purpose of visit, it should be feasible to build linear regression models that approximately predict a customer's chance of opting for early check-in or late check-out as soon as he/she makes the booking and these details become available to the hotel. Customers whose predicted chance is above a certain threshold can be proactively emailed to make them aware of this facility and to ask whether they want to use it, which may further help improve NPS.
• Guests aged 36-65 are more likely to check in early or check out late.
• Guests travelling for leisure tend to check in early or check out late.
• Guests travelling for fewer days, in smaller groups and with fewer kids tend to check in early or check out late.
• Guests aged 26-35 travelling for leisure with no kids and staying a single day have the highest chance of checking in early and are usually Promoters.
• Guests travelling alone (no family) for leisure and staying more than 6 days have the greatest chance of a normal check-out.
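
The threshold idea above can be sketched with base R's `lm()` fitted to a 0/1 early check-in flag (a linear probability model, matching the report's mention of linear regression; logistic regression with `glm(..., family = binomial)` is a common alternative). Everything here is hypothetical: the `history` and `newBookings` data frames, their columns (`StayDays`, `Adults`, `Kids`, `EarlyCheckIn`, `GuestId`) and the 0.5 cut-off are made up for illustration.

```r
# Hypothetical past stays: EarlyCheckIn is 1 if the guest checked in early, else 0.
history <- data.frame(
  StayDays     = c(1, 1, 2, 2, 3, 4, 5, 6, 7, 8),
  Adults       = c(1, 2, 1, 2, 2, 3, 2, 4, 3, 4),
  Kids         = c(0, 0, 0, 1, 0, 1, 2, 2, 1, 3),
  EarlyCheckIn = c(1, 1, 1, 1, 1, 0, 1, 0, 0, 0)
)

# Linear regression on the 0/1 outcome (a linear probability model).
model <- lm(EarlyCheckIn ~ StayDays + Adults + Kids, data = history)

# New bookings: these details are known as soon as the reservation is made.
newBookings <- data.frame(
  GuestId  = c("A", "B", "C"),
  StayDays = c(1, 7, 2),
  Adults   = c(1, 4, 2),
  Kids     = c(0, 3, 0),
  stringsAsFactors = FALSE
)

# Predicted chance of early check-in, clipped to [0, 1] because a linear
# model can predict values outside that range.
newBookings$Chance <- pmin(pmax(predict(model, newdata = newBookings), 0), 1)

# Guests above the (assumed) threshold would be emailed about early check-in.
threshold <- 0.5
toContact <- newBookings$GuestId[newBookings$Chance > threshold]
print(newBookings)
print(toContact)
```

With these made-up numbers, guests A and C (short stays, small parties) land above the cut-off and B does not. In practice the threshold would need tuning on real bookings.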
Booking Channel
• Promoters account for a large share of guests across the booking channels.
• Creating a data frame of NPS_Type and Booking_Channel:
library(sqldf)
library(ggplot2)
# Count February guests by booking channel and NPS type, then plot stacked bars
febBookChannel <- sqldf("select Booking_Channel, NPS_Type from febData")
febBookChannel.freq <- as.data.frame(table(febBookChannel))
ggplot(febBookChannel.freq, aes(x = Booking_Channel, y = Freq)) +
  geom_bar(aes(fill = NPS_Type), stat = "identity") +
  xlab("Booking Channel") + ylab("Frequency") +
  guides(fill = guide_legend(title = "NPS Type"))

• Comparing Booking Channel vs. POV (Purpose of Visit) for promoters:
promoterFebPOV <- sqldf("select Booking_Channel, POV_H from febData where NPS_Type = 'Promoter'")
promoterFebPOV.freq <- as.data.frame(table(promoterFebPOV))
ggplot(promoterFebPOV.freq, aes(x = Booking_Channel, y = Freq)) +
  geom_bar(aes(fill = POV_H), stat = "identity") +
  xlab("Booking Channel") + ylab("Frequency") +
  guides(fill = guide_legend(title = "Purpose of Visit"))
• Comparing Booking Channel vs. POV (Purpose of Visit) for passives:
passiveFebPOV <- sqldf("select Booking_Channel, POV_H from febData where NPS_Type = 'Passive'")
passiveFebPOV.freq <- as.data.frame(table(passiveFebPOV))
ggplot(passiveFebPOV.freq, aes(x = Booking_Channel, y = Freq)) +
  geom_bar(aes(fill = POV_H), stat = "identity") +
  xlab("Booking Channel") + ylab("Frequency") +
  guides(fill = guide_legend(title = "Purpose of Visit"))

Grouping by POV_H
• Comparing Booking_Channel vs. NPS_Type for business travellers:
businessFebData <- sqldf("select Booking_Channel, NPS_Type from febData where POV_H = 'Business'")
businessFebData.freq <- as.data.frame(table(businessFebData))
ggplot(businessFebData.freq, aes(x = Booking_Channel, y = Freq)) +
  geom_bar(aes(fill = NPS_Type), stat = "identity") +
  xlab("Booking Channel for Business") + ylab("Frequency") +
  guides(fill = guide_legend(title = "NPS Type"))
leisureFebData <- sqldf("select Booking_Channel, NPS_Type from febData where POV_H = 'Leisure'")
leisureFebData.freq <- as.data.frame(table(leisureFebData))
ggplot(leisureFebData.freq, aes(x = Booking_Channel, y = Freq)) +
  geom_bar(aes(fill = NPS_Type), stat = "identity") +
  xlab("Booking Channel for Leisure") + ylab("Frequency") +
  guides(fill = guide_legend(title = "NPS Type"))

combinationFebData <- sqldf("select Booking_Channel, NPS_Type from febData where POV_H = 'Combination of both business and leisure'")
combinationFebData.freq <- as.data.frame(table(combinationFebData))
ggplot(combinationFebData.freq, aes(x = Booking_Channel, y = Freq)) +
  geom_bar(aes(fill = NPS_Type), stat = "identity") +
  labs(x = "Booking Channel for Combination", y = "Frequency") +
  guides(fill = guide_legend(title = "NPS Type"))
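
Using the standard NPS formula (percentage of promoters minus percentage of detractors), frequency tables like the ones built above can also be condensed into one NPS per booking channel. A minimal base-R sketch follows; the `channelFreq` data frame, its channel names and its counts are made up, and it only assumes the same three-column shape (`Booking_Channel`, `NPS_Type`, `Freq`) that `as.data.frame(table(...))` produces.

```r
# Hypothetical counts in the same shape as febBookChannel.freq.
channelFreq <- data.frame(
  Booking_Channel = rep(c("Channel A", "Channel B"), each = 3),
  NPS_Type        = rep(c("Promoter", "Passive", "Detractor"), times = 2),
  Freq            = c(60, 25, 15, 40, 30, 30),
  stringsAsFactors = FALSE
)

# NPS = 100 * (promoters - detractors) / all respondents, for one group of rows.
npsForGroup <- function(d) {
  promoters  <- sum(d$Freq[d$NPS_Type == "Promoter"])
  detractors <- sum(d$Freq[d$NPS_Type == "Detractor"])
  100 * (promoters - detractors) / sum(d$Freq)
}

# Apply the formula separately to each booking channel.
npsByChannel <- sapply(split(channelFreq, channelFreq$Booking_Channel), npsForGroup)
print(npsByChannel)
```

With these made-up counts, Channel A scores 45 and Channel B scores 10, even though both channels have 100 respondents.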

• Guest satisfaction is collected via feedback, so we asked which guest-satisfaction metrics are more relevant than others for boosting NPS.
• We applied association rules to all guest-satisfaction metrics. We created a new data frame containing all 8 metrics and, to make the results easier to interpret, added NPS_Type to it. We then applied the apriori algorithm from the arules library. This produced 41 rules, 6 of which were relevant for understanding the result.
• Why association rules for this?
• The results are easy to evaluate, because lift is a meaningful metric for judging whether a given combination is relevant or not.
Output:
     lhs                       rhs                    support    confidence     lift
[1]  {}                     => {NPS_Type=}            0.94149455 0.9414946    1.00000
[2]  {Overall_Sat_H=10}     => {Condition_Hotel_H=10} 0.02052541 0.8759209   31.19884
[3]  {Condition_Hotel_H=10} => {Overall_Sat_H=10}     0.02052541 0.7310807   31.19884
[4]  {Overall_Sat_H=10}     => {Customer_SVC_H=10}    0.02174757 0.9280764   29.27289
[5]  {Customer_SVC_H=10}    => {Overall_Sat_H=10}     0.02174757 0.6859500   29.27289
[6]  {Overall_Sat_H=10}     => {NPS_Type=Promoter}    0.02322954 0.9913196   24.68251
[7]  {NPS_Type=Promoter}    => {Overall_Sat_H=10}     0.02322954 0.5783840   24.68251
[8]  {Guest_Room_H=10}      => {Condition_Hotel_H=10} 0.02340047 0.9105724   32.43306
[9]  {Condition_Hotel_H=10} => {Guest_Room_H=10}      0.02340047 0.8334855   32.43306
[10] {Guest_Room_H=10}      => {Customer_SVC_H=10}    0.02217233 0.8627823   27.21341
[11] {Customer_SVC_H=10}    => {Guest_Room_H=10}      0.02217233 0.6993476   27.21341
[12] {Guest_Room_H=10}      => {NPS_Type=Promoter}    0.02406625 0.9364794   23.31707
[13] {NPS_Type=Promoter}    => {Guest_Room_H=10}      0.02406625 0.5992169   23.31707
[14] {Condition_Hotel_H=10} => {Customer_SVC_H=10}    0.02398505 0.8543075   26.94610
[15] {Customer_SVC_H=10}    => {Condition_Hotel_H=10} 0.02398505 0.7565236   26.94610
[16] {Condition_Hotel_H=10} => {NPS_Type=Promoter}    0.02577727 0.9181431   22.86052
[17] {NPS_Type=Promoter}    => {Condition_Hotel_H=10} 0.02577727 0.6418190   22.86052
[18] {Customer_SVC_H=10}    => {NPS_Type=Promoter}    0.02858652 0.9016606   22.45013
[19] {NPS_Type=Promoter}    => {Customer_SVC_H=10}    0.02858652 0.7117656   22.45013
[20] {Overall_Sat_H=10,
      Condition_Hotel_H=10} => {NPS_Type=Promoter}    0.02038610 0.9932129   24.72965
[21] {NPS_Type=Promoter,
      Overall_Sat_H=10}     => {Condition_Hotel_H=10} 0.02038610 0.8775938   31.25842
[22] {NPS_Type=Promoter,
      Condition_Hotel_H=10} => {Overall_Sat_H=10}     0.02038610 0.7908557   33.74974
[23] {Overall_Sat_H=10,
      Customer_SVC_H=10}    => {NPS_Type=Promoter}    0.02158860 0.9926904   24.71665
[24] {NPS_Type=Promoter,
      Overall_Sat_H=10}     => {Customer_SVC_H=10}    0.02158860 0.9293598   29.31337
[25] {NPS_Type=Promoter,
      Customer_SVC_H=10}    => {Overall_Sat_H=10}     0.02158860 0.7552021   32.22822
[26] {Guest_Room_H=10,
      Condition_Hotel_H=10} => {Customer_SVC_H=10}    0.02077924 0.8879839   28.00831
[27] {Guest_Room_H=10,
      Customer_SVC_H=10}    => {Condition_Hotel_H=10} 0.02077924 0.9371699   33.38042
[28] {Condition_Hotel_H=10,
      Customer_SVC_H=10}    => {Guest_Room_H=10}      0.02077924 0.8663412   33.71156
[29] {Guest_Room_H=10,
      Condition_Hotel_H=10} => {NPS_Type=Promoter}    0.02216891 0.9473703   23.58824
[30] {NPS_Type=Promoter,
      Guest_Room_H=10}      => {Condition_Hotel_H=10} 0.02216891 0.9211620   32.81025
[31] {NPS_Type=Promoter,
      Condition_Hotel_H=10} => {Guest_Room_H=10}      0.02216891 0.8600179   33.46550
[32] {Guest_Room_H=10,
      Customer_SVC_H=10}    => {NPS_Type=Promoter}    0.02142963 0.9665035   24.06463
[33] {NPS_Type=Promoter,
      Guest_Room_H=10}      => {Customer_SVC_H=10}    0.02142963 0.8904436   28.08589
[34] {NPS_Type=Promoter,
      Customer_SVC_H=10}    => {Guest_Room_H=10}      0.02142963 0.7496412   29.17047
[35] {Condition_Hotel_H=10,
      Customer_SVC_H=10}    => {NPS_Type=Promoter}    0.02288939 0.9543187   23.76124
[36] {NPS_Type=Promoter,
      Condition_Hotel_H=10} => {Customer_SVC_H=10}    0.02288939 0.8879679   28.00780
[37] {NPS_Type=Promoter,
      Customer_SVC_H=10}    => {Condition_Hotel_H=10} 0.02288939 0.8007056   28.51979
[38] {Guest_Room_H=10,
      Condition_Hotel_H=10,
      Customer_SVC_H=10}    => {NPS_Type=Promoter}    0.02017073 0.9707153   24.16949
[39] {NPS_Type=Promoter,
      Guest_Room_H=10,
      Condition_Hotel_H=10} => {Customer_SVC_H=10}    0.02017073 0.9098655   28.69848
[40] {NPS_Type=Promoter,
      Guest_Room_H=10,
      Customer_SVC_H=10}    => {Condition_Hotel_H=10} 0.02017073 0.9412539   33.52589
[41] {NPS_Type=Promoter,
      Condition_Hotel_H=10,
      Customer_SVC_H=10}    => {Guest_Room_H=10}      0.02017073 0.8812262   34.29077


From the output we infer that guests who give a full score of 10 on the following metrics are very likely to be promoters (rules [12], [16] and [18] all have confidence above 0.90):
• Quality of Customer Service
• Hotel Condition
• Guest Room
• One important observation: if a customer is a promoter, there is a high chance that he/she is satisfied with both hotel condition and quality of customer service.
• A promoter usually gives a full rating of 10 to hotel condition and to quality of customer service (rules [17] and [19], with confidence 0.64 and 0.71).
• Since almost all rules have lift values far above 1 (roughly 22 to 34), these top scores and the Promoter type occur together far more often than they would by chance, so it can be inferred that a top score on almost any of these metrics raises the probability that the guest is a promoter.
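
Support, confidence and lift, which apriori in the arules library reports for every rule, can be computed by hand for a single rule. The sketch below does that on a made-up table of eight guests (the `guests` data frame and its logical columns are hypothetical) and then uses rules [6] and [7] from the output above to check how the reported numbers fit together.

```r
# Toy "transactions": one row per guest; TRUE means the guest gave a 10 on
# customer service / is a promoter (made-up values).
guests <- data.frame(
  Customer_SVC_H_10 = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  Promoter          = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# Measures of the rule {lhs} => {rhs}:
#   support    = P(lhs and rhs)
#   confidence = P(rhs | lhs) = support / P(lhs)
#   lift       = confidence / P(rhs)   (lift > 1: seen together more often than chance)
ruleMeasures <- function(lhs, rhs) {
  support    <- mean(lhs & rhs)
  confidence <- support / mean(lhs)
  lift       <- confidence / mean(rhs)
  c(support = support, confidence = confidence, lift = lift)
}
m <- ruleMeasures(guests$Customer_SVC_H_10, guests$Promoter)
print(m)

# Consistency check on rules [6] and [7] of the output above: both give the
# base rate P(NPS_Type=Promoter).
pPromoterFromRule6 <- 0.9913196 / 24.68251     # confidence / lift = P(rhs)
pPromoterFromRule7 <- 0.02322954 / 0.5783840   # support / confidence = P(lhs)
print(c(pPromoterFromRule6, pPromoterFromRule7))
```

Both rules imply that only about 4% of rows are promoters (P(NPS_Type=Promoter) ≈ 0.040), which is consistent with rule [1], where about 94% of rows have an empty NPS_Type value. Because lift = confidence / P(rhs), a rule whose right-hand side is NPS_Type=Promoter can reach a lift of at most about 1/0.040 ≈ 24.9, so the lift values of about 22 to 25 for those rules correspond to confidence between about 0.90 and 0.99.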

• Which locations across the globe have a larger share of revenue from reservations?
We applied 3 steps to achieve this. First, we used tapply to group February reservation revenue by the cities in which Hyatt is located worldwide. Then we combined the tapply result with the world.cities dataset imported from the web; world.cities has 4 columns: country, city, latitude and longitude. Finally, after merging the tapply result with world.cities, we plotted the result on a world map, with each dot representing a location and its color representing the revenue of that place.
• Why map and tapply?
Tapply is quick and efficient at grouping data by a column. A map is a more interactive way to visualize the results and is easy to understand.
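
The three steps above (tapply, merge with a city-coordinates table, then plotting) can be sketched in base R up to the point where the data is ready for the map. This is an illustration under assumptions: `febRevenue`, its `City` and `Revenue` columns, the revenue values and the small `worldCities` stand-in (with the four columns the text describes) are made up, and per-city mean revenue is an assumed choice of aggregation.

```r
# Hypothetical February reservation revenue per stay (values are made up).
febRevenue <- data.frame(
  City    = c("Chicago", "Chicago", "Paris", "Paris", "Shanghai"),
  Revenue = c(200, 300, 450, 550, 120),
  stringsAsFactors = FALSE
)

# Step 1: mean reservation revenue per city with tapply().
cityMeans   <- tapply(febRevenue$Revenue, febRevenue$City, mean)
cityRevenue <- data.frame(city = names(cityMeans), revenue = as.vector(cityMeans),
                          stringsAsFactors = FALSE)

# Step 2: a small stand-in for world.cities (country, city, latitude, longitude).
worldCities <- data.frame(
  country   = c("USA", "France", "China", "Germany"),
  city      = c("Chicago", "Paris", "Shanghai", "Berlin"),
  latitude  = c(41.88, 48.86, 31.23, 52.52),
  longitude = c(-87.63, 2.35, 121.47, 13.40),
  stringsAsFactors = FALSE
)

# Step 3: merge on the city name; cities without Hyatt revenue (Berlin) are dropped.
revenueMap <- merge(cityRevenue, worldCities, by = "city")
print(revenueMap)
```

Each row of `revenueMap` now carries a revenue and coordinates, so it can be drawn as one dot per location, colored by revenue. With real data, city names shared by several countries would also need the country in the join key.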

Output: Result of tapply and merge

• Mapping the result globally

The output shows that the maximum revenue comes from the USA, because half of all Hyatt locations are there.

• Of 370 locations worldwide, 185 are in the USA.
• 30% of total reservation revenue comes from the USA.

Within the USA:

• Hyatt Regency Clearwater Beach Resort and Spa leads the US with the maximum revenue of $827.137.
• Hyatt Place Lincoln/Downtown-Haymarket has the lowest revenue in the US, at $129.34.

• Get a comparative analysis of worldwide reservation revenue across countries, so that Hyatt knows where it needs to enhance operations.
The data frame generated in the previous step by merging world.cities with the tapply result is used here. To find where Hyatt needs to enhance operations, we used the sqldf library, which runs SQL queries on data frames.

• Why sqldf?
Once all the data we need is in a data frame, the analysis is easy to get with SQL, so the sqldf library is a convenient and quick way to generate the desired result.
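
The country-level comparison can be sketched as follows. The report ran SQL through sqldf; to keep the example free of extra packages, this sketch swaps in base R's `tapply()` and shows a roughly equivalent SQL query in a comment. The `hotels` data frame, its columns and all revenue figures are hypothetical.

```r
# Hypothetical merged data: one row per hotel location (values are made up).
hotels <- data.frame(
  country = c("USA", "USA", "USA", "France", "France", "China", "China"),
  revenue = c(300, 250, 200, 600, 500, 130, 125),
  stringsAsFactors = FALSE
)

# Per-country summary, roughly what this sqldf query would return:
#   select country, count(*) as locations, sum(revenue) as total,
#          avg(revenue) as meanRevenue from hotels group by country
countries <- sort(unique(hotels$country))
byCountry <- data.frame(
  country     = countries,
  locations   = as.vector(tapply(hotels$revenue, hotels$country, length)[countries]),
  total       = as.vector(tapply(hotels$revenue, hotels$country, sum)[countries]),
  meanRevenue = as.vector(tapply(hotels$revenue, hotels$country, mean)[countries]),
  stringsAsFactors = FALSE
)
byCountry$share <- byCountry$total / sum(byCountry$total)   # share of all revenue

# Highest and lowest mean revenue per location across countries.
highestMean <- byCountry$country[which.max(byCountry$meanRevenue)]
lowestMean  <- byCountry$country[which.min(byCountry$meanRevenue)]
print(byCountry)
print(c(highest = highestMean, lowest = lowestMean))
```

With these made-up numbers France has the highest mean revenue per location and China the lowest, even though the USA has the most locations and the largest total.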

Global View:

China has the lowest revenue from reservations, with a mean of just $127.5412, lower than the global average.
Though the USA accounts for close to 30% of reservation revenue, France is the highest revenue generator per location for Hyatt, with just 5 locations against 185 in the USA.
We can infer from worldwide reservation revenue that Hyatt should expand further in France, because with just 5 locations it is the highest revenue generator for Hyatt. France could be an upcoming market, so Hyatt management should give this some thought. At the same time, they should look into their China operations: since China is the lowest revenue generator, Hyatt management should study the factors that are crucial for boosting operations in China.

Reflections:

This whole project has been a great learning experience for all of us. Several aspects of this project were novel and challenging for all of us. As none of us had used R before, it was quite a challenge to work out the logic behind the code and come up with meaningful and useful results.
In addition to these, the most significant takeaways from this project are as follows:
• A data question is the basis of any analysis
As soon as the project description was given to us, most of us jumped directly to creating fancy graphs and charts depicting correlations between different attributes. Once we had these graphs, we couldn't understand their utility with respect to business growth, as they did not give us any conclusive insights. So we went back to the start, came up with major data questions, and worked on those individually. Without data questions, there is a high chance of getting stranded in numerous analyses with no results or conclusions.
• Any assumptions made should be examined carefully.
Assumptions are always made in any type of analysis, and they can be very critical in data analysis. While doing several analyses to come up with meaningful results, we faced many setbacks because we made assumptions based on individual understanding. On further examination we noticed that some of those analyses were based on a false premise and were not supported by the data. Thus, it is very important to observe what the data conveys, not what you think you understood from it.
• Sample size is crucial in data analysis
As mentioned above, we worked on the February data for all of our analyses. While brainstorming and performing analyses, we tried several different models and correlations, but some of them were not reliable because there were many anomalies and discrepancies in the results. This suggests that an adequate sample size provides the necessary reliability to the results. At the same time, a big data set brings cleansing and handling issues, so the sample size must be chosen carefully.
• Data cleansing and grouping should be done prudently.
During the entire project there were several instances where we had to go back to the original data set and add columns that had been removed or not included in the working data frames. Reckless handling of a data set can be disastrous, as recovering lost data can be another colossal task. Thus, make sure you keep an original copy intact and work on a copy of it.
• Purpose is paramount
With 7 people on the team, there were many instances of intense discussion about individual analyses and how they served the purpose of the project. As all of us worked hard to come up with meaningful and reliable insights, we were all a little possessive of our work and wanted it included in the final project. But we made sure to have unbiased opinions of each other's work, which led us to identify and select the best work. This is essential for achieving a bigger purpose.
