Approach
Bhargav Ramprasad Panth Maaz Hasan Onurcan Onder
16340549 17319658 17314870
panthb@tcd.ie mhasan@tcd.ie ondero@tcd.ie
Some Aditya Mandal
17311198
mandals@tcd.ie
Abstract
The proliferation of social media has dramatically changed the behavior of the advertising industry.
Today, a considerable proportion of social media platforms are used like advertising billboards.
There are many reasons behind the popularity of social media marketing; one of the biggest is that
it allows users to write reviews and opinions about the presented product (food, place, etc.).
However, since these platforms do not provide a proper rating system for such advertisements,
further evaluation and analysis of this large amount of data is difficult, or even impossible, with
basic techniques. In this paper, a hybrid approach is proposed to predict user ratings with
reasonably good accuracy.
1 Introduction
There has been interest in implementing more advanced rating prediction systems using different
techniques and approaches such as machine learning, text mining, and semantic analysis. However,
there is still no commonly accepted accurate prediction system for this problem. Accordingly, this
paper proposes a hybrid rating prediction technique based on user reviews. As part of this study, the
effectiveness of different techniques and approaches is analyzed and substantiated, and an effective
hybrid system based on the observations is developed. First, stop words are removed and stemming is
applied for feature selection. Then a unique lexicon is created using the Yelp user reviews dataset.
Finally, three different machine learning classifiers are applied, evaluated, and analyzed.
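The pre-processing steps named above (stop-word removal followed by stemming) can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the stop-word list is a tiny hypothetical one, and `crude_stem` is a simple suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "was", "were", "and", "or", "of", "to", "it", "this", "very"}

# Suffixes removed by the crude stemmer below, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("The waiters were amazing and the dishes tasted great"))
# → ['waiter', 'amaz', 'dish', 'tast', 'great']
```

The stems need not be dictionary words; what matters for feature selection is that inflected forms of the same word collapse to one token.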
2 Related Work
Various techniques and systems have been described in the literature on user rating prediction. Re-
searchers have been looking into ways of improving the accuracy of the predictions.
To this end, Channapragada and Shivaswamy (2015) developed a system that predicts the rating
of a business based on the user review. They used Linear Regression as the regression algorithm,
and Support Vector Machine (SVM) and Naive Bayes as the classification algorithms. They examined
the Yelp dataset and used it to evaluate their system. Even from visualizations of the dataset, they
were able to make some observations. For example, the data shows a consistent decrease in rating
as the number of words in a review increases (Fig. 1). As a further pre-processing step, they
calculated polarity weights for words; e.g., "worst" is the most negative word and "incredible" the
most positive. Finally, they compared the three machine learning algorithms with different feature
combinations from the dataset. They used 80% of the data for training and the rest for testing.
Experiments showed that, since linear regression outputs real values that can fall between two
integers, its mean squared error was lower than that of the classification algorithms. However, a
non-integer rating is not meaningful for a 5-star review. Between the two classification algorithms,
SVM always gave slightly better results across all features, because Naive Bayes wrongly assumes
conditional independence between features.
Figure 1: Frequency Distribution of Length Review vs Average Rating, from Channapragada and
Shivaswamy (2015)
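To illustrate why real-valued regression outputs can yield a lower mean squared error than whole-star class predictions, the following sketch computes MSE for both kinds of output. The ratings and predictions are invented for illustration and are not taken from Channapragada and Shivaswamy (2015).

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted ratings."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented star ratings and predictions, for illustration only.
true_stars = [4, 5, 3, 1]
regression_pred = [4.4, 4.6, 3.4, 1.6]  # real-valued outputs
classifier_pred = [5, 4, 3, 2]          # whole-star outputs

mse_regression = mean_squared_error(true_stars, regression_pred)
mse_classifier = mean_squared_error(true_stars, classifier_pred)
print(round(mse_regression, 2), round(mse_classifier, 2))
```

In this example a classifier that misses by one star is penalized by a full squared unit, while a regression output that lands between two stars incurs only a fractional penalty.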
Wang (2015) applied sentiment analysis to the Yelp user reviews. Sentiments were predicted
using Naive Bayes, multi-class SVM, and the Perceptron learning algorithm. For all models in the
study, removing stop words and common symbols and applying stemming reduced the chance of
multicollinearity and reduced the dimensionality of the data. Doing so also led to significantly
lower training error. The author concludes that multi-class SVM and Naive Bayes were less accurate
than the Perceptron learning algorithm, and that adding regularization terms and tuning parameters
with cross-validation can improve performance on the test set.
Ganu et al. (2009) focused on identifying information structure and sentiment from free-form
text reviews to predict the rating. The authors extracted their corpus of over 50000 restaurant reviews
from Citysearch New York. First, they analyzed the data to identify categories which are specific to the
restaurant reviews domain using 7-fold cross-validation. They were able to identify the following six
categories: Food, Service, Price, Ambience, Anecdotes, and Miscellaneous. To classify the sentences
into the above-mentioned categories and sentiment classes, they manually annotated a training set of
approximately 3400 sentences with both category and sentiment information. They trained and tested
SVM classifiers on their manually annotated data (one classifier for each topic and one for each senti-
ment type). Then they performed 7-fold cross validation with accuracy, precision and recall metrics to
observe the performance of their classification. They then analyzed the full corpus of 52264 user
reviews in depth to study the relation between the textual structure of the reviews and the metadata
entered by the reviewers, such as the star rating. Next, they compared the star rating with the
sentiment annotation produced by their classifier using the Pearson correlation coefficient. The coefficient
ranges from -1 to 1, with -1 for negative correlation, 1 for positive correlation and 0 for no correlation.
Their results showed a positive correlation (0.45) between the star rating and the percentage of positive
sentences in the review, and a negative correlation (-0.48) between the star rating and the percentage of
negative sentences. For rating prediction, they used the Mean Squared Error (MSE) metric to
evaluate several prediction strategies. In one of their experiments, for example, they based the
computation of the text rating on the number of positive and negative sentences in the review, in
review-based, topic-based, or rating-based variants. They further used multivariate regression to
model the user-provided star rating as the dependent variable, with the sentence types, represented
as (category, sentiment) pairs, as the independent variables. They concluded that predicting the
regression-based text ratings is more difficult than predicting the sentiment-based text ratings and
results in high MSE values.
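The Pearson correlation coefficient mentioned above can be computed as in the following sketch. The star ratings and positive-sentence shares are invented for illustration; they are not the Citysearch data analyzed by Ganu et al. (2009).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented data: star ratings and the share of positive sentences per review.
stars = [1, 2, 3, 4, 5]
positive_share = [0.1, 0.3, 0.2, 0.6, 0.8]
r = pearson(stars, positive_share)
print(round(r, 2))
```

A value close to 1 means the share of positive sentences rises with the star rating; a value close to -1 would mean it falls.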
than 3, the value 1.0 is assigned, which is interpreted as a "Positive" sentiment; otherwise 0.0 is
assigned for a "Negative" sentiment. Cross validation is used and the algorithms are executed on a
sample size of 100000. The sample set is randomly split into training (70% of the data) and test
(the remaining 30%) sets.
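The labelling and splitting steps can be sketched as below. Two assumptions are made here: that a rating counts as positive when it is strictly greater than 3, and that the split is a simple seeded shuffle. The function names and the toy reviews are hypothetical and are not taken from the study's code.

```python
import random


def to_polarity(stars, threshold=3):
    """Map a star rating to 1.0 (positive) or 0.0 (negative).

    Assumption: "positive" means strictly greater than the threshold.
    """
    return 1.0 if stars > threshold else 0.0


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into training and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Invented (review text, stars) pairs standing in for the Yelp sample.
reviews = [("review %d" % i, 1 + i % 5) for i in range(10)]
labelled = [(text, to_polarity(stars)) for text, stars in reviews]
train, test = train_test_split(labelled)
print(len(train), len(test))  # → 7 3
```

Fixing the seed keeps the split reproducible across runs, which makes results from different classifiers comparable.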
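Precision, recall and F1-score are defined as
\[ \text{Precision} = \frac{tp}{tp + fp}, \qquad \text{Recall} = \frac{tp}{tp + fn}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]
where tp, fp and fn are the number of True Positives, False Positives, and False Negatives respectively.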
Accuracy is the proportion of all predictions that are correct. Precision is the proportion of
instances predicted as positive that are actually positive, and recall is the proportion of actual
positives that are correctly predicted.
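As a small worked example, precision, recall and F1 can be computed directly from the tp, fp and fn counts using their standard definitions. The counts below are invented, and the convention of returning 0.0 when a denominator is zero is a choice made for this sketch.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true positive, false positive
    and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented counts for illustration.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(round(p, 2), round(r, 2), round(f1, 2))
```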
In the predictive analysis, the confusion matrix is used as another evaluation criterion. A confusion
matrix is a table that reports the number of false positives, false negatives, true positives, and
true negatives. This allows a more detailed analysis than a single proportion of correct
classifications such as accuracy.
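A confusion matrix for the two-class polarity case can be built as in the sketch below. The labels are invented, and the layout (rows are true labels, columns are predicted labels) is one common convention assumed here.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested list where row i, column j counts items whose true
    label is labels[i] and whose predicted label is labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Invented polarity labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp = matrix[0]
fn, tp = matrix[1]
print(matrix)  # → [[3, 1], [1, 3]]
```

The same function works unchanged for the 5-class case by passing `labels=[1, 2, 3, 4, 5]`, which makes it visible which neighbouring star ratings are confused with each other.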
4.2 Results
4.2.1 Naive Bayes
Multinomial Naive Bayes is evaluated on 100,000 instances. The results are reported with precision,
recall and F1-score metrics. First, the polarity of the reviews is evaluated (Fig. 2). Then the same
methods are applied to 5 classes, which represent the 5 star ratings (Fig. 3). The results are
relatively high for the 2-class polarity evaluation. However, a significant decrease is observed in
the results for 5 classes. This can be attributed to the fact that the lexicons of 4- and 5-star
reviews are relatively close to each other, as are the lexicons of 1-, 2- and 3-star reviews.
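For readers unfamiliar with the classifier, the following is a minimal sketch of multinomial Naive Bayes with add-one (Laplace) smoothing over tokenized reviews. It is an illustration written for this text, not the implementation evaluated above; the class name and the toy reviews are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes over token lists with Laplace smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.token_counts[label].update(tokens)
        self.vocabulary = {t for c in self.token_counts.values() for t in c}
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum over tokens of log P(token | label), add-one smoothed
        total = sum(self.token_counts[label].values())
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for token in tokens:
            if token in self.vocabulary:  # ignore tokens never seen in training
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (total + len(self.vocabulary)))
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda label: self._log_score(tokens, label))


# Invented toy reviews, already tokenized.
train_docs = [
    ["great", "food", "friendly", "staff"],
    ["amazing", "food", "great", "service"],
    ["terrible", "food", "rude", "staff"],
    ["awful", "service", "terrible", "wait"],
]
train_labels = ["positive", "positive", "negative", "negative"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["great", "service"]))  # → positive
print(model.predict(["rude", "wait"]))      # → negative
```

When two classes share most of their vocabulary, as suggested above for neighbouring star ratings, their per-token probabilities are similar and the scores become hard to separate.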
Figure 3: 5 Classes Evaluation using Naive Bayes
Figure 5: 5 Classes Evaluation using SVM
Figure 6: Error Comparison of Feature Selection Algorithms using Logistic Regression
5 Conclusion
Various machine learning algorithms were evaluated for predicting ratings from reviews in the Yelp
dataset. The effectiveness of each algorithm was measured with precision, recall and F1 metrics. No
significant improvement was observed from the removal of features such as stop words in the case of
polarity classification. However, there was a noticeable improvement in the results in the case of
multiclass classification. Overall, we achieved an accuracy of 79 percent for polarity classification,
compared with 40 percent for multiclass classification. Naive Bayes was the best performing algorithm.
References
Channapragada, S. and R. Shivaswamy (2015). Prediction of rating based on review text of yelp reviews.
Ding, X., B. Liu, and P. S. Yu (2008). A holistic lexicon-based approach to opinion mining. pp. 231–240.
Ganu, G., N. Elhadad, and A. Marian (2009). Beyond the stars: improving rating predictions using
review text content. In WebDB, Volume 9, pp. 1–6. Citeseer.
Wang, J. (2015). Predicting yelp star ratings based on text analysis of user reviews.
Author Declaration for Group Assignments
17319658, Maaz Hasan (10%): For research purpose, I contributed to the literature review by reading
three research papers focusing on the methods for achieving the results pertaining to the research
question. I also implemented the Bag of Words model from NLP and Naive Bayes on the yelp dataset to
categorise the data as good or bad but the precision and recall score obtained, I have implemented
the stop words removal on the overall system. I also contributed for drafting the final essay.
17314870, Onurcan Onder (30%): Onurcan studied on machine-learning papers and applied Support
Vector Machine to overall system. He wrote the abstract, introduction, related work sections and
fine-tuned the whole paper to make the paper suitable for formatting requirements. Also learnt
LaTeX tool.
17311198, Some Aditya Mandal (30%): Some studied on papers about data pre-processing, created a
unique lexicon and applied stemming. He also applied Naïve Bayes algorithm to overall system. He
wrote implementation section.
We have read and we understand the plagiarism provisions in the General Regulations
of the University Calendar for the current year, found at: http://www.tcd.ie/calendar
We have also completed the Online Tutorial on avoiding plagiarism ‘Ready, Steady,
Write’, located at http://tcd-ie.libguides.com/plagiarism/ready-steady-write
We declare that the assignment together with any supporting artefact is offered for
assessment as our original and unaided work, except in so far as any advice and/or
assistance from any other named person in preparing it and any reference material
used are duly and appropriately acknowledged. We declare that the percentage
contribution by each member as stated above has been agreed by all members of the
group and reflects the actual contribution of the group members.