9/29/14, 10:58 PM
Bora Beran
Worst blog article you ever saw? Well, my next one will be better.
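# sentiment has been archived on CRAN: download its last source release and install it locally
download.file("http://cran.r-project.org/src/contrib/Archive/sentiment/sentiment_0.2.tar.gz",
              "sentiment.tar.gz")
# repos = NULL installs from the local file; type = "source" builds it from the tarball
install.packages("sentiment.tar.gz", repos = NULL, type = "source")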
Let's take a first stab using the classify_polarity function. The Comment Text column contains reviews for a hypothetical product. We use our calculated field, Sentiment, for both the text and the color coding, since it returns one of three classifications: negative, neutral or positive.
You will notice that the results are not perfect. The second row from the bottom is in fact a negative comment about delayed delivery, yet it is classified as positive. More on that later. Now let's have a look at what the calculated field looks like.
http://boraberan.wordpress.com/2013/12/24/sentiment-analysis-in-tableau-with-r/
As you can see, the R script is very simple: we call the function and retrieve the column corresponding to best_fit. Another function in this package is classify_emotion, which classifies text into emotions such as anger, joy and fear. The function call is very similar, but this time we get a different dimension out of the results. The two rows associated with the emotion fear look especially far off. But how does this work, and how can it be made better?
(http://boraberan.files.wordpress.com/2013/12/image3.png)
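Conceptually, a best-fit label is the category with the strongest evidence for a given text. The base-R sketch below is a deliberately simplified illustration of that idea, not the sentiment package's own algorithm: it counts cue words from a tiny hypothetical emotion word list and returns NA when no cue word is found. The names emotion_lexicon and best_fit_emotion are made up for this example.

```r
# Simplified "best fit" emotion labelling by counting cue words (base R only).
# emotion_lexicon is a tiny hypothetical word list, not the package's lexicon.
emotion_lexicon <- c(furious = "anger", annoyed = "anger",
                     happy = "joy", delighted = "joy",
                     afraid = "fear", worried = "fear")

best_fit_emotion <- function(text, lexicon) {
  # Lowercase and split on anything that is not a letter or an apostrophe.
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  # Emotion label of every token that appears in the lexicon.
  hits <- lexicon[tokens[tokens %in% names(lexicon)]]
  if (length(hits) == 0) return(NA_character_)  # no cue words: no best fit
  # Count cue words per emotion; which.max() resolves ties by taking the first.
  counts <- table(factor(hits, levels = sort(unique(lexicon))))
  names(counts)[which.max(counts)]
}

reviews <- c("I was worried at first, but now I am happy and delighted.",
             "Still waiting and really annoyed.",
             "It is a phone.")
vapply(reviews, best_fit_emotion, character(1),
       lexicon = emotion_lexicon, USE.NAMES = FALSE)
# returns c("joy", "anger", NA)
```

Counting cue words is crude; it mainly shows why a text with no lexicon words ends up with no best fit at all.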
Sentiment analysis techniques can be classified into two high-level categories:
1. Lexicon-based: This technique relies on dictionaries of words annotated with their orientation, described as polarity and strength (e.g. negative and strong), from which a polarity score for the text is calculated. This method gives high-precision results as long as the lexicon used has good coverage of the words encountered in the text being analyzed.
2. Learning-based: These techniques require training a classifier with examples of known polarity, presented as text classified into positive, negative and neutral classes.
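For contrast, a learning-based classifier estimates how strongly each word signals each class from labeled examples instead of looking words up in a dictionary. The base-R sketch below is a minimal multinomial naive Bayes classifier with add-one smoothing; the six training sentences, their labels and the function names are hypothetical, and a real model would need far more labeled text.

```r
# Minimal learning-based classifier: multinomial naive Bayes with add-one
# (Laplace) smoothing, in base R. The training data below is hypothetical.
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens[nzchar(tokens)]
}

train_nb <- function(texts, labels) {
  classes <- sort(unique(labels))
  token_lists <- lapply(texts, tokenize)
  vocab <- sort(unique(unlist(token_lists)))
  # Word counts per class: one row per vocabulary word, one column per class.
  counts <- do.call(cbind, lapply(classes, function(cl) {
    as.numeric(table(factor(unlist(token_lists[labels == cl]), levels = vocab)))
  }))
  dimnames(counts) <- list(vocab, classes)
  # Smoothed log P(word | class) and log P(class).
  loglik <- log(sweep(counts + 1, 2, colSums(counts) + length(vocab), "/"))
  log_prior <- log(as.numeric(table(factor(labels, levels = classes))) / length(labels))
  names(log_prior) <- classes
  list(vocab = vocab, loglik = loglik, log_prior = log_prior)
}

predict_nb <- function(model, text) {
  tokens <- tokenize(text)
  tokens <- tokens[tokens %in% model$vocab]  # words never seen in training are ignored
  scores <- model$log_prior + colSums(model$loglik[tokens, , drop = FALSE])
  names(which.max(scores))
}

train_text <- c("great product love it", "works well very happy",
                "terrible quality broke quickly", "late delivery what a scam",
                "the box arrived today", "it is a phone")
train_label <- c("positive", "positive", "negative", "negative",
                 "neutral", "neutral")
model <- train_nb(train_text, train_label)
predict_nb(model, "Took 4 weeks to receive it. What a scam.")  # returns "negative"
```

The trade-off is the one described above: no lexicon is needed, but the quality depends entirely on having enough representative labeled examples.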
R's sentiment package follows a lexicon-based approach, so we were able to get right into the action, given that it comes with a lexicon (http://people.cs.pitt.edu/~wiebe/pubs/papers/emnlp05polarity.pdf) for English. In your R package library, under the \sentiment\data folder, you can find the lexicon as a file named subjectivity.csv.gz.
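The lexicon-based idea itself fits in a few lines of base R. This is a simplified illustration, not the sentiment package's own scoring: it assumes one word,strength,polarity entry per word (the layout of subjectivity.csv), uses a tiny hypothetical sample in place of the real file, and assumes weights of 2 for strongsubj and 1 for weaksubj words.

```r
# Minimal lexicon-based polarity scoring in base R (illustrative sketch).
# Tiny hypothetical lexicon in the word,strength,polarity layout. To try the
# real file instead (assuming the install location above and no header row):
# lexicon <- read.csv(system.file("data", "subjectivity.csv.gz", package = "sentiment"),
#                     header = FALSE, col.names = c("word", "strength", "polarity"),
#                     stringsAsFactors = FALSE)
lexicon <- data.frame(
  word     = c("great", "love", "good", "slow", "broken", "terrible"),
  strength = c("strongsubj", "strongsubj", "weaksubj",
               "weaksubj", "strongsubj", "strongsubj"),
  polarity = c("positive", "positive", "positive",
               "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)
strength_weight <- c(strongsubj = 2, weaksubj = 1)  # assumed weights

lexicon_polarity <- function(texts, lexicon) {
  vapply(texts, function(text) {
    tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
    idx <- match(tokens, lexicon$word)               # NA for words not in the lexicon
    hits <- lexicon[idx[!is.na(idx)], , drop = FALSE]
    w <- strength_weight[hits$strength]
    # Positive evidence minus negative evidence; other polarity values are ignored.
    score <- sum(w[hits$polarity == "positive"], na.rm = TRUE) -
      sum(w[hits$polarity == "negative"], na.rm = TRUE)
    if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
  }, character(1), USE.NAMES = FALSE)
}

reviews <- c("Great product, I love it!",
             "Arrived broken and support was terrible.",
             "It is a phone.")
lexicon_polarity(reviews, lexicon)  # returns c("positive", "negative", "neutral")
```

In this toy scorer, a text containing no lexicon words simply scores as neutral, which is exactly why coverage matters.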
The text that was incorrectly classified as having positive polarity is the following: "Took 4 weeks to receive it even though I paid for 2 day delivery. What a scam." If you open the file, you will find, as you probably suspected, that scam is not a word in the lexicon. Let's add the following line to the file:
scam,strongsubj,negative
Then save and zip the file, restart Rserve and refresh our workbook.
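Instead of unzipping, editing and re-zipping subjectivity.csv.gz by hand, the same edit can be scripted from R, which sidesteps editor-related encoding and line-ending surprises. This is a hedged sketch: the helper names are hypothetical, and it assumes the file is plain comma-separated text with the word in the first field. It skips words that are already present and keeps a .bak copy; Rserve still needs a restart afterwards.

```r
# Append a word,strength,polarity line to a gzipped lexicon file from R.
read_gz_lines <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  readLines(con, warn = FALSE)  # readLines accepts LF, CRLF or CR line endings
}

write_gz_lines <- function(lines, path) {
  con <- gzfile(path, open = "wt")
  on.exit(close(con))
  writeLines(lines, con)        # one entry per line
}

add_lexicon_entry <- function(path, word, strength, polarity) {
  lines <- read_gz_lines(path)
  lines <- lines[nzchar(lines)]             # drop blank lines
  word <- tolower(word)
  if (word %in% sub(",.*$", "", lines)) {   # first field of each line is the word
    message("'", word, "' is already in the lexicon; file left unchanged.")
    return(invisible(FALSE))
  }
  file.copy(path, paste0(path, ".bak"), overwrite = TRUE)  # keep a backup copy
  write_gz_lines(c(lines, paste(word, strength, polarity, sep = ",")), path)
  invisible(TRUE)
}

# Hypothetical usage, assuming the lexicon location described above:
# path <- system.file("data", "subjectivity.csv.gz", package = "sentiment")
# add_lexicon_entry(path, "scam", "strongsubj", "negative")
```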
(http://boraberan.files.wordpress.com/2013/12/image4.png)
Now you can see that the text is correctly classified as expressing negative sentiment. When using lexicon-based systems, adding new words to the lexicon or using a completely new lexicon are potential paths to follow if you are not getting good results. Incorrect classifications are more likely if the text you're analyzing uses slang, jargon or colloquial words, since these are not covered extensively in common lexica.
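Since coverage makes or breaks a lexicon-based approach, it can help to list the words in your own text that the lexicon does not contain, most frequent first, as candidates for new entries. A base-R sketch; the function name, the stopword handling and the sample data are illustrative assumptions, and the real word list would come from the first column of subjectivity.csv.gz.

```r
# List words in the texts that the lexicon does not cover, most frequent first.
uncovered_words <- function(texts, lexicon_words, stopwords = character(0)) {
  tokens <- unlist(strsplit(tolower(texts), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]
  missing <- tokens[!tokens %in% c(tolower(lexicon_words), stopwords)]
  if (length(missing) == 0) return(setNames(integer(0), character(0)))
  counts <- table(missing)
  counts <- counts[order(-counts, names(counts))]  # by frequency, then alphabetically
  setNames(as.integer(counts), names(counts))
}

texts <- c("Took 4 weeks to receive it. What a scam.", "Total scam, zero stars")
lexicon_words <- c("took", "weeks", "receive", "what")  # hypothetical stand-in
uncovered_words(texts, lexicon_words, stopwords = c("to", "it", "a"))
# scam = 2, stars = 1, total = 1, zero = 1
```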
You can download the workbook containing the example here (http://sdrv.ms/1ckyO5P).
Happy Holidays!
I wrote about how to take advantage of the Rserve configuration file to preload packages and other objects here, which you may find useful:
http://boraberan.wordpress.com/2013/12/16/logistic-regression-in-tableau-using-r/
If you do this, Rserve will load the package only once, on startup, instead of evaluating the library(sentiment) command every time you refresh your view in Tableau. It shortens your R code and should also give you better performance. I will add a pointer from this article to that one, along with a note that the example assumes the libraries are preloaded in the Rserve configuration, to avoid future confusion.
~ Bora
Praveen Koppolu says:
March 12, 2014 at 9:53 am
I'm trying to replicate the above and got the error:
Error in base::parse(text = .cmd) : :1:71: unexpected input
1: library(sentiment);polarity_data = classify_polarity(.arg1,algorithm=
^
Please help me.
Bora Beran says:
March 25, 2014 at 10:36 pm
I can't tell much from the snippet. This sort of error commonly happens when the line is a continuation and something is missing, like a trailing comma on the previous line. The other likely cause is ASCII vs. UTF: you may be using a different kind of quote, or there may be some other non-ASCII character in there. If you are editing in a tool like Word or copy-pasting from a browser, it is likely that you get the wrong character even though it looks like the right one on the surface.
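To rule out the character problem before pasting a script into Tableau, the script text can be checked in a plain R session. This base-R sketch (check_script is a hypothetical helper) lists any non-ASCII characters, such as curly quotes, with their positions, and runs the text through parse(), the function named in the error message above. parse() only checks syntax, so classify_polarity does not need to be installed for the check.

```r
# Find non-ASCII characters (e.g. curly quotes pasted from Word or a browser)
# and check whether the script text parses as R code.
check_script <- function(script) {
  codes <- utf8ToInt(enc2utf8(script))   # one Unicode code point per character
  bad <- which(codes > 127L)             # positions outside the ASCII range
  parse_error <- tryCatch({
    parse(text = script)                 # syntax check only, nothing is evaluated
    NULL
  }, error = function(e) conditionMessage(e))
  list(non_ascii_positions = bad,
       non_ascii_chars = vapply(codes[bad], intToUtf8, character(1)),
       parse_error = parse_error)
}

pasted <- "classify_polarity(.arg1, algorithm=\u201cbayes\u201d)"  # curly quotes
typed  <- "classify_polarity(.arg1, algorithm=\"bayes\")"          # straight quotes
str(check_script(pasted))
str(check_script(typed))
```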
kurrabac says:
March 25, 2014 at 3:05 pm
There is a simpler way to do sentiment analysis of Twitter text in R. Try this package:
https://github.com/okugami/sentiment140/blob/master/README.md
Hein says:
May 16, 2014 at 4:15 am
I installed R v3.0.2 and tried to install this package. I got an error message saying that it is not available for this version of R.
Shubho Ray says:
May 5, 2014 at 9:24 pm
Hi,
This is a very interesting article, which prompted me to recreate it with my own response data. The only problem is that I'm unable to add any extra lines to the subjectivity.csv file you mentioned.
Can you point out what the reason could be? If my query is not clear, do let me know what extra information you require.
Bora Beran says:
May 11, 2014 at 12:32 pm
Hi Shubho,
What were the steps you used? Unzip, edit, save, zip? If the file you saved didn't work, potential issues could be related to encoding, a different newline convention (for example Windows CRLF vs. Unix LF line endings), etc. If your changes seem to be ignored, it could be because the package is already loaded, in which case loading it again (if this is done in your Rserve config, restarting Rserve) could be the solution.
Shubho Ray says:
May 28, 2014 at 1:32 am
Sorry for the delayed response.
The steps I used were exactly the ones you mentioned, but I didn't understand what exactly you meant by the different newline issue.
As for reloading Rserve, I did that too, every time I made any changes to the file.
So, how do I proceed from this point?
Matthew Loxton says:
May 8, 2014 at 2:17 pm
It looks like R's sentiment package has been discontinued. In the interim, do you have any other suggestions?
Bora Beran says:
May 11, 2014 at 11:34 am
You can still get the sentiment package from Omegahat.org. I picked sentiment since it is the most straightforward to use.
qdap is another package that can be used for sentiment analysis, and it is still on CRAN. You can also write your own function to do it, or give sentiment140 a shot, as suggested in the link above.