INTRODUCTION
1.1 Introduction
Social media is a group of Internet-based applications that build on the
ideological and technological foundations of Web 2.0 and that allow the creation
and exchange of user-generated content. Via social media, people can enjoy
abundant information, convenient communication and so on. However,
social media may have side effects such as cyberbullying, which may have
negative impacts on people's lives, especially those of children and teenagers.
Cyberbullying can be defined as aggressive, intentional actions performed by an
individual or a group of people via digital communication methods such as sending
messages and posting comments against a victim. Different from traditional bullying
that usually occurs at school during face-to-face communication, cyberbullying on
social media can take place anywhere at any time. Bullies are free to hurt
their peers' feelings because they do not need to face anyone and can hide behind
the Internet. Victims are easily exposed to harassment since all of us,
especially youth, are constantly connected to the Internet or social media.
Reported cyberbullying victimization rates range from 10% to 40%. In the
United States, approximately 43% of teenagers have been bullied on social media.
As with traditional bullying, cyberbullying has negative, insidious and sweeping
impacts on children. The outcomes for victims of cyberbullying may even be
tragic, such as self-injurious behavior or suicide.
One way to address the cyberbullying problem is to automatically detect and
promptly report bullying messages so that proper measures can be taken to prevent
possible tragedies. Previous works on computational studies of bullying have shown
that natural language processing and machine learning are powerful tools to study
bullying. Cyberbullying detection can be formulated as a supervised learning
problem. A classifier is first trained on a cyberbullying corpus labeled by humans,
and the learned classifier is then used to recognize a bullying message. Three kinds
of information, including text, user demography, and social network features, are often
used in cyberbullying detection [9]. Since the text content is the most reliable, our
work here focuses on text-based cyberbullying detection.
In text-based cyberbullying detection, the first and also critical step is the numerical
representation learning for text messages. Labeling data is labor-intensive and
time-consuming, and cyberbullying is hard to judge from a third-party view due to its
intrinsic ambiguities.
"Natural Language Processing is a field that covers computer understanding and
manipulation of human language, and it's ripe with possibilities for newsgathering," Anthony
Pesce said in Natural Language Processing in the Kitchen. "You usually hear about it in
the context of analyzing large pools of legislation or other document sets, attempting to
discover patterns or root out corruption."
NLP algorithms are typically based on machine learning algorithms. Instead of
hand-coding large sets of rules, NLP can rely on machine learning to automatically learn these
rules by analyzing a set of examples (i.e. a large corpus, from a whole book down to a collection
of sentences) and making a statistical inference. In general, the more data analyzed, the
more accurate the model will be.
Summarize blocks of text using Summarizer to extract the most important and
central ideas while ignoring irrelevant information.
Create a chat bot using Parsey McParseface, a language parsing deep learning
model made by Google that uses Part-of-Speech tagging.
Automatically generate keyword tags from content using AutoTag, which
leverages LDA, a technique that discovers topics contained within a body of text.
Identify the type of entity extracted, such as it being a person, place, or
organization using Named Entity Recognition.
Use Sentiment Analysis to identify the sentiment of a string of text, from very
negative to neutral to very positive.
Reduce words to their root, or stem, using PorterStemmer, or break up text
into tokens using Tokenizer.
These libraries provide the algorithmic building blocks of NLP in real-world applications.
Algorithmia provides a free API endpoint for many of these algorithms, without ever
having to set up or provision servers and infrastructure.
Apache OpenNLP: a machine learning toolkit that provides tokenizers, sentence
segmentation, part-of-speech tagging, named entity extraction, chunking, parsing,
coreference resolution, and more.
Natural Language Toolkit (NLTK): a Python library that provides modules for
processing text, classifying, tokenizing, stemming, tagging, parsing, and more.
Stanford NLP: a suite of NLP tools that provide part-of-speech tagging, a
named entity recognizer, a coreference resolution system, sentiment analysis, and
more.
MALLET: a Java package that provides Latent Dirichlet Allocation, document
classification, clustering, topic modeling, information extraction, and more.
Sentiment analysis (also known as opinion mining) refers to the use of natural language
processing, text analysis and computational linguistics to identify and extract subjective
information in source materials. Sentiment analysis is widely applied to reviews and
social media for a variety of applications, ranging from marketing to customer service.
A basic task in sentiment analysis is classifying the polarity of a given text at the
document, sentence, or feature/aspect level: whether the expressed opinion in a
document, a sentence or an entity feature/aspect is positive, negative, or neutral.
Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional
states such as "angry", "sad", and "happy".
The central problems (or goals) of AI research include reasoning, knowledge, planning,
learning, natural language processing (communication), perception and the ability to
move and manipulate objects. General intelligence is among the field's long-term
goals. Approaches include statistical methods, computational intelligence, and traditional
symbolic AI. Many tools are used in AI, including versions of search and mathematical
optimization, logic, and methods based on probability and economics. The AI field draws
upon computer science, mathematics, psychology, linguistics, philosophy, neuroscience
and artificial psychology.
The field was founded on the claim that human intelligence "can be so precisely
described that a machine can be made to simulate it". This raises philosophical arguments
about the nature of the mind and the ethics of creating artificial beings endowed with
human-like intelligence, issues which have been explored by myth, fiction and
philosophy since antiquity. Some people also consider AI a danger to humanity if it
progresses unabated.
Chapter 2
LITERATURE SURVEY
Karthik Dinakar, Roi Reichart, Henry Lieberman, "Modeling the Detection of Textual
Cyberbullying", MIT Media Lab, Computer Science & Artificial Intelligence
Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA.
The scourge of cyberbullying has assumed alarming proportions, with an ever-increasing
number of adolescents admitting to having dealt with it either as a victim or as
a bystander. Anonymity and the lack of meaningful supervision in the electronic
medium are two factors that have exacerbated this social menace. Comments or posts
involving sensitive topics that are personal to an individual are more likely to be
internalized by a victim, often resulting in tragic outcomes. We decompose the overall
detection problem into detection of sensitive topics, lending itself into text classification
sub-problems. We experiment with a corpus of 4500 YouTube comments, applying a
range of binary and multiclass classifiers. We find that binary classifiers for individual
labels outperform multiclass classifiers. Our findings show that the detection of textual
cyberbullying can be tackled by building individual topic-sensitive classifiers.
Andreas M. Kaplan, Michael Haenlein, "Users of the world, unite! The challenges
and opportunities of Social Media", ESCP Europe, 79 Avenue de la Republique,
F-75011 Paris, France.
The concept of Social Media is top of the agenda for many business executives
today. Decision makers, as well as consultants, try to identify ways in which firms can
make profitable use of applications such as Wikipedia, YouTube, Facebook, Second Life,
and Twitter. Yet despite this interest, there seems to be very limited understanding of
what the term Social Media exactly means; this article intends to provide some
clarification. We begin by describing the concept of Social Media, and discuss how it
differs from related concepts such as Web 2.0 and User Generated Content. Based on this
definition, we then provide a classification of Social Media which groups applications
currently subsumed under the generalized term into more specific categories by
characteristic: collaborative projects, blogs, content communities, social networking sites,
virtual game worlds, and virtual social worlds. Finally, we present 10 pieces of advice for
companies which decide to utilize Social Media.
Qianjia Huang, Vivek K. Singh, Pradeep K. Atrey, "Cyber Bullying Detection Using
Social and Textual Analysis", Department of Applied Computer Science, The
University of Winnipeg, Winnipeg, MB, Canada; The Media Lab, Massachusetts
Institute of Technology, Cambridge, MA, USA; Department of Computer Science,
University at Albany, SUNY, Albany, NY, USA.
Cyber bullying, which often has a deeply negative impact on the victim, has
grown as a serious issue among adolescents. To understand the phenomenon of cyber
bullying, experts in social science have focused on personality, social relationships and
psychological factors involving both the bully and the victim. Recently computer science
researchers have also come up with automated methods to identify cyber
bullying messages by identifying bullying-related keywords in cyber conversations.
However, the accuracy of these textual feature based methods remains limited. In this
work, we investigate whether analyzing social network features can improve the accuracy
of cyber bullying detection. By analyzing the social network structure between users and
deriving features such as number of friends, network embeddedness, and relationship
centrality, we find that the detection of cyber bullying can be significantly improved by
integrating the textual features with social network features.
Jun-Ming Xu, Kwang-Sung Jun, Xiaojin Zhu, Amy Bellmore, Learning from
Bullying Traces in Social Media Department of Computer Sciences, University of
Wisconsin-Madison, Madison, WI 53706, USA, Department of Educational
Psychology, University of Wisconsin-Madison, Madison, WI 53706, USA.
We introduce the social study of bullying to the NLP community. Bullying, in
both physical and cyber worlds (the latter known as cyberbullying), has been recognized
as a serious national health issue among adolescents. However, previous social studies of
bullying are handicapped by data scarcity, while the few computational studies narrowly
restrict themselves to cyberbullying which accounts for only a small fraction of all
bullying episodes. Our main contribution is to present evidence that social media, with
appropriate natural language processing techniques, can be a valuable and abundant data
source for the study of bullying in both worlds. We identify several key problems in
using such data sources and formulate them as NLP tasks, including text classification,
role labeling, sentiment analysis, and topic modeling. Since this is an introductory paper,
we present baseline results on these tasks using off-the-shelf NLP solutions, and
encourage the NLP community to contribute better models in the future.
Chapter 3
SYSTEM ANALYSIS
System analysis is the technique of finding the best solution to a problem.
It is the process by which we learn about the existing problems,
define objectives and requirements, and evaluate the candidate solutions. It is a way of
thinking about the organization and the problem it involves, and a set of techniques that
helps in solving these problems. The feasibility study plays an important part in system
analysis, as it gives the target for design and development.
A classifier is first trained on a cyberbullying corpus labeled by humans, and the learned
classifier is then used to recognize a bullying message. Three kinds of information
including text, user demography, and social network features are often used in
cyberbullying detection. Since the text content is the most reliable, our work here
focuses on text-based cyberbullying detection.
In text-based cyberbullying detection, the first and also critical step is the numerical
representation learning for text messages. In fact, representation learning of text is
extensively studied in text mining, information retrieval and natural language processing
(NLP). The Bag-of-Words (BoW) model is one commonly used model, in which each dimension
corresponds to a term. Latent Semantic Analysis (LSA) and topic models are other
popular text representation models, both of which are based on BoW models. By mapping
text units into fixed-length vectors, the learned representation can be further processed
for numerous language processing tasks.
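As an illustration of the BoW representation described above, the following is a minimal sketch in Python. The toy messages, the tokenizer and the function names are assumptions made for this example and are not taken from the project's code or dataset.

```python
from collections import Counter

def tokenize(message):
    """Lowercase a message and split it into alphabetic tokens."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in message)
    return cleaned.split()

def build_vocabulary(messages):
    """Map every distinct term in the corpus to one vector dimension."""
    terms = sorted({tok for msg in messages for tok in tokenize(msg)})
    return {term: index for index, term in enumerate(terms)}

def bow_vector(message, vocabulary):
    """Return a fixed-length term-count vector; unknown terms are ignored."""
    counts = Counter(tokenize(message))
    vector = [0] * len(vocabulary)
    for term, count in counts.items():
        if term in vocabulary:
            vector[vocabulary[term]] = count
    return vector

# Toy corpus (hypothetical messages, not from the project's dataset).
corpus = ["You are so stupid", "See you at school", "Stupid stupid game"]
vocabulary = build_vocabulary(corpus)
vectors = [bow_vector(msg, vocabulary) for msg in corpus]
```

Every message is mapped to a vector of the same length (one entry per vocabulary term), which is what allows a standard classifier to be trained on it. Most entries are zero, which is the sparsity problem discussed later in this chapter.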
Some approaches have been proposed to tackle these problems by incorporating expert
knowledge into feature learning. Yin et al. [10] proposed to combine BoW features, sentiment
features and contextual features to train a support vector machine for online harassment
detection. Dinakar et al. [11] utilized label-specific features to extend the general features,
where the label-specific features are learned by Linear Discriminant Analysis. In addition,
common sense knowledge was also applied. Nahar et al. [12] presented a weighted TF-IDF scheme
that scales bullying-like features by a factor of two. Besides content-based information,
Dadvar et al. [13], [14] proposed to apply users' information, such as gender and message
history, and context information as extra features. But a major limitation of these approaches
is that the learned feature space still relies on the BoW assumption and may not be robust. In
addition, the performance of these approaches relies on the quality of hand-crafted
features, which require extensive domain knowledge.
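The weighted TF-IDF idea mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the exact TF-IDF variant or the bullying word list used in the cited work, so the formula (relative term frequency times log inverse document frequency), the `boost` parameter and the toy documents are all hypothetical.

```python
import math
from collections import Counter

def weighted_tfidf(documents, bullying_terms, boost=2.0):
    """TF-IDF per tokenized document, with bullying-like terms scaled by `boost`."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weighted = []
    for doc in documents:
        row = {}
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            scale = boost if term in bullying_terms else 1.0
            row[term] = scale * tf * idf
        weighted.append(row)
    return weighted

# Hypothetical tokenized messages and a hypothetical bullying word list.
docs = [["you", "are", "stupid"], ["see", "you", "later"], ["nice", "game"]]
bullying_terms = {"stupid"}
scores = weighted_tfidf(docs, bullying_terms)
plain = weighted_tfidf(docs, set())  # same scheme without any boosting
```

Only the weights of terms in the bullying list change; all other features keep their ordinary TF-IDF weight, so the representation is still a BoW-style vector and inherits its limitations.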
Advantages
1) Most cyberbullying detection methods rely on the BoW model. Due to the sparsity
of both data and features, the classifier may not be trained very well. The stacked
denoising autoencoder (SDA), as an unsupervised representation learning method, is able
to learn a robust feature space. In SDA, the feature correlation is explored by the
reconstruction of corrupted data. The learned robust feature representation can then boost
the training of the classifier and finally improve the classification accuracy. In addition, the
corruption of data in SDA actually generates artificial data to expand the data size, which
alleviates the small size problem of training data.
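The corruption step can be pictured with a short sketch. It assumes dropout-style noise (each feature is set to zero independently with some probability); the function names, the probability value and the toy vector are hypothetical, and the training of the autoencoder itself is not shown.

```python
import random

def dropout_corrupt(vector, drop_prob, rng):
    """Set each feature to zero independently with probability `drop_prob`."""
    return [0 if rng.random() < drop_prob else value for value in vector]

def corrupted_copies(vector, drop_prob, n_copies, seed=0):
    """Generate several corrupted versions of one feature vector."""
    rng = random.Random(seed)
    return [dropout_corrupt(vector, drop_prob, rng) for _ in range(n_copies)]

# One BoW-style feature vector (toy values) and four artificial variants of it.
original = [1, 0, 2, 1, 0, 3]
copies = corrupted_copies(original, drop_prob=0.5, n_copies=4)
```

A denoising autoencoder would then be trained to map each corrupted copy back to `original`, so every training message contributes several (corrupted input, clean target) pairs instead of one.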
Software Requirement
Software System Configuration:
Operating System : Windows 95/98/2000/XP
Application Server : Tomcat 5.0/6.x
Front End : HTML, Java, JSP
Scripts : JavaScript
Server-side Script : Java Server Pages
Database : MySQL
Database Connectivity : JDBC
Hardware Requirements
Hardware System Configuration:-
Processor - Pentium III
Speed - 1.1 GHz
RAM - 256 MB (min)
Hard Disk - 20 GB
Floppy Drive - 1.44 MB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - SVGA
Functional Requirements
This section describes the functional requirements of the system. The requirements are
expressed in natural language; they describe the functions of the system and its
behaviour when certain inputs are given.
The functional requirements of this system are as follows:
A new user should be able to sign up.
After registering, he/she can sign in.
Host Instance.
The sender can upload a file using the receiver's email ID.
The file is uploaded to the Amazon cloud server.
SYSTEM DESIGN
Design is a creative process; a good design is the key to an effective system. The
term "design" is defined as "the process of applying various techniques and principles
for the purpose of defining a process or a system in sufficient detail to permit its
physical realization". Various design features are followed to develop the system. The
design specification describes the features of the system, the components or elements of
the system and their appearance to end users.
The input design is the link between the information system and the user. It comprises
developing the specifications and procedures for data preparation, that is, the steps
necessary to put transaction data into a usable form for processing. This can be achieved by
instructing the computer to read data from a written or printed document, or by
having people key the data directly into the system. The design of input focuses on
controlling the amount of input required, controlling errors, avoiding delay, avoiding
extra steps and keeping the process simple. The input is designed so that it
provides security and ease of use while retaining privacy. Input design considers the
following things:
OBJECTIVES
1. Input design is the process of converting a user-oriented description of the input into a
computer-based system. This design is important to avoid errors in the data input process
and to show management the correct direction for getting correct information from
the computerized system.
2. It is achieved by creating user-friendly screens for data entry that can handle large
volumes of data. The goal of designing input is to make data entry easier and free from
errors. The data entry screen is designed in such a way that all data manipulations can
be performed. It also provides record viewing facilities.
3. When data is entered, it is checked for validity. Data can be entered with the help
of screens. Appropriate messages are provided as and when needed so that the user is not
left confused. Thus the objective of input design is to create an input layout that
is easy to follow.
OUTPUT DESIGN
A quality output is one which meets the requirements of the end user and presents the
information clearly. In any system, the results of processing are communicated to the users
and to other systems through outputs. In output design, it is determined how the
information is to be displayed for immediate need, as well as the hard copy output. It is the
most important and direct source of information to the user. Efficient and intelligent output
design improves the system's relationship with the user and helps user decision-making.
1. Designing computer output should proceed in an organized, well thought out manner;
the right output must be developed while ensuring that each output element is designed so
that people will find the system easy and effective to use. When analysts design
computer output, they should identify the specific output that is needed to meet the
requirements.
3. Create documents, reports, or other formats that contain information produced by the
system.
The output form of an information system should accomplish one or more of the
following objectives.
[Flowchart: User, Check Request, Accept Request? If Yes: Accepted, then Post Messages. If No: Rejected.]
First, the user checks the friend requests sent to him/her by other people. If the user
wishes to accept a request, he/she can click the accept request button and then
start chatting and posting messages to the new friend. If the user does not want to accept
the request, he/she can click the delete request button; the friend request is then
deleted and the user will not be able to chat with the sender.
5.2 System Architecture
System architecture is the conceptual model that defines the structure and
behavior of a system. An architecture description is a formal description of a system,
organized in a way that supports reasoning about the structural properties of the
system. It defines the system components or building blocks and provides a
plan from which products can be procured, and systems developed, that will work
together to implement the overall system. The system architecture is shown
below.
Users first enter text messages, and these text messages are stored in the database. All
the messages are then extracted from the database and classified, that is, the words in the
messages are classified into bullying words and normal words. Once the messages are
classified, sentiment prediction is done using NLP, that is, the attitude of the person
sending the message is determined by correlating the other words in the message. Mining
rules are applied to the input messages to determine the frequency of occurrence of the
words in the messages. Neurons are trained using the training dataset. Finally, the result
is predicted, that is, whether the entered messages are positive, negative or neutral.
USE CASE DIAGRAM
[Use case diagram: actor User; use cases Send Request, Accept Request, Post Messages, Share Images.]
In this module, a user can send friend requests to other people, and can start
chatting only with those who accept the request. In the same manner, other people can
send a friend request to the user, and it is up to the user to accept or delete it. If a
friend request sent by the user is deleted, the user is not able to chat with that
person. Once a friend request is accepted, the users can share images, post messages
and chat with each other.
[Use case diagram: NLP; use cases Apply Classification, Analyse Sentiment, Generate Result.]
The NLP module gathers all the text messages from the database and classifies them;
sentiment prediction is then done for the classified messages. After the neurons are
trained using the training dataset, the result for the messages is finally
predicted.
Weka is a collection of machine learning algorithms for data mining tasks. The
algorithms can either be applied directly to a dataset or called from your own Java code.
Weka contains tools for data pre-processing, classification, regression, clustering,
association rules, and visualization. It is also well-suited for developing new machine
learning schemes.
The training dataset is used to train the model. The data mining rules are then applied to
the test dataset, that is, the messages entered by the user, and finally the result is
predicted using the J48 algorithm.
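J48 is Weka's implementation of the C4.5 decision tree learner, which chooses the attribute to split on by gain ratio. The sketch below shows only that split criterion in plain Python on a toy dataset; it is not Weka's code, it omits pruning and the other refinements of C4.5, and the feature names and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, labels, feature):
    """Information gain of splitting on `feature`, divided by the split entropy."""
    total = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy([row[feature] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0

def best_split(rows, labels):
    """Return the feature with the highest gain ratio."""
    return max(rows[0].keys(), key=lambda f: gain_ratio(rows, labels, f))

# Toy training data: binary word-presence features per message (hypothetical).
rows = [
    {"has_insult": 1, "has_greeting": 0},
    {"has_insult": 1, "has_greeting": 1},
    {"has_insult": 0, "has_greeting": 1},
    {"has_insult": 0, "has_greeting": 0},
]
labels = ["bullying", "bullying", "normal", "normal"]
root_feature = best_split(rows, labels)
```

In this toy data `has_insult` separates the two classes perfectly, so it is chosen as the root of the tree; a full tree learner would repeat the same selection recursively on each partition.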
Class Diagram
[Class diagram: Main(), parseJson(); QueryProcess.getData(), filterData(); AI Module: trainData(), testData(), predictResult().]
Sequence Diagram
In this module there is a datastore, NLP and the Weka tool. First, all the text messages
entered by the users are gathered from the database, and sentiment analysis is
performed using NLP. The J48 algorithm is then applied to predict the data, data mining
rules are applied to the predicted data, and finally the result is predicted.
Chapter 6
Modules
The advantage of corrupting the original input in mSDA can be explained by feature
co-occurrence statistics. The co-occurrence information can be used to derive a robust feature
representation under an unsupervised learning framework, and this also motivates other
state-of-the-art text feature learning methods such as Latent Semantic Analysis and topic
models.
A denoising autoencoder is trained to reconstruct the removed feature values from the
remaining uncorrupted ones. Thus, the learned mapping matrix W is able to capture the
correlation between the removed features and the other features. The major modifications
to mSDA in this work are semantic dropout noise and sparse mapping constraints.
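For reference, the reconstruction objective of a single marginalized denoising autoencoder layer can be written as below, following the formulation in [17]. This is the basic form only: it leaves out the semantic dropout noise and the sparsity constraint, and all symbols other than W are notation assumed here (X is the data matrix with one column per message, n the number of messages, m the number of corrupted copies, and \tilde{X}_j the j-th corrupted copy of X).

```latex
% Squared reconstruction loss averaged over m corrupted copies of the data
\mathcal{L}(W) = \frac{1}{2mn} \sum_{j=1}^{m} \left\lVert X - W \tilde{X}_j \right\rVert_F^2

% Closed-form minimizer, with \bar{X} = [X, \dots, X] (m copies) and
% \tilde{X} = [\tilde{X}_1, \dots, \tilde{X}_m]
W = P Q^{-1}, \qquad Q = \tilde{X} \tilde{X}^{\top}, \qquad P = \bar{X} \tilde{X}^{\top}

% Marginalized form: as m grows, P and Q are replaced by their expectations
% under the corruption distribution
W = \mathbb{E}[P] \, \mathbb{E}[Q]^{-1}
```

Because the expectations can be computed directly from the corruption probability, the mapping W is obtained without generating the corrupted copies explicitly.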
However, a direct use of these bullying features may not achieve good performance
because these words only account for a small portion of the whole vocabulary and these
vulgar words are only one kind of discriminative features for bullying.
The bullying features play an important role and should be chosen properly. In the
following, the steps for constructing the bullying feature set Zb are given, in which the first
layer and the other layers are addressed separately. For the first layer, expert knowledge
and word embeddings are used. For the other layers, discriminative feature selection is
conducted. Layer One: firstly, we build a list of words with negative affect, including
swear words and dirty words. Then, we compare the word list with the BoW features of
our own corpus, and regard the intersection as bullying features. However, such an
expert-built list may be limited and does not reflect the current usage and style of
cyberlanguage.
Therefore, we expand the list of pre-defined insulting words, i.e. insulting seeds, based on
word embeddings as follows. Word embeddings use real-valued and low-dimensional
vectors to represent the semantics of words. Well-trained word embeddings lie in a
vector space where similar words are placed close to each other. In addition, the cosine
similarity between word embeddings is able to quantify the semantic similarity between
words. Since Internet messages are our corpus of interest, we utilize a word2vec model
that is well trained on a large-scale Twitter corpus containing 400 million tweets. In a
visualization of some word embeddings after dimensionality reduction (PCA), it is
observed that curse words form distinct clusters, which are also far away from normal
words. Even insulting words are located in different regions due to different word usages
and insulting expressions. In addition, since the word embeddings adopted here are
trained in a large scale corpus from Twitter, the similarity captured by word embeddings
can represent the specific language pattern. For example, the embedding of the
misspelled word fck is close to the embedding of fuck so that the word fck can be
automatically extracted based on word embeddings.
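The two steps for Layer One (intersecting an expert word list with the corpus vocabulary, then expanding the resulting seeds by embedding similarity) can be sketched as follows. The 3-dimensional vectors, the word lists and the similarity threshold are hypothetical values chosen for illustration; the text does not state the threshold used, and a real system would load the pre-trained word2vec vectors instead.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def expand_seeds(seeds, embeddings, threshold):
    """Add every word whose embedding is close enough to at least one seed."""
    expanded = set(seeds)
    for word, vector in embeddings.items():
        for seed in seeds:
            if seed in embeddings and cosine_similarity(vector, embeddings[seed]) >= threshold:
                expanded.add(word)
                break
    return expanded

# Hypothetical toy embeddings; "stoopid" stands in for a misspelled insult.
embeddings = {
    "stupid": [0.9, 0.1, 0.0],
    "stoopid": [0.85, 0.15, 0.05],
    "idiot": [0.8, 0.3, 0.1],
    "school": [0.1, 0.9, 0.2],
    "hello": [0.0, 0.2, 0.9],
}
negative_word_list = {"stupid", "moron"}   # expert-built list (hypothetical)
corpus_vocabulary = set(embeddings)        # BoW features of the corpus

# Step 1: seeds are the words present in both the expert list and the corpus.
seeds = negative_word_list & corpus_vocabulary
# Step 2: expand the seeds with words that are similar in the embedding space.
bullying_features = expand_seeds(seeds, embeddings, threshold=0.9)
```

The misspelled variant is picked up because its vector lies close to the seed, while unrelated words such as "school" stay out of the bullying feature set.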
Table 7.1 shows Unit Test Case 1 (UTC-1), which verifies whether the user interface accepts
the registration details of the sender and receiver.
In integration testing, two or more modules are combined and tested together.
The purpose of integration testing is to check whether the integrated modules perform as
expected. The integration test cases are given below.
Table 7.9 Integration Test Case 1 for Sender Registration and Post Messages
Table 7.10 Integration Test Case 2 for sentiment analysis and prediction
System Testing
Sl No of Test Case : ITC-1
Remarks : Passed
Summary
This chapter presents the software testing methods and the different types of test cases
used to test the different modules of the project. Various unit test cases are described for
each of the modules. Integration testing is performed when unit-tested modules
are combined. Finally, a system test case is conducted to verify the
functioning of the overall system.
Chapter 8
Metrics are the various measures used to evaluate the project. The following
metrics are used to evaluate the project.
Time : This metric is used to determine the time required for the computation of
secret key generation, security device generation, ciphertext generation and
device update.
Bits : This metric is used to determine the number of bits required for the secret key size,
security device size and ciphertext size.
[Figure 8.1: performance plot of Time (vertical axis, 0 to 7) for three approaches: Without Revocability, Without USB Device, Project Approach.]
Table 8.1 Showing the Evaluation Metric for different Security Mechanism approaches
Approach : Without Revocability | Without USB Device | Project Approach
Bit Size : 32 | 32 | 64
The inference drawn from the performance plot shown in Figure 8.1 is that the other
approaches take more time for the computation of secret key generation, ciphertext
generation and device update. The other approaches consider fewer bits (32 bits
only), whereas the current application considers a 64-bit secret key size.
Table 8.1 shows the number of bits considered and the time taken in milliseconds for
the different approaches. The approach without the revocability option takes more time and
considers fewer bits (32-bit), whereas the application takes less time and considers more
bits (64-bit).
Summary
In this chapter, the evaluation metrics used to determine the performance of the application
are explained. It also describes the dataset considered, the performance analysis and the
inference made from the obtained results.
CONCLUSION
This project addresses the text-based cyberbullying detection problem, where robust and
discriminative representations of messages are critical for an effective detection system.
By designing semantic dropout noise and enforcing sparsity, we have developed a
semantic-enhanced marginalized denoising autoencoder as a specialized representation
learning model for cyberbullying detection. In addition, word embeddings have been
used to automatically expand and refine bullying word lists that are initialized by domain
knowledge.
Future Enhancement
REFERENCES
[1] A. M. Kaplan and M. Haenlein, "Users of the world, unite! The challenges and
opportunities of social media," Business Horizons, vol. 53, no. 1, pp. 59-68, 2010.
[2] R. M. Kowalski, G. W. Giumetti, A. N. Schroeder, and M. R. Lattanner, "Bullying in
the digital age: A critical review and meta-analysis of cyberbullying research among
youth," 2014.
[3] M. Ybarra, "Trends in technology-based sexual and non-sexual aggression over time
and linkages to nontechnology aggression," National Summit on Interpersonal Violence
and Abuse Across the Lifespan: Forging a Shared Agenda, 2010.
[4] B. K. Biggs, J. M. Nelson, and M. L. Sampilo, "Peer relations in the anxiety-depression
link: Test of a mediation model," Anxiety, Stress, & Coping, vol. 23, no. 4,
pp. 431-447, 2010.
[5] S. R. Jimerson, S. M. Swearer, and D. L. Espelage, Handbook of Bullying in Schools:
An International Perspective. Routledge/Taylor & Francis Group, 2010.
[6] G. Gini and T. Pozzoli, "Association between bullying and psychosomatic problems:
A meta-analysis," Pediatrics, vol. 123, no. 3, pp. 1059-1065, 2009.
[7] A. Kontostathis, L. Edwards, and A. Leatherman, "Text mining and cybercrime,"
Text Mining: Applications and Theory. John Wiley & Sons, Ltd, Chichester, UK, 2010.
[8] J.-M. Xu, K.-S. Jun, X. Zhu, and A. Bellmore, "Learning from bullying traces in
social media," in Proceedings of the 2012 Conference of the North American Chapter of
the Association for Computational Linguistics: Human Language Technologies. Association
for Computational Linguistics, 2012, pp. 656-666.
[9] Q. Huang, V. K. Singh, and P. K. Atrey, "Cyber bullying detection using social and
textual analysis," in Proceedings of the 3rd International Workshop on Socially-Aware
Multimedia. ACM, 2014, pp. 3-6.
[10] D. Yin, Z. Xue, L. Hong, B. D. Davison, A. Kontostathis, and L. Edwards,
"Detection of harassment on web 2.0," Proceedings of the Content Analysis in the WEB,
vol. 2, pp. 1-7, 2009.
[11] K. Dinakar, R. Reichart, and H. Lieberman, "Modeling the detection of textual
cyberbullying," in The Social Mobile Web, 2011.
[12] V. Nahar, X. Li, and C. Pang, "An effective approach for cyberbullying detection,"
Communications in Information Science and Management Engineering, 2012.
[13] M. Dadvar, F. de Jong, R. Ordelman, and R. Trieschnigg, "Improved cyberbullying
detection using gender information," in Proceedings of the 12th Dutch-Belgian
Information Retrieval Workshop (DIR2012). Ghent, Belgium: ACM, 2012.
[14] M. Dadvar, D. Trieschnigg, R. Ordelman, and F. de Jong, "Improving cyberbullying
detection with user context," in Advances in Information Retrieval. Springer, 2013, pp.
693-696.
[15] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked
denoising autoencoders: Learning useful representations in a deep network with a local
denoising criterion," The Journal of Machine Learning Research, vol. 11, pp. 3371-3408,
2010.
[16] P. Baldi, "Autoencoders, unsupervised learning, and deep architectures,"
Unsupervised and Transfer Learning Challenges in Machine Learning, Volume 7, p. 43,
2012.
[17] M. Chen, Z. Xu, K. Weinberger, and F. Sha, "Marginalized denoising autoencoders
for domain adaptation," arXiv preprint arXiv:1206.4683, 2012.
[18] T. K. Landauer, P. W. Foltz, and D. Laham, "An introduction to latent semantic
analysis," Discourse Processes, vol. 25, no. 2-3, pp. 259-284, 1998.
[19] T. L. Griffiths and M. Steyvers, "Finding scientific topics," Proceedings of the
National Academy of Sciences of the United States of America, vol. 101, no. Suppl 1, pp.
5228-5235, 2004.
[20] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of
Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[21] T. Hofmann, "Unsupervised learning by probabilistic latent semantic analysis,"
Machine Learning, vol. 42, no. 1-2, pp. 177-196, 2001.
[22] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new
perspectives," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35,
no. 8, pp. 1798-1828, 2013.
[23] B. L. McLaughlin, A. A. Braga, C. V. Petrie, M. H. Moore et al., Deadly Lessons:
Understanding Lethal School Violence. National Academies Press, 2002.
[24] J. Juvonen and E. F. Gross, "Extending the school grounds? Bullying experiences in
cyberspace," Journal of School Health, vol. 78, no. 9, pp. 496-505, 2008.
[25] M. Fekkes, F. I. Pijpers, A. M. Fredriks, T. Vogels, and S. P. Verloove-Vanhorick,
"Do bullied children get ill, or do ill children get bullied? A prospective cohort study on
the relationship between bullying and health-related symptoms," Pediatrics, vol. 117, no.
5, pp. 1568-1574, 2006.
[26] M. Ptaszynski, F. Masui, Y. Kimura, R. Rzepka, and K. Araki, "Brute force works
best against bullying," in Proceedings of IJCAI 2015 Joint Workshop on Constraints and
Preferences for Configuration and Recommendation and Intelligent Techniques for Web
Personalization. ACM, 2015.
[27] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the
Royal Statistical Society. Series B (Methodological), pp. 267-288, 1996.
[28] C. C. Paige and M. A. Saunders, "LSQR: An algorithm for sparse linear equations and
sparse least squares," ACM Transactions on Mathematical Software (TOMS), vol. 8, no.
1, pp. 43-71, 1982.
[29] M. A. Saunders et al., "Cholesky-based methods for sparse least squares: The
benefits of regularization," Linear and Nonlinear Conjugate Gradient-Related Methods,
pp. 92-100, 1996.
[30] J. Fan and R. Li, "Variable selection via nonconcave penalized likelihood and its
oracle properties," Journal of the American Statistical Association, vol. 96, no. 456, pp.
1348-1360, 2001.