
2012

Extracting meaningful information from Social Network streams for crisis mapping
Avijit Paul (n8459941)
Stage 2 Proposal, Doctor of Philosophy May 2012

Creative Industries Faculty - Queensland University of Technology

Table of Contents
1. The Proposed Title
2. The Proposed Supervisors and their Credentials
    Principal Supervisor: Associate Professor Dr. Axel Bruns
    Associate Supervisor: Associate Professor Dr. Dian Tjondronegoro
    Associate Supervisor: Dr. Oksana Zelenko
3. Background and Literature Review
    Keywords
    Research Domain
3.1 Introductory Statement
3.2 Literature Review
    New Media & Communication Studies
    Crisis Communication and Social Media
    Twitter Analytics
    Contextual Analysis
    Computational Linguistics
    Information Design
    Visual Analytics
    Early Detection
3.3 Research Problem
    Central Research Problem: How to extract and present useful information from Social Media stream during crisis time?
    Sub Problem 1: How to identify what is useful information?
    Sub Problem 2: How to capture selected data from Social Media Stream?
    Sub Problem 3: How to extract and analyse captured data in real time to find useful information
    Sub Problem 4: How to present the information to stakeholders
4. Program And Design Of The Research Investigation
4.1 Objectives, Methodology and Research Plan
4.2 Resources and Funding Required
    Books and journals required
4.3 Individual Contribution to the Research Team
4.4 Timeline of Completion of the Program
5. Reference List
6. Appendix
6.1 Coursework

Extracting meaningful information from Social Network streams for Crisis Mapping Avijit Paul n8459941 PhD - Stage 2 Proposal - avijit.paul@student.qut.edu.au

1. The Proposed Title


Extracting meaningful information from Social Network streams for Crisis Mapping

2. The Proposed Supervisors and their Credentials


Principal Supervisor: Associate Professor Dr. Axel Bruns

Dr. Axel Bruns is an Associate Professor in the Creative Industries Faculty at Queensland University of Technology (QUT) in Brisbane, Australia, and a Chief Investigator in the ARC Centre of Excellence for Creative Industries and Innovation (cci.edu.au). He is the author of Blogs, Wikipedia, Second Life and Beyond: From Production to Produsage (2008) and Gatewatching: Collaborative Online News Production (2005), and the editor of Uses of Blogs with Joanne Jacobs (2006; all released by Peter Lang, New York). In addition to developing metrics to analyse and map Twitter data, in recent years he has published widely on social networks and crisis communication, including topics such as Twitter and crises, and Twitter and disaster resilience.

Associate Supervisor: Associate Professor Dr. Dian Tjondronegoro


Dr. Dian Tjondronegoro is an Associate Professor at QUT, researching and teaching in the area of Mobile and Multimedia Technologies. Dr. Tjondronegoro leads the Mobile Multimedia Research Group and teaches in the area of Mobile Devices and Mobile Application Development. Of specific significance to this project is his expertise in extracting semantic content from video using audiovisual features. Prior to this, Dr. Tjondronegoro examined cross-media content tagging and clustering of text, image and video to support extraction of semantically related web content.

Associate Supervisor: Dr. Oksana Zelenko


Dr. Oksana Zelenko is a researcher in the Creative Industries Faculty at QUT. Her research focuses on the role of visual and interaction design in mental health promotion for children and young people. Her previous design work included researching and developing online visual counselling tools that are currently in use by one of Australia's largest youth counselling organisations. In addition, Dr. Zelenko has demonstrated expertise in information design for community resilience and organisational communication.

3. Background and Literature Review


During recent natural disasters (e.g., the Queensland floods of 2010-2011 and the earthquake, tsunami and nuclear crisis in Japan in 2011), millions of status updates appeared on various social networks, indicating that people's reliance on social media at times of disaster has increased tremendously in recent years. The greatest concern in harvesting information from social network users for emergency services, however, is the uncertain credibility of the content received. At present it is highly problematic to differentiate between information that has a high degree of crisis relevance and information that has a very low degree of crisis relevance. Prior research by Bruns (2011) and Potts et al. (2011) shows that certain methods, such as following keywords and hashtags in publicly available Twitter data, make it possible to identify information related to a specific crisis in progress and to extract meaningful information from these status updates, or tweets. However, as tweets are produced and disseminated extremely quickly, there is the very practical problem of filtering the highly useful information stream from non-relevant tweets (Boulos et al., 2011). This is not simply an inconvenience; it poses a significant challenge that, if resolved, can mean the difference between life-saving decisions and life-wasting decisions. This concern is compounded by the complex task of appropriately disseminating the crisis-relevant information harvested by filtering the social media stream to the multiple government disaster relief agencies (DCS, 2011) and Non-Government Organisations (NGOs) whose relief capacities, resources and decisions would benefit greatly from such information. Additionally, as the state and value of information change constantly during a crisis, information representation techniques need improvement in order to present temporal data in an actionable manner.
The literature demonstrates a gap in current approaches to presenting such information to these stakeholders. Therefore, this project will address some of the issues surrounding the management and dynamic state of an unfolding disaster by extracting high-value, context-specific and chronologically framed disaster-based information. Through a process of digitally harvesting and categorising social media conversation streams, this project also seeks to deliver both a framework and a system that will facilitate key decision-making processes during times of natural disaster.

Keywords
Natural Disaster, Flood, Earthquake, Social Network Analysis, Twitter Analytics, Big data, Visualisation, Information Retrieval, Text Mining, Machine Learning, Natural Language Processing.

Research Domain
This research utilizes an interdisciplinary approach that combines elements from media and communication studies, crisis communication, communication design, Twitter analytics, sentiment analysis and computational linguistics.

Image 1: Domain areas of this research


3.1 Introductory Statement


The first 24 hours are often the most critical time during any natural disaster and are also the period when most community harm occurs (DCS, 2011). Casualties increase when relief organisations respond slowly because they lack verifiable information (Meier, 2012). The Department of Community Safety (DCS) of the Queensland Government, for example, in its 2011 report entitled All Hazards Information Management Program, has identified reducing response time during disaster events as a priority in order to reduce community harm (DCS, 2011) (Image 2 below).

Image 2: Enhancing disaster response system from current to future (DCS, 2011).

Prior research suggests that by using crowd-sourced information from various sources, including social networks, it may be possible to shorten the time it takes to find information and so allow a faster response (Platt, Hood, & Citrin, 2011). In recent disasters people from all over the world used social network sites to report their situation and seek help. This made social media streams an extremely powerful information source during crisis events (Muralidharan, Rasmussen, Patterson, & Shin, 2011). Two social networking sites, Facebook and Twitter, were the most popular during these acute events. However, prior research suggests that, due to its walled-garden approach, Facebook is less accessible than Twitter for public communication (Bruns, 2012). As Twitter updates are visible even to non-registered users, and Twitter allows a user to follow another user without needing to know the person, a person can quickly follow a crisis authority during a disaster to receive real-time updates. This enables Twitter both to draw on information sources and to become one at the same time. For this reason Twitter is the social network of choice for this research.


However, as updates on Twitter happen extremely quickly, keeping track of all the updates to extract useful information is a daunting task. Additionally, during a crisis different authorities require different information to act on. Selecting the relevant set of information for each authority is a challenge in harnessing the power of social media (DCS, 2011). According to the CCI Floods report by Bruns, Burgess, Crawford & Shaw (2012), tweets during a crisis can be grouped into five major categories: Information, Media Sharing, Help and Fundraising, Direct Experience, and Discussion and Reaction. Extracting and presenting tweets in such groups can provide authorities with actionable information. However, not all tweets can be grouped distinctly, and the challenge therefore remains of identifying, in real time, tweets that do not clearly fall into a certain group. Additionally, a large body of present Twitter research uses methods such as hashtag tracking to identify messages related to a specific natural disaster and to find meaningful information in them (Bruns, Burgess, Crawford, & Shaw, 2012). However, this method of tracking via pre-defined keywords has its limitations. As most natural disasters are unpredictable events, it is difficult to guess which keywords will become popular and noteworthy enough to be selected for tracking. Additionally, when a crisis happens, users may introduce new keywords or hashtags, which may take time to become noteworthy or may be abandoned again as other, similar keywords gain importance (Bruns & Liang, 2012). On top of that, there are plenty of rumours and false information on Twitter (Gayo-Avello, 2012), which makes information credibility one of Twitter's biggest issues (Castillo, Mendoza, & Poblete, 2011; Gupta, Zhao, & Han, 2012). Not all messages that appear in the tweet stream are authentic.
As a result, rumours and false information during disasters often create avoidable complications (Mendoza, Poblete, & Castillo, 2010) and contribute significantly to the irrelevant information, or noise, that needs to be eliminated in order to find information that is useful. Therefore, finding information from its early ripples and grouping it together before it becomes prominent is one of the key areas of this research. Furthermore, as a crisis continues, the status and condition of the situation change, which may render earlier information irrelevant. At present most crowdsourced crisis information visualisations use some form of map to display information (Elwood, 2011). However, map data often do not portray this temporal aspect. As presenting chronological information about a disaster is crucial for informed decision making at times of disaster, this is another key area of this research.
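
As a first approximation, the five categories from the CCI Floods report could be assigned with simple keyword rules. The sketch below is illustrative only: the cue words, the `categorise` function and the sample tweets are assumptions made for this example and are not taken from the report. A tweet may match several categories or none, which mirrors the difficulty of grouping tweets distinctly.

```python
# Illustrative cue words only; these are assumptions, not rules from the report.
CATEGORY_CUES = {
    "Information": ["warning", "alert", "road closed", "evacuat"],
    "Media Sharing": ["http://", "https://", "photo", "video"],
    "Help and Fundraising": ["donate", "volunteer", "appeal", "need help"],
    "Direct Experience": ["my house", "my street", "i can see", "we are"],
    "Discussion and Reaction": ["thoughts", "praying", "can't believe", "stay safe"],
}


def categorise(tweet_text):
    """Return every category whose cue words appear in the tweet text."""
    text = tweet_text.lower()
    matches = [
        category
        for category, cues in CATEGORY_CUES.items()
        if any(cue in text for cue in cues)
    ]
    # Tweets that match no rule are flagged rather than forced into a group.
    return matches or ["Uncategorised"]


if __name__ == "__main__":
    # Invented sample tweets, for illustration only.
    for tweet in [
        "Flood warning issued for Brisbane River",
        "Please donate to the flood appeal http://example.org",
        "Good morning everyone",
    ]:
        print(categorise(tweet), "-", tweet)
```

A rule set like this would only be a baseline; the contextual and linguistic methods reviewed later in this proposal would be needed for the tweets it leaves uncategorised or assigns to several groups.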

Therefore, the primary aim of this research is to formulate new research perspectives and methods to extract and present relevant information from ongoing social media updates during natural disasters. By building a theoretical framework and an online system, this project will harvest social media conversation streams to help make life-saving decisions.

3.2 Literature Review


New Media & Communication Studies


In recent years, new media such as social media have heavily influenced the way we communicate socially and interpersonally (Baym, Zhang, & Lin, 2004). While they have given ordinary citizens the power to broadcast their messages to a potentially unlimited number of people, the inability to identify who actually reads a message limits that power at the same time. Quite often, therefore, when people tweet they have only an imagined audience in mind and hope that someone will read the message (Boyd & Marwick, 2011). According to prior research, this imagined audience affects how people tweet and how they balance their authenticity and reputation in the tweetverse (Boyd, 2011). As this research focuses on communication via social media, theories of new media and communication studies will be extensively reviewed. Additionally, to gain a better understanding of social media usage in crises, the literature on crisis communication will be thoroughly reviewed.

Crisis Communication and Social Media


Prior research suggests that people have been using Twitter for spontaneous volunteerism in recent crisis situations (Starbird & Palen, 2011). When a crisis looms, ordinary citizens who are not affected move from a passive, everyday user role to a more active one (Bruns, 2011), reaching out to help people through social media. Phenomena such as Voluntweeters, self-organising online volunteer communities of microbloggers, have emerged in recent natural disasters without any directive or influence from governments or authorities (Starbird & Palen, 2011). When people are unable to contribute information directly from the ground, they tend to retweet very quickly in an effort to spread the news as fast as possible (Starbird, 2012). Apart from collective behaviour phenomena, Twitter has also been used for intensified information search, social convergence in physical space, and information contagion (Starbird, Palen, Hughes, & Vieweg, 2010). As Twitter has repeatedly been shown to maintain connectivity (Bruns, 2011), provide ways to show empathy for the people involved (Sarcevic et al., 2012), streamline multi-channel communication processes and remain readily accessible to the news media during crisis situations (Large, 2012), a thorough understanding and testing of crisis communication theories can help to create the framework needed to analyse social network data sets in real time.

Twitter Analytics
These two-way communications, multiplied across thousands of people, create a firehose of information (Wu, Hofman, Mason, & Watts, 2011). The Twitter firehose consists of the entire tweet stream at any given time (Dong et al., 2010). Since updates can be extremely rapid and voluminous (more than 5,000 tweets per second on Twitter alone during the Japan tsunami (Empson, 2012)), microsyntax conventions such as hashtags are particularly useful for bringing a particular topic to the forefront of an ongoing conversation (Stamberger, 2010). However, the factors that establish a keyword as a hashtag are still not well researched (Cullum, 2010). In fact, there is limited research on extracting useful information from the firehose. Furthermore, identifying keywords or hashtags alone is not enough, as various other metrics such as widely shared links, influential users and retweets can be significant and are important items to extract and analyse (Boyd, Golder, & Lotan, 2010). At present the most common form of Twitter analytics is tracking keywords and hashtags (Bruns & Liang, 2012). Other analytics involve locating and profiling user IDs (Twitter handles) (Yugami, Igata, Anai, & Inakoshi, 2012), geotagging (Lee, Wakamiya, & Sumiya, 2011), and URL and linkage data (Aggarwal, 2011). Twitter analytics has been used to predict academic citations (Eysenbach, 2011), to track temporal patterns of happiness (Dodds, Harris, Kloumann, Bliss, & Danforth, 2011) and to find meaningful expressions of engagement (Huston, Weiss, & Benyoucef, 2011). Most, if not all, Twitter analytics are post hoc, however: the data is archived first and analysed later. In the early stage of this research I will use the most appropriate of the available methods to simulate and test my hypotheses, and I will develop a new method for real-time testing in the last phase of the research.
This presents the first research gap: extracting meaningful and useful information from ongoing social media updates during a crisis.
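
The basic metrics discussed above (hashtags, user mentions, shared links and retweets) can be pulled out of tweet text with a few regular expressions. The following is a minimal sketch under stated assumptions: tweets are plain strings, retweets follow the manual "RT @user" convention, and the function names and sample tweets are invented for illustration. It is not the metric set used by any of the cited studies.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def extract_features(text):
    """Pull hashtags, mentions, URLs and a retweet flag out of one tweet."""
    return {
        "hashtags": [tag.lower() for tag in HASHTAG.findall(text)],
        "mentions": [name.lower() for name in MENTION.findall(text)],
        "urls": URL.findall(text),
        # Assumes the manual "RT @user" retweet convention.
        "is_retweet": text.startswith("RT @"),
    }


def hashtag_counts(texts):
    """Count how often each (lower-cased) hashtag occurs across tweets."""
    counts = Counter()
    for text in texts:
        counts.update(extract_features(text)["hashtags"])
    return counts


if __name__ == "__main__":
    # Invented sample tweets, for illustration only.
    sample = [
        "RT @QPSmedia: Road closed at Ipswich #qldfloods http://example.org/a",
        "Stay safe everyone #QLDfloods #thebigwet",
    ]
    print(extract_features(sample[0]))
    print(hashtag_counts(sample).most_common())
```

Lower-casing hashtags and handles is a design choice that merges variants such as #qldfloods and #QLDfloods into one count; a real pipeline would also need to handle URLs that contain "#" or "@" characters.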

Contextual Analysis
Even though real-time data processing can be used to extract data (Vlachos, 2011), it cannot by itself identify meaning in a given context. In order to understand meaning, a system needs to learn the rules and patterns (Valero, Gómez, & Pineda, 2009). Different methods, such as dictionary-based, rule-based and hybrid approaches, have been proposed for such pattern or named entity recognition (Song, Tjondronegoro, & Docherty, 2012; Döhling & Leser, 2011). However, limited research has combined disaster response with contextual analysis, sentiment analysis and named entity recognition (Park, Cha, Kim, & Jeong, 2012; De Fortuny, De Smedt, Martens, & Daelemans, 2012). To be usable for picking up early disaster signals, contextual analysis can be used to find the meaning of a word in context (Maxwell, Raue, Azzopardi, Johnson, & Oates, 2012). By mining subjective expressions or opinions, it will therefore be possible to differentiate between similar words used in different contexts and so avoid raising false alarms while grouping data extracted from a social media stream (Liu, 2010).

Computational Linguistics
In recent years there has been growing interest in using computational linguistics with Twitter during crises (Corvey, Vieweg, Rood, & Palmer, 2010), mostly to identify trending keywords (Sakaki, Toriumi, & Matsuo, 2011). It has also been used to identify problems with products and services from Twitter data (N. K. Gupta, 2011). As this research requires extensive analysis of text data in order to understand the use of words in context, computational linguistics methods used in emergencies will be studied in order to isolate noise from useful data.

Information Design
Traditionally, maps have been used to represent crisis-related data in order to identify priority areas (Tufte, 2001). However, as information changes rapidly in social networks, a map may not be the best way to present crisis information gathered from them. Furthermore, most available crisis presentation systems require extensive manual entry and monitoring before the data is projected onto a crisis map (Meier, 2012). Although this has proven useful, it is often time- and resource-consuming. Since every minute is important when saving lives after a natural disaster, alternative information design and presentation techniques such as fractal maps, heat maps or other non-map-based visualisation techniques will be explored. As there has been limited research on presenting data generated from such massive datasets during disasters, one of the major challenges for this research is to present real-time information extracted from social network streams in a meaningful manner.


Visual Analytics
As the number of data-driven documents and services increases rapidly, visual analytics has been gaining momentum in recent years (Bostock, Ogievetsky, & Heer, 2011). Collaborative and social visualisation techniques have also gained popularity for visualising crowd-sourced data (Heer & Agrawala, 2008; Keim et al., 2008). These visual analytics methods and processes will be studied to determine how best to present the data quickly and effectively in a crisis situation.

Early Detection
Prior research has suggested using traditional and social media to predict health disasters such as H1N1 (Liu & Kim, 2011). Social media has also been proposed for low-level prediction of natural disasters (Li, Wang, & Liu, 2011). However, once data is gathered, the vast differences in the information generated make it quite difficult to analyse in real time. On top of that, there is no established methodology for determining how long it takes a certain term to become a trending topic: there is a methodological gap when it comes to identifying weak signals surfacing through social media streams before they become widely visible, in order to understand which keywords are likely to be important. Limited research has been conducted to identify links between social media updates and natural disaster prediction. Therefore, the third area of interest is to identify probabilistically the relationship between social media updates and potential natural disasters.
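
One simple way to look for weak signals is to compare how often a term appears in the current time window with how often it appeared in an earlier baseline window. The sketch below is an assumption-laden illustration rather than an established method: the function names, the add-one smoothing and the threshold of 2.0 are all choices made for this example.

```python
from collections import Counter


def burst_scores(baseline_terms, current_terms, smoothing=1.0):
    """Ratio of each term's smoothed share of the current window to its
    smoothed share of the baseline window; values above 1 suggest growth."""
    baseline = Counter(baseline_terms)
    current = Counter(current_terms)
    vocabulary = set(baseline) | set(current)
    base_total = sum(baseline.values()) + smoothing * len(vocabulary)
    cur_total = sum(current.values()) + smoothing * len(vocabulary)
    scores = {}
    for term in vocabulary:
        # Smoothing keeps previously unseen terms from dividing by zero.
        base_share = (baseline[term] + smoothing) / base_total
        cur_share = (current[term] + smoothing) / cur_total
        scores[term] = cur_share / base_share
    return scores


def emerging_terms(baseline_terms, current_terms, threshold=2.0):
    """Terms whose burst score meets the (assumed) threshold, strongest first."""
    scores = burst_scores(baseline_terms, current_terms)
    return sorted(
        (term for term, score in scores.items() if score >= threshold),
        key=lambda term: -scores[term],
    )


if __name__ == "__main__":
    # Invented term lists, for illustration only.
    earlier = ["weather"] * 8 + ["rain"] * 2
    now = ["weather"] * 4 + ["rain"] * 2 + ["flood"] * 4
    print(emerging_terms(earlier, now))
```

A ratio of shares rather than raw counts is used so that a general rise in tweet volume does not make every term look like it is emerging. Choosing the window length and threshold would itself be part of the methodological gap described above.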

3.3 Research Problem


Based on the preceding literature review, a central research problem and four sub-problems are identified. They are:

Central Research Problem: How to extract and present useful information from Social Media stream during crisis time?
As updates happen extremely quickly in social networks, especially during a crisis, one of the most important tasks is to extract information that is useful. Even though it is possible to read through real-time social network data, the problem of extracting information that is useful and usable in close to real time remains. Additionally, the quality of information degrades over time, and current presentation techniques limit how quickly up-to-date information can be obtained. Thus, the central challenge of this thesis is to extract useful information from social media and present it with as little delay as possible.

Sub Problem 1: How to identify what is useful information?


As "useful" is a relative term, the first problem to address is: what is useful during a crisis situation? As prior research shows that a Twitter conversation contains various patterns and metrics of communication, the first challenge is to find which metrics, patterns and frameworks can identify a conversation as useful. For example, finding out who the most active users are during a disaster, or whose original messages are retweeted most, may have a significant bearing on finding useful conversations, and such measures will therefore be identified as variables. Once the variables are determined, the task is to develop hypotheses and test them on archival data before testing them in a live environment at a later stage.
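
The two example variables mentioned above, the most active users and the authors whose original messages are retweeted most, could be computed from archival data roughly as follows. This is a sketch under assumptions: each archived tweet is taken to be a dictionary with `user` and `text` keys, retweets are taken to follow the "RT @user" convention, and the record layout and names are invented for illustration rather than drawn from any existing archive format.

```python
import re
from collections import Counter

# Assumes retweets begin with the manual "RT @user" convention.
RETWEET_OF = re.compile(r"^RT @(\w+)")


def user_metrics(records):
    """Count tweets sent per user and retweets received per original author."""
    tweets_sent = Counter()
    retweets_received = Counter()
    for record in records:
        tweets_sent[record["user"].lower()] += 1
        match = RETWEET_OF.match(record["text"])
        if match:
            retweets_received[match.group(1).lower()] += 1
    return tweets_sent, retweets_received


if __name__ == "__main__":
    # Invented records, for illustration only.
    archive = [
        {"user": "alice", "text": "Water rising on Main St #flood"},
        {"user": "bob", "text": "RT @alice: Water rising on Main St #flood"},
        {"user": "carol", "text": "RT @alice: Water rising on Main St #flood"},
        {"user": "bob", "text": "Anyone know if the bridge is open?"},
    ]
    sent, received = user_metrics(archive)
    print("Most active:", sent.most_common(1))
    print("Most retweeted:", received.most_common(1))
```

In this toy archive the most active user and the most retweeted author are different people, which illustrates why activity and influence would be treated as separate variables.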

Sub Problem 2: How to capture selected data from Social Media Stream?
The second problem is to capture data from the social media stream during a crisis. At present various methods are available and deployed, such as Twapperkeeper. However, most of the available capture methods look for a predetermined keyword, hashtag or pre-identified user. As this research is looking for information in the full firehose tweet stream, newer technologies such as Hadoop and Twitter Storm will be used to capture the social media stream. Since the various methods available each have their own strengths and weaknesses, finding the right way to capture the stream will be the second issue to solve.
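
The difference between keyword-based capture and full-stream capture can be sketched in a few lines. The example below deliberately does not call the Twitter API, Hadoop or Storm: the stream is assumed to be any iterable of JSON strings with a `text` field, the sink is any writable text file, and the `capture` function and sample data are hypothetical stand-ins for a real capture system.

```python
import io
import json


def capture(stream, sink, track=None):
    """Copy tweets from `stream` (an iterable of JSON strings) to `sink`
    (a writable text file), one JSON object per line. If `track` is given,
    keep only tweets whose text contains one of those keywords; otherwise
    keep everything, as a full-stream capture would."""
    keywords = [word.lower() for word in track] if track else None
    stored = 0
    for line in stream:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed input rather than stop the capture
        if not isinstance(tweet, dict):
            continue
        text = str(tweet.get("text", "")).lower()
        if keywords is None or any(word in text for word in keywords):
            sink.write(json.dumps(tweet) + "\n")
            stored += 1
    return stored


if __name__ == "__main__":
    # Invented stream contents, for illustration only.
    incoming = [
        '{"id": 1, "text": "Flood water at Rosalie"}',
        "not json",
        '{"id": 2, "text": "Lovely lunch"}',
    ]
    store = io.StringIO()
    print(capture(incoming, store, track=["flood"]), "tweet(s) stored")
    print(store.getvalue(), end="")
```

Keeping the keyword filter optional means the same code path serves both cases: a tracked-keyword capture and an unfiltered capture that leaves all selection to later analysis.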

Sub Problem 3: How to extract and analyse captured data in real time to find useful information
Once the method for capturing information is identified, the next challenge is to analyse the captured data and separate noise from information. At this stage the hypotheses developed in Sub Problem 1 will be applied to the data collected in Sub Problem 2. The challenge will be to identify how to filter useful information out of the data source by applying Twitter analytics, sentiment analysis, computational linguistics or any other necessary methods in real time to a live Twitter data stream.


Sub Problem 4: How to present the information to stakeholders


Once usable information is extracted, the next challenge is to present it in a way that is relevant to the stakeholders (authorities, communities and media) who must act on it. As different stakeholders require different types of information and no single filter fits all of it, the challenge is to identify how to present the information flexibly so that each stakeholder can act on it. Various visualisation techniques identified in the literature review will be tested at this stage to find out which technique most effectively represents temporal data in a chronological manner.
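
Before any visualisation technique can show temporal data chronologically, the extracted items need to be arranged on a time axis. A minimal sketch of that step is given below, assuming each extracted item is a (timestamp, category) pair and that one-hour buckets are a sensible default; the `timeline` function, the bucket width and the sample items are assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def timeline(items, bucket=timedelta(hours=1)):
    """Group (timestamp, category) pairs into fixed-width time buckets and
    count items per category, returned in chronological order."""
    counts = defaultdict(Counter)
    for timestamp, category in items:
        # Round the timestamp down to the start of its bucket.
        start = timestamp - ((timestamp - datetime.min) % bucket)
        counts[start][category] += 1
    return [(start, dict(counts[start])) for start in sorted(counts)]


if __name__ == "__main__":
    # Invented items, for illustration only.
    extracted = [
        (datetime(2011, 1, 11, 9, 5), "Information"),
        (datetime(2011, 1, 11, 9, 50), "Help and Fundraising"),
        (datetime(2011, 1, 11, 10, 15), "Information"),
        (datetime(2011, 1, 11, 9, 30), "Information"),
    ]
    for start, per_category in timeline(extracted):
        print(start.isoformat(), per_category)
```

The output is independent of any particular chart type, so the same bucketed series could feed a map layer, a heat map or a non-map view, and different stakeholders could be given different bucket widths or category subsets.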

4. Program And Design Of The Research Investigation


This research will be divided into an initial literature review followed by four iterative phases, which will allow me to develop and evaluate the whole research project continuously (Image 3). The key phases are:
Phase 0: Initial Literature Review (first 3 months)
Phase 1: Building hypotheses and a theoretical algorithm from the literature (second 3 months)
Phase 2: Capturing real-time Twitter data using capture technologies such as Hadoop and Storm (last 6 months of the first year)
Phase 3: Real-time analytics, hypothesis testing and sending for evaluation (2nd year)
Phase 4: Information design and creating the crisis visualisation (initial months of the 3rd year)

Image 3: Key phases of the research design

The phases are broken down below into actionable tasks, allowing movement back and forth between tasks as deemed necessary.


4.1 Objectives, Methodology and Research Plan


The objective of this research is to address the research problems identified in earlier sections. To do so, a mixed-methods approach combining qualitative and quantitative research methods will be used. As there are various methodologies currently available for data analysis and communication during disasters, some of the methods will use quantitative data and others will use qualitative data. Below are some of the broad methods that will be studied during this project.

The first objective is to identify what is useful; this will be developed by reviewing the literature in this area. The review will analyse reports, media and academic writing on recent research in the areas of social networks, natural disasters, media and communication studies, crisis communication, Twitter analytics, sentiment analysis and computational linguistics. Based on these studies, variables will be identified to determine what is useful in the context of social media conversation during a disaster. This will be followed closely by the development and testing of hypotheses on how Twitter users communicate during a crisis. In order to do this, I will first slice disaster-related Twitter data (QLD flood, Japan tsunami, New Zealand earthquake) gathered at CCI from Twapperkeeper, using awk scripts developed by Axel Bruns (awk being a data extraction and reporting tool). By mapping relationships between Twitter datasets in the areas of both disaster and social media communication, I will be able to test the developed hypotheses on communication during crises. I will then formulate approaches to extract relevant information from a large dataset archived at QUT. Using visualisation tools such as Gephi, I will also explore possibilities for presenting information differently. This task will be done after submission of Stage 2. However, as natural disasters are happening around the world, research articles in this area are appearing rapidly.
To keep abreast of these developments, the literature review will be ongoing throughout the PhD in case new variables are identified.

The second objective is to capture live Twitter streams so that they can be stored for future analysis. To do this, I will set up a NoSQL database (MongoDB or Couchbase) with one Hadoop cluster and one Storm cluster to store incoming Twitter streams. The target at this stage is to use the Twitter firehose as the input stream. As this access must be purchased, I will use keyword-specific input streams if I am unable to obtain it. The database and the server will initially be hosted on two cloud instances from NeCTAR, an Australian Government project conducted as part of the Super Science initiative and financed by the Education Investment Fund. The choice of database and cluster
file system may change if a new and improved version is released. This task will be done between the Stage 2 submission and the confirmation seminar. It will result in a system that can capture Twitter data from the Twitter firehose in real time and that provides the basis for real-time analysis of the captured data stream.

The third objective is to extract the useful information from this live Twitter stream. This will be done using suitable Twitter analytics methods available at that time. In addition, to identify weak signals, I will apply contextual analysis and other computational linguistic methods at this stage so that the meaning of words can be understood from their context. The whole system will then go through an iterative process of testing, evaluation and improvement to make it more effective. This step will use the hypotheses developed in the first phase (first objective) and the data collected in the second phase (second objective), testing initially on archival data. Based on the results, the system will be sent to the Queensland Government's Department of Community Safety (DCS) for evaluation. Improvements will be made based on the feedback gathered. This whole process will take place during the second year of candidature.

The fourth objective is to present the information in a way that is useful to stakeholders. Various presentation techniques will be tested on the extracted information to see which offers the most benefit. Because maps such as Google Maps are the most traditional way of presenting such information, the data will first be placed on a map. However, as maps are limited in how they show temporal data in chronological order, other information design techniques will also be tested at this stage. This whole process will be iterative, with feedback sought from DCS, as there are a number of ways in which the data can be presented and sampled.
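The keyword-specific input stream and the chronological view of the data described in this section can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the system to be built: the keyword list, the function names and the tweet fields `text` and `created_at` (an ISO 8601 timestamp with a UTC offset) are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical keyword list; the real terms would come out of the literature review.
KEYWORDS = ("qldfloods", "flood", "evacuate")


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if any keyword occurs in the tweet text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def hourly_counts(tweets, keywords=KEYWORDS):
    """Count keyword-matching tweets per UTC hour.

    `tweets` is an iterable of dicts with the assumed keys 'text' and
    'created_at' (ISO 8601 string that includes a UTC offset).
    """
    counts = Counter()
    for tweet in tweets:
        if not matches_keywords(tweet["text"], keywords):
            continue
        # Normalise every timestamp to UTC so tweets from different zones line up.
        stamp = datetime.fromisoformat(tweet["created_at"]).astimezone(timezone.utc)
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    # Return the buckets in chronological order.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [
        {"text": "Water rising fast #qldfloods", "created_at": "2011-01-11T03:15:00+00:00"},
        {"text": "Lovely lunch today", "created_at": "2011-01-11T03:50:00+00:00"},
    ]
    print(hourly_counts(sample))
```

A time-bucketed count of this kind is one simple way to look at the temporal dimension that a map alone does not show well. Substring matching is deliberately naive here and would be replaced by the contextual analysis methods discussed above.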

4.2 Resources and Funding Required


In the first stage I will use my own personal computer and the QUT lab computers to slice data with awk scripts and to visualise it with Gephi. After that, for real-time data extraction, I will first use the free Australian Research Cloud (NeCTAR) instances that are already available to QUT students. At the same time I will submit a NeCTAR RFP stage 2 application to secure longer-term use of their cloud instances. If I need access to even larger cloud instances, I will use AWS (Amazon Web Services) and will apply for further research funding, such as the auDA grant (.au Domain Administration Ltd), to support usage and storage at AWS or another appropriate cloud.
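As a rough illustration of the first-stage workflow, the sketch below shows in Python (rather than awk) how sliced tweets could be turned into an @mention edge list for Gephi. It does not reproduce the existing awk scripts. The field names `from_user` and `text` and both function names are assumptions made for the example. The `Source`, `Target` and `Weight` column headers are those Gephi recognises when importing an edge list from a spreadsheet.

```python
import csv
import io
import re
from collections import Counter

# Matches @username mentions in tweet text.
MENTION = re.compile(r"@(\w+)")


def mention_edges(rows):
    """Count (sender, mentioned user) pairs.

    `rows` is an iterable of dicts with the assumed keys 'from_user' and 'text'.
    User names are lower-cased so that 'Alice' and 'alice' form one node.
    """
    edges = Counter()
    for row in rows:
        sender = row["from_user"].lower()
        for target in MENTION.findall(row["text"]):
            edges[(sender, target.lower())] += 1
    return edges


def write_gephi_edges(edges, out):
    """Write the edge counts as CSV with Source, Target and Weight columns."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])


if __name__ == "__main__":
    rows = [
        {"from_user": "Alice", "text": "RT @QPSmedia: road closed"},
        {"from_user": "alice", "text": "@qpsmedia thanks, @bob stay safe"},
    ]
    buffer = io.StringIO()
    write_gephi_edges(mention_edges(rows), buffer)
    print(buffer.getvalue())
```

In practice the rows could be read from an archived CSV export with `csv.DictReader`, provided its column names are mapped to the two assumed fields.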

Books and journals required


As this research draws on several emerging fields, some of the relevant books and journals are still in early-access editions and are therefore not yet available through the QUT library. Where that is the case, I will ask the library to purchase them.

4.3 Individual Contribution to the Research Team


Although this is an individual project, it is linked to the ARC Linkage-funded project "Social Media in Times of Crisis: Learning from Recent Natural Disasters to Improve Future Strategies", a collaboration with the Queensland Department of Community Safety and the Eidos Institute. That project combines large-scale quantitative analysis with close qualitative analysis to investigate the public use of social media during disasters, working with key emergency management organisations to improve their communication strategies. My contribution will be to build the theory and a framework for deciding what to extract, and to develop improved extraction and presentation methods for social media data streams.

4.4 Timeline of Completion of the Program


Please refer to the attached timeline.


PHD TIMELINE - AVIJIT PAUL

Time Elapsed (in months for 3 yr study): 12, 15, 18, 21, 24, 27, 30, 33, 36

PhD Milestones: Stage 2; Confirmation; Annual Progress; Final Seminar; Lodgement
Key Dates: 5th June 2012; 5th March 2013; 30th Sept 2013; 4th Dec 2014; 4th Jan 2015

Generic Capabilities

Advanced theoretical knowledge and analytical skills, as well as methodological, research design and problem-solving skills in a particular research area; advanced information processing skills and knowledge of advanced information technologies and other research technologies; independence in research planning and execution, consistent with the level of the research degree; competence in the execution of protocols for research health and safety, ethical conduct and intellectual property.
Develop method; Confirmation Seminar; Develop skills in statistics, use of key software (e.g. EndNote, SPSS, AWK, Storm, Python); Data analysis; Apply for research grant; Develop tools; Submit Ethics Application; Complete H&S training; ATN More Critical and Creative Thinking; AIRS; Apply for research grant; Confirm IP Arrangements; Apply for research grant

Skills in project management, teamwork, academic writing and oral communication; awareness of the mechanisms for research results transfer to end-users, scholarly dissemination through publications and presentations, research policy, and research career planning.
ATN Leap Communication; ATN Leap Project and Leadership Management; ATN More Critical Writing; Publication Workshop; Grad Cert in Research Commercialisation; Presentation Workshop; Meeting Final Seminar timeline; Commercialisation exploration; Journal; Conference; Conference; Journal

Coursework

Advanced Information Retrieval Skills (IFN001, mandatory for PhD candidates): 15th June 2012
Enquiry to Creative Industries (KKP601): 15th June 2012

Thesis Writing: Title & Abstract; Introduction; Literature Review; Methodology; Data Analysis - Archival Data; Data Analysis - Live Data; Data Analysis - Visual Analytics; Discussion; Conclusion

Research Process (methodology in sections): Accessing Literature; Consider Methodologies; Hypothesis Development; Real Time Capture; Implementation of Real Time Analytics; Live Testing with Twitter Stream; Information Design; Gather Results

Approvals/Agreements/Applications: Intellectual Property; Ethics; Industry; Health & Safety

Scholarships: Grants in Aid; Write Up Scholarship

Outputs: Conference Papers; Journals; System; Commercialisation

Resource Implications: Funding for large-scale access to Twitter data
Constraints: If unable to gain access, will work with keywords

5. Reference List

Aggarwal, C. C. (2011). An Introduction To Social Network Data Analytics. Axel Bruns, J. B. (2011). New methodologies for researching news discussion on Twitter. Paper presented at the The Future of Journalism, Cardiff, UK. Baym, N. K., Zhang, Y. B., & Lin, M. C. (2004). Social interactions across media. New Media & Society, 6(3), 299. Bostock, M., Ogievetsky, V., & Heer, J. (2011). D3: Data-Driven Documents. Visualization and Computer Graphics, IEEE Transactions on, 17(12), 2301-2309. Boulos, M. N. K., Resch, B., Crowley, D. N., Breslin, J. G., Sohn, G., Burtner, R., Pike, W., Jezierski, E., Chuang, K.- Y. S. (2011). Crowdsourcing, citizen sensing and sensor web technologies for public and environmental health surveillance and crisis management: trends, OGC standards and application examples. International Journal of Health Geographics, 10. Boyd, D. (2011). Research on Social Network Sites. Boyd, D., Golder, S., & Lotan, G. (2010). Tweet, tweet, retweet: Conversational aspects of retweeting on twitter. 1-10. Boyd, D., & Marwick, A. E. (2011). I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society, 13(1), 114. Bruns, A. (2011). Towards Distributed Citizen Participation: Lessons from WikiLeaks and the Queensland Floods. Paper presented at the Conference for E-Democracy and Open Government, Krems, Austria Bruns, A. (2012). Ad Hoc Innovation by Users of Social Networks: The Case of Twitter ZSI Discussion Paper Bruns, A., & Liang, Y. E. (2012). Tools and methods for capturing Twitter data during natural disasters. First Monday, 17(4-2). Bruns., A., Burgess, J., Crawford, K., & Shaw, F. (2012). CCI Floodsreport: Media Ecologies Project, ARC Centre of Excellence for Creative Industries & Innovation. Castillo, C., Mendoza, M., & Poblete, B. (2011). Information credibility on twitter. Corvey, W. J., Vieweg, S., Rood, T., & Palmer, M. (2010). Twitter in mass emergency: what NLP techniques can contribute. 
Paper presented at the Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media, Los Angeles, California. Cullum, B. (2010). What makes a hashtag successful. Retrieved April 8th, 2012, from http://www.movements.org/blog/entry/what-makes-a-twitter-hashtag-successful/ DCS, Q. G. (2011). All Hazards Information Management Program http://www.btrc.qld.gov.au/c/document_library/get_file?uuid=a4491bd2-cfe5-466b-a003- 45f86878bc85&groupId=12276. Brisbane: QLD Government. De Fortuny, E. J., De Smedt, T., Martens, D., & Daelemans, W. (2012). Media coverage in times of political crisis: a text mining approach: University of Antwerp, Faculty of Applied Economics. Dodds, P. S., Harris, K. D., Kloumann, I. M., Bliss, C. A., & Danforth, C. M. (2011). Temporal patterns of happiness and information in a global social network: hedonometrics and Twitter. [; Research Support, U.S. Gov't, Non-P.H.S.]. PloS one, 6(12), e26752.

Extracting meaningful information from Social Network streams for Crisis Mapping Avijit Paul n8459941 PhD - Stage 2 Proposal - avijit.paul@student.qut.edu.au

18


Dhling, L., & Leser, U. (2011). EquatorNLP: Pattern-based Information Extraction for Disaster Response. Dong, A., Zhang, R., Kolari, P., Bai, J., Diaz, F., Chang, Y., Zhaohui, Z. (2010). Time is of the essence: improving recency ranking using twitter data. Elwood, S. (2011). Geographic Information Science: Visualization, visual methods, and the geoweb. Progress in Human Geography, 35(3), 401-408. Empson, R. (2012, February 5). Twitter: In The Final 3 Minutes Of The Super Bowl, There Were 10,000 Tweets Per Second. Retrieved April 9th, 2012, from http://techcrunch.com/2012/02/05/twitter-in-the-final-3- minutes-of-the-super-bowl-there-were-10000-tweets-per-second/ Eysenbach, G. (2011). Can Tweets Predict Citations? Metrics of Social Impact Based on Twitter and Correlation with Traditional Metrics of Scientific Impact. Journal of Medical Internet Research, 13(4). doi: e123 10.2196/jmir.2012 Gayo-Avello, D. (2012). "I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper" : A Balanced Survey on Election Prediction using Twitter Data. Arxiv preprint arXiv:1204.6441. Gupta, M., Zhao, P., & Han, J. (2012). Evaluating Event Credibility on Twitter. Gupta, N. K. (2011). Extracting descriptions of problems with product and services from twitter data. Heer, J., & Agrawala, M. (2008). Design considerations for collaborative visual analytics. Information Visualization, 7(1), 49-62. Huston, C., Weiss, M., & Benyoucef, M. (2011). Following the Conversation: A More Meaningful Expression of Engagement. In G. Babin, K. StanoevskaSlabeva & P. Kropf (Eds.), E-Technologies: Transformation in a Connected World (Vol. 78, pp. 199-210). Berlin: Springer-Verlag Berlin. Keim, D., Andrienko, G., Fekete, J. D., Grg, C., Kohlhammer, J., & Melanon, G. (2008). Visual analytics: Definition, process, and challenges. Information Visualization, 154-175. Large, T. (2012). TechnoTalk - Will Twitter put the U.N. out of the disaster business? 
Retrieved 28 March, 2012, from http://www.trust.org/alertnet/blogs/technotalk/will-twitter-put-the-un-out-of-the-disaster- business/#.T3Gkd2LX3Yk.twitter Lee, R., Wakamiya, S., & Sumiya, K. (2011). Discovery of unusual regional social activities using geo-tagged microblogs. World Wide Web-Internet and Web Information Systems, 14(4), 321-349. Li, C., Wang, Y., & Liu, X. (2011). Research on natural disaster forecasting data processing and visualization technology. Liu, B. (2010). Sentiment analysis and subjectivity. Handbook of Natural Language Processing, 627-666. Liu, B. F., & Kim, S. (2011). How organizations framed the 2009 H1N1 pandemic via social and traditional media: Implications for US health communicators. [Article]. Public Relations Review, 37(3), 233-244. doi: 10.1016/j.pubrev.2011.03.005 Maxwell, D., Raue, S., Azzopardi, L., Johnson, C., & Oates, S. (2012). Crisees: Real-Time Monitoring of Social Media Streams to Support Crisis Management. Advances in Information Retrieval, 573-575. Meier, P. (Producer). (2012, April 4th). Collaborative Mapping Platforms: Crowdsourced Crisis Response. [Keynote] Retrieved from http://www.trendhunter.com/keynote/patrick-meier Mendoza, M., Poblete, B., & Castillo, C. (2010). Twitter Under Crisis: Can we trust what we RT?

Extracting meaningful information from Social Network streams for Crisis Mapping Avijit Paul n8459941 PhD - Stage 2 Proposal - avijit.paul@student.qut.edu.au

19


Muralidharan, S., Rasmussen, L., Patterson, D., & Shin, J. H. (2011). Hope for Haiti: An analysis of Facebook and Twitter usage during the earthquake relief efforts. [Article]. Public Relations Review, 37(2), 175-177. doi: 10.1016/j.pubrev.2011.01.010 Park, J., Cha, M., Kim, H., & Jeong, J. (2012). Managing Bad News in Social Media: A Case Study on Dominos Pizza Crisis. Platt., A., Hood., C., & Citrin., L. (2011). Organization of Social Network Messages to Improve Understanding of an Evolving Crisis Paper presented at the Intelligence and Security Informatics (ISI), 2011 IEEE International Conference, Beijing. Potts, L., Seitzinger, J., Jones, D., & Harrison, A. (2011). Tweeting disaster: hashtag constructions and collisions. Sakaki, T., Toriumi, F., & Matsuo, Y. (2011). Tweet trend analysis in an emergency situation. Sarcevic, A., Palen, L., White, J., Starbird, K., Bagdouri, M., & Anderson, K. (2012). Beacons of hope in decentralized coordination: learning from on-the-ground medical twitterers during the 2010 Haiti earthquake. Song, W., Tjondronegoro, D. W., & Docherty, M. (2012). Understanding user experience of mobile video: framework, measurement, and optimization. Mobile Multimedia: User and Technology Perspectives, 3-30. Stamberger, K. S. a. J. (2010). Tweak the Tweet: Leveraging microblogging proliferation with a prescriptive syntax to support citizen reporting. Paper presented at the Information Systems for Crisis Response and Management (ISCRAM), Seatle, USA. Starbird, K. (2012). Digital Volunteerism: Examining Connected Crowd Work During Mass Disruption Events. Starbird, K., & Palen, L. (2011). "Voluntweeters": self-organizing by digital volunteers in times of crisis. Paper presented at the Proceedings of the 2011 annual conference on Human factors in computing systems, Vancouver, BC, Canada. Starbird, K., Palen, L., Hughes, A. L., & Vieweg, S. (2010). Chatter on the red: what hazards threat reveals about the social life of microblogged information. 
Paper presented at the Proceedings of the 2010 ACM conference on Computer supported cooperative work, Savannah, Georgia, USA. Tufte, E. R. (2001). The visual display of quantitative information: Graphics Press. Valero, A. T. l., Gmez, M. M. y., & Pineda, L. V. o. (2009). Using Machine Learning for Extracting Information from Natural Disaster News Reports. Computacin y Sistemas (Computers and Systems), 13(1), 33-44. Vlachos, A. (2011). Evaluating unsupervised learning for natural language processing tasks. Paper presented at the Proceedings of the First Workshop on Unsupervised Learning in NLP, Edinburgh, Scotland. Wu, S., Hofman, J. M., Mason, W. A., & Watts, D. J. (2011). Who says what to whom on twitter. Paper presented at the Proceedings of the 20th international conference on World Wide Web, Hyderabad, India. Yugami, N., Igata, N., Anai, H., & Inakoshi, H. (2012). Advanced Analytics for Intelligent Society. Fujitsu Scientific & Technical Journal, 48(2), 110-116.


6. Appendix

6.1 Coursework

AIRS Unit - IFN001

I have taken the course Advanced Information Retrieval Skills (IFN001), submitted the assignment, and am waiting for the result.

Approaches to Enquiry in the Creative Industries - KKP601

I have taken this course, completed the presentation, submitted the final assignment, and am waiting for the result.
