Treemap Visualization of The Semantic Twitter Analysis Tool

Albert Leimüller, August 2011

Bachelor Thesis
Graz University of Technology
Computer and Information Services, Dept. Social Learning
Author: Albert Leimüller, a lei@sbox.tugraz.at
Supervisor: Martin Ebner, martin.ebner@tugraz.at
Contents
Abstract
1 Introduction
2 Information Visualization
2.1 Treemap
2.1.1 Subdivision Algorithm
3 Visual Twitter Analysis Tools
3.1 Twitter Stats
3.2 Twitter Counter
3.3 Trendistic
3.4 Mirror.me
3.5 Twitter Stream Graphs
4 JavaScript Visualization Frameworks
4.1 Highcharts JS
4.2 jQuery Visualize
4.3 Protovis and D3
4.4 JavaScript InfoVis Toolkit
5 STAT
5.1 Basic usage
5.2 An example query
6 Implementation of the STAT visualization
6.1 Creating a JIT readable JSON file
6.2 Initializing JIT
6.3 Removing a rectangle from the treemap
7 The Implementation used on an example
8 Conclusion
9 References
Abstract
The popularity of the internet and the number of its users are increasing rapidly. Every day a huge amount of data is created. This amount of data makes it necessary to find a useful representation of the data so that users are able to find the information they are interested in. The data of this bachelor thesis comes from the social web platform Twitter and the goal is to implement a visualization of search results. Compared to a simple tag cloud, the goal was a graphically more advanced representation that also makes it possible to search dynamically through the data. Because the data representation should be dynamic and the quality of the visualization should be as good as possible, the program was implemented in JavaScript and PHP. The data source is the program STAT. STAT was implemented by Thomas Altmann for his bachelor thesis "Erschließung und Analyse von Twitter Analyse Tools" (Exploitation and Analysis of Twitter Analysis Tools). STAT is a semantic Twitter analysis tool that analyses tweets by their content. The goal is to compare different variations of graphical data representations and explore their potential for future use.
Introduction
The exchange of information is an important aspect of today's life. Information can now be passed along immediately and news can spread around the world in just a few seconds. This is possible because the internet and many of its software applications have become available to anyone in recent years. One of them is Twitter, a microblogging platform that enables users to share their thoughts in an information stream limited to 140 characters per post, a limit that was derived from the 160-character length of an SMS. Twitter was launched in 2006 and has since grown rapidly in popularity.[1] Twitter posts, also known as tweets, are public by default, which is an important difference to other social networking platforms. On Twitter, the posts of anyone who shares public messages can be read by anyone, so it is not necessary to be approved as a friend to read someone's posts. That is one of the reasons why Twitter is not just a social network like Facebook or Google+, but also an information source that is used by newspapers and other news companies to spread information. News and information often spread more rapidly over Twitter than in online newspapers, as during the protests in Iran in 2009.[2] To follow only the tweets that are interesting for a user, the tweets need to be filtered. Twitter does this by default, because the stream of a user shows only the tweets of the users he or she follows. But it is still difficult to follow all tweets that are posted concerning a specific event, especially when a user has many followees. Therefore the software STAT [Altmann 2010] was developed by Thomas Altmann for his bachelor thesis "Erschließung und Analyse von Twitter Analyse Tools" (Exploitation and Analysis of Twitter Analysis Tools) to archive and filter tweets of specific users, hashtags and keywords.[3] This can simplify the analysis of tweets regarding a specific event. STAT was developed in Python.
The task was now to expand the implementation of STAT and add a visual representation of the results using an open source JavaScript framework so that the implementation can be used on all operating systems and all browsers. There are further implementations of Twitter visualizations which will be compared and reviewed further on.
[1] http://blog.twitter.com/2011/08/your-world-more-connected.html, last visited: 10.12.2011
[2] http://thelede.blogs.nytimes.com/2009/12/18/twitter-hacked-by-iranian-cyber-army, last visited: 10.12.2011
[3] http://twitter.tugraz.at, last visited: 10.12.2011
Information Visualization
Information Visualization [Andrews 2002], InfoVis for short, deals with the visual representation of data to make large amounts of information easier to read and understand. InfoVis emerged very early in computer science, when computer graphics were introduced. Due to the lack of more sophisticated graphics its usefulness was limited. But over time, as the quality of computer graphics increased, the use of InfoVis and its benefits became more obvious. Next to InfoVis there are also data visualization and scientific visualization (SciVis), which are closely connected but still not the same. SciVis is more concerned with the visualization of data whose form is already given, like medical data from an X-ray or a PET scan. In InfoVis the data is abstract and the resulting visualizations are based on the ideas and inventions of the developers. A main goal in InfoVis is to focus on areas of interest while maintaining the surrounding context. Three examples of focus-plus-context techniques are 3D perspective, fisheye views and overview-plus-detail [Andrews 2002]. 3D perspective focuses on the important part in the foreground while the rest of the information is still visible in the background. Fisheye views feature a magnifying-glass-like distortion with the area of interest in the center and the rest of the context on the sides. The third example, overview-plus-detail, is like two separate and synchronised file system windows that show the overview (the file system path, for example) in one window and the detail view in the other. Very important in InfoVis is the possibility to offer an overview of the whole data and, on demand, to zoom, to filter and to show details if necessary. To enable these features many different graphs and systems were developed over time for a variety of different data types. These data types are divided as follows: linear, hierarchies, networks, multidimensional, feature spaces and query spaces.
Linear data are tables, lists, chronologically ordered items and source code. Hierarchies are tree structures. Networks are all forms of graphs. Multidimensional data are metadata attributes combined in an n-dimensional space. Feature spaces are collections of feature vectors that represent objects. Query spaces are objects retrieved by queries.
2.1 Treemap
A treemap is an information visualization graph that displays hierarchical data as a set of nested rectangles. The size of a rectangle describes the importance of its entry. Each rectangle can contain nested rectangles that are part of the bigger rectangle. Treemaps were first introduced by Ben Shneiderman in the early 1990s.[4] The early treemaps used the slice and dice algorithm, which is stable and easy to implement. A trademark of slice and dice treemaps is long rectangles with a very small width. Later on, different treemap algorithms were invented. One of them is the squarified treemap algorithm [Bruls 2000]. Its goal is to make it easier to compare and select data. To arrange the rectangles in a squarified way a subdivision algorithm is used. Another extension to the treemap algorithm was the cushion treemap [Wijk 1999], which uses intuitive shading to give the rectangles a more pleasing visual representation. The Newsmap, see Figure 1, is an example of a squarified treemap. The more relevant a news report is, the larger it is displayed. Because the content of the small rectangles is almost impossible to read, the user can hover over a field to get links and further information. Colors are used to divide the news into different subjects such as sports, politics and so on.

[4] http://www.cs.umd.edu/hcil/treemap-history, last visited: 10.12.2011

2.1.1 Subdivision Algorithm
The squarified treemap algorithm calculates a way to arrange the rectangles of a treemap so that they fit into the main rectangle, see Figure 2. The best results are achieved by first sorting the data in descending order. Then the first and largest rectangle, with an area of 6, is placed into the main rectangle. The next rectangle, also with an area of 6, is added next to the first one and the aspect ratio, the ratio of the width to the height, is calculated. If the aspect ratio gets worse, that is, further away from 1, the new rectangle is placed differently until the result is pleasing. In the example the first rectangle changed from a shape that spans the whole height to a more pleasing shape in the top area. The next rectangle, with an area of 4, is added. The aspect ratio is calculated again and the process described before is repeated. This process continues until the calculation is finished for all rectangles and they are placed properly.
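The subdivision described above can be sketched in JavaScript. The following is a minimal sketch of the greedy row-filling procedure from [Bruls 2000], not the code used by any particular library. The function names worstRatio and squarify and the layout of the returned objects are assumptions made for this illustration. A row is extended as long as its worst aspect ratio does not get worse; otherwise the row is fixed and a new row is started in the remaining space.

```javascript
// Worst aspect ratio (>= 1) among the rectangles of one row, where the row
// is laid out along a side of length `side` (formula from Bruls et al.).
function worstRatio(row, side) {
  const sum = row.reduce((a, b) => a + b, 0);
  const max = Math.max(...row);
  const min = Math.min(...row);
  const s2 = side * side;
  const sum2 = sum * sum;
  return Math.max((s2 * max) / sum2, sum2 / (s2 * min));
}

// Lays out `areas` inside the rectangle (x, y, width, height).
// Assumes the areas add up to width * height.
function squarify(areas, x, y, width, height) {
  const remaining = areas.slice().sort((a, b) => b - a); // largest first
  const result = [];
  let rect = { x: x, y: y, width: width, height: height };

  while (remaining.length > 0) {
    const vertical = rect.width >= rect.height; // stack the row along the shorter side
    const side = vertical ? rect.height : rect.width;

    // Grow the row while the worst aspect ratio does not get worse.
    const row = [remaining.shift()];
    while (
      remaining.length > 0 &&
      worstRatio(row.concat(remaining[0]), side) <= worstRatio(row, side)
    ) {
      row.push(remaining.shift());
    }

    // Fix the row: it takes a strip of this thickness from the free rectangle.
    const rowSum = row.reduce((a, b) => a + b, 0);
    const thickness = rowSum / side;
    let offset = 0;
    for (const area of row) {
      const length = area / thickness;
      result.push(
        vertical
          ? { area: area, x: rect.x, y: rect.y + offset, width: thickness, height: length }
          : { area: area, x: rect.x + offset, y: rect.y, width: length, height: thickness }
      );
      offset += length;
    }

    // Continue in the space that is left over.
    rect = vertical
      ? { x: rect.x + thickness, y: rect.y, width: rect.width - thickness, height: rect.height }
      : { x: rect.x, y: rect.y + thickness, width: rect.width, height: rect.height - thickness };
  }
  return result;
}
```

For the areas 6, 6, 4, 3, 2, 2 and 1 in a rectangle of width 6 and height 4, this sketch stacks the first two rectangles in a column of width 3 on the left, each with a height of 2, and places the rectangles with the areas 4 and 3 side by side at the top of the remaining space.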
Figure 1: Newsmap
Visual Twitter Analysis Tools
Twitter has spawned many applications that build on its basic functionality and add new features. Some are text based and others are visual. Many of them are free web based services. In addition there is proprietary software developed for companies to better judge what their customers are thinking, for example to place advertisements more effectively or to find out what is important at the moment and where. MAP[5], a product by Sysomos, is a full-featured analytics service that includes unlimited access to billions of social media conversations as well as automated sentiment analysis and geo-demographics. Another product by Sysomos is Heartbeat. It features real-time monitoring of brands and products. The implementation is not public.
3.1 Twitter Stats
Twitter Stats[6], also known as TweetStats, is a web based Twitter analysis tool, see Figure 5. The user enters a user name and the software retrieves the data, which can take a while, especially when the website is busy and many other users want to have their data processed. The website informs the user how many other users are waiting and asks the user to be patient. When the calculation is over, the resulting statistics show the number of tweets per month. It is also possible to switch to a more detailed view that shows how many tweets a user posted per day in a specific month, including the total number of tweets, replies and retweets (tweets per day). There is also a statistic of the usual time of day at which a user tweets (tweet density). Another chart shows on which day of the week a user tweets most (aggregate daily tweets). The same chart is also available for each hour of the day (aggregate hourly tweets). There are also rankings of the users most replied to, the users most retweeted and the interfaces used most. In addition to these statistics there are tag clouds for keywords (tweetcloud) and for hashtags (hashcloud). There is also a link to Wordle[7] to create a more elaborate tag cloud. Finally there is an overview of the number of friends and the number of followers and whether they are increasing or decreasing.

Figure 2: The treemap subdivision algorithm
Figure 3: Twitter Counter

[5] http://www.sysomos.com/products/overview/compare-products, last visited: 10.12.2011
[6] http://www.tweetstats.com, last visited: 10.12.2011
[7] http://www.wordle.net
3.2 Twitter Counter
Twitter Counter, see Figure 3, is also a web based Twitter analysis tool. It makes it possible to see the statistics not just of one user at a time but of up to three users, and to compare the data. One graph shows the number of followers in the last hour, week, month, three months or six months. There is an equivalent graph for the number of followees. Another graph shows the number of tweets. There is also a prediction feature that estimates how many followers a user will have in X days and how many days it will take to reach X followers. A graph can also be downloaded from the website as a PNG. Not all features of Twitter Counter are available for free or to users who are not logged in with their Twitter account.[8]
3.3 Trendistic
Trendistic[9] is another web based Twitter tool that shows trends on Twitter, see Figure 6. There is a basic list of trending topics, including the number of hours for which each trend has been current. A very important feature is the possibility to add a keyword. The tweets concerning this keyword are displayed with the date and the name of the author, and the search term is highlighted. The user can then browse through the results in a timeline that also shows the number of tweets concerning this topic. With a click on a specific point in the timeline, the tweets of that time are displayed. The user can zoom in and out of this timeline between periods of 24 hours and 30 days. To see 90 days and 180 days the user has to be registered. The trend statistic can be embedded in a website using the embed feature or it can be tweeted.
3.4 Mirror.me
Mirror.me[10] is also a web based tool, see Figure 4. Its main purpose is to find networks of people with the same interests. The user enters a term to search for. The term can be a real name, a Twitter user name, a location or an interest. The search returns a list of results. The user can pick the correct entry and see the calculated data. The starting point is a tag cloud, which can also be shared, embedded or downloaded. By clicking on one of the words, a more detailed result is displayed. There is a list of followees and followers who share the same interest. It is also possible to display tweets that include that interest. By clicking on people, a graph of the profile images of the people in the user's network is displayed. The larger the images, the more active the users are on Twitter.

Figure 4: Mirror.me

[8] http://www.twittercounter.com, last visited: 10.12.2011
[9] http://trendistic.indextank.com, last visited: 10.12.2011
[10] http://www.mirror.me, last visited: 10.12.2011
3.5 Twitter Stream Graphs
Twitter Stream Graphs[11] takes an input value, which can be a keyword or a user name, and retrieves the last thousand tweets containing that word. After a short calculation a stream graph is drawn that displays the words mentioned most often together with the search word, see Figure 7.

Figure 5: TweetStats
Figure 6: Trendistic
Figure 7: Twitter Stream Graph

[11] http://neoformix.com/Projects/TwitterStreamGraphs
JavaScript Visualization Frameworks
There are many JavaScript visualization frameworks. Below a brief description and comparison of a few is given.
4.1 Highcharts JS
Highcharts JS[12] is an interactive JavaScript framework for the visualization of charts. It is free for non-commercial use. Commercial and governmental use requires a paid license. The main features of the framework are line and scatter charts, area charts, column charts, bar charts, pie charts, dynamic charts and combinations of various charts. The charts load dynamically and are interactive. To use Highcharts JS, one of jQuery, MooTools or Prototype is needed for basic JavaScript tasks. The data used has to be preprocessed. The supported file formats are CSV, JSON and XML.
4.2 jQuery Visualize
Another possibility to create visualizations with JavaScript is jQuery Visualize[13]. This tool makes it possible to read data from an HTML table and insert it dynamically into a JavaScript visualization. The names of the fields are defined by the TH elements of the table. A JavaScript function then takes the data and processes it into the graph. The charts themselves can be styled using CSS files.
4.3 Protovis and D3
Protovis[14] is a framework to create custom and standard information visualizations. It was developed by the Stanford Visualization Group and is no longer in active development but can still be used. Its successor is D3[15] (Data-Driven Documents), a JavaScript framework to create custom interactive visualizations. Implementing a visualization with D3 is easier than with Protovis. One of its most important features is its high customizability.
4.4 JavaScript InfoVis Toolkit
The JavaScript InfoVis Toolkit, also known as JIT, is a free JavaScript framework to display various kinds of data. The graphs and charts are interactive, see Figure 8. It features various charts, including bar charts and pie charts, as well as sunburst, icicle, force-directed, treemap, spacetree, rgraph and hypertree visualizations. The input data is based on JSON. The implementation uses a combination of JavaScript, CSS and HTML. That makes it necessary to reload the page when the data of a JSON file has changed, so that it is displayed properly. Other than that, JIT offers a very dynamic look and can be customized in many ways. Most of the visual customizations can be done using CSS. JavaScript controls the implementation of the data structures.

Figure 8: JavaScript InfoVis Toolkit (JIT)

[12] http://www.highchart.com, last visited: 10.12.2011
[13] http://www.filamentgroup.com/lab/update_to_jquery_visualize_accessible_charts_with_html5_from_designing_with, last visited: 10.12.2011
[14] http://mbostock.github.com/protovis, last visited: 10.12.2011
[15] http://mbostock.github.com/d3, last visited: 10.12.2011
5 STAT
5.1 Basic usage
STAT (semantic Twitter analysis tool) enables the user to enter one or two search terms, each of which can be a Twitter user name, a hashtag or a keyword, and analyzes how often they were mentioned, see Figure 9. This puts STAT in the position to answer certain questions. The following lists all questions STAT can answer depending on the parameters given:[16]

Case 1: person X
- Which persons does X correspond with?
- Which keywords does X use?
- Which hashtags does X use?

Case 2: person X, keyword/hashtag Y
- Who does X talk to about Y?
- Which keywords does X write together with Y?
- Which hashtags does X write together with Y?

Case 3: person X, person Y
- Which other persons are addressed by X together with Y?
- Which keywords does X use when addressing Y?
- Which hashtags does X use when addressing Y?

Case 4: keyword/hashtag X
- Which persons write about X?
- Which keywords are used with X?
- Which hashtags are used with X?

Case 5: keyword/hashtag X, keyword/hashtag Y
- Which persons write X together with Y?
- Which keywords are used with X and Y?
- Which hashtags are used with X and Y?

Case 6: keyword/hashtag X, person Y
- Which persons write about X with Y?
- Who does Y talk to about X?
- Who else is addressed with Y about X?
- Which keywords are used by Y about X?
- Which hashtags are used by Y about X?

[16] Due to violations of the Twitter policy, Twitter forced Twapperkeeper to stop their service. So the implementation depending on Twapperkeeper is not working at the moment.
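The case distinction above can be sketched as a small lookup. This is only an illustration and not the code used by STAT. It assumes that a person is written with a leading @, as in the queries STAT prints, that every other term counts as a keyword or hashtag, and that Case 3 combines two persons, as its questions suggest. The names termType and statCase are hypothetical.

```javascript
// Classifies a search term. Assumption: persons start with "@",
// everything else is treated as a keyword or hashtag ("topic").
function termType(term) {
  if (term === undefined || term === null || term === '') {
    return 'none';
  }
  return term.charAt(0) === '@' ? 'person' : 'topic';
}

// Returns the case number (1 to 6) for one or two search terms,
// or null if the combination is not covered by the list of cases.
function statCase(first, second) {
  const cases = {
    'person/none': 1,
    'person/topic': 2,
    'person/person': 3,
    'topic/none': 4,
    'topic/topic': 5,
    'topic/person': 6
  };
  const key = termType(first) + '/' + termType(second);
  return Object.prototype.hasOwnProperty.call(cases, key) ? cases[key] : null;
}
```

With these assumptions, a single hashtag such as #opco11 falls into Case 4, and the same hashtag combined with a person falls into Case 6.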
5.2 An example query
If a user enters a name X, STAT can find the answers to three questions: "Which persons does X correspond with?", "Which keywords does X use?" and "Which hashtags does X use?". STAT returns the results of a query as a list of words and their corresponding numbers of occurrences. Below is an example for the query "which @persons write about #opco11". In that case the person who wrote most about #opco11 was designeon with 216 tweets, see Figure 13. The user with the second most mentions was herrlarbig with 169. The result is more obvious and easier to find than if one had to read all the tweets concerning the search term. But the result is still not easy to read. Therefore a visual representation of the results delivered by STAT can increase their readability.

designeon (216), herrlarbig (169), lisarosa (143), hamster44 (141), anjalorenz (140), dieHauteCulture (132), TwInfoManager (129), dieGoerelebt (114), mons7 (106), vilsrip (95), Rya (87), timovt (86), UE_trainer (85), dunkelmunkel (83), Networking_Lady (64), hosi1709 (63), lress (59), osa11 (57), germankiwi (54), VolkmarLa (47), MatthiasHeil (40), fraukeatschool (39), EDV_Twitt (37), cervus (35), rueckel (35), heinerich (34), claudiajaeger (33), jowede (33), TFTUser (30), birdy1976 (29), empeiria (26), eme1408 (26), opco11 (25), woxl (25), mebner (25), e_trude (24), SieSeCoBiWiBine (23), gibro (23), anntheres (22), KhPape (21), acwagner (21), gibirger (20), jrobes (20), Fontanefan (19), Biwi_Uli (19), ralfa (19), KoenigAndreas (18), Bildungsjunkie (18), mediendidaktik (18), mkerres (17), wneuhaus (16), StinaSch (15), gmeder (15), martinkurz (15), joerg_bern (14), ittnerfa (14), biwirenztem (14), sociallearning (14), nele_we (13), ulfblanke (13), spani3l (12), WolleSch (12), schb (12), JasminHamadeh (11), sicherdeinweb (11)

STAT does not communicate with Twitter directly. It retrieves the tweets from archive services that store them [Altmann 2010]. In detail the process works as follows. The user enters one or two search terms, each of which is the name of a person, a keyword or a hashtag. If there is no archive available for a search term, it is created. If it is available, STAT uses the data right away. Because tweets are stored for a long time, the data can be analyzed not just when it was tweeted but even afterwards, which would not be possible if the data came directly from Twitter, because the Twitter API restricts access to older tweets. Once STAT has received the data from the archive, it is transformed and stored locally as a JSON file. The local JSON files are updated if the data in the archive changes. These JSON files are the base of all calculations performed by STAT.

Figure 9: STAT web interface
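A result list in the form shown in this section, with each name followed by its count in parentheses, can be turned into structured data with a few lines of JavaScript. The following is a minimal sketch for illustration only; the function name parseStatResult and the shape of the returned objects are assumptions and not part of STAT.

```javascript
// Parses a result string such as "designeon (216), herrlarbig (169)"
// into an array of { name, count } objects, in the order given.
function parseStatResult(text) {
  const entries = [];
  // A name is any run of characters without commas or parentheses,
  // followed by a number in parentheses.
  const pattern = /([^,()]+?)\s*\((\d+)\)/g;
  let match;
  while ((match = pattern.exec(text)) !== null) {
    entries.push({ name: match[1].trim(), count: Number(match[2]) });
  }
  return entries;
}
```

Structured entries of this kind are easier to sort, cut and sum than the raw string, which is what a visualization of the result needs.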
Implementation of the STAT visualization
The visualization of STAT was developed using a variety of different components. The core of the implementation is the JavaScript InfoVis Toolkit and its implementation of treemaps. In addition to this JavaScript based framework, PHP was used to perform some necessary calculations in the background. JSON files store the calculated data locally. To make it possible to get live data from Twitter, a small jQuery extension and a JavaScript snippet[17] were used.
6.1 Creating a JIT readable JSON file
The search results from STAT are returned in the PHP file statlive/visual/api/analyze.php. The resulting strings $a1 to $a5 are the result of its calculations and are passed to the function treemap() in the file statlive/visual/api/treemap.php. The function takes five parameters. The first one is the case number, the second and third are the search terms, the fourth is the result from STAT as a string and the fifth is the list of names, keywords or hashtags.

    // Persons that write about the search term
    if ($a1 != null) {
        $treemapJson = treemap(1, $p1full, null, $a1, $namelist);
        echo "which @persons write about " . $p1full . " ";
        echo "<a href='treemap_result.php?id=0&p=1'>Treemap</a>";
    }
    // Keywords that are used with the search term
    if ($a2 != null) {
        $treemapJson = treemap(2, $p1full, null, $a2, $wordlist);
        echo "which keywords are used with " . $p1full . " ";
        echo "<a href='treemap_result.php?id=0&p=2'>Treemap</a>";
    }
    // Hashtags that are used with the search term
    if ($a3 != null) {
        $treemapJson = treemap(3, $p1full, null, $a3, $hashlist);
        echo "which #hashtags are used with " . $p1full . " ";
        echo "<a href='treemap_result.php?id=0&p=3'>Treemap</a>";
    }

The data is sent to the method transformJSON(). This method transforms the incoming data into a JSON file that is readable for the JIT library.

[17] http://remysharp.com/2007/05/18/add-Twitter-to-your-blog-step-by-step, last visited: 10.12.2011
Depending on the size of the data and the number of entries, the data for the treemap is cut to a maximum of 50 entries, because more entries would make the treemap unreadable and too confusing. The strings that STAT delivers are names and numbers. All the numbers are summed up so that the full area of the treemap can be calculated. The resulting names, numbers and the name of the search term are saved to temporary TXT files so that they can be used later on.

The content of parameter.txt is the search term:

    #opco11

The content of names.txt is the list of user names:

    designeon
    herrlarbig
    lisarosa
    hamster44
    anjalorenz
    dieHauteCulture
    TwInfoManager
    dieGoerelebt
    mons7
    vilsrip
    ...

Corresponding to the user names, the numbers of each user are stored in numbers.txt:

    216
    169
    143
    141
    140
    132
    129
    114
    106
    95
    ...

The data is then assembled into a string that represents the JSON variable that will be sent to JIT. The values are set in a for loop until the data is complete. The values that are set in the JSON for each entry are the name, the id, the area size and the color. The color is calculated dynamically depending on the size of the area. The larger the area, the more colorful it is. When the JSON string is complete, it is returned and saved to a file by the function saveJSONToFile(). Finally a link to treemap_result.php is printed for the specific treemap.

    // Writes the JSON string to a temporary TXT file
    function saveJSONToFile($json) {
        $tmpJSONFile = "../tmp/json" . $GLOBALS['tmp'] . ".txt";
        $f = fopen($tmpJSONFile, 'w') or die("Error");
        fwrite($f, $json);
        fclose($f);
    }

The resulting JSON has the following form:

    {
      "children": [
        {
          "children": [
            {
              "children": [],
              "data": {
                "count": 216,
                "$color": "#00ff00",
                "image": "../visualization/profileimage.png",
                "$area": 216
              },
              "id": "designeon",
              "name": "designeon"
            },
            {
              "children": [],
              "data": {
                "count": 169,
                "$color": "#00dd00",
                "image": "../visualization/profileimage.png",
                "$area": 169
              },
              "id": "herrlarbig",
              "name": "herrlarbig"
            },
            {
              "children": [],
              "data": {
                "count": 143,
                "$color": "#00cc00",
                "image": "../visualization/profileimage.png",
                "$area": 143
              },
              "id": "lisarosa",
              "name": "lisarosa"
            },
            {
              "children": [],
              "data": {
                "count": 141,
                "$color": "#00cc00",
                "image": "../visualization/profileimage.png",
                "$area": 141
              },
              "id": "hamster44",
              "name": "hamster44"
            },
            ...
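The transformation described in this section can be sketched as follows. The thesis implements it in PHP, so this JavaScript version is only an illustration of the steps: sort, cut to 50 entries, sum the numbers and assign an area and a color to each entry. The names buildTreemapJson, greenShade and MAX_ENTRIES are hypothetical, the tree is simplified to a root with one level of children, and the linear color formula is an assumption, since the text does not state how the color is calculated. It therefore does not reproduce the exact color values of the example above.

```javascript
const MAX_ENTRIES = 50; // larger treemaps become unreadable

// Hypothetical color rule: the larger the count, the brighter the green.
// The green channel is scaled linearly between 0x55 and 0xff.
function greenShade(count, maxCount) {
  const level = Math.round(85 + 170 * (count / maxCount));
  return '#00' + level.toString(16).padStart(2, '0') + '00';
}

// Builds a tree in the general shape shown above (id, name, data, children)
// from two parallel lists of names and numbers.
function buildTreemapJson(searchTerm, names, numbers) {
  const entries = names
    .map((name, i) => ({ name: name, count: numbers[i] }))
    .sort((a, b) => b.count - a.count)
    .slice(0, MAX_ENTRIES);
  const total = entries.reduce((sum, e) => sum + e.count, 0);
  const maxCount = entries.length > 0 ? Math.max(1, entries[0].count) : 1;

  return {
    id: 'root',
    name: searchTerm,
    data: { $area: total },
    children: entries.map((e) => ({
      id: e.name,
      name: e.name,
      data: {
        count: e.count,
        $area: e.count,
        $color: greenShade(e.count, maxCount)
      },
      children: []
    }))
  };
}
```

The returned object can be serialized with JSON.stringify() to obtain the string that is written to the temporary file.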
6.2 Initializing JIT
The visualization itself is located at treemap_result.php. This is an HTML file with embedded PHP code. All CSS files that are responsible for the look of the treemap are included in the HTML header. JIT uses a base CSS and a treemap CSS. The library itself is included right below. Furthermore jQuery and a Twitter library are included in the header. The JavaScript function init() is called when the body of the page is loaded. The PHP variable $id is initialized with the passed GET value so that PHP knows which file it has to load. After the value is assigned, the TXT file containing the correct JSON is loaded into the variable $json. The JSON is then written to a temporary TXT file. After these preliminary steps, JavaScript manages the visualization. The JavaScript variable json gets the data from the PHP string $json by printing the value.

    var json = <?php print($json); ?>;

Then the treemap object is created.

    var tm = new $jit.TM.Squarified()

The treemap has events set for right click and for hovering, which are set right below the initialization. On right click the following statement is executed.

    window.location.href = "removeNode.php?id=" + node.name + "&p=" + <?php print($p); ?>;

This triggers the removal of the specific node. For more about this functionality see section 6.3. When a rectangle is hovered the following code is executed.

    getTwitterImage(node.name);

The function sends a request to Twitter with the given user name. If the request was successful, the HTML element with the id avatar is updated with the current data.

    function getTwitterImage(name) {
        $.ajax({
            url: "http://api.Twitter.com/1/users/show.json?screen_name=" + name,
            dataType: "jsonp",
            // Show the profile image of the user
            success: function (data) {
                document.getElementById("avatar").src = data.profile_image_url;
            },
            error: function () {
                alert("Error.");
            }
        });
    }

The current tweets are retrieved using a small JavaScript library developed by Remy Sharp[18]. The library is included in the HTML head and the function getCurrentTweets() in treemap_result.php is called.

    function getCurrentTweets(name) {
        // Load the three latest tweets of the user into the element "tweet"
        getTwitters("tweet", {
            id: name,
            count: 3,
            enableLinks: true,
            ignoreReplies: true,
            clearContents: false,
            template: '"%text%" <a href="http://Twitter.com/%user_screen_name%/statuses/%id_str%/">%time%</a>'
        });
        return "" + name + "";
    }
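The initialization steps of this section can be put together in one sketch. It is not the code of treemap_result.php. The helper names createTreemap and buildRemoveUrl, the container id infovis and the tooltip text are assumptions, and the option names injectInto, Events and Tips follow the JIT treemap examples as far as they are recalled here, so they should be checked against the JIT documentation. Unlike the statement shown above, the sketch encodes the node name before it is placed in the URL, so that characters such as # do not break the link.

```javascript
// Builds the link that triggers the removal of a node.
// The node name is URL-encoded, e.g. "#opco11" becomes "%23opco11".
function buildRemoveUrl(nodeName, caseNumber) {
  return 'removeNode.php?id=' + encodeURIComponent(nodeName) +
    '&p=' + encodeURIComponent(caseNumber);
}

// Creates a squarified treemap. `jit` is the $jit namespace of the library,
// `navigate` is called with the URL to load (assumed helper signature).
function createTreemap(jit, json, containerId, caseNumber, navigate) {
  const tm = new jit.TM.Squarified({
    injectInto: containerId, // id of the container element (assumption)
    Events: {
      enable: true,
      // Right click removes the node by reloading the page.
      onRightClick: function (node) {
        if (node) {
          navigate(buildRemoveUrl(node.name, caseNumber));
        }
      }
    },
    Tips: {
      enable: true,
      // Hovering shows the name and the number of mentions.
      onShow: function (tip, node) {
        tip.innerHTML = node.name + ': ' + node.data.count;
      }
    }
  });
  tm.loadJSON(json);
  tm.refresh();
  return tm;
}
```

In a page that has loaded the library, a call could look like createTreemap($jit, json, 'infovis', 1, function (url) { window.location.href = url; }). Passing the namespace and the navigation function as parameters keeps the sketch independent of the browser.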
6.3 Removing a rectangle from the treemap
Because the JIT library uses a combination of HTML and CSS to display a treemap, the treemap cannot simply be redrawn by JavaScript when its structure changes. The whole page needs to be reloaded. Otherwise the rectangles and the content would not fit together. So the added implementation passes the node name of the rectangle to be deleted and the id of the JSON file to the PHP file removeNode.php. In removeNode.php the values are saved in the variables $nodeName and $caseNumber. The name list, the number list and the parameter are loaded from the three separate temporary TXT files that were created in the step before. Then the index of the rectangle that is to be deleted is calculated. The three temporary TXT files names, numbers and parameter are overwritten. After that the JSON file is created as before, only with the new values. The result is also written to a file and the new treemap is loaded at display_treemap.php, see Figure 11.

    window.location.href = "removeNode.php?id=" + node.name + "&p=" + <?php print($p); ?>;

The look of the treemap is controlled by CSS and by data from the JSON files. The CSS that is responsible for the tooltip of the treemap is located in visual/visualization/STAT_base.css.

    .tip {
        color: #111;
        width: 139px;
        background-color: white;
        border: 1px solid #ccc;
        -moz-box-shadow: #555 2px 2px 8px;
        -webkit-box-shadow: #555 2px 2px 8px;
        -o-box-shadow: #555 2px 2px 8px;
        box-shadow: #555 2px 2px 8px;
        opacity: 0.9;
        filter: alpha(opacity=90);
        font-size: 10px;
        font-family: Verdana, Geneva, Arial, Helvetica, sans-serif;
        padding: 7px;
    }

[18] http://remysharp.com/2007/05/18/add-Twitter-to-your-blog-step-by-step, last visited: 10.12.2011
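The removal step described at the beginning of this section works on two parallel lists, the names and the numbers. A minimal sketch of that step is given below. The thesis performs it in PHP inside removeNode.php; the JavaScript function removeEntry and its return format are assumptions for illustration only.

```javascript
// Removes the entry called `nodeName` from the parallel lists of names
// and numbers. Returns new lists and leaves the inputs untouched.
function removeEntry(names, numbers, nodeName) {
  const index = names.indexOf(nodeName); // index of the rectangle to delete
  if (index === -1) {
    // Unknown name: nothing to remove, return copies of the lists.
    return { names: names.slice(), numbers: numbers.slice() };
  }
  return {
    names: names.filter((_, i) => i !== index),
    numbers: numbers.filter((_, i) => i !== index)
  };
}
```

After this step the remaining names and numbers are written back to the temporary files and the JSON for the treemap is built again from them.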
Figure 10: STAT web interface
Figure 11: STAT web interface after the removal of a node
Figure 12: STAT web interface with tooltip
Figure 13: STAT web interface
The Implementation used on an example
In this section an example search query is performed to compare the use of STAT with and without the visualization and to show how the visualization improves the use of STAT. It also shows the constraints of using treemaps for visualization.

The user is presented with a list of archives that are currently available. Next to each entry is a button that enables the user to analyze that term. In this example the hashtag #opco11 is analyzed. After pressing the analyze button the user can either add another value that will be included in the analysis or proceed to the result. Another value could be a person, a hashtag or a keyword. In this example no other value is added.

STAT itself returns three results. The first is a list of persons who write about #opco11. The list consists of the name of each person and the number of mentions of #opco11. The second result contains the keywords that are used with #opco11 and the third result contains the hashtags that are used with #opco11. In each case the list is very hard to read.

By clicking on visualization the user is led to the same results visualized with treemaps. Depending on the type of result (person, hashtag or keyword), the color of the treemap is different. Treemaps with persons are green, treemaps with hashtags are red and treemaps with keywords are blue. That makes it easier to see at a glance what kind of result is shown. The result shown in the treemap is furthermore shortened to a maximum of 50 entries because otherwise the treemap loses its readability. The results that are available as treemaps are presented as hyperlinks. In this case there are also three results.

The treemap represents the highest result in the largest and brightest rectangle. The value of the result is written in the rectangle. By hovering over one of the rectangles further details are revealed, see Figure 12. If the rectangle is a person, this person's Twitter user name, the number of mentions and the latest tweets are shown. If the rectangle is a keyword or hashtag, only the number of mentions is shown. Furthermore a rectangle can be deleted by right-clicking on it. The treemap reloads and the new result is presented.
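The preparation of a result for the treemap (limit to 50 entries, color by result type, highest value brightest) can be sketched as follows. This is a minimal illustration and not the code of the thesis: the function names shade and buildEntries, the input shape { name, count } and the linear brightness rule are assumptions. Only the limit of 50, the three colors and the keys $area and $color are taken from the text.

```javascript
const MAX_ENTRIES = 50; // limit stated in the text

// Color channel per result type: persons green, hashtags red, keywords blue.
const CHANNEL = { hashtag: 0, person: 1, keyword: 2 };

function toHex(value) {
  return value.toString(16).padStart(2, "0");
}

// Assumed shading rule: brightness grows linearly with the count,
// from 0x40 for the smallest possible share up to 0xff for the maximum.
function shade(type, count, maxCount) {
  const level = Math.round(0x40 + (0xff - 0x40) * (count / maxCount));
  const rgb = [0, 0, 0];
  rgb[CHANNEL[type]] = level;
  return "#" + rgb.map(toHex).join("");
}

// Sort by count (descending), keep the top entries and attach area and color.
function buildEntries(results, type) {
  const top = results
    .slice()
    .sort((a, b) => b.count - a.count)
    .slice(0, MAX_ENTRIES);
  if (top.length === 0) return [];
  const maxCount = Math.max(top[0].count, 1); // avoid division by zero
  return top.map((r) => ({
    id: r.name,
    name: r.name,
    data: { $area: r.count, $color: shade(type, r.count, maxCount) },
    children: [],
  }));
}
```

With this rule the entry with the most mentions receives both the largest $area and the fully saturated color of its type, which matches the behaviour described above. The actual shading used by STAT may differ.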
Conclusion
The huge quantity of information that is produced every day in the modern world makes it necessary to filter out the information that is relevant for each individual. Information should be accessible easily and quickly. In computer science this can be achieved with information visualization, the customized visual representation of data. Unlike scientific visualization, information visualization requires the developer to invent a representation that fits the available data. The goal is to make the whole data set visible at once while focusing on the detail that is important at the moment, and to be highly dynamic.

The data returned by STAT is a list of terms and numbers that represent the occurrences of each term in a specific query. A meaningful representation of that data is the treemap. A treemap is a collection of rectangles that is fitted into a big rectangle using a subdivision algorithm. The area of each rectangle represents the number of occurrences of a term. That makes it possible for a user to easily see the most important terms. By hovering over smaller entries, the focus can be moved to these elements. The dynamic aspect of the treemap is not just the hovering tip that returns detailed information about an element; it is also possible to delete rectangles of the treemap and review the resulting treemap. Further customizations are conceivable.

One restriction of STAT and its visualization is the dependence on archives, because Twitter does not allow unlimited access to its data at the moment. The data STAT receives comes from archives, which means that only archived data can be used. This is a big downside. The analysis of live data would be very interesting but is not possible in the current release. A downside of the visualization is its use of JavaScript, which is not always enabled by default in a user's browser. A more important problem is the combination of CSS and HTML elements with JavaScript: a new result can only be displayed properly when the whole page is reloaded, which makes the implementation less dynamic than it could be. A very positive aspect is the use of JSON to store data for the visualization. Further data elements could be added easily and used in future implementations.
References
Thomas Altmann, 2010, Erschließung und Analyse von Twitter Analyse Tools.
Keith Andrews, 2002, Visualising Information Structures, Aspects of Information Visualization.
Mark Bruls, 2000, Squarified Treemaps, IEEE TCVG Symposium on Visualization, 33-42.
Michael Friendly, 2009, Milestones in the History of Thematic Cartography, Statistical Graphics, and Data Visualization, http://www.math.yorku.ca/SCS/Gallery/milestone/milestone.pdf, last visited: 10.12.2011.
Ben Shneiderman, 1991, Tree Visualization with Tree-maps, A 2-d space-filling approach, ACM Transactions on Graphics, 92-99.
Jarke J. van Wijk, 1999, Cushion Treemaps, Visualization of Hierarchical Information, IEEE Computer Society, 73-79.