Hashtags that Matter:
Measuring the propagation of Tweets in the Dilma Crisis

Ernesto Calvo (ecalvo@umd.edu), University of Maryland
Eric Dunford (edunford@umd.edu), University of Maryland
Neil Lund (NeilBLund@gmail.com), University of Maryland

August 2016

Abstract: This research note describes a method to rank and map the propagation of
hashtags and other edge traits in complex social networks. We take advantage of the
known properties of the Generalized Friendship Paradox to measure the propagation rate
of edge attributes (hashtags) in Twitter. We proceed to map the regions of political
networks that are activated by different hashtags. We exemplify our strategy by analyzing a
large network in Brazil during the second half of 2015, the #Dilma mobilizations. Results
show that government messages spread through a dense community of pro-government
users that actively retweeted the content of a small set of politicians (network
authorities). Meanwhile, opposition messages spread using a distributed network
strategy, with very active network hubs coordinating their political messages.

Acknowledgments: We thank Ed Summers for his support in setting up the data
collection environment. Twitter data was collected using twarc (Summers 2015) on
Twitter's forward API stream. We thank Luigi Curini and the participants of the political
science workshop at the University of Milan as well as the participants of the Instituto de
Cálculo, UBA-Argentina for comments and suggestions. The Graduate School at the
University of Maryland and GVPT provided support for this research.

Measuring and mapping the propagation of political discourses in complex
networks is an important challenge for researchers who hope to take advantage of the
wealth of social media data currently available. As political conflict migrates from "Main
Street" to "Virtual Street," activists, lobbyists, and politicians are investing more time and
resources to spread their message among social media users, bloggers, and in online
forums. The Arab Spring, #Ferguson, #Nisman, #Dilma, and #Ayotzinapa are a few
examples of major political conflicts that have been fought online as much as offline. In
these social media markets, assessing winners and losers by measuring the spread of
political messages has become an important task.

In this article, we take advantage of the properties of the Generalized Friendship
Paradox (GFP) to measure the propagation of political discourses on Twitter. In complex
networks, the Friendship Paradox (Feld 1991) states that "my friends have on average more
friends than I do." This mathematical property of complex networks is the result of nodes
with higher degrees (more friends) being more frequently observed in samples drawn
from the network (Feld 1991). Recent work has also shown that the paradox can be
generalized to other traits displayed by individuals in the network, provided that these
traits correlate with node degree (Eom and Jo 2014; Fotouhi et al. 2014). Consequently,
the GFP states that "my co-authors have on average more co-authors than I do" and that,
as we will show, "my conservative or liberal friends have more conservative or liberal
friends than I do."

In the case of political messages, the GFP will hold for discourses and hashtags that
propagate through the network and, consequently, take on the properties of higher degree
nodes. That is, hashtags that are popular will climb up to and roll down from higher
degree nodes and propagate. Meanwhile, hashtags that are not popular will be unable to
do so. As we will show, we can use this property of the GFP to compare hashtags and
measure their capacity to reach individuals or groups of individuals in social networks.

We exemplify our measurement strategy by comparing the propagation of
hashtags in an intense political conflict in Brazil that was prominently featured in social
media debates: the #Dilma pro- and anti-impeachment mobilizations. We search for
tweets under the string "Dilma" and map the dissemination of hashtags among
communities of users. Results show that, first, regions of the network were activated by
different hashtags, few of which were able to propagate outside of their original
communities while most others died out when crossing the boundaries of their
communities of origin. Second, we provide a strategy to score and rank the propagation
of discourses in complex networks using the generalized friendship paradox. Finally, we
provide evidence that the information gathered using the GFP can be used to estimate
auto-regressive network models to measure whether propagation of the hashtags is driven
by network authorities ("from above") and/or by network hubs ("from below"). We exemplify
our strategy by uncovering distinct communication strategies by government and opposition
users in the Dilma crisis in Brazil. While government messages spread vertically, from
network authorities to their local community of users, opposition messages spread
through a distributed network strategy of very active hubs. We define the opposition
communication as a "distributed we conquer" type of strategy, with a coordinated set of
very active users tweeting and retweeting messages from each other through different
communities.

The auto-regressive model provides key insights into the actors that propagate
political messages. Results show that pro-Dilma messages propagate hierarchically
through a few high level authority nodes (large groups of users retweeting a narrow set of
political authorities). Meanwhile, opposition messages propagate through intermediate
level authorities that retweet messages from each other, augmenting the circulation of
messages across their communities of followers. Consequently, while pro-government
users consistently retweeted messages from a small group of political elites, opposition
messages spread through a distributed set of very active users relaying messages from one
another.

1. The Generalized Friendship Paradox

Succinctly described, the friendship paradox (FP) states that individuals in
complex networks have, on average, fewer friends than their friends. Feld (1991) was the
first scholar to describe the underlying mechanisms of the FP, showing that "the distribution of friends
among friends is a weighted version of the original distribution, weighting those with
many friends especially heavily" (Feld 1991: 1469). The properties of the paradox were
described as follows:

If the original distribution has $n$ individuals with $x_i$ ties apiece, the mean can be
determined as $\sum x_i / n$. However, the distribution of friends has $\sum x_i$ cases (for all of
the friends) and they have a total of $\sum x_i^2$ friends, since each individual is counted
as many times as she or he has friends, $x_i$, and that individual has $x_i$ friends.
Thus, the mean number of friends among the friends is $\sum x_i^2 / \sum x_i$. This can be
shown to be a simple function of the mean and variance in the original distribution
of ties. That is: mean number of friends of friends = $\sum x_i^2 / \sum x_i$ = mean($x$) +
variance($x$)/mean($x$). The expression above shows that the mean among friends
is always at least as great as the mean among individuals, and the mean among
friends increases with the variance among individuals, with a given mean among
individuals. The mean among friends is much greater than the mean among
individuals if there is much variation in the population. (Feld 1991: 1469).
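
The identity quoted above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the degree sequence is hypothetical and the function name `mean_friends_of_friends` is introduced here only for illustration. It compares the mean number of friends among friends with the mean plus variance over mean, using the population variance.

```python
from statistics import mean, pvariance


def mean_friends_of_friends(degrees):
    """Mean degree among friends: sum(x_i^2) / sum(x_i)."""
    return sum(x * x for x in degrees) / sum(degrees)


# Hypothetical degree sequence (illustrative only, not data from the paper).
degrees = [1, 1, 1, 3, 4, 10]

mu = mean(degrees)            # mean number of friends
var = pvariance(degrees)      # population variance of the degree sequence
fof = mean_friends_of_friends(degrees)

print(f"mean degree          = {mu:.3f}")
print(f"mean among friends   = {fof:.3f}")
print(f"mean + variance/mean = {mu + var / mu:.3f}")
```

The last two printed values coincide, and both exceed the mean degree whenever the degrees are not all equal.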

An important implication of the Friendship Paradox is that specific traits that are
shared by social media users in the network would be more prevalent via these more
active nodes. This intuition was recently formalized by Eom and Jo (2014), who extend
the friendship paradox to other arbitrary node characteristics. As presented by Eom and
Jo: "According to the generalized friendship paradox (GFP), your friends have on average
higher characteristics than you have" (Eom and Jo 2014: 1). These "higher characteristics"
may include the number of co-authors in academic networks, the number of citations, as
well as other node attributes that correlate with node degree within the network. As they
show, when trait characteristics correlate with node degree, the GFP has properties that
are similar to those of the friendship paradox, allowing us to measure how important
traits that are of substantive interest are distributed (or how they propagate) through the
network.

In the particular case of social networks, researchers have been particularly
preoccupied with mapping and explaining how discourses propagate through connected
nodes in social media networks. Hashtags such as #BlackLivesMatter, #YoSoyNisman,
#ImpeachDilma, or #Ayotzinapa should behave as predicted by the GFP and propagate
through the network if and only if hashtag behavior is a shared trait that is more prevalent
among higher degree nodes. By contrast, political discourses or hashtags will not propagate when
traits fail to correlate with node degree. That is, discourses that propagate through the
network will take on the node characteristics of the network and behave according to the
GFP while discourses that fail to do so will not.

Both popular and academic sources have posited that Twitter played a key role in
major political protests from the Arab Spring and Euromaidan to Black Lives Matter and
Ayotzinapa (Tucker 2016). Recent scholarship has examined the role of Twitter, in
particular, as a tool for both elites and non-elites to share ideas, build communities, or
coordinate political activity. Empirical analyses have highlighted Twitter as an important
digital platform for recruitment and for activism (González-Bailón et al. 2011), as a means
of communication between politicians and their constituents (Evans et al. 2014), and as
a tool for sharing ideas and building collective identities among protesters (Tremayne 2014).

However, it has been argued, the growing reach of Twitter does not necessarily lead
to a broader dissemination of ideas to out-communities. Indeed, some scholars have
argued that the tendency for individuals to self-select into communities that reflect their
own personal characteristics (e.g. sorting) may lead online networks to mirror or even
exacerbate the real-life trend toward a balkanized political discourse (Calvo 2016;
Sunstein 2002; Conover et al. 2011; Himelboim et al. 2013). Rather than exposing users
to new and diverse ideas, Twitter discourses frequently resemble "echo chambers" where
novel ideas live and die without ever leaving a single, insular community. As a result,
researchers have been particularly preoccupied with examining whether and how certain
political communications are able to disseminate through relatively politically
homogeneous communities to reach a broader audience (Barberá et al. 2015). Our strategy
allows us to assess the extent to which different groups or communities are exposed to
distinct sets of messages.1

1
In the Twitter statistics, this is described as "impressions." Our measure of first order contiguity provides a proxy
for the type of discourses that are observed in the "wall" of different groups of users.

2. Mapping Hashtags in Social Networks

As defined by Eom and Jo (2014), the GFP extends Feld (1991) to inquire
whether the average degree of a trait $y_i$ in a network is smaller than the degree of that
same trait among its neighbors, $y_i < \frac{1}{|\Lambda_i|} \sum_{j \in \Lambda_i} y_j$, where $\Lambda_i$ describes the set of neighbors for
any given node $i$.

Consider for example an undirected network in the form of a star, with one higher
degree node at the center and 3 nodes of degree 1 in the periphery (see below). Each of
the nodes in the periphery has degree 1 while their friend at the center has degree 3.
Consequently, each of the three nodes in the periphery has 1 friend while on average their
friends, in this case the node at the center of the star, have 3 friends. As described by Feld
(1991), each peripheral node has fewer friends than their friend. It is only the case for the
higher degree node at the center of the network that she has more friends than the average
number of friends of her friends. Therefore, while nodes with low degree have fewer friends than their
friends, nodes with high degree have more friends than their friends.
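
The star example can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the labels `A` (center) and `B` (periphery) follow the text, while `C` and `D` are hypothetical labels for the two remaining peripheral nodes.

```python
# Undirected star: center "A" tied to peripheral nodes "B", "C", "D".
# "C" and "D" are hypothetical labels added for illustration.
edges = [("A", "B"), ("A", "C"), ("A", "D")]

# Build the adjacency list of neighboring nodes for every node.
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

# Degree of each node and mean degree among its friends.
degree = {node: len(adj) for node, adj in neighbors.items()}
friends_mean = {
    node: sum(degree[f] for f in adj) / len(adj)
    for node, adj in neighbors.items()
}

for node in sorted(neighbors):
    holds = degree[node] < friends_mean[node]
    print(node, degree[node], friends_mean[node],
          "fewer friends than friends" if holds else "more friends than friends")
```

Each peripheral node has degree 1 against a friends' mean of 3, while the center has degree 3 against a friends' mean of 1.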

Now imagine that the peripheral node B publishes a tweet with the hashtag
#message. If #message propagates to the adjacent node and is then retweeted by A at
the center of the network, then the hashtag will take on the properties of the network and
conform to the GFP. The hashtag #message will be observed ("impressed") in all "walls"
and correlate with network degree if and only if it propagates to the higher degree node and
takes on the properties of the network as a whole. Given that social networks can have
multiple regions with higher degree nodes, #message may propagate across some groups of
nodes but not across others. Such differences allow us to analyze the extent to which
different hashtags propagate through communities in social media networks. Before we
describe our strategy to capture traits and test the GFP, we take a short detour to
describe the Presidential crisis in Brazil and our strategy to collect and process these tweets.

#Dilma
In 2015, at a time of significant economic distress, protests erupted in Brazil
demanding the impeachment of President Dilma Rousseff from the Workers' Party (PT).
Similar to the #Nisman case in Argentina, significant political polarization ensued, which
was reflected in the formation of well-defined Twitter communities that vocally opposed
or supported the PT administration.

Brazil has historically had a very fragmented political environment (Mainwaring
1991; Lamounier 1987; Ames 1995). There are close to 30 different legislative blocs in
Congress, with the largest party controlling barely 20% of house seats. However,
significant political activity tends to agglutinate government and opposition camps in
Congress and in elections. Consequently, scholars have defined Brazil's political system
as one of "Coalition Presidentialism" (Limongi 2007; Pereira et al. 2005; Samuels 2008).
While there has not been a single president whose party controlled a majority of seats
since democratization in 1985, political conflicts tend to be more structured in the
important political arenas. This is also the case in social media data, with different
communities being distinguished in the data but two large communities, government and
opposition users, featuring more prominently. In 2015, a revolt from Dilma Rousseff's
allies, led by the Vice-President Michel Temer (PMDB) and the chairman of the House
(Eduardo Cunha), balkanized the government forces into three different groups. A key
question to be answered was how messages spread across these communities of users and
which communication strategies were more successful in delivering their messages. Next,
we exemplify our strategy to measure the propagation of political messages by analyzing the
data from the early months of this crisis.2

2
We analyze the data of the #Dilma mobilizations, in the months prior to the initiation of impeachment
proceedings against Dilma. Our analysis, consequently, is interested in analyzing how actors demanded or rejected
the possibility of impeachment rather than the response to its initiation.

The Data

Between September and December of 2015, we collected 3,255,254 tweets that
used the characters "Dilma" and were posted by 499,572 users. To analyze the structure
of social media networks in Brazil and to analyze the propagation of political messages,
we selected all re-tweets in the sample, a total of 1,828,431 re-tweets, representing 56%
of all activity and covering a total of 341,699 users. Of the total accounts, we selected for
our analyses only nodes that participated multiple times (the primary connected
network), resulting in 49,424 accounts that were responsible for 1,421,716 re-tweets. As
in other cases, the data showed a remarkable degree of concentration of the information,
with less than 5% of the total accounts being responsible for 44% of the information being
circulated in the network.3 Together with the information of each tweet, we collected 16
variables reporting the screen name of the users, the number of followers, the number of
followees, the time of the tweet and re-tweet, as well as the status of the user's account
(verified or not verified) among others.

To create the #Dilma network we implemented the following procedure. First, we
loaded all 1,421,716 edges with the original authors set as authorities and the accounts
that re-tweeted information as hubs. Second, we estimated a layout of node coordinates using the
Fruchterman-Reingold force-directed algorithm4 in R 3.2 igraph (Csardi and Nepusz
2006). Third, we identified the communities of the #Dilma network via random
walk community detection.5 The Fruchterman-Reingold algorithm seeks to optimize
network visualization, communicating information about the proximity between nodes
(data reduction) while preventing nodes from overlapping onto each other.
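
The first step of this procedure, turning re-tweets into directed edges between hubs and authorities, can be sketched as follows. This is an illustration only, not the authors' R code: the account names are hypothetical, and the filter is one assumed reading of "participated multiple times" (an account appears in more than one edge). Layout and community detection are not reproduced here.

```python
from collections import Counter

# Hypothetical re-tweet records: (re-tweeting account, original author).
retweets = [
    ("user_a", "politician_1"),
    ("user_b", "politician_1"),
    ("user_c", "politician_1"),
    ("user_a", "media_1"),
    ("user_b", "user_a"),
]

# Each record is a directed edge hub -> authority.
in_degree = Counter(author for _, author in retweets)   # times re-tweeted (authority)
out_degree = Counter(hub for hub, _ in retweets)        # times re-tweeting (hub)

# Assumed activity filter: keep accounts that appear in more than one edge.
activity = in_degree + out_degree
active = {account for account, n in activity.items() if n > 1}

for account in sorted(active):
    print(account, "in-degree:", in_degree[account], "out-degree:", out_degree[account])
```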

3
The rate of retweets and the concentration of the information is consistent with similar cases in Argentina
(#Nisman) and the US (#Ferguson).
4
This force-directed algorithm places nodes according to proximity but adds a spring that separates nodes to
favor visualization. This is important, as we are interested in highlighting how different areas of the network are
activated by distinct hashtags.
5
Random walk community detection is implemented in the igraph package (Csardi and Nepusz 2006). We used 4-,
5-, and 6-step walks, which produced comparable communities. Visualization in Figure 1 uses the 5-step setting.

Figure 1: #Dilma Social Media Network, with labels identifying communities that supported Dilma, Cunha, and the
opposition. September through December 2015

Note: Network of 49,424 high activity nodes from the Dilma case, using 1.4 million retweets from September through December of
2015.

In Figure 1, the government community is depicted in gray on the left of the graph.
It is also possible to see the pro-Michel Temer and Cunha community as separate but
proximate to the government. The opposition has a primary community in pink and two
smaller sub-groups, one from the left above the Cunha community and one that is in the
lower part of the graph. As in the #Nisman case in Argentina, many of the most significant
actors in the opposition were led by opposition media entities such as Veja and O Globo
(Calvo 2015). We also observe very significant activity from conservative users from
Venezuela and Chile, a feature that has been repeated over a number of mobilizations that
migrated from "Main Street" to "Virtual Street."6 By contrast, the pro-Dilma users were
mostly led by political figures with a very concentrated following. The different structure
of the government and opposition social networks, consequently, should affect how
messages spread across these different communities. In what follows, in Section 3, we
measure the propagation of hashtags through the full #Dilma network.

3. Mapping Friends of Friends in #Dilma

To measure the propagation of hashtags in the #Dilma network we proceed as
follows. First, we create an adjacency list of neighboring nodes for every node and an
adjacency list of edges for every node in the network. Our function takes as input the
two adjacency lists and collects a matrix of counts with $N$ rows (nodes) and $K$ columns (traits),
summing the tally of traits in neighboring edges $e$ and nodes $i$, i.e. the trait count. For
example, a trait could be the hashtag #ForaDilma and our function will add a 1 to node $i$
every time that the hashtag is observed in an adjacent edge. The average prevalence of the
traits among adjacent nodes, in turn, will compute the mean count of traits in adjacent
nodes, i.e. the prevalence of the trait among "my friends." Every node in the network,
therefore, will have an associated count of the trait in connected edges as well as an
associated count of the trait among friends. If the distribution of the trait in the network

6
We find similar patterns in the #Nisman case as well as in the anti-Bachelet demonstrations that followed the
replacement of her cabinet in mid-2015.


takes on the properties of the network as a whole, then the prevalence of the trait will be
lower in node $i$ than among the friends of $i$.
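
A minimal sketch of the counting step described above, written in Python rather than the R environment used for the analysis. The edge list, account names, and the function names `trait_counts` and `friends_mean` are hypothetical; edges are treated as undirected when defining adjacency, which is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical re-tweet edges: (re-tweeter, original author, hashtags on the edge).
edges = [
    ("u1", "a1", {"#ForaDilma"}),
    ("u2", "a1", {"#ForaDilma"}),
    ("u3", "a1", set()),
    ("u1", "a2", {"#NaoVaiTerGolpe"}),
]


def trait_counts(edges, trait):
    """Add 1 to a node every time the trait is observed in an adjacent edge."""
    counts = defaultdict(int)
    neighbors = defaultdict(set)
    for src, dst, tags in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
        hit = 1 if trait in tags else 0
        counts[src] += hit
        counts[dst] += hit
    return dict(counts), dict(neighbors)


def friends_mean(counts, neighbors):
    """Mean trait count among the nodes adjacent to each node."""
    return {
        node: sum(counts[f] for f in adj) / len(adj)
        for node, adj in neighbors.items()
    }


counts, neighbors = trait_counts(edges, "#ForaDilma")
fof = friends_mean(counts, neighbors)

for node in sorted(counts):
    print(node, "my count:", counts[node], "friends' mean:", round(fof[node], 3))
```

Repeating the call for each trait of interest would fill one column of the trait matrix and one column of the friends' matrix per trait.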

We consider a number of different hashtags, words, and accounts for our matrix of
$N$ rows and $K$ categories. We also included information about the prevalence of the trait
"Dilma", our search string, as a baseline category. The string of characters "dilma" is
observed in every tweet of our dataset and maps the overall degree structure of the
network as a whole. After we collected the matrix of traits for every node $i$, we collected a
"traits of friends" matrix with the average prevalence of the trait in the adjacent nodes, i.e.
$\bar{y}_{ik} = \frac{1}{|\Lambda_i|} \sum_{j \in \Lambda_i} y_{jk}$, where $y_{jk}$ is the count of trait $k$ in node $j$ and $\Lambda_i$ is the set of
nodes adjacent to $i$. This second matrix contains the first-order auto-regressive value of the trait
in adjacent nodes and could also be used to estimate models with network dependency.

A strategy to visualize the propagation of different hashtags in a social media
network is to plot the relationship between the individual's node trait ("my trait count")
and the prevalence of the trait in neighboring nodes ("my friends' trait count"). Figure
2 provides such an example using the pro-government hashtag "there will not be a coup"
(#NaoVaiTerGolpe). The horizontal axis reports the log of the count of the trait while the
vertical axis describes the difference between the prevalence of the trait in each node and
the average count among the friends of that node. In Figure 2, the upper line (solid)
describes the maximum propagation of a trait, as estimated from the prevalence of the
word "Dilma" in the data. The dashed line describes the propagation of the hashtag of
interest, #NaoVaiTerGolpe. The red line describes the point at which the data moves from
"my friends have more of the trait than I do" (expansion) to "my friends have less of the
trait than I do" (contraction).


Figure 2: Propagation of the hashtag #NaoVaiTerGolpe in the Dilma Network

Note: Solid line describes maximum possible propagation of a hashtag (Dilma). Dashed line describes the
propagation of the hashtag #NaoVaiTerGolpe. A score of .735 indicates that #NaoVaiTerGolpe propagated 73.5%
of its maximum possible value. Values above the red line indicate that I have less of the trait #NaoVaiTerGolpe than
my friends. Values below the red line indicate that the hashtag is moving towards lower degree nodes.


Figure 3: Propagation of six different hashtags in the Dilma Network

Note: Solid line describes maximum possible propagation of a hashtag (Dilma). Dashed line
describes the propagation of each hashtag. Values above the red line indicate that I have less of
the trait #hashtag than my friends.

For nodes of low degree (very few counts of the trait), the average count among
friends is very large. As we reach higher degree nodes, however, we encounter those
individuals who are the most connected and who have more of the trait than their friends.

Once we set the baseline to the trait "Dilma" we may compare the propagation of
different hashtags to that baseline. We consider the maximum propagation of a hashtag
as the largest trait count at the point in which node $i$ has the same expected count of a
trait as its friends. That is, the point at which the loess line intersects with the mean
trait value (red line) in Figure 2, where the difference between the node's trait count and
the average count among its friends is zero.

We can see, for example, that the maximum propagation for the hashtag
#NaoVaiTerGolpe is 244 (a log count of approximately 5.5). We may also compare the hashtag to the maximum
propagation of the baseline trait "Dilma", 1,808 (a log count of approximately 7.5), for a propagation score of
5.5/7.5 = 0.73. That is, #NaoVaiTerGolpe propagates 73% of the maximum possible
propagation value. Compare this to the hashtag #Impeach, which propagates through the
network almost as much as "Dilma", 86% of the maximum value, 6.2/7.4. Meanwhile,
the hashtag #OcupyBrasilia propagates the least, a mere 21.6% of the maximum
propagation estimated from the baseline trait "Dilma".
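
The propagation score for #NaoVaiTerGolpe can be reproduced from the two maximum counts reported above. This sketch assumes that the values 5.5 and 7.5 are natural logarithms of 244 and 1,808, which is consistent with the arithmetic; the function name `propagation_score` is introduced here for illustration.

```python
import math


def propagation_score(max_count, baseline_max_count):
    """Ratio of the log maximum trait count to the log maximum baseline count."""
    return math.log(max_count) / math.log(baseline_max_count)


# Maximum propagation counts reported in the text.
hashtag_max = 244      # #NaoVaiTerGolpe
baseline_max = 1808    # baseline trait "Dilma"

score = propagation_score(hashtag_max, baseline_max)
print(round(math.log(hashtag_max), 1),
      round(math.log(baseline_max), 1),
      round(score, 2))
```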

4. How closely related are Hashtags and Authorities in the Network

Once we have computed the prevalence of traits among nodes, we have at our disposal
a matrix of $N$ rows (nodes) and $K$ columns (traits). This information allows us to
compare the relative proximity of traits to each other. Indeed, we can use this matrix as a distance
input to estimate how hashtags are connected to each other, how they are connected to
particular user accounts, etc. Consider the Dilma network described before. To the
prevalence rate of the six hashtags in the previous section we added the prevalence of 30
user accounts named in the network, creating a matrix of 49,424 rows (nodes) and 36
columns (traits). We then compute the inter-correlation matrix of all 36 columns and
draw a dendrogram of the 36 different accounts and traits, as shown in Figure 4.
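
The inter-correlation step can be sketched with a small trait matrix. The counts below are hypothetical and cover three traits only. Converting correlations to distances as 1 - r is an assumption of this sketch, since the text does not state which distance feeds the dendrogram, and the clustering itself is not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of counts."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical trait counts: one list per trait (column), one entry per node (row).
traits = {
    "#ForaDilma":      [5, 3, 0, 0, 4],
    "#Impeach":        [4, 2, 0, 1, 5],
    "#NaoVaiTerGolpe": [0, 0, 6, 4, 0],
}
names = list(traits)

# Inter-correlation matrix of the trait columns and an assumed 1 - r distance.
corr = {a: {b: pearson(traits[a], traits[b]) for b in names} for a in names}
distance = {a: {b: 1 - corr[a][b] for b in names} for a in names}

# The closest pair is the first merge an agglomerative clustering would make.
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
closest = min(pairs, key=lambda p: distance[p[0]][p[1]])
print("closest pair:", closest)
```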

Figure 4 presents a dendrogram to describe the level of association across hashtags
and user accounts. The dendrogram shows that #ForaDilma, #VemPraRua, and
#Impeach are all clustered together, closely connected to opposition authorities within
the network. By contrast, #NaoVaiTerGolpe and #ForaCunha are outside of the main
clusters, next to the accounts Turquim5 and blogplanalto. Loosely connected to the full
network are also the official accounts of vice-president Michel Temer and Deputy
Eduardo Cunha, members of the government coalition and two of the key figures pushing
for the impeachment of Dilma Rousseff.


Figure 4: Dendrogram describes clusters of important accounts and hashtags in the Dilma Network

Note: Estimated using the inter-correlation matrix of the GFP traits.


Figure 5: Regions of the Dilma Network activated by different hashtags

Note: Network of 49,424 high activity nodes from the Dilma case, using 1.4 million retweets
from September through December of 2015. Nodes are resized by the counts of hashtags
collected from the edges.


Figure 6: Dilma Network and propagation of #ForaCunha and #ForaDilma.

Note: Left plots describe the location of nodes, middle plots describe the counts of the traits, while the right
plot describes the second-order contiguity counts of the trait among adjacent nodes. The second order
contiguity allows us to measure impressions, given that those are the users that would see the hashtag in
their walls.


Figure 7: Dilma Network and propagation of #Impeach and #NaoVaiTerGolpe.

Note: Left plots describe the location of nodes, middle plots describe the counts of the traits, while the right
plot describes the second-order contiguity counts of the trait among adjacent nodes. The second order
contiguity allows us to measure impressions, given that those are the users that would see the hashtag in
their walls.


To visualize the regions of the network where the hashtags propagate we can resize
the nodes of the network in Figure 2 according to their reported counts of traits. This is
shown in Figure 5, showing that accounts that were identified by the community
detection algorithm as being in the pro-Dilma coalition disseminated the
#NaoVaiTerGolpe and the #ForaCunha hashtags. By contrast, almost all of the
#ForaDilma hashtags circulated among the opposition nodes in the lower right side of the
plots.

To describe the regions within the different communities that are activated by the
different hashtags, we may also use the counts of the traits among nodes and, to measure
impressions, the average count of traits among friends, as in Figures 6 and 7. In both
figures, the middle plot describes the prevalence of traits in each node while the plots on
the right describe the second-order contiguity, measuring those individuals that would
have been able to observe the hashtag in their "walls." This second-order contiguity
captures what Twitter defines as "impressions," that is, tweets that would be posted on
the wall of an individual irrespective of whether they interacted with the post.

While we have shown that hashtags propagate through different network regions
and propagate at different rates, we still do not know whether messages spread "from
above" or "from below." That is, we do not know if messages spread through the network
through important authorities that have high in-degree or through very active users with
high out-degree. In the next section we use a network first order auto-regressive model
(NAR1) to measure the rate of propagation and, more importantly, the type of strategy
that explains how messages spread.

5. A network auto-regressive (NAR1) model of propagation of Hashtags

As we will show, the information collected using the GFP may also be used to estimate
whether propagation is carried out through higher in-degree nodes (authorities) or higher
out-degree nodes (hubs). Given that we know the relative prevalence of each trait across
nodes, we can use the first order contiguity information (the average prevalence of the
trait in contiguous nodes) to estimate an auto-regressive network model:

$$y_{ik} \sim \rho W y_{k} + BX \qquad \text{(Eq. 1)}$$

where $y_{ik}$ is the count of the trait $k$ for node $i$, $\rho W y_{k}$ is the effect of the first-order
auto-regressive count of the trait among $i$'s adjacent nodes in the adjacency weights
matrix $W$, and $BX$ is the set of covariates and parameters. Equation 1 is identical in all
respects to the spatial auto-regressive (SAR) models with contiguity weights, frequently
used in spatial statistics (Anselin and Ray 1991). This model provides a simple alternative
to the more computationally demanding exponential random graph models (ERGM) for
ordinal (count) dependent variables (Hunter et al. 2008).

The auto-regressive model has two different components that provide critical
information to researchers. Substantively, the auto-regressive
parameter $\rho$ describes the direct effect of the adjacent local trait count on each node's trait
counts. That is, $\rho$ describes the local effect of nearby prevalence. As in the spatial
auto-regressive models, we expect $0 < \rho < 1$, which indicates positive local effects from
contiguous nodes which decline as we move away to non-adjacent nodes. The second
component, $BX$, describes the direct effect of network covariates that, in our case, will
include in-degree and out-degree information.
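
A minimal sketch of how such a model could be fit on a toy network. Several simplifications are assumptions of this sketch rather than the authors' specification: the network, trait counts, and in-degree values are hypothetical; the weights are row-normalised, so the lag is the mean count among adjacent nodes; the covariate enters linearly rather than through splines; out-degree is omitted for brevity; and the coefficients are obtained by ordinary least squares, since the text does not state the estimator.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def network_lag(neighbors, y):
    """First-order lag: mean of y among each node's adjacent nodes."""
    return {i: sum(y[j] for j in adj) / len(adj) for i, adj in neighbors.items()}


# Hypothetical undirected network, trait counts, and in-degree covariate.
neighbors = {
    "n1": ["n2", "n3", "n4"],
    "n2": ["n1", "n3"],
    "n3": ["n1", "n2"],
    "n4": ["n1", "n5"],
    "n5": ["n4"],
}
y = {"n1": 6, "n2": 3, "n3": 2, "n4": 1, "n5": 0}
in_degree = {"n1": 5, "n2": 1, "n3": 0, "n4": 2, "n5": 0}

lag = network_lag(neighbors, y)
nodes = sorted(y)
X = [[1.0, lag[n], float(in_degree[n])] for n in nodes]
intercept, rho, beta_in = ols(X, [float(y[n]) for n in nodes])
residuals = [y[n] - (intercept + rho * lag[n] + beta_in * in_degree[n]) for n in nodes]

print("rho:", round(rho, 3), "in-degree effect:", round(beta_in, 3))
```

With only five nodes the estimates carry no substantive meaning; the sketch only shows how the lag term and the covariates enter the same regression.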

To provide an example of how informative the NAR1 model is, we run a
specification that includes the first order network lag as well as two independent variables
that report the in-degree of the node (authority) as well as the out-degree of the node
(hub). We use splines with 3 degrees of freedom to assess the non-linear effect of
authorities and hubs on the propagation of hashtags. Per design, we expect the
auto-regressive term to be above 0 and below 1, $0 < \rho < 1$. However, we are agnostic as to the
relative importance of authorities and hubs in the propagation of different hashtags. We
simply assume that hashtags may propagate by taking on the properties of higher in-degree
nodes (authorities) or higher out-degree nodes (hubs).


Table 1: Network Auto-Regressive model (NAR1) of the Dilma network's hashtags

                                        #ForaDilma   #Impeach     #ForaCunha   #NaoVaiTerGolpe   #VemPraRua   #OccupaBrasilia
Local effect of the auto-regressive     0.143***     0.136***     0.117***     0.228***          0.095***     0.049***
FoF Trait (AR1)                         (0.001)      (0.001)      (0.001)      (0.001)           (0.001)      (0.001)
In-Degree Splines
bs(log(Authority))1                     0.118***     0.029        0.107***     0.107***          0.002        0.01
                                        (0.017)      (0.035)      (0.017)      (0.028)           (0.022)      (0.007)
bs(log(Authority))2                     0.170***     2.149***     0.294***     0.564***          0.008        0.113***
                                        (0.046)      (0.097)      (0.045)      (0.077)           (0.06)       (0.019)
bs(log(Authority))3                     0.414***     9.461***     1.038***     1.861***          1.822***     0.552***
                                        (0.082)      (0.172)      (0.081)      (0.138)           (0.107)      (0.033)
Out-Degree Splines
bs(log(Hubs))1                          0.165***     1.401***     0.022        0.191***          0.006        0.104***
                                        (0.015)      (0.034)      (0.015)      (0.026)           (0.02)       (0.006)
bs(log(Hubs))2                          0.684***     4.511***     0.358***     0.136***          0.524***     0.358***
                                        (0.025)      (0.053)      (0.025)      (0.042)           (0.033)      (0.01)
bs(log(Hubs))3                          2.871***     3.207***     1.901***     2.378***          3.921***     0.877***
                                        (0.032)      (0.068)      (0.032)      (0.054)           (0.042)      (0.013)
Constant                                0.061***     0.016        0.040***     0.155***          0.081***     0.013***
                                        (0.005)      (0.011)      (0.005)      (0.008)           (0.006)      (0.002)
Observations                            49,424       49,424       49,424       49,424            49,424       49,424
R2                                      0.469        0.733        0.344        0.509             0.471        0.183
Adjusted R2                             0.469        0.733        0.344        0.508             0.471        0.182
Residual Std. Error (df = 49416)        0.321        0.678        0.318        0.541             0.422        0.131
F Statistic (df = 7; 49416)             6,228.6***   19,411.0***  3,696.0***   7,305.4***        6,291.7***   1,576.2***
Note: Auto-regressive network estimates of trait counts. Standard errors in parentheses.

Results of the network auto-regressive models are presented in Table 1. As
expected, the auto-regressive term is both positive and smaller than one, indicating that
higher trait counts in adjacent nodes increase the observed prevalence of a hashtag but
also that the effect of adjacent nodes declines rapidly for second-order adjacent nodes and
higher. The estimates of the auto-regressive term range from a minimum of 0.049 to a
maximum of 0.228, which indicates that the effect of second-order adjacency falls rather
quickly towards zero (e.g., this is the case even for the hashtag with the strongest local
network effect, #NaoVaiTerGolpe: 0.228^2 = 0.051984).
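The same arithmetic can be applied to every hashtag. The short sketch below squares the AR(1) coefficients reported in Table 1 to obtain the implied second-order effects; the vector names `ar1` and `second.order` are introduced here only for illustration.

```r
# AR(1) coefficients as reported in Table 1
ar1 <- c(ForaDilma = 0.143, Impeach = 0.136, ForaCunha = 0.117,
         NaoVaiTerGolpe = 0.228, VemPraRua = 0.095, OcupaBrasilia = 0.049)

# Implied second-order effect: the first-order coefficient squared
second.order <- ar1^2
round(second.order, 6)
```

Every implied second-order effect is below 0.06, so even the strongest local effect has largely dissipated two steps away from a node.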

The local effect captured by the auto-regressive term is strongest for the
#NaoVaiTerGolpe hashtag promoted by government allies. This coefficient conforms to
visual inspection of Figure 2, where the #NaoVaiTerGolpe hashtag is compactly
described by a grey region that maps onto the accounts of government allies. By contrast,
hashtags that propagated over a larger region of the network (e.g., #Impeach) display
weaker local effects (0.136).

A more interesting and counter-intuitive result emerges when considering the
effect of higher in-degree (authority) and out-degree (hub) nodes. Hashtags such as
#NaoVaiTerGolpe and #Impeach were more readily propagated through authority
nodes (government, media), while #ForaDilma was more readily propagated through very
active hubs. Results of the auto-regressive models show that #ForaDilma spread more
rapidly through very active opposition users; that #Impeach spread through the highest
degree nodes (such as the media); and that #NaoVaiTerGolpe spread more readily through
active authority nodes belonging to politicians.

Figure 8: The Propagation of Hashtags through in-degree (authorities) and out-degree (hubs) nodes

Note: Lines describe the effect of in-degree and out-degree values on the rate of propagation of hashtags (LN).

To facilitate the interpretation of the results, Figure 8 provides the expected
increase in the prevalence rate of hashtags for nodes of different in- and out-degrees. As
the in-degree and out-degree of a node increase, we observe a higher
prevalence of each of the hashtags. However, the prevalence
of the hashtag #NaoVaiTerGolpe increases rapidly with in-degree scores but more
moderately with out-degree scores. Indeed, the hashtag #NaoVaiTerGolpe
spreads through a compact community of users that is connected to a few high in-degree
authorities that spread their political messages. By contrast, the hashtag #ForaDilma
spreads through high out-degree users (hubs) rather than authorities. In other words, the
#ForaDilma hashtag spreads through a distributed strategy of very active nodes that are
coordinating their messages.
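Expected-prevalence curves of this kind can be obtained from a fitted spline model by predicting over a grid of degree values. The sketch below is a hedged illustration on simulated data: the objects `nodes`, `fit` and `grid` are hypothetical, the network lag is left out for brevity, and holding the other covariate at its median is an assumption about how such curves might be drawn rather than a description of how Figure 8 was produced.

```r
library(splines)

set.seed(2)
n <- 400

# Simulated stand-ins for node scores (hypothetical, not the #Dilma data)
in.degree  <- sample(1:200, n, replace = TRUE)
out.degree <- sample(1:200, n, replace = TRUE)
trait <- 0.5 * log(in.degree) + 0.1 * log(out.degree) + rnorm(n, sd = 0.2)
nodes <- data.frame(trait, in.degree, out.degree)

fit <- lm(trait ~ bs(log(in.degree), df = 3) + bs(log(out.degree), df = 3),
          data = nodes)

# Vary in-degree over its observed range, hold out-degree at its median
grid <- data.frame(
  in.degree  = seq(min(in.degree), max(in.degree), length.out = 50),
  out.degree = median(out.degree)
)
grid$expected <- predict(fit, newdata = grid)

head(grid)
```

Calling `plot(grid$in.degree, grid$expected, type = "l")` would draw the curve for in-degree; swapping the roles of the two covariates gives the corresponding curve for out-degree.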

6. Discussion: Distributed we Conquer

The analyses presented in this paper provide a clear view of the communication
strategies of the communities featured in the #Dilma crisis. Our results show that
pro-Dilma messages spread to a compact community of users that was vertically connected to
important political figures. The message was concentrated rather than distributed and
was promoted by nodes with high in-degree scores. For this reason, pro-Dilma messages
spread to a smaller set of users with lower out-degree reach. By contrast, our results show
that the anti-Dilma message spread through a distributed network of very active hubs that
re-tweeted each other's messages. The list of the most active hubs and authorities in the
appendix to this article provides evidence that two particular communities of opposition
users were most active in the propagation of these messages, closely linked to the
magazine Veja and the newspaper O Globo.

Our strategy to map the propagation of hashtags in social networks provides new
insights that facilitate the interpretation of social network behavior. As with simpler
models, we can immediately see that #Impeach, #ForaDilma, and #NaoVaiTerGolpe
appear more frequently in the social network. More importantly, however, our strategy
allows us to rank the propagation of traits, to measure the extent to which different traits
are connected to each other, and to describe whether hashtags
propagate from below or from above. Results from our analyses show that hashtags
such as #NaoVaiTerGolpe propagate in a more compact region of the network space,
display stronger local effects, and were more intensely communicated by political
authorities in the network. Our analyses also show that #Impeach displayed weaker local
effects and propagated through a distributed network of nodes with high out-degree
activity.

As social network data becomes available, devising strategies to understand and
measure the propagation of political discourses becomes more pressing. The GFP
approach described in this paper gives researchers a way to extract
theoretically relevant information and to better describe the evolution of social conflict
in social media.

References

Ames, Barry. 1995. "Electoral Strategy Under Open-List Proportional Representation."
American Journal of Political Science no. 39 (2):406-28.

Anselin, L., & Rey, S. (1991). Properties of tests for spatial dependence in linear
regression models. Geographical Analysis, 23(2), 112-131.

Barberá, P. (2015). Birds of the Same Feather Tweet Together: Bayesian Ideal Point
Estimation Using Twitter Data. Political Analysis, 23(1), 76-91.

Barberá, P., Wang, N., Bonneau, R., Jost, J. T., Nagler, J., Tucker, J., & González-Bailón,
S. (2015). The Critical Periphery in the Growth of Social Protests. PloS one, 10(11),
e0143611.

Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A., & Bonneau, R. (2015). Tweeting From
Left to Right: Is Online Political Communication More Than an Echo Chamber?
Psychological Science.

Calvo, E. (2015). Anatomía política de twitter en Argentina: Tuiteando #Nisman.
Buenos Aires, Argentina: Capital Intelectual.

Cao, J., Duan, D., Yang, L., Zhang, Q., Wang, S., & Wang, F. (2016). Social influence
analysis in the big data era: a review. In S. Cui, A. O. Hero, Z.-Q. Luo & J. M. Moura
(Eds.), Big Data over Networks: Cambridge University Press.

Conover, Michael, et al. "Political Polarization on Twitter." ICWSM 133 (2011): 89-96.

Cui, S., Hero, A. O., Luo, Z.-Q., & Moura, J. M. (2016). Big Data over Networks:
Cambridge University Press.

Evans, Heather K., Victoria Cordova, and Savannah Sipole. "Twitter style: An analysis of
how house candidates used Twitter in their 2012 campaigns." PS: Political Science &
Politics 47.02 (2014): 454-462.

Feld, S. L. (1991). Why your friends have more friends than you do. American journal of
sociology, 1464-1477.

Fotouhi, Babak, Naghmeh Momeni, and Michael G. Rabbat. "Generalized Friendship
Paradox: An Analytical Approach." Social Informatics. Springer International
Publishing, 2014. 339-352.

González-Bailón, S., Borge-Holthoefer, J., Rivero, A., & Moreno, Y. (2011). The
dynamics of protest recruitment through an online network. Scientific reports, 1.

Himelboim, Itai, Stephen McCreery, and Marc Smith. "Birds of a feather tweet together:
Integrating network and content analyses to examine cross-ideology exposure on
Twitter." Journal of Computer-Mediated Communication 18.2 (2013): 40-60.

Hunter, D. R., Handcock, M. S., Butts, C. T., Goodreau, S. M., & Morris, M. (2008).
ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for
Networks. Journal of Statistical Software, 24(3), 1-28.

Hunter, D. R., & Handcock, M. S. (2012). Inference in curved exponential family models
for networks. Journal of Computational and Graphical Statistics.

Jo, H.-H., & Eom, Y.-H. (2014). Generalized friendship paradox in networks with
tunable degree-attribute correlation. Physical Review E, 90(2), 022809.

Kempe, D., Kleinberg, J., & Tardos, É. (2003). Maximizing the spread of influence
through a social network. Paper presented at the Proceedings of the ninth ACM
SIGKDD international conference on Knowledge discovery and data mining.

Lamounier, Bolivar. 1987. "Perspectivas da Consolidação Democrática: o caso
brasileiro." Revista Brasileira de Ciências Sociais no. 4 (2):43-64.

Lawyer, G. (2015). Understanding the influence of all nodes in a network. Scientific
reports, 5.

Limongi, F. (2007). Democracy in Brazil: presidentialism, party coalitions and the
decision making process. Novos Estudos-CEBRAP, 3(SE), 0-0.

Mainwaring, Scott. 1991. "Politicians, Parties, and Electoral Systems: Brazil in
Comparative Perspective." Comparative Politics no. 24 (1):21-43.

Pereira, C., Power, T. J., & Rennó, L. (2005). Under what conditions do presidents
resort to decree power? Theory and evidence from the Brazilian case. Journal of
Politics, 67(1), 178-200.

Samuels, D. (2008). Brazilian Democracy under Lula and the PT. Constructing
Democratic Governance in Latin America. Baltimore: Johns Hopkins University
Press, 2008b.

Summers, E. (2016). Twarc. https://github.com/edsu/twarc. DOI:
10.5281/zenodo.17385.

Sunstein, Cass R. Republic.com 2.0. Princeton University Press, 2009.

Tremayne, Mark. "Anatomy of protest in the digital era: A network analysis of Twitter
and Occupy Wall Street." Social movement studies 13.1 (2014): 110-126.

Tucker, J. A., Nagler, J., MacDuffee, M., Metzger, P. B., Penfold-Brown, D., & Bonneau,
R. (2016). Big Data, Social Media, and Protest. Computational Social Science, 199.

Wilson, R. M. (2010). Using the friendship paradox to sample a social network. Physics
Today, 63(11), 15.

Appendix A: Figures 6 and 7 Walkthrough

Below we describe the code used in Figures 6 and 7 of our article "Hashtags that Matter". To
measure the propagation of different hashtags using the Friends of Friends information, we
compute the count of a trait for each node in the network and the mean count of the same
trait among contiguous nodes. Calculating the GFP and the diffusion of traits
across the network follows the process outlined in Section 5 ("Mapping Friends of Friends in
#Dilma"). That is, we:
1. Create an adjacency list of neighboring nodes, counting the occurrences of traits (e.g., hashtags);
2. Calculate the average prevalence of the trait among adjacent nodes (my friends) and the
average prevalence within the adjacent nodes (friends of friends).
The above yields a count of trait propagation across the network. Functionally, this process takes
the following form:
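library(igraph)  # provides E()

Fof.tag <- function(tags) {
  # flag the edges whose tweet text mentions the trait
  temp <- grepl(tags, E(net)$text, ignore.case = TRUE)
  aa <- rep(0, length(temp))
  aa[temp == TRUE] <- 1
  # count of the trait over the edges listed for each node (my friends)
  stemp <- sapply(el, function(x) sum(aa[x]))
  # mean of those counts across each node's adjacent nodes (friends of friends)
  mstemp <- sapply(al, function(x) mean(stemp[x]))
  result <- as.data.frame(list("friends" = stemp, "friends.of.friends" = mstemp))
  return(result)
}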
where net is the network object rendered by the igraph package (see footnote 7), and el and al
are the edge list and adjacency list.
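To make the two counts concrete, the following sketch reproduces the same steps on a four-node toy network using base R only. The data frame `edges`, its tweet texts, and the way `el` and `al` are built here are assumptions made for illustration (each element of `el` holds the indices of the edges attached to a node, and each element of `al` holds the indices of its adjacent nodes); the article does not show how these objects were constructed.

```r
# Toy retweet network: 4 users, 4 edges (hypothetical data)
edges <- data.frame(
  from = c(1, 1, 2, 3),
  to   = c(2, 3, 3, 4),
  text = c("#ForaDilma ja", "bom dia", "FORADILMA!", "#foradilma"),
  stringsAsFactors = FALSE
)
n.nodes <- 4

# el: for each node, the indices of the edges attached to it
el <- lapply(seq_len(n.nodes), function(v) which(edges$from == v | edges$to == v))
# al: for each node, the indices of its adjacent nodes
al <- lapply(seq_len(n.nodes), function(v)
  unique(c(edges$to[edges$from == v], edges$from[edges$to == v])))

# Per-edge indicator of the trait, then node counts and neighbor means
aa     <- as.numeric(grepl("ForaD", edges$text, ignore.case = TRUE))
stemp  <- sapply(el, function(x) sum(aa[x]))
mstemp <- sapply(al, function(x) mean(stemp[x]))

data.frame(friends = stemp, friends.of.friends = mstemp)
```

In this toy network node 4 has a count of 1 while its only neighbor has a count of 2, so the trait is more prevalent among its friends than in its own activity.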
Using this function, we cycle through different traits of interest (hashtags) and store the
adjacency matrix for each as a list.
tags <- c("ForaD", "impeach", "ForaCunha",
          "NaoVaiTerGol", "VemPraRua", "OcupaBrasilia")
tags.list <- list()
for (i in seq_along(tags)) {
  tags.list[[i]] <- Fof.tag(tags[i])
}
names(tags.list) <- tags
Once the adjacency matrix is calculated, generating the 3D scatter plots is straightforward. We
rely on the scatterplot3d package (see footnote 8) to generate the graphical representation of
trait diffusion. The code below cycles through the adjacency matrices, rendering three separate
graphs as output: a baseline graph where the z dimension is set to zero, a first-order graph
where the z dimension describes the counts of the traits, and a second-order graph where the z
dimension captures the contiguity counts of the traits among adjacent nodes. Saving the output
follows standard procedures in the R 3.2.3 environment.

7 Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal,
Complex Systems 1695. 2006. http://igraph.org
8 Ligges, U. and Mächler, M. (2003). Scatterplot3d - an R Package for Visualizing Multivariate
Data. Journal of Statistical Software 8(11), 1-20.

library(scatterplot3d)

# l (a two-column matrix of node coordinates), new.color, xlim, ylim and
# f.path are assumed to be defined earlier in the session.
for (i in seq_along(tags.list)) {
  tag.name <- names(tags.list[i])
  tt <- tags.list[[i]]
  cairo_pdf(file = f.path, height = 5, width = 10)
  par(mfrow = c(1, 3))
  # Baseline 3D plot
  scatterplot3d(x = as.matrix(l[, 1]), type = "h",
                y = as.matrix(l[, 2]),
                z = rep(0, nrow(l)), pch = 16, color = new.color,
                ylim = ylim, xlim = xlim, zlim = c(0, .1), box = FALSE, axis = TRUE,
                x.ticklabs = "", y.ticklabs = "", z.ticklabs = "",
                xlab = "", ylab = "", zlab = "")
  # My Friends
  scatterplot3d(x = as.matrix(l[, 1]), type = "h",
                y = as.matrix(l[, 2]),
                z = as.matrix(log(tt[, 1] + 1)), pch = 16, color = new.color,
                ylim = ylim, xlim = xlim, box = FALSE,
                zlab = "ln(My Friends)", xlab = "", ylab = "", axis = TRUE,
                x.ticklabs = "", y.ticklabs = "")
  # Friends of Friends
  scatterplot3d(x = as.matrix(l[, 1]), type = "h",
                y = as.matrix(l[, 2]),
                z = as.matrix(log(tt[, 2] + 1)), pch = 16, color = new.color,
                ylim = ylim, xlim = xlim, box = FALSE,
                zlab = "ln(Friends of Friends)", xlab = "", ylab = "", axis = TRUE,
                x.ticklabs = "", y.ticklabs = "")
  dev.off()
}

Appendix B: Figures 2 and 3 Walkthrough

Below we describe the code used in Figures 2 and 3 of our article "Hashtags that Matter". To
measure the propagation of different hashtags using the Friends of Friends information, we
take as input the count of a trait for each node in the network and the mean count of the
same trait among contiguous nodes. For a given trait, we fit a locally weighted
regression curve to find the peak propagation. The peak is the maximum attribute value
among my friends at which the friends of my friends still have more of a given trait. In other
words, it is the peak "friendship holding" point.

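diffs <- function(fof, f) {
  # smooth the gap between friends of friends and friends against friends
  diff.reg <- lowess((fof - f) ~ f)
  pos <- which(diff.reg$x > 0)
  # x value (among x > 0) at which the smoothed gap is closest to zero
  pos.root <- diff.reg[[1]][pos][which.min(abs(diff.reg[[2]][pos]))]
  return(list("diff.reg" = diff.reg, "pos.root" = pos.root))
}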
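A quick way to see what the root-finding step returns is to run it on simulated counts whose crossing point is known in advance. The sketch below is illustrative only: the vectors `f` and `fof` are simulated so that the gap between friends of friends and friends crosses zero at f = 6, and the names `smooth` and `root` are hypothetical.

```r
set.seed(3)

# Simulated node counts (hypothetical): f = my friends, fof = friends of friends
f <- runif(300, min = 0, max = 10)
# By construction the gap fof - f falls linearly and crosses zero at f = 6
fof <- f + (3 - 0.5 * f) + rnorm(300, sd = 0.2)

smooth <- lowess(f, fof - f)     # sorted x values and smoothed gap
pos    <- which(smooth$x > 0)
root   <- smooth$x[pos][which.min(abs(smooth$y[pos]))]
root
```

The value returned is the observed x at which the smoothed curve is closest to zero, so it lands near 6 rather than exactly on it.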
For plotting purposes, we add 1 to each observation and take the natural log. The "distance" is
the ratio of the peak value for a given attribute relative to the maximum possible value for the
network.

plot.lowess <- function(attr.list, net.list) {
  # remove the missing observations, add one and take the natural log
  nonmissing <- which(!is.na(attr.list$friends.of.friends))
  stemp <- log(attr.list$friends[nonmissing] + 1)
  mstemp <- log(attr.list$friends.of.friends[nonmissing] + 1)
  friends <- log(net.list$friends[nonmissing] + 1)
  friends.of.friends <- log(net.list$friends.of.friends[nonmissing] + 1)

  # create plot (lower y limit assumed negative so that the zero line is visible)
  plot(stemp, mstemp - stemp, pch = 16, cex = .2, col = gray(.5), xlim = c(0, 11),
       ylim = c(-4, 7), xlab = "LN(Friends)",
       ylab = "LN(Friends of Friends - Friends)")
  abline(h = 0, lty = 2, col = 2)

  # run model for attributes
  difference <- diffs(mstemp, stemp)
  # run model for network
  differences <- diffs(friends.of.friends, friends)

  # add the LOWESS curves for the network and the attribute to the plot
  lines(differences$diff.reg, col = 'purple', lty = 6)
  lines(difference$diff.reg, col = 'blue', lty = 6)

  # plot guidelines showing where LOWESS crosses the 0 line
  segments(difference$pos.root, 5, difference$pos.root, 0, lty = 2)
  segments(differences$pos.root, 5, differences$pos.root, 0, lty = 2)

  # distance between the attribute max and the network max
  root.dist <- differences$pos.root - difference$pos.root
  score <- 1 - (root.dist / differences$pos.root)
  text(Re(difference$pos.root + 1), .3,
       paste("distance=", round(root.dist, digits = 3)))
  text(8, 6, paste("score=", round(score, digits = 3)))
}
Example using the "ForaD" tag:

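plot.lowess(ForaD, network)
title("#ForaD")
# labels and arrows below the zero line mark contraction, above it expansion
# (the negative signs are assumed)
text(10, -.5, paste("contracting"))
text(10, .5, paste("expanding"))
arrows(9, -.3, 9, -2, length = .1)
arrows(9, .3, 9, 2, length = .1)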
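As a worked illustration of the distance and score annotations, the sketch below uses two made-up crossing points. It assumes that the distance is the network root minus the trait root and that the score is one minus the ratio of that distance to the network root, which makes the score equal to the trait root divided by the network root; the values 8 and 6 are hypothetical.

```r
# Hypothetical zero-crossing points on the LN(Friends) axis
network.root <- 8.0   # where the curve for all network activity crosses zero
trait.root   <- 6.0   # where the curve for one hashtag crosses zero

root.dist <- network.root - trait.root        # distance between the two peaks
score     <- 1 - root.dist / network.root     # same as trait.root / network.root

c(distance = root.dist, score = score)
```

With these values the distance is 2 and the score is 0.75, meaning the hashtag keeps expanding up to three quarters of the way to the network-wide peak.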

Appendix C: Descriptive counts of top Authorities and Hubs in #Dilma

Figure A.1: Number of Retweets for top Dilma Authorities

Figure A.2: Number of Retweets by top Dilma Hubs
