
twitter explorer

Table of Contents

Introduction

twitter explorer

The twitter explorer is an open framework consisting of three components. The collector (left), once the credentials are set up, connects to the Twitter Search API and saves the collected tweets as CSV according to the twitwi standard. The tweets are then passed on to the visualizer (middle), where the user can get an overview of the content and create retweet and hashtag networks. The interactive networks are generated as HTML files that can be explored in the web browser. The modular structure of the three components facilitates the development of new features, such as those suggested by the light grey boxes in the diagram.

Installation

Unix / macOS

The twitter explorer requires Python ≥ 3.6 to run. You most likely already have Python installed. To check which Python version you have, open a terminal window and type:

python -V
OR
python3 -V

If your version is above 3.6, continue to the next step. Otherwise, please refer to the guides specific to your operating system to install Python ≥ 3.6.

Download the current release of the twitter explorer and extract it. Open a terminal and change to the folder to which you downloaded the twitter explorer, replacing XXX by the release number:

cd ~/Downloads/twitter-explorer-vXXX

Now run the following command to install the necessary Python libraries (use pip3 if you used python3 before):

pip install -r requirements.txt

You can now start the collector from within the same terminal window:

streamlit run collector.py

You should see an error message that tells you to authenticate with your Twitter Developer credentials. Move on to the next section to generate the necessary keys.

To close the streamlit interface, hit CTRL + C in the terminal.

Windows

Download and install Python 3.8.2 from here, making sure to tick the option of adding Python to your PATH variable.

Download the current release of the twitter explorer to your Desktop folder and extract it.

Open a Powershell (hit the Windows key ❖ and start typing "power" until you see the "Powershell" icon and click it).

Type in the following command in the Powershell to go to the twitter explorer directory, followed by ENTER ↵, replacing XXX by the current release number:

cd .\Desktop\twitter-explorer-vXXX

Now, type in the following command to install the necessary packages, followed by ENTER ↵:

pip3 install -r requirements.txt

After a while, all packages should be installed and you can start the collector with

streamlit run collector.py

To close the streamlit interface, hit CTRL + C in the Powershell.

Data collection

Authentication

To use the collector, you need to apply for a Twitter Developer Account. Follow the instructions here to generate your access tokens.

API v2: Create a new file in the twitter explorer folder called twitter_bearertoken.txt with the following content:

# bearer_token
<insert bearer_token here>

API v1.1: Create a new file in the twitter explorer folder called twitter_apikeys.txt with the following content:

# api_key
<insert api_key here>
# api_secret_key
<insert api_secret_key here>

The twitter explorer is now ready to connect to the API using OAuth 2.0.
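The key files above follow a simple "# name / value" line layout. As an illustration only (the twitter explorer's own parser may work differently, and `parse_keyfile` is a hypothetical name), such a file could be read like this:

```python
def parse_keyfile(lines):
    """Map each '# name' comment line to the value on the following line."""
    keys = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            name = line.lstrip("# ").strip()
        elif line and name:
            keys[name] = line
            name = None
    return keys

# Example mirroring the twitter_apikeys.txt layout shown above
example = ["# api_key", "ABC123", "# api_secret_key", "DEF456"]
print(parse_keyfile(example))  # {'api_key': 'ABC123', 'api_secret_key': 'DEF456'}
```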

Collecting tweets

The collector connects to the Twitter Search API, which allows users to collect tweets from the last 7 days based on an advanced search. Please refer to @igorbrigadir's documentation of the Twitter Advanced Search or try it out in the browser to get a feeling for the possible options.

Open a terminal, change to the twitter explorer folder, and start the data collector by typing:

streamlit run collector.py

The collector interface will open in your browser. You can start a search based on a keyword. The tweets will be downloaded and continuously written into a new CSV file in ./data/{currentdate_keyword}.csv. Note that the free Search API is rate-limited. When the twitter explorer reaches a rate limit, it sleeps for 15 minutes and then continues the search. In practice, this results in roughly 7,500 tweets per 15 minutes. Also, keep in mind the following statement about the Twitter Search API:

Please note that Twitter's search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.
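The sleep-and-retry behaviour described above can be sketched as follows. This is not the collector's actual code, just a minimal illustration of backing off for the 15-minute rate-limit window on an HTTP 429 response:

```python
import time

# Twitter's standard Search API rate limits reset every 15 minutes
RATE_LIMIT_WINDOW = 15 * 60  # seconds

def handle_rate_limit(status_code, sleep=time.sleep):
    """On HTTP 429 (rate limit hit), wait out the window and
    signal the caller to retry; otherwise proceed normally."""
    if status_code == 429:
        sleep(RATE_LIMIT_WINDOW)
        return True   # retry the request
    return False      # no rate limit, carry on

# The sleep function is injectable so the logic can be demonstrated
# without actually waiting 15 minutes:
waits = []
print(handle_rate_limit(429, sleep=waits.append))  # True
print(waits)  # [900]
```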

Data transformation

Start the visualizer, which will open the second interface in a browser window:

streamlit run visualizer.py

You can select a previously collected dataset for further analysis from a drop-down menu. If you have your own Twitter dataset, please convert it to the twitwi CSV format and copy it to the ./data folder.

The visualizer will create a new folder for every collection you make in the output folder. Refer to File structure for a detailed list of files generated by the twitter explorer.

Timeline of tweets

As a first step, the visualizer creates a time series showing the number of tweets in the dataset over time.
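The kind of binning behind such a timeline can be sketched with the standard library (illustrative only; the visualizer may bin and plot differently, and `tweets_per_hour` is a hypothetical name):

```python
from collections import Counter
from datetime import datetime

def tweets_per_hour(timestamps):
    """Count tweets per hour bucket by truncating each timestamp."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )

ts = [
    datetime(2020, 5, 1, 10, 5),
    datetime(2020, 5, 1, 10, 40),
    datetime(2020, 5, 1, 11, 2),
]
# Two tweets fall into the 10:00 bucket, one into the 11:00 bucket
print(sorted(tweets_per_hour(ts).items()))
```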

Interaction networks

The twitter explorer can generate different types of interaction networks (retweet, mention, quote, reply) in which nodes are users. A link is drawn from node i to node j if i interacts with j. The following graph operations are available:

Giant Component

When enabled, the graph will be reduced to its largest connected component.
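Conceptually, this reduction keeps only the largest set of nodes that are mutually reachable (treating links as undirected). A minimal sketch with a BFS over an edge list (the tool itself uses a graph library; `giant_component` is a hypothetical name):

```python
from collections import deque

def giant_component(edges):
    """Return the node set of the largest connected component,
    treating the edge list as undirected."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        if len(comp) > len(best):
            best = comp
    return best

edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(sorted(giant_component(edges)))  # ['a', 'b', 'c']
```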

Aggregation methods

  • "Soft" aggregation Removes all users that are never interacted with and only interact with one other user (and can therefore not be bridges in the network)

  • "Hard" aggregation Removes all users from the network that are interacted with less than t times.

Privacy option

Removes all accessible metadata of users that have fewer than 5000 followers (i.e. are not public figures) from the interactive visualization in order to comply with current privacy standards. The nodes remain visible and their links are taken into account, but the users cannot be personally identified in the interface.

Community detection

The twitter explorer currently supports Louvain [1] and Leiden [2] algorithms for community detection. The community assignments are saved as node metadata. Note that these community detection algorithms do not take into account link direction.
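Because direction is ignored, a directed interaction network is effectively symmetrized before community detection. The projection step can be illustrated like this (a sketch only; `undirected_projection` is a hypothetical name, and the actual Louvain/Leiden algorithms are provided by dedicated libraries):

```python
def undirected_projection(edges):
    """Collapse a directed edge list to undirected links:
    (i, j) and (j, i) become the same link, self-loops are dropped."""
    return {frozenset((u, v)) for u, v in edges if u != v}

edges = [("a", "b"), ("b", "a"), ("b", "c")]
# (a, b) and (b, a) merge into one undirected link
print(len(undirected_projection(edges)))  # 2
```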

Hashtag networks

The twitter explorer can generate hashtag networks in which nodes are hashtags. A link is drawn between node i and j if i and j appear in the same tweet. The following methods are available:
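The co-occurrence construction can be sketched as counting hashtag pairs per tweet (illustrative only; `hashtag_edges` is a hypothetical name and the tool's exact weighting may differ):

```python
from itertools import combinations
from collections import Counter

def hashtag_edges(tweets):
    """Build weighted co-occurrence links: hashtags i and j are
    linked whenever they appear together in the same tweet."""
    weights = Counter()
    for tags in tweets:
        # Deduplicate and sort so each unordered pair is counted once
        for i, j in combinations(sorted(set(tags)), 2):
            weights[(i, j)] += 1
    return weights

tweets = [["climate", "science"], ["climate", "science", "ipcc"]]
print(hashtag_edges(tweets)[("climate", "science")])  # 2
```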

Giant Component

When enabled, the graph will be reduced to its largest connected component.

Community detection

The twitter explorer currently supports Louvain [1] community detection for hashtag networks.

Clustergraphs

If community detection is enabled, clustergraphs will be generated for both retweet and hashtag networks, in which nodes are communities and links are weighted according to the cumulative links between users of the communities.
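The aggregation behind a clustergraph can be sketched as summing user-level links per community pair (a minimal illustration, not the tool's code; `cluster_graph` is a hypothetical name, and within-community links are simply skipped here):

```python
from collections import Counter

def cluster_graph(edges, community):
    """Collapse a user network into a clustergraph: nodes are
    communities, link weights count the user-level links between them."""
    weights = Counter()
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu != cv:
            weights[tuple(sorted((cu, cv)))] += 1
    return weights

community = {"a": 0, "b": 0, "c": 1}
edges = [("a", "b"), ("a", "c"), ("b", "c")]
# (a, b) is within community 0; the two cross-links are aggregated
print(cluster_graph(edges, community))  # Counter({(0, 1): 2})
```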

Export options


Its modular structure (division into collector/visualizer/explorer) and the ability to export the data make the tool compatible with a variety of other data analysis tools. Both interaction and hashtag networks are saved as edgelists (.csv), GML (.gml) and GraphViz DOT (.gv).
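The CSV edgelist export can be sketched with the standard library (column names here are illustrative, not necessarily the tool's exact schema; `write_edgelist` is a hypothetical name):

```python
import csv
import io

def write_edgelist(edges, fh):
    """Write a weighted edge list as CSV: one header row,
    then one (source, target, weight) row per link."""
    writer = csv.writer(fh)
    writer.writerow(["source", "target", "weight"])
    for (u, v), w in edges.items():
        writer.writerow([u, v, w])

buf = io.StringIO()
write_edgelist({("a", "b"): 3}, buf)
print(buf.getvalue().strip().splitlines())
```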

File structure

A summary of the file structure is found below:

COLLECTED DATA (created by the collector)
./data/
./data/{date}_tweets_{keyword}.csv <-- collected dataset

INTERACTIVE NETWORKS (created by the visualizer)
./output/

./output/{date}_{keyword}/{date}_{keyword}_{interaction_type}.html <-- interaction network
./output/{date}_{keyword}/{date}_{keyword}_HTN.html <-- hashtag network
./output/{date}_{keyword}/{date}_{keyword}_{interaction_type}_CG_{comdec_method}.html <-- interaction network clustergraph
./output/{date}_{keyword}/{date}_{keyword}_HTN_CG_{comdec_method}.html <-- hashtag network clustergraph

EXPORTED NETWORKS (created by the visualizer)
./output/{date}_{keyword}/export/
./output/{date}_{keyword}/export/{interaction_type}.csv <-- interaction network as edgelist
./output/{date}_{keyword}/export/{interaction_type}.gml <-- interaction network as gml
./output/{date}_{keyword}/export/{interaction_type}.gv  <-- interaction network as dot for graphviz
./output/{date}_{keyword}/export/HTN.csv <-- hashtag network as edgelist
./output/{date}_{keyword}/export/HTN.gml <-- hashtag network as gml
./output/{date}_{keyword}/export/HTN.gv  <-- hashtag network as dot for graphviz

Network exploration

Open the generated html files to explore the generated networks (we recommend using the latest version of Firefox for full feature support). The command palette on the left displays information about the network and can be interacted with. Currently, the following features are implemented:

  • show information about the dataset
  • show number of nodes and links
  • recolor nodes according to community assignment
  • change node size according to metadata values
  • change node scaling
  • display user metadata on click
  • search for users / hashtags
  • show user tweets in dataset
  • show current user timeline
  • take a screenshot of the current graph view
  • export the graph as GML for Gephi
  • export the user metadata as a CSV

References

[1] Blondel, Vincent D., et al. "Fast unfolding of communities in large networks." Journal of statistical mechanics: theory and experiment 2008.10 (2008): P10008.
[2] Traag, Vincent A., Ludo Waltman, and Nees Jan Van Eck. "From Louvain to Leiden: guaranteeing well-connected communities." Scientific reports 9.1 (2019): 1-12.