- Table of Contents
- Introduction
- Installation
- Data collection
- Data transformation
- Network exploration
- References
The twitter explorer is an open framework that consists of three components:
The collector (left), once the credentials are set up, connects to the Twitter Search API and saves the collected tweets as CSV files following the twitwi standard. They are then passed on to the visualizer (middle), where the user can get an overview of the content and then create retweet and hashtag networks. The interactive networks are generated as HTML files that can be explored in the web browser. The modular structure of the three components facilitates the development of new features, which are suggested by the light grey boxes.
The twitter explorer requires Python ≥ 3.6 to run. You most likely already have Python installed. To check which Python version you have, open a terminal window and type:
```
python -V
```

or

```
python3 -V
```

If your version is 3.6 or higher, continue to the next step. Otherwise, please refer to the guides specific to your operating system to install Python ≥ 3.6.
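Equivalently, you can run the version check from within Python itself; a minimal sketch:

```python
import sys

def python_ok(version_info=sys.version_info, minimum=(3, 6)):
    """Return True if the interpreter satisfies the minimum version."""
    return tuple(version_info[:2]) >= minimum

# The twitter explorer requires Python >= 3.6.
if not python_ok():
    raise SystemExit("twitter explorer requires Python >= 3.6")
```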
Download the current release of the twitter explorer and extract it. Open a terminal and change to the folder to which you downloaded the twitter explorer, replacing XXX by the release number:
```
cd ~/Downloads/twitter-explorer-vXXX
```

Now run the following command to install the necessary Python libraries (use `pip3` if you used `python3` before):

```
pip install -r requirements.txt
```

You can now start the collector from within the same terminal window:

```
streamlit run collector.py
```

You should see an error message that tells you to authenticate with your Twitter Developer credentials. Move on to the next section to generate the necessary keys.
To close the streamlit interface, hit CTRL + C in the terminal.
Download and install Python 3.8.2 from here, making sure to tick the option of adding Python to your PATH variable.
Download the current release of the twitter explorer to your Desktop folder and extract it.
Open a Powershell (hit the Windows key ❖ and start typing "power" until you see the "Powershell" icon and click it).
Type in the following command in the Powershell to go to the twitter explorer directory, followed by ENTER ↵, replacing XXX by the current release number:
```
cd .\Desktop\twitter-explorer-vXXX
```

Now, type in the following command to install the necessary packages, followed by ENTER ↵:

```
pip3 install -r requirements.txt
```

After a while, all packages should be installed and you can start the collector with:

```
streamlit run collector.py
```

To close the streamlit interface, hit CTRL + C in the Powershell.
To use the collector, you need to apply for a Twitter Developer Account. Follow the instructions here to generate your access tokens.
API v2: Create a new file in the twitter explorer folder called `twitter_bearertoken.txt` with the following content:

```
# bearer_token
<insert bearer_token here>
```

API v1.1: Create a new file in the twitter explorer folder called `twitter_apikeys.txt` with the following content:

```
# api_key
<insert api_key here>
# api_secret_key
<insert api_secret_key here>
```
The twitter explorer is now ready to connect to the API using OAuth 2.0.
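The collector reads these key files on startup. A minimal sketch of how such a file could be parsed (this is illustrative, not the collector's own code): `#`-prefixed lines label the value that follows on the next non-comment line.

```python
def parse_keyfile(lines):
    """Parse a twitter explorer key file: '#'-prefixed lines are labels,
    the remaining non-empty lines are the corresponding values."""
    values = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            values.append(line)
    return values

# Usage sketch:
# with open("twitter_bearertoken.txt", encoding="utf-8") as f:
#     bearer_token, = parse_keyfile(f)
```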
The collector connects to the Twitter Search API, which allows users to collect tweets from the last 7 days based on an advanced search. Please refer to @igorbrigadir's documentation of the Twitter Advanced Search or try it out in the browser to get a feeling for the possible options.
Change to the folder where you downloaded the twitter explorer, open a terminal and start the data collector by typing:

```
streamlit run collector.py
```
The collector interface will open in your browser. You can start a search based on a keyword. The tweets will be downloaded and continuously written to a new CSV file in `./data/{currentdate_keyword}.csv`. Note that the free Search API is rate-limited: when the twitter explorer reaches a rate limit, it sleeps for 15 minutes and then continues the search. From experience, this results in ~7500 tweets per 15 minutes.
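The collector's own retry logic is internal to the tool, but the sleep-and-continue behaviour described above can be sketched as follows (the page structure and the rate-limit signal are placeholders, not the real API responses):

```python
import time

RATE_LIMIT_SLEEP = 15 * 60  # Twitter Search API rate-limit window: 15 minutes

def collect(pages, is_rate_limited, sleep=time.sleep):
    """Iterate over result pages, pausing for one rate-limit window
    whenever the API signals that the limit has been reached."""
    collected = []
    for page in pages:
        if is_rate_limited(page):
            sleep(RATE_LIMIT_SLEEP)  # wait out the window, then continue
            continue
        collected.extend(page)
    return collected
```

Injecting `sleep` as a parameter keeps the waiting logic testable without actually pausing.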
Also, keep in mind the following statement about the Twitter Search API:
> Please note that Twitter's search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.
Start the visualizer, which will open the second interface in a browser window:
```
streamlit run visualizer.py
```
You can select a previously collected dataset for further analysis from a drop-down menu. If you have your own Twitter dataset, please convert it to the twitwi CSV format and copy it to the `./data` folder.
The visualizer will create a new folder in the output folder for every collection you make. Refer to File structure for a detailed list of the files generated by the twitter explorer.
As a first step, the visualizer creates a timeseries showing the number of tweets in the dataset over time.
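Such a timeseries can be computed from the collected CSV with the standard library alone. The sketch below assumes the rows carry a Unix-timestamp column (the column name `timestamp_utc` is an assumption, not a documented field):

```python
import csv
from collections import Counter
from datetime import datetime, timezone

def tweets_per_day(rows, time_column="timestamp_utc"):
    """Count tweets per UTC calendar day from an iterable of CSV row dicts."""
    counts = Counter()
    for row in rows:
        day = datetime.fromtimestamp(int(row[time_column]), tz=timezone.utc)
        counts[day.strftime("%Y-%m-%d")] += 1
    return counts

# Usage sketch:
# with open("data/collected_tweets.csv", newline="", encoding="utf-8") as f:
#     counts = tweets_per_day(csv.DictReader(f))
```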
The twitter explorer can generate different types of interaction networks (retweet, mention, quote, reply) in which nodes are users. A link is drawn from node i to node j if i interacts with j. The following graph operations are available:
- Largest connected component: when enabled, the graph is reduced to its largest connected component.
- "Soft" aggregation: removes all users that are never interacted with and only interact with one other user (and can therefore not be bridges in the network).
- "Hard" aggregation: removes all users from the network that are interacted with less than *t* times.
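On a plain edge list, the component reduction and one possible reading of the "hard" aggregation can be sketched with the standard library (a simplified stand-in for the tool's internal graph handling):

```python
from collections import Counter, defaultdict

def largest_connected_component(edges):
    """Return the node set of the largest (weakly) connected component."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        best = max(best, component, key=len)
    return best

def hard_aggregate(nodes, edges, t):
    """Keep only users that are interacted with (in-degree) at least t times."""
    indegree = Counter(target for _, target in edges)
    return [node for node in nodes if indegree[node] >= t]
```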
Removes all accessible metadata of users that have fewer than 5000 followers (no public figures) from the interactive visualization in order to comply with current privacy standards. The nodes are visible and their links are taken into account, but they cannot be personally identified in the interface.
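A sketch of such a privacy filter, operating on node-metadata dicts (the field names here are illustrative, not the tool's actual schema):

```python
FOLLOWER_THRESHOLD = 5000  # accounts below this are not treated as public figures

def anonymize(nodes, threshold=FOLLOWER_THRESHOLD):
    """Strip identifying metadata from non-public accounts while keeping
    the node itself (and thus its links) in the network."""
    out = []
    for node in nodes:
        if node.get("followers", 0) < threshold:
            node = {"id": node["id"]}  # keep the node, drop name/handle/etc.
        out.append(node)
    return out
```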
The twitter explorer currently supports Louvain [1] and Leiden [2] algorithms for community detection. The community assignments are saved as node metadata. Note that these community detection algorithms do not take into account link direction.
The twitter explorer can generate hashtag networks in which nodes are hashtags. A link is drawn between nodes i and j if i and j appear in the same tweet. The following methods are available:
- Largest connected component: when enabled, the graph is reduced to its largest connected component.
The twitter explorer currently supports Louvain [1] community detection for hashtag networks.
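The co-occurrence construction described above can be sketched with `itertools.combinations`, reducing tweets to plain lists of hashtags:

```python
from collections import Counter
from itertools import combinations

def hashtag_edges(tweets):
    """Build weighted co-occurrence links: one link per pair of hashtags
    appearing in the same tweet, weighted by how often the pair co-occurs."""
    weights = Counter()
    for hashtags in tweets:
        # sorted(set(...)) deduplicates within a tweet and gives a
        # canonical (a, b) ordering so both directions count as one pair
        for a, b in combinations(sorted(set(hashtags)), 2):
            weights[(a, b)] += 1
    return weights
```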
If community detection is enabled, clustergraphs will be generated for both retweet and hashtag networks, in which nodes are communities and links are weighted according to the cumulative links between users of the communities.
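Collapsing user-level links into community-level links can be sketched as follows (a minimal illustration, assuming community assignments are available as a user-to-community mapping):

```python
from collections import Counter

def cluster_graph(edges, community):
    """Aggregate a user-level edge list into community-level links whose
    weights are the cumulative number of user links between communities."""
    weights = Counter()
    for source, target in edges:
        weights[(community[source], community[target])] += 1
    return weights
```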
Its modular structure (division into collector/visualizer/explorer) and the ability to export the data make the tool compatible with a variety of other data analysis tools. Both interaction and hashtag networks are saved as edgelist (.csv), GML (.gml) and Graphviz DOT (.gv).
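The Graphviz DOT export, for instance, amounts to straightforward string serialization; a minimal sketch (not the tool's own exporter):

```python
def to_dot(edges, directed=True):
    """Serialize (source, target, weight) triples as a Graphviz DOT string."""
    kind, arrow = ("digraph", "->") if directed else ("graph", "--")
    lines = [f"{kind} G {{"]
    for source, target, weight in edges:
        lines.append(f'  "{source}" {arrow} "{target}" [weight={weight}];')
    lines.append("}")
    return "\n".join(lines)
```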
A summary of the file structure is found below:
```
COLLECTED DATA (created by the collector)
./data/
./data/{date}_tweets_{keyword}.csv  <-- collected dataset

INTERACTIVE NETWORKS (created by the visualizer)
./output/
./output/{date}_{keyword}/{date}_{keyword}_{interaction_type}.html  <-- interaction network
./output/{date}_{keyword}/{date}_{keyword}_HTN.html  <-- hashtag network
./output/{date}_{keyword}/{date}_{keyword}_{interaction_type}_CG_{comdec_method}.html  <-- interaction network clustergraph
./output/{date}_{keyword}/{date}_{keyword}_HTN_CG_{comdec_method}.html  <-- hashtag network clustergraph

EXPORTED NETWORKS (created by the visualizer)
./output/{date}_{keyword}/export/
./output/{date}_{keyword}/export/{interaction_type}.csv  <-- interaction network as edgelist
./output/{date}_{keyword}/export/{interaction_type}.gml  <-- interaction network as GML
./output/{date}_{keyword}/export/{interaction_type}.gv   <-- interaction network as DOT for Graphviz
./output/{date}_{keyword}/export/HTN.csv  <-- hashtag network as edgelist
./output/{date}_{keyword}/export/HTN.gml  <-- hashtag network as GML
./output/{date}_{keyword}/export/HTN.gv   <-- hashtag network as DOT for Graphviz
```

Open the generated html files to explore the generated networks (we recommend using the latest version of Firefox for full feature support). The command palette on the left displays information about the network and can be interacted with. Currently, the following features are implemented:
- show information about the dataset
- show number of nodes and links
- recolor nodes according to community assignment
- change node size according to metadata values
- change node scaling
- display user metadata on click
- search for users / hashtags
- show user tweets in dataset
- show current user timeline
- take a screenshot of the current graph view
- export the graph as GML for Gephi
- export the user metadata as a CSV
[1] Blondel, Vincent D., et al. "Fast unfolding of communities in large networks." Journal of statistical mechanics: theory and experiment 2008.10 (2008): P10008.
[2] Traag, Vincent A., Ludo Waltman, and Nees Jan Van Eck. "From Louvain to Leiden: guaranteeing well-connected communities." Scientific reports 9.1 (2019): 1-12.


