The main ways to access Twitter data
1. Twitter Standard APIs
Twitter's standard application programming interface (API) is the most common entry point to Twitter data. Researchers can choose from a range of publicly available APIs, which typically provide free but rate-limited access to data. Of these, the Streaming API and the Search API are two of the most common choices.
The Streaming API returns a
real-time stream of tweet data.
There are currently two sub-options to choose from: the filtered stream and the sampled stream. The filtered stream allows researchers to perform custom filtering and sifting on the millions of tweets posted at any given time; the filtering rules they develop may request up to 400 different keywords, 5,000 user identities, and 25 geographic locations. Each filtered stream connection is capped at roughly 1% of the total real-time tweet volume. For example, if a researcher uses the filtered stream to collect all tweets containing the #TwitterAPI hashtag and the number of tweets matching that hashtag stays below the cap, the researcher will receive every tweet containing that hashtag. Otherwise, the researcher obtains only a partial sample of tweets containing the #TwitterAPI hashtag.
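As a minimal sketch of how such a rule is registered (assuming the Twitter API v2 filtered stream, whose rules endpoint accepts a JSON body of rule values; the endpoint URL is real, but the tag labels and the commented-out request are illustrative placeholders):

```python
import json

# v2 endpoint for managing filtered-stream rules
STREAM_RULES_URL = "https://api.twitter.com/2/tweets/search/stream/rules"

def build_rule_payload(keywords):
    """Build the JSON body that registers filtered-stream rules.

    Each rule value is a query clause (e.g. a hashtag); the tag labels
    matching tweets so they can be told apart downstream.
    """
    return {
        "add": [{"value": kw, "tag": f"rule-{i}"} for i, kw in enumerate(keywords)]
    }

payload = build_rule_payload(["#TwitterAPI"])
print(json.dumps(payload))
# Registering the rule would then look like (requires a valid bearer token):
#   requests.post(STREAM_RULES_URL,
#                 headers={"Authorization": f"Bearer {token}"},
#                 json=payload)
```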
In comparison, the sampled stream returns a randomly sampled dataset of all newly posted tweets in real time (Pfeffer et al., 2018), without requiring any filtering rules. The sampled stream can be a better option when researchers have no specific topic of interest and simply want to take the pulse of all conversations on the Twitter platform.
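The sampling behaviour can be illustrated with a toy simulation (the real sampled stream delivers roughly 1% of all tweets; the rate and the simulated tweet count below are assumptions for illustration only):

```python
import random

def sample_stream(tweets, rate=0.01, seed=42):
    """Simulate a sampled stream: keep each incoming tweet
    independently with probability `rate`."""
    rng = random.Random(seed)
    return [t for t in tweets if rng.random() < rate]

# With 100,000 simulated tweets, roughly 1,000 survive a 1% sample.
kept = sample_stream(range(100_000))
print(len(kept))
```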
The Search API is another widely used portal for accessing
Twitter data.
It returns historical tweets that match rules set by the user. The standard version of the Search API allows researchers to access a free sample of tweets posted by Twitter users in the past 7 days. It is therefore a good choice for researchers who only need historical tweet data from the past week, or who are willing to collect tweet data weekly to keep track of changes over time. However, if a research project requires historical tweet data over a longer period, or if the amount of data exceeds the limits of the free tier, researchers will need to consider subscribing to a more advanced paid version.
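As a sketch of the 7-day constraint (assuming the v2 "recent search" endpoint, which only reaches back one week; the query string and the commented-out request are illustrative):

```python
from datetime import datetime, timedelta, timezone

# v2 recent-search endpoint: limited to the past 7 days
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def build_search_params(query, days_back=7, max_results=100):
    """Assemble query parameters for a recent-search request.

    The endpoint cannot reach back more than 7 days, so `days_back`
    is clamped accordingly.
    """
    days_back = min(days_back, 7)
    start = datetime.now(timezone.utc) - timedelta(days=days_back)
    return {
        "query": query,
        "start_time": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "max_results": max_results,
    }

params = build_search_params("#TwitterAPI lang:en", days_back=30)
print(params["start_time"])  # clamped to at most 7 days ago
# A real call would then be:
#   requests.get(SEARCH_URL, params=params,
#                headers={"Authorization": f"Bearer {token}"})
```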
2. Third-party platforms
The market for the provision of social media data has been growing rapidly. Researchers who prefer a user-friendly interactive interface and can afford the additional cost may also consider third-party data platforms.
Knowlesys Intelligence System, for example, provides a more user-friendly interface than the publicly available Twitter API to help users filter data, download it, and aggregate reports, which lowers the learning curve for researchers unfamiliar with the API. Moreover, Knowlesys Intelligence System also provides access to data from social media platforms other than Twitter, such as Facebook, Instagram, YouTube, and TikTok, as well as traditional websites and the dark web.
3. Sharing Twitter IDs
In addition to using APIs or purchasing third-party services, leveraging tweet IDs shared by other research teams is another way to collect Twitter data. Researchers sometimes share their datasets in the spirit of open science, and Twitter provides specific instructions for sharing such datasets publicly. Twitter datasets that meet the terms of open data sharing can be found on a number of specialized websites, such as the DocNow Catalog. After obtaining tweet IDs, researchers can restore ("rehydrate") the full content of the tweets with relative ease using tools such as rehydratoR (Coakley & Steinert-Threlkeld, n.d.) or packages such as rtweet (Kearney et al., n.d.).
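A small sketch of the rehydration step (assuming the v2 tweets lookup endpoint, which accepts at most 100 IDs per request; the batch size reflects that limit, while the commented-out request and token are placeholders):

```python
# v2 lookup endpoint: accepts an `ids` parameter of up to 100 tweet IDs
LOOKUP_URL = "https://api.twitter.com/2/tweets"

def chunk_ids(tweet_ids, size=100):
    """Split a shared list of tweet IDs into lookup-sized batches."""
    return [tweet_ids[i:i + size] for i in range(0, len(tweet_ids), size)]

def build_lookup_params(batch):
    """Comma-join one batch of IDs into the `ids` query parameter."""
    return {"ids": ",".join(str(t) for t in batch)}

batches = chunk_ids(list(range(250)))
print(len(batches))  # 250 IDs -> 3 batches of at most 100
# Each batch would then be rehydrated with:
#   requests.get(LOOKUP_URL, params=build_lookup_params(batch),
#                headers={"Authorization": f"Bearer {token}"})
```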