The main ways to access Twitter data

1. Twitter Standard APIs

Twitter's standard application programming interface (API) is the most common entry point to Twitter data. Twitter offers a range of publicly available APIs that provide free but limited access to data. Of these, the Streaming API and the Search API are two of the most common choices.

The Streaming API returns a real-time stream of tweet data.

There are currently two sub-options to choose from: the filtered stream and the sampled stream. The filtered stream lets researchers sift the millions of tweets posted at any given time with custom filtering rules, which can request up to 400 keywords, 5,000 user identities, and 25 geographic locations. The filtered stream delivers matching tweets only up to a cap of roughly 1% of the overall tweet volume. For example, if a researcher uses the filtered stream to collect all tweets containing the #TwitterAPI hashtag and the number of matching tweets stays below that cap, the researcher will receive every tweet containing the hashtag; otherwise, the researcher obtains only a partial sample of them.
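As a rough illustration, the rule limits quoted above can be checked before a request is submitted. This is a minimal sketch; the function and constant names are my own, not part of Twitter's API.

```python
# Sketch: validate a proposed filtered-stream rule set against the
# limits quoted above (400 keywords, 5,000 user identities, 25
# geographic locations). Names here are illustrative only.

FILTER_LIMITS = {"keywords": 400, "user_ids": 5000, "locations": 25}

def validate_filter_rules(keywords, user_ids, locations):
    """Return a list of limit violations; an empty list means the rule set fits."""
    counts = {
        "keywords": len(keywords),
        "user_ids": len(user_ids),
        "locations": len(locations),
    }
    return [
        f"{name}: {counts[name]} exceeds limit of {limit}"
        for name, limit in FILTER_LIMITS.items()
        if counts[name] > limit
    ]
```

A rule set such as `validate_filter_rules(["#TwitterAPI"], [], [])` passes with no violations, while one with 401 keywords would be flagged before any API call is made.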

In comparison, the sampled stream returns a randomly sampled dataset of all newly posted tweets in real time (Pfeffer et al., 2018) without the constraints of filtering rules. The sampled stream can be the better option when researchers have no specific topic of interest and simply want to gauge the overall pulse of conversation on the Twitter platform.
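Connecting to the sampled stream requires only a bearer token and a streaming HTTP request. The sketch below uses the standard library and the documented v2 sampled-stream endpoint; the helper names are my own, and the bearer token is a placeholder you must supply yourself.

```python
# Sketch: build and read a request against the v2 sampled-stream
# endpoint using only the standard library. The endpoint URL is the
# documented v2 sampled stream; helper names are illustrative.
import json
import urllib.request

SAMPLED_STREAM_URL = "https://api.twitter.com/2/tweets/sample/stream"

def build_sampled_stream_request(bearer_token):
    """Return a Request for the real-time random sample of new tweets."""
    return urllib.request.Request(
        SAMPLED_STREAM_URL,
        headers={"Authorization": f"Bearer {bearer_token}"},
    )

def read_sample(bearer_token, max_tweets=100):
    """Read up to max_tweets newline-delimited JSON tweet objects."""
    tweets = []
    with urllib.request.urlopen(build_sampled_stream_request(bearer_token)) as resp:
        for raw in resp:  # the stream delivers one JSON object per line
            if raw.strip():
                tweets.append(json.loads(raw))
                if len(tweets) >= max_tweets:
                    break
    return tweets
```

In practice, researchers would keep the connection open for the duration of the collection window and write each tweet to disk as it arrives rather than accumulating them in memory.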

The Search API is another widely used portal for accessing Twitter data.

It can return historical tweets that match rules set by the user. The standard version of the Search API allows researchers to access a free sample of tweets posted in the past 7 days. It is therefore a good choice for researchers who only need tweet data from the past week, or who are willing to collect data at regular intervals to keep track of changes. However, if a research project requires historical tweet data over a longer period of time, or if the amount of data exceeds the sampling limit of the free tier, researchers will need to consider subscribing to a more advanced paid tier.
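The 7-day window above translates directly into the query parameters of a search request. The following sketch assembles parameters for the documented v2 recent-search endpoint; the helper names are my own, and the clamp to 7 days reflects the standard-tier limit described in the text.

```python
# Sketch: build query parameters for the v2 recent search endpoint,
# clamping the look-back window to the 7 days the standard tier covers.
# Helper names are illustrative; parameter names follow the v2 endpoint.
from datetime import datetime, timedelta, timezone

RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def recent_search_params(query, days_back=7, max_results=100):
    """Build parameters for a recent search going back at most 7 days."""
    days_back = min(days_back, 7)  # standard tier only covers the past week
    start = datetime.now(timezone.utc) - timedelta(days=days_back)
    return {
        "query": query,
        "start_time": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "max_results": max_results,
    }
```

A request for a longer window, say `recent_search_params("#TwitterAPI", days_back=30)`, is silently clamped to 7 days; anything older requires a paid tier or a different collection strategy.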

2. Third-party platforms

The market for the provisioning of social media data has been growing rapidly. Researchers who prefer a user-friendly interactive interface and can afford the additional cost may also consider third-party data platforms. The Knowlesys Intelligence System, for example, provides a more user-friendly interface than the publicly available Twitter APIs to help users filter data and download and aggregate reports, which lowers the learning curve for researchers who are not familiar with API queries. Moreover, the Knowlesys Intelligence System also provides access to data from social media platforms other than Twitter, such as Facebook, Instagram, YouTube, and TikTok, as well as traditional websites and the dark web.

3. Sharing Twitter IDs

In addition to using APIs or purchasing third-party services, leveraging tweet IDs shared by other research teams is another way to collect Twitter data. In the spirit of open science, researchers sometimes share their datasets, and Twitter provides specific instructions for sharing them publicly. Twitter datasets that meet the terms of open data sharing can be found on a number of specialized websites, such as the DocNow Catalog. After obtaining tweet IDs, researchers can restore the full content of the tweets with relative ease using tools such as rehydratoR (Coakley & Steinert-Threlkeld, n.d.) or packages such as rtweet (Kearney et al., n.d.).
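Rehydration amounts to looking shared IDs back up against the API in batches. The sketch below assumes the v2 tweet-lookup limit of 100 IDs per request; the helper names are my own, and `fetch_batch` is a hypothetical placeholder for whatever client library (e.g. rtweet or a Python equivalent) actually performs the lookup.

```python
# Sketch: "rehydrate" shared tweet IDs by looking them up in batches.
# The v2 tweet-lookup endpoint accepts up to 100 IDs per request.
# fetch_batch is a caller-supplied placeholder for the real API call.

LOOKUP_BATCH_SIZE = 100  # per-request ID limit for tweet lookup

def batch_ids(tweet_ids, size=LOOKUP_BATCH_SIZE):
    """Split a list of tweet IDs into lookup-sized batches."""
    return [tweet_ids[i:i + size] for i in range(0, len(tweet_ids), size)]

def hydrate(tweet_ids, fetch_batch):
    """Fetch full tweet objects batch by batch.

    fetch_batch takes a list of up to 100 IDs and returns the
    corresponding tweet objects; deleted or protected tweets will
    simply be missing from its results.
    """
    tweets = []
    for batch in batch_ids(tweet_ids):
        tweets.extend(fetch_batch(batch))
    return tweets
```

Note that rehydration can never recover tweets that have since been deleted or made private, so a rehydrated dataset is usually somewhat smaller than the original ID list.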