Speculation about Twitter’s algorithm has been going on for months, with new claims every day, but that conversation can now come to a conclusion. From Elon Musk’s Tweets showing up at the top of feeds to verified accounts being pushed to the top of the screen, many people have tried to guess how the Twitter algorithm works.
Twitter aims to show you the best of what’s happening in the world right now. To do that, the roughly 500 million Tweets posted every day must be narrowed down to a handful of top Tweets that appear in the For You timeline on your device. Twitter has now explained, in a blog post, how the algorithm chooses Tweets for your feed.
In that post, Twitter describes the many interconnected services and jobs that make up its recommendation system. Although Tweets are recommended in many places in the app (Search, Explore, Ads), this article focuses on the For You feed on the Home timeline.
How are tweets chosen?
Twitter’s recommendations are built on a set of core models and features that extract latent information from Tweet, user, and engagement data. These models answer important questions about the Twitter network, such as “What is the probability that you will interact with another user in the future?” or “What are the communities on Twitter, and what are the trending Tweets within them?” The recommendation pipeline uses these features in three main stages:
- Fetch the best Tweets from different recommendation sources, a process called candidate sourcing.
- Rank each Tweet using a machine learning model.
- Apply heuristics and filters, such as removing Tweets from users you’ve blocked, NSFW content, and Tweets you’ve already seen.
Home Mixer is the service responsible for building and serving the For You timeline. It is built on Product Mixer, Twitter’s custom Scala framework that makes it easy to build content feeds. Home Mixer acts as the software backbone that connects the different candidate sources, scoring functions, heuristics, and filters.
Walking through each part of the diagram explains how the algorithm decides which Tweets appear in your feed.
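To make the three stages concrete, here is a minimal sketch of a feed pipeline that wires candidate sources, a scorer, and filters together in the order described above. It is written in Python for readability and is not Home Mixer or Product Mixer code (those are Scala); the `Tweet` class, the stage types, `run_pipeline`, and the toy sources are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tweet:
    tweet_id: int
    author: str
    score: float = 0.0


# Hypothetical stage types: a source returns candidates for a user, a scorer
# returns a relevance score, and a filter returns True to keep a Tweet.
CandidateSource = Callable[[str], list[Tweet]]
Scorer = Callable[[str, Tweet], float]
Filter = Callable[[str, Tweet], bool]


def run_pipeline(user: str, sources: list[CandidateSource], scorer: Scorer,
                 filters: list[Filter], limit: int = 1500) -> list[Tweet]:
    # Stage 1, candidate sourcing: pool candidates from every source.
    candidates = [t for source in sources for t in source(user)][:limit]
    # Stage 2, ranking: one scorer for all candidates, whatever their source.
    for t in candidates:
        t.score = scorer(user, t)
    ranked = sorted(candidates, key=lambda t: t.score, reverse=True)
    # Stage 3, heuristics and filters: drop Tweets any filter rejects.
    return [t for t in ranked if all(f(user, t) for f in filters)]


# Usage with toy components; "bob" stands in for a blocked account.
def in_network(user):
    return [Tweet(1, "alice"), Tweet(2, "bob")]


def out_of_network(user):
    return [Tweet(3, "carol")]


def toy_scorer(user, t):
    return {1: 0.2, 2: 0.9, 3: 0.5}[t.tweet_id]


def not_blocked(user, t):
    return t.author != "bob"


feed = run_pipeline("me", [in_network, out_of_network], toy_scorer, [not_blocked])
print([t.tweet_id for t in feed])  # [3, 1]
```

Keeping each stage behind a plain function type is what lets sources, scorers, and filters be added or swapped independently, which is the role the text gives Home Mixer.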
To find the most recent and relevant Tweets for a user, Twitter uses several candidate sources. From a pool of hundreds of millions of Tweets, these sources try to extract the top 1500 candidates for each request. Candidates come from both people you follow (In-Network) and people you don’t follow (Out-of-Network). Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though the ratio may vary from user to user.
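A minimal sketch of how two pre-ranked candidate lists could be merged into a 1500-candidate budget with a target 50/50 split. The backfill rule (if one side runs short, the other fills the gap) is an assumption for illustration, and `source_candidates` is a hypothetical helper, not part of Twitter’s code.

```python
def source_candidates(in_network, out_of_network, limit=1500, in_network_share=0.5):
    """Merge two pre-ranked candidate lists into at most `limit` candidates.

    Takes up to in_network_share * limit from In-Network and fills the rest
    from Out-of-Network. Hypothetical backfill: if Out-of-Network runs short,
    extra In-Network candidates fill the remaining slots.
    """
    want_in = int(limit * in_network_share)
    picked_in = in_network[:want_in]
    picked_out = out_of_network[: limit - len(picked_in)]
    shortfall = limit - len(picked_in) - len(picked_out)
    if shortfall > 0:
        picked_in = in_network[: len(picked_in) + shortfall]
    return picked_in + picked_out


ins = [f"in{i}" for i in range(2000)]
outs = [f"out{i}" for i in range(2000)]
mixed = source_candidates(ins, outs)
print(len(mixed), sum(c.startswith("in") for c in mixed))  # 1500 750
```

Because the split is only a target, per-user ratios drift whenever one side has fewer good candidates, which matches the note that individual ratios may differ.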
In-Network is the largest candidate source and aims to deliver the most relevant, recent Tweets from users you follow. It efficiently ranks the Tweets of people you follow by relevance using a logistic regression model.
The top-ranked Tweets are then passed to the next stage. The most important component in ranking In-Network Tweets is Real Graph, a model that predicts the likelihood of engagement between two users. The higher the Real Graph score between you and a Tweet’s author, the more of their Tweets are included.
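The idea of a logistic regression ranker with Real Graph as its strongest feature can be sketched as below. The feature names, weights, and bias are invented for illustration; the article does not say which features Twitter’s In-Network model uses beyond Real Graph.

```python
import math

# Hypothetical weights: Real Graph gets the largest weight to mirror the text;
# "recency" and "has_media" are invented example features.
WEIGHTS = {"real_graph": 3.0, "recency": 1.5, "has_media": 0.4}
BIAS = -2.0


def relevance(features):
    """Logistic regression: sigmoid of a weighted sum of feature values."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_in_network(candidates, keep=2):
    """Rank (tweet_id, features) pairs and keep the top `keep` for the next stage."""
    ranked = sorted(candidates, key=lambda c: relevance(c[1]), reverse=True)
    return [tweet_id for tweet_id, _ in ranked[:keep]]


candidates = [
    ("t1", {"real_graph": 0.9, "recency": 0.2, "has_media": 0.0}),  # frequent interaction
    ("t2", {"real_graph": 0.1, "recency": 0.9, "has_media": 1.0}),  # rare interaction
    ("t3", {"real_graph": 0.6, "recency": 0.6, "has_media": 0.0}),
]
print(rank_in_network(candidates))  # ['t1', 't3']
```

Here the older Tweet `t1` outranks the fresher `t2` because its author has a much higher Real Graph score with the viewer.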
Finding relevant Tweets outside a user’s network is harder: if you don’t follow the author, how can Twitter tell whether a Tweet will be relevant to you? Twitter tackles this from two angles.
Social Network
The first approach estimates what you would find relevant by analyzing the engagements of people you follow or people with similar interests. Twitter traverses the graph of engagements and follows to answer the following questions:
- What Tweets have the people I follow recently engaged with?
- Who likes similar Tweets to me, and what else have they recently liked?
“We generate candidate Tweets based on the answers to these questions and rank the resulting Tweets using a logistic regression model. Graph traversals of this type are essential to our Out-of-Network recommendations; we developed GraphJet, a graph processing engine that maintains a real-time interaction graph between users and Tweets, to execute these traversals. While such heuristics for searching the Twitter engagement and follow network have proven useful (these currently serve about 15% of Home Timeline Tweets), embedding space approaches have become the larger source of Out-of-Network Tweets,” Twitter wrote in its blog post.
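The first question, “What Tweets have the people I follow recently engaged with?”, amounts to a two-hop walk: from you to the accounts you follow, then to the Tweets they engaged with. A minimal sketch using plain dictionaries in place of GraphJet’s real-time graph (this is not GraphJet’s API; `social_graph_candidates` and its arguments are hypothetical). Counting how many followed accounts engaged with each Tweet is one simple candidate signal; per the quote above, Twitter then ranks such candidates with a logistic regression model.

```python
from collections import Counter


def social_graph_candidates(user, follows, recent_engagements, seen=frozenset(), top_k=3):
    """Two-hop traversal: user -> accounts they follow -> Tweets those accounts engaged with.

    follows: user -> set of accounts that user follows
    recent_engagements: account -> list of Tweet ids it recently engaged with
    Each Tweet is counted once per followed account that engaged with it.
    """
    counts = Counter()
    for followee in follows.get(user, set()):
        for tweet_id in set(recent_engagements.get(followee, [])):
            if tweet_id not in seen:  # skip Tweets the user has already seen
                counts[tweet_id] += 1
    # Sort by count, then by id so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tweet_id for tweet_id, _ in ranked[:top_k]]


follows = {"me": {"alice", "bob", "carol"}}
recent = {
    "alice": ["t1", "t2"],
    "bob": ["t2", "t3"],
    "carol": ["t2", "t1", "t9"],
}
print(social_graph_candidates("me", follows, recent, seen={"t9"}))  # ['t2', 't1', 't3']
```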
The second approach, embedding spaces, answers a more general question about content similarity: what Tweets and users are similar to my interests? Embeddings work by generating numerical representations of users’ interests and Tweets’ content. Twitter can then calculate the similarity between any two users, Tweets, or user-Tweet pairs in this embedding space. Provided the embeddings are accurate, this similarity serves as a stand-in for relevance.
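As an illustration of “similarity as a stand-in for relevance”, the sketch below ranks Tweets by cosine similarity to a user’s interest vector. The three dimensions (sports, music, tech) and all vectors are made up, cosine similarity is one common choice rather than a confirmed detail of Twitter’s system, and a production system would use approximate nearest-neighbor search instead of this brute-force loop.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_tweets(user_vec, tweet_vecs, top_k=2):
    """Rank Tweets by cosine similarity to the user's interest embedding."""
    scored = sorted(tweet_vecs.items(), key=lambda kv: cosine(user_vec, kv[1]), reverse=True)
    return [tweet_id for tweet_id, _ in scored[:top_k]]


# Hypothetical 3-dimensional embeddings: (sports, music, tech).
user = [0.9, 0.1, 0.4]
tweets = {
    "match_recap": [1.0, 0.0, 0.1],
    "album_drop": [0.0, 1.0, 0.0],
    "new_gpu": [0.1, 0.0, 1.0],
}
print(nearest_tweets(user, tweets))  # ['match_recap', 'new_gpu']
```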
One of Twitter’s most useful embedding spaces is SimClusters, which represents interests as communities. There are 145k communities, updated every three weeks. Users and Tweets are represented in the space of communities, and each can belong to multiple communities. Communities range in size from a few thousand users for individual friend groups to hundreds of millions of users for news or pop culture.
“We can embed Tweets into these communities by looking at the current popularity of a Tweet in each community. The more that users from a community like a Tweet, the more that Tweet will be associated with that community.”
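The quote above can be read as: a Tweet’s embedding is built from the community memberships of the users who recently liked it. A minimal sketch under that reading, with hypothetical community names and membership weights; normalizing the result into shares is an assumption added here, and the sketch is not SimClusters’ actual update procedure.

```python
from collections import defaultdict


def tweet_community_embedding(likers, user_communities):
    """Embed a Tweet in community space from the users who recently liked it.

    likers: ids of users who recently liked the Tweet
    user_communities: user -> {community: membership weight}
    Returns {community: share of the Tweet's total like-weight}.
    """
    scores = defaultdict(float)
    for user in likers:
        # Each like adds the liker's membership weight to each of their communities.
        for community, weight in user_communities.get(user, {}).items():
            scores[community] += weight
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else {}


user_communities = {
    "u1": {"nba": 1.0},
    "u2": {"nba": 0.5, "kpop": 0.5},
    "u3": {"kpop": 1.0},
    "u4": {"nba": 1.0},
}
emb = tweet_community_embedding(["u1", "u2", "u4"], user_communities)
print({c: round(v, 2) for c, v in emb.items()})  # {'nba': 0.83, 'kpop': 0.17}
```

More likes from NBA fans push the Tweet’s weight toward the `nba` community, which is the association the quote describes.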
The goal of the For You timeline is to serve you relevant Tweets. At this point in the pipeline, there are about 1500 candidates that may be relevant. Scoring directly predicts the relevance of each candidate Tweet and is the primary signal for ranking Tweets on your timeline. At this stage, all candidates are treated equally, regardless of which candidate source they came from.
Ranking is done with a neural network of roughly 48M parameters that is continuously trained on Tweet interactions to optimize for positive engagement (e.g. Likes, Retweets, and Replies). This ranking mechanism takes thousands of features into account and outputs ten labels for each Tweet, where each label represents the probability of a particular type of engagement; these predictions determine the Tweet’s score.
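One common way to turn several engagement probabilities into a single ranking score is a weighted sum; that combination step is an assumption here, not a detail this article confirms. The sketch uses only four hypothetical labels instead of ten, with invented weights, including a large negative weight for an undesirable engagement such as a report.

```python
# Hypothetical labels and weights for illustration only; the article says the
# model outputs ten engagement probabilities but does not list them here.
WEIGHTS = {
    "like": 1.0,
    "retweet": 2.0,
    "reply": 3.0,
    "report": -50.0,  # negative engagement pulls the score down
}


def score(probabilities):
    """Collapse per-engagement probabilities into one score (weighted sum)."""
    return sum(WEIGHTS[label] * p for label, p in probabilities.items())


def rank(candidates):
    """Sort (tweet_id, probabilities) pairs by score; the candidate source is ignored."""
    return sorted(candidates, key=lambda c: score(c[1]), reverse=True)


candidates = [
    ("a", {"like": 0.30, "retweet": 0.05, "reply": 0.01, "report": 0.00}),
    ("b", {"like": 0.20, "retweet": 0.10, "reply": 0.05, "report": 0.00}),
    ("c", {"like": 0.60, "retweet": 0.10, "reply": 0.05, "report": 0.02}),
]
print([tweet_id for tweet_id, _ in rank(candidates)])  # ['b', 'a', 'c']
```

Tweet `c` has the highest Like probability but still ranks last, because its small chance of being reported outweighs its positive engagement under these example weights.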
After the Ranking stage, they apply heuristics and filters to implement various product features. These features work together to create a balanced and diverse feed.
- Visibility Filtering: Filter out Tweets based on their content and your preferences, for example by removing Tweets from accounts you block or mute.
- Author Diversity: Avoid showing too many consecutive Tweets from the same author.
- Content Balance: Ensure a fair balance of In-Network and Out-of-Network Tweets.
- Feedback-based Fatigue: Lower a Tweet’s score if you have given negative feedback about it.
- Social Proof: As a quality safeguard, exclude Out-of-Network Tweets that lack a second-degree connection to you; in other words, make sure someone you follow engaged with the Tweet or follows its author.
- Conversations: Connect a Reply to the initial Tweet to give it more context.
- Edited Tweets: Determine whether the Tweets that are presently on a device need to be replaced with the edited versions.
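A few of the heuristics above (visibility filtering, social proof, feedback-based fatigue, and author diversity) can be sketched as a post-ranking pass. Everything here is illustrative: the `Candidate` fields, the 0.5 fatigue multiplier, and the greedy reordering are assumptions, and content balance, conversations, and edited Tweets are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    tweet_id: str
    author: str
    score: float
    in_network: bool
    social_proof: int = 0  # hypothetical: followed accounts that engaged with it or follow its author


def author_diversity(ranked):
    """Greedily reorder so a Tweet's author differs from the previous one when possible."""
    remaining, result = list(ranked), []
    while remaining:
        idx = next((i for i, c in enumerate(remaining)
                    if not result or c.author != result[-1].author), 0)
        result.append(remaining.pop(idx))
    return result


def apply_heuristics(ranked, blocked, negative_feedback_authors, fatigue=0.5):
    kept = []
    for c in ranked:
        # Visibility filtering: drop Tweets from blocked or muted accounts.
        if c.author in blocked:
            continue
        # Social proof: Out-of-Network Tweets need a second-degree connection.
        if not c.in_network and c.social_proof == 0:
            continue
        # Feedback-based fatigue: lower the score after negative feedback.
        adjusted = c.score * fatigue if c.author in negative_feedback_authors else c.score
        kept.append((adjusted, c))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return author_diversity([c for _, c in kept])


ranked = [
    Candidate("t1", "alice", 0.9, True),
    Candidate("t2", "alice", 0.8, True),
    Candidate("t3", "spam", 0.7, True),
    Candidate("t4", "dave", 0.6, False, social_proof=2),
    Candidate("t5", "erin", 0.5, False),
    Candidate("t6", "frank", 0.4, True),
]
feed = apply_heuristics(ranked, blocked={"spam"}, negative_feedback_authors={"frank"})
print([c.tweet_id for c in feed])  # ['t1', 't4', 't2', 't6']
```

In the example, `t3` is removed by visibility filtering, `t5` fails the social proof check, `t6` is demoted by fatigue, and `t4` is moved between the two `alice` Tweets by the author diversity pass.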
Mixing and Serving
At this point, Home Mixer has a set of Tweets ready to send to your device. As the final step before returning them for display, the system mixes the Tweets with non-Tweet content such as ads, follow recommendations, and onboarding prompts.
The pipeline above runs approximately 5 billion times per day and completes in under 1.5 seconds on average. A single pipeline execution requires 220 seconds of CPU time, nearly 150 times the latency you perceive in the app.
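A minimal sketch of the mixing step: interleaving non-Tweet items into the ranked Tweet list. The placement policy (one ad after every fourth Tweet, a single follow-recommendations module after the second Tweet) and the `mix_feed` helper are hypothetical; onboarding prompts are omitted.

```python
def mix_feed(tweet_ids, ad_ids, follow_suggestions, ad_every=4, suggestions_after=2):
    """Interleave ads and a follow-recommendations module into ranked Tweets.

    Hypothetical policy: one ad after every `ad_every` Tweets (while ads last)
    and the follow module once, right after Tweet number `suggestions_after`.
    """
    feed = []
    ads = iter(ad_ids)
    for position, tweet_id in enumerate(tweet_ids, start=1):
        feed.append(("tweet", tweet_id))
        if position == suggestions_after and follow_suggestions:
            feed.append(("who_to_follow", tuple(follow_suggestions)))
        if position % ad_every == 0:
            ad = next(ads, None)
            if ad is not None:
                feed.append(("ad", ad))
    return feed


feed = mix_feed(["t1", "t2", "t3", "t4", "t5"], ["ad1"], ["@carol"])
print([kind for kind, _ in feed])
# ['tweet', 'tweet', 'who_to_follow', 'tweet', 'tweet', 'ad', 'tweet']
```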
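The serving figures can be checked with some back-of-envelope arithmetic. Taking the stated numbers at face value (5 billion runs per day, 1.5 seconds of latency, 220 CPU-seconds per run), the ratio of CPU time to latency comes out close to the “nearly 150 times” quoted, and the daily total implies a very large amount of compute kept busy around the clock. These derived totals are simple arithmetic on the article’s figures, not numbers Twitter reported.

```python
EXECUTIONS_PER_DAY = 5_000_000_000  # "approximately 5 billion times per day"
LATENCY_S = 1.5                     # stated upper bound on average latency
CPU_SECONDS_PER_RUN = 220
SECONDS_PER_DAY = 86_400

ratio = CPU_SECONDS_PER_RUN / LATENCY_S
cpu_seconds_per_day = EXECUTIONS_PER_DAY * CPU_SECONDS_PER_RUN
avg_busy_cores = cpu_seconds_per_day / SECONDS_PER_DAY  # one core busy for one second = one CPU-second

print(f"CPU time / latency: {ratio:.0f}x")                        # 147x
print(f"CPU-seconds per day: {cpu_seconds_per_day:.2e}")          # 1.10e+12
print(f"Average cores busy: {avg_busy_cores / 1e6:.1f} million")  # 12.7 million
```

Since average latency is under 1.5 seconds, the true CPU-to-latency ratio is at least this large; the gap reflects how much work is done in parallel within a single request.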
Twitter’s open-source initiative aims to give users full transparency into how its systems work. Twitter has released the code that powers its recommendations so that you can examine it and understand the algorithm in greater detail, and it is also building features to provide more transparency within the app. Some of the planned developments include:
- A better Twitter analytics tool with more exposure and engagement data for creators.
- Greater transparency into any safety labels applied to your Tweets or account.
- Greater visibility into why Tweets appear on your timeline.
“Twitter is the center of conversations around the world. Every day, we serve over 150 billion Tweets to people’s devices. Ensuring that we’re delivering the best content possible to our users is both a challenging and an exciting problem. We’re working on new opportunities to expand our recommendation systems—new real-time features, embeddings, and user representations—and we have one of the most interesting datasets and user bases in the world to do it with. We are building the town square of the future. If this interests you, please consider joining us.”, they concluded in their blog post.