x-data-signals

Name: x-data-signals
Rating: 65
Author: ElemontCapital

A suite of high-performance AI agent skills derived from the open-source x.AI x-algorithm

Stars: 1
Forks: 0
Date: Jan 22, 2026

agentic-ai agentic-workflow agents ai ai-agents claude-code codex skills

SKILL.md

name: x-data-signals
description: Use this skill to decode the "DNA" of the recommendation engine. These signals are the foundational data structures used to retrieve candidates (before ranking) and as features for the Heavy Ranker (during ranking).
version: 1.0.0
license: MIT

X Data Signals

Deep dive into the X recommendation engine's core signal libraries: SimClusters (Community Embeddings), RealGraph (Interaction Probabilities), TweepCred (Reputation), and TwHIN (Knowledge Graph).

Context

The engine relies on four primary signal pillars:

SimClusters (v2): A matrix-factorization framework that embeds users and tweets in a sparse space of ~145k communities. It is the primary driver for "Embedding-Based Candidate Generation" (EBCG).
RealGraph: A weighted, directed graph of user interactions whose edge weights predict the probability P(u -> v) that user u engages with user v. It powers the "In-Network" feed.
TweepCred: A continuous PageRank score (0-100) determining user authority.
TwHIN: (Twitter Heterogeneous Information Network) Dense knowledge-graph embeddings that capture multi-modal relationships (Users, Tweets, Ads, Topics) in a shared vector space.
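
To make the TweepCred description above concrete, here is a minimal sketch of a PageRank-style reputation score rescaled to 0-100. It is not taken from the x-algorithm source: the function names, the damping factor of 0.85, the fixed iteration count, and the linear rescaling (top node = 100) are all assumptions chosen for illustration.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a directed graph {node: [nodes it points to]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        # Every other node splits its rank equally among the nodes it points to.
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def to_reputation(rank: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical rescaling to 0-100: the top-ranked node receives 100."""
    top = max(rank.values())
    return {node: 100.0 * (score / top) for node, score in rank.items()}


if __name__ == "__main__":
    # Toy graph: an edge u -> v means "u follows or engages with v".
    follows = {"a": ["c"], "b": ["c"], "c": ["a"]}
    print(to_reputation(pagerank(follows)))
```

In the toy graph, "c" receives edges from both other accounts and ends up with the highest score, while "b", which nobody points to, ends up with the lowest.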

For detailed logic, see:

What it does

Identifies "Lookalike" Audiences: Uses SimClusters to find content popular in communities you implicitly belong to, even if you don't follow the authors.
Quantifies Relationship Strength: Uses RealGraph to assign a floating-point weight to every user-user connection, prioritizing close friends over acquaintances.
Filters Low-Quality Nodes: Uses TweepCred to prune candidate pools during the retrieval stage, saving compute by ignoring low-authority accounts.
Calculates Embedding Similarity: Computes dot-product scores between User embeddings and Tweet embeddings to predict relevance in the "Earlybird" (Light Ranker) stage.
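
The similarity and filtering steps listed above can be sketched as follows. This is an illustrative assumption rather than the engine's code: embeddings are modelled as sparse dictionaries mapping a cluster id to a weight, and `sparse_dot`, `rank_candidates`, the candidate tuple layout, and the `min_reputation` threshold are hypothetical names introduced for this example.

```python
from typing import Dict, List, Tuple

# Sparse embedding: cluster id -> weight (only non-zero entries are stored).
SparseVector = Dict[int, float]
# Candidate: (tweet id, tweet embedding, author reputation on a 0-100 scale).
Candidate = Tuple[str, SparseVector, float]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product of two sparse vectors, iterating over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[cluster] for cluster, weight in a.items() if cluster in b)


def rank_candidates(interested_in: SparseVector,
                    candidates: List[Candidate],
                    min_reputation: float = 0.0) -> List[Tuple[str, float]]:
    """Drop low-reputation authors, then sort the rest by similarity, highest first."""
    scored = [
        (tweet_id, sparse_dot(interested_in, embedding))
        for tweet_id, embedding, reputation in candidates
        if reputation >= min_reputation
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    user = {123: 0.8, 456: 0.2}  # mostly interested in cluster 123
    pool = [
        ("t1", {123: 1.0}, 60.0),
        ("t2", {456: 1.0}, 90.0),
        ("t3", {123: 1.0, 456: 1.0}, 5.0),  # pruned: author reputation below threshold
    ]
    print(rank_candidates(user, pool, min_reputation=10.0))
```

Applying the reputation filter before scoring mirrors the idea of pruning the pool early, so no similarity is computed for candidates that would be discarded anyway.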

Guidelines

SimClusters v2 Implementation: The source code distinguishes between "Known-For" (what a Creator talks about) and "Interested-In" (what a Consumer likes). A tweet is recommended if the Creator's "Known-For" vector aligns with the Consumer's "Interested-In" vector.
GraphJet vs. RealGraph:
- RealGraph: The offline/batch-calculated interaction model (the "map").
- GraphJet: The real-time, in-memory graph processing engine that serves the RealGraph data to the HomeMixer.
TwHIN vs. SimClusters:
- SimClusters is sparse and interpretable (e.g., "Cluster 123 = JavaScript").
- TwHIN is dense and uninterpretable (64-dim float vectors). TwHIN is often used for "TwHIN-Collab" filtering in the candidate generation phase.
Signal Decay: RealGraph weights decay over time. A "Like" from 2018 is worth significantly less than a "Like" from today. The UserInteractionSignal service handles this time-decay logic.
Code Locations:
- src/scala/com/twitter/simclusters_v2: Core logic for community embeddings.
- src/scala/com/twitter/graph/batch/job/twhin: Knowledge graph embedding generation.
- src/java/com/twitter/search/earlybird: Where real-time signals meet search indices.
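
The Signal Decay guideline above can be illustrated with an exponential half-life model. This is a sketch under stated assumptions, not the logic of the UserInteractionSignal service: the 30-day half-life, the function names, and the representation of interactions as (base weight, timestamp) pairs are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple

HALF_LIFE_DAYS = 30.0  # hypothetical: an interaction loses half its weight every 30 days


def decayed_weight(base_weight: float,
                   event_time: datetime,
                   now: datetime,
                   half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponentially decay an interaction's weight by its age in days."""
    age_days = max((now - event_time).total_seconds() / 86400.0, 0.0)
    return base_weight * 0.5 ** (age_days / half_life_days)


def edge_weight(events: Iterable[Tuple[float, datetime]], now: datetime) -> float:
    """Sum the decayed weights of all interactions on one user -> user edge."""
    return sum(decayed_weight(weight, when, now) for weight, when in events)


if __name__ == "__main__":
    now = datetime(2026, 1, 22, tzinfo=timezone.utc)
    old_like = (1.0, datetime(2018, 1, 22, tzinfo=timezone.utc))
    recent_like = (1.0, now - timedelta(days=1))
    print(edge_weight([old_like], now), edge_weight([recent_like], now))
```

With these assumed parameters, a like from 2018 contributes a weight that is effectively zero, while a like from yesterday keeps almost all of its original weight.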

Example Trigger Prompts

"/explain-graph TweepCred @user"
"/explain-graph SimClusters @user"
"/explain-graph RealGraph interactions"
"How does SimClusters v2 compute 'InterestedIn' scores?"
"Compare RealGraph weights vs Follow links"
"How TweepCred affects HeavyRanker min_reputation"
"Explain TwHIN embeddings with SimClusters"
"Fave-based vs Follow-based clustering logic"

Score

Total Score

65/100

Based on repository quality metrics

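✓ SKILL.md: Includes a SKILL.md file (+20)
✓ LICENSE: A license is set (+10)
○ Description: Has a description of 100 or more characters (0/10)
○ Popularity: 100 or more GitHub stars (0/15)
✓ Recent activity: Updated within the last month (+10)
○ Forks: Forked 10 or more times (0/5)
✓ Issue management: Fewer than 50 open issues
✓ Language: A programming language is set
✓ Tags: At least one tag is set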
