
lifesciences-graph-builder

by donbr

AI Agent wrappers for Life Sciences APIs (Open Targets, ChEMBL, UniProt). Accelerating drug discovery with Model Context Protocol (MCP) and FastMCP.

4🍴 0📅 Jan 20, 2026

SKILL.md


name: lifesciences-graph-builder
description: "Orchestrates life sciences APIs to build knowledge graphs using the Fuzzy-to-Fact protocol, combining MCPs for nodes and curl for edges, then persisting to Graphiti. This skill should be used when the user asks to 'build knowledge graphs', 'find biological connections', 'explore drug repurposing', 'validate drug targets', or mentions traversing gene→protein→pathway→drug→disease paths, multi-API orchestration, or graph persistence workflows."

Life Sciences Graph Builder

Orchestrate multi-API graph construction using the Fuzzy-to-Fact protocol.

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                         GRAPH CONSTRUCTION KIT                          │
├─────────────────────────────────────────────────────────────────────────┤
│  TIER 1: MCP TOOLS (Verified Nodes)                                     │
│  ├── HGNC: search_genes, get_gene                                       │
│  ├── UniProt: search_proteins, get_protein                              │
│  ├── ChEMBL: search_compounds, get_compound                             │
│  ├── STRING: search_proteins, get_interactions                          │
│  ├── Open Targets: search_targets, get_associations                     │
│  └── WikiPathways: get_pathways_for_gene, get_pathway_components        │
├─────────────────────────────────────────────────────────────────────────┤
│  TIER 2: CURL COMMANDS (Relationship Edges)                             │
│  ├── ChEMBL /mechanism: Drug → Target                                   │
│  ├── ChEMBL /drug_indication: Drug → Disease                            │
│  ├── ChEMBL /activity: Drug → Target (with Ki/IC50)                     │
│  ├── Ensembl /homology: Gene → Orthologs                                │
│  ├── STRING /enrichment: Protein Set → GO/KEGG terms                    │
│  └── NCBI elink: Gene → PubMed                                          │
├─────────────────────────────────────────────────────────────────────────┤
│  TIER 3: GRAPHITI (Persistence)                                         │
│  └── add_memory: Persist validated subgraph as JSON episode             │
└─────────────────────────────────────────────────────────────────────────┘

Workflow: Fuzzy-to-Fact Protocol

Phase 1: Anchor Node (Naming)

Resolve fuzzy user input to canonical identifier.

# MCP: HGNC
result = hgnc.search_genes("p53")  # fuzzy query; candidates should include TP53 (HGNC:11998)
gene = hgnc.get_gene("HGNC:11998")  # → cross_references: UniProt, Ensembl, Entrez

Phase 2: Enrich Node (Functional)

Decorate node with metadata and cross-references.

# MCP: UniProt
protein = uniprot.get_protein("UniProtKB:P04637")
# → function text reveals interactors: BAX, BCL2, FAS

Phase 3: Expand Edges (Interactions)

Build adjacency list from interaction databases.

# MCP: STRING
interactions = string.get_interactions("STRING:9606.ENSP00000269305")
# → MDM2 (0.999), SIRT1 (0.999), ATM (0.995)

# Curl: Open Targets (gene-disease)
curl -s -X POST "https://api.platform.opentargets.org/api/v4/graphql" \
  -H "Content-Type: application/json" \
  -d '{"query": "{ target(ensemblId: \"ENSG00000141510\") { associatedDiseases(page: {index: 0, size: 5}) { rows { disease { name } score } } } }"}'
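Hand-escaping a GraphQL query inside a shell string is easy to get wrong. The following is a minimal sketch that builds the same request body in Python with `json.dumps`. The helper name `build_associations_payload` is hypothetical, passing the Ensembl ID through GraphQL variables is a choice made here rather than something the skill prescribes, and the `index: 0` field assumes the pagination input expects both an index and a size.

```python
import json

# Same gene-disease query as the curl command, with the ID and page size as variables.
QUERY = """
query($id: String!, $size: Int!) {
  target(ensemblId: $id) {
    associatedDiseases(page: {index: 0, size: $size}) {
      rows { disease { name } score }
    }
  }
}
"""


def build_associations_payload(ensembl_id: str, size: int = 5) -> str:
    """Return the JSON request body for the gene-disease association query."""
    if not ensembl_id.startswith("ENSG"):
        raise ValueError(f"expected an Ensembl gene ID, got {ensembl_id!r}")
    return json.dumps({"query": QUERY, "variables": {"id": ensembl_id, "size": size}})


body = build_associations_payload("ENSG00000141510")
print(body)
```

The resulting string can be sent as the POST body in place of the inline `-d '...'` argument, for example by writing it to a file and using `curl -d @file`.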

Phase 4: Target Traversal (Pharma)

Follow edges to actionable targets.

# MCP: HGNC (resolve downstream effector)
bcl2 = hgnc.search_genes("BCL2")  # → HGNC:990

# MCP: ChEMBL (find inhibitors)
venetoclax = chembl.search_compounds("Venetoclax")  # → CHEMBL:3137309

# Curl: ChEMBL mechanism (Drug → Target edge)
curl -s "https://www.ebi.ac.uk/chembl/api/data/mechanism?molecule_chembl_id=CHEMBL3137309&format=json" \
  | jq '.mechanisms[] | {action: .action_type, target: .target_chembl_id}'
# → INHIBITOR → CHEMBL4860 (BCL2)
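The mechanism response uses bare ChEMBL IDs (`CHEMBL4860`), while the graph uses CURIEs (`CHEMBL:4860`). As a rough sketch, the conversion from a parsed response to edge records could look like the following. The function names are hypothetical, and the sample payload is a hand-written stand-in containing only the three fields used here, not a full API response.

```python
def to_curie(chembl_id: str) -> str:
    """Turn a bare ChEMBL ID such as 'CHEMBL4860' into the CURIE 'CHEMBL:4860'."""
    prefix = "CHEMBL"
    number = chembl_id[len(prefix):]
    if not chembl_id.startswith(prefix) or not number.isdigit():
        raise ValueError(f"not a ChEMBL ID: {chembl_id!r}")
    return f"{prefix}:{number}"


def mechanisms_to_edges(payload: dict) -> list[dict]:
    """Build Drug -> Target edge records from a parsed /mechanism response."""
    edges = []
    for mech in payload.get("mechanisms", []):
        target = mech.get("target_chembl_id")
        if target is None:
            # Skip mechanisms that carry no target reference.
            continue
        edges.append({
            "source": to_curie(mech["molecule_chembl_id"]),
            "target": to_curie(target),
            "type": mech["action_type"],
        })
    return edges


# Hand-written sample shaped like the fields selected by the jq filter above.
sample = {"mechanisms": [{
    "molecule_chembl_id": "CHEMBL3137309",
    "target_chembl_id": "CHEMBL4860",
    "action_type": "INHIBITOR",
}]}
print(mechanisms_to_edges(sample))
```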

Phase 5: Persist Graph

Store validated subgraph in Graphiti.

# MCP: Graphiti
import json  # needed to serialize the subgraph into the episode body

graphiti.add_memory(
    name="TP53-BCL2-Venetoclax pathway",
    episode_body=json.dumps({
        "nodes": [
            {"id": "HGNC:11998", "type": "Gene", "symbol": "TP53"},
            {"id": "HGNC:990", "type": "Gene", "symbol": "BCL2"},
            {"id": "CHEMBL:3137309", "type": "Compound", "name": "Venetoclax"}
        ],
        "edges": [
            {"source": "HGNC:11998", "target": "HGNC:990", "type": "REGULATES"},
            {"source": "CHEMBL:3137309", "target": "HGNC:990", "type": "INHIBITOR"}
        ]
    }),
    source="json",
    group_id="drug-repurposing"
)
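Since this phase stores a "validated" subgraph, one cheap check before persisting is that every edge points at a node that is actually in the payload. The sketch below shows one way to do that. `find_dangling_edges` and `to_episode_body` are hypothetical helpers, not part of Graphiti, and the check covers only edge endpoints.

```python
import json


def find_dangling_edges(subgraph: dict) -> list[dict]:
    """Return edges whose source or target ID is missing from the node list."""
    node_ids = {node["id"] for node in subgraph["nodes"]}
    return [
        edge for edge in subgraph["edges"]
        if edge["source"] not in node_ids or edge["target"] not in node_ids
    ]


def to_episode_body(subgraph: dict) -> str:
    """Serialize the subgraph to JSON, refusing to emit dangling edges."""
    dangling = find_dangling_edges(subgraph)
    if dangling:
        raise ValueError(f"edges reference unknown nodes: {dangling}")
    return json.dumps(subgraph)


subgraph = {
    "nodes": [
        {"id": "HGNC:11998", "type": "Gene", "symbol": "TP53"},
        {"id": "HGNC:990", "type": "Gene", "symbol": "BCL2"},
        {"id": "CHEMBL:3137309", "type": "Compound", "name": "Venetoclax"},
    ],
    "edges": [
        {"source": "HGNC:11998", "target": "HGNC:990", "type": "REGULATES"},
        {"source": "CHEMBL:3137309", "target": "HGNC:990", "type": "INHIBITOR"},
    ],
}
episode_body = to_episode_body(subgraph)
```

The returned string is what would be passed as `episode_body` to `add_memory`.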

Quick Edge Discovery Commands

| Edge Type | Curl Command |
|-----------|--------------|
| Drug → Target | curl -s "https://www.ebi.ac.uk/chembl/api/data/mechanism?molecule_chembl_id={ID}&format=json" |
| Target → Drugs | curl -s "https://www.ebi.ac.uk/chembl/api/data/mechanism?target_chembl_id={ID}&format=json" |
| Drug → Disease | curl -s "https://www.ebi.ac.uk/chembl/api/data/drug_indication?molecule_chembl_id={ID}&format=json" |
| Gene → Disease | Open Targets GraphQL (see Phase 3) |
| Gene → Orthologs | curl -s "https://rest.ensembl.org/homology/id/human/{ENSG}?type=orthologues&content-type=application/json" |
| Protein Set → GO | curl -s "https://string-db.org/api/json/enrichment?identifiers={IDs}&species=9606" |
| Gene → PubMed | curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=gene&db=pubmed&id={ID}&retmode=json" |
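The three ChEMBL rows differ only in the resource name and the filter parameter, so they can be generated from a small lookup instead of being pasted by hand. This is an illustrative sketch: the `EDGE_QUERIES` keys and the `chembl_edge_url` helper are invented here, while the base URL and parameter names are taken from the commands above.

```python
from urllib.parse import urlencode

CHEMBL_BASE = "https://www.ebi.ac.uk/chembl/api/data"

# edge name -> (ChEMBL resource, query parameter that takes the ChEMBL ID)
EDGE_QUERIES = {
    "drug_to_target": ("mechanism", "molecule_chembl_id"),
    "target_to_drugs": ("mechanism", "target_chembl_id"),
    "drug_to_disease": ("drug_indication", "molecule_chembl_id"),
}


def chembl_edge_url(edge: str, chembl_id: str) -> str:
    """Build the ChEMBL URL for one of the edge lookups listed above."""
    resource, param = EDGE_QUERIES[edge]
    query = urlencode({param: chembl_id, "format": "json"})
    return f"{CHEMBL_BASE}/{resource}?{query}"


print(chembl_edge_url("target_to_drugs", "CHEMBL4860"))
```

Using `urlencode` rather than string concatenation keeps the URL valid if an ID ever contains characters that need escaping.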

Example: Drug Repurposing Graph

Build a complete subgraph for drug repurposing analysis:

# Step 1: Anchor - Resolve gene
# MCP: hgnc.search_genes("TP53") → HGNC:11998

# Step 2: Get protein context
# MCP: uniprot.get_protein("UniProtKB:P04637")
# → function mentions BCL2

# Step 3: Find BCL2 inhibitors
curl -s "https://www.ebi.ac.uk/chembl/api/data/mechanism?target_chembl_id=CHEMBL4860&format=json" \
  | jq '.mechanisms[] | {drug: .molecule_chembl_id, action: .action_type}'

# Step 4: Get drug indications
curl -s "https://www.ebi.ac.uk/chembl/api/data/drug_indication?molecule_chembl_id=CHEMBL3137309&format=json" \
  | jq '.drug_indications[:3][] | {disease: .mesh_heading, phase: .max_phase_for_ind}'

# Step 5: Find clinical trials
curl -s "https://clinicaltrials.gov/api/v2/studies?query.intr=venetoclax&filter.overallStatus=RECRUITING&pageSize=3&format=json" \
  | jq '.studies[] | {nct: .protocolSection.identificationModule.nctId}'

# Step 6: Persist to Graphiti
# MCP: graphiti.add_memory(...)

Node Types (Canonical CURIEs)

| Type | CURIE Pattern | Example |
|------|---------------|---------|
| Gene | HGNC:\d+ | HGNC:11998 |
| Protein | UniProtKB:[A-Z0-9]+ | UniProtKB:P04637 |
| Compound | CHEMBL:\d+ | CHEMBL:3137309 |
| Target | CHEMBL:\d+ | CHEMBL:4860 |
| Disease | EFO_\d+ or MONDO_\d+ | EFO_0000574 |
| Pathway | WP:WP\d+ | WP:WP1742 |
| Trial | NCT:\d+ | NCT:00461032 |
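These patterns translate directly into a validator that can reject malformed IDs before they reach the graph. The sketch below copies the patterns from the table. The `is_valid_curie` helper is hypothetical, and using `re.fullmatch` (so the whole string must match) is a choice made here.

```python
import re

# Patterns copied from the node type table; Disease accepts either prefix.
CURIE_PATTERNS = {
    "Gene": r"HGNC:\d+",
    "Protein": r"UniProtKB:[A-Z0-9]+",
    "Compound": r"CHEMBL:\d+",
    "Target": r"CHEMBL:\d+",
    "Disease": r"(?:EFO|MONDO)_\d+",
    "Pathway": r"WP:WP\d+",
    "Trial": r"NCT:\d+",
}


def is_valid_curie(node_type: str, curie: str) -> bool:
    """Check that a CURIE matches the pattern for its node type in full."""
    pattern = CURIE_PATTERNS.get(node_type)
    return pattern is not None and re.fullmatch(pattern, curie) is not None


print(is_valid_curie("Gene", "HGNC:11998"))         # canonical CURIE from the table
print(is_valid_curie("Compound", "CHEMBL3137309"))  # bare ChEMBL ID, missing the colon
```

The second call illustrates the most likely mistake in this workflow: passing the bare ID that the ChEMBL REST API returns instead of the colon-separated CURIE.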

Edge Types

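| Edge | Source | Target | Properties |
|------|--------|--------|------------|
| ENCODES | Gene | Protein | - |
| REGULATES | Gene | Gene | direction: activation/repression |
| INTERACTS | Protein | Protein | score, evidence_type |
| INHIBITOR | Compound | Target | Ki, IC50 |
| AGONIST | Compound | Target | EC50 |
| TREATS | Compound | Disease | max_phase |
| ASSOCIATED_WITH | Gene | Disease | score, evidence_sources |
| MEMBER_OF | Gene | Pathway | - |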
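The source and target columns can serve as a lightweight schema for checking edges before persistence. The following is a minimal sketch under that reading. `EDGE_SCHEMA` mirrors the table, and `edge_matches_schema` is a hypothetical helper that checks endpoint types only, not edge properties.

```python
# edge type -> (expected source node type, expected target node type)
EDGE_SCHEMA = {
    "ENCODES": ("Gene", "Protein"),
    "REGULATES": ("Gene", "Gene"),
    "INTERACTS": ("Protein", "Protein"),
    "INHIBITOR": ("Compound", "Target"),
    "AGONIST": ("Compound", "Target"),
    "TREATS": ("Compound", "Disease"),
    "ASSOCIATED_WITH": ("Gene", "Disease"),
    "MEMBER_OF": ("Gene", "Pathway"),
}


def edge_matches_schema(edge_type: str, source_type: str, target_type: str) -> bool:
    """Return True when the endpoint node types agree with the edge type table."""
    return EDGE_SCHEMA.get(edge_type) == (source_type, target_type)


print(edge_matches_schema("TREATS", "Compound", "Disease"))
print(edge_matches_schema("TREATS", "Gene", "Disease"))
```

Unknown edge types fail the check because `dict.get` returns `None`, which never equals a tuple.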

Query Best Practices

Gene Discovery (Human-Centric)

  • Default to species=9606 (human) for gene/protein searches
  • Use page_size=10 for exploration, page_size=50 for batch operations
  • Use slim=True for batch operations to reduce token usage
  • Only use organism=null for comparative genomics across species

Drug Discovery vs Repurposing

  • Drug repurposing: Use max_phase≥2 (clinical validation, shorter approval path)
  • General discovery: No phase filter (include preclinical tools, mechanism probes)
  • Check mechanisms before bioactivity data

Clinical Landscape

  • Default status=RECRUITING for active research
  • Use phase filter only for specific analysis:
    • PHASE3+ for commercialization analysis
    • PHASE1/2 for early pipeline
    • No filter for full landscape

See Also

  • lifesciences-genomics: Ensembl, NCBI, HGNC endpoints
  • lifesciences-proteomics: UniProt, STRING, BioGRID endpoints
  • lifesciences-pharmacology: ChEMBL, PubChem, IUPHAR endpoints
  • lifesciences-clinical: Open Targets, ClinicalTrials.gov endpoints

