mastering-postgresql

Name: mastering-postgresql
Rating: 65
Author: SpillwaveSolutions

by SpillwaveSolutions

PostgreSQL development for Python with full-text search, vector similarity, JSONB indexing, and production cloud deployment

⭐ 0🍴 0📅 Jan 7, 2026

agentic-skill bm25 claude-code-skill postgresql vector-search

View on GitHub Run in Manus

SKILL.md

name: mastering-postgresql description: PostgreSQL development for Python with full-text search (tsvector, tsquery, BM25 via pg_search), vector similarity (pgvector with HNSW/IVFFlat), JSONB and array indexing, and production deployment. Use when creating search features, storing AI embeddings, querying vector similarity, optimizing PostgreSQL indexes, or deploying to AWS RDS/Aurora, GCP Cloud SQL/AlloyDB, or Azure. Covers psycopg2, psycopg3, asyncpg, SQLAlchemy integration, Docker development setup, and index selection strategies. Triggers are Use "PostgreSQL search", "pgvector", "BM25 postgres", "JSONB index", "psycopg", "asyncpg", "PostgreSQL Docker", "AlloyDB vector". Does NOT cover - DBA administration (backup, replication, users), MySQL/MongoDB/Redis, schema design theory, stored procedures. allowed-tools:

Read
Bash
Write

PostgreSQL Python Development

Build search, vector similarity, and data-intensive applications with PostgreSQL and Python.

Quick Reference

Task	Go To
Docker/local setup	setup-and-docker.md
Full-text search & BM25	search-fulltext.md
pgvector & JSONB indexing	search-vectors-json.md
Python drivers & pools	python-drivers.md
Python query patterns	python-queries.md
AWS RDS/Aurora	cloud-aws.md
GCP Cloud SQL/AlloyDB	cloud-gcp.md
Azure Flexible Server	cloud-azure.md
Neon & Supabase	cloud-serverless.md
Cloud common (pooling, config)	cloud-common.md

When NOT to Use This Skill

DBA tasks: Backup strategies, replication setup, user management, security hardening
Other databases: MySQL, MongoDB, Redis, Elasticsearch-specific queries
Schema design: Normalization theory, data modeling patterns
Stored procedures: PL/pgSQL function development
Application frameworks: Django ORM specifics, FastAPI integration details

Quick Start Checklist

Copy this checklist to track progress:

Setup Progress:
- [ ] Docker environment running (docker-compose up -d)
- [ ] Connected to database (psql or Python)
- [ ] Extensions created (pgvector, pg_trgm)
- [ ] Table created with search_vector and embedding columns
- [ ] GIN index on search_vector created
- [ ] HNSW index on embedding created
- [ ] Test full-text query returns results
- [ ] Test vector query returns results

Quick Start: Search + Vectors in 5 Minutes

1. Start PostgreSQL with pgvector

# docker-compose.yml
services:
  postgres:
    image: pgvector/pgvector:pg17
    environment:
      POSTGRES_PASSWORD: devpass
    ports: ["5432:5432"]
    volumes: [pgdata:/var/lib/postgresql/data]
volumes:
  pgdata:

docker-compose up -d

# Verify container is running:
docker-compose ps
# Expected: postgres service with status "Up"

2. Enable Extensions

CREATE EXTENSION vector;      -- pgvector for embeddings
CREATE EXTENSION pg_trgm;     -- Trigram for fuzzy search

-- Verify extensions installed:
SELECT extname, extversion FROM pg_extension 
WHERE extname IN ('vector', 'pg_trgm');
-- Expected: 2 rows with version numbers

3. Create Searchable Table with Vectors

CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT,
    embedding vector(1536),
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(content, '')), 'B')
    ) STORED
);

-- Create indexes
CREATE INDEX idx_docs_search ON documents USING GIN (search_vector);
CREATE INDEX idx_docs_embedding ON documents USING hnsw (embedding vector_cosine_ops);

-- Verify indexes created:
SELECT indexname FROM pg_indexes WHERE tablename = 'documents';
-- Expected: idx_docs_search, idx_docs_embedding, documents_pkey

4. Query from Python

import asyncpg

async def search(pool, query: str, embedding: list[float], limit: int = 10):
    return await pool.fetch("""
        SELECT id, title,
               ts_rank(search_vector, websearch_to_tsquery('english', $1)) AS text_rank,
               embedding <=> $2::vector AS vector_dist
        FROM documents
        WHERE search_vector @@ websearch_to_tsquery('english', $1)
        ORDER BY vector_dist
        LIMIT $3
    """, query, embedding, limit)

# Verify connection works:
# pool = await asyncpg.create_pool('postgresql://postgres:devpass@localhost/postgres')
# rows = await pool.fetch("SELECT 1 AS test")
# assert rows[0]['test'] == 1

Decision Trees

Which Search Approach?

Need search? ─┬─► Exact keyword match ──────► B-tree index + WHERE =
              │
              ├─► Full-text search (FTS) ───► tsvector + GIN + ts_rank
              │
              ├─► Relevance like Google ────► pg_search BM25 (ParadeDB)
              │
              ├─► Typo tolerance ───────────► pg_trgm + similarity()
              │
              ├─► Semantic/AI search ───────► pgvector + embeddings
              │
              └─► Hybrid (keywords + semantic) ► Combine tsvector + pgvector

Which Vector Index?

Vector index? ─┬─► Dataset < 100K rows ────► No index (exact search OK)
               │
               ├─► Need best recall ────────► HNSW (slower build, fast query)
               │
               ├─► Fast index build ────────► IVFFlat (needs data first)
               │
               ├─► On AlloyDB ──────────────► ScaNN (Google optimized)
               │
               ├─► On Azure ────────────────► pg_diskann (32x less memory)
               │
               ├─► Billions of vectors ─────► VectorChord vchordrq (self-host)
               │
               └─► Dimensions > 2000 ───────► halfvec or binary quantization

Which Python Library?

Python lib? ──┬─► Sync, simple, stable ─────► psycopg2
              │
              ├─► Async + modern features ──► psycopg3
              │
              ├─► Max async performance ────► asyncpg
              │
              └─► ORM needed ───────────────► SQLAlchemy + asyncpg/psycopg

Which Index Type for Column?

Column type? ─┬─► Scalar (int, text, timestamp) ─► B-tree (default)
              │
              ├─► JSONB ────────────────────────┬► GIN (general queries)
              │                                 └► GIN jsonb_path_ops (@> only)
              │
              ├─► Array ────────────────────────► GIN
              │
              ├─► tsvector ─────────────────────► GIN (or GiST for updates)
              │
              ├─► vector ───────────────────────► HNSW or IVFFlat
              │
              └─► Range / Geometric ────────────► GiST

Common Patterns

For implementation details, see the reference files:

Pattern	Reference
Full-text search with ranking	search-fulltext.md#ranking-functions
BM25 search	search-fulltext.md#bm25-with-pg_search
Vector similarity	search-vectors-json.md#distance-operators
JSONB containment	search-vectors-json.md#jsonb-indexing
Array overlap	search-vectors-json.md#array-indexing
Bulk insert	python-queries.md#bulk-insert-strategies
Connection pool	python-drivers.md#asyncpg-async-only

Index Tuning Quick Reference

HNSW Parameters

Parameter	Default	Guidance
`m`	16	Higher = better recall, more memory. 12-48 typical
`ef_construction`	64	Higher = better index quality, slower build. 64-200
`hnsw.ef_search`	40	Set at query time. Higher = better recall, slower

CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=100);
SET hnsw.ef_search = 100;  -- Before querying

-- Verify setting applied:
SHOW hnsw.ef_search;

IVFFlat Parameters

Parameter	Guidance
`lists`	sqrt(rows) for <1M rows; rows/1000 for >1M
`ivfflat.probes`	Start at sqrt(lists), increase for recall

CREATE INDEX ON docs USING ivfflat (embedding vector_l2_ops) WITH (lists=100);
SET ivfflat.probes = 10;

Troubleshooting Quick Reference

Symptom	Likely Cause	Fix
Seq Scan on indexed column	Stats outdated	`ANALYZE tablename;`
Vector search slow	No index or low ef_search	Create HNSW index, increase ef_search
Poor vector recall	IVFFlat probes too low	Increase `ivfflat.probes`
FTS not matching	Wrong language config	Check `to_tsvector('english', ...)`
Index not used	Query doesn't match ops	Verify operator class matches query
Connection timeout	Pool exhausted	Increase pool size or fix leaks
Extension not found	Not installed	`CREATE EXTENSION name;`
HNSW build OOM	Insufficient memory	Increase `maintenance_work_mem`
Filtered queries return few results	Filtering after index scan	Enable `hnsw.iterative_scan`
Connection drops in production	No health checking	Use `check=ConnectionPool.check_connection`
Scaling past 100M vectors	pgvector limits	Consider VectorChord vchordrq

For detailed troubleshooting, see search-vectors-json.md.

Script Usage

pip install -r scripts/requirements.txt  # Install dependencies first

Script	Purpose	When to Use
`setup_extensions.py`	Install pgvector, pg_trgm extensions	Initial database setup
`create_search_tables.py`	Create tables with search_vector and embedding columns	After extensions installed
`health_check.py`	Check index health, bloat, and performance	Diagnosing slow queries
`vector_search.py --demo`	Demonstrate vector similarity queries	Learning pgvector patterns
`bulk_insert.py`	High-performance data loading	Importing large datasets
`fts_examples.py`	Full-text search query examples	Learning FTS syntax
`connection_pool.py`	Connection pooling patterns	Production deployments

Example:

python scripts/setup_extensions.py --host localhost --dbname mydb
python scripts/create_search_tables.py --host localhost --dbname mydb
python scripts/health_check.py --host localhost --dbname mydb

Cloud Quick Reference

Provider	pgvector	BM25 Support	Connection Pooling
AWS RDS/Aurora	0.8.0	pg_textsearch (preview)	RDS Proxy
GCP Cloud SQL	0.8.0	pg_textsearch (preview)	Cloud SQL Proxy
GCP AlloyDB	0.8.0 + ScaNN	pg_textsearch (preview)	Built-in
Azure Flexible	0.8.0 + pg_diskann	pg_textsearch (preview)	Built-in PgBouncer
Neon	✅	pg_search	Built-in
Supabase	✅	pg_search	Built-in

Serverless options: Neon (scale-to-zero, instant branching) and Supabase (BaaS with auth/real-time) are ideal for dev/test and startups. See cloud-serverless.md.

BM25 Options:

pg_search (ParadeDB): Production-ready, self-host or ParadeDB managed service
pg_textsearch (TigerData): Preview status, available on managed PostgreSQL services

See provider-specific files for setup commands: AWS | GCP | Azure

Reference Files

Load these for detailed implementation guidance:

Reference	Load When
setup-and-docker.md	Docker setup, extension installation, postgresql.conf tuning
search-fulltext.md	Full-text search (FTS), BM25 setup, trigram fuzzy search
search-vectors-json.md	pgvector tuning, JSONB/array indexing, maintenance
python-drivers.md	psycopg2/psycopg3/asyncpg, connection pools, SQLAlchemy
python-queries.md	Bulk inserts, FTS queries, vector queries, JSONB operations
cloud-aws.md	AWS RDS/Aurora setup, RDS Proxy
cloud-gcp.md	GCP Cloud SQL/AlloyDB, ScaNN indexes
cloud-azure.md	Azure Flexible Server, pg_diskann
cloud-serverless.md	Neon, Supabase (scale-to-zero, branching)
cloud-common.md	Extension matrix, pooling, production config, costs

Score

Total Score

65/100

Based on repository quality metrics

✓SKILL.md

SKILL.mdファイルが含まれている

+20

○LICENSE

ライセンスが設定されている

0/10

✓説明文

100文字以上の説明がある

+10

○人気

GitHub Stars 100以上

0/15

✓最近の活動

3ヶ月以内に更新

○フォーク

10回以上フォークされている

0/5

✓Issue管理

オープンIssueが50未満

✓言語

プログラミング言語が設定されている

✓タグ

1つ以上のタグが設定されている

Reviews

💬

Reviews coming soon

mastering-postgresql

SKILL.md

PostgreSQL Python Development

Quick Reference

When NOT to Use This Skill

Quick Start Checklist

Quick Start: Search + Vectors in 5 Minutes

1. Start PostgreSQL with pgvector

2. Enable Extensions

3. Create Searchable Table with Vectors

4. Query from Python

Decision Trees

Which Search Approach?

Which Vector Index?

Which Python Library?

Which Index Type for Column?

Common Patterns

Index Tuning Quick Reference

HNSW Parameters

IVFFlat Parameters

Troubleshooting Quick Reference

Script Usage

Cloud Quick Reference

Reference Files

Score

Reviews

analytics-events

docs-review

clojure-write

add-malli-schemas

docs-write

clojure-review

mastering-postgresql

SKILL.md

PostgreSQL Python Development

Quick Reference

When NOT to Use This Skill

Quick Start Checklist

Quick Start: Search + Vectors in 5 Minutes

1. Start PostgreSQL with pgvector

2. Enable Extensions

3. Create Searchable Table with Vectors

4. Query from Python

Decision Trees

Which Search Approach?

Which Vector Index?

Which Python Library?

Which Index Type for Column?

Common Patterns

Index Tuning Quick Reference

HNSW Parameters

IVFFlat Parameters

Troubleshooting Quick Reference

Script Usage

Cloud Quick Reference

Reference Files

Score

Reviews

Related

Related Skills

analytics-events

docs-review

clojure-write

add-malli-schemas

docs-write

clojure-review