FortiumPartners

using-weaviate

by FortiumPartners

Ensemble Plugin Ecosystem - Modular Claude Code plugins for AI-augmented development workflows

0🍴 1📅 Jan 22, 2026

SKILL.md


name: using-weaviate
description: Weaviate vector database for semantic search, hybrid queries, and AI-native applications. Use for embeddings storage, similarity search, RAG pipelines, and multi-modal retrieval.

Weaviate Vector Database Skill

Version: 1.0.0 | Target: <500 lines | Purpose: Fast reference for Weaviate operations


Overview

What is Weaviate: Open-source vector database for AI-native applications combining vector search with structured filtering and keyword search.

When to Use This Skill:

  • Storing and querying vector embeddings
  • Implementing semantic/similarity search
  • Building RAG (Retrieval-Augmented Generation) pipelines
  • Hybrid search (vector + keyword)
  • Multi-tenant vector applications

Auto-Detection Triggers:

  • weaviate-client in requirements.txt or pyproject.toml
  • weaviate-client or weaviate-ts-client in package.json
  • WEAVIATE_URL, WEAVIATE_API_KEY, or WCD_URL environment variables
  • docker-compose.yml with semitechnologies/weaviate image

Progressive Disclosure:

  • This file (SKILL.md): Quick reference for immediate use
  • REFERENCE.md: Comprehensive patterns, modules, and advanced configuration

Table of Contents

  1. Core Concepts
  2. Quick Start
  3. CLI Decision Tree
  4. Collection Schema
  5. Data Operations
  6. Search Operations
  7. Generative Search (RAG)
  8. Multi-Tenancy
  9. Docker Setup
  10. Error Handling
  11. Best Practices
  12. Quick Reference Card
  13. Agent Integration

Core Concepts

| Concept | Description |
|---|---|
| Collection | Schema definition for a data type (formerly "Class") |
| Object | Individual data item with properties and vector |
| Vector | Numerical representation of data for similarity search |
| Module | Plugin for vectorization, generative AI, or reranking |
| Tenant | Isolated data partition for multi-tenant applications |
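
To make the Vector concept concrete, the following sketch ranks two toy vectors by cosine distance in plain Python. The `cosine_distance` helper and the three-dimensional vectors are illustrative assumptions; Weaviate computes distances internally, and cosine is only one of the metrics it supports.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (real embeddings have hundreds of dimensions).
query = [1.0, 0.0, 1.0]
docs = {"similar": [0.9, 0.1, 1.1], "unrelated": [0.0, 1.0, 0.0]}

# Rank documents by ascending distance, as a vector search would.
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
print(ranked)
```

A smaller distance means a closer match, which is why search results later in this file report `distance` rather than a score.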

Quick Start

Python Setup

import weaviate
from weaviate.classes.init import Auth

# Connect to Weaviate Cloud (recommended: use context manager)
with weaviate.connect_to_weaviate_cloud(
    cluster_url="https://your-cluster.weaviate.network",
    auth_credentials=Auth.api_key("your-wcd-api-key"),
    headers={"X-OpenAI-Api-Key": "your-openai-key"}
) as client:
    print(client.is_ready())  # True

# Or connect to a local instance (call client.close() when finished)
client = weaviate.connect_to_local()

TypeScript Setup

import weaviate, { WeaviateClient } from 'weaviate-client';

const client: WeaviateClient = await weaviate.connectToWeaviateCloud(
  'https://your-cluster.weaviate.network',
  { authCredentials: new weaviate.ApiKey('your-wcd-api-key') }
);
await client.close();

Environment Variables

export WEAVIATE_URL="https://your-cluster.weaviate.network"  # or http://localhost:8080
export WEAVIATE_API_KEY="your-wcd-api-key"
export OPENAI_API_KEY="sk-..."
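
One possible way to consume these variables is a small settings loader that decides between a local and a cloud connection. This is a sketch under assumptions: `WeaviateSettings` and `load_settings` are hypothetical names, and treating `localhost` URLs as "local" is a convention chosen here, not something the client enforces.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WeaviateSettings:
    url: str
    api_key: Optional[str]
    is_local: bool


def load_settings(env: Mapping[str, str] = os.environ) -> WeaviateSettings:
    """Read WEAVIATE_URL and WEAVIATE_API_KEY and decide local vs. cloud."""
    url = env.get("WEAVIATE_URL", "http://localhost:8080")
    api_key = env.get("WEAVIATE_API_KEY") or None
    is_local = url.startswith(("http://localhost", "http://127.0.0.1"))
    # A remote cluster without credentials is almost certainly a misconfiguration.
    if not is_local and api_key is None:
        raise ValueError("WEAVIATE_API_KEY is required for a non-local WEAVIATE_URL")
    return WeaviateSettings(url=url, api_key=api_key, is_local=is_local)


settings = load_settings({"WEAVIATE_URL": "http://localhost:8080"})
print(settings.is_local)
```

With the result in hand, `settings.is_local` would select `weaviate.connect_to_local()`, and otherwise `weaviate.connect_to_weaviate_cloud(cluster_url=settings.url, auth_credentials=Auth.api_key(settings.api_key))` as in the Python setup above.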

CLI Decision Tree

User wants to...
├── Connect to Weaviate
│   ├── Cloud (WCD) ─────────► weaviate.connect_to_weaviate_cloud()
│   ├── Local Docker ────────► weaviate.connect_to_local()
│   └── Custom URL ──────────► weaviate.connect_to_custom()
│
├── Create collection
│   ├── With auto-vectorization ► Configure.Vectorizer.text2vec_openai()
│   └── Bring own vectors ──────► Configure.Vectorizer.none()
│
├── Insert data
│   ├── Single object ──────► collection.data.insert()
│   ├── Bulk import ────────► collection.batch.dynamic()
│   └── With custom vector ─► DataObject(properties=..., vector=...)
│
├── Search data
│   ├── Semantic search ────► query.near_text() or query.near_vector()
│   ├── Keyword search ─────► query.bm25()
│   ├── Hybrid search ──────► query.hybrid()
│   └── With filters ───────► filters=Filter.by_property()
│
├── RAG / Generative
│   ├── Single prompt ──────► generate.near_text(single_prompt=...)
│   └── Grouped task ───────► generate.near_text(grouped_task=...)
│
└── Multi-tenancy
    ├── Create tenant ──────► collection.tenants.create()
    └── Query tenant ───────► collection.with_tenant("name")

Collection Schema

Create with Vectorizer

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    name="Article",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small"
    ),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
        Property(name="view_count", data_type=DataType.INT)
    ]
)

Property Data Types

| Type | Python | Description |
|---|---|---|
| TEXT | str | Tokenized text, searchable |
| INT | int | Integer numbers |
| NUMBER | float | Floating point |
| BOOLEAN | bool | True/False |
| DATE | datetime | ISO 8601 date |
| OBJECT | dict | Nested object |
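
When drafting a schema from sample records, the mapping in the table above can be applied mechanically. The sketch below is a hypothetical helper (`infer_data_type` is not part of the Weaviate client) that returns the type name as a string; the sample record is made up.

```python
from datetime import datetime, timezone


def infer_data_type(value: object) -> str:
    """Map a Python value to the matching type name from the table above."""
    # bool must be checked before int: isinstance(True, int) is True in Python.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        return "NUMBER"
    if isinstance(value, str):
        return "TEXT"
    if isinstance(value, datetime):
        return "DATE"
    if isinstance(value, dict):
        return "OBJECT"
    raise TypeError(f"no mapping for {type(value).__name__}")


# Made-up sample record used only to exercise the helper.
record = {
    "title": "Introduction to Vector Databases",
    "view_count": 2000,
    "rating": 4.5,
    "published": True,
    "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
inferred = {name: infer_data_type(value) for name, value in record.items()}
print(inferred)
```

The ordering of the checks matters: testing `int` first would classify `True` as INT, and a mismatch of that kind is one way to end up with a schema validation error at insert time.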

See REFERENCE.md: Full data types, vectorizer modules, index configuration


Data Operations

Insert Single Object

articles = client.collections.get("Article")

uuid = articles.data.insert(
    properties={
        "title": "Introduction to Vector Databases",
        "content": "Vector databases store embeddings...",
        "category": "Technology"
    }
)
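
Batch Insert

articles = client.collections.get("Article")

with articles.batch.dynamic() as batch:
    for item in data:  # data: an iterable of property dicts
        batch.add_object(properties=item)

        # Check the running error count INSIDE the context manager
        if batch.number_errors > 10:
            break

# Inspect the failed objects once the context manager has exited
for obj in articles.batch.failed_objects[:5]:
    print(f"Error: {obj.message}")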

Update and Delete

# Update properties
articles.data.update(uuid="...", properties={"view_count": 2000})

# Delete by UUID
articles.data.delete_by_id("12345678-...")

# Delete by filter
from weaviate.classes.query import Filter
articles.data.delete_many(
    where=Filter.by_property("category").equal("Outdated")
)

Search Operations

Vector Search (Semantic)

from weaviate.classes.query import MetadataQuery

response = articles.query.near_text(
    query="machine learning algorithms",
    limit=5,
    return_metadata=MetadataQuery(distance=True)
)

for obj in response.objects:
    print(f"{obj.properties['title']} (distance: {obj.metadata.distance})")

Hybrid Search (Vector + Keyword)

response = articles.query.hybrid(
    query="neural network optimization",
    alpha=0.5,  # 0=keyword only, 1=vector only
    limit=10
)
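
To build intuition for `alpha`, the sketch below blends a keyword score and a vector score for two hypothetical documents. It is a deliberately simplified weighted average over already-normalized scores, not Weaviate's actual fusion algorithm; the `blend` helper and the score values are assumptions for illustration.

```python
def blend(keyword_score: float, vector_score: float, alpha: float) -> float:
    """Weighted blend: alpha=0 keeps only the keyword score, alpha=1 only the vector score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1.0 - alpha) * keyword_score + alpha * vector_score


# Hypothetical normalized scores (0..1) for two documents.
scores = {
    "exact-term match": {"keyword": 0.9, "vector": 0.3},
    "paraphrase": {"keyword": 0.1, "vector": 0.8},
}

for alpha in (0.0, 0.5, 1.0):
    ranked = sorted(
        scores,
        key=lambda doc: blend(scores[doc]["keyword"], scores[doc]["vector"], alpha),
        reverse=True,
    )
    print(alpha, ranked[0])
```

In this toy data the exact-term document wins until `alpha` leans heavily toward the vector side, which matches the advice to lower `alpha` when queries contain precise technical terms.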

Filtered Search

from weaviate.classes.query import Filter

response = articles.query.near_text(
    query="artificial intelligence",
    filters=(
        Filter.by_property("category").equal("Technology") &
        Filter.by_property("view_count").greater_than(1000)
    ),
    limit=10
)

Filter Operators

| Operator | Usage |
|---|---|
| equal | .equal(value) |
| not_equal | .not_equal(value) |
| greater_than | .greater_than(value) |
| less_than | .less_than(value) |
| like | .like("pattern*") |
| contains_any | .contains_any([...]) |

See REFERENCE.md: Aggregations, reranking, advanced filter patterns


Generative Search (RAG)

Configure and Query

from weaviate.classes.config import Configure

# Create with generative module
client.collections.create(
    name="KnowledgeBase",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    generative_config=Configure.Generative.openai(model="gpt-4o"),
    properties=[...]
)

kb = client.collections.get("KnowledgeBase")

# Single object generation
response = kb.generate.near_text(
    query="quantum computing",
    single_prompt="Summarize: {content}",
    limit=1
)
print(response.objects[0].generated)

# Grouped generation (RAG)
response = kb.generate.near_text(
    query="best practices",
    grouped_task="Based on these, provide 5 recommendations:",
    limit=5
)
print(response.generated)
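
In `single_prompt`, a placeholder in braces such as `{content}` refers to a property of each retrieved object, so the model receives one prompt per object. The sketch below mimics that substitution locally with `str.format` to preview the prompts; it is an illustration rather than Weaviate's implementation, and `render_single_prompt` and the sample objects are hypothetical.

```python
def render_single_prompt(template: str, properties: dict[str, object]) -> str:
    """Fill {property} placeholders from one object's properties."""
    try:
        return template.format(**properties)
    except KeyError as missing:
        raise ValueError(f"prompt references unknown property {missing}") from None


# Made-up objects standing in for search results.
objects = [
    {"title": "Qubits", "content": "Qubits can be in superposition."},
    {"title": "Gates", "content": "Quantum gates transform qubit states."},
]

prompts = [render_single_prompt("Summarize: {content}", obj) for obj in objects]
for prompt in prompts:
    print(prompt)
```

Previewing prompts this way can catch a placeholder that names a property missing from the collection before any generative call is made.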

See REFERENCE.md: All generative modules, reranking configuration


Multi-Tenancy

Enable and Use

from weaviate.classes.config import Configure
from weaviate.classes.tenants import Tenant

# Create multi-tenant collection
client.collections.create(
    name="CustomerData",
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[...]
)

# Create tenants
collection = client.collections.get("CustomerData")
collection.tenants.create([
    Tenant(name="customer_123"),
    Tenant(name="customer_456")
])

# Query specific tenant
tenant_data = collection.with_tenant("customer_123")
tenant_data.data.insert(properties={"name": "Item 1"})
response = tenant_data.query.near_text(query="search", limit=10)

See REFERENCE.md: Tenant management, activity status, offloading


Docker Setup

Basic docker-compose.yml

version: '3.8'

services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.2
    restart: unless-stopped
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,generative-openai'
      OPENAI_APIKEY: ${OPENAI_API_KEY}
    volumes:
      - weaviate_data:/var/lib/weaviate

volumes:
  weaviate_data:

Start and Verify

# Start
docker-compose up -d

# Check status
curl http://localhost:8080/v1/.well-known/ready

# View logs
docker-compose logs -f weaviate
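
For scripts that must wait for the container, the readiness check can be polled from Python instead of `curl`. The sketch below uses only the standard library; `http_ready` and `wait_until_ready` are hypothetical helpers, and the attempt count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

READY_URL = "http://localhost:8080/v1/.well-known/ready"


def http_ready(url: str = READY_URL, timeout: float = 2.0) -> bool:
    """Return True when the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(
    probe: Callable[[], bool] = http_ready,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` right after `docker-compose up -d` blocks until the endpoint responds or the attempts run out. The `probe` parameter exists so the loop can be exercised without a running container.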

See REFERENCE.md: Production configuration, authentication, multiple modules


Error Handling

Common Errors

| Error | Cause | Solution |
|---|---|---|
| ConnectionError | Weaviate not reachable | Check URL, Docker running |
| AuthenticationError | Invalid API key | Verify WEAVIATE_API_KEY |
| UnexpectedStatusCodeError 422 | Schema validation | Check property types |
| ObjectAlreadyExistsError | Duplicate UUID | Use update or new UUID |

Error Pattern

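import logging
import time

from weaviate.exceptions import (
    WeaviateConnectionError,
    UnexpectedStatusCodeError,
    ObjectAlreadyExistsException
)

try:
    uuid = collection.data.insert(properties=data)
except ObjectAlreadyExistsException:
    logging.warning("Object exists, updating instead")
except UnexpectedStatusCodeError as e:
    if e.status_code == 429:
        time.sleep(5)  # Rate limited: back off before the caller retries
    raise
except WeaviateConnectionError:
    logging.error("Weaviate not reachable: check the URL and that Docker is running")
    raise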
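
The fixed five-second sleep above can be generalized into a retry loop with exponential backoff. The following is a generic sketch, not a Weaviate API: `retry_with_backoff` is a hypothetical helper, and `RateLimited` is a stand-in exception so the example runs without a server.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retriable: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on `retriable` errors with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt, plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


class RateLimited(Exception):
    """Stand-in for a 429 response; hypothetical, not a Weaviate exception."""


attempts = {"count": 0}


def flaky_insert() -> str:
    # Fails twice, then succeeds, to simulate a temporary rate limit.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "inserted"


result = retry_with_backoff(flaky_insert, (RateLimited,), sleep=lambda _: None)
print(result, attempts["count"])
```

In real use, `operation` would wrap the insert call and `retriable` would name the client exceptions worth retrying. Non-retriable errors such as schema validation failures should propagate immediately.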

See REFERENCE.md: Comprehensive error handling, retry patterns


Best Practices

  1. Use context managers for automatic cleanup
  2. Batch for bulk operations (10-100x faster)
  3. Filter before vector search to reduce computation
  4. Use multi-tenancy for customer isolation
  5. Tune hybrid alpha: start at 0.5, then lower it (more keyword weight) for queries with exact technical terms

Anti-Patterns to Avoid

  • Blocking sync calls in async code
  • Ignoring batch errors (check inside context manager)
  • Over-fetching properties (specify only needed ones)
  • Individual inserts for bulk data
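
As an illustration of the first anti-pattern, one way to keep a blocking call from stalling an event loop is to run it in a worker thread with `asyncio.to_thread` (Python 3.9+). In this sketch `blocking_query` is a stand-in for any synchronous client call; it only sleeps to simulate latency and does not contact Weaviate.

```python
import asyncio
import time


def blocking_query(term: str) -> str:
    """Stand-in for a synchronous client call; sleeps to simulate network latency."""
    time.sleep(0.1)
    return f"results for {term}"


async def search(term: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_query, term)


async def main() -> list[str]:
    # The two searches overlap instead of running back to back.
    return await asyncio.gather(search("vectors"), search("tenants"))


results = asyncio.run(main())
print(results)
```

Calling `blocking_query` directly inside `search` would freeze every other coroutine for the duration of the call, which is the behaviour this anti-pattern warns against.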

See REFERENCE.md: Index optimization, compression, production readiness checklist


Quick Reference Card

# Connect
client = weaviate.connect_to_local()
client = weaviate.connect_to_weaviate_cloud(url, auth_credentials=Auth.api_key(key))

# Collection
client.collections.create(name="...", vectorizer_config=..., properties=[...])
collection = client.collections.get("...")

# Insert
collection.data.insert(properties={...})
with collection.batch.dynamic() as batch: batch.add_object(properties={...})

# Search
collection.query.near_text(query="...", limit=10)
collection.query.hybrid(query="...", alpha=0.5, limit=10)

# RAG
collection.generate.near_text(query="...", single_prompt="...", limit=1)

# Multi-tenant
collection.with_tenant("tenant_name").query.near_text(...)

Agent Integration

| Agent | Use Case |
|---|---|
| backend-developer | Vector search, RAG pipelines |
| deep-debugger | Query performance, index optimization |
| infrastructure-developer | Docker/Kubernetes deployment |

Handoff to Deep-Debugger: Slow queries, index issues, batch failures. Provide query patterns, schema, errors.

