
cloudflare-workers-edge-ai-development

by bobmatnyc

Dynamic RAG-powered skills service for code assistants via MCP - Vector + Knowledge Graph hybrid search for intelligent skill discovery

8 forks · Jan 18, 2026

SKILL.md


---
name: Cloudflare Workers & Edge AI Development
skill_id: cloudflare-edge-ai
version: 1.0.0
description: Build ultra-low-latency edge computing applications with Cloudflare Workers, Workers AI for LLM inference, V8 isolates, Durable Objects, and serverless patterns deployed across 330+ data centers worldwide
category: Cloud & Serverless
tags:
  - cloudflare
  - edge-computing
  - serverless
  - workers
  - workers-ai
  - v8-isolates
  - durable-objects
  - edge-ai
  - low-latency
  - wrangler
author: mcp-skillset
license: MIT
created: 2025-11-25
last_updated: 2025-11-25
toolchain:
  - Cloudflare Workers
  - Wrangler CLI
  - Workers AI
  - TypeScript
frameworks:
  - Cloudflare Workers
  - Hono (lightweight framework)
  - Vectorize (vector DB)
related_skills:
  - fastapi-web-development
  - terraform-infrastructure
  - web3-blockchain
---

Cloudflare Workers & Edge AI Development

Overview

Master Cloudflare Workers, a serverless platform with sub-millisecond cold starts, and Workers AI for running LLMs at the edge. Deploy across 330+ data centers globally for ultra-low-latency applications that run close to your users.

When to Use This Skill

  • Building globally distributed APIs with <50ms latency
  • Running AI/LLM inference at the edge (Workers AI)
  • Creating serverless backends without managing infrastructure
  • Implementing edge middleware (auth, rate limiting, A/B testing)
  • Building real-time collaborative applications (Durable Objects)
  • Processing high-traffic workloads cost-effectively
  • Deploying static sites with dynamic edge logic

Core Principles

1. V8 Isolates (Not Containers)

Workers run in V8 isolates, which start far faster than containers:

// Basic Worker structure
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // ✅ <1ms cold start (V8 isolate, not container!)
    // ✅ Runs in 330+ locations automatically
    // ✅ 0ms warm start (keeps isolate alive)

    return new Response("Hello from the edge!", {
      headers: { "Content-Type": "text/plain" }
    });
  }
};

// WRONG: Don't use Node.js APIs (not available)
// import fs from 'fs';  // ❌ No filesystem
// process.env.VAR;      // ❌ No process object

// CORRECT: Use Workers APIs
const value = await env.MY_KV_NAMESPACE.get("key");  // ✅ KV storage
const response = await fetch("https://api.example.com");  // ✅ fetch API

Key Differences from Node.js/Lambda:

  • ❌ No filesystem access
  • ❌ No Node.js built-ins by default (fs, http, Node's crypto; use Web Crypto instead)
  • ✅ Web Standard APIs (fetch, Request, Response, WebSockets)
  • ✅ Sub-millisecond cold starts
  • ✅ No VPC configuration needed
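Because only Web Standard APIs are available, everyday tasks like parsing query strings use the global `URL` class rather than Node's `url` module. A minimal sketch (the function name is illustrative):

```typescript
// Parse a query parameter with the Web Standard URL API, which is
// available in Workers, browsers, and modern Node alike.
function getQueryParam(requestUrl: string, name: string): string | null {
  const url = new URL(requestUrl);
  return url.searchParams.get(name);
}
```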

2. Workers AI - LLM Inference at the Edge

// Run LLaMA 2, Mistral, or other models at the edge
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = await request.json();

    // Text generation with LLaMA 2
    const response = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
      messages: [
        { role: "system", content: "You are a helpful assistant" },
        { role: "user", content: prompt }
      ]
    });

    return Response.json(response);
  }
};

// Image generation
const image = await env.AI.run("@cf/stabilityai/stable-diffusion-xl-base-1.0", {
  prompt: "A futuristic city at sunset"
});

// Text embeddings (for vector search)
const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: "Hello, world!"
});

// Image classification
const result = await env.AI.run("@cf/microsoft/resnet-50", {
  image: imageBytes
});

Workers AI Benefits:

  • 🚀 <50ms inference latency globally
  • 💰 Pay only for inference time (no idle costs)
  • 🌍 Runs in 330+ locations automatically
  • 🔒 Data never leaves Cloudflare's network

3. KV Storage for Edge Data

// wrangler.toml
# kv_namespaces = [
#   { binding = "MY_KV", id = "xxxxx", preview_id = "yyyyy" }
# ]

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const key = url.pathname.slice(1);

    // Read from KV (eventually consistent)
    const value = await env.MY_KV.get(key);

    if (value === null) {
      return new Response("Not found", { status: 404 });
    }

    // Write to KV
    await env.MY_KV.put("user:123", JSON.stringify({ name: "Alice" }), {
      expirationTtl: 3600,  // Expire in 1 hour
      metadata: { createdAt: Date.now() }
    });

    // List keys
    const list = await env.MY_KV.list({ prefix: "user:" });

    // Delete key
    await env.MY_KV.delete(key);

    return new Response(value);
  }
};

KV Best Practices:

  • ✅ Use for read-heavy workloads (caching, config)
  • ✅ Values up to 25 MB
  • ✅ Eventually consistent (may take 60s to propagate)
  • ❌ Not for transactional data or immediate consistency
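A common pattern on top of KV is cache-aside: read, and on a miss compute the value and store it with a TTL. A sketch, using a hypothetical `KVLike` subset of the KV binding interface and an in-memory stub in place of a real namespace:

```typescript
// Minimal subset of the KVNamespace interface this helper needs.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

// Cache-aside: return the cached value if present, otherwise compute,
// store with a TTL, and return the fresh value.
async function cachedGet(
  kv: KVLike,
  key: string,
  compute: () => Promise<string>,
  ttlSeconds = 3600
): Promise<string> {
  const hit = await kv.get(key);
  if (hit !== null) return hit;
  const fresh = await compute();
  await kv.put(key, fresh, { expirationTtl: ttlSeconds });
  return fresh;
}

// In-memory stub standing in for a real KV binding (local testing only).
class MemoryKV implements KVLike {
  private store = new Map<string, string>();
  async get(key: string) { return this.store.get(key) ?? null; }
  async put(key: string, value: string) { this.store.set(key, value); }
}
```

In a Worker, the `kv` argument would be a binding such as `env.MY_KV`; the stub exists so the helper can be unit-tested outside the runtime.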

4. Durable Objects for Stateful Logic

// Durable Object - Single-instance, strongly consistent storage
export class RateLimiter {
  state: DurableObjectState;

  constructor(state: DurableObjectState, env: Env) {
    this.state = state;
  }

  async fetch(request: Request) {
    const count = (await this.state.storage.get<number>("count")) ?? 0;
    const newCount = count + 1;

    if (newCount > 100) {
      return new Response("Rate limit exceeded", { status: 429 });
    }

    await this.state.storage.put("count", newCount);
    return new Response(`Request ${newCount}/100`);
  }
}

// Worker that uses Durable Object
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Get Durable Object instance (one per user)
    const id = env.RATE_LIMITER.idFromName("user:123");
    const stub = env.RATE_LIMITER.get(id);

    // Forward request to Durable Object
    return stub.fetch(request);
  }
};

// wrangler.toml
# [[durable_objects.bindings]]
# name = "RATE_LIMITER"
# class_name = "RateLimiter"
# script_name = "my-worker"

Durable Objects Use Cases:

  • ✅ Real-time collaboration (Google Docs-style)
  • ✅ WebSocket connections
  • ✅ Distributed locks and coordination
  • ✅ Game servers, chat rooms
  • ✅ Strongly consistent counters
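The counter above never resets; a production limiter usually bounds requests per time window. That decision logic can be factored into a pure function, which keeps the Durable Object thin and unit-testable. A sketch of a fixed-window variant (all names are illustrative):

```typescript
// Pure decision logic a Durable Object could run on each request:
// allow up to `limit` requests per window, resetting when it elapses.
interface WindowState { count: number; windowStart: number }

function checkRateLimit(
  state: WindowState | undefined,
  now: number,
  limit = 100,
  windowMs = 60_000
): { allowed: boolean; state: WindowState } {
  // No prior state, or the window has elapsed: start a fresh window.
  if (!state || now - state.windowStart >= windowMs) {
    return { allowed: true, state: { count: 1, windowStart: now } };
  }
  if (state.count >= limit) {
    return { allowed: false, state };
  }
  return { allowed: true, state: { count: state.count + 1, windowStart: state.windowStart } };
}
```

The Durable Object would load `state` from `this.state.storage`, call `checkRateLimit(state, Date.now())`, and persist the returned state.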
5. Vectorize - Vector Search at the Edge

// Vector database at the edge
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { text } = await request.json();

    // Generate embedding with Workers AI
    const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: text
    });

    // Insert into vector database
    await env.VECTORIZE.insert([
      {
        id: "doc-1",
        values: embedding.data[0],
        metadata: { text: text }
      }
    ]);

    // Query similar vectors
    const results = await env.VECTORIZE.query(embedding.data[0], {
      topK: 5,
      returnMetadata: true
    });

    return Response.json(results);
  }
};

// wrangler.toml
# [[vectorize]]
# binding = "VECTORIZE"
# index_name = "my-index"
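Vectorize ranks matches by the distance metric chosen when the index is created; with the cosine metric, scores correspond to the familiar cosine-similarity computation, sketched here for intuition:

```typescript
// Cosine similarity between two embedding vectors: the dot product
// divided by the product of the vectors' Euclidean norms.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical directions score 1, orthogonal vectors score 0, which is why `topK` results from a cosine index come back ordered by this value.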

Best Practices

Request Routing & Middleware

import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { bearerAuth } from 'hono/bearer-auth';

const app = new Hono<{ Bindings: Env }>();

// CORS middleware
app.use('/*', cors({
  origin: ['https://example.com'],
  allowMethods: ['GET', 'POST', 'PUT', 'DELETE'],
}));

// Authentication middleware
app.use('/api/*', bearerAuth({ token: 'secret-token' }));

// Rate limiting
app.use('/api/*', async (c, next) => {
  const ip = c.req.header('cf-connecting-ip');
  const id = c.env.RATE_LIMITER.idFromName(ip);
  const limiter = c.env.RATE_LIMITER.get(id);

  const response = await limiter.fetch(c.req.raw);
  if (response.status === 429) {
    return c.text('Rate limit exceeded', 429);
  }

  await next();
});

// Routes
app.get('/api/users/:id', async (c) => {
  const userId = c.req.param('id');
  const user = await c.env.MY_KV.get(`user:${userId}`);
  if (user === null) return c.notFound();
  return c.json(JSON.parse(user));
});

app.post('/api/chat', async (c) => {
  const { message } = await c.req.json();

  const response = await c.env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
    messages: [{ role: "user", content: message }]
  });

  return c.json(response);
});

export default app;

Caching Strategies

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const cacheUrl = new URL(request.url);
    const cacheKey = new Request(cacheUrl.toString(), request);
    const cache = caches.default;

    // Check cache first
    let response = await cache.match(cacheKey);

    if (!response) {
      // Cache miss - fetch from origin
      response = await fetch(request);

      // Clone response to cache it
      response = new Response(response.body, response);
      response.headers.set("Cache-Control", "public, max-age=3600");

      // Don't await cache.put (let it run in background)
      ctx.waitUntil(cache.put(cacheKey, response.clone()));
    }

    return response;
  }
};
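Cache hit rates improve when equivalent URLs map to the same key. One option is to normalize the URL before building `cacheKey`: drop tracking parameters and sort the rest. A sketch (the dropped-parameter list is just an example):

```typescript
// Normalize a URL into a stable cache key: remove tracking params and
// sort the remaining query string so equivalent requests share one entry.
function normalizeCacheKey(
  rawUrl: string,
  dropParams: string[] = ["utm_source", "utm_medium", "utm_campaign"]
): string {
  const url = new URL(rawUrl);
  for (const p of dropParams) url.searchParams.delete(p);
  url.searchParams.sort();
  return url.toString();
}
```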

Environment Variables & Secrets

// wrangler.toml
# [vars]
# API_URL = "https://api.example.com"
#
# Run: wrangler secret put API_KEY
# (for sensitive values)

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const apiUrl = env.API_URL;       // From [vars]
    const apiKey = env.API_KEY;       // From secrets

    const response = await fetch(`${apiUrl}/data`, {
      headers: { "Authorization": `Bearer ${apiKey}` }
    });

    return response;
  }
};

Common Patterns

A/B Testing at the Edge

export default {
  async fetch(request: Request): Promise<Response> {
    const cookie = request.headers.get('cookie') || '';
    const variant = cookie.includes('variant=b') ? 'B' : 'A';

    // Or use hash of IP for consistent assignment
    // const ip = request.headers.get('cf-connecting-ip');
    // const variant = hashIP(ip) % 2 === 0 ? 'A' : 'B';

    if (variant === 'B') {
      return fetch('https://variant-b.example.com');
    }

    return fetch('https://variant-a.example.com');
  }
};
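The hash-based assignment hinted at in the comments can be made concrete with any cheap deterministic hash; here is a sketch using FNV-1a (the choice of hash is arbitrary, as long as it is stable across requests):

```typescript
// Deterministic A/B assignment: hash the client IP with 32-bit FNV-1a
// so the same visitor always lands in the same variant, no cookie needed.
function assignVariant(ip: string): "A" | "B" {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < ip.length; i++) {
    hash ^= ip.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime
  }
  return (hash >>> 0) % 2 === 0 ? "A" : "B";
}
```

In the Worker above, `ip` would come from the `cf-connecting-ip` header.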

Geolocation-Based Routing

export default {
  async fetch(request: Request): Promise<Response> {
    const country = request.cf?.country;

    if (country === 'US') {
      return fetch('https://us.api.example.com');
    } else if (request.cf?.continent === 'EU') {  // 'EU' is a continent code, not a country code
      return fetch('https://eu.api.example.com');
    }

    return fetch('https://global.api.example.com');
  }
};
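The routing logic can be factored into a plain lookup table, which is easy to test and extend; the country-to-origin mapping below is illustrative, not an exhaustive region table:

```typescript
// Map a two-letter country code (as found on request.cf?.country) to a
// regional origin; unknown or missing countries fall back to a global one.
const REGIONAL_ORIGINS: Record<string, string> = {
  US: "https://us.api.example.com",
  DE: "https://eu.api.example.com",
  FR: "https://eu.api.example.com",
};

const GLOBAL_ORIGIN = "https://global.api.example.com";

function pickOrigin(country: string | undefined): string {
  return REGIONAL_ORIGINS[country ?? ""] ?? GLOBAL_ORIGIN;
}
```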

Anti-Patterns

❌ DON'T: Block the response

// BAD: Slow, synchronous loop
export default {
  async fetch(request: Request): Promise<Response> {
    for (let i = 0; i < 1000000; i++) {
      // Blocks isolate!
    }
    return new Response("Done");
  }
};

// GOOD: Use ctx.waitUntil for background work
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    ctx.waitUntil(
      (async () => {
        // Background work (logging, analytics)
        await env.MY_KV.put("last-request", Date.now().toString());  // KV values must be strings
      })()
    );

    return new Response("Done");
  }
};

❌ DON'T: Use Workers for long-running tasks

Workers CPU time limits: 10ms per request on the free plan; paid plans allow more (50ms on the legacy Bundled plan, up to 30s on Unbound)

✅ Good for: API requests, edge logic, AI inference
❌ Bad for: Video encoding, large file processing, long ML training

Testing & Development

# Install Wrangler CLI
npm install -g wrangler

# Create new project
wrangler init my-worker

# Local development (with Workers AI, KV, DO)
wrangler dev

# Deploy to production
wrangler deploy

# Tail logs
wrangler tail

# Run tests with Vitest
npm test

Related Skills

  • fastapi-web-development: Compare serverless vs traditional APIs
  • terraform-infrastructure: Deploy Workers with Terraform
  • web3-blockchain: Build Web3 apps on Cloudflare

Example Questions

  • "How do I run LLaMA 2 at the edge with Workers AI?"
  • "Show me how to implement rate limiting with Durable Objects"
  • "What's the difference between KV and Durable Objects?"
  • "How do I cache API responses at the edge?"
  • "Write a Worker that does geolocation-based routing"
