---
name: rfp-ingest
description: Ingest RFP opportunities from multiple data sources (SAM.gov, eMMA, RFPMart). Use when adding new data sources, modifying ingestion logic, or debugging data fetching issues.
allowed-tools: Read, Grep, Glob, Bash(npm:*), Bash(npx:*)
---
# RFP Ingestion Skill
## Overview
This skill helps implement multi-source RFP data ingestion with canonical schema normalization and deduplication.
## Supported Data Sources
| Source | Priority | API Type | Rate Limits |
|--------|----------|----------|-------------|
| SAM.gov | P1 | REST API | 10 req/sec, 10k/day |
| Maryland eMMA | P1 | Web scraping | Respectful crawling |
| RFPMart API | Current | REST API | As documented |
| RFPMart CSV | Current | Manual upload | N/A |
| GovTribe | P2 | REST API (paid) | Per subscription |
## CSV Upload (RFPMart Email Alerts)
RFPMart sends periodic email alerts with CSV attachments. These can be manually uploaded through the Admin UI.
### CSV Format (No Header Row)
| Column | Index | Content | Example |
|--------|-------|---------|---------|
| ID | 0 | RFP identifier | `SW-82097` |
| Country | 1 | Country code | `USA` |
| State | 2 | State name | `Idaho` |
| Title | 3 | Full title with location | `SW-82097 - USA (Idaho) - Data Concealment...` |
| Deadline | 4 | Due date | `March 25,2026` |
| URL | 5 | RFPMart link | `https://www.rfpmart.com/...` |
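As an illustration of how these columns might map onto a typed record, the sketch below uses arbitrary field names (not necessarily those in `convex/ingestion/rfpmartCsv.ts`) and takes its input from the `parseCSVLine` helper shown later in this skill. Parsing the `March 25,2026` deadline style via `Date.parse` works in V8 but is an assumption, not something the source guarantees.

```typescript
// Hypothetical shape for a parsed RFPMart CSV row (field names are illustrative).
interface RfpmartCsvRow {
  id: string;       // e.g. "SW-82097"
  country: string;  // e.g. "USA"
  state: string;    // e.g. "Idaho"
  title: string;
  deadline: number; // Unix timestamp (ms)
  url: string;
}

// Handle deadlines like "March 25,2026" (note the missing space after the comma).
function parseDeadline(raw: string): number {
  const ms = Date.parse(raw.replace(",", ", ").trim());
  return Number.isNaN(ms) ? 0 : ms;
}

function rowToRecord(fields: string[]): RfpmartCsvRow {
  return {
    id: fields[0]?.trim() ?? "",
    country: fields[1]?.trim() ?? "",
    state: fields[2]?.trim() ?? "",
    title: fields[3]?.trim() ?? "",
    deadline: parseDeadline(fields[4] ?? ""),
    url: fields[5]?.trim() ?? "",
  };
}
```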
### ID Prefix → Category Mapping
```typescript
const categoryMap: Record<string, string> = {
  SW: "Software Development",
  ITES: "IT Services",
  NET: "Networking",
  TELCOM: "Telecommunications",
  DRA: "Data & Research",
  CSE: "Security Services",
  HR: "Human Resources",
  PM: "Project Management",
  MRB: "Marketing & Branding",
  // ... other prefixes default to "Other"
};
```
### IT-Relevant Prefixes
When filtering for IT-relevant RFPs only, these prefixes are included (a small filter sketch follows the list):
- `SW` - Software Development
- `ITES` - IT Services
- `NET` - Networking
- `TELCOM` - Telecommunications
- `DRA` - Data & Research
- `CSE` - Security Services
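A minimal sketch of how the prefix can be pulled out of an ID such as `SW-82097` and used both for the category lookup (reusing the `categoryMap` from the previous block, falling back to "Other") and for the IT-relevance filter. The helper names are my own; the real logic lives in `convex/ingestion/rfpmartCsv.ts`.

```typescript
const IT_RELEVANT_PREFIXES = new Set(["SW", "ITES", "NET", "TELCOM", "DRA", "CSE"]);

// Extract the prefix from an RFPMart ID, e.g. "SW-82097" -> "SW".
function idPrefix(id: string): string {
  return id.split("-")[0]?.trim().toUpperCase() ?? "";
}

function categoryForId(id: string): string {
  return categoryMap[idPrefix(id)] ?? "Other";
}

function isItRelevant(id: string): boolean {
  return IT_RELEVANT_PREFIXES.has(idPrefix(id));
}
```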
### Key Files
| File | Purpose |
|------|---------|
| `convex/ingestion/rfpmartCsv.ts` | CSV parser and Convex action |
| `components/admin/CsvUpload.tsx` | Drag-and-drop upload UI |
### Usage
1. Navigate to **Admin** → **Data Sources** tab
2. Scroll to **RFPMart CSV Upload** section
3. Drop a CSV file or click to browse
4. Toggle "Only import IT-relevant RFPs" if desired
5. View results summary (new/updated/skipped/errors)
### Implementation Example
```typescript
// Parsing CSV with quoted fields
function parseCSVLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const char = line[i];
    if (char === '"') {
      if (inQuotes && line[i + 1] === '"') {
        // Escaped quote ("") inside a quoted field becomes a literal quote.
        current += '"';
        i++;
      } else {
        inQuotes = !inQuotes;
      }
    } else if (char === "," && !inQuotes) {
      // Commas only delimit fields when we are outside quotes.
      fields.push(current);
      current = "";
    } else {
      current += char;
    }
  }
  fields.push(current);
  return fields;
}
```
## Canonical Schema
All sources must normalize to this schema:
```typescript
interface Opportunity {
  externalId: string; // Source-specific ID
  source: "sam.gov" | "emma" | "rfpmart" | "govtribe";
  title: string;
  description: string;
  summary?: string;
  location: string;
  category: string;
  naicsCode?: string;
  setAside?: string; // "Small Business", "8(a)", etc.
  postedDate: number; // Unix timestamp
  expiryDate: number; // Unix timestamp
  url: string;
  attachments?: Attachment[];
  eligibilityFlags?: string[];
  rawData: Record<string, unknown>;
  ingestedAt: number;
}
```
## SAM.gov Integration
### API Endpoint
```
https://api.sam.gov/opportunities/v2/search
```
### Required Headers
```typescript
{
  "Accept": "application/json",
  "X-Api-Key": process.env.SAM_GOV_API_KEY
}
```
### Example Query
```typescript
const params = new URLSearchParams({
  postedFrom: "2024-01-01",
  postedTo: "2024-12-31",
  limit: "100",
  offset: "0",
  ptype: "o", // Opportunities only
});
```
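Putting the headers and query parameters together, the request itself might look like the sketch below, reusing the `params` object from the example above. Note that the Convex action later in this skill passes the key as an `api_key` query parameter instead of the `X-Api-Key` header; either way, the key comes from the environment.

```typescript
// Minimal request sketch: query parameters from the example above plus the required headers.
const response = await fetch(
  `https://api.sam.gov/opportunities/v2/search?${params.toString()}`,
  {
    headers: {
      Accept: "application/json",
      "X-Api-Key": process.env.SAM_GOV_API_KEY ?? "",
    },
  }
);
if (!response.ok) {
  throw new Error(`SAM.gov API error: ${response.status}`);
}
const opportunities = (await response.json()).opportunitiesData ?? [];
```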
### Field Mapping
| SAM.gov Field | Canonical Field |
|---------------|-----------------|
| `noticeId` | `externalId` |
| `title` | `title` |
| `description` | `description` |
| `postedDate` | `postedDate` (parse to timestamp) |
| `responseDeadLine` | `expiryDate` (parse to timestamp) |
| `placeOfPerformance.state` | `location` |
| `naicsCode` | `naicsCode` |
| `setAsideDescription` | `setAside` |
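The table can be expressed as a small mapping function targeting the canonical `Opportunity` interface. The function name is illustrative, and fields the table does not cover (summary, attachments, eligibility flags) are simply omitted here.

```typescript
// Map a raw SAM.gov opportunity record to the canonical schema.
// `raw` is the untyped item from the API response; field names follow the table above.
function mapSamOpportunity(raw: any): Opportunity {
  return {
    externalId: raw.noticeId,
    source: "sam.gov",
    title: raw.title ?? "Untitled",
    description: raw.description ?? "",
    location: raw.placeOfPerformance?.state ?? "USA",
    category: raw.naicsCode ?? "Unknown",
    naicsCode: raw.naicsCode,
    setAside: raw.setAsideDescription,
    postedDate: new Date(raw.postedDate).getTime(),
    expiryDate: new Date(raw.responseDeadLine).getTime(),
    url: `https://sam.gov/opp/${raw.noticeId}/view`,
    rawData: raw,
    ingestedAt: Date.now(),
  };
}
```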
## Convex Implementation
### Ingestion Action
```typescript
// convex/ingestion.ts
import { action, internalMutation } from "./_generated/server";
import { v } from "convex/values";
import { internal } from "./_generated/api";

export const ingestFromSam = action({
  args: { daysBack: v.optional(v.number()) },
  handler: async (ctx, args) => {
    const apiKey = process.env.SAM_GOV_API_KEY;
    if (!apiKey) throw new Error("SAM_GOV_API_KEY not configured");

    const fromDate = new Date();
    fromDate.setDate(fromDate.getDate() - (args.daysBack ?? 7));

    const response = await fetch(
      `https://api.sam.gov/opportunities/v2/search?` +
        `api_key=${apiKey}&postedFrom=${fromDate.toISOString().split("T")[0]}&limit=100`,
      { headers: { Accept: "application/json" } }
    );
    if (!response.ok) {
      throw new Error(`SAM.gov API error: ${response.status}`);
    }

    const data = await response.json();
    let ingested = 0;
    let updated = 0;

    for (const opp of data.opportunitiesData ?? []) {
      const result = await ctx.runMutation(internal.rfps.upsert, {
        externalId: opp.noticeId,
        source: "sam.gov",
        title: opp.title ?? "Untitled",
        description: opp.description ?? "",
        location: opp.placeOfPerformance?.state ?? "USA",
        category: opp.naicsCode ?? "Unknown",
        postedDate: new Date(opp.postedDate).getTime(),
        expiryDate: new Date(opp.responseDeadLine).getTime(),
        url: `https://sam.gov/opp/${opp.noticeId}/view`,
        rawData: opp,
      });
      if (result.action === "inserted") ingested++;
      else updated++;
    }

    // Log ingestion
    await ctx.runMutation(internal.ingestion.logIngestion, {
      source: "sam.gov",
      status: "completed",
      recordsProcessed: data.opportunitiesData?.length ?? 0,
      recordsInserted: ingested,
      recordsUpdated: updated,
    });

    return { ingested, updated, source: "sam.gov" };
  },
});
```
### Upsert Mutation
```typescript
// convex/rfps.ts (internal mutation)
import { internalMutation } from "./_generated/server";
import { v } from "convex/values";

export const upsert = internalMutation({
  args: {
    externalId: v.string(),
    source: v.string(),
    title: v.string(),
    description: v.string(),
    location: v.string(),
    category: v.string(),
    postedDate: v.number(),
    expiryDate: v.number(),
    url: v.string(),
    rawData: v.optional(v.any()),
  },
  handler: async (ctx, args) => {
    const existing = await ctx.db
      .query("rfps")
      .withIndex("by_external_id", (q) =>
        q.eq("externalId", args.externalId).eq("source", args.source)
      )
      .first();

    const now = Date.now();
    if (existing) {
      await ctx.db.patch(existing._id, { ...args, updatedAt: now });
      return { id: existing._id, action: "updated" as const };
    }

    const id = await ctx.db.insert("rfps", {
      ...args,
      ingestedAt: now,
      updatedAt: now,
    });
    return { id, action: "inserted" as const };
  },
});
```
## Deduplication Strategy
1. **Exact match**: `externalId` + `source` combination
2. **Title similarity**: Fuzzy match titles within the same deadline window
3. **URL canonicalization**: Normalize URLs before comparison (both non-exact checks are sketched below)
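The two non-exact checks could look roughly like this; the helper names, the 24-hour window, and the 0.8 similarity threshold are illustrative choices, not values from the codebase. The exact-match path is already handled by the `by_external_id` index in the upsert mutation.

```typescript
// Drop the query string, fragment, trailing slashes, and host-case differences before comparing.
function canonicalizeUrl(raw: string): string {
  try {
    const url = new URL(raw);
    return `${url.protocol}//${url.host.toLowerCase()}${url.pathname.replace(/\/+$/, "")}`;
  } catch {
    return raw.trim().toLowerCase();
  }
}

// Crude token-overlap (Jaccard) similarity; good enough for flagging likely duplicates.
function titleSimilarity(a: string, b: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const ta = tokens(a);
  const tb = tokens(b);
  const intersection = [...ta].filter((t) => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : intersection / union;
}

// Treat records as likely duplicates when URLs canonicalize to the same value,
// or when their deadlines fall within 24 hours and their titles overlap heavily.
function likelyDuplicate(a: Opportunity, b: Opportunity): boolean {
  const sameWindow = Math.abs(a.expiryDate - b.expiryDate) < 24 * 60 * 60 * 1000;
  return (
    canonicalizeUrl(a.url) === canonicalizeUrl(b.url) ||
    (sameWindow && titleSimilarity(a.title, b.title) > 0.8)
  );
}
```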
## Eligibility Pre-Filtering
Detect disqualifiers during ingestion:
```typescript
const DISQUALIFIER_PATTERNS = [
  { pattern: /u\.?s\.?\s*(citizen|company|organization)\s*only/i, flag: "us-org-only" },
  { pattern: /onshore\s*(only|required)/i, flag: "onshore-required" },
  { pattern: /on-?site\s*(required|mandatory)/i, flag: "onsite-required" },
  { pattern: /security\s*clearance\s*required/i, flag: "clearance-required" },
  { pattern: /small\s*business\s*set[- ]aside/i, flag: "small-business-set-aside" },
];

function detectEligibilityFlags(text: string): string[] {
  return DISQUALIFIER_PATTERNS
    .filter(({ pattern }) => pattern.test(text))
    .map(({ flag }) => flag);
}
```
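During ingestion, the combined title and description text can be passed through `detectEligibilityFlags` and the result stored on the record's `eligibilityFlags` field before upserting (the exact wiring is up to the ingestion action).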
## Scheduled Ingestion
```typescript
// convex/crons.ts
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";
const crons = cronJobs();
crons.interval(
  "ingest-sam-gov",
  { hours: 6 },
  internal.ingestion.ingestFromSam,
  { daysBack: 3 }
);
export default crons;
```
## Error Handling
| Error Type | Action |
|------------|--------|
| Rate limit (429) | Exponential backoff, retry after delay |
| Auth error (401/403) | Log error, alert admin |
| Server error (5xx) | Retry up to 3 times |
| Parse error | Log raw data, skip record |
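As one way to implement the first three rows, a small wrapper around `fetch` can retry 429s and 5xx responses with exponential backoff while surfacing auth errors immediately. The function name, attempt count, and delays below are arbitrary placeholders.

```typescript
// Retry transient failures (429 and 5xx) with exponential backoff; fail fast on everything else.
async function fetchWithRetry(
  url: string,
  init: RequestInit,
  maxAttempts = 3
): Promise<Response> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetch(url, init);
    if (response.ok) return response;

    const retryable = response.status === 429 || response.status >= 500;
    if (!retryable || attempt === maxAttempts) {
      // Auth errors (401/403) and exhausted retries surface to the caller,
      // which logs the failure and alerts an admin.
      throw new Error(`SAM.gov API error: ${response.status}`);
    }
    // Exponential backoff: 1s, 2s, 4s, ...
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
  }
  throw new Error("unreachable");
}
```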
## Testing Approach
1. Mock API responses for unit tests
2. Use sandbox/test endpoints when available
3. Validate schema transformation
4. Test deduplication logic
5. Verify eligibility flag detection (a test sketch follows)
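For items 3 and 5, the pure helpers can be exercised without any network access. The sketch below uses Node's built-in `node:test` runner and assumes the helpers are exported from the ingestion modules; both the runner and the import paths are placeholders for whatever the project actually uses.

```typescript
// Test runner and import paths are assumptions; adjust to the project's setup.
import { test } from "node:test";
import assert from "node:assert/strict";
import { parseCSVLine } from "../convex/ingestion/rfpmartCsv";
import { detectEligibilityFlags } from "../convex/ingestion/eligibility";

test("parseCSVLine keeps commas inside quoted fields", () => {
  const fields = parseCSVLine(
    'SW-82097,USA,Idaho,"SW-82097 - USA (Idaho) - Data Concealment, Phase 2","March 25,2026",https://www.rfpmart.com/example'
  );
  assert.equal(fields.length, 6);
  assert.equal(fields[3], "SW-82097 - USA (Idaho) - Data Concealment, Phase 2");
});

test("detectEligibilityFlags catches on-site and clearance requirements", () => {
  const flags = detectEligibilityFlags("On-site required. Security clearance required.");
  assert.deepEqual(flags.sort(), ["clearance-required", "onsite-required"]);
});
```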