data-quality-checker
by armanzeroeight
🚀 A collection of Claude subagents, skills, rules, guides, and blueprints for Developers, Engineers, and Creators. | Covering programming languages, DevOps, Cloud, and beyond.
⭐ 20 🍴 4 📅 January 18, 2026
SKILL.md
name: data-quality-checker
description: Implement data quality checks, validation rules, and monitoring. Use when ensuring data quality, validating data pipelines, or implementing data governance.
Data Quality Checker
Implement comprehensive data quality checks and validation.
Quick Start
Use Great Expectations for validation, implement schema checks, monitor data quality metrics, and set up alerts.
Instructions
Great Expectations Setup
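```python
import great_expectations as gx

# Written against the 0.x Great Expectations API
# (context.add_expectation_suite was removed in GX 1.0).
context = gx.get_context()

# Create expectation suite
suite = context.add_expectation_suite("data_quality_suite")

# Get a validator for a batch of data. `batch_request` is assumed to be
# built from a datasource you have already configured (not shown here).
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="data_quality_suite",
)

# Schema validation: exact columns, in order
# ("age" is listed because it is range-checked below)
validator.expect_table_columns_to_match_ordered_list(
    column_list=["id", "name", "email", "created_at", "age"]
)

# Null checks
validator.expect_column_values_to_not_be_null("email")

# Value ranges
validator.expect_column_values_to_be_between("age", min_value=0, max_value=120)

# Uniqueness
validator.expect_column_values_to_be_unique("email")

# Run validation and inspect the overall outcome
results = validator.validate()
print(results.success)
```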
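To see roughly what each of these expectations checks, or to experiment without a configured GX context, the same four checks can be approximated in plain pandas. This is a sketch under stated assumptions, not Great Expectations' implementation: the helper `run_basic_expectations` is hypothetical, the expected column list is passed in as a parameter, and nulls are simply skipped for the range and uniqueness checks.

```python
import pandas as pd


def run_basic_expectations(df: pd.DataFrame, expected_columns: list[str]) -> dict[str, bool]:
    """Hypothetical helper: plain-pandas approximations of the four GX checks."""
    # Fall back to empty Series if a column is missing, so the schema check reports it
    email = df.get("email", pd.Series(dtype="object"))
    age = df.get("age", pd.Series(dtype="float64"))
    return {
        # Schema: exact column names in exact order
        "columns_match_ordered_list": list(df.columns) == expected_columns,
        # Null check: every email value is present
        "email_not_null": bool(email.notna().all()),
        # Range check, inclusive on both ends; nulls are skipped
        "age_between_0_and_120": bool(age.dropna().between(0, 120).all()),
        # Uniqueness: no repeated non-null email values
        "email_unique": not bool(email.dropna().duplicated().any()),
    }
```

Each value is a plain `bool`, so the result can be logged or asserted on directly in a unit test for a pipeline step.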
Custom Validation Rules
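```python
import pandas as pd


def validate_data_quality(df: pd.DataFrame) -> list[str]:
    issues = []

    # Check for nulls (report only the columns that contain them)
    null_counts = df.isnull().sum()
    if null_counts.any():
        issues.append(f"Null values found: {null_counts[null_counts > 0].to_dict()}")

    # Check for fully duplicated rows
    duplicates = df.duplicated().sum()
    if duplicates > 0:
        issues.append(f"Found {duplicates} duplicate rows")

    # Check data freshness: flag data whose newest record is older than one day
    # (assumes timezone-naive timestamps in created_at)
    max_date = pd.to_datetime(df['created_at']).max()
    if pd.Timestamp.now() - max_date > pd.Timedelta(days=1):
        issues.append("Data is stale")

    return issues
```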
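As the number of custom checks grows, it can help to keep them as a declarative list of named, row-level rules that are easier to document and extend. A minimal sketch, assuming `email` and `age` columns; `Rule`, `RULES`, and `apply_rules` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable

import pandas as pd


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule type: a name plus a row-level check."""
    name: str
    # Returns a boolean Series: True where the row passes the rule
    check: Callable[[pd.DataFrame], pd.Series]


RULES = [
    Rule("email_has_at_sign", lambda df: df["email"].str.contains("@", na=False)),
    Rule("age_in_range", lambda df: df["age"].between(0, 120)),
]


def apply_rules(df: pd.DataFrame, rules: list[Rule]) -> dict[str, int]:
    """Return the number of failing rows for each rule that has failures."""
    failures = {}
    for rule in rules:
        # .eq(True) yields a clean bool Series; null results count as failures
        passed = rule.check(df).eq(True)
        n_failed = int((~passed).sum())
        if n_failed:
            failures[rule.name] = n_failed
    return failures
```

Because every rule has a name, the same list can double as documentation of the quality rules in force.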
Data Quality Metrics
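```python
import pandas as pd


def calculate_quality_metrics(df: pd.DataFrame) -> dict[str, float]:
    if df.empty:
        raise ValueError("cannot compute quality metrics on an empty DataFrame")
    return {
        'completeness': 1 - (df.isnull().sum().sum() / df.size),  # share of non-null cells
        'uniqueness': df.drop_duplicates().shape[0] / df.shape[0],  # share of distinct rows
        'validity': df['email'].str.contains('@', na=False).sum() / len(df),  # nulls count as invalid
        'timeliness': (pd.Timestamp.now() - pd.to_datetime(df['created_at']).max()).days,  # age of newest record, in days
    }
```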
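Metrics like these are most useful when compared against agreed thresholds, so a pipeline can fail or alert automatically. The sketch below assumes the metric names returned by `calculate_quality_metrics`; the threshold values and the `failed_thresholds` helper are placeholders to adapt. Because `timeliness` is an age in days (lower is better), it gets a maximum rather than a minimum.

```python
# Placeholder thresholds; tune these for your own data.
MIN_THRESHOLDS = {"completeness": 0.95, "uniqueness": 0.99, "validity": 0.98}
MAX_THRESHOLDS = {"timeliness": 1}  # maximum allowed age in days


def failed_thresholds(metrics: dict[str, float]) -> list[str]:
    """Describe every metric outside its threshold; missing metrics are skipped."""
    failures = []
    # Ratio metrics: higher is better
    for name, minimum in MIN_THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value < minimum:
            failures.append(f"{name}={value:.3f} below {minimum}")
    # Age-style metrics: lower is better
    for name, maximum in MAX_THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value > maximum:
            failures.append(f"{name}={value} above {maximum}")
    return failures
```

A non-empty result can then be raised as an error or forwarded to whatever alerting channel the pipeline uses.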
Best Practices
- Validate data at ingestion
- Monitor quality metrics
- Set up alerts for validation failures
- Document quality rules
- Run regular quality audits
- Track quality trends over time
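Tracking trends and alerting can be combined: keep a short history of each metric and raise an alert when the latest value drops well below its recent average. A minimal standard-library sketch; the `MetricTrend` class, the window size, the tolerance, and the use of `logging` as the alert channel are all assumptions to swap for your own notifier.

```python
import logging
from collections import deque
from statistics import mean

logger = logging.getLogger("data_quality")


class MetricTrend:
    """Hypothetical tracker: rolling window of one metric that flags sharp drops."""

    def __init__(self, name: str, window: int = 7, tolerance: float = 0.05):
        self.name = name
        self.tolerance = tolerance
        self.history = deque(maxlen=window)  # most recent values, oldest first

    def record(self, value: float) -> bool:
        """Add a value; return True (and log a warning) if it falls more than
        `tolerance` below the mean of the previously recorded values."""
        alert = False
        if self.history:
            baseline = mean(self.history)
            if value < baseline - self.tolerance:
                alert = True
                logger.warning("%s dropped to %.3f (baseline %.3f)",
                               self.name, value, baseline)
        self.history.append(value)
        return alert
```

Comparing against a rolling mean rather than a single previous run keeps one noisy day from masking, or triggering, a trend alert.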
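Score
Overall score: 70/100
Rating based on repository quality indicators
✓ SKILL.md: contains a SKILL.md file (+20)
✓ LICENSE: a license is set (+10)
✓ Description: has a description of 100+ characters (+10)
○ Popularity: 100+ GitHub stars (0/15)
✓ Recent activity: updated within the last 3 months (+5)
○ Forks: forked 10+ times (0/5)
✓ Issue management: fewer than 50 open issues (+5)
○ Language: a programming language is set (0/5)
✓ Tags: at least one tag is set (+5)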
