oriolrius

apache-nifi

by oriolrius

Production-ready multi-cluster Apache NiFi platform with unified CLI, automated provisioning, shared PKI, and Site-to-Site communication. Deploy isolated clusters with TLS/SSL, comprehensive testing (31 validations + 32 tests), and visual Mermaid documentation.

0 🍴 0 📅 November 14, 2025

SKILL.md


name: apache-nifi
description: Expert guidance for Apache NiFi data integration platform including flow design, processors, controller services, process groups, NiFi Registry integration, and cluster configuration. Use this when working with data flows, processors, or NiFi configuration.
tags: [nifi, dataflow, data-integration]
color: orange

Apache NiFi Expert Skill

You are an expert in Apache NiFi, a powerful data integration and distribution platform.

Core Concepts

Architecture

  • FlowFiles: Data packages flowing through NiFi (content + attributes)
  • Processors: Building blocks for data transformation and routing
  • Connections: Queues between processors with backpressure
  • Process Groups: Containers for organizing flows
  • Controller Services: Shared services (DB connections, SSL contexts, etc.)
  • Reporting Tasks: Monitoring and metrics collection
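
To make the FlowFile and connection concepts above concrete, here is a minimal sketch in plain Python. It is not NiFi code: the `FlowFile` class, `update_attribute`, and `route_on_attribute` are hypothetical stand-ins that only mimic the idea of content plus attributes moving through a queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowFile:
    """Illustrative stand-in for a FlowFile: a payload plus string attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def update_attribute(ff, **changes):
    """Return a new FlowFile with extra attributes; the content is untouched."""
    return FlowFile(ff.content, {**ff.attributes, **changes})


def route_on_attribute(ff, key, expected):
    """Pick a relationship name the way a routing processor might."""
    return "matched" if ff.attributes.get(key) == expected else "unmatched"


ff = FlowFile(b"id,name\n1,pump\n", {"filename": "sensors.csv"})
tagged = update_attribute(ff, source="plant-a")

connection = deque()          # a connection behaves like a queue between processors
connection.append(tagged)
print(route_on_attribute(connection.popleft(), "source", "plant-a"))  # matched
```

The point of the sketch is that processors usually change attributes or content and then choose a relationship, while the connection simply holds FlowFiles until the next processor is ready.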

Key Processors

  • Data Ingestion: GetFile, ListenHTTP, ConsumeKafka, GetFTP
  • Data Egress: PutFile, InvokeHTTP, PublishKafka, PutFTP
  • Routing & Transformation: RouteOnAttribute, UpdateAttribute, JoltTransformJSON
  • Content: ExtractText, ReplaceText, SplitText, MergeContent
  • Database: ExecuteSQL, PutSQL, QueryDatabaseTable
  • Scripting: ExecuteScript (Groovy; Python via Jython and other script engines depend on the NiFi version)

Flow Design Best Practices

Organization

Process Group Structure:
├── Input Group (data ingestion)
├── Validation Group (data quality)
├── Transformation Group (ETL)
├── Enrichment Group (lookups)
└── Output Group (data egress)

Performance

  • Use backpressure thresholds (object count + data size)
  • Enable connection load balancing for clusters
  • Set appropriate concurrent tasks per processor
  • Use batching for high-volume flows
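
The two backpressure thresholds mentioned above can be pictured with a toy queue. This is a sketch of the idea only, not NiFi's implementation: the `Connection` class is hypothetical, and the defaults of 10,000 objects and 1 GB mirror the values NiFi typically applies to new connections. Like NiFi, the check is made before new work is accepted, so the queue can slightly overshoot the size limit.

```python
from collections import deque


class Connection:
    """Toy queue that stops accepting work once either threshold is reached."""

    def __init__(self, max_objects=10_000, max_bytes=1024 ** 3):
        self.max_objects = max_objects
        self.max_bytes = max_bytes
        self.queue = deque()
        self.queued_bytes = 0

    def is_full(self):
        # Backpressure applies when EITHER the count or the size limit is hit.
        return (len(self.queue) >= self.max_objects
                or self.queued_bytes >= self.max_bytes)

    def offer(self, payload: bytes) -> bool:
        """Enqueue unless backpressure is active; return whether it was accepted."""
        if self.is_full():
            return False
        self.queue.append(payload)
        self.queued_bytes += len(payload)
        return True

    def poll(self) -> bytes:
        """Dequeue one payload, releasing its share of the size budget."""
        payload = self.queue.popleft()
        self.queued_bytes -= len(payload)
        return payload


conn = Connection(max_objects=3, max_bytes=10)
print([conn.offer(b"abcd") for _ in range(4)])  # [True, True, True, False]
```

Once the downstream side calls `poll()`, the queue drops below its thresholds and the upstream side can enqueue again.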

Error Handling

  • Always configure failure relationships
  • Use funnels to consolidate error handling
  • Implement retry logic with penalization
  • Surface errors through bulletins and logs (for example with LogAttribute), and trace affected FlowFiles in data provenance
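
Retry with penalization can be sketched as a counter attribute plus a "not before" timestamp. The code below is an illustration in plain Python, not a NiFi API: the `retry.count` attribute name, the `handle_failure` helper, and the limit of three retries are assumptions, and the 30-second penalty mirrors NiFi's usual default penalty duration.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical FlowFile carrying attributes and a penalty expiry time."""
    attributes: dict = field(default_factory=dict)
    penalized_until: float = 0.0


def handle_failure(ff, now, max_retries=3, penalty_seconds=30.0):
    """Return the relationship to route to after a failed attempt."""
    attempts = int(ff.attributes.get("retry.count", "0")) + 1
    ff.attributes["retry.count"] = str(attempts)
    if attempts > max_retries:
        return "failure"              # give up and hand over to error handling
    ff.penalized_until = now + penalty_seconds
    return "retry"                    # loop back once the penalty has expired


def is_eligible(ff, now):
    """A penalized FlowFile is skipped until its penalty has expired."""
    return now >= ff.penalized_until


ff = FlowFile()
print([handle_failure(ff, now=t) for t in (0, 100, 200, 300)])
# ['retry', 'retry', 'retry', 'failure']
```

Keeping the counter on the FlowFile itself means the retry loop needs no external state, and the final `failure` route can feed the consolidated error funnel.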

NiFi Registry Integration

Version Control

# Connect to Registry
1. Add Registry Client in NiFi UI (Controller Settings)
2. Right-click Process Group → Version → Start version control
3. Select Registry, Bucket, and Flow name
4. Commit changes with descriptive messages

Best Practices

  • Version control all production flows
  • Use meaningful commit messages
  • Create separate buckets per environment (dev/test/prod)
  • Tag stable releases

Expression Language

Common Expressions

# Attributes
${filename}                           # Get attribute value
${filename:isEmpty()}                 # Check if empty
${filename:replace('.txt', '.csv')}   # String manipulation

# Date/Time
${now():format('yyyy-MM-dd')}         # Current date
${created:toDate('yyyy-MM-dd'):format('MM/dd/yyyy')} # Convert format

# Conditional
${attribute:equals('value'):ifElse('yes', 'no')} # If-else logic

# Content
${fileSize}                           # FlowFile size in bytes (core attribute)
${fileSize:gt(1000000)}               # Size comparison
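
For readers more familiar with Python than with Expression Language, the expressions above correspond roughly to the following. This is only an analogy to explain what each expression computes; the `attrs` dictionary and its values are made up for illustration, and NiFi evaluates the real expressions itself.

```python
from datetime import datetime

# Hypothetical attribute values standing in for a FlowFile's attributes.
attrs = {"filename": "report.txt", "created": "2025-11-14"}
size_bytes = 2_500_000

# ${filename:replace('.txt', '.csv')}
renamed = attrs["filename"].replace(".txt", ".csv")

# ${filename:isEmpty()}  (true for missing, empty, or whitespace-only values)
filename_empty = attrs.get("filename", "").strip() == ""

# ${created:toDate('yyyy-MM-dd'):format('MM/dd/yyyy')}
reformatted = datetime.strptime(attrs["created"], "%Y-%m-%d").strftime("%m/%d/%Y")

# ${attribute:equals('value'):ifElse('yes', 'no')}
verdict = "yes" if attrs.get("attribute") == "value" else "no"

# Size comparison with :gt(1000000)
is_large = size_bytes > 1_000_000

print(renamed, filename_empty, reformatted, verdict, is_large)
# report.csv False 11/14/2025 no True
```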

Configuration Files

nifi.properties (Key Settings)

# Web UI
nifi.web.http.host=0.0.0.0
nifi.web.http.port=8080

# Clustering
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1
nifi.cluster.node.protocol.port=11443

# State Management
nifi.state.management.embedded.zookeeper.start=true
nifi.zookeeper.connect.string=localhost:2181

# Repository Settings
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
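
A quick way to sanity-check settings like these before starting a node is to parse the file and look for obviously missing cluster values. The sketch below is a hypothetical helper, not part of NiFi: `parse_properties` handles only simple `key=value` lines, and the checks in `cluster_problems` are an assumed minimal set.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def cluster_problems(props):
    """List missing settings that a clustered node would need."""
    problems = []
    if props.get("nifi.cluster.is.node") == "true":
        for key in ("nifi.cluster.node.address",
                    "nifi.cluster.node.protocol.port",
                    "nifi.zookeeper.connect.string"):
            if not props.get(key):
                problems.append(f"{key} is not set")
    return problems


sample = """
# Clustering
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=
"""
print(cluster_problems(parse_properties(sample)))
# ['nifi.zookeeper.connect.string is not set']
```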

Docker Deployment

Single Node

services:
  nifi:
    image: apache/nifi:latest
    ports:
      - "8443:8443"
    environment:
      - SINGLE_USER_CREDENTIALS_USERNAME=admin
      - SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB
    volumes:
      - ./nifi/conf:/opt/nifi/nifi-current/conf
      - ./nifi/content_repository:/opt/nifi/nifi-current/content_repository
      - ./nifi/database_repository:/opt/nifi/nifi-current/database_repository
      - ./nifi/flowfile_repository:/opt/nifi/nifi-current/flowfile_repository
      - ./nifi/provenance_repository:/opt/nifi/nifi-current/provenance_repository
      - ./nifi/state:/opt/nifi/nifi-current/state

Cluster Configuration

services:
  zookeeper:
    image: zookeeper:3.8
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zookeeper:2888:3888;2181

  nifi-1:
    image: apache/nifi:latest
    environment:
      - NIFI_CLUSTER_IS_NODE=true
      - NIFI_CLUSTER_NODE_PROTOCOL_PORT=11443
      - NIFI_ZK_CONNECT_STRING=zookeeper:2181
      - NIFI_ELECTION_MAX_WAIT=1 min
    depends_on:
      - zookeeper
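
When the ensemble grows beyond one ZooKeeper container, the `ZOO_SERVERS` value and the NiFi connect string have to stay in sync. The helper below is a small sketch for generating both from one host list; the function names and the `zoo1`, `zoo2`, `zoo3` hostnames are hypothetical, and the ports follow the single-node example above.

```python
def zoo_servers(hosts, client_port=2181):
    """Build a ZOO_SERVERS value: one space-separated server.N entry per host."""
    return " ".join(
        f"server.{i}={host}:2888:3888;{client_port}"
        for i, host in enumerate(hosts, start=1)
    )


def connect_string(hosts, client_port=2181):
    """Build the comma-separated host:port list NiFi uses to reach ZooKeeper."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


hosts = ["zoo1", "zoo2", "zoo3"]  # hypothetical service names
print(zoo_servers(hosts))
print(connect_string(hosts))      # zoo1:2181,zoo2:2181,zoo3:2181
```

Each ZooKeeper container still needs its own `ZOO_MY_ID` matching its `server.N` index.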

Security

Authentication

  • Single-User: Basic username/password (dev/test)
  • LDAP: Enterprise directory integration
  • OIDC: OAuth2/OpenID Connect (Keycloak)
  • Certificate: mTLS with client certificates

Authorization

  • File-based: users.xml and authorizations.xml
  • Ranger: Apache Ranger integration
  • OIDC groups: Map OIDC groups to NiFi policies

Monitoring & Troubleshooting

Key Metrics

  • FlowFile counts and sizes in queues
  • Processor task durations
  • System diagnostics (CPU, memory, disk)
  • Bulletin board for errors

Logs

# Main logs
logs/nifi-app.log          # Application log
logs/nifi-user.log         # User actions
logs/nifi-bootstrap.log    # Bootstrap process

# Enable DEBUG logging
conf/logback.xml → Set logger level to DEBUG

Common Issues

| Issue | Solution |
|---|---|
| Out of Memory | Increase heap in bootstrap.conf |
| Disk full | Clean content/provenance repos |
| Connection timeout | Check network/firewall rules |
| FlowFile stuck | Check backpressure, processor logs |

PLC4X Integration

PLC4X Processors

  • Plc4xSourceProcessor (shown here as GetPLC): Read data from PLCs (Siemens S7, Modbus, OPC-UA, etc.)
  • Plc4xSinkProcessor (shown here as PutPLC): Write data to PLCs

Example Configuration

Connection String: s7://192.168.1.100
Fields: DB1.DBW0:INT,DB1.DBD4:REAL,DB1.DBX8.0:BOOL
Polling Interval: 1 sec
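
The `Fields` value above packs several address and type pairs into one string. The sketch below shows one way to split it for validation or documentation purposes. It is plain string handling over the example as written, not a PLC4X API, and `parse_fields` is a hypothetical helper.

```python
def parse_fields(spec):
    """Split 'ADDRESS:TYPE,ADDRESS:TYPE' into a list of (address, type) pairs."""
    fields = []
    for item in spec.split(","):
        # Split on the LAST colon so the address itself is kept intact.
        address, _, data_type = item.strip().rpartition(":")
        if not address or not data_type:
            raise ValueError(f"expected ADDRESS:TYPE, got {item!r}")
        fields.append((address, data_type))
    return fields


spec = "DB1.DBW0:INT,DB1.DBD4:REAL,DB1.DBX8.0:BOOL"
for address, data_type in parse_fields(spec):
    print(f"{address:12} {data_type}")
```

Checking the list this way catches a missing type or a stray comma before the configuration reaches the processor.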

API & CLI

NiFi REST API

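# Request an access token (the REST API expects a bearer token, not HTTP Basic auth)
# -k skips TLS verification and is only appropriate for self-signed dev certificates
TOKEN=$(curl -sk -X POST https://localhost:8443/nifi-api/access/token -d 'username=admin&password=password')

# Get the root process group
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/nifi-api/process-groups/root

# Get processor status
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/nifi-api/flow/processors/{id}/status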
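
The same calls can be scripted with Python's standard library. This is a sketch under assumptions: it assumes username/password logins are exchanged for a bearer token at `/nifi-api/access/token`, and the helper names (`token_request`, `api_request`, `fetch`) are hypothetical. Nothing is sent until `fetch` is called.

```python
import urllib.parse
import urllib.request


def token_request(base_url, username, password):
    """Build the form-encoded POST that exchanges credentials for a token."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url}/nifi-api/access/token",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def api_request(base_url, path, token):
    """Build an authenticated GET for a path under /nifi-api."""
    return urllib.request.Request(
        f"{base_url}/nifi-api/{path.lstrip('/')}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch(request, context=None):
    """Send a request; pass an ssl.SSLContext that trusts your NiFi certificate."""
    with urllib.request.urlopen(request, context=context, timeout=10) as resp:
        return resp.read().decode()


req = token_request("https://localhost:8443", "admin", "secret")
print(req.get_method(), req.full_url)
```

A typical sequence would be `token = fetch(token_request(...))` followed by `fetch(api_request(base_url, "process-groups/root", token))`, with TLS trust configured for the deployment's certificates.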

NiFi CLI (Toolkit)

# Export flow
nifi-toolkit/cli.sh nifi export-flow -u http://localhost:8080/nifi -b /tmp/backup.json

# Import flow
nifi-toolkit/cli.sh nifi import-flow -u http://localhost:8080/nifi -i /tmp/backup.json

Resources
