---
name: polars-expertise
version: 1.0.0
description: >
  This skill should be used when the user asks about the Polars DataFrame
  library (Apache Arrow) for Python or Rust. Triggers: "polars expressions",
  "lazy vs eager", "scan_parquet streaming", "convert pandas to polars",
  "pyspark to polars", "kdb to polars", "group_by_dynamic", "rolling_mean",
  "polars window functions", "asof join", "polars GPU", "polars parquet",
  "LazyFrame". Time series: OHLCV resampling, rolling windows, financial data
  patterns. Performance: native expressions over map_elements, early
  projection, categorical types, streaming.
---

# Polars

High-performance DataFrame library built on Apache Arrow. Supports Python and Rust with an expression-based API, lazy evaluation, and automatic parallelization.

## Quick Start

### Python

```bash
uv pip install polars
# GPU support: uv pip install "polars[gpu]"
```

```python
import polars as pl

# Eager: immediate execution
df = pl.DataFrame({"symbol": ["AAPL", "GOOG"], "price": [150.0, 140.0]})
df.filter(pl.col("price") > 145).select("symbol", "price")

# Lazy: optimized execution (preferred for large data)
lf = pl.scan_parquet("trades.parquet")
result = (
    lf.filter(pl.col("volume") > 1000)
    .group_by("symbol")
    .agg(pl.col("price").mean().alias("avg_price"))
    .collect()
)
```

### Rust

```toml
# Cargo.toml - select the features you need
[dependencies]
polars = { version = "0.46", features = ["lazy", "parquet", "temporal"] }
```

```rust
use polars::prelude::*;

fn main() -> PolarsResult<()> {
    // Eager
    let df = df![
        "symbol" => ["AAPL", "GOOG"],
        "price" => [150.0, 140.0]
    ]?;
    println!("{df}");

    // Lazy (preferred)
    let lf = LazyFrame::scan_parquet("trades.parquet", Default::default())?;
    let result = lf
        .filter(col("volume").gt(lit(1000)))
        .group_by([col("symbol")])
        .agg([col("price").mean().alias("avg_price")])
        .collect()?;
    println!("{result}");
    Ok(())
}
```

## Core Pattern: Expressions

Everything in Polars is an expression. Expressions are composable, lazy, and parallelized.

```python
# Expression building blocks
pl.col("price")                         # column reference
pl.col("price") * pl.col("volume")      # arithmetic
pl.col("price").mean().over("symbol")   # window function
pl.when(cond).then(a).otherwise(b)      # conditional
```

Expressions execute inside contexts: `select()`, `with_columns()`, `filter()`, and `group_by().agg()`.
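The same expression can be reused across contexts; a minimal sketch, reusing the `df` from Quick Start:

```python
import polars as pl

df = pl.DataFrame({"symbol": ["AAPL", "GOOG"], "price": [150.0, 140.0]})
expensive = pl.col("price") > 145  # one expression, many contexts

df.filter(expensive)                        # filter context: row mask
df.select(expensive.alias("is_expensive"))  # select context: boolean column
df.group_by("symbol").agg(expensive.sum().alias("n_expensive"))  # agg: per-group count
```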

## When to Use Lazy

| Use Lazy (`scan_*`, `.lazy()`) | Use Eager (`read_*`) |
|---|---|
| Large files (> RAM) | Small data, exploration |
| Complex pipelines | Simple one-off ops |
| Need query optimization | Interactive notebooks |
| Streaming required | Immediate feedback |

Lazy benefits: predicate pushdown, projection pushdown, parallel execution, streaming.
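To confirm these optimizations fire, inspect the optimized plan with `explain()`; a minimal sketch using the hypothetical `trades.parquet` from Quick Start:

```python
import polars as pl

lf = (
    pl.scan_parquet("trades.parquet")
    .filter(pl.col("volume") > 1000)
    .select("symbol", "price")
)

# The optimized plan should show the predicate and the column
# selection pushed down into the Parquet scan node.
print(lf.explain())
```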

## Style: Use `.alias()` for Column Naming

Always use `.alias("name")` instead of `name=expr` kwargs:

```python
# GOOD: Explicit .alias() - works everywhere, composable
df.with_columns(
    (pl.col("price") * pl.col("volume")).alias("value"),
    pl.col("price").mean().over("symbol").alias("avg_price")
)

# AVOID: Kwarg style - names can't be computed, and kwargs
# must come after all positional expressions
df.with_columns(
    value=pl.col("price") * pl.col("volume"),  # avoid
    avg_price=pl.col("price").mean().over("symbol")  # avoid
)
```

`.alias()` is explicit, chains with other expression methods, and works consistently in all contexts.

## Anti-Patterns - AVOID

```python
# BAD: Python functions kill parallelization
df.with_columns(pl.col("x").map_elements(lambda x: x * 2))  # SLOW

# GOOD: Native expressions run in parallel
df.with_columns((pl.col("x") * 2).alias("x"))  # FAST

# BAD: Row iteration
for row in df.iter_rows():  # SLOW
    process(row)

# GOOD: Columnar operations
df.with_columns(process_expr)  # FAST

# BAD: Late projection
lf.filter(...).collect().select("a", "b")  # reads all columns

# GOOD: Early projection
lf.select("a", "b").filter(...).collect()  # reads only the needed columns
```

## Performance Checklist

- Using `scan_*` (lazy) for large files?
- Projecting columns early in the pipeline?
- Using native expressions (no `map_elements`)?
- Categorical dtype for low-cardinality strings?
- Appropriate integer sizes (`i32` vs `i64`)?
- Streaming for out-of-memory data? (`collect(engine="streaming")`, sketched below)
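A minimal sketch of the last three items, assuming hypothetical `exchange` and `size` columns:

```python
import polars as pl

lf = pl.scan_parquet("trades.parquet").with_columns(
    pl.col("exchange").cast(pl.Categorical),  # low-cardinality string -> Categorical
    pl.col("size").cast(pl.Int32),            # downsize only if the value range allows
)

# Streaming execution for larger-than-memory data
result = lf.collect(engine="streaming")
```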

## Reference Navigator

### Python References

| Topic | File | When to Load |
|---|---|---|
| Expressions, types, lazy/eager | python/core_concepts.md | Understanding fundamentals |
| Select, filter, group_by, window | python/operations.md | Common operations |
| CSV, Parquet, streaming I/O | python/io_guide.md | Reading/writing data |
| Joins, pivots, reshaping | python/transformations.md | Combining/reshaping data |
| Performance, patterns | python/best_practices.md | Optimization |

### Rust References

| Topic | File | When to Load |
|---|---|---|
| DataFrame, Series, ChunkedArray | rust/core_concepts.md | Rust API fundamentals |
| Expression API in Rust | rust/operations.md | Operations syntax |
| Readers, writers, streaming | rust/io_guide.md | I/O operations |
| Feature flags, crates | rust/features.md | Cargo setup |
| Allocators, SIMD, nightly | rust/performance.md | Performance tuning |
| Zero-copy, FFI, Arrow | rust/arrow_interop.md | Arrow integration |

### Shared References

| Topic | File | When to Load |
|---|---|---|
| SQL queries on DataFrames | sql_interface.md | SQL syntax needed |
| Query optimization, streaming | lazy_deep_dive.md | Understanding lazy engine |
| NVIDIA GPU acceleration | gpu_support.md | GPU setup/usage |
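For the GPU row above, the entry point is the `engine` argument to `collect()`; a minimal sketch, assuming `polars[gpu]` is installed and an NVIDIA GPU is available (unsupported operations fall back to the CPU engine):

```python
import polars as pl

lf = (
    pl.scan_parquet("trades.parquet")
    .group_by("symbol")
    .agg(pl.col("price").mean().alias("avg_price"))
)

# Runs on the GPU engine where supported; see gpu_support.md for setup
result = lf.collect(engine="gpu")
```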

### Migration Guides

| From | File | When to Load |
|---|---|---|
| pandas | migration_pandas.md | Converting pandas code |
| PySpark | migration_spark.md | Converting Spark code |
| q/kdb+ | migration_qkdb.md | Converting kdb+ code |

## Time Series / Financial Data Quick Patterns

```python
# OHLCV resampling (group_by_dynamic requires a sorted timestamp column)
df.group_by_dynamic("timestamp", every="1m").agg(
    pl.col("price").first().alias("open"),
    pl.col("price").max().alias("high"),
    pl.col("price").min().alias("low"),
    pl.col("price").last().alias("close"),
    pl.col("volume").sum()
)

# Rolling statistics
df.with_columns(
    pl.col("price").rolling_mean(window_size=20).alias("sma_20"),
    pl.col("price").rolling_std(window_size=20).alias("volatility")
)

# As-of join for market data alignment
trades.join_asof(quotes, on="timestamp", by="symbol", strategy="backward")
```
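VWAP (see `examples/financial_ohlcv.py` below) follows the same resampling shape; a minimal sketch, assuming per-trade `price` and `volume` columns:

```python
# Volume-weighted average price per 1-minute bar
df.group_by_dynamic("timestamp", every="1m").agg(
    ((pl.col("price") * pl.col("volume")).sum() / pl.col("volume").sum())
    .alias("vwap")
)
```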

Load `python/best_practices.md` for comprehensive time series patterns.

## Runnable Examples

| Example | File | Purpose |
|---|---|---|
| Financial OHLCV | examples/financial_ohlcv.py | OHLCV resampling, rolling stats, VWAP |
| Pandas Migration | examples/pandas_migration.py | Side-by-side pandas vs polars |
| Streaming Large Files | examples/streaming_large_file.py | Out-of-memory processing patterns |

## Development Tips

Use an LSP for navigating Polars code:

- Python: Pyright/Pylance provides excellent type inference for Polars expressions
- Rust: rust-analyzer understands Polars types and expression chains

LSP operations like goToDefinition and hover help explore the Polars API without leaving the editor.
