log_ingest

A Rust CLI tool for ingesting and searching application log files.

Overview

Parses application logs containing signature messages and loads them into SQLite for querying. Also provides fast text search across log files with session-aware context expansion. Designed to handle large log volumes (10GB+ per day) with parallel processing.

Features

  • Parse signature: messages extracting app info, device details, and feature flags
  • Support for both plain .log and gzip compressed .log.gz files
  • File discovery by date range using YYYY/mm/dd directory structure
  • Batched inserts for performance with large files
  • Parallel file processing for multi-day ingestion
  • Indexed columns (session_id, version) for efficient queries
  • Extensible parser architecture for adding new message types
  • Full-text search with timestamp and message extraction
  • Session-aware expanded search that follows changeSessionId chains and correlation IDs
  • Exception search filtered by app signature

Installation

cargo build --release

Usage

The CLI uses subcommands: signature, search, and search-exceptions.

signature — Load log entries into SQLite

Process a single file

log_ingest signature --file /path/to/logs.log --output output.db

Process a date range

log_ingest signature \
  --from 2026/01/20 \
  --to 2026/01/21 \
  --base-dir /var/log/myapp \
  --filename app.log \
  --output output.db

The tool will look for files at <base-dir>/YYYY/MM/DD/<filename>.gz or <base-dir>/YYYY/MM/DD/<filename> for each day in the range.

Parallel processing

When processing multiple files, parsing runs in parallel by default using all available CPU cores. A single writer thread handles database inserts to avoid SQLite contention.

# Use all CPU cores (default)
log_ingest signature --from 2026/01/01 --to 2026/01/31 ...

# Limit to 4 threads
log_ingest signature --threads 4 --from 2026/01/01 --to 2026/01/31 ...

# Sequential processing (disable parallelism)
log_ingest signature --threads 1 --from 2026/01/01 --to 2026/01/31 ...

Options

Option Description
--file <PATH> Single log file to process
--from <DATE> Start date (YYYY/mm/dd)
--to <DATE> End date (YYYY/mm/dd)
--base-dir <PATH> Base directory containing log files
--filename <NAME> Log filename (e.g., app.log)
-o, --output <PATH> Output SQLite database path
--batch-size <N> Batch size for inserts (default: 10000)
--threads <N> Number of parallel threads (0 = all cores, 1 = sequential)

search — Search log files for matching lines

Searches a log file for lines containing a query string and prints the timestamp and message for each match. Supports both plain .log and gzip .log.gz files, with parallel chunk-based processing for plain files.

# Basic search
log_ingest search --file /path/to/logs.log --query "NullPointerException"

# Include correlationId in output
log_ingest search --file /path/to/logs.log --query "timeout" -c

# Expanded search: find all lines sharing sessionId/correlationId with matches,
# following changeSessionId chains backward to the session start (signature line)
log_ingest search --file /path/to/logs.log --query "Exception" -e

Expand mode (-e) performs a two-pass search:

  1. Pass 1: Scans the entire file to find matching lines and collects their session IDs and correlation IDs. Also builds a changeSessionId graph and tracks which sessions have signature lines.
  2. Expansion: Follows changeSessionId chains backward from seed sessions, stopping at sessions that have a signature (session start boundary).
  3. Pass 2: Re-reads the file and outputs all lines belonging to expanded session IDs or correlation IDs, stopping early when all expanded sessions are destroyed (sessionDestroyed).

Options

Option Description
--file <PATH> Log file to search
--query <TEXT> Text to search for in log lines
-c, --correlation-id Include correlationId in output
-e, --expand Expand results to full session context
--threads <N> Number of parallel threads (0 = all cores, 1 = sequential)

search-exceptions — Search for exceptions filtered by app

A specialized search that finds Exception lines and expands them to full session context, filtered to only sessions belonging to specific apps (identified by their signature: line).

# Find exceptions for a specific app
log_ingest search-exceptions --file /path/to/logs.log --app XAMARIN_APP

# Filter to multiple apps
log_ingest search-exceptions --file /path/to/logs.log --app XAMARIN_APP --app ANOTHER_APP

This uses the same two-pass expand approach as search -e, with an additional app-filtering step that:

  • Matches expanded sessions to their app via the signature:APP/... line
  • Propagates app membership forward through changeSessionId chains
  • Filters correlation IDs to only those originating from matching-app sessions
  • Uses strict app isolation in pass 2 to prevent lines from non-matching apps leaking through shared correlation IDs

Options

Option Description
--file <PATH> Log file to search
--app <NAME> Filter to sessions with this app signature (repeatable)
--threads <N> Number of parallel threads (0 = all cores, 1 = sequential)

Database Schema

The schema uses normalized lookup tables to minimize disk usage for large datasets.

-- Lookup tables for low-cardinality text columns
CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE models (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE os_versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE app_names (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- Main table with foreign keys and millisecond timestamp
CREATE TABLE signature_entries (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    timestamp_ms INTEGER NOT NULL,  -- Unix epoch milliseconds
    app_id INTEGER NOT NULL REFERENCES apps(id),
    version_id INTEGER NOT NULL REFERENCES versions(id),
    offline_login_usage INTEGER,
    is_password_autofill_enabled INTEGER,
    camera_roll_usage INTEGER,
    os_id INTEGER REFERENCES os_versions(id),
    app_name_id INTEGER REFERENCES app_names(id),
    touch_id INTEGER,
    is_offline_login_enabled INTEGER,
    model_id INTEGER REFERENCES models(id),
    device_id INTEGER REFERENCES devices(id),
    password_autofill_usage INTEGER
);

CREATE INDEX idx_session_id ON signature_entries(session_id);
CREATE INDEX idx_timestamp ON signature_entries(timestamp_ms);
CREATE INDEX idx_version ON signature_entries(version_id);

Example Queries

-- Percentage of users with password autofill enabled
SELECT
    ROUND(100.0 * SUM(is_password_autofill_enabled) / COUNT(*), 2) as pct
FROM signature_entries;

-- Count by app version
SELECT v.name as version, COUNT(*) as cnt
FROM signature_entries se
JOIN versions v ON se.version_id = v.id
GROUP BY v.name
ORDER BY cnt DESC;

-- Device breakdown
SELECT d.name as device, COUNT(*) as cnt
FROM signature_entries se
JOIN devices d ON se.device_id = d.id
GROUP BY d.name;

-- Convert timestamp_ms to readable datetime
SELECT
    datetime(timestamp_ms / 1000, 'unixepoch') as timestamp,
    session_id
FROM signature_entries
LIMIT 10;

Development

# Build
cargo build

# Run tests
cargo test

# Format
cargo fmt

# Lint
cargo clippy

Cross-Compilation

To build a Linux x86_64 binary from macOS:

  1. Install cargo-zigbuild and Zig:

    cargo install cargo-zigbuild
    brew install zig
    
  2. Add the Linux target:

    rustup target add x86_64-unknown-linux-gnu
    
  3. Build:

    cargo zigbuild --release --target x86_64-unknown-linux-gnu
    

The binary will be at target/x86_64-unknown-linux-gnu/release/log_ingest.

License

MIT

Description
A Rust CLI tool for processing log files, including loading client signatures into a SQLite database for usage analysis.
Readme 698 KiB
Languages
Rust 100%