# log_ingest

A Rust CLI tool for ingesting and searching application log files.

## Overview

Parses application logs containing signature messages and loads them into SQLite for querying. Also provides fast text search across log files with session-aware context expansion. Designed to handle large log volumes (10GB+ per day) with parallel processing.

## Features

- Parse `signature:` messages extracting app info, device details, and feature flags
- Support for both plain `.log` and gzip compressed `.log.gz` files
- File discovery by date range using `YYYY/MM/DD` directory structure
- Batched inserts for performance with large files
- Parallel file processing for multi-day ingestion
- Indexed columns (`session_id`, `timestamp_ms`, `version_id`) for efficient queries
- Extensible parser architecture for adding new message types
- Full-text search with timestamp and message extraction
- Session-aware expanded search that follows `changeSessionId` chains and correlation IDs
- Exception search filtered by app signature

## Installation

```bash
cargo build --release
```

## Usage

The CLI uses subcommands: `signature`, `search`, and `search-exceptions`.

### `signature`: Load log entries into SQLite

#### Process a single file

```bash
log_ingest signature --file /path/to/logs.log --output output.db
```

#### Process a date range

```bash
log_ingest signature \
  --from 2026/01/20 \
  --to 2026/01/21 \
  --base-dir /var/log/myapp \
  --filename app.log \
  --output output.db
```

For each day in the range, the tool looks in the `YYYY/MM/DD` directory under `--base-dir` for the `--filename` file, with or without a `.gz` suffix.

#### Parallel processing

When processing multiple files, parsing runs in parallel by default using all available CPU cores. A single writer thread handles database inserts to avoid SQLite contention.

```bash
# Use all CPU cores (default)
log_ingest signature --from 2026/01/01 --to 2026/01/31 ...

# Limit to 4 threads
log_ingest signature --threads 4 --from 2026/01/01 --to 2026/01/31 ...

# Sequential processing (disable parallelism)
log_ingest signature --threads 1 --from 2026/01/01 --to 2026/01/31 ...
```

#### Options

| Option | Description |
|--------|-------------|
| `--file` | Single log file to process |
| `--from` | Start date (YYYY/MM/DD) |
| `--to` | End date (YYYY/MM/DD) |
| `--base-dir` | Base directory containing log files |
| `--filename` | Log filename (e.g., `app.log`) |
| `-o, --output` | Output SQLite database path |
| `--batch-size` | Batch size for inserts (default: 10000) |
| `--threads` | Number of parallel threads (0 = all cores, 1 = sequential) |

### `search`: Search log files for matching lines

Searches a log file for lines containing a query string and prints the timestamp and message for each match. Supports both plain `.log` and gzip `.log.gz` files, with parallel chunk-based processing for plain files.

```bash
# Basic search
log_ingest search --file /path/to/logs.log --query "NullPointerException"

# Include correlationId in output
log_ingest search --file /path/to/logs.log --query "timeout" -c

# Expanded search: find all lines sharing sessionId/correlationId with matches,
# following changeSessionId chains backward to the session start (signature line)
log_ingest search --file /path/to/logs.log --query "Exception" -e
```

Expand mode (`-e`) reads the file twice, with an expansion step between the two passes:

1. **Pass 1**: Scans the entire file to find matching lines and collects their session IDs and correlation IDs. Also builds a `changeSessionId` graph and tracks which sessions have signature lines.
2. **Expansion**: Follows `changeSessionId` chains backward from seed sessions, stopping at sessions that have a signature (session start boundary).
3. **Pass 2**: Re-reads the file and outputs all lines belonging to expanded session IDs or correlation IDs, stopping early when all expanded sessions are destroyed (`sessionDestroyed`).
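
The expansion step can be sketched as a backward graph walk. This is an illustrative sketch, not the tool's actual code: the function name `expand_sessions`, the `predecessors` map (new session ID to the session IDs it was changed from), and the `has_signature` set are all hypothetical, and it assumes the session carrying the signature line is itself included in the result.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Walks `changeSessionId` edges backward from the seed sessions.
/// `predecessors` maps a session ID to the IDs it was changed from (hypothetical layout).
/// Sessions listed in `has_signature` are included but not expanded any further.
fn expand_sessions(
    seeds: &[&str],
    predecessors: &HashMap<String, Vec<String>>,
    has_signature: &HashSet<String>,
) -> HashSet<String> {
    let mut expanded: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = seeds.iter().map(|s| s.to_string()).collect();

    while let Some(session) = queue.pop_front() {
        // `insert` returns false for already-visited IDs, which also guards against cycles.
        if !expanded.insert(session.clone()) {
            continue;
        }
        // A signature marks the session start boundary: keep it, but stop walking.
        if has_signature.contains(&session) {
            continue;
        }
        if let Some(prev) = predecessors.get(&session) {
            queue.extend(prev.iter().cloned());
        }
    }
    expanded
}

fn main() {
    // Made-up chain: s0 -> s1 (has signature) -> s2 -> s3, with a match found in s3.
    let mut predecessors: HashMap<String, Vec<String>> = HashMap::new();
    predecessors.insert("s3".to_string(), vec!["s2".to_string()]);
    predecessors.insert("s2".to_string(), vec!["s1".to_string()]);
    predecessors.insert("s1".to_string(), vec!["s0".to_string()]);

    let mut has_signature: HashSet<String> = HashSet::new();
    has_signature.insert("s1".to_string());

    let mut ids: Vec<String> = expand_sessions(&["s3"], &predecessors, &has_signature)
        .into_iter()
        .collect();
    ids.sort();
    println!("{:?}", ids);
}
```

In this example the walk from `s3` reaches `s2` and `s1`, then stops at `s1` because it has a signature, so `s0` is never added to the expanded set.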

#### Options

| Option | Description |
|--------|-------------|
| `--file` | Log file to search |
| `--query` | Text to search for in log lines |
| `-c, --correlation-id` | Include correlationId in output |
| `-e, --expand` | Expand results to full session context |
| `--threads` | Number of parallel threads (0 = all cores, 1 = sequential) |

### `search-exceptions`: Search for exceptions filtered by app

A specialized search that finds `Exception` lines and expands them to full session context, filtered to only sessions belonging to specific apps (identified by their `signature:` line).

```bash
# Find exceptions for a specific app
log_ingest search-exceptions --file /path/to/logs.log --app XAMARIN_APP

# Filter to multiple apps
log_ingest search-exceptions --file /path/to/logs.log --app XAMARIN_APP --app ANOTHER_APP
```

This uses the same two-pass expand approach as `search -e`, with an additional app-filtering step that:

- Matches expanded sessions to their app via the `signature:APP/...` line
- Propagates app membership forward through `changeSessionId` chains
- Filters correlation IDs to only those originating from matching-app sessions
- Uses strict app isolation in pass 2 to prevent lines from non-matching apps leaking through shared correlation IDs

#### Options

| Option | Description |
|--------|-------------|
| `--file` | Log file to search |
| `--app` | Filter to sessions with this app signature (repeatable) |
| `--threads` | Number of parallel threads (0 = all cores, 1 = sequential) |

## Database Schema

The schema uses normalized lookup tables to minimize disk usage for large datasets.

```sql
-- Lookup tables for low-cardinality text columns
CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE models (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE os_versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE app_names (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- Main table with foreign keys and millisecond timestamp
CREATE TABLE signature_entries (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    timestamp_ms INTEGER NOT NULL,  -- Unix epoch milliseconds
    app_id INTEGER NOT NULL REFERENCES apps(id),
    version_id INTEGER NOT NULL REFERENCES versions(id),
    offline_login_usage INTEGER,
    is_password_autofill_enabled INTEGER,
    camera_roll_usage INTEGER,
    os_id INTEGER REFERENCES os_versions(id),
    app_name_id INTEGER REFERENCES app_names(id),
    touch_id INTEGER,
    is_offline_login_enabled INTEGER,
    model_id INTEGER REFERENCES models(id),
    device_id INTEGER REFERENCES devices(id),
    password_autofill_usage INTEGER
);

CREATE INDEX idx_session_id ON signature_entries(session_id);
CREATE INDEX idx_timestamp ON signature_entries(timestamp_ms);
CREATE INDEX idx_version ON signature_entries(version_id);
```

## Example Queries

```sql
-- Percentage of users with password autofill enabled
SELECT ROUND(100.0 * SUM(is_password_autofill_enabled) / COUNT(*), 2) as pct
FROM signature_entries;

-- Count by app version
SELECT v.name as version, COUNT(*) as cnt
FROM signature_entries se
JOIN versions v ON se.version_id = v.id
GROUP BY v.name
ORDER BY cnt DESC;

-- Device breakdown
SELECT d.name as device, COUNT(*) as cnt
FROM signature_entries se
JOIN devices d ON se.device_id = d.id
GROUP BY d.name;

-- Convert timestamp_ms to readable datetime
SELECT datetime(timestamp_ms / 1000, 'unixepoch') as timestamp, session_id
FROM signature_entries
LIMIT 10;
```

## Development

```bash
# Build
cargo build

# Run tests
cargo test

# Format
cargo fmt

# Lint
cargo clippy
```

## Cross-Compilation

To build a Linux x86_64 binary from macOS:

1. Install [cargo-zigbuild](https://github.com/rust-cross/cargo-zigbuild) and [Zig](https://ziglang.org/):

   ```bash
   cargo install cargo-zigbuild
   brew install zig
   ```

2. Add the Linux target:

   ```bash
   rustup target add x86_64-unknown-linux-gnu
   ```

3. Build:

   ```bash
   cargo zigbuild --release --target x86_64-unknown-linux-gnu
   ```

The binary will be at `target/x86_64-unknown-linux-gnu/release/log_ingest`.

## License

MIT