# log_ingest

A Rust CLI tool for loading log files into a SQLite database for analysis.

## Overview

Parses application logs containing signature messages and loads them into SQLite for querying. Designed to handle large log volumes (10 GB+ per day) with batched inserts and efficient parsing.

## Features

- Parse `signature:` messages, extracting app info, device details, and feature flags
- Support for both plain `.log` and gzip-compressed `.log.gz` files
- File discovery by date range using a `YYYY/mm/dd` directory structure
- Batched inserts for performance with large files
- Parallel file processing for multi-day ingestion
- Indexed columns (`session_id`, `version`) for efficient queries
- Extensible parser architecture for adding new message types

## Installation

```bash
cargo build --release
```

## Usage

### Process a single file

```bash
log_ingest --file /path/to/logs.log --output output.db
```

### Process a date range

```bash
log_ingest \
  --from 2026/01/20 \
  --to 2026/01/21 \
  --base-dir /var/log/myapp \
  --filename app.log \
  --output output.db
```

The tool looks for files at `<base-dir>/YYYY/MM/DD/<filename>.gz` or `<base-dir>/YYYY/MM/DD/<filename>` for each day in the range.

### Parallel processing

When processing multiple files, parsing runs in parallel by default using all available CPU cores. A single writer thread handles database inserts to avoid SQLite contention.

```bash
# Use all CPU cores (default)
log_ingest --from 2026/01/01 --to 2026/01/31 ...

# Limit to 4 threads
log_ingest --threads 4 --from 2026/01/01 --to 2026/01/31 ...

# Sequential processing (disable parallelism)
log_ingest --threads 1 --from 2026/01/01 --to 2026/01/31 ...
```
### Options

| Option | Description |
|--------|-------------|
| `--file <path>` | Single log file to process |
| `--from <date>` | Start date (`YYYY/mm/dd`) |
| `--to <date>` | End date (`YYYY/mm/dd`) |
| `--base-dir <dir>` | Base directory containing log files |
| `--filename <name>` | Log filename (e.g., `app.log`) |
| `-o, --output <path>` | Output SQLite database path |
| `--batch-size <n>` | Batch size for inserts (default: 10000) |
| `--threads <n>` | Number of parallel threads (0 = all cores, 1 = sequential) |

## Database Schema

The schema uses normalized lookup tables to minimize disk usage for large datasets.

```sql
-- Lookup tables for low-cardinality text columns
CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE models (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE os_versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE app_names (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- Main table with foreign keys and millisecond timestamp
CREATE TABLE signature_entries (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    timestamp_ms INTEGER NOT NULL, -- Unix epoch milliseconds
    app_id INTEGER NOT NULL REFERENCES apps(id),
    version_id INTEGER NOT NULL REFERENCES versions(id),
    offline_login_usage INTEGER,
    is_password_autofill_enabled INTEGER,
    camera_roll_usage INTEGER,
    os_id INTEGER REFERENCES os_versions(id),
    app_name_id INTEGER REFERENCES app_names(id),
    touch_id INTEGER,
    is_offline_login_enabled INTEGER,
    model_id INTEGER REFERENCES models(id),
    device_id INTEGER REFERENCES devices(id),
    password_autofill_usage INTEGER
);

CREATE INDEX idx_session_id ON signature_entries(session_id);
CREATE INDEX idx_timestamp ON signature_entries(timestamp_ms);
CREATE INDEX idx_version ON signature_entries(version_id);
```

## Example Queries

```sql
-- Percentage of users with password autofill enabled
SELECT ROUND(100.0 * SUM(is_password_autofill_enabled) / COUNT(*), 2) AS pct
FROM signature_entries;

-- Count by app version
SELECT v.name AS version, COUNT(*) AS cnt
FROM signature_entries se
JOIN versions v ON se.version_id = v.id
GROUP BY v.name
ORDER BY cnt DESC;

-- Device breakdown
SELECT d.name AS device, COUNT(*) AS cnt
FROM signature_entries se
JOIN devices d ON se.device_id = d.id
GROUP BY d.name;

-- Convert timestamp_ms to a readable datetime
SELECT datetime(timestamp_ms / 1000, 'unixepoch') AS timestamp, session_id
FROM signature_entries
LIMIT 10;
```

## Development

```bash
# Build
cargo build

# Run tests
cargo test

# Format
cargo fmt

# Lint
cargo clippy
```

## Cross-Compilation

To build a Linux x86_64 binary from macOS:

1. Install [cargo-zigbuild](https://github.com/rust-cross/cargo-zigbuild) and [Zig](https://ziglang.org/):

   ```bash
   cargo install cargo-zigbuild
   brew install zig
   ```

2. Add the Linux target:

   ```bash
   rustup target add x86_64-unknown-linux-gnu
   ```

3. Build:

   ```bash
   cargo zigbuild --release --target x86_64-unknown-linux-gnu
   ```

The binary will be at `target/x86_64-unknown-linux-gnu/release/log_ingest`.

## License

MIT