# log_ingest
A Rust CLI tool for loading log files into a SQLite database for analysis.
## Overview
Parses application logs containing signature messages and loads them into SQLite for querying. Designed to handle large log volumes (10GB+ per day) with batched inserts and efficient parsing.
## Features

- Parse `signature:` messages, extracting app info, device details, and feature flags
- Support for both plain `.log` and gzip-compressed `.log.gz` files
- File discovery by date range using a `YYYY/mm/dd` directory structure
- Batched inserts for performance with large files
- Parallel file processing for multi-day ingestion
- Indexed columns (`session_id`, `version`) for efficient queries
- Extensible parser architecture for adding new message types
## Installation

```sh
cargo build --release
```
## Usage

### Process a single file

```sh
log_ingest --file /path/to/logs.log --output output.db
```

### Process a date range

```sh
log_ingest \
  --from 2026/01/20 \
  --to 2026/01/21 \
  --base-dir /var/log/myapp \
  --filename app.log \
  --output output.db
```
The tool looks for `<base-dir>/YYYY/mm/dd/<filename>.gz` or `<base-dir>/YYYY/mm/dd/<filename>` for each day in the range, preferring the compressed file.
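The per-day lookup described above can be sketched as follows. This is a minimal illustration, not the tool's actual code; `candidate_paths` is a hypothetical helper, and the date is assumed to already be formatted as `YYYY/mm/dd`:

```rust
use std::path::PathBuf;

/// Build the candidate log paths for one day, compressed file first.
/// `date` is assumed to already be in `YYYY/mm/dd` form.
fn candidate_paths(base_dir: &str, date: &str, filename: &str) -> Vec<PathBuf> {
    let dir: PathBuf = [base_dir, date].iter().collect();
    vec![
        dir.join(format!("{filename}.gz")), // checked first
        dir.join(filename),                 // plain-text fallback
    ]
}

fn main() {
    for p in candidate_paths("/var/log/myapp", "2026/01/20", "app.log") {
        println!("{}", p.display());
    }
}
```

In the real tool, the first candidate that exists on disk would be opened (through a gzip decoder for the `.gz` case).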
### Parallel processing

When processing multiple files, parsing runs in parallel by default using all available CPU cores. A single writer thread handles database inserts to avoid SQLite contention.

```sh
# Use all CPU cores (default)
log_ingest --from 2026/01/01 --to 2026/01/31 ...

# Limit to 4 threads
log_ingest --threads 4 --from 2026/01/01 --to 2026/01/31 ...

# Sequential processing (disable parallelism)
log_ingest --threads 1 --from 2026/01/01 --to 2026/01/31 ...
```
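The parse-in-parallel, write-on-one-thread pipeline can be sketched with standard-library threads and a channel. This is an illustrative fan-in pattern, not the tool's actual implementation; `ingest` and its one-row-per-file stand-in "parser" are hypothetical:

```rust
use std::sync::mpsc;
use std::thread;

// Each worker thread parses one file and sends its rows over a channel;
// a single consumer drains the channel, so only one thread would ever
// touch the SQLite connection.
fn ingest(files: Vec<&'static str>) -> usize {
    let (tx, rx) = mpsc::channel::<String>();
    let mut workers = Vec::new();
    for file in files {
        let tx = tx.clone();
        workers.push(thread::spawn(move || {
            // Stand-in for real parsing: emit one "row" per file.
            tx.send(format!("row from {file}")).unwrap();
        }));
    }
    drop(tx); // close the channel once all senders are gone

    // Single writer: in the real tool, batched inserts happen here.
    let rows: Vec<String> = rx.iter().collect();
    for w in workers {
        w.join().unwrap();
    }
    rows.len()
}

fn main() {
    let n = ingest(vec!["2026/01/01/app.log.gz", "2026/01/02/app.log.gz"]);
    println!("wrote {n} rows");
}
```

Funneling all writes through one consumer sidesteps SQLite's single-writer locking rather than fighting it with retries.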
## Options

| Option | Description |
|---|---|
| `--file <PATH>` | Single log file to process |
| `--from <DATE>` | Start date (`YYYY/mm/dd`) |
| `--to <DATE>` | End date (`YYYY/mm/dd`) |
| `--base-dir <PATH>` | Base directory containing log files |
| `--filename <NAME>` | Log filename (e.g., `app.log`) |
| `-o, --output <PATH>` | Output SQLite database path |
| `--batch-size <N>` | Batch size for inserts (default: 10000) |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |
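The `--batch-size` option controls how many rows are buffered before a flush, so each flush can become one SQLite transaction instead of one disk sync per row. A minimal sketch of the buffering logic; `BatchWriter` is a hypothetical name and the database call is elided:

```rust
// Rows accumulate in memory and are flushed in groups of `batch_size`.
struct BatchWriter {
    buf: Vec<String>,
    batch_size: usize,
    flushed: usize, // total rows written so far
}

impl BatchWriter {
    fn push(&mut self, row: String) {
        self.buf.push(row);
        if self.buf.len() >= self.batch_size {
            self.flush();
        }
    }

    fn flush(&mut self) {
        // Real implementation: BEGIN; INSERT ... (one per row); COMMIT;
        self.flushed += self.buf.len();
        self.buf.clear();
    }
}

fn main() {
    let mut w = BatchWriter { buf: Vec::new(), batch_size: 3, flushed: 0 };
    for i in 0..7 {
        w.push(format!("row {i}"));
    }
    w.flush(); // flush the final partial batch
    println!("flushed {} rows", w.flushed);
}
```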
## Database Schema

The schema uses normalized lookup tables to minimize disk usage for large datasets.

```sql
-- Lookup tables for low-cardinality text columns
CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE models (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE os_versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE app_names (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- Main table with foreign keys and millisecond timestamp
CREATE TABLE signature_entries (
    session_id TEXT NOT NULL,
    timestamp_ms INTEGER NOT NULL, -- Unix epoch milliseconds
    app_id INTEGER NOT NULL REFERENCES apps(id),
    version_id INTEGER NOT NULL REFERENCES versions(id),
    offline_login_usage INTEGER,
    is_password_autofill_enabled INTEGER,
    camera_roll_usage INTEGER,
    os_id INTEGER REFERENCES os_versions(id),
    app_name_id INTEGER REFERENCES app_names(id),
    touch_id INTEGER,
    is_offline_login_enabled INTEGER,
    model_id INTEGER REFERENCES models(id),
    device_id INTEGER REFERENCES devices(id),
    password_autofill_usage INTEGER,
    PRIMARY KEY (session_id, timestamp_ms)
) WITHOUT ROWID;

CREATE INDEX idx_session_id ON signature_entries(session_id);
CREATE INDEX idx_version ON signature_entries(version_id);
```
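During ingestion, each low-cardinality string (app, version, model, ...) must be swapped for its integer id. A minimal sketch of the interning pattern, assuming an in-memory cache keyed by name; in the real tool a cache miss would also run `INSERT OR IGNORE` plus `SELECT id` against the lookup table, and the sample version strings here are hypothetical:

```rust
use std::collections::HashMap;

// Maps a repeated string to a small integer id, handing out new ids
// on first sight and cached ids thereafter.
struct Interner {
    ids: HashMap<String, i64>,
    next_id: i64,
}

impl Interner {
    fn intern(&mut self, name: &str) -> i64 {
        if let Some(&id) = self.ids.get(name) {
            return id; // cache hit: no database round-trip needed
        }
        let id = self.next_id;
        self.next_id += 1;
        self.ids.insert(name.to_string(), id);
        id
    }
}

fn main() {
    let mut versions = Interner { ids: HashMap::new(), next_id: 1 };
    let a = versions.intern("8.10.2");
    let b = versions.intern("8.10.3");
    let c = versions.intern("8.10.2"); // same id as the first call
    println!("{a} {b} {c}");
}
```

Because version, model, and device columns repeat the same few strings millions of times, storing a small integer per row instead of the text is what keeps the `signature_entries` table compact.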
## Example Queries

```sql
-- Percentage of users with password autofill enabled
SELECT
    ROUND(100.0 * SUM(is_password_autofill_enabled) / COUNT(*), 2) AS pct
FROM signature_entries;

-- Count by app version
SELECT v.name AS version, COUNT(*) AS cnt
FROM signature_entries se
JOIN versions v ON se.version_id = v.id
GROUP BY v.name
ORDER BY cnt DESC;

-- Device breakdown
SELECT d.name AS device, COUNT(*) AS cnt
FROM signature_entries se
JOIN devices d ON se.device_id = d.id
GROUP BY d.name;

-- Convert timestamp_ms to a readable datetime
SELECT
    datetime(timestamp_ms / 1000, 'unixepoch') AS timestamp,
    session_id
FROM signature_entries
LIMIT 10;
```
## Development

```sh
# Build
cargo build

# Run tests
cargo test

# Format
cargo fmt

# Lint
cargo clippy
```
## Cross-Compilation

To build a Linux x86_64 binary from macOS:

1. Install cargo-zigbuild and Zig:

   ```sh
   cargo install cargo-zigbuild
   brew install zig
   ```

2. Add the Linux target:

   ```sh
   rustup target add x86_64-unknown-linux-gnu
   ```

3. Build:

   ```sh
   cargo zigbuild --release --target x86_64-unknown-linux-gnu
   ```

The binary will be at `target/x86_64-unknown-linux-gnu/release/log_ingest`.
## License

MIT