Add log ingestion tool for loading signature logs into SQLite
- Parse signature messages from log files extracting app info, device details, and feature flags (autofill, touchID, offline login, etc.)
- Support both plain .log and gzip compressed .log.gz files
- File discovery by date range (YYYY/mm/dd directory structure)
- Batch inserts for performance with large files (10GB+ per day)
- Index on session_id and version for efficient queries
- Extensible parser architecture via MessageParser trait
- Parallel file processing for multi-day ingestion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# log_ingest

A Rust CLI tool for loading log files into a SQLite database for analysis.

## Overview

Parses application logs containing signature messages and loads them into SQLite for querying. Designed to handle large log volumes (10GB+ per day) with batched inserts and efficient parsing.
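
The parsing step can be sketched as follows, assuming a hypothetical comma-separated `key=value` payload after the `signature:` marker (the real message format may differ):

```rust
use std::collections::HashMap;

/// Extract key=value pairs from the payload of a `signature:` line.
/// Hypothetical sketch: the real field layout is defined by the app's logging.
fn parse_signature(line: &str) -> Option<HashMap<String, String>> {
    let payload = line.split("signature:").nth(1)?;
    let fields = payload
        .split(',')
        .filter_map(|kv| {
            let (k, v) = kv.split_once('=')?;
            Some((k.trim().to_string(), v.trim().to_string()))
        })
        .collect();
    Some(fields)
}

fn main() {
    let line = "2026-01-20T12:00:00Z INFO signature: app=myapp, version=1.2.3";
    let fields = parse_signature(line).expect("not a signature line");
    assert_eq!(fields["version"], "1.2.3");
}
```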

## Features

- Parse `signature:` messages extracting app info, device details, and feature flags
- Support for both plain `.log` and gzip compressed `.log.gz` files
- File discovery by date range using `YYYY/mm/dd` directory structure
- Batched inserts for performance with large files
- Parallel file processing for multi-day ingestion
- Indexed columns (`session_id`, `version`) for efficient queries
- Extensible parser architecture for adding new message types
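
The extensible parser architecture can be sketched as a trait (hypothetical shape; the crate's real `MessageParser` trait may differ):

```rust
/// Sketch of the extensibility idea: each message type implements one parser.
trait MessageParser {
    /// Marker this parser looks for in a log line, e.g. "signature:".
    fn marker(&self) -> &'static str;
    /// Parse one matching line into column name/value pairs, or None if malformed.
    fn parse(&self, line: &str) -> Option<Vec<(String, String)>>;
}

struct SignatureParser;

impl MessageParser for SignatureParser {
    fn marker(&self) -> &'static str {
        "signature:"
    }
    fn parse(&self, line: &str) -> Option<Vec<(String, String)>> {
        let payload = line.split(self.marker()).nth(1)?;
        Some(vec![("raw".to_string(), payload.trim().to_string())])
    }
}

fn main() {
    // New message types plug in by adding another boxed parser here.
    let parsers: Vec<Box<dyn MessageParser>> = vec![Box::new(SignatureParser)];
    let line = "INFO signature: app=myapp";
    for p in &parsers {
        if line.contains(p.marker()) {
            assert!(p.parse(line).is_some());
        }
    }
}
```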

## Installation

```bash
cargo build --release
```

## Usage

### Process a single file

```bash
log_ingest --file /path/to/logs.log --output output.db
```

### Process a date range

```bash
log_ingest \
  --from 2026/01/20 \
  --to 2026/01/21 \
  --base-dir /var/log/myapp \
  --filename app.log \
  --output output.db
```

The tool will look for files at `<base-dir>/YYYY/MM/DD/<filename>.gz` or `<base-dir>/YYYY/MM/DD/<filename>` for each day in the range.
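
The per-day path construction can be sketched with std path handling (hypothetical helper, not the tool's actual code):

```rust
use std::path::PathBuf;

/// Candidate paths for one day, compressed file first, mirroring the
/// lookup order described above. In the real tool the first existing
/// path would be opened (via a gzip decoder for `.gz`).
fn candidate_paths(base: &str, y: u32, m: u32, d: u32, name: &str) -> [PathBuf; 2] {
    let dir = PathBuf::from(format!("{base}/{y:04}/{m:02}/{d:02}"));
    [dir.join(format!("{name}.gz")), dir.join(name)]
}

fn main() {
    let [gz, plain] = candidate_paths("/var/log/myapp", 2026, 1, 20, "app.log");
    assert_eq!(gz.to_str().unwrap(), "/var/log/myapp/2026/01/20/app.log.gz");
    assert_eq!(plain.to_str().unwrap(), "/var/log/myapp/2026/01/20/app.log");
}
```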

### Parallel processing

When processing multiple files, parsing runs in parallel by default using all available CPU cores. A single writer thread handles database inserts to avoid SQLite contention.
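
The fan-in design can be sketched with std threads and a channel, using an in-memory stand-in for the SQLite writer (illustrative, not the tool's actual code):

```rust
use std::sync::mpsc;
use std::thread;

/// N parser threads send parsed batches over a channel; a single writer
/// thread drains them, so only one thread ever touches the database.
fn fan_in(files: Vec<&'static str>) -> usize {
    let (tx, rx) = mpsc::channel::<Vec<String>>();
    let mut workers = Vec::new();
    for file in files {
        let tx = tx.clone();
        workers.push(thread::spawn(move || {
            // Stand-in for parsing: each file yields one batch of one row.
            tx.send(vec![format!("row from {file}")]).unwrap();
        }));
    }
    drop(tx); // writer's loop ends once every sender is dropped

    // Single writer: in the real tool this thread would hold the SQLite
    // connection and run batched INSERTs inside transactions.
    let writer = thread::spawn(move || rx.iter().map(|b| b.len()).sum::<usize>());
    for w in workers {
        w.join().unwrap();
    }
    writer.join().unwrap()
}

fn main() {
    assert_eq!(fan_in(vec!["a.log", "b.log", "c.log"]), 3);
}
```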

```bash
# Use all CPU cores (default)
log_ingest --from 2026/01/01 --to 2026/01/31 ...

# Limit to 4 threads
log_ingest --threads 4 --from 2026/01/01 --to 2026/01/31 ...

# Sequential processing (disable parallelism)
log_ingest --threads 1 --from 2026/01/01 --to 2026/01/31 ...
```

### Options

| Option | Description |
|--------|-------------|
| `--file <PATH>` | Single log file to process |
| `--from <DATE>` | Start date (YYYY/mm/dd) |
| `--to <DATE>` | End date (YYYY/mm/dd) |
| `--base-dir <PATH>` | Base directory containing log files |
| `--filename <NAME>` | Log filename (e.g., `app.log`) |
| `-o, --output <PATH>` | Output SQLite database path |
| `--batch-size <N>` | Batch size for inserts (default: 10000) |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |
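
Batched inserts typically expand into one multi-row prepared statement per batch. A sketch of building such a statement (illustrative helper, not the tool's actual code; note that SQLite caps the number of bound parameters per statement, which bounds the usable batch size):

```rust
/// Build a multi-row INSERT with `rows x cols.len()` placeholders, so one
/// prepared statement inserts a whole batch instead of one row at a time.
fn batch_insert_sql(table: &str, cols: &[&str], rows: usize) -> String {
    let row = format!("({})", vec!["?"; cols.len()].join(", "));
    let values = vec![row; rows].join(", ");
    format!("INSERT INTO {table} ({}) VALUES {}", cols.join(", "), values)
}

fn main() {
    let sql = batch_insert_sql("signature_entries", &["session_id", "timestamp_ms"], 2);
    assert_eq!(
        sql,
        "INSERT INTO signature_entries (session_id, timestamp_ms) VALUES (?, ?), (?, ?)"
    );
}
```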

## Database Schema

The schema uses normalized lookup tables to minimize disk usage for large datasets.

```sql
-- Lookup tables for low-cardinality text columns
CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE models (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE os_versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE app_names (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- Main table with foreign keys and millisecond timestamp
CREATE TABLE signature_entries (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    timestamp_ms INTEGER NOT NULL, -- Unix epoch milliseconds
    app_id INTEGER NOT NULL REFERENCES apps(id),
    version_id INTEGER NOT NULL REFERENCES versions(id),
    offline_login_usage INTEGER,
    is_password_autofill_enabled INTEGER,
    camera_roll_usage INTEGER,
    os_id INTEGER REFERENCES os_versions(id),
    app_name_id INTEGER REFERENCES app_names(id),
    touch_id INTEGER,
    is_offline_login_enabled INTEGER,
    model_id INTEGER REFERENCES models(id),
    device_id INTEGER REFERENCES devices(id),
    password_autofill_usage INTEGER
);

CREATE INDEX idx_session_id ON signature_entries(session_id);
CREATE INDEX idx_timestamp ON signature_entries(timestamp_ms);
CREATE INDEX idx_version ON signature_entries(version_id);
```
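
Ingestion code for a schema like this typically keeps an in-memory cache of lookup-table ids so each distinct string is interned once. A minimal sketch (illustrative, not the tool's actual code):

```rust
use std::collections::HashMap;

/// In-memory cache mapping a lookup-table string (e.g. a version name)
/// to its rowid, so each distinct value is written to SQLite only once.
struct Interner {
    ids: HashMap<String, i64>,
    next: i64,
}

impl Interner {
    fn new() -> Self {
        Interner { ids: HashMap::new(), next: 1 }
    }

    fn intern(&mut self, name: &str) -> i64 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.next;
        self.next += 1;
        // Real implementation would also run:
        // INSERT INTO <lookup_table> (id, name) VALUES (?, ?)
        self.ids.insert(name.to_string(), id);
        id
    }
}

fn main() {
    let mut versions = Interner::new();
    let a = versions.intern("1.2.3");
    let b = versions.intern("1.2.4");
    assert_eq!(a, versions.intern("1.2.3")); // same string, same id
    assert_ne!(a, b);
}
```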

## Example Queries

```sql
-- Percentage of users with password autofill enabled
SELECT
    ROUND(100.0 * SUM(is_password_autofill_enabled) / COUNT(*), 2) as pct
FROM signature_entries;

-- Count by app version
SELECT v.name as version, COUNT(*) as cnt
FROM signature_entries se
JOIN versions v ON se.version_id = v.id
GROUP BY v.name
ORDER BY cnt DESC;

-- Device breakdown
SELECT d.name as device, COUNT(*) as cnt
FROM signature_entries se
JOIN devices d ON se.device_id = d.id
GROUP BY d.name;

-- Convert timestamp_ms to readable datetime
SELECT
    datetime(timestamp_ms / 1000, 'unixepoch') as timestamp,
    session_id
FROM signature_entries
LIMIT 10;
```

## Development

```bash
# Build
cargo build

# Run tests
cargo test

# Format
cargo fmt

# Lint
cargo clippy
```

## Cross-Compilation

To build a Linux x86_64 binary from macOS:

1. Install [cargo-zigbuild](https://github.com/rust-cross/cargo-zigbuild) and [Zig](https://ziglang.org/):

   ```bash
   cargo install cargo-zigbuild
   brew install zig
   ```

2. Add the Linux target:

   ```bash
   rustup target add x86_64-unknown-linux-gnu
   ```

3. Build:

   ```bash
   cargo zigbuild --release --target x86_64-unknown-linux-gnu
   ```

The binary will be at `target/x86_64-unknown-linux-gnu/release/log_ingest`.

## License

MIT