# log_ingest

A Rust CLI tool for ingesting and searching application log files.

## Overview

Parses application logs containing signature messages and loads them into SQLite for querying. Also provides fast text search across log files with session-aware context expansion. Designed to handle large log volumes (10 GB+ per day) with parallel processing.

## Features

- Parse `signature:` messages, extracting app info, device details, and feature flags
- Support for both plain `.log` and gzip-compressed `.log.gz` files
- File discovery by date range using a `YYYY/MM/DD` directory structure
- Batched inserts for performance with large files
- Parallel file processing for multi-day ingestion
- Indexed columns (`session_id`, `timestamp_ms`, `version_id`) for efficient queries
- Extensible parser architecture for adding new message types
- Full-text search with timestamp and message extraction
- Session-aware expanded search that follows `changeSessionId` chains and correlation IDs
- Exception search filtered by app signature

## Installation

```bash
cargo build --release
```

## Usage

The CLI uses subcommands: `signature`, `search`, and `search-exceptions`.

### `signature` — Load log entries into SQLite

#### Process a single file

```bash
log_ingest signature --file /path/to/logs.log --output output.db
```

#### Process a date range

```bash
log_ingest signature \
  --from 2026/01/20 \
  --to 2026/01/21 \
  --base-dir /var/log/myapp \
  --filename app.log \
  --output output.db
```

For each day in the range, the tool looks for `<base-dir>/YYYY/MM/DD/<filename>.gz` or `<base-dir>/YYYY/MM/DD/<filename>`.

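The lookup for a single day can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `candidate_paths` and `resolve_log_file` are hypothetical names, and trying the `.gz` variant first is an assumption based only on the order in which the two paths are listed above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the two candidate paths for one day.
/// Assumption: the gzip variant is tried before the plain file.
fn candidate_paths(base_dir: &Path, year: u16, month: u8, day: u8, filename: &str) -> Vec<PathBuf> {
    let day_dir = base_dir
        .join(format!("{:04}", year))
        .join(format!("{:02}", month))
        .join(format!("{:02}", day));
    vec![
        day_dir.join(format!("{}.gz", filename)),
        day_dir.join(filename),
    ]
}

/// Returns the first candidate that exists on disk, if any.
fn resolve_log_file(base_dir: &Path, year: u16, month: u8, day: u8, filename: &str) -> Option<PathBuf> {
    candidate_paths(base_dir, year, month, day, filename)
        .into_iter()
        .find(|p| p.is_file())
}
```

With `--base-dir /var/log/myapp --filename app.log`, the candidates for 2026/01/20 would be `/var/log/myapp/2026/01/20/app.log.gz` and `/var/log/myapp/2026/01/20/app.log`.
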
#### Parallel processing

When processing multiple files, parsing runs in parallel by default using all available CPU cores. A single writer thread handles database inserts to avoid SQLite contention.

```bash
# Use all CPU cores (default)
log_ingest signature --from 2026/01/01 --to 2026/01/31 ...

# Limit to 4 threads
log_ingest signature --threads 4 --from 2026/01/01 --to 2026/01/31 ...

# Sequential processing (disable parallelism)
log_ingest signature --threads 1 --from 2026/01/01 --to 2026/01/31 ...
```

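The "parallel parsers, single writer" arrangement described above can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the tool's implementation: `Row` and `ingest` are hypothetical names, each "file" is just a `Vec` of lines, one thread is spawned per file instead of using a bounded pool, and the writer collects rows in memory where the real tool would run batched SQLite inserts.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a parsed record (hypothetical; the real row type is not shown here).
#[derive(Debug, PartialEq)]
struct Row {
    file: usize,
    line: String,
}

/// Parses each input on its own thread and funnels batches to one writer thread.
fn ingest(files: Vec<Vec<String>>, batch_size: usize) -> Vec<Row> {
    let (tx, rx) = mpsc::channel::<Vec<Row>>();

    let workers: Vec<_> = files
        .into_iter()
        .enumerate()
        .map(|(file, lines)| {
            let tx = tx.clone();
            thread::spawn(move || {
                let mut batch = Vec::with_capacity(batch_size);
                for line in lines {
                    // Placeholder parser: keep only lines containing `signature:`.
                    if line.contains("signature:") {
                        batch.push(Row { file, line });
                    }
                    if batch.len() == batch_size {
                        tx.send(std::mem::take(&mut batch)).expect("writer hung up");
                    }
                }
                // Flush the final partial batch.
                if !batch.is_empty() {
                    tx.send(batch).expect("writer hung up");
                }
            })
        })
        .collect();
    // Drop the original sender so the channel closes once every worker finishes.
    drop(tx);

    // Single writer: the only place that would touch the database.
    let writer = thread::spawn(move || {
        let mut stored = Vec::new();
        for batch in rx {
            stored.extend(batch);
        }
        stored
    });

    for worker in workers {
        worker.join().expect("worker panicked");
    }
    writer.join().expect("writer panicked")
}
```

Because only the writer thread consumes the channel, the parsers never compete for a database connection; rows from different files may arrive interleaved, so nothing here relies on insertion order.
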
#### Options

| Option | Description |
|--------|-------------|
| `--file <PATH>` | Single log file to process |
| `--from <DATE>` | Start date (`YYYY/MM/DD`) |
| `--to <DATE>` | End date (`YYYY/MM/DD`) |
| `--base-dir <PATH>` | Base directory containing log files |
| `--filename <NAME>` | Log filename (e.g., `app.log`) |
| `-o, --output <PATH>` | Output SQLite database path |
| `--batch-size <N>` | Batch size for inserts (default: 10000) |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |

### `search` — Search log files for matching lines

Searches a log file for lines containing a query string and prints the timestamp and message for each match. Supports both plain `.log` and gzip `.log.gz` files, with parallel chunk-based processing for plain files.

```bash
# Basic search
log_ingest search --file /path/to/logs.log --query "NullPointerException"

# Include correlationId in output
log_ingest search --file /path/to/logs.log --query "timeout" -c

# Expanded search: find all lines sharing sessionId/correlationId with matches,
# following changeSessionId chains backward to the session start (signature line)
log_ingest search --file /path/to/logs.log --query "Exception" -e
```

Expand mode (`-e`) reads the file twice, with an in-memory expansion step between the two passes:

1. **Pass 1**: Scans the entire file to find matching lines and collects their session IDs and correlation IDs. Also builds a `changeSessionId` graph and tracks which sessions have signature lines.
2. **Expansion**: Follows `changeSessionId` chains backward from seed sessions, stopping at sessions that have a signature (session start boundary).
3. **Pass 2**: Re-reads the file and outputs all lines belonging to expanded session IDs or correlation IDs, stopping early when all expanded sessions are destroyed (`sessionDestroyed`).

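The expansion step (step 2 above) amounts to a backward walk over the `changeSessionId` graph. The sketch below assumes the graph is stored as a map from each session ID to the ID it was changed from; `expand_sessions` and its parameter names are hypothetical, not the tool's API.

```rust
use std::collections::{HashMap, HashSet};

/// `previous` maps a session ID to the ID it was changed from (one entry per
/// `changeSessionId` event). `has_signature` holds the sessions in which a
/// signature line was seen during pass 1.
fn expand_sessions(
    seeds: &[&str],
    previous: &HashMap<String, String>,
    has_signature: &HashSet<String>,
) -> HashSet<String> {
    let mut expanded = HashSet::new();
    for seed in seeds {
        let mut current = seed.to_string();
        // `insert` returns false for an ID already visited, which also guards against cycles.
        while expanded.insert(current.clone()) {
            if has_signature.contains(&current) {
                break; // session start boundary reached
            }
            match previous.get(&current) {
                Some(prev) => current = prev.clone(),
                None => break, // no earlier session known
            }
        }
    }
    expanded
}
```

For a chain `s0 -> s1 -> s2 -> s3` where `s1` has a signature line, a match in `s3` expands to `s3`, `s2`, and `s1`; `s0` is left out because the walk stops at the signature.
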
#### Options

| Option | Description |
|--------|-------------|
| `--file <PATH>` | Log file to search |
| `--query <TEXT>` | Text to search for in log lines |
| `-c, --correlation-id` | Include correlationId in output |
| `-e, --expand` | Expand results to full session context |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |

### `search-exceptions` — Search for exceptions filtered by app

A specialized search that finds `Exception` lines and expands them to full session context, filtered to only sessions belonging to specific apps (identified by their `signature:` line).

```bash
# Find exceptions for a specific app
log_ingest search-exceptions --file /path/to/logs.log --app XAMARIN_APP

# Filter to multiple apps
log_ingest search-exceptions --file /path/to/logs.log --app XAMARIN_APP --app ANOTHER_APP
```

This uses the same two-pass expand approach as `search -e`, with an additional app-filtering step that:

- Matches expanded sessions to their app via the `signature:APP/...` line
- Propagates app membership forward through `changeSessionId` chains
- Filters correlation IDs to only those originating from matching-app sessions
- Uses strict app isolation in pass 2 to prevent lines from non-matching apps leaking through shared correlation IDs

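The first two bullets (matching sessions to apps, then propagating membership forward) could look roughly like the sketch below. It is a simplified illustration: `sessions_for_apps` is a hypothetical name, the `changeSessionId` events are assumed to be available as `(old_id, new_id)` pairs in log order, and the correlation ID filtering and pass 2 isolation are not modeled.

```rust
use std::collections::{HashMap, HashSet};

/// `session_app`: app name taken from each session's `signature:APP/...` line.
/// `changes`: (old_id, new_id) pairs from `changeSessionId` events, in log order.
fn sessions_for_apps(
    session_app: &HashMap<String, String>,
    changes: &[(String, String)],
    wanted_apps: &[&str],
) -> HashSet<String> {
    // Seed with sessions whose own signature names one of the wanted apps.
    let mut members: HashSet<String> = session_app
        .iter()
        .filter(|(_, app)| wanted_apps.contains(&app.as_str()))
        .map(|(session, _)| session.clone())
        .collect();

    // Forward propagation: a session created by changeSessionId inherits
    // membership from the session it replaced. A single pass is enough
    // because the events are assumed to be in log order.
    for (old, new) in changes {
        if members.contains(old) {
            members.insert(new.clone());
        }
    }
    members
}
```

Given a signature for `XAMARIN_APP` in session `a1` and the changes `a1 -> a2 -> a3`, all three IDs end up in the result, while a chain that starts from another app's session contributes nothing.
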
#### Options

| Option | Description |
|--------|-------------|
| `--file <PATH>` | Log file to search |
| `--app <NAME>` | Filter to sessions with this app signature (repeatable) |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |

## Database Schema

The schema uses normalized lookup tables to minimize disk usage for large datasets.

```sql
-- Lookup tables for low-cardinality text columns
CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE models (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE os_versions (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE app_names (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- Main table with foreign keys and millisecond timestamp
CREATE TABLE signature_entries (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    timestamp_ms INTEGER NOT NULL, -- Unix epoch milliseconds
    app_id INTEGER NOT NULL REFERENCES apps(id),
    version_id INTEGER NOT NULL REFERENCES versions(id),
    offline_login_usage INTEGER,
    is_password_autofill_enabled INTEGER,
    camera_roll_usage INTEGER,
    os_id INTEGER REFERENCES os_versions(id),
    app_name_id INTEGER REFERENCES app_names(id),
    touch_id INTEGER,
    is_offline_login_enabled INTEGER,
    model_id INTEGER REFERENCES models(id),
    device_id INTEGER REFERENCES devices(id),
    password_autofill_usage INTEGER
);

CREATE INDEX idx_session_id ON signature_entries(session_id);
CREATE INDEX idx_timestamp ON signature_entries(timestamp_ms);
CREATE INDEX idx_version ON signature_entries(version_id);
```

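One way an ingester might resolve text values to lookup-table IDs without querying the database for every row is an in-memory cache per lookup table. The sketch below is an assumption about how that could be done, not a description of this tool's internals; `Lookup` and `id_for` are hypothetical names, and the version strings in the usage note are made up.

```rust
use std::collections::HashMap;

/// In-memory mirror of one lookup table (for example `versions`): name -> id.
#[derive(Default)]
struct Lookup {
    ids: HashMap<String, i64>,
}

impl Lookup {
    /// Returns the id already assigned to `name`, or assigns the next one.
    /// The bool is true when the name is new and a row would need inserting.
    fn id_for(&mut self, name: &str) -> (i64, bool) {
        if let Some(&id) = self.ids.get(name) {
            return (id, false);
        }
        let id = self.ids.len() as i64 + 1;
        self.ids.insert(name.to_string(), id);
        (id, true)
    }
}
```

Calling `id_for("5.2.0")` twice on a fresh `Lookup` yields `(1, true)` and then `(1, false)`, so each distinct string is stored once and every `signature_entries` row carries only the integer.
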
## Example Queries

```sql
-- Percentage of users with password autofill enabled
SELECT
    ROUND(100.0 * SUM(is_password_autofill_enabled) / COUNT(*), 2) AS pct
FROM signature_entries;

-- Count by app version
SELECT v.name AS version, COUNT(*) AS cnt
FROM signature_entries se
JOIN versions v ON se.version_id = v.id
GROUP BY v.name
ORDER BY cnt DESC;

-- Device breakdown
SELECT d.name AS device, COUNT(*) AS cnt
FROM signature_entries se
JOIN devices d ON se.device_id = d.id
GROUP BY d.name;

-- Convert timestamp_ms to readable datetime
SELECT
    datetime(timestamp_ms / 1000, 'unixepoch') AS timestamp,
    session_id
FROM signature_entries
LIMIT 10;
```

## Development

```bash
# Build
cargo build

# Run tests
cargo test

# Format
cargo fmt

# Lint
cargo clippy
```

## Cross-Compilation

To build a Linux x86_64 binary from macOS:

1. Install [cargo-zigbuild](https://github.com/rust-cross/cargo-zigbuild) and [Zig](https://ziglang.org/):
   ```bash
   cargo install cargo-zigbuild
   brew install zig
   ```

2. Add the Linux target:
   ```bash
   rustup target add x86_64-unknown-linux-gnu
   ```

3. Build:
   ```bash
   cargo zigbuild --release --target x86_64-unknown-linux-gnu
   ```

The binary will be at `target/x86_64-unknown-linux-gnu/release/log_ingest`.

## License

MIT