Add search, including specialized search for exceptions

2026-02-21 01:21:30 +01:00
parent 7e03af23de
commit c01b9ba97e
11 changed files with 1748 additions and 367 deletions

@@ -1,10 +1,10 @@
# log_ingest
A Rust CLI tool for loading log files into a SQLite database for analysis.
A Rust CLI tool for ingesting and searching application log files.
## Overview
Parses application logs containing signature messages and loads them into SQLite for querying. Designed to handle large log volumes (10GB+ per day) with batched inserts and efficient parsing.
Parses application logs containing signature messages and loads them into SQLite for querying. Also provides fast text search across log files with session-aware context expansion. Designed to handle large log volumes (10GB+ per day) with parallel processing.
## Features
@@ -15,6 +15,9 @@ Parses application logs containing signature messages and loads them into SQLite
- Parallel file processing for multi-day ingestion
- Indexed columns (`session_id`, `version`) for efficient queries
- Extensible parser architecture for adding new message types
- Full-text search with timestamp and message extraction
- Session-aware expanded search that follows `changeSessionId` chains and correlation IDs
- Exception search filtered by app signature
## Installation
@@ -24,16 +27,20 @@ cargo build --release
## Usage
### Process a single file
The CLI uses subcommands: `signature`, `search`, and `search-exceptions`.
### `signature` — Load log entries into SQLite
#### Process a single file
```bash
log_ingest --file /path/to/logs.log --output output.db
log_ingest signature --file /path/to/logs.log --output output.db
```
### Process a date range
#### Process a date range
```bash
log_ingest \
log_ingest signature \
--from 2026/01/20 \
--to 2026/01/21 \
--base-dir /var/log/myapp \
@@ -43,22 +50,22 @@ log_ingest \
The tool will look for files at `<base-dir>/YYYY/MM/DD/<filename>.gz` or `<base-dir>/YYYY/MM/DD/<filename>` for each day in the range.
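The lookup described above can be sketched as follows. This is an illustration only: the function name `candidate_paths`, the integer date arguments, and trying the `.gz` variant first are assumptions, not details confirmed by this README.

```rust
use std::path::{Path, PathBuf};

/// Build the two candidate paths for one day under `base_dir`:
/// `<base-dir>/YYYY/MM/DD/<filename>.gz` and `<base-dir>/YYYY/MM/DD/<filename>`.
/// Returning the compressed variant first is an assumption of this sketch.
fn candidate_paths(base_dir: &Path, year: u16, month: u8, day: u8, filename: &str) -> [PathBuf; 2] {
    // Zero-pad each date component so the directories match YYYY/MM/DD.
    let day_dir = base_dir
        .join(format!("{year:04}"))
        .join(format!("{month:02}"))
        .join(format!("{day:02}"));
    [day_dir.join(format!("{filename}.gz")), day_dir.join(filename)]
}
```

A caller would presumably take the first candidate that exists on disk and skip the day if neither does.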
### Parallel processing
#### Parallel processing
When processing multiple files, parsing runs in parallel by default using all available CPU cores. A single writer thread handles database inserts to avoid SQLite contention.
```bash
# Use all CPU cores (default)
log_ingest --from 2026/01/01 --to 2026/01/31 ...
log_ingest signature --from 2026/01/01 --to 2026/01/31 ...
# Limit to 4 threads
log_ingest --threads 4 --from 2026/01/01 --to 2026/01/31 ...
log_ingest signature --threads 4 --from 2026/01/01 --to 2026/01/31 ...
# Sequential processing (disable parallelism)
log_ingest --threads 1 --from 2026/01/01 --to 2026/01/31 ...
log_ingest signature --threads 1 --from 2026/01/01 --to 2026/01/31 ...
```
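The "parallel parsers, single writer" layout can be sketched with a standard-library channel. This is a simplified model, not the tool's code: it spawns one thread per file instead of honoring `--threads`, uses a substring check as a stand-in for parsing, and counts rows where a real writer would insert a batch into SQLite. The name `ingest_parallel` is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Parse every file on its own thread and funnel the results to one writer.
/// Returns (rows written, batches flushed).
fn ingest_parallel(files: Vec<Vec<String>>, batch_size: usize) -> (usize, usize) {
    let (tx, rx) = mpsc::channel::<String>();

    // Parser threads: each one owns a clone of the sender.
    let parsers: Vec<_> = files
        .into_iter()
        .map(|lines| {
            let tx = tx.clone();
            thread::spawn(move || {
                for line in lines {
                    // Stand-in for real parsing: keep only signature lines.
                    if line.contains("signature:") {
                        tx.send(line).expect("writer stopped early");
                    }
                }
            })
        })
        .collect();
    // Drop the original sender so the channel closes once all parsers finish.
    drop(tx);

    // Single writer thread: the only place that would touch the database.
    let writer = thread::spawn(move || {
        let limit = batch_size.max(1);
        let mut batch: Vec<String> = Vec::with_capacity(limit);
        let (mut rows, mut flushes) = (0usize, 0usize);
        for row in rx {
            batch.push(row);
            if batch.len() >= limit {
                // A real writer would insert the batch in one transaction here.
                rows += batch.len();
                flushes += 1;
                batch.clear();
            }
        }
        // Flush whatever is left after the channel closes.
        if !batch.is_empty() {
            rows += batch.len();
            flushes += 1;
        }
        (rows, flushes)
    });

    for parser in parsers {
        parser.join().expect("parser thread panicked");
    }
    writer.join().expect("writer thread panicked")
}
```

Keeping all inserts on one thread means parsers never wait on a database lock; they only wait on the channel.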
### Options
#### Options
| Option | Description |
|--------|-------------|
@@ -71,6 +78,63 @@ log_ingest --threads 1 --from 2026/01/01 --to 2026/01/31 ...
| `--batch-size <N>` | Batch size for inserts (default: 10000) |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |
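
The `--threads` convention (0 = all cores) could be resolved along these lines. The helper name `resolve_threads` and the fallback to one thread when the core count cannot be determined are assumptions of this sketch.

```rust
use std::thread;

/// Map the `--threads` value to a concrete worker count.
/// 0 means "use all available cores"; any other value is used as given.
fn resolve_threads(requested: usize) -> usize {
    if requested == 0 {
        // Fall back to a single thread if the platform cannot report parallelism.
        thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
    } else {
        requested
    }
}
```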
### `search` — Search log files for matching lines
Searches a log file for lines containing a query string and prints the timestamp and message of each match. Both plain `.log` and gzip-compressed `.log.gz` files are supported; plain files are processed in parallel, chunk by chunk.
```bash
# Basic search
log_ingest search --file /path/to/logs.log --query "NullPointerException"
# Include correlationId in output
log_ingest search --file /path/to/logs.log --query "timeout" -c
# Expanded search: find all lines sharing sessionId/correlationId with matches,
# following changeSessionId chains backward to the session start (signature line)
log_ingest search --file /path/to/logs.log --query "Exception" -e
```
Expand mode (`-e`) reads the file twice, with an in-memory expansion step between the two passes:
1. **Pass 1**: Scans the entire file to find matching lines and collects their session IDs and correlation IDs. Also builds a `changeSessionId` graph and tracks which sessions have signature lines.
2. **Expansion**: Follows `changeSessionId` chains backward from seed sessions, stopping at sessions that have a signature (session start boundary).
3. **Pass 2**: Re-reads the file and outputs all lines belonging to expanded session IDs or correlation IDs, stopping early when all expanded sessions are destroyed (`sessionDestroyed`).
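
The expansion step between the two passes can be sketched as a graph walk. The data shapes here are assumptions: `predecessors` maps a session ID to the session IDs it was changed from (the reversed `changeSessionId` edges), and `has_signature` holds the sessions for which a signature line was seen in pass 1. The function name `expand_sessions` is hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Walk `changeSessionId` chains backward from the seed sessions.
/// A session with a signature is included but not walked past,
/// because it marks the start of the session.
fn expand_sessions(
    seeds: &[&str],
    predecessors: &HashMap<String, Vec<String>>,
    has_signature: &HashSet<String>,
) -> HashSet<String> {
    let mut expanded: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = seeds.iter().map(|s| s.to_string()).collect();

    while let Some(session) = queue.pop_front() {
        // Skip sessions that were already visited (also guards against cycles).
        if !expanded.insert(session.clone()) {
            continue;
        }
        // Session start boundary: do not follow the chain any further back.
        if has_signature.contains(&session) {
            continue;
        }
        if let Some(previous) = predecessors.get(&session) {
            queue.extend(previous.iter().cloned());
        }
    }
    expanded
}
```

Pass 2 would then print every line whose session ID is in the returned set, or whose correlation ID was collected in pass 1.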
#### Options
| Option | Description |
|--------|-------------|
| `--file <PATH>` | Log file to search |
| `--query <TEXT>` | Text to search for in log lines |
| `-c, --correlation-id` | Include correlationId in output |
| `-e, --expand` | Expand results to full session context |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |
### `search-exceptions` — Search for exceptions filtered by app
A specialized search that finds `Exception` lines and expands them to full session context, filtered to only sessions belonging to specific apps (identified by their `signature:` line).
```bash
# Find exceptions for a specific app
log_ingest search-exceptions --file /path/to/logs.log --app XAMARIN_APP
# Filter to multiple apps
log_ingest search-exceptions --file /path/to/logs.log --app XAMARIN_APP --app ANOTHER_APP
```
This uses the same two-pass expand approach as `search -e`, with an additional app-filtering step that:
- Matches expanded sessions to their app via the `signature:APP/...` line
- Propagates app membership forward through `changeSessionId` chains
- Filters correlation IDs to only those originating from matching-app sessions
- Uses strict app isolation in pass 2 to prevent lines from non-matching apps leaking through shared correlation IDs
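
The first two filtering steps can be sketched as follows. Only the `signature:APP/...` shape comes from this README; the function names, the map layouts, and the choice to ignore a successor's own signature are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Extract the app name from a line containing `signature:APP/...`.
/// Returns None if the marker or the trailing `/` is missing.
fn app_from_signature(line: &str) -> Option<&str> {
    let rest = line.split_once("signature:")?.1;
    let app = rest.split_once('/')?.0;
    if app.is_empty() { None } else { Some(app) }
}

/// Propagate app membership forward through `changeSessionId` chains.
/// `session_app` maps sessions with a signature to their app;
/// `successors` maps a session ID to the IDs it was changed to.
fn propagate_apps(
    session_app: &HashMap<String, String>,
    successors: &HashMap<String, Vec<String>>,
    wanted: &HashSet<String>,
) -> HashSet<String> {
    let mut matching: HashSet<String> = HashSet::new();
    // Start from every session whose signature names a wanted app.
    let mut stack: Vec<String> = session_app
        .iter()
        .filter(|(_, app)| wanted.contains(*app))
        .map(|(session, _)| session.clone())
        .collect();

    while let Some(session) = stack.pop() {
        if !matching.insert(session.clone()) {
            continue; // already visited
        }
        if let Some(next) = successors.get(&session) {
            stack.extend(next.iter().cloned());
        }
    }
    matching
}
```

Correlation IDs would then be kept only if they were first seen on a line belonging to one of the returned sessions.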
#### Options
| Option | Description |
|--------|-------------|
| `--file <PATH>` | Log file to search |
| `--app <NAME>` | Filter to sessions with this app signature (repeatable) |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |
## Database Schema
The schema uses normalized lookup tables to minimize disk usage for large datasets.