Add search, including specialized search for exceptions
86
README.md
@@ -1,10 +1,10 @@
# log_ingest

A Rust CLI tool for loading log files into a SQLite database for analysis.
A Rust CLI tool for ingesting and searching application log files.

## Overview

Parses application logs containing signature messages and loads them into SQLite for querying. Designed to handle large log volumes (10GB+ per day) with batched inserts and efficient parsing.
Parses application logs containing signature messages and loads them into SQLite for querying. Also provides fast text search across log files with session-aware context expansion. Designed to handle large log volumes (10GB+ per day) with parallel processing.

## Features

@@ -15,6 +15,9 @@ Parses application logs containing signature messages and loads them into SQLite
- Parallel file processing for multi-day ingestion
- Indexed columns (`session_id`, `version`) for efficient queries
- Extensible parser architecture for adding new message types
- Full-text search with timestamp and message extraction
- Session-aware expanded search that follows `changeSessionId` chains and correlation IDs
- Exception search filtered by app signature

## Installation

@@ -24,16 +27,20 @@ cargo build --release

## Usage

### Process a single file
The CLI uses subcommands: `signature`, `search`, and `search-exceptions`.

### `signature` — Load log entries into SQLite

#### Process a single file

```bash
log_ingest --file /path/to/logs.log --output output.db
log_ingest signature --file /path/to/logs.log --output output.db
```

### Process a date range
#### Process a date range

```bash
log_ingest \
log_ingest signature \
--from 2026/01/20 \
--to 2026/01/21 \
--base-dir /var/log/myapp \
@@ -43,22 +50,22 @@ log_ingest \
The tool will look for files at `<base-dir>/YYYY/MM/DD/<filename>.gz` or `<base-dir>/YYYY/MM/DD/<filename>` for each day in the range.
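
The lookup rule above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names, the tuple-based date type, and the choice to list the `.gz` candidate first are all assumptions.

```rust
use std::path::{Path, PathBuf};

/// Number of days in a Gregorian calendar month.
fn days_in_month(year: u32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        _ => {
            let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
            if leap { 29 } else { 28 }
        }
    }
}

/// The calendar day following (year, month, day).
fn next_day((y, m, d): (u32, u32, u32)) -> (u32, u32, u32) {
    if d < days_in_month(y, m) {
        (y, m, d + 1)
    } else if m < 12 {
        (y, m + 1, 1)
    } else {
        (y + 1, 1, 1)
    }
}

/// For each day in the inclusive range, returns the gzip candidate and the
/// plain candidate under `<base>/YYYY/MM/DD/`. (Hypothetical helper.)
fn candidate_paths(
    base: &Path,
    from: (u32, u32, u32),
    to: (u32, u32, u32),
    filename: &str,
) -> Vec<(PathBuf, PathBuf)> {
    let mut out = Vec::new();
    let mut day = from;
    // Tuples compare lexicographically, so (y, m, d) ordering is date ordering.
    while day <= to {
        let (y, m, d) = day;
        let dir = base.join(format!("{y:04}/{m:02}/{d:02}"));
        out.push((dir.join(format!("{filename}.gz")), dir.join(filename)));
        day = next_day(day);
    }
    out
}
```

A caller would then open whichever candidate of each pair exists on disk.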

### Parallel processing
#### Parallel processing

When processing multiple files, parsing runs in parallel by default using all available CPU cores. A single writer thread handles database inserts to avoid SQLite contention.

```bash
# Use all CPU cores (default)
log_ingest --from 2026/01/01 --to 2026/01/31 ...
log_ingest signature --from 2026/01/01 --to 2026/01/31 ...

# Limit to 4 threads
log_ingest --threads 4 --from 2026/01/01 --to 2026/01/31 ...
log_ingest signature --threads 4 --from 2026/01/01 --to 2026/01/31 ...

# Sequential processing (disable parallelism)
log_ingest --threads 1 --from 2026/01/01 --to 2026/01/31 ...
log_ingest signature --threads 1 --from 2026/01/01 --to 2026/01/31 ...
```
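
The "parallel parsers, single writer" arrangement described above might look roughly like the sketch below, which uses only the standard library. The `Record` type and `ingest` function are hypothetical, files are stood in for by in-memory strings, and the database insert is replaced by counting batch sizes. It also spawns one thread per file rather than honouring a `--threads` limit.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a parsed log entry.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    file: String,
    line: String,
}

/// Parses every input on its own thread and funnels the records through a
/// channel to one writer thread, which flushes them in fixed-size batches.
/// Returns the size of each batch the writer flushed.
fn ingest(files: Vec<(String, String)>, batch_size: usize) -> Vec<usize> {
    let (tx, rx) = mpsc::channel::<Record>();

    let workers: Vec<_> = files
        .into_iter()
        .map(|(name, contents)| {
            let tx = tx.clone();
            thread::spawn(move || {
                for line in contents.lines() {
                    let rec = Record { file: name.clone(), line: line.to_string() };
                    tx.send(rec).expect("writer hung up");
                }
            })
        })
        .collect();
    // Drop the original sender so the channel closes once all workers finish.
    drop(tx);

    let writer = thread::spawn(move || {
        let mut flushed = Vec::new();
        let mut batch: Vec<Record> = Vec::with_capacity(batch_size);
        for rec in rx {
            batch.push(rec);
            if batch.len() == batch_size {
                // A real writer would insert the batch in one transaction here.
                flushed.push(batch.len());
                batch.clear();
            }
        }
        if !batch.is_empty() {
            flushed.push(batch.len());
        }
        flushed
    });

    for w in workers {
        w.join().expect("worker panicked");
    }
    writer.join().expect("writer panicked")
}
```

Because only the writer thread would ever touch the database, the parsers never compete for SQLite's write lock, whatever the degree of parallelism.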

### Options
#### Options

| Option | Description |
|--------|-------------|
@@ -71,6 +78,63 @@ log_ingest --threads 1 --from 2026/01/01 --to 2026/01/31 ...
| `--batch-size <N>` | Batch size for inserts (default: 10000) |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |

### `search` — Search log files for matching lines
Searches a log file for lines containing a query string and prints the timestamp and message for each match. Supports both plain `.log` and gzip `.log.gz` files, with parallel chunk-based processing for plain files.
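
One common way to do chunk-based parallel search over a plain file is to split the buffer at newline boundaries and scan each piece on its own thread. The sketch below illustrates that idea with scoped threads from the standard library. It is an assumption about the general technique, not a description of the tool's internals, and both function names are hypothetical.

```rust
use std::thread;

/// Splits `data` into roughly `n` pieces, each ending just after a newline
/// (or at the end of the buffer), so no line is cut in half.
fn chunk_at_newlines(data: &[u8], n: usize) -> Vec<&[u8]> {
    let target = (data.len() / n.max(1)).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        // Extend the chunk to the end of the line it would otherwise split.
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}

/// Searches each chunk on its own scoped thread and returns the matching
/// lines in file order.
fn parallel_search<'a>(data: &'a str, query: &str, threads: usize) -> Vec<&'a str> {
    let chunks = chunk_at_newlines(data.as_bytes(), threads);
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .into_iter()
            .map(|chunk| {
                s.spawn(move || {
                    // '\n' is a single ASCII byte, so splitting after it
                    // never breaks a multi-byte UTF-8 sequence.
                    let text = std::str::from_utf8(chunk).expect("valid UTF-8");
                    text.lines()
                        .filter(|l| l.contains(query))
                        .collect::<Vec<&str>>()
                })
            })
            .collect();
        // Joining in spawn order preserves the original line order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("search thread panicked"))
            .collect()
    })
}
```

A gzip stream cannot be split this way without decompressing it first, which is consistent with chunked processing being limited to plain files.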

```bash
# Basic search
log_ingest search --file /path/to/logs.log --query "NullPointerException"

# Include correlationId in output
log_ingest search --file /path/to/logs.log --query "timeout" -c

# Expanded search: find all lines sharing sessionId/correlationId with matches,
# following changeSessionId chains backward to the session start (signature line)
log_ingest search --file /path/to/logs.log --query "Exception" -e
```

Expand mode (`-e`) performs a two-pass search, with an expansion step between the two passes:
1. **Pass 1**: Scans the entire file to find matching lines and collects their session IDs and correlation IDs. Also builds a `changeSessionId` graph and tracks which sessions have signature lines.
2. **Expansion**: Follows `changeSessionId` chains backward from seed sessions, stopping at sessions that have a signature (session start boundary).
3. **Pass 2**: Re-reads the file and outputs all lines belonging to expanded session IDs or correlation IDs, stopping early when all expanded sessions are destroyed (`sessionDestroyed`).
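
The expansion step can be sketched as a backward walk over the `changeSessionId` graph. The representation here is an assumption made for illustration: `previous` maps each session ID to the ID it was renamed from, and `expand_sessions` is a hypothetical name.

```rust
use std::collections::{HashMap, HashSet};

/// Walks `changeSessionId` edges backward from every seed session.
///
/// `previous` maps a session ID to the ID it replaced, and `has_signature`
/// holds the sessions whose signature line was seen in pass 1.
fn expand_sessions(
    seeds: &HashSet<String>,
    previous: &HashMap<String, String>,
    has_signature: &HashSet<String>,
) -> HashSet<String> {
    let mut expanded = HashSet::new();
    for seed in seeds {
        let mut current = seed.as_str();
        // `insert` returns false for an ID that was already visited, which
        // ends the walk and also guards against cycles in the graph.
        while expanded.insert(current.to_string()) {
            if has_signature.contains(current) {
                break; // session start boundary reached
            }
            match previous.get(current) {
                Some(prev) => current = prev.as_str(),
                None => break, // no earlier session recorded
            }
        }
    }
    expanded
}
```

With a chain `s0 -> s1 -> s2 -> s3`, a seed of `s3`, and a signature on `s1`, the walk yields `s3`, `s2`, and `s1` and never reaches `s0`.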

#### Options

| Option | Description |
|--------|-------------|
| `--file <PATH>` | Log file to search |
| `--query <TEXT>` | Text to search for in log lines |
| `-c, --correlation-id` | Include correlationId in output |
| `-e, --expand` | Expand results to full session context |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |

### `search-exceptions` — Search for exceptions filtered by app
A specialized search that finds `Exception` lines and expands them to full session context, filtered to only sessions belonging to specific apps (identified by their `signature:` line).

```bash
# Find exceptions for a specific app
log_ingest search-exceptions --file /path/to/logs.log --app XAMARIN_APP

# Filter to multiple apps
log_ingest search-exceptions --file /path/to/logs.log --app XAMARIN_APP --app ANOTHER_APP
```

This uses the same two-pass expand approach as `search -e`, with an additional app-filtering step that:
- Matches expanded sessions to their app via the `signature:APP/...` line
- Propagates app membership forward through `changeSessionId` chains
- Filters correlation IDs to only those originating from matching-app sessions
- Uses strict app isolation in pass 2 to prevent lines from non-matching apps leaking through shared correlation IDs
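
The first two bullets can be illustrated with a small sketch. Everything beyond the `signature:APP/...` marker is an assumption: the function names are hypothetical, `next` is taken to map an old session ID to the ID that replaced it, and a renamed session is assumed to simply inherit the app of its predecessor.

```rust
use std::collections::{HashMap, HashSet};

/// Extracts `APP` from a line containing `signature:APP/...`, if present.
fn app_from_signature(line: &str) -> Option<&str> {
    let rest = line.split_once("signature:")?.1;
    let app = rest.split_once('/')?.0;
    if app.is_empty() { None } else { Some(app) }
}

/// Propagates app membership forward through `changeSessionId` edges.
///
/// `session_app` maps a session ID to the app named on its signature line,
/// and `next` maps an old session ID to the ID that replaced it. Returns
/// every session that belongs to one of the `wanted` apps.
fn sessions_for_apps(
    session_app: &HashMap<String, String>,
    next: &HashMap<String, String>,
    wanted: &HashSet<String>,
) -> HashSet<String> {
    let mut keep = HashSet::new();
    for (session, app) in session_app {
        if !wanted.contains(app) {
            continue;
        }
        let mut current = session.as_str();
        // Stop when a session was already recorded (this also breaks cycles).
        while keep.insert(current.to_string()) {
            match next.get(current) {
                Some(n) => current = n.as_str(),
                None => break,
            }
        }
    }
    keep
}
```

Pass 2 could then print a line only when its session ID is in the returned set, which is one way to keep a shared correlation ID from pulling in lines that belong to another app.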

#### Options

| Option | Description |
|--------|-------------|
| `--file <PATH>` | Log file to search |
| `--app <NAME>` | Filter to sessions with this app signature (repeatable) |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |

## Database Schema
The schema uses normalized lookup tables to minimize disk usage for large datasets.