Parallel processing

2026-01-22 00:15:25 +01:00
parent 2d9f6eaa98
commit 946d0184a1
5 changed files with 256 additions and 17 deletions

@@ -12,6 +12,7 @@ Parses application logs containing signature messages and loads them into SQLite
- Support for both plain `.log` and gzip compressed `.log.gz` files
- File discovery by date range using a `YYYY/MM/DD` directory structure
- Batched inserts for performance with large files
- Parallel file processing for multi-day ingestion
- Indexed columns (`session_id`, `version`) for efficient queries
- Extensible parser architecture for adding new message types
@@ -42,6 +43,21 @@ log_ingest \
The tool will look for files at `<base-dir>/YYYY/MM/DD/<filename>.gz` or `<base-dir>/YYYY/MM/DD/<filename>` for each day in the range.
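A minimal sketch of this date-range discovery, assuming a simple walk over the `YYYY/MM/DD` tree (names like `candidate_paths` are illustrative, not `log_ingest`'s actual code):

```python
# Illustrative sketch: enumerate the paths log_ingest would check for
# each day in the range, compressed variant first (per the README).
from datetime import date, timedelta
from pathlib import Path


def candidate_paths(base_dir: str, start: date, end: date, filename: str):
    """Yield <base-dir>/YYYY/MM/DD/<filename>.gz then the plain variant
    for every day from start to end inclusive."""
    day = start
    while day <= end:
        day_dir = Path(base_dir) / f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}"
        yield day_dir / f"{filename}.gz"  # gzip-compressed log
        yield day_dir / filename          # plain log
        day += timedelta(days=1)
```

An ingester would then open whichever of the two candidates exists for each day and skip days with neither.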
### Parallel processing
When processing multiple files, parsing runs in parallel by default using all available CPU cores. A single writer thread handles database inserts to avoid SQLite contention.
```bash
# Use all CPU cores (default)
log_ingest --from 2026/01/01 --to 2026/01/31 ...
# Limit to 4 threads
log_ingest --threads 4 --from 2026/01/01 --to 2026/01/31 ...
# Sequential processing (disable parallelism)
log_ingest --threads 1 --from 2026/01/01 --to 2026/01/31 ...
```
### Options
| Option | Description |
@@ -53,6 +69,7 @@ The tool will look for files at `<base-dir>/YYYY/MM/DD/<filename>.gz` or `<base-
| `--filename <NAME>` | Log filename (e.g., `app.log`) |
| `-o, --output <PATH>` | Output SQLite database path |
| `--batch-size <N>` | Batch size for inserts (default: 10000) |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |
## Database Schema