Parallel processing

2026-01-22 00:15:25 +01:00
parent 2d9f6eaa98
commit 946d0184a1
5 changed files with 256 additions and 17 deletions

@@ -12,6 +12,7 @@ Parses application logs containing signature messages and loads them into SQLite
- Support for both plain `.log` and gzip compressed `.log.gz` files
- File discovery by date range using a `YYYY/MM/DD` directory structure
- Batched inserts for performance with large files
- Parallel file processing for multi-day ingestion
- Indexed columns (`session_id`, `version`) for efficient queries
- Extensible parser architecture for adding new message types
@@ -42,6 +43,21 @@ log_ingest \
The tool will look for files at `<base-dir>/YYYY/MM/DD/<filename>.gz` or `<base-dir>/YYYY/MM/DD/<filename>` for each day in the range.
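A minimal sketch of this date-range discovery, assuming a simple walk over the `YYYY/MM/DD` tree (names like `candidate_paths` are illustrative, not `log_ingest`'s actual code):

```python
# Illustrative sketch: enumerate the paths log_ingest would check for
# each day in the range, compressed variant first (per the README).
from datetime import date, timedelta
from pathlib import Path


def candidate_paths(base_dir: str, start: date, end: date, filename: str):
    """Yield <base-dir>/YYYY/MM/DD/<filename>.gz then the plain variant
    for every day from start to end inclusive."""
    day = start
    while day <= end:
        day_dir = Path(base_dir) / f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}"
        yield day_dir / f"{filename}.gz"  # gzip-compressed log
        yield day_dir / filename          # plain log
        day += timedelta(days=1)
```

An ingester would then open whichever of the two candidates exists for each day and skip days with neither.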
### Parallel processing
When processing multiple files, parsing runs in parallel by default using all available CPU cores. A single writer thread handles database inserts to avoid SQLite contention.
```bash
# Use all CPU cores (default)
log_ingest --from 2026/01/01 --to 2026/01/31 ...
# Limit to 4 threads
log_ingest --threads 4 --from 2026/01/01 --to 2026/01/31 ...
# Sequential processing (disable parallelism)
log_ingest --threads 1 --from 2026/01/01 --to 2026/01/31 ...
```
### Options
| Option | Description |
@@ -53,6 +69,7 @@ The tool will look for files at `<base-dir>/YYYY/MM/DD/<filename>.gz` or `<base-
| `--filename <NAME>` | Log filename (e.g., `app.log`) |
| `-o, --output <PATH>` | Output SQLite database path |
| `--batch-size <N>` | Batch size for inserts (default: 10000) |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |
## Database Schema