Parallel processing
@@ -12,6 +12,7 @@ Parses application logs containing signature messages and loads them into SQLite
- Support for both plain `.log` and gzip compressed `.log.gz` files
- File discovery by date range using `YYYY/mm/dd` directory structure
- Batched inserts for performance with large files
- Parallel file processing for multi-day ingestion
- Indexed columns (`session_id`, `version`) for efficient queries
- Extensible parser architecture for adding new message types
@@ -42,6 +43,21 @@ log_ingest \
The tool will look for files at `<base-dir>/YYYY/MM/DD/<filename>.gz` or `<base-dir>/YYYY/MM/DD/<filename>` for each day in the range.
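The per-day path probing described above can be sketched as follows. This is an illustrative Python sketch, not the tool's actual code; the function name `candidate_paths` and the example values are hypothetical, and the gzip candidate is listed first on the assumption that `.gz` files are preferred when both exist:

```python
from datetime import date, timedelta

def candidate_paths(base_dir, filename, start, end):
    # Yield the two candidate paths (gzip first) for each day in the range,
    # using the zero-padded YYYY/MM/DD directory layout.
    day = start
    while day <= end:
        prefix = f"{base_dir}/{day.year:04d}/{day.month:02d}/{day.day:02d}/{filename}"
        yield prefix + ".gz"
        yield prefix
        day += timedelta(days=1)

paths = list(candidate_paths("/logs", "app.log", date(2026, 1, 1), date(2026, 1, 2)))
# → ['/logs/2026/01/01/app.log.gz', '/logs/2026/01/01/app.log',
#    '/logs/2026/01/02/app.log.gz', '/logs/2026/01/02/app.log']
```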
### Parallel processing
When processing multiple files, parsing runs in parallel by default using all available CPU cores. A single writer thread handles database inserts to avoid SQLite contention.
```bash
# Use all CPU cores (default)
log_ingest --from 2026/01/01 --to 2026/01/31 ...
# Limit to 4 threads
log_ingest --threads 4 --from 2026/01/01 --to 2026/01/31 ...
# Sequential processing (disable parallelism)
log_ingest --threads 1 --from 2026/01/01 --to 2026/01/31 ...
```
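The parallel-parse / single-writer pattern described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the tool's implementation: `parse_file`, the `messages` table, and its columns are hypothetical, and Python threads stand in for whatever worker model the tool actually uses. The key point it illustrates is that parsers only enqueue rows, while one writer thread owns the SQLite connection and inserts in batches:

```python
import queue
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

def parse_file(path):
    # Hypothetical parser: yields one row per signature message.
    for i in range(3):
        yield (f"session-{i}", "1.0", path)

def writer(db_path, q, batch_size=10000):
    # Single writer thread: drains the queue and inserts in batches,
    # so parser threads never touch the SQLite connection.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(session_id TEXT, version TEXT, source TEXT)"
    )
    batch = []
    while True:
        row = q.get()
        if row is None:  # sentinel: all parsers are done
            break
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", batch)
            conn.commit()
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", batch)
        conn.commit()
    conn.close()

def ingest(paths, db_path, threads=4):
    q = queue.Queue(maxsize=100_000)  # bounded, so parsers can't outrun the writer
    w = threading.Thread(target=writer, args=(db_path, q))
    w.start()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(lambda p: [q.put(row) for row in parse_file(p)], paths))
    q.put(None)  # tell the writer to flush and exit
    w.join()
```

Batching the `executemany` calls mirrors the `--batch-size` option: one transaction per batch rather than per row is what keeps inserts fast on large files.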
### Options
| Option | Description |
@@ -53,6 +69,7 @@ The tool will look for files at `<base-dir>/YYYY/MM/DD/<filename>.gz` or `<base-
| `--filename <NAME>` | Log filename (e.g., `app.log`) |
| `-o, --output <PATH>` | Output SQLite database path |
| `--batch-size <N>` | Batch size for inserts (default: 10000) |
| `--threads <N>` | Number of parallel threads (0 = all cores, 1 = sequential) |
## Database Schema