log_ingest/CLAUDE.md
Alexandr Mansurov 169409738f Add log ingestion tool for loading signature logs into SQLite
- Parse signature messages from log files extracting app info, device
  details, and feature flags (autofill, touchID, offline login, etc.)
- Support both plain .log and gzip compressed .log.gz files
- File discovery by date range (YYYY/mm/dd directory structure)
- Batch inserts for performance with large files (10GB+ per day)
- Index on session_id and version for efficient queries
- Extensible parser architecture via MessageParser trait
- Parallel file processing for multi-day ingestion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 03:14:54 +01:00


# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Summary
Log ingestion tool that parses application log files and loads them into SQLite for analysis. The primary use case is analyzing mobile app telemetry: tracking feature adoption (autofill, touchID, offline login), app versions, device models, etc.
### What it does
- Parses log lines containing `signature:` messages with app/device telemetry
- Extracts: sessionId, timestamp, app name, version, OS, model, device, and feature flags
- Converts yes/no to booleans, parses numeric usage counters
- Loads into SQLite with indexes on session_id and version for efficient queries
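The yes/no and counter conversions can be sketched as two small helpers. The names `parse_flag` and `parse_counter` are hypothetical, not taken from src/parser.rs, and treating any value other than exactly `yes`/`no` as missing is an assumption:

```rust
// Hypothetical helpers (illustrative names, not the actual src/parser.rs API)
// that convert raw string values from a signature message's details section.

/// Maps the `yes`/`no` feature flags (e.g. `touchID:yes`) to booleans.
/// Any other value yields `None` so the caller decides how to store it.
fn parse_flag(raw: &str) -> Option<bool> {
    match raw.trim() {
        "yes" => Some(true),
        "no" => Some(false),
        _ => None,
    }
}

/// Parses numeric usage counters such as `offlineLoginUsage:0`.
/// Empty or non-numeric values return `None` instead of panicking.
fn parse_counter(raw: &str) -> Option<i64> {
    raw.trim().parse::<i64>().ok()
}
```

Using `i64` for counters matches SQLite's 64-bit INTEGER storage, so no narrowing is needed at insert time.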
### Log format
Logs are stored in a `<base_dir>/YYYY/MM/DD/<filename>` directory structure, either as `.log` (plain text) or `.log.gz` (gzip-compressed) files. A single day can exceed 10 GB.
Example signature message:
```
msg="signature:XAMARIN_APP/5.23.0/ details:offlineLoginUsage:0,isPasswordAutofillEnabled:no,cameraRollUsage:0,OS:26.2.0,appName:App,touchID:yes,isOfflineLoginEnabled:yes,model:iPhone15,3,device:iOS, Apple,passwordAutofillUsage:0 user-agent:..."
```
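Detail values can themselves contain commas (`model:iPhone15,3`, `device:iOS, Apple`), so a plain split on `,` would break them apart. Below is a minimal std-only sketch of one way to handle this, assuming the details list ends at ` user-agent:` (or at the closing quote) and that keys are plain alphanumeric identifiers. All function names are hypothetical; the actual parser is regex-based and may work differently:

```rust
/// Hypothetical: extracts `("XAMARIN_APP", "5.23.0")` from `signature:XAMARIN_APP/5.23.0/`.
fn app_and_version(msg: &str) -> Option<(&str, &str)> {
    let rest = &msg[msg.find("signature:")? + "signature:".len()..];
    let mut parts = rest.split('/');
    Some((parts.next()?, parts.next()?))
}

/// Hypothetical: returns the text after `details:` up to ` user-agent:`
/// (assumed terminator), or up to the closing quote if there is no user agent.
fn details_section(msg: &str) -> Option<&str> {
    let start = msg.find("details:")? + "details:".len();
    let rest = &msg[start..];
    let end = rest
        .find(" user-agent:")
        .or_else(|| rest.find('"'))
        .unwrap_or(rest.len());
    Some(&rest[..end])
}

/// A key is assumed to be a non-empty ASCII alphanumeric identifier.
fn is_key(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric())
}

/// Splits `key:value,key:value,...` where values may contain commas.
fn split_details(details: &str) -> Vec<(String, String)> {
    let mut pairs: Vec<(String, String)> = Vec::new();
    for token in details.split(',') {
        match token.split_once(':') {
            // A token like `touchID:yes` starts a new key/value pair.
            Some((key, value)) if is_key(key) => {
                pairs.push((key.to_string(), value.to_string()));
            }
            // Anything else (`3`, ` Apple`) continues the previous value;
            // re-insert the comma that split() consumed.
            _ => {
                if let Some((_, value)) = pairs.last_mut() {
                    value.push(',');
                    value.push_str(token);
                }
            }
        }
    }
    pairs
}
```

Applied to the example message above, this yields ten pairs, with `model` = `iPhone15,3` and `device` = `iOS, Apple` kept intact.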
### Extensibility
The parser uses a `MessageParser` trait allowing new message types to be added. Currently only `signature:` messages are parsed; other message types are skipped.
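A minimal sketch of how a `MessageParser` trait and a `ParsedMessage` enum could fit together. Only the names `MessageParser`, `ParsedMessage` and `SignatureParser` come from this document; the method signature, the `SignatureEntry` fields shown, and the `parse_line` dispatcher are assumptions:

```rust
/// Hypothetical record type; only two of the table's columns are shown.
#[derive(Debug, PartialEq)]
struct SignatureEntry {
    app: String,
    version: String,
}

/// Every message type the tool understands; a new type adds a variant here.
#[derive(Debug, PartialEq)]
enum ParsedMessage {
    Signature(SignatureEntry),
}

/// Assumed shape: a parser returns `None` when the line is not its message type.
trait MessageParser {
    fn parse(&self, line: &str) -> Option<ParsedMessage>;
}

struct SignatureParser;

impl MessageParser for SignatureParser {
    fn parse(&self, line: &str) -> Option<ParsedMessage> {
        // Only lines carrying `signature:` are handled; others return None.
        let start = line.find("signature:")? + "signature:".len();
        let mut parts = line[start..].split('/');
        let app = parts.next()?.to_string();
        let version = parts.next()?.to_string();
        Some(ParsedMessage::Signature(SignatureEntry { app, version }))
    }
}

/// Hypothetical dispatcher: tries each registered parser in order and
/// returns the first match, so unrecognized lines are simply skipped.
fn parse_line(parsers: &[Box<dyn MessageParser>], line: &str) -> Option<ParsedMessage> {
    parsers.iter().find_map(|p| p.parse(line))
}
```

Under this design, supporting a new message type means one new enum variant plus one new parser added to the list; the dispatch loop does not change.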
## Commands
### Build & Run
```bash
# Build
cargo build
# Release build
cargo build --release
# Run on a single file
cargo run -- --file /path/to/logs.log --output output.db
# Run over a date range
cargo run -- --from 2026/01/20 --to 2026/01/21 --base-dir /var/log/app --filename app.log --output output.db
```
### Testing
```bash
# Run all tests
cargo test
# Faster test execution with cargo-nextest (recommended; requires cargo-nextest to be installed)
cargo nextest run
# Run a single test
cargo test test_name
```
### Quality Checks
```bash
# Format check
cargo fmt -- --check
# Apply formatting
cargo fmt
# Static analysis with Clippy
cargo clippy
```
## Architecture
### Source Files
- **src/main.rs**: CLI argument parsing (clap), orchestrates file discovery and processing
- **src/parser.rs**: Log line parsing, `MessageParser` trait, `SignatureParser` implementation
- **src/db.rs**: SQLite schema creation, batched inserts
- **src/files.rs**: File discovery by date range, handles .gz and plain files
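Date-range discovery can be sketched with the standard library alone. This is an illustration under assumptions, not the code in src/files.rs: `Day`, `parse_day`, `next_day` and `discover` are hypothetical names, the range is treated as inclusive, and a day's `<filename>.gz` is preferred over the plain `<filename>` when both exist. Decompression is left out because it needs a crate, and this document does not say which one the tool uses:

```rust
use std::path::{Path, PathBuf};

/// A calendar day as given on the CLI, e.g. `2026/01/20`.
/// Field order makes the derived ordering chronological.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Day {
    year: u32,
    month: u32,
    day: u32,
}

fn is_leap(year: u32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: u32, month: u32) -> u32 {
    match month {
        4 | 6 | 9 | 11 => 30,
        2 if is_leap(year) => 29,
        2 => 28,
        _ => 31,
    }
}

/// Parses `YYYY/MM/DD`, rejecting extra components and impossible dates.
fn parse_day(s: &str) -> Option<Day> {
    let mut it = s.split('/').map(|p| p.parse::<u32>().ok());
    let (year, month, day) = (it.next()??, it.next()??, it.next()??);
    if it.next().is_some()
        || !(1..=12).contains(&month)
        || day == 0
        || day > days_in_month(year, month)
    {
        return None;
    }
    Some(Day { year, month, day })
}

fn next_day(d: Day) -> Day {
    if d.day < days_in_month(d.year, d.month) {
        Day { day: d.day + 1, ..d }
    } else if d.month < 12 {
        Day { month: d.month + 1, day: 1, ..d }
    } else {
        Day { year: d.year + 1, month: 1, day: 1 }
    }
}

/// Returns one file per day in `from..=to`; days with no file are skipped.
fn discover(base_dir: &Path, filename: &str, from: Day, to: Day) -> Vec<PathBuf> {
    let mut files = Vec::new();
    let mut day = from;
    while day <= to {
        let dir = base_dir.join(format!("{:04}/{:02}/{:02}", day.year, day.month, day.day));
        let gz = dir.join(format!("{filename}.gz"));
        let plain = dir.join(filename);
        if gz.is_file() {
            files.push(gz);
        } else if plain.is_file() {
            files.push(plain);
        }
        day = next_day(day);
    }
    files
}
```

Because each day maps to an independent file, the resulting list is also a natural unit of work for processing several days in parallel.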
### Database Schema
Table `signature_entries` with columns: session_id, timestamp, app, version, offline_login_usage, is_password_autofill_enabled, camera_roll_usage, os, app_name, touch_id, is_offline_login_enabled, model, device, password_autofill_usage.
Indexes on `session_id` and `version`.
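One way the schema could be expressed in db.rs, sketched as Rust constants that would be handed to the SQLite driver. The column names and order come from the list above; the column types and index names are assumptions (SQLite has no separate boolean type, so the yes/no flags are shown as INTEGER 0/1), and `insert_sql` is a hypothetical helper:

```rust
/// Column order of `signature_entries`, as listed in this document.
const COLUMNS: [&str; 14] = [
    "session_id",
    "timestamp",
    "app",
    "version",
    "offline_login_usage",
    "is_password_autofill_enabled",
    "camera_roll_usage",
    "os",
    "app_name",
    "touch_id",
    "is_offline_login_enabled",
    "model",
    "device",
    "password_autofill_usage",
];

/// Assumed DDL: types and index names are illustrative, not confirmed.
const SCHEMA_SQL: &str = "
CREATE TABLE IF NOT EXISTS signature_entries (
    session_id TEXT,
    timestamp TEXT,
    app TEXT,
    version TEXT,
    offline_login_usage INTEGER,
    is_password_autofill_enabled INTEGER,
    camera_roll_usage INTEGER,
    os TEXT,
    app_name TEXT,
    touch_id INTEGER,
    is_offline_login_enabled INTEGER,
    model TEXT,
    device TEXT,
    password_autofill_usage INTEGER
);
CREATE INDEX IF NOT EXISTS idx_signature_entries_session_id ON signature_entries (session_id);
CREATE INDEX IF NOT EXISTS idx_signature_entries_version ON signature_entries (version);
";

/// Builds a parameterized INSERT with one numbered placeholder per column.
fn insert_sql() -> String {
    let placeholders: Vec<String> = (1..=COLUMNS.len()).map(|i| format!("?{i}")).collect();
    format!(
        "INSERT INTO signature_entries ({}) VALUES ({})",
        COLUMNS.join(", "),
        placeholders.join(", ")
    )
}
```

Preparing the INSERT once and reusing it for every row avoids re-parsing the SQL on each insert, which matters at 10 GB per day.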
### Key Design Decisions
- Batched inserts (default 10k rows per transaction) for performance
- Regex-based parsing, with each regex compiled once on first use (lazy static initialization)
- Extensible via `MessageParser` trait + `ParsedMessage` enum
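The batching idea can be sketched generically: rows are buffered and handed to a flush callback once the batch is full, with a final flush for the partial remainder. `Batcher` is a hypothetical type, not the one in src/db.rs. In a real implementation the callback would wrap each batch in one SQLite transaction (for example, with the rusqlite crate, `Connection::transaction` followed by `commit`), avoiding one implicit transaction per INSERT; this document does not say which SQLite binding the tool uses:

```rust
/// Buffers rows and calls `flush` with each full batch.
struct Batcher<T, F: FnMut(&[T])> {
    rows: Vec<T>,
    batch_size: usize,
    flush: F,
}

impl<T, F: FnMut(&[T])> Batcher<T, F> {
    fn new(batch_size: usize, flush: F) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { rows: Vec::with_capacity(batch_size), batch_size, flush }
    }

    /// Adds a row; flushes as soon as the batch reaches `batch_size`.
    fn push(&mut self, row: T) {
        self.rows.push(row);
        if self.rows.len() >= self.batch_size {
            self.flush_now();
        }
    }

    /// Flushes the final partial batch and consumes the batcher.
    fn finish(mut self) {
        self.flush_now();
    }

    fn flush_now(&mut self) {
        if !self.rows.is_empty() {
            // In db.rs this would be: begin transaction, insert rows, commit.
            (self.flush)(&self.rows);
            self.rows.clear();
        }
    }
}
```

Calling `finish` at the end of each file matters: without it, up to `batch_size - 1` trailing rows would never be written. With the document's default, `Batcher::new(10_000, ...)` would commit every 10k rows.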
## CI/CD Configuration
- **ci.yaml**: Formatting, Clippy, build, and tests
- **audit.yaml**: Security audit for dependencies
- **release.yaml**: Automated release on tag push (cross-platform builds via GoReleaser)