7 Commits

Author SHA1 Message Date
poprhythm 8593808cf3 Add --retry-failed mode and mosaic retry estimates to progress report
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 17:40:15 -04:00
poprhythm 752c278dff Add skip logic: random sampling indicates that scans with disk_space_mb=0 can safely be skipped entirely 2026-05-10 21:20:56 -04:00
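The skip rule in the commit above could look roughly like the sketch below. This is an illustration only, not the repository's code: the function name `should_skip_scan` and the assumption that `disk_space_mb` arrives as a key in a per-scan metadata dict (possibly as a string) are hypothetical.

```python
def should_skip_scan(metadata: dict) -> bool:
    """Return True when a scan reports zero disk usage (hypothetical helper).

    Assumes `disk_space_mb` is a key in the scan's metadata dict; a missing
    or unparseable value is treated as "do not skip" so nothing is dropped
    by accident.
    """
    value = metadata.get("disk_space_mb")
    if value is None:
        return False
    try:
        return float(value) == 0
    except (TypeError, ValueError):
        # Non-numeric value: be conservative and keep the scan.
        return False


if __name__ == "__main__":
    for sample in ({"disk_space_mb": 0}, {"disk_space_mb": "12.5"}, {}):
        print(sample, should_skip_scan(sample))
```

Treating unknown values as "keep" errs on the side of downloading too much rather than silently skipping a scan that has data.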
poprhythm 4118e6e4f0 Add sample_random_scans script and first-page list-scans option
- scripts/sample_random_scans.sh: pick a random scan per machine (default: first list page) and download mosaic and/or tiles
- --list-scans-first-page-only: one HTTP request for scan list (up to 320 IDs)
- scripts/machines.example.txt; .gitignore local machines.txt (copy from example)
- README: document usage
2026-04-26 20:56:52 -04:00
poprhythm ae37c06f15 Enhance CSV metadata with error tracking for mosaics and tiles 2026-04-25 16:06:54 -04:00
poprhythm e8d3bf7180 Add EXIF writing and machine metadata support 2026-04-24 18:21:37 -04:00
poprhythm f2193011ca Add --metadata-only mode; harden resume and idempotency
- Add --metadata-only flag: fetches scan detail pages, writes
  metadata.json + scans.csv rows, skips all image downloads.
  Re-runs skip scans whose metadata.json already exists.
- Atomic progress.json saves (temp-file rename).
- Heal-on-resume: tiles on disk but not in progress are silently
  re-marked before building the pending list.
- scans.csv dedup: skip row if mosaic URL already in progress.
- Rename mosaic_downloaded -> mosaic_on_disk (reflects disk state).
- --recheck now checks mosaics as well as tiles.
- RunStats dataclass replaces raw int return; richer run summary.
- Fix argparse allow_abbrev reverted; fix --scan-id + --metadata-only
  glob fallback when scan_time is absent.
- Add .venv/ to .gitignore.
- README: fix typo, update worker counts, document all new behaviour.
2026-04-24 09:44:57 -04:00
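The "atomic progress.json saves (temp-file rename)" item above refers to a common pattern: write the new content to a temporary file in the same directory, then rename it over the target. A minimal sketch of that pattern follows; the function name `save_progress_atomic` and the shape of the progress dict are assumptions, and the repository's actual implementation may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def save_progress_atomic(progress: dict, path: Path) -> None:
    """Write `progress` as JSON to `path` via temp file + rename (sketch)."""
    path = Path(path)
    # The temp file must live in the same directory as the target so the
    # final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(progress, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before renaming
        # os.replace overwrites the destination in a single step, so a
        # reader sees either the old file or the new one, never a partial.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "progress.json"
    save_progress_atomic({"tiles_done": ["0_0.jpg"]}, target)
    print(target.read_text(encoding="utf-8"))
```

If the process is interrupted mid-write, only the temporary file is left incomplete and the previous progress.json remains readable on the next resume.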
poprhythm e122f6435a Initial commit
Add spruce scraper with CLI, session management, parsers, progress tracking,
recheck logic, and test suite. Includes example config and README.
2026-04-22 10:41:18 -04:00