Commit Graph

11 Commits

Author SHA1 Message Date
poprhythm 6390f5d529 Scraping resilience, metadata tooling, and repository hygiene
Consolidates mosaic and session hardening (login retry, skip processed scans, no retry on 404, started_at), progress reporting (Markdown tables, by-year rollup, rolling-window rate/ETA), and metadata workflow scripts (run_metadata_scan.sh, scan_progress_report.py, export_machine_metadata.py). Adds mosaic reconstruction sample JPEGs referenced by the report. Updates .gitignore for backup/ and .claude/; sample_random_scans helper is documented for branch testing/sample-runs only (see README).
2026-05-14 19:52:53 -04:00
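The commit above mentions rolling-window rate/ETA reporting. The log does not show how scan_progress_report.py computes it, so the following is only a minimal sketch of one common approach. `RollingRate`, `record`, `rate` and `eta_seconds` are hypothetical names. It keeps (timestamp, cumulative count) samples inside a time window and derives throughput from the oldest and newest samples.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class RollingRate:
    """Hypothetical helper: throughput and ETA over a sliding time window."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window = window_seconds
        # (timestamp, cumulative items completed) samples, oldest first
        self.samples: Deque[Tuple[float, int]] = deque()

    def record(self, timestamp: float, completed: int) -> None:
        self.samples.append((timestamp, completed))
        # Drop samples older than the window, but always keep at least two
        # so a rate can still be computed after a long pause
        while len(self.samples) > 2 and timestamp - self.samples[0][0] > self.window:
            self.samples.popleft()

    def rate(self) -> Optional[float]:
        """Items per second across the window, or None if undefined."""
        if len(self.samples) < 2:
            return None
        (t0, c0), (t1, c1) = self.samples[0], self.samples[-1]
        if t1 <= t0:
            return None
        return (c1 - c0) / (t1 - t0)

    def eta_seconds(self, total: int) -> Optional[float]:
        """Seconds until `total` items are done at the current rate."""
        r = self.rate()
        if not r:  # None or a zero rate gives no meaningful ETA
            return None
        remaining = max(total - self.samples[-1][1], 0)
        return remaining / r
```

A windowed rate reacts to slowdowns (for example, server throttling) much faster than a lifetime average, at the cost of a noisier ETA when the window is short.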
poprhythm 752c278dff Added skip logic: based on random sampling, scans with disk_space_mb=0 can safely be skipped entirely 2026-05-10 21:20:56 -04:00
poprhythm 5a7fdd820b Update mosaic URL formatting to zero-pad scan IDs to 6 digits 2026-05-09 22:06:14 -04:00
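Zero-padding a scan ID to 6 digits, as described in the commit above, can be done with a format specifier. This is a sketch only: the mosaic URL template itself is not shown in the log, and `mosaic_scan_id` is a hypothetical name.

```python
def mosaic_scan_id(scan_id: int) -> str:
    """Hypothetical helper: zero-pad a numeric scan ID to 6 digits."""
    if scan_id < 0:
        raise ValueError("scan IDs are expected to be non-negative")
    # "06d" pads with leading zeros to width 6; longer IDs are left intact
    return f"{scan_id:06d}"
```

Note that the width is a minimum, so an ID with 7 or more digits is returned unchanged rather than truncated.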
poprhythm 3ac383f5c1 Update mosaic reconstruction report with enhanced scan details and pixel dimensions 2026-04-27 14:06:49 -04:00
poprhythm 4118e6e4f0 Add sample_random_scans script and first-page list-scans option
- scripts/sample_random_scans.sh: pick a random scan per machine (default: first list page) and download mosaic and/or tiles
- --list-scans-first-page-only: fetch the scan list with a single HTTP request (up to 320 IDs)
- scripts/machines.example.txt; .gitignore local machines.txt (copy from example)
- README: document usage
2026-04-26 20:56:52 -04:00
poprhythm 08a29d124a Add offline mosaic EXIF tagging (stitch --write-exif, tag_mosaic_exif CLI)
- spruce.exif: tag_mosaic_jpeg_for_scan_dir, resolve_machine_label_for_scan_dir; ProcessingSoftware for tile-stitched mosaics
- spruce.settings: load_config(require_credentials=False) for config without login
- scripts/tag_mosaic_exif.py and tests; stitch script --write-exif path
2026-04-26 20:47:23 -04:00
poprhythm 314b68322c Add script to stitch tiles into a mosaic, with gutter/padding support
- scripts/stitch_mosaic_from_tiles.py: grid layout from metadata.json, flip-x/y, tile gap (gutters), compare to server mosaic.jpg
- tests/test_stitch_mosaic.py, Pillow in requirements, docs/mosaic_reconstruction_report.md
2026-04-26 20:44:56 -04:00
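The stitch commit describes a grid layout with flip-x/y and tile gaps (gutters). The metadata.json schema is not shown here, so this sketch assumes a uniform grid of equally sized tiles addressed by (col, row). `tile_offsets` and its parameters are hypothetical. It computes the canvas size and each tile's top-left paste position.

```python
from typing import Dict, Tuple


def tile_offsets(
    cols: int,
    rows: int,
    tile_w: int,
    tile_h: int,
    gap: int = 0,
    flip_x: bool = False,
    flip_y: bool = False,
) -> Tuple[Tuple[int, int], Dict[Tuple[int, int], Tuple[int, int]]]:
    """Return (canvas_size, {(col, row): (x, y)}) for a gutter-separated grid."""
    if cols <= 0 or rows <= 0:
        raise ValueError("grid must have at least one row and one column")
    # n tiles along an axis need n - 1 gutters between them
    canvas_w = cols * tile_w + (cols - 1) * gap
    canvas_h = rows * tile_h + (rows - 1) * gap
    offsets: Dict[Tuple[int, int], Tuple[int, int]] = {}
    for row in range(rows):
        for col in range(cols):
            # Flipping mirrors the tile's grid position, not its pixels
            c = cols - 1 - col if flip_x else col
            r = rows - 1 - row if flip_y else row
            offsets[(col, row)] = (c * (tile_w + gap), r * (tile_h + gap))
    return (canvas_w, canvas_h), offsets
```

With Pillow, the result could be used as `canvas = Image.new("RGB", size)` followed by `canvas.paste(tile_img, offsets[(col, row)])` for each tile; any gutter pixels keep the canvas background colour.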
poprhythm ae37c06f15 Enhance CSV metadata with error tracking for mosaics and tiles 2026-04-25 16:06:54 -04:00
poprhythm e8d3bf7180 Add EXIF writing and machine metadata support 2026-04-24 18:21:37 -04:00
poprhythm f2193011ca Add --metadata-only mode; harden resume and idempotency
- Add --metadata-only flag: fetches scan detail pages, writes
  metadata.json + scans.csv rows, skips all image downloads.
  Re-runs skip scans whose metadata.json already exists.
- Atomic progress.json saves (temp-file rename).
- Heal-on-resume: tiles on disk but not in progress are silently
  re-marked before building the pending list.
- scans.csv dedup: skip row if mosaic URL already in progress.
- Rename mosaic_downloaded -> mosaic_on_disk (reflects disk state).
- --recheck now checks mosaics as well as tiles.
- RunStats dataclass replaces raw int return; richer run summary.
- Fix argparse allow_abbrev reverted; fix --scan-id + --metadata-only
  glob fallback when scan_time is absent.
- Add .venv/ to .gitignore.
- README: fix typo, update worker counts, document all new behaviour.
2026-04-24 09:44:57 -04:00
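The "atomic progress.json saves (temp-file rename)" item in the commit above refers to a standard pattern: write to a temporary file in the same directory, flush it to disk, then rename it over the target. The sketch below illustrates that pattern and is not the scraper's actual code; `save_progress_atomic` is a hypothetical name. `os.replace` is atomic on POSIX filesystems, so a crash mid-write leaves the previous progress.json intact.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def save_progress_atomic(path: Path, data: Any) -> None:
    """Write JSON to `path` so readers never see a partially written file."""
    path = Path(path)
    # Temp file in the same directory keeps the rename on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # swap the complete file into place
    except BaseException:
        # Remove the temp file if anything failed before the rename
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Creating the temp file next to the target (rather than in the system temp directory) matters because a rename across filesystems is not atomic and may fail outright.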
poprhythm e122f6435a Initial commit
Add spruce scraper with CLI, session management, parsers, progress tracking,
recheck logic, and test suite. Includes example config and README.
2026-04-22 10:41:18 -04:00