Add --metadata-only mode; harden resume and idempotency

- Add --metadata-only flag: fetches scan detail pages, writes
  metadata.json + scans.csv rows, skips all image downloads.
  Re-runs skip scans whose metadata.json already exists.
- Atomic progress.json saves (temp-file rename).
- Heal-on-resume: tiles present on disk but missing from progress.json
  are silently re-marked as completed before the pending list is built.
- scans.csv dedup: skip a row if its mosaic URL is already recorded in
  progress.json.
- Rename mosaic_downloaded -> mosaic_on_disk (reflects disk state).
- --recheck now checks mosaics as well as tiles.
- RunStats dataclass replaces raw int return; richer run summary.
- Revert the argparse allow_abbrev fix; fix the glob fallback for
  --scan-id combined with --metadata-only when scan_time is absent.
- Add .venv/ to .gitignore.
- README: fix typo, update worker counts, document all new behaviour.
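
The heal-on-resume step described above can be sketched roughly as follows. This is an illustration, not the code from this commit: heal_progress, pending_urls, and the url_to_path mapping are hypothetical names, and treating any non-empty file as a finished download is an assumption.

```python
from pathlib import Path


def heal_progress(done: set[str], url_to_path: dict[str, Path]) -> int:
    """Re-mark URLs whose files already exist on disk; return how many were healed.

    Hypothetical helper: `done` stands in for the tracker's completed-URL set,
    `url_to_path` maps each tile URL to its expected location on disk.
    """
    healed = 0
    for url, path in url_to_path.items():
        # Assumption: a non-empty file counts as a finished download.
        if url not in done and path.is_file() and path.stat().st_size > 0:
            done.add(url)
            healed += 1
    return healed


def pending_urls(done: set[str], url_to_path: dict[str, Path]) -> list[str]:
    """URLs still to fetch; call this only after heal_progress has run."""
    return [url for url in url_to_path if url not in done]
```

Healing before the pending list is built means an interrupted run that wrote a tile but crashed before saving progress does not download that tile again.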
2026-04-24 09:44:57 -04:00
parent e122f6435a
commit f2193011ca
8 changed files with 294 additions and 93 deletions
+3 -1
@@ -57,9 +57,11 @@ class ProgressTracker:
     def save(self) -> None:
         self.path.parent.mkdir(parents=True, exist_ok=True)
-        self.path.write_text(
+        tmp = self.path.with_suffix(".json.tmp")
+        tmp.write_text(
             json.dumps({"completed_urls": sorted(self._done)}, indent=2)
         )
+        tmp.replace(self.path)  # atomic on POSIX; avoids corrupt JSON on crash
 class CsvWriter:
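
The temp-file rename in save() can be tried in isolation with a standalone sketch like the one below. save_progress and load_progress are hypothetical free functions, not part of this commit, and the flush/fsync step is an extra precaution that the diff does not include.

```python
import json
import os
from pathlib import Path


def save_progress(path: Path, done: set[str]) -> None:
    """Write progress JSON to a sibling temp file, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".json.tmp")
    with tmp.open("w", encoding="utf-8") as fh:
        json.dump({"completed_urls": sorted(done)}, fh, indent=2)
        fh.flush()
        os.fsync(fh.fileno())  # assumption: not in the diff; forces data to disk first
    # A reader sees either the complete old file or the complete new one.
    tmp.replace(path)


def load_progress(path: Path) -> set[str]:
    """Read the completed-URL set back; a missing file means nothing is done."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8"))["completed_urls"])
```

Keeping the temp file in the same directory as the target matters, because a rename is only atomic within a single filesystem.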