
sample_random_scans.sh run progress (checkpoint)

Snapshot taken from terminal session 9 (repo: /Users/igt/Documents/spruce_scraper) just before the machine was restarted. Date: 2026-04-26.

Active run (incomplete)

A full scan was in progress: mosaic plus all tiles (worker count from config.yaml). Scan listing used --list-scans-first-page-only, which fetches a single page of up to 320 scan IDs and picks one uniformly at random from that page.
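The selection rule above can be sketched as follows. This is a minimal illustration that assumes the first page of IDs has already been fetched as a list; `pick_scan_id` and `PAGE_SIZE` are hypothetical names, not scraper.py's actual code.

```python
import random

# HYPOTHETICAL sketch; scraper.py's real selection code is not shown in this log.
PAGE_SIZE = 320  # "up to 320 scan IDs" on the first page, per the run notes


def pick_scan_id(first_page_ids, rng=random):
    """Pick one scan ID uniformly at random, looking only at the first page."""
    page = list(first_page_ids)[:PAGE_SIZE]  # ignore anything beyond one page
    if not page:
        # Nothing to choose from: the caller should treat this as a failure.
        raise ValueError("no scan IDs on the first page")
    return rng.choice(page)  # every ID on the page is equally likely


if __name__ == "__main__":
    print(pick_scan_id([71478, 156875, 10837], random.Random(42)))
```

Passing an explicit `random.Random(seed)` makes a pick reproducible, which can help when re-checking which scan a run would choose.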

| Item | Value |
|---|---|
| Script | ./scripts/sample_random_scans.sh |
| Machines file | machines.txt (12 machines) |
| Config | config.yaml |
| State files | archives/scans.csv, archives/tiles.csv, archives/.progress.json |

Where it stopped

The run was on step [9/12], machine BW3-17 [AMR-20], scan ID 153772.

  • Mosaic: HTTP 404 for …/RootView_Database/153772/mosaic.jpg. This matches other scans where the mosaic is missing but the tiles are still available.
  • Tiles: 33784 total. The progress bar showed roughly 5% complete; the last log line observed was about 1736 / 33784 tiles. The count was still advancing when this snapshot was taken, so check archives/.progress.json (or resume the run) to see the actual position.
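Turning a progress snapshot into a percentage can be sketched as below. The key names `tiles_done` and `tiles_total` are placeholders: the real schema of archives/.progress.json is not documented in this log, so inspect the file and adjust them. For the figures above, 1736 of 33784 tiles is about 5.1%, consistent with the progress bar.

```python
import json
from pathlib import Path


def percent_done(done, total):
    """Completed share as a percentage (0.0 when total is 0)."""
    return 100.0 * done / total if total else 0.0


def read_tile_progress(path="archives/.progress.json",
                       done_key="tiles_done", total_key="tiles_total"):
    # ASSUMPTION: placeholder key names; inspect the real file and adjust.
    data = json.loads(Path(path).read_text())
    return data[done_key], data[total_key]


if __name__ == "__main__":
    print(f"{percent_done(1736, 33784):.1f}%")  # 5.1%
```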

Not yet started in this full-scan pass: steps [10/12]–[12/12], i.e. BW3-19 [AMR-21], BW3-20 [AMR-26], and BW3-21 [AMR-17] (lines 12–14 of machines.txt).
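A hedged sketch for listing which machines are still pending. It assumes machines.txt holds one machine per line, with blank lines and `#` comments ignored; that file format is an assumption, and `load_machines` / `remaining` are hypothetical helpers. Under that assumption, a machine's step number and its line number in the file need not match.

```python
from pathlib import Path


def load_machines(path="machines.txt"):
    """Return (step, line_number, entry) for each machine line.

    ASSUMPTION: one machine per line; blank lines and '#' comments are skipped,
    so step numbers count only machine lines while line numbers count every line.
    """
    machines = []
    for line_no, raw in enumerate(Path(path).read_text().splitlines(), start=1):
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        machines.append((len(machines) + 1, line_no, entry))
    return machines


def remaining(machines, first_pending_step=10):
    """Machines whose step is at or after the first step not yet started."""
    return [m for m in machines if m[0] >= first_pending_step]
```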

Skipped machine in this pass

  • [4/12] BW2-8 [AMR-25]: SKIPPED. scraper.py --list-scans --list-scans-first-page-only exited with code 1 (could not get the scan list or pick an ID). The script continued with the next machine.
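The skip-and-continue behaviour can be sketched in Python as below. The real driver is the shell script ./scripts/sample_random_scans.sh, so this only illustrates the control flow; `run_all` and the `build_cmd` callable (which would turn a machine entry into a scraper.py command line) are hypothetical.

```python
import subprocess
import sys


def run_all(machines, build_cmd):
    """Run one command per machine; a non-zero exit skips only that machine.

    build_cmd is a HYPOTHETICAL callable returning an argv list for a machine;
    how the real script passes the machine to scraper.py is not recorded here.
    """
    skipped = []
    total = len(machines)
    for step, machine in enumerate(machines, start=1):
        result = subprocess.run(build_cmd(machine))
        if result.returncode != 0:
            # Report and move on instead of aborting the whole pass.
            print(f"[{step}/{total}] {machine}: SKIPPED (exit code {result.returncode})",
                  file=sys.stderr)
            skipped.append(machine)
    return skipped


if __name__ == "__main__":
    # Demo: a stand-in command that fails only for BW2-8.
    demo = lambda m: [sys.executable, "-c", f"raise SystemExit({1 if m == 'BW2-8' else 0})"]
    print(run_all(["BW1-4", "BW2-8", "BW2-10"], demo))
```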

Completed machines in this full-scan pass (steps 1–3, 5–8)

| Step | Machine | Scan ID | Mosaic | Tiles downloaded |
|---|---|---|---|---|
| 1 | BW1-4 [AMR-15] | 71478 | 404 | 56 |
| 2 | BW1-6 [AMR-19] | 156875 | saved | 72 |
| 3 | BW1-7 [AMR-18] | 10837 | 404 | 1170 |
| 4 | BW2-8 [AMR-25] | skipped | | |
| 5 | BW2-10 [AMR-22] | 146368 | saved | 156 |
| 6 | BW2-11 [AMR-23] | 160022 | saved | 529 |
| 7 | BW2-13 [AMR-24] | 156957 | saved | 143 |
| 8 | BW3-16 [AMR-16] | 77300 | 404 | 400 |
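Tallying the table gives a quick summary of the pass so far. The rows below are copied from the table; nothing else is assumed.

```python
# Rows copied from the completed-machines table; step 4 (BW2-8) was skipped and has no data.
completed = [
    (1, "BW1-4 [AMR-15]", 71478, "404", 56),
    (2, "BW1-6 [AMR-19]", 156875, "saved", 72),
    (3, "BW1-7 [AMR-18]", 10837, "404", 1170),
    (5, "BW2-10 [AMR-22]", 146368, "saved", 156),
    (6, "BW2-11 [AMR-23]", 160022, "saved", 529),
    (7, "BW2-13 [AMR-24]", 156957, "saved", 143),
    (8, "BW3-16 [AMR-16]", 77300, "404", 400),
]

total_tiles = sum(row[4] for row in completed)
mosaics_saved = sum(1 for row in completed if row[3] == "saved")
mosaics_404 = sum(1 for row in completed if row[3] == "404")
print(total_tiles, mosaics_saved, mosaics_404)  # 2526 4 3
```

For these rows that is 2526 tiles across seven scans, with four mosaics saved and three returning 404.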

After restart

  1. cd to the repo and activate the same venv as before.
  2. Re-run ./scripts/sample_random_scans.sh in the same mode as before (full scan, the default, if that is what you used). The scraper resumes from archives/.progress.json and will continue BW3-17 scan 153772 (remaining tiles) before moving on to later machines, unless you change the options or modify the data manually.
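Resuming relies on a checkpoint file. scraper.py's actual checkpoint format is not described in this log, so the sketch below shows a generic pattern under an assumed `{"done": [...]}` schema, with hypothetical helpers `load_done`, `save_done` and `resume`: keys already recorded are skipped, and the checkpoint is rewritten through a temporary file plus `os.replace` so a restart never reads a half-written file.

```python
import json
import os
from pathlib import Path

# HYPOTHETICAL checkpoint format: {"done": ["<tile key>", ...]}.


def load_done(path):
    """Return the set of completed keys, or an empty set if no checkpoint exists."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text()).get("done", []))


def save_done(path, done):
    """Write the checkpoint via a temp file, then swap it into place."""
    tmp = Path(f"{path}.tmp")
    tmp.write_text(json.dumps({"done": sorted(done)}))
    os.replace(tmp, path)  # replaces the old checkpoint in one step


def resume(keys, path, fetch, save_every=100):
    """Fetch every key not already recorded, checkpointing periodically."""
    done = load_done(path)
    todo = [k for k in keys if k not in done]  # skip work finished before the restart
    for n, key in enumerate(todo, start=1):
        fetch(key)
        done.add(key)
        if n % save_every == 0:
            save_done(path, done)
    save_done(path, done)
    return done
```

Saving every `save_every` items rather than after every tile keeps the write cost low for tens of thousands of tiles; after a crash, fewer than `save_every` items are fetched a second time.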

Other runs in the same log (for context)

  • Earlier invocations of the script failed with DRY_FLAG[@]: unbound variable errors; these were fixed in later invocations.
  • A mosaic-only pass over all 12 machines completed with the banner "12 machine(s) with mosaic step completed, 0 skipped" (random scan per machine from the first page of IDs). That was a separate, completed run, distinct from the in-progress full scan above.