# `sample_random_scans.sh` run progress (checkpoint)
Snapshot from terminal session **9** (repo: `/Users/igt/Documents/spruce_scraper`), taken just before the machine was restarted. **Date:** 2026-04-26.
## Active run (incomplete)
A **full scan** was in progress: **mosaic + all tiles**, with the worker count taken from `config.yaml`. Scan listing used **`--list-scans-first-page-only`**: only one page of up to 320 scan IDs is fetched, and one ID is chosen uniformly at random from that page.

| Item | Value |
|------|--------|
| Script | `./scripts/sample_random_scans.sh` |
| Machines file | `machines.txt` (12 machines) |
| Config | `config.yaml` |
| State files | `archives/scans.csv`, `archives/tiles.csv`, `archives/.progress.json` |
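
The first-page selection rule described above (at most 320 IDs, one picked uniformly at random) can be sketched in Python. This is an illustration, not code from `scraper.py`: `pick_scan_id` and `FIRST_PAGE_LIMIT` are hypothetical names, and returning `None` for an empty page is an assumed convention.

```python
import random
from typing import Optional, Sequence

# Page size taken from this note ("up to 320 scan IDs"); the constant name is hypothetical.
FIRST_PAGE_LIMIT = 320


def pick_scan_id(first_page_ids: Sequence[int], rng: Optional[random.Random] = None) -> Optional[int]:
    """Return one scan ID chosen uniformly at random from the first page, or None if it is empty."""
    page = list(first_page_ids)[:FIRST_PAGE_LIMIT]  # never look past the first page
    if not page:
        return None  # nothing to pick: a caller could treat this machine as skipped
    rng = rng or random.Random()
    return rng.choice(page)  # each ID on the page has probability 1/len(page)


ids = [71478, 156875, 10837, 146368]
print(pick_scan_id(ids, random.Random(0)) in ids)  # True
```

Passing a seeded `random.Random` makes a pick reproducible when debugging; leaving it out gives a fresh random choice per run.
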
### Where it stopped
The run was on **step [9/12]**, machine **BW3-17 [AMR-20]**, **scan ID 153772**.
- **Mosaic:** HTTP **404** for `…/RootView_Database/153772/mosaic.jpg`, the same pattern seen for other scans in this run (mosaic missing, tiles still available).
- **Tiles:** **33784** in total. The progress bar showed roughly **5%** complete, and the last log line observed read about **1736 / 33784** tiles. The exact count kept advancing while the run was live, so check `archives/.progress.json` or resume the run to see the current figure.

**Not yet started** in this full-scan pass: steps **[10/12]** through **[12/12]**, namely **BW3-19 [AMR-21]**, **BW3-20 [AMR-26]**, and **BW3-21 [AMR-17]** (lines 12-14 of `machines.txt`).
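
As a quick check on the tile figures above: 1736 of 33784 tiles is about 5.1%, leaving 32048 tiles for scan 153772. The helper below is a hypothetical illustration, not part of the scraper.

```python
def tile_progress(done: int, total: int) -> tuple[float, int]:
    """Return (percent complete, tiles remaining) for one scan's tile download."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * done / total, total - done


# Figures from the last log line observed for scan 153772.
percent, remaining = tile_progress(1736, 33784)
print(f"{percent:.1f}% done, {remaining} tiles left")  # 5.1% done, 32048 tiles left
```
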
### Skipped machine in this pass
- **[4/12] BW2-8 [AMR-25]:** `SKIPPED`. `scraper.py --list-scans --list-scans-first-page-only` exited with **code 1** (it could not get the scan list or pick an ID), so the script moved on to the next machine.
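
The skip-and-continue behaviour seen for BW2-8 can be sketched in Python. This is not the script's code: the real script is Bash, and how it passes each machine to `scraper.py` is not shown in this note. `list_scan_ids`, `run_pass`, and `build_cmd` are hypothetical names, and the sketch assumes the listing command prints whitespace-separated scan IDs. The point is the control flow: a non-zero exit code or an empty list marks the machine `SKIPPED`, and the loop moves on instead of aborting the pass.

```python
import subprocess
import sys
from typing import Callable, Dict, List, Optional, Sequence


def list_scan_ids(cmd: Sequence[str]) -> Optional[List[int]]:
    """Run a listing command; return parsed scan IDs, or None on a non-zero exit.

    Assumes (hypothetically) that the command prints whitespace-separated IDs.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return [int(tok) for tok in result.stdout.split()]


def run_pass(machines: Sequence[str], build_cmd: Callable[[str], List[str]]) -> Dict[str, str]:
    """Visit each machine in order; mark SKIPPED when listing fails, then continue."""
    status: Dict[str, str] = {}
    for i, machine in enumerate(machines, start=1):
        ids = list_scan_ids(build_cmd(machine))
        if not ids:
            status[machine] = "SKIPPED"
            print(f"[{i}/{len(machines)}] {machine}: SKIPPED")
            continue  # move on to the next machine instead of aborting the pass
        status[machine] = "listed"
    return status


if __name__ == "__main__":
    # Stand-in commands: one prints two IDs, the other exits with code 1.
    ok = [sys.executable, "-c", "print('153772 71478')"]
    fail = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_pass(["BW2-8 [AMR-25]", "BW2-10 [AMR-22]"], lambda m: fail if m.startswith("BW2-8") else ok))
```
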
### Completed machines in this full-scan pass (steps 1-3, 5-8)
| Step | Machine | Scan ID | Mosaic | Tiles downloaded |
|------|---------|---------|--------|------------------|
| 1 | BW1-4 [AMR-15] | 71478 | 404 | 56 |
| 2 | BW1-6 [AMR-19] | 156875 | saved | 72 |
| 3 | BW1-7 [AMR-18] | 10837 | 404 | 1170 |
| 4 | BW2-8 [AMR-25] | — | — | skipped |
| 5 | BW2-10 [AMR-22] | 146368 | saved | 156 |
| 6 | BW2-11 [AMR-23] | 160022 | saved | 529 |
| 7 | BW2-13 [AMR-24] | 156957 | saved | 143 |
| 8 | BW3-16 [AMR-16] | 77300 | 404 | 400 |
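
Totalling the table gives a compact picture of the pass so far: 7 machines finished, 4 mosaics saved, 3 mosaic 404s, and 2526 tiles downloaded (not counting the partial BW3-17 scan). A throwaway Python tally over the same rows, with the skipped step encoded as `None`:

```python
from collections import Counter

# Rows copied from the table above: (step, machine, scan_id, mosaic, tiles).
# None marks the skipped machine (step 4).
ROWS = [
    (1, "BW1-4 [AMR-15]", 71478, "404", 56),
    (2, "BW1-6 [AMR-19]", 156875, "saved", 72),
    (3, "BW1-7 [AMR-18]", 10837, "404", 1170),
    (4, "BW2-8 [AMR-25]", None, None, None),
    (5, "BW2-10 [AMR-22]", 146368, "saved", 156),
    (6, "BW2-11 [AMR-23]", 160022, "saved", 529),
    (7, "BW2-13 [AMR-24]", 156957, "saved", 143),
    (8, "BW3-16 [AMR-16]", 77300, "404", 400),
]

completed = [r for r in ROWS if r[2] is not None]
mosaic_counts = Counter(r[3] for r in completed)
total_tiles = sum(r[4] for r in completed)

print(len(completed), dict(mosaic_counts), total_tiles)
# 7 {'404': 3, 'saved': 4} 2526
```
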
## After restart
1. `cd` to the repo and activate the same venv as before.
2. Re-run **`./scripts/sample_random_scans.sh`** in the **same mode** as before: full scan (the default), if that is what you used. The scraper **resumes** from `archives/.progress.json`: it finishes the remaining tiles of **BW3-17** scan **153772** before moving on to the later machines, unless you change options or modify the data manually.
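
This note does not describe how resuming works beyond "resumes from `archives/.progress.json`". As a minimal sketch only, assuming a hypothetical checkpoint layout that maps each scan ID to the tile indices already saved, resuming comes down to loading the checkpoint, fetching only the missing tiles, and rewriting the checkpoint safely. `load_progress`, `save_progress`, and `remaining_tiles` are illustrative names, not functions from `scraper.py`, and the real file format may differ.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List, Set

# Hypothetical checkpoint layout: {"<scan_id>": [indices of tiles already saved]}.
# The real archives/.progress.json may use a different structure.
Progress = Dict[str, Set[int]]


def load_progress(path: Path) -> Progress:
    """Read the checkpoint; a missing file means no tiles are recorded yet."""
    if not path.exists():
        return {}
    raw = json.loads(path.read_text())
    return {scan: set(tiles) for scan, tiles in raw.items()}


def save_progress(path: Path, progress: Progress) -> None:
    """Write via a temp file plus os.replace, so an interrupted write leaves no half-written file."""
    data = {scan: sorted(tiles) for scan, tiles in progress.items()}
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before the rename
    os.replace(tmp, path)  # atomic swap within the same filesystem


def remaining_tiles(progress: Progress, scan_id: int, total: int) -> List[int]:
    """Tile indices still to download for one scan, in order."""
    done = progress.get(str(scan_id), set())
    return [i for i in range(total) if i not in done]


with tempfile.TemporaryDirectory() as d:
    ckpt = Path(d) / ".progress.json"
    save_progress(ckpt, {"153772": set(range(1736))})
    todo = remaining_tiles(load_progress(ckpt), 153772, 33784)
    print(len(todo), todo[0])  # 32048 1736
```

Writing to a temp file in the same directory and swapping it in with `os.replace` is a common choice for checkpoints like this: a reboot in the middle of a write leaves either the previous checkpoint or the new one, rather than a truncated file.
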
## Other runs in the same log (for context)
- Earlier invocations of the script failed with **`DRY_FLAG[@]: unbound variable`**; this was fixed for the later invocations.
- A **mosaic-only** pass over all 12 machines completed with the banner *12 machine(s) with mosaic step completed, 0 skipped* (one random scan per machine, drawn from the first page of IDs). That is a **separate**, completed run, not part of the **in-progress full scan** above.