Add sample_random_scans script and first-page list-scans option

- scripts/sample_random_scans.sh: pick a random scan per machine (by default from the first list page) and download its mosaic and/or tiles
- --list-scans-first-page-only: fetch the scan list with one HTTP request (first page only, up to 320 IDs)
- scripts/machines.example.txt: example machine list (one label per line); .gitignore the local machines.txt copied from it
- README: document usage
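The sampling loop from the first bullet can be sketched in Python as follows. This is a minimal illustration, not the shell script's actual code. The helpers `read_machine_labels`, `build_download_cmd` and `sample_commands` are hypothetical. The listing step is injected as a callable because the `--list-scans` output format is not shown here, and that callable is assumed to return only completed scans. Skipping `#` lines in the machines file and mapping `MOSAIC_ONLY=1` to `--mosaic-only` alongside `--scan-id` are also assumptions. Apart from 158374, the scan IDs are made up.

```python
import os
import random
import shlex
import sys
from typing import Callable, Sequence


def read_machine_labels(path: str) -> list[str]:
    """Return one machine label per non-empty line (skipping '#' lines is an assumption)."""
    labels: list[str] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            label = line.strip()
            if label and not label.startswith("#"):
                labels.append(label)
    return labels


def build_download_cmd(machine: str, scan_id: int, mosaic_only: bool) -> list[str]:
    """Build a scraper.py command line for one scan (hypothetical helper)."""
    cmd = ["python", "scraper.py", "--machine", machine, "--scan-id", str(scan_id)]
    if mosaic_only:
        # Assumption: MOSAIC_ONLY=1 maps to --mosaic-only alongside --scan-id.
        cmd.append("--mosaic-only")
    return cmd


def sample_commands(
    labels: Sequence[str],
    list_scan_ids: Callable[[str], Sequence[int]],
    rng: random.Random,
    mosaic_only: bool = False,
) -> list[list[str]]:
    """Pick one random scan ID per machine and return one download command each."""
    commands: list[list[str]] = []
    for machine in labels:
        scan_ids = list(list_scan_ids(machine))
        if not scan_ids:
            print(f"skipping {machine}: no scans listed", file=sys.stderr)
            continue
        commands.append(build_download_cmd(machine, rng.choice(scan_ids), mosaic_only))
    return commands


if __name__ == "__main__":
    # Stand-in for `scraper.py --list-scans --list-scans-first-page-only` per machine;
    # IDs other than 158374 are hypothetical.
    fake_listing = {"BW3-20 [AMR-26]": [158374, 158380, 158391]}
    mosaic_only = os.environ.get("MOSAIC_ONLY") == "1"
    for cmd in sample_commands(list(fake_listing), lambda m: fake_listing.get(m, []),
                               random.Random(), mosaic_only):
        print(shlex.join(cmd))  # quotes labels such as "BW3-20 [AMR-26]"
```

Passing the RNG in explicitly keeps runs reproducible when seeded, and machines with an empty listing are skipped with a warning rather than aborting the whole sample.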
2026-04-26 20:56:52 -04:00
parent 08a29d124a
commit 4118e6e4f0
6 changed files with 236 additions and 7 deletions
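The trade-off behind `--list-scans-first-page-only` (one request for at most 320 scans versus paginating the whole history) can be sketched as below. `fetch_page` is a hypothetical stand-in for one HTTP list request. The following are assumptions, not confirmed scraper.py behavior:

* every page holds up to 320 IDs
* pages are numbered from 0
* pagination ends at the first short page

```python
from typing import Callable, Sequence

PAGE_SIZE = 320  # README: the first list page holds up to 320 scans; assumed for every page


def list_scan_ids(
    fetch_page: Callable[[int], Sequence[int]],
    first_page_only: bool = False,
) -> list[int]:
    """Collect scan IDs one page (one request) at a time.

    fetch_page(page) is a hypothetical stand-in for a single HTTP list request
    with 0-based page numbers. Pagination is assumed to end at the first page
    holding fewer than PAGE_SIZE IDs.
    """
    ids: list[int] = []
    page = 0
    while True:
        batch = list(fetch_page(page))
        ids.extend(batch)
        if first_page_only or len(batch) < PAGE_SIZE:
            break  # one request in first-page mode; otherwise stop at a short page
        page += 1
    return ids


if __name__ == "__main__":
    history = list(range(700))  # hypothetical 700-scan history
    calls: list[int] = []

    def fake_fetch(page: int) -> list[int]:
        calls.append(page)
        return history[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]

    print(len(list_scan_ids(fake_fetch, first_page_only=True)), len(calls))  # prints: 320 1
    calls.clear()
    print(len(list_scan_ids(fake_fetch)), len(calls))  # prints: 700 3
```

With the hypothetical 700-scan history, first-page mode costs one request and returns 320 IDs, while full pagination needs three requests. The saving grows with the length of each machine's history.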
+9
@@ -81,6 +81,9 @@ python scraper.py --list-machines
# List all scans for a machine
python scraper.py --list-scans --machine "BW3-20 [AMR-26]"
# List only the first table page (one HTTP call; up to 320 scans, the newest/first ones in server order)
python scraper.py --list-scans --list-scans-first-page-only --machine "BW3-20 [AMR-26]"
# Preview what would be downloaded (dry run)
python scraper.py --machine "BW3-20 [AMR-26]" --dry-run
@@ -94,6 +97,11 @@ python scraper.py --machine "BW3-20 [AMR-26]" --mosaic-only
# Download mosaics for all machines
python scraper.py --mosaic-only
# One random completed scan per machine: mosaic + all tiles (labels from machines.txt; uses --list-scans + --scan-id)
cp scripts/machines.example.txt machines.txt  # then edit: one machine label per line
./scripts/sample_random_scans.sh machines.txt
# Optional: mosaics only, no tiles
MOSAIC_ONLY=1 ./scripts/sample_random_scans.sh machines.txt
# Download all tiles for a specific scan
python scraper.py --machine "BW3-20 [AMR-26]" --scan-id 158374 --workers 4
@@ -115,6 +123,7 @@ python scraper.py --machine "BW3-20 [AMR-26]" --scan-id 158374 --workers 4
| `--recheck` | Scan archive for zero-byte/missing tiles and mosaics; remove bad entries from `.progress.json` so they re-download on next run |
| `--list-machines` | Print all machines and exit |
| `--list-scans` | Print all scans for `--machine` and exit |
| `--list-scans-first-page-only` | With `--list-scans`: a single list request (up to 320 scans) instead of paginating the full history |
| `--verbose` / `-v` | Debug logging |
### `config.yaml` (optional keys)