Compare commits
11 Commits
| SHA1 |
|---|
| 93236582a0 |
| 4b06ab4516 |
| 9f341ea27d |
| 1334cfaf92 |
| 1e1695a27d |
| 3a3083b4e3 |
| fa4a60843c |
| e3f0c07119 |
| 0cd7243c8d |
| 752c278dff |
| 5a7fdd820b |
@@ -7,3 +7,6 @@ __pycache__/
 .DS_Store
 explore_dumps/
 .venv/
+scripts/sync_to_nas.sh
+backup/
+.claude/
@@ -97,10 +97,8 @@ python scraper.py --machine "BW3-20 [AMR-26]" --mosaic-only
 # Download mosaics for all machines
 python scraper.py --mosaic-only

-# One random completed scan per machine: mosaic + all tiles (from machines.txt; uses --list-scans + --scan-id)
-# MOSAIC_ONLY=1 ./scripts/sample_random_scans.sh machines.txt # optional: mosaics only, no tiles
-# cp scripts/machines.example.txt machines.txt # then edit: one label per line
-# ./scripts/sample_random_scans.sh machines.txt
+# One random completed scan per machine (helper script): check out branch `testing/sample-runs`,
+# then see `scripts/sample_random_scans.sh` and `docs/sample_random_scans_run_progress.md`.

 # Download all tiles for a specific scan
 python scraper.py --machine "BW3-20 [AMR-26]" --scan-id 158374 --workers 4
@@ -0,0 +1,50 @@
# `sample_random_scans.sh` run progress (checkpoint)

Snapshot from terminal session **9** (repo: `/Users/igt/Documents/spruce_scraper`), taken just before the machine was restarted. **Date:** 2026-04-26.

## Active run (incomplete)

A **full scan** was in progress: **mosaic + all tiles** (worker count from `config.yaml`). Scans were listed with **`--list-scans-first-page-only`**, which fetches one page of up to 320 scan IDs; one ID is then chosen uniformly at random from that page.
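
The selection rule described above can be sketched as follows. This is a minimal illustration, not the code in `scraper.py` or the helper script: the function name `pick_scan_id` and the constant `FIRST_PAGE_LIMIT` are assumptions introduced here, and the sample IDs are simply scan IDs that appear elsewhere in this checkpoint.

```python
import random

FIRST_PAGE_LIMIT = 320  # page size quoted in the text above


def pick_scan_id(first_page_ids, rng=random):
    """Return one scan ID chosen uniformly at random from the first page."""
    page = list(first_page_ids)[:FIRST_PAGE_LIMIT]
    if not page:
        # Mirrors the "could not pick an ID" failure case.
        raise ValueError("no scan IDs on the first page")
    return rng.choice(page)


# A seeded Random is used only to make the example repeatable.
ids = [153772, 156875, 146368, 160022, 156957]
chosen = pick_scan_id(ids, rng=random.Random(0))
print(chosen)
```

Because only the first page is sampled, older scans beyond that page can never be chosen, which is worth keeping in mind when interpreting the sample.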

| Item | Value |
|------|--------|
| Script | `./scripts/sample_random_scans.sh` |
| Machines file | `machines.txt` (12 machines) |
| Config | `config.yaml` |
| State files | `archives/scans.csv`, `archives/tiles.csv`, `archives/.progress.json` |

### Where it stopped

The run was on **step [9/12]**, machine **BW3-17 [AMR-20]**, **scan ID 153772**.

- **Mosaic:** HTTP **404** for `…/RootView_Database/153772/mosaic.jpg` (the same pattern as other scans in this pass: the mosaic is missing but the tiles are still available).
- **Tiles:** **33784** total; the progress bar showed roughly **5%** complete. The last log line observed was about **1736 / 33784** tiles. The count keeps advancing while the run is active, so re-check `archives/.progress.json` or resume the run to see the current value.
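
As a quick check of the "roughly 5%" figure, the arithmetic on the two tile counts quoted above works out as below. The variable names are illustrative only.

```python
done, total = 1736, 33784  # tile counts from the last observed log line

fraction = done / total
remaining = total - done

print(f"{fraction:.1%} complete, {remaining} tiles remaining")
```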

**Not yet started** in this full-scan pass: steps **[10/12]–[12/12]**: **BW3-19 [AMR-21]**, **BW3-20 [AMR-26]**, **BW3-21 [AMR-17]** (lines 12–14 of `machines.txt`).

### Skipped machine in this pass

- **[4/12] BW2-8 [AMR-25]:** `SKIPPED`. `scraper.py --list-scans --list-scans-first-page-only` exited with **code 1** (it could not get the scan list or pick an ID). The script continued with the next machine.
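
The skip-and-continue behaviour described above can be sketched in Python as follows. This is an assumption-laden illustration rather than the helper script itself: `run_per_machine` and `fake_list_scans` are hypothetical names, and the stand-in command just exits with a chosen code instead of calling `scraper.py`.

```python
import subprocess
import sys


def run_per_machine(machines, build_cmd):
    """Run one command per machine; skip (do not abort) on a non-zero exit."""
    completed, skipped = [], []
    total = len(machines)
    for step, machine in enumerate(machines, start=1):
        result = subprocess.run(build_cmd(machine), capture_output=True, text=True)
        if result.returncode != 0:
            print(f"[{step}/{total}] {machine}: SKIPPED (exit code {result.returncode})")
            skipped.append(machine)
            continue
        print(f"[{step}/{total}] {machine}: ok")
        completed.append(machine)
    return completed, skipped


def fake_list_scans(machine):
    # Hypothetical stand-in for the real listing command: exits 1 for BW2-8.
    code = 1 if machine.startswith("BW2-8") else 0
    return [sys.executable, "-c", f"import sys; sys.exit({code})"]


done, skipped = run_per_machine(
    ["BW1-7 [AMR-18]", "BW2-8 [AMR-25]", "BW2-10 [AMR-22]"], fake_list_scans
)
```

Collecting the skipped machines in a list makes it easy to print a final banner or retry only those machines later.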

### Completed machines in this full-scan pass (steps 1–3, 5–8)

| Step | Machine | Scan ID | Mosaic | Tiles downloaded |
|------|---------|---------|--------|------------------|
| 1 | BW1-4 [AMR-15] | 71478 | 404 | 56 |
| 2 | BW1-6 [AMR-19] | 156875 | saved | 72 |
| 3 | BW1-7 [AMR-18] | 10837 | 404 | 1170 |
| 4 | BW2-8 [AMR-25] | n/a | n/a | skipped |
| 5 | BW2-10 [AMR-22] | 146368 | saved | 156 |
| 6 | BW2-11 [AMR-23] | 160022 | saved | 529 |
| 7 | BW2-13 [AMR-24] | 156957 | saved | 143 |
| 8 | BW3-16 [AMR-16] | 77300 | 404 | 400 |
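
For a quick tally of the table above, the rows can be summarised as below. The values are copied from the table; the tuple layout and variable names are illustrative choices, not anything read from `archives/scans.csv`.

```python
# (step, machine, scan_id, mosaic_status, tiles) copied from the table above.
rows = [
    (1, "BW1-4 [AMR-15]", 71478, "404", 56),
    (2, "BW1-6 [AMR-19]", 156875, "saved", 72),
    (3, "BW1-7 [AMR-18]", 10837, "404", 1170),
    (4, "BW2-8 [AMR-25]", None, None, None),  # skipped, no scan chosen
    (5, "BW2-10 [AMR-22]", 146368, "saved", 156),
    (6, "BW2-11 [AMR-23]", 160022, "saved", 529),
    (7, "BW2-13 [AMR-24]", 156957, "saved", 143),
    (8, "BW3-16 [AMR-16]", 77300, "404", 400),
]

completed = [r for r in rows if r[2] is not None]
total_tiles = sum(r[4] for r in completed)
mosaic_404 = [r[1] for r in completed if r[3] == "404"]
mosaic_saved = [r[1] for r in completed if r[3] == "saved"]

print(f"completed machines: {len(completed)}")
print(f"tiles downloaded:   {total_tiles}")
print(f"mosaic saved: {len(mosaic_saved)}, mosaic 404: {len(mosaic_404)}")
```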

## After restart

1. `cd` to the repo and activate the same venv as before.
2. Re-run **`./scripts/sample_random_scans.sh`** in the **same mode** (full scan, which is the default, if that is what you used). The scraper **resumes** from `archives/.progress.json` and continues **BW3-17** scan **153772** (remaining tiles) before moving on to later machines, unless you change options or edit data manually.
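
To make the resume idea concrete, here is a rough sketch of how a progress file could be used to work out which tiles are still missing. The schema of the real `archives/.progress.json` is not shown in this document, so the layout assumed below (a JSON object mapping a scan ID to a list of finished tile indices) and the function name `remaining_tiles` are purely hypothetical.

```python
import json
import tempfile
from pathlib import Path


def remaining_tiles(progress_path, scan_id, total_tiles):
    """Return tile indices not yet recorded as done for scan_id.

    Assumes a hypothetical layout: {"<scan_id>": [done_tile_index, ...]}.
    """
    path = Path(progress_path)
    # A missing file is treated as "nothing downloaded yet".
    progress = json.loads(path.read_text()) if path.exists() else {}
    done = set(progress.get(str(scan_id), []))
    return [i for i in range(total_tiles) if i not in done]


# Demo against a temporary file, not the real archives/.progress.json.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / ".progress.json"
    demo.write_text(json.dumps({"153772": [0, 1, 2]}))
    todo = remaining_tiles(demo, 153772, total_tiles=10)
    fresh = remaining_tiles(Path(tmp) / "missing.json", 153772, total_tiles=3)

print(todo)
```

Inspect the actual file before relying on any particular key names.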

## Other runs in the same log (for context)

- Earlier **`DRY_FLAG[@]: unbound variable`** errors from the script were fixed in later invocations.
- A **mosaic-only** pass over all 12 machines completed with the banner *12 machine(s) with mosaic step completed, 0 skipped* (random scan per machine from the first page of IDs). That is a **separate**, completed run, distinct from the **in-progress full scan** above.
@@ -0,0 +1,301 @@
|
||||
bucket,machine,scan_id,scan_dir
|
||||
zero,BW3-19 [AMR-21],141127,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-10-12\141127
|
||||
zero,BW2-8 [AMR-25],22778,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2019-11-10\22778
|
||||
zero,BW1-6 [AMR-19],93870,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2021-11-23\93870
|
||||
zero,BW3-19 [AMR-21],140121,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-09-27\140121
|
||||
zero,BW3-19 [AMR-21],144191,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-21\144191
|
||||
zero,BW3-19 [AMR-21],144426,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-23\144426
|
||||
zero,BW3-19 [AMR-21],144659,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-26\144659
|
||||
zero,BW2-13 [AMR-24],120923,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2022-12-10\120923
|
||||
zero,BW3-19 [AMR-21],140154,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-09-27\140154
|
||||
zero,BW2-8 [AMR-25],23645,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2019-11-17\23645
|
||||
zero,BW3-19 [AMR-21],140792,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-10-07\140792
|
||||
zero,BW3-19 [AMR-21],140125,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-09-27\140125
|
||||
zero,BW3-19 [AMR-21],141927,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-10-22\141927
|
||||
zero,BW2-8 [AMR-25],118438,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2022-10-30\118438
|
||||
zero,BW3-19 [AMR-21],141575,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-10-18\141575
|
||||
zero,BW3-19 [AMR-21],142951,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-04\142951
|
||||
zero,BW1-6 [AMR-19],90874,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2021-10-21\90874
|
||||
zero,BW1-6 [AMR-19],91489,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2021-10-27\91489
|
||||
zero,BW2-8 [AMR-25],44836,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\unknown\44836
|
||||
zero,BW3-19 [AMR-21],144692,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-26\144692
|
||||
zero,BW3-19 [AMR-21],144584,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-25\144584
|
||||
zero,BW3-19 [AMR-21],142238,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-10-26\142238
|
||||
zero,BW3-19 [AMR-21],141485,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-10-17\141485
|
||||
zero,BW1-6 [AMR-19],92123,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2021-11-02\92123
|
||||
zero,BW3-19 [AMR-21],141805,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-10-20\141805
|
||||
zero,BW3-19 [AMR-21],144856,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-29\144856
|
||||
zero,BW3-19 [AMR-21],140325,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-09-29\140325
|
||||
zero,BW3-19 [AMR-21],141026,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-10-11\141026
|
||||
zero,BW3-19 [AMR-21],140419,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-09-30\140419
|
||||
zero,BW3-19 [AMR-21],142969,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-04\142969
|
||||
zero,BW3-19 [AMR-21],144681,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-26\144681
|
||||
zero,BW3-19 [AMR-21],142677,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-01\142677
|
||||
zero,BW3-19 [AMR-21],141584,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-10-18\141584
|
||||
zero,BW3-19 [AMR-21],144159,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-19\144159
|
||||
zero,BW3-19 [AMR-21],139494,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-09-19\139494
|
||||
zero,BW1-6 [AMR-19],99248,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2022-02-03\99248
|
||||
zero,BW3-19 [AMR-21],139969,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-09-24\139969
|
||||
zero,BW3-19 [AMR-21],139511,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-09-19\139511
|
||||
zero,BW3-17 [AMR-20],153019,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-17__AMR-20\2024-03-25\153019
|
||||
zero,BW3-19 [AMR-21],140463,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-10-01\140463
|
||||
zero,BW3-19 [AMR-21],143587,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-12\143587
|
||||
zero,BW3-17 [AMR-20],153493,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-17__AMR-20\2024-04-01\153493
|
||||
zero,BW3-19 [AMR-21],144727,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-28\144727
|
||||
zero,BW3-19 [AMR-21],139946,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-09-24\139946
|
||||
zero,BW3-19 [AMR-21],143612,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-12\143612
|
||||
zero,BW2-8 [AMR-25],83393,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2021-07-18\83393
|
||||
zero,BW3-19 [AMR-21],143288,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-09\143288
|
||||
zero,BW2-8 [AMR-25],23902,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2019-11-19\23902
|
||||
zero,BW3-19 [AMR-21],143445,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-11-10\143445
|
||||
zero,BW3-19 [AMR-21],140154,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-09-27\140154
|
||||
tiny,BW2-13 [AMR-24],26852,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2019-12-15\26852
|
||||
tiny,BW2-13 [AMR-24],140181,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2023-09-28\140181
|
||||
tiny,BW1-6 [AMR-19],114819,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2022-09-16\114819
|
||||
tiny,BW3-21 [AMR-17],97824,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2022-01-15\97824
|
||||
tiny,BW3-21 [AMR-17],52014,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2020-08-27\52014
|
||||
tiny,BW2-8 [AMR-25],127445,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2023-03-30\127445
|
||||
tiny,BW3-19 [AMR-21],48940,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2020-07-24\48940
|
||||
tiny,BW1-6 [AMR-19],87810,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2021-09-19\87810
|
||||
tiny,BW3-21 [AMR-17],43092,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2020-05-14\43092
|
||||
tiny,BW2-13 [AMR-24],113334,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2022-08-18\113334
|
||||
tiny,BW3-19 [AMR-21],59127,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2020-11-12\59127
|
||||
tiny,BW3-21 [AMR-17],25737,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2019-12-05\25737
|
||||
tiny,BW2-10 [AMR-22],61950,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2020-12-10\61950
|
||||
tiny,BW1-6 [AMR-19],93265,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2021-11-13\93265
|
||||
tiny,BW1-6 [AMR-19],113849,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2022-09-02\113849
|
||||
tiny,BW2-11 [AMR-23],124373,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2023-02-21\124373
|
||||
tiny,BW2-13 [AMR-24],120371,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2022-11-29\120371
|
||||
tiny,BW1-6 [AMR-19],87277,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2021-09-14\87277
|
||||
tiny,BW2-11 [AMR-23],122855,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2023-02-03\122855
|
||||
tiny,BW1-6 [AMR-19],69086,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2021-02-21\69086
|
||||
tiny,BW3-19 [AMR-21],47993,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2020-07-15\47993
|
||||
tiny,BW2-13 [AMR-24],125103,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2023-03-02\125103
|
||||
tiny,BW3-21 [AMR-17],103344,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2022-03-25\103344
|
||||
tiny,BW3-19 [AMR-21],57723,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2020-10-23\57723
|
||||
tiny,BW2-8 [AMR-25],79195,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2021-06-06\79195
|
||||
tiny,BW3-19 [AMR-21],54692,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2020-09-19\54692
|
||||
tiny,BW3-16 [AMR-16],30599,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2020-01-19\30599
|
||||
tiny,BW2-11 [AMR-23],130942,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2023-05-19\130942
|
||||
tiny,BW2-13 [AMR-24],138601,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2023-09-07\138601
|
||||
tiny,BW1-6 [AMR-19],92258,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2021-11-03\92258
|
||||
tiny,BW2-8 [AMR-25],23181,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2019-11-13\23181
|
||||
tiny,BW3-21 [AMR-17],53547,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2020-09-09\53547
|
||||
tiny,BW2-13 [AMR-24],155307,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2024-04-28\155307
|
||||
tiny,BW2-8 [AMR-25],72356,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2021-03-27\72356
|
||||
tiny,BW3-21 [AMR-17],95618,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2021-12-16\95618
|
||||
tiny,BW3-19 [AMR-21],48393,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2020-07-18\48393
|
||||
tiny,BW2-13 [AMR-24],130075,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2023-05-04\130075
|
||||
tiny,BW3-21 [AMR-17],39758,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2020-04-14\39758
|
||||
tiny,BW2-11 [AMR-23],126894,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2023-03-23\126894
|
||||
tiny,BW2-13 [AMR-24],82264,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2021-07-07\82264
|
||||
tiny,BW1-6 [AMR-19],99228,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2022-02-03\99228
|
||||
tiny,BW2-11 [AMR-23],124000,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2023-02-17\124000
|
||||
tiny,BW1-4 [AMR-15],46063,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2020-06-18\46063
|
||||
tiny,BW2-13 [AMR-24],93211,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2021-11-13\93211
|
||||
tiny,BW3-20 [AMR-26],87312,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2021-09-14\87312
|
||||
tiny,BW2-13 [AMR-24],131348,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2023-05-25\131348
|
||||
tiny,BW1-6 [AMR-19],94711,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2021-12-03\94711
|
||||
tiny,BW2-11 [AMR-23],129519,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2023-04-23\129519
|
||||
tiny,BW3-21 [AMR-17],32767,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2020-02-08\32767
|
||||
tiny,BW2-13 [AMR-24],93571,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2021-11-19\93571
|
||||
small,BW2-11 [AMR-23],158199,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2024-07-21\158199
|
||||
small,BW3-19 [AMR-21],96770,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2021-12-31\96770
|
||||
small,BW2-13 [AMR-24],47488,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2020-07-09\47488
|
||||
small,BW3-19 [AMR-21],152767,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2024-03-21\152767
|
||||
small,BW2-10 [AMR-22],129800,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2023-04-27\129800
|
||||
small,BW2-11 [AMR-23],114702,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2022-09-15\114702
|
||||
small,BW2-10 [AMR-22],135212,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2023-07-21\135212
|
||||
small,BW2-11 [AMR-23],136572,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2023-08-09\136572
|
||||
small,BW3-20 [AMR-26],145231,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2023-12-03\145231
|
||||
small,BW3-19 [AMR-21],32001,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2020-02-01\32001
|
||||
small,BW2-11 [AMR-23],116124,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2022-10-01\116124
|
||||
small,BW3-20 [AMR-26],120928,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2022-12-10\120928
|
||||
small,BW3-16 [AMR-16],56581,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2020-10-08\56581
|
||||
small,BW3-20 [AMR-26],123441,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2023-02-10\123441
|
||||
small,BW2-13 [AMR-24],41468,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2020-04-29\41468
|
||||
small,BW2-11 [AMR-23],19698,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2019-10-13\19698
|
||||
small,BW2-11 [AMR-23],154592,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2024-04-18\154592
|
||||
small,BW2-10 [AMR-22],137156,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2023-08-16\137156
|
||||
small,BW3-19 [AMR-21],85449,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2021-08-20\85449
|
||||
small,BW3-19 [AMR-21],102824,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2022-03-19\102824
|
||||
small,BW1-6 [AMR-19],54986,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2020-09-22\54986
|
||||
small,BW1-6 [AMR-19],135364,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2023-07-23\135364
|
||||
small,BW1-6 [AMR-19],28609,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2020-01-01\28609
|
||||
small,BW2-10 [AMR-22],115991,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2022-09-29\115991
|
||||
small,BW3-20 [AMR-26],28596,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2020-01-01\28596
|
||||
small,BW2-10 [AMR-22],106310,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2022-04-28\106310
|
||||
small,BW3-16 [AMR-16],65871,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2021-01-19\65871
|
||||
small,BW3-20 [AMR-26],103751,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2022-03-29\103751
|
||||
small,BW1-6 [AMR-19],118031,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2022-10-26\118031
|
||||
small,BW2-13 [AMR-24],112247,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2022-07-20\112247
|
||||
small,BW2-13 [AMR-24],118274,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2022-10-28\118274
|
||||
small,BW3-20 [AMR-26],104298,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2022-04-03\104298
|
||||
small,BW3-19 [AMR-21],130200,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2023-05-09\130200
|
||||
small,BW3-19 [AMR-21],59385,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2020-11-15\59385
|
||||
small,BW2-11 [AMR-23],132767,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2023-06-14\132767
|
||||
small,BW3-20 [AMR-26],152753,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2024-03-21\152753
|
||||
small,BW1-4 [AMR-15],31573,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2020-01-28\31573
|
||||
small,BW1-6 [AMR-19],21993,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2019-11-03\21993
|
||||
small,BW3-19 [AMR-21],34801,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2020-02-27\34801
|
||||
small,BW2-11 [AMR-23],108563,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2022-05-22\108563
|
||||
small,BW3-21 [AMR-17],15863,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2019-09-08\15863
|
||||
small,BW2-11 [AMR-23],38719,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2020-04-03\38719
|
||||
small,BW1-6 [AMR-19],26196,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2019-12-10\26196
|
||||
small,BW2-11 [AMR-23],90722,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2021-10-19\90722
|
||||
small,BW3-16 [AMR-16],47187,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2020-07-04\47187
|
||||
small,BW2-10 [AMR-22],110531,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2022-06-16\110531
|
||||
small,BW3-16 [AMR-16],11297,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2015-11-01\11297
|
||||
small,BW1-4 [AMR-15],43503,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2020-05-17\43503
|
||||
small,BW2-11 [AMR-23],115701,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2022-09-25\115701
|
||||
small,BW3-19 [AMR-21],95504,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2021-12-14\95504
|
||||
medium,BW2-10 [AMR-22],86562,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2021-09-04\86562
|
||||
medium,BW3-20 [AMR-26],38929,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2020-04-05\38929
|
||||
medium,BW2-10 [AMR-22],125087,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2023-03-02\125087
|
||||
medium,BW2-13 [AMR-24],119980,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2022-11-20\119980
|
||||
medium,BW2-10 [AMR-22],74116,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2021-04-14\74116
|
||||
medium,BW2-10 [AMR-22],101557,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2022-03-05\101557
|
||||
medium,BW3-20 [AMR-26],148093,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2024-01-13\148093
|
||||
medium,BW2-10 [AMR-22],97238,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2022-01-07\97238
|
||||
medium,BW3-20 [AMR-26],23171,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2019-11-13\23171
|
||||
medium,BW3-20 [AMR-26],28007,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2019-12-26\28007
|
||||
medium,BW1-4 [AMR-15],52288,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2020-08-29\52288
|
||||
medium,BW3-16 [AMR-16],66638,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2021-01-28\66638
|
||||
medium,BW3-20 [AMR-26],54374,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2020-09-16\54374
|
||||
medium,BW2-8 [AMR-25],158079,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2024-07-16\158079
|
||||
medium,BW2-10 [AMR-22],122216,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2023-01-18\122216
|
||||
medium,BW3-20 [AMR-26],151922,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2024-03-08\151922
|
||||
medium,BW2-13 [AMR-24],47678,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2020-07-11\47678
|
||||
medium,BW2-10 [AMR-22],32062,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2020-02-01\32062
|
||||
medium,BW2-8 [AMR-25],60826,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2020-11-29\60826
|
||||
medium,BW2-10 [AMR-22],31095,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2020-01-24\31095
|
||||
medium,BW2-10 [AMR-22],144344,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2023-11-22\144344
|
||||
medium,BW2-10 [AMR-22],140013,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2023-09-26\140013
|
||||
medium,BW3-20 [AMR-26],55608,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2020-09-27\55608
|
||||
medium,BW2-8 [AMR-25],17697,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2019-09-25\17697
|
||||
medium,BW3-20 [AMR-26],26794,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2019-12-14\26794
|
||||
medium,BW2-10 [AMR-22],114464,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2022-09-11\114464
|
||||
medium,BW2-10 [AMR-22],113595,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2022-08-26\113595
|
||||
medium,BW3-20 [AMR-26],59494,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2020-11-17\59494
|
||||
medium,BW3-20 [AMR-26],17595,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2019-09-24\17595
|
||||
medium,BW2-10 [AMR-22],95535,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2021-12-15\95535
|
||||
medium,BW2-11 [AMR-23],159024,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2024-11-14\159024
|
||||
medium,BW3-20 [AMR-26],29326,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2020-01-08\29326
|
||||
medium,BW3-20 [AMR-26],129738,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2023-04-27\129738
|
||||
medium,BW2-10 [AMR-22],49731,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2020-08-01\49731
|
||||
medium,BW3-20 [AMR-26],23196,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2019-11-14\23196
|
||||
medium,BW2-10 [AMR-22],72647,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2021-03-30\72647
|
||||
medium,BW2-13 [AMR-24],39157,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2020-04-08\39157
|
||||
medium,BW3-20 [AMR-26],138785,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2023-09-09\138785
|
||||
medium,BW3-20 [AMR-26],148250,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2024-01-16\148250
|
||||
medium,BW3-20 [AMR-26],119471,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2022-11-12\119471
|
||||
medium,BW3-20 [AMR-26],34470,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2020-02-23\34470
|
||||
medium,BW3-20 [AMR-26],109734,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2022-06-07\109734
|
||||
medium,BW2-10 [AMR-22],116997,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2022-10-13\116997
|
||||
medium,BW2-10 [AMR-22],26076,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2019-12-08\26076
|
||||
medium,BW2-10 [AMR-22],42501,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2020-05-08\42501
|
||||
medium,BW2-8 [AMR-25],52036,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2020-08-27\52036
|
||||
medium,BW3-16 [AMR-16],37365,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2020-03-22\37365
|
||||
medium,BW2-8 [AMR-25],157670,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2024-06-25\157670
|
||||
medium,BW3-20 [AMR-26],15419,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2019-09-04\15419
|
||||
medium,BW2-10 [AMR-22],38651,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2020-04-03\38651
|
||||
large,BW3-20 [AMR-26],63054,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2020-12-21\63054
|
||||
large,BW2-10 [AMR-22],12990,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2018-10-15\12990
|
||||
large,BW1-4 [AMR-15],71109,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2021-03-15\71109
|
||||
large,BW1-4 [AMR-15],10715,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2015-05-01\10715
|
||||
large,BW3-21 [AMR-17],12185,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2017-03-27\12185
|
||||
large,BW1-4 [AMR-15],10907,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2015-06-07\10907
|
||||
large,BW2-10 [AMR-22],12693,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2018-03-12\12693
|
||||
large,BW1-4 [AMR-15],10898,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2015-06-05\10898
|
||||
large,BW1-4 [AMR-15],49214,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2020-07-27\49214
|
||||
large,BW2-11 [AMR-23],12552,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2017-12-04\12552
|
||||
large,BW3-17 [AMR-20],10937,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-17__AMR-20\2015-06-18\10937
|
||||
large,BW2-10 [AMR-22],12353,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2017-08-11\12353
|
||||
large,BW2-11 [AMR-23],142004,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2023-10-23\142004
|
||||
large,BW3-17 [AMR-20],10100,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-17__AMR-20\2014-06-16\10100
|
||||
large,BW3-17 [AMR-20],89168,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-17__AMR-20\2021-10-04\89168
|
||||
large,BW2-8 [AMR-25],10377,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2014-11-24\10377
|
||||
large,BW3-19 [AMR-21],13055,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2018-11-19\13055
|
||||
large,BW1-6 [AMR-19],10620,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2015-03-25\10620
|
||||
large,BW3-20 [AMR-26],75333,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2021-04-26\75333
|
||||
large,BW3-20 [AMR-26],71107,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2021-03-15\71107
|
||||
large,BW3-17 [AMR-20],157907,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-17__AMR-20\2024-07-05\157907
|
||||
large,BW2-10 [AMR-22],10925,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2015-06-15\10925
|
||||
large,BW2-13 [AMR-24],13017,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2018-10-30\13017
|
||||
large,BW2-8 [AMR-25],152547,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2024-03-18\152547
|
||||
large,BW1-6 [AMR-19],13004,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2018-10-21\13004
|
||||
large,BW1-6 [AMR-19],12934,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2018-08-20\12934
|
||||
large,BW2-13 [AMR-24],150086,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2024-02-12\150086
|
||||
large,BW3-16 [AMR-16],29192,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2020-01-06\29192
|
||||
large,BW2-13 [AMR-24],150620,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2024-02-19\150620
|
||||
large,BW2-13 [AMR-24],10137,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2014-07-07\10137
|
||||
large,BW2-13 [AMR-24],12969,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2018-09-10\12969
|
||||
large,BW3-16 [AMR-16],10129,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2014-06-30\10129
|
||||
large,BW1-4 [AMR-15],10930,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2015-06-16\10930
|
||||
large,BW1-4 [AMR-15],60897,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2020-11-30\60897
|
||||
large,BW3-16 [AMR-16],13042,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2018-11-12\13042
|
||||
large,BW1-4 [AMR-15],54939,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2020-09-21\54939
|
||||
large,BW1-6 [AMR-19],12922,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2018-08-13\12922
|
||||
large,BW1-4 [AMR-15],10905,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2015-06-06\10905
|
||||
large,BW2-13 [AMR-24],13104,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2018-12-31\13104
|
||||
large,BW2-11 [AMR-23],10177,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2014-07-24\10177
|
||||
large,BW1-6 [AMR-19],12492,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2017-10-23\12492
|
||||
large,BW2-10 [AMR-22],10647,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2015-03-30\10647
|
||||
large,BW2-8 [AMR-25],65492,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2021-01-14\65492
|
||||
large,BW3-19 [AMR-21],13259,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2019-04-08\13259
|
||||
large,BW3-16 [AMR-16],13105,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2018-12-31\13105
|
||||
large,BW1-6 [AMR-19],10002,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2014-02-10\10002
|
||||
large,BW2-13 [AMR-24],10176,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2014-07-24\10176
|
||||
large,BW1-7 [AMR-18],10312,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-7__AMR-18\2014-10-27\10312
|
||||
large,BW3-16 [AMR-16],11143,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2015-08-04\11143
|
||||
large,BW2-10 [AMR-22],10302,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2014-10-10\10302
|
||||
xlarge,BW1-6 [AMR-19],157995,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2024-07-12\157995
|
||||
xlarge,BW2-13 [AMR-24],157337,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2024-06-10\157337
|
||||
xlarge,BW3-21 [AMR-17],12676,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2018-02-26\12676
|
||||
xlarge,BW3-16 [AMR-16],10666,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2015-04-14\10666
|
||||
xlarge,BW1-6 [AMR-19],74657,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2021-04-19\74657
|
||||
xlarge,BW1-4 [AMR-15],10921,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2015-06-15\10921
|
||||
xlarge,BW3-19 [AMR-21],43555,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2020-05-18\43555
|
||||
xlarge,BW3-16 [AMR-16],11988,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2016-12-07\11988
|
||||
xlarge,BW3-21 [AMR-17],12906,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-21__AMR-17\2018-07-17\12906
|
||||
xlarge,BW1-4 [AMR-15],13280,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2019-04-22\13280
|
||||
xlarge,BW1-6 [AMR-19],111563,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-6__AMR-19\2022-07-04\111563
|
||||
xlarge,BW3-20 [AMR-26],12941,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2018-08-20\12941
|
||||
xlarge,BW2-8 [AMR-25],13126,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2019-01-14\13126
|
||||
xlarge,BW1-7 [AMR-18],112645,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-7__AMR-18\2022-07-29\112645
|
||||
xlarge,BW2-11 [AMR-23],12581,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2017-12-27\12581
|
||||
xlarge,BW2-13 [AMR-24],12034,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2017-01-03\12034
|
||||
xlarge,BW2-13 [AMR-24],12260,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2017-06-05\12260
|
||||
xlarge,BW1-7 [AMR-18],10065,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-7__AMR-18\2014-05-05\10065
|
||||
xlarge,BW2-11 [AMR-23],13229,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2019-03-25\13229
|
||||
xlarge,BW1-4 [AMR-15],10196,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2014-08-04\10196
|
||||
xlarge,BW1-7 [AMR-18],122844,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-7__AMR-18\2023-02-03\122844
|
||||
xlarge,BW2-11 [AMR-23],83433,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2021-07-19\83433
|
||||
xlarge,BW1-4 [AMR-15],43558,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2020-05-18\43558
|
||||
xlarge,BW2-11 [AMR-23],38997,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2020-04-06\38997
|
||||
xlarge,BW2-8 [AMR-25],10325,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-8__AMR-25\2014-11-03\10325
|
||||
xlarge,BW3-20 [AMR-26],10356,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2014-11-17\10356
|
||||
xlarge,BW3-20 [AMR-26],10306,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2014-10-10\10306
|
||||
xlarge,BW2-13 [AMR-24],47870,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2020-07-13\47870
|
||||
xlarge,BW2-10 [AMR-22],113242,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2022-08-15\113242
|
||||
xlarge,BW2-11 [AMR-23],11477,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-11__AMR-23\2016-02-15\11477
|
||||
xlarge,BW3-19 [AMR-21],11185,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2015-08-24\11185
|
||||
xlarge,BW3-20 [AMR-26],62336,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2020-12-14\62336
|
||||
xlarge,BW3-20 [AMR-26],10454,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2015-01-05\10454
|
||||
xlarge,BW3-16 [AMR-16],10329,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2014-11-03\10329
|
||||
xlarge,BW3-19 [AMR-21],13342,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-19__AMR-21\2019-05-28\13342
|
||||
xlarge,BW3-20 [AMR-26],148596,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2024-01-22\148596
|
||||
xlarge,BW2-13 [AMR-24],11987,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-13__AMR-24\2016-12-07\11987
|
||||
xlarge,BW1-7 [AMR-18],157743,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-7__AMR-18\2024-06-28\157743
|
||||
xlarge,BW1-7 [AMR-18],11852,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-7__AMR-18\2016-08-30\11852
|
||||
xlarge,BW2-10 [AMR-22],85215,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2021-08-16\85215
|
||||
xlarge,BW1-7 [AMR-18],8572,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-7__AMR-18\2014-01-06\8572
|
||||
xlarge,BW1-4 [AMR-15],10206,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2014-08-11\10206
|
||||
xlarge,BW3-17 [AMR-20],13191,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-17__AMR-20\2019-02-25\13191
|
||||
xlarge,BW3-20 [AMR-26],42786,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2020-05-11\42786
|
||||
xlarge,BW3-16 [AMR-16],11901,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-16__AMR-16\2016-10-03\11901
|
||||
xlarge,BW1-4 [AMR-15],10073,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2014-05-19\10073
|
||||
xlarge,BW3-20 [AMR-26],13278,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW3-20__AMR-26\2019-04-16\13278
|
||||
xlarge,BW2-10 [AMR-22],19711,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW2-10__AMR-22\2019-10-14\19711
|
||||
xlarge,BW1-4 [AMR-15],10256,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2014-09-15\10256
|
||||
xlarge,BW1-4 [AMR-15],11035,\\192.168.1.192\projects\Code\spruce_scraper\archives\BW1-4__AMR-15\2015-06-29\11035
|
||||
|
@@ -0,0 +1,15 @@
|
||||
#!/usr/bin/env bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
VENV="/tmp/spruce_venv"

if [[ ! -x "$VENV/bin/python" ]]; then
    echo "Setting up venv at $VENV..."
    python3 -m venv "$VENV"
    "$VENV/bin/python" -m ensurepip --upgrade
    "$VENV/bin/pip" install -q -r "$SCRIPT_DIR/requirements.txt"
fi

echo "Starting metadata-only scan of all machines..."
"$VENV/bin/python" "$SCRIPT_DIR/scraper.py" --metadata-only "$@"
|
||||
@@ -0,0 +1,105 @@
|
||||
#!/usr/bin/env python3
"""
Split scans.csv into per-machine metadata CSVs.

Reads the combined scans.csv produced by the scraper and writes one CSV per
machine containing only the website-sourced metadata columns (no mosaic paths,
download status, or error fields).

Usage:
    python scripts/export_machine_metadata.py
    python scripts/export_machine_metadata.py --input archives/scans.csv --output-dir archives/by_machine
"""

from __future__ import annotations

import argparse
import csv
import sys
from collections import defaultdict
from pathlib import Path

METADATA_COLUMNS = [
    "machine",
    "machine_id",
    "scan_id",
    "name",
    "scan_time",
    "start_x",
    "start_y",
    "end_x",
    "end_y",
    "dx",
    "dy",
    "nx",
    "ny",
    "total_tiles",
    "scan_lines",
    "scan_mode",
    "start_datetime",
    "end_datetime",
    "status",
    "user",
    "disk_space_mb",
]


def sanitize_machine_label(label: str) -> str:
    return label.replace("[", "").replace("]", "").replace(" ", "_").strip("_")


def parse_args() -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Split scans.csv into per-machine metadata CSVs.")
    p.add_argument(
        "--input",
        default="archives/scans.csv",
        metavar="FILE",
        help="Path to scans.csv (default: archives/scans.csv)",
    )
    p.add_argument(
        "--output-dir",
        default="archives/by_machine",
        metavar="DIR",
        help="Directory for output CSVs (default: archives/by_machine)",
    )
    return p.parse_args()


def main() -> None:
    args = parse_args()
    input_path = Path(args.input)
    output_dir = Path(args.output_dir)

    if not input_path.exists():
        sys.exit(f"Input file not found: {input_path}")

    with input_path.open(newline="") as fh:
        reader = csv.DictReader(fh)
        if reader.fieldnames is None:
            sys.exit(f"{input_path} appears to be empty.")

        missing = [c for c in METADATA_COLUMNS if c not in reader.fieldnames]
        if missing:
            sys.exit(f"Expected columns not found in {input_path}: {missing}")

        rows_by_machine: dict[str, list[dict]] = defaultdict(list)
        for row in reader:
            rows_by_machine[row["machine"]].append(row)

    output_dir.mkdir(parents=True, exist_ok=True)

    for machine_label, rows in sorted(rows_by_machine.items()):
        safe_name = sanitize_machine_label(machine_label)
        out_path = output_dir / f"{safe_name}_scans_metadata.csv"
        with out_path.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=METADATA_COLUMNS, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
        print(f" {out_path} ({len(rows)} rows)")

    total = sum(len(r) for r in rows_by_machine.values())
    print(f"\n{len(rows_by_machine)} machine(s), {total} total rows → {output_dir}/")


if __name__ == "__main__":
    main()
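
Note on naming: the per-machine CSV filenames built by `sanitize_machine_label` are not the same strings as the archive directory names seen in the CSV paths earlier in this diff (for example `BW1-4__AMR-15`). Those directory names match the regex-based `dir_name` helper defined in `scripts/scan_progress_report.py` further down. A minimal sketch, copying both helpers as they appear in this diff, shows the difference. The sample label is one of the machine labels listed in `machines.example.txt`.

```python
import re


def sanitize_machine_label(label: str) -> str:
    # Copied from export_machine_metadata.py: brackets are dropped,
    # spaces become single underscores.
    return label.replace("[", "").replace("]", "").replace(" ", "_").strip("_")


def dir_name(label: str) -> str:
    # Copied from scan_progress_report.py: every character outside
    # [A-Za-z0-9_.-] becomes an underscore, so " [" yields two underscores.
    return re.sub(r"[^\w\-.]", "_", label).strip("_")


label = "BW1-4 [AMR-15]"
print(sanitize_machine_label(label))
print(dir_name(label))
```

Anything that joins the exported CSVs back to archive directories should therefore go through the original machine label rather than compare the two sanitized forms.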
|
||||
@@ -1,6 +1,6 @@
|
||||
# All RootView minirhizotron machine labels (same set as `machine_metadata` in config.example.yaml).
|
||||
# Copy to the repo root as machines.txt, or: cp scripts/machines.example.txt machines.txt
|
||||
# sample_random_scans.sh: by default one random scan per line = mosaic + tiles; use MOSAIC_ONLY=1 for mosaics only
|
||||
# Random-sample helper `scripts/sample_random_scans.sh` lives on branch `testing/sample-runs` only.
|
||||
BW1-4 [AMR-15]
|
||||
BW1-6 [AMR-19]
|
||||
BW1-7 [AMR-18]
|
||||
|
||||
@@ -0,0 +1,381 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Report mosaic download progress from archives/scans.csv.
|
||||
|
||||
Output is formatted as Markdown. Add --by-year for a per-machine ×
|
||||
per-year breakdown table.
|
||||
|
||||
Rate/ETA require two calls at least 60 s apart. Mean mosaic size is
|
||||
sampled from up to 100 already-downloaded files and cached for 1 hour.
|
||||
|
||||
Usage:
|
||||
python scripts/mosaic_progress_report.py [--archive DIR] [--by-year]
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import csv
|
||||
import json
|
||||
import os
|
||||
import random
|
||||
import sys
|
||||
from collections import Counter
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
# Year-based viability model derived from BW1-4 observations:
|
||||
# pre-2019 → kept on server long-term (~100 %)
|
||||
# 2019-2022 → purged (~ 0 %)
|
||||
# 2023+ → recent, mostly available (~82 %)
|
||||
_R_PRE19 = 1.00
|
||||
_R_PURGED = 0.00
|
||||
_R_RECENT = 0.82
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _parse_dt(s: str) -> datetime | None:
|
||||
try:
|
||||
return datetime.fromisoformat(s.replace("Z", "+00:00"))
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def _fmt_duration(seconds: float) -> str:
|
||||
if seconds < 0:
|
||||
return "?"
|
||||
h = int(seconds // 3600)
|
||||
m = int((seconds % 3600) // 60)
|
||||
if h >= 48:
|
||||
return f"{h // 24}d {h % 24}h"
|
||||
if h > 0:
|
||||
return f"{h}h {m:02d}m"
|
||||
return f"{m}m {int(seconds % 60):02d}s"
|
||||
|
||||
|
||||
def _fmt_size(b: float) -> str:
|
||||
if b >= 1e12:
|
||||
return f"{b/1e12:.2f} TB"
|
||||
if b >= 1e9:
|
||||
return f"{b/1e9:.2f} GB"
|
||||
if b >= 1e6:
|
||||
return f"{b/1e6:.1f} MB"
|
||||
return f"{b/1e3:.0f} KB"
|
||||
|
||||
|
||||
def _md_table(headers: list[str], rows: list[list[str]], *, align: list[str] | None = None) -> str:
|
||||
"""Render a Markdown table. align values: 'l', 'r', 'c' (default 'l')."""
|
||||
if align is None:
|
||||
align = ["l"] * len(headers)
|
||||
sep_map = {"l": ":---", "r": "---:", "c": ":---:"}
|
||||
|
||||
def row_str(cells: list[str]) -> str:
|
||||
return "| " + " | ".join(cells) + " |"
|
||||
|
||||
lines = [
|
||||
row_str(headers),
|
||||
row_str([sep_map.get(a, ":---") for a in align]),
|
||||
]
|
||||
for r in rows:
|
||||
lines.append(row_str(r))
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def _sample_mean_bytes(rows: list[dict], cache: dict, max_sample: int = 100) -> float | None:
|
||||
cached_mean = cache.get("mean_bytes")
|
||||
cached_n = cache.get("sample_n", 0)
|
||||
cached_ts = _parse_dt(cache.get("size_ts", ""))
|
||||
now = datetime.now(timezone.utc)
|
||||
if (
|
||||
cached_mean and cached_ts
|
||||
and (now - cached_ts).total_seconds() < 3600
|
||||
and cached_n >= min(max_sample, len(rows))
|
||||
):
|
||||
return float(cached_mean)
|
||||
|
||||
sample = random.sample(rows, min(max_sample, len(rows)))
|
||||
sizes = []
|
||||
for row in sample:
|
||||
p = row.get("mosaic_local_path", "")
|
||||
if p:
|
||||
try:
|
||||
sz = os.path.getsize(p)
|
||||
if sz > 0:
|
||||
sizes.append(sz)
|
||||
except OSError:
|
||||
pass
|
||||
if not sizes:
|
||||
return None
|
||||
mean = sum(sizes) / len(sizes)
|
||||
cache["mean_bytes"] = mean
|
||||
cache["sample_n"] = len(sizes)
|
||||
cache["size_ts"] = now.isoformat()
|
||||
return mean
|
||||
|
||||
|
||||
def _expected_remaining(pending_rows: list[dict]) -> float:
    count = 0.0
    for row in pending_rows:
        # csv.DictReader yields None for missing trailing fields, so guard
        # before slicing (same pattern as the year lookup in main()).
        # Rows with no scan_time compare below "2019" and count as pre-2019.
        yr = (row.get("scan_time") or "")[:4]
        if yr < "2019":
            count += _R_PRE19
        elif yr <= "2022":
            count += _R_PURGED
        else:
            count += _R_RECENT
    return count
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Main
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def main() -> None:
|
||||
parser = argparse.ArgumentParser(description=__doc__)
|
||||
parser.add_argument("--archive", default="archives")
|
||||
parser.add_argument(
|
||||
"--by-year", action="store_true",
|
||||
help="Add a per-machine × per-year done/failed breakdown table",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
archive = Path(args.archive)
|
||||
scans_csv = archive / "scans.csv"
|
||||
progress_json = archive / ".progress.json"
|
||||
rate_cache_path = archive / ".mosaic_rate_cache.json"
|
||||
|
||||
if not scans_csv.exists():
|
||||
sys.exit(f"scans.csv not found: {scans_csv}")
|
||||
|
||||
# --- Load & deduplicate (last row per machine+scan_id) ---
|
||||
latest: dict[tuple[str, str], dict] = {}
|
||||
with open(scans_csv, newline="", encoding="utf-8") as f:
|
||||
for row in csv.DictReader(f):
|
||||
key = (row.get("machine", ""), row.get("scan_id", ""))
|
||||
latest[key] = row
|
||||
|
||||
by_machine: dict[str, Counter] = {}
|
||||
# machine -> year -> Counter(status -> count)
|
||||
by_machine_year: dict[str, dict[str, Counter]] = {}
|
||||
total_counts: Counter = Counter()
|
||||
downloaded_rows: list[dict] = []
|
||||
pending_rows: list[dict] = []
|
||||
|
||||
for (_m, _sid), row in latest.items():
|
||||
status = row.get("mosaic_download_status", "")
|
||||
m = row.get("machine", "")
|
||||
yr = (row.get("scan_time") or "")[:4] or "????"
|
||||
by_machine.setdefault(m, Counter())[status] += 1
|
||||
by_machine_year.setdefault(m, {}).setdefault(yr, Counter())[status] += 1
|
||||
total_counts[status] += 1
|
||||
if status == "downloaded":
|
||||
downloaded_rows.append(row)
|
||||
elif status == "skipped_metadata_only":
|
||||
pending_rows.append(row)
|
||||
|
||||
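total = sum(total_counts.values())
if not total:
    # Avoid ZeroDivisionError in the percentage columns below.
    sys.exit(f"No scan rows found in {scans_csv}")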
|
||||
downloaded = total_counts["downloaded"]
|
||||
failed = total_counts["failed"]
|
||||
zero_skipped = total_counts["skipped_zero_disk_space"]
|
||||
pending = total_counts["skipped_metadata_only"]
|
||||
processed = downloaded + failed + zero_skipped
|
||||
attempted = downloaded + failed
|
||||
now = datetime.now(timezone.utc)
|
||||
|
||||
# --- Elapsed ---
|
||||
elapsed_str = ""
|
||||
if progress_json.exists():
|
||||
try:
|
||||
data = json.loads(progress_json.read_text())
|
||||
started_at = _parse_dt(data.get("started_at", ""))
|
||||
if started_at:
|
||||
elapsed_str = _fmt_duration((now - started_at).total_seconds())
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# --- Rate cache ---
|
||||
cache: dict = {}
|
||||
if rate_cache_path.exists():
|
||||
try:
|
||||
cache = json.loads(rate_cache_path.read_text())
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Rolling rate: keep up to 60 snapshots; compute rate from the oldest
|
||||
# snapshot within the last 30 minutes for a smoothed estimate.
|
||||
snapshots: list[dict] = cache.get("snapshots", [])
|
||||
# Prune snapshots older than 30 minutes, but keep at least one
|
||||
cutoff = now.timestamp() - 1800
|
||||
recent = [s for s in snapshots if s.get("ts", 0) >= cutoff]
|
||||
if not recent and snapshots:
|
||||
recent = [snapshots[-1]] # always keep one for continuity
|
||||
|
||||
rate_per_sec: float | None = None
|
||||
rate_window_str = ""
|
||||
if recent:
|
||||
oldest = recent[0]
|
||||
dt = now.timestamp() - oldest["ts"]
|
||||
dp = processed - oldest["proc"]
|
||||
if dt >= 60 and dp > 0:
|
||||
rate_per_sec = dp / dt
|
||||
window_min = dt / 60
|
||||
rate_window_str = f"{window_min:.0f}-min avg"
|
||||
|
||||
# --- Disk space ---
|
||||
mean_bytes: float | None = None
|
||||
size_note = ""
|
||||
if downloaded_rows:
|
||||
mean_bytes = _sample_mean_bytes(downloaded_rows, cache)
|
||||
if mean_bytes and cache.get("sample_n"):
|
||||
size_note = f"mean {_fmt_size(mean_bytes)} × {cache['sample_n']} sampled files"
|
||||
|
||||
dl_bytes: float | None = None
|
||||
rem_bytes: float | None = None
|
||||
if mean_bytes:
|
||||
dl_bytes = downloaded * mean_bytes
|
||||
rem_bytes = _expected_remaining(pending_rows) * mean_bytes
|
||||
|
||||
# Update cache: append new snapshot, keep last 60
|
||||
recent.append({"ts": now.timestamp(), "proc": processed})
|
||||
cache["snapshots"] = recent[-60:]
|
||||
# Keep legacy keys for backwards compat
|
||||
cache["timestamp"] = now.isoformat()
|
||||
cache["processed"] = processed
|
||||
try:
|
||||
rate_cache_path.write_text(json.dumps(cache))
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
rate_str = eta_str = ""
|
||||
if rate_per_sec and rate_per_sec > 0:
|
||||
rate_str = f"{rate_per_sec * 3600:,.0f} scans/hr ({rate_window_str})"
|
||||
eta_str = _fmt_duration(pending / rate_per_sec)
|
||||
|
||||
# -----------------------------------------------------------------------
|
||||
# Output — Markdown
|
||||
# -----------------------------------------------------------------------
|
||||
|
||||
ts = datetime.now().strftime("%Y-%m-%d %H:%M")
|
||||
print(f"# Mosaic download progress — {ts}\n")
|
||||
print(f"**Archive:** `{archive.resolve()}` ")
|
||||
meta_parts = []
|
||||
if elapsed_str:
|
||||
meta_parts.append(f"**Elapsed:** {elapsed_str}")
|
||||
if rate_str:
|
||||
meta_parts.append(f"**Rate:** {rate_str}")
|
||||
if eta_str:
|
||||
meta_parts.append(f"**ETA:** {eta_str}")
|
||||
if meta_parts:
|
||||
print(" ".join(meta_parts) + " ")
|
||||
print()
|
||||
|
||||
# Summary table
|
||||
summary_rows = [
|
||||
["Downloaded", f"{downloaded:,}", f"{100*downloaded/total:.1f}%"],
|
||||
["Failed", f"{failed:,}", f"{100*failed/total:.1f}%"],
|
||||
["Skipped (disk=0)",f"{zero_skipped:,}", f"{100*zero_skipped/total:.1f}%"],
|
||||
["Pending", f"{pending:,}", f"{100*pending/total:.1f}%"],
|
||||
["**Total**", f"**{total:,}**", ""],
|
||||
]
|
||||
if attempted:
|
||||
summary_rows.append(["**Success rate**", f"**{100*downloaded/attempted:.1f}%**", "*(of attempted)*"])
|
||||
print(_md_table(["Metric", "Count", ""], summary_rows, align=["l", "r", "l"]))
|
||||
print()
|
||||
|
||||
# Disk space
|
||||
if dl_bytes is not None and rem_bytes is not None:
|
||||
total_bytes = dl_bytes + rem_bytes
|
||||
print("### Disk space\n")
|
||||
print(f"_{size_note}_\n")
|
||||
ds_rows = [
|
||||
["Downloaded so far", _fmt_size(dl_bytes), ""],
|
||||
["Estimated remaining", _fmt_size(rem_bytes), "*(model-based)*"],
|
||||
["**Grand total**", f"**{_fmt_size(total_bytes)}**", ""],
|
||||
]
|
||||
print(_md_table(["", "Size", ""], ds_rows, align=["l", "r", "l"]))
|
||||
print()
|
||||
|
||||
# Per-machine breakdown
|
||||
print("### Per-machine breakdown\n")
|
||||
machines = sorted(by_machine)
|
||||
mc_rows = []
|
||||
for m in machines:
|
||||
mc = by_machine[m]
|
||||
mt = sum(mc.values())
|
||||
mc_rows.append([
|
||||
m,
|
||||
f"{mc['downloaded']:,}",
|
||||
f"{mc['failed']:,}",
|
||||
f"{mc['skipped_zero_disk_space']:,}",
|
||||
f"{mc['skipped_metadata_only']:,}",
|
||||
f"{mt:,}",
|
||||
])
|
||||
mc_rows.append([
|
||||
"**TOTAL**",
|
||||
f"**{downloaded:,}**",
|
||||
f"**{failed:,}**",
|
||||
f"**{zero_skipped:,}**",
|
||||
f"**{pending:,}**",
|
||||
f"**{total:,}**",
|
||||
])
|
||||
print(_md_table(
|
||||
["Machine", "Done", "Failed", "Skip0", "Pending", "Total"],
|
||||
mc_rows,
|
||||
align=["l", "r", "r", "r", "r", "r"],
|
||||
))
|
||||
|
||||
# -----------------------------------------------------------------------
|
||||
# --by-year table
|
||||
# -----------------------------------------------------------------------
|
||||
if args.by_year:
|
||||
print()
|
||||
print("### Downloads by machine and year\n")
|
||||
print("*Format: done / failed*\n")
|
||||
|
||||
# Only include years that have at least one downloaded or failed scan
|
||||
all_years = sorted(
|
||||
yr for yr in set(
|
||||
yr for m_data in by_machine_year.values() for yr in m_data
|
||||
)
|
||||
if any(
|
||||
by_machine_year[m].get(yr, Counter()).get("downloaded", 0)
|
||||
+ by_machine_year[m].get(yr, Counter()).get("failed", 0) > 0
|
||||
for m in by_machine_year
|
||||
)
|
||||
)
|
||||
|
||||
# Totals per year
|
||||
yr_totals: dict[str, Counter] = {}
|
||||
for yr in all_years:
|
||||
yr_totals[yr] = Counter()
|
||||
for m in machines:
|
||||
yr_totals[yr] += by_machine_year.get(m, {}).get(yr, Counter())
|
||||
|
||||
year_rows = []
|
||||
for m in machines:
|
||||
row_cells = [m]
|
||||
for yr in all_years:
|
||||
c = by_machine_year.get(m, {}).get(yr, Counter())
|
||||
d = c.get("downloaded", 0)
|
||||
f = c.get("failed", 0)
|
||||
row_cells.append(f"{d:,} / {f:,}" if (d or f) else "—")
|
||||
year_rows.append(row_cells)
|
||||
|
||||
# Totals row
|
||||
total_cells = ["**TOTAL**"]
|
||||
for yr in all_years:
|
||||
d = yr_totals[yr].get("downloaded", 0)
|
||||
f = yr_totals[yr].get("failed", 0)
|
||||
total_cells.append(f"**{d:,} / {f:,}**" if (d or f) else "—")
|
||||
year_rows.append(total_cells)
|
||||
|
||||
print(_md_table(
|
||||
["Machine"] + all_years,
|
||||
year_rows,
|
||||
align=["l"] + ["r"] * len(all_years),
|
||||
))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
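
The two estimates in this report (expected remaining mosaics and the rolling download rate) can be checked by hand. The sketch below restates that logic as standalone functions, assuming the same constants and snapshot layout (`ts`, `proc`) as the script. The function names, the `window` and `min_dt` parameters, and all input values are hypothetical and chosen only for illustration.

```python
# Retention rates copied from the constants in mosaic_progress_report.py.
R_PRE19, R_PURGED, R_RECENT = 1.00, 0.00, 0.82


def expected_remaining(years: list[str]) -> float:
    """Expected number of still-downloadable mosaics among pending scans."""
    total = 0.0
    for yr in years:
        if yr < "2019":
            total += R_PRE19      # kept on the server long-term
        elif yr <= "2022":
            total += R_PURGED     # purged
        else:
            total += R_RECENT     # recent, mostly available
    return total


def rolling_rate(snapshots, now_ts, processed, window=1800, min_dt=60):
    """Scans per second since the oldest snapshot in the window, else None."""
    recent = [s for s in snapshots if s["ts"] >= now_ts - window]
    if not recent and snapshots:
        recent = [snapshots[-1]]  # keep one snapshot for continuity
    if not recent:
        return None
    dt = now_ts - recent[0]["ts"]
    dp = processed - recent[0]["proc"]
    return dp / dt if dt >= min_dt and dp > 0 else None


# Hypothetical inputs, not taken from a real run.
pending_years = ["2015", "2018", "2020", "2021", "2022", "2023", "2024"]
print(f"expected remaining: {expected_remaining(pending_years):.2f}")

snaps = [{"ts": 0.0, "proc": 100}, {"ts": 600.0, "proc": 160}]
rate = rolling_rate(snaps, now_ts=1200.0, processed=220)
print(f"rate: {rate * 3600:.0f} scans/hr")
```

In the script as written, the ETA divides the raw pending count by this rate, while the disk-space estimate uses the retention-adjusted count. The two figures therefore rest on different assumptions about how many pending scans are still downloadable.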
|
||||
@@ -0,0 +1,218 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Report metadata-scan progress and projected completion times for all machines.
|
||||
|
||||
Usage:
|
||||
python scripts/scan_progress_report.py [--archive ARCHIVE_DIR] [--recent N] [--mermaid] [--rate-chart]
|
||||
|
||||
Options:
|
||||
--archive DIR Path to archives directory (default: archives)
|
||||
--recent N Number of recent files used to compute current rate (default: 500)
|
||||
--mermaid Also print a Mermaid Gantt chart
|
||||
--rate-chart Also print a Mermaid XY chart of s/scan rate by hour
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import glob
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from datetime import datetime, timedelta
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
# Canonical machine order and total scan counts (from README inventory, April 2026)
|
||||
MACHINES = [
|
||||
("BW1-4 [AMR-15]", 6121),
|
||||
("BW1-6 [AMR-19]", 18198),
|
||||
("BW1-7 [AMR-18]", 430),
|
||||
("BW2-8 [AMR-25]", 8191),
|
||||
("BW2-10 [AMR-22]", 16537),
|
||||
("BW2-11 [AMR-23]", 26763),
|
||||
("BW2-13 [AMR-24]", 13537),
|
||||
("BW3-16 [AMR-16]", 7325),
|
||||
("BW3-17 [AMR-20]", 471),
|
||||
("BW3-19 [AMR-21]", 15186),
|
||||
("BW3-20 [AMR-26]", 23052),
|
||||
("BW3-21 [AMR-17]", 10115),
|
||||
]
|
||||
TOTAL_SCANS = sum(t for _, t in MACHINES)
|
||||
|
||||
|
||||
def dir_name(label: str) -> str:
|
||||
return re.sub(r"[^\w\-.]", "_", label).strip("_")
|
||||
|
||||
|
||||
def get_timestamps(machine_dir: Path) -> list[float]:
|
||||
files = glob.glob(str(machine_dir / "**" / "metadata.json"), recursive=True)
|
||||
return sorted(os.path.getmtime(f) for f in files)
|
||||
|
||||
|
||||
def print_rate_chart(all_timestamps: list[float]) -> None:
|
||||
"""Print a Mermaid xychart-beta of average s/scan per hour."""
|
||||
# One avg rate per hour
|
||||
bins: dict[str, list[float]] = defaultdict(list)
|
||||
start_hour: datetime | None = None
|
||||
for i in range(1, len(all_timestamps)):
|
||||
gap = all_timestamps[i] - all_timestamps[i - 1]
|
||||
if gap < 300: # ignore inter-machine gaps
|
||||
dt = datetime.fromtimestamp(all_timestamps[i])
|
||||
hour_key = dt.strftime("%m-%d %Hh")
|
||||
bins[hour_key].append(gap)
|
||||
if start_hour is None:
|
||||
start_hour = dt.replace(minute=0, second=0, microsecond=0)
|
||||
|
||||
# Drop the last (partial) hour
|
||||
hours = sorted(bins.keys())
|
||||
if hours:
|
||||
hours = hours[:-1]
|
||||
|
||||
if not hours or start_hour is None:
|
||||
print("(not enough data for rate chart)")
|
||||
return
|
||||
|
||||
values = [f"{sum(bins[h])/len(bins[h]):.2f}" for h in hours]
|
||||
y_max = max(float(v) for v in values)
|
||||
y_ceil = int(y_max) + 3
|
||||
n = len(hours)
|
||||
|
||||
# Numeric x-axis: Mermaid auto-picks readable tick positions
|
||||
start_label = start_hour.strftime("%b %d %H:%M")
|
||||
print("```mermaid")
|
||||
print("xychart-beta")
|
||||
print(f' title "Metadata scan rate (s/scan) — hourly, starting {start_label}"')
|
||||
print(f' x-axis "Hours elapsed" 0 --> {n}')
|
||||
print(f' y-axis "s / scan" 0 --> {y_ceil}')
|
||||
print(f" line [{', '.join(values)}]")
|
||||
print("```")
|
||||
|
||||
|
||||
def fmt_dt(dt: datetime) -> str:
|
||||
return dt.strftime("%a %b %d %H:%M")
|
||||
|
||||
|
||||
def main() -> None:
|
||||
parser = argparse.ArgumentParser(description=__doc__)
|
||||
parser.add_argument("--archive", default="archives", help="Archives directory")
|
||||
parser.add_argument("--recent", type=int, default=500,
|
||||
help="Files used to compute recent rate (default: 500)")
|
||||
parser.add_argument("--mermaid", action="store_true", help="Print Mermaid Gantt chart")
|
||||
parser.add_argument("--rate-chart", action="store_true", help="Print Mermaid XY rate-over-time chart")
|
||||
args = parser.parse_args()
|
||||
|
||||
archive = Path(args.archive)
|
||||
if not archive.is_dir():
|
||||
sys.exit(f"Archive directory not found: {archive}")
|
||||
|
||||
now = datetime.now()
|
||||
|
||||
# --- Gather per-machine data ---
|
||||
machine_data = [] # (label, total, done, first_ts, last_ts, times)
|
||||
all_timestamps: list[float] = []
|
||||
|
||||
for label, total in MACHINES:
|
||||
mdir = archive / dir_name(label)
|
||||
if mdir.is_dir():
|
||||
times = get_timestamps(mdir)
|
||||
else:
|
||||
times = []
|
||||
done = len(times)
|
||||
first_ts = datetime.fromtimestamp(times[0]) if times else None
|
||||
last_ts = datetime.fromtimestamp(times[-1]) if times else None
|
||||
machine_data.append((label, total, done, first_ts, last_ts, times))
|
||||
all_timestamps.extend(times)
|
||||
|
||||
all_timestamps.sort()
|
||||
total_done = sum(d for _, _, d, *_ in machine_data)
|
||||
|
||||
# --- Rate calculation ---
|
||||
recent_times = all_timestamps[-args.recent:] if len(all_timestamps) >= 2 else all_timestamps
|
||||
if len(recent_times) >= 2:
|
||||
recent_rate = (recent_times[-1] - recent_times[0]) / (len(recent_times) - 1) # N timestamps span N-1 intervals
|
||||
else:
|
||||
recent_rate = None
|
||||
|
||||
if len(all_timestamps) >= 2:
|
||||
overall_rate = (all_timestamps[-1] - all_timestamps[0]) / len(all_timestamps)
|
||||
else:
|
||||
overall_rate = None
|
||||
|
||||
rate = recent_rate or overall_rate or 5.0 # fallback
|
||||
|
||||
    # --- Print timetable ---
    print(f"Metadata scan progress — {now.strftime('%Y-%m-%d %H:%M')}")
    print(f"Overall rate : {overall_rate:.2f} s/scan" if overall_rate else "Overall rate : n/a")
    print(f"Recent rate : {recent_rate:.2f} s/scan (last {args.recent} files)" if recent_rate else "Recent rate : n/a")
    print(f"Rate used : {rate:.2f} s/scan")
    print(f"Done : {total_done:,} / {TOTAL_SCANS:,} ({100*total_done/TOTAL_SCANS:.1f}%)")
    print()
    print(f"{'Machine':<20} {'Total':>7} {'Done':>7} {'Pct':>6} {'Completion'}")
    print("-" * 68)

    cursor = now
    gantt_rows: list[tuple[str, datetime, datetime, str]] = []  # label, start, end, status

    for label, total, done, first_ts, last_ts, times in machine_data:
        pct = 100 * done / total if total else 0

        complete = done >= total or (done > 0 and done / total >= 0.999)

        if done == 0:
            # Not started yet
            start = cursor
            finish = cursor + timedelta(seconds=total * rate)
            status = "pending"
            print(f"{label:<20} {total:>7,} {'—':>7} {'—':>5} {fmt_dt(finish)}")
        elif complete:
            # Complete — use actual timestamps
            start = first_ts
            finish = last_ts
            status = "done"
            print(f"{label:<20} {total:>7,} {done:>7,} {pct:>5.1f}% complete")
        else:
            # In progress
            remaining = total - done
            start = first_ts
            finish = cursor + timedelta(seconds=remaining * rate)
            status = "active"
            print(f"{label:<20} {total:>7,} {done:>7,} {pct:>5.1f}% {fmt_dt(finish)} ← in progress")

        gantt_rows.append((label, start or cursor, finish, status))
        if status != "done":
            cursor = finish

    print("-" * 68)
    print(f"{'All done':<20} {TOTAL_SCANS:>7,} {total_done:>7,} {100*total_done/TOTAL_SCANS:>5.1f}% {fmt_dt(cursor)}")

    # --- Mermaid rate chart ---
    if args.rate_chart:
        print()
        print_rate_chart(all_timestamps)

    # --- Mermaid Gantt ---
    if args.mermaid:
        print()
        print("```mermaid")
        print("gantt")
        print(" title Metadata scan progress — all 12 machines")
        print(" dateFormat YYYY-MM-DD HH:mm")
        print(" axisFormat %b %d")
        print()

        section = None
        for label, start, finish, status in gantt_rows:
            prefix = label.split()[0][:3]  # BW1, BW2, BW3
            if prefix != section:
                section = prefix
                print(f" section {section}")
            safe = label.replace("[", "").replace("]", "").replace(" ", "-")
            tag = "done, " if status == "done" else ("active, " if status == "active" else "")
            s = start.strftime("%Y-%m-%d %H:%M")
            e = finish.strftime("%Y-%m-%d %H:%M")
            print(f" {label} :{tag}{safe}, {s}, {e}")
        print("```")


if __name__ == "__main__":
    main()
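The rate and ETA arithmetic in `main()` can be tried in isolation. The following is a minimal sketch, not the script itself: `seconds_per_scan` and `project_finish` are hypothetical helper names, the timestamps are made-up epoch seconds, and the rate is taken as elapsed time divided by the number of intervals between files.

```python
from datetime import datetime, timedelta
from typing import Optional


def seconds_per_scan(timestamps: list[float], recent: int = 500) -> Optional[float]:
    """Average seconds between consecutive files over the last `recent` files."""
    window = sorted(timestamps)[-recent:]
    if len(window) < 2:
        return None  # not enough data to measure a rate
    # N timestamps bound N - 1 intervals.
    return (window[-1] - window[0]) / (len(window) - 1)


def project_finish(start: datetime, remaining: int, rate: float) -> datetime:
    """Projected completion time if each of `remaining` scans takes `rate` seconds."""
    return start + timedelta(seconds=remaining * rate)


# Hypothetical data: five files written 4 seconds apart.
stamps = [100.0, 104.0, 108.0, 112.0, 116.0]
rate = seconds_per_scan(stamps)
eta = project_finish(datetime(2026, 4, 26, 12, 0), remaining=900, rate=rate)
print(rate, eta)
```

Chaining `project_finish` from one machine's finish time to the next mirrors how the `cursor` variable advances through pending machines in the timetable loop.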
@@ -122,6 +122,17 @@ def parse_args() -> argparse.Namespace:
            "and report how many were re-queued. Run before resuming after a crash."
        ),
    )
    p.add_argument(
        "--max-tiles",
        type=int,
        default=None,
        metavar="N",
        help=(
            "Download at most N tiles per scan (default: all). "
            "Pass 1 to probe a single tile — useful for quickly checking "
            "whether a scan has real imagery or only placeholder responses."
        ),
    )
    p.add_argument(
        "--verbose",
        "-v",
@@ -145,6 +156,9 @@ def main() -> None:
    if args.list_scans_first_page_only and not args.list_scans:
        sys.exit("--list-scans-first-page-only requires --list-scans")

    if args.scan_id is not None and args.scan_id <= 0:
        sys.exit("--scan-id must be a positive integer")

    # --list-machines doesn't need credentials
    if args.list_machines:
        base_url = "http://205.149.147.131:8010/"
@@ -270,6 +284,7 @@ def main() -> None:
                mosaic_only=args.mosaic_only,
                metadata_only=args.metadata_only,
                scan_id_filter=args.scan_id,
                max_tiles=args.max_tiles,
            )
            totals.merge(stats)
    finally:
@@ -331,6 +346,22 @@ def _print_summary(
    )
    if not metadata_only and not mosaic_only:
        log.info(row("Tiles downloaded:", str(totals.tiles_downloaded)))
        if totals.scans_probe_skipped:
            log.info(
                row(
                    "Probe-skipped scans:",
                    str(totals.scans_probe_skipped),
                    "probe tile was 404 or placeholder; tile pool skipped",
                )
            )
    if not metadata_only and totals.scans_disk_space_skipped:
        log.info(
            row(
                "Zero-disk-space skipped:",
                str(totals.scans_disk_space_skipped),
                "disk_space_mb=0; mosaic and tiles not attempted",
            )
        )
    if not dry_run and not metadata_only:
        log.info(
            row(

@@ -9,7 +9,19 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Any

from spruce.download_result import error_code_str
from spruce.download_result import PERMANENT_MISSING, UNKNOWN, error_code_str

# RootView returns ~43-byte 1×1 JPEG placeholders for empty cells; stay well
# below smallest observed real tile (~7 KiB in production samples).
PLACEHOLDER_MAX_BYTES = 200


def _is_placeholder_tile(path: Path) -> bool:
    """Return True if a downloaded tile looks like a 1×1 server placeholder."""
    try:
        return path.is_file() and path.stat().st_size <= PLACEHOLDER_MAX_BYTES
    except OSError:
        return False


@dataclass
@@ -21,8 +33,10 @@ class RunStats:
    scans_failed: int = 0  # metadata fetch error or missing grid params
    metadata_written: int = 0  # new metadata.json files created
    mosaics_downloaded: int = 0
    mosaics_failed: int = 0  # mosaic URL attempted but 0 bytes / HTTP error
    mosaics_failed: int = 0  # mosaic URL attempted but 0 bytes / HTTP error
    tiles_downloaded: int = 0
    scans_probe_skipped: int = 0  # probe tile was 404 or placeholder; full tile pool skipped
    scans_disk_space_skipped: int = 0  # disk_space_mb == 0; no mosaic or tiles attempted

    def merge(self, other: "RunStats") -> None:
        self.scans_fetched += other.scans_fetched
@@ -32,6 +46,8 @@ class RunStats:
        self.mosaics_downloaded += other.mosaics_downloaded
        self.mosaics_failed += other.mosaics_failed
        self.tiles_downloaded += other.tiles_downloaded
        self.scans_probe_skipped += other.scans_probe_skipped
        self.scans_disk_space_skipped += other.scans_disk_space_skipped

from tqdm import tqdm

@@ -224,6 +240,8 @@ def process_scan(
    dry_run: bool,
    mosaic_only: bool,
    metadata_only: bool = False,
    max_tiles: int | None = None,
    scans_csv_existing_ids: set[int] | None = None,
) -> RunStats:
    """
    Process one scan: fetch metadata, download mosaic and (optionally) tiles.
@@ -247,7 +265,8 @@ def process_scan(
        candidate = machine_root / scan_date_hint / str(scan_id) / "metadata.json"
        if candidate.exists():
            found_meta = candidate
    if found_meta is None:
        # Date hint is reliable — don't glob if candidate wasn't found.
    else:
        matches = list(machine_root.glob(f"*/{scan_id}/metadata.json"))
        if matches:
            found_meta = matches[0]
@@ -295,6 +314,23 @@ def process_scan(
    ):
        scan_meta.setdefault(k, scan.get(k, ""))

    # disk_space_mb == 0 is a reliable signal that the scan has no imagery.
    # A 300-scan investigation (50 per bucket) found 0% viability in this bucket.
    # Skip the mosaic and tile downloads entirely; write a record so scans.csv
    # stays complete.
    disk_space_skip = False
    if not metadata_only:
        try:
            if float(scan_meta.get("disk_space_mb", "nan")) == 0.0:  # default arg, not `or`, so a numeric 0 still counts
                disk_space_skip = True
                log.info(
                    "[%s] Scan %d: disk_space_mb=0 — skipping mosaic and tiles.",
                    machine["label"],
                    scan_id,
                )
        except (ValueError, TypeError):
            pass

    # Save per-scan metadata.json
    scan_date = _extract_date(scan_meta.get("scan_time", ""))
    scan_dir = output_dir / machine_dir_name(machine) / scan_date / str(scan_id)
@@ -307,11 +343,11 @@ def process_scan(
        )
        stats.metadata_written += 1

    # Mosaic (skipped entirely in metadata-only mode)
    # Mosaic (skipped entirely in metadata-only or disk_space_skip mode)
    mosaic_path = mosaic_dest(output_dir, machine, scan_meta, scan_id)
    mosaic_url = sess.mosaic_url(scan_id)
    mosaic_already_done = progress.is_done(mosaic_url)
    if metadata_only:
    if metadata_only or disk_space_skip:
        mosaic_attempt: MosaicAttempt | None = None
    else:
        mosaic_attempt = _download_mosaic(
@@ -332,6 +368,9 @@ def process_scan(

    if metadata_only:
        mds, mer, mco, mcl = "skipped_metadata_only", "", "", ""
    elif disk_space_skip:
        mds, mer, mco, mcl = "skipped_zero_disk_space", "", "", ""
        stats.scans_disk_space_skipped += 1
    elif mosaic_attempt is not None:
        mds = mosaic_attempt.csv_status
        mer = mosaic_attempt.error
@@ -341,9 +380,16 @@ def process_scan(
        mds, mer, mco, mcl = "", "", "", ""

    # Write scan-level CSV row only if this scan hasn't been recorded before.
    if mosaic_already_done and not metadata_only:
    # Skip if: (1) mosaic URL already in .progress.json, or (2) scan already
    # has a non-pending row in scans.csv from a prior run.
    already_recorded = (mosaic_already_done and not metadata_only) or (
        not metadata_only
        and scans_csv_existing_ids is not None
        and scan_id in scans_csv_existing_ids
    )
    if already_recorded:
        log.debug(
            "[%s] Scan %d: already in scans.csv (mosaic was previously downloaded), skipping CSV row.",
            "[%s] Scan %d: already in scans.csv, skipping CSV row.",
            machine["label"],
            scan_id,
        )
@@ -381,11 +427,46 @@ def process_scan(
            }
        )

    if mosaic_only or metadata_only:
    if mosaic_only or metadata_only or disk_space_skip:
        return stats

    # Tiles
    tiles = sess.enumerate_tiles(scan_meta)
    if max_tiles is not None:
        tiles = tiles[:max_tiles]

    # Tile probe: always download one tile before launching the full thread
    # pool. Two failure modes justify this:
    #   1. Mosaic failed (404/410 or empty body) — scan was set up but never
    #      run; tile grid is all placeholders or 404s.
    #   2. Mosaic succeeded but tiles are server-side placeholders (1x1 JPEG,
    #      ~43 B) — mosaic was generated from empty data; downloading the full
    #      grid would fire thousands of guaranteed-placeholder requests.
    if (
        not dry_run
        and tiles
        and not progress.is_done(tiles[0]["url"])
    ):
        probe_tile = tiles[0]
        probe_dest = tile_dest(output_dir, machine, scan_meta, probe_tile)
        probe_res = sess.download_file(probe_tile["url"], probe_dest)
        if not probe_res.ok or _is_placeholder_tile(probe_dest):
            probe_dest.unlink(missing_ok=True)
            detail = (
                "is placeholder"
                if probe_res.ok
                else f"failed ({probe_res.error_class or probe_res.error or 'unknown'})"
            )
            log.info(
                "[%s] Scan %d: probe tile %s — empty/placeholder scan, skipping %d tile(s).",
                machine["label"],
                scan_id,
                detail,
                len(tiles),
            )
            stats.scans_probe_skipped += 1
            return stats

    stats.tiles_downloaded += _download_tiles_for_scan(
        sess,
        tiles,
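The probe-before-pool decision in this hunk reduces to two checks: did the probe download succeed, and is the file too small to be real imagery. The sketch below illustrates that decision on temporary files; `looks_like_placeholder` and `should_skip_scan` are hypothetical names, and the byte strings only stand in for the ~43-byte placeholder and ~7 KiB tile sizes mentioned in the comments.

```python
import tempfile
from pathlib import Path

PLACEHOLDER_MAX_BYTES = 200  # same threshold value as in the diff


def looks_like_placeholder(path: Path, max_bytes: int = PLACEHOLDER_MAX_BYTES) -> bool:
    """True if the file exists and is no larger than the placeholder threshold."""
    try:
        return path.is_file() and path.stat().st_size <= max_bytes
    except OSError:
        return False


def should_skip_scan(probe_ok: bool, probe_path: Path) -> bool:
    """Skip the full tile grid when the probe failed or returned a placeholder."""
    return (not probe_ok) or looks_like_placeholder(probe_path)


with tempfile.TemporaryDirectory() as tmp:
    tiny = Path(tmp) / "tiny.jpg"
    tiny.write_bytes(b"\xff" * 43)        # stands in for a ~43-byte placeholder
    real = Path(tmp) / "real.jpg"
    real.write_bytes(b"\xff" * 7 * 1024)  # stands in for a ~7 KiB real tile
    results = (
        should_skip_scan(True, tiny),   # placeholder response
        should_skip_scan(True, real),   # real imagery
        should_skip_scan(False, real),  # probe download failed
    )
print(results)
```

A size threshold is a heuristic: it works here only because the gap between placeholder and real tile sizes is reported to be large.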
@@ -417,10 +498,24 @@ def scrape_machine(
    mosaic_only: bool,
    metadata_only: bool = False,
    scan_id_filter: int | None = None,
    max_tiles: int | None = None,
) -> RunStats:
    """Login, fetch scans, and download all content for one machine."""
    sess = MachineSession(machine, config)
    if not sess.login():
    login_ok = False
    for attempt in range(1, 4):
        if sess.login():
            login_ok = True
            break
        if attempt < 3:
            log.warning(
                "[%s] Login failed (attempt %d/3) — retrying in 10s.",
                machine["label"],
                attempt,
            )
            time.sleep(10)
    if not login_ok:
        log.error("[%s] Login failed after 3 attempts — skipping machine.", machine["label"])
        return RunStats()

    if scan_id_filter is not None:
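The login loop above is a fixed-delay retry: up to three attempts, with a sleep between failures but not after the last one. A generic sketch of that pattern follows; `retry` is a hypothetical helper, and the `sleep` parameter is injected only so the example runs without waiting.

```python
import time
from typing import Callable


def retry(action: Callable[[], bool], attempts: int = 3, delay_s: float = 10.0,
          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `action` up to `attempts` times, sleeping `delay_s` between failures."""
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            sleep(delay_s)  # no sleep after the final failure
    return False


# Hypothetical login that fails twice, then succeeds.
outcomes = iter([False, False, True])
slept: list[float] = []
ok = retry(lambda: next(outcomes), sleep=slept.append)
print(ok, slept)
```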
@@ -434,8 +529,33 @@ def scrape_machine(
        log.warning("[%s] No scans found.", machine["label"])
        return RunStats()

    # Build a set of scan_ids already fully processed in a prior run so we can
    # skip them entirely (no metadata fetch, no mosaic request).
    # Only scans with a definitive non-pending status count; skipped_metadata_only
    # rows still need to be processed in mosaic mode.
    PENDING_STATUSES = {"skipped_metadata_only", ""}
    existing_ids: set[int] = set()
    if not metadata_only and scans_csv._fh.name:
        existing_path = Path(scans_csv._fh.name)
        if existing_path.exists():
            import csv as _csv
            with open(existing_path, newline="", encoding="utf-8") as _f:
                for _row in _csv.DictReader(_f):
                    if _row.get("machine") == machine["label"]:
                        if _row.get("mosaic_download_status", "") not in PENDING_STATUSES:
                            existing_ids.add(int(_row["scan_id"]))

    stats = RunStats()
    for scan in scans:
        # Skip scans already fully processed in a prior run — avoids redundant
        # metadata fetches and mosaic requests for known-failed / known-done scans.
        if not metadata_only and scan["scan_id"] in existing_ids:
            log.debug(
                "[%s] Scan %d: already processed, skipping.",
                machine["label"],
                scan["scan_id"],
            )
            continue
        stats.merge(process_scan(
            sess=sess,
            scan=scan,
@@ -448,5 +568,7 @@ def scrape_machine(
            dry_run=dry_run,
            mosaic_only=mosaic_only,
            metadata_only=metadata_only,
            max_tiles=max_tiles,
            scans_csv_existing_ids=existing_ids,
        ))
    return stats

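The resume logic in `scrape_machine` builds a set of scan IDs whose `scans.csv` row already carries a definitive status. A self-contained sketch of that filter is shown below. It assumes the column names `machine`, `scan_id`, and `mosaic_download_status` seen in the diff; the status value `ok`, the second machine label, and the helper name `processed_scan_ids` are made up for illustration. Unlike the diff, it also tolerates rows with a malformed `scan_id`.

```python
import csv
import io

PENDING_STATUSES = {"skipped_metadata_only", ""}


def processed_scan_ids(csv_text: str, machine_label: str) -> set[int]:
    """Scan IDs for one machine whose mosaic status is definitive (not pending)."""
    ids: set[int] = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("machine") != machine_label:
            continue
        if row.get("mosaic_download_status", "") in PENDING_STATUSES:
            continue
        try:
            ids.add(int(row["scan_id"]))
        except (KeyError, TypeError, ValueError):
            continue  # tolerate malformed rows rather than abort the run
    return ids


# Hypothetical scans.csv content.
sample = (
    "machine,scan_id,mosaic_download_status\n"
    "BW3-20 [AMR-26],158374,ok\n"
    "BW3-20 [AMR-26],158375,skipped_metadata_only\n"
    "BW3-20 [AMR-26],158376,skipped_zero_disk_space\n"
    "BW1-01 [XYZ-01],158377,ok\n"
    "BW3-20 [AMR-26],not-a-number,ok\n"
)
done = processed_scan_ids(sample, "BW3-20 [AMR-26]")
print(sorted(done))
```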
@@ -5,6 +5,7 @@ Progress tracking (JSON) and CSV writing.
import csv
import json
import logging
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterator

@@ -27,6 +28,7 @@ class ProgressTracker:
    def __init__(self, path: Path) -> None:
        self.path = path
        self._done: set[str] = set()
        self.started_at: str = datetime.now(timezone.utc).isoformat()
        self._load()

    def _load(self) -> None:
@@ -34,6 +36,8 @@ class ProgressTracker:
            try:
                data = json.loads(self.path.read_text())
                self._done = set(data.get("completed_urls", []))
                if "started_at" in data:
                    self.started_at = data["started_at"]
                log.info("Resuming: %d URLs already downloaded.", len(self._done))
            except Exception:
                log.warning("Could not read progress file; starting fresh.")
@@ -59,7 +63,10 @@ class ProgressTracker:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        tmp = self.path.with_suffix(".json.tmp")
        tmp.write_text(
            json.dumps({"completed_urls": sorted(self._done)}, indent=2)
            json.dumps(
                {"started_at": self.started_at, "completed_urls": sorted(self._done)},
                indent=2,
            )
        )
        tmp.replace(self.path)  # atomic on POSIX; avoids corrupt JSON on crash

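The save path above follows the write-then-rename pattern: serialize to a sibling temporary file, then rename it over the target so a reader never sees a half-written file. A minimal standalone sketch, with the hypothetical helper name `write_json_atomic` and made-up payload values:

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write JSON to a sibling temp file, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(payload, indent=2))
    os.replace(tmp, path)  # same-directory rename; atomic on POSIX


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "progress.json"
    write_json_atomic(target, {"started_at": "2026-04-26T00:00:00+00:00", "completed_urls": []})
    write_json_atomic(target, {"started_at": "2026-04-26T00:00:00+00:00", "completed_urls": ["u1"]})
    loaded = json.loads(target.read_text())
    leftovers = [p.name for p in Path(d).iterdir()]
print(loaded["completed_urls"], leftovers)
```

Keeping the temporary file in the same directory matters, because a rename across filesystems is not atomic.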
@@ -14,6 +14,7 @@ from bs4 import BeautifulSoup

from spruce.download_result import (
    OK,
    PERMANENT_MISSING,
    UNKNOWN,
    DownloadResult,
    classify_http_error,
@@ -216,8 +217,10 @@ class MachineSession:
    # ------------------------------------------------------------------

    def mosaic_url(self, scan_id: int) -> str:
        # The server stores scan directories zero-padded to 6 digits (e.g. 010700/).
        # Scans with IDs >= 100000 are unaffected since they are already 6 digits.
        return urljoin(
            self.image_base_url, f"RootView_Database/{scan_id}/mosaic.jpg"
            self.image_base_url, f"RootView_Database/{scan_id:06d}/mosaic.jpg"
        )

    # ------------------------------------------------------------------
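The effect of the `:06d` format spec in `mosaic_url` can be checked directly. In this sketch the base URL is a placeholder host, not the real image server, and `mosaic_url` is written as a free function purely for illustration.

```python
from urllib.parse import urljoin


def mosaic_url(image_base_url: str, scan_id: int) -> str:
    """Build the mosaic URL with the scan ID zero-padded to six digits."""
    return urljoin(image_base_url, f"RootView_Database/{scan_id:06d}/mosaic.jpg")


base = "http://example.invalid/images/"  # placeholder host for illustration
short_id = mosaic_url(base, 10700)    # five digits: gets one leading zero
full_id = mosaic_url(base, 158374)    # six digits: unchanged
print(short_id)
print(full_id)
```

The width in a format spec is a minimum, so IDs longer than six digits pass through unpadded.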
@@ -261,8 +264,12 @@ class MachineSession:
                    and exc.response is not None
                ):
                    sc = exc.response.status_code
                cl = classify_http_error(sc, exc)
                if cl == PERMANENT_MISSING:
                    # 404/410 will never succeed — don't waste time retrying.
                    return DownloadResult(0, sc, str(exc), cl)
                if attempt < retries:
                    log.debug(
                    log.warning(
                        "Attempt %d/%d failed %s: %s — retrying in %.0fs",
                        attempt,
                        retries,
@@ -279,7 +286,6 @@ class MachineSession:
                        url,
                        exc,
                    )
                cl = classify_http_error(sc, exc)
                return DownloadResult(0, sc, str(exc), cl)
        return DownloadResult(0, None, "download_file: exhausted", UNKNOWN)

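The retry change in `download_file` stops early when the error class says a retry cannot help. The sketch below shows that control flow with plain status codes. It assumes 404 and 410 are the permanently-missing statuses, as the comment in the diff suggests; `fetch_with_retries` and the fake fetchers are hypothetical and do not reproduce `classify_http_error`.

```python
from typing import Callable, Optional

PERMANENT_STATUS = {404, 410}  # assumption: statuses treated as permanently missing


def fetch_with_retries(fetch: Callable[[], int], retries: int = 3) -> tuple[Optional[int], int]:
    """Return (final_status, attempts_made); stop early on a permanent status."""
    status: Optional[int] = None
    for attempt in range(1, retries + 1):
        status = fetch()
        if status == 200:
            return status, attempt
        if status in PERMANENT_STATUS:
            return status, attempt  # retrying a 404/410 cannot help
    return status, retries


missing = fetch_with_retries(lambda: 404)        # gives up after one attempt
flaky_statuses = iter([503, 503, 200])
flaky = fetch_with_retries(lambda: next(flaky_statuses))
print(missing, flaky)
```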