Add EXIF writing and machine metadata support

2026-04-24 18:21:37 -04:00
parent f2193011ca
commit e8d3bf7180
11 changed files with 577 additions and 12 deletions
+17 -4
@@ -117,6 +117,15 @@ python scraper.py --machine "BW3-20 [AMR-26]" --scan-id 158374 --workers 4
| `--list-scans` | Print all scans for `--machine` and exit |
| `--verbose` / `-v` | Debug logging |
### `config.yaml` (optional keys)
| Key | Description |
|---|---|
| `write_exif` | If true (default), write EXIF to each `mosaic.jpg` after download. Set to false to skip. |
| `machine_metadata` | Map of machine label → optional fields for mosaic EXIF: `plot_number`, `enclosure` (bool), `temp_treatment` (number or string), `co2_treatment` (`ambient` / `elevated`), `latitude_wgs_84`, `longitude_wgs_84`, `elevation_masl`. Omitted keys are not written. |
`config.example.yaml` lists all 12 machine labels with full `machine_metadata` (plot, enclosure, treatments, WGS84 coordinates, elevation) and an optional `machines` filter (commented).
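As a rough illustration of how these optional keys might be consumed, the sketch below works on a plain dict of the kind `yaml.safe_load` would return. The helper name `exif_settings` and the sample values are hypothetical and are not taken from the scraper itself.

```python
from typing import Any, Dict, Tuple

# Field names documented for `machine_metadata`; anything else is ignored here.
KNOWN_FIELDS = (
    "plot_number", "enclosure", "temp_treatment", "co2_treatment",
    "latitude_wgs_84", "longitude_wgs_84", "elevation_masl",
)


def exif_settings(config: Dict[str, Any], machine: str) -> Tuple[bool, Dict[str, Any]]:
    """Return (write_exif, fields) for one machine label.

    Hypothetical helper: `write_exif` defaults to True when absent, and
    fields missing from the config are simply left out of the result.
    """
    write_exif = bool(config.get("write_exif", True))
    per_machine = (config.get("machine_metadata") or {}).get(machine) or {}
    fields = {k: per_machine[k] for k in KNOWN_FIELDS if k in per_machine}
    return write_exif, fields


# Illustrative values only; real entries live in config.example.yaml.
sample = {
    "machine_metadata": {
        "BW3-20 [AMR-26]": {
            "plot_number": 20,
            "enclosure": True,
            "co2_treatment": "elevated",
        },
    }
}

write_exif, fields = exif_settings(sample, "BW3-20 [AMR-26]")
```

With this sample, `write_exif` falls back to its default and `fields` holds only the three keys that were actually configured, mirroring the "omitted keys are not written" rule.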
---
## Output layout
@@ -131,7 +140,7 @@ archives/
└── 2024-07-29/
└── 158374/
├── metadata.json # full scan parameters (grid, timestamps, etc.)
├── mosaic.jpg # pre-stitched full image (~16 MB)
├── mosaic.jpg # pre-stitched full image (~16 MB); EXIF written after download
└── tiles/
├── tile_r000_c000.jpg # row 0, column 0 (zero-padding matches grid size)
├── tile_r000_c001.jpg
@@ -140,6 +149,8 @@ archives/
Tile filenames encode position: `tile_r{row}_c{col}.jpg` where row increases with depth (Y in mm) and column increases along the tube circumference (X in mm).
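The naming scheme above can be sketched as a pair of small helpers. The function names and the default padding width of 3 are assumptions based on the example filenames; the scraper derives the padding from the grid size.

```python
import re
from typing import Tuple

# Pattern from the naming scheme: tile_r{row}_c{col}.jpg with zero-padded indices.
TILE_RE = re.compile(r"^tile_r(\d+)_c(\d+)\.jpg$")


def tile_name(row: int, col: int, width: int = 3) -> str:
    """Build a tile filename; `width` is the zero-padding (assumed 3 here)."""
    return f"tile_r{row:0{width}d}_c{col:0{width}d}.jpg"


def parse_tile_name(filename: str) -> Tuple[int, int]:
    """Recover (row, col) from a tile filename, whatever its padding width."""
    match = TILE_RE.match(filename)
    if match is None:
        raise ValueError(f"not a tile filename: {filename!r}")
    return int(match.group(1)), int(match.group(2))
```

Parsing does not depend on the padding width, so the same helper works for grids of any size.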
**Mosaic `mosaic.jpg` EXIF** (when `write_exif` is true in `config.yaml`, default on): set immediately after a successful download via `piexif` (no re-encoding). The tags written are:
- `DateTime` / `DateTimeOriginal`: scan time
- `ImageDescription`: machine, scan id, name
- `Make`: RootView
- `Model`: machine label
- `Software`: RootView + server version
- `ProcessingSoftware`: this scraper
- `Artist`: user
- `UserComment`: one line with grid size, a pointer to `metadata.json`, and, when set in `machine_metadata`, `plot_number`, `enclosure`, `temp_treatment`, `co2_treatment`
- `XPKeywords`: the same treatment fields, written when any of those four are set
- GPS: written when `latitude_wgs_84` and `longitude_wgs_84` are set; `elevation_masl` is optional

See `config.example.yaml` for the `machine_metadata` layout.
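EXIF stores timestamps as `YYYY:MM:DD HH:MM:SS` and GPS coordinates as degree/minute/second rationals with a separate hemisphere reference. The sketch below shows one way to prepare those values from the config fields using only the standard library. The function names and the string keys in the returned dict are hypothetical; `piexif` itself addresses tags through numeric constants (for example under `piexif.GPSIFD`), and the scraper's actual conversion may differ.

```python
from datetime import datetime
from typing import Dict, Tuple

Rational = Tuple[int, int]


def exif_datetime(moment: datetime) -> str:
    """Format a datetime the way EXIF DateTime tags expect."""
    return moment.strftime("%Y:%m:%d %H:%M:%S")


def to_dms(decimal_degrees: float) -> Tuple[Rational, Rational, Rational]:
    """Convert decimal degrees to ((deg, 1), (min, 1), (sec * 100, 100)).

    Working in whole hundredths of an arc-second avoids a rounded
    seconds value of 60.00 that would otherwise need a carry.
    """
    total = round(abs(decimal_degrees) * 3600 * 100)
    degrees, remainder = divmod(total, 360000)
    minutes, seconds_hundredths = divmod(remainder, 6000)
    return (degrees, 1), (minutes, 1), (seconds_hundredths, 100)


def gps_fields(latitude_wgs_84: float, longitude_wgs_84: float) -> Dict[str, object]:
    """Hypothetical helper: GPS values keyed by EXIF tag name, for illustration."""
    return {
        "GPSLatitudeRef": "N" if latitude_wgs_84 >= 0 else "S",
        "GPSLatitude": to_dms(latitude_wgs_84),
        "GPSLongitudeRef": "E" if longitude_wgs_84 >= 0 else "W",
        "GPSLongitude": to_dms(longitude_wgs_84),
    }
```

The sign of each coordinate goes into the reference tag, so the rationals themselves are always non-negative.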
### Metadata files
**`scans.csv`** columns: `machine`, `machine_id`, `scan_id`, `name`, `scan_time`, `start_x`, `start_y`, `end_x`, `end_y`, `dx`, `dy`, `nx`, `ny`, `total_tiles`, `scan_lines`, `scan_mode`, `start_datetime`, `end_datetime`, `status`, `user`, `disk_space_mb`, `mosaic_url`, `mosaic_local_path`, `mosaic_on_disk`
@@ -195,7 +206,7 @@ Every run prints a summary table on completion:
Run complete
──────────────────────────────────────────────
Machines: 1
Scans fetched: 428 (2 already cached, 0 failed)
Scans (metadata) fetched: 428 (2 already cached, 0 metadata failed)
Metadata written: 428 (new JSON files)
──────────────────────────────────────────────
Scans CSV: archives/scans.csv
@@ -203,10 +214,11 @@ Every run prints a summary table on completion:
──────────────────────────────────────────────
```
- **Scans fetched**: metadata detail page was retrieved from the server this run.
- **Scans (metadata) fetched**: RootView scan detail page was retrieved (grid params, etc.). This does not mean the mosaic downloaded successfully; use **Mosaics downloaded** / **Mosaics failed** when not in `--metadata-only` mode.
- **Already cached**: `metadata.json` already existed on disk; no HTTP request was made.
- **Failed**: fetch error or scan missing required grid parameters.
- **metadata failed**: metadata fetch error or scan missing required grid parameters.
- **Metadata written**: new `metadata.json` files created (shown in `--metadata-only` mode).
- **Mosaics failed** (when present): mosaic URL was requested but the file was not saved (e.g. HTTP 404, or empty body). Check the log for the exact URL.
- Mosaic and tile counts appear in their respective modes.
---
@@ -219,3 +231,4 @@ Every run prints a summary table on completion:
| `beautifulsoup4` + `lxml` | HTML parsing |
| `pyyaml` | Config file |
| `tqdm` | Progress bars |
| `piexif` | EXIF for downloaded mosaics |