# Repository Guidelines
## Project Structure & Module Organization
Source lives in `src/` with packages: `src/dataset/` (dataset abstractions + Crack500 loader), `src/model/` (HF adapters, Trainer wrappers, predictor + CLI), `src/model_configuration/` (dataclass configs + registry), `src/evaluation/` (metrics, pipeline evaluator, CLI), `src/visualization/` (overlay/galleries + pipeline-driven CLI), and `src/tasks/` (task configs + pipeline runner for train→eval→viz). Datasets stay in `crack500/`, and experiment artifacts should land in `results/<prompt_type>/...`.
## Build, Test, and Development Commands
Install dependencies with `pip install -r requirements.txt` inside the `sam2` env. The CLI wrappers now call the TaskRunner: `python run_bbox_evaluation.py --data_root ./crack500 --test_file ./crack500/test.txt --expand_ratio 0.05` executes bbox evaluate + visualize, while `python run_point_evaluation.py --point_configs 1 3 5` sweeps multi-point setups. Reusable pipelines can be launched via the TOML templates (`tasks/bbox_eval.toml`, `tasks/point_eval.toml`) using `python -m src.tasks.run_task --task_file <file>`. HF-native commands remain available for fine-tuning (`python -m src.model.train_hf ...`), metrics (`python -m src.evaluation.run_pipeline ...`), and overlays (`python -m src.visualization.run_pipeline_vis ...`).
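For repeatable sweeps outside the TOML templates, a minimal sketch of driving the bbox wrapper from Python (assumes only the flags documented above; the ratio values are arbitrary):

```python
"""Sweep the bbox evaluation wrapper over several expand ratios."""
import subprocess
import sys

EXPAND_RATIOS = [0.0, 0.05, 0.1]  # illustrative sweep values

for ratio in EXPAND_RATIOS:
    cmd = [
        sys.executable, "run_bbox_evaluation.py",
        "--data_root", "./crack500",
        "--test_file", "./crack500/test.txt",
        "--expand_ratio", str(ratio),
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```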
## Coding Style & Naming Conventions
Follow PEP 8 with 4-space indents, <=100-character lines, snake_case functions, PascalCase classes, and explicit type hints. Keep logic within its package (dataset readers under `src/dataset/`, Trainer utilities inside `src/model/`) and prefer pathlib, f-strings, and concise docstrings that clarify SAM2-specific heuristics.
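A short illustrative snippet of those conventions (the function names and glob pattern are hypothetical, not actual repo code):

```python
from pathlib import Path

Box = tuple[int, int, int, int]


def expand_box(box: Box, ratio: float = 0.05) -> Box:
    """Expand an (x0, y0, x1, y1) prompt box by `ratio` on each side."""
    x0, y0, x1, y1 = box
    dw, dh = int((x1 - x0) * ratio), int((y1 - y0) * ratio)
    return x0 - dw, y0 - dh, x1 + dw, y1 + dh


def find_masks(data_root: Path) -> list[Path]:
    """Collect ground-truth masks under `data_root` (pattern is illustrative)."""
    return sorted(data_root.glob("**/*_mask.png"))
```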
## Refactor & HF Integration Roadmap
1. **Dataset module**: generalize loaders so Crack500 and future benchmarks share a dataset interface emitting HF dicts (`pixel_values`, `prompt_boxes`); see the sketch after this list.
2. **Model + configuration**: wrap SAM2 checkpoints with `transformers` classes, ship reusable configs, and add HF fine-tuning utilities (LoRA/PEFT optional).
3. **Evaluation & visualization**: move metric code into `src/evaluation/` and visual helpers into `src/visualization/`, both driven by a shared HF `pipeline` API.
4. **Benchmarks**: add scripts that compare pre-trained vs fine-tuned models and persist summaries to `results/<dataset>/<model_tag>/evaluation_summary.json`.
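A minimal sketch of the shared dataset interface from step 1 (class names, the `labels` field, and array shapes are assumptions; only `pixel_values` and `prompt_boxes` come from the roadmap):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol

import numpy as np


class PromptSegDataset(Protocol):
    """Minimal contract a benchmark loader (Crack500, future sets) could follow."""

    def __len__(self) -> int: ...

    def __getitem__(self, idx: int) -> dict:
        """Return an HF-style dict with `pixel_values` and `prompt_boxes`."""
        ...


@dataclass
class Crack500Sample:
    image_path: Path
    mask_path: Path

    def to_hf_dict(self) -> dict:
        # Placeholder arrays; a real loader would decode the image and mask files.
        image = np.zeros((3, 1024, 1024), dtype=np.float32)
        mask = np.zeros((1024, 1024), dtype=np.uint8)
        return {
            "pixel_values": image,
            "prompt_boxes": np.array([[0, 0, 1023, 1023]], dtype=np.float32),
            "labels": mask,
        }
```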
## Testing Guidelines
Treat `python run_bbox_evaluation.py --skip_visualization` as a regression test, then spot-check overlays via `--num_vis 5`. Run `python -m src.evaluation.run_pipeline --config_name sam2_bbox_prompt --max_samples 16` so the dataset→pipeline→evaluation path is exercised end-to-end, logging IoU/Dice deltas against committed summaries.
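A hedged helper for the delta check (the `iou`/`dice` keys and the baseline location are assumptions; adjust to the actual summary schema):

```python
"""Diff a fresh evaluation summary against a committed baseline."""
import json
from pathlib import Path


def load_metrics(path: Path) -> dict[str, float]:
    data = json.loads(path.read_text())
    return {k: float(data[k]) for k in ("iou", "dice") if k in data}


def report_deltas(new_path: Path, baseline_path: Path, tol: float = 0.01) -> None:
    new, base = load_metrics(new_path), load_metrics(baseline_path)
    for key in sorted(new.keys() & base.keys()):
        delta = new[key] - base[key]
        flag = "OK" if abs(delta) <= tol else "CHECK"
        print(f"{key}: {base[key]:.4f} -> {new[key]:.4f} (delta {delta:+.4f}) [{flag}]")


if __name__ == "__main__":
    report_deltas(
        Path("results") / "bbox" / "evaluation_summary.json",  # adjust to your run
        Path("baselines") / "evaluation_summary.json",  # hypothetical committed copy
    )
```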
## Commit & Pull Request Guidelines
Adopt short, imperative commit titles (`dataset: add hf reader`). Describe scope and runnable commands in PR descriptions, attach metric/visual screenshots from `results/.../visualizations/`, and note any new configs or checkpoints referenced. Highlight where changes sit in the planned module boundaries so reviewers can track the refactor's progress.
## Data & Configuration Tips
Never commit Crack500 imagery or SAM2 weights—verify `.gitignore` coverage before pushing. Add datasets via config entries instead of absolute paths, and keep `results/<prompt_type>/<experiment_tag>/` naming so HF sweeps can traverse directories predictably.
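With `results/<prompt_type>/<experiment_tag>/` naming in place, sweep tooling can aggregate summaries with a single glob; a small sketch (the summary filename follows the roadmap convention, everything else is illustrative):

```python
"""Walk results/<prompt_type>/<experiment_tag>/ and collect evaluation summaries."""
import json
from pathlib import Path


def collect_summaries(results_root: Path = Path("results")) -> dict[str, dict]:
    summaries: dict[str, dict] = {}
    for summary_file in sorted(results_root.glob("*/*/evaluation_summary.json")):
        prompt_type, experiment_tag = summary_file.parts[-3:-1]
        summaries[f"{prompt_type}/{experiment_tag}"] = json.loads(summary_file.read_text())
    return summaries


if __name__ == "__main__":
    for tag, metrics in collect_summaries().items():
        print(tag, metrics)
```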