Commit graph

3916 commits

Fatih Akyon 7e984437ee
feat: default phase1 to 7-source paired data
Switches the phase1 default to the paired/train splits so gqa, flickr, and dotav1 get proper val coverage, and resamples the dotav1/soda-a val sizes to match each source's train share. Also reverts patience to 20 and phase2 pretrained back to best.pt.
2026-04-29 05:49:56 -05:00
Fatih Akyon 654afac13a
fix: align distill defaults with literature
Set phase1 patience=200 to avoid early stopping on slow-drift epochs, and load phase2 from last.pt instead of best.pt, matching UNIC/DUNE/EdgeCrafter, which train for a fixed number of epochs and use the final checkpoint.
2026-04-29 04:02:02 -05:00
Fatih Akyon 65814e34e5
feat: add device guard to phase1 distill resume
Refuse to resume when the CLI device differs from the checkpoint device instead of silently proceeding; point users to patch_resume to bake the new device into the checkpoint.
2026-04-28 00:42:26 -05:00
Fatih Akyon 68771ae4d0
docs: fix photometric augs comment pointer
The referenced docstring lives in callbacks/distill_aug.py:classify_augmentations_distill, not at the reverted ultralytics/data/augment.py path. Follow-up to 79dd79181, which fixed the same stale pointer in train_image_encoder.py.
2026-04-26 04:40:03 -05:00
Fatih Akyon edb2cce222
fix: pin dinov3 recipe warmup_epochs to 1
The recipe set warmup_epochs=18 to match DINOv3's 16% warmup ratio at 114
epochs, but the runner scales warmup by batch/512, so at batch=1024 the
effective warmup became 36 epochs (31% of training). That broke direct
comparison with the existing 7-source runs, which use 2 effective warmup
epochs.

Setting warmup_epochs=1 keeps the post-scaling value at 2, matching the runs
already in flight. Other dinov3 axes (lr, wd schedule, augs, grad_clip) are
unchanged.
2026-04-26 03:59:03 -05:00
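To make the interaction concrete, a minimal sketch of the batch-scaled warmup arithmetic (function and variable names are illustrative; the batch/512 rule is the runner's):

    def effective_warmup(warmup_epochs: float, global_batch: int, nbs: int = 512) -> float:
        # The runner multiplies the recipe's warmup_epochs by the batch scale.
        return warmup_epochs * (global_batch / nbs)

    print(effective_warmup(18, 1024))  # 36.0 epochs -- 31% of 114, the broken setting
    print(effective_warmup(1, 1024))   # 2.0 epochs  -- matches the 7-source runs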
Fatih Akyon 79dd791815
docs: fix photometric augs comment pointer
Comment in ImageEncoderTrainer pointed at the old upstream path; the
photometric stack now lives in callbacks/distill_aug.py. No behavior change.
2026-04-26 03:32:11 -05:00
Fatih Akyon 931b81fe57
feat: add dinov3 distill recipe with photometric stack and wd schedule
Motivation
  fastvit-s x adaptor diverges at full scale on 7-source training (final knn
  5.9%, chance-level). Forensic smokes ruled out norm hot-swap, beta2 sweep,
  fixed-wd changes, and BN running-stat freezes. Two recipe-level mismatches
  with DINOv3 / EUPE / UNIC / DUNE distillation papers remained:
    * our pipeline still pulls the Ultralytics defaults (RandAugment +
      RandomErasing 0.4) from cfg/default.yaml, while every reference recipe
      disables both and instead uses ColorJitter + Grayscale + GaussianBlur +
      Solarize;
    * we use a fixed weight_decay of 0.02 with ~1% warmup, while DINOv3 ramps
      wd 0.04 -> 0.2 over training and warms up for 16% of epochs.

What changed
  callbacks/distill_aug.py: classify_augmentations_distill, sibling to
    ultralytics/data/augment.py:classify_augmentations. Same signature plus
    grayscale, gaussian_blur, solarize knobs (default 0.0 = bit-equivalent
    to upstream). Order mirrors UNIC main_unic.py:485-521. Kept out of
    ultralytics/data/ to avoid touching the upstream cls training pipeline.
  callbacks/wd_schedule.py: half-cosine wd ramp matching DINOv3
    dinov3/optim/schedulers.py CosineSchedule, registered DDP-safe inside
    the trainer __init__ (per utils/dist.py:79 callbacks-on-rank-0 footgun).
  ultralytics/cfg/__init__.py: extend allowed_custom_keys with wd_end,
    grayscale, gaussian_blur, solarize so DDP arg serialisation passes.
  ultralytics/models/yolo/classify/train_image_encoder.py: switch
    _build_transforms to classify_augmentations_distill and forward the
    three new self.args knobs; register wd_schedule callback when wd_end > 0.
  run_enc_distill_phase1.py: new dinov3 recipe (lr0=2e-4, wd 0.04->0.2,
    warmup 18 ep, ColorJitter 0.4/0.4/0.2/0.1, grayscale 0.2, blur 0.5,
    solarize 0.2, auto_augment off, erasing off) plus override forwarding.
  Existing default / eupe / radio / unic recipes untouched.
2026-04-25 18:55:25 -05:00
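A minimal sketch of the half-cosine wd ramp, assuming the DINOv3 CosineSchedule convention of interpolating from a start to an end value over training (the real implementation in callbacks/wd_schedule.py may differ in detail):

    import math

    def wd_at(step: int, total_steps: int,
              wd_start: float = 0.04, wd_end: float = 0.2) -> float:
        # Half-cosine interpolation: wd_start at step 0, wd_end at the last step.
        t = step / max(1, total_steps)
        return wd_end + 0.5 * (wd_start - wd_end) * (1 + math.cos(math.pi * t))

    def apply_wd(optimizer, step: int, total_steps: int) -> None:
        for group in optimizer.param_groups:
            if group.get("weight_decay", 0) > 0:  # leave no-decay groups alone
                group["weight_decay"] = wd_at(step, total_steps)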
Fatih Akyon b607f77e27
fix: bind resume_args fallback in distill runners
phase1 + phase2 now inherit batch/lr/nbs/epochs/patience/data from saved
train_args on --resume; drift guards on data/mode refuse silent regressions.
2026-04-25 05:28:28 -05:00
Fatih Akyon d70c3e23a2
fix: set PYTHONPATH to runner dir so DDP workers can import callbacks
2026-04-24 06:08:17 -05:00
Fatih Akyon e4557d2149
chore: reduce dataloader workers to 2 for NFS-friendly training
2026-04-24 06:04:10 -05:00
Fatih Akyon c205971dd6
refactor: register FastViTBlock/MHSABlock in nn/modules
Previous callbacks/vit_modules.py monkey-patched parse_model (162-line
verbatim copy + one extra elif). That broke under DDP because the worker
cwd is USER_CONFIG_DIR/DDP/, so the runner-local callbacks package is
off sys.path. Import the blocks directly in tasks.py and fold them into
the AIFI elif that prepends ch[f].
2026-04-24 02:47:38 -05:00
Fatih Akyon 82534ab1aa
fix: keep distill hooks alive under DDP
Runner-side model.add_callback() was silently dropped on DDP workers, so
grad_clip, beta2, and nfs_sync never ran. Register the hooks inside
ImageEncoderTrainer so they run on every rank. Also import vit_modules at the
top of the trainer module so FastViT/SimpleViT YAMLs parse in DDP workers too.
2026-04-23 17:31:29 -05:00
Fatih Akyon 3abbba5ce3
docs: record FastViT/SimpleViT export results
Replace target-param comments in yolo26-{fastvit,simplevit}-cls.yaml with
measured params, ONNX node counts, and TRT fp16 latency from the 2026-04-23
export sweep (all 4 variants <=1.5x the yolo26s-cls conv baseline).
Note the PaddlePaddle op-coverage gap and the RKNN torch-downgrade trap so
future sweeps skip them, and clarify that the 1327-node figure in MHSABlock
refers to the AIFI ViT, not these architectures.
2026-04-23 08:54:05 -05:00
Fatih Akyon 54fc15adb1
feat: add distill_path x adaptor_arch distill axes
The current MLP adaptor with CLS+patch-only supervision yields a 14pp kNN
gain but only +0.24pp COCO100 over the CE baseline (a tie within noise).
Detection reads raw L3/L5/L10 while distill supervises a per-teacher
MLP after the final stage, so the supervised features never reach
the detection path.

distill_path in {adaptor (default), feat_map}: feat_map routes
student L3/L5/L8 to teacher final-block tokens via 1x1 Conv per
scale with MSE, landing gradients on the same layers detection
reads (EdgeCrafter-style path alignment).

adaptor_arch in {mlp (default), linear}: linear replaces the
2-layer Linear-LN-GELU-Linear MLP with a single
Linear(in, out, bias=False). EdgeCrafter argues heavy projections
absorb the student-teacher mismatch instead of forcing it into
the backbone where detection can benefit.

The loss_items tensor shape is invariant at (3,) across all four combos,
so WandB plots overlay across modes. Both args are registered in
allowed_custom_keys (DDP-safe). The resume guard refuses silent
switches of either arg across restarts.

Defaults reproduce prior behaviour bit for bit.
2026-04-23 05:53:07 -05:00
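A hedged sketch of the two axes (hypothetical module and helper names; assumes teacher tokens arrive pre-matched to each scale's token count):

    import torch.nn as nn
    import torch.nn.functional as F

    class FeatMapDistill(nn.Module):
        # distill_path=feat_map: per-scale 1x1 convs project student L3/L5/L8
        # feature maps to the teacher dim; MSE lands gradients on the layers
        # detection reads.
        def __init__(self, student_dims=(128, 256, 512), teacher_dim=768):
            super().__init__()
            self.proj = nn.ModuleList(nn.Conv2d(c, teacher_dim, 1) for c in student_dims)

        def forward(self, student_feats, teacher_tokens):
            losses = []
            for p, f, t in zip(self.proj, student_feats, teacher_tokens):
                s = p(f).flatten(2).transpose(1, 2)  # (B, H*W, D) token layout
                losses.append(F.mse_loss(s, t))
            return sum(losses) / len(losses)

    def make_adaptor(in_dim, out_dim, hidden_dim, arch="mlp"):
        # adaptor_arch=linear: a single bias-free projection instead of the
        # 2-layer Linear-LN-GELU-Linear MLP.
        if arch == "linear":
            return nn.Linear(in_dim, out_dim, bias=False)
        return nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.LayerNorm(hidden_dim),
            nn.GELU(), nn.Linear(hidden_dim, out_dim),
        )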
Fatih Akyon 97e91d9755
feat: add FastViT + SimpleViT cls students
Replace the unused AIFI student (12.8x slower than the conv baseline at
bs=1 fp16, 1327 ONNX nodes). FastViT-S benches 1.07ms / 228 nodes,
actually faster than the yolo26s-cls conv baseline (1.83ms / 234).
SimpleViT-S aligns 14x14 tokens with EUPE-ViT-B at 224px, which
lets feat_map distillation with adaptor_arch=linear collapse to
identity + projection.

Custom modules live in ultralytics/nn/modules/vit_blocks.py
(FastViTBlock, MHSABlock). Registration into parse_model goes
through callbacks/vit_modules.py, which copies parse_model
verbatim and adds one elif branch to prepend ch[f] for these
modules; avoids editing ultralytics/nn/tasks.py.

Simple-component constraint only: Conv2d, BatchNorm2d, LayerNorm,
GELU, Linear, F.scaled_dot_product_attention (no nn.MultiheadAttention,
no 2D RoPE), so the models export cleanly to ONNX/TRT/CoreML/TFLite.

Scales yolo26{s,l}-{fastvit,simplevit}-cls: s ~5-7M, l ~15M params.
2026-04-23 05:52:18 -05:00
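A sketch of an attention block under that constraint (illustrative, not the exact MHSABlock in vit_blocks.py):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SimpleMHSA(nn.Module):
        # Built from Linear + scaled_dot_product_attention only, so exporters
        # never see nn.MultiheadAttention.
        def __init__(self, dim: int, num_heads: int = 8):
            super().__init__()
            self.num_heads = num_heads
            self.qkv = nn.Linear(dim, dim * 3)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
            B, N, C = x.shape
            qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
            q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each (B, heads, N, head_dim)
            out = F.scaled_dot_product_attention(q, k, v)
            return self.proj(out.transpose(1, 2).reshape(B, N, C))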
Fatih Akyon a9d29fb601
feat: add --scratch flag to phase2 runner
Lets users train from random init so pretrained-backbone runs can be compared against a no-pretraining control, quantifying the net contribution of the pretraining stage to final downstream accuracy.
2026-04-22 07:43:58 -05:00
Fatih Akyon 4a6b7a347f
feat: add dota_obb_finetune phase 2 mode
Completes OBB coverage for the encoder-distillation downstream eval alongside coco det/pose and imagenet cls. Params mirror the canonical yolo26s-obb.pt (bs=32, nbs=64, lr0=0.00125, imgsz=1024, degrees=180, MuSGD muon_w=0.5) so baseline runs are directly comparable to the paper's 54.8 mAP reference, and the mode uses the same --batch/--lr/--nbs linear scaling as coco_det_finetune.
2026-04-21 21:08:31 -05:00
Fatih Akyon b77ed07c2e
feat: add --batch auto-scaler to phase 2 coco det
Scales lr/nbs/warmup linearly from the canonical bs=128/nbs=64/lr0=0.00038 so wd_eff and per-sample lr stay invariant. Adds a _COCO_DET_MODES constant and documents per-mode flag semantics in the docstring.
2026-04-21 18:19:22 -05:00
Fatih Akyon 481aaa1051
feat: add coco pose mode, align coco det recipe
Phase 2c pose runs were blocked because the runner had no pose branch;
adds coco_pose_finetune (data=coco-pose.yaml, MuSGD, pose=24, kobj=4.0)
that infers the -pose yaml from the phase1 cls yaml.

Aligns coco_det_finetune args with the published yolo26s.pt detection
recipe so phase2 coco runs match the official model's training setup.
Previously the branch drifted (missing nbs=64, cos_lr=False,
warmup_momentum/bias_lr, box/cls/dfl weights, randaugment, cutmix,
copy_paste_mode, translate/degrees/shear/hsv/erasing, muon_w=0.4355),
which made backbone comparisons against the 30.18 mAP CE baseline hard
to interpret. sgd_w/cls_w/o2m/detach_epoch from the reference aren't
accepted by this checkout's cfg validator, so only the exposed subset
is applied.

Renames modes with task prefixes so logs and wandb groups are
unambiguous: finetune -> inet_finetune, linear -> inet_linear_probe,
adamw_ft -> inet_adamw_finetune, coco_det(_frozen) ->
coco_det_finetune(_frozen). The muon_w=0.1 callback is now gated to
inet_finetune only; coco det uses muon_w=0.4355 from the published
recipe.
2026-04-21 17:56:52 -05:00
Fatih Akyon 846dc24666
feat: add --batch per-GPU auto-scaler to phase1
Ultralytics scales wd_eff with batch*accumulate/nbs but never scales lr0, so larger
global batches silently drift from the recipe's intended dynamics. The new flag takes
a per-GPU batch, computes global = per_gpu * world_size, and derives lr0, nbs, and
warmup_epochs from scale = max(1, global / NBS_CANONICAL), with NBS_CANONICAL=512, so
wd_eff stays at the recipe value while per-sample lr and the optimizer-step warmup
count are invariant.
2026-04-21 02:14:49 -05:00
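A minimal sketch of the rule (NBS_CANONICAL follows the commit message; the real flag parsing differs):

    NBS_CANONICAL = 512

    def scale_recipe(per_gpu_batch: int, world_size: int,
                     lr0: float, warmup_epochs: float) -> dict:
        global_batch = per_gpu_batch * world_size
        scale = max(1, global_batch / NBS_CANONICAL)
        return dict(
            batch=global_batch,
            lr0=lr0 * scale,                      # per-sample lr invariant
            nbs=int(NBS_CANONICAL * scale),       # wd_eff pinned to the recipe value
            warmup_epochs=warmup_epochs * scale,  # optimizer-step warmup count invariant
        )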
Fatih Akyon 5105796c0f
fix: prevent val-train leak in multi-source mix
_resolve_paths' flat-dir fallback returned (p, p), which in
multi-source mode duplicated train files into the val
ConcatDataset: on the 7-source mix 844,176 of 899,176 val
samples (93.9%) were just re-enqueued train files, making val
loss meaningless as a held-out signal. Regression introduced
when multi-path support was added in 1aea2f95c.

Resolver now returns (train, None) when no held-out val is
discoverable, and additionally swaps the last `train` path
segment for `val` to auto-rescue deep layouts like
.../images/train → .../images/val (recovering O365's 30k and
DOTA's 5,297 held-out samples without caller changes).
get_dataset filters None so flat sources (GQA, Flickr, SODA)
drop cleanly from val instead of polluting it.
2026-04-20 19:15:33 -05:00
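A minimal sketch of the rescue logic (hypothetical helper; the real _resolve_paths handles more layouts):

    from pathlib import Path

    def resolve_val(train_path: str):
        # Return a held-out val dir, or None when none is discoverable.
        parts = list(Path(train_path).parts)
        for i in range(len(parts) - 1, -1, -1):  # swap the *last* 'train' segment
            if parts[i] == "train":
                candidate = Path(*parts[:i], "val", *parts[i + 1:])
                if candidate.is_dir():
                    return candidate
                break
        return None  # flat sources (GQA, Flickr, SODA) drop cleanly from val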
Fatih Akyon 51f66a2669
fix: move lr0 assignment after recipe resolution
lr0 referenced r["lr0"] before r = RECIPES[recipe] was defined, causing an UnboundLocalError when the --lr flag is not passed.
2026-04-20 03:17:14 -05:00
Fatih Akyon c87765a383
feat: accept data= in paths.patch_resume
Ultralytics check_resume (trainer.py:841) restores the checkpoint's data path verbatim and does not honor caller overrides; cross-host resumes where the dataset lives at a different mount point (e.g. ultra5 NFS outage) previously needed a manual torch.load/save dance to rewrite train_args. Mirrors the existing name/device override branches so one helper call covers all four non-whitelisted fields (project, name, save_dir, data).
2026-04-18 11:31:06 -05:00
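In miniature, the new branch mirrors the existing overrides (sketch only; the real helper in callbacks/paths.py covers project/name/save_dir/data):

    import torch

    def patch_resume(ckpt_path, data=None):
        # Bake the override into the checkpoint's train_args so check_resume
        # picks it up, replacing the manual torch.load/save dance.
        ckpt = torch.load(ckpt_path, map_location="cpu")
        if data is not None:
            ckpt["train_args"]["data"] = data
        torch.save(ckpt, ckpt_path)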
Fatih Akyon 0b7f4ca50f
feat: add --lr CLI override to phase2 script
Phase2 had hardcoded per-mode lr0 (0.1 for MuSGD finetune, 1e-3 for AdamW), with no way to change it at launch without editing the file. Mirrors phase1's _pop_flag pattern so users can sweep learning rates or lower the lr when resuming runs that are diverging. CLAUDE.md already documented phase2 as supporting --lr; this makes the doc true.
2026-04-18 11:27:05 -05:00
Fatih Akyon efd8cda01c
fix: decouple W&B project from local save_dir
Add callbacks.paths with run_paths() and patch_resume() helpers: fresh runs land in the clean W&B project yolo-next-encoder while save_dir stays an absolute local path, and resumes auto-patch train_args so cross-machine / relocated launches survive without manual checkpoint edits.
2026-04-18 10:52:40 -05:00
Fatih Akyon 107e405f4b
feat: wandb fork_and_attach helper and LOCAL_PROJECT guard
Adds callbacks.wandb_config.fork_and_attach, which pre-creates a forked wandb run (native fork_from, with a manual API-replay fallback) and hands off to DDP rank-0 via the WANDB_RUN_ID+WANDB_RESUME env vars. phase1/phase2 gain an explicit module-level assert that LOCAL_PROJECT is absolute under /home/, plus a --fork_from <parent_id>:<step> flag that invokes the helper before model.train(). Native fork is currently gated behind a wandb private preview, so the default path is API-replay; smoke-tested end-to-end with subprocess handoff.
2026-04-17 12:06:34 -05:00
Fatih Akyon a2a5067e5c
fix: pin project to local and harden nfs_sync
Ultralytics check_resume overwrites args.project from the ckpt (only whitelisted keys can override), so resuming a legacy NFS ckpt keeps save_dir on NFS. nfs_sync now warns without raising on an NFS save_dir and wraps the final sync, and phase1/phase2 pin project=LOCAL_PROJECT so fresh runs explicitly land on local SSD.
2026-04-17 11:19:44 -05:00
Fatih Akyon 486872a537
feat: mirror run dir to NFS via background rsync
Writing save_dir to local SSD and rsyncing to NFS every 10 min decouples training from NFS availability, avoiding a repeat of the C2-o365-coco-inet crash where a stale NFS mount destroyed the resumed run's EMA state.
2026-04-17 10:58:50 -05:00
Fatih Akyon 0116afb1f5
feat: expose loss weight CLI args in phase1 script
cos_weight, l1_weight, cls_l1 were already supported by the trainer
but hardcoded in the launch script. Now configurable via --flags.
2026-04-15 11:19:37 -05:00
Fatih Akyon ac5be96d4d
build: gitignore encoder-distillation.md symlink
Experiment tracking file moved to NFS for multi-machine access,
symlinked at project root.
2026-04-15 03:55:34 -05:00
Fatih Akyon 8fba19a44b
feat: add configurable epochs arg to phase1 distillation script
Different datasets need different epoch counts to match total images seen
(e.g. O365+COCO+ImageNet at 114ep = DataComp at 30ep = ~230M images).
2026-04-15 02:42:02 -05:00
Fatih Akyon f1f5ad4879
feat: support multi-dataset distillation and frozen COCO detection
Phase1: configurable data path enables ImageNet+COCO combined training.
Phase2: coco_det_frozen mode (freeze=9) isolates backbone feature quality.
2026-04-14 09:07:07 -05:00
Fatih Akyon 1aea2f95cd
feat: add multi-dataset and loss config passthrough for distillation
UNIC trains on ImageNet-1k (main_unic.py:97), DUNE on IN-19k+GLDv2+
Mapillary (data/dino2.py). Comma-separated data paths now supported
for combining ImageFolder datasets via ConcatDataset. Loss args
(cos_weight, l1_weight, cls_l1) passed from trainer to model.
2026-04-13 11:22:08 -05:00
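The combine itself is small; a sketch of the comma-separated path handling (illustrative wiring, not the trainer's exact code):

    from torch.utils.data import ConcatDataset
    from torchvision.datasets import ImageFolder

    def build_multi_dataset(data: str, transform=None):
        # data="/data/imagenet/train,/data/coco_cls/train" -> one dataset
        paths = [p.strip() for p in data.split(",")]
        return ConcatDataset([ImageFolder(p, transform=transform) for p in paths])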
Fatih Akyon e0a1b56bb9
feat: add configurable loss weights (cos_weight, l1_weight, cls_l1)
UNIC (unic/modeling/losses.py:54) and DUNE (dune/model/losses.py:62)
apply 0.5cos+0.5L1 to both CLS and patches, vs our EUPE-style
cosine-only CLS. Configurable via train args for ablation testing.
2026-04-13 11:21:08 -05:00
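A minimal sketch of the weighted objective (assuming cosine distance 1 - cos; the model's exact reduction may differ):

    import torch.nn.functional as F

    def distill_loss(s_cls, t_cls, s_patch, t_patch,
                     cos_weight=0.5, l1_weight=0.5, cls_l1=0.0):
        # UNIC/DUNE apply 0.5*cos + 0.5*L1 everywhere; the EUPE-style default
        # keeps the CLS term cosine-only (cls_l1=0).
        cos = lambda a, b: (1 - F.cosine_similarity(a, b, dim=-1)).mean()
        loss_cls = cos_weight * cos(s_cls, t_cls) + cls_l1 * F.l1_loss(s_cls, t_cls)
        loss_patch = (cos_weight * cos(s_patch, t_patch)
                      + l1_weight * F.l1_loss(s_patch, t_patch))
        return loss_cls + loss_patch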
Fatih Akyon 2214bfdfe4
fix: use None-check instead of truthiness for hidden_dim default
`hidden_dim or in_dim` silently treats 0 as falsy, so an explicit hidden_dim=0 would fall back to in_dim.
2026-04-13 09:47:41 -05:00
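The fix in miniature:

    # Buggy: truthiness makes an explicit hidden_dim=0 fall back to in_dim.
    hidden_dim = hidden_dim or in_dim
    # Fixed: only None selects the default.
    hidden_dim = in_dim if hidden_dim is None else hidden_dim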
Fatih Akyon 92429ae875
feat: add configurable proj_hidden_dim for adaptor MLP
Adaptor hidden dimension was hardcoded to backbone dim (1280). EUPE uses
3072 for larger students. Now configurable via proj_hidden_dim train arg.
2026-04-13 09:30:16 -05:00
Fatih Akyon 9542daa392
refactor: remove knn_callback, keep extract_features and knn_accuracy
kNN eval now runs inside ImageEncoderTrainer.validate() directly.
The callback closure is no longer needed. extract_features and
knn_accuracy remain as utilities for run_knn_eval.py standalone eval.
2026-04-12 20:58:40 -05:00
Fatih Akyon 98d9672b29
refactor: use knn_eval train arg instead of external callback
Remove knn_callback import and model.add_callback call. Pass
knn_eval=/data/shared-datasets/imagenet in train_args instead,
which ImageEncoderTrainer reads in _setup_train (DDP-safe).
2026-04-12 20:57:13 -05:00
Fatih Akyon 6a59bb39e3
feat: add DDP-safe kNN eval to ImageEncoderTrainer
Move kNN eval from external callback (lost in DDP subprocess) to trainer
validate() override. Enabled via knn_eval=<imagenet_path> in train_args,
which survives DDP serialization through allowed_custom_keys. Caches
dataloaders across epochs, runs every 5 epochs on rank 0 only.
2026-04-12 20:56:45 -05:00
Fatih Akyon 7868016483
fix: disable C2PSA remap in coco_det mode, restore pretrained= flow
Remap caused 17.77% mAP vs 28.02% without it. phase2-coco-d5 was
invalid (used remap unintentionally). Revert to standard pretrained=
which transfers backbone layers 0-8 via intersect_dicts.
2026-04-12 20:28:28 -05:00
Fatih Akyon a2f225806b
fix: infer det model scale from phase1 args.yaml in coco_det mode
Was hardcoded to yolo26s.yaml, breaking yolo26l COCO runs. Now reads
model config from phase1 run's args.yaml and strips -cls suffix.
2026-04-12 06:54:01 -05:00
Fatih Akyon 70d7ab226c
feat: add cls-to-det remap for coco_det mode
Load distilled cls weights with C2PSA index remapping (cls model.9 ->
det model.10) in coco_det mode. Tested: remap transfers 228 vs 192
params but produced worse mAP (17.77% vs 28.02%) due to activation
magnitude mismatch. Kept for future investigation with scaling fix.
2026-04-12 05:58:37 -05:00
Fatih Akyon f7ce7d349d
feat: replace unic recipe with eupe, add kNN eval callback
Replace unic recipe preset (lr=6e-4, wd=0.03, beta2=0.99) with eupe
recipe (lr=2e-5, wd=1e-4) matching EUPE Stage 2 proxy-to-student
distillation params. Add kNN eval every 5 epochs for frozen feature
quality tracking during Phase 1 training.
2026-04-12 05:52:49 -05:00
Fatih Akyon 23f3c0fa50
feat: add standalone kNN eval script with WandB summary update
Takes a run directory, finds weights and model config from args.yaml,
runs kNN evaluation (k=20, T=0.07), and optionally updates the finished
WandB run summary with knn/top1 via --wandb flag.
2026-04-12 05:50:06 -05:00
Fatih Akyon d7afe09ea7
feat: add teacher-averaged loss metrics and WandB epoch alignment
Add aggregated cls_cos, patch_cos, patch_l1 metrics averaged across
teachers for cross-run comparison in WandB. Define epoch-based x-axis
via wandb.define_metric so backfilled and new runs align.
2026-04-12 05:49:48 -05:00
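A sketch of the alignment call (metric names are illustrative; define_metric is the standard wandb API):

    import wandb

    wandb.init(project="yolo-next-encoder")
    wandb.define_metric("epoch")                    # declare the x-axis
    wandb.define_metric("*", step_metric="epoch")   # plot everything against it
    wandb.log({"epoch": 3, "avg/cls_cos": 0.41, "avg/patch_l1": 0.12})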
Fatih Akyon f56b609319
feat: add stride-16 cls YAML for ViT teacher spatial alignment
yolo26-cls-s16.yaml: layer 7 Conv stride 2->1, C2PSA->C3k2.
14x14=196 patches at 224 input, matching DINOv3-ViT (196 patches,
arXiv:2312.06709) exactly. No attention, NCNN/CPU exportable.
7.1M params (+5.8%), 24.5 GFLOPS (1.9x vs base 13.2).
2026-04-11 17:52:51 -05:00
Fatih Akyon 4035e413ce
fix: drop original model.9 keys in cls-to-det remap
The else branch was missing, so cls model.9 (C2PSA) BN keys were both
remapped to model.10 AND kept as model.9. intersect_dicts shape-matched
6 C2PSA BN stats into SPPF (det model.9), corrupting initialization.
phase2-coco-d1-remap showed an 11pp deficit vs non-remap at epoch 58.
2026-04-11 17:51:48 -05:00
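The corrected remap in miniature (hypothetical helper name; the real callback runs before intersect_dicts):

    def remap_cls_to_det(sd: dict) -> dict:
        out = {}
        for k, v in sd.items():
            if k.startswith("model.9."):
                # cls model.9 (C2PSA) -> det model.10; emitting only the
                # remapped name drops the original model.9 key, so nothing
                # shape-matches into SPPF (det model.9).
                out[k.replace("model.9.", "model.10.", 1)] = v
            else:
                out[k] = v  # the branch the buggy version was missing
        return out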
Fatih Akyon dcadf170be
feat: add cls-to-det weight remap callback for C2PSA transfer
cls model.9 (C2PSA) maps to det model.10 (C2PSA) due to SPPF insertion
at det model.9. Remaps keys before intersect_dicts so C2PSA weights
transfer correctly (+42 params over standard loading).
2026-04-11 13:00:40 -05:00
Fatih Akyon f6d35ccc51
fix: add missing ClassificationDataset args to kNN eval
scale, fliplr, flipud, hsv_h/s/v are accessed unconditionally in
ClassificationDataset.__init__ at dataset.py:746, even with augment=False.
2026-04-11 12:59:21 -05:00
Fatih Akyon 565b8219cf
feat: add ImageNet kNN evaluation for distilled feature quality
RADIO/EUPE protocol (k=20, T=0.07): extract L2-normalized CLS features,
temperature-weighted voting via scatter_add_. Includes callback for
on_fit_epoch_end integration with WandB logging.
2026-04-11 11:36:51 -05:00
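A minimal sketch of the temperature-weighted vote (assumes features are already L2-normalized, per the protocol above):

    import torch

    def knn_predict(test_feats, train_feats, train_labels, num_classes,
                    k=20, T=0.07):
        # Cosine similarity is a plain matmul on L2-normalized features.
        sims = test_feats @ train_feats.T              # (Q, N)
        topk_sims, topk_idx = sims.topk(k, dim=1)      # (Q, k)
        weights = (topk_sims / T).exp()                # temperature weighting
        votes = torch.zeros(test_feats.size(0), num_classes,
                            device=test_feats.device)
        votes.scatter_add_(1, train_labels[topk_idx], weights)
        return votes.argmax(dim=1)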