Switches phase1 default to the paired/train splits so gqa, flickr, and dotav1 get proper val coverage, and resamples dotav1/soda-a val sizes to match their per-source train share. Also reverts patience to 20 and phase2 pretrained back to best.pt.
Set phase1 patience=200 to avoid early stop on slow-drift epochs and load phase2 from last.pt instead of best.pt, matching UNIC/DUNE/EdgeCrafter which train fixed epochs and use the final checkpoint.
Reference docstring lives in callbacks/distill_aug.py:classify_augmentations_distill, not the reverted ultralytics/data/augment.py path. Follow-up to 79dd79181 which fixed the same stale pointer in train_image_encoder.py.
The recipe set warmup_epochs=18 to match DINOv3's 16pct ratio at 114 ep, but
the runner scales warmup by batch/512 so at batch=1024 the effective warmup
became 36 ep (31pct of training). That broke direct comparison with the
existing 7-source runs, which use 2 effective warmup epochs.
Setting warmup_epochs=1 keeps the post-scaling value at 2, matching the
running runs. Other dinov3 axes (lr, wd schedule, augs, grad_clip) unchanged.
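The scaling interaction above, as a minimal sketch (the helper name is hypothetical; the runner's rule is warmup ∝ batch/512):

```python
def effective_warmup(warmup_epochs: float, batch: int, base_batch: int = 512) -> float:
    """Warmup epochs after the runner's linear batch scaling
    (illustrative helper, not the runner's actual function)."""
    return warmup_epochs * batch / base_batch

# recipe warmup_epochs=18 at batch=1024 doubles to 36 effective epochs;
# warmup_epochs=1 keeps the post-scaling value at 2.
```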
Motivation
fastvit-s x adaptor diverges at full scale on 7-source training (final knn
5.9%, chance-level). Forensic smoke runs ruled out norm hot-swap, a beta2 sweep,
fixed-wd changes, and BN running-stat freezes. Two recipe-level mismatches
with DINOv3 / EUPE / UNIC / DUNE distillation papers remained:
* our pipeline still pulls the Ultralytics defaults RandAugment + RandomErasing
0.4 from cfg/default.yaml, while every reference recipe disables both
and instead uses ColorJitter + Grayscale + GaussianBlur + Solarize;
* we use fixed weight_decay 0.02 with ~1pct warmup, while DINOv3 ramps
wd 0.04 -> 0.2 over training and warms up for 16pct of epochs.
What changed
callbacks/distill_aug.py: classify_augmentations_distill, sibling to
ultralytics/data/augment.py:classify_augmentations. Same signature plus
grayscale, gaussian_blur, solarize knobs (default 0.0 = bit-equivalent
to upstream). Order mirrors UNIC main_unic.py:485-521. Kept out of
ultralytics/data/ to avoid touching the upstream cls training pipeline.
callbacks/wd_schedule.py: half-cosine wd ramp matching DINOv3
dinov3/optim/schedulers.py CosineSchedule, registered DDP-safe inside
the trainer __init__ (per utils/dist.py:79 callbacks-on-rank-0 footgun).
ultralytics/cfg/__init__.py: extend allowed_custom_keys with wd_end,
grayscale, gaussian_blur, solarize so DDP arg serialisation passes.
ultralytics/models/yolo/classify/train_image_encoder.py: switch
_build_transforms to classify_augmentations_distill and forward the
three new self.args knobs; register wd_schedule callback when wd_end > 0.
run_enc_distill_phase1.py: new dinov3 recipe (lr0=2e-4, wd 0.04->0.2,
warmup 18 ep, ColorJitter 0.4/0.4/0.2/0.1, grayscale 0.2, blur 0.5,
solarize 0.2, auto_augment off, erasing off) plus override forwarding.
Existing default / eupe / radio / unic recipes untouched.
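The wd ramp in callbacks/wd_schedule.py can be sketched; this assumes the DINOv3-style half-cosine shape from 0.04 to 0.2 (function name and exact parameterisation are illustrative):

```python
import math

def wd_half_cosine(step: int, total_steps: int,
                   wd_start: float = 0.04, wd_end: float = 0.2) -> float:
    """Half-cosine ramp from wd_start to wd_end over training (sketch)."""
    t = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return wd_end - 0.5 * (wd_end - wd_start) * (1.0 + math.cos(math.pi * t))
```

Returns wd_start at step 0 and wd_end at the final step, with the fastest growth mid-training.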
Previous callbacks/vit_modules.py monkey-patched parse_model (162-line
verbatim copy + one extra elif). That broke under DDP because the worker
cwd is USER_CONFIG_DIR/DDP/, so the runner-local callbacks package is
off sys.path. Import the blocks directly in tasks.py and fold them into
the AIFI elif that prepends ch[f].
Runner-side model.add_callback() was silently dropped on DDP workers, so
grad_clip, beta2 and nfs_sync never ran. Register the hooks inside
ImageEncoderTrainer so they run on every rank. Also import vit_modules at the
top of the trainer module so FastViT/SimpleViT YAMLs parse in DDP workers too.
Replace target-param comments in yolo26-{fastvit,simplevit}-cls.yaml with
measured params, ONNX node counts, and TRT fp16 latency from the 2026-04-23
export sweep (all 4 variants <=1.5x the yolo26s-cls conv baseline).
Note PaddlePaddle op-coverage gap and the RKNN torch-downgrade trap so future
sweeps skip them, and clarify that the 1327-node figure in MHSABlock refers
to the AIFI ViT, not these architectures.
Current MLP adaptor + CLS+patch-only supervision yields a 14pp kNN
gain but only +0.24pp COCO100 over the CE baseline (a tie within noise).
Detection reads raw L3/L5/L10 while distill supervises a per-teacher
MLP after the final stage, so the supervised features never reach
the detection path.
distill_path in {adaptor (default), feat_map}: feat_map routes
student L3/L5/L8 to teacher final-block tokens via 1x1 Conv per
scale with MSE, landing gradients on the same layers detection
reads (EdgeCrafter-style path alignment).
adaptor_arch in {mlp (default), linear}: linear replaces the
2-layer Linear-LN-GELU-Linear MLP with a single
Linear(in, out, bias=False). EdgeCrafter argues heavy projections
absorb the student-teacher mismatch instead of forcing it into
the backbone where detection can benefit.
loss_items tensor shape is invariant (3,) across all four combos,
so WandB plots overlay across modes. Both args registered in
allowed_custom_keys (DDP-safe). Resume guard refuses silent
switches of either arg across restart.
Defaults reproduce prior behaviour bit-identical.
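The feat_map path above can be sketched as follows; channel counts and the teacher dim are illustrative, not the actual model's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatMapDistill(nn.Module):
    """Sketch of distill_path=feat_map: one 1x1 Conv per student scale
    projects onto the teacher token dim, and MSE lands gradients on the
    layers detection reads. adaptor_arch=linear is the analogous single
    bias-free projection in place of the 2-layer MLP."""
    def __init__(self, student_chs=(128, 256, 512), teacher_dim=768):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Conv2d(c, teacher_dim, kernel_size=1, bias=False) for c in student_chs
        )

    def forward(self, student_feats, teacher_feats):
        # one MSE term per scale, summed into a single scalar loss
        return sum(
            F.mse_loss(proj(s), t)
            for proj, s, t in zip(self.proj, student_feats, teacher_feats)
        )
```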
Replace unused AIFI student (12.8x slower than conv baseline at
bs=1 fp16, 1327 ONNX nodes). FastViT-S benches 1.07ms / 228 nodes,
actually faster than yolo26s-cls conv baseline (1.83ms / 234).
SimpleViT-S aligns 14x14 tokens with EUPE-ViT-B at 224px, which
lets feat_map distillation with adaptor_arch=linear collapse to
identity + projection.
Custom modules live in ultralytics/nn/modules/vit_blocks.py
(FastViTBlock, MHSABlock). Registration into parse_model goes
through callbacks/vit_modules.py, which copies parse_model
verbatim and adds one elif branch to prepend ch[f] for these
modules; avoids editing ultralytics/nn/tasks.py.
Simple-component constraint only: Conv2d, BatchNorm2d, LayerNorm,
GELU, Linear, F.scaled_dot_product_attention (no nn.MultiheadAttention,
no 2D RoPE) so ONNX/TRT/CoreML/TFLite export cleanly.
Scales yolo26{s,l}-{fastvit,simplevit}-cls: s ~5-7M, l ~15M params.
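A minimal attention block built from only the allowed primitives might look like this (a sketch of the constraint, not the repo's MHSABlock; dims are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleAttention(nn.Module):
    """MHSA from export-friendly primitives only: Linear + SDPA,
    no nn.MultiheadAttention, no 2D RoPE."""
    def __init__(self, dim: int = 384, heads: int = 6):
        super().__init__()
        self.h, self.d = heads, dim // heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, dim)
        B, N, _ = x.shape
        q, k, v = self.qkv(x).view(B, N, 3, self.h, self.d).permute(2, 0, 3, 1, 4)
        out = F.scaled_dot_product_attention(q, k, v)  # (B, h, N, d)
        return self.proj(out.transpose(1, 2).reshape(B, N, -1))
```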
Lets users train from random init so pretrained-backbone runs can be compared against a no-pretraining control, quantifying the net contribution of the pretraining stage to final downstream accuracy.
Completes OBB coverage for encoder distillation downstream eval alongside coco det/pose and imagenet cls. Params mirror the canonical yolo26s-obb.pt (bs=32, nbs=64, lr0=0.00125, imgsz=1024, degrees=180, MuSGD muon_w=0.5) so baseline runs are directly comparable to the paper's 54.8 mAP reference, using the same --batch/--lr/--nbs linear scaling as coco_det_finetune.
Scales lr/nbs/warmup linearly from canonical bs=128/nbs=64/lr0=0.00038 so wd_eff and lr/sample stay invariant. Adds _COCO_DET_MODES constant and per-mode flag semantics in docstring.
Phase 2c pose runs were blocked because the runner had no pose branch;
adds coco_pose_finetune (data=coco-pose.yaml, MuSGD, pose=24, kobj=4.0)
that infers the -pose yaml from the phase1 cls yaml.
Aligns coco_det_finetune args with the published yolo26s.pt detection
recipe so phase2 coco runs match the official model's training setup.
Previously the branch drifted (missing nbs=64, cos_lr=False,
warmup_momentum/bias_lr, box/cls/dfl weights, randaugment, cutmix,
copy_paste_mode, translate/degrees/shear/hsv/erasing, muon_w=0.4355),
which made backbone comparisons against the 30.18 mAP CE baseline hard
to interpret. sgd_w/cls_w/o2m/detach_epoch from the reference aren't
accepted by this checkout's cfg validator, so only the exposed subset
is applied.
Renames modes with task prefixes so logs and wandb groups are
unambiguous: finetune -> inet_finetune, linear -> inet_linear_probe,
adamw_ft -> inet_adamw_finetune, coco_det(_frozen) ->
coco_det_finetune(_frozen). The muon_w=0.1 callback is now gated to
inet_finetune only; coco det uses muon_w=0.4355 from the published
recipe.
Ultralytics scales wd_eff with batch*accumulate/nbs but never scales lr0, so larger
global batches silently drift from the recipe's intended dynamics. The new flag takes
a per-GPU batch, computes global = per_gpu * world_size, and derives lr0, nbs, and
warmup_epochs from scale = max(1, global / NBS_CANONICAL=512) so wd_eff stays at the
recipe value while per-sample lr and optimizer-step warmup count are invariant.
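The derivation can be sketched as below; the helper name and default values are illustrative, not the flag's actual signature:

```python
def scale_recipe(per_gpu_batch: int, world_size: int,
                 lr0: float = 0.00038, nbs: int = 64, warmup_epochs: float = 3.0,
                 nbs_canonical: int = 512) -> dict:
    """Sketch of the scaling rule. wd_eff ~ batch/nbs stays at the recipe
    value because nbs scales with the global batch; per-sample lr stays
    fixed because lr0 scales linearly; the warmup optimizer-step count
    stays fixed because warmup_epochs grows as steps-per-epoch shrink."""
    global_batch = per_gpu_batch * world_size
    scale = max(1, global_batch / nbs_canonical)
    return dict(lr0=lr0 * scale,
                nbs=round(nbs * scale),
                warmup_epochs=warmup_epochs * scale)
```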
_resolve_paths' flat-dir fallback returned (p, p), which in
multi-source mode duplicated train files into the val
ConcatDataset: on the 7-source mix 844,176 of 899,176 val
samples (93.9%) were just re-enqueued train files, making val
loss meaningless as a held-out signal. Regression introduced
when multi-path support was added in 1aea2f95c.
Resolver now returns (train, None) when no held-out val is
discoverable, and additionally swaps the last `train` path
segment for `val` to auto-rescue deep layouts like
.../images/train → .../images/val (recovers O365 30k, DOTA
5,297 held-out without caller changes). get_dataset filters
None so flat sources (GQA, Flickr, SODA) drop cleanly from
val instead of polluting it.
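The rescue rule amounts to a path-segment swap; a minimal sketch with a hypothetical helper name (the real resolver additionally checks that the candidate exists):

```python
from pathlib import Path
from typing import Optional

def swap_train_for_val(path: Path) -> Optional[Path]:
    """Swap the last 'train' segment for 'val'; None means no held-out
    split is derivable and the source is dropped from val."""
    parts = list(path.parts)
    for i in range(len(parts) - 1, -1, -1):
        if parts[i] == "train":
            return Path(*parts[:i], "val", *parts[i + 1:])
    return None  # flat source: filtered out of the val ConcatDataset
```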
Ultralytics check_resume (trainer.py:841) restores the checkpoint's data path verbatim and does not honor caller overrides, so cross-host resumes where the dataset lives at a different mount point (e.g. the ultra5 NFS outage) previously needed a manual torch.load/save dance to rewrite train_args. Mirrors the existing name/device override branches so one helper call covers all four non-whitelisted fields (project, name, save_dir, data).
Phase2 had hardcoded per-mode lr0 (0.1 for MuSGD finetune, 1e-3 for AdamW), with no way to change it at launch without editing the file. Mirrors phase1's _pop_flag pattern so users can sweep learning rates or drop lr on resume runs that are diverging. CLAUDE.md already documented phase2 as supporting --lr; this makes the doc true.
Add callbacks.paths with run_paths() and patch_resume() helpers so fresh runs land on clean W&B project yolo-next-encoder while save_dir stays absolute local, and resumes auto-patch train_args to survive cross-machine / relocated launches without manual checkpoint edits.
Adds callbacks.wandb_config.fork_and_attach, which pre-creates a forked wandb run (native fork_from or a manual API-replay fallback) and hands off to DDP rank-0 via the WANDB_RUN_ID+WANDB_RESUME env vars. phase1/phase2 gain an explicit module-level assert that LOCAL_PROJECT is absolute under /home/, and a --fork_from <parent_id>:<step> flag that invokes the helper before model.train(). Native fork is currently gated behind a wandb private preview, so the default path is API-replay; smoke-tested end-to-end with subprocess handoff.
Ultralytics check_resume overwrites args.project from the ckpt (only whitelisted keys can override), so resuming a legacy NFS ckpt keeps save_dir on NFS. nfs_sync now warns without raising on an NFS save_dir and wraps the final sync, and phase1/phase2 pin project=LOCAL_PROJECT so fresh runs land on local SSD explicitly.
Writing save_dir to local SSD and rsyncing to NFS every 10min decouples training from NFS availability, avoiding a repeat of the C2-o365-coco-inet crash where a stale NFS mount destroyed the resumed run EMA state.
UNIC trains on ImageNet-1k (main_unic.py:97), DUNE on IN-19k+GLDv2+
Mapillary (data/dino2.py). Comma-separated data paths now supported
for combining ImageFolder datasets via ConcatDataset. Loss args
(cos_weight, l1_weight, cls_l1) passed from trainer to model.
UNIC (unic/modeling/losses.py:54) and DUNE (dune/model/losses.py:62)
apply 0.5cos+0.5L1 to both CLS and patches, vs our EUPE-style
cosine-only CLS. Configurable via train args for ablation testing.
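The combined objective, as a sketch (arg names mirror the cos_weight/l1_weight train args; applying it to CLS and patch tokens alike is the UNIC/DUNE behaviour described above):

```python
import torch
import torch.nn.functional as F

def distill_loss(student: torch.Tensor, teacher: torch.Tensor,
                 cos_weight: float = 0.5, l1_weight: float = 0.5) -> torch.Tensor:
    """0.5*cos + 0.5*L1 over token embeddings (..., N, D) — a sketch,
    not the repo's exact loss module."""
    cos = 1.0 - F.cosine_similarity(student, teacher, dim=-1).mean()
    l1 = F.l1_loss(student, teacher)
    return cos_weight * cos + l1_weight * l1
```

Setting cos_weight=1.0, l1_weight=0.0 recovers the EUPE-style cosine-only objective for ablation.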
kNN eval now runs inside ImageEncoderTrainer.validate() directly.
The callback closure is no longer needed. extract_features and
knn_accuracy remain as utilities for run_knn_eval.py standalone eval.
Remove knn_callback import and model.add_callback call. Pass
knn_eval=/data/shared-datasets/imagenet in train_args instead,
which ImageEncoderTrainer reads in _setup_train (DDP-safe).
Move kNN eval from external callback (lost in DDP subprocess) to trainer
validate() override. Enabled via knn_eval=<imagenet_path> in train_args,
which survives DDP serialization through allowed_custom_keys. Caches
dataloaders across epochs, runs every 5 epochs on rank 0 only.
Remap caused 17.77% mAP vs 28.02% without it. phase2-coco-d5 was
invalid (used remap unintentionally). Revert to standard pretrained=
which transfers backbone layers 0-8 via intersect_dicts.
Load distilled cls weights with C2PSA index remapping (cls model.9 ->
det model.10) in coco_det mode. Tested: remap transfers 228 vs 192
params but produced worse mAP (17.77% vs 28.02%) due to activation
magnitude mismatch. Kept for future investigation with scaling fix.
Takes a run directory, finds weights and model config from args.yaml,
runs kNN evaluation (k=20, T=0.07), and optionally updates the finished
WandB run summary with knn/top1 via --wandb flag.
Add aggregated cls_cos, patch_cos, patch_l1 metrics averaged across
teachers for cross-run comparison in WandB. Define epoch-based x-axis
via wandb.define_metric so backfilled and new runs align.
The else branch was missing, so cls model.9 (C2PSA) BN keys were both
remapped to model.10 AND kept as model.9. intersect_dicts shape-matched
6 C2PSA BN stats into SPPF (det model.9), corrupting initialization.
phase2-coco-d1-remap showed 11pp deficit vs non-remap at ep58.
cls model.9 (C2PSA) maps to det model.10 (C2PSA) due to SPPF insertion
at det model.9. Remaps keys before intersect_dicts so C2PSA weights
transfer correctly (+42 params over standard loading).
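The corrected remap amounts to a key rewrite before intersect_dicts; a minimal sketch with a hypothetical helper name:

```python
def remap_cls_to_det(cls_state: dict) -> dict:
    """cls model.9 (C2PSA) -> det model.10, since SPPF occupies det model.9.
    Keys are moved, never duplicated, so intersect_dicts cannot
    shape-match C2PSA BN stats into SPPF."""
    out = {}
    for k, v in cls_state.items():
        if k.startswith("model.9."):
            out["model.10." + k[len("model.9."):]] = v
        else:  # the originally-missing branch: copy everything else unchanged
            out[k] = v
    return out
```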
RADIO/EUPE protocol (k=20, T=0.07): extract L2-normalized CLS features,
temperature-weighted voting via scatter_add_. Includes callback for
on_fit_epoch_end integration with WandB logging.
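The voting step, as a sketch of the stated protocol (function name and the explicit num_classes arg are illustrative):

```python
import torch
import torch.nn.functional as F

def knn_predict(train_feats, train_labels, test_feats,
                num_classes: int, k: int = 20, T: float = 0.07):
    """RADIO/EUPE-style kNN: L2-normalised features, cosine similarity,
    temperature-weighted class voting accumulated with scatter_add_."""
    train_feats = F.normalize(train_feats, dim=1)
    test_feats = F.normalize(test_feats, dim=1)
    sim = test_feats @ train_feats.T              # cosine similarity
    dist, idx = sim.topk(k, dim=1)                # k nearest neighbours
    weights = (dist / T).exp()                    # temperature weighting
    votes = torch.zeros(test_feats.size(0), num_classes)
    votes.scatter_add_(1, train_labels[idx], weights)
    return votes.argmax(dim=1)
```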