Add LWDetr model #40991
Merged
Changes from all commits (85 commits)
23631b1
feat: add LWDetr model
sbucaille 0eb69e6
fix: changed LwDetrVit base classes from VitDet to ViT
sbucaille 12fa5e2
tests: added tests for LWDetr
sbucaille aceb10c
refactor: fix all issues and created docs
sbucaille bd48206
tests: added missing lw_detr_vit tests
sbucaille 06c7d70
docs: add lwdetr docs
sbucaille a2ef8c3
fix: fixed implementation error and associated tests
sbucaille 9faeaee
chore: removed testing lib in imports
sbucaille 0fad340
refactor: replace LwDetrImageProcessor with DeformableDetrImageProcessor
sbucaille a89f8f2
refactor: remove two-stage detection and bounding box reparameterizat…
sbucaille e942891
refactor: rename LwDetrCSPRepLayer to LwDetrC2FLayer
sbucaille f9e9631
refactor: introduce LwDetrMLP for feedforward layers in decoder
sbucaille 9cf545d
refactor: replace build_position_encoding with LwDetrSinePositionEmbe…
sbucaille 199b2bc
refactor: remove use_cae parameter and related logic from configurati…
sbucaille ab6096f
refactor: remove unused variables and simplify certain instructions
sbucaille b95de12
refactor: removed unnecessary one line instruction method with_pos_embed
sbucaille 0af67c2
refactor: use llama attention formatting for hidden shape
sbucaille 15625a5
docs: add comments about group detr
sbucaille 22d66d2
fix: removed wrong sigmoid and fixed init for class_embed
sbucaille 5b7f657
refactor: removed unused positional embeddings classes and weights fr…
sbucaille b075292
chore: removed unused import
sbucaille d863a68
chore: make style and repo-consistency after positional embeddings re…
sbucaille d6fdd91
refactor: removed unused drop path rate
sbucaille 13ad4a8
fix: ingest latest changes from rebase
sbucaille 8147b45
fix: attn_implementation setter
sbucaille 25fbaab
fix: is causal set to False
sbucaille d5b24a6
refactor: renamed ffn to mlp and moved layer norm out of mlp
sbucaille fa6deed
fix: check model inputs
sbucaille 75b3f1f
fix: moved super init call in LwDetrConfig
sbucaille f998abb
fix: super class in GradientCheckpointingLayer
sbucaille e164b8a
fix: replaced RTDetr occurences by LwDetr in test modeling file
sbucaille 05afaa7
refactor: removed head_mask from LwDetrViT
sbucaille ff48821
docs: added release date in docs
sbucaille c627a2a
fix: added missing attention mask argument
sbucaille 755c5b8
chore: make style & repo-consistency
sbucaille 8b72816
fix: ensure tensor dtype consistency in loss calculations and test cases
sbucaille 2b9ebff
docs: fixed model release date
sbucaille 6e4f583
refactor: removed unnecessary module cloning
sbucaille 97c1d37
tests: added missing _prepare_for_class method and removed batching_e…
sbucaille abae375
tests: added xlarge integration test
sbucaille e037e63
chore: added lw_detr reference in image processing auto
sbucaille 9deff21
chore: removed unnecessary properties from LwDetrConfig
sbucaille be4fc9f
fix: fix for latest main changes
sbucaille 7556efc
fix: apply modular changes from mail
sbucaille 77a94e7
docs: update model doc and docstrings
sbucaille fec5db9
fix: style
sbucaille 4cdc807
fix: update output values in convert script
sbucaille df6f2ed
feat: added proper last_hidden_states in LwDetrDecoderOutput and sepa…
sbucaille 138b009
fix: guard accelerate imports
sbucaille d71dbb8
fix: removed LWDetrConfig attribute map and changed LwDetrAttention i…
sbucaille f9e60b4
fix: parameterize amap based on config
sbucaille 635f527
fix: remove redundant decorator
sbucaille eeac74a
chore: moved LwDetrViT to LwDetr single modular file
sbucaille 514536e
fix: remove unnecessary attribute_map in LwDetrViT
sbucaille e82f6d5
chore: simplified LwDetr modules methods with proper hidden_states re…
sbucaille 13e9aa3
fix: replaced hardcoded value by variable
sbucaille 082715b
tests: added VitDet and attention tests
sbucaille 74c47a7
fix: modular conversion
sbucaille 865739f
tests: moved LwDetrViT tests to test_modeling_lw_detr file
sbucaille dec88cd
docs: add lwdetr advances in docs
sbucaille b7821a3
refactor: removed arguments to classes as much as possible and rely o…
sbucaille c9809f8
Merge branch 'main' into add_lw_detr
Cyrilvallez ad93a7b
reapply style, remove LlamaAttention inheritance to remove decorator
Cyrilvallez e5a0446
chore: updated licence and year
sbucaille 003d63d
fix: removed torch.nn.functional from modular
sbucaille 99490ca
docs: removed redundant docstring arguments covered by autodocstring …
sbucaille 11727ab
refactor: removed backbone api statements
sbucaille 8b8feb8
fix: added back num_key_value_groups in LwDetrAttention
sbucaille c68e713
chore: removed unnecessary copied from statement
sbucaille d0cfb7a
chore: moved LwDetrViT modules above LwDetr modules
sbucaille c03d5ee
tests: removed unnecessary overwrite and “test_” attributes
sbucaille 3342ea8
docs: added missing docs
sbucaille b01b7c5
Merge remote-tracking branch 'upstream/main' into add_lw_detr
sbucaille 8e2753e
style: remove unnecessary parentheses
sbucaille 06ac9fb
docs: added back logits docstring
sbucaille b150f23
docs: added docs dates
sbucaille 6f89388
Merge branch 'main' into add_lw_detr
Cyrilvallez 8a7818f
style details
Cyrilvallez e5c20a0
unessecary utf8
Cyrilvallez c745f90
might as well skip all config checks
Cyrilvallez 098fb4d
embeddings are large, increase model_split_percents
Cyrilvallez 93ba55a
fix device issue
Cyrilvallez aabdb76
update logits
Cyrilvallez 0493e06
set device in expectations
Cyrilvallez 1312458
add to toctree
Cyrilvallez
<!--Copyright 2026 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
*This model was released on 2024-04-05 and added to Hugging Face Transformers on 2026-01-10.*

<div style="float: right;">
    <div class="flex flex-wrap space-x-1">
        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
    </div>
</div>

# LW-DETR

[LW-DETR](https://huggingface.co/papers/2407.17140) proposes a lightweight Detection Transformer (DETR) architecture designed to compete with and surpass the dominant YOLO series for real-time object detection. It achieves a new state-of-the-art balance between speed (latency) and accuracy (mAP) by combining recent transformer advances with efficient design choices.

The LW-DETR architecture has a simple and efficient structure: a plain ViT encoder, a projector, and a shallow DETR decoder.
It modifies the DETR architecture for efficiency and speed in four main ways:

1. Efficient ViT encoder: uses a plain ViT with interleaved window/global attention and a window-major organization to drastically reduce attention complexity and latency.
2. Richer input: aggregates multi-level features from the encoder and uses a C2f projector (from YOLOv8) to pass two-scale features ($1/8$ and $1/32$) to the decoder.
3. Faster decoder: employs a shallow 3-layer DETR decoder with deformable cross-attention for lower latency and faster convergence.
4. Optimized queries: uses a mixed-query scheme combining learnable content queries and generated spatial queries.
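The window attention in point 1 can be illustrated with a small, self-contained sketch. This is a toy in plain NumPy (`window_partition`, the feature-map sizes, and the cost model are illustrative assumptions, not the actual Transformers implementation): partitioning an H×W feature map into non-overlapping windows makes attention cost quadratic in the window size instead of the full token count.

```python
import numpy as np

def window_partition(x, window_size):
    # x: (H, W, C) feature map -> (num_windows, window_size * window_size, C)
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size * window_size, C)

def attention_cost(num_tokens, dim):
    # rough cost of QK^T plus AV for one attention layer, ignoring projections
    return 2 * num_tokens * num_tokens * dim

H = W = 64; C = 256; ws = 16
x = np.random.rand(H, W, C).astype(np.float32)

wins = window_partition(x, ws)            # (16, 256, 256): 16 windows of 256 tokens
global_cost = attention_cost(H * W, C)    # one global-attention layer over all 4096 tokens
window_cost = wins.shape[0] * attention_cost(ws * ws, C)  # attention inside each window
print(wins.shape)                          # (16, 256, 256)
print(global_cost // window_cost)          # 16: windowed layer is 16x cheaper here
```

Because only the interleaved global layers pay the full quadratic cost, most layers run at the much cheaper windowed cost, which is where the latency savings come from.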
You can find all the available LW-DETR checkpoints under the [stevenbucaille](https://huggingface.co/stevenbucaille) organization.
The original code can be found [here](https://github.com/Atten4Vis/LW-DETR).

> [!TIP]
> This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
>
> Click on the LW-DETR models in the right sidebar for more examples of how to apply LW-DETR to different object detection tasks.

The example below demonstrates how to perform object detection with the [`Pipeline`] and the [`AutoModel`] class.
<hfoptions id="usage">
<hfoption id="Pipeline">

```python
from transformers import pipeline
import torch

detector = pipeline(
    "object-detection",
    model="stevenbucaille/lwdetr_small_60e_coco",
    dtype=torch.float16,
    device_map=0
)

detector("http://images.cocodataset.org/val2017/000000039769.jpg")
```

</hfoption>
<hfoption id="AutoModel">

```python
from transformers import AutoImageProcessor, AutoModelForObjectDetection
from PIL import Image
import requests
import torch

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("stevenbucaille/lwdetr_small_60e_coco")
model = AutoModelForObjectDetection.from_pretrained("stevenbucaille/lwdetr_small_60e_coco")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# convert normalized predictions back to pixel coordinates in the original image
results = image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3
)

for result in results:
    for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
        score, label = score.item(), label_id.item()
        box = [round(i, 2) for i in box.tolist()]
        print(f"{model.config.id2label[label]}: {score:.2f} {box}")
```

</hfoption>
</hfoptions>
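The post-processing step in the example above can be sketched in plain NumPy to show roughly what `post_process_object_detection` does for a DETR-style head: score each query with a sigmoid over class logits, drop low-confidence queries, and denormalize the `(cx, cy, w, h)` boxes to pixel `(x0, y0, x1, y1)` coordinates. This is an illustrative simplification (no batching, one label per query), not the actual Transformers implementation.

```python
import numpy as np

def cxcywh_to_xyxy(boxes):
    # convert (cx, cy, w, h) boxes to (x0, y0, x1, y1) corner format
    cx, cy, w, h = boxes.T
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

def post_process(logits, pred_boxes, target_size, threshold=0.3):
    # logits: (num_queries, num_classes); pred_boxes: (num_queries, 4), normalized cxcywh
    probs = 1 / (1 + np.exp(-logits))   # per-class sigmoid scores, DETR-style
    scores = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = scores > threshold           # discard low-confidence queries
    h, w = target_size                  # (height, width) of the original image
    boxes = cxcywh_to_xyxy(pred_boxes[keep]) * np.array([w, h, w, h])
    return scores[keep], labels[keep], boxes

# two toy queries: one confident detection, one background query below threshold
logits = np.array([[2.0, -3.0], [-5.0, -4.0]])
boxes = np.array([[0.5, 0.5, 0.2, 0.4], [0.1, 0.1, 0.1, 0.1]])
scores, labels, xyxy = post_process(logits, boxes, target_size=(480, 640))
print(scores, labels, xyxy)  # one kept box: [256. 144. 384. 336.], label 0
```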
## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with LW-DETR.

<PipelineTag pipeline="object-detection"/>

- Scripts for finetuning [`LwDetrForObjectDetection`] with [`Trainer`] or [Accelerate](https://huggingface.co/docs/accelerate/index) can be found [here](https://github.com/huggingface/transformers/tree/main/examples/pytorch/object-detection).
- See also: [Object detection task guide](../tasks/object_detection).
## LwDetrConfig

[[autodoc]] LwDetrConfig

## LwDetrViTConfig

[[autodoc]] LwDetrViTConfig

## LwDetrModel

[[autodoc]] LwDetrModel
    - forward

## LwDetrForObjectDetection

[[autodoc]] LwDetrForObjectDetection
    - forward

## LwDetrViTBackbone

[[autodoc]] LwDetrViTBackbone
    - forward
It may be nice to (very) briefly describe or mention the recent advances and design choices :)