LLaMA Implementation by zphang · Pull Request #21955 · huggingface/transformers

Fix 2 quicktour file doctest (#21742)
Update expect output values - as Hub repo. files are updated
Update expect output values - as librosa is from 0.9.2 to 0.10.0 on CI docker
fix
update one more

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

[GPTNeo] Fix gradient checkpointing bug (#21733)
fix bug
forward contrib credits from discussions
change logic

Co-authored-by: edbeeching edbeeching@users.noreply.github.com

Generate: Fix GIT batched captioning (#21738)
Skip test_log_level for now
Added Type Hints for modeling_tf_encoder_decoder.py (#21673)
Ran Black formatting
Added imports and reformatted
Update src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py

Co-authored-by: Matt Rocketknight1@users.noreply.github.com

Auto api Value Error addition to Troubleshoot (#21708)
troubleshooting guide: added an error description for missing auto-mapping
minor polishing
changed the example
Apply suggestions from code review

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/troubleshooting.mdx

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

[deepspeed tests] fix issues introduced by #21700 (#21769)
[deepspeed tests] fix issues introduced by #21700
fix
fix
Graphormer fix (#21699)
Removed useless check for backend
fix style check for graphormer
Reverted change and corrected requires_backend for cython
code qual
fix: Change is_last chunk calc and add conditional break in chunk_iter (#21612)
fix: Change is_last chunk calc and add conditional break
format fix
account for 0 and full stride_rights, add comment
add new test
make style
update slow whisper asr test timestamps
use nested_simplify on output and round timestamp to hundredths place
[Flax] adding support for batch norm layers (#21581)
[flax] adding support for batch norm layers
fixing bugs related to pt+flax integration
cleanup, batchnorm support in sharded pt to flax
support for batchnorm tests in pt+flax integration
simplifying checking batch norm layer
[Examples] Generalise run audio classification for log-mel models (#21756)
[Examples] Generalise run audio classification for log-mel models
batch feature extractor
make style
Different behavior in DistilBERT when using "inputs_embeds" (#21752)
Different behavior in DistilBERT when using "inputs_embeds" Fixes #21089
fix failing test
[Flax] Fix erroneous kwargs being passed to generate config (#21765)
[Whisper] Add SpecAugment (#21298)
Return and rescale attention_mask
Add SpecAugment to Whisper modeling
Fix test
Update docstring
Add SpecAug related parameters to model config
Add the _mask_input_features function to doc
Fix quality
Apply suggestions from code review

Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com

Remove dev comments
Add test
Resolve conflict
feat: mask {feature, time} prob fast tests
Apply suggestions from code review

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com Co-authored-by: sanchit-gandhi sanchit@huggingface.co Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Fix-ci-whisper (#21767)
fix history
input_features instead of input ids for TFWhisper doctest
use translate instead of transcribe
Generate - update cookie cutters to not initialize cache with training and gradient checkpointing (#21759)
[time series] updated expected values for integration test. (#21762)
updated expected
prediction_length fix
prediction_length default value
default prediction_length 24
revert back prediction_length default
move prediction_length test
[GPT2, ProphetNet] Fix gradient checkpointing bug (#21772)
fix gradient checkpointing bug
fix gradient checkpointing bug
ran make fix-copies
fixed bug
fixed bug
[SpeechT5] Fix HiFiGAN tests (#21788)
Fix resume_from_checkpoint for deepspeed (#21735)
Fix resume_from_checkpoint for deepspeed

Fix resume_from_checkpoint for deepspeed, by ensuring that the deepspeed engine is the one to load the checkpoint.

Empty commit to trigger CI
Removed deepspeed skipping

Removed deepspeed skipping inside the _load_from_checkpoint function, as it is obsolete

another adjustment
Trigger CI
trigger circleci
style

Co-authored-by: ydshieh ydshieh@users.noreply.github.com Co-authored-by: Stas Bekman stas@stason.org

[examples/summarization] deal with max_length and num_beams (#21740)
Override the decoding parameters of Seq2SeqTrainer
Fix quality
Fix max_length parameter
Fix quality
Remove redundant parameter max_length
Separate the preprocess of train and validation to use different max_target_length
Fix type in gpt2 config docstring (#21782)

Fix docstring gpt2 config

Fix en documentation typos (#21799)
fix wrong url
typos in english documentation
[FX tracer] Make concrete_args from outside available (#21775)

make concrete_args from outside available

[Pipeline] Add zero shot audio classification pipeline (#21600)
add pipeline
update init
add zero shot to init
update inits and correct checkpoints
update base to support input features
add tests
Update src/transformers/pipelines/zero_shot_audio_classification.py

Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com

Update src/transformers/pipelines/zero_shot_audio_classification.py

Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com

update pipeline code
use tiny checkpoint
nits and expected value with tiny model
style
last nit on tests values
fix styling
fix collate fn that was casting to float
update

Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com

[torch] remove deprecated uint8 in favor of bool (#21384)
uint8 -> bool
fix copies
style
update test modeling common when checking attention buffers
style
use logical not on random mask instead of subtraction with 1
remove torch uint8
quality
remove modified modeling utils
Update based on review

Co-authored-by: sgugger sylvain.gugger@gmail.com

[tests] add accelerate marker (#21743)
add accelerate marker
add to docs
Update docs/source/en/testing.mdx
Fix PyTorch Perceiver PerceiverFourierPositionEncoding with fp16 (#21787)
fix perceiver fp16
hopefully fix tests
Fix nn.init.trunc_normal_ call on torch.float16 data (#21789)

fix nn.init.trunc_normal_ call on half data

Fix gradient checkpointing bug in gptneox (#21815)
Fix gradient checkpointing bug in gptneox
Remove use_cache block
Inheritance-based framework detection (#21784)
Fix quality with ruff==0.0.253 (#21828)

fix quality with ruff 0.0.253

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

introduce logger.warning_once and use it for grad checkpointing code (#21804)
logger.warning_once
style
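Note on logger.warning_once: the idea is that a warning raised inside a hot path (for example on every forward pass under gradient checkpointing) is emitted only the first time. A minimal sketch of one way to get that behaviour with the standard library follows; it memoises the call on the message text. The logger name, the helper classes and the message wording are assumptions made for illustration and are not taken from the PR.

```python
import functools
import logging

logger = logging.getLogger("warning_once_demo")  # hypothetical logger name


@functools.lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    """Emit `message` as a warning the first time it is seen.

    Repeated calls with the same text hit the cache and log nothing.
    """
    logger.warning(message)


class ListHandler(logging.Handler):
    """Collects emitted messages so the effect can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

for _ in range(3):
    # Stands in for a training step that would otherwise warn every time.
    warning_once("`use_cache=True` is incompatible with gradient checkpointing.")

print(len(handler.messages))
```

Because the cache key is the message string, two different messages are each logged once; a message that embeds a changing value (a step counter, say) would defeat the deduplication.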
Rename MobileViTModelTest to TFMobileViTModelTest (#21825)

Let's give TF a bit more love ❤️ 🙏

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Fix gradient checkpointing bug BioGpt (#21844)

Co-authored-by: saswatmeher saswatmeher@cse.iitb.ac.in

check for None forced tokens (#21793)
Fix gradient checkpointing bug in git (#21818)

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Fix gradient checkpointing imagegpt (#21816)
Fix gradient checkpointing bug in gptneox
Fix gradient checkpointing bug in modeling_imagegpt.py
Revert gpt neox changes

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Fix tf random token masking probability in data collator (#21834)
fix tf random mask tokens probability
fix tf random mask tokens probability in collator for language modelling
[T5] Fix torchquant issue (#21843)
fix torchquant issue
add tests
[Blip2] Add Blip2Model (#21817)
add v1
add Blip2Model

add relevant functions
add tests
add on automapping

fix docs
fix doctest
Fix the issue of blip model returning loss even when the label is not provided. (#21811)
Fix the issue of blip model returning loss even when the label is not provided
Fix ruff failure
Incorporate PR feedbacks
Incorporate PR feedbacks
Incorporate PR feedbacks
Incorporate PR feedbacks
[GPTJ] Fix gradient checkpointing bug (#21794)
If applied, this commit fixes generate bug in gptj
Remove extra same code block
formatting and test fix
Conflict fix and declaration error fix

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Add: task guide for zero shot object detection (#21829)
zero shot object detection part 1
added batch prediction section
added image guided object detection section
make style
added the task guide to the TOC
minor polishing
Apply suggestions from code review

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com Co-authored-by: Alara Dirik 8944735+alaradirik@users.noreply.github.com

added embedded owlvit demo
Apply suggestions from code review

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

minor fix
make style

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com Co-authored-by: Alara Dirik 8944735+alaradirik@users.noreply.github.com Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Make Slack CI reporting stronger (#21823)
Use token
Avoid failure
better error
Fix
fix style

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

[Blip2] Fix Blip-2 multi gpu (#21707)
fix blip multi gpu
fix
final changes
adapt suggestions
fix failing slow test
forward contrib credits from testing and suggestions
reformat

Co-authored-by: akkikiki akkikiki@users.noreply.github.com

Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval (#21684)
Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval
minor fix return_dict
implement test for loss computation

Co-authored-by: Tiep Le 97980157+tileintel@users.noreply.github.com Co-authored-by: Tiep Le tiep.le@intel.com

🔥Rework pipeline testing by removing PipelineTestCaseMeta 🚀 (#21516)
Add PipelineTesterMixin
remove class PipelineTestCaseMeta
move validate_test_components
Add for ViT
Add to SPECIAL_MODULE_TO_TEST_MAP
style and quality
Add feature-extraction
update
raise instead of skip
add tiny_model_summary.json
more explicit
skip tasks not in mapping
add availability check
Add Copyright
A way to disable irrelevant tests
update with main
remove disable_irrelevant_tests
skip tests
better skip message
better skip message
Add all pipeline task tests
revert
Import PipelineTesterMixin
subclass test classes with PipelineTesterMixin
Add pipeline_model_mapping
Fix import after adding pipeline_model_mapping
Fix style and quality after adding pipeline_model_mapping
Fix one more import after adding pipeline_model_mapping
Fix style and quality after adding pipeline_model_mapping
Fix test issues
Fix import requirements
Fix mapping for MobileViTModelTest
Update
Better skip message
pipeline_model_mapping could not be None
Remove some PipelineTesterMixin
Fix typo
revert tests_fetcher.py
update
rename
revert
Remove PipelineTestCaseMeta from ZeroShotAudioClassificationPipelineTests
style and quality
test fetcher for all pipeline/model tests

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Improve TF weight loading, especially PT crossloading (#21792)
First commit for the improved PT-TF weight loading
Remove workarounds from TFEncoderDecoder tests
Allow a custom weight renaming function in from_pretrained and use that to clean up EncoderDecoder
make fixup
First attempt at visionencoderdecoder
Disable tensorfloat32 in tests to get consistent outputs
Quick fix to tf_vision_encoder_decoder tests
make fixup
Update Blenderbot tests
Remove unused arg in modeling_tf_opt
load_tf_sharded_weights had strict=True! This meant transfer learning was impossible, so I'm setting it to False.
Support prefixes when loading sharded TF checkpoints
make fixup
Add test to load sharded models with a weight prefix
Fix sharded weight loading test
Add a test for transfer from a sharded checkpoint
make fixup
Add test to check that crossloading from PT with a prefix works
Refactor from_pretrained in the encoderdecoder classes
Refactor from_pretrained in the encoderdecoder classes
missmatched -> mismatched
Explicitly check for None
No comments showing my very impressive and attractive knowledge of Py3.9+
Disable TF32 across all TF tests
Fix flaky test for log level (#21776)
Fix flaky test for log level
Fix other flaky test
prepare for "floordiv is deprecated and its behavior will change in a future version of pytorch" (#20211)
rounding_mode = "floor" instead of // to prevent behavioral change
add other TODO
use torch_int_div from pytorch_utils
same for tests
fix copies
style
use relative imports when needed
Co-authored-by: sgugger sylvain.gugger@gmail.com
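Note on floor versus truncating division: the PyTorch deprecation message quoted in the title concerns `//` on tensors, which historically rounded toward zero instead of flooring, so results differ for negative operands. The commits above switch to an explicit rounding mode (`torch.div(a, b, rounding_mode="floor")`). The difference can be sketched without PyTorch on plain integers; the function names below are chosen for this illustration only.

```python
import math


def floor_div(a: int, b: int) -> int:
    """Round the quotient toward negative infinity (what Python's `//` does)."""
    return a // b


def trunc_div(a: int, b: int) -> int:
    """Round the quotient toward zero (C-style integer division)."""
    return math.trunc(a / b)


# The two modes agree for same-sign operands and differ otherwise.
for a, b in [(7, 2), (-7, 2), (7, -2)]:
    print(f"a={a:>2} b={b:>2} floor={floor_div(a, b):>2} trunc={trunc_div(a, b):>2}")
```

For index arithmetic on non-negative values the two modes coincide, which is why making the rounding mode explicit is usually a behaviour-preserving change there.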
[ConvBert] Fix #21523 (#21849)
fix reshaping Fixes #21523
add test
styling
last fixes
Update src/transformers/models/convbert/modeling_convbert.py
code quality
Flax beam search fix (#21857)
Fix gradient checkpointing bug Bart (#21866)

Co-authored-by: saswatmeher saswatmeher@cse.iitb.ac.in

[deepspeed] check whether model is NLP one instead of counting on input type (#21800)
trying to figure out whether model is NLP
drop my changes and apply easier fix
trying to handle all int input types
fix logic

Co-authored-by: Stas Bekman stas@stason.org

Change the way tensor is reshaped in BartAttention (from .view to .reshape) (#21860)
Change the .view call to .reshape
Change the .view call to .reshape to all the copies from bart attention
Fix copies and style
Fix copies and style
Fix copies and style
Fix copies and style
Fix copies and style
Revert unnecessary changes
Revert unnecessary changes
Revert unnecessary changes
Revert unnecessary changes
Italian translation of community.mdx (#21871)

Italian translation of community.mdx gh-17459

[Blip] Fix blip doctest (#21868)

fix blip doctest

Removed BLIP mention from the troubleshooting guide (#21872)

removed BLIP mention from the troubleshooting guide

update FSDP and add XLA-FSDP documentation (#21812)
update FSDP and add XLA-FSDP documentation
resolving comments
minor update
fix xla-fsdp docs
[doc] deepspeed tests (#21859)
Add an utility file to get information from test files (#21856)
Add an utility file to get information from test files

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Add check for different embedding types in examples (#21881)
Add check for different embedding types in examples
Correctly update summarization example
Make loading of pretrained gpt2 faster by avoiding initialization of Conv1D's weights (#21879)

apply normal_ after assigning weight as nn.Parameter to avoid unnecessary initialization computation

Add TFVisionTextDualEncoder (#21873)
Temporary commit to stash everything so far
Temporary commit to stash everything so far
stash commit
Refactor from_pretrained
Fix final test, make fixup
Update dummies
Add model to TEST_FILES_WITH_NO_COMMON_TESTS
Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py

Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com

Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py

Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com

Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py

Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com

Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py

Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com

Add TFVisionTextDualEncoder to utils/documentation_tests.txt
make fixup

Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com

Add ALIGN to transformers (#21741)

Adds the ALIGN model to transformers. ALIGN is introduced in "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision" by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.

Fix Gradient checkpointing bug BigBird (#21882)

Co-authored-by: saswatmeher saswatmeher@cse.iitb.ac.in

Fix WhisperModelTest (#21883)
force on the same device
fix tests

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Fix test_load_default_pipelines_pt for ClapModel (#21886)
fix tests

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

fix checkpoint (#21874)
[Refactor] Relative imports wherever we can (#21880)
initial commit
update
second batch
style
fix imports
fix relative import on pipeline
[ZAC] fix ci daily (#21893)

add correct revision after model was overwritten

Use PyAV instead of Decord in examples (#21572)
Use PyAV instead of Decord
Get frame indices
Fix number of frames
Update src/transformers/models/videomae/image_processing_videomae.py
Fix up
Fix copies
Update timesformer doctests
Update docstrings
Add inputs_embeds functionality when generating with BioGPT (#21889)
initial commit to add inputs_embeds to generation
formatting
[T5 doc] Fix confusing documentation about d_kv (#21896)
Confusing documentation in T5
Fix confusing documentation in T5 configuration file
[Whisper] Add rescaling function with do_normalize (#21263)
add zero_mean_unit_var_norm function
normalize before MEL computation
fixup
add simple test
quality
Update tests/models/whisper/test_feature_extraction_whisper.py

Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com

fixup
use attention masks if padding was applied
Update based on review

Co-authored-by: bofeng huang bofenghuang7@gmail.com

Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com Co-authored-by: bofeng huang bofenghuang7@gmail.com
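Note on zero_mean_unit_var_norm: the commits above name a zero-mean, unit-variance rescaling applied to the raw audio before the mel spectrogram is computed. A pure-Python sketch of that normalization for a single unpadded waveform is shown below. It is an illustration of the arithmetic, not the feature extractor's code: the `eps` value and the list-based signature are assumptions, and the attention-mask handling for padded batches is left out.

```python
import math
from typing import List


def zero_mean_unit_var_norm(values: List[float], eps: float = 1e-7) -> List[float]:
    """Shift a waveform to mean 0 and scale it to (approximately) variance 1.

    `eps` (an assumed default) guards against division by zero on silent input.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


waveform = [0.5, -1.0, 2.0, 0.25, -0.75]  # toy samples, not real audio
normed = zero_mean_unit_var_norm(waveform)

normed_mean = sum(normed) / len(normed)
normed_var = sum((v - normed_mean) ** 2 for v in normed) / len(normed)
print(normed_mean, normed_var)
```

With padded batches the statistics would have to be computed over the unpadded samples only, which is what the "use attention masks if padding was applied" commit refers to.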

fix typo in Bart's attention (#21898)
[GPT-J] add deprecation warning (#21869)
add deprecation warning
remove pos ids from args docstring
fix failing test
fsdp bf16 enable autocast (#21847)
Fix gradient checkpointing bug LED (#21840)

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Fix gradient checkpointing bug M2M 100 (#21841)

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Fix gradient checkpointing bug marian (#21842)

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Mark pipeline tests to skip them easily (#21887)
Mark pipeline tests to skip them easily
Mark the mixin as pipeline test
Update src/transformers/testing_utils.py

Co-authored-by: Yih-Dar 2521628+ydshieh@users.noreply.github.com

Clean up auto mapping names (#21903)
add new test
fix after new test

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Prophetnet batch dimension inversion fix (#21870)
decoder forward pass is working
no model has forward pass returning attentions
decoder ngram changed to not mix batch size
current basic forward pass returns identical result
passed test_model attentions
passed test_encoder_decoder_model_generate
passed test_headmasking
removed old block
removed comments bug/fixme
removed bug comments
applied styling
applied fix-copies
applied ngram forward comments
corrected dimension notation
applied styling and comment fixes
changed asserts for raise ValueError
changed question gen test
updated hidden_states integration test
applied styling
Make schedulers picklable by making lr_lambda fns global (#21768)
Make schedulers picklable by making lr_lambda fns global
add unused _get_constant_schedule_lr_lambda arg
remove unneeded _get_constant_schedule_lr_lambda
add test
make style
rebase, remove torch dep, put lambda back
repo-consistency and style
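Note on picklable schedulers: a learning-rate multiplier defined as a nested function or lambda cannot be pickled, because pickle stores functions by importable name. Moving the function to module level and binding its arguments with functools.partial avoids that. The sketch below illustrates the pattern with a linear warmup; all function names here are hypothetical and only loosely follow the naming in the commit titles.

```python
import pickle
from functools import partial


def _linear_warmup_lr_lambda(current_step: int, *, num_warmup_steps: int) -> float:
    """Module-level LR multiplier: ramps linearly from 0 to 1, then stays at 1."""
    return min(1.0, current_step / max(1, num_warmup_steps))


def make_closure_schedule(num_warmup_steps: int):
    """Older style: the multiplier is a nested function capturing its argument."""

    def lr_lambda(current_step: int) -> float:
        return min(1.0, current_step / max(1, num_warmup_steps))

    return lr_lambda


def make_partial_schedule(num_warmup_steps: int):
    """Picklable style: bind the argument to a global function with partial."""
    return partial(_linear_warmup_lr_lambda, num_warmup_steps=num_warmup_steps)


def is_picklable(obj) -> bool:
    """Return True if `pickle.dumps` accepts the object."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Both styles compute the same multiplier; only the closure fails to pickle.
print(make_closure_schedule(10)(5), make_partial_schedule(10)(5))
print(is_picklable(make_closure_schedule(10)))
```

The partial object pickles as long as `_linear_warmup_lr_lambda` lives in a module that the unpickling process can import.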
Refactor whisper asr pipeline to include language too. (#21427)
[WIP] whisper refacto to support language output.
Handling merges.
A bit more cleanup and comments.
Many improvements.

Lots of details everywhere.

Cleanup old code and tests.
Handle lone timestamp tokens (just recover when something bad happens).
Adding return_language example.
No ffmpeg.
Hmm.
Some corrections.
Both fast and slow.
New black.
Update src/transformers/models/whisper/tokenization_whisper.py

Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com

Update src/transformers/models/whisper/tokenization_whisper.py

Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com

Remove print.
Undoing tests modifications.
Smaller test modifications.
Rename.
Remove maxDiff.

Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com

Add Blip and Blip2 for pipeline tests (#21904)
fix
add to tests
style and quality
add missing

Co-authored-by: NielsRogge NielsRogge@users.noreply.github.com Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Temporarily skip 3 tests in BridgeTowerModelTest (#21908)

skip for now

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Faster zero shot image (#21897)
Make ZeroShotImageClassificationPipeline faster

The pipeline makes separate calls to model for each candidate label. This commit combines all labels into one call. Original code takes more than 60 seconds to process one image and 1000 candidate labels. Updated code takes less than 2 seconds.

implement batching
code formatting
Creating an even faster zero-shot-image-classification.

Unfortunately super tailored towards CLIP.

Co-Authored-By: Yessen Kanapin yessen@deepinfra.com

Quality.
Cleanup.
Order different on the CI it seems.
Cleanup.
Quality.

Co-authored-by: Yessen Kanapin yessen@deepinfra.com
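Note on the speed-up described above: scoring every candidate label in one model call instead of one call per label gives the same probabilities with far fewer calls. The toy sketch below counts calls to show the difference. `CountingScorer` is a hypothetical stand-in for the model with a made-up similarity score; nothing here is the pipeline's code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class CountingScorer:
    """Hypothetical model stand-in: scores an image against labels, counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image: str, labels: Sequence[str]) -> List[float]:
        self.calls += 1
        # Toy similarity: number of distinct characters shared with the image name.
        return [float(len(set(image) & set(label))) for label in labels]


def classify_per_label(scorer: CountingScorer, image: str, labels: Sequence[str]) -> List[float]:
    """One model call per candidate label (the slow path)."""
    logits = [scorer(image, [label])[0] for label in labels]
    return softmax(logits)


def classify_batched(scorer: CountingScorer, image: str, labels: Sequence[str]) -> List[float]:
    """A single model call covering all candidate labels (the fast path)."""
    return softmax(scorer(image, labels))


labels = ["cat", "dog", "car"]
slow, fast = CountingScorer(), CountingScorer()
probs_slow = classify_per_label(slow, "cat.png", labels)
probs_fast = classify_batched(fast, "cat.png", labels)
print(slow.calls, fast.calls)
```

With 1000 labels the per-label path makes 1000 calls where the batched path makes one, which matches the scale of the difference the commit message reports.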

[time series] Add Time series inputs tests (#21846)
initial test of inputs
added test for generation
remove asserts
fixed test
Update tests/models/time_series_transformer/test_modeling_time_series_transformer.py

Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com

Avoid modeling tests run in pipeline CI jobs (#21911)
rework is_pipeline_test
bring back 3 tests

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Fix doctests for TFVisionTextDualEncoder (#21910)
faster forward following what is done for images (#21906)
faster forward following what is done for images
add missing licence
Fix gradient checkpointing bug in MBart (#21918)
Fix gradient checkpointing bug in mvp (#21920)
Fix gradient checkpointing megatron bert (#21921)
Update model_split_percents for WhisperModelTest (#21922)

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Use large VM for repo_utils_job (#21928)

upgrade to large VM

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Cleanup more auto mapping names (#21909)
fix auto 2
fix auto 2
fix task guide issue
fix

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

feat: filter try/except when looking at custom code (#21914)
feat: filter try/except
Update src/transformers/dynamic_module_utils.py

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Fix AlignModelTest tests (#21923)
fix
fix

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Avoid failure in check_repo.py due to missing backends (#21930)
Update utils/check_repo.py

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Update utils/check_repo.py

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Co-authored-by: ydshieh ydshieh@users.noreply.github.com Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Fix wrong documentation about DataCollator padding defaults (#21919)
Fix wrong documentation about DataCollator padding defaults
Fix styling
[Flan-UL2] Add-flan-ul2 (#21929)
add doc and readme
add model docs
update toctree and fix copies
update
update doc file
fix
add FLAN-UL2 to configuration mapping
fixup
Apply suggestions from code review
more clarification

Co-authored-by: younesbelakda younesbelkada@gmail.com Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com

Update README logo (#21933)
[CLAP] Support batched inputs for CLAP. Fixes pipeline issues (#21931)
fix pipeline
fix feature_extraction clap
you can now batch the is_longer attribute
add tests
fixup
add expected scores
comment on is_longer
[Whisper] Fix feature normalization in WhisperFeatureExtractor (#21938)

Fix feature normalization in WhisperFeatureExtractor

Fix gradient checkpointing bug in OPT (#21943)
Fix gradient checkpointing bug in Pegasus (#21944)
Fix gradient checkpointing bug in Rembert (#21945)
Fix gradient checkpointing bug in Roformer (#21946)
Fixed gradient_checkpointing/use_cache bug in blenderbot (#21833)
Fixed gradient_checkpointing/use_cache bug in blenderbot
Update modeling_blenderbot.py
Added back if statement
Formatted using black
Update expected values in XLMProphetNetModelIntegrationTest (#21957)

update values

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

[CI] Fix ci (#21940)
fix get_proposal_pos_embed
fix order
style
zero shot simplify test
add approximate values for zero shot audio classification
Disable DDP for neuron (#21953)

Disable DDP for neuron

Co-authored-by: EC2 Default User ec2-user@ip-172-31-42-72.us-west-2.compute.internal

Fix bert issue (#21963)

Co-authored-by: saswatmeher saswatmeher@cse.iitb.ac.in

[Generate] Fix gradient_checkpointing and use_cache bug for BLOOM (#21956)

Step 1 - Change use_cache fix

Add missing parameter definition in layoutlm config (#21960)

Four parameters in LayoutLM config were missing definitions, Added their definition (copied from BertConfig).

Use larger atol in torch.allclose for some tests (#21966)

Use larger atol

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Add TF contrastive image text finetuning example (#21939)
Initial commit
stash commit
Add model checkpointing and pushing
Fix model name inference
Update README
Update README
Remove a couple of Torch references
Update copyright date
make fixup
Update PushToHubCallback args!
Remove the torch summary
Add strategy.scope
Update expected values for test_xglm_sample (#21975)

update expected values for xglm

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Fix gradient checkpointing bug in BigBird Pegasus (#21976)
Fix gradient checkpointing bug in Blenderbot Small (#21977)
Fix gradient checkpointing bug in BlipText (#21978)

Make Format

Fix gradient checkpointing bug in Codegen (#21979)
Fix gradient checkpointing bug in ESM (#21980)
docs: improve clarity for language modeling (#21952)
docs: improve clarity for clm/mlm
docs: remove incorrect explanation
docs: remove incorrect explanation

Co-authored-by: pdhall99

Update Jukebox tests (#21984)
update expected values for jukebox
update expected values for jukebox
update expected values for jukebox
update expected values for jukebox
update expected values for jukebox

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Add check before int casting for PIL conversion (#21969)
Add check before int casting for PIL conversion
Line length
Tidier logic
Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens (#21959)
Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens
fix docs
Empty commit
formatting
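Note on the fix above: a minimum-new-tokens constraint works by forcing the score of the end-of-sequence token to negative infinity until enough new tokens have been generated. When several EOS ids are configured, each of them has to be masked. A minimal sketch on a plain list of scores follows; the function name and signature are hypothetical, and the real processor operates on batched tensors.

```python
from typing import List, Sequence, Union


def mask_eos_until_min_new_tokens(
    scores: Sequence[float],
    new_tokens_generated: int,
    min_new_tokens: int,
    eos_token_id: Union[int, Sequence[int]],
) -> List[float]:
    """Return a copy of `scores` with every EOS id set to -inf while the
    number of newly generated tokens is below `min_new_tokens`."""
    # Accept a single id or a list of ids, normalising to a list.
    eos_ids = [eos_token_id] if isinstance(eos_token_id, int) else list(eos_token_id)
    masked = list(scores)
    if new_tokens_generated < min_new_tokens:
        for token_id in eos_ids:
            masked[token_id] = float("-inf")
    return masked


scores = [0.1, 0.2, 0.3, 0.4]  # toy vocabulary of four tokens
too_early = mask_eos_until_min_new_tokens(scores, 2, 5, eos_token_id=[1, 3])
long_enough = mask_eos_until_min_new_tokens(scores, 5, 5, eos_token_id=[1, 3])
print(too_early, long_enough)
```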
[DETR, YOLOS] Fix device bug (#21974)
Fix integration test
Add test
Add test
Remove unneeded casts to bool (#21983)

Remove cast to Bool

Update notification_service.py (#21992)
better check
better check

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Skip test_multi_gpu_data_parallel_forward for some model tests (#21991)

skip test_multi_gpu_data_parallel_forward for some model tests

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

[Whisper] Add model for audio classification (#21754)
[Whisper] Add model for audio classification
make fix-copies
add to docs
add docstring
empty returns
add code example
switch to fleurs
stick everything on one line
Stop requiring Torch for our TF examples! (#21997)
Stop requiring Torch for our TF examples!
Slight tweak to logging in the example itself
[TF] Fix creating a PR while pushing in TF framework (#21968)
add create pr arg
style
add test
fixup
update test
last nit fix typo
add is_pt_tf_cross_test marker for the tests
[DETR and friends] Remove is_timm_available (#21814)
First draft
Fix to_dict
Improve conversion script
Update config
Remove timm dependency
Fix dummies
Fix typo, add integration test
Upload 101 model as well
Remove timm dummies
Fix style

Co-authored-by: Niels Rogge nielsrogge@Nielss-MacBook-Pro.local

[Time-Series] informer model (#21099)
added informer to gitignore
added informer to gitignore
WIP informer2020
added checking that instantiate works
added config using gluonTS by kashif
WIP config
adding InformerConfig. need to remove FeatureEmbedder
done InformerConfig, but need to change the names
Done informer model init. working on enc-dec
added things to address, after reading again enc-dec in the paper
done modeling - checking initialization work
added informer to gitignore
WIP informer2020
added checking that instantiate works
added config using gluonTS by kashif
WIP config
adding InformerConfig. need to remove FeatureEmbedder
done InformerConfig, but need to change the names
Done informer model init. working on enc-dec
added things to address, after reading again enc-dec in the paper
done modeling - checking initialization work
moved enc-dec init to InformerEncoder/Decoder init
added 'init_std' to config, now model init works!
WIP conversion script, and added code sources
WIP conversion script: loading original informer pth works
WIP conversion script: change defaults in the config
WIP conversion script: supporting Informer input embedding
WIP conversion script: added parameters for the informer embed
WIP conversion script: change dim_feedforward=2048
WIP conversion script: remove unused args for loading checkpoint
just cleaning up
DataEmbedding removed, after thinking with Kashif
working on forward pass
WIP forward pass: trying to establish working batch for forward pass
cleaning and finalizing
adding HF names and docs
init after cleaning works
WIP in tests
added docs for the informer specific args
fix style
undo change
cleaning informer, now need to work only enc-dec
initial enc-dec classes
added encoder and decoder
added todo
add todos for conv_layers
added decoder docs from vanilla
added encoder docs from vanilla
remove encoder decoder from the original informer
removed AttentionLayer from the original paper
removed TriangularCausalMask, same as decoder_attention_mask
initial sparse attention
use conv_layers
fixed test_config test
fix parenthesis when iterating zip(layers, conv_layers)
error found in prob attention, added sizes as comments
fix sizes
added proposal for q_reduce indexing, and remove unused
WIP ProbMask, and changed factor=2 for testing
remove unused libs for this PR for creating the env
fix checking the attn_weights.size() after bmm
Q_reduce: changed from torch.gather to simple slicing
WIP calculate final attn_output
finish adding v_aggregated, attn_output ready
changed tgt_len to u in attention_mask, need to fix the size error
comment attention_mask for encoder, and fix if cond for v_agg
added ProbMask support (wip), removed old original code
finished ProbMask 😃
Revert "remove unused libs for this PR for creating the env"

This reverts commit 11a081e09e92771e51a5d2758d53a9afb59547f0.

fixes
make style
fix initial tests
fix more tests
dry
make style
remove unused files
style
added integration tests
fix num_static_real_features
fix header
remove unused function
fix example
fix docs
Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com

Update src/transformers/models/informer/modeling_informer.py

Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com

Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com

Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com

Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com

Update src/transformers/models/informer/configuration_informer.py

Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com

fixes for reviewer
use prediction_length from model
fix style
fixed informer.mdx
added to index
updated readme
undo
make fix-copies
typo
fix copy
added Informer to toctree
in order
fixed comments
remove unneeded new lines in docs
make static real and cat optional
fix use of distil conv layers
fixed integration test
added checkpoint for convlayer
make fix-copies
updated from time series model
make fix-copies
copy decoder
fix unit tests
updated scaling config
fix integration tests
IGNORE_NON_TESTED
IGNORE_NON_AUTO_CONFIGURED
IGNORE_NON_AUTO_CONFIGURED
updated check configs
fix formatting
undo change from time series
prediction_length should not be None
align with the blog: prettify ProbSparse and change attention_factor to sampling_factor
make style
make fix-copies
niels CR: update contributed by
niels CR: update configuration_informer.py

Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com

niels CR: update kashif -> huggingface

Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com

niels CR: sampling_factor only relevant when attention_type=prob
make style
fixed U_part: added multiplication by L_Q
fixed bug: remove is not None from if config.distil
fixed test: decoder_seq_length to encoder_seq_length in cross_attentions check
fix integration tests
updated model hub
do not shift as in training
undo
fix make-copies
make fix-copies
added if prediction_length is None
changed ProbSparseAttention to InformerProbSparseAttention
changed V_sum -> v_mean_dim_time
changed ConvLayer to InformerConvLayer and fixed super()
TimeSeriesTransformer->Informer in decoder's Copied from
more descriptive in ProbSparse
make style
fix copied from
Revert "added if prediction_length is None"

This reverts commit b4cbddfa05e3bd739b79569cd3c3b89e316f2451.

fixed indent
use InformerSinusoidalPositionalEmbedding
make fix-style
fix from #21860
fix name
make fix-copies
use time series utils
fix dec num_heads
docstring
added time series util doc
_import_structure
formatting
changes from review
make style
fix docs
fix doc
removed NegativeLogLikelihood

Co-authored-by: Kashif Rasul kashif.rasul@gmail.com Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com

Update tiny model creation script and some others files (#22006)
Update 1
Update 2
Update 3
Update 4
Update 5
Update 6
Update 7
Update 8
Update 9
Update 10

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Generate - add 1 to cur_len to make up the new beam length (#21993)
add 1 to cur_len to make up the new beam length

cur_len is 1 token shorter than the length of the sequence whose best_sum_logprobs is the numerator.

cur_len+=1 before check if beam hyp is done
format code
reformat with black

Co-authored-by: Chiming chiming@biomap.com
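
The effect of the off-by-one described above can be illustrated with a minimal sketch. It assumes the usual length-normalized beam score, sum_logprobs / length ** length_penalty; the helper names below (length_normalized_score, beam_is_done) are hypothetical and are not the library's API.

```python
def length_normalized_score(sum_logprobs: float, length: int, length_penalty: float = 1.0) -> float:
    """Length-normalized beam score (assumed form: sum_logprobs / length ** length_penalty)."""
    return sum_logprobs / (length ** length_penalty)


def beam_is_done(worst_kept_score: float, best_sum_logprobs: float, cur_len: int,
                 length_penalty: float = 1.0) -> bool:
    """Hypothetical early-stop check: stop once no running beam can beat the worst kept hypothesis."""
    best_possible = length_normalized_score(best_sum_logprobs, cur_len, length_penalty)
    return worst_kept_score >= best_possible


# Illustrative numbers only: the best running beam has 3 tokens and a summed log-prob of -6.0.
worst_kept_score = -2.5
best_sum_logprobs = -6.0
stale_cur_len = 2  # one token shorter than the sequence that produced best_sum_logprobs

# With the stale length the search stops too early; with cur_len + 1 it keeps going.
stops_with_stale_len = beam_is_done(worst_kept_score, best_sum_logprobs, stale_cur_len)
stops_with_fixed_len = beam_is_done(worst_kept_score, best_sum_logprobs, stale_cur_len + 1)
```

Because log-probabilities are negative, dividing by a length that is too short makes the candidate look worse than it is, which is why the stale cur_len can end the search prematurely.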

VideoMAE doctest - use valid dummy pixel values (#22022)

Use valid dummy pixel values

update: bertology paper (#22012)
Update AudioClassificationPipelineTests::test_small_model_pt for PT 2.0.0 (#22023)

fix

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

[bnb] Fix bnb error message (#22026)
fix error message
make style
[WIP] Add BridgeTowerForContrastiveLearning (#21964)
Add BridgeTower for ITC
Fix review feedback
Rename BridgeTowerForITC, cleanup
Fix style and quality
implement tests

Co-authored-by: Tiep Le 97980157+tileintel@users.noreply.github.com Co-authored-by: Tiep Le tiep.le@intel.com

Fix test for torchneuroncore in Trainer (#22028)
Add tokenize_kwargs parameter definition in the FeatureExtractionPipeline (#22031)

add tokenize_kwargs doc in the FeatureExtractionPipeline

[examples/speech-recognition] Add SpecAugment to run_speech_recognition_seq2seq.py (#21942)
Add specaugment to run_speech_recognition_seq2seq.py
Remove useless argument: text_column
Fix quality
Update return_attention_mask condition
Update specaugment arguments only for whisper models
Remove SpecAugment arguments from ModelArguments, only leave default values for simplicity
Apply suggestions from code review

Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com

Update apply_spec_augment only for whisper models
Apply suggestions from code review

Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com

Rename return_attention_mask to forward_attention_mask to avoid confusion with wav2vec2 models

Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com

fixes the gradient checkpointing of whisper (#22019)
fixing
Update modeling_whisper.py
Update modeling_whisper.py
Update src/transformers/models/whisper/modeling_whisper.py

Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com

Avoid text_config_dict and vision_config_dict being saved for CLIP-like models (#22035)
Avoid text_config_dict and vision_config_dict being saved
for other CLIP-like models

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Mark all BridgeTower tests slow for now (#22039)
slow me

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Bug fix: token classification pipeline while passing offset_mapping (#22034)

fix slow tokenizers with passing offset_mapping

Update ALIGN docs (#22025)
Fix typos and add code examples, resources
[21737][T5]: Fix gradient checkpoint bug (#22036)
[21737][T5]: Fix gradient checkpoint bug
[21737][T5]: Fix gradient checkpoint bug
[21737][T5]: Fix gradient checkpoint bug
Update src/transformers/models/mt5/modeling_mt5.py
Update src/transformers/models/t5/modeling_t5.py

Co-authored-by: njindal njindal@adobe.com Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com

Docs Improvement - In ZSH, not using ' ' around pip install fails, fix it (#22045)

In ZSH, not using ' ' around pip install fails

Running

pip install transformers[torch]

in the default ZSH terminal will fail with the error zsh: no matches found: transformers[torch]

The solution is to wrap the installation path in ' ' like

pip install 'transformers[torch]'

Relevant StackOverflow: https://stackoverflow.com/questions/30539798/zsh-no-matches-found-requestssecurity

Can't install tf2 on M1 Chip by default (#22046)
Remove set_access_token usage + fail tests if FutureWarning (#22051)
Remove set_access_token usage + fail tests if FutureWarning
do not fail on FutureWarning in CI

Co-authored-by: testbot lucainp@hf.co

Show the number of huggingface_hub warnings in CI report (#22054)
show hfh warnings

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Return analysis for hyperparameter_search with Ray backend (#22040)
return analysis for hyperparameter_search with ray backend
Revert "return analysis for hyperparameter_search with ray backend"

This reverts commit cd5179070930e03020d96d98eb51dec3eb21ef75.

add run_summary attribute to BestRun and return analysis for ray backend
fix typo
add doc for run_summary for ray backend
pt-to-tf model architecture override (#22055)
Add an argument to pt-to-tf to allow overriding the model class
make fixup
Minor fix to error message
Remove unused extra conversion from the script
rm $ symbol from code block from contributing.md (#22057)

rm $ symbol from code block

Removed the $ symbol from the code block to make copy-pasting easier.

[deepspeed] offload + non-cpuadam optimizer exception (#22043)
[deepspeed] offload + non-cpuadam optimizer exception
flip
revert min version
Edit the docstring of image_processing_donut to match code (#22033)
Edit the docstring of image_processing_donut to match code
improve style
more style improvement after installing quality
Skip 3 tests for WhisperEncoderModelTest (#22060)
skip 3 tests

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Add setters by type of args to TrainingArguments (#21570)
Add setters by type of args to TrainingArguments
Define more setters
Update tiny model creation script (#22058)

Update the script

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Fix case when using --gradient_accumulation_steps with DDP disabled. (#22007)

Co-authored-by: EC2 Default User ec2-user@ip-172-31-42-72.us-west-2.compute.internal

Add a progress bar for the total download of shards (#22062)
Add a progress bar for the total download of shards
Check for no cache at all
Fix check
Fix gradient checkpointing bug in Speech2Text (#22079)
Fix gradient checkpointing bug in Speech2Text
Update modeling_speech_to_text.py
Update modeling_speech_to_text_2.py
Fix gradient checkpointing bug in switch transformer (#22081)
[GPT2] Propose fix for #21080 (#21853)
Make sure position ids are masked
test that padded inputs produce the same results
fix failing tests
fixup
fix batch test
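
A minimal pure-Python sketch of deriving position ids from an attention mask, so that padded and unpadded inputs give real tokens the same positions. It mirrors the cumulative-sum approach commonly used for GPT-2 style generation inputs; whether this PR used exactly this form is an assumption, and position_ids_from_mask is a hypothetical helper name.

```python
from itertools import accumulate


def position_ids_from_mask(attention_mask, pad_position=1):
    """Give each real token its 0-based index among the real tokens.

    Padded slots (mask == 0) receive a dummy position, since they are
    masked out of attention anyway.
    """
    running_counts = accumulate(attention_mask)  # cumulative sum of the mask
    return [
        count - 1 if keep else pad_position
        for keep, count in zip(attention_mask, running_counts)
    ]


# A left-padded sequence and its unpadded counterpart.
padded = position_ids_from_mask([0, 0, 1, 1, 1])
unpadded = position_ids_from_mask([1, 1, 1])
```

The three real tokens receive positions 0, 1, 2 in both cases, which is the property the "padded inputs produce the same results" test relies on.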
Fix small typo in flan-ul2.mdx (#22068)
Update flan-ul2.mdx
Update flan-ul2.mdx
Generate - Fix broken documentation links (#22078)

fix broken links

Fix gradient checkpointing bug in Speecht5 (#22080)
Fix gradient checkpointing bug in Speecht5
Update modeling_speech_to_text.py
Update src/transformers/models/speech_to_text/modeling_speech_to_text.py
Fix change errors

Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com

Fix hint in src/transformers/modeling_utils.py (#22074)

fix hint

handle numpy inputs in whole word mask data collator (#22032)
GPT-J specific half precision on CPU note (#22086)
re: #21989
update re: #21989
removed cpu option
make style
Fix imports of TF MobileViT (#22065)
Fix imports of TF MobileViT
Fix copies
Revert "[GPT2] Propose fix for #21080" (#22093)

Revert "[GPT2] Propose fix for #21080 (#21853)" to avoid CI failure

This reverts commit a3fef89b2694fac4dd642a3f77d3e96d0c3df82a.

[Whisper] Remove embed_tokens from encoder docstring (#21996)
[Whisper] Remove embed_tokens from encoder docstring
new line to retrigger CI
remove new line
Add AutoModelForZeroShotImageClassification (#22087)

Adds AutoModelForZeroShotImageClassification to transformers

add new model of MGP-STR (#21418)
add new model of MGP-STR
fix the check failings
remove torch and numpy from mgp_tokenization
remove unused import from modeling_mgp_str
add test_processing_mgp_str
rm test_processing_mgp_str.py
add test_processing_mgp_str
add test_processing_mgp_str
add test_processing_mgp_str
rm test_processing_mgp_str and add softmax outs to model
rm test_processing_mgp_str and add softmax outs to model
rewrite the code of mgp-str according to PR suggestions
rewrite the code of mgp-str according to PR suggestions
add new model of MGP-STR
fix the check failings
remove torch and numpy from mgp_tokenization
remove unused import from modeling_mgp_str
add test_processing_mgp_str
rm test_processing_mgp_str.py
add test_processing_mgp_str
add test_processing_mgp_str
add test_processing_mgp_str
rm test_processing_mgp_str and add softmax outs to model
rewrite the code of mgp-str according to PR suggestions
rewrite the code of mgp-str according to PR suggestions
remove representation_size from MGPSTRConfig
reformat configuration_mgp_str.py
format test_processor_mgp_str.py
add test for tokenizer and complete model/processor test and model file
rm Unnecessary tuple in modeling_mgp_str
reduce hidden_size/layers/label_size in test_model
add integration tests and change MGPSTR to Mgpstr
add test for logit values
reformat test model file

Co-authored-by: yue kun yuekun.wp@alibaba-inc.com

Add pr_checks.mdx Italian translation (#17459) (#22116)
Add pr_checks.mdx Italian translation (#17459)
Updated pr_checks.mdx Italian translation (#17459)
Fix gradient checkpointing bug in xglm (#22127)
Fix gradient checkpointing bug in Trajectory Transformer (#22125)
Fix gradient checkpointing bug in xlm_roberta_xl (#22128)
Added big_models.mdx italian translation #17600 (#22115)
updated toctree
italian translation big_model.mdx
italian translation big_models
[Blip2] skip accelerate test (#22124)

skip accelerate test

Fix gradient checkpointing bug in xmod (#22129)
Fix gradient checkpointing bug in LongT5 (#22130)
Fix gradient checkpointing bug in trocr (#22126)
Fix gradient checkpointing bug in trocr
Fix format
Update src/transformers/models/trocr/modeling_trocr.py

Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com

Zero-shot image classification task guide (#22132)
WIP
WIP
manual inference example
make style
Apply suggestions from code review

Co-authored-by: Alara Dirik 8944735+alaradirik@users.noreply.github.com

Fix doc link for MGP-STR (#22138)
Adding Type Hints to TF_Pegasus model (#21941)
Adding Type Hints to TF_Pegasus model
Updated some parameters per maintainer comments
Add a new script to check model testers' config (#22063)
Add script

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Update configuration_align.py (projected_dim=640) (#22139)

Update configuration_align.py

updated projected_dim=640 from 512 in arguments of AlignConfig

[Whisper] add get_input_embeddings to WhisperForAudioClassification (#22133)
add get_input_embeddings to WhisperForAudioClassification
add common tests
fix another common test
Update tests/models/whisper/test_modeling_whisper.py

Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com

fix style

Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com

Trainer: let generate pick its inputs (#22108)
Let generate pick its inputs
fix squad seq2seq example
Enforce same behavior as PyTorch 2.0 for older versions (#22136)
[trainer] fix bug in grad accum with multiple epochs (#22098)
[trainer] fix bug in grad accum
comment out debug
fix one-off
rename counter
[deepspeed docs] Activation Checkpointing (#22099)
[deepspeed docs] Activation Checkpointing
Apply suggestions from code review

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Update deepspeed.mdx

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Remove backend check for torch.compile (#22140)
Remove backend enforcement for torch.compile
Update error
Update src/transformers/training_args.py

Co-authored-by: Stas Bekman stas00@users.noreply.github.com

Apply suggestions from code review

Co-authored-by: Stas Bekman stas00@users.noreply.github.com

Style

Co-authored-by: Stas Bekman stas00@users.noreply.github.com

[Safetensors] Add explicit flag to from pretrained (#22083)
[Safetensors] Add explicit flag to from pretrained
add test
remove @
Apply suggestions from code review

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Prepare daily CI for torch 2.0.0 (#22135)

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

docs: New terms and updates to glossary (#21982)
Updated glossary with new terms, added abbreviations for certain terms and merged autoencoding models, autoregressive models and causal language modeling into encoder and decoder models
Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Added link to 'Pipeline for inference' tutorial
Trigger CI
Update docs/source/en/glossary.mdx

Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Added entry for self supervised learning, added deleted entries + fixed broken links
Update docs/source/en/glossary.mdx

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com

Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com

[🛠️] Fix-whisper-breaking-changes (#21965)
temp fix
temporary fix
update
fix tests
fixup
update based on review

Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com

update to fix tests
update docstring

Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com

Move is_pipeline_test_to_skip to specific model test classes (#21999)
Move is_pipeline_test_to_skip to specific model test classes

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Add ConvNeXT V2 (#21679)
Add ConvNeXt V2 to transformers
TF model is separated from the PR to fix issues
Update 2 doctest expected values for torch 2.0.0 (#22148)

update values

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Translation Italian: perf_train_cpu and perf_train_cpu_many (#22151)
added translated files

added perf_train_cpu and perf_train_cpu_many

updated toctree
Fix big model inference for T5 models in float16 (#22095)
Fix big model inference for T5 models in float16
Apply suggestions from code review

Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com

Style
Trigger CI with latest release

Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com

Create MaskedImageCompletionOutput and fix ViT docs (#22152)
create MaskedImageCompletionOutput
fix bugs
fix bugs
to_pil - don't rescale if int and in range 0-255 (#22158)
Don't rescale if int and in range 0-255
Raise value error if int values too large
Update tests/test_image_transforms.py
Update tests/test_image_transforms.py
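
The decision rule described in these commit messages can be sketched roughly as follows. This is an illustration in plain Python over a flat list of pixel values, not the library's implementation; should_rescale is a hypothetical name, and treating every non-integer input as needing rescaling is an assumption.

```python
def should_rescale(values):
    """Decide whether pixel values must be scaled by 255 before building an 8-bit image.

    - Integers already in 0-255: use as-is (no rescale).
    - Integers outside 0-255: reject, they cannot be valid 8-bit pixels.
    - Anything else (assumed here to be floats in [0, 1]): rescale.
    """
    if all(isinstance(v, int) for v in values):
        if all(0 <= v <= 255 for v in values):
            return False
        raise ValueError("Integer pixel values must lie in the range 0-255.")
    return True


rescale_ints = should_rescale([0, 128, 255])       # already 8-bit
rescale_floats = should_rescale([0.0, 0.5, 1.0])   # normalized floats

try:
    should_rescale([0, 300])
    too_large_rejected = False
except ValueError:
    too_large_rejected = True
```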
[trainer] add --optim adamw_torch_fused for pt-2.0+ (#22144)
[trainer] add --optim adamw_torch_fused
change optim default
deal with non-torch
revert default change; prep; add fp16/amp assert
typo
typo
Revert "Enforce same behavior as PyTorch 2.0 for older versions" (#22163)

Revert "Enforce same behavior as PyTorch 2.0 for older versions (#22136)"

This reverts commit 1c801d65eb42a71ea52db797af760bd96c8b113f.

v4.28.0.dev0
Load optimizer state on CPU to avoid CUDA OOM (#22159)
Run all tests by default (#22162)
Fix: unfinished_sequences with correct device (#22184)

Fix: unfinished_sequences with correct device

The original code was causing errors when running torch.jit.trace due to the tensor options being incorrect. I fixed this by using torch.ones to create a tensor with the correct device and dtype. This should resolve the issue with running torch.jit.trace.

Revert 22152 MaskedImageCompletionOutput changes (#22187)

Revert changes

Regression pipeline device (#22190)
Fix regression in pipeline when device=-1 is passed
Add regression test
Update BridgeTowerForContrastiveLearning (#22145)
Use return_loss for BridgeTowerForContrastiveLearning, add example
fix tests
Update example in BridgeTowerForContrastiveLearning
Update test_modeling_bridgetower.py
update model output format
minor update
Update src/transformers/models/bridgetower/modeling_bridgetower.py
make style

Co-authored-by: Tiep Le 97980157+tileintel@users.noreply.github.com Co-authored-by: Tiep Le tiep.le@intel.com Co-authored-by: Yih-Dar 2521628+ydshieh@users.noreply.github.com Co-authored-by: ydshieh ydshieh@users.noreply.github.com

t5 remove data dependency (#22097)
t5 remove data dependency
make style
make fix-copies

Co-authored-by: Prathik Rao prathikrao@microsoft.com

Fix DeepSpeed CI (#22194)
Deal with torch-tensorrt

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Fix typo in Align docs (#22199)

Fix align docs typo

Update expected values in MgpstrModelIntegrationTest (#22195)

Update values

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Italian Translation of migration.mdx (#22183)
Translation Italian: migration
Update migration.mdx

minor fixes

Update _toctree.yml
Delete migration.mdx
Add italian translation of migration.mdx
Update of migration.mdx translation and toctree
LLaMA Implementation (#21955)
LLaMA
sharding and docs
tweak
black
inits
ruff
LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
init
no checkpoint
docs
ruff
type_vocab_size
tokenizer fixes
tokenizer fixes
Update tokenization_llama.py
Update tokenization_llama.py
Update configuration_llama.py
Update modeling_llama.py
tokenizer add_bos by default
licenses
remove decoder
norms and mlp
rope overhaul
tweaks
black
mention OPT implementation
off-by-one naming
typo
fix
tokenization fix and slicing bug
padding config
cleanup
black
update tests
undo typo
fix vocab caching logic
ruff
docbuilder
attn fix from BlackSamorez
initial feedback
typo
docs
llama case
llama case
load checkpoint docs
comment about tokenizer
tokenizer defaults
clear past_key_values if use_cache=False
last tweaks
last tweaks
last tweaks
last tweaks

Co-authored-by: Stella Biderman stellabiderman@gmail.com

Update tiny model creation script (#22202)
Update UNCONVERTIBLE_MODEL_ARCHITECTURES
Deal with 2 model tester classes in single test file
Deal with 2 model tester classes in single test file
Deal with 2 model tester classes in single test file
make style and quality

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Temporarily fix ONNX model exporting error (#21830)
Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143
Reduced column width
Fix formatting.
Revert "Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143"

This reverts commit 6e95a108042118d204da447729f3834affa354fc.

Fix export error.
Revert "Fix formatting."

This reverts commit 8310f60da10358edbdf77a2a2f3c83ee55066cb8.

Propagated changes made in SwinV2 to Swin2SR
[XGLM] Add accelerate support for XGLM (#22207)
add accelerate support for XGLM
fix order
fixes a typo in WhisperFeatureExtractor docs. (#22208)
fixes a typo
.
🔥py38 + torch 2 🔥🔥🔥🚀 (#22204)
py38 + torch 2
increment cache versions

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Hotfix for natten issue with torch 2.0.0 on CircleCI (#22218)

fix

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

fix typos in llama.mdx (#22223)
fix code example in mgp-str doc (#22219)

Co-authored-by: yue kun yuekun.wp@alibaba-inc.com

Use dash==2.8.1 for now for daily CI (#22227)

Use dash 2.8.1 for now

Co-authored-by: ydshieh ydshieh@users.noreply.github.com

Depth estimation task guide (#22205)
added doc to toc, auto tip with supported models, mention of task guide in model docs
make style
removed "see also"
minor fix
LLaMA house-keeping (#22216)
LLaMA house-keeping
Doc links
fix AutoTP in deepspeed could not work for bloom (#22196)
fix AutoTP in deepspeed could not work for bloom

Signed-off-by: Wang, Yi A yi.a.wang@intel.com

add a method in BloomModel to build alibi

Signed-off-by: Wang, Yi A yi.a.wang@intel.com

Add LlamaForSequenceClassification (#22209)
Add LlamaForSequenceClassification
Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com

Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com

Add docstring
Add test
Add input embedding getter and setter
Remove dead code

Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com

Removed .mdx extensi…