LLaMA Implementation by zphang · Pull Request #21955 · huggingface/transformers
Fix 2 quicktour file doctest (#21742)
Update expect output values - as Hub repo. files are updated
Update expect output values - as librosa is from 0.9.2 to 0.10.0 on CI docker
fix
update one more
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
[GPTNeo] Fix gradient checkpointing bug (#21733)
fix bug
forward contrib credits from discussions
change logic
Co-authored-by: edbeeching edbeeching@users.noreply.github.com
Generate: Fix GIT batched captioning (#21738)
Skip test_log_level for now
Added Type Hints for modeling_tf_encoder_decoder.py (#21673)
Ran Black formatting
Added imports and reformatted
Update src/transformers/models/encoder_decoder/modeling_tf_encoder_decoder.py
Co-authored-by: Matt Rocketknight1@users.noreply.github.com
Auto api Value Error addition to Troubleshoot (#21708)
troubleshooting guide: added an error description for missing auto-mapping
minor polishing
changed the example
Apply suggestions from code review
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
Update docs/source/en/troubleshooting.mdx
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
[deepspeed tests] fix issues introduced by #21700 (#21769)
[deepspeed tests] fix issues introduced by #21700
fix
fix
Graphormer fix (#21699)
Removed useless check for backend
fix style check for graphormer
Reverted change and corrected requires_backend for cython
code qual
fix: Change is_last chunk calc and add conditional break in chunk_iter (#21612)
fix: Change is_last chunk calc and add conditional break
format fix
account for 0 and full stride_rights, add comment
add new test
make style
update slow whisper asr test timestamps
use nested_simplify on output and round timestamp to hundredths place
[Flax] adding support for batch norm layers (#21581)
[flax] adding support for batch norm layers
fixing bugs related to pt+flax integration
cleanup, batchnorm support in sharded pt to flax
support for batchnorm tests in pt+flax integration
simplifying checking batch norm layer
[Examples] Generalise run audio classification for log-mel models (#21756)
[Examples] Generalise run audio classification for log-mel models
batch feature extractor
make style
Different behavior in DistilBERT when using "inputs_embeds" (#21752)
Different behavior in DistilBERT when using "inputs_embeds" Fixes #21089
fix failing test
[Flax] Fix erroneous kwargs being passed to generate config (#21765)
[Whisper] Add SpecAugment (#21298)
Return and rescale attention_mask
Add SpecAugment to Whisper modeling
Fix test
Update docstring
Add SpecAug related parameters to model config
Add the _mask_input_features function to doc
Fix quality
Apply suggestions from code review
Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com
Remove dev comments
Add test
Resolve conflict
feat: mask {feature, time} prob fast tests
Apply suggestions from code review
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com Co-authored-by: sanchit-gandhi sanchit@huggingface.co Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Fix-ci-whisper (#21767)
fix history
input_features instead of input ids for TFWhisper doctest
use translate instead of transcribe
Generate - update cookie cutters to not initialize cache with training and gradient checkpointing (#21759)
[time series] updated expected values for integration test. (#21762)
updated expected
prediction_length fix
prediction_length default value
default prediction_length 24
revert back prediction_length default
move prediction_length test
[GPT2, ProphetNet] Fix gradient checkpointing bug (#21772)
fix gradient checkpointing bug
fix gradient checkpointing bug
ran make fix-copies
fixed bug
fixed bug
[SpeechT5] Fix HiFiGAN tests (#21788)
Fix resume_from_checkpoint for deepspeed (#21735)
Fix resume_from_checkpoint for deepspeed
Fix resume_from_checkpoint for deepspeed, by ensuring that the deepspeed engine is the one to load the checkpoint.
Empty commit to trigger CI
Removed deepspeed skipping
Removed deepspeed skipping inside the _load_from_checkpoint function, as it is obsolete
another adjustment
Trigger CI
trigger circleci
style
Co-authored-by: ydshieh ydshieh@users.noreply.github.com Co-authored-by: Stas Bekman stas@stason.org
[examples/summarization] deal with max_length and num_beams (#21740)
Override the decoding parameters of Seq2SeqTrainer
Fix quality
Fix max_length parameter
Fix quality
Remove redundant parameter max_length
Separate the preprocess of train and validation to use different max_target_length
Fix type in gpt2 config docstring (#21782)
Fix docstring gpt2 config
Fix en documentation typos (#21799)
fix wrong url
typos in english documentation
[FX tracer] Make concrete_args from outside available (#21775)
make concrete_args from outside available
[Pipeline] Add zero shot audio classification pipeline (#21600)
add pipeline
update init
add zero shot to init
update inits and correct checkpoints
update base to support input features
add tests
Update src/transformers/pipelines/zero_shot_audio_classification.py
Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com
Update src/transformers/pipelines/zero_shot_audio_classification.py
Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com
update pipeline code
use tiny checkpoint
nits and expected value with tiny model
style
last nit on tests values
fix styling
fix collate fn that was casting to float
update
Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com
[torch] remove deprecated uint8 in favor of bool (#21384)
uint8 -> bool
fix copies
style
update test modeling common when checking attention buffers
style
use logical not on random mask instead of subtraction with 1
remove torch uint8
quality
remove modified modeling utils
Update based on review
Co-authored-by: sgugger sylvain.gugger@gmail.com
Co-authored-by: sgugger sylvain.gugger@gmail.com
[tests] add accelerate marker (#21743)
add accelerate marker
add to docs
Update docs/source/en/testing.mdx
Fix PyTorch Perceiver PerceiverFourierPositionEncoding with fp16 (#21787)
fix perceiver fp16
hopefully fix tests
Fix nn.init.trunc_normal_ call on torch.float16 data (#21789)
fix nn.init.trunc_normal_ call on half data
Fix gradient checkpointing bug in gptneox (#21815)
Fix gradient checkpointing bug in gptneox
Remove use_cache block
Inheritance-based framework detection (#21784)
Fix quality with ruff==0.0.253 (#21828)
fix quality with ruff 0.0.253
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
introduce logger.warning_once and use it for grad checkpointing code (#21804)
logger.warning_once
style
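The idea behind a "warn once" helper, as named in the logger.warning_once commit above, can be sketched with the standard library alone. This is a minimal sketch and not necessarily how the library implements it: the free function `warning_once` and the `ListHandler` class below are hypothetical names introduced for illustration, and the deduplication relies on `functools.lru_cache` keying on the (logger, message) pair.

```python
import functools
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper: collects emitted messages so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


@functools.lru_cache(maxsize=None)
def warning_once(logger: logging.Logger, message: str) -> None:
    """Emit `message` at WARNING level only the first time this
    (logger, message) pair is seen; repeated calls hit the cache
    and never reach logger.warning again."""
    logger.warning(message)
```

A call inside a training loop, such as `warning_once(logger, "...")` on every forward pass, would then log the message a single time instead of once per step.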
Rename MobileViTModelTest to TFMobileViTModelTest (#21825)
Let's give TF a bit more love ❤️ 🙏
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Fix gradient checkpointing bug BioGpt (#21844)
Co-authored-by: saswatmeher saswatmeher@cse.iitb.ac.in
check for None forced tokens (#21793)
Fix gradient checkpointing bug in git (#21818)
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Fix gradient checkpointing imagegpt (#21816)
Fix gradient checkpointing bug in gptneox
Fix gradient checkpointing bug in modeling_imagegpt.py
Revert gpt neox changes
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Fix tf random token masking probability in data collator (#21834)
fix tf random mask tokens probability
fix tf random mask tokens probability in collator for language modelling
[T5] Fix torchquant issue (#21843)
fix torchquant issue
add tests
[Blip2] Add Blip2Model (#21817)
add v1
add Blip2Model
- add relevant functions
- add tests
- add on automapping
fix docs
fix doctest
Fix the issue of blip model returning loss even when the label is not provided. (#21811)
Fix the issue of blip model returning loss even when the label is not provided
Fix ruff failure
Incorporate PR feedbacks
Incorporate PR feedbacks
Incorporate PR feedbacks
Incorporate PR feedbacks
[GPTJ] Fix gradient checkpointing bug (#21794)
If applied, this commit fixes generate bug in gptj
Remove extra same code block
formatting and test fix
Conflict fix and declaration error fix
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Add: task guide for zero shot object detection (#21829)
zero shot object detection part 1
added batch prediction section
added image guided object detection section
make style
added the task guide to the TOC
minor polishing
Apply suggestions from code review
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com Co-authored-by: Alara Dirik 8944735+alaradirik@users.noreply.github.com
added embedded owlvit demo
Apply suggestions from code review
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
minor fix
make style
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com Co-authored-by: Alara Dirik 8944735+alaradirik@users.noreply.github.com Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Make Slack CI reporting stronger (#21823)
Use token
Avoid failure
better error
Fix
fix style
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
[Blip2] Fix Blip-2 multi gpu (#21707)
fix blip multi gpu
fix
final changes
adapt suggestions
fix failing slow test
forward contrib credits from testing and suggestions
reformat
Co-authored-by: akkikiki akkikiki@users.noreply.github.com
Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval (#21684)
Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval
minor fix return_dict
implement test for loss computation
Co-authored-by: Tiep Le 97980157+tileintel@users.noreply.github.com Co-authored-by: Tiep Le tiep.le@intel.com
🔥Rework pipeline testing by removing PipelineTestCaseMeta 🚀 (#21516)
Add PipelineTesterMixin
remove class PipelineTestCaseMeta
move validate_test_components
Add for ViT
Add to SPECIAL_MODULE_TO_TEST_MAP
style and quality
Add feature-extraction
update
raise instead of skip
add tiny_model_summary.json
more explicit
skip tasks not in mapping
add availability check
Add Copyright
A way to disable irrelevant tests
update with main
remove disable_irrelevant_tests
skip tests
better skip message
better skip message
Add all pipeline task tests
revert
Import PipelineTesterMixin
subclass test classes with PipelineTesterMixin
Add pipieline_model_mapping
Fix import after adding pipieline_model_mapping
Fix style and quality after adding pipieline_model_mapping
Fix one more import after adding pipieline_model_mapping
Fix style and quality after adding pipieline_model_mapping
Fix test issues
Fix import requirements
Fix mapping for MobileViTModelTest
Update
Better skip message
pipieline_model_mapping could not be None
Remove some PipelineTesterMixin
Fix typo
revert tests_fetcher.py
update
rename
revert
Remove PipelineTestCaseMeta from ZeroShotAudioClassificationPipelineTests
style and quality
test fetcher for all pipeline/model tests
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Improve TF weight loading, especially PT crossloading (#21792)
First commit for the improved PT-TF weight loading
Remove workarounds from TFEncoderDecoder tests
Allow a custom weight renaming function in from_pretrained and use that to clean up EncoderDecoder
make fixup
First attempt at visionencoderdecoder
Disable tensorfloat32 in tests to get consistent outputs
Quick fix to tf_vision_encoder_decoder tests
make fixup
Update Blenderbot tests
Remove unused arg in modeling_tf_opt
load_tf_sharded_weights had strict=True! This meant transfer learning was impossible, so I'm setting it to False.
Support prefixes when loading sharded TF checkpoints
make fixup
Add test to load sharded models with a weight prefix
Fix sharded weight loading test
Add a test for transfer from a sharded checkpoint
make fixup
Add test to check that crossloading from PT with a prefix works
Refactor from_pretrained in the encoderdecoder classes
Refactor from_pretrained in the encoderdecoder classes
missmatched -> mismatched
Explicitly check for None
No comments showing my very impressive and attractive knowledge of Py3.9+
Disable TF32 across all TF tests
Fix flaky test for log level (#21776)
Fix flaky test for log level
Fix other flaky test
prepare for "floordiv is deprecated and its behavior will change in a future version of pytorch" (#20211)
rounding_mode = "floor" instead of // to prevent behavioral change
add other TODO
use torch_int_div from pytorch_utils
same for tests
fix copies
style
use relative imports when needed
Co-authored-by: sgugger sylvain.gugger@gmail.com
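The "rounding_mode = floor instead of //" change above is about which way integer division rounds when an operand is negative. `torch.div` accepts `rounding_mode="floor"` or `rounding_mode="trunc"`; the sketch below uses plain Python instead of tensors so it runs without PyTorch, and the helper names `div_trunc` and `div_floor` are hypothetical. It only illustrates the two rounding rules and is not the code from the commit.

```python
import math


def div_trunc(a: int, b: int) -> int:
    """Divide and round toward zero (the "trunc" rounding rule)."""
    return math.trunc(a / b)


def div_floor(a: int, b: int) -> int:
    """Divide and round toward negative infinity (the "floor" rounding rule)."""
    return math.floor(a / b)


# The two rules agree for non-negative operands and differ for negative ones.
print(div_trunc(7, 2), div_floor(7, 2))    # 3 3
print(div_trunc(-7, 2), div_floor(-7, 2))  # -3 -4
```

Spelling out the rounding mode explicitly keeps results stable even if the default behaviour of the `//` operator changes between library versions.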
[ConvBert] Fix #21523 (#21849)
fix reshaping Fixes #21523
add test
styling
last fixes
Update src/transformers/models/convbert/modeling_convbert.py
code quality
Flax beam search fix (#21857)
Fix gradient checkpointing bug Bart (#21866)
Co-authored-by: saswatmeher saswatmeher@cse.iitb.ac.in
[deepspeed] check whether model is NLP one instead of counting on input type (#21800)
trying to figure out whether model is NLP
drop my changes and apply easier fix
trying to handle all int input types
fix logic
Co-authored-by: Stas Bekman stas@stason.org
Change the way tensor is reshaped in BartAttention (from .view to .reshape) (#21860)
Change the .view call to .reshape
Change the .view call to .reshape to all the copies from bart attention
Fix copies and style
Fix copies and style
Fix copies and style
Fix copies and style
Fix copies and style
Revert unnecessary changes
Revert unnecessary changes
Revert unnecessary changes
Revert unnecessary changes
Italian translation of community.mdx (#21871)
Italian translation of community.mdx gh-17459
[Blip] Fix blip doctest (#21868)
fix blip doctest
Removed BLIP mention from the troubleshooting guide (#21872)
removed BLIP mention from the troubleshooting guide
update FSDP and add XLA-FSDP documentation (#21812)
update FSDP and add XLA-FSDP documentation
resolving comments
minor update
fix xla-fsdp docs
[doc] deepspeed tests (#21859)
Add an utility file to get information from test files (#21856)
Add an utility file to get information from test files
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Add check for different embedding types in examples (#21881)
Add check for different embedding types in examples
Correctly update summarization example
Make loading of pretrained gpt2 faster by avoiding initialization of Conv1D's weights (#21879)
apply normal_ after assigning weight as nn.Parameter to avoid unnecessary initialization computation
Add TFVisionTextDualEncoder (#21873)
Temporary commit to stash everything so far
Temporary commit to stash everything so far
stash commit
Refactor from_pretrained
Fix final test, make fixup
Update dummies
Add model to TEST_FILES_WITH_NO_COMMON_TESTS
Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py
Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com
Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py
Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com
Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py
Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com
Update src/transformers/models/vision_text_dual_encoder/modeling_tf_vision_text_dual_encoder.py
Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com
Add TFVisionTextDualEncoder to utils/documentation_tests.txt
make fixup
Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com
Add ALIGN to transformers (#21741)
Adds the ALIGN model to transformers. ALIGN is introduced in "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision" by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
Fix Gradient checkpointing bug BigBird (#21882)
Co-authored-by: saswatmeher saswatmeher@cse.iitb.ac.in
Fix WhisperModelTest (#21883)
force on the same device
fix tests
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Fix test_load_default_pipelines_pt for ClapModel (#21886)
fix tests
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
fix checkpoint (#21874)
[Refactor] Relative imports wherever we can (#21880)
initial commit
update
second batch
style
fix imports
fix relative import on pipeline
[ZAC] fix ci daily (#21893)
add correct revision after model was overwritten
Use PyAV instead of Decord in examples (#21572)
Use PyAV instead of Decord
Get frame indices
Fix number of frames
Update src/transformers/models/videomae/image_processing_videomae.py
Fix up
Fix copies
Update timesformer doctests
Update docstrings
Add inputs_embeds functionality when generating with BioGPT (#21889)
initial commit to add inputs_embeds to generation
formatting
[T5 doc] Fix confusing documentation about d_kv (#21896)
Confusing documentation in T5
Fix confusing documentation in T5 configuration file
[Whisper] Add rescaling function with do_normalize (#21263)
add zero_mean_unit_var_norm function
normalize before MEL computation
fixup
add simple test
quality
Update tests/models/whisper/test_feature_extraction_whisper.py
Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com
fixup
use attention masks if padding was applied
Update based on review
Co-authored-by: bofeng huang bofenghuang7@gmail.com
Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com Co-authored-by: bofeng huang bofenghuang7@gmail.com
fix typo in Bart's attention (#21898)
[GPT-J] add deprecation warning (#21869)
add deprecation warning
remove pos ids from args docstring
fix failing test
fsdp bf16 enable autocast (#21847)
Fix gradient checkpointing bug LED (#21840)
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Fix gradient checkpointing bug M2M 100 (#21841)
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Fix gradient checkpointing bug marian (#21842)
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Mark pipeline tests to skip them easily (#21887)
Mark pipeline tests to skip them easily
Mark the mixin as pipeline test
Update src/transformers/testing_utils.py
Co-authored-by: Yih-Dar 2521628+ydshieh@users.noreply.github.com
Co-authored-by: Yih-Dar 2521628+ydshieh@users.noreply.github.com
Clean up auto mapping names (#21903)
add new test
fix after new test
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Prophetnet batch dimension inversion fix (#21870)
decoder forward pass is working
no model has forward pass returning attentions
decoder ngram changed to not mix batch size
current basic forward pass returns identical result
passed test_model attentions
passed test_encoder_decoder_model_generate
passed test_headmasking
removed old block
removed comments bug/fixme
removed bug comments
applied styling
applied fix-copies
applied ngram forward comments
corrected dimension notation
applied styling and comment fixes
changed asserts for raise ValueError
changed question gen test
updated hidden_states integration test
applied styling
Make schedulers picklable by making lr_lambda fns global (#21768)
Make schedulers picklable by making lr_lambda fns global
add unused _get_constant_schedule_lr_lambda arg
remove unneeded _get_constant_schedule_lr_lamda
add test
make style
rebase, remove torch dep, put lambda back
repo-consistency and style
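The picklability issue named in the scheduler commits above comes from how `pickle` handles functions: it stores them by qualified name, so a lambda or closure created inside another function cannot be pickled, while a module-level function wrapped in `functools.partial` can. The sketch below illustrates this under stated assumptions; `_linear_warmup_lr_lambda`, `make_lr_lambda_closure`, `make_lr_lambda_partial`, and `is_picklable` are hypothetical names, and the warmup formula is a toy example rather than the library's schedule.

```python
import pickle
from functools import partial


def _linear_warmup_lr_lambda(current_step: int, *, num_warmup_steps: int) -> float:
    """Module-level function, so pickle can store it by reference."""
    if current_step < num_warmup_steps:
        return current_step / max(1, num_warmup_steps)
    return 1.0


def make_lr_lambda_closure(num_warmup_steps: int):
    """Returns a local lambda that captures num_warmup_steps; not picklable."""
    return lambda step: min(1.0, step / max(1, num_warmup_steps))


def make_lr_lambda_partial(num_warmup_steps: int):
    """Binds the argument with functools.partial instead of a closure; picklable."""
    return partial(_linear_warmup_lr_lambda, num_warmup_steps=num_warmup_steps)


def is_picklable(obj) -> bool:
    """Report whether pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
```

Both factories produce the same multipliers for the same step, so swapping the closure for the `partial` changes only whether the resulting scheduler state can be serialized.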
Refactor whisper asr pipeline to include language too. (#21427)
[WIP] whisper refacto to support language output.
Handling merges.
A bit more cleanup and comments.
Many improvements.
Lots of details everywhere.
Cleanup old code and tests.
Handle lone timestamp tokens (just recover when something bad happens).
Adding return_language example.
No ffmpeg.
Hmm.
Some corrections.
Both fast and slow.
New black.
Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com
Update src/transformers/models/whisper/tokenization_whisper.py
Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com
Remove print.
Undoing tests modifications.
Smaller test modifications.
Rename.
Remove maxDiff.
Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com
Add Blip and Blip2 for pipeline tests (#21904)
fix
add to tests
style and quality
add missing
Co-authored-by: NielsRogge NielsRogge@users.noreply.github.com Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Temporarily skip 3 tests in BridgeTowerModelTest (#21908)
skip for now
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Faster zero shot image (#21897)
Make ZeroShotImageClassificationPipeline faster
The pipeline makes separate calls to model for each candidate label. This commit combines all labels into one call. Original code takes more that 60 seconds to process one image and 1000 candidate labels. Updated code takes less than 2 seconds.
implement batching
code formatting
Creating an even faster zero-shot-image-classification.
Unfortunately super tailored towards CLIP.
Co-Authored-By: Yessen Kanapin yessen@deepinfra.com
Quality.
Cleanup.
Order different on the CI it seems.
Cleanup.
Quality.
Co-authored-by: Yessen Kanapin yessen@deepinfra.com
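The speed-up described in the "Faster zero shot image" commit message above comes from replacing one model call per candidate label with a single call over all labels. The toy sketch below only illustrates that call pattern: `CountingScorer` is a hypothetical stand-in for a model (it scores by shared characters, not by any real similarity), and `classify_per_label` and `classify_batched` are illustrative names, not pipeline functions.

```python
from typing import List, Sequence


class CountingScorer:
    """Hypothetical stand-in for a model; counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image: str, labels: Sequence[str]) -> List[float]:
        self.calls += 1
        # Toy score: number of distinct characters shared by image name and label.
        return [float(len(set(image) & set(label))) for label in labels]


def classify_per_label(model: CountingScorer, image: str, labels: Sequence[str]) -> List[float]:
    """One model call per candidate label (the slow pattern)."""
    return [model(image, [label])[0] for label in labels]


def classify_batched(model: CountingScorer, image: str, labels: Sequence[str]) -> List[float]:
    """A single model call covering every candidate label (the fast pattern)."""
    return model(image, list(labels))
```

With N candidate labels the first function makes N calls and the second makes one, which is the difference the commit message reports for 1000 labels.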
[time series] Add Time series inputs tests (#21846)
initial test of inputs
added test for generation
remove asserts
fixed test
Update tests/models/time_series_transformer/test_modeling_time_series_transformer.py
Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com
Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com
Avoid modeling tests run in pipeline CI jobs (#21911)
rework is_pipeline_test
bring back 3 tests
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Fix doctests for TFVisionTextDualEncoder (#21910)
faster forward following what is done for images (#21906)
faster forward following what is done for images
add missing licence
Fix gradient checkpointing bug in MBart (#21918)
Fix gradient checkpointing bug in mvp (#21920)
Fix gradient checkpointing megatron bert (#21921)
Update model_split_percents for WhisperModelTest (#21922)
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Use large VM for repo_utils_job (#21928)
upgrade to large VM
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Cleanup more auto mapping names (#21909)
fix auto 2
fix auto 2
fix task guide issue
fix
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
feat: filter try/except when looking at custom code (#21914)
feat: filter try/except
Update src/transformers/dynamic_module_utils.py
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Fix AlignModelTest tests (#21923)
fix
fix
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Avoid failure in check_repo.py due to missing backends (#21930)
Update utils/check_repo.py
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Update utils/check_repo.py
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Co-authored-by: ydshieh ydshieh@users.noreply.github.com Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Fix wrong documentation about DataCollator padding defaults (#21919)
Fix wrong documentation about DataCollator padding defaults
Fix styling
[Flan-UL2] Add-flan-ul2 (#21929)
add doc and readme
add model docs
update toctree and fix copies
update
update doc file
fix
add FLAN-UL2 to configuration mapping
fixup
Apply suggestions from code review
more clarification
Co-authored-by: younesbelakda younesbelkada@gmail.com Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com
Update README logo (#21933)
[CLAP] Support batched inputs for CLAP. Fixes pipeline issues (#21931)
fix pipeline
fix feature_extraction clap
you can now batch the is_longer attribute
add tests
fixup
add expected scores
comment on is_longer
[Whisper] Fix feature normalization in WhisperFeatureExtractor (#21938)
Fix feature normalization in WhisperFeatureExtractor
Fix gradient checkpointing bug in OPT (#21943)
Fix gradient checkpointing bug in Pegasus (#21944)
Fix gradient checkpointing bug in Rembert (#21945)
Fix gradient checkpointing bug in Roformer (#21946)
Fixed gradient_checkpointing/use_cache bug in blenderbot (#21833)
Fixed gradient_checkpointing/use_cache bug in blenderbot
Update modeling_blenderbot.py
Added back if statement
Formatted using black
Update expected values in XLMProphetNetModelIntegrationTest (#21957)
update values
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
[CI] Fix ci (#21940)
fix get_proposal_pos_embed
fix order
style
zero shot simplify test
add approximate values for zero shot audio classification
Disable DDP for neuron (#21953)
Disable DDp for neuron
Co-authored-by: EC2 Default User ec2-user@ip-172-31-42-72.us-west-2.compute.internal
Fix bert issue (#21963)
Co-authored-by: saswatmeher saswatmeher@cse.iitb.ac.in
[Generate] Fix gradient_checkpointing and use_cache bug for BLOOM (#21956)
Step 1 - Change use_cache fix
Add missing parameter definition in layoutlm config (#21960)
Four parameters in LayoutLM config were missing definitions; added their definitions (copied from BertConfig).
Use larger atol in torch.allclose for some tests (#21966)
Use larger atol
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Add TF contrastive image text finetuning example (#21939)
Initial commit
stash commit
Add model checkpointing and pushing
Fix model name inference
Update README
Update README
Remove a couple of Torch references
Update copyright date
make fixup
Update PushToHubCallback args!
Remove the torch summary
Add strategy.scope
Update expected values for test_xglm_sample (#21975)
update expected values for xglm
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Fix gradient checkpointing bug in BigBird Pegasus (#21976)
Fix gradient checkpointing bug in Blenderbot Small (#21977)
Fix gradient checkpointing bug in BlipText (#21978)
Make Format
Fix gradient checkpointing bug in Codegen (#21979)
Fix gradient checkpointing bug in ESM (#21980)
docs: improve clarity for language modeling (#21952)
docs: improve clarity for clm/mlm
docs: remove incorrect explanation
docs: remove incorrect explanation
Co-authored-by: pdhall99
Update Jukebox tests (#21984)
update expected values for jukebox
update expected values for jukebox
update expected values for jukebox
update expected values for jukebox
update expected values for jukebox
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Add check before int casting for PIL conversion (#21969)
Add check before int casting for PIL conversion
Line length
Tidier logic
Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens (#21959)
Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens
fix docs
Empty commit
formatting
[DETR, YOLOS] Fix device bug (#21974)
Fix integration test
Add test
Add test
Remove unneeded casts to bool (#21983)
Remove cast to Bool
Update notification_service.py (#21992)
better check
better check
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Skip test_multi_gpu_data_parallel_forward for some model tests (#21991)
skip test_multi_gpu_data_parallel_forward for some model tests
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
[Whisper] Add model for audio classification (#21754)
[Whisper] Add model for audio classification
make fix-copies
add to docs
add docstring
empty returns
add code example
switch to fleurs
stick everything on one line
Stop requiring Torch for our TF examples! (#21997)
Stop requiring Torch for our TF examples!
Slight tweak to logging in the example itself
[TF] Fix creating a PR while pushing in TF framework (#21968)
add create pr arg
style
add test
fixup
update test
last nit fix typo
add is_pt_tf_cross_test marker for the tests
[DETR and friends] Remove is_timm_available (#21814)
First draft
Fix to_dict
Improve conversion script
Update config
Remove timm dependency
Fix dummies
Fix typo, add integration test
Upload 101 model as well
Remove timm dummies
Fix style
Co-authored-by: Niels Rogge nielsrogge@Nielss-MacBook-Pro.local
[Time-Series] informer model (#21099)
added informer to gitignore
added informer to gitignore
WIP informer2020
added checking that instantiate works
added config using gluonTS by kashif
WIP config
adding informeConfig. need to remove FeatureEmbedder
done InformerConfig, but need to change the names
Done informer model init. working on enc-dec
added things to address, after reading again enc-dec in the paper
done modeling - checking initialization work
added informer to gitignore
WIP informer2020
added checking that instantiate works
added config using gluonTS by kashif
WIP config
adding informeConfig. need to remove FeatureEmbedder
done InformerConfig, but need to change the names
Done informer model init. working on enc-dec
added things to address, after reading again enc-dec in the paper
done modeling - checking initialization work
moved enc-dec init to InformerEncoder/Decoder init
added 'init_std' to config, now model init works!
WIP conversion script, and added code sources
WIP conversion script: loading original informer pth works
WIP conversion script: change defaults in the config
WIP conversion script: supporting Informer input embedding
WIP conversion script: added parameters for the informer embed
WIP conversion script: change dim_feedforward=2048
WIP conversion script: remove unused args for loading checkpoint
just cleaning up
DataEmbedding removed, after thinking with Kashif
working on forward pass
WIP forward pass: trying to establish working batch for forward pass
cleaning and finalizing
adding HF names and docs
init after cleaning works
WIP in tests
added docs for the informer specific args
fix style
undo change
cleaning informer, now need to work only enc-dec
initial enc-dec classes
added encoder and decoder
added todo
add todos for conv_layers
added decoder docs from vanilla
added encoder docs from vanilla
remove encoder decoder from the original informer
removed AttentionLayer from the original paper
removed TriangularCausalMask, same as decoder_attention_mask
initial sparse attention
use conv_layers
fixed test_config test
fix parenthesis when iterating zip(layers, conv_layers)
error found in prob attention, added sizes as comments
fix sizes
added proposal for q_reduce indexing, and remove unused
WIP ProbMask, and changed factor=2 for testing
remove unused libs for this PR for creating the env
fix checking the attn_weights.size() after bmm
Q_reduce: changed from torch.gather to simple slicing
WIP calculate final attn_output
finish adding v_aggregated, attn_output ready
changed tgt_len to u in attention_mask, need to fix the size error
comment attention_mask for encoder, and fix if cond for v_agg
added ProbMask support (wip), removed old original code
finished ProbMask 😃
Revert "remove unused libs for this PR for creating the env"
This reverts commit 11a081e09e92771e51a5d2758d53a9afb59547f0.
fixes
make style
fix initial tests
fix more tests
dry
make style
remove unused files
style
added integration tests
fix num_static_real_features
fix header
remove unused function
fix example
fix docs
Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com
Update src/transformers/models/informer/modeling_informer.py
Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com
Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com
Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com
Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com
Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com
fixes for reviewer
use prediction_length from model
fix style
fixed informer.mdx
added to index
updated readme
undo
make fix-copies
typo
fix copy
added Informer to toctree
in order
fixed comments
remove unneeded new lines in docs
make static real and cat optional
fix use of distil conv layers
fixed integration test
added checkpoint for convlayer
make fix-copies
updated from time series model
make fix-copies
copy decoder
fix unit tests
updated scaling config
fix integration tests
IGNORE_NON_TESTED
IGNORE_NON_AUTO_CONFIGURED
IGNORE_NON_AUTO_CONFIGURED
updated check configs
fix formatting
undo change from time series
prediction_length should not be None
align with the blog: prettify ProbSparse and change attention_factor to sampling_factor
make style
make fix-copies
niels CR: update contributed by
niels CR: update configuration_informer.py
Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com
- niels CR: update kashif -> huggingface
Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com
niels CR: sampling_factor only relevant when attention_type=prob
make style
fixed U_part: added multiplication by L_Q
fixed bug: remove is not None from if config.distil
fixed test: decoder_seq_length to encoder_seq_length in cross_attentions check
fix integration tests
updated model hub
do not shift as in training
undo
fix make-copies
make fix-copies
added if prediction_length is None
changed ProbSparseAttention to InformerProbSparseAttention
changed V_sum -> v_mean_dim_time
changed ConvLayer to InformerConvLayer and fixed super()
TimeSeriesTransformer -> Informer in decoder's Copied from
more descriptive in ProbSparse
make style
fix copied from
Revert "added if prediction_length is None"
This reverts commit b4cbddfa05e3bd739b79569cd3c3b89e316f2451.
fixed indent
use InformerSinusoidalPositionalEmbedding
make fix-style
fix from #21860
fix name
make fix-copies
use time series utils
fix dec num_heads
docstring
added time series util doc
_import_structure
formatting
changes from review
make style
fix docs
fix doc
removed NegativeLogLikelihood
Co-authored-by: Kashif Rasul kashif.rasul@gmail.com Co-authored-by: NielsRogge 48327001+NielsRogge@users.noreply.github.com
Update tiny model creation script and some others files (#22006)
Update 1
Update 2
Update 3
Update 4
Update 5
Update 6
Update 7
Update 8
Update 9
Update 10
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Generate - add 1 to cur_len to make up the new beam length (#21993)
add 1 to cur_len to make up the new beam length
cur_len is 1 token shorter than the length of the sequence whose best_sum_logprobs is the numerator.
cur_len+=1 before check if beam hyp is done
format code
reformat with black
Co-authored-by: Chiming chiming@biomap.com
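The off-by-one described in this commit can be illustrated with a minimal sketch. It assumes the beam score is the summed log-probability divided by the sequence length raised to a length penalty; the function name and the numbers are hypothetical and are not taken from the transformers code.

```python
def length_normalized_score(sum_logprobs: float, length: int, length_penalty: float = 1.0) -> float:
    """Hypothetical helper: summed log-probability normalized by length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)


# Illustrative values only (assumptions, not from the PR).
best_sum_logprobs = -6.0  # already includes the newly generated token
cur_len = 5               # length counted before the new token was appended

# Normalizing by the stale length makes the score look worse than it is.
stale = length_normalized_score(best_sum_logprobs, cur_len)

# Adding 1 to cur_len matches the length of the sequence that produced the sum.
corrected = length_normalized_score(best_sum_logprobs, cur_len + 1)

print(stale, corrected)
```

Because log-probabilities are negative, dividing by the shorter length yields a lower score, which can change whether a beam hypothesis is judged finished.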
- VideoMAE doctest - use valid dummy pixel values (#22022)
Use valid dummy pixel values
update: bertology paper (#22012)
Update AudioClassificationPipelineTests::test_small_model_pt for PT 2.0.0 (#22023)
fix
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
[bnb] Fix bnb error message (#22026)
fix error message
make style
[WIP] Add BridgeTowerForContrastiveLearning (#21964)
Add BridgeTower for ITC
Fix review feedback
Rename BridgeTowerForITC, cleanup
Fix style and quality
implement tests
Co-authored-by: Tiep Le 97980157+tileintel@users.noreply.github.com Co-authored-by: Tiep Le tiep.le@intel.com
Fix test for torchneuroncore in Trainer (#22028)
Add tokenize_kwargs parameter definition in the FeatureExtractionPipeline (#22031)
add tokenize_kwargs doc in the FeatureExtractionPipeline
[examples/speech-recognition] Add SpecAugment to run_speech_recognition_seq2seq.py (#21942)
Add specaugment to run_speech_recognition_seq2seq.py
Remove useless argument: text_column
Fix quality
Update return_attention_mask condition
Update specaugment arguments only for whisper models
Remove SpecAugment arguments from ModelArguments, only leave default values for simplicity
Apply suggestions from code review
Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com
Update apply_spec_augment only for whisper models
Apply suggestions from code review
Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com
- Rename return_attention_mask to forward_attention_mask to avoid confusion with wav2vec2 models
Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com
fixes the gradient checkpointing of whisper (#22019)
fixing
Update modeling_whisper.py
Update modeling_whisper.py
Update src/transformers/models/whisper/modeling_whisper.py
Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com
Avoid text_config_dict and vision_config_dict being saved for CLIP-like models (#22035)
Avoid text_config_dict and vision_config_dict being saved
for other CLIP-like models
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Mark all BridgeTower tests slow for now (#22039)
slow me
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
- Bug fix: token classification pipeline while passing offset_mapping (#22034)
fix slow tokenizers with passing offset_mapping
Update ALIGN docs (#22025)
Fix typos and add code examples, resources
[21737][T5]: Fix gradient checkpoint bug (#22036)
[21737][T5]: Fix gradient checkpoint bug
[21737][T5]: Fix gradient checkpoint bug
[21737][T5]: Fix gradient checkpoint bug
Update src/transformers/models/mt5/modeling_mt5.py
Update src/transformers/models/t5/modeling_t5.py
Co-authored-by: njindal njindal@adobe.com Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com
- Docs Improvement - In ZSH, not using ' ' around pip install fails, fix it (#22045)
In ZSH, not using ' ' around pip install fails
Running pip install transformers[torch] in the default ZSH terminal will fail with the error zsh: no matches found: transformers[torch]
The solution is to wrap the installation path in ' ' like pip install 'transformers[torch]'
Relevant StackOverflow: https://stackoverflow.com/questions/30539798/zsh-no-matches-found-requestssecurity
Can't install tf2 on M1 Chip by default (#22046)
Remove set_access_token usage + fail tests if FutureWarning (#22051)
Remove set_access_token usage + fail tests if FutureWarning
do not fail on FutureWarning in CI
Co-authored-by: testbot lucainp@hf.co
Show the number of huggingface_hub warnings in CI report (#22054)
show hfh warnings
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Return analysis for hyperparameter_search with Ray backend (#22040)
return analysis for hyperparameter_search with ray backend
Revert "return analysis for hyperparameter_search with ray backend"
This reverts commit cd5179070930e03020d96d98eb51dec3eb21ef75.
add run_summary attribute to BestRun and return analysis for ray backend
fix typo
add doc for run_summary for ray backend
pt-to-tf model architecture override (#22055)
Add an argument to pt-to-tf to allow overriding the model class
make fixup
Minor fix to error message
Remove unused extra conversion from the script
rm $ symbol from code block from contributing.md (#22057)
rm $ symbol from code block
Removed the $ symbol from the code block to make copy-pasting easier.
[deepspeed] offload + non-cpuadam optimizer exception (#22043)
[deepspeed] offload + non-cpuadam optimizer exception
flip
revert min version
Edit the docstring of image_processing_donut to match code (#22033)
Edit the docstring of image_processing_donut to match code
improve style
more style improvement after installing quality
Skip 3 tests for WhisperEncoderModelTest (#22060)
skip 3 tests
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Add setters by type of args to TrainingArguments (#21570)
Add setters by type of args to TrainingArguments
Define more setters
Update tiny model creation script (#22058)
Update the script
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
- Fix case when using --gradient_accumulation_steps with DDP disabled. (#22007)
Co-authored-by: EC2 Default User ec2-user@ip-172-31-42-72.us-west-2.compute.internal
Add a progress bar for the total download of shards (#22062)
Add a progress bar for the total download of shards
Check for no cache at all
Fix check
Fix gradient checkpointing bug in Speech2Text (#22079)
Fix gradient checkpointing bug in Speech2Text
Update modeling_speech_to_text.py
Update modeling_speech_to_text_2.py
Fix gradient checkpointing bug in switch transformer (#22081)
[GPT2] Propose fix for #21080 (#21853)
Make sure position ids are masked
test that padded input produce the same results
fix failing tests
fixup
fix batch test
Fix small typo in flan-ul2.mdx (#22068)
Update flan-ul2.mdx
Update flan-ul2.mdx
Generate - Fix broken documentation links (#22078)
fix broken links
Fix gradient checkpointing bug in Speecht5 (#22080)
Fix gradient checkpointing bug in Speecht5
Update modeling_speech_to_text.py
Update src/transformers/models/speech_to_text/modeling_speech_to_text.py
Fix change errors
Co-authored-by: Joao Gante joaofranciscocardosogante@gmail.com
- Fix hint in src/transformers/modeling_utils.py (#22074)
fix hint
handle numpy inputs in whole word mask data collator (#22032)
GPT-J specific half precision on CPU note (#22086)
re: #21989
update re: #21989
removed cpu option
make style
Fix imports of TF MobileViT (#22065)
Fix imports of TF MobileViT
Fix copies
Revert "[GPT2] Propose fix for #21080" (#22093)
Revert "[GPT2] Propose fix for #21080 (#21853)" to avoid CI failure
This reverts commit a3fef89b2694fac4dd642a3f77d3e96d0c3df82a.
[Whisper] Remove embed_tokens from encoder docstring (#21996)
[Whisper] Remove embed_tokens from encoder docstring
new line to retrigger CI
remove new line
Add AutoModelForZeroShotImageClassification (#22087)
Adds AutoModelForZeroShotImageClassification to transformers
add new model of MGP-STR (#21418)
add new model of MGP-STR
fix the check failings
remove torch and numpy from mgp_tokenization
remove unused import from modeling_mgp_str
add test_processing_mgp_str
rm test_processing_mgp_str.py
add test_processing_mgp_str
add test_processing_mgp_str
add test_processing_mgp_str
rm test_processing_mgp_str and add softmax outs to model
rm test_processing_mgp_str and add softmax outs to model
rewrite the code of mgp-str according to PR suggestions
rewrite the code of mgp-str according to PR suggestions
add new model of MGP-STR
fix the check failings
remove torch and numpy from mgp_tokenization
remove unused import from modeling_mgp_str
add test_processing_mgp_str
rm test_processing_mgp_str.py
add test_processing_mgp_str
add test_processing_mgp_str
add test_processing_mgp_str
rm test_processing_mgp_str and add softmax outs to model
rewrite the code of mgp-str according to PR suggestions
rewrite the code of mgp-str according to PR suggestions
remove representation_size from MGPSTRConfig
reformat configuration_mgp_str.py
format test_processor_mgp_str.py
add test for tokenizer and complete model/processor test and model file
rm unnecessary tuple in modeling_mgp_str
reduce hidden_size/layers/label_size in test_model
add integration tests and change MGPSTR to Mgpstr
add test for logit values
reformat test model file
Co-authored-by: yue kun yuekun.wp@alibaba-inc.com
Add pr_checks.mdx Italian translation (#17459) (#22116)
Add pr_checks.mdx Italian translation (#17459)
Updated pr_checks.mdx Italian translation (#17459)
Fix gradient checkpointing bug in xglm (#22127)
Fix gradient checkpointing bug in Trajectory Transformer (#22125)
Fix gradient checkpointing bug in xlm_roberta_xl (#22128)
Added big_models.mdx italian translation #17600 (#22115)
updated toctree
italian translation big_model.mdx
italian translation big_models
[Blip2] skip accelerate test (#22124)
skip accelerate test
Fix gradient checkpointing bug in xmod (#22129)
Fix gradient checkpointing bug in LongT5 (#22130)
Fix gradient checkpointing bug in trocr (#22126)
Fix gradient checkpointing bug in trocr
Fix format
Update src/transformers/models/trocr/modeling_trocr.py
Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com
Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com
Zero-shot image classification task guide (#22132)
WIP
WIP
manual inference example
make style
Apply suggestions from code review
Co-authored-by: Alara Dirik 8944735+alaradirik@users.noreply.github.com
Co-authored-by: Alara Dirik 8944735+alaradirik@users.noreply.github.com
Fix doc link for MGP-STR (#22138)
Adding Type Hints to TF_Pegasus model (#21941)
Adding Type Hints to TF_Pegasus model
Updated some parameters per maintainer comments
Add a new script to check model testers' config (#22063)
Add script
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
- Update configuration_align.py (projected_dim=640) (#22139)
Update configuration_align.py
updated projected_dim=640 from 512 in arguments of AlignConfig
[Whisper] add get_input_embeddings to WhisperForAudioClassification (#22133)
add get_input_embeddings to WhisperForAudioClassification
add common tests
fix another common test
Update tests/models/whisper/test_modeling_whisper.py
Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com
- fix style
Co-authored-by: Arthur 48595927+ArthurZucker@users.noreply.github.com
Trainer: let generate pick its inputs (#22108)
Let generate pick its inputs
fix squad seq2seq example
Enforce same behavior as PyTorch 2.0 for older versions (#22136)
[trainer] fix bug in grad accum with multiple epochs (#22098)
[trainer] fix bug in grad accum
comment out debug
fix one-off
rename counter
[deepspeed docs] Activation Checkpointing (#22099)
[deepspeed docs] Activation Checkpointing
Apply suggestions from code review
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
- Update deepspeed.mdx
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Remove backend check for torch.compile (#22140)
Remove backend enforcement for torch.compile
Update error
Update src/transformers/training_args.py
Co-authored-by: Stas Bekman stas00@users.noreply.github.com
- Apply suggestions from code review
Co-authored-by: Stas Bekman stas00@users.noreply.github.com
- Style
Co-authored-by: Stas Bekman stas00@users.noreply.github.com
[Safetensors] Add explicit flag to from pretrained (#22083)
[Safetensors] Add explicit flag to from pretrained
add test
remove @
Apply suggestions from code review
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
- Prepare daily CI for torch 2.0.0 (#22135)
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
docs: New terms and updates to glossary (#21982)
Updated glossary with new terms, added abbreviations for certain terms and merged autoencoding models, autoregressive models and causal language modeling into encoder and decoder models
Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
Added link to 'Pipeline for inference' tutorial
Trigger CI
Update docs/source/en/glossary.mdx
Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
- Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
Added entry for self supervised learning, added deleted entries + fixed broken links
Update docs/source/en/glossary.mdx
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com
Co-authored-by: Steven Liu 59462357+stevhliu@users.noreply.github.com Co-authored-by: Sylvain Gugger 35901082+sgugger@users.noreply.github.com
[🛠️] Fix-whisper-breaking-changes (#21965)
temp fix
temporary fix
update
fix tests
fixup
update based on review
Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com
update to fix tests
update docstring
Co-authored-by: Sanchit Gandhi 93869735+sanchit-gandhi@users.noreply.github.com
Move is_pipeline_test_to_skip to specific model test classes (#21999)
Move is_pipeline_test_to_skip to specific model test classes
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Add ConvNeXT V2 (#21679)
Add ConvNeXt V2 to transformers
TF model is separated from the PR to fix issues
Update 2 doctest expected values for torch 2.0.0 (#22148)
update values
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Translation Italian: perf_train_cpu and perf_train_cpu_many (#22151)
added translated files
added perf_train_cpu and perf_train_cpu_many
updated toctree
Fix big model inference for T5 models in float16 (#22095)
Fix big model inference for T5 models in float16
Apply suggestions from code review
Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com
Style
Trigger CI with latest release
Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com
Create MaskedImageCompletionOutput and fix ViT docs (#22152)
create MaskedImageCompletionOutput
fix bugs
fix bugs
to_pil - don't rescale if int and in range 0-255 (#22158)
Don't rescale if int and in range 0-255
Raise value error if int values too large
Update tests/test_image_transforms.py
Update tests/test_image_transforms.py
[trainer] add --optim adamw_torch_fused for pt-2.0+ (#22144)
[trainer] add --optim adamw_torch_fused
change optim default
deal with non-torch
revert default change; prep; add fp16/amp assert
typo
typo
Revert "Enforce same behavior as PyTorch 2.0 for older versions" (#22163)
Revert "Enforce same behavior as PyTorch 2.0 for older versions (#22136)"
This reverts commit 1c801d65eb42a71ea52db797af760bd96c8b113f.
v4.28.0.dev0
Load optimizer state on CPU to avoid CUDA OOM (#22159)
Run all tests by default (#22162)
Fix: unfinished_sequences with correct device (#22184)
Fix: unfinished_sequences with correct device
The original code was causing errors when running torch.jit.trace due to the tensor options being incorrect. I fixed this by using torch.ones to create a tensor with the correct device and dtype. This should resolve the issue with running torch.jit.trace.
- Revert 22152 MaskedImageCompletionOutput changes (#22187)
Revert changes
Regression pipeline device (#22190)
Fix regression in pipeline when device=-1 is passed
Add regression test
Update BridgeTowerForContrastiveLearning (#22145)
Use return_loss for BridgeTowerForContrastiveLearning, add example
fix tests
Update example in BridgeTowerForContrastiveLearning
Update test_modeling_bridgetower.py
update model output format
minor update
Update src/transformers/models/bridgetower/modeling_bridgetower.py
make style
Co-authored-by: Tiep Le 97980157+tileintel@users.noreply.github.com Co-authored-by: Tiep Le tiep.le@intel.com Co-authored-by: Yih-Dar 2521628+ydshieh@users.noreply.github.com Co-authored-by: ydshieh ydshieh@users.noreply.github.com
t5 remove data dependency (#22097)
t5 remove data dependency
make style
make fix-copies
Co-authored-by: Prathik Rao prathikrao@microsoft.com
Fix DeepSpeed CI (#22194)
Deal with torch-tensorrt
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
- Fix typo in Align docs (#22199)
Fix align docs typo
- Update expected values in MgpstrModelIntegrationTest (#22195)
Update values
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Italian Translation of migration.mdx (#22183)
Translation Italian: migration
Update migration.mdx
minor fixes
Update _toctree.yml
Delete migration.mdx
Add italian translation of migration.mdx
Update of migration.mdx translation and toctree
LLaMA Implementation (#21955)
LLaMA
sharding and docs
tweak
black
inits
ruff
LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
init
no checkpoint
docs
ruff
type_vocab_size
tokenizer fixes
tokenizer fixes
Update tokenization_llama.py
Update tokenization_llama.py
Update configuration_llama.py
Update modeling_llama.py
tokenizer add_bos by default
licenses
remove decoder
norms and mlp
rope overhaul
tweaks
black
mention OPT implementation
off-by-one naming
typo
fix
tokenization fix and slicing bug
padding config
cleanup
black
update tests
undo typo
fix vocab caching logic
ruff
docbuilder
attn fix from BlackSamorez
initial feedback
typo
docs
llama case
llama case
load checkpoint docs
comment about tokenizer
tokenizer defaults
clear past_key_values if use_cache=False
last tweaks
last tweaks
last tweaks
last tweaks
Co-authored-by: Stella Biderman stellabiderman@gmail.com
Update tiny model creation script (#22202)
Update UNCONVERTIBLE_MODEL_ARCHITECTURES
Deal with 2 model tester classes in single test file
Deal with 2 model tester classes in single test file
Deal with 2 model tester classes in single test file
make style and quality
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Temporarily fix ONNX model exporting error (#21830)
Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143
Reduced column width
Fix formatting.
Revert "Temporarily fix https://github.com/microsoft/onnx-converters-private/issues/143"
This reverts commit 6e95a108042118d204da447729f3834affa354fc.
Fix export error.
Revert "Fix formatting."
This reverts commit 8310f60da10358edbdf77a2a2f3c83ee55066cb8.
Propagated changes made in SwinV2 to Swin2SR
[XGLM] Add accelerate support for XGLM (#22207)
add accelerate support for XGLM
fix order
fixes a typo in WhisperFeatureExtractor docs. (#22208)
fixes a typo
.
🔥py38 + torch 2 🔥🔥🔥🚀 (#22204)
py38 + torch 2
increment cache versions
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
- Hotfix for natten issue with torch 2.0.0 on CircleCI (#22218)
fix
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
fix typos in llama.mdx (#22223)
fix code example in mgp-str doc (#22219)
Co-authored-by: yue kun yuekun.wp@alibaba-inc.com
- Use dash==2.8.1 for now for daily CI (#22227)
Use dash 2.8.1 for now
Co-authored-by: ydshieh ydshieh@users.noreply.github.com
Depth estimation task guide (#22205)
added doc to toc, auto tip with supported models, mention of task guide in model docs
make style
removed "see also"
minor fix
LLaMA house-keeping (#22216)
LLaMA house-keeping
Doc links
fix AutoTP in deepspeed could not work for bloom (#22196)
fix AutoTP in deepspeed could not work for bloom
Signed-off-by: Wang, Yi A yi.a.wang@intel.com
- add a method in BloomModel to build alibi
Signed-off-by: Wang, Yi A yi.a.wang@intel.com
Signed-off-by: Wang, Yi A yi.a.wang@intel.com
Add LlamaForSequenceClassification (#22209)
Add LlamaForSequenceClassification
Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com
- Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com
Add docstring
Add test
Add input embedding getter and setter
Remove dead code
Co-authored-by: Younes Belkada 49240599+younesbelkada@users.noreply.github.com
- Removed .mdx extensi…