[CI] add a big GPU marker to run memory-intensive tests separately on CI by sayakpaul · Pull Request #9691 · huggingface/diffusers (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation17 Commits38 Checks18 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

What does this PR do?

I have only touched a handful of tests with the marker being introduced. I think we may need to change the slices based on the CI machine and infra. @a-r-r-o-w should consider marking the Cog tests similarly as well?

@DN6 would love to get your thoughts on the design.

@@ -2,6 +2,7 @@ name: Nightly and release tests on main/release branch

on:
workflow_dispatch:
pull_request:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary.

@@ -18,6 +19,7 @@ env:

jobs:
setup_torch_cuda_pipeline_matrix:
if: github.event_name == 'schedule'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporary.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

should consider marking the Cog tests similarly as well?

With model cpu offload and vae tiling, it should be < 16 GB, and I think we documented it here. Are we seeing Cog test failures due to memory? I see that they are passing here

DN6 approved these changes Oct 16, 2024

Collaborator

@DN6 DN6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! 👍🏽

@DN6 okay if I modified the failing tests to account for the machine change?

@DN6 can you give this a look? I think the test failures should go away once the CI Bot has access to Flux.

Once approved I will revert the changes which I have denoted as temporary (like this).


@slow
@require_torch_gpu
class FluxControlNetImg2ImgPipelineSlowTests(unittest.TestCase):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test was correctly done as it doesn't pass the controlnet module to the pipeline and it also uses very dummy inputs which I think should be avoided for an integration test. LMK if you think otherwise.

DN6 approved these changes Oct 31, 2024

In a follow-up I will introduce the quantization tests.

a-r-r-o-w added a commit that referenced this pull request

Nov 1, 2024

… CI (#9691)

add a marker for big gpu tests
update
trigger on PRs temporarily.
onnx
fix
total memory
fixes
reduce memory threshold.
bigger gpu
empty
g6e
Apply suggestions from code review
address comments.
fix
fix
fix
fix
fix
okay
further reduce.
updates
remove
updates
updates
updates
updates
fixes
fixes
updates.
fix
workflow fixes.

Co-authored-by: Aryan aryan@huggingface.co

sayakpaul added a commit that referenced this pull request

Dec 23, 2024

… CI (#9691)

add a marker for big gpu tests
update
trigger on PRs temporarily.
onnx
fix
total memory
fixes
reduce memory threshold.
bigger gpu
empty
g6e
Apply suggestions from code review
address comments.
fix
fix
fix
fix
fix
okay
further reduce.
updates
remove
updates
updates
updates
updates
fixes
fixes
updates.
fix
workflow fixes.

Co-authored-by: Aryan aryan@huggingface.co