Problems with multiple GPUs

Hi,
I have a problem using Isaac Sim with multiple GPUs. I am running Isaac Sim v4.2.0 inside a Docker container on a server with 8 NVIDIA RTX A6000 graphics cards, running Ubuntu 22.04.5 LTS.
When I run the container with only one GPU, Isaac Sim works as it should. The problem arises when I run the container with more than one graphics card.
With multiple GPUs, Isaac Sim becomes unstable and very slow. For example, when I select the RTX - Interactive (Path Tracing) option with multiple graphics cards, Isaac Sim freezes completely; this does not happen with only one graphics card.

VickNV April 7, 2025, 8:08pm 2

Could you check if there are any errors or culprit messages in the log and share the complete log file for review?

Additionally, could you try using the workstation installation method and see if it also encounters the issue? If possible, could you test with Isaac Sim version 4.5.0 as well?

Hi VickNV,
Thanks for the quick reply.
Isaac Sim freezes completely (screenshot below), so it doesn’t show any errors, but I have attached the log file. I can’t install the simulator directly on the machine; it has to run inside a Docker container. I will also try another version of Isaac Sim.

Screenshot

kit_20250409_092544.log (1.3 MB)

VickNV April 9, 2025, 6:57pm 4

I cannot find any culprit messages in this log. Let’s wait for your result on the latest release.

I ran Isaac Sim 4.5.0 inside a Docker container, and now it seems that the graphics cards are evenly loaded and that multiple GPUs do not slow down the simulation as much.
Still, using only one graphics card works much better than using multiple graphics cards.

VickNV April 11, 2025, 7:27pm 6

Could you elaborate on this observation? Do you mean that running Isaac Sim on a single GPU has a better performance?

That’s right, the simulation is faster with one GPU than with multiple GPUs. It also looks better with one GPU: with multiple graphics cards, artifacts appear as the simulation slows down.

Single GPU:

single-gpu

Multiple GPUs:

multi-gpus

genboys May 9, 2025, 9:43am 8

Hello, have you solved this problem? I have the same problem. The simulation frame rate with four 4090 GPUs is lower than with a single 4060 GPU.

Hi genboys,
I didn’t solve the problem. I still get the same artifacts when using multiple GPUs, so I’m just using one GPU for now.

VickNV May 20, 2025, 4:35am 10

@andrej.ch Do you mean that you are still seeing the artifact issue on Isaac Sim 4.5.0? If so, could you also provide the full log on 4.5.0? Thanks.
@genboys are you seeing the artifact issue on 4.5.0 using multiple GPUs? If so, could you also provide your full log? Thanks.

Yes, with v4.5.0 we still see artifacts when using multiple graphics cards with RTX - Interactive (Path Tracing) turned on. Also, as @genboys said, FPS drops a lot when using multiple GPUs (performance is best with only one GPU).

Single GPU:
SingleGPU.log (1.3 MB)

Screenshot 2025-05-23 141527

Multiple GPUs:
MultipleGPUs.log (1.4 MB)

Screenshot 2025-05-23 140411

VickNV May 28, 2025, 9:12pm 13

@genboys Could you also provide more details of your issue?

VickNV June 4, 2025, 5:33pm 14

We have created an internal ticket for this observation and will keep you updated on any progress.

I just want to offer some general advice here. We see this a lot with multiple GPUs. Is this your machine or are you renting it from a cloud provider? The reason I ask is people assume 4 GPUs will give them 4 x the rendering speed, and 8 GPUs will give them 8 x the rendering speed. This is simply not the case. Nowhere in our documentation do we say you get linear scaling.

The best way to use a massive machine like this is to run 8 Docker containers simultaneously and assign ONE GPU to each container. 8 GPUs, 8 containers, each running on 1 GPU at 100% draw rate. This is the true power of this kind of machine.
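As a rough sketch of the one-container-per-GPU layout described above, the Python snippet below builds one `docker run` command per GPU index and prints it. The image tag and the container names are assumptions for illustration, not something stated in this thread, and a real Isaac Sim launch needs further flags (EULA acceptance, cache volumes, networking) from the official container instructions, which are omitted here.

```python
import shlex

# Assumption: illustrative image tag; substitute the image you actually use.
IMAGE = "nvcr.io/nvidia/isaac-sim:4.5.0"


def build_commands(num_gpus, image=IMAGE):
    """Return one `docker run` argument list per GPU index."""
    commands = []
    for gpu in range(num_gpus):
        commands.append([
            "docker", "run", "--rm", "-d",
            # Hypothetical container name, one per GPU.
            "--name", f"isaac-sim-gpu{gpu}",
            # Expose exactly one physical GPU to this container.
            "--gpus", f"device={gpu}",
            image,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in build_commands(8):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; each argument list could be passed to `subprocess.run` once the remaining launch flags are filled in.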

As I say a lot on here, just because you add 8 engines to a car does not mean you can drive 8 times as fast. The reason the machine is grinding to a halt is that your 8 GPUs are completely overwhelming the rest of the system: the CPU, the bus, the motherboard, the system RAM. It’s total overkill. 1 is good, 2 is great, but beyond that you start to get less and less GPU balance. 4 cards barely work well. 8 I have never seen balanced.

So my advice is this. If this is your machine, run multiple rendering projects simultaneously in containers to use all the cards better. Or just take out 6 cards, and put 2 GPUs each in 3 more machines. Make yourself a 4 way render farm. Still 8 cards but way more rendering power, all being way more efficient.

If this is not a machine you own, but rent, my advice is to stop renting it, because it’s crazy expensive and overkill, and rent 4 x dual GPU machines. You will see a much more efficient workflow.

Having said all of this, three more points.

  1. We are aware of the black horizontal lines on MGPU systems, which appear only when using Path Tracing. This is being investigated and we are working on it. It might be a driver issue, or it might be that you are running an old copy of the template.
  2. It is only with path tracing. I am sure this is just an example, but the scene you are rendering is very very basic. This could easily be rendered with REALTIME mode instead. No waiting for frames and no black lines. Again realtime works best on 1-2 GPUs.
  3. Finally, “resolution” and “samples” matter. If you are rendering at too low a resolution or with too few samples, the other 7 cards have nothing to do and no chance of getting down to work. It takes some minimal amount of resolution for those other cards to “kick in”. If they are all “White” then they are in use. If they are all “grey” then they are not in use. You can change these “tiling” settings in the Render Settings under MGPU. It is the same with resolution. If you are rendering at 1080p, it is not worth it for the other cards to even bother. You want to be rendering at LEAST 4k.
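To put rough numbers on the resolution point, here is a small Python sketch of how many pixels each card would get per frame. It assumes a frame is split evenly across the GPUs, which is a simplification for illustration only; how the renderer actually tiles work across cards is not described in this thread.

```python
# Common frame sizes: 1080p and 4K UHD.
RESOLUTIONS = {"1080p": (1920, 1080), "4K": (3840, 2160)}


def pixels_per_gpu(width, height, num_gpus):
    """Pixels each GPU would handle under a hypothetical even split of one frame."""
    return (width * height) // num_gpus


if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        for gpus in (1, 2, 4, 8):
            share = pixels_per_gpu(w, h, gpus)
            print(f"{name} on {gpus} GPU(s): {share:,} pixels each")
```

Under that assumption, a 1080p frame on 8 GPUs leaves each card 259,200 pixels, while a 4K frame on 8 GPUs gives each card 1,036,800 pixels, which is still only half of one full 1080p frame.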

Thank you for your reply.

I’m using a remote cloud machine. As you said, the documentation does not mention linear scaling, and I did not expect it. Here is what I tried, with some comments.
When using multiple GPUs I get lower FPS, and I get visual artifacts as well (I posted images in earlier posts). I tried using 1, 2, 4, and 8 GPUs, and the best FPS is with only one GPU. The visual artifacts also disappear only when I’m using one GPU. I’ll try that once more just to confirm it.

Using multiple docker containers is something that I want to do as well. Here I also have a problem, and I made a separate post on the forum for that problem. Here is the link for that post.

Yes. The scene is very basic. I loaded that example (and a few more basic scenes) just to see if all GPUs (1, 2, 4, and 8) were loading correctly, and when I saw better performance with only one GPU, I did not test it further with more complex scenes. I continued using only one GPU.

Thank you once again for your comments.

Well the first thing I would do is not pay for an 8 GPU machine and only use 1 of them, lol. So return that and get some normal 1-2 gpu machines.

Second, try not to use Path Tracing in general. We have a new REALTIME 2.0 renderer out for Composer and soon for Isaac. Path tracing really is not necessary unless you are doing very complex caustics or refraction. For most scenes, realtime is totally fine. You can render whatever you want in 4k in milliseconds.

Finally please UPGRADE to 4.5, and soon to be 5.0. Stay ahead with us, because we are making improvements all the time.

And one more thing. Please get on our OFFICIAL AMIs for AWS or Azure. They are correctly set up with the right drivers and the right system setup. We have them for Linux and Windows. Don’t try to set all of this up yourself, because we cannot support it as well.

NVIDIA docs on AWS vWS: Using Omniverse AMIs on the AWS Marketplace — Omniverse Developer Workstations
AWS Linux AMI for development: AWS Marketplace: NVIDIA Omniverse™ Development Workstation (Linux)
AWS Linux AMI for enterprise: AWS Marketplace: NVIDIA Omniverse™ Enterprise Workstation (Linux)
AWS Windows AMI for development: AWS Marketplace: NVIDIA Omniverse™ Development Workstation (Windows)
AWS Windows AMI for enterprise: AWS Marketplace: NVIDIA Omniverse™ Enterprise Workstation (Windows)

I don’t use the machine only for Isaac Sim.

Yes, I do use Isaac Sim version 4.5.0. That was suggested by @VickNV, and FPS is still lower with multiple GPUs than with one. I tried 1, 2, 4 and 8 GPUs as you mentioned. Using 2 or 4 GPUs is better than using 8 GPUs, but I still get the best performance with 1 GPU.

Regarding the cloud server, I set it up following the official instructions: link

system Closed June 25, 2025, 6:43am 19

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.