Slide View : Parallel Programming :: Fall 2019
WLTDO
Is my understanding correct that the program cannot advance to the next step (e.g., i=1) until all program instances have finished their work at the current step (e.g., i=0)?
In addition, would all instances stall until the single memory load completes? What about the case in which the program needs to make several memory loads?
anonymoose
It seems like another benefit of interleaving like this is that, depending on the context of what's being computed, work may be more evenly distributed among program instances, i.e. assuming instruction i takes a similar amount of time to compute as instruction i+1 takes to compute.
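To make the load-balance point concrete, here is a rough sketch in C that totals the work each program instance receives under interleaved and blocked assignment. The cost model (iteration i costs i + 1 units, so neighbouring iterations cost about the same while distant ones differ a lot), the sizes (16 iterations, a gang of 4), and all function names are assumptions made up for illustration; none of them come from the slide. It also only counts total work per instance and ignores lockstep execution.

```c
#include <assert.h>

/* Hypothetical cost model: iteration i costs i + 1 units of work. */
int iteration_cost(int i) {
    return i + 1;
}

/* Total work for instance `inst` when iterations are interleaved:
   the instance handles iterations inst, inst + gang_size, ... */
int interleaved_work(int inst, int n, int gang_size) {
    assert(gang_size > 0);
    int total = 0;
    for (int i = inst; i < n; i += gang_size)
        total += iteration_cost(i);
    return total;
}

/* Total work for instance `inst` when iterations are blocked:
   the instance handles one contiguous chunk of n / gang_size iterations. */
int blocked_work(int inst, int n, int gang_size) {
    assert(gang_size > 0 && n % gang_size == 0);
    int chunk = n / gang_size;
    int total = 0;
    for (int i = inst * chunk; i < (inst + 1) * chunk; i++)
        total += iteration_cost(i);
    return total;
}
```

With n = 16 and a gang of 4, the first and last instances get 28 and 40 units under interleaving, versus 10 and 58 under blocking, so under this assumed cost model the interleaved split is noticeably more even.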
barracuda
(@WLTDO) Yes, the load at one time-step across the program instances is implemented by a vector instruction, so it would have to be completed before the next one.
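As a small illustration of the "one vector load per time-step" point, the sketch below computes which array index each program instance touches at a given step. It is plain C that only models the index mapping, not real SIMD hardware, and the helper names and the gang size of 4 used in the note afterwards are assumptions for the example rather than anything taken from the slide. Under interleaved assignment the gang's indices at one step are consecutive, which is the pattern a single packed vector load can serve; under blocked assignment they are strided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Index touched by instance `inst` at time-step `step` when loop
   iterations are interleaved across the gang (hypothetical helper). */
int interleaved_index(int step, int inst, int gang_size) {
    assert(gang_size > 0);
    return step * gang_size + inst;
}

/* Same, but each instance owns a contiguous chunk of `chunk` iterations. */
int blocked_index(int step, int inst, int chunk) {
    assert(chunk > 0);
    return inst * chunk + step;
}

/* True if the gang's indices for one step are consecutive, i.e. the
   access pattern a single packed vector load could cover. */
bool step_is_contiguous(const int *idx, int gang_size) {
    for (int k = 1; k < gang_size; k++)
        if (idx[k] != idx[k - 1] + 1)
            return false;
    return true;
}

/* Print the indices the gang touches at one step under interleaving
   (gang_size is capped at 64 to keep the sketch simple). */
void print_interleaved_step(int step, int gang_size) {
    int idx[64];
    assert(gang_size > 0 && gang_size <= 64);
    for (int inst = 0; inst < gang_size; inst++)
        idx[inst] = interleaved_index(step, inst, gang_size);
    printf("step %d:", step);
    for (int inst = 0; inst < gang_size; inst++)
        printf(" %d", idx[inst]);
    printf(step_is_contiguous(idx, gang_size) ? "  (contiguous)\n"
                                              : "  (strided)\n");
}
```

For a gang of 4, step 1 touches indices 4, 5, 6, 7 under interleaving, while a blocked split with chunks of 4 would touch 1, 5, 9, 13 at the same step. All four values belong to the same vector instruction, so the next step cannot begin until that load has finished.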