[R-sig-hpc] Rmpi spawning across nodes.

Stephen Weston stephen.b.weston at gmail.com
Mon Apr 9 22:55:55 CEST 2012


Hi Ben,

What machines are listed when you execute:

cat $PBS_NODEFILE

in your batch script? Is it definitely four different nodes?
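One way to answer that from inside the batch script is to summarize the node file. This is a minimal sketch. It assumes a Torque/PBS node file with one hostname per allocated slot, and the function name `summarize_nodefile` is hypothetical rather than something from this thread:

```shell
# Hypothetical helper (not from the thread): summarize a PBS node file,
# assumed to contain one hostname per allocated slot.
summarize_nodefile() {
    # Default to the node file PBS provides to the job.
    local nodefile="${1:-$PBS_NODEFILE}"
    if [ ! -r "$nodefile" ]; then
        echo "cannot read node file: '$nodefile'" >&2
        return 1
    fi
    # Slots per host; with nodes=4:ppn=1 each host should appear once.
    sort "$nodefile" | uniq -c
    # Number of distinct hosts; with nodes=4:ppn=1 this should be 4.
    echo "distinct hosts: $(sort -u "$nodefile" | wc -l | tr -d '[:space:]')"
}
```

Called with no argument inside the job, it reads `$PBS_NODEFILE`. If it reports a single host with four slots, the allocation itself explains why every worker lands on one node.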

On Mon, Apr 9, 2012 at 3:28 PM, Ben Weinstein <bweinste at life.bio.sunysb.edu> wrote:

Hi Stephen,

I've tried to follow your answer, but I'm still getting the same results. The heart of my qsub looks like:

mpirun -hostfile $PBS_NODEFILE -np 1 R --slave -f /nfs/user08/bw4sz/Files/Seawulf.R

Before I run the foreach statement, I ask what node I am on:

[1] "Original Node wulfie121"

I make sure the Open MPI library is there:

[1] "/usr/local/pkg/openmpi-1.4.4/lib/"

I make the cluster and ask how many slaves were spawned:

4 slaves are spawned successfully. 0 failed.

Then I ask for the node name of each of my slaves. I believe that if this is working correctly, each of the node names should be different, since I specified

#PBS -l nodes=4:ppn=1

However, all the slaves still spawn on that one node:

[[1]]
   nodename     machine
"wulfie121"    "x86_64"

[[2]]
   nodename     machine
"wulfie121"    "x86_64"

[[3]]
   nodename     machine
"wulfie121"    "x86_64"

[[4]]
   nodename     machine
"wulfie121"    "x86_64"

Finally, I'm timing the process to see if I'm actually getting parallelization:

[1] 4
   user  system elapsed
 17.650  39.990 159.632

Again, the heart of the code looks like:

cl <- makeCluster(4, type = "MPI")
print(clusterCall(cl, function() Sys.info()[c("nodename", "machine")]))
registerDoSNOW(cl)
print(getDoParWorkers())
system.time(five.ten <- rbind.fill(foreach(j = 1:times) %dopar%
    drop.shuffle(j, iterations)))
stopCluster(cl)

I am about to change over to a different parallel backend as suggested, but I doubt that is the root of the problem in this case.

I appreciate the continued help,

Ben Weinstein

On Thu, Mar 29, 2012 at 2:56 PM, Stephen Weston <stephen.b.weston at gmail.com> wrote:

Hi Ben,

You have to run R via mpirun, otherwise all of the workers start on the one node.

> I have tried using mpirun -np 4 in front of the R call, but this just
> fails without message.

You have to use '-np 1', otherwise your script will be executed by mpirun four times, each copy trying to spawn four workers. I'm not sure if that explains failing without a message, however.

Try something like this:

#!/bin/bash
#PBS -o 'qsub.out'
#PBS -e 'qsub.err'
#PBS -l nodes=4:ppn=1
#PBS -m bea
cat $PBS_NODEFILE
hostname
cd $PBS_O_WORKDIR
# Run an R script
mpirun -hostfile $PBS_NODEFILE -np 1 R --slave -f /nfs/user08/bw4sz/Files/Seawulf.R

You may not need to use '-hostfile $PBS_NODEFILE', depending on how your Open MPI was built, but I don't think it ever hurts, and it may be required for your installation.

- Steve
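One possible addition to that script makes it fail fast before `mpirun` when the node file lists fewer distinct hosts than were requested. That way a single-node allocation is caught before any workers are spawned. This is a sketch, not part of the advice above. The helper `require_hosts` is hypothetical, and the expected count of 4 is assumed from `#PBS -l nodes=4:ppn=1`:

```shell
# Hypothetical guard (not from the thread): succeed only if the node file
# lists at least the expected number of distinct hosts.
require_hosts() {
    local nodefile="$1" expected="$2" actual
    # Count distinct hostnames; an unreadable file yields 0.
    actual=$(sort -u "$nodefile" 2>/dev/null | wc -l | tr -d '[:space:]')
    if [ "$actual" -lt "$expected" ]; then
        echo "expected $expected distinct hosts, node file lists $actual" >&2
        return 1
    fi
}
```

It would be called just before the `mpirun` line, e.g. `require_hosts "$PBS_NODEFILE" 4 || exit 1`. It only checks the allocation. If it passes and the workers still land on one node, that points toward how mpirun and Rmpi use the hostfile rather than toward the PBS request.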

--
Ben Weinstein
Graduate Student
Ecology and Evolution
Stony Brook University
http://life.bio.sunysb.edu/~bweinste/index.html


