[R-sig-hpc] Rmpi spawning across nodes.
Stephen Weston stephen.b.weston at gmail.com
Mon Apr 9 22:55:55 CEST 2012
- Previous message: [R-sig-hpc] libRblas.so => not found problem
- Next message: [R-sig-hpc] Rmpi spawning across nodes
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Ben,
What machines are listed when you execute:
cat $PBS_NODEFILE
in your batch script? Is it definitely four different nodes?
- Steve
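
The node-file check asked about above can also be scripted. This is a minimal sketch, assuming the usual PBS/Torque layout in which $PBS_NODEFILE holds one hostname per line for each allocated slot; count_unique_nodes is a hypothetical helper name, not part of PBS or Open MPI:

```shell
#!/bin/bash
# Count the distinct hosts listed in a node file.
# Assumption: one hostname per line, one line per allocated slot.
# count_unique_nodes is a hypothetical helper, not a PBS or Open MPI command.
count_unique_nodes() {
    # sort -u collapses repeated hostnames; wc -l counts what is left;
    # tr strips the padding some wc implementations add.
    sort -u "$1" | wc -l | tr -d ' '
}

# PBS sets PBS_NODEFILE inside a batch job; the guard keeps the
# snippet harmless when it is run anywhere else.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "distinct nodes: $(count_unique_nodes "$PBS_NODEFILE")"
    # Show how many slots were granted on each host.
    sort "$PBS_NODEFILE" | uniq -c
fi
```

With nodes=4:ppn=1 the count would be expected to be 4. A count of 1 would suggest the scheduler handed the job a single machine, which points at the allocation rather than at Rmpi.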
On Mon, Apr 9, 2012 at 3:28 PM, Ben Weinstein <bweinste at life.bio.sunysb.edu> wrote:
Hi Stephen,
I've tried to follow your answer, but I'm still getting the same results. The heart of my qsub script looks like:

mpirun -hostfile $PBS_NODEFILE -np 1 R --slave -f /nfs/user08/bw4sz/Files/Seawulf.R

Before I run the foreach statement, I ask which node I am on:

[1] "Original Node wulfie121"

I make sure the Open MPI library is there:

[1] "/usr/local/pkg/openmpi-1.4.4/lib/"

I make the cluster and ask how many slaves were spawned:

4 slaves are spawned successfully. 0 failed.

Then I ask for the nodenames of each of my slaves. I believe that if this is working correctly, each of the nodenames should be different, since I specified

#PBS -l nodes=4:ppn=1

However, all the slaves still spawn on that one node:

[[1]]
   nodename     machine
"wulfie121"    "x86_64"

[[2]]
   nodename     machine
"wulfie121"    "x86_64"

[[3]]
   nodename     machine
"wulfie121"    "x86_64"

[[4]]
   nodename     machine
"wulfie121"    "x86_64"

Finally, I'm testing how long the process takes, to see if I'm actually getting parallelization:

[1] 4
   user  system elapsed
 17.650  39.990 159.632

Again, the heart of the code looks like:

cl <- makeCluster(4, type = "MPI")
print(clusterCall(cl, function() Sys.info()[c("nodename","machine")]))
registerDoSNOW(cl)
print(getDoParWorkers())
system.time(five.ten <- rbind.fill(foreach(j=1:times ) %dopar%
drop.shuffle(j,iterations)))
stopCluster(cl)

I am about to change over to a different parallel backend as suggested, but I doubt that is the root of the problem in this case.

I appreciate the continued help,
Ben Weinstein

On Thu, Mar 29, 2012 at 2:56 PM, Stephen Weston <stephen.b.weston at gmail.com> wrote:
Hi Ben,

You have to run R via mpirun, otherwise all of the workers start on the one node.

> I have tried using mpirun -np 4 in front of the R - call, but this just
> fails without message.

You have to use '-np 1', otherwise your script will be executed by mpirun four times, each trying to spawn four workers. I'm not sure if that explains failing without a message, however. Try something like this:

#!/bin/bash
#PBS -o 'qsub.out'
#PBS -e 'qsub.err'
#PBS -l nodes=4:ppn=1
#PBS -m bea

cat $PBS_NODEFILE
hostname
cd $PBS_O_WORKDIR

# Run an R script
mpirun -hostfile $PBS_NODEFILE -np 1 R --slave -f /nfs/user08/bw4sz/Files/Seawulf.R

You may not need to use '-hostfile $PBS_NODEFILE', depending on how your Open MPI was built, but I don't think it ever hurts, and it may be required for your installation.

- Steve

--
Ben Weinstein
Graduate Student
Ecology and Evolution
Stony Brook University
http://life.bio.sunysb.edu/~bweinste/index.html