RFR: race with nested repos in hgforest.sh (original) (raw)

David Holmes david.holmes at oracle.com
Wed Feb 6 10:24:52 UTC 2013


Thanks Chris. I missed a couple of obvious things there. :(

One thing related to the repo name in the $(rc) variable. Somewhere the repo name gets mangled:

the / to _ doesn't work for the first / (not part of your fix though).

I must say though that this bug still has me perplexed. I ran with set -x and observed the following. Given:

Repositories: corba jaxp jaxws langtools jdk hotspot jdk/src/closed

jdk/make/closed jdk/test/closed hotspot/make/closed hotspot/src/closed hotspot/test/closed deploy install sponsors pubs

The first clone I see is corba and the second is jdk/make/closed. The very last clone I see is jdk. BUT there was no error creating the sub-repos before the enclosing repo! ??? Makes no sense.

David

On 6/02/2013 8:04 PM, Chris Hegarty wrote:

Thank for the review. comments inline...

On 06/02/2013 01:53, David Holmes wrote: 204 # Terminate with exit 0 only if all subprocesses were successful 205 ec=0 206 if [ -d ${tmp} ]; then 207 for rc in ls ${tmp}/*.pid.rc ; do

I don't think you need to subshell an invocation of ls, simply do for rc in "${tmp}/*.pid.rc" ; do Yes this is better, but the quotes are incorrect. It should simply be for rc in ${tmp}/*.pid.rc ; do 208 exitcode=cat ${rc} | tr -d ' \n\r' 209 if [ "${exitcode}" != "0" ] ; then 210 echo "WARNING: ${rc} exited abnormally." Not sure printing ${rc} will be very informative there ?? It prints the file name which contains the subrepo name. Not pretty, but this gives more information about which clone/pull failed. 211 ec=1 Is there a "break" we can add here? Yes, we could break out/exit early, but I deliberately want to iterator over all the child process exits codes to better highlight which ones, if more than one, failed. In getsource.sh # Get clones of all nested repositories ! sh ./common/bin/hgforest.sh clone "$@" || exit 1 # Update all existing repositories to the latest sources sh ./common/bin/hgforest.sh pull -u don't we want "|| exit 1" for the pull case as well? Since the pull is the last command there is no need for "|| exit 1", since its exit code will be returned. -Chris.

Thanks, David On 6/02/2013 12:07 AM, Chris Hegarty wrote: On 02/05/2013 06:21 AM, David Holmes wrote: Chris,

When these failures occur does the failure get reflected in an error exit code? Not currently, good point. I'm seeing hudson builds merilly buildinh OpenJDK instead of Oracle JDK because the closed repos were skipped due to this bug :( This is what I am seeing too, and the reason I proposed a patch to resolve it. Updated webrev; http://cr.openjdk.java.net/~chegar/hgforestnestedRepos/webrev/ 1) now checks, and returns, appropriate error code, so that higher level scripts can determine if it completed successfully 2) Added diagnostic error message to help identify potential future problems if a nested repo is blocked for too long -Chris.

David On 2/02/2013 12:40 AM, Chris Hegarty wrote: [ to build-dev and core-libs-dev, expect reviewer from either, but will integrate through jdk8/tl ] This issue is mainly of interest to Oracle engineers, but it effects the public hgforest script. When hgforest.sh is run with an addition argument to specify a closed server, there is a problem/race between the creation of the directories to hold nested repositories and the clone itself. These directories need to be created before the clone command is executed, otherwise it will fail, as below. The trivial fix is to back off these nested repos until their containing directory exists. Webrev: http://cr.openjdk.java.net/~chegar/hgforestnestedRepos/webrev/ sh ./getsource.sh http://xxx.yyy.oracle.com | & tee clone.log # Repositories: corba jaxp jaxws langtools jdk hotspot jdk/src/closed jdk/make/closed jdk/test/closed hotspot/make/closed hotspot/src/closed hotspot/test/closed deploy install sponsors pubs Waiting 5 secs before spawning next background command. Waiting 5 secs before spawning next background command. Waiting 5 secs before spawning next background command. jdk/src/closed: /java/devtools/sparc/mercurial/0.9.5/bin/python -u /java/devtools/sparc/mercurial/latest/bin/hg clone http://xxx.yyy.oracle.com/jdk8/tl/jdk/src/closed jdk/src/closed jdk/src/closed: abort: No such file or directory: jdk/src/closed jdk/make/closed: /java/devtools/sparc/mercurial/0.9.5/bin/python -u /java/devtools/sparc/mercurial/latest/bin/hg clone http://xxx.yyy.oracle.com/jdk8/tl/jdk/make/closed jdk/make/closed jdk/make/closed: abort: No such file or directory: jdk/make/closed Waiting 5 secs before spawning next background command. ....

-Chris.



More information about the core-libs-dev mailing list