RFR: race with nested repos in hgforest.sh (original) (raw)
Neil Richards neil.richards at ngmr.net
Wed Feb 6 16:51:15 UTC 2013
- Previous message: RFR: race with nested repos in hgforest.sh
- Next message: RFR: race with nested repos in hgforest.sh
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
fwiw, I've always found the attempts by hg_forest to process things in parallel cause me more pain than it's worth.
Marching through the repo locations serially in order might take a few moments longer elapsed time, but creates instantly understandable status / output / log and points for returning sensible errors when things don't succeed.
Doing so also reduces the burst load on the server (it has to deal with one request per invocation at any time, rather than five? (or more)), which also strikes me as a win.
(There are times that my 'hg pull' requests get momentarily rebuffed by the OpenJDK server, in a manner that suggests to me it's getting flooded by such requests).
Obviously, figuring out how to do everything properly in parallel is a lot more fun :) ... but I don't think this script is there yet, and I wonder if the KISS principle shouldn't apply here.
Regards, Neil
On Wed, 2013-02-06 at 10:41 +0000, Chris Hegarty wrote:
On 06/02/2013 10:24, David Holmes wrote: > Thanks Chris. I missed a couple of obvious things there. :( > > One thing related to the repo name in the $(rc) variable. Somewhere the > repo name gets mangled: > > + echo jdk/src/closed > + sed -e s at ./@@ -e s@/@ at g > + repopidfile=jdsrcclosed > > the / to doesn't work for the first / (not part of your fix though). > > I must say though that this bug still has me perplexed. I ran with set > -x and observed the following. Given: > > # Repositories: corba jaxp jaxws langtools jdk hotspot jdk/src/closed > jdk/make/closed jdk/test/closed hotspot/make/closed > hotspot/src/closed hotspot/test/closed deploy install sponsors pubs > > The first clone I see is corba and the second is jdk/make/closed. The > very last clone I see is jdk. BUT there was no error creating the > sub-repos before the enclosing repo! ??? Makes no sense. I only see the problem when running with old versions of hg. More recent ones seem to be immune to it. I guess they must do something equivalent to 'mkdir -p' on the destination directory. Blocking the clone of the nested repos until the enclosing repo creates the holding directory seems like a small price to pay for such an obscure bug. Also, now returning an error code when something fails should help avoid any future issues of missing or out of date repos. -Chris. > > David > ----- > > On 6/02/2013 8:04 PM, Chris Hegarty wrote: >> Thank for the review. comments inline... >> >> On 06/02/2013 01:53, David Holmes wrote: >>> 204 # Terminate with exit 0 only if all subprocesses were successful >>> 205 ec=0 >>> 206 if [ -d ${tmp} ]; then >>> 207 for rc in
ls ${tmp}/*.pid.rc
; do >>> >>> I don't think you need to subshell an invocation of ls, simply do >>> >>> for rc in "${tmp}/*.pid.rc" ; do >> >> Yes this is better, but the quotes are incorrect. It should simply be >> for rc in ${tmp}/*.pid.rc ; do >> >>> 208 exitcode=cat ${rc} | tr -d ' \n\r'
>>> 209 if [ "${exitcode}" != "0" ] ; then >>> 210 echo "WARNING: ${rc} exited abnormally." >>> >>> Not sure printing ${rc} will be very informative there ?? >> >> It prints the file name which contains the subrepo name. Not pretty, but >> this gives more information about which clone/pull failed. >> >>> 211 ec=1 >>> >>> Is there a "break" we can add here? >> >> Yes, we could break out/exit early, but I deliberately want to iterator >> over all the child process exits codes to better highlight which ones, >> if more than one, failed. >> >>> In getsource.sh >>> >>> # Get clones of all nested repositories >>> ! sh ./common/bin/hgforest.sh clone "$@" || exit 1 >>> >>> # Update all existing repositories to the latest sources >>> sh ./common/bin/hgforest.sh pull -u >>> >>> don't we want "|| exit 1" for the pull case as well? >> >> Since the pull is the last command there is no need for "|| exit 1", >> since its exit code will be returned. >> >> -Chris. >> >>> >>> Thanks, >>> David >>> >>> On 6/02/2013 12:07 AM, Chris Hegarty wrote: >>>> On 02/05/2013 06:21 AM, David Holmes wrote: >>>>> Chris, >>>>> >>>>> When these failures occur does the failure get reflected in an error >>>>> exit code? >>>> >>>> Not currently, good point. >>>> >>>>> I'm seeing hudson builds merilly buildinh OpenJDK instead of Oracle >>>>> JDK >>>>> because the closed repos were skipped due to this bug :( >>>> >>>> This is what I am seeing too, and the reason I proposed a patch to >>>> resolve it. >>>> >>>> Updated webrev; >>>> http://cr.openjdk.java.net/~chegar/hgforestnestedRepos/webrev/ >>>> >>>> 1) now checks, and returns, appropriate error code, so that higher >>>> level scripts can determine if it completed successfully >>>> 2) Added diagnostic error message to help identify potential future >>>> problems if a nested repo is blocked for too long >>>> >>>> -Chris. >>>> >>>>> >>>>> David >>>>> >>>>> On 2/02/2013 12:40 AM, Chris Hegarty wrote: >>>>>> [ to build-dev and core-libs-dev, expect reviewer from either, but >>>>>> will >>>>>> integrate through jdk8/tl ] >>>>>> >>>>>> This issue is mainly of interest to Oracle engineers, but it effects >>>>>> the >>>>>> public hgforest script. >>>>>> >>>>>> When hgforest.sh is run with an addition argument to specify a closed >>>>>> server, there is a problem/race between the creation of the >>>>>> directories >>>>>> to hold nested repositories and the clone itself. These directories >>>>>> need >>>>>> to be created before the clone command is executed, otherwise it will >>>>>> fail, as below. >>>>>> >>>>>> The trivial fix is to back off these nested repos until their >>>>>> containing >>>>>> directory exists. >>>>>> >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~chegar/hgforestnestedRepos/webrev/ >>>>>> >>>>>> sh ./getsource.sh http://xxx.yyy.oracle.com | & tee clone.log >>>>>> # Repositories: corba jaxp jaxws langtools jdk hotspot jdk/src/closed >>>>>> jdk/make/closed jdk/test/closed hotspot/make/closed >>>>>> hotspot/src/closed >>>>>> hotspot/test/closed deploy install sponsors pubs >>>>>> >>>>>> Waiting 5 secs before spawning next background command. >>>>>> Waiting 5 secs before spawning next background command. >>>>>> Waiting 5 secs before spawning next background command. >>>>>> jdk/src/closed: /java/devtools/sparc/mercurial/0.9.5/bin/python -u >>>>>> /java/devtools/sparc/mercurial/latest/bin/hg clone >>>>>> http://xxx.yyy.oracle.com/jdk8/tl/jdk/src/closed jdk/src/closed >>>>>> jdk/src/closed: abort: No such file or directory: jdk/src/closed >>>>>> jdk/make/closed: /java/devtools/sparc/mercurial/0.9.5/bin/python -u >>>>>> /java/devtools/sparc/mercurial/latest/bin/hg clone >>>>>> http://xxx.yyy.oracle.com/jdk8/tl/jdk/make/closed jdk/make/closed >>>>>> jdk/make/closed: abort: No such file or directory: jdk/make/closed >>>>>> Waiting 5 secs before spawning next background command. >>>>>> .... >>>>>> >>>>>> >>>>>> -Chris.
-- Unless stated above: IBM email: neil_richards at uk.ibm.com IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
- Previous message: RFR: race with nested repos in hgforest.sh
- Next message: RFR: race with nested repos in hgforest.sh
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]