1 16th June 06:02
patrick vervoorn
External User
 
Posts: 1
Default Mega-long/slow workunit?


Hi,

I'm running SetiBOINC on several computers, one of which is a P4-2.8GHz
HT-enabled machine.

It is currently processing a WU which seems to take a very long time.
Using boinc_curses, I get:

setiathome_enhanced 527 04mr07ab.7106.' Running 47:24:28 0.089% 48:51:06


Full WU name/id:

04mr07ab.7106.4571.10.4.203

I've looked this one up via the BOINC account page, and the other
people/machines that have crunched this WU aborted it with a 'too long
processing' error.

My SetiBOINC install hasn't reached that conclusion (yet). Any idea on
when it will do that, and what the criteria are for such a conclusion?

This mega-long WU is now blocking one of the two threads on this machine
from crunching 'useful' WUs. Is there a way to manually clear it?

First time I ever ran into a WU like this actually...

Regards,

Patrick.
2 17th June 02:44
mark conroe
External User
 
Posts: 1
Default Mega-long/slow workunit?


There is an 'abort' option on 5.x clients. Sounds like you may have one of
the buggy clients/WUs sent out accidentally. See these threads for more details:

http://setiathome.berkeley.edu/forum_thread.php?id=41585
http://setiathome.berkeley.edu/forum_thread.php?id=41736
3 17th June 02:44
patrick vervoorn
External User
 
Posts: 1
Default Mega-long/slow workunit?


I didn't have to abort it. After 52 hours of crunching, it finally
finished that WU and sent it back/reported it.

However, the estimates on that client now seem to be a bit messed up: it
now 'thinks' all WUs will take 200+ hours to finish:

4 results
setiathome_enhanced 527 11fe07ag.27088.1' Running 7:12:16 91.712% 2:49:33
setiathome_enhanced 527 11ja07aa.3721.11' Running 4:24:08 61.789% 41:18:29
setiathome_enhanced 527 11ja07ab.19187.3' Ready to run 271:22:50
setiathome_enhanced 527 11mr07ab.27573.8' Ready to run 221:09:20

These estimates are totally wrong, since it finishes WU's in around 8 or 9
hours each. This also means the client does not maintain a 'realistic'
cache anymore, since it 'thinks' it has several days worth of WUs in the
cache (which it doesn't).
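
To put rough numbers on that gap, here is a back-of-envelope sketch in Python. It is not BOINC's actual work-fetch logic: it simply sums the remaining-time column from the listing above and compares it with what the same four tasks would need at ~8.5 hours each. The 8.5-hour figure, the two threads, and treating 'Ready to run' tasks as 0% done are assumptions taken from this post, not values read from the client.

```python
# Rough comparison of "how much work the client thinks it has" versus
# "how much it really has". Illustration only, not BOINC's algorithm.

CORES = 2                  # HT-enabled P4 crunching two WUs at a time
TYPICAL_TASK_HOURS = 8.5   # assumed real runtime per WU ("around 8 or 9 hours")

# (percent done, remaining time as shown by boinc_curses)
# "Ready to run" tasks are assumed to be at 0%.
tasks = [
    (91.712, "2:49:33"),
    (61.789, "41:18:29"),
    (0.0, "271:22:50"),
    (0.0, "221:09:20"),
]


def hms_to_hours(text):
    """Convert an H:MM:SS string to a number of hours."""
    h, m, s = (int(part) for part in text.split(":"))
    return h + m / 60 + s / 3600


def days_of_work(hours_per_task, cores=CORES):
    """Spread the total remaining hours over the cores, expressed in days."""
    return sum(hours_per_task) / cores / 24


client_view = days_of_work(hms_to_hours(rem) for _, rem in tasks)
realistic_view = days_of_work(
    TYPICAL_TASK_HOURS * (1 - pct / 100) for pct, _ in tasks
)

print(f"client's view:  {client_view:.1f} days of work cached")
print(f"realistic view: {realistic_view:.1f} days of work cached")
```

Under these assumptions the client sees well over a week of cached work where there is really less than a day, which would explain why it stops asking for more.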

Restarting the client has no effect on these estimates. Is there any way to
force these to go lower again, or will the client self-adjust once it has
crunched a few more WUs in the 'standard' time of ~8 hours?

I do hope my P1-133MHz SetiBOINC cruncher doesn't run into one of these
WUs, since it would probably take it a few years to finish.

Regards,

Patrick.
4 17th June 02:44
odysseus
External User
 
Posts: 1
Default Mega-long/slow workunit?


<snip>


Yes; it will take a while to come down, because the scheduler is
designed to err on the side of caution when figuring out how much work
your system can do. You can ascertain the basis of its decisions from
the "Result duration correction factor" (RDCF) shown near the bottom of
the host page(s) in your account: the project-supplied time estimate for
each task is multiplied by this figure to obtain the estimate that your
BOINC client uses to decide when to ask for work and so on. The
algorithm for calculating this value makes it rise much more easily than
fall.
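
As a minimal sketch of that behaviour in Python: the estimate is the project-supplied figure multiplied by the RDCF, and the factor jumps up at once but only drifts down a little per finished task. The update rule and its 10% smoothing weight are assumptions chosen for illustration, not copied from the BOINC source, and the names `estimated_runtime` and `update_rdcf` are hypothetical.

```python
def estimated_runtime(project_estimate_s, rdcf):
    """Estimate the client works with: project-supplied estimate times RDCF."""
    return project_estimate_s * rdcf


def update_rdcf(rdcf, actual_s, project_estimate_s, smoothing=0.1):
    """Hypothetical asymmetric update (an assumption, not BOINC's real code).

    If a task ran longer than the current factor predicts, adopt the new
    ratio immediately; otherwise move only a fraction of the way down.
    """
    ratio = actual_s / project_estimate_s
    if ratio > rdcf:
        return ratio
    return (1 - smoothing) * rdcf + smoothing * ratio


# Assume the project estimate matches a normal ~8.5 hour task (factor 1.0).
estimate_s = 8.5 * 3600
rdcf = 1.0

# One rogue task that takes 52 hours pushes the factor up in a single step.
rdcf = update_rdcf(rdcf, 52 * 3600, estimate_s)
history = [rdcf]

# Ten normal tasks afterwards bring it down only gradually.
for _ in range(10):
    rdcf = update_rdcf(rdcf, estimate_s, estimate_s)
    history.append(rdcf)

for n, value in enumerate(history):
    hours = estimated_runtime(estimate_s, value) / 3600
    print(f"after {n:2d} normal tasks: RDCF {value:.2f}, estimate {hours:.1f} h")
```

With these assumed numbers, a single bad task is enough to multiply every estimate by about six, and the factor is still well above 2 after ten ordinary results.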

You can reset the RDCF by editing certain XML files in the BOINC data
folder, but unless the high estimates are extremely inconvenient for
you, I recommend letting it adjust on its own. Note that the new
Multibeam tasks have a different 'profile' (in terms of angle ranges and
the computations required), so everyone's clients will be adapting for a
while.

--
Odysseus
5 17th June 02:44
patrick vervoorn
External User
 
Posts: 1
Default Mega-long/slow workunit?


It's not overly inconvenient, but the machine is plenty fast, and I'd set
the queues pretty long, so most of the time it caches around 15 - 20
WUs (which it sometimes ran out of when the servers were down for a
longer time).

Anyway, the new estimates the client does are set around 50 hours now, so
it seems to be adjusting. However, if this happens every time I get a
'faulty WU' as described in my OP, that would be pretty inconvenient....

Anyway, I'll keep an eye on it. It's funny to see the client self-adjusting.

Regards,

Patrick.
6 17th June 02:44
patrick vervoorn
External User
 
Posts: 1
Default Mega-long/slow workunit?


I've finally found this. It's currently:

Result duration correction factor 5.659713


For this host, while other computers I have running SetiBOINC are lower
than 1.0. The machine is currently struggling to get work. It crunches two
WU's at a time (taking about 8 to 9 hours per WU), and it only starts
fetching new WU's when either of these is close to finishing (and the
estimated time to finishing a WU is finally getting more realistic).

What's the easy way to set this to a more realistic value, and _what_ is a
more realistic value? 1.0?

How fast will this re-adjust itself? I'll keep monitoring the Host page
and see what this 'factor' does over the next few days... I suppose it
should be coming down, since the machine is back to its usual turnaround
times...

I must say it's rather sloppy programming if this 'factor' is skewed like
this after the machine got handed 1 'buggy' WU....

Regards,

Patrick.
7 17th June 02:44
patrick vervoorn
External User
 
Posts: 1
Default Mega-long/slow workunit?


After looking at the machine struggling to get a few workunits, and being
idle during the entire weekend because the SetiBOINC pipelines dried up, I
finally just edited the BOINC/client_state.xml file.
In there, in the <time_stats> section, there was a field called
<duration_correction_factor>, which contained a value of ~5.x. I stopped
BOINC, changed this value to 1.0, then restarted the client again, which
immediately started requesting WUs and slurping them in.

It now has a healthy cache of about 15 WUs again, enough to keep it busy
for a few days if/when the servers are down again. I suppose BOINC will
start re-adjusting this 'factor' once it has finished some WUs, but it now
has a more realistic 'starting-value' again...

Hope this is of some help to other people having the same problem...

Regards,

Patrick.
8 17th June 02:44
patrick vervoorn
External User
 
Posts: 1
Default Mega-long/slow workunit?


[Mega-snip, apologies for these follow-ups to myself, but I'd like to get
this into Google and other archives correctly]


My apologies, this is not in the <time_stats> section, but in the
<project> section.
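
For anyone who would rather script the edit than do it by hand, here is a minimal Python sketch. It assumes the value sits on one line as <duration_correction_factor>...</duration_correction_factor> inside client_state.xml, as described above; the helper names `reset_dcf` and `reset_dcf_file` are made up for this example. It uses a plain text substitution so the rest of the file is left byte-for-byte alone, and it resets the factor for every project in the file, not just one. Stop BOINC before running it, as with the manual edit.

```python
import re
import shutil
from pathlib import Path

# Matches the whole element so that only the number between the tags changes.
DCF_PATTERN = re.compile(
    r"(<duration_correction_factor>)[^<]*(</duration_correction_factor>)"
)


def reset_dcf(xml_text, new_value=1.0):
    """Return (new_text, count) with every duration correction factor reset.

    Note: this touches the factor of every <project> section in the text.
    """
    replacement = rf"\g<1>{new_value:.6f}\g<2>"
    return DCF_PATTERN.subn(replacement, xml_text)


def reset_dcf_file(path, new_value=1.0):
    """Back up the state file, rewrite it in place, return the edit count.

    Only run this while the BOINC client is stopped.
    """
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    new_text, count = reset_dcf(path.read_text(), new_value)
    path.write_text(new_text)
    return count


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever your BOINC data folder is.
    state_file = Path("BOINC/client_state.xml")
    if state_file.exists():
        print(f"reset {reset_dcf_file(state_file)} factor(s)")
    else:
        print(f"{state_file} not found, nothing changed")
```

The backup copy (client_state.xml.bak) is there so the original state can be restored if the client objects to the edited file.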

Regards,

Patrick.