Screamernet crash-O-rama - best solution?

egearbox

Mostly Harmless
I have three quad-core CPU machines set up for Screamernet, each running 4 instances (total of 12 nodes). I run one of the machines as the "controller" also. (I think Newtek historically says this is a no-no, but I loathe losing 1/3 of my render farm.)

Every time I render something on SN, it seems one or more of the nodes crash. There's never anything in the log file; the node just stops responding and/or completely disappears off the client machine. Normally I just sigh, identify the missing frames, and re-render them locally. But this last time, 7 of the 12 nodes crashed. So I really have reached a point where I have to do something about this.
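The "identify the missing frames" step can be scripted. Below is a minimal sketch, assuming the rendered files end in a frame number followed by an extension (for example the hypothetical name shot_0012.png) and that a zero-byte file counts as a failed frame. The function name, the naming pattern, and the folder path are all illustrative and not part of Screamernet.

```python
import re
from pathlib import Path


def missing_frames(output_dir, first, last, pattern=r"(\d+)\.\w+$"):
    """Return frame numbers in [first, last] that have no usable file in output_dir.

    Assumption: each file name ends in <digits>.<extension>, e.g. shot_0012.png.
    Zero-byte files are treated as missing, on the guess that a node which
    died mid-frame may leave an empty file behind.
    """
    frame_re = re.compile(pattern)
    found = set()
    for path in Path(output_dir).iterdir():
        match = frame_re.search(path.name)
        if match and path.is_file() and path.stat().st_size > 0:
            found.add(int(match.group(1)))
    return [frame for frame in range(first, last + 1) if frame not in found]


# Example call (the folder is hypothetical):
#   print(missing_frames("D:/renders/shot01", 1, 240))
```

The returned list is what would need re-rendering, whether locally or by resubmitting those frames to the farm.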

Some simple questions about what to do now:

1) Is this "normal"? Does everyone have this level of crashiness with Screamernet? Or is there something weird going on here?
2) I'm running 11.6.2. Would it help to install 11.6.3? Does anyone know for sure whether anything in Screamernet was fixed? (I checked the buglist and nothing jumped out at me, but maybe someone knows.)
3) Is there a better way of running network rendering without spending more money? I confess I don't know what the current options are. I'm just starting my search, so if anyone has recommendations I'd love to hear them.

Thanks ...
 

3D Kiwi

Lava Lamp Technician
Does this happen when you run only one node per machine? By running 4 per machine you need 4 times the amount of RAM, as it's loading the scene 4 times. You may be running out of RAM.
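As a rough illustration of that arithmetic, here is a small sketch that estimates how many instances fit on one machine if every instance loads its own copy of the scene. The function name, the 2 GB reserved for the operating system, and the example figures are assumptions for illustration. Real per-instance memory use depends on the scene and has to be measured.

```python
def max_instances(total_ram_gb, scene_ram_gb, reserved_gb=2.0):
    """Estimate how many render instances fit in RAM on one machine.

    Assumes each instance needs scene_ram_gb for its own copy of the scene
    and that reserved_gb is set aside for the OS and other programs.
    """
    if scene_ram_gb <= 0:
        raise ValueError("scene_ram_gb must be positive")
    usable_gb = total_ram_gb - reserved_gb
    return max(0, int(usable_gb // scene_ram_gb))


# Hypothetical machine: 8 GB of RAM, scene needs about 3 GB per instance.
print(max_instances(8, 3))  # prints 2
```

With those made-up numbers, four instances would need about 12 GB for scene data alone, which would push the machine into swapping or out-of-memory failures.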

You may also be overloading the network.

I haven't used it in production but I tried Amleto and it worked well with two nodes per machine.

Also, RenderPal gives you 3 free nodes. It works OK with LightWave.
 

egearbox

Mostly Harmless
Thanks for the reply! I've tried it with 2 nodes per machine and again, it appears that some of the nodes hang, although I didn't lose any frames. That may indicate you're absolutely correct: I may simply be overloading the machines or the network. I'll reconfigure to just one node per machine and see if reliability improves.

Thanks for the other suggestions as well - I'll check out Amleto and RenderPal.
 
1: No, it's not normal.
2: 11.6.2 and 11.6.3 are almost the same. The only thing that changed was something with the license. So no.
3: You could install Amleto to manage your render farm. http://virtualcoder.co.uk/amleto/

Why are you trying to run more than one node on each machine? It's usually better to just run one node per system. Also, what Kiwi says.
 

Scazzino

3D Mac Maniac
Be sure to adjust the render threads downward if you are running more than one instance per computer. If it has 4 cores and 8 virtual threads and you want to run two instances then set the render threads down from auto to 4 each. But keep in mind that running multiple instances will require multiple times the amount of RAM necessary. Two instances will require twice the RAM of one instance. Unless you are doing things that are inherently single-threaded, you're generally best off with one instance per computer set to auto threads.
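The thread split described above is just the machine's logical CPU count divided by the number of instances. A minimal sketch follows, assuming an even split. The helper name is made up, and the result still has to be entered by hand in each instance's render threads setting.

```python
import os


def threads_per_instance(instances, logical_cpus=None):
    """Split the machine's logical CPUs evenly across render instances.

    logical_cpus defaults to what the OS reports (virtual threads included).
    Each instance always gets at least one thread.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    return max(1, logical_cpus // instances)


# The case from the post: 4 cores / 8 virtual threads, two instances.
print(threads_per_instance(2, logical_cpus=8))  # prints 4
```

With a single instance the function simply returns the full logical CPU count, which corresponds to the one-instance case where the post recommends leaving threads on auto.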

Here's some information about the thread setting (written for Mac but it's similar on Win).
Managing LightWave’s All Important Config Files - Multithreading
 

egearbox

Mostly Harmless
Yes, I reduced the instances from 4 per machine to 2 and eventually to one. I'm now running a single node on each machine and the problem seems fixed.

I'm not sure why I did it the other way to begin with - I had some rationale at the time but it's long since escaped me. Anyway, thanks to everyone for their help! Problem solved!
 