1,000,000 particle limit = counter??



NinoK
10-11-2011, 03:45 PM
So I have an emitter emitting particles to set up a 'trail' behind other particles. The 'max age' of the trail particles is only 20 frames, and the emitter is producing only 300 particles per second, or roughly 10 per frame per parent emitter. By all counts, with a lifetime of 20 frames, the maximum number of particles I will have on screen at any one time before they are killed off is somewhere around 1200 or so, and this is evident from how dense they look. The math is a very rough estimate by the way, the point being it's much less than the 1,000,000 particle limit.

After that, particles older than 20 frames should be 'killed' and the emitter should keep producing them, because the total number of particles has yet to be hit.

However, the 'max number of particles' in LW is more like a counter: even though there are only 1200 particles on screen, if the emitter has produced more than 1,000,000 total, it stops making them. So if I have long trails and my 1,000,000 particle count gets hit fast, I have to instance a ton of additional emitters and sequence them, which is a pain in the *** for baking and control.

If the emitter makes 1,000 per frame but they only last 2 frames, I will never have more than 3,000 particles on screen, so why make the limit a counter? The way it works now, this emitter can only function and produce particles for about 33.333 seconds at 30 FPS... after which I have to work in another emitter to keep the animation going...

Sensei
10-11-2011, 03:55 PM
The particle limit is probably there because particles are allocated only once, not dynamically, which simplifies the code and makes it faster.

Memory for the array of particles is allocated in an initialization routine, and then unique particles are made one by one, each just asking for a free particle index slot. They live for some time, are killed, and remain dead (the routine scanning the emitters sees them as dead through the last frame of the scene), so the same index is never used again. Otherwise a dead particle would come back to life and perhaps confuse routines that are holding particle indexes (e.g. particle #100 lives in frames 1-100, then is killed, but some other emitter reuses the index and the particle lives again in frames 200-300 while having nothing in common with the previous instance? No, it would confuse procedures..)
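
For what it's worth, here is a toy sketch (in plain C, with made-up names and sizes, not LW's actual internals) of that once-allocated scheme: slots are handed out from a counter that only ever increases, so even though very few particles are alive at the same time, emission stops once the counter reaches the pool size.

#include <stdio.h>

#define MAX_PARTICLES 1000000            /* fixed pool, allocated once */

typedef struct { float age; int alive; } Particle;

static Particle pool[ MAX_PARTICLES ];
static int      next_index = 0;          /* only ever increases */

/* Ask for a free slot; slots of dead particles are never handed out again. */
static int add_particle( void )
{
    if ( next_index >= MAX_PARTICLES )
        return -1;                       /* the "limit" is really this counter running out */
    pool[ next_index ].age   = 0.0f;
    pool[ next_index ].alive = 1;
    return next_index++;
}

static void kill_particle( int i )
{
    pool[ i ].alive = 0;                 /* stays dead; the index is not recycled */
}

int main( void )
{
    /* Emit 1,000 particles per frame and kill them again right away:
       almost nothing is alive at any moment, yet the pool still runs dry
       after 1,000 frames because indexes are never recycled. */
    for ( int frame = 0; frame < 1200; frame++ ) {
        for ( int n = 0; n < 1000; n++ ) {
            int idx = add_particle();
            if ( idx < 0 ) {
                printf( "pool exhausted at frame %d\n", frame );
                return 0;
            }
            kill_particle( idx );
        }
    }
    return 0;
}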

xxiii
10-11-2011, 04:31 PM
The particle limit is probably there because particles are allocated only once, not dynamically, which simplifies the code and makes it faster.

Memory for the array of particles is allocated in an initialization routine, and then unique particles are made one by one, each just asking for a free particle index slot. They live for some time, are killed, and remain dead (the routine scanning the emitters sees them as dead through the last frame of the scene), so the same index is never used again. Otherwise a dead particle would come back to life and perhaps confuse routines that are holding particle indexes (e.g. particle #100 lives in frames 1-100, then is killed, but some other emitter reuses the index and the particle lives again in frames 200-300 while having nothing in common with the previous instance? No, it would confuse procedures..)

Since particle calculation time is going to be dwarfed by frame render time, this seems, uh, lazy. At the very least it should be possible to come up with a virtual index to real index translation that will return a null or empty placeholder particle to any routine trying to reference a dead particle, while allowing the underlying memory to be reused for a new particle. Each particle would always have a unique ID, but would only use memory while it is actually alive.

On a modern processor, even for 4 billion particles, this translation routine would only add a few seconds (assuming a C/C++/assembler implementation); if you have only a million particles it would be nearly instantaneous, and dwarfed by the other calculations the particle is involved in.

Ideally, any routine asking for particles would only ask for live particles, and/or would recognize the "dead" flag of a returned particle and immediately stop whatever it is trying to do with it.
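
To make the idea concrete, here is a minimal sketch of such a virtual-to-real translation (plain C, every name hypothetical, and only one of several possible designs): particles get permanent IDs, a table maps an ID to its current slot or to "dead", and the slot memory itself is recycled through a free list.

#include <stddef.h>

typedef struct { float pos[ 3 ], age; } Particle;

typedef struct {
    Particle *slots;        /* reusable storage, only as large as the live count   */
    int      *id_to_slot;   /* permanent particle ID -> slot index, or -1 if dead  */
    int      *free_list;    /* recycled slot indexes; filled 0..capacity-1 at init */
    int       capacity, free_count, next_id;
} Pool;

/* Dead IDs resolve to NULL, so callers can bail out immediately. */
static Particle *lookup( Pool *p, int id )
{
    int slot = p->id_to_slot[ id ];
    return ( slot < 0 ) ? NULL : &p->slots[ slot ];
}

/* Returns a permanent ID; the slot memory may have belonged to a dead particle. */
static int spawn( Pool *p )
{
    if ( p->free_count == 0 )
        return -1;                               /* truly full: every slot holds a live particle */
    int slot = p->free_list[ --p->free_count ];
    int id   = p->next_id++;                     /* IDs stay unique even though slots are reused */
    p->id_to_slot[ id ] = slot;
    return id;
}

static void kill_id( Pool *p, int id )
{
    p->free_list[ p->free_count++ ] = p->id_to_slot[ id ];
    p->id_to_slot[ id ] = -1;                    /* every later lookup sees this ID as dead */
}

(Allocation and initialization are omitted, and a flat id_to_slot array still grows with the total number of IDs ever issued; a real version would use a hash map or fold a generation counter into the ID instead.)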

Until this post just now, I didn't realise there was a total upper limit on particles; I thought there was only a maximum at any one time.

In any case, it seems like the 64-bit version should support a higher limit (and preferably a configurable one, if there must be a limit at all).

Sensei
10-11-2011, 04:43 PM
Since particle calculation time is going to be dwarfed by frame render time, this seems, uh, lazy.

This code is 10+ years old or so..


Each particle would always have a unique ID, but would only use memory while it is actually alive.

That doesn't make sense. Why query a dead particle without all of its data? A dead particle might be useful for something, but only if it has all of its data, in my opinion.



Ideally, any routine asking for particles would only ask for live particles, and/or would recognize the "dead" flag of a returned particle and immediately stop whatever it is trying to do with it.

Reading particle data index by index through the LWSDK is slow. I was always using the routine that returns the whole array of particles; nothing can be faster than that..


typedef struct st_LWPSysFuncs {
    LWPSysID    (*create)      (int flags, int type);
    int         (*destroy)     (LWPSysID);
    int         (*init)        (LWPSysID, int np);
    void        (*cleanup)     (LWPSysID);
    void        (*load)        (LWPSysID, LWLoadState *);
    void        (*save)        (LWPSysID, LWSaveState *);
    int         (*getPCount)   (LWPSysID);
    void        (*attach)      (LWPSysID, LWItemID);
    void        (*detach)      (LWPSysID, LWItemID);
    LWPSysID *  (*getPSys)     (LWItemID);
    LWPSBufID   (*addBuf)      (LWPSysID, LWPSBufDesc);
    LWPSBufID   (*getBufID)    (LWPSysID, int bufFlag);
    void        (*setBufData)  (LWPSBufID, void *data);
    void        (*getBufData)  (LWPSBufID, void *data);
    int         (*addParticle) (LWPSysID);
    void        (*setParticle) (LWPSBufID, int index, void *data);
    void        (*getParticle) (LWPSBufID, int index, void *data);
    void        (*remParticle) (LWPSysID, int index);
} LWPSysFuncs;
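
For illustration only, a rough usage sketch of the two access patterns (it assumes the LWSDK particle headers are available, that getParticle() copies a single entry from a buffer while getBufData() copies the whole buffer in one call, and that positions are three floats per particle; the buffer flag is left as a parameter rather than guessing the constant):

#include <stdlib.h>
#include <lwpsys.h>    /* LWSDK particle services header (assumed) */

static void read_positions( LWPSysFuncs *psf, LWPSysID psys, int posBufFlag )
{
    int       count  = psf->getPCount( psys );      /* all slots, live or dead */
    LWPSBufID posBuf = psf->getBufID( psys, posBufFlag );

    /* Slow path: one SDK call per particle index. */
    for ( int i = 0; i < count; i++ ) {
        float pos[ 3 ];
        psf->getParticle( posBuf, i, pos );
        /* ... */
    }

    /* Fast path: one SDK call for the whole buffer. */
    float *all = malloc( sizeof( float ) * 3 * count );
    psf->getBufData( posBuf, all );
    /* ... use all[ 3 * i + 0 .. 2 ] ... */
    free( all );
}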

NinoK
10-11-2011, 04:47 PM
From a developer's point of view, I would never write an algorithm that needs to iterate over all the particles every calculation frame, as this seems wasteful. The only case where that would be necessary is if particles are allowed to be 'reborn' after they have died, i.e. if max age is 20 but you want the particle to be visible again from frame 200 to 220... and in that case you would just make the particle's 'max age' 220 and edit the envelopes accordingly. There needs to be a way to pull only the 'alive' particles into their own subset for complex iterations. This explains why my calculation times skyrocket, even though the same number of particles remains on screen after a while.
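
A minimal sketch of that 'alive subset' idea (plain C, names made up): one cheap compaction pass per frame gathers the indexes of the live particles, so every heavier routine afterwards only walks that list instead of the whole pool.

typedef struct { float age, max_age; int alive; } Particle;

/* One cheap O(total) pass per frame; everything expensive afterwards only
   touches the n live particles it returns, not all of the allocated slots. */
static int gather_live( const Particle *all, int total, int *live_out )
{
    int n = 0;
    for ( int i = 0; i < total; i++ )
        if ( all[ i ].alive )
            live_out[ n++ ] = i;
    return n;
}

/* Per-frame usage (buffers allocated elsewhere; simulate() is hypothetical):

       int n = gather_live( particles, total_emitted, live );
       for ( int k = 0; k < n; k++ )
           simulate( &particles[ live[ k ] ] );
*/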

Sensei
10-11-2011, 05:01 PM
From a modern CPU's point of view, doing
for( int i = 0; i < 1000000; i++ )
{
    if( enable_array[ i ] != 0 )
    {
    }
}
takes such a small amount of time that you can't even see the difference (microseconds)..

xxiii
10-11-2011, 05:41 PM
Responding to the last several posts in no particular order (and talking about a theoretical implementation, rather than the actual one):

If you are trying to reference useful data out of dead particles, then they aren't dead, or we are using different definitions of dead. It seems to me nothing should be trying to access truly dead particles anyway, and if something does need them, it should probably make its own "funeral" arrangements, so to speak.

Whether you need to iterate over all the particles each calculation frame would, I think, depend on what you are doing. In any case, I was simply trying to point out that adding code that can realize particle[2353233] is really rparticle[17] in memory is really insignificant, and even more so if you are only accessing some of the particles. You could also ask for the entire live-particle array, and the current virtual-to-real translation offset (if implemented that way).

Regardless of how it may actually be implemented in LightWave at the moment, I'm saying it's possible to implement a particle system that, given finite lifetimes, can emit particles indefinitely, and it won't be that much slower.

Cryonic
10-11-2011, 11:03 PM
Actually, I can think of a reason for this limit: rendering scenes over a network of nodes. How do you make sure that each successive frame is being rendered properly when sharing this data between systems?

dballesg
10-12-2011, 01:15 AM
Actually, I can think of a reason for this limit: rendering scenes over a network of nodes. How do you make sure that each successive frame is being rendered properly when sharing this data between systems?

Easy answer: because when you are doing network rendering you always bake the particle animations to a PFX file first.

Sending a scene to a ScreamerNet render without baking the particle animation to a PFX, or a cloth one to an MDD, is a recipe for problems.

David

NinoK
10-12-2011, 01:38 AM
From a modern CPU's point of view, doing
for( int i = 0; i < 1000000; i++ )
{
    if( enable_array[ i ] != 0 )
    {
    }
}
takes such a small amount of time that you can't even see the difference (microseconds)..

Regardless, doing something a million times (or more, if you have more emitters) per frame is wasteful if it isn't necessary. If it truly were milliseconds, my PFX calculations would not slow down to a crawl as more particles are added to the array.

Sensei
10-12-2011, 01:40 PM
Regardless, doing something a million times (or more, if you have more emitters) per frame is wasteful if it isn't necessary.

The alternative is having to keep a dynamic list of indexes or pointers to the particles, and then adding an entry whenever an emitter generates a particle and removing one whenever a particle dies. Removing an element from the middle of a multi-million-entry dynamic array is damn slow, and after a couple of such operations the whole array usually gets reallocated and its data copied.. And of course this list is not sorted!
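
For what it's worth, here is a sketch of the two usual removal strategies for such a list (plain C, nothing to do with LW's actual code): preserving the order means shifting everything after the hole, which is the slow case described above, while an unsorted list allows the O(1) swap-with-last trick.

#include <string.h>

/* Order-preserving removal: shifts every later element down -- O(n),
   painful on a multi-million-entry array. */
static int remove_keep_order( int *list, int count, int pos )
{
    memmove( &list[ pos ], &list[ pos + 1 ],
             ( size_t )( count - pos - 1 ) * sizeof( int ) );
    return count - 1;
}

/* Unsorted removal: overwrite the hole with the last element -- O(1).
   Possible precisely because the list is not kept sorted anyway. */
static int remove_swap_last( int *list, int count, int pos )
{
    list[ pos ] = list[ count - 1 ];
    return count - 1;
}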


If it truly were milliseconds, my PFX calculations would not slow down to a crawl as more particles are added to the array.

I bet the slowdown has nothing to do with the fact that some if()'s are done to check the status of each particle. Maybe the routine is buggy and still adds dead particles to the kd-tree or octree, but later ignores them in another procedure?
In dynamics the slowest things are self-interaction calculations and detecting hits against collision object(s). No collision = everything is calculated smoothly, right?
If a collision object is dynamic (has bones or deformations, is moving or rotating), then its kd-tree/octree has to be recreated from scratch every frame, and the more geometry it has, the slower that process is. Better to make a low-polygon-count object based on the original object and use it instead of the original.

NinoK
10-12-2011, 03:19 PM
I bet the slowdown has nothing to do with the fact that some if()'s are done to check the status of each particle. Maybe the routine is buggy and still adds dead particles to the kd-tree or octree, but later ignores them in another procedure?
In dynamics the slowest things are self-interaction calculations and detecting hits against collision object(s). No collision = everything is calculated smoothly, right?


Yeah, I'm not sure; I'm stumped on this one. The scene I'm doing has no collisions or interactions of any kind. The only complex thing is an animation-path wind and 5-6 emitters, half of which are affected by the wind while the other half just spawn behind the ones that are. Everything should be smooth, but it is not. Scrubbing over the early frames is acceptable and smooth, but near the halfway point the calculations go from near real time at first to 10-15 seconds per frame. There are fewer than a million particles nearly the entire time, as only one of the emitters hits the million-particle limit at the end of the animation; the rest spawn only a few thousand each, so I am stumped as to why LW takes such a huge hit. I've tried setting it up numerous ways and it's the same thing each time. I am currently looking at Blender, but unfortunately that means learning a new particle system first.

Sensei
10-12-2011, 03:21 PM
Are the particles self-colliding?

Can you share the scene here?

Sensei
10-12-2011, 03:30 PM
Maybe the slowdown is the result of running out of physical memory and having to use virtual memory instead? 1 million particles * buffers * a large number of frames = a lot of memory needed..
Try baking the PFX with different numbers of frames, like 25%, 50%, 75%.. What sizes do they end up on disk? Is the growth in file size linear?

NinoK
10-12-2011, 04:55 PM
I'm afraid I can't post the scene, but it's fairly simple. There is no self-collision; they just follow a wind path, albeit a long one. I checked: only 4 of the 12 GB of RAM are used, although CPU usage stays at only around 28% while calculating (100% as expected while rendering). It looks like threading is not being utilized, as it's a quad-core i7.

As for the baked PFX files, growth is as expected. The 1-million-particle PFX file is quite large at 250 MB; the remaining 5, though, are small at 50 MB and less. Perhaps it's some bug I keep hitting.

Oh well, I just baked everything over the last two hours, so I'll just try to keep the changes to a minimum.

Thanks for your help.

xxiii
10-12-2011, 11:38 PM
Is "Detect interaction" (on interaction tab of fx emitter) checked?


only 4 of the 12 GB of RAM are used, although CPU usage stays at only around 28% while calculating.

This is suspicious; this is 64-bit LightWave, I assume?

Open Task Manager, and on the Processes tab make sure PF Delta is displayed (add it from the View menu, Select Columns); this gives an indication of paging activity. You might also want to add Memory (Private Working Set) and Commit Size, which should be approximately the same, or else disk is being used for memory.

But if it's 64-bit LW and only 4 of 12 GB is in use, it shouldn't be faulting. Perhaps the particle system is 32-bit? (I hope not.)

xxiii
10-12-2011, 11:54 PM
Easy answer: because when you are doing network rendering you always bake the particle animations to a PFX file first.

Sending a scene to a ScreamerNet render without baking the particle animation to a PFX, or a cloth one to an MDD, is a recipe for problems.


Alternatively, if you have a deterministic random number generator and the ability to specify the seed, all the nodes should generate the same thing (assuming they all calculate from the same starting point, regardless of what frame they are actually rendering). I think LW has an option to force the same seed all the time, but I don't think it lets you actually set what it is.
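
To illustrate the determinism point (a generic sketch; this is not LW's generator, and the seed and constants are arbitrary): if every node seeds the same fixed PRNG with the same value and steps the simulation from frame 1 up to whatever frame it was assigned, each node reproduces identical emission randomness no matter which frames it renders or in what order.

#include <stdio.h>

/* A tiny fixed LCG so every node produces the same sequence from the same
   seed (rand() is avoided because its sequence can differ per platform). */
static unsigned int rng_state;
static void         seed_rng( unsigned int s ) { rng_state = s; }
static unsigned int next_rng( void )           { return rng_state = rng_state * 1664525u + 1013904223u; }

int main( void )
{
    int assigned_frame = 100;      /* could just as well be 200 on another node */

    seed_rng( 12345u );            /* same seed everywhere */
    unsigned int jitter = 0;
    for ( int frame = 1; frame <= assigned_frame; frame++ )
        jitter = next_rng();       /* stand-in for the per-frame emission randomness */

    /* identical output on every node that was given frame 100 */
    printf( "frame %d jitter %u\n", assigned_frame, jitter );
    return 0;
}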

I kind of wish it had a more transparent baking option though, like radiosity: just automatically bake the particle system(s) on the first frame (or, in ScreamerNet, before it launches any renders), use that for the rest, then delete it when it's done.

Sensei
10-13-2011, 12:17 AM
It has nothing to do with randomization (unlike randomizing nodes, textures and shaders).
When a render farm is rendering e.g. frame 100, it would be completely senseless for the node to have to go through all of frames 1-100 calculating dynamics again, and again, and again, for every single frame the render controller asks it for..

Remember that the render controller can tell a render node to render frames out of order. It might well be frame 100 one time and frame 200 the next (because e.g. there are 100 render nodes in the render farm).

Do you remember how long dynamics takes to calculate in Layout when you press Calculate? Then imagine a render node doing that for every single frame before generating the image..

xxiii
10-13-2011, 12:31 AM
all of frames 1-100 calculating dynamics again, and again, and again, for every single frame the render controller asks it for..

Hence, my second paragraph about baking.

There are two issues: getting the same result each time, and avoiding redundant calculations.

Baking solves both issues, but in the absence of baking it is still possible to achieve the first. Ideally, assuming I don't need to tweak the results, I would like the baking to occur automatically and transparently.

NinoK
10-13-2011, 08:07 AM
Does LW have general solving or calculation preferences? Number of steps, error margins, etc. that are global and that I can play with? I haven't seen anything in the standard locations for preferences.

Lightwolf
10-13-2011, 08:12 AM
When a render farm is rendering e.g. frame 100, it would be completely senseless for the node to have to go through all of frames 1-100 calculating dynamics again, and again, and again, for every single frame the render controller asks it for..

Actually, for simple cases LW can perfectly well render dynamics-driven particles across a network without the need to bake (however, I wouldn't recommend it).
And since most render nodes render in frame increments, even pre-computing a few frames isn't that bad (they only need to cover the difference from the previously rendered frames, not the beginning of the animation) - especially if rendering using mode -2 or even mode -3 in batches of frames.

Cheers,
Mike

xxiii
10-13-2011, 11:42 AM
Does LW have general solving or calculation preferences? Number of steps, error margins, etc. that are global and that I can play with? I haven't seen anything in the standard locations for preferences.

Ah, thanks for reminding me. I too was getting ridiculously long particle calculation times in a recent project and I was trying to remember what I did to solve it so I could post it here.

In Layout, go to Utilities -> Additional Plugins -> FX Browser.

Click on Options.

Adjust the resolution as appropriate. For example, if your particles are traveling 100 kilometers and the resolution is 100mm, each particle will be "calculated" 1,000,000 times on its journey. That's still not too bad if you only have a few particles, but if you have lots of particles, it adds up.

On the other hand, if your scene is really small or you need fine detail in your particle interactions, you may need to reduce the number in this box (thereby increasing the resolution), for instance if you're wondering why your particle sailed right past something as if it were invisible when it shouldn't have.

I'm guessing that for most "human scale" scenes 100mm is fine. In my case I'm working on something on the order of a megameter or so, and this was killing me.
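
Roughly how that adds up, as a back-of-the-envelope sketch (assuming the step count is simply the distance travelled divided by the resolution setting):

#include <stdio.h>

/* Back-of-the-envelope only: assumes the solver advances each particle in
   steps the size of the FX Browser resolution setting. */
static double steps( double distance_m, double resolution_m )
{
    return distance_m / resolution_m;
}

int main( void )
{
    printf( "100 km at 100 mm : %.0f steps per particle\n", steps( 1.0e5, 0.1 ) );   /* 1,000,000  */
    printf( "1 Mm   at 100 mm : %.0f steps per particle\n", steps( 1.0e6, 0.1 ) );   /* 10,000,000 */
    printf( "1 Mm   at 500 mm : %.0f steps per particle\n", steps( 1.0e6, 0.5 ) );   /* 2,000,000  */
    return 0;
}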

Now, I wish this could be set per emitter instead of globally, and that there were also an option to make it relative to the particle's current speed.

NinoK
10-17-2011, 01:27 PM
I had changes, so I had to rebake it. That last tip was awesome: it calculates magnitudes faster after moving from 100mm to 500mm. Thanks!