Native Instances vs HD-Instance



Hieron
09-30-2012, 04:53 PM
My 1500th post, I'll try to make it count.. :)

Since posting this tutorial on nice grass using HDI: http://forums.newtek.com/showthread.php?101604-Grass-with-HDI-explanation-files , we have been using grass (and trees and bushes and flowers) a lot in our renders. Now with native instancing, it has become interesting to replace the trusty (but not actively developed) HDI. However, this is not possible yet, and I hope this thread helps shed some light on the areas that need significant improvement.

Right, the actors in this play: (nothing fancy this time, see the old thread for something nicer... or our webpage..)

The grass:
[Attachment 108191]
2.5k polys per patch

The ground:
[Attachment 108192]
500m x 500m. Each poly (outlined in yellow) is ~8m x 8m.

Let's start with HDI: instance the grass over the ground surface and place about 1 instance per meter. Since this is a 500x500 ground, we end up with roughly 250k grass patches:
[Attachment 108193]
Shot rendered in 14.3 seconds. (This is the baseline for comparison with the other shots. All settings are kept the same unless mentioned.)

Let's ramp it up a notch and go for 10 instances per meter. This leads to roughly 25M instanced grass patches:
[Attachment 108194]
Shot rendered in 53.5 seconds and only took 500MB more memory than the 250k instances. Notice the nice even spread.

Let's go mental and go for 50 per meter! We are talking 625M instances, ~1500 Gigapoly in total:
[Attachment 108195]
Shot rendered in 5m29s and only took ~1GB more memory than the 250k instances. (Memory was limited to 1GB in the HDI options; setting it higher allows HDI to use much more memory, but the speed is hardly better at 4m48s. Very, very nice use of memory by HDI.) Not bad for ~1500 Gpoly... The moire patterns are due to me not putting in any jitter or randomization. That can easily be done at practically no cost, but I left it out for comparison's sake.
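
For reference, the instance and polygon totals quoted in these tests can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes "N per meter" means N instances along each axis of the 500m x 500m ground (which is what matches the 250k, 25M and 625M figures above), and the names `instance_count` and `POLYS_PER_PATCH` are made up for illustration.

```python
# Back-of-the-envelope counts for the three HDI density tests.
# Figures from the post: 500 m x 500 m ground, 2.5k polys per grass patch.
GROUND_SIDE_M = 500
POLYS_PER_PATCH = 2_500


def instance_count(per_meter: int, side_m: int = GROUND_SIDE_M) -> int:
    """Instances on a square ground with `per_meter` instances along each axis.

    Assumption: "per meter" is a linear density, so the total grows with
    the square of it.
    """
    return (per_meter * side_m) ** 2


for per_meter in (1, 10, 50):
    n = instance_count(per_meter)
    gigapolys = n * POLYS_PER_PATCH / 1e9
    print(f"{per_meter:>2} per meter: {n:>11,} instances, {gigapolys:,.1f} Gpoly")
```

Going from 10 to 50 per meter multiplies the instance count by 25, which is why the last test jumps from 25M to 625M.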

Now for a little trick: the ground plane is cloned and the clone is placed 1 meter above, covering all instances. This results in:
[Attachment 108196]
Which rendered in 10.7 seconds and used no memory to speak of. Nice job culling away 625M instances there!

Continued with the same tests using native in the next post.

Hieron
09-30-2012, 05:26 PM
Continuing with native instances: grass over the ground surface with 250k grass patches (not shown in the viewport, so only generated at render time):
[Attachment 108200]
Shot rendered in 11.9 seconds, beating HDI by a few seconds. Native places them randomly by default, which may look nice here but is actually a disadvantage when trying to get uniform coverage. For, say, grass on a field or a lipid membrane, uniform coverage is essential.

Setting the spread relax to "10" yields:
[Attachment 108201]
Which is better. LW was unresponsive for 10 seconds while generating them. Total render time 21.2s, losing to HDI by seconds..
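
To illustrate the difference between purely random placement and an even spread with optional jitter, here is a minimal Python sketch. It is not how HDI or the native "relax" setting work internally (the thread does not say); `random_placement` and `jittered_grid` are hypothetical helpers that only show why a grid with bounded jitter keeps coverage uniform while uniform random sampling can leave gaps and clumps.

```python
import random


def random_placement(n, side, rng):
    """n points dropped uniformly at random on a side x side square."""
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]


def jittered_grid(per_side, side, jitter, rng):
    """One point per grid cell, offset from the cell centre.

    jitter = 0.0 gives a perfectly regular grid; values below 1.0 keep
    every point inside its own cell, so coverage stays even.
    """
    cell = side / per_side
    points = []
    for i in range(per_side):
        for j in range(per_side):
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
            dx = rng.uniform(-0.5, 0.5) * jitter * cell
            dy = rng.uniform(-0.5, 0.5) * jitter * cell
            points.append((cx + dx, cy + dy))
    return points


rng = random.Random(42)
even = jittered_grid(per_side=20, side=500.0, jitter=0.8, rng=rng)
clumpy = random_placement(n=400, side=500.0, rng=rng)
print(len(even), len(clumpy))
```

Both calls produce the same number of points; only the jittered grid guarantees that every 25m cell gets exactly one.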

Back to random/non-relaxed and trying 25M instances using native, a number that is trivial for HDI. It makes memory usage shoot sky high and hit my 16GB memory limit in no time. I waited 2m40s before anything even happened, watching CPU usage drop to ~0% in the meantime. Then it finally started to render. Memory usage dropped back down to ~7GB (some culling occurred after generation?) and CPU usage stayed at 100%. Resulting in:
[Attachment 108202]
Which is admittedly a nicer-looking 25M instances, but HDI can easily be made to look like this (more random) at no render time cost, and it took 53.5 seconds at 500MB memory whereas native took 15m50s for this image!
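
To put the two 25M-instance timings side by side, the render times quoted in these posts can be converted to seconds and compared. A small Python sketch; the helper `to_seconds` is hypothetical and the inputs are just the figures reported above.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a 'XmYs' style render time to seconds."""
    return minutes * 60 + seconds


hdi_25m = to_seconds(0, 53.5)     # HDI, 25M instances
native_25m = to_seconds(15, 50)   # native, 25M instances

slowdown = native_25m / hdi_25m
print(f"native took {slowdown:.1f}x as long as HDI for 25M instances")
```

On these numbers native needs a little under 18 times as long as HDI for the same instance count.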

So what about the trick? The culling at generation? Remember, HDI managed to render a scene with 625M instances instantly when all were covered (how on earth did Graham pull that off with HDI, it is amazing....), it knew they did not need rendering. Native instancing has no such thing.. it will generate every single one of them, even if none will show up at all... So this image which shows not a single instance of the 25M that were generated uselessly:
---image not included, as LW hangs at 100% completed. CPU at 0%, memory not changing for minutes...---

That made LW stall for 2m30s while memory filled up completely (like before), and in all it took 5m4s to render (until it hung at 100%)... :(
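
The difference between generating everything up front and skipping hidden regions before generation can be sketched in Python. This is purely illustrative: the thread does not reveal how HDI culls, and `is_visible` is a hypothetical stand-in for whatever occlusion or view test a real instancer would use. The ground is split into tiles, and the lazy version never creates instances for tiles that fail the test.

```python
from typing import Callable, Iterator, List, Tuple

Tile = Tuple[int, int]
InstanceId = Tuple[int, int, int]  # (tile x, tile y, index within tile)


def eager_instances(tiles_per_side: int, per_tile: int) -> List[InstanceId]:
    """Generate every instance up front, whether it will be seen or not."""
    return [(tx, ty, k)
            for tx in range(tiles_per_side)
            for ty in range(tiles_per_side)
            for k in range(per_tile)]


def lazy_instances(tiles_per_side: int, per_tile: int,
                   is_visible: Callable[[Tile], bool]) -> Iterator[InstanceId]:
    """Yield instances tile by tile, skipping tiles that are not visible."""
    for tx in range(tiles_per_side):
        for ty in range(tiles_per_side):
            if not is_visible((tx, ty)):
                continue  # whole tile rejected: no per-instance work at all
            for k in range(per_tile):
                yield (tx, ty, k)


# Everything covered by the cloned ground plane: nothing is generated.
covered = list(lazy_instances(100, 50, is_visible=lambda tile: False))
print(len(covered), len(eager_instances(100, 50)))
```

With every tile hidden, the lazy generator does one cheap test per tile and produces no instances, while the eager one still builds the full list.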

You will understand that I won't try the uniformity/relax at 10 for this.. it would take very long... Nor trying to get to 625M instances, something HDI did without breaking a sweat!

See our issue?
->Native is not even close when the numbers come in, and those numbers are needed in some cases... And HDI was absolutely amazing at it. Fast, low memory, responsive<-
It would be very sad if we lost HDI while LW keeps being updated in new cycles and HDI doesn't, since native is not able to take over its spot yet.. not even close...

Cageman
09-30-2012, 06:14 PM
See our issue?
->Native is not even close when the numbers come in, and those numbers are needed... And HDI was absolutely amazing at it. Fast, low memory, responsive<-
It would be very sad if we lost HDI while LW keeps being updated in new cycles and HDI doesn't, since native is not able to take over its spot yet.. not even close...

Build a jungle and turn on GI... lets see which one is faster. :)

Hieron
09-30-2012, 06:19 PM
So, as an extra and subtler point. Let's check GI and these instances.

Here a much smaller patch of land with HDI instanced grass (a bit randomly placed this time) and some clutter. No GI:
[Attachment 108203]
Rendered in 39.6 seconds.

Now with GI turned on (settings tweaked to something fast, did not compensate light as it is not about looks):
[Attachment 108204]
GI in 30.8 seconds. Total rendertime 1m49s.


Back to native instances version, no GI. 5M pieces to match HDI a bit:
[Attachment 108207]
Rendered in 1m14s.

And with GI turned on again (same settings as before):
[Attachment 108208]
GI in 39.2 seconds. Total rendertime 2m14s. Outpaced by the plugin in both GI calc and normal render.

It would be nice if over time, native could match the speed of HDI.

So, hopefully this makes some sense and is useful to anyone here who needs to weigh native instancing against HDI. For us, HDI will be around for a while I guess.. Hopefully native instancing will keep improving so that one day it can match and replace HDI when it comes to mass instancing.. Hopefully before HDI stops working with new LW update cycles..

Scenes and models provided:
[Attachment 108209]



Build a jungle and turn on GI... lets see which one is faster. :)

Not sure how you meant the remark..

http://www.nymus3d.nl/portfolio.php?cat=1&scat=5&lan=nl&player=59
http://maps.kennispark.nl/
Not quite trivial scenes we are applying instances to....

I was still writing the above about GI. HDI will be much faster, hands down. GI or no GI.
imho ofc, I'd like to be proven wrong on this...

sukardi
09-30-2012, 06:22 PM
I am not a programmer, but I think native instancing generates an instance ID for each instance, so it may not handle insane numbers of instances as well as HDI.

Basically, I think HD Instance is phenomenal at handling insane numbers of instances, and probably no other commercial instancing system can match it. However, the current native instancing serves my needs well, in many cases better than HDI. I would surely like to see it developed further, but the ability to handle 625M instances is quite low on my list at the moment.

Hieron
09-30-2012, 06:33 PM
Native surely has great improvements, it is nice that they added it! 625M instances may not be necessary for all shots per se. I'm just pointing at the loss of performance, which may not be obvious to some. We surely can't do away with HDI in our scenes and I hope to have explained why..

Not to bash native instancing, it is great and allows many things HDI doesn't. I wonder if part of the performance difference is due to this..

Cageman
10-01-2012, 02:27 AM
Just finished a test with your content....

LW Native: 4m 4s
HDI: 4m 15s

When rendering GI with HDI, you need to enable Volumetric Radiosity.

Hieron
10-01-2012, 06:23 AM
Just finished a test with your content....

LW Native: 4m 4s
HDI: 4m 15s

When rendering GI with HDI, you need to enable Volumetric Radiosity.

Ow come on.. I know that...:/ otherwise the instances in the example render would have been dark right..? The example scene was saved after the last test, which was native, so some options like Volumetric Radiosity were turned off. Perhaps I should have left out the GI comparison as it is in the same ballpark. And not an excessive difference like the high numbers.

What takes 4m to render for you exactly? The content I provided allows for all tests here. On your Hexacore I would expect an F9 upon load to take about 2 minutes?

Apparently you do not agree.. yet I think I clearly demonstrated that Native Instancing stumbles when rendering 25M instances while HDI does not mind (never mind the 625M instances). I could have chosen to do the GI test in that regime as well, which would surely bring Native to its knees.

Do you not agree that native instancing is (much) slower when the high numbers come in? And that it could use some culling? We surely can not do certain stuff with Native that is doable with HDI.. it's not like we do not want Native to be able to do it.... Native has all kinds of nice options and possibilities..

If those possibilities prohibit any chance of being able to render 25M instances as quickly and with low memory footprint as HDI then that would be interesting to know as well.

Sensei
10-01-2012, 06:37 PM
LW instancing asks the instance generator to provide all data about each instance, like with particles: position, rotation, scale, etc.
So they must take memory.
Then they're placed in some octree/kd-tree or something similar. And OpenGL goes through the list, drawing each bounding box from the array.
And when a ray hits one of them, the ray origin and direction are converted to match the reference item.

Changing it means rewriting from scratch.
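
The last step Sensei describes, converting the ray into the reference item's space, can be sketched as follows. This is a simplified Python illustration and not LightWave's actual code: it assumes each instance stores only a position, a heading (rotation about Y) and a uniform scale, and the names `Instance`, `rotate_y` and `world_to_reference` are invented for the example. The point is that the ray is moved rather than the geometry, so what gets stored per instance is its transform data.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Instance:
    position: Vec3   # world-space translation
    heading: float   # rotation about the Y axis, in radians
    scale: float     # uniform scale factor


def rotate_y(v: Vec3, angle: float) -> Vec3:
    """Rotate a vector about the Y axis by `angle` radians."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def world_to_reference(inst: Instance, origin: Vec3, direction: Vec3):
    """Map a world-space ray into the reference item's local space.

    Undoes translation, then rotation, then scale. The direction is not
    renormalized, so a ray parameter t means the same thing in both spaces.
    """
    px, py, pz = inst.position
    o = (origin[0] - px, origin[1] - py, origin[2] - pz)
    o = rotate_y(o, -inst.heading)
    d = rotate_y(direction, -inst.heading)
    inv = 1.0 / inst.scale
    return tuple(c * inv for c in o), tuple(c * inv for c in d)


inst = Instance(position=(10.0, 0.0, -5.0), heading=math.pi / 2, scale=2.0)
print(world_to_reference(inst, (10.0, 4.0, -5.0), (0.0, -1.0, 0.0)))
```

The transformed ray can then be intersected with the single reference mesh, whatever the number of instances.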

Hieron
10-02-2012, 03:23 AM
LW instancing asks the instance generator to provide all data about each instance, like with particles: position, rotation, scale, etc.
So they must take memory.

But how is HDI different in this.. all of those can be set with HDI too?


Then they're placed in some octree/kd-tree or something similar. And OpenGL goes through the list, drawing each bounding box from the array.


OpenGL having a hard time is ok, with HDI and high numbers one would never show bounding boxes either, there is usually no need to have them all previewed. With visibility off LW only generates them at rendertime it seems. So ignoring OpenGL, isn't HDI doing the same?


And when a ray hits one of them, the ray origin and direction are converted to match the reference item.

Changing it means rewriting from scratch.

Still not sure how that would be different for HDI? That would be a bummer though...

Do you have any idea how HDI manages to cull away so many instances so dang fast when they are occluded from view (barely) by another polygon? Always amazed me...

Ah well, it was worth a shot to bring this to attention.. it is easy to get used to being able to use so many instances.. very helpful in massive arch viz scenes.

Phil
10-02-2012, 05:14 AM
One of Sensei's points, if I understand him correctly, is that the bounding box preview limits differ between HDI and native. That adds to UI lag and memory overhead whilst working in Layout: the native system uses a % value, and Layout will choke during instance evaluation unless you set that % value very low (the default is 100%). HDInstance uses a numeric cut-off that, by default, is set to something like 100k (from memory), so the evaluation stall and memory load are not as pronounced. If you set both close to 0, you'll see Layout become responsive. If you set them both higher than your instance count, you will see equivalent stalling and memory load whilst working in Layout.
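
The two preview limits Phil describes behave differently as the instance count grows. A minimal Python sketch, with hypothetical function names and the 100k cut-off taken from Phil's from-memory figure (so treat the default as an assumption):

```python
def native_preview_count(total: int, percent: float) -> int:
    """Percentage-based limit: the previewed count scales with the total."""
    return int(total * percent / 100.0)


def hdi_preview_count(total: int, cutoff: int) -> int:
    """Numeric cut-off: the previewed count is capped regardless of the total."""
    return min(total, cutoff)


for total in (250_000, 25_000_000, 625_000_000):
    print(f"{total:>11,} instances: "
          f"native @100% -> {native_preview_count(total, 100.0):,}, "
          f"HDI @100k -> {hdi_preview_count(total, 100_000):,}")
```

With a percentage, the default setting evaluates every instance for the viewport; with a fixed cut-off, the viewport cost stays bounded however many instances the render uses.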

In terms of rendering, HDInstance has generally been more efficient. It seems that 11.5 will bring further improvements here. For a first iteration, the instancing in 11.0 was pretty decent.

I'm much more concerned about the inability of hair/fur systems to match Sasquatch for large area coverage. Both LW and modo's systems fail badly in this - slow and very memory intensive.

Sensei
10-02-2012, 05:38 AM
SasQuatch doesn't have to generate any hairs in memory.. It's a post-process effect.. When you draw a line or curve in Paint or similar, do you need memory (with undo off)..? It's placed on top of the previously rendered image..

Now imagine a curve point that has x, y and z-depth (so the nearest hair is rendered over the farthest)..

for( i = 0; i < hairs_count; i++ ) { draw_hair( i ); }

Hair points are generated while drawing the hair, and then disappear..
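
The idea can be expanded into a small runnable Python sketch. It is an illustration of the principle only, not Sasquatch's implementation: `make_hair` is a hypothetical generator that yields the pixel samples of one hair, and a depth buffer decides which sample wins each pixel, so no hair has to be kept in memory once it has been drawn.

```python
def render_hairs(hair_count, make_hair, width, height):
    """Draw hairs one at a time into an image, keeping only the nearest
    sample per pixel. Memory use depends on the image size, not on the
    number of hairs."""
    inf = float("inf")
    depth = [[inf] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for i in range(hair_count):
        # The samples of hair i exist only while this loop body runs.
        for x, y, z, c in make_hair(i):
            if 0 <= x < width and 0 <= y < height and z < depth[y][x]:
                depth[y][x] = z
                color[y][x] = c
    return color


def two_overlapping_hairs(i):
    """Hypothetical test data: hair 0 is far and red, hair 1 is near and
    green; both cover pixel (1, 1)."""
    if i == 0:
        yield (1, 1, 10.0, "red")
    else:
        yield (1, 1, 2.0, "green")


image = render_hairs(2, two_overlapping_hairs, width=3, height=3)
print(image[1][1])
```

The nearer hair ends up in the pixel regardless of drawing order, which is the role of the z-depth Sensei mentions.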

Phil
10-02-2012, 07:41 AM
Yes, I know, but I wish the FFX pixel filter mode was just as efficient, but it doesn't appear to be.

Cageman
10-04-2012, 08:06 AM
Hi guys,

Sorry for a late response... but I think I need to clarify what I meant with "Build a Jungle and lets see which one is the faster...."

So, what am I talking about when I say that in my experience, LW's native instancer is faster to render? Well, first of all, Hieron, you are absolutely correct that HDI does not require any translation time, especially in comparison to native when one is using Surface placement. In a sense, HDI is its own closed system that only feeds LW exactly what it needs in order to render. That, and the fact that it turns everything into Volumetrics, makes things pretty nice and dandy and fast. This is super efficient for situations where you don't need multiple bounces or a lot of rays to make a clean GI render, but you will get diminishing returns from HDI when you start to crank up the quality of GI. Another thing I have noticed is that when using Points mode and reaching up to 4 million instances, HDI starts to have much longer pre-computing times compared to Native. In any case, I would argue that GI is the main culprit, and a showstopper when you want more than one bounce....

At this point I say that HDI is slower, so let's take a look at that. In the following images, Native is on the left and HDI is on the right. Let's use something simple: a cube instanced to a ground mesh, without any difficult shaders or other things that wouldn't work with HDI.

http://hangar18.gotdns.org/~cageman/HDIvsNative/Eyeballing_comparsion1.jpg

This is a 6 bounce Interpolated MC, with RPE 512 and SBR 512; the rest of the GI settings are default. Obviously quite overkill, but it definitely tells a different story about the efficiency of GI regarding Volumetric objects vs true polygonal objects.

Let's get more sensible and lower the settings to something more "down to earth": Interpolated MC, 2 bounces, RPE 256, SBR 64.

http://hangar18.gotdns.org/~cageman/HDIvsNative/Eyeballing_comparsion2.jpg

Still, it seems that native is much faster. But I'm not too happy about these tests: they are not really scientific, because I do not know the exact number of instances used in HDI. I do know that in native, I have 400,000 instances. So... let's move on to some more exact comparisons. Let's use the Points on the ground object. I will stick to the same GI settings.

http://hangar18.gotdns.org/~cageman/HDIvsNative/Matching_comparsion1.jpg

Ok... what happened there? Native instances are 3 times faster? How can that be? Ah... of course, it is just 20402 instances... I upped the display and render subd level of the ground object, reaching 1,012,210 points... so... about 1 million instances.

http://hangar18.gotdns.org/~cageman/HDIvsNative/Matching_comparsion2.jpg

This is an interesting result. Native increased the render time by 23 seconds, and HDI only by 8. Let's see what happens if I up the number of instances to 4,014,202 without changing any scale.

http://hangar18.gotdns.org/~cageman/HDIvsNative/Matching_comparsion3.jpg

Native stayed fixed, while HDI again got bitten by the Volumetric GI slowness. As a final test with GI, lets see what happens if I scale down the cubes so that they are not intersecting.

http://hangar18.gotdns.org/~cageman/HDIvsNative/Matching_comparsion4.jpg

Ah, as I suspected... the conclusion I can draw from this is that HD-instances suffer greatly when large areas of the instances are receiving GI. In the previous comparison, it was almost ridiculously slow compared to native; in this last one, it is more evened out, even if native is still about 2x faster.

Let's just render without GI now.

http://hangar18.gotdns.org/~cageman/HDIvsNative/Matching_comparsion5.jpg

Nopes... still slower... for some reason, with this number of instances, HDI seems to have a longer "pre-computing pass" compared to native. That said, when using Surface mode, HDI has less precomputing (almost none actually). But, as is evident, GI is one of the more important factors, for me at least, that makes LW's native instancer the faster choice, especially when you want to use more than one bounce.

I hope this clears things up. As for room for improvement regarding instancing... oh yes... absolutely. I can also understand that the higher level of control with LW's native instances (I mean, through nodes and whatnot) will sacrifice speed to a certain degree, something that Sensei touched on in his post. It is, simply put, a more expandable implementation as it is, and I suspect we will see some really cool things in future versions of LW, speed improvements included.

Cheers!

Sensei
10-04-2012, 08:52 AM
That, and the fact that it turns everything into Volumetrics

Did you read the LWSDK on Volumetrics?

What is a volumetric?
It's a plugin which receives a ray origin and ray direction, plus two floats, clipping plane near and clipping plane far, as parameters. Clip near is 0 or close to it in most situations. Clip far is usually infinity.
And it has to return the distance to the intersecting geometry closest to the ray origin (between the near and far clips), plus color and alpha at that spot..
And it can STOP evaluation if it returns alpha=1.0 (color is final, completely opaque)..
(Volumetric HyperVoxels don't stop evaluation, but that's an implementation choice by the programmer..)
It absolutely doesn't differ from regular ray-tracing, which finds which triangle the ray intersected..
The renderer can even use the same routine internally for a regular triangle mesh.. A mesh can be implemented as (one of) the volumetric plugins in a ray-tracing render engine. Without any speed impact.

Volumetric is not a type of data.. there is no "conversion" as somebody might understand from your post..
You probably mixed it up with voxels
http://en.wikipedia.org/wiki/Voxel
where there is a grid of evenly placed data, nicely blended together inside the volumetric.
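
The interface described in this post can be mocked up in Python. This is a sketch of the concept only and does not use the real LWSDK API: `Sample`, `sphere_volumetric` and `trace` are invented names, the ray direction is assumed to be normalized, and the early stop on alpha=1.0 is modelled by pulling the far clip in to the opaque hit.

```python
import math
from typing import NamedTuple, Optional, Tuple

Vec3 = Tuple[float, float, float]


class Sample(NamedTuple):
    distance: float  # distance from the ray origin along the ray
    color: Vec3
    alpha: float     # 1.0 means fully opaque, nothing behind it matters


def sphere_volumetric(center: Vec3, radius: float, color: Vec3):
    """Build a toy 'volumetric' that answers ray queries for one sphere."""
    def evaluate(origin: Vec3, direction: Vec3,
                 near: float, far: float) -> Optional[Sample]:
        oc = tuple(o - c for o, c in zip(origin, center))
        b = sum(d * o for d, o in zip(direction, oc))
        c = sum(o * o for o in oc) - radius * radius
        disc = b * b - c
        if disc < 0:
            return None                  # ray misses the sphere
        t = -b - math.sqrt(disc)         # entry point only, for simplicity
        if near <= t <= far:
            return Sample(t, color, 1.0)
        return None
    return evaluate


def trace(volumetrics, origin, direction, near=0.0, far=math.inf):
    """Ask every volumetric for its nearest hit and keep the closest one."""
    nearest = None
    for evaluate in volumetrics:
        s = evaluate(origin, direction, near, far)
        if s is not None and (nearest is None or s.distance < nearest.distance):
            nearest = s
            if s.alpha >= 1.0:
                far = s.distance         # opaque: ignore anything farther away
    return nearest


red = sphere_volumetric((0.0, 0.0, 5.0), 1.0, (1.0, 0.0, 0.0))
green = sphere_volumetric((0.0, 0.0, 10.0), 1.0, (0.0, 1.0, 0.0))
print(trace([green, red], (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

Nothing in this query shape depends on whether the thing being hit is a mesh, an instance or a voxel grid, which is the point Sensei is making.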

kopperdrake
10-04-2012, 09:09 AM
Nice post Cageman. It does seem to reflect my experience with native instancing. Most of the instancing here is for grass and flowers outside an arch viz project. I haven't had the time to do rigorous tests as you have here, but my gut instinct is that native has proved to be quicker, especially when throwing objects at it. In the past I have had to turn off GI for grass and fake the lighting to match the GI used elsewhere, such as on buildings, which meant a multiple light setup and lots of turning render flags off and on in the light panel. Now I just use the same light source and I'm done with it.

Cageman
10-04-2012, 09:20 AM
Volumetric is not a type of data.. there is no "conversion" as somebody might understand from your post..
You probably mixed it up with voxels

Well, the fact that you need to turn on Volumetric Radiosity in GI-panel when rendering GI with HDI does suggest that it isn't true geometric data that HDI generates.

EDIT: It also says on Happy Digital's homepage that, and I quote... "HD Instance is a true volumetric plug-in,..."

http://www.happy-digital.com/hdinstance_docs/hd_instance.html

Cageman
10-04-2012, 10:06 AM
http://hangar18.gotdns.org/~cageman/HDIvsNative/Matching_comparsion1.jpg

Ok... what happened there? Native instances are 3 times faster? How can that be? Ah... of course, it is just 20402 instances...

That was a huge typo! :D The number of instances is actually 371,402 *doh* I was simply looking at the display subd level (which was at 1) and not the render subd level (which was set to 6) in that test.

My bad...

Netvudu
10-04-2012, 10:25 AM
Great rendering tests, Cage. They allow clear conclusions to be drawn.

jwiede
10-04-2012, 11:37 AM
Hi guys,

Sorry for a late response... but I think I need to clarify what I meant with "Build a Jungle and lets see which one is the faster...."
Can you please provide the scenes used for your tests? The results in your final test appear to directly contradict the results of Hieron's testing, as well as my own results re-running his tests. As he provided scenes I was able to confirm his results by running them here, and would like to do so with yours as well. I'm trying to understand why in a similar test you're getting contradictory results compared to his scenes, when it comes to native instancing vs HDI performance even in non-GI "standard" renders.

Hmm, also, did your objects have any surfacing/textures applied?

Cageman
10-04-2012, 12:06 PM
Can you please provide the scenes used for your tests? The results in your final test appear to directly contradict the results of Hieron's testing, as well as my own results re-running his tests. As he provided scenes I was able to confirm his results by running them here, and would like to do so with yours as well. I'm trying to understand why in a similar test you're getting contradictory results compared to his scenes, when it comes to native instancing vs HDI performance even in non-GI "standard" renders.



Ah.. sorry... totally forgot that! See attachment!


Hmm, also, did your objects have any surfacing/textures applied?

Nopes... as I mentioned, I didn't want to add any type of complexity to the instances themselves, just to focus on the Instance engines and GI. I have an ugly procedural on the ground object itself though, but that shouldn't interfere with the instances, more than the eventual lightbounce.

I was originally going to instance vegetation, but then I realized that it wasn't something I could share as content. I will, however, continue these comparison tests with a hell of a lot more complex geometry. That content is something I will not be able to share, though.

Funny... I made this content especially for being able to share, and that was something I forgot to do. :D

EDIT: About the content.....

The scene names refer to the render subd level of the ground mesh and how many instances it creates. The scene files that don't have the numbers are the crazy 6 bounce MC setup + surface distribution for the instances.

jwiede
10-04-2012, 12:19 PM
Thanks! Will try to repro results likely this weekend, and let you know if I encounter any issues.

Hieron
10-04-2012, 03:54 PM
Hi guys,

Sorry for a late response... but I think I need to clarify what I meant with "Build a Jungle and lets see which one is the faster...."

So, what am I talking about when I say that in my experience, LWs native instancer is faster to render? .....

Cheers!

So your result is that you need (and I quote): "the crazy 6 bounce MC setup" to make HDI slow (while interestingly keeping instance counts low, completely avoiding anything I said).

or that HDI gets slow when using point mode (magically now introducing much higher instance counts to drive home the point)?

How does that help or respond to anything I posted?

Did I say HDI is faster when someone tries 6 bounces and 512 RPE on first and second bounce on some meager amount of instances like a few 100k?
I was talking millions of grass patches producing billions of polys. I provided the example scene and shots, and I showed the results on something that would actually be useful on a daily basis. Instancing a few cubes on the floor isn't. I wouldn't be here commenting on the speed of native if all I needed were a few cubes.

How would instancing grass or proteins on points even make sense?? It poses limits to the actual model it is placed on, hardly handy in most cases.
Did you even consider the point I was making before rushing to native's defense?


Let's reduce all this to something easy then:
If you do think native can handle the situations that I addressed faster, show me a 1km x 1km field of instanced grass and flowers, pushing it into the 50M+ instances, surface generated (as would be the normal thing to do... geez). And for the fun of it, show that entire scene rendering in a few seconds when it is covered by something like a building and so not actually seen by the camera.

It should be way less time to generate than all those tests you did to show that there are situations when HDI is slower (how is that constructive to the future of instancing I do not know).

Waiting for the scene, amaze me.


And as I said in the first posts:
"See our issue?
->Native is not even close when the numbers come in, and those numbers are needed in some cases... And HDI was absolutely amazing at it. Fast, low memory, responsive<-
It would be very sad if we lost HDI while LW keeps being updated in new cycles and HDI doesn't, since native is not able to take over its spot yet.. not even close... "

Someone care to respond to that?

ps: yes, this topic is dear to me, so I won't simply bow and say yes when the issue at hand is not addressed.

Exception
10-04-2012, 04:07 PM
From tests I did a while ago with HDI, DPI and the native system, there appeared to be a significant difference in GI sample placement. The native system places many more, and proper, GI samples on instanced objects. HDI and DPI go through another system which results in quite low GI sample placement. This does have an impact on render times.

You can test this by using the Radiosity Sample visibility switches.

This may have changed, but it's something I've been providing feedback to NT on, and it improved a lot in the beta stage. GI sample placement on instances is not identical to placement on normal geometry though, and most GI settings are ignored, so you need to do the reverse-setting trick, where you specify details for each geometry object and use the general GI options for what you want the instances to do. Works reasonably well.

I would like to see some comparisons with both the DPInstance and DPInstancer plugins. DPInstancer uses the native system but it has different placement algorithms which may impact results. DPI quickly replaced HDI for me because it is supported well by Denis, less of a hassle to install and run, and has great placement options.

Cageman
10-04-2012, 04:13 PM
I've chosen to start out simple, just to showcase the basics of instancing in both HDI and Native. The focus has been on GI and the instance engines alone in a fairly basic environment. As you can see from my response to jwiede, my testing and comparison of this is far from concluded. Though, it is quite interesting to see how much GI actually makes the volumetric-based solution crawl, even in these very simple tests. Don't you agree on that, at least?

I also tried to use HDI for some quite complex jungle-scenes, and I just had to give up on that, because I needed GI. It was much faster constructing several scenes consisting patches of the jungle and render those separately, without any HDI. I have yet to see if the native instancing will work better on that particular shot, but, so far, from what we have used instancing for at work, the native one has proven to be faster.

So... again.... in my experience.... those are the key words here... I am simply sharing what I have experienced! I don't see why you get so upset about that...

The problem though, with the content that I am going to use, is that it isn't something I can share... if you would want to share some content that I can use for my further testing, and you have no problems me using it and sharing it in this thread, then.. please... that would be awesome! I need at least a couple of trees, some flowers and grass.

:)

Hieron
10-04-2012, 04:24 PM
From tests I did a while ago with HDI, DPI and the native system, there appeared to be a significant difference in GI sample placement. The native system places many more, and proper, GI samples on instanced objects. HDI and DPI go through another system which results in quite low GI sample placement. This does have an impact on render times.

You can test this by using the Radiosity Sample visibility switches.

This may have changed, but it's something I've been providing feedback to NT on, and it improved a lot in the beta stage. GI sample placement on instances is not identical to placement on normal geometry though, and most GI settings are ignored, so you need to do the reverse-setting trick, where you specify details for each geometry object and use the general GI options for what you want the instances to do. Works reasonably well.

I would like to see some comparisons with both the DPInstance and DPInstancer plugins. DPInstancer uses the native system but it has different placement algorithms which may impact results. DPI quickly replaced HDI for me because it is supported well by Denis, less of a hassle to install and run, and has great placement options.

Good point, it would be interesting. Will need to see how DPI's placement goes...
GI wise it is not a major concern, we can get nice and stable GI on the grass and trees in no time at all. One would not notice the difference between native and HDI there. (as it is fairly random and organic anyway)


I've chosen to start out simple, just to showcase the basics of instancing in both HDI and Native. The focus has been on GI and the instance engines alone in a fairly basic environment. As you can see from my response to jwiede, my testing and comparison of this is far from concluded. Though, it is quite interesting to see how much GI actually makes the volumetric-based solution crawl, even in these very simple tests. Don't you agree on that, at least?

If I ever need a simple scene and insane GI, I will surely remember this and I do agree with you that HDI is slower in that case. How it is a response to the issue and examples I posted, no idea.


I also tried to use HDI for some quite complex jungle-scenes, and I just had to give up on that, because I needed GI. It was much faster constructing several scenes consisting patches of the jungle and render those separately, without any HDI. I have yet to see if the native instancing will work better on that particular shot, but, so far, from what we have used instancing for at work, the native one has proven to be faster.

If you need low instance numbers, sure (a jungle may well be considered low, mind you). My examples and situations (proteins, grass) are for millions of instances and billions of polys. HDI does that at a tremendous pace.

I have never ever tried high GI settings on trees/grass. It never weighs up to the added rendertime and makes GI prohibitive to use indeed.

Again, instance counts are the point here.


So... again.... in my experience.... those are the key words here... I am simply sharing what I have experienced! I don't see why you get so upset about that...

I spent a fair amount of time writing those first posts. It is not much fun to see people completely missing the point and then stating that native is fine. I would like NT to be aware of this, or Graham to keep this plugin up to date.

Native is great, native is the future, but native cannot render these situations. Hence my post, to try and get attention for it. Do you not agree that native has issues with generating tens of millions of instances? And that such a thing could be useful? Sure, if I were not accustomed to HDI I might not have considered that kilometers of grass are an option, but when one has seen such power.. Why not acknowledge that native could use a push here?

Hey I would be fine with: "ow that would require a ton of work, just pray that HDI stays supported". I am for sure not the only one with requests and can wait in line.


The problem though, with the content that I am going to use, is that it isn't something I can share... if you would want to share some content that I can use for my further testing, and you have no problems me using it and sharing it in this thread, then.. please... that would be awesome! I need at least a couple of trees, some flowers and grass.

:)

Ow, you may ignore trees and flowers. take the scene I posted in the first posts. Redo the 625M test. Done. or the 25M test even.. if 625M may sound too ridiculous. (yet HDI doesn't mind)

Cageman
10-04-2012, 04:25 PM
From tests I did a while ago with HDI, DPI and the native system, there appeared a significant difference in GI sample placement. The native system places many more, and proper, GI samples on instanced objects. HDI and DPI go through another system which results in quite low GI sample placement. This does have impact on render times.

It is also worth mentioning that DPI also supports the native instance engine... I mean... you use DPI for placements, but render with regular polygon objects rather than Volumetrics.

Sensei
10-04-2012, 04:33 PM
GI samples have normal vectors. (and position and color)

But a volumetric plugin cannot report the direction (normal) of its sample data. (As I said earlier, a volumetric returns the distance from the ray origin along the ray direction, plus color and opacity at that location.)

That's probably why the GI engine is going crazy.

Hieron
10-04-2012, 04:42 PM
I would like to see some comparisons with both the DPInstance and DPInstancer plugins. DPInstancer uses the native system but it has different placement algorithms which may impact results. DPI quickly replaced HDI for me because it is supported well by Denis, less of a hassle to install and run, and has great placement options.

Agree with the placement options, it is good. That area-based evaluation is nice and very useful too.


edit: tested DPI a bit as you suggested it (but I won't expect zillions of instances from DP's nice and free instancer and this is not meant as such) and when one sets the pass numbers to 0 (I assume no relaxation happens), it generates the instances a bit faster than native it seems, but it starts doing it before a render (with all previews off..?). Both are slow though and take heaps of memory. (again: not such a problem with low instance counts, but that is not why I started the thread)

Cageman
10-04-2012, 05:20 PM
Ow, you may ignore trees and flowers. take the scene I posted in the first posts. Redo the 625M test. Done. or the 25M test even.. if 625M may sound too ridiculous. (yet HDI doesn't mind)

What are the settings in HDI in your test scene to get 625M instances? For some reason, it loaded fine at work a couple of days ago, but when I bring it into LW 11.0.3 (latest HDI, same I have at work), it says 0 Instances. :/

EDIT: LOL, I am getting sleepy now. No worries... things are sorted. :)

Hieron
10-04-2012, 05:38 PM
I've chosen to start out simple,....


if you would want to share some content that I can use for my further testing, and you have no problems me using it and sharing it in this thread, then.. please... that would be awesome! I need at least a couple of trees, some flowers and grass.

:)

K, np. I'll help out but let's forget the flowers and trees and keep it simple. Getting this point across is important to me.. guess that was obvious by now :) A field of grass (25M instances, and yes it is quite exact and scientific, even with HDI) and some filler objects. Imagine a camera flying over a field of grass, going to inspect a sphere in close up.

HDI shot midway:
108306
rendertime: 6m4. Of that 4m4s is GI. Static cached in a normal situation of course (ow and on a general note: that per object GI override of cache would be sweet NT, thx! :))

mem usage: (cpu is maxed all the time btw)
108307
Not bad.. one can actually have other stuff in memory. Like an actual arch viz scene to go with the grass. Or Pshop... Or appreciate LW not paging to disk.

Camera moves for a close up (frame 120):
108308
rendertime: 41s. Of that 15.3s is GI.

Quite a normal situation. You can replace the sphere with: office building, tree, cow, sign, anything.

Right, back to native... may take a while. :)

edit:

Here it is:
108309
Rendertime: 9m42s. Of which GI: 4m30s.

The reason for it being slower:
108310
The first 4 minutes, LW was completely unresponsive as the instances were being generated and put into memory. To make matters worse, the 16GB I have at home was not enough so some serious paging started to happen. CPU usage dropped to 0 completely at some stage. :(

Do note the better spread on instances by HDI. It does that without penalty to speed. If I would have wanted native to match that spread by relaxing, speed would have plummeted further.

close up:
108312
Rendertime: 19m51s. Of that was ~10m30s GI.

Memory usage was high again, so much of that slowness was due to massive paging to disk. CPU dropping to 0% at times. But it is not only a paging to disk issue, native is already tending to be slower when it is still in bounds. But this at least means that one cannot reasonably use 25M instances on a <16GB machine.

Also, sadly, native is not culling. So where HDI speeds up to a very fast 41s for a close up, native just plows away needlessly. Unhelped by massive mem usage and paging to disk.

This is not a bash, this is a careful consideration and worry. A worry that HDI will break in future builds of LW and native not be able to replace the need we still have. A real concern, to us at least. I hope more agree. The need here is not specifically high GI settings etc. it is being able to generate massive amounts of instances, without too much of a worry. Something we do often enough... arch viz.. exteriors.. a field LW strives to excel in? Just helping..

HDI was awesome in this, I hope native can follow suit. (or Graham be interested to keep updating his plugin)

Cageman
10-04-2012, 06:02 PM
Hmm... I must be doing something wrong...

http://hangar18.gotdns.org/~cageman/HDIvsNative/LWHDI_25million_instances_2m_29s.png

That render uses 10 instances/meter, took 2m 29s, and according to the math, it should in total generate 25 million instances. The problem I have with this is that it seems to be a theoretical thing... in theory, it should do that... that is the feeling I get when I use HDI. The lack of absolute numbers is... disturbing, to say the least. That feeling is increased when I see the result of the LW-Native render... where I actually typed in 25000000 instances; it looks sooo much denser! It took 7m 21s.

http://hangar18.gotdns.org/~cageman/HDIvsNative/LWNative_25million_instances_7m_21s.png

When comparing the two, one thing for me is sure... there is a math here that is waaay off. Whether it is LW native or HDI, I can't tell... but the settings regarding scale, rotation etc are absolutely the same in both renders. What is evident to me though, is that it looks like LW-Native is rendering far more instances compared to HDI?

Hieron
10-04-2012, 06:13 PM
What is evident to me though, is that it looks like LW-Native is rendering far more instances compared to HDI?

Ah you have the wrong object. See my tests.
native is placing the 25M instances on a much smaller patch and HDI is just doing it by meter (which is a handy way btw)

(that is the small patch, not the big 500x500 meter one.)

See my tests up a post for consistent results. Will post a thinned out test scene in a sec, when rendering is done of the close up scene.. it is not 50 seconds with native, that's for sure.

ps: my bad I suppose for combining all tests in 1 scene. But a good example why you would want even more than 10 instances per meter. Eg: more than 25M instances on a 500x500m patch of land.

Cageman
10-04-2012, 06:19 PM
Ok... there is definitely something wrong with the math...

Here is a 12.5 million instances render from LW-Native, and it still looks more dense than the HDI-render that is supposed to render 25 million instances. The rendertime is now down to 4m 46s.

http://hangar18.gotdns.org/~cageman/HDIvsNative/LWNative_12.5_million_instances_4m_46s.png

- - - Updated - - -


Will post a thinned out test scene in a sec, when rendering is done of the close up scene.. it is not 50 seconds with native, that's for sure.

ps: my bad I suppose for combining all tests in 1 scene.

Cool! :)

Sensei
10-04-2012, 06:38 PM
The lack of absolute numbers is... disturbing, to say the least.

You can't have more instances visible from camera than pixels on render..
So for 1280 * 720 it's 921,600

A good universal instancing engine doesn't generate instances up front; instead, a math routine always returns the same transformation matrix and bounding box for a given index.
Thus instances don't need to be stored in memory.. just indexes.. or a min-max range of indexes.. Same index, same placement on render, without storing it in memory. But recalculating it so many times costs time instead of memory.

It's very easy for point-based instancing: just take the point position, and calculate the bounding box with it as the center.
For surface-based instancing: take the instance index, divide by the max number of instances, then multiply by the number of polygons and you have the polygon index (so you know the vertex positions). The remainder of the division is the index of the instance within that particular polygon. Now weight the vertices by the same fixed fractions every time and you have the position on the surface, the normal vector and the bounding box of the instance.. Everything calculated, without using any memory.
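As a rough illustration of the idea above, here is a minimal C sketch that recomputes an instance position from its index alone. Everything in it is an assumption made for the example: the `Vec3` and `Triangle` types, the `instance_position` name, the choice of integer hash, and the use of triangles instead of arbitrary polygons. It is not how HDI or LW's instancer is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { double x, y, z; } Vec3;
typedef struct { Vec3 a, b, c; } Triangle;

/* Small integer hash (an arbitrary illustrative choice). The same input
   always gives the same output, which is what makes placement repeatable. */
static uint32_t hash_u32(uint32_t v)
{
    v ^= v >> 16; v *= 0x7feb352dU;
    v ^= v >> 15; v *= 0x846ca68bU;
    v ^= v >> 16;
    return v;
}

/* Map a hash value to a double in [0, 1). */
static double unit_from_hash(uint32_t h)
{
    return (double)h / 4294967296.0;
}

/* Recompute the position of instance `index` on demand; nothing is stored.
   per_tri is the number of instances per triangle (hypothetical parameter). */
Vec3 instance_position(const Triangle *tris, uint32_t num_tris,
                       uint32_t per_tri, uint32_t index)
{
    assert(tris != NULL && per_tri > 0 && index / per_tri < num_tris);

    uint32_t poly = index / per_tri;   /* which polygon the instance sits on */
    uint32_t slot = index % per_tri;   /* which instance within that polygon */
    const Triangle *t = &tris[poly];

    /* Two repeatable weights derived only from (poly, slot). */
    uint32_t seed = hash_u32(poly) + slot;
    double u = unit_from_hash(hash_u32(seed));
    double v = unit_from_hash(hash_u32(seed ^ 0x9e3779b9U));
    if (u + v > 1.0) { u = 1.0 - u; v = 1.0 - v; }  /* fold back into the triangle */

    Vec3 p;
    p.x = t->a.x + u * (t->b.x - t->a.x) + v * (t->c.x - t->a.x);
    p.y = t->a.y + u * (t->b.y - t->a.y) + v * (t->c.y - t->a.y);
    p.z = t->a.z + u * (t->b.z - t->a.z) + v * (t->c.z - t->a.z);
    return p;
}
```

Calling `instance_position` twice with the same index returns the same point, so a renderer could ask for instance 17,000,000 without ever having stored instances 0 to 16,999,999. The cost is that the hash and interpolation are redone on every query.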

Routine might be doing something like
for ( i = 0; i < max_instances; i++ )
{
    bounding_box = instance_get_box_by_index( i );
    /* convert bounding box to min-x, min-y and max-x, max-y on screen */
    /* add index to the lists of instances in area min-x, min-y, max-x, max-y */
    /* or better: a special case of kd-tree, with just two axes, x and y on screen */
}
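The screen-space step of that loop can be sketched as follows, assuming the instance's bounding box has already been projected to pixel coordinates. The `ScreenBox` and `TileRange` types, the `tiles_for_box` name and the fixed square tile size are all hypothetical; a real renderer might well use the kd-tree mentioned above instead of a uniform grid.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int min_x, min_y, max_x, max_y; } ScreenBox;  /* pixels, inclusive */
typedef struct { int x0, y0, x1, y1; } TileRange;              /* tiles, inclusive  */

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Returns 0 if the box is entirely off screen (the instance can be culled).
   Otherwise returns 1 and writes the range of tiles the box overlaps, so the
   instance index can be appended to each of those tiles' lists. */
int tiles_for_box(ScreenBox box, int width, int height, int tile_size,
                  TileRange *out)
{
    assert(width > 0 && height > 0 && tile_size > 0 && out != NULL);

    if (box.max_x < 0 || box.max_y < 0 ||
        box.min_x >= width || box.min_y >= height)
        return 0;  /* nothing of this instance is visible */

    out->x0 = clamp_int(box.min_x, 0, width  - 1) / tile_size;
    out->y0 = clamp_int(box.min_y, 0, height - 1) / tile_size;
    out->x1 = clamp_int(box.max_x, 0, width  - 1) / tile_size;
    out->y1 = clamp_int(box.max_y, 0, height - 1) / tile_size;
    return 1;
}
```

With a 1280 x 720 frame and 32-pixel tiles, a box that lies fully outside the frame is rejected before any per-tile work happens, which is the kind of early-out that would explain fast renders when instances are hidden from view.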

Hieron
10-04-2012, 06:40 PM
ps: why did my images get odd colorshifts when uploaded to the forum... odd. They look fine in Pshop. (edit: tested and they look good in Chrome and Firefox too, just not in IE9)

- - - Updated - - -


You can't have more instances visible from camera than pixels on render..
So for 1280 * 720 it's 921,600

...

Right.. nice explanation. I think I meant that. :P
And HDI seems good at this right? Hence the fast examples when it is blocked from view, low memory etc

Hieron
10-04-2012, 06:46 PM
Cool! :)

Ok here it is.

Should be a: open -> F9. Kind of thing.
108313

Let me know how it goes. And please, do consider switching it to the HDI variant and keep an open mind. This entire thread is to improve upon native, not to bash it. Perhaps I should not have named it "vs".. but ok.

Hieron
10-04-2012, 06:48 PM
I suppose Graham went and developed this:
http://www.happy-digital.com/autograss/
for a reason... same system, same coolness, in Vray, 3ds-max arch viz, more clients. No clue if Vray itself has a better instancing system now, since he released it years ago.

Do note that that one only instances grass presets, which is nice and easy. Yet we have something much better in LW. (but perhaps without further development)

So I guess I am afraid to lose our Autograss, Autoproteins, Autostuff... Pretty valid... no?

edit: -killed double post- "forum is not always as responsive as one would hope"

Sensei
10-04-2012, 06:52 PM
forum is not always as responsive as one would hope

It's tragedy since they upgraded forum..

Cageman
10-04-2012, 07:16 PM
ps: why did my images get odd colorshifts when uploaded to the forum... odd. They look fine in Pshop. (edit: tested and they look good in Chrome and Firefox too, just not in IE9)

- - - Updated - - -



Right.. nice explanation. I think I meant that. :P
And HDI seems good at this right? Hence the fast examples when it is blocked from view, low memory etc

From a rendering perspective this is a very good thing, but it is also related to the fact that you can't communicate with instances in HDI at all. This is one of those tradeoffs regarding speed and memory that I mentioned before and that Sensei touched upon; LW native instancing isn't an isolated plugin that operates in its own space... it needs, and uses, inputs from within LW itself. Right now we have access to all the individual instances' position, rotation and scale and that stuff has to exist in memory. This also means that we can have nulls tracking individual instances and do light exclusions (well, through some workarounds) on specific instances if we know their ID.

Things like that are just impossible with HDI... and as such, we are sacrificing certain speeds with a new level of flexibility. This also means that optimisations, such as culling, are harder, because no matter what, LW needs to know about an instance position, rotation and scale at any given time, whether it has to render it or not. This does mean faster renders but not as fast as HDI, which doesn't need to communicate anything with LW, literally, in comparison at least.

Hieron
10-04-2012, 07:24 PM
Things like that are just impossible with HDI... and as such, we are sacrificing certain speeds with a new level of flexibility. This also means that optimisations, such as culling, are harder, because no matter what, LW needs to know about an instance position, rotation and scale at any given time.


Not sure, I am no developer so wouldn't dare to make assumptions there. But I have a feeling you are not a programmer either... Sensei, does the way native instancing allow more control, mean that all instances need to be generated per se and put into memory (in view or not)? Does LW's native system prohibit low memory usage and culling?


This does mean faster renders, but not as fast as HDI, that doesn't need to communicate anything with LW, literally.

hehe this is a nice sentence. :)
You managed to buffer the "not as fast as HDI" with other comments. And I hardly think HDI does not communicate anything with LW..?

So I suppose you agree with me that native cannot keep up in the examples I posted. Do you also agree that such a thing would be useful? If so, are you 100% sure it cannot be done from a programming point of view? How did you come to this view?

Anyway, if some dev or a programmer with serious experience would comment "no can do, it is impossible, you got the flexibility, you pay the price" then that would be all the info I looked for (but didn't hope for).

Next up is a mail to Graham (again :)).. he'll have a firm /ignore on such mails I think..

Sensei
10-04-2012, 07:41 PM
Even without being a programmer, you should be able to understand this:


// Instancer funcs.
typedef struct LWItemInstancerFuncs_t {
unsigned int (*numInstances)( LWItemInstancerID instancer );
LWItemInstanceID (*instanceByIndex)( LWItemInstancerID instancer, unsigned int index );
LWItemInstanceID (*first)( LWItemInstancerID instancer );
LWItemInstanceID (*next)( LWItemInstancerID instancer, LWItemInstanceID instance );

LWItemInstanceID (*createInstance)( LWItemInstancerID instancer );
void (*destroyInstance)( LWItemInstancerID instancer, LWItemInstanceID instance );
void (*setInstance)( LWItemInstanceID instance, unsigned int ID, LWItemID item, unsigned int steps );
void (*setMotionStep)( LWItemInstanceID instance, unsigned int step,
const LWDVector pos, const LWDVector scl, const LWDVector rot );
void (*setMotions)( LWItemInstanceID instance, unsigned int steps,
const LWDVector *pos, const LWDVector *scl, const LWDVector *rot );
void (*setMotionStepM)( LWItemInstanceID instance, unsigned int step,
const LWDVector pos, const double matrix[9] );
void (*setMotionsM)( LWItemInstanceID instance, unsigned int steps,
const LWDVector *pos, const double *matrix[9] );

// GUI only
void (*setInstanceDrawer)( LWItemInstanceID instance, InstanceDrawerMode drawmode );
void (*setInstanceDrawerColor)( LWItemInstanceID instance, unsigned int color ); // Use RGB_(r, g, b) macro to make colors.
} LWItemInstancerFuncs;


// Instance information.
typedef struct LWItemInstanceInfo_t {
LWItemID (*item)( LWItemInstanceID inst ); // The LWItemID of the item being instanced.
unsigned int (*steps)( LWItemInstanceID inst); // The motion steps stored in the instance.
void (*pos)( LWItemInstanceID inst, unsigned int step, LWDVector p ); // The position of the instance.
void (*scale)( LWItemInstanceID inst, unsigned int step, LWDVector s ); // The scale of the instance.
void (*rotation)( LWItemInstanceID inst, unsigned int step, LWDVector r ); // The rotation of the instance.
unsigned int (*ID)( LWItemInstanceID inst ); // The ID set by the generator.
void (*matrix)( LWItemInstanceID inst, unsigned int step, double m[9] ); // The rotation and scale matrix of the instance.
} LWItemInstanceInfo;


LWDVector is a double-precision floating point vector, 3 * 8 = 24 bytes, so position, rotation and scale together need 72 bytes. There is also the id and the LWItemID, 8 bytes on a 32-bit OS, so 80 bytes per instance in total.


// The LWInstancerAccess struct.
typedef struct LWInstancerAccess_t {
LWItemInstancerID instancer;
int mode;
} LWInstancerAccess;


// Instancer handler activation.
typedef struct st_LWInstancerHandler {
LWInstanceFuncs *inst;
LWItemFuncs *item;
LWRenderFuncs *rend;
void (*evaluate)( LWInstance, const LWInstancerAccess* );
} LWInstancerHandler;

LW calls LWInstancerHandler->evaluate() on every new frame;
the handler then has to call LWItemInstancerFuncs->createInstance() in a loop.
Call it 1000 times, to have 1000 instances, etc.
And it will eat 80,000 bytes of memory.
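The arithmetic in this post can be written out as a small C sketch. It only restates the estimate given above (three `LWDVector`s plus 8 bytes of IDs) and assumes one motion step per instance. The function names are made up for the example, and the real per-instance footprint inside LW may be larger.

```c
#include <assert.h>
#include <stdint.h>

enum {
    VECTOR_BYTES         = 3 * 8, /* LWDVector: three doubles              */
    VECTORS_PER_INSTANCE = 3,     /* position, rotation, scale             */
    ID_BYTES             = 8      /* id + LWItemID on a 32-bit OS (as above) */
};

/* Estimated bytes for one buffered instance with a single motion step. */
uint64_t bytes_per_instance(void)
{
    return (uint64_t)VECTOR_BYTES * VECTORS_PER_INSTANCE + ID_BYTES;
}

/* Estimated bytes for `count` buffered instances. */
uint64_t instance_buffer_bytes(uint64_t count)
{
    return count * bytes_per_instance();
}
```

Under these assumptions `instance_buffer_bytes(1000)` gives 80,000 bytes, and the same function applied to 25,000,000 instances gives 2,000,000,000 bytes, i.e. about 2 GB for the transform data alone.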

Hieron
10-04-2012, 07:52 PM
Even without being a programmer, you should be able to understand this:

...
LW calls LWInstancerHandler->evaluate() every new frame
it has to call LWItemInstancerFuncs->createInstance() in loop.
Call it 1000 times, to have 1000 instances. etc.
And it will eat 80,000 bytes of memory.

Yes sure, that makes sense. But how does that correlate to your statement:


You can't have more instances visible from camera than pixels on render..
So for 1280 * 720 it's 921,600

A good universal instancing engine doesn't generate instances up front; instead, a math routine always returns the same transformation matrix and bounding box for a given index.

I can understand that LW needs 80 bytes for an instance, I was just hoping it would not require 80 bytes for every single instance whether visible or not. HDI doesn't need it... yet any of those instances has position, rotation and scale too..

Is there a fundamental difference, due to the flexibility of the native instances (preview/OpenGL is no concern here)? or is it something NT can streamline at a later time and after some effort?

Sensei
10-04-2012, 07:56 PM
I didn't say "good universal instancing engine" is about LW instancer ;)

But as Cageman pointed out, it would not be possible to so nicely control it, especially using node editor, without buffering all data once at initialization..

25M instances * 80 bytes = 2 GB.

Check out Task Manager during rendering in native LW.

If it won't eat 2 GB, maybe LW instancer has two modes: one where it's buffering position, rotation and scale, and one where it doesn't have to buffer it (because it can calculate it at any time from points and/or polygons). But I doubt it.
You can try connecting vectors in Node Editor in Edit Nodes in LW Instancer to also check it.

Hieron
10-04-2012, 08:00 PM
I didn't say "good universal instancing engine" is about LW instancer ;)

But as Cageman pointed out, it would not be possible to so nicely control it, especially using node editor, without buffering all data once at initialization..

hehe :)
Well, as I pointed out earlier already, that is a bummer then. As it means losing functionality.. I do hope I've shown that at least.. :/

Sensei
10-04-2012, 08:03 PM
You can't have cake and eat it at the same time.

Autograss for sure is completely automatic matrix generation for each grass index. Like I showed in #35 post.
Specialization that works.

Hieron
10-04-2012, 08:04 PM
25M instances * 80 bytes = 2 GB.

Check out Task Manager during rendering in native LW.

If it won't eat 2 GB, maybe LW instancer has two modes: one where it's buffering position, rotation and scale, and one where it doesn't have to buffer it (because it can calculate it at any time from points and/or polygons). But I doubt it.
You can try connecting vectors in Node Editor in Edit Nodes in LW Instancer to also check it.


It eats 14GB+ here (see my posted images of memory/taskmanager) and maxes my 16GB memory machine for 25M instances... so not nearly 2GB or 4GB.. Later on it drops to 9M (maybe some culling happened?). Still much higher..
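To put the observed figure next to the 80-byte estimate, a tiny C sketch can divide the reported memory use by the instance count. It assumes decimal gigabytes and treats the whole 14 GB as instance data, which overstates it, since the process also holds geometry, render buffers and so on. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Average bytes of process memory per instance: an upper bound on what
   each instance really costs, since other data is counted in as well. */
uint64_t observed_bytes_per_instance(uint64_t process_bytes, uint64_t instances)
{
    assert(instances > 0);
    return process_bytes / instances;
}
```

For 14,000,000,000 bytes and 25,000,000 instances this comes out at 560 bytes per instance, roughly seven times the 80-byte estimate, so the position/rotation/scale buffer on its own would not seem to account for the memory use reported here.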

Hieron
10-04-2012, 08:09 PM
You can't have cake and eat it at the same time.

Autograss for sure is completely automatic matrix generation for each grass index. Like I showed in #35 post.
Specialization that works.

Hey hold on, I am just asking here. I only needed/wanted a response and: "You can't have cake and eat it at the same time." is fine and one of the responses I expected. All I needed to know. I just posted the examples 2x to get the issue across as there was even debate if there is an issue.

And seeing the general trend here these few days, no one even agrees. Which is interesting in itself btw... Must be the only one interested in millions of instances..

perhaps we were spoiled with HDI

Sensei
10-04-2012, 08:14 PM
Your scene from post #37 is immediately crashing after F9 here so I can't check it..

- - - Updated - - -


Must be the only one interested in millions of instances..

Or you're the last LW user on the planet.. :p

Hieron
10-04-2012, 08:18 PM
Your scene from post #37 is immediately crashing after F9 here so I can't check it..

You sure it is crashing? The entire point here is that it hangs, sucks up memory like a mofo and stalls LW. On my 2600k it stalls for 4minutes, goes above 16GB of memory usage. I suppose that crashing could happen too.

See, HDI would not do that. it would start rendering :)


- - - Updated - - -



Or you're the last LW user on the planet.. :p

heh.. who knows.. regarding arch viz you'd think from the response..
No clue what eg. Vray has though.. if HD made Autograss, I suppose Vray's own instancing systems isn't up to it either.. But that is a few years ago. Should test it one day.

Sensei
10-04-2012, 08:23 PM
There are 8 crash report windows from the OS..

And in item that should have instances in title I see 0 instances.. Trying to double click instancer, and nothing.. Can't do that. Panel doesn't show up. Can't change any settings.
Maybe you're not using the same LW v11 version, as I am.

Hieron
10-04-2012, 08:24 PM
11.0.3 here.. 64 bit of course. Are you on a newer build hmmm? :P

It should show 0 instances btw, as I have turned off preview on it and LW did not generate them yet.

Cageman
10-05-2012, 11:22 AM
hehe :)
Well, as I pointed out earlier already, that is a bummer then. As it means losing functionality.. I do hope I've shown that at least.. :/

Well, we are effectively getting more control in the native instancer, but currently, at the sacrifice of speed, especially when throwing in extreme numbers of instances. It is going to be interesting to see what LW3DC can do for such large datasets. I wonder if a switch could be added where you simply turn off the pos/rot/scale when all you need is to simply throw out billions of instances.

Netvudu
10-05-2012, 01:07 PM
yes, in Mantra there is a similar option called "fast point instancing". You sacrifice some control in exchange for resources...I don't see why this couldn't be implemented in LW.

Hieron
10-05-2012, 02:30 PM
Well, we are effectively getting more control in the native instancer, but currently, at the sacrifice of speed, especially when throwing in extreme numbers of instances. It is going to be interesting to see what LW3DC can do for such large datasets. I wonder if a switch could be added where you simply turn off the pos/rot/scale when all you need is to simply throw out billions of instances.

Something like that.. even though HDI does allow random variation of pos./rot./scale. Much the same like the non nodal instances... I have no clue how all this works in code though.. but if a switch and some sacrifice is needed, I'm all for it :)

Perhaps if one does not use nodal and tick that box.. it can already be in the "fast instancing" mode..?

How I would love to talk to dev's sometimes.. Guess I'm not the only one and they would spend all their time listening to requests/demands and chitchat :)

Sensei
10-05-2012, 03:17 PM
Perhaps if one does not use nodal and tick that box.. it can already be in the "fast instancing" mode..?

You rather mean "slow instancing mode".. Because having to generate position, scale and rotation over and over again, saves memory, but it's slower..



How I would love to talk to dev's sometimes.. Guess I'm not the only one and they would spend all their time listening to requests/demands and chitchat :)

Install skype.. :p

Cageman
10-05-2012, 04:19 PM
Something like that.. even though HDI does allow random variation of pos./rot./scale. Much the same like the non nodal instances... I have no clue how all this works in code though.. but if a switch and some sacrifice is needed, I'm all for it :)

No... you should still be able to do that... what I was referring to was the buffering of this data into memory. The reason for Native instancing to do this is because, unlike HDI, the idea behind the native implementation is related to a more open communication network between different aspects of LW. If you use HDI, can you make a light follow any of those instances? No, but you can with LW Native. Now, that doesn't sound too impressive, but in future versions of LW, we will start to see more and more usage for such interchange of data inside of LW.

One example that I saw that was quite cool (an example of the flexibility) was a setup where each instance used DPs Ray Intersect node (this was done through the node-editor using the Index as an input, if I remember correctly). This resulted in all instances following a displaced terrain, without the use of particles, surface placements or polygon placements (it was a test to do a crowd simulation with humanoid objects). The trick, in that case was a separate instance-source (not the regular ground object). This object which was flat on y-axis generated the instances, but the Ray Intersect-node that was using the groundmesh as a reference, forced each instance to fire a ray in positive y direction. When the ray hit the surface of the groundmesh, it moved the instance to that particular spot on the groundmesh.

Now, this might sound computation heavy, but in fact, it works in realtime in OGL, so no need to wait for any rendering (the number of instances does affect the level of smoothness as well, of course, but that should go without saying). :)
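The ray-intersect placement described above can be pictured with a small C sketch: an instance sitting on a flat source plane fires a ray straight up (+y) and is moved to where that ray hits one triangle of the ground mesh. This is only an illustration of the geometry, not the DP Ray Intersect node or any LW API. `Vec3` and `drop_onto_triangle` are names invented for the example, and a real setup would test against the whole mesh rather than a single triangle.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y, z; } Vec3;

/* Fire a ray from *instance in the +y direction at triangle (a, b, c).
   On a hit, move the instance to the hit point and return 1.
   Return 0 (instance untouched) if the ray misses or the hit lies below it. */
int drop_onto_triangle(Vec3 a, Vec3 b, Vec3 c, Vec3 *instance)
{
    assert(instance != NULL);

    /* Barycentric weights of the instance in the xz projection. */
    double det = (b.z - c.z) * (a.x - c.x) + (c.x - b.x) * (a.z - c.z);
    if (det == 0.0)
        return 0;  /* triangle is edge-on to a vertical ray */

    double w0 = ((b.z - c.z) * (instance->x - c.x) +
                 (c.x - b.x) * (instance->z - c.z)) / det;
    double w1 = ((c.z - a.z) * (instance->x - c.x) +
                 (a.x - c.x) * (instance->z - c.z)) / det;
    double w2 = 1.0 - w0 - w1;
    if (w0 < 0.0 || w1 < 0.0 || w2 < 0.0)
        return 0;  /* outside the triangle's footprint */

    double hit_y = w0 * a.y + w1 * b.y + w2 * c.y;
    if (hit_y < instance->y)
        return 0;  /* surface is below the ray origin; +y ray never reaches it */

    instance->y = hit_y;
    return 1;
}
```

Because each instance only needs its own x and z to find its height, the result depends on per-instance position data being available, which is exactly the kind of data the native instancer buffers.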

With all this said, I do agree that this flexibility isn't always wanted (again, your example scenes have clearly demonstrated that), so a switch that more or less turns off the buffering of the data into memory should, if it can be implemented, get native instancing much, much closer to HDI.

Cageman
10-05-2012, 04:43 PM
Not sure, I am no developer so wouldn't dare to make assumptions there. But I have a feeling you are not a programmer either... Sensei, does the way native instancing allow more control, mean that all instances need to be generated per se and put into memory (in view or not)? Does LW's native system prohibit low memory usage and culling?



hehe this is a nice sentence. :)
You managed to buffer the "not as fast as HDI" with other comments. And I hardly think HDI does not communicate anything with LW..?

Obviously I was very tired and the sentence was coming out the wrong way. :)

If we focus on culling, LW Native renders much faster with culling than without, but since it also computes and stores each individual instance's pos/rot/scale/id (which HDI does not, or if it does, it can only be accessed by HDI itself, not LW), it will have the pre-computation time no matter what. This is what I would call a tradeoff of new functionality vs speed. In your example where you want to have that many instances, it is of course an issue. For situations where you have a couple of hundred animals running in a herd, or a couple of thousands of them, you might need that additional control to be able to select a specific instance to have a completely different shader, or, with some clever nodal in Nodal Motion, have a specific light track a specific instance (or a group of them).

There are lots of possibilities with the Native instancer that I really like, that HDI is not close to have, and, I'm not sure I would want to go back to those limitations. It's really up to LW3DC to come up with a solution for extreme situations like in your case with grass.


So I suppose you agree with me that native cannot keep up in the examples I posted.

Yes and no. :) Yes, HDI is much faster with that many instances! That is my short answer... the longer answer would be to think about eco-systems and pre-define a smaller area that then becomes the source for the Instances. I saw a behind the scenes of Animal Armageddon where the designer talked about this... each tree had its own ground and own patch of grass, dirt, mud etc (its own eco system so to speak). This was pretty much a model that then got instanced. I think that they did that for several reasons, but the most obvious one is for control, and I guess that they also were mindful of how many instances they actually used... I would argue that just because you can instance billions of grass-patches, it might not be the best solution when you also have to instance x thousands of trees, bushes, rocks etc. :)

I would argue that there is room for a grass-specific instancing tool that is very specialized in just doing that, with no additional control other than random rotations, alignment, jitter. No nodes, no pos/rot/scale data being buffered, or some form of switch in Native Instancer itself.

Hopefully I'll be able to showcase some of the typical things we've ended up doing that still benefit a whole lot from instancing, but that are more modest on the number of instances needed and where HDI is the bottleneck, not Native. I think this is a good thread, and it has certainly enlightened me with insights into where HDI is beneficial, but I also think it is important to showcase situations where the opposite is true.

:)



Do you also agree that such a thing would be usefull?

Yes!


If so, are you 100% sure it can not be done from a programming point of view? How did you come to this view?

As I elaborated in the previous post, I think the only way to get there is to remove the added functionality. Remove the added features that allow us to reference the position, rotation and scale per instance... data that is accessible anywhere there is a nodal interface (which is literally everywhere in LW these days). I think that isn't an option though, so a switch that can turn that data off is probably the only viable option from a code perspective. Whether it is possible or not, I do not know.

Another solution, code-wise, would be to develop tools that allows the user to easily manufacture "instance patches" using instances. What do I mean by that? Well, Native Instances can be converted into true polygonal geometry through a Pythonscript.... my idea here is that you have a 1x1m patch, you instance your grass to that and make sure you get the number of instances you need. Then, you hit a button that says "Convert to LWO" or something similar, upon where a new LWO is generated to disk that is a single-layer object containing what you just have done with the instancer. You then take that LWO and instance it onto the big groundmesh. By doing this, you have now effectively reduced the number of instances needed by 10.

This IS doable right now, but it does require manual labour. I don't think such a toolset would be that hard nor timeconsuming to develop though... I'm quite sure it could be developed by third party, actually.
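The effect of baking instances into a patch can be estimated with a one-line calculation, sketched here in C. The function name is made up, and the figure of 100 grass instances per 1x1 m patch is an assumption taken from the 10-per-meter density used earlier in the thread; the actual reduction factor is simply however many instances get baked into each patch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of top-level instances needed when `per_patch` original instances
   are baked into each patch object (rounded up so nothing is left out). */
uint64_t top_level_instances(uint64_t total_instances, uint64_t per_patch)
{
    assert(per_patch > 0);
    return (total_instances + per_patch - 1) / per_patch;
}
```

With 25,000,000 grass instances and 100 baked into each patch, 250,000 top-level instances remain, so the buffered per-instance data shrinks by the same factor. The trade-off is a heavier source object and less per-blade variation, since every patch repeats the same baked layout.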

:)

jwiede
10-08-2012, 12:38 PM
There are lots of possibilities with the Native instancer that I really like, that HDI is not close to having, and I'm not sure I would want to go back to those limitations. It's really up to LW3DG to come up with a solution for extreme situations like in your case with grass.
I seriously question whether Hieron's grass example is as "extreme" as you're making it out to be. It's a very common arch-viz need, and on LW, using instanced grass is significantly more efficient than other options like using hair or displacement.

On a more general level, it seems to me the whole point of instancing systems is to generate lots of instances, enabling scenarios where the host pkg would never be able to deal with real geometry versions. That certainly appears to be the goal of competitors' instancing systems (f.e. Lux's "Hippo Replicators" demo), and they all appear to be achieving said goal.

LW's native instancing is already choking memory on single-digit millions of instances, yet neither HDI nor competing packages seem to encounter much difficulty producing much higher quantities of instances (up to 1000x greater in some cases). Their implementations also seem to be able to offer fine-grained control of instances (though perhaps not quite as much as LW's), without sacrificing quantities they can generate. They even support recursive instancing in many cases, something LW's native instancing doesn't support (yet, anyway).

Seems like the real question here is: What are the "common uses" of instancing (by genre), and what quantities are involved? Maybe I'm completely wrong about high-quantity instancing uses being more common than low-quantity uses, but so far that doesn't appear to be the case based on the frequency of uses I've seen from various visualization communities, etc. The capabilities and marketing of other packages' instancing systems also strongly appear to support and target high-quantity instancing users.

Hieron
10-16-2012, 09:21 AM
No... you should still be able to do that... what I was referring to was the buffering of this data into memory. The reason for Native instancing to do this is because, unlike HDI, the idea behind the native implementation is related to a more open communication network between different aspects of LW. If you use HDI, can you make a light follow any of those instances? No, but you can with LW Native. Now, that doesn't sound too impressive, but in future versions of LW, we will start to see more and more use for such interchange of data inside of LW
....

With all this said, I do agree that this flexibility isn't always wanted (again, your example scenes have clearly demonstrated that), so a switch that more or less turns off the buffering of the data into memory should, if it can be implemented, get native instancing much, much closer to HDI.

Well, I surely do like the flexibility.. I'm just wondering whether it is a trade-off that needs to be made, or whether it just requires more work. I hope the latter.


..
I would argue that there is room for a grass-specific instancing tool that is very specialized in just doing that, with no additional control other than random rotations, alignment, jitter. No nodes, no pos/rot/scale data being buffered, or some form of switch in Native Instancer itself.

Please don't make it grass specific.. our lipids and proteins would feel left out.
I'm just hoping LW3DG manages to find a way to speed up the current methods and reduce their memory footprint.


Hopefully I'll be able to showcase some of the typical things we've ended up doing that still benefit a whole lot from instancing, but that are more modest in the number of instances needed and where HDI is the bottleneck, not Native. I think this is a good thread, and it has certainly given me insight into where HDI is beneficial, but I also think it is important to showcase situations where the opposite is true.

Of course a whole lot benefits from instancing, and I do not doubt that HDI is outperformed in quite a few places. I was mostly referring to the leap native instancing still has to make to reach into actual millions. Hopefully without removing functionality by a switch, but if need be...


..
As I elaborated in the previous post, I think the only way to get there is to remove the added functionality. Remove the features that allow us to reference the position, rotation and scale per instance... data that is accessible anywhere there is a nodal interface (which is literally everywhere in LW these days). I don't think that is an option though, so a switch that can turn that data off is probably the only viable option from a code perspective. Whether it is possible or not, I do not know.

I hope no functionality has to be removed. Now would be a great time for the responsible dev to chime in :P


..Another solution, code-wise, would be to develop tools that allow the user to easily manufacture "instance patches" using instances. What do I mean by that? Well, Native Instances can be converted into true polygonal geometry through a Python script.... ...

This IS doable right now, but it does require manual labour. I don't think such a toolset would be that hard or time-consuming to develop though... I'm quite sure it could be developed by a third party, actually.

:)

Converting instances to polygons is great, but imho a very poor solution for the presented problem. Even so, it could be done in Modeler just fine.. just make a larger patch etc. But that is not ideal at all, and it would not conform well to curved surfaces.


I seriously question whether Hieron's grass example is as "extreme" as you're making it out to be. It's a very common arch-viz need, and on LW, using instanced grass is significantly more efficient than other options like using hair or displacement.

On a more general level, it seems to me the whole point of instancing systems is to generate lots of instances, enabling scenarios where the host pkg would never be able to deal with real geometry versions. That certainly appears to be the goal of competitors' instancing systems (f.e. Lux's "Hippo Replicators" demo), and they all appear to be achieving said goal.

LW's native instancing is already choking memory on single-digit millions of instances, yet neither HDI nor competing packages seem to encounter much difficulty producing much higher quantities of instances (up to 1000x greater in some cases). Their implementations also seem to be able to offer fine-grained control of instances (though perhaps not quite as much as LW's), without sacrificing quantities they can generate. They even support recursive instancing in many cases, something LW's native instancing doesn't support (yet, anyway).

Seems like the real question here is: What are the "common uses" of instancing (by genre), and what quantities are involved? Maybe I'm completely wrong about high-quantity instancing uses being more common than low-quantity uses, but so far that doesn't appear to be the case based on the frequency of uses I've seen from various visualization communities, etc. The capabilities and marketing of other packages' instancing systems also strongly appear to support and target high-quantity instancing users.

/agree.

Anyway, I for one will eagerly install 11.5, open the 25M instances test scene with native instances and pray while looking at the taskmgr. Hopefully it will be faster and less memory hungry :) Oh, and I will pray and test that HDI will at least still work...

pixym
10-31-2012, 07:56 PM
Hi,
This thread is very, very interesting. I hope Graham will keep updating HD Instance (especially to support future LW versions and more than 16 threads while rendering). I also hope the next versions of LW will render huge numbers of instances faster than 11.0 does.
Best.
P.S.: Hieron, your archviz company produces very nice work.

jwiede
11-01-2012, 03:29 AM
You can't have more instances visible from camera than pixels on render..
So for 1280 * 720 it's 921,600
Technically, the number could be substantially higher, due to the sub-pixel sampling of AA. It's really more like "<width> x <height> x <number of distinct samples used>", since the final color chosen could represent an average of a separate instance for each sub-pixel sampling point used in AA. Multiple adaptive AA passes, localized supersampling for edge detection, etc. can all further modify that number. Minor nit, just pointing out AA impact.
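As a quick worked version of that "width x height x samples" bound: the function name and the sample count of 4 are just illustrative assumptions, and adaptive passes or edge supersampling would change the real figure further.

```python
def max_visible_instances(width: int, height: int,
                          samples_per_pixel: int = 1) -> int:
    """Upper bound on distinct instances that can contribute to a render,
    assuming at most one instance per AA sample point."""
    return width * height * samples_per_pixel


if __name__ == "__main__":
    # One sample per pixel: the figure from the quoted post.
    print(max_visible_instances(1280, 720))
    # Hypothetical 4 samples per pixel raises the bound fourfold.
    print(max_visible_instances(1280, 720, 4))
```

With one sample per pixel this gives the quoted 921,600; with 4 samples per pixel the bound becomes 3,686,400.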