
Thread: Native Instances vs HD-Instance..

  1. #1
    Super Member Hieron's Avatar
    Join Date
    Aug 2006
    Location
    Netherlands
    Posts
    1,685

    Native Instances vs HD-Instance..

    My 1500th post, I'll try to make it count..

    Since posting this tutorial on nice grass using HDI: http://forums.newtek.com/showthread....lanation-files , we have been using grass (and trees and bushes and flowers) a lot in our renders. Now that native instancing is here, it has become interesting to replace trusty (but not actively developed) HDI. However, that is not possible yet, and I hope this thread sheds some light on the areas that still need significant improvement.

    Right, the actors in this play: (nothing fancy this time, see the old thread for something nicer... or our webpage..)

    The grass:
    [Attached image: Grass_actor.jpg]
    2.5k polys per patch

    The ground:
    [Attached image: Ground_actor.jpg]
    500 m x 500 m. Each poly (outlined in yellow) is ~8 m x 8 m.

    Let's start with HDI: instance the grass over the ground surface and place about 1 instance per meter along each axis. Since this is a 500 m x 500 m ground, we end up with roughly 250k grass patches:
    [Attached image: HDI_250k.jpg]
    Shot rendered in 14.3 seconds. (this is for comparison vs other shots. All settings kept the same unless mentioned)

    Let's ramp it up a notch and go for 10 instances per meter along each axis. This leads to roughly 25M instanced grass patches:
    [Attached image: HDI_25M.jpg]
    Shot rendered in 53.5 seconds and only took 500MB more memory than the 250k instances. Notice the nice even spread.

    Let's go mental and go for 50 per meter! We are talking 625M instances, ~1500 gigapolys in total:
    [Attached image: HDI_625M.jpg]
    Shot rendered in 5m29s and only took ~1GB more memory than the 250k instances. (Memory was limited to 1GB in the HDI options; setting it higher allows HDI to use much more memory, but at 4m48s the speed is hardly better. Very, very nice usage of memory by HDI.) Not bad for ~1500 Gpoly... The moire patterns are due to me not putting in any jitter or randomization. That can easily be done at practically no cost, but I left it out for comparison's sake.
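
    For anyone checking the numbers, here is a minimal sketch of the arithmetic. It assumes "per meter" means per meter along each axis, which is the reading that reproduces the 250k, 25M and 625M totals quoted above. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of instances on a square ground plane, given a linear density
   (instances per meter along each axis). */
uint64_t instance_count(uint64_t side_m, uint64_t per_meter)
{
    assert(side_m > 0 && per_meter > 0);
    uint64_t per_axis = side_m * per_meter;
    return per_axis * per_axis;
}

/* Total polygons if every instance were expanded into real geometry. */
uint64_t total_polys(uint64_t instances, uint64_t polys_per_patch)
{
    return instances * polys_per_patch;
}
```

    With a 500 m side and 2,500 polys per patch, `instance_count(500, 50)` gives 625,000,000 instances and `total_polys` gives about 1.56e12 polygons, in line with the "~1500 gigapolys" figure.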

    Now for a little trick: the ground plane is cloned and the clone is put 1 meter above the original, covering all instances. This results in:
    [Attached image: HDI_625M_hidden.jpg]
    Which rendered in 10.7 seconds and used no memory to speak of. Nice use of culling away 625M instances there!

    Continued with the same tests using native on next post.

  2. #2
    Super Member Hieron's Avatar
    Continuing with native instances: grass over the ground surface with 250k grass patches (not shown in the viewport, so only generated at render time):
    [Attached image: Native_250k.jpg]
    Shot rendered in 11.9 seconds, beating HDI by a few seconds. Native places them randomly by default, which may look nice here but is actually a disadvantage when trying to get uniform coverage. For, say, grass on a field or a lipid membrane, uniform coverage is essential.

    Setting the spread relax to "10" yields:
    [Attached image: Native_250k_relax.jpg]
    Which is better. LW was unresponsive for 10 seconds while generating them. Total render time 21.2s, losing to HDI by about 7 seconds.
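
    As an aside on uniform coverage: one cheap, generic way to get an even spread that still looks random is a jittered grid, with one point per cell and a random offset inside it. The sketch below is not how native "relax" or HDI placement works (neither is described in this thread); `Point2`, `rand01` and `jittered_grid` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { double x, z; } Point2;

/* Uniform random number in [0, 1). */
static double rand01(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Fill out[] with cells*cells points on a side x side square: one point per
   grid cell, offset from the cell center by up to jitter/2 of a cell.
   jitter = 0 gives a regular grid; jitter = 1 lets a point land anywhere
   inside its own cell. Returns the number of points written. */
size_t jittered_grid(Point2 *out, size_t cells, double side, double jitter)
{
    assert(out != NULL && cells > 0 && side > 0.0);
    assert(jitter >= 0.0 && jitter <= 1.0);
    double cell = side / (double)cells;
    size_t n = 0;
    for (size_t iz = 0; iz < cells; iz++) {
        for (size_t ix = 0; ix < cells; ix++) {
            double jx = (rand01() - 0.5) * jitter;
            double jz = (rand01() - 0.5) * jitter;
            out[n].x = ((double)ix + 0.5 + jx) * cell;
            out[n].z = ((double)iz + 0.5 + jz) * cell;
            n++;
        }
    }
    return n;
}
```

    Because every cell gets exactly one point, coverage stays even without an iterative relax pass, and the cost is a single pass over the cells.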

    Back to random/non-relaxed and trying 25M instances using native, a number that is trivial to HDI. It makes memory usage shoot sky high and hit my 16GB memory limit in no time. I waited 2m40s before anything even happened, watching CPU usage drop to ~0% in the meantime. Then it finally started to render. Memory usage dropped back down to ~7GB (some culling occurred after generation?) and CPU usage stayed at 100%. Resulting in:
    [Attached image: Native_25M.jpg]
    Which is admittedly a nicer-looking 25M instances, but HDI can easily be made to look like this (more random) at no render-time cost. HDI took 53.5 seconds and about 500MB more memory than its 250k render, whereas native took 15m50s for this image!

    So what about the trick? The culling at generation? Remember, HDI managed to render a scene with 625M instances almost instantly (10.7 seconds) when all were covered (how on earth did Graham pull that off with HDI, it is amazing....); it knew they did not need rendering. Native instancing has no such thing.. it will generate every single one of them, even if none will show up at all... So this is the image that shows not a single one of the 25M instances that were generated uselessly:
    ---image not included, as LW hangs at 100% completed. CPU at 0%, memory not changing for minutes...---

    It made LW stall for 2m30s while memory filled up completely (like before) and took 5m4s in all to render (until it hung at 100%)...
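
    Nobody in this thread knows how HDI actually avoids the up-front cost, so the following is only a generic illustration of the idea, not HDI's method. If an instance's placement can be computed from its grid cell index alone, nothing has to be stored, and only the cells a ray really visits ever get evaluated. The hash constants and the names `hash2` and `instance_in_cell` are arbitrary choices for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { double x, z; } Point2;

/* Small integer hash (an arbitrary choice; any well-mixed hash would do). */
static uint32_t hash2(uint32_t ix, uint32_t iz)
{
    uint32_t h = (ix * 73856093u) ^ (iz * 19349663u);
    h ^= h >> 16;
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h;
}

/* Position of the instance that lives in grid cell (ix, iz).
   It is recomputed from the cell index every time it is asked for,
   so there is no per-instance storage at all. */
Point2 instance_in_cell(uint32_t ix, uint32_t iz, double cell_size)
{
    assert(cell_size > 0.0);
    uint32_t h = hash2(ix, iz);
    double jx = (double)(h & 0xffffu) / 65536.0;   /* offset in [0, 1) */
    double jz = (double)(h >> 16) / 65536.0;       /* offset in [0, 1) */
    Point2 p = { ((double)ix + jx) * cell_size, ((double)iz + jz) * cell_size };
    return p;
}
```

    In a scheme like this, a ray that is stopped by the covering plane never asks for any cell, so the work done for 625M hidden instances would be close to zero.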

    You will understand that I won't try the uniformity/relax at 10 for this.. it would take very long... Nor will I try to get to 625M instances, something HDI did without breaking a sweat!

    See our issue?
    ->Native is not even close when the numbers come in, and those numbers are needed in some cases... And HDI was absolutely amazing at it. Fast, low memory, responsive<-
    It would be very sad if we lost HDI while LW keeps being updated in new cycles and HDI doesn't, since native is not able to take over its spot yet.. not even close...
    Last edited by Hieron; 09-30-2012 at 06:36 PM.

  3. #3
    Almost newbie Cageman's Avatar
    Join Date
    Apr 2003
    Location
    Malmö, SWEDEN
    Posts
    7,639
    Quote Originally Posted by Hieron View Post
    See our issue?
    ->Native is not even close when the numbers come in, and those numbers are needed... And HDI was absolutely amazing at it. Fast, low memory, responsive<-
    It would be very sad if we lost HDI while LW keeps being updated in new cycles and HDI doesn't, since native is not able to take over its spot yet.. not even close...
    Build a jungle and turn on GI... let's see which one is faster.
    Senior Technical Supervisor
    Cinematics Department
    Massive - A Ubisoft Studio
    -----
    Intel Core i7-4790K @ 4GHz
    16GB Ram
    GeForce GTX 1080 8GB
    Windows 10 Pro x64

  4. #4
    Super Member Hieron's Avatar
    So, as an extra and subtler point. Let's check GI and these instances.

    Here a much smaller patch of land with HDI instanced grass (a bit randomly placed this time) and some clutter. No GI:
    [Attached image: HDI_clutter_noGI.jpg]
    Rendered in 39.6 seconds.

    Now with GI turned on (settings tweaked to something fast, did not compensate light as it is not about looks):
    [Attached image: HDI_clutter_GI.jpg]
    GI in 30.8 seconds. Total rendertime 1m49s.


    Back to native instances version, no GI. 5M pieces to match HDI a bit:
    [Attached image: Native_clutter_noGI.jpg]
    Rendered in 1m14s.

    And with GI turned on again (same settings as before):
    [Attached image: Native_clutter_GI.jpg]
    GI in 39.2 seconds. Total rendertime 2m14s. Outpaced by the plugin in both GI calc and normal render.

    It would be nice if over time, native could match the speed of HDI.

    So, hopefully this makes some sense and is useful to someone here who needs to weigh native instancing against HDI. For us, HDI will be around for a while I guess.. Hopefully native instancing will keep improving so that one day it can match and replace HDI when it comes to mass instancing.. Hopefully before HDI stops working with new LW update cycles..

    Scenes and models provided:
    HDIvsNative.zip


    Quote Originally Posted by Cageman View Post
    Build a jungle and turn on GI... let's see which one is faster.
    Not sure how you meant the remark..

    http://www.nymus3d.nl/portfolio.php?...n=nl&player=59
    http://maps.kennispark.nl/
    Not quite trivial scenes we are applying instances to....

    I was still writing the above about GI. HDI will be much faster, hands down. GI or no GI.
    imho ofc, I'd like to be proven wrong on this...
    Last edited by Hieron; 09-30-2012 at 06:38 PM.

  5. #5
    Registered User
    Join Date
    Nov 2005
    Location
    malaysia
    Posts
    281
    I am not a programmer, but I think native instancing generates an instance ID for each instance it creates, so it may not handle an insane number of instances as well as HD.

    Basically, I think HD Instance is phenomenal at handling insane numbers of instances; probably no other commercial instancing system can match it. However, the current native instancing serves my needs well, in many cases better than HD. I would surely like to see it developed further, but the ability to handle 625M instances is quite low on my list at the moment.

  6. #6
    Super Member Hieron's Avatar
    Native surely has great improvements, it is nice that they added it! 625M instances may not be necessary for all shots per se. I'm just pointing at the loss of performance, which may not be obvious to some. We surely can't do away with HDI in our scenes and I hope to have explained why..

    Not to bash native instancing, it is great and allows many things HDI doesn't. I wonder if part of the performance difference is due to this..
    Last edited by Hieron; 09-30-2012 at 06:43 PM.

  7. #7
    Almost newbie Cageman's Avatar
    Just finished a test with your content....

    LW Native: 4m 4s
    HDI: 4m 15s

    When rendering GI with HDI, you need to enable Volumetric Radiosity.

  8. #8
    Super Member Hieron's Avatar
    Quote Originally Posted by Cageman View Post
    Just finished a test with your content....

    LW Native: 4m 4s
    HDI: 4m 15s

    When rendering GI with HDI, you need to enable Volumetric Radiosity.
    Ow come on.. I know that...:/ otherwise the instances in the example render would have been dark right..? The example scene was saved after the last test, which was native, so some options like Volumetric Radiosity were turned off. Perhaps I should have left out the GI comparison as it is in the same ballpark. And not an excessive difference like the high numbers.

    What takes 4m to render for you exactly? The content I provided allows for all tests here. On your Hexacore I would expect an F9 upon load to take about 2 minutes?

    Apparently you do not agree.. yet I think I clearly demonstrated that native instancing stumbles when rendering 25M instances while HDI does not mind (never mind the 625M instances). I could have chosen to do the GI test in that regime as well, which would surely bring native to its knees.

    Do you not agree that native instancing is (much) slower when the high numbers come in? And that it could use some culling? We surely can not do certain stuff with Native that is doable with HDI.. it's not like we do not want Native to be able to do it.... Native has all kinds of nice options and possibilities..

    If those possibilities prohibit any chance of being able to render 25M instances as quickly and with low memory footprint as HDI then that would be interesting to know as well.
    Last edited by Hieron; 10-01-2012 at 06:40 AM.

  9. #9
    TrueArt Support
    Join Date
    Feb 2003
    Location
    Poland
    Posts
    7,884
    LW instancing asks the instance generator to provide all the data about each instance, like with particles: position, rotation, scale, etc.
    So they must take memory.
    Then they are placed in some octree/kd-tree or something similar, and OpenGL goes through the list, drawing each bounding box from the array.
    And when a ray hits one of them, the ray origin and direction are converted to match the reference item.

    Changing it means rewriting from scratch.
    Last edited by Sensei; 10-01-2012 at 06:41 PM.

  10. #10
    Super Member Hieron's Avatar
    Quote Originally Posted by Sensei View Post
    LW instancing asks the instance generator to provide all the data about each instance, like with particles: position, rotation, scale, etc.
    So they must take memory.
    But how is HDI different in this.. all of those can be set with HDI too?

    Quote Originally Posted by Sensei View Post
    Then they are placed in some octree/kd-tree or something similar, and OpenGL goes through the list, drawing each bounding box from the array.
    OpenGL having a hard time is OK; with HDI and high numbers one would never show bounding boxes either, as there is usually no need to have them all previewed. With visibility off, LW only generates them at render time, it seems. So ignoring OpenGL, isn't HDI doing the same?

    Quote Originally Posted by Sensei View Post
    And when a ray hits one of them, the ray origin and direction are converted to match the reference item.

    Changing it means rewriting from scratch.
    Still not sure how that would be different for HDI? That would be a bummer though...

    Do you have any idea how HDI manages to cull away so many instances so dang fast when they are occluded from view (barely) by another polygon? Always amazed me...

    Ah well, it was worth a shot to bring this to attention.. it is easy to get used to being able to use so many instances.. very helpful in massive arch viz scenes.

  11. #11
    Heffalump
    Join Date
    Feb 2003
    Location
    Away
    Posts
    3,897
    One of Sensei's points, if I understand him correctly, is that the bounding box preview limits differ between HDI and native. That adds to UI lag and memory overhead whilst working in Layout - the native system uses a % value and Layout will choke during instance evaluation unless you set that % value to be very low (default is 100%). HDInstance uses a numeric cut-off that, by default, is set to something like 100k (from memory) and so the evaluation stall and memory load are not as pronounced. If you set both to close to 0, you'll see Layout become responsive. If you set them both to be higher than your instance count, you will see equivalent stalling and memory load whilst working in Layout.
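
    The difference between the two preview limits can be sketched like this. The function names are made up, and the defaults (100% for native, roughly 100k for HDInstance) are the from-memory figures given above, so treat the numbers as illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage-style limit: preview this % of all instances. */
uint64_t preview_by_percent(uint64_t total, double percent)
{
    assert(percent >= 0.0 && percent <= 100.0);
    return (uint64_t)((double)total * percent / 100.0);
}

/* Absolute cut-off: preview at most max_boxes instances. */
uint64_t preview_by_cutoff(uint64_t total, uint64_t max_boxes)
{
    return total < max_boxes ? total : max_boxes;
}
```

    For 25M instances, a 100% setting asks Layout to evaluate all 25,000,000 bounding boxes, while a 100k cut-off stays at 100,000 no matter how far the instance count climbs.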

    In terms of rendering, HDInstance has generally been more efficient. It seems that 11.5 will bring further improvements here. For a first iteration, the instancing in 11.0 was pretty decent.

    I'm much more concerned about the inability of hair/fur systems to match Sasquatch for large area coverage. Both LW and modo's systems fail badly in this - slow and very memory intensive.
    Inactive.

  12. #12
    TrueArt Support
    Sasquatch doesn't have to generate any hairs in memory.. It's a post-process effect.. When you draw a line or curve in Paint or similar, do you need memory (with undo off)..? It's placed on top of the previously rendered image..

    Now imagine a curve point that has x, y and z-depth (so that nearer hairs are drawn over farther ones)..

    for( i = 0; i < hairs_count; i++ ) { draw_hair( i ); }

    Hair points are generated while the hair is being drawn, and then disappear..
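
    A minimal sketch of that "draw and forget" idea, assuming a simple per-pixel depth test. This is not Sasquatch's actual code; `Frame`, `frame_clear` and `plot_hair_sample` are hypothetical names, and a single grey value stands in for color.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define FRAME_W 4
#define FRAME_H 4

typedef struct {
    float color[FRAME_W * FRAME_H];   /* one grey value per pixel */
    float depth[FRAME_W * FRAME_H];   /* distance of the nearest thing drawn so far */
} Frame;

/* Reset the frame to black with nothing drawn at any depth. */
void frame_clear(Frame *f)
{
    assert(f != NULL);
    for (size_t i = 0; i < (size_t)(FRAME_W * FRAME_H); i++) {
        f->color[i] = 0.0f;
        f->depth[i] = FLT_MAX;
    }
}

/* Plot one hair sample on top of the existing image. It only lands if
   nothing nearer is already stored at that pixel. Returns 1 if drawn. */
int plot_hair_sample(Frame *f, int x, int y, float z, float grey)
{
    assert(f != NULL);
    if (x < 0 || x >= FRAME_W || y < 0 || y >= FRAME_H) return 0;
    size_t i = (size_t)y * FRAME_W + (size_t)x;
    if (z >= f->depth[i]) return 0;   /* hidden behind earlier content */
    f->depth[i] = z;
    f->color[i] = grey;
    return 1;
}
```

    The only memory that persists is the frame itself; each sample is thrown away as soon as it has been tested against the depth value, which is why the hair count does not drive memory use.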
    Last edited by Sensei; 10-02-2012 at 05:41 AM.

  13. #13
    Heffalump
    Yes, I know, but I wish the FFX pixel filter mode was just as efficient, but it doesn't appear to be.

  14. #14
    Almost newbie Cageman's Avatar
    Hi guys,

    Sorry for the late response... but I think I need to clarify what I meant by "Build a jungle and let's see which one is faster...."

    So, what am I talking about when I say that, in my experience, LW's native instancer is faster to render? Well, first of all, Hieron, you are absolutely correct that HDI does not require any translation time, especially in comparison to native when one is using Surface placement. In a sense, HDI is its own closed system that only feeds LW exactly what it needs in order to render. That, and the fact that it turns everything into Volumetrics, makes things pretty nice and dandy and fast. This is super efficient for situations where you don't need multiple bounces or a lot of rays to make a clean GI render, but you will get diminishing returns from HDI when you start to crank up the quality of GI. Another thing I have noticed is that when using Points mode and reaching up to 4 million instances, HDI starts to have much longer pre-computing times compared to native. In any case, I would argue that GI is the main culprit, and a showstopper when you want more than one bounce....

    Since I say that HDI is slower, let's take a look at that. In the following images, native is on the left and HDI is on the right. Let's use something simple: a cube instanced onto a ground mesh, without any difficult shaders or other things that wouldn't work with HDI.



    This is a 6-bounce Interpolated MC, with RPE 512 and SBR 512; the rest of the GI settings are default. Obviously quite overkill, but it definitely tells a different story about the efficiency of GI regarding Volumetric objects vs true polygonal objects.

    Let's get more sensible and lower the settings to something more "down to earth": Interpolated MC, 2 bounces, RPE 256, SBR 64.



    Still, it seems that native is much faster. But I'm not too happy about these tests: they are not really scientific, because I do not know the exact number of instances used in HDI. I do know that in native I have 400,000 instances. So... let's move on to some more exact comparisons. Let's use the points of the ground object. I will stick to the same GI settings.



    OK... what happened there? Native instances are 3 times faster? How can that be? Ah... of course, it is just 20,402 instances... I upped the display and render subd level of the ground object, reaching 1,012,210 points... so... about 1 million instances.



    This is an interesting result. Native increased the render time by 23 seconds, and HDI only by 8. Let's see what happens if I up the number of instances to 4,014,202 without changing any scale.



    Native stayed fixed, while HDI again got bitten by the Volumetric GI slowness. As a final test with GI, lets see what happens if I scale down the cubes so that they are not intersecting.



    Ah, as I suspected... the conclusion I can draw from this is that HD instances suffer greatly when large areas of the instances are receiving GI. In the previous comparison, HDI was almost ridiculously slow compared to native; in this last one it is more evened out, even if native is still about 2x faster.

    Let's just render without GI now.



    Nope... still slower... for some reason, with this number of instances, HDI seems to have a longer "pre-computing pass" compared to native. That said, when using Surface mode, HDI has less pre-computing (almost none, actually). But, as is evident, GI is one of the more important factors, for me at least, that makes LW's native instancer the faster choice, especially when you want to use more than one bounce.

    I hope this clears things up. As for room for improvement regarding instancing... oh yes... absolutely. I can also understand that the higher level of control with LW's native instances (I mean, through nodes and whatnot) will sacrifice speed to a certain degree, something that Sensei touched on in his post. It is, simply put, a more expandable implementation as it is, and I suspect we will see some really cool things in future versions of LW, speed improvements included.

    Cheers!

  15. #15
    TrueArt Support
    "That, and the fact that it turns everything into Volumetrics"
    Did you read the Volumetrics part of the LWSDK?

    What is a volumetric?
    It's a plugin which receives the ray origin and ray direction, plus two floats, the near and far clipping planes, as parameters. Clip near is 0 or close to it in most situations. Clip far is usually infinity.
    It has to return the distance to the intersecting geometry closest to the ray origin (between the near and far clips), and the color and alpha at that spot..
    It can STOP evaluation if it returns alpha=1.0 (color is final, completely opaque)..
    (Volumetric HyperVoxels don't stop evaluation, but that's an implementation choice made by the programmer..)
    It absolutely doesn't differ from regular ray tracing, which finds the triangle a ray intersects..
    The renderer can even use the same routine internally for a regular triangle mesh.. A mesh can be implemented as (one of the) volumetric plugins in a ray-tracing render engine, without any speed impact.
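
    The contract described above can be sketched as follows. This is not the LWSDK interface; `VolumeSample`, `eval_ground_plane` and `can_stop` are invented for illustration, the "geometry" is just a flat plane at a given height, and the ray direction is assumed to be normalized so that the returned value is a true distance.

```c
#include <assert.h>

typedef struct { double x, y, z; } Vec3;

typedef struct {
    int    hit;        /* 1 if something was found between near and far */
    double distance;   /* distance from the ray origin to the hit */
    double color[3];
    double alpha;      /* 1.0 means opaque, so evaluation can stop */
} VolumeSample;

/* Hypothetical evaluator for a flat ground plane at y = height.
   Returns the nearest hit between near_clip and far_clip, if any. */
VolumeSample eval_ground_plane(Vec3 origin, Vec3 dir,
                               double near_clip, double far_clip, double height)
{
    VolumeSample s = { 0, 0.0, { 0.0, 0.0, 0.0 }, 0.0 };
    assert(near_clip <= far_clip);
    if (dir.y == 0.0) return s;                    /* ray parallel to the plane */
    double t = (height - origin.y) / dir.y;
    if (t < near_clip || t > far_clip) return s;   /* outside the clip range */
    s.hit = 1;
    s.distance = t;
    s.color[0] = 0.2; s.color[1] = 0.6; s.color[2] = 0.2;
    s.alpha = 1.0;
    return s;
}

/* Caller-side rule from the post: an opaque sample ends evaluation. */
int can_stop(const VolumeSample *s)
{
    return s->hit && s->alpha >= 1.0;
}
```

    Seen this way, the evaluator is just another ray-intersection routine with a clip range, which is the point being made: nothing about the interface forces geometry to be converted into a different kind of data.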

    Volumetric is not a type of data.. there is no "conversion", as somebody could understand from your post..
    You probably mixed it up with voxels:
    http://en.wikipedia.org/wiki/Voxel
    where there is a grid of evenly placed data, and it's nicely blended together inside the volumetric.
    Last edited by Sensei; 10-04-2012 at 08:56 AM.
