Results 1 to 15 of 15

Thread: NDI output not as smooth as possible

  1. #1
    Registered User
    Join Date
    Aug 2019
    Location
    X
    Posts
    11

    NDI output not as smooth as possible

    Hello,

    I am developing an application that renders smooth graphics and animations on a desktop monitor. I am capturing these graphics back via a Windows 10 screen capture API and attempting to send the resulting frames over NDI. It works pretty well, but I cannot seem to achieve absolutely smooth, high-framerate playback over NDI, and I can't figure out where the bottleneck is or what I'm doing wrong.

    In short, this is the flow of my application:
    1. Capture the screen via a Windows API. This gives me a texture on the GPU.
    2. Copy the memory to a staging texture where the CPU can access it --> this gives me a memory pointer for where the pixel data is located
    3. Send the memory pointer to an NDI sender which sends the frame.


    The screen capture API, as well as copying the texture to the CPU, is more than fast enough to reach 144+ FPS (I have timed it). There is less than 1 ms of delay between receiving subsequent frames, which includes all my processing.
    The capture API also gives me a timestamp in the form of a QueryPerformanceCounter, which I am trying to send along via NDI to get a better time syncing, but that doesn't seem to help at all.

    When I send the frames over NDI, I get animations that seem "not quite" 60 FPS. There are also occasional artifacts, depending on which of my two monitors I use to render the original image. One of my monitors is 60 Hz, the other is 144 Hz.

    I captured two videos in slow motion with my phone. The issue is visible in real time too, but I wanted to make it much clearer. The top animation is my NDI output, visualized in the NDI Studio Monitor. The bottom one is the "original" image which my app is capturing. You can see that the original image moves nice and smooth, but the NDI image has two kinds of problems:
    1. https://www.youtube.com/watch?v=jvK2DkgKs7E - In the first video I am capturing my 60 Hz screen. The NDI output is clearly "jumping" and not as smooth as the original. This results in an annoyingly stuttering image.
    2. https://www.youtube.com/watch?v=MhD9nX0VPss - In the second video I am capturing my 144 Hz screen. Now the output is no longer jumping as much, but it shows horizontal streaks, as if subsequent rows in the data are not synced properly. This is also very visible in real time.


    My app is in C# but hopefully the relevant code is still pretty clear. I tried to remove all of the things I thought were unnecessary.


    Part 1 is where I receive my texture via the screen capture API, convert it to a CPU texture and send the memory pointer to NDI. I also send the size and timestamp.
    Code:
                // Copy GPU to CPU memory
                d3dDevice.ImmediateContext.CopyResource(texture, copy);
                d3dDevice.ImmediateContext.MapSubresource(copy, 0, 0, MapMode.Read, MapFlags.None, out DataStream stream);
                d3dDevice.ImmediateContext.UnmapSubresource(copy, 0);
    
                // Send to NDI
                var memory = stream.DataPointer;
                var time = frame.SystemRelativeTime.Ticks;
                _sender.SendData(memory, time, copy.Description.Width, copy.Description.Height);
    Part 2 is the SendData method where I create a frame and send it:
    Code:
            public void SendData(IntPtr data, long time, int width, int height)
            {
                int xres = width; 
                int yres = height; 
    
                stride = (xres * 32 + 7) / 8;
                bufferSize = yres * stride;
                aspectRatio = xres / (float)yres;
    
                NDIlib.video_frame_v2_t frame = new NDIlib.video_frame_v2_t()
                {
                    xres = xres,
                    yres = yres,
                    FourCC = NDIlib.FourCC_type_e.FourCC_type_BGRA,
                    frame_rate_N = Framerate,
                    frame_rate_D = 1,
                    picture_aspect_ratio = aspectRatio,
                frame_format_type = NDIlib.frame_format_type_e.frame_format_type_progressive,
                    timecode = time, 
                    p_data = data,
                    line_stride_in_bytes = stride,
                    p_metadata = IntPtr.Zero,
                    timestamp = 0
                };
    
                NDIlib.send_send_video_async_v2(sendInstancePtr, ref frame);
            }
    Framerate is typically 60 or 144 but it does not seem to make a big difference to the output in Studio Monitor.

    What can I do to improve the smoothness of the NDI playback? I think there is no bottleneck in the screen capture; when I time how many frames I am sending to NDI, I easily reach ~143 FPS on average.

    Could it perhaps be an issue with playback in Studio Monitor? I don't have any actual hardware to test it on.

    Thanks for any suggestions!

  2. #2
    Registered User
    Join Date
    Nov 2019
    Location
    Canada
    Posts
    11
    When you're filling out your sender settings instance, are you setting the clock_video parameter to true? If so, have you tried setting it to false to see if that makes a difference in framerate?

    //Sender setup
    senderSettings.clock_video = false;
    senderSettings.clock_audio = false;
    senderSettings.p_groups = nullptr;
    senderSettings.p_ndi_name = "Test Sender";

    sender = NDIlib_send_create(&senderSettings);
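
    For reference, a C# equivalent might look like this (field names are assumed from the NDI SDK's C# wrapper examples; treat it as a sketch rather than tested code):

```csharp
// Hypothetical C# sender setup, mirroring the C++ snippet above.
// UTF.StringToUtf8 is the string helper used in the SDK's C# examples.
var createSettings = new NDIlib.send_create_t()
{
    p_ndi_name = UTF.StringToUtf8("Test Sender"),
    p_groups = IntPtr.Zero,
    clock_video = false, // let your own loop pace the frames
    clock_audio = false
};

IntPtr sendInstancePtr = NDIlib.send_create(ref createSettings);
```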

  3. #3
    Registered User
    Join Date
    Aug 2019
    Location
    X
    Posts
    11
    Thanks, I had it set to true, yes. It does not seem to make any difference whether I use true or false.

    I also tried experimenting with every combination of:
    - Using "send_send_video_async_v2" vs "send_send_video_v2"
    - Using a timecode or using the default "send_timecode_synthesize"
    - Many different framerates.

    I found some moderate success by setting the desired framerate very high (e.g. 250+). In some situations (I don't know exactly what causes it), the problem goes away and the NDI stream looks very smooth. Then a second later it goes back to the same issue. It may be related to my dual-monitor setup; it seems to matter whether I move my mouse (or click something) on the 60 Hz monitor. Tomorrow I'll experiment with a single monitor instead.

    I'm afraid I'm just running into some of my own hardware issues / bottlenecks and this whole thing isn't a problem at all if I could just use an actual NDI setup...

  4. #4
    Registered User
    Join Date
    Nov 2019
    Location
    Canada
    Posts
    11
    Looking at your videos it would make sense if it's related to the monitor. The 144 Hz video artifacts look totally different. Curious to see how your tests go.

  5. #5
    Registered User
    Join Date
    Aug 2019
    Location
    X
    Posts
    11
    After experimenting more, I still have not solved this issue.

    I have changed my code back to be more similar to the C# WPF example. In my new setup, every time a new frame comes in, it is simply added to a collection. On another thread, a continuous loop constantly pulls frames from that collection and sends them to NDI, using the sync version (not async), so it blocks just long enough to reach the desired framerate. If the collection has more than 1 frame, I discard the others (rendering is ahead of the desired framerate).

    This gives me a very smooth stream in the NDI Studio Monitor, but I still have the same issue. I think it may be some kind of tearing issue, where frames are somehow being sent out of order?! I have no idea what to do from here. The problem goes away completely when I use the Field0 or Field1 frame type, but that is not desirable as I lose half my pixels that way!

    Here is another video of the problem. Sorry it is a bit confusing but this is the only way I can show the problem clearly. The video shows 3 blue bars shrinking and expanding:
    https://www.youtube.com/watch?v=c5Bk...ature=youtu.be
    1. The middle bar is the "real" graphics (drawn by a WPF application) that I am capturing with the windows capture API.
    2. The top bar is the captured graphics, directly copied to another GPU texture. There is no NDI involved here; it is simply a copy, and it behaves identically to the real graphics. The only difference is the scale (the playback window is slightly smaller).
    3. The lower bar is my NDI stream viewed in Studio Monitor, and this shows the issue: the right side of the bar is not a straight line but jagged.

    Screenshot of the problem more clearly, see the right side:
    https://imgur.com/cQYEtyN


    Here is my updated code:
    Code:
    // Collection to hold frames
    private BlockingCollection<NDIlib.video_frame_v2_t> _frameQueue;
    
    public void SendData(IntPtr data, long time, int width, int height)
    {
    	var stride = (width * 32 + 7) / 8;
    	var bufferSize = height * stride;
    	var aspectRatio = width / (float)height;
    
    	// Create frame
    	NDIlib.video_frame_v2_t frame = new NDIlib.video_frame_v2_t()
    	{
    		xres = width,
    		yres = height,
    		FourCC = NDIlib.FourCC_type_e.FourCC_type_BGRA,
    		frame_rate_N = Framerate,
    		frame_rate_D = 1,
    		picture_aspect_ratio = aspectRatio,
    		frame_format_type = GetFrameType(),
    		timecode = NDIlib.send_timecode_synthesize,
    		p_data = data,
    		line_stride_in_bytes = stride,
    		p_metadata = IntPtr.Zero,
    		timestamp = 0
    	};
    
    	// Add to list
    	_frameQueue.Add(frame);
    }
    
    private void FrameQueueHandler()
    {
    	while (!_exitThread)
    	{
    		if (_frameQueue.TryTake(out var frame, 25))
    		{
    			// Drop frames if UI is ahead of NDI framerate
    			while (_frameQueue.Count > 1)
    				_frameQueue.Take();
    
    			// Submit frame
    			// This blocks for the correct amount of time to reach the desired framerate
    			NDIlib.send_send_video_v2(_sendInstancePtr, ref frame);
    		}
    	}
    }
    SendData is called at pretty much exactly 144 frames per second, from the Windows capture API. Note that I am not freeing the memory of each frame, because that is handled by the capture API directly (if I try to free this memory, I get access violations).

    I have no clue what I can still do to improve the situation. Any help would be highly appreciated...

  6. #6
    LightWave Engineer Jarno's Avatar
    Join Date
    Aug 2003
    Location
    New Zealand
    Posts
    618
    How are you managing the frame video data? How do you guarantee that the data pointed to by "data" remains valid and correct while the frame is in the queue until it has been sent?
    Could it be that the copy from GPU to CPU memory hasn't finished yet by the time NDI starts sending it?

    ---JvdL---

  7. #7
    Registered User
    Join Date
    Aug 2019
    Location
    X
    Posts
    11
    Thanks Jarno, I think you have indeed touched on the root cause of this issue. I found it myself just before your post. The issue happens because I directly use the memory that I obtain from the screen capture and send it along to NDI. There is indeed no guarantee that the data is still correct by the time NDI gets to it. I don't think NDI can start sending before the data is copied, but it could possibly happen that the memory is changed before or during the NDI send call.

    I can get around the problem by manually copying the data to another location and sending the pointer of that copy, e.g. using Marshal.AllocHGlobal, then Buffer.MemoryCopy. Then I also have to free that memory again for every frame. It works, but it adds a couple of milliseconds of overhead and makes the stream less smooth. I'm still reaching around 130 fps according to my own timing, but it does not look close to that. Maybe some frame timing issues now...
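
    A minimal sketch of that copy-and-send flow (illustrative names; note that with send_send_video_async_v2 the SDK documents that the buffer must remain valid until the next async send call returns, so the free has to be deferred accordingly):

```csharp
// Copy the captured pixels so NDI owns a stable buffer.
// bufferSize = height * stride, as computed in SendData.
IntPtr copyPtr = Marshal.AllocHGlobal(bufferSize);
unsafe
{
    Buffer.MemoryCopy((void*)stream.DataPointer, (void*)copyPtr,
                      bufferSize, bufferSize);
}

// ... set copyPtr as p_data on the video frame and send it ...

// Only after the frame is truly done (for the async variant: after the
// NEXT send call returns), release the copy:
Marshal.FreeHGlobal(copyPtr);
```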

    Related question: how exactly am I supposed to use the 'timecode' if I want to set my own frame timings rather than let NDI generate them? I get a timestamp from the screen capture and want to use that to set the frame timecode. But the timestamp I get is something like the time since the last system boot, while NDI seems to require time since the Unix epoch? It is not fully clear to me.

  8. #8
    Registered User
    Join Date
    Aug 2015
    Location
    london
    Posts
    266
    Quote Originally Posted by Nick89 View Post
    Thanks Jarno, I think you have indeed touched on the root cause of this issue. I found it myself just before your post. The issue happens because I directly use the memory that I obtain from the screen capture and send it along to NDI. There is indeed no guarantee that the data is still correct by the time NDI gets to it. I don't think NDI can start sending before the data is copied, but it could possibly happen that the memory is changed before or during the NDI send call.

    I can get around the problem by manually copying the data to another location and sending the pointer of that copy, e.g. using Marshal.AllocHGlobal, then Buffer.MemoryCopy. Then I also have to free that memory again for every frame. It works, but it adds a couple of milliseconds of overhead and makes the stream less smooth. I'm still reaching around 130 fps according to my own timing, but it does not look close to that. Maybe some frame timing issues now...

    Related question: how exactly am I supposed to use the 'timecode' if I want to set my own frame timings rather than let NDI generate them? I get a timestamp from the screen capture and want to use that to set the frame timecode. But the timestamp I get is something like the time since the last system boot, while NDI seems to require time since the Unix epoch? It is not fully clear to me.
    timestamp and timecode are 2 different entities.

    in general, leave timestamp alone and NDI will take care of that, but you can put *whatever time you want* in timecode - since it's a user/application-defined field. For good compatibility with other systems, this is designed to be a time-based value using 10,000,000 units per second since the Unix epoch. It doesn't have to represent 'now' - it can represent any timecode that is appropriate for your application.

    it may technically be possible to override the NDI timestamps but that should be attempted very cautiously.
    Last edited by livepad; 02-24-2020 at 03:58 AM.

  9. #9
    Registered User
    Join Date
    Aug 2019
    Location
    X
    Posts
    11
    Quote Originally Posted by livepad View Post
    timestamp and timecode are 2 different entities.

    in general, leave timestamp alone and NDI will take care of that, but you can put *whatever time you want* in timecode - since it's a user/application-defined field. For good compatibility with other systems, this is designed to be a time-based value using 10,000,000 units per second since the Unix epoch. It doesn't have to represent 'now' - it can represent any timecode that is appropriate for your application.
    Now I'm confused. I understood the following:
    - timecode: set to int64.MaxValue to let NDI take care of the frame timing. Set to another value to determine your own frame timing. My question was: what value should I use? You say 10,000,000 units per second since the epoch, so I will try that.
    - timestamp: not used to determine frame timings.

    In my screen capture I also get a timestamp, which is probably more accurate than letting NDI determine the time (because the screen capture knows exactly to which time it belongs, rather than NDI which only knows at what time I sent it). So I am hoping to use that time to set more accurate frametimings. My main question was what the 'format' of the time should be.

    I think "10,000,000 units per second" is what .NET calls a tick, so I should be able to use that, thanks. I think I did already try it though and it didn't seem to make a difference. I'll experiment with it more.
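
    As a sketch, converting the boot-relative capture timestamp into Unix-epoch ticks could look like this (DateTimeOffset.UnixEpoch requires .NET Core 2.1+; names are illustrative):

```csharp
// SystemRelativeTime is 100 ns ticks since boot; the timecode format
// described above is 100 ns ticks since the Unix epoch.
long frameBootTicks = frame.SystemRelativeTime.Ticks;

// Current time in both clock domains (Stopwatch is QPC-based):
long nowBootTicks = (long)(Stopwatch.GetTimestamp() *
                           (10_000_000.0 / Stopwatch.Frequency));
long nowUnixTicks = DateTimeOffset.UtcNow.Ticks - DateTimeOffset.UnixEpoch.Ticks;

// Shift the frame's age from the boot clock onto the Unix-epoch clock:
long timecode = nowUnixTicks - (nowBootTicks - frameBootTicks);
```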

    Do I have to turn off "clock to video" for this to work at all?

  10. #10
    LightWave Engineer Jarno's Avatar
    Join Date
    Aug 2003
    Location
    New Zealand
    Posts
    618
    In the GPU to CPU memory copying, why is the resource mapped and then immediately unmapped? Unmapping will invalidate the "stream" subresource, so you can't rely on the memory contents afterwards.

    ---JvdL---

  11. #11
    Registered User
    Join Date
    Aug 2019
    Location
    X
    Posts
    11
    Quote Originally Posted by Jarno View Post
    In the GPU to CPU memory copying, why is the resource mapped and then immediately unmapped? Unmapping will invalidate the "stream" subresource, so you can't rely on the memory contents afterwards.

    ---JvdL---
    Yes, that was indeed one of the issues. I don't know why I didn't spot that before. The other issue is that I was reusing the "copy" staging texture instead of creating a new one every frame. Reusing it was recommended to me recently for performance, but creating a new one doesn't hurt performance and fixes the issue. I just have to keep better track of it in that case and unmap + dispose it only after the frame has been sent.
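
    The per-frame staging texture lifecycle might look roughly like this (SharpDX names; a sketch of what I described, not a drop-in):

```csharp
// Create a fresh staging texture for every captured frame so the mapped
// memory stays valid until the frame has actually been sent.
var desc = texture.Description;
desc.Usage = ResourceUsage.Staging;
desc.CpuAccessFlags = CpuAccessFlags.Read;
desc.BindFlags = BindFlags.None;
desc.OptionFlags = ResourceOptionFlags.None;

var staging = new Texture2D(d3dDevice, desc);
d3dDevice.ImmediateContext.CopyResource(texture, staging);

// Map it and keep it mapped while the frame is queued for NDI. Note that
// the returned RowPitch can differ from width * 4 and may be the right
// value for line_stride_in_bytes.
var box = d3dDevice.ImmediateContext.MapSubresource(
    staging, 0, MapMode.Read, MapFlags.None, out DataStream stream);

// ... send stream.DataPointer (stride = box.RowPitch) over NDI ...

// Only after the send has completed:
d3dDevice.ImmediateContext.UnmapSubresource(staging, 0);
staging.Dispose();
```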

    Now I am able to stream absolutely smooth at 144fps and it looks indistinguishable from real in terms of animations. However two more issues remain:

    1. At a 144 fps target frame rate, the visual quality suffers a lot; it seems the graphics are compressed rather heavily. I suppose this is a built-in NDI feature to maintain performance. Is there any way to turn off this behavior and stream at maximum quality, and just let me deal with the resulting performance? At least to test what framerate I can reach without this compression?

    2. If I turn it down to 60 fps, the visual quality is better, but now the animations are not as smooth. They look quite stuttery for some reason. I don't get why, as I can stream at 144 fps fine, but 60 looks more like 30... I still suspect frame timings, but nothing I do to the timecode or timestamp makes any difference. Even if I put random values in every frame (just to see what would happen), it looks the same. It simply does not do anything... Do I need to do more than just set the timecode?

  12. #12
    Registered User
    Join Date
    Aug 2015
    Location
    london
    Posts
    266
    Quote Originally Posted by Nick89 View Post
    Now I'm confused. I understood the following:
    - timecode: set to int64.MaxValue to let NDI take care of the frame timing. Set to another value to determine your own frame timing. My question was: what value should I use? You say 10,000,000 units per second since the epoch, so I will try that.
    - timestamp: not used to determine frame timings.

    In my screen capture I also get a timestamp, which is probably more accurate than letting NDI determine the time (because the screen capture knows exactly to which time it belongs, rather than NDI which only knows at what time I sent it). So I am hoping to use that time to set more accurate frametimings. My main question was what the 'format' of the time should be.

    I think "10,000,000 units per second" is what .NET calls a tick, so I should be able to use that, thanks. I think I did already try it though and it didn't seem to make a difference. I'll experiment with it more.

    Do I have to turn off "clock to video" for this to work at all?
    Timecode and Timestamps are essentially just metadata carried by NDI. What they are used for is application-specific. So, just changing timecode or timestamps will do absolutely nothing unless something downstream is actually planning to use that metadata for something.

    On the whole, Timestamps are useful for audio video synchronisation - within one stream if necessary and between multiple streams, if derived from the same clock source. I suspect that most simple NDI Monitors are ignoring both timestamp and timecode and displaying content 'as it arrives' - this is the mechanism used for lowest latency. Some Monitors may have a cache to smooth out display in the case of network jitter / lumpiness, and this type of function might use timestamps, or it might be completely ignoring them and reclocking based on a fixed frame rate.

    Timecode is generally used for functionality aligned to traditional timecode usage, like time of day. Some NDI systems may translate to and from traditional SMPTE timecode and NDI timecode metadata, but in general NDI timecode won't play much of a role in precision timing (even though it has the same 64-bit accuracy as the timestamp) - due to its intended purpose (which is open to interpretation).

    In summary, I suspect you may be using tools to judge your own code which are not necessarily representing what you are doing transparently. You could be making a perfect 144fps stream with mathematically perfect timestamps, but if the tool you use to 'test' it is more interested in the pacing of your NDI packets rather than the timestamps - you are likely to be confused.

  13. #13
    Registered User
    Join Date
    Aug 2019
    Location
    X
    Posts
    11
    Quote Originally Posted by livepad View Post
    Timecode and Timestamps are essentially just metadata carried by NDI. What they are used for is application-specific. So, just changing timecode or timestamps will do absolutely nothing unless something downstream is actually planning to use that metadata for something.

    On the whole, Timestamps are useful for audio video synchronisation - within one stream if necessary and between multiple streams, if derived from the same clock source. I suspect that most simple NDI Monitors are ignoring both timestamp and timecode and displaying content 'as it arrives' - this is the mechanism used for lowest latency. Some Monitors may have a cache to smooth out display in the case of network jitter / lumpiness, and this type of function might use timestamps, or it might be completely ignoring them and reclocking based on a fixed frame rate.

    Timecode is generally used for functionality aligned to traditional timecode usage, like time of day. Some NDI systems may translate to and from traditional SMPTE timecode and NDI timecode metadata, but in general NDI timecode won't play much of a role in precision timing (even though it has the same 64-bit accuracy as the timestamp) - due to its intended purpose (which is open to interpretation).

    In summary, I suspect you may be using tools to judge your own code which are not necessarily representing what you are doing transparently. You could be making a perfect 144fps stream with mathematically perfect timestamps, but if the tool you use to 'test' it is more interested in the pacing of your NDI packets rather than the timestamps - you are likely to be confused.
    Thanks, that makes sense. This is kind of what I meant when I said "perhaps I am running into bottlenecks with my own tooling".

    I am currently using the Studio Monitor to judge how smooth the stream looks. I am hoping that Studio Monitor is a good way to do that but if it indeed completely ignores timestamps and just goes by packet arrival then I will never see it. On the other end, the intended purpose for this application is to stream to a Tricaster. I don't have one to test with though. Perhaps the Tricaster is capable of doing "better" than Studio Monitor? I am sending test builds of the application to whoever is going to be using the Tricaster but feedback is slow this way and I cannot judge very well what is improving and what isn't.

    Is there a better test vehicle than Studio Monitor available?

  14. #14
    LightWave Engineer Jarno's Avatar
    Join Date
    Aug 2003
    Location
    New Zealand
    Posts
    618
    If Studio Monitor is playing a stream with a framerate not equal to your monitor (or a whole fraction thereof) then it can appear to be not smooth.

    You could try the NDI Analysis tool to look at the statistics for the NDI stream. There you should be able to verify that the frames arrive at the intended framerate (plus or minus a small amount depending on network effects, but it should average out to the framerate over the long term, with no gaps between frames longer than twice the frame interval). It should also give you an indication of what the bitrate is.

    ---JvdL---

  15. #15
    Registered User
    Join Date
    Aug 2015
    Location
    london
    Posts
    266
    Quote Originally Posted by Nick89 View Post
    Thanks, that makes sense. This is kind of what I meant when I said "perhaps I am running into bottlenecks with my own tooling".

    I am currently using the Studio Monitor to judge how smooth the stream looks. I am hoping that Studio Monitor is a good way to do that but if it indeed completely ignores timestamps and just goes by packet arrival then I will never see it. On the other end, the intended purpose for this application is to stream to a Tricaster. I don't have one to test with though. Perhaps the Tricaster is capable of doing "better" than Studio Monitor? I am sending test builds of the application to whoever is going to be using the Tricaster but feedback is slow this way and I cannot judge very well what is improving and what isn't.

    Is there a better test vehicle than Studio Monitor available?
    A TriCaster isn't exactly going to welcome a 144fps stream any more than any other broadcast device would. It will probably frame-sync that stream to its native frame rate anyway.
    If you want more control over what ultimately gets processed, you may want to do processing in your own code to output a 59.94 or whatever frame rate stream.
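
    A simple way to do that in the sender is to decimate by timestamp before the NDI call; a sketch (illustrative names, here targeting 59.94 fps):

```csharp
// Forward a captured frame only when the next 59.94 fps output slot is due.
private long _nextDueTicks;
private const long OutputIntervalTicks =
    (long)(10_000_000.0 * 1001.0 / 60000.0); // ~166,833 ticks per frame

private bool ShouldSend(long frameTicks) // capture timestamp, 100 ns units
{
    if (frameTicks < _nextDueTicks)
        return false; // drop: too early for the next output frame

    _nextDueTicks = frameTicks + OutputIntervalTicks;
    return true;
}
```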

    The suggestion of using the NDI Analyser is a good one to tell you technically whether your stream is as you hope - although as I said - interpretation of that is ultimately down to the end consuming device.
