Thread: Request : IPP2 Support for Red Rocket-X

  1. #11  
    Senior Member Blair S. Paulsen
    Join Date
    Dec 2006
    Location
    San Diego, CA
    Posts
    5,054
    Dear RED, please continue to devote coding resources to the RR-X PCIe card. Its ability to free up CPU resources for other tasks still has value. Thank you for your consideration.

    Cheers - #19

  2. #12  
    Senior Member
    Join Date
    Jun 2017
    Posts
    1,632
    Quote Originally Posted by Michael Lindsay
    RED's initial greatest asset 'Redcode' is still going strong but due to unfortunately only modest increase in CPU (but massive bandwidth and GPU advances) over the last 10 years it is not the asset I suspect Red had hoped for...

    My hope is that Red are (behind the scenes) spending serious development resources on improvements to r3d decoding speeds to make r3ds the wonder feature it was..
    Over the last 10 years you might be right, but in the last 2 years CPU speed has increased dramatically.

    - Q2 2016: Intel i7-6950X, 10 cores (CB R15 MC 1792 points), $1700
    - Q3 2018: AMD TR2, 32 cores (CB R15 MC 6200 points), $1700
    - Q2 2018: AMD R7 2700X (CB R15 MC 1817 points), $330

    That is 3.5 times more speed in a bit more than 2 years for around the same price, or the same speed for 1/5th of the price.
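
The arithmetic behind that comparison can be checked with a short sketch. It uses only the Cinebench R15 multi-core scores and prices quoted in the post; the dictionary layout and function names are made up for illustration.

```python
# Cinebench R15 multi-core scores and prices exactly as quoted in the post.
cpus = {
    "i7-6950X (Q2 2016)": {"score": 1792, "price": 1700},
    "TR2 32-core (Q3 2018)": {"score": 6200, "price": 1700},
    "R7 2700X (Q2 2018)": {"score": 1817, "price": 330},
}

baseline = cpus["i7-6950X (Q2 2016)"]


def speedup(name):
    """Benchmark score relative to the 2016 baseline CPU."""
    return cpus[name]["score"] / baseline["score"]


def price_ratio(name):
    """Price relative to the 2016 baseline CPU."""
    return cpus[name]["price"] / baseline["price"]


for name in cpus:
    print(f"{name}: {speedup(name):.2f}x speed at {price_ratio(name):.2f}x price")
```

The 32-core part comes out at about 3.46x the baseline score for the same price, and the 2700X at about the same score for roughly 0.19x the price, which matches the "3.5 times" and "1/5th" figures.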

    The new TR2 should be able to handle any R3D up to 8K in realtime or faster.

    8k.R3D is great as it is.

  3. #13  
    Senior Member Antony Newman
    Join Date
    Mar 2012
    Location
    London, UK.
    Posts
    1,557
    Michael,

    Thank you for chiming in on this.

    In further testing, I have found:
    +) Resolve is about 5% slower processing IPP2 than LEGACY (8K VV footage : Debayer Quality=Half Res Good)
    +) Resolve Half_Res_Good with an RRX is considerably faster on a TRASHCAN: 24fps playback with the RRX vs 7fps without.

    Based on this - I agree with your view that the RRX is still doing the heavy lifting - and that the debayer is not the most costly part of processing (at Half Res Good).

    Quote Originally Posted by Michael Lindsay
    <snip> RRX is currently happy with the really hard part of a r3d decode (the de-wavelet).
    OTHER TESTING

    It seems that RESOLVE has some problems with certain IPP2 clips (I had not noticed this with LEGACY).

    When it struggles - the frame rate drops in Playback. In Delivery - those slowdowns are (sometimes) rendered as time reversal (or random data). Most problems seem to occur when transitioning between two 8K VV clips.

    Also : Recreating a similar transition in FCP-X shows no such slowdown.

    AJ
    Last edited by Antony Newman; 07-15-2018 at 08:45 AM.

  4. #14  
    Senior Member Jason Rivera
    Join Date
    Feb 2008
    Location
    Houston, TX
    Posts
    142
    Quote Originally Posted by Misha Engel
    The new TR2 should be able to handle any R3D up to 8K in realtime or faster.
    I've only had mine for about 3 or 4 days with little time to test it out, but yes, I can confirm a significant increase in performance over the TR1 2950x, and I have just recently been able to achieve 8K 8:1 @ full res in realtime. I consider realtime to be where the playhead never catches up to the frame buffer and neither the CPU nor the GPU utilization is maxed out. In the TR1 2950X configuration, I only had my 1080Ti plugged in, and the CPU was pegged at 100% in "Full" res while the GPU was barely doing any work (I don't remember utilization).

    With the TR2 2990WX (under optimal settings), neither the CPU nor the GPU ever really got above 70% utilization. I can get smooth, realtime playback @ "Full" res in Redcine-X Pro thus far, but it's still not quite as instantaneous/responsive as using 1/2 or 1/4 res and still not stutter free 100% of the time. It took a lot of tweaking to figure out what the best settings were, but as it stands, using a combination of three of my Nvidia 1080 cards in the following configuration, I can get smooth, realtime playback with moderate stress on both the CPU and GPUs while the playhead stays comfortably behind the frame buffer.

    GPU Configuration:
    1) 1080Ti (Slot 1 - x16)
    2) 1080 (Slot 2 - x8) - Dual Displays connected
    3) 1080 (Slot 3 - x16) - Dedicated PhysX GPU *** Important ***
    4) Decklink Mini Monitor 4K (Slot 4 - x8)

    Edit:

    My realtime Redcine-X Pro settings by resolution are:
    1) 8k Full res - Fairly Responsive and Fairly Smooth - 2 GPUs selected (2X x16); 36-40 frames processed simultaneously - currently playing a 1.5 minute clip in a loop over and over again as I write this. The frame buffer never fills up and keeps processing.
    2) 8k Full res - Fairly Responsive and Fairly Smooth - 3 GPUs selected (2X x16, 1X x8); 40-46 frames processed simultaneously
    3) 8k Half res - Fairly Responsive and Smooth - 2 GPUs selected; 24-100+ frames processed simultaneously
    4) 8K Quarter res - Super Responsive and Butter Smooth - 1 GPU selected; 14-100+ frames processed simultaneously

    Observations:
    1) The lower the resolution, the higher the number of frames I could process simultaneously and vice versa. I could also process fewer frames than the clip frame rate and still achieve realtime playback at 1/4 resolution and lower. Surprisingly, I could process 100+ frames at a time in 1/2 res.
    2) The more GPUs I used, the more frames I could process simultaneously and vice versa.
    3) The lower the number of frames I processed simultaneously (down to the clip frame rate at 1/2 and full res), the smoother the playback was overall and the more responsive it was when initially hitting the play button. Vice versa was also true.
    4) Processing any number of frames less than the clip frame rate resulted in less than realtime performance for 1/2 and full res.
    5) I think using my GPU in the x8 lane with any of the others dropped the overall performance of all the GPUs, but it still worked well enough to see an increase in processed frames even at full res.
    6) For my lower than Full res configurations, I could process 64 or more frames simultaneously with no problem as well as use more than one GPU with no problem. I only listed the minimum # of GPUs needed to achieve realtime smoothly. Responsiveness varied by configuration.
    7) The TR2 2990WX CPU could process up to 100 simultaneous frames with no problem using all lower resolutions and up to 64 at full res. The GPUs however could not process 64 frames smoothly enough at full res, but it didn't matter since I could still process fewer frames simultaneously without the playhead catching up to the frame buffer.

    The biggest factors affecting realtime performance that I've noticed are:

    1) Which Nvidia GPU is set as the dedicated PhysX GPU. This was the biggest killer of performance! If it is set to any card that is being used as a display, the playback will get choppy, even in lower resolutions. I would never use a single GPU setup with any resolution R3D file again now that I've seen the difference.

    2) The number of simultaneous frames to process. This has to be proportional to the hardware being used and the clip frame rate, compression, etc.

    3) Whether I export to my Decklink Mini Monitor or not. Latency is a problem. Essentially, my Decklink Card is useless in Redcine at anything higher than 1/4 res, but 1/4 res works perfectly. At higher than 1/4 res I get a delay between my computer monitor and my program monitor and the lag gets exponentially worse over time to the point where eventually the system itself slows down to a crawl and playback on all displays stops, but I can still pause the playback. Must be some kind of memory buffer problem thingy majiggy on the Decklink or the PCIe slot I have it in.

    What I learned is that for the best performance and overall system responsiveness during playback in Redcine-X, I want to process the fewest number of frames simultaneously and still allow the frame buffer to fill up fast enough to keep the playhead from ever catching up.
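
A minimal sketch of that trade-off, assuming a much-simplified model that is not how Redcine-X actually schedules work: the decoder finishes a batch of `batch` frames every `batch / decode_fps` seconds, playback starts once the first batch is ready, and the playhead then consumes frames at the clip frame rate. The function name and every parameter are hypothetical.

```python
def simulate_playback(decode_fps, clip_fps, batch, clip_frames):
    """Toy model: frames are decoded in batches and played back at clip_fps.

    Returns (startup_delay_seconds, underrun). underrun is True if the
    playhead ever needs a frame that has not been decoded yet, i.e. the
    playhead has caught up to the frame buffer.
    """
    batch_time = batch / decode_fps   # seconds needed to finish one batch
    startup_delay = batch_time        # playback waits for the first batch
    for frame in range(clip_frames):
        needed_at = startup_delay + frame / clip_fps
        # Frame `frame` sits in batch number frame // batch, which is
        # complete at (batch number + 1) * batch_time.
        ready_at = (frame // batch + 1) * batch_time
        if ready_at > needed_at + 1e-9:
            return startup_delay, True
    return startup_delay, False


# A 90 second clip at 24 fps, with a decoder that sustains 30 fps overall.
for batch in (24, 48, 96):
    delay, underrun = simulate_playback(30, 24, batch, 24 * 90)
    print(f"batch={batch}: startup delay {delay:.1f}s, underrun={underrun}")
```

In this model a larger batch never causes an underrun as long as the decoder is faster than the clip, but it lengthens the wait after pressing play, which is in line with the responsiveness observation above. The model treats decode throughput as independent of batch size; the observations above suggest that is not true in practice at 1/2 and full res.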

    I'd love to know if anyone else has had any luck getting realtime performance with the new TR2 2990WX. I'm comfortable using full res for long clips, but not all the time due to the initial responsiveness of the app at that resolution. I prefer the ultra responsiveness of the lower res settings. I think I'll stick to 1/2 res for most of my previews and 1/4 res when I'm running through clips rather quickly.
    Red Epic-W/Red Epic-MX
    Filmmaker, Software Technical Architect & Serial Entrepreneur

  5. #15  
    Senior Member
    Join Date
    Jun 2017
    Posts
    1,632
    Jason, some tricks that might help you get your workstation up to speed.

    - I don't know how many threads are at 100%, but I'm guessing only half of them. Try turning off SMT so that you have 32c/32t; this should get you close to 100% CPU utilization on all 32 cores, with lower latencies.
    - With SMT off, the Threadrippers can overclock higher with lower Vcore voltages and less heat.
    - Use at least 4 x 16 GB RAM sticks rated at least DDR4-2933 CL14 (Zen loves fast memory).
    - Tweak your memory: our 128 GB of DDR4-3200 CL14 runs at 2933 CL14 at 1.35 V and at 3200 CL14 at 1.55 V (DDR4 can easily handle the voltage). With 4x16 GB you don't have to tweak it most of the time (a friend of mine clocked the same memory, 4x16 GB, to 3200 CL12 at 1.55 V on an i9-7900).
    - Try to sell those two GTX 1080s and buy an extra GTX 1080 Ti ($650..700). The memory bandwidth of the 1080 is only 320 GB/s whereas the 1080 Ti has 484 GB/s, and memory bandwidth is a big limit on performance. I overclocked our (Vega) GPUs' memory from 484 GB/s (945 MHz) to 564 GB/s (1100 MHz) and that was a big help in Resolve. (I don't know if you can overclock the Ti's memory.)
    - Try to use a separate scratch disk (ours is overkill: 7 GB/s R/W, 4x 960 Pro 512 GB in RAID 0 on a HighPoint SSD72??? over 8 PCIe lanes). 2x 1 TB 970 Pros in RAID 0 should be great (5 GB/s seq. write and 7 GB/s seq. read), or buy cheaper ones if the budget doesn't stretch.
    - Use Linux instead of Windows; Linux handles all the threads better.
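
To put numbers on the memory figures in that list, here is a small sketch. It assumes the usual rule of thumb that CAS latency in nanoseconds is the CAS cycle count times the clock period (2000 divided by the data rate in MT/s), and that memory bandwidth scales linearly with memory clock. The function names are hypothetical.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """First-word CAS latency in nanoseconds.

    DDR memory transfers twice per clock, so the clock period in ns is
    2000 / data_rate (MT/s); multiply that by the CAS cycle count.
    """
    return cas_cycles * 2000.0 / data_rate_mts


def scaled_bandwidth(base_gbs, base_mhz, new_mhz):
    """Bandwidth after a memory overclock, assuming linear scaling with clock."""
    return base_gbs * new_mhz / base_mhz


# The three DDR4 settings mentioned in the post.
for rate, cl in [(2933, 14), (3200, 14), (3200, 12)]:
    print(f"DDR4-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")

# The GPU memory overclock from 945 MHz to 1100 MHz.
print(f"GPU memory at 1100 MHz: {scaled_bandwidth(484, 945, 1100):.0f} GB/s")
```

Under these assumptions the three DDR4 settings come out at roughly 9.55 ns, 8.75 ns and 7.50 ns, and the overclocked GPU memory at about 563 GB/s, in line with the 564 GB/s quoted.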

    No WS budget for us this year. I hope you can get it running at its full potential.

  6. #16  
    Senior Member Jason Rivera
    Join Date
    Feb 2008
    Location
    Houston, TX
    Posts
    142
    Quote Originally Posted by Misha Engel
    Jason, some tricks that might help you get your workstation up to speed. <snip>
    Sweet! Thanks for the feedback. I'll definitely play around with some of these suggestions to see what works for my system or not. I'm no expert in any of this, but I can get it done.

    - I'm definitely curious about the SMT setting. I believe I can only change that from the AMD Ryzen Master software as I don't recall seeing it in the BIOS. I was wondering why none of the auto overclocking software would clock better than 10% over base. The highest I can get is 3.3GHz on all cores.
    - Also, I do use 8x16GB 2666 memory, but only 96GB is showing up, so I believe a pair of my memory slots is not working. Need to test that out. I did notice by default the Asus ROG Zenith Extreme BIOS has the memory set to 2133MHz and I just set it to 2666MHz last night before running those tests above. I'll double check.
    - I've got two of the RTX 2080 Ti FEs coming next month, so this should help GPU performance. But hopefully this is all moot when the NLE software is updated in December to use the new Turing architecture and such.
    - I was wondering if using scratch disks was still a thing since the invention of RAID SSDs and them being so fast, but it wouldn't hurt. Overkill works for me. lol I'll try using a pair of those 960s separately from my media RAIDs.
    - I'm slowly using Linux at work for software development, so it's growing on me. May do a reinstall at some point and try it out.

    I'll let you know if I can get better performance with some of these suggestions. Thanks again!

  7. #17  
    Senior Member DJ Meyer
    Join Date
    Dec 2008
    Posts
    844
    Quote Originally Posted by Jason Rivera
    <snip> Also, I do use 8x16GB 2666 Memory, but only 96GB is showing up, so I believe a pair of my memory slots is not working. <snip>
    On the "memory not showing up" issue, this is a common occurrence with TR4 boards. The retention design of the socket is among the worst I've ever seen, so it's challenging to get perfectly even torque across the entire chip. If you do not, some of the memory controller pins do not connect, and some of your RAM vanishes. So before you send RAM back, try re-doing the CPU retention.
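
As a quick sanity check on the figures in this exchange (8 x 16 GB installed, 96 GB reported), the sketch below works out how many DIMMs' worth of capacity is missing. The function name is hypothetical. An eight-slot quad-channel board has two DIMMs per channel, so two missing DIMMs would be consistent with one of the four memory channels dropping out, though that is an inference and not something the thread confirms.

```python
def missing_dimms(installed_dimms, dimm_size_gb, reported_gb):
    """Number of DIMMs' worth of capacity the system is not seeing."""
    missing_gb = installed_dimms * dimm_size_gb - reported_gb
    return missing_gb // dimm_size_gb


# Figures from the thread: 8 x 16 GB installed, 96 GB reported.
print(missing_dimms(8, 16, 96))
```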

  8. #18  
    Senior Member Jason Rivera
    Join Date
    Feb 2008
    Location
    Houston, TX
    Posts
    142
    Quote Originally Posted by DJ Meyer
    On the "memory not showing up" issue, this is a common occurrence with TR4 boards. The retention design of the socket is among the worst I've ever seen, so it's challenging to get perfectly even torque across the entire chip. If you do not, some of the memory controller pins do not connect, and some of your RAM vanishes. So before you send RAM back, try re-doing the CPU retention.
    So good to know. The same thing happened with the TR 2950X. Memory disappeared and then suddenly after pulling them out and putting them back in, they reappeared, and over and over. I'll redo my thermal compound while I'm at it.

  9. #19  
    Senior Member
    Join Date
    Jun 2017
    Posts
    1,632
    Quote Originally Posted by Jason Rivera
    So good to know. The same thing happened with the TR 2950X. Memory disappeared and then suddenly after pulling them out and putting them back in, they reappeared, and over and over. I'll redo my thermal compound while I'm at it.
    I only have one ASRock Fatal1ty X399 Professional Gaming and haven't had any problems, so I don't know if this is a common problem. If the same thing happened with the TR 2950X, two things are possible: bad RAM or a bad motherboard. Normally ASUS boards are highly rated.

    If you want to know more about overclocking with your ASUS motherboard, have a look over here

    If you want to read more about overclocking the TR in general, have a look over here https://www.guru3d.com/articles_page...review,31.html, or here https://www.anandtech.com/show/13124...950x-review/13

    With Resolve, 4x16GB should be enough (with Fusion, 128 GB works a lot better).

  10. #20  
    Senior Member DJ Meyer
    Join Date
    Dec 2008
    Posts
    844
    Quote Originally Posted by Jason Rivera
    So good to know. The same thing happened with the TR 2950X. Memory disappeared and then suddenly after pulling them out and putting them back in, they reappeared, and over and over. I'll redo my thermal compound while I'm at it.
    The good news is that once you have it locked in well, it tends to be fine. But yeah, after having done around 30 Threadripper builds, it is something that happens.
