Workstation Benchmark Rankings



Jon Thomasberg
12-10-2011, 02:15 AM
[Set aside for future Rankings List info here]

Jon Thomasberg
12-10-2011, 02:16 AM
....also known as measuring-up, mine is bigger than yours, etc. I move that we all have a nice friendly competition for the fastest workstation. I will attempt to keep the Rankings List.

This will serve 2 purposes:
1) To have fun with it and see how your machine stacks up to others here on the REDUSER forum; and,
2) To perhaps shed some insight from those systems that perform better than others for anyone looking to make new hardware purchasing decisions.

I know there are other leaderboards of every system in the world based on one benchmark app. This one is just for the REDUSER community. Feel free to add to the suggestion box for benchmarks to be included.

I'll start with the system I just built for editing + color grading. I've been all Mac for the last 5 yrs, so this is my first WinPC in as many years:

i7-3930k (3.2GHz, overclocked stable to 5.0GHz), Corsair H100 closed-loop hydro CPU cooler
Asus P9X79 Deluxe motherboard
32 GB (4x8) Corsair Vengeance quad-channel DDR3-1866-CL9 RAM
***MSI n580GTX Lightning Extreme 3GB video card (on order; until it arrives next week, an old nVidia GeForce 7900GS w/ 256MB from 4 yrs ago is in there.)
No RedRocket
2x OCZ Vertex 3 240GB SSD (in RAID 0 config -- benching sustained 1000MB/sec) as OS drive
1x 500GB WD SATA2 7200RPM, until prices come back to reality for multiple WD RE4 2TBs
Pioneer BDR
Cooler Master HAF X case w/ 4x 200mm + 1x 140mm case fans + 2x 120mm fans for push-pull config on the H100 CPU radiator
Corsair AX1200 Gold PSU
OS = Windows7 Pro x64 (w/ all patches & ServicePacks up-to-date)

Benchmarks: (as of today, 2011/12/10)
Cinebench 11.5 = no OpenGL tested; multiCPU = 14.44
GeekBench2_x64Win = 28247

and of course -- pix or it didn't happen :)

https://lh4.googleusercontent.com/-LtOcYryW4ZQ/TuMutvEm82I/AAAAAAAADgk/iD0d6jW6sQ4/s1152/Benchmarks_DivaProd1.PNG

Bruce Allen
12-10-2011, 03:29 PM
NICE Cinebench score!

BTW Resolve Beta for Win is on Blackmagic's site! Looking forward to your numbers with it when you get your GPU, good man!

We need a ResolveBench...

Also, how loud is it and what is the power draw? You guys are tempting me...

Bruce Allen
www.boacinema.com

Matt Gottshalk
12-10-2011, 03:54 PM
Tagged.

Jon Thomasberg
12-10-2011, 04:02 PM
Bruce,

I haven't checked it on a decibel meter, but I can tell you it is VERY quiet. The large diameter fans make all the difference in the world. They move lots of CFM without sounding like the case will take off. Honestly, the Epic while in capture mode is about the same dB level. Epic in standby is significantly louder than this computer.

Also, while it was stable at relative load, after a few hours running Prime95 and AIDA Extreme to burn it in at 100% CPU, I decided to dial it back to 4.9 GHz. This changed the Cinebench score to a still very respectable 14.33, and as of yet I haven't rerun Geekbench on it. But it is perfectly stable at 4.9GHz running AIDA full-bore for the last 12 hours straight. Core temps max are 82 C, avg 74 C. Case is at avg 22 C, which is only +1 or 2 C over my room ambient temp.

I forgot to add that I am running a Corsair Gold AX1200 PSU, but I have not metered it to see the total current draw. I will once I add the GTX580, since right now it would be pointless with the puny card that's in there.

On a side note, I am able to play back at slightly faster than realtime in RC-X Pro beta8 @ 1/2 debayer on 4kHD R3Ds (no HDRx) with the current setup and no Rocket. But I also noticed that my main 6 CPU cores are only hitting up to 55% utilization and not using the hyperthreads at all, perhaps because the video card can't keep up at this point. More to come.

Is there such a benchmark either within Resolve or as a standalone?

Subhadip Sen
12-10-2011, 10:17 PM
That is undoubtedly an immense Cinebench score! Once you get your GTX 580 try running your R3Ds through Premiere Pro - I have a feeling you will get real-time playback at full-res (i.e. 4K). Matching your monitor's resolution, which I am guessing is 2560x1440/1600, should be a piece of cake!

Les Dittert
12-10-2011, 10:36 PM
I built a similar system. In Premiere Pro, there is no way in hell you will get real-time 4k playback. The Red SDK that people are using for the r3d decode and demosaic is not very multi-threaded and is therefore crippled. The GPU does little to help, as it assists in other image processing *after* the SDK makes an RGB image for the app.
Even if the SDK were allowed to use all cores, it still wouldn't give you fast playback, as JPEG 2000 is hard to decode. The GPU could help with the demosaic, but that is not possible at the moment.

I will be working with S3D, so I built this fast machine to try to help. I am ok with half res for editing and color, so that helps a bit. There is no debayer when looking at half res.

-Les


Jon Thomasberg
12-11-2011, 12:17 AM
I built a similar system. ......

Les, care to share the system specs and your results?


That is undoubtedly an immense Cinebench score! Once you get your GTX 580 try running your R3Ds through Premiere Pro - I have a feeling you will get real-time playback at full-res (i.e. 4K). Matching your monitor's resolution, which I am guessing is 2560x1440/1600, should be a piece of cake!

Thanks Subhadip. It seems pretty impressive so far. I will share my findings once it is all completed.

Les Dittert
12-11-2011, 01:23 AM
Asus p9x79 pro mobo, the 3.2 cpu OC to 4.3 ( I like to keep things cool , stays at 40 C ). Threw away the H100 fans, they are not compatible with the Asus PWM fan control.
With 4 pin PWM fans, the system is very quiet, 700 rpm fans at idle !
Sandra memory bandwidth 40 GB/sec .... this is where the system helps, in image processing the memory bandwidth is critical.

On benchmarking, still waiting to see how to use redline to saturate the cores and ACTUALLY USE THE MACHINE .... frustrating. I'll probably end up transcoding to another wavelet codec using spare cycles on the render farm at work.
A hundred i7's with GTX580's should do the trick ;)

-Les

Mike P.
12-11-2011, 06:05 PM
Is there a memory bandwidth benchmark for Mac?

Also, don't OCZ drives bottleneck on incompressible data (which video definitely is, even RED footage), or was that just the Vertex 2s and earlier?

Carter Cammack
12-11-2011, 08:22 PM
Maybe Sir Jon of Thomasberg has the answer to this one. I posted on another thread and nobody answered.

Can anyone confirm the "huge performance gain" of SSD Caching on the latest Intel X79 systems?
The newer ASUS P9X79 boards allow for more caching than the Windows default limitation.
I'm wondering if I can put my page file and cache on a 90GB Corsair GT and really utilize the 6Gb/s speeds.

Jon Thomasberg
12-12-2011, 09:55 PM
Hi Carter,

I have yet to use the SSD caching option on the asus mobo. My config is using 2 SSDs on the Intel SATA3 ports. The SSD Caching requires a single SSD and single HDD on each of the Marvell chipset ports to function. It dedicates that SSD to caching the HDD. I didn't want to waste an SSD on cache exclusively.

Normally, I would have my pagefile.sys on a separate drive from the OS, but since I have 2 SSDs RAID 0 and getting >1000MB/sec, I just manually locked the pagefile size, rather than letting Windows manage it, and left it on the OS drive. If/when I add another SSD, I will likely put the pagefile on it and use it for the SCRATCH volume on editing. But as it is, nothing hits capacity of my 32 GB of RAM as it is.

Overall though, the SSD Caching feature is not really useful since most of what we do is uncompressed sequential reads/writes. These aren't heavily cached anyways. If you want that kind of speed, use SSDs exclusively for that purpose OR get an Areca hardware-based controller and a bunch of HDDs to get the performance out of them.

But to directly answer your question: No, I can neither confirm nor deny that Asus SSD caching works.

Side note: As Jeff Kilgroe pointed out in another thread, HDD manufacturers are now selling SATA3 6Gb/s HDDs, which is a joke. No single spinning drive can hit, much less sustain, those speeds, and the onboard caches are not any larger. They are a waste of money over the SATA2 3Gb/s versions.

Jon Thomasberg
12-15-2011, 11:52 AM
Little bit of an update on my workstation build:

RC-X Pro beta8 Win7 edition:
-Full Res:
--no 5k realtime playback (not even usable -- quite annoying, in fact)
--no 4kHD realtime playback (plays, but hiccups every 5 seconds)

-Half Res:
--no 5k realtime playback
--no 4kHD realtime playback (plays but hiccups every 20 seconds)

HOWEVER, on Premiere Pro CS5.5.1:
-Full Res:
--no 5k realtime playback (plays back but skips a lot of frames, plays approx 4-5 fps)
--no 4kHD realtime playback (plays back but skips some frames, plays approx 12-15fps)

-Half Res:
--5k realtime playback !!!!
--4kHD realtime playback !!!!

Also, the render times I measured for FULL RES timeline preview playback are:
-5k clip, no HDRx, 24 fps 00:02:58:19 (4291 frames) = 4min 25sec
-4kHD clip, no HDRx, 24 fps 00:03:25:13 (4933 frames) = 2min 48 sec
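For anyone who wants to compare these render times against realtime, here is a minimal sketch of the arithmetic. It assumes the timecodes above are non-drop-frame HH:MM:SS:FF at 24 fps; the function names are made up for illustration.

```python
def timecode_to_frames(timecode: str, fps: int = 24) -> int:
    """Convert a non-drop-frame HH:MM:SS:FF timecode to a total frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) * fps + frames


def render_fps(frame_count: int, render_seconds: float) -> float:
    """Average frames rendered per second of wall-clock time."""
    return frame_count / render_seconds


# Clip durations and render times as reported in the post above.
clips = {
    "5k": ("00:02:58:19", 4 * 60 + 25),
    "4kHD": ("00:03:25:13", 2 * 60 + 48),
}

for name, (timecode, seconds) in clips.items():
    frames = timecode_to_frames(timecode)
    fps = render_fps(frames, seconds)
    print(f"{name}: {frames} frames, {fps:.1f} fps rendered, {fps / 24:.2f}x realtime")
```

By this reckoning the 5k preview renders at roughly two thirds of realtime, while the 4kHD preview renders somewhat faster than realtime.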

Notes:
-In RCX Pro Beta 8, even after I maxed out the performance settings, I could not get it to use more than 40% load on my CPU cores, most of the time fluctuating between 32-40%. Also, the load was not equally distributed among all 12 threads; many of the threads were at ~10% utilization. Memory usage never even came close to saturation on either system RAM or Video RAM on my GTX580. So it looks like a code thing in RCX Pro that is limiting it from taking advantage of all the horsepower.

-In PP CS5.5.1, it hit and maintained 90% CPU load +/- 3% throughout rendering, and was well-balanced across all processor threads. Again, at no point did I come even remotely close to saturating the System RAM or VRAM.

Also, monitoring the SATA throughput, my 2 RAID-0 OCZ Vertex 3 SSDs performed flawlessly and never hit saturation.

Overall, I am very pleased with this build. If there are any tests that you all would like run, feel free to ask. Hopefully this will help you all in basing decisions for any new workstation builds you are contemplating.

One downside to this: after using this workstation, my decked-out iMac 27" seems pathetically slow in comparison.

Tony Lorentzen
12-17-2011, 04:46 PM
In RCx Pro Beta 8, even after I maxed out the performance settings, I could not get it to use more than 40% load on my CPU cores. Most of the the time fluctuating between 32-40%. Also, the load was not equally distributed among all 12 threads. Thus, many of the threads were ~10% utilization. Memory usage never even came close to maxed-out /saturation on either system RAM or Video RAM on my GTX580. So it looks like a code thing in RCX Pro that is limiting it from taking advantage of all the horsepower.

Urgh. That is a major disappointment. Why the hell isn't RCX leveraging all the power it can? Why no GPU/CUDA support? Because of RED Rocket, I guess.

Mike P.
12-17-2011, 05:29 PM
Urgh. That is a major disappointment. Why the hell isn't RCX leveraging all the power it can? Why no GPU/CUDA support? Because of RED Rocket, I guess.

Whoa whoa... I mean, yes, that's a logical reason NOT to implement GPU/CUDA (or even proper multi-threading support), but I think the real reason is solely complexity. When it comes to using GPUs for general processing, both AMD/ATi and nVidia have their own way of doing things, which sucks because it's not a one-code-is-efficient-on-all-graphics-hardware situation, meaning RED would have to write different sets of code for ATi and nVidia (bleh.) And when it comes to multi-threading/multi-CPUs, it's actually way more difficult to write code that can split itself up properly/efficiently between the different cores/threads.

I think the easiest/quickest thing that RED should do is render different clips on different cores/threads. That way, each core/thread is handling its own clip(s) and it'd be kind of a brute-force way of using multi-CPUs. Not efficient, but still effective. I know in Windows you can set the affinity of applications pretty easily, so theoretically, if you ran multiple instances of RCXp and set each one to a different core/thread, it should work pretty well. But of course that's a bit of a pain in the ass from the end-user perspective... Still, it'd be 4 or 8 times faster than just setting up a single batch transcode on one core (it also assumes you have more than one clip that needs transcoding and they all use the same look settings.)

Les Dittert
12-17-2011, 05:53 PM
Writing code to multithread is not that hard when you have discrete frames. You just fire off a separate thread for each of several frames. There is no interframe compression that I know of.
There is a question I have, however: Can an SDK caller call the SDK multiple times for the same r3d in separate threads, to decode the frames faster? It may be that the SDK prevents that sort of activity.
Most of the work decoding r3d is the JPEG 2000 decode, which is not very easy to do on a GPU (CUDA or OpenCL). The demosaic is very doable on a GPU, but it wouldn't speed things up much, since it's not the hard part.
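As a rough illustration of the "separate thread for each of several frames" idea, here is a minimal Python sketch. The `decode_frame` function is a hypothetical stand-in, not a RED SDK call, and whether the real SDK tolerates concurrent calls on one r3d is exactly the open question raised above. It also assumes the real decode would run in native code, since pure-Python work would not scale across threads.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_frame(frame_index: int) -> bytes:
    """Hypothetical stand-in for a per-frame decode; returns fake pixel data."""
    return frame_index.to_bytes(4, "big") * 4


def decode_clip(frame_count: int, workers: int = 12) -> list:
    """Decode independent frames on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands back results in frame order even if frames finish out of order.
        return list(pool.map(decode_frame, range(frame_count)))


frames = decode_clip(222)
print(len(frames), "frames decoded")
```

This only works because each frame is compressed on its own; with interframe compression the frames could not be handed out independently like this.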

You are free however to decode multiple clips simultaneously yourself. Just use redline commands. Not GUI but doable.
I have tripled my speed this way. So someone with an old i7 920 can 'out benchmark' a less savvy SB-E owner, as far as bulk transcoding goes!!

-Les Dittert
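A sketch of the brute-force approach described above, running several command-line transcodes side by side. The actual REDline arguments are deliberately left out: `command_template` is an assumption you would fill in from REDline's own help output, with `{clip}` as a made-up placeholder for the clip path. The clip names and the Python one-liner standing in for the transcoder are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def transcode_clip(command_template, clip_path):
    """Run one transcode process; '{clip}' in the template becomes the clip path."""
    command = [arg.replace("{clip}", clip_path) for arg in command_template]
    return subprocess.run(command, check=False).returncode


def transcode_all(command_template, clip_paths, max_parallel=3):
    """Keep up to max_parallel transcodes running until every clip is done."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        jobs = pool.map(lambda clip: transcode_clip(command_template, clip), clip_paths)
        return list(jobs)


# Placeholder command so the sketch runs anywhere; swap in the real transcoder call.
placeholder = [sys.executable, "-c", "import sys; print('would transcode', sys.argv[1])", "{clip}"]
exit_codes = transcode_all(placeholder, ["A001_C001.R3D", "A001_C002.R3D", "A001_C003.R3D"])
print(exit_codes)
```

The threads here only wait on child processes, so the real work is spread across cores by the operating system. How many parallel jobs pay off depends on core count and disk throughput.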

Mike P.
12-17-2011, 06:33 PM
That's what I wanted to know. Potentially, the more cores/threads the faster it could be; which means if you had dual octo-Xeons (16c/32 threads), you could render 32 clips simultaneously (aka a brute-force method for getting up to 32x faster overall render times.) To me, that'd be far easier than actually making a renderer that sends each frame to a new core/thread (which, as you suggest, shouldn't be that difficult, either.) And, yeah, I totally didn't even think about it as discrete frames; that should be pretty simple to do as well.

Les, you should start a thread outlining how to pound out more clips simultaneously using Redline commands. I think I asked you in another thread how you were doing it; it'd be really helpful to a lot of people who can't afford, or are reluctant to buy, a RedRocket and are just getting their Scarlets. If I knew how to make a simple GUI for the process I would; a simple list of the clips and their corresponding RMD(s) and go. As I said, it'd be ~8x faster than just using a single core if you have a quad-core CPU with Hyperthreading.

If the demosaic is easily doable on the GPU it would at least allow full-res 4k/5k playback of .r3ds in the NLE. Not really necessary, but still helpful.

Be careful what you call "old"; I'm rocking a 920 at 4.2GHz and it still whips the llama's ass in most cases :) I was actually thinking about getting a used 970 (because it's a 1366 hexacore) to hold me over until octocores become available... I could just drop it in, and bam, instantly go from 8 to 12 threads for a mere ~$300... Alas, if it doesn't clock to 4.2GHz+, it might actually be slower overall.

Subhadip Sen
12-17-2011, 10:38 PM
Thanks Jon. There always seems to be some issue somewhere stepping up to 5K full-res - it brings about a sharp drop. On a Core i7 970 overclocked, 4K material is about 15fps at full-res, so I was expecting something more with the 3930K. Particularly impressive that 4KHD full-res renders faster than real-time. Either way - half-res suits me just fine. At 5K half-res, it maxes out the resolution of my monitor. So unless you have a 4K monitor, full-res playback is unnecessary. And by the time 4K monitors are more affordable, I hope the RED SDK is optimized and of course, CPU performance will advance further, to make real-time at 4K possible. Of course, if only GPU debayering could be worked out... It's true that it will cannibalize Rocket sales though.


One downfall to this, after using this workstation, my decked-out iMac 27" seems pathetically slow in comparison.

Indeed! Probably doesn't cost much more either.

Tony Lorentzen
12-17-2011, 11:09 PM
Don't get me wrong - I love RED in many aspects, but they are also a company that needs to make money and not cannibalize their own products. But by keeping all of the debayer stuff inside their own SDK (which is fully understandable from an IP standpoint) they are also 'crippling' the post community and making themselves the bottleneck in keeping up with technological advancements in areas such as fully utilizing the power of current generation GPUs. I'm 110% sure it's possible to get full debayer of R3Ds in 5K with some of the latest gen nVidia cards. There are quite a few JPEG 2000 GPU projects out there - some of which are leveraging CUDA technology. We all know the Red Rocket card wasn't developed by RED, so the question is whether we will see a Red Rocket II or if they will be 'forced' to go in another direction. We need 5K realtime playback soon!

Jon Thomasberg
12-17-2011, 11:29 PM
So far I have tried to merely present facts and benchmarks without adding much commentary. GPU-accelerated debayer would ultimately be great, but I was thinking more along the lines that it would seem reasonable for them to unlock/optimize RCX Pro to use even just the CPU cores and threads to their potential. However, if they were able to enlist MPE into the mix, I am confident I would be able to render 5k realtime (or close to it) given my current build.

RedRocket is still viable and useful for those with systems that cannot handle such load without the assistance of the card, but it would seem that if the system were able to handle the load natively on its own merit, the SDK shouldn't hamper/throttle its performance just to sell proprietary J2000 cards. Given how gracious Red has been with free firmware updates and free software, I highly doubt that this is Red's motive, as some have alluded. With that said, I am not Rob, nor is my strength in coding, but it seems reasonable that with some tweaking of code RCX Pro should be able to enable full CPU usage for software-based debayer.

Mike P.
12-18-2011, 10:27 AM
I was kind of suggesting this as well - focus on CPU rather than GPU in the short term - simply because everyone has a multi-threaded CPU these days and it would be easier than having to write two separate code paths for the ATI and nVidia graphics pipelines.

Conversely, if it were to happen an RR would be obsolete pretty quickly, even on older systems, because it would be cheaper to buy a brand-new computer system with even the highest-end hardware, than to buy a RR.


Jon, what are your full-quality/full-res/full-debayer transcode times with 4kHD? I'm confused by the numbers presented (2:48min render for a 3:25min clip); isn't that just going from r3d to r3d (which isn't really rendering anything, other than playing it back and saving it as a r3d file)? I was under the impression that going from .r3d to any other file format (particularly ProRes, since that's usually what clients want footage delivered on) is where an RR really shines, since it can do it in real-time. If you're saying you've already got that transcoding to be faster than real-time with your new SB-E system, then I really don't see the need for an RR.

Stacey Spears
12-18-2011, 10:55 AM
Why no GPU/CUDA support? Because of RED Rocket, I guess.

The RR does both decoding and debayer. GPUs perform poorly at decoding any type of compressed video. AMD, Intel, and NVIDIA have a special video decoding HW (H.264, MPEG2, and VC-1) block that is part of the GPU. When Vista shipped, ATI had removed the special HW block from that generation of GPU to cut costs. The GPU performed worse than pure software decode and this was for MPEG-2! It killed DVD playback at that time. The next gen GPU added the special HW block back.

GPUs do well with some tasks and poorly with others. CABAC, the entropy coder that is part of H.264, is an example of something that works better on a CPU than a GPU.

Les Dittert
12-18-2011, 11:29 AM
My proposal for a benchmark is to convert a 4k clip that everyone can find online to 1920 sized dpx.
It is common to finish at that resolution, and the dpx format is very easy to encode, so the benchmark is really the r3d decode and debayer.
The resize to 1920 is pretty easy too, some apps will harness the gpu for that part of the task.
The overall number we want to see is render (transcode) frames per second.

One thing I want to clarify: transcoding r3d is more than debayer. It is a compression decode and then a debayer. Debayer is easier than decompression.
Like Stacey said, JPEG 2000, the compression used, is not a good speedup candidate for GPU code.

So who can point me to a 4k r3d that we can start with ??? Anything on a Red site maybe ?

-Les Dittert

Les Dittert
12-18-2011, 11:01 PM
So I found this r3d file, 222 frames of hockey footage, 4k x 2k
http://www.tomguilmette.com/wp/download/43/

Imported into prem cs5.5, no color correction, just a motion scale 47% to get it into a 1920 sized frame. Max bit depth and max render quality.
Rendering to a DPX sequence ( no log conversion ) took 75 seconds. That's 2.96 FPS render speed out of Premiere .
RCX took 96 seconds.
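For anyone keeping score on this benchmark, here is a small sketch of turning the timings into a render-fps ranking. The frame count and the two timings come from this post; the function names and the idea of ranking entries this way are assumptions for illustration.

```python
FRAME_COUNT = 222  # frames in the hockey clip used for this benchmark


def render_fps(frame_count, seconds):
    """Average frames rendered per second of wall-clock time."""
    return frame_count / seconds


def rank(timings):
    """Take {entry: render seconds} and return (entry, fps) pairs, fastest first."""
    scored = [(name, render_fps(FRAME_COUNT, secs)) for name, secs in timings.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


timings = {"Premiere CS5.5": 75, "RCX": 96}
for place, (name, fps) in enumerate(rank(timings), start=1):
    print(f"{place}. {name}: {fps:.2f} fps")
```

New submissions only need a name and a render time in seconds added to `timings`.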

Jon Thomasberg
12-19-2011, 12:42 PM
Tested render out of 5k@24fps @ RC9:1 R3D 00:02:59.03 sequence / Export times in PP5.5.2

to: 4kHD 23.976 DPX Full/Max = 28min 47.0sec
to: 1920x1080 23.976 DPX Full/Max = 10min 24.2sec
to: 1920x1080 23.976 H.264 Blu-Ray Max/VBR1pass bitrate25-30/MaxRenderQ = 41mins 0sec

If I get time I'll test RCX Pro b8 and 4kHD clips also and let you know what I find.

Will Keir
12-30-2011, 02:18 AM
I notice on your rig you are using windows 7 pro. Is there any reason to get windows 7 ultimate? Thanks for this thread, will check back often.

Jon Thomasberg
12-31-2011, 04:49 PM
No, there is no need for Ultimate unless you want multi-language packs, software-based hard drive encryption and a couple of other little odds and ends.

http://windows.microsoft.com/en-US/windows7/products/compare?T1=tab15

M.D. Hilton
02-27-2012, 11:39 AM
Jon,

I noticed you use WD RE4 2TB and not the Caviar Black 2TB drive - is there a reason? I've seen a few benchmark tests between the two where the Caviar Black beat the RE4 (though they were close.) Is there something the RE4 does better in terms of editing that I'm unaware of?

I'm putting together a system - not too different than what you put together - and this is one of my last decisions. Right now the Caviar Black drives are cheaper than the RE4 so if there's not much of a difference I'd prefer to go the cheaper route.

Jon Thomasberg
03-02-2012, 01:04 PM
MDH,

The RE4 is optimized for use in a RAID, whereas the Caviar Black isn't. HTH

Jon

Stefan Antonescu
03-04-2012, 08:20 PM
Great thread !

Jon, why did you use an SSD RAID 0 config for your system drive, rather than just a single one? Did you use the onboard controller for that?

Would you recommend using the onboard controller for a media RAID 0 config or going with a third party card?