How We Test Computers
We review a lot of computers at CNET, and we’ve been doing it for a long time. Over the years, some of the methodology has changed, but our core commitment to in-depth product reviews has not. Our review process for laptops, desktops, tablets and other computer-like devices consists of two parts: performance testing under controlled conditions in the CNET Labs and extensive hands-on use by our expert reviewers. The hands-on portion includes evaluating a device’s aesthetics, ergonomics and features. A final review verdict combines those objective and subjective judgments.
When a computer (typically a laptop, desktop, two-in-one hybrid or Chromebook) arrives at the CNET Labs, we set it up as a typical user of the product would. As a best practice, during setup we disable as many of the invasive privacy and data collection options as possible. Then we update the OS, GPU drivers, BIOS and manufacturer utilities as needed and use applications such as Sandra from SiSoftware, CPUID’s CPU-Z and TechPowerUp’s GPU-Z to gather information about the system’s components, including the CPU, GPU, RAM, SSD and mainboard.
Our benchmark tests consist of a core set we run on every compatible system, plus an extended set for specific use cases, such as gaming or content creation, where systems may have more powerful GPUs or higher-resolution displays that need to be evaluated.
The list of benchmarking software we use changes over time as the devices we test evolve. The most important core tests we’re currently running on every compatible computer are:
Primate Labs Geekbench 5 and 6
We run both single-core and multicore CPU tests, and either the Vulkan (Windows) or Metal (MacOS) Compute test. On Android, Apple devices and Chromebooks, we run the CPU tests and the Compute test. Geekbench’s CPU tests measure the performance of a mixed workload. (We run both versions of the benchmark to be able to compare to models tested before Geekbench 6 became available.)
Cinebench R23
We run both the single-core and multicore tests on Windows and MacOS devices. Cinebench measures pure CPU processing performance for 3D rendering.
PCMark 10
We’re phasing out this Windows benchmark, but at the moment still run the last-generation version, which simulates a wide range of functions, including web browsing, video conferencing, photo editing, video editing and more.
3DMark Wild Life Extreme
We run this test on MacOS (Apple silicon), Windows, Android and iPadOS systems; it’s one of the few cross-platform benchmarks available to test graphics performance. We additionally run it in Unlimited mode, which eliminates screen resolution as a variable when making cross-device comparisons.
3DMark Fire Strike Ultra, Time Spy, and Port Royal
We run these tests on any system with a discrete GPU to test a system’s DirectX 11 and DirectX 12 graphics performance, which is especially important for gaming computers. We’re phasing out Port Royal, which is specifically designed to test Nvidia’s RTX raytracing performance, and switching to 3DMark’s DXR or Speed Way (the first tests DX12 Ultimate’s raytracing performance and the second tests a mixture of DX12 Ultimate’s features). We’ve also added 3DMark CPU Profiler, Storage and PCI Features tests to understand the results we see from tests with more mixed workloads.
Shadow of the Tomb Raider benchmark
This is an older game that can run well on lower-end gaming hardware. It balances the CPU and GPU loads rather than relying exclusively on the GPU, and reports how the two are used. We run the game’s built-in benchmark on systems with a discrete GPU using the Highest quality preset in 1,920×1,080 resolution.
Guardians of the Galaxy benchmark
A more modern game that can still run on lower-end gaming hardware, this measures pure GPU performance. We run the core test at 1,920×1,080-pixel resolution at High Quality, but on relevant hardware may run it at higher resolutions and higher quality (such as with full raytracing enabled) for comparison.
The Riftbreaker CPU and GPU benchmarks
The Riftbreaker incorporates both action and complex simulation, which means it can rely heavily on the CPU as well as the GPU for different aspects of the game. We run the core test at 1,920×1,080-pixel resolution at High Quality, but on relevant hardware may run it at higher resolutions and higher quality (such as with full raytracing enabled) for comparison.
UL Procyon benchmarks
If a system meets the baseline requirements to run Adobe Premiere Pro and Photoshop with Photoshop Lightroom Classic, we use Procyon’s video editing and photo editing benchmarks at 1,920×1,080-pixel resolution to measure the system’s suitability for content creation. Unlike pure GPU benchmarks, they also show how mixed CPU and GPU loads are handled.
Battery life test
For all computers with a battery, we change the settings to keep the system from going to sleep or hibernating, disable pop-ups and notifications that may interfere with the test, and set screen brightness and volume (output to earbuds) to 50%. We then stream a looped, custom YouTube video over Wi-Fi in Chrome and use a timer app to track how long the system remains active.
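The timing step can be sketched in a few lines of Python. This is only an illustration of the idea, not the timer app we use: the function names, the JSON log file and the 60-second interval are all assumptions made for the example. The script saves the elapsed time at regular intervals, so the last value written before the battery dies approximates the runtime.

```python
import json
import time
from pathlib import Path


def record_heartbeat(log_path: Path, start: float, now: float) -> float:
    """Save elapsed seconds to disk so the last value survives a power loss."""
    elapsed = now - start
    # Write to a temporary file first, then swap it in, so a shutdown
    # mid-write is less likely to leave a truncated log behind.
    tmp_path = log_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps({"elapsed_seconds": elapsed}))
    tmp_path.replace(log_path)
    return elapsed


def format_runtime(elapsed_seconds: float) -> str:
    """Render elapsed seconds as whole hours and minutes."""
    total_minutes = int(elapsed_seconds // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hr {minutes} min"


def run_timer(log_path: Path, interval_seconds: float = 60.0) -> None:
    """Loop until the machine shuts down, logging a heartbeat each interval."""
    start = time.time()
    while True:
        record_heartbeat(log_path, start, time.time())
        time.sleep(interval_seconds)
```

In this sketch you would start `run_timer` on a full charge and read the log file after recharging. The result is only as precise as the chosen interval.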
JetStream 2, MotionMark and WebXPRT 3
We run these browser-based tests to evaluate Chromebook performance, and occasionally run them on Windows systems for comparison.
Additional testing
We may run a number of additional tests or variations on the standard tests; for instance, we’ll run Geekbench and Cinebench on battery power to see the impact a laptop’s power-saving settings have on performance. For systems with powerful components, we may run loops of other benchmarks to see how stable the system is and how hot the components run under a full load.
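Both comparisons come down to simple ratios. The sketch below is illustrative rather than a description of our exact calculations. The function names are hypothetical, and defining stability as the worst loop score divided by the best is an assumption chosen for the example.

```python
def battery_retention(plugged_in_score: float, on_battery_score: float) -> float:
    """Percentage of plugged-in performance a laptop keeps on battery power."""
    if plugged_in_score <= 0:
        raise ValueError("plugged-in score must be positive")
    return 100.0 * on_battery_score / plugged_in_score


def loop_stability(loop_scores: list[float]) -> float:
    """Worst loop score as a percentage of the best loop score.

    Assumed definition for illustration: 100 means every loop scored the
    same, and lower values suggest throttling as the system heats up.
    """
    if not loop_scores or max(loop_scores) <= 0:
        raise ValueError("need at least one positive loop score")
    return 100.0 * min(loop_scores) / max(loop_scores)
```

For example, a laptop that scores 10,000 plugged in and 7,500 on battery retains 75% of its performance, and loop scores of 1,000, 950 and 900 give a stability of 90%.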
Discretionary testing can also include DLSS 2 and 3 (on Nvidia), FidelityFX Super Resolution 2.x (on AMD) or XeSS (for Intel hardware) game upscaling and optimization technologies in 3DMark as well as in games that support them. For systems with midrange or better GPUs, we sometimes run SPECviewperf 2020 (professional content creation and analysis beyond photo and video editing) or anecdotal testing with OBS Studio (streaming).
As part of a review, we usually include a comparison chart of scores from relevant tests across comparable products. When we make a major change to testing, such as moving from one version of a test to another, we double-test both versions or the entire old and new set to build up a database of comparison data.
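A minimal sketch of how such a comparison might be assembled follows. The `Result` record and the `comparison_chart` function are hypothetical names invented for this example, not our internal tooling. Because a score is only comparable within the same version of a test, the function drops any system that lacks a result on the requested benchmark, which is why double-testing during a transition matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    system: str
    benchmark: str  # includes the version, e.g. "Geekbench 6 multicore"
    score: float


def comparison_chart(
    results: list[Result], benchmark: str, review_unit: str
) -> list[tuple[str, float, float]]:
    """Return (system, score, percent of review unit's score), highest first.

    Systems without a score on this exact benchmark version are left out.
    """
    scores = {r.system: r.score for r in results if r.benchmark == benchmark}
    if review_unit not in scores:
        raise KeyError(f"{review_unit} has no result for {benchmark}")
    base = scores[review_unit]
    rows = [(name, score, round(100.0 * score / base, 1))
            for name, score in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

A system tested only on the older version of a benchmark would not appear in a chart built on the newer one, so running both versions for a while keeps older models available for comparison.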
We’re currently evaluating two more sets of benchmarks for inclusion in our test suite: UL Procyon’s recently added AI Inferencing tests and Returnal, a game with high-end graphics and an informative benchmark.
With so many computers using the same handful of CPUs and GPUs, the same operating systems and similar amounts of RAM and storage, these benchmark results usually fit our expectations. That means that by looking at a system’s specs, we can get a reasonable idea of how it’ll perform relative to systems with better or lesser specs. It’s when we compare against systems with similar specs that a particular brand may stand out as good, not-so-good or just set. With laptops especially, performance reflects a manufacturer’s decisions about where to allocate power, even when plugged in. The picture has become especially murky because there can be multiple ways to change settings, or automated “AI”-driven setting adjustments, that make it impossible to know what’s really going on.
Source: CNET