Many SSD Benchmark Reviews Contain Flaws

I find flaws in almost every benchmark reviews I read about solid state drives, an area I know well. Whether it is the tester degrading performance with poor hardware or OS settings, or forgetting to mention crucial details that can greatly impact results, or not using his benchmarking tools correctly, or even widely-used tools that are themselves poorly written(!), these errors make many benchmark results that are published simply incorrect, and sometimes deceptive.

I am currently doing in-depth research of SSDs providing good random write IOPS, with a focus on those based on SandForce controllers, and here are a few examples of flaws in SSD benchmark reviews from major tech sites:

Out of the hundreds of reviews of the popular Intel X18-M/X25-M drive series, not a single one mentions that random write IOPS highly depends on the span of the LBA space being tested. There is a difference of almost 10x between random write IOPS measured on the full LBA space as opposed to an 8GB fraction of it: respectively 350 IOPS and 3300 IOPS. Intel themselves do not make the information easily accessible. They published it for only 1 model (first generation, 50nm): you have to browse a page for SSD resellers and click on the link named "Intel X18-M/X25-M SATA Solid State Drive - Enterprise Server/Storage Applications product manual addendum" (direct link) to access a PDF that documents the difference between measuring random write IOPS on an 8GB span as opposed to 100% span! Also, Intel generally publish 100% span numbers for enterprise-class SSD models, but 8GB span numbers for consumer-class SSD models. This spec is the only source documenting the 2 performance numbers for the same SSD model (Intel publish no equivalent spec for the second generation 34nm X18-M/X25-M drive series.)
Two benchmarking tools, CrystalDiskMark and AS SSD, are popular despite a flaw that many reviewers noticed: they report sequential read/write throughput results consistently inferior to other benchmarking tools (especially for SF-1200-based SSDs.) For example Benchmark Reviews tested the OCZ Vertex 2 120GB and these tools report 210-215MB/s while all other tools report 270-280MB/s as expected. [Update: one explanation is that this could be due to CrystalDiskMark and AS SSD being set up to write and read random data, whereas other tools use constant data, inadvertently allowing the SF-1200 controller to aggressively optimize I/O with its transparent data deduplication and compression features. If this is the case, then I shift my objection to all reviewers, including Benchmark Reviews, who fail to mention how CrystalDiskMark is configured —random or constant data— and thereby provide results that readers cannot interpret.]
In this 14-page Legit Reviews article on the OCZ Vertex 2 100GB drive, the test system is preventing the drive from showing its full potential in sequential reads: there is a bottleneck at 230MB/s when this SSD is known to reach 270MB/s as confirmed by many other reviews: Anandtech, Techspot, etc. This indicates either a problem with the test system, or aging of the SSD which engaged the wear-leveling algorithms, degrading performance.
Benchmark Reviews admits here they made the mistake of publishing several SSD benchmarks with IOPS numbers measured with an I/O queue depth of 1, instead of 32 which is the maximum allowed by NCQ and provides the maximum random IOPS performance. They were effectively measuring latency instead of true IOPS performance.
Here is a terrible use of IOmeter from Benchmark Reviews who discovers about 1k IOPS (latency: 1ms) for a top-of-the-line Crucial RealSSD C300 SSD. Even the most basic random I/O tests with a queue depth of 1 on any SSD should yield at least 10k IOPS (latency: 100us). As a matter of fact, all the IOPS results for the 11 SSDs tested in this page are suspiciously low and reflect a poor configuration of IOmeter, of which the author gives no information whatsoever.
A similar mistake is made by Anandtech in this Crucial RealSSD C300 review where the random 4kB read IOPS performance is measured with a queue depth that is too small. For example the 256GB model reports 20k IOPS (79.5MB/s) on SATA 3Gbps, when in fact it is known to be capable of 50k IOPS at the maximum queue depth of 32. The reviewer is aware of the queue depth issue, but presents results for both a short and long queue only for random write tests (not random reads).
ServeTheHome measured a sequential read throughput of 502.6MB/s [sic] over a SATA 300MB/s link. It is simply impossible to surpass the throughput of the underlying SATA link —probably another bug in CrystalDiskMark.

Benchmarking is hard. Most people get it wrong. The above flaws demonstrate that even major online tech sites do not always provide quality results. Too many of them hire young writers passionate about technology, but otherwise with little in-depth knowledge of what they are testing, and barely able to run benchmarking tools and correctly interpret results that come out of them.