compute
The compute command provides graphable data which shows how well the CPU cores of the machine scale to a multithreaded job.
By default, the compute command uses two more threads than CPU cores, to give a good picture of the scalability up to and slightly beyond the available CPU cores.
Notes:
See below for how to run the compute command.
Graphing the results
The supplied spreadsheet can be used to visualize the results.
There are three benchmarks involved:
- SHA1 hash: a compute-intensive mathematical (integer) computation with moderate memory access; it is a reasonable proxy for many types of compute loads.
- Pure CPU: a pure integer computation (Fibonacci series) that involves no memory access.
- Pure Memory: copying and comparing memory; completely limited by memory speed.
The graph below shows how an 8-core (16 virtual core) Mac Pro Nehalem 2.93GHz scales from 1 to 32 threads, using three different tasks.
It can be seen that up to a point, scalability can be very good for most tasks, but that two cores by themselves can consume nearly all the memory bandwidth (it would be even better if a single core could consume all the bandwidth).
It can also be seen that the 16 virtual cores (“hyperthreading”) offer little advantage for pure integer computation as tested here. No doubt there are workloads with a mix of instructions that can benefit, but pure computation is not one of them.
Pure CPU
This benchmark performs no memory access at all, calculating the Fibonacci series with just a few register-based variables. This is the best case; the CPUs can run unhindered by relatively slow memory.
The hyperthreading advantage is minimal: with 9-16 threads, there is a ~5-10% reduction in time compared to 8 cores. Not impressive. Real cores are what matter.
Scalability is good: 1 core takes 7.1 times as long as 8 cores; the cores are doing their job.
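The scaling figure above can be turned into a parallel efficiency number. The following is a minimal sketch, not part of MemoryTester; the function name is made up for illustration. It divides the measured time ratio by the core count, so 7.1x on 8 cores works out to roughly 89% of ideal scaling.

```python
def parallel_efficiency(time_ratio: float, cores: int) -> float:
    """Return the fraction of ideal scaling that was achieved.

    time_ratio is (time with 1 core) / (time with `cores` cores),
    for example the 7.1 reported above for 8 cores.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return time_ratio / cores


if __name__ == "__main__":
    # Figure taken from the Pure CPU result on this page.
    print(f"Pure CPU efficiency on 8 cores: {parallel_efficiency(7.1, 8):.0%}")
```

A value of 1.0 would mean perfect scaling; anything much lower suggests the cores are waiting on a shared resource.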
Pure memory
This benchmark performs continual memory access by copying and comparing memory.
Memory bandwidth tops out with 4 cores, with even two cores consuming nearly all the bandwidth. This might be one reason that Photoshop and similar programs generally don’t scale well beyond 4 cores (except for compute intensive tasks).
But the flip side is that this graph also shows that about half of the memory bandwidth is available to a single core, and most of it to two cores, a good thing when a single-threaded program is running.
Here, 1 core takes 2.2 times as long as 4 cores. In the ideal world, one core would utilize all the available memory bandwidth, and there would be no difference with 1..N cores.
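To make "copying and comparing memory" concrete, here is a rough sketch of such a loop. It is an illustration only and is not how MemoryTester is implemented; the function name, buffer size and the traffic estimate (about four buffer-lengths per pass) are assumptions.

```python
import time


def copy_and_compare(buffer_size: int, passes: int) -> float:
    """Copy a buffer, compare the copy to the source, return bytes per second."""
    source = bytearray(buffer_size)
    # Write one byte per 4 KiB so the buffer is not all zeros.
    for offset in range(0, buffer_size, 4096):
        source[offset] = offset % 251

    started = time.perf_counter()
    for _ in range(passes):
        duplicate = bytes(source)      # copy: reads source, writes duplicate
        if duplicate != source:        # compare: reads both buffers
            raise RuntimeError("memory mismatch detected")
    elapsed = max(time.perf_counter() - started, 1e-9)

    # Rough estimate: 2 buffer-lengths for the copy, 2 for the compare.
    return 4 * buffer_size * passes / elapsed


if __name__ == "__main__":
    rate = copy_and_compare(64 * 1024 * 1024, passes=4)
    print(f"approx. {rate / 1e9:.1f} GB/s on one thread")
```

Because almost no arithmetic is done per byte, the result depends on memory speed rather than on the CPU, which is why adding cores stops helping early.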
SHA1 hash
This benchmark runs the SHA1 cryptographic hash. It has moderate memory access and a lot of integer computation. It is fair to say that this represents a reasonable approximation of an average workload.
Scalability is good: 1 core takes 7.2 times as long as 8 cores; the cores are doing their job.
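A similar measurement can be approximated with Python's standard hashlib and threading modules. This is a sketch under stated assumptions, not MemoryTester's code: the function names, the block size and the choice to split a fixed amount of work evenly across threads are all made up for illustration. It relies on hashlib releasing the interpreter lock while hashing large buffers, so the threads can run on separate cores.

```python
import hashlib
import threading
import time


def _sha1_worker(block: bytes, rounds: int, results: list, index: int) -> None:
    """Hash the same block `rounds` times and store the final digest."""
    digest = hashlib.sha1()
    for _ in range(rounds):
        digest.update(block)
    results[index] = digest.hexdigest()


def time_sha1(thread_count: int, total_rounds: int = 64,
              block_size: int = 1 << 20) -> float:
    """Split total_rounds across thread_count threads; return elapsed seconds."""
    block = bytes(block_size)
    rounds_per_thread = max(1, total_rounds // thread_count)
    results = [None] * thread_count
    threads = [
        threading.Thread(target=_sha1_worker,
                         args=(block, rounds_per_thread, results, i))
        for i in range(thread_count)
    ]
    started = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - started
    # Every thread hashed identical input, so all digests must agree.
    if len(set(results)) != 1:
        raise RuntimeError("threads produced different digests")
    return elapsed


if __name__ == "__main__":
    for count in (1, 2, 4):
        print(f"{count} thread(s): {time_sha1(count):.3f} s")
```

Dividing the one-thread time by the N-thread time gives the same kind of ratio quoted above.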
Running MemoryTester compute
Using MemoryTester.app, choose , then click .
MemoryTester will run the test for different numbers of threads: a minimum of 8 threads, up to 50% more than the number of CPU cores. You can also (with the command line) specify any number of desired threads, up to 255 threads.
When done, you can graph the results using the supplied spreadsheet, as in the results shown on this page.
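One way to read the thread-count rule above is sketched here. This is an interpretation, not confirmed MemoryTester behavior: it assumes the run goes from 1 thread up to the larger of 8 and 1.5 times the core count, capped at the 255-thread limit. The function name is made up.

```python
def thread_counts(cpu_cores: int, hard_limit: int = 255) -> list:
    """List the thread counts to test, under the reading described above."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    fifty_percent_more = (cpu_cores * 3 + 1) // 2   # 1.5 x cores, rounded up
    upper = min(hard_limit, max(8, fifty_percent_more))
    return list(range(1, upper + 1))


if __name__ == "__main__":
    print(thread_counts(16)[-1])   # highest thread count for 16 cores
```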
Command line usage
Many variations are possible; see below for useful examples. All testing is always non-destructive (if read/write is used for volumes, a temporary file is used).
stress
[--percent-cpu|-p <percent>] "100%"
[--threads|-t <num>] "16"
[--memory-per-thread|-m <size[b|K|M|G]>] "1374MB"
[--volumes|-v <all|[,<volume-name>]*>] "all"
[--read-write|-w] "true"
[--duration|-d <num>[S|M|H]] "8H"
Drain a laptop battery as quickly as possible (insert a DVD into the DVD drive first):
mt stress --volumes all --read-only
Run a stress test for one hour:
mt stress --duration 1H
Run a stress test but using only 50% of the CPU power:
mt stress --percent-cpu 50%
Run a test for 12 hours using 1GB memory for each of 24 threads while reading and writing to/from all volumes:
mt stress --memory-per-thread 1G --volumes all --duration 12H --threads 24 --read-write
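The duration and size arguments above use short unit suffixes. The sketch below shows how such values could be converted to seconds and bytes when scripting around the tool. It is not the parser mt uses. The function names are made up, and two points are assumptions: a bare number is treated as seconds or bytes, and K, M and G are taken as powers of 1024.

```python
import re

_DURATION = re.compile(r"^(\d+)([SMH]?)$")
_SIZE = re.compile(r"^(\d+)([BKMG]?)B?$")

_SECONDS = {"": 1, "S": 1, "M": 60, "H": 3600}
_BYTES = {"": 1, "B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_duration(text: str) -> int:
    """Convert a value such as '8H' or '30M' to seconds."""
    match = _DURATION.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * _SECONDS[match.group(2)]


def parse_size(text: str) -> int:
    """Convert a value such as '1G' or '1374MB' to bytes (binary units assumed)."""
    match = _SIZE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _BYTES[match.group(2)]


if __name__ == "__main__":
    print(parse_duration("12H"), "seconds")
    print(parse_size("1G") * 24, "bytes for 24 threads at 1G each")
```

The last line shows why the per-thread memory setting matters: the total is the per-thread size multiplied by the thread count.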
Copyright © 2008-2010 diglloyd Inc, all rights reserved



