The compute command provides graphable data which shows how well the CPU cores of the machine scale to a multithreaded job.

By default, the compute command uses two more threads than CPU cores, to give a good picture of the scalability up to and slightly beyond the available CPU cores.


See below for how to run the compute command.

Graphing the results

The supplied spreadsheet can be used to visualize the results.

There are three benchmarks involved: pure CPU, pure memory, and SHA1 hash.

Most programs fail to use more than a few CPU cores, or might use only a single core.

It can be seen that up to a point, scalability can be very good for most tasks, but that two cores by themselves can consume nearly all the memory bandwidth (it would be even better if a single core could consume all the bandwidth).

It can also be seen that the 16 virtual cores (“hyperthreading”) offer almost no advantage for pure integer computation as tested here. No doubt there are workloads with a mix of instructions that can benefit, but pure computation is definitely not one of them.


Pure CPU

This benchmark performs no memory access at all, calculating the Fibonacci series with just a few register-based variables. This is the best case; the CPUs can run unhindered by relatively slow memory.

The hyperthreading advantage is minimal: with 9-16 threads, there is a ~5-10% reduction in time compared to 8 cores. Not impressive. Real cores are what matter.

Scalability is good: 1 core takes 7.1 times as long as 8 cores; the cores are doing their job.


Pure memory

This benchmark performs continual memory access by copying and comparing memory.

Memory bandwidth tops out with 4 cores, with even two cores consuming nearly all the bandwidth. This might be one reason that Photoshop and similar programs generally don’t scale well beyond 4 cores (except for compute intensive tasks).

But the flip side is that this graph also shows that about half of the memory bandwidth is available to a single core, and most of it to two cores, a good thing when a single-threaded program is running.

Here, 1 core takes 2.2 times as long as 4 cores. In the ideal world, one core would utilize all the available memory bandwidth, and there would be no difference with 1..N cores.


SHA1 hash

This benchmark runs the SHA1 cryptographic hash. It has moderate memory access and a lot of integer computation. It is fair to say that this represents a reasonable approximation of an average workload.
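A thread-scaling measurement of this kind can be sketched in a few lines of Python. This is only an illustration, not MemoryTester's implementation: the function names, the 1 MiB buffer, and the round count are all assumptions. It relies on the fact that Python's hashlib releases the interpreter lock while hashing large buffers, so the threads really do run on separate cores.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor


def sha1_work(buffer: bytes, rounds: int) -> str:
    """Feed the same buffer into one SHA1 state `rounds` times."""
    h = hashlib.sha1()
    for _ in range(rounds):
        # hashlib releases the GIL for large inputs, so several
        # threads can hash on separate cores at the same time.
        h.update(buffer)
    return h.hexdigest()


def time_threads(num_threads: int, buffer: bytes, total_rounds: int) -> float:
    """Split a fixed amount of work across threads; return elapsed seconds."""
    base, extra = divmod(total_rounds, num_threads)
    shares = [base + (1 if i < extra else 0) for i in range(num_threads)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(lambda rounds: sha1_work(buffer, rounds), shares))
    return time.perf_counter() - start


def scaling_table(max_threads: int, buffer_size: int = 1 << 20,
                  total_rounds: int = 32) -> dict:
    """Map each thread count 1..max_threads to its elapsed time in seconds."""
    buffer = bytes(buffer_size)  # hypothetical workload: zero-filled buffer
    return {n: time_threads(n, buffer, total_rounds)
            for n in range(1, max_threads + 1)}


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    # Two more threads than cores, mirroring the default described above.
    for n, seconds in scaling_table(cores + 2).items():
        print(f"{n:3d} threads: {seconds:.3f} s")
```

The total work is held constant, so perfect scaling would halve the time each time the thread count doubles, up to the number of real cores.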

Scalability is good: 1 core takes 7.2 times as long as 8 cores; the cores are doing their job.
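The ratios quoted on this page can be turned into a parallel efficiency figure by dividing the speedup by the number of cores. The short sketch below does that arithmetic for the three results; the function name is made up for illustration, and the inputs are the figures from the text above.

```python
def parallel_efficiency(time_ratio: float, cores: int) -> float:
    """time_ratio = (time on 1 core) / (time on `cores` cores), i.e. the speedup.

    An efficiency of 1.0 means every core contributes fully.
    """
    return time_ratio / cores


# (speedup, cores) pairs taken from the benchmark descriptions on this page.
results = {
    "Pure CPU": (7.1, 8),
    "Pure memory": (2.2, 4),
    "SHA1 hash": (7.2, 8),
}

for name, (ratio, cores) in results.items():
    efficiency = parallel_efficiency(ratio, cores)
    print(f"{name}: {ratio}x speedup on {cores} cores, {efficiency:.0%} efficiency")
```

This works out to roughly 89% for pure CPU, 55% for pure memory, and 90% for SHA1, which matches the picture of compute-bound work scaling well while memory bandwidth saturates early.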


Running MemoryTester compute

In MemoryTester, choose Compute, then click Start.

MemoryTester will run the test for different numbers of threads: a minimum of 8 threads, up to 50% more than the number of CPU cores. You can also (with the command line) specify any number of desired threads, up to 255 threads.

When done, you can graph the results using the supplied spreadsheet; see the results shown on this page.

MemoryTester compute

Command line usage

Many variations are possible; see below for useful examples. All testing is always non-destructive (if read/write is used for volumes, a temporary file is used).

    [--percent-cpu|-p <percent>]               "100%"
    [--threads|-t <num>]                       "16"
    [--memory-per-thread|-m <size[b|K|M|G]>]   "1374MB"
    [--volumes|-v <all|[,<volume-name>]*>]     "all"
    [--read-write|-w]                          "true"
    [--duration|-d <num>[S|M|H]]               "8H"

Drain a laptop battery as quickly as possible (insert a DVD into the DVD drive first):

mt stress --volumes all --read-only

Run a stress test for one hour:

mt stress --duration 1H

Run a stress test but using only 50% of the CPU power:

mt stress --percent-cpu 50%

Run a test for 12 hours using 1GB memory for each of 24 threads while reading and writing to/from all volumes:

mt stress --memory-per-thread 1G --volumes all --duration 12H --threads 24 --read-write
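When these commands are launched from a script, it can help to assemble the argument list programmatically. The helper below is a hypothetical sketch, not part of MemoryTester: the function name and parameter names are invented, and it uses only the option flags listed above. It builds the command line without running it.

```python
import shlex
from typing import List, Optional


def build_stress_command(
    percent_cpu: Optional[str] = None,
    threads: Optional[int] = None,
    memory_per_thread: Optional[str] = None,
    volumes: Optional[str] = None,
    read_write: bool = False,
    duration: Optional[str] = None,
) -> List[str]:
    """Return an argv list for `mt stress` using the flags documented above."""
    args = ["mt", "stress"]
    if percent_cpu is not None:
        args += ["--percent-cpu", percent_cpu]
    if threads is not None:
        args += ["--threads", str(threads)]
    if memory_per_thread is not None:
        args += ["--memory-per-thread", memory_per_thread]
    if volumes is not None:
        args += ["--volumes", volumes]
    if read_write:
        args.append("--read-write")
    if duration is not None:
        args += ["--duration", duration]
    return args


# Same options as the 12-hour example above (flag order differs).
command = build_stress_command(
    memory_per_thread="1G", volumes="all", duration="12H",
    threads=24, read_write=True,
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command)` on a machine where `mt` is installed and on the path.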
