Benchmarking my Java code using JUnitBenchmarks

As part of building some integration tests for my Java service code, I wanted to run some (micro-)benchmarks against those tests. In C# we have the likes of NBench (see my post Using NBench for performance testing), so it comes as no surprise to find similar libraries, such as JUnitBenchmarks, in Java.

Note: the JUnitBenchmarks site states it’s now deprecated in favour of JMH, but I will cover it here anyway as it’s very simple to get started with and fits nicely alongside existing JUnit code.

JUnitBenchmarks

First off we need to add the required dependency to our pom.xml, so add the following

<dependency>
   <groupId>com.carrotsearch</groupId>
   <artifactId>junit-benchmarks</artifactId>
   <version>0.7.2</version>
   <scope>test</scope>
</dependency>
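
JUnitBenchmarks plugs into JUnit 4, so your project also needs JUnit itself on the test classpath. Most projects writing tests will already declare this, but for completeness this is the sort of entry I mean (the version shown here is just an example, not something junit-benchmarks mandates):

<dependency>
   <groupId>junit</groupId>
   <artifactId>junit</artifactId>
   <version>4.12</version>
   <scope>test</scope>
</dependency>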

JUnitBenchmarks, as the name suggests, integrates with JUnit. To enable benchmarking of our tests within the test runner we simply add a rule to the test class, like this

import org.junit.Rule;
import org.junit.rules.TestRule;
import com.carrotsearch.junitbenchmarks.BenchmarkRule;

public class SampleVerticleIntegrationTests {
    @Rule
    public TestRule benchmarkRule = new BenchmarkRule();

    // tests
}
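
Alternatively, if I remember the library correctly, it also ships an AbstractBenchmark base class which declares the rule for you, so the test class can simply extend it:

import org.junit.Test;
import com.carrotsearch.junitbenchmarks.AbstractBenchmark;

// The benchmark rule comes from the base class, so no @Rule field is needed here
public class SampleVerticleIntegrationTests extends AbstractBenchmark {
    @Test
    public void testSave() {
        // our code
    }
}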

This will report information on the test, like this

[measured 10 out of 15 rounds, threads: 1 (sequential)]
round: 1.26 [+- 1.04], round.block: 0.00 [+- 0.00], 
round.gc: 0.00 [+- 0.00], 
GC.calls: 5, GC.time: 0.19, 
time.total: 25.10, time.warmup: 0.00, 
time.bench: 25.10

The first line tells us that the test was actually executed 15 times (or rounds), but only 10 of those were “measured”; the other 5 were warm-ups, all on a single thread. This is the default behaviour, but what if we want to change these parameters?

If we want to be more specific about the benchmarking of various test methods we add the annotation @BenchmarkOptions, for example

@BenchmarkOptions(benchmarkRounds = 20, warmupRounds = 0)
@Test
public void testSave() {
    // our code
}

As can be seen, this is a standard test, but the annotation tells JUnitBenchmarks to run the test 20 times (with no warm-up runs) and then report the benchmark information, for example

[measured 20 out of 20 rounds, threads: 1 (sequential)]
round: 1.21 [+- 0.97], round.block: 0.00 [+- 0.00], 
round.gc: 0.00 [+- 0.00], 
GC.calls: 4, GC.time: 0.22, 
time.total: 24.27, time.warmup: 0.00, 
time.bench: 24.27

As you can see, the first line tells us the code was measured 20 times on a single thread with no warm-ups (as we specified).

I’m not going to cover build integration here, but check out JUnitBenchmarks: Build Integration for that information.

What do the results actually mean?

I’ll pretty much recreate here what’s documented on the library’s Result class.

Let’s look at these results…

[measured 20 out of 20 rounds, threads: 1 (sequential)]
round: 1.21 [+- 0.97], round.block: 0.00 [+- 0.00], 
round.gc: 0.00 [+- 0.00], 
GC.calls: 4, GC.time: 0.22, 
time.total: 24.27, time.warmup: 0.00, 
time.bench: 24.27

We’ve already seen that the first line tells us how many times the test was run, and how many of those runs were warm-ups. It also tells us how many threads were used in this benchmark.

round tells us the average round time in seconds (hence each round in the example took, on average, 1.21 seconds with a stddev of +/- 0.97 seconds).
round.block tells us the average (and stddev) time per round that the benchmarked threads spent blocked; in this example there’s no concurrency, hence 0.00 (see the sketch after this list).
round.gc tells us the average and stddev of the GC time per round.
GC.calls tells us the number of times GC was invoked (in this example 4 times).
GC.time tells us the accumulated time taken by those GC invocations (0.22 seconds in this example).
time.total tells us the total time, covering warm-up, benchmark rounds and GC overhead.
time.warmup tells us the total warm-up time, including GC overhead.
time.bench tells us the benchmark time proper, i.e. the total minus the warm-up (hence 24.27 here).
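
To see round.block become anything other than zero you’d need a concurrent benchmark. If I have the annotation fields right, @BenchmarkOptions also accepts a concurrency setting, so something along these lines should spread the rounds across multiple threads (the field name and the test itself are my assumptions, not taken from the earlier example):

// Assumes @BenchmarkOptions has a 'concurrency' field, as I believe it does;
// four threads share the 20 benchmark rounds between them
@BenchmarkOptions(benchmarkRounds = 20, warmupRounds = 5, concurrency = 4)
@Test
public void testSaveConcurrently() {
    // our code
}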

Caveats

Apart from the obvious caveat that this library has been marked as deprecated (though I feel it’s still useful), when benchmarking you have to be aware that the results may depend on outside factors, such as the memory available, hard disk/SSD speed if the tests include any file I/O, network latency and so on. Such figures are best seen as approximations of performance.

Also, there seems to be no way to “fail” a test, for example if it exceeds a specified time or triggers more than some number of GCs, so treat these results as informational.
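
If you do want a hard limit alongside the benchmark figures, one workaround is to time the code yourself and assert on it within the test; note this only checks each individual execution, not the averaged benchmark result, and the 500 ms budget below is purely illustrative:

@BenchmarkOptions(benchmarkRounds = 20, warmupRounds = 5)
@Test
public void testSaveWithinBudget() {
    long start = System.nanoTime();

    // our code

    long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
    // Fail this round if the single execution exceeded our (illustrative) 500 ms budget
    org.junit.Assert.assertTrue(
        "save took " + elapsedMillis + " ms", elapsedMillis < 500);
}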