Description of experiments

Simulators

We compare the output of two simulators: (1) ppcmem (in this note, “Operational”), which implements the model introduced in Understanding POWER Multiprocessors, and (2) “Axiom”, an implementation of the proposed axiomatic model that re-uses the ppcmem front-end.

Test base

Our present test base results from aggregating several test bases.

As a result, the present test base includes 4503 tests. We provide an archive of all test sources, which are valid input both for the simulator ppcmem and for litmus, a tool that runs tests on hardware. We give a summary of the model output (Allow/Forbid) for all tests as the kind file k.txt.
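For reference, a kind file is, roughly, a plain list pairing each test name with its model verdict, one per line. A hypothetical excerpt (the test names are illustrative, not taken from k.txt):

  SB Allow
  MP Allow
  MP+lwsyncs Forbid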

Most of our tests are produced by one of the generators of the diy tool suite from cycles of candidate relaxations, a concise and precise means of describing violations of sequential consistency, and thus of generating relevant tests for memory-model testing. The cycles are available for 5544 tests.
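For instance, the classic store-buffering shape corresponds to the cycle PodWR Fre PodWR Fre. A PPC litmus test of that shape, in the syntax accepted by ppcmem and litmus, looks roughly as follows (a sketch: register choices and layout may differ from the tests actually generated by diy):

  PPC SB
  "PodWR Fre PodWR Fre"
  {
  0:r2=x; 0:r4=y;
  1:r2=y; 1:r4=x;
  }
   P0           | P1           ;
   li r1,1      | li r1,1      ;
   stw r1,0(r2) | stw r1,0(r2) ;
   lwz r3,0(r4) | lwz r3,0(r4) ;
  exists (0:r3=0 /\ 1:r3=0)

Here each thread writes one location and reads the other; the final condition asks whether both reads can miss the other thread's write, a non-sequentially-consistent outcome that is allowed on POWER.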

Experimental setting

The “Operational” and “Axiom” simulators execute OCaml code derived from lem formal specifications. As a result, the simulators can be trusted to follow the model specifications. However, our executable code is not optimised for speed, and the simulators do not always terminate within reasonable time and memory bounds on inputs that feature three or more processors, complex code sequences, or numerous memory accesses.

We thus resort to distributed execution of the simulators on two clusters of AMD Opteron and Intel Xeon processors. Overall we have more than 500 cores available, which we share with other users. Our experiments lasted weeks and consisted of repeatedly running the simulators on the complete test base under increasing processor-time and memory limits. The following table summarises our experiments:

               N     avg (sec.)  geom (sec.)  max (sec.)  effort (days)  memory (GB)
  Operational  3565  2562.21     126.26       2.4e+05     1024.18        40.0
  Axiom        4480  1394.14     20.57        2.3e+05     82.17          4.0

The “N” column shows how many tests eventually completed successfully within the allocated processor-time and memory limits. We see that we achieved an (almost) complete run of the “Axiom” simulator. We also give a few statistics: the “avg” (resp. “geom”) column shows the arithmetic (resp. geometric) mean of the successful run times, while the “max” column shows the time of the test that took longest to complete successfully.
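Recall that, for successful run times t_1, …, t_N, the geometric mean is (t_1 × ⋯ × t_N)^(1/N), whereas the arithmetic mean is (t_1 + ⋯ + t_N)/N. The geometric mean is far less sensitive to the few very long runs (note the maxima above 2e+05 seconds) that dominate the arithmetic mean, which explains why the “geom” column is well below the “avg” column for both simulators.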

The “effort” column shows the total CPU time allocated to running the simulators (including runs that failed by hitting the intentional resource limits): the total computing effort amounts to 1106.35 days of machine time (1024.18 days for “Operational” plus 82.17 days for “Axiom”). Finally, the “memory” column shows the maximum memory limit we used. Altogether, these figures demonstrate that our implementation of “Axiom” is one order of magnitude more efficient than that of “Operational”. As the implementation techniques are similar, we can interpret this result at the model level: “Axiom” is inherently easier to implement efficiently than “Operational”. Also notice that some preliminary experiments suggest the possibility of dramatic efficiency improvements for the “Axiom” model by using SMT/SAT solving.

