A fast and precise DFT wavelet code

BigDFT Benchs

Revision as of 20:48, 9 October 2013 by Genovese (Talk | contribs)


This page collects a set of runs performed with the BigDFT code to test its behaviour on different architectures. Most of the runs presented here are dummy runs, in the sense that the input parameters were tuned for rapid execution on different platforms, so no physical results can be extracted from the outputs. However, each run corresponds to a true physical calculation that can be performed with BigDFT. These benchmarks serve several purposes, all aimed at extracting the code (or machine) behaviour under different parameters. For example, the quantities measured in these tests can be:

  • Walltime
  • Scalability (parallel efficiency)
  • Speedup (for example by comparing CPU and GPU runs)
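
The quantities listed above can be derived from walltimes alone. The following is a minimal sketch, not the script from the BigDFT scalability tutorial: the function name `strong_scaling` and the sample walltimes are hypothetical, and the smallest run is assumed to be the reference.

```python
def strong_scaling(walltimes):
    """Compute speedup and parallel efficiency for a strong-scaling series.

    walltimes: dict mapping core count -> walltime in seconds.
    The run with the fewest cores is taken as the reference (an assumption).
    Returns a list of (cores, walltime, speedup, efficiency) tuples.
    """
    base_cores = min(walltimes)
    base_time = walltimes[base_cores]
    rows = []
    for cores in sorted(walltimes):
        # Speedup: how many times faster than the reference run.
        speedup = base_time / walltimes[cores]
        # Efficiency: measured speedup divided by the ideal one (cores / base_cores).
        efficiency = speedup * base_cores / cores
        rows.append((cores, walltimes[cores], speedup, efficiency))
    return rows


# Made-up walltimes, for illustration only (not measured BigDFT data).
sample = {16: 800.0, 32: 420.0, 64: 230.0, 128: 140.0}
for cores, t, s, e in strong_scaling(sample):
    print(f"{cores:5d} cores  {t:8.1f} s  speedup {s:5.2f}  efficiency {e:6.1%}")
```

An efficiency close to 100% means the walltime is roughly halved each time the core count doubles.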

As a reference, data can be collected with the scripts provided in the tutorial about BigDFT scalability.

The idea is to keep this page updated so that any new benchmark can be commented on.

Co - Porphyrine on Graphene Sheet (265 atoms)

"Co - Porphyrine on Graphene Sheet"

This benchmark is interesting because the number of orbitals is large, so strong scalability can be tested over a wide range of core counts. The run uses surface boundary conditions (Surface BC), so the system is periodic along the graphene sheet dimensions. K-points are added in the surface direction. Different machines have been used for this test. The input files for the benchmark can be found here.

Curie Thin nodes, TGCC, France

The machine is a traditional CPU-only machine based on Intel cores. For instructions on how to compile BigDFT on it, see this link.

Once the input files have been run on different numbers of processors, data can be collected following the scalability tutorial.

"Porphy Bench on Curie"

As can be seen, the figure shows very good scalability over a wide range of processor counts. For the largest runs the communication is dominated by MPI_ALLTOALLV, which has known limitations on large numbers of cores. On other platforms the network performance can be better.

Todi XK7, CSCS supercomputer

The same benchmark has been run on the hybrid GPU Cray machine installed at the Swiss National Supercomputing Centre (CSCS) in Manno. (See this link for compilation instructions.)

First, we launched the BigDFT computation on the CPU part only. The run can of course use combined MPI-OMP parallelisation. Then, GPU runs can be used in conjunction with MPI and OpenMP to take full advantage of the MPI + OMP + OpenCL code architecture of BigDFT. As this system is rich in convolutions, the speedup that can be achieved by using GPUs is quite interesting.

"Porphy Bench on Todi - CPU only"
"Porphy Bench on Todi - CPU and GPU comparison"

A full summary of all the runs made on Todi can be found in the last figure, where speedups are measured with respect to the smallest run (8 MPI tasks, multithreaded).
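
A summary of this kind, mixing CPU-only and GPU runs against a single reference, could be assembled as in the sketch below. The `Run` record, the function `speedups_vs_smallest`, the thread counts, and all walltimes are hypothetical. The reference is assumed to be the CPU-only run with the fewest MPI tasks, since the page does not state whether the 8-task reference run used the GPUs.

```python
from dataclasses import dataclass


@dataclass
class Run:
    label: str
    mpi_tasks: int
    omp_threads: int
    gpu: bool
    walltime: float  # seconds


def speedups_vs_smallest(runs):
    """Return {label: speedup} relative to the smallest run.

    Assumption: the reference is the CPU-only run with the fewest MPI tasks.
    """
    cpu_runs = [r for r in runs if not r.gpu]
    reference = min(cpu_runs, key=lambda r: r.mpi_tasks)
    return {r.label: reference.walltime / r.walltime for r in runs}


# Made-up numbers, for illustration only (not the Todi measurements).
runs = [
    Run("cpu-8", 8, 8, False, 1000.0),
    Run("cpu-32", 32, 8, False, 300.0),
    Run("gpu-8", 8, 8, True, 400.0),
]
for label, s in speedups_vs_smallest(runs).items():
    print(f"{label:8s} speedup {s:5.2f}")
```

Using one reference for every run puts CPU and GPU results on the same axis, so the gain from adding GPUs and the gain from adding nodes can be compared directly.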

"Porphy Bench on Todi - full resources usage"