performance

performance [2017/03/24 16:13] 79.64.85.68 performance [2019/04/28 13:19] – [H2O-64-RI-MP2] 2a02:908:2221:ede0:7194:cbb5:ee3c:6a7d
The purpose of the CP2K benchmark suite is to provide performance data that can guide users towards the best configuration (e.g. machine, number of MPI processes, number of OpenMP threads) for a particular problem, and to give a good estimate of the parallel performance of the code for different types of method. Five benchmarks are provided: ''H2O-64'', ''Fayalite-FIST'', ''LiH-HFX'', ''H2O-DFT-LS'' and ''H2O-64-RI-MP2''. Descriptions of each benchmark, along with performance figures, are given below.
  
We encourage you to contribute benchmark results from your own local cluster or HPC system: just run the inputs and add your timings to the relevant sections below. Python scripts for generating the scaling graphs are provided in [[src>tools/benchmark_plots/]]. Please also update the [[performance:systems|list of machines]] for which benchmark data is provided.
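
When adding a result, each row follows the column order of the results tables: machine name, architecture, date, revision, fastest time, the two configuration cells (cores and threading), and a link to the detailed results. The helper below is a minimal, hypothetical sketch for formatting such a row; its name and parameters are illustrative, and the link target simply follows the ''performance:<machine>-<benchmark>'' naming pattern seen in the existing tables, which is an observed convention rather than a documented wiki rule.

```python
def format_result_row(machine: str, architecture: str, date: str,
                      revision: str, time_s: float, cores: int,
                      omp_threads: int, benchmark: str,
                      notes: str = "") -> str:
    """Build one DokuWiki table row in the column order used on this page.

    Hypothetical helper: the link target assumes the observed
    'performance:<machine>-<benchmark>' naming pattern.
    """
    thread_word = "thread" if omp_threads == 1 else "threads"
    config = f"{omp_threads} OMP {thread_word} per MPI task"
    if notes:  # optional extra remark, e.g. "no GPU"
        config += f", {notes}"
    # Lower-case the machine and benchmark names and replace spaces by hyphens.
    slug = f"{machine}-{benchmark}".lower().replace(" ", "-")
    cells = [machine, architecture, date, revision, f"{time_s:g}",
             f"{cores} cores", config, f"[[performance:{slug}]]"]
    return "| " + " | ".join(cells) + " |"


if __name__ == "__main__":
    print(format_result_row("Noctua", "Cray CS500", "27/04/2019", "3cf5f249",
                            16.5, 320, 1, "H2O-64"))
```

Because the ''{time_s:g}'' format drops trailing zeros (15.560 becomes 15.56), pass the time as a pre-formatted string instead if you want a fixed number of decimals.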
  
If you have any questions or problems running the benchmarks or using the scripts, please contact Iain Bethune (<ibethune@epcc.ed.ac.uk>).
=== Description ===
  
//Ab-initio// molecular dynamics of liquid water using the Born-Oppenheimer approach with [[Quickstep]] DFT. Production-quality settings are chosen for the basis sets (TZV2P) and the planewave cutoff (280 Ry), and the Local Density Approximation (LDA) is used for the exchange-correlation energy. The configurations were generated by classical equilibration, and the initial guess for the electronic density is based on atomic orbitals. The system contains 64 water molecules (192 atoms, 512 electrons) in a cubic cell with 12.4 Å edges, and MD is run for 10 steps.
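
The quoted atom and electron counts can be cross-checked with simple arithmetic. The sketch below assumes GTH pseudopotentials, so that only valence electrons are counted (6 for O, 1 for H), and treats 12.4 Å as the edge length of a cubic cell; under these assumptions the density comes out at about 1 g/cm³, as expected for liquid water.

```python
# Sanity check of the H2O-64 system size.
# Assumptions: GTH pseudopotentials (valence electrons only: O = 6, H = 1)
# and a cubic cell with 12.4 Angstrom edges.
AVOGADRO = 6.02214076e23   # 1/mol
M_WATER = 18.015           # g/mol
N_MOLECULES = 64
EDGE_ANGSTROM = 12.4

atoms = N_MOLECULES * 3                      # 1 O + 2 H per molecule
valence_electrons = N_MOLECULES * (6 + 2 * 1)
volume_cm3 = (EDGE_ANGSTROM * 1e-8) ** 3     # 1 Angstrom = 1e-8 cm
density = N_MOLECULES * M_WATER / AVOGADRO / volume_cm3  # g/cm^3

print(atoms, valence_electrons, f"{density:.2f}")  # 192 512 1.00
```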
  
=== Availability ===
  
The benchmark is available (along with other water systems) from the CP2K source distribution:
[[src>tests/QS/benchmark/]]
  
=== Results ===
| Piz Daint | Cray XC30 | 12/05/2015 | 15268 | 19.885 | 192 cores | 1 OMP thread per MPI task, no GPU | [[performance:piz-daint-h2o-64]] |
| Cirrus | SGI ICE XA | 24/11/2016 | 17566 | 15.560 | 1152 cores | 9 OMP threads per MPI task | [[performance:cirrus-h2o-64]] |
| Noctua | Cray CS500 | 27/04/2019 | 3cf5f249 | 16.5 | 320 cores | 1 OMP thread per MPI task | [[performance:noctua-h2o-64]] |

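
The "Fastest time" column does not show how much hardware each run consumed. As a rough, hypothetical comparison (not a metric defined on this page), the three H2O-64 rows above can be converted to core-seconds, i.e. cores multiplied by wall time. This ignores differences in node architecture, threading and interconnect, so it is only an indicator.

```python
# Core-seconds (cores x fastest wall time) for the three H2O-64 rows above.
# A rough resource measure only; it ignores node type, threading and network.
rows = {
    "Piz Daint": (192, 19.885),
    "Cirrus": (1152, 15.560),
    "Noctua": (320, 16.5),
}

core_seconds = {name: cores * t for name, (cores, t) in rows.items()}

# Print from the cheapest to the most expensive run.
for name, cs in sorted(core_seconds.items(), key=lambda kv: kv[1]):
    print(f"{name}: {cs:.0f} core-s")
```

On these figures Piz Daint uses about 3818 core-seconds, Noctua about 5280 and Cirrus about 17925: Cirrus reaches the lowest time, but by using many more cores.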
==== Fayalite-FIST ====

The benchmark is available from the CP2K source distribution:
[[src>tests/Fist/benchmark/]]

=== Results ===
| Piz Daint | Cray XC30 | 12/05/2015 | 15268 | 207.972 | 512 cores | 2 OMP threads per MPI task, no GPU | [[performance:piz-daint-fayalite-fist]] |
| Cirrus | SGI ICE XA | 24/11/2016 | 17566 | 166.192 | 576 cores | 2 OMP threads per MPI task | [[performance:cirrus-fayalite-fist]] |
| Noctua | Cray CS500 | 27/04/2019 | 3cf5f249 | 139.177 | 320 cores | 1 OMP thread per MPI task | [[performance:noctua-fayalite-fist]] |

==== LiH-HFX ====

=== Availability ===

The benchmark is available from [[src>tests/QS/benchmark_HFX/LiH/]].

=== Results ===

^ Machine Name ^ Architecture ^ Date ^ SVN Revision ^ Fastest time (s) ^ Configuration ^^ Detailed results ^
| HECToR | Cray XE6 | 21/1/2014 | 13196(*) | 121.362 | 65536 cores | 8 OMP threads per MPI task | [[performance:hector-lih-hfx]] |
| ARCHER | Cray XC30 | 9/1/2014 | 13473(*) | 51.172 | 49152 cores | 6 OMP threads per MPI task | [[performance:archer-lih-hfx]] |
| Magnus | Cray XC40 | 10/11/2014 | 14377(*) | 62.075 | 24576 cores | 4 OMP threads per MPI task | [[performance:magnus-lih-hfx]] |
| Piz Daint | Cray XC30 | 12/05/2015 | 15268 | 66.051 | 32768 cores | 4 OMP threads per MPI task, no GPU | [[performance:piz-daint-lih-hfx]] |
| Cirrus | SGI ICE XA | 24/11/2016 | 17566 | 483.676 | 2016 cores | 6 OMP threads per MPI task | [[performance:cirrus-lih-hfx]] |
| Noctua | Cray CS500 | 27/04/2019 | 3cf5f249 | 203.092 | 5120 cores | 1 OMP thread per MPI task | [[performance:noctua-lih-hfx]] |

(*) Before r14945, a bug caused the number of electron repulsion integrals (ERIs) that should be computed to be underestimated (by roughly 50% for this benchmark). These results therefore cannot be compared directly with later ones.
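
To decide whether a row is affected by the r14945 note, a hypothetical helper can compare its revision with that threshold. It assumes that a purely numeric revision is an SVN revision number, and that any other identifier, such as the git hash ''3cf5f249'' in the Noctua row, comes from after the fix.

```python
# SVN revision that, per the note above, fixed the ERI count.
FIX_REVISION = 14945


def predates_eri_fix(revision: str) -> bool:
    """Return True if a benchmark revision is older than r14945.

    Assumption: a purely numeric string is an SVN revision number, and any
    other string (e.g. a git commit hash) comes from after the fix.
    """
    return revision.isdigit() and int(revision) < FIX_REVISION


if __name__ == "__main__":
    # Revisions taken from the LiH-HFX table.
    for rev in ["13196", "13473", "14377", "15268", "17566", "3cf5f249"]:
        print(rev, predates_eri_fix(rev))
```

Applied to the LiH-HFX table, this flags exactly the HECToR, ARCHER and Magnus rows.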
==== H2O-DFT-LS ====

The benchmark input file used to generate these results is {{performance:h2o-dft-ls-4.inp.gz|available here}}.
  
It is a slightly modified version of the more general input in the CP2K SVN repository, [[src>tests/QS/benchmark_DM_LS/H2O-dft-ls.inp]], in which the problem size can be tuned with the parameter ''NREP''.
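
As a rough illustration of how ''NREP'' controls the problem size, the sketch below assumes that the input replicates a base cell ''NREP'' times along each of the three cell vectors, so the atom count grows with the cube of ''NREP''. The function name and the base atom count are hypothetical; the base count is not given on this page.

```python
def replicated_atoms(base_atoms: int, nrep: int) -> int:
    """Atoms in the system after replicating a base cell nrep times
    along each of the three cell vectors (assumed replication scheme)."""
    if nrep < 1:
        raise ValueError("nrep must be a positive integer")
    return base_atoms * nrep ** 3


if __name__ == "__main__":
    base = 100  # hypothetical base-cell atom count, for illustration only
    for nrep in (1, 2, 3, 4):
        print(nrep, replicated_atoms(base, nrep))
```

Under this assumption, doubling ''NREP'' multiplies the number of atoms by eight.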
  
=== Results ===
| Piz Daint | Cray XC30 | 12/05/2015 | 15268 | 27.900 | 32768 cores | 2 OMP threads per MPI task, no GPU | [[performance:piz-daint-h2o-dft-ls]] |
| Cirrus | SGI ICE XA | 24/11/2016 | 17566 | 543.032 | 2016 cores | 2 OMP threads per MPI task | [[performance:cirrus-h2o-dft-ls]] |
| Noctua | Cray CS500 | 27/04/2019 | 3cf5f249 | 77.413 | 5120 cores | 1 OMP thread per MPI task | [[performance:noctua-h2o-dft-ls]] |

==== H2O-64-RI-MP2 ====

=== Availability ===

The benchmark is available in the CP2K SVN repository at [[src>tests/QS/benchmark_mp2_rpa/64-H2O/]].

=== Results ===
| Piz Daint | Cray XC30 | 12/05/2015 | 15268 | 48.15 | 32768 cores | 8 OMP threads per MPI task, no GPU | [[performance:piz-daint-h2o-64-ri-mp2]] |
| Cirrus | SGI ICE XA | 24/11/2016 | 17566 | 303.571 | 2016 cores | 1 OMP thread per MPI task | [[performance:cirrus-h2o-64-ri-mp2]] |
| Noctua | Cray CS500 | 27/04/2019 | 3cf5f249 | 101.617 | 5120 cores | 1 OMP thread per MPI task | [[performance:noctua-h2o-64-ri-mp2]] |

performance.txt · Last modified: 2020/11/10 13:29 by rschade