performance

Differences

This shows you the differences between two versions of the page.

performance [2016/11/24 16:25] – [Fayalite-FIST] ggibb
performance [2020/11/10 13:29] (current) – removed Noctua LiH-HFX results because faulty rschade
Line 5: Line 5:
 The purpose of the CP2K benchmark suite is to provide performance which can be used to guide users towards the best configuration (e.g. machine, number of MPI processors, number of OpenMP threads) for a particular problem, and give a good estimation for the parallel performance of the code for different types of method. Five benchmarks are provided: ''H2O-64'', ''Fayalite-FIST'', ''LiH-HFX'', ''H2O-DFT-LS'' and ''H2O-64-RI-MP2''. Descriptions of each benchmark along with performance figures are below.
 
-We encourage you to contribute benchmark results from your own local cluster or HPC system - just run the inputs and add timings in the relevant sections below.  Python scripts for generating the scaling graphs are provided [[src>cp2k/tools/benchmark_plots/]].  Please also update the [[performance:systems|list of machines]] for which benchmark data is provided.
+We encourage you to contribute benchmark results from your own local cluster or HPC system - just run the inputs and add timings in the relevant sections below.  Python scripts for generating the scaling graphs are provided [[src>tools/benchmark_plots/]].  Please also update the [[performance:systems|list of machines]] for which benchmark data is provided.
 
 If you have any questions or problems running benchmarks or using the scripts please contact Iain Bethune (<ibethune@epcc.ed.ac.uk>).
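
Timings contributed for a benchmark are commonly summarised as speedup and parallel efficiency relative to the smallest run. The following is a minimal sketch of that arithmetic only; it is not the plotting code from the CP2K repository, the function name `scaling_table` is hypothetical, and the timings are placeholders rather than measured results.

```python
def scaling_table(timings):
    """Return (cores, time, speedup, efficiency) rows.

    timings: dict mapping core count -> wall time in seconds.
    Speedup and efficiency are measured against the smallest core count.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Ideal speedup equals the ratio of core counts.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, timings[cores], speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Placeholder timings for illustration only (NOT benchmark results).
    example = {24: 100.0, 48: 52.0, 96: 28.0}
    print(f"{'cores':>6} {'time(s)':>8} {'speedup':>8} {'eff.':>6}")
    for cores, time_s, speedup, eff in scaling_table(example):
        print(f"{cores:6d} {time_s:8.2f} {speedup:8.2f} {eff:6.2f}")
```

An efficiency well below 1.0 at higher core counts suggests that the fastest configuration is not necessarily the most economical one.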
Line 23: Line 23:
 === Description ===
 
-//Ab-initio// molecular dynamics of liquid water using the Born-Oppenheimer approach, using [[Quickstep]] DFT. Production quality settings for the basis sets (TZV2P) and the planewave cutoff (280 Ry) are chosen, and the Local Density Approximation (LDA) is used for the calculation of the Exchange-Correlation energy.  The configurations were generated by classical equilibration, and the initial guess of the electronic density is made based on Atomic Orbitals.  The system contains 64 water molecules (192 atoms, 512 electrons) in a 12.4 Å<sup>3</sup> cell.
+//Ab-initio// molecular dynamics of liquid water using the Born-Oppenheimer approach, using [[Quickstep]] DFT. Production quality settings for the basis sets (TZV2P) and the planewave cutoff (280 Ry) are chosen, and the Local Density Approximation (LDA) is used for the calculation of the Exchange-Correlation energy.  The configurations were generated by classical equilibration, and the initial guess of the electronic density is made based on Atomic Orbitals.  The system contains 64 water molecules (192 atoms, 512 electrons) in a 12.4 Å<sup>3</sup> cell and MD is run for 10 steps.
 
 === Availability ===
 
 The benchmark is available (along with other water systems) from the CP2K source distribution:
-[[src>cp2k/tests/QS/benchmark/]]
+[[src>benchmarks/QS/]]
 
 === Results ===
Line 34: Line 34:
 The best configurations are shown below. Click the links to see more detail.
 
-^ Machine Name ^ Architecture ^ Date ^ SVN Revision ^ Fastest time (s) ^ Configuration ^^ Detailed results ^
-| HECToR | Cray XE6 | 21/1/2014 | 13196 | 39.066 | 512 cores | 2 OMP threads per MPI task | [[performance:hector-h2o-64]] |
-| ARCHER | Cray XC30 | 8/1/2014 | 13473 | 18.11 | 576 cores | 1 OMP thread per MPI task | [[performance:archer-h2o-64]] |
-| Magnus | Cray XC40 | 22/10/2014 | 14377 | 17.275 | 384 cores | 1 OMP thread per MPI task | [[performance:magnus-h2o-64]] |
-| Piz Daint | Cray XC30 | 12/05/2015 | 15268 | 19.885 | 192 cores | 1 OMP thread per MPI task, no GPU | [[performance:piz-daint-h2o-64]] |
-| Cirrus | SGI ICE XA | 24/11/2016 | 17566 | 15.560 | 1152 cores | 9 OMP threads per MPI task | [[performance:cirrus-h2o-64]] |
+^ Machine Name ^ Architecture ^ Date ^ Git Commit ^ Fastest time (s) ^ Configuration ^^ Detailed results ^
+| HECToR    | Cray XE6   | 21/01/2014 | [[commit>82b8204]] | 39.066 | 512 cores | 2 OMP threads per MPI task | [[performance:hector-h2o-64]] |
+| ARCHER    | Cray XC30  | 08/01/2014 | [[commit>292a983]] | 18.11 | 576 cores | 1 OMP thread per MPI task | [[performance:archer-h2o-64]] |
+| Magnus    | Cray XC40  | 22/10/2014 | [[commit>27eacee]] | 17.275 | 384 cores | 1 OMP thread per MPI task | [[performance:magnus-h2o-64]] |
+| Piz Daint | Cray XC30  | 12/05/2015 | [[commit>f439118]] | 19.885 | 192 cores | 1 OMP thread per MPI task, no GPU | [[performance:piz-daint-h2o-64]] |
+| Cirrus    | SGI ICE XA | 24/11/2016 | [[commit>989a92c]] | 15.560 | 1152 cores | 9 OMP threads per MPI task | [[performance:cirrus-h2o-64]] |
+| Noctua    | Cray CS500 | 25/09/2019 | [[commit>9f58d81]] | 13.3 | 640 cores | 10 OMP threads per MPI task | [[performance:noctua-h2o-64]] |
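
The H2O-64 entries can be compared both by wall time and by core-seconds (fastest time multiplied by the number of cores). The sketch below is illustrative only: `parse_rows` is a hypothetical helper, not part of the CP2K tools, and the rows are transcribed from the revised H2O-64 table above.

```python
# Rows transcribed from the revised H2O-64 table (DokuWiki table syntax).
ROWS = """\
| HECToR    | Cray XE6   | 21/01/2014 | [[commit>82b8204]] | 39.066 | 512 cores | 2 OMP threads per MPI task | [[performance:hector-h2o-64]] |
| ARCHER    | Cray XC30  | 08/01/2014 | [[commit>292a983]] | 18.11 | 576 cores | 1 OMP thread per MPI task | [[performance:archer-h2o-64]] |
| Magnus    | Cray XC40  | 22/10/2014 | [[commit>27eacee]] | 17.275 | 384 cores | 1 OMP thread per MPI task | [[performance:magnus-h2o-64]] |
| Piz Daint | Cray XC30  | 12/05/2015 | [[commit>f439118]] | 19.885 | 192 cores | 1 OMP thread per MPI task, no GPU | [[performance:piz-daint-h2o-64]] |
| Cirrus    | SGI ICE XA | 24/11/2016 | [[commit>989a92c]] | 15.560 | 1152 cores | 9 OMP threads per MPI task | [[performance:cirrus-h2o-64]] |
| Noctua    | Cray CS500 | 25/09/2019 | [[commit>9f58d81]] | 13.3 | 640 cores | 10 OMP threads per MPI task | [[performance:noctua-h2o-64]] |
"""


def parse_rows(text):
    """Parse DokuWiki table rows into dicts (hypothetical helper)."""
    results = []
    for line in text.strip().splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        machine, _arch, _date, _commit, time_s, cores = cells[:6]
        n_cores = int(cores.split()[0])  # "512 cores" -> 512
        results.append({
            "machine": machine,
            "time_s": float(time_s),
            "cores": n_cores,
            "core_seconds": float(time_s) * n_cores,
        })
    return results


results = parse_rows(ROWS)
fastest = min(results, key=lambda r: r["time_s"])
cheapest = min(results, key=lambda r: r["core_seconds"])

if __name__ == "__main__":
    for r in sorted(results, key=lambda r: r["time_s"]):
        print(f"{r['machine']:<10} {r['time_s']:8.3f} s  "
              f"{r['cores']:5d} cores  {r['core_seconds']:10.1f} core-s")
    print("shortest wall time:", fastest["machine"])
    print("lowest core-seconds:", cheapest["machine"])
```

For these rows, Noctua has the shortest wall time, while Piz Daint uses the fewest core-seconds. The systems, dates and code versions differ, so the ranking is only a rough guide.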
 ==== Fayalite-FIST ====
  
Line 49: Line 51:
  
 The benchmark is available from the CP2K source distribution:
-[[src>cp2k/tests/Fist/benchmark/]]
+[[src>benchmarks/Fist/]]
 
 === Results ===
Line 55: Line 57:
 The best configurations are shown below. Click the links to see more detail.
 
-^ Machine Name ^ Architecture ^ Date ^ SVN Revision ^ Fastest time (s) ^ Configuration ^^ Detailed results ^
-| HECToR | Cray XE6 | 21/1/2014 | 13196 | 403.928 | 2048 cores | 4 OMP threads per MPI task | [[performance:hector-fayalite-fist]] |
-| ARCHER | Cray XC30 | 9/1/2014 | 13473 | 197.117 | 576 cores | 6 OMP threads per MPI task | [[performance:archer-fayalite-fist]] |
-| Magnus | Cray XC40 | 6/11/2014 | 14377 | 150.493 | 768 cores | 6 OMP threads per MPI task | [[performance:magnus-fayalite-fist]] |
-| Piz Daint | Cray XC30 | 12/05/2015 | 15268 | 207.972 | 512 cores | 2 OMP threads per MPI task, no GPU | [[performance:piz-daint-fayalite-fist]] |
-| Cirrus | SGI ICE XA | 24/11/2016 | 17566 | 166.192 | 576 cores | 2 OMP threads per MPI task | [[performance:cirrus-fayalite-fist]] |
+^ Machine Name ^ Architecture ^ Date ^ Git Commit ^ Fastest time (s) ^ Configuration ^^ Detailed results ^
+| HECToR    | Cray XE6   | 21/01/2014 | [[commit>82b8204]] | 403.928 | 2048 cores | 4 OMP threads per MPI task | [[performance:hector-fayalite-fist]] |
+| ARCHER    | Cray XC30  | 09/01/2014 | [[commit>292a983]] | 197.117 | 576 cores | 6 OMP threads per MPI task | [[performance:archer-fayalite-fist]] |
+| Magnus    | Cray XC40  | 06/11/2014 | [[commit>27eacee]] | 150.493 | 768 cores | 6 OMP threads per MPI task | [[performance:magnus-fayalite-fist]] |
+| Piz Daint | Cray XC30  | 12/05/2015 | [[commit>f439118]] | 207.972 | 512 cores | 2 OMP threads per MPI task, no GPU | [[performance:piz-daint-fayalite-fist]] |
+| Cirrus    | SGI ICE XA | 24/11/2016 | [[commit>989a92c]] | 166.192 | 576 cores | 2 OMP threads per MPI task | [[performance:cirrus-fayalite-fist]] |
+| Noctua    | Cray CS500 | 25/09/2019 | [[commit>9f58d81]] | 119.820 | 2560 cores | 10 OMP threads per MPI task | [[performance:noctua-fayalite-fist]] |
 
 ==== LiH-HFX ====
Line 66: Line 69:
 === Description ===
 
-This is a single-point energy calculation using [[Quickstep]] GAPW (Gaussian and Augmented Plane-Waves) with hybrid Hartree-Fock exchange. It consists of a 216 atom Lithium Hydride crystal with 432 electrons in a 12.3 Å<sup>3</sup> cell. These types of calculations are generally around one hundred times the computational cost of a standard local DFT calculation, although this can be reduced using the Auxiliary Density Matrix Method (ADMM). Uing OpenMP is of particular benefit here as the HFX implementation requires a large amount of memory to store partial integrals. By using several threads, fewer MPI processes share the available memory on the node and thus enough memory is available to avoid recomputing any integrals on-the-fly, improving performance.
+This is a single-point energy calculation using [[Quickstep]] GAPW (Gaussian and Augmented Plane-Waves) with hybrid Hartree-Fock exchange. It consists of a 216 atom Lithium Hydride crystal with 432 electrons in a 12.3 Å<sup>3</sup> cell. These types of calculations are generally around one hundred times the computational cost of a standard local DFT calculation, although this can be reduced using the Auxiliary Density Matrix Method (ADMM). Using OpenMP is of particular benefit here as the HFX implementation requires a large amount of memory to store partial integrals. By using several threads, fewer MPI processes share the available memory on the node and thus enough memory is available to avoid recomputing any integrals on-the-fly, improving performance.
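
The memory argument in this description can be made concrete with a little arithmetic. The sketch below assumes a fully populated node with one OpenMP thread per core; the function name `memory_per_rank_gb` and the node size (24 cores, 64 GB) are hypothetical and are not specifications taken from this page.

```python
def memory_per_rank_gb(node_memory_gb, cores_per_node, omp_threads):
    """Memory available to each MPI rank on one fully populated node.

    Assumes one OpenMP thread per core, so the number of ranks per node
    is cores_per_node / omp_threads.
    """
    if omp_threads < 1 or cores_per_node % omp_threads != 0:
        raise ValueError("omp_threads must divide cores_per_node")
    ranks_per_node = cores_per_node // omp_threads
    return node_memory_gb / ranks_per_node


if __name__ == "__main__":
    # Hypothetical node: 24 cores sharing 64 GB (an assumption for illustration).
    for threads in (1, 2, 4, 6, 8):
        mem = memory_per_rank_gb(64.0, 24, threads)
        print(f"{threads} threads/rank -> {24 // threads:2d} ranks/node, "
              f"{mem:5.2f} GB per rank")
```

Under these assumptions, going from 1 to 6 threads per rank raises the memory available to each rank sixfold, which is the headroom the HFX integral storage benefits from.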
  
 === Availability ===
 
-The benchmark is available from [[src>cp2k/tests/QS/benchmark_HFX/LiH/]].
+The benchmark is available from [[src>benchmarks/QS_LiH_HFX/]].
 
 === Results ===
Line 76: Line 79:
 The best configurations are shown below. Click the links to see more detail.
 
-^ Machine Name ^ Architecture ^ Date ^ SVN Revision ^ Fastest time (s) ^ Configuration ^^ Detailed results ^
-| HECToR | Cray XE6 | 21/1/2014 | 13196 | 121.362 | 65536 cores | 8 OMP threads per MPI task | [[performance:hector-lih-hfx]] |
-| ARCHER | Cray XC30 | 9/1/2014 | 13473 | 51.172 | 49152 cores | 6 OMP threads per MPI task | [[performance:archer-lih-hfx]] |
-| Magnus | Cray XC40 | 10/11/2014 | 14377 | 62.075 | 24576 cores | 4 OMP threads per MPI task | [[performance:magnus-lih-hfx]] |
-| Piz Daint | Cray XC30 | 12/05/2015 | 15268 | 66.051 | 32768 cores | 4 OMP threads per MPI task, no GPU | [[performance:piz-daint-lih-hfx]] |
+^ Machine Name ^ Architecture ^ Date ^ Git Commit ^ Fastest time (s) ^ Configuration ^^ Detailed results ^
+| HECToR    | Cray XE6   | 21/01/2014 | [[commit>82b8204]] (*) | 121.362 | 65536 cores | 8 OMP threads per MPI task | [[performance:hector-lih-hfx]] |
+| ARCHER    | Cray XC30  | 09/01/2014 | [[commit>292a983]] (*) | 51.172 | 49152 cores | 6 OMP threads per MPI task | [[performance:archer-lih-hfx]] |
+| Magnus    | Cray XC40  | 10/11/2014 | [[commit>27eacee]] (*) | 62.075 | 24576 cores | 4 OMP threads per MPI task | [[performance:magnus-lih-hfx]] |
+| Piz Daint | Cray XC30  | 12/05/2015 | [[commit>f439118]] | 66.051 | 32768 cores | 4 OMP threads per MPI task, no GPU | [[performance:piz-daint-lih-hfx]] |
+| Cirrus    | SGI ICE XA | 24/11/2016 | [[commit>989a92c]] | 483.676 | 2016 cores | 6 OMP threads per MPI task | [[performance:cirrus-lih-hfx]] |
 
+(*) Prior to r14945, a bug resulted in an underestimation of the number of ERIs which should be computed (by roughly 50% for this benchmark).  Therefore these results cannot be compared directly with later ones.
 ==== H2O-DFT-LS ====
  
Line 91: Line 95:
 === Availability ===
 
-The benchmark input file is {{performance:h2o-dft-ls-4.inp.gz|available here}}.  It is a slightly modified version of the more general one in the CP2K SVN at, where the problem size can be tuned by a parameter NREP: [[src>cp2k/tests/QS/benchmark_DM_LS/H2O-dft-ls.inp]].
+The benchmark input file used to generate these results is {{performance:h2o-dft-ls-4.inp.gz|available here}}.
+
+It is a slightly modified version of the more general one in the CP2K GitHub at [[src>benchmarks/QS_DM_LS/H2O-dft-ls.inp]], where the problem size can be tuned by a parameter NREP.
 
 === Results ===
Line 97: Line 103:
 The best configurations are shown below. Click the links to see more detail.
 
-^ Machine Name ^ Architecture ^ Date ^ SVN Revision ^ Fastest time (s) ^ Configuration ^^ Detailed results ^
-| HECToR | Cray XE6 | 16/1/2014 | 13196 | 98.256 | 65536 cores | 8 OMP threads per MPI task | [[performance:hector-h2o-dft-ls]] |
-| ARCHER | Cray XC30 | 8/1/2014 | 13473 | 28.476 | 49152 cores | 4 OMP threads per MPI task | [[performance:archer-h2o-dft-ls]] |
-| Magnus | Cray XC40 | 3/12/2014 | 14377 | 30.921 | 24576 cores | 2 OMP threads per MPI task | [[performance:magnus-h2o-dft-ls]] |
-| Piz Daint | Cray XC30 | 12/05/2015 | 15268 | 27.900 | 32768 cores | 2 OMP threads per MPI task, no GPU | [[performance:piz-daint-h2o-dft-ls]] |
+^ Machine Name ^ Architecture ^ Date ^ Git Commit ^ Fastest time (s) ^ Configuration ^^ Detailed results ^
+| HECToR    | Cray XE6   | 16/01/2014 | [[commit>82b8204]] | 98.256 | 65536 cores | 8 OMP threads per MPI task | [[performance:hector-h2o-dft-ls]] |
+| ARCHER    | Cray XC30  | 08/01/2014 | [[commit>292a983]] | 28.476 | 49152 cores | 4 OMP threads per MPI task | [[performance:archer-h2o-dft-ls]] |
+| Magnus    | Cray XC40  | 03/12/2014 | [[commit>27eacee]] | 30.921 | 24576 cores | 2 OMP threads per MPI task | [[performance:magnus-h2o-dft-ls]] |
+| Piz Daint | Cray XC30  | 12/05/2015 | [[commit>f439118]] | 27.900 | 32768 cores | 2 OMP threads per MPI task, no GPU | [[performance:piz-daint-h2o-dft-ls]] |
+| Cirrus    | SGI ICE XA | 24/11/2016 | [[commit>989a92c]] | 543.032 | 2016 cores | 2 OMP threads per MPI task | [[performance:cirrus-h2o-dft-ls]] |
+| Noctua    | Cray CS500 | 25/09/2019 | [[commit>9f58d81]] | 37.730 | 10240 cores | 10 OMP threads per MPI task | [[performance:noctua-h2o-dft-ls]] |
 
 ==== H2O-64-RI-MP2 ====
Line 112: Line 119:
 === Availability ===
 
-The benchmark is in the CP2K SVN at: [[src>cp2k/tests/QS/benchmark_mp2_rpa/64-H2O/]].
+The benchmark is in the CP2K GitHub at: [[src>benchmarks/QS_mp2_rpa/64-H2O/]].
 
 === Results ===
Line 118: Line 125:
 The best configurations are shown below. Click the links to see more detail.
 
-^ Machine Name ^ Architecture ^ Date ^ SVN Revision ^ Fastest time (s) ^ Configuration ^^ Detailed results ^
-| HECToR | Cray XE6 | 13/1/2014 | 13196 | 141.633 | 49152 cores | 8 OMP threads per MPI task | [[performance:hector-h2o-64-ri-mp2]] |
-| ARCHER | Cray XC30 | 9/1/2014 | 13473 | 83.945 | 36864 cores | 4 OMP threads per MPI task | [[performance:archer-h2o-64-ri-mp2]] |
-| Magnus | Cray XC40 | 4/11/2014 | 14377 | 63.891 | 24576 cores | 6 OMP threads per MPI task | [[performance:magnus-h2o-64-ri-mp2]] |
-| Piz Daint | Cray XC30 | 12/05/2015 | 15268 | 48.15 | 32768 cores | 8 OMP threads per MPI task, no GPU | [[performance:piz-daint-h2o-64-ri-mp2]] |
+^ Machine Name ^ Architecture ^ Date ^ Git Commit ^ Fastest time (s) ^ Configuration ^^ Detailed results ^
+| HECToR    | Cray XE6   | 13/01/2014 | [[commit>82b8204]] | 141.633 | 49152 cores | 8 OMP threads per MPI task | [[performance:hector-h2o-64-ri-mp2]] |
+| ARCHER    | Cray XC30  | 09/01/2014 | [[commit>292a983]] | 83.945 | 36864 cores | 4 OMP threads per MPI task | [[performance:archer-h2o-64-ri-mp2]] |
+| Magnus    | Cray XC40  | 04/11/2014 | [[commit>27eacee]] | 63.891 | 24576 cores | 6 OMP threads per MPI task | [[performance:magnus-h2o-64-ri-mp2]] |
+| Piz Daint | Cray XC30  | 12/05/2015 | [[commit>f439118]] | 48.15 | 32768 cores | 8 OMP threads per MPI task, no GPU | [[performance:piz-daint-h2o-64-ri-mp2]] |
+| Cirrus    | SGI ICE XA | 24/11/2016 | [[commit>989a92c]] | 303.571 | 2016 cores | 1 OMP thread per MPI task | [[performance:cirrus-h2o-64-ri-mp2]] |
+| Noctua    | Cray CS500 | 25/09/2019 | [[commit>9f58d81]] | 82.571 | 10240 cores | 2 OMP threads per MPI task | [[performance:noctua-h2o-64-ri-mp2]] |
performance.1480004739.txt.gz · Last modified: 2020/08/21 10:15 (external edit)