howto:libcusmm [2019/02/08 13:50] – describe arguments for tune_setup.py – sjakobovits
====== Howto Optimize Cuda Kernels for Libcusmm ======
**Python version required:** python3.6

If you are about to autotune parameters for a new GPU (i.e. a GPU for which there are no autotuned parameters yet), please first follow [[https://...|the instructions for adding support for a new GPU]].

=== Step 1: Go to the libcusmm directory ===
<code>
$ cd dbcsr/src/acc/libsmm_acc/libcusmm
</code>
=== Step 2: Adapt tune_setup.py to your environment ===
The script ''tune_setup.py'' generates job files for the batch system. You will probably have to adapt its ''gen_jobfile'' function to your environment:
<code python>
...
def gen_jobfile(outdir, m, n, k):
    t = "/tune_%dx%dx%d" % (m, n, k)
    all_exe_src = [os.path.basename(fn) for fn in glob(outdir + t + "_*_main.cu")]
    all_exe = sorted([fn.replace("_main.cu", "") for fn in all_exe_src])

    # job-file header: the "#SBATCH ..." directives and module loads
    # are machine-specific and must be adapted
    output = "#!/bin/bash -l\n"
    output += "#SBATCH ...\n"
    ...
    output += "cd $SLURM_SUBMIT_DIR\n"
    ...
    for exe in all_exe:
        output += ...
...
</code>
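To make the structure of ''gen_jobfile'' concrete, here is a minimal self-contained sketch. The ''#SBATCH'' values and the ''srun''/''make'' invocation are placeholders, not the actual settings used by ''tune_setup.py''; only the file-discovery logic mirrors the excerpt above.

```python
import os
from glob import glob

def gen_jobfile_sketch(outdir, m, n, k):
    # Collect the generated benchmarking executables for this blocksize
    # (file layout assumed: tune_MxNxK_exeN_main.cu).
    t = "/tune_%dx%dx%d" % (m, n, k)
    all_exe_src = [os.path.basename(fn) for fn in glob(outdir + t + "_*_main.cu")]
    all_exe = sorted([fn.replace("_main.cu", "") for fn in all_exe_src])

    # Job-file header: all #SBATCH values below are placeholders,
    # adapt them to your batch system.
    output = "#!/bin/bash -l\n"
    output += "#SBATCH --nodes=%d\n" % max(len(all_exe), 1)
    output += "#SBATCH --time=00:30:00\n"
    output += "cd $SLURM_SUBMIT_DIR\n"
    for exe in all_exe:
        # Build and run each executable in the background; the srun/make
        # line is a placeholder for whatever launcher your system uses.
        output += "srun -n 1 make -j 16 %s &\n" % exe
    output += "wait\n"
    return output
```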
=== Step 3: Run the script tune_setup.py ===
Specify which GPU you are autotuning for by passing the appropriate ''parameters_GPU.json'' file via the ''-p'' option.
In addition, the script takes as arguments the blocksizes you want to add to libcusmm. For example, if the system contains blocks of size 5 and 8, type:
<code>
$ ./tune_setup.py 5 8 -p parameters_P100.json
Found 23 parameter sets for 5x5x5
Found 31 parameter sets for 5x5x8
...
tune_8x8x8.job
</code>
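The blocksizes given on the command line are expanded to all (m, n, k) triples, which is why two sizes yield the eight combinations from 5x5x5 up to 8x8x8 above. A sketch of that expansion (the helper name is hypothetical):

```python
from itertools import product

def blocksize_triples(sizes):
    # All (m, n, k) combinations of the requested block sizes,
    # deduplicated and in sorted order.
    return sorted(product(sorted(set(sizes)), repeat=3))

triples = blocksize_triples([5, 8])  # 2 sizes -> 2**3 = 8 triples
```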
For each possible parameter-set a //launcher// is generated.
In order to parallelize the benchmarking, the launchers are distributed over multiple executables.
Currently, up to 10'000 launchers are benchmarked by one //executable//.
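The distribution of launchers over executables can be pictured as simple chunking; a minimal sketch using the 10'000-launcher limit mentioned above (the function name and list representation are assumptions):

```python
def chunk_launchers(launchers, max_per_exe=10000):
    # Split the launcher list into per-executable groups of at most
    # max_per_exe entries, so the groups can be built and run in parallel.
    return [launchers[i:i + max_per_exe]
            for i in range(0, len(launchers), max_per_exe)]
```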
=== Step 4: Adapt tune_submit.py to your environment ===
The script ''tune_submit.py'' was written for the SLURM batch system; you will probably have to adapt it to your environment.
=== Step 5: Submit Jobs ===
Each tune-directory contains a job file.
Since there might be many tune-directories, the convenience script ''tune_submit.py'' can be used to submit the corresponding jobs.
When ''tune_submit.py'' is run without arguments, it only shows which jobs would be submitted:
<code>
$ ./tune_submit.py
tune_5x5x5: Would submit, run with "doit!"
tune_5x5x8: Would submit, run with "doit!"
...
</code>
Only when ''tune_submit.py'' is run with ''doit!'' as an argument will it actually submit jobs:
<code>
$ ./tune_submit.py doit!
tune_5x5x5: Submitting
Submitted batch job 277987
...
</code>
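The dry-run behaviour can be sketched as follows; the directory and job-file layout and the ''submit'' callback are assumptions, only the "Would submit"/"Submitting" messages mirror the output above:

```python
def submit_all(tune_dirs, doit=False, submit=None):
    # Without doit, only report what would happen; with doit, hand an
    # sbatch command line to the submit callback (e.g. os.system).
    messages = []
    for d in sorted(tune_dirs):
        name = d.rstrip("/").split("/")[-1]
        if not doit:
            messages.append('%s: Would submit, run with "doit!"' % name)
        else:
            messages.append("%s: Submitting" % name)
            if submit is not None:
                submit("sbatch " + d.rstrip("/") + "/" + name + ".job")
    return messages
```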
=== Step 6: Collect Results ===
Run ''tune_collect.py'' to parse the log files and determine the best kernel for each blocksize:
<code>
$ ./tune_collect.py
Reading: tune_5x5x5/...
Reading: tune_5x5x8/...
...
Kernel_dnt_tiny(m=8, ...)
Kernel_dnt_tiny(m=8, ...)

Wrote parameters.json
</code>

The file ''parameters.json'' now contains the newly autotuned parameters.

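Conceptually, the collection step reduces to keeping, for each blocksize, the parameter set with the best measured performance; a minimal sketch (the data layout and field names are hypothetical, not the actual log format):

```python
def collect_best(results):
    # results: {(m, n, k): [(params_dict, gflops), ...]}
    # Keep, for each blocksize, the fastest parameter set.
    best = {}
    for mnk, candidates in results.items():
        params, _ = max(candidates, key=lambda c: c[1])
        best[mnk] = params
    return best
```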
=== Step 7: Merge new parameters with original parameter-file ===
Run ''tune_merge.py'' to merge the newly autotuned parameters with the original parameter-file:
<code>
$ ./tune_merge.py
Merging parameters.json with parameters_P100.json
Wrote parameters.new.json
</code>

The file ''parameters.new.json'' now contains both the original and the newly autotuned parameters.
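The merge step can be thought of as new entries overriding old ones per (m, n, k); a sketch assuming each parameter record carries its ''m'', ''n'', ''k'' keys (the actual merge logic may differ):

```python
def merge_parameters(original, new):
    # Merge per-blocksize parameter records; newly autotuned entries win.
    def key(p):
        return (p["m"], p["n"], p["k"])
    merged = {key(p): p for p in original}
    for p in new:
        merged[key(p)] = p
    return [merged[k] for k in sorted(merged)]
```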

=== Step 8: Contribute parameters to the community ===

**Contribute new optimal parameters**

Submit a pull request updating the appropriate ''parameters_GPU.json'' file in the DBCSR repository.

**Contribute autotuning data**

See [[https://...|these instructions]] for how to contribute autotuning data.