howto:libcusmm, revision 2019/02/06 by sjakobovits (reflects changes brought by PR #137 to the DBCSR repo)
====== Howto Optimize Cuda Kernels for Libcusmm ======
**Python version required:** python3.6

If you are about to autotune parameters for a new GPU (i.e. a GPU for which there are no autotuned parameters yet), please first follow these instructions.

=== Step 1: Go to the libcusmm directory ===
<code>
$ cd dbcsr/src/acc/libsmm_acc/libcusmm
</code>

=== Step 2: Adapt tune_setup.py to your environment ===
The script ''tune_setup.py'' generates the job files used for benchmarking. Adapt its ''gen_jobfile'' function (batch-system directives, module loads, launch command) to your environment:
<code python>
...
def gen_jobfile(outdir, m, n, k):
    t = "/tune_%dx%dx%d" % (m, n, k)
    # The glob suffix below is reconstructed; the wiki diff truncated it.
    all_exe_src = [basename(fn) for fn in glob(outdir+t+"_exe*_main.cu")]
    all_exe = sorted([fn.replace("_main.cu", "") for fn in all_exe_src])

    # Batch-system header: the #SBATCH values here are placeholders
    # (truncated in the wiki diff); adapt them to your machine.
    output = "#!/bin/bash -l\n"
    output += "#SBATCH --nodes=%d\n" % len(all_exe)
    output += "#SBATCH --time=00:30:00\n"
    output += "#SBATCH --partition=normal\n"
    # Environment setup: adapt the module loads to whatever provides
    # your compilers and the CUDA toolkit on your system.
    output += "source ${MODULESHOME}/init/sh\n"
    output += "module load cudatoolkit\n"
    output += "cd $SLURM_SUBMIT_DIR \n"
    output += "date\n"
    for exe in all_exe:
        output += "aprun -b -n 1 -N 1 -d 8 make -j 16 %s &\n" % exe
...
</code>
=== Step 3: Run the script tune_setup.py ===
The script takes as arguments the blocksizes you want to add to libcusmm. For example, if your system contains blocks of size 5 and 8, type:
<code>
$ ./tune_setup.py 5 8
Found 23 parameter sets for 5x5x5
Found 31 parameter sets for 5x5x8
</code>
<code>
...
tune_8x8x8.job
</code>
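The tune-directories above cover every (m, n, k) combination of the requested blocksizes. A minimal sketch of that enumeration, assuming it is a plain Cartesian product (the function name ''kernel_triples'' is illustrative, not libcusmm's):
<code python>
from itertools import product

def kernel_triples(blocksizes):
    """All (m, n, k) triples over the given blocksizes,
    mirroring the tune_5x5x5 ... tune_8x8x8 listing above."""
    return sorted(product(sorted(blocksizes), repeat=3))

triples = kernel_triples([5, 8])
print(len(triples))               # 2 blocksizes -> 2^3 = 8 triples
print(triples[0], triples[-1])    # (5, 5, 5) (8, 8, 8)
</code>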
For each possible parameter-set a //launcher// is generated. A launcher is a small snippet of C code, which launches the kernel by using the CUDA specific <<< >>> notation.
In order to parallelize the benchmarking, the launchers are distributed over multiple executables.
Currently, up to 10000 launchers are benchmarked by one //executable//.
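The distribution of launchers over executables amounts to simple chunking. A sketch under the stated 10000-per-executable limit (the helper ''chunk'' is illustrative, not tune_setup.py's actual code):
<code python>
def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical example: 23000 launchers end up in 3 executables.
launchers = list(range(23000))
executables = chunk(launchers, 10000)
print([len(e) for e in executables])  # [10000, 10000, 3000]
</code>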
=== Step 4: Adapt tune_submit.py to your environment ===
The script ''tune_submit.py'' submits the generated job files to the batch system. It was written for the SLURM batch system; if your machine uses a different batch system, adapt ''tune_submit.py'' accordingly.

=== Step 5: Submit Jobs ===
Each tune-directory contains a job file.
Since there might be many tune-directories, the convenience script ''tune_submit.py'' can submit them all at once.
When ''tune_submit.py'' is called without arguments, it only shows which jobs it would submit:
<code>
$ ./tune_submit.py
tune_5x5x5: Would submit, run with "doit!"
tune_5x5x8: Would submit, run with "doit!"
...
</code>
Only when ''tune_submit.py'' is called with ''doit!'' as an argument does it actually submit jobs:
<code>
$ ./tune_submit.py doit!
tune_5x5x5: Submitting
Submitted batch job 277987
...
</code>
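The dry-run-by-default behaviour shown above can be sketched as a simple argument guard (a hedged sketch, not tune_submit.py's actual code; the directory names are taken from the example output):
<code python>
import sys

def should_submit(argv):
    """Jobs are only submitted when 'doit!' is passed explicitly;
    otherwise the script just reports what it would do (dry run)."""
    return len(argv) > 1 and argv[1] == "doit!"

for directory in ["tune_5x5x5", "tune_5x5x8"]:
    if should_submit(sys.argv):
        print("%s: Submitting" % directory)
    else:
        print('%s: Would submit, run with "doit!"' % directory)
</code>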
=== Step 6: Collect Results ===
Run ''tune_collect.py'' to parse all log files and to determine the best kernel for each blocksize:
<code>
$ ./tune_collect.py
Reading: tune_5x5x5/
Reading: tune_5x5x8/
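Determining the winner per blocksize presumably amounts to scanning the benchmark logs for the fastest parameter set. A minimal sketch over a made-up log format (the ''OK ... GFlop/s'' lines are illustrative, not the real log layout):
<code python>
import re

def best_kernel(log_lines):
    """Return the (params, gflops) pair with the highest throughput.
    Assumes one 'OK <params> ... <gflops> GFlop/s' line per parameter set."""
    best = None
    for line in log_lines:
        m = re.search(r"OK (\S+) .* ([0-9.]+) GFlop/s", line)
        if m:
            candidate = (m.group(1), float(m.group(2)))
            if best is None or candidate[1] > best[1]:
                best = candidate
    return best

log = [
    "OK 5x5x5_tile1 ... 42.0 GFlop/s",
    "OK 5x5x5_tile2 ... 57.3 GFlop/s",
]
print(best_kernel(log))  # ('5x5x5_tile2', 57.3)
</code>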