howto:libcusmm
Differences
This shows you the differences between two versions of the page.
howto:libcusmm [2014/03/28 14:32] – oschuett | howto:libcusmm [2019/02/06 12:18] – Add merge and contribution instructions sjakobovits | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Howto Optimize Cuda Kernels for Libcusmm ====== | ====== Howto Optimize Cuda Kernels for Libcusmm ====== | ||
- | === Step 1: Go to the directory | + | **Python version required:** python3.6 |
+ | |||
+ | If you are about to autotune parameters for a new GPU (i.e. a GPU for which there are no autotuned parameters yet), please first follow [[https:// | ||
+ | |||
+ | === Step 1: Go to the libcusmm directory === | ||
< | < | ||
- | $ cd $CP2K_ROOT/src/dbcsr/cuda/libcusmm | + | $ cd dbcsr/src/acc/libsmm_acc/libcusmm |
+ | </ | ||
+ | |||
+ | === Step 2: Adapt tune_setup.py to your environment === | ||
+ | The '' | ||
+ | <code python> | ||
+ | ... | ||
+ | def gen_jobfile(outdir, | ||
+ | t = "/ | ||
+ | all_exe_src = [os.path.basename(fn) for fn in glob(outdir + t + " | ||
+ | all_exe = sorted([fn.replace(" | ||
+ | |||
+ | output = "# | ||
+ | output += "# | ||
+ | output += "# | ||
+ | output += "# | ||
+ | output += "# | ||
+ | output += "# | ||
+ | output += " | ||
+ | output += " | ||
+ | output += " | ||
+ | output += " | ||
+ | output += " | ||
+ | output += " | ||
+ | output += " | ||
+ | output += " | ||
+ | output += "cd $SLURM_SUBMIT_DIR \n" | ||
+ | output += " | ||
+ | output += " | ||
+ | for exe in all_exe: | ||
+ | output += ( | ||
+ | "srun --nodes=1 --bcast=/ | ||
+ | exe) | ||
+ | ... | ||
+ | ... | ||
</ | </ | ||
- | === Step 2: Run the script | + | === Step 3: Run the script |
- | The script takes as arguments the blocksizes you want to add to libcusmm. For example, if your system contains blocks of size 5 and 8 type: | + | The script takes as arguments the blocksizes you want to add to libcusmm. For example, if the system |
< | < | ||
- | $ ./tune.py 5 8 | + | $ ./tune_setup.py 5 8 |
Found 23 parameter sets for 5x5x5 | Found 23 parameter sets for 5x5x5 | ||
Found 31 parameter sets for 5x5x8 | Found 31 parameter sets for 5x5x8 | ||
Line 37: | Line 75: | ||
tune_8x8x8.job | tune_8x8x8.job | ||
</ | </ | ||
- | For each possible parameter-set a // | + | For each possible parameter-set a // |
- | In order to parallelize the benchmarking the launchers are distributed over multiple executables. | + | In order to parallelize the benchmarking, the launchers are distributed over multiple executables. |
- | Currently, up to 10000 launchers are benchmarked by one // | + | Currently, up to 10' |
- | === Step 3: Submit Jobs === | + | === Step 4: Adapt tune_submit.py to your environment === |
+ | The script '' | ||
+ | |||
+ | === Step 5: Submit Jobs === | ||
Each tune-directory contains a job file. | Each tune-directory contains a job file. | ||
- | Since, there might be many tune-directories the convenience script '' | + | Since there might be many tune-directories, the convenience script '' |
- | When '' | + | When '' |
< | < | ||
- | $ ./submit.py | + | $ ./tune_submit.py |
tune_5x5x5: Would submit, run with " | tune_5x5x5: Would submit, run with " | ||
tune_5x5x8: Would submit, run with " | tune_5x5x8: Would submit, run with " | ||
Line 60: | Line 101: | ||
</ | </ | ||
- | Only when '' | + | Only when '' |
< | < | ||
- | $ ./submit.py doit! | + | $ ./tune_submit.py doit! |
tune_5x5x5: Submitting | tune_5x5x5: Submitting | ||
Submitted batch job 277987 | Submitted batch job 277987 | ||
Line 82: | Line 123: | ||
</ | </ | ||
- | === Step 4: Collect Results === | + | === Step 6: Collect Results === |
- | Run '' | + | Run '' |
< | < | ||
- | $ ./collect.py | + | $ ./tune_collect.py |
Reading: tune_5x5x5/ | Reading: tune_5x5x5/ | ||
Reading: tune_5x5x8/ | Reading: tune_5x5x8/ | ||
Line 102: | Line 143: | ||
Kernel_dnt_tiny(m=8, | Kernel_dnt_tiny(m=8, | ||
Kernel_dnt_tiny(m=8, | Kernel_dnt_tiny(m=8, | ||
+ | |||
+ | Wrote parameters.json | ||
</ | </ | ||
+ | |||
+ | The file '' | ||
+ | |||
+ | === Step 7: Merge new parameters with original parameter-file === | ||
+ | Run '' | ||
+ | < | ||
+ | $ ./ | ||
+ | Merging parameters.json with parameters_P100.json | ||
+ | Wrote parameters.new.json | ||
+ | </ | ||
+ | |||
+ | The file '' | ||
+ | |||
+ | === Step 8: Contribute parameters to the community === | ||
+ | |||
+ | **Contribute new optimal parameters** | ||
+ | |||
+ | Submit a pull request updating the appropriate '' | ||
+ | |||
+ | **Contribute autotuning data** | ||
+ | |||
+ | See [[https:// | ||
+ |
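
The run shown in Step 3 lists parameter sets for 5x5x5, 5x5x8 and so on up to 8x8x8, which suggests that the setup script expands the block sizes given on the command line into every (m, n, k) combination and creates one tune-directory per combination. The following is a minimal sketch of that expansion, not the actual code of ''tune_setup.py''; the helper names ''block_triplets'' and ''tune_dir_name'' are hypothetical and chosen for illustration only.

```python
from itertools import product


def block_triplets(blocksizes):
    """Return every (m, n, k) combination of the given block sizes.

    Hypothetical helper: duplicates are dropped and the sizes are sorted,
    so the triplets come out in ascending order.
    """
    sizes = sorted(set(blocksizes))
    return list(product(sizes, repeat=3))


def tune_dir_name(m, n, k):
    """Build a directory name in the tune_MxNxK style seen in the output above."""
    return f"tune_{m}x{n}x{k}"


if __name__ == "__main__":
    # Block sizes 5 and 8, as in the example of Step 3.
    for m, n, k in block_triplets([5, 8]):
        print(tune_dir_name(m, n, k))
```

With two block sizes this gives 2³ = 8 triplets, so the number of tune-directories (and therefore of jobs to submit) grows cubically with the number of block sizes requested.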