howto:libcusmm
Differences
This shows you the differences between two versions of the page.
howto:libcusmm [2019/02/06 11:22] – Reflect changes brought by PR #137 to DBCSR repo sjakobovits | howto:libcusmm [2019/02/06 12:18] – Add merge and contribution instructions sjakobovits | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Howto Optimize Cuda Kernels for Libcusmm ====== | ====== Howto Optimize Cuda Kernels for Libcusmm ====== | ||
**Python version required:** python3.6 | **Python version required:** python3.6 | ||
- | If you are about to autotune parameters for a new GPU (i.e. a GPU for which there are no autotuned parameters yet), please first follow | + | |
+ | If you are about to autotune parameters for a new GPU (i.e. a GPU for which there are no autotuned parameters yet), please first follow | ||
=== Step 1: Go to the libcusmm directory === | === Step 1: Go to the libcusmm directory === | ||
Line 12: | Line 13: | ||
<code python> | <code python> | ||
... | ... | ||
- | def gen_jobfile(outdir, | + | |
- | t = "/ | + | t = "/ |
- | all_exe_src = [basename(fn) for fn in glob(outdir+t+" | + | all_exe_src = [os.path.basename(fn) for fn in glob(outdir + t + " |
- | all_exe = sorted([fn.replace(" | + | all_exe = sorted([fn.replace(" |
- | | + | |
- | output += "# | + | output += "# |
- | output += "# | + | output += "# |
- | output += "# | + | output += "# |
- | output += " | + | |
- | output += " | + | output += "# |
- | output += " | + | |
- | output += " | + | output += " |
- | output += " | + | |
- | output += "cd $SLURM_SUBMIT_DIR \n" | + | |
- | output += " | + | output += " |
- | output += " | + | output += " |
- | for exe in all_exe: | + | output += " |
- | output += "aprun -b -n 1 -N 1 -d 8 make -j 16 %s & | + | |
+ | | ||
+ | output += " | ||
+ | output += " | ||
+ | for exe in all_exe: | ||
+ | output += ( | ||
+ | | ||
+ | | ||
... | ... | ||
+ | ... | ||
</ | </ | ||
=== Step 3: Run the script tune_setup.py === | === Step 3: Run the script tune_setup.py === | ||
- | The script takes as arguments the blocksizes you want to add to libcusmm. For example, if your system contains blocks of size 5 and 8 type: | + | The script takes as arguments the blocksizes you want to add to libcusmm. For example, if the system |
< | < | ||
$ ./ | $ ./ | ||
Line 69: | Line 78: | ||
In order to parallelize the benchmarking, | In order to parallelize the benchmarking, | ||
- | Currently, up to 10000 launchers are benchmarked by one // | + | Currently, up to 10' |
=== Step 4: Adapt tune_submit.py to your environment === | === Step 4: Adapt tune_submit.py to your environment === | ||
- | The script '' | + | The script '' |
=== Step 5: Submit Jobs === | === Step 5: Submit Jobs === | ||
Line 114: | Line 123: | ||
</ | </ | ||
- | === Step 5: Collect Results === | + | === Step 6: Collect Results === |
- | Run '' | + | Run '' |
< | < | ||
$ ./ | $ ./ | ||
Line 134: | Line 143: | ||
Kernel_dnt_tiny(m=8, | Kernel_dnt_tiny(m=8, | ||
Kernel_dnt_tiny(m=8, | Kernel_dnt_tiny(m=8, | ||
+ | |||
+ | Wrote parameters.json | ||
</ | </ | ||
+ | |||
+ | The file '' | ||
+ | |||
+ | === Step 7: Merge new parameters with original parameter-file === | ||
+ | Run '' | ||
+ | < | ||
+ | $ ./ | ||
+ | Merging parameters.json with parameters_P100.json | ||
+ | Wrote parameters.new.json | ||
+ | </ | ||
+ | |||
+ | The file '' | ||
+ | |||
+ | === Step 8: Contribute parameters to the community === | ||
+ | |||
+ | **Contribute new optimal parameters** | ||
+ | |||
+ | Submit a pull request updating the appropriate '' | ||
+ | |||
+ | **Contribute autotuning data** | ||
+ | |||
+ | See [[https:// | ||
+ |
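The merge described in Step 7 can be illustrated with a minimal sketch. This is not the merge script shipped with libcusmm. It assumes that each parameter file is a JSON list of dictionaries and that every entry carries integer fields `m`, `n` and `k` identifying the kernel; both the layout and the function names `merge_parameters` and `merge_files` are hypothetical and chosen for illustration only. Under those assumptions, newly autotuned entries replace existing entries for the same (m, n, k) triplet, and all other entries are kept.

```python
import json


def merge_parameters(existing, new):
    """Merge two lists of kernel-parameter dicts keyed by (m, n, k).

    Assumption: every entry has integer fields "m", "n" and "k".
    Entries from `new` replace entries from `existing` with the same key.
    """
    merged = {(p["m"], p["n"], p["k"]): p for p in existing}
    for p in new:
        merged[(p["m"], p["n"], p["k"])] = p  # newly tuned entry wins
    # Return the entries ordered by (m, n, k) so the output is reproducible
    return [merged[key] for key in sorted(merged)]


def merge_files(new_path, existing_path, out_path):
    """Read both JSON files, merge them, and write the result to out_path."""
    with open(existing_path) as f:
        existing = json.load(f)
    with open(new_path) as f:
        new = json.load(f)
    with open(out_path, "w") as f:
        json.dump(merge_parameters(existing, new), f, indent=2)
```

With the file names shown in the Step 7 output, a call would look like `merge_files("parameters.json", "parameters_P100.json", "parameters.new.json")`. Writing to a separate output file rather than overwriting the original leaves room to inspect the merged result before replacing anything.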