howto:libcusmm
howto:libcusmm [2018/12/18 13:36] – alazzaro | howto:libcusmm [2019/02/06 11:22] – Reflect changes brought by PR #137 to DBCSR repo sjakobovits | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Howto Optimize Cuda Kernels for Libcusmm ====== | ====== Howto Optimize Cuda Kernels for Libcusmm ====== | ||
- | === Step 1: Go to the directory | + | **Python version required:** python3.6 |
+ | If you are about to autotune parameters for a new GPU (i.e. a GPU for which there are no autotuned parameters yet), please first follow these instructions. | ||
+ | |||
+ | === Step 1: Go to the libcusmm directory === | ||
< | < | ||
$ cd dbcsr/ | $ cd dbcsr/ | ||
</ | </ | ||
- | === Step 2: Adopt tune.py for your Environment | + | === Step 2: Adapt tune_setup.py to your environment |
- | The '' | + | The '' |
<code python> | <code python> | ||
... | ... | ||
Line 31: | Line 34: | ||
</ | </ | ||
- | === Step 3: Run the script | + | === Step 3: Run the script |
The script takes as arguments the blocksizes you want to add to libcusmm. For example, if your system contains blocks of size 5 and 8, type: | The script takes as arguments the blocksizes you want to add to libcusmm. For example, if your system contains blocks of size 5 and 8, type: | ||
< | < | ||
- | $ ./tune.py 5 8 | + | $ ./tune_setup.py 5 8 |
Found 23 parameter sets for 5x5x5 | Found 23 parameter sets for 5x5x5 | ||
Found 31 parameter sets for 5x5x8 | Found 31 parameter sets for 5x5x8 | ||
Line 63: | Line 66: | ||
tune_8x8x8.job | tune_8x8x8.job | ||
</ | </ | ||
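The transcript above suggests that the given blocksizes are expanded into every (m, n, k) combination, each with its own tune-directory. The following is a minimal sketch of that expansion, not the script's actual code: the function names `enumerate_triples` and `tune_dir_name` are hypothetical, and the assumption that all combinations are generated is inferred only from the output shown (5x5x5, 5x5x8, ..., 8x8x8).

```python
from itertools import product


def enumerate_triples(blocksizes):
    """Return every (m, n, k) combination of the given blocksizes, in sorted order.

    Assumption: the tuning script covers the full Cartesian product.
    """
    sizes = sorted(set(blocksizes))
    return list(product(sizes, repeat=3))


def tune_dir_name(m, n, k):
    """Build a directory name in the tune_MxNxK style seen in the transcript."""
    return f"tune_{m}x{n}x{k}"


triples = enumerate_triples([5, 8])
dir_names = [tune_dir_name(*t) for t in triples]
for name in dir_names:
    print(name)
```

Under this assumption, two blocksizes yield eight triples, which is why the number of tune-directories grows quickly as more blocksizes are added.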
- | For each possible parameter-set a // | + | For each possible parameter-set a // |
- | In order to parallelize the benchmarking the launchers are distributed over multiple executables. | + | In order to parallelize the benchmarking, the launchers are distributed over multiple executables. |
Currently, up to 10000 launchers are benchmarked by one // | Currently, up to 10000 launchers are benchmarked by one // | ||
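To illustrate how a limit of 10000 launchers per unit splits a large parameter space, here is a small sketch. It assumes simple contiguous chunking; the real script may distribute launchers differently, and the names `MAX_LAUNCHERS_PER_EXE` and `split_into_executables` are hypothetical.

```python
MAX_LAUNCHERS_PER_EXE = 10000  # limit quoted in the text above


def split_into_executables(launchers, limit=MAX_LAUNCHERS_PER_EXE):
    """Group launchers into consecutive chunks of at most `limit` entries.

    Assumption: contiguous chunking; the actual distribution scheme may differ.
    """
    return [launchers[i:i + limit] for i in range(0, len(launchers), limit)]


# Hypothetical example: 25000 launchers for one (m, n, k) triple.
launchers = [f"launcher_{i}" for i in range(25000)]
groups = split_into_executables(launchers)
print([len(g) for g in groups])
```

With 25000 launchers this gives three groups of 10000, 10000, and 5000, each of which could be compiled and benchmarked independently.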
- | === Step 4: Adopt submit.py for your Environment | + | === Step 4: Adapt tune_submit.py to your environment |
- | The script '' | + | The script '' |
=== Step 5: Submit Jobs === | === Step 5: Submit Jobs === | ||
Each tune-directory contains a job file. | Each tune-directory contains a job file. | ||
- | Since, there might be many tune-directories the convenience script '' | + | Since there might be many tune-directories, the convenience script '' |
- | When '' | + | When '' |
< | < | ||
- | $ ./submit.py | + | $ ./tune_submit.py |
tune_5x5x5: Would submit, run with " | tune_5x5x5: Would submit, run with " | ||
tune_5x5x8: Would submit, run with " | tune_5x5x8: Would submit, run with " | ||
Line 89: | Line 92: | ||
</ | </ | ||
- | Only when '' | + | Only when '' |
< | < | ||
- | $ ./submit.py doit! | + | $ ./tune_submit.py doit! |
tune_5x5x5: Submitting | tune_5x5x5: Submitting | ||
Submitted batch job 277987 | Submitted batch job 277987 | ||
Line 112: | Line 115: | ||
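The two transcripts above show a dry-run-by-default pattern: without arguments the script only reports what it would do, and jobs are submitted only when the confirmation argument `doit!` is passed. A minimal sketch of that pattern follows. It is not the code of the submit script: `submit_all` and the injected `submit` callable are hypothetical, and the dry-run message wording is illustrative.

```python
def submit_all(tune_dirs, argv, submit):
    """Report or submit one job per tune-directory.

    `submit` is a hypothetical callable standing in for the real
    batch-system submission; it is only invoked after confirmation.
    """
    confirmed = len(argv) > 1 and argv[1] == "doit!"
    messages = []
    for d in sorted(tune_dirs):
        if confirmed:
            messages.append(f"{d}: Submitting")
            submit(d)
        else:
            # Illustrative wording, not the script's exact message.
            messages.append(f"{d}: Would submit (dry run)")
    return messages


submitted = []  # records which directories were actually submitted
dirs = ["tune_5x5x8", "tune_5x5x5"]
dry = submit_all(dirs, ["tune_submit.py"], submitted.append)
real = submit_all(dirs, ["tune_submit.py", "doit!"], submitted.append)
print("\n".join(dry + real))
```

Defaulting to a dry run makes it cheap to check which tune-directories would be queued before spending allocation time on the cluster.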
=== Step 6: Collect Results === | === Step 6: Collect Results === | ||
- | Run '' | + | Run '' |
< | < | ||
- | $ ./collect.py | + | $ ./tune_collect.py |
Reading: tune_5x5x5/ | Reading: tune_5x5x5/ | ||
Reading: tune_5x5x8/ | Reading: tune_5x5x8/ |