--- howto:libcusmm [2014/03/28 14:33] – oschuett
+++ howto:libcusmm [2014/03/28 17:44] – oschuett
@@ Line 5: / Line 5: @@
 </code>
-=== Step 2: Run the script tune.py ===
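+=== Step 2: Adapt tune.py to your Environment ===
+The ''tune.py'' script generates the job files. You have to adapt the script to the environment of your supercomputer and to your personal settings, e.g. the ''#SBATCH'' options such as the account, the modules to load, and the ''aprun'' command line used to launch ''make''.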
+<code python>
+...
+def gen_jobfile(outdir, m, n, k):
+    # collect the executables that tune.py generated for this block size
+    t = "/tune_%dx%dx%d"%(m,n,k)
+    all_exe_src = [basename(fn) for fn in glob(outdir+t+"_*_main.cu")]
+    all_exe = sorted([fn.replace("_main.cu", "") for fn in all_exe_src])
+    # SLURM header: one node per executable; adapt time limit and account
+    output = "#!/bin/bash -l\n"
+    output += "#SBATCH --nodes=%d\n"%len(all_exe)
+    output += "#SBATCH --time=0:30:00\n"
+    output += "#SBATCH --account=s441\n"
+    output += "\n"
+    # Cray-specific environment setup; replace with the modules of your machine
+    output += "source ${MODULESHOME}/init/sh;\n"
+    output += "module unload PrgEnv-cray\n"
+    output += "module load cudatoolkit PrgEnv-gnu\n"
+    output += "module list\n"
+    output += "cd $SLURM_SUBMIT_DIR \n"
+    output += "\n"
+    output += "date\n"
+    # one aprun per executable, each started in the background with '&'
+    for exe in all_exe:
+        output += "aprun -b -n 1 -N 1 -d 8 make -j 16 %s &\n"%exe
+    ...
+</code>
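A minimal sketch of the same job-file generation for a generic (non-Cray) SLURM cluster. It assumes that ''srun'' replaces ''aprun'' and that all site-specific values are collected in one dictionary. The account name, module list and launcher line in ''SITE'' are hypothetical placeholders, not settings from the original ''tune.py''. Check your site's documentation for how to start several single-node job steps concurrently.

```python
from glob import glob
from os.path import basename, join

# Site-specific settings. Every value is a hypothetical placeholder:
# replace them with the account, modules and launcher of your own machine.
SITE = {
    "account": "my_account",        # hypothetical project/account name
    "time": "0:30:00",
    "modules": ["cudatoolkit"],     # hypothetical module names
    "launcher": "srun -N 1 -n 1",   # assumed plain-SLURM replacement for aprun
    "make_jobs": 16,
}

def gen_jobfile(outdir, m, n, k, site=SITE):
    """Return the text of a SLURM job file that builds all tune executables
    for block size (m, n, k) found in outdir."""
    prefix = "tune_%dx%dx%d" % (m, n, k)
    all_exe_src = [basename(fn) for fn in glob(join(outdir, prefix + "_*_main.cu"))]
    all_exe = sorted(fn.replace("_main.cu", "") for fn in all_exe_src)

    lines = ["#!/bin/bash -l",
             "#SBATCH --nodes=%d" % len(all_exe),   # one node per executable
             "#SBATCH --time=%s" % site["time"],
             "#SBATCH --account=%s" % site["account"],
             ""]
    lines += ["module load %s" % mod for mod in site["modules"]]
    lines += ["cd $SLURM_SUBMIT_DIR", "", "date"]
    for exe in all_exe:
        # start each build in the background so they run concurrently
        lines.append("%s make -j %d %s &" % (site["launcher"], site["make_jobs"], exe))
    lines.append("wait")  # block until all background builds have finished
    return "\n".join(lines) + "\n"
```

Collecting the site settings in one place means that porting the script to another machine only touches ''SITE'', not the generator logic.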
+=== Step 3: Run the script tune.py ===
 The script takes as arguments the blocksizes you want to add to libcusmm. For example, if your system contains blocks of size 5 and 8, type:
 <code>
@@ Line 42: / Line 68: @@
 Currently, up to 10000 launchers are benchmarked by one //executable//. Each executable is linked together from several ''tune_*_part???.o'' files and one ''tune_*_main.o''. Each part file contains up to 100 launchers. This allows the compilation to be parallelized over multiple CPU cores.
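The limits above determine how many executables and part files a tuning run produces. The following sketch uses a hypothetical ''partition'' helper, which is not part of ''tune.py'', to show the arithmetic. For example, 25000 launchers need three executables, and the last one is split into only 50 part files.

```python
# Limits quoted in the text above.
LAUNCHERS_PER_PART = 100
LAUNCHERS_PER_EXE = 10000

def partition(n_launchers):
    """Split launcher indices 0..n_launchers-1 into executables of at most
    LAUNCHERS_PER_EXE launchers, each split into part files of at most
    LAUNCHERS_PER_PART launchers. Returns one list of ranges (one range
    per part file) for each executable."""
    exes = []
    for exe_start in range(0, n_launchers, LAUNCHERS_PER_EXE):
        exe_stop = min(exe_start + LAUNCHERS_PER_EXE, n_launchers)
        parts = [range(p, min(p + LAUNCHERS_PER_PART, exe_stop))
                 for p in range(exe_start, exe_stop, LAUNCHERS_PER_PART)]
        exes.append(parts)
    return exes

layout = partition(25000)
print(len(layout))               # 3 executables
print([len(p) for p in layout])  # [100, 100, 50] part files per executable
```

Because every part file is an independent compilation unit, ''make -j'' can compile the parts of one executable in parallel before linking them with the ''_main.o'' object.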
-=== Step 3: Submit Jobs ===
+=== Step 4: Adapt submit.py to your Environment ===
+The script ''submit.py'' was written for the SLURM batch system, as used e.g. by Cray supercomputers. If your computer runs a different batch system, you have to adapt ''submit.py'' accordingly.
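One way to adapt ''submit.py'' is to keep the batch-system commands in a table, so that porting only changes one entry. This is a sketch under assumptions. The ''BATCH_SYSTEMS'' table and the helper names are hypothetical. Only the SLURM entry (''sbatch'', ''squeue'') corresponds to what the text says ''submit.py'' uses. The PBS entry (''qsub'', ''qstat'') is an illustrative example.

```python
import subprocess

# Hypothetical command table. Only the "slurm" entry matches the commands
# submit.py is described as using; "pbs" is an assumed example of another system.
BATCH_SYSTEMS = {
    "slurm": {"submit": ["sbatch"], "queue": ["squeue", "--noheader", "--format", "%j"]},
    "pbs":   {"submit": ["qsub"],   "queue": ["qstat"]},
}

def submit_cmd(system, jobfile):
    """Argument list that submits jobfile on the given batch system."""
    return BATCH_SYSTEMS[system]["submit"] + [jobfile]

def queue_cmd(system):
    """Argument list that lists the jobs currently known to the batch system."""
    return list(BATCH_SYSTEMS[system]["queue"])

def submit(system, jobfile, tune_dir):
    # Submit from inside the tune directory so that the job's output
    # file is written there.
    subprocess.check_call(submit_cmd(system, jobfile), cwd=tune_dir)
```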
+=== Step 5: Submit Jobs ===
 Each tune directory contains a job file.
-Since, there might be many tune-directories the convenience script ''submit.py'' can be used. It will go through all the ''tune_*''-directories and check if it has already been submited or run. For this the script calls ''squeue'' in the background and it searches for ''slurm-*.out'' files.
+Since there might be many tune directories, the convenience script ''submit.py'' can be used. It goes through all ''tune_*'' directories and checks whether each one has already been submitted or run. To do so, the script calls ''squeue'' in the background to find jobs that are still queued or running, and it searches for ''slurm-*.out'' files, which indicate jobs that have already run.
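The check described above can be sketched as follows. The sketch assumes, hypothetically, that each job is submitted under the name of its ''tune_*'' directory, so that ''squeue'' output can be matched against directory names. It also assumes that the ''slurm-*.out'' file is written into the tune directory. The real ''submit.py'' may match jobs differently.

```python
import os
import subprocess
from glob import glob

def queued_job_names():
    """Names of the current user's jobs known to SLURM (pending or running).
    Assumes job names contain no whitespace."""
    out = subprocess.check_output(
        ["squeue", "--noheader", "--user", os.environ["USER"], "--format", "%j"],
        universal_newlines=True)
    return set(out.split())

def submittable_dirs(root, queued):
    """tune_* directories under root that are neither queued nor already run."""
    result = []
    for d in sorted(glob(os.path.join(root, "tune_*"))):
        if not os.path.isdir(d):
            continue
        if os.path.basename(d) in queued:  # assumed: job name == directory name
            continue
        if glob(os.path.join(d, "slurm-*.out")):  # job has already run
            continue
        result.append(d)
    return result
```

Passing the set of queued names into ''submittable_dirs'' keeps the directory scan testable without a running SLURM installation. On a real system it would be called as ''submittable_dirs(".", queued_job_names())''.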
 When ''submit.py'' is called without arguments, it will just list the jobs that could be submitted:
@@ Line 82: / Line 111: @@
 </code>
-=== Step 4: Collect Results ===
+=== Step 6: Collect Results ===
 Run ''collect.py'' to parse all log files and to determine the best kernel for each blocksize:
 <code>