howto:libcusmm

Differences

This shows you the differences between two versions of the page.

howto:libcusmm [2014/03/28 14:33] oschuett
howto:libcusmm [2018/12/18 13:36] alazzaro
Line 2: Line 2:
 === Step 1: Go to the libcusmm directory ===
 <code>
-$ cd $CP2K_ROOT/src/dbcsr/cuda/libcusmm
+$ cd dbcsr/src/acc/libsmm_acc/libcusmm
 </code>
  
-=== Step 2: Run the script tune.py ===
+=== Step 2: Adapt tune.py for your Environment ===
 +The ''tune.py'' script generates job files. You have to adapt the script to the environment of your supercomputer and to your personal settings.
 +<code python> 
 +... 
 +def gen_jobfile(outdir, m, n, k): 
 +    t = "/tune_%dx%dx%d"%(m,n,k) 
 +    all_exe_src = [basename(fn) for fn in glob(outdir+t+"_*_main.cu")] 
 +    all_exe = sorted([fn.replace("_main.cu", "") for fn in all_exe_src]) 
 + 
 +    output = "#!/bin/bash -l\n" 
 +    output += "#SBATCH --nodes=%d\n"%len(all_exe) 
 +    output += "#SBATCH --time=0:30:00\n" 
 +    output += "#SBATCH --account=s441\n" 
 +    output += "\n" 
 +    output += "source ${MODULESHOME}/init/sh;\n" 
 +    output += "module unload PrgEnv-cray\n" 
 +    output += "module load cudatoolkit PrgEnv-gnu\n" 
 +    output += "module list\n" 
 +    output += "cd $SLURM_SUBMIT_DIR \n" 
 +    output += "\n" 
 +    output += "date\n" 
 +    for exe in all_exe: 
 +        output += "aprun -b -n 1 -N 1 -d 8 make -j 16 %s &\n"%exe 
 +   ... 
 +</code> 
 + 
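The site-specific parts of ''gen_jobfile'' are the ''#SBATCH'' header (nodes, wall time, account), the ''module'' commands, and the ''aprun'' launch line. As a rough illustration only (not taken from the actual script), on a generic non-Cray SLURM cluster the launch line could use ''srun'' instead; the core and thread counts below are assumptions and must match your machine:
<code python>
# Illustrative sketch: build the launch commands with srun instead of aprun.
def build_commands(all_exe):
    output = ""
    for exe in all_exe:
        # one background build/benchmark per executable, as in the original loop
        output += "srun --nodes=1 --ntasks=1 --cpus-per-task=8 make -j 16 %s &\n" % exe
    output += "wait\n"  # let all background jobs finish before the batch job ends
    return output
</code>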
 +=== Step 3: Run the script tune.py ===
 The script takes as arguments the blocksizes you want to add to libcusmm. For example, if your system contains blocks of size 5 and 8, type:
 <code>
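# The original example command is not visible in this comparison view; based on
# the sentence above, the block sizes are simply passed as arguments, e.g.
# (illustrative, assuming tune.py is executable in the current directory):
#   ./tune.py 5 8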
Line 42: Line 68:
 Currently, up to 10000 launchers are benchmarked by one //executable//. Each executable is linked together from several ''tune_*_part???.o'' files and a ''tune_*_main.o''. Each part-file contains up to 100 launchers. This makes it possible to parallelize the compilation over multiple CPU cores.
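As a rough illustration of this grouping (a sketch with made-up helper names, not code from libcusmm), splitting a list of launchers into part-files of 100 and executables of up to 10000 could look like this:
<code python>
# Sketch only: group launchers into part-files (<=100 each) and
# executables (<=10000 launchers each), as described in the text.
LAUNCHERS_PER_PART = 100
LAUNCHERS_PER_EXE = 10000

def chunk(items, size):
    """Split 'items' into consecutive chunks of at most 'size' elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def group_launchers(launchers):
    """Return a list of executables, each being a list of part-files."""
    return [chunk(exe, LAUNCHERS_PER_PART)
            for exe in chunk(launchers, LAUNCHERS_PER_EXE)]
</code>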
  
-=== Step 3: Submit Jobs ===
+=== Step 4: Adapt submit.py for your Environment ===
 +The script ''submit.py'' was written for the SLURM batch system as used, e.g., by Cray supercomputers. If your computer runs a different batch system, you have to adapt ''submit.py'' accordingly.
 + 
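The kind of change needed is small: the commands used to submit a job and to query the queue have to be swapped. The following sketch is illustrative only; the command table and function names are assumptions, not the structure of the real ''submit.py'':
<code python>
# Illustrative only: swapping SLURM commands for PBS/Torque equivalents.
import subprocess

BATCH = {
    "slurm": {"submit": "sbatch", "queue": "squeue"},
    "pbs":   {"submit": "qsub",   "queue": "qstat"},
}

def submit_job(jobfile, system="slurm"):
    # hand the job file to the batch system's submit command
    subprocess.check_call([BATCH[system]["submit"], jobfile])

def queue_listing(system="slurm"):
    # return the current queue listing as text
    return subprocess.check_output([BATCH[system]["queue"]]).decode()
</code>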
 +=== Step 5: Submit Jobs ===
 Each tune-directory contains a job file.
-Since, there might be many tune-directories the convenience script ''submit.py'' can be used. It will go through all the ''tune_*''-directories and check if it has already been submited or run. For this the script calls ''squeue'' in the background and it searches for ''slurm-*.out'' files.
+Since there might be many tune-directories, the convenience script ''submit.py'' can be used. It will go through all the ''tune_*''-directories and check whether the corresponding job has already been submitted or run. For this, the script calls ''squeue'' in the background and searches for ''slurm-*.out'' files.
  
 When ''submit.py'' is called without arguments, it will just list the jobs that could be submitted:
Line 82: Line 111:
 </code>
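The skipping logic described above could be sketched roughly as follows (assumed directory layout and job-naming convention, not the actual ''submit.py''):
<code python>
# Sketch only: list tune_* directories that have neither produced a
# slurm-*.out file nor appear (by name) in the current squeue output.
import glob
import os
import subprocess

def submittable_dirs():
    queued_names = subprocess.check_output(["squeue", "--format=%j"]).decode()
    for d in sorted(glob.glob("tune_*")):
        if not os.path.isdir(d):
            continue
        already_run = bool(glob.glob(os.path.join(d, "slurm-*.out")))
        already_queued = d in queued_names  # assumes the job name matches the directory name
        if not (already_run or already_queued):
            yield d

if __name__ == "__main__":
    for d in submittable_dirs():
        print("could submit:", d)
</code>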
  
-=== Step 4: Collect Results ===
+=== Step 6: Collect Results ===
 Run ''collect.py'' to parse all log files and to determine the best kernel for each blocksize:
 <code>