howto:libcusmm

Differences

This shows you the differences between two versions of the page.

howto:libcusmm [2014/03/28 14:33] oschuett
howto:libcusmm [2014/03/28 17:44] oschuett
Line 5: Line 5:
 </code>
  
-=== Step 2: Run the script tune.py ===
+=== Step 2: Adapt tune.py for your Environment ===
 +The ''tune.py'' script generates job files. You have to adapt the script to the environment of your supercomputer and to your personal settings, such as the account name and the modules to load.
 +<code python> 
 +... 
 +def gen_jobfile(outdir, m, n, k): 
 +    t = "/tune_%dx%dx%d"%(m,n,k) 
 +    all_exe_src = [basename(fn) for fn in glob(outdir+t+"_*_main.cu")] 
 +    all_exe = sorted([fn.replace("_main.cu", "") for fn in all_exe_src]) 
 + 
 +    output = "#!/bin/bash -l\n" 
 +    output += "#SBATCH --nodes=%d\n"%len(all_exe) 
 +    output += "#SBATCH --time=0:30:00\n" 
 +    output += "#SBATCH --account=s441\n" 
 +    output += "\n" 
 +    output += "source ${MODULESHOME}/init/sh;\n" 
 +    output += "module unload PrgEnv-cray\n" 
 +    output += "module load cudatoolkit PrgEnv-gnu\n" 
 +    output += "module list\n" 
 +    output += "cd $SLURM_SUBMIT_DIR \n" 
 +    output += "\n" 
 +    output += "date\n" 
 +    for exe in all_exe: 
 +        output += "aprun -b -n 1 -N 1 -d 8 make -j 16 %s &\n"%exe 
 +   ... 
 +</code> 
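One way to make this adaptation less error-prone is to collect the site-specific values in one place instead of editing the strings inline. The following is a minimal sketch and not part of ''tune.py'': the helper name ''gen_job_header'' and its parameters are hypothetical, and the defaults merely repeat the values from the excerpt above.

<code python>
def gen_job_header(num_nodes, account="s441", walltime="0:30:00",
                   unload_modules=("PrgEnv-cray",),
                   load_modules=("cudatoolkit", "PrgEnv-gnu")):
    """Build the site-specific head of a Slurm job file (hypothetical helper)."""
    lines = [
        "#!/bin/bash -l",
        "#SBATCH --nodes=%d" % num_nodes,
        "#SBATCH --time=%s" % walltime,
        "#SBATCH --account=%s" % account,
        "",
        "source ${MODULESHOME}/init/sh;",
    ]
    # Module names differ between sites, so they are passed in as parameters.
    if unload_modules:
        lines.append("module unload " + " ".join(unload_modules))
    if load_modules:
        lines.append("module load " + " ".join(load_modules))
    lines += ["module list", "cd $SLURM_SUBMIT_DIR", "", "date"]
    return "\n".join(lines) + "\n"
</code>

With such a helper, switching to another account would only require a call like ''gen_job_header(4, account="myproject")'', where ''myproject'' is a placeholder for your own account name.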
 + 
 +=== Step 3: Run the script tune.py ===
 The script takes as arguments the block sizes you want to add to libcusmm. For example, if your system contains blocks of size 5 and 8, type:
 <code>
Line 42: Line 68:
 Currently, up to 10000 launchers are benchmarked by one //executable//. Each executable is linked together from several ''tune_*_part???.o'' files and a ''tune_*_main.o''. Each part file contains up to 100 launchers. This allows the compilation to be parallelized over multiple CPU cores.
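These limits translate into simple arithmetic. The sketch below assumes that launchers are packed in order, filling each executable and each part file up to its limit; how ''tune.py'' actually distributes them is not shown here, and the function name ''plan_split'' is hypothetical.

<code python>
MAX_PER_EXE = 10000   # launchers benchmarked by one executable (from the text)
MAX_PER_PART = 100    # launchers per tune_*_part???.o file (from the text)

def plan_split(num_launchers):
    """Return the number of part files needed for each executable."""
    parts_per_exe = []
    remaining = num_launchers
    while remaining > 0:
        in_this_exe = min(remaining, MAX_PER_EXE)
        # Ceiling division without floating point.
        parts_per_exe.append((in_this_exe + MAX_PER_PART - 1) // MAX_PER_PART)
        remaining -= in_this_exe
    return parts_per_exe
</code>

Under these assumptions, ''plan_split(25000)'' gives ''[100, 100, 50]'': three executables, the last one linked from 50 part files.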
  
-=== Step 3: Submit Jobs ===
+=== Step 4: Adapt submit.py for your Environment ===
 +The script ''submit.py'' was written for the Slurm batch system, as used e.g. on Cray supercomputers. If your computer runs a different batch system, you have to adapt ''submit.py'' accordingly.
 + 
 +=== Step 5: Submit Jobs ===
 Each tune-directory contains a job file.
-Since, there might be many tune-directories the convenience script ''submit.py'' can be used. It will go through all the ''tune_*''-directories and check if it has already been submited or run. For this the script calls ''squeue'' in the background and it searches for ''slurm-*.out'' files.
+Since there might be many tune-directories, the convenience script ''submit.py'' can be used. It goes through all the ''tune_*''-directories and checks whether each one has already been submitted or run. To do this, the script calls ''squeue'' in the background and searches for ''slurm-*.out'' files.
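The check described here can be sketched as follows. This is not the code of ''submit.py''; the function names are hypothetical, and the sketch assumes that each job's name in the queue equals the name of its tune-directory.

<code python>
import getpass
import os
import subprocess
from glob import glob

def queued_job_names():
    """Names of the current user's jobs, as reported by squeue."""
    out = subprocess.check_output(
        ["squeue", "--noheader", "--user", getpass.getuser(), "--format=%j"])
    return set(line.strip() for line in out.decode().splitlines() if line.strip())

def find_submittable(tune_dirs, queued):
    """Return the directories that are neither queued nor already run."""
    todo = []
    for d in sorted(tune_dirs):
        # A slurm-*.out file means the job has already produced output.
        already_run = bool(glob(os.path.join(d, "slurm-*.out")))
        # Assumption: the job name matches the directory name.
        in_queue = os.path.basename(os.path.normpath(d)) in queued
        if not (already_run or in_queue):
            todo.append(d)
    return todo
</code>

A call such as ''find_submittable(glob("tune_*"), queued_job_names())'' would then list the directories whose jobs still need to be submitted.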
  
 When ''submit.py'' is called without arguments, it will just list the jobs that could be submitted:
Line 82: Line 111:
 </code>
  
-=== Step 4: Collect Results ===
+=== Step 6: Collect Results ===
 Run ''collect.py'' to parse all log files and to determine the best kernel for each block size:
 <code>