Internally, FFTW’s MPI transform algorithms work by first computing
transforms of the data local to each process, then by globally
transposing the data in some fashion to redistribute the data
among the processes, transforming the new data local to each process,
and transposing back.  For example, a two-dimensional n0 by
n1 array, distributed across the n0 dimension, is
transformed by: (i) transforming the n1 dimension, which is
local to each process; (ii) transposing to an n1 by n0
array, distributed across the n1 dimension; (iii) transforming
the n0 dimension, which is now local to each process; (iv)
transposing back.
However, in many applications it is acceptable to compute a
multidimensional DFT whose results are produced in transposed order
(e.g., n1 by n0 in two dimensions).  This provides a
significant performance advantage, because it means that the final
transposition step can be omitted.  FFTW supports this optimization,
which you specify by passing the flag FFTW_MPI_TRANSPOSED_OUT
to the planner routines.  To compute the inverse transform of
transposed output, you specify FFTW_MPI_TRANSPOSED_IN to tell
it that the input is transposed.  In this section, we explain how to
interpret the output format of such a transform.
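As a rough serial illustration of the four steps above, and of what omitting the final transposition does to the output layout, here is a minimal single-process sketch.  It does not use FFTW or MPI, and it says nothing about how FFTW implements these steps internally.  The function names (twiddle4, dft_row, transpose, dft2d_by_transposes) are hypothetical, and the naive DFT is restricted to lengths 1, 2, and 4 so that the twiddle factors are exact and no math library is needed.

```c
#include <assert.h>
#include <complex.h>

/* Exact powers of exp(-2*pi*i/4): 1, -i, -1, i. */
double complex twiddle4(int m)
{
    static const double re[4] = { 1.0,  0.0, -1.0, 0.0 };
    static const double im[4] = { 0.0, -1.0,  0.0, 1.0 };
    return re[m & 3] + im[m & 3] * I;
}

/* Naive in-place forward DFT of one contiguous row; n must be 1, 2, or 4. */
void dft_row(double complex *row, int n)
{
    double complex out[4];
    int j, k, step;
    assert(n == 1 || n == 2 || n == 4);
    step = 4 / n;  /* exp(-2*pi*i/n) is exp(-2*pi*i/4) raised to the power step */
    for (k = 0; k < n; ++k) {
        out[k] = 0.0;
        for (j = 0; j < n; ++j)
            out[k] += row[j] * twiddle4(j * k * step);
    }
    for (k = 0; k < n; ++k)
        row[k] = out[k];
}

/* Out-of-place transpose of a rows x cols row-major array. */
void transpose(const double complex *in, double complex *out,
               int rows, int cols)
{
    int r, c;
    for (r = 0; r < rows; ++r)
        for (c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];
}

/* 2d DFT of the n0 x n1 row-major array a; scratch holds n0*n1 elements.
   If transposed_out is nonzero, a is left holding the n1 x n0 result. */
void dft2d_by_transposes(double complex *a, double complex *scratch,
                         int n0, int n1, int transposed_out)
{
    int r, i;
    for (r = 0; r < n0; ++r)            /* (i) transform the n1 dimension */
        dft_row(a + r * n1, n1);
    transpose(a, scratch, n0, n1);      /* (ii) data are now n1 x n0 */
    for (r = 0; r < n1; ++r)            /* (iii) transform the n0 dimension */
        dft_row(scratch + r * n0, n0);
    if (transposed_out) {
        for (i = 0; i < n0 * n1; ++i)   /* skip (iv): keep n1 x n0 order */
            a[i] = scratch[i];
    } else {
        transpose(scratch, a, n1, n0);  /* (iv) transpose back to n0 x n1 */
    }
}
```

For a 2 by 4 input, output element (k0, k1) ends up at a[k0*4 + k1] when transposed_out is zero and at a[k1*2 + k0] when it is nonzero; the values are the same, only their order in memory differs.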
Suppose you are transforming multi-dimensional data with (at
least two) dimensions n0 × n1 × n2 × … × nd-1.  As always, it is
distributed along the first dimension n0.  Now, if we compute its DFT
with the FFTW_MPI_TRANSPOSED_OUT flag, the resulting output data are
stored with the first two dimensions transposed:
n1 × n0 × n2 × … × nd-1, distributed along the n1 dimension.
Conversely, if we take the n1 × n0 × n2 × … × nd-1 data and transform
it with the FFTW_MPI_TRANSPOSED_IN flag, then the format goes back to
the original n0 × n1 × n2 × … × nd-1 array.
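To make the transposed ordering concrete, the following sketch computes where a given element lands in each layout.  It ignores the distribution across processes and treats the whole array as if it lived on one process in row-major order; the helper names normal_offset and transposed_offset are hypothetical and are not part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of element idx[] in the original
   n[0] x n[1] x ... x n[rank-1] array. */
ptrdiff_t normal_offset(int rank, const ptrdiff_t *n, const ptrdiff_t *idx)
{
    ptrdiff_t off = 0;
    int d;
    for (d = 0; d < rank; ++d)
        off = off * n[d] + idx[d];
    return off;
}

/* Row-major offset of the same element in the transposed
   n[1] x n[0] x n[2] x ... x n[rank-1] array.  Both n[] and idx[]
   are still given in the original (non-transposed) order. */
ptrdiff_t transposed_offset(int rank, const ptrdiff_t *n, const ptrdiff_t *idx)
{
    ptrdiff_t off;
    int d;
    assert(rank >= 2);
    off = idx[1] * n[0] + idx[0];   /* first two dimensions swapped */
    for (d = 2; d < rank; ++d)      /* remaining dimensions unchanged */
        off = off * n[d] + idx[d];
    return off;
}
```

For example, with dimensions 2 × 3 × 4, element (1, 0, 2) sits at offset 14 in the original layout and at offset 6 in the transposed one.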
There are two ways to find the portion of the transposed array that resides on the current process. First, you can simply call the appropriate ‘local_size’ function, passing n1 × n0 × n2 ×…× nd-1 (the transposed dimensions). This would mean calling the ‘local_size’ function twice, once for the transposed and once for the non-transposed dimensions. Alternatively, you can call one of the ‘local_size_transposed’ functions, which returns both the non-transposed and transposed data distribution from a single call. For example, for a 3d transform with transposed output (or input), you might call:
ptrdiff_t fftw_mpi_local_size_3d_transposed(
                ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
                ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
Here, local_n0 and local_0_start give the size and
starting index of the n0 dimension for the
non-transposed data, as in the previous sections.  For
transposed data (e.g. the output for
FFTW_MPI_TRANSPOSED_OUT), local_n1 and
local_1_start give the size and starting index of the n1
dimension, which is the first dimension of the transposed data
(n1 by n0 by n2).
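Given local_n1 and local_1_start, you can relate a position in the local transposed data to global indices.  The following is a minimal sketch, assuming complex data stored contiguously in row-major order with no padding in the last dimension; the type index3 and both helper functions are hypothetical names, and the local_n1 and local_1_start arguments stand for the values that the ‘local_size_transposed’ call would return on the current process.

```c
#include <assert.h>
#include <stddef.h>

/* Global indices, always reported in the original (i0, i1, i2) order. */
typedef struct { ptrdiff_t i0, i1, i2; } index3;

/* Map an offset within the local transposed block (local_n1 x n0 x n2,
   row-major) to the global indices of that element. */
index3 transposed_local_to_global(ptrdiff_t offset,
                                  ptrdiff_t n0, ptrdiff_t n2,
                                  ptrdiff_t local_n1,
                                  ptrdiff_t local_1_start)
{
    index3 g;
    assert(offset >= 0 && offset < local_n1 * n0 * n2);
    g.i1 = local_1_start + offset / (n0 * n2);  /* distributed dimension */
    g.i0 = (offset / n2) % n0;                  /* second dimension */
    g.i2 = offset % n2;                         /* last dimension */
    return g;
}

/* Inverse mapping: global indices to an offset in the local block.
   The caller must ensure g.i1 lies in
   [local_1_start, local_1_start + local_n1). */
ptrdiff_t transposed_global_to_local(index3 g, ptrdiff_t n0, ptrdiff_t n2,
                                     ptrdiff_t local_1_start)
{
    return ((g.i1 - local_1_start) * n0 + g.i0) * n2 + g.i2;
}
```

For instance, with n0 = 4, n2 = 5, local_n1 = 3, and local_1_start = 6, local offset 56 corresponds to the global element (i0, i1, i2) = (3, 8, 1).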
(Note that FFTW_MPI_TRANSPOSED_IN is completely equivalent to
performing FFTW_MPI_TRANSPOSED_OUT and passing the first two
dimensions to the planner in reverse order, or vice versa.  If you
pass both the FFTW_MPI_TRANSPOSED_IN and
FFTW_MPI_TRANSPOSED_OUT flags, it is equivalent to swapping the
first two dimensions passed to the planner and passing neither
flag.)
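The equivalences in the note above can be checked by simple bookkeeping of array shapes.  The sketch below only tracks the leading two dimensions of the input and output arrays for a given flag combination; it does not compute any transform.  The SKETCH_ constants are hypothetical stand-ins chosen for illustration and are not the values of the FFTW flags, and layout2, leading_dims, and same_layout are likewise hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the FFTW flag values. */
enum { SKETCH_TRANSPOSED_IN = 1, SKETCH_TRANSPOSED_OUT = 2 };

/* Leading two dimensions of the input and output arrays. */
typedef struct { ptrdiff_t in[2], out[2]; } layout2;

/* n0 and n1 are the first two dimensions as passed to the planner. */
layout2 leading_dims(ptrdiff_t n0, ptrdiff_t n1, unsigned flags)
{
    layout2 l;
    int tin  = (flags & SKETCH_TRANSPOSED_IN)  != 0;
    int tout = (flags & SKETCH_TRANSPOSED_OUT) != 0;
    l.in[0]  = tin  ? n1 : n0;   /* transposed input is n1 x n0 */
    l.in[1]  = tin  ? n0 : n1;
    l.out[0] = tout ? n1 : n0;   /* transposed output is n1 x n0 */
    l.out[1] = tout ? n0 : n1;
    return l;
}

/* Nonzero if both layouts have identical input and output shapes. */
int same_layout(layout2 a, layout2 b)
{
    return a.in[0] == b.in[0] && a.in[1] == b.in[1]
        && a.out[0] == b.out[0] && a.out[1] == b.out[1];
}
```

With dimensions 8 and 6, the input-transposed layout for (8, 6) matches the output-transposed layout for (6, 8), and passing both flags with (8, 6) gives the same shapes as passing neither flag with (6, 8).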