Accelerate is an embedded language of array-based computations for high-performance computing in Haskell. Programs in Accelerate are expressed in the form of parameterised collective operations on regular multidimensional arrays, such as maps, reductions, and permutations. Accelerate then takes these computations and optimises, compiles, and executes them for the chosen target architecture.
Accelerate is an embedded language, which distinguishes between vanilla Haskell arrays and embedded language arrays, as well as the computations on each of these. Programs written in Accelerate are not compiled by the regular Haskell compiler (GHC). Rather, each Accelerate backend is a runtime compiler which generates and executes [parallel SIMD] code of the target architecture at application runtime.
Accelerate distinguishes the types of collective operations
Acc from the type of scalar operations
Exp to achieve a stratified language. Collective operations comprise many scalar computations which are executed in parallel, but scalar computations can not initiate new collective operations. This distinction excludes nested, irregular data-parallelism statically; instead, Accelerate is limited to flat data-parallelism involving only regular, multi-dimensional arrays.
The type constructor
Acc represents embedded collective array operations. A term of type
Acc a is an Accelerate program which, once executed, will produce a value of type
a (consisting of one or more arrays). Collective operations of type
Acc a comprise many scalar expressions, represented by the type
Exp, which will be executed in parallel. Although collective operations comprise many scalar operations executed in parallel, scalar operations can not initiate new collective operations. This stratification between scalar operations in
Exp and collective operations in
Acc helps to statically exclude nested data-parallelism, which is difficult to execute efficiently on constrained hardware such as GPUs.
For example, to compute a vector dot product we can write:
import Data.Array.Accelerate as A import qualified Prelude as P dotp :: Num a => Vector a -> Vector a -> Acc (Scalar a) dotp xs ys = let xs' = use xs ys' = use ys in fold (+) 0 ( zipWith (*) xs' ys' )
dotp consumes two one-dimensional arrays (
Vectors) of values, and produces a single (
Scalar) result as output. As the return type is wrapped in
Acc, we see that it is an embedded Accelerate computation---it will be evaluated in the object language of dynamically generated parallel code, rather than the meta language of vanilla Haskell.
The arguments to
dotp are plain Haskell arrays (not wrapped in
Acc). To make these arrays accessible to Accelerate computations, they must first be embedded with the
use function. This turns a regular, vanilla array, or tuple of arrays, into an Accelerate array:
use :: Arrays a => a -> Acc a
An Accelerate backed is use to evaluate the embedded computation and return the result back to vanilla Haskell. Calling the
run function of a backend will generate code for the target architecture, compile, and execute it. Currently the following backends are available:
See the Getting Started section for instructions on installing these backends.
Acc represents embedded computations that will only be executed when evaluated by a backend, we can programmatically generate computations using the meta language (Haskell): for example, unrolling loops or embedding input values directly into the generated code. See the fluid simulation program for an example.
Heads up! It is usually best to keep all intermediate computations in
Acc, and only
run the computation at the very end to produce the final result. This enables optimisations between intermediate computations (e.g. array fusion) and, if the target architecture has a separate memory space, as is the case of GPUs, to prevent excessive data transfers.
The type constructor
Exp represents embedded scalar expressions. The collective operations in Accelerate of type
Acc consist of many scalar operations of type
Exp executed in parallel.
Accelerate implements (or redefines) instances for the familiar Haskell type classes for scalar expressions. In the
dotp program above, this allows us to make use of the usual numeric operations
(*) from the
Num typeclass, applied to embedded scalar expressions.
use, to make scalar values accessible to Accelerate computations they must first be embedded with the
use :: Elt t => t -> Exp t
Tip! For constant numeric values this is often performed automatically. Notice in the
dotp program we did not need to use the
constant function to inject the initial value 0. This is because GHC applies the
fromInteger function of the
Num typeclass (or the
fromRational function of the
Fractional typeclass for floating point values) to the constant value, which implements the lifting operation for us.
Array is the core computational unit of Accelerate. Computations in Accelerate take the form of collective operations over arrays of the type
Array sh e. All programs in Accelerate take zero or more arrays as input and produce one or more arrays as output. The
Array type has two parameters:
sh: is the shape of the array, tracking the dimensionality and extent of each dimension of the array. For example,
DIM1 is the type of the shape of a one dimensional array (
DIM2 for two-dimensional arrays, and so on. See the section array shapes for more information.
e: represents the type each array element, such as
Float. See the section on allowable array element types for more information.
Array data is stored unboxed in an unzipped struct-of-array representation, and elements are laid out in row-major order (the right-most index of the shape is the fastest varying).
If the object code is executed in a separate memory space, for example on a GPU, arrays will be transferred to the target device as necessary (asynchronously and in parallel with other tasks) and cached on the device as long as sufficient memory is available.
Elt class characterises the allowable array element types, and hence the types which can appear in scalar Accelerate expressions. It roughly consists of:
Array itself is not an allowable array element type; there are no nested arrays in Accelerate, only regular multi-dimensional arrays.
Accelerate arrays consist of these simple atomic types stored efficiently in memory, as consecutive unpacked elements without pointers in an unzipped struct-of-array format.
Adding new instances to
Elt consists of explaining to Accelerate how to map between your data type and a (tuple of) primitive values. For examples see:
Operations in Accelerate consist of collective operations over arrays of type
Array sh e. Much like the repa library, the shape of an array, as well as array indices used to access individual elements, are built inductively using
(:.) (analogously to a list).
data Z = Z data tail :. head = tail :. head
Z corresponds to a shape with zero dimensions (or a
Scalar array consisting of a single element) and us used to mark the end of the list. The constructor
(:.) adds additional dimensions to the right of an index. For example:
Z :. Int
is the type of the shape of a one-dimensional array (
Vector) indexed by an
Z :. Int :. Int
is the type of the shape of a two-dimensional array indexed by an
Int in each dimension.
This style is used to construct both the type (as shown above) as well as the value of a shape. For example, a vector of ten elements has the following shape:
sh :: Z :. Int sh = Z :. 10
Heads up! The right-most index corresponds to the innermost dimension. This is the fastest-varying index, and corresponds to the elements of the array which are adjacent in memory.
The common shape and array types that we have seen above are simply type synonyms:
type DIM0 = Z type DIM1 = DIM0 :. Int type DIM2 = DIM1 :. Int -- and so on... type Scalar e = Array DIM0 e type Vector e = Array DIM1 e