Values
Using the Value type, you can declare two values, initialize them, add them, and store the result in a third value:
    Value1f x(1.0f), y(2.0f);
    Value1f z = x + y;
Note the 1 and the f in the type name Value1f. The 1 means that this value contains a single scalar. Values can contain any fixed number of scalars, but they usually contain between one and four elements. The f stands for float. Values can contain any standard C++ type, as well as some nonstandard ones (most notably, half-precision floating-point types). Values are templated; Value1f is actually a typedef for Value<1, float>. The platform provides typedefs for up to four elements of all basic types.
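To make the naming convention concrete, here is a minimal sketch in standard C++ of how such a template and its typedefs could be laid out. It is not the platform's implementation: the data member, the broadcasting constructor, and the bounds-checked operator[] are assumptions made for illustration, and the "i" suffix for int is assumed by analogy with "f" for float.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the platform's Value type: N scalars of type T.
template <std::size_t N, typename T>
struct Value {
    std::array<T, N> data{};

    Value() = default;

    // Broadcast one scalar to every component, as in Value1f x(1.0f).
    explicit Value(T s) { data.fill(s); }

    // Bounds-checked component access (an assumption, for illustration).
    T& operator[](std::size_t i) { assert(i < N); return data[i]; }
    const T& operator[](std::size_t i) const { assert(i < N); return data[i]; }
};

// Component-wise addition of two values with the same size and type.
template <std::size_t N, typename T>
Value<N, T> operator+(const Value<N, T>& a, const Value<N, T>& b) {
    Value<N, T> r;
    for (std::size_t i = 0; i < N; ++i) r[i] = a[i] + b[i];
    return r;
}

// Naming convention from the text: element count, then a type suffix.
typedef Value<1, float> Value1f;
typedef Value<3, float> Value3f;
typedef Value<4, int>   Value4i;  // "i" suffix is assumed, not taken from the text
```

With these definitions, the two-line example above compiles unchanged and executes immediately, which is exactly the behavior the next section contrasts with programs.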
Programs
The previous example might not make values seem particularly interesting. The computation, if inserted into a C++ function, would simply happen immediately, in the same thread as the rest of the function. Values become more interesting when they are combined with the Program type. This code illustrates a program definition using RapidMind:
    Program add_two_numbers = RM_BEGIN {
      In<Value1f> x, y;
      Out<Value1f> z;
      z = x + y;
    } RM_END;
This trivial program captures the same computation we performed directly on values above. However, the computation does not execute immediately. Instead, it is stored in the program object add_two_numbers, and can later be used to compute the sum of two numbers. When a program object is defined, every computation on RapidMind types between the RM_BEGIN and RM_END statements is collected and stored within the object. This process happens at runtime and does not require any special preprocessing or compiler modifications. Programs can be defined in any function, but typically a program is defined in a constructor of a class encapsulating some computation.
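The idea of collecting operations at runtime can be illustrated without the platform itself. The following is a minimal sketch in standard C++, not RapidMind's implementation: Sym, Recorder, input(), and define() are all hypothetical names, define() merely stands in for the RM_BEGIN/RM_END pair, and only addition is recorded. While a definition is active, operator+ on a symbolic value appends an instruction to a list instead of computing a number.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One recorded addition: slot[dst] = slot[lhs] + slot[rhs].
struct Instr { int dst, lhs, rhs; };

// Everything captured while a program is being defined (hypothetical).
struct Recorder {
    int num_slots = 0;
    std::vector<int> input_slots;
    std::vector<Instr> instrs;
    int output_slot = -1;
};

// The recorder currently capturing operations, or nullptr outside a definition.
static Recorder* g_current = nullptr;

// A symbolic value: it names a slot instead of holding a number.
struct Sym { int slot; };

// Declares a program input and reserves a slot for it.
Sym input() {
    assert(g_current != nullptr);
    int s = g_current->num_slots++;
    g_current->input_slots.push_back(s);
    return Sym{s};
}

// Records the addition rather than performing it.
Sym operator+(Sym a, Sym b) {
    assert(g_current != nullptr);
    int dst = g_current->num_slots++;
    g_current->instrs.push_back({dst, a.slot, b.slot});
    return Sym{dst};
}

// A stored computation that can be replayed later on real numbers.
struct Program {
    Recorder rec;

    float operator()(const std::vector<float>& args) const {
        assert(args.size() == rec.input_slots.size());
        std::vector<float> slots(static_cast<std::size_t>(rec.num_slots), 0.0f);
        for (std::size_t i = 0; i < args.size(); ++i)
            slots[rec.input_slots[i]] = args[i];
        for (const Instr& in : rec.instrs)
            slots[in.dst] = slots[in.lhs] + slots[in.rhs];
        return slots[rec.output_slot];
    }
};

// Runs body once with recording switched on and returns what was captured.
template <typename Body>
Program define(Body body) {
    Program p;
    g_current = &p.rec;              // plays the role of RM_BEGIN
    Sym out = body();
    p.rec.output_slot = out.slot;
    g_current = nullptr;             // plays the role of RM_END
    return p;
}
```

In this sketch, define([] { Sym x = input(), y = input(); return x + y; }) performs no arithmetic; it yields a Program holding one recorded addition, and calling that object later with two numbers replays the addition on them. Because the body is ordinary C++ that runs once, no preprocessing step or compiler change is involved.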
Arrays
Adding two numbers is not a very parallel operation. Adding two arrays of numbers, however, can be parallelized, provided the arrays are large enough. RapidMind lets program objects be called on entire arrays at once:
    Array<1, Value1f> a(10000), b(10000), c;
    for (int i = 0; i < 10000; i++) {
      a.write_data()[i] = ...;
      b.write_data()[i] = ...;
    }
    c = add_two_numbers(a, b);
The first line declares three arrays. The arrays a and b are sized to hold 10,000 elements each, whereas c is initially empty. Just like Value, Array is a simple class template. The first template parameter specifies the dimensionality of the array (one, two, or three), and the second specifies the type of each element.
In the next four lines we simply initialize our input arrays with some data. The write_data() function obtains a plain C++ pointer to the data (in this case, a float*) that can be used to modify the array. A similar read_data() function can be used for read-only accesses. Distinguishing between the two helps the platform understand when it needs to move data around; for example, to do a copy-on-write operation.
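To see why separating read access from write access helps, consider this minimal copy-on-write sketch in standard C++. Only the names read_data() and write_data() come from the text; the CowArray class, its shared buffer, and the helpers at() and shares_storage_with() are hypothetical, and the sketch is not thread-safe. It says nothing about how the platform actually manages memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write array of floats.
class CowArray {
public:
    explicit CowArray(std::size_t n)
        : buf_(std::make_shared<std::vector<float>>(n)) {}

    std::size_t size() const { return buf_->size(); }

    // Read-only access never needs a copy, however many arrays share the buffer.
    const float* read_data() const { return buf_->data(); }

    // Write access first makes the buffer private to this array.
    float* write_data() {
        if (buf_.use_count() > 1)
            buf_ = std::make_shared<std::vector<float>>(*buf_);
        return buf_->data();
    }

    // Bounds-checked read of a single element.
    float at(std::size_t i) const { assert(i < size()); return (*buf_)[i]; }

    // True while both arrays still refer to the same underlying buffer.
    bool shares_storage_with(const CowArray& other) const {
        return buf_ == other.buf_;
    }

private:
    std::shared_ptr<std::vector<float>> buf_;
};
```

Copying a CowArray here only copies a pointer. The element data is duplicated the first time one of the copies calls write_data(), and never if both are only read. A single accessor returning a mutable pointer would force the copy on every access, because the array could not tell whether the caller intended to modify anything.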
The last line of the snippet calls upon our program object, add_two_numbers, to perform the computation. Note how the program object is used just as though it were a C++ function. Now, even though this program takes two values as its input, and produces another value, you can call it on the entire array. This is effectively the same as calling the program once for each entry in the array; for example, by looping through the array. However, by applying the program to entire arrays at once, the parallelism is explicit. The platform knows that this computation can be executed independently for each element it computes, and therefore knows that the computation can be split across an arbitrary number of cores.
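The reasoning above, that independent per-element work can be divided among any number of cores, can be sketched with standard C++ threads. This is an illustration only, under stated assumptions: apply_elementwise is a hypothetical helper, plain std::vector<float> stands in for the platform's arrays, and the static split into equal chunks is just one possible scheduling choice, not a description of what the platform does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// The per-element computation, written once for a single pair of scalars.
inline float add_two_numbers(float x, float y) { return x + y; }

// Hypothetical helper: computes c[i] = f(a[i], b[i]) using up to `workers` threads.
template <typename F>
std::vector<float> apply_elementwise(F f,
                                     const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     unsigned workers) {
    assert(a.size() == b.size());
    assert(workers > 0);

    const std::size_t n = a.size();
    std::vector<float> c(n);

    // Each worker gets one contiguous chunk; chunks never overlap,
    // so no synchronization is needed beyond the final join.
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        pool.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) c[i] = f(a[i], b[i]);
        });
    }
    for (std::thread& t : pool) t.join();
    return c;
}
```

Calling apply_elementwise(add_two_numbers, a, b, 4) gives the same result as calling it with one worker, because no element's result depends on any other. That independence is what an ordinary hand-written loop does not communicate and what applying a program to whole arrays makes explicit.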