Parallel Algorithms
Parallel Algorithms notes from CMSC751 with Uzi Vishkin.
This class is based on his book Thinking in Parallel Some Basic Data-Parallel Algorithms and Techniques
XMT Language
XMTC is a single-program multiple-data (SPMD) extension of C.
- Spawn creates threads
- Threads expire at Join
$
represents the number of the threadPS Ri Rj
is an atomic prefix sum- Stores Ri + Rj in Ri
- Stores the original value of Ri in Rj
int x = 0; // Spawn n threads spawn(0, n-1) { int e = 1; if (A[$] != 0) { // Sets e=x and increments x ps(e,x); } }
Models
PRAM
Parallel Random-Access Machine/Model
You're given n synchronous processors each with local memory and access to a shared memory.
Each processor can write to shared memory, read to shared memory, or do computation in local memory.
You tell each processor what to do at each time step.
Types of PRAM
- exclusive-read exclusive-write (EREW)
- concurrent-read exclusive-write (CREW)
- concurrent-read concurrent-write (CRCW)
- Arbitrary CRCW - an arbitrary processor writing succeeds
- Priority CRCW - the lowest numbered processor writing succeeds
- Common CRCW - writing succeeds only if all processors write the same value
Drawbacks
- Does not reveal how the algorithm will run on PRAMs with different number of proc
- Fully specifying allocation requires an unnecessary level of detail
Work Depth
You provide a sequence of instructions. At each time step, you specify the number of parallel operations.
WD-presentation Sufficiency Theorem
Given an algorithm in WD mode that takes x=x(n) operations and d=d(n) time, the algorithm can be implemented in any p-processor PRAM with O(x/p + d) time
Speedup
- A parallel algorithm is work-optimal if W(n) grows asymptotically the same as T(n).
- A work-optimal parallel algorithm is work-time-optimal if T(n) cannot be improved by another work-optimal algorithm.
- If T(n) is best known and W(n) matches it, this is called linear-speedup
NC Theory
Nick's Class Theory
- Good serial algorithms: Poly time
- Good parallel algorithm: poly-log \(\displaystyle O(\log ^c n)\) time, poly processors
Technique: Balanced Binary Trees
Example Applications:
- Prefix sum
- Largest element
- Nearest-one element
- Compaction
- Inputs
- Array A[1..n] of elements
- Associative binary operation, denoted * (e.g. addition, multiplication, min, max)
- Outputs
- Array B containing B(i) = A(1) * ... * A(i)
- Algorithm
\(\displaystyle O(n)\) work and \(\displaystyle O(\log n)\) time
for i, 1 <= i <= n pardo B(0, i) = A(i) // Summation step up the tree for h=1 to log n B(h, i) = B(h-1, 2i-1) * B(h-1, 2i) // Down the tree for h=log n to 1 if i even, i <= n/2^h C(h, i) = C(h+1, i/2) if i == 1 C(h, i) = B(h, i) if i odd, 3 <= i <= n/2^h C(h, i) = C(h+1, i/2) * B(h, i) return C(0, :)