Parallel Algorithms

Parallel Algorithms notes from CMSC751 with Uzi Vishkin. See File:CMSC751 Classnotes.pdf

XMT Language

XMTC is a single-program multiple-data (SPMD) extension of C.

Spawn creates threads
Threads expire at Join

Models

PRAM

Parallel Random-Access Machine/Model
You're given n synchronous processors each with local memory and access to a shared memory.
Each processor can write to shared memory, read to shared memory, or do computation in local memory.
You tell each processor what to do at each time step.

Types of PRAM

exclusive-read exclusive-write (EREW)
concurrent-read exclusive-write (CREW)
concurrent-read concurrent-write (CRCW)
- Arbitrary CRCW - an arbitrary processor writing succeeds
- Priority CRCW - the lowest numbered processor writing succeeds
- Common CRCW - writing succeeds only if all processors write the same value

Drawbacks

Does not reveal how the algorithm will run on PRAMs with different number of proc
Fully specifying allocation requires an unnecessary level of detail

Work Depth

You provide a sequence of instructions. At each time step, you specify the number of parallel operations.

WD-presentation Sufficiency Theorem

Given an algorithm in WD mode that takes x=x(n) operations and d=d(n) time, the algorithm can be implemented in any p-processor PRAM with O(x/p + d) time

Speedup

A parallel algorithm is work-optimal if W(n) grows asymptotically the same as T(n).
A work-optimal parallel algorithm is work-time-optimal if T(n) cannot be improved by another work-optimal algorithm.
If T(n) is best known and W(n) matches it, this is called linear-speedup

NC Theory

Nick's Class Theory

Good serial algorithms: Poly time
Good parallel algorithm: poly-log \(\displaystyle O(\log ^c n)\) time, poly processors

Technique: Balanced Binary Trees

Example Applications:

Prefix sum
Largest element
Nearest-one element

Inputs

Array A[1..n] of elements
Associative binary operation, denoted * (e.g. addition, multiplication, min, max)

Outputs

Array B containing B(i) = A(1) * ... * A(i)

Algorithm

\(\displaystyle O(n)\) work and \(\displaystyle O(\log n)\) time

for i, 1 <= i <= n pardo
  B(0, i) = A(i)
  // Summation step up the tree
  for h=1 to log n
    B(h, i) = B(h-1, 2i-1) * B(h-1, 2i)
  // Down the tree
  for h=log n to 1
    if i even, i <= n/2^h
      C(h, i) = C(h+1, i/2)
    if i == 1
      C(h, i) = B(h, i)
    if i odd, 3 <= i <= n/2^h
      C(h, i) = C(h+1, i/2) * B(h, i)
return C(0, :)