==XMT Language==
XMTC is a single-program multiple-data (SPMD) extension of C.<br>
*Spawn creates threads
*Threads expire at Join
*<code>$</code> represents the number of the thread
*<code>PS Ri Rj</code> is an atomic prefix sum
**Stores Ri + Rj in Ri
**Stores the original value of Ri in Rj
<pre>
int x = 0;
====Types of PRAM====
*exclusive-read exclusive-write (EREW)
*concurrent-read exclusive-write (CREW)
*concurrent-read concurrent-write (CRCW)
**Arbitrary CRCW - an arbitrary processor's write succeeds
**Priority CRCW - the lowest-numbered processor's write succeeds
**Common CRCW - the write succeeds only if all processors write the same value
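The three CRCW variants differ only in how simultaneous writes to one cell are resolved. A minimal serial sketch of the three rules (illustrative only; the function and names are this sketch's own):

```python
# One CRCW write step: several "processors" try to write one shared cell;
# the model's rule decides which value lands.
def crcw_write(attempts, rule):
    """attempts: list of (processor_id, value), all aimed at the same cell."""
    if rule == "arbitrary":       # some attempt succeeds; here simply the first
        return attempts[0][1]
    if rule == "priority":        # the lowest-numbered processor succeeds
        return min(attempts)[1]
    if rule == "common":          # only defined when every value agrees
        values = {v for _, v in attempts}
        if len(values) != 1:
            raise ValueError("Common CRCW: conflicting writes")
        return values.pop()
    raise ValueError("unknown rule")

print(crcw_write([(3, 7), (1, 9), (2, 7)], "priority"))  # -> 9
```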
====Drawbacks====
*Does not reveal how the algorithm will run on PRAMs with a different number of processors
*Fully specifying allocation requires an unnecessary level of detail
===Work Depth===
;Notes
*Other resources call this Brent's theorem
*<math>x</math> is the work and <math>d</math> is the depth
===Speedup===
*A parallel algorithm is ''work-optimal'' if W(n) grows asymptotically the same as T(n).
*A work-optimal parallel algorithm is ''work-time-optimal'' if T(n) cannot be improved by another work-optimal algorithm.
*If T(n) is the best known and W(n) matches it, this is called ''linear speedup''
====NC Theory====
Nick's Class Theory
*Good serial algorithm: poly time
*Good parallel algorithm: poly-log <math>O(\log ^c n)</math> time, poly processors
==Technique: Balanced Binary Trees==
Example Applications:
*Prefix sum
*Largest element
*Nearest-one element
*Compaction
;Inputs
*Array A[1..n] of elements
*Associative binary operation, denoted * (e.g. addition, multiplication, min, max)
;Outputs
*Array B containing B(i) = A(1) * ... * A(i)
;Algorithm
<math>O(n)</math> work and <math>O(\log n)</math> time
<pre>
==Technique: Merge Sorting Cluster==
===Technique: Partitioning===
*Merging: Given two sorted arrays, A and B, create a sorted array C consisting of the elements of A and B
*Ranking: Given an element x, find <math>Rank(x, A) = i</math> defined such that <math>A(i) \leq x \leq A(i+1)</math>
**Note the book sometimes uses <math>Rank(i,B)</math> to denote <math>Rank(A(i), B)</math>.
;Equivalence of Merging and Ranking
*Given an algorithm for merging, we know A(i) belongs in C(j), so Rank(A(i), B) = j-i
*Given an algorithm for ranking, we get that A(i) belongs in C(j) where j = Rank(A(i), B) + i
;Naive algorithm
Apply binary search element-wise on a concurrent-read system.
O(log n) time, O(n log n) work
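The naive ranking-based merge can be sketched serially; each loop body below is what one PRAM processor would do with a binary search (a sketch, not the notes' code; `merge_by_ranking` is a made-up name):

```python
# Ranking-based merge: Rank(x, B) = number of elements of B that precede x.
# Each element of A and B computes its final slot in C independently.
import bisect

def merge_by_ranking(A, B):
    C = [None] * (len(A) + len(B))
    for i, x in enumerate(A):                 # pardo in the PRAM version
        C[i + bisect.bisect_left(B, x)] = x
    for j, y in enumerate(B):                 # bisect_right breaks ties with A
        C[j + bisect.bisect_right(A, y)] = y
    return C

print(merge_by_ranking([1, 4, 6], [2, 3, 5]))  # -> [1, 2, 3, 4, 5, 6]
```

Using `bisect_left` for A's elements and `bisect_right` for B's ensures that equal elements from the two arrays land in distinct slots.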
;Partitioning paradigm
Input size: n
*Partition the input into p jobs of size approx. n/p
*Do the small jobs concurrently, using a separate (possibly serial) algorithm for each.
Example:
Total work O(n), total time O(log n)
;Notes
*We do not need to compute A_end or B_end. Just continue merging until we hit some index where (i % x)=0 in A or B.
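A serial sketch of the partitioning paradigm applied to merging (illustrative; the notes draw splitters from both arrays so every job is O(n/p), while this one-sided version only splits A):

```python
# Pick every x-th element of A as a splitter, rank it into B with a binary
# search, then merge the resulting segment pairs independently -- each pair
# is one "small job" a separate processor would handle.
import bisect

def partitioned_merge(A, B, p=4):
    x = max(1, len(A) // p)                       # job size ~ n/p
    cuts_a = list(range(0, len(A), x)) + [len(A)]
    cuts_b = [0] + [bisect.bisect_left(B, A[i]) for i in cuts_a[1:-1]] + [len(B)]
    C = []
    for k in range(len(cuts_a) - 1):              # each job runs serially
        # sorted() stands in for a serial linear merge of the two segments
        C.extend(sorted(A[cuts_a[k]:cuts_a[k+1]] + B[cuts_b[k]:cuts_b[k+1]]))
    return C

print(partitioned_merge([1, 3, 5, 7, 9, 11], [2, 4, 6, 8, 10, 12], p=3))
```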
===Technique: Divide and Conquer===
;Merge Sort
Input: Array A[1..n]<br>
Complexity:<br>
===Technique: Accelerating Cascades===
Consider: for a problem of size n, there are two parallel algorithms.<br>
*Algo A: <math>W_1(n)</math> work, <math>T_1(n)</math> time
*Algo B: Work <math>W_2(n) > W_1(n)</math>. Time <math>T_2(n) < T_1(n)</math>. Faster but less efficient.
Assume Algo A is a "reducing algorithm"<br>
We start with Algo A until the size is below some threshold. Then we switch to Algo B.
E.g.
*Algo 1 runs in <math>O(\log n)</math> iterations, each taking <math>O(\log m)</math> time and <math>O(m)</math> work on an instance of size m. Each iteration reduces the size from m to (3/4)m.
*Algo 2 runs in <math>O(\log n)</math> time and <math>O(n\log n)</math> work.
*Run Algo 1 for <math>O(\log \log n)</math> rounds, then run Algo 2.
**Total time is <math>O(\log n \log \log n)</math> and work is <math>O(n)</math>
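The totals in that example follow from a short calculation (a sketch; constants elided):

```latex
\begin{align*}
&\text{After } r \text{ rounds the size is } (3/4)^r n,
  \text{ so } r = O(\log\log n) \text{ rounds reach size } n/\log n.\\
&\text{Algo 1 phase: } O(\log\log n)\cdot O(\log n) = O(\log n\,\log\log n)\text{ time},
  \qquad \sum_{r\ge 0} c\,(3/4)^r n = O(n)\text{ work}.\\
&\text{Algo 2 phase on } n' = n/\log n:\quad O(\log n')=O(\log n)\text{ time},
  \qquad O(n'\log n') = O(n)\text{ work}.
\end{align*}
```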
===Problem: Selection===
;Algorithm 1
*Partition the elements into rows of size <math>\log n</math>
*For each row, find the median within the row
*Find the median of medians (MoM) in <math>O(n)</math>
*Put all rows with median <= MoM above and all rows with median >= MoM below
*Now <math>m/4</math> elements are smaller and <math>m/4</math> are larger
**Known as the reducing lemma
This algorithm solves the selection problem in <math>O(\log^2 n)</math> time and <math>O(n)</math> work.
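A serial sketch of Algorithm 1 (the notes run each step in parallel; the row length and the small-instance cutoff are this sketch's own choices):

```python
# Rows of length ~log n, median per row, then the median of medians (MoM)
# guarantees ~m/4 elements on each side, so at least a quarter of the
# instance can be discarded every iteration (the reducing lemma).
import math

def select(A, k):
    """Return the k-th smallest (1-indexed) element of A."""
    A = list(A)
    while len(A) > 8:                        # small instances: just sort
        r = max(2, int(math.log2(len(A))))   # row length ~ log n
        rows = [A[i:i+r] for i in range(0, len(A), r)]
        medians = [sorted(row)[len(row)//2] for row in rows]
        mom = sorted(medians)[len(medians)//2]
        lo = [x for x in A if x < mom]
        eq = [x for x in A if x == mom]
        if k <= len(lo):
            A = lo
        elif k <= len(lo) + len(eq):
            return mom
        else:
            k -= len(lo) + len(eq)
            A = [x for x in A if x > mom]
    return sorted(A)[k-1]

print(select(range(100, 0, -1), 17))  # -> 17
```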
;Accelerating Cascades
*What we have:
**Algorithm 1 has <math>O(\log n)</math> iterations.
::Each iteration reduces a size m instance in <math>O(\log m)</math> time and <math>O(m)</math> work to an instance of size <math>\leq 3m/4</math>
**Algorithm 2 runs in <math>O(\log n)</math> time and <math>O(n \log n)</math> work.
*Step 1: Use Algorithm 1 to reduce from n to <math>n / \log n</math>
*Step 2: Apply Algorithm 2
===Informal Work-Depth (IWD)===
===Algorithm===
*Partition A into n/r subarrays <math>B_1,...,B_{n/r}</math>
**Sort each subarray separately using serial bucket sort (counting sort)
**Compute number(v,s) = # of elements of value v in <math>B_s</math> for <math>0\leq v \leq r-1</math> and <math>1 \leq s \leq n/r</math>
**Compute serial(i) = # of elements j in the same block <math>B_s</math> such that A(j)=A(i) and <math>j < i</math>, for <math>1 \leq i \leq n</math>
*Run prefix sums on number(v,1), number(v,2), ..., number(v,n/r) into ps(v,1), ps(v,2), ..., ps(v,n/r)
*Compute prefix sums of cardinality(0), ..., cardinality(r-1) into global_ps(0), ..., global_ps(r-1)
*The rank of element <math>i</math> is <math>1+serial(i)+ps(v,s-1)+global\_ps(v-1)</math>
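The whole scheme can be checked with a serial sketch (0-indexed, so the +1 in the rank formula is dropped; plain loops stand in where the notes use the parallel prefix-sum routine):

```python
def integer_sort(A, r):
    """Stable sort of A whose values lie in range(r), via the blocked scheme."""
    blocks = [A[i:i+r] for i in range(0, len(A), r)]    # n/r blocks of size ~r
    # number[v][s]: count of value v in block s (step 1; one task per block)
    number = [[blk.count(v) for blk in blocks] for v in range(r)]
    # serial[i]: elements equal to A[i] sitting earlier in the same block
    serial = []
    for blk in blocks:
        seen = {}
        for x in blk:
            serial.append(seen.get(x, 0))
            seen[x] = seen.get(x, 0) + 1
    # ps[v][s]: how many v's appear in blocks before block s (step 2)
    ps = [[0] for _ in range(r)]
    for v in range(r):
        for c in number[v]:
            ps[v].append(ps[v][-1] + c)
    # global_ps[v]: how many elements have value < v (step 3)
    global_ps = [0]
    for v in range(r):
        global_ps.append(global_ps[-1] + ps[v][-1])
    # rank(i) = serial(i) + ps(v, s-1) + global_ps(v-1), 0-indexed (step 4)
    out = [None] * len(A)
    for i, x in enumerate(A):
        out[serial[i] + ps[x][i // r] + global_ps[x]] = x
    return out

print(integer_sort([3, 1, 2, 0, 1, 3, 2, 2, 0], 4))
```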
===Complexity===
*Step 1: <math>T=O(r),\;W=O(r)</math> per subarray.
**Total: <math>T=O(r),\; W=O(n)</math>
*Step 2: <math>r</math> computations, each <math>T=O(\log(n/r)),\; W=O(n/r)</math>
**Total: <math>T=O(\log n),\; W=O(n)</math>
*Step 3: <math>T=O(\log r),\; W=O(r)</math>
*Step 4: <math>T=O(1),\; W=O(n)</math>
*Total: <math>T=O(r + \log n),\; W=O(n)</math>
;Notes
*Running time is not poly-log
===Theorems===
*The integer sorting algorithm runs in <math>O(r+\log n)</math> time and <math>O(n)</math> work
*The integer sorting algorithm can be applied to run in time <math>O(k(r^{1/k}+\log n))</math> and work <math>O(kn)</math>
Radix sort using the basic integer sort (BIS) algorithm.<br>
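Radix sort then just makes k stable passes over base-b digits, least significant first; a serial sketch, with a simple bucket pass standing in for one BIS call:

```python
# k passes over base-b digits, least significant first; the stability of
# each pass is what makes the final order correct.
def radix_sort(A, b, k):
    """Sort non-negative ints that have at most k base-b digits."""
    for d in range(k):
        buckets = [[] for _ in range(b)]      # stable digit pass (one BIS call)
        for x in A:
            buckets[(x // b**d) % b].append(x)
        A = [x for bucket in buckets for x in bucket]
    return A

print(radix_sort([170, 45, 75, 90, 2, 802, 24, 66], 10, 3))
```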
===2-3 tree===
A 2-3 tree is a rooted tree which has the properties:
*Each internal node has 2-3 ordered children
*For any internal node, every directed path to a leaf is the same length
Types of queries:
*''search(a)''
*''insert(a)''
*''delete(a)''
====Serial Algorithms====
*Deletion: discard(a)
**Delete the connection between a and the parent of a
**If the parent now has 1 child
***If the parent's sibling has 2 children, move the child to the sibling and discard(parent)
***If the parent's sibling has 3 children, take one child from the sibling
====Parallel Algorithms====
Assume a concurrent-read model
*Insert: Suppose we want to insert sorted elements <math>c_1,...,c_k</math> into a tree with n elements.
**Insert element <math>c_{k/2}</math>.
**Then insert in parallel <math>(c_1,...,c_{k/2-1})</math> and <math>(c_{k/2+1},...,c_k)</math>
**Time is <math>O(\log k \log n)</math> using <math>k</math> processors.
*Deletion: Suppose we want to delete sorted elements <math>c_1,...,c_k</math>
**Backwards Insertion
**for t=0 to <math>\log k</math>
***if <math>i \equiv 2^t \pmod{2^{t+1}}</math>
****discard(<math>c_i</math>)
**Time is <math>O(\log n \log k)</math> without pipelining
**With pipelining, the complexity is <math>O(\log n + \log k)</math>
===Pipelining===
*There are <math>\log k</math> waves of the absorb procedure.
*Apply pipelining to make the time <math>O(\log n + \log k)</math>
==Maximum Finding==
===Constant time, <math>O(n^2)</math> Work===
*Compare every pair of elements in A[1...n]
<pre>
for i=1 to n pardo
===<math>O(\log \log n)</math> time and <math>O(n\log \log n)</math> work algorithm===
*Split A into <math>\sqrt{n}</math> subarrays
*Find the max of each subarray recursively
*Find the max of the subarray maxima in <math>O(n)</math> work using the constant-time algorithm
Complexity
*<math>T(n) \leq T(\sqrt{n}) + c_1</math>
*<math>W(n) \leq \sqrt{n}W(\sqrt{n}) + c_2 n</math>
*<math>T(n) = O(\log \log n)</math>, <math>W(n) = O(n\log \log n)</math>
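A serial sketch of the recursion (the recursive calls and the final all-pairs comparison would each run in parallel; `par_max` is a made-up name):

```python
# Split into ~sqrt(n) subarrays, recurse on each, then combine the maxima
# with the constant-time step: keep the element no rival exceeds.
import math

def par_max(A):
    n = len(A)
    if n <= 2:
        return max(A)
    s = math.isqrt(n)                                     # subarray size ~sqrt(n)
    maxima = [par_max(A[i:i+s]) for i in range(0, n, s)]  # pardo per subarray
    return next(m for m in maxima if all(m >= other for other in maxima))

print(par_max([3, 9, 1, 27, 4, 8, 2]))  # -> 27
```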
===<math>O(\log \log n)</math> time and <math>O(n)</math> work===
*Step 1: Partition into blocks of size <math>\log \log n</math>. Then we have <math>n/ \log \log n</math> blocks.
**Apply the serial linear-time algorithm to find the maximum of each block
**<math>O(\log \log n)</math> time and <math>O(n)</math> work.
*Step 2: Apply the doubly-log algorithm to the <math>n / \log \log n</math> maxima.
**<math>O(\log \log n)</math> time and <math>O(n)</math> work.
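The two steps in a serial sketch (plain `max()` stands in for both the serial block scan and the doubly-log combine; the block-size arithmetic is this sketch's own):

```python
import math

def opt_max(A):
    n = len(A)
    b = max(1, int(math.log2(max(2, math.log2(max(2, n))))))  # block size ~log log n
    maxima = [max(A[i:i+b]) for i in range(0, n, b)]  # step 1: serial max per block
    return max(maxima)                                # step 2: doubly-log algorithm

print(opt_max(list(range(1000))))  # -> 999
```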
===Random Sampling===
Maximum finding in <math>O(1)</math> time and <math>O(n)</math> work with very high probability
*Step 1: Using an aux array B of size <math>n^{7/8}</math>, independently fill it with random elements from A
*Step 2: Find the max <math>m</math> of array B in <math>O(1)</math> time and <math>O(n)</math> work
**Last 3 pulses of the recursive doubly-log time algorithm:
**Pulse 1: B is partitioned into <math>n^{3/4}</math> blocks of size <math>n^{1/8}</math> each. Find the max of each block.
**Pulse 2: The <math>n^{3/4}</math> maxima are partitioned into <math>n^{1/2}</math> blocks of size <math>n^{1/4}</math> each. Find the max of each block.
**Pulse 3: Find the max m of the <math>n^{1/2}</math> maxima in <math>O(1)</math> time and <math>O(n)</math> work.
*Step 3: While there is an element larger than m, throw the new element into an array of size <math>n^{7/8}</math>.
:Compute the maximum of the new array.
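The steps above in a serial sketch (one recheck pass stands in for the while loop, and `random.choices` plus the names here are this sketch's own details):

```python
# Sample ~n^{7/8} elements, take the sample's max m, then sweep A once for
# elements above m; w.h.p. only a few survive, so one more pass over a tiny
# array finishes the job.
import random

def sampled_max(A, seed=0):
    rng = random.Random(seed)
    b = max(1, int(len(A) ** (7 / 8)))
    m = max(rng.choices(A, k=b))             # steps 1-2: max of a random sample
    leftovers = [x for x in A if x > m]      # step 3: elements the sample missed
    return max(leftovers) if leftovers else m

print(sampled_max(list(range(10000))))  # -> 9999
```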
;Complexity
*Step 1 takes <math>O(1)</math> time and <math>O(n^{7/8})</math> work
*Step 2 takes <math>O(1)</math> time and <math>O(n)</math> work
*Each iteration of Step 3 takes <math>O(1)</math> time and <math>O(n)</math> work
*With high probability, we only need one iteration (see theorem below), so the total time is <math>O(1)</math> and the total work is <math>O(n)</math>
====Theorem 8.2====
The probability of not finishing in the above time is <math>O(1/n^c)</math>.
{{hidden | Proof |
See page 58 of the classnotes.<br>
;Inputs
*Vector of vertices, each pointing to an index in the edge vector
*Vector of edges, e.g. (1,2)
*Vector of pointers to identical edges, e.g. index of (1,2) -> index of (2,1)
*A vertex <math>r</math>
;Goal:
*Select a direction for each edge to make our graph a tree with root <math>r</math>
;Algorithm
*Step 1: For every edge <math>(u,v)</math>, add two directed edges <math>u \rightarrow v</math> and <math>v \rightarrow u</math>
*Step 2: For every directed edge <math>u \rightarrow v</math>, set a pointer to another directed edge out of <math>v</math>, <math>next(u \rightarrow v)</math>
**If our edge is 1 -> 2, then the next edge is the edge immediately after edge 2 -> 1 in our edges array
**If edge 2 -> 1 is the last edge coming out of 2, then the next edge is the first edge coming out of 2
**Now we have an Euler cycle (or Euler tour) among the edges
*Step 3: <math>next(u_{r,1} \rightarrow r) = NIL</math>
*Step 4: Apply a list ranking algorithm to the edges.
*Step 5: Of each pair of directed edges between two vertices, delete the one with the lower ranking
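A serial sketch of the whole pipeline on 0-indexed vertices (a serial walk stands in for the parallel list-ranking step; `root_tree` and its edge representation are this sketch's own):

```python
def root_tree(n, edges, r):
    """parent[] for the tree on vertices 0..n-1 rooted at r. edges: (u, v) pairs."""
    m = len(edges)
    darts = edges + [(v, u) for u, v in edges]     # step 1: two directed copies
    twin = lambda i: (i + m) % (2 * m)             # index of the reversed copy
    out = {u: [] for u in range(n)}
    for i, (u, v) in enumerate(darts):
        out[u].append(i)
    # step 2: next(u->v) = dart after twin (v->u) in v's list, wrapping around
    nxt = [None] * (2 * m)
    for i, (u, v) in enumerate(darts):
        lst = out[v]
        nxt[i] = lst[(lst.index(twin(i)) + 1) % len(lst)]
    nxt[twin(out[r][-1])] = None                   # step 3: cut the cycle at r
    # step 4: list ranking (serial walk stands in for parallel list ranking)
    order, i = [], out[r][0]
    while i is not None:
        order.append(i)
        i = nxt[i]
    rank = {d: len(order) - p for p, d in enumerate(order)}
    # step 5: keep the higher-ranked dart of each pair; it was traversed
    # first, so it points away from the root and defines the child's parent
    parent = {r: None}
    for i in range(m):
        keep = i if rank[i] > rank[twin(i)] else twin(i)
        u, v = darts[keep]
        parent[v] = u
    return parent

# a path 0-1-2 plus a leaf 3 hanging off vertex 1, rooted at 2
print(root_tree(4, [(0, 1), (1, 2), (1, 3)], 2))
```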
;Notes
*Every step except 4 (list ranking) is a local operation, which is constant time and linear work
*Requires a list ranking algorithm
====Preorder Numbering====
Define: tree-edges point away from the root, non-tree edges point to the root
*Step 1: For every edge pardo, if e is a tree edge, distance(e) = 1; else distance(e) = 0
*Step 2: List ranking
*Step 3: For every edge e: u->v, preorder(v) = n - distance(e) + 1
===Technique: Pointer Jumping===
Correctness:<br>
*Each element will eventually point to the cashier
*The distance will always be correct
**If next[i] = NIL, then the distance is to the cashier
**Otherwise, the distance is <math>2^k</math> after <math>k</math> rounds
Complexity: <math>O(\log n)</math> time and <math>O(n\log n)</math> work
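A serial simulation of the synchronous rounds (each inner-loop iteration is one processor's pardo step; `list_rank` is a made-up name):

```python
# Pointer jumping: every node doubles its pointer each round, accumulating
# the distance it skips, until everyone points past the end of the list.
def list_rank(nxt):
    """nxt[i]: successor index or None. Returns each i's distance to the end."""
    n = len(nxt)
    dist = [0 if nxt[i] is None else 1 for i in range(n)]
    nxt = list(nxt)
    done = False
    while not done:                          # O(log n) synchronous rounds
        done = True
        new_nxt, new_dist = list(nxt), list(dist)
        for i in range(n):                   # pardo in the PRAM version
            if nxt[i] is not None:
                new_dist[i] = dist[i] + dist[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
                done = False
        nxt, dist = new_nxt, new_dist
    return dist

# list 3 -> 1 -> 0 -> 2 -> end
print(list_rank([2, 0, None, 1]))  # -> [1, 2, 0, 3]
```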
===Work-optimal List Ranking===
*From our list A, pull a subset S with uniform density and sparsity
*Remove S from A, creating a new subset <math>A-S</math>.
**For each element i in <math>S</math>, fix the next and distance of prev(i)
**Run compaction
*Recurse on <math>A-S</math> until the size is <math>\leq n/\log n</math>. Then use pointer jumping.
*Add <math>S</math> back to <math>A-S</math>
;Complexity
With high probability:
*<math>O(\log n \log \log n)</math> time
*<math>O(n)</math> work
===Randomized Symmetry Breaking===
===Deterministic Symmetry Breaking===
*Assign each element a unique tag, such as its index in the array
*Convert the tag to binary. For each element i, tag(i) != tag(next(i)).
*Let k be the index of the rightmost bit (where 0 is the rightmost bit) differing between tag(i) and tag(next(i))
*Let b be the value of bit k in tag(i)
*Set the new tag to <code>(k<<1) + b</code>
This algorithm takes <math>O(1)</math> time and <math>O(n)</math> work per iteration.
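One round of this tag shrinking in Python (the function name and the example chain are made up):

```python
# Deterministic coin tossing: k = index of the lowest bit where tag(i) and
# tag(next(i)) differ, b = that bit of tag(i); new tag = (k << 1) + b.
# Tags stay distinct from their successors', while n distinct tags shrink
# to O(log n) distinct values per round.
def coin_toss_round(tags, nxt):
    new = list(tags)
    for i in range(len(tags)):               # pardo in the PRAM version
        if nxt[i] is None:
            continue
        diff = tags[i] ^ tags[nxt[i]]
        k = (diff & -diff).bit_length() - 1  # lowest set bit = rightmost differing bit
        b = (tags[i] >> k) & 1
        new[i] = (k << 1) + b
    return new

print(coin_toss_round([0, 1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7, None]))
```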
===r-ruling set===
An r-ruling set is a subset <math>S</math> of a linked list satisfying two properties:
*''Uniform Density'': If node <math>i</math> is not in <math>S</math>, then one of the <math>r</math> nodes following i will be in <math>S</math>
*''Uniform Sparsity'': If node <math>i</math> is in <math>S</math>, then node <math>next(i)</math> is not in <math>S</math>
;Algorithm
The following algorithm gives us an r-ruling set with <math>r = 2\log n - 1</math>
#Apply the deterministic coin tossing algorithm to give each element a tag in <math>[0,...,2\log n - 1]</math>
#For each element i in parallel, if i is a local maximum then add it to <math>S</math>
==Tree Contraction==
===Parallel Rake===
Rake all leaves such that:
*No two leaves have the same parent (stem)
*The parent of a stem cannot be the stem of another leaf
;Parallel Rake Contraction Scheme
Apply legal parallel rakes until the tree becomes a 3-node binary tree
;Step 1
*Number the leaves according to a DFS using the Euler tour algorithm.
Observation 3:
;Step 2:
*Let <math>S_1</math> be nodes in <math>S</math> whose parents are not in <math>S</math>.
*Let <math>L_1</math> be leaves in <math>L</math> whose parents are in <math>S_1</math>.
*Apply parallel rakes to leaves in <math>L_1</math>, unless the parent is the root.
*Apply parallel rakes to leaves in <math>L - L_1</math>, unless the parent is the root.
*Repeat Step 2 until the tree is a 3-node binary tree
===Complexity===
Definitions:
*Pointer Graph
*Supervertices
*The supervertex graph
*Hookings
*Parallel pointer jumping
===A first connectivity algorithm===
The first connectivity algorithm consists of <math>\log(n)</math> iterations of the following two steps:
*Hookings - Each root hooks itself to the minimal adjacent root. If there are no adjacent stars, then this rooted star quits.
*Parallel pointer jumping - Each vertex performs pointer jumping until every vertex points to the root, forming a rooted star.
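A serial sketch of the iteration structure (a root-finding walk stands in for parallel pointer jumping; resolving concurrent hooks with a min is this sketch's own choice):

```python
def connected_components(n, edges):
    parent = list(range(n))              # every vertex starts as its own root

    def root(v):                         # stands in for parallel pointer jumping
        while parent[v] != v:
            v = parent[v]
        return v

    changed = True
    while changed:                       # O(log n) iterations in the PRAM version
        changed = False
        hook = {}
        for u, v in edges:               # hooking: roots grab the min adjacent root
            ru, rv = root(u), root(v)
            if ru != rv:
                hi, lo = max(ru, rv), min(ru, rv)
                hook[hi] = min(hook.get(hi, lo), lo)
        for hi, lo in hook.items():
            parent[hi] = lo
            changed = True
        for v in range(n):               # pointer jumping: flatten trees to stars
            parent[v] = root(v)
    return [root(v) for v in range(n)]

print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))  # -> [0, 0, 0, 3, 3, 5]
```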
;Theorems
#The pointer graph always consists of rooted trees.
#After <math>\log n</math> iterations, all vertices are contained in rooted stars that quit.
#:Each connected component is represented by a single rooted star.
;Complexity
*We need <math>O(\log n)</math> iterations
**Hookings take <math>O(1)</math> time and <math>O(n+m)</math> work
**Pointer jumping takes <math>O(\log n)</math> time and <math>O(n\log n)</math> work
*For an adjacency matrix, in total we need <math>O(\log^2 n)</math> time and <math>O(n^2\log n)</math> work, since we have a processor for each of the <math>n^2</math> possible edges.
*For incidence lists, we need <math>O(\log^2 n)</math> time and <math>O(n\log^2 n + m\log n)</math> work, since we have a processor per edge.
===A second connectivity algorithm===
Each iteration takes constant time.
#Probe quitting: each rooted star whose supervertex is not adjacent to any other supervertex quits.
#Hooking on smaller: each rooted star hooks onto a smaller vertex of another supervertex which is not a leaf.
#*Note that the choice is not unique; this requires arbitrary CRCW.
#Hooking non-hooked-upon: every rooted star not hooked upon in step 2 hooks onto another vertex in another supervertex which is not a leaf.
#Parallel pointer jumping: one round of parallel pointer jumping.
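A much-simplified sequential stand-in for this iteration (illustrative, not XMTC, and not a faithful implementation of the full algorithm): it keeps the defining feature, a single round of pointer jumping per iteration, but collapses the three hooking steps into one hook-the-larger-root-onto-the-smaller rule, so the arbitrary-CRCW subtleties of steps 2 and 3 are not modeled. All names are made up.

```python
def one_jump_connectivity(n, edges):
    """Connected-component labels via hooking plus ONE jump round per iteration."""
    D = list(range(n))

    def root(v):                       # sequential stand-in for a root lookup
        while D[v] != v:
            v = D[v]
        return v

    while True:
        hooked = False
        for u, v in edges:
            ru, rv = root(u), root(v)
            if ru != rv:
                D[max(ru, rv)] = min(ru, rv)   # hook larger root onto smaller
                hooked = True
        D = [D[D[v]] for v in range(n)]        # one round of pointer jumping
        if not hooked and all(D[D[v]] == D[v] for v in range(n)):
            return D

labels = one_jump_connectivity(5, [(0, 1), (1, 2), (3, 4)])
```

Because trees are flattened only one level per iteration, several iterations may pass before every component becomes a rooted star, which is exactly the behavior the <math>O(\log n)</math> analysis below accounts for.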
;Theorems
*The pointer graph always consists of rooted trees.
*For every vertex <math>v</math> which is not a leaf, <math>D(v) \leq v</math>
{{hidden | Proof |
* This is true after steps 1, 2, and 4.
}}
;Complexity
*Time <math>O(\log n)</math>
*Work <math>O((n+m)\log n)</math>
;Notes
*Without step 3, the algorithm will run in <math>O(n)</math> time instead of <math>O(\log n)</math>, but the result will still be valid.
*In step 2, you can hook rooted trees in addition to rooted stars for better empirical performance.
{{hidden | Proof |
Let <math>h(T)</math> be the height of tree T where a two-layer tree has height 1.<br>
Define a single node to have height 1.<br>
}}
===Minimum Spanning Forest===
minimum spanning forest
==Hardware/Architecture==
[http://users.umiacs.umd.edu/~vishkin/XMT/vCarageaLeebook.pdf Models for Advancing PRAM and Other Algorithms into Parallel Algorithms for PRAM]

[[File:Pram hardware diagram.jpg | 400px]]
*MTCU - Master Thread Control Unit - runs in serial mode
*TCU Cluster - Thread Control Unit
;Procedure
*Spawn broadcasts the spawn block's code to every thread control unit (TCU).
*Data is shared through the interconnection network.
;Performance Penalty
<math>\text{Execution Depth} = \text{Computation Depth} + LSRTM \times R + QD</math>
*<math>LSRTM</math> - length of the sequence of round trips to memory
*<math>R</math> - time for a round trip to memory
*<math>QD</math> - queuing delay
**If multiple threads access the same memory module, they are queued at the memory module
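A worked instance of the formula, with hypothetical numbers (every value below is made up purely to show how the terms combine):

```python
computation_depth = 500   # cycles of computation on the critical path (assumed)
LSRTM = 10                # round trips to memory on the critical path (assumed)
R = 40                    # cycles per round trip to memory (assumed)
QD = 25                   # cycles of queuing delay (assumed)

execution_depth = computation_depth + LSRTM * R + QD
# 500 + 10 * 40 + 25 = 925 cycles
```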
==Resources==
*[https://stanford.edu/~rezab/dao/ Stanford CME323 Distributed Algorithms Lectures 1-3]
===Textbooks===
*[http://users.umiacs.umd.edu/~vishkin/PUBLICATIONS/classnotes.pdf Uzi Vishkin ENEE651/CMSC751 Classnotes]
*[https://www.amazon.com/Introduction-Parallel-Algorithms-Joseph-JaJa/dp/0201548569?sa-no-redirect=1&pldnSite=1 Introduction to Parallel Algorithms (Joseph Jaja, textbook)]
*[https://www.cs.cmu.edu/~guyb/papers/BM04.pdf Parallel Algorithms by Guy E. Blelloch and Bruce M. Maggs (CMU)]
==Ignore==
[[Visible to::users]]