Parallel Algorithms
Parallel Algorithms notes from CMSC751 with [http://users.umiacs.umd.edu/~vishkin/index.shtml Uzi Vishkin].<br>
This class is based on his book [[:File:CMSC751_Classnotes.pdf | Thinking in Parallel Some Basic Data-Parallel Algorithms and Techniques]]<br>
[http://users.umiacs.umd.edu/~vishkin/TEACHING/ Class Website]<br>
==XMT Language==
**Stores Ri + Rj in Ri
**Stores the original value of Ri in Rj
<syntaxhighlight lang="c">
int x = 0;
}
}
</syntaxhighlight>
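The code block above is cut off in these notes. Below is a minimal sketch (my own reconstruction, not verbatim from the classnotes) of the standard use of <code>ps</code> for array compaction: every nonzero entry of <code>A</code> grabs a distinct slot in <code>B</code>. The arrays <code>A</code>, <code>B</code> and the size <code>n</code> are assumed to be declared elsewhere.
<syntaxhighlight lang="c">
// Sketch (not verbatim from the classnotes): compact the nonzero entries of
// A[0..n-1] into B using the ps instruction.  A, B and n are assumed to be
// declared elsewhere; x is the shared prefix-sum base.
int x = 0;
spawn(0, n - 1) {
  int e = 1;
  if (A[$] != 0) {
    ps(e, x);     // e receives the old value of x; x is increased by e (= 1)
    B[e] = A[$];  // each surviving element therefore gets a distinct slot
  }
}
// After the spawn block joins, x holds the number of nonzero elements.
</syntaxhighlight>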
==Models==
*What we have:
**Algorithm 1 has <math>O(\log n)</math> iterations.
::Each iteration reduces a size-m instance in <math>O(\log m)</math> time and <math>O(m)</math> work to an instance of size <math>\leq 3m/4</math>
**Algorithm 2 runs in <math>O(\log n)</math> time and <math>O(n \log n)</math> work.
*Step 1: Use Algorithm 1 to reduce the instance from size n to <math>n / \log n</math>
*Step 2: Apply Algorithm 2
Total time is <math>O(\log n \log\log n)</math> and total work is <math>O(n)</math>
===Informal Work-Depth (IWD)===
Radix sort using the basic integer sort (BIS) algorithm.<br>
If your range is 0 to n and your radix is <math>\sqrt{n}</math>, then you will need <math>\log_{\sqrt{n}}(n) = 2</math> rounds.
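As a concrete (serial) illustration of the round count, here is a sketch of radix sort with radix <math>\sqrt{n}</math> (my own code, not from the classnotes): each pass is a stable counting sort on one digit, and with range <math>n</math> and radix <math>\sqrt{n}</math> exactly two passes are needed. In the parallel algorithm each pass would be one round of BIS.
<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Serial sketch of radix sort: sort n keys in [0, range) using radix r,
 * i.e. ceil(log_r(range)) stable passes of counting sort on one digit each.
 * In the parallel version each pass is one round of the BIS algorithm.   */
void radix_sort(int *a, int n, int range, int r) {
    int *out = malloc(n * sizeof(int));
    int *count = malloc((r + 1) * sizeof(int));
    for (long base = 1; base < range; base *= r) {   /* one pass per digit */
        memset(count, 0, (r + 1) * sizeof(int));
        for (int i = 0; i < n; i++) count[(a[i] / base) % r + 1]++;
        for (int d = 0; d < r; d++) count[d + 1] += count[d];  /* prefix sums */
        for (int i = 0; i < n; i++) out[count[(a[i] / base) % r]++] = a[i];
        memcpy(a, out, n * sizeof(int));
    }
    free(out);
    free(count);
}

int main(void) {
    int a[] = {15, 3, 9, 0, 12, 7, 1, 14, 8, 2, 11, 5, 4, 13, 6, 10};
    int n = 16;                    /* range 16, radix sqrt(16) = 4 -> 2 passes */
    radix_sort(a, n, 16, 4);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
</syntaxhighlight>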
==2-3 trees; Technique: Pipelining==
**Now we have an Euler cycle (or Euler tour) among the edges
*Step 3: <math>next(u_{r,1} \rightarrow r) = NIL </math>
** The edge <math>u_{r,1} \rightarrow r</math> is the opposite edge of the first edge coming out of <math>r</math>
*Step 4: Apply a list ranking algorithm to the edges.
*Step 5: For every two adjacent vertices, delete the one of the two antiparallel edges with the lower ranking
;Complexity
We get the same complexity for randomized and deterministic list ranking (see below).<br>
For the randomized version, the bounds hold with high probability.
*<math>O(\log (n) \log \log (n))</math> time
*<math>O(n)</math> work
Using a parallel prefix-sum algorithm which runs in <math>O(\log n/\log \log n)</math> time,
we can get list ranking in <math>O(\log n)</math> time and <math>O(n)</math> work.
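For reference, here is a serial simulation of the simplest list-ranking routine, Wyllie's pointer jumping, which is the non-work-optimal baseline: <math>O(\log n)</math> time but <math>O(n \log n)</math> work. The double-buffered arrays stand in for the synchronous PRAM rounds; the example list and variable names are my own.
<syntaxhighlight lang="c">
#include <stdio.h>

#define N 8

/* Serial simulation of Wyllie's pointer-jumping list ranking (the simple
 * O(log n) time, O(n log n) work baseline, not the work-optimal algorithm
 * discussed above).  next[i] == N means "end of list".                   */
int main(void) {
    int next[N] = {3, 0, 6, 5, 2, N, 7, 1};   /* list 4->2->6->7->1->0->3->5 */
    int rank[N], nnext[N], nrank[N];

    for (int i = 0; i < N; i++) rank[i] = (next[i] == N) ? 0 : 1;

    for (int round = 0; (1 << round) < N; round++) {   /* log N synchronous rounds */
        for (int i = 0; i < N; i++) {                  /* "pardo" over all nodes   */
            if (next[i] != N) {
                nrank[i] = rank[i] + rank[next[i]];
                nnext[i] = next[next[i]];
            } else {
                nrank[i] = rank[i];
                nnext[i] = next[i];
            }
        }
        for (int i = 0; i < N; i++) { rank[i] = nrank[i]; next[i] = nnext[i]; }
    }
    for (int i = 0; i < N; i++) printf("rank[%d] = %d\n", i, rank[i]);
    return 0;
}
</syntaxhighlight>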
===Randomized Symmetry Breaking===
s(i) = 1
</pre>
;Notes
* The full randomized list ranking works in <math>O(\log n \log\log n)</math> time and <math>O(n)</math> work with high probability
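The pseudocode above is truncated in these notes. The sketch below (a serial simulation, with my own variable names) shows the standard selection rule for one round of randomized symmetry breaking on a linked list: select <math>i</math> iff it drew HEAD and <math>next(i)</math> drew TAIL, so no two adjacent nodes are selected and each node (with a successor) is selected with probability 1/4.
<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 8

/* One round of randomized symmetry breaking on a linked list (serial
 * simulation).  next[i] == N means "end of list".                       */
int main(void) {
    int next[N] = {3, 0, 6, 5, 2, N, 7, 1};   /* arbitrary linked list over 0..7 */
    int coin[N], s[N];
    srand((unsigned)time(NULL));

    for (int i = 0; i < N; i++) coin[i] = rand() % 2;   /* 1 = HEAD, 0 = TAIL */
    for (int i = 0; i < N; i++)                          /* HEAD followed by TAIL */
        s[i] = (coin[i] == 1 && next[i] != N && coin[next[i]] == 0);

    for (int i = 0; i < N; i++) printf("s[%d] = %d\n", i, s[i]);
    return 0;
}
</syntaxhighlight>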
===Deterministic Symmetry Breaking===
*''Uniform Sparsity'': If node <math>i</math> is in <math>S</math> then node <math>next(i)</math> is not in <math>S</math>
;Algorithm 1
The following algorithm gives us an r-ruling set with <math>r=2*\log n - 1</math>
#Apply the deterministic coin tossing algorithm to give each element a tag in <math>[0,...,2*\log n - 1]</math>
#For each element i in parallel, if its tag is a local maximum among its neighbors' tags then add it to <math>S</math> (see the sketch below)
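A serial sketch of the deterministic coin tossing step (my own code and variable names; the tag for the last element of the list is an arbitrary choice here): each node already has a distinct id (its index), and its tag is <math>2k + \mathrm{bit}_k(id)</math>, where <math>k</math> is the lowest bit position in which its id differs from its successor's id. Adjacent nodes are guaranteed to receive different tags, and the local maxima of the tags form the ruling set.
<syntaxhighlight lang="c">
#include <stdio.h>

#define N 8
#define NIL N

/* Serial sketch of one application of deterministic coin tossing on a
 * linked list whose nodes carry distinct ids (here, their indices).
 * tag(i) = 2*k + (k-th bit of id(i)), where k is the lowest bit in which
 * id(i) and id(next(i)) differ; adjacent nodes get distinct tags.        */
int main(void) {
    int next[N] = {3, 0, 6, 5, 2, NIL, 7, 1};
    int prev[N], tag[N];

    for (int i = 0; i < N; i++) prev[i] = NIL;
    for (int i = 0; i < N; i++) if (next[i] != NIL) prev[next[i]] = i;

    for (int i = 0; i < N; i++) {
        if (next[i] == NIL) { tag[i] = 2 * i; continue; }  /* tail: arbitrary choice */
        int diff = i ^ next[i];           /* ids are the indices themselves */
        int k = 0;
        while (!((diff >> k) & 1)) k++;   /* lowest differing bit position  */
        tag[i] = 2 * k + ((i >> k) & 1);
    }

    /* Local maxima of the tags form the r-ruling set S. */
    for (int i = 0; i < N; i++) {
        int left  = (prev[i] == NIL) ? -1 : tag[prev[i]];
        int right = (next[i] == NIL) ? -1 : tag[next[i]];
        if (tag[i] > left && tag[i] > right)
            printf("node %d (tag %d) is in S\n", i, tag[i]);
    }
    return 0;
}
</syntaxhighlight>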
;Optimal-Work 2-Ruling set
<pre>
Apply one iteration of deterministic coin tossing
Sort elements by tags
Initialize array S with 1s
for k = 0 to 2*log(n)-1:
    for i such that tag(i) = k pardo
        if S(pre(i)) = S(next(i)) = 1:
            S(i) = 0
</pre>
==Tree Contraction==
;Step 2:
* Let <math>S_1</math> be nodes in <math>S</math> whose parents are not in <math>S</math>.
* Let <math>L_1</math> be leaves in <math>L</math> whose parents are in <math>S_1</math>.
* Apply parallel rakes to leaves in <math>L_1</math>, unless the parent is the root.
* Apply parallel rakes to leaves in <math>L - L_1</math>, unless the parent is the root.
* Repeat Step 2 until the tree is a 3-node binary tree
===Complexity===
The internal node will be <math>(a*l+b) \times (c*r+d)</math> where <math>\times</math> is either addition or multiplication.<br>
If we delete one leaf and its parent (the stem), we will modify these variables at the stem's parent.
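Here is a small sketch of that bookkeeping (struct and function names are mine, not from the classnotes): each node carries a pair <math>(a,b)</math>, and when a leaf with a known value and its stem are raked away, the remaining sibling absorbs the stem's coefficients as computed below.
<syntaxhighlight lang="c">
#include <stdio.h>

/* Sketch of the rake bookkeeping described above (names are mine, not the
 * classnotes').  Each node u carries (a,b): once its value val(u) is known,
 * it passes a*val(u) + b to its parent's + or * operation.               */
typedef struct { double a, b; } Coef;

/* Rake a leaf whose contribution to the stem is the known constant c.
 * The stem p applies op to c and to its other child s; after the rake, s
 * becomes a child of p's parent and its coefficients absorb p's.         */
Coef rake_update(Coef p, Coef s, double c, char op) {
    Coef out;
    if (op == '+') {             /* a_p*(c + (a_s*x + b_s)) + b_p */
        out.a = p.a * s.a;
        out.b = p.a * (c + s.b) + p.b;
    } else {                     /* a_p*(c * (a_s*x + b_s)) + b_p */
        out.a = p.a * c * s.a;
        out.b = p.a * c * s.b + p.b;
    }
    return out;
}

int main(void) {
    /* Fold the subtree "5 + x" onto the node holding x: the sibling's new
     * coefficients should represent 1*x + 5.                             */
    Coef stem = {1, 0}, sib = {1, 0};
    Coef folded = rake_update(stem, sib, 5.0, '+');
    printf("a = %g, b = %g\n", folded.a, folded.b);   /* prints a = 1, b = 5 */
    return 0;
}
</syntaxhighlight>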
===Tree Binarization===
;Note: This is not in the classnotes, but has been covered in exams.
To apply the above techniques to general trees or k-ary trees, you should binarize your k-ary tree.<br>
This is done as follows (see the sketch below):
* Suppose node <math>x</math> has <math>n > 2</math> children
* Select one child to keep
* Create a new node <math>x'</math>
* Move the other <math>n-1</math> children to <math>x'</math>
* Set <math>x'</math> to be a child of <math>x</math>
* Repeat for <math>x'</math> until every node has at most two children
;Complexity
* The time is <math>O(max\_degree)</math> and the work is <math>O(n)</math>
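A serial sketch of the binarization step (the representation and names are mine): a node with more than two children keeps one child and hands the rest to a newly created node, which becomes its second child; repeating this gives a binary tree.
<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the binarization described above: a node with n > 2 children
 * keeps children[0] and pushes the remaining n-1 children one level down
 * under a new node x'.  The recursion here is a serial stand-in for the
 * successive parallel rounds, so the depth added is O(max_degree).       */
typedef struct Node {
    int id;
    int nchildren;
    struct Node **children;
} Node;

Node *make_node(int id, int nchildren) {
    Node *n = malloc(sizeof(Node));
    n->id = id;
    n->nchildren = nchildren;
    n->children = nchildren ? malloc(nchildren * sizeof(Node *)) : NULL;
    return n;
}

void binarize_step(Node *x, int *next_id) {
    if (x->nchildren <= 2) return;
    Node *xp = make_node((*next_id)++, x->nchildren - 1);
    for (int i = 1; i < x->nchildren; i++) xp->children[i - 1] = x->children[i];
    x->children[1] = xp;      /* x keeps children[0]; x' becomes its second child */
    x->nchildren = 2;
    binarize_step(xp, next_id);
}

int main(void) {
    int next_id = 100;
    Node *root = make_node(0, 4);
    for (int i = 0; i < 4; i++) root->children[i] = make_node(i + 1, 0);
    binarize_step(root, &next_id);
    printf("root now has %d children; new node %d has %d children\n",
           root->nchildren, root->children[1]->id, root->children[1]->nchildren);
    return 0;
}
</syntaxhighlight>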
==Graph Connectivity==
===Minimum Spanning Forest===
====The MSF Theorem====
Let <math>G(V,E)</math> be a weighted graph and let <math>U</math> and <math>V-U</math> be two non-empty subsets of <math>V</math>.
Consider the set <math>H=\{(u,v) \mid u \in U, v \in V-U\}</math> of edges connecting vertices in <math>U</math> and <math>V-U</math>.
If <math>e</math> is the edge of minimum weight in <math>H</math>, then <math>e</math> is in the MSF of <math>G</math>.
====A first MSF algorithm====
* Sort the edges by weight. Assign a processor to each edge. Using a priority CRCW, the edge with minimum weight will take priority during writes.
Each round consists of hooking and <math>\log n</math> rounds of parallel pointer jumping.
* Each root finds the minimum-weight edge to another supervertex and hooks itself to that root.
*: This is automatic from the priority CRCW.
====A second MSF algorithm====
We replace steps 2 and 3 in the connectivity algorithm with the simple rule:
* We hook rooted stars using the edge with the smallest weight (see the sketch below).
;Notes
* The progression of edge weights up the tree is monotonically increasing, so no cycles are created.
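A serial Borůvka-style sketch of this minimum-weight hooking rule (my own code; edge weights are assumed distinct so no tie-breaking is needed): in every round each current component picks its lightest outgoing edge and hooks along it. The parallel algorithm performs the same hooking in parallel and uses pointer jumping to turn the resulting trees back into rooted stars.
<syntaxhighlight lang="c">
#include <stdio.h>

#define NV 5
#define NE 7

/* Serial Boruvka-style sketch of the minimum-weight hooking rule: every
 * round, each current component picks its lightest outgoing edge and hooks
 * along it.  Weights are assumed distinct.                               */
int u[NE] = {0, 0, 1, 1, 2, 3, 0};
int v[NE] = {1, 2, 2, 3, 3, 4, 4};
int w[NE] = {4, 1, 3, 2, 5, 7, 6};

int parent[NV];
int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }

int main(void) {
    for (int i = 0; i < NV; i++) parent[i] = i;
    int in_msf[NE] = {0}, merged = 1;

    while (merged) {
        merged = 0;
        int best[NV];
        for (int i = 0; i < NV; i++) best[i] = -1;
        for (int e = 0; e < NE; e++) {              /* each component's lightest edge */
            int a = find(u[e]), b = find(v[e]);
            if (a == b) continue;
            if (best[a] == -1 || w[e] < w[best[a]]) best[a] = e;
            if (best[b] == -1 || w[e] < w[best[b]]) best[b] = e;
        }
        for (int i = 0; i < NV; i++) {              /* hook along the chosen edges */
            int e = best[i];
            if (e == -1) continue;
            int a = find(u[e]), b = find(v[e]);
            if (a == b) continue;
            parent[a] = b;
            in_msf[e] = 1;
            merged = 1;
        }
    }
    for (int e = 0; e < NE; e++)
        if (in_msf[e]) printf("edge (%d,%d) weight %d\n", u[e], v[e], w[e]);
    return 0;
}
</syntaxhighlight>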
===Biconnectivity===
This section is not in the textbook. It is from Tarjan and Vishkin (1985)<ref name="tarjan1985biconnectivity">Robert E. Tarjan and Uzi Vishkin. ''An Efficient Parallel Biconnectivity Algorithm'' (1985). SIAM Journal on Computing. DOI: [https://doi.org/10.1137/0214061 10.1137/0214061] [https://epubs.siam.org/doi/10.1137/0214061 https://epubs.siam.org/doi/10.1137/0214061] [https://www.researchgate.net/publication/220617428_An_Efficient_Parallel_Biconnectivity_Algorithm Mirror]</ref>.
A graph is biconnected if for every two vertices <math>v_1</math> and <math>v_2</math>, there is a simple cycle containing <math>v_1</math> and <math>v_2</math>.
Intuitively this means that for any two vertices, there are at least two vertex-disjoint paths from <math>v_1</math> to <math>v_2</math>. If any single vertex and its edges are removed from a biconnected graph, the graph remains connected.
Every connected graph consists of biconnected components. These are sets of edges such that every two edges from the same set lie on a common simple cycle.
Vertices which connect two biconnected components are called ''articulation points''.
If such a point is removed, the graph is no longer connected.
;Algorithm
Assume we are given a connected graph <math>G</math>.
# First build a spanning tree <math>T</math>. Record which edges are in the spanning tree. Root the spanning tree.
# Compute the preorder number of all vertices using an Euler tour.
# For every vertex <math>v</math>, calculate the high and low preorder numbers in the subtree <math>T(v)</math>
# Create the auxiliary graph <math>G'' = (V'', E'')</math> as follows:
#* All edges of <math>G</math> are vertices in <math>G''</math>
#* For each edge <math>(v, w)</math> in <math>G - T</math>, add edge <math> ((p(v),v),(p(w),w))</math> to <math>G''</math> iff <math>v</math> and <math>w</math> are unrelated in <math>T</math>
#* For each edge <math>(v = p(w), w)</math> in <math>T</math>, add edge <math>((p(v), v), (v, w))</math> to <math>G''</math> iff <math>low(w) < v</math> or <math>high(w) \geq v + size(v)</math>
# Compute the connected components of <math>G''</math>
# Assign each edge of <math>G</math> to a biconnected component based on the connected component of its corresponding vertex in <math>G''</math>.
;Complexity
* <math>O(\log n)</math> time
==Tricks==
===Removing duplicates in an array===
Assume you have an Arbitrary CRCW PRAM.
# Given array A of size n with entries 0 to n-1
# For each entry pardo
#* Write B[A[i]] = i
#* Only one write will succeed for each unique A[i]
#* Check if B[A[i]] == i; exactly one entry per distinct value passes this check (see the sketch below)
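A sketch of this trick in XMTC-style C (my own code; it assumes the Arbitrary CRCW convention that exactly one of the concurrent writes to <code>B[A[$]]</code> takes effect, and that <code>A</code>, <code>B</code>, <code>keep</code> and <code>n</code> are declared elsewhere):
<syntaxhighlight lang="c">
// Sketch (assumes Arbitrary CRCW): keep exactly one representative per
// distinct value of A.  A, B, keep and n are assumed declared elsewhere.
spawn(0, n - 1) {
  B[A[$]] = $;               // concurrent writes; one arbitrary winner per value
}
// spawn blocks join before the next one starts
spawn(0, n - 1) {
  keep[$] = (B[A[$]] == $);  // the winner keeps its element, duplicates are dropped
}
</syntaxhighlight>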
===Compaction of an <math>n^2</math> array with <math>\leq n</math> elements in <math>O(\log n)</math> time===
* Split our <math>n^2</math> array into n subarrays, each of size <math>n</math>
* Run <math>n</math> parallel compactions, one on each subarray
* Perform one prefix-sum (compaction) over the sizes of the subarrays
* Compact all elements using (total size of previous subarrays + position in current subarray)
The benefit of this segmented compaction/prefix-sum is that, in an iterative algorithm such as parallel BFS where each segment is only processed once, the total work is <math>O(n^2)</math> (or <math>O(m)</math>).
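A serial sketch of this segmented compaction (array contents and names are my own): compact each of the <math>n</math> rows, prefix-sum the row counts to get offsets, then place each element at (offset of previous rows + position within its row). In the parallel version the row compactions and the final placement run concurrently.
<syntaxhighlight lang="c">
#include <stdio.h>

#define N 4    /* the array has N*N slots, at most N of them occupied */

/* Serial sketch of segmented compaction: compact each row separately,
 * prefix-sum the row counts, then copy each row's elements to
 * (offset of previous rows + position within the row).                 */
int main(void) {
    int a[N][N] = {{0, 7, 0, 0}, {0, 0, 0, 0}, {3, 0, 0, 9}, {0, 0, 5, 0}};
    int row[N][N], count[N], offset[N + 1], out[N];

    for (int i = 0; i < N; i++) {             /* compact each row ("in parallel") */
        count[i] = 0;
        for (int j = 0; j < N; j++)
            if (a[i][j] != 0) row[i][count[i]++] = a[i][j];
    }
    offset[0] = 0;                            /* prefix sums over the row counts */
    for (int i = 0; i < N; i++) offset[i + 1] = offset[i] + count[i];

    for (int i = 0; i < N; i++)               /* final placement */
        for (int j = 0; j < count[i]; j++)
            out[offset[i] + j] = row[i][j];

    for (int k = 0; k < offset[N]; k++) printf("%d ", out[k]);
    printf("\n");
    return 0;
}
</syntaxhighlight>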
==XMT Programming Tips==
I never figured out how to use standard C headers.
* XMT only has int (32-bit) and float (32-bit). If you need a bool type, you will need to define it in a header.
* You cannot call functions from within <code>spawn</code>
Here is my helpers header <code>helpers.h</code>:
{{ hidden | <code>helpers.h</code> |
<syntaxhighlight lang="c">
#ifndef HELPERS_H
#define HELPERS_H
#define true 1
#define false 0
#define bool int
// #define NIL -1
#define max(a, b) ((((a) < (b)) ? (b) : (a)))
#define min(a, b) ((((a) < (b)) ? (a) : (b)))
#endif
</syntaxhighlight>
}}
If, like me, you prefer to debug in C++, you can use the following code to adapt the XMT code.
Note that this code does not run anything in parallel; the spawn block is executed serially.
{{hidden | C++ Code |
<syntaxhighlight lang="cpp">
#include <cerrno>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Serial stand-ins for the XMTC constructs: spawn becomes a plain loop and
// ps/psm swap-and-add so that `a` receives the old value of the base `b`.
// (Using `$` as an identifier relies on a common GCC/Clang extension.)
#define spawn(x, y) for (int $ = x; $ <= y; ++$)
//#define spawn(x, y) for (int $ = y; $ >= x; --$)
#define ps(a, b) \
  {              \
    b += a;      \
    a = b - a;   \
  }
#define psm(a, b) \
  {               \
    b += a;       \
    a = b - a;    \
  }
#define psBaseReg int
#define STRINGIFY2(X) #X
#define STRINGIFY(X) STRINGIFY2(X)

using namespace std;
#define GRAPH_NUM graph0
#define N 19
#define M 36

vector<int> vertices;
vector<int> degrees;
vector<vector<int>> edges;

// Forward declarations (the definitions appear after main below).
string readFile(const string& datapath);
vector<string> split(string s, regex r);

int main() {
  vertices.resize(N);
  degrees.resize(N);
  edges = std::vector<std::vector<int>>(M, std::vector<int>(2));

  string datapath = "data/" STRINGIFY(GRAPH_NUM) "/graph.txt";
  string data = readFile(datapath);
  regex section_regex(
      "[\\n\\r][\\n\\r]+|[\\r\\n]0\\s[\\r\\n]+",
      std::regex_constants::ECMAScript | std::regex_constants::icase);
  regex space_regex("[\\n\\s\\r]+", std::regex_constants::ECMAScript |
                                        std::regex_constants::icase);
  vector<string> sections = split(data, section_regex);
  // for (int i = 0; i < sections.size(); i++) {
  //   cout << "Section " << i << " " << sections[i].size() << endl;
  //   cout << sections[i] << endl;
  //}
  return 0;
}

string readFile(const string& datapath) {
  std::ifstream in(datapath, std::ios::in);
  string data;
  if (in.good()) {
    std::string contents;
    in.seekg(0, std::ios::end);
    contents.resize(static_cast<unsigned int>(in.tellg()));
    in.seekg(0, std::ios::beg);
    in.read(&contents[0], contents.size());
    return contents;
  }
  std::cerr << "Failed to open file: " << datapath << std::endl;
  throw(errno);
}

vector<string> split(string s, regex r) {
  vector<string> splits;
  smatch m;                      // <-- need a match object
  while (regex_search(s, m, r))  // <-- use it here to get the match
  {
    int split_on = m.position();  // <-- use the match position
    splits.push_back(s.substr(0, split_on));
    s = s.substr(split_on + m.length());  // <-- also, skip the whole match
  }
  if (!s.empty()) {
    splits.push_back(s);  // and there may be one last token at the end
  }
  return splits;
}
</syntaxhighlight>
}}
==Hardware/Architecture==
[http://users.umiacs.umd.edu/~vishkin/XMT/vCarageaLeebook.pdf Models for Advancing PRAM and Other Algorithms into Parallel Algorithms for PRAM]
*[https://www.cs.cmu.edu/~guyb/papers/BM04.pdf Parallel Algorithms by Guy E. Blelloch and Bruce M. Maggs (CMU)]
==References==