Parallel Algorithms
Parallel Algorithms notes from CMSC751 with [http://users.umiacs.umd.edu/~vishkin/index.shtml Uzi Vishkin].<br>
This class is based on his book [[:File:CMSC751_Classnotes.pdf | Thinking in Parallel Some Basic Data-Parallel Algorithms and Techniques]]<br>
[http://users.umiacs.umd.edu/~vishkin/TEACHING/ Class Website]<br>
==XMT Language==
**Stores Ri + Rj in Ri
**Stores the original value of Ri in Rj
<syntaxhighlight lang="c">
int x = 0;
}
}
</syntaxhighlight>
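The code block above is cut off in these notes. Below is a minimal sketch (my own reconstruction, not verbatim from the classnotes) of the standard use of <code>ps</code> for array compaction: every nonzero entry of <code>A</code> grabs a distinct slot in <code>B</code>. The arrays <code>A</code>, <code>B</code> and the size <code>n</code> are assumed to be declared elsewhere.
<syntaxhighlight lang="c">
// Sketch (not verbatim from the classnotes): compact the nonzero entries of
// A[0..n-1] into B using the ps instruction.  A, B and n are assumed to be
// declared elsewhere; x is the shared prefix-sum base.
int x = 0;
spawn(0, n - 1) {
  int e = 1;
  if (A[$] != 0) {
    ps(e, x);     // e receives the old value of x; x is increased by e (= 1)
    B[e] = A[$];  // each surviving element therefore gets a distinct slot
  }
}
// After the spawn block joins, x holds the number of nonzero elements.
</syntaxhighlight>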
==Models==
*What we have:
**Algorithm 1 has <math>O(\log n)</math> iterations.
::Each iteration reduces a size-m instance in <math>O(\log m)</math> time and <math>O(m)</math> work to an instance of size <math>\leq 3m/4</math>
**Algorithm 2 runs in <math>O(\log n)</math> time and <math>O(n \log n)</math> work.
*Step 1: Use Algorithm 1 to reduce the instance from size n to <math>n / \log n</math>
*Step 2: Apply Algorithm 2
Total time is <math>O(\log n \log\log n)</math> and total work is <math>O(n)</math>
===Informal Work-Depth (IWD)===
Radix sort using the basic integer sort (BIS) algorithm.<br>
If your range is 0 to n and your radix is <math>\sqrt{n}</math>, then you will need <math>\log_{\sqrt{n}}(n) = 2</math> rounds.
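As a concrete (serial) illustration of the round count, here is a sketch of radix sort with radix <math>\sqrt{n}</math> (my own code, not from the classnotes): each pass is a stable counting sort on one digit, and with range <math>n</math> and radix <math>\sqrt{n}</math> exactly two passes are needed. In the parallel algorithm each pass would be one round of BIS.
<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Serial sketch of radix sort: sort n keys in [0, range) using radix r,
 * i.e. ceil(log_r(range)) stable passes of counting sort on one digit each.
 * In the parallel version each pass is one round of the BIS algorithm.   */
void radix_sort(int *a, int n, int range, int r) {
    int *out = malloc(n * sizeof(int));
    int *count = malloc((r + 1) * sizeof(int));
    for (long base = 1; base < range; base *= r) {   /* one pass per digit */
        memset(count, 0, (r + 1) * sizeof(int));
        for (int i = 0; i < n; i++) count[(a[i] / base) % r + 1]++;
        for (int d = 0; d < r; d++) count[d + 1] += count[d];  /* prefix sums */
        for (int i = 0; i < n; i++) out[count[(a[i] / base) % r]++] = a[i];
        memcpy(a, out, n * sizeof(int));
    }
    free(out);
    free(count);
}

int main(void) {
    int a[] = {15, 3, 9, 0, 12, 7, 1, 14, 8, 2, 11, 5, 4, 13, 6, 10};
    int n = 16;                    /* range 16, radix sqrt(16) = 4 -> 2 passes */
    radix_sort(a, n, 16, 4);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}
</syntaxhighlight>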
==2-3 trees; Technique: Pipelining==
**Now we have an Euler cycle (or Euler tour) among the edges
*Step 3: <math>next(u_{r,1} \rightarrow r) = NIL </math>
** The edge <math>u_{r,1} \rightarrow r</math> is the opposite edge of the first edge coming out of <math>r</math>
*Step 4: Apply a list ranking algorithm to the edges.
*Step 5: For every two adjacent vertices, delete the one of the two antiparallel edges with the lower ranking
;Complexity
We get the same complexity for randomized and deterministic list ranking (see below).<br>
For the randomized version, the bounds hold with high probability.
*<math>O(\log (n) \log \log (n))</math> time
*<math>O(n)</math> work
Using a parallel prefix-sum algorithm which runs in <math>O(\log n/\log \log n)</math> time,
we can get list ranking in <math>O(\log n)</math> time and <math>O(n)</math> work.
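For reference, here is a serial simulation of the simplest list-ranking routine, Wyllie's pointer jumping, which is the non-work-optimal baseline: <math>O(\log n)</math> time but <math>O(n \log n)</math> work. The double-buffered arrays stand in for the synchronous PRAM rounds; the example list and variable names are my own.
<syntaxhighlight lang="c">
#include <stdio.h>

#define N 8

/* Serial simulation of Wyllie's pointer-jumping list ranking (the simple
 * O(log n) time, O(n log n) work baseline, not the work-optimal algorithm
 * discussed above).  next[i] == N means "end of list".                   */
int main(void) {
    int next[N] = {3, 0, 6, 5, 2, N, 7, 1};   /* list 4->2->6->7->1->0->3->5 */
    int rank[N], nnext[N], nrank[N];

    for (int i = 0; i < N; i++) rank[i] = (next[i] == N) ? 0 : 1;

    for (int round = 0; (1 << round) < N; round++) {   /* log N synchronous rounds */
        for (int i = 0; i < N; i++) {                  /* "pardo" over all nodes   */
            if (next[i] != N) {
                nrank[i] = rank[i] + rank[next[i]];
                nnext[i] = next[next[i]];
            } else {
                nrank[i] = rank[i];
                nnext[i] = next[i];
            }
        }
        for (int i = 0; i < N; i++) { rank[i] = nrank[i]; next[i] = nnext[i]; }
    }
    for (int i = 0; i < N; i++) printf("rank[%d] = %d\n", i, rank[i]);
    return 0;
}
</syntaxhighlight>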
===Randomized Symmetry Breaking===
s(i) = 1
</pre>
;Notes
* The full randomized list ranking works in <math>O(\log n \log\log n)</math> time and <math>O(n)</math> work with high probability
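The pseudocode above is truncated in these notes. The sketch below (a serial simulation, with my own variable names) shows the standard selection rule for one round of randomized symmetry breaking on a linked list: select <math>i</math> iff it drew HEAD and <math>next(i)</math> drew TAIL, so no two adjacent nodes are selected and each node (with a successor) is selected with probability 1/4.
<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 8

/* One round of randomized symmetry breaking on a linked list (serial
 * simulation).  next[i] == N means "end of list".                       */
int main(void) {
    int next[N] = {3, 0, 6, 5, 2, N, 7, 1};   /* arbitrary linked list over 0..7 */
    int coin[N], s[N];
    srand((unsigned)time(NULL));

    for (int i = 0; i < N; i++) coin[i] = rand() % 2;   /* 1 = HEAD, 0 = TAIL */
    for (int i = 0; i < N; i++)                          /* HEAD followed by TAIL */
        s[i] = (coin[i] == 1 && next[i] != N && coin[next[i]] == 0);

    for (int i = 0; i < N; i++) printf("s[%d] = %d\n", i, s[i]);
    return 0;
}
</syntaxhighlight>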
===Deterministic Symmetry Breaking===
*''Uniform Sparsity'': If node <math>i</math> is in <math>S</math> then node <math>next(i)</math> is not in <math>S</math>
;Algorithm 1
The following algorithm gives us an r-ruling set with <math>r=2*\log n - 1</math>
#Apply the deterministic coin tossing algorithm to give each element a tag in <math>[0,...,2*\log n - 1]</math>
#For each element i in parallel, if its tag is a local maximum among its neighbors' tags then add it to <math>S</math> (see the sketch below)
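A serial sketch of the deterministic coin tossing step (my own code and variable names; the tag for the last element of the list is an arbitrary choice here): each node already has a distinct id (its index), and its tag is <math>2k + \mathrm{bit}_k(id)</math>, where <math>k</math> is the lowest bit position in which its id differs from its successor's id. Adjacent nodes are guaranteed to receive different tags, and the local maxima of the tags form the ruling set.
<syntaxhighlight lang="c">
#include <stdio.h>

#define N 8
#define NIL N

/* Serial sketch of one application of deterministic coin tossing on a
 * linked list whose nodes carry distinct ids (here, their indices).
 * tag(i) = 2*k + (k-th bit of id(i)), where k is the lowest bit in which
 * id(i) and id(next(i)) differ; adjacent nodes get distinct tags.        */
int main(void) {
    int next[N] = {3, 0, 6, 5, 2, NIL, 7, 1};
    int prev[N], tag[N];

    for (int i = 0; i < N; i++) prev[i] = NIL;
    for (int i = 0; i < N; i++) if (next[i] != NIL) prev[next[i]] = i;

    for (int i = 0; i < N; i++) {
        if (next[i] == NIL) { tag[i] = 2 * i; continue; }  /* tail: arbitrary choice */
        int diff = i ^ next[i];           /* ids are the indices themselves */
        int k = 0;
        while (!((diff >> k) & 1)) k++;   /* lowest differing bit position  */
        tag[i] = 2 * k + ((i >> k) & 1);
    }

    /* Local maxima of the tags form the r-ruling set S. */
    for (int i = 0; i < N; i++) {
        int left  = (prev[i] == NIL) ? -1 : tag[prev[i]];
        int right = (next[i] == NIL) ? -1 : tag[next[i]];
        if (tag[i] > left && tag[i] > right)
            printf("node %d (tag %d) is in S\n", i, tag[i]);
    }
    return 0;
}
</syntaxhighlight>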
;Optimal-Work 2-Ruling set
<pre>
Apply one iteration of deterministic coin tossing
Sort elements by tags
Initialize array S with 1s
for k = 0 to 2*log(n)-1:
    for i such that tag(i) = k pardo
        if S(pre(i)) = S(next(i)) = 1:
            S(i) = 0
</pre>
==Tree Contraction==
;Step 2:
* Let <math>S_1</math> be nodes in <math>S</math> whose parents are not in <math>S</math>.
* Let <math>L_1</math> be leaves in <math>L</math> whose parents are in <math>S_1</math>.
* Apply parallel rakes to leaves in <math>L_1</math>, unless the parent is the root.
* Apply parallel rakes to leaves in <math>L - L_1</math>, unless the parent is the root.
* Repeat Step 2 until the tree is a 3-node binary tree
===Complexity===
The internal node will be <math>(a*l+b) \times (c*r+d)</math> where <math>\times</math> is either addition or multiplication.<br>
If we delete one leaf and its parent (the stem), we will modify these variables at the stem's parent.
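Here is a small sketch of that bookkeeping (struct and function names are mine, not from the classnotes): each node carries a pair <math>(a,b)</math>, and when a leaf with a known value and its stem are raked away, the remaining sibling absorbs the stem's coefficients as computed below.
<syntaxhighlight lang="c">
#include <stdio.h>

/* Sketch of the rake bookkeeping described above (names are mine, not the
 * classnotes').  Each node u carries (a,b): once its value val(u) is known,
 * it passes a*val(u) + b to its parent's + or * operation.               */
typedef struct { double a, b; } Coef;

/* Rake a leaf whose contribution to the stem is the known constant c.
 * The stem p applies op to c and to its other child s; after the rake, s
 * becomes a child of p's parent and its coefficients absorb p's.         */
Coef rake_update(Coef p, Coef s, double c, char op) {
    Coef out;
    if (op == '+') {             /* a_p*(c + (a_s*x + b_s)) + b_p */
        out.a = p.a * s.a;
        out.b = p.a * (c + s.b) + p.b;
    } else {                     /* a_p*(c * (a_s*x + b_s)) + b_p */
        out.a = p.a * c * s.a;
        out.b = p.a * c * s.b + p.b;
    }
    return out;
}

int main(void) {
    /* Fold the subtree "5 + x" onto the node holding x: the sibling's new
     * coefficients should represent 1*x + 5.                             */
    Coef stem = {1, 0}, sib = {1, 0};
    Coef folded = rake_update(stem, sib, 5.0, '+');
    printf("a = %g, b = %g\n", folded.a, folded.b);   /* prints a = 1, b = 5 */
    return 0;
}
</syntaxhighlight>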
===Tree Binarization===
;Note: This is not in the classnotes, but has been covered in exams.
To apply the above techniques to general trees or k-ary trees, you should binarize your k-ary tree.<br>
This is done as follows (see the sketch below):
* Suppose node <math>x</math> has <math>n > 2</math> children
* Select one child to keep
* Create a new node <math>x'</math>
* Move the other <math>n-1</math> children to <math>x'</math>
* Set <math>x'</math> to be a child of <math>x</math>
* Repeat for <math>x'</math> until every node has at most two children
;Complexity
* The time is <math>O(max\_degree)</math> and the work is <math>O(n)</math>
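A serial sketch of the binarization step (the representation and names are mine): a node with more than two children keeps one child and hands the rest to a newly created node, which becomes its second child; repeating this gives a binary tree.
<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the binarization described above: a node with n > 2 children
 * keeps children[0] and pushes the remaining n-1 children one level down
 * under a new node x'.  The recursion here is a serial stand-in for the
 * successive parallel rounds, so the depth added is O(max_degree).       */
typedef struct Node {
    int id;
    int nchildren;
    struct Node **children;
} Node;

Node *make_node(int id, int nchildren) {
    Node *n = malloc(sizeof(Node));
    n->id = id;
    n->nchildren = nchildren;
    n->children = nchildren ? malloc(nchildren * sizeof(Node *)) : NULL;
    return n;
}

void binarize_step(Node *x, int *next_id) {
    if (x->nchildren <= 2) return;
    Node *xp = make_node((*next_id)++, x->nchildren - 1);
    for (int i = 1; i < x->nchildren; i++) xp->children[i - 1] = x->children[i];
    x->children[1] = xp;      /* x keeps children[0]; x' becomes its second child */
    x->nchildren = 2;
    binarize_step(xp, next_id);
}

int main(void) {
    int next_id = 100;
    Node *root = make_node(0, 4);
    for (int i = 0; i < 4; i++) root->children[i] = make_node(i + 1, 0);
    binarize_step(root, &next_id);
    printf("root now has %d children; new node %d has %d children\n",
           root->nchildren, root->children[1]->id, root->children[1]->nchildren);
    return 0;
}
</syntaxhighlight>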
==Graph Connectivity==
===Minimum Spanning Forest===
====The MSF Theorem====
Let <math>G(V,E)</math> be a weighted graph and let <math>U</math> and <math>V-U</math> be two non-empty subsets of <math>V</math>.
Consider the set <math>H=\{(u,v) \mid u \in U, v \in V-U\}</math> of edges connecting vertices in <math>U</math> and <math>V-U</math>.
If <math>e</math> is the edge of minimum weight in <math>H</math>, then <math>e</math> is in the MSF of <math>G</math>.
====A first MSF algorithm====
* Sort the edges by weight. Assign a processor to each edge. Using a priority CRCW, the edge with minimum weight will take priority during writes.
Each round consists of hooking and <math>\log n</math> rounds of parallel pointer jumping.
* Each root finds the minimum-weight edge to another supervertex and hooks itself to that root.
*: This is automatic from the priority CRCW.
====A second MSF algorithm====
We replace steps 2 and 3 in the connectivity algorithm with the simple rule:
* We hook rooted stars using the edge with the smallest weight (see the sketch below).
;Notes
* The progression of edge weights up the tree is monotonically increasing, so no cycles are created.
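A serial Borůvka-style sketch of this minimum-weight hooking rule (my own code; edge weights are assumed distinct so no tie-breaking is needed): in every round each current component picks its lightest outgoing edge and hooks along it. The parallel algorithm performs the same hooking in parallel and uses pointer jumping to turn the resulting trees back into rooted stars.
<syntaxhighlight lang="c">
#include <stdio.h>

#define NV 5
#define NE 7

/* Serial Boruvka-style sketch of the minimum-weight hooking rule: every
 * round, each current component picks its lightest outgoing edge and hooks
 * along it.  Weights are assumed distinct.                               */
int u[NE] = {0, 0, 1, 1, 2, 3, 0};
int v[NE] = {1, 2, 2, 3, 3, 4, 4};
int w[NE] = {4, 1, 3, 2, 5, 7, 6};

int parent[NV];
int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }

int main(void) {
    for (int i = 0; i < NV; i++) parent[i] = i;
    int in_msf[NE] = {0}, merged = 1;

    while (merged) {
        merged = 0;
        int best[NV];
        for (int i = 0; i < NV; i++) best[i] = -1;
        for (int e = 0; e < NE; e++) {              /* each component's lightest edge */
            int a = find(u[e]), b = find(v[e]);
            if (a == b) continue;
            if (best[a] == -1 || w[e] < w[best[a]]) best[a] = e;
            if (best[b] == -1 || w[e] < w[best[b]]) best[b] = e;
        }
        for (int i = 0; i < NV; i++) {              /* hook along the chosen edges */
            int e = best[i];
            if (e == -1) continue;
            int a = find(u[e]), b = find(v[e]);
            if (a == b) continue;
            parent[a] = b;
            in_msf[e] = 1;
            merged = 1;
        }
    }
    for (int e = 0; e < NE; e++)
        if (in_msf[e]) printf("edge (%d,%d) weight %d\n", u[e], v[e], w[e]);
    return 0;
}
</syntaxhighlight>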
===Biconnectivity===
This section is not in the textbook. It is from Tarjan and Vishkin (1985)<ref name="tarjan1985biconnectivity">Robert E. Tarjan and Uzi Vishkin. ''An Efficient Parallel Biconnectivity Algorithm'' (1985). SIAM Journal on Computing. DOI: [https://doi.org/10.1137/0214061 10.1137/0214061] [https://epubs.siam.org/doi/10.1137/0214061 https://epubs.siam.org/doi/10.1137/0214061] [https://www.researchgate.net/publication/220617428_An_Efficient_Parallel_Biconnectivity_Algorithm Mirror]</ref>.
A graph is biconnected if for every two vertices <math>v_1</math> and <math>v_2</math>, there is a simple cycle containing <math>v_1</math> and <math>v_2</math>.
Intuitively this means that for any two vertices, there are at least two vertex-disjoint paths from <math>v_1</math> to <math>v_2</math>. If any single vertex and its edges are removed from a biconnected graph, the graph remains connected.
Every connected graph consists of biconnected components. These are sets of edges such that every two edges from the same set lie on a common simple cycle.
Vertices which connect two biconnected components are called ''articulation points''.
If such a point is removed, the graph is no longer connected.
;Algorithm
Assume we are given a connected graph <math>G</math>.
# First build a spanning tree <math>T</math>. Record which edges are in the spanning tree. Root the spanning tree.
# Compute the preorder number of all vertices using an Euler tour.
# For every vertex <math>v</math>, calculate the high and low preorder numbers in the subtree <math>T(v)</math>
# Create the auxiliary graph <math>G'' = (V'', E'')</math> as follows:
#* All edges of <math>G</math> are vertices in <math>G''</math>
#* For each edge <math>(v, w)</math> in <math>G - T</math>, add edge <math> ((p(v),v),(p(w),w))</math> to <math>G''</math> iff <math>v</math> and <math>w</math> are unrelated in <math>T</math>
#* For each edge <math>(v = p(w), w)</math> in <math>T</math>, add edge <math>((p(v), v), (v, w))</math> to <math>G''</math> iff <math>low(w) < v</math> or <math>high(w) \geq v + size(v)</math>
# Compute the connected components of <math>G''</math>
# Assign each edge of <math>G</math> to a biconnected component based on the connected component of its corresponding vertex in <math>G''</math>.
;Complexity
* <math>O(\log n)</math> time
==Tricks==
===Removing duplicates in an array===
Assume you have an Arbitrary CRCW PRAM.
# Given array A of size n with entries 0 to n-1
# For each entry pardo
#* Write B[A[i]] = i
#* Only one write will succeed for each unique A[i]
#* Check if B[A[i]] == i; exactly one entry per distinct value passes this check (see the sketch below)
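A sketch of this trick in XMTC-style C (my own code; it assumes the Arbitrary CRCW convention that exactly one of the concurrent writes to <code>B[A[$]]</code> takes effect, and that <code>A</code>, <code>B</code>, <code>keep</code> and <code>n</code> are declared elsewhere):
<syntaxhighlight lang="c">
// Sketch (assumes Arbitrary CRCW): keep exactly one representative per
// distinct value of A.  A, B, keep and n are assumed declared elsewhere.
spawn(0, n - 1) {
  B[A[$]] = $;               // concurrent writes; one arbitrary winner per value
}
// spawn blocks join before the next one starts
spawn(0, n - 1) {
  keep[$] = (B[A[$]] == $);  // the winner keeps its element, duplicates are dropped
}
</syntaxhighlight>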
===Compaction of an <math>n^2</math> array with <math>\leq n</math> elements in <math>O(\log n)</math> time===
* Split our <math>n^2</math> array into n subarrays, each of size <math>n</math>
* Run <math>n</math> parallel compactions, one on each subarray
* Perform one prefix-sum (compaction) over the sizes of the subarrays
* Compact all elements using (total size of previous subarrays + position in current subarray)
The benefit of this segmented compaction/prefix-sum is that, in an iterative algorithm such as parallel BFS where each segment is only processed once, the total work is <math>O(n^2)</math> (or <math>O(m)</math>).
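A serial sketch of this segmented compaction (array contents and names are my own): compact each of the <math>n</math> rows, prefix-sum the row counts to get offsets, then place each element at (offset of previous rows + position within its row). In the parallel version the row compactions and the final placement run concurrently.
<syntaxhighlight lang="c">
#include <stdio.h>

#define N 4    /* the array has N*N slots, at most N of them occupied */

/* Serial sketch of segmented compaction: compact each row separately,
 * prefix-sum the row counts, then copy each row's elements to
 * (offset of previous rows + position within the row).                 */
int main(void) {
    int a[N][N] = {{0, 7, 0, 0}, {0, 0, 0, 0}, {3, 0, 0, 9}, {0, 0, 5, 0}};
    int row[N][N], count[N], offset[N + 1], out[N];

    for (int i = 0; i < N; i++) {             /* compact each row ("in parallel") */
        count[i] = 0;
        for (int j = 0; j < N; j++)
            if (a[i][j] != 0) row[i][count[i]++] = a[i][j];
    }
    offset[0] = 0;                            /* prefix sums over the row counts */
    for (int i = 0; i < N; i++) offset[i + 1] = offset[i] + count[i];

    for (int i = 0; i < N; i++)               /* final placement */
        for (int j = 0; j < count[i]; j++)
            out[offset[i] + j] = row[i][j];

    for (int k = 0; k < offset[N]; k++) printf("%d ", out[k]);
    printf("\n");
    return 0;
}
</syntaxhighlight>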
==XMT Programming Tips==
I never figured out how to use standard C headers.
* XMT only has int (32-bit) and float (32-bit). If you need a bool type, you will need to define it in a header.
* You cannot call functions from within <code>spawn</code>
Here is my helpers header <code>helpers.h</code>:
{{ hidden | <code>helpers.h</code> |
<syntaxhighlight lang="c">
#ifndef HELPERS_H
#define HELPERS_H
#define true 1
#define false 0
#define bool int
// #define NIL -1
#define max(a, b) ((((a) < (b)) ? (b) : (a)))
#define min(a, b) ((((a) < (b)) ? (a) : (b)))
#endif
</syntaxhighlight>
}}
If, like me, you prefer to debug in C++, you can use the following code to adapt the XMT code.
Note that this code does not run anything in parallel; the spawn block is executed serially.
{{hidden | C++ Code |
<syntaxhighlight lang="cpp">
#include <cerrno>
#include <fstream>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Serial stand-ins for the XMTC constructs: spawn becomes a plain loop and
// ps/psm swap-and-add so that `a` receives the old value of the base `b`.
// (Using `$` as an identifier relies on a common GCC/Clang extension.)
#define spawn(x, y) for (int $ = x; $ <= y; ++$)
//#define spawn(x, y) for (int $ = y; $ >= x; --$)
#define ps(a, b) \
  {              \
    b += a;      \
    a = b - a;   \
  }
#define psm(a, b) \
  {               \
    b += a;       \
    a = b - a;    \
  }
#define psBaseReg int
#define STRINGIFY2(X) #X
#define STRINGIFY(X) STRINGIFY2(X)

using namespace std;
#define GRAPH_NUM graph0
#define N 19
#define M 36

vector<int> vertices;
vector<int> degrees;
vector<vector<int>> edges;

// Forward declarations (the definitions appear after main below).
string readFile(const string& datapath);
vector<string> split(string s, regex r);

int main() {
  vertices.resize(N);
  degrees.resize(N);
  edges = std::vector<std::vector<int>>(M, std::vector<int>(2));

  string datapath = "data/" STRINGIFY(GRAPH_NUM) "/graph.txt";
  string data = readFile(datapath);
  regex section_regex(
      "[\\n\\r][\\n\\r]+|[\\r\\n]0\\s[\\r\\n]+",
      std::regex_constants::ECMAScript | std::regex_constants::icase);
  regex space_regex("[\\n\\s\\r]+", std::regex_constants::ECMAScript |
                                        std::regex_constants::icase);
  vector<string> sections = split(data, section_regex);
  // for (int i = 0; i < sections.size(); i++) {
  //   cout << "Section " << i << " " << sections[i].size() << endl;
  //   cout << sections[i] << endl;
  //}
  return 0;
}

string readFile(const string& datapath) {
  std::ifstream in(datapath, std::ios::in);
  string data;
  if (in.good()) {
    std::string contents;
    in.seekg(0, std::ios::end);
    contents.resize(static_cast<unsigned int>(in.tellg()));
    in.seekg(0, std::ios::beg);
    in.read(&contents[0], contents.size());
    return contents;
  }
  std::cerr << "Failed to open file: " << datapath << std::endl;
  throw(errno);
}

vector<string> split(string s, regex r) {
  vector<string> splits;
  smatch m;                      // <-- need a match object
  while (regex_search(s, m, r))  // <-- use it here to get the match
  {
    int split_on = m.position();  // <-- use the match position
    splits.push_back(s.substr(0, split_on));
    s = s.substr(split_on + m.length());  // <-- also, skip the whole match
  }
  if (!s.empty()) {
    splits.push_back(s);  // and there may be one last token at the end
  }
  return splits;
}
</syntaxhighlight>
}}
==Hardware/Architecture==
[http://users.umiacs.umd.edu/~vishkin/XMT/vCarageaLeebook.pdf Models for Advancing PRAM and Other Algorithms into Parallel Algorithms for PRAM]
*[https://www.cs.cmu.edu/~guyb/papers/BM04.pdf Parallel Algorithms by Guy E. Blelloch and Bruce M. Maggs (CMU)]
==References==