Parallel Algorithms notes from CMSC751 with [http://users.umiacs.umd.edu/~vishkin/index.shtml Uzi Vishkin].<br>
This class is based on his book [[:File:CMSC751_Classnotes.pdf | Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques]]<br>
[http://users.umiacs.umd.edu/~vishkin/TEACHING/ Class Website]<br>


==XMT Language==
*The prefix-sum instruction <code>ps</code> operates on two registers Ri and Rj:
**Stores Ri + Rj in Ri
**Stores the original value of Ri in Rj
<syntaxhighlight lang="c">
int x = 0;


   }
}
</syntaxhighlight>
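A minimal sketch of the typical <code>spawn</code>/<code>ps</code> pattern — compacting the nonzero entries of an array <code>A</code> of length <code>n</code> into <code>B</code>. The array names and bounds are illustrative, not from the classnotes:
<syntaxhighlight lang="c">
int x = 0;               // shared prefix-sum base; in XMTC proper this may need to be a psBaseReg
spawn(0, n - 1) {
   int e = 1;            // local increment for the prefix-sum
   if (A[$] != 0) {
      ps(e, x);          // atomically: e receives the old value of x, x increases by 1
      B[e] = A[$];       // e is a unique slot for this thread
   }
}
// After the spawn block, x holds the number of nonzero elements copied into B.
</syntaxhighlight>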


==Models==
*What we have:
**Algorithm 1 has <math>O(\log n)</math> iterations.
::Each iteration reduces a size m instance in <math>O(\log m)</math> time and <math>O(m)</math> work to an instance of size <math>\leq 3m/4</math>
**Algorithm 2 runs in <math>O(\log n)</math> time and <math>O(n \log n)</math> work.
*Step 1: Use algorithm 1 to reduce from n to <math>n / \log n</math>
*Step 2: Apply algorithm 2
Total time is <math>O(\log n \log\log n)</math> and total work is <math>O(n)</math>: step 1 needs only <math>O(\log\log n)</math> iterations (each shrinks the instance by a constant factor), costing <math>O(\log n \log\log n)</math> time and <math>O(n)</math> work, and step 2 then runs on an instance of size <math>n/\log n</math> in <math>O(\log n)</math> time and <math>O(n)</math> work.


===Informal Work-Depth (IWD)===


Radix sort using the basic integer sort (BIS) algorithm.<br>
If your range is <math>0</math> to <math>r = n</math> and your radix is <math>\sqrt{n}</math> then you will need <math>\log_{\sqrt{n}}(r) = 2</math> rounds.
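As a concrete (serial) sketch of what each of the two rounds computes, here is a stable counting sort by one digit with the given radix. In class each round is performed by the parallel BIS routine; the function and variable names below are mine, not from the classnotes:
<syntaxhighlight lang="c">
#include <stdlib.h>

/* One radix-sort round: stable counting sort of the n keys in 'in' by digit d
 * (0 = least significant) with the given radix. Two calls, with d = 0 and
 * d = 1 and radix = sqrt(n), sort keys in the range 0..n-1. */
void bis_round(const int *in, int *out, int n, int radix, int d) {
    int *count = calloc(radix, sizeof(int));
    int divisor = 1;
    for (int i = 0; i < d; i++) divisor *= radix;

    for (int i = 0; i < n; i++) count[(in[i] / divisor) % radix]++;
    for (int b = 1; b < radix; b++) count[b] += count[b - 1];   /* prefix sums over buckets */
    for (int i = n - 1; i >= 0; i--)                            /* stable placement from the end */
        out[--count[(in[i] / divisor) % radix]] = in[i];
    free(count);
}
</syntaxhighlight>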


==2-3 trees; Technique: Pipelining==
**Now we have an Euler cycle (or Euler tour) among the edges
*Step 3: <math>next(u_{r,1} \rightarrow r) = NIL </math>
** The edge <math>u_{r,1} \rightarrow r</math> is the opposite edge of the first edge coming out of <math>r</math>
*Step 4: Apply a list ranking algorithm to the edges.
*Step 5: For every two adjacent vertices, delete the one of the two opposite edges with the lower ranking
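A sketch of Steps 2–3 in C, assuming each undirected tree edge is stored as two opposite arcs with a <code>twin</code> pointer between them and that <code>adj[u][i]</code> lists the arcs leaving <math>u</math>. The representation and names are illustrative; in XMTC the loops would be a single spawn over all arcs:
<syntaxhighlight lang="c">
#define NIL -1

/* adj[u][i] : index of the i-th arc leaving vertex u (0 <= i < deg[u])
 * twin[e]   : index of the opposite arc of arc e
 * next[e]   : successor of arc e on the Euler tour (output)
 * r         : the chosen root */
void euler_tour_next(int n, int **adj, const int *deg,
                     const int *twin, int *next, int r) {
    for (int u = 0; u < n; u++) {
        for (int i = 0; i < deg[u]; i++) {
            int in_arc = twin[adj[u][i]];               /* the arc entering u */
            next[in_arc] = adj[u][(i + 1) % deg[u]];    /* leave u by the next arc in its list */
        }
    }
    next[twin[adj[r][0]]] = NIL;   /* Step 3: the tour ends when it re-enters the root */
}
</syntaxhighlight>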
;Complexity

We get the same complexity for randomized and deterministic list ranking (see below)<br>
Randomized achieves this complexity with high probability.
*<math>O(\log (n) \log \log (n))</math> time
*<math>O(n)</math> work

Using a parallel prefix-sum algorithm which runs in <math>O(\log n/\log \log n)</math> time,
we can get list ranking in <math>O(\log n)</math> time and <math>O(n)</math> work.


===Randomized Symmetry Breaking===
     s(i) = 1
</pre>
;Notes
* The full randomized list ranking works in <math>O(\log n \log\log n)</math> time and <math>O(n)</math> work with high probability
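A minimal sketch of one common formulation of the coin-flipping step: each element flips a fair coin and joins <math>S</math> when it flipped heads and its successor flipped tails, so no two adjacent elements are selected. Details may differ from the pseudocode above, and the names are mine:
<syntaxhighlight lang="c">
#include <stdlib.h>

#define NIL -1

/* next[i] gives the successor of element i (NIL for the tail); on return
 * S[i] = 1 iff i was selected. Both loops are fully parallel (spawn in XMTC). */
void random_select(int n, const int *next, int *S) {
    int *coin = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++)
        coin[i] = rand() & 1;                        /* flip a fair coin */
    for (int i = 0; i < n; i++)
        S[i] = (coin[i] == 1 &&
                (next[i] == NIL || coin[next[i]] == 0));
    free(coin);
}
</syntaxhighlight>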


===Deterministic Symmetry Breaking===
*''Uniform Sparsity'': If node <math>i</math> is in <math>S</math> then node <math>next(i)</math> is not in <math>S</math>


;Algorithm 1

The following algorithm gives us an r-ruling set with <math>r=2*\log n - 1</math>
#Apply the deterministic coin tossing algorithm to give each element a tag in <math>[0,...,2*\log n - 1]</math>
#For each element i in parallel, if i's tag is a local maximum then add it to <math>S</math>
;Optimal-Work 2-Ruling set
<pre>
Apply one iteration of deterministic coin tossing
Sort elements by tags
Initialize array S with 1s
for k = 0 to 2*log(n)-1:
  for i such that tag(i) = k pardo
    if S(pre(i)) = S(next(i)) = 1:
      S(i) = 0
</pre>
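Both algorithms use the tag produced by deterministic coin tossing. A sketch of the tag computation for a single element, assuming each element <math>i</math> carries a distinct <math>\log n</math>-bit serial number <code>serial[i]</code> and a successor <code>next[i]</code> (names are illustrative):
<syntaxhighlight lang="c">
/* Returns 2*k + b, where k is the position of the lowest bit in which
 * serial[i] and serial[next[i]] differ and b is the value of that bit in
 * serial[i]. Tags of adjacent elements always differ and lie in
 * [0, 2*log(n) - 1]. Assumes i has a successor. */
int dct_tag(int i, const int *serial, const int *next) {
    int diff = serial[i] ^ serial[next[i]];
    int k = 0;
    while (((diff >> k) & 1) == 0)
        k++;                              /* lowest differing bit position */
    int b = (serial[i] >> k) & 1;
    return 2 * k + b;
}
</syntaxhighlight>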


==Tree Contraction==
;Step 2:

* Let <math>S_1</math> be nodes in <math>S</math> whose parents are not in <math>S</math>.
* Let <math>L_1</math> be leaves in <math>L</math> whose parents are in <math>S_1</math>.
* Apply parallel rakes to leaves in <math>L_1</math>, unless the parent is the root.
* Apply parallel rakes to leaves in <math>L - L_1</math>, unless the parent is the root.
* Repeat Step 2 until the tree is a 3-node binary tree.


===Complexity===
The internal node will be <math>(a*l+b) \times (c*r+d)</math> where <math>\times</math> is either addition or multiplication.<br>
If we delete one leaf and its parent (stem), we will modify these variables of the stem's parent.
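To see why only the stem's parent needs updating, suppose the raked leaf is the left child of the stem with known value <math>v</math>, and let <math>X</math> be the (still unknown) value of the stem's other subtree. Spelling out the bookkeeping with the convention above (my own working, not from the classnotes): the stem evaluates to <math>Y = (a v + b) \times (c X + d)</math>, which is a linear function <math>Y = \alpha X + \beta</math> of <math>X</math>:
* if <math>\times</math> is addition, <math>\alpha = c</math> and <math>\beta = a v + b + d</math>
* if <math>\times</math> is multiplication, <math>\alpha = (a v + b)c</math> and <math>\beta = (a v + b)d</math>
If the stem's parent applied coefficients <math>(a', b')</math> to the stem's value, it now applies <math>(a' \alpha, a' \beta + b')</math> directly to <math>X</math>, since <math>a'(\alpha X + \beta) + b' = a' \alpha X + (a' \beta + b')</math>.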
===Tree Binarization===
;Note: This is not in the classnotes, but has been covered in exams.
To apply the above techniques to general trees or k-ary trees, you should binarize your k-ary tree.<br>
This is done as follows (a sketch in C follows the complexity note below):
* Suppose node <math>x</math> has <math>n > 2</math> children
* Select one child to keep
* Create a node <math>x'</math>
* Move the other <math>n-1</math> children to <math>x'</math>
* Set <math>x'</math> to be a child of <math>x</math>
* Repeat on <math>x'</math> (and on every node, in parallel) until no node has more than two children
;Complexity
* The time is <math>O(max\_degree)</math> and the work is <math>O(n)</math>
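A sketch of one binarization step in C; the node representation and the <code>MAX_CHILDREN</code> bound are illustrative, not from the classnotes. Repeating the step over all over-full nodes until every node has at most two children gives the <math>O(max\_degree)</math> time bound:
<syntaxhighlight lang="c">
#include <stdlib.h>

#define MAX_CHILDREN 16            /* illustrative bound on k */

typedef struct Node {
    struct Node *child[MAX_CHILDREN];
    int num_children;
} Node;

/* One step on node x: keep the first child, move the remaining n-1 children
 * under a new node x', and make x' the second child of x. x' may still have
 * more than two children, so the step is repeated (in parallel over all
 * nodes) until the tree is binary. */
void binarize_step(Node *x) {
    if (x->num_children <= 2) return;
    Node *xp = calloc(1, sizeof(Node));          /* the new node x' */
    for (int i = 1; i < x->num_children; i++)
        xp->child[xp->num_children++] = x->child[i];
    x->child[1] = xp;
    x->num_children = 2;
}
</syntaxhighlight>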


==Graph Connectivity==
===Minimum Spanning Forest===

====The MSF Theorem====
Let <math>G(V,E)</math> be a weighted graph and let <math>U</math> and <math>V-U</math> be two non-empty subsets of <math>V</math>.
Consider the set <math>H=\{(u,v) \mid u \in U, v \in V-U\}</math> of edges connecting points in <math>U</math> and <math>V-U</math>.
If <math>e</math> is the edge of minimum weight in <math>H</math>, then <math>e</math> is in the MSF of <math>G</math>.
 
====A first MSF algorithm====
* Sort the edges by weight. Assign a processor to each edge. Using a priority CRCW, the edge with minimum weight will take priority during writes.
 
Each round consists of hooking and <math>\log n</math> rounds of parallel pointer jumping:
* Each root finds the minimum weight edge to another supervertex and hooks itself to the root of that supervertex.
*: This is automatic from the priority CRCW.
 
====A second MSF algorithm====
We replace steps 2 and 3 in the connectivity algorithm with the simple rule:
 
* We hook rooted stars using the edge with the smallest weight.
 
;Notes
* The progression of edge weights up the tree is monotonically increasing, so there are no cycles
 
===Biconnectivity===
This section is not in the textbook. It is from Tarjan and Vishkin (1985)<ref name="tarjan1985biconnectivity">Robert E. Tarjan, and Uzi Vishkin. ''An Efficient Parallel Biconnectivity Algorithm'' (1985). SIAM Journal on Computing. DOI: [https://doi.org/10.1137/0214061 10.1137/0214061] [https://epubs.siam.org/doi/10.1137/0214061 https://epubs.siam.org/doi/10.1137/0214061] [https://www.researchgate.net/publication/220617428_An_Efficient_Parallel_Biconnectivity_Algorithm Mirror]</ref>.
 
A graph is biconnected if for every two vertices <math>v_1</math> and <math>v_2</math>, there is a simple cycle containing <math>v_1</math> and <math>v_2</math>.
Intuitively this means that for any two vertices, there are at least two vertex-disjoint paths from <math>v_1</math> to <math>v_2</math>. If any single vertex and its edges are removed from a biconnected graph, it remains connected.
 
Every connected graph consists of biconnected components. These are sets of edges such that every two edges from a set lie on a simple cycle.
Vertices which connect two biconnected components are called ''articulation points''.
If these points are removed, the two biconnected components are no longer connected.
 
;Algorithm
Assume we are given a connected graph <math>G</math>.
# First build a spanning tree <math>T</math>. Record which edges are in the spanning tree. Root the spanning tree.
# Compute the preorder number of all vertices using an Euler tour.
# For every vertex <math>v</math>, calculate the high and low preorder numbers in the subtree <math>T(v)</math>
# Create the auxiliary graph <math>G'' = (V'', E'')</math> as follows:
#* All edges of <math>G</math> are vertices in  <math>G''</math>
#* For each edge <math>(v, w)</math> in <math>G - T</math>, add edge <math> ((p(v),v),(p(w),w))</math> to <math>G''</math> iff <math>v</math> and <math>w</math> are unrelated in <math>T</math>
#* For each edge <math>(v = p(w), w)</math> in <math>T</math>, add edge <math>((p(v), v), (v, w))</math> to <math>G''</math> iff <math>low(w) < v</math> or <math>high(w) \geq v + size(v)</math>
# Compute the connected components of <math>G''</math>
# Assign edges based on their connected components.
 
;Complexity
* <math>O(\log n)</math> time
 
==Tricks==
===Removing duplicates in an array===
Assume you have an Arbitrary CRCW
# Given array A of size n with entries 0 to n-1
# For each entry i pardo
#* Write B[A[i]] = i
#* Only one write will succeed for each unique A[i]
#* Check if B[A[i]] == i; the entries for which the check succeeds are kept
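A sketch in C; the serial loops stand in for the two pardo steps, and everything except A and B is my naming:
<syntaxhighlight lang="c">
#include <stdlib.h>

/* A has n entries in the range 0..n-1. On return keep[i] = 1 iff position i
 * is the surviving copy of its value. On an Arbitrary CRCW PRAM one of the
 * concurrent writes into B[A[i]] succeeds; the serial loop below simply keeps
 * the last one, which is just as arbitrary. */
void remove_duplicates(const int *A, int *keep, int n) {
    int *B = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++)      /* step: concurrent writes */
        B[A[i]] = i;
    for (int i = 0; i < n; i++)      /* step: check which write won */
        keep[i] = (B[A[i]] == i);
    free(B);
}
</syntaxhighlight>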
 
===Compaction of an <math>n^2</math> array with <math>\leq n</math> elements in <math>O(\log n)</math> time===
* Split our <math>n^2</math> array into n subarrays, each of size <math>n</math>
* Run <math>n</math> parallel compactions, one on each subarray
* Perform one prefix sum (compaction) over the sizes of the <math>n</math> subarrays
* Compact all elements using (total size of previous subarrays + position in current subarray)
 
The benefit of this segmented compaction/prefix sum is that in an iterative algorithm such as parallel BFS, where each segment is processed only once over the whole run,
the total work stays <math>O(n^2)</math> (or <math>O(m)</math>) rather than paying the full cost in every iteration.
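A serial sketch of the scheme; each loop stands in for an <math>O(\log n)</math>-time parallel prefix sum or compaction, and the function and array names are mine:
<syntaxhighlight lang="c">
#include <stdlib.h>

/* Compacts the nonzero entries of an n*n array A (at most n of them) into
 * out[] and returns their count. */
int compact_n2(const int *A, int n, int *out) {
    int *cnt = calloc(n, sizeof(int));              /* nonzeros per subarray */
    int *pos = malloc((size_t)n * n * sizeof(int));
    int *offset = malloc(n * sizeof(int));

    for (int r = 0; r < n; r++)          /* n parallel compactions, one per subarray */
        for (int j = 0; j < n; j++)
            if (A[r * n + j] != 0)
                pos[r * n + j] = cnt[r]++;          /* position inside the subarray */

    int total = 0;                       /* one prefix sum over the subarray sizes */
    for (int r = 0; r < n; r++) { offset[r] = total; total += cnt[r]; }

    for (int r = 0; r < n; r++)          /* final placement */
        for (int j = 0; j < n; j++)
            if (A[r * n + j] != 0)
                out[offset[r] + pos[r * n + j]] = A[r * n + j];

    free(cnt); free(pos); free(offset);
    return total;
}
</syntaxhighlight>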
 
==XMT Programming Tips==
I never figured out how to use standard C headers.
 
* XMT only has int (32 bit) and float (32 bit). If you need a bool type, you will need to define it in a header.
* You cannot call functions from within <code>spawn</code>
 
Here is my helpers header <code>helpers.h</code>
{{ hidden | <code>helpers.h</code> |
<syntaxhighlight lang="c">
#ifndef HELPERS_H
#define HELPERS_H
 
#define true 1
#define false 0
#define bool int
// #define NIL -1
 
#define max(a, b) ((((a) < (b)) ? (b) : (a)))
#define min(a, b) ((((a) < (b)) ? (a) : (b)))
 
#endif
</syntaxhighlight>
}}
 
If you prefer to debug in C++ like me, you can use the following code to run your XMT code as ordinary C++.
This code does not run things asynchronously; the spawn blocks execute serially.
{{hidden | C++ Code |
 
<syntaxhighlight lang="cpp">
// Includes needed to make this harness self-contained
#include <fstream>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Serial stand-ins for the XMTC primitives
#define spawn(x, y) for (int $ = x; $ <= y; ++$)
//#define spawn(x, y) for (int $ = y; $ >= x; --$)
#define ps(a, b) \
  {              \
    b += a;      \
    a = b - a;  \
  }
#define psm(a, b) \
  {              \
    b += a;      \
    a = b - a;    \
  }
#define psBaseReg int
 
#define STRINGIFY2(X) #X
#define STRINGIFY(X) STRINGIFY2(X)
 
using namespace std;

// Forward declarations so main() can use the helpers defined below
string readFile(const string& datapath);
vector<string> split(string s, regex r);
 
#define GRAPH_NUM graph0
#define N 19
#define M 36
 
vector<int> vertices;
vector<int> degrees;
vector<vector<int>> edges;
 
int main() {
  vertices.resize(N);
  degrees.resize(N);
  edges = std::vector<std::vector<int>>(M, std::vector<int>(2));
 
  string datapath = "data/" STRINGIFY(GRAPH_NUM) "/graph.txt";
  string data = readFile(datapath);
  regex section_regex(
      "[\\n\\r][\\n\\r]+|[\\r\\n]0\\s[\\r\\n]+",
      std::regex_constants::ECMAScript | std::regex_constants::icase);
  regex space_regex("[\\n\\s\\r]+", std::regex_constants::ECMAScript |
                                        std::regex_constants::icase);
  vector<string> sections = split(data, section_regex);
 
  // for (int i = 0; i < sections.size(); i++) {
  //  cout << "Section " << i << " " << sections[i].size() << endl;
  //  cout << sections[i] << endl;
  //}
 
  return 0;
}
 
string readFile(const string& datapath) {
  std::ifstream in(datapath, std::ios::in);
  string data;
  if (in.good()) {
    std::string contents;
    in.seekg(0, std::ios::end);
    contents.resize(static_cast<unsigned int>(in.tellg()));
    in.seekg(0, std::ios::beg);
    in.read(&contents[0], contents.size());
    return contents;
  }
  std::cerr << "Failed to open file: " << datapath << std::endl;
  throw(errno);
}
 
vector<string> split(string s, regex r) {
  vector<string> splits;
  smatch m;  // <-- need a match object
 
  while (regex_search(s, m, r))  // <-- use it here to get the match
  {
    int split_on = m.position();  // <-- use the match position
    splits.push_back(s.substr(0, split_on));
    s = s.substr(split_on + m.length());  // <-- also, skip the whole match
  }
 
  if (!s.empty()) {
    splits.push_back(s);  // and there may be one last token at the end
  }
 
  return splits;
}
</syntaxhighlight>
 
}}
 
==Hardware/Architecture==
[http://users.umiacs.umd.edu/~vishkin/XMT/vCarageaLeebook.pdf Models for Advancing PRAM and Other Algorithms into Parallel Algorithms for PRAM]
*[https://www.cs.cmu.edu/~guyb/papers/BM04.pdf Parallel Algorithms by Guy E. Blelloch and Bruce M. Maggs (CMU)]

==References==
<references />
[[Visible to::users]]