Disjoint Set Data Structure

Naïve implementation of DSU

One of the naïve implementation for DSU is to use an array. Here for a Leetcode problem I've shown an implementation of DSU that do not use rank-path compression optimizations, just plain and simple implementation of DSU with array.

APIs to support

• find(int i) to find the parent of some node i
• makeUnion(int i, int j) to create a union of two nodes i and j.
• The following is a simple implementation of DSU that implements the upper 2 APIs.
class DSU {
private:
int size;
vector<int> dsuArray;
public:
DSU(int n) {
int size = n;
for (int i=0;i<n+1;i++) {
dsuArray.push_back(i);
}
}

vector<int> getArray() {
return dsuArray;
}

int find(int n) {
return dsuArray[n];
}

bool makeUnion(int i, int j) {
int parent_i = find(i);
int parent_j = find(j);

if (parent_i == parent_j) return false;

// find all occurences of parent_j
// set to parent_i to make the Union
for (int i=1; i<dsuArray.size(); i++) {
if (dsuArray[i] == parent_j) {
dsuArray[i] = parent_i;
}
}

return true;
}
};


Union by rank optimization

If you see the makeUnion() API we are setting the parents of parent_j to parent_i. Now if the number of elements in the group of j is higher then there is a chance to optimize the time complexity. We can update the parent_i to parent_j as number of times parent_i is to be updated is much lower.

This is called union by rank optimization. In order to do this we need to have some data structure that'll indicate the size of every group everytime we do an union. Then we can take the smaller group and updated its parents instead of updating the parent of the larger group size.

Some Problems on Leetcode that uses Disjoint Set Structure

Redundant Connection

Problem Statement

In this problem, a tree is an undirected graph that is connected and has no cycles.

You are given a graph that started as a tree with n nodes labeled from 1 to n, with one additional edge added. The added edge has two different vertices chosen from 1 to n, and was not an edge that already existed. The graph is represented as an array edges of length n where edges[i] = [ai, bi] indicates that there is an edge between nodes ai and bi in the graph.

Return an edge that can be removed so that the resulting graph is a tree of n nodes. If there are multiple answers, return the answer that occurs last in the input.

Example

Input: edges = [[1,2],[2,3],[3,4],[1,4],[1,5]]
Output: [1,4]


Approach

• We'll use a DSU to add the edges in the DSU until we get a cycle.
• When we get a cycle, we'll add it to the answer, this will be the edge that we need to return.

Code

class DSU {
private:
int size;
vector<int> dsuArray;
public:
DSU(int n) {
int size = n;
for (int i=0;i<n+1;i++) {
dsuArray.push_back(i);
}
}

vector<int> getArray() {
return dsuArray;
}

int find(int n) {
return dsuArray[n];
}

bool makeUnion(int i, int j) {
int parent_i = find(i);
int parent_j = find(j);

if (parent_i == parent_j) return false;

// find all occurences of parent_j
// set to parent_i to make the Union
for (int i=1; i<dsuArray.size(); i++) {
if (dsuArray[i] == parent_j) {
dsuArray[i] = parent_i;
}
}

return true;
}
};

class Solution {
public:
vector<int> findRedundantConnection(vector<vector<int>>& edges) {
int maxim = INT_MIN;
for (auto j:edges) {
maxim = std::max(j[0], maxim);
maxim = std::max(j[1], maxim);
}

// number of nodes = maxim now.
// now we create a DSU
DSU dsu = DSU(maxim);

for (auto edgePair:edges) {
int first = edgePair[0];
int second = edgePair[1];

bool possible = dsu.makeUnion(edgePair[0], edgePair[1]);

if (not possible) {