Lowest Common Ancestor in Binary Trees: Algorithm & Implementation

10 min read 07-11-2024

Lowest Common Ancestor in Binary Trees: Algorithm & Implementation

Introduction

In the realm of computer science, binary trees are a fundamental data structure that serves as the cornerstone of many algorithms. These trees consist of nodes, each holding a value, with a parent node and at most two child nodes. One of the common operations performed on binary trees is finding the Lowest Common Ancestor (LCA) of two given nodes.

The LCA of two nodes in a binary tree is defined as the deepest node that is an ancestor of both nodes. It's like finding the most recent common ancestor in a family tree. This concept is essential for many applications, including:

Efficient Search and Retrieval: LCA helps in navigating large datasets like file systems or hierarchical databases.
Phylogenetic Analysis: Determining the evolutionary relationships between species.
Network Routing: Optimizing network communication paths.
Compression Algorithms: Used in data compression techniques.

In this article, we will delve into the intricacies of the LCA problem, exploring different algorithms and their implementations. We will start by defining the problem more formally, then proceed to discuss various algorithms to solve it efficiently, along with their complexities and advantages.

Understanding the Problem

Definition: Given a binary tree and two nodes, we need to find the lowest common ancestor (LCA) of those two nodes. The LCA is the deepest node that is an ancestor of both nodes.

Formal Definition: The lowest common ancestor (LCA) of two nodes n1 and n2 in a binary tree is defined as the deepest node that has both n1 and n2 as descendants.

Example:

Imagine a family tree where you want to find the common ancestor of two cousins. The LCA would be the grandparent who is the closest common ancestor of both cousins.

Let's visualize this with an example:

       1
     /  \
    2    3
   / \  / \
  4   5 6   7
 / \     / \
8   9   10 11

If we want to find the LCA of nodes 4 and 5, the answer is node 2, as it's the deepest node that has both 4 and 5 as descendants.

Algorithms to Find the LCA

Now, let's explore different algorithms to find the LCA in a binary tree.

1. Recursive Algorithm

The recursive approach is a straightforward method to find the LCA. It involves traversing the tree recursively, starting from the root node. We check if the current node is either of the target nodes or if the target nodes are present in the left and right subtrees.

Algorithm:

Base Cases:
- If the current node is NULL (empty), return NULL.
- If the current node is either of the target nodes, return the current node.
Recursive Steps:
- Recursively find the LCA in the left subtree.
- Recursively find the LCA in the right subtree.
Result:
- If both the left and right subtrees returned a non-NULL node, it means the LCA is the current node.
- If only one subtree returned a non-NULL node, the LCA is the node returned by that subtree.
- If both subtrees returned NULL, then the LCA is also NULL.

Implementation in Python:

class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

def lca(root, n1, n2):
    if root is None or root.data == n1 or root.data == n2:
        return root

    left_lca = lca(root.left, n1, n2)
    right_lca = lca(root.right, n1, n2)

    if left_lca and right_lca:
        return root
    elif left_lca:
        return left_lca
    else:
        return right_lca

# Example usage
root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)
root.right.left = Node(6)
root.right.right = Node(7)
n1 = 4
n2 = 5

lca_node = lca(root, n1, n2)
print(f"LCA of {n1} and {n2} is: {lca_node.data}")

Time Complexity: O(N), where N is the number of nodes in the tree. In the worst case, we might visit all nodes of the tree.

Space Complexity: O(H), where H is the height of the tree. This is due to the recursive call stack.

2. Iterative Algorithm (Using Ancestors)

This approach uses an iterative method to traverse the tree and track the ancestors of the target nodes. It involves storing the ancestors of both nodes in a separate data structure and finding the common node with the greatest depth.

Algorithm:

Initialize two stacks: One for each target node (n1_ancestors, n2_ancestors).
Push nodes onto stacks:
- Starting from the root, traverse the tree in a depth-first manner.
- For each node encountered, if it's either n1 or n2, push the node onto the corresponding stack.
Find the LCA:
- Compare the top elements of both stacks until they become equal.
- The common node with the greatest depth (i.e., closest to the root) is the LCA.

Implementation in Python:

class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

def lca(root, n1, n2):
    n1_ancestors = []
    n2_ancestors = []

    def dfs(node, target, ancestors):
        if node is None:
            return False
        ancestors.append(node)
        if node.data == target:
            return True
        left_found = dfs(node.left, target, ancestors)
        right_found = dfs(node.right, target, ancestors)
        if not left_found and not right_found:
            ancestors.pop()
        return left_found or right_found

    dfs(root, n1, n1_ancestors)
    dfs(root, n2, n2_ancestors)

    lca = None
    while n1_ancestors and n2_ancestors:
        n1_top = n1_ancestors.pop()
        n2_top = n2_ancestors.pop()
        if n1_top == n2_top:
            lca = n1_top
            break

    return lca

# Example usage (same tree as before)
lca_node = lca(root, n1, n2)
print(f"LCA of {n1} and {n2} is: {lca_node.data}")

Time Complexity: O(N) in the worst case, as we might visit all nodes in the tree.

Space Complexity: O(H) for the stacks, where H is the height of the tree.

3. Binary Lifting Technique

The Binary Lifting Technique is an optimized method to efficiently find the LCA. It involves pre-processing the tree to calculate the ancestors of each node at different levels. This pre-processing enables us to quickly jump to an ancestor at a specific level.

Algorithm:

Pre-processing:
- For each node u, store its ancestor at level 2^i for i ranging from 0 to log(N), where N is the number of nodes. This is done using a 2D array ancestors[u][i], where ancestors[u][i] stores the ancestor of u at level 2^i.
- Calculate the ancestors using a bottom-up approach.
Finding LCA:
- For each target node n1 and n2:
  - Find the highest level ancestor of n1 and n2 that has the same depth.
  - If the depths are not the same, move the node with higher depth up to the same level as the other node.
  - Once both nodes are at the same level, repeatedly move both nodes up using the ancestors array until they become equal. The common node they reach is the LCA.

Implementation in Python:

class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

def lca(root, n1, n2):
    n = 10  # Maximum number of nodes (adjust based on your tree)
    height = 0
    ancestors = [[None for _ in range(int(log2(n)) + 1)] for _ in range(n)]

    def dfs(node, parent, depth):
        nonlocal height
        height = max(height, depth)
        ancestors[node.data][0] = parent
        for i in range(1, int(log2(n)) + 1):
            if ancestors[node.data][i - 1] is not None:
                ancestors[node.data][i] = ancestors[ancestors[node.data][i - 1]][i - 1]
        if node.left:
            dfs(node.left, node.data, depth + 1)
        if node.right:
            dfs(node.right, node.data, depth + 1)

    def jump(node, target_depth):
        depth = 0
        while depth < target_depth:
            i = int(log2(target_depth - depth))
            if ancestors[node][i] is not None:
                node = ancestors[node][i]
                depth += 2**i
        return node

    # Calculate ancestors
    dfs(root, None, 0)

    # Find the LCA
    n1_depth = height - get_depth(root, n1)
    n2_depth = height - get_depth(root, n2)
    n1 = jump(n1, n1_depth)
    n2 = jump(n2, n2_depth)
    while n1 != n2:
        i = int(log2(height))
        while i >= 0 and ancestors[n1][i] == ancestors[n2][i]:
            i -= 1
        if i >= 0:
            n1 = ancestors[n1][i]
            n2 = ancestors[n2][i]
    return n1

# Helper functions
def log2(x):
    return int(math.log2(x))

def get_depth(node, target):
    if node is None:
        return -1
    if node.data == target:
        return 0
    left_depth = get_depth(node.left, target)
    right_depth = get_depth(node.right, target)
    if left_depth != -1:
        return left_depth + 1
    elif right_depth != -1:
        return right_depth + 1
    else:
        return -1

# Example usage
lca_node = lca(root, n1, n2)
print(f"LCA of {n1} and {n2} is: {lca_node.data}")

Time Complexity:

Pre-processing: O(N * log(N)).
LCA Calculation: O(log(N)).

Space Complexity: O(N * log(N)) for the ancestors array.

4. Using Tarjan's Offline LCA Algorithm

Tarjan's Offline LCA algorithm provides an efficient way to find the LCA of multiple node pairs in a single pass. It utilizes a technique called union-find to maintain a forest of disjoint sets, representing the connected components of the tree.

Algorithm:

Pre-processing:
- Perform a depth-first search (DFS) on the tree.
- During the DFS, assign each node a unique identifier (ID).
- Store the order in which nodes are visited in a list postorder.
LCA Calculation:
- Initialize a union-find data structure to represent the connected components.
- For each node u in the postorder list:
  - For each node v in the children of u:
    - Perform union(u, v).
- For each query (pair of nodes) (n1, n2):
  - Find the LCA using find(n1) and find(n2). The LCA is the node with the smaller ID.

Implementation in Python:

class UnionFind:
    def __init__(self, n):
        self.parent = [i for i in range(n)]
        self.rank = [0] * n

    def find(self, u):
        if self.parent[u] != u:
            self.parent[u] = self.find(self.parent[u])
        return self.parent[u]

    def union(self, u, v):
        u_root = self.find(u)
        v_root = self.find(v)
        if u_root == v_root:
            return
        if self.rank[u_root] < self.rank[v_root]:
            self.parent[u_root] = v_root
        elif self.rank[u_root] > self.rank[v_root]:
            self.parent[v_root] = u_root
        else:
            self.parent[u_root] = v_root
            self.rank[v_root] += 1

class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

def lca(root, queries):
    id_map = {}
    postorder = []
    count = 0

    def dfs(node):
        nonlocal count
        id_map[node] = count
        count += 1
        if node.left:
            dfs(node.left)
        if node.right:
            dfs(node.right)
        postorder.append(node)

    dfs(root)

    uf = UnionFind(count)
    lcas = {}

    for node in postorder[::-1]:
        for child in [node.left, node.right]:
            if child:
                uf.union(id_map[node], id_map[child])

    for n1, n2 in queries:
        lcas[(n1, n2)] = id_map[uf.find(id_map[n1])]

    return lcas

# Example usage
root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)
root.right.left = Node(6)
root.right.right = Node(7)

queries = [(4, 5), (6, 7)]
lcas = lca(root, queries)
print(lcas)

Time Complexity: O(N + Q * log(N)), where N is the number of nodes and Q is the number of queries.

Space Complexity: O(N) for the union-find data structure and postorder list.

Advantages and Disadvantages of Different Algorithms

Here's a summary of the advantages and disadvantages of the algorithms we discussed:

Algorithm	Time Complexity	Space Complexity	Advantages	Disadvantages
Recursive	O(N)	O(H)	Simple to understand and implement.	Can be inefficient for large trees with high depth.
Iterative (Using Ancestors)	O(N)	O(H)	Can be more efficient than recursion for large trees.	Requires maintaining ancestor stacks, which can consume memory.
Binary Lifting	Pre-processing: O(N * log(N)), LCA Calculation: O(log(N))	O(N * log(N))	Very efficient for multiple LCA queries after pre-processing.	Requires pre-processing, which can be computationally expensive for large trees.
Tarjan's Offline LCA Algorithm	O(N + Q * log(N))	O(N)	Efficient for multiple LCA queries in a single pass, making it ideal for offline processing.	Requires the queries to be known beforehand (offline processing). The implementation can be complex compared to the other approaches.

Choosing the Right Algorithm

The optimal algorithm for finding the LCA depends on factors such as the size of the tree, the number of queries, and whether the queries are known beforehand (offline processing).

For small trees or a single LCA query, the recursive algorithm is a simple and straightforward choice.
For larger trees with a few LCA queries, the iterative algorithm or Binary Lifting technique can be more efficient.
For multiple LCA queries known beforehand, Tarjan's Offline LCA algorithm offers the best performance.

Applications of LCA

The Lowest Common Ancestor (LCA) problem finds applications in various domains of computer science, including:

1. File Systems

File systems are often organized in a hierarchical structure, akin to a binary tree. Finding the LCA of two files can help in determining the directory that contains both files.

2. Database Systems

Hierarchical databases store data in a tree-like structure. LCA can be used to find the common ancestor of two records, enabling efficient data retrieval.

3. Network Routing

In computer networks, LCA can be used to optimize the routing of data packets between two nodes. The LCA of the source and destination nodes represents the point in the network where their paths converge, allowing for efficient routing.

4. Phylogenetic Analysis

Phylogenetic analysis aims to understand the evolutionary relationships between different species. LCA calculations are used to determine the common ancestors of species, providing insights into their evolutionary history.

5. Compression Algorithms

Some data compression algorithms utilize binary trees to represent data. LCA calculations can help in finding common substrings, enabling more efficient compression.

Conclusion

Finding the Lowest Common Ancestor (LCA) in a binary tree is a fundamental problem with many practical applications. We explored different algorithms for solving this problem, each offering different time and space complexities. The choice of algorithm depends on factors like tree size, number of queries, and whether the queries are known beforehand. Understanding LCA algorithms and their complexities empowers us to solve various problems efficiently in diverse domains of computer science.

FAQs

1. What is the difference between a node and an ancestor in a binary tree?

A node is any individual element in a binary tree. An ancestor of a node is any node that lies on the path from the root to that node.

2. Can the LCA of two nodes be one of the two nodes themselves?

Yes, the LCA can be one of the two nodes if one node is an ancestor of the other.

3. Why is Tarjan's Offline LCA algorithm considered more efficient for multiple queries?

Tarjan's algorithm pre-processes the tree once, allowing for efficient LCA calculations for multiple queries. It avoids repetitive traversals of the tree for each query, resulting in better performance.

4. Is there a way to find the LCA without traversing the entire tree?

Yes, Binary Lifting and Tarjan's Offline LCA algorithm can find the LCA without visiting every node in the tree. They utilize pre-processing techniques to store information that allows for faster calculations.

5. What are some real-world examples where finding the LCA is useful?

Determining the common directory of two files in a file system.
Identifying the common ancestor of two records in a hierarchical database.
Optimizing network routing paths in computer networks.
Analyzing evolutionary relationships between species.
Compressing data more efficiently using binary trees.

Lowest Common Ancestor in Binary Trees: Algorithm & Implementation

Introduction

Understanding the Problem

Algorithms to Find the LCA

1. Recursive Algorithm

2. Iterative Algorithm (Using Ancestors)

3. Binary Lifting Technique

4. Using Tarjan's Offline LCA Algorithm

Advantages and Disadvantages of Different Algorithms

Choosing the Right Algorithm

Applications of LCA

1. File Systems

2. Database Systems

3. Network Routing

4. Phylogenetic Analysis

5. Compression Algorithms

Conclusion

FAQs

1. What is the difference between a node and an ancestor in a binary tree?

2. Can the LCA of two nodes be one of the two nodes themselves?

3. Why is Tarjan's Offline LCA algorithm considered more efficient for multiple queries?

4. Is there a way to find the LCA without traversing the entire tree?

5. What are some real-world examples where finding the LCA is useful?

Related Posts

Latest Posts

Popular Posts