Decoding Evolutionary Paths: A Guide to the UPGMA Method for Constructing Phylogenetic Trees

Monika Mate
Mar 30, 2024

Constructing phylogenetic tree using UPGMA method

In bioinformatics, the quest to unravel the evolutionary relationships among species has led to the development of numerous methods. One of them, the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), is recognized for its simplicity in constructing phylogenetic trees. In this blog, let’s look at how UPGMA works, explore the significance of distance matrices, and consider its reliability.

Exploring UPGMA: Methodology Unveiled

UPGMA is a hierarchical clustering method that builds phylogenetic trees from genetic distance matrices. It starts with a matrix of pairwise genetic distances between species or taxa. The algorithm then repeatedly merges the two closest clusters and recomputes the average distance from the new cluster to every remaining one. The process continues until all taxa are joined in a single dendrogram, a hierarchical tree diagram that illustrates the phylogenetic relationships.

Let’s explore an illustrative example to understand the working of UPGMA. In this case, we’ll consider seven species denoted as A through G to clarify their genetic relationships by constructing a phylogenetic tree. Below is the distance matrix that represents the genetic distances among these species, serving as the foundation for constructing the phylogenetic tree using the UPGMA method.

[Figure: distance matrix for species A through G]

To build the tree with the UPGMA method, we follow these steps:

Step 1 —

Identify the pair of species with the smallest distance. In the provided distance matrix, it is B and F, with a distance of 1.00.
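To make this step concrete, here is a minimal Python sketch of the search for the closest pair. The taxa W, X, Y, Z and their distances are made up for illustration and are not the values from the matrix above; the helper name `closest_pair` is likewise just an assumption for this example.

```python
# Hypothetical pairwise distances for four made-up taxa
# (illustrative values only, not the article's matrix).
distances = {
    frozenset({"W", "X"}): 2.0,
    frozenset({"W", "Y"}): 6.0,
    frozenset({"W", "Z"}): 10.0,
    frozenset({"X", "Y"}): 6.0,
    frozenset({"X", "Z"}): 10.0,
    frozenset({"Y", "Z"}): 10.0,
}

def closest_pair(distances):
    """Return (pair, distance) for the smallest entry in the matrix."""
    pair = min(distances, key=distances.get)
    return tuple(sorted(pair)), distances[pair]

pair, d = closest_pair(distances)
print(pair, d)  # ('W', 'X') 2.0
```

Storing each pair as a `frozenset` key means the order of the two taxa does not matter, which mirrors the symmetry of a distance matrix.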

Step 2 —

Create a new cluster by combining the pair with the smallest distance, then update the values in the matrix.

After identifying the pair with the smallest distance, we merge them into a new group or cluster. In this case, we merge species B and F into a single cluster, BF. The distance matrix is then updated to reflect this clustering.

We calculate the average distance between the newly formed cluster and each remaining species, and use these averages to update the distance matrix so that it includes the new cluster. The merge also forms a branch point in the phylogenetic tree at a height equal to half the smallest distance identified. For B and F, that height is 1.00 / 2 = 0.50. This height represents the evolutionary divergence of each merged species from their common ancestor.
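The update can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the taxa W, X, Y, Z and their distances are hypothetical, and `merge_clusters` is a name invented for this example. The average is weighted by cluster size so that every original species counts equally, which is what "unweighted" refers to in UPGMA.

```python
# Hypothetical distances and cluster sizes (illustrative values only).
distances = {
    frozenset({"W", "X"}): 2.0,
    frozenset({"W", "Y"}): 6.0,
    frozenset({"X", "Y"}): 8.0,
    frozenset({"W", "Z"}): 10.0,
    frozenset({"X", "Z"}): 12.0,
    frozenset({"Y", "Z"}): 9.0,
}
sizes = {"W": 1, "X": 1, "Y": 1, "Z": 1}

def merge_clusters(distances, sizes, a, b):
    """Merge clusters a and b in place; return (new_name, branch_height)."""
    new = a + b
    # The new node sits at half the distance between the merged clusters.
    height = distances.pop(frozenset({a, b})) / 2
    for c in [c for c in sizes if c not in (a, b)]:
        d_ac = distances.pop(frozenset({a, c}))
        d_bc = distances.pop(frozenset({b, c}))
        # Size-weighted mean: each original species contributes equally.
        distances[frozenset({new, c})] = (
            sizes[a] * d_ac + sizes[b] * d_bc
        ) / (sizes[a] + sizes[b])
    sizes[new] = sizes.pop(a) + sizes.pop(b)
    return new, height

new, height = merge_clusters(distances, sizes, "W", "X")
print(new, height)                          # WX 1.0
print(distances[frozenset({"WX", "Y"})])    # 7.0
print(distances[frozenset({"WX", "Z"})])    # 11.0
```

When both merged clusters hold a single species, as here, the weighted mean reduces to a plain average of the two old distances.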

Branch formed between the pair with the smallest distance.

Step 3 —

Create a new group or cluster and repeat until all species are grouped into a single cluster.

In the updated matrix, the pair with the smallest distance is A and D. Create a new cluster of A and D.

Step 4 —

Create new cluster — BF and G

Step 5 —

Create new cluster — AD and BFG

Step 6 —

Create new cluster — ADBFG and C

Finally, merge the last remaining species: create a new cluster of ADBFGC and E.

This will give the final phylogenetic tree —

Phylogenetic tree created using the UPGMA method
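The whole procedure can be put together as a short loop. What follows is a minimal sketch, assuming a symmetric matrix given as a list of lists; the function name `upgma`, the taxa W, X, Y, Z, and the distances are all hypothetical and do not reproduce the seven-species example above.

```python
def upgma(labels, matrix):
    """Run UPGMA on a symmetric distance matrix.

    Returns the merges in order as (cluster_a, cluster_b, height).
    """
    # Each cluster is a tuple of leaf labels.
    clusters = [(label,) for label in labels]
    dist = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            dist[frozenset({clusters[i], clusters[j]})] = matrix[i][j]

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        height = dist.pop(pair) / 2
        merged = a + b
        # Size-weighted average distance to every other cluster.
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = dist.pop(frozenset({a, c}))
            d_bc = dist.pop(frozenset({b, c}))
            dist[frozenset({merged, c})] = (
                len(a) * d_ac + len(b) * d_bc
            ) / (len(a) + len(b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append(("".join(a), "".join(b), height))
    return merges

# Hypothetical clock-like distances (illustrative values only).
labels = ["W", "X", "Y", "Z"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
for a, b, height in upgma(labels, matrix):
    print(f"merge {a} + {b} at height {height}")
# merge W + X at height 1.0
# merge WX + Y at height 3.0
# merge WXY + Z at height 5.0
```

Each merge corresponds to one internal node of the tree, and its height is half the distance between the two clusters being joined. For real analyses, an established library is usually a better choice than a hand-written loop like this one.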

Understanding a key concept: Distance Matrices

In UPGMA, distance matrices are crucial for understanding the evolutionary relationships between species. These matrices form the foundation for depicting genetic connections between species by quantifying the genetic divergence that characterizes their evolutionary paths.

The distance matrix is formed by calculating the dissimilarity or distance between each pair of taxa/organisms in a dataset. This distance measure may represent genetic divergence, sequence dissimilarity or any other metric that reflects the evolutionary or biological distance among organisms. For example, in DNA sequence alignment, the distance between two sequences can be measured based on the number of nucleotide substitutions or mutations that occur.
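As a rough illustration, one simple measure is the proportion of aligned positions at which two sequences differ (often called the p-distance). The sketch below assumes the sequences are already aligned to the same length; the sequences themselves and the function names are made up for this example, and gaps are treated like any other character.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return mismatches / len(seq_a)

def distance_matrix(sequences):
    """Build a full pairwise distance matrix from a dict of name -> sequence."""
    names = list(sequences)
    return [
        [p_distance(sequences[a], sequences[b]) for b in names]
        for a in names
    ]

# Hypothetical aligned sequences (illustrative only).
sequences = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGAACGA",
}
for row in distance_matrix(sequences):
    print(row)
# [0.0, 0.125, 0.25]
# [0.125, 0.0, 0.125]
# [0.25, 0.125, 0.0]
```

The resulting matrix is symmetric with zeros on the diagonal, which is the form UPGMA expects as input.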

Reliability of UPGMA

While UPGMA is useful for constructing phylogenetic trees, it has its limitations. One significant limitation is that it assumes a molecular clock, that is, a constant rate of evolution across all lineages. In reality, evolutionary rates may differ among species and lineages, which can result in inaccuracies in the resulting trees.
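One way to get a feel for this assumption is to check whether a distance matrix is ultrametric, which is what perfectly clock-like data would produce: in every group of three taxa, the two largest pairwise distances are equal. The sketch below is a minimal check under that definition; the function name, the tolerance, and both matrices are hypothetical.

```python
from itertools import combinations

def is_ultrametric(matrix, tol=1e-9):
    """Check the three-point condition on a symmetric distance matrix."""
    n = len(matrix)
    for i, j, k in combinations(range(n), 3):
        d = sorted([matrix[i][j], matrix[i][k], matrix[j][k]])
        # In an ultrametric matrix the two largest distances are equal.
        if abs(d[2] - d[1]) > tol:
            return False
    return True

# Hypothetical clock-like distances (illustrative values only).
clock_like = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
# Same taxa, but the second lineage has evolved faster than the first.
uneven_rates = [
    [0, 2, 6, 10],
    [2, 0, 9, 13],
    [6, 9, 0, 10],
    [10, 13, 10, 0],
]
print(is_ultrametric(clock_like))    # True
print(is_ultrametric(uneven_rates))  # False
```

Real data are rarely exactly ultrametric, so a failed check does not by itself rule out UPGMA; large departures are a hint that a method without the clock assumption may be more appropriate.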

Additionally, UPGMA is susceptible to outliers and can produce misleading results if the input data contain significant errors or biases. Therefore, it is essential to exercise caution and validate the findings with independent methods or additional data sources.

Despite these limitations, UPGMA remains a valuable tool in phylogenetic tree construction, especially for analyzing relatively closely related species or taxa with uniform evolutionary rates. When used carefully and combined with other techniques, UPGMA can offer useful insights into the evolutionary history of organisms and the relationships between them.

In conclusion, UPGMA provides a simple yet powerful approach for constructing phylogenetic trees based on genetic distance matrices. By understanding this methodology, the significance of distance matrices, and considerations of reliability, researchers can fully utilize UPGMA in unraveling the complexities of evolutionary relationships between species.
