| Title: | Rearrangement Distances Between Phylogenetic Trees |
|---|---|
| Description: | Fast calculation of tree rearrangement distances. For unrooted trees: Subtree Prune and Regraft (SPR), Tree Bisection and Reconnection (TBR), and Replug distances, using the algorithms of Whidden and Matsen (2017) <doi:10.48550/arXiv.1511.07529>. For rooted trees: rooted SPR (rSPR) distance, using the fixed-parameter algorithms of Whidden, Beiko, and Zeh (2013) <doi:10.1137/110845045>. |
| Authors: | Martin R. Smith [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-5660-1727>, GitHub: ms609), Chris Whidden [cph] (rspr, uspr, GitHub: cwhidden) |
| Maintainer: | Martin R. Smith <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 2.0.0 |
| Built: | 2026-06-02 22:10:05 UTC |
| Source: | https://github.com/ms609/tbrdist |
Calculate rooted Subtree Prune-and-Regraft (rSPR) distances between pairs of rooted binary trees using the algorithms of Whidden, Beiko & Zeh (2013).
RSPRDist( tree1, tree2 = NULL, allPairs = is.null(tree2), checks = TRUE, approx = FALSE, maf = FALSE )RSPRDist( tree1, tree2 = NULL, allPairs = is.null(tree2), checks = TRUE, approx = FALSE, maf = FALSE )
tree1, tree2
|
Trees of class |
allPairs |
Logical; if |
checks |
Logical; validate tree labels and dimensions before
computation. Set |
approx |
Logical; if |
maf |
Logical; if |
This function wraps the rspr C++ library by Chris Whidden. It handles
rooted trees and is substantially more efficient than
USPRDist() for large trees: it can handle pairs with 200+
leaves and distances > 50.
Note that these distances are NP-hard to compute exactly; running time
scales as O(2^k n) where k is the rSPR distance and n is the number of
leaves. The built-in cluster decomposition (enabled by default) provides a
large practical speedup. Use approx = TRUE for a guaranteed
linear-time 3-approximation.
Input trees must be rooted. An error is raised if any tree is unrooted.
By default, an integer vector of rSPR distances (one per tree pair), or a
dist object when allPairs = TRUE and tree2 = NULL.
If maf = TRUE, a named list with elements:
exactInteger vector of rSPR distances.
maf_1, maf_2
Character vectors giving the maximum agreement forest for each pair of trees, expressed as space-separated Newick components.
Whidden C, Beiko RG, Zeh N (2013). Fixed-Parameter Algorithms for Maximum Agreement Forests. SIAM Journal on Computing 42:1431–1466. doi:10.1137/110845045
Whidden C, Zeh N, Beiko RG (2014). Supertrees based on the subtree prune-and-regraft distance. Systematic Biology 63:566–581. doi:10.1093/sysbio/syu023
USPRDist() for unrooted trees.
library(ape) set.seed(1) tree1 <- rtree(8) tree2 <- rtree(8) tree1$tip.label <- tree2$tip.label <- paste0("t", 1:8) RSPRDist(tree1, tree2) # All pairwise distances among a list of trees trees <- c(tree1, tree2) RSPRDist(trees) # Fast 3-approximation RSPRDist(tree1, tree2, approx = TRUE) # With maximum agreement forest RSPRDist(tree1, tree2, maf = TRUE)library(ape) set.seed(1) tree1 <- rtree(8) tree2 <- rtree(8) tree1$tip.label <- tree2$tip.label <- paste0("t", 1:8) RSPRDist(tree1, tree2) # All pairwise distances among a list of trees trees <- c(tree1, tree2) RSPRDist(trees) # Fast 3-approximation RSPRDist(tree1, tree2, approx = TRUE) # With maximum agreement forest RSPRDist(tree1, tree2, maf = TRUE)
Calculate SPR, TBR and Replug distances on unrooted trees, and the information content of the maximum agreement forest.
USPRDist( tree1, tree2 = NULL, allPairs = is.null(tree2), checks = TRUE, useTbrApproxEstimate = TRUE, useTbrEstimate = TRUE, useReplugEstimate = TRUE ) ReplugDist( tree1, tree2 = NULL, allPairs = is.null(tree2), checks = TRUE, maf = FALSE ) TBRDist( tree1, tree2 = NULL, allPairs = is.null(tree2), checks = TRUE, maf = FALSE, countMafs = FALSE, printMafs = FALSE, exact = maf, approximate = !exact, optimize = TRUE, protectB = TRUE ) MAFInfo(tree1, tree2 = tree1, exact = FALSE)USPRDist( tree1, tree2 = NULL, allPairs = is.null(tree2), checks = TRUE, useTbrApproxEstimate = TRUE, useTbrEstimate = TRUE, useReplugEstimate = TRUE ) ReplugDist( tree1, tree2 = NULL, allPairs = is.null(tree2), checks = TRUE, maf = FALSE ) TBRDist( tree1, tree2 = NULL, allPairs = is.null(tree2), checks = TRUE, maf = FALSE, countMafs = FALSE, printMafs = FALSE, exact = maf, approximate = !exact, optimize = TRUE, protectB = TRUE ) MAFInfo(tree1, tree2 = tree1, exact = FALSE)
tree1, tree2
|
Trees of class |
allPairs |
Logical; if |
checks |
Logical specifying whether to check that trees are properly
formatted and labelled. Specify |
useTbrApproxEstimate, useTbrEstimate, useReplugEstimate
|
Logical specifying whether to use approximate TBR distance, TBR distance or Replug distance to help estimate the SPR distance. |
maf |
Logical specifying whether to report a maximum agreement forest corresponding to the optimal score. |
countMafs |
Logical specifying whether to count the number of maximum agreement forests found. |
printMafs |
Logical specifying whether to print maximum agreement
forests to stdout whilst counting.
Use |
exact |
Logical specifying whether to calculate the exact TBR distance. |
approximate |
Logical specifying whether to calculate the approximate
TBR distance. By default, is set to the opposite of |
optimize |
Logical specifying whether to use the default optimizations. |
protectB |
Logical specifying whether to use the 'PROTECT_B'
optimization.
Overrides value inherited from |
Note that these distances are NP-hard to compute, so the running time of the algorithms used in this software scale exponentially with the distance computed. The version of 'uspr' linked in this package is aimed at trees with up to 50 leaves and uSPR distances up to 14.
If you are interested in comparing rooted trees in terms of SPR operations, you should use 'rspr' instead. 'rspr' is also much more efficient and can easily handle pairs of binary rooted trees with 200+ leaves and distances > 50. rspr is not yet incorporated in this R package; please contact the maintainer if this would be useful to you.
USPRDist() returns a vector of SPR distances between each pair of
unrooted trees.
ReplugDist() returns a vector of Replug distances between each pair
of trees, or (if maf = TRUE) a named list whose second and third elements
list a vector of maximum agreement forests for each pair of trees.
TBRDist() returns a named list, each element of which bears a
vector corresponding to the requested value for each tree pair. If only the
exact value is requested (exact = TRUE), an unnamed vector of distances is
returned.
MAFInfo() returns the information content of the maximum agreement
forest, in bits. This is defined as the sum of the phylogenetic information
content of each constituent subtree, plus the entropy of the clusters implied
by the division of the tree into subtrees. Note that as there is no
guarantee that the most informative MAF will be encountered,
this measure is approximate only. exact will only serve to guarantee
that a MAF corresponding to the exact TBR distance is among those sampled.
Algorithms implemented by Chris Whidden ([email protected])
R wrappers by Martin R. Smith ([email protected])
If you use these functions in your research, please cite:
Chris Whidden and Frederick A. Matsen IV. Calculating the Unrooted Subtree-Prune-and-Regraft Distance. arXiv:1511.07529.
tree1 <- TreeTools::BalancedTree(6) tree2 <- TreeTools::PectinateTree(6) # SPR distance USPRDist(tree1, tree2) # Replug distance ReplugDist(tree1, tree2) ReplugDist(tree1, tree2, maf = TRUE) # TBR distance between two trees TBRDist(tree1, tree2, exact = TRUE) # Compare a list against one tree, using approximate distances TBRDist(list(tree1, tree2), tree2, exact = FALSE) # Compare all pairs in two lists TBRDist(list(tree1, tree2), list(tree1, tree2, tree2), allPairs = TRUE, exact = FALSE) # Compare each tree in a list against each other TBRDist(list(one = tree1, two = tree2, twoAgain = tree2)) # Compare each pair in two lists TBRDist(list(tree1, tree2, tree2), list(tree2, tree1, tree2), exact = TRUE, approximate = TRUE, countMafs = TRUE) # Capture maximum agreement forests mafs <- capture.output(TBRDist(tree1, tree2, approximate = FALSE, printMafs = TRUE)) head(mafs) MAFInfo(tree1, tree2) MAFInfo(list(tree2, tree1), list(tree1, tree2))tree1 <- TreeTools::BalancedTree(6) tree2 <- TreeTools::PectinateTree(6) # SPR distance USPRDist(tree1, tree2) # Replug distance ReplugDist(tree1, tree2) ReplugDist(tree1, tree2, maf = TRUE) # TBR distance between two trees TBRDist(tree1, tree2, exact = TRUE) # Compare a list against one tree, using approximate distances TBRDist(list(tree1, tree2), tree2, exact = FALSE) # Compare all pairs in two lists TBRDist(list(tree1, tree2), list(tree1, tree2, tree2), allPairs = TRUE, exact = FALSE) # Compare each tree in a list against each other TBRDist(list(one = tree1, two = tree2, twoAgain = tree2)) # Compare each pair in two lists TBRDist(list(tree1, tree2, tree2), list(tree2, tree1, tree2), exact = TRUE, approximate = TRUE, countMafs = TRUE) # Capture maximum agreement forests mafs <- capture.output(TBRDist(tree1, tree2, approximate = FALSE, printMafs = TRUE)) head(mafs) MAFInfo(tree1, tree2) MAFInfo(list(tree2, tree1), list(tree1, tree2))