Title: | Comparison of Phylogenetic Trees Using Quartet and Split Measures |
---|---|
Description: | Calculates the number of four-taxon subtrees consistent with a pair of cladograms, calculating the symmetric quartet distance of Bandelt & Dress (1986), Reconstructing the shape of a tree from observed dissimilarity data, Advances in Applied Mathematics, 7, 309-343 <doi:10.1016/0196-8858(86)90038-2>, and using the tqDist algorithm of Sand et al. (2014), tqDist: a library for computing the quartet and triplet distances between binary or general trees, Bioinformatics, 30, 2079–2080 <doi:10.1093/bioinformatics/btu157> for pairs of binary trees. |
Authors: | Martin R. Smith [aut, cre, cph] , Andreas Sand [ant], Gerth Stølting Brodal [ant], Rolf Fagerberg [ant], Thomas Mailund [ant], Christian N. S. Pedersen [ant] , Jens Johansen [ant], Morten K. Holt [ant] |
Maintainer: | Martin R. Smith <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2.7 |
Built: | 2024-11-03 06:23:47 UTC |
Source: | https://github.com/ms609/quartet |
Lists all choices of four taxa from a tree.
A more computationally efficient alternative to combn
.
AllQuartets(nTips) ## S3 method for class 'numeric' AllQuartets(nTips) ## S3 method for class 'phylo' AllQuartets(nTips)
AllQuartets(nTips) ## S3 method for class 'numeric' AllQuartets(nTips) ## S3 method for class 'phylo' AllQuartets(nTips)
nTips |
Integer, specifying the number of tips in a tree; or a tree, whose tips will be counted. |
AllQuartets()
returns a matrix with four rows and
choose(n_tips, 4)
columns, with each column corresponding to a unique
selection of four different integers less than or equal to nTips
.
Martin R. Smith ([email protected])
States of quartets in given trees: QuartetStates()
Other quartet counting functions:
CompareQuartets()
,
CompareQuartetsMulti()
,
ResolvedQuartets()
AllQuartets(5) combn(5, 4) # Provides the same information, but for large # values of n_tips is significantly slower.
AllQuartets(5) combn(5, 4) # Provides the same information, but for large # values of n_tips is significantly slower.
CompareQuartets()
uses explicit enumeration to compare two lists of
quartet states (Estabrook et al. 1985),
detailing how many are identical and how many are unresolved.
For most purposes, the faster function QuartetStatus()
will be
preferable.
CompareQuartets(x, cf)
CompareQuartets(x, cf)
x , cf
|
List of quartet states, perhaps generated by |
Returns an array of seven numeric elements, corresponding
The total number of quartet statements for two trees of n leaves, i.e. 2 Q.
The total number of quartets for n leaves.
The number of quartets that are resolved identically in both trees.
The number of quartets that are resolved differently in each tree.
The number of quartets that are resolved in tree 1, but not in tree 2.
The number of quartets that are resolved in tree 2, but not in tree 1.
The number of quartets that are unresolved in both trees.
Martin R. Smith ([email protected])
Estabrook GF, McMorris FR, Meacham CA (1985). “Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units.” Systematic Zoology, 34(2), 193–200. doi:10.2307/2413326.
QuartetStatus()
generates the same output from a list of trees.
Other element-by-element comparisons:
CompareQuartetsMulti()
,
CompareSplits()
,
PairSharedQuartetStatus()
,
QuartetState()
,
SharedQuartetStatus()
,
SplitStatus()
Other quartet counting functions:
AllQuartets()
,
CompareQuartetsMulti()
,
ResolvedQuartets()
trees <- list(TreeTools::BalancedTree(6), TreeTools::PectinateTree(6)) quartets <- QuartetStates(trees) CompareQuartets(quartets[[1]], quartets[[2]])
trees <- list(TreeTools::BalancedTree(6), TreeTools::PectinateTree(6)) quartets <- QuartetStates(trees) CompareQuartets(quartets[[1]], quartets[[2]])
CompareQuartetsMulti()
counts how many quartets in one tree are resolved
in the same way or different ways in a forest of comparison trees.
CompareQuartetsMulti(x, cf)
CompareQuartetsMulti(x, cf)
x |
Object of class |
cf |
Comparison tree of class |
CompareQuartetsMulti()
explicitly evaluates each quartet in each tree.
As such its runtime will increase hyper-exponentially with the number of
leaves in trees being compared. 30 leaves will take around 5 seconds;
40 closer to 20 s, and 50 around a minute.
CompareQuartetsMulti()
returns a named integer vector specifying
the number of quartets whose resolution in x
matches all or any of the
resolutions in cf
.
Named elements are:
The total number of quartet statements for the given number of n-leaf trees, i.e. n_trees × Q.
The total number of quartets for n leaves.
The number of quartets that are resolved identically in all trees.
The number of quartets that are resolved in x
, and
identically in at least one of cf
.
The number of quartets that are resolved in every tree in
cf
, but never in the same way as they are resolved in in x
.
The number of quartets in x
that are resolved differently
(i.e. contradicted) in at least one tree in cf
.
The number of quartets that are resolved in x
, but not in
any of cf
.
The number of quartets that are resolved in x
, but
unresolved in at least one of cf
.
The number of quartets that are resolved in all of cf
,
but not in x
.
The number of quartets that are resolved in at least one of cf
,
but not in x
.
The number of quartets that are unresolved in all trees.
The number of quartets that are unresolved in x
and
at least one tree in cf
.
The number of quartets in x
that are not resolved the
same way in any of cf
.
Martin R. Smith ([email protected])
Other element-by-element comparisons:
CompareQuartets()
,
CompareSplits()
,
PairSharedQuartetStatus()
,
QuartetState()
,
SharedQuartetStatus()
,
SplitStatus()
Other quartet counting functions:
AllQuartets()
,
CompareQuartets()
,
ResolvedQuartets()
library("TreeTools") CompareQuartetsMulti(x = CollapseNode(as.phylo(42, 6), 8:9), cf = list(BalancedTree(6), PectinateTree(6), CollapseNode(as.phylo(1337, 6), 9:10)))
library("TreeTools") CompareQuartetsMulti(x = CollapseNode(as.phylo(42, 6), 8:9), cf = list(BalancedTree(6), PectinateTree(6), CollapseNode(as.phylo(1337, 6), 9:10)))
Reports whether splits are present or contradicted in a set of reference splits.
CompareSplits(splits, splits2) CompareBipartitions(splits, splits2)
CompareSplits(splits, splits2) CompareBipartitions(splits, splits2)
splits |
An object that can be coerced into class |
splits2 |
Splits against which to compare |
A named vector of eight integers, listing the number of unique splits that:
N exist in total; i.e. the number of splits in splits1
plus the
number in splits2
,
equivalent to 2 s + d1 + d2 + r1 + r2;
P1 occur in splits1
P2 occur in splits2
s occur in both splits1
and splits2
;
d1 occur in splits1
but are contradicted by splits2
;
d2 occur in splits2
but are contradicted by splits1
;
r1 occur in splits1
only, being neither present in nor contradicted by splits2
;
r2 occur in splits2
only, being neither present in nor contradicted by splits1
;
RF occur in one tree only; i.e. d1 + d2 + r1 + r2, the Robinson-Foulds distance.
Martin R. Smith ([email protected])
Estabrook GF, McMorris FR, Meacham CA (1985). “Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units.” Systematic Zoology, 34(2), 193–200. doi:10.2307/2413326.
Robinson DF, Foulds LR (1981). “Comparison of phylogenetic trees.” Mathematical Biosciences, 53(1-2), 131–147. doi:10.1016/0025-5564(81)90043-2.
Equivalent function for quartets: CompareQuartets()
Other element-by-element comparisons:
CompareQuartets()
,
CompareQuartetsMulti()
,
PairSharedQuartetStatus()
,
QuartetState()
,
SharedQuartetStatus()
,
SplitStatus()
splits1 <- TreeTools::BalancedTree(8) splits2 <- TreeTools::PectinateTree(8) CompareSplits(splits1, splits2)
splits1 <- TreeTools::BalancedTree(8) splits2 <- TreeTools::PectinateTree(8) CompareSplits(splits1, splits2)
Wrappers for functions in "tqDist", which calculate triplet and quartet distances between pairs of trees.
QuartetDistance(file1, file2) QuartetAgreement(file1, file2) PairsQuartetDistance(file1, file2) OneToManyQuartetAgreement(file1, file2) AllPairsQuartetDistance(file) AllPairsQuartetAgreement(file) TripletDistance(file1, file2) PairsTripletDistance(file1, file2) AllPairsTripletDistance(file)
QuartetDistance(file1, file2) QuartetAgreement(file1, file2) PairsQuartetDistance(file1, file2) OneToManyQuartetAgreement(file1, file2) AllPairsQuartetDistance(file) AllPairsQuartetAgreement(file) TripletDistance(file1, file2) PairsTripletDistance(file1, file2) AllPairsTripletDistance(file)
file , file1 , file2
|
Paths to files containing a tree or trees in Newick
format, possibly created using |
...Distance()
functions return the distance between the requested trees.
...Agreement()
functions return the number of triplets or quartets that are:
A
, resolved in the same fashion in both trees;
E
, unresolved in both trees.
Comparing a tree against itself yields the totals (A+B+C
) and (D+E
)
referred to by Brodal et al. (2013) and
Holt et al. (2014).
QuartetDistance()
: Returns the quartet distance between the tree.
in file1
and the tree in file2
.
QuartetAgreement()
: Returns a vector of length two, listing [1]
the number of resolved quartets that agree (A
);
[2] the number of quartets that are unresolved in both trees (E
).
See Brodal et al. (2013).
PairsQuartetDistance()
: Quartet distance between the tree on each line of file1
and the tree on the corresponding line of file2
.
OneToManyQuartetAgreement()
: Quartet distance between the tree in
file1
and the tree on each line of file2
.
AllPairsQuartetDistance()
: Quartet distance between each tree listed in file
and
each other tree therein.
AllPairsQuartetAgreement()
: Quartet status for each pair of trees in file
.
TripletDistance()
: Triplet distance between the single tree given
in each file.
PairsTripletDistance()
: Triplet distance between the tree on each line of file1
and the tree on the corresponding line of file2
.
AllPairsTripletDistance()
: Triplet distance between each tree listed in file
and
each other tree therein.
Algorithms: Brodal et al. (2013); Holt et al. (2014).
C implementation: Sand et al. (2014); modified for portability by Martin R. Smith.
R interface: Martin R. Smith.
Brodal GS, Fagerberg R, Mailund T, Pedersen CNS, Sand A (2013).
“Efficient algorithms for computing the triplet and quartet distance between trees of arbitrary degree.”
SODA '13 Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, 1814–1832.
doi:10.1137/1.9781611973105.130.
Holt MK, Johansen J, Brodal GS (2014).
“On the scalability of computing triplet and quartet distances.”
In Proceedings of 16th Workshop on Algorithm Engineering and Experiments (ALENEX) Portland, Oregon, USA.
QuartetStatus()
takes trees, rather than files, as input.
TQFile()
creates a temporary file containing specified trees.
Draws a tree, highlighting the members of a specified quartet in colour.
PlotQuartet(tree, quartet, overwritePar = TRUE, caption = TRUE, ...)
PlotQuartet(tree, quartet, overwritePar = TRUE, caption = TRUE, ...)
tree |
A tree of class |
quartet |
A vector of four integers, corresponding to numbered leaves on the tree; or a character vector specifying the labels of four leaves. |
overwritePar |
Logical specifying whether to use existing |
caption |
Logical specifying whether to annotate each plot to specify whether the quartet selected is in the same or a different state to the reference tree. |
... |
Additional parameters to send to
|
PlotQuartet()
returns invisible()
, having plotted a tree in
which the first two members of quartet
are highlighted in orange, and the
second two highlighted in blue.
Martin R. Smith ([email protected])
data("sq_trees") oPar <- par(mfrow = c(3, 6), mar = rep(0.5, 4)) PlotQuartet(sq_trees, c(2, 5, 3, 8), overwritePar = FALSE) par(oPar)
data("sq_trees") oPar <- par(mfrow = c(3, 6), mar = rep(0.5, 4)) PlotQuartet(sq_trees, c(2, 5, 3, 8), overwritePar = FALSE) par(oPar)
Generate points to depict tree difference (in terms of resolution and accuracy) on a ternary plot, per Smith (2019).
QuartetPoints(trees, cf = trees[[1]]) SplitPoints(trees, cf = trees[[1]]) BipartitionPoints(trees, cf = trees[[1]])
QuartetPoints(trees, cf = trees[[1]]) SplitPoints(trees, cf = trees[[1]]) BipartitionPoints(trees, cf = trees[[1]])
trees |
A list of trees of class |
cf |
Comparison tree of class |
The ternary plot (produced using the Ternary package, Smith 2017) will depict the number of quartets or splits that are:
resolved in the reference tree (cf
), but neither present nor contradicted
in each comparison tree (trees
);
resolved differently in the reference and the comparison tree;
resolved in the same manner in the reference and comparison trees.
If the reference tree (cf
) is taken to represent the best possible
knowledge of the "true" topology, then polytomies in the reference tree
represent uncertainty.
If a tree in trees
resolves relationships within this polytomy, it is not
possible to establish (based only on the reference tree) whether this
resolution is correct or erroneous.
As such, extra resolution in trees
that is neither corroborated nor
contradicted by cf
is ignored.
A data frame listing the ternary coordinates of trees, based on the amount of information that they have in common with the comparison tree (which defaults to the first member of the list, if unspecified).
Martin R. Smith ([email protected])
Smith MR (2017).
“Ternary: An R Package for Creating Ternary Plots.”
doi:10.5281/zenodo.1068997.
Smith MR (2019).
“Bayesian and parsimony approaches reconstruct informative trees from simulated morphological datasets.”
Biology Letters, 15(2), 20180632.
doi:10.1098/rsbl.2018.0632.
library("Ternary") data("sq_trees") TernaryPlot(alab = "Unresolved", blab = "Contradicted", clab = "Consistent", point = "right") TernaryLines(list(c(0, 2/3, 1/3), c(1, 0, 0)), col = "red", lty = "dotted") TernaryText(QuartetPoints(sq_trees, cf = sq_trees$collapse_one), 1:15, col = Ternary::cbPalette8[2], cex = 0.8) TernaryText(SplitPoints(sq_trees, cf = sq_trees$collapse_one), 1:15, col = Ternary::cbPalette8[3], cex = 0.8) legend("bottomright", c("Quartets", "Splits"), bty = "n", pch = 1, cex = 0.8, col = Ternary::cbPalette8[2:3])
library("Ternary") data("sq_trees") TernaryPlot(alab = "Unresolved", blab = "Contradicted", clab = "Consistent", point = "right") TernaryLines(list(c(0, 2/3, 1/3), c(1, 0, 0)), col = "red", lty = "dotted") TernaryText(QuartetPoints(sq_trees, cf = sq_trees$collapse_one), 1:15, col = Ternary::cbPalette8[2], cex = 0.8) TernaryText(SplitPoints(sq_trees, cf = sq_trees$collapse_one), 1:15, col = Ternary::cbPalette8[3], cex = 0.8) legend("bottomright", c("Quartets", "Splits"), bty = "n", pch = 1, cex = 0.8, col = Ternary::cbPalette8[2:3])
Report the status of the specified quartet(s) in given trees or lists of splits (Estabrook et al. 1985).
QuartetState(tips, bips, splits = bips, asRaw = FALSE) QuartetStates(splits, asRaw = FALSE) ## S3 method for class 'Splits' QuartetStates(splits, asRaw = FALSE) ## S3 method for class 'list' QuartetStates(splits, asRaw = FALSE) ## S3 method for class 'multiPhylo' QuartetStates(splits, asRaw = FALSE)
QuartetState(tips, bips, splits = bips, asRaw = FALSE) QuartetStates(splits, asRaw = FALSE) ## S3 method for class 'Splits' QuartetStates(splits, asRaw = FALSE) ## S3 method for class 'list' QuartetStates(splits, asRaw = FALSE) ## S3 method for class 'multiPhylo' QuartetStates(splits, asRaw = FALSE)
tips |
A four-element array listing a quartet of leaves, either by their
number (if class |
bips |
Deprecated; included for compatibility with v1.0.2 and below. |
splits |
An object, such as a tree of class |
asRaw |
Logical specifying whether return format should be |
One of the three possible four-leaf trees will be consistent with any set of splits generated from a fully resolved tree. If the leaves are numbered 1 to 4, this tree can be identified by naming the leaf most closely related to leaf 4. If a set of splits is generated from a tree that contains polytomies, it is possible that all three four-leaf trees are consistent with the set of splits.
QuartetState()
returns 0
if the relationships of the four leaves
are not constrained by the provided splits, or the index of the closest
relative to tips[4]
, otherwise.
QuartetStates()
returns a raw vector listing the status of each
quartet of leaves (in the order listed by AllQuartets()
) in turn,
or if multiple trees are provided, a matrix in which each row corresponds
to such a vector.
Martin R. Smith ([email protected])
Estabrook GF, McMorris FR, Meacham CA (1985). “Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units.” Systematic Zoology, 34(2), 193–200. doi:10.2307/2413326.
Compare quartet states between trees (slowly) using
CompareQuartets()
and CompareQuartetsMulti()
.
Other element-by-element comparisons:
CompareQuartets()
,
CompareQuartetsMulti()
,
CompareSplits()
,
PairSharedQuartetStatus()
,
SharedQuartetStatus()
,
SplitStatus()
trees <- list(TreeTools::BalancedTree(6), TreeTools::PectinateTree(6)) trees[[3]] <- TreeTools::CollapseNode(trees[[2]], 9:10) QuartetState(c(1, 3, 4, 6), trees[[2]]) QuartetState(1:4, trees[[1]]) == QuartetState(1:4, trees[[2]]) QuartetState(c(1, 3, 4, 6), trees[[3]]) QuartetStates(trees[[2]]) QuartetStates(trees[[3]]) CompareQuartets(QuartetStates(trees[[2]]), QuartetStates(trees[[3]])) CompareQuartetsMulti(trees[[1]], trees[2:3])
trees <- list(TreeTools::BalancedTree(6), TreeTools::PectinateTree(6)) trees[[3]] <- TreeTools::CollapseNode(trees[[2]], 9:10) QuartetState(c(1, 3, 4, 6), trees[[2]]) QuartetState(1:4, trees[[1]]) == QuartetState(1:4, trees[[2]]) QuartetState(c(1, 3, 4, 6), trees[[3]]) QuartetStates(trees[[2]]) QuartetStates(trees[[3]]) CompareQuartets(QuartetStates(trees[[2]]), QuartetStates(trees[[3]])) CompareQuartetsMulti(trees[[1]], trees[2:3])
Counts how many quartets are resolved or unresolved in a given tree, following Brodal et al. (2013).
ResolvedQuartets(tree, countTriplets = FALSE) ResolvedTriplets(tree)
ResolvedQuartets(tree, countTriplets = FALSE) ResolvedTriplets(tree)
tree |
A tree of class |
countTriplets |
Logical; if |
Trees with more than 477 leaves risk encountering integer overflow errors, as the number of quartets is larger than can be stored in R's signed 32-bit integer representation. If warnings are thrown, check subsequent calculations for errors.
ResolvedQuartets()
returns a vector of length two, listing the
number of quartets (or triplets) that are [1]
resolved; [2]
unresolved
in the specified tree.
ResolvedTriplets()
: Convenience function to calculate the number of
resolved/unresolved triplets.
Martin R. Smith ([email protected])
Brodal GS, Fagerberg R, Mailund T, Pedersen CNS, Sand A (2013). “Efficient algorithms for computing the triplet and quartet distance between trees of arbitrary degree.” SODA '13 Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, 1814–1832. doi:10.1137/1.9781611973105.130.
Other quartet counting functions:
AllQuartets()
,
CompareQuartets()
,
CompareQuartetsMulti()
data(sq_trees) ResolvedTriplets(sq_trees$collapse_some) # Equivalent to: ResolvedQuartets(sq_trees$collapse_some, countTriplets = TRUE) vapply(sq_trees, ResolvedQuartets, integer(2))
data(sq_trees) ResolvedTriplets(sq_trees$collapse_some) # Equivalent to: ResolvedQuartets(sq_trees$collapse_some, countTriplets = TRUE) vapply(sq_trees, ResolvedQuartets, integer(2))
Determines the number of quartets that are consistent within pairs of trees.
SharedQuartetStatus(trees, cf = trees[[1]]) QuartetStatus(trees, cf = trees[[1]], nTip = NULL) ManyToManyQuartetAgreement(trees, nTip = NULL) TwoListQuartetAgreement(trees1, trees2) SingleTreeQuartetAgreement(trees, comparison)
SharedQuartetStatus(trees, cf = trees[[1]]) QuartetStatus(trees, cf = trees[[1]], nTip = NULL) ManyToManyQuartetAgreement(trees, nTip = NULL) TwoListQuartetAgreement(trees1, trees2) SingleTreeQuartetAgreement(trees, comparison)
trees |
A list of trees of class |
cf |
Comparison tree of class |
nTip |
Integer specifying number of tips that could have occurred
in |
trees1 , trees2
|
List or |
comparison |
A tree of class |
Given a list of trees, returns the number of quartet statements
(Estabrook et al. 1985) present in the
reference tree (the first entry in trees
, if cf
is not specified)
that are also present in each other tree. A random pair of fully resolved
trees is expected to share choose(n_tip, 4) / 3
quartets.
If trees do not bear the same number of tips, SharedQuartetStatus()
will
consider only the quartets that include taxa common to both trees.
From this information it is possible to calculate how many of all possible quartets occur in one tree or the other, though there is not yet a function calculating this; let us know if you would appreciate this functionality.
The status of each quartet is calculated using the algorithms of Brodal et al. (2013) and Holt et al. (2014), implemented in the tqdist C library (Sand et al. 2014).
QuartetStatus()
returns a two dimensional array. Rows correspond to the input trees; the first row will report a perfect match if the first tree is specified as the comparison tree (or if cf
is not specified). Columns list the status of each quartet:
The total number of quartet statements for two trees of n leaves, i.e. 2 Q.
The total number of quartets for n leaves.
The number of quartets that are resolved identically in both trees.
The number of quartets that are resolved differently in each tree.
The number of quartets that are resolved in tree 1, but not in tree 2.
The number of quartets that are resolved in tree 2, but not in tree 1.
The number of quartets that are unresolved in both trees.
ManyToManyQuartetAgreement()
returns a three-dimensional array
listing, for each pair of trees in turn, the number of quartets in each
category.
TwoListQuartetAgreement()
returns a three-dimensional array listing,
for each pair of trees in turn, the number of quartets in each category.
SingleTreeQuartetAgreement()
returns a two-dimensional array listing,
for tree in trees
, the total number of quartets and the
number of quartets in each category.
The comparison
tree is treated as tree2
.
SharedQuartetStatus()
: Reports split statistics obtained after removing all
tips that do not occur in both trees being compared.
ManyToManyQuartetAgreement()
: Agreement of each quartet, comparing each pair of
trees in a list.
TwoListQuartetAgreement()
: Agreement of each quartet in trees in one list with
each quartet in trees in a second list.
SingleTreeQuartetAgreement()
: Agreement of each quartet in trees in a list with
the quartets in a comparison tree.
Martin R. Smith ([email protected])
Brodal GS, Fagerberg R, Mailund T, Pedersen CNS, Sand A (2013).
“Efficient algorithms for computing the triplet and quartet distance between trees of arbitrary degree.”
SODA '13 Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, 1814–1832.
doi:10.1137/1.9781611973105.130.
Estabrook GF, McMorris FR, Meacham CA (1985).
“Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units.”
Systematic Zoology, 34(2), 193–200.
doi:10.2307/2413326.
Holt MK, Johansen J, Brodal GS (2014).
“On the scalability of computing triplet and quartet distances.”
In Proceedings of 16th Workshop on Algorithm Engineering and Experiments (ALENEX) Portland, Oregon, USA.
Sand A, Holt MK, Johansen J, Brodal GS, Mailund T, Pedersen CNS (2014).
“tqDist: a library for computing the quartet and triplet distances between binary or general trees.”
Bioinformatics, 30(14), 2079–2080.
ISSN 1460-2059, doi:10.1093/bioinformatics/btu157.
Use splits (groups/clades defined by nodes or edges of the tree) instead
of quartets as the unit of comparison: SplitStatus()
.
Generate distance metrics from quartet statuses: SimilarityMetrics()
.
Other element-by-element comparisons:
CompareQuartets()
,
CompareQuartetsMulti()
,
CompareSplits()
,
PairSharedQuartetStatus()
,
QuartetState()
,
SplitStatus()
data("sq_trees") # Calculate the status of each quartet relative to the first entry in # sq_trees sq_status <- QuartetStatus(sq_trees) # Calculate the status of each quartet relative to a given tree two_moved <- sq_trees[5:7] sq_status <- QuartetStatus(two_moved, sq_trees$ref_tree) # Calculate Estabrook et al's similarity measures: SimilarityMetrics(sq_status) # Compare trees that include a subset of the taxa 1..10 library("TreeTools", quietly = TRUE, warn.conflict = FALSE) QuartetStatus(BalancedTree(1:5), BalancedTree(3:8), nTip = 10) # If all taxa studied occur in `trees` or `cf`, set `nTip = TRUE` QuartetStatus(BalancedTree(1:5), BalancedTree(3:10), nTip = TRUE) # Calculate Quartet Divergence between each tree and each other tree in a # list QuartetDivergence(ManyToManyQuartetAgreement(two_moved)) # Calculate Quartet Divergence between each tree in one list and each # tree in another QuartetDivergence(TwoListQuartetAgreement(sq_trees[1:3], sq_trees[10:13]))
data("sq_trees") # Calculate the status of each quartet relative to the first entry in # sq_trees sq_status <- QuartetStatus(sq_trees) # Calculate the status of each quartet relative to a given tree two_moved <- sq_trees[5:7] sq_status <- QuartetStatus(two_moved, sq_trees$ref_tree) # Calculate Estabrook et al's similarity measures: SimilarityMetrics(sq_status) # Compare trees that include a subset of the taxa 1..10 library("TreeTools", quietly = TRUE, warn.conflict = FALSE) QuartetStatus(BalancedTree(1:5), BalancedTree(3:8), nTip = 10) # If all taxa studied occur in `trees` or `cf`, set `nTip = TRUE` QuartetStatus(BalancedTree(1:5), BalancedTree(3:10), nTip = TRUE) # Calculate Quartet Divergence between each tree and each other tree in a # list QuartetDivergence(ManyToManyQuartetAgreement(two_moved)) # Calculate Quartet Divergence between each tree in one list and each # tree in another QuartetDivergence(TwoListQuartetAgreement(sq_trees[1:3], sq_trees[10:13]))
Measure tree similarity or difference.
SimilarityMetrics(elementStatus, similarity = TRUE) DoNotConflict(elementStatus, similarity = TRUE) ExplicitlyAgree(elementStatus, similarity = TRUE) StrictJointAssertions(elementStatus, similarity = TRUE) SemiStrictJointAssertions(elementStatus, similarity = TRUE) SymmetricDifference(elementStatus, similarity = TRUE) RawSymmetricDifference(elementStatus, similarity = FALSE) RobinsonFoulds(elementStatus, similarity = FALSE) MarczewskiSteinhaus(elementStatus, similarity = TRUE) SteelPenny(elementStatus, similarity = TRUE) QuartetDivergence(elementStatus, similarity = TRUE) SimilarityToReference(elementStatus, similarity = TRUE, normalize = FALSE)
SimilarityMetrics(elementStatus, similarity = TRUE) DoNotConflict(elementStatus, similarity = TRUE) ExplicitlyAgree(elementStatus, similarity = TRUE) StrictJointAssertions(elementStatus, similarity = TRUE) SemiStrictJointAssertions(elementStatus, similarity = TRUE) SymmetricDifference(elementStatus, similarity = TRUE) RawSymmetricDifference(elementStatus, similarity = FALSE) RobinsonFoulds(elementStatus, similarity = FALSE) MarczewskiSteinhaus(elementStatus, similarity = TRUE) SteelPenny(elementStatus, similarity = TRUE) QuartetDivergence(elementStatus, similarity = TRUE) SimilarityToReference(elementStatus, similarity = TRUE, normalize = FALSE)
elementStatus |
Two-dimensional integer array, with rows corresponding to
counts of matching quartets or partitions for each tree, and columns named
according to the output of |
similarity |
Logical specifying whether to calculate the similarity or dissimilarity. |
normalize |
Logical; if |
Estabrook et al. (1985) (table 2) define four similarity metrics in terms of the total number of quartets (N, their Q), the number of quartets resolved in the same manner in two trees (s), the number resolved differently in both trees (d), the number resolved in tree 1 or 2 but unresolved in the other tree (r1, r2), and the number that are unresolved in both trees (u).
The similarity metrics are then given as below. The dissimilarity metrics are their complement (i.e. 1 - similarity), and can be calculated algebraically using the identity N = s + d + r1 + r2 + u.
Although defined using quartets, analogous values can be calculated using partitions – though for a number of reasons, quartets may offer a more meaningful measure of the amount of information shared by two trees (Smith 2020).
Do Not Conflict (DC): (s + r1 + r2 + u) / N
Explicitly Agree (EA): s / N
Strict Joint Assertions (SJA): s / (s + d)
SemiStrict Joint Assertions (SSJA): s / (s + d + u)
(The numerator of the SemiStrict Joint Assertions similarity metric is given in Estabrook et al. (1985) table 2 as s + d, but this is understood, with reference to their text, to be a typographic error.)
Steel and Penny (1993) propose a further metric,
which they denote d_Q_,
which this package calculates using the function SteelPenny()
:
Steel & Penny's quartet metric (dQ): (s + u) / N
Another take on tree similarity is to consider the symmetric difference: that is, the number of partitions or quartets present in one tree that do not appear in the other, originally used to measure tree similarity by Robinson and Foulds (1981). (Note that, given the familiarity of the Robinson–Foulds distance metric, this quantity is be default expressed as a difference rather than a similarity.)
Raw symmetric difference (RF): d1 + d2 + r1 + r2
A pair of trees will have a high symmetric difference if they are well-resolved but disagree on many relationships; or if they agree on most relationships but are poorly resolved. As such, it is essential to contextualize the symmetric difference by appropriate normalization (Smith 2019). Multiple approaches to normalization have been proposed:
The total number of resolved quartets or partitions present in both trees (Day 1986):
Symmetric Difference (SD): (2 d + r1 + r2) / (2 d + 2 s + r1 + r2)
The total distinctly resolved quartets or partitions (Marczewski and Steinhaus 1958; Day 1986):
Marczewski-Steinhaus (MS): (2 d + r1 + r2) / (2 d + s + r1 + r2)
The maximum number of quartets or partitions that could have been resolved, given the number of tips (Smith 2019):
Symmetric Divergence: (d + d + r1 + r2) / N
Finally, in cases where a reconstructed tree r1
is being compared to a
reference tree r2
taken to represent "true" relationships,
a symmetric difference is not desired.
In such settings, the desired score is the expectation that a given
quartet's resolution in the reconstructed tree is "correct", given by
Asher and Smith (2022):
Similarity to Reference (S2R): (s + (r1 + r2 + u) / 3) / Q
This may optionally be normalized with reference to the maximum possible similarity, (s + d + r2 + (r1 + u) / 3) / Q, subtracting 1/3 (the probability of matching at random) from both the S2R score and maximum possible score before dividing; then, a tree scores zero if it is as different from the true tree as a random or fully unresolved tree, and one if it is as "true" as can be known.
SimilarityMetrics()
returns a named two-dimensional array in which each row
corresponds to an input tree, and each column corresponds to one of the
listed measures.
DoNotConflict()
and others return a named vector describing the requested
similarity (or difference) between the trees.
Martin R. Smith ([email protected])
Asher R, Smith MR (2022).
“Phylogenetic signal and bias in paleontology.”
Systematic Biology, 71(4), 986–1008.
doi:10.1093/sysbio/syab072.
Day WH (1986).
“Analysis of quartet dissimilarity measures between undirected phylogenetic trees.”
Systematic Biology, 35(3), 325–333.
doi:10.1093/sysbio/35.3.325.
Estabrook GF, McMorris FR, Meacham CA (1985).
“Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units.”
Systematic Zoology, 34(2), 193–200.
doi:10.2307/2413326.
Marczewski E, Steinhaus H (1958).
“On a certain distance of sets and the corresponding distance of functions.”
Colloquium Mathematicae, 6(1), 319–327.
https://eudml.org/doc/210378.
Robinson DF, Foulds LR (1981).
“Comparison of phylogenetic trees.”
Mathematical Biosciences, 53(1-2), 131–147.
doi:10.1016/0025-5564(81)90043-2.
Smith MR (2019).
“Bayesian and parsimony approaches reconstruct informative trees from simulated morphological datasets.”
Biology Letters, 15(2), 20180632.
doi:10.1098/rsbl.2018.0632.
Smith MR (2020).
“Information theoretic Generalized Robinson-Foulds metrics for comparing phylogenetic trees.”
Bioinformatics, 36(20), 5007–5013.
doi:10.1093/bioinformatics/btaa614.
Steel MA, Penny D (1993).
“Distributions of tree comparison metrics—some new results.”
Systematic Biology, 42(2), 126–141.
doi:10.1093/sysbio/42.2.126, http://www.math.canterbury.ac.nz/~m.steel/Non_UC/files/research/distributions.pdf.
Calculate status of each quartet – the raw material from which the
Estabrook et al. metrics are calculated – with QuartetStatus()
:
Equivalent metrics for bipartition splits: SplitStatus()
,
CompareSplits()
data("sq_trees") sq_status <- QuartetStatus(sq_trees) SimilarityMetrics(sq_status) QuartetDivergence(sq_status, similarity = FALSE) library("TreeTools", quietly = TRUE, warn.conflict = FALSE) set.seed(0) reference <- CollapseNode(as.phylo(101, 10), 16:18) trees <- c( reference = reference, binaryRef = MakeTreeBinary(reference), balanced = BalancedTree(reference), pectinate = PectinateTree(reference), star = StarTree(reference), random = RandomTree(reference), random2 = RandomTree(reference) ) elementStatus <- QuartetStatus(trees, reference) SimilarityToReference(elementStatus) SimilarityToReference(elementStatus, normalize = TRUE)
data("sq_trees") sq_status <- QuartetStatus(sq_trees) SimilarityMetrics(sq_status) QuartetDivergence(sq_status, similarity = FALSE) library("TreeTools", quietly = TRUE, warn.conflict = FALSE) set.seed(0) reference <- CollapseNode(as.phylo(101, 10), 16:18) trees <- c( reference = reference, binaryRef = MakeTreeBinary(reference), balanced = BalancedTree(reference), pectinate = PectinateTree(reference), star = StarTree(reference), random = RandomTree(reference), random2 = RandomTree(reference) ) elementStatus <- QuartetStatus(trees, reference) SimilarityToReference(elementStatus) SimilarityToReference(elementStatus, normalize = TRUE)
Calculates how many of the partitions present in tree 1 are also present in
tree 2 (s
),
how many of the partitions in tree 1 are absent in tree 2 (d1
),
and how many of the partitions in tree 2 are absent in tree 1 (d2
).
The Robinson-Foulds (symmetric partition) distance is the sum of the
latter two quantities, i.e. d1
+ d2
.
SplitStatus(trees, cf = trees[[1]]) SharedSplitStatus(trees, cf)
SplitStatus(trees, cf = trees[[1]]) SharedSplitStatus(trees, cf)
trees |
A list of trees of class |
cf |
Comparison tree of class |
Returns a two dimensional array. Rows correspond to the input trees, and are named if names were present. Columns report:
N: The total number of partitions present in the two trees, i.e. P1 + P2.
P1: The number of partitions present in tree 1.
P2: The number of partitions present in tree 2.
s: The number of partitions present in both trees.
d1: The number of partitions present in tree 1, but contradicted by tree 2.
d2: The number of partitions present in tree 2, but contradicted by tree 1.
r1: The number of partitions present in tree 1, and neither present nor contradicted in tree 2.
r2: The number of partitions present in tree 2, and neither present nor contradicted in tree 1.
SharedSplitStatus()
: Reports split statistics obtained after removing all
tips that do not occur in both trees being compared.
Martin R. Smith ([email protected])
Robinson DF, Foulds LR (1981). “Comparison of phylogenetic trees.” Mathematical Biosciences, 53(1-2), 131–147. doi:10.1016/0025-5564(81)90043-2.
Penny D, Hendy MD (1985). “The use of tree comparison metrics.” Systematic Zoology, 34(1), 75–82. doi:10.2307/2413347.
Other element-by-element comparisons:
CompareQuartets()
,
CompareQuartetsMulti()
,
CompareSplits()
,
PairSharedQuartetStatus()
,
QuartetState()
,
SharedQuartetStatus()
data("sq_trees") # Calculate the status of each quartet splitStatuses <- SplitStatus(sq_trees) # Calculate the raw symmetric difference (i.e. Robinson–Foulds distance) RawSymmetricDifference(splitStatuses) # Normalize the Robinson Foulds distance by dividing by the number of # splits present in the two trees: RawSymmetricDifference(splitStatuses) / splitStatuses[, "N"] # Normalize the Robinson Foulds distance by dividing by the total number of # splits that it is possible to resolve for `n` tips: nTip <- length(sq_trees[[1]]$tip.label) nPartitions <- 2 * (nTip - 3L) # Does not include the nTip partitions that # comprise but a single tip RawSymmetricDifference(splitStatuses) / nPartitions
data("sq_trees") # Calculate the status of each quartet splitStatuses <- SplitStatus(sq_trees) # Calculate the raw symmetric difference (i.e. Robinson–Foulds distance) RawSymmetricDifference(splitStatuses) # Normalize the Robinson Foulds distance by dividing by the number of # splits present in the two trees: RawSymmetricDifference(splitStatuses) / splitStatuses[, "N"] # Normalize the Robinson Foulds distance by dividing by the total number of # splits that it is possible to resolve for `n` tips: nTip <- length(sq_trees[[1]]$tip.label) nPartitions <- 2 * (nTip - 3L) # Does not include the nTip partitions that # comprise but a single tip RawSymmetricDifference(splitStatuses) / nPartitions
A list of class multiPhylo
containing phylogenetic trees:
ref_tree
A reference tree, bearing tips labelled 1 to 11.
move_one_near
Tip 1 has been moved a short distance.
move_one_mid
Tip 1 has been moved further.
move_one_far
Tip 1 has been moved further still.
move_two_near
Tips 10 & 11 have been moved a short distance.
move_two_mid
Tips 10 & 11 have been moved further.
move_two_far
Tips 10 & 11 have been moved further still.
collapse_one
One node has been collapsed into a polytomy.
collapse_some
Several nodes have been collapsed.
m1mid_col1
Tree move_one_mid
with one node collapsed.
m1mid_colsome
Tree move_one_mid
with several nodes collapsed.
m2mid_col1
Tree move_two_mid
with one node collapsed.
m2mid_colsome
Tree move_two_mid
with several nodes collapsed.
opposite_tree
A tree that shares fewer quartets with ref_tree
than expected by chance.
caterpillar
A pectinate "caterpillar" tree.
top_and_tail
Tree caterpillar
, with its outermost taxa swapped
such that it shares no partitions with caterpillar
.
anti_pectinate
A random tree that shares no partitions with
caterpillar
.
random_tree
A random tree.
sq_trees
sq_trees
An object of class multiPhylo
of length 18.
Assumes that tree 1 is perfectly resolved, but that the resolution of tree 2 can vary.
SymmetricDifferenceLineEnds(nsd) SymmetricDifferenceLines(nsd, ...)
SymmetricDifferenceLineEnds(nsd) SymmetricDifferenceLines(nsd, ...)
nsd |
Vector specifying normalized symmetric differences to plot. |
... |
Further parameters to pass to
|
Returns a matrix of dim (length(nsd), 6)
, with columns named
r2a
, da
, sa
, r2b
, db
and sb
.
Lines from a
to b
in each row connect points of equal symmetric difference.
SymmetricDifferenceLines()
: Plot the lines onto the active ternary plot.
Martin R. Smith ([email protected])
TQDist()
and TQAE()
are convenience functions that writes a list of
trees to text files that can be processed by the C implementation of tqDist
(Sand et al. 2014).
tqDist is then called, and the temporary file is deleted when analysis is
complete.
TQDist(trees) TQAE(trees)
TQDist(trees) TQAE(trees)
trees |
List of phylogenetic trees, of class |
Quartets can be resolved in one of five ways, which Brodal et al. (2013) and Holt et al. (2014) distinguish using the letters A-E, and Estabrook et al. (1985) refer to as:
s = resolved the same in both trees;
d = resolved differently in both trees;
r1 = resolved only in tree 1;
r2 = resolved only in tree 2 (the comparison tree);
u = unresolved in both trees.
TQDist()
returns the quartet distance between each pair of trees.
TQAE()
returns the number of resolved quartets in agreement between
each pair of trees ("A" in Brodal et al. 2013) and the number of quartets
that are unresolved in both trees ("E" in Brodal et al. 2013).
Martin R. Smith ([email protected])
Brodal GS, Fagerberg R, Mailund T, Pedersen CNS, Sand A (2013).
“Efficient algorithms for computing the triplet and quartet distance between trees of arbitrary degree.”
SODA '13 Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, 1814–1832.
doi:10.1137/1.9781611973105.130.
Estabrook GF, McMorris FR, Meacham CA (1985).
“Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units.”
Systematic Zoology, 34(2), 193–200.
doi:10.2307/2413326.
Holt MK, Johansen J, Brodal GS (2014).
“On the scalability of computing triplet and quartet distances.”
In Proceedings of 16th Workshop on Algorithm Engineering and Experiments (ALENEX) Portland, Oregon, USA.
Sand A, Holt MK, Johansen J, Brodal GS, Mailund T, Pedersen CNS (2014).
“tqDist: a library for computing the quartet and triplet distances between binary or general trees.”
Bioinformatics, 30(14), 2079–2080.
ISSN 1460-2059, doi:10.1093/bioinformatics/btu157.
CompareQuartets()
, QuartetStatus()
Visualize quartet difference on trees, by split
VisualizeQuartets( tree1, tree2, style = "pie", setPar = TRUE, precision = 3L, Plot = plot.phylo, scale = 1L, spectrum = viridisLite::viridis(101), legend = TRUE, ... )
VisualizeQuartets( tree1, tree2, style = "pie", setPar = TRUE, precision = 3L, Plot = plot.phylo, scale = 1L, spectrum = viridisLite::viridis(101), legend = TRUE, ... )
tree1 , tree2
|
Trees of class |
style |
Character string specifying split labels with an unambiguous abbreviation of:
|
setPar |
Logical specifying whether graphical parameters should be set to display trees side by side. |
precision |
Integer specifying number of significant figures to display when reporting matching scores. |
Plot |
Function to use to plot trees. |
scale |
Numeric, enlargement factor for split labels. |
spectrum |
101-element vector specifying a range of colours by which to colour matches. |
legend |
Logical specifying whether to display simple legend. |
... |
Additional parameters to send to |
VisualizeQuartets()
invisibly returns a list with two elements,
named tree1
and tree2
, containing a matrix.
Each row corresponds to a split within that tree; columns correspond to:
The internal numbering of the node corresponding to each split,
as displayed by ape::nodelabels()
The status of each quartet relative to that
split, as documented in QuartetStatus()
The number of quartets resolved by that split, i.e. s
+ d
The proportion of quartets resolved by that node that are
resolved in the same manner in the other tree; i.e. s / s + d
Martin R. Smith ([email protected])
library("TreeTools", quietly = TRUE) # Simple plot VisualizeQuartets(BalancedTree(10), CollapseNode(PectinateTree(10), 19), style = "label") # Plot with custom graphical parameters origPar <- par(mfrow = c(2, 2)) VisualizeQuartets(BalancedTree(10), CollapseNode(PectinateTree(10), 19), setPar = FALSE) VisualizeQuartets(BalancedTree(10), CollapseNode(PectinateTree(10), 19), style = "bar", legend = FALSE, setPar = FALSE) # Circle size denotes similarity par(mfrow = c(2, 1), mar = rep(0.1, 4)) vq <- VisualizeQuartets( tree1 = BalancedTree(20), tree2 = CollapseNode(PectinateTree(20), 29:33), style = "size", scale = 2, setPar = FALSE # necessary for node labels to align ) # Manually add custom node labels percentSame <- paste(round(vq[["tree2"]][, "same"] * 100, 1), "%") nodelabels(percentSame, vq[["tree2"]][, "node"], frame = "n", bg = NA, # No frame or background adj = 0.5 # align label ) # restore original graphical parameters par(origPar)
library("TreeTools", quietly = TRUE) # Simple plot VisualizeQuartets(BalancedTree(10), CollapseNode(PectinateTree(10), 19), style = "label") # Plot with custom graphical parameters origPar <- par(mfrow = c(2, 2)) VisualizeQuartets(BalancedTree(10), CollapseNode(PectinateTree(10), 19), setPar = FALSE) VisualizeQuartets(BalancedTree(10), CollapseNode(PectinateTree(10), 19), style = "bar", legend = FALSE, setPar = FALSE) # Circle size denotes similarity par(mfrow = c(2, 1), mar = rep(0.1, 4)) vq <- VisualizeQuartets( tree1 = BalancedTree(20), tree2 = CollapseNode(PectinateTree(20), 29:33), style = "size", scale = 2, setPar = FALSE # necessary for node labels to align ) # Manually add custom node labels percentSame <- paste(round(vq[["tree2"]][, "same"] * 100, 1), "%") nodelabels(percentSame, vq[["tree2"]][, "node"], frame = "n", bg = NA, # No frame or background adj = 0.5 # align label ) # restore original graphical parameters par(origPar)