This page defines key technical terms used throughout the site and cites the primary literature supporting the methods and findings. Highlighted terms in the main text link to definitions below. Superscript numbers link to the corresponding journal reference in the Literature Cited section.
Glossary
- Alignment (multiple sequence alignment)
- The process of inserting gap characters into DNA sequences so that homologous positions (sites descended from the same ancestral nucleotide) line up in the same column across all sequences. The resulting matrix (rows = taxa, columns = aligned positions) is the direct input to all tree-building methods. See also: MUSCLE. Edgar, 2004
- Bayesian inference
- A phylogenetic method that estimates the probability distribution over all possible trees given the sequence data and a substitution model, using Markov chain Monte Carlo (MCMC) sampling. Returns posterior probabilities for each clade. Implemented in the program MrBayes. Ronquist & Huelsenbeck, 2003
- Bootstrap support
- A resampling measure of confidence. Alignment columns are sampled with replacement to create n pseudo-replicate alignments, each the same length as the original, and a fresh tree is built for each. The bootstrap percentage on a node is the percentage of replicate trees containing that clade. Values ≥ 95% are considered strong support, while values below 50% indicate ambiguity. Felsenstein, 1985
- Clade
- A group consisting of an ancestor and all of its descendants, forming a complete branch of the tree of life.
- COI (cytochrome c oxidase subunit I)
- A mitochondrial protein-coding gene encoding subunit I of cytochrome c oxidase. Its moderate evolutionary rate, universal PCR primers, and ~650 bp usable length make it the standard DNA barcode marker for animal species identification. Hebert et al., 2003; Ward et al., 2005
- Convergent evolution
- The independent evolution of similar traits in distantly related lineages in response to similar ecological pressures.
- Cryptic species
- Genetically distinct lineages, often separate species, that are morphologically indistinguishable and have therefore been classified under a single species name.
- DNA barcoding
- The use of a short, standardized DNA region, typically COI for animals, to identify species, screen for mislabeled specimens, and detect cryptic diversity. Sequences are compared against reference databases such as GenBank or BOLD. Hebert et al., 2003
- GenBank
- The NIH genetic sequence database maintained by NCBI, archiving all publicly submitted nucleotide sequences with associated metadata (taxonomy, collection locality, publication). Every COI sequence used in this study was downloaded from GenBank; accession numbers are listed on the Build Your Own Tree page.
- GTR model (General Time Reversible)
- A DNA substitution model that allows all six possible nucleotide substitution rates (A/G, C/T, A/C, A/T, G/C, G/T) to take independent values, along with unequal base frequencies. The most general time-reversible model; commonly extended with +Γ (gamma distribution of rate variation across sites) and +I (proportion of invariant sites). Used for Maximum Likelihood and Bayesian analyses in this tutorial.
- K80 model (Kimura 2-parameter)
- A DNA substitution model that distinguishes between transitions (purine/purine: A/G; pyrimidine/pyrimidine: C/T) and transversions (purine/pyrimidine), reflecting the biological observation that transitions occur more often than transversions, even though there are twice as many possible kinds of transversion. All four base frequencies are assumed equal. Used in this study for Neighbor-Joining pairwise distance calculation. Kimura, 1980
- Maximum Likelihood (ML)
- A phylogenetic method that finds the tree topology and branch lengths maximizing the probability of observing the sequence data under an explicit model. Uses the full alignment column-by-column rather than a pairwise distance summary, making it generally more accurate than Neighbor-Joining but more computationally intensive. Felsenstein, 1981
- MCMC (Markov chain Monte Carlo)
- A computational technique used in Bayesian inference to sample from the posterior probability distribution of trees. The chain repeatedly proposes a modification to the current state (tree topology, branch lengths, or model parameters) and runs for millions of steps; each proposal is accepted or rejected according to its effect on the posterior probability. Samples from the initial "burn-in" period are discarded, and the remaining sampled trees are summarized into a majority-rule consensus tree.
- MUSCLE
- A fast, accurate multiple sequence alignment algorithm for protein and nucleotide sequences. Used in this study via the R msa package. Edgar, 2004
- Neighbor-Joining (NJ)
- A distance-based phylogenetic method that first computes a pairwise distance matrix (using a model such as K80), then repeatedly joins the pair of nodes that minimizes the Q-criterion: their distance adjusted by each node's summed distance to all other nodes, so the joined pair is not necessarily the closest pair. Building the full tree takes O(n³) time for n taxa. Extremely fast and reliable for divergences below ~20%; widely used as a quick first approximation. Saitou & Nei, 1987
- Outgroup
- A taxon known to have diverged from all members of the study group (the ingroup) before any ingroup members diverged from each other. Placing the root of the tree between the outgroup and the ingroup gives the tree a direction in time. A good outgroup is closely enough related for alignment to be reliable, but clearly outside the group of interest. In this study, Bull Shark (Carcharhinus leucas) is the primary outgroup.
- PCR (polymerase chain reaction)
- A laboratory technique for exponentially amplifying a specific DNA region from a complex template.
- Phylogeny / phylogenetic tree
- A branching diagram representing the inferred evolutionary history of a group of taxa. Tips (leaves) are the sampled taxa; internal nodes represent hypothetical common ancestors; branch lengths represent evolutionary distance in substitutions per site. The placement of the root (usually by an outgroup) gives the tree a direction in time. Near et al., 2012
- Posterior probability (PP)
- In Bayesian phylogenetics, the probability that a given clade is correct given the data and the model. Ranges from 0 to 1; values ≥ 0.95 are generally considered strong support. PP values tend to be numerically higher than bootstrap percentages for the same node, and the two are not directly comparable.
- Sanger sequencing
- The chain-termination (dideoxy) sequencing method, long the gold standard for generating individual, high-accuracy DNA sequence reads.
- Sister taxa
- Two taxa (or clades) that share the most recent common ancestor in a given tree; they are each other's closest relative in the dataset.
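The Alignment entry above describes inserting gaps so that homologous sites share a column. MUSCLE does this progressively for many sequences at once; the sketch below shows only the underlying idea for two sequences, using a textbook Needleman-Wunsch global alignment with assumed scores (match +1, mismatch -1, gap -2). It is not MUSCLE's algorithm, and the function name is hypothetical.

```python
from __future__ import annotations


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str]:
    """Globally align two DNA strings, inserting '-' so matched sites share a column."""
    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,    # gap inserted in b
                              score[i][j - 1] + gap)    # gap inserted in a

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT')
```

The single gap lands opposite the C, so the A, G and T of both sequences fall in shared columns.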
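The column-resampling step behind Bootstrap support can be sketched as follows. This is a minimal illustration, assuming the alignment is a dict of equal-length strings and that each replicate tree has already been reduced to a set of clades (frozensets of taxon names). The tree-building step itself is omitted, and the function names are hypothetical.

```python
import random


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """Draw columns with replacement to build one pseudo-replicate of the same width."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]   # same picks for every taxon
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


def bootstrap_support(clade, replicate_clades) -> float:
    """Percentage of replicate trees (each a set of clades) that contain `clade`."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


rng = random.Random(42)
aln = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGAACGA"}   # toy alignment
replicates = [bootstrap_replicate(aln, rng) for _ in range(100)]
# A tree-building method (NJ, ML, ...) would turn each replicate into a set of clades here.

# Toy clade sets standing in for four replicate trees
toy = [{frozenset({"A", "B"})}, {frozenset({"A", "B"})},
       {frozenset({"A", "C"})}, {frozenset({"A", "B"})}]
print(bootstrap_support({"A", "B"}, toy))  # 75.0
```

Drawing one list of column indices and applying it to every taxon keeps each resampled column intact, which is what makes a replicate a valid alignment.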
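For DNA barcoding, a query sequence is in practice compared against GenBank (for example with BLAST) or BOLD. The toy sketch below shows only the core comparison, assuming the query and reference COI sequences are already aligned to the same length and held in a hypothetical in-memory dict. It reports the reference with the smallest uncorrected p-distance; the species labels are placeholders, and no identification threshold is implied.

```python
def p_distance(s1: str, s2: str) -> float:
    """Uncorrected proportion of differing sites; columns with '-' or 'N' in either are skipped."""
    compared = diffs = 0
    for x, y in zip(s1.upper(), s2.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def best_match(query: str, references: dict) -> tuple:
    """Return (label, p-distance) for the reference closest to the query."""
    scores = ((label, p_distance(query, seq)) for label, seq in references.items())
    return min(scores, key=lambda item: item[1])


refs = {"species_1": "ACGTACGTAC", "species_2": "ACGTTCGTTC"}   # hypothetical aligned barcodes
print(best_match("ACGTACGTAA", refs))  # ('species_1', 0.1)
```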
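Under the GTR model, the instantaneous rate of change from base i to base j is q_ij = r_ij π_j, where r_ij is the symmetric exchangeability for that pair and π_j is the equilibrium frequency of base j. The sketch below builds that matrix, assuming the common convention of rescaling so the mean substitution rate is 1 (so branch lengths read as expected substitutions per site). The +Γ and +I extensions are not shown, and the dict-key layout is a hypothetical choice.

```python
BASES = "ACGT"


def gtr_rate_matrix(rates: dict, freqs: dict) -> list:
    """rates: exchangeabilities keyed 'AC','AG','AT','CG','CT','GT'; freqs: base -> pi (sums to 1)."""
    def r(x: str, y: str) -> float:
        return rates[x + y] if x + y in rates else rates[y + x]   # r_xy == r_yx

    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i][j] = r(x, y) * freqs[y]       # q_ij = r_ij * pi_j
        q[i][i] = -sum(q[i])                       # diagonal makes each row sum to zero
    # Rescale so the expected rate, -sum_i pi_i q_ii, equals 1
    mean_rate = -sum(freqs[x] * q[i][i] for i, x in enumerate(BASES))
    return [[v / mean_rate for v in row] for row in q]


rates = {"AC": 1.0, "AG": 4.0, "AT": 1.0, "CG": 1.0, "CT": 4.0, "GT": 1.0}   # illustrative values
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
Q = gtr_rate_matrix(rates, freqs)
for row in Q:
    print([round(v, 3) for v in row])
```

Setting all six rates equal and all frequencies to 0.25 reduces this matrix to the Jukes-Cantor (JC69) model; setting only the two transition rates (AG, CT) apart from the four transversion rates, with equal frequencies, gives K80.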
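The K80 model converts the observed proportions of transitions (P) and transversions (Q) between two aligned sequences into an estimated number of substitutions per site, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q) (Kimura, 1980). A minimal sketch, assuming the two sequences are already aligned; skipping gaps and ambiguity codes is a choice made here, not necessarily how any particular package treats them.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k80_distance(s1: str, s2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    transitions = transversions = sites = 0
    for x, y in zip(s1.upper(), s2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue                                  # skip gaps and ambiguity codes
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1                          # A<->G or C<->T
        else:
            transversions += 1                        # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    a, b = 1 - 2 * p - q, 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the K80 correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


# One C<->T transition in 10 sites: p-distance 0.1, K80 distance slightly larger
print(round(k80_distance("ACGTACGTAC", "ACGTACGTAT"), 4))  # 0.1116
```

The corrected distance exceeds the raw 10% difference because the model allows for multiple substitutions at the same site.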
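Maximum Likelihood programs score each candidate tree with Felsenstein's (1981) pruning algorithm, which computes, column by column, the probability of the observed bases summed over all possible ancestral states. The sketch below evaluates that likelihood for one fixed tree with fixed branch lengths. To stay short it uses the Jukes-Cantor (JC69) model, the special case of GTR with equal rates and equal base frequencies, rather than GTR+Γ+I, and it encodes trees as hypothetical nested lists of (child, branch length) pairs. The topology and branch-length search that a real ML program performs is not shown.

```python
import math

BASES = "ACGT"


def jc69_prob(x: str, y: str, t: float) -> float:
    """P(base y at the end of a branch of length t | base x at its start) under JC69."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def partial_likelihood(node, column: dict) -> list:
    """L[k] = probability of the tip data below `node`, given base BASES[k] at `node`."""
    if isinstance(node, str):                      # tip: a taxon name
        observed = column[node]
        if observed not in BASES:                  # gap/ambiguity treated as missing data
            return [1.0] * 4
        return [1.0 if b == observed else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:                          # internal node: list of (child, length)
        child_lk = partial_likelihood(child, column)
        for k, x in enumerate(BASES):
            result[k] *= sum(jc69_prob(x, y, t) * child_lk[m] for m, y in enumerate(BASES))
    return result


def log_likelihood(tree, alignment: dict) -> float:
    """Sum of per-column log-likelihoods; JC69 root frequencies are all 0.25."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for s in range(n_sites):
        column = {taxon: seq[s] for taxon, seq in alignment.items()}
        total += math.log(sum(0.25 * lk for lk in partial_likelihood(tree, column)))
    return total


toy_aln = {"A": "AAGT", "B": "AAGT", "C": "CCGT", "D": "CCGA"}
tree_ab = [([("A", 0.1), ("B", 0.1)], 0.1), ("C", 0.1), ("D", 0.1)]   # ((A,B),C,D)
tree_ac = [([("A", 0.1), ("C", 0.1)], 0.1), ("B", 0.1), ("D", 0.1)]   # ((A,C),B,D)
print(log_likelihood(tree_ab, toy_aln) > log_likelihood(tree_ac, toy_aln))  # True
```

Because the model is time-reversible, the unrooted tree can be evaluated from any internal node, which is why the root here is simply a node with three children.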
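A real phylogenetic MCMC run (for example in MrBayes) proposes changes to topology, branch lengths and model parameters together. The sketch below keeps only the accept/reject rule and the burn-in described in the MCMC entry by sampling a single parameter: the JC69 distance d between two aligned sequences. The Exponential(10) prior, the reflecting uniform proposal, the step size and the counts in the usage line are all illustrative assumptions.

```python
import math
import random


def log_posterior(d: float, n_sites: int, n_diff: int, prior_rate: float = 10.0) -> float:
    """Unnormalised log posterior of the JC69 distance d between two aligned sequences."""
    if d <= 0:
        return -math.inf
    e = math.exp(-4.0 * d / 3.0)
    p_diff = 0.25 - 0.25 * e            # JC69 prob. of ending in one specific different base
    if p_diff <= 0:                     # d so small that e rounds to 1.0
        return -math.inf
    log_like = (n_sites - n_diff) * math.log(0.25 + 0.75 * e) + n_diff * math.log(p_diff)
    log_prior = math.log(prior_rate) - prior_rate * d      # Exponential(prior_rate) prior
    return log_like + log_prior


def metropolis(n_sites: int, n_diff: int, steps: int = 20000, burn_in: int = 2000,
               step_size: float = 0.02, seed: int = 1) -> list:
    """Random-walk Metropolis sampler; returns the post-burn-in samples of d."""
    rng = random.Random(seed)
    d = 0.1                                           # arbitrary starting value
    current = log_posterior(d, n_sites, n_diff)
    samples = []
    for step in range(steps):
        # Uniform step reflected at zero: the proposal stays symmetric, so the
        # Hastings correction cancels and only the posterior ratio matters.
        proposal = abs(d + rng.uniform(-step_size, step_size))
        proposed = log_posterior(proposal, n_sites, n_diff)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            d, current = proposal, proposed           # accept the proposal
        if step >= burn_in:
            samples.append(d)                         # record the current state either way
    return samples


chain = metropolis(n_sites=600, n_diff=60)            # hypothetical: 60 differences in 600 sites
print(sum(chain) / len(chain))                        # posterior mean of d
```

Recording the current state even after a rejection is what makes the retained samples a draw from the posterior rather than a list of accepted moves.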
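The joining loop described in the Neighbor-Joining entry can be sketched in a few lines. This is a minimal textbook-style version, assuming the input is a complete, symmetric dict-of-dicts of already-corrected distances (for example K80) without self-distances. It returns an unrooted Newick string and makes no attempt to root the tree (the outgroup's job) or to handle negative branch lengths; the function name and input layout are hypothetical.

```python
from itertools import combinations


def neighbor_joining(dist: dict) -> str:
    """dist[a][b] = corrected distance between taxa a and b. Returns unrooted Newick."""
    d = {a: dict(row) for a, row in dist.items()}      # working copy; grows as nodes are joined
    nodes = list(d)
    if len(nodes) < 3:
        raise ValueError("need at least three taxa")
    while len(nodes) > 3:
        n = len(nodes)
        # r_i: summed distance from node i to every other remaining node
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d_ij - r_i - r_j
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))   # branch from i to the new node
        lj = d[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"                 # new node labelled by its subtree
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Join the last three nodes at a central node
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    return f"({a}:{la:.4f},{b}:{d[a][b] - la:.4f},{c}:{d[a][c] - la:.4f});"


# Distances that fit the tree ((A:2,B:3):1,C:4,D:5) exactly
dist = {"A": {"B": 5, "C": 7, "D": 8}, "B": {"A": 5, "C": 8, "D": 9},
        "C": {"A": 7, "B": 8, "D": 9}, "D": {"A": 8, "B": 9, "C": 9}}
print(neighbor_joining(dist))  # (C:4.0000,D:5.0000,(A:2.0000,B:3.0000):1.0000);
```

Recomputing every r_i and scanning all pairs on each of the n - 3 iterations is what gives the O(n³) running time mentioned above.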
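To make "exponentially" in the PCR entry concrete: if each cycle copies a fraction E of the existing templates (E = 1 means perfect doubling; E is an assumed parameter here, not a value measured in this study), the number of copies N after c cycles grows as shown below. Real reactions fall short of perfect doubling and eventually plateau as reagents are used up.

```latex
N_c = N_0\,(1 + E)^{c},
\qquad
N_0 = 1,\; E = 1,\; c = 30 \;\Rightarrow\; N_{30} = 2^{30} = 1\,073\,741\,824 \approx 1.07 \times 10^{9}
```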
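A clade's posterior probability, as defined above, is in practice the fraction of post-burn-in MCMC samples that contain that clade, and the majority-rule consensus keeps the clades found in more than half of them. A minimal sketch, assuming each sampled tree has already been reduced to a set of clades (frozensets of taxon names) and assuming the first 25% of samples are discarded as burn-in; both choices, and the function names, are illustrative.

```python
from collections import Counter


def clade_posteriors(sampled_trees: list, burn_in_fraction: float = 0.25) -> dict:
    """Fraction of post-burn-in samples containing each clade (each tree = set of frozensets)."""
    kept = sampled_trees[int(len(sampled_trees) * burn_in_fraction):]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(posteriors: dict) -> set:
    """Clades with posterior > 0.5; any two such clades are always mutually compatible."""
    return {clade for clade, pp in posteriors.items() if pp > 0.5}


ab, ac = frozenset({"A", "B"}), frozenset({"A", "C"})
chain = [{ab}, {ac}, {ab}, {ab}, {ac}, {ab}, {ab}, {ab}]   # toy sampler output
pp = clade_posteriors(chain)          # first 2 of 8 samples discarded as burn-in
print(pp[ab], majority_rule(pp))
```

Two clades that conflict cannot both appear in more than half of the same samples, so the clades kept by the > 0.5 rule always fit together on one tree.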
Literature Cited
- Hebert, P. D. N., Cywinska, A., Ball, S. L., & deWaard, J. R. (2003). Biological identifications through DNA barcodes. Proceedings of the Royal Society of London B: Biological Sciences, 270(1512), 313–321. doi:10.1098/rspb.2002.2218
- Ward, R. D., Zemlak, T. S., Innes, B. H., Last, P. R., & Hebert, P. D. N. (2005). DNA barcoding Australia's fish species. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1462), 1847–1857. doi:10.1098/rstb.2005.1716
- Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16(2), 111–120. doi:10.1007/BF01731581
- Saitou, N., & Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4(4), 406–425. doi:10.1093/oxfordjournals.molbev.a040454
- Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17(6), 368–376. doi:10.1007/BF01734359
- Ronquist, F., & Huelsenbeck, J. P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics, 19(12), 1572–1574. doi:10.1093/bioinformatics/btg180
- Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797. doi:10.1093/nar/gkh340
- Near, T. J., Eytan, R. I., Dornburg, A., Kuhn, K. L., Moore, J. A., Davis, M. P., … & Smith, W. L. (2012). Resolution of ray-finned fish phylogeny and timing of diversification. Proceedings of the National Academy of Sciences, 109(34), 13698–13703. doi:10.1073/pnas.1206625109
- Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39(4), 783–791. doi:10.2307/2408678
- Zhu, M., et al. (2015). A Silurian maxillate placoderm illuminates jaw evolution. Nature, 525, 652–656. PMC4648279
- Florida Museum of Natural History. Spotted Seatrout (Cynoscion nebulosus) species profile. Retrieved from floridamuseum.ufl.edu
- Texas Parks & Wildlife Department. Spotted Seatrout species profile. Retrieved from tpwd.texas.gov
- Sun, Y., Huang, Y., Li, X., Baldwin, C. C., Zhou, Z., Yan, Z., … & He, S. (2021). Large-scale sequencing of flatfish genomes provides insights into the polyphyletic origin of their specialized body plan. Nature Genetics, 53, 742–751. doi:10.1038/s41588-021-00836-9