ZOOL 304
Class Notes
Chapter 12, Systematics
Systematics is the discipline of biology which classifies species into higher-level groups (taxa such as genera, families, orders, classes, phyla). A widely accepted goal for systematics is a system of classification which is "natural", in which each taxon consists of species that all share the same most-recent common ancestor.
Reconstruction of phylogeny (i.e., "the tree of life") is therefore a major goal of systematics. We shall presume that there is but one "true" tree of life, with all living things related by descent. But we also recognize that the true tree cannot be observed directly. Any representation of phylogeny is a hypothesis. Some branches of the tree of life are strongly supported and believed with great confidence; others remain quite obscure. Here we shall concentrate on the tools and vocabulary used in phylogenetic reconstruction.
Discussion.
[The following notes (and the quoted headings) are adapted from Chapter 17 of Mark Ridley's textbook, EVOLUTION, 2nd ed. (1996), Blackwell Science, Inc., Cambridge MA. ISBN 0-86542-495-0.]
- The basic logic of phylogenetic reconstruction is quite straightforward.
- Nature provides species. These species have a phylogenetic history, but that history cannot be observed directly.
- There is only one true phylogeny.
- Inheritance from common ancestry produces similarity.
- Evolutionary divergence produces differences.
- We arrange the species according to similarities and differences.
- The result should reveal phylogeny.
- Unfortunately, Nature does not cooperate. Nature provides misleading and contradictory clues.
- Three conceptual tools are available:
- Cladistic analysis based on reliable synapomorphy (derived homology).
- Cladistic analysis using parsimony and numerous similarities and differences.
- "Distance" analysis, encompassing a variety of effective methods (in addition to phenetics, which is quite unreliable).
- The difficult job for phylogenetic reconstruction, like that for classification, is deciding which way of choosing characters is most reliable in any particular case.
Some definitions. A specialized jargon has developed to address the problems and potential semantic difficulties that attend our efforts to reconstruct phylogeny. The following are some of the more significant terms:
- Rooted and unrooted trees.
- An unrooted tree indicates relationships without specifying any direction for evolutionary change. It has no ancestral "root" (or stem), and so does not indicate which branches came earlier and which ones later.
- A rooted tree does include an indication of ancestor-descendant relationships.
- Informative and uninformative characters.
- Informative characters are those which are shared among some but not all members of a group in question.
- Uninformative characters are those which are shared by all members of the group OR which are possessed only by single members. In neither case can they be used for inferring relationships.
- Homology is similarity of traits in two or more species due to descent from a common ancestor.
- Synapomorphy is homology based on recent shared ancestry; characterizing a monophyletic group.
- Plesiomorphy is homology based on more distant ancestry (also symplesiomorphy); see paraphyly.
Plesiomorphy and synapomorphy are associated with the ideas of "ancestral" and "derived" respectively (or, somewhat problematically, with "primitive" and "advanced"). Feathers are a synapomorphy for birds (derived within the group) while four limbs are a plesiomorphy for birds (inherited from a distant tetrapod vertebrate ancestor).
- Homoplasy is similarity NOT due to homology; resulting from convergence, parallelism or reversal.
- Monophyletic describes a taxon which includes an ancestral species and all of its descendent species. Operationally, a monophyletic group is defined by a suite of shared-derived characters (synapomorphies). Monophyletic groups are sometimes called clades. Ideally, all taxa should be monophyletic.
- Paraphyletic describes a taxon which includes an ancestral species and some but not all of its descendants. Operationally, a paraphyletic group is defined by a suite of ancestral traits (plesiomorphies) which have been modified or lost in the excluded species.
- Polyphyletic describes a taxon which includes species whose common ancestor is not included within the group. Operationally, a polyphyletic group is defined by convergent traits (homoplasies).
Most of these terms are defined "ideally", on the basis of the one "true" phylogeny, which generally remains unknown. Thus any initial application of any of these terms should be regarded as a hypothesis rather than a fact. Only when an attribution of homology or of monophyly is supported by several independent lines of evidence, and is not challenged by substantially contradictory evidence, should it be regarded as established fact.
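The three "-phyletic" definitions above can be applied mechanically once a tree is assumed. The following is a minimal sketch, not a standard tool: the toy tree, the node names (such as `archosaur_anc`), and the function names are all hypothetical, and the tree is deliberately simplified. A group is given as a set of nodes that may include ancestors, and the test follows the wording of the definitions directly.

```python
# Hypothetical, simplified toy tree: each node maps to its parent (root -> None).
PARENT = {
    "amniote_anc": None,
    "mammals": "amniote_anc",
    "sauropsid_anc": "amniote_anc",
    "lizards": "sauropsid_anc",
    "archosaur_anc": "sauropsid_anc",
    "crocodiles": "archosaur_anc",
    "birds": "archosaur_anc",
}

def ancestors(node):
    """Path from a node up to the root, the node itself included."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def mrca(group):
    """Most recent common ancestor of all nodes in the group."""
    members = list(group)
    common = ancestors(members[0])          # ordered from recent to ancient
    for member in members[1:]:
        line = set(ancestors(member))
        common = [n for n in common if n in line]
    return common[0]

def clade(node):
    """The node together with every one of its descendants."""
    return {n for n in PARENT if node in ancestors(n)}

def classify(group):
    group = set(group)
    ancestor = mrca(group)
    if ancestor not in group:
        return "polyphyletic"    # common ancestor lies outside the group
    if group == clade(ancestor):
        return "monophyletic"    # ancestor plus all of its descendants
    return "paraphyletic"        # ancestor plus only some descendants

print(classify({"archosaur_anc", "crocodiles", "birds"}))
print(classify({"sauropsid_anc", "lizards", "archosaur_anc", "crocodiles"}))
print(classify({"mammals", "birds"}))
```

On this toy tree the three calls print monophyletic, paraphyletic, and polyphyletic in turn. The answers are only as good as the assumed tree, which is itself a hypothesis.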
"The parsimony principle works if evolutionary change is improbable."
- Parsimony is an ancient guide for inference.
- Occam's Razor is the classic assertion of parsimony:
Entia non sunt multiplicanda praeter necessitatem. Entities shall not be multiplied beyond necessity. Explanations should be kept as simple as possible.
- When applied in the reconstruction of phylogeny, parsimony is usually taken to mean that the true tree will call for fewer evolutionary changes during phylogeny than would be needed to fit the same set of similarities and differences onto an incorrect tree.
- This would be fine if any evolutionary change occurred only once and was never reversed. Unfortunately, that is not the case. Any attempt at phylogenetic reconstruction soon reveals that some similarities must be convergent or parallel (i.e., occurring multiple times) while some differences must involve reversal.
- Therefore, rigid parsimony cannot be trusted absolutely.
- The principle of parsimony makes good sense, if it is used as a fallible guide and not as an unimpeachable law.
"Phylogenetic inference uses two principles: parsimony and distance."
- When used for phylogenetic inference, the parsimony principle involves counting and minimizing the number of discrete transformations which must occur in a phylogenetic tree.
- The distance principle involves arranging tree branches of various lengths so the distances between every species pair can be most closely matched by summing distances along each branch of the tree. Distance, in turn, is a quantitative measure of similarity; the greater the distance, the less similarity.
- In simple cases, examples of these two methods can look very similar. Ideally they should give the same results (but in the real world they often do not).
- Both methods are computationally intense for any but the smallest sets of species.
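Counting transformations on a candidate tree, as the parsimony principle requires, can be sketched in a few lines. The example below uses Fitch's small-parsimony procedure, which is one common way of doing the count and not necessarily the one the textbook has in mind. The taxa A to D, the three binary characters, and the function names are hypothetical. Trees are written as nested pairs. The placement of the root does not affect the count.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, minimum changes) for one character."""
    if isinstance(tree, str):                 # a leaf: its observed state, no change
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:                                # branches agree: no extra change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, characters):
    """Total number of changes the tree requires over all characters."""
    return sum(fitch(tree, states)[1] for states in characters)

# Hypothetical data: two characters group A with B, one groups A with C.
characters = [
    {"A": 0, "B": 0, "C": 1, "D": 1},
    {"A": 0, "B": 0, "C": 1, "D": 1},
    {"A": 0, "B": 1, "C": 0, "D": 1},
]

trees = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}

for name, tree in trees.items():
    print(name, tree_length(tree, characters))
# AB|CD needs 4 changes, AC|BD needs 5, AD|BC needs 6
```

The tree AB|CD is preferred because it requires the fewest changes, but it still needs two changes for the third character. That residue is exactly the kind of homoplasy that makes parsimony a fallible guide.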
"In most real cases, not all characters suggest the same phylogeny."
- True synapomorphies (derived homologies, reflecting shared ancestry) cannot conflict.
- Similarities which are not true synapomorphies may be due to homoplasy (convergence, parallel evolution, or reversal) or symplesiomorphy (shared, ancestral similarity).
- Homoplasy and symplesiomorphy are both common.
- In practice, the job of reconstructing phylogeny is largely one of distinguishing true synapomorphy from homoplasy.
"Homologies can be distinguished from analogies by several criteria."
- Several features often characterize reliable homologies:
- They share "fundamental structure" (whatever that might mean). In practice, "fundamental structure" is generally taken to mean composition from similar elements, similarly arranged, like the various bones and muscles of homologous vertebrate limbs.
- They share similar relations to surrounding characters, such as relative position in the body.
- They share similar embryonic development (and genetic specification).
- Unfortunately, none of the above criteria can be trusted absolutely. There are well-established exceptions to each of them.
- Homologous similarities tend to remain evident in spite of modification for different modes of life.
- In contrast, analogies are often similar specifically because of adaptation for a common way of life. Therefore, similarities that clearly represent adaptation for current needs are poor candidates for reliable homology.
"Derived homologies are more reliable indicators of phylogenetic relations than are ancestral homologies."
- Paraphyletic groups are similar because of ancestral similarities (symplesiomorphies) that have been lost (or highly modified) from some members of the complete monophyletic group.
- Recognizing that a highly-modified group is itself monophyletic is often easy; recognizing that the remaining paraphyletic assemblage is NOT monophyletic can be much less obvious.
- The following section describes tools for (tentatively) distinguishing between synapomorphies and symplesiomorphies.
"The polarity of character states can be inferred by three main techniques."
- "Outgroup comparison."
- An outgroup is a group that lies outside the group whose phylogeny is being analyzed.
- Ideally, an outgroup should be closely related to the group in question.
- Traits shared among some members of the group in question which are also present in the outgroup are likely to be ancestral homologies (symplesiomorphies) and therefore not evidence of monophyletic relationship within the group.
- Identifying suitable outgroups requires some prior knowledge of phylogeny.
- Outgroup comparison can also be confounded by homoplasy, in which case multiple outgroups can be helpful.
- "The embryological criterion."
- Von Baer's first law, that ancestral characters appear earlier in embryonic development than derived characters, bears a distinct similarity to Haeckel's infamous biogenetic law, that "Ontogeny recapitulates phylogeny."
- By limiting the application of his principle to the embryonic sequence of developmental states, von Baer avoided the error of assuming that embryonic development recapitulates the sequence of ancestral adult states.
- Unfortunately, we know that von Baer's first law does have exceptions. Embryonic forms can change, and have changed, evolutionarily and independently of later life-stages, such that related but strikingly dissimilar embryos can give rise to very similar adults. Since we often know even less about embryology than we do about phylogeny, applying the embryological criterion to infer phylogeny runs a decided risk of error or circular reasoning.
- "The fossil record."
- To the extent that the fossil record can be trusted, ancestral traits can be directly observed.
- However, here the problem lies in deciding which fossil creatures are ancestral to which group. Even if the fossil is older than the first known appearance of the group in question, it is generally difficult to know for certain that the group didn't actually diverge earlier still and simply fail to leave a record of its beginnings.
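Of the three techniques, outgroup comparison is the easiest to state as a rule. The following minimal sketch assumes a single character scored in an ingroup and in one or more outgroups. The taxon names, the states, and the function `polarize` are hypothetical. When the outgroups disagree, the sketch declines to give an answer, which mirrors the caution about homoplasy noted above.

```python
def polarize(ingroup, outgroups):
    """Infer character polarity by outgroup comparison.

    ingroup, outgroups: dicts mapping taxon name -> character state.
    Returns (ancestral_state, taxa_with_derived_state), or (None, set())
    when the outgroups disagree and polarity is left undecided.
    """
    outgroup_states = set(outgroups.values())
    if len(outgroup_states) != 1:
        return None, set()
    ancestral = outgroup_states.pop()
    derived = {taxon for taxon, state in ingroup.items() if state != ancestral}
    return ancestral, derived

# Hypothetical character scored in three ingroup species.
ingroup = {"sp1": "present", "sp2": "present", "sp3": "absent"}

print(polarize(ingroup, {"out1": "absent"}))
# "absent" is inferred to be ancestral; sp1 and sp2 share the derived state

print(polarize(ingroup, {"out1": "absent", "out2": "present"}))
# (None, set()): conflicting outgroups, so no polarity is asserted
```

In the first call the shared state of sp1 and sp2 is a candidate synapomorphy. It remains a hypothesis, since the outgroup itself may have been chosen wrongly.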
"Any residual character conflict can be resolved by parsimony."
- In spite of all the uncertainty, it is possible to know, for certain, that all of the rules for inferring homology do have exceptions. This is proven by character conflict, in which there is no single tree which is consistent with all the evidence, and in which different rules for inference yield different best trees.
- Faced with unresolved uncertainty as a result of conflicting characters, there are four strategies:
- Gather more data (often impractical or excessively expensive).
- Intensify analysis of available data (often fruitless).
- Suspend judgment.
- Assert the overriding correctness of parsimony.
- Of these four, assertion of parsimony is chosen by many contemporary cladists. This has the virtue of giving "definite" answers. Unfortunately, such definite answers may well be wrong.
- Suspension of judgment is usually a sound choice, but deeply unsatisfying.
"Molecular sequences are becoming increasingly important in phylogenetic inference, and they have distinct properties."
- "The logic of phylogenetic inference is identical for molecular and morphological characters."
- Molecular characters are available in vast numbers.
- Every amino acid in every protein is a character.
- Similarly, every base-pair in every gene is a character.
- Each individual character is quite unreliable; character conflict is widespread.
- There are only four possible character states for a DNA base-pair.
- There are only 20 possible character states for an amino acid.
- With such low numbers, convergence and reversal become reasonably probable.
- So large numbers of characters are needed.
- Comparison of sequences is straightforward, using statistical techniques based on distance (using simple counts of differences) or parsimony (using counts of discrete character state-changes).
- These statistical techniques are limited by insurmountable computational difficulties to relatively small sets of taxa.
- Most commonly, molecular methods are used either with a group of species within a single taxon whose monophyly is already known (probably on the basis of morphology) or with representative species from several higher taxa whose separate monophyly is already known (again, most probably from morphology).
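The "simple counts of differences" mentioned above can be illustrated with a short sketch. The four aligned sequences and the names `sp1` to `sp4` are invented for illustration, and the function names are hypothetical. The count is the raw number of mismatched positions, with no correction for multiple changes at the same site.

```python
from itertools import combinations

def count_differences(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

def distance_table(sequences):
    """Pairwise difference counts for every pair of taxa."""
    return {
        (s, t): count_differences(sequences[s], sequences[t])
        for s, t in combinations(sorted(sequences), 2)
    }

# Hypothetical aligned sequences, ten sites each.
sequences = {
    "sp1": "ACGTACGTAC",
    "sp2": "ACGTACGTAT",
    "sp3": "ACGAACCTAT",
    "sp4": "TCGAACCTAT",
}

for pair, differences in distance_table(sequences).items():
    print(pair, differences)
# sp1-sp2: 1, sp1-sp3: 3, sp1-sp4: 4, sp2-sp3: 2, sp2-sp4: 3, sp3-sp4: 1
```

With only four possible states per site, some of these matches could be convergent, which is why many sites are needed before such counts can be trusted.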
"Molecular sequences can be used to infer an unrooted tree for a group of species."
- Steps involved in reconstruction:
- Acquire comparable molecular sequence data. (This is not always as straightforward as it sounds.)
- Align the sequences (also can be tricky).
- Find informative sites.
- Draw all possible trees (computationally intractable for large data sets) or sample from possible trees.
- Determine distance measure or count state-changes on each tree.
- Choose the optimal tree (based on optimality criterion of distance statistic or parsimony).
- Root the tree using outgroup comparison.
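The "find informative sites" step can be sketched as follows. This assumes the usual parsimony criterion, which is a slightly stricter reading of the definition given earlier: a site is informative only if at least two different states each occur in at least two taxa. The sequences and function names are hypothetical.

```python
from collections import Counter

def is_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def informative_sites(alignment):
    """Zero-based positions of the informative columns in an alignment."""
    names = sorted(alignment)
    columns = zip(*(alignment[name] for name in names))
    return [i for i, column in enumerate(columns) if is_informative(column)]

# Hypothetical aligned sequences, ten sites each.
alignment = {
    "sp1": "ACGTACGTAC",
    "sp2": "ACGTACGTAT",
    "sp3": "ACGAACCTAT",
    "sp4": "TCGAACCTAT",
}

print(informative_sites(alignment))   # [3, 6]
```

Sites 0 and 9 vary, but in each case the odd state is held by a single species, so they say nothing about how the four species group. Only sites 3 and 6 are informative, and both favor grouping sp1 with sp2 and sp3 with sp4.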
"Different molecules evolve at different rates, and molecular evidence can be tuned to solve particular phylogenetic problems."
- The job here is to find a molecule with the highest number of informative sites.
- Mitochondrial DNA is useful for recent lineages (diverging over the last few million years).
- Nuclear ribosomal RNA genes are useful for more ancient lineages (diverging over hundreds of millions of years).
"Molecular phylogenetic research encounters difficulties when the number of possible trees is large and not enough informative evidence exists."
- Note that with a practical upper limit of 25 taxa for which trees can be tested exhaustively, additional criteria or assumptions must be introduced.
- There are two distinct issues, "optimality criterion" and algorithm.
- The "optimality criterion" is discussed above (parsimony or distance). Several distinct "distance" criteria are available.
- "Algorithm" refers to the method the computing program uses to search for an optimal tree. Because an exhaustive search is impossible for any but small sets of taxa (25 or fewer), some method of trial sampling must be used. All available algorithms may fail to find the best tree.
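The reason exhaustive search fails so quickly is the growth in the number of possible trees. For strictly bifurcating trees with n labeled taxa, the standard counts are (2n-5)!! unrooted trees and (2n-3)!! rooted trees, where !! is the double factorial (the product of every second integer down to 1). A minimal sketch, with hypothetical function names:

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n labeled taxa (n >= 3)."""
    if n < 3:
        raise ValueError("need at least three taxa")
    return double_factorial(2 * n - 5)

def rooted_trees(n):
    """Number of rooted bifurcating trees for n labeled taxa (n >= 2)."""
    if n < 2:
        raise ValueError("need at least two taxa")
    return double_factorial(2 * n - 3)

for n in (4, 5, 10, 25):
    print(n, unrooted_trees(n), rooted_trees(n))
# 4 taxa: 3 unrooted, 15 rooted; 5 taxa: 15 and 105; 10 taxa: 2027025 and 34459425
```

Each added taxon multiplies the count by the next odd number, so the totals soon outrun any computer. This is why search algorithms must sample trees and may miss the best one.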
"Unrooted trees can be inferred from other kinds of evidence, such as chromosomal inversions in Hawaiian fruitflies ..."
- Drosophila provides an especially elegant example of phylogenetic inference. These flies have wonderfully large, banded chromosomes. Since each chromosomal inversion is a good, reliable synapomorphy, a precise phylogeny has been inferred at the species level.
"Some molecular evidence can only be used to infer phylogenetic relations with distance statistics."
- Direct sequence comparison is a powerful method with increasing popularity.
- The older molecular techniques, immunological cross-reactivity and DNA annealing temperatures, give measures of similarity that cannot be analyzed cladistically.
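When only pairwise distances are available, a candidate tree can be scored by how closely the summed branch lengths between each pair of species match the measured distances. The sketch below uses the sum of squared differences as the score, which is one of several possible distance criteria. The distances, branch lengths, node names (`x`, `y`), and function names are hypothetical, and the branch lengths are simply supplied by hand. A real program would also search for the best branch lengths on each tree.

```python
def path_length(edges, start, goal):
    """Sum of branch lengths along the unique path between two nodes."""
    graph = {}
    for (u, v), length in edges.items():
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))
    stack = [(start, None, 0.0)]
    while stack:
        node, previous, distance = stack.pop()
        if node == goal:
            return distance
        for neighbor, length in graph[node]:
            if neighbor != previous:          # never walk back along the same branch
                stack.append((neighbor, node, distance + length))
    raise ValueError("nodes are not connected")

def misfit(edges, observed):
    """Sum of squared differences between tree distances and observed ones."""
    return sum(
        (path_length(edges, a, b) - d) ** 2 for (a, b), d in observed.items()
    )

# Hypothetical measured distances between four species.
observed = {
    ("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
    ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 3,
}

# Two candidate unrooted trees; x and y are the internal nodes.
tree_ab_cd = {("A", "x"): 1, ("B", "x"): 2, ("x", "y"): 3, ("C", "y"): 1, ("D", "y"): 2}
tree_ac_bd = {("A", "x"): 1, ("C", "x"): 1, ("x", "y"): 3, ("B", "y"): 2, ("D", "y"): 2}

print(misfit(tree_ab_cd, observed))   # 0.0
print(misfit(tree_ac_bd, observed))   # 36.0
```

The first tree reproduces every measured distance exactly, so it would be preferred under this criterion. Real distance data are rarely so tidy.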
"Comparing molecular evidence and paleontological evidence."
- When evidence is properly understood, all lines of evidence should be congruent (there is, after all, only one true "tree of life").
- However, when one line of evidence is scanty, or based on dubious assumptions, while another appears stronger, the stronger line will prevail.
- There is no automatic preference for morphological evidence or for molecular evidence or for paleontological evidence. Whatever works, works.
"Conclusion." Please take note: "It is easy to be deceived by discussions of phylogenetic inference into thinking that it represents an exceptionally uncertain, shaky kind of science." It is unfortunately true, in some cases, that "'Paleontology is mute, comparative anatomy meaningless, and embryology lies.'" But it is easy to overlook the fact that current interest tends to focus "on unsolved problems--and these issues tend to be the difficult ones." That's true in many areas of evolutionary biology, not just phylogenetic analysis.
Comments and questions: dgking@siu.edu