Similar Articles
20 similar articles found (search took 31 ms)
1.
One of the major challenges for researchers studying phylogeography and shallow-scale phylogenetics is identifying highly variable nuclear loci that are informative for the question of interest. Previous approaches to locus identification have generally required extensive testing of anonymous nuclear loci developed from genomic libraries of the target taxon, testing of loci of unknown utility from other systems, or identification of loci from the nearest model organism with genomic resources. Here, we present a fast and economical approach to generating thousands of variable, single-copy nuclear loci for any system using next-generation sequencing. We performed Illumina paired-end sequencing of three reduced-representation libraries (RRLs) in chorus frogs (Pseudacris) to identify orthologous, single-copy loci across libraries and to estimate sequence divergence at multiple taxonomic levels. We also conducted PCR testing of these loci across the genus Pseudacris and outgroups to determine whether loci developed for phylogeography can be extended to deeper phylogenetic levels. Prior to sequencing, we conducted an in silico digestion of the most closely related reference genome (Xenopus tropicalis) to generate expectations for the number of loci and degree of coverage for a particular experimental design. Using the RRL approach, we (i) identified more than 100,000 single-copy nuclear loci, 6339 of which were obtained for divergent conspecifics and 904 for heterospecifics; (ii) estimated average nuclear sequence divergence at 0.1% between alleles within an individual, 1.1% between conspecific individuals representing two different clades, and 1.8% between species; and (iii) determined from PCR testing that 53% of the loci amplify successfully within species and that many (16%) also amplify at the genus level and deeper in the phylogeny.
Our study effectively identified nuclear loci in the genome with levels of sequence divergence on par with the mitochondrial loci commonly used in phylogeography. Specifically, we estimated that ~7% of loci in the chorus frog genome are >3% divergent within species, which translates to a predicted ~50,000 single-copy loci in the genome with >3% divergence. Moreover, successful amplification of many loci at deeper phylogenetic levels indicates that the RRL approach is an efficient method for rapidly identifying informative loci for both phylogenetics and phylogeography. We conclude with recommendations for minimizing the cost and maximizing the efficiency of locus identification in future studies in this field.
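The in silico digestion used to set expectations before sequencing can be sketched as follows: cut a reference sequence at every occurrence of a restriction site and count the fragments that fall inside a size-selection window, each of which is a candidate locus. This is a minimal illustration, not the authors' pipeline; the `GATC` site, placing the cut at the start of the site, and the window bounds are hypothetical placeholders, since the abstract does not name the enzyme or size range.

```python
import re

def digest_fragments(sequence: str, site: str) -> list[int]:
    """Fragment lengths after cutting `sequence` at every occurrence of `site`.

    Simplification (assumption): the cut is placed at the first base of each
    recognition site; a lookahead regex also finds overlapping sites.
    """
    seq = sequence.upper()
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site.upper())})", seq)]
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces (e.g. when a site sits at position 0).
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]

def expected_loci(sequence: str, site: str, min_len: int, max_len: int) -> int:
    """Count fragments inside a (hypothetical) size-selection window."""
    return sum(min_len <= n <= max_len for n in digest_fragments(sequence, site))
```

Scaled up to a whole reference genome, the fragment count inside the window approximates the number of loci, and the total base pairs in those fragments divided into the planned sequencing output approximates expected coverage.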

2.
Gene capture coupled with next-generation sequencing has become one of the preferred methods of subsampling genomes for phylogenomic studies. Many exon markers have been developed for plants, sharks, frogs, reptiles, fishes, and other groups, but no universal exon markers have been tested in ray-finned fishes. Here, we identified a suite of "single-copy" protein-coding sequence (CDS) markers by comparing eight fish genomes, and tested them empirically in 83 species (33 families and nine orders or higher clades: Acipenseriformes, Lepisosteiformes, Elopomorpha, Osteoglossomorpha, Clupeiformes, Cypriniformes, Gobiaria, Carangaria, and Eupercaria; sensu Betancur et al. 2013). Sorting the markers by their completeness and phylogenetic decisiveness in the taxa tested yielded a selection of 4,434 markers, which proved useful for reconstructing phylogenies of ray-finned fishes at different taxonomic levels. We also propose a strategy for refining bait (probe) design a posteriori based on empirical data. The markers we have developed may greatly expand the battery of exon markers available for phylogenomic studies of ray-finned fishes.
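Sorting markers by completeness, one of the two criteria named above, can be sketched as computing for each marker the fraction of tested taxa with data and ranking on that fraction. This sketch assumes a simple presence table and a hypothetical completeness cutoff; phylogenetic decisiveness, the second criterion, is not modeled here.

```python
def marker_completeness(presence: dict[str, set[str]], taxa: list[str]) -> dict[str, float]:
    """Fraction of the tested taxa that have data for each marker."""
    taxon_set = set(taxa)
    return {marker: len(found & taxon_set) / len(taxa) for marker, found in presence.items()}

def select_markers(presence: dict[str, set[str]], taxa: list[str], min_completeness: float) -> list[str]:
    """Markers sorted by completeness (highest first), keeping those at or above the cutoff.

    Ties are broken alphabetically so the output order is deterministic.
    """
    completeness = marker_completeness(presence, taxa)
    ranked = sorted(completeness.items(), key=lambda kv: (-kv[1], kv[0]))
    return [marker for marker, c in ranked if c >= min_completeness]
```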

3.
4.
The systematic identification of orthologous features in related organisms greatly facilitates comparative genomics, including research on genome evolution and comparative genetic mapping. In this study, we selected 274 unique gene sequences for developing PCR-based genetic markers across fifteen legume genomes, representing six crop or model legume species from the phaseoloid and inverted repeat-lacking (IRLC) clades. DNA sequence analysis demonstrated that 129 of the amplified fragments represented single-copy loci across most target diploid genomes. Most of these markers are intron-spanning (70.5%) and linked to legume genetic maps (85.3%). The markers were grouped into four main categories: (1) intron-spanning relatively conserved, (2) intron-spanning diverged, (3) exon-derived conserved, and (4) exon-derived diverged. The extent of sequence divergence within each category indicates that the corresponding markers may be useful for assessing phylogenetic relationships at different, but overlapping, taxonomic levels. We tested marker performance on previously unsampled genomes representing 95 species that span the diversity of the Fabaceae. Phylogenetic analyses support the orthology of the amplified sequences, with the notable exception of an ambiguous placement of Lotus relative to the IRLC and phaseoloid clades.
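The split into relatively conserved and diverged categories rests on pairwise sequence divergence. A minimal sketch of the uncorrected p-distance between two aligned sequences follows; skipping gap (`-`) and `N` sites is an assumption of this sketch, and no category threshold is given because the abstract does not state one.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise distance: differing sites / comparable sites.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        differences += a != b  # True counts as 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```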

5.
6.
Metazoa-level universal single-copy orthologs (mzl-USCOs) are universally applicable markers for DNA taxonomy in animals that can replace or supplement single-gene barcodes. Previously, mzl-USCOs obtained from target enrichment data were shown to distinguish species reliably. Here, we tested whether USCOs are an evenly distributed, representative sample of a given metazoan genome and are therefore able to cope with past hybridization events and incomplete lineage sorting. This is relevant for coalescent-based species delimitation approaches, which critically depend on the assumption that the investigated loci do not exhibit autocorrelation due to physical linkage. Based on 239 chromosome-level genome assemblies, we confirmed that mzl-USCOs are, for practical purposes, genetically unlinked, and that they are a representative sample of the genome in terms of both reciprocal distances between USCOs on a chromosome and distribution across chromosomes. We tested the suitability of genome-extracted mzl-USCOs for species delimitation and phylogeny in four case studies: Anopheles mosquitoes, Drosophila fruit flies, Heliconius butterflies, and Darwin's finches. In almost all instances, USCOs allowed species to be delineated and yielded phylogenies corresponding to those generated from whole-genome data. Our phylogenetic analyses demonstrate that USCOs can complement single-gene DNA barcodes and provide more accurate taxonomic inferences. Combining USCOs from sources that used different versions of ortholog reference libraries to infer marker orthology may be challenging and can, at times, affect taxonomic conclusions. However, we expect this problem to become less severe as the rapidly growing number of reference genomes provides better representation of the number and diversity of organismal lineages.
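Checking that markers are not physically clustered can be sketched by sorting marker positions per chromosome and flagging adjacent pairs closer than a chosen distance. The input layout and the distance cutoff are hypothetical, and physical distance is only a proxy for linkage, since recombination rates vary along chromosomes; this is not the analysis performed in the study.

```python
from collections import defaultdict

def close_marker_pairs(positions, min_distance):
    """Adjacent marker pairs on the same chromosome closer than `min_distance` bp.

    `positions` is an iterable of (marker, chromosome, start) tuples.
    Returns a list of (marker_1, marker_2, distance) tuples.
    """
    by_chrom = defaultdict(list)
    for marker, chrom, start in positions:
        by_chrom[chrom].append((start, marker))
    close = []
    for markers in by_chrom.values():
        markers.sort()  # order markers along the chromosome
        for (s1, m1), (s2, m2) in zip(markers, markers[1:]):
            if s2 - s1 < min_distance:
                close.append((m1, m2, s2 - s1))
    return close
```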

7.
Next-generation sequencing technologies have allowed researchers to determine the collective genomes of microbial communities coexisting in diverse ecological environments. Varying species abundance, length, and complexity across communities, coupled with the discovery of new species, make taxonomic assignment of short DNA sequence reads extremely challenging. We have developed a new sequence composition-based taxonomic classifier for metagenomic analysis, TAC-ELM, based on extreme learning machines. TAC-ELM uses the extreme learning machine framework to learn the weights of a neural network model quickly and accurately. The input features consist of GC content and oligonucleotide frequencies. TAC-ELM is evaluated on two metagenomic benchmarks with read lengths reflecting traditional and current sequencing technologies. Our empirical results show the strength of the approach, which outperforms state-of-the-art taxonomic classifiers in terms of accuracy and implementation complexity. We also evaluate the pervasive case in metagenome analysis in which a species has not been previously sequenced or discovered and therefore does not exist in reference genome databases. TAC-ELM was also combined with BLAST, yielding improved classification results. Code and Supplementary Results: http://www.cs.gmu.edu/~mlbio/TAC-ELM (BSD License).
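A minimal sketch of the general technique, not TAC-ELM's own code (which is available at the URL above): each read becomes a GC-content value plus normalized oligonucleotide (k-mer) frequencies, and an extreme learning machine fixes random input weights and solves for the output weights in one least-squares step with the Moore-Penrose pseudo-inverse. The choice of k = 2, the tanh activation, the hidden-layer size, and the seed are assumptions for illustration. Requires NumPy.

```python
from itertools import product

import numpy as np

def composition_features(read: str, k: int = 2) -> np.ndarray:
    """GC content followed by normalized k-mer (oligonucleotide) frequencies."""
    read = read.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(read) - k + 1):
        j = index.get(read[i:i + k])
        if j is not None:  # skip k-mers containing N or other symbols
            counts[j] += 1
    total = counts.sum()
    freqs = counts / total if total else counts
    gc = (read.count("G") + read.count("C")) / max(len(read), 1)
    return np.concatenate(([gc], freqs))

class ExtremeLearningMachine:
    """Single hidden layer: random fixed input weights, least-squares output weights."""

    def __init__(self, n_hidden: int, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ExtremeLearningMachine":
        self.classes = np.unique(y)
        T = (y[:, None] == self.classes[None, :]).astype(float)  # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights from the pseudo-inverse: no iterative training.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.classes[np.argmax(self._hidden(X) @ self.beta, axis=1)]
```

Because the output weights come from a single linear solve rather than iterative backpropagation, training is fast, which is the property the abstract emphasizes.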

8.
Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller than their nuclear counterparts, so sequencing them completely is much less expensive than sequencing total nuclear genomes, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries in fosmid vectors from total DNA preparations of two heterotrophic and two autotrophic angiosperm species. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, on the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which only limited amounts of tissue are available.
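Choosing a minimum tiling path is an interval-cover problem. One standard approach, sketched here as an assumption rather than the authors' procedure, is the greedy rule: from the current covered frontier, always take the clone that starts at or before it and extends furthest. The clone names and coordinates are hypothetical (the abstract does not say how clone positions were determined), and the genome is treated as linear; a circular plastome would need its wrap-around junction handled separately.

```python
def minimum_tiling_path(clones, genome_length):
    """Greedy minimum set of clones covering positions [0, genome_length).

    `clones` is a list of (name, start, end) tuples with half-open coordinates.
    Returns the chosen clone names in genome order; raises ValueError on a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path, covered, i = [], 0, 0
    while covered < genome_length:
        best = None
        # Among clones starting at or before the frontier, keep the one reaching furthest.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"coverage gap at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path
```

Clones skipped by the scan end at or before the new frontier, so they can never extend coverage; this is why the single pass yields a minimal set.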

9.
Genetic tools are increasingly used to identify and discriminate between species. One key transition in this process was the recognition that the ca. 658 bp fragment of the mitochondrial cytochrome c oxidase I (COI) gene could serve as a barcode region, which revolutionized animal bioidentification and led, among other things, to the establishment of the Barcode of Life Database (BOLD), which currently contains barcodes from >7.9 million specimens. Following this discovery, other organellar regions and markers, along with primers to amplify them, have continually been proposed. Most recently, the field has leapt from PCR-based generation of DNA references to shotgun sequencing-based "genome skimming" alternatives, with the ultimate goal of assembling organellar reference genomes. Unfortunately, genome skimming approaches discard much of the nuclear genome (as much as 99% of the sequence data), which is not only wasteful but can also limit discriminatory power at or below the species level. Here, we advocate that the full shotgun sequence data can be used to assign an identity (which we term, for convenience, a "DNA-mark") to both voucher and query samples, without any computationally intensive pretreatment of reads (e.g., assembly). We argue that if reference databases are populated with such "DNA-marks," genome skimming-based taxonomic identification will be able to complement, or even replace, PCR-based barcoding, and we discuss how such methodology could ultimately enable identification to the population, or even individual, level.
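The abstract does not say how a "DNA-mark" is computed. One assembly-free possibility, sketched here purely as an assumption, is to represent each sample by the set of canonical k-mers in its raw reads and to match a query to the voucher with the highest Jaccard similarity. The function names, the default k = 21, and the toy reads are hypothetical.

```python
def kmer_set(reads, k=21):
    """Canonical k-mers from unassembled reads (hypothetical DNA-mark)."""
    complement = str.maketrans("ACGT", "TGCA")
    acgt = set("ACGT")
    kmers = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= acgt:  # skip k-mers with N or other symbols
                revcomp = kmer.translate(complement)[::-1]
                kmers.add(min(kmer, revcomp))  # strand-independent form
    return kmers

def jaccard(a, b):
    """Shared k-mers over all k-mers in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_match(query_reads, vouchers, k=21):
    """Name of the voucher whose k-mer set is most similar to the query's."""
    query = kmer_set(query_reads, k)
    return max(vouchers, key=lambda name: jaccard(query, kmer_set(vouchers[name], k)))
```

Storing only k-mer sets (or compact sketches of them) in a reference database would avoid keeping or assembling the raw reads, which matches the abstract's emphasis on skipping computationally intensive pretreatment.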

10.
The genus Elymus L. in the tribe Triticeae (Poaceae) includes economically and ecologically important forage grasses. The genus contains the pivotal St genome from Pseudoroegneria in combination with other genomes in the tribe. Many Elymus species are tetraploids containing the StY genomes. Polyploidization is thought to characterize speciation in the genus, in which Y is considered another key genome. Based on data from cytological, genomic in situ hybridization, and molecular studies, we hypothesized an endo-allopolyploid origin of the StY-genome species from autotetraploid Pseudoroegneria species. To test this hypothesis, we amplified, cloned, and sequenced five single-copy nuclear genes (alcohol dehydrogenase 1–3, Adh1–Adh3; RNA polymerase II, Rpb2; and Waxy) from Elymus, Pseudoroegneria, and Hordeum species. The phylogenetic trees constructed from all genes indicated that diploid and autotetraploid Pseudoroegneria species are closely related, although the tetraploids showed considerable genetic variation. In addition, the StY-genome Elymus species tended to be closely related to the diploid and autotetraploid Pseudoroegneria species, although the gene trees differed in the relationships they recovered. These results indicate that the StY-genome species may have an autotetraploid origin and have experienced recurrent hybridization. In the polyploid state, the complex St genomes of Pseudoroegneria may have more opportunities for within-species differentiation and recurrent hybridization. As a result, a series of modified St genomes evolved into the StY genomes of some Elymus species.

11.
We used a unique combination of techniques to sequence the first complete chloroplast genome of a lycophyte, Huperzia lucidula. This plant belongs to a significant clade hypothesized to be the sister group to all other vascular plants. We used fluorescence-activated cell sorting (FACS) to isolate the organelles, rolling circle amplification (RCA) to amplify the genome, and shotgun sequencing to 8× depth of coverage to obtain the complete chloroplast genome sequence. The genome is 154,373 bp long, containing inverted repeats of 15,314 bp each, a large single-copy region of 104,088 bp, and a small single-copy region of 19,657 bp. Its gene order is more similar to those of mosses, liverworts, and hornworts than to those of other vascular plants. For example, the Huperzia chloroplast genome retains the bryophyte gene order for a previously characterized 30 kb inversion, supporting the hypothesis that lycophytes are sister to all other extant vascular plants. The lycophyte chloroplast genome data also enable a better reconstruction of the basal tracheophyte genome, which is useful for inferring relationships among bryophyte lineages. Several unique characters are observed in Huperzia, such as the movement of the ndhF gene from the small single-copy region into the inverted repeat. We present several analyses of evolutionary relationships among land plants using nucleotide data, inferred amino acid sequences, and comparisons of gene arrangements from chloroplast genomes. The results, while still tentative pending the many chloroplast genomes from other key lineages soon to be sequenced, are intriguing in themselves and contribute to a growing comparative database of genomic and morphological data across the green plants.
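Comparing gene arrangements, as for the 30 kb inversion described above, can be sketched as locating the block of genes that appears in reverse order between two genomes. This minimal version assumes gene orders are plain lists of hypothetical gene labels, ignores strand orientation, and handles only a single inversion; real rearrangement analyses typically use signed permutations and multiple events.

```python
def find_single_inversion(reference, query):
    """Locate one inverted block distinguishing two gene orders.

    Returns (start, end) indices, inclusive, of the block in `reference`
    that appears reversed in `query`; None if the orders are identical.
    Raises ValueError if the difference is not one inversion.
    """
    if sorted(reference) != sorted(query):
        raise ValueError("gene orders must contain the same genes")
    if reference == query:
        return None
    i = 0
    while reference[i] == query[i]:  # first mismatch from the left
        i += 1
    j = len(reference) - 1
    while reference[j] == query[j]:  # first mismatch from the right
        j -= 1
    if query[i:j + 1] == reference[i:j + 1][::-1]:
        return i, j
    raise ValueError("difference is not a single inversion")
```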

12.
DNA metabarcoding is a promising approach for rapidly surveying biodiversity and is likely to become an important tool for measuring ecosystem responses to environmental change. Metabarcoding markers need sufficient taxonomic coverage to detect groups of interest and sufficient sequence divergence to resolve species, and ideally they should indicate the relative abundance of the taxa present. We characterized zooplankton assemblages with three different metabarcoding markers (nuclear 18S rDNA, mitochondrial COI, and mitochondrial 16S rDNA) to compare their performance in terms of taxonomic coverage, taxonomic resolution, and correspondence between morphology-based and DNA-based identification. COI amplicons sequenced on separate runs showed that operational taxonomic units (OTUs) representing >0.1% of reads per sample were highly reproducible, although slightly more taxa were detected using a lower annealing temperature. Mitochondrial COI and nuclear 18S showed similar taxonomic coverage across zooplankton phyla. However, COI resolved up to threefold more taxa to species than 18S. All markers revealed similar patterns of beta diversity, although different taxa were identified as the greatest contributors to these patterns for 18S. For calanoid copepod families, all markers displayed a positive relationship between biomass and sequence reads, although the relationship was typically strongest for 18S. The use of COI for metabarcoding has been questioned because of a lack of conserved primer-binding sites. However, our results show that the taxonomic coverage and resolution provided by degenerate COI primers, combined with a comparatively well-developed reference sequence database, make them valuable metabarcoding markers for biodiversity assessment.
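Applying the >0.1% read threshold and comparing OTUs between runs can be sketched as below. The per-sample OTU table layout is assumed, and the overlap measure (OTUs shared by both runs over all OTUs detected in either run) is one simple choice of reproducibility metric, not necessarily the one used in the study.

```python
def filter_otus(otu_table, min_fraction=0.001):
    """Keep OTUs representing more than `min_fraction` of each sample's reads.

    `otu_table` maps sample -> {otu: read_count}. The 0.001 (0.1%) default
    mirrors the threshold in the abstract.
    """
    filtered = {}
    for sample, counts in otu_table.items():
        total = sum(counts.values())
        filtered[sample] = {
            otu: n for otu, n in counts.items() if total and n / total > min_fraction
        }
    return filtered

def shared_fraction(run_a, run_b):
    """OTUs detected in both runs over OTUs detected in either run (one sample)."""
    a, b = set(run_a), set(run_b)
    return len(a & b) / len(a | b) if a | b else 1.0
```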

13.
Transposable elements (TEs), selfish DNA sequences that can move within the genome, comprise a large proportion of the genomes of many organisms. Although low-coverage whole-genome sequencing can be used to survey TE composition, it is uneconomical for species with large genomes. Here, we use restriction-site associated DNA sequencing (RADseq) as an alternative method for surveying TE composition. First, we demonstrate in silico that double-digest restriction-site associated DNA sequencing (ddRADseq) markers contain the same TE compositions as whole-genome assemblies across arthropods. Next, using eight Synalpheus snapping shrimp species with large genomes, we show empirically that TE compositions from ddRADseq and low-coverage whole-genome sequencing are comparable within and across species. Finally, we develop a new bioinformatic pipeline, TERAD, to extract TE compositions from RADseq data. Our study expands the utility of RADseq to the study of the repeatome, making comparative studies of genome structure in species with large genomes more tractable and affordable.
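Comparing TE compositions between ddRADseq and low-coverage whole-genome data can be sketched as computing class proportions from reads that have already been assigned to TE classes and taking the largest difference between them. This is not TERAD; upstream read classification is assumed, proportions are computed among TE-annotated reads only, and the class labels are hypothetical.

```python
from collections import Counter

def te_composition(read_labels):
    """Proportion of TE-annotated reads in each TE class.

    Reads labeled None (no TE hit) are ignored.
    """
    counts = Counter(label for label in read_labels if label is not None)
    total = sum(counts.values())
    return {te: n / total for te, n in counts.items()} if total else {}

def composition_difference(comp_a, comp_b):
    """Largest absolute difference in class proportions between two compositions."""
    classes = set(comp_a) | set(comp_b)
    return max((abs(comp_a.get(c, 0.0) - comp_b.get(c, 0.0)) for c in classes), default=0.0)
```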

14.

Background

Obtaining chloroplast genome sequences is important for increasing knowledge of the fundamental biology of plastids, for understanding evolutionary and ecological processes in plant evolution, for developing biotechnological applications (e.g., plastid engineering), and for improving the efficiency of breeding schemes. Efficient sequencing of chloroplast genomes requires the extraction of pure chloroplast DNA. Unfortunately, most protocols for extracting chloroplast DNA were developed for eudicots and do not yield DNA pure enough for shotgun sequencing of whole plastid genomes from the monocot grasses.

Methodology/Principal Findings

We have developed a simple and inexpensive method for obtaining chloroplast DNA from grass species by modifying and extending protocols optimized for use in eudicots. Many protocols for extracting chloroplast DNA require an ultracentrifugation step to separate chloroplast DNA efficiently from nuclear DNA. The new method uses two more centrifugation steps than previously reported protocols and does not require an ultracentrifuge.

Conclusions/Significance

The described method delivered very high-quality chloroplast DNA from two grass species belonging to highly divergent subfamilies of the grass family (Lolium perenne, Pooideae; Miscanthus × giganteus, Panicoideae). The DNA from Lolium perenne was used for whole chloroplast genome sequencing and SNP detection. The sequence is publicly available in EMBL/GenBank.

15.
With decreasing costs and the availability of many newly developed bioinformatics pipelines, next-generation sequencing (NGS) has revolutionized plant systematics in recent years. Genome skimming has been widely used to obtain the high-copy fractions of genomes, including plastomes, mitochondrial DNA (mtDNA), and nuclear ribosomal DNA (nrDNA). In this study, we used simulations to evaluate the optimal (minimum) sequencing depth for, and performance in, recovering single-copy nuclear genes (SCNs) from genome skimming data, by subsampling genome resequencing data in silico to generate 10 data sets with different sequencing coverage. We tested the performance of four data sets (plastome, nrDNA, mtDNA, and SCNs) obtained from genome skimming in phylogenetic analyses of the Vitis clade at the genus level and of Vitaceae at the family level. Our results showed that the optimal minimum sequencing depth for high-quality SCN assembly via genome skimming was about 10× coverage. Because deep genome skimming (DGS) requires no bait synthesis or enrichment experiments and sequencing costs are now very low, we show that it is as effective for capturing large SCN data sets as the widely used Hyb-Seq approach, while also capturing plastomes, mtDNA, and entire nrDNA repeats. DGS may therefore serve as an efficient and economical alternative to, and may even be superior to, the popular target enrichment/Hyb-Seq approach.
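The subsampling design rests on the coverage relation C = N × L / G (read count times read length, divided by genome size). A minimal sketch, assuming single-end read counts and a hypothetical genome size, computes how many reads each target depth needs and draws a reproducible random subsample.

```python
import math
import random

def reads_for_depth(depth, genome_size, read_length):
    """Reads needed for a target mean depth, from C = N * L / G solved for N."""
    return math.ceil(depth * genome_size / read_length)

def subsample_reads(reads, n, seed=0):
    """Draw n reads without replacement; a fixed seed makes it reproducible."""
    return random.Random(seed).sample(reads, n)
```

For example, `[reads_for_depth(d, genome_size, 150) for d in range(1, 11)]` would give read counts for ten data sets at 1× to 10×; these depths are illustrative, since the abstract does not list the exact coverages simulated.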

16.
Applying microsatellite DNA markers in population genetic studies of the pest moth Helicoverpa armigera is subject to numerous technical problems, such as a high frequency of null alleles, size homoplasy, multiple copies of flanking sequences in the genome, and a lack of robust PCR amplification across populations. To overcome these difficulties, we developed exon-primed intron-crossing (EPIC) nuclear DNA markers for H. armigera based on ribosomal protein (Rp) and DOPA decarboxylase (DDC) genes, and sequenced alleles showing length polymorphisms. Allele length polymorphisms usually arose from random indels (insertions or deletions) within introns, although variation in short dinucleotide repeat units was also detected. Mapping crosses demonstrated Mendelian inheritance of these EPIC markers and the absence of both null alleles and allele 'dropouts'. Three examples of allele size homoplasy due to indels were detected in the EPIC markers RpL3, RpS6, and DDC, while sequencing multiple individuals across 11 randomly selected alleles did not detect indel size homoplasy. The robustness of the EPIC-PCR markers was demonstrated by PCR amplification in the related species H. zea, H. assulta, and H. punctigera.
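The Mendelian check in the mapping crosses can be sketched as testing whether each offspring genotype can be built from one maternal and one paternal allele. Incompatible offspring, such as an apparent homozygote for an allele one parent lacks, are the classic signature of a null allele or allele dropout. The allele sizes are hypothetical.

```python
def mendelian_compatible(mother, father, offspring):
    """True if the offspring genotype has one allele from each parent.

    Genotypes are 2-tuples of allele sizes (e.g. fragment lengths in bp).
    """
    a, b = offspring
    return (a in mother and b in father) or (b in mother and a in father)

def flag_incompatible(mother, father, offspring_list):
    """Offspring genotypes that violate Mendelian inheritance."""
    return [g for g in offspring_list if not mendelian_compatible(mother, father, g)]
```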

17.
Comparative chloroplast genome analyses are mostly carried out at lower taxonomic levels, such as the family and genus levels. At higher taxonomic levels, chloroplast genomes are generally used to reconstruct phylogenies. However, little attention has been paid to chloroplast genome evolution within orders. Here, we present the chloroplast genome of Sedum sarmentosum and take advantage of several available (or newly elucidated) chloroplast genomes to examine chloroplast genome evolution in Saxifragales. The chloroplast genome of S. sarmentosum is 150,448 bp long and comprises an 82,212 bp large single-copy (LSC) region, a 16,670 bp small single-copy (SSC) region, and a pair of 25,783 bp inverted repeats (IRs). The genome contains 131 unique genes, 18 of which are duplicated within the IRs. Based on a comparative analysis of chloroplast genomes from four representative Saxifragales families, we observed two gene losses and two pseudogenes in Paeonia obovata, and the loss of an intron in the rps16 gene of Penthorum chinense. Comparisons of the 72 common protein-coding genes confirmed that the chloroplast genomes of S. sarmentosum and Paeonia obovata exhibit accelerated sequence evolution. Furthermore, a strong correlation was observed between the rates of genome evolution and genome size. The detected genome size variation is predominantly caused by the length of intergenic spacers rather than by losses of genes and introns, gene pseudogenization, or IR expansion or contraction. The genome sizes of these species are negatively correlated with nucleotide substitution rates. Species with shorter life cycles tend to have shorter chloroplast genomes than those with longer life cycles.
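The region lengths reported above should satisfy the quadripartite relation total = LSC + SSC + 2 × IR, which also confirms that the SSC is 16,670 bp long. A one-line check, using only figures from the abstract:

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length with two identical inverted-repeat copies."""
    return lsc + ssc + 2 * ir

# Region lengths (bp) reported for Sedum sarmentosum.
assert quadripartite_length(82_212, 16_670, 25_783) == 150_448
```

The same relation holds for the Huperzia lucidula figures in abstract 11 (104,088 + 19,657 + 2 × 15,314 = 154,373 bp).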

18.
Salamanders (Urodela) have some of the largest vertebrate genomes, ranging in size from 10 to 120 pg. Although changes in genome size often occur randomly and in the absence of selection pressure, nonrandom patterns of genome size variation are evident among specific vertebrate lineages. Several reports suggest a relationship between species richness and genome size, but the exact nature of that relationship remains unclear both within and across taxonomic groups. Here, we report (a) a negative relationship between haploid genome size (C-value) and species richness at the family level in salamander clades; (b) a correlation of C-value and species richness with clade crown age, but not with diversification rates; and (c) strong associations between C-value and both geographic area and climatic-niche rate. Finally, we report a relationship between C-value diversity and species diversity in both family- and genus-level clades of urodeles.

19.
Members of the Calliphoridae (blowflies) are significant for medical and veterinary management, because some species consume living flesh as larvae, and for forensic investigations, because others develop in corpses. Because larval blowflies are difficult to identify accurately to species, there is a need for DNA-based diagnostics for this family; however, the widely used DNA-barcoding marker cox1 has been shown to fail for several groups within the family. Additionally, many phylogenetic relationships within the Calliphoridae, particularly deeper-level relationships, remain unresolved. Sequencing whole mitochondrial (mt) genomes has been shown to be effective both for identifying the most informative diagnostic markers and for resolving phylogenetic relationships. Twenty-seven complete or nearly complete mt genomes were sequenced, representing 13 species, seven genera, and four calliphorid subfamilies, as well as a member of the related family Tachinidae. PCR and sequencing primers developed for one calliphorid species could be reused to sequence related species within the same superfamily with success rates of 61% to 100%, demonstrating how quickly and efficiently an mt genome dataset can be assembled. Comparison of molecular divergence for each of the 13 protein-coding genes and 2 ribosomal RNA genes across a range of taxonomic scales identified novel targets for diagnostic markers that were 117–200% more variable than the markers previously used in calliphorids. Phylogenetic analysis of whole mt genome sequences gave much stronger support for family- and subfamily-level relationships. The Calliphoridae are polyphyletic: the Polleninae are more closely related to the Tachinidae, and the Sarcophagidae are the sister group of the remaining calliphorids.
Within the Calliphoridae, there was strong support for the monophyly of the Chrysomyinae and Luciliinae and for a sister-group relationship between Luciliinae and Calliphorinae. Relationships within Chrysomya were not well resolved. Whole mt genome data supported the previously demonstrated paraphyly of Lucilia cuprina with respect to L. sericata and allowed us to conclude that it is due to hybrid introgression prior to the last common ancestor of modern sericata populations, rather than to recent hybridisation, nuclear pseudogenes, or incomplete lineage sorting.

20.
The complete plastid genome sequence of the American cranberry (Vaccinium macrocarpon Ait.) was reconstructed in silico from next-generation sequencing data. We used Roche 454 shotgun sequence data to isolate plastid-specific sequences of cranberry "HyRed" via homology comparisons with complete sequences from several species available in the National Center for Biotechnology Information database. Eleven cranberry plastid contigs were selected, and the plastid genome was constructed based on homology, on raw reads flowing through the contigs, and on connection information. We assembled and annotated a cranberry plastid genome (82,284 reads; 185× coverage) with a length of 176 kb and the typical structure found in plants, but with several structural rearrangements in the large single-copy region compared with other asterid plastid genomes. To evaluate the reliability of the sequence data, a phylogenetic analysis of 30 species outside the order Ericales (with 54 genes) placed Vaccinium within the clade Asteridae, as reported in other studies using single genes. The cranberry plastid genome sequence will allow the accumulation of critical data useful for breeding and a suite of other genetic studies.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417