Found 20 similar articles (search time: 0 ms)
1.
As a fundamental unit in biology, species are used in a wide variety of studies, and their delimitation impacts every subfield of the life sciences. Thus, it is of utmost importance that species are delimited in an accurate and biologically meaningful way. However, due to morphologically similar, cryptic species, and processes such as incomplete lineage sorting, this is far from a trivial task. Here, we examine the accuracy and sensitivity to sampling strategy of three recently developed methods that aim to delimit species from multi-locus DNA sequence data without a priori assignments of samples to putative species. Specifically, we simulate data at two species tree depths and a variety of sampling strategies ranging from five alleles per species and five loci to 20 alleles per species and 100 loci to test (1) Structurama, (2) Gaussian clustering, and (3) nonparametric delimitation. We find that Structurama accurately delimits even relatively recently diverged (greater than 1.5N generations) species when sampling 10 or more loci. We also find that Gaussian clustering delimits more deeply divergent species (greater than 2.5N generations) relatively well, but is not sufficiently sensitive to delimit more recently diverged species. Finally, we find that nonparametric delimitation performs well with 25 or more loci if gene trees are known without error, but performs poorly with estimated gene genealogies, frequently over-splitting species and mis-assigning samples. We thus suggest that Structurama represents a powerful tool for use in species delimitation. It should be noted, however, that intraspecific population structure may be delimited using this or any of the methods tested herein. We argue that other methods, such as other species delimitation methods requiring a priori putative species assignments (e.g. SpeDeSTEM, Bayesian species delimitation), and other types of data (e.g. morphological, ecological, behavioral) be incorporated in conjunction with these methods in studies attempting to delimit species.
2.
A molecular systematic framework for equine strongyles based on ribosomal DNA sequence data (total citations: 1; self-citations: 0; citations by others: 1)
In this study, molecular data sets were used to address the controversies relating to the systematics of strongyloid nematodes of equids utilising morphological data sets. DNA sequences of the first and second internal transcribed spacers (ITS-1 and ITS-2) of ribosomal DNA were determined for 30 species of equine strongyles and the systematic relationships reconstructed using phenetic and phylogenetic tree-building methods. The molecular data provided support for the hypothesis that the genera with large subglobular buccal capsules are ancestral to those with small cylindrical buccal capsules, but did not provide support for the current division of the subfamilies Strongylinae and Cyathostominae or for some taxonomic groupings (i.e. generic designations of species) within the Cyathostominea based on morphological data. Although not entirely concordant, the current molecular data provide a systematic framework for future studies of equine strongyles, which could be exploited in combination with new, phylogenetically informative morphological data sets.
3.
Xin Zhao Samantha Fernández-Brime Mats Wedin Marissa Locke Steven D. Leavitt H. Thorsten Lumbsch 《Organisms Diversity & Evolution》2017,17(2):351-363
Accurate species delimitations are of great importance for effectively characterizing biological diversity. Our criteria for delimiting species have changed dramatically over the last decades with the increasing availability of molecular data and improvement of analytical methods to evaluate these data. Whereas reciprocal monophyly is often seen as an indicator for the presence of distinct lineages, recently diverged species often fail to form monophyletic groups. At the same time, cryptic species have repeatedly been detected in numerous organismal groups. In this study, we addressed the species delimitation in the crustose lichen-forming fungal genus Diploschistes using multilocus sequence data from specimens representing 16 currently accepted species. Our results indicate the presence of previously undetected, cryptic species-level lineages in the subgenus Limborina. In the subgenus Limborina, samples from different continents currently classified under the same species were shown to be only distantly related. At the same time, in parts of subgen. Diploschistes characterized by short branches, none of the currently accepted species formed monophyletic groups. In spite of the lack of monophyly in phylogenetic reconstructions, a multispecies coalescent method provided support for eight of the nine accepted species in subgen. Diploschistes as distinct lineages. We propose to reduce D. neutrophilus to synonymy with D. diacapsis and point out that additional sampling will be necessary before accepting additional species in subgen. Limborina.
4.
Analysis of a genetic hitchhiking model, and its application to DNA polymorphism data from Drosophila melanogaster (total citations: 6; self-citations: 0; citations by others: 6)
Begun and Aquadro have demonstrated that levels of nucleotide variation
correlate with recombination rate among 20 gene regions from across the
genome of Drosophila melanogaster. It has been suggested that this
correlation results from genetic hitchhiking associated with the fixation
of strongly selected mutants. The hitchhiking process can be described as a
series of two-step events. The first step consists of a strongly selected
substitution wiping out linked variation in a population; this is followed
by a recovery period in which polymorphism can build up via neutral
mutations and random genetic drift. Genetic hitchhiking has previously been
modeled as a steady-state process driven by recurring selected
substitutions. We show here that the characteristic parameter of this
steady-state model is alpha v, the product of selection intensity (alpha =
2Ns) and the frequency of beneficial mutations v (where N is population
size and s is the selective advantage of the favored allele). We also
demonstrate that the steady-state model describes the hitchhiking process
adequately, unless the recombination rate is very low. To estimate alpha v,
we use the data of DNA sequence variation from 17 D. melanogaster loci from
regions of intermediate to high recombination rates. We find that alpha v
is likely to be > 1.3 × 10^-8. Additional data are needed to estimate
this parameter more precisely. The estimation of alpha v is important, as
this parameter determines the shape of the frequency distribution of
strongly selected substitutions.
5.
6.
A composite-likelihood approach for detecting directional selection from DNA sequence data
We present a novel composite-likelihood-ratio test (CLRT) for detecting genes and genomic regions that are subject to recurrent natural selection (either positive or negative). The method uses the likelihood functions of Hartl et al. (1994) for inference in a Wright-Fisher genic selection model and corrects for nonindependence among sites by application of coalescent simulations with recombination. Here, we (1) characterize the distribution of the CLRT statistic (Lambda) as a function of the population recombination rate (R=4Ner); (2) explore the effects of bias in estimation of R on the size (type I error) of the CLRT; (3) explore the robustness of the model to population growth, bottlenecks, and migration; (4) explore the power of the CLRT under varying levels of mutation, selection, and recombination; (5) explore the discriminatory power of the test in distinguishing negative selection from population growth; and (6) evaluate the performance of maximum composite-likelihood estimation (MCLE) of the selection coefficient. We find that the test has excellent power to detect weak negative selection and moderate power to detect positive selection. Moreover, the test is quite robust to bias in the estimate of local recombination rate, but not to certain demographic scenarios such as population growth or a recent bottleneck. Last, we demonstrate that the MCLE of the selection parameter has little bias for weak negative selection and has downward bias for positively selected mutations.
7.
Elizabeth A. Sinclair Brian Costello Jacqueline M. Courtenay Keith A. Crandall 《Conservation Genetics》2002,3(2):191-196
Gilbert's Potoroo is Australia's most critically endangered marsupial, known from a single population in the Two Peoples Bay National Park in Western Australia. We present results from a study of genetic variation in microsatellite and mitochondrial DNA. Mean heterozygosity at five microsatellite loci was 49.3%, and the amount of mtDNA variation was extremely low ( = 0.0004). There was evidence for a bottleneck in both sets of markers, and this was consistent with a demographic decline. Effective population size was estimated using two different models of mutation for microsatellites (Ne = 243 and 362). The results from this study highlight the concern for the long-term survival of this species.
8.
Otto SP 《Trends in genetics : TIG》2000,16(12):526-529
Clues to our evolutionary history lie hidden within DNA sequence data. One of the great challenges facing population geneticists is to identify and accurately interpret these clues. This task is made especially difficult by the fact that many different evolutionary processes can lead to similar observations. For example, low levels of polymorphism within a region can be explained by a low local mutation rate, by selection having eliminated deleterious mutations, or by the recent spread to fixation of a beneficial allele. Theoretical advances improve our ability to distinguish signals left by different evolutionary processes. In particular, a new test might better detect the footprint of selection having favored the spread of a beneficial allele.
9.
Accuracy of phylogenetic trees estimated from DNA sequence data (total citations: 4; self-citations: 1; citations by others: 3)
The relative merits of four different tree-making methods in obtaining the
correct topology were studied by using computer simulation. The methods
studied were the unweighted pair-group method with arithmetic mean (UPGMA),
Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and
Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was
assumed to evolve into eight sequences following a given model tree. Both
constant and varying rates of nucleotide substitution were considered. Once
the DNA sequences for the eight extant species were obtained, phylogenetic
trees were constructed by using corrected (d) and uncorrected (p)
nucleotide substitutions per site. The topologies of the trees obtained
were then compared with that of the model tree. The results obtained can be
summarized as follows: (1) The probability of obtaining the correct rooted
or unrooted tree is low unless a large number of nucleotide differences
exists between different sequences. (2) When the number of nucleotide
substitutions per sequence is small or moderately large, the FM, DW, and MF
methods show a better performance than UPGMA in recovering the correct
topology. The former group of methods is particularly good for obtaining
the correct unrooted tree. (3) When the number of substitutions per
sequence is large, UPGMA is at least as good as the other methods,
particularly for obtaining the correct rooted tree. (4) When the rate of
nucleotide substitution varies with evolutionary lineage, the FM, DW, and
MF methods show a better performance in obtaining the correct topology than
UPGMA, except when a rooted tree is to be produced from data with a large
number of nucleotide substitutions per sequence. (ABSTRACT TRUNCATED AT 250
WORDS)
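The distinction between uncorrected (p) and corrected (d) substitutions per site in the abstract above can be illustrated with a minimal sketch. The abstract does not say which correction was applied, so the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3) is used here purely as an assumption; the function names and the two short sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Uncorrected distance p: proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jukes_cantor_distance(p):
    """Corrected distance d under the Jukes-Cantor model (an assumed choice).

    The correction is undefined once p reaches 0.75, the saturation point
    for four equally frequent nucleotides.
    """
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned sequences, for illustration only.
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTTCGTAA"
p = p_distance(seq_1, seq_2)
d = jukes_cantor_distance(p)
print(f"p = {p:.3f}, d = {d:.3f}")
```

Because the correction accounts for multiple substitutions at the same site, d is always at least as large as p, and the gap widens as sequences diverge.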
10.
Metagenomic studies sequence DNA directly from environmental samples to explore the structure and function of complex microbial
and viral communities. Individual, short pieces of sequenced DNA (“reads”) are classified into (putative) taxonomic or metabolic
groups which are analyzed for patterns across samples. Analysis of such read matrices is at the core of using metagenomic
data to make inferences about ecosystem structure and function. Non-negative matrix factorization (NMF) is a numerical technique
for approximating high-dimensional data points as positive linear combinations of positive components. It is thus well suited
to interpretation of observed samples as combinations of different components. We develop, test and apply an NMF-based framework
to analyze metagenomic read matrices. In particular, we introduce a method for choosing NMF degree in the presence of overlap,
and apply spectral-reordering techniques to NMF-based similarity matrices to aid visualization. We show that our method can
robustly identify the appropriate degree and disentangle overlapping contributions using synthetic data sets. We then examine
and discuss the NMF decomposition of a metabolic profile matrix extracted from 39 publicly available metagenomic samples,
and identify canonical sample types, including one associated with coral ecosystems, one associated with highly saline ecosystems
and others. We also identify specific associations between pathways and canonical environments, and explore how alternative
choices of decompositions facilitate analysis of read matrices at a finer scale.
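As a rough illustration of the factorization described above, the following sketch approximates a small non-negative matrix V as the product W·H using the standard multiplicative update rules for the Frobenius-norm objective. It is not the authors' framework: the degree-selection and spectral-reordering steps are not reproduced, and the function names, the toy matrix, and the choice of rank 2 are all assumptions made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - W·H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw)) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Random positive starting factors; eps keeps entries away from exact zero.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # Update H: H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[i][j] * numer[i][j] / (denom[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Update W: W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][j] * numer[i][j] / (denom[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical read matrix: rows are functional groups, columns are samples.
counts = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 1.0, 4.0, 3.0],
]
w, h = nmf(counts, rank=2)
print("reconstruction error:", frobenius_error(counts, w, h))
```

The multiplicative form keeps every entry of W and H non-negative without an explicit projection step, which is what allows each sample to be read as an additive mixture of components.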
11.
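Construction of a plant DNA barcode and biodiversity data-sharing platform (total citations: 1; self-citations: 0; citations by others: 1)
DNA barcoding identifies species rapidly and accurately from short DNA sequences. It has not only accelerated the identification and classification of species worldwide but has also offered new ideas and research methods for the management, conservation, and sustainable use of biodiversity. As the standard plant DNA barcode database continues to improve, rapid access to plant diversity information will become possible. Integrating, sharing, and using different types of data resources to build a plant DNA barcode data-sharing platform is an important means of meeting the public's need for accurate species identification and rapid recognition. This paper reviews recent research progress on plant DNA barcoding, along with the current state of development of plant DNA barcode reference databases and their existing problems. In light of these problems, and against the background of the "big data" era, we offer some ideas on how to manage and make good use of massive amounts of plant information and how to build a data-sharing platform: (1) the platform's metadata should be as detailed, rich, accurate, and cross-linked as possible; (2) data standards should be unified and consistent; (3) query entry points should be convenient, fast, and diverse, and easy to manage, so as to enable greater data sharing and global cooperation and exchange.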
12.
Background
DNA methylation has been found to be widely associated with complex diseases. Among platforms for profiling DNA methylation in humans, the Illumina Infinium HumanMethylation450 BeadChip (450K) has been accepted as one of the most efficient technologies. However, analyzing DNA methylation data generated by this technology is challenging because of widespread biases.
Results
Here we propose a generalized framework for evaluating data analysis methods for the Illumina 450K array. The framework considers the following steps towards a successful analysis: importing data, quality control, within-array normalization, correcting type bias, detecting differentially methylated probes or regions, and biological interpretation.
Conclusions
We evaluated five methods using three real datasets and identified the best-performing methods for Illumina 450K array data analysis. Minfi and methylumi are the optimal choices when analyzing small datasets. BMIQ and RCP are suitable for correcting type bias, and their normalized results can be used to discover DMPs. The R package missMethyl is suitable for GO term enrichment analysis and biological interpretation.
13.
MOTIVATION: Maximum-likelihood methods for solving the consensus sequence identification (CSI) problem on DNA sequences may only find a local optimum rather than the global optimum. Additionally, such methods do not allow logical constraints to be imposed on their models. This study develops a linear programming technique to solve CSI problems by finding an optimum consensus sequence. This method is computationally more efficient and is guaranteed to reach the global optimum. The developed method can also be extended to treat more complicated CSI problems with ambiguous conserved patterns. RESULTS: A CSI problem is first formulated as a non-linear mixed 0-1 optimization program, which is then converted into a linear mixed 0-1 program. The proposed method provides the following advantages over maximum-likelihood methods: (1) It is guaranteed to find the global optimum. (2) It can embed various logical constraints into the corresponding model. (3) It is applicable to problems with many long sequences. (4) It can find the second and the third best solutions. An extension of the proposed linear mixed 0-1 program is also designed to solve CSI problems with an unknown spacer length between conserved regions. Two examples of searching for CRP-binding sites and for FNR-binding sites in the Escherichia coli genome are used to illustrate and test the proposed method. AVAILABILITY: A software package, Global Site Seer, for the Microsoft Windows operating system is available at http://www.iim.nctu.edu.tw/~cjfu/gss.htm
14.
ISMapper: identifying transposase insertion sites in bacterial genomes from short read sequence data
Jane Hawkey Mohammad Hamidian Ryan R. Wick David J. Edwards Helen Billman-Jacobe Ruth M. Hall Kathryn E. Holt 《BMC genomics》2015,16(1)
Background
Insertion sequences (IS) are small transposable elements, commonly found in bacterial genomes. Identifying the location of IS in bacterial genomes can be useful for a variety of purposes including epidemiological tracking and predicting antibiotic resistance. However, IS are commonly present in multiple copies in a single genome, which complicates genome assembly and the identification of IS insertion sites. Here we present ISMapper, a mapping-based tool for identification of the site and orientation of IS insertions in bacterial genomes, directly from paired-end short read data.
Results
ISMapper was validated using three types of short read data: (i) simulated reads from a variety of species, (ii) Illumina reads from 5 isolates for which finished genome sequences were available for comparison, and (iii) Illumina reads from 7 Acinetobacter baumannii isolates for which predicted IS locations were tested using PCR. A total of 20 genomes, including 13 species and 32 distinct IS, were used for validation. ISMapper correctly identified 97% of known IS insertions in the analysis of simulated reads, and 98% in real Illumina reads. Subsampling of real Illumina reads to lower depths indicated ISMapper was able to correctly detect insertions for average genome-wide read depths >20x, although read depths >50x were required to obtain confident calls that were highly supported by evidence from reads. All ISAba1 insertions identified by ISMapper in the A. baumannii genomes were confirmed by PCR. In each A. baumannii genome, ISMapper successfully identified an IS insertion upstream of the ampC beta-lactamase that could explain phenotypic resistance to third-generation cephalosporins. The utility of ISMapper was further demonstrated by profiling genome-wide IS6110 insertions in 138 publicly available Mycobacterium tuberculosis genomes, revealing lineage-specific insertions and multiple insertion hotspots.
Conclusions
ISMapper provides a rapid and robust method for identifying IS insertion sites directly from short read data, with a high degree of accuracy demonstrated across a wide range of bacteria.
15.
Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster
Genome-wide nucleotide variation in non-African populations of Drosophila melanogaster is a subset of variation found in East sub-Saharan African populations, suggesting a bottleneck in the history of the former. We implement an approximate Bayesian approach to infer the timing, duration, and severity of this putative bottleneck and ask whether this inferred model is sufficient to account for patterns of variability observed at 115 loci scattered across the X chromosome. We estimate a recent bottleneck 0.006N(e) generations ago, somewhat further in the past than suggested by biogeographical evidence. Using various proposed statistical tests, we find that this bottleneck model is able to predict the majority of observed features of diversity and linkage disequilibrium in the data. Thus, while precise estimates of bottleneck parameters (like inferences of selection) are sensitive to model assumptions, our results imply that it may be unnecessary to invoke frequent selective sweeps associated with the dispersal of D. melanogaster from Africa to explain patterns of variability in non-African populations.
16.
Thomas Kuilman Arno Velds Kristel Kemper Marco Ranzani Lorenzo Bombardelli Marlous Hoogstraat Ekaterina Nevedomskaya Guotai Xu Julian de Ruiter Martijn P Lolkema Bauke Ylstra Jos Jonkers Sven Rottenberg Lodewyk F Wessels David J Adams Daniel S Peeper Oscar Krijgsman 《Genome biology》2015,16(1)
Current methods for detection of copy number variants (CNV) and aberrations (CNA) from targeted sequencing data are based on the depth of coverage of captured exons. Accurate CNA determination is complicated by uneven genomic distribution and non-uniform capture efficiency of targeted exons. Here we present CopywriteR, which eludes these problems by exploiting ‘off-target’ sequence reads. CopywriteR allows for extracting uniformly distributed copy number information, can be used without reference, and can be applied to sequencing data obtained from various techniques including chromatin immunoprecipitation and target enrichment on small gene panels. CopywriteR outperforms existing methods and constitutes a widely applicable alternative to available tools.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0617-1) contains supplementary material, which is available to authorized users.
17.
18.
19.
Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data
Alkan C Ventura M Archidiacono N Rocchi M Sahinalp SC Eichler EE 《PLoS computational biology》2007,3(9):1807-1818
The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%–5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.