Similar Literature
1.
As a fundamental unit in biology, species are used in a wide variety of studies, and their delimitation impacts every subfield of the life sciences. Thus, it is of utmost importance that species are delimited in an accurate and biologically meaningful way. However, because of morphologically similar (cryptic) species and processes such as incomplete lineage sorting, this is far from a trivial task. Here, we examine the accuracy of three recently developed methods that aim to delimit species from multi-locus DNA sequence data without a priori assignment of samples to putative species, and their sensitivity to sampling strategy. Specifically, we simulate data at two species tree depths and a variety of sampling strategies, ranging from five alleles per species and five loci to 20 alleles per species and 100 loci, to test (1) Structurama, (2) Gaussian clustering, and (3) nonparametric delimitation. We find that Structurama accurately delimits even relatively recently diverged (greater than 1.5N generations) species when sampling 10 or more loci. We also find that Gaussian clustering delimits more deeply divergent species (greater than 2.5N generations) relatively well, but is not sensitive enough to delimit more recently diverged species. Finally, we find that nonparametric delimitation performs well with 25 or more loci if gene trees are known without error, but performs poorly with estimated gene genealogies, frequently over-splitting species and mis-assigning samples. We thus suggest that Structurama is a powerful tool for species delimitation. It should be noted, however, that this or any of the other methods tested here may delimit intraspecific population structure. We argue that other methods, such as species delimitation methods requiring a priori putative species assignments (e.g. SpeDeSTEM, Bayesian species delimitation), and other types of data (e.g. morphological, ecological, behavioral) should be incorporated alongside these methods in studies attempting to delimit species.

2.
In this study, molecular data sets were used to address controversies, arising from morphological data sets, about the systematics of strongyloid nematodes of equids. DNA sequences of the first and second internal transcribed spacers (ITS-1 and ITS-2) of ribosomal DNA were determined for 30 species of equine strongyles, and their systematic relationships were reconstructed using phenetic and phylogenetic tree-building methods. The molecular data supported the hypothesis that genera with large subglobular buccal capsules are ancestral to those with small cylindrical buccal capsules, but did not support the current division into the subfamilies Strongylinae and Cyathostominae, or some taxonomic groupings (i.e. generic designations of species) within the Cyathostominea that are based on morphological data. Although not entirely concordant, the current molecular data provide a systematic framework for future studies of equine strongyles, which could be exploited in combination with new, phylogenetically informative morphological data sets.

3.
Accurate species delimitations are of great importance for effectively characterizing biological diversity. Our criteria for delimiting species have changed dramatically over the last decades with the increasing availability of molecular data and improved analytical methods for evaluating these data. Whereas reciprocal monophyly is often seen as an indicator of distinct lineages, recently diverged species often fail to form monophyletic groups. At the same time, cryptic species have repeatedly been detected in numerous organismal groups. In this study, we addressed species delimitation in the crustose lichen-forming fungal genus Diploschistes using multilocus sequence data from specimens representing 16 currently accepted species. Our results indicate the presence of previously undetected, cryptic species-level lineages in the subgenus Limborina. In the subgenus Limborina, samples from different continents currently classified under the same species were shown to be only distantly related. At the same time, in parts of subgen. Diploschistes characterized by short branches, none of the currently accepted species formed monophyletic groups. In spite of the lack of monophyly in phylogenetic reconstructions, a multispecies coalescent method supported eight of the nine accepted species in subgen. Diploschistes as distinct lineages. We propose to reduce D. neutrophilus to synonymy with D. diacapsis and point out that additional sampling will be necessary before accepting additional species in subgen. Limborina.

4.
Begun and Aquadro have demonstrated that levels of nucleotide variation correlate with recombination rate among 20 gene regions from across the genome of Drosophila melanogaster. It has been suggested that this correlation results from genetic hitchhiking associated with the fixation of strongly selected mutants. The hitchhiking process can be described as a series of two-step events. In the first step, a strongly selected substitution wipes out linked variation in a population; this is followed by a recovery period in which polymorphism builds up again via neutral mutation and random genetic drift. Genetic hitchhiking has previously been modeled as a steady-state process driven by recurring selected substitutions. We show here that the characteristic parameter of this steady-state model is αv, the product of selection intensity (α = 2Ns) and the frequency of beneficial mutations v (where N is population size and s is the selective advantage of the favored allele). We also demonstrate that the steady-state model describes the hitchhiking process adequately unless the recombination rate is very low. To estimate αv, we use DNA sequence variation data from 17 D. melanogaster loci in regions of intermediate to high recombination rates. We find that αv is likely to be > 1.3 × 10⁻⁸. Additional data are needed to estimate this parameter more precisely. Estimating αv is important because this parameter determines the shape of the frequency distribution of strongly selected substitutions.

6.
Zhu L, Bustamante CD. Genetics 2005, 170(3):1411-1421
We present a novel composite-likelihood-ratio test (CLRT) for detecting genes and genomic regions that are subject to recurrent natural selection (either positive or negative). The method uses the likelihood functions of Hartl et al. (1994) for inference in a Wright-Fisher genic selection model and corrects for nonindependence among sites by application of coalescent simulations with recombination. Here, we (1) characterize the distribution of the CLRT statistic (Lambda) as a function of the population recombination rate (R=4Ner); (2) explore the effects of bias in estimation of R on the size (type I error) of the CLRT; (3) explore the robustness of the model to population growth, bottlenecks, and migration; (4) explore the power of the CLRT under varying levels of mutation, selection, and recombination; (5) explore the discriminatory power of the test in distinguishing negative selection from population growth; and (6) evaluate the performance of maximum composite-likelihood estimation (MCLE) of the selection coefficient. We find that the test has excellent power to detect weak negative selection and moderate power to detect positive selection. Moreover, the test is quite robust to bias in the estimate of local recombination rate, but not to certain demographic scenarios such as population growth or a recent bottleneck. Last, we demonstrate that the MCLE of the selection parameter has little bias for weak negative selection and has downward bias for positively selected mutations.

7.
Gilbert's Potoroo is Australia's most critically endangered marsupial, known from a single population in the Two Peoples Bay National Park in Western Australia. We present results from a study of genetic variation in microsatellite and mitochondrial DNA. Mean heterozygosity at five microsatellite loci was 49.3%, and the amount of mtDNA variation was extremely low ( = 0.0004). There was evidence for a bottleneck in both sets of markers, and this was consistent with a demographic decline. Effective population size was estimated using two different models of mutation for microsatellites (Ne = 243 and 362). The results from this study highlight concern for the long-term survival of this species.
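The abstract does not name its two microsatellite mutation models; the infinite-alleles model (IAM) and the stepwise mutation model (SMM) are a common pair, so the minimal Python sketch below assumes them. It converts expected heterozygosity He into θ = 4Neμ under each model and then into Ne for an assumed mutation rate μ. The function names, the treatment of the reported 49.3% as expected heterozygosity, and the μ = 10⁻³ value are all hypothetical illustrations, not details taken from the study.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in allele_freqs)


def theta_iam(he):
    """Infinite-alleles model: He = theta / (1 + theta), so theta = He / (1 - He)."""
    return he / (1.0 - he)


def theta_smm(he):
    """Stepwise mutation model: He = 1 - 1 / sqrt(1 + 2*theta),
    so theta = ((1 / (1 - He))**2 - 1) / 2."""
    return ((1.0 / (1.0 - he)) ** 2 - 1.0) / 2.0


def effective_size(theta, mu):
    """theta = 4 * Ne * mu for a diploid locus, so Ne = theta / (4 * mu)."""
    return theta / (4.0 * mu)


# Hypothetical inputs: the mean heterozygosity from the abstract (treated here
# as expected heterozygosity) and an ASSUMED mutation rate per locus per generation.
he = 0.493
mu = 1e-3
print(round(effective_size(theta_iam(he), mu)))  # -> 243
print(round(effective_size(theta_smm(he), mu)))  # -> 361
```

For any given He the SMM yields the larger θ, and hence the larger Ne: writing x = 1/(1 - He), θ_IAM = x - 1 while θ_SMM = (x - 1)(x + 1)/2, and (x + 1)/2 > 1 whenever He > 0.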

8.
Clues to our evolutionary history lie hidden within DNA sequence data. One of the great challenges facing population geneticists is to identify and accurately interpret these clues. This task is made especially difficult by the fact that many different evolutionary processes can lead to similar observations. For example, low levels of polymorphism within a region can be explained by a low local mutation rate, by selection having eliminated deleterious mutations, or by the recent spread to fixation of a beneficial allele. Theoretical advances improve our ability to distinguish signals left by different evolutionary processes. In particular, a new test might better detect the footprint of selection having favored the spread of a beneficial allele.

9.
Accuracy of phylogenetic trees estimated from DNA sequence data
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
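The abstract contrasts trees built from corrected (d) and uncorrected (p) distances. As a minimal sketch, the Python below computes p, a Jukes-Cantor corrected d (one common correction; the abstract does not say which correction the study used), and builds a rooted topology with UPGMA, the simplest of the four methods compared. The helper names and the four toy sequences are hypothetical; this is not the study's simulation code.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Uncorrected distance: proportion of aligned sites that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc_distance(a, b):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def upgma(seqs, distance=p_distance):
    """Return a rooted UPGMA topology as nested tuples of sequence names."""
    # Active clusters: label -> (subtree, number of leaves in it).
    clusters = {name: (name, 1) for name in seqs}
    d = {frozenset(pair): distance(seqs[pair[0]], seqs[pair[1]])
         for pair in combinations(seqs, 2)}
    while len(clusters) > 1:
        # Join the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        merged = f"({a},{b})"
        # Distance to the new cluster is the arithmetic mean over all leaf
        # pairs, i.e. the leaf-count-weighted average of the two old distances.
        for c in clusters:
            d[frozenset((merged, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    (tree, _), = clusters.values()
    return tree


seqs = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAC",
    "C": "CCCAAAAAAA",
    "D": "CCCAAAAACC",
}
print(upgma(seqs))               # -> (('A', 'B'), ('C', 'D'))
print(upgma(seqs, jc_distance))  # -> (('A', 'B'), ('C', 'D'))
```

Because UPGMA always produces a rooted tree in which every leaf is equidistant from the root, it implicitly assumes a constant substitution rate, which fits the abstract's finding that it falls behind FM, DW, and MF when rates vary among lineages.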

10.
Metagenomic studies sequence DNA directly from environmental samples to explore the structure and function of complex microbial and viral communities. Individual, short pieces of sequenced DNA (“reads”) are classified into (putative) taxonomic or metabolic groups which are analyzed for patterns across samples. Analysis of such read matrices is at the core of using metagenomic data to make inferences about ecosystem structure and function. Non-negative matrix factorization (NMF) is a numerical technique for approximating high-dimensional data points as positive linear combinations of positive components. It is thus well suited to interpretation of observed samples as combinations of different components. We develop, test and apply an NMF-based framework to analyze metagenomic read matrices. In particular, we introduce a method for choosing NMF degree in the presence of overlap, and apply spectral-reordering techniques to NMF-based similarity matrices to aid visualization. We show that our method can robustly identify the appropriate degree and disentangle overlapping contributions using synthetic data sets. We then examine and discuss the NMF decomposition of a metabolic profile matrix extracted from 39 publicly available metagenomic samples, and identify canonical sample types, including one associated with coral ecosystems, one associated with highly saline ecosystems and others. We also identify specific associations between pathways and canonical environments, and explore how alternative choices of decompositions facilitate analysis of read matrices at a finer scale.
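For readers unfamiliar with NMF, the sketch below factors a non-negative matrix X (for example, metabolic categories × samples) into W @ H with both factors non-negative, using the standard Lee and Seung multiplicative updates for squared (Frobenius) error. It is a generic textbook NMF in Python/NumPy, not the authors' framework: it does not implement their method for choosing the degree k, and the toy matrix is invented.

```python
import numpy as np


def nmf(X, k, n_iter=2000, eps=1e-10, seed=0):
    """Approximate non-negative X (n x m) as W @ H with W (n x k), H (k x m) >= 0.

    Lee-Seung multiplicative updates keep both factors non-negative and
    (up to the small eps guard) do not increase ||X - W @ H||.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy "read matrix": 6 categories x 5 samples built from 2 hidden components.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 5))
W, H = nmf(X, k=2)
# Each column of H holds one sample's weights on the k components.
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Here k is fixed at 2 only because the toy matrix was built from two components; choosing k for real data with overlapping components is the problem the abstract says the authors address with their own method.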

11.
Construction of a plant DNA barcoding and biodiversity data-sharing platform
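DNA barcoding enables rapid and accurate species identification based on short DNA sequences. It not only accelerates the identification and classification of species worldwide, but also provides new ideas and research methods for the management, conservation, and sustainable use of biodiversity. Continued improvement of standard plant DNA barcode databases will make rapid acquisition of plant diversity information possible; integrating, sharing, and using different types of data resources to build a plant DNA barcode data-sharing platform is an important foundation for meeting the public's need for accurate species identification and rapid recognition. This paper reviews recent progress in plant DNA barcoding research, as well as the current state of development of plant DNA barcode reference databases and their existing problems. In light of these problems, and against the background of the "big data" era, we offer some proposals on how to manage and use massive amounts of plant information and how to build a data-sharing platform: (1) the platform's metadata should be as detailed, rich, accurate, and interlinked as possible; (2) data standards should be unified and consistent; (3) query entry points should be convenient, fast, and varied, and easy to manage, so as to enable broader data sharing and global cooperation and exchange.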

12.

Background

DNA methylation has been found to be widely associated with complex diseases. Among the platforms for profiling DNA methylation in humans, the Illumina Infinium HumanMethylation450 BeadChip (450K) has been accepted as one of the most efficient technologies. However, analyzing the DNA methylation data generated by this technology is challenging because of widespread biases.

Results

Here we propose a generalized framework for evaluating data analysis methods for the Illumina 450K array. This framework comprises the following steps towards a successful analysis: importing data, quality control, within-array normalization, correcting type bias, detecting differentially methylated probes or regions, and biological interpretation.

Conclusions

We evaluated five methods using three real datasets and recommend the best-performing methods for Illumina 450K array data analysis. Minfi and methylumi are the optimal choices for analyzing small datasets. BMIQ and RCP are appropriate for correcting type bias, and their normalized results can be used to discover differentially methylated probes (DMPs). The R package missMethyl is suitable for GO term enrichment analysis and biological interpretation.

13.
MOTIVATION: Maximum-likelihood methods for solving the consensus sequence identification (CSI) problem on DNA sequences may find only a local optimum rather than the global optimum. Additionally, such methods do not allow logical constraints to be imposed on their models. This study develops a linear programming technique to solve CSI problems by finding an optimum consensus sequence. This method is computationally more efficient and is guaranteed to reach the global optimum. The developed method can also be extended to treat more complicated CSI problems with ambiguous conserved patterns. RESULTS: A CSI problem is first formulated as a non-linear mixed 0-1 optimization program, which is then converted into a linear mixed 0-1 program. The proposed method provides the following advantages over maximum-likelihood methods: (1) It is guaranteed to find the global optimum. (2) It can embed various logical constraints into the corresponding model. (3) It is applicable to problems with many long sequences. (4) It can find the second and the third best solutions. An extension of the proposed linear mixed 0-1 program is also designed to solve CSI problems with an unknown spacer length between conserved regions. Two examples, searching for CRP-binding sites and for FNR-binding sites in the Escherichia coli genome, are used to illustrate and test the proposed method. AVAILABILITY: A software package, Global Site Seer, for the Microsoft Windows operating system is available at http://www.iim.nctu.edu.tw/~cjfu/gss.htm
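To make the CSI objective concrete, the sketch below scores a candidate consensus by summing, over all sequences, the largest number of matching positions in any window of that sequence, and finds the global optimum by enumerating all 4^L candidates. This brute-force search is a swapped-in illustration, not the paper's linear mixed 0-1 program, and the scoring rule (position matches, one site per sequence) and the toy sequences are assumptions. It is only feasible for short motifs, which is why a formulation that scales to many long sequences matters.

```python
from itertools import product


def best_window_matches(seq, motif):
    """Largest number of positions at which motif matches any window of seq.
    Assumes len(seq) >= len(motif)."""
    L = len(motif)
    return max(
        sum(a == b for a, b in zip(seq[i:i + L], motif))
        for i in range(len(seq) - L + 1)
    )


def exhaustive_consensus(seqs, L, alphabet="ACGT"):
    """Return the length-L string with the highest total score, searching all
    len(alphabet)**L candidates (ties go to the first in alphabet order)."""
    candidates = ("".join(c) for c in product(alphabet, repeat=L))
    return max(candidates, key=lambda m: sum(best_window_matches(s, m) for s in seqs))


# Hypothetical toy sequences, each hiding the same 5-mer.
seqs = ["AATGTGACC", "CTGTGAAAA", "GGGTGTGAT"]
print(exhaustive_consensus(seqs, 5))  # -> TGTGA
```

Enumeration guarantees the global optimum but costs 4^L score evaluations, so doubling the motif length squares the work; an integer-programming formulation trades this for a solver that can prune the search.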

14.

Background

Insertion sequences (IS) are small transposable elements commonly found in bacterial genomes. Identifying the location of IS in bacterial genomes can be useful for a variety of purposes, including epidemiological tracking and predicting antibiotic resistance. However, IS are commonly present in multiple copies in a single genome, which complicates genome assembly and the identification of IS insertion sites. Here we present ISMapper, a mapping-based tool for identifying the site and orientation of IS insertions in bacterial genomes directly from paired-end short read data.

Results

ISMapper was validated using three types of short read data: (i) simulated reads from a variety of species, (ii) Illumina reads from 5 isolates for which finished genome sequences were available for comparison, and (iii) Illumina reads from 7 Acinetobacter baumannii isolates for which predicted IS locations were tested using PCR. A total of 20 genomes, including 13 species and 32 distinct IS, were used for validation. ISMapper correctly identified 97% of known IS insertions in the analysis of simulated reads, and 98% in real Illumina reads. Subsampling of real Illumina reads to lower depths indicated that ISMapper was able to correctly detect insertions at average genome-wide read depths >20x, although read depths >50x were required to obtain confident calls that were highly supported by evidence from reads. All ISAba1 insertions identified by ISMapper in the A. baumannii genomes were confirmed by PCR. In each A. baumannii genome, ISMapper successfully identified an IS insertion upstream of the ampC beta-lactamase that could explain phenotypic resistance to third-generation cephalosporins. The utility of ISMapper was further demonstrated by profiling genome-wide IS6110 insertions in 138 publicly available Mycobacterium tuberculosis genomes, revealing lineage-specific insertions and multiple insertion hotspots.

Conclusions

ISMapper provides a rapid and robust method for identifying IS insertion sites directly from short read data, with a high degree of accuracy demonstrated across a wide range of bacteria.

15.
Thornton K, Andolfatto P. Genetics 2006, 172(3):1607-1619
Genome-wide nucleotide variation in non-African populations of Drosophila melanogaster is a subset of variation found in East sub-Saharan African populations, suggesting a bottleneck in the history of the former. We implement an approximate Bayesian approach to infer the timing, duration, and severity of this putative bottleneck and ask whether this inferred model is sufficient to account for patterns of variability observed at 115 loci scattered across the X chromosome. We estimate a recent bottleneck 0.006Ne generations ago, somewhat further in the past than suggested by biogeographical evidence. Using various proposed statistical tests, we find that this bottleneck model is able to predict the majority of observed features of diversity and linkage disequilibrium in the data. Thus, while precise estimates of bottleneck parameters (like inferences of selection) are sensitive to model assumptions, our results imply that it may be unnecessary to invoke frequent selective sweeps associated with the dispersal of D. melanogaster from Africa to explain patterns of variability in non-African populations.

16.
Current methods for detection of copy number variants (CNV) and aberrations (CNA) from targeted sequencing data are based on the depth of coverage of captured exons. Accurate CNA determination is complicated by uneven genomic distribution and non-uniform capture efficiency of targeted exons. Here we present CopywriteR, which circumvents these problems by exploiting ‘off-target’ sequence reads. CopywriteR allows uniformly distributed copy number information to be extracted, can be used without a reference, and can be applied to sequencing data obtained from various techniques, including chromatin immunoprecipitation and target enrichment on small gene panels. CopywriteR outperforms existing methods and constitutes a widely applicable alternative to available tools.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0617-1) contains supplementary material, which is available to authorized users.

19.
The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%–5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then validate its organization and distribution experimentally. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model of their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages.
