Similar Articles
1.
Current methods for detection of copy number variants (CNV) and aberrations (CNA) from targeted sequencing data are based on the depth of coverage of captured exons. Accurate CNA determination is complicated by uneven genomic distribution and non-uniform capture efficiency of targeted exons. Here we present CopywriteR, which eludes these problems by exploiting ‘off-target’ sequence reads. CopywriteR allows for extracting uniformly distributed copy number information, can be used without reference, and can be applied to sequencing data obtained from various techniques including chromatin immunoprecipitation and target enrichment on small gene panels. CopywriteR outperforms existing methods and constitutes a widely applicable alternative to available tools.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0617-1) contains supplementary material, which is available to authorized users.
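The off-target idea can be illustrated with a generic read-depth sketch. It assumes reads have been reduced to start coordinates on a single chromosome, that capture targets are half-open (start, end) intervals, and that a matched reference sample is available. The function names, bin size, padding, and pseudocount are hypothetical choices for illustration; this is not CopywriteR's actual implementation, and it does not attempt the reference-free mode the abstract mentions.

```python
import math

def off_target_reads(read_starts, targets, padding=0):
    """Keep only reads whose start lies outside every (start, end) capture target,
    each target widened by `padding` bases on both sides."""
    return [
        pos for pos in read_starts
        if not any(start - padding <= pos < end + padding for start, end in targets)
    ]

def bin_counts(read_starts, bin_size, chrom_length):
    """Count read starts in fixed-width, non-overlapping bins along one chromosome."""
    counts = [0] * math.ceil(chrom_length / bin_size)
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts

def log2_ratios(sample_counts, reference_counts, pseudocount=0.5):
    """Per-bin log2(sample / reference) after normalising each sample by its total
    count. The pseudocount avoids log(0) in empty bins."""
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    return [
        math.log2(((s + pseudocount) / sample_total) / ((r + pseudocount) / reference_total))
        for s, r in zip(sample_counts, reference_counts)
    ]

# Toy data: one capture target at 100-200; the sample carries extra reads in bin 2.
targets = [(100, 200)]
sample_reads = [10, 20, 150, 160, 210, 220, 230, 240, 310]
reference_reads = [15, 25, 155, 215, 225, 315]
sample_bins = bin_counts(off_target_reads(sample_reads, targets), 100, 400)
reference_bins = bin_counts(off_target_reads(reference_reads, targets), 100, 400)
print(log2_ratios(sample_bins, reference_bins))
```

Because on-target reads are discarded, the bin covering the capture target is empty in both samples; the remaining off-target reads are spread along the chromosome independently of where the targets fall, which is the property the abstract exploits.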

2.
Tests of applicability of several substitution models for DNA sequence data (Times cited: 5; self-citations: 3; citations by others: 5)
Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. The models examined are those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a new model called the eight-parameter model. The first six models are special cases of the last model. The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance (number of nucleotide substitutions per site) with a smaller variance than a complicated model when the simple model is correct. We have also developed a statistical test of the homogeneity of nucleotide frequencies of a sample of several sequences that takes into account possible phylogenetic correlations. This test is used to examine the stationarity in time of the base frequencies in the sample. For Hasegawa et al.'s and the eight-parameter models, analytical formulas for estimating evolutionary distances are presented. Application of the above tests to several sets of real data has shown that the assumption of stationarity of base composition is usually acceptable when the sequences studied are closely related but otherwise it is rejected. Similarly, the simple models of nucleotide substitution are almost always rejected when actual genes are distantly related and/or the total number of nucleotides examined is large.
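For the two simplest models in that list, the distance estimators have well-known closed forms: Jukes and Cantor (1969) give d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites, and Kimura (1980) gives d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The sketch below computes both. It assumes two aligned, equal-length, gap-free sequences over A, C, G, T; the function names are hypothetical, and it does not implement the paper's linear-invariant tests.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def _check(seq1, seq2):
    if not seq1 or len(seq1) != len(seq2):
        raise ValueError("sequences must be non-empty and of equal length")

def jukes_cantor_distance(seq1, seq2):
    """JC69 distance: d = -(3/4) ln(1 - (4/3) p)."""
    _check(seq1, seq2)
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)

def kimura_2p_distance(seq1, seq2):
    """K80 distance: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q)."""
    _check(seq1, seq2)
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    P, Q = transitions / n, transversions / n
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("too many differences: distance is undefined")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

x = "AAAAAAAAAAAA"
y = "GCTAAAAAAAAA"  # one transition (A->G) and two transversions (A->C, A->T)
print(jukes_cantor_distance(x, y), kimura_2p_distance(x, y))
```

In the toy pair, P = p/3 and Q = 2p/3, which is exactly the Jukes-Cantor expectation, so the two formulas coincide; that makes a convenient sanity check that the simpler model is a special case of the richer one.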

3.
Clues to our evolutionary history lie hidden within DNA sequence data. One of the great challenges facing population geneticists is to identify and accurately interpret these clues. This task is made especially difficult by the fact that many different evolutionary processes can lead to similar observations. For example, low levels of polymorphism within a region can be explained by a low local mutation rate, by selection having eliminated deleterious mutations, or by the recent spread to fixation of a beneficial allele. Theoretical advances improve our ability to distinguish signals left by different evolutionary processes. In particular, a new test might better detect the footprint of selection having favored the spread of a beneficial allele.

4.
SUMMARY: VeriScan is a software package for the analysis of DNA sequence polymorphisms at the whole genome scale. Among other features, the software (1) can conduct many population genetic analyses; (2) incorporates a multiresolution wavelet transform-based method that allows capturing relevant information from DNA polymorphism data; (3) facilitates the visualization of the results in the most commonly used genome browsers.

5.
Accuracy of phylogenetic trees estimated from DNA sequence data (Times cited: 3; self-citations: 1; citations by others: 3)
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
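Of the four methods compared, UPGMA is the simplest to state: repeatedly merge the two closest clusters, place the new node at half their distance, and recompute distances to the merged cluster as size-weighted averages. The sketch below is a textbook UPGMA under the assumption of a symmetric distance matrix given as a list of lists; the function name and toy matrix are hypothetical and it is not the simulation code used in the study.

```python
import itertools

def upgma(labels, matrix):
    """Cluster taxa by UPGMA.

    Returns a nested tuple (left, right, height), where height is the node's
    distance above the tips (half the distance between the merged clusters).
    """
    clusters = {i: (label, 1) for i, label in enumerate(labels)}  # id -> (subtree, size)
    dist = {(i, j): matrix[i][j] for i, j in itertools.combinations(range(len(labels)), 2)}
    next_id = len(labels)

    def key(a, b):
        return (a, b) if a < b else (b, a)

    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        height = dist[(i, j)] / 2
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Distance from each remaining cluster to the merged one is the
        # size-weighted average of its distances to the two parts.
        for k in clusters:
            dist[(k, next_id)] = (
                size_i * dist[key(i, k)] + size_j * dist[key(j, k)]
            ) / (size_i + size_j)
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        clusters[next_id] = ((tree_i, tree_j, height), size_i + size_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree

labels = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(labels, matrix))
```

Because every merge is placed at half the inter-cluster distance, UPGMA implicitly assumes a constant substitution rate across lineages, which is consistent with its weaker showing in the abstract when rates vary among lineages.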

6.
MOTIVATION: Classifying genes into clusters depending on their expression profiles is one of the most important analysis techniques for microarray data. Because temporal gene expression profiles are indicative of the dynamic functional properties of genes, the application of clustering analysis to time-course data allows the more precise division of genes into functional classes. Conventional clustering methods treat the sampling data at each time point as data obtained under different experimental conditions without considering the continuity of time-course data between time periods t and t+1. Here, we propose a method designated mathematical model-based clustering (MMBC). RESULTS: MMBC was applied to artificial data and to time-course data obtained using Saccharomyces cerevisiae. Our method is able to divide data into clusters more accurately and coherently than conventional clustering methods. Furthermore, MMBC is more tolerant to noise than conventional clustering methods. AVAILABILITY: Software is available upon request. CONTACT: taizo@brs.kyushu-u.ac.jp.

7.
8.
9.
Coltriciella sonorensis is described here as a new species from Mexico. It is characterized by pleuropodal, flabelliform basidiomes, rounded to elongated or daedaloid pores, a well-developed sub-hymenium, and oblong to cylindrical basidiospores, slightly attenuated towards the apex. The specimen was collected on soil in an open Quercus stand in mixed Quercus–tropical deciduous forest in the Sierra de Álamos–Río Cuchujaqui Biosphere Reserve, Sonora, Mexico. From a phylogenetic perspective, the species appears to be related to C. oblectabilis, also occurring in Mexico.

10.
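OBJECTIVE: To establish a simple, rapid, and economical method for extracting total DNA from model mice, so that large numbers of model mice can be genotyped quickly. METHODS: Total DNA was extracted from the same strain of model mice by phenol extraction, isopropanol precipitation, and mouse-ear boiling. DNA purity, yield, and time required were compared, as were the genotyping results. RESULTS: Phenol extraction gave the highest yield and isopropanol precipitation the lowest. Purity decreased in the order phenol extraction, isopropanol precipitation, mouse-ear boiling, while mouse-ear boiling took the least time. DNA extracted by all three methods could be used as template for genotyping. CONCLUSION: The mouse-ear boiling method is simple to perform, lowest in cost, and fast, and it gives reliable genotyping results; it can be used in large-scale genotyping experiments.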

11.
Zhu L, Bustamante CD. Genetics 2005, 170(3):1411-1421
We present a novel composite-likelihood-ratio test (CLRT) for detecting genes and genomic regions that are subject to recurrent natural selection (either positive or negative). The method uses the likelihood functions of Hartl et al. (1994) for inference in a Wright-Fisher genic selection model and corrects for nonindependence among sites by application of coalescent simulations with recombination. Here, we (1) characterize the distribution of the CLRT statistic (Lambda) as a function of the population recombination rate (R=4Ner); (2) explore the effects of bias in estimation of R on the size (type I error) of the CLRT; (3) explore the robustness of the model to population growth, bottlenecks, and migration; (4) explore the power of the CLRT under varying levels of mutation, selection, and recombination; (5) explore the discriminatory power of the test in distinguishing negative selection from population growth; and (6) evaluate the performance of maximum composite-likelihood estimation (MCLE) of the selection coefficient. We find that the test has excellent power to detect weak negative selection and moderate power to detect positive selection. Moreover, the test is quite robust to bias in the estimate of local recombination rate, but not to certain demographic scenarios such as population growth or a recent bottleneck. Last, we demonstrate that the MCLE of the selection parameter has little bias for weak negative selection and has downward bias for positively selected mutations.

12.
Consensus on the evolutionary relationships of humans, chimpanzees, and gorillas has not been reached, despite the existence of a number of DNA sequence data sets relating to the phylogeny, partly because not all gene trees from these data sets agree. However, given the well-known phenomenon of gene tree-species tree mismatch, agreement among gene trees is not expected. A majority of gene trees from available DNA sequence data support one hypothesis, but is this evidence sufficient for statistical confidence in the majority hypothesis? All available DNA sequence data sets showing phylogenetic resolution among the hominoids are grouped according to genetic linkage of their corresponding genes to form independent data sets. Of the 14 independent data sets defined in this way, 11 support a human-chimpanzee clade, 2 support a chimpanzee-gorilla clade, and one supports a human-gorilla clade. The hypothesis of a trichotomous speciation event leading to Homo, Pan, and Gorilla can be firmly rejected on the basis of this data set distribution. The multiple-locus test (Wu 1991), which evaluates hypotheses using gene tree-species tree mismatch probabilities in a likelihood ratio test, favors the phylogeny with a Homo-Pan clade and rejects the other alternatives with a P value of 0.002. When the probabilities are modified to reflect effective population size differences among different types of genetic loci, the observed data set distribution is even more likely under the Homo-Pan clade hypothesis. Maximum-likelihood estimates for the time between successive hominoid divergences are in the range of 300,000-2,800,000 years, based on a reasonable range of estimates for long-term hominoid effective population size and for generation time. The implication of the multiple-locus test is that existing DNA sequence data sets provide overwhelming and sufficient support for a human-chimpanzee clade: no additional DNA data sets need to be generated for the purpose of estimating hominoid phylogeny.
Because DNA hybridization evidence (Caccone and Powell 1989) also supports a Homo-Pan clade, the problem of hominoid phylogeny can be confidently considered solved.
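The mismatch probabilities behind such a test come from the three-taxon coalescent: if the internal branch of the species tree lasts t coalescent units (T generations divided by 2N for a diploid effective size N), a gene tree matches the species tree with probability 1 - (2/3)e^(-t), and each of the two mismatching gene trees has probability (1/3)e^(-t). The sketch below fits t by maximum likelihood for each candidate species tree using the 11/2/1 counts quoted above and compares each fit with the trichotomy (t = 0). It is a simplified stand-in under these assumptions, not Wu's (1991) multiple-locus test; it ignores the locus-type adjustments to effective size mentioned in the abstract, and its statistic should not be read as reproducing the reported P value. Function names are hypothetical.

```python
import math

def mle_internal_branch(n_concordant, n_discordant):
    """MLE of x = exp(-t) for the internal branch length t (units of 2N generations).

    P(gene tree matches species tree) = 1 - 2x/3; each of the two mismatching
    gene trees has probability x/3. Setting the score to zero gives
    x = 3m / (2n) for m mismatching loci out of n, capped at 1 (t = 0).
    """
    n = n_concordant + n_discordant
    x = min(1.0, 3 * n_discordant / (2 * n))
    t = -math.log(x) if x > 0 else math.inf
    return x, t

def max_log_likelihood(n_concordant, n_discordant):
    """Maximised log-likelihood for one candidate species tree
    (the multinomial constant, identical across hypotheses, is dropped)."""
    x, _ = mle_internal_branch(n_concordant, n_discordant)
    ll = 0.0
    if n_concordant:
        ll += n_concordant * math.log(1 - 2 * x / 3)
    if n_discordant:
        ll += n_discordant * math.log(x / 3)
    return ll

# Independent data sets supporting each clade, as counted in the abstract.
counts = {"Homo-Pan": 11, "Pan-Gorilla": 2, "Homo-Gorilla": 1}
total = sum(counts.values())
scores = {clade: max_log_likelihood(k, total - k) for clade, k in counts.items()}
trichotomy = total * math.log(1 / 3)  # all three gene trees equally likely
best = max(scores, key=scores.get)
x_hat, t_hat = mle_internal_branch(counts[best], total - counts[best])
print(best, round(t_hat, 3), round(2 * (scores[best] - trichotomy), 2))
```

For the two minority hypotheses the fitted internal branch collapses to zero length, so their likelihood equals that of the trichotomy; only the Homo-Pan hypothesis gains from a positive internal branch.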

13.
It is possible to perform a combined amplification and sequencing reaction ('DEXAS') directly from complex DNA mixtures by using two thermostable DNA polymerases, one that favours the incorporation of deoxynucleotides over dideoxynucleotides, and one which has a decreased ability to discriminate between these two nucleotide forms. During cycles of thermal denaturation, annealing and extension, the former enzyme primarily amplifies the target sequence whereas the latter enzyme primarily performs a sequencing reaction. This method allows the determination of single-copy nuclear DNA sequences from amounts of human genomic DNA comparable to those used to amplify nucleotide sequences by the polymerase chain reaction. Thus, DNA sequences can be easily determined directly from total genomic DNA.

14.
15.
Evolutionary distance matrices of the extant hominoids are computed from DNA sequence data, and hominoid DNA phylogenies are reconstructed by applying the neighbor-joining method to these distance matrices. The chimpanzee is clustered with the human in most of the phylogenetic trees thus obtained. The proportion of the distance between human and chimpanzee to that between human/chimpanzee and orangutan is estimated. Both mitochondrial DNA and nuclear DNA show a similar value (0.44), which is close to values derived from DNA-DNA hybridization data.
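Neighbor joining (Saitou and Nei 1987) repeatedly joins the pair (i, j) that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of node i's distances, then replaces the pair by a new node. The sketch below is a textbook version of that procedure, assuming a symmetric distance matrix; the function name and the five-taxon toy matrix are hypothetical illustrations, not the hominoid distances analysed in the paper.

```python
def neighbor_joining(labels, matrix):
    """Neighbor joining on a symmetric distance matrix.

    Each joined node is ((child1, length1), (child2, length2)). The unrooted
    result is returned as (subtree1, subtree2, length of the edge between them).
    """
    nodes = list(labels)
    d = [list(row) for row in matrix]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new node to i and to j.
        len_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new node to every remaining node.
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [((nodes[i], len_i), (nodes[j], len_j))]
    return nodes[0], nodes[1], d[0][1]

labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(labels, matrix))
```

The toy matrix is additive, so neighbor joining recovers branch lengths that reproduce every pairwise distance exactly; unlike UPGMA, it does not assume equal rates along lineages and returns an unrooted tree.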

16.
Traditional classification in the genus Capra is based mainly on horn morphology. However, previous investigations based on allozyme data are not consistent with this classification. We thus reexamined the evolutionary history of the genus by analyzing mitochondrial DNA (mtDNA) sequence variation. We collected bone samples from museums or dead animals found in the field. Thirty-four individuals were successfully sequenced for a portion of the mtDNA cytochrome b gene and control region (500 bp in total). We obtained a star-like phylogeny supporting a rapid radiation of the genus. In accordance with traditional classification, mtDNA data support the presence of two clades in the Caucasus and the hypothesis of a domestication event in the Fertile Crescent. However, in conflict with morphology, we found that C. aegagrus and C. ibex are polyphyletic species, and we propose a new scenario for Capra immigration into Europe.

17.
Debate exists over how to incorporate information from multipartite sequence data in phylogenetic analyses. Strict combined-data approaches argue for concatenation of all partitions and estimation of one evolutionary history, maximizing the explanatory power of the data. Consensus/independence approaches endorse a two-step procedure where partitions are analyzed independently and then a consensus is determined from the multiple results. Mixtures across the model space of a strict combined-data approach and a priori independent parameters are popular methods to integrate these methods. We propose an alternative middle ground by constructing a Bayesian hierarchical phylogenetic model. Our hierarchical framework enables researchers to pool information across data partitions to improve estimate precision in individual partitions while permitting estimation and testing of tendencies in across-partition quantities. Such across-partition quantities include the distribution from which individual topologies relating the sequences within a partition are drawn. We propose standard hierarchical priors on continuous evolutionary parameters across partitions, while the structure on topologies varies depending on the research problem. We illustrate our model with three examples. We first explore the evolutionary history of the guinea pig (Cavia porcellus) using alignments of 13 mitochondrial genes. The hierarchical model returns substantially more precise continuous parameter estimates than an independent parameter approach without losing the salient features of the data. Second, we analyze the frequency of horizontal gene transfer using 50 prokaryotic genes. We assume an unknown species-level topology and allow individual gene topologies to differ from this with a small estimable probability. Simultaneously inferring the species and individual gene topologies returns a transfer frequency of 17%. We also examine HIV sequences longitudinally sampled from HIV+ patients. 
We ask whether posttreatment development of CCR5 coreceptor virus represents concerted evolution from middisease CXCR4 virus or reemergence of initial infecting CCR5 virus. The hierarchical model pools partitions from multiple unrelated patients by assuming that the topology for each patient is drawn from a multinomial distribution with unknown probabilities. Preliminary results suggest evolution and not reemergence.

18.
ZTR: a new format for DNA sequence trace data (Times cited: 2; self-citations: 0; citations by others: 2)
MOTIVATION: To produce an open and extensible file format for DNA trace data which produces compact files suitable for large-scale storage and efficient use of internet bandwidth. RESULTS: We have created an extensible format named ZTR. For a set of data taken from an ABI-3700 the ZTR format produces trace files which require 61.6% of the disk space used by gzipped SCFv3, and which can be written and read at greater speed. The compression algorithms used for the trace amplitudes are used within the National Center for Biotechnology Information (NCBI) trace archive. lmb.cam.ac.uk/pub/staden/io_lib/test_data.
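The abstract does not spell out how the trace amplitudes are compressed. A common approach for slowly varying signals such as chromatogram traces is to replace each sample by its difference from the previous one (optionally repeated to higher order) before handing the bytes to a general-purpose compressor, since the differences cluster near zero. The sketch below is a generic delta-filter-plus-zlib codec written under that assumption; it is not the ZTR chunk layout or encoder, and the function names, 32-bit packing, and synthetic trace are hypothetical.

```python
import math
import struct
import zlib

def delta_encode(samples, order=2):
    """Apply `order` rounds of first differencing, keeping the first value each round."""
    out = list(samples)
    for _ in range(order):
        if not out:
            return out
        out = [out[0]] + [out[i] - out[i - 1] for i in range(1, len(out))]
    return out

def delta_decode(deltas, order=2):
    """Invert delta_encode with `order` rounds of running sums."""
    out = list(deltas)
    for _ in range(order):
        for i in range(1, len(out)):
            out[i] += out[i - 1]
    return out

def compress_trace(samples, order=2):
    # Higher-order deltas of 16-bit samples can exceed 16 bits, so pack as signed 32-bit.
    deltas = delta_encode(samples, order)
    return zlib.compress(struct.pack(f"<{len(deltas)}i", *deltas), 9)

def decompress_trace(blob, order=2):
    raw = zlib.decompress(blob)
    deltas = list(struct.unpack(f"<{len(raw) // 4}i", raw))
    return delta_decode(deltas, order)

# Synthetic "trace": a train of Gaussian-shaped peaks with 16-bit amplitudes.
samples = [int(1000 * math.exp(-((i % 40) - 20) ** 2 / 18)) for i in range(4000)]
blob = compress_trace(samples)
baseline = zlib.compress(struct.pack(f"<{len(samples)}H", *samples), 9)
print(len(blob), len(baseline))
```

Whether delta filtering beats plain zlib depends on the data; the printed sizes let you compare the two on any input, and the round trip is lossless either way.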

19.
N Tomioka, A Itai. Biopolymers 1992, 32(12):1593-1597
A three-dimensional model of DNA/RNA triple helix that contains a poly(L-deoxyadenosine) (L-dA) chain is proposed based on computer-assisted model building and energy calculations. The model building was performed by a new method that systematically searches possible conformations of nucleotide units in the helical chains. Two possible orientations of sugar-phosphate chains, in which two homopyrimidine strands are parallel or antiparallel with each other, were considered in the systematic search. Several possible base-pairing models, in which there are one Watson-Crick base pair and one other base pair, were also considered. Many possible models selected by the systematic search were further refined through molecular mechanics calculation incorporating a helical boundary condition. The preferred model, which was selected on the basis of potential energy, was the one with Watson-Crick and Hoogsteen base pairs and with its two polypyrimidine chains in the antiparallel orientation. The model can explain the experimental observation that poly(L-dA) forms a stable triple helix with poly(uridylic acid) (U) but not with poly(deoxythymidylic acid) (dT).

20.
Here, we investigate the evolutionary history and pattern of genetic divergence in the Rhagoletis pomonella (Diptera: Tephritidae) sibling species complex, a model for sympatric speciation via host plant shifting, using 11 anonymous nuclear genes and mtDNA. We report that DNA sequence results largely coincide with those of previous allozyme studies. Rhagoletis cornivora was basal in the complex, distinguished by fixed substitutions at all loci. Gene trees did not provide reciprocally monophyletic relationships among US populations of R. pomonella, R. mendax, R. zephyria and the undescribed flowering dogwood fly. However, private alleles were found for these taxa for certain loci. We discuss the implications of the results with respect to identifiable genetic signposts (stages) of speciation, the mosaic nature of genomic differentiation distinguishing formative species and a concept of speciation mode plurality involving a biogeographic contribution to sympatric speciation in the R. pomonella complex.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417