Similar articles
 20 similar articles found
1.
An evolutionary model for maximum likelihood alignment of DNA sequences   Cited by: 16 (self-citations: 0; citations by others: 16)
Summary: Most algorithms for the alignment of biological sequences are not derived from an evolutionary model. Consequently, these alignment algorithms lack a strong statistical basis. A maximum likelihood method for the alignment of two DNA sequences is presented. This method is based upon a statistical model of DNA sequence evolution for which we have obtained explicit transition probabilities. The evolutionary model can also be used as the basis of procedures that estimate the evolutionary parameters relevant to a pair of unaligned DNA sequences. A parameter-estimation approach which takes into account all possible alignments between two sequences is introduced; the danger of estimating evolutionary parameters from a single alignment is discussed.

2.
Two approximate methods are proposed for maximum likelihood phylogenetic estimation, which allow variable rates of substitution across nucleotide sites. Three data sets with quite different characteristics were analyzed to examine empirically the performance of these methods. The first, called the discrete gamma model, uses several categories of rates to approximate the gamma distribution, with equal probability for each category. The mean of each category is used to represent all the rates falling in the category. The performance of this method is found to be quite good, and four such categories appear to be sufficient to produce both an optimum or near-optimum fit of the model to the data and an acceptable approximation to the continuous distribution. The second method, called the fixed-rates model, classifies sites into several classes according to their rates predicted assuming the star tree. Sites in different classes are then assumed to be evolving at these fixed rates when other tree topologies are evaluated. Analyses of the data sets suggest that this method can produce reasonable results, but it seems to share some properties of a least-squares pairwise comparison; for example, interior branch lengths in nonbest trees are often found to be zero. The computational requirements of the two methods are comparable to those of Felsenstein's (1981, J Mol Evol 17:368–376) model, which assumes a single rate for all the sites.
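
The discrete gamma construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a gamma distribution with shape alpha and mean 1, and the function names (reg_lower_gamma, gamma_quantile, discrete_gamma_rates) are hypothetical helpers written with the Python standard library only.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    if math.isinf(x):
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gamma_quantile(p, alpha):
    """Quantile of Gamma(shape=alpha, rate=alpha), found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha, k=4):
    """Mean rate of each of k equal-probability categories (overall mean 1)."""
    cuts = [0.0] + [gamma_quantile(i / k, alpha) for i in range(1, k)] + [math.inf]
    # With mean 1, E[r; a < r < b] = P(alpha+1, alpha*b) - P(alpha+1, alpha*a);
    # dividing by the category probability 1/k gives the category mean.
    return [
        k * (reg_lower_gamma(alpha + 1, alpha * b) - reg_lower_gamma(alpha + 1, alpha * a))
        for a, b in zip(cuts, cuts[1:])
    ]

rates = discrete_gamma_rates(0.5, k=4)
```

Because the categories are equally probable, the category means average to 1 by construction. For alpha = 0.5 and four categories the rates come out at roughly 0.033, 0.25, 0.82 and 2.89.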

3.
Molecular motors, such as kinesin, myosin, or dynein, convert chemical energy into mechanical energy by hydrolyzing ATP. The mechanical energy is used for moving in discrete steps along the cytoskeleton and carrying a molecular load. High resolution single molecule recordings of motor steps appear as a stochastic sequence of dwells, resembling a staircase. Staircase data can also be obtained from other molecular machines such as F1-ATPase, RNA polymerase, or topoisomerase. We developed a maximum likelihood algorithm that estimates the rate constants between different conformational states of the protein, including motor steps. We model the motor with a periodic Markov model that reflects the repetitive chemistry of the motor step. We estimated the kinetics from the idealized dwell-sequence by numerical maximization of the likelihood function for discrete-time Markov models. This approach eliminates the need for missed event correction. The algorithm can fit kinetic models of arbitrary complexity, such as uniform or alternating step chemistry, reversible or irreversible kinetics, ATP concentration and mechanical force-dependent rates, etc. The method allows global fitting across stationary and nonstationary experimental conditions, and user-defined a priori constraints on rate constants. The algorithm was tested with simulated data, and implemented in the free QuB software.

4.
Maximum likelihood supertrees   Cited by: 2 (self-citations: 0; citations by others: 2)

5.
Compilation and alignment of DNA polymerase sequences.   Cited by: 34 (self-citations: 11; citations by others: 34)

6.
Molecular biology laboratories frequently face the challenge of aligning small overlapping DNA sequences derived from a long DNA segment. Here, we present a short program that can be used to adapt Excel spreadsheets as a tool for aligning DNA sequences, regardless of their orientation. The program runs on any Windows or Macintosh computer with Excel 97 or Excel 98. The program is available for use as an Excel file, which can be downloaded from the BioTechniques Web site. Upon execution, the program opens a specially designed workbook and is capable of identifying overlapping regions between two sequence fragments and displaying the sequence alignment. It also performs a number of specialized functions such as recognition of restriction enzyme cutting sites and CpG island mapping without costly specialized software.

7.
8.
MOTIVATION: Horizontal gene transfer (HGT) is believed to be ubiquitous among bacteria, and plays a major role in their genome diversification as well as their ability to develop resistance to antibiotics. In light of its evolutionary significance and implications for human health, developing accurate and efficient methods for detecting and reconstructing HGT is imperative. RESULTS: In this article we provide a new HGT-oriented likelihood framework for many problems that involve phylogeny-based HGT detection and reconstruction. Besides the formulation of various likelihood criteria, we show that most of these problems are NP-hard, and offer heuristics for efficient and accurate reconstruction of HGT under these criteria. We implemented our heuristics and used them to analyze biological as well as synthetic data. In both cases, our criteria and heuristics exhibited very good performance with respect to identifying the correct number of HGT events as well as inferring their correct location on the species tree. AVAILABILITY: Implementation of the criteria as well as heuristics and hardness proofs are available from the authors upon request. Hardness proofs can also be downloaded at http://www.cs.tau.ac.il/~tamirtul/MLNET/Supp-ML.pdf

9.
Bioinformatics (2006) 22(21), 2604–2611. The authors would like to apologize for errors of graph misplacement in Figures 4–6, and an

10.

Background  

Seeded alignment is an important component of algorithms for fast, large-scale DNA similarity search. A good seed matching heuristic can reduce the execution time of genomic-scale sequence comparison without degrading sensitivity. Recently, many types of seed have been proposed to improve on the performance of traditional contiguous seeds as used in, e.g., NCBI BLASTN. Choosing among these seed types, particularly those that use information besides the presence or absence of matching residue pairs, requires practical guidance based on a rigorous comparison, including assessment of sensitivity, specificity, and computational efficiency. This work performs such a comparison, focusing on alignments in DNA outside widely studied coding regions.
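
As a small illustration of why seed shape matters, the sketch below compares a contiguous seed with a spaced seed of the same weight on one ungapped alignment. It covers only the baseline case where a seed looks at the presence or absence of matches; the function name seed_hits and the "1"/"0" encoding are assumptions made for this example, not taken from the paper.

```python
def seed_hits(matches, seed):
    """Return the start offsets at which `seed` hits an ungapped alignment.

    `matches` marks each alignment column as "1" (match) or "0" (mismatch).
    `seed` uses "1" for a position that must match and "0" for don't-care.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    hits = []
    for start in range(len(matches) - len(seed) + 1):
        if all(matches[start + i] == "1" for i in required):
            hits.append(start)
    return hits

columns = "1101101101"                    # a mismatch at every third column
contiguous = seed_hits(columns, "111")    # three consecutive matches required
spaced = seed_hits(columns, "1101")       # three matches, one don't-care
```

On this alignment the contiguous seed never hits, while the spaced seed of equal weight hits at offsets 0, 3 and 6.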

11.
Wang JP, Widom J. Nucleic Acids Research 2005, 33(21): 6743-6755
DNA sequences that are present in nucleosomes have a preferential approximately 10 bp periodicity of certain dinucleotide signals, but the overall sequence similarity of the nucleosomal DNA is weak, and traditional multiple sequence alignment tools fail to yield meaningful alignments. We develop a mixture model that characterizes the known dinucleotide periodicity probabilistically to improve the alignment of nucleosomal DNAs. We assume that a periodic dinucleotide signal of any type emits according to a probability distribution around a series of 'hot spots' that are equally spaced along nucleosomal DNA with 10 bp period, but with a 1 bp phase shift across the middle of the nucleosome. We model the three statistically most significant dinucleotide signals, AA/TT, GC and TA, simultaneously, while allowing phase shifts between the signals. The alignment is obtained by maximizing the likelihood of both Watson and Crick strands simultaneously. The resulting alignment of 177 chicken nucleosomal DNA sequences revealed that all 10 distinct dinucleotides are periodic, but with only two distinct phases and varying intensity. By Fourier analysis, we show that our new alignment has enhanced periodicity and sequence identity compared with center alignment. The significance of the nucleosomal DNA sequence alignment is evaluated by comparing it with that obtained using the same model on non-nucleosomal sequences.

12.
Evolutionary trees from DNA sequences: A maximum likelihood approach   Cited by: 129 (self-citations: 0; citations by others: 129)
Summary: The application of maximum likelihood techniques to the estimation of evolutionary trees from nucleic acid sequence data is discussed. A computationally feasible method for finding such maximum likelihood estimates is developed, and a computer program is available. This method has advantages over the traditional parsimony algorithms, which can give misleading results if rates of evolution differ in different lineages. It also allows the testing of hypotheses about the constancy of evolutionary rates by likelihood ratio tests, and gives a rough indication of the error of the estimate of the tree. By acceptance of this article, the publisher and/or recipient acknowledges the U.S. government's right to retain a nonexclusive, royalty-free licence in and to any copyright covering this paper. This report was prepared as an account of work sponsored by the United States Government. Neither the United States nor the United States Department of Energy, nor any of their employees, nor any of their contractors, subcontractors, or their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness or usefulness of any information, apparatus, product or process disclosed, or represents that its use would not infringe privately-owned rights.
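
One common way to make the likelihood of a tree computable is to combine conditional likelihoods from the leaves toward the root. The sketch below illustrates that idea for a single site. It is an assumption-laden example rather than the paper's program: it uses the Jukes-Cantor (JC69) substitution model for simplicity, and the nested-tuple tree format and function names are hypothetical.

```python
import math

BASES = "ACGT"

def jc69(t):
    """JC69 probabilities of the same / a specific different base after branch length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay

def partial(node, site):
    """Conditional likelihoods of the data below `node` for each base at `node`.

    A leaf is a name (str) looked up in `site`; an internal node is a tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, str):
        return [1.0 if b == site[node] else 0.0 for b in BASES]
    left, t_left, right, t_right = node
    out = [1.0] * 4
    for child, t in ((left, t_left), (right, t_right)):
        same, diff = jc69(t)
        below = partial(child, site)
        total = sum(below)
        for i in range(4):
            # Sum over the child's base: stay the same, or change to one of three others.
            out[i] *= same * below[i] + diff * (total - below[i])
    return out

def site_likelihood(tree, site):
    """Likelihood of one alignment column, with uniform base frequencies at the root."""
    return sum(0.25 * p for p in partial(tree, site))

tree = (("a", 0.1, "b", 0.2), 0.05, "c", 0.3)
lik = site_likelihood(tree, {"a": "A", "b": "A", "c": "G"})
```

The likelihood of a whole alignment is the product of the per-site values (or the sum of their logarithms), which can then be maximized over branch lengths and topologies.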

13.
This work is concerned with statistical methods to estimate yield and maintenance parameters associated with microbial growth. For a given dilution rate, an experimenter typically measures substrate concentration, oxygen utilization rate, the rate of carbon dioxide evolution, and biomass concentration. These correlated response variables each contain information about the maintenance and yield parameters of interest. A maximum likelihood estimator which combines this correlated information for the yield and maintenance parameters is proposed, evaluated, and tested on literature data. Both point and interval estimators are considered.

14.

Background

We present a performance per watt analysis of CUDAlign 4.0, a parallel strategy to obtain the optimal pairwise alignment of huge DNA sequences in multi-GPU platforms using the exact Smith-Waterman method.

Results

Our study includes acceleration factors, performance, scalability, power efficiency and energy costs. We also quantify the influence of the contents of the compared sequences, identify potential scenarios for energy savings on speculative executions, and calculate performance and energy usage differences among distinct GPU generations and models. For a sequence alignment on chromosome-wide scale (around 2 Petacells), we are able to reduce execution times from 9.5 h on a Kepler GPU to just 2.5 h on a Pascal counterpart, with energy costs cut by 60%.

Conclusions

We find GPUs to be an order of magnitude ahead in performance per watt compared to Xeon Phis. Finally, versus typical low-power devices like FPGAs, GPUs in 2017 keep similar GFLOPS/W ratios while executing five times faster.
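
For readers unfamiliar with the Smith-Waterman method named above, the following is a plain single-threaded sketch of its scoring recurrence. It is not CUDAlign: it assumes a linear gap penalty and arbitrary illustrative scores (match=1, mismatch=-1, gap=-2), and it returns only the best local score, keeping two rows of the matrix in memory.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A cell never drops below 0, which lets an alignment start anywhere.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best

score = smith_waterman_score("TTACGTTT", "GGACGGG")
```

The method fills len(a) * len(b) matrix cells, so the work grows with the product of the two sequence lengths. This is why chromosome-scale comparisons reach the cell counts reported in the abstract.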

15.
A statistical method is presented for comparing protein sequences by partitioning the polymers and estimating each subsegment's degree of conservation. Conservation is measured as a function of the number of transitions occurring in the underlying time homogeneous Markov process assumed to govern amino acid mutations. The Markovian assumption also permits estimation of the ancestral sequence. Partitioning and estimation are carried out via maximum likelihood. The method is contrasted with the commonly utilized percent homology measure. A moving likelihood ratio plot to aid in identifying regions of high conservation is suggested as an analogue to moving hydrophobicity plots. An application is presented which identifies highly conserved regions in thymidylate synthase from L. casei and E. coli.

16.
Maximum likelihood haplotyping for general pedigrees   Cited by: 3 (self-citations: 0; citations by others: 3)
Haplotype data is valuable in mapping disease-susceptibility genes in the study of Mendelian and complex diseases. We present algorithms for inferring a most likely haplotype configuration for general pedigrees, implemented in the newest version of the genetic linkage analysis system SUPERLINK. In SUPERLINK, genetic linkage analysis problems are represented internally using Bayesian networks. The use of Bayesian networks enables efficient maximum likelihood haplotyping for more complex pedigrees than was previously possible. Furthermore, to support efficient haplotyping for larger pedigrees, we have also incorporated a novel algorithm for determining a better elimination order for the variables of the Bayesian network. The presented optimization algorithm also improves likelihood computations. We present experimental results for the new algorithms on a variety of real and semiartificial data sets, and use our software to evaluate MCMC approximations for haplotyping.

17.
Mantel N. Biometrics 1985, 41(3): 777-783
In minimum chi-square logit or probit analysis of quantal bioassay data, a requirement for proper asymptotic behavior of the estimates made is that test-group sizes get indefinitely large. Inconsistent estimates result if group sizes are small, however numerous the groups. Maximum likelihood estimates do not show this inconsistent behavior, even if all the many group sizes are only unity. The inconsistent behavior for minimum chi-square results from a bias toward 0.5 for response probabilities. At 0.5 the binomial variance is at its maximum of 0.25, which tends to minimize the calculated value of chi-square. The principle of minimum chi-square should not be confused with the principle of least squares.

18.

Background  

Phylogenetic footprinting is the identification of functional regions of DNA by their evolutionary conservation. This is achieved by comparing orthologous regions from multiple species and identifying the DNA regions that have diverged less than neutral DNA. Vestige is a phylogenetic footprinting package built on the PyEvolve toolkit that uses probabilistic molecular evolutionary modelling to represent aspects of sequence evolution, including the conventional divergence measure employed by other footprinting approaches. In addition to measuring the divergence, Vestige allows the expansion of the definition of a phylogenetic footprint to include variation in the distribution of any molecular evolutionary processes. This is achieved by displaying the distribution of model parameters that represent partitions of molecular evolutionary substitutions. Examination of the spatial incidence of these effects across regions of the genome can identify DNA segments that differ in the nature of the evolutionary process.

19.
Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source.
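
The Burrows-Wheeler idea behind this kind of index can be illustrated with a toy exact-match counter. This is a teaching sketch only and not Bowtie's implementation: it builds the transform by sorting all rotations, recounts characters on every step instead of using precomputed rank tables, and handles neither mismatches nor quality values. The function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform of text, using "$" as the end marker."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def count_occurrences(bwt_str, pattern):
    """Count exact occurrences of pattern by backward search over the BWT."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c]: number of characters in the text that sort before c.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    lo, hi = 0, len(bwt_str)          # half-open range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + bwt_str[:lo].count(ch)
        hi = first[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo

index = bwt("ACGTACGTAC")
n_hits = count_occurrences(index, "ACG")
```

Each pattern character narrows the range of matching rotations, so the number of search steps depends on the read length rather than on the genome length. A practical index replaces the repeated counting with sampled rank tables.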

20.
Maximum likelihood estimation of multiple change points   Cited by: 3 (self-citations: 0; citations by others: 3)
Fu Y-X, Curnow RN. Biometrika 1990, 77(3): 563-573
