Found 20 similar articles (search time: 15 ms)
1.
Statistical significance of probabilistic sequence alignment and related local hidden Markov models.
The score statistics of probabilistic gapped local alignment of random sequences are investigated both analytically and numerically. The full probabilistic algorithm (e.g., the "local" version of the maximum-likelihood or hidden Markov model method) is found to have anomalous statistics. A modified "semi-probabilistic" alignment consisting of a hybrid of Smith-Waterman and probabilistic alignment is then proposed and studied in detail. It is predicted that the score statistics of the hybrid algorithm are of the Gumbel universal form, with the key Gumbel parameter lambda taking on a fixed asymptotic value for a wide variety of scoring systems and parameters. A simple recipe for the computation of the "relative entropy," and from it the finite size correction to lambda, is also given. These predictions compare well with direct numerical simulations for sequences of lengths between 100 and 1,000 examined using various PAM substitution scores and affine gap functions. The sensitivity of the hybrid method in the detection of sequence homology is also studied using correlated sequences generated from toy mutation models. It is found to be comparable to that of the Smith-Waterman alignment and significantly better than the Viterbi version of the probabilistic alignment.
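As a rough illustration of what the Gumbel form implies in practice, the sketch below converts an alignment score into a P-value using the standard extreme-value expression P(S >= x) ≈ 1 - exp(-K·m·n·exp(-lambda·x)). The prefactor K, the function name, and the numeric values in the usage lines are assumptions made for illustration; the abstract itself gives no parameter values.

```python
import math


def gumbel_p_value(score, m, n, lam, k):
    """Approximate P(S >= score) for sequences of lengths m and n.

    lam is the Gumbel parameter lambda; k is the (assumed) prefactor K.
    """
    # Expected number of chance hits scoring at least `score` (the E-value).
    e_value = k * m * n * math.exp(-lam * score)
    # 1 - exp(-E), computed with expm1 to stay accurate when E is tiny.
    return -math.expm1(-e_value)


# Hypothetical parameter values, chosen only to exercise the function.
p_low = gumbel_p_value(score=40, m=300, n=300, lam=0.27, k=0.04)
p_high = gumbel_p_value(score=60, m=300, n=300, lam=0.27, k=0.04)
print(p_low > p_high)  # a higher score is less likely to arise by chance
```

Because the P-value depends on the score only through exp(-lambda·x), a small error in lambda changes the result multiplicatively, which is why the finite size correction to lambda mentioned above matters.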
2.
Marvin Schulz Falko Krause Nicolas Le Novère Edda Klipp Wolfram Liebermeister 《Molecular systems biology》2011,7(1)
The exploding number of computational models produced by Systems Biologists over the last years is an invitation to structure and exploit this new wealth of information. Researchers would like to trace models relevant to specific scientific questions, to explore their biological content, to align and combine them, and to match them with experimental data. To automate these processes, it is essential to consider semantic annotations, which describe their biological meaning. As a prerequisite for a wide range of computational methods, we propose general and flexible similarity measures for Systems Biology models computed from semantic annotations. By using these measures and a large extensible ontology, we implement a platform that can retrieve, cluster, and align Systems Biology models and experimental data sets. At present, its major application is the search for relevant models in the BioModels Database, starting from initial models, data sets, or lists of biological concepts. Beyond similarity searches, the representation of models by semantic feature vectors may pave the way for visualisation, exploration, and statistical analysis of large collections of models and corresponding data.
3.
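Insertions and deletions are changes in sequence length that occur in DNA and proteins during evolution. Because information about the ancestral sequence is lacking, it is impossible to determine whether a given event was an insertion or a deletion, so the two are collectively called indels. Indels are one of the main sources of evolutionary variation at the molecular level, and in recent years research on these events has covered their frequency, size, distribution patterns, models of sequence evolution, and applications. This paper summarizes research progress on insertions and deletions at the genome level and the mechanisms by which they arise; introduces the indel evolution models proposed so far, including TKF91, TKF92, the Long Indel model, and sequence context models; and discusses applications of indels as molecular markers in molecular evolution, genotyping, and drug design.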
4.
5.
We report the identification and characterization of 2,000 human diallelic insertion/deletion polymorphisms (indels) distributed throughout the human genome. Candidate indels were identified by comparison of overlapping genomic or cDNA sequences. Average confirmation rate for indels with a ≥2-nt allele-length difference was 58%, but the confirmation rate for indels with a 1-nt length difference was only 14%. The vast majority of the human diallelic indels were monomorphic in chimpanzees and gorillas. The ratio of deletion:insertion mutations was 4.1. Allele frequencies for the indels were measured in Europeans, Africans, Japanese, and Native Americans. New alleles were generally lower in frequency than old alleles. This tendency was most pronounced for the Africans, who are likely to be closest among the four groups to the original modern human population. Diallelic indels comprise approximately 8% of all human polymorphisms. Their abundance and ease of analysis make them useful for many applications.
6.
Statistical evaluation of ion-channel gating models based on distributions of log-likelihood ratios
Csanády L 《Biophysical journal》2006,90(10):3523-3545
The distributions of log-likelihood ratios (ΔLL) obtained from fitting ion-channel dwell-time distributions with nested pairs of gating models (Xi, full model; Xi(R), submodel) were studied both theoretically and using simulated data. When Xi is true, ΔLL is asymptotically normally distributed with predictable mean and variance that increase linearly with data length (n). When Xi(R) is true and corresponds to a distinct point in full parameter space, ΔLL is Gamma-distributed (2ΔLL is chi-square). However, when data generated by an l-component multiexponential distribution are fitted by l+1 components, Xi(R) corresponds to an infinite set of points in parameter space. The distribution of ΔLL is a mixture of two components, one identically zero, the other approximated by a Gamma-distribution. This empirical distribution of ΔLL, assuming Xi(R), allows construction of a valid log-likelihood ratio test. The log-likelihood ratio test, the Akaike information criterion, and the Schwarz criterion all produce asymmetrical Type I and II errors and inefficiently recognize Xi, when true, from short datasets. A new decision strategy, which considers both the parameter estimates and ΔLL, yields more symmetrical errors and a larger discrimination power for small n. These observations are explained by the distributions of ΔLL when Xi or Xi(R) is true.
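For orientation, the Akaike and Schwarz criteria named in this abstract both reduce to a threshold on the log-likelihood ratio between the full model and the submodel. The sketch below uses the standard definitions (AIC = 2p - 2LL, Schwarz criterion = p·ln(n) - 2LL, with p free parameters and n data points); the function names and the numbers in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def prefers_full_model_aic(delta_ll, extra_params):
    """AIC favors the full model when the gain in log-likelihood
    exceeds the number of extra free parameters."""
    return delta_ll > extra_params


def prefers_full_model_schwarz(delta_ll, extra_params, n):
    """The Schwarz criterion favors the full model when the gain
    exceeds (extra_params / 2) * ln(n), with n the data length."""
    return delta_ll > 0.5 * extra_params * math.log(n)


# Hypothetical fit: the full model gains 3 log-likelihood units
# by adding 2 parameters, using 1000 dwell times.
print(prefers_full_model_aic(3.0, 2))            # True: 3 > 2
print(prefers_full_model_schwarz(3.0, 2, 1000))  # False: 3 < ln(1000)
```

The two rules disagree in this example because the Schwarz threshold grows with ln(n) while the AIC threshold is fixed, which is one way the criteria can yield different error balances on the same data.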
7.
We describe a model for the sequence evolution of a processed pseudogene and its paralog from a common protein-coding ancestor. The model accounts for substitutions, insertions, and deletions and combines nucleotide- and codon-level mutation models. We give a dynamic programming method for calculating the likelihood of homology between two sequences in the model and describe the accompanying alignment algorithm. We also describe how ancestral codons can be computed when the same gene produced multiple pseudogene homologs. We apply our methods to the evolution of human cytochrome c.
8.
9.
Effect of deletion and insertion on double-strand-break repair in Saccharomyces cerevisiae. Total citations: 4 (self-citations: 2; citations by others: 4)
K Struhl 《Molecular and cellular biology》1987,7(3):1300-1303
I investigated double-strand-break repair in Saccharomyces cerevisiae cells by measuring the frequencies and types of integration events at the PET56-HIS3-DED1 chromosomal region associated with the introduction of linearized plasmid DNAs containing homologous sequences. In general, the integration frequencies observed in strains containing a wild-type region, a 1-kilobase (kb) deletion, or a 5-kb insertion were similar, provided that the cleavage site in the plasmid DNA was present in the host genome. Cleavage at a plasmid DNA site corresponding to a region deleted in the chromosome caused a 10-fold reduction in the integration frequency even when the site was close to regions of homology. However, although the integration frequency was normal even when cleavage occurred only 25 base pairs (bp) outside the deletion breakpoint, 98% of the events were associated not with the usual heterogenote structure, but instead with a homogenote structure containing two copies of the deletion allele separated by vector sequences. Similarly, when cleavage occurred 80 bp outside the 5-kb substitution breakpoint, 40% of the integration events were associated with homogenote structures. From these observations, I suggest that exonuclease and polymerase activities are not rate-limiting steps in double-strand-break repair, exonuclease activity is coupled to the initiation step, the integration frequency is strongly influenced by the amount of homology near the recombinogenic ends, both ends of a linear DNA molecule might interact with the host chromosome before significant exonuclease or polymerase action, and the average repair tract is about 600 bp.
10.
Promoter-trapping in Saccharomyces cerevisiae by radiation-assisted fragment insertion. Total citations: 1 (self-citations: 0; citations by others: 1)
Kiechle M Manivasakam P Eckardt-Schupp F Schiestl RH Friedl AA 《Nucleic acids research》2002,30(24):e136
Non-homologous insertion (NHI) of DNA fragments into genomic DNA is a method widely used in insertional mutagenesis screens. In the yeast Saccharomyces cerevisiae, the efficiency of NHI is very low. Here we report that its efficiency can be increased by γ-irradiation of recipient cells at the time of transformation. Radiation-assisted NHI depends on YKU70, but its efficiency is not improved by inactivation of RAD5 or RAD52. In a pilot study, we generated 102 transformant clones expressing a lacZ reporter gene under standard conditions (30°C, rich medium). The site of insertion was determined in a subset of eight clones in which lacZ expression was altered by UV-irradiation. A comparison with published data revealed that three of the eight genes identified in our screen have not been targeted by large-scale transposon-based insertion screens. This suggests that radiation-assisted NHI offers a more homogeneous coverage of the genome than methods relying on transposons or retroviral elements.
11.
MicroRNA identification based on sequence and structure alignment. Total citations: 20 (self-citations: 0; citations by others: 20)
MOTIVATION: MicroRNAs (miRNA) are approximately 22 nt long non-coding RNAs that are derived from larger hairpin RNA precursors and play important regulatory roles in both animals and plants. The short length of the miRNA sequences and relatively low conservation of pre-miRNA sequences restrict the conventional sequence-alignment-based methods to finding only relatively close homologs. On the other hand, it has been reported that miRNA genes are more conserved in secondary structure than in primary sequence. Therefore, secondary structural features should be more fully exploited in the homologue search for new miRNA genes. RESULTS: In this paper, we present a novel genome-wide computational approach to detect miRNAs in animals based on both sequence and structure alignment. Experiments show that this approach has higher sensitivity than, and comparable specificity to, other reported homologue searching methods. We applied this method to Anopheles gambiae and detected 59 new miRNA genes. AVAILABILITY: This program is available at http://bioinfo.au.tsinghua.edu.cn/miralign. SUPPLEMENTARY INFORMATION: Supplementary information is available at http://bioinfo.au.tsinghua.edu.cn/miralign/supplementary.htm.
12.
The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model. Firstly, we show how to accelerate the statistical alignment algorithms by several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity-based alignment, to get good initial guesses of the evolutionary parameters, and to apply an efficient numerical optimisation algorithm for finding the maximum likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino and Felsenstein can be simplified. Two proteins, about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes this method practical for actual data analysis. Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins. Finally, we describe a goodness-of-fit test that allows testing the proposed insertion-deletion (indel) process inherent to this model, and find that real sequences (here globins) probably experience indels longer than one, contrary to what is assumed by the model.
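The banding idea in this abstract can be sketched on a simpler problem. The code below is not the TKF91 likelihood recursion: it swaps in plain edit distance as the dynamic program and places the band around the main diagonal rather than around a similarity-based alignment. The function name and the `band` parameter are assumptions made for illustration.

```python
def banded_edit_distance(a, b, band):
    """Edit distance restricted to DP cells with |i - j| <= band.

    Returns None when the final cell lies outside the band. The result is
    an upper bound on the true distance and equals it whenever an optimal
    path stays inside the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    inf = float("inf")
    # Row 0: aligning the empty prefix of a with b[:j] costs j insertions.
    prev = {j: j for j in range(min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[j] = i  # i deletions
                continue
            best = prev.get(j - 1, inf) + (a[i - 1] != b[j - 1])  # (mis)match
            best = min(best, prev.get(j, inf) + 1)                # deletion
            best = min(best, cur.get(j - 1, inf) + 1)             # insertion
            cur[j] = best
        prev = cur
    return prev[m]


print(banded_edit_distance("kitten", "sitting", band=2))
```

Only about (2·band + 1) cells are filled per row instead of the full row, which is the source of the speed-up; the cost is that paths leaving the band are never considered.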
13.
MOTIVATION: Protein structures are flexible and undergo structural rearrangements as part of their function, and yet most existing protein structure comparison methods treat them as rigid bodies, which may lead to incorrect alignment. RESULTS: We have developed the Flexible structure AlignmenT by Chaining AFPs (Aligned Fragment Pairs) with Twists (FATCAT), a new method for structural alignment of proteins. The FATCAT approach simultaneously addresses the two major goals of flexible structure alignment: optimizing the alignment and minimizing the number of rigid-body movements (twists) around pivot points (hinges) introduced in the reference protein. In contrast, currently existing flexible structure alignment programs treat the hinge detection as a post-process of a standard rigid body alignment. We illustrate the advantages of the FATCAT approach by several examples of comparison between proteins known to adopt different conformations, where the FATCAT algorithm achieves more accurate structure alignments than current methods, while at the same time introducing fewer hinges.
14.
Background
Traditional genome alignment methods consider sequence alignment as a variation of the string edit distance problem, and perform alignment by matching characters of the two sequences. They are often computationally expensive and unable to deal with low information regions. Furthermore, they lack a well-principled objective function to measure the performance of sets of parameters. Since genomic sequences carry genetic information, this article proposes that the information content of each nucleotide in a position should be considered in sequence alignment. An information-theoretic approach for pairwise genome local alignment, namely XMAligner, is presented. Instead of comparing sequences at the character level, XMAligner considers a pair of nucleotides from two sequences to be related if their mutual information in context is significant. The information content of nucleotides in sequences is measured by a lossless compression technique.
15.
16.
Summary: Minimum message length encoding is a technique of inductive inference with theoretical and practical advantages. It allows the posterior odds-ratio of two theories or hypotheses to be calculated. Here it is applied to problems of aligning or relating two strings, in particular two biological macromolecules. We compare the r-theory, that the strings are related, with the null-theory, that they are not related. If they are related, the probabilities of the various alignments can be calculated. This is done for one-, three-, and five-state models of relation or mutation. These correspond to linear and piecewise linear cost functions on runs of insertions and deletions. We describe how to estimate parameters of a model. The validity of a model is itself an hypothesis and can be objectively tested. This is done on real DNA strings and on artificial data. The tests on artificial data indicate limits on what can be inferred in various situations. The tests on real DNA support either the three- or five-state models over the one-state model. Finally, a fast, approximate minimum message length string comparison algorithm is described. Offprint requests to: L. Allison
17.
18.
The citrate utilization (Cit+) transposon Tn3411 was shown to be flanked by directly repeated sequences (IS3411L and IS3411R) by restriction enzyme analysis and electron microscope observation. Cit- deletion mutants were frequently found to be generated in pBR322::Tn3411 by intramolecular recombination between the two copies of IS3411. The flanking IS3411 elements of Tn3411 were shown to be functional insertion sequences by Tn3411-mediated direct and inverse transposition. Tn3411-mediated inverse transposition from pBR322::Tn3411 to the F-plasmid derivative pED100 occurred more efficiently than that of direct transposition of the Cit+ determinant. This was thought to be due to the differential transposability of IS3411L and IS3411R in the transposition process. The frequency of transposition of IS3411 marked with a chloramphenicol resistance determinant was much higher than IS3411-mediated cointegrate formation, suggesting that replicon fusions are not essential intermediates in the transposition process of Tn3411 or IS3411. Spontaneous deletions occurred with high frequency in recA hosts. The spontaneous deletion promoted by homologous recombination between two IS3411 elements in Tn3411 was examined with deletion mutants.
19.
20.
Erling Tronvik Lars J Stovner Gunnar Bovim Linda R White Amanda J Gladwin Kathryn Owen Harald Schrader 《BMC neurology》2008,8(1):4