Similar literature
1.
The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a “transducer composition” algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms, which are robust to null cycles and empty bifurcations, for parsing this grammar. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally-characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs. We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, the algorithms for maximum likelihood (ML) alignment of three known RNA structures and a known phylogeny and inference of the common ancestral structure. We compared this ML algorithm to a variety of related, but simpler, techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model and also a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior-decoding was next; and fine details of the model were least important. Posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, so this motivates the use of sum-over-pairs heuristics where possible (and approximate sum-over-pairs). 
For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable.

2.
Oligonucleotide-based therapeutics have the capacity to engage with nucleic acid immune sensors to activate or block their response, but a detailed understanding of these immunomodulatory effects is currently lacking. We recently showed that 2′-O-methyl (2′OMe) gapmer antisense oligonucleotides (ASOs) exhibited sequence-dependent inhibition of sensing by the RNA sensor Toll-Like Receptor (TLR) 7. Here we discovered that 2′OMe ASOs can also display sequence-dependent inhibitory effects on two major sensors of DNA, namely cyclic GMP-AMP synthase (cGAS) and TLR9. Through a screen of 80 2′OMe ASOs and sequence mutants, we characterized key features within the 20-mer ASOs regulating cGAS and TLR9 inhibition, and identified a highly potent cGAS inhibitor. Importantly, we show that the features of ASOs inhibiting TLR9 differ from those inhibiting cGAS, with only a few sequences inhibiting both pathways. Together with our previous studies, our work reveals a complex pattern of immunomodulation where 95% of the ASOs tested inhibited at least one of TLR7, TLR9 or cGAS by ≥30%, which may confound interpretation of their in vivo functions. Our studies constitute the broadest analysis of the immunomodulatory effect of 2′OMe ASOs on nucleic acid sensing to date and will support refinement of their therapeutic development.

3.

Background  

While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.

4.
Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ=log 2, and the high scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments.
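
As a rough illustration of how the conjectured forms in the abstract above could be turned into significance estimates, the following Python sketch computes tail probabilities and E-values from bit scores with λ fixed at log 2. The location parameters `mu` and `tau`, the function names, and the E-value as "P-value times number of comparisons" are assumptions made for this example; the abstract does not say how the locations are obtained, and they would have to be fitted separately.

```python
import math

# Slope conjectured in the abstract for bit scores: lambda = log 2.
LAMBDA = math.log(2)


def viterbi_pvalue(score, mu):
    """P(S >= score) for a Gumbel distribution with slope LAMBDA.

    `mu` is a hypothetical, separately fitted location parameter.
    """
    # 1 - exp(-exp(-lambda * (score - mu))), computed stably via expm1.
    return -math.expm1(-math.exp(-LAMBDA * (score - mu)))


def forward_pvalue(score, tau):
    """Exponential tail probability for a Forward bit score.

    Only meaningful in the high-scoring tail; `tau` is a hypothetical
    fitted location at which the tail probability is taken to be 1.
    """
    return min(1.0, math.exp(-LAMBDA * (score - tau)))


def evalue(pvalue, n_comparisons):
    """Expected number of hits at least this good in `n_comparisons` trials."""
    return pvalue * n_comparisons
```

With λ = log 2, each additional bit of Forward score halves the tail probability, so a score 10 bits above `tau` corresponds to a probability of 2^-10 under these assumptions.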

5.
While most of the recent improvements in multiple sequence alignment accuracy are due to better use of vertical information, which includes the incorporation of consistency-based pairwise alignments and the use of profile alignments, we observe that it is possible to further improve accuracy by taking into account alignment of neighboring residues when aligning two residues, thus making better use of horizontal information. By modifying existing multiple alignment algorithms to make use of horizontal information, we show that this strategy is able to consistently improve over existing algorithms on a few sets of benchmark alignments that are commonly used to measure alignment accuracy, and the average improvements in accuracy can be as much as 1–3% on protein sequence alignment and 5–10% on DNA/RNA sequence alignment. Unlike previous algorithms, consistent average improvements can be obtained across all identity levels.

6.
DbClustal addresses the important problem of the automatic multiple alignment of the top scoring full-length sequences detected by a database homology search. By combining the advantages of both local and global alignment algorithms into a single system, DbClustal is able to provide accurate global alignments of highly divergent, complex sequence sets. Local alignment information is incorporated into a ClustalW global alignment in the form of a list of anchor points between pairs of sequences. The method is demonstrated using anchors supplied by the Blast post-processing program, Ballast. The rapidity and reliability of DbClustal have been demonstrated using the recently annotated Pyrococcus abyssi proteome where the number of alignments with totally misaligned sequences was reduced from 20% to <2%. A web site has been implemented proposing BlastP database searches with automatic alignment of the top hits by DbClustal.

7.

Background  

Protein sequence alignment is one of the basic tools in bioinformatics. Correct alignments are required for a range of tasks including the derivation of phylogenetic trees and protein structure prediction. Numerous studies have shown that the incorporation of predicted secondary structure information into alignment algorithms improves their performance. Secondary structure predictors have to be trained on a set of somewhat arbitrarily defined states (e.g. helix, strand, coil), and it has been shown that the choice of these states has some effect on alignment quality. However, it is not unlikely that prediction of other structural features also could provide an improvement. In this study we use an unsupervised clustering method, the self-organizing map, to assign sequence profile windows to "structural states" and assess their use in sequence alignment.

8.
9.
STI1‐domains are present in a variety of co‐chaperone proteins and are required for the transfer of hydrophobic clients in various cellular processes. The domains were first identified in the yeast Sti1 protein where they were referred to as DP1 and DP2. Based on hidden Markov model searches, this domain had previously been found in other proteins including the mammalian co‐chaperone SGTA, the DNA damage response protein Rad23, and the chloroplast import protein Tic40. Here, we refine the domain definition and carry out structure‐based sequence alignment of STI1‐domains showing conservation of five amphipathic helices. Upon examination of these identified domains, we identify a preceding helix 0 and unifying sequence properties, determine new molecular models, and recognize that STI1‐domains nearly always occur in pairs. The similarity at the sequence, structure, and molecular levels likely supports a unified functional role.

10.
A group of highly efficient Zn(II)-dependent RNA-cleaving deoxyribozymes has been obtained through in vitro selection. They share a common motif with the ‘8–17’ deoxyribozyme isolated under different conditions, including different design of the random pool and metal ion cofactor. We found that this commonly selected motif can efficiently cleave both RNA and DNA/RNA chimeric substrates. It can cleave any substrate containing rNG (where rN is any ribonucleotide base and G can be either ribo- or deoxyribo-G). The pH profile and reaction products of this deoxyribozyme are similar to those reported for hammerhead ribozyme. This deoxyribozyme has higher activity in the presence of transition metal ions compared to alkaline earth metal ions. At saturating concentrations of Zn²⁺, the cleavage rate is 1.35 min⁻¹ at pH 6.0; based on pH profile this rate is estimated to be at least ~30 times faster at pH 7.5, where most assays of Mg²⁺-dependent DNA and RNA enzymes are carried out. This work represents a comprehensive characterization of a nucleic acid-based endonuclease that prefers transition metal ions to alkaline earth metal ions. The results demonstrate that nucleic acid enzymes are capable of binding transition metal ions such as Zn²⁺ with high affinity, and the resulting enzymes are more efficient at RNA cleavage than most Mg²⁺-dependent nucleic acid enzymes under similar conditions.

11.
Computational biology is replete with high-dimensional (high-D) discrete prediction and inference problems, including sequence alignment, RNA structure prediction, phylogenetic inference, motif finding, prediction of pathways, and model selection problems in statistical genetics. Even though prediction and inference in these settings are uncertain, little attention has been focused on the development of global measures of uncertainty. Regardless of the procedure employed to produce a prediction, when a procedure delivers a single answer, that answer is a point estimate selected from the solution ensemble, the set of all possible solutions. For high-D discrete space, these ensembles are immense, and thus there is considerable uncertainty. We recommend the use of Bayesian credibility limits to describe this uncertainty, where a (1−α)%, 0≤α≤1, credibility limit is the minimum Hamming distance radius of a hyper-sphere containing (1−α)% of the posterior distribution. Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.
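
The credibility limit defined in the abstract above can be sketched in a few lines of Python. This is only an illustrative reading of the definition, under several assumptions that are not in the source: solutions are encoded as equal-length binary strings, the ensemble is small enough to enumerate as `(solution, posterior probability)` pairs whose probabilities sum to 1, the hyper-sphere is centered on the point estimate, and the function names are hypothetical. Real alignment ensembles are far too large to enumerate and would need sampling.

```python
def hamming(a, b):
    """Number of positions at which two equal-length encodings differ."""
    return sum(x != y for x, y in zip(a, b))


def credibility_limit(estimate, ensemble, alpha):
    """Smallest Hamming radius around `estimate` holding (1 - alpha) of the mass.

    `ensemble` is a list of (solution, posterior_probability) pairs
    (an assumed, fully enumerated toy ensemble).
    """
    # Visit solutions from nearest to farthest, accumulating posterior mass.
    by_distance = sorted((hamming(estimate, s), p) for s, p in ensemble)
    target = 1.0 - alpha
    mass = 0.0
    for distance, prob in by_distance:
        mass += prob
        if mass >= target - 1e-12:  # small tolerance for float rounding
            return distance
    return by_distance[-1][0]


# Toy ensemble of four solutions with their posterior probabilities.
toy = [("0000", 0.5), ("0001", 0.3), ("0011", 0.15), ("1111", 0.05)]
radius_95 = credibility_limit("0000", toy, alpha=0.05)
```

In this toy case the estimate `"0000"` reaches 95% of the posterior mass at radius 2, whereas centering on a poorer estimate such as `"1111"` would require a larger radius; a tighter limit indicates an estimate that sits closer to the bulk of the posterior.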

12.
Multiple sequence alignment (MSA) is a cornerstone of modern molecular biology and represents a unique means of investigating the patterns of conservation and diversity in complex biological systems. Many different algorithms have been developed to construct MSAs, but previous studies have shown that no single aligner consistently outperforms the rest. This has led to the development of a number of ‘meta-methods’ that systematically run several aligners and merge the output into one single solution. Although these methods generally produce more accurate alignments, they are inefficient because all the aligners need to be run first and the choice of the best solution is made a posteriori. Here, we describe the development of a new expert system, AlexSys, for the multiple alignment of protein sequences. AlexSys incorporates an intelligent inference engine to automatically select an appropriate aligner a priori, depending only on the nature of the input sequences. The inference engine was trained on a large set of reference multiple alignments, using a novel machine learning approach. Applying AlexSys to a test set of 178 alignments, we show that the expert system represents a good compromise between alignment quality and running time, making it suitable for high throughput projects. AlexSys is freely available from http://alnitak.u-strasbg.fr/∼aniba/alexsys.

13.
Argonaute proteins are programmable nucleases that are found in both eukaryotes and prokaryotes and provide defense against invading genetic elements. Although some prokaryotic argonautes (pAgos) were shown to recognize RNA targets in vitro, the majority of studied pAgos have strict specificity toward DNA, which limits their practical use in RNA-centric applications. Here, we describe a unique pAgo nuclease, KmAgo, from the mesophilic bacterium Kurthia massiliensis that can be programmed with either DNA or RNA guides and can precisely cleave both DNA and RNA targets. KmAgo binds 16–20 nt long 5′-phosphorylated guide molecules with no strict specificity for their sequence and is active in a wide range of temperatures. In bacterial cells, KmAgo is loaded with small DNAs with no obvious sequence preferences suggesting that it can uniformly target genomic sequences. Mismatches between the guide and target sequences greatly affect the efficiency and precision of target cleavage, depending on the mismatch position and the nature of the reacting nucleic acids. Target RNA cleavage by KmAgo depends on the formation of secondary structure indicating that KmAgo can be used for structural probing of RNA. These properties of KmAgo open the way for its use for highly specific nucleic acid detection and cleavage.

14.
An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian‐weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD's robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, secondary‐structure matching, combinatorial extension, and Dalilite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics‐scale analysis. HwRMSD can align homologs with low‐sequence identity and large conformational differences, cases where both sequence‐based and structural‐based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence‐alignment method, substitution matrix, and gap parameters for each unique pair of homologs. Proteins 2012. © 2012 Wiley Periodicals, Inc.

15.
John B, Sali A. Nucleic Acids Research 2003, 31(14): 3982-3992
Comparative or homology protein structure modeling is severely limited by errors in the alignment of a modeled sequence with related proteins of known three-dimensional structure. To ameliorate this problem, we have developed an automated method that optimizes both the alignment and the model implied by it. This task is achieved by a genetic algorithm protocol that starts with a set of initial alignments and then iterates through re-alignment, model building and model assessment to optimize a model assessment score. During this iterative process: (i) new alignments are constructed by application of a number of operators, such as alignment mutations and cross-overs; (ii) comparative models corresponding to these alignments are built by satisfaction of spatial restraints, as implemented in our program MODELLER; (iii) the models are assessed by a variety of criteria, partly depending on an atomic statistical potential. When testing the procedure on a very difficult set of 19 modeling targets sharing only 4–27% sequence identity with their template structures, the average final alignment accuracy increased from 37 to 45% relative to the initial alignment (the alignment accuracy was measured as the percentage of positions in the tested alignment that were identical to the reference structure-based alignment). Correspondingly, the average model accuracy increased from 43 to 54% (the model accuracy was measured as the percentage of the Cα atoms of the model that were within 5 Å of the corresponding Cα atoms in the superposed native structure). The present method also compares favorably with two of the most successful previously described methods, PSI-BLAST and SAM. The accuracy of the final models would be increased further if a better method for ranking of the models were available.
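
The two accuracy measures defined in parentheses in the abstract above could be computed roughly as follows. This Python sketch is one possible reading, not the authors' code: it assumes alignments are represented as sets of `(target_index, template_index)` residue pairs, that the model and native Cα coordinates are already superposed and listed in matching order, and that "within 5 Å" includes exactly 5 Å. The function names are hypothetical.

```python
import math


def alignment_accuracy(test_pairs, reference_pairs):
    """Percentage of aligned positions in the tested alignment that also
    appear in the reference structure-based alignment."""
    test_pairs = set(test_pairs)
    if not test_pairs:
        return 0.0
    shared = test_pairs & set(reference_pairs)
    return 100.0 * len(shared) / len(test_pairs)


def model_accuracy(model_ca, native_ca, cutoff=5.0):
    """Percentage of model C-alpha atoms within `cutoff` angstroms of the
    corresponding native atoms (coordinates assumed pre-superposed)."""
    if not model_ca:
        return 0.0
    close = sum(
        math.dist(m, n) <= cutoff for m, n in zip(model_ca, native_ca)
    )
    return 100.0 * close / len(model_ca)


# Toy data: two of four aligned pairs agree with the reference,
# and one of two C-alpha atoms lies within 5 angstroms of its partner.
toy_alignment = alignment_accuracy(
    {(0, 0), (1, 1), (2, 3), (3, 4)},
    {(0, 0), (1, 1), (2, 2), (3, 3)},
)
toy_model = model_accuracy(
    [(0.0, 0.0, 0.0), (0.0, 0.0, 10.0)],
    [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)],
)
```

Both toy values come out at 50%. The denominator for alignment accuracy is taken here to be the number of positions in the tested alignment, following the wording of the abstract; other conventions divide by the length of the reference alignment instead.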

16.
We studied the evolutionary relationships between γ-carbonic anhydrase (γ-CA) and a very diverse group of proteins that share the sequence motif characteristic of the left-handed parallel β-helix (LβH) fold. This sequence motif is characterized by the imperfect tandem repetition of short hexapeptide units, which makes it difficult to obtain a reliable alignment based on sequence information alone. To solve this problem, we used a structural alignment of three members of the group with known crystallographic structures as a seed to obtain a reliable sequence alignment. Then, we applied protein maximum-parsimony and maximum-likelihood phylogenetic inference methods to this alignment. We found that γ-CA belongs to a diverse superfamily of proteins that share the LβH domain. This superfamily is composed mainly of acyltransferases. The most remarkable feature of the phylogenetic tree obtained is that its main branches group together functionally related proteins, so that the coarse topology can be rather easily explained in terms of functional diversification. Regarding the main branch of the tree containing γ-CA, we found that, in addition to the group of its closest relatives that had already been studied, γ-CA is closely related to the tetrahydrodipicolinate N-succinyltransferases.

17.
When aligning RNAs, it is important to consider both the secondary structure similarity and primary sequence similarity to find an accurate alignment. However, algorithms that can handle RNA secondary structures typically have high computational complexity that limits their utility. For this reason, there have been a number of attempts to find useful alignment constraints that can reduce the computations without sacrificing the alignment accuracy. In this paper, we propose a new method for finding effective alignment constraints for fast and accurate structural alignment of RNAs, including pseudoknots. In the proposed method, we use a profile-HMM to identify the “seed” regions that can be aligned with high confidence. We also estimate the position range of the aligned bases that are located outside the seed regions. The location of the seed regions and the estimated range of the alignment positions are then used to establish the sequence alignment constraints. We incorporated the proposed constraints into the profile context-sensitive HMM (profile-csHMM) based RNA structural alignment algorithm. Experiments indicate that the proposed method can make the alignment speed up to 11 times faster without degrading the accuracy of the RNA alignment.

18.
In this study, we investigate the extent to which techniques for homology modeling that were developed for water-soluble proteins are appropriate for membrane proteins as well. To this end we present an assessment of current strategies for homology modeling of membrane proteins and introduce a benchmark data set of homologous membrane protein structures, called HOMEP. First, we use HOMEP to reveal the relationship between sequence identity and structural similarity in membrane proteins. This analysis indicates that homology modeling is at least as applicable to membrane proteins as it is to water-soluble proteins and that acceptable models (with Cα-RMSD values to the native of 2 Å or less in the transmembrane regions) may be obtained for template sequence identities of 30% or higher if an accurate alignment of the sequences is used. Second, we show that secondary-structure prediction algorithms that were developed for water-soluble proteins perform approximately as well for membrane proteins. Third, we provide a comparison of a set of commonly used sequence alignment algorithms as applied to membrane proteins. We find that high-accuracy alignments of membrane protein sequences can be obtained using state-of-the-art profile-to-profile methods that were developed for water-soluble proteins. Improvements are observed when weights derived from the secondary structure of the query and the template are used in the scoring of the alignment, a result which relies on the accuracy of the secondary-structure prediction of the query sequence. The most accurate alignments were obtained using template profiles constructed with the aid of structural alignments. In contrast, a simple sequence-to-sequence alignment algorithm, using a membrane protein-specific substitution matrix, shows no improvement in alignment accuracy. We suggest that profile-to-profile alignment methods should be adopted to maximize the accuracy of homology models of membrane proteins.

19.
20.
‘Locked nucleic acids’ (LNAs) are known to introduce enhanced bio- and thermostability into natural nucleic acids rendering them powerful tools for diagnostic and therapeutic applications. We present the 1.9 Å X-ray structure of an ‘all LNA’ duplex containing exclusively modified β-d-2′-O-4′C-methylene ribofuranose nucleotides. The helix illustrates a new type of nucleic acid geometry that contributes to the understanding of the enhanced thermostability of LNA duplexes. A notable decrease of several local and overall helical parameters like twist, roll and propeller twist influences the structure of the LNA helix and results in a widening of the major groove, a decrease in helical winding and an enlarged helical pitch. A detailed structural comparison to the previously solved RNA crystal structure with the corresponding base pair sequence underlines the differences in conformation. The surrounding water network of the RNA and the LNA helix shows a similar hydration pattern.

