Similar articles
 20 similar articles found (search time: 15 ms)
1.
The functions of RNAs, like those of proteins, are determined by their structures, which, in turn, are determined by their sequences. Comparison/alignment of RNA molecules provides an effective means to predict their functions and understand their evolutionary relationships. For RNA sequence alignment, most methods developed for protein and DNA sequence alignment can be applied directly. RNA 3-dimensional structure alignment, on the other hand, tends to be more difficult than protein structure alignment because RNAs lack the regular secondary structures observed in proteins. Most existing RNA 3D structure alignment methods use only the backbone geometry and ignore the sequence information. Using both sequence and backbone geometry in RNA alignment may not only produce more accurate classification, but also deepen our understanding of the sequence–structure–function relationship of RNA molecules. In this study, we developed a new RNA alignment method based on elastic shape analysis (ESA). ESA treats RNA structures as three-dimensional curves with sequence information encoded in additional dimensions, so that the alignment can be performed in the joint sequence–structure space. The similarity between two RNA molecules is quantified by a formal metric, the geodesic distance. Based on ESA, a rigorous mathematical framework can be built for RNA structure comparison. Means and covariances of full structures can be defined and computed, and probability distributions on spaces of such structures can be constructed for a group of RNAs. Our method was further applied to predict the functions of RNA molecules and showed superior performance compared with previous methods when tested on benchmark datasets. The programs are available at http://stat.fsu.edu/~jinfeng/ESA.html.

2.
The present century has witnessed an unprecedented rise in the number of genome sequences owing to various genome-sequencing programs. However, this growth has not been matched for cDNA or expressed sequence tags (ESTs). Hence, predicting the protein-coding sequences of genes from this enormous collection of genomic sequences presents a significant challenge. While robust high-throughput methods of cloning and expression could be used to meet protein requirements, the lack of intron information creates a bottleneck. Computational programs designed for recognizing intron–exon boundaries in a particular organism or group of organisms have their own limitations. Keeping this in view, we describe here a method for constructing an intron-less gene from genomic DNA in the absence of cDNA/EST information and of an organism-specific gene prediction program. The method is a sequential application of bioinformatics, to predict the correct intron–exon boundaries, and splicing by overlap extension PCR, for spliced gene synthesis. The gene construct so obtained can then be cloned for protein expression. The method is simple and can be used for the expression of any eukaryotic gene.

3.
The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from the NCBI database. Of these, only the 77 bacterial and 31 fungal tannase sequences with different amino acid compositions were retained. These sequences were analysed for physical and chemical properties and subjected to superfamily search, multiple sequence alignment, phylogenetic tree construction and motif finding, in order to identify the functional motif and the evolutionary relationships among them. The superfamily search for these tannases revealed the occurrence of the proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon–carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and the alpha/beta hydrolase superfamily. Some bacterial and fungal sequences showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches, with maximum homology at amino acid residues 389–469 and 482–523, which could be used for designing degenerate primers or probes specific for tannase-producing bacterial and fungal species. The phylogenetic tree showed two different clusters: one contains only bacteria and the other contains both fungi and bacteria, indicating some relationship between these different genera. In the second cluster, however, nearly all fungal species were grouped together, which indicates sequence-level similarity among the fungal genera. Analysis of the distribution of fourteen motifs revealed that Motif 1, with a signature sequence of 29 amino acids, i.e. GCSTGGREALKQAQRWPHDYDGIIANNPA, was uniformly observed in 83.3 % of the tannase sequences studied, suggesting its participation in the structure and enzymatic function.

4.
High-resolution two-dimensional gel electrophoresis and mass spectrometry have been used to identify the outer membrane (OM) subproteome of the Gram-negative bacterium Methylococcus capsulatus (Bath). Twenty-eight unique polypeptide sequences were identified from protein samples enriched in OMs. Only six of these polypeptides had previously been identified. The predictions from novel bioinformatic methods for β-barrel outer membrane proteins (OMPs) and OM lipoproteins were compared with the proteins identified experimentally. BOMP predicted 43 β-barrel OMPs (1.45%) from the 2,959 annotated open reading frames. This was a lower percentage than predicted for other Gram-negative proteomes (1.8–3%). More than half of the predicted BOMPs in M. capsulatus were annotated as (conserved) hypothetical proteins with significant similarity to very few sequences in Swiss-Prot or TrEMBL. The experimental data and the computer predictions indicated that the protein composition of the M. capsulatus OM subproteome differs from that of other Gram-negative bacteria studied in a similar manner. A new program, Lipo, was developed that can analyse entire predicted proteomes and give a list of recognised lipoproteins categorised according to their lipo-box similarity to known Gram-negative lipoproteins. This report is the first to use a proteomics and bioinformatics approach to identify the OM subproteome of an obligate methanotroph.

5.
A new method for predicting interacting residues in protein complexes, InterProSurf, was applied to the E1 envelope protein of Venezuelan equine encephalitis virus (VEEV). Monomeric and trimeric models of VEEV-E1 were constructed with our MPACK program, using the crystal structure of the E1 protein of Semliki forest virus as a template. An alignment of E1 sequences from representative alphaviruses was used to determine physical chemical property motifs (likely functional areas) with our PCPMer program. Information on residue variability, propensity to be in protein interfaces, and surface exposure on the model was combined to predict surface clusters likely to interact with other viral or cellular proteins. Mutagenesis of these clusters indicated that the predictions accurately detected areas crucial for virus infection. In addition to the fusion peptide area in domain 2, at least two other surface areas play an important role in virus infection. We propose that these may be sites of interaction between the E1–E1 and E1–E2 subdomains of the envelope proteins that are required to assemble the functional unit. The InterProSurf method is thus an important new tool for predicting viral protein interactions. These results can aid in the design of new vaccines against alphaviruses and other viruses.

6.

Background  

The performance of alignment programs is traditionally tested on sets of protein sequences for which a reference alignment is known. Conclusions drawn from such protein benchmarks do not necessarily hold for the RNA alignment problem, as was demonstrated in the first RNA alignment benchmark published so far. For example, the twilight zone – the similarity range where alignment quality drops drastically – starts at 60 % for RNAs, compared with 20 % for proteins. In this study we enhance the previous benchmark.

7.

Background  

Vaccine development in the post-genomic era often begins with the in silico screening of genome information, with the most probable protective antigens being predicted rather than requiring causative microorganisms to be grown. Despite the obvious advantages of this approach – such as speed and cost efficiency – its success remains dependent on the accuracy of antigen prediction. Most approaches use sequence alignment to identify antigens. This is problematic for several reasons. Some proteins lack obvious sequence similarity, although they may share similar structures and biological properties. The antigenicity of a sequence may be encoded in a subtle and recondite manner not amenable to direct identification by sequence alignment. The discovery of truly novel antigens will be frustrated by their lack of similarity to antigens of known provenance. To overcome the limitations of alignment-dependent methods, we propose a new alignment-free approach for antigen prediction, which is based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties.
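The ACC idea can be illustrated with a minimal sketch. It assumes one common formulation, in which each residue is replaced by a few numeric descriptors and, for every lag and every ordered pair of descriptors, the lag-shifted products are averaged along the sequence. The descriptor values and the names `TOY_DESCRIPTORS` and `acc_transform` below are hypothetical placeholders for illustration, not the property scales or code used in the study.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder descriptors (NOT real amino acid property scales):
# two numeric properties per residue, for illustration only.
TOY_DESCRIPTORS: Dict[str, Tuple[float, float]] = {
    "A": (0.1, -1.0),
    "C": (0.8, -0.5),
    "G": (-1.2, 0.3),
    "K": (-0.4, 1.5),
}


def acc_transform(seq: str,
                  descriptors: Dict[str, Tuple[float, ...]],
                  max_lag: int) -> List[float]:
    """Return an auto cross covariance vector of fixed length.

    For each lag l in 1..max_lag and each ordered pair of descriptors (j, k),
    the term is sum_i z_j(i) * z_k(i + l) / (n - l), where n is the sequence
    length. The output length is max_lag * n_props**2, independent of n.
    """
    n = len(seq)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    values = [descriptors[aa] for aa in seq]
    n_props = len(values[0])
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(n_props):
            for k in range(n_props):
                total = sum(values[i][j] * values[i + lag][k]
                            for i in range(n - lag))
                features.append(total / (n - lag))
    return features
```

Because the vector length depends only on the number of descriptors and the maximum lag, sequences of different lengths map to vectors of the same size, which is what allows them to be compared without an alignment.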

8.
Accurately identifying the structural class of a protein using only information extracted from its sequence is a complicated task in computational biology, and prediction for low-similarity sequences remains particularly challenging. In this study, a new computational method has been developed to predict protein structural class by fusing sequence information and evolutionary information to represent a protein sample. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark data-sets, 1189 and 25PDB, with sequence similarity lower than 40 and 25%, respectively. Comparison of our results with those of other methods shows that the proposed method is very promising and may provide a cost-effective alternative for predicting protein structural class, in particular for low-similarity data-sets.
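The jackknife test mentioned above is leave-one-out cross-validation: each sample in turn is held out, the predictor is built on the rest, and the held-out sample is classified. The sketch below shows only that evaluation loop. The 1-nearest-neighbour classifier and the function names are stand-ins chosen for brevity (assumptions), not the predictor or features described in the abstract.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def nearest_label(query: Sequence[float], train: List[Sample]) -> str:
    """Label of the training sample closest to `query` (squared Euclidean)."""
    best = min(train,
               key=lambda item: sum((a - b) ** 2
                                    for a, b in zip(item[0], query)))
    return best[1]


def jackknife_accuracy(samples: List[Sample]) -> float:
    """Fraction of samples classified correctly when each is held out once."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # everything except sample i
        if nearest_label(features, rest) == label:
            correct += 1
    return correct / len(samples)
```

Any classifier can be substituted for `nearest_label`; the point of the jackknife is that no sample is ever used to predict itself.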

10.
Expressed sequence tags (ESTs) from Coffea canephora leaves and fruits were used to search for types and frequencies of simple sequence repeats (EST–SSRs) with a motif length of 1–6 bp. From a non-redundant (NR) EST set of 5,534 potential unigenes, 6.8% SSR-containing sequences were identified, with an average density of one SSR every 7.73 kb of EST sequences. Trinucleotide repeats were found to be the most abundant (34.34%), followed by di- (25.75%) and hexa-nucleotide (22.04%) motifs. The development of unique genic SSR markers was optimized by a computational approach which allowed us to eliminate redundancy in the original EST set and also to test the specificity of each pair of designed primers. Twenty-five EST–SSRs were developed and used to evaluate cross-species transferability in the Coffea genus. The orthology was supported by the amplicon sequence similarity and the amplification patterns. The >94% identity of flanking sequences revealed high sequence conservation across the Coffea genus. A high level of polymorphic loci was obtained regardless of the species considered (from 75% for C. liberica to 86% for C. canephora). Moreover, the polymorphism revealed by EST–SSRs was similar to that exposed by genomic SSRs. It is concluded that Coffea ESTs are a valuable resource for microsatellite mining. EST–SSR markers developed from C. canephora sequences can be easily transferred to other Coffea species for which very little molecular information is available. They constitute a set of conserved orthologous markers, which would be ideal for assessing genetic diversity in coffee trees as well as for cross-referencing transcribed sequences in comparative genomics studies.
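A search for SSRs with motifs of 1–6 bp can be sketched with regular expressions. The abstract does not state the minimum repeat counts that were required, so the thresholds in `MIN_REPEATS`, like the function names, are assumptions made for illustration only.

```python
import re
from typing import List, Tuple

# Assumed minimum number of tandem copies per motif length (1-6 bp).
# These thresholds are illustrative, not the criteria used in the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif[:d] * (n // d) == motif:
            return False
    return True


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for each perfect SSR found in `seq`."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. "AA" or "ATAT"
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

The primitive-motif check keeps a run such as AAAAAAAAAAAA from also being reported as a dinucleotide or trinucleotide repeat. An SSR density like the one quoted above would then be the total sequence length divided by the number of hits.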

11.
Abstract

In this paper, we propose a new method based on the 2-D graphical representation to analyze the similarity of biological sequences and classify protein secondary structure sequences. Instead of computing characteristics from the distance matrix, the average area enclosed by the curve and the X axis is computed as a new invariant. The new method is tested on two sets: the coding sequences of 30 mitochondrial genes from NCBI and 12 protein secondary structure sequences. The similarity/dissimilarity and phylogenetic tree (dendrogram) of these sequences verify the validity of our method.

12.
Allostery is the phenomenon of changes in the structure and activity of proteins that appear as a consequence of ligand binding at sites other than the active site. Studying the mechanistic basis of allostery, leading to protein design with predetermined functional endpoints, is an important unmet need of synthetic biology. Here, we screened the amino acid sequence landscape in search of sequence signatures of allostery using the Recurrence Quantitative Analysis (RQA) method. A characteristic vector comprising 10 features extracted from RQA was defined for amino acid sequences. Using Principal Component Analysis, four factors were found to be important determinants of allosteric behavior. Our sequence-based predictor shows 82.6% accuracy, 85.7% sensitivity and 77.9% specificity on the current dataset. Further, we show that Laminarity-Mean-hydrophobicity, representing repeated hydrophobic patches, is the most crucial indicator of allostery. To the best of our knowledge, this is the first report that describes sequence determinants of allostery based on hydrophobicity. As an outcome of these findings, we plan to explore the possibility of inducing allostery in proteins.
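For reference, the three reported measures follow the usual definitions based on the counts of true/false positives and negatives. The sketch below assumes those standard definitions; the function name is hypothetical and any counts passed to it are the caller's own, not data from the study.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: positives (here, allosteric proteins) predicted right/wrong;
    tn/fp: negatives predicted right/wrong.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # fraction of positives recovered
        "specificity": tn / (tn + fp),   # fraction of negatives recovered
    }
```

Accuracy lies between sensitivity and specificity, weighted by how many positives and negatives the dataset contains, which is consistent with the three figures quoted above.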

13.
Diao Y, Ma D, Wen Z, Yin J, Xiang J, Li M. 《Amino acids》2008,34(1):111-117
Summary. Transmembrane (TM) proteins represent about 20–30% of the protein sequences in higher eukaryotes and play important roles across a range of cellular functions. Moreover, knowledge of the topology of these proteins often provides crucial hints about their function. Because experimental structure determination of TM proteins is difficult, theoretical methods that predict the topology of newly found proteins from their primary sequences are highly desirable, and are useful in both basic research and drug discovery. In this paper, based on the concept of pseudo amino acid composition (PseAA), which can incorporate sequence-order information of a protein sequence so as to remarkably enhance the power of discrete models (Chou, K. C., Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), cellular automata and Lempel-Ziv complexity are introduced to predict the TM regions of integral membrane proteins, including both α-helical and β-barrel membrane proteins, and the predictions are validated by the jackknife test. The results are quite promising and indicate that the current approach may be a useful high-throughput tool in the post-genomic era. The source code and dataset are available to academic users on request from liml@scu.edu.cn. Authors’ address: Menglong Li, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, P.R. China
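Lempel-Ziv complexity counts the number of new phrases met when a sequence is scanned from left to right. The sketch below implements the classic 1976 parsing, in which each phrase is extended for as long as it can still be copied from the text that precedes its last symbol. How the study turns this quantity into features for TM prediction is not described in the abstract, so the function name and its use here are illustrative assumptions.

```python
def lempel_ziv_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of `s`."""
    n = len(s)
    i = 0          # start of the current phrase
    count = 0
    while i < n:
        length = 1
        # Extend the phrase while it already occurs in the text seen so far
        # (everything before the phrase's final symbol, overlaps allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count
```

For the binary string 0001101001000101 the parsing is 0 · 001 · 10 · 100 · 1000 · 101, giving a complexity of 6. The same function works on amino acid strings, since it only compares characters.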

14.
Thus far, identification of functionally important residues in Type II restriction endonucleases (REases) has been difficult using conventional methods. Even though known REase structures share a fold and a marginally recognizable active site, the overall sequence similarities are statistically insignificant unless the comparison is restricted to proteins that recognize identical or very similar sequences. Bsp6I is a Type II REase that recognizes the palindromic DNA sequence 5′GCNGC and cleaves between the cytosine and the unspecified nucleotide in both strands, generating a double-strand break with 5′-protruding single nucleotides. There are no solved structures of REases that recognize similar DNA targets or generate cleavage products with similar characteristics. In straightforward comparisons, the Bsp6I sequence shows no significant similarity to REases with known structures. However, using a fold-recognition approach, we have identified a remote relationship between Bsp6I and the structure of PvuII. Starting from the sequence–structure alignment between Bsp6I and PvuII, we constructed a homology model of Bsp6I and used it to predict functionally significant regions in Bsp6I. The homology model was supported by site-directed mutagenesis of residues predicted to be important for dimerization, DNA binding and catalysis. Completing the picture of sequence–structure–function relationships in protein superfamilies is becoming an essential task in the age of structural genomics, and our study may serve as a paradigm for future analyses of superfamilies comprising strongly diverged members with little or no sequence similarity.

15.
Cross–scale interactions refer to processes at one spatial or temporal scale interacting with processes at another scale to result in nonlinear dynamics with thresholds. These interactions change the pattern–process relationships across scales such that fine-scale processes can influence a broad spatial extent or a long time period, or broad-scale drivers can interact with fine-scale processes to determine system dynamics. Cross–scale interactions are increasingly recognized as having important influences on ecosystem processes, yet they pose formidable challenges for understanding and forecasting ecosystem dynamics. In this introduction to the special feature, “Cross–scale interactions and pattern–process relationships”, we provide a synthetic framework for understanding the causes and consequences of cross–scale interactions. Our framework focuses on the importance of transfer processes and spatial heterogeneity at intermediate scales in linking fine- and broad-scale patterns and processes. Transfer processes and spatial heterogeneity can either amplify or attenuate system response to broad-scale drivers. Providing a framework to explain cross–scale interactions is an important step in improving our understanding and ability to predict the impacts of propagating events and to ameliorate these impacts through proactive measures.

16.
Analysis of mitochondrial 16S rRNA sequences of four specimens from two lizard species (Leiolepis guentherpetersi and L. reevesii) showed an identity of 91.1–91.6% and genetic distances of 8.8–9.3%. The two specimens of L. reevesii (C5 and C7) have a homology of 96.5–99.4% with L. belliana and L. reevesii, respectively, whereas those of L. guentherpetersi (S4 and S6) have a higher homology of 99.6–100% with L. guttata and L. guentherpetersi, respectively. The mitochondrial 16S rRNA sequences of the individuals from L. guentherpetersi (S4 and S6) and L. reevesii (C5 and C7) were deposited in GenBank under accession numbers EU428186, EU428187, EU428188, and EU428189, respectively.
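Percent identity between two aligned sequences, and the simplest genetic distance derived from it (the uncorrected p-distance), can be computed as sketched below. The abstract does not say which distance model was used, and the reported identities and distances do not sum exactly to 100%, so the study's distances may well come from a corrected model; the p-distance and the function names here are assumptions for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: percent of ungapped columns that differ."""
    return 100.0 - percent_identity(a, b)
```

With this definition, two 10-column sequences that differ at one position have 90% identity and a p-distance of 10%.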

17.

Background  

The subcellular localisation of proteins in intact living cells is an important means of gaining information about protein functions. Even dynamic processes, which can barely be predicted from amino acid sequences, can be captured. Besides increasing our knowledge about intracellular processes, this information facilitates the development of innovative therapies and new diagnostic methods. To perform such a localisation, the proteins under analysis are usually fused with a fluorescent protein, so that they can be observed with a fluorescence microscope and analysed. In recent years, several automated methods have been proposed for performing such analyses. Two different types of approaches can be distinguished: techniques that recognise a fixed set of protein locations and methods that identify new ones. To our knowledge, a combination of both approaches – i.e. a technique that enables supervised learning using a known set of protein locations and is able to identify and incorporate new protein locations afterwards – has not been presented yet. Furthermore, associated problems, e.g. the recognition of the cells to be analysed, have usually been neglected.

18.
In this work, the variability of the spo0A gene in the genus Geobacillus and the applicability of this gene for taxonomy within the genus were evaluated. The protein Spo0A is the master regulator of the endospore-forming process in all endospore-forming bacteria. Geobacillus genus-specific primers GEOSPO were designed based on the Geobacillus spo0A sequences available in public databases. Inter- and intraspecific variability of the Geobacillus spo0A gene was determined after sequencing of the GEOSPO-PCR products. Geobacillus spo0A sequence analysis showed that three species (Geobacillus thermodenitrificans, G. stearothermophilus, and G. jurassicus) could be easily identified. Similarity between the sequences of these species and those of the other species was in the range of 83.3%–92.0%. In contrast, intraspecific similarity of G. thermodenitrificans and G. stearothermophilus was high (above 99.0%). Similarity of spo0A sequences within the G. subterraneus–G. uzenensis species cluster also matched this interval. Intercluster similarity between the G. lituanicus–G. thermoleovorans–G. kaustophilus–G. vulcani and G. thermocatenulatus–G. gargensis–G. caldoxylosilyticus–G. toebii–G. thermoglucosidasius species clusters, as well as interspecific similarity within these two clusters, was in the range of the intraspecific similarity determined for G. thermodenitrificans and G. stearothermophilus. It was also determined that spo0A cannot be used as a phylogenetic marker for the genus Geobacillus.

19.
The effects of rotenone and pyridaben on the activities and properties of rat brain mitochondria were tested by determining Ki (inhibitor concentration at half-maximal inhibition) and Imax (% inhibition at maximal inhibitor concentration). The assayed activities were complexes I, II and IV, respiration in states 3, 3u (uncoupled) and 4, the biochemical and functional activities of mitochondrial nitric oxide synthase (mtNOS), and inner membrane potential. Selective inhibition of complex I activity, mitochondrial respiration and membrane potential with malate-glutamate as substrate was observed, with a Ki of 0.28–0.36 nmol inhibitor/mg of mitochondrial protein. Functional mtNOS activity was half-inhibited at 0.70–0.74 nmol inhibitor/mg protein in state 3 mitochondria and at 2.52–2.98 nmol inhibitor/mg protein in state 3u mitochondria. This is interpreted as an indication that mtNOS is structurally adjacent to complex I, with an intermolecular mtNOS-complex I hydrophobic bonding that is stronger at high Δψ and weaker at low Δψ.

20.
Xiao X, Shao S, Ding Y, Huang Z, Chou KC. 《Amino acids》2006,30(1):49-54
Summary. The avalanche of newly found protein sequences in the post-genomic era has motivated and challenged us to develop an automated method that can rapidly and accurately predict the localization of an uncharacterized protein in cells, because the knowledge thus obtained can greatly speed up the process of finding its biological functions. However, it is very difficult to establish such a predictor by acquiring the key statistical information buried in a pile of extremely complicated and highly variable sequences. In this paper, based on the concept of the pseudo amino acid composition (Chou, K. C. PROTEINS: Structure, Function, and Genetics, 2001, 43: 246–255), the cellular automata image approach is introduced to cope with this problem. Many important features that are originally hidden in the long amino acid sequences can be clearly displayed through their cellular automata images. One remarkable merit of doing so is that many image recognition tools can be straightforwardly applied to the prediction target. High success rates were observed in the self-consistency, jackknife, and independent dataset tests.
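A cellular automata image of a sequence can be sketched as follows: the residues are encoded as a row of bits, an elementary (two-state, nearest-neighbour) cellular automaton is applied repeatedly, and the successive rows are stacked into a binary image. The 5-bit code derived from the residue's position in `AMINO_ACIDS`, the periodic boundary, the default rule number and the function names are all assumptions made for this illustration; the abstract does not specify the encoding or the rule that was used.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes, alphabetical


def encode(seq: str) -> List[int]:
    """Hypothetical encoding: 5 bits per residue from its alphabet index."""
    bits: List[int] = []
    for aa in seq:
        idx = AMINO_ACIDS.index(aa)
        bits.extend(int(b) for b in format(idx, "05b"))
    return bits


def step(row: List[int], rule: int) -> List[int]:
    """Apply one generation of an elementary CA with periodic boundaries."""
    n = len(row)
    return [
        # Bit number (left*4 + centre*2 + right) of `rule` is the new state.
        (rule >> (row[(i - 1) % n] * 4 + row[i] * 2 + row[(i + 1) % n])) & 1
        for i in range(n)
    ]


def ca_image(seq: str, rule: int = 84, generations: int = 8) -> List[List[int]]:
    """Rows of the image: the encoded sequence followed by its evolution."""
    rows = [encode(seq)]
    for _ in range(generations):
        rows.append(step(rows[-1], rule))
    return rows
```

The resulting list of rows can be treated as a black-and-white picture, from which texture or other image features could then be extracted for classification.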
