Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
MOTIVATION: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA. RESULTS: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants. AVAILABILITY: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. 
The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi

2.
The present century has witnessed an unprecedented rise in genome sequences owing to various genome-sequencing programs. However, the same has not been replicated with cDNA or expressed sequence tags (ESTs). Hence, prediction of the protein-coding sequence of genes from this enormous collection of genomic sequences presents a significant challenge. While robust high-throughput methods of cloning and expression could be used to meet protein requirements, the lack of intron information creates a bottleneck. Computational programs designed for recognizing intron–exon boundaries for a particular organism or group of organisms have their own limitations. Keeping this in view, we describe here a method for construction of an intron-less gene from genomic DNA in the absence of cDNA/EST information and an organism-specific gene prediction program. The method outlined is a sequential application of bioinformatics to predict correct intron–exon boundaries and splicing by overlap extension PCR for spliced gene synthesis. The gene construct so obtained can then be cloned for protein expression. The method is simple and can be used for any eukaryotic gene expression.

3.
PCP: a program for supervised classification of gene expression profiles   Cited by: 1 (self-citations: 0; citations by others: 1)
PCP (Pattern Classification Program) is an open-source machine learning program for supervised classification of patterns (vectors of measurements). The principal use of PCP in bioinformatics is design and evaluation of classifiers for use in clinical diagnostic tests based on measurements of gene expression. PCP implements leading pattern classification and gene selection algorithms and incorporates cross-validation estimation of classifier performance. Importantly, the implementation integrates gene selection and class prediction stages, which is vital for computing reliable performance estimates in small-sample scenarios. Additionally, the program includes automated and efficient model selection (optimization of parameters) for the support vector machine (SVM) classifier. The distribution includes Linux and Windows/Cygwin binaries. The program can easily be ported to other platforms. AVAILABILITY: Free download at http://pcp.sourceforge.net
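The point about integrating gene selection with class prediction can be sketched as follows: features are re-selected inside every cross-validation fold, using training samples only, so the held-out samples never influence which genes are chosen. This is not PCP's code. The mean-difference ranking, the nearest-centroid classifier, the function names and the toy data are all assumptions chosen to keep the example short.

```python
def mean(values):
    return sum(values) / len(values)


def select_features(X, y, k):
    """Rank features by absolute difference of the two class means; return top-k indices."""
    scores = []
    for j in range(len(X[0])):
        m0 = mean([row[j] for row, label in zip(X, y) if label == 0])
        m1 = mean([row[j] for row, label in zip(X, y) if label == 1])
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_centroids(X, y, features):
    """Compute one centroid per class, restricted to the selected features."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean([r[j] for r in rows]) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist2(centroid):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def cross_validated_accuracy(X, y, n_folds, k):
    """k-feature nearest-centroid accuracy; every training fold must contain both classes."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        features = select_features(X_tr, y_tr, k)  # selection sees training data only
        centroids = fit_centroids(X_tr, y_tr, features)
        correct += sum(predict(centroids, X[i], features) == y[i] for i in test)
    return correct / len(X)


# Toy expression matrix (hypothetical): feature 0 separates the classes, 1 and 2 are noise.
X = [[0.0, 5.0, 1.0], [1.0, 4.0, 2.0], [0.5, 6.0, 1.5], [1.5, 5.0, 2.5],
     [10.0, 5.0, 2.0], [9.0, 4.0, 1.0], [10.5, 6.0, 2.5], [9.5, 5.0, 1.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
accuracy = cross_validated_accuracy(X, y, n_folds=4, k=1)
```

Selecting genes once on the full data set and cross-validating only the classifier would let test samples leak into the selection step, which tends to inflate accuracy estimates when samples are few and features are many.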

4.
miRTour: Plant miRNA and target prediction tool   Cited by: 1 (self-citations: 0; citations by others: 1)
MicroRNAs (miRNAs) are important negative regulators of gene expression in plants and animals, which are endogenously produced from their own genes. A computational comparative approach based on evolutionary conservation of mature miRNAs has revealed a number of orthologs of known miRNAs in different plant species. The homology-based plant miRNA discovery, followed by target prediction, comprises several steps, which have so far been done manually. Here, we present the bioinformatics pipeline miRTour which automates all the steps of miRNA similarity search, miRNA precursor selection, target prediction and annotation, each of them performed with the same set of input sequences. AVAILABILITY: The database is available for free at http://bio2server.bioinfo.uni-plovdiv.bg/miRTour/

5.
SUMMARY: MuSeqBox is a program to parse BLAST output and store attributes of BLAST hits in tabular form. The user can apply a number of selection criteria to filter out hits with particular attributes. MuSeqBox provides a powerful annotation tool for large sets of query sequences that are simultaneously compared against a database with any of the standard stand-alone or network-client BLAST programs. We discuss such application to the problem of annotation and analysis of EST collections. AVAILABILITY: The program was written in standard C++ and is freely available to noncommercial users by request from the authors. The program is also available over the web at http://bioinformatics.iastate.edu/bioinformatics2go/mb/MuSeqBox.html.
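The parse-then-filter idea can be sketched in a few lines. This is not MuSeqBox itself: the sketch assumes the 12-column tab-separated BLAST+ output (`-outfmt 6`) as input, and the `Hit` record, the function names and the thresholds are hypothetical choices for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float    # percent identity
    length: int      # alignment length
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Parse default 12-column tabular BLAST lines into Hit records."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11])))
    return hits


def filter_hits(hits, max_evalue=1e-5, min_pident=0.0, min_length=0):
    """Keep only hits that satisfy all selection criteria."""
    return [h for h in hits
            if h.evalue <= max_evalue and h.pident >= min_pident and h.length >= min_length]


# Made-up example rows: query, subject, %identity, length, mismatches, gap opens,
# query start/end, subject start/end, E-value, bit score.
sample = [
    "est1\tgeneA\t98.5\t200\t3\t0\t1\t200\t51\t250\t1e-100\t380",
    "est1\tgeneB\t75.0\t80\t20\t0\t1\t80\t10\t89\t0.002\t45.1",
    "est2\tgeneA\t91.2\t150\t13\t0\t5\t154\t300\t449\t2e-60\t250",
]
strong = filter_hits(parse_tabular(sample), max_evalue=1e-10, min_pident=90.0)
```

For an EST collection, the surviving hits could then be grouped by query to pick one best annotation per EST.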

6.
7.
A Bayesian framework for combining gene predictions   Cited by: 2 (self-citations: 0; citations by others: 2)
MOTIVATION: Gene identification and gene discovery in new genomic sequences is one of the most timely computational questions addressed by bioinformatics scientists. This computational research has resulted in several systems that have been used successfully in many whole-genome analysis projects. As the number of such systems grows the need for a rigorous way to combine the predictions becomes more essential. RESULTS: In this paper we provide a Bayesian network framework for combining gene predictions from multiple systems. The framework allows us to treat the problem as combining the advice of multiple experts. Previous work in the area used relatively simple ideas such as majority voting. We introduce, for the first time, the use of hidden input/output Markov models for combining gene predictions. We apply the framework to the analysis of the Adh region in Drosophila that has been carefully studied in the context of gene finding and used as a basis for the GASP competition. The main challenge in combination of gene prediction programs is the fact that the systems are relying on similar features such as codon usage and as a result the predictions are often correlated. We show that our approach is promising to improve the prediction accuracy and provides a systematic and flexible framework for incorporating multiple sources of evidence into gene prediction systems.

8.
Wang Y, Xue Z, Xu J. Proteins, 2006, 65(1): 49-54
We have developed a novel method named AlphaTurn to predict alpha-turns in proteins based on the support vector machine (SVM). The prediction was done on a data set of 469 nonhomologous proteins containing 967 alpha-turns. A great improvement in prediction performance was achieved by using multiple sequence alignment generated by PSI-BLAST as input instead of the single amino acid sequence. The introduction of secondary structure information predicted by PSIPRED also improved the prediction performance. Moreover, we handled the very uneven data set by combining the cost factor j with the "state-shifting" rule. This further promoted the prediction quality of our method. The final SVM model yielded a Matthews correlation coefficient (MCC) of 0.25 by a 10-fold cross-validation. To our knowledge, this MCC value is the highest obtained so far for predicting alpha-turns. An online Web server based on this method has been developed and can be freely accessed at http://bmc.hust.edu.cn/bioinformatics/ or http://210.42.106.80/.
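Because turn residues are rare, the abstract reports MCC rather than plain accuracy. A minimal sketch of the standard MCC formula from confusion-matrix counts shows why; the function names and the example counts are hypothetical and are not results from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal is empty (the usual convention for 0/0).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical data set: 1000 residues, 30 of them in turns.
# A predictor that labels every residue "non-turn" scores high accuracy but zero MCC.
always_negative = dict(tp=0, tn=970, fp=0, fn=30)
acc_trivial = accuracy(**always_negative)
mcc_trivial = mcc(**always_negative)
```

On data this skewed, accuracy rewards the trivial predictor, while MCC stays at zero until the method finds true positives without a matching flood of false positives.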

9.
MOTIVATION: We describe algorithms implemented in a new software package, RNAbor, to investigate structures in a neighborhood of an input secondary structure S of an RNA sequence s. The input structure could be the minimum free energy structure, the secondary structure obtained by analysis of the X-ray structure or by comparative sequence analysis, or an arbitrary intermediate structure. RESULTS: A secondary structure T of s is called a delta-neighbor of S if T and S differ by exactly delta base pairs. RNAbor computes the number (N(delta)), the Boltzmann partition function (Z(delta)) and the minimum free energy (MFE(delta)) and corresponding structure over the collection of all delta-neighbors of S. This computation is done simultaneously for all delta ≤ m, in run time O(mn^3) and memory O(mn^2), where n is the sequence length. We apply RNAbor for the detection of possible RNA conformational switches, and compare RNAbor with the switch detection method paRNAss. We also provide examples of how RNAbor can at times improve the accuracy of secondary structure prediction. AVAILABILITY: http://bioinformatics.bc.edu/clotelab/RNAbor/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
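The delta-neighbor definition can be made concrete with a short sketch that computes how many base pairs two structures differ by. It assumes structures are given in dot-bracket notation without pseudoknots and that "differ by delta base pairs" means the size of the symmetric difference of the two base-pair sets. The function names are hypothetical, and this is only the distance, not RNAbor's partition-function algorithm.

```python
def base_pairs(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def base_pair_distance(s, t):
    """Number of base pairs present in exactly one of the two structures."""
    if len(s) != len(t):
        raise ValueError("structures must have the same length")
    return len(base_pairs(s) ^ base_pairs(t))


def is_delta_neighbor(s, t, delta):
    return base_pair_distance(s, t) == delta


S = "((..))"
```

For example, removing the inner pair of `((..))` gives `(....)`, a 1-neighbor, while the open chain `......` is a 2-neighbor.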

10.
11.
A prediction scheme has been developed for the IBM PC and compatibles containing computer programs which make use of the protein secondary structure prediction algorithms of Nagano (1977a,b), Garnier et al. (1978), Burgess et al. (1974), Chou and Fasman (1974a,b), Lim (1974) and Dufton and Hider (1977). The results of the individual prediction methods are combined as described by Hamodrakas et al. (1982) by the program PLOTPROG to produce joint prediction histograms for a protein, for three types of secondary structure: α-helix, β-sheet and β-turns. The scheme requires uniform input for the prediction programs, produced by any word processor, spreadsheet, editor or database program and produces uniform output on a printer, a graphics screen or a file. The scheme is independent of any additional software and runs under DOS 2.0 or later releases. Received on January 26, 1988; accepted on May 24, 1988

12.
MOTIVATION: The prediction of beta-turns is an important element of protein secondary structure prediction. Recently, a highly accurate neural network based method Betatpred2 has been developed for predicting beta-turns in proteins using position-specific scoring matrices (PSSM) generated by PSI-BLAST and secondary structure information predicted by PSIPRED. However, the major limitation of Betatpred2 is that it predicts only beta-turn and non-beta-turn residues and does not provide any information of different beta-turn types. Thus, there is a need to predict beta-turn types using an approach based on multiple sequence alignment, which will be useful in overall tertiary structure prediction. RESULTS: In the present work, a method has been developed for the prediction of beta-turn types I, II, IV and VIII. For each turn type, two consecutive feed-forward back-propagation networks with a single hidden layer have been used where the first sequence-to-structure network has been trained on single sequences as well as on PSI-BLAST PSSM. The output from the first network along with PSIPRED predicted secondary structure has been used as input for the second-level structure-to-structure network. The networks have been trained and tested on a non-homologous dataset of 426 protein chains by 7-fold cross-validation. It has been observed that the prediction performance for each turn type is improved significantly by using multiple sequence alignment. The performance has been further improved by using a second level structure-to-structure network and PSIPRED predicted secondary structure information. It has been observed that Type I and II beta-turns have better prediction performance than Type IV and VIII beta-turns. The final network yields an overall accuracy of 74.5, 93.5, 67.9 and 96.5% with MCC values of 0.29, 0.29, 0.23 and 0.02 for Type I, II, IV and VIII beta-turns, respectively, and is better than random prediction. AVAILABILITY: A web server for prediction of beta-turn types I, II, IV and VIII based on the above approach is available at http://www.imtech.res.in/raghava/betaturns/ and http://bioinformatics.uams.edu/mirror/betaturns/ (mirror site).

13.
Comparative sequence analysis is a powerful approach to identify functional elements in genomic sequences. Herein, we describe AGenDA (Alignment-based GENe Detection Algorithm), a novel method for gene prediction that is based on long-range alignment of syntenic regions in eukaryotic genome sequences. Local sequence homologies identified by the DIALIGN program are searched for conserved splice signals to define potential protein-coding exons; these candidate exons are then used to assemble complete gene structures. The performance of our method was tested on a set of 105 human-mouse sequence pairs. These test runs showed that sensitivity and specificity of AGenDA are comparable with the best gene-prediction program that is currently available. However, since our method is based on a completely different type of input information, it can detect genes that are not detectable by standard methods and vice versa. Thus, our approach seems to be a useful addition to existing gene-prediction programs. Availability: DIALIGN is available through the Bielefeld Bioinformatics Server (BiBiServ) at http://bibiserv.techfak.uni-bielefeld.de/dialign/. The gene-prediction program AGenDA described in this paper will be available through the BiBiServ or MIPS web server at http://mips.gsf.de.

14.
Our current biological knowledge is spread over many independent bioinformatics databases where many different types of gene and protein identifiers are used. The heterogeneous and redundant nature of these identifiers limits data analysis across different bioinformatics resources. It is an even more serious bottleneck of data analysis for larger datasets, such as gene lists derived from microarray and proteomic experiments. The DAVID Gene ID Conversion Tool (DICT), a web-based application, is able to convert a user's input gene or gene product identifiers from one type to another in a more comprehensive and high-throughput manner with a uniquely enhanced ID-ID mapping database.

15.
Chen H, Kihara D. Proteins, 2011, 79(1): 315-334
Computational protein structure prediction remains a challenging task in protein bioinformatics. In recent years, the importance of template-based structure prediction has been increasing because of the growing number of protein structures solved by the structural genomics projects. To capitalize on the significant efforts and investments made in the structural genomics projects, it is urgent to establish effective ways to use the solved structures as templates by developing methods for exploiting remotely related proteins that cannot be simply identified by homology. In this work, we examine the effect of using suboptimal alignments in template-based protein structure prediction. We showed that suboptimal alignments are often more accurate than the optimal one, and such accurate suboptimal alignments can occur even at a very low rank of the alignment score. Suboptimal alignments contain a significant number of correct amino acid residue contacts. Moreover, suboptimal alignments can improve template-based models when used as input to Modeller. Finally, we use suboptimal alignments for handling a contact potential in a probabilistic way in a threading program, SUPRB. The probabilistic contacts strategy outperforms the partly thawed approach, which only uses the optimal alignment in defining residue contacts, and also the re-ranking strategy, which uses the contact potential in re-ranking alignments. The comparison with existing methods in the template-recognition test shows that SUPRB is very competitive and outperforms existing methods.

16.
Prediction of both conserved and nonconserved microRNA targets in animals   Cited by: 2 (self-citations: 0; citations by others: 2)
MOTIVATION: MicroRNAs (miRNAs) are involved in many diverse biological processes and they may potentially regulate the functions of thousands of genes. However, one major issue in miRNA studies is the lack of bioinformatics programs to accurately predict miRNA targets. Animal miRNAs have limited sequence complementarity to their gene targets, which makes it challenging to build target prediction models with high specificity. RESULTS: Here we present a new miRNA target prediction program based on support vector machines (SVMs) and a large microarray training dataset. By systematically analyzing public microarray data, we have identified statistically significant features that are important to target downregulation. Heterogeneous prediction features have been non-linearly integrated in an SVM machine learning framework for the training of our target prediction model, MirTarget2. About half of the predicted miRNA target sites in humans are not conserved in other organisms. Our prediction algorithm has been validated with independent experimental data for its improved performance on predicting a large number of miRNA down-regulated gene targets. AVAILABILITY: All the predicted targets were imported into an online database miRDB, which is freely accessible at http://mirdb.org.

17.
SCWRL and MolIDE are software applications for prediction of protein structures. SCWRL is designed specifically for the task of prediction of side-chain conformations given a fixed backbone usually obtained from an experimental structure determined by X-ray crystallography or NMR. SCWRL is a command-line program that typically runs in a few seconds. MolIDE provides a graphical interface for basic comparative (homology) modeling using SCWRL and other programs. MolIDE takes an input target sequence and uses PSI-BLAST to identify and align templates for comparative modeling of the target. The sequence alignment to any template can be manually modified within a graphical window of the target-template alignment and visualization of the alignment on the template structure. MolIDE builds the model of the target structure on the basis of the template backbone, predicted side-chain conformations with SCWRL and a loop-modeling program for insertion-deletion regions with user-selected sequence segments. SCWRL and MolIDE can be obtained at (http://dunbrack.fccc.edu/Software.php).

18.
MOTIVATION: Dynamic programming is probably the most popular programming method in bioinformatics. Sequence comparison, gene recognition, RNA structure prediction and hundreds of other problems are solved by ever new variants of dynamic programming. Currently, the development of a successful dynamic programming algorithm is a matter of experience, talent and luck. The typical matrix recurrence relations that make up a dynamic programming algorithm are intricate to construct, and difficult to implement reliably. No general problem-independent guidance is available. RESULTS: This article introduces a systematic method for constructing dynamic programming solutions to problems in biosequence analysis. By a conceptual splitting of the algorithm into a recognition and an evaluation phase, algorithm development is simplified considerably, and correct recurrences can be derived systematically. Without additional effort, the method produces an early, executable prototype expressed in a functional programming language. The method is quite generally applicable, and, while programming effort decreases, no overhead in terms of ultimate program efficiency is incurred.
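The split into a recognition phase and an evaluation phase can be illustrated with a small sketch. One recurrence enumerates the candidate alignment steps, and an interchangeable "evaluation algebra" decides how candidates are scored and combined. The article's own prototype is written in a functional language; this Python version is only an assumption-laden illustration, and the class and method names (`UnitCost`, `Count`, `choose`, and so on) are hypothetical.

```python
from functools import lru_cache


class UnitCost:
    """Evaluation algebra: minimal number of edit operations."""
    def empty(self): return 0
    def replace(self, a, b, rest): return rest + (0 if a == b else 1)
    def delete(self, a, rest): return rest + 1
    def insert(self, b, rest): return rest + 1
    def choose(self, candidates): return min(candidates)


class Count:
    """Evaluation algebra: number of distinct alignments in the search space."""
    def empty(self): return 1
    def replace(self, a, b, rest): return rest
    def delete(self, a, rest): return rest
    def insert(self, b, rest): return rest
    def choose(self, candidates): return sum(candidates)


def align(x, y, algebra):
    """Recognition phase: enumerate alignment steps; the algebra evaluates them.

    Memoised recursion, so it is only suitable for short sequences
    (Python's recursion limit applies).
    """
    @lru_cache(maxsize=None)
    def rec(i, j):
        if i == len(x) and j == len(y):
            return algebra.empty()
        candidates = []
        if i < len(x) and j < len(y):
            candidates.append(algebra.replace(x[i], y[j], rec(i + 1, j + 1)))
        if i < len(x):
            candidates.append(algebra.delete(x[i], rec(i + 1, j)))
        if j < len(y):
            candidates.append(algebra.insert(y[j], rec(i, j + 1)))
        return algebra.choose(candidates)

    return rec(0, 0)


edit_distance = align("kitten", "sitting", UnitCost())
n_alignments = align("ab", "a", Count())
```

Swapping `UnitCost` for `Count` changes what is computed without touching the recurrence, which is the practical benefit of keeping the two phases apart.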

19.
Prediction of β-turns from amino acid sequences has long been recognized as an important problem in structural bioinformatics due to their frequent occurrence as well as their structural and functional significance. Because various structural features of proteins are intercorrelated, secondary structure information has been often employed as an additional input for machine learning algorithms while predicting β-turns. Here we present a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN) capable of predicting multiple mutually dependent structural motifs and demonstrate its efficiency in recognizing three aspects of protein structure: β-turns, β-turn types, and secondary structure. The advantage of our method compared to other predictors is that it does not require any external input except for sequence profiles because interdependencies between different structural features are taken into account implicitly during the learning process. In a sevenfold cross-validation experiment on a standard test dataset our method exhibits the total prediction accuracy of 77.9% and the Matthews correlation coefficient of 0.45, the highest performance reported so far. It also outperforms other known methods in delineating individual turn types. We demonstrate how simultaneous prediction of multiple targets influences prediction performance on single targets. The MOLEBRNN presented here is a generic method applicable in a variety of research fields where multiple mutually dependent target classes need to be predicted. Availability: http://webclu.bio.wzw.tum.de/predator-web/.

20.