20 similar articles found.
1.
Background
We describe Distill, a suite of servers for the prediction of protein structural features: secondary structure; relative solvent accessibility; contact density; backbone structural motifs; residue contact maps at 6, 8 and 12 Angstrom; coarse protein topology. The servers are based on large-scale ensembles of recursive neural networks and trained on large, up-to-date, non-redundant subsets of the Protein Data Bank. Together with structural feature predictions, Distill includes a server for prediction of Cα traces for short proteins (up to 200 amino acids).
2.
SUMMARY: ESTminer is a collection of programs that use expressed sequence tag (EST) data from inbred genomes to identify unique genes within gene families. The algorithm utilizes Cap3 to perform an initial clustering of related EST sequences to produce a consensus sequence of a gene family. These consensus sequences are then used to collect all ESTs in the original EST library that are related using BLAST. A redundancy based criterion is applied to each EST to identify reliable unique gene-sequences. Using a highly inbred genome as a source of ESTs eliminates the necessity of computing covariance on each polymorphism to identify alleles of the same gene, thus making this algorithm more streamlined than other alternatives which must computationally attempt to distinguish genes from alleles. AVAILABILITY: The programs were written in PERL and are freely available at http://www.soybase.org/publication_data/Nelson/ESTminer/ESTminer.html CONTACT: nelsonrt@iastate.edu SUPPLEMENTARY INFORMATION: Figures and dataset can be obtained from: http://www.soybase.org/publication_data/Nelson/ESTminer/ESTminer.html.
3.
Dyer RJ 《Molecular ecology resources》2009,9(1):110-113
The analysis of genetic marker data is increasingly being conducted in the context of the spatial arrangement of strata (e.g. populations), necessitating a more flexible set of analysis tools. GeneticStudio consists of four interacting programs: (i) Geno, a spreadsheet-like interface for the analysis of spatially explicit marker-based genetic variation; (ii) Graph, software for the analysis of Population Graph and network topologies; (iii) Manteller, a general-purpose program for matrix analysis; and (iv) SNPFinder, a program for identifying single nucleotide polymorphisms. The GeneticStudio suite is available as source code as well as binaries for OSX and Windows and is distributed under the GNU General Public License.
4.
A novel alignment-free method for computing functional similarity of membrane proteins based on features of hydropathy distribution is presented. The features of hydropathy distribution are used to represent protein families as hydropathy profiles. The profiles statistically summarize the hydropathy distribution of member proteins. The summation is made by using hydropathy features that numerically represent structurally/functionally significant portions of protein sequences. The hydropathy profiles are numerical vectors that are points in a high dimensional 'hydropathy' space. Their similarities are identified by projection of the space onto principal axes. Here, the approach is applied to the secondary transporters. The analysis using the presented approach is validated by the standard classification of the secondary transporters. The presented analysis allows for prediction of function attributes for proteins of uncharacterized families of secondary transporters. The results obtained using the presented analysis may help to characterize unknown function attributes of secondary transporters. They also show that analysis of hydropathy distribution can be used for function prediction of membrane proteins.
5.
We describe a novel presentation of the conformation of the backbone atoms for proteins of known structure. Given the Cα atom cartesian co-ordinates from X-ray crystallography, a matrix is calculated, where the ijth element of the matrix is the cosine of the angle between the direction of the chain at residue i and the direction of the chain at residue j. These “direction matrices” have distinctive patterns which correspond to α-helix, extended structure, straight or bent segments, “superhelix”, and many other important structural features. We discuss the direction matrices for a number of proteins, and make some generalizations on the basic principles of protein folding.
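The matrix construction described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' program: the abstract does not say how the "direction of the chain at residue i" is computed, so the sketch assumes it is the unit vector from Cα(i-1) to Cα(i+1). The function names `chain_directions` and `direction_matrix` are hypothetical.

```python
import math


def chain_directions(ca_coords):
    """Return one unit vector per interior residue.

    Assumption (not stated in the abstract): the chain direction at
    residue i is the normalized vector from CA(i-1) to CA(i+1).
    The first and last residues therefore get no direction.
    """
    directions = []
    for i in range(1, len(ca_coords) - 1):
        diff = [b - a for a, b in zip(ca_coords[i - 1], ca_coords[i + 1])]
        norm = math.sqrt(sum(x * x for x in diff))
        directions.append([x / norm for x in diff])
    return directions


def direction_matrix(ca_coords):
    """Element [i][j] is the cosine of the angle between directions i and j."""
    dirs = chain_directions(ca_coords)
    # The dot product of two unit vectors equals the cosine of their angle.
    return [[sum(a * b for a, b in zip(u, v)) for v in dirs] for u in dirs]


# A toy trace: two residues along x, then a right-angle bend along y.
bent = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (2, 2, 0)]
matrix = direction_matrix(bent)
```

Under this assumption, a straight segment gives a block of values near 1, and two segments at right angles give values near 0 in the corresponding off-diagonal block. Coincident coordinates would cause a division by zero and are not handled here.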
6.
7.
Cost-effective ways of controlling inbreeding in conservation or productive plantations imply the allocation of individuals reducing the possibility of close relatives' mating and, consequently, limiting inbreeding. sofsog is a suite of programs which helps to design plantation sites. First, if the plantation scheme involves several plots, it allows distribution of the available individuals among different sites, minimizing within-site global coancestry. Then, it yields a plantation design for each site, either following the classical permutated neighbourhood strategy or the recently developed method by Fernández and González-Martínez. This new method allows different pollen dispersion kernels to be implemented, and any available information on individual relationships, reproductive success, differences in phenology, etc., to be included in the design strategy via weighting or penalization matrices. Additionally, the package includes a tool for calculating the molecular coancestry (Identity By State) from codominant marker data.
8.
Ashraf Yaseen Mais Nijim Brandon Williams Lei Qian Min Li Jianxin Wang Yaohang Li 《BMC bioinformatics》2016,17(8):281
Background
The fluctuation of atoms around their average positions in protein structures provides important information regarding protein dynamics. This flexibility of protein structures is associated with various biological processes. Predicting flexibility of residues from protein sequences is significant for analyzing the dynamic properties of proteins which will be helpful in predicting their functions.
Results
In this paper, an approach of improving the accuracy of protein flexibility prediction is introduced. A neural network method for predicting flexibility in 3 states is implemented. The method incorporates sequence and evolutionary information, context-based scores, predicted secondary structures and solvent accessibility, and amino acid properties. Context-based statistical scores are derived, using the mean-field potentials approach, for describing the different preferences of protein residues in flexibility states taking into consideration their amino acid context. The 7-fold cross validated accuracy reached 61% when context-based scores and predicted structural states are incorporated in the training process of the flexibility predictor.
Conclusions
Context-based statistical scores and predicted structural states are important features for improving the performance of protein flexibility prediction, as shown by our computational results. Our prediction method is implemented as a web service called “FLEXc” and is available online at: http://hpcr.cs.odu.edu/flexc.
9.
Holden JF Poole II FL Tollaksen SL Giometti CS Lim H Yates III JR Adams MW 《Comparative and Functional Genomics》2001,2(5):275-288
Cell-free extracts from the hyperthermophilic archaeon Pyrococcus furiosus were separated into membrane and cytoplasmic fractions and each was analyzed by 2D-gel electrophoresis. A total of 66 proteins were identified, 32 in the membrane fraction and 34 in the cytoplasmic fraction. Six prediction programs were used to predict the subcellular locations of these proteins. Three were based on signal-peptides (SignalP, TargetP, and SOSUISignal) and three on transmembrane-spanning alpha-helices (TSEG, SOSUI, and PRED-TMR2). A consensus of the six programs predicted that 23 of the 32 proteins (72%) from the membrane fraction should be in the membrane and that all of the proteins from the cytoplasmic fraction should be in the cytoplasm. Two membrane-associated proteins predicted to be cytoplasmic by the programs are also predicted to consist primarily of transmembrane-spanning beta-sheets using porin protein models, suggesting that they are, in fact, membrane components. An ATPase subunit homolog found in the membrane fraction, although predicted to be cytoplasmic, is most likely complexed with other ATPase subunits in the membrane fraction. An additional three proteins predicted to be cytoplasmic but found in the membrane fraction may be cytoplasmic contaminants. These include a chaperone homolog that may have attached to denatured membrane proteins during cell fractionation. Omitting these three proteins would boost the membrane-protein predictability of the models to near 80%. A consensus prediction using all six programs for all 2242 ORFs in the P. furiosus genome estimates that 24% of the ORF products are found in the membrane. However, this is likely to be a minimum value due to the programs' inability to recognize certain membrane-related proteins, such as subunits associated with membrane complexes and porin-type proteins.
10.
Shandar Ahmad Yumlembam Hemajit Singh Yogesh Paudel Takaharu Mori Yuji Sugita Kenji Mizuguchi 《BMC bioinformatics》2010,11(1):533
Background
Many structural properties such as solvent accessibility, dihedral angles and helix-helix contacts can be assigned to each residue in a membrane protein. Independent studies exist on the analysis and sequence-based prediction of some of these so-called one-dimensional features. However, there is little explanation of why certain residues are predicted in a wrong structural class or with large errors in the absolute values of these features. On the other hand, membrane proteins undergo conformational changes to allow transport as well as ligand binding. These conformational changes often occur via residues that are inherently flexible and hence, predicting fluctuations in residue positions is of great significance.
11.
MacT is a set of programs for the Apple Macintosh to construct and evaluate unrooted trees derived from amino acid sequences using a distance matrix method. Programs are designed on a one program, one task basis for (i) determining the branching order in trees consisting of four or five species and calculating various statistical measures, (ii) calculating statistical measures for all possible topologies of unrooted trees and (iii) generating and evaluating trees derived from bootstrapped samples. With four auxiliary programs, unrooted trees can be built for a maximum of 26 species, and the robustness of topologies can be tested by bootstrapping.
12.
Turn prediction in proteins using a pattern-matching approach
We extend the use of amino acid sequence patterns [Cohen, F.E., Abarbanel, R. M., Kuntz, I. D., & Fletterick, R. J. (1983) Biochemistry 22, 4894-4904] to the identification of turns in globular proteins. The approach uses a conservative strategy, combined with a hierarchical search (strongest patterns first) and length-dependent masking, to achieve high accuracy (95%) on a test set of proteins of known structure. Applying the same procedure to homologous families gives a 90% success rate. Straightforward changes are suggested to improve the predictive power. The computer program, written in Lisp, provides a general pattern-recognition language well suited for a number of investigations of protein and nucleic acid sequences.
13.
In meiosis, the exchange of DNA between chromosomes by homologous recombination is a critical step that ensures proper chromosome segregation and increases genetic diversity. Products of recombination include reciprocal exchanges, known as crossovers, and non-reciprocal gene conversions or non-crossovers. The mechanisms underlying meiotic recombination remain elusive, largely because of the difficulty of analyzing large numbers of recombination events by traditional genetic methods. These traditional methods are increasingly being superseded by high-throughput techniques capable of surveying meiotic recombination on a genome-wide basis. Next-generation sequencing or microarray hybridization is used to genotype thousands of polymorphic markers in the progeny of hybrid yeast strains. New computational tools are needed to perform this genotyping and to find and analyze recombination events. We have developed a suite of programs, ReCombine, for using short sequence reads from next-generation sequencing experiments to genotype yeast meiotic progeny. Upon genotyping, the program CrossOver, a component of ReCombine, then detects recombination products and classifies them into categories based on the features found at each location and their distribution among the various chromatids. CrossOver is also capable of analyzing segregation data from microarray experiments or other sources. This package of programs is designed to allow even researchers without computational expertise to use high-throughput, whole-genome methods to study the molecular mechanisms of meiotic recombination.
14.
Membrane proteins, which constitute approximately 20% of most genomes, form two main classes: alpha helical and beta barrel transmembrane proteins. Using methods based on Bayesian Networks, a powerful approach for statistical inference, we have sought to address beta-barrel topology prediction. The beta-barrel topology predictor reports individual strand accuracies of 88.6%. The method outlined here represents a potentially important advance in the computational determination of membrane protein topology.
15.
Background
Protein-carbohydrate interactions are crucial in many biological processes, with implications for drug targeting and gene expression. The nature of protein-carbohydrate interactions may be studied at the individual residue level by analyzing local sequence and structure environments in binding regions in comparison to non-binding regions, which provide an inherent control for such analyses. With the ultimate aim of predicting binding sites from sequence and structure, overall statistics of binding regions need to be compiled. Sequence-based predictions of binding sites have been successfully applied to DNA-binding proteins in our earlier works. We aim to apply similar analysis to carbohydrate-binding proteins. However, because a much smaller region of the protein takes part in such interactions, the methodology and results are significantly different. A comparison of protein-carbohydrate complexes has also been made with other protein-ligand complexes.
16.
Fourier transform infrared (FTIR) spectroscopy is a very flexible technique for characterization of protein secondary structure. Measurements can be carried out rapidly in a number of different environments based on only small quantities of proteins. For this technique to become more widely used for protein secondary structure characterization, however, further developments in methods to accurately quantify protein secondary structure are necessary. Here we propose a structural classification of proteins (SCOP) class specialized neural networks architecture combining an adaptive neuro-fuzzy inference system (ANFIS) with SCOP class specialized backpropagation neural networks for improved protein secondary structure prediction. Our study shows that proteins can be accurately classified into two main classes "all alpha proteins" and "all beta proteins" merely based on the amide I band maximum position of their FTIR spectra. ANFIS is employed to perform the classification task to demonstrate the potential of this architecture with moderately complex problems. Based on studies using a reference set of 17 proteins and an evaluation set of 4 proteins, improved predictions were achieved compared to a conventional neural network approach, where structure specialized neural networks are trained based on protein spectra of both "all alpha" and "all beta" proteins. The standard errors of prediction (SEPs) in % structure were improved by 4.05% for helix structure, by 5.91% for sheet structure, by 2.68% for turn structure, and by 2.15% for bend structure. For other structure, an increase of SEP by 2.43% was observed. Those results were confirmed by a "leave-one-out" run with the combined set of 21 FTIR spectra of proteins.
17.
Membrane proteins, which constitute approximately 20% of most genomes, are poorly tractable targets for experimental structure determination, thus analysis by prediction and modelling makes an important contribution to their on-going study. Membrane proteins form two main classes: alpha helical and beta barrel trans-membrane proteins. By using a method based on Bayesian Networks, which provides a flexible and powerful framework for statistical inference, we addressed alpha-helical topology prediction. This method has accuracies of 77.4% for prokaryotic proteins and 61.4% for eukaryotic proteins. The method described here represents an important advance in the computational determination of membrane protein topology and offers a useful, and complementary, tool for the analysis of membrane proteins for a range of applications.
18.
Simple flexible programs (TREEMOMENT and PILEUPMOMENT) are described for depicting the average amphipathicity (hydrophobic moment) along multiply aligned sequences of a family of evolutionarily related proteins. The programs are applicable to any number of aligned sequences and can be set for any desired angle corresponding to a residue repeat unit in a protein secondary structural element, such as 100° per residue for an alpha-helix or 180° per residue for a beta-strand. These programs can be used to identify amphipathic regions common to the members of a protein family. The use of these programs is exemplified by showing that some families of integral membrane transport proteins (i.e. permeases of the bacterial phosphotransferase system (PTS) and the anion exchangers of animals) exhibit strikingly amphipathic alpha-helical structures immediately preceding the first hydrophobic transmembrane segment of their membrane-embedded domain(s). Other families, such as the major facilitator superfamily of uniporters, symporters and antiporters, do not exhibit this structural feature. The amphipathic structures in PTS permeases have been implicated in membrane insertion during biogenesis.
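To make the idea of an averaged hydrophobic moment concrete, here is a minimal sketch. It is not the TREEMOMENT or PILEUPMOMENT code. It assumes the standard hydrophobic moment definition (the magnitude of the vector sum of residue hydropathies, each rotated by a fixed angle per residue), the Kyte-Doolittle hydropathy scale, and column-wise averaging over the alignment before the moment is computed. The abstract specifies none of these choices, and the function names and the window length of 11 are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(hydropathies, angle_deg=100.0):
    """Magnitude of the vector sum of hydropathies rotated by angle_deg per residue."""
    delta = math.radians(angle_deg)
    s = sum(h * math.sin(n * delta) for n, h in enumerate(hydropathies))
    c = sum(h * math.cos(n * delta) for n, h in enumerate(hydropathies))
    return math.hypot(s, c)


def column_means(alignment, scale=KD):
    """Mean hydropathy per alignment column, ignoring gaps and unknown symbols."""
    means = []
    for column in zip(*alignment):
        values = [scale[aa] for aa in column if aa in scale]
        means.append(sum(values) / len(values) if values else 0.0)
    return means


def moment_profile(alignment, window=11, angle_deg=100.0):
    """Sliding-window hydrophobic moment along the averaged alignment columns."""
    means = column_means(alignment)
    return [
        hydrophobic_moment(means[i:i + window], angle_deg)
        for i in range(len(means) - window + 1)
    ]


# Toy alignment of two equal-length sequences ('-' marks a gap).
profile = moment_profile(["LKELLKKLLEELLKKL", "LRELLKRL-EELLRKL"], window=11)
```

Passing `angle_deg=180.0` would probe beta-strand periodicity instead of alpha-helical periodicity. Peaks in the resulting profile would indicate windows where hydrophobic and hydrophilic residues segregate to opposite faces under the chosen angle.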
19.
SPAAN: a software program for prediction of adhesins and adhesin-like proteins using neural networks
MOTIVATION: The adhesion of microbial pathogens to host cells is mediated by adhesins. Experimental methods used for characterizing adhesins are time-consuming and demand large resources. The availability of specialized software can rapidly aid experimenters in simplifying this problem. We have employed 105 compositional properties and artificial neural networks to develop SPAAN, which predicts the probability of a protein being an adhesin (Pad). RESULTS: SPAAN had optimal sensitivity of 89% and specificity of 100% on a defined test set and could identify 97.4% of known adhesins at high Pad value from a wide range of bacteria. Furthermore, SPAAN facilitated improved annotation of several proteins as adhesins. Novel adhesins were identified in 17 pathogenic organisms causing diseases in humans and plants. In the severe acute respiratory syndrome (SARS) associated human corona virus, the spike glycoprotein and nsps (nsp2, nsp5, nsp6 and nsp7) were identified as having adhesin-like characteristics. These results offer new leads for rapid experimental testing. AVAILABILITY: SPAAN is freely available through ftp://203.195.151.45 CONTACT: ramu@igib.res.in.
20.
M. S. Kondrat’ev A. V. Kabanov V. M. Komarov N. N. Khechinashvili A. A. Samchenko 《Biophysics》2011,56(6):1026-1032
We present the results of theoretical studies of the structural and dynamic features of peptides and small proteins, carried out by quantum chemical and molecular dynamics methods on high-performance graphics workstations (“desktop supercomputers”) using distributed computation with CUDA technology.