1.

A discriminative method for remote homology detection based on n-peptide compositions with reduced amino acid alphabets

Oğul H Mumcuoğlu EU 《Bio Systems》2007,87(1):75-81

In this study, n-peptide compositions are used for protein vectorization within a discriminative remote homology detection framework based on support vector machines (SVMs). The size of the amino acid alphabet is gradually reduced as n increases, so that the method fits within the memory resources of conventional workstations. A hash structure is implemented to accelerate the search for n-peptides. The method's ability to classify proteins into families is tested on a subset of the SCOP family database and compared against many existing homology detection methods, including the most popular generative methods, SAM-98 and PSI-BLAST, and the recent SVM methods SVM-Fisher, SVM-BLAST and SVM-Pairwise. The results demonstrate that the new method significantly outperforms SVM-Fisher, SVM-BLAST, SAM-98 and PSI-BLAST, while achieving accuracy comparable to SVM-Pairwise. In terms of efficiency, it performs much better than SVM-Pairwise. It is shown that n-peptide compositions with reduced amino acid alphabets provide an accurate and efficient means of protein vectorization for SVM-based sequence classification.
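To make the memory argument concrete: a 20-letter alphabet gives 20**3 = 8000 possible 3-peptides, while a 6-letter reduced alphabet gives only 6**3 = 216. The sketch below is a minimal Python illustration of n-peptide composition over a reduced alphabet. The six-group alphabet is a hypothetical grouping chosen for illustration (the abstract does not give the paper's groupings), a `Counter` (a hash table) stands in for the paper's hash structure, and the names `REDUCED_GROUPS`, `reduce_sequence` and `npeptide_composition` are illustrative only.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet: each symbol stands for a rough physicochemical
# class of residues. This grouping is an illustrative assumption.
REDUCED_GROUPS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar
    "+": "KR",      # positively charged
    "-": "DE",      # negatively charged
    "s": "GP",      # special conformations
}
RESIDUE_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Map each residue to its reduced-alphabet symbol, skipping unknown letters."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP)


def npeptide_composition(seq: str, n: int) -> list[float]:
    """Return normalized n-peptide frequencies over the reduced alphabet.

    The Counter is a hash table: each observed n-peptide is counted with an
    average O(1) lookup instead of scanning all |alphabet|**n possible words.
    """
    reduced = reduce_sequence(seq)
    counts = Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))
    total = sum(counts.values()) or 1  # avoid division by zero for short sequences
    alphabet = sorted(REDUCED_GROUPS)
    # Fixed feature order so vectors from different proteins are comparable.
    return [counts.get("".join(w), 0) / total for w in product(alphabet, repeat=n)]
```

For example, `npeptide_composition("MKTAYIAKQR", 2)` returns a 36-dimensional vector (6**2 dipeptides) whose entries sum to 1, suitable as SVM input.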

2.

Motif kernel generated by genetic programming improves remote homology and fold detection

Tony Håndstad Arne JH Hestnes Pål Sætrom 《BMC bioinformatics》2007,8(1):23

Background

Protein remote homology detection is a central problem in computational biology. Most recent methods train support vector machines to discriminate between related and unrelated sequences, and these studies have introduced several types of kernels. One successful approach is to base a kernel on shared occurrences of discrete sequence motifs. Still, many protein sequences are not classified correctly because a suitable set of motifs for them is lacking.
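The idea of a kernel built on shared motif occurrences can be sketched in a few lines of Python. This is a minimal illustration, assuming motifs written as regular expressions; the motif list `MOTIFS` is hypothetical (in the paper the motifs are generated by genetic programming), and `motif_vector` and `motif_kernel` are illustrative names.

```python
import re

# Hypothetical motif set written as regular expressions; a real motif set would
# come from a motif database or, as in the paper above, be evolved automatically.
MOTIFS = [r"C..C", r"G.GK[ST]", r"[LIVM]{3}", r"W.W"]


def motif_vector(seq: str, motifs: list[str] = MOTIFS) -> list[int]:
    """Binary feature vector: 1 if the motif occurs anywhere in seq, else 0."""
    return [1 if re.search(m, seq) else 0 for m in motifs]


def motif_kernel(a: str, b: str, motifs: list[str] = MOTIFS) -> int:
    """Number of motifs present in both sequences.

    This is a dot product of binary feature vectors, so the kernel is
    positive semidefinite by construction and can be used directly in an SVM.
    """
    va, vb = motif_vector(a, motifs), motif_vector(b, motifs)
    return sum(x * y for x, y in zip(va, vb))
```

With these motifs, `"CAACGAGKS"` and `"CTTCLLL"` share only the `C..C` motif, so their kernel value is 1.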

3.

RANKPROP: a web server for protein remote homology detection

Melvin I Weston J Leslie C Noble WS 《Bioinformatics (Oxford, England)》2009,25(1):121-122

Summary: We present a large-scale implementation of the RANKPROP protein homology ranking algorithm in the form of an openly accessible web server. We use the NRDB40 PSI-BLAST all-versus-all protein similarity network of 1.1 million proteins to construct the graph for the RANKPROP algorithm, whereas previously, results were only reported for a database of 108 000 proteins. We also describe two algorithmic improvements to the original algorithm, including propagation from multiple homologs of the query and better normalization of ranking scores, that lead to higher accuracy and to scores with a probabilistic interpretation.
Availability: The RANKPROP web server and source code are available at http://rankprop.gs.washington.edu
Contact: iain{at}nec-labs.com; noble{at}gs.washington.edu
Associate Editor: Burkhard Rost
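The abstract does not spell out the update rule. As a rough illustration only, here is a generic score-propagation loop in the spirit of the original RANKPROP algorithm: each protein's score is its direct similarity to the query plus a damped sum of its neighbours' scores over the similarity graph. The row normalization, the damping factor `alpha`, the fixed iteration count and the function name `rank_propagation` are all assumptions, and the server's multi-homolog propagation and probabilistic score normalization are not modelled.

```python
def rank_propagation(sim: dict[str, dict[str, float]], query: str,
                     alpha: float = 0.5, iters: int = 100) -> dict[str, float]:
    """Rank every protein by similarity propagated outward from `query`.

    sim[i][j] is a nonnegative similarity between proteins i and j; every
    protein must appear as a key of `sim` (with an empty dict if it has no edges).
    """
    nodes = list(sim)
    # Row-normalize so each protein passes on a total weight of at most 1;
    # with alpha < 1 the update below is a contraction and converges.
    weights = {}
    for i in nodes:
        total = sum(sim[i].values())
        weights[i] = {j: w / total for j, w in sim[i].items()} if total > 0 else {}
    # Direct similarity to the query acts as a fixed source term.
    source = {i: sim[query].get(i, 0.0) for i in nodes}
    scores = dict(source)
    for _ in range(iters):
        # The comprehension reads the previous `scores` dict throughout.
        scores = {
            i: source[i] + alpha * sum(
                weights[j].get(i, 0.0) * scores[j] for j in nodes if j != query
            )
            for i in nodes
        }
    return scores
```

This dense O(n^2)-per-iteration loop is only practical for toy graphs; a network of 1.1 million proteins would need a sparse adjacency representation.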

4.

Word correlation matrices for protein sequence analysis and remote homology detection

Thomas Lingner Peter Meinicke 《BMC bioinformatics》2008,9(1):259

Background

Classification of protein sequences is a central problem in computational biology. Currently, among computational methods, discriminative kernel-based approaches provide the most accurate results. However, kernel-based methods often lack an interpretable model for analysis of discriminative sequence features, and predictions on new sequences are usually computationally expensive.

5.

A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis

Bin Liu Xiaolong Wang Lei Lin Qiwen Dong Xuan Wang 《BMC bioinformatics》2008,9(1):510

Background

Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVMs) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to find a suitable representation of protein sequences.

6.

SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection

Shah AR Oehmen CS Webb-Robertson BJ 《Bioinformatics (Oxford, England)》2008,24(6):783-790

7.

Profile-based direct kernels for remote homology detection and fold recognition

Rangwala H Karypis G 《Bioinformatics (Oxford, England)》2005,21(23):4239-4247

MOTIVATION: Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently among the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them. RESULTS: We introduce two classes of kernel functions that are constructed by combining sequence profiles with new and existing approaches for determining the similarity between pairs of protein sequences. These kernels are constructed directly from these explicit protein similarity measures and employ effective profile-to-profile scoring schemes for measuring the similarity between pairs of proteins. Experiments with remote homology detection and fold recognition problems show that these kernels are capable of producing results that are substantially better than those produced by all of the existing state-of-the-art SVM-based methods. In addition, the experiments show that these kernels, even when used in the absence of profiles, produce results that are better than those produced by existing non-profile-based schemes. AVAILABILITY: The programs for computing the various kernel functions are available on request from the authors.

8.

Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection

Damoulas T Girolami MA 《Bioinformatics (Oxford, England)》2008,24(10):1264-1270

9.

A comparison of profile hidden Markov model procedures for remote homology detection (total citations: 7; self-citations: 1; citations by others: 7)

Madera M Gough J 《Nucleic acids research》2002,30(19):4321-4328

Profile hidden Markov models (HMMs) are amongst the most successful procedures for detecting remote homology between proteins. There are two popular profile HMM programs, HMMER and SAM. Little is known about their performance relative to each other and to the recently improved version of PSI-BLAST. Here we compare the two programs to each other and to non-HMM methods, to determine their relative performance and the features that are important for their success. The quality of the multiple sequence alignments used to build models was the most important factor affecting the overall performance of profile HMMs. The SAM T99 procedure is needed to produce high-quality alignments automatically, and the lack of an equivalent component in HMMER makes it less complete as a package. Using the default options and parameters as would be expected of an inexpert user, it was found that from identical alignments SAM consistently produces better models than HMMER and that the relative performance of the model-scoring components varies. On average, HMMER was found to be between one and three times faster than SAM when searching databases larger than 2000 sequences, SAM being faster on smaller ones. Both methods were shown to have effective low complexity and repeat sequence masking using their null models, and the accuracy of their E-values was comparable. It was found that the SAM T99 iterative database search procedure performs better than the most recent version of PSI-BLAST, but that scoring of PSI-BLAST profiles is more than 30 times faster than scoring of SAM models.

10.

A general approach for developing system-specific functions to score protein-ligand docked complexes using support vector inductive logic programming

Amini A Shrimpton PJ Muggleton SH Sternberg MJ 《Proteins》2007,69(4):823-831

Despite the recent increase in the use of protein-ligand and protein-protein docking in the drug discovery process, driven by increases in computational power, the difficulty of accurately ranking the binding affinities of a series of ligands or a series of proteins docked to a protein receptor remains largely unsolved. This problem is of major concern in lead optimization procedures and has led to the development of scoring functions tailored to rank the binding affinities of a series of ligands to a specific system. However, such methods can take a long time to develop, and their transferability to other systems remains open to question. Here we demonstrate that, given a suitable amount of background information, a new approach using support vector inductive logic programming (SVILP) can be used to produce system-specific scoring functions. Inductive logic programming (ILP) learns logic-based rules for a given dataset that can be used to describe properties of each member of the set in a qualitative manner. By combining ILP with support vector machine regression, a quantitative set of rules can be obtained. SVILP has previously been used in a biological context to examine datasets containing a series of singular molecular structures and properties. Here we describe the use of SVILP to produce binding affinity predictions of a series of ligands to a particular protein. We also examine, for the first time, the applicability of SVILP techniques to datasets consisting of protein-ligand complexes. Our results show that SVILP performs comparably with other state-of-the-art methods on five protein-ligand systems, as judged by similar cross-validated squares of their correlation coefficients. A McNemar test comparing SVILP to CoMFA and CoMSIA across the five systems indicates that our method is significantly better on one occasion.
The ability to graphically display and understand the SVILP-produced rules is demonstrated, and this feature of ILP can be used to derive hypotheses for future ligand design in lead optimization procedures. The approach can readily be extended to evaluate the binding affinities of a series of protein-protein complexes.

11.

On single and multiple models of protein families for the detection of remote sequence relationships

James A Casbon Mansoor AS Saqi 《BMC bioinformatics》2006,7(1):48-7

Background

The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. However, in many cases these relationships cannot be identified easily from direct comparison of the two sequences. Methods which compare sequence profiles have been shown to improve the detection of these remote sequence relationships. However, the best method for building a profile of a known set of sequences has not been established. Here we examine how the type of profile built affects its performance, both in detecting remote homologs and in the resulting alignment accuracy. In particular, we consider whether it is better to model a protein superfamily using a single structure-based alignment that is representative of all known cases of the superfamily, or to use multiple sequence-based profiles each representing an individual member of the superfamily.

12.

Physicochemical property distributions for accurate and rapid pairwise protein homology detection

Bobbie-Jo M Webb-Robertson Kyle G Ratuiste Christopher S Oehmen 《BMC bioinformatics》2010,11(1):145

Background

The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection.
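As a hedged illustration of deriving a fixed-length vector from a sequence property, the sketch below bins per-residue Kyte-Doolittle hydropathy values into a normalized histogram that an SVM could consume. Using a single property, five equal-width bins and the name `property_histogram` are assumptions for illustration; the abstract does not say which properties or distribution summaries the authors use.

```python
# Kyte-Doolittle hydropathy scale (one standard physicochemical property).
# Choosing this single property is an illustrative assumption.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_histogram(seq: str, bins: int = 5,
                       lo: float = -4.5, hi: float = 4.5) -> list[float]:
    """Normalized histogram of per-residue property values.

    lo and hi default to the range of the Kyte-Doolittle scale, so every
    standard residue falls into some bin; unknown letters are skipped.
    """
    values = [KD[aa] for aa in seq.upper() if aa in KD]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # place v == hi in the last bin
        counts[idx] += 1
    n = len(values) or 1  # avoid division by zero for empty input
    return [c / n for c in counts]
```

For instance, `property_histogram("IR")` puts the most hydrophobic (I) and most hydrophilic (R) residues in the two end bins, giving `[0.5, 0.0, 0.0, 0.0, 0.5]`.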

13.

A method for the improvement of threading-based protein models

Kolinski A Rotkiewicz P Ilkowski B Skolnick J 《Proteins》1999,37(4):592-610

A new method for the homology-based modeling of protein three-dimensional structures is proposed and evaluated. Aligning a query sequence to a structural template with threading algorithms usually yields low-resolution molecular models. The proposed method attempts to improve these models. In the first stage, a high-coordination lattice approximation of the query protein fold is built by suitable tracking of the incomplete alignment of the structural template and connection of the alignment gaps. These initial lattice folds are very similar to the structures resulting from standard molecular modeling protocols. Then, a Monte Carlo simulated annealing procedure is used to refine the initial structure. The process is controlled by the model's internal force field and a set of loosely defined restraints that keep the lattice chain in the vicinity of the template conformation. The internal force field consists of several knowledge-based statistical potentials that are enhanced by a proper analysis of multiple sequence alignments. The template restraints are implemented such that the model chain can slide along the template structure or even ignore a substantial fraction of the initial alignment. The resulting lattice models are, in most cases, closer (sometimes much closer) to the target structure than the initial threading-based models. All-atom models could easily be built from the lattice chains. The method is illustrated on 12 examples of target/template pairs whose initial threading alignments are of varying quality. Possible applications of the proposed method for use in protein function annotation are briefly discussed.
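The core of a Monte Carlo simulated annealing refinement is the Metropolis acceptance rule applied under a falling temperature. The Python sketch below shows that generic loop on a toy one-dimensional energy; it assumes a geometric cooling schedule and user-supplied `energy` and `propose` functions, and it does not model the paper's lattice chain, statistical potentials or template restraints. The function names are illustrative.

```python
import math
import random


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(energy, propose, state, t_start=2.0, t_end=0.01, steps=2000, seed=0):
    """Generic simulated annealing with a geometric cooling schedule.

    energy(state) -> float; propose(state, rng) -> candidate state.
    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    current_e = energy(state)
    best, best_e = state, current_e
    for k in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        candidate = propose(state, rng)
        cand_e = energy(candidate)
        if metropolis_accept(cand_e - current_e, t, rng):
            state, current_e = candidate, cand_e
            if current_e < best_e:
                best, best_e = state, current_e
    return best, best_e
```

For example, `anneal(lambda x: (x - 3) ** 2, lambda x, rng: x + rng.choice((-1, 1)), 0)` walks an integer from 0 to the minimum at 3. In a structure-refinement setting, `state` would be a chain conformation and `energy` the force field plus restraint terms.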

14.

A computational framework for gene regulatory network inference that combines multiple methods and datasets

Rita Gupta Anna Stincone Philipp Antczak Sarah Durant Roy Bicknell Andreas Bikfalvi Francesco Falciani 《BMC systems biology》2011,5(1):52

Background

Reverse engineering in systems biology entails inference of gene regulatory networks from observational data. These data typically include gene expression measurements of wild-type and mutant cells in response to a given stimulus. It has been shown that accuracy is higher when more than one type of experiment is used in the network inference process. Therefore, the development of generally applicable and effective methodologies that embed multiple sources of information in a single computational framework is a worthwhile objective.

15.

Regions of minimal structural variation among members of protein domain superfamilies: application to remote homology detection and modelling using distant relationships

Chakrabarti S Sowdhamini R 《FEBS letters》2004,569(1-3):31-36

Structurally conserved regions or structural templates have been identified and examined for features such as amino acid content, solvent accessibility, secondary structures, non-polar interaction, residue packing and extent of structural deviations in 179 aligned members of superfamilies involving 1208 pairs of protein domains. An analysis of these structural features shows that the retention of secondary structural conservation and similar hydrogen bonding pattern within the templates is 2.5 and 1.8 times higher, respectively, than in full-length alignments, suggesting that they form the minimum structural requirement of a superfamily. The identification and availability of structural templates find value in different areas of protein structure prediction and modelling such as in sensitive sequence searches, accurate sequence alignment and three-dimensional modelling on the basis of distant relationships.

16.

A family-based likelihood ratio test for general pedigree structures that allows for genotyping error and missing data

Yang Y Wise CA Gordon D Finch SJ 《Human heredity》2008,66(2):99-110

The purpose of this work is the development of a family-based association test that allows for random genotyping errors and missing data and makes use of information on affected and unaffected pedigree members. We derive the conditional likelihood functions of the general nuclear family for the following scenarios: complete parental genotype data and no genotyping errors; only one genotyped parent and no genotyping errors; no parental genotype data and no genotyping errors; and no parental genotype data with genotyping errors. We find maximum likelihood estimates of the marker locus parameters, including the penetrances and population genotype frequencies, under the null hypothesis that all penetrance values are equal and under the alternative hypothesis. We then compute the likelihood ratio test. We perform simulations to assess the adequacy of the central chi-square distribution approximation when the null hypothesis is true. We also perform simulations to compare the power of the TDT and this likelihood-based method. Finally, we apply our method to 23 SNPs genotyped in nuclear families from a recently published study of idiopathic scoliosis (IS). Our simulations suggest that this likelihood ratio test statistic follows a central chi-square distribution with 1 degree of freedom under the null hypothesis, even in the presence of missing data and genotyping errors. The power comparison shows that this likelihood ratio test is more powerful than the original TDT for the simulations considered. For the IS data, the marker rs7843033 gives the most significant result under our method (p = 0.0003), which is consistent with a previous report that found rs7843033 to have the second most significant TDTae p value among a set of 23 SNPs.
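Once the maximized log-likelihoods under the null and alternative hypotheses are available (computing them for nuclear families with missing data and genotyping errors is the substance of the paper and is not reproduced here), the test statistic and its p-value follow from the standard formula. The Python sketch below assumes, as the abstract's simulations suggest, a central chi-square reference distribution with 1 degree of freedom, and uses the identity P(X > x) = erfc(sqrt(x / 2)) for that distribution. The function name is hypothetical.

```python
import math


def likelihood_ratio_test_1df(loglik_null: float, loglik_alt: float) -> tuple[float, float]:
    """Return (LRT statistic, p-value) assuming a chi-square(1) null distribution."""
    stat = 2.0 * (loglik_alt - loglik_null)
    # Guard against tiny negative values caused by imperfect numerical maximization.
    stat = max(stat, 0.0)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

As a sanity check, a statistic of about 3.841 (that is, 1.96 squared) gives p close to 0.05, the familiar critical value for one degree of freedom.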

17.

A numerical method for renal models that represent tubules with abrupt changes in membrane properties

Layton AT Layton HE 《Journal of mathematical biology》2002,45(6):549-567

The urine concentrating mechanism of mammals and birds depends on a counterflow configuration of thousands of nearly parallel tubules in the medulla of the kidney. Along the course of a renal tubule, cell type may change abruptly, resulting in abrupt changes in the physical characteristics and transmural transport properties of the tubule. A mathematical model that faithfully represents these abrupt changes will have jump discontinuities in model parameters. Without proper treatment, such discontinuities may cause unrealistic transmural fluxes and introduce suboptimal spatial convergence in the numerical solution to the model equations. In this study, we show how to treat discontinuous parameters in the context of a previously developed numerical method that is based on the semi-Lagrangian semi-implicit method and Newton's method. The numerical solutions have physically plausible fluxes at the discontinuities and the solutions converge at second order, as is appropriate for the method.
Received: 13 November 2001 / Revised version: 28 June 2002 / Published online: 26 September 2002
This work was supported in part by the National Institutes of Health (National Institute of Diabetes and Digestive and Kidney Diseases, grant DK-42091).
Mathematics Subject Classification (2000): 65-04, 65M12, 65M25, 92-04, 92C35, 35-04, 35L45
Keywords or phrases: Mathematical models – Differential equations – Mathematical biology – Kidney – Renal medulla – Semi-Lagrangian semi-implicit

18.

A computer program for enzyme kinetics that combines model discrimination, parameter refinement and sequential experimental design

R Franco M T Gavaldà E I Canela 《The Biochemical journal》1986,238(3):855-862

A method of model discrimination and parameter estimation in enzyme kinetics is proposed. The experimental design and analysis of the model are carried out simultaneously and the stopping rule for experimentation is deduced by the experimenter when the probabilities a posteriori indicate that one model is clearly superior to the rest. A FORTRAN77 program specifically developed for joint designs is given. The method is very powerful, as indicated by its usefulness in the discrimination between models. For example, it has been successfully applied to three cases of enzyme kinetics (a single-substrate Michaelian reaction with product inhibition, a single-substrate complex reaction and a two-substrate reaction). By using this method the most probable model and the estimates of the parameters can be obtained in one experimental session. The FORTRAN77 program is deposited as Supplementary Publication SUP 50134 (19 pages) at the British Library (Lending Division), Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1986) 233, 5.
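The stopping rule based on a posteriori probabilities can be illustrated with a short Python sketch (the original program was written in FORTRAN77 and is not reproduced here). Assuming each candidate rate law has already been fitted to the data collected so far and yields a maximized log-likelihood, Bayes' rule gives posterior model probabilities, and experimentation stops once one model's probability exceeds a threshold. The 0.95 threshold, the equal default priors and the function names are assumptions, and the paper's sequential choice of the next experimental design point is not shown.

```python
import math
from typing import Dict, Optional, Tuple


def posterior_model_probs(log_likelihoods: Dict[str, float],
                          priors: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Posterior probability of each model: prior * likelihood, normalized."""
    models = list(log_likelihoods)
    if priors is None:
        priors = {m: 1.0 / len(models) for m in models}  # assume equal priors
    log_post = {m: math.log(priors[m]) + log_likelihoods[m] for m in models}
    top = max(log_post.values())  # subtract the maximum for numerical stability
    unnorm = {m: math.exp(v - top) for m, v in log_post.items()}
    z = sum(unnorm.values())
    return {m: u / z for m, u in unnorm.items()}


def should_stop(posteriors: Dict[str, float], threshold: float = 0.95) -> Tuple[bool, str]:
    """Stop when the most probable model is clearly superior to the rest."""
    best = max(posteriors, key=posteriors.get)
    return posteriors[best] >= threshold, best
```

For two models whose likelihoods differ by a factor of 99, the better one has posterior probability 0.99 under equal priors, so `should_stop` reports that experimentation can end.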

19.

A novel method for comparing topological models of protein structures enhanced with ligand information

Veeramalai M Gilbert D 《Bioinformatics (Oxford, England)》2008,24(23):2698-2705

20.

A Bayesian network model for protein fold and remote homologue recognition

Raval A Ghahramani Z Wild DL 《Bioinformatics (Oxford, England)》2002,18(6):788-801

MOTIVATION: The Bayesian network approach is a framework which combines graphical representation and probability theory, which includes, as a special case, hidden Markov models. Hidden Markov models trained on amino acid sequence or secondary structure data alone have been shown to have potential for addressing the problem of protein fold and superfamily classification. RESULTS: This paper describes a novel implementation of a Bayesian network which simultaneously learns amino acid sequence, secondary structure and residue accessibility for proteins of known three-dimensional structure. An awareness of the errors inherent in predicted secondary structure may be incorporated into the model by means of a confusion matrix. Training and validation data have been derived for a number of protein superfamilies from the Structural Classification of Proteins (SCOP) database. Cross validation results using posterior probability classification demonstrate that the Bayesian network performs better in classifying proteins of known structural superfamily than a hidden Markov model trained on amino acid sequences alone.
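One way to read the confusion-matrix idea: rather than treating a predicted secondary-structure label as if it were the true one, the model scores it through P(predicted | true), estimated from a confusion matrix of the predictor's errors. The minimal Python sketch below assumes three states (H, E, C), hypothetical confusion counts, and a network node that supplies a probability distribution over the true state; it is not the paper's network structure, and the function names are illustrative.

```python
from typing import Dict

Confusion = Dict[str, Dict[str, float]]


def normalize_rows(counts: Confusion) -> Confusion:
    """Turn raw counts[true][predicted] into probabilities P(predicted | true)."""
    probs = {}
    for true_state, row in counts.items():
        total = sum(row.values())
        probs[true_state] = {pred: c / total for pred, c in row.items()}
    return probs


def predicted_label_prob(pred: str, true_state_probs: Dict[str, float],
                         confusion: Confusion) -> float:
    """P(predicted label) = sum over true states s of P(pred | s) * P(s)."""
    return sum(p_s * confusion[s].get(pred, 0.0) for s, p_s in true_state_probs.items())


# Hypothetical confusion counts for a secondary-structure predictor.
CONFUSION = normalize_rows({
    "H": {"H": 80, "E": 5, "C": 15},
    "E": {"H": 10, "E": 70, "C": 20},
    "C": {"H": 10, "E": 10, "C": 80},
})
```

With these counts, a residue whose true state is helix with certainty emits the predicted label H with probability 0.8 rather than 1, so prediction errors lower the likelihood gracefully instead of ruling a structure out.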