Similar literature
20 similar documents found.
1.
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures have transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which can still be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for about 25% of the complexes in the dataset.
The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate.
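
The bag-of-words representation mentioned in this abstract can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the vocabulary, the example sentence, and the function name `bag_of_words` are assumptions made here, and the Support Vector Machine training step is omitted.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return a count vector for `text` over a fixed, ordered vocabulary."""
    # Lowercase and keep only alphanumeric runs as tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur in the text.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary of docking-relevant terms (illustrative only).
vocabulary = ["residue", "binding", "mutation", "interface", "expression"]
# Hypothetical abstract sentence (not taken from any real publication).
abstract = "Mutation of residue Arg45 abolished binding; the binding interface is conserved."
vector = bag_of_words(abstract, vocabulary)
```

Each abstract becomes one such vector, and a classifier trained on labeled vectors could then separate relevant from irrelevant abstracts.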

2.

Background

Non-coding RNAs (ncRNAs) have important functional roles in the cell: for example, they regulate gene expression by establishing stable joint structures with target mRNAs via complementary sequence motifs. Sequence motifs are also important determinants of the structure of ncRNAs. Although ncRNAs are abundant, discovering novel ncRNAs in genome sequences has proven to be a hard task; in particular, past attempts at ab initio ncRNA search have mostly failed, with the exception of tools that can identify microRNAs.

Methodology/Principal Findings

We present a very general ab initio ncRNA gene finder that exploits differential distributions of sequence motifs between ncRNAs and background genome sequences.

Conclusions/Significance

Our method, once trained on a set of ncRNAs from a given species, can be applied to genome sequences of other organisms to find not only ncRNAs homologous to those in the training set but also others that potentially belong to novel (and perhaps unknown) ncRNA families. Availability: http://compbio.cs.sfu.ca/taverna/smyrna

3.

Background

Proteins interact with a variety of other molecules such as nucleic acids, small molecules and other proteins inside the cell. Structure determination of protein-protein complexes is challenging for several reasons, including the large molecular weights of these macromolecular complexes, their dynamic nature, and difficulties in purification and sample preparation. Computational docking permits an early understanding of the feasibility and mode of protein-protein interactions. However, docking algorithms propose a number of solutions, and it is challenging to select the native or near-native pose(s) from this pool. DockScore is an objective scoring scheme that can be used to rank protein-protein docked poses. It considers several interface parameters for scoring, namely surface area, evolutionary conservation, hydrophobicity, short contacts and spatial clustering at the interface.

Results

We have implemented DockScore in the form of a web server for use by the scientific community. The DockScore web server can be employed, subsequent to docking, to score the docked solutions, starting from multiple poses as inputs. The scores and ranks for all the poses can be downloaded as a CSV file, and a graphical view of the interface of the best-ranking poses is available.

Conclusions

The webserver for DockScore is made freely available for the scientific community at: http://caps.ncbs.res.in/dockscore/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0572-6) contains supplementary material, which is available to authorized users.

4.

Background

Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information of molecular structure, dynamics and evolution.

Results

Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case.

Conclusions

The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond a prediction of single protein dynamics to investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform independent R package under a GPL2 license from http://thegrantlab.org/bio3d/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0399-6) contains supplementary material, which is available to authorized users.

5.

Background

Various methods have been developed to computationally predict hotspot residues at novel protein-protein interfaces. However, obtaining accurate predictions remains challenging. We have developed a novel method that uses different aspects of protein structure and sequence space at the residue level to highlight interface residues crucial for protein-protein complex formation.

Results

The ECMIS (Energetic Conservation Mass Index and Spatial Clustering) algorithm outperformed existing hotspot identification methods, achieving around 80% accuracy with a substantial increase in sensitivity. The method is sensitive even to hotspot residues that contribute only small-scale hydrophobic interactions.

Conclusion

The combination of diverse features of the protein, namely energy contribution, extent of conservation, location and surrounding environment, along with an optimized weight for each feature, was key to the success of the algorithm. The academic version of the algorithm is available at http://caps.ncbs.res.in/download/ECMIS/ECMIS.zip.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-303) contains supplementary material, which is available to authorized users.

6.
de Vries SJ, Bonvin AM. PLoS ONE 2011, 6(3):e17695

Background

Macromolecular complexes are the molecular machines of the cell. Knowledge at the atomic level is essential to understand and influence their function. However, their number is huge and a significant fraction is extremely difficult to study using classical structural methods such as NMR and X-ray crystallography. Therefore, the importance of large-scale computational approaches in structural biology is evident. This study combines two of these computational approaches, interface prediction and docking, to obtain atomic-level structures of protein-protein complexes, starting from their unbound components.

Methodology/Principal Findings

Here we combine six interface prediction web servers into a consensus method called CPORT (Consensus Prediction Of interface Residues in Transient complexes). We show that CPORT gives more stable and reliable predictions than each of the individual predictors on its own. A protocol was developed to integrate CPORT predictions into our data-driven docking program HADDOCK. For cases where experimental information is limited, this prediction-driven docking protocol presents an alternative to ab initio docking, the docking of complexes without the use of any information. Prediction-driven docking was performed on a large and diverse set of protein-protein complexes in a blind manner. Our results indicate that the performance of the HADDOCK-CPORT combination is competitive with ZDOCK-ZRANK, a state-of-the-art ab initio docking/scoring combination. Finally, the original interface predictions could be further improved by interface post-prediction (contact analysis of the docking solutions).

Conclusions/Significance

The current study shows that blind, prediction-driven docking using CPORT and HADDOCK is competitive with ab initio docking methods. This is encouraging since prediction-driven docking represents the absolute bottom line for data-driven docking: any additional biological knowledge will greatly improve the results obtained by prediction-driven docking alone. Finally, the fact that original interface predictions could be further improved by interface post-prediction suggests that prediction-driven docking has not yet been pushed to the limit. A web server for CPORT is freely available at http://haddock.chem.uu.nl/services/CPORT.

7.

Motivation

Computational simulation of protein-protein docking can expedite the process of molecular modeling and drug discovery. This paper reports on our new F2 Dock protocol which improves the state of the art in initial stage rigid body exhaustive docking search, scoring and ranking by introducing improvements in the shape-complementarity and electrostatics affinity functions, a new knowledge-based interface propensity term with FFT formulation, a set of novel knowledge-based filters and finally a solvation energy (GBSA) based reranking technique. Our algorithms are based on highly efficient data structures including the dynamic packing grids and octrees which significantly speed up the computations and also provide guaranteed bounds on approximation error.

Results

The improved affinity functions show superior performance compared to their traditional counterparts in finding correct docking poses at higher ranks. We found that the new filters and the GBSA-based reranking, individually and in combination, significantly improve the accuracy of docking predictions with only a minor increase in computation time. We compared F2 Dock 2.0 with ZDock 3.0.2 and found improvements over it. Specifically, among 176 complexes in ZLab Benchmark 4.0, F2 Dock 2.0 finds a near-native solution as the top prediction for 22 complexes, whereas ZDock 3.0.2 does so for 13 complexes. F2 Dock 2.0 finds a near-native solution within the top 1000 predictions for 106 complexes, as opposed to 104 complexes for ZDock 3.0.2. However, there are 17 complexes where F2 Dock 2.0 finds a solution but ZDock 3.0.2 does not, and 15 where the reverse holds, which indicates that the two docking protocols can also complement each other.
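
The complementarity claim can be checked with a short worked calculation. The sketch below assumes that the counts of 17 and 15 refer to the top-1000 criterion, which the abstract does not state explicitly; the variable names are introduced here for illustration only.

```python
# Counts reported in the abstract (176 complexes, top-1000 predictions).
total_complexes = 176
f2dock_hits = 106   # near-native solution found by F2 Dock 2.0
zdock_hits = 104    # near-native solution found by ZDock 3.0.2
f2dock_only = 17    # found by F2 Dock 2.0 but not by ZDock 3.0.2 (assumed top-1000)
zdock_only = 15     # found by ZDock 3.0.2 but not by F2 Dock 2.0 (assumed top-1000)

# Complexes solved by both protocols, derived two ways as a consistency check.
both = f2dock_hits - f2dock_only
assert both == zdock_hits - zdock_only

# Complexes solved by at least one of the two protocols.
either = both + f2dock_only + zdock_only
coverage = either / total_complexes
```

Under that assumption the two derivations of the shared set agree, and running both protocols would cover more complexes than either one alone.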

Availability

The docking protocol has been implemented as a server with a graphical client (TexMol) which allows the user to manage multiple docking jobs, and visualize the docked poses and interfaces. Both the server and client are available for download. Server: http://www.cs.utexas.edu/~bajaj/cvc/software/f2dock.shtml. Client: http://www.cs.utexas.edu/~bajaj/cvc/software/f2dockclient.shtml.

8.

Background

T cell receptors (TCRs) can recognize diverse lipid and metabolite antigens presented by MHC-like molecules CD1 and MR1, and the molecular basis of many of these interactions has not been determined. Here we applied our protein docking algorithm TCRFlexDock, previously developed to perform docking of TCRs to peptide-MHC (pMHC) molecules, to predict the binding of αβ and γδ TCRs to CD1 and MR1, starting with the structures of the unbound molecules.

Results

Evaluating against TCR-CD1d complexes with crystal structures, we achieved near-native structures in the top 20 models for two out of four cases, and an acceptable-rated prediction for a third case. We also predicted the structure of an interaction between a MAIT TCR and MR1-antigen that has not been structurally characterized, yielding a top-ranked model that agreed remarkably with a characterized TCR-MR1-antigen structure that has a nearly identical TCR α chain but a different β chain, highlighting the likely dominance of the conserved α chain in MR1-antigen recognition. Docking performance was improved by re-training our scoring function with a set of TCR-pMHC complexes, and for a case with an outlier binding mode, we found that alternative docking start positions improved predictive accuracy. We then performed unbound docking with two mycolyl-lipid specific TCRs that recognize lipid-bound CD1b, which represent a class of interactions that is not structurally characterized. Highly-ranked models of these complexes showed remarkable agreement between their binding topologies, as expected based on their shared germline sequences, while differences in residue-level interactions with their respective antigens point to possible mechanisms underlying their distinct specificities.

Conclusions

Together these results indicate that flexible docking simulations can provide accurate models and atomic-level insights into TCR recognition of MHC-like molecules presenting lipid and other small molecule antigens.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-319) contains supplementary material, which is available to authorized users.

9.

Background

Normal mode analysis (NMA) using elastic network models is a reliable and cost-effective computational method to characterise protein flexibility and by extension, their dynamics. Further insight into the dynamics–function relationship can be gained by comparing protein motions between protein homologs and functional classifications. This can be achieved by comparing normal modes obtained from sets of evolutionary related proteins.

Results

We have developed an automated tool for comparative NMA of a set of pre-aligned protein structures. The user can submit a sequence alignment in the FASTA format and the corresponding coordinate files in the Protein Data Bank (PDB) format. The computed normalised squared atomic fluctuations and atomic deformation energies of the submitted structures can be easily compared on graphs provided by the web user interface. The web server provides pairwise comparison of the dynamics of all proteins included in the submitted set using two measures: the Root Mean Squared Inner Product and the Bhattacharyya Coefficient. The Comparative Analysis has been implemented on our web server for NMA, WEBnm@, which also provides recently upgraded functionality for NMA of single protein structures. This includes new visualisations of protein motion, visualisation of inter-residue correlations and the analysis of conformational change using the overlap analysis. In addition, programmatic access to WEBnm@ is now available through a SOAP-based web service. Webnm@ is available at http://apps.cbu.uib.no/webnma.
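
One of the two comparison measures named above, the Root Mean Squared Inner Product (RMSIP), can be sketched in a few lines. This follows the commonly used definition for two equally sized sets of unit-normalised mode vectors; whether WEBnm@ computes it in exactly this way is an assumption, and the toy vectors are hypothetical.

```python
import math


def rmsip(modes_a, modes_b):
    """Root Mean Squared Inner Product between two sets of mode vectors.

    Each argument is a list of equal-length, unit-normalised vectors.
    RMSIP = sqrt((1/n) * sum_i sum_j (a_i . b_j)^2), with n = len(modes_a).
    """
    n = len(modes_a)
    total = 0.0
    for a in modes_a:
        for b in modes_b:
            dot = sum(x * y for x, y in zip(a, b))
            total += dot * dot
    return math.sqrt(total / n)


# Toy 3-dimensional modes (hypothetical, not from any real protein).
modes_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
identical = rmsip(modes_a, modes_a)                        # same subspace
orthogonal = rmsip([[1.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]])   # no overlap
```

For orthonormal mode sets of equal size, the value ranges from 0 (orthogonal subspaces) to 1 (identical subspaces).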

Conclusion

WEBnm@ v2.0 is an online tool offering unique capability for comparative NMA on multiple protein structures. Along with a convenient web interface, powerful computing resources, and several methods for mode analyses, WEBnm@ facilitates the assessment of protein flexibility within protein families and superfamilies. These analyses can give a good view of how the structures move and how the flexibility is conserved over the different structures.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0427-6) contains supplementary material, which is available to authorized users.

10.

Background

Traditionally, dental models, facial and intra-oral photographs and a set of two-dimensional radiographs are used for orthodontic diagnosis and treatment planning. Because evidence is lacking, there is ongoing discussion about which specific records are needed to make an orthodontic treatment plan.

Objective

To estimate the contribution and importance of different diagnostic records for making an orthodontic diagnosis and treatment plan.

Data sources

An electronic search in PubMed (1948–July 2012), EMBASE Excerpta Medica (1980–July 2012), CINAHL (1982–July 2012), Web of Science (1945–July 2012), Scopus (1996–July 2012), and Cochrane Library (1993–July 2012) was performed. Additionally, a hand search of the reference lists of included studies was performed to identify potentially eligible studies. There was no language restriction.

Study selection

The patient, intervention, comparator, outcome (PICO) question formulated for this study was as follows: for patients who need orthodontic treatment (P), will the use of record set X (I) compared with record set Y (C) change the treatment plan (O)? Only primary publications were included.

Data extraction

Independent extraction of data and quality assessment was performed by two observers.

Results

Of the 1041 publications retrieved, 17 met the inclusion criteria. Of these, 4 studies were of high quality. Because of the limited number of high quality studies and the differences in study designs, patient characteristics, and reference standard or index test, a meta-analysis was not possible.

Conclusion

Cephalograms are not routinely needed for orthodontic treatment planning in Class II malocclusions, digital models can be used to replace plaster casts, and cone-beam computed tomography radiographs can be indicated for impacted canines. Based on the findings of this review, the minimum record set required for orthodontic diagnosis and treatment planning could not be defined.

Systematic review registration number

CRD42012002365

11.

Background

Codon usage plays a crucial role when recombinant proteins are expressed in different organisms. This is especially the case if the codon usage frequency of the organism of origin and the target host organism differ significantly, for example when a human gene is expressed in E. coli. Therefore, to enable or enhance efficient gene expression it is of great importance to identify rare codons in any given DNA sequence and subsequently mutate these to codons which are more frequently used in the expression host.

Results

We describe an open-source web-based application, ATGme, which first identifies rare and highly rare codons from most organisms and then gives the user the possibility to optimize the sequence.

Conclusions

This application provides a simple user-friendly interface utilizing three optimization strategies: (1) one-click optimization, (2) bulk optimization (by codon type), and (3) individualized custom (codon-by-codon) optimization. ATGme is an open-source application which is freely available at: http://atgme.org
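
The first step described in this abstract, flagging rare codons in a coding sequence, can be sketched as below. This is not ATGme's code: the function name `find_rare_codons`, the usage table, and the threshold of 10 per thousand are all assumptions chosen for illustration.

```python
def find_rare_codons(dna, usage_per_thousand, threshold=10.0):
    """Return (codon_index, codon) pairs whose host usage is below `threshold`.

    Codons missing from the usage table are treated as usage 0.0 and are
    therefore flagged as rare.
    """
    dna = dna.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    usable_length = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable_length, 3)]
    return [(i, codon) for i, codon in enumerate(codons)
            if usage_per_thousand.get(codon, 0.0) < threshold]


# Hypothetical usage frequencies (per thousand codons) for an expression host.
usage = {"ATG": 27.0, "CTG": 52.0, "AGG": 1.5, "AGA": 2.5, "CGT": 21.0}
rare = find_rare_codons("ATGAGGCTGAGACGT", usage)
```

An optimization step would then replace each flagged codon with a synonymous codon that the host uses more frequently.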

12.
Zhang W, Niu Y, Xiong Y, Zhao M, Yu R, Liu J. PLoS ONE 2012, 7(8):e43575

Motivation

Conformational B-cell epitopes are the specific sites on antigens that have immune functions. The identification of conformational B-cell epitopes is of great importance to immunologists for facilitating the design of peptide-based vaccines. To narrow the search for experimental validation, various computational models have been developed for epitope prediction using antigen structures. However, the application of these models is undermined by the limited number of available antigen structures. In contrast to most available structure-based methods, we here attempt to accurately predict conformational B-cell epitopes from antigen sequences.

Methods

In this paper, we explore various sequence-derived features that have been observed to be associated with the location of epitopes or have been used in similar tasks. These features are evaluated and ranked by their discriminative performance on the benchmark datasets. From the perspective of information science, a combination of features usually leads to better results than individual features. To build a robust model, we adopt an ensemble learning approach to incorporate the various features, and develop an ensemble model to predict conformational epitopes from antigen sequences.

Results

Evaluated by leave-one-out cross-validation, the proposed method gives mean AUC scores of 0.687 and 0.651 on two datasets compiled from bound structures and unbound structures, respectively. When compared with publicly available servers on an independent dataset, our method yields better or comparable performance. The results demonstrate that the proposed method is useful for sequence-based conformational epitope prediction.

Availability

The web server and datasets are freely available at http://bcell.whu.edu.cn.

13.

Background

Internet support groups (ISGs) are popular, particularly among people with depression, but there is little high quality evidence concerning their effectiveness.

Aim

The study aimed to evaluate the efficacy of an ISG for reducing depressive symptoms among community members when used alone and in combination with an automated Internet-based psychotherapy training program.

Method

Volunteers with elevated psychological distress were identified using a community-based screening postal survey. Participants were randomised to one of four 12-week conditions: depression Internet Support Group (ISG), automated depression Internet Training Program (ITP), combination of the two (ITP+ISG), or a control website with delayed access to e-couch at 6 months. Assessments were conducted at baseline, post-intervention, 6 and 12 months.

Results

There was no change in depressive symptoms relative to control after 3 months of exposure to the ISG. However, both the ISG alone and the combined ISG+ITP group showed significantly greater reduction in depressive symptoms at 6 and 12 months follow-up than the control group. The ITP program was effective relative to control at post-intervention but not at 6 months.

Conclusions

ISGs for depression are promising and warrant further empirical investigation.

Trial Registration

Controlled-Trials.com ISRCTN65657330

14.

Background

The serine/threonine kinase PIM2 is highly expressed in human leukemia and lymphomas and has been shown to positively regulate survival and proliferation of tumor cells. Its diverse ATP site makes PIM2 a promising target for the development of anticancer agents. To date our knowledge of catalytic domain structures of the PIM kinase family is limited to PIM1 which has been extensively studied and which shares about 50% sequence identity with PIM2.

Principal Findings

Here we determined the crystal structure of PIM2 in complex with an organoruthenium complex (inhibition at the sub-nanomolar level). Due to its extraordinary shape complementarity, this stable organometallic compound is a highly potent inhibitor of PIM kinases.

Significance

The structure of PIM2 revealed several differences to PIM1 which may be explored further to generate isoform selective inhibitors. It has also demonstrated how an organometallic inhibitor can be adapted to the binding site of protein kinases to generate highly potent inhibitors.

Enhanced version

This article can also be viewed as an enhanced version in which the text of the article is integrated with interactive 3D representations and animated transitions. Please note that a web plugin is required to access this enhanced functionality. Instructions for the installation and use of the web plugin are available in Text S1.

15.

Background

The function of a protein can be deciphered with higher accuracy from its structure than from its amino acid sequence. Due to the huge gap in the available protein sequence and structural space, tools that can generate functionally homogeneous clusters using only the sequence information, hold great importance. For this, traditional alignment-based tools work well in most cases and clustering is performed on the basis of sequence similarity. But, in the case of multi-domain proteins, the alignment quality might be poor due to varied lengths of the proteins, domain shuffling or circular permutations. Multi-domain proteins are ubiquitous in nature, hence alignment-free tools, which overcome the shortcomings of alignment-based protein comparison methods, are required. Further, existing tools classify proteins using only domain-level information and hence miss out on the information encoded in the tethered regions or accessory domains. Our method, on the other hand, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better.

Results

Our web server, CLAP (Classification of Proteins), is one such alignment-free tool for automatic classification of protein sequences. It utilizes a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are a part of the matched patterns between two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions. Pilot studies undertaken previously on protein kinases and immunoglobulins have shown that CLAP yields clusters with high functional and domain architectural similarity. Moreover, parsing at a statistically determined cut-off resulted in clusters that corroborated the sub-family level classification of that particular domain family.

Conclusions

CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.

16.

Background

Vitamins are typical ligands that play critical roles in various metabolic processes. The accurate identification of the vitamin-binding residues solely based on a protein sequence is of significant importance for the functional annotation of proteins, especially in the post-genomic era, when large volumes of protein sequences are accumulating quickly without being functionally annotated.

Results

In this paper, a new predictor called TargetVita is designed and implemented for predicting protein-vitamin binding residues using protein sequences. In TargetVita, features derived from the position-specific scoring matrix (PSSM), predicted protein secondary structure, and vitamin binding propensity are combined to form the original feature space; then, several feature subspaces are selected by performing different feature selection methods. Finally, based on the selected feature subspaces, heterogeneous SVMs are trained and then ensembled for performing prediction.

Conclusions

The experimental results obtained with four separate vitamin-binding benchmark datasets demonstrate that the proposed TargetVita is superior to the state-of-the-art vitamin-specific predictor, and an average improvement of 10% in terms of the Matthews correlation coefficient (MCC) was achieved over independent validation tests. The TargetVita web server and the datasets used are freely available for academic use at http://csbio.njust.edu.cn/bioinf/TargetVita or http://www.csbio.sjtu.edu.cn/bioinf/TargetVita.
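
The Matthews correlation coefficient (MCC) used as the evaluation measure above is computed from the four confusion-matrix counts. The sketch below uses the standard formula; the example counts are hypothetical and are not TargetVita results.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical per-residue counts for an imbalanced binding-site problem.
example = mcc(tp=40, tn=900, fp=30, fn=30)
perfect = mcc(tp=5, tn=5, fp=0, fn=0)
```

Because binding residues are far rarer than non-binding residues, MCC is more informative here than plain accuracy, which a predictor could inflate by labelling every residue as non-binding.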

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-297) contains supplementary material, which is available to authorized users.

17.
18.

Background

The goal of haplotype assembly is to infer haplotypes of an individual from a mixture of sequenced chromosome fragments. Limited lengths of paired-end sequencing reads and inserts render haplotype assembly computationally challenging; in fact, most of the problem formulations are known to be NP-hard. Dimensions (and, therefore, difficulty) of the haplotype assembly problems keep increasing as the sequencing technology advances and the lengths of reads and inserts grow. The computational challenges are even more pronounced in the case of polyploid haplotypes, whose assembly is considerably more difficult than in the case of diploids. Fast, accurate, and scalable methods for haplotype assembly of diploid and polyploid organisms are needed.

Results

We develop a novel framework for diploid/polyploid haplotype assembly from high-throughput sequencing data. The method formulates the haplotype assembly problem as a semi-definite program and exploits its special structure – namely, the low rank of the underlying solution – to solve it rapidly and with high accuracy. The developed framework is applicable to both diploid and polyploid species. The code for SDhaP is freely available at https://sourceforge.net/projects/sdhap.

Conclusion

Extensive benchmarking tests on both real and simulated data show that the proposed algorithms outperform several well-known haplotype assembly methods in terms of either accuracy or speed or both. Useful recommendations for coverages needed to achieve near-optimal solutions are also provided.

19.

Background

The lack of early biomarkers for acute kidney injury (AKI) seriously inhibits the initiation of preventive and therapeutic measures for this syndrome in a timely manner. We tested the hypothesis that insulin-like growth factor-binding protein 7 (IGFBP7) and tissue inhibitor of metalloproteinases-2 (TIMP-2), both inducers of G1 cell cycle arrest, function as early biomarkers for AKI after congenital heart surgery with cardiopulmonary bypass (CPB).

Methods

We prospectively studied 51 children undergoing cardiac surgery with CPB. Serial urine samples were analyzed for [TIMP-2]•[IGFBP7]. The primary outcome measure was AKI defined by the pRIFLE criteria within 72 hours after surgery.

Results

Twelve children (24%) developed AKI within 1.67 (SE 0.3) days after surgery. Children who developed AKI after cardiac surgery had a significantly higher urinary [TIMP-2]•[IGFBP7] as early as 4 h after the procedure, compared to children who did not develop AKI (mean of 1.93 (ng/ml)²/1000 (SE 0.4) vs 0.47 (ng/ml)²/1000 (SE 0.1), respectively; p<0.05). Urinary [TIMP-2]•[IGFBP7] 4 hours following surgery demonstrated an area under the receiver-operating characteristic curve of 0.85. Sensitivity was 0.83, and specificity was 0.77 for a cutoff value of 0.70 (ng/ml)²/1000.
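
Sensitivity and specificity at a cutoff follow directly from the confusion-matrix counts. The abstract reports only the rates, so the counts below are hypothetical: they were chosen here because they are consistent with 12 AKI and 39 non-AKI children (51 in total) and reproduce the reported values after rounding.

```python
def sensitivity(tp, fn):
    """Fraction of true cases (AKI) correctly flagged by the test."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-cases (no AKI) correctly cleared by the test."""
    return tn / (tn + fp)


# Hypothetical counts; the abstract does not give a confusion matrix.
tp, fn = 10, 2    # 12 children with AKI
tn, fp = 30, 9    # 39 children without AKI

sens = round(sensitivity(tp, fn), 2)
spec = round(specificity(tn, fp), 2)
```

With a cohort this small, a change of one or two children in either group would shift both rates noticeably.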

Conclusions

Urinary [TIMP-2]•[IGFBP7] represent sensitive, specific, and highly predictive early biomarkers for AKI after surgery for congenital heart disease.

Trial Registration

www.germanctr.de/, DRKS00005062

20.

Background

Amino acid replacement rate matrices are a crucial component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference. Ideally, the rate matrix reflects the mutational behavior of the actual data under study; however, estimating amino acid replacement rate matrices requires large protein alignments and is computationally expensive and complex. As a compromise, sub-optimal pre-calculated generic matrices are typically used for protein-based phylogeny. Sequence availability has now grown to a point where problem-specific rate matrices can often be calculated if the computational cost can be controlled.

Results

The most time-consuming step in estimating rate matrices by maximum likelihood is building maximum likelihood phylogenetic trees from protein alignments. We propose a new procedure, called FastMG, to overcome this obstacle. The key innovation is the alignment-splitting algorithm that splits alignments with many sequences into non-overlapping sub-alignments prior to estimating amino acid replacement rates. Experiments with different large data sets showed that the FastMG procedure was an order of magnitude faster than without splitting. Importantly, there was no apparent loss in matrix quality if an appropriate splitting procedure was used.
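
The idea of cutting a large alignment into non-overlapping sub-alignments can be sketched as below. The abstract does not describe how FastMG decides which sequences are grouped together, so this sketch swaps in naive order-based chunking; the function name `split_alignment` and the size limit are assumptions.

```python
def split_alignment(sequences, max_size):
    """Split a list of aligned sequences into non-overlapping sub-alignments.

    Naive order-based chunking: every sequence lands in exactly one chunk,
    and no chunk holds more than `max_size` sequences.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    return [sequences[i:i + max_size]
            for i in range(0, len(sequences), max_size)]


# Hypothetical alignment of seven short, equal-length sequences.
alignment = ["MKV-L", "MKVAL", "MRV-L", "MKIAL", "MKV-I", "MRIAL", "MKVAV"]
sub_alignments = split_alignment(alignment, max_size=3)
sizes = [len(chunk) for chunk in sub_alignments]
```

Tree building is then run on each small sub-alignment instead of the full alignment, which is where the reported speed-up would come from; the abstract's caveat about an "appropriate splitting procedure" suggests that the grouping strategy matters for matrix quality.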

Conclusions

FastMG is a simple, fast and accurate procedure to estimate amino acid replacement rate matrices from large data sets. It enables researchers to study the evolutionary relationships for specific groups of proteins or taxa with optimized, data-specific amino acid replacement rate matrices. The programs, data sets, and the new mammalian mitochondrial protein rate matrix are available at http://fastmg.codeplex.com.
