Similar articles
Found 20 similar articles (search time: 15 ms)
1.
2.
We undertook this project in response to the rapidly increasing number of protein structures with unknown functions in the Protein Data Bank. Here, we combined a genetic algorithm with a support vector machine to predict protein–protein binding sites. In an experiment on a testing dataset, we predicted the binding sites for 66% of the testing dataset, which comprised 50 hetero-complexes. This classifier achieved greater sensitivity (60.17%), specificity (58.17%), accuracy (64.08%), and F-measure (54.79%), and a higher correlation coefficient (0.2502), than the support vector machine alone. This result can be used to guide biologists in designing specific experiments for protein analysis.
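The evaluation measures quoted in this abstract can all be derived from a confusion matrix. The following is a minimal sketch, assuming the "correlation coefficient" is the Matthews correlation coefficient (the abstract does not say so explicitly); the counts are hypothetical and are not taken from the study.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Common binary-classification metrics from confusion-matrix counts.

    Assumes every denominator below is non-zero, except for the
    correlation coefficient, which falls back to 0.0.
    """
    sensitivity = tp / (tp + fn)          # recall on true binding-site residues
    specificity = tn / (tn + fp)          # recall on non-binding residues
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=60, fp=40, tn=60, fn=40)
print(metrics)
```

With these illustrative counts every rate is 0.6 and the correlation coefficient is 0.2, which shows how a classifier can have moderate accuracy while its correlation with the true labels remains low.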

3.
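Predicting membrane protein types with the random forest method   Cited by: 2 (self-citations: 0, citations by others: 2)
The type of a membrane protein is closely related to its function, so predicting membrane protein type is an important means of studying function, and predicting the type directly from the amino acid sequence is of considerable value. Based on the amino acid sequence, this study used the combined increment of diversity together with pseudo-amino acid composition information as prediction parameters and applied a random forest classifier to predict eight classes of membrane proteins. Prediction accuracy was 86.3% under the jackknife test and 93.8% on an independent test, better than previously reported results.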

4.

Background

There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses.

Methodology/Principal Findings

We show that current criteria are insufficient to build alignments for use with covariation analyses as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences, improves contact prediction by covariation analysis. Finally we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature.

Conclusions/Significance

Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions, because they emphasize pairwise covariation over group covariation.
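To make the idea of pairwise covariation concrete, here is a minimal sketch that scores two alignment columns by mutual information. Mutual information is used only as a familiar stand-in; the abstract does not name its statistics, so this should not be read as the authors' method. The toy alignment is hypothetical.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j.

    `alignment` is a list of equal-length aligned sequences.
    """
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi

# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 2 is fully conserved, column 3 varies independently of column 0.
toy = ["AKLE", "AKLD", "GRLE", "GRLD"]
mi_covarying = mutual_information(toy, 0, 1)
mi_independent = mutual_information(toy, 0, 3)
mi_conserved = mutual_information(toy, 0, 2)
print(mi_covarying, mi_independent, mi_conserved)
```

In this toy case the covarying pair scores 1 bit while the independent and conserved pairs score 0. Shifting one sequence by a single position would change these column counts, which is the sensitivity to misalignment that the abstract describes.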

5.

Background

Structured Logistic Regression (SLR) is a newly developed machine learning tool first proposed in the context of text categorization. Current availability of extensive protein sequence databases calls for an automated method to reliably classify sequences and SLR seems well-suited for this task. The classification of P-type ATPases, a large family of ATP-driven membrane pumps transporting essential cations, was selected as a test-case that would generate important biological information as well as provide a proof-of-concept for the application of SLR to a large scale bioinformatics problem.

Results

Using SLR, we have built classifiers to identify and automatically categorize P-type ATPases into one of 11 pre-defined classes. The SLR-classifiers are compared to a Hidden Markov Model approach and shown to be highly accurate and scalable. We analysed 9.3 million sequences in the UniProtKB, representing the bulk of currently known sequences, and attempted to classify a large number of P-type ATPases. To examine the distribution of pumps across organisms, we also applied SLR to 1,123 complete genomes from the Entrez genome database. Finally, we analysed the predicted membrane topology of the identified P-type ATPases.

Conclusions

Using the SLR-based classification tool we are able to run a large scale study of P-type ATPases. This study provides proof-of-concept for the application of SLR to a bioinformatics problem and the analysis of P-type ATPases pinpoints new and interesting targets for further biochemical characterization and structural analysis.

6.
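Effect of the extracellular non-conserved region of the paramyxovirus F1 protein on its specific membrane fusion   Cited by: 2 (self-citations: 0, citations by others: 2)
To understand the role of the extracellular non-conserved region of the fusion protein F1 subunit in the specific membrane fusion mediated by the fusion protein (F) and hemagglutinin-neuraminidase (HN), site-directed mutagenesis was used to introduce restriction sites into the extracellular non-conserved regions of the F1 genes of Newcastle disease virus (NDV) and human parainfluenza virus (hPIV), yielding the mutants NDV-M and hPIV-M, each carrying the same three restriction sites. The cell fusion activity of the mutants was found to be the same as that of the wild-type strains. Three restriction endonucleases were then used to excise two fragments from each mutant: NDV F-1 and F-2, and hPIV F-1 and F-2. Exchanging the corresponding F-1 fragments between NDV-M and hPIV-M and recombining produced two chimeras, NDV-C1 and hPIV-C1; exchanging the F-2 fragments in the same way produced two further chimeras, NDV-C2 and hPIV-C2. Each chimera DNA was co-transfected into BHK21 cells with the homologous or heterologous HN gene and expressed in these eukaryotic cells. Cell fusion activity was assessed by Giemsa staining and a reporter gene assay, and F protein expression efficiency was measured by fluorescence intensity analysis (FACS). The results showed that the mutants NDV-M and hPIV-M had the same cell fusion activity as the wild-type strains and could be used to construct chimeras. When co-expressed with NDV HN, NDV-C1 and NDV-C2 reached 76.34% and 96.2% of wild-type fusion activity, respectively; no cell fusion occurred when either was co-expressed with hPIV HN. When co-expressed with hPIV HN, hPIV-C1 and hPIV-C2 reached 65.82% and 93.78% of wild-type fusion activity, respectively; no cell fusion occurred when either was co-expressed with NDV HN. FACS analysis showed that the F protein expression efficiency of the mutants and of all chimeras did not differ appreciably from that of the wild-type strains. These results indicate that, within the extracellular non-conserved region of the F1 protein, the NDV F-1 and hPIV F-1 fragments play an important role in the specific membrane fusion of NDV and hPIV, whereas the NDV F-2 and hPIV F-2 fragments contribute less specificity to membrane fusion.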

7.

Motivation

Intrinsically disordered regions of proteins play an essential role in the regulation of various biological processes. Key to their regulatory function is often the binding to globular protein domains via sequence elements known as molecular recognition features (MoRFs). Development of computational tools for the identification of candidate MoRF locations in amino acid sequences is an important task and an area of growing interest. Given the relative sparseness of MoRFs in protein sequences, the accuracy of the available MoRF predictors is often inadequate for practical usage, which leaves a significant need and room for improvement. In this work, we introduce MoRFCHiBi_Web, which predicts MoRF locations in protein sequences with higher accuracy compared to current MoRF predictors.

Methods

Three distinct and largely independent property scores are computed with component predictors and then combined to generate the final MoRF propensity scores. The first score reflects the likelihood of sequence windows to harbour MoRFs and is based on amino acid composition and sequence similarity information. It is generated by MoRFCHiBi using small windows of up to 40 residues in size. The second score identifies long stretches of protein disorder and is generated by ESpritz with the DisProt option. Lastly, the third score reflects residue conservation and is assembled from PSSM files generated by PSI-BLAST. These propensity scores are processed and then hierarchically combined using Bayes rule to generate the final MoRFCHiBi_Web predictions.

Results

MoRFCHiBi_Web was tested on three datasets. Results show that MoRFCHiBi_Web outperforms previously developed predictors by generating less than half the false positive rate for the same true positive rate at practical threshold values. This level of accuracy paired with its relatively high processing speed makes MoRFCHiBi_Web a practical tool for MoRF prediction.

Availability

http://morf.chibi.ubc.ca:8080/morf/.
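The Methods of this entry describe combining component propensity scores with Bayes rule. The sketch below shows one generic way to do that for two scores. It assumes each score can be read as a posterior probability, that the scores are conditionally independent given the class, and that both share a known prior. These assumptions and the function name `combine_bayes` are illustrative and are not taken from MoRFCHiBi_Web.

```python
def combine_bayes(p1, p2, prior=0.5):
    """Combine two posterior-like scores under conditional independence.

    p1 and p2 are P(MoRF | evidence 1) and P(MoRF | evidence 2), both
    assumed to lie strictly between 0 and 1; `prior` is P(MoRF).
    """
    positive = p1 * p2 / prior
    negative = (1 - p1) * (1 - p2) / (1 - prior)
    return positive / (positive + negative)

# A score equal to the prior carries no information and leaves the other unchanged.
neutral = combine_bayes(0.8, 0.5)
# Two agreeing scores reinforce each other.
reinforced = combine_bayes(0.8, 0.8)
# A hierarchical combination of three scores can be written by nesting.
three_way = combine_bayes(combine_bayes(0.8, 0.8), 0.5)
print(neutral, reinforced, three_way)
```

Nesting the call mirrors the hierarchical combination mentioned in the abstract, although the actual processing applied to each score before combination is not specified there.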

8.
This article studies periodicity in protein sequences. It introduces the notion of latent periodicity and develops a mathematical method for detecting it. The method is demonstrated on known cases of perfect and imperfect periodicity. It reveals latent periodicity in many protein sequences from the SWISS-PROT data bank; examples are given for the translation initiation factor EIF-2B (epsilon subunit) of Saccharomyces cerevisiae (E2BE_YEAST), the E. coli ferrienterochelin receptor (FEPA_ECOLI), the lysozyme of bacteriophage SF6 (LY_BPSF6), and lipoamide dehydrogenase of Azotobacter vinelandii (DLDH_AZOVI). These protein sequences have latent periods of six, two, seven, and 19 amino acids, respectively. We propose that a possible purpose of latent periodicity in amino acid sequences is to determine certain protein structures.

9.
A protein was isolated and purified from the ventral portion of the Potca fish, Tetraodon patoca. Purification was accomplished by gel filtration of the crude protein extract on Sephadex G-50, followed by ion-exchange chromatography on DEAE-cellulose and finally by affinity chromatography on a ConA-Sepharose matrix. The molecular weight of the protein was about 82,000 by gel filtration and about 80,000 by SDS-PAGE, whereas SDS-PAGE in the presence of 2-mercaptoethanol indicated values of 42,000 and 38,000. The protein agglutinated rat red blood cells, and in a hapten-inhibition test it was inhibited specifically by d-mannose and mannose-containing saccharides. The protein is a glycoprotein with a neutral sugar content of about 0.35%. The purified protein also showed strong cytotoxic effects, as assessed by the brine shrimp lethality bioassay and histopathological examination. The N-terminal amino acid sequences of both subunits were also determined, and a BLAST search with these sequences revealed significant homology with homologous proteins in the database.

10.
Located on chromosome 6p21, classical human leukocyte antigen (HLA) genes are highly polymorphic. HLA alleles associate with a variety of phenotypes, such as narcolepsy, autoimmunity, and immunologic response to infectious disease. Moreover, high-resolution genotyping of these loci is critical to achieving long-term survival of allogeneic transplants. Development of methods to obtain high-resolution analysis of HLA genotypes will lead to improved understanding of how select alleles contribute to human health and disease risk. Genomic DNAs were obtained from a cohort of n = 383 subjects recruited as part of an Ulcerative Colitis study and analyzed for HLA-DRB1. HLA genotypes were determined using sequence-specific oligonucleotide probes and by next-generation sequencing using the Roche/454 GSFLX instrument. The Clustering and Alignment of Polymorphic Sequences (CAPSeq) software application was developed to analyze next-generation sequencing data. The application generates HLA sequence-specific 6-digit genotype information from next-generation sequencing data, using MUMmer to align sequences and the R package diffusionMap to classify sequences into their respective allelic groups. Incorporating bootstrap aggregating (bagging) to aid in sorting sequences into allele classes improved genotyping accuracy. With 60 bagging iterations, the genotyping results obtained using CAPSeq showed high concordance with the 4-digit genotypes characterized by sequence-specific oligonucleotide probes, matching at 759 out of 766 (99.1%) alleles.

11.
When aligning RNAs, it is important to consider both the secondary structure similarity and primary sequence similarity to find an accurate alignment. However, algorithms that can handle RNA secondary structures typically have high computational complexity that limits their utility. For this reason, there have been a number of attempts to find useful alignment constraints that can reduce the computations without sacrificing the alignment accuracy. In this paper, we propose a new method for finding effective alignment constraints for fast and accurate structural alignment of RNAs, including pseudoknots. In the proposed method, we use a profile-HMM to identify the “seed” regions that can be aligned with high confidence. We also estimate the position range of the aligned bases that are located outside the seed regions. The location of the seed regions and the estimated range of the alignment positions are then used to establish the sequence alignment constraints. We incorporated the proposed constraints into the profile context-sensitive HMM (profile-csHMM) based RNA structural alignment algorithm. Experiments indicate that the proposed method can make the alignment speed up to 11 times faster without degrading the accuracy of the RNA alignment.

12.

Introduction  

Proteomic characterization of the human pancreatic islets, containing the insulin producing beta-cells, is likely to be of great importance for improved treatment and understanding of the pathophysiology of diabetes mellitus.

13.
The correct topology and orientation of integral membrane proteins are essential for their proper function, yet such information has not been established for many membrane proteins. A simple technique called fluorescence protease protection (FPP) is presented, which permits the determination of membrane protein topology in living cells. This technique has numerous advantages over other methods for determining protein topology, in that it does not require the availability of multiple antibodies against various domains of the membrane protein, does not require large amounts of protein, and can be performed on living cells. The FPP method employs the spatially confined actions of proteases on the degradation of green fluorescent protein (GFP) tagged membrane proteins to determine their membrane topology and orientation. This simple approach is applicable to a wide variety of cell types, and can be used to determine membrane protein orientation in various subcellular organelles such as the mitochondria, Golgi, endoplasmic reticulum and components of the endosomal/recycling system. Membrane proteins, tagged on either the N-termini or C-termini with a GFP fusion, are expressed in a cell of interest, which is subject to selective permeabilization using the detergent digitonin. Digitonin has the ability to permeabilize the plasma membrane, while leaving intracellular organelles intact. GFP moieties exposed to the cytosol can be selectively degraded through the application of protease, whereas GFP moieties present in the lumen of organelles are protected from the protease and remain intact. The FPP assay is straightforward, and results can be obtained rapidly.

14.
We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds versus Blastp.
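A small sketch may help show what a seed built over a seed alphabet does. It assumes a simplified formulation in which each seed position is a set of residue groups, and two residues match at that position if they are identical or share a group. The groups and the names `EXACT`, `COARSE`, and `seed_hits` are hypothetical and are not the alphabets designed in the paper.

```python
# Each seed letter is a tuple of residue groups; residues outside every
# group match only themselves. Both letters below are hypothetical.
EXACT = ()                       # identity only
COARSE = ("LVIM", "KR", "DE")    # illustrative grouping, not from the paper

def letter_matches(letter, a, b):
    """True if residues a and b are identical or fall in the same group."""
    return a == b or any(a in group and b in group for group in letter)

def seed_hits(seed, s, t):
    """Return all (i, j) such that the seed matches s[i:i+k] against t[j:j+k]."""
    k = len(seed)
    hits = []
    for i in range(len(s) - k + 1):
        for j in range(len(t) - k + 1):
            if all(letter_matches(seed[p], s[i + p], t[j + p]) for p in range(k)):
                hits.append((i, j))
    return hits

seed = [EXACT, COARSE, EXACT]
hits_similar = seed_hits(seed, "ALK", "AVK")    # L and V share a group
hits_dissimilar = seed_hits(seed, "ALK", "AKK")  # L and K do not
print(hits_similar, hits_dissimilar)
```

The quadratic scan is for clarity only. Because each letter here partitions the residues, every k-mer can be reduced to a key of group labels and looked up in a hash table, which is presumably what makes this kind of seed cheap to implement.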

15.
The envelope and precursor membrane (prM) proteins of dengue virus (DENV) are present on the surface of immature virions. During maturation, prM protein is cleaved by furin protease into pr peptide and membrane (M) protein. Although previous studies mainly focusing on the pr region have identified several residues important for DENV replication, the functional role of M protein, particularly the α-helical domain (MH), which is predicted to undergo a large conformational change during maturation, remains largely unknown. In this study, we investigated the role of nine highly conserved MH domain residues in the replication cycle of DENV by site-directed mutagenesis in a DENV1 prME expression construct and found that alanine substitutions introduced to four highly conserved residues at the C terminus and one at the N terminus of the MH domain greatly affect the production of both virus-like particles and replicon particles. Eight of the nine alanine mutants affected the entry of replicon particles, which correlated with the impairment in prM cleavage. Moreover, seven mutants were found to have reduced prM-E interaction at low pH, which may inhibit the formation of smooth immature particles and exposure of prM cleavage site during maturation, thus contributing to inefficient prM cleavage. Taken together, these results are the first report showing that highly conserved MH domain residues, located at 20–38 amino acids downstream from the prM cleavage site, can modulate the prM cleavage, maturation of particles, and virus entry. The highly conserved nature of these residues suggests potential targets of antiviral strategy.

16.
17.
The complete genomes of living organisms have provided much information on their phylogenetic relationships. Similarly, the complete genomes of chloroplasts have helped to resolve the evolution of this organelle in photosynthetic eukaryotes. In this paper we propose an alternative method of phylogenetic analysis using compositional statistics for all protein sequences from complete genomes. This new method is conceptually simpler than and computationally as fast as the one proposed by Qi et al. (2004b) and Chu et al. (2004). The same data sets used in Qi et al. (2004b) and Chu et al. (2004) are analyzed using the new method. Our distance-based phylogenetic tree of the 109 prokaryotes and eukaryotes agrees with the biologists' tree of life based on 16S rRNA comparison in a predominant majority of basic branching and most lower taxa. Our phylogenetic analysis also shows that the chloroplast genomes are separated into two major clades corresponding to chlorophytes s.l. and rhodophytes s.l. The interrelationships among the chloroplasts are largely in agreement with the current understanding of chloroplast evolution. Reviewing Editor: Dr. John Oakeshott

18.
19.

Background  

Proteins have evolved subject to energetic selection pressure for stability and flexibility. Structural similarity between proteins that have gone through conformational changes can be captured effectively if flexibility is considered. Topologically unrelated proteins that preserve secondary structure packing interactions can be detected if both flexibility and sequential permutations are considered. We propose the FlexSnap algorithm for flexible non-topological protein structural alignment.

20.
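Yang Ziheng (杨子恒). Acta Genetica Sinica (《遗传学报》), 1994, 21(3): 198-200
This paper examines the shortcomings of the methods currently used to estimate evolutionary distances between homologous protein sequences and proposes several new formulas that account for the evident variation in substitution rates among amino acid sites. It also proposes a maximum likelihood estimation method that accounts for the different substitution probabilities between amino acids. The formulas are compared computationally, and recommendations are given for their use in practice.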
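As background for the rate-variation issue raised in this abstract, the sketch below contrasts two widely used textbook distance estimates: the Poisson correction, which assumes equal rates at all sites, and a gamma distance, which allows rates to vary among sites with shape parameter alpha. These are standard forms given for illustration and are not claimed to be the formulas proposed in the paper; the sequences and the value of alpha are hypothetical.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of differing residues over aligned, gap-free positions."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)

def poisson_distance(p):
    """Poisson-corrected distance, assuming equal rates at every site."""
    return -math.log(1 - p)

def gamma_distance(p, alpha):
    """Gamma distance, assuming gamma-distributed rates among sites."""
    return alpha * ((1 - p) ** (-1 / alpha) - 1)

# Hypothetical aligned sequences differing at 2 of 4 positions.
p = p_distance("ACDE", "ACFG")
d_poisson = poisson_distance(p)
d_gamma = gamma_distance(p, alpha=1.0)
print(p, d_poisson, d_gamma)
```

For the same observed proportion of differences, the gamma distance is larger than the Poisson distance, reflecting how ignoring rate variation among sites tends to underestimate the true number of substitutions.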
