Similar literature
 Found 20 similar articles; search took 406 ms
1.
2.
InterPro was developed as a new integrated documentation resource for protein families, domains and functional sites to rationalize the complementary efforts of the PROSITE, PRINTS, Pfam and ProDom database projects and has applications in computational functional classification of newly determined sequences lacking biochemical characterization and in comparative genome analysis. InterPro contains over 3500 entries, with more than 1000000 hits in SWISS-PROT and TrEMBL. The database is accessible for text- and sequence-based searches at http://www.ebi.ac.uk/interpro/. InterPro was used for whole proteome analysis of the pathogenic microorganism, Mycobacterium tuberculosis, and comparison with the predicted protein coding sequences of the complete genomes of Bacillus subtilis and Escherichia coli. 64.8% of the M. tuberculosis proteins in the proteome matched InterPro entries, and these could be classified according to function. The comparison with B. subtilis and E. coli provided information on the most common protein families and domains, and the most highly represented families in each organism. InterPro thus provides a useful tool for global views of whole proteomes and their compositions.

3.
Borich SM  Murray A  Gormley E 《Microbios》2000,102(401):7-15
A Mycobacterium bovis gene coding for a putative MalE maltose binding protein was cloned and its full-length sequence determined. Database searches revealed 99.9% identity with lpqY, encoding a putative sugar uptake protein from Mycobacterium tuberculosis strain H37Rv. The deduced protein product showed high sequence similarity to MalE-like proteins from a variety of bacterial species, including Mycobacterium leprae. Analysis of flanking database sequences from M. tuberculosis and M. leprae revealed the presence of malF-, malG- and malK-like genes. Comparison of these mycobacterial sequences with other maltose operons has allowed us to deduce a unique genomic arrangement of the genes involved in the uptake of maltose in members of the Mycobacterium tuberculosis complex and M. leprae.

4.
Mycobacterium tuberculosis, the etiologic agent of tuberculosis (TB), possesses at least five genes predicted to encode proteins with NlpC/P60 hydrolase domains, including the relatively uncharacterized Rv2190c. As NlpC/P60 domain-containing proteins are associated with diverse roles in bacterial physiology, our objective was to characterize the role of Rv2190c in M. tuberculosis growth and virulence. Our data indicate that lack of Rv2190c is associated with impaired growth, both in vitro and in an in vivo mouse model of TB. These growth defects are associated with altered colony morphology and phthiocerol dimycocerosate levels, indicating that Rv2190c is involved in cell wall maintenance and composition. In addition, we have demonstrated that Rv2190c is expressed during active growth phase and that its protein product is immunogenic during infection. Our findings have significant implications, both for better understanding the role of Rv2190c in M. tuberculosis biology and also for translational developments.

5.
Interpreting genome sequences requires the functional analysis of thousands of predicted proteins, many of which are uncharacterized and without obvious homologs. To assess whether the roles of large sets of uncharacterized genes can be assigned by targeted application of a suite of technologies, we used four complementary protein-based methods to analyze a set of 100 uncharacterized but essential open reading frames (ORFs) of the yeast Saccharomyces cerevisiae. These proteins were subjected to affinity purification and mass spectrometry analysis to identify copurifying proteins, two-hybrid analysis to identify interacting proteins, fluorescence microscopy to localize the proteins, and structure prediction methodology to predict structural domains or identify remote homologies. Integration of the data assigned function to 48 ORFs using at least two of the Gene Ontology (GO) categories of biological process, molecular function, and cellular component; 77 ORFs were annotated by at least one method. This combination of technologies, coupled with annotation using GO, is a powerful approach to classifying genes.

6.
Understanding and characterizing the biochemical and evolutionary information within the wealth of protein sequence and structural data, particularly at functionally important sites, is very important. A comprehensive analysis of physico-chemical properties and evolutionary conservation patterns at the molecular and biological function level is expected to yield important clues for identifying similar sites in as-yet uncharacterized proteins. We present a library of protein functional templates (PFTs) designed to represent the compositional and evolutionary conservation patterns of functional sites at the molecular and biological function level. Subsequently we developed LIMACS (LInear MAtching of Conservation Scores), a software tool that uses the template library for the prediction of functionally important sites in a multiple sequence alignment, transferring the molecular function annotation from the most-similar functional site in the template library to a predicted site.

7.
Lengthy co-evolution of Homo sapiens and Mycobacterium tuberculosis, the main causative agent of tuberculosis, resulted in a dramatically successful pathogen species that presents a considerable challenge for modern medicine. The continuous and ever increasing appearance of multi-drug resistant mycobacteria necessitates the identification of novel drug targets and drugs with new mechanisms of action. However, further insights are needed to establish automated protocols for target selection based on the available complete genome sequences. In the present study, we perform complete proteome level comparisons between M. tuberculosis, mycobacteria, other prokaryotes and available eukaryotes based on protein domains, local sequence similarities and protein disorder. We show that the enrichment of certain domains in the genome can indicate an important function specific to M. tuberculosis. We identified two families, termed pkn and PE/PPE, that stand out in this respect. The common property of these two protein families is a complex domain organization that combines species-specific regions, commonly occurring domains and disordered segments. Besides highlighting promising novel drug target candidates in M. tuberculosis, the presented analysis can also be viewed as a general protocol to identify proteins involved in species-specific functions in a given organism. We conclude that target selection protocols should be extended to include proteins with complex domain architectures instead of focusing on sequentially unique and essential proteins only.

8.

Background  

Signal transduction events often involve transient, yet specific, interactions between structurally conserved protein domains and polypeptide sequences in target proteins. The identification and validation of these associating domains is crucial to understand signal transduction pathways that modulate different cellular or developmental processes. Bioinformatics strategies to extract and integrate information from diverse sources have been shown to facilitate the experimental design to understand complex biological events. These methods, primarily based on information from high-throughput experiments, have also led to the identification of new connections thus providing hypothetical models for cellular events. Such models, in turn, provide a framework for directing experimental efforts for validating the predicted molecular rationale for complex cellular processes. In this context, it is envisaged that the rational design of peptides for protein-peptide binding studies could substantially facilitate the experimental strategies to evaluate a predicted interaction. This rational design procedure involves the integration of protein-protein interaction data, gene ontology, physico-chemical calculations, domain-domain interaction data and information on functional sites or critical residues.

9.
G Schneider 《Gene》1999,237(1):113-121
Artificial neural networks were trained on the prediction of the subcellular location of bacterial proteins. A cross-validated average prediction accuracy of 93% was reached for distinction between cytoplasmic and non-cytoplasmic proteins, based on the analysis of protein amino-acid composition. Principal component analysis and self-organizing maps were used to create graphical representations of amino-acid sequence space. A clear separation of cytoplasmic, periplasmic, and extracellular proteins was observed. The neural network system was applied to predicting potentially secreted proteins in 15 complete genomes. For mesophile bacteria the predicted fractions of non-cytoplasmic proteins agree with previously published estimates, ranging between 15% and 30%. Characteristics of thermophile genomes might lead to an under-estimation of the fraction of secreted proteins by presently available prediction systems. A self-organizing map was constructed from all 15 bacterial genomes. This technique can reveal additional sequence features independent from exhaustive pair-wise sequence alignment. The Treponema pallidum and Mycobacterium tuberculosis data formed separate clusters indicating unusual characteristics of these genomes.
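
The abstract above bases its predictions on amino-acid composition. As a rough illustration only, the following sketch shows how such a 20-dimensional composition vector might be computed from a sequence. The function name `aa_composition` and the choice to ignore non-standard residues are assumptions made here, not details from the paper, and the network training itself is not shown.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper: characters outside the 20 standard residues
    (e.g. X, B, Z, gaps) are ignored, which is an assumption of this sketch.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A vector of this kind could then serve as the input to a classifier or a self-organizing map.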

10.
We have determined the complete nucleotide sequence for TEF-1, one of three genes coding for elongation factor (EF)-1 alpha in Mucor racemosus. The deduced EF-1 alpha protein contains 458 amino acids encoded by two exons. The presence of an intervening sequence located near the 3' end of the gene was predicted by the nucleotide sequence data and confirmed by alkaline S1 nuclease mapping. The amino acid sequence of EF-1 alpha was compared to the published amino acid sequences of EF-1 alpha proteins from Saccharomyces cerevisiae and Artemia salina. These proteins shared nearly 85% homology. A similar comparison to the functionally analogous EF-Tu from Escherichia coli revealed several regions of amino acid homology suggesting that the functional domains are conserved in elongation factors from these diverse organisms. Secondary structure predictions indicated that alpha helix and beta sheet conformations associated with the functional domains in EF-Tu are present in the same relative location in EF-1 alpha from M. racemosus. Through this comparative structural analysis we have predicted the general location of functional domains in EF-1 alpha which interact with GTP and tRNA.

11.
12.
Fliess A  Motro B  Unger R 《Proteins》2002,48(2):377-387
An important question in protein evolution is to what extent proteins may have undergone swaps (switches of domain or fragment order) during evolution. Such events might have occurred in several forms: swaps of short fragments, swaps of structural and functional motifs, or recombination of domains in multidomain proteins. This question is important for the theoretical understanding of the evolution of proteins, and has practical implications for using swaps as a design tool in protein engineering. In order to analyze the question systematically, we conducted a large scale survey of possible swaps and permutations among all pairs of proteins from the Swissprot database. A swap is defined as a specific kind of sequence mutation between two proteins in which two fragments that appear in both sequences have different relative order in the two sequences. For example, aXbYc and dYeXf are defined as a swap, where X and Y represent sequence fragments that switched their order. Identifying such swaps is difficult using standard sequence comparison packages. One of the main problems in the analysis stems from the fact that many sequences contain repeats, which may be identified as false-positive swaps. We have used two different approaches to detect pairs of proteins with swaps. The first approach is based on the predefined list of domains in Pfam. We identified all the proteins that share at least two domains and analyzed their relative order, looking for pairs in which the order of these domains was switched. We designed an algorithm to distinguish between real swaps and duplications. In the second approach, we used Blast to detect pairs of proteins that share several fragments. Then, we used an automatic procedure to select pairs that are likely to contain swaps. Those pairs were analyzed visually, using a graphical tool, to eliminate duplications.
Combining these approaches, about 140 different cases of swaps in the Swissprot database were found (after eliminating multiple pairs within the same family). Some of the cases have been described in the literature, but many are novel examples. Although each new example identified may be interesting to analyze, our main conclusion is that cases of swaps are rare in protein evolution. This observation is at odds with the common view that proteins are very modular to the point that modules (e.g., domains) can be shuffled between proteins with minimal constraints. Our study suggests that sequential constraints, i.e., the relative order between domains, are highly conserved.
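
The swap definition in this abstract (two shared fragments whose relative order differs between two proteins, as in aXbYc versus dYeXf) can be sketched in a few lines. This is only an illustration of the definition, not the authors' algorithm: the function `find_swaps` is hypothetical, and it sidesteps the repeat problem the abstract mentions by simply ignoring any fragment that occurs more than once in either protein.

```python
from itertools import combinations


def find_swaps(domains_a, domains_b):
    """Return pairs (x, y) that occur in order x..y in A but y..x in B.

    Assumption of this sketch: fragments that occur more than once in
    either protein are skipped, so repeats cannot produce false swaps.
    """
    pos_a = {d: i for i, d in enumerate(domains_a) if domains_a.count(d) == 1}
    pos_b = {d: i for i, d in enumerate(domains_b) if domains_b.count(d) == 1}
    # Shared fragments, sorted by their position in protein A.
    shared = sorted(set(pos_a) & set(pos_b), key=pos_a.get)
    swaps = []
    for x, y in combinations(shared, 2):
        # x precedes y in A by construction; a swap means y precedes x in B.
        if pos_b[x] > pos_b[y]:
            swaps.append((x, y))
    return swaps
```

Calling `find_swaps(list("aXbYc"), list("dYeXf"))` reports the pair `("X", "Y")`, matching the example in the abstract, while two proteins with the same fragment order yield an empty list.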

13.
14.
The biological role, biochemical function, and structure of uncharacterized protein sequences are often inferred from their similarity to known proteins. A constant goal is to increase the reliability, sensitivity, and accuracy of alignment techniques to enable the detection of increasingly distant relationships. Development, tuning, and testing of these methods benefit from appropriate benchmarks for the assessment of alignment accuracy. Here, we describe a benchmark protocol to estimate sequence-to-sequence and sequence-to-structure alignment accuracy. The protocol consists of structurally related pairs of proteins and procedures to evaluate alignment accuracy over the whole set. The set of protein pairs covers all the currently known fold types. The benchmark is challenging in the sense that it consists of proteins lacking clear sequence similarity. Correct target alignments are derived from the three-dimensional structures of these pairs by rigid body superposition. An evaluation engine computes the accuracy of alignments obtained from a particular algorithm in terms of alignment shifts with respect to the structure derived alignments. Using this benchmark we estimate that the best results can be obtained from a combination of amino acid residue substitution matrices and knowledge-based potentials.

15.
MOTIVATION: The completion of the Arabidopsis genome offers the first opportunity to analyze all of the membrane protein sequences of a plant. The majority of integral membrane proteins including transporters, channels, and pumps contain hydrophobic alpha-helices and can be selected based on TransMembrane Spanning (TMS) domain prediction. By clustering the predicted membrane proteins based on sequence, it is possible to sort the membrane proteins into families of known function, based on experimental evidence or homology, or unknown function. This provides a way to identify target sequences for future functional analysis. RESULTS: An automated approach was used to select potential membrane protein sequences from the set of all predicted proteins and cluster the sequences into related families. The recently completed sequence of Arabidopsis thaliana, a model plant, was analyzed. Of the 25,470 predicted protein sequences 4589 (18%) were identified as containing two or more membrane spanning domains. The membrane protein sequences clustered into 628 distinct families containing 3208 sequences. Of these, 211 families (1764 sequences) either contained proteins of known function or showed homology to proteins of known function in other species. However, 417 families (1444 sequences) contained only sequences with no known function and no homology to proteins of known function. In addition, 1381 sequences did not cluster with any family and no function could be assigned to 1337 of these.
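
The counts reported in this abstract are internally consistent, which a short check makes easy to see. The snippet below only restates the figures given above; the variable names are chosen here for illustration.

```python
# Figures as reported in the abstract.
total_predicted = 25470          # all predicted protein sequences
membrane = 4589                  # sequences with two or more membrane spanning domains
unclustered = 1381               # membrane sequences not assigned to any family

# (number of families, number of sequences) per category.
clustered = {
    "known_function": (211, 1764),
    "unknown_function": (417, 1444),
}

families = sum(n_fam for n_fam, _ in clustered.values())
clustered_sequences = sum(n_seq for _, n_seq in clustered.values())
membrane_fraction = membrane / total_predicted

print(families, clustered_sequences, clustered_sequences + unclustered)
print(f"{membrane_fraction:.1%}")
```

The two categories add up to the 628 families and 3208 clustered sequences, and clustered plus unclustered sequences recover the 4589 membrane proteins, about 18% of all predicted proteins.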

16.
Standley DM  Toh H  Nakamura H 《Proteins》2008,72(4):1333-1351
A method to functionally annotate structural genomics targets, based on a novel structural alignment scoring function, is proposed. In the proposed score, position-specific scoring matrices are used to weight structurally aligned residue pairs to highlight evolutionarily conserved motifs. The functional form of the score is first optimized for discriminating domains belonging to the same Pfam family from domains belonging to different families but the same CATH or SCOP superfamily. In the optimization stage, we consider four standard weighting functions as well as our own, the "maximum substitution probability," and combinations of these functions. The optimized score achieves an area of 0.87 under the receiver-operating characteristic curve with respect to identifying Pfam families within a sequence-unique benchmark set of domain pairs. Confidence measures are then derived from the benchmark distribution of true-positive scores. The alignment method is next applied to the task of functionally annotating 230 query proteins released to the public as part of the Protein 3000 structural genomics project in Japan. Of these queries, 78 were found to align to templates with the same Pfam family as the query or had sequence identities ≥ 30%. Another 49 queries were found to match more distantly related templates. Within this group, the template predicted by our method to be the closest functional relative was often not the most structurally similar. Several nontrivial cases are discussed in detail. Finally, 103 queries matched templates at the fold level, but not the family or superfamily level, and remain functionally uncharacterized.
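
The abstract reports performance as the area under the receiver-operating characteristic curve. For readers unfamiliar with that measure, the sketch below computes it as the probability that a randomly chosen true pair outscores a randomly chosen false pair, with ties counted as one half. The function `roc_auc` and the toy scores are illustrative assumptions and have no connection to the paper's data or scoring function.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equivalent to the normalized Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs in which the positive scores higher,
    counting ties as 0.5.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

A value of 1.0 means every same-family pair outscores every different-family pair, and 0.5 corresponds to random ranking.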

17.
18.
Marassi FM 《Proteins》2011,79(10):2946-2955
The Mycobacterium tuberculosis membrane protein Rv0899 confers adaptation of the bacterium to acidic environments. Due to strong sequence homology of its C-terminus to bacterial OmpA-like domains, Rv0899 has been proposed to constitute an outer membrane porin of M. tuberculosis. However, OmpA-like domains occur in a wide variety of bacterial proteins with different functions. Furthermore, the three-dimensional structure of Rv0899 does not contain a transmembrane β-barrel, and recent evidence demonstrates that it does not have porin activity. Instead, the rv0899 gene is part of an operon (rv0899-rv0901) that is required for fast ammonia secretion, pH neutralization, and growth of M. tuberculosis in acidic environments. The mechanism whereby these functions are accomplished is not known. To gain further functional insights, a targeted search of the genomic databases was performed for proteins with sequence similarity beyond the OmpA-like C-terminus. The results presented here show that Rv0899-like proteins are widespread in bacteria with functions in nitrogen metabolism, adaptation to nutrient poor environments, and/or establishing symbiosis with the host organism, and appear to form a protein family. These findings suggest that M. tuberculosis Rv0899 may also assist similar processes and lend further support to its role in ammonia secretion and M. tuberculosis adaptation to the host environment.

19.
20.
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for members of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggests a direct connection in which protein sequences are related to various functions through structural dynamics. This is a new attempt to delineate the functional evolution of proteins using the integrated information of sequence, structure, and dynamics.
