Similar literature
Found 20 similar documents; search took 31 ms
1.
The complete human genome sequences in the public database provide ways to understand the blueprint of life. As of June 29, 2006, 27 archaeal, 326 bacterial and 21 eukaryotic complete genomes are available, and sequencing of 316 bacterial, 24 archaeal and 126 eukaryotic genomes is in progress. Traditional biochemical/molecular experiments can assign accurate functions to genes in these genomes. However, the process is time-consuming and costly. Despite several efforts, only 50-60% of genes have been annotated in most completely sequenced genomes. Automated genome sequence analysis and annotation may provide ways to understand genomes. Thus, determination of protein function is one of the challenging problems of the post-genome era. This demands that bioinformatics develop efficient tools to predict the functions of unannotated protein sequences. Here, we discuss some of the recent and popular approaches developed in bioinformatics to predict functions for hypothetical proteins.

2.
Approximately 50% of the predicted protein-coding genes of the Trypanosoma cruzi CL Brener strain are annotated as hypothetical or conserved hypothetical proteins. To further characterize these genes, we generated 1161 open-reading frame expressed sequence tags (ORESTES) from the mammalian stages of the VL10 human strain. Sequence clustering resulted in 435 clusters, consisting of 339 singletons and 96 contigs. Significant matches to the T. cruzi predicted gene database were found for ~94% of contigs and ~69% of singletons. These included genes encoding surface proteins, known to be intensely expressed in the parasite mammalian stages and implicated in host cell invasion and/or immune evasion mechanisms. Among 151 contigs and singletons with similarity to predicted hypothetical protein-coding genes and conserved hypothetical protein-coding genes, 83% showed no match with T. cruzi EST and/or proteome databases. These ORESTES are the first experimental evidence that the corresponding genes are in fact transcribed. Sequences with no significant match were searched against several T. cruzi and National Center for Biotechnology Information non-redundant sequence databases. The ORESTES analysis indicated that 124 predicted conserved hypothetical protein-coding genes and 27 predicted hypothetical protein-coding genes annotated in the CL Brener genome are transcribed in the VL10 mammalian stages. Six ORESTES annotated as hypothetical protein-coding genes showing no match to EST and/or proteome databases were confirmed by Northern blot in VL10. The generation of this set of ORESTES complements the T. cruzi genome annotation and suggests new stage-regulated genes encoding hypothetical proteins.

3.
Nature selected certain regions of the genome for encoding proteins. Most of the sequences were used to encode only RNA. What happened to the remaining sections of the genome? It is possible that some sequences were retired and retained as non-functional entities called pseudogenes. Though several evolutionary prospects with functional endpoints exist, we looked at the possibility that hypothetical proteins correlate with the emergence of pseudogenes, and at the potential of such genes to make novel synthetic molecules. In this commentary, we consider two key questions: (1) does any correlation exist between hypothetical proteins and pseudogenes, and (2) can we make novel and functional proteins from pseudogenes?

4.
5.
Assigning functions to proteins of unknown function is of considerable interest to proteomics researchers, as the genes encoding them are conserved across various species. Here, we describe HypoDB, a database of hypothetical genes and proteins in six eukaryotes. The database was collected and organized based on the number of entries in each chromosome with few annotations. It contains gene and protein sequences, chromosome number and location, and data on secondary and tertiary structure. AVAILABILITY: The database is available for free at http://www.trimslabs.com/database/hypodb/index.html.

6.
7.
After 50 years of analysing Neurospora crassa genes one by one, large-scale sequence analysis has increased the number of accessible genes tremendously in the last few years. Being the only filamentous fungus for which a comprehensive genomic sequence database is publicly accessible, N. crassa serves as the model for this important group of microorganisms. The MIPS N. crassa database currently holds more than 16 Mb of non-redundant data from chromosomes II and V, analysed by the German Neurospora Genome Project. This represents more than one-third of the genome. Open reading frames (ORFs) have been extracted from the sequence and the deduced proteins have been annotated extensively. They are classified according to matches in sequence databases and attributed to functional categories according to their relatives. While 41% of analysed proteins are related to known proteins, 30% are hypothetical proteins with no match to a database entry. The entire genome is expected to comprise some 13000 protein-coding genes, more than twice as many as found in yeasts, which reflects the high potential of filamentous fungi to cope with various environmental conditions.

8.

Background

Although the human genome database was completed a decade ago, ∼50% of the proteome remains hypothetical because the functions of these proteins are unknown. Elucidating the functions of these hypothetical proteins can reveal additional protein pathways and new cascades. However, many of these inferences are limited to proteins with substantial sequence similarity. Of particular interest here is the Tectonin domain-containing family of proteins.

Methodology/Principal Findings

We have identified hTectonin, a hypothetical protein in the human genome database, as a distant ortholog of the limulus galactose binding protein (GBP). Phylogenetic analysis revealed strong evolutionary conservation of hTectonin homologues from parasite to human. By computational analysis, we showed that both the hTectonin and GBP form β-propeller structures with multiple Tectonin domains, each containing β-sheets of 4 strands per β-sheet. hTectonin is present in the human leukocyte cDNA library and immune-related cell lines. It interacts with M-ficolin, a known human complement protein whose ancient homolog, carcinolectin (CL5), is the functional protein partner of GBP during infection. Yeast 2-hybrid assay showed that only the Tectonin domains of hTectonin recognize the fibrinogen-like domain of the M-ficolin. Surface plasmon resonance analysis showed real-time interaction between the Tectonin domains 6 & 11 and bacterial LPS, indicating that despite forming 2 β-propellers with its different Tectonin domains, the hTectonin molecule could precisely employ domains 6 & 11 to recognise bacteria.

Conclusions/Significance

By virtue of a recent finding of another Tectonin protein, leukolectin, in the human leukocyte, and our structure-function analysis of the hypothetical hTectonin, we propose that Tectonin domains of proteins could play a vital role in innate immune defense, and that this function has been conserved over several hundred million years, from invertebrates to vertebrates. Furthermore, the approach we have used could be employed in unraveling the characteristics and functions of other hypothetical proteins in the human proteome.

9.

Background  

The rapid completion of genome sequences has created an infrastructure of biological information and provided essential information to link genes to gene products, proteins, the building blocks for cellular functions. In addition, genome/cDNA sequences make it possible to predict proteins for which there is no experimental evidence. Clues for function of hypothetical proteins are provided by sequence similarity with proteins of known function in model organisms.

10.
A domain database is essential for research on domain properties. Eliminating redundant information in database queries is very important for database quality. Here we report the manual construction of a non-redundant human SH2 domain database. There are 119 human SH2 domains in 110 SH2-containing proteins. Human SH2 domains were aligned with ClustalX, and a homology tree was generated. In this tree, proteins with similar known functions were classified into the same group. Some proteins in the same group have been reported experimentally to have similar binding motifs. The tree might provide clues about possible functions of hypothetical proteins for further experimental verification.

11.
12.
The genomes of many organisms have been sequenced in the last 5 years. Typically about 30% of predicted genes from a newly sequenced genome cannot be given functional assignments using sequence comparison methods. In these situations three-dimensional structural predictions combined with a suite of computational tools can suggest possible functions for these hypothetical proteins. Suggesting functions may allow better interpretation of experimental data (e.g., microarray data and mass spectroscopy data) and help experimentalists design new experiments. In this paper, we focus on three hypothetical proteins of Shewanella oneidensis MR-1 that are potentially related to iron transport/metabolism based on microarray experiments. The threading program PROSPECT was used for protein structural predictions and functional annotation, in conjunction with literature search and other computational tools. Computational tools were used to perform transmembrane domain predictions, coiled coil predictions, signal peptide predictions, sub-cellular localization predictions, motif prediction, and operon structure evaluations. Combined computational results from all tools were used to predict roles for the hypothetical proteins. This method, which uses a suite of computational tools that are freely available to academic users, can be used to annotate hypothetical proteins in general.

13.
Ganoderma lucidum is one of the well-known medicinal basidiomycetes worldwide. The mitochondrion, referred to as the second genome, is an organelle found in most eukaryotic cells and participates in critical cellular functions. Elucidating the structure and function of this genome is important to understand completely the genetic contents of G. lucidum. In this study, we assembled the mitochondrial genome of G. lucidum and analyzed the differential expressions of its encoded genes across three developmental stages. The mitochondrial genome is a typical circular DNA molecule of 60,630 bp with a GC content of 26.67%. Genome annotation identified genes that encode 15 conserved proteins, 27 tRNAs, small and large rRNAs, four homing endonucleases, and two hypothetical proteins. Except for genes encoding trnW and two hypothetical proteins, all genes were located on the positive strand. For the repeat structure analysis, eight forward, two inverted, and three tandem repeats were detected. A pair of fragments with a total length around 5.5 kb was found in both the nuclear and mitochondrial genomes, which suggests the possible transfer of DNA sequences between two genomes. RNA-Seq data for samples derived from three stages, namely, mycelia, primordia, and fruiting bodies, were mapped to the mitochondrial genome and qualified. The protein-coding genes were expressed higher in mycelia or primordial stages compared with those in the fruiting bodies. The rRNA abundances were significantly higher in all three stages. Two regions were transcribed but did not contain any identified protein or tRNA genes. Furthermore, three RNA-editing sites were detected. Genome synteny analysis showed that significant genome rearrangements occurred in the mitochondrial genomes. This study provides valuable information on the gene contents of the mitochondrial genome and their differential expressions at various developmental stages of G. lucidum. 
The results contribute to the understanding of the functions and evolution of fungal mitochondrial DNA.
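The GC content quoted in the abstract above (26.67% of a 60,630 bp genome, i.e. roughly 16,170 G or C bases) is a simple base-composition ratio. A minimal sketch of that calculation, assuming a plain DNA string as input and counting only unambiguous bases; the function name `gc_content` is hypothetical and this is not the authors' pipeline:

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total
```

Ignoring ambiguous symbols is a design choice made for this sketch; some tools count them in the denominator instead, which gives a slightly lower value.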

14.
The published sequence of the Vibrio cholerae genome indicates that, in addition to the genes that encode proteins of known and unknown function, there are 1577 ORFs identified as conserved hypothetical or hypothetical gene candidates. Because the annotation is not 100% accurate, it is not known which of the 1577 ORFs are true protein-coding genes. In this paper, an algorithm based on the Z curve method, with sensitivity, specificity and accuracy greater than 98%, is used to solve this problem. Twenty-fold cross-validation tests show that the accuracy of the algorithm is 98.8%. A detailed discussion of the mechanism of the algorithm is also presented. It was found that 172 of the 1577 ORFs are unlikely to be protein-coding genes. The number of protein-coding genes in the V. cholerae genome was re-estimated and found to be approximately 3716. This result should be of use in microarray analysis of gene expression in the genome, because the cost of preparing chips may be somewhat decreased. A computer program was written to calculate a coding score called VCZ for gene identification in the genome. Coding/noncoding is simply determined by VCZ > 0/VCZ < 0. The program is freely available on request for academic use.
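The abstract above does not spell out how the VCZ score is computed, so the following is not the authors' program. As a minimal sketch, assuming the standard Z-curve definition (three cumulative components contrasting purine/pyrimidine, amino/keto, and weak/strong hydrogen-bond bases), the coordinates that a Z-curve-based method starts from could be computed like this. The function name `z_curve` is hypothetical.

```python
from typing import List, Tuple

def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Return the cumulative Z-curve point (x_n, y_n, z_n) after each counted base."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            continue  # skip ambiguous symbols such as N
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        x = (a + g) - (c + t)  # purines minus pyrimidines
        y = (a + c) - (g + t)  # amino bases minus keto bases
        z = (a + t) - (c + g)  # weak minus strong hydrogen-bond bases
        points.append((x, y, z))
    return points
```

A coding score in the spirit of VCZ would then be a discriminant function over features derived from such components, with the sign of the score separating coding from noncoding ORFs; the weights and features used in the paper are not given here and are not reproduced.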

15.
Chlamydia trachomatis represents a group of human pathogenic obligate intracellular and gram-negative bacteria. The genome of C. trachomatis D comprises 894 open reading frames (ORFs). In this study the global expression of genes in C. trachomatis A, D and L2, which are responsible for different chlamydial diseases, was investigated using a proteomics approach. Based on silver stained two-dimensional polyacrylamide gel electrophoresis (2-D PAGE), gels with purified elementary bodies (EB) and auto-radiography of gels with 35S-labeled C. trachomatis proteins up to 700 protein spots were detectable within the range of the immobilized pH gradient (IPG) system used. Using mass spectrometry and N-terminal sequencing followed by database searching we identified 250 C. trachomatis proteins from purified EB of which 144 were derived from different genes representing 16% of the ORFs predicted from the C. trachomatis D genome and the 7.5 kb C. trachomatis plasmid. Important findings include identification of proteins from the type III secretion apparatus, enzymes from the central metabolism and confirmation of expression of 25 hypothetical ORFs and five polymorphic membrane proteins. Comparison of serovars generated novel data on genetic variability as indicated by electrophoretic variation and potentially important examples of serovar specific differences in protein abundance. The availability of the complete genome made it feasible to map and to identify proteins of C. trachomatis on a large scale and the integration of our data in a 2-D PAGE database will create a basis for post genomic research, important for the understanding of chlamydial development and pathogenesis.

16.
Selvaraj S, Sambandam V, Sardar D, Anishetty S. Gene, 2012, 506(1): 233-241
One of the challenges faced by Mycobacterium tuberculosis (M. tuberculosis) in dormancy is hypoxia. DosR/DevR of M. tuberculosis is a two-component dormancy survival response regulator which induces the expression of 48 genes. In this study, we have used DosR regulon proteins of M. tuberculosis H37Rv as the query set and performed a comprehensive homology search against the non-redundant database. Homologs were found in environmental mycobacteria, environmental bacteria and archaebacteria. Analysis of the genomic context of the DosR regulon revealed that its genes are distributed as nine blocks in the genome of M. tuberculosis, with many transposases and integrases in their vicinity. Further, we classified DosR regulon proteins into eight functional categories. One of the hypothetical proteins, Rv1998c, could probably be a methylisocitrate lyase or a phosphonomutase. Another hypothetical protein, Rv0572, was found only in mycobacteria. Insights gained in this study can potentially aid in the development of novel therapeutic interventions.

17.
MOTIVATION: Tandem repeats are associated with disease genes, play an important role in evolution and are important in genomic organization and function. Although much research has been done on short perfect patterns of repeats, there has been less focus on imperfect repeats. Thus, there is an acute need for a tandem repeats database that provides reliable and up-to-date information on both perfect and imperfect tandem repeats in the human genome and relates these to disease genes. RESULTS: This paper presents a web-accessible relational tandem repeats database that relates tandem repeats to gene locations and disease genes of the human genome. In contrast to other available databases, this database identifies both perfect and imperfect repeats of 1-2000 bp unit lengths. The utility of this database has been illustrated by analysing these repeats for their distribution and frequencies across chromosomes and genomic locations and between protein-coding and non-coding regions. The applicability of this database to identify diseases associated with previously uncharacterized tandem repeats is demonstrated.
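To make the perfect/imperfect distinction in the abstract above concrete, here is a minimal sketch of a scanner for perfect tandem repeats only, assuming a fixed unit length chosen by the caller. It is an illustration, not the method behind the database described; the function name `find_perfect_tandem_repeats` and its parameters are hypothetical. Imperfect repeats (copies with mismatches or indels) would need an approximate-matching step that this sketch does not attempt.

```python
from typing import List, Tuple

def find_perfect_tandem_repeats(
    seq: str, unit_len: int, min_copies: int = 2
) -> List[Tuple[int, str, int]]:
    """Return (start, unit, copies) for runs of an exact unit repeated back to back."""
    if unit_len < 1:
        raise ValueError("unit_len must be at least 1")
    seq = seq.upper()
    hits = []
    i = 0
    n = len(seq)
    while i + unit_len * min_copies <= n:
        unit = seq[i:i + unit_len]
        copies = 1
        j = i + unit_len
        # Extend the run while the next window is an exact copy of the unit.
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i = j  # continue scanning after the run
        else:
            i += 1
    return hits
```

The scan reports a run in the phase where it is first encountered, so a CAG expansion may be reported with the unit GCA or AGC depending on the flanking sequence.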

18.
The interaction of proteins with their respective DNA targets is known to control many high-fidelity cellular processes. Performing a comprehensive survey of the sequenced genomes for DNA-binding proteins (DBPs) will help in understanding their distribution and the associated functions in a particular genome. The availability of the fully sequenced genome of Arabidopsis thaliana enables a review of the distribution of DBPs in this model plant genome. We used profiles of both structure and sequence-based DNA-binding families, derived from PDB and PFam databases, to perform the survey. This resulted in 4471 proteins, identified as DNA-binding in the Arabidopsis genome, which are distributed across 300 different PFam families. Apart from several plant-specific DNA-binding families, certain RING fingers and leucine zippers also had high representation. Our search protocol helped to assign DNA-binding property to several proteins that were previously marked as unknown, putative or hypothetical in function. The distribution of Arabidopsis genes having a role in plant DNA repair was particularly studied and noted for their functional mapping. The functions observed to be overrepresented in the plant genome harbour DNA-3-methyladenine glycosylase activity, alkylbase DNA N-glycosylase activity and DNA-(apurinic or apyrimidinic site) lyase activity, suggesting their role in specialized functions such as gene regulation and DNA repair.

19.
20.