Similar Documents
1.
Siew N, Fischer D. Proteins. 2003;53(2):241-251
Singleton sequence ORFans are orphan ORFs (open reading frames) that have no detectable sequence similarity to any other sequence in the databases. ORFans are of particular interest not only as evolutionary puzzles but also because we can learn little about them using bioinformatics tools. Here, we present a first systematic analysis of singleton ORFans in the first 60 fully sequenced microbial genomes. We show that although ORFans have been underemphasized, the number of ORFans is steadily growing, currently accounting for 23,634 sequences. At the same time, the percentage of ORFans as a fraction of all sequences is slowly diminishing, and is currently about 14%. Short ORFans comprise about 61% of all ORFans. The abundance of short ORFans may be due to a yet unexplained artifact. The data also suggest that the number of longer ORFans may soon diminish as more genomes of closely related organisms become available. To better address the questions about the functions and origins of ORFans, we propose to focus further studies on the longer ORFans, with emphasis on three new types of ORFans: ORFan modules, paralogous ORFans, and orthologous ORFans. We conclude that the large number of ORFans reflects an intrinsic property of the genetic material not yet fully understood. Further computational and experimental studies aimed at understanding Nature's protein diversity should also include ORFans.
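As a rough consistency check (our own arithmetic on the rounded figures above, not numbers reported by the authors), the stated counts imply approximately:

```latex
N_{\text{all}} \approx \frac{23{,}634}{0.14} \approx 1.69 \times 10^{5},
\qquad
N_{\text{short}} \approx 0.61 \times 23{,}634 \approx 1.44 \times 10^{4}
```

Because the 14% and 61% figures are themselves approximate, these estimates are only indicative.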

2.
MOTIVATION: A large fraction of open reading frames (ORFs) identified as 'hypothetical' proteins correspond either to 'conserved hypothetical' proteins, representing sequences homologous to ORFs of unknown function from other organisms, or to hypothetical proteins lacking any significant sequence similarity to other ORFs in the databases. Elucidating the functions and three-dimensional structures of such orphan ORFs, termed ORFans or poorly conserved ORFs (PCOs), is essential for understanding biodiversity. However, it has been claimed that many ORFans may not encode expressed proteins. RESULTS: A genome-wide experimental study of 'paralogous PCOs' in the halophilic archaeon Halobacterium sp. NRC-1 was conducted. Paralogous PCOs are ORFs with at least one homolog in the same organism but no clear homologs in other organisms. The results reveal that mRNA is synthesized for a majority of the Halobacterium sp. NRC-1 paralogous PCO families, including those comprising relatively short proteins, strongly suggesting that these paralogous PCOs correspond to true, expressed proteins. Hence, further computational and experimental studies aimed at characterizing PCOs in this and other organisms are merited. Such efforts could shed light on the functions and origins of PCOs, thereby helping to elucidate the vast diversity observed in the genetic material.
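The ORFan categories discussed in these abstracts (singleton, paralogous, and orthologous ORFans) can be sketched as a classification over homology-search hits. This is a minimal illustration under stated assumptions, not any author's pipeline: the `Orf` record, the genome labels, and the `close_relatives` set are hypothetical, and the hit list is assumed to come from a prior homology search with a significance cutoff already applied.

```python
from dataclasses import dataclass, field


@dataclass
class Orf:
    """An ORF plus the genomes in which homology search found significant hits."""
    name: str
    genome: str
    # Hypothetical input: genomes with a significant hit (self-hit excluded).
    hit_genomes: list[str] = field(default_factory=list)


def classify(orf: Orf, close_relatives: frozenset[str] = frozenset()) -> str:
    """Assign an ORFan category from where the homolog hits fall."""
    hits = set(orf.hit_genomes)
    if not hits:
        # No detectable similarity to any other sequence.
        return "singleton ORFan"
    if hits == {orf.genome}:
        # Homologs only within the same genome.
        return "paralogous ORFan"
    if hits <= {orf.genome} | set(close_relatives):
        # Homologs only in closely related organisms (paralogs also allowed).
        return "orthologous ORFan"
    return "not an ORFan"


if __name__ == "__main__":
    mpn423 = Orf("MPN423", "M. pneumoniae", ["M. genitalium"])
    print(classify(mpn423, frozenset({"M. genitalium"})))
```

What counts as "closely related" is left to the caller; treating an ORF that has both paralogs and hits in close relatives as orthologous is a design choice of this sketch.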

3.
Narra HP, Cordes MH, Ochman H. Proteomics. 2008;8(22):4772-4781
ORFan genes can constitute a large fraction of a bacterial genome, but due to their lack of homologs, their functions have remained largely unexplored. To determine if particular features of ORFan-encoded proteins promote their presence in a genome, we analyzed properties of ORFans that originated over a broad evolutionary timescale. We also compared ORFan genes to another class of acquired genes, heterogeneous occurrence in prokaryotes (HOPs), which have homologs in other bacteria. A total of 54 ORFan and HOP genes selected from different phylogenetic depths in the Escherichia coli lineage were cloned, expressed, purified, and subjected to circular dichroism (CD) spectroscopy. A majority of genes could be expressed, but only 18 yielded sufficient soluble protein for spectral analysis. Of these, half were significantly alpha-helical, three were predominantly beta-sheet, and six were of intermediate/indeterminate structure. Although a higher proportion of HOPs yielded soluble proteins with resolvable secondary structures, ORFans resembled HOPs with regard to most of the other features tested. Overall, we found that those ORFan and HOP genes that have persisted in the E. coli lineage were more likely to encode soluble and folded proteins, more likely to display environmental modulation of their gene expression, and by extrapolation, are more likely to be functional.

4.
The complete nucleotide sequences of over 37 microbial and three eukaryotic genomes are already publicly available, and more sequencing is in progress. Despite this accumulation of data, newly sequenced microbial genomes continue to reveal up to 50% of functionally uncharacterized "anonymous" genes. A majority of these anonymous proteins have homologues in other organisms, whereas the rest exhibit no clear similarity to any other sequence in the databases. These unique, apparently species-specific sequences are referred to as ORFans. The biochemical and structural analysis of ORFan gene products is of both evolutionary and functional interest. Here we report the cloning and expression of the Escherichia coli ORFan gene ykfE and the functional characterization of the encoded protein. Under physiological conditions, the protein is a homodimer with a strong affinity for C-type lysozyme, as revealed by co-purification and co-crystallization. Activity measurements and fluorescence studies demonstrated that the YkfE gene product is a potent C-type lysozyme inhibitor (K(i) approximately 1 nM). To denote this newly assigned function, ykfE has now been registered under the new gene name Ivy (inhibitor of vertebrate lysozyme) at the E. coli Genetic Stock Center.

5.
ORFans are open reading frames (ORFs) with no detectable sequence similarity to any other sequence in the databases. Each newly sequenced genome contains a significant number of ORFans, so ORFans pose interesting evolutionary puzzles. However, little can be learned about them using bioinformatics tools, and their study seems to have been underemphasized. Here we present some of the questions that the existence of so many ORFans has raised and review some of the studies aimed at understanding ORFans, their functions, and their origins. These works have demonstrated that ORFans are an untapped source of research, requiring further computational and experimental studies.

6.
The genomes of most newly sequenced organisms contain a significant fraction of ORFs (open reading frames) that match no other sequence in the databases. We refer to these singleton ORFs as sequence ORFans. Because little can be learned about ORFans by homology, the origin and functions of ORFans remain a mystery. However, in this era of full genome sequencing, it seems that ORFans have been underemphasized. In this minireview, we draw attention to the increasing number of ORFans and to the consequences of this growth for biological research in the postgenomic era.

7.
The mimivirus genome contains many genes that lack homologs in the sequence database and are thus known as ORFans. In addition, mimivirus genes that encode proteins belonging to known fold families are in some cases fused to domain-sized segments that cannot be classified. One such ORFan region is present in the mimivirus enzyme R596, a member of the Erv family of sulfhydryl oxidases. We determined the structure of a variant of full-length R596 and observed that the carboxy-terminal region of R596 assumes a folded, compact domain, demonstrating that these ORFan segments can be stable structural units. Moreover, the R596 ORFan domain fold is novel, hinting at the potential wealth of protein structural innovation yet to be discovered in large double-stranded DNA viruses. In the context of the R596 dimer, the ORFan domain contributes to formation of a broad cleft enriched with exposed aromatic groups and basic side chains, which may function in binding target proteins or localization of the enzyme within the virus factory or virions. Finally, we find evidence for an intermolecular dithiol/disulfide relay within the mimivirus R596 dimer, the first such extended, intersubunit redox-active site identified in a viral sulfhydryl oxidase.

8.
ORFans are hypothetical proteins lacking any significant sequence similarity to other proteins. Here, using quantitative proteomics, we identified the TGAM_1934 ORFan from the hyperradioresistant archaeon Thermococcus gammatolerans as one of the most abundant hypothetical proteins. This protein was selected as a priority target for structure determination on the basis of its abundance under three cellular conditions. Its solution structure was determined using multidimensional heteronuclear NMR spectroscopy. TGAM_1934 displays an original fold, although it shares some similarities with the 3D structure of CyaY, the bacterial ortholog of frataxin (a protein conserved in bacteria and eukaryotes and involved in iron–sulfur cluster biogenesis). These results highlight the potential of structural proteomics for prioritizing ORFan targets for structure determination based on quantitative proteomics data. The proteomic data and structure coordinates have been deposited in ProteomeXchange under identifier PXD000402 (http://proteomecentral.proteomexchange.org/dataset/PXD000402) and in the Protein Data Bank under accession number 2mcf, respectively.

9.
ORFans are orphan open reading frames. The number of ORFans is steadily increasing despite the growth of genome databases. Characterizing ORFans is essential to fully understanding the diversity of protein structure and function in nature. In this study, MPN423 from Mycoplasma pneumoniae was cloned, expressed, purified, and crystallized. MPN423 is an orthologous ORFan whose only known homologue in the whole genome database is MG296 from M. genitalium. X-ray diffraction data were collected to 2.7 Å resolution from a crystal of selenomethionine-substituted MPN423. The crystal belongs to the primitive monoclinic space group P2₁, with unit-cell parameters a = 50.5 Å, b = 89.2 Å, c = 50.6 Å, and β = 102.9°. A preliminary electron density map shows five alpha-helical segments per MPN423 molecule. A full structure determination is under way to provide helpful information on general questions about orthologous ORFan products.
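For a monoclinic cell (α = γ = 90°), the unit-cell volume follows the standard relation V = a·b·c·sin β. A minimal sketch applying it to the reported parameters (the function name is ours):

```python
import math


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Unit-cell volume of a monoclinic lattice (alpha = gamma = 90 deg): V = a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


# Cell parameters reported above for SeMet MPN423 (angstroms, degrees).
volume = monoclinic_cell_volume(50.5, 89.2, 50.6, 102.9)
# Space group P2(1) has two symmetry-equivalent positions per unit cell.
volume_per_asu = volume / 2
print(f"cell volume ~ {volume:.3g} A^3; per asymmetric unit ~ {volume_per_asu:.3g} A^3")
```

This gives a cell volume of roughly 2.22 × 10^5 Å^3. Estimating how many molecules occupy each asymmetric unit (the Matthews coefficient) would additionally need the molecular mass of MPN423, which the abstract does not give.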

10.

Background:  

The origin of microbial ORFans, ORFs having no detectable homology to other ORFs in the databases, is one of the unexplained puzzles of the post-genomic era. Several hypotheses on the origin of ORFans have been suggested in the last few years, most of which are based on selected, relatively small subsets of ORFans. One of these hypotheses is that ORFans have been acquired through lateral transfer from viruses. Here we carry out a comprehensive, genome-wide study of the origins of ORFans to quantify the strength of the current evidence supporting this hypothesis.

11.

Background

Mimivirus, isolated from Acanthamoeba polyphaga, is the largest virus discovered so far. It is unique among viruses in having genes related to translation, DNA repair, and replication that bear close homology to eukaryotic genes. Nevertheless, only a small fraction (33%) of the proteins encoded in this genome have been assigned a function. Furthermore, a large fraction of the unassigned protein sequences bear no sequence similarity to proteins from other genomes. These sequences are referred to as ORFans. Because they lack sequence similarity to other proteins, they cannot be assigned putative functions using standard sequence comparison methods. As part of our genome-wide computational efforts aimed at characterizing Mimivirus ORFans, we have applied fold-recognition methods to predict the structures of these ORFans, and further derived functions based on the conservation of functionally important residues in sequence-template alignments.

Results

Using fold recognition, we have identified highly confident computational 3D structural assignments for 21 Mimivirus ORFans. In addition, highly confident functional predictions for 6 of these ORFans were derived by analyzing the conservation of functional motifs between the predicted structures and proteins of known function. This analysis allowed us to classify these 6 previously unannotated ORFans into their specific protein families: carboxylesterase/thioesterase, metal-dependent deacetylase, P-loop kinases, 3-methyladenine DNA glycosylase, BTB domain and eukaryotic translation initiation factor eIF4E.

Conclusion

Using stringent fold recognition criteria we have assigned three-dimensional structures for 21 of the ORFans encoded in the Mimivirus genome. Further, based on the 3D models and an analysis of the conservation of functionally important residues and motifs, we were able to derive functional attributes for 6 of the ORFans. Our computational identification of important functional sites in these ORFans can be the basis for a subsequent experimental verification of our predictions. Further computational and experimental studies are required to elucidate the 3D structures and functions of the remaining Mimivirus ORFans.
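The residue-conservation step described in this entry can be illustrated by mapping annotated template positions through a pairwise sequence-template alignment and checking whether the query keeps the same residue. This is a hypothetical sketch, not the authors' pipeline: the function names and toy sequences are ours, and functional sites are assumed to be given as 1-based residue numbers in the template sequence.

```python
def map_template_positions(query_aln: str, template_aln: str) -> dict[int, int]:
    """Map 1-based template residue numbers to 1-based query residue numbers.

    Both strings are rows of the same pairwise alignment, with '-' for gaps.
    Template residues aligned to a query gap are left out of the mapping.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping: dict[int, int] = {}
    q_pos = t_pos = 0
    for q, t in zip(query_aln, template_aln):
        if q != "-":
            q_pos += 1
        if t != "-":
            t_pos += 1
        if q != "-" and t != "-":
            mapping[t_pos] = q_pos
    return mapping


def check_conservation(query_aln: str, template_aln: str,
                       functional_sites: list[int]) -> dict[int, tuple[str, str, bool]]:
    """For each functional template site, report (template residue, query residue, identical?)."""
    mapping = map_template_positions(query_aln, template_aln)
    template_seq = template_aln.replace("-", "")
    query_seq = query_aln.replace("-", "")
    report = {}
    for t_pos in functional_sites:
        t_res = template_seq[t_pos - 1]
        q_pos = mapping.get(t_pos)
        # A gap in the query at a functional site counts as not conserved.
        q_res = query_seq[q_pos - 1] if q_pos is not None else "-"
        report[t_pos] = (t_res, q_res, t_res == q_res)
    return report


if __name__ == "__main__":
    print(check_conservation("AC-DEH", "ACGDQH", [3, 4, 6]))
```

Exact identity is the strictest criterion; a real analysis might also accept conservative substitutions at some sites.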

12.
13.
Siew N, Saini HK, Fischer D. FEBS Letters. 2005;579(14):3175-3182
A large number of sequences in each newly sequenced genome correspond to lineage- and species-specific proteins, also known as ORFans. Many of these ORFans are sequences with unknown structures and functions. We have identified a family of sequences, annotated as hypothetical proteins, that are specific to Bacillus, and have carried out a computational study aimed at characterizing this family. Fold-recognition methods predict that these sequences belong to the alpha/beta hydrolase fold. We suggest possible catalytic triads for the ORFans and propose a hypothesis regarding the families within the alpha/beta hydrolase superfamily to which they may belong.
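One simple heuristic when looking for the nucleophile of a putative alpha/beta hydrolase is the G-x-S-x-G consensus found around the catalytic serine of many enzymes of this fold (for example, lipases and esterases). The abstract does not say this heuristic was used; the sketch below is only an illustrative first-pass scan with hypothetical names, and an actual triad proposal also needs the His and acidic residues placed by the structural alignment.

```python
import re

# G-x-S-x-G pentapeptide around the candidate nucleophilic Ser.
# The lookahead makes finditer report overlapping candidates too.
NUCLEOPHILE_MOTIF = re.compile(r"(?=(G.S.G))")


def find_nucleophile_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position of the candidate Ser, matched pentapeptide) pairs."""
    # m.start() is the 0-based index of the first Gly; the Ser sits two residues later.
    return [(m.start() + 3, m.group(1)) for m in NUCLEOPHILE_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_nucleophile_motifs("MAGHSQGLL"))
```

A motif hit is only a candidate: many non-hydrolases contain G-x-S-x-G by chance, which is why fold recognition comes first.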

14.
Type II restriction enzymes (REases) are commercially important deoxyribonucleases and very attractive targets for protein engineering of new specificities. At the same time, they are a very challenging test bed for protein structure prediction methods. Typically, enzymes that recognize different sequences show little or no amino acid sequence similarity to each other or to other proteins. Based on crystallographic analyses that revealed the same PD-(D/E)XK fold in more than a dozen case studies, they were nevertheless considered to be related, until combined bioinformatics and mutational analyses demonstrated that some of these proteins belong to other, unrelated folds: PLD, HNH, and GIY-YIG. As part of a large-scale project aimed at identifying the three-dimensional fold of all type II REases with known sequences (currently approximately 1000 proteins), we carried out preliminary structure prediction and selected candidates for experimental validation. Here, we present the analysis of the HpaI REase, an ORFan with no detectable homologs, for which we detected a structural template by protein fold recognition, constructed a model using the FRankenstein monster approach, and identified a number of residues important for DNA binding and catalysis. These predictions were confirmed by site-directed mutagenesis and in vitro analysis of the mutant proteins. The experimentally validated model of HpaI will serve as a low-resolution structural platform for evolutionary considerations in the subgroup of blunt-cutting REases with different specificities. The research protocol developed in the course of this work is a streamlined version of previously used techniques and can be applied in a high-throughput fashion to build and validate models for other enzymes, especially ORFans that exhibit no sequence similarity to any other protein in the database.

15.
In summary, recently developed technologies have begun to draw back the curtain of mystery that obscures some of the basic mechanisms of DNA replication at multiple levels. Studies using extended DNA and chromatin fiber techniques have proven valuable for identifying the location of origins of replication at specific genomic sites and determining their temporal order of replication, for identifying and quantifying sites of DNA damage, and for localizing chromatin proteins in relation to sites of DNA replication. The future potential of these methods includes further discoveries in functional genomics and contributions to the elucidation of the histone code. Such studies could prove very valuable for investigating the mechanisms of cancer development, aging, and other processes of disordered genomic function.

16.

Background  

To discover remote evolutionary relationships and functional similarities between proteins, biologists rely on comparative sequence analysis, and when structures are available, on structural alignments and various measures of structural similarity. The measures/scores that have most commonly been used for this purpose include: alignment length, percent sequence identity, superposition RMSD and their different combinations. More recently, we have introduced the "Homologous core structure overlap score" (HCS) and the "Loop Hausdorff Measure" (LHM). Along with these we also consider the "gapped structural alignment score" (GSAS), which was introduced earlier by other researchers.
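Two of the traditional measures listed above, percent sequence identity and superposition RMSD, are straightforward once an alignment and a superposition exist. A minimal sketch under two stated assumptions: the coordinates are already optimally superposed (finding the superposition, for example with the Kabsch algorithm, is a separate step not shown), and identity is computed over aligned non-gap columns, which is only one of several denominators in use. HCS, LHM, and GSAS are not reproduced here.

```python
import math


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def rmsd(coords_a: list[tuple[float, float, float]],
         coords_b: list[tuple[float, float, float]]) -> float:
    """RMSD over equivalent atoms of two structures that are ALREADY superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need the same, non-zero number of equivalent atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Both numbers depend on which residues are treated as equivalent, which is one reason the abstract's authors look at combined and alternative scores.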

17.
The analysis of the dynamic behavior of enzymes is fundamental to structural biology. A direct relationship between protein flexibility and biological function has been shown for bovine pancreatic ribonuclease (RNase A) (Rasmussen et al., Nature 1992;357:423-424). More recently, crystallographic studies have shown that functional motions in RNase A involve the enzyme beta-sheet regions that move concertedly on substrate binding and release (Vitagliano et al., Proteins 2002;46:97-104). These motions have been shown to correspond to intrinsic dynamic properties of the native enzyme by molecular dynamics (MD) simulations. To unveil the occurrence of these collective motions in other members of the pancreatic-like superfamily, we carried out MD simulations on human angiogenin (Ang). Essential dynamics (ED) analyses performed on the trajectories reveal that Ang exhibits collective motions similar to RNase A, despite the limited sequence identity (33%) of the two proteins. Furthermore, we show that these collective motions are also present in ensembles of experimentally determined structures of both Ang and RNase A. Finally, these subtle concerted beta-sheet motions were also observed for two other members of the pancreatic-like superfamily by comparing the ligand-bound and ligand-free structures of these enzymes. Taken together, these findings suggest that pancreatic-like ribonucleases share an evolutionarily conserved dynamic behavior consisting of subtle beta-sheet motions, which are essential for substrate binding and release.
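Essential dynamics is, in essence, a principal component analysis of the positional covariance matrix of an MD trajectory. A minimal pure-Python sketch, assuming the frames have already been fitted to a reference structure to remove overall rotation and translation; it extracts only the first mode by power iteration, whereas practical analyses usually diagonalize the full matrix with a linear-algebra library. Function names are ours.

```python
import math
import random


def covariance_matrix(frames: list[list[float]]) -> list[list[float]]:
    """Covariance of flattened coordinates (x1, y1, z1, x2, ...) over trajectory frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    centered = [[f[i] - mean[i] for i in range(dim)] for f in frames]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
            for i in range(dim)]


def first_essential_mode(cov: list[list[float]], iterations: int = 1000,
                         seed: int = 0) -> tuple[float, list[float]]:
    """Largest eigenvalue and unit eigenvector of a covariance matrix via power iteration."""
    dim = len(cov)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # start vector orthogonal to every non-zero mode
        v = [x / norm for x in w]
    # Rayleigh quotient: variance of the trajectory projected on this mode.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim))
    return eigenvalue, v


if __name__ == "__main__":
    # Toy trajectory: two coordinates that always move together.
    frames = [[t, t, 0.0] for t in (-1.0, 0.0, 1.0)]
    print(first_essential_mode(covariance_matrix(frames)))
```

In the toy trajectory the two coupled coordinates come out as a single collective mode along (1, 1, 0)/√2; in a real trajectory, concerted motions such as those described above would be expected to show up as the large-variance modes.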

18.
Adams MA, Suits MD, Zheng J, Jia Z. Proteomics. 2007;7(16):2920-2932
The combination of genomic sequencing with structural genomics has provided a wealth of new structures for previously uncharacterized ORFs, more commonly referred to as hypothetical proteins. This rapid growth has been the direct result of high-throughput, automated approaches both to the identification of new ORFs and to the determination of high-resolution 3-D protein structures. A significant bottleneck is reached, however, at the stage of functional annotation, because the assignment of function is not readily automatable. It is often the case that the initial structural analysis at best indicates a functional family for a given hypothetical protein, but further identification of a relevant ligand or substrate is impeded by the diversity of function within a particular Structural Classification of Proteins (SCOP) family, a highly selective and specific ligand-binding site, or the identification of a novel protein fold. Our approach to the functional annotation of hypothetical proteins relies on combining structural information with additional bioinformatics evidence garnered from operon prediction, loose functional information on other operon members, and conservation of catalytic residues, as well as cocrystallization trials and virtual ligand screening. The synthesis of all available information for each protein has permitted the functional annotation of several hypothetical proteins from Escherichia coli, and each assignment has been confirmed through generally accepted biochemical methods.

19.
Depletion of intracellular Ca(2+) stores evokes Ca(2+) entry across the plasma membrane by inducing Ca(2+) release-activated Ca(2+) (CRAC) currents in many cell types. Recently, Orai and STIM proteins were identified as the molecular identities of the CRAC channel subunit and the endoplasmic reticulum Ca(2+) sensor, respectively. Here, extensive database searching and phylogenetic analysis revealed several lineage-specific duplication events in the Orai protein family, which may account for the evolutionary origins of distinct functional properties among mammalian Orai proteins. Based on similarity to key structural domains and residues essential for channel function in Orai proteins, database searching also identified a putative primordial Orai sequence in hyperthermophilic archaea. Furthermore, modern Orai appears to have acquired new structural domains as early as Urochordata, before divergence into vertebrates. The evolutionary patterns of structural domains might be related to distinct functional properties of Drosophila and mammalian CRAC currents. Interestingly, Orai proteins display two conserved internal repeats located at transmembrane segments 1 and 3, both of which contain key amino acids essential for channel function. These findings demonstrate the biochemical and physiological relevance of Orai proteins in light of their different evolutionary origins and will provide novel insights for future structural and functional studies of Orai proteins.

20.
Of the ~4000 ORFs identified through the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for 312. Since knowledge of protein structures is essential to obtain a high-resolution understanding of the underlying biology, we seek to obtain a structural annotation for the genome, using computational methods. Structural models were obtained and validated for ~2877 ORFs, covering ~70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and a novel binding site based ligand association. New algorithms for binding site detection and genome scale binding site comparison at the structural level, recently reported from the laboratory, were utilized. Besides these, the annotation covers detection of various sequence and sub-structural motifs and quaternary structure predictions based on the corresponding templates. The study provides an opportunity to obtain a global perspective of the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. New insights about the folds that predominate in the genome, as well as the fold-combinations that make up multi-domain proteins are also obtained. 1728 binding pockets have been associated with ligands through binding site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation), being one of the first to be based on structure-derived functional annotations at a genome scale, is expected to be useful for better understanding of TB and for application in drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417