Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Narra HP, Cordes MH, Ochman H. Proteomics 2008, 8(22): 4772-4781
ORFan genes can constitute a large fraction of a bacterial genome, but due to their lack of homologs, their functions have remained largely unexplored. To determine if particular features of ORFan-encoded proteins promote their presence in a genome, we analyzed properties of ORFans that originated over a broad evolutionary timescale. We also compared ORFan genes to another class of acquired genes, heterogeneous occurrence in prokaryotes (HOPs), which have homologs in other bacteria. A total of 54 ORFan and HOP genes selected from different phylogenetic depths in the Escherichia coli lineage were cloned, expressed, purified, and subjected to circular dichroism (CD) spectroscopy. A majority of genes could be expressed, but only 18 yielded sufficient soluble protein for spectral analysis. Of these, half were significantly alpha-helical, three were predominantly beta-sheet, and six were of intermediate/indeterminate structure. Although a higher proportion of HOPs yielded soluble proteins with resolvable secondary structures, ORFans resembled HOPs with regard to most of the other features tested. Overall, we found that those ORFan and HOP genes that have persisted in the E. coli lineage were more likely to encode soluble and folded proteins, more likely to display environmental modulation of their gene expression, and by extrapolation, are more likely to be functional.

2.
The mimivirus genome contains many genes that lack homologs in the sequence database and are thus known as ORFans. In addition, mimivirus genes that encode proteins belonging to known fold families are in some cases fused to domain-sized segments that cannot be classified. One such ORFan region is present in the mimivirus enzyme R596, a member of the Erv family of sulfhydryl oxidases. We determined the structure of a variant of full-length R596 and observed that the carboxy-terminal region of R596 assumes a folded, compact domain, demonstrating that these ORFan segments can be stable structural units. Moreover, the R596 ORFan domain fold is novel, hinting at the potential wealth of protein structural innovation yet to be discovered in large double-stranded DNA viruses. In the context of the R596 dimer, the ORFan domain contributes to formation of a broad cleft enriched with exposed aromatic groups and basic side chains, which may function in binding target proteins or localization of the enzyme within the virus factory or virions. Finally, we find evidence for an intermolecular dithiol/disulfide relay within the mimivirus R596 dimer, the first such extended, intersubunit redox-active site identified in a viral sulfhydryl oxidase.

3.
Siew N, Fischer D. Proteins 2003, 53(2): 241-251
Singleton sequence ORFans are orphan ORFs (open reading frames) that have no detectable sequence similarity to any other sequence in the databases. ORFans are of particular interest not only as evolutionary puzzles but also because we can learn little about them using bioinformatics tools. Here, we present a first systematic analysis of singleton ORFans in the first 60 fully sequenced microbial genomes. We show that although ORFans have been underemphasized, the number of ORFans is steadily growing, currently accounting for 23,634 sequences. At the same time, the percentage of ORFans as a fraction of all sequences is slowly diminishing, and is currently about 14%. Short ORFans comprise about 61% of all ORFans. The abundance of short ORFans may be due to a yet unexplained artifact. The data also suggest that the number of longer ORFans may soon diminish as more genomes of closely related organisms become available. To better address the questions about the functions and origins of ORFans, we propose to focus further studies on the longer ORFans, with emphasis on three new types of ORFans: ORFan modules, paralogous ORFans, and orthologous ORFans. We conclude that the large number of ORFans reflects an intrinsic property of the genetic material not yet fully understood. Further computational and experimental studies aimed at understanding Nature's protein diversity should also include ORFans.

4.
Structural biology sheds light on the puzzle of genomic ORFans (cited 5 times in total: 0 self-citations, 5 citations by others)
Genomic ORFans are orphan open reading frames (ORFs) with no significant sequence similarity to other ORFs. ORFans comprise 20-30% of the ORFs of most completely sequenced genomes. Because nothing can be learnt about ORFans via sequence homology, the functions and evolutionary origins of ORFans remain a mystery. Furthermore, because relatively few ORFans have been experimentally characterized, it has been suggested that most ORFans are not likely to correspond to functional, expressed proteins, but rather to spurious ORFs, pseudo-genes or to rapidly evolving proteins with non-essential roles. As a snapshot view of current ORFan structural studies, we searched for ORFans among proteins whose three-dimensional structures have been recently determined. We find that functional and structural studies of ORFans are not as underemphasized as previously suggested. These recently determined structures correspond to ORFans from all Kingdoms of life, and include proteins that have previously been functionally characterized, as well as structural genomics targets of unknown function labeled as "hypothetical proteins". This suggests that many of the ORFans in the databases are likely to correspond to expressed, functional (and even essential) proteins. Furthermore, the recently determined structures include examples of the various types of ORFans, suggesting that the functions and evolutionary origins of ORFans are diverse. Although this survey sheds some light on the ORFan mystery, further experimental studies are required to gain a better understanding of the role and origins of the tens of thousands of ORFans awaiting characterization.

5.
ORFans are hypothetical proteins lacking any significant sequence similarity with other proteins. Here, we highlighted by quantitative proteomics the TGAM_1934 ORFan from the hyperradioresistant Thermococcus gammatolerans archaeon as one of the most abundant hypothetical proteins. This protein has been selected as a priority target for structure determination on the basis of its abundance in three cellular conditions. Its solution structure has been determined using multidimensional heteronuclear NMR spectroscopy. TGAM_1934 displays an original fold, although sharing some similarities with the 3D structure of the bacterial ortholog of frataxin, CyaY, a protein conserved in bacteria and eukaryotes and involved in iron–sulfur cluster biogenesis. These results highlight the potential of structural proteomics in prioritizing ORFan targets for structure determination based on quantitative proteomics data. The proteomic data and structure coordinates have been deposited to the ProteomeXchange with identifier PXD000402 ( http://proteomecentral.proteomexchange.org/dataset/PXD000402 ) and Protein Data Bank under the accession number 2mcf, respectively.

6.
The complete nucleotide sequences of over 37 microbial and three eukaryote genomes are already publicly available, and more sequencing is in progress. Despite this accumulation of data, newly sequenced microbial genomes continue to reveal up to 50% of functionally uncharacterized "anonymous" genes. A majority of these anonymous proteins have homologues in other organisms, whereas the rest exhibit no clear similarity to any other sequence in the databases. These unique, apparently species-specific sequences are referred to as ORFans. The biochemical and structural analysis of ORFan gene products is of both evolutionary and functional interest. Here we report the cloning and expression of the Escherichia coli ORFan gene ykfE and the functional characterization of the encoded protein. Under physiological conditions, the protein is a homodimer with a strong affinity for C-type lysozyme, as revealed by co-purification and co-crystallization. Activity measurements and fluorescence studies demonstrated that the YkfE gene product is a potent C-type lysozyme inhibitor (Ki approximately 1 nM). To denote this newly assigned function, ykfE has now been registered under the new gene name Ivy (inhibitor of vertebrate lysozyme) at the E. coli genetic stock center.

7.

Background

Mimivirus, isolated from A. polyphaga, is the largest virus discovered so far. It is unique among viruses in having genes related to translation, DNA repair and replication that bear close homology to eukaryotic genes. Nevertheless, only a small fraction of the proteins (33%) encoded in this genome has been assigned a function. Furthermore, a large fraction of the unassigned protein sequences bear no sequence similarity to proteins from other genomes. These sequences are referred to as ORFans. Because of their lack of sequence similarity to other proteins, they cannot be assigned putative functions using standard sequence comparison methods. As part of our genome-wide computational efforts aimed at characterizing Mimivirus ORFans, we applied fold-recognition methods to predict the structures of these ORFans, and derived further functional assignments from the conservation of functionally important residues in sequence-template alignments.

Results

Using fold recognition, we have identified highly confident computational 3D structural assignments for 21 Mimivirus ORFans. In addition, highly confident functional predictions for 6 of these ORFans were derived by analyzing the conservation of functional motifs between the predicted structures and proteins of known function. This analysis allowed us to classify these 6 previously unannotated ORFans into their specific protein families: carboxylesterase/thioesterase, metal-dependent deacetylase, P-loop kinases, 3-methyladenine DNA glycosylase, BTB domain and eukaryotic translation initiation factor eIF4E.

Conclusion

Using stringent fold recognition criteria we have assigned three-dimensional structures for 21 of the ORFans encoded in the Mimivirus genome. Further, based on the 3D models and an analysis of the conservation of functionally important residues and motifs, we were able to derive functional attributes for 6 of the ORFans. Our computational identification of important functional sites in these ORFans can be the basis for a subsequent experimental verification of our predictions. Further computational and experimental studies are required to elucidate the 3D structures and functions of the remaining Mimivirus ORFans.

8.
Monoclonal antibody (mAb) MN423 recognizes an Alzheimer's disease-specific conformation of tau protein assembled into paired helical filaments (PHF). Since the three-dimensional structure of PHF is currently unavailable, the structure of the MN423 binding site could provide important information about PHF conformation, with consequences for the prevention and cure of Alzheimer's disease. The Fab fragment of MN423 was prepared and purified. We have identified two different conditions for crystallization of the Fab fragment that yielded two crystal forms. They diffracted to 3.0 and 1.6 Å resolution with four and one molecule in the asymmetric unit, respectively. Both crystal forms belonged to the space group P2₁, with unit cell parameters a = 76.4 Å, b = 138.4 Å, c = 92.4 Å, β = 101.9°, and a = 71.5 Å, b = 36.8 Å, c = 85.5 Å, β = 113.9°, respectively.

9.
Siew N, Saini HK, Fischer D. FEBS Letters 2005, 579(14): 3175-3182
A large number of sequences in each newly sequenced genome correspond to lineage- and species-specific proteins, also known as ORFans. Amongst these ORFans, a large number are sequences with unknown structures and functions. We have identified a family of sequences, annotated as hypothetical proteins, which are specific to Bacillus and have carried out a computational study aimed at characterizing this family. Fold-recognition methods predict that these sequences belong to the alpha/beta hydrolase fold. We suggest possible catalytic triads for the ORFans and propose a hypothesis regarding the possible families within the alpha/beta hydrolase superfamily to which they may belong.

10.
The conservation of alternative splicing in orthologous genes from the human and mouse genomes was analyzed. Alternatively spliced mouse genes from the AsMamDB database were used to scan the draft human genome. The mouse protein isoforms were aligned with respect to orthologous human genes, and thus the exon-intron structure of the latter was established. Protein isoforms that could not be aligned throughout their length were analyzed in detail using the human EST alignment.

11.
Type II restriction enzymes are commercially important deoxyribonucleases and very attractive targets for protein engineering of new specificities. At the same time they are a very challenging test bed for protein structure prediction methods. Typically, enzymes that recognize different sequences show little or no amino acid sequence similarity to each other and to other proteins. Based on crystallographic analyses that revealed the same PD-(D/E)XK fold for more than a dozen case studies, they were nevertheless considered to be related until the combination of bioinformatics and mutational analyses demonstrated that some of these proteins belong to other, unrelated folds: PLD, HNH, and GIY-YIG. As a part of a large-scale project aiming at identification of a three-dimensional fold for all type II REases with known sequences (currently approximately 1000 proteins), we carried out preliminary structure prediction and selected candidates for experimental validation. Here, we present the analysis of HpaI REase, an ORFan with no detectable homologs, for which we detected a structural template by protein fold recognition, constructed a model using the FRankenstein monster approach and identified a number of residues important for DNA binding and catalysis. These predictions were confirmed by site-directed mutagenesis and in vitro analysis of the mutant proteins. The experimentally validated model of HpaI will serve as a low-resolution structural platform for evolutionary considerations in the subgroup of blunt-cutting REases with different specificities. The research protocol developed in the course of this work represents a streamlined version of the previously used techniques and can be used in a high-throughput fashion to build and validate models for other enzymes, especially ORFans that exhibit no sequence similarity to any other protein in the database.

12.
Wanda: a database of duplicated fish genes (cited 2 times in total: 1 self-citation, 1 citation by others)
Comparative genomics has shown that ray-finned fish (Actinopterygii) contain more copies of many genes than other vertebrates. A large number of these additional genes appear to have been produced during a genome duplication event that occurred early during the evolution of Actinopterygii (i.e. before the teleost radiation). In addition to this ancient genome duplication event, many lineages within Actinopterygii have experienced more recent genome duplications. Here we introduce a curated database named Wanda that lists groups of orthologous genes with one copy from man, mouse and chicken, one or two from tetraploid Xenopus and two or more ancient copies (i.e. paralogs) from ray-finned fish. The database also contains the sequence alignments and phylogenetic trees that were necessary for determining the correct orthologous and paralogous relationships among genes. Where available, map positions and functional data are also reported. The Wanda database should be of particular use to evolutionary and developmental biologists who are interested in the evolutionary and functional divergence of genes after duplication. Wanda is available at http://www.evolutionsbiologie.uni-konstanz.de/Wanda/.

13.
The genomes of most newly sequenced organisms contain a significant fraction of ORFs (open reading frames) that match no other sequence in the databases. We refer to these singleton ORFs as sequence ORFans. Because little can be learned about ORFans by homology, the origin and functions of ORFans remain a mystery. However, in this era of full genome sequencing, it seems that ORFans have been underemphasized. In this minireview, we draw attention to the increasing number of ORFans and to the consequences of this growth to biological research in the postgenomic era.

14.
ORFans are open reading frames (ORFs) with no detectable sequence similarity to any other sequence in the databases. Each newly sequenced genome contains a significant number of ORFans. Therefore, ORFans entail interesting evolutionary puzzles. However, little can be learned about them using bioinformatics tools, and their study seems to have been underemphasized. Here we present some of the questions that the existence of so many ORFans has raised and review some of the studies aimed at understanding ORFans, their functions and their origins. These works have demonstrated that ORFans are an untapped source of research, requiring further computational and experimental studies.

15.
The 26S proteasome is a large protein complex involved in protein degradation. We have shown previously that the PSMD7/Mov34 subunit of the human proteasome contains a proteolytically resistant MPN domain. MPN domain family members comprise subunits of the proteasome, COP9-signalosome and translation initiation factor 3 complexes. Here, the crystal structures of two C-terminally truncated proteins, MPN 1-186 and MPN 1-177, were solved to 1.96 and 3.0 Å resolution, respectively. MPN 1-186 is formed by nine beta-strands surrounded by three alpha-helices plus a fourth alpha-helix at the C terminus. This final alpha-helix emerges from the domain core and folds along with a symmetrically related subunit, typical of a domain swap. The crystallographic dimer is consistent with size-exclusion chromatography and DLS analysis showing that MPN 1-186 is a dimer in solution. MPN 1-186 shows an overall architecture highly similar to the previously reported crystal structure of the archaeal MPN domain AfJAMM of Archaeoglobus fulgidus. However, previous structural and biophysical analyses have shown that neither MPN 1-186 nor full-length human Mov34 binds metal, in contrast to the zinc-binding AfJAMM structures. The zinc ligand residues observed in AfJAMM are conserved in the yeast Rpn11 proteasome and Csn5 COP9-signalosome subunits, which is consistent with the isopeptidase activity described for these proteins. The results presented here show that, although the MPN domain of Mov34 shows a typical metalloprotease fold, it is unable to coordinate a metal ion. This finding and amino acid sequence comparisons can explain why the MPN-containing proteins Mov34/PSMD7, RPN8, Csn6, Prp8p and the translation initiation factor 3 subunits f and h do not show catalytic isopeptidase activity, allowing us to propose the hypothesis that in these proteins the MPN domain has a primarily structural function.

16.
One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

17.
The mammalian COP9 signalosome is an eight-subunit (CSN1–CSN8) complex that plays essential roles in multiple cellular and physiological processes. CSN5 and CSN6 are the only two MPN (Mpr1-Pad1-N-terminal) domain-containing subunits in the complex. Unlike the CSN5 MPN domain, CSN6 lacks a metal-binding site and isopeptidase activity. Here, we report the crystal structure of the human CSN6 MPN domain. Each CSN6 monomer contains nine β sheets surrounded by three helices. Two forms of dimers are observed in the crystal structure. Interestingly, a domain swapping of β8 and β9 strands occurs between two neighboring monomers to complete a typical MPN fold. Analyses of the pseudo metal-binding motif in CSN6 suggest that the loss of two key histidine residues may contribute to the lack of catalytic activity in CSN6. Comparing the MPN domain of our CSN6 with that in the CSN complex shows that apart from the different β8–β9 conformation, they have minor conformational differences at two insertion regions (Ins-1 and Ins-2). Besides, the interacting mode of CSN6–CSN6 in our structure is distinct from that of CSN5–CSN6 in the CSN complex structure. Moreover, the functional implications for Ins-1 and Ins-2 are discussed.

18.
Bacterial artificial chromosome (BAC) clones from apomicts Pennisetum squamulatum and buffelgrass (Cenchrus ciliaris), isolated with the apospory-specific genomic region (ASGR) marker ugt197, were assembled into contigs that were extended by chromosome walking. Gene-like sequences from contigs were identified by shotgun sequencing and BLAST searches, and used to isolate orthologous rice contigs. Additional gene-like sequences in the apomicts' contigs were identified by bioinformatics using fully sequenced BACs from orthologous rice contigs as templates, as well as by interspecies, whole-contig cross-hybridizations. Hierarchical contig orthology was rapidly assessed by constructing detailed long-range contig molecular maps showing the distribution of gene-like sequences and markers, and searching for microsyntenic patterns of sequence identity and spatial distribution within and across species contigs. We found microsynteny between P. squamulatum and buffelgrass contigs. Importantly, this approach also enabled us to isolate from within the rice (Oryza sativa) genome contig Rice A, which shows the highest microsynteny and is most orthologous to the ugt197-containing C1C buffelgrass contig. Contig Rice A belongs to the rice genome database contig 77 (according to the current September 12, 2003, rice fingerprint contig build) that maps proximal to the chromosome 11 centromere, a feature that interestingly correlates with the mapping of ASGR-linked BACs proximal to the centromere or centromere-like sequences. Thus, relatedness between these two orthologous contigs is supported both by their molecular microstructure and by their centromeric-proximal location. Our discoveries promote the use of a microsynteny-based positional-cloning approach using the rice genome as a template to aid in constructing the ASGR toward the isolation of genes underlying apospory.

19.
A genetic linkage map has been constructed for meadow fescue (Festuca pratensis Huds.) (2n=2x=14) using a full-sib family of a cross between a genotype from a Norwegian population (HF2) and a genotype from a Yugoslavian cultivar (B14). The two-way pseudo-testcross procedure has been used to develop separate maps for each parent, as well as a combined map. A total number of 550 loci have been mapped using homologous and heterologous RFLPs, AFLPs, isozymes and SSRs. The combined map consists of 466 markers, has a total length of 658.8 cM with an average marker density of 1.4 cM/marker. A high degree of orthology and colinearity was observed between meadow fescue and the Triticeae genome(s) for all linkage groups, and the individual linkage groups were designated 1F–7F in accordance with the orthologous Triticeae chromosomes. As expected, the meadow fescue linkage groups were highly orthologous and co-linear with Lolium, and with oat, maize and sorghum, generally in the same manner as the Triticeae chromosomes. It was shown that the evolutionary 4AL/5AL translocation, which characterises some of the Triticeae species, is not present in the meadow fescue genome. A putative insertion of a segment orthologous to Triticeae 2 at the top of 6F, similar to the rearrangement found in the wheat B and the rye R genome, was also observed. In addition, chromosome 4F is completely orthologous to rice chromosome 3 in contrast to the Triticeae where this rice chromosome is distributed over homoeologous group 4 and 5 chromosomes. The meadow fescue genome thus has a more ancestral configuration than any of the Triticeae genomes. The extended meadow fescue map reported here provides the opportunity for beneficial cross-species transfer of genetic knowledge, particularly from the complete genome sequence of rice. Communicated by P. Langridge

20.