Similar literature
 20 similar documents found (search time: 15 ms)
1.
The published principles of computer analysis of genomes and protein sets in taxonomically distant eukaryotes are reviewed. The authors developed a search strategy to identify, in the genomes of such organisms, genes and proteins that are nonhomologous in primary structure but have similar functions in cells dividing by meiosis. This strategy, based on the combined principles of genomics, proteomics, and morphometric analysis of subcellular structures, was applied to a computer search for genes encoding the proteins of synaptonemal complexes in the genomes of Drosophila melanogaster, the nematode Caenorhabditis elegans, and the plant Arabidopsis thaliana. These proteins proved to be functionally similar to their counterparts in the yeast Saccharomyces cerevisiae (protein Zip1p) and mammals (protein SCP1).

2.
Two families of genes related to, and including, rolling circle replication initiator protein (Rep) genes were defined by sequence similarity and by evidence of intergene family recombination. The Rep genes of circoviruses were the best characterized members of the "RecRep1 family." Other members of the RecRep1 family were Rep-like genes found in the genomes of the Canarypox virus, Entamoeba histolytica, and Giardia duodenalis and in a plasmid, p4M, from the Gram-positive bacterium, Bifidobacterium pseudocatenulatum. The "RecRep2 family" comprised some previously identified Rep-like genes from plasmids of phytoplasmas and similar Rep-like genes from the genomes of Lactobacillus acidophilus, Lactococcus lactis, and Phytoplasma asteris. Both RecRep1 and RecRep2 proteins have a nucleotide-binding domain significantly similar to the helicases (2C proteins) of picorna-like viruses. On the N-terminal side of the nucleotide binding domain, RecRep1 proteins have a domain significantly similar to one found in nanovirus Reps, whereas RecRep2 proteins have a domain significantly similar to one in the Reps of pLS1 plasmids. We speculate that RecRep genes have been transferred from viruses or plasmids to parasitic protozoan and bacterial genomes and that Rep proteins were themselves involved in the original recombination events that generated the ancestral RecRep genes.

3.
Recent studies have demonstrated that genomes of poliovirus with deletions in the P1 (capsid) region contain the necessary viral information for RNA replication. To test the effects of the substitution of foreign genes on RNA replication and protein expression, chimeric human immunodeficiency virus type 1 (HIV-1)-poliovirus genomes were constructed in which regions of the gag, pol, or env gene of HIV-1 were substituted for regions of the P1 gene in the infectious cDNA clone of type 1 Mahoney poliovirus. The HIV-1 genes were inserted between nucleotides 1174 and 2956 of the poliovirus cDNA so that the translational reading frame was maintained between the HIV-1 genes and the remaining poliovirus genes. The chimeric genomes were positioned downstream from a T7 RNA polymerase promoter and transcribed in vitro by using T7 RNA polymerase, and the RNA was transfected into HeLa cells. A Northern (RNA blot) analysis of the RNA from transfected cells demonstrated the appropriate-size RNA, corresponding to the full-length chimeric genomes, which increased over time. Immunoprecipitation with antibodies specific for poliovirus RNA polymerase or sera from AIDS patients demonstrated the expression of the poliovirus RNA polymerase and HIV-1 proteins as fusions with the poliovirus P1 protein. The expression of the HIV-1-poliovirus P1 fusion protein was dependent upon an intact RNA polymerase gene, indicating that RNA replication was required for efficient expression. A pulse-chase analysis of the protein expression from the chimeric genomes demonstrated the initial rapid proteolytic processing of the polyprotein from the chimeric genomes to give HIV-1-poliovirus P1 fusion protein in transfected cells; the HIV-1 gag-P1 and HIV-1 pol-P1 fusion proteins exhibited a greater intracellular stability than the HIV-1 env-P1 fusion protein. 
Finally, superinfection with wild-type poliovirus of HeLa cells which had been transfected with the chimeric genomes did not significantly affect the expression of chimeric fusion protein. The results are discussed in the context of poliovirus RNA replication and demonstrate the feasibility of using poliovirus genomes (minireplicons) as novel vectors for expression of foreign proteins.

4.
Evolutionarily distant organisms have not only orthologs, but also nonhomologous proteins that build functionally similar subcellular structures. For instance, this is true of the protein components of the synaptonemal complex (SC), a universal ultrastructure that ensures the successful pairing and recombination of homologous chromosomes during meiosis. We aimed to develop a method to search databases for genes that code for such nonhomologous but functionally analogous proteins. Advantage was taken of the ultrastructural parameters of the SC and the conformation of the SC proteins responsible for these. Proteins involved in the SC central space are known to be similar in secondary structure. Using published data, we found a highly significant correlation between the width of the SC central space and the length of the rod-shaped central domain of mammalian and yeast intermediate proteins forming transversal filaments in the SC central space. Based on this, we suggested a method for searching genome databases of distant organisms for genes whose virtual proteins meet the above correlation requirement. Our recent finding of the Drosophila melanogaster CG17604 gene coding for a synaptonemal complex transversal filament protein received experimental support from another lab. With the same strategy, we showed that the Arabidopsis thaliana and Caenorhabditis elegans genomes contain unique genes coding for such proteins.

5.
Improving gene annotation of complete viral genomes   Total citations: 4, self-citations: 0, citations by others: 4
Gene annotation in viruses often relies upon similarity search methods. These methods possess high specificity but some genes may be missed, either those unique to a particular genome or those highly divergent from known homologs. To identify potentially missing viral genes we have analyzed all complete viral genomes currently available in GenBank with a specialized and augmented version of the gene finding program GeneMarkS. In particular, by implementing genome-specific self-training protocols we have better adjusted the GeneMarkS statistical models to sequences of viral genomes. Hundreds of new genes were identified, some in well studied viral genomes. For example, a new gene predicted in the genome of the Epstein–Barr virus was shown to encode a protein similar to α-herpesvirus minor tegument protein UL14 with heat shock functions. Convincing evidence of this similarity was obtained after only 12 PSI-BLAST iterations. In another example, several iterations of PSI-BLAST were required to demonstrate that a gene predicted in the genome of Alcelaphine herpesvirus 1 encodes a BALF1-like protein which is thought to be involved in apoptosis regulation and, potentially, carcinogenesis. New predictions were used to refine annotations of viral genomes in the RefSeq collection curated by the National Center for Biotechnology Information. Importantly, even in those cases where no sequence similarities were detected, GeneMarkS significantly reduced the number of primary targets for experimental characterization by identifying the most probable candidate genes. The new genome annotations were stored in VIOLIN, an interactive database which provides access to similarity search tools for up-to-date analysis of predicted viral proteins.

6.
J Soppa. Gene, 2001, 278(1-2): 253-264
Structural maintenance of chromosomes (SMC) proteins are known to be essential for chromosome segregation in some prokaryotes and in eukaryotes. A systematic search for the distribution of SMC proteins in prokaryotes with fully or partially sequenced genomes showed that they form a larger family than previously anticipated and raised the number of known prokaryotic homologs to 54. Secondary structure predictions revealed that the length of the globular N-terminal and C-terminal domains is extremely well conserved, in contrast to the hinge domain and coiled-coil domains, which are considerably shorter in several bacterial species. SMC proteins are present in all gram-positive bacteria and in nearly all archaea, while they were found in less than half of the gram-negative bacteria. Phylogenetic analyses indicate that the SMC tree roughly resembles the 16S rRNA tree, but that cyanobacteria and Aquifex aeolicus obtained smc genes by lateral transfer from archaea. Fourteen out of 22 smc genes located in fully sequenced genomes seem to be co-transcribed with a second gene out of six different gene families, indicating that the deduced gene products might be involved in similar functions. The SMC proteins were compared with other prokaryotic proteins with long coiled-coil domains. The lengths of different protein domains and signature sequences made it possible to differentiate SMCs, MukBs (which were found to be confined to gamma proteobacteria), and two subfamilies of COG 0419, including the SbcC nuclease from E. coli. A phylogenetic analysis was performed including the prokaryotic coiled-coil proteins as well as SMCs and Rad18 proteins from selected eukaryotes.

7.
Most genes in evolutionarily complex genomes are expressed as multiple protein isoforms, but there is not yet any simple high‐throughput approach to identify these isoforms. Using an oversimplified top‐down LC–MS/MS strategy, we detected, around the 26‐kD position of SDS‐PAGE, proteins produced from 782 genes in a Cdk4−/− mouse embryonic fibroblast cell line. Interestingly, only 213 (27.24%, about one‐fourth) of these 782 genes encode proteins with a theoretical molecular mass (TMM) within about 10% of 26 kD, that is, between 23 and 29 kD, the range set as the allowed variation in SDS‐PAGE. These 213 proteins are considered as the wild type (WT). The remaining three‐fourths includes proteins from 66 (8.44%) genes with a TMM smaller than 23 kD and proteins from 503 (64.32%, nearly two‐thirds) genes with a TMM larger than 29 kD; these proteins are categorized into a larger‐group or a smaller‐group, respectively, for their appearance at a higher or lower position of SDS‐PAGE. For instance, at this 26‐kD position we detected proteins from the Rps27a, Snrpf, Hist1h4a, and Rps25 genes, whose proteins' TMMs are 8.6, 9.7, 11.4, and 13.7 kD, respectively, and detected proteins from the Plelc1 and Prkdc genes, whose largest isoforms are 533.9 and 471.1 kD, respectively. We extrapolate that many of the proteins migrating unexpectedly in SDS‐PAGE may be isoforms other than the WT protein. Moreover, we also detected a Cdk4 protein in this Cdk4−/− cell line, which raises the question of whether some other gene‐knockout cells or organisms show similar incompleteness of the knockout.
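
The size window and the percentages quoted in this abstract can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: the function names and category labels are assumptions introduced here, and the 23 to 29 kD window is simply the rounded range the abstract states.

```python
# Illustrative sketch; function names and labels are hypothetical,
# not taken from the study.

def classify_by_tmm(tmm_kd, low_kd=23.0, high_kd=29.0):
    """Place a theoretical molecular mass (kD) relative to the 23-29 kD window."""
    if tmm_kd < low_kd:
        return "tmm_below_window"
    if tmm_kd > high_kd:
        return "tmm_above_window"
    return "within_window"


def share_percent(count, total):
    """Percentage of `count` in `total`, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Gene counts as given in the abstract.
TOTAL_GENES = 782
counts = {"within_window": 213, "tmm_below_window": 66, "tmm_above_window": 503}

if __name__ == "__main__":
    for label, n in counts.items():
        print(label, n, share_percent(n, TOTAL_GENES))
    # Two examples named in the abstract: Rps27a (8.6 kD) and Prkdc (471.1 kD).
    print(classify_by_tmm(8.6), classify_by_tmm(471.1))
```

Recomputed from the stated counts, the three shares are 27.24%, 8.44%, and 64.32%, and the three counts add up to the 782 genes detected.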

8.
9.
Comparison and classification of folding patterns from a database of protein structures is crucial to understand the principles of protein architecture, evolution and function. Current search methods for proteins with similar folding patterns are slow and computationally intensive. The sharp growth in the number of known protein structures poses severe challenges for methods of structural comparison. There is a need for methods that can search the database of structures accurately and rapidly. We provide several methods to search for similar folding patterns using a concise tableau representation of proteins that encodes the relative geometry of secondary structural elements. Our first approach allows the extraction of identical and very closely-related protein folding patterns in constant-time (per hit). Next, we address the hard computational problem of extraction of maximally-similar subtableaux, when comparing two tableaux. We solve the problem using Quadratic and Linear integer programming formulations and demonstrate their power to identify subtle structural similarities, especially when protein structures significantly diverge. Finally, we describe a rapid and accurate method for comparing a query structure against a database of protein domains, TableauSearch. TableauSearch is rapid enough to search the entire structural database in seconds on a standard desktop computer. Our analysis of TableauSearch on many queries shows that the method is very accurate in identifying similarities of folding patterns, even between distantly related proteins. AVAILABILITY: A web server implementing the TableauSearch is available from http://hollywood.bx.psu.edu/TabSearch.
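
One plausible way to obtain constant-time retrieval of identical folding patterns is to hash each tableau and keep an index from hash key to structure names. The sketch below assumes exactly that and is not the authors' implementation: the single-letter orientation codes, the structure names, and all function names are placeholders invented for illustration.

```python
from collections import defaultdict

# Hypothetical sketch: a tableau is modelled as a square matrix of
# placeholder orientation codes between pairs of secondary structure elements.


def tableau_key(tableau):
    """Turn a matrix of orientation codes into a hashable key."""
    return tuple(tuple(row) for row in tableau)


def build_index(tableaux_by_name):
    """Map each distinct tableau to the names of the structures that share it."""
    index = defaultdict(list)
    for name, tableau in tableaux_by_name.items():
        index[tableau_key(tableau)].append(name)
    return index


def find_identical(index, query_tableau):
    """Average-case constant-time lookup of structures with an identical tableau."""
    return list(index.get(tableau_key(query_tableau), []))


# Toy data (made up): domA and domC share the same tableau.
TOY_TABLEAUX = {
    "domA": [["-", "P", "O"], ["P", "-", "A"], ["O", "A", "-"]],
    "domB": [["-", "A"], ["A", "-"]],
    "domC": [["-", "P", "O"], ["P", "-", "A"], ["O", "A", "-"]],
}

if __name__ == "__main__":
    toy_index = build_index(TOY_TABLEAUX)
    print(find_identical(toy_index, TOY_TABLEAUX["domA"]))
```

A dictionary lookup only finds exact matches; the closely related and maximally similar cases described in the abstract need the more expensive comparisons it mentions.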

10.
11.
12.
Microbial genomes encompass a sizable fraction of poorly characterized, narrowly spread, fast-evolving genes. Using sensitive methods for sequence comparison and protein structure prediction, we performed a detailed comparative analysis of clusters of such genes, which we denote “dark matter islands”, in archaeal genomes. The dark matter islands comprise up to 20% of archaeal genomes and show remarkable heterogeneity and diversity. Nevertheless, three classes of entities are common in these genomic loci: (a) integrated viral genomes and other mobile elements; (b) defense systems; and (c) secretory and other membrane-associated systems. The dark matter islands in the genomes of thermophiles and mesophiles show similar general trends of gene content, but thermophiles are substantially enriched in predicted membrane proteins whereas mesophiles have a greater proportion of recognizable mobile elements. Based on this analysis, we predict the existence of several novel groups of viruses and mobile elements, previously unnoticed variants of CRISPR-Cas immune systems, and new secretory systems that might be involved in stress response, intermicrobial conflicts and biogenesis of novel, uncharacterized membrane structures.

13.
Three specific proteins, called A, 70K and C, are present in the U1 small nuclear ribonucleoprotein (snRNP) particle, in addition to the common proteins. The human U1 snRNP-specific A protein is, apart from a proline-rich region, highly similar to the U2 snRNP-specific protein B". To examine the homologous regions at the genomic level, we isolated and characterized the human U1-A gene. The human U1-A protein appears to be encoded by a single-copy gene and its locus has been mapped to the q arm of chromosome 19. The gene, about 14-16 kb in length, consists of six exons. The regions homologous to the U2-B" gene are not limited to single exons and are mostly not confined by exon-exon junctions in the corresponding U1-A mRNA. However, the proline-rich region of U1-A, absent in U2-B", is encoded by a single exon, suggesting a specific function for this domain of U1-A. The region of the cap site and upstream sequences contain interesting similarities to the promoter region of other snRNP protein-encoding genes and several housekeeping genes, in particular the vertebrate ribosomal protein-encoding genes. Hybridization experiments with various vertebrate genomic DNAs revealed that U1-A sequences are evolutionarily conserved in all tested vertebrate genomes, except for chicken, duck and pigeon. The divergence of these avian genomes is probably typical for the class of birds.

14.
The genomes of defective-interfering (DI) particles derived from the Sabin strain of type 1 poliovirus (PV1(Sab)) were characterized by nuclease S1 mapping using complementary DNA (cDNA) copies of the PV1(Sab) genome as probes. The results demonstrated variety in the size and location of the deletions, which was compatible with our previous prediction. The results further indicated that the locations of the deletions were limited to the internal genome region encoding viral capsid proteins and that the deletion sites were clustered in certain areas of the genome. Sequence analysis of a number of cloned cDNAs to the DI genomes revealed that every DI genome retained the correct reading frame for viral protein synthesis. These results strongly suggested that one or all of the viral non-structural proteins might be cis-acting at least at a certain stage in viral replication. A computer search for secondary structures with regard to the deletion sites provided a possible common structure from which, supported by sequences existing on the plus or minus RNA strand of PV1(Sab), deletion regions looped out from the remaining sequences. Replicase might, therefore, skip these transiently formed loop structures with certain frequencies, resulting in the generation of DI genomes. This model could also be considered as a model for genetic recombination in these RNA genomes. Possible "supporting sequences" were also found for every rearranged site on the RNAs of influenza virus and Sindbis virus. Thus, we propose a new copy-choice model, designated the "supporting sequence-loop model", for the generation of rearrangements occurring on single-stranded RNA genomes.

15.
The Colorado tick fever virus (CTFV) is the type species of genus Coltivirus, family Reoviridae. Its genome, consisting of 12 segments of dsRNA, was completely sequenced. It was found to be 29,174 nucleotides long (the longest of all Reoviridae genomes characterized to date). Conserved sequences at the 5' end (SACUUUUGY) and at the 3' end (WUGCAGUS) of the 12 segments were identified. The analysis of the putative proteins deduced from the nucleotide sequences permitted the identification of functional motifs. In particular, the VP1 was identified unambiguously as the viral RNA-dependent RNA polymerase (RDRP) (VP1pol), with a GDD located at a similar position to Reoviridae RDRPs. In other genes, RGD cell-binding, NTPase, single-strand binding protein and kinase motifs were identified. Comparison with Reoviridae proteins showed significant similarities to RDRPs (CTFV-VP1) and sigma C protein of orthoreovirus (CTFV-VP6). Similarities to nonviral enzymatic proteins, such as methyltransferases, NTPases, and RNA replication factors, were also identified.
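
The terminal motifs quoted above use IUPAC ambiguity codes (S = C or G, Y = C or U, W = A or U). A minimal sketch of how such motifs could be checked against a segment sequence follows; the function names and the toy segment are assumptions made up for illustration and are not CTFV data.

```python
import re

# IUPAC nucleotide ambiguity codes, written for RNA (U instead of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}

# Conserved terminal motifs as quoted in the abstract.
FIVE_PRIME_MOTIF = "SACUUUUGY"
THREE_PRIME_MOTIF = "WUGCAGUS"


def iupac_to_regex(motif):
    """Expand an IUPAC motif into an equivalent regular expression."""
    return "".join(IUPAC_RNA[base] for base in motif.upper())


def has_conserved_termini(segment):
    """True if the segment starts and ends with the two conserved motifs."""
    seq = segment.upper()
    starts = re.match(iupac_to_regex(FIVE_PRIME_MOTIF), seq) is not None
    ends = re.search(iupac_to_regex(THREE_PRIME_MOTIF) + "$", seq) is not None
    return starts and ends


if __name__ == "__main__":
    # Made-up toy segment, far shorter than any real genome segment.
    toy_segment = "GACUUUUGC" + "AAGGCC" + "AUGCAGUC"
    print(iupac_to_regex(FIVE_PRIME_MOTIF), has_conserved_termini(toy_segment))
```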

16.
The genomes of the related crenarchaea Pyrobaculum aerophilum and Thermoproteus tenax lack any obvious gene encoding a single-stranded DNA binding protein (SSB). SSBs are essential for DNA replication, recombination, and repair and are found in all other genomes across the three domains of life. These two archaeal genomes also have only one identifiable gene encoding a chromatin protein (the Alba protein), while most other archaea have at least two different abundant chromatin proteins. We performed a biochemical screen for novel nucleic acid binding proteins present in cell extracts of T. tenax. An assay for proteins capable of binding to a single-stranded DNA oligonucleotide resulted in identification of three proteins. The first protein, Alba, has been shown previously to bind single-stranded DNA as well as duplex DNA. The two other proteins, which we designated CC1 (for crenarchaeal chromatin protein 1), are very closely related to one another, and homologs are restricted to the P. aerophilum and Aeropyrum pernix genomes. CC1 is a 6-kDa, monomeric, basic protein that is expressed at a high level in T. tenax. This protein binds single- and double-stranded DNAs with similar affinities. These properties are consistent with a role for CC1 as a crenarchaeal chromatin protein.

17.
E P Rocha, A Danchin, A Viari. Nucleic Acids Research, 1999, 27(17): 3567-3576
We analysed the termini of Bacillus subtilis protein coding sequences and compared them to those of other genomes. The analysis focused on signals, compositional biases of nucleotides, oligonucleotides, codons and amino acids, and mRNA secondary structure. AUG is the preferred start codon in all genomes, independent of their G+C content, and seems to induce less stable mRNA structures. However, it is not conserved between homologous genes, nor is it preferred in highly expressed genes. In B. subtilis the ribosome binding site is very strong. We found that downstream boxes do not seem to exist either in Escherichia coli or in B. subtilis. UAA stop codon usage is correlated with the G+C content and is strongly selected in highly expressed genes. We found less stable mRNA structures at both termini, which we related to mRNA-ribosome and mRNA-release-factor interactions. This pattern seems to impose a peculiar A-rich nucleotide and codon usage bias in these regions. Finally, the analysis of all proteins from B. subtilis revealed a similar amino acid bias near both termini of proteins, consisting of over-representation of hydrophilic residues. This bias near the stop codon is partially release-factor specific.
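
Start and stop codon preferences of the kind discussed here come down to tallying the first and last codon of each coding sequence. The sketch below shows that tally under simple assumptions: sequences are given as mRNA strings whose length is a multiple of three, and the function name and toy sequences are invented for illustration, not taken from any genome.

```python
from collections import Counter


def terminal_codon_usage(coding_sequences):
    """Count the first (start) and last (stop) codon of each coding sequence."""
    starts = Counter(seq[:3].upper() for seq in coding_sequences)
    stops = Counter(seq[-3:].upper() for seq in coding_sequences)
    return starts, stops


# Made-up toy coding sequences (start codon ... stop codon).
TOY_CDS = [
    "AUGGCUUAA",
    "AUGAAAUAA",
    "GUGCCCUGA",
]

if __name__ == "__main__":
    start_counts, stop_counts = terminal_codon_usage(TOY_CDS)
    print(start_counts.most_common())
    print(stop_counts.most_common())
```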

18.
The gene composition of present-day genomes has been shaped by a complicated evolutionary history, resulting in diverse distributions of genes across genomes. The pattern of presence and absence of a gene in different genomes is called its phylogenetic profile. It has been shown that proteins whose encoding genes have highly similar profiles tend to be functionally related: As these genes were gained and lost together, their encoded proteins can probably only perform their full function if both are present. However, a large proportion of genes encoding interacting proteins do not have matching profiles. In this study, we analysed one possible reason for this, namely that phylogenetic profiles can be affected by multi-functional proteins such as shared subunits of two or more protein complexes. We found that by considering triplets of proteins, of which one protein is multi-functional, a large fraction of disturbed co-occurrence patterns can be explained.
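
A phylogenetic profile can be written as a vector of 0/1 values, one per genome, and compared with a similarity score. The sketch below uses Jaccard similarity and one simple reading of the triplet idea, namely that a shared subunit should be present wherever either partner is present. Both choices, along with the names and toy profiles, are assumptions for illustration and not necessarily the measure used in the study.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles of equal length."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 1.0


def union_profile(profile_a, profile_b):
    """Profile that is present wherever at least one of the two genes is present."""
    return tuple(int(a or b) for a, b in zip(profile_a, profile_b))


# Made-up profiles over six hypothetical genomes.
PARTNER_1 = (1, 1, 0, 0, 1, 0)
PARTNER_2 = (0, 0, 1, 1, 0, 0)
SHARED_SUBUNIT = (1, 1, 1, 1, 1, 0)

if __name__ == "__main__":
    # Pairwise, the shared subunit matches neither partner well ...
    print(jaccard(SHARED_SUBUNIT, PARTNER_1), jaccard(SHARED_SUBUNIT, PARTNER_2))
    # ... but it matches the combined profile of the two partners.
    print(jaccard(SHARED_SUBUNIT, union_profile(PARTNER_1, PARTNER_2)))
```

In this toy case the pairwise scores are low while the triplet comparison is a perfect match, which is the kind of disturbed co-occurrence pattern the abstract describes.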

19.
20.
Predicted highly expressed genes of diverse prokaryotic genomes   Total citations: 13, self-citations: 0, citations by others: 13