Similar literature
1.
Despite recent advances, accurate gene function prediction remains an elusive goal, with very few methods directly applicable to the plant Arabidopsis thaliana. In this study, we present GO‐At (gene ontology prediction in A. thaliana), a method that combines five data types (co‐expression, sequence, phylogenetic profile, interaction and gene neighbourhood) to predict gene function in Arabidopsis. Using a simple, yet powerful two‐step approach, GO‐At first generates a list of genes ranked in descending order of probability of functional association with the query gene. Next, a prediction score is automatically assigned to each function in this list based on the assumption that functions appearing most frequently at the top of the list are most likely to represent the function of the query gene. In this way, the second step provides an effective alternative to simply taking the ‘best hit’ from the first list, and achieves success rates of up to 79%. GO‐At is applicable across all three GO categories: molecular function, biological process and cellular component, and can assign functions at multiple levels of annotation detail. Furthermore, we demonstrate GO‐At’s ability to predict functions of uncharacterized genes by identifying ten putative golgins/Golgi‐associated proteins amongst 8219 genes of previously unknown cellular component and present independent evidence to support our predictions. A web‐based implementation of GO‐At (http://www.bioinformatics.leeds.ac.uk/goat) is available, providing a unique resource for plant researchers to make predictions for uncharacterized genes and predict novel functions in Arabidopsis.
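The second step described in this abstract (scoring functions by how often they appear near the top of a ranked gene list) can be illustrated with a minimal sketch. This is not the GO‐At implementation: the function name `score_functions`, the `top_k` cutoff, the 1/rank weighting, and the toy gene and function labels are all assumptions made for illustration.

```python
from collections import defaultdict


def score_functions(ranked_genes, annotations, top_k=10):
    """Score each function by how often, and how high, it appears in the list.

    ranked_genes: gene IDs in descending order of association with the query.
    annotations:  dict mapping gene ID -> set of function labels.
    The 1/rank weight is a hypothetical choice, not taken from the paper.
    """
    scores = defaultdict(float)
    for rank, gene in enumerate(ranked_genes[:top_k], start=1):
        for term in annotations.get(gene, ()):
            scores[term] += 1.0 / rank
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (hypothetical): the best hit is a kinase, but three of the
# next-ranked genes share a Golgi annotation.
ranked = ["g1", "g2", "g3", "g4"]
annotations = {
    "g1": {"kinase"},
    "g2": {"Golgi"},
    "g3": {"Golgi"},
    "g4": {"Golgi"},
}
predictions = score_functions(ranked, annotations)
```

With this toy input the recurring annotation outranks the single best hit, which is the behaviour the abstract contrasts with simply taking the top-ranked gene.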

2.
3.
Linkage Map of Escherichia coli K-12, Edition 10: The Physical Map
A physical map, EcoMap10, of the now completely sequenced Escherichia coli chromosome is presented. Calculated genomic positions for the eight restriction enzymes BamHI, HindIII, EcoRI, EcoRV, BglI, KpnI, PstI, and PvuII are depicted. Both sequenced and unsequenced Kohara/Isono miniset clones are aligned to this calculated restriction map. DNA sequence searches identify the precise locations of insertion sequence elements and repetitive extragenic palindrome clusters. EcoGene10, a revised set of genes and functionally uncharacterized open reading frames (ORFs), is also depicted on EcoMap10. The complete set of unnamed ORFs in EcoGene10 is assigned provisional names beginning with the letter “y” by using a systematic nomenclature.

4.
5.
Dramatic improvements in high throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function, compared to proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross validation study we show that the integrated model increases recall by 18% at 50% precision, compared to using PPI data alone. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.
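The assumption that neighbors in a functional linkage graph tend to share function can be illustrated with a simple neighbor-voting sketch. This substitutes plain vote counting for the probabilistic model the abstract describes, and the function name `neighbor_vote`, the protein IDs, and the term labels are hypothetical.

```python
from collections import Counter


def neighbor_vote(graph, known, protein):
    """Fraction of annotated neighbors that carry each term.

    graph: dict mapping protein -> iterable of neighboring proteins.
    known: dict mapping protein -> set of assigned terms.
    Neighbors without any annotation are ignored (an assumption of this sketch).
    """
    annotated = [n for n in graph.get(protein, ()) if n in known]
    if not annotated:
        return {}
    counts = Counter(term for n in annotated for term in known[n])
    return {term: count / len(annotated) for term, count in counts.items()}


# Toy graph (hypothetical): P1 has three neighbors, two of them annotated.
graph = {"P1": ["P2", "P3", "P4"]}
known = {"P2": {"term_A"}, "P3": {"term_A", "term_B"}}
votes = neighbor_vote(graph, known, "P1")
```

Here `term_A` is supported by both annotated neighbors and `term_B` by one of the two; an integrated predictor of the kind the abstract describes would combine such graph evidence with domain, phenotype and localization data.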

6.
In bacteriophage T4, there is a strong tendency for genes that encode interacting proteins to be clustered on the chromosome. There is 1.6 kb of DNA between the DNA helicase (gene 41) and the DNA primase (gene 61) genes of this virus. The DNA sequence of this region suggests that it contains five genes, designated as open reading frames (ORFs) 61.1 to 61.5, predicted to encode proteins ranging in size from 5.94 to 22.88 kDa. Are these ORFs actually genes? As one test, we compared the DNA sequence of this region in bacteriophages T2, T4, and T6 and found that ORFs 61.1, 61.3, 61.4, and 61.5 are highly conserved among the three closely related viruses. In contrast, ORF 61.2 is conserved between phages T4 and T6 yet is absent from phage T2, where it is replaced by another ORF, T2 ORF 61.2, which is not found in the T4 and T6 genomes. As a second, independent test for coding sequences, we calculated the codon base position preferences for all ORFs in this region that could encode proteins that contain at least 30 amino acids. Both the T4/T6 and T2 versions of ORF 61.2, as well as the other ORFs, have codon base position preferences that are indistinguishable from those of known T4 genes (coefficients of 0.81 to 0.94); the six other possible ORFs of at least 90 bp in this region are ruled out as genes by this test (coefficients less than zero). Thus, both evolutionary conservation and codon usage patterns lead us to conclude that ORFs 61.1 to 61.5 represent important protein-coding sequences for this family of bacteriophages. Because they are located between the genes that encode the two interacting proteins of the T4 primosome (DNA helicase plus DNA primase), one or more may function in DNA replication by modulating primosome function.

7.
The complete sequence of the genome of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1, which optimally grows at 95°C, has been determined by the whole genome shotgun method with some modifications. The entire length of the genome was 1,669,695 bp. The authenticity of the entire sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA. As the potential protein-coding regions, a total of 2,694 open reading frames (ORFs) were assigned. By similarity search against public databases, 633 (23.5%) of the ORFs were related to genes with putative function and 523 (19.4%) to the sequences registered but with unknown function. All the genes in the TCA cycle except for that of alpha-ketoglutarate dehydrogenase were included, and instead of the alpha-ketoglutarate dehydrogenase gene, the genes coding for the two subunits of 2-oxoacid:ferredoxin oxidoreductase were identified. The remaining 1,538 ORFs (57.1%) did not show any significant similarity to the sequences in the databases. Sequence comparison among the assigned ORFs suggested that a considerable number of ORFs were generated by sequence duplication. The RNA genes identified were a single 16S–23S rRNA operon, two 5S rRNA genes and 47 tRNA genes including 14 genes with intron structures. All the assigned ORFs and RNA coding regions occupied 89.12% of the whole genome. The data presented in this paper are available on the internet homepage (http://www.mild.nite.go.jp).
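The ORF breakdown quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only re-derives the percentages from the counts given in the abstract; the function name `orf_shares` and the category labels are hypothetical.

```python
def orf_shares(total, categories):
    """Return each category's share of all ORFs, in percent (one decimal).

    Raises ValueError if the category counts do not add up to the total.
    """
    if sum(categories.values()) != total:
        raise ValueError("category counts do not sum to the total")
    return {name: round(100 * count / total, 1)
            for name, count in categories.items()}


# Counts as stated in the Aeropyrum pernix K1 abstract.
shares = orf_shares(2694, {
    "putative function": 633,
    "registered, unknown function": 523,
    "no significant similarity": 1538,
})
```

The three counts sum to the 2,694 assigned ORFs, and the computed shares agree with the 23.5%, 19.4% and 57.1% reported in the abstract.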

8.
The genome sequence of the genetically tractable, mesophilic, hydrogenotrophic methanogen Methanococcus maripaludis contains 1,722 protein-coding genes in a single circular chromosome of 1,661,137 bp. Of the protein-coding genes (open reading frames [ORFs]), 44% were assigned a function, 48% were conserved but had unknown or uncertain functions, and 7.5% (129 ORFs) were unique to M. maripaludis. Of the unique ORFs, 27 were confirmed to encode proteins by the mass spectrometric identification of unique peptides. Genes for most known functions and pathways were identified. For example, a full complement of hydrogenases and methanogenesis enzymes was identified, including eight selenocysteine-containing proteins, with each being paralogous to a cysteine-containing counterpart. At least 59 proteins were predicted to contain iron-sulfur centers, including ferredoxins, polyferredoxins, and subunits of enzymes with various redox functions. Unusual features included the absence of a Cdc6 homolog, implying a variation in replication initiation, and the presence of a bacterial-like RNase HI as well as an RNase HII typical of the Archaea. The presence of alanine dehydrogenase and alanine racemase, which are uniquely present among the Archaea, explained the ability of the organism to use L- and D-alanine as nitrogen sources. Features that contrasted with the related organism Methanocaldococcus jannaschii included the absence of inteins, even though close homologs of most intein-containing proteins were encoded. Although two-thirds of the ORFs had their highest Blastp hits in Methanocaldococcus jannaschii, lateral gene transfer or gene loss has apparently resulted in genes, which are often clustered, with top Blastp hits in more distantly related groups.

9.
10.
Streptomyces xinghaiensis is a Gram-positive, aerobic and non-motile bacterium. The bacterial genome is known. Therefore, it is of interest to study the uncharacterized proteins in the genome. An uncharacterized protein (gi|518540893|86 residues) in the genome was selected for a comprehensive computational sequence-structure-function analysis using available data and tools. Subcellular localization of the targeted protein with conserved residues and assigned secondary structures is documented. Sequence homology search against the protein data bank (PDB) and non-redundant GenBank proteins using BLASTp showed different homologous proteins with known antitoxin function. A homology model of the target protein was developed using a known template (PDB ID: 3CTO:A) with 62% sequence similarity in HHpred after assessment using programs PROCHECK and QMEAN6. The predicted active site using CASTp is analyzed for assigned anti-toxin function. This information finds specific utility in annotating the said uncharacterized protein in the bacterial genome.

11.
Enyenihi AH, Saunders WS. Genetics, 2003, 163(1): 47-54
We have used a single-gene deletion mutant bank to identify the genes required for meiosis and sporulation among 4323 nonessential Saccharomyces cerevisiae annotated open reading frames (ORFs). Three hundred thirty-four sporulation-essential genes were identified, including 78 novel ORFs and 115 known genes without previously described sporulation defects in the comprehensive Saccharomyces Genome (SGD) or Yeast Proteome (YPD) phenotype databases. We have further divided the uncharacterized sporulation-essential genes into early, middle, and late stages of meiosis according to their requirement for IME1 induction and nuclear division. We believe this represents a nearly complete identification of the genes uniquely required for this complex cellular pathway. The set of genes identified in this phenotypic screen shows only limited overlap with those identified by expression-based studies.

12.
13.
Solute transport systems are one of the major ways in which organisms interact with their environment. Typically, transport is catalysed by integral membrane proteins, of which one of the largest groups is the ATP‐binding cassette (ABC) proteins. On the basis of sequence similarities, a large family of ABC proteins has been identified in Arabidopsis. A total of 60 open reading frames (ORFs) encoding ABC proteins were identified by BLAST homology searching of the nuclear genome. These 60 putative proteins include 89 ABC domains. Based on the assignment of transmembrane domains (TMDs), at least 49 of the 60 proteins identified are ABC transporters. Of these 49 proteins, 28 are full‐length ABC transporters (eight of which have been described previously), and 21 are uncharacterized half‐transporters. Three of the remaining proteins identified appear to be soluble, lacking identifiable TMDs, and most likely have non‐transport functions. The eight other ORFs have homology to the nucleotide‐binding and transmembrane components of multi‐subunit permeases. The majority of ABC proteins found in Arabidopsis can, on the basis of sequence homology, be assigned to subfamilies equivalent to those found in the yeast genome. This assignment of the Arabidopsis ABC proteins into easily recognizable subfamilies (with distinguishable subclusters) is an important first step in the elucidation of their functional role in higher plants.

14.
V Paces, C Vlcek, P Urbánek, Z Hostomsky. Gene, 1986, 44(1): 115-120
We have sequenced the rightmost 2079 bp of the Bacillus subtilis phage PZA genome. This region encompasses the right early region. We compared it with the homologous region of phage phi 29. Six open reading frames (ORFs) were found in this region of PZA and one of them was assigned to gene 17. Analysis of putative ribosome-binding sites and comparison with phi 29 ORFs indicate that at least some of the remaining ORFs could encode proteins. The corresponding genes have not so far been identified by genetic methods. Promoter candidates in the right early region of PZA were found and compared to phi 29 promoters. The sequenced region together with previously determined sequences [Paces et al., Gene 38 (1985) 45-56 and 44 (1986) 107-114] completes the entire 19,366-bp sequence of the phage PZA genome.

15.
The complete nucleotide sequence of Saccharomyces cerevisiae chromosome X (745 442 bp) reveals a total of 379 open reading frames (ORFs), the coding region covering approximately 75% of the entire sequence. One hundred and eighteen ORFs (31%) correspond to genes previously identified in S. cerevisiae. All other ORFs represent novel putative yeast genes, whose function will have to be determined experimentally. However, 57 of the latter subset (another 15% of the total) encode proteins that show significant analogy to proteins of known function from yeast or other organisms. The remaining ORFs, exhibiting no significant similarity to any known sequence, amount to 54% of the total. General features of chromosome X are also reported, with emphasis on the nucleotide frequency distribution in the environment of the ATG and stop codons, the possible coding capacity of at least some of the small ORFs (<100 codons) and the significance of 46 non-canonical or unpaired nucleotides in the stems of some of the 24 tRNA genes recognized on this chromosome.

16.
Operons are clusters of genes that are co-regulated from a common promoter. Operons are typically associated with prokaryotes, although a small number of eukaryotes have been shown to possess them. Among metazoans, operons have been extensively characterized in the nematode Caenorhabditis elegans in which ~15% of the total genes are organized into operons. The most recent genome assembly for the ascidian Ciona intestinalis placed ~20% of the genes (2909 total) into 1310 operons. The majority of these operons are composed of two genes, while the largest are composed of six. Here is reported a computational analysis of the genes that comprise the Ciona operons. Gene ontology (GO) terms were identified for about two-thirds of the operon-encoded genes. Using the extensive collection of public EST libraries, estimates of temporal patterns of gene expression were generated for the operon-encoded genes. Lastly, conservation of operons was analyzed by determining how many operon-encoded genes were present in the ascidian Ciona savignyi and whether these genes were organized in orthologous operons. Over 68% of the operon-encoded genes could be assigned one or more GO terms and 697 of the 1310 operons contained genes in which all genes had at least one GO term. Of these 697 operons, GO terms were shared by all of the genes within 146 individual operons, suggesting that most operons encode genes with unrelated functions. An analysis of operon gene expression from nine different EST libraries indicated that for 587 operons, all of the genes that comprise an individual operon were expressed together in at least one EST library, suggesting that these genes may be co-regulated. About 50% (74/146) of the operons with shared GO terms also showed evidence of gene co-regulation. Comparisons with the C. savignyi genome identified orthologs for 1907 of 2909 operon genes. About 38% (504/1310) of the operons are conserved between the two Ciona species. These results suggest that like C. elegans, operons in Ciona are comprised of a variety of genes that are not necessarily related in function. The genes in only 50% of the operons appear to be co-regulated, suggesting that more complex gene regulatory mechanisms are likely operating.

17.
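Aim: To identify digestive enzyme and detoxification enzyme genes that are highly expressed in the larval midgut of the oriental fruit moth, Grapholita molesta, and so provide a theoretical basis for future research on novel gut-targeted pesticides and transgenic crops. Methods: Based on FPKM values from high-throughput sequencing of the midgut transcriptome of 4th instar G. molesta larvae, highly expressed genes were screened and subjected to GO functional annotation and KEGG pathway enrichment analysis. Highly expressed digestive and detoxification enzyme genes were identified by BLAST alignment, and MEGA was used for phylogenetic analysis of these enzymes together with homologous proteins from other lepidopteran insects. qRT-PCR was used to quantify and validate the expression levels of representative highly expressed digestive and detoxification enzyme genes in the midgut of G. molesta larvae at different instars. Results: A total of 103,677 genes highly expressed in the midgut of 4th instar larvae were annotated in the GO database, covering 41 branches across the three functional categories of cellular component, molecular function and biological process. KEGG pathway analysis showed that 10,846 highly expressed genes participate in five classes of biochemical metabolic pathways. Seventeen digestive enzyme genes with complete open reading frames [5 trypsin (TRY) genes, 3 aminopeptidase (APN) genes and 9 carboxypeptidase (CP) genes] and 32 detoxification enzyme genes [11 glutathione S-transferase (GST) genes, 13 cytochrome P450 (CYP450) genes and 8 carboxylesterase (CarE) genes] were identified. Phylogenetic analysis showed that the digestive enzymes of G. molesta were scattered across homologous clusters, whereas the GSTs and CYP450s clustered more tightly; all of them, however, grouped with at least one lepidopteran homolog. qRT-PCR validation showed that the expression levels of the digestive and detoxification enzyme genes differed significantly among larval instars, and all were highest in the 4th instar. Conclusion: This study screened and validated a number of digestive and detoxification enzyme genes that are highly expressed in the midgut of G. molesta larvae and clarified their evolutionary relationships with homologous proteins of other lepidopteran insects. The results provide a reference for transcriptome analysis of closely related lepidopterans and for gut-targeted pest control.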

18.
Ueberle B, Frank R, Herrmann R. Proteomics, 2002, 2(6): 754-764
An existing proteome map of the bacterium Mycoplasma pneumoniae comprising proteins from 224 genes was extended to 305 genes. This corresponds to about 44% of the 688 proposed genome sequence derived open reading frames (ORFs). The newly assigned gene products were enriched, separated by one-dimensional or two-dimensional (2-D) gel electrophoresis and identified by mass spectrometry. The enrichment procedures included differential centrifugation, anion and cation exchange chromatography, affinity chromatography with heparin as a ligand and isolation of biotinylated proteins by binding to immobilized streptavidin. A comparative analysis of the identified proteins from 305 genes with the as yet unverified 383 ORFs concerning isoelectric point, molecular weight and number of transmembrane segments revealed that proteins with more than three predicted transmembrane segments and an isoelectric point above 10.5 are most likely not to be separated by 2-D gel electrophoresis. The mutual benefits of genomics and proteomics were shown by the identification of a to date unannotated 128 amino acid long protein.

19.
The complete sequence of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3, has been determined by assembling the sequences of the physical map-based contigs of fosmid clones and of long polymerase chain reaction (PCR) products which were used for gap-filling. The entire length of the genome was 1,738,505 bp. The authenticity of the entire genome sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA. As the potential protein-coding regions, a total of 2061 open reading frames (ORFs) were assigned, and by similarity search against public databases, 406 (19.7%) were related to genes with putative function and 453 (22.0%) to the sequences registered but with unknown function. The remaining 1202 ORFs (58.3%) did not show any significant similarity to the sequences in the databases. Sequence comparison among the assigned ORFs in the genome provided evidence that a considerable number of ORFs were generated by sequence duplication. By similarity search, 11 ORFs were assumed to contain the intein elements. The RNA genes identified were a single 16S-23S rRNA operon, two 5S rRNA genes and 46 tRNA genes including two with the intron structure. All the assigned ORFs and RNA coding regions occupied 91.25% of the whole genome. The data presented in this paper are available on the internet at http://www.nite.go.jp.

20.