Similar articles
Found 20 similar articles (search time: 22 ms)
1.
2.
The genome of Arabidopsis has been searched for sequences of genes involved in acyl lipid metabolism. Over 600 encoded proteins have been identified, cataloged, and classified according to predicted function, subcellular location, and alternative splicing. At least one-third of these proteins were previously annotated as "unknown function" or with functions unrelated to acyl lipid metabolism; therefore, this study has improved the annotation of over 200 genes. In particular, annotation of the lipolytic enzyme group (at least 110 members total) has been improved by the critical examination of the biochemical literature and the sequences of the numerous proteins annotated as "lipases." In addition, expressed sequence tag (EST) data have been surveyed, and more than 3,700 ESTs associated with the genes were cataloged. Statistical analysis of the number of ESTs associated with specific cDNA libraries has allowed calculation of probabilities of differential expression between different organs. More than 130 genes have been identified with a statistical probability > 0.95 of preferential expression in seed, leaf, root, or flower. All the data are available as a Web-based database, the Arabidopsis Lipid Gene database (http://www.plantbiology.msu.edu/lipids/genesurvey/index.htm). The combination of the data of the Lipid Gene Catalog and the EST analysis can be used to gain insights into differential expression of gene family members and sets of pathway-specific genes, which in turn will guide studies to understand specific functions of individual genes.

3.
Annotating the genome of Medicago truncatula   (total citations: 3; self-citations: 0; citations by others: 3)
Medicago truncatula will be among the first plant species to benefit from the completion of a whole-genome sequencing project. For each of these species, Arabidopsis, rice and now poplar and Medicago, annotation, the process of identifying gene structures and defining their functions, is essential for the research community to benefit from the sequence data generated. Annotation of the Arabidopsis genome involved gene-by-gene curation of the entire genome, but the larger genomes of rice, Medicago and other species necessitate the automation of the annotation process. Profiting from the experience gained from previous whole-genome efforts, a uniform set of Medicago gene annotations has been generated by coordinated international effort and, along with other views of the genome data, has been provided to the research community at several websites.

4.

5.
The Genomes of Oryza sativa: a history of duplications   (total citations: 6; self-citations: 0; citations by others: 6)
Yu J  Wang J  Lin W  Li S  Li H  Zhou J  Ni P  Dong W  Hu S  Zeng C  Zhang J  Zhang Y  Li R  Xu Z  Li S  Li X  Zheng H  Cong L  Lin L  Yin J  Geng J  Li G  Shi J  Liu J  Lv H  Li J  Wang J  Deng Y  Ran L  Shi X  Wang X  Wu Q  Li C  Ren X  Wang J  Wang X  Li D  Liu D  Zhang X  Ji Z  Zhao W  Sun Y  Zhang Z  Bao J  Han Y  Dong L  Ji J  Chen P  Wu S  Liu J  Xiao Y  Bu D  Tan J  Yang L  Ye C  Zhang J  Xu J  Zhou Y  Yu Y  Zhang B  Zhuang S  Wei H  Liu B  Lei M  Yu H  Li Y  Xu H  Wei S  He X  Fang L  Zhang Z  Zhang Y  Huang X  Su Z  Tong W  Li J  Tong Z  Li S  Ye J  Wang L 《PLoS biology》2005,3(2):e38
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.

6.
The annotation of the recently released Populus trichocarpa genome has allowed us to characterize extensively the multigenic families of the redoxin proteins. Proteins with two cysteines separated by two amino acids (CxxC motif) are often involved in redox reactions by promoting the formation, reduction or isomerization of disulfide bonds or by binding prosthetic groups or metals. We report here the presence of a new protein family in higher plants, constituted of 19 members in Populus trichocarpa, 15 in Arabidopsis thaliana and 17 in Oryza sativa. These proteins are almost specific to higher plants, with only two homologous genes found in mammals and arthropoda but none in other kingdoms. While these proteins were predicted as glutaredoxin-like proteins (GRL) in the automatic annotation procedure, they do not share the major conserved features of glutaredoxins but instead display four conserved CxxC motifs. A classification of these proteins, based on sequence similarity, gene structure and predicted cellular localization, is proposed. The expression of these genes was also investigated by analyzing EST databases and Arabidopsis microarray results.

7.
8.
9.
Phylogenomic Analysis of the PEBP Gene Family in Cereals   (total citations: 1; self-citations: 0; citations by others: 1)
The TFL1 and FT genes, which are key genes in the control of flowering time in Arabidopsis thaliana, belong to a small multigene family characterized by a specific phosphatidylethanolamine-binding protein domain, termed the PEBP gene family. Several PEBP genes are found in dicots and monocots, and act on the control of flowering time. We investigated the evolution of the PEBP gene family in cereals. First, taking advantage of the complete rice genome sequence and EST databases, we found 19 PEBP genes in this species, 6 of which were not previously described. Ten genes correspond to five pairs of paralogs mapped on known duplicated regions of the rice genome. Phylogenetic analysis of Arabidopsis and rice genes indicates that the PEBP gene family consists of three main homology classes (the so-called TFL1-LIKE, MFT-LIKE, and FT-LIKE subfamilies), in which gene duplication and/or loss occurred independently in Arabidopsis and rice. Second, phylogenetic analyses of genomic and EST sequences from five cereal species indicate that the three subfamilies of PEBP genes have been conserved in cereals. The tree structure suggests that the ancestral grass genome had at least two MFT-like genes, two TFL1-like genes, and eight FT-like genes. A phylogenomic approach leads to some hypotheses about conservation of gene function within the subfamilies. [Reviewing Editor: Dr. Yves Van de Peer]

10.
11.
12.
13.

Background

Since the initial publication of its complete genome sequence, Arabidopsis thaliana has become more important than ever as a model for plant research. However, the initial genome annotation was submitted by multiple centers using inconsistent methods, making the data difficult to use for many applications.

Results

Over the course of three years, TIGR has completed its effort to standardize the structural and functional annotation of the Arabidopsis genome. Using both manual and automated methods, Arabidopsis gene structures were refined and gene products were renamed and assigned to Gene Ontology categories. We present an overview of the methods employed, tools developed, and protocols followed, summarizing the contents of each data release with special emphasis on our final annotation release (version 5).

Conclusion

Over the entire period, several thousand new genes and pseudogenes were added to the annotation. Approximately one third of the originally annotated gene models were significantly refined, yielding improved gene structure annotations, and every protein-coding gene was manually inspected and classified using Gene Ontology terms.

14.
Previous studies have indicated that Arabidopsis thaliana experienced a genome-wide duplication event shortly before its divergence from Brassica, followed by extensive chromosomal rearrangements and deletions. While a large number of the duplicated genes have significantly diverged or lost their sister genes, we found 4222 pairs that are still highly conserved, and as a result had similar functional assignments during the annotation of the genome sequence. Using whole-genome DNA microarrays, we identified 906 duplicated gene pairs in which at least one member exhibited a significant response to oxidative stress. Among these, only 117 pairs were up- or down-regulated in both members, and many of these exhibited dissimilar patterns of expression. Examination of the expression patterns of PAL1 and PAL2, ACD1 and ACD2, genes coding for two Hsp20s, various P450s, and electron transfer flavoproteins suggests Arabidopsis evolved a number of distinct oxidative stress response mechanisms using similar gene sets following the duplication of its genome.

15.
16.
17.
18.
19.
Prediction of protein-coding regions and other features of primary DNA sequence has greatly contributed to experimental biology. Significant challenges remain in genome annotation methods, including the identification of small or overlapping genes and the assessment of mRNA splicing or unconventional translation signals in expression. We have employed a combined analysis of compositional biases and conservation together with frame-specific G+C representation to reevaluate and annotate the genome sequences of mouse and rat cytomegaloviruses. Our analysis predicts that there are at least 34 protein-coding regions in these genomes that were not apparent in earlier annotation efforts. These include 17 single-exon genes, three new exons of previously identified genes, a newly identified four-exon gene for a lectin-like protein (in rat cytomegalovirus), and 10 probable frameshift extensions of previously annotated genes. This expanded set of candidate genes provides an additional basis for investigation in cytomegalovirus biology and pathogenesis.

20.
Grapevine is an important perennial fruit crop for the wine industry, and has implications for the health industry with some causative agents proven to reduce heart disease. Since the sequencing and assembly of grapevine cultivar Pinot Noir, several studies have contributed to its genome annotation. This new study further contributes toward genome annotation efforts by conducting a proteogenomics analysis using the latest genome annotation from CRIBI, a legacy proteomics dataset from cultivar Cabernet Sauvignon, and a large RNA‐seq dataset. A total of 341 novel annotation events are identified consisting of five frame‐shifts, 37 translated UTRs, 15 exon boundaries, one novel splice, nine novel exons, 159 gene boundaries, 112 reverse strands, and one novel gene event in 213 genes and 323 proteins. From this proteogenomics evidence, the Augustus gene prediction tool predicted 52 novel and revised genes (54 protein isoforms), 11 genes of which are associated with key traits such as stress tolerance and floral and fruity wine characteristics. This study also highlights a likely over‐assembly of the genome, particularly on chromosome 7.
