Similar literature
 Found 20 similar articles (search time: 78 ms)
1.
Exploring the plant transcriptome through phylogenetic profiling (total citations: 5; self-citations: 0; citations by others: 5)
Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous number of expressed sequence tags (ESTs) exist for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger number of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than in Arabidopsis, whereas the opposite seems true for small gene families.
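The phylogenetic profiling idea in this abstract can be illustrated with a minimal sketch. It assumes each gene family is stored as the set of species in which it was detected. The species panel, the family names, and the `classify_family` helper are hypothetical and are not taken from the paper, whose actual pipeline covers 32 species and is not described here in enough detail to reproduce.

```python
# Hypothetical species panel grouped by lineage (illustrative only).
LINEAGES = {
    "monocot": {"rice", "maize"},
    "eudicot": {"arabidopsis", "poplar"},
    "moss": {"physcomitrella"},
}
ALL_SPECIES = set().union(*LINEAGES.values())


def classify_family(species_present):
    """Classify a gene family from its presence/absence profile."""
    present = set(species_present)
    if not present:
        return "absent"
    if present == ALL_SPECIES:
        return "core"  # found in every species of the panel
    if len(present) == 1:
        return "species-specific"
    for lineage, members in LINEAGES.items():
        if present <= members:
            return f"{lineage}-specific"  # confined to one lineage
    return "other"  # patchy distribution across lineages


# Hypothetical profiles: family name -> species in which it was found.
profiles = {
    "famA": {"rice", "maize", "arabidopsis", "poplar", "physcomitrella"},
    "famB": {"rice"},
    "famC": {"rice", "maize"},
    "famD": {"rice", "arabidopsis"},
}
labels = {name: classify_family(sp) for name, sp in profiles.items()}
```

Representing a profile as a set keeps the classification to simple subset tests; a real analysis would also have to decide how to treat incomplete EST sampling, which can make a family look absent when it is only unsampled.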

2.
The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web.
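Grouping overlapping ESTs into assemblies, as described in this abstract, can be sketched as single-linkage clustering over pairwise overlaps. The abstract does not name the assembly algorithm, so the union-find approach below is an assumption chosen for illustration; the EST identifiers and the `cluster_ests` function are hypothetical, and a real assembler would also build a consensus sequence for each group.

```python
def cluster_ests(est_ids, overlaps):
    """Group ESTs into clusters, given pairs of ESTs that overlap.

    Two ESTs land in the same cluster if they are connected by a
    chain of overlaps (single linkage), tracked with union-find.
    """
    parent = {est: est for est in est_ids}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for est in est_ids:
        clusters.setdefault(find(est), []).append(est)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input: est1-est3 chain together, est4 overlaps nothing.
ests = ["est1", "est2", "est3", "est4"]
pairs = [("est1", "est2"), ("est2", "est3")]
assemblies = cluster_ests(ests, pairs)
```

Here `assemblies` holds one three-member group and one singleton, which mirrors how redundancy in an EST collection collapses into fewer nonoverlapping units.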

3.
4.
Using a strategy requiring only modest computational resources, wheat expressed sequence tag (EST) sequences from various sources were assembled into contigs and compared with a nonredundant barley sequence assembly, with ESTs, with complete draft genome sequences of rice and Arabidopsis thaliana, and with ESTs from other plant species. These comparisons indicate that (i) wheat sequences available from public sources represent a substantial proportion of the diversity of wheat coding sequences, (ii) prediction of open reading frames in the whole genome sequence improves when supplemented with EST information from other species, (iii) a substantial number of candidates for novel genes that are unique to wheat or related species can be identified, and (iv) a smaller number of genes can be identified that are common to monocots and dicots but absent from Arabidopsis. The sequences in the last group may have been lost from Arabidopsis after descent from a common ancestor. Examples of potential novel wheat genes and Triticeae-specific genes are presented.

5.
The nucleotide binding site (NBS) is a characteristic domain of many plant resistance gene products. An increasing number of NBS-encoding sequences are being identified through gene cloning, PCR amplification with degenerate primers, and genome sequencing projects. The NBS domain was analyzed from 14 known plant resistance genes and more than 400 homologs, representing 26 genera of monocotyledonous and dicotyledonous species and one coniferous species. Two distinct groups of diverse sequences were identified, indicating divergence during evolution and an ancient origin for these sequences. One group comprised sequences encoding an N-terminal domain with Toll/Interleukin-1 receptor homology (TIR), including the known resistance genes N, M, L6, RPP1 and RPP5. Surprisingly, this group was entirely absent from monocot species in searches of both random genomic sequences and large collections of ESTs. A second group contained monocot and dicot sequences, including the known resistance genes RPS2, RPM1, I2, Mi, Dm3, Pi-B, Xa1, RPP8, RPS5 and Prf. Amino acid signatures in the conserved motifs comprising the NBS domain clearly distinguished these two groups. The Arabidopsis genome is estimated to contain approximately 200 genes that encode related NBS motifs; TIR sequences were more abundant and outnumbered non-TIR sequences threefold. The Arabidopsis NBS sequences currently in the databases are located in approximately 21 genomic clusters and 14 isolated loci. NBS-encoding sequences may be more prevalent in rice. The wide distribution of these sequences in the plant kingdom and their prevalence in the Arabidopsis and rice genomes indicate that they are ancient, diverse and common in plants. Sequence inferences suggest that these genes encode a novel class of nucleotide-binding proteins.

6.
Jiang D, Yin C, Yu A, Zhou X, Liang W, Yuan Z, Xu Y, Yu Q, Wen T, Zhang D. Cell Research, 2006, 16(5): 507-518
To understand the expansion of multicopy microRNA (miRNA) families in plants, we localized the reported miRNA genes from Arabidopsis and rice to their respective chromosomes and observed that 37% of 117 miRNA genes from Arabidopsis and 35% of 173 miRNA genes from rice were segmental duplications in the genome. To characterize whether expression diversification has occurred among members of plant multicopy miRNA families, we designed PCR primers targeting 48 predicted miRNA precursors from 10 families in Arabidopsis and rice. Results from RT-PCR data suggest that the transcribed precursors of members within the same miRNA family were present at different expression levels. In addition, although miR160 and miR162 sequences were conserved in Arabidopsis and rice, we found that the expression patterns of these genes differed between the two species. These data suggest that expression diversification has occurred in multicopy miRNA families, increasing our understanding of the expression regulation of miRNAs in plants.

7.
Model systems have played a crucial role in understanding biological processes at genetic, molecular and systems levels. Arabidopsis thaliana is one of the best studied model species for higher plants. Large genomic resources and mutant collections have made Arabidopsis an excellent source for functional and comparative genomics. Rice and Brachypodium have great potential to become model systems for grasses. Given the agronomic importance of grass crops, applying knowledge from Arabidopsis to grasses is an attractive strategy. Despite many efforts, successful reports are sparse. Knowledge transfer should generally work best between orthologous genes that share functionality and a common ancestor. In higher plants, however, recent genome projects revealed an active and rapid evolution of genome structure, which challenges the concept of one-to-one orthologous mates between two species. In this study, using protein families involved in redox-related processes as an example, we estimated the impact of gene expansions on the success rate of knowledge transfer from Arabidopsis to the grass species rice, sorghum and Brachypodium. The sparse synteny between dicot and monocot plants, caused by frequent rearrangements, translocations and gene losses, strongly reduces the number of orthologs detectable by positional conservation. To address the limitations of sparse synteny and expanded gene families, we detected orthologs with orthoMCL, a sequence-based approach that groups closely related paralogs into one orthologous gene cluster. For a total of 49 out of 170 Arabidopsis genes we could identify conserved copy numbers between the dicot model and the grass annotations, whereas approximately one third (34.7%, 59 genes) of the selected Arabidopsis genes lack an assignment to any of the grass genome annotations. The remaining 62 Arabidopsis genes represent groups that are considerably biased in their copy numbers between Arabidopsis and all or most of the three grass genomes.
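The three gene counts reported in this abstract can be checked against the stated total and percentage with a short calculation. The variable names are illustrative; the numbers (170, 49, 59, 62) come from the abstract itself.

```python
# Counts reported in the abstract for the 170 selected Arabidopsis genes.
total_genes = 170
counts = {
    "conserved copy number": 49,
    "no grass assignment": 59,
    "biased copy number": 62,
}

# The three categories should partition the full gene set.
categories_sum = sum(counts.values())

# Share of each category, as a percentage rounded to one decimal place.
shares = {
    name: round(100 * n / total_genes, 1) for name, n in counts.items()
}
```

The categories sum to 170, and the share for the unassigned genes comes out at 34.7%, matching the "approximately one third" figure quoted in the text.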

8.
MOTIVATION: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA. RESULTS: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants. AVAILABILITY: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi

9.
10.
11.
Partial cDNA sequencing to obtain expressed sequence tags (ESTs) has led to the identification of tags to about 8000 of the estimated 20 000 genes in Arabidopsis thaliana. This figure represents four to five times the number of complete coding sequences from this organism available in international databases. In contrast to mammals, many proteins are encoded by multigene families in A. thaliana. Using ribosomal protein gene families as an example, it is possible to construct relatively long sequences from overlapping ESTs which are of sufficiently high quality to unambiguously identify tags to individual members of multigene families, even when the sequences are highly conserved. A total of 106 genes encoding 50 different cytoplasmic ribosomal protein types have been identified, most proteins being encoded by at least two and up to four genes. Coding sequences of members of individual gene families are almost always very highly conserved, and derived amino acid sequences are almost, if not completely, identical in the vast majority of cases. Sequence divergence is observed in untranslated regions, which allows the definition of gene-specific probes. The method can be used to construct high-quality tags to any protein.

12.
Cloning and characterization of microRNAs from rice (total citations: 31; self-citations: 0; citations by others: 31)
Sunkar R, Girke T, Jain PK, Zhu JK. The Plant Cell, 2005, 17(5): 1397-1411

13.
14.
We have developed genetic maps, based on expressed sequence tags (ESTs) that are homologous to Arabidopsis genes, in four dicotyledonous crop plant species from different families. A comparison of these maps with the physical map of Arabidopsis reveals common genome segments that appear to have been conserved throughout the evolution of the dicots. In the four crop species analysed these segments comprise between 16 and 33% of the Arabidopsis genome. Our findings extend the synteny patterns previously observed only within plant families, and indicate that structural and functional information from the model species will be, at least in part, applicable in crop plants with large genomes.

15.
Large-scale single-pass sequencing of cDNAs from different plants has provided an extensive reservoir for the cloning of genes, the evaluation of tissue-specific gene expression, markers for map-based cloning, and the annotation of genomic sequences. Although as of January 2000 GenBank contained over 220,000 entries of expressed sequence tags (ESTs) from plants, most publicly available plant ESTs are derived from vegetative tissues and relatively few ESTs are specifically derived from developing seeds. However, important morphogenetic processes are exclusively associated with seed and embryo development and the metabolism of seeds is tailored toward the accumulation of economically valuable storage compounds such as oil. Here we describe a new set of ESTs from Arabidopsis, which has been derived from 5- to 13-d-old immature seeds. Close to 28,000 cDNAs have been screened by DNA/DNA hybridization and approximately 10,500 new Arabidopsis ESTs have been generated and analyzed using different bioinformatics tools. Approximately 40% of the ESTs currently have no match in dbEST, suggesting many represent mRNAs derived from genes that are specifically expressed in seeds. Although these data can be mined with many different biological questions in mind, this study emphasizes the import of photosynthate into developing embryos, its conversion into seed oil, and the regulation of this pathway.

16.
Monocotyledons and dicotyledons are distinct, not only in their body plans and developmental patterns, but also in the structural features of their cell walls. The recent completion of the rice (Oryza sativa) genomic sequence and publication of the sequence data, together with the completed database of the Arabidopsis thaliana genome, provide the first opportunity to compare the full complement of cell-wall-related genes from the two distinct classes of flowering plants. We made this comparison by exploiting the fact that Arabidopsis and rice have type I and type II walls, respectively, and therefore represent the two extremes in terms of the structural features of plant cell walls. In this review article, we classify all cell-wall-related genes into 32 gene families, and generate their phylogenetic trees. Using these data, we can phylogenetically compare individual genes of particular interest between Arabidopsis and rice. This comparative genome approach shows that the differences in wall architecture in the two plant groups actually mirror the diversity of the individual gene families involved in the cell-wall dynamics of the respective plant species. This study also identifies putative rice orthologs of genes with well-defined functions in Arabidopsis and other plant species.

17.
18.
Conservation of Arabidopsis flowering genes in model legumes (total citations: 14; self-citations: 0; citations by others: 14)
The model plants Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) have provided a wealth of information about genes and genetic pathways controlling the flowering process, but little is known about the corresponding pathways in legumes. The garden pea (Pisum sativum) has been used for several decades as a model system for physiological genetics of flowering, but the lack of molecular information about pea flowering genes has prevented direct comparison with other systems. To address this problem, we have searched expressed sequence tag and genome sequence databases to identify flowering-gene-related sequences from Medicago truncatula, soybean (Glycine max), and Lotus japonicus, and isolated corresponding sequences from pea by degenerate-primer polymerase chain reaction and library screening. We found that the majority of Arabidopsis flowering genes are represented in pea and in legume sequence databases, although several gene families, including the MADS-box, CONSTANS, and FLOWERING LOCUS T/TERMINAL FLOWER1 families, appear to have undergone differential expansion, and several important Arabidopsis genes, including FRIGIDA and members of the FLOWERING LOCUS C clade, are conspicuously absent. In several cases, pea and Medicago orthologs are shown to map to conserved map positions, emphasizing the closely syntenic relationship between these two species. These results demonstrate the potential benefit of parallel model systems for an understanding of flowering phenology in crop and model legume species.

19.
The Fabaceae, the third largest family of plants and the source of many crops, has been the target of many genomic studies. Currently, only the grasses surpass the legumes in the number of publicly available expressed sequence tags (ESTs). The quantity of sequences from diverse plants enables the use of computational approaches to identify novel genes in specific taxa. We used BLAST algorithms to compare unigene sets from Medicago truncatula, Lotus japonicus, and soybean (Glycine max and Glycine soja) to nonlegume unigene sets, to GenBank's nonredundant and EST databases, and to the genomic sequences of rice (Oryza sativa) and Arabidopsis. As a working definition, putatively legume-specific genes had no sequence homology, below a specified threshold, to publicly available sequences of nonlegumes. Using this approach, 2,525 legume-specific EST contigs were identified, of which fewer than three percent had clear homology to previously characterized legume genes. As a first step toward predicting function, related sequences were clustered to build motifs that could be searched against protein databases. Three families of interest were more deeply characterized: F-box related proteins, Pro-rich proteins, and Cys cluster proteins (CCPs). Of particular interest were the >300 CCPs, primarily from nodules or seeds, with predicted similarity to defensins. Motif searching also identified several previously unknown CCP-like open reading frames in Arabidopsis. Evolutionary analyses of the genomic sequences of several CCPs in M. truncatula suggest that this family has evolved by local duplications and divergent selection.

20.
Ancient signals: comparative genomics of plant MAPK and MAPKK gene families (total citations: 10; self-citations: 0; citations by others: 10)
MAPK signal transduction modules play crucial roles in regulating many biological processes in plants, and their components are encoded by highly conserved genes. The recent availability of genome sequences for rice and poplar now makes it possible to examine how well the previously described Arabidopsis MAPK and MAPKK gene family structures represent the broader evolutionary situation in plants, and analysis of gene expression data for MPK and MKK genes in all three species allows further refinement of those families, based on functionality. The Arabidopsis MAPK nomenclature appears sufficiently robust to allow it to be usefully extended to other well-characterized plant systems.
