Increasing evidence demonstrates the importance of long coiled-coil proteins for the spatial organization of cellular processes. Although several protein classes with long coiled-coil domains have been studied in animals and yeast, our knowledge of plant long coiled-coil proteins is very limited. The repetitive nature of the coiled-coil sequence motif often prevents the simple identification of homologs of animal coiled-coil proteins by generic sequence similarity searches. As a consequence, plant counterparts of many animal proteins with long coiled-coil domains, such as lamins, golgins, or microtubule organizing center components, have not yet been identified. Here, all Arabidopsis proteins predicted to contain long stretches of coiled-coil domains were identified by applying the algorithm MultiCoil in a genome-wide screen. A searchable protein database, ARABI-COIL (http://www.coiled-coil.org/arabidopsis), was established that integrates information on the number, size, and position of predicted coiled-coil domains with subcellular localization signals, transmembrane domains, and available functional annotations. ARABI-COIL serves as a tool to sort and browse Arabidopsis long coiled-coil proteins, facilitating the identification and selection of candidate proteins of potential interest for specific research areas. Using the database, candidate proteins were identified for Arabidopsis membrane-bound, nuclear, and organellar long coiled-coil proteins.
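Screening for "long stretches" of coiled coil amounts to scanning per-residue prediction scores for sufficiently long runs above a cutoff. The sketch below assumes MultiCoil-style per-residue probabilities are already available as a list of floats; the 0.5 cutoff, the 50-residue minimum, and the function name are hypothetical choices for illustration, not the criteria used to build ARABI-COIL.

```python
from typing import List, Tuple


def coiled_coil_stretches(probs: List[float], threshold: float = 0.5,
                          min_len: int = 50) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) residue ranges in which the
    per-residue coiled-coil probability stays >= threshold for at least
    min_len consecutive residues. Cutoffs are hypothetical defaults."""
    stretches = []
    start = None  # 0-based index where the current run began
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # Run ended at residue i - 1; keep it only if long enough
            if i - start >= min_len:
                stretches.append((start + 1, i))
            start = None
    # Handle a run that extends to the C-terminus
    if start is not None and len(probs) - start >= min_len:
        stretches.append((start + 1, len(probs)))
    return stretches
```

The number, size, and position of the returned ranges correspond to the per-protein fields the database integrates.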

Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis   Cited by: 34 (self-citations: 0; citations by others: 34)
The Arabidopsis genome contains approximately 200 genes that encode proteins with similarity to the nucleotide binding site and other domains characteristic of plant resistance proteins. Through a reiterative process of sequence analysis and reannotation, we identified 149 NBS-LRR-encoding genes in the Arabidopsis (ecotype Columbia) genomic sequence. Fifty-six of these genes were corrected from earlier annotations. At least 12 are predicted to be pseudogenes. As described previously, two distinct groups of sequences were identified: those that encoded an N-terminal domain with Toll/Interleukin-1 Receptor homology (TIR-NBS-LRR, or TNL), and those that encoded an N-terminal coiled-coil motif (CC-NBS-LRR, or CNL). The encoded proteins are distinct from the 58 predicted adapter proteins in the previously described TIR-X, TIR-NBS, and CC-NBS groups. Classification based on protein domains, intron positions, sequence conservation, and genome distribution defined four subgroups of CNL proteins, eight subgroups of TNL proteins, and a pair of divergent NL proteins that lack a defined N-terminal motif. CNL proteins generally were encoded in single exons, although two subclasses were identified that contained introns in unique positions. TNL proteins were encoded in modular exons, with conserved intron positions separating distinct protein domains. Conserved motifs were identified in the LRRs of both CNL and TNL proteins. In contrast to CNL proteins, TNL proteins contained large and variable C-terminal domains. The extant distribution and diversity of the NBS-LRR sequences have been generated by extensive duplication and ectopic rearrangements that involved segmental duplications as well as microscale events. The observed diversity of these NBS-LRR proteins indicates the variety of recognition molecules available in an individual genotype to detect diverse biotic challenges.
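The top-level TNL/CNL/NL split depends only on which domain precedes the NBS. A minimal sketch, assuming each protein has already been reduced to an N- to C-terminal list of domain labels ("TIR", "CC", "NBS", "LRR") by some upstream domain scan; the labels and function name are hypothetical, and the intron-position and sequence-conservation criteria used for subgrouping are not modeled.

```python
from typing import List


def classify_nbs_lrr(domains: List[str]) -> str:
    """Assign TNL, CNL, or NL from an ordered list of hypothetical domain
    labels. Proteins lacking either NBS or LRR (e.g. TIR-X, TIR-NBS,
    CC-NBS adapter-like proteins) are reported as not NBS-LRR."""
    if "NBS" not in domains or "LRR" not in domains:
        return "not NBS-LRR"
    n_terminal = domains[0]
    if n_terminal == "TIR":
        return "TNL"
    if n_terminal == "CC":
        return "CNL"
    return "NL"  # no defined N-terminal motif ahead of the NBS
```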

MetaCyc (http://metacyc.org) contains experimentally determined biochemical pathways to be used as a reference database for metabolism. In conjunction with the Pathway Tools software, MetaCyc can be used to computationally predict the metabolic pathway complement of an annotated genome. To increase the breadth of pathways and enzymes, more than 60 plant-specific pathways have been added or updated in MetaCyc recently. In contrast to MetaCyc, which contains metabolic data for a wide range of organisms, AraCyc is a species-specific database containing only enzymes and pathways found in the model plant Arabidopsis (Arabidopsis thaliana). AraCyc (http://arabidopsis.org/tools/aracyc/) was the first computationally predicted plant metabolism database derived from MetaCyc. Since its initial computational build, AraCyc has been under continued curation to enhance data quality and to increase breadth of pathway coverage. Twenty-eight pathways have been manually curated from the literature recently. Pathway predictions in AraCyc have also been recently updated with the latest functional annotations of Arabidopsis genes that use controlled vocabulary and literature evidence. AraCyc currently features 1,418 unique genes mapped onto 204 pathways with 1,156 literature citations. The Omics Viewer, a user data visualization and analysis tool, allows a list of genes, enzymes, or metabolites with experimental values to be painted on a diagram of the full pathway map of AraCyc. Other recent enhancements to both MetaCyc and AraCyc include implementation of an evidence ontology, which has been used to provide information on data quality, expansion of the secondary metabolism node of the pathway ontology to accommodate curation of secondary metabolic pathways, and enhancement of the cellular component ontology for storing and displaying enzyme and pathway locations within subcellular compartments.
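The idea of predicting a genome's pathway complement from a reference database can be illustrated with a deliberately simplified coverage rule: a reference pathway is called present when enough of its enzymes (here, EC numbers) are found among the genome's annotations. This is a hypothetical stand-in, not the actual Pathway Tools prediction logic; the 0.5 cutoff, pathway names, and EC sets are invented for illustration.

```python
from typing import Dict, Set


def predict_pathways(reference: Dict[str, Set[str]],
                     genome_ecs: Set[str],
                     min_fraction: float = 0.5) -> Dict[str, float]:
    """Return {pathway: fraction of its EC numbers found in the genome}
    for every reference pathway meeting the hypothetical min_fraction."""
    predicted = {}
    for pathway, ecs in reference.items():
        if not ecs:
            continue  # nothing to match against
        fraction = len(ecs & genome_ecs) / len(ecs)
        if fraction >= min_fraction:
            predicted[pathway] = fraction
    return predicted


# Hypothetical reference pathways and genome annotation
reference = {
    "pathway_A": {"1.1.1.1", "2.7.1.1"},
    "pathway_B": {"3.1.1.1", "4.2.1.1", "5.3.1.1"},
}
genome = {"1.1.1.1", "2.7.1.1", "3.1.1.1"}
print(predict_pathways(reference, genome))
```

Here pathway_B is rejected because only one of its three enzymes is annotated; real predictors also weigh pathway-specific reactions and taxonomic range.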

Arabidopsis thaliana is the most widely studied plant today. The concerted efforts of over 11 000 researchers and 4000 organizations around the world are generating a rich diversity and quantity of information and materials. This information is made available through a comprehensive online resource called the Arabidopsis Information Resource (TAIR) (http://arabidopsis.org), which is accessible via commonly used web browsers and can be searched and downloaded in a number of ways. In the last two years, efforts have focused on increasing data content and diversity, functionally annotating genes and gene products with controlled vocabularies, and improving data retrieval, analysis, and visualization tools. New information includes sequence polymorphisms (including alleles), germplasms and phenotypes, Gene Ontology annotations, gene families, protein information, metabolic pathways, gene expression data from microarray experiments, and seed and DNA stocks. New data visualization and analysis tools include SeqViewer, which interactively displays the genome from the whole chromosome down to 10 kb of nucleotide sequence, and AraCyc, a metabolic pathway database and map tool that allows expression data to be overlaid onto the pathway diagrams. Finally, we have recently incorporated seed and DNA stock information from the Arabidopsis Biological Resource Center (ABRC) and implemented a shopping-cart style online ordering system.

GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox   Cited by: 26 (self-citations: 0; citations by others: 26)
High-throughput gene expression analysis has become a frequent and powerful research tool in biology. At present, however, few software applications have been developed for biologists to query large microarray gene expression databases using a Web-browser interface. We present GENEVESTIGATOR, a database and Web-browser data mining interface for Affymetrix GeneChip data. Users can query the database to retrieve the expression patterns of individual genes throughout chosen environmental conditions, growth stages, or organs. Conversely, mining tools allow users to identify genes specifically expressed during selected stresses, growth stages, or in particular organs. Using GENEVESTIGATOR, the gene expression profiles of more than 22,000 Arabidopsis genes can be obtained, including those of 10,600 currently uncharacterized genes. The objective of this software application is to guide gene function discovery and the design of new experiments by providing plant biologists with contextual information on the expression of genes. The database and analysis toolbox is available as a community resource at https://www.genevestigator.ethz.ch.
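The reverse query, finding genes specifically expressed in one condition, can be sketched as a ratio test of the target condition against the mean of all other conditions. The input layout, the 4-fold cutoff, the gene IDs, and the function name are hypothetical; GENEVESTIGATOR's own specificity measures may differ.

```python
from statistics import mean
from typing import Dict, List


def specific_genes(expr: Dict[str, Dict[str, float]], target: str,
                   min_ratio: float = 4.0) -> List[str]:
    """Return genes whose signal in `target` is at least `min_ratio` times
    their mean signal across all other conditions (hypothetical rule)."""
    hits = []
    for gene, by_condition in expr.items():
        others = [v for c, v in by_condition.items() if c != target]
        if target not in by_condition or not others:
            continue  # cannot compare without both target and background
        background = mean(others)
        if background > 0 and by_condition[target] / background >= min_ratio:
            hits.append(gene)
    return sorted(hits)


# Hypothetical expression values per organ
expr = {
    "gene_1": {"root": 800.0, "leaf": 100.0, "flower": 100.0},
    "gene_2": {"root": 200.0, "leaf": 180.0, "flower": 220.0},
}
print(specific_genes(expr, "root"))
```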

Genomic data visualization on the Web   Cited by: 2 (self-citations: 0; citations by others: 2)
Many types of genomic data can be represented in matrix format, with rows corresponding to genes and columns corresponding to gene features. The heat map is a popular technique for visualizing such data, plotting the data on a two-dimensional grid and using a color scale to represent the magnitude of each matrix entry. Prism is a Web-based software tool for generating annotated heat map visualizations of genome-wide data quickly. The tool provides a selection of genome-specific annotation catalogs as well as a catalog upload capability. The heat maps generated are clickable, allowing the user to drill down to examine specific matrix entries, and gene annotations are linked to relevant genomic databases. AVAILABILITY: http://noble.gs.washington.edu/prism
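The color scale at the heart of a heat map is a mapping from each matrix entry to a color. A minimal sketch, assuming a linear green (low) to black (middle) to red (high) scale with values clamped to [vmin, vmax]; the palette and function names are hypothetical and not necessarily what Prism uses.

```python
from typing import List


def value_to_hex(v: float, vmin: float, vmax: float) -> str:
    """Map v linearly onto a green-black-red scale, returned as #rrggbb."""
    if vmax <= vmin:
        raise ValueError("vmax must exceed vmin")
    # Clamp into range, then rescale to t in [0, 1]
    t = (min(max(v, vmin), vmax) - vmin) / (vmax - vmin)
    if t < 0.5:
        r, g = 0, round(255 * (1 - 2 * t))  # fade green toward black
    else:
        r, g = round(255 * (2 * t - 1)), 0  # fade black toward red
    return f"#{r:02x}{g:02x}00"


def matrix_to_colors(matrix: List[List[float]], vmin: float,
                     vmax: float) -> List[List[str]]:
    """Color every cell of a genes x features matrix."""
    return [[value_to_hex(v, vmin, vmax) for v in row] for row in matrix]


print(matrix_to_colors([[-2.0, 0.0, 2.0]], -2.0, 2.0))
```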

Model systems have played a crucial role in understanding biological processes at the genetic, molecular, and systems levels. Arabidopsis thaliana is one of the best-studied model species for higher plants. Large genomic resources and mutant collections have made Arabidopsis an excellent source for functional and comparative genomics. Rice and Brachypodium have great potential to become model systems for grasses. Given the agronomic importance of grass crops, applying knowledge from Arabidopsis to grasses is an attractive strategy. Despite many efforts, successful reports are sparse. Knowledge transfer should generally work best between orthologous genes that share functionality and a common ancestor. In higher plants, however, recent genome projects have revealed an active and rapid evolution of genome structure, which challenges the concept of one-to-one orthologous pairs between two species. In this study, using protein families involved in redox-related processes as an example, we estimated the impact of gene family expansions on the success rate of knowledge transfer from Arabidopsis to the grass species rice, sorghum, and Brachypodium. The sparse synteny between dicot and monocot plants, caused by frequent rearrangements, translocations, and gene losses, strongly reduces the number of orthologs detectable by positional conservation. To address the limitations of sparse synteny and expanded gene families, we detected orthologs with orthoMCL, a sequence-based approach that groups closely related paralogs into a single orthologous gene cluster. For 49 of the 170 Arabidopsis genes, we identified conserved copy numbers between the dicot model and the grass annotations, whereas approximately one third (34.7%, 59 genes) of the selected Arabidopsis genes lack an assignment to any of the grass genome annotations. The remaining 62 Arabidopsis genes represent groups with considerably biased copy numbers between Arabidopsis and all or most of the three grass genomes.
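Once orthoMCL has produced clusters, each cluster can be sorted into the three outcomes reported above by comparing per-species copy numbers. The rule below is a simplified assumption (any copy-number mismatch counts as "biased", whereas the study describes bias in all or most grass genomes), and the species keys and function name are hypothetical. The three reported counts also add up to the 170 selected genes.

```python
from typing import Dict


def classify_cluster(counts: Dict[str, int], ref: str = "arabidopsis") -> str:
    """Classify an orthologous cluster from per-species gene counts.

    missing   : no genes from any species other than ref
    conserved : every other species has the same copy number as ref
    biased    : any other pattern (simplified, hypothetical rule)
    """
    ref_n = counts.get(ref, 0)
    others = [n for species, n in counts.items() if species != ref]
    if all(n == 0 for n in others):
        return "missing"
    if all(n == ref_n for n in others):
        return "conserved"
    return "biased"


# Consistency check of the reported tallies
conserved, missing, biased = 49, 59, 62
print(conserved + missing + biased, round(100 * missing / 170, 1))  # 170 34.7
```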

Published genomes frequently contain erroneous gene models that reflect issues with the identification of open reading frames, start sites, splice sites, and related structural features. The source of these inconsistencies can often be traced back to integration across text file formats designed to describe long-read alignments and predicted gene structures. In addition, most gene prediction frameworks do not provide robust downstream filtering to remove problematic gene annotations, nor do they represent these annotations in a format consistent with current file standards. These frameworks also lack consideration of functional attributes, such as the presence or absence of protein domains, that can be used for gene model validation. To provide oversight for the increasing number of published genome annotations, we present gFACs (Gene Filtering, Analysis, and Conversion), a software package to filter, analyze, and convert predicted gene models and alignments. The software operates across a wide range of alignment, analysis, and gene prediction files, with a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers, and generates extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space. gFACs is freely available, implemented in Perl with support from BioPerl libraries, at https://gitlab.com/PlantGenomicsLab/gFACs.
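The kinds of structural problems listed above (reading frame, start site, stop codon, splice sites) can be checked directly against the genome sequence. The sketch below is a hypothetical Python illustration of such filters for a plus-strand model with 1-based inclusive exon coordinates; it is not gFACs code, does not use its Perl interface, and assumes canonical GT-AG introns only.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gene_model_issues(genome: str, exons: List[Tuple[int, int]]) -> List[str]:
    """Return structural problems for a plus-strand gene model whose
    exons are sorted 1-based inclusive (start, end) CDS coordinates."""
    issues = []
    cds = "".join(genome[s - 1:e] for s, e in exons).upper()
    if len(cds) % 3:
        issues.append("CDS length not a multiple of 3")
    if not cds.startswith("ATG"):
        issues.append("missing ATG start codon")
    if cds[-3:] not in STOP_CODONS:
        issues.append("missing stop codon")
    # All complete codons except the final one must be sense codons
    internal = [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]
    if any(codon in STOP_CODONS for codon in internal):
        issues.append("in-frame internal stop codon")
    # Introns lie between consecutive exons; expect GT ... AG
    for (_, end1), (start2, _) in zip(exons, exons[1:]):
        intron = genome[end1:start2 - 1].upper()
        if not (intron.startswith("GT") and intron.endswith("AG")):
            issues.append(f"non-canonical splice sites at intron {end1 + 1}-{start2 - 1}")
    return issues


# Exon ATGAAA, intron GTAAGTTTAG, exon CCCTAA
genome = "ATGAAAGTAAGTTTAGCCCTAA"
print(gene_model_issues(genome, [(1, 6), (17, 22)]))
```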

High-throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public databases. The large volume, high degree of fragmentation, and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make the discovery of interesting genes in these sequence classes easier in the future, especially for biologists without access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).
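To see why predicting protein sequences from fragmentary ESTs is non-trivial, consider the most naive approach: translate all six reading frames and keep the longest stop-free stretch. This is a hypothetical illustration only, not the method used to build trEST or trGEN; it uses the standard genetic code and maps codons containing ambiguous bases to "X".

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq: str) -> str:
    """Reverse complement; non-ACGT characters are left unchanged."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons give X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def longest_orf_peptide(seq: str) -> str:
    """Longest stop-free peptide over all six frames (no Met required,
    since an EST fragment may lack the true start codon)."""
    seq = seq.upper()
    best = ""
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) > len(best):
                    best = segment
    return best


print(translate("ATGGCCTAA"), longest_orf_peptide("ATGGCCTAA"))
```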

Sixteen P1 and TAC clones assigned to Arabidopsis thaliana chromosome 5 were sequenced, and their sequence features were analyzed using various computer programs. The total length of the sequences determined was 1,013,767 bp. Together with the nucleotide sequences of 109 previously reported clones, the regions of chromosome 5 sequenced so far now total 9,072,622 bp, which presumably covers approximately one-third of the chromosome. A similarity search against reported gene sequences predicted a total of 225 protein-coding genes and/or gene segments in the newly sequenced regions, indicating an average gene density of one gene per 4.5 kb. Introns were identified in 72.4% of the potential protein genes for which the entire gene structure was predicted; the average number of introns per gene was 3.3, and their average length was 163 bp. These sequence features are essentially identical to those in the previously reported sequences. The sequence data and gene information are available on the World Wide Web database KAOS (Kazusa Arabidopsis data Opening Site) at http://www.kazusa.or.jp/arabi/.
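The reported gene density follows directly from the figures in the abstract, as the short calculation below shows. The length contributed by the 109 earlier clones is derived here by subtraction and is not stated in the source.

```python
new_bp = 1_013_767    # newly sequenced length (16 clones)
genes = 225           # predicted genes and/or gene segments in new regions
total_bp = 9_072_622  # chromosome 5 sequence determined so far

kb_per_gene = new_bp / genes / 1000
previous_bp = total_bp - new_bp  # derived: contribution of the 109 earlier clones

print(f"{kb_per_gene:.1f} kb per gene")  # 4.5 kb per gene
print(f"{previous_bp:,} bp from earlier clones")  # 8,058,855 bp from earlier clones
```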

