Similar articles
 Found 20 similar articles; search took 31 ms
1.
With the increasing quantities of Brassica genomic data being entered into the public domain and in preparation for the complete Brassica genome sequencing effort, there is a growing requirement for the structuring and detailed bioinformatic analysis of Brassica genomic information within a user-friendly database. At the Plant Biotechnology Centre, Melbourne, Australia, we have developed a series of tools and computational pipelines to assist in the processing and structuring of genomic data, to aid its application to agricultural biotechnology research. These tools include a sequence database, ASTRA; a sequence processing pipeline incorporating annotation against GenBank, SwissProt and Arabidopsis Gene Ontology (GO) data; and tools for molecular marker discovery and comparative genome analysis. All sequences are mined for simple sequence repeat (SSR) molecular markers using 'SSR primer' and mapped onto the complete Arabidopsis thaliana genome by sequence comparison. The database may be queried using a text-based search of sequence annotation or GO terms, BLAST comparison against resident sequences, or by the position of candidate orthologues within the Arabidopsis genome. Tools have also been developed and applied to the discovery of single nucleotide polymorphism (SNP) molecular markers and the in silico mapping of Brassica BAC end sequences onto the Arabidopsis genome. Planned extensions to this resource include the integration of gene expression data and the development of an EnsEMBL-based genome viewer.
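
The SSR mining step mentioned in this abstract can be illustrated with a minimal sketch. This is not the 'SSR primer' tool itself; the function name find_ssrs and the repeat thresholds are hypothetical choices made only for illustration, and the sketch detects perfect repeats only.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    # Hypothetical thresholds: minimum repeat count for each motif length.
    if min_repeats is None:
        min_repeats = {2: 6, 3: 4, 4: 3}
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example "AA" or "GAGA"), so each SSR is reported once.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("TTGAGAGAGAGAGACC"))
print(find_ssrs("CCGCCGCCGCCG"))
```

A production pipeline would typically also handle imperfect repeats and design flanking primers, which this sketch does not attempt.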

2.
A cDNA-AFLP experiment was designed to identify and clone nucleotide sequences induced during seed germination in Arabidopsis thaliana. Sequences corresponding to known genes involved in processes important for germination, such as mitochondrial biogenesis, protein synthesis and cell cycle progression, were isolated. Other sequences correspond to Arabidopsis BAC clones in regions where genes have not been annotated. Notably, a number of the sequences cloned did not correspond to available sequences in the databases from the Arabidopsis genome, but instead showed significant similarity to DNA from other organisms, for example fish species; among them, some may encode transposons. A number of the sequences isolated showed no significant similarity to any sequences in the public databases. Oligonucleotides derived from these new sequences were used to amplify genomic DNA of Arabidopsis. Expression analysis of representative sequences is presented. This work suggests that, during germination, there may be a massive transposon mobilization that may be useful in the annotation of new genome sequences and identification of regulatory mechanisms.

3.
4.
A common feature of the animal sialyltransferases (STs) is the presence of four conserved motifs, namely large (L), small (S), very small (VS) and motif III. Although sialic acid (SA) has not been detected in plants, three orthologues containing sequences similar to the ST motifs have been identified in the Arabidopsis thaliana L. database. In this study, we report that the At3g48820 gene (Gene ID: 824043) codes for a Golgi resident protein lacking the ability to transfer SA to asialofetuin or Galβ1,3GalNAc and Galβ1,4GlcNAc oligosaccharide acceptors. Restoration of deteriorated motifs S, VS and motif III by constructing chimeric proteins consisting of the 28–308 amino acid region of the A. thaliana At3g48820 ST-like protein and the 264–393 amino acid region of the Oryza sativa L. AK107493 ST-like protein, or of the 28–240 amino acid region of the At3g48820 protein and the 204–350 amino acid region of the Homo sapiens L. α2,3-ST (NP_008858), was not able to recover sialyltransferase activity. Altering the appropriate amino acid regions of the A. thaliana At3g48820 ST-like protein to those typical for the mammalian motif III (HHYWE) and VS motif (HDADFE) also did not have any effect. Our data, together with previous results, indicate that A. thaliana in particular, and plants in general, do not have transferases for SA. Substrates for the plant ST-like proteins might be compounds involved in secondary metabolism.

5.
MOTIVATION: The annotation of the Arabidopsis thaliana genome remains a problem in terms of time and quality. To improve the annotation process, we want to choose the most appropriate tools to use inside a computer-assisted annotation platform. We therefore need evaluation of prediction programs with Arabidopsis sequences containing multiple genes. RESULTS: We have developed AraSet, a data set of contigs of validated genes, enabling the evaluation of multi-gene models for the Arabidopsis genome. Besides conventional metrics to evaluate gene prediction at the site and the exon levels, new measures were introduced for the prediction at the protein sequence level as well as for the evaluation of gene models. This evaluation method is of general interest and could apply to any new gene prediction software and to any eukaryotic genome. The GeneMark.hmm program appears to be the most accurate software at all three levels for the Arabidopsis genomic sequences. Gene modeling could be further improved by combination of prediction software. AVAILABILITY: The AraSet sequence set, the Perl programs and complementary results and notes are available at http://sphinx.rug.ac.be:8080/biocomp/napav/. CONTACT: Pierre.Rouze@gengenp.rug.ac.be.
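
As a rough illustration of what a conventional exon-level metric looks like, the sketch below counts a predicted exon as correct only when both of its boundaries match an annotated exon exactly. It is not taken from the AraSet Perl programs; the function name and the (start, end) tuple representation are assumptions, and "specificity" is used here in the gene-prediction sense of correct exons divided by predicted exons.

```python
def exon_level_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) for exact exon matches.

    Each exon is a (start, end) tuple; this representation is an
    assumption made for the sketch.
    """
    annotated, predicted = set(annotated), set(predicted)
    correct = len(annotated & predicted)
    # Sensitivity: fraction of annotated exons that were predicted exactly.
    sensitivity = correct / len(annotated) if annotated else 0.0
    # Specificity: fraction of predicted exons that are exactly right.
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


annotated = [(100, 200), (300, 400), (500, 650)]
predicted = [(100, 200), (300, 410), (500, 650), (700, 800)]
print(exon_level_accuracy(annotated, predicted))
```

In this example two of three annotated exons are recovered exactly, and two of the four predictions are correct; the exon with a shifted end coordinate counts as wrong under this strict definition.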

6.
WILMA-automated annotation of protein sequences   Cited by: 1 (self-citations: 0; citations by others: 1)

7.
Insertional mutagenesis techniques, including transposon- and T-DNA-mediated mutagenesis, are key resources for systematic identification of gene function in the model plant species Arabidopsis thaliana. We have developed a database (http://atidb.cshl.org/) for archiving, searching and analyzing insertional mutagenesis lines. Flanking sequences from approximately 10,500 insertion lines (including transposon and T-DNA insertions) from several tagging programs in Arabidopsis were mapped to the genome sequence through our annotation system before being entered into the database. The database front end provides World Wide Web searching and analyzing interfaces for genome researchers and other biologists. Users can search the database to identify insertions in a particular gene or perform genome-wide analysis to study the distribution and preference of insertions. Tools integrated with the database include a graphical genome browser, a protein search function, a graphical representation of the insertion distribution and a BLAST search function. The database is based on open source components and is available under an open source license.

8.
9.
The growing number of rice microsatellite markers warrants a comprehensive comparison of allelic variability between the markers developed using different methods, with various sequence repeat motifs, and from coding and non-coding portions of the genome. We have performed such a comparison over a set of 323 microsatellite markers; 194 were derived from genomic library screening and 129 were derived from the analysis of rice expressed sequence tags (ESTs) available in public DNA databases. We have evaluated the frequency of polymorphism between parental pairs of six inter-subspecific crosses and one inter-specific cross widely used for mapping in rice. Microsatellites derived from genomic libraries detected a higher level of polymorphism than those derived from ESTs contained in the GenBank database (83.8% versus 54.0%). Similarly, the other measures of genetic variability [the number of alleles per locus, polymorphism information content (PIC), and allele size ranges] were all higher in genomic library-derived microsatellites than in their EST-database counterparts. The highest overall degree of genetic diversity was seen in GA-containing microsatellites of genomic library origin, while the most conserved markers contained CCG- or CAG-trinucleotide motifs and were developed from GenBank sequences. Preferential location of specific motifs in coding versus non-coding regions of known genes was related to observed levels of microsatellite diversity. A strong positive correlation was observed between the maximum length of a microsatellite motif and the standard deviation of the molecular weight of amplified fragments. The reliability of molecular weight standard deviation (SDmw) as an indicator of genetic variability of microsatellite loci is discussed. Received: 5 May 1999 / Accepted: 16 August 1999
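
For readers unfamiliar with PIC, the sketch below computes it from a list of observed alleles at one locus. The abstract does not state which formula the authors used; this assumes the widely used Botstein et al. (1980) definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2, and the function name pic is hypothetical.

```python
from collections import Counter
from itertools import combinations


def pic(alleles):
    """Polymorphism information content for one locus.

    Assumes the Botstein et al. (1980) definition; `alleles` is a list
    of observed allele labels (for example fragment sizes).
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one allele observation is required")
    freqs = [c / total for c in counts.values()]
    # Expected homozygosity: sum of squared allele frequencies.
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over all unordered pairs of distinct alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross


print(pic([150, 150, 156, 156]))  # two equally frequent alleles
print(pic([150, 150, 150, 150]))  # monomorphic locus
```

With two equally frequent alleles the value is 1 - 0.5 - 0.125 = 0.375, and a monomorphic locus gives 0, which matches the intuition that PIC rises with the number and evenness of alleles.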

10.
Comparison of rice and Arabidopsis annotation   Cited by: 2 (self-citations: 0; citations by others: 2)
Several versions of the rice genome were published in 2002, providing a first overview of the genome content of this model monocot. At the same time, the genome of the model dicot, Arabidopsis thaliana, reached a new level of annotation as thousands of full-length cDNA sequences were integrated with the genome sequence.

11.
12.
Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10^-30) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5' to annotated coding regions. On average, the 500 nucleotides upstream to coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5' noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks.

13.
High-throughput genome sequencing continues to accelerate the rate at which complete genomes are available for biological research. Many of these new genome sequences have little or no genome annotation currently available and hence rely upon computational predictions of protein coding genes. Evidence of translation from proteomic techniques could facilitate experimental validation of protein coding genes, but the techniques for whole genome searching with MS/MS data have not been adequately developed to date. Here we describe GENQUEST, a novel method using peptide isoelectric focusing and accurate mass to greatly reduce the peptide search space, making fast, accurate, and sensitive whole human genome searching possible on common desktop computers. In an initial experiment, almost all exonic peptides identified in a protein database search were identified when searching genomic sequence. Many peptides identified exclusively in the genome searches were incorrectly identified or could not be experimentally validated, highlighting the importance of orthogonal validation. Experimentally validated peptides exclusive to the genomic searches can be used to reannotate protein coding genes. GENQUEST represents an experimental tool that can be used by the proteomics community at large for validating computational approaches to genome annotation.

14.
We have analyzed existing methodologies and created novel methodologies for the automatic assignment of S-adenosylmethionine (AdoMet)-dependent methyltransferase functionality to genomic open reading frames based on predicted protein sequences. A large class of the AdoMet-dependent methyltransferases shares a common binding motif for the AdoMet cofactor in the form of a seven-strand twisted beta-sheet; this structural similarity is mirrored in a degenerate sequence similarity that we refer to as methyltransferase signature motifs. These motifs are the basis of our assignments. We find that simple pattern matching based on the motif sequence is of limited utility and that a new method of "sensitized matrices for scoring methyltransferases" (SM2) produced with modified versions of the MEME and MAST tools gives greatly improved results for the Saccharomyces cerevisiae yeast genome. From our analysis, we conclude that this class of methyltransferases makes up approximately 0.6-1.6% of the genes in the yeast, human, mouse, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Escherichia coli genomes. We provide lists of unidentified genes that we consider to have a high probability of being methyltransferases for future biochemical analyses.

15.
We examined the degree of conservation of gene order in two plant species, Prunus persica (peach) and Arabidopsis thaliana (thale cress), whose lineages diverged more than 90 million years ago. In the three peach genomic regions studied, segments with a gene order congruent with A. thaliana were short (two to three genes in length); and for any peach region, corresponding segments were found in diverse locations in the A. thaliana genome. At the gene level and lower, the A. thaliana sequence was enormously useful for identifying likely coding regions in peach sequences and in determining their intron-exon structure. The peach BAC sequence data reported here contained a BLAST-detectable putative coding sequence an average of every 7 kb, and the peach introns identified in this study were, on average, almost twice the length of the corresponding introns in A. thaliana.

16.
17.
18.
19.
20.