Similar articles
 Found 20 similar articles (search time: 31 ms)
2.
Tiling arrays of high-density oligonucleotide probes spanning the entire genome are powerful tools for the discovery of new genes. However, it is difficult to determine the structure of the spliced product of a structurally unknown gene from noisy array signals alone. Here we introduce a statistical method that estimates the precise splicing points and the exon/intron structure of a structurally unknown gene by maximizing the odds, or the ratio of posterior probabilities, of the structure given the observed array signal intensities and nucleic acid sequences. Our method predicted gene structures more accurately than the simple threshold-based method, and estimated the expression values of structurally unknown genes more correctly than the window-based method. It was observed that the Markov model contributed to the precision of splice points, and that the statistical significance of expression (P-value) represented the reliability of the estimated gene structure and expression value well. We have implemented the method as a program, ARTADE (ARabidopsis Tiling Array-based Detection of Exons), and applied it to the Arabidopsis thaliana whole-genome array data analysis. The database of the predicted results and the ARTADE program are available at http://omicspace.riken.jp/ARTADE/.

4.
MOTIVATION: Coordinate regulation of gene expression can provide information on gene function. To begin a large-scale analysis of Dictyostelium gene function, we clustered genes based on their expression in wild-type and mutant strains and analyzed their functions. RESULTS: We found 17 modes of wild-type gene expression and refined them into 57 submodes considering mutant data. Annotation analyses revealed correlations between co-expression and function and an unexpected correlation between expression and function of genes involved in various aspects of chemotaxis. Co-regulation of chemotaxis genes was also found in published data from neutrophils. To test the predictive power of the analysis, we examined the phenotypes of mutations in seven co-regulated genes that had no published role in chemotaxis. Six mutants exhibited chemotaxis defects, supporting the idea that function can be inferred from co-expression. The clustering and annotation analyses provide a public resource for Dictyostelium functional genomics.

7.
Biochemical and cytogenetic experiments have led to the hypothesis that eukaryotic chromatin is organized into a series of distinct domains that are functionally independent. Two expectations of this hypothesis are: (i) adjacent genes are more frequently co-expressed than is expected by chance; and (ii) co-expressed neighbouring genes are often functionally related. Here we report that over 10% of Arabidopsis thaliana genes are within large, co-expressed chromosomal regions. Two per cent (497/22,520) of genes are highly co-expressed (r > 0.7), about five times the number expected by chance. These genes fall into 226 groups distributed across the genome, and each group typically contains two to three genes. Among the highly co-expressed groups, 40% (91/226) have genes with high amino acid sequence similarity. Nonetheless, duplicate genes alone do not explain the observed levels of co-expression. Co-expressed, non-homologous genes are transcribed in parallel, share functions, and lie close together more frequently than expected. Our results show that the A. thaliana genome contains domains of gene expression. Small domains have highly co-expressed genes that often share functional and sequence similarity and are probably co-regulated by nearby regulatory sequences. Genes within large, significantly correlated groups are typically co-regulated at a low level, suggesting the presence of large chromosomal domains.
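
The r > 0.7 criterion in the abstract above can be illustrated with a minimal sketch. It assumes (the abstract does not say so explicitly) that r is a Pearson correlation between the expression profiles of chromosomally adjacent genes. The function names and the toy expression values are hypothetical and are not taken from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def adjacent_coexpression(profiles, threshold=0.7):
    """Return (index, r) for each adjacent pair (index, index + 1) with r > threshold.

    profiles: list of expression profiles, one per gene, in chromosomal order.
    """
    hits = []
    for i in range(len(profiles) - 1):
        r = pearson(profiles[i], profiles[i + 1])
        if r > threshold:
            hits.append((i, r))
    return hits

# Toy data (made up): four neighbouring genes measured in four conditions.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 5.9, 8.0],
    [4.0, 1.0, 3.0, 2.0],
    [1.0, 3.0, 2.0, 4.0],
]
coexpressed_pairs = adjacent_coexpression(toy_profiles)
```

In this toy set only the first pair of neighbours passes the threshold. Judging whether such pairs occur more often than chance would additionally need a null distribution, for example from shuffled gene order, which is not sketched here.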

8.
We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.
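
As a rough illustration of the text-mining step described above, the sketch below pulls residue mentions such as "His57" or "Asp-102" out of a sentence with a regular expression. This is not the LEAP-FS implementation: the pattern, the function name, and the example sentence are assumptions made for illustration, and a real system would also have to handle one-letter codes, full residue names, and numbering offsets.

```python
import re

# Three-letter codes of the 20 standard amino acids.
AA3 = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
       "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")

# A residue mention: three-letter code, optional hyphen or space, then a position.
RESIDUE_RE = re.compile(rf"\b({AA3})[- ]?(\d+)\b")

def residue_mentions(text):
    """Return unique (residue, position) mentions found in text, sorted by position."""
    found = {(m.group(1), int(m.group(2))) for m in RESIDUE_RE.finditer(text)}
    return sorted(found, key=lambda mention: (mention[1], mention[0]))

# Hypothetical example sentence.
sentence = ("The catalytic triad His57, Asp-102 and Ser 195 is conserved; "
            "His57 acts as a general base.")
mentions = residue_mentions(sentence)
```

Repeated mentions collapse into one entry, so `mentions` lists each residue once. Mapping the extracted positions onto a particular protein structure is a separate step that is not attempted here.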

9.
We have developed a rice (Oryza sativa) genome annotation database (Osa1) that provides structural and functional annotation for this emerging model species. Using the sequence of O. sativa subsp. japonica cv Nipponbare from the International Rice Genome Sequencing Project, pseudomolecules, or virtual contigs, of the 12 rice chromosomes were constructed. Our most recent release, version 3, represents our third build of the pseudomolecules and is composed of 98% finished sequence. Genes were identified using a series of computational methods developed for Arabidopsis (Arabidopsis thaliana) that were modified for use with the rice genome. In release 3 of our annotation, we identified 57,915 genes, of which 14,196 are related to transposable elements. Of these 43,719 non-transposable element-related genes, 18,545 (42.4%) were annotated with a putative function, 5,777 (13.2%) were annotated as encoding an expressed protein with no known function, and the remaining 19,397 (44.4%) were annotated as encoding a hypothetical protein. Multiple splice forms (5,873) were detected for 2,538 genes, resulting in a total of 61,250 gene models in the rice genome. We incorporated experimental evidence into 18,252 gene models to improve the quality of the structural annotation. A series of functional data types has been annotated for the rice genome that includes alignment with genetic markers, assignment of gene ontologies, identification of flanking sequence tags, alignment with homologs from related species, and syntenic mapping with other cereal species. All structural and functional annotation data are available through interactive search and display windows as well as through download of flat files. To integrate the data with other genome projects, the annotation data are available through a Distributed Annotation System and a Genome Browser. All data can be obtained through the project Web pages at http://rice.tigr.org.
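
The counts quoted in the abstract above are internally consistent, which the short check below makes explicit. All numbers come from the abstract; the variable names are hypothetical, and the gene-model formula reflects one reading of the text (each gene contributes one model, and the 5,873 splice forms replace the single models of the 2,538 alternatively spliced genes), not a documented rule of the annotation pipeline.

```python
total_genes = 57_915
te_related = 14_196

# Genes not related to transposable elements.
non_te_genes = total_genes - te_related

# Annotation categories for the non-TE genes, as given in the abstract.
categories = {
    "putative function": 18_545,
    "expressed protein, unknown function": 5_777,
    "hypothetical protein": 19_397,
}
percentages = {
    name: round(100 * count / non_te_genes, 1)
    for name, count in categories.items()
}

# Assumed reading: one model per gene, plus the extra splice forms.
splice_forms = 5_873
genes_with_multiple_forms = 2_538
gene_models = total_genes + (splice_forms - genes_with_multiple_forms)
```

Under that reading the total comes to 61,250 gene models, matching the figure in the abstract, and the three categories sum to the 43,719 non-TE genes.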

10.
Ouzounis CA, Karp PD. Genome Biology, 2002, 3(2): comment2001.1-comment2001.6
Annotation, the process by which structural or functional information is inferred for genes or proteins, is crucial for obtaining value from genome sequences. We define the process of annotating a previously annotated genome sequence as 're-annotation', and examine the strengths and weaknesses of current manual and automatic genome-wide re-annotation approaches.

15.
During the last ten years, Arabidopsis thaliana has become the most favoured plant system for the study of many aspects of development and adaptation to adverse conditions and diseases. The sequencing of the Arabidopsis thaliana genome is nearly complete, with more than 90% of the sequence released in public databases. This is the first plant genome to be analysed, and it has revealed a tremendous amount of information about the nature of the genes it contains and its largely duplicated organisation. French groups have been involved in Arabidopsis genomics at several steps: EST (expressed sequence tag) sequencing, construction and ordering (physical mapping of chromosomes) of a YAC (yeast artificial chromosome) library, and genomic sequencing. In parallel, an extensive programme of functional genomics is being undertaken through the systematic analysis of insertional mutants. This information provides support for analysing other, more economically important plant genomes such as the rice genome, constitutes the beginning of a systematic investigation of plant gene functions, and will promote new strategies for plant improvement.

19.
Despite the completion of the sequencing of the entire genome of Arabidopsis thaliana (L.) Heynh., the exact determination of each single gene and its function remains an open question. This is especially true for multigene families. An approach that combines analysis of genomic structure, expression data and functional genomics to ascertain the role of the members of the multidrug-resistance-related protein (MRP) gene family, a subfamily of the ATP-binding cassette (ABC) transporters from Arabidopsis, is presented. We used cDNA sequencing and alignment-based re-annotation of genomic sequences to define the exact genic structure of all known AtMRP genes. Analysis of promoter regions suggested different induction conditions even for closely related genes. Expression analysis for the entire gene family confirmed these assumptions. Phylogenetic analysis and determination of segmental duplication in the regions of AtMRP genes revealed that the evolution of the extraordinarily high number of ABC transporter genes in plants cannot solely be explained by polyploidisation during the evolution of the Arabidopsis genome. Interestingly, MRP genes from Oryza sativa L. (rice; OsMRP) show very similar genomic structures to those from Arabidopsis. Screening of large populations of T-DNA-mutagenised lines of A. thaliana resulted in the isolation of AtMRP insertion mutants. This work opens the way for the defined analysis of a multigene family of important membrane transporters whose broad variety of functions expands their traditional role as cellular detoxifiers.

20.
Impact of genomics approaches on plant genetics and physiology (total citations: 2; self-citations: 0; citations by others: 2)
