Similar literature
 Found 20 similar articles; search took 31 ms
1.
Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or by their neighboring positions within a chromosomal region. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step toward revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into how certain biological functions were acquired during evolution. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficients of neighboring genes with those of randomly selected gene pairs, using a statistical method that takes the false discovery rate (FDR) into consideration, across 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined by BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism. Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system.
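
The comparison of neighboring-gene correlations with random gene pairs can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not name its FDR procedure, so Benjamini-Hochberg adjustment is swapped in as an assumption, and the function names, the empirical one-sided p-value, and the default of 5000 random pairs are all hypothetical choices.

```python
import math
import random
from bisect import bisect_left

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda k: pvals[k])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * n / rank)
        q[k] = running_min
    return q

def coexpressed_neighbors(expr, n_random=5000, fdr=0.1, seed=0):
    """expr: per-gene expression profiles, listed in chromosomal order.

    Returns (r, q, flagged), one entry per adjacent gene pair.
    """
    rng = random.Random(seed)
    n = len(expr)
    r_adj = [pearson(expr[k], expr[k + 1]) for k in range(n - 1)]
    # Null distribution: correlations of randomly chosen gene pairs.
    null = []
    while len(null) < n_random:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:
            null.append(pearson(expr[i], expr[j]))
    null.sort()
    # One-sided empirical p-value with a +1 pseudocount.
    pvals = [(len(null) - bisect_left(null, r) + 1) / (len(null) + 1)
             for r in r_adj]
    q = benjamini_hochberg(pvals)
    return r_adj, q, [v <= fdr for v in q]
```

Runs of consecutive flagged pairs could then be merged into candidate clusters; how the original study delimits a cluster is not stated in the abstract.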

2.
The nucleotide-binding site (NBS)-leucine-rich repeat (LRR) gene family accounts for the largest number of known disease resistance genes, and is one of the largest gene families in plant genomes. We have identified 333 nonredundant NBS-LRRs in the current Medicago truncatula draft genome (Mt1.0), likely representing 400 to 500 NBS-LRRs in the full genome, or roughly 3 times the number present in Arabidopsis (Arabidopsis thaliana). Although many characteristics of the gene family are similar to those described in other plant genomes, several evolutionary features are particularly pronounced in M. truncatula, including a high degree of clustering, evidence of significant numbers of ectopic translocations from clusters to other parts of the genome, a small number of more evolutionarily stable NBS-LRRs, and numerous truncations and fusions leading to novel domain compositions. The gene family clearly has had a large impact on the structure of the genome, both through ectopic translocations (potentially a means of seeding new NBS-LRR clusters) and through two extraordinarily large superclusters. Chromosome 6 encodes approximately 34% of all TIR-NBS-LRRs, while chromosome 3 encodes approximately 40% of all coiled-coil-NBS-LRRs. Almost all atypical domain combinations are in the TIR-NBS-LRR subfamily, with many occurring within one genomic cluster. This analysis shows that the gene family is not only important functionally and agronomically, but also plays a structural role in the genome.

3.
4.
Chen CL, Chen CJ, Vallon O, Huang ZP, Zhou H, Qu LH. Genetics, 2008, 179(1): 21-30
Chlamydomonas reinhardtii is a unicellular green alga, the lineage of which diverged from that of land plants >1 billion years ago. Using the powerful small nucleolar RNA (snoRNA) mining platform to screen the C. reinhardtii genome, we identified 322 snoRNA genes grouped into 118 families. The 74 box C/D families can potentially guide methylation at 96 sites of ribosomal RNAs (rRNAs) and snRNAs, and the 44 box H/ACA families can potentially guide pseudouridylation at 62 sites. Remarkably, 242 of the snoRNA genes are arranged into 76 clusters, of which 77% consist of homologous genes produced by small local tandem duplications. At least 70 snoRNA gene clusters are found within introns of protein-coding genes. Although not exhaustive, this analysis reveals that C. reinhardtii has the highest number of intronic snoRNA gene clusters among eukaryotes. The prevalence of intronic snoRNA gene clusters in C. reinhardtii is similar to that of rice but in contrast with the one-snoRNA-per-intron organization of vertebrates and fungi and with that of Arabidopsis thaliana in which only a few intronic snoRNA gene clusters were identified. This analysis of C. reinhardtii snoRNA gene organization shows the functional importance of introns in a single-celled organism and provides evolutionary insight into the origin of intron-encoded RNAs in the plant lineage.

5.
6.
Genome-wide analysis of plant glutaredoxin systems   Total citations: 1, self-citations: 0, citations by others: 1
The recent release of the first tree genome (Populus trichocarpa) has allowed a comparison to be made of the multigenic glutaredoxin (Grx) and glutathione reductase (GR) families of this tree with those of other sequenced organisms and especially of the two other fully sequenced plant species, Arabidopsis thaliana and Oryza sativa. Grxs are small proteins involved in disulphide bridge or protein-glutathione adduct reduction, and they are maintained in a reduced form using glutathione and an NADPH-dependent GR. While the P. trichocarpa and O. sativa genomes are nearly five times larger than that of A. thaliana, they contain approximately 45 000 and 37 500 genes compared with the 25 500 genes of A. thaliana. On the one hand, the GR gene composition varies little between species and the gene structures are relatively conserved. On the other hand, the Grx gene family can be divided into three subgroups and the gene content is larger in P. trichocarpa (36 genes) compared with A. thaliana and O. sativa (31 and 27 genes, respectively). This could be partly explained by the occurrence of more duplication events, and this is especially true for one of the three identified Grx subgroups (subgroup III). The expression of most of these genes was confirmed by analysing expressed sequence tags present in various databases. In addition, the expression of Grx of subgroups I and II was examined by RT-PCR in various poplar organs. A complete classification based essentially on gene structure and sequence identity is proposed.

7.
8.
9.
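[Background] Natamycin is a natural, broad-spectrum, highly effective polyene macrolide antifungal agent, and Streptomyces gilvosporeus is an important natamycin producer. No genome sequence analysis of S. gilvosporeus has been reported so far, which has limited research on the biosynthesis and regulation of natamycin and other secondary metabolites in this species. [Objective] To analyze the genome sequence of the natamycin high-producing strain S. gilvosporeus F607 and to mine its secondary-metabolite gene resources, laying a foundation for in-depth study of the strain's high natamycin yield and of the regulation of natamycin biosynthesis. [Methods] Gene prediction, functional annotation, evolutionary analysis, and synteny analysis were performed on the genome sequence of strain F607 using the relevant software, and secondary-metabolite biosynthetic gene clusters were predicted. The natamycin biosynthetic gene cluster was annotated and compared with the natamycin clusters of other species, and the natamycin biosynthetic pathway in S. gilvosporeus F607 was predicted. [Results] The genome of strain F607 is 8,482,298 bp long with a G+C content of 70.95 mol%. Annotations were retrieved for 5,062, 4,428, and 5,063 genes from the COG, GO, and KEGG databases, respectively. antiSMASH predicted 29 secondary-metabolite biosynthetic gene clusters, among which the natamycin gene cluster shows 81% and 77% similarity to the natamycin gene clusters of S. natalensis and S. chattanoogensis, respectively. Apart from differences in two regulatory genes, sngT and sgnH, and nine orf genes of unknown function, the natamycin biosynthetic genes in the S. gilvosporeus F607 cluster and their order are highly consistent with known natamycin gene clusters. [Conclusion] The whole-genome information of S. gilvosporeus was analyzed and the natamycin biosynthetic pathway in S. gilvosporeus F607 was predicted. These results provide baseline data for understanding, at the genome level, why strain F607 produces natamycin at high yield, and lay a solid foundation for elucidating the mechanism of high natamycin production, for industrial production, and for future drug discovery.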

10.
The nucleotide sequence of the complete genome of a cyanobacterium, Microcystis aeruginosa NIES-843, was determined. The genome of M. aeruginosa is a single, circular chromosome of 5 842 795 base pairs (bp) in length, with an average GC content of 42.3%. The chromosome comprises 6312 putative protein-encoding genes, two sets of rRNA genes, 42 tRNA genes representing 41 tRNA species, and genes for tmRNA, the B subunit of RNase P, SRP RNA, and 6Sa RNA. Forty-five percent of the putative protein-encoding sequences showed sequence similarity to genes of known function, 32% were similar to hypothetical genes, and the remaining 23% had no apparent similarity to reported genes. A total of 688 kb of the genome, equivalent to 11.8% of the entire genome, were composed of both insertion sequences and miniature inverted-repeat transposable elements. This is indicative of a plasticity of the M. aeruginosa genome, through a mechanism that involves homologous recombination mediated by repetitive DNA elements. In addition to known gene clusters related to the synthesis of microcystin and cyanopeptolin, novel gene clusters that may be involved in the synthesis and modification of toxic small polypeptides were identified. Compared with other cyanobacteria, a relatively small number of genes for two component systems and a large number of genes for restriction-modification systems were notable characteristics of the M. aeruginosa genome.
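
The percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that "688 kb" means 688,000 bp; the per-category gene counts are back-calculated from the rounded percentages, so they are approximations rather than reported values, and the variable names are illustrative only.

```python
GENOME_BP = 5_842_795   # chromosome length given in the abstract
REPEAT_BP = 688_000     # "688 kb", taken as 688,000 bp (assumption)
N_GENES = 6312          # putative protein-encoding genes

# Share of the genome made up of insertion sequences and MITEs.
repeat_pct = 100 * REPEAT_BP / GENOME_BP

# Protein-coding gene density per megabase.
genes_per_mb = N_GENES / (GENOME_BP / 1e6)

# Approximate gene counts behind the three similarity categories.
shares = {"known function": 45, "hypothetical": 32, "no similarity": 23}
approx_counts = {name: round(N_GENES * pct / 100) for name, pct in shares.items()}

print(f"repeats: {repeat_pct:.1f}% of the genome")
print(f"gene density: {genes_per_mb:.0f} genes per Mb")
print(approx_counts)
```

The repeat share comes out at about 11.8%, matching the figure in the abstract.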

11.
12.
The genome of Arabidopsis thaliana is exceedingly small, in part because it lacks the large middle repetitive DNA component characteristic of other plants. In this paper we have characterized a member of the low copy DNA component: the gene family for the light-harvesting chlorophyll a/b-protein. This gene family is unusual in that it contains far fewer members than the 7-16 coding sequences for this protein found in other plants. We used cross-hybridization with a Lemna gene encoding a light-harvesting chlorophyll a/b-protein to isolate 3 genes from Arabidopsis, all of which are clustered on an 11-kb genomic clone. Southern blot analysis suggests that there is a fourth related gene in Arabidopsis. Sequence analysis of the three genes demonstrates that within the translated region the nucleic acid sequence homology is 96%, the deduced amino acid sequence of the mature proteins is identical for the three genes, and two of the genes have a high degree of sequence homology in both their 5' and 3' immediate flanking regions. The genes have regulatory sequences typical of eukaryotic genes upstream of the translation start sites. However, not all of these genes are equally expressed in plants grown under normal light-dark conditions.

13.
The Arabidopsis thaliana genome is currently being sequenced, eventually leading towards the unravelling of all potential genes. We wanted to gain more insight into the way this genome might be organized at the ultrastructural level. To this end we identified matrix attachment regions demarcating potential chromatin domains in a 16 kb region around the plastocyanin gene. The region was cloned and sequenced, revealing six genes in addition to the plastocyanin gene. Using a heterologous in vitro nuclear matrix binding assay to search for evolutionarily conserved matrix attachment regions (MARs), we identified three such MARs. These three MARs divide the region into two small chromatin domains of 5 kb, each containing two genes. Comparison of the sequence of the three MARs revealed a degenerate 21 bp sequence that is shared between these MARs and that is not found elsewhere in the region. A similar sequence element is also present in four other MARs of Arabidopsis. Therefore, this sequence may constitute a landmark for the position of MARs in the genome of this plant. In a genomic sequence database of Arabidopsis the 21 bp element is found approximately once every 10 kb. The compactness of the Arabidopsis genome could account for the high incidence of MARs and MRSs we observed.

14.
Gene duplication is considered to be a source of genetic information for the creation of new functions. The Arabidopsis thaliana genome sequence revealed that a majority of plant genes belong to gene families. To identify the genes involved in the genesis of novel organs or functions during evolution, reconstructing the evolutionary history of gene families is of critical importance. A comparison of the intron/exon gene structure may provide clues for understanding the evolutionary mechanisms underlying the genesis of gene families. An extensive study of the A. thaliana genome showed that families of duplicated genes may be organized according to the number and/or density of introns and the diversity in gene structure. In this paper, we propose a genomic classification of several A. thaliana gene families based on introns in an evolutionary perspective. Abbreviations: BGAL, β-galactosidases; PCMP, plant combinatorial and modular protein

15.
Exploring the plant transcriptome through phylogenetic profiling   Total citations: 5, self-citations: 0, citations by others: 5
Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous number of expressed sequence tags (ESTs) exists for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger number of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than in Arabidopsis, whereas the opposite seems true for small gene families.
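
Phylogenetic profiling, as used here, amounts to recording in which species each gene family is detected and then reading off patterns. A minimal sketch follows, with toy data; the species list, the lineage definitions, the family names, and the label "scattered" are all hypothetical and are not taken from the study.

```python
def classify_families(profiles, all_species, lineages):
    """Label each gene family by where it occurs.

    profiles: family name -> set of species in which the family was detected
              (each set is assumed to be non-empty).
    lineages: lineage name -> set of member species.
    """
    labels = {}
    for family, present in profiles.items():
        if present == all_species:
            labels[family] = "core"
        elif len(present) == 1:
            labels[family] = "species-specific"
        else:
            # Smallest named lineage that contains every species with a hit.
            fits = [name for name, members in lineages.items() if present <= members]
            if fits:
                smallest = min(fits, key=lambda name: len(lineages[name]))
                labels[family] = "lineage-specific: " + smallest
            else:
                labels[family] = "scattered"
    return labels

# Toy example (hypothetical data).
species = {"Arabidopsis", "rice", "moss", "green alga"}
lineages = {
    "angiosperms": {"Arabidopsis", "rice"},
    "land plants": {"Arabidopsis", "rice", "moss"},
}
profiles = {
    "FAM1": {"Arabidopsis", "rice", "moss", "green alga"},
    "FAM2": {"rice"},
    "FAM3": {"Arabidopsis", "rice"},
    "FAM4": {"rice", "moss"},
    "FAM5": {"rice", "green alga"},
}
labels = classify_families(profiles, species, lineages)
```

With EST-derived data, absence from a profile may only reflect incomplete sampling, so labels of this kind are best read as provisional.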

16.
The basidiomycete Paxillus involutus forms ectomycorrhizal symbioses with a broad range of forest trees. Reassociation kinetics on P. involutus nuclear DNA indicated a haploid genome size of 23 Mb including 11% of repetitive DNA. A similar genome size (20 Mb) was estimated by genomic reconstruction analysis using three single copy genes. To assess the gene density in the P. involutus genome, a cosmid containing a 33-kb fragment of genomic DNA was sequenced and used to identify putative open reading frames (ORFs). Twelve potential ORFs were predicted; eight displayed significant sequence similarities to known proteins found in other organisms and, notably, several homologues of the Podospora anserina vegetative incompatibility protein (HetE1) were found. By extrapolation, we estimate the total number of genes in the P. involutus haploid genome at approximately 7700.
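
The extrapolation can be reproduced approximately by scaling the ORF density of the sequenced cosmid to the whole genome. The abstract does not say which genome size was used for the estimate of 7700 genes, so the sketch below simply evaluates both reported sizes; the function name is hypothetical.

```python
def extrapolate_gene_count(orfs, fragment_kb, genome_mb):
    """Scale the ORF density of one sequenced fragment to a whole genome."""
    return orfs / fragment_kb * genome_mb * 1000

# 12 predicted ORFs on a 33-kb fragment; genome size estimates of 20 and 23 Mb.
low = extrapolate_gene_count(12, 33, 20)
high = extrapolate_gene_count(12, 33, 23)

# Genome size at which the same density would give exactly 7700 genes.
implied_genome_mb = 7700 * 33 / 12 / 1000

print(round(low), round(high), implied_genome_mb)
```

Under these assumptions the two genome sizes bracket the published estimate, which corresponds to a genome of roughly 21 Mb at this gene density.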

17.

Background

In addition to gene identification and annotation, repetitive sequence analysis has become an integral part of genome sequencing projects. Identification of repeats is important not only because it improves gene prediction, but also because of the role that repetitive sequences play in determining the structure and evolution of genes and genomes. Several methods using different repeat-finding strategies are available for whole-genome repeat sequence analysis. Four independent approaches were used to identify and characterize the repetitive fraction of the Mycosphaerella graminicola (synonym Zymoseptoria tritici) genome. This ascomycete fungus is a wheat pathogen and its finished genome comprises 21 chromosomes, eight of which can be lost with no obvious effects on fitness and are therefore dispensable.

Results

Using a combination of four repeat-finding methods, at least 17% of the M. graminicola genome was estimated to be repetitive. Class I transposable elements, that amplify via an RNA intermediate, account for about 70% of the total repetitive content in the M. graminicola genome. The dispensable chromosomes had a higher percentage of repetitive elements as compared to the core chromosomes. Distribution of repeats across the chromosomes also varied, with at least six chromosomes showing a non-random distribution of repetitive elements. Repeat families showed transition mutations and a CpA → TpA dinucleotide bias, indicating the presence of a repeat-induced point mutation (RIP)-like mechanism in M. graminicola. One gene family and two repeat families specific to subtelomeres also were identified in the M. graminicola genome. A total of 78 putative clusters of nested elements was found in the M. graminicola genome. Several genes with putative roles in pathogenicity were found associated with these nested repeat clusters. This analysis of the transposable element content in the finished M. graminicola genome resulted in a thorough and highly curated database of repetitive sequences.

Conclusions

This comprehensive analysis will serve as a scaffold to address additional biological questions regarding the origin and fate of transposable elements in fungi. Future analyses of the distribution of repetitive sequences in M. graminicola also will be able to provide insights into the association of repeats with genes and their potential role in gene and genome evolution.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1132) contains supplementary material, which is available to authorized users.

18.
A first generation cosmid contig map of the Leishmania major Friedlin genome has been constructed, and genomic sequencing is well underway. Chromosome 1 (Chr1) and Chr3 have been completely sequenced, and Chr4 is virtually complete. Sequencing of several other chromosomes is in progress and the complete genome sequence may be available as soon as 2003. More than 600 completely sequenced new genes have been identified, representing approximately 8% of the total gene complement (approximately 8,600 genes) of Leishmania. Notably, a large proportion (approximately 69%) of the genes remain unclassified, with 40% of these being potentially Leishmania- (or kinetoplastid-) specific. Most interestingly, the genes are organized into large (>100-300 kb) polycistronic clusters of adjacent genes on the same DNA strand. Chr1 contains two such clusters organized in a 'divergent' manner, whereas Chr3 contains two 'convergent' clusters, with a single 'divergent' gene at one telomere, with the two large clusters separated by a tRNA gene. Statistical analyses of Chr1 show that the 'divergent junction' region between the two polycistronic gene clusters may be a candidate for an origin of DNA replication.
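
The cluster organization described here can be recovered from nothing more than the coding strand of each gene in chromosomal order. The sketch below is illustrative: the input encoding ('+' and '-' characters) and the function name are assumptions, and the toy strand strings do not reproduce the real gene counts on Chr1 or Chr3.

```python
from itertools import groupby

def polycistronic_clusters(strands):
    """Split a sequence of gene strands into same-strand runs.

    strands: iterable of '+' or '-' characters, one per gene, in map order.
    Returns (runs, junctions):
      runs      - list of (strand, first_gene_index, last_gene_index)
      junctions - list of (left_gene_index, right_gene_index, kind)
    """
    runs = []
    pos = 0
    for strand, group in groupby(strands):
        size = len(list(group))
        runs.append((strand, pos, pos + size - 1))
        pos += size
    junctions = []
    for (s1, _, end1), (s2, start2, _) in zip(runs, runs[1:]):
        # '-' then '+': transcription runs away from the junction on both sides.
        kind = "divergent" if (s1, s2) == ("-", "+") else "convergent"
        junctions.append((end1, start2, kind))
    return runs, junctions

# Chr1-like toy layout: two clusters meeting at one divergent junction.
chr1_runs, chr1_junctions = polycistronic_clusters("---++++")

# Chr3-like toy layout: one divergent gene, then two convergent clusters.
chr3_runs, chr3_junctions = polycistronic_clusters("-++++----")
```

Non-coding genes such as the tRNA gene separating the two Chr3 clusters are ignored in this simplification.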

19.
Few plant peptides involved in intercellular communication have been experimentally isolated. Sequence analysis of the Arabidopsis thaliana genome has revealed numerous transmembrane receptors predicted to bind proteinaceous ligands, emphasizing the importance of identifying peptides with signaling function. Annotation of the Arabidopsis genome sequence has made it possible to identify peptide-encoding genes. However, such annotational identification is impeded because small genes are poorly predicted by gene-prediction algorithms, thus prompting the alternative approaches described here. We initially performed a systematic analysis of short polypeptides encoded by annotated genes on two Arabidopsis chromosomes using SignalP to identify potentially secreted peptides. Subsequent homology searches with selected, putatively secreted peptides led to the identification of a potential, large Arabidopsis family of 34 genes. The predicted peptides are characterized by a conserved C-terminal sequence motif and additional primary structure conservation in a core region. The majority of these genes had not previously been annotated. A subset of the predicted peptides shows high overall sequence similarity to Rapid Alkalinization Factor (RALF), a peptide isolated from tobacco. We therefore refer to this peptide family as RALFL for RALF-Like. RT-PCR analysis confirmed that several of the Arabidopsis genes are expressed and that their expression patterns vary. The identification of a large gene family in the genome of the model organism Arabidopsis thaliana demonstrates that a combination of systematic analysis and homology searching can contribute to peptide discovery.

20.
We screened plant genome sequences, primarily from rice and Arabidopsis thaliana, for CpG islands, and identified DNA segments rich in CpG dinucleotides within these sequences. These CpG-rich clusters appeared in the analysed sequences as discrete peaks and occurred at the frequencies of one per 4.7 kb in rice and one per 4.0 kb in A. thaliana. In rice and A. thaliana, most of the CpG-rich clusters were associated with genes, which suggests that these clusters are useful landmarks in genome sequences for identifying genes in plants with small genomes. In contrast, in plants with larger genomes, only a few of the clusters were associated with genes. These plant CpG-rich clusters satisfied the criteria used for identifying human CpG islands, which suggests that these CpG clusters may be regarded as plant CpG islands. The position of each island relative to the 5'-end of its associated gene varied considerably. Genes in the analysed sequences were grouped into five classes according to the position of the CpG islands within their associated genes. A large proportion of the genes belonged to one of two classes, in which a CpG island occurred near the 5'-end of the gene or covered the whole gene region. The position of a plant CpG island within its associated gene appeared to be related to the extent of tissue-specific expression of the gene; the CpG islands of most of the widely expressed rice genes occurred near the 5'-end of the genes.
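
A window-based CpG screen of the kind described can be sketched as below. The abstract does not state which thresholds were applied; the defaults here (a 200-bp window, G+C fraction above 0.5, observed/expected CpG ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer criteria for human CpG islands and are an assumption, as are the function names.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is c * g / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def cpg_rich_windows(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Start positions of windows that exceed both thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc > min_gc and ratio > min_obs_exp:
            hits.append(start)
    return hits

# Toy sequence: a CpG-dense block flanked by AT-rich sequence.
toy = "AT" * 100 + "CG" * 100 + "AT" * 100
hits = cpg_rich_windows(toy, step=100)
```

Overlapping windows that pass would normally be merged into a single island before comparing island positions with gene coordinates.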
