Similar literature
20 similar articles found.
1.
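Zhang Yi (张翼) 《生命科学》 2008, 20(2): 202-206
Understanding the functions of non-coding RNAs is a research focus of the post-genomic era. This article mainly introduces the catalytic and regulatory functions of non-coding RNAs in RNA splicing. During RNA processing, the splicing of all three major classes of introns is directed by RNA components. Group I and group II introns catalyse their own excision and the ligation of the exons, whereas the splicing of nuclear mRNA introns is directed by the small nuclear RNAs of the spliceosome. Group I and group II introns occur in bacteria, lower eukaryotic cells and plant organelles; in contrast, the introns in the nuclear protein-coding genes of eukaryotic cells are all nuclear mRNA introns, and their number rises markedly with the complexity of the organism. Through alternative splicing, a single multi-intron pre-mRNA can give rise to many, even tens of thousands of, different mRNAs and proteins, which is critical for proteome complexity and for the spatio-temporal regulation of expression. Alternative splicing is regulated by splicing regulatory proteins that specifically recognize and bind the cis-acting RNA regulatory elements abundant in pre-mRNAs; a systematic understanding of the correspondence between the two is a key to unravelling the regulatory network of genome expression.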

2.
Most of the gene prediction algorithms for prokaryotes are based on Hidden Markov Models or similar machine-learning approaches, which require the optimization of a large number of parameters. This paper presents a novel method for the classification of coding and non-coding regions in prokaryotic genomes, based on a suitably defined compression index of a DNA sequence. The main features of this new method are the non-parametric logic and the construction of a dictionary of words extracted from the sequences. These dictionaries can be very useful to perform further analyses on the genomic sequences themselves. The proposed approach has been applied to some complete prokaryotic genomes, obtaining optimal scores of correctly recognized coding and non-coding regions. Detailed investigation of several false-positive and false-negative cases revealed that this approach can fail in the presence of highly structured coding regions (e.g., genes coding for modular proteins) or quasi-random non-coding regions (e.g., regions hosting non-functional fragments of copies of functional genes; regions hosting promoters or other protein-binding sequences). We perform an overall comparison with other gene-finder software, since at this step we are not interested in building another gene-finder system, but only in exploring the possibility of the suggested approach.
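
As a rough illustration of what a compression-based regularity score looks like, the sketch below uses the general-purpose `zlib` compressor as a stand-in. This is an assumption made for illustration only: the abstract does not define its compression index, and the function name `compression_ratio` and the toy sequences are hypothetical.

```python
import random
import zlib


def compression_ratio(seq: str) -> float:
    """Compressed size divided by raw size; lower means more regular.

    zlib is a stand-in here, not the compression index used in the paper.
    """
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


# Toy inputs (not real genomic data): a strongly periodic string and a
# pseudo-random string of the same length.
periodic = "ATGGCT" * 500
rng = random.Random(0)
scrambled = "".join(rng.choice("ACGT") for _ in range(len(periodic)))

print(compression_ratio(periodic), compression_ratio(scrambled))
```

The periodic string compresses far better than the pseudo-random one, which is the kind of contrast a compression index could exploit when separating structured from quasi-random regions.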

3.
At the genome level, microorganisms are highly adaptable both in terms of allele and gene composition. Such heritable traits emerge in response to different environmental niches and can have a profound influence on microbial community dynamics. As a consequence, any individual genome or population will contain merely a fraction of the total genetic diversity of any operationally defined “species”, whose ecological potential can thus be only fully understood by studying all of their genomes and the genes therein. This concept, known as the pangenome, is valuable for studying microbial ecology and evolution, as it partitions genomes into core (present in all the genomes from a species, and responsible for housekeeping and species-level niche adaptation among others) and accessory regions (present only in some, and responsible for intra-species differentiation). Here we present SuperPang, an algorithm producing pangenome assemblies from a set of input genomes of varying quality, including metagenome-assembled genomes (MAGs). SuperPang runs in linear time and its results are complete, non-redundant, preserve gene ordering and contain both coding and non-coding regions. Our approach provides a modular view of the pangenome, identifying operons and genomic islands, and allowing their prevalence in different populations to be tracked. We illustrate this by analysing intra-species diversity in Polynucleobacter, a bacterial genus ubiquitous in freshwater ecosystems, characterized by their streamlined genomes and their ecological versatility. We show how SuperPang facilitates the simultaneous analysis of allelic and gene content variation under different environmental pressures, allowing us to study the drivers of microbial diversification at unprecedented resolution.

4.
Oligonucleotide usage in archaeal and bacterial genomes can be linked to a number of properties, including codon usage (trinucleotides), DNA base-stacking energy (dinucleotides), and DNA structural conformation (di- to tetranucleotides). We wanted to assess the statistical information potential of different DNA ‘word-sizes’ and explore how oligonucleotide frequencies differ in coding and non-coding regions. In addition, we used oligonucleotide frequencies to investigate DNA composition and how DNA sequence patterns change within and between prokaryotic organisms. One finding was that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Oligonucleotide usage varied more within AT-rich and host-associated genomes than in GC-rich and free-living genomes, and this variation was mainly located in non-coding regions. Bias (selectional pressure) in tetranucleotide usage correlated with GC content, and coding regions were more biased than non-coding regions. Non-coding regions were also found to be approximately 5.5% more AT-rich than coding regions, on average, in the 402 chromosomes examined. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. GC-rich genomes were more similar and biased in terms of tetranucleotide usage in non-coding regions than AT-rich genomes. The differences found between AT-rich and GC-rich genomes may possibly be attributed to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than free-living archaea and bacteria.
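
The oligonucleotide ("word") frequencies and AT content discussed in this abstract can be computed along the following lines. This is a minimal sketch, not the authors' pipeline; the function names `kmer_frequencies` and `at_content` are hypothetical, and overlapping words are counted on a single strand only.

```python
from collections import Counter


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every overlapping k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def at_content(seq: str) -> float:
    """Fraction of A and T bases in seq."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


# k = 2, 3, 4 and 6 give di-, tri-, tetra- and hexanucleotide frequencies.
print(kmer_frequencies("AAAT", 2))
print(at_content("AATG"))
```

Comparing such frequency tables between annotated coding and non-coding segments is one simple way to look at the compositional differences the abstract describes.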

5.
A report on the EMBO/EMBL Symposium on The Non-Coding Genome, held in Heidelberg, Germany, 9-12 October, 2013. We share 98% coding genome similarity with mouse and have about the same number of protein coding genes as worms, yet the differences in complexity are obvious. Where is this complexity encoded? A huge change in our understanding of genome evolution and regulation of gene expression arrived with the development of high-throughput sequencing technologies. It turns out that most of our genome is transcribed, but only a small percentage has coding information embedded. The rest of the genome, the non-coding genome, mistakenly labeled as ‘junk DNA’, is where evolutionary complexity resides. In The Non-Coding Genome meeting, several research studies delved deeper into the importance of the non-coding genome, identifying novel classes of non-coding RNAs (ncRNAs) and novel regulatory functions, and expanding our knowledge about this new world, opening more exciting questions to study and answer.

6.
Prokaryotic genomes are considered to be 'wall-to-wall' genomes, which consist largely of genes for proteins and structural RNAs, with only a small fraction of the genomic DNA allotted to intergenic regions, which are thought to typically contain regulatory signals. The majority of bacterial and archaeal genomes contain 6-14% non-coding DNA. Significant positive correlations were detected between the fraction of non-coding DNA and inter- and intra-operonic distances, suggesting that different classes of non-coding DNA evolve congruently. In contrast, no correlation was found between any of these characteristics of non-coding sequences and the number of genes or genome size. Thus, the non-coding regions and the gene sets in prokaryotes seem to evolve in different regimes. The evolution of non-coding regions appears to be determined primarily by the selective pressure to minimize the amount of non-functional DNA while maintaining essential regulatory signals; as a result, the content of non-coding DNA in different genomes is relatively uniform and intra- and inter-operonic non-coding regions evolve congruently. In contrast, the gene set is optimized for the particular environmental niche of the given microbe, which results in the lack of correlation between the gene number and the characteristics of non-coding regions.
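
The "fraction of non-coding DNA" quoted in this abstract can be estimated from a gene annotation roughly as follows. This is a sketch under stated assumptions: gene coordinates are taken as 0-based half-open `(start, end)` intervals on a single linear sequence, overlapping genes are merged, and the function name `noncoding_fraction` is hypothetical.

```python
def noncoding_fraction(genome_length: int, genes: list[tuple[int, int]]) -> float:
    """Fraction of genome positions not covered by any annotated gene.

    genes holds 0-based half-open (start, end) intervals; overlaps are
    counted once.
    """
    covered = 0
    current_end = 0  # right edge of the region already counted
    for start, end in sorted(genes):
        start = max(start, current_end)  # skip the part already counted
        if end > start:
            covered += end - start
            current_end = end
    return 1 - covered / genome_length


# Toy annotation: three genes on a 1000-bp sequence, two of them overlapping.
print(noncoding_fraction(1000, [(0, 400), (300, 700), (750, 950)]))
```

For the toy annotation, 900 of 1000 positions are covered, so the non-coding fraction is about 0.1, which falls within the 6-14% range reported for most bacterial and archaeal genomes.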

7.
One of the surprising insights gained from research in evolutionary developmental biology (evo-devo) is that increasing diversity in body plans and morphology in organisms across animal phyla is not reflected in similarly dramatic changes at the level of gene composition of their genomes. For instance, simplicity at the tissue level of organization often contrasts with a high degree of genetic complexity. Also intriguing is the observation that the coding regions of several genes of invertebrates show high sequence similarity to those in humans. This lack of change (conservation) indicates that evolutionary novelties may arise more frequently through combinatorial processes, such as changes in gene regulation and the recruitment of novel genes into existing regulatory gene networks (co-option), and less often through adaptive evolutionary processes in the coding portions of a gene. As a consequence, it is of great interest to examine whether the widespread conservation of the genetic machinery implies the same developmental function in a last common ancestor, or whether homologous genes acquired new developmental roles in structures of independent phylogenetic origin. To distinguish between these two possibilities one must refer to current concepts of phylogeny reconstruction and carefully investigate homology relationships. Particularly problematic in terms of homology decisions is the use of gene expression patterns of a given structure. In the future, research on organisms other than the typical model systems will be required since these can provide insights that are not easily obtained from comparisons among only a few distantly related model species.

8.
It is known that while the programs used to find genes in prokaryotic genomes reliably map protein-coding regions, they often fail in the exact determination of gene starts. This problem is further aggravated by sequencing errors, most notably insertions and deletions leading to frame-shifts. Therefore, the exact mapping of gene starts and identification of frame-shifts are important problems of the computer-assisted functional analysis of newly sequenced genomes. Here we review methods of gene recognition and describe a new algorithm for correction of gene starts and identification of frame-shifts in prokaryotic genomes. The algorithm is based on the comparison of nucleotide and protein sequences of homologous genes from related organisms, using the assumption that the rate of evolutionary changes in protein-coding regions is lower than that in non-coding regions. A dynamic programming algorithm is used to align protein sequences obtained by formal translation of genomic nucleotide sequences. The possibility of frame-shifts is taken into account. The algorithm was tested on several groups of related organisms: gamma-proteobacteria, the Bacillus/Clostridium group, and three Pyrococcus genomes. The testing demonstrated that, depending on the genome, 1-10 per cent of genes have incorrect starts or contain frame-shifts. The algorithm is implemented in the program package Orthologator-GeneCorrector.

9.
Xu S, Rao N, Chen X, Zhou B 《Biotechnology Letters》 2011, 33(5): 889-896
The accuracy of prediction methods based on power spectrum analysis depends on the threshold that is used to discriminate between protein coding and non-coding sequences in the genomes of eukaryotes. Because the structure of genes varies among different eukaryotes, it is difficult to determine the best prediction threshold for a eukaryote relying only on prior biological knowledge. To improve the accuracy of prediction methods based on power spectral analysis, we developed a novel method based on a bootstrap algorithm to infer organism-specific optimal thresholds for eukaryotes. As prior information, our method requires the input of only a few annotated protein coding regions from the organism being studied. Our results show that using the calculated optimal thresholds for our test datasets, the average prediction accuracy of our method is 81%, an increase of 19% over that obtained using the same empirical threshold P = 4 for all datasets. The proposed method is simple and convenient and easily applied to infer optimal thresholds that can be used to predict coding regions in the genomes of most organisms.
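
Power-spectrum methods of this kind typically score a sequence by the strength of its period-3 signal. The sketch below assumes one common definition, which the abstract itself does not spell out: the total Fourier power of the four base-indicator sequences at frequency N/3, divided by the mean power over frequencies 1 to N-1. The function name `period3_snr` is hypothetical, and the bootstrap threshold inference described in the abstract is not reproduced here.

```python
import cmath
from collections import Counter


def period3_snr(seq: str) -> float:
    """Power at frequency N/3 divided by mean power over k = 1..N-1.

    Assumed definition for illustration; sequences are trimmed to a
    multiple of 3 so that N/3 is an integer frequency.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3
    seq = seq[:n]
    if n == 0:
        return 0.0
    counts = Counter(seq)
    # Parseval: for each base b, the power summed over k = 1..N-1 equals
    # N * count_b - count_b**2, so the mean needs no explicit transform.
    mean_power = sum(n * counts[b] - counts[b] ** 2 for b in "ACGT") / (n - 1)
    if mean_power == 0:  # homopolymer: no power outside k = 0
        return 0.0
    k = n // 3
    peak = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k * i / n)
            for i, b in enumerate(seq)
            if b == base
        )
        peak += abs(coeff) ** 2
    return peak / mean_power


# A perfectly codon-periodic toy sequence scores far above the empirical
# threshold P = 4 mentioned in the abstract; a period-2 sequence scores ~0.
print(period3_snr("ATG" * 30), period3_snr("AT" * 45))
```

A sequence would be called coding when this ratio exceeds the chosen threshold, which is why the choice of an organism-specific threshold matters for accuracy.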

10.
The identification of conserved sequence tags (CSTs) through comparative genome analysis may reveal important regulatory elements involved in shaping the spatio-temporal expression of genetic information. It is well known that the most significant fraction of CSTs observed in human–mouse comparisons corresponds to protein coding exons, due to their strong evolutionary constraints. As we still do not know the complete gene inventory of the human and mouse genomes, it is of the utmost importance to establish whether detected conserved sequences are genes or not. We propose here a simple algorithm that, based on the observation of the specific evolutionary dynamics of coding sequences, efficiently discriminates between coding and non-coding CSTs. The application of this method may help the validation of predicted genes, the prediction of alternative splicing patterns in known and unknown genes and the definition of a dictionary of non-coding regulatory elements.

11.
Heuristic approach to deriving models for gene finding.   Total citations: 21 (self-citations: 2; citations by others: 19)
Computational methods for accurate gene finding in DNA sequences require models of protein coding and non-coding regions derived either from experimentally validated training sets or from large amounts of anonymous DNA sequence. Here we propose a new, heuristic method producing fairly accurate inhomogeneous Markov models of protein coding regions. The new method needs such a small amount of DNA sequence data that the model can be built 'on the fly' by a web server for any DNA sequence >400 nt. Tests on 10 complete bacterial genomes performed with the GeneMark.hmm program demonstrated the ability of the new models to detect 93.1% of annotated genes on average, while models built by traditional training predict an average of 93.9% of genes. Models built by the heuristic approach could be used to find genes in small fragments of anonymous prokaryotic genomes and in genomes of organelles, viruses, phages and plasmids, as well as in highly inhomogeneous genomes where adjustment of models to local DNA composition is needed. The heuristic method also gives an insight into the mechanism of codon usage pattern evolution.

12.
13.
Gene identification in novel eukaryotic genomes by self-training algorithm   Total citations: 8 (self-citations: 0; citations by others: 8)
Finding new protein-coding genes is one of the most important goals of eukaryotic genome sequencing projects. However, genomic organization of novel eukaryotic genomes is diverse and ab initio gene finding tools tuned up for previously studied species are rarely suitable for efficacious gene hunting in DNA sequences of a new genome. Gene identification methods based on cDNA and expressed sequence tag (EST) mapping to genomic DNA or those using alignments to closely related genomes rely either on existence of abundant cDNA and EST data and/or availability of reference genomes. Conventional statistical ab initio methods require large training sets of validated genes for estimating gene model parameters. In practice, none of these types of data may be available in sufficient amount until rather late stages of the novel genome sequencing. Nevertheless, we have shown that gene finding in eukaryotic genomes could be carried out in parallel with statistical models estimation directly from yet anonymous genomic DNA. The suggested method of parallelization of gene prediction with the model parameters estimation follows the path of the iterative Viterbi training. Rounds of genomic sequence labeling into coding and non-coding regions are followed by the rounds of model parameters estimation. Several dynamically changing restrictions on the possible range of model parameters are added to filter out fluctuations in the initial steps of the algorithm that could redirect the iteration process away from the biologically relevant point in parameter space. Tests on well-studied eukaryotic genomes have shown that the new method performs comparably or better than conventional methods where the supervised model training precedes the gene prediction step. Several novel genomes have been analyzed and biologically interesting findings are discussed. Thus, a self-training algorithm that had been assumed feasible only for prokaryotic genomes has now been developed for ab initio eukaryotic gene identification.

14.
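The family Phytoseiidae belongs to the superfamily Ascoidea. Most phytoseiid mites are important natural enemies of pest mites and insect pests and have considerable value in agricultural production. Their mitochondrial genomes have distinctive features that have attracted wide attention from biologists. This paper reviews the structure, non-coding regions, base composition, gene rearrangements and tRNA characteristics of phytoseiid mitochondrial genomes. The main features are: (1) the largest mitochondrial genome among chelicerates was found in Phytoseiidae; (2) the AT content of phytoseiid mitochondrial genomes varies widely in the non-coding regions but little in the coding regions; (3) all phytoseiid mitochondrial genomes have undergone gene rearrangement to varying degrees; (4) in some of the phytoseiid species sequenced so far, the secondary structures of tRNA genes show truncation and base mismatches, and the mitochondrial genomes of some species carry anticodon mutations. Future work should sequence the mitochondrial genomes of key phytoseiid lineages and analyse in depth why gene rearrangements are so frequent in the family, so as to reflect its true evolutionary history.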

15.
A dual coding event, which is the translation of different isoforms from a single gene, is one of the special patterns among the alternative splicing events. This is an important mechanism for the regulation of protein diversity in human and mouse genomes. Although the regulation of dual coding events has been characterized in a few genes, the individual mechanism remains unclear. Numerous studies have described the exonization of transposable elements, which is the splicing-mediated insertion of transposable element sequence fragments into mature mRNAs. Therefore, in this study, we investigated the number of transposable element (TE)-derived dual coding genes in human, chimpanzee and mouse genomes. TE fusion exons appeared in the dual coding regions of 309 human genes. Functional protein domain alterations by TE-derived dual coding events were observed in 129 human genes. Comparative TE-derived dual coding events were also analyzed in chimpanzee and mouse orthologs. Seventy chimpanzee orthologs had TE-derived dual coding events, but mouse orthologs did not have any TE-derived dual coding events. Taken together, our analyses listed the number of TE-derived dual coding genes which could be investigated by experimental analysis and suggested that TE-derived dual coding events were major sources for the functional diversity of human genes, but not mouse genes.

16.
There is an inherent capacity to increase the degree of aggregation within each of the levels of structural organization of living matter. At the macromolecular level (MML), this is an increase in the gene number in the genomes of evolving organisms; at the cellular level (CL), an increase in cell size; and at the multicellular level (MCL), an increase in the number of cells in the multicellular aggregate. However, the increase in the degree of aggregation causes gene incompatibility in the case of genome evolution and instability in the case of large cells and multicellular aggregates with simple structure. Gene incompatibility may be neutralized by spatio-temporal disconnection of the products of incompatible genes at the cellular and multicellular levels. The larger cells and multicellular aggregates are stabilized by increased structural complexity which is a consequence of the origin of new genes. There is a feedback between the processes of evolution at different levels MML→CL→MCL. The processes of evolutionary development at different levels of structural organization are also relatively independent. The coincidence of these processes gives rise to stable organisms of higher complexity, which are then subjected to natural selection and population processes to establish a new step in progressive biological evolution. In all of the normal organisms of newly evolved species there is a correspondence between the different levels of structural organization, i.e. in their degree of aggregation, their complexity and functional organization. The form of correspondence for multicellular organisms is presented.

17.
In recent years, the increase in the amounts of available genomic data has made it easier to appreciate the extent to which organisms increase their genetic diversity through horizontally transferred genetic material. Such transfers have the potential to give rise to extremely dynamic genomes where a significant proportion of their coding DNA has been contributed by external sources. Because of the impact of these horizontal transfers on the ecological and pathogenic character of the recipient organisms, methods are continuously sought that are able to computationally determine which of the genes of a given genome are products of transfer events. In this paper, we introduce and discuss a novel computational method for identifying horizontal transfers that relies on a gene's nucleotide composition and obviates the need for knowledge of codon boundaries. In addition to being applicable to individual genes, the method can be easily extended to the case of clusters of horizontally transferred genes. With the help of an extensive and carefully designed set of experiments on 123 archaeal and bacterial genomes, we demonstrate that the new method exhibits significant improvement in sensitivity when compared to previously published approaches. In fact, it achieves an average relative improvement across genomes of between 11 and 41% compared to the Codon Adaptation Index method in distinguishing native from foreign genes. Our method's horizontal gene transfer predictions for 123 microbial genomes are available online at http://cbcsrv.watson.ibm.com/HGT/.

18.
Simple Sequence Repeats (SSRs), or microsatellites, constitute a significant portion of genomes; however, their significance in organellar genomes is not completely understood. The availability of organelle genome sequences allows us to understand the organization of SSRs in their genic and intergenic regions. In the present work, SSRs were identified and categorized in 14 mitochondrial and 22 chloroplast genomes of algal species belonging to Chlorophyta. SSRs were more numerous in non-coding regions than in coding regions, and mononucleotide repeats were the most frequent, followed by dinucleotide repeats, in both mitochondrial and chloroplast genomes. The largest numbers of SSRs were found in genes encoding the beta subunit of RNA polymerase in chloroplast genomes and NADH dehydrogenase in mitochondrial genomes. This is the first report of a whole-genome sequence analysis of SSRs in the organellar genomes of green algae.
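
Identifying mono- and dinucleotide SSRs of the kind counted in this abstract can be sketched as a simple scan for tandem repeats. This is an illustrative sketch, not the tool used by the authors; the function name `find_ssrs`, the minimum-repeat values in the example and the toy sequences are assumptions.

```python
def find_ssrs(seq: str, unit_len: int, min_repeats: int) -> list[tuple[int, str, int]]:
    """Return (start, unit, repeat_count) for tandem repeats of a motif.

    Only motifs of exactly unit_len bases are reported; a motif that is
    itself a repeat of a shorter motif (e.g. "AA") is skipped so that a
    mononucleotide run is not counted as a dinucleotide SSR.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        repeats = 1
        while seq.startswith(unit, i + repeats * unit_len):
            repeats += 1
        degenerate = any(
            unit == unit[:d] * (unit_len // d)
            for d in range(1, unit_len)
            if unit_len % d == 0
        )
        if repeats >= min_repeats and not degenerate:
            hits.append((i, unit, repeats))
            i += repeats * unit_len  # continue after the reported repeat
        else:
            i += 1
    return hits


print(find_ssrs("GCAAAAAAAATG", unit_len=1, min_repeats=8))
print(find_ssrs("AAAAAATATAT", unit_len=2, min_repeats=3))
```

Intersecting the reported start positions with gene coordinates would then give the coding versus non-coding SSR counts that the abstract compares.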

19.
The primary structure of thousands of genes is being determined in many laboratories worldwide. While it is relatively easy to analyse the coding region(s) of genes, it is usually hard to understand what is located in non-coding regions. A non-coding region may contain very valuable information about the mode of functioning of a given gene, e.g. promoters, enhancers, silencers etc. The regulatory function of these sequences is determined by their interaction with certain sequence-specific proteins, i.e. the presence of a certain DNA sequence in a non-coding region of a gene may suggest that the gene is regulated by a specific protein factor. This minireview summarizes recent data on most known eukaryotic sequence-specific DNA-binding protein factors, including their origin, DNA consensus, and their role in expression of corresponding genes.

20.