Found 20 similar articles (search time: 15 ms)
The split structure of most mammalian protein-coding genes allows for the potential to produce multiple different mRNA and protein isoforms from a single gene locus through the process of alternative splicing (AS). We propose a computational approach called UNCOVER based on a pair hidden Markov model to discover conserved coding exonic sequences subject to AS that have so far gone undetected. Applying UNCOVER to orthologous introns of known human and mouse genes predicts skipped exons or retained introns present in both species, while discriminating them from conserved noncoding sequences. The accuracy of the model is evaluated on a curated set of genes with known conserved AS events. The prediction of skipped exons in the approximately 1% of the human genome represented by the ENCODE regions leads to more than 50 new exon candidates. Five novel predicted AS exons were validated by RT-PCR and sequencing analysis of 15 introns with strong UNCOVER predictions and lacking EST evidence. These results imply that a considerable number of conserved exonic sequences and associated isoforms are still completely missing from the current annotation of known genes. UNCOVER also identifies a small number of candidates for conserved intron retention.

A low level of genetic variation has limited the application of molecular markers for characterizing important traits in cultivated tomato. To detect polymorphisms in tomato conserved ortholog sets (COS), expressed sequence tags (ESTs) were searched against tomato and Arabidopsis genomic sequences to define the positions of introns. Introns were amplified from 12 different accessions of tomato by polymerase chain reaction and their nucleotide sequences were determined by sequencing. Results indicated a 71% success rate in amplifying introns from tomato genomic DNA with this approach. A total of 201 introns were sequenced from 86 COS unigenes. The intron positions and numbers were conserved between tomato and Arabidopsis, but average intron length was three times longer in tomato than in Arabidopsis. A total of 307 single nucleotide polymorphisms (SNPs) and 75 indels were detected in introns of 57 COS unigenes among 12 tomato lines. Within cultivated tomato germplasm, 172 SNPs and 47 indels were detected in introns of 33 COS unigenes. In addition, 41 SNPs were identified in the exons of 27 COS unigenes. The frequency of SNPs was 2.4 times higher in introns than in exons in the 22 COS unigenes having both intronic and exonic polymorphisms. These results indicate that intronic regions may contain enough variation to develop adequate marker resources for genome-wide analysis in cultivated tomato.

Orthologous introns have identical positions relative to the coding sequence in orthologous genes of different species. By analyzing the complete genomes of five plants, we generated a database of 40,512 orthologous intron groups of dicotyledonous plants, 28,519 orthologous intron groups of angiosperms, and 15,726 of land plants (moss and angiosperms). Multiple sequence alignments of each orthologous intron group were obtained using the Mafft algorithm. The number of conserved regions in plant introns appeared to be hundreds of times lower than that in mammals or vertebrates. Approximately three quarters of conserved intronic regions among angiosperms and dicots, in particular, correspond to alternatively spliced exonic sequences. We registered only a handful of conserved intronic ncRNAs of flowering plants. However, the most evolutionarily conserved intronic region, which is ubiquitous in all plants examined in this study, including moss, possessed multiple structural features of tRNAs, which led us to classify it as a putative tRNA-like ncRNA. Intronic sequences encoding tRNA-like structures are not unique to plants. Bioinformatics examination of the presence of tRNA inside introns revealed an unusually long-term association of four glycine tRNAs inside the Vac14 gene of fish, amniotes, and mammals.

Mammalian G protein-coupled receptor (GPCR) genes are characterised by a large proportion of intronless genes or a lower density of introns when compared with GPCRs of invertebrates. It is unclear which mechanisms have influenced intron density in this protein family, which is one of the largest in the mammalian genomes. We used a combination of Hidden Markov Models (HMM) and BLAST searches to establish the comprehensive repertoire of Rhodopsin GPCRs from seven species and performed overall alignments and phylogenetic analysis using the maximum parsimony method for over 1400 receptors in 12 subgroups. We identified 14 different Ancestral Receptor Groups (ARGs) that have members in both vertebrate and invertebrate species. We found a remarkable difference in intron density between ancestral and new Rhodopsin GPCRs. The intron density among ARG members was more than 3.5-fold higher than that within non-ARG members and more than 2-fold higher when considering only the 7TM region. This suggests that the new GPCR genes were formed predominantly without introns, while the ancestral receptors likely accumulated introns during their evolution. Many of the intron positions found in mammalian ARG receptor sequences were also present in orthologous invertebrate receptors, suggesting that these intron positions are ancient. This analysis also revealed that one intron position is much more frequent than any other position and is common to a number of phylogenetically different Rhodopsin GPCR groups. This intron position lies within a functionally important, conserved DRY motif, which may form a proto-splice site that could contribute to positional intron insertion. Moreover, we have found that other receptor motifs, similar to DRY, also contain introns between the second and third nucleotide of the arginine codon, which also forms a proto-splice site.
Our analysis presents compelling evidence that there was not a major loss of introns in mammalian GPCRs and that the formation of new GPCRs among mammals explains why these have fewer introns compared to invertebrate GPCRs. We also discuss and speculate about the possible role of different RNA- and DNA-based mechanisms of intron insertion and loss.

G C S Kuhn. Heredity, 2015, 115(1): 1-2
Recent years have seen considerable progress in applying single nucleotide polymorphisms (SNPs) to population genetics studies. However, relatively few have attempted to use them to study the genetic differentiation of wild bird populations and none have examined possible differences between exonic and intronic SNPs in these studies. Here, using 144 SNPs, we examined population genetic differentiation in the saker falcon (Falco cherrug) across Eurasia. The position of each SNP was verified using the recently sequenced saker genome, with 108 SNPs positioned within the introns of 10 fragments and 36 SNPs in the exons of six genes, comprising MHC, MC1R and four others. In contrast to intronic SNPs, both Bayesian clustering and principal component analyses using exonic SNPs consistently revealed two genetic clusters, within which the least admixed individuals were found in Europe/central Asia and Qinghai (China), respectively. Pairwise D analysis for exonic SNPs showed that the two populations were significantly differentiated, and between the two clusters the frequencies of five SNP markers were inferred to be influenced by selection. Central Eurasian populations clustered as intermediate between the two main groups, consistent with their geographic position. However, the westernmost populations of central Europe showed evidence of demographic isolation. Our work highlights the importance of functional exonic SNPs for studying population genetic patterns in a widespread avian species.

In this study, we have investigated the positions of introns in the globin gene of Scapharca inaequivalvis homodimeric hemoglobin. We found the three exon/two intron organization typical of vertebrate globin genes, with the two introns in highly conserved positions, as occurs in the A and B globin genes of the tetrameric hemoglobin from the same organism, confirming the absence of the so-called 'central intron' found in the globin genes of plants and of some invertebrates. We identified two homodimeric globin genes (3207 and 2723 bp) that differ only with respect to the size of the first intron. Sequence analysis of the two first introns (1668 and 1364 bp) has revealed that they are highly homologous, except for a 569- and 296-bp insertion in each intron I. Interestingly, the two first introns contain regions with an unusually high identity (∼80%) with regions of the first intron of the congeneric clam Anadara trapezia and the related clam Barbatia reveana globin genes, suggesting that these noncoding regions may have played a regulatory role that has subsequently been lost during the course of evolution.

Ferritin, a protein widespread in nature, concentrates iron ∼10¹¹–10¹²-fold above the solubility within a spherical shell of 24 subunits; it derives in plants and animals from a common ancestor (based on sequence) but displays a cytoplasmic location in animals compared to the plastid in contemporary plants. Ferritin gene regulation in plants and animals is altered by development, hormones, and excess iron; iron signals target DNA in plants but mRNA in animals. Evolution has thus conserved the two end points of ferritin gene expression, the physiological signals and the protein structure, while allowing some divergence of the genetic mechanisms. Comparison of ferritin gene organization in plants and animals, made possible by the cloning of a dicot (soybean) ferritin gene presented here and the recent cloning of two monocot (maize) ferritin genes, shows evolutionary divergence in ferritin gene organization between plants and animals but conservation among plants or among animals; divergence in the genetic mechanism for iron regulation is reflected by the absence in all three plant genes of the IRE, a highly conserved, noncoding sequence in vertebrate animal ferritin mRNA. First, in plant ferritin genes, the number of introns (n = 7) is higher than in animals (n = 3). Second, no intron positions are conserved when ferritin genes of plants and animals are compared, although all ferritin gene introns are in the coding region; within kingdoms, the intron positions in ferritin genes are conserved. Finally, secondary protein structure has no apparent relationship to intron/exon boundaries in plant ferritin genes, whereas in animal ferritin genes the correspondence is high. The structural differences in introns/exons among phylogenetically related ferritin coding sequences and the high conservation of the gene structure within plant or animal kingdoms suggest that kingdom-specific functional constraints may exist to maintain a particular intron/exon pattern within ferritin genes.
In the case of plants, where ferritin gene intron placement is unrelated to triplet codons or protein structure, and where ferritin is targeted to the plastid, the selection pressure on gene organization may relate to RNA function and plastid/nuclear signaling. Received: 25 July 1995 / Accepted: 3 October 1995

The history of MADS box genes is well known in angiosperms. While duplication events and gene losses occur frequently, gene structure and intron positions are highly conserved. We investigated all six introns in a duplicated MADS box gene (deficiens, def) in selected Impatiens taxa, thereby assessing intron features. For the first time, our study provides a comparison of molecular changes in all introns of a gene from a phylogenetic perspective. Interestingly, a uniform pattern of molecular evolution in the introns of each copy was not observed; instead, intron length increases, decreases, and size retention can be found in each copy. A tendency to accumulate long autapomorphic indels is also present; thus, a longer intron length does not reflect a higher number of parsimony-informative characters. Substitution rates vary between introns of each gene copy. While four of the six introns of def1 exhibit a change in their substitution rate, five of the six def2 introns maintain their rates throughout the genus, albeit at different levels. In MADS box genes, several regulatory sequences are found residing in introns. Thus, the presence of putative regulatory motifs was investigated. Most of them are not conserved in position and are usually present in only one of the gene copies. In addition, the potential of introns in both def copies for phylogenetic reconstruction is briefly discussed.

Nuclear protein-coding genes of euglenids (Discoba, Euglenozoa, Euglenida) contain conventional (spliceosomal) and nonconventional introns. The latter have been found only in euglenozoans. A unique feature of nonconventional introns is the ability to form a stable and slightly conserved RNA secondary structure bringing together intron ends and placing adjacent exons in proximity. To date, little is known about the mechanism of their excision (e.g. whether it involves the spliceosome or not). The tubA gene of Euglena gracilis harbors three conventional and three nonconventional introns. While the conventional introns are excised as lariats, nonconventional introns are present in the cell solely as circular RNAs with full-length ends. Based on this discovery as well as on previous observations indicating that nonconventional introns are observed frequently at unique positions of genes, we suggest that this new type of intronic circRNA might play a role in intron mobility.

As part of the exploratory sequencing program Génolevures, visual scrutiny and bioinformatic tools were used to detect spliceosomal introns in seven hemiascomycetous yeast species. A total of 153 putative novel introns were identified. Introns are rare in yeast nuclear genes (<5% have an intron), mainly located at the 5′ end of ORFs, and not highly conserved in sequence. They all share a clear non-random vocabulary: conserved splice sites and conserved nucleotide contexts around splice sites. Homologues of metazoan snRNAs and putative homologues of SR splicing factors were identified, confirming that the spliceosomal machinery is highly conserved in eukaryotes. Several intron features were tested as possible markers for phylogenetic analysis. We found that intron sizes vary widely within each genome, and according to the phylogenetic position of the yeast species. The evolutionary origin of spliceosomal introns was examined by analysing the degree of conservation of intron positions in homologous yeast genes. Most introns appeared to exist in the last common ancestor of present-day yeast species, and then to have been differentially lost during speciation. However, in some cases, it is difficult to exclude a possible sliding event affecting a pre-existing intron or a gain of a novel intron. Taken together, our results indicate that the origin of spliceosomal introns is complex within a given genome, and that present-day introns may have resulted from a dynamic flux between intron conservation, intron loss and intron gain during the evolution of hemiascomycetous yeasts.

We have collected over half a million splice sites from five species (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana) and classified them into four subtypes: U2-type GT-AG and GC-AG and U12-type GT-AG and AT-AC. We have also found new examples of rare splice-site categories, such as U12-type introns without canonical borders, and U2-dependent AT-AC introns. The splice-site sequences and several tools to explore them are available on a public website (SpliceRack). For the U12-type introns, we find several features conserved across species, as well as a clustering of these introns on genes. Using the information content of the splice-site motifs, and the phylogenetic distance between them, we identify: (i) a higher degree of conservation in the exonic portion of the U2-type splice sites in more complex organisms; (ii) conservation of exonic nucleotides for U12-type splice sites; (iii) divergent evolution of C. elegans 3' splice sites (3'ss) and (iv) distinct evolutionary histories of 5' and 3'ss. Our study proves that the identification of broad patterns in naturally occurring splice sites, through the analysis of genomic datasets, provides mechanistic and evolutionary insights into pre-mRNA splicing.
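
As a rough illustration of the border categories named in this abstract, the sketch below labels an intron by its terminal dinucleotides only. The function name `border_class` and the input convention (the full intron sequence, 5' to 3') are assumptions made for this example and are not taken from SpliceRack. Terminal dinucleotides alone cannot separate U2-type from U12-type introns, which depends on the wider 5' splice-site and branch-point motifs, so the sketch reports only the border pair.

```python
CANONICAL_BORDERS = {"GT-AG", "GC-AG", "AT-AC"}


def border_class(intron: str) -> str:
    """Return the terminal-dinucleotide class of an intron sequence.

    Hypothetical helper: accepts DNA or RNA letters in any case and
    returns "GT-AG", "GC-AG", "AT-AC", or "other".
    """
    seq = intron.upper().replace("U", "T")  # treat RNA input as DNA
    if len(seq) < 4:
        raise ValueError("intron must contain at least four nucleotides")
    pair = f"{seq[:2]}-{seq[-2:]}"  # first two and last two bases
    return pair if pair in CANONICAL_BORDERS else "other"


if __name__ == "__main__":
    for example in ("GTAAGTttttcag", "guaagucag", "ATATCCTTAC", "CTAAAC"):
        print(example, border_class(example))
```

Assigning the U2 or U12 subtype would require scoring the extended splice-site motifs, which this sketch deliberately leaves out.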

We analyzed precursor messenger RNAs (pre-mRNAs) of 12 eukaryotic species. In each species, three groups of highly expressed genes, ribosomal proteins, heat shock proteins, and amino-acyl tRNA synthetases, were compared with a control group (randomly selected genes). The purine-pyrimidine (R-Y) composition of pre-mRNAs of the three targeted gene groups proved to differ significantly from the control. The exons of the three groups tested have higher purine contents and R-tract abundance and lower abundance of Y-tracts compared to the control (R-tract: a tract of n sequential purines with n ≥ 5; Y-tract: a tract of n sequential pyrimidines with n ≥ 5). In species widely employing “intron definition” in the splicing process, the Y content of introns of the three targeted groups appeared to be higher compared to the control group. Furthermore, in all examined species, the introns of the targeted genes have a lower abundance of R-tracts compared to the control. We hypothesized that the R-Y composition of the targeted gene groups contributes to high rate and efficiency of both splicing and translation, in addition to the mRNA coding role. This is presumably achieved by (1) reducing the possibility of the formation of secondary structures in the mRNA, (2) using the R-tracts and R-biased sequences as exonic splicing enhancers, (3) lowering the number of targets for pyrimidine tract binding protein in the exons, and (4) reducing the number of target sequences for binding of serine/arginine-rich (SR) proteins in the introns, thereby allowing SR proteins to bind to proper (exonic) targets. (Reviewing Editor: Dr. Axel Meyer)
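
The tract definition in this abstract (a run of at least five consecutive purines or pyrimidines) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `count_tracts`, the returned keys, and the choice to count each maximal run once are made up for this example and are not the authors' published procedure.

```python
import re

PURINES = "AG"        # R
PYRIMIDINES = "CTU"   # Y (T for DNA, U for RNA)


def count_tracts(seq: str, min_len: int = 5) -> dict:
    """Count maximal R-tracts and Y-tracts of at least min_len bases.

    Hypothetical helper: also reports the purine fraction of the sequence.
    """
    seq = seq.upper()
    # A maximal run of purines (or pyrimidines) of length >= min_len is one tract.
    r_tracts = re.findall(rf"[{PURINES}]{{{min_len},}}", seq)
    y_tracts = re.findall(rf"[{PYRIMIDINES}]{{{min_len},}}", seq)
    purines = sum(seq.count(base) for base in PURINES)
    return {
        "r_tracts": len(r_tracts),
        "y_tracts": len(y_tracts),
        "purine_fraction": purines / len(seq) if seq else 0.0,
    }


if __name__ == "__main__":
    print(count_tracts("AAGGATCTCTCTAG"))
```

Comparing these counts between a gene group of interest and a randomly selected control set, normalised by sequence length, would mirror the kind of comparison the abstract describes.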

The sequence of the apocytochrome b (cob) gene of Neurospora crassa has been determined. The structural gene is interrupted by two intervening sequences of approximately 1260 bp each. The polypeptide encoded by the exons shows extensive homology with the cob proteins of Aspergillus nidulans and Saccharomyces cerevisiae (79% and 60%, respectively). The two introns are, however, located at sites different from those of introns in the cob genes of A. nidulans and S. cerevisiae (which contain highly homologous introns at the same site within the gene). The introns share several short regions of sequence homology (10-12 bp long) with each other and with other fungal mitochondrial introns. Moreover, the second intron contains a 50 nucleotide long sequence that is highly homologous with sequences within every ribosomal intron of fungal mitochondria sequenced to date. The conserved sequences may allow the formation of a core secondary structure, which is nearly identical in many mitochondrial introns. The conserved secondary structure may be required for intron splicing. The second intron contains an open reading frame, continuous with the preceding exon, of approximately 290 codons. Two stretches of 10 amino acid residues, conserved in many introns, are present in the open reading frame.

The overlapping ND4L and ND5 genes of Neurospora crassa mitochondria are interrupted by one and two intervening sequences, respectively, of about 1,490, 1,408 and 1,135 bp in length. All three intervening sequences are class I introns and as such have the potential to fold into the conserved secondary structure that has been proposed for the majority of fungal mitochondrial introns. They contain long open reading frames (ORFs; from 306 to 425 codons long) that are continuous and in frame with the upstream exon sequences. These ORFs contain the conserved decapeptide-encoding sequences that are characteristic of the ORFs present in most class I introns. Extensive homology exists among the ORFs encoded by the ND4L intron, ND5 intron 1, and the second intron of the N. crassa oli2 gene. Also, internal repeats of about 130 amino acid residues are present twice in each of these three ORFs, suggesting that a duplication event may have occurred in the formation of these ORFs. The ND4L intron shares extensive homology (at the levels of both primary and proposed secondary structures) with the self-splicing intervening sequence present in the Tetrahymena nuclear rRNA gene. This homology includes but is not limited to the core secondary structure, as peripheral structural elements are also conserved in the two introns.
