Similar articles
Found 20 similar articles (search time: 946 ms)
1.
SUMMARY: Comparative analysis of exon/intron organization of genes and their resulting protein structures is important for understanding evolutionary relationships between species, rules of protein organization and protein functionality. We present the Structural Exon Database (SEDB), a Web application that allows users to retrieve the exon/intron organization of genes and map the location of the exon boundaries and the intron phase onto a multiple structural alignment. SEDB is linked with Friend, an integrated analytical multiple sequence/structure viewer, which allows simultaneous visualization of exon boundaries on structure and sequence alignments. With SEDB researchers can study the correlations of gene structure with the properties of the encoded three-dimensional protein structures across eukaryotic organisms. AVAILABILITY: SEDB is publicly available at http://glinka.bio.neu.edu/SEDB/SEDB.html SUPPLEMENTARY INFORMATION: On the SEDB Web site.

2.
Recent studies indicate that many introns, as well as the complex spliceosomal mechanism to remove them, were present early in eukaryotic evolution. This study examines intron and exon characteristics from annotations of whole genomes to investigate the intron recognition mechanism. Exon definition uses the exon as the unit of recognition, placing length constraints on the exon but not on the intron (allowing it a greater range of lengths). In contrast, intron definition uses the intron itself as the unit of recognition and thus removes constraints on internal exon length forced by the use of an exon definition mechanism. Thus, intron and exon lengths within a genome can reflect the constraints imposed by its splicing. This study first shows that it is possible to recover valid intron and exon information from genome annotation. We then compare internal intron and exon information from a range of eukaryotic genomes and investigate possible evolutionary length constraints on introns and exons and how they can affect the intron recognition mechanism. Results indicate that exon definition-based mechanisms may predominate in vertebrates, although the exact system in fish is expected to show some differences from the better characterized system in mammals. We also raise the possibility that the last common ancestor of plants and animals contained some type of exon definition and that this mechanism was replaced in some genes and lineages by intron definition, possibly as a result of intron loss and/or intron shortening.
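
As an illustration of the kind of measurement this abstract describes, the following is a minimal sketch of how internal exon lengths and intron lengths might be derived from the exon coordinates of one annotated transcript. The function name, the coordinate convention (1-based, inclusive, as in GFF/GTF) and the toy transcript are assumptions made for illustration; they are not taken from the study.

```python
from typing import List, Tuple

# Assumed convention: 1-based, inclusive (start, end) pairs, as in GFF/GTF.
Exon = Tuple[int, int]


def exon_and_intron_lengths(exons: List[Exon]) -> Tuple[List[int], List[int]]:
    """Return (internal_exon_lengths, intron_lengths) for one transcript."""
    ordered = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in ordered]
    # An intron is the gap between two consecutive exons.
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    # Internal exons exclude the first and last exon of the transcript.
    return exon_lengths[1:-1], intron_lengths


# Hypothetical transcript with four exons (coordinates are made up).
toy_transcript = [(100, 199), (300, 449), (1000, 1119), (5000, 5300)]
internal_exons, introns = exon_and_intron_lengths(toy_transcript)
```

Restricting the exon list to internal exons mirrors the abstract's focus on internal exon length, since first and last exons are bounded by only one splice site.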

3.
Xing XB, Li QR, Sun H, Fu X, Zhan F, Huang X, Li J, Chen CL, Shyr Y, Zeng R, Li YX, Xie L. Genomics 2011, 98(5):343-351
Identifying protein-coding genes in eukaryotic genomes remains a challenge in the post-genome era because of complex gene models. We applied a proteogenomics strategy to detect un-annotated protein-coding regions in the mouse genome. High-accuracy tandem mass spectrometry (MS/MS) data from diverse mouse samples were generated in house on an LTQ-Orbitrap mass spectrometer. Two searchable diagnostic proteomic datasets were constructed, one with all possible encoding exon junctions, and the other with all putative encoding exons, for the discovery of novel exon splicing events and novel uninterrupted protein-coding regions. Altogether 29,586 unique peptides were identified. Aligning backwards to the mouse genome, the translation of 4471 annotated genes was validated by the known peptides, and 172 genic events were defined in the mouse genome by the novel peptides. The approach in the current work can provide substantial evidence for eukaryote genome annotation in encoding genes.

4.
Base composition is not uniform across the genome of Drosophila melanogaster. Earlier analyses have suggested that there is variation in composition in D. melanogaster on both a large scale and a much smaller, within-gene, scale. Here we present analyses on 117 genes which have reliable intron/exon boundaries and no known alternative splicing. We detect significant heterogeneity in G+C content among intron segments from the same gene, as well as a significant positive correlation between the intron and the third codon position G+C content within genes. Both of these observations appear to be due, in part, to an overall decline in intron and third codon position G+C content along Drosophila genes with introns. However, there is also evidence of an increase in third codon position G+C content at the start of genes; this is particularly evident in genes without introns. This is consistent with selection acting against preferred codons at the start of genes. Received: 24 February 1997 / Accepted: 10 November 1997
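
The two quantities compared in this abstract, G+C content of intron segments and G+C content at third codon positions, can be sketched as follows. This is only an illustrative calculation; the function names and the example sequence are hypothetical, and the sketch assumes the coding sequence is supplied in frame, starting at the first base of a codon.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def gc3_fraction(cds: str) -> float:
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third_positions = cds[2:usable:3]
    return gc_fraction(third_positions)


# Hypothetical coding sequence: codons ATG GCC AAA TTC.
example_cds = "ATGGCCAAATTC"
example_gc3 = gc3_fraction(example_cds)
```

Applying `gc_fraction` to each intron of a gene and `gc3_fraction` to its coding sequence would give the paired values needed for a within-gene comparison of the kind the abstract reports.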

5.
Plants contain more genes encoding core cell cycle regulators than other organisms but it is unclear whether these represent distinct functions. D-type cyclins (CYCD) play key roles in the G1-to-S-phase transition, and Arabidopsis (Arabidopsis thaliana) contains 10 CYCD genes in seven defined subgroups, six of which are conserved in rice (Oryza sativa). Here, we identify 22 CYCD genes in the poplar (Populus trichocarpa) genome and confirm that these six CYCD subgroups are conserved across higher plants, suggesting subgroup-specific functions. Different subgroups show gene number increases, with CYCD3 having three members in Arabidopsis, six in poplar, and a single representative in rice. All three species contain a single CYCD7 gene. Despite low overall sequence homology, we find remarkable conservation of intron/exon boundaries, because in most CYCD genes of plants and mammals, the first exon ends in the conserved cyclin signature. Only CYCD3 genes contain the complete cyclin box in a single exon, and this structure is conserved across angiosperms, again suggesting an early origin for the subgroup. The single CYCD gene of moss has a gene structure closely related to those of higher plants, sharing an identical exon/intron structure with several higher plant subgroups. However, green algae have CYCD genes structurally unrelated to higher plants. Conservation is also observed in the location of potential cyclin-dependent kinase phosphorylation sites within CYCD proteins. Subgroup structure is supported by conserved regulatory elements, particularly in the eudicot species, including conserved E2F regulatory sites within CYCD3 promoters. Global expression correlation analysis further supports distinct expression patterns for CYCD subgroups.

6.
The quest for evolutionary mechanisms providing separation between the coding (exons) and noncoding (introns) parts of genomic DNA remains an important focus of genetics. This work combines an analysis of the most recent achievements of genomics and fundamental concepts of random processes to provide a novel point of view on genome evolution. Exon sizes in sequenced genomes show a lognormal distribution typical of a random Kolmogoroff fractioning process. This implies that the process of intron incretion may be independent of exon size, and therefore could be dependent on intron-exon boundaries. All genomes examined have two distinctive classes of exons, each with different evolutionary histories. In the framework proposed in this article, these two classes of exons can be derived from a hypothetical ancestral genome by (spontaneous) symmetry breaking. We note that one of these exon classes comprises mostly alternatively spliced exons.

7.
8.
Splicing and the evolution of proteins in mammals (total citations: 3; self-citations: 0; citations by others: 3)
It is often supposed that a protein's rate of evolution and its amino acid content are determined by the function and anatomy of the protein. Here we examine an alternative possibility, namely that the requirement to specify in the unprocessed RNA, in the vicinity of intron–exon boundaries, information necessary for removal of introns (e.g., exonic splice enhancers) affects both amino acid usage and rates of protein evolution. We find that the majority of amino acids show skewed usage near intron–exon boundaries, and that differences in the trends for the 2-fold and 4-fold blocks of both arginine and leucine show this to be owing to effects mediated at the nucleotide level. More specifically, there is a robust relationship between the extent to which an amino acid is preferred/avoided near boundaries and its enrichment/paucity in splice enhancers. As might then be expected, the rate of evolution is lowest near intron–exon boundaries, at least in part owing to splice enhancers, such that domains flanking intron–exon junctions evolve on average at under half the rate of exon centres from the same gene. In contrast, the rate of evolution of intronless retrogenes is highest near the domains where intron–exon junctions previously resided. The proportion of sequence near intron–exon boundaries is one of the stronger predictors of a protein's rate of evolution in mammals yet described. We conclude that after intron insertion selection favours modification of amino acid content near intron–exon junctions, so as to enable efficient intron removal, these changes then being subject to strong purifying selection even if nonoptimal for protein function. Thus there exists a strong force operating on protein evolution in mammals that is not explained directly in terms of the biology of the protein.

9.
The major mutation in the cystic fibrosis (CF) gene is a 3-bp deletion (delta F508) in exon 10. About 50% of the CF chromosomes in Southern Europe carry this mutation, while other previously described mutations account for less than 4%. To identify other common mutations in CF patients from the Mediterranean area, we have sequenced, exon by exon, 16 chromosomes that did not show the delta F508 deletion from a selected panel of eight unrelated CF patients. We describe here one missense and one nonsense mutation, and four sequence polymorphisms. We have also found two previously reported mutations in three chromosomes. Overall, these mutations may account for about 20% of CF alleles in the Italian and Spanish populations. No other mutations were detected in 10 out of 16 CF chromosomes after analyzing about 90% of the coding region of the CF gene, and 39 out of 54 intron/exon boundaries. Therefore, about 26% of CF mutations remain to be identified. In addition we provide the intron/exon boundary sequences for exons 4 to 9. These results together with previously reported linkage data suggest that in the Mediterranean populations further mutations may lie in the promoter region, or in intron sequences not yet analyzed.

10.
11.
12.
Precise identification of correct exon–intron boundaries is a prerequisite for analyzing the location and structure of genes. The existing framework for genomic signals, delineating exons and introns in a genomic segment, seems insufficient, predominantly due to poor sequence consensus as well as limitations of training on available experimental data sets. We present here a novel concept for characterizing exon–intron boundaries in genomic segments on the basis of structural and energetic properties. We analyzed boundary junctions on both sides of all the exons (328,368) of protein coding genes from the human genome (GENCODE database) using 28 structural and three energy parameters. Study of sequence conservation at these sites shows very poor consensus. It is observed that DNA adopts a unique structural and energy state at the boundary junctions. Also, signals are somewhat different for housekeeping and tissue specific genes. Clustering of 31 parameters into four derived vectors gives some additional insights into the physical mechanisms involved in this biological process. Sites of structural and energy signals correlate well to the positions playing important roles in pre-mRNA splicing.

13.
We describe the pattern of molecular evolution at a sarcomeric myosin gene, MYH16, using more than 30,000 bp of exon and intron sequence data from the chimpanzee and human genome sequencing projects to evaluate the timing and consequences of a human lineage-specific frameshift deletion. We estimate the age of the deletion at approximately 5.3 MYA. This estimate is consistent with the time of human and chimpanzee divergence and is significantly older than the first appearance of the genus Homo in the fossil record. We also find conflicting estimates of nonsynonymous fixation rates (d(N)) across different regions of this gene, revealing a complex pattern inconsistent with a simple model of pseudogene evolution for human MYH16.

14.
We previously reported that exon skipping in vivo due to point mutations in the 5' splice site (5'ss) signal of an internal mammalian exon can be prevented by coexpression of U1 small nuclear RNAs, termed shift-U1s, with complementarity to sequence upstream or downstream of the mutated site. We now show by S1 nuclease protection experiments that a typical shift-U1 restores splicing of the upstream intron, but not necessarily of the downstream intron. This indicates that the normal 5'ss sequence acts as an enhancer for splicing of the upstream intron, that it owes this activity to base pairing with U1, and that the enhancer activity is reproduced by base pairing of U1 with other sequences in the area. Shift-U1s are dispensable when the 3'ss sequence of the upstream intron is improved, which suggests that base pairing of U1 with sequences at or near the downstream end of the exon normally functions by compensating for a weakness in the upstream 3'ss. Accordingly, U1 appears to be involved in communication across the exon, but our data indicate at the same time that extensive base pairing between U1 and the 5'ss sequence is not necessary for accurate splicing of the downstream intron. These findings are discussed in relation to the coordinate selection of exon termini proposed by the exon definition model.

15.
Virtually all pre-mRNA introns begin with the sequence /GU and end with AG/ (where / indicates a border between an exon and an intron). We have previously shown that the G residues at the first and last positions of the yeast actin intron interact during the second step of splicing. In this work, we ask if other highly conserved intron nucleotides also take part in this /G-G/ interaction. Of special interest is the penultimate intron nucleotide (AG/), which is important for the second step of splicing and is in proximity to other conserved intron nucleotides. Therefore, we tested interactions of the penultimate intron nucleotide with the second intron nucleotide (/GU) and with the branch site nucleotide. We also tested two models that predict interactions between sets of three conserved intron nucleotides. In addition, we used random mutagenesis and genetic selection to search for interactions between nucleotides in the pre-mRNA. We find no evidence for other interactions between intron nucleotides besides the interaction between the first and last intron nucleotides.

16.
17.
Nonrandomness in the intron and exon phase distributions in a sample of 305 human genes has been found and analyzed. It was shown that exon duplications had a significant effect on the exon phase nonrandomness. All of the nonrandomness is probably due to both the processes of exon duplication and shuffling. A quantitative estimation of exon duplications in the human genome and their influence on the intron and exon phase distributions has been analyzed. According to our estimation, the proportion of duplicated exons in the human genome constitutes at least 6% of the total. Generalizing the particular case of exon duplication to the more common event of exon shuffling, we modeled and analyzed the influence of exon shuffling on intron phase distribution. Received: 28 March 1997 / Accepted: 9 July 1997
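
For readers unfamiliar with the terms, intron phase is conventionally the position of an intron relative to the reading frame (0 between codons, 1 after the first base of a codon, 2 after the second), and an internal exon can be classified by the phases of its two flanking introns. The sketch below shows one way these could be tabulated. It assumes the input lists only the coding bases of each exon, in transcript order, starting at the first base of the start codon; the function names and example lengths are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import List, Tuple


def intron_phases(coding_exon_lengths: List[int]) -> List[int]:
    """Phase of each intron: coding bases upstream of the intron, modulo 3."""
    phases = []
    cumulative = 0
    # There is one intron after every exon except the last.
    for length in coding_exon_lengths[:-1]:
        cumulative += length
        phases.append(cumulative % 3)
    return phases


def internal_exon_classes(phases: List[int]) -> List[Tuple[int, int]]:
    """(upstream intron phase, downstream intron phase) per internal exon."""
    return list(zip(phases, phases[1:]))


# Hypothetical gene with four coding exons (lengths in bases are made up).
example_phases = intron_phases([90, 100, 101, 60])
example_classes = internal_exon_classes(example_phases)
phase_counts = Counter(example_phases)
```

Exons flanked by introns of the same phase, such as (0, 0) or (1, 1), can be duplicated or inserted without shifting the reading frame, which is why phase distributions are informative about exon duplication and shuffling.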

18.
Tyrosine kinase (TK) proteins play a central role in cellular behavior and development of animals. The expansion of this superfamily is regarded as a key event in the evolution of the complex signaling pathways and gene networks of metazoans and is a prominent example of how shuffling of protein modules may generate molecular novelties. Using the intron/exon structure within the TK domain (TK intron code) as a complementary tool for the assignment of orthology and paralogy, we identified and studied the 118 TK proteins of the amphioxus Branchiostoma floridae genome to elucidate TK gene family evolution in metazoans and chordates in particular. Unlike all characterized metazoans to date, amphioxus has members of all known widespread TK families, with not a single loss. Putting amphioxus TKs in an evolutionary context, including new data from the cnidarian Nematostella vectensis, the echinoderm Strongylocentrotus purpuratus, and the ascidian Ciona intestinalis, we suggest new evolutionary histories for different TK families and draw a new global picture of gene loss/gain in the different phyla. Surprisingly, our survey also detected an unprecedented expansion of a group of closely related TK families, including TIE, FGFR, PDGFR, and RET, due most probably to massive gene duplication and exon shuffling. Based on their highly similar intron/exon structure at the TK domain, we suggest that this group of TK families constitute a superfamily of TK proteins, which we termed EXpanding TK, after their seemingly unique propensity to gene duplication and exon shuffling, not only in amphioxus but also across all metazoan groups. Due to this extreme tendency to both retention and expansion of TK genes, amphioxus harbors the richest and most diverse TK repertoire among all metazoans studied so far, retaining most of the gene complement of its ancestors, but having evolved its own repertoire of genetic novelties.

19.
20.
The arthropod Down syndrome cell adhesion molecule (Dscam) gene can generate tens of thousands of protein isoforms via combinatorial splicing of numerous alternative exons encoding immunoglobulin variable domains organized into three clusters referred to as the exon 4, 6, and 9 clusters. Dscam protein diversity is important for nervous system development and immune functions. We have performed extensive phylogenetic analyses of Dscam from 20 arthropods (each containing between 46 and 96 alternative exons) to reconstruct the detailed history of exon duplication and loss events that built this remarkable system over 450 million years of evolution. Whereas the structure of the exon 4 cluster is ancient, the exon 6 and 9 clusters have undergone massive, independent expansions in each insect lineage. An analysis of nearly 2000 duplicated exons enabled detailed reconstruction of the timing, location, and boundaries of these duplication events. These data clearly show that new Dscam exons have arisen continuously throughout arthropod evolution and that this process is still occurring in the exon 6 and 9 clusters. Recently duplicated regions display boundaries corresponding to a single exon and the adjacent intron. The boundaries, homology, location, clustering, and relative frequencies of these duplication events strongly suggest that staggered homologous recombination is the major mechanism by which new Dscam exons evolve. These data provide a remarkably detailed picture of how complex gene structure evolves and reveal the molecular mechanism behind this process.

