首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Localizing triplet periodicity in DNA and cDNA sequences   总被引:1,自引:0,他引:1  

Background  

The protein-coding regions (coding exons) of a DNA sequence exhibit a triplet periodicity (TP) due to fact that coding exons contain a series of three nucleotide codons that encode specific amino acid residues. Such periodicity is usually not observed in introns and intergenic regions. If a DNA sequence is divided into small segments and a Fourier Transform is applied on each segment, a strong peak at frequency 1/3 is typically observed in the Fourier spectrum of coding segments, but not in non-coding regions. This property has been used in identifying the locations of protein-coding genes in unannotated sequence. The method is fast and requires no training. However, the need to compute the Fourier Transform across a segment (window) of arbitrary size affects the accuracy with which one can localize TP boundaries. Here, we report a technique that provides higher-resolution identification of these boundaries, and use the technique to explore the biological correlates of TP regions in the genome of the model organism C. elegans.  相似文献   

2.

Background  

Snake venom toxins evolve more rapidly than other proteins through accelerated changes in the protein coding regions. Previously we have shown that accelerated segment switch in exons to alter targeting (ASSET) might play an important role in its functional evolution of viperid three-finger toxins. In this phenomenon, short sequences in exons are radically changed to unrelated sequences and hence affect the folding and functional properties of the toxins.  相似文献   

3.
Wu J 《BMC genomics》2008,9(Z2):S13

Background

Computational gene prediction tools routinely generate large volumes of predicted coding exons (putative exons). One common limitation of these tools is the relatively low specificity due to the large amount of non-coding regions.

Methods

A statistical approach is developed that largely improves the gene prediction specificity. The key idea is to utilize the evolutionary conservation principle relative to the coding exons. By first exploiting the homology between genomes of two related species, a probability model for the evolutionary conservation pattern of codons across different genomes is developed. A probability model for the dependency between adjacent codons/triplets is added to differentiate coding exons and random sequences. Finally, the log odds ratio is developed to classify putative exons into the group of coding exons and the group of non-coding regions.

Results

The method was tested on pre-aligned human-mouse sequences where the putative exons are predicted by GENSCAN and TWINSCAN. The proposed method is able to improve the exon specificity by 73% and 32% respectively, while the loss of the sensitivity ≤ 1%. The method also keeps 98% of RefSeq gene structures that are correctly predicted by TWINSCAN when removing 26% of predicted genes that are in non-coding regions. The estimated number of true exons in TWINSCAN's predictions is 157,070. The results and the executable codes can be downloaded from http://www.stat.purdue.edu/~jingwu/codon/

Conclusion

The proposed method demonstrates an application of the evolutionary conservation principle to coding exons. It is a complementary method which can be used as an additional criteria to refine many existing gene predictions.
  相似文献   

4.
5.
6.
7.
8.
9.

Background  

The periodic pattern of DNA in exons is a known phenomenon. It was suggested that one of the initial causes of periodicity could be the universal (RNY) n pattern (R = A or G, Y = C or U, N = any base) of ancient RNA. Two major questions were addressed in this paper. Firstly, the cause of DNA periodicity, which was investigated by comparisons between real and simulated coding sequences. Secondly, quantification of DNA periodicity was made using an evolutionary algorithm, which was not previously used for such purposes.  相似文献   

10.

Background  

Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences have turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: i-by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and ii-through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons.  相似文献   

11.
12.
13.

Background  

Insertions and deletions of DNA segments (indels) are together with substitutions the major mutational processes that generate genetic variation. Here we focus on recent DNA insertions and deletions in protein coding regions of the human genome to investigate selective constraints on indels in protein evolution.  相似文献   

14.

Background  

DNA homopolymer tracts, poly(dA).poly(dT) and poly(dG).poly(dC), are the simplest of simple sequence repeats. Homopolymer tracts have been systematically examined in the coding, intron and flanking regions of a limited number of eukaryotes. As the number of DNA sequences publicly available increases, the representation (over and under) of homopolymer tracts of different lengths in these regions of different genomes can be compared.  相似文献   

15.
16.
17.

Background

The neuronal ceroid lipofuscinoses (NCL) are a heterogenous group of inherited progressive neurodegenerative diseases in different mammalian species. Tibetan Terrier and Polish Owczarek Nizinny (PON) dogs show rare late-onset NCL variants with autosomal recessive inheritance, which can not be explained by mutations of known human NCL genes. These dog breeds represent animal models for human late-onset NCL. In mice the chloride channel 3 gene (Clcn3) encoding an intracellular chloride channel was described to cause a phenotype similar to NCL.

Results

Two full-length cDNA splice variants of the canine CLCN3 gene are reported. The current canine whole genome sequence assembly was used for gene structure analyses and revealed 13 coding CLCN3 exons in 52 kb of genomic sequence. Sequence analysis of the coding exons and flanking intron regions of CLCN3 using six NCL-affected Tibetan terrier dogs and an NCL-affected Polish Owczarek Nizinny (PON) dog, as well as eight healthy Tibetan terrier dogs revealed 13 SNPs. No consistent CLCN3 haplotype was associated with NCL.

Conclusion

For the examined animals we excluded the complete coding region and adjacent intronic regions of canine CLCN3 to harbor disease-causing mutations. Therefore it seems to be unlikely that a mutation in this gene is responsible for the late-onset NCL phenotype in these two dog breeds.  相似文献   

18.

Background

The identification of signatures of natural selection has long been used as an approach to understanding the unique features of any given species. Genes within segmental duplications are overlooked in most studies of selection due to the limitations of draft nonhuman genome assemblies and to the methodological reliance on accurate gene trees, which are difficult to obtain for duplicated genes.

Results

In this work, we detected exons with an accumulation of high-quality nucleotide differences between the human assembly and shotgun sequencing reads from single human and macaque individuals. Comparing the observed rates of nucleotide differences between coding exons and their flanking intronic sequences with a likelihood-ratio test, we identified 74 exons with evidence for rapid coding sequence evolution during the evolution of humans and Old World monkeys. Fifty-five percent of rapidly evolving exons were either partially or totally duplicated, which is a significant enrichment of the 6% rate observed across all human coding exons.

Conclusions

Our results provide a more comprehensive view of the action of selection upon segmental duplications, which are the most complex regions of our genomes. In light of these findings, we suggest that segmental duplications could be subjected to rapid evolution more frequently than previously thought.  相似文献   

19.

Background  

Alternative splicing of mutually exclusive exons is an important mechanism for increasing protein diversity in eukaryotes. The insect Mhc (myosin heavy chain) gene produces all different muscle myosins as a result of alternative splicing in contrast to most other organisms of the Metazoa lineage, that have a family of muscle genes with each gene coding for a protein specialized for a functional niche.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号