Similar articles
1.
To explore the mechanism of genomic replication of classical swine fever virus (CSFV), and thereby provide a basis for investigating its pathogenicity, information theory is introduced in connection with statistical mechanics, whence small-sample statistics arise naturally as a consequence of the Bayesian approach. Furthermore, a selection rule for identifying the pattern of a recognition site for an RNA-binding protein is proposed by means of the maximum entropy principle. On this basis, the information contents of the 3'-untranslated regions (3'UTRs) of the genomes of 20 CSFV strains and the 5'-untranslated regions (5'UTRs) of the genomes of 58 CSFV strains are analyzed with a computational algorithm in a reduction mode, and the 3'UTR sites of the 20 strains and 5'UTR sites of the 58 strains containing important motifs are extracted from the unaligned RNA sequences of unequal lengths. These sites, whose sequence and structure patterns resemble the putative cis elements related to the regulation of genomic replication, are identified as potential recognition sites in the 3'UTRs and 5'UTRs for the CSFV replicase responsible for genomic replication; to some extent, this identification is supported by experimental evidence. Finally, the information analysis allows a presumption to be made about the mechanism of CSFV RNA replication initiation.

2.
Xiao Ming, Zhan Zhu Zhi, Liu Jueping, Yu Zhang Chu. Molecular Biology, 2002, 36(1): 34-43
To explore the mechanism of genomic replication of classical swine fever virus (CSFV), and thereby provide a basis for investigating its pathogenicity, information theory is introduced in connection with statistical mechanics, whence small-sample statistics arise naturally as a consequence of the Bayesian approach. Furthermore, a selection rule for identifying the pattern of a recognition site for an RNA-binding protein is proposed by means of the maximum entropy principle. On this basis, the information contents of the 3'-untranslated regions (3'UTRs) of the genomes of 20 CSFV strains and the 5'-untranslated regions (5'UTRs) of the genomes of 58 CSFV strains are analyzed with a computational algorithm in a reduction mode, and the 3'UTR sites of the 20 strains and 5'UTR sites of the 58 strains containing important motifs are extracted from the unaligned RNA sequences of unequal lengths. These sites, whose sequence and structure patterns resemble the putative cis elements related to the regulation of genomic replication, are identified as potential recognition sites in the 3'UTRs and 5'UTRs for the CSFV replicase responsible for genomic replication; to some extent, this identification is supported by experimental evidence. Finally, the information analysis allows a presumption to be made about the mechanism of CSFV RNA replication initiation.
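
The notion of the "information content" of a recognition site can be illustrated with a small Python sketch. It is not the authors' algorithm: it assumes the candidate sites are already aligned and of equal length, applies no small-sample correction, and uses made-up example sites. The function name `position_information` is hypothetical.

```python
import math
from collections import Counter


def position_information(sites, alphabet="ACGU"):
    """Return the information content (bits) of each column of aligned sites.

    Assumes every site has the same length and contains only letters
    from `alphabet`. Information is log2(|alphabet|) minus the Shannon
    entropy of the observed base frequencies in that column.
    """
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")

    max_bits = math.log2(len(alphabet))
    info = []
    for col in range(length):
        counts = Counter(s[col] for s in sites)
        total = sum(counts[b] for b in alphabet)
        entropy = 0.0
        for b in alphabet:
            if counts[b]:
                p = counts[b] / total
                entropy -= p * math.log2(p)
        info.append(max_bits - entropy)
    return info


# Hypothetical aligned sites, for illustration only.
sites = ["GGUAUA", "GGUACA", "GGUAGA", "GGUAUA"]
bits = position_information(sites)
print([round(b, 3) for b in bits])
```

Fully conserved columns score 2 bits and variable columns score less, which is the sense in which conserved motifs stand out in such an analysis.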

3.
Prediction of splice junctions in mRNA sequences. Cited by: 8 (self-citations: 6; by others: 2)
K Nakata, M Kanehisa, C DeLisi. Nucleic Acids Research, 1985, 13(14): 5327-5340
A general method based on the statistical technique of discriminant analysis is developed to distinguish boundaries of coding and non-coding regions in nucleic acid sequences. In particular, the method is applied to the prediction of splicing sites in messenger RNA precursors. Information used for discrimination includes consensus sequence patterns around splice junctions, free energy of snRNA and mRNA base pairing, and statistical differences between coding and non-coding regions such as periodic appearance of specific bases in coding regions reflecting the non-random usage of degenerate codons. Given the reading frame of an exon (but not the exon/intron boundaries), the method will predict the following exon, namely, the intron to be excised. When applied to human sequences in the GenBank database, the method correctly identified 80% of true splice junctions.

4.
The mature mRNA always carries nucleotide sequences that faithfully mirror the protein product according to the rules of the genetic code. However, in the chromosome, the nucleotide sequence that represents a certain protein is interrupted by additional sequences. Therefore, most eukaryotic genes are longer than their final mRNA products. The human genome project revealed that only a tiny portion of sequences serves as protein-coding region and almost one quarter of the genome is occupied by non-coding intervening sequences. The elimination of these non-coding regions from the precursor RNA in a process termed splicing must be extremely precise, because even a single nucleotide mistake may cause a fatal error. At present, two types of intervening sequences have been identified in protein-coding genes. One of them, the U2-dependent or major class, is prevalent and represents 99% of known sequences. The other one, the so-called U12-dependent or minor class of introns, is much less frequent in the genome. The basic problem of nuclear splicing concerns (i) the molecular mechanisms which ensure that the coding regions are correctly recognized and spliced together; (ii) the principles and mechanisms that guarantee the high fidelity of the splicing system; (iii) the differences in the excision mechanisms of the two classes of introns. We are going to present models explaining how intervening sequences are accurately removed and the coding regions correctly juxtaposed. The two splicing mechanisms will also be compared.

5.
A fractal method to distinguish coding and non-coding sequences in a complete genome is proposed, based on different statistical behaviors between these two kinds of sequences. We first propose a number sequence representation of DNA sequences. Multifractal analysis is then performed on the measure representation of the obtained number sequence. The three exponents C(-1), C1 and C2 are selected from the result of multifractal analysis. Each DNA may be represented by a point in the three-dimensional space generated by these three-component vectors. It is shown that points corresponding to coding and non-coding sequences in the complete genome of many prokaryotes are roughly distributed in different regions. Fisher's discriminant algorithm can be used to separate these two regions in the spanned space. If the point (C(-1), C1, C2) for a DNA sequence is situated in the region corresponding to coding sequences, the sequence is discriminated as a coding sequence; otherwise, the sequence is classified as a non-coding one. For all 51 prokaryotes we considered, the average discriminant accuracies pc, pnc, qc and qnc reach 72.28%, 84.65%, 72.53% and 84.18%, respectively.

6.
Factors contributing to the outcome of oxidative damage to nucleic acids. Cited by: 9 (self-citations: 0; by others: 9)
Oxidative damage to DNA appears to be a factor in cancer, yet explanations for why highly elevated levels of such lesions do not always result in cancer remain elusive. Much of the genome is non-coding and lesions in these regions might be expected to have little biological effect, an inference supported by observations that there is preferential repair of coding sequences. RNA has an important coding function in protein synthesis, and yet the consequences of RNA oxidation are largely unknown. Some non-coding nucleic acid is functional, e.g. promoters, and damage to these sequences may well have biological consequences. Similarly, oxidative damage to DNA may promote microsatellite instability, inhibit methylation and accelerate telomere shortening. DNA repair appears pivotal to the maintenance of genome integrity, and genetic alterations in repair capacity, due to single nucleotide polymorphisms or mutation, may account for inter-individual differences in cancer susceptibility. This review will survey these aspects of oxidative damage to nucleic acids and their implication for disease.

7.
8.
Informational parameters of nucleic acid and molecular evolution. Cited by: 5 (self-citations: 0; by others: 5)
From the point of view of information theory, a statistical analysis of 2000 nucleic acid sequences (732 coding regions and 1177 non-coding regions) is given. The sequences are grouped into 20 categories. The probability-order-difference (POD) matrix is defined, which is used to analyse the evolutionary distance between any two categories of sequences. The informational parameters D1, D2, X = (1 + D1/D2)^(-1) and F are calculated for each sequence and averaged in each category. The statistical dependence of these parameters on molecular evolution is discussed. It is found that [X] is a good statistical quantity which describes the vocabulary compositions as well as the grammatical constructions of the genetic language. From the statistical analysis it is shown that [X] may play an important role in investigating the evolutionary level of nucleic acid molecules.
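
The abstract does not define D1 and D2, so the following Python sketch rests on an assumption: it uses Gatlin-style definitions, where D1 is the divergence of base composition from equiprobability and D2 is the divergence from independence between adjacent bases. Under that assumption X = (1 + D1/D2)^(-1) equals D2/(D1 + D2). The function name `informational_parameters` is hypothetical, and the parameter F is not covered.

```python
import math
from collections import Counter


def informational_parameters(seq, alphabet="ACGT"):
    """Return (D1, D2, X) in bits for one sequence.

    ASSUMED definitions (not given in the abstract):
      D1 = log2(|alphabet|) - H1          (H1: single-base entropy)
      D2 = H1 - H(next base | current)    (first-order dependence)
      X  = (1 + D1/D2)^(-1) = D2 / (D1 + D2)
    The sequence is assumed to contain only letters from `alphabet`.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    # Entropy of the single-base composition.
    single = Counter(seq)
    h1 = -sum((c / n) * math.log2(c / n) for c in single.values())

    # Conditional entropy of the next base given the current one,
    # estimated from counts of adjacent pairs.
    pairs = Counter(zip(seq, seq[1:]))
    first = Counter(seq[:-1])
    m = n - 1
    h_cond = -sum(
        (c / m) * math.log2(c / first[a]) for (a, _b), c in pairs.items()
    )

    d1 = math.log2(len(alphabet)) - h1
    d2 = h1 - h_cond
    x = d2 / (d1 + d2) if (d1 + d2) else float("nan")
    return d1, d2, x


# A strictly periodic toy sequence: uniform composition, fully
# determined neighbours, so D1 = 0 and X = 1.
d1, d2, x = informational_parameters("ACGT" * 5)
print(d1, d2, x)
```

On real sequences the estimates are affected by finite length, so values for short sequences should be read with caution.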

9.
Dynamic flexibility in the Escherichia coli genome. Cited by: 2 (self-citations: 0; by others: 2)
L Tsai, Z Sun. FEBS Letters, 2001, 507(2): 225-230
Empirical rules based on tetranucleotide parameters were presented to predict the structural parameters twist (Omega), roll (rho), tilt (tau) and slide (D(y)). A statistical mechanical model was used to analyze the flexibility of the Escherichia coli genome. The replication terminus region displayed a low level of flexibility. A strong correlation can be seen between G+C content and flexibility. Average flexibilities in the coding regions were found to be significantly larger than those in non-coding regions. The flexible characteristics in the 5'-neighborhood of the coding regions and in three class sigma promoter sequences in the E. coli genome were also analyzed.
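
The G+C content that the abstract correlates with flexibility is simple to compute along a genome. The Python sketch below covers only that half of the comparison; the tetranucleotide flexibility parameters are not given in the abstract and are not reproduced here. The function names and the toy sequence are illustrative assumptions.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq, window, step=1):
    """Return the G+C fraction for each full window along the sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence, for illustration only.
profile = sliding_gc("GGCCAATT", window=4, step=2)
print(profile)
```

A per-window profile like this could then be compared against a per-window flexibility score, for example with a correlation coefficient.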

10.
11.
Identification of coding regions in DNA sequences remains challenging. Various methods have been proposed, but these are limited by species-dependence and the need for adequate training sets. The elements in DNA coding regions are known to be distributed in a quasi-random way, while those in non-coding regions have typical similar structures. For short sequences, these statistical characteristics cannot be extracted correctly and cannot even be detected. This paper introduces a new way to solve the problem: balanced estimation of diffusion entropy (BEDE).

12.
13.
14.
The complete genome of a lapinized classical swine fever virus (CSFV) vaccine strain was amplified as nine overlapping fragments by RT-PCR, and the nucleotide sequences were determined. Complete genome sequence alignment and phylogenetic analysis indicated 92.6-98.6% identity at the nucleotide level with other reported CSFV strains, and the strain could be grouped into subgroup 1.1 along with other attenuated strains of CSFV. The 5'-UTR showed >97.0% nucleotide similarity with most vaccine CSFV strains from China. Further, its 3'-UTR was similar in length to those of all the CSFV strains from China, with >98.0% nucleotide similarity, although high length heterogeneity of the 3'-UTR has been reported among different CSFV strains. There was a 12-nt (TTTTCTTTTTTT) insertion in the 3'-UTR, similar to other reported attenuated vaccine strains. However, the secondary structure of the 3'-UTR indicated that the Indian CSFV strain requires further passage to obtain a 3'-UTR structure similar to that of most attenuated strains.

15.
We present a model for genome evolution, comprising biologically plausible events such as transpositions inside the genome and insertions of exogenous sequences. This model attempts to formulate a minimal proposition accounting for key statistical properties of genomes, avoiding, as far as possible, unsupportable hypotheses for the remote evolutionary past. The statistical properties that are observed in genomic sequences and are reproduced by the proposed model are: (i) deviations from randomness at different length scales, measured by suitable algorithms, (ii) a special form of size distribution (power law distribution) characterising different levels of genome organisation in the non-coding regions, and (iii) extensive resemblance in the alternation of coding and non-coding regions at several length scales (self-similarity) in long genomic sequences of higher eukaryotes.

16.
SUMMARY: JaDis is a Java application for computing evolutionary distances between nucleic acid sequences and G+C base frequencies. It allows specific comparison of coding sequences, of non-coding sequences or of a non-coding sequence with coding sequences. AVAILABILITY: http://pbil.univ-lyon1.fr/software/jadis.html
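
As an illustration of what an "evolutionary distance" between two nucleic acid sequences can look like, here is a minimal Python sketch of the Jukes-Cantor distance. It is not JaDis code, and the abstract does not say which distance measures JaDis implements; Jukes-Cantor is used here only because it is a standard textbook choice. The sketch assumes the two sequences are already aligned, of equal length and without gaps.

```python
import math


def p_distance(a, b):
    """Return the proportion of differing sites between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)
    return diffs / len(a)


def jukes_cantor(a, b):
    """Return the Jukes-Cantor distance d = -(3/4) * ln(1 - 4p/3).

    The correction is undefined once p reaches 0.75 (saturation),
    in which case infinity is returned.
    """
    p = p_distance(a, b)
    if p >= 0.75:
        return float("inf")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy aligned sequences differing at one of eight sites.
d = jukes_cantor("ACGTACGT", "ACGTACGA")
print(round(d, 6))
```

The corrected distance is slightly larger than the raw proportion of differences because it allows for multiple substitutions at the same site.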

17.
18.
19.
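Zhao Yanan (赵亚男), Li Chaopin (李朝品). Acta Entomologica Sinica (昆虫学报), 2020, 63(3): 354-364
[Aim] To sequence and analyse the complete mitochondrial genome of the dried fruit mite, Carpoglyphus lactis, and to examine its phylogenetic position within Acariformes at the mitochondrial genome level, so as to provide a scientific basis for the classification of Acariformes and for mitochondrial genome research on Carpoglyphidae. [Methods] Adult C. lactis were picked from a laboratory-reared colony, and genomic DNA was extracted by the conventional phenol-chloroform method and with an extraction kit. Partial sequences of the mitochondrial genes cox1, cob, rrnS and nad4-nad5 were then amplified by PCR using universal primers for arthropod or mite mitochondrial genes. Species-specific primers were then designed for Long-PCR amplification and primer-walking sequencing to obtain the complete mitochondrial genome sequence. Bioinformatics software including SeqMan, SEQUIN 9.0 and tRNAscan was used to analyse the gene structure and other features of the mitochondrial genome. Finally, a phylogenetic tree was constructed by the maximum likelihood method based on the protein-coding genes of 17 mite species of Acariformes. [Results] The complete mitochondrial genome of C. lactis is 14 060 bp in length (GenBank accession no. MN073839). It is a typical closed double-stranded DNA molecule consisting of 37 genes: 13 protein-coding genes (PCGs), 22 tRNA genes and 2 rRNA genes. It also contains one large non-coding region (LNR). Phylogenetic analysis showed that C. lactis belongs to the superfamily Acaroidae of the suborder Astigmata and forms a clade with Aleuroglyphus ovatus. Acaroidae and Histiostomatoidae cluster together and form a sister group to Psoroptidia. [Conclusion] This study obtained and analysed the complete mitochondrial genome sequence of C. lactis for the first time. C. lactis is closely related to A. ovatus.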

20.
Sixteen clinical strains of classical swine fever virus (CSFV) isolated in Japan were subjected to analyses of nucleotide sequence variations in the 5' end and NS5B regions of the genome. These isolates were divided into three genovars, CSFV-1, CSFV-2 and CSFV-3, based on palindromic nucleotide substitutions at the three variable loci in the 5' untranslated region (UTR). Phylogenetic trees constructed from nucleotide sequences in the 5'-UTR and NS5B gene indicated that the CSFV strains were divided into three clusters, I, II and III. CSFV strains included in clusters I, II and III were identical to those in the CSFV-1, CSFV-2 and CSFV-3 genovars, respectively.
