Similar literature
 20 similar documents found
1.
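Construction of an RNA secondary structure prediction system   Cited by: 9 (self-citations: 0, citations by others: 9)
The RNA secondary structure prediction system Rnafold was built by combining several prediction algorithms (base-pair maximization, Zuker's free-energy minimization, optimal stacking of helical regions, random stacking of helical regions, and the all-possible-combinations method) with a technique for drawing RNA secondary structures based on first-level helical regions. In addition, 20 randomly selected tRNA sequences were used to compare the first four prediction algorithms in terms of free energy and cloverleaf structure, and a t-test was used to analyze the statistical differences in free energy. In terms of cloverleaf structure, random stacking performed best, followed by optimal helical-region stacking and the Zuker algorithm, with base-pair maximization performing worst. Finally, the differences between the two free-energy minimization methods were analyzed.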
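The base-pair maximization approach named in the abstract above can be illustrated with a Nussinov-style dynamic program. This is only a minimal sketch, not the Rnafold implementation: the function name `max_base_pairs`, the minimum hairpin loop size of 3, and the decision to allow G-U wobble pairs are all assumptions made for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    Nussinov-style dynamic programming. Assumptions (not from the source):
    G-U wobble pairs are allowed, and at least `min_loop` unpaired bases
    must separate the two partners of a pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

For example, `max_base_pairs("GGGAAAUCCC")` counts the three G-C pairs of a simple hairpin. Maximizing pair count ignores stacking energies, which may be one reason such a method can recover cloverleaf structures less reliably than energy-based methods.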

2.
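Research progress in electrical impedance tomography   Cited by: 1 (self-citations: 0, citations by others: 1)
Electrical impedance tomography (EIT) is an important medical imaging method that obtains images of the electrical properties inside an object from distributed electrical measurements on its surface, and it has good application prospects. This paper gives a comprehensive account of research progress in EIT hardware systems and imaging algorithms. It first introduces the signal sources and drive patterns of the hardware and briefly analyzes the EIT systems currently in use. It then presents imaging algorithms, covering current EIT reconstruction algorithms for both two-dimensional and three-dimensional imaging. Finally, EIT is discussed and summarized.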

3.
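Clustering of protein networks is an important means of identifying functional modules; it helps in understanding the organization of biological systems and is also important for predicting protein function. Because effective analysis software for protein network clustering algorithms is currently lacking, this paper designs and implements ClusterE, a new platform for analyzing protein network clustering algorithms. The platform implements clustering evaluation methods including recall, precision, sensitivity, specificity, and functional enrichment analysis, and integrates clustering algorithms such as FAG-EC, Dpclus, Monet, IPC-MCE, and IPCA. It can visualize the results of protein network clustering analyses and can also visually compare and analyze multiple clustering algorithms under different evaluation metrics. The platform is highly extensible: both clustering algorithms and evaluation methods are integrated into the system as plug-ins.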

4.
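This paper describes methods for the three-dimensional reconstruction and display of confocal images obtained with a laser scanning confocal microscope. Using the ACAS Ultina312 laser scanning confocal microscope system as an example, it analyzes the characteristics of 3D reconstruction and image display methods for confocal image data, including the SPF algorithm, the projection algorithm, and the depth-shading algorithm.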

5.
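In comparative experiments involving multiple chips, many sources of variation cause obvious systematic bias between chips. Normalization of chip expression profile data is therefore a key preprocessing step. Many normalization algorithms have been proposed, such as scaling-constant normalization, nonlinear normalization, and quantile normalization. This paper proposes a new normalization algorithm that performs nonlinear M-A normalization on a selected set of probes with minimal rank difference and uses an iterative strategy to reduce the sensitivity of baseline-chip methods to the choice of baseline chip. The algorithm is compared with several known methods on a standard test set.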
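As background for the M-A normalization mentioned in the abstract above, the sketch below computes the conventional M (log-ratio) and A (mean log-intensity) values of one chip against a baseline chip. It assumes the usual log2 definitions and strictly positive intensities; the function name `ma_values` is hypothetical, and the paper's probe selection, nonlinear fit, and iteration are not reproduced here.

```python
import math

def ma_values(sample, reference):
    """Return (M, A) lists for paired probe intensities.

    M = log2(sample) - log2(reference)      (log ratio)
    A = (log2(sample) + log2(reference))/2  (mean log intensity)
    Assumes all intensities are strictly positive.
    """
    m_vals, a_vals = [], []
    for s, r in zip(sample, reference):
        log_s, log_r = math.log2(s), math.log2(r)
        m_vals.append(log_s - log_r)
        a_vals.append(0.5 * (log_s + log_r))
    return m_vals, a_vals
```

A nonlinear normalization would then typically fit a smooth curve of M against A on the chosen probe subset and subtract the fitted value from each M; that step is omitted because the abstract does not say which fitting method was used.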

6.
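Autofocus is an important step in automated screening of nematodes. In an optical microscope system, images of the same field of view are acquired at different focal planes and scored with a sharpness evaluation function; the position giving the maximum value is taken as the best focus position. In this study, 16 commonly used autofocus algorithms and several recently proposed ones were evaluated to find the algorithm best suited to images of nematode lipid droplets, in order to build an automated screening system for nematode lipid droplets. The algorithms were assessed in terms of focusing accuracy, computation time, noise robustness, and focus-curve characteristics. The results show that most algorithms perform well on nematode lipid droplet images and that the absolute Tenengrad algorithm in particular gives the best focusing accuracy; this algorithm is therefore the preferred choice for the automated screening system.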
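The "score every focal plane, keep the maximum" procedure described above might be sketched as follows. The sharpness function here is the classic Tenengrad measure (sum of squared Sobel gradient magnitudes); the "absolute Tenengrad" variant favored in the study is not reproduced, since the abstract does not define it. Images are assumed to be grayscale and given as lists of rows, and both function names are hypothetical.

```python
def tenengrad(image):
    """Classic Tenengrad sharpness: sum of Gx^2 + Gy^2 over interior pixels.

    `image` is a grayscale image as a list of equal-length rows of numbers.
    Gx and Gy are the horizontal and vertical 3x3 Sobel responses.
    """
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (image[r - 1][c + 1] + 2 * image[r][c + 1] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r][c - 1] - image[r + 1][c - 1])
            gy = (image[r + 1][c - 1] + 2 * image[r + 1][c] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r - 1][c] - image[r - 1][c + 1])
            total += gx * gx + gy * gy
    return total

def best_focus_index(stack):
    """Index of the image in a focal stack with the highest sharpness score."""
    scores = [tenengrad(img) for img in stack]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch a sharp step edge scores higher than a blurred version of the same edge, so `best_focus_index` would pick the sharper plane. Practical variants often add a gradient threshold to suppress noise.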

7.
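Objective: To read the MIT-BIH ECG database format, providing a signal source for research on ECG monitoring systems and laying the groundwork for simulating ECG algorithms. Methods: On the VC++6.0 platform, ECG data were read using an object-oriented programming language and output through D/A conversion to an ECG monitoring system to verify ECG algorithms by simulation. Results: ECG signal output was synchronized with playback, and the signal source could be controlled effectively. Any segment of an ECG waveform could be extracted, saved, and played back. Conclusion: The system can serve as a software platform for managing ECG signals and can supply them to ECG monitoring systems for simulation experiments.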

8.
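Heterogeneity in microbial community structure plays a very important role in host health and disease. Temporal and spatial heterogeneity of community structure is studied mainly with unsupervised and supervised learning algorithms. Because microbial community data resemble text data in their characteristics, this paper uses the unsupervised LDA probabilistic topic model to study the temporal heterogeneity of community structure and compares it with hierarchical clustering and K-Means clustering. The LDA model with collapsed Gibbs sampling, a Monte Carlo algorithm, was used to analyze temporal-heterogeneity OTU datasets from two sources: the vaginal microbiota of the northern pig-tailed macaque (Macaca leonina) (MVB) and the microbiota in minimal hepatic encephalopathy (MHE). The LDA model divided the OTU datasets of the 27 MVB samples and the 77 MHE samples into 6 topics and 4 topics, respectively. This differs from the numbers of clusters produced by hierarchical clustering and K-Means clustering (5, 3 and 4, 3, respectively). In addition, experiments show that, when combined with the physiological data (pH) of the MVB samples and the α diversity of the MHE samples, the classification similarity of the pH and α values agrees better with the sample classification produced by the LDA model. LDA therefore classifies OTU datasets more accurately in terms of sample aggregation. More importantly, the LDA model can also identify the representative OTUs in each topic. Compared with hierarchical clustering and K-Means clustering, the LDA model not only quantifies the heterogeneity of community structure more effectively but also identifies the OTUs that drive that heterogeneity.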

9.
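Reconstruction of gene regulatory networks is fundamental to functional genomics; it helps in understanding regulatory mechanisms between genes and in exploring complex living systems and their nature. Traditional Bayesian methods have high computational complexity and can build only small-scale gene regulatory networks, whereas information-theoretic methods produce many false-positive edges and cannot infer the causal direction between genes. To address these problems, this paper proposes OCMIPN, an algorithm for rapidly constructing gene regulatory networks based on ordered conditional mutual information and a limited number of parent nodes. OCMIPN first uses ordered conditional mutual information to build a gene regulatory correlation network. Then, based on prior knowledge of gene regulatory network topology, it limits the number of parent nodes of each gene node and infers the network structure with a Bayesian method, which effectively reduces the time complexity of the algorithm. Simulation results on synthetic networks and real biomolecular networks show that OCMIPN builds highly accurate gene regulatory networks with low time complexity and outperforms popular existing algorithms such as LASSO, ARACNE, Scan BMA, and LBN.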

10.
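Detection of genomic structural variation is one of the important directions in bioinformatics. This paper explains in detail four detection approaches based on high-throughput sequencing: the paired-end mapping method, the mapping distribution method, the split-read method, and the sequence assembly method. It also describes detection algorithms that combine these four methods in pairs, analyzes the performance and applicable conditions of each detection method, and argues that hybrid methods will be the direction of future development.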

11.
A system for the computer analysis of nucleic acid and protein sequences ("Helix") is described. The DNA sequence format is EMBL-compatible, and sequences can easily be annotated with the help of convenient menus. "Helix" also offers the following capabilities: effective alignment of gel reading data and assembly of the final sequence; simple construction of recombinant molecules "in calcular"; calculation of nucleotide and dinucleotide distributions along the sequence; searching for coding frames; calculation of the percentages of codons and amino acids in coding frames; searching for direct and inverted repeats; sequence alignment; protein secondary structure prediction; restriction mapping; and DNA-to-protein translation. "Helix" also contains programs for RNA structure prediction, searching for homologies throughout the EMBL bank, choosing optimal sequences for probes, and searching for promoters. All the programs are written in FORTRAN-77 and automatically translated into FORTRAN-4. "Helix" requires only 64 kbytes.

12.
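A preliminary study of coding sequences without 3-base periodicity   Cited by: 4 (self-citations: 0, citations by others: 4)
Fourier spectrum analysis of 120 relatively short coding sequences (<1 200 bp) shows that 3-base periodicity is not always present in short coding sequences. Statistical analysis suggests that whether a coding sequence shows 3-base periodicity is related to its base composition and distribution, to the choice and order of amino acids in the encoded protein, and to synonymous codon usage. In general, A+U content exceeds G+C content in non-period-3 sequences, whereas the opposite holds for period-3 sequences; bases are distributed more evenly across the three codon positions in non-period-3 sequences than in period-3 sequences; and codon and amino acid usage biases are weaker in non-period-3 sequences than in period-3 sequences. These phenomena should be taken into full account when using Fourier analysis to predict genes and exons in DNA sequences.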
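One common way to test for 3-base periodicity, and a plausible reading of the Fourier analysis described above, is to sum the spectral power of the four base-indicator sequences at frequency 1/3. The sketch below assumes that formulation; the function name `period3_power` and the normalization by the squared sequence length are illustrative choices, not taken from the paper.

```python
import cmath

def period3_power(seq):
    """Power at frequency 1/3, summed over the four base-indicator sequences.

    For each base b, the indicator u_b[k] is 1 where seq[k] == b, else 0.
    The value returned is sum_b |sum_k u_b[k] * exp(-2*pi*i*k/3)|^2 / n^2.
    U is treated as T so that RNA and DNA sequences are handled alike.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        component = sum(cmath.exp(-2j * cmath.pi * k / 3)
                        for k, b in enumerate(seq) if b == base)
        total += abs(component) ** 2
    return total / (n * n)
```

A sequence in which each base sits at a fixed codon position (for instance a repeated single codon) gives a high value, while a sequence whose bases are spread evenly over the three codon positions gives a value near zero. That matches the abstract's observation that non-period-3 sequences have a more even distribution of bases across codon positions.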

13.
DNA, RNA and proteins are major biological macromolecules that coevolve and adapt to environments as components of one highly interconnected system. We explore here sequence/structure determinants of mechanisms of adaptation of these molecules, links between them, and results of their mutual evolution. We complemented statistical analysis of genomic and proteomic sequences with folding simulations of RNA molecules, unraveling causal relations between compositional and sequence biases reflecting molecular adaptation on DNA, RNA and protein levels. We found many compositional peculiarities related to environmental adaptation and the life style. Specifically, thermal adaptation of protein-coding sequences in Archaea is characterized by a stronger codon bias than in Bacteria. Guanine and cytosine load in the third codon position is important for supporting the aerobic life style, and it is highly pronounced in Bacteria. The third codon position also provides a tradeoff between arginine and lysine, which are favorable for thermal adaptation and aerobicity, respectively. Dinucleotide composition provides stability of nucleic acids via strong base-stacking in ApG dinucleotides. In relation to coevolution of nucleic acids and proteins, thermostability-related demands on the amino acid composition affect the nucleotide content in the second codon position in Archaea.

14.
A computer program, which runs on MS-DOS personal computers, is described that assists in the design of synthetic genes coding for proteins. The goal of the program is the design of a gene which (i) contains as many unique restriction sites as possible and (ii) uses a specific codon usage. The gene designed according to the criteria above is (i) suitable for 'modular mutagenesis' experiments and (ii) optimized for expression. The program 'reverse-translates' protein sequences into degenerate DNA sequences, generates a map of potential restriction sites and locates sequence positions where unique restriction sites can be accommodated. The nucleic acid sequence is then 'refined' according to a specific codon usage to remove any degeneracy. Unique restriction sites, if potentially present, can be 'forced' into the degenerate nucleic acid sequence by using 'priority codes' assigned to different restriction sequences.

15.
A Markov analysis of DNA sequences   Cited by: 12 (self-citations: 0, citations by others: 12)
We present a model by which we look at the DNA sequence as a Markov process. It has been suggested by several workers that some basic biological or chemical features of nucleic acids stand behind the frequencies of dinucleotides (doublets) in these chains. Comparing patterns of doublet frequencies in DNA of different organisms was shown to be a fruitful approach to some phylogenetic questions (Russel & Subak-Sharpe, 1977). Grantham (1978) formulated mRNA sequence indices, some of which involve certain doublet frequencies. He suggested that using these indices may provide indications of the molecular constraints existing during gene evolution. Nussinov (1981) has shown that a set of dinucleotide preference rules holds consistently for eukaryotes, and suggested a strong correlation between these rules and degenerate codon usage. Gruenbaum, Cedar & Razin (1982) found that methylation in eukaryotic DNA occurs exclusively at C-G sites. Important biological information thus seems to be contained in the doublet frequencies. One of the basic questions to be asked (the "correlation question") is to what extent are the 64 trinucleotide (triplet) frequencies measured in a sequence determined by the 16 doublet frequencies in the same sequence. The DNA is described here as a Markov process, with the nucleotides being outcomes of a sequence generator. Answering the correlation question mentioned above means finding the order of the Markov process. The difficulty is that natural sequences are of finite length, and statistical noise is quite strong. We show that even for a 16000 nucleotide long sequence (like that of the human mitochondrial genome) the finite length effect cannot be neglected. Using the Markov chain model, the correlation between doublet and triplet frequencies can, however, be determined even for finite sequences, taking proper account of the finite length. 
Two natural DNA sequences, the human mitochondrial genome and the SV40 DNA, are analysed as examples of the method.
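The "correlation question" in the abstract above asks how far triplet frequencies are determined by doublet frequencies. Under a first-order Markov model, the expected count of a triplet xyz is approximately N(xy)·N(yz)/N(y). The sketch below compares observed triplet counts with that estimate. It is a simplified illustration only: the function name is hypothetical, and the finite-length corrections that the paper emphasizes are deliberately left out.

```python
from collections import Counter

def triplet_observed_vs_expected(seq):
    """Map each observed triplet to (observed count, first-order Markov estimate).

    The estimate for triplet xyz is N(xy) * N(yz) / N(y), using overlapping
    doublet counts and single-base counts from the same sequence.
    No finite-length correction is applied.
    """
    singles = Counter(seq)
    doublets = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    triplets = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    result = {}
    for triplet, observed in triplets.items():
        expected = (doublets[triplet[:2]] * doublets[triplet[1:]]
                    / singles[triplet[1]])
        result[triplet] = (observed, expected)
    return result
```

Large gaps between the observed and estimated counts would hint at dependencies beyond first order, although for short sequences statistical noise can produce similar gaps, which is the difficulty the abstract points out.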

16.
Starting from two datasets of codon usage in coding sequences from mesophilic and thermophilic bacteria, we used internal correspondence analysis to study the variability of codon usage within and between species, and within and between amino acids. The first dataset included 18,958,458 codons from 58,482 coding sequences from completely sequenced genomes of 25 species, along with 6,793,581 dinucleotides from 21,876 intergenic spaces. The second dataset, with partially sequenced genomes, included 97,095,873 codons from 293 bacterial species. Results were consistent between the two datasets. The trend for the amino-acid composition of thermophilic proteins was found to be under the control of a pressure at the nucleic acid level, not a selection at the protein level. This effect was not present in intergenic spaces, ruling out a pressure at the DNA level. The pattern at the mRNA level was more complex than a simple purine enrichment of the sense strand of coding sequences. Outliers in the partial genome dataset introduced a note of caution about the interpretation of temperature as the direct determinant of the trend observed in thermophiles. The surprising lack of selection on the amino-acid content of thermophilic proteins suggests that the amino-acid repertoire was set up in a hot environment.

17.
An annotated bibliography of mathematical and computer analyses of protein and nucleic acid sequences is presented. The major subject areas represented are the determination of sequences, restriction mapping, similarity searching, sequence alignment, codon utilization, statistical analysis, information theoretic analysis, the construction of secondary and tertiary structure and DNA topology.

18.
A novel bias in codon third-letter usage was found in Escherichia coli genes with low fractions of "optimal codons", by comparing intact sequences with control random sequences. Third-letter usage has been found to be biased according to preference in codon usage and to doublet preference from the following first letter. The present study examines third-letter usage in the context of the nucleotide sequence when these preferences are considered. In order to exclude any influence by these factors, the random sequences were generated such that the amino acid sequence, codon usage, and the doublet frequency in each gene were all preserved. Comparison of intact sequences with these randomly generated sequences reveals that third letters of codons show a strong preference for the purine/pyrimidine pattern of the next codons: purine (R) is preferred to pyrimidine (Y) at the third site when followed by an R-Y-R codon, and pyrimidine is preferred when followed by an R-R-Y, an R-Y-Y or a Y-R-Y codon. This bias is probably related to interactions of tRNA molecules in the ribosome.

19.
We have sequenced the ebgA (evolved beta-galactosidase) gene of Escherichia coli K12. The sequence shows 50% nucleotide identity with the E. coli lacZ gene, demonstrating that the two genes are related by descent from a common ancestral gene. Comparison of the two sequences suggests that the ebgA gene has recently been under selection. A significant excess of identical, rather than synonymous, codons used to encode identical amino acids at the same positions in the aligned sequences implies that some form of selection is operating directly at the DNA level. This selection is independent of, and in addition to, selection based on codon usage or on function of the gene products.

20.
The trpFB operon from Acinetobacter calcoaceticus encoding the phosphoribosyl anthranilate isomerase and the beta-subunit of tryptophan synthase has been cloned by complementation of a trpB mutation in A. calcoaceticus, identified by deletion analysis, and sequenced. It encodes potential polypeptides of 214 amino acids with a calculated molecular weight of 23,008 (TrpF) and 403 amino acids with a molecular weight of 44,296 (TrpB). The encoded TrpB sequence shows striking homologies to those from other bacteria, ranging from 47% amino acid identity with the Brevibacterium lactofermentum protein to 64% identity with the Caulobacter crescentus protein. The encoded TrpF sequence, on the other hand, is much less homologous to the ones from other species, ranging between 27% identity with the Bacillus subtilis enzyme and 36% identity with the C. crescentus enzyme. The homologies of both polypeptides are evenly distributed over the entire sequences. The codon usage shows the strong preference for A and T in the third positions typical for A. calcoaceticus genes. The trpFB operon appears to be unlinked to trpA. The trpFB promoter has been determined by primer extension analysis of RNA synthesized from the chromosomally and plasmid-encoded trpFB operons. The starting nucleotides are identical in both cases and define the first promoter from A. calcoaceticus. Potential regulatory features are implied by a palindromic element overlapping the -35 consensus box of the promoter.
