Similar literature
 20 similar articles found (search time: 93 ms)
1.
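Nucleosome positioning plays an important role in regulating gene expression in eukaryotes. In earlier work, a position weight matrix algorithm was built from the biased k-mer frequency distributions of nucleosome core and linker regions, and it predicted nucleosome occupancy across the Saccharomyces cerevisiae genome reasonably well. Using this theoretical model with a 147 bp window and a 1 bp step, the algorithm was applied to yeast chromosomes 1, 3 and 14 to select nine 147 bp DNA sequences: three each with strong, intermediate and weak nucleosome-forming ability. These fragments were cloned into recombinant plasmids, and the nine biotin-labelled target sequences were amplified in bulk and recovered. In parallel, histones H2A, H2B, H3 and H4 were expressed and purified separately, then refolded and assembled into histone octamers. The nine DNA sequences were assembled into nucleosomes in vitro by salt dialysis; after biotin-based detection, the Gibbs free energy of the reaction was calculated and the nucleosome-forming affinities of the nine sequences were compared. Five of the nine sequences agreed fully with the theoretical prediction, and four did not agree completely. Overall, the experimental results were broadly consistent with the nucleosome positioning predicted by the algorithm, indicating that the theoretical model can effectively predict nucleosome occupancy in the S. cerevisiae genome.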
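The sliding-window scan described in this abstract (fixed window, 1 bp step) can be sketched as follows. This is only an illustration under stated assumptions: the `pwm` table of per-position log-odds weights and the function names are hypothetical, and the published matrix, which is derived from k-mer frequencies, is not reproduced here.

```python
def sliding_pwm_scores(seq, pwm, step=1):
    """Score every window of len(pwm) bases along seq.

    pwm is a list of dicts, one per window position, mapping a base to a
    log-odds weight. The weights are hypothetical placeholders; a real run
    would use a 147-position matrix.
    """
    window = len(pwm)
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        segment = seq[start:start + window]
        # Sum the weight of the observed base at each window position.
        scores.append(sum(pwm[i].get(base, 0.0) for i, base in enumerate(segment)))
    return scores


def rank_windows(seq, pwm, step=1):
    """Return (start, score) pairs sorted from highest to lowest score."""
    scores = sliding_pwm_scores(seq, pwm, step)
    indexed = [(i * step, s) for i, s in enumerate(scores)]
    return sorted(indexed, key=lambda pair: pair[1], reverse=True)
```

Ranking the windows by score gives one simple way to pick candidate strong, intermediate and weak sequences from the top, middle and bottom of the list.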

2.
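This study compared the distribution of perfect microsatellites with repeat units of 1 to 6 bases in the whole-genome sequences of the giant panda and the polar bear. Using MSDB, a microsatellite search and statistics program, 855,018 and 936,238 microsatellites were identified, respectively, with total lengths of 14,919,240 bp and 18,434,348 bp, accounting for 0.64% and 0.79% of the genome. The overall genomic abundance was 371.8 loci/Mb in the giant panda and 405.6 loci/Mb in the polar bear. In both genomes, mononucleotide repeats were the most numerous, followed by di-, tetra-, tri- and pentanucleotide repeats, with hexanucleotide repeats the least numerous. The most abundant repeat motif classes in both species were mainly A, AC, AG, AAAT, AAAG, AT and C. These data support the future development and screening of large numbers of high-quality microsatellite markers for species of the bear family.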
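A search for perfect microsatellites with 1 to 6 base repeat units can be sketched with regular expressions. This is not MSDB's implementation: the minimum repeat counts in `MIN_REPEATS` are assumed thresholds chosen for illustration, and all function names are hypothetical.

```python
import re

# Assumed minimum repeat counts per unit length (illustrative, not from the source).
MIN_REPEATS = {1: 12, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n))


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # A unit of unit_len bases followed by at least min_rep - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


def abundance_per_mb(n_ssrs, genome_bp):
    """Microsatellite abundance in loci per megabase."""
    return n_ssrs / (genome_bp / 1e6)
```

The primitive-motif check keeps a poly-A run from also being reported as an "AA" or "AAA" repeat. Abundance figures such as those quoted above are the locus count divided by genome size in Mb.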

3.
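Distribution of microsatellites in the whole genome of the red junglefowl   Cited by: 1 (self-citations: 0, citations by others: 1)
The number and distribution of microsatellites in the whole genome of the red junglefowl (Gallus gallus) were analysed. A total of 282,728 microsatellites with repeat units of 1 to 6 bases were found, accounting for about 0.49% of the whole-genome sequence (1.1 Gb), at a frequency of one per 3.89 kb; most were 12 to 70 bases long. Microsatellite frequency was relatively high on chromosomes 1, 2 and 3, whereas no microsatellites were found on chromosome 32. Mononucleotide repeats were the most numerous class (184,192, or 65.1% of the total), followed by tetra-, di-, tri-, penta- and hexanucleotide repeats, which accounted for 12.8%, 9.7%, 7.2%, 4.6% and 0.8% of the total, respectively. The main repeat motifs in the red junglefowl genome were T, A, AT, GTTT, AAAC, G, C, ATTT, AC, GT, AAAT, ATT, AAC, AAT, GTT, AG, CT, CTTT, AAAG, GTTTT, AAACA, AAGG and CCTT. This study lays a foundation for isolating and screening microsatellite markers in the red junglefowl, for studies of genetic diversity, and for comparative analyses of microsatellites across species.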

4.
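Objective: Microsatellites are short tandem repeats in the genome and are highly polymorphic: the number of repeat units in the core sequence varies, so that different alleles have core sequences of different lengths. Genotypes are therefore determined mainly from the length of PCR-amplified fragments. Researchers generally prefer tetranucleotide microsatellites in order to reduce problems such as stutter that affect dinucleotide microsatellites. However, when the core sequence of a tetranucleotide microsatellite has a complex structure, genotyping accuracy suffers, and so does the accuracy of downstream analyses. This problem is often overlooked in studies of wild animals. Using two tetranucleotide microsatellite loci of the Asiatic black bear (Ursus thibetanus), UamD116 and UamB1, as examples, this paper shows how internal structure affects genotyping. Methods: We genotyped 96 Asiatic black bear samples (including blood, muscle tissue and hair) by fluorescently labelled PCR amplification and capillary electrophoresis, and compared genotyping based on amplified fragment length with genotyping based on core sequence structure. Results: The core sequence of UamD116 contained several different repeat units as well as insertions between repeat units (a single base T, the dinucleotide TC and the trinucleotide AAG), and one class of alleles carried a GA deletion in the downstream flanking sequence. Sequence-based genotyping separated the different alleles, whereas length-based genotyping tended to merge different alleles into one. At locus UamB1, two types of alleles were found; one type carried a 3 bp insertion, so that alleles differed by 1 bp rather than 4 bp. When genotyping relied on fragment length alone, alleles differing by 1 bp were scored as a single allele. In addition, some alleles had different core sequences but exactly the same length. Length-based genotyping identified 8 alleles, whereas sequence-based genotyping identified 12, and allele frequencies and other genetic parameters changed accordingly. Conclusion: For microsatellites with complex core sequence structures, fragment-length genotypes must be corrected by allele sequencing in order to obtain reliable population genetic conclusions.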

5.
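Composition and distribution of variable number tandem repeats in the genome of Aspergillus nidulans   Cited by: 1 (self-citations: 0, citations by others: 1)
Using the published genome sequence of Aspergillus nidulans, variable number tandem repeat (VNTR) sequences in the sequenced genome (30.1 Mb) were analysed systematically and comprehensively. The published genome sequence contains 4,837 VNTRs with motifs of 1 to 6 nucleotides (length greater than 15 bp, match score greater than 80%); together they account for 0.31% of all bases in the genome, an average of one VNTR longer than 15 bp per 6.2 kb. Pentanucleotide VNTRs are the most numerous (1,386), followed by hexanucleotide (1,228) and trinucleotide (1,199) VNTRs; these three classes total 3,813, or 78.8% of all VNTRs. Dinucleotide VNTRs are the least numerous, with only 144. The 9,541 open reading frames (ORFs) contain 1,683 VNTRs in total, distributed over 1,356 ORFs, of which 1,117 contain a single VNTR. As in other organisms, trinucleotide and hexanucleotide VNTRs predominate in coding regions, accounting for 58.0% and 52.9%, respectively, of the VNTRs within reading frames. Coding-region trinucleotide and hexanucleotide VNTRs represent about 44.4% and 38.6%, respectively, of all VNTRs of the corresponding class in the genome. Because coding regions make up 59.3% of the bases in the genome, the density of these two VNTR classes in coding regions is slightly lower than the genome-wide average. In the 300 bp regulatory regions upstream and downstream of coding regions, every VNTR class other than the trinucleotide and hexanucleotide VNTRs that are abundant in coding regions accounts for more than 10%. These 300 bp flanking regulatory regions are therefore enriched in mono-, di-, tetra- and pentanucleotide VNTRs. Mono-, di- and tetranucleotide VNTRs make up a larger proportion in upstream regions than in downstream regions, whereas the number of pentanucleotide VNTRs is about the same in both.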

6.
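Exons of different lengths show three-base periodicity of different strength, and short exons in particular show weak periodicity. To study this, a simulation model of exons in DNA sequences was built in Matlab. The effects of exon length, of the probabilities with which the bases A, T, C and G occur at the three codon positions, and of different mapping methods on the three-base periodicity of exons were examined. The results show that, when the base probabilities at the three codon positions are fixed, the periodicity becomes increasingly pronounced as exon length increases and follows a definite functional relationship; when length is fixed, the larger the overall variance of the probability matrix of the bases at the three codon positions, the more pronounced the periodicity; and different numerical mapping methods also affect the periodicity of short exons to some extent. The results can serve as a reference for the effective detection of exons, especially short ones.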
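One common way to quantify three-base periodicity is the spectral power at frequency 1/3, summed over binary indicator sequences for the four bases (the Voss mapping). The sketch below assumes that mapping; the abstract does not say which mappings were compared, and the original model was written in Matlab, so the function name and the choice of mapping are illustrative.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four base indicators.

    Each base gets a 0/1 indicator sequence (Voss mapping, an assumption
    here); the discrete Fourier coefficient at frequency 1/3 is computed
    for each indicator and the squared magnitudes are added.
    """
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        coeff = sum(cmath.exp(-2j * cmath.pi * pos / 3)
                    for pos, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total
```

For a fixed codon-position bias this power grows with sequence length, which is one way to see why short exons give a weaker signal than long ones.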

7.
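Repeated segments in the coding regions of nucleic acid sequences were analysed statistically. Sequences from different species were found to have approximately equal reduced repeat lengths and repeat counts. The base distribution within the repeat regions was characterised, and the possible biological significance of these results is discussed.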

8.
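Objective: To analyse the similarities and differences among miRNA sequences of different species, as a basis for further miRNA research. Methods: All miRNAs of eight model organisms (human, mouse, rat, fruit fly, nematode, Arabidopsis, rice and maize) were downloaded from the miRBase database and analysed with bioinformatics software and methods. Results: Mature miRNA sequences were about 22 nt long in every species, and the length distribution of plant miRNAs was narrower than that of animal miRNAs. The opposite held for pre-miRNAs: length variation in plant pre-miRNAs was far greater than in animal pre-miRNAs. In every species the first base of the miRNA sequence tended to be U, whereas bases at other positions varied considerably among species. miRNA conservation was limited in range, and no miRNA was conserved across all species. Conclusion: Several similarities and differences among miRNAs were identified, which can inform subsequent miRNA identification and annotation.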

9.
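Genetic studies and genomic information for the spiny-bellied frog Paa boulengeri are scarce, so very few usable molecular markers are available. Microsatellite markers were mined on a large scale and characterised from high-throughput RNA-seq data for this species. In 121.6 Mb of transcriptome sequence, 3,165 microsatellite loci were found, located in 3,034 contigs. Among the microsatellites with repeat units of 1 to 6 bases, mononucleotide repeats were the most frequent, followed by tri-, di-, tetra-, hexa- and pentanucleotide repeats, accounting for 29.0%, 25.2%, 21.7%, 10.0%, 10.0% and 3.0%, respectively. A/T, AC/GT, AGG/CCT, ACAT/ATCT, AAAAT/ATTTT and AAAAAG/CTTTTT were the dominant motifs among mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats, respectively. Most coding-region microsatellites were short, with repeat lengths under 24 bp; those longer than 24 bp made up only 0.92% of the total. Analysis of the flanking sequences of coding-region microsatellites showed that their GC content was significantly lower than that of the transcriptome as a whole, and that specific primers amplifying the microsatellite locus could be designed for 71.9% of the contigs containing both upstream and downstream flanking sequences. These results provide abundant sequence information and marker resources for genetic and molecular phylogeographic studies of P. boulengeri.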

10.
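Analysis of minisatellite repeat sequences in the genome of the Chinese shrimp   Cited by: 4 (self-citations: 0, citations by others: 4)
Gao Huan, Kong Jie. Acta Zoologica Sinica, 2005, 51(1): 101-107
By sequencing random genomic DNA fragments of the Chinese shrimp, we obtained about 641,000 bases of genomic DNA sequence, in which 1,720 repeat sequences were found. Of these, 398 were minisatellites, accounting for 23.14% of all repeat sequences. The repeat units of these minisatellites were 7-165 bases long and concentrated in the range of 7-21 bases; repeats with a 12-base unit were the most numerous, with 58 sequences, or 14.57% of all minisatellites. By copy number, repeats consisting of 2 copies of the unit were the most numerous (137), followed by those with 3 copies (122), and the number of repeat sequences decreased as copy number increased. Some of the sequences are available in GenBank under accession numbers AY699072-AY699076. The 398 repeat sequences were composed of 398 different repeat units, so the minisatellites are highly diverse. We provisionally divided them into three classes (units composed of two, three or four kinds of bases) and further into subclasses by ordering the bases in each repeat from most to least frequent. This classification shows that minisatellites in the Chinese shrimp genome are, on the whole, A+T-rich repeats and follow a certain "hierarchy", revealing a relationship with microsatellites: some minisatellites may have originated from microsatellites.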

11.
Estimation of a covariance matrix with zeros   Cited by: 1 (self-citations: 0, citations by others: 1)
We consider estimation of the covariance matrix of a multivariate random vector under the constraint that certain covariances are zero. We first present an algorithm, which we call iterative conditional fitting, for computing the maximum likelihood estimate of the constrained covariance matrix, under the assumption of multivariate normality. In contrast to previous approaches, this algorithm has guaranteed convergence properties. Dropping the assumption of multivariate normality, we show how to estimate the covariance matrix in an empirical likelihood approach. These approaches are then compared via simulation and on an example of gene expression.

12.
GeneRAGE: a robust algorithm for sequence clustering and domain detection   Cited by: 9 (self-citations: 0, citations by others: 9)
MOTIVATION: Efficient, accurate and automatic clustering of large protein sequence datasets, such as complete proteomes, into families, according to sequence similarity. Detection and correction of false positive and negative relationships with subsequent detection and resolution of multi-domain proteins. RESULTS: A new algorithm for the automatic clustering of protein sequence datasets has been developed. This algorithm represents all similarity relationships within the dataset in a binary matrix. Removal of false positives is achieved through subsequent symmetrification of the matrix using a Smith-Waterman dynamic programming alignment algorithm. Detection of multi-domain protein families and further false positive relationships within the symmetrical matrix is achieved through iterative processing of matrix elements with successive rounds of Smith-Waterman dynamic programming alignments. Recursive single-linkage clustering of the corrected matrix allows efficient and accurate family representation for each protein in the dataset. Initial clusters containing multi-domain families are split into their constituent clusters using the information obtained by the multi-domain detection step. This algorithm can hence quickly and accurately cluster large protein datasets into families. Problems due to the presence of multi-domain proteins are minimized, allowing more precise clustering information to be obtained automatically. AVAILABILITY: GeneRAGE (version 1.0) executable binaries for most platforms may be obtained from the authors on request. The system is available to academic users free of charge under license.
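The matrix symmetrification and single-linkage steps can be illustrated with a much simplified sketch. It is not GeneRAGE's procedure: the abstract says asymmetric entries are re-checked with Smith-Waterman alignments, whereas this sketch simply drops any relationship that is not reciprocal, and it omits multi-domain detection altogether. The function names are hypothetical.

```python
def symmetrify(hits):
    """Keep a relationship only if it is reported in both directions.

    hits is a square 0/1 matrix; hits[i][j] == 1 means sequence i hit j.
    (Simplification: the published method re-aligns asymmetric pairs
    instead of discarding them.)
    """
    n = len(hits)
    return [[1 if hits[i][j] and hits[j][i] else 0 for j in range(n)]
            for i in range(n)]


def single_linkage(matrix):
    """Group indices into clusters: connected components of the matrix graph."""
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j]:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Single linkage merges two families as soon as one pair of members is related, which is why a single false positive or a multi-domain protein can chain unrelated families together unless it is handled first.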

13.
The profile hidden Markov model (PHMM) is widely used to assign protein sequences to their respective families. A major limitation of a PHMM is the assumption that, given the states, the observations (amino acids) are independent. To overcome this limitation, the dependency between amino acids in a multiple sequence alignment (MSA), which is the representative of a PHMM, can be appended to the PHMM. Because the amino acid sequences in an MSA are biologically related, the one-by-one dependency between two amino acids can be considered. In other words, based on the MSA, the dependency between an amino acid and its corresponding amino acid located above can be combined with the PHMM. For this purpose, a new emission probability matrix that considers the one-by-one dependencies between amino acids is constructed. The parameters of a PHMM are of two types, transition and emission probabilities, which are usually estimated using an EM algorithm called the Baum-Welch algorithm. We have generalized the Baum-Welch algorithm using a similarity emission matrix constructed by integrating the new emission probability matrix with the common emission probability matrix. Then, the performance of the similarity emission is discussed by applying it to the top twenty protein families in the Pfam database. We show that using the similarity emission in the Baum-Welch algorithm significantly outperforms the common Baum-Welch algorithm in the task of assigning protein sequences to protein families.

14.
Chen Z. Biometrics, 2005, 61(2): 474-480
The advent of complete genetic linkage maps of DNA markers has made systematic studies of mapping quantitative trait loci (QTL) in experimental organisms feasible. The method of multiple-interval mapping provides an appropriate way for mapping QTL using genetic markers. However, efficient algorithms for the computation involved remain to be developed. In this article, a full EM algorithm for the simultaneous computation of the MLEs of QTL effects and positions is developed. EM-based formulas are derived for computing the observed Fisher information matrix. The full EM algorithm is compared with an ECM algorithm developed by Kao and Zeng (1997, Biometrics 53, 653-665). The validity of the inverted observed Fisher information matrix as an estimate of the variance matrix of the MLEs is demonstrated by a simulation study.

15.
MOTIVATION: Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower dimensional space. RESULTS: In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. AVAILABILITY: The software is available as supplementary material.

16.
In this study we propose a new feature extraction algorithm, dNMF (discriminant non-negative matrix factorization), to learn subtle class-related differences while maintaining an accurate generative capability. In addition to the minimum representation error for the standard NMF (non-negative matrix factorization) algorithm, the dNMF algorithm also results in higher between-class variance for discriminant power. The multiplicative NMF learning algorithm has been modified to cope with this additional constraint. The cost function was carefully designed so that the extraction of feature coefficients from a single testing pattern with pre-trained feature vectors resulted in a quadratic convex optimization problem in non-negative space for uniqueness. It also resolves issues related to the previous discriminant NMF algorithms. The developed dNMF algorithm has been applied to the emotion recognition task for speech, where it needs to emphasize the emotional differences while de-emphasizing the dominant phonetic components. The dNMF algorithm successfully extracted subtle emotional differences, demonstrated much better recognition performance and showed a smaller representation error from an emotional speech database.

17.
In the case of noninbred and unselected populations with linkage equilibrium, the additive and dominance genetic effects are uncorrelated and the variance-covariance matrix of the second component is simply a product of its variance by a matrix that can be computed from the numerator relationship matrix A. The aim of this study is to present a new approach to estimate the dominance part with a reduced set of equations and hence a lower computing cost. The method proposed is based on the processing of the residual terms resulting from the BLUP methodology applied to an additive animal model. Best linear unbiased prediction of the dominance component d is almost identical to the one given by the full mixed model equations. Based on this approach, an algorithm for restricted maximum likelihood (REML) estimation of the variance components is also presented. By way of illustration, two numerical examples are given and a comparison between the parameters estimated with the expectation maximization (EM) algorithm and those obtained by the proposed algorithm is made. The proposed algorithm is iterative and yields estimates that are close to those obtained by EM, which is also iterative.

18.
A new algorithm for aligning several sequences based on the calculation of a consensus matrix and the comparison of all the sequences using this consensus matrix is described. This consensus matrix contains the preference scores of each nucleotide/amino acid and gaps in every position of the alignment. Two modifications of the algorithm corresponding to the evolutionary and functional meanings of the alignment were developed. The first one solves the best-fitting problem without any penalty for end gaps and with an internal gap penalty function independent of the gap length. This algorithm should be used when comparing evolutionarily related proteins to identify the most conserved residues. The other modification of the algorithm finds the most similar segments in the given sequences. It can be used for finding those parts of the sequences that are responsible for the same biological function. In this case the gap penalty function was chosen to be proportional to the gap length. The results of aligning amino acid sequences of neutral proteases and a compilation of 65 allosteric effectors and substrates of PEP carboxylase are presented.

19.
A non-deterministic minimization algorithm recently proposed is analyzed. Some characteristics are analytically derived from the analysis of positive definite quadratic forms. An improvement is proposed and compared with the basic algorithm. Different variants of the basic algorithm are finally compared to a standard Conjugate Gradient minimization algorithm in the computation of the Rayleigh coefficient of a positive definite symmetric matrix.

20.
A widely used algorithm for computing an optimal local alignment between two sequences requires a parameter set with a substitution matrix and gap penalties. It is recognized that a proper parameter set should be selected to suit the level of conservation between sequences. We describe an algorithm for selecting an appropriate substitution matrix at given gap penalties for computing an optimal local alignment between two sequences. In the algorithm, a substitution matrix that leads to the maximum alignment similarity score is selected among substitution matrices at various evolutionary distances. The evolutionary distance of the selected substitution matrix is defined as the distance of the computed alignment. To show the effects of gap penalties on alignments and their distances and help select appropriate gap penalties, alignments and their distances are computed at various gap penalties. The algorithm has been implemented as a computer program named SimDist. The SimDist program was compared with an existing local alignment program named SIM for finding reciprocally best-matching pairs (RBPs) of sequences in each of 100 protein families, where RBPs are commonly used as an operational definition of orthologous sequences. SimDist produced more accurate results than SIM on 50 of the 100 families, whereas both programs produced the same results on the other 50 families. SimDist was also used to compare three types of substitution matrices in scoring 444,461 pairs of homologous sequences from the 100 families.
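The selection rule described here (score the pair under each candidate matrix and keep the one with the highest local alignment score) can be sketched as follows. This is not the SimDist program: the linear gap penalty, the toy scoring functions and all names are assumptions made for illustration, and real substitution matrix series are not included.

```python
def local_alignment_score(a, b, score, gap):
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    score(x, y) returns the substitution score for residues x and y;
    gap is a positive per-residue penalty (an assumed, simplified gap model).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cell = max(0,
                       prev[j - 1] + score(x, y),  # align x with y
                       prev[j] - gap,              # gap in b
                       cur[j - 1] - gap)           # gap in a
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def select_matrix(a, b, matrices, gap):
    """Return (best_score, distance) over candidate scoring functions.

    matrices maps a hypothetical evolutionary distance label to a scoring
    function; on tied scores the larger distance label is returned.
    """
    return max((local_alignment_score(a, b, fn, gap), dist)
               for dist, fn in matrices.items())
```

A usage example with two toy scoring schemes, one strict and one tolerant of mismatches:

```python
matrices = {
    10: lambda x, y: 3 if x == y else -3,  # toy "close" matrix
    50: lambda x, y: 2 if x == y else 0,   # toy "distant" matrix
}
print(select_matrix("ACGT", "ACGT", matrices, gap=5))
print(select_matrix("AAAA", "ATAT", matrices, gap=5))
```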
