Similar literature
 17 similar articles found (search time: 234 ms)
1.
Construction of a human standard transcript dataset based on the RefSeq database   Total citations: 5, self-citations: 0, citations by others: 5

2.
3.
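Chen Lei, Liu Yihui. Chinese Journal of Bioinformatics (《生物信息学》), 2011, 9(3): 229-234
Gene chip technology is an important research tool in genomics. Gene chip data (microarray data) are usually high-dimensional, which makes dimensionality reduction a necessary step in microarray data analysis. This paper analyzes the lung cancer microarray data provided by G. J. Gordon et al. of Harvard Medical School. Feature attributes are first extracted from the microarray data with the t-test and the Wilcoxon rank-sum test, respectively. A classification tree is then grown extensively from the extracted features using the CART (Classification and Regression Tree) algorithm, with the Gini diversity index as the error function. The tree is then pruned to find the optimally sized tree, with the aim of improving its generalization so that it adapts well to new data to be predicted. Experiments show that the method achieves a classification accuracy above 96% on the lung cancer microarray data and is very stable; it also yields easily understood classification rules and the key genes for classification.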
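The two building blocks named in this abstract, t-test feature ranking and a Gini-based split, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy expression values, the gene names, the class labels, and the helper names `welch_t`, `rank_genes`, `gini`, and `best_split` are all hypothetical, Welch's form of the t statistic is assumed, and the Wilcoxon test and the pruning step are omitted.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (unequal variances assumed)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(expr, labels):
    """Rank genes by |t| between the two classes, most discriminative first."""
    classes = sorted(set(labels))

    def score(values):
        a = [v for v, lab in zip(values, labels) if lab == classes[0]]
        b = [v for v, lab in zip(values, labels) if lab == classes[1]]
        return abs(welch_t(a, b))

    return sorted(expr, key=lambda g: score(expr[g]), reverse=True)


def gini(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Threshold on one feature that minimizes the weighted Gini impurity."""
    best_thr, best_score = None, float("inf")
    ordered = sorted(set(values))
    for lo, hi in zip(ordered, ordered[1:]):
        thr = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= thr]
        right = [lab for v, lab in zip(values, labels) if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score


# Hypothetical toy data: one list of expression values per gene, one label per sample.
expr = {
    "geneA": [5.1, 4.9, 5.3, 1.0, 1.2, 0.8],
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
labels = ["class1"] * 3 + ["class2"] * 3

top_gene = rank_genes(expr, labels)[0]
threshold, impurity = best_split(expr[top_gene], labels)
```

In this toy case `geneA` separates the two classes completely, so the best split on it leaves both child nodes pure. A full CART implementation would apply `best_split` recursively over all selected genes and then prune the resulting tree.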

4.
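Using Loropetalum chinense var. rubrum as material, this study cloned a flavonol synthase (FLS) homologous gene, named LcFLS1, based on transcriptome sequencing results and PCR. Bioinformatics analysis showed that the open reading frame of LcFLS1 is 996 bp long and encodes 331 amino acids. Amino acid sequence analysis showed that LcFLS1 has a typical 2-oxoglutarate and iron-dependent dioxygenase domain; protein structure prediction indicated that the core region of the globular protein structure contains 10 sites that interact with the 2-oxoglutarate ligand. Phylogenetic analysis showed that LcFLS1 is closely related to woody plants such as tea (Camellia sinensis) and more distantly related to herbaceous plants such as Arabidopsis thaliana. Quantitative real-time PCR showed that the relative expression of LcFLS1 in L. chinense var. rubrum was highest in flowers and lowest in stems. The overexpression vector pLcFLS1-SUPER1300 was successfully constructed, and the pLcFLS1-SUPER1300 plasmid was introduced into Arabidopsis by the Agrobacterium-mediated floral dip method to obtain transgenic plants; PCR identification confirmed positive Arabidopsis plants carrying the LcFLS1 gene. These results lay a foundation for studying the mechanism of flavonol biosynthesis in L. chinense var. rubrum and for developing its medicinal value.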
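The relationship between the reported ORF length and the protein length can be checked with a short calculation. The sketch below assumes that the stated ORF length includes the stop codon, which the abstract does not say explicitly; the function name `encoded_residues` is hypothetical.

```python
def encoded_residues(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given length in bp."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but encodes no residue.
    return codons - 1 if includes_stop else codons


lcfls1_residues = encoded_residues(996)
```

Under that assumption, 996 bp corresponds to 332 codons, one of which is the stop codon, which matches the 331 amino acids reported.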

5.
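To investigate the mechanisms of sperm maturation in yak epididymal tissue and to provide basic data for studying the reproductive mechanisms of plateau animals, this study cloned the full-length CDS of the yak epididymal Eppin gene by gene cloning and used bioinformatics methods to predict and analyze the characteristics of the gene and its encoded sequence. The results showed that the CDS of the yak Eppin gene is 405 bp long and encodes 134 amino acids. The corresponding protein has a molecular weight of 15.09 kDa and a theoretical isoelectric point of 8.67, has no transmembrane structure, and is a hydrophilic protein. Its secondary structure consists of 25 α-helices, 27 extended strands, 2 β-sheets, and 80 random coils. The amino acid sequence encoded by the yak Eppin gene is highly homologous to those of cattle, Tibetan antelope, sheep, and other species, and its phylogeny is consistent with the degree of relatedness among these species. Real-time quantitative PCR was used to analyze Eppin expression in three regions of the epididymis (head, neck, and tail). Eppin was expressed to different degrees in all three regions, with the highest expression in the head and lower expression in the neck and tail. This study provides basic data on the mechanism of sperm maturation in the yak epididymis and on the function of the Eppin gene in yak epididymal epithelial cells.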

6.
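Gene expression is the most important and most fundamental biological process and molecular activity in living organisms, which carry out growth, development, and responses to stimuli by regulating the expression of different genes. Transcriptome sequencing is currently the most widely used high-throughput technique for measuring gene expression in biomedical research, and it has driven the development of many bioinformatics mining methods and tools for transcriptome data. This paper reviews methods for analyzing and mining transcriptome data in gene expression research, from existing large-scale ...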

7.
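Zhang Jianglei, Chen Shaohui. Acta Ecologica Sinica (《生态学报》), 2024, 44(18): 8314-8325
Evapotranspiration is a key component of the water cycle, and analyzing how it changes helps in understanding the spatiotemporal distribution of regional water resources. The Yellow River water conservation area is an important ecological function zone of the Yellow River Basin; studying the variation of evapotranspiration in this area and attributing it to its drivers helps ease the conflict between water supply and demand in the basin. Based on machine learning and the ERA5-land reanalysis dataset, this study examines the spatiotemporal variation of evapotranspiration in the Yellow River water conservation area from 2000 to 2022 and its influencing factors, and uses a driver-detrending method to identify the regions where different factors act. The results show that: (1) the multi-year mean evapotranspiration in the area ranges from 256.49 to 841.45 mm, decreases from east to west, and shows an overall increasing trend; (2) the main influencing factors are surface net solar radiation, total precipitation, and relative humidity; the dominant factor differs among sub-basins and is related to the regional hydrothermal conditions and the state of the underlying surface; (3) the ERA5-land reanalysis dataset has good simulation accuracy and can serve as a data source for studies at large spatial scales and over long periods, but because of the complexity of the underlying surface, adaptability assessments within the study area are still needed.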

8.
Both microRNA (miRNA) and mRNA expression profiles are important methods for cancer type classification. A comparative study of their classification performance will be helpful in choosing the means of classification. Here we evaluated the classification performance of miRNA and mRNA profiles using a new data mining approach based on a novel SVM (Support Vector Machines) based recursive feature elimination (nRFE) algorithm. Computational experiments showed that information encoded in miRNAs is not sufficient to classify cancers; gut-derived samples cluster more accurately when using mRNA expression profiles than when using miRNA profiles; and poorly differentiated tumors (PDT) could be classified by mRNA expression profiles with an accuracy of 100% versus 93.8% when using miRNA profiles. Furthermore, we showed that mRNA expression profiles have a higher capacity for normal tissue classification than miRNA profiles. We concluded that classification performance using mRNA profiles is superior to that of miRNA profiles in multiple-class cancer classification.
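The abstract does not describe how the nRFE variant works, so the sketch below shows only the general idea of SVM-based recursive feature elimination that it builds on: train a linear SVM, drop the feature with the smallest squared weight, and repeat. Everything here is an assumption made for illustration, including the toy data, the hyperparameters, the plain subgradient-descent trainer, and the names `train_linear_svm` and `svm_rfe`.

```python
def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # sample lies inside the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, keep):
    """Repeatedly drop the feature with the smallest squared SVM weight."""
    active = list(range(len(X[0])))
    while len(active) > keep:
        sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(sub, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        del active[worst]
    return active


# Hypothetical toy data: feature 0 separates the classes, the others do not.
X = [[2.0, 0.1, 0.0], [1.5, -0.1, 0.0], [-2.0, 0.1, 0.0], [-1.5, -0.1, 0.0]]
y = [1, 1, -1, -1]
surviving = svm_rfe(X, y, keep=1)
```

Here only feature 0 carries class information, so it is the one that survives elimination. On real expression data the elimination is usually done in larger steps and evaluated with cross-validation.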

9.
A random forest method has been selected to perform both gene selection and classification of the microarray data. In this embedded method, selecting the smallest possible sets of genes with the lowest error rates is the key factor in achieving the highest classification accuracy. Hence, an improved gene selection method using random forest has been proposed to obtain the smallest subset of genes as well as the biggest subset of genes prior to classification. The option of selecting the biggest subset is provided to assist researchers who intend to use the informative genes for further research. The enhanced random forest gene selection performed better in selecting both the smallest and the biggest subsets of informative genes with the lowest out-of-bag error rates. Furthermore, classification performed on the selected subset of genes using random forest has led to lower prediction error rates compared with the existing method and other similar available methods.

10.
This paper proposes a modified radial basis function classification algorithm for non-linear cancer classification. In the algorithm, a modified simulated annealing method is developed and combined with the linear least-squares and gradient paradigms to optimize the structure of the radial basis function (RBF) classifier. The proposed algorithm can be adopted to perform non-linear cancer classification based on gene expression profiles and is applied to two microarray data sets involving various human tumor classes: (1) normal versus colon tumor; (2) acute myeloid leukemia (AML) versus acute lymphoblastic leukemia (ALL). Finally, the accuracy and stability of the proposed algorithm are further demonstrated by comparison with other cancer classification algorithms.

11.
12.
An optimized methylation-sensitive restriction fingerprinting technique was used to search for differentially methylated CpG islands in the tumor genome and detected seven genes subject to abnormal epigenetic regulation in breast cancer: SEMA6B, BIN1, VCPIP1, LAMC3, KCNH2, CACNG4, and PSMF1. For each gene, the rate of promoter methylation and changes in expression were estimated in tumor and morphologically intact paired specimens of breast tissue (N = 100). Significant methylation rates of 38, 18, and 8% were found for SEMA6B, BIN1, and LAMC3, respectively. The genes were not methylated in morphologically intact breast tissue. The expression of SEMA6B, BIN1, VCPIP1, LAMC3, KCNH2, CACNG4, and PSMF1 was decreased in 44–94% of tumor specimens by the real-time RT-PCR assay. The most profound changes in SEMA6B and LAMC3 suggest that these genes can be included in biomarker panels for breast cancer diagnosis. Fine methylation mapping of the most frequently methylated CpG islands (SEMA6B, BIN1, and LAMC3) provides a fundamental basis for developing efficient methylation tests for these genes.

13.
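Tumor classification based on tumor gene expression profiles is an important research topic in bioinformatics. Traditional methods for extracting tumor information features are mostly based on informative-gene selection, but screening genes inevitably loses classification information. This paper proposes a tumor subtype feature extraction method based on adjacency matrix decomposition. A Gaussian-weighted adjacency matrix is first constructed from the tumor gene expression profile data; singular value decomposition is then applied to the adjacency matrix; finally, the row vectors of the resulting orthogonal matrix are used as classification features and fed into a support vector machine for classification. Leave-one-out experiments on a gene expression profile dataset of two leukemia subtypes demonstrate the feasibility and effectiveness of the method.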
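The first step of this pipeline, building a Gaussian-weighted adjacency matrix between samples, might look like the sketch below. It assumes the common form w_ij = exp(-||x_i - x_j||^2 / (2σ^2)); the abstract does not give the exact kernel, the value of σ, or the treatment of the diagonal, and the function name and toy samples are hypothetical.

```python
from math import exp


def gaussian_adjacency(samples, sigma=1.0):
    """Symmetric matrix of Gaussian weights between all pairs of samples."""
    n = len(samples)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            weights[i][j] = exp(-sq_dist / (2 * sigma ** 2))
    return weights


# Hypothetical toy samples: each row is one sample's expression vector.
samples = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
W = gaussian_adjacency(samples)
```

Nearby samples receive weights close to 1 and distant samples weights close to 0. The later steps (singular value decomposition of `W` and SVM classification on the resulting vectors) need a numerical library and are left out of this sketch.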

14.
Dynamic models of gene expression and classification   Total citations: 3, self-citations: 0, citations by others: 3
Powerful new methods, like expression profiles using cDNA arrays, have been used to monitor changes in gene expression levels as a result of a variety of metabolic, xenobiotic or pathogenic challenges. This potentially vast quantity of data enables, in principle, the dissection of the complex genetic networks that control the patterns and rhythms of gene expression in the cell. Here we present a general approach to developing dynamic models for analyzing time series of whole genome expression. In this approach, a self-consistent calculation is performed that involves both linear and non-linear response terms for interrelating gene expression levels. This calculation uses singular value decomposition (SVD) not as a statistical tool but as a means of inverting noisy and near-singular matrices. The linear transition matrix that is determined from this calculation can be used to calculate the underlying network reflected in the data. This suggests a direct method of classifying genes according to their place in the resulting network. In addition to providing a means to model such a large multivariate system this approach can be used to reduce the dimensionality of the problem in a rational and consistent way, and suppress the strong noise amplification effects often encountered with expression profile data. Non-linear and higher-order Markov behavior of the network are also determined in this self-consistent method. In data sets from yeast, we calculate the Markov matrix and the gene classes based on the linear-Markov network. These results compare favorably with previously used methods like cluster analysis. Our dynamic method appears to give a broad and general framework for data analysis and modeling of gene expression arrays.
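One way to read the linear part of this approach is as a first-order model whose transition matrix is estimated through a truncated SVD pseudo-inverse. The notation below is an assumption introduced for illustration (the abstract gives no formulas), and the non-linear and higher-order terms are omitted: x(t) is the vector of expression levels at time point t, T is the number of time points, and r is the number of singular values retained.

```latex
\begin{aligned}
x(t+1) &\approx M\,x(t), \\
X &= [\,x(1)\ \cdots\ x(T-1)\,], \qquad Y = [\,x(2)\ \cdots\ x(T)\,], \\
X &= U\,\Sigma\,V^{\top}, \\
M &\approx Y\,V\,\Sigma_r^{+}\,U^{\top}.
\end{aligned}
```

Here \Sigma_r^{+} inverts only the r largest singular values and sets the remaining ones to zero. Discarding the small singular values is what keeps the inversion of a noisy, near-singular X from amplifying noise.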

15.
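PCR-based techniques for differential gene expression analysis   Total citations: 2, self-citations: 0, citations by others: 2
Differential gene expression analysis is a direct and effective way to study the molecular basis of many biological processes. Since the DDRT-PCR technique was established, a series of PCR-based techniques for differential gene expression analysis, such as SAGE, SSH, RDA, and DNA microarrays, have been developed in succession, providing faster and more sensitive tools for analyzing and cloning differentially expressed genes. This paper briefly reviews these methods, compares their advantages and disadvantages, and discusses future directions for differential gene expression research techniques.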

16.
Array-based gene expression studies frequently serve to identify genes that are expressed differently under two or more conditions. The actual analysis of the data, however, may be hampered by a number of technical and statistical problems. Possible remedies on the level of computational analysis lie in appropriate preprocessing steps, proper normalization of the data and application of statistical testing procedures in the derivation of differentially expressed genes. This review summarizes methods that are available for these purposes and provides a brief overview of the available software tools.
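As one concrete illustration of a normalization step, the sketch below log-transforms the intensities of a single array and centers them on the array median. The abstract does not name a specific normalization method, so this choice, the function name `median_center`, and the toy intensities are all assumptions.

```python
from math import log2
from statistics import median


def median_center(intensities):
    """Log2-transform one array's intensities and subtract the array median."""
    logged = [log2(v) for v in intensities]
    center = median(logged)
    return [v - center for v in logged]


# Hypothetical raw intensities from a single array.
normalized = median_center([1.0, 2.0, 4.0])
```

After centering, every array has a median log-intensity of zero, which removes array-wide intensity differences before genes are compared across conditions.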

17.
The accumulation of DNA microarray data has now made it possible to use gene expression profiles to analyse expression data. A gene expression profile contains the expression data for a given gene over various samples, and can be contrasted with an expression signature, which contains the expression data for a single sample. Gene expression profiles are most revealing when samples are grouped appropriately, either by standard clinical or pathological categories or by categories discovered through cluster analysis techniques. Expression profiles can exist at various levels of abstraction, yielding information across various tissues or across diseases within a particular tissue. Hypothesis tests may be applied to expression profiles on a large scale to identify candidate genes of interest.
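The distinction between a profile and a signature can be made concrete with a small gene-by-sample table: a profile is one row, a signature is one column. The gene names, sample names, values, and helper functions below are hypothetical.

```python
# Hypothetical expression table: outer keys are genes, inner keys are samples.
expression = {
    "GENE1": {"sample1": 2.0, "sample2": 3.5, "sample3": 0.4},
    "GENE2": {"sample1": 7.1, "sample2": 6.8, "sample3": 7.3},
}


def expression_profile(table, gene):
    """Expression of one gene across all samples (one row of the table)."""
    return dict(table[gene])


def expression_signature(table, sample):
    """Expression of all genes in one sample (one column of the table)."""
    return {gene: row[sample] for gene, row in table.items()}


profile = expression_profile(expression, "GENE1")
signature = expression_signature(expression, "sample2")
```

Grouping the samples by clinical category before reading a profile is then a matter of reordering or partitioning the keys of the row.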
