17 similar documents found; search time 234 ms
2.
3.
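Gene chip technology is an important research tool in genomics. Gene chip (microarray) data are typically high-dimensional, which makes dimensionality reduction a necessary step in microarray data analysis. This paper analyzes the lung cancer microarray data provided by G. J. Gordon et al. of Harvard Medical School. Feature attributes are extracted from the microarray data with the t-test and with the Wilcoxon rank-sum test, respectively. A classification tree is then grown to full extent on the extracted features with the CART (Classification and Regression Tree) algorithm, using the Gini diversity index as the error function, and is subsequently pruned to find the tree of optimal size, with the aim of improving generalization so that the tree adapts well to new data. Experiments show that the method reaches a classification accuracy above 96% on the lung cancer microarray data, that this accuracy is stable, and that the method yields easily understood classification rules and the key genes for classification.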
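The Gini criterion named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gini` and `best_threshold` and the six expression values are hypothetical, and the sketch only shows how a single CART node could pick a threshold on one gene by minimizing the weighted Gini impurity, 1 - sum(p_k^2), of the two child nodes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature.

    If no split improves on the unsplit node, threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate two equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best


# Hypothetical expression values of one gene in six samples (not from the paper).
expr = [2.1, 2.4, 2.2, 5.0, 5.3, 4.8]
cls = ["A", "A", "A", "B", "B", "B"]
threshold, impurity = best_threshold(expr, cls)
```

A full CART implementation would repeat this search over all selected genes at every node and then prune the grown tree; both steps are omitted here.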
4.
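Using Loropetalum chinense var. rubrum as material, this study cloned one flavonol synthase (FLS) homologous gene, named LcFLS1, based on transcriptome sequencing results and PCR. Bioinformatic analysis showed that the open reading frame of LcFLS1 is 996 bp and encodes 331 amino acids. Amino acid sequence analysis showed that LcFLS1 has the typical 2-oxoglutarate- and iron-dependent dioxygenase domain; protein structure prediction indicated that the core region of the globular protein contains 10 sites that interact with the 2-oxoglutarate ligand. Phylogenetic analysis showed that LcFLS1 is closely related to its counterparts in woody plants such as tea (Camellia sinensis) and more distantly related to those in herbaceous plants such as Arabidopsis thaliana. Quantitative real-time PCR showed that the relative expression of LcFLS1 in L. chinense var. rubrum is highest in flowers and lowest in stems. The overexpression vector pLcFLS1-SUPER1300 was successfully constructed and the plasmid was introduced into Arabidopsis thaliana by the Agrobacterium-mediated floral dip method; PCR identification confirmed that positive LcFLS1-transgenic Arabidopsis plants were obtained. These results lay a foundation for studying the mechanism of flavonol biosynthesis in L. chinense var. rubrum and for developing its medicinal value.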
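The relation between the reported ORF length and protein length can be checked with a few lines of arithmetic. This is only a sketch under the assumption that the reported 996 bp includes the terminal stop codon; the helper name `orf_protein_length` is hypothetical.

```python
def orf_protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but adds no amino acid.
    return codons - 1 if includes_stop else codons


# 996 bp ORF reported for LcFLS1: 996 / 3 = 332 codons, minus one stop codon.
lcfls1_len = orf_protein_length(996)
```

Under that assumption the result agrees with the 331 amino acids stated in the abstract.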
5.
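This study aimed to investigate the mechanisms of sperm maturation in yak epididymal tissue and to provide basic data on the reproductive mechanisms of plateau animals. The full-length CDS of the yak epididymal Eppin gene was cloned, and the characteristics of the gene and its coding sequence were predicted and analyzed with bioinformatic methods. The results showed that the CDS of the yak Eppin gene is 405 bp long and encodes 134 amino acids. The predicted protein has a molecular weight of 15.09 kDa and a theoretical isoelectric point of 8.67, contains no transmembrane structure, and is classified as a hydrophilic protein. Its secondary structure consists of 25 α-helices, 27 extended strands, 2 β-sheets, and 80 random coils. The amino acid sequence encoded by the yak Eppin gene shows high homology with those of cattle, Tibetan antelope, sheep, and other species, and the phylogeny is consistent with their evolutionary relationships. Real-time quantitative PCR was used to analyze Eppin expression in three segments of the epididymis (head, neck, and tail). Eppin was expressed in all three segments at different levels, with the highest expression in the head and lower expression in the neck and tail. This study provides basic data on the mechanism of sperm maturation in the yak epididymis and on the function of the Eppin gene in yak epididymal epithelial cells.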
6.
郭安源 《中国科学:生命科学》2021,(1):70-82
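Gene expression is the most important and most fundamental biological process and molecular activity in living organisms, which accomplish growth, development, response to stimuli, and other life activities by regulating the expression of different genes. Transcriptome sequencing is currently the most widely used high-throughput technology for measuring gene expression in biomedical research, and it has also driven the development of a large number of bioinformatic mining methods and tools for transcriptome data. This paper reviews methods for analyzing and mining transcriptome data on gene expression, starting from existing large-scale...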
7.
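Evapotranspiration is a key element of the water cycle, and analyzing how it changes helps in understanding the spatiotemporal distribution of regional water resources. The water conservation area of the Yellow River is an important ecological function zone of the Yellow River basin; studying the changes in evapotranspiration in this area and attributing their causes helps to ease the conflict between water supply and demand in the basin. Based on machine learning and the ERA5-land reanalysis dataset, this study examines the spatiotemporal variation of evapotranspiration in the Yellow River water conservation area from 2000 to 2022 and its influencing factors, and uses a detrending method applied to the driving factors to identify the regions where different factors act. The results show that: (1) the multi-year mean evapotranspiration in the area ranges from 256.49 to 841.45 mm, decreases spatially from east to west, and shows an overall increasing trend; (2) the main influencing factors are surface net solar radiation, total precipitation, and relative humidity, the dominant factor differs among sub-basins, and the dominant factor is related to the regional hydrothermal conditions and the underlying surface; (3) the ERA5-land reanalysis dataset has good simulation accuracy and can serve as a data source for studies over large spatial scales and long time spans, but because of the complexity of the underlying surface, an adaptability assessment within the study area is still required.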
8.
Both microRNA (miRNA) and mRNA expression profiles are important means of cancer type classification. A comparative study of their classification performance helps in choosing between them. Here we evaluated the classification performance of miRNA and mRNA profiles using a new data mining approach based on a novel SVM (Support Vector Machine) based recursive feature elimination (nRFE) algorithm. Computational experiments showed that the information encoded in miRNAs is not sufficient to classify cancers; that gut-derived samples cluster more accurately with mRNA expression profiles than with miRNA profiles; and that poorly differentiated tumors (PDT) could be classified with 100% accuracy using mRNA expression profiles versus 93.8% using miRNA profiles. Furthermore, we showed that mRNA expression profiles have a higher capacity than miRNA profiles for classifying normal tissues. We concluded that classification performance using mRNA profiles is superior to that of miRNA profiles in multiple-class cancer classification.
9.
A random forest method has been selected to perform both gene selection and classification of microarray data. In this embedded method, selecting the smallest possible sets of genes with the lowest error rates is the key factor in achieving the highest classification accuracy. Hence, an improved gene selection method using random forest has been proposed to obtain the smallest subset of genes as well as the biggest subset of genes prior to classification. The option of selecting the biggest subset is provided to assist researchers who intend to use the informative genes for further research. The enhanced random forest gene selection performed better in selecting both the smallest and the biggest subsets of informative genes with the lowest out-of-bag error rates. Furthermore, classification of the selected subset of genes using random forest has led to lower prediction error rates compared with the existing method and other similar available methods.
10.
Non-linear cancer classification using a modified radial basis function classification algorithm Total citations: 1, self-citations: 0, citations by others: 1
Summary This paper proposes a modified radial basis function classification algorithm for non-linear cancer classification. In the
algorithm, a modified simulated annealing method is developed and combined with the linear least square and gradient paradigms
to optimize the structure of the radial basis function (RBF) classifier. The proposed algorithm can be adopted to perform
non-linear cancer classification based on gene expression profiles and applied to two microarray data sets involving various
human tumor classes: (1) Normal versus colon tumor; (2) acute myeloid leukemia (AML) versus acute lymphoblastic leukemia (ALL).
Finally, accuracy and stability for the proposed algorithm are further demonstrated by comparing with the other cancer classification
algorithms.
11.
12.
E. B. Kuznetsova T. V. Kekeeva S. S. Larin V. V. Zemlyakova O. V. Babenko M. V. Nemtsova D. V. Zaletayev V. V. Strelnikov 《Molecular Biology》2007,41(4):562-570
An optimized methylation-sensitive restriction fingerprinting technique was used to search for differentially methylated CpG islands in the tumor genome and detected seven genes subject to abnormal epigenetic regulation in breast cancer: SEMA6B, BIN1, VCPIP1, LAMC3, KCNH2, CACNG4, and PSMF1. For each gene, the rate of promoter methylation and changes in expression were estimated in tumor and morphologically intact paired specimens of breast tissue (N = 100). Significant methylation rates of 38, 18, and 8% were found for SEMA6B, BIN1, and LAMC3, respectively. The genes were not methylated in morphologically intact breast tissue. The expression of SEMA6B, BIN1, VCPIP1, LAMC3, KCNH2, CACNG4, and PSMF1 was decreased in 44–94% of tumor specimens by the real-time RT-PCR assay. The most profound changes in SEMA6B and LAMC3 suggest that these genes can be included in biomarker panels for breast cancer diagnosis. Fine methylation mapping of the most frequently methylated CpG islands (SEMA6B, BIN1, and LAMC3) provides a fundamental basis for developing efficient methylation tests for these genes.
13.
14.
Dynamic models of gene expression and classification Total citations: 3, self-citations: 0, citations by others: 3
Powerful new methods, like expression profiles using cDNA arrays, have been used to monitor changes in gene expression levels
as a result of a variety of metabolic, xenobiotic or pathogenic challenges. This potentially vast quantity of data enables,
in principle, the dissection of the complex genetic networks that control the patterns and rhythms of gene expression in the
cell. Here we present a general approach to developing dynamic models for analyzing time series of whole genome expression.
In this approach, a self-consistent calculation is performed that involves both linear and non-linear response terms for interrelating
gene expression levels. This calculation uses singular value decomposition (SVD) not as a statistical tool but as a means
of inverting noisy and near-singular matrices. The linear transition matrix that is determined from this calculation can be
used to calculate the underlying network reflected in the data. This suggests a direct method of classifying genes according
to their place in the resulting network. In addition to providing a means to model such a large multivariate system this approach
can be used to reduce the dimensionality of the problem in a rational and consistent way, and suppress the strong noise amplification
effects often encountered with expression profile data. Non-linear and higher-order Markov behavior of the network are also
determined in this self-consistent method. In data sets from yeast, we calculate the Markov matrix and the gene classes based
on the linear-Markov network. These results compare favorably with previously used methods like cluster analysis. Our dynamic
method appears to give a broad and general framework for data analysis and modeling of gene expression arrays.
15.
16.
Array-based gene expression studies frequently serve to identify genes that are expressed differently under two or more conditions. The actual analysis of the data, however, may be hampered by a number of technical and statistical problems. Possible remedies on the level of computational analysis lie in appropriate preprocessing steps, proper normalization of the data and application of statistical testing procedures in the derivation of differentially expressed genes. This review summarizes methods that are available for these purposes and provides a brief overview of the available software tools.
17.
Wu TD 《Briefings in bioinformatics》2002,3(1):7-17
The accumulation of DNA microarray data has now made it possible to use gene expression profiles to analyse expression data. A gene expression profile contains the expression data for a given gene over various samples, and can be contrasted with an expression signature, which contains the expression data for a single sample. Gene expression profiles are most revealing when samples are grouped appropriately, either by standard clinical or pathological categories or by categories discovered through cluster analysis techniques. Expression profiles can exist at various levels of abstraction, yielding information across various tissues or across diseases within a particular tissue. Hypothesis tests may be applied to expression profiles on a large scale to identify candidate genes of interest.
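As a rough illustration of applying a hypothesis test to many expression profiles, the sketch below ranks genes by the absolute Welch t statistic between two sample groups. The abstract does not say which test is used, so the choice of Welch's t is an assumption; the gene names, the values, and the helper `welch_t` are hypothetical, and no p-values or multiple-testing correction are computed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with possibly unequal variances.

    Assumes each group has at least two values and that the two groups are
    not both constant (otherwise the standard error is zero).
    """
    va, vb = variance(group_a), variance(group_b)
    se = sqrt(va / len(group_a) + vb / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical expression profiles: gene -> (values in group 1, values in group 2).
profiles = {
    "geneX": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "geneY": ([2.0, 3.0, 4.0], [2.5, 3.0, 3.5]),
}

# Genes with the largest |t| come first and would be the candidate genes.
ranked = sorted(profiles, key=lambda g: abs(welch_t(*profiles[g])), reverse=True)
```

In practice the statistic would be converted to a p-value and corrected for the number of genes tested before any gene is called a candidate.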