Similar literature
 Found 20 similar articles; search took 31 ms
1.
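Comparison of methods for screening differentially expressed genes from microarray data   Total citations: 1, self-citations: 0, citations by others: 1
单文娟  童春发  施季森 《遗传》2008,30(12):1640-1646
Abstract: Using computer-simulated data and real microarray data, eight methods for screening differentially expressed genes were compared in order to assess how well each method performs on microarray data. Analysis of the simulated data showed that all eight methods identified and detected uniformly distributed differentially expressed genes well. In terms of algorithm, SAM and the Wilcoxon rank-sum test performed better; in terms of data distribution, normally distributed data were identified well, whereas chi-square and exponentially distributed data were identified poorly. Analysis of a poplar cDNA microarray showed that SAM, Samroc and the regression-model method gave similar results, whereas the Wilcoxon rank-sum test differed considerably from them.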
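
As a rough illustration of one of the methods named in this abstract, the sketch below applies a Wilcoxon rank-sum test to the expression values of a single gene in two groups. It is not the authors' implementation: the function names are hypothetical, and it assumes the large-sample normal approximation without a tie correction.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_z(group_a, group_b):
    """Wilcoxon rank-sum z-score (normal approximation, no tie correction)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])                       # rank sum of group A
    mean_w = n_a * (n_a + n_b + 1) / 2         # expected rank sum under H0
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (w - mean_w) / sd_w

def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))
```

In a screening setting, `rank_sum_z` would be called once per gene and the resulting p-values adjusted for multiple testing; that step is omitted here.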

2.
Chinese hamster ovary (CHO) cells are a ubiquitous tool for industrial therapeutic recombinant protein production. However, consistently generating high-producing clones remains a major challenge during the cell line development process. The glutamine synthetase (GS) and dihydrofolate reductase (DHFR) selection systems are commonly used CHO expression platforms based on controlling the balance of expression between the transgenic and endogenous GS or DHFR genes. Since the expression of the endogenous selection gene in CHO hosts can interfere with selection, generating a corresponding null CHO cell line is required to improve selection stringency, productivity, and stability. However, the efficiency of generating bi-allelic genetic knockouts using conventional protocols is very low (<5%). This significantly affects clone screening efficiency and reduces the chance of identifying robust knockout host cell lines. In this study, we use the GS expression system as an example to improve the genome editing process with zinc finger nucleases (ZFNs), resulting in improved GS-knockout efficiency of up to 46.8%. Furthermore, we demonstrate a process capable of enriching knockout CHO hosts with robust bioprocess traits. This integrated host development process yields a larger number of GS-knockout hosts with desired growth and recombinant protein expression characteristics.

3.
《Genomics》2020,112(1):114-126
Gene expression data are expected to contribute greatly to efficient cancer diagnosis and prognosis. Such data comprise measurements of a large number of genes, only a few of which carry useful information for distinguishing between classes of samples. Recently, several researchers have proposed gene selection methods based on metaheuristic algorithms for analysing and interpreting gene expression data. However, because of the large number of genes, the limited number of patient samples and the complex interactions between genes, many gene selection methods struggle to identify the most relevant and reliable genes. Hence, in this paper, a hybrid filter/wrapper method, called rMRMR-MBA, is proposed for the gene selection problem. In this method, robust Minimum Redundancy Maximum Relevancy (rMRMR) serves as a filter to select the most promising genes, and a modified bat algorithm (MBA) serves as the search engine in a wrapper approach to identify a small set of informative genes. The performance of the proposed method has been evaluated using ten gene expression datasets. For performance evaluation, the convergence behaviour of MBA was studied with and without TRIZ optimisation operators. For comparative evaluation, the results of the proposed rMRMR-MBA were compared against ten state-of-the-art methods using the same datasets. The comparative study demonstrates that the proposed method produced better results in terms of classification accuracy and number of selected genes on two out of ten datasets and competitive results on the remaining datasets. In a nutshell, the proposed method produces promising results with high classification accuracy and can be considered a promising contribution to the gene selection domain.

4.
Although Chinese hamster ovary (CHO) cells, with their unique characteristics, have become a major workhorse for the manufacture of therapeutic recombinant proteins, one of the major challenges in CHO cell line generation (CLG) is how to efficiently identify those rare, high‐producing clones among a large population of low‐ and non‐productive clones. It is not unusual that several hundred individual clones need to be screened for the identification of a commercial clonal cell line with acceptable productivity and growth profile making the cell line appropriate for commercial application. This inefficiency makes the process of CLG both time consuming and laborious. Currently, there are two main CHO expression systems, dihydrofolate reductase (DHFR)‐based methotrexate (MTX) selection and glutamine synthetase (GS)‐based methionine sulfoximine (MSX) selection, that have been in wide industrial use. Since selection of recombinant cell lines in the GS‐CHO system is based on the balance between the expression of the GS gene introduced by the expression plasmid and the addition of the GS inhibitor, L‐MSX, the expression of GS from the endogenous GS gene in parental CHOK1SV cells will likely interfere with the selection process. To study endogenous GS expression's potential impact on selection efficiency, GS‐knockout CHOK1SV cell lines were generated using the zinc finger nuclease (ZFN) technology designed to specifically target the endogenous CHO GS gene. The high efficiency (~2%) of bi‐allelic modification on the CHO GS gene supports the unique advantages of the ZFN technology, especially in CHO cells. GS enzyme function disruption was confirmed by the observation of glutamine‐dependent growth of all GS‐knockout cell lines. Full evaluation of the GS‐knockout cell lines in a standard industrial cell culture process was performed. Bulk culture productivity improved two‐ to three‐fold through the use of GS‐knockout cells as parent cells. 
The selection stringency was significantly increased, as indicated by the large reduction of non‐producing and low‐producing cells after 25 µM L‐MSX selection, and resulted in a six‐fold efficiency improvement in identifying similar numbers of high‐productive cell lines for a given recombinant monoclonal antibody. The potential impact of GS‐knockout cells on recombinant protein quality is also discussed. Biotechnol. Bioeng. 2012; 109:1007–1015. © 2011 Wiley Periodicals, Inc.

5.
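Using the glutamine synthetase (GS) gene [1] as an amplifiable selection marker together with the CMV-IE promoter, the hepatitis B surface antigen (HBsAg) gene was expressed at high levels in CHO cells. Clones from the initial screen had an expression level of 1:64 by RPHA; after two rounds of gene amplification with MSX, an inhibitor of glutamine synthetase, the HBsAg expression level exceeded 1:256 by RPHA. In supernatant harvested from static flask culture, the highest HBsAg yield measured by RIA was 9.5 μg/mL. This expression level is more than double that of B43, a high-expressing cell line obtained previously with the dhfr gene amplification selection system. The GS gene amplification selection system can therefore be used to express foreign genes at high levels in mammalian cells.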

6.
This paper studies the problem of building multiclass classifiers for tissue classification based on gene expression. The recent development of microarray technologies has enabled biologists to quantify gene expression of tens of thousands of genes in a single experiment. Biologists have begun collecting gene expression data for a large number of samples. One of the urgent issues in the use of microarray data is to develop methods for characterizing samples based on their gene expression. The most basic step in this research direction is binary sample classification, which has been studied extensively over the past few years. This paper investigates the next step: multiclass classification of samples based on gene expression. The characteristics of expression data (e.g. a large number of genes with a small sample size) make the classification problem more challenging. The process of building multiclass classifiers is divided into two components: (i) selection of the features (i.e. genes) to be used for training and testing and (ii) selection of the classification method. This paper compares various feature selection methods as well as various state-of-the-art classification methods on various multiclass gene expression datasets. Our study indicates that the multiclass classification problem is much more difficult than the binary one for gene expression datasets. The difficulty lies in the fact that the data are of high dimensionality and that the sample size is small. The classification accuracy appears to degrade very rapidly as the number of classes increases. In particular, the accuracy was very low regardless of the choice of method for datasets with many classes (e.g. NCI60 and GCM). While increasing the number of samples is a plausible solution to the problem of accuracy degradation, it is important to develop algorithms that can effectively analyze multiclass expression data for these special datasets.

7.
Chinese hamster ovary (CHO) cells have been one of the most widely used host cells for the manufacture of therapeutic recombinant proteins. An effective and efficient clinical cell line development process, which could quickly identify those rare, high-producing cell lines among a large population of low and non-productive cells, is of considerable interest to speed up biological drug development. In the glutamine synthetase (GS)-CHO expression system, selection of top-producing cell lines is based on controlling the balance between the expression level of GS and the concentration of its specific inhibitor, l-methionine sulfoximine (MSX). The combined amount of GS expressed from plasmids that have been introduced through transfection and the endogenous CHO GS gene determines the stringency and efficiency of selection. Previous studies have shown significant improvement in selection stringency by using GS-knockout CHO cells, which eliminate background GS expression from the endogenous GS gene in CHOK1SV cells. To further improve selection stringency, a series of weakened SV40E promoters have been generated and used to modulate plasmid-based GS expression with the intent of manipulating GS-CHO selection, finely adjusting the balance between GS expression and GS inhibitor (MSX) levels. The reduction in SV40E promoter activity has been confirmed by TaqMan RT-PCR and GFP expression profiling. Significant productivity improvements in both bulk culture and individual clonal cell lines have been achieved with the combined use of GS-knockout CHOK1SV cells and weakened SV40E promoters driving GS expression in the current cell line generation process. The selection stringency was significantly increased, as indicated by the shift towards higher distribution of producing-cell populations, even with no MSX added into cell culture medium. 
The potential applications of weakened SV40E promoter and GS-knockout cells in development of targeted integration and transient CHO expression systems are also discussed.

8.
9.
10.
The most widely used statistical methods for finding differentially expressed genes (DEGs) are essentially univariate. In this study, we present a new T² statistic for analyzing microarray data. We implemented our method using a multiple forward search (MFS) algorithm that is designed for selecting a subset of feature vectors in high-dimensional microarray datasets. The proposed T² statistic is a corollary to that originally developed for multivariate analyses and possesses two prominent statistical properties. First, our method takes into account the multidimensional structure of microarray data. The utilization of the information hidden in gene interactions allows for finding genes whose differential expressions are not marginally detectable in univariate testing methods. Second, the statistic has a close relationship to discriminant analyses for classification of gene expression patterns. Our search algorithm sequentially maximizes gene expression difference/distance between two groups of genes. Including such a set of DEGs into initial feature variables may increase the power of classification rules. We validated our method by using a spike-in HGU95 dataset from Affymetrix. The utility of the new method was demonstrated by application to the analyses of gene expression patterns in human liver cancers and breast cancers. Extensive bioinformatics analyses and cross-validation of DEGs identified in the application datasets showed the significant advantages of our new algorithm.
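
For orientation, the classical two-sample Hotelling statistic for multivariate analysis can be written as below. This is the textbook form and not necessarily the exact variant the authors propose; the symbols are notation assumed here: group sizes n_1 and n_2, group mean vectors over the selected genes, group covariance matrices S_1 and S_2, and pooled covariance S.

```latex
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\top}
      \mathbf{S}^{-1}
      (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2),
\qquad
\mathbf{S} = \frac{(n_1 - 1)\,\mathbf{S}_1 + (n_2 - 1)\,\mathbf{S}_2}{n_1 + n_2 - 2}
```

With a single gene this reduces to the square of the pooled two-sample t statistic, which is why a multivariate T² can pick up genes whose differences only appear jointly with other genes.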

11.
R Abo  GD Jenkins  L Wang  BL Fridley 《PloS one》2012,7(8):e43301
Genetic variation underlying the regulation of mRNA gene expression in humans may provide key insights into the molecular mechanisms of human traits and complex diseases. Current statistical methods to map genetic variation associated with mRNA gene expression have typically applied standard linkage and/or association methods; however, when genome-wide SNP and mRNA expression data are available, performing all pairwise comparisons is computationally burdensome and may not provide optimal power to detect associations. Consideration of different approaches to account for the high dimensionality and multiple testing issues may provide increased efficiency and statistical power. Here we present a novel approach to model and test the association between genetic variation and mRNA gene expression levels in the context of gene sets (GSs) and pathways, referred to as gene set - expression quantitative trait loci analysis (GS-eQTL). The method uses GSs to initially group SNPs and mRNA expression, followed by the application of principal components analysis (PCA) to collapse the variation and reduce the dimensionality within the GSs. We applied GS-eQTL to assess the association between SNP and mRNA expression level data collected from a cell-based model system using PharmGKB and KEGG defined GSs. We observed a large number of significant GS-eQTL associations, in which the most significant associations arose between genetic variation and mRNA expression from the same GS. However, a number of associations involving genetic variation and mRNA expression from different GSs were also identified. Our proposed GS-eQTL method effectively addresses the multiple testing limitations in eQTL studies and provides biological context for SNP-expression associations.

12.
Rapid and inexpensive sequencing technologies are making it possible to collect whole genome sequence data on multiple individuals from a population. This type of data can be used to quickly identify genes that control important ecological and evolutionary phenotypes by finding the targets of adaptive natural selection, and we therefore refer to such approaches as "reverse ecology." To quantify the power gained in detecting positive selection using population genomic data, we compare three statistical methods for identifying targets of selection: the McDonald-Kreitman test, the mkprf method, and a likelihood implementation for detecting dN/dS > 1. Because the first two methods use polymorphism data we expect them to have more power to detect selection. However, when applied to population genomic datasets from human, fly, and yeast, the tests using polymorphism data were actually weaker in two of the three datasets. We explore reasons why the simpler comparative method has identified more genes under selection, and suggest that the different methods may really be detecting different signals from the same sequence data. Finally, we find several statistical anomalies associated with the mkprf method, including an almost linear dependence between the number of positively selected genes identified and the prior distributions used. We conclude that interpreting the results produced by this method should be done with some caution.
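
The McDonald-Kreitman test mentioned in this abstract compares fixed and polymorphic changes at nonsynonymous and synonymous sites in a 2x2 table. The sketch below is a minimal version of that idea, not the pipeline used in the study: the function names are hypothetical, the counts in the usage note are illustrative, and it assumes all four counts are nonzero.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

def mk_test(dn, ds, pn, ps):
    """McDonald-Kreitman summary for one gene.

    dn, ds: fixed nonsynonymous / synonymous differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within a species
    Assumes ds, ps and dn are nonzero (no pseudocount handling).
    """
    neutrality_index = (pn / ps) / (dn / ds)
    alpha = 1 - neutrality_index   # estimated share of adaptive substitutions
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    return neutrality_index, alpha, p_value
```

For example, `mk_test(7, 17, 2, 42)` gives a neutrality index well below 1, the pattern usually read as an excess of fixed nonsynonymous differences and hence a signal of positive selection.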

13.

Background

Microarray technology, like other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and interpretability of a classifier could be enhanced by performing the classification based only on selected discriminative genes. We propose a statistical method for selecting genes based on overlapping analysis of expression data across classes. This method results in a novel measure, called proportional overlapping score (POS), of a feature’s relevance to a classification task.

Results

We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results of classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves a better performance.

Conclusions

A novel gene selection method, POS, is proposed. POS analyzes the overlap of expression values across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene that allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-274) contains supplementary material, which is available to authorized users.

14.
Improving missing value estimation in microarray data with gene ontology   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: Gene expression microarray experiments produce datasets with frequent missing expression values. Accurate estimation of missing values is an important prerequisite for efficient data analysis as many statistical and machine learning techniques either require a complete dataset or their results are significantly dependent on the quality of such estimates. A limitation of the existing estimation methods for microarray data is that they use no external information but the estimation is based solely on the expression data. We hypothesized that utilizing a priori information on functional similarities available from public databases facilitates the missing value estimation. RESULTS: We investigated whether semantic similarity originating from gene ontology (GO) annotations could improve the selection of relevant genes for missing value estimation. The relative contribution of each information source was automatically estimated from the data using an adaptive weight selection procedure. Our experimental results in yeast cDNA microarray datasets indicated that by considering GO information in the k-nearest neighbor algorithm we can enhance its performance considerably, especially when the number of experimental conditions is small and the percentage of missing values is high. The increase of performance was less evident with a more sophisticated estimation method. We conclude that even a small proportion of annotated genes can provide improvements in data quality significant for the eventual interpretation of the microarray experiments. AVAILABILITY: Java and Matlab code is available on request from the authors. SUPPLEMENTARY MATERIAL: Available online at http://users.utu.fi/jotatu/GOImpute.html.
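
The baseline that this abstract extends, k-nearest-neighbor imputation on expression data alone, can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: `knn_impute` is a hypothetical name, missing values are represented as `None`, distances are root-mean-square differences over the columns both genes have observed, and the GO-based semantic similarity described in the abstract is deliberately left out.

```python
import math

def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k nearest genes (rows).

    matrix: list of rows (genes), each a list of floats or None.
    Returns a new matrix; the input is not modified.
    """
    filled = [row[:] for row in matrix]
    for g, row in enumerate(matrix):
        for col, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for h, other in enumerate(matrix):
                # A neighbor must be a different gene with this column observed.
                if h == g or other[col] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[col]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the entry as None if no neighbor qualifies
                filled[g][col] = sum(nearest) / len(nearest)
    return filled
```

A GO-aware variant along the lines the abstract describes would replace `dist` with a weighted combination of expression distance and an annotation-based dissimilarity; how that weight is chosen is not specified here.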

15.
In cancer classification, gene selection is an important data preprocessing technique, but it is a difficult task due to the large search space. Accordingly, the objective of this study is to develop a hybrid meta-heuristic Binary Black Hole Algorithm (BBHA) and Binary Particle Swarm Optimization (BPSO) (4-2) model that emphasizes gene selection. In this model, the BBHA is embedded in the BPSO (4-2) algorithm to make the BPSO (4-2) more effective and to facilitate its exploration and exploitation, further improving performance. The model is combined with the Random Forest Recursive Feature Elimination (RF-RFE) pre-filtering technique. The classifiers evaluated in the proposed framework are Sparse Partial Least Squares Discriminant Analysis (SPLSDA), k-nearest neighbor and Naive Bayes. The performance of the proposed method was evaluated on two benchmark and three clinical microarrays. The experimental results and statistical analysis confirm the better performance of the BPSO (4-2)-BBHA compared with the BBHA, the BPSO (4-2) and several state-of-the-art methods in terms of avoiding local minima, convergence rate, accuracy and number of selected genes. The results also show that the BPSO (4-2)-BBHA model can successfully identify known biologically and statistically significant genes from the clinical datasets.

16.
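Establishment of an anti-apoptotic CHO host cell line for the production of recombinant protein drugs   Total citations: 5, self-citations: 0, citations by others: 5
Engineered mammalian cells readily undergo apoptosis during large-scale culture for recombinant protein production, which terminates the production run prematurely and drives up production costs. Ammonia, a product of cell metabolism, has been shown to promote apoptosis, whereas Bcl-2, an integral mitochondrial membrane protein, inhibits apoptosis by promoting mitochondrial membrane integrity. In this work, the glutamine synthetase selection-pressure system was used to express the Chinese hamster Bcl-2 protein at high levels in engineered CHO cells. This gives the cells anti-apoptotic capacity and, because glutamine is synthesized from glutamate and ammonia, also effectively lowers the ammonia content of the culture medium, thereby inhibiting apoptosis.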

17.
JX Liu  Y Xu  CH Zheng  Y Wang  JY Yang 《PloS one》2012,7(7):e38873
Conventional gene selection methods based on principal component analysis (PCA) use only the first principal component (PC) of PCA or sparse PCA to select characteristic genes. These methods implicitly assume that the first PC plays a dominant role in gene selection. However, in a number of cases this assumption is not satisfied, so the conventional PCA-based methods usually provide poor selection results. In order to improve the performance of the PCA-based gene selection method, we put forward the gene selection method via weighting PCs by singular values (WPCS). Because different PCs have different importance, the singular values are exploited as the weights to represent the influence of different PCs on gene selection. The ROC curves and AUC statistics on artificial data show that our method outperforms the state-of-the-art methods. Moreover, experimental results on real gene expression data sets show that our method can extract more characteristic genes in response to abiotic stresses than conventional gene selection methods.

18.
MOTIVATION: DNA microarray data analysis has been used previously to identify marker genes which discriminate cancer from normal samples. However, due to the limited sample size of each study, there are few common markers among different studies of the same cancer. With the rapid accumulation of microarray data, it is of great interest to integrate inter-study microarray data to increase sample size, which could lead to the discovery of more reliable markers. RESULTS: We present a novel, simple method of integrating different microarray datasets to identify marker genes and apply the method to prostate cancer datasets. In this study, by applying a new statistical method, referred to as the top-scoring pair (TSP) classifier, we have identified a pair of robust marker genes (HPN and STAT6) by integrating microarray datasets from three different prostate cancer studies. Cross-platform validation shows that the TSP classifier built from the marker gene pair, which simply compares relative expression values, achieves high accuracy, sensitivity and specificity on independent datasets generated using various array platforms. Our findings suggest a new model for the discovery of marker genes from accumulated microarray data and demonstrate how the great wealth of microarray data can be exploited to increase the power of statistical analysis. CONTACT: leixu@jhu.edu.
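
The idea of a top-scoring pair classifier, comparing the relative expression of two genes within each sample, can be sketched as below. This is a simplified illustration under stated assumptions and not the authors' implementation: `fit_tsp` and `predict_tsp` are hypothetical names, the search is exhaustive over all gene pairs, ties in score keep the first pair found, and no cross-study integration is performed.

```python
from itertools import combinations

def fit_tsp(samples, labels):
    """Find the gene pair (i, j) whose ordering x[i] < x[j] best separates
    two classes. Returns (i, j, class_if_less, class_otherwise)."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("TSP needs exactly two classes")
    n_genes = len(samples[0])
    best = None
    for i, j in combinations(range(n_genes), 2):
        # Frequency of the ordering x[i] < x[j] within each class.
        freq = []
        for c in classes:
            rows = [s for s, y in zip(samples, labels) if y == c]
            freq.append(sum(s[i] < s[j] for s in rows) / len(rows))
        score = abs(freq[0] - freq[1])
        if best is None or score > best[0]:
            less = classes[0] if freq[0] > freq[1] else classes[1]
            other = classes[1] if less == classes[0] else classes[0]
            best = (score, i, j, less, other)
    return best[1:]

def predict_tsp(model, sample):
    """Classify one sample by the ordering of the selected gene pair."""
    i, j, less, other = model
    return less if sample[i] < sample[j] else other
```

Because the rule depends only on which of two genes is higher within a sample, it is insensitive to monotone rescaling of the measurements, which helps explain why such a rule can transfer across array platforms.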

19.
20.
Discriminating disease patients on the basis of gene expression data is a crucial problem in the clinical domain. A key step in solving this problem is finding a discriminative subset of genes among the thousands of genes on a microarray or DNA chip. Aiming to find informative genes for disease classification on microarrays, we present a gene selection method based on the forward variable (gene) selection method (FSM) and show, using typical public microarray datasets, that our method can extract a small set of genes that are crucial for discriminating different classes with very high accuracy, close to perfect classification.
