Similar literature
1.
Testing for differentially expressed genes with microarray data
This paper compares the type I error and power of the one- and two-sample t-tests, and the one- and two-sample permutation tests for detecting differences in gene expression between two microarray samples with replicates using Monte Carlo simulations. When data are generated from a normal distribution, type I errors and powers of the one-sample parametric t-test and one-sample permutation test are very close, as are the two-sample t-test and two-sample permutation test, provided that the number of replicates is adequate. When data are generated from a t-distribution, the permutation tests outperform the corresponding parametric tests if the number of replicates is at least five. For data from a two-color dye swap experiment, the one-sample test appears to perform better than the two-sample test since expression measurements for control and treatment samples from the same spot are correlated. For data from independent samples, such as the one-channel array or two-channel array experiment using reference design, the two-sample t-tests appear more powerful than the one-sample t-tests.
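As an illustration of the two-sample permutation test compared in this abstract, here is a minimal sketch. It assumes the test statistic is the absolute difference in group means and uses random relabelling rather than full enumeration; the function name, the number of permutations, and the add-one correction are choices made for this example, not details taken from the paper.

```python
import random
from statistics import mean


def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sided two-sample permutation test on the difference in means.

    x, y   : replicate measurements for one gene in the two samples
    n_perm : number of random relabellings (illustrative default)
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the group labels by shuffling the pooled data.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    control = [0.10, 0.20, 0.00, 0.15, 0.05]
    treated = [5.00, 5.10, 4.90, 5.20, 5.05]
    print(permutation_test(control, treated, n_perm=2000))
```

With only five replicates per group there are just 252 distinct relabellings, so the smallest attainable two-sided p-value is limited; this is consistent with the abstract's remark that permutation tests need an adequate number of replicates.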

2.
The ordinary-, penalized-, and bootstrap t-test, least squares and best linear unbiased prediction were compared for their false discovery rates (FDR), i.e. the fraction of falsely discovered genes, which was empirically estimated in a duplicate of the data set. The bootstrap-t-test yielded up to 80% lower FDRs than the alternative statistics, and its FDR was always as good as or better than any of the alternatives. Generally, the predicted FDR from the bootstrapped P-values agreed well with their empirical estimates, except when the number of mRNA samples is smaller than 16. In a cancer data set, the bootstrap-t-test discovered 200 differentially regulated genes at an FDR of 2.6%, and in a knock-out gene expression experiment 10 genes were discovered at an FDR of 3.2%. It is argued that, in the case of microarray data, control of the FDR takes sufficient account of the multiple testing, whilst being less stringent than Bonferroni-type multiple testing corrections. Extensions of the bootstrap simulations to more complicated test-statistics are discussed.

3.
Black tooth stain is a characteristic extrinsic discoloration commonly seen on the cervical enamel following the contour of the gingiva. To investigate the relationship between black tooth stain and the oral microbiota, we used 16S rRNA gene sequencing to compare the microbial composition of dental plaque and saliva among caries-free children with and without black stain. Dental plaque and saliva, as well as black stain, were sampled from 10 children with and 15 children without black stain. Data were analyzed using the pipeline tool MOTHUR. Student’s t-test was used to compare alpha diversities and the Mann-Whitney U test to compare the relative abundances of the microbial taxa. A total of 10 phyla, 19 classes, 32 orders, 61 families and 102 genera were detected in these samples. Shannon and Simpson diversity were found to be significantly lower in saliva samples of children with black stain. Microbial diversity was reduced in the black stain compared to the plaque samples. Actinomyces, Cardiobacterium, Haemophilus, Corynebacterium, Tannerella and Treponema were more abundant and Campylobacter less abundant in plaque samples of children with black stain. Principal component analysis demonstrated clustering among the dental plaque samples from the control group, whereas the plaque samples from the black stain group did not form a single cluster and appeared to split into two subgroups. Alterations in oral microbiota may be associated with the formation of black stain.
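The Shannon and Simpson diversity measures mentioned in this abstract can be illustrated with a short sketch. It assumes the natural-log Shannon index and the Gini-Simpson form (1 minus the sum of squared proportions); conventions differ between tools, and the exact estimators reported by MOTHUR in the study may not match these. The function names are made up for this example.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def simpson_index(counts):
    """Gini-Simpson diversity 1 - sum(p_i^2); higher means more diverse."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    # Hypothetical read counts per genus in two samples.
    even_sample = [25, 25, 25, 25]
    skewed_sample = [97, 1, 1, 1]
    print(shannon_index(even_sample), simpson_index(even_sample))
    print(shannon_index(skewed_sample), simpson_index(skewed_sample))
```

Both indices drop when a few taxa dominate a sample, which is the direction of change the abstract reports for saliva from children with black stain.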

4.

Background

The recent emergence of high-throughput automated image acquisition technologies has forever changed how cell biologists collect and analyze data. Historically, the interpretation of cellular phenotypes in different experimental conditions has been dependent upon the expert opinions of well-trained biologists. Such qualitative analysis is particularly effective in detecting subtle, but important, deviations in phenotypes. However, while the rapid and continuing development of automated microscope-based technologies now facilitates the acquisition of trillions of cells in thousands of diverse experimental conditions, such as in the context of RNA interference (RNAi) or small-molecule screens, the massive size of these datasets precludes human analysis. Thus, the development of automated methods which aim to identify novel and biologically relevant phenotypes online is one of the major challenges in high-throughput image-based screening. Ideally, phenotype discovery methods should be designed to utilize prior/existing information and tackle three challenging tasks, i.e. restoring pre-defined biologically meaningful phenotypes, differentiating novel phenotypes from known ones and distinguishing novel phenotypes from one another. Arbitrarily extracted information causes biased analysis, while combining the complete existing datasets with each new image is intractable in high-throughput screens.

Results

Here we present the design and implementation of a novel and robust online phenotype discovery method with broad applicability that can be used in diverse experimental contexts, especially high-throughput RNAi screens. This method features phenotype modelling and iterative cluster merging using improved gap statistics. A Gaussian Mixture Model (GMM) is employed to estimate the distribution of each existing phenotype, and is then used as the reference distribution in gap statistics. This method is broadly applicable to a number of different types of image-based datasets derived from a wide spectrum of experimental conditions and is suitable for adaptively processing new images which are continuously added to existing datasets. Validations were carried out on different datasets, including a published RNAi screen using Drosophila embryos [Additional files 1, 2], a dataset for cell cycle phase identification using HeLa cells [Additional files 1, 3, 4] and a synthetic dataset using polygons. Our method tackled the three aforementioned tasks effectively, with an accuracy range of 85%–90%. When our method is implemented in the context of a Drosophila genome-scale RNAi image-based screen of cultured cells aimed at identifying the contribution of individual genes towards the regulation of cell shape, it efficiently discovers meaningful new phenotypes and provides novel biological insight. We also propose a two-step procedure to modify the novelty detection method based on one-class SVM, so that it can be used for online phenotype discovery. Under different conditions, we compared the SVM-based method with our method using various datasets, and our method consistently outperformed the SVM-based method in at least two of three tasks by 2% to 5%. These results demonstrate that our method can be used to better identify novel phenotypes in image-based datasets from a wide range of conditions and organisms.

Conclusion

We demonstrate that our method can detect various novel phenotypes effectively in complex datasets. Experimental results also validate that our method performs consistently under different orders of image input, variations in starting conditions including the number and composition of existing phenotypes, and datasets from different screens. Our findings indicate that the proposed method is suitable for online phenotype discovery in diverse high-throughput image-based genetic and chemical screens.

5.
Traditional methods that aim to identify biomarkers that distinguish between two groups, like Significance Analysis of Microarrays or the t-test, perform optimally when such biomarkers show homogeneous behavior within each group and differential behavior between the groups. However, in many applications, this is not the case. Instead, a subgroup of samples in one group shows differential behavior with respect to all other samples. To successfully detect markers showing such imbalanced patterns of differential signal, a different approach is required. We propose a novel method, specifically designed for the Detection of Imbalanced Differential Signal (DIDS). We use an artificial dataset and a human breast cancer dataset to measure its performance and compare it with three traditional methods and four approaches that take imbalanced signal into account. Supported by extensive experimental results, we show that DIDS outperforms all other approaches in terms of power and positive predictive value. In a mouse breast cancer dataset, DIDS is the only approach that detects a functionally validated marker of chemotherapy resistance. DIDS can be applied to any continuous value data, including gene expression data, and in any context where imbalanced differential signal is manifested.

6.
Purpose: To address the issue of high false-positive results in FFDM, we make the first effort to develop a computer-aided diagnosis (CAD) scheme to analyze and distinguish breast lesions. Method: The breast lesion regions were first segmented and depicted on FFDM images from 106 patients. In this work, 11 gray-level gap-length matrix texture features and 12 shape features were extracted from the craniocaudal view and the mediolateral oblique view, and then Student’s t-test, Fisher-score and Relief-F were introduced to select features. We also investigated the effect of three factors, i.e., discretisation, selection methods and classifier methods, on the classification performance via analysis of variance. Finally, a classification model was constructed. Spearman’s correlation coefficient analysis was conducted to assess the internal relevance of features. Results: The proposed scheme using Student’s t-test achieved an area under the receiver operating characteristic curve (AUC) value of 0.923 at 512 bins. The AUC values are 0.884, 0.867, 0.874 and 0.901 for the low gray-level gaps emphasis (LGGE), solidity, extent, and the combined set, respectively. Solidity and extent show a correlation coefficient of 0.86 (P < 0.05). Conclusions: We present a new CAD scheme based on the contribution of the significant factors. The experimental results demonstrate that the presented scheme can be used to successfully distinguish breast carcinoma lesions and benign fibroadenoma lesions in our FFDM dataset and the MIAS dataset, which may provide a CAD method to assist radiologists in diagnosing and interpreting screening mammograms. Moreover, we found that LGGE, solidity and extent features show great potential for breast lesion classification.

7.
To identify predictive biomarkers for clinical responses to bortezomib treatment, 0.06 mL of each whole blood without any cell separation procedures was stimulated ex vivo using five agents, and eight mRNAs were quantified. In six centers, heparinized peripheral blood was prospectively obtained from 80 previously treated or untreated, symptomatic multiple myeloma (MM) patients with measurable levels of M-proteins. The blood sample was procured prior to treatment as well as 2-3 days and 1-3 weeks after the first dose of bortezomib, which was intravenously administered biweekly or weekly, during the first cycle. Six stimulant-mRNA combinations, that is, lipopolysaccharide (LPS)-granulocyte-macrophage colony-stimulating factor (GM-CSF), LPS-CXCL chemokine 10 (CXCL10), LPS-CCL chemokine 4 (CCL4), phytohemagglutinin-CCL4, zymosan A (ZA)-GM-CSF and ZA-CCL4, showed significantly higher induction in the complete and very good partial response group than in the stable and progressive disease group, as determined by both parametric (t-test) and non-parametric (unpaired Mann-Whitney test) tests. Moreover, LPS-induced CXCL10 mRNA expression was significantly suppressed 2-3 days after the first dose of bortezomib in all patients, as determined by both parametric (t-test) and non-parametric (paired Wilcoxon test) tests, whereas the complete and very good partial response group showed sustained suppression 1-3 weeks after the first dose. Thus, pretreatment LPS-CXCL10 mRNA and/or the six combinations may serve as potential biomarkers for the response to bortezomib treatment in MM patients.

8.
9.
Hair samples of 23 male professional drivers and 20 male university teachers in Hong Kong were collected, and the concentrations of Al, Sb, As, Ca, Cu, Fe, Pb, Mg, Mn, Hg, K, Sr, S, V, and Zn were measured. Both of the target groups fell within the same age group of 35–45. The washing method of using detergent and powder was found to be comparable to that of using ether. Difference in the mean concentration of each detected element in the two groups was tested by the Student's t-test and the Wilcoxon rank-sums test. Hair concentrations of Al, Sb, Pb, Mg, Mn, and K in the “Driver Group” were significantly (p<0.05) higher than those in the “Teacher Group.” On the other hand, As and Hg were found to have a higher concentration in hair of teachers. Interpretation of the findings in terms of the environmental factor and the metabolic rate was attempted.

10.
The goal of improving systemic treatment of breast cancers is to evolve from treating every patient with non-specific cytotoxic chemotherapy/hormonal therapy, to a more individually-tailored direct treatment. Although anatomic staging and histological grade are important prognostic factors, they often fail to predict the clinical course of this disease. This study aimed to develop a gene expression profile associated with breast cancers of differing grades. We extracted mRNA from FFPE archival breast IDC tissue samples (Grades I–III), including benign tumours. Affymetrix GeneChip® Human Genome U133 Plus 2.0 Arrays were used to determine gene expression profiles and validated by Q-PCR. IHC was used to detect the AXIN2 protein in all tissues. From the array data, an independent group t-test revealed that 178 genes were significantly (P ≤ 0.01) differentially expressed between three grades of malignant breast tumours when compared to benign tissues. From these results, eight genes were significantly differentially expressed in more than one comparison group and are involved in processes implicated in breast cancer development and/or progression. The two most implicated candidate genes were CLD10 and ESPTI1 as their gene expression profile from the microarray analysis was replicated in Q-PCR analyses of the original tumour samples as well as in an extended population. The IHC revealed a significant association between AXIN2 protein expression and ER status. It is readily acknowledged and established that significant differences exist in gene expression between different cancer grades. Expansion of this approach may lead to an improved ability to discriminate between cancer grade and other pathological factors.

11.
Silvopastoral systems can be a good alternative for sustainable livestock production because they can provide ecosystem services and improve animal welfare. Most farm animals live in groups and the social organization and interactions between individuals have an impact on their welfare. Therefore, the objective of this study was to describe and compare the social behaviour of cattle (Bos indicus×Bos taurus) in a silvopastoral system based on a high density of leucaena (Leucaena leucocephala) combined with guinea grass (Megathyrsus maximus), star grass (Cynodon nlemfuensis) and some trees; with a monoculture system with C. nlemfuensis, in the region of Merida, Yucatán. Eight heifers in each system were observed from 0730 to 1530 h each day for 12 consecutive days during the dry season and 12 consecutive days during the rainy season. The animals followed a rotation between three paddocks, remaining 4 days in each paddock. The vegetation was characterized in the paddocks of the silvopastoral system to estimate the average percentage of shade provided. To make a comparison between systems, we used a t test with group dispersion, and Mann–Whitney tests with the frequency of affiliative and agonistic behaviours. We assessed differences in linearity and stability of dominance hierarchies using Landau’s index and Dietz R-test, respectively. The distance of cows with respect to the centroid of the group was shorter, and non-agonistic behaviours were 62% more frequent in the intensive silvopastoral system than in the monoculture one. Heifers in the silvopastoral system had a more linear and non-random dominance hierarchy in both seasons (dry season: h’=0.964; rainy season: h’=0.988), than heifers in the monoculture system (dry season: h’=0.571, rainy season: h’=0.536). The dominance hierarchy in the silvopastoral system was more stable between seasons (R-test=0.779) than in the monoculture system (R-test=0.224). 
Our results provide the first evidence that heifers in the silvopastoral system maintain more stable social hierarchies and express more sociopositive behaviours, suggesting that animal welfare was enhanced.
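To make the linearity values reported in this abstract easier to interpret, here is a minimal sketch of Landau's basic linearity index h, computed as 12/(n³ − n) times the sum of squared deviations of each animal's number of dominated group members from (n − 1)/2. This is an assumption-laden illustration: the abstract reports h’, which usually denotes a corrected variant that handles unknown or tied relationships, and that correction is not implemented here. The matrix layout and function name are hypothetical.

```python
def landau_h(dominates):
    """Landau's linearity index h for a fully resolved dominance matrix.

    dominates[i][j] is True when individual i dominates individual j.
    Returns 1.0 for a perfectly linear hierarchy and 0.0 when every
    individual dominates the same number of group members.
    """
    n = len(dominates)
    if n < 2:
        raise ValueError("need at least two individuals")
    expected = (n - 1) / 2
    # v_i: number of group members that individual i dominates.
    wins = [
        sum(1 for j in range(n) if j != i and dominates[i][j])
        for i in range(n)
    ]
    return 12 / (n ** 3 - n) * sum((v - expected) ** 2 for v in wins)


if __name__ == "__main__":
    # Hypothetical group of four heifers ranked 0 > 1 > 2 > 3.
    linear = [[i < j for j in range(4)] for i in range(4)]
    print(landau_h(linear))
```

On this scale, the values near 1 reported for the silvopastoral system indicate an almost strictly ordered hierarchy, while values near 0.5 indicate a much less linear one.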

12.
Large-scale systematic analysis of gene essentiality is an important step toward unraveling the complex relationship between genotypes and phenotypes. Such analysis cannot be accomplished without unbiased and accurate annotations of essential genes. In current genomic databases, most of the essential gene annotations are derived from whole-genome transposon mutagenesis (TM), the most frequently used experimental approach for determining essential genes in microorganisms under defined conditions. However, there are substantial systematic biases associated with TM experiments. In this study, we developed a novel Poisson model–based statistical framework to simulate the TM insertion process and subsequently correct the experimental biases. We first quantitatively assessed the effects of major factors that potentially influence the accuracy of TM and subsequently incorporated relevant factors into the framework. By iteratively optimizing parameters, we inferred the insertion events that actually occurred and described each gene’s essentiality as a probability. Evaluated against the definitive essential gene profile of Escherichia coli, our model significantly improved the accuracy of the original TM datasets, resulting in more accurate annotations of essential genes. Our method also showed encouraging results in improving subsaturation-level TM datasets. To test our model’s broad applicability to other bacteria, we applied it to Pseudomonas aeruginosa PAO1 and Francisella tularensis novicida TM datasets. We validated our predictions by literature as well as allelic exchange experiments in PAO1. Our model was correct on six of the seven tested genes. Remarkably, in all three cases in which our predictions contradicted the TM assignments, experimental validation supported our predictions. In summary, our method will be a promising tool for improving genomic annotations of essential genes and enabling large-scale explorations of gene essentiality. Our contribution is timely considering the rapidly increasing number of essential gene sets. A Webserver has been set up to provide convenient access to this tool. All results and source code are available for download upon publication at http://research.cchmc.org/essentialgene/.

13.
Background: Golestan province in northeastern Iran has been known as a high-risk area for esophageal cancer (EC). This study was conducted to assess aflatoxin (AF) contamination of wheat flour (WF) samples in high and low EC-risk areas of Golestan province. Methods: Four WF samples were collected randomly from each of 25 active silos throughout the province in 2009. The levels of AFs were measured using high-performance liquid chromatography. Using the data of EC rates obtained from the Golestan population-based cancer registry, the province was divided into high and low risk areas for EC. Student t-test and multivariate regression analysis were used to compare the levels of aflatoxins as well as the condition of silos between the two areas. Results: One hundred WF samples were collected. The mean levels of total aflatoxin and aflatoxin B1 were 1.99 and 0.53 ng g⁻¹, respectively. The levels of total AF (p = 0.03), AFG2 (p = 0.02) and AFB1 (p = 0.003) were significantly higher in samples obtained from the high risk area. Multivariate regression analysis showed that humidity of the silo was the most important source of difference between silos of the two areas (p = 0.04). Conclusion: We found a positive relationship between AF level of WF samples and the risk of EC. So, AF contamination may be a possible risk factor for EC in our region. We also found that humidity of silos was the most important determinant of AF contamination of WF. Intensive control of silo conditions, including humidity and temperature, is needed, especially in high EC-risk areas.

14.

Background

Biclustering algorithms can find a number of co-expressed genes under a set of experimental conditions. Recently, differential co-expression bicluster mining has been used to infer reasonable patterns in two microarray datasets, such as those from normal and cancer cells.

Methods

In this paper, we propose an algorithm, DECluster, to mine Differential co-Expression biCluster in two discretized microarray datasets. Firstly, DECluster produces the differential co-expressed genes from each pair of samples in two microarray datasets, and constructs a differential weighted undirected sample–sample relational graph. Secondly, the differential biclusters are generated in the above differential weighted undirected sample–sample relational graph. In order to mine maximal differential co-expression biclusters efficiently, we design several pruning techniques for generating maximal biclusters without candidate maintenance.

Results

The experimental results show that our algorithm is more efficient than existing methods. The performance of DECluster is evaluated by empirical p-value and gene ontology; the results show that our algorithm can find more statistically significant and biologically meaningful differential co-expression biclusters than other algorithms.

Conclusions

Our proposed algorithm can find more statistically significant and biologically meaningful biclusters in two microarray datasets than the other two algorithms.

15.
Based on a novel Q-primer real-time polymerase chain reaction (PCR) system, we designed allele-specific Q-primers for the detection of three β-thalassemia mutations [Cd41/42(-TCTT), IVSI nt5 (G>C), and IVSII nt654 (C>T)] that have a high carrier frequency in Southeast Asia. With clear distinction between heterozygote and wild-type, ΔCt (threshold cycle) values were defined. The results of evaluating 139 blinded samples by our system match perfectly with those obtained by the conventional reverse dot blot (RDB) method. With a 384-well plate that included replicates in the same analysis, our throughput reached 190 reactions per run with a turnaround time as short as 130 min, and the cost of consumables was as low as $1 (US) for each test.

16.
The categorical data set is an important data class in experimental biology and contains data separable into several mutually exclusive categories. Unlike measurement of a continuous variable, categorical data cannot be analyzed with methods such as the Student's t-test. Thus, these data require a different method of analysis to aid in interpretation. In this article, we will review issues related to categorical data, such as how to plot them in a graph, how to integrate results from different experiments, how to calculate the error bar/region, and how to perform significance tests. In addition, we illustrate analysis of categorical data using experimental results from developmental biology and virology studies.
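The abstract does not say which significance tests the article recommends for categorical data. As one commonly used option, the sketch below implements a two-sided Fisher's exact test for a 2×2 table of counts, summing the probabilities of all tables that are no more likely than the observed one. The function name, the table layout, and the small tolerance used to absorb floating-point ties are choices made for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two conditions and columns the two categories.
    Returns the probability, under fixed margins, of any table at
    most as likely as the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        # Hypergeometric probability of seeing k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_obs * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical counts: phenotype present/absent in mutant vs. wild type.
    print(fisher_exact_two_sided(8, 2, 1, 9))
```

An exact test of this kind is often preferred over a chi-square approximation when cell counts are small, which is frequent in embryo or plaque scoring experiments.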

17.
Organoids enable in vitro modeling of complex developmental processes and disease pathologies. Like most 3D cultures, organoids lack sufficient oxygen supply and therefore experience cellular stress. These negative effects are particularly prominent in complex models, such as brain organoids, and can affect lineage commitment. Here, we analyze brain organoid and fetal single‐cell RNA sequencing (scRNAseq) data from published and new datasets, totaling about 190,000 cells. We identify a unique stress signature in the data from all organoid samples, but not in fetal samples. We demonstrate that cell stress is limited to a defined subpopulation of cells that is unique to organoids and does not affect neuronal specification or maturation. We have developed a computational algorithm, Gruffi, which uses granular functional filtering to identify and remove stressed cells from any organoid scRNAseq dataset in an unbiased manner. We validated our method using six additional datasets from different organoid protocols and early brains, and show its usefulness to other organoid systems including retinal organoids. Our data show that the adverse effects of cell stress can be corrected by bioinformatic analysis for improved delineation of developmental trajectories and resemblance to in vivo data.

18.
Genomics, 2019, 111(6): 1298-1305
Based on the k-mer model for protein sequence, a novel k-mer natural vector method is proposed to characterize the features of k-mers in a protein sequence, in which the numbers and distributions of k-mers are considered. It is proved that the relationship between a protein sequence and its k-mer natural vector is one-to-one. Phylogenetic analysis of protein sequences therefore can be easily performed without requiring evolutionary models or human intervention. In addition, there exists no criterion for choosing a suitable k, and k has a great influence on the results obtained as well as on computational complexity. In this paper, a compound k-mer natural vector is utilized to quantify each protein sequence. The results obtained from phylogenetic analysis on three protein datasets demonstrate that our new method can precisely describe the evolutionary relationships of proteins, and greatly improve the computing efficiency.
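The idea of describing each k-mer by its number and its distribution along the sequence can be sketched as follows. This is only an illustration under stated assumptions: for each k-mer it records the count, the mean 1-based start position, and a second central moment of the positions normalized by the count and the number of windows. The exact moments and normalization used in the paper are not given in the abstract, and the function name is hypothetical.

```python
from collections import defaultdict


def kmer_natural_vector(seq, k):
    """Map each k-mer in seq to (count, mean position, normalized 2nd moment).

    Positions are 1-based start indices of each k-mer occurrence.
    The normalization by count * number_of_windows is an assumption
    made for this sketch.
    """
    n_windows = len(seq) - k + 1
    positions = defaultdict(list)
    for i in range(n_windows):
        positions[seq[i:i + k]].append(i + 1)

    features = {}
    for kmer, pos in positions.items():
        count = len(pos)
        mu = sum(pos) / count
        d2 = sum((p - mu) ** 2 for p in pos) / (count * n_windows)
        features[kmer] = (count, mu, d2)
    return features


if __name__ == "__main__":
    # Hypothetical short peptide; real inputs would be full protein sequences.
    for kmer, feats in sorted(kmer_natural_vector("MKVLMKVA", 2).items()):
        print(kmer, feats)
```

Two sequences could then be compared by a distance between their feature vectors over a shared k-mer alphabet, with absent k-mers filled in as zeros; combining several values of k in this way would be one reading of the "compound" vector mentioned in the abstract.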

19.
The development of cancer therapies may be improved by the discovery of tumor-specific molecular dependencies. The requisite tools include genetic and chemical perturbations, each with its strengths and limitations. Chemical perturbations can be readily applied to primary cancer samples at large scale, but mechanistic understanding of hits and further pharmaceutical development is often complicated by the fact that a chemical compound has affinities to multiple proteins. To computationally infer specific molecular dependencies of individual cancers from their ex vivo drug sensitivity profiles, we developed a mathematical model that deconvolutes these data using measurements of protein-drug affinity profiles. Through integrating a drug-kinase profiling dataset and several drug response datasets, our method, DepInfeR, correctly identified known protein kinase dependencies, including the EGFR dependence of HER2+ breast cancer cell lines, the FLT3 dependence of acute myeloid leukemia (AML) with FLT3-ITD mutations and the differential dependencies on the B-cell receptor pathway in the two major subtypes of chronic lymphocytic leukemia (CLL). Furthermore, our method uncovered new subgroup-specific dependencies, including a previously unreported dependence of high-risk CLL on Checkpoint kinase 1 (CHEK1). The method also produced a detailed map of the kinase dependencies in a heterogeneous set of 117 CLL samples. The ability to deconvolute polypharmacological phenotypes into underlying causal molecular dependencies should increase the utility of high-throughput drug response assays for functional precision oncology.

20.