Similar Documents
20 similar documents found.
1.
2.
A new summarization method for Affymetrix probe level data   Total citations: 3 (self-citations: 0; citations by others: 3)
MOTIVATION: We propose a new model-based technique for summarizing high-density oligonucleotide array data at probe level for Affymetrix GeneChips. The new summarization method is based on a factor analysis model for which a Bayesian maximum a posteriori method optimizes the model parameters under the assumption of Gaussian measurement noise. Thereafter, the RNA concentration is estimated from the model. In contrast to previous methods, our new method, called 'Factor Analysis for Robust Microarray Summarization (FARMS)', supplies both P-values indicating interesting information and signal intensity values. RESULTS: We compare FARMS on Affymetrix's spike-in and Gene Logic's dilution data to established algorithms such as Affymetrix Microarray Suite (MAS) 5.0, Model Based Expression Index (MBEI) and Robust Multi-array Average (RMA). Further, we compared FARMS with 43 other methods via the 'Affycomp II' competition. The experimental results show that FARMS with default parameters outperforms previous methods if both sensitivity and specificity are considered simultaneously through the area under the receiver operating characteristic curve (AUC). We measured two quantities through the AUC: correctly detected expression changes versus wrongly detected ones (fold change), and correctly detected significantly differentially expressed genes in two sets of arrays versus wrongly detected ones (P-value). Furthermore, FARMS is computationally less expensive than RMA, MAS and MBEI. AVAILABILITY: The FARMS R package is available from http://www.bioinf.jku.at/software/farms/farms.html. SUPPLEMENTARY INFORMATION: http://www.bioinf.jku.at/publications/papers/farms/supplementary.ps
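The AUC criterion above can be computed without tracing the ROC curve: it equals the probability that a randomly chosen truly changed gene scores higher than a randomly chosen unchanged one, with ties counting one half (the Mann-Whitney formulation). The Python sketch below assumes hypothetical score lists; it is not the Affycomp II scoring code, and the name `roc_auc` is illustrative.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the Mann-Whitney probability P(pos > neg) + 0.5 * P(pos == neg)."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical |log fold change| scores for spiked-in (changed) and unchanged genes.
spiked = [2.1, 1.7, 0.9, 1.4]
unchanged = [0.2, 0.8, 1.0, 0.1, 0.3]
print(roc_auc(spiked, unchanged))  # → 0.95
```

The pairwise loop costs O(n*m); a rank-based formulation scales better for genome-wide lists but is harder to read.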

3.

Background

Genomic studies of complex tissues pose unique analytical challenges for assessment of data quality, performance of statistical methods used for data extraction, and detection of differentially expressed genes. Ideally, to assess the accuracy of gene expression analysis methods, one needs a set of genes which are known to be differentially expressed in the samples and which can be used as a "gold standard". We introduce the idea of using sex-chromosome genes as an alternative to spiked-in control genes or simulations for assessment of microarray data and analysis methods.

Results

Expression of sex-chromosome genes was used as a true internal biological control to compare alternative probe-level data extraction algorithms (Microarray Suite 5.0 [MAS5.0], Model Based Expression Index [MBEI] and Robust Multi-array Average [RMA]), to assess microarray data quality and to establish some statistical guidelines for analyzing large-scale gene expression data. These approaches were applied to a large new dataset of human brain samples. RMA-generated gene expression values were markedly less variable and more reliable than MAS5.0- and MBEI-derived values. A statistical technique controlling the false discovery rate was applied to adjust for multiple testing, as an alternative to the Bonferroni method, and showed no evidence of false-negative results. Fourteen probe sets, representing nine Y-linked and two X-linked genes, displayed significant sex differences in gene expression in the brain prefrontal cortex.

Conclusion

In this study, we have demonstrated the use of sex-chromosome genes as true biological internal controls for genomic analysis of complex tissues, and suggested analytical guidelines for testing alternative oligonucleotide microarray data extraction protocols and for adjusting for multiple testing in the statistical analysis of differentially expressed genes. Our results also provided evidence for sex differences in gene expression in the brain prefrontal cortex, supporting the notion of a putative direct role of sex-chromosome genes in the differentiation and maintenance of sexual dimorphism of the central nervous system. Importantly, these analytical approaches are applicable to all microarray studies that include male and female human or animal subjects.
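The abstract does not name the FDR-controlling procedure. As an assumption, the sketch below uses the Benjamini-Hochberg step-up rule, a common choice, and contrasts it with Bonferroni on hypothetical p-values; the function names are illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # largest rank k with p_(k) <= k * q / m
    return sorted(order[:k_max])  # reject all hypotheses up to that rank


def bonferroni(p_values, alpha=0.05):
    """Indices rejected by Bonferroni (controls the family-wise error rate)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))  # → [0, 1]
print(bonferroni(p, alpha=0.05))      # → [0]
```

Because the BH threshold grows with rank, it typically rejects at least as many hypotheses as Bonferroni, which is why FDR control is favored when thousands of probe sets are tested.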

4.

Background  

Microarray techniques are one of the main methods used to investigate the expression of thousands of genes at once and to shed light on complex biological processes responsible for serious diseases; they have great scientific impact and a wide application area. Several standalone applications have been developed to analyze microarray data. Two of the best-known free analysis software packages are the R-based Bioconductor and dChip. The part of the dChip software concerned with calculating and analyzing gene expression has been modified to permit its execution on both cluster environments (supercomputers) and Grid infrastructures (distributed computing).

5.
The objective of this study was to design and validate a next-generation sequencing (NGS) assay to detect BRCA1 and BRCA2 mutations. We developed an assay using random shearing of genomic DNA followed by RNA bait tile hybridization and NGS sequencing on both the Illumina MiSeq and the Ion Personal Genome Machine (PGM). We determined that the MiSeq Reporter software supplied with the instrument could not detect deletions greater than 9 base pairs. Therefore, we developed an alternative alignment and variant calling software, Quest Sequencing Analysis Pipeline (QSAP), that was capable of detecting large deletions and insertions. In validation studies, we used DNA from 27 stem cell lines, all with known deleterious BRCA1 or BRCA2 mutations, and DNA from 67 consented control individuals who had a total of 352 benign variants. Both the MiSeq/QSAP combination and the PGM/Torrent Suite combination had 100% sensitivity for the 379 known variants in the validation series. However, the PGM/Torrent Suite combination had lower intra- and inter-assay precision (96.2% and 96.7%, respectively) than the MiSeq/QSAP combination (100% and 99.4%, respectively). All PGM/Torrent Suite inconsistencies were false-positive variant assignments. We began commercial testing using both platforms, and in the first 521 clinical samples MiSeq/QSAP had 100% sensitivity for BRCA1/2 variants, including a 64-bp deletion and a 10-bp insertion not identified by PGM/Torrent Suite, which also suffered from a high false-positive rate. Neither the MiSeq nor the PGM platform, with its supplied alignment and variant calling software, is appropriate for a clinical laboratory BRCA sequencing test. We have developed an NGS BRCA1/2 sequencing assay, MiSeq/QSAP, with 100% analytic sensitivity and specificity in the validation set consisting of 379 variants. The MiSeq/QSAP combination has sufficient performance for use in a clinical laboratory.

6.
7.
MOTIVATION: The most commonly utilized microarrays for mRNA profiling (Affymetrix) include 'probe sets' of a series of perfect match and mismatch probes (typically 22 oligonucleotides per probe set). There is an increasing number of reported 'probe set algorithms' that differ in their interpretation of a probe set to derive a single normalized 'signal' representative of the expression of each mRNA. These algorithms are known to differ in accuracy and sensitivity, and optimization has been done using a small set of standardized control microarray data. We hypothesized that different mRNA profiling projects have varying sources and degrees of confounding noise, and that these should alter the choice of a specific probe set algorithm. We also hypothesized that use of the Microarray Suite (MAS) 5.0 probe set detection p-value as a weighting function would improve the performance of all probe set algorithms. RESULTS: We built an interactive visual analysis software tool (HCE2W) to test and define parameters in Affymetrix analyses that optimize the ratio of signal (desired biological variable) versus noise (confounding uncontrolled variables). Five probe set algorithms were studied with and without statistical weighting of probe sets using the MAS 5.0 probe set detection p-values. The signal-to-noise ratio optimization method was tested in two large novel microarray datasets with different levels of confounding noise: a 105-sample U133A human muscle biopsy dataset (11 groups: mutation-defined, extensive noise) and a 40-sample U74A inbred mouse lung dataset (8 groups: little noise). Performance was measured by the ability of the specific probe set algorithm, with and without detection p-value weighting, to cluster samples into the appropriate biological groups (unsupervised agglomerative clustering with F-measure values).
Of the total random sampling analyses, 50% showed a highly statistically significant difference between probe set algorithms by ANOVA [F(4,10) > 14, p < 0.0001], with weighting by MAS 5.0 detection p-value showing significance in the mouse data by ANOVA [F(1,10) > 9, p < 0.013] and paired t-test [t(9) = -3.675, p = 0.005]. Probe set detection p-value weighting had the greatest positive effect on the performance of the dChip difference model, ProbeProfiler and RMA algorithms. Importantly, probe set algorithms did indeed perform differently depending on the specific project, most probably due to the degree of confounding noise. Our data indicate that significantly improved data analysis of mRNA profile projects can be achieved by matching the choice of probe set algorithm to the noise levels intrinsic to a project, with the dChip difference model with MAS 5.0 detection p-value continuous weighting showing the best overall performance in both projects. Furthermore, both existing and newly developed probe set algorithms should incorporate detection p-value weighting to improve performance. AVAILABILITY: The Hierarchical Clustering Explorer 2.0 is available at http://www.cs.umd.edu/hcil/hce/. Murine arrays (40 samples) are publicly available at the PEPR resource (http://microarray.cnmcresearch.org/pgadatatable.asp; http://pepr.cnmcresearch.org; Chen et al., 2004).
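The abstract scores clustering with F-measure values but does not give the formula. A minimal sketch, assuming the common class-weighted definition: for each true biological group, take the best F score (harmonic mean of precision and recall) over all clusters, then average with weights proportional to group size. The labels and the name `clustering_f_measure` are hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-weighted F-measure of a clustering against known groups."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    overlap = Counter(zip(true_labels, cluster_labels))  # n_ij: group i in cluster j
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j  # fraction of the cluster belonging to the group
            recall = n_ij / n_i     # fraction of the group captured by the cluster
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best  # weight each group by its size
    return total


groups = ["A", "A", "A", "B", "B", "B"]
clusters = [1, 1, 2, 2, 2, 2]
print(round(clustering_f_measure(groups, clusters), 4))  # → 0.8286
```

A perfect clustering scores 1.0; merging or splitting groups lowers the score through precision or recall respectively.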

8.
9.
Background

COPD is currently the fourth leading cause of death worldwide. Statins are lipid-lowering agents with documented cardiovascular benefits. Observational studies have shown that statins may have a beneficial role in COPD. The impact of statins on blood gene expression in COPD patients is largely unknown.

Objective

To identify a blood gene signature associated with statin use in COPD patients, and the pathways underpinning this signature that could explain any potential benefits in COPD.

Methods

Whole-blood gene expression was measured in 168 statin users and 451 non-users from the ECLIPSE study using Affymetrix Human Gene 1.1 ST microarray chips. Factor Analysis for Robust Microarray Summarization (FARMS) was used to process the expression data. Differential gene expression analysis was undertaken using the Linear Models for Microarray Data (Limma) package, adjusting for propensity score and surrogate variables. Similarity of the expression signal to published gene expression profiles was assessed with ProfileChaser.

Results

Twenty-five genes were differentially expressed between statin users and non-users at an FDR of 10%, including LDLR, CXCR2, SC4MOL, FAM108A1, IFI35, FRYL, ABCG1, MYLIP, and DHCR24. The 25 genes were significantly enriched in cholesterol homeostasis and metabolism pathways. The resulting gene signature was correlated with Huntington's disease, Parkinson's disease and acute myeloid leukemia gene signatures.

Conclusion

The blood gene signature of statin use in COPD patients was enriched in cholesterol homeostasis pathways. Further studies are needed to delineate the role of these pathways in lung biology.
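The abstract does not state which enrichment test was used. One common option, shown here only as an assumption, is a one-sided hypergeometric test of whether a gene list contains more pathway members than expected by chance. The counts in the usage line are hypothetical, not taken from the ECLIPSE analysis, and `hypergeom_enrichment_p` is an illustrative name.

```python
from math import comb


def hypergeom_enrichment_p(k, n_selected, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n_selected).

    N: genes in the background, K: background genes in the pathway,
    n_selected: genes in the differential-expression list,
    k: list genes that fall in the pathway.
    """
    denom = comb(N, n_selected)
    upper = min(K, n_selected)
    # Sum the probabilities of observing k or more pathway genes in the list.
    tail = sum(comb(K, x) * comb(N - K, n_selected - x) for x in range(k, upper + 1))
    return tail / denom


# Hypothetical: 5 of 25 listed genes in a 50-gene pathway, 20,000-gene background.
print(f"{hypergeom_enrichment_p(5, 25, 50, 20000):.3g}")
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the only rounding happens in the final division.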

10.
11.
Data analysis and management represent a major challenge for gene expression studies using microarrays. Here, we compare different methods of analysis and demonstrate the utility of a personal microarray database. Gene expression during HIV infection of cell lines was studied using Affymetrix U-133 A and B chips. The data were analyzed using Affymetrix Microarray Suite and Data Mining Tool, Silicon Genetics GeneSpring, and dChip from the Harvard School of Public Health. A small-scale database was established with FileMaker Pro Developer to manage and analyze the data. There was great variability among the programs in the lists of significantly changed genes constructed from the same data. Similarly, choices of different parameters for normalization, comparison, and standardization greatly affected the outcome. Because many probe sets on the U133 chip target the same Unigene clusters, the Unigene information can be used as an internal control to confirm and interpret the probe set results. Algorithms used for the determination of changes in gene expression require further refinement and standardization. The use of a personal database powered with Unigene information can enhance the analysis of gene expression data.

12.
Summaries of Affymetrix GeneChip probe level data   Total citations: 9 (self-citations: 0; citations by others: 9)
High density oligonucleotide array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11–20 pairs of probes. In order to obtain expression measures it is necessary to summarize the probe level data. Using two extensive spike-in studies and a dilution study, we developed a set of tools for assessing the effectiveness of expression measures. We found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be significantly improved by the use of probe level summaries derived from empirically motivated statistical models. In particular, improvements in the ability to detect differentially expressed genes are demonstrated.
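One widely used model-based probe-level summary is RMA's: as implemented in Bioconductor's affy package, it fits an additive model to each probe set (log2 intensity = overall + probe effect + array effect + residual) with Tukey's median polish, after background correction and quantile normalization, which are omitted here. The pure-Python sketch below illustrates only that summarization step on a hypothetical noise-free table; it is not the affy implementation.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-6):
    """Tukey's median polish of a probes x arrays table of log2 intensities.

    Fits y[i][j] = overall + row[i] + col[j] + resid[i][j], where row holds
    probe effects and col holds array effects.
    """
    resid = [list(r) for r in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row, col = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep row (probe) medians out of the residuals into the probe effects.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        delta = median(col)  # recentre array effects into the overall term
        col = [c - delta for c in col]
        overall += delta
        # Sweep column (array) medians out of the residuals into the array effects.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        delta = median(row)  # recentre probe effects into the overall term
        row = [r - delta for r in row]
        overall += delta
        if max(abs(m) for m in row_med + col_med) < tol:
            break  # nothing left to sweep
    return overall, row, col, resid


# Hypothetical 3 probes x 3 arrays; expression per array is overall + col[j].
overall, row, col, resid = median_polish([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print([overall + c for c in col])  # → [2.0, 3.0, 4.0]
```

Medians rather than means make the summary resistant to a few outlying probes, which is the robustness argument usually given for this choice.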

13.
MOTIVATION: The power of microarray analyses to detect differential gene expression strongly depends on the statistical and bioinformatic approaches used for data analysis. Moreover, the simultaneous testing of tens of thousands of genes for differential expression raises the 'multiple testing problem', increasing the probability of obtaining false positive test results. To achieve more reliable results, it is therefore necessary to apply adjustment procedures that restrict the family-wise type I error rate (FWE) or the false discovery rate. However, for the biologist the statistical power of such procedures often remains abstract unless validated by an alternative experimental approach. RESULTS: In the present study, we discuss a multiplicity adjustment procedure applied to classical univariate as well as to recently proposed multivariate gene-expression scores. All procedures strictly control the FWE. We demonstrate that the use of multivariate scores leads to a more efficient identification of differentially expressed genes than the widely used MAS5 approach provided by the Affymetrix software tools (Affymetrix Microarray Suite 5 or GeneChip Operating Software). The practical importance of this finding is successfully validated using real-time quantitative PCR and data from spike-in experiments. AVAILABILITY: The R code of the statistical routines can be obtained from the corresponding author. CONTACT: Schuster@imise.uni-leipzig.de
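The abstract does not specify which FWE-controlling adjustment was used. As a generic illustration, not the authors' procedure, Holm's step-down method controls the FWE and is uniformly at least as powerful as Bonferroni; the p-values and the name `holm` are hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Indices rejected by Holm's step-down procedure (FWE controlled at alpha)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    rejected = []
    for step, idx in enumerate(order):
        # Compare the (step+1)-th smallest p-value with alpha / (m - step).
        if p_values[idx] <= alpha / (m - step):
            rejected.append(idx)
        else:
            break  # stop at the first non-rejection
    return sorted(rejected)


print(holm([0.01, 0.04, 0.03, 0.005], alpha=0.05))  # → [0, 3]
```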

14.

Background

High-density oligonucleotide arrays have become a valuable tool for high-throughput gene expression profiling. Increasing the array information density and improving the analysis algorithms are two important computational research topics.

Results

A new algorithm, Match-Only Integral Distribution (MOID), was developed to analyze high-density oligonucleotide arrays. Using known data from both spiking experiments and no-change experiments performed with Affymetrix GeneChip® arrays, MOID and the Affymetrix algorithm implemented in Microarray Suite 4.0 (MAS4) were compared. While MOID gave similar performance to MAS4 in the spiking experiments, better performance was observed in the no-change experiments. MOID also provides a set of alternative statistical analysis tools to MAS4. There are two main features that distinguish MOID from MAS4. First, MOID uses continuous P values for the likelihood of gene presence, while MAS4 resorts to discrete absolute calls. Secondly, MOID uses heuristic confidence intervals for both gene expression levels and fold change values, while MAS4 categorizes the significance of gene expression level changes into discrete fold change calls.

Conclusions

The results show that, by using MOID, Affymetrix GeneChip® arrays may need as few as ten probes per gene without compromising analysis accuracy.

15.
Robust and reliable covariance estimates play a decisive role in financial and many other applications. An important class of estimators is based on factor models. Here, we show by extensive Monte Carlo simulations that covariance matrices derived from the statistical Factor Analysis model exhibit a systematic error, which is similar to the well-known systematic error of the spectrum of the sample covariance matrix. Moreover, we introduce the Directional Variance Adjustment (DVA) algorithm, which diminishes the systematic error. In a thorough empirical study for the US, European, and Hong Kong stock market we show that our proposed method leads to improved portfolio allocation.

16.
17.
The Purkinje cell degeneration (PCD) mutant mouse is characterized by degeneration of cerebellar Purkinje cells and progressive ataxia. To identify the molecular mechanisms that lead to the death of Purkinje neurons in PCD mice, we used Affymetrix microarray technology to compare cerebellar gene expression profiles in pcd3J mutant mice at 14 days of age (prior to Purkinje cell loss) with those of unaffected littermates. Microarray analysis, Ingenuity Pathway Analysis (IPA) and Expression Analysis Systematic Explorer (EASE) software were used to identify biological and molecular pathways implicated in the progression of Purkinje cell degeneration. IPA analysis indicated that mutant pcd3J mice showed dysregulation of specific processes that may lead to Purkinje cell death, including several molecules known to control neuronal apoptosis such as Bad, CDK5 and PTEN. These findings demonstrate the usefulness of these powerful microarray analysis tools and have important implications for understanding the mechanisms of selective neuronal death and for developing therapeutic strategies to treat neurodegenerative disorders.

18.
Immunosuppressive drugs that significantly increased numbers of A. galli and incidences of infection were: cortisone, cortisol, 9-α-fluorohydrocortisone, 2-methyl-9-α-fluorohydrocortisone, prednisone, prednisolone, 6-mercaptopurine, 2,6-diaminopurine, 6-thioguanine, 5-bromodeoxyuridine, 5-fluorouracil, methotrexate, chlorambucil, and actinomycin D. These drugs and/or worm burdens significantly suppressed weight gains of hosts, but altered neither the male:female ratio of worms nor their growth. The following drugs neither altered worm burdens nor increased incidences of infection: corticosterone, 2-azaadenine, 8-azaguanine, azathioprine, 6-azauridine, busulfan, thio-TEPA, triethylenemelamine, vincristine, acriflavine, reserpine, and l-phenylalanine.

Worm burdens and incidences of infection were increased significantly in chickens surgically bursectomized when 3 or 14 days old, but not when 35 days old. Chicks bursectomized in ovo with testosterone propionate on Day 5 or 14 of incubation and infected on Day 14 after hatching developed significantly increased worm burdens and incidences of infection.

Applying the Kolmogorov-Smirnov test for goodness of fit to data on increased worm burdens showed that the immunosuppressive drugs or bursectomy had a normalizing effect on the statistical distribution.
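The abstract does not say which reference distribution the Kolmogorov-Smirnov test used. Assuming a normal reference with known mean and standard deviation, the one-sample statistic D is the largest gap between the empirical CDF and the reference CDF, as sketched below; the sample values and the function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma) via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def ks_statistic_normal(sample, mu, sigma):
    """One-sample KS statistic D = sup |F_n(x) - F(x)| against Normal(mu, sigma)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


print(ks_statistic_normal([-1.0, 0.0, 1.0], 0.0, 1.0))
```

For large n and fully specified parameters, D is commonly compared with about 1.36/sqrt(n) at the 5% level; if the mean and standard deviation are estimated from the same data, that critical value becomes conservative and a Lilliefors-type correction applies.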

19.

Background  

During the past decade, many software packages have been developed for the analysis and visualization of various types of microarrays. We have developed and maintained the widely used dChip, a microarray analysis software package accessible to both biologists and data analysts. However, challenges arise when dChip users want to analyze a large number of arrays automatically and to share data analysis procedures and parameters. Improvement is also needed when the dChip user support team tries to identify the causes of analysis errors or bugs reported by users.

20.
Variation at 13 microsatellite loci was surveyed in ~3,800 steelhead trout, Oncorhynchus mykiss, from 51 populations in British Columbia, Washington, and the Columbia River drainage. Mean FST over all 13 loci and 51 populations was 0.066. Regional structuring of populations was apparent, with Thompson River, upper Fraser River, and Columbia River populations forming distinct groups. In the Nass River, winter-run populations were distinct from summer-run populations. Significant differences in allele frequencies were observed among regional stock groups at all loci. Analysis of variance components indicated that 5.7% of the total observed variation was distributed among 11 regions, and 2.3% of the variation was among populations within regions. Analysis of simulated mixed-stock samples suggested that variation at the microsatellite loci provided relatively accurate and precise estimates of stock composition for fishery management applications, and this was confirmed by application to actual fishery samples of known origin. Within the Fraser River drainage, individual steelhead trout can be assigned to one of the three regions of origin with an accuracy of 94–97%. Microsatellites provided an effective way to determine population structure and provided reliable estimates of stock composition in mixed-stock fisheries.
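The abstract does not state which FST estimator was used. As an assumption, the sketch computes the simple Nei-style quantity (H_T - H_S)/H_T for one locus with equally weighted populations; microsatellite studies often use sample-size-corrected estimators (for example Weir and Cockerham's theta) instead. The allele frequencies and the name `fst_from_allele_freqs` are hypothetical.

```python
def fst_from_allele_freqs(pop_freqs):
    """Nei-style F_ST = (H_T - H_S) / H_T for one locus.

    pop_freqs: one allele-frequency list per population (each sums to 1),
    with populations weighted equally.
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # H_S: mean expected heterozygosity within populations.
    h_s = sum(1.0 - sum(p * p for p in freqs) for freqs in pop_freqs) / n_pops
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(freqs[a] for freqs in pop_freqs) / n_pops for a in range(n_alleles)]
    h_t = 1.0 - sum(p * p for p in mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Two hypothetical populations with strongly diverged frequencies at a biallelic locus.
print(round(fst_from_allele_freqs([[0.9, 0.1], [0.1, 0.9]]), 3))  # → 0.64
```

Identical populations give 0, and the value rises toward 1 as populations become fixed for different alleles.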


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417