Similar Literature
20 similar documents found (search time: 46 ms)
1.
Microarray analysis makes it possible to determine the relative expression of thousands of genes simultaneously. It has gained popularity at a rapid rate, but many caveats remain. In an effort to establish reliable microarray protocols for sweetpotato [Ipomoea batatas (L.) Lam.], we compared the effect of replication number and image analysis software with results obtained by quantitative real-time PCR (Q-RT-PCR). Sweetpotato storage root development is the most economically important process in sweetpotato. In order to identify genes that may play a role in this process, RNA for microarray analysis was extracted from sweetpotato fibrous and storage roots. Four data sets, Spot4, Spot6, Finder4 and Finder6, were created using 4 or 6 replications, and the image analysis software UCSF Spot or TIGR Spotfinder was used for spot detection and quantification. The ability of these methods to identify significant differential expression between treatments was investigated. The data sets with 6 replications were better at identifying genes with significant differential expression than those with 4 replications. Furthermore, when using 6 replicates, UCSF Spot was superior to TIGR Spotfinder in identifying differentially expressed genes (18 out of 19) based on Q-RT-PCR. Our study shows the importance of proper replication number and image analysis for microarray studies.

2.
BACKGROUND: Variations in DNA copy number carry information on the modalities of genome evolution and mis-regulation of DNA replication in cancer cells. Their study can help localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of different high-throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand. This problem encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual. RESULTS: We present a segmentation method named generalized fused lasso (GFL) to reconstruct copy number variant regions, which is based on penalized estimation and is capable of processing multiple signals jointly. Our approach is computationally very attractive and leads to sensitivity and specificity levels comparable to those of state-of-the-art specialized methodologies. We illustrate its applicability with simulated and real data sets. CONCLUSIONS: The flexibility of our framework makes it applicable to data obtained with a wide range of technologies. Its versatility and speed make GFL particularly useful in the initial screening stages of large data sets.
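The penalized-estimation idea behind methods like GFL can be illustrated on a single signal. The sketch below is not the authors' GFL solver (which uses a fused-lasso penalty and processes multiple signals jointly); it is a toy penalized least-squares change-point segmentation that recovers piecewise-constant copy-number levels from a 1-D signal:

```python
def segment(y, penalty):
    """Penalized least-squares segmentation of a 1-D signal into
    piecewise-constant regions (optimal-partitioning dynamic program).
    Returns half-open (start, end) segments."""
    n = len(y)
    best = [0.0] * (n + 1)   # best[t]: minimal cost of segmenting y[:t]
    back = [0] * (n + 1)     # back[t]: start of the last segment
    best[0] = -penalty
    for t in range(1, n + 1):
        cands = []
        for s in range(t):
            seg = y[s:t]
            m = sum(seg) / len(seg)
            sse = sum((v - m) ** 2 for v in seg)  # within-segment error
            cands.append((best[s] + sse + penalty, s))
        best[t], back[t] = min(cands)
    segs, t = [], n          # backtrack the breakpoints
    while t > 0:
        s = back[t]
        segs.append((s, t))
        t = s
    return segs[::-1]
```

For example, `segment([0, 0, 0, 0, 5, 5, 5, 5], 1.0)` recovers the two constant regions `[(0, 4), (4, 8)]`. This dynamic program is O(n²); convex-optimization solvers such as GFL are what make joint, large-scale screening practical.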

3.
Time course studies with microarray techniques and experimental replicates are very useful in biomedical research. We present, in replicate experiments, an alternative approach to select and cluster genes according to a new measure for association between genes. First, the procedure normalizes and standardizes the expression profile of each gene, and then, identifies scaling parameters that will further minimize the distance between replicates of the same gene. Then, the procedure filters out genes with a flat profile, detects differences between replicates, and separates genes without significant differences from the rest. For this last group of genes, we define a mean profile for each gene and use it to compute the distance between two genes. Next, a hierarchical clustering procedure is proposed, a statistic is computed for each cluster to determine its compactness, and the total number of classes is determined. For the rest of the genes, those with significant differences between replicates, the procedure detects where the differences between replicates lie, and assigns each gene to the best fitting previously identified profile or defines a new profile. We illustrate this new procedure using simulated data and a representative data set arising from a microarray experiment with replication, and report interesting results.

4.
5.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced, to show the possible sources of variation that are present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost effective. Normalisation by centring on a small number of spots may reduce array effects, but can introduce considerable variation in the results. Centring using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression can introduce variability. Except for normalisation methods based on spiking controls, all normalisation requires that most genes are not differentially expressed. Methods based on spatial location and/or intensity also require that the non-differentially expressed genes are distributed at random with respect to location and intensity. Spotting designs should be planned carefully so that spot replicates are widely spaced on the array, and genes with similar expression patterns are not clustered together.
DISCUSSION: The tools for statistical design of experiments can be applied to microarray experiments to improve both efficiency and validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.
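The efficiency claims above follow from textbook variance arithmetic. Assuming a simple additive model with biological variance `sb2` and per-array technical variance `st2` (names are illustrative, not from the article), the variance of a treatment-mean estimate under three common designs can be compared directly:

```python
def var_biological_reps(sb2, st2, n):
    # n biological samples, each hybridized to its own array:
    # both noise sources are averaged over n
    return (sb2 + st2) / n

def var_pooled(sb2, st2, n):
    # pool of n biological samples on a single array:
    # biological noise averages within the pool, technical noise does not
    return sb2 / n + st2

def var_technical_reps(sb2, st2, n):
    # one biological sample hybridized to n arrays:
    # biological noise is never averaged away
    return sb2 + st2 / n
```

With biological variation dominant (sb2 = 1, st2 = 0.1, n = 4), technical replication is the worst design and pooling comes close to full biological replication, matching the "very inefficient" verdict on whole-array technical replicates; when technical variation dominates (sb2 = 0.1, st2 = 1), technical replication beats pooling, matching the "can be cost effective" caveat.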

6.
MOTIVATION: Due to advances in experimental technologies such as microarray, mass spectrometry and nuclear magnetic resonance, it is feasible to obtain large-scale data sets in which measurements for a large number of features can be collected simultaneously. However, the sample sizes of these data sets are usually small due to their relatively high costs, which leads to the issue of concordance among different data sets collected for the same study: features should behave consistently across data sets. Rigorous statistical methods for evaluating this concordance or discordance have been lacking. METHODS: Based on a three-component normal-mixture model, we propose two likelihood ratio tests for evaluating the concordance and discordance between two large-scale data sets with two sample groups. Parameter estimation is achieved through the expectation-maximization (EM) algorithm. A normal-distribution-quantile-based method is used for data transformation. RESULTS: To evaluate the proposed tests, we conducted simulation studies, which suggested satisfactory performance. As applications, the proposed tests were applied to three SELDI-MS data sets with replicates: one has replicates from different platforms and the other two have replicates from the same platform. We found that data generated by SELDI-MS showed satisfactory concordance between replicates from the same platform but unsatisfactory concordance between replicates from different platforms. AVAILABILITY: The R code is freely available at http://home.gwu.edu/~ylai/research/Concordance.

7.
8.
Epigenome-wide association studies (EWAS) have focused primarily on DNA methylation as a chemically stable and functional epigenetic modification. However, the stability and accuracy of the measurement of methylation in different tissues and extraction types is still being actively studied, and the longitudinal stability of DNA methylation in commonly studied peripheral tissues is of great interest. Here, we used data from two studies, three tissue types, and multiple time points to assess the stability of DNA methylation measured with the Illumina Infinium HumanMethylation450 BeadChip array. Redundancy analysis enabled visual assessment of agreement of replicate samples overall and showed good agreement after removing effects of tissue type, age, and sex. At the probe level, analysis of variance contrasts separating technical and biological replicates clearly showed better agreement between technical replicates versus longitudinal samples, and suggested increased stability for buccal cells versus blood or blood spots. Intraclass correlations (ICCs) demonstrated that inter-individual variability is of similar magnitude to within-sample variability at many probes; however, as inter-individual variability increased, so did ICC. Furthermore, we were able to demonstrate decreasing agreement in methylation levels with time, despite a maximal sampling interval of only 576 days. Finally, at 6 popular candidate genes, there was a large range of stability across probes. Our findings highlight important sources of technical and biological variation in DNA methylation across different tissues over time. These data will help to inform longitudinal sampling strategies of future EWAS.
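The ICC behavior described above (ICC rises only when between-individual spread exceeds replicate noise) can be made concrete with a minimal one-way random-effects ICC(1,1); this is a standard textbook formula, not the authors' exact pipeline:

```python
def icc_oneway(groups):
    """One-way random-effects intraclass correlation ICC(1,1).
    groups: list of subjects, each a list of k replicate measurements."""
    n = len(groups)                         # number of subjects
    k = len(groups[0])                      # replicates per subject
    grand = sum(v for g in groups for v in g) / (n * k)
    means = [sum(g) / k for g in groups]
    # between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((v - m) ** 2
              for g, m in zip(groups, means) for v in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

When replicates agree perfectly and subjects differ (e.g. `[[1, 1], [2, 2], [3, 3]]`) the ICC is 1; when all variability sits within subjects (e.g. `[[1, 2], [1, 2], [1, 2]]`) the ICC is at its floor, the situation the abstract describes for probes where within-sample variability rivals inter-individual variability.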

9.
Question: What are the effects of the number of presences on models generated with multivariate adaptive regression splines (MARS)? Do these effects vary with data quality and quantity and species ecology? Location: Spain and Ecuador. Methods: We used two data sets: (1) two trees from Spain, representing high-occurrence-number data sets with real absences and unbalanced prevalence; (2) two herbs from Ecuador, representing low-occurrence-number data sets without real absences and balanced prevalence. For model quality, we used two different measures: reliability and stability. For each sample size, different replicates were generated at random and then used to generate a consensus model. Results: Model reliability and stability decrease with decreasing sample size. Optimal minimum sample size varies depending on many factors, many of which are unknown. Regional niche variation and ecological heterogeneity are critical. Conclusions: (1) Model predictive power improves greatly with more than 18-20 presences. (2) Model reliability depends on data quantity and quality as well as species ecological characteristics. (3) Depending on the number of presences in the data set, investigators must carefully distinguish between models that should be treated with skepticism and those whose predictions can be applied with reasonable confidence. (4) For species combining few initial presences and wide environmental range variation, it is advisable to generate several replicate models that partition the initial data and generate a consensus model. (5) Models of species with a narrow environmental range variation can be highly stable and reliable, even when generated with few presences.

10.
BACKGROUND: RT-qPCR is a common tool for quantification of gene expression, but its accuracy depends on the choice and stability (steady-state expression levels) of the reference gene(s) used for normalization. To date, in the bone field, there have been few studies to determine the most stable reference genes and, usually, RT-qPCR data are normalised to non-validated reference genes, most commonly GAPDH, ACTB and 18S rRNA. Here we draw attention to the potential deleterious impact of using classical reference genes to normalise expression data for bone studies without prior validation of their stability. RESULTS: Using the geNorm and NormFinder programs, panels of mouse and human genes were assessed for their stability under three different experimental conditions: 1) disease progression of Crouzon syndrome (craniosynostosis) in a mouse model; 2) proliferative culture of cranial suture cells isolated from craniosynostosis patients; and 3) osteogenesis of a mouse bone marrow stromal cell line. We demonstrate that classical reference genes are not always the most 'stable' genes and that gene 'stability' is highly dependent on experimental conditions. Selected stable genes, individually or in combination, were then used to normalise osteocalcin and alkaline phosphatase gene expression data during cranial suture fusion in the craniosynostosis mouse model, and the strategies were compared. Strikingly, the expression trends of alkaline phosphatase and osteocalcin varied significantly when normalised to the least stable, the most stable or the three most stable genes. CONCLUSION: To minimise errors in evaluating gene expression levels, analysis of a reference panel and subsequent normalisation to several stable genes is strongly recommended over normalisation to a single gene. In particular, we conclude that use of single, non-validated "housekeeping" genes such as GAPDH, ACTB and 18S rRNA, currently a widespread practice by researchers in the bone field, is likely to produce data of questionable reliability when changes are 2-fold or less, and such data should be interpreted with due caution.
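The geNorm stability measure M used above can be sketched in a few lines. This follows the spirit of the published definition (a gene's M is the average standard deviation of its pairwise log-ratios with every other candidate, lower meaning more stable), but it is an illustrative reimplementation, not the geNorm program itself:

```python
import math

def genorm_m(expr):
    """geNorm-style stability M for each candidate reference gene.
    expr: dict gene -> expression values across the same samples."""
    def sd(xs):  # sample standard deviation
        m = sum(xs) / len(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    M = {}
    for g, vg in expr.items():
        sds = [sd([math.log2(a / b) for a, b in zip(vg, vh)])
               for h, vh in expr.items() if h != g]
        M[g] = sum(sds) / len(sds)
    return M
```

A gene whose expression tracks the other candidates proportionally across samples (constant log-ratio) scores a low M; a gene that drifts relative to them scores high, which is exactly why a "classical" reference gene can rank as unstable under a given experimental condition.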

11.
Domain-enhanced analysis of microarray data using GO annotations
MOTIVATION: New biological systems technologies give scientists the ability to measure thousands of bio-molecules including genes, proteins, lipids and metabolites. We use domain knowledge, e.g. the Gene Ontology, to guide analysis of such data. By focusing on domain-aggregated results at, say, the molecular function level, increased interpretability is available to biological scientists beyond what is possible if results are presented at the gene level. RESULTS: We use a 'top-down' approach to perform domain aggregation by first combining gene expressions before testing for differentially expressed patterns. This is in contrast to the more standard 'bottom-up' approach, where genes are first tested individually and then aggregated by domain knowledge. The benefit is greater sensitivity for detecting signals. Our method, domain-enhanced analysis (DEA), is assessed and compared to other methods using simulation studies and analysis of two publicly available leukemia data sets. AVAILABILITY: Our DEA method uses functions available in R (http://www.r-project.org/) and SAS (http://www.sas.com/). The two experimental data sets used in our analysis are available in R as Bioconductor packages, 'ALL' and 'golubEsets' (http://www.bioconductor.org/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
MOTIVATION: Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering, however, can also stratify patients by similarity of their gene expression profiles, thereby defining novel disease entities based on molecular characteristics. Several distance-based cluster algorithms have been suggested, but little attention has been given to the distance measure between patients. Even with the Euclidean metric, including and excluding genes from the analysis leads to different distances between the same objects, and consequently different clustering results. RESULTS: We describe a new clustering algorithm, in which gene selection is used to derive biologically meaningful clusterings of samples by combining expression profiles and functional annotation data. According to gene annotations, candidate gene sets with specific functional characterizations are generated. Each set defines a different distance measure between patients, leading to different clusterings. These clusterings are filtered using a resampling-based significance measure. Significant clusterings are reported together with the underlying gene sets and their functional definition. CONCLUSIONS: Our method reports clusterings defined by biologically focused sets of genes. In annotation-driven clusterings, we have recovered clinically relevant patient subgroups through biologically plausible sets of genes as well as new subgroupings. We conjecture that our method has the potential to reveal so far unknown, clinically relevant classes of patients in an unsupervised manner. AVAILABILITY: We provide the R package adSplit as part of Bioconductor release 1.9 and on http://compdiag.molgen.mpg.de/software.

13.
The utility of previously generated microarray data is severely limited owing to small study sizes, leading to under-powered analyses and failure of replication. Multiplicity of platforms and various sources of systematic noise limit the ability to compile existing data from similar studies. We present a model for transformation of data across different generations of Affymetrix arrays, developed using previously published datasets describing technical replicates performed with two generations of arrays. The transformation is based upon a probe set-specific regression model, generated from replicate measurements across platforms, using correlation coefficients. The model, when applied to the expression intensities of 5069 shared, sequence-matched probe sets in three different generations of Affymetrix Human oligonucleotide arrays, showed significant improvement in inter-generation correlations between sample-wide means and individual probe set pairs. The approach was further validated by an observed reduction in Euclidean distance between signal intensities across generations for the predicted values. Finally, application of the model to independent, but related, datasets resulted in improved clustering of samples based upon their biological, as opposed to technical, attributes. Our results suggest that this transformation method is a valuable tool for integrating microarray datasets from different generations of arrays.
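The per-probe-set transformation idea can be sketched with an ordinary least-squares line fitted for each probe set from cross-platform replicate pairs. The published model is built around correlation coefficients, so treat this as an illustrative stand-in rather than the authors' exact method:

```python
def fit_probe_models(old_vals, new_vals):
    """Fit new = a + b * old per probe set, from replicate measurements
    of the same samples on the old and new array generations.
    old_vals / new_vals: dict probe_set -> list of paired intensities."""
    models = {}
    for p in old_vals:
        x, y = old_vals[p], new_vals[p]
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
        models[p] = (my - b * mx, b)       # (intercept, slope)
    return models

def transform(models, probe, value):
    """Map an old-generation intensity onto the new platform's scale."""
    a, b = models[probe]
    return a + b * value
```

Once fitted on the replicate data, the models map intensities from the old platform onto the new platform's scale probe set by probe set, which is what allows data sets from different generations to be pooled and clustered by biology rather than by platform.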

14.
15.
16.
Kim BS, Rha SY, Cho GB, Chung HC. Genomics 2004, 84(2): 441-448
Replication is a crucial aspect of microarray experiments, due to various sources of errors that persist even after systematic effects are removed. It has been confirmed that replication in microarray studies is not equivalent to duplication, and hence it is not a waste of scientific resources. Replication and reproducibility are the most important issues for microarray application in genomics. However, little attention has been paid to the assessment of reproducibility among replicates. Here we develop, using Spearman's footrule, a new measure of the reproducibility of cDNA microarrays, which is based on how consistently a gene's relative rank is maintained in two replicates. The reproducibility measure, termed index.R, has an R²-type operational interpretation. Index.R assesses reproducibility at the initial stage of microarray data analysis, even before normalization is done. We first define three layers of replicates, biological, technical, and hybridizational, which refer to different biological units, different mRNAs from the same tissue, and separate cDNAs from a cDNA pool. As the replicate layer moves down to a lower level, the experiment has fewer sources of errors and thus is expected to be more reproducible. To validate the method we apply index.R to two sets of controlled cDNA microarray experiments, each of which has two or three layers of replicates. Index.R shows a uniform increase as the layer of the replicates moves into a more homogeneous environment. We also note that index.R has a larger jump size than Pearson's correlation or Spearman's rank correlation for each replicate-layer move, and therefore it has greater expandability as a measure in [0,1] than these two other measures.
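A footrule-based reproducibility score can be sketched as follows. The normalization shown (dividing by the maximum possible footrule distance) is one plausible choice for landing in [0,1] and is not necessarily the published definition of index.R; ties in intensities are ignored for simplicity:

```python
def footrule_index(x, y):
    """Footrule-based reproducibility in [0, 1] for two replicate
    intensity vectors: 1 means identical gene rankings, 0 means
    maximally discordant rankings."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    d = sum(abs(a - b) for a, b in zip(rx, ry))   # Spearman's footrule
    dmax = len(x) ** 2 // 2                       # maximum footrule distance
    return 1 - d / dmax
```

Two replicates that rank every gene identically score 1.0, and completely reversed rankings score 0.0; in between, the score drops in proportion to how far each gene's rank drifts between the two replicates.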

17.
Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single-gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite-feature classifiers do not outperform simple single-gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set and (4) the heterogeneity of the data set on the performance of composite-feature and single-gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite-feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single-gene sets is similar to the stability of composite-feature sets. Based on these results there is currently no reason to prefer prognostic classifiers based on composite features over single-gene classifiers for predicting outcome in breast cancer.

18.
We previously mapped early-activated replication origins in the promoter regions of five abundantly transcribed genes in the slime mold Physarum polycephalum. This physical linkage between origins and genes is congruent with the preferential early replication of active genes in mammalian cells. To determine how general this replicational organization is in the synchronous plasmodium of Physarum, we analyzed the replication of three weakly expressed genes. Bromodeoxyuridine (BrdUrd) density-shift and gene dosage experiments indicated that the redB (regulated in development) and redE genes replicate early, whereas redA replicates in mid-S phase. Two-dimensional gel electrophoresis revealed that redA coincides with an origin that appears to be activated within a large temporal window in S phase, so that the replication of the gene is not well defined temporally. The early replication of the redB and redE genes is due to the simultaneous activation of flanking origins at the onset of S phase. As a result, these two genes correspond to termination sites of DNA replication. Our data demonstrate that not all Physarum promoters are preferred sites of initiation but, so far, all the expressed genes analyzed in detail either coincide with a replication origin or are embedded in a cluster of early-firing replicons.

19.
The purpose of this study is to determine the kinetics of the replication of intrachromosomal versus extrachromosomal amplified dihydrofolate reductase (DHFR) genes. Previous studies reported that the DHFR gene, when carried intrachromosomally on a homogeneously staining region, replicates (as a unit) within the first 2 h of the S phase of the cell cycle. We wished to determine whether the extrachromosomal location of the amplified genes carried on double minute chromosomes affects the timing of their replication. Equilibrium cesium chloride ultracentrifugation was used to separate newly replicated (BUdR-labeled) DNA from bulk DNA in a synchronized cell population. Hybridization with the cDNA for the DHFR gene allowed us to determine the period of time within the cell cycle in which the DHFR DNA sequences were replicated. We found that, in contrast to intrachromosomal dihydrofolate reductase genes that uniformly replicate as a unit at the beginning of the S phase of the cell cycle, dihydrofolate reductase genes carried on double minute chromosomes (DMs) replicate throughout the S phase of the cell cycle. These results suggest that control of replication of extrachromosomal DNA sequences may differ from that of intrachromosomal sequences.

20.
Reliable reference genes are critical for relative quantification using quantitative real-time PCR (qPCR). Ten tomato (Solanum lycopersicum) genes and their respective primer sets, which have been used over the last 6 years as references in expression studies, were evaluated for their performance using leaf tissue samples grown under semi-controlled conditions and infected with grey mould (Botrytis cinerea) or late blight (Phytophthora infestans). The target genes coding for U6 snRNA-associated Sm-like protein LSm7, calcineurin B-like protein and V-type proton ATPase were the most stably expressed of all the genes tested in three experimental repetitions. Evaluation of candidate reference genes with the geNorm and NormFinder software yielded the lowest mean values for their respective primer sets LSM7, SlCBL1 and SlATPase, suggesting stable expression. However, the SlATPase primer set revealed a comparably high intra-group variation and was thus not considered further. In follow-up experiments with P. infestans, the geNorm and NormFinder values of primer sets LSM7 and SlCBL1 were even lower, indicating the stability of their expression also under these conditions. Primer efficiency differed by -18 to +5 percentage points from values presented in the literature. Our findings show that a reference primer set which delivers the best results in one system may be outperformed by another under different experimental conditions, thus recommending a reassessment of both expression stability and qPCR efficiency whenever the biological or technical experimental set-up is changed. On the basis of our results, we recommend the use of LSM7 and SlCBL1 as reference primer sets for gene expression studies on plant tissue derived from open or semi-controlled conditions.
