Similar articles
 20 similar articles found
1.
A certain minimal amount of RNA from biological samples is necessary to perform a microarray experiment with suitable replication. In some cases, the amount of RNA available is insufficient, necessitating RNA amplification prior to target synthesis. However, there is some uncertainty about the reliability of targets generated from amplified RNA, because of nonlinearity and preferential amplification. This work develops a straightforward strategy for assessing the reliability of microarray data obtained from amplified RNA. The tabular method we developed, which utilises a Down-Up-Missing-Below (DUMB) classification scheme, shows that microarrays generated with amplified RNA targets are reliable within constraints. There was an increase in false negatives because of the need for increased filtering. Furthermore, this analysis method is generic and can be broadly applied to evaluate all microarray data. A copy of the Microsoft Excel spreadsheet is available upon request from Edward Bearden.
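The abstract describes the DUMB scheme only in outline, so the following is a hedged sketch of how a cross-tabulation of per-gene calls from amplified versus unamplified targets might be organised; the threshold, column names and exact classification rules are illustrative assumptions, not the authors' published definitions.

```python
import pandas as pd

def dumb_class(log_ratio, detected, threshold=1.0):
    """Classify one spot as Down / Up / Missing / Below (DUMB) -- illustrative rules."""
    if not detected:
        return "Missing"
    if abs(log_ratio) < threshold:
        return "Below"          # change too small to call
    return "Up" if log_ratio > 0 else "Down"

def concordance_table(amplified, unamplified):
    """Cross-tabulate per-gene DUMB calls from amplified vs unamplified targets.

    amplified / unamplified: DataFrames indexed by gene with columns
    'log_ratio' and 'detected' (hypothetical layout).
    """
    calls_a = amplified.apply(lambda r: dumb_class(r.log_ratio, r.detected), axis=1)
    calls_u = unamplified.apply(lambda r: dumb_class(r.log_ratio, r.detected), axis=1)
    return pd.crosstab(calls_u, calls_a,
                       rownames=["unamplified"], colnames=["amplified"])
```

Agreement along the diagonal of such a table would correspond to the kind of reliability assessment the abstract describes.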

2.
3.
Assessing genome-wide statistical significance is an important issue in genetic studies. We describe a new resampling approach for determining appropriate thresholds for statistical significance. Our simulation results demonstrate that the proposed approach accurately controls the genome-wide type I error rate, even in large-p, small-n situations.
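The abstract does not spell out the resampling algorithm; a common generic recipe for a permutation-based genome-wide threshold is to shuffle the phenotype, record the maximum test statistic across all markers in each permutation, and take an upper quantile of those maxima. The sketch below follows that recipe with a marker-phenotype correlation as an illustrative test statistic; the authors' statistic and resampling scheme may differ.

```python
import numpy as np

def genomewide_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Permutation-based genome-wide significance threshold.

    genotypes: (n_samples, n_markers) array (assumed to have no monomorphic markers)
    phenotype: (n_samples,) array
    Returns the (1 - alpha) quantile of the permutation distribution of the
    maximum absolute marker-phenotype correlation.
    """
    rng = np.random.default_rng(seed)
    # Standardise once so correlations reduce to scaled dot products.
    g = (genotypes - genotypes.mean(0)) / genotypes.std(0)
    max_stats = np.empty(n_perm)
    for b in range(n_perm):
        y = rng.permutation(phenotype)
        y = (y - y.mean()) / y.std()
        # |correlation| of every marker with the permuted phenotype
        max_stats[b] = np.abs(g.T @ y).max() / len(y)
    return np.quantile(max_stats, 1 - alpha)
```

Because the maximum is taken over all markers in each permutation, declaring significant only those markers whose observed statistic exceeds this threshold controls the genome-wide (family-wise) type I error rate even when the number of markers far exceeds the sample size.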

4.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and how this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced to show the possible sources of variation present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost effective. Normalisation by centring on a small number of spots may reduce array effects, but can introduce considerable variation in the results. Centring using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression can introduce variability. Except for methods based on spiking controls, all normalisation requires that most genes are not differentially expressed. Methods based on spatial location and/or intensity also require that the non-differentially expressed genes are distributed at random with respect to location and intensity. Spotting designs should be laid out carefully so that spot replicates are widely spaced on the array and genes with similar expression patterns are not clustered. DISCUSSION: The tools for statistical design of experiments can be applied to microarray experiments to improve both the efficiency and the validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.
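To make the pooling-versus-technical-replication argument concrete, here is a hedged back-of-the-envelope calculation using a simplified two-component variance model (biological plus technical variance); the paper's own model contains many more terms, and the numeric values below are arbitrary.

```python
def var_of_mean(sigma_b2, sigma_t2, n_units, pool_size=1, tech_reps=1):
    """Variance of the estimated mean expression under a simplified model.

    sigma_b2:  biological (between-sample) variance
    sigma_t2:  technical (per-array) variance
    n_units:   number of independent biological samples or pools
    pool_size: biological samples pooled onto each array
    tech_reps: arrays hybridised per (pooled) sample
    """
    return sigma_b2 / (n_units * pool_size) + sigma_t2 / (n_units * tech_reps)

# Illustrative numbers: sigma_b2 = 4, sigma_t2 = 1, budget of 6 arrays.
print(var_of_mean(4, 1, n_units=6))                  # 6 arrays, 1 sample each
print(var_of_mean(4, 1, n_units=6, pool_size=3))     # 6 arrays, pools of 3
print(var_of_mean(4, 1, n_units=3, tech_reps=2))     # 3 samples x 2 technical reps
```

With these illustrative numbers, pooling reduces the variance of the estimated mean for a fixed number of arrays, while replacing independent biological samples with technical replicates inflates it, in line with the abstract's conclusions.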

5.
Distance sampling is a technique for estimating the abundance of animals or other objects in a region, allowing for imperfect detection. This paper evaluates the statistical efficiency of the method when its assumptions are met, both theoretically and by simulation. The theoretical component of the paper is a derivation of the asymptotic variance penalty for the distance sampling estimator arising from uncertainty about the unknown detection parameters. This asymptotic penalty factor is tabulated for several detection functions. It is typically at least 2 but can be much higher, particularly for steeply declining detection rates. The asymptotic result relies on a model which makes the strong assumption that objects are uniformly distributed across the region. The simulation study relaxes this assumption by incorporating overdispersion when generating object locations. Distance sampling and strip transect estimators are calculated for simulated data, for a variety of overdispersion factors, detection functions, sample sizes and strip widths. The simulation results confirm the theoretical asymptotic penalty in the non-overdispersed case. For a more realistic overdispersion factor of 2, distance sampling estimation outperforms strip transect estimation when a half-normal detection function is correctly assumed, confirming previous literature. When the hazard-rate model is correctly assumed, strip transect estimators have lower mean squared error than the usual distance sampling estimator when the strip width is close enough to its optimal value (±75% when there are 100 detections; ±50% when there are 200 detections). Whether the ecologist can set the strip width sufficiently accurately will depend on the circumstances of each particular study.
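For readers unfamiliar with the estimator under discussion, the sketch below shows a minimal line-transect distance sampling estimate with a half-normal detection function: sigma is fitted by maximum likelihood from the truncated perpendicular distances and converted into an effective strip half-width. It is an illustration of the standard estimator, not the paper's simulation code, and the bounded search range for log sigma is an arbitrary choice.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def halfnormal_density_estimate(distances, total_line_length, w):
    """Line-transect distance sampling with a half-normal detection function.

    distances:         perpendicular detection distances, truncated at w
    total_line_length: combined length of the transect lines
    Returns the estimated density (objects per unit area).
    """
    d = np.asarray(distances, dtype=float)

    # Negative log-likelihood under g(x) = exp(-x^2 / (2 sigma^2)), renormalised on [0, w].
    def nll(log_sigma):
        sigma = np.exp(log_sigma)
        log_pdf = np.log(2.0) + norm.logpdf(d, scale=sigma)        # half-normal density
        log_mass = np.log(2.0 * norm.cdf(w, scale=sigma) - 1.0)    # its mass on [0, w]
        return -(log_pdf - log_mass).sum()

    sigma_hat = np.exp(minimize_scalar(nll, bounds=(-5, 5), method="bounded").x)

    # Effective strip half-width: integral of the detection function over [0, w].
    mu = sigma_hat * np.sqrt(2.0 * np.pi) * (norm.cdf(w, scale=sigma_hat) - 0.5)
    return len(d) / (2.0 * total_line_length * mu)
```

The corresponding strip transect estimator simply divides the count within a fixed half-width by the strip area; the paper's comparison is between these two approaches under overdispersed object locations.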

6.
The Poisson distribution may be employed to test whether mutation frequencies differ from control frequencies. This paper describes how the testing procedure can be used for either one-tailed or two-tailed hypotheses. It also shows how the power of the statistical test can be calculated, the power being the probability of correctly rejecting a false null hypothesis.
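As a minimal sketch of the kind of calculation involved (not the paper's own worked procedure), the function below computes an exact one-tailed (upper) Poisson p-value and the corresponding power against a specified alternative mean; the argument names are illustrative.

```python
from scipy.stats import poisson

def poisson_test_and_power(observed, mu0, mu1, alpha=0.05):
    """One-tailed (upper) exact Poisson test and its power.

    observed: observed mutant count
    mu0:      expected count under the null (control frequency x sample size)
    mu1:      expected count under the alternative
    """
    # Exact one-sided p-value: P(X >= observed | mu0)
    p_value = poisson.sf(observed - 1, mu0)
    # Critical value: smallest count k with P(X >= k | mu0) <= alpha
    k_crit = int(poisson.ppf(1 - alpha, mu0)) + 1
    # Power: probability of reaching the critical value when the true mean is mu1
    power = poisson.sf(k_crit - 1, mu1)
    return p_value, power
```

A two-tailed version would allocate alpha/2 to each tail when computing critical values, or equivalently double the smaller of the two one-sided p-values.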

7.
8.
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distributional assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, its fold-change criteria are problematic and can critically alter the conclusion of a study as a result of compositional changes in the control data set used in the analysis. We propose a novel approach combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but is also insensitive to the fold-change threshold, since no control data set needs to be selected. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rate control among the approaches are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates than Smyth's parametric method when data are not normally distributed. They also offer higher statistical power than the Significance Analysis of Microarrays when the proportion of significantly differentially expressed genes is large, for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next-generation sequencing (RNA-seq) data analysis.
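The published RBM algorithm is not detailed in this abstract; the sketch below only illustrates the general idea of combining a variance-shrunken (empirical-Bayes-style) test statistic with a permutation null distribution. The simple 50/50 shrinkage towards the common variance and the gene-wise permutation p-values are illustrative assumptions rather than the authors' method.

```python
import numpy as np

def resampled_eb_pvalues(x, y, n_perm=500, seed=0):
    """Generic sketch: moderated (variance-shrunken) t-statistics with a
    permutation null, in the spirit of resampling + empirical Bayes.

    x, y: (genes, samples) expression matrices for the two groups.
    Returns permutation p-values per gene.
    """
    rng = np.random.default_rng(seed)

    def moderated_t(a, b):
        na, nb = a.shape[1], b.shape[1]
        diff = a.mean(1) - b.mean(1)
        s2 = (a.var(1, ddof=1) * (na - 1) + b.var(1, ddof=1) * (nb - 1)) / (na + nb - 2)
        s2_shrunk = 0.5 * s2 + 0.5 * s2.mean()   # crude shrinkage towards the common variance
        return diff / np.sqrt(s2_shrunk * (1 / na + 1 / nb))

    obs = np.abs(moderated_t(x, y))
    pooled = np.hstack([x, y])
    n_x = x.shape[1]
    exceed = np.zeros_like(obs)
    for _ in range(n_perm):
        perm = rng.permutation(pooled.shape[1])
        t_null = np.abs(moderated_t(pooled[:, perm[:n_x]], pooled[:, perm[n_x:]]))
        exceed += (t_null >= obs)                # gene-wise comparison against its own null
    return (exceed + 1) / (n_perm + 1)
```

The resulting p-values could then be passed to any standard false discovery rate procedure; no fold-change filter or control gene set is involved, which is the property the abstract emphasises.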

9.
Tissue microarray (TMA) is a new high-throughput method that enables simultaneous analysis of the profiles of protein expression in multiple tissue samples. TMA technology has not previously been adapted for physiological and pathophysiological studies of rodent kidneys. We have evaluated the validity and reliability of using TMA to assess protein expression in mouse and rat kidneys. A representative TMA block that we produced included: (1) mouse and rat kidney cortex, outer medulla, and inner medulla fixed with different fixatives; (2) rat kidneys at different stages of development fixed with different fixatives; (3) mouse and rat kidneys with different physiological or pathophysiological treatments; and (4) built-in controls. As examples of its utility, immunostaining for cyclooxygenase-2, renin, Tamm-Horsfall protein, aquaporin-2, connective tissue growth factor, and synaptopodin was carried out on kidney TMA slides. Quantitative analysis of cyclooxygenase-2 expression in kidneys confirms that individual cores provide meaningful representations comparable to whole-kidney sections. These studies show that the kidney TMA technique is a promising and useful tool for investigating the expression profiles of proteins of interest in rodent kidneys under different physiological and pathophysiological conditions. (J Histochem Cytochem 58:413–420, 2010)

10.

Background

As microarray technology has become mature and popular, the selection and use of a small number of relevant genes for accurate classification of samples has become a hot topic in biostatistics and bioinformatics. However, most existing algorithms cannot handle multiple classes, arguably a common application. Here, we propose an extension to an existing regularization algorithm, Threshold Gradient Descent Regularization (TGDR), to specifically tackle multi-class classification of microarray data. When there are several microarray experiments addressing the same or similar objectives, one option is to use a meta-analysis version of TGDR (Meta-TGDR), which treats the classification task as a combination of classifiers with the same structure/model while allowing the parameters to vary across studies. However, the original Meta-TGDR extension did not provide a way to make predictions on independent samples. Here, we propose an explicit method for estimating the overall coefficients of the biomarkers selected by Meta-TGDR. This extension permits broader applicability and allows the predictive performance of Meta-TGDR and TGDR to be compared on an independent testing set.

Results

Using real-world applications, we demonstrated that the proposed multi-TGDR framework works well and that it selects fewer genes than the sum of all the individual binary TGDRs. Additionally, Meta-TGDR and TGDR applied to the batch-effect-adjusted pooled data gave approximately the same results. Adding a bagging procedure in each application ensured stability and good predictive performance.

Conclusions

Compared with Meta-TGDR, TGDR is less computationally intensive and does not require every study to contain samples from all classes. On the batch-effect-adjusted data, its predictive performance is approximately the same as that of Meta-TGDR, so it is the recommended option. A minimal sketch of the basic TGDR update is given below.
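TGDR itself is a thresholded gradient-descent procedure: at each step, only the coordinates whose gradient magnitude is within a factor tau of the largest are updated, which yields sparse coefficient paths. The sketch below implements that idea for plain binary logistic regression; the multi-class and meta-analysis extensions discussed above build on this update, and the step size, threshold and fixed number of steps are illustrative choices.

```python
import numpy as np

def tgdr_logistic(X, y, tau=0.9, nu=0.01, n_steps=500):
    """Minimal sketch of Threshold Gradient Descent Regularization (TGDR)
    for binary logistic regression.

    X: (n, p) standardised covariates; y: 0/1 labels.
    tau in [0, 1]: threshold (tau close to 1 updates only the steepest
    coordinates; tau = 0 is ordinary gradient ascent). nu: step size.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_steps):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - prob) / n                   # gradient of the log-likelihood
        mask = np.abs(grad) >= tau * np.abs(grad).max()
        beta += nu * grad * mask                      # move only the thresholded coordinates
    return beta
```

Genes whose coefficients remain exactly zero after the chosen number of steps are effectively de-selected, which is how TGDR performs gene selection alongside classification.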

11.
12.
In order to explore the bioleaching mechanism and improve bioleaching efficiency, the microbial community in the bioleaching solution was compared with that on the mineral surface using microarray analysis. Meanwhile, the elemental composition of the bioleaching solution was analysed by ICP-AES. The results showed high concentrations of S and Cu in the leaching solution, reaching 2,380 mg/L and 1,378 mg/L respectively, after 30 days of continuous bioleaching of copper-ore concentrate by a mixed culture of 12 species of bioleaching microorganisms. Based on the microarray data, the total cell number on the mineral surface was far higher than that in the bioleaching solution. Furthermore, the dominant communities on the mineral surface, such as Acidithiobacillus ferrooxidans, Acidithiobacillus thiooxidans and Acidithiobacillus caldus, were similar to those in the bioleaching solution. However, the relative levels of some bacteria, such as Sulfobacillus acidophilus and Sulfobacillus thermosulfidooxidans, showed a marked discrepancy, with lower presence in the bioleaching solution than on the mineral surface.

13.
Experimental design and statistical analysis in biomedical animal experiments
Experimental design and statistical analysis play a key role in the initiation, implementation, and evaluation of animal experiments. We review the factors, principles, and types of experimental design, explain the importance of statistical analysis at every stage of a study, and highlight statistical issues that are easily overlooked in biomedical animal experiments.

14.
Microarrays have been useful in understanding various biological processes by allowing the simultaneous study of the expression of thousands of genes. However, the analysis of microarray data is a challenging task. One of the key problems in microarray analysis is the classification of unknown expression profiles. In particular, the often large number of non-informative genes on the microarray adversely affects the performance and efficiency of classification algorithms, and the skewed ratio of samples to variables poses a risk of overfitting. Thus, feature selection methods become crucial for selecting relevant genes and, hence, improving classification accuracy. In this study, we investigated feature selection methods based on gene expression profiles and protein interactions. We found that, in our setup, the addition of protein interaction information did not contribute any significant improvement to the classification results. Furthermore, we developed a novel feature selection method that relies exclusively on observed gene expression changes in microarray experiments, which we call the "relative Signal-to-Noise ratio" (rSNR). More precisely, the rSNR ranks genes based on their specificity to an experimental condition by comparing intrinsic variation, i.e. variation in gene expression within an experimental condition, with extrinsic variation, i.e. variation in gene expression across experimental conditions. Genes with low variation within the experimental condition of interest and high variation across experimental conditions are ranked higher, and help to improve classification accuracy. We compared different feature selection methods on two time-series microarray datasets and one static microarray dataset, and found that the rSNR generally performed better than the other methods.
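The rSNR is described above only qualitatively, so the ranking sketch below simply takes, per gene, the ratio of across-condition variation (extrinsic) to within-condition variation for the condition of interest (intrinsic). The exact published formula may differ, and the function and argument names are illustrative.

```python
import numpy as np

def rsnr_ranking(expr, conditions, target):
    """Illustrative rSNR-style gene ranking.

    expr:       (genes, samples) expression matrix
    conditions: per-sample condition labels
    target:     label of the experimental condition of interest
    Returns gene indices ordered from highest to lowest score.
    """
    conditions = np.asarray(conditions)
    # Extrinsic variation: spread of the per-condition mean expression.
    cond_means = np.stack([expr[:, conditions == c].mean(1)
                           for c in np.unique(conditions)], axis=1)
    extrinsic = cond_means.std(1, ddof=1)
    # Intrinsic variation: noise within the target condition.
    intrinsic = expr[:, conditions == target].std(1, ddof=1)
    score = extrinsic / (intrinsic + 1e-12)
    return np.argsort(score)[::-1]
```

Top-ranked genes under such a score are those that are stable within the condition of interest but vary strongly across conditions, matching the selection criterion the abstract describes.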

15.
A DNA microarray to monitor the expression of bacterial metabolic genes within mixed microbial communities was designed and tested. Total RNA was extracted from pure and mixed cultures containing the 2,4-dichlorophenoxyacetic acid (2,4-D)-degrading bacterium Ralstonia eutropha JMP134 and the inducing agent 2,4-D. Induction of the 2,4-D catabolic genes present in this organism was readily detected 4, 7, and 24 h after the addition of 2,4-D. This strain was diluted into a constructed mixed microbial community derived from a laboratory-scale sequencing batch reactor. Induction of two of the five 2,4-D catabolic genes (tfdA and tfdC) from populations of JMP134 as low as 10^5 cells/ml was clearly detected against a background of 10^8 cells/ml. Induction of two others (tfdB and tfdE) was detected from populations of 10^6 cells/ml in the same background; however, the last gene, tfdF, showed no significant induction due to high variability. In another experiment, the induction of resin acid degradative genes was statistically detectable in sludge-fed pulp mill effluent exposed to dehydroabietic acid in batch experiments. We conclude that microarrays will be useful tools for the detection of bacterial gene expression in wastewaters and other complex systems.

16.
Riyan Cheng & Abraham A. Palmer, Genetics, 2013, 193(3): 1015–1018
We used simulations to evaluate methods for assessing statistical significance in association studies. When the statistical model appropriately accounted for relatedness among individuals, unrestricted permutation tests and a few other simulation-based methods effectively controlled type I error rates; otherwise, only gene dropping controlled type I error, but at the expense of statistical power.

17.
Craig A. Stow, Ecosystems, 1999, 2(3): 237–241
A recently identified dinoflagellate, Pfiesteria piscicida, has been implicated as a cause of fishkills in mid-Atlantic estuaries. To date, field evidence supporting this argument has consisted of samples, analyzed for the presence of the toxic Pfiesteria forms, gathered during a fishkill. I present a probabilistic approach to examine the use of this kind of a posteriori information as an indication of cause-and-effect relationships. The analysis shows that the conditional probability of Pfiesteria's presence after a fishkill has begun provides little support for Pfiesteria as a cause of fishkills without also knowing the probability of Pfiesteria's presence under all conditions. Documenting the relative presence of toxic life stages during fishkills and under non-fishkill conditions will provide supporting evidence to assess Pfiesteria's role in fishkills. However, proving that Pfiesteria causes estuarine fishkills using only 'after the fact' information is essentially impossible.
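The core of the argument is Bayes' rule: presence conditional on a fishkill says little about whether presence raises the probability of a fishkill unless the unconditional presence probability is also known. The numbers below are purely hypothetical and exist only to show the structure of the calculation.

```python
def posterior_kill_given_presence(p_present_given_kill, p_present_overall, p_kill):
    """Bayes' rule rearranged: P(fishkill | Pfiesteria present).

    All three inputs must be estimated from data; knowing only the first
    (presence sampled during fishkills) is not enough to implicate the organism.
    """
    return p_present_given_kill * p_kill / p_present_overall

# Hypothetical: detected in 90% of fishkill samples, but present in 60% of all
# samples, with fishkill conditions occurring 5% of the time.
print(posterior_kill_given_presence(0.9, 0.6, 0.05))   # 0.075 -- only a modest rise over the 0.05 base rate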

18.
Because of the high cost of microarray experiments, determining the number of replicates required to detect a significantly differentially expressed gene under a given multiple-testing procedure is of considerable importance; calculation of the required power or replicate numbers provides design guidance for microarray experiments. To this end, a mixture distribution model of gene-expression noise was constructed using permutation resampling. The mixture-model approach is suitable for various types of microarray data, whether the expression noise comes from a single source or from multiple sources. Using the model together with a multiple-testing procedure, one can obtain, for a given power, the minimum number of biological replicates needed in a microarray experiment; determine, for a given sample size, the power to detect a significantly differentially expressed gene; or, given both sample size and power, choose the best multiple-testing method. As an example, a single-distribution model of the t-statistic was fitted to an observed microarray dataset of 3,000 genes responsive to stroke in rat, and then used to calculate the power of four popular multiple-testing procedures to detect a gene with expression change D. The results show that the B-procedure had the lowest power among the multiple-testing procedures to detect a gene with a small change, whereas the BH-procedure had the highest power; however, all procedures had the same power to identify the gene with the largest change. As with a single test, the power of the BH-procedure to detect a small change does not vary as the number of genes increases, whereas the powers of the other three multiple-testing procedures decline as the number of genes increases.
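The paper's calculations rest on a mixture model fitted to observed t-statistics; the sketch below is only a simplified z-scale simulation showing how the power of a Bonferroni-type and a Benjamini-Hochberg procedure can be compared for a given effect size and number of genes. The effect size, gene counts, and the identification of the "B-procedure" with a Bonferroni-type correction are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def mt_power(n_genes=3000, n_signif=100, effect=3.0, alpha=0.05, n_sim=200, seed=0):
    """Average power of Bonferroni vs Benjamini-Hochberg in a toy z-statistic setting.

    Among n_genes tests, the first n_signif have a true standardised effect `effect`.
    """
    rng = np.random.default_rng(seed)
    pow_bonf = pow_bh = 0.0
    for _ in range(n_sim):
        z = rng.standard_normal(n_genes)
        z[:n_signif] += effect                          # truly differential genes
        p = 2 * norm.sf(np.abs(z))
        # Bonferroni: per-test threshold alpha / n_genes
        pow_bonf += (p[:n_signif] <= alpha / n_genes).mean()
        # Benjamini-Hochberg step-up
        order = np.argsort(p)
        thresh = alpha * np.arange(1, n_genes + 1) / n_genes
        passed = np.nonzero(p[order] <= thresh)[0]
        k = passed.max() + 1 if passed.size else 0
        rejected = np.zeros(n_genes, bool)
        rejected[order[:k]] = True
        pow_bh += rejected[:n_signif].mean()
    return pow_bonf / n_sim, pow_bh / n_sim
```

Running this with a moderate effect size reproduces the qualitative pattern reported above: the Bonferroni-type procedure has noticeably lower power for small changes, while both procedures detect very large changes with essentially the same power.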

19.
谭远德 & 颜亨梅, 遗传学报 (Acta Genetica Sinica), 2006, 33(12): 1132–1140
Given the cost of microarray experiments, the first consideration in designing one is how many replicates are needed to detect a significantly differentially expressed gene. Calculating the number of replicates (sample size) or the power required by a multiple-testing procedure therefore provides important guidance for microarray experimental design. To this end, this paper constructs a mixture distribution model of gene-expression noise based on permutation resampling. The method is applicable to all kinds of gene expression data, whether the expression noise arises from a single source or from multiple sources. Using the mixture model together with a multiple-testing procedure, a researcher can obtain, for a given statistical power, the minimum number of biological replicates needed in a microarray experiment; determine, from the sample size, the power to detect a significantly differentially expressed gene; or, given both sample size and power, choose the best statistical test. As an example, the method was fitted to data from a microarray experiment on 3,000 stroke-related genes in rat, yielding a single-distribution model (i.e. a model with a single noise source). Based on this model, we calculated the statistical power of four multiple-testing procedures to identify a gene with an expression difference D. The results show that, for detecting a small difference D, the B-procedure has the lowest power of the four procedures while the BH-procedure has the highest; however, for identifying the gene with the largest expression difference, the four methods have the same power. As with a traditional single test, the power of the BH-procedure to detect a small change does not vary as the number of genes increases, whereas the power of the other three multiple-testing procedures declines.

20.