Similar literature
 Found 20 similar articles (search time: 343 ms)
1.
A Bayesian model-based clustering approach is proposed for identifying differentially expressed genes in meta-analysis. A Bayesian hierarchical model is used as a scientific tool for combining information from different studies, and a mixture prior is used to separate differentially expressed genes from non-differentially expressed genes. Posterior estimation of the parameters and missing observations is done using a simple Markov chain Monte Carlo method. From the estimated mixture model, useful measures of the significance of a test, such as the Bayesian false discovery rate (FDR), the local FDR (Efron et al., 2001), and the integration-driven discovery rate (IDR; Choi et al., 2003), can be easily computed. The model-based approach is also compared with commonly used permutation methods, and it is shown that the model-based approach is superior to the permutation methods when there are many more under-expressed genes than over-expressed genes, or vice versa. The proposed method is applied to four publicly available prostate cancer gene expression data sets and simulated data sets.

2.
Estimating p-values in small microarray experiments   Total citations: 5, self-citations: 0, citations by others: 5
MOTIVATION: Microarray data typically have small numbers of observations per gene, which can result in low power for statistical tests. Test statistics that borrow information from data across all of the genes can improve power, but these statistics have non-standard distributions, and their significance must be assessed using permutation analysis. When sample sizes are small, the number of distinct permutations can be severely limited, and pooling the permutation-derived test statistics across all genes has been proposed. However, the null distribution of the test statistics under permutation is not the same for equally and differentially expressed genes. This can have a negative impact on both p-value estimation and the power of information-borrowing statistics. RESULTS: We investigate permutation-based methods for estimating p-values. One of the methods, which pools from a selected subset of the data, is shown to have the correct type I error rate and to provide accurate estimates of the false discovery rate (FDR). We provide guidelines for selecting an appropriate subset. We also demonstrate that information-borrowing statistics have substantially increased power compared to the t-test in small experiments.
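The pooling idea described in this abstract can be illustrated with a minimal sketch. It assumes a plain Welch t statistic and pools the permutation statistics from every gene; the function names (`t_stat`, `permutation_stats`, `pooled_p_values`) are hypothetical, and the subset-selection rule that the abstract reports as the better-behaved variant is not reproduced here.

```python
from itertools import combinations
from statistics import mean, variance
import math


def t_stat(x, y):
    """Welch two-sample t statistic (assumes the two groups are not both constant)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def permutation_stats(values, n_first):
    """t statistics for every distinct way of assigning n_first values to group 1."""
    idx = range(len(values))
    stats = []
    for chosen in combinations(idx, n_first):
        chosen_set = set(chosen)
        x = [values[i] for i in chosen]
        y = [values[i] for i in idx if i not in chosen_set]
        stats.append(t_stat(x, y))
    return stats


def pooled_p_values(genes, n_first):
    """Two-sided p-values against a null pooled over the permutations of all genes.

    genes: list of per-gene value lists; the first n_first values are group 1.
    """
    observed = [t_stat(g[:n_first], g[n_first:]) for g in genes]
    pooled = [abs(s) for g in genes for s in permutation_stats(g, n_first)]
    # Small tolerance so the observed labelling counts as a tie with itself.
    return [sum(s >= abs(t) - 1e-12 for s in pooled) / len(pooled) for t in observed]
```

With three samples per group there are only 20 distinct relabellings per gene, so a per-gene permutation p-value cannot fall below 2/20 in a two-sided test; pooling across genes enlarges the reference set, at the cost of mixing in statistics from differentially expressed genes, which is the issue the abstract raises.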

3.
MOTIVATION: In analyses of microarray data with a design of different biological conditions, ranking genes by their differential 'importance' is often desired so that biologists can focus research on a small subset of genes that are most likely related to the experiment conditions. Permutation methods are often recommended and used, in place of their parametric counterparts, due to the small sample sizes of microarray experiments and possible non-normality of the data. The recommendations, however, are based on classical knowledge in the hypothesis test setting. RESULTS: We explore the relationship between hypothesis testing and gene ranking. We indicate that the permutation method does not provide a metric for the distance between two underlying distributions. In our simulation studies permutation methods tend to be equally or less accurate than parametric methods in ranking genes. This is partially due to the discreteness of the permutation distributions, as well as the non-metric property. In data analysis the variability in ranking genes can be assessed by bootstrap. It turns out that the variability is much lower for permutation than parametric methods, which agrees with the known robustness of permutation methods to individual outliers in the data.  相似文献   

4.
MOTIVATION: An important goal in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. Various parametric tests, such as the two-sample t-test, have been used, but their possibly too strong parametric assumptions or large sample justifications may not hold in practice. As alternatives, a class of three nonparametric statistical methods, including the empirical Bayes method of Efron et al. (2001), the significance analysis of microarray (SAM) method of Tusher et al. (2001) and the mixture model method (MMM) of Pan et al. (2001), have been proposed. All the three methods depend on constructing a test statistic and a so-called null statistic such that the null statistic's distribution can be used to approximate the null distribution of the test statistic. However, relatively little effort has been directed toward assessment of the performance or the underlying assumptions of the methods in constructing such test and null statistics. RESULTS: We point out a problem of a current method to construct the test and null statistics, which may lead to largely inflated Type I errors (i.e. false positives). We also propose two modifications that overcome the problem. In the context of MMM, the improved performance of the modified methods is demonstrated using simulated data. In addition, our numerical results also provide evidence to support the utility and effectiveness of MMM.  相似文献   

5.
We performed a detailed bioinformatic study of the catalytic step of fructose-6-phosphate phosphorylation in glycolysis based on the raw genomic draft of Propionibacterium freudenreichii subsp. shermanii (P. shermanii) ATCC9614 [Meurice et al., 2004]. Our results provide the first in silico evidence of the coexistence of genes coding for an ATP-dependent phosphofructokinase (ATP-PFK) and a PPi-dependent phosphofructokinase (PPi-PFK), whereas the fructose-1,6-bisphosphatase (FBP) and ADP-dependent phosphofructokinase (ADP-PFK) are absent. The deduced amino acid sequence corresponding to the PPi-PFK (AJ508922) shares 100% similarity with the already characterised propionibacterial protein (P29495) [Ladror et al., 1991]. The unexpected ATP-PFK gene (AJ509827) encodes a protein of 373 aa which is highly similar (50% positive residues) along at least 95% of its sequence length to different well-characterised ATP-PFKs. The characteristic PROSITE pattern important for the enzyme function of ATP-PFKs (PS00433) was conserved in the putative ATP-PFK sequence: 8 out of 9 amino acid residues. According to the recent evolutionary study of PFK proteins with different phosphate donors [Bapteste et al., 2003], the propionibacterial ATP-PFK harbours a G104-K124 residue combination, which strongly suggests that this enzyme belongs to the group of atypical ATP-PFKs. According to our phylogenetic analyses, the amino acid sequence of the ATP-PFK clusters with the atypical ATP-PFKs from group III of the Siebers classification [Siebers et al., 1998], whereas the expected PPi-PFK protein is closer to the PPi-PFKs from clade P [Müller et al., 2001]. The possible significance of the co-existence of these two PFKs and their importance for the regulation of glycolytic pathway flux in P. shermanii are discussed.

6.
Partially paired data sets often occur in microarray experiments (Kim et al., 2005; Liu, Liang and Jang, 2006). Discussions of testing with partially paired data are found in the literature (Lin and Stivers, 1974; Ekbohm, 1976; Bhoj, 1978). Bhoj (1978) initially proposed a test statistic that uses a convex combination of paired and unpaired t statistics. Kim et al. (2005) later proposed the t3 statistic, which is a linear combination of paired and unpaired t statistics, and then used it to detect differentially expressed (DE) genes in colorectal cancer (CRC) cDNA microarray data. In this paper, we extend Kim et al.'s t3 statistic to the Hotelling's T2 type statistic Tp for detecting DE gene sets of size p. We employ Efron's empirical null principle to incorporate inter-gene correlation in the estimation of the false discovery rate. Then, the proposed Tp statistic is applied to Kim et al.'s CRC data to detect the DE gene sets of sizes p=2 and p=3. Our results show that for small p, particularly for p=2 and marginally for p=3, the proposed Tp statistic complements the univariate procedure by detecting additional DE genes that were undetected in the univariate test procedure. We also conduct a simulation study to demonstrate that Efron's empirical null principle is robust to departures from the normality assumption.

7.
Stewart WC  Thompson EA 《Biometrics》2006,62(3):728-734
As a result of previous large, multipoint linkage studies there is a substantial amount of existing marker data. Due to the increased sample size, genetic maps estimated from these data could be more accurate than publicly available maps. However, current methods for map estimation are restricted to data sets containing pedigrees with a small number of individuals, or cannot make full use of marker data that are observed at several loci on members of large, extended pedigrees. In this article, a maximum likelihood (ML) method for map estimation that can make full use of the marker data in a large, multipoint linkage study is described. The method is applied to replicate sets of simulated marker data involving seven linked loci, and pedigree structures based on the real multipoint linkage study of Abkevich et al. (2003, American Journal of Human Genetics 73, 1271-1281). The variance of the ML estimate is accurately estimated, and tests of both simple and composite null hypotheses are performed. An efficient procedure for combining map estimates over data sets is also suggested.  相似文献   

8.
Tan YD  Fornage M  Fu YX 《Genomics》2006,88(6):846-854
Microarray technology provides a powerful tool for the expression profile of thousands of genes simultaneously, which makes it possible to explore the molecular and metabolic etiology of the development of a complex disease under study. However, classical statistical methods and technologies fail to be applicable to microarray data. Therefore, it is necessary and motivating to develop powerful methods for large-scale statistical analyses. In this paper, we described a novel method, called Ranking Analysis of Microarray Data (RAM). RAM, which is a large-scale two-sample t-test method, is based on comparisons between a set of ranked T statistics and a set of ranked Z values (a set of ranked estimated null scores) yielded by a "randomly splitting" approach instead of a "permutation" approach and a two-simulation strategy for estimating the proportion of genes identified by chance, i.e., the false discovery rate (FDR). The results obtained from the simulated and observed microarray data show that RAM is more efficient in identification of genes differentially expressed and estimation of FDR under undesirable conditions such as a large fudge factor, small sample size, or mixture distribution of noises than Significance Analysis of Microarrays.  相似文献   

9.
Prostate cancer is an increasing threat throughout the world. As a result of a demographic shift in population, the number of men at risk for developing prostate cancer is growing rapidly. For 2002, an estimated 189,000 prostate cancer cases were diagnosed in the U.S., accompanied by an estimated 30,200 prostate cancer deaths [Jemal et al., 2002]. Most prostate cancer is now diagnosed in men who were biopsied as a result of an elevated serum PSA (>4 ng/ml) level detected following routine screening. Autopsy studies [Breslow et al., 1977; Yatani et al., 1982; Sakr et al., 1993], and the recent results of the Prostate Cancer Prevention Trial (PCPT) [Thompson et al., 2003], a large scale clinical trial where all men entered the trial without an elevated PSA (<3 ng/ml) were subsequently biopsied, indicate the prevalence of histologic prostate cancer is much higher than anticipated by PSA screening. Environmental factors, such as diet and lifestyle, have long been recognized contributors to the development of prostate cancer. Recent studies of the molecular alterations in prostate cancer cells have begun to provide clues as to how prostate cancer may arise and progress. For example, while inflammation in the prostate has been suggested previously as a contributor to prostate cancer development [Gardner and Bennett, 1992; Platz, 1998; De Marzo et al., 1999; Nelson et al., 2003], research regarding the genetic and pathological aspects of prostate inflammation has only recently begun to receive attention. Here, we review the subject of inflammation and prostate cancer as part of a "chronic epithelial injury" hypothesis of prostate carcinogenesis, and the somatic genome and phenotypic changes characteristic of prostate cancer cells. We also present the implications of these changes for prostate cancer diagnosis, detection, prevention, and treatment.  相似文献   

10.
MOTIVATION: Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR), which is defined as the expected percentage of the number of false positive genes among the claimed significant genes. As a consequence, the accuracy of the FDR estimators will be important for correctly controlling FDR. Xie et al. found that the standard permutation method of estimating FDR is biased and proposed to delete the predicted differentially expressed (DE) genes in the estimation of FDR for one-sample comparison. However, we notice that the formula of the FDR used in their paper is incorrect. This makes the comparison results reported in their paper unconvincing. Other problems with their method include the biased estimation of FDR caused by over- or under-deletion of DE genes in the estimation of FDR and by the implicit use of an unreasonable estimator of the true proportion of equivalently expressed (EE) genes. Due to the great importance of accurate FDR estimation in microarray data analysis, it is necessary to point out such problems and propose improved methods. RESULTS: Our results confirm that the standard permutation method overestimates the FDR. With the correct FDR formula, we show the method of Xie et al. always gives biased estimation of FDR: it overestimates when the number of claimed significant genes is small, and underestimates when the number of claimed significant genes is large. To overcome these problems, we propose two modifications. The simulation results show that our estimator gives more accurate estimation.  相似文献   
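For orientation, the standard permutation estimate of the FDR that this abstract criticises can be sketched as follows. This is a generic textbook form written under stated assumptions: the function name `permutation_fdr` and the `pi0` parameter (the assumed proportion of equivalently expressed genes, set to 1.0 by default) are illustrative choices, and neither the method of Xie et al. nor the authors' two modifications is reproduced.

```python
def permutation_fdr(observed, null_sets, threshold, pi0=1.0):
    """Estimate the FDR at a given cut-off on |statistic|.

    observed:  one test statistic per gene from the real labelling.
    null_sets: list of lists, each holding the per-gene statistics
               obtained from one permutation of the labels.
    pi0:       assumed proportion of equivalently expressed genes
               (1.0 is the conservative default, an assumption here).
    """
    called = sum(abs(s) >= threshold for s in observed)
    if called == 0:
        return 0.0  # nothing claimed significant, so nothing to be false
    false_per_perm = [sum(abs(s) >= threshold for s in perm) for perm in null_sets]
    expected_false = pi0 * sum(false_per_perm) / len(null_sets)
    return min(1.0, expected_false / called)
```

Because the permuted statistics include those of truly differentially expressed genes, the expected number of false positives tends to be overstated, which is consistent with the overestimation the abstract reports for the standard method.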

11.
A simplified approach developed recently for the production of heterologous proteins in Escherichia coli uses 2-liter polyethylene terephthalate beverage bottles as disposable culture vessels [Sanville Millard, C. et al. 2003. Protein Expr. Purif. 29, 311-320]. The method greatly reduces the time and effort needed to produce native proteins for structural or functional studies. We now demonstrate that the approach is also well suited for production of proteins in defined media with incorporation of selenomethionine to facilitate structure determination by multiwavelength anomalous diffraction. Induction of a random set of Bacillus stearothermophilus target genes under the new protocols generated soluble selenomethionyl proteins in good yield. Several selenomethionyl proteins were purified in good yields and three were subjected to amino acid analysis. Incorporation of selenomethionine was determined to be greater than 95% in one protein and greater than 98% in the other two. In the preceding paper [Zhao et al., this issue, pp. 87-93], the approach is further extended to production of [U-15N]- or [U-13C, U-15N]-labeled proteins. The approach thus appears suitable for high-throughput production of proteins for structure determination by X-ray crystallography or nuclear magnetic resonance spectroscopy.  相似文献   

12.
MOTIVATION: The parametric F-test has been widely used in the analysis of factorial microarray experiments to assess treatment effects. However, the normality assumption is often untenable for microarray experiments with few replicates. Therefore, permutation-based methods are called upon to assess statistical significance. The distribution of the F-statistics across all the genes on the array can be regarded as a mixture distribution, with one proportion of the statistics generated from the null distribution of no differential gene expression and the remaining proportion generated from the alternative distribution of differentially expressed genes. As a result, the permutation distribution of the F-statistics may not approximate the true null distribution of the F-statistics well. Therefore, the construction of a proper null statistic that better approximates the null distribution of the F-statistic is of great importance to permutation-based multiple testing in microarray data analysis. RESULTS: In this paper, we extend the idea of constructing null statistics based on pairwise differences, which removes the treatment effects, from the two-sample comparison problem to multifactorial balanced or unbalanced microarray experiments. A null statistic based on a subpartition method is proposed, and its distribution is employed to approximate the null distribution of the F-statistic. The proposed null statistic is able to accommodate unbalance in the design and is also corrected for the undue correlation between its numerator and denominator. In the simulation studies and real biological data analysis, the number of true positives and the false discovery rate (FDR) of the proposed null statistic are compared with those of the permuted version of the F-statistic. It is shown that our proposed method has better control of the FDR and higher power than the standard permutation method for detecting differentially expressed genes, because of the better-approximated tail probabilities.

13.
The use of 2-L polyethylene terephthalate beverage bottles as a bacterial culture vessel has been recently introduced as an enabling technology for high-throughput structural biology [Sanville Millard, C. et al., 2003. Protein Express. Purif. 29, 311-320]. In the article following this one [Stols et al., this issue, pp. 95-102], this approach was elaborated for selenomethionine labeling used for multiwavelength anomalous dispersion phasing in the X-ray crystallographic determinations of protein structure. Herein, we report an effective and reproducible schedule for uniform 15N- and 13C-labeling of recombinant proteins in 2-L beverage bottles for structural determination by NMR spectroscopy. As an example, three target proteins selected from Arabidopsis thaliana were expressed in Escherichia coli Rosetta (DE3)/pLysS from a T7-based expression vector, purified, and characterized by electrospray ionization mass spectrometry and NMR analysis by 1H-15N heteronuclear single quantum correlation spectroscopy. The results show that expressions in the unlabeled medium provide a suitable control for estimation of the level of production of the labeled protein. Mass spectral characterizations show that the purified proteins contained a level of isotopic incorporation equivalent to the isotopically labeled materials initially present in the growth medium, while NMR analysis of the [U-15N]-labeled proteins provided a convenient method to assess the solution state properties of the target protein prior to production of a more costly double-labeled sample.  相似文献   

14.
Medical intervention in procreation is not recent, as the first artificial insemination (AI) was performed more than two centuries ago. However, the interference in the reproductive process with AI is limited. The first major change concerned the possibility of fertilizing oocytes in vitro (IVF) and culturing preimplantation embryos before their transfer to the uterus. In the early nineties, it was shown that direct injection of a single spermatozoon, even abnormal or immature, into an oocyte could result in a viable embryo and child. These techniques expanded very rapidly, and 45,000 IVFs, with ICSI in 50% of cases, were performed in France in 2001 (FIVNAT). Although a high incidence of major defects has not been reported, the health status of children born by these techniques is a growing concern. Congenital malformations [Hansen et al., 2002], chromosomal abnormalities [Van Steirteghem et al., 2002], neurological disorders [Stromberg et al., 2002] and low birth weight [Schieve et al., 2002] have been observed and discussed, but none of them seems to be statistically much more frequent after assisted reproductive technology (ART). It is important to determine the mechanism of these defects in order to prevent them. These risks may be related to the parents' health status and to their infertility, but they could also be linked to the techniques used for procreation. Recently, several human and animal studies have suggested an increased risk of imprinting disorders in ART offspring [Debaun et al., 2003; Gicquel et al., 2003; Maher et al., 2003; Halliday et al., 2004]. Several elements can be considered to be responsible for these defects, and each step of reproductive technology could be concerned and must be studied. Priority should be given to confirming the incidence of rare genomic imprinting diseases, such as Beckwith-Wiedemann syndrome and Angelman syndrome, after ART. Should systematic analysis of the methylation status of several imprinted genes therefore be performed to evaluate the respective influence of the use of immature gametes, ovarian stimulation and embryo culture involved in IVF/ICSI? It would also be important to evaluate other epigenetic modifications to determine the role of epigenetic deregulations that could be related to ART.

15.
Permutations on strings representing gene clusters on genomes have been studied earlier by Uno and Yagiura (2000), Heber and Stoye (2001), Bergeron et al. (2002), Eres et al. (2003), and Schmidt and Stoye (2004), and the idea of a maximal permutation pattern was introduced by Eres et al. (2003). In this paper, we present a new tool for representation and detection of gene clusters in multiple genomes, using PQ trees (Booth and Lueker, 1976): this describes the inner structure and the relations between clusters succinctly, aids in filtering meaningful from apparently meaningless clusters, and also gives a natural and meaningful way of visualizing complex clusters. We identify a minimal consensus PQ tree and prove that it is equivalent to a maximal pi pattern (Eres et al., 2003) and that each subgraph of the PQ tree corresponds to a nonmaximal permutation pattern. We present a general scheme to handle multiplicity in permutations and also give a linear time algorithm to construct the minimal consensus PQ tree. Further, we demonstrate the results on whole genome datasets. In our analysis of the whole genomes of human and rat, we found about 1.5 million common gene clusters but only about 500 minimal consensus PQ trees; with the E. coli K-12 and B. subtilis genomes, we found only about 450 minimal consensus PQ trees out of about 15,000 gene clusters; and when comparing eight different chloroplast genomes, we found only 77 minimal consensus PQ trees out of about 6,700 gene clusters. Further, we show specific instances of functionally related genes in two of the cases.

16.
Cartilage is a charged hydrated fibrous tissue exhibiting a high degree of tension-compression nonlinearity (i.e., tissue anisotropy). The effect of tension-compression nonlinearity on solute transport has not been investigated in cartilaginous tissue under dynamic loading conditions. In this study, a new model was developed based on the mechano-electrochemical mixture model [Yao and Gu, 2007, J. Biomech. Model Mechanobiol., 6, pp. 63-72, Lai et al., 1991, J. Biomech. Eng., 113, pp. 245-258], and conewise linear elasticity model [Soltz and Ateshian, 2000, J. Biomech. Eng., 122, pp. 576-586; Curnier et al., 1995, J. Elasticity, 37, pp. 1-38]. The solute desorption in cartilage under unconfined dynamic compression was investigated numerically using this new model. Analyses and results demonstrated that a high degree of tissue tension-compression nonlinearity could enhance the transport of large solutes considerably in the cartilage sample under dynamic unconfined compression, whereas it had little effect on the transport of small solutes (at 5% dynamic strain level). The loading-induced convection is an important mechanism for enhancing the transport of large solutes in the cartilage sample with tension-compression nonlinearity. The dynamic compression also promoted diffusion of large solutes in both tissues with and without tension-compression nonlinearity. These findings provide a new insight into the mechanisms of solute transport in hydrated, fibrous soft tissues.  相似文献   

17.
Predation avoidance relies primarily on behavioural mechanisms [van Schaik and van Hooff, 1983]. Primates alarm call at predators, including most birds and mammals [Cheney and Wrangham, 1987]. Alarm calls could be used to signal to the predator that it has been spotted [Zuberbühler et al., 1999], thereby probably decreasing the likelihood of an attack [Schultz, 2001], and they also inform prey of the presence of the predator, thereby increasing overall attention levels [Schülke, 2001]. Although eagles are reported to be one of the predators of Rhinopithecus bieti [Bai et al., 1987], few interactions between these monkeys and raptors have been documented to date. Here I document an interaction witnessed between R. bieti and a buzzard [Buteo sp., Yang X-J, pers. comm.].  相似文献   

18.
Recently the assumption of the independence of individual frequency components in a signal has been rejected, for example, for the EEG during defined physiological states such as sleep or sedation [9, 10]. Thus, the use of higher-order spectral analysis capable of detecting interrelations between individual signal components has proved useful. The aim of the present study was to investigate the quality of various non-parametric and parametric estimation algorithms using simulated as well as true physiological data. We employed standard algorithms available for MATLAB. The results clearly show that parametric bispectral estimation is superior to non-parametric estimation in terms of the quality of peak localisation and the discrimination from other peaks.

19.
Cook AJ  Li Y 《Biometrics》2008,64(4):1289-1292
Summary. This short note evaluates the assumptions required for a permutation test to approximate the null distribution of the spatial scan statistic for censored outcomes proposed in Cook et al. (2007). In particular, we study the exchangeability conditions required for such a test under survival models. A simulation study is further performed to assess the impact on the type I error when the global exchangeability assumption is violated and to determine whether the permutation test still well approximates the null distribution.  相似文献   

20.
Count data are common endpoints in clinical trials, for example magnetic resonance imaging lesion counts in multiple sclerosis. They often exhibit high levels of overdispersion, that is variances are larger than the means. Inference is regularly based on negative binomial regression along with maximum‐likelihood estimators. Although this approach can account for heterogeneity it postulates a common overdispersion parameter across groups. Such parametric assumptions are usually difficult to verify, especially in small trials. Therefore, novel procedures that are based on asymptotic results for newly developed rate and variance estimators are proposed in a general framework. Moreover, in case of small samples the procedures are carried out using permutation techniques. Here, the usual assumption of exchangeability under the null hypothesis is not met due to varying follow‐up times and unequal overdispersion parameters. This problem is solved by the use of studentized permutations leading to valid inference methods for situations with (i) varying follow‐up times, (ii) different overdispersion parameters, and (iii) small sample sizes.  相似文献   
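The studentized-permutation idea mentioned in this abstract can be sketched in a simplified form. The example below is a generic two-sample version that recomputes a Welch-type studentized statistic in every permutation; it assumes the inputs are per-patient event rates (count divided by follow-up time), and the function names are hypothetical. The rate and variance estimators developed in the paper are not reproduced here.

```python
import math
import random
from statistics import mean, variance


def studentized_stat(x, y):
    """Difference in means divided by its unpooled standard error.

    Assumes each group has at least two values and that the two groups
    are not both constant.
    """
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def studentized_permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided p-value from random permutations of the group labels."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs(studentized_stat(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Studentize inside the loop: the standard error is re-estimated
        # for every relabelling rather than computed once.
        perm = abs(studentized_stat(pooled[:len(x)], pooled[len(x):]))
        if perm >= observed - 1e-12:
            hits += 1
    return (1 + hits) / (1 + n_perm)
```

Re-estimating the standard error within each permutation is what distinguishes this from a plain permutation test on the mean difference, and it is the reason such tests can remain asymptotically valid when the groups have unequal variances and exchangeability does not hold exactly.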
