Similar articles
Found 20 similar articles (search time: 46 ms)
1.

Background

When conducting multiple hypothesis tests, it is important to control the number of false positives, or the False Discovery Rate (FDR). However, there is a tradeoff between controlling the FDR and maximizing power. Several methods, such as the q-value method, have been proposed to estimate the proportion of true null hypotheses among the tested hypotheses and to use this estimate in controlling the FDR. These methods usually depend on the assumption that the test statistics are independent (or only weakly correlated). However, many types of data, for example microarray data, often contain large-scale correlation structures. Our objective was to develop methods that control the FDR while maintaining greater power in highly correlated datasets, by improving the estimation of the proportion of null hypotheses.

Results

We showed that when strong correlation exists among the data, as is common in microarray datasets, the estimate of the proportion of null hypotheses can be highly variable, resulting in a high level of variation in the FDR. We therefore developed a re-sampling strategy that reduces this variation by breaking the correlations between gene expression values, and then uses a conservative strategy of selecting the upper quartile of the re-sampling estimates to obtain strong control of the FDR.
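The upper-quartile strategy can be sketched as follows. This is an illustrative reading, not the authors' exact algorithm: it uses Storey's λ-based estimator of the null proportion and a simple bootstrap over p-values in place of the authors' expression-level re-sampling.

```python
import numpy as np

def storey_pi0(pvals, lam=0.5):
    """Storey's estimator of the proportion of true null hypotheses:
    the fraction of p-values above lambda, rescaled by (1 - lambda)."""
    pvals = np.asarray(pvals)
    return min(1.0, np.mean(pvals > lam) / (1.0 - lam))

def conservative_pi0(pvals, n_boot=200, lam=0.5, quartile=75, seed=0):
    """Upper quartile of resampled pi0 estimates, giving a deliberately
    conservative value for downstream FDR control."""
    rng = np.random.default_rng(seed)
    pvals = np.asarray(pvals)
    est = [storey_pi0(rng.choice(pvals, size=pvals.size, replace=True), lam)
           for _ in range(n_boot)]
    return float(np.percentile(est, quartile))
```

Taking the 75th percentile rather than the mean trades a small upward bias in the null-proportion estimate for lower variance in the resulting FDR, matching the tradeoff described in the conclusion.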

Conclusion

With simulation studies and perturbations of actual microarray datasets, our method, compared to competing methods such as the q-value, generated slightly biased estimates of the proportion of null hypotheses but with lower mean square errors. When selecting genes at the same controlled FDR level, our method has on average a significantly lower false discovery rate in exchange for a minor reduction in power.

2.

Background

The role of migratory birds and of poultry trade in the dispersal of highly pathogenic H5N1 is still the topic of intense and controversial debate. In a recent contribution to this journal, Flint argues that the strict application of the scientific method can help to resolve this issue.

Discussion

We argue that Flint's identification of the scientific method with null hypothesis testing is misleading and counterproductive. There is far more to science than the testing of hypotheses; not only the justification, but also the discovery of hypotheses belongs to science. We also show why null hypothesis testing is weak and why Bayesian methods are a preferable approach to statistical inference. Furthermore, we criticize the analogy put forward by Flint between involuntary transport of poultry and long-distance migration.

Summary

To expect ultimate answers and unequivocal policy guidance from null hypothesis testing puts unrealistic expectations on a flawed approach to statistical inference and on science in general.

3.

Key message

Identification of a functional SNP of HvLox-1 associated with barley lipoxygenase activity, and development of an allele-specific marker for it.

Abstract

Improving the stability of beer flavor is one of the main objectives in breeding barley for malting, and lipoxygenase-1 (LOX-1) is a key enzyme controlling this trait. In this study, a modified LOX activity assay was used to screen for null LOX-1 mutants. Four barley landraces with no detectable LOX-1 activity were identified among 1,083 barley germplasm accessions from China. The genomic sequence diversity of the HvLox-1 gene of the four null LOX-1 Chinese landraces was compared with that of a further 76 accessions. A total of 104 nucleotide polymorphisms were found, comprising 83 single-nucleotide polymorphisms (SNPs), 7 multiple-nucleotide polymorphisms, and 14 insertions and deletions. Most notably, we found a rare C/G mutation (SNP-61) in the second intron which led to null LOX-1 activity through an altered splicing acceptor site. In addition, an allele-specific polymerase chain reaction marker was developed for genotyping SNP-61, which could be used in malting-barley breeding programs to improve beer quality.

4.

Background

Thick blood films are routinely used to diagnose Plasmodium falciparum malaria. Here, they were used to diagnose volunteers exposed to experimental malaria challenge.

Methods

The frequency with which blood films were positive at given parasite densities, measured by PCR, was analysed. The Poisson distribution was used to calculate the theoretical likelihood of diagnosis. Further in vitro studies used serial dilutions to prepare thick films from malaria cultures at known parasitaemia.
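The Poisson calculation amounts to the probability that at least one parasite falls within the examined blood volume. A minimal sketch, assuming perfectly Poisson-distributed parasites (the function name and parameterization are ours):

```python
import math

def detection_probability(density_per_ul, volume_ul):
    """Probability that at least one parasite lands in the examined blood
    volume: P = 1 - exp(-lambda), with lambda = density * volume."""
    expected = density_per_ul * volume_ul
    return 1.0 - math.exp(-expected)
```

For example, at 2 parasites/µL with 0.5 µL examined, the expected count is 1 and the theoretical detection probability is 1 − e⁻¹ ≈ 0.63 — already well below certainty, before any parasite loss during staining.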

Results

Even in expert hands, thick blood films were considerably less sensitive than might have been expected from the parasite numbers measured by quantitative PCR. In vitro work showed that thick films prepared from malaria cultures at known parasitaemia consistently underestimated parasite densities.

Conclusion

It appears that large numbers of parasites are lost during staining. This limits the sensitivity of thick blood films and leads to erroneous estimates of parasite density.

5.

Introduction

Gene-set analysis (GSA) methods are used as complementary approaches to genome-wide association studies (GWASs). The single marker association estimates of a predefined set of genes are either contrasted with those of all remaining genes or with a null non-associated background. To pool the p-values from several GSAs, it is important to take into account the concordance of the observed patterns resulting from single marker association point estimates across any given gene set. Here we propose META-GSA, an enhanced version of Fisher’s inverse χ²-method that weights each study to account for imperfect correlation between association patterns.

Simulation and Power

We investigated the performance of META-GSA by simulating GWASs with 500 cases and 500 controls at 100 diallelic markers in 20 different scenarios, simulating relative risks between 1 and 1.5 in gene sets of 10 genes. Wilcoxon’s rank sum test was applied as the GSA for each study. We found that META-GSA has greater power to discover truly associated gene sets than simple pooling of the p-values (e.g. 59% versus 37% when the true relative risk for 5 of the 10 genes was assumed to be 1.5). Under the null hypothesis of no difference in the true association pattern between the gene set of interest and the set of remaining genes, the results of both approaches are almost uncorrelated. We recommend not relying on p-values alone when combining the results of independent GSAs.
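For reference, the unweighted Fisher inverse χ²-method that META-GSA builds on can be written in a few lines. The study-specific weighting described above is omitted here; the closed-form survival function is valid because the degrees of freedom (2k for k p-values) are always even.

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function of the chi-square distribution for even df = 2k:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!"""
    k = df // 2
    term, total = 1.0, 0.0
    for i in range(k):
        total += term
        term *= (x / 2.0) / (i + 1)
    return math.exp(-x / 2.0) * total

def fisher_combine(pvals):
    """Fisher's inverse chi-square combination of independent p-values:
    -2 * sum(ln p) ~ chi2 with 2k degrees of freedom under H0."""
    stat = -2.0 * sum(math.log(p) for p in pvals)
    return chi2_sf_even_df(stat, 2 * len(pvals))
```

With a single p-value the combination returns that p-value unchanged, a useful sanity check on the implementation.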

Application

We applied META-GSA to pool the results of four case-control GWASs of lung cancer risk (Central European Study and Toronto/Lunenfeld-Tanenbaum Research Institute Study; German Lung Cancer Study and MD Anderson Cancer Center Study), which had already been analyzed separately with four different GSA methods (EASE; SLAT, mSUMSTAT and GenGen). This application revealed the pathway GO0015291 “transmembrane transporter activity” as significantly enriched with associated genes (GSA-method: EASE, p = 0.0315 corrected for multiple testing). Similar results were found for GO0015464 “acetylcholine receptor activity” but only when not corrected for multiple testing (all GSA-methods applied; p≈0.02).

6.

Background

Microarray technology provides an efficient means for globally exploring physiological processes governed by the coordinated expression of multiple genes. However, identification of genes differentially expressed in microarray experiments is challenging because of their potentially high type I error rate. Methods for large-scale statistical analyses have been developed but most of them are applicable to two-sample or two-condition data.

Results

We developed a large-scale multiple-group F-test based method, named ranking analysis of F-statistics (RAF), which is an extension of ranking analysis of microarray data (RAM) for the two-sample t-test. In this method, we proposed a novel random splitting approach to generate the null distribution instead of using permutation, which may not be appropriate for microarray data. We also implemented a two-simulation strategy to estimate the false discovery rate. Simulation results suggested that it has higher efficiency in finding differentially expressed genes among multiple classes at a lower false discovery rate than some commonly used methods. By applying our method to the experimental data, we found 107 genes with significantly differential expression among 4 treatments at <0.7% FDR, of which 31 are expressed sequence tags (ESTs) and 76 are unique genes with known functions in the brain or central nervous system, falling into six major functional groups.

Conclusion

Our method is suitable for identifying differentially expressed genes among multiple groups, particularly when the sample size is small.

7.
8.

Background

The detection of a change in the magnitude of directional coupling between two non-linear time series is a common subject of interest in the biomedical domain, including studies involving the respiratory chemoreflex system. Although transfer entropy is a useful tool in this area, no study to date has investigated how different transfer entropy estimation methods perform in typical biomedical applications featuring small sample sizes and the presence of outliers.

Methods

With respect to detection of increased coupling strength, we compared three transfer entropy estimation techniques using both simulated time series and respiratory recordings from lambs. The following estimation methods were analyzed: fixed-binning with ranking, kernel density estimation (KDE), and the Darbellay-Vajda (D-V) adaptive partitioning algorithm extended to three dimensions. In the simulated experiment, sample size was varied from 50 to 200, while coupling strength was increased. In order to introduce outliers, the heavy-tailed Laplace distribution was utilized. In the lamb experiment, the objective was to detect increased respiratory-related chemosensitivity to O2 and CO2 induced by a drug, domperidone. Specifically, the separate influence of end-tidal PO2 and PCO2 on minute ventilation before and after administration of domperidone was analyzed.
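A fixed-binning transfer entropy estimator of the kind compared here can be sketched as follows, using first-order histories and equal-width bins. This is a generic illustration; it is not the paper's exact implementation and omits the ranking step.

```python
import numpy as np
from collections import Counter

def transfer_entropy_binned(x, y, bins=4):
    """Plug-in estimate of transfer entropy T(X -> Y) in bits, with
    first-order histories and equal-width binning:
    TE = sum p(y1, y0, x0) * log2[ p(y1 | y0, x0) / p(y1 | y0) ]."""
    x = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    y = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    triples = list(zip(y[1:], y[:-1], x[:-1]))  # (y_{t+1}, y_t, x_t)
    n = len(triples)
    c_tri = Counter(triples)
    c_yx = Counter((y0, x0) for _, y0, x0 in triples)
    c_yy = Counter((y1, y0) for y1, y0, _ in triples)
    c_y = Counter(y0 for _, y0, _ in triples)
    te = 0.0
    for (y1, y0, x0), c in c_tri.items():
        p_joint = c / n
        p_cond_full = c / c_yx[(y0, x0)]          # p(y1 | y0, x0)
        p_cond_hist = c_yy[(y1, y0)] / c_y[y0]    # p(y1 | y0)
        te += p_joint * np.log2(p_cond_full / p_cond_hist)
    return te
```

For deterministically coupled binary series (y copies x with lag 1) the estimate approaches 1 bit, while for independent series it stays near zero apart from the small positive plug-in bias, which is exactly the small-sample issue the paper's comparison targets.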

Results

In the simulation, KDE detected increased coupling strength at the lowest SNR among the three methods. In the lamb experiment, D-V partitioning resulted in the statistically strongest increase in transfer entropy post-domperidone for . In addition, D-V partitioning was the only method that could detect an increase in transfer entropy for , in agreement with experimental findings.

Conclusions

Transfer entropy is capable of detecting directional coupling changes in non-linear biomedical time series featuring a small number of observations and the presence of outliers. The results of this study suggest that fixed-binning, even with ranking, is too primitive, and although there is no clear winner between KDE and D-V partitioning, KDE requires more computational time and more extensive parameter selection than D-V partitioning. We hope this study provides a guideline for selecting an appropriate transfer entropy estimation method.

9.

Aims

A new approach is proposed to estimate fine root production, mortality, and decomposition, which occur simultaneously in terrestrial ecosystems, using sequential soil core sampling or ingrowth core techniques.

Methods

The calculation assumes knowledge of the decomposition rate of dead fine roots during a given time period from a litter bag experiment. A mass balance model of organic matter derived from live fine roots is applied with an assumption about fine root mortality and decomposition to estimate decomposed dead fine roots from variables that can be quantified.

Results

Comparison of the estimated fine root dynamics with the decision matrix method and three new methods (forward estimate, continuous inflow estimate, and backward estimate) in a ca. 80-year-old Chamaecyparis obtusa plantation in central Japan showed that the decision matrix nearly always underestimated production, mortality, and decomposition, falling below even the values of the forward estimate, which itself theoretically underestimates the true value. The fine root production and mortality obtained by the decision matrix were on average 14% and 38% lower than those calculated by the continuous inflow estimate method. In addition, the values from the continuous inflow estimate method were always between those calculated by the forward estimate and backward estimate methods; the latter is known to overestimate the true value.
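The mass-balance bookkeeping behind these estimates can be illustrated with a deliberately simplified forward-type calculation. The treatment of within-interval decay of newly dead roots, which is what distinguishes the forward, continuous inflow, and backward estimates, is reduced here to decay of standing necromass only; the function and variable names are ours.

```python
import math

def fine_root_budget(b0, b1, n0, n1, k, dt):
    """Forward-type mass-balance estimate over one sampling interval.
    b0, b1: live fine-root biomass at start/end; n0, n1: necromass;
    k: litter-bag decay constant (1/time); dt: interval length.
    Returns (production, mortality, decomposition)."""
    d = n0 * (1.0 - math.exp(-k * dt))  # decay of standing necromass only
    m = (n1 - n0) + d                   # necromass balance: dN = M - D
    p = (b1 - b0) + m                   # live-biomass balance: dB = P - M
    return p, m, d
```

With k = 0 the calculation collapses to simple stock differences (no decomposition), which shows why ignoring decay systematically underestimates production and mortality.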

Conclusions

Therefore, we consider that the continuous inflow estimate method provides the best estimates of fine root production, mortality, and decomposition among the four approaches compared.

10.

Background

This article describes classical and Bayesian interval estimation of genetic susceptibility based on random samples with pre-specified numbers of unrelated cases and controls.

Results

Frequencies of genotypes in cases and controls can be estimated directly from retrospective case-control data. On the other hand, genetic susceptibility, defined as the expected proportion of cases among individuals with a particular genotype, depends on the population proportion of cases (prevalence). Given this design, prevalence is an external parameter, and hence susceptibility cannot be estimated from the observed data alone. Interval estimation of susceptibility that can incorporate uncertainty in prevalence values is explored from both classical and Bayesian perspectives. Similarity between classical and Bayesian interval estimates in terms of frequentist coverage probabilities for this problem allows an appealing interpretation of classical intervals as bounds for genetic susceptibility. In addition, it is observed that both the asymptotic classical and Bayesian interval estimates have comparable average length. These interval estimates serve as a very good approximation to the "exact" (finite sample) Bayesian interval estimates. Extension from genotypic to allelic susceptibility intervals shows dependency on phenotype-induced deviations from Hardy-Weinberg equilibrium.
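The dependence of susceptibility on the external prevalence parameter is just Bayes' rule. A minimal point-estimate sketch (the interval-estimation machinery the article is actually about is not reproduced; the function name is ours):

```python
def genotype_susceptibility(freq_in_cases, freq_in_controls, prevalence):
    """P(case | genotype) via Bayes' rule:
    pi * P(g|case) / [pi * P(g|case) + (1 - pi) * P(g|control)],
    where pi is the population prevalence, supplied externally because
    retrospective case-control data cannot estimate it."""
    num = prevalence * freq_in_cases
    den = num + (1.0 - prevalence) * freq_in_controls
    return num / den
```

When the genotype is equally frequent in cases and controls, the susceptibility reduces to the prevalence itself, which makes the role of the external parameter explicit.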

Conclusions

The suggested classical and Bayesian interval estimates appear to perform reasonably well. Generally, the use of the exact Bayesian interval estimation method is recommended for genetic susceptibility; however, the asymptotic classical and approximate Bayesian methods are adequate for sample sizes of at least 50 cases and controls.

11.

Background

Desert ants (Cataglyphis fortis) are central place foragers that navigate by means of path integration. This mechanism remains accurate even on three-dimensional itineraries. In this study, we tested three hypotheses concerning the underlying principles of Cataglyphis' orientation in 3-D: (1) Do the ants employ a strictly two-dimensional representation of their itineraries, (2) do they link additional information about ascents and descents to their 2-D home vector, or (3) do they use true 3-D vector navigation?

Results

We trained ants to walk routes within channels that included ascents and descents. In choice tests, ants walked on ramps more frequently and at greater lengths if their preceding journey also included vertical components. However, the sequence of ascents and descents, as well as their distance from nest and feeder, were not retraced. Importantly, the animals did not compensate for an enforced vertical deviation from the home vector.

Conclusion

We conclude that Cataglyphis fortis essentially represents its environment in a simplified, two-dimensional fashion, with information about vertical path segments being learnt, but independently of their congruence with the actual three-dimensional configuration of the environment. Our findings render the existence of a path integration mechanism that is functional in all three dimensions highly unlikely.

12.

Background

The estimation of individual ancestry from genetic data has become essential to applied population genetics and genetic epidemiology. Software programs for calculating ancestry estimates have become essential tools in the geneticist's analytic arsenal.

Results

Here we describe four enhancements to ADMIXTURE, a high-performance tool for estimating individual ancestries and population allele frequencies from SNP (single nucleotide polymorphism) data. First, ADMIXTURE can be used to estimate the number of underlying populations through cross-validation. Second, individuals of known ancestry can be exploited in supervised learning to yield more precise ancestry estimates. Third, by penalizing small admixture coefficients for each individual, one can encourage model parsimony, often yielding more interpretable results for small datasets or datasets with large numbers of ancestral populations. Finally, by exploiting multiple processors, large datasets can be analyzed even more rapidly.

Conclusions

The enhancements we have described make ADMIXTURE a more accurate, efficient, and versatile tool for ancestry estimation.

13.

Aims

Responses to salt stress were analysed in two Gypsophila species that share territory but have different ecological optima and distribution ranges. G. struthium is a regionally dominant Iberian endemic gypsophyte, whereas G. tomentosa is a narrow endemic reported as a halophyte. The working hypothesis is that salt tolerance shapes the presence of these species in their specific habitats.

Methods

Taking a multidisciplinary approach, we assessed the soil characteristics and vegetation structure at the sampling site; seed germination and seedling development; growth and flowering; proline synthesis and cation accumulation under artificial conditions of increasing salt stress; and the effect of PEG on germination and seedling development.

Results

Soil salinity was low at all the sampling points where the two species grow, but moisture was higher in the area of G. tomentosa. Differences were found in the species’ salt and drought tolerance. The parameters tested did not show a clear pattern indicating a main role of salt tolerance in plant distribution.

Conclusions

G. tomentosa cannot be considered a true halophyte as previously reported because it is unable to complete its life cycle under salinity. The presence of G. tomentosa in habitats bordering salt marshes is a strategy to avoid plant competition and extreme water stress.

14.

Background and aims

The aims were to (i) compare the concentrations of total polyphenols (TP) and condensed tannins (CT), and the CT profiles, in different organs of mature trees and seedlings of eight true mangrove species in Hong Kong; (ii) examine the antioxidant activities of CT; and (iii) relate the non-enzymatic antioxidative defence system to the vertical zonation pattern of mangrove species.

Methods

Mature trees and seedlings of eight species were collected from a Hong Kong mangrove swamp to determine TP and CT concentrations and the antioxidant activities of CT.

Results

According to TP concentrations, the true mangrove species could be broadly classified into three groups: (i) Lumnitzera racemosa and Aegiceras corniculatum > (ii) Heritiera littoralis, Excoecaria agallocha, Bruguiera gymnorrhiza and Kandelia obovata > (iii) Acanthus ilicifolius and Avicennia marina. The last two are pioneer species in the most foreshore locations. They also had significantly lower antioxidant activities and CT concentrations, and different CT profiles, than the other six species in the mid- and low-tide zones.

Conclusions

Classification of the eight true mangrove species into three groups based on polyphenols was similar to their vertical zonation from land to sea. The relationships between these antioxidants and zonation should be further verified by transplantation studies.

15.

Background

The q-value is a widely used statistical method for estimating the false discovery rate (FDR), a conventional significance measure in the analysis of genome-wide expression data. The q-value is a random variable and may underestimate the FDR in practice. An underestimated FDR can lead to unexpected false discoveries in follow-up validation experiments. This issue has not been well addressed in the literature, especially in situations where a permutation procedure is necessary for p-value calculation.

Results

We proposed a statistical method for the conservative adjustment of q-value. In practice, it is usually necessary to calculate p-value by a permutation procedure. This was also considered in our adjustment method. We used simulation data as well as experimental microarray or sequencing data to illustrate the usefulness of our method.
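For context, the standard q-value computation can be sketched as follows; with the conservative choice π0 = 1 it reduces to Benjamini-Hochberg-style values. The paper's permutation-aware conservative adjustment is not reproduced here.

```python
import numpy as np

def qvalues(pvals, pi0=1.0):
    """q-values from p-values: q_(i) = min over j >= i of
    pi0 * m * p_(j) / j over the sorted p-values. pi0 = 1 gives the
    conservative (BH-like) variant."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    q = pi0 * m * ranked / np.arange(1, m + 1)
    q = np.minimum.accumulate(q[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.clip(q, 0.0, 1.0)
    return out
```

Plugging in a smaller π0 uniformly shrinks the q-values, which is exactly why an underestimated null proportion can understate the true FDR.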

Conclusions

The conservativeness of our approach has been mathematically confirmed in this study. We have demonstrated the importance of conservative adjustment of q-value, particularly in the situation that the proportion of differentially expressed genes is small or the overall differential expression signal is weak.

16.

Background

Gender differences in gene expression were estimated in liver samples from 9 males and 9 females. The study tested 31,110 genes for a gender difference using a design that adjusted for sources of variation associated with cDNA arrays, normalization, hybridizations and processing conditions.

Results

The genes were split into 2,800 that were clearly expressed (expressed genes) and 28,310 with expression levels in the background range (not expressed genes). The distribution of p-values from the 'not expressed' group was consistent with no gender differences. The distribution of p-values from the 'expressed' group suggested that 8% of these genes differed by gender, but the estimated fold-changes (expression in males / expression in females) were small. The largest observed fold-change was 1.55. The 95% confidence bounds on the estimated fold-changes were below 1.4-fold for 79.3% of genes, and few (1.1%) exceeded 2-fold.

Conclusion

Observed gender differences in gene expression were small. When selecting genes with gender differences based upon their p-values, false discovery rates exceed 80 % for any set of genes, essentially making it impossible to identify any specific genes with a gender difference.

17.
The effective population size (Ne) is a key parameter in evolutionary and population genetics. Single-sample Ne estimation provides an alternative to traditional approaches requiring two or more samples. Single-sample methods assume that the study population has no genetic sub-structure, which is unlikely to be true in wild populations. Here we empirically investigated two single-sample estimators (ONeSAMP and LDNe) in replicated and controlled genetically structured populations of Drosophila melanogaster. Using experimentally controlled population parameters, we calculated the Wright–Fisher expected Ne for the structured population (Total Ne) and demonstrated that the loss of heterozygosity did not differ significantly from Wright's model. We found that disregarding the population substructure resulted in Total Ne estimates with a low coefficient of variation, but these estimates were systematically lower than the expected values, whereas hierarchical estimates accounting for population structure were closer to the expected values but had a higher coefficient of variation. Analysis of simulated populations demonstrated that incomplete sampling, initial allelic diversity and balancing selection may have contributed to deviations from the Wright–Fisher model. Overall, the approximate-Bayesian ONeSAMP method performed better than LDNe (with appropriate priors). Both methods performed best when dispersal rates were high and the population structure was approaching panmixia.
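The Wright-Fisher expectation against which the empirical loss of heterozygosity was checked is a one-line formula: heterozygosity decays by a factor of (1 − 1/2Ne) per generation. A minimal sketch:

```python
def expected_heterozygosity(h0, ne, generations):
    """Expected heterozygosity after t generations of drift in an ideal
    Wright-Fisher population of effective size Ne:
    H_t = H_0 * (1 - 1/(2*Ne))^t."""
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations
```

Comparing observed heterozygosity against this curve, as the study does, is a direct test of whether the experimental populations behave as the idealized model predicts.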

18.

Background and Aims

We developed a method for processing roots from soil cores and monoliths in the laboratory to reduce the time and cost devoted to separating roots from debris and improve the accuracy of root variable estimates. The method was tested on soil cores from a California oak savanna, with roots from trees, Quercus douglasii, and annual grasses.

Methods

In the randomized sampling method, one isolates the sample fraction consisting of roots and organic debris ≤ 1 cm in length, and randomizes it through immersion in water and vigorous mixing. Sub-samples from the mixture are then used to estimate the percentage of roots in this fraction, thereby enabling an estimate of total sample biomass.
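The scaling step of the randomized sampling method is simple proportion arithmetic; a sketch (function and variable names are ours):

```python
def estimate_root_biomass(fraction_mass, subsample_masses, subsample_root_masses):
    """Estimate the root mass in the randomized root+debris fraction:
    scale the whole fraction's mass by the root proportion measured in
    the sub-samples drawn from the water-mixed material."""
    sampled = sum(subsample_masses)
    rooted = sum(subsample_root_masses)
    root_proportion = rooted / sampled
    return fraction_mass * root_proportion
```

For example, if two 1 g sub-samples of a 10 g fraction contain 0.3 g and 0.5 g of roots, the estimated root mass in the fraction is 10 × 0.4 = 4 g.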

Results

We found that root biomass estimates determined through the randomization method differed from the total root biomass established by meticulously picking every root from a sample with an error of 3.0% ± 0.6% (s.e.).

Conclusions

This method greatly reduces the time and resources required for root processing from soil cores and monoliths, and improves the accuracy of root variable estimates compared to standard methods. This gives researchers the ability to increase sample frequency and reduce the error associated with studying roots at the landscape and plant scales.

19.

Key message

AtSKIP participates in cytokinin-regulated leaf initiation. Putatively phosphorylated AtSKIP (AtSKIPDD) displayed the opposite function in leaf development to that of AtSKIP in transgenic seedlings.

Abstract

AtSKIP, a multifunctional protein, is involved in many physiological processes, such as flowering, cell cycle regulation, photomorphogenesis and stress tolerance. However, the mechanism of AtSKIP in these processes is unclear. Here, we identify AtSKIP as a gene associated with the cytokinin-regulated leaf growth process in Arabidopsis. The expression of AtSKIP was regulated by cytokinin. Leaf development in AtSKIP-overexpressing seedlings was independent of light but promoted by cytokinin, and phosphorylation of AtSKIP (AtSKIPDD) partially interfered with AtSKIP's function as a positive regulator in cytokinin signaling, as indicated by true leaf formation; the defects of AtSKIPDD in true leaf formation could be recovered to some extent by the addition of cytokinin. Moreover, promoter-GUS activity of the cytokinin-responsive gene Authentic Response Regulator 7 (ARR7) further showed that expression of AtSKIP or AtSKIPDD altered endogenous cytokinin signaling in plants. Together, these data indicate that AtSKIP participates in cytokinin-regulated promotion of leaf growth in photomorphogenesis, and that phosphorylation interferes with the normal function of AtSKIP.

20.

Background

Current methods of analyzing Affymetrix GeneChip® microarray data require the estimation of probe set expression summaries, followed by application of statistical tests to determine which genes are differentially expressed. The S-Score algorithm described by Zhang and colleagues is an alternative method that allows tests of hypotheses directly from probe level data. It is based on an error model in which the detected signal is proportional to the probe pair signal for highly expressed genes, but approaches a background level (rather than 0) for genes with low levels of expression. This model is used to calculate relative change in probe pair intensities that converts probe signals into multiple measurements with equalized errors, which are summed over a probe set to form the S-Score. Assuming no expression differences between chips, the S-Score follows a standard normal distribution, allowing direct tests of hypotheses to be made. Using spike-in and dilution datasets, we validated the S-Score method against comparisons of gene expression utilizing the more recently developed methods RMA, dChip, and MAS5.
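The final combination step described above (equal-error probe-pair scores summing into a standard normal statistic) can be illustrated as follows; the probe-level error model itself is not reproduced, and the function name is ours:

```python
import math

def s_score(probe_scores):
    """Combine per-probe-pair standardized difference scores into one
    S-Score-style statistic: if each score is approximately N(0, 1) and
    independent, sum / sqrt(N) is again approximately N(0, 1), allowing
    a direct z-test without a probe-set expression summary."""
    n = len(probe_scores)
    return sum(probe_scores) / math.sqrt(n)
```

Because the combined statistic is standard normal under the null of no expression difference, a threshold such as |S| > 1.96 corresponds directly to a two-sided 5% test.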

Results

The S-Score showed excellent sensitivity and specificity in detecting low-level gene expression changes. Rank ordering of S-Score values more accurately reflected known fold-change values compared to other algorithms.

Conclusion

The S-Score method, utilizing probe-level data directly, offers significant advantages over comparisons using only probe set expression summaries.

