Similar Literature
20 similar documents found.
1.
Classification of patients based on molecular markers, for example into different risk groups, is a modern field in medical research. The aim of such classification is often better diagnosis or individualized therapy. The search for molecular markers often utilizes extremely high-dimensional data sets (e.g. gene-expression microarrays). However, in situations where the number of measured markers (genes) is intrinsically higher than the number of available patients, standard methods from statistical learning fail to deal correctly with this so-called "curse of dimensionality". Feature or dimension reduction techniques based on statistical models alone also promise only limited success. Several recent methods explore ideas of how to quantify biological prior knowledge of molecular interactions and known cellular processes and incorporate it into the feature selection process. This article aims to give an overview of such current methods, as well as of the databases from which this external knowledge can be obtained. For illustration, two recent methods are compared in detail: a feature selection approach for support vector machines and a boosting approach for regression models. As a practical example, data on patients with acute lymphoblastic leukemia are considered, where the binary endpoint "relapse within first year" is to be predicted.
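As a rough sketch of this kind of knowledge-driven feature selection (not the specific SVM or boosting method compared in the article), the snippet below restricts a synthetic gene-expression matrix to a hypothetical prior-knowledge pathway set before fitting a linear SVM; the data, gene names, and pathway set are all illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
genes = [f"g{i}" for i in range(5000)]
X = rng.normal(size=(60, 5000))                  # 60 patients x 5000 genes (synthetic)
y = rng.integers(0, 2, size=60)                  # binary endpoint, e.g. relapse yes/no
pathway = {f"g{i}" for i in range(0, 5000, 50)}  # hypothetical prior-knowledge gene set

keep = [j for j, g in enumerate(genes) if g in pathway]   # knowledge-based filter
scores = cross_val_score(SVC(kernel="linear"), X[:, keep], y, cv=5)
print(f"CV accuracy on {len(keep)} pathway genes: {scores.mean():.2f}")
```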

2.
A Multivariate Nonlinear Analysis Method for the Population Genetic Structure of Allelic Polymorphisms
For a long time, multivariate statistical analyses of multidimensional gene-polymorphism data (the cluster analysis used in computing genetic distances, and the principal component analysis, factor analysis, and canonical correlation analysis used in studying population genetic structure) have applied classical multivariate linear methods designed for unconstrained data, paying no attention to the problems introduced by the "closure effect" of gene-polymorphism data. Starting from the distribution and structural characteristics of gene-polymorphism data, this article points out that gene-polymorphism distributions have the character of "closed data" and analyzes the difficulties that the closure effect creates when classical multivariate linear methods are used to analyze population genetic structure. Based on the theory and methods of compositional data analysis, a basic multivariate nonlinear approach to the population genetic structure of allelic polymorphisms is proposed. Taking principal component analysis as an example, the results of classical linear PCA and "log-ratio" nonlinear PCA are compared on real data; the comparison shows that log-ratio nonlinear PCA is a good method for studying the population genetic structure of allelic polymorphisms, with the advantages of specificity and sensitivity, and that its results accord with the laws of population genetics.
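A minimal sketch of the contrast the article draws, assuming synthetic allele-frequency data: naive linear PCA applied directly to the compositions versus "log-ratio" PCA after a centred log-ratio (clr) transform.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# 40 populations, 4 allele frequencies per locus summing to 1 (closed data)
freqs = rng.dirichlet(alpha=[2.0, 3.0, 5.0, 1.0], size=40)

# Centred log-ratio transform removes the unit-sum (closure) constraint
clr = np.log(freqs) - np.log(freqs).mean(axis=1, keepdims=True)

linear_scores = PCA(n_components=2).fit_transform(freqs)  # ignores closure
logratio_scores = PCA(n_components=2).fit_transform(clr)  # respects closure
print(logratio_scores[:3])
```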

3.
Instead of comparing "mutation frequencies" as in the conventional host-mediated assay (HMA), a modified concept of measuring mutagenic potency is introduced, based on taking samples at a number of time intervals. Regression analysis methods can then be applied to the numbers of mutant bacteria (reversions). Not only the mutagenic but also an additional antibacterial potency of a compound can be detected and estimated in the same assay. It is demonstrated that interference of (undetected) antibacterial activity with the mutagenic activity may lead to misclassification of a substance concerning its mutagenicity in the conventional HMA. This kind of erroneous assessment is avoided by the LIHMA. Another advantage of the LIHMA over the conventional HMA is that regression analysis also allows estimation of the sensitivity and reliability of the assay. The calculative procedure may be programmed on desk computers and is then most suitable for laboratories where large numbers of substances have to be examined routinely. A numerical example is given using results obtained with nitrosoguanidine.
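A toy illustration of the time-interval idea (not the LIHMA estimators themselves): regressing log reversion counts on sampling time separates a net growth/kill rate from a baseline level. The counts and times below are invented.

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # sampling times (invented, hours)
revertants = np.array([12, 30, 70, 160, 400])  # mutant (reversion) counts per sample

# Linear fit on the log scale: slope ~ net bacterial growth/kill rate,
# intercept ~ log baseline mutant level at t = 0.
slope, intercept = np.polyfit(t, np.log(revertants), 1)
print(f"net growth/kill rate ~ {slope:.2f}/h, baseline ~ {np.exp(intercept):.1f} mutants")
```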

4.
In a typical clinical trial, there are one or two primary endpoints, and a few secondary endpoints. When at least one primary endpoint achieves statistical significance, there is considerable interest in using results for the secondary endpoints to enhance characterization of the treatment effect. Because multiple endpoints are involved, regulators may require that the familywise type I error rate be controlled at a pre-set level. This requirement can be achieved by using "gatekeeping" methods. However, existing methods suffer from logical oddities such as allowing results for secondary endpoint(s) to impact the likelihood of success for the primary endpoint(s). We propose a novel and easy-to-implement gatekeeping procedure that is devoid of such deficiencies. A real data example and simulation results are used to illustrate efficiency gains of our method relative to existing methods.
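For orientation only, here is a minimal serial (fixed-sequence) gatekeeping rule of the conventional kind the article seeks to improve upon; it is not the authors' novel procedure.

```python
def serial_gatekeeper(primary_p, secondary_p, alpha=0.05):
    """Test secondary endpoints only if every primary endpoint is significant."""
    primaries_pass = all(p <= alpha for p in primary_p)
    secondary_hits = [i for i, p in enumerate(secondary_p)
                      if primaries_pass and p <= alpha]
    return primaries_pass, secondary_hits

# One significant primary opens the gate; secondary 0 is then also declared.
print(serial_gatekeeper([0.01], [0.03, 0.20]))   # -> (True, [0])
```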

5.
Vaccinomics is the convergence of vaccinology and population-based omics sciences. The success of knowledge-based innovations such as vaccinomics is not only contingent on access to new biotechnologies; it also requires new ways of governing science, knowledge production, and management. This article presents a conceptual analysis of the anticipatory and adaptive approaches that are crucial for the responsible design and sustainable transition of vaccinomics to public health practice. Anticipatory governance is a new approach to managing the uncertainties embedded in an innovation trajectory with participatory foresight, in order to devise governance instruments for collective "steering" of science and technology. In contrast to the hitherto narrowly framed "downstream impact assessments" for emerging technologies, anticipatory governance adopts a broader, interventionist approach that recognizes the social construction of technology design and innovation. It includes in its process explicit mechanisms for understanding the factors upstream on the innovation trajectory, such as deliberation and co-cultivation of the aims, motives, funding, design, and direction of science and technology by both experts and publics. This upstream shift from a consumer "product uptake" focus to "participatory technology design" on the innovation trajectory is an appropriately radical and necessary departure in the field of technology assessment, especially given that considerable public funds are dedicated to innovations. Recent examples of demands by research funding agencies to anticipate the broad impacts of proposed research, at a very upstream stage at the time of the research funding application, suggest that anticipatory governance with foresight may be one way in which postgenomic scientific practice could move toward responsible innovation in the future. Moreover, the present context of knowledge production in vaccinomics is such that policy making for vaccines of the 21st century is occurring in the face of uncertainties where the "facts are uncertain, values in dispute, stakes high and decisions urgent and where no single one of these dimensions can be managed in isolation from the rest." This article concludes, however, that uncertainty is not an accident of the scientific method, but its very substance. Anticipatory governance with participatory foresight offers a mechanism to respond to such inherent sociotechnical uncertainties in the emerging field of vaccinomics by making explicit the coproduction of scientific knowledge by technological and social systems. Ultimately, this serves to integrate scientific and social knowledge, thereby steering innovations to coproduce results and outputs that are socially robust and context sensitive.

6.
Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, most especially microarray and genomewide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets publicly available on an unprecedented scale. Coupling such data resources with a science of plasmode use would allow statistical methodologists to vet proposed techniques empirically (as opposed to only theoretically) and with data that are by definition realistic and representative. We illustrate the technique of empirical statistics by consideration of a common task when analyzing high dimensional data: the simultaneous testing of hundreds or thousands of hypotheses to determine which, if any, show statistical significance warranting follow-on research. The now-common practice of multiple testing in high dimensional experiment (HDE) settings has generated new methods for detecting statistically significant results. Although such methods have heretofore been subject to comparative performance analysis using simulated data, simulating data that realistically reflect data from an actual HDE remains a challenge. We describe a simulation procedure using actual data from an HDE where some truth regarding parameters of interest is known. We use the procedure to compare estimates for the proportion of true null hypotheses, the false discovery rate (FDR), and a local version of FDR obtained from 15 different statistical methods.
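Two of the standard quantities mentioned (the proportion of true nulls and FDR-controlled rejections) can be sketched as follows: a Storey-type pi0 estimate and Benjamini-Hochberg selection on synthetic p-values. This is not the plasmode procedure itself, just the kind of estimator being compared.

```python
import numpy as np

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up: which hypotheses are rejected at FDR level q."""
    p = np.sort(pvals)
    m = len(p)
    below = np.nonzero(p <= q * np.arange(1, m + 1) / m)[0]
    cutoff = p[below.max()] if below.size else 0.0
    return pvals <= cutoff

def storey_pi0(pvals, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses."""
    return min(1.0, np.mean(pvals > lam) / (1.0 - lam))

rng = np.random.default_rng(2)
p = np.concatenate([rng.uniform(size=900),          # 900 true nulls
                    rng.beta(0.2, 5.0, size=100)])  # 100 non-nulls (small p-values)
print(f"pi0 ~ {storey_pi0(p):.2f}, rejections: {bh_reject(p).sum()}")
```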

7.
The National Toxicology Program (NTP) was established in 1978 with the broad goal of strengthening the science base of chemical toxicity, thus providing better information to regulatory and research agencies. Since that time the NTP has conducted in-depth toxicity/carcinogenesis studies on over 200 chemicals of importance to industry, the public at large and the general environment; this is clearly the largest such database in the world. The database is unique in that it represents an objective, fairly standard accumulation of peer-reviewed information on a myriad of chemicals drawn from various chemical classes, non-carcinogens as well as carcinogens. The results of these studies are reported as "no evidence, equivocal evidence, some evidence or clear evidence of carcinogenic activity" in a single sex/species. There is also an "inadequate" category for studies that have major limitations. Although such findings are noted, no attempt is made to give added weight to chemicals that cause neoplasms at multiple sites, at rare versus common sites, or in both species/sexes, or to neoplasms that occur early in the study, at low as well as high doses, or in the presence or absence of toxicity (necrosis, degeneration, etc.) in the same organ. Such observational data may serve as "markers" or "alerts" for whether a chemical's in vivo carcinogenic activity is the result of mutagenic or non-mutagenic activity.

8.
Lájer (2007) raised the problem of using a non-random sample for statistical testing of plant community data. He argued that this violates basic assumptions of the tests, thus resulting in non-significant results. However, a huge part of present-day knowledge in vegetation science is still based on non-random, preferentially collected data on plant communities. I argue that, given the inherent limits of preferential sampling, a change of approach is now necessary, with the adoption of sampling based on random principles seeming the obvious choice. However, a complete transition to random-based sampling designs in vegetation science is limited by the as yet undefined nature of plant communities and by the still widespread opinion that plant communities have a discrete nature. Randomly searching for such entities is almost impossible, given their dependence on the scale of observation, plot size and shape, and the need to find well-defined types. I conclude that the only way to solve this conundrum is to consider and study plant communities as operational units. If the limits of plant communities are defined operationally, they can be investigated using proper sampling techniques and the collected data analyzed using adequate statistical tools.

9.
L. Edler, Mutation Research, 1992, 277(1): 11-33
Short-term tests (STTs) for detecting and assessing genotoxic or mutagenic effects have catalyzed the development of biostatistical methods for more than a decade. Most notably, the Ames Salmonella/microsome assay created statistical methodology with a range of applications going beyond genotoxicity. Early approaches with parametric statistical methods proved insufficient and have been replaced by non-parametric ones requiring less restrictive distributional assumptions. There have also been successful attempts to use biomathematical models for establishing dose-response relationships. Overdispersion has been recognized as a major problem for the evaluation of mutagenic count data, and methods to cope with it have become available. A theory of generalized linear modelling is emerging to combine dose-response modelling with much less restrictive distributional assumptions, while allowing the inclusion of concomitant factors arising from the experimental conditions. The methodological survey below reviews the present state of this development and is intended to promote further research into biostatistical issues and methods of analysis. Appropriate methods for the design and analysis of STTs are discussed. The progress made for the Ames assay has been only partially transferred to the analysis of the large number of other short-term assays. Several such assays are reviewed with respect to their present state of statistical evaluation.

10.
The virtual ecologist approach: simulating data and observers
Ecologists carry a well-stocked toolbox with a great variety of sampling methods, statistical analyses and modelling tools, and new methods are constantly appearing. Evaluation and optimisation of these methods is crucial to guide methodological choices. Simulating error-free data or taking high-quality data to qualify methods is common practice. Here, we emphasise the methodology of the 'virtual ecologist' (VE) approach where simulated data and observer models are used to mimic real species and how they are 'virtually' observed. This virtual data is then subjected to statistical analyses and modelling, and the results are evaluated against the 'true' simulated data. The VE approach is an intuitive and powerful evaluation framework that allows a quality assessment of sampling protocols, analyses and modelling tools. It works under controlled conditions as well as under consideration of confounding factors such as animal movement and biased observer behaviour. In this review, we promote the approach as a rigorous research tool, and demonstrate its capabilities and practical relevance. We explore past uses of VE in different ecological research fields, where it mainly has been used to test and improve sampling regimes as well as for testing and comparing models, for example species distribution models. We discuss its benefits as well as potential limitations, and provide some practical considerations for designing VE studies. Finally, research fields are identified for which the approach could be useful in the future. We conclude that VE could foster the integration of theoretical and empirical work and stimulate work that goes far beyond sampling methods, leading to new questions, theories, and better mechanistic understanding of ecological systems.
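A minimal virtual-ecologist loop, with all rates invented: simulate true occupancy, observe it through an imperfect virtual observer, and score a naive estimator against the known truth.

```python
import numpy as np

rng = np.random.default_rng(3)
true_psi = 0.6                  # true occupancy probability (assumed)
detect_p = 0.4                  # observer's per-visit detection probability (assumed)
sites, visits = 200, 3

z = rng.random(sites) < true_psi                        # true presence/absence
seen = (rng.random((sites, visits)) < detect_p) & z[:, None]  # virtual observations
naive_psi = seen.any(axis=1).mean()                     # estimator ignoring detection
print(f"truth = {true_psi}, naive estimate = {naive_psi:.2f}")  # biased low
```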

11.
There are copula-based statistical models in the literature for regression with dependent data such as clustered and longitudinal overdispersed counts, for which parameter estimation and inference are straightforward. For situations where the main interest is in the regression and other univariate parameters and not the dependence, we propose a "weighted scores method", which is based on weighting score functions of the univariate margins. The weight matrices are obtained by initially fitting a discretized multivariate normal distribution, which admits a wide range of dependence. The general methodology is applied to negative binomial regression models. Asymptotic and small-sample efficiency calculations show that our method is robust and nearly as efficient as maximum likelihood for fully specified copula models. An illustrative example is given to show the use of our weighted scores method to analyze utilization of health care based on family characteristics.
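As a baseline for the approach described (and not the weighted scores method itself), the sketch below fits an ordinary negative binomial regression under a working independence assumption to synthetic overdispersed counts.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)                       # true mean on the log link
y = rng.negative_binomial(2, 2.0 / (2.0 + mu))   # overdispersed counts with mean mu

fit = sm.NegativeBinomial(y, sm.add_constant(x)).fit(disp=0)
print(fit.params)   # intercept, slope, and the overdispersion parameter alpha
```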

12.
Some of the more important statistical methods used in the analysis of experiments concerned with studies of phytotoxicity are described, and are illustrated by data from field and laboratory experiments undertaken within the Department of Agriculture of the University of Oxford. Most attention has been given to a consideration of quantal effects, such as the proportionate mortality. Adjustments to allow for natural mortality or the appearance of additional plants during the course of the experiment are outlined, together with the conditions under which the data should be transformed before analysis. Since the relationship between the proportionate response and some function of the dose or concentration of the toxicant generally follows a normal sigmoid law, the methods of probit analysis are appropriate for precise estimation. In this connexion, the design of experiments is discussed and the calculations involved in such an analysis are illustrated. In investigations where quantitative measurements are recorded, the dose-response relationship may also be of the normal sigmoid form, so that the data can be treated by a modification of the probit technique. The methods of statistical treatment demanded when the dose-response relationship does not conform to a normal sigmoid are briefly discussed.
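A compact modern rendering of the probit calculation described, assuming invented dose-mortality data and a known control (natural) mortality handled by Abbott's correction.

```python
import numpy as np
import statsmodels.api as sm

dose = np.array([1.0, 2.0, 4.0, 8.0, 16.0])   # invented doses
n = np.array([50, 50, 50, 50, 50])            # plants treated per dose
dead = np.array([8, 15, 27, 38, 47])          # invented quantal responses
control_mortality = 0.05                      # natural mortality (Abbott's correction)

p_adj = np.clip((dead / n - control_mortality) / (1 - control_mortality),
                1e-4, 1 - 1e-4)
successes = np.round(p_adj * n)
endog = np.column_stack([successes, n - successes])   # corrected kills / survivors

X = sm.add_constant(np.log10(dose))
fit = sm.GLM(endog, X,
             family=sm.families.Binomial(link=sm.families.links.Probit())).fit()
b0, b1 = fit.params
print(f"LD50 ~ {10 ** (-b0 / b1):.2f}")   # probit(0.5) = 0  =>  log10(dose) = -b0/b1
```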

13.

Background

Age-at-harvest data are among the most commonly collected, yet neglected, demographic data gathered by wildlife agencies. Statistical population reconstruction techniques can use this information to estimate the abundance of wild populations over wide geographic areas and concurrently estimate recruitment, harvest, and natural survival rates. Although current reconstruction techniques use full age-class data (0.5, 1.5, 2.5, 3.5, … years), it is not always possible to determine an animal's age, owing to the inaccuracy of the methods and the expense and logistics of sample collection. The ability to inventory wild populations would be greatly expanded if pooled adult age-class data (e.g., 0.5, 1.5, 2.5+ years) could be successfully used in statistical population reconstruction.

Methodology/Principal Findings

We investigated the performance of statistical population reconstruction models developed to analyze full age-class and pooled adult age-class data. We performed Monte Carlo simulations using a stochastic version of a Leslie matrix model, which generated data over a wide range of abundance levels, harvest rates, and natural survival probabilities, representing medium-to-big game species. Results of full age-class and pooled adult age-class population reconstructions were compared for accuracy and precision. No discernible difference in accuracy was detected, but precision was slightly reduced when using the pooled adult age-class reconstruction. On average, the coefficient of variation increased by 0.059 when the adult age-class data were pooled prior to analyses. The analyses and maximum likelihood model for pooled adult age-class reconstruction are illustrated for a black-tailed deer (Odocoileus hemionus) population in Washington State.

Conclusions/Significance

Inventorying wild populations is one of the greatest challenges of wildlife agencies. These new statistical population reconstruction models should expand the demographic capabilities of wildlife agencies that have already collected pooled adult age-class data or are seeking a cost-effective method for monitoring the status and trends of our wild resources.
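A sketch of the kind of stochastic Leslie-type simulation described in the Methodology, with invented rates: binomial harvest and survival generate age-at-harvest counts, and the adult classes are pooled into 2.5+.

```python
import numpy as np

rng = np.random.default_rng(5)
N = np.array([600, 350, 250])                     # ages 0.5, 1.5, 2.5+ (invented)
harvest_rate, survival, recruits_per_adult = 0.15, 0.7, 0.9

for year in range(5):
    harvest = rng.binomial(N, harvest_rate)       # the age-at-harvest data
    alive = rng.binomial(N - harvest, survival)   # natural survival after harvest
    young = rng.poisson(recruits_per_adult * (alive[1] + alive[2]))
    N = np.array([young, alive[0], alive[1] + alive[2]])  # pool adults into 2.5+
    print(year, harvest.tolist(), N.tolist())
```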

14.
Phylogenetic mixtures model the inhomogeneous molecular evolution commonly observed in data. The performance of phylogenetic reconstruction methods where the underlying data are generated by a mixture model has stimulated considerable recent debate. Much of the controversy stems from simulations of mixture model data on a given tree topology for which reconstruction algorithms output a tree of a different topology; these findings were held up to show the shortcomings of particular tree reconstruction methods. In so doing, the underlying assumption was that mixture model data on one topology can be distinguished from data evolved on an unmixed tree of another topology given enough data and the "correct" method. Here we show that this assumption can be false. For biologists, our results imply that, for example, the combined data from two genes whose phylogenetic trees differ only in terms of branch lengths can perfectly fit a tree of a different topology.

15.
Understanding the role of genetic variation in human diseases remains an important problem to be solved in genomics. An important component of such variation consists of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs with phenotypes has been confounded by hidden factors such as the presence of population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors, such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. Our method's runtime scales quadratically in the number of individuals being studied, with only a modest loss in statistical power compared to LMM-based and PCA-based methods when tested on synthetic data generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime than LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science.
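For contrast with the article's model, here is the PCA-adjusted baseline it is compared against, on synthetic genotypes: regress the phenotype on one SNP plus the top principal components as structure covariates.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
n, m = 500, 200
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)  # synthetic genotypes (0/1/2)
y = 0.5 * G[:, 0] + rng.normal(size=n)               # SNP 0 is truly associated

pcs = PCA(n_components=5).fit_transform(G)           # proxies for population structure
X = sm.add_constant(np.column_stack([G[:, 0], pcs])) # SNP + structure covariates
print(sm.OLS(y, X).fit().pvalues[1])                 # p-value for the tested SNP
```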

16.
Under the boundary line model for a biological data set, where one variable is a biological response (e.g. crop yield) to an independent variable (e.g. available water content of the soil), we interpret the upper (or lower) boundary on a plot of the dependent variable (ordinate) against the independent variable (abscissa) as representing the maximum (or minimum) possible response for a given value of the independent variable. This concept has been widely used in soil science, agronomy and plant physiology; but it has been subject to criticism. In particular, no methods that are used to analyse the boundary line quantify the evidence that the envelope of the plot represents a boundary (in the sense of some limiting response to the independent variable) rather than simply being a fringe of extreme values of no intrinsic biological interest. In this article, we present a novel procedure that tests a data set for evidence of a boundary by considering its statistical properties in the region of the proposed boundary. The method is demonstrated using both simulated and real data sets.
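One common descriptive way to fit an upper boundary (not the article's novel test for whether a boundary exists) is high-quantile regression, sketched here on invented water/yield data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
water = rng.uniform(5, 40, size=300)                       # independent variable
yld = np.minimum(0.3 * water, 9.0) * rng.uniform(0.3, 1.0, size=300)  # capped response
df = pd.DataFrame({"water": water, "yld": yld})

boundary = smf.quantreg("yld ~ water", df).fit(q=0.95)     # 95th-percentile line
print(boundary.params)
```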

17.
This article applies a simple method for settings where one has clustered data, but statistical methods are only available for independent data. We assume the statistical method provides us with a normally distributed estimate, theta, and an estimate of its variance, sigma^2. We randomly select a data point from each cluster and apply our statistical method to these independent data. We repeat this multiple times, and use the average of the associated thetas as our estimate. An estimate of the variance is given by the average of the sigma^2's minus the sample variance of the thetas. We call this procedure multiple outputation, as all "excess" data within each cluster are thrown out multiple times. Hoffman, Sen, and Weinberg (2001, Biometrika 88, 1121-1134) introduced this approach for generalized linear models when the cluster size is related to outcome. In this article, we demonstrate the broad applicability of the approach. Applications to angular data, p-values, vector parameters, Bayesian inference, genetics data, and random cluster sizes are discussed. In addition, asymptotic normality of estimates based on all possible outputations, as well as on a finite number of outputations, is proven under weak conditions. Multiple outputation provides a simple and broadly applicable method for analyzing clustered data. It is especially suited to settings where methods for clustered data are impractical, but can also be applied generally as a quick and simple tool.
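The procedure as described translates almost directly into code; the estimator and data below are invented placeholders.

```python
import numpy as np

def multiple_outputation(clusters, estimator, B=1000, rng=None):
    """Average theta over B outputations; variance = mean(sigma^2) - var(theta)."""
    rng = rng or np.random.default_rng()
    thetas, sig2s = [], []
    for _ in range(B):
        sample = np.array([c[rng.integers(len(c))] for c in clusters])  # 1 per cluster
        theta, sig2 = estimator(sample)
        thetas.append(theta)
        sig2s.append(sig2)
    thetas = np.asarray(thetas)
    return thetas.mean(), np.mean(sig2s) - thetas.var(ddof=1)

def mean_and_se2(x):                      # invented independent-data estimator
    return x.mean(), x.var(ddof=1) / len(x)

clusters = [np.random.default_rng(i).normal(i % 3, 1.0, size=5) for i in range(30)]
print(multiple_outputation(clusters, mean_and_se2, rng=np.random.default_rng(8)))
```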

18.
1. In the literature, two interesting methods are described for obtaining, from whole pooled brains or brain areas, three types of mitochondria, namely those of perikaryal origin and those contained in synaptosomes. 2. However, for many types of studies such "preparative" preparations are not useful; for example, in pharmacological studies only data in which each of n animals contributes its own observation are statistically useful and can be correctly analyzed by statistical tests. 3. Thus a method is described by which it was possible to characterize, by enzyme activities, three mitochondrial populations from a single rat brain hippocampus. 4. During the preparative "analytical" procedure, it was noted that the 10% Ficoll gradients previously used in the literature were unable to yield purified free mitochondria. This gradient should be 12% Ficoll for single areas. 5. In addition, when results are compared using the more appropriate omega^2*t for calculating the gravity forces to be applied, instead of the maximum or average g for different rotors, enzymatic characterization differed considerably among the various mitochondrial populations. 6. The above considerations also hold when different pestle clearances and/or pestle rotation speeds are used during homogenization; lysis conditions are likewise essential. 7. Results showed that carefully selected experimental conditions must be used when subcellular fractions are to be analyzed biochemically.
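A small numeric aid for point 5: computing omega^2*t and the relative centrifugal force from rotor speed and radius (the speed, time, and radius are illustrative values).

```python
import math

rpm, minutes, radius_cm = 12000, 10, 7.0
omega = 2 * math.pi * rpm / 60.0               # angular velocity (rad/s)
omega2t = omega ** 2 * minutes * 60.0          # integrated omega^2 * t over the run
rcf = omega ** 2 * (radius_cm / 100.0) / 9.81  # relative centrifugal force, in g
print(f"omega^2*t = {omega2t:.3e}, RCF = {rcf:.0f} x g")
```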

19.
We discuss numerical methods for simulating large-scale, integrate-and-fire (I&F) neuronal networks. Important elements in our numerical methods are (i) a neurophysiologically inspired integrating factor which casts the solution as a numerically tractable integral equation, and allows us to obtain stable and accurate individual neuronal trajectories (i.e., voltage and conductance time-courses) even when the I&F neuronal equations are stiff, such as in strongly fluctuating, high-conductance states; (ii) an iterated process of spike-spike corrections within groups of strongly coupled neurons to account for spike-spike interactions within a single large numerical time-step; and (iii) a clustering procedure of firing events in the network to take advantage of localized architectures, such as spatial scales of strong local interactions, which are often present in large-scale computational models, for example those of the primary visual cortex. (We note that the spike-spike corrections in our methods are more involved than the correction of single-neuron spike times via polynomial interpolation, as in the modified Runge-Kutta methods commonly used in simulations of I&F neuronal networks.) Our methods can evolve networks with relatively strong local interactions in an asymptotically optimal way, such that the operation count per neuronal firing event scales optimally with N, where N is the number of neurons in the system. We note that quantifications used in computational modeling are often statistical, since measurements in a real experiment to characterize physiological systems are typically statistical, such as firing rates, interspike interval distributions, and spike-triggered voltage distributions. We emphasize that it takes much less computational effort to resolve statistical properties of certain I&F neuronal networks than to fully resolve trajectories of each and every neuron within the system. For networks operating in realistic dynamical regimes, such as strongly fluctuating, high-conductance states, our methods are designed to achieve statistical accuracy when very large time-steps are used. Moreover, our methods can also achieve trajectory-wise accuracy when small time-steps are used.
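A single-neuron sketch of the integrating-factor idea in element (i), under the assumption that conductances are held fixed over a time-step: the update below is exact for that frozen-coefficient equation and remains stable however large the total conductance becomes. All parameter values are illustrative.

```python
import math

def lif_step(V, g_L, g_E, E_L, E_E, dt):
    """Exact update of dV/dt = -g_tot*(V - V_eff) with conductances frozen over dt."""
    g_tot = g_L + g_E
    V_eff = (g_L * E_L + g_E * E_E) / g_tot       # effective reversal potential
    return V_eff + (V - V_eff) * math.exp(-g_tot * dt)

V, thresh, reset = -65.0, -50.0, -65.0            # mV (illustrative values)
spikes = 0
for _ in range(1000):
    V = lif_step(V, g_L=0.05, g_E=0.03, E_L=-65.0, E_E=0.0, dt=0.5)  # dt in ms
    if V >= thresh:
        V, spikes = reset, spikes + 1             # fire-and-reset rule
print(f"spikes = {spikes}, final V = {V:.2f} mV")
```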

20.
An adaptation of some methods used in obesity research is presented as a teaching example to illustrate the use of 'animal models' in medical research. From sixth-form level upwards, it serves as a theoretical exercise in the analysis and interpretation of methods and data. For undergraduates it is also suitable as a laboratory exercise in dissection and measurement techniques.

Four simple means of altering fat levels in laboratory mice are described, contrasting invasive injection techniques with non-invasive dietary and behavioural means. Several measures of body fatness can be evaluated, using either the experimental mice or, more simply, untreated mice from outbred stocks. Guidelines are given for practicals in which objective and subjective measures of fatness in human subjects can be collected. Sample data for mice are given that, alone or together with class-collected data, allow graphical and statistical analysis. Throughout, the exercise lends itself to discussion of the assumptions both of the methods used on mice and men, and of the relation of such investigations to the problems of overweight in man. Obesity has many manifestations: the study of a variety of 'animal models' is one approach in the search for relevant physiological knowledge.
