Similar Literature
20 similar records found.
1.
Maitra R, Ramler IP. Biometrics 2009, 65(2):341-352
Summary: A new methodology is proposed for clustering datasets in the presence of scattered observations. Scattered observations are defined as points unlike any others, so traditional approaches that force them into groups can lead to erroneous conclusions. Our suggested approach is a scheme that, under the assumption of homogeneous spherical clusters, iteratively builds cores around the cluster centers and groups points within each core, while identifying points outside all cores as scatter. In the absence of scatter, the algorithm reduces to k-means. We also provide methodology to initialize the algorithm and to estimate the number of clusters in the dataset. Results in experimental situations show excellent performance, especially when clusters are elliptically symmetric. The methodology is applied to the analysis of the United States Environmental Protection Agency's Toxic Release Inventory reports on industrial releases of mercury for the year 2000.
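A minimal sketch of the core idea in Python/NumPy appears below. The fixed core radius and hand-picked starting centers are placeholder assumptions (the paper derives both from the data), so this illustrates the scheme rather than the published algorithm.

```python
import numpy as np

def kmeans_with_scatter(X, centers_init, radius, n_iter=50):
    """Core-based clustering: points farther than `radius` from every
    center are labeled scatter (-1) and centers are re-estimated from
    core members only. The fixed radius and hand-picked starting centers
    are placeholder assumptions, not the paper's initialization scheme."""
    centers = np.asarray(centers_init, dtype=float).copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.where(d.min(axis=1) <= radius, d.argmin(axis=1), -1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (100, 2)),   # two tight spherical clusters
               rng.normal(4, 0.5, (100, 2)),
               rng.uniform(-4, 8, (20, 2))])   # plus uniform scatter
labels, _ = kmeans_with_scatter(X, centers_init=[[1, 1], [3, 3]], radius=1.5)
print("points flagged as scatter:", int((labels == -1).sum()))
# With radius=np.inf every point is assigned, recovering ordinary k-means,
# in line with the reduction noted in the abstract.
```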

2.
Understanding the molecular mechanisms of protein-protein interactions (PPIs) at the cell surface of living cells is fundamental to comprehending the functional meaning of a large number of cellular processes. Here we discuss how new methodological strategies derived from non-invasive fluorescence-based approaches (e.g., fluorescence resonance energy transfer, FRET) have been successfully developed to characterize plasma membrane PPIs. Importantly, these technologies alone, or in concert with complementary methods (e.g., SNAP-tag/TR-FRET, TIRF/FRET), can become extremely powerful approaches for visualizing cell surface PPIs, even between more than two proteins and also in native tissues. Interestingly, these methods are also relevant to drug discovery, whether to develop new high-throughput screening approaches or to identify new therapeutic targets. Accordingly, we provide a thorough assessment of all biotechnological aspects, including strengths and weaknesses, of these fluorescence-based methodologies as applied to the study of PPIs occurring at the cell surface of living cells.
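As a concrete illustration of why FRET reports cell-surface proximity, the sketch below evaluates the standard Förster relation; the R0 value is a generic placeholder rather than a figure from the article.

```python
def fret_efficiency(r_nm, r0_nm=5.0):
    """Förster relation E = 1 / (1 + (r/R0)^6). R0, the 50%-transfer
    distance, is donor/acceptor-pair specific; 5 nm is a generic
    placeholder, not a value taken from the article."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)

for r in (2.5, 5.0, 7.5, 10.0):
    print(f"r = {r:4.1f} nm  ->  E = {fret_efficiency(r):.3f}")
# The r^6 falloff confines measurable transfer to below ~10 nm, i.e., to
# distances typical of direct protein-protein contact at the membrane.
```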

3.
Chemical-specific hazard quotient (HQ) risk characterization in ecological risk assessment (ERA) can be a value-added tool for risk-management decision-making at chemical release sites, when applied appropriately. However, there is little consensus regarding how HQ results can be used for risk-management decision-making at the population, community, and ecosystem levels. Furthermore, stakeholders are reluctant to consider alternatives to HQ results for risk-management decisions. Chemical-specific HQ risk characterization should be viewed as only one of several approaches (i.e., tools) for addressing ecological issues; in many situations, other quantitative and qualitative approaches will likely result in superior risk-management decisions. The purpose of this paper is to address fundamental issues and limitations associated with chemical-specific HQ risk characterization in ERA, to identify when it may be appropriate, to explore alternatives that are currently available, and to identify areas that could be developed for the future. Several alternatives (i.e., compensatory restoration, performance-based ecological monitoring, ecological significance criteria, net environmental benefit analysis), including their limitations, that can supplement, augment, or substitute for HQs in ERA are presented. In addition, areas of research (i.e., wildlife habitat assessment/landscape ecology/population biology, and field-validated risk-based screening levels) that could yield new tools are discussed.
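For readers unfamiliar with the quotient itself, a minimal worked example follows, with hypothetical exposure and toxicity-reference values.

```python
def hazard_quotient(exposure, trv):
    """HQ = exposure estimate / toxicity reference value (TRV),
    both in the same units (here mg/kg body weight per day)."""
    return exposure / trv

# Hypothetical screening-level inputs for one receptor and one chemical:
hq = hazard_quotient(exposure=0.12, trv=0.05)
print(f"HQ = {hq:.1f}")
# HQ >= 1 flags a potential concern for an individual receptor only; as the
# paper argues, it does not by itself support population-, community-, or
# ecosystem-level risk-management conclusions.
```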

4.
Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations have so far remained fairly constant. We study the potential contribution of increasing the amount of information utilized by RNA folding prediction models to the improvement of their prediction quality. This is achieved by proposing novel models, which refine previous ones by examining more types of structural elements, and larger sequential contexts for these elements. Our proposed fine-grained models are made practical thanks to the availability of large training sets, advances in machine learning, and recent accelerations to RNA folding algorithms. We show that the application of more detailed models indeed improves prediction quality, while the corresponding running time of the folding algorithm remains fast. An additional important outcome of this experiment is a new RNA folding prediction model (coupled with a freely available implementation), which results in a significantly higher prediction quality than that of previous models. This final model has about 70,000 free parameters, several orders of magnitude more than previous models. Being trained and tested over the same comprehensive data sets, our model achieves a score of 84% according to the F1-measure over correctly predicted base-pairs (i.e., a 16% error rate), compared to the previously best reported score of 70% (i.e., a 30% error rate). That is, the new model yields an error reduction of about 50%. Trained models and source code are available at www.cs.bgu.ac.il/~negevcb/contextfold.
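The paper's feature-rich scoring model is not reproduced here; as a didactic stand-in, the sketch below implements the classic Nussinov dynamic program that underlies folding algorithms of this kind, maximizing nested base pairs rather than a 70,000-parameter score.

```python
def nussinov(seq, min_loop=3):
    """Classic Nussinov recursion: maximize the number of nested base pairs.
    A didactic stand-in only; ContextFold scores many fine-grained
    structural/sequential features, not just pair counts."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
             ("G", "U"), ("U", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                   # base j left unpaired
            for k in range(i, j - min_loop):      # base j paired with base k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))  # -> 3 nested pairs
```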

5.
The advent of next-generation sequencing has coincided with a growth of interest in using these approaches to better understand the role of the structure and function of microbial communities in human, animal, and environmental health. Yet, the use of next-generation sequencing to perform 16S rRNA gene sequence surveys has generated considerable controversy surrounding the effects of sequencing errors on downstream analyses. We analyzed 2.7×10^6 reads distributed among 90 identical mock community samples, which were collections of genomic DNA from 21 different species with known 16S rRNA gene sequences; we observed an average error rate of 0.0060. To improve this error rate, we evaluated numerous methods of identifying bad sequence reads, identifying regions of poor quality within reads, and correcting base calls, and we were able to reduce the overall error rate to 0.0002. Implementation of the PyroNoise algorithm provided the best combination of error rate, sequence length, and number of sequences. Perhaps more problematic than sequencing errors was the presence of chimeras generated during PCR. Because we knew the true sequences within the mock community and the chimeras they could form, we identified 8% of the raw sequence reads as chimeric. After quality-filtering the raw sequences and using the Uchime chimera detection program, the overall chimera rate decreased to 1%. The chimeras that could not be detected were largely responsible for the identification of spurious operational taxonomic units (OTUs) and genus-level phylotypes. The number of spurious OTUs and phylotypes increased with sequencing effort, indicating that comparisons of communities should be made using an equal number of sequences. Finally, we applied our improved quality-filtering pipeline to several benchmarking studies and observed that, even with our stringent data curation pipeline, biases in the data generation pipeline and batch effects were observed that could potentially confound the interpretation of microbial community data.
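A toy sketch of the basic error-rate computation against a known mock-community reference; real pipelines align with indels and quality-filter before this step.

```python
def read_error_rate(read, reference):
    """Per-read error rate against the known reference, assuming the read
    is already aligned position-by-position (real pipelines align with
    indels first)."""
    mismatches = sum(a != b for a, b in zip(read, reference))
    return mismatches / len(read)

reads = ["ACGTTGCA", "ACGATGCA", "ACGTTGGA"]   # toy mock-community reads
reference = "ACGTTGCA"                          # known 16S fragment
rates = [read_error_rate(r, reference) for r in reads]
print(f"average error rate = {sum(rates) / len(rates):.4f}")
# Quality filtering and denoising (e.g., PyroNoise) act on this quantity,
# driving it from the 0.0060 toward the 0.0002 reported in the abstract.
```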

6.
We derive and compare the operating characteristics of hierarchical and square array-based testing algorithms for case identification in the presence of testing error. The operating characteristics investigated include efficiency (i.e., expected number of tests per specimen) and error rates (i.e., sensitivity, specificity, positive and negative predictive values, per-family error rate, and per-comparison error rate). The methodology is illustrated by comparing different pooling algorithms for the detection of individuals recently infected with HIV in North Carolina and Malawi.
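A minimal sketch of the operating characteristics for a two-stage (Dorfman-type) hierarchical algorithm, under the simplifying assumptions noted in the comments; the paper treats richer algorithms, including square arrays.

```python
def dorfman_operating_chars(p, k, se=0.99, sp=0.99):
    """Expected tests per specimen for two-stage (Dorfman) pooling with
    assay sensitivity `se` and specificity `sp` at prevalence `p`.
    Simplifying assumptions: no dilution effect (pool sensitivity = se)
    and errors independent across the two stages."""
    p_pool_pos = se * (1 - (1 - p) ** k) + (1 - sp) * (1 - p) ** k
    efficiency = 1 / k + p_pool_pos    # master test share + retest share
    sensitivity = se * se              # a case must test positive twice
    return efficiency, sensitivity

for k in (5, 10, 20):
    eff, sens = dorfman_operating_chars(p=0.01, k=k)
    print(f"pool size {k:2d}: tests/specimen = {eff:.3f}, "
          f"sensitivity = {sens:.4f}")
# At low prevalence, pooling cuts the per-specimen testing burden by ~5x
# at the cost of a modest loss in sensitivity.
```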

7.
The pharmaceutical industry and regulatory agencies are increasingly interested in conducting bridging studies in order to bring an approved drug product from the original region (e.g., the United States or European Union) to a new region (e.g., Asian-Pacific countries). In this article, we provide a new methodology for the design and analysis of bridging studies by assuming prior knowledge of how the null and alternative hypotheses in the original, foreign study are related to the null and alternative hypotheses in the bridging study, and by setting the type I error for the bridging study according to the strength of the foreign-study evidence. The new methodology accounts for randomness in the foreign-study evidence and controls the average type I error of the bridging study over all possibilities of the foreign-study evidence. In addition, the new methodology increases statistical power compared to approaches that do not use foreign-study evidence, and it allows for the possibility of not conducting the bridging study when the foreign-study evidence is unfavorable. Finally, we conducted extensive simulation studies to demonstrate the usefulness of the proposed methodology.
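The control notion can be illustrated by simulation. The spending rule below is a hypothetical stand-in (the paper's rule differs), tuned so that the type I error averaged over the foreign-study evidence equals the nominal level.

```python
import numpy as np

rng = np.random.default_rng(1)

def bridging_alpha(z_foreign, alpha_bar=0.025, c=1.645, scale=1.66):
    """Hypothetical spending rule (not the paper's): conduct the bridging
    study only when the foreign z-statistic clears c, and allocate a
    type I error level that grows with the strength of that evidence."""
    return np.where(z_foreign > c, scale * alpha_bar * (z_foreign - c), 0.0)

# Average type I error over the foreign-study evidence, with the foreign
# z-statistic drawn from an assumed N(2, 1) "promising drug" scenario;
# `scale` was tuned by simulation so this average lands near alpha_bar:
z = rng.normal(2.0, 1.0, size=1_000_000)
print(f"average type I error = {np.clip(bridging_alpha(z), 0, 1).mean():.4f}")
# Unfavorable foreign evidence (z <= c) contributes zero, reflecting the
# option of not conducting the bridging study at all.
```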

8.
Controlling the proportion of false positives in multiple dependent tests
Genome-scan mapping experiments involve multiple tests of significance. Thus, controlling the error rate in such experiments is important. Simple extension of classical concepts results in attempts to control the genomewise error rate (GWER), i.e., the probability of even a single false positive among all tests. This results in very stringent comparisonwise error rates (CWER) and, consequently, low experimental power. Here we present an approach based on controlling the proportion of false positives (PFP) among all positive test results. The CWER needed to attain a desired PFP level does not depend on the correlation among the tests or on the number of tests, as in other approaches. Estimating the PFP requires an estimate of the proportion of true null hypotheses, and we show how this can be obtained directly from experimental results. The PFP approach is similar to the false discovery rate (FDR) and positive false discovery rate (pFDR) approaches. For a fixed CWER, we have estimated PFP, FDR, pFDR, and GWER through simulation under a variety of models to illustrate practical and philosophical similarities and differences among the methods.
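A minimal sketch of the PFP estimate, using a Storey-type tail estimate for the proportion of true nulls; the p-value mixture below is hypothetical.

```python
import numpy as np

def estimate_pfp(pvals, alpha, lam=0.5):
    """Estimate the proportion of false positives (PFP) among tests
    significant at comparisonwise level `alpha`: expected false positives
    (m * pi0 * alpha) over observed positives. pi0, the share of true
    nulls, is estimated from the p-value tail, Storey-style."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    pi0 = min(1.0, float(np.mean(pvals > lam)) / (1 - lam))
    n_pos = max(1, int(np.sum(pvals <= alpha)))
    return (m * pi0 * alpha) / n_pos

rng = np.random.default_rng(0)
# Hypothetical mixture: 950 true nulls plus 50 real signals.
p = np.concatenate([rng.uniform(size=950), rng.beta(0.5, 12, size=50)])
print(f"estimated PFP at alpha = 0.01: {estimate_pfp(p, 0.01):.3f}")
```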

9.
Retroviral insertional mutagenesis screens, which identify genes involved in tumor development in mice, have yielded a substantial number of retroviral integration sites, and this number is expected to grow substantially with the introduction of high-throughput screening techniques. The data from various retroviral insertional mutagenesis screens are compiled in the publicly available Retroviral Tagged Cancer Gene Database (RTCGD). Integrally analyzing these screens for the presence of common insertion sites (CISs, i.e., regions in the genome that have been hit by viral insertions in multiple independent tumors significantly more often than expected by chance) requires an approach that corrects for the increased probability of finding false CISs as the amount of available data increases. Moreover, significance estimates for CISs should take into account both the noise arising from the random nature of the insertion process and the bias stemming from preferential insertion sites in the genome and from the data-retrieval methodology. We introduce a framework, the kernel convolution (KC) framework, to find CISs in a noisy and biased environment at a predefined significance level while controlling the family-wise error (FWE), the probability of detecting false CISs. Where previous methods use one, two, or three predetermined fixed scales, our method is capable of operating at any biologically relevant scale. This creates the possibility of analyzing CISs in a scale space by varying the width of the CISs, providing new insights into the behavior of CISs across multiple scales. Our method also allows the inclusion of models for background bias. Using simulated data, we evaluate the KC framework with three kernel functions: the Gaussian, triangular, and rectangular kernels. We applied the Gaussian KC to the data from the combined set of screens in the RTCGD and found that 53% of the CISs do not reach the significance threshold in this combined setting. Still, with the FWE under control, application of our method resulted in the discovery of eight novel CISs, each of which has a probability of less than 5% of being a false detection.
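A stripped-down sketch of the kernel-convolution idea at a single scale, with an FWE threshold from uniform-insertion permutations; the published framework additionally models background bias and operates across scales.

```python
import numpy as np

def kc_peak_heights(positions, genome_len, scale, grid=5_000):
    """Kernel-convolution density of insertion sites for one kernel width
    (`scale`), using a Gaussian kernel evaluated on a coarse grid."""
    x = np.linspace(0, genome_len, grid)
    d = x[:, None] - np.asarray(positions)[None, :]
    return np.exp(-0.5 * (d / scale) ** 2).sum(axis=1)

def fwe_threshold(n_ins, genome_len, scale, n_perm=100, fwe=0.05, seed=0):
    """Maximum peak height under uniform random insertion; its (1 - fwe)
    quantile is a family-wise-error threshold. The published framework
    additionally models background insertion bias, omitted here."""
    rng = np.random.default_rng(seed)
    maxima = [kc_peak_heights(rng.uniform(0, genome_len, n_ins),
                              genome_len, scale).max()
              for _ in range(n_perm)]
    return np.quantile(maxima, 1 - fwe)

# Hypothetical data: 200 background insertions plus a 15-insertion hotspot.
rng = np.random.default_rng(42)
pos = np.concatenate([rng.uniform(0, 1e6, 200), rng.normal(4e5, 2e3, 15)])
dens = kc_peak_heights(pos, 1e6, scale=5e3)
print("CIS detected:", dens.max() > fwe_threshold(len(pos), 1e6, scale=5e3))
```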

10.
Approaches based on linear mixed models (LMMs) have recently gained popularity for modelling population substructure and relatedness in genome-wide association studies. In the last few years, a bewildering variety of LMM methods and software packages have been developed, but it is not always clear how (or indeed whether) any newly proposed method differs from previously proposed implementations. Here we compare the performance of several LMM approaches (and software implementations, including EMMAX, GenABEL, FaST-LMM, Mendel, GEMMA and MMM) via their application to a genome-wide association study of visceral leishmaniasis in 348 Brazilian families comprising 3626 individuals (1972 genotyped). The implementations differ in the precise details of the methodology implemented and in various user-chosen options, such as the method and number of SNPs used to estimate the kinship (relatedness) matrix. We investigate sensitivity to these choices and the success (or otherwise) of the approaches in controlling the overall genome-wide error rate for both real and simulated phenotypes. We compare the LMM results to those obtained using traditional family-based association tests (based on transmission of alleles within pedigrees) and to alternative approaches implemented in the software packages MQLS, ROADTRIPS and MASTOR. We find strong concordance between the results from different LMM approaches, and all are successful in controlling the genome-wide error rate (except for some approaches when applied naively to longitudinal data with many repeated measures). We also find high correlation between LMMs and alternative approaches (apart from transmission-based approaches when applied to SNPs with small or non-existent effects). We conclude that LMM approaches perform well in comparison to competing approaches. Given their strong concordance, in most applications the choice of precise LMM implementation cannot be based on power/type I error considerations but must instead be based on considerations such as speed and ease of use.
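Central to every LMM implementation compared here is the kinship (relatedness) matrix; a minimal sketch of the standard genotype-based estimate follows. Which and how many SNPs enter it is exactly one of the user choices the study investigates.

```python
import numpy as np

def kinship_matrix(G):
    """Genotype-based kinship/GRM estimate commonly used by LMM methods:
    K = Z Z' / m with column-standardized genotype matrix Z (m SNPs).
    Implementations differ in which and how many SNPs enter here."""
    G = np.asarray(G, dtype=float)
    freq = G.mean(axis=0) / 2.0                       # allele frequencies
    Z = (G - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
    return Z @ Z.T / G.shape[1]

rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(50, 1000))             # 50 people x 1000 SNPs
K = kinship_matrix(G)
print(K.shape, round(float(np.diag(K).mean()), 2))    # (50, 50), diag ~1.0
# The LMM then fits y = Xb + g + e with cov(g) = sigma_g^2 * K, absorbing
# relatedness and substructure instead of treating samples as independent.
```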

11.
It is typical in QTL mapping experiments that the number of markers under investigation is large. This poses a challenge to commonly used regression models, since the number of feature variables is usually much larger than the sample size, especially when epistasis effects are to be considered. The greedy nature of conventional stepwise procedures is well known and is even more conspicuous in such cases. In this article, we propose a two-phase procedure based on penalized likelihood techniques and the extended Bayesian information criterion (EBIC) for QTL mapping. The procedure consists of a screening phase and a selection phase. In the screening phase, the main and interaction features are alternately screened by a penalized likelihood mechanism. In the selection phase, a low-dimensional approach using EBIC is applied to the features retained in the screening phase to identify QTL. The two-phase procedure has the asymptotic property that its positive detection rate (PDR) and false discovery rate (FDR) converge to 1 and 0, respectively, as the sample size goes to infinity. The two-phase procedure is compared with both traditional and recently developed approaches in simulation studies. A real data analysis is presented to demonstrate the application of the two-phase procedure.
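A worked illustration of the EBIC used in the selection phase, with hypothetical model fits; the criterion's extra 2*gamma*k*log(p) term is what tames the p >> n setting.

```python
import numpy as np

def ebic(loglik, k, n, p, gamma=1.0):
    """Extended BIC (Chen & Chen): ordinary BIC plus a penalty growing
    with the number of *candidate* features p, guarding against the
    false discoveries that greedy search invites when p >> n."""
    return -2 * loglik + k * np.log(n) + 2 * gamma * k * np.log(p)

# Hypothetical comparison of two candidate QTL models on n = 200 lines,
# screened from p = 5000 main-effect and interaction features:
n, p = 200, 5000
print(ebic(loglik=-310.0, k=3, n=n, p=p))   # smaller model: ~687
print(ebic(loglik=-305.0, k=6, n=n, p=p))   # larger model:  ~744
# The larger model gains 10 in -2*loglik but pays ~67 in penalty for its
# three extra terms, so EBIC retains the smaller model.
```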

12.
MOTIVATION: In microarray studies, gene discovery based on fold-change values is often misleading because error variability for each gene is heterogeneous under different biological conditions and intensity ranges. Several statistical testing methods for differential gene expression have been suggested, but some of these approaches are underpowered and result in high false positive rates because within-gene variance estimates are based on a small number of replicated arrays. RESULTS: We propose to use local-pooled-error (LPE) estimates and robust statistical tests for evaluating the significance of each gene's differential expression. Our LPE estimation is based on pooling errors within genes and between replicate arrays for genes whose expression values are similar. We have applied our LPE method to compare gene expression in naïve and activated CD8+ T-cells. Our results show that the LPE method effectively identifies significant differential-expression patterns with a small number of replicated arrays. AVAILABILITY: The methodology is implemented as S-PLUS and R functions available at http://hesweb1.med.virginia.edu/bioinformatics
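A toy sketch of the local-pooled-error idea, binning genes by average intensity and pooling variances within bins; binning by quantiles is a simplification of the published smoothing.

```python
import numpy as np

def lpe_variance(means, variances, n_bins=10):
    """Local-pooled-error idea: bin genes by average intensity and give
    each gene the pooled (median) within-bin variance, stabilizing
    per-gene estimates that rest on very few replicate arrays.
    Quantile bins are a simplification of the published smoothing."""
    order = np.argsort(means)
    pooled = np.empty_like(variances)
    for idx in np.array_split(order, n_bins):     # intensity bins
        pooled[idx] = np.median(variances[idx])
    return pooled

rng = np.random.default_rng(0)
mu = rng.uniform(4, 14, 2000)                     # per-gene log-intensities
true_sd = 1.5 / np.sqrt(mu)                       # noise shrinks w/ intensity
x = rng.normal(mu[:, None], true_sd[:, None], size=(2000, 3))  # 3 replicates
naive_var = x.var(axis=1, ddof=1)                 # very noisy with n = 3
lpe_var = lpe_variance(x.mean(axis=1), naive_var)
print(f"naive SD of variances: {naive_var.std():.4f}, "
      f"pooled: {lpe_var.std():.4f}")
# A z-like test statistic would then use the pooled variance in its
# denominator instead of the unstable per-gene estimate.
```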

13.
Fragment-based approaches to enzyme inhibition
Fragment-based approaches have provided a new paradigm for small-molecule drug discovery. The methodology is complementary to high-throughput screening approaches, starting from fragments of low molecular complexity and high ligand efficiency, and building up to more potent inhibitors. The approach, which depends heavily on a number of biophysical techniques, is now being taken up by more groups in both industry and academia. This article describes key aspects of the process and highlights recent developments and applications.
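Ligand efficiency, the metric that lets weakly binding fragments compete with potent screening hits, is easy to make concrete; the compounds below are hypothetical.

```python
import math

def ligand_efficiency(kd_molar, heavy_atoms, temp_k=298.15):
    """LE = -RT ln(Kd) / heavy-atom count (kcal/mol per heavy atom),
    the figure of merit that lets a weak, small fragment outrank a
    potent but bloated screening hit."""
    R = 0.001987  # kcal/(mol*K)
    dg = R * temp_k * math.log(kd_molar)   # binding free energy (negative)
    return -dg / heavy_atoms

# Hypothetical pair: a 1 mM fragment with 12 heavy atoms versus
# a 100 nM high-throughput-screening hit with 38 heavy atoms:
print(f"fragment LE = {ligand_efficiency(1e-3, 12):.2f}")   # ~0.34
print(f"HTS hit  LE = {ligand_efficiency(1e-7, 38):.2f}")   # ~0.25
# Despite binding 10,000-fold more weakly, the fragment is the more
# efficient starting point to build up from.
```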

14.
Two-part joint models for a longitudinal semicontinuous biomarker and a terminal event have recently been introduced based on frequentist estimation. The biomarker distribution is decomposed into a probability of a positive value and the expected value among positive values. Shared random effects can represent the association structure between the biomarker and the terminal event. The computational burden increases compared to standard joint models with a single regression model for the biomarker. In this context, the frequentist estimation implemented in the R package frailtypack can be challenging for complex models (i.e., a large number of parameters and a high dimension of the random effects). As an alternative, we propose Bayesian estimation of two-part joint models based on the Integrated Nested Laplace Approximation (INLA) algorithm to alleviate the computational burden and fit more complex models. Our simulation studies confirm that INLA provides accurate approximations of posterior estimates and reduces the computation time and variability of estimates compared to frailtypack in the situations considered. We contrast the Bayesian and frequentist approaches in the analysis of two randomized cancer clinical trials (the GERCOR and PRIME studies), where INLA shows reduced variability for the association between the biomarker and the risk of the event. Moreover, the Bayesian approach was able to characterize subgroups of patients associated with different responses to treatment in the PRIME study. Our study suggests that the Bayesian approach using the INLA algorithm makes it possible to fit complex joint models that might be of interest in a wide range of clinical applications.
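The two-part decomposition itself is compact; a toy sketch on simulated semicontinuous data follows (the full joint model adds covariate regressions, shared random effects, and the survival submodel).

```python
import numpy as np

def two_part_summary(y):
    """Two-part view of a semicontinuous biomarker:
    E[Y] = P(Y > 0) * E[Y | Y > 0]. In the joint model each factor gets
    its own regression (logistic / continuous), linked to the terminal
    event through shared random effects, omitted in this toy version."""
    y = np.asarray(y, dtype=float)
    p_pos = float((y > 0).mean())
    mean_pos = float(y[y > 0].mean()) if p_pos > 0 else 0.0
    return p_pos, mean_pos, p_pos * mean_pos

rng = np.random.default_rng(2)
# Hypothetical biomarker: ~40% structural zeros, lognormal when positive.
y = rng.lognormal(1.0, 0.5, 500) * (rng.uniform(size=500) > 0.4)
p_pos, mean_pos, overall = two_part_summary(y)
print(f"P(Y>0) = {p_pos:.2f}, E[Y|Y>0] = {mean_pos:.2f}, E[Y] = {overall:.2f}")
```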

15.
In recent years, the number of studies focusing on the genetic basis of common disorders with a complex mode of inheritance, in which multiple genes of small effect are involved, has been steadily increasing. An improved methodology to identify the cumulative contribution of several polymorphic genes would accelerate our understanding of their importance in disease susceptibility and our ability to develop new treatments. A critical bottleneck is the inability of standard statistical approaches, developed for relatively modest predictor sets, to achieve power in the face of the enormous growth in our knowledge of genomics. The inability is due to the combinatorial complexity arising in searches for multiple interacting genes. Similar "curse of dimensionality" problems have arisen in other fields, and Bayesian statistical approaches coupled with Markov chain Monte Carlo (MCMC) techniques have led to significant improvements in understanding. We present here an algorithm, APSampler, for the exploration of potential combinations of allelic variants positively or negatively associated with a disease or phenotype. The algorithm relies on rank comparison of phenotype between individuals with and without specific patterns (i.e., combinations of allelic variants), isolated in genetic backgrounds matched for the remaining significant patterns. It constructs a Markov chain to sample only potentially significant variants, minimizing the potential for large data sets to overwhelm the search. We tested APSampler on a simulated data set and on a case-control multiple sclerosis (MS) study of ethnic Russians. For the simulated data, the algorithm identified all the phenotype-associated allele combinations coded into the data, and for the MS data, it replicated the previously known findings.
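The rank-comparison step at the core of such a search can be sketched in a few lines; the MCMC wrapper that proposes which patterns to test is omitted, and the planted two-locus effect is synthetic.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def pattern_rank_test(genotypes, phenotype, pattern):
    """Scoring step of a pattern search in the spirit of APSampler:
    rank-compare the phenotype between carriers and non-carriers of an
    allele combination. `pattern` maps locus index -> required allele;
    the MCMC machinery that proposes patterns is omitted here."""
    carrier = np.all(
        [genotypes[:, locus] == allele for locus, allele in pattern.items()],
        axis=0)
    _, p = mannwhitneyu(phenotype[carrier], phenotype[~carrier],
                        alternative="two-sided")
    return int(carrier.sum()), p

rng = np.random.default_rng(3)
G = rng.integers(0, 2, size=(400, 30))     # 400 subjects, 30 biallelic loci
y = rng.normal(size=400) + 0.8 * (G[:, 4] & G[:, 17])  # planted 2-locus effect
n_carriers, p = pattern_rank_test(G, y, {4: 1, 17: 1})
print(f"{n_carriers} carriers, p = {p:.2e}")
```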

16.
Ibrahim JG, Chen MH, Xia HA, Liu T. Biometrics 2012, 68(2):578-586
Recent guidance from the Food and Drug Administration for the evaluation of new therapies in the treatment of type 2 diabetes (T2DM) calls for a program-wide meta-analysis of cardiovascular (CV) outcomes. In this context, we develop a new Bayesian meta-analysis approach using survival regression models to assess whether the size of a clinical development program is adequate to evaluate a particular safety endpoint. We propose a Bayesian sample size determination methodology for meta-analysis clinical trial design with a focus on controlling the type I error and power. We also propose the partial-borrowing power prior to incorporate historical survival meta-analysis data into the statistical design. Various properties of the proposed methodology are examined, and an efficient Markov chain Monte Carlo sampling algorithm is developed to sample from the posterior distributions. In addition, we develop a simulation-based algorithm for computing quantities such as the power and the type I error in the Bayesian meta-analysis trial design. The proposed methodology is applied to the design of a phase 2/3 development program including a noninferiority clinical trial for CV risk assessment in T2DM studies.
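A frequentist stand-in for the simulation-based operating-characteristic calculation, showing how the number of program-wide CV events drives power while the type I error stays controlled; the paper performs the analogous computation within its Bayesian power-prior model.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

def prob_declare_safe(true_hr, n_events, margin=1.8, alpha=0.025,
                      n_sim=200_000):
    """Declare CV safety if the one-sided upper confidence bound on the
    hazard ratio falls below `margin` (1.8 echoes the FDA T2DM guidance).
    se(log HR) ~ 2/sqrt(events) assumes 1:1 randomization; this is a
    frequentist sketch, not the paper's Bayesian design calculation."""
    se = 2.0 / np.sqrt(n_events)
    est = rng.normal(np.log(true_hr), se, n_sim)   # simulated log-HR estimates
    upper = est + norm.ppf(1 - alpha) * se
    return float((upper < np.log(margin)).mean())

for d in (100, 200, 400):                          # program-wide CV events
    print(f"events={d:3d}  power={prob_declare_safe(1.0, d):.2f}  "
          f"type I error={prob_declare_safe(1.8, d):.3f}")
# Power (true HR = 1) grows with program size; the type I error (true HR
# at the margin) stays at the nominal 0.025 regardless.
```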

17.
Specific plant species that can take up and accumulate abnormally high concentrations of elements in their aboveground tissues are referred to as "hyperaccumulators". The use of this term is justified in the case of an enormous element-binding capacity in plants growing in their natural habitats and showing no toxicity symptoms. The increasing interest in the study of hyperaccumulators results from their potential applications in environmental biotechnology (phytoremediation, phytomining) and their emerging role in nanotechnology. The highest number of plant species with confirmed hyperaccumulative properties has been reported for hyperaccumulators of nickel, cadmium, zinc, manganese, arsenic and selenium. More limited data exist for plants accumulating other elements, including common pollutants (chromium, lead and boron) or elements of commercial value, such as copper, gold and rare earth elements. Different approaches have been used for the study of hyperaccumulators: geobotanical, chemical, biochemical and genetic. The chemical approach is the most important in screening for new hyperaccumulators. This article presents and critically reviews current trends in hyperaccumulator research, emphasizing the analytical methodology applied in the identification of new hyperaccumulators of trace elements and its future perspectives.

18.
A new method of species (inverse) classification of vegetation data, i.e. classification of species into groups with similar ecological tolerances, is presented which overcomes the problem of species abundance distorting the results. The TWO-STEP algorithm is based on an asymmetric measure of dissimilarity defined in terms of the species i and j, the stands h = 1, ..., n (n being the total number of stands), and the amount x_ih of species i in stand h. The algorithm then uses the rows of the resulting asymmetric dissimilarity matrix to form a second, symmetric dissimilarity matrix over the m species, computed for each species pair from their dissimilarities to all k = 1, ..., m species. Flexible sorting is applied to this matrix to generate a species classification. Comparison of results after applying the TWO-STEP algorithm and a standard alternative to an artificial data set demonstrates its efficacy. TWO-STEP also shows considerable advantages over previous analyses for a Queensland rainforest data set (quantitative) and an English heath data set (qualitative). Normalization of species data appears advantageous for quantitative data only.

19.
Data quality     
A methodology is presented that enables incorporating expert judgment regarding the variability of input data into environmental life cycle assessment (LCA) modeling. The quality of input data in the life-cycle inventory (LCI) phase is evaluated by LCA practitioners using data quality indicators developed for this application. These indicators are incorporated into the traditional LCA inventory models, which produce non-varying point-estimate results (i.e., deterministic models), to develop LCA inventory models that produce results in the form of random variables characterized by probability distributions (i.e., stochastic models). The outputs of these probabilistic LCA models are analyzed using classical statistical methods for better decision- and policy-making information. This methodology is applied to real-world beverage delivery system LCA inventory models. The inventory study results for five beverage delivery system alternatives are compared using statistical methods that account for the variance in the model output values for each alternative. Sensitivity analyses are also performed, indicating that model output variance increases as input data uncertainty increases (i.e., as input data quality degrades). Concluding remarks point out the strengths of this approach as an alternative to the traditional qualitative assessment of LCA inventory input data, which offers no efficient means of examining the combined effects of data quality on model results. Data quality assessments can now be captured quantitatively within the LCA inventory model structure. The approach produces inventory study results that are random variables reflecting the uncertainty associated with the input data, and these results can be analyzed using statistical methods that make efficient quantitative comparisons of inventory study alternatives possible. Recommendations for future research are also provided, including the screening of LCA inventory model inputs for significance and the application of selection and ranking techniques to the model outputs.
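A minimal Monte Carlo sketch of the indicator-to-distribution idea; the mapping from data-quality score to lognormal spread is a made-up placeholder that practitioners would calibrate to their own indicator scheme.

```python
import numpy as np

rng = np.random.default_rng(7)

def stochastic_lci(point_value, dq_score, n=100_000):
    """Turn a deterministic inventory input into a random variable whose
    spread reflects a data-quality indicator. The score-to-geometric-SD
    mapping below is a made-up placeholder, not the paper's scheme."""
    gsd = {1: 1.05, 2: 1.15, 3: 1.35, 4: 1.7}[dq_score]
    return point_value * rng.lognormal(0.0, np.log(gsd), n)

# Hypothetical energy use (MJ per 1000 containers) for two delivery
# systems, each the sum of two unit processes of differing data quality:
sys_a = stochastic_lci(820, 2) + stochastic_lci(310, 4)
sys_b = stochastic_lci(950, 1) + stochastic_lci(240, 2)
print(f"P(system A uses less energy than B) = {(sys_a < sys_b).mean():.2f}")
# A deterministic model would simply report 1130 vs 1190 MJ; the
# stochastic version shows how often data uncertainty could reverse
# that ranking.
```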

20.
Here, we describe a new model of voluntary alcohol drinking by group-housed mice. The model employs sensor-equipped cages that track the behaviors of individual animals via implanted radio chips. After the animals were allowed intermittent access to alcohol (three 24-h intervals every week) for 4 weeks, the proportions of licks directed toward bottles containing alcohol were 50.9% and 39.6% for the male and female mice, respectively. We used three approaches (i.e., quinine adulteration, a progressive ratio schedule, and a schedule involving a risk of punishment) to test for symptoms of compulsive alcohol drinking. The addition of 0.01% quinine to the alcohol solution did not significantly affect intake, but 0.03% quinine induced a greater than 5-fold reduction in the number of licks on the alcohol bottles. When the animals were required to perform increasing numbers of instrumental responses to obtain access to the bottle with alcohol (i.e., a progressive ratio schedule), they frequently reached a maximum of 21 responses irrespective of the available reward. Although the mice rarely achieved higher response criteria, the number of attempts was ∼10 times greater for alcohol than for water. We have also developed an approach for mapping social interactions among animals based on analysis of the sequences of entries into the cage corners, which allowed us to identify mice that followed other animals in a non-random fashion. Approximately half of the mice displayed at least one interaction of this type. We have not yet found a clear correlation between imitative behavior and relative alcohol preference. In conclusion, the model we describe avoids the limitations associated with testing isolated animals and reliably leads to stable alcohol drinking. Therefore, this model may be well suited to screening for the effects of genetic mutations or pharmacological treatments on alcohol-induced behaviors.

