Similar Documents
20 similar documents retrieved (search time: 31 ms)
1.

Background  

Random forests (RF) have been increasingly used in applications such as genome-wide association and microarray studies where predictor correlation is frequently observed. Recent works on permutation-based variable importance measures (VIMs) used in RF have come to apparently contradictory conclusions. We present an extended simulation study to synthesize results.
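The permutation-based VIM discussed in this abstract can be computed with scikit-learn; a minimal sketch of the idea under correlated predictors (the variable names, correlation structure, and response model below are illustrative, not taken from the study):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 300

# Two strongly correlated predictors plus an independent one (illustrative).
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # correlated with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x3 + 0.5 * rng.normal(size=n)     # only x1 and x3 drive the response

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Permutation VIM: loss in accuracy when one predictor is shuffled.
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print(dict(zip(["x1", "x2", "x3"], imp.importances_mean.round(3))))
```

Because x2 carries much of x1's information, the forest can partially compensate when x1 is permuted, which is one mechanism behind the contradictory findings the abstract refers to.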

2.

Background  

The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on the clinical or statistical significance. There are several variable selection algorithms in existence. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process.
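The significance-driven part of such a procedure can be sketched as backward elimination on p-values. The sketch below uses OLS t-tests for self-containment; purposeful selection as described by Hosmer and Lemeshow operates on logistic models and adds further checks (confounding, clinical relevance) not modelled here:

```python
import numpy as np
from scipy import stats

def backward_eliminate(X, y, names, alpha=0.1):
    """Drop the least significant covariate until all p-values < alpha.

    Simplified, OLS-based stand-in for significance-driven selection.
    """
    keep = list(range(X.shape[1]))
    while keep:
        Xk = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        resid = y - Xk @ beta
        df = len(y) - Xk.shape[1]
        sigma2 = resid @ resid / df
        se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xk.T @ Xk)))
        p = 2 * stats.t.sf(np.abs(beta / se), df)
        worst = np.argmax(p[1:])          # ignore the intercept
        if p[1:][worst] < alpha:          # everything is significant: stop
            break
        keep.pop(worst)
    return [names[i] for i in keep]

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))
y = 2 * X[:, 0] - X[:, 2] + rng.normal(size=n)   # only two true covariates
selected = backward_eliminate(X, y, ["x1", "x2", "x3", "x4"])
print(selected)
```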

3.

Background  

Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories.
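The scale/cardinality bias the abstract warns about is easy to reproduce with the default (Gini) importance in scikit-learn; a toy sketch where every predictor is pure noise and only the number of candidate split points differs (setup is illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# All predictors are noise; they differ only in "resolution":
x_cont = rng.normal(size=n)            # continuous: many split points
x_bin = rng.integers(0, 2, size=n)     # binary: a single split point
X = np.column_stack([x_cont, x_bin])
y = rng.integers(0, 2, size=n)         # response independent of both

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
# Gini importance tends to be inflated for the continuous noise variable.
print(rf.feature_importances_)
```

Conditional-inference forests (e.g. `cforest` in R's `party` package, mentioned in this line of work) avoid this bias by separating variable selection from split-point search.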

4.

Background  

The development of software tools that analyze microarray data in the context of genetic knowledge bases is being pursued by multiple research groups using different methods. A common problem for many of these tools is how to correct for multiple statistical testing, since simple corrections are overly conservative and more sophisticated corrections are currently impractical. A careful study of the distribution one would expect by chance, for example via simulation, may guide the development of an appropriate correction that is not computationally prohibitive.
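The simulation idea can be sketched by building the null distribution of the smallest p-value across many tests and reading off an empirical per-test threshold; under independence it should agree closely with Bonferroni (this is a generic illustration, not the correction any particular tool uses):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_sim, alpha = 50, 5000, 0.05

# Null distribution of the smallest of m independent uniform p-values.
min_p = rng.uniform(size=(n_sim, m)).min(axis=1)

# Empirical per-test threshold controlling the family-wise error at alpha.
empirical = np.quantile(min_p, alpha)
bonferroni = alpha / m
print(empirical, bonferroni)   # close under independence
```

With correlated tests (the realistic microarray case), the simulated threshold would exceed Bonferroni's, which is exactly why the simple correction is conservative.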

5.

Background  

Nuclear magnetic resonance spectroscopy is one of the primary tools in metabolomics analyses, where it is used to track and quantify changes in metabolite concentrations or profiles in response to perturbation through disease, toxicants or drugs. The spectra generated through such analyses are typically confounded by noise of various types, obscuring the signals and hindering downstream statistical analysis. Such issues are becoming increasingly significant as greater numbers of large-scale systems or longitudinal studies are being performed, in which many spectra from different conditions need to be compared simultaneously.

6.

Background  

Many aspects of biological functions can be modeled by biological networks, such as protein interaction networks, metabolic networks, and gene coexpression networks. Studying the statistical properties of these networks in turn allows us to infer biological function. Complex statistical network models can potentially more accurately describe the networks, but it is not clear whether such complex models are better suited to find biologically meaningful subnetworks.

7.

Background  

A useful application of flow cytometry is the investigation of cell receptor-ligand interactions. However, such analyses are often compromised by difficulties in interpreting changes in ligand binding when receptor expression is not constant. Commonly, problems are encountered when cell treatments alter receptor expression levels, or when cell lines expressing a transfected receptor with variable expression are being compared. To overcome this limitation we have developed a Microsoft Excel spreadsheet that automatically simplifies flow cytometric data and performs statistical tests in order to provide a clearer graphical representation of results.

8.

Background  

The statistical study of biological networks has led to important novel biological insights, such as the presence of hubs and hierarchical modularity. There is also a growing interest in studying the statistical properties of networks in the context of cancer genomics. However, relatively little is known as to what network features differ between the cancer and normal cell physiologies, or between different cancer cell phenotypes.

9.

Introduction  

Disease activity in patients with rheumatoid arthritis (RA) is associated with increased cardiovascular morbidity and mortality, of which N-terminal pro-brain natriuretic peptide (NT-proBNP) is a predictor. Our objective was to examine the cross-sectional and longitudinal associations between markers of inflammation, measures of RA disease activity, medication used in the treatment of RA, and NT-proBNP levels (dependent variable).

10.

Background  

The Central Limit Theorem (CLT) is a statistical principle that states that as the number of repeated samples from any population increases, the variance among sample means decreases and the means become more normally distributed. It has been conjectured that the CLT has the potential to provide benefits for group living in some animals via greater predictability in food acquisition, if the number of foraging bouts increases with group size. The potential existence of benefits for group living derived from a purely statistical principle is highly intriguing and it has implications for the origins of sociality.
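The statistical effect the abstract invokes is easy to demonstrate: the variance of a mean of n draws shrinks roughly as 1/n, whatever the underlying intake distribution. A minimal sketch with a skewed (exponential) "per-bout intake", where the group-size analogue is the number of bouts n (the foraging interpretation is the abstract's, the numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def var_of_means(n, reps=20000):
    """Variance of the mean of n exponential draws, estimated by simulation."""
    return rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1).var()

variances = [var_of_means(n) for n in (1, 4, 16, 64)]
for n, v in zip((1, 4, 16, 64), variances):
    print(n, round(v, 4))   # approximately 1/n
```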

11.

Background  

In cancer studies, it is common that multiple microarray experiments are conducted to measure the same clinical outcome and expressions of the same set of genes. An important goal of such experiments is to identify a subset of genes that can potentially serve as predictive markers for cancer development and progression. Analyses of individual experiments may lead to unreliable gene selection results because of the small sample sizes. Meta-analysis can be used to pool multiple experiments, increase statistical power, and achieve more reliable gene selection. The meta-analysis of cancer microarray data is challenging because of the high dimensionality of gene expression data and the differences in experimental settings among experiments.
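A classical way to pool a gene's per-study effect estimates is fixed-effect, inverse-variance weighting; a minimal sketch with invented numbers (the paper's actual meta-analysis method for high-dimensional selection may well differ):

```python
import numpy as np

def pool_fixed_effect(effects, ses):
    """Inverse-variance fixed-effect pooling of per-study effect estimates."""
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    w = 1.0 / ses**2                    # weight each study by its precision
    pooled = np.sum(w * effects) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    return pooled, pooled_se

# Three hypothetical studies measuring the same gene's expression effect.
effect, se = pool_fixed_effect([0.8, 1.1, 0.9], [0.4, 0.3, 0.5])
print(round(effect, 3), round(se, 3))
```

Note that the pooled standard error is smaller than any single study's, which is the power gain the abstract refers to.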

12.

Background  

Innovative extensions of (M)ANOVA are gaining ground for the analysis of designed metabolomics experiments. ASCA is such a multivariate analysis method; it has successfully estimated effects in megavariate metabolomics data from biological experiments. However, rigorous statistical validation of megavariate effects is still problematic because megavariate extensions of the classical F-test do not exist.
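When no analytical F-distribution is available, a permutation null is the standard fallback: recompute the effect statistic under shuffled group labels and compare. A generic univariate sketch of the idea (the effect measure and data are illustrative; ASCA permutes design labels and uses a multivariate effect size):

```python
import numpy as np

def permutation_pvalue(a, b, n_perm=5000, rng=None):
    """P-value for a difference in group means via label permutation."""
    if rng is None:
        rng = np.random.default_rng(0)
    pooled = np.concatenate([a, b])
    observed = abs(a.mean() - b.mean())
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)             # break any label-effect association
        perm = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        count += perm >= observed
    return (count + 1) / (n_perm + 1)   # add-one to avoid p = 0

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=30)
b = rng.normal(1.0, 1.0, size=30)       # true shift of one SD
p = permutation_pvalue(a, b)
print(p)
```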

13.
14.

Background  

Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required.
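One standard way to compare two counts directly, rather than their separate p-values, is to model each count as Poisson and condition on the total, which reduces to an exact binomial test. This is a classical construction, not necessarily the test the paper proposes; the counts and sequence lengths below are hypothetical:

```python
from scipy.stats import binomtest

# Observed motif counts and sequence lengths (illustrative numbers).
count1, len1 = 40, 100_000
count2, len2 = 15, 100_000

# Under equal per-base motif rates, count1 | (count1 + count2) is
# Binomial(total, len1 / (len1 + len2)).
res = binomtest(count1, count1 + count2, p=len1 / (len1 + len2),
                alternative="greater")
print(round(res.pvalue, 4))
```

A small p-value here says the motif occurs at a genuinely higher rate in sequence 1, which is a different statement from "its count p-value is smaller in sequence 1".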

15.

Background  

It is common for the results of a microarray study to be analyzed in the context of biologically-motivated groups of genes such as pathways or Gene Ontology categories. The most common method for such analysis uses the hypergeometric distribution (or a related technique) to look for "over-representation" of groups among genes selected as being differentially expressed or otherwise of interest based on a gene-by-gene analysis. However, this method suffers from some limitations, and biologist-friendly tools that implement alternatives have not been reported.
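The hypergeometric over-representation test the abstract describes is a one-liner with SciPy; a sketch with made-up gene counts:

```python
from scipy.stats import hypergeom

N = 10_000   # genes on the array
K = 200      # genes annotated to the category (e.g. one GO term)
n = 500      # genes selected as differentially expressed
k = 25       # overlap between the selected genes and the category

# P(overlap >= k) under random selection: the over-representation p-value.
# (Expected overlap by chance is n * K / N = 10, so 25 is enriched.)
p = hypergeom.sf(k - 1, N, K, n)
print(f"{p:.3g}")
```

The `sf(k - 1, ...)` form gives the upper tail including k itself; this p-value would then be corrected across the many categories tested.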

16.

Objective:

Obesity prevalence is growing worldwide and is largely responsible for cardiovascular disease, the most common cause of death in the Western world. The rationale of this study was to distinguish metabolically healthy from unhealthy overweight/obese young and adult patients, compared with healthy normal-weight age-matched controls, by an extensive anthropometric, laboratory, and sonographic vascular assessment.

Design and Methods:

Three hundred fifty-five young [8 to <18 years; 299 overweight/obese (ow/ob), 56 normal weight (nw)] and 354 adult [>18 to 60 years; 175 ow/ob, 179 nw] participants of the STYJOBS/EDECTA (STYrian Juvenile Obesity Study/Early DEteCTion of Atherosclerosis) cohort were analyzed. STYJOBS/EDECTA (NCT00482924) is a cross-sectional study investigating metabolic/cardiovascular risk profiles in normal-weight and ow/ob people free of disease except for metabolic syndrome (MetS).

Results:

Of the 299 young ow/ob subjects (8 to <18 years), 108 (36%) met the criteria for MetS, as did 79 of the 175 adult ow/ob subjects (>18 to 60 years; 45%). In both age groups, the prevalence of MetS was greater among males. Overweight/obese subjects were divided into "healthy" (no MetS criterion fulfilled except anthropometry) and "unhealthy" (MetS positive). Although percentage body fat did not differ between "healthy" and "unhealthy" ow/ob subjects, nuchal and visceral fat were significantly greater in the "unhealthy" group, which also had significantly higher carotid intima-media thickness (IMT). With MetS as the dependent variable, two logistic regressions, one for juveniles <18 years and one for adults >18 years, were performed. The potential predictor variables, selected by t-test comparisons (with the exception of age and gender), included IMT, ultrasensitive C-reactive protein (US-CRP), IL-6, malondialdehyde (MDA), oxidized LDL, leptin, adiponectin, uric acid (UA), aldosterone, cortisol, transaminases, and fibrinogen. In both groups uric acid, and in adults additionally leptin and adiponectin, emerged as the best predictors.

Conclusion:

Serum levels of UA are a significant predictor of unhealthy obesity in juveniles and adults.

17.

Background  

With the advent of high throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable making their complete identification a computationally challenging problem. Many attempts in using statistical or combinatorial approaches have been made with great success in the past. However, identifying highly degenerate and long (>20 nucleotides) motifs still remains an unmet challenge as high degeneracy will diminish statistical significance of biological signals and increasing motif size will cause combinatorial explosion. In this report, we present a novel rule-based method that is focused on finding degenerate and long motifs. Our proposed method, named iTriplet, avoids costly enumeration present in existing combinatorial methods and is amenable to parallel processing.

18.

Background  

Microarray data are often used for patient classification and gene selection. An appropriate tool for end users and biomedical researchers should combine user friendliness with statistical rigor, including carefully avoiding selection biases and allowing analysis of multiple solutions, together with access to additional functional information of selected genes. Methodologically, such a tool would be of greater use if it incorporates state-of-the-art computational approaches and makes source code available.

19.

Objectives

Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone.

Methods

In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models.

Results

Our proposed data mining procedures outperformed the classic statistical method. The correct classification rate, sensitivity, specificity, and area under the ROC curve for predicting a rotator cuff tear were statistically better in the ANN and decision tree models than in logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear from a pretest probability and a prediction result (tear or no tear).
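The Fagan-nomogram update is simply Bayes' theorem on the odds scale; a minimal sketch (the pretest probability and likelihood ratio below are invented for illustration, not values from this study):

```python
def posttest_probability(pretest_prob, likelihood_ratio):
    """Bayes on the odds scale: post-odds = pre-odds * likelihood ratio."""
    pre_odds = pretest_prob / (1.0 - pretest_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Hypothetical example: 50% pretest probability, positive-test LR of 4.
print(posttest_probability(0.5, 4.0))   # 0.8
```

The positive LR itself comes from the model's operating characteristics, LR+ = sensitivity / (1 - specificity), so any classifier with a confusion matrix can feed this update.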

Conclusions

Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools for classifying rotator cuff tears and for determining the probability of the presence of the disease, enhancing diagnostic decision making for rotator cuff tears.

20.

Background  

Pharmacokinetic and pharmacodynamic (PK/PD) indices are increasingly being used in the microbiological field to assess the efficacy of a dosing regimen. In contrast to methods using MIC alone, PK/PD-based methods reflect in vivo conditions and are more predictive of efficacy. Unfortunately, they rely on a single PK-derived value such as AUC or Cmax and may thus yield biased efficacy estimates when variability is large. The aim of the present work was to evaluate the efficacy of a treatment by adjusting classical breakpoint estimation methods to the situation of variable PK profiles.
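Accounting for PK variability is commonly done by Monte Carlo simulation of a probability of target attainment (PTA) across MICs, rather than plugging in one fixed AUC. A hedged sketch of that general approach (the lognormal AUC distribution and the AUC/MIC target of 125 are illustrative assumptions, not values from this work):

```python
import numpy as np

rng = np.random.default_rng(0)

def target_attainment(mic, auc_mu=np.log(400), auc_sigma=0.4,
                      target=125.0, n=100_000):
    """Fraction of simulated patients reaching AUC/MIC >= target."""
    auc = rng.lognormal(mean=auc_mu, sigma=auc_sigma, size=n)
    return np.mean(auc / mic >= target)

ptas = {mic: target_attainment(mic) for mic in (1.0, 2.0, 4.0)}
for mic, pta in ptas.items():
    print(mic, round(pta, 3))
```

A breakpoint is then typically read off as the highest MIC at which the attainment fraction stays above some threshold (often 90%), which is how variable PK profiles enter the decision instead of a single AUC value.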


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号