Similar articles
1.
2.
Detecting genomic footprints of selection is an important step in the understanding of evolution. Accounting for linkage disequilibrium in genome scans increases detection power, but haplotype‐based methods require individual genotypes and are not applicable to pool‐sequenced samples. We propose to take advantage of the local score approach to account for linkage disequilibrium in genome scans for selection, accumulating (possibly small) signals from single markers over a genomic segment to clearly pinpoint a selection signal. Using computer simulations, we demonstrate that this approach detects selection with higher power than several state‐of‐the‐art single‐marker, windowing or haplotype‐based approaches. We illustrate this on two benchmark data sets including individual genotypes, for which we obtain similar results with the local score and one haplotype‐based approach. Finally, we apply the local score approach to Pool‐Seq data obtained from a divergent selection experiment on behaviour in quail and obtain precise and biologically coherent selection signals: while competing methods fail to highlight any clear selection signature, our method detects several regions involving genes known to act on social responsiveness or autistic traits. Although we focus here on the detection of positive selection from multiple population data, the local score approach is general and can be applied to other genome scans for selection or other genomewide analyses such as GWAS.
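The idea of accumulating small single-marker signals along a segment can be sketched as a Lindley process whose running maximum is the local score. This is a minimal sketch, not the authors' implementation: it assumes each marker already has a score (for example a hypothetical -log10(p) from a single-marker test) and that a threshold `xi` is subtracted from every score; the function name, `xi`, and the example values are illustrative, and no significance threshold for the local score is computed here.

```python
from typing import List, Tuple


def local_score(scores: List[float], xi: float) -> Tuple[float, int, int]:
    """Highest cumulative score over any contiguous segment of markers.

    Each marker score is shifted by -xi, so weak single-marker signals push
    the running sum down while runs of moderately strong, linked markers push
    it up. The running sum (a Lindley process) is reset to zero whenever it
    would become non-positive. Returns (local_score, start, end) with
    inclusive marker indices; end == -1 means no segment scored above zero.
    """
    h = 0.0                      # current value of the Lindley process
    best = 0.0                   # running maximum = the local score
    start = 0                    # first marker of the current excursion
    best_start, best_end = 0, -1
    for i, s in enumerate(scores):
        h_next = h + (s - xi)
        if h_next <= 0.0:
            h = 0.0              # excursion ends; restart after this marker
            start = i + 1
        else:
            h = h_next
            if h > best:
                best, best_start, best_end = h, start, i
    return best, best_start, best_end


# Hypothetical per-marker scores, e.g. -log10(p) from a single-marker test.
marker_scores = [0.1, 0.2, 3.0, 2.5, 0.1, 4.0, 0.2, 0.1]
print(local_score(marker_scores, xi=1.0))  # segment spans markers 2..5
```

Note how marker 4 (score 0.1) does not break the segment: its small negative contribution is outweighed by strong neighbours, which is how linkage disequilibrium is exploited without haplotypes.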

3.
We have the technology and capability to develop an all‐in‐one microarray that can provide complete information on a microbial community, including algae, protozoa, bacteria, archaea, fungi, viruses, antimicrobial resistance, biotoxins and functional activity. With lab‐on‐a‐chip nanotechnology integrating a variety of the latest methods for a large number of sample types (water, sediment, waste water, food, blood, etc.), it is possible to make a desktop instrument with universal applications. There are two major thrusts to this grand challenge that will allow us to take advantage of the latest biotechnological breakthroughs in real time. The first is a bioengineering thrust that will take advantage of large multidisciplinary laboratories in developing key technologies. Miniaturization will reduce reagent costs and increase sensitivity and reaction kinetics for rapid turnaround time. New and evolving technologies will allow us to port today's state‐of‐the‐art microarray designs to completely new nanotechnology‐inspired platforms as they mature. The second thrust is in bioinformatics, using our existing expertise to take advantage of the rapidly evolving landscape of bioinformatics data. The increasing size of these data sets will allow us to resolve microbial species to greatly improved levels and to identify functional genes beyond the hypothetical protein level. A cheap and portable assay would impact countless areas, including clean water technologies, emerging diseases, bioenergy, infectious disease diagnosis, climate change, food safety, environmental clean‐up and bioterrorism. In my opinion it is possible, but it will require a very large group of multidisciplinary scientists from multiple institutions crossing many international boundaries, and more than $100 million in funding over a 5‐year period. Given the impact that this SuperChip could have, it is well worth the price!!!

4.
Numerous statistical methods have been developed for analyzing high‐dimensional data. These methods often focus on variable selection but are limited for the purpose of hypothesis testing with high‐dimensional data, and they often require explicit likelihood functions. In this article, we propose a “hybrid omnibus test” for testing with high‐dimensional data under much weaker requirements. Our hybrid omnibus test is developed under a semiparametric framework in which a likelihood function is no longer necessary. Our test is a version of a frequentist‐Bayesian hybrid score‐type test for a generalized partially linear single‐index model, whose link function depends on a set of variables through a generalized partially linear single index. We propose an efficient score based on estimating equations, define local tests, and then construct our hybrid omnibus test from the local tests. We compare our approach with an empirical‐likelihood ratio test and with Bayesian inference based on Bayes factors, using simulation studies. Our simulation results suggest that our approach outperforms the others in terms of type I error, power, and computational cost in both the low‐ and high‐dimensional cases. The advantage of our approach is demonstrated by applying it to genetic pathway data for type II diabetes mellitus.

5.
Significant efforts have been devoted in the last decade to improving molecular docking techniques to predict both accurate binding poses and ranking affinities. Some shortcomings in the field are the limited number of standard methods for measuring docking success and the limited availability of widely accepted standard data sets for use as benchmarks in comparing different docking algorithms throughout the field. To address these issues, we have created a Cross‐Docking Benchmark server. The server is a versatile cross‐docking data set containing 4,399 protein‐ligand complexes across 95 protein targets, intended to serve as a benchmark set and gold standard for state‐of‐the‐art pose and ranking prediction for easy, medium, hard, or very hard docking targets. The benchmark, along with a customizable cross‐docking data set generation tool, is available at http://disco.csb.pitt.edu . We further demonstrate the potential uses of the server for questions beyond basic benchmarking, such as the selection of the ideal docking reference structure.

6.
Template‐based protein structure modeling is commonly used for protein structure prediction. Based on the observation that multiple template‐based methods often perform better than single template‐based methods, we further explore the use of a variable number of multiple templates for a given target in the latest variant of TASSER, TASSERVMT. We first develop an algorithm that improves the target‐template alignment for a given template. The improved alignment, called the SP3 alternative alignment, is generated by a parametric alignment method coupled with short TASSER refinement on models selected using knowledge‐based scores. The refined top model is then structurally aligned to the template to produce the SP3 alternative alignment. Templates identified using SP3 threading are combined with the SP3 alternative and HHSEARCH alignments to provide target alignments to each template. These template models are then grouped into sets containing a variable number of template/alignment combinations. For each set, we run short TASSER simulations to build full‐length models. Then, the models from all sets of templates are pooled, and the top 20–50 models are selected using the FTCOM ranking method. These models are then subjected to a single longer TASSER refinement run for final prediction. We benchmarked our method by comparison with our previously developed approach, pro‐sp3‐TASSER, on a set with 874 easy and 318 hard targets. The average GDT‐TS score improvements for the first model are 3.5 and 4.3% for easy and hard targets, respectively. When tested on the 112 CASP9 targets, our method improves the average GDT‐TS scores as compared to pro‐sp3‐TASSER by 8.2 and 9.3% for the 80 easy and 32 hard targets, respectively. It also shows slightly better results than the top‐ranked CASP9 Zhang‐Server, QUARK and HHpredA methods. The program is available for download at http://cssb.biology.gatech.edu/ . © 2011 Wiley Periodicals, Inc.

7.
Tree vigor is often used as a covariate when tree mortality is predicted from tree growth in tropical forest dynamic models, but it is rarely explicitly accounted for in a coherent modeling framework. We quantify tree vigor at the individual tree level, based on the difference between expected and observed growth. The available methods to join nonlinear tree growth and mortality processes are not commonly used by forest ecologists, so we develop an inference methodology based on an MCMC approach, allowing us to sample the parameters of the growth and mortality model from their posterior distribution using the joint model likelihood. We apply our framework to a set of data on the 20‐year dynamics of a forest in Paracou, French Guiana, taking advantage of functional trait‐based growth and mortality models already developed independently. Our results showed that growth and mortality are intimately linked and that the vigor estimator is an essential predictor of mortality, highlighting that trees growing more than expected have a far lower probability of dying. Our joint model methodology is sufficiently generic to be used to join two linked processes, one longitudinal and one punctual, and thus may be applied to a wide range of growth and mortality models. In the context of global changes, such joint models are urgently needed in tropical forests to analyze, and then predict, the effects of the ongoing changes on tree dynamics in hyperdiverse tropical forests.

8.
Within the past few years, plant functional trait analyses have been widely applied to learn more about the processes and patterns of ecosystem development in response to environmental changes. These approaches are based on the assumption that plants with similar ecologically relevant trait attributes respond to environmental changes in comparable ways. Several methods have been described for analysing a priori defined trait sets with respect to the environment. Irrespective of the statistical methods used to contrast ecosystem responses and environmental conditions, each functional trait approach depends strongly on the initial trait set. In nearly all recent studies on functional trait analysis, the test of whether a trait is responsive is applied independently of the core analysis. In the current study we present a method that extracts, from a wider set of traits, those traits which are optimal for describing the ecosystem response to a given environmental gradient. This was done using iterative three‐table ordination techniques with every possible trait combination. We further concentrated on the effect of including too many traits in such analyses. As examples, the method was applied to three long‐term studies on abandoned arable fields. The approach was validated by comparing the results with knowledge from the literature on arable field succession. Although the trait pre‐selection was based only on a statistical procedure, our method was able to identify all relevant processes of ecosystem responses. All three sites show comparable ecosystem responses; the importance of the competitive ability of plants was highlighted. We further demonstrated that the use of too many traits results in over‐fitting of the trait‐environment model.
The presented method of iterative RLQ analyses is adequate for identifying traits that respond to environmental changes: the discovered processes of successional development of abandoned arable fields are consistent with our knowledge from the literature.

9.
One of the challenges in the analysis of gene expression data is placing the results in the context of other data available about genes and their relationships to each other. Here, we approach this problem in the study of gene expression changes associated with age in two areas of the human prefrontal cortex, comparing two computational methods. The first method, "overrepresentation analysis" (ORA), is based on statistically evaluating the fraction of genes in a particular gene ontology class found among the set of genes showing age-related changes in expression. The second method, "functional class scoring" (FCS), examines the statistical distribution of individual gene scores among all genes in the gene ontology class and does not involve an initial gene selection step. We find that FCS yields more consistent results than ORA, and that the results of ORA depend strongly on the gene selection threshold. Our findings highlight the utility of functional class scoring for the analysis of complex expression data sets and emphasize the advantage of considering all available genomic information rather than sets of genes that pass a predetermined "threshold of significance."
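The contrast between the two methods can be made concrete with a small sketch. Assumptions, not taken from the abstract: ORA is scored with the common one-sided hypergeometric test on the overlap between the selected genes and the class, and FCS is scored as the class's mean gene score compared against random gene sets of the same size by permutation. Function names, gene names, and scores are hypothetical.

```python
import math
import random
from typing import Dict, Set


def ora_pvalue(n_genes: int, n_class: int, n_selected: int, n_overlap: int) -> float:
    """Over-representation: P(X >= n_overlap) for a hypergeometric X.

    n_genes genes in total, n_class of them in the ontology class,
    n_selected pass the selection threshold, n_overlap are in both.
    """
    upper = min(n_class, n_selected)
    tail = sum(
        math.comb(n_class, i) * math.comb(n_genes - n_class, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / math.comb(n_genes, n_selected)


def fcs_pvalue(scores: Dict[str, float], gene_class: Set[str],
               n_perm: int = 1000, seed: int = 0) -> float:
    """Functional class scoring: mean score of the class versus random gene
    sets of the same size (one-sided permutation p-value). Every gene's score
    is used; there is no selection threshold."""
    genes = list(scores)
    members = [g for g in genes if g in gene_class]
    observed = sum(scores[g] for g in members) / len(members)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(members))
        if sum(scores[g] for g in sample) / len(sample) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# 10 hypothetical genes with age-association scores 0..9; the class holds the top three.
scores = {f"g{i}": float(i) for i in range(10)}
top_class = {"g7", "g8", "g9"}
print(ora_pvalue(10, 3, 3, 3))   # 1/120: all three selected genes fall in the class
print(fcs_pvalue(scores, top_class))
```

The ORA result hinges on `n_selected`, which is set by an external significance threshold; changing that threshold changes both `n_selected` and `n_overlap`, whereas `fcs_pvalue` has no such input.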

10.
PCR‐based methods are the most common technique for sex determination of birds. Although these methods are fast, easy and accurate, they still require special facilities that preclude their application outdoors. Consequently, there is a time lag between sampling and obtaining results that prevents researchers from making decisions based on individuals’ sex in situ and in real time. We present an outdoor technique for sex determination of birds based on the amplification of the duplicated sex‐chromosome‐specific gene Chromo‐Helicase‐DNA binding protein using loop‐mediated isothermal amplification (LAMP). We tested our method on Griffon Vulture (Gyps fulvus), Egyptian Vulture (Neophron percnopterus) and Black Kite (Milvus migrans) (family Accipitridae). We introduce the first fieldwork procedure for sex determination of animals in the wild, successfully applied to raptor species of three different subfamilies using the same specific LAMP primers. This molecular technique can be deployed directly in sampling areas because it only needs a voltage inverter to power a thermo‐block from a car lighter socket, and results can be read by the unaided eye from a colour change within the reaction tubes. Primers and reagents are prepared in advance to facilitate their storage at room temperature. We provide detailed guidelines on how to implement this procedure, which is simpler (no electrophoresis required), cheaper and faster (results in c. 90 min) than PCR‐based laboratory methods. Our successful cross‐species application across three different raptor subfamilies suggests that our set of markers is a promising tool for molecular sexing of other raptor families and that our field protocol is extensible to all bird species.

11.
The clinical pulmonary infection score (CPIS) and bronchoalveolar lavage (BAL) are important diagnostic variables of pneumonia for mechanically ventilated patients, who are susceptible to nosocomial infection. Because of its invasive nature, BAL is performed only for patients whose CPIS is greater than a certain threshold value. Thus, CPIS and BAL are closely related, yet BAL values are substantially missing. In a randomized clinical trial, the control and oral treatment groups were compared based on the outcomes from these procedures. Because both outcomes are relevant for evaluating the efficacy of treatments, we propose and examine a nonparametric test based on these outcomes, which employs the empirical likelihood methodology. While efficient parametric methods are available when data are observed incompletely, performing appropriate goodness‐of‐fit tests to justify the parametric assumptions is difficult. Our motivation is to provide an approach that relies on no particular distributional assumption and enables us to use all observed bivariate data, whether complete or not, in an approximate likelihood manner. A broad Monte Carlo study evaluates the asymptotic properties and efficiency of the proposed method for various sample sizes and underlying distributions. The proposed technique is applied to a data set from a pneumonia study, demonstrating its practical worth.

12.
Classification of gene microarrays by penalized logistic regression   Total citations: 2, self-citations: 0, citations by others: 2
Classification of patient samples is an important aspect of cancer diagnosis and treatment. The support vector machine (SVM) has been successfully applied to microarray cancer diagnosis problems. However, one weakness of the SVM is that given a tumor sample, it only predicts a cancer class label but does not provide any estimate of the underlying probability. We propose penalized logistic regression (PLR) as an alternative to the SVM for the microarray cancer diagnosis problem. We show that when using the same set of genes, PLR and the SVM perform similarly in cancer classification, but PLR has the advantage of additionally providing an estimate of the underlying probability. Often a primary goal in microarray cancer diagnosis is to identify the genes responsible for the classification, rather than class prediction. We consider two gene selection methods in this paper, univariate ranking (UR) and recursive feature elimination (RFE). Empirical results indicate that PLR combined with RFE tends to select fewer genes than other methods and also performs well in both cross-validation and test samples. A fast algorithm for solving PLR is also described.
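To make the PLR-plus-RFE combination concrete, here is a minimal pure-Python sketch. Assumptions: an L2 (ridge) penalty, fitting by plain batch gradient descent rather than the fast algorithm the authors describe, and elimination of one gene per round by smallest absolute coefficient. The names `fit_plr`, `predict_proba`, `rfe`, the parameters `lam`, `lr`, `n_iter`, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_plr(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 1.0,
            lr: float = 0.1, n_iter: int = 2000) -> Tuple[float, List[float]]:
    """L2-penalized logistic regression fitted by batch gradient descent.

    Minimizes mean log-loss + (lam / 2) * ||beta||^2; the intercept is not
    penalized. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        beta = [b - lr * (gj / n + lam * b) for b, gj in zip(beta, g)]
    return b0, beta


def predict_proba(b0: float, beta: Sequence[float], x: Sequence[float]) -> float:
    """Estimated probability of class 1, the quantity an SVM does not provide."""
    return sigmoid(b0 + sum(b * v for b, v in zip(beta, x)))


def rfe(X: Sequence[Sequence[float]], y: Sequence[int], n_keep: int,
        lam: float = 1.0) -> List[int]:
    """Recursive feature elimination: refit, then drop the gene whose
    coefficient has the smallest magnitude, until n_keep genes remain."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        _, beta = fit_plr([[row[j] for j in active] for row in X], y, lam=lam)
        del active[min(range(len(active)), key=lambda k: abs(beta[k]))]
    return active


# Hypothetical expression matrix: gene 0 is informative, genes 1 and 2 are noise.
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = [[-2.0, 0.3, 0.5], [-1.5, -0.3, -0.5], [-1.0, -0.3, -0.5], [-0.5, 0.3, 0.5],
     [0.5, -0.3, 0.5], [1.0, 0.3, -0.5], [1.5, 0.3, -0.5], [2.0, -0.3, 0.5]]
print(rfe(X, y, n_keep=1))  # expected to keep gene 0
b0, beta = fit_plr(X, y)
print(predict_proba(b0, beta, [2.0, 0.0, 0.0]))
```

The ridge penalty keeps the coefficients finite even though gene 0 separates the toy classes perfectly, which is also what makes coefficient magnitudes usable as an elimination criterion.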

13.
In clinical studies, we often compare the success rates of two treatment groups where post‐treatment responses of subjects within clusters are usually correlated. To estimate the difference between the success rates, interval estimation procedures that do not account for this intraclass correlation are likely inappropriate. To address this issue, we propose three interval procedures by direct extensions of recently proposed methods for independent binary data based on the concepts of design effect and effective sample size used in sample surveys. Each of them is then evaluated with four competing variance estimates. We also extend three existing methods recommended for complex survey data using different weighting schemes required for those three existing methods. An extensive simulation study is conducted for the purposes of evaluating and comparing the performance of the proposed methods in terms of coverage and expected width. The interval estimation procedures are illustrated using three examples in clinical and social science studies. Our analytic arguments and numerical studies suggest that the methods proposed in this work may be useful in clustered data analyses.
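The design-effect idea can be illustrated by shrinking each arm's sample size to its effective size, n / deff with deff = 1 + (m̄ − 1)ρ, and plugging that into an interval for the difference in success rates. This is a minimal sketch: it uses a plain Wald interval, which is not one of the authors' proposed procedures, treats the intraclass correlation `rho` as known, and uses the average cluster size `m_bar`; all names and data are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def effective_n(cluster_sizes: Sequence[int], rho: float) -> float:
    """Effective sample size n / deff, with deff = 1 + (m_bar - 1) * rho,
    where m_bar is the average cluster size and rho the intraclass correlation."""
    n = sum(cluster_sizes)
    m_bar = n / len(cluster_sizes)
    return n / (1.0 + (m_bar - 1.0) * rho)


def diff_ci(successes1: Sequence[int], sizes1: Sequence[int],
            successes2: Sequence[int], sizes2: Sequence[int],
            rho: float, level: float = 0.95) -> Tuple[float, float]:
    """Wald-type interval for p1 - p2, with each n replaced by its effective n."""
    p1 = sum(successes1) / sum(sizes1)
    p2 = sum(successes2) / sum(sizes2)
    n1, n2 = effective_n(sizes1, rho), effective_n(sizes2, rho)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se


# Hypothetical trial: four clusters of five subjects per arm.
treat_succ, treat_sizes = [4, 3, 5, 4], [5, 5, 5, 5]   # p1 = 16/20
ctrl_succ, ctrl_sizes = [2, 3, 1, 2], [5, 5, 5, 5]     # p2 = 8/20
print(diff_ci(treat_succ, treat_sizes, ctrl_succ, ctrl_sizes, rho=0.0))
print(diff_ci(treat_succ, treat_sizes, ctrl_succ, ctrl_sizes, rho=0.2))  # wider
```

Setting `rho=0.0` reproduces the naive interval that ignores clustering; any positive `rho` widens it, and `rho=1.0` reduces each arm to one effective observation per cluster.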

14.
Species abundances are undoubtedly the most widely available macroecological data, but can we use them to distinguish among several models of community structure? Here we present a Bayesian analysis of species‐abundance data that yields a full joint probability distribution of each model's parameters plus a relatively parameter‐independent criterion, the posterior Bayes factor, to compare these models. We illustrate our approach by comparing three classical distributions: the zero‐sum multinomial (ZSM) distribution, based on Hubbell's neutral model, the multivariate Poisson lognormal distribution (MPLN), based on niche arguments, and the discrete broken stick (DBS) distribution, based on MacArthur's broken stick model. We give explicit formulas for the probability of observing a particular species‐abundance data set in each model, and argue that conditioning on both sample size and species count is needed to allow comparisons between the two distributions. We apply our approach to two neotropical communities (trees, fish). We find that DBS is largely inferior to ZSM and MPLN for both communities. The tree data do not allow discrimination between ZSM and MPLN, but for the fish data ZSM (neutral model) overwhelmingly outperforms MPLN (niche model), suggesting that dispersal plays a previously underestimated role in structuring tropical freshwater fish communities. We advocate this approach for identifying the relative importance of dispersal and niche‐partitioning in determining diversity of different ecological groups of species under different environmental conditions.

15.
Prioritization of compounds using the inverse docking approach is limited owing to potential drawbacks in its scoring functions. Classically, molecules ranked by best or lowest binding energies and clustering methods have been considered probable hits. Mining probable hits from an inverse docking approach is very complicated given the closely related protein targets and the chemically similar ligand data set. To overcome this problem, we present here a computational approach using receptor‐centric and ligand‐centric methods to infer the reliability of the inverse docking approach and to recognize probable hits. This knowledge‐driven approach takes advantage of experimentally identified inhibitors against a particular protein target of interest to delineate shape and molecular field properties, and uses a multilayer perceptron model to predict the biological activity of the test molecules. The approach was validated using flavone derivatives possessing inhibitory activities against principal antimalarial molecular targets of the fatty acid biosynthetic pathway: FabG, FabI and FabZ. We propose that probable hits can be retrieved by comparing the rank lists of docking, quantitative structure–activity relationship and multilayer perceptron models. Copyright © 2014 John Wiley & Sons, Ltd.

16.
A great deal of recent research has focused on the challenging task of selecting differentially expressed genes from microarray data ("gene selection"). Numerous gene selection algorithms have been proposed in the literature, but it is often unclear exactly how these algorithms respond to conditions like small sample sizes or differing variances. Choosing an appropriate algorithm can therefore be difficult in many cases. In this paper we propose a theoretical analysis of gene selection, in which the probability of successfully selecting differentially expressed genes, using a given ranking function, is explicitly calculated in terms of population parameters. The theory developed is applicable to any ranking function which has a known sampling distribution, or one which can be approximated analytically. In contrast to methods based on simulation, the approach presented here is computationally efficient and can be used to examine the behavior of gene selection algorithms under a wide variety of conditions, even when the number of genes involved runs into the tens of thousands. The utility of our approach is illustrated by comparing three widely-used gene selection methods.

17.
Monte Carlo feature selection for supervised classification   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. It is desirable that feature selection provides the features that contribute most to the classification task per se and which should therefore be used by any classifier later used to produce classification rules. In this article, a conceptually simple but computer-intensive approach to this task is proposed. The reliability of the approach rests on multiple construction of a tree classifier for many training sets randomly chosen from the original sample set, where samples in each training set consist of only a fraction of all of the observed features. RESULTS: The resulting ranking of features may then be used to advantage for classification via a classifier of any type. The approach was validated using the Golub et al. leukemia data and the Alizadeh et al. lymphoma data. Not surprisingly, we obtained a significantly different list of genes. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma rather than being genes that are common to several forms of cancers, which is the case for the other methods. AVAILABILITY: Prototype available upon request.
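The resampling scheme described above can be illustrated with a deliberately simplified sketch. Assumptions: one-level decision stumps stand in for full tree classifiers, and each stump's held-out accuracy is credited to the single feature it splits on, which is not the relative-importance measure used by the authors. All function names, parameters (`n_subsets`, `n_splits`, `m`, `train_frac`, `seed`) and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, direction)


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int],
              features: Sequence[int]) -> Stump:
    """One-level tree: the (feature, threshold, direction) with the fewest
    training errors among the allowed features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                errors = sum(
                    (1 if sign * (row[f] - t) > 0 else 0) != yi
                    for row, yi in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    return best[1], best[2], best[3]


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    f, t, sign = stump
    return 1 if sign * (row[f] - t) > 0 else 0


def mc_feature_ranking(X: Sequence[Sequence[float]], y: Sequence[int],
                       n_subsets: int = 50, n_splits: int = 5, m: int = 2,
                       train_frac: float = 0.66, seed: int = 0) -> List[int]:
    """Rank features by accumulated held-out accuracy of stumps built on
    random feature subsets (size m) and random training/test splits."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    score = [0.0] * p
    for _ in range(n_subsets):
        feats = rng.sample(range(p), m)          # random subset of features
        for _ in range(n_splits):
            idx = list(range(n))
            rng.shuffle(idx)                     # random training/test split
            cut = int(train_frac * n)
            train, test = idx[:cut], idx[cut:]
            stump = fit_stump([X[i] for i in train], [y[i] for i in train], feats)
            acc = sum(predict_stump(stump, X[i]) == y[i] for i in test) / len(test)
            score[stump[0]] += acc               # credit the feature actually used
    return sorted(range(p), key=lambda j: score[j], reverse=True)


# Hypothetical data: feature 0 separates the classes, features 1-3 are noise.
y = [0] * 6 + [1] * 6
X = [[float(i if i < 6 else i + 4), float(i % 3), float((7 * i) % 5), float((5 * i) % 4)]
     for i in range(12)]
print(mc_feature_ranking(X, y))  # feature 0 is expected to rank first
```

Because each classifier sees only `m` features at a time, weaker features still get chances to be chosen when the strongest one is absent from the subset, which is what lets the final ranking extend beyond the single best feature.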

18.
Oceanic island ecosystems are vulnerable to the introduction of alien species, and they provide a habitat for many endangered species. Knowing the diet of an endangered animal is important for appropriate nature restoration efforts on oceanic islands because introduced species may be a major component of the diets of some endangered species. DNA barcoding techniques together with next‐generation sequencing may provide more detailed information on animal diets than other traditional methods. We performed a diet analysis using 48 fecal samples from the critically endangered red‐headed wood pigeon that is endemic to the Ogasawara Islands based on chloroplast trnL P6 loop sequences. The frequency of each detected plant taxon was compared with the results of a microhistological analysis of the same sample set. The DNA barcoding approach detected a much larger number of plants than the microhistological analysis. Plants that were difficult to identify by microhistological analysis after being digested in the pigeon stomachs were frequently identified only by DNA barcoding. The results of the barcoding analysis indicated the frequent consumption of introduced species, in addition to several native species, by the red‐headed wood pigeon. The rapid eradication of specific introduced species may reduce the food resources available to this endangered bird; thus, balancing eradication efforts with the restoration of native food plants should be considered. Although some technical problems still exist, the trnL approach to next‐generation sequencing may contribute to a better understanding of oceanic island ecosystems and their conservation.

19.
Large‐scale agreement studies are becoming increasingly common in medical settings to gain better insight into discrepancies often observed between experts' classifications. Ordered categorical scales are routinely used to classify subjects' disease and health conditions. Summary measures such as Cohen's weighted kappa are popular approaches for reporting levels of association for pairs of raters' ordinal classifications. However, in large‐scale studies with many raters, assessing levels of association can be challenging due to dependencies between many raters each grading the same sample of subjects' results and the ordinal nature of the ratings. Further complexities arise when the focus of a study is to examine the impact of rater and subject characteristics on levels of association. In this paper, we describe a flexible approach based upon the class of generalized linear mixed models to assess the influence of rater and subject factors on association between many raters' ordinal classifications. We propose novel model‐based measures for large‐scale studies to provide simple summaries of association similar to Cohen's weighted kappa while avoiding the prevalence and marginal distribution issues to which Cohen's weighted kappa is susceptible. The proposed summary measures can be used to compare association between subgroups of subjects or raters. We demonstrate the use of hypothesis tests to formally determine if rater and subject factors have a significant influence on association, and describe approaches for evaluating the goodness‐of‐fit of the proposed model. The performance of the proposed approach is explored through extensive simulation studies, and the approach is applied to a recent large‐scale breast cancer screening study.
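For reference, the pairwise baseline named above, Cohen's weighted kappa, can be computed from two raters' ordinal classifications as one minus the ratio of observed to chance-expected weighted disagreement. This sketch shows only that baseline, not the authors' model-based measures; it assumes categories coded 0..k−1 with k ≥ 2 and offers linear or quadratic disagreement weights. The function name and the example ratings are hypothetical.

```python
from typing import Sequence


def weighted_kappa(r1: Sequence[int], r2: Sequence[int], k: int,
                   weights: str = "linear") -> float:
    """Cohen's weighted kappa for two raters grading the same subjects on an
    ordinal scale coded 0..k-1 (k >= 2), using disagreement weights
    |i - j| / (k - 1) ("linear") or their square ("quadratic")."""
    n = len(r1)
    obs = [[0.0] * k for _ in range(k)]          # observed joint proportions
    for a, b in zip(r1, r2):
        obs[a][b] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]        # rater 1 marginals
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]  # rater 2 marginals

    def w(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - observed / expected


# Hypothetical ratings of eight screening results on a 4-point ordinal scale.
rater_a = [0, 1, 2, 3, 2, 1, 0, 3]
rater_b = [0, 1, 3, 3, 2, 0, 0, 2]
print(weighted_kappa(rater_a, rater_b, k=4))
print(weighted_kappa(rater_a, rater_b, k=4, weights="quadratic"))
```

The chance-expected term is built from the two raters' marginal distributions, which is where the prevalence and marginal-distribution sensitivity mentioned in the abstract enters.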

20.
The use of model‐based methods to infer a phylogenetic tree from a given data set is frequently motivated by the truism that under certain circumstances the parsimony approach (MP) may produce incorrect topologies, while explicit model‐based approaches are believed to avoid this problem. In the realm of empirical data from actual taxa, it is not known (or knowable) how commonly MP, maximum‐likelihood or Bayesian inference are inaccurate. To test the perceived need for “sophisticated” model‐based approaches, we assessed the degree of congruence between empirical phylogenetic hypotheses generated by alternative methods applied to DNA sequence data in a sample of 1000 recently published articles. Of 504 articles that employed multiple methods, only two exhibited strongly supported incongruence among alternative methods. This result suggests that the MP approach does not produce deviant hypotheses of relationship due to convergent evolution in long branches. Our finding therefore indicates that the use of multiple analytical methods is largely superfluous. We encourage the use of analytical approaches unencumbered by ad hoc assumptions that sap the explanatory power of the evidence. © The Willi Hennig Society 2010.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417