Similar Literature
20 similar documents found.
1.
Generalised information criteria in model selection   Total citations: 7 (self-citations: 0; citations by others: 7)

2.
Food patch visitation was compared with the availability of fruit patches of different species over 2 years in a Bornean lowland forest to examine orangutan (Pongo pygmaeus) diet selectivity. Feeding on both the pulp and the seeds of nonfig fruit varied directly with fruit patch availability, demonstrating a preference for these foods over fig fruit or other plant parts (bark or leaves). Factors determining fruit selectivity rank were examined through multiple regression analysis. Modeling selectivity for 52 chemically unprotected primate-fruit pulp species revealed strong preferences for species with (i) large crop size (number of fruits ripening in an individual patch), (ii) high pulp weight per fruit, and (iii) high pulp mass per swallowed unit of pulp + seed, demonstrating orangutan sensitivity especially to patch size (g of pulp or total energy per patch) and perhaps to fruit handling time. Modeling selectivity for 18 fig species showed that 4 factors significantly influenced fig species rank: crop size, pulp weight per fruit, and 2 chemical variables, percentage digestible carbohydrate and percentage phenolic compounds in the fig fruit pulp. The selectivity rank based on the overall nutrient gain from feeding in the fruit patch (the product of the first 3 variables) is proportionally depressed by the percentage tannin content, demonstrating that orangutans integrate values for these variables in selecting fig patches. The conclusions from these results, and from the analysis of selectivity for seeds and for other fruit types, are that orangutan foraging decisions are strongly influenced by the meal size expected from a feeding visit (i.e., by patch size), that tannins and other toxins deter feeding, and that the energy content, rather than the protein content, of foods is important in diet selection. The foraging strategy of orangutans is interpreted in relation to these results and to Bornean fruiting phenology.
By integrating spatial, morphometric, and chemical variables in analysis, this study is the first to demonstrate the application of foraging theory to separate out the key variables that determine diet selection in a primate. Multivariate analysis should routinely be applied to such data to distinguish among the many covarying attributes of food items and patches; inferences drawn in previous studies of primate diet selection, which ignore key spatial and morphological variables and rely on univariate correlations, are therefore suspect.

3.
A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering and ordering genes into homogeneous groups on the basis of their expression data has been shown to be useful in functional annotation, tissue classification, regulatory motif identification and other applications. Although there is a rich literature on gene ordering within the hierarchical clustering framework for gene expression analysis, to the best of the authors' knowledge no work has addressed or evaluated the importance of gene ordering in the partitive clustering framework. Outside hierarchical clustering, gene ordering algorithms have been applied to the whole data set, and the domain of partitive clustering remains unexplored by gene ordering approaches. A new hybrid method is proposed for ordering the genes within each cluster of a partitive clustering solution, using microarray gene expression data. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely FRAG_GALK and Concorde, are each hybridized with the self-organizing map (SOM) to show the importance of gene ordering in the partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that it improves the quality of the partitive clustering solution by identifying subclusters within large clusters, grouping functionally correlated genes within clusters, minimizing the sum of gene expression distances, and maximizing biological gene ordering as measured by MIPS categorization. Moreover, the new hybrid approach finds comparable or sometimes superior biological gene orders in less computation time than optimal leaf ordering of a hierarchical clustering solution.
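The objective that these TSP-based methods optimise can be illustrated with a small sketch. The Python below assumes each gene is a plain list of expression values and uses Euclidean distance between profiles; it scores an ordering by the sum of distances between adjacent genes and builds an ordering with a greedy nearest-neighbour heuristic. This heuristic is a simple stand-in, not FRAG_GALK or Concorde, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def path_length(order: Sequence[int], profiles: Sequence[Sequence[float]]) -> float:
    """Total Euclidean distance between consecutive genes in `order`."""
    return sum(
        math.dist(profiles[a], profiles[b]) for a, b in zip(order, order[1:])
    )


def nearest_neighbour_order(
    profiles: Sequence[Sequence[float]], start: int = 0
) -> List[int]:
    """Greedy open-path ordering: repeatedly append the closest unvisited gene."""
    if not profiles:
        return []
    unvisited = set(range(len(profiles))) - {start}
    order = [start]
    while unvisited:
        last = order[-1]
        # Ties are broken by the smaller gene index so the result is deterministic.
        nxt = min(unvisited, key=lambda g: (math.dist(profiles[last], profiles[g]), g))
        order.append(nxt)
        unvisited.remove(nxt)
    return order
```

For a toy cluster `[[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.0, 5.1]]`, the greedy order is `[0, 2, 1, 3]`, which places the two similar pairs next to each other. A greedy ordering is fast but generally not optimal; exact or near-exact TSP solvers exist to close that gap.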

4.
A host of technologies exists for the separation of living, nonadherent cells, with separation decisions typically based on fluorescence or immunolabeling of cells. Methods to separate adherent cells, as well as to broaden the range of possible sorting criteria, would be of high value and complementary to existing strategies. Cells were cultured on arrays of releasable pallets. The arrays were screened, and individual pallets carrying the cell(s) of interest were released and collected. Conventional fluorescence and immunolabeling of cells were compatible with the pallet arrays, as were separations based on gene expression. By varying the size of the pallets and the number of cells cultured on the array, single cells or clonal colonies of cells were isolated from a heterogeneous population. Since cells remained adherent throughout the isolation process, separations based on morphologic characteristics, for example cell shape, were feasible. Repeated measurements of each cell in an array were performed, permitting the selection of cells based on their temporal behavior, e.g. growth rate. The pallet array system provides the flexibility to select and collect adherent cells based on phenotypic and temporal criteria and other characteristics not accessible by alternative methods.

5.
Chen Jiahua; Chen Zehua. Biometrika, 2008, 95(3): 759-771
The ordinary Bayesian information criterion is too liberal for model selection when the model space is large. In this paper, we re-examine the Bayesian paradigm for model selection and propose an extended family of Bayesian information criteria, which take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the number of covariates to increase to infinity with the sample size. Their performance in various situations is evaluated by simulation studies. It is demonstrated that the extended Bayesian information criteria incur a small loss in the positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayesian information criteria are extremely useful for variable selection in problems with a moderate sample size but with a huge number of covariates, especially in genome-wide association studies, which are now an active area in genetics research.
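As a rough illustration of how the extended criterion penalises the size of the model space, the sketch below assumes the form BIC_γ = -2 log L + k log n + 2γ log C(p, k), where k of p candidate covariates are selected, n is the sample size and γ ≥ 0; setting γ = 0 gives the ordinary BIC. Counting only the k selected coefficients as parameters is an assumption, and the function name `ebic` is hypothetical.

```python
import math


def ebic(log_likelihood: float, k: int, n: int, p: int, gamma: float = 1.0) -> float:
    """Extended BIC for a model that uses k of p candidate covariates.

    gamma = 0 recovers the ordinary BIC; a larger gamma penalises the
    number of competing models of the same size more heavily.
    """
    if not 0 <= k <= p:
        raise ValueError("k must lie between 0 and p")
    bic = -2.0 * log_likelihood + k * math.log(n)
    # log C(p, k) counts how many models of size k exist in the model space.
    return bic + 2.0 * gamma * math.log(math.comb(p, k))
```

For example, with n = 50, p = 1000 and k = 2, the ordinary penalty k log n is about 7.8, while the extra term with γ = 1 is 2 log C(1000, 2), about 26.2. That extra cost is what makes the extended criteria more conservative when p is large.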

6.
Gene selection: a Bayesian variable selection approach   Total citations: 13 (self-citations: 0; citations by others: 13)
Selection of significant genes via expression patterns is an important problem in microarray experiments. Owing to the small sample size and the large number of variables (genes), the selection process can be unstable. This paper proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables to specialize the model to a regression setting and use a Bayesian mixture prior to perform the variable selection. We control the size of the model by assigning a prior distribution over the dimension (number of significant genes) of the model. The posterior distributions of the parameters are not available in explicit form, so we use a combination of truncated sampling and Markov chain Monte Carlo (MCMC) techniques to simulate the parameters from the posteriors. The Bayesian model is flexible enough both to identify significant genes and to make future predictions. The method is applied to cancer classification via cDNA microarrays, in which the genes BRCA1 and BRCA2 are associated with a hereditary predisposition to breast cancer, and is used to identify a set of significant genes. The method is also applied successfully to the leukemia data. SUPPLEMENTARY INFORMATION: http://stat.tamu.edu/people/faculty/bmallick.html.

7.
MOTIVATION: Selecting the genes most relevant and informative for certain phenotypes is an important aspect of gene expression analysis. Most current methods select genes based on known phenotype information. However, certain sets of genes may correspond to new, as yet unknown phenotypes, and it is important to develop effective selection methods for discovering them without using any prior phenotype information. RESULTS: We propose and study a new method that selects relevant genes based on their similarity information only. The method relies on a mechanism for discarding irrelevant genes: a two-way ordering of the gene expression data can force irrelevant genes towards the middle of the ordering, where they can be discarded. Mechanisms based on variance and principal component analysis are also studied. When applied to expression profiles of colon cancer and leukemia, the unsupervised method outperforms the baseline algorithm that simply uses all genes, and it also selects relevant genes close to those selected by supervised methods. SUPPLEMENT: More results and software are online: http://www.nersc.gov/~cding/2way.
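The abstract mentions variance-based mechanisms as an alternative to two-way ordering. A minimal sketch of such a filter, assuming genes are stored in a dict that maps a gene name to its expression values across samples, keeps the m most variable genes and discards the rest. It does not implement the two-way ordering method itself, and the function name is hypothetical.

```python
from statistics import pvariance
from typing import Dict, List, Sequence


def top_variance_genes(expr: Dict[str, Sequence[float]], m: int) -> List[str]:
    """Keep the m genes with the largest population variance across samples."""
    # Sort by descending variance; ties are broken alphabetically by gene name.
    ranked = sorted(expr, key=lambda gene: (-pvariance(expr[gene]), gene))
    return ranked[:m]
```

A gene whose expression is flat across samples carries no information for grouping samples, which is why low-variance genes are the natural ones to discard in such a baseline.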

8.
The ordering and orientation of genomic scaffolds to reconstruct chromosomes is an essential step in de novo genome assembly. Because this process uses various mapping techniques that each provide an independent line of evidence, combining multiple maps can improve the accuracy of the resulting chromosomal assemblies. We present ALLMAPS, a method that computes a scaffold ordering that maximizes colinearity across a collection of maps. ALLMAPS is robust against common mapping errors and generates sequences that are maximally concordant with the input maps. ALLMAPS is a useful tool for building high-quality genome assemblies. ALLMAPS is available at: https://github.com/tanghaibao/jcvi/wiki/ALLMAPS.

9.
Gene selection using support vector machines with non-convex penalty   Total citations: 2 (self-citations: 0; citations by others: 2)
MOTIVATION: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. One current difficulty in interpreting microarray data comes from their inherent 'high-dimensional, low-sample-size' nature. Therefore, robust and accurate gene selection methods are required to identify groups of genes that are differentially expressed across samples, e.g. between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers and improve treatment strategies. Although gene selection and cancer classification are closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide a unified procedure for simultaneous gene selection and cancer classification that achieves high accuracy in both aspects. RESULTS: In this paper we develop a novel type of regularization in support vector machines (SVMs) to identify important genes for cancer classification. A special non-convex penalty, called the smoothly clipped absolute deviation (SCAD) penalty, is imposed on the hinge loss function in the SVM. By systematically thresholding small estimates to zero, the new procedure eliminates redundant genes automatically and yields a compact and accurate classifier. A successive quadratic algorithm is proposed to convert the non-differentiable and non-convex optimization problem into easily solved linear equation systems. The method is applied to two real datasets and has produced very promising results. AVAILABILITY: MATLAB codes are available upon request from the authors.
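The SCAD penalty referred to here is piecewise: linear (like the L1 penalty) for small coefficients, quadratic in a transition zone, and constant beyond it, so large coefficients are not shrunk further. A minimal sketch of the penalty on a single coefficient w with tuning parameter λ, assuming the commonly used value a = 3.7; the function name is hypothetical, and the paper's successive quadratic algorithm is not shown.

```python
def scad_penalty(w: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lambda(|w|); requires a > 2."""
    if a <= 2:
        raise ValueError("a must exceed 2")
    t = abs(w)
    if t <= lam:
        # Small coefficients: same as the L1 (lasso) penalty.
        return lam * t
    if t <= a * lam:
        # Transition zone: quadratic, joining the two outer pieces continuously.
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return (a + 1) * lam * lam / 2
```

Because the penalty is flat beyond aλ, strongly predictive genes keep nearly unbiased coefficients, while the L1-like part near zero still sets small coefficients exactly to zero.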

10.
Defining selection criteria to improve yield under drought   Total citations: 19 (self-citations: 0; citations by others: 19)
The many selection criteria that have been proposed to increase the drought resistance of our crops have had little, if any, impact on improving crop yields in dry environments. There are several likely reasons for this lack of success. Some of these are: (i) the criteria proposed have been related more to survival mechanisms under drought than to productivity, (ii) the criteria are inappropriate to the target environment, and (iii) the criteria are temporal and are therefore likely to have minimal impact on growth and yield over the entire lifecycle. Another important reason is that breeders have not been convinced that the proposed criteria will be successful, as they are too difficult to measure. On the other hand, empirical breeding programmes to improve yield under drought have been successful. Surprisingly, some of the greatest successes have been achieved by breeding in environments where water is non-limiting. This paper reviews breeding approaches to improve yield under drought. It focuses on the critical factors that must be considered to identify plant attributes that are likely targets. These factors, their link with yield, the nature of the target environment and the level of organisation at which the trait is expressed are discussed. Three quite different examples, which show substantial promise for targeting traits to improve yield under drought, are given to emphasize these considerations: drought at flowering, improving transpiration efficiency and improving early leaf area development.

11.
The use of large-scale microarray expression profiling to identify predictors of disease class has become of major interest. Beyond their impact in the clinical setting (i.e. improving diagnosis and treatment), these markers are also likely to provide clues to the molecular mechanisms underlying the diseases. In this paper we describe a new method for identifying multiple gene predictors of disease class. The method is applied to the classification of two forms of arthritis that have a similar clinical endpoint but different underlying molecular mechanisms: rheumatoid arthritis (RA) and osteoarthritis (OA). We aim both to classify samples and to identify the genes characterizing the different classes. We achieve both goals simultaneously by combining a binary probit model for classification with Bayesian variable selection methods to identify important genes. We find very small sets of genes that lead to good classification results. Some of the selected genes are clearly correlated with known aspects of the biology of arthritis and, in some cases, reflect already known differences between RA and OA.

12.

Background

Development of biologically relevant models from gene expression data, notably microarray data, has become a topic of great interest in bioinformatics, clinical genetics and oncology. Only a small number of the genes explored show a significant correlation with a given phenotype. Gene selection enables researchers to gain substantial insight into the genetic nature of a disease and the mechanisms responsible for it. Besides improving the performance of cancer classification, it can also reduce the time and cost of medical diagnosis.

Methods

This study presents a modified Artificial Bee Colony (ABC) algorithm that selects a minimal number of genes deemed significant for cancer while improving predictive accuracy. The search equation of ABC is believed to be good at exploration but poor at exploitation. To overcome this limitation, we modified the ABC algorithm by incorporating the concept of pheromones, a major component of the Ant Colony Optimization (ACO) algorithm, and a new operation in which successive bees communicate to share their findings.
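For reference, the standard ABC search equation that the authors set out to modify perturbs one randomly chosen dimension j of food source i relative to a random partner source k ≠ i: v_ij = x_ij + φ_ij (x_ij - x_kj), with φ_ij drawn uniformly from [-1, 1]. The sketch below shows only this unmodified, continuous-valued step; the pheromone and bee-communication extensions, and how gene subsets are encoded, are not shown, and the function name is hypothetical.

```python
import random
from typing import List, Sequence


def abc_neighbour(
    foods: Sequence[Sequence[float]], i: int, rng: random.Random
) -> List[float]:
    """Standard ABC candidate: v_ij = x_ij + phi * (x_ij - x_kj) in one dimension j."""
    if len(foods) < 2:
        raise ValueError("need at least two food sources")
    candidate = list(foods[i])  # copy, so the current source is not mutated
    k = rng.choice([idx for idx in range(len(foods)) if idx != i])  # partner, k != i
    j = rng.randrange(len(candidate))  # the single dimension to perturb
    phi = rng.uniform(-1.0, 1.0)
    candidate[j] += phi * (candidate[j] - foods[k][j])
    return candidate
```

Because the step size is proportional to the distance between two sources, moves shrink as the population converges; that self-scaling favours exploration early on, which is consistent with the abstract's remark that plain ABC is weaker at exploitation.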

Results

The proposed algorithm is evaluated on a suite of ten publicly available datasets after its parameters are tuned systematically on one of the datasets. The results obtained are compared with those of other works that used the same datasets, and the proposed method is shown to perform better.

Conclusion

The method presented in this paper can provide a smaller subset of genes that leads to more accurate classification results. Additionally, the proposed modified Artificial Bee Colony algorithm could conceivably be applied to problems in other areas as well.

13.
In high-throughput -omics studies, markers identified from analysis of single data sets often suffer from a lack of reproducibility because of sample limitation. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple -omics data sets is challenging because of the high dimensionality of data and heterogeneity among studies. In this article, for marker selection in integrative analysis of data from multiple heterogeneous studies, we propose a 2-norm group bridge penalization approach. This approach can effectively identify markers with consistent effects across multiple studies and accommodate the heterogeneity among studies. We propose an efficient computational algorithm and establish the asymptotic consistency property. Simulations and applications in cancer profiling studies show satisfactory performance of the proposed approach.
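A minimal sketch of a 2-norm group bridge penalty, assuming the form λ Σ_j ‖β_j‖₂^γ with 0 < γ < 1, where β_j collects marker j's regression coefficients across all studies. Because the penalty acts on the whole group, a marker tends to be dropped from all studies at once, while its nonzero coefficients may still differ between studies. Any group-specific weights the paper may use are omitted, and the function name is hypothetical.

```python
import math
from typing import Sequence


def group_bridge_penalty(
    beta: Sequence[Sequence[float]], lam: float, gamma: float = 0.5
) -> float:
    """lam * sum_j ||beta_j||_2 ** gamma, with beta[j] = marker j's coefficients per study."""
    if not 0 < gamma < 1:
        raise ValueError("bridge exponent gamma should lie in (0, 1)")
    # math.hypot(*v) is the Euclidean (2-)norm of the vector v.
    return lam * sum(math.hypot(*beta_j) ** gamma for beta_j in beta)
```

The 2-norm inside each group lets the coefficients of one marker vary freely across studies (heterogeneity), while the concave outer exponent γ < 1 encourages whole groups to be exactly zero (consistent selection).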

14.
15.
The effect of multiple alleles on long-term response to selection is examined by simulations that use a pseudosampling technique to simulate the multidimensional diffusion process. Allelic effects are drawn independently from a normal distribution, and the initial allele frequencies are assumed either to be equal or to be drawn from a neutral equilibrium population. With these two initial gene frequency distributions, we examined how the number of alleles and the selection intensity affect various properties of the selection response process. For neutral initial frequencies, multiple alleles (compared with two alleles) have only minor effects on the ratio of final to initial response, $E(R_\infty)/E(R_1)$, and on the half-life of response, $t_{0.5}$, but have significant effects on the variance of response. Under certain conditions the variance of the selection limit can even increase as selection becomes stronger. For equal initial frequencies, by contrast, multiple alleles have only minor effects on the ratio of the variance of the selection limit to the initial genetic variance, but $E(R_\infty)/E(R_1)$ and $t_{0.5}$ increase as the number of alleles increases. The results show that for certain statistics the effects of multiple alleles can be minimized by an appropriate transformation of parameters for given initial gene frequencies, but these effects cannot, in general, be removed by any single transformation or reparameterization.

16.

Background

Graduate entry medicine raises new questions about the suitability of students with different backgrounds. We examine this and the broader issue of the effectiveness of selection and assessment procedures.

Methods

The data included background characteristics, academic record, interview score and performance in pre-clinical modular assessment for two years' intake of graduate entry medical students. Exploratory factor analysis, a powerful method for reducing a large number of measures to a smaller group of underlying factors, was used here to identify patterns within and between the selection and performance data.

Principal Findings

Basic background characteristics were of little importance in predicting exam success. However, easily interpreted components were detected within variables comprising the ‘selection’ and ‘assessment’ criteria. Three selection components were identified (‘Academic’, ‘GAMSAT’, ‘Interview’) and four assessment components (‘General Exam’, ‘Oncology’, ‘OSCE’, ‘Family Case Study’). There was a striking lack of relationships between most selection and performance factors. Only ‘General Exam’ and ‘Academic’ showed a correlation (Pearson's r = 0.55, p < 0.001).
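For readers unfamiliar with the statistic reported above, Pearson's r is the covariance of two variables divided by the product of their standard deviations, so it ranges from -1 to 1. A minimal sketch follows; the function name and the toy values in the usage note are hypothetical and unrelated to the study's scores.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson_r([1, 2, 3], [2, 4, 6])` returns `1.0`, since the second list is an exact increasing linear function of the first.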

Conclusions

This study raises questions about methods of student selection and their effectiveness in predicting performance and assessing suitability for a medical career. Admissions tests and most exams only confirmed previous academic achievement, while interview scores were not correlated with any subsequent assessment.

17.
Ectomycorrhizae and landfill site reclamations: fungal selection criteria   Total citations: 1 (self-citations: 0; citations by others: 1)
The ectomycorrhizal fungi Laccaria proxima and Hebeloma crustuliniforme, but not Paxillus involutus, tolerated the reduced oxygen tensions characteristic of landfill site covering materials. Compared with conventional methods of ectomycorrhizal seedling production, significant time savings were made by using fermenter-cultured L. proxima inoculum. Further time savings could not, however, be made by supplementing the medium with Betula pendula (Silver Birch) extract.
Since vertical leachate migration is common in most sites, pH tolerance must be an important criterion in choosing the fungus. For pH regimes between acid and neutral, L. proxima should be chosen, whereas H. crustuliniforme should be used in basic regimes.

18.
Discrete models of competitors (an initial population and mutants) are considered in which reproduction is given by an increasing, concave function and migration across a space consisting of a set of areas is described by a Markov matrix. This allows the theory of monotone operators to be used to study problems of selection, coexistence and stability. It is shown that the greater the number of areas, the more stringent the requirements on the selective advantage of the initial population.

19.
Three species of rats (Rattus exulans, R. rattus, R. norvegicus) are widely invasive, having established populations in terrestrial habitats worldwide. These species exploit a wide variety of foods and can devastate native flora and fauna. Rats can consume a variety of plant parts, but may have the most dramatic effects on plant populations through the consumption and destruction of seeds. The vulnerability of vegetation to rat consumption is influenced by many factors, including the size of the plant part and its mechanical and chemical defenses. We reviewed the literature to find out which plant species and plant parts invasive rats consume and what characteristics these food sources share that may influence selection by rats. Many of the studies we found were performed in New Zealand, so our analyses focus on this location. We also performed laboratory feeding trials with R. norvegicus to determine whether seed hardness and palatability influence rats' consumptive choices. We found more reports of rats consuming fruits and seeds than vegetative plant parts, and smaller rather than larger fruits and seeds. R. norvegicus is reported to consume proportionally more vegetative plant parts than either R. exulans or R. rattus, possibly because of its more ground-dwelling habits. Large size and hard seed coats may deter rat feeding, but unpalatable chemicals may be even more effective deterrents. Scientists and managers can better manage vegetation in rat-invaded areas by understanding the criteria rats use to select food.

20.
Gene identities and multiple relationships   Total citations: 9 (self-citations: 0; citations by others: 9)
E A Thompson. Biometrics, 1974, 30(4): 667-680
