Similar literature
Found 20 similar articles (search time: 46 ms)
1.
The study of human evolution has always been a major issue in physical anthropology. Since computers became available, this study has gained a new dimension, in that it became feasible to apply advanced mathematical multivariate methods which make use of morphometric data. However, looking back on what has been achieved so far with these methods, it must be admitted that the results obtained are often unsatisfactory. This has led to a certain lack of acceptance of these methods. In the present paper it is argued that very useful results may be obtained by applying more sophisticated multivariate methods which are specifically designed for the anthropological problems at issue. Three examples are given. The first deals with the controversy between "Creationism" on the one hand and "Evolutionism" on the other. Our results strongly support the Evolutionists' point of view. The second example deals with the reconstruction of human phylogeny. An investigation is discussed which has led to a startling new hypothesis concerning the evolution of man. The last example concerns a preliminary investigation of trends in human sexual dimorphism. The results obtained so far seem to support the opinion expressed by other workers that tendencies exist in our modern society which lead to changes in the present dimorphism.

2.
Information theory has long been used to quantify interactions between two variables. With the rise of complex systems research, multivariate information measures have been increasingly used to investigate interactions between groups of three or more variables, often with an emphasis on so-called synergistic and redundant interactions. While bivariate information measures are commonly agreed upon, the multivariate information measures in use today have been developed by many different groups, and differ in subtle, yet significant ways. Here, we will review these multivariate information measures with special emphasis paid to their relationship to synergy and redundancy, as well as examine the differences between these measures by applying them to several simple model systems. In addition to these systems, we will illustrate the usefulness of the information measures by analyzing neural spiking data from a dissociated culture through early stages of its development. Our aim is that this work will aid other researchers as they seek the best multivariate information measure for their specific research goals and system. Finally, we have made software available online which allows the user to calculate all of the information measures discussed within this paper.
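
As an illustration of what a synergistic interaction can look like, the following minimal sketch computes plain mutual information on a toy XOR system. The XOR system and the helper name `mutual_information` are assumptions chosen for illustration; they are not taken from the paper or its software, and none of the multivariate measures reviewed there is implemented here.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """Mutual information in bits between the two coordinates of equally likely (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )


# Hypothetical toy system: Z = X XOR Y, with X and Y independent fair bits.
samples = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

i_x_z = mutual_information([(x, z) for x, y, z in samples])        # X alone vs Z
i_y_z = mutual_information([(y, z) for x, y, z in samples])        # Y alone vs Z
i_xy_z = mutual_information([((x, y), z) for x, y, z in samples])  # X and Y jointly vs Z

print(i_x_z, i_y_z, i_xy_z)
```

In this toy case neither input alone carries information about the output, while the two inputs together determine it fully, which is the kind of pattern usually described as synergy.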

3.
In this paper, we present a new method for the prediction and uncertainty quantification of data-driven multivariate systems. Traditionally, either mechanistic or non-mechanistic modeling methodologies have been used for prediction; however, it is uncommon for the two to be incorporated together. We compare the forecast accuracy of mechanistic modeling, using Bayesian inference, a non-mechanistic modeling approach based on state space reconstruction, and a novel hybrid methodology composed of the two for an age-structured population data set. The data come from cannibalistic flour beetles, in which it is observed that the adults preying on the eggs and pupae result in non-equilibrium population dynamics. Uncertainty quantification methods for the hybrid models are outlined and illustrated for these data. We perform an analysis of the results from Bayesian inference for the mechanistic model and hybrid models to suggest reasons why hybrid modeling methodology may enable more accurate forecasts of multivariate systems than traditional approaches.

4.
Numerical taxonomy and the related methods of multidimensional scaling (MDS) have been applied in a wide range of disciplines in the last twenty-five years. A common feature has been the initial derivation of a measure of association or proximity among pairs of objects from a multivariate data matrix. Although a large number of measures are in use, there has been little systematic study of the sensitivities of these measures to different aspects of character data. I report here a general model, for a class of these measures, that reveals basic patterns of sensitivity that underlie a wide variety of common measures and shows that there is a continuum of potential association measures which exhibit useful combinations of sensitivities. I distinguish between “separation” sensitive and “minimum value” sensitive measures and describe a new measure that is intermediate in exhibiting both minimum value and separation sensitivity. The utility of this new type of measure in three disciplines—ecology, psychology, and systematics—is briefly described.

5.
A plenitude of feature selection (FS) methods is available in the literature, most of them arising from the need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added in the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.

6.
7.
Recent advances in high‐throughput methods of molecular analyses have led to an explosion of studies generating large‐scale ecological data sets. In particular, a noticeable impact has been made in the field of microbial ecology, where new experimental approaches provided in‐depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high‐throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure.

8.
Attitudes toward biotechnology in the European Union   Cited by: 6 (self-citations: 0; citations by others: 6)
Public attitudes toward biotechnology in the European Union have been characterized as negative using Eurobarometer data, but so far little attention has been paid to building a robust metric appropriate for emerging public opinion issues which combine high salience with very limited knowledge by the public. On the basis of the general literature about the formation and structure of attitudes and about public perceptions of science, this article presents a new metric and analysis: first, for estimating the level of awareness and knowledge of biotechnology in Europe; second, for assessing the stability and depth of these evaluative perceptions; and third, for exploring the roles of canonical socio-demographic variables, the knowledge variable and general attitudinal schemas for understanding the perceptions of both benefits and risks of biotech applications. The results show the importance of general value orientations or "worldviews" in shaping positive attitudes, and more of these general cognitive schemas should be measured in future research. The same multivariate model was unable to account for a significant percentage of the total variance in the perception of risks, suggesting that new measures are needed to tap this critical area in the acceptance of biotech in Europe.

9.
Xu C, Li Z, Xu S. Genetics 2005, 169(2): 1045-1059
Joint mapping for multiple quantitative traits has shed new light on genetic mapping by pinpointing pleiotropic effects and close linkage. Joint mapping also can improve statistical power of QTL detection. However, such a joint mapping procedure has not been available for discrete traits. Most disease resistance traits are measured as one or more discrete characters. These discrete characters are often correlated. Joint mapping for multiple binary disease traits may provide an opportunity to explore pleiotropic effects and increase the statistical power of detecting disease loci. We develop a maximum-likelihood method for mapping multiple binary traits. We postulate a set of multivariate normal disease liabilities, each contributing to the phenotypic variance of one disease trait. The underlying liabilities are linked to the binary phenotypes through some underlying thresholds. The new method actually maps loci for the variation of multivariate normal liabilities. As a result, we are able to take advantage of existing methods of joint mapping for quantitative traits. We treat the multivariate liabilities as missing values so that an expectation-maximization (EM) algorithm can be applied here. We also extend the method to joint mapping for both discrete and continuous traits. Efficiency of the method is demonstrated using simulated data. We also apply the new method to a set of real data and detect several loci responsible for blast resistance in rice.

10.
Comparative genomics using data mining tools   Cited by: 3 (self-citations: 0; citations by others: 3)
We have analysed the genomes of representatives of three kingdoms of life, namely, archaea, eubacteria and eukaryota using data mining tools based on compositional analyses of the protein sequences. The representatives chosen in this analysis were Methanococcus jannaschii, Haemophilus influenzae and Saccharomyces cerevisiae. We have identified the common and different features between the three genomes in the protein evolution patterns. M. jannaschii has been seen to have a greater number of proteins with more charged amino acids whereas S. cerevisiae has been observed to have a greater number of hydrophilic proteins. Despite the differences in intrinsic compositional characteristics between the proteins from the different genomes we have also identified certain common characteristics. We have carried out exploratory Principal Component Analysis of the multivariate data on the proteins of each organism in an effort to classify the proteins into clusters. Interestingly, we found that most of the proteins in each organism cluster closely together, but there are a few ‘outliers’. We focus on the outliers for the functional investigations, which may aid in revealing any unique features of the biology of the respective organisms.

11.
MOTIVATION: An early use of gene-expression data coming from microarrays was to discover non-linear multivariate intergene relationships. Pursuing this direction, the motivation for this paper is 2-fold: (1) to discover and elucidate multivariate logical predictive relations among gene expressions in a dataset arising from radiation studies using the NCI 60 Anti-Cancer Drug Screen (ACDS) cell lines; and (2) to demonstrate how these logical relations based on coarse quantization reflect corresponding relations in the continuous data. RESULTS: Using the coefficient of determination, a large number of logical relationships have been discovered among genes in the NCI 60 ACDS cell lines. Moreover, these relationships can be seen directly in the original continuous data, and many are robust relative to the thresholds used to obtain the logical data from the continuous data. A key observation is that a number of intergene relationships appear to be considerably stronger when p53 is functional as compared to when it is not, which is consistent with earlier findings in the literature. AVAILABILITY: The appendix is available at http://gsp.tamu.edu/Publications/supplement.htm CONTACT: edward@ee.tamu.edu.

12.
It is accepted that observed patterns in community structure change as analyses are carried out at higher taxonomic levels. Univariate analyses which incorporate higher taxonomic structure within assemblages have been shown to be informative. In this paper we suggest ways in which changes in multivariate relationships at higher taxonomic levels and associated with higher taxonomic/phylogenetic structure of the community may be incorporated into multivariate analyses, an aspect not previously addressed in this type of analysis. Four approaches, namely: biodiversity MDS (bdMDS), number of taxa MDS (ntMDS), delta MDS (δMDS) and lambda MDS (λMDS), are proposed, and applied to theoretical data as well as to data collected from the literature on the Mediterranean lagoonal environment. Results show that these approaches have the capacity to distinguish severely impacted lagoons from naturally disturbed ones, although in practice the simplest method (ntMDS) was the most successful. Analyses based on the most abundant groups (polychaetes, molluscs, crustaceans) did not always match analyses based on the entire macrofauna, mirroring the performance of taxonomic distinctness indices in the Mediterranean lagoons. The important characteristics of the approaches introduced, as well as potential criticisms, are provided. Application of these techniques on smaller scales and to other habitats is suggested prior to their wider use in the region.

13.
14.
Kirkpatrick M, Meyer K. Genetics 2004, 168(4): 2295-2306
Estimating the genetic and environmental variances for multivariate and function-valued phenotypes poses problems for estimation and interpretation. Even when the phenotype of interest has a large number of dimensions, most variation is typically associated with a small number of principal components (eigenvectors or eigenfunctions). We propose an approach that directly estimates these leading principal components; these then give estimates for the covariance matrices (or functions). Direct estimation of the principal components reduces the number of parameters to be estimated, uses the data efficiently, and provides the basis for new estimation algorithms. We develop these concepts for both multivariate and function-valued phenotypes and illustrate their application in the restricted maximum-likelihood framework.

15.
The objective of this study was to evaluate genetic and non-genetic factors influencing characteristics of young buck semen production using a multivariate model that takes into account the longitudinal structure of data. Data were collected from 1989 to 2002 at two French A.I. centres. The data corresponded to 13151 and 9206 ejaculates of 758 Alpine and 535 Saanen bucks respectively, collected at the beginning of the first breeding season (September-December). The semen volume, the total number of spermatozoa, the concentration, the motility score of spermatozoa after freezing and the percentage of motile spermatozoa after freezing were registered for each ejaculate. Within-breed heritabilities and repeatabilities were estimated using a multivariate animal model using a power spatial covariance structure for environmental effect. For all characteristics and the two breeds, the main source of variation was the year-month interaction that interacted with the centre. We observed a decrease over the years in the motility score after freezing. Age and frequency of collection had a significant effect on semen volume and number of spermatozoa for both breeds, and on concentration of spermatozoa for the Alpine breed. No effect of these factors was found on the characteristics observed after freezing. Heritabilities for concentration, number of spermatozoa, semen volume, motility score after freezing and percentage of motile spermatozoa after freezing per ejaculate were, respectively, 0.32, 0.15, 0.25, 0.12 and 0.05 for the Saanen breed and 0.34, 0.25, 0.29, 0.17 and 0.03 for the Alpine breed. Genetic correlations between volume and number of spermatozoa were 0.74 for the Alpine breed and 0.86 for the Saanen breed. Further study is required to compare the semen characteristics of young bucks with their mature production.

16.
A raster or grid-based Geographic Information System with data on tsetse, trypanosomiasis, animal production, agriculture and land use has recently been developed in Togo. The area-wide sampling of tsetse fly, aided by satellite imagery, is the subject of two separate papers. This paper follows on a first paper, published in this journal, describing the generation of digital tsetse distribution and abundance maps and how these accord with the local climatic and agro-ecological setting. Such maps when combined with data on the disease, the hosts and their owners, should contribute to the knowledge of the spatial epidemiology of trypanosomiasis and assist planning of integrated control operations. Here we address the problem of generating tsetse distribution and abundance maps from remotely sensed data, using a restricted amount of field data. Different discriminant analysis models have been applied using contemporary tsetse data and remotely sensed, low resolution data acquired from the National Oceanographic and Atmospheric Administration (NOAA) and Meteosat platforms. The results confirm the potential of satellite data application and multivariate analysis for the prediction of the tsetse distribution and abundance. This opens up new avenues because satellite predictions and field data may be combined to strengthen and/or substitute one another. The analysis shows how the strategic incorporation of satellite imagery may minimize field collection of data. Field surveys may be modified and conducted in two stages, first concentrating on the expected fly distribution limits and thereafter on fly abundance. The study also shows that when applying satellite data, care should be taken in selecting the optimal number of predictor variables because this number varies with the amount of training data for predicting abundance and on the homogeneity of the distribution limits for predicting fly presence. 
Finally, it is suggested that in addition to the use of contemporary training data and predictor variables, training and predicted data sets should refer to the same eco-geographic zone.

17.
A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier by using a training set containing labeled samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve the diagnosis and treatment selection practices for several diseases. This procedure is complicated, however, by the high dimensionality of the data. While microarrays can measure the levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well in these situations where the number of features (gene expression levels measured in these microarrays) far exceeds the number of samples. Selecting only the features that are most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. In this work we developed a novel method for multivariate feature selection based on the Partial Least Squares algorithm. We compared the method's variants with common feature selection techniques across a large number of real case-control datasets, using several classifiers. We demonstrate the advantages of the method and the preferable combinations of classifier and feature selection technique.

18.
Hong Zhang, Zheyang Wu. Biometrics 2023, 79(2): 1159-1172
Combining dependent tests of significance has broad applications but the related p-value calculation is challenging. For Fisher's combination test, current p-value calculation methods (eg, Brown's approximation) tend to inflate the type I error rate when the desired significance level is substantially less than 0.05. The problem could lead to significant false discoveries in big data analyses. This paper provides two main contributions. First, it presents a general family of Fisher type statistics, referred to as the GFisher, which covers many classic statistics, such as Fisher's combination, Good's statistic, Lancaster's statistic, weighted Z-score combination, and so forth. The GFisher allows a flexible weighting scheme, as well as an omnibus procedure that automatically adapts proper weights and the statistic-defining parameters to the given data. Second, the paper presents several new p-value calculation methods based on two novel ideas: moment-ratio matching and joint-distribution surrogating. Systematic simulations show that the new calculation methods are more accurate under multivariate Gaussian, and more robust under the generalized linear model and the multivariate t-distribution. The applications of the GFisher and the new p-value calculation methods are demonstrated by a gene-based single nucleotide polymorphism (SNP)-set association study. Relevant computation has been implemented in the R package GFisher, available on the Comprehensive R Archive Network.
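
For orientation, the classical Fisher combination test that this abstract generalizes can be sketched as follows. This is only the textbook version, valid when the input p-values are independent; it does not handle the dependent case the paper addresses, and it is not the GFisher package's method. The function names are hypothetical.

```python
from math import exp, log


def fisher_statistic(p_values):
    """Fisher's combination statistic T = -2 * sum(log p_i)."""
    return -2.0 * sum(log(p) for p in p_values)


def chi2_sf_even_df(t, n_tests):
    """Survival function of a chi-square variable with 2 * n_tests degrees of freedom.

    For even degrees of freedom there is a closed form:
    P(X > t) = exp(-t/2) * sum_{k=0}^{n_tests-1} (t/2)^k / k!
    """
    half = t / 2.0
    term, total = 1.0, 0.0
    for k in range(n_tests):
        total += term
        term *= half / (k + 1)
    return exp(-half) * total


def fisher_combined_p(p_values):
    """Combined p-value, ASSUMING the input p-values are independent."""
    return chi2_sf_even_df(fisher_statistic(p_values), len(p_values))


print(fisher_combined_p([0.1, 0.1]))
```

When the tests are correlated, the chi-square reference distribution used above no longer holds, which is the situation the abstract's calculation methods are designed for.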

19.
The Flooding Pampa natural grassland has an intricate pattern of plant communities, related to small topographic differences that determine important changes in soil characteristics. Despite limitations imposed by soil properties and periodic waterlogging, opportunistic tilling is carried out to plant pastures. There is little information on how pasture planting may affect the structure of the grassland communities. In order to document changes caused by cultural activities on structural and functional characteristics of plant communities in this landscape, we made field surveys in grasslands and very old pastures (grassland communities recovered through secondary succession) using transects located across existing topographic gradients. The patchy structure of this landscape was revealed by the multivariate analysis, by means of which four plant communities could be identified in the natural grassland. Species composition of these communities differed from that of the corresponding old pastures. They lost an important number of exclusive species, but also gained species: some new to the landscape and many already present in other environments. Pasture planting reduced the rate of species replacements along the gradient and produced changes in patchiness, but had no effect on the species–area curve at the landscape scale. Neither did we find differences in total number of species, average number of species/site and proportion of functional types. The new grassland created by opportunistic pasture planting has developed into a structural gradient in which important differences occurred in the lower, waterlogging-prone stands, whereas the sites of the other communities experienced fewer structural changes.

20.
Feature Selection for Classification of SELDI-TOF-MS Proteomic Profiles   Cited by: 3 (self-citations: 0; citations by others: 3)
BACKGROUND: Proteomic peptide profiling is an emerging technology harbouring great expectations to enable early detection, enhance diagnosis and more clearly define prognosis of many diseases. Although previous research work has illustrated the ability of proteomic data to discriminate between cases and controls, significantly less attention has been paid to the analysis of feature selection strategies that enable learning of such predictive models. Feature selection, in addition to classification, plays an important role in successful identification of proteomic biomarker panels. METHODS: We present a new, efficient, multivariate feature selection strategy that extracts useful feature panels directly from the high-throughput spectra. The strategy takes advantage of the characteristics of surface-enhanced laser desorption/ionisation time-of-flight mass spectrometry (SELDI-TOF-MS) profiles and enhances widely used univariate feature selection strategies with a heuristic based on multivariate de-correlation filtering. We analyse and compare two versions of the method: one in which all feature pairs must adhere to a maximum allowed correlation (MAC) threshold, and another in which the feature panel is built greedily by deciding among best univariate features at different MAC levels. RESULTS: The analysis and comparison of feature selection strategies was carried out experimentally on the pancreatic cancer dataset with 57 cancers and 59 controls from the University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania, USA. The analysis was conducted in both the whole-profile and peak-only modes. The results clearly show the benefit of the new strategy over univariate feature selection methods in terms of improved classification performance. CONCLUSION: Understanding the characteristics of the spectra allows us to better assess the relative importance of potential features in the diagnosis of cancer. 
Incorporation of these characteristics into feature selection strategies often leads to a more efficient data analysis as well as improved classification performance.
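
The first variant described in the abstract, in which every pair of selected features must respect a maximum allowed correlation (MAC) threshold, might look roughly like the sketch below. The abstract gives no implementation details, so the greedy ordering by univariate score, the use of Pearson correlation, the function names and the toy peak data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def select_decorrelated(features, scores, mac, panel_size):
    """Greedy panel selection under a maximum allowed correlation (hypothetical helper).

    features: dict mapping feature name -> list of values across samples
    scores:   dict mapping feature name -> univariate score (higher is better)
    """
    panel = []
    for name in sorted(scores, key=scores.get, reverse=True):
        # Accept a feature only if it is weakly correlated with everything already kept.
        if all(abs(pearson(features[name], features[kept])) <= mac for kept in panel):
            panel.append(name)
        if len(panel) == panel_size:
            break
    return panel


# Made-up toy data: peak_b is nearly a copy of peak_a, peak_c is only weakly related.
features = {
    "peak_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "peak_b": [1.1, 2.0, 3.2, 3.9, 5.1],
    "peak_c": [2.0, 1.0, 4.0, 1.0, 3.0],
}
scores = {"peak_a": 0.9, "peak_b": 0.8, "peak_c": 0.5}
panel = select_decorrelated(features, scores, mac=0.6, panel_size=2)
print(panel)
```

In this toy run the second-best univariate feature is skipped because it is almost redundant with the best one, so a lower-scoring but less correlated feature enters the panel instead.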
