Similar articles
Found 20 similar articles; search took 15 ms.
1.
Identification of protein coding regions is fundamentally a statistical pattern recognition problem. Discriminant analysis is a statistical technique for classifying a set of observations into predefined classes, and it is well suited to such problems. Outliers are present in virtually every data set in any application domain, and classical discriminant analysis methods, including linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA), do not work well when the data set contains outliers. To overcome this difficulty, a robust statistical method is used in this paper. We choose four different coding characteristics as discriminant variables, and robust discriminant analysis gives a satisfactory result.
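To illustrate why outliers hurt classical discriminant rules, here is a minimal sketch. It is not the paper's method: it assumes a single discriminant variable, equal priors, and a nearest-center rule (the one-dimensional, equal-variance special case of LDA), and it swaps the mean for the median as a simple robust estimate of the class center. The function name and all scores are hypothetical.

```python
from statistics import mean, median


def nearest_center_classifier(noncoding, coding, center):
    """Build a 1-D rule that assigns x to the class with the nearer center.

    `center` is the location estimator: `mean` gives the classical rule,
    `median` gives a simple outlier-resistant variant (an assumption here,
    not the estimator used in the paper).
    """
    c_non, c_cod = center(noncoding), center(coding)

    def classify(x):
        return "noncoding" if abs(x - c_non) <= abs(x - c_cod) else "coding"

    return classify


# Made-up scores for illustration only; 10.0 is a planted outlier.
noncoding_scores = [1.0, 1.2, 0.9, 1.1, 10.0]
coding_scores = [3.0, 3.2, 2.9, 3.1, 3.0]

classical = nearest_center_classifier(noncoding_scores, coding_scores, mean)
robust = nearest_center_classifier(noncoding_scores, coding_scores, median)

# The outlier drags the noncoding mean toward the coding class, so the two
# rules disagree on a score that lies much closer to the bulk of the coding data.
print(classical(2.5), robust(2.5))
```

A full robust discriminant analysis would also need robust covariance estimates for several variables; this sketch only shows the effect on the class centers.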

2.
3.
Quantitative trait nucleotide analysis using Bayesian model selection   Total citations: 4, self-citations: 0, citations by others: 4
Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.
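The idea of a per-variant posterior probability of effect obtained by Bayesian model averaging can be sketched as follows. This is a hedged illustration, not SOLAR's implementation: it assumes equal prior probability for every candidate model and approximates posterior model probabilities from BIC values. The function name, variant names, and BIC numbers are hypothetical.

```python
import math


def posterior_inclusion(bic_by_model):
    """Approximate each variant's posterior probability of effect.

    `bic_by_model` maps a frozenset of variant names (the variants included
    in a model) to that model's BIC. Assuming equal model priors, the
    posterior weight of a model is proportional to exp(-BIC / 2).
    """
    best = min(bic_by_model.values())
    # Subtract the best BIC before exponentiating for numerical stability.
    weights = {m: math.exp(-0.5 * (b - best)) for m, b in bic_by_model.items()}
    total = sum(weights.values())
    variants = set().union(*bic_by_model)
    # A variant's probability of effect is the total weight of models containing it.
    return {
        v: sum(w for m, w in weights.items() if v in m) / total
        for v in sorted(variants)
    }


# Made-up BIC values for four candidate models over two variants.
models = {
    frozenset(): 110.0,
    frozenset({"snp1"}): 100.0,
    frozenset({"snp2"}): 109.0,
    frozenset({"snp1", "snp2"}): 102.0,
}
pip = posterior_inclusion(models)
print(pip)
```

With these invented numbers, models containing snp1 carry most of the posterior weight, so snp1 would be ranked ahead of snp2 for follow-up experiments.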

4.
The paper describes the dominating role of surface tension (ST) in modeling, monitoring, and estimating pKa for a large series of 43 substituted sulfonamides. Because of the direct correlation of ST with parachor (Pc) vis-a-vis molecular volume (MV), ST is considered a steric parameter. Single as well as multi-parametric regressions indicate that ST has a dominating role in the QSAR of the set of sulfonamides used and that excellent results are obtained in multi-parametric regression analysis. The results are discussed critically on the basis of statistical parameters.

5.
6.
Analysis by GC and GC/MS of the essential oil obtained from Malaysian Curcuma mangga Val. & Zijp (Zingiberaceae) rhizomes allowed the identification of 97 constituents, comprising 89.5% of the total oil composition. The major compounds were identified as myrcene (1; 46.5%) and β-pinene (2; 14.6%). The chemical composition of this oil and of 13 additional oils obtained from selected Curcuma L. taxa was compared using multivariate statistical analyses (agglomerative hierarchical cluster analysis and principal component analysis). The results of the statistical analyses of this particular data set indicated that 1 could potentially be used as a valuable infrageneric chemotaxonomical marker for C. mangga. Moreover, it seems that C. mangga, C. xanthorrhiza Roxb., and C. longa L. are closely related with respect to their volatile secondary metabolites. In addition, comparison of the essential oil profiles revealed a potential influence of environmental (geographical) factors, alongside genetic ones, on the production of volatile secondary metabolites in Curcuma taxa.

7.
8.
Statistical analysis of the distribution of the mean somatometric, functional, and psychophysiological parameters in the total sample of subjects with the use of the χ² and λ tests and low, medium, and high habitual physical activity (HPA) levels (LHPA, MHPA, and HHPA, respectively) at different ontogenetic stages (junior and senior school students and young adults of both sexes) showed wide quantitative and qualitative ranges of psychophysiological individuality in a healthy population and demonstrated that it is reasonable to distinguish three typological groups or functional constitutional types (FT-1 corresponding to LHPA; FT-2, to MHPA; and FT-3, to HHPA). Typical first- and second-order parameters, as well as the results of third-order tests characterizing the current state, were determined for each FT. To comprehensively assess the constitutional type (synthetic constitution) of the subjects with low, medium, and high HPAs, an integrated analysis of the set of their characteristics was performed using the principles of polythetic (multivariate) classification. The results obtained using multivariate statistical methods confirmed the basic postulate of the concept of typological variability of physiological individuality: a healthy human population is qualitatively heterogeneous in morphological, functional, and psychophysiological traits. The integrated physiological and statistical analyses of the results provided a scientific basis for three functional constitutional types (FT-1, FT-2, and FT-3) corresponding to three synthetic constitutional types (C 0-1, C 00, and C 01). These data indicate that the systemic (constitutional) approach to the estimation of individual typological characteristics confirms the high informativeness of partial constitution (FT-1, FT-2, and FT-3) in the human biological organization, and that the set of characters selected for analysis allows the synthetic constitutional types to be adequately differentiated on a formal basis.

9.
10.
ABSTRACT Most ecologists use statistical methods as their main analytical tools when analyzing data to identify relationships between a response and a set of predictors; thus, they treat all analyses as hypothesis tests or exercises in parameter estimation. However, little or no prior knowledge about a system can lead to creation of a statistical model or models that do not accurately describe major sources of variation in the response variable. We suggest that under such circumstances data mining is more appropriate for analysis. In this paper we 1) present the distinctions between data-mining (usually exploratory) analyses and parametric statistical (confirmatory) analyses, 2) illustrate 3 strengths of data-mining tools for generating hypotheses from data, and 3) suggest useful ways in which data mining and statistical analyses can be integrated into a thorough analysis of data to facilitate rapid creation of accurate models and to guide further research.

11.
Many factors have been hypothesized to affect the human secondary sex ratio (the annual percentage of males among all live births), among them race, parental ages, and birth order. Some authors have even proposed warfare as a factor influencing live birth sex ratios. The hypothesis that during and shortly after periods of war the human secondary sex ratio is higher has received little statistical treatment. In this paper we evaluate the war hypothesis using 3 statistical methods: linear regression, randomization, and time-series analysis. Live birth data from 10 different countries were included. Although we cannot speak of a general phenomenon, statistical evidence for an association between warfare and live birth sex ratio was found for several countries. Regression and randomization test results were in agreement. Time-series analysis showed that most human sex-ratio time series can be described by a common model. The results obtained using intervention models differed somewhat from results obtained by regression methods.
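A randomization test of the kind mentioned in this abstract can be sketched as a one-sided permutation test on yearly sex ratios. This is an illustrative assumption about the procedure, not the authors' exact design; the function name is hypothetical and the yearly values are invented, not taken from the study.

```python
import random


def permutation_test(war, peace, n_perm=10000, seed=0):
    """One-sided permutation test for a higher mean sex ratio in war years.

    Returns the observed difference in means (war minus peace) and the
    p-value: the share of random relabelings whose difference is at least
    as large as the observed one, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = sum(war) / len(war) - sum(peace) / len(peace)
    pooled = list(war) + list(peace)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_war = pooled[: len(war)]
        perm_peace = pooled[len(war):]
        diff = sum(perm_war) / len(perm_war) - sum(perm_peace) / len(perm_peace)
        if diff >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Invented proportions of male live births per year, for illustration only.
war_years = [0.517, 0.518, 0.516, 0.519]
peace_years = [0.512, 0.513, 0.511, 0.514, 0.512, 0.513]

observed_diff, p_value = permutation_test(war_years, peace_years)
print(observed_diff, p_value)
```

Shuffling the pooled years breaks any link between war status and sex ratio, so the permuted differences approximate the null distribution without assuming normality. Real yearly series are autocorrelated, which is one reason a time-series analysis may give somewhat different answers.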

12.
OBJECTIVE: To present a set of novel computerized analysis algorithms to construct a computer-aided cytologic diagnosis (CACD) system to differentiate lung cancer biomarkers and identify cancer cells in tissue-based specimen images. STUDY DESIGN: Molecular methods, including application of cancer-specific markers, may prove to be complementary to cytology diagnosis, especially when they are combined with a CACD system for biomarker assessment. We trained a novel CACD system to recognize expression of the cancer biomarker histone H2AX in lung cancer cells and then tested the accuracy of this system in distinguishing resected lung cancer from preneoplastic and normal tissues. The major characteristic of the CACD algorithms is that they adapt detection parameters to cellular image contents. Our newly developed wavelet transform is able to adaptively select different resolution and orientation features based on image content requirements. RESULTS: Visual, statistical, and quantitative results are presented in this paper as an evaluation of CACD performance. CONCLUSION: The presented algorithms and CACD system for cellular feature enhancement, segmentation, and classification are very important in distinguishing benign and malignant lesions.

13.
Most conventional human health and function evaluation methods are based on the traditional notion that all population characteristics follow the Gaussian distribution law, with the parameters M and s forming the basis of the concept of the norm. However, some known facts contradict this idea, which makes it necessary to check the statistical homogeneity of population characteristics. Analysis of the statistical distribution and central tendencies of simple measured indices in population and somatotype samples supported the idea of natural population distinctions across a broad set of morpho-functional features (by means of 23-D matrix cluster analysis for different indices) and provided scientific grounds for using a constitutional approach in human sciences and in physical education as well. The Gaussian distribution law was found to hold within somatotype groups, which permits the use of its parameters for norm evaluation. In practice, the relative girth body dimensions (normalized by body height) proved to be preferable for somatotype determination.

14.
Two statistical tests for meiotic breakpoint analysis.   Total citations: 2, self-citations: 0, citations by others: 2
Meiotic breakpoint analysis (BPA), a statistical method for ordering genetic markers, is increasing in importance as a method for building genetic maps of human chromosomes. Although BPA does not provide estimates of genetic distances between markers, it efficiently locates new markers on already defined dense maps, when likelihood analysis becomes cumbersome or the sample size is small. However, until now no assessments of statistical significance have been available for evaluating the possibility that the results of a BPA were produced by chance. In this paper, we propose two statistical tests to determine whether the size of a sample and its genetic information content are sufficient to distinguish between "no linkage" and "linkage" of a marker mapped by BPA to a certain region. Both tests are exact and should be conducted after a BPA has assigned the marker to an interval on the map. Applications of the new tests are demonstrated by three examples: (1) a synthetic data set, (2) a data set of five markers on human chromosome 8p, and (3) a data set of four markers on human chromosome 17q.

15.
Two multivariate statistical methods, factor analysis (FA) and hierarchical cluster analysis (HCA), were applied to an experimental data set to evaluate their usefulness in selecting an adequate expression system and optimal growth parameters for recombinant cyprosin B production. Using FA, the large data set was reduced to two factors representing 73.4% of the variability. Factor 1, with 53.5% of the variability, corresponds to recombinant cyprosin B expression and efficient secretion, while factor 2, accounting for 19.9% of the variability, represents cell growth and physiological characteristics. FA and HCA allowed correlations among different variables to be established, and the clusters obtained provided clear identification of the experimental parameters related to cyprosin B production, which results in more accurate scientific output and saves time when selecting an adequate expression system.

16.
Shirota M, Ishida T, Kinoshita K. Proteins. 2011, 79(5): 1550-1563
In protein structure prediction, it is crucial to evaluate the degree of native-likeness of given model structures. Statistical potentials extracted from protein structure data sets are widely used for such quality assessment problems, but they are only applicable for comparing different models of the same protein. Although various other methods, such as machine learning approaches, were developed to predict the absolute similarity of model structures to the native ones, they required a set of decoy structures in addition to the model structures. In this paper, we tried to reformulate the statistical potentials as absolute quality scores, without using the information from decoy structures. For this purpose, we regarded the native state and the reference state, which are necessary components of statistical potentials, as the good and bad standard states, respectively, and first showed that the statistical potentials can be regarded as the state functions, which relate a model structure to the native and reference states. Then, we proposed a standardized measure of protein structure, called native-likeness, by interpolating the score of a model structure between the native and reference state scores defined for each protein. The native-likeness correlated with the similarity to the native structures and discriminated the native structures from the models, with better accuracy than the raw score. Our results show that statistical potentials can quantify the native-like properties of protein structures, if they fully utilize the statistical information obtained from the data set.
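The interpolation behind the native-likeness measure can be sketched as below. The abstract does not give the formula, so this assumes a simple linear interpolation in which a lower score is more native-like; the function name, argument names, and scores are hypothetical.

```python
def native_likeness(model_score, native_score, reference_score):
    """Place a model's score on a scale from reference state (0) to native state (1).

    Assumes linear interpolation between the two standard states; the paper
    may define the measure differently.
    """
    span = reference_score - native_score
    if span == 0:
        raise ValueError("native and reference scores must differ")
    return (reference_score - model_score) / span


# Made-up statistical-potential scores for one protein (lower is better).
native = -120.0
reference = -20.0

print(native_likeness(-70.0, native, reference))
```

Because the score is rescaled per protein by its own native and reference values, models of different proteins land on a common scale, which is what makes an absolute quality score possible without decoy structures.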

17.
High-throughput identification of peptides in databases from tandem mass spectrometry data is a key technique in modern proteomics. Common approaches to interpreting large-scale peptide identification results are based on the statistical analysis of average score distributions, which are constructed from the set of best scores produced by large collections of MS/MS spectra by using search engines such as SEQUEST. Other approaches calculate individual peptide identification probabilities on the basis of theoretical models or from single-spectrum score distributions constructed from the set of scores produced by each MS/MS spectrum. In this work, we study the mathematical properties of average SEQUEST score distributions by introducing the concept of spectrum quality and expressing these average distributions as compositions of single-spectrum distributions. We predict and demonstrate in practice that average score distributions are dominated by the quality distribution in the spectra collection, except in the low-probability region, where it is possible to predict the dependence of average probability on database size. Our analysis leads to a novel indicator, the probability ratio, which takes optimally into account the statistical information provided by the first and second best scores. The probability ratio is a non-parametric and robust indicator that makes spectra classification according to parameters such as charge state unnecessary and allows a peptide identification performance, on the basis of false discovery rates, that is better than that obtained by other empirical statistical approaches. The probability ratio also compares favorably with statistical probability indicators obtained by the construction of single-spectrum SEQUEST score distributions. These results make the robustness, conceptual simplicity, and ease of automation of the probability ratio algorithm a very attractive alternative for determining peptide identification confidences and error rates in high-throughput experiments.

18.
19.
Cluster analysis has proven to be a valuable statistical method for analyzing whole genome expression data. Although clustering methods have great utility, they represent a lower-level statistical analysis that is not directly tied to a specific model. To extend such methods and to allow for more sophisticated lines of inference, we use cluster analysis in conjunction with a specific model of gene expression dynamics. This model provides phenomenological dynamic parameters on both linear and nonlinear responses of the system. This analysis determines the parameters of two different transition matrices (linear and nonlinear) that describe the influence of one gene expression level on another. Using yeast cell cycle microarray data as a test set, we calculated the transition matrices and used these dynamic parameters as a metric for cluster analysis. Hierarchical cluster analysis of this transition matrix reveals how a set of genes influences the expression of other genes activated during different cell cycle phases. Most strikingly, genes in different stages of the cell cycle preferentially activate or inactivate genes in other stages of the cell cycle, and this relationship can be readily visualized in a two-way clustering image. This observation is obtained without using any prior knowledge of the chronological characteristics of the cell cycle process. This method shows the utility of using model parameters as a metric in cluster analysis.

20.