Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
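QTL interval testing methods based on trait-marker regression   Cited by: 5 (self-citations: 1, citations by others: 4)
Wu Weiren (吴为人)  Li Weiming (李维明) 《遗传》2001,23(2):143-146
This paper proposes two QTL interval testing methods based on trait-marker regression, called TMRIT-I and TMRIT-II. The former uses a likelihood ratio statistic for significance testing and is equivalent to the least-squares-based simplified composite interval mapping method (sCIM), but is noticeably simpler and faster to compute. The latter uses a "pseudo likelihood ratio" statistic for significance testing, which both simplifies the computation further and markedly improves statistical power. For both methods, the significance threshold can be estimated by permutation tests. A simulated example is given.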
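
The permutation-test step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the TMRIT statistics, so the squared trait-marker correlation below is only a stand-in, and the names `squared_correlation` and `permutation_threshold` are hypothetical. The idea shown is the generic one: shuffle the trait values against the fixed marker genotypes, record the largest statistic over all markers in each shuffle, and take an upper quantile of those maxima as the genome-wide threshold.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation; a stand-in statistic, NOT the TMRIT statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def permutation_threshold(trait, markers, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the per-permutation maximum statistic."""
    rng = random.Random(seed)
    shuffled = list(trait)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real trait-marker association
        maxima.append(max(squared_correlation(m, shuffled) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]


# Hypothetical data: 5 markers coded 0/1 for 30 individuals, plus a random trait.
rng = random.Random(1)
markers = [[rng.choice([0, 1]) for _ in range(30)] for _ in range(5)]
trait = [rng.gauss(0, 1) for _ in range(30)]
threshold = permutation_threshold(trait, markers, n_perm=100)
```

An observed statistic larger than `threshold` would be declared significant at the chosen level. Taking the maximum over markers is what accounts for testing many positions at once.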

2.
It is shown that a recently published least squares method for the estimation of the average center of rotation is biased. Consequently, a correction term is proposed, and an iterative algorithm is derived for finding a bias compensated solution to the least squares problem. The accuracy of the proposed bias compensated least squares method is compared to the previously proposed least squares method by Monte-Carlo simulations. The tests show that the new method gives a substantial improvement in accuracy.

3.
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson’s statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran’s index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 regions of China. The results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.
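
The order dependence described in this abstract can be seen in a small sketch. The Durbin-Watson formula below is the standard one. The function `moran_type_index` is only an assumed reading of "a standardized residual vector and a normalized spatial weight matrix" (here z'Wz, with z standardized by the population standard deviation and W scaled to sum to 1); the paper's exact definition may differ. The residuals and the weight matrix are invented for illustration.

```python
def durbin_watson(residuals):
    """Standard Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def moran_type_index(residuals, weights):
    """Assumed form z'Wz: z is the standardized residual vector and W is the
    spatial weight matrix rescaled so that its entries sum to 1."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = (sum((e - mean) ** 2 for e in residuals) / n) ** 0.5
    z = [(e - mean) / sd for e in residuals]
    total = sum(sum(row) for row in weights)
    return sum(weights[i][j] * z[i] * z[j]
               for i in range(n) for j in range(n)) / total


# Hypothetical residuals at four sites and a chain-shaped contiguity matrix.
residuals = [1.0, -1.0, 2.0, -2.0]
weights = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]

# List the same four sites in a different order (swap the two middle sites).
order = [0, 2, 1, 3]
residuals_reordered = [residuals[i] for i in order]
weights_reordered = [[weights[i][j] for j in order] for i in order]

dw_original = durbin_watson(residuals)             # 2.9
dw_reordered = durbin_watson(residuals_reordered)  # 1.1
index_original = moran_type_index(residuals, weights)
index_reordered = moran_type_index(residuals_reordered, weights_reordered)
```

The two Durbin-Watson values differ although the data are the same, while the weight-matrix-based index is unchanged because the spatial relationships travel with the sites when they are relabeled.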

4.
5.
S. Datta  M. Kiparsky  D. M. Rand    J. Arnold 《Genetics》1996,144(4):1985-1992
In this paper we use cytonuclear disequilibria to test the neutrality of mtDNA markers. The data considered here involve sample frequencies of cytonuclear genotypes subject to both statistical sampling variation and genetic sampling variation. First, we obtain the dynamics of the sample cytonuclear disequilibria assuming random drift alone as the source of genetic sampling variation. Next, we develop a test statistic using cytonuclear disequilibria via the theory of generalized least squares to test the random drift model. The null distribution of the test statistic is shown to be approximately chi-squared using an asymptotic argument as well as computer simulation. Power of the test statistic is investigated under an alternative model with drift and selection. The method is illustrated using data from cage experiments utilizing different cytonuclear genotypes of Drosophila melanogaster. A program for implementing the neutrality test is available upon request.

6.
We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLS) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.

7.
It is very common in regression analysis to encounter incompletely observed covariate information. A recent approach to analyse such data is weighted estimating equations (Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994), JASA, 89, 846-866, and Zhao, L. P., Lipsitz, S. R. and Lew, D. (1996), Biometrics, 52, 1165-1182). With weighted estimating equations, the contribution to the estimating equation from a complete observation is weighted by the inverse of the probability of being observed. We propose a test statistic to assess if the weighted estimating equations produce biased estimates. Our test statistic is similar to the test statistic proposed by DuMouchel and Duncan (1983) for weighted least squares estimates for sample survey data. The method is illustrated using data from a randomized clinical trial on chemotherapy for multiple myeloma.
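
The weighting idea in this abstract can be sketched for the simplest case, a straight-line regression. This is an illustration under stated assumptions, not the authors' estimator or their test statistic: the observation probabilities are taken as known, the model is a simple linear one, and the name `ipw_linear_fit` is hypothetical. Each complete case is weighted by the inverse of its probability of being observed, and incomplete cases contribute nothing.

```python
def ipw_linear_fit(x, y, observed, prob_observed):
    """Weighted least squares for y = a + b*x, where each complete case gets
    weight 1 / P(observed) and incomplete cases are dropped (weight 0)."""
    cases = [(xi, yi, 1.0 / p)
             for xi, yi, r, p in zip(x, y, observed, prob_observed) if r]
    total_weight = sum(w for _, _, w in cases)
    mean_x = sum(w * xi for xi, _, w in cases) / total_weight
    mean_y = sum(w * yi for _, yi, w in cases) / total_weight
    sxx = sum(w * (xi - mean_x) ** 2 for xi, _, w in cases)
    sxy = sum(w * (xi - mean_x) * (yi - mean_y) for xi, yi, w in cases)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: the covariate of the third subject is missing (None).
x = [0.0, 1.0, None, 3.0, 4.0]
y = [2.0, 5.0, 8.0, 11.0, 14.0]
observed = [True, True, False, True, True]
prob_observed = [0.9, 0.5, 0.5, 0.8, 0.6]  # assumed known for this sketch

intercept, slope = ipw_linear_fit(x, y, observed, prob_observed)
```

In practice the observation probabilities are usually estimated, for example from a model for the missingness indicator, which is outside the scope of this sketch.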

8.
Ghosh D 《Biometrics》2003,59(4):992-1000
Due to the advent of high-throughput microarray technology, it has become possible to develop molecular classification systems for various types of cancer. In this article, we propose a methodology using regularized regression models for the classification of tumors in microarray experiments. The performances of principal components, partial least squares, and ridge regression models are studied; these regression procedures are adapted to the classification setting using the optimal scoring algorithm. We also develop a procedure for ranking genes based on the fitted regression models. The proposed methodologies are applied to two microarray studies in cancer.

9.
The Buckley–James (BJ) model is a typical semiparametric accelerated failure time model that is closely related to the ordinary least squares method and easy to construct. However, the traditional BJ model, built on a linearity assumption, captures only simple linear relationships and has difficulty with nonlinear problems. To overcome this difficulty, in this paper we develop a novel regression model for right-censored survival data within the learning framework of the BJ model, based on random survival forests (RSF), extreme learning machine (ELM), and the L2 boosting algorithm. The proposed method, referred to as the ELM-based BJ boosting model, first employs RSF for covariate imputation, then develops a new ensemble of ELMs (an ELM-based boosting algorithm for regression) using the ensemble scheme of L2 boosting, and finally uses the output function of the proposed ELM-based boosting model to replace the linear combination of covariates in the BJ model. Because it fits the logarithm of survival time to the covariates with the nonparametric ELM-based boosting method instead of the least squares method, the ELM-based BJ boosting model can capture both linear and nonlinear covariate effects. In both simulation studies and real data applications, in terms of concordance index and integrated Brier score, the proposed ELM-based BJ boosting model can outperform the traditional BJ model, two kinds of BJ boosting models proposed by Wang et al., RSF, and the Cox proportional hazards model.

10.
S Eguchi  M Matsuura 《Biometrics》1990,46(2):415-426
A new method of testing the Hardy-Weinberg equilibrium in the human leukocyte antigen (HLA) system is proposed and applied to real data. The derivation is based on the maximum likelihood method and closely related to standard regression theory. The test statistic has a closed representation of residual sum of squares by a projection mapping of data onto the estimated regression plane. Under the Hardy-Weinberg law the noniterative estimates for the gene frequencies are suggested by the use of the projection mapping. The test statistic and gene frequency estimates are shown to be asymptotically equivalent to the maximum likelihood method and to be more efficient than the other suggested test statistic when there are more than two identified alleles.

11.
A multi-model recursive least squares algorithm is proposed for non-uniformly sampled-data nonlinear (NUSDN) systems. The corresponding state space model of an NUSDN system is derived using the lifting technique. Using the fuzzy c-means clustering algorithm, the NUSDN system is divided into several local models. The basic idea is that the NUSDN system is viewed as a model switching system under a given rule. Once the local models are identified, the global model is determined. A pH neutralization process validates the performance of the proposed algorithm.
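
For orientation, the textbook recursive least squares update that such a method builds on can be sketched as follows. This covers a single linear model only; the lifting step, the fuzzy c-means partition, and the switching rule of the abstract are not reproduced here. The function name `rls_update`, the forgetting factor argument, and the toy data are assumptions made for illustration.

```python
def rls_update(theta, P, phi, y, forgetting=1.0):
    """One recursive least squares step for the model y = phi . theta.

    theta: current parameter estimate (list of floats)
    P:     current symmetric covariance-like matrix (list of lists)
    phi:   regressor vector for the new sample
    y:     new measurement
    """
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [value / denom for value in P_phi]
    error = y - sum(phi[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so the row vector phi'P has the same entries as P_phi.
    new_P = [[(P[i][j] - gain[i] * P_phi[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return new_theta, new_P


# Toy example: recover slope 2 and offset 1 from noise-free samples y = 2u + 1.
theta = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]  # large initial P means little trust in theta
for u in range(10):
    theta, P = rls_update(theta, P, [float(u), 1.0], 2.0 * u + 1.0)
```

In a multi-model scheme of the kind the abstract describes, one would presumably keep a separate `theta` and `P` per local model and route each sample to a model according to the clustering result, but that routing is not specified in the abstract.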

12.
Summary: This article develops a latent model and likelihood-based inference to detect temporal clustering of events. The model mimics typical processes generating the observed data. We apply model selection techniques to determine the number of clusters, and develop likelihood inference and a Monte Carlo expectation–maximization algorithm to estimate model parameters, detect clusters, and identify cluster locations. Our method differs from the classical scan statistic in that we can simultaneously detect multiple clusters of varying sizes. We illustrate the methodology with two real data applications and evaluate its efficiency through simulation studies. For the typical data-generating process, our methodology is more efficient than a competing procedure that relies on least squares.

13.
Daniel R. Kowal  Bohan Wu 《Biometrics》2023,79(2):1520-1533
“For how many days during the past 30 days was your mental health not good?” The responses to this question measure self-reported mental health and can be linked to important covariates in the National Health and Nutrition Examination Survey (NHANES). However, these count variables present major distributional challenges: the data are overdispersed, zero-inflated, bounded by 30, and heaped in 5- and 7-day increments. To address these challenges, which are especially common for health questionnaire data, we design a semiparametric estimation and inference framework for count data regression. The data-generating process is defined by simultaneously transforming and rounding (STAR) a latent Gaussian regression model. The transformation is estimated nonparametrically and the rounding operator ensures the correct support for the discrete and bounded data. Maximum likelihood estimators are computed using an expectation-maximization (EM) algorithm that is compatible with any continuous data model estimable by least squares. STAR regression includes asymptotic hypothesis testing and confidence intervals, variable selection via information criteria, and customized diagnostics. Simulation studies validate the utility of this framework. Using STAR regression, we identify key factors associated with self-reported mental health and demonstrate substantial improvements in goodness-of-fit compared to existing count data regression models.

14.
15.
To achieve rapid detection of carbapenem-resistant Escherichia coli strains, a pattern recognition method based on electrospray ionization Orbitrap mass spectrometry (ESI-Orbitrap MS) was used to analyze the metabolites of drug-resistant and drug-sensitive strains. Results of five clustering methods applied to analytical data of metabolites were evaluated using iso-phenotypic coefficients. The effectiveness of three methods, principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA) and orthogonal partial least squares discriminant analysis (OPLS-DA), was compared. Univariate statistics such as t-test and fold change were also used to examine the screened differential information. Both PLS-DA and OPLS-DA could achieve rapid identification of strain classes, and OPLS-DA was more powerful in screening 96 significantly different ions. This work is expected to be useful for rapid and accurate identification of strains.

16.

Background  

Regularized regression methods such as principal component or partial least squares regression perform well in learning tasks on high dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows for an explicit feature elimination, but may not be optimally adapted to spectral data due to the topology of its constituent classification trees which are based on orthogonal splits in feature space.

17.
18.
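Hyperspectral estimation method for soil alkali-hydrolysable nitrogen content based on DWT-GA-PLS   Cited by: 1 (self-citations: 0, citations by others: 1)
Taking Qihe County, Shandong Province, as the study area, soil samples were collected in the field. After hyperspectral measurement of the soil samples and a first-derivative transformation, the discrete wavelet transform (DWT) was first used to denoise the soil spectra and reduce their dimensionality, a genetic algorithm (GA) was then used to select the variables entering the quantitative estimation model of soil alkali-hydrolysable nitrogen, and finally partial least squares (PLS) regression was used to build the estimation model of soil alkali-hydrolysable nitrogen content. The results showed that combining the discrete wavelet transform with the genetic algorithm and partial least squares (DWT-GA-PLS) for quantitative estimation of soil alkali-hydrolysable nitrogen content not only compressed the spectral variables and reduced the number of model variables, but also improved estimation accuracy. Compared with models using the full soil spectrum, models built from the low-frequency coefficients of wavelet decomposition levels 1-2 achieved more accurate or comparable predictions with far fewer variables. The PLS model built from the level-2 low-frequency wavelet coefficients with GA variable selection performed best, with a prediction R² of 0.85, an RMSE of 8.11 mg·kg⁻¹, and an RPD of 2.53. This indicates that DWT-GA-PLS is effective for hyperspectral quantitative estimation of soil alkali-hydrolysable nitrogen content.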

19.
A general Akaike-type criterion for model selection in robust regression   Cited by: 2 (self-citations: 0, citations by others: 2)
BURMAN  P.; NOLAN  D. 《Biometrika》1995,82(4):877-886
Akaike's procedure (1970) for selecting a model minimises an estimate of the expected squared error in predicting new, independent observations. This selection criterion was designed for models fitted by least squares. A different model-fitting technique, such as least absolute deviation regression, requires an appropriate model selection procedure. This paper presents a general Akaike-type criterion applicable to a wide variety of loss functions for model fitting. It requires only that the function be convex with a unique minimum, and twice differentiable in expectation. Simulations show that the estimators proposed here approximate their respective prediction errors well.

20.
Investigation of protein-ligand interactions obtained from experiments plays a crucial part in the design of new and effective drugs. Analyzing the data extracted from known interactions could help scientists to predict the binding affinities of promising ligands before conducting experiments. The objective of this study is to advance the CIFAP (compressed images for affinity prediction) method, which is relevant to a protein-ligand model, identifying 2D electrostatic potential images by separating the binding site of protein-ligand complexes and using the images for predicting the computational affinity information represented by pIC50 values. The CIFAP method has 2 phases, namely, data modeling and prediction. In the data modeling phase, the separated 3D structure of the binding pocket with the ligand inside is fitted into an electrostatic potential grid box, which is then compressed through 3 orthogonal directions into three 2D images for each protein-ligand complex. The sequential floating forward selection technique is used to acquire prediction patterns from the images. In the prediction phase, support vector regression (SVR) and partial least squares regression are used for testing the quality of the CIFAP method for predicting the binding affinity of 45 CHK1 inhibitors derived from 2-aminothiazole-4-carboxamide. The results show that the CIFAP method using both support vector regression and partial least squares regression is very effective for predicting the binding affinities of CHK1-ligand complexes with low error values and high correlation. As future work, the results could be improved by working on the pose of the ligands inside the grid.
