Similar Documents
Stocks of commercial fish are often modelled using sampling data of various types, of unknown precision, and from various sources assumed independent. We want each set to contribute to estimates of the parameters in relation to its precision and goodness of fit with the model. Iterative re-weighting of the sets is proposed for linear models, continuing until the weight of each set is proportional to (relative weighting) or equal to (absolute weighting) the inverse of the set-specific residual variances resulting from a generalised least squares fit. Formulae for the residual variances are put forward involving fractional allocation of degrees of freedom depending on the numbers of independent observations in each set, the numbers of sets contributing to the estimate of each parameter, and the number of weights estimated. To illustrate the procedure, numbers of the 1984 year-class of North Sea cod (a) landed commercially each year, and (b) caught per unit of trawling time by an annual groundfish survey are modelled as a function of age to estimate total mortality, Z, the relative catching power of the two fishing methods, and the relative precision of the two sets of observations as indices of stock abundance. The survey abundance indices were found to display residual variance about 29 times higher than that of the annual landings.
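The re-weighting loop can be sketched as follows. This is a minimal illustration, not the paper's exact scheme: it assumes a log-linear catch-curve model log N = a_set − Z·age with one intercept per data set, a shared slope, and weights iterated to the inverse of each set's residual variance.

```python
import numpy as np

def fit_two_sets(ages, logs, set_id, n_iter=20):
    """Jointly fit log-abundance against age for several data sets, with a
    shared slope (-Z) and one intercept per set (relative catching power).
    Each set is iteratively re-weighted by the inverse of its residual variance."""
    sets = list(np.unique(set_id))
    # design matrix: one indicator column per set, plus the shared age column
    X = np.column_stack([(set_id == s).astype(float) for s in sets] + [ages])
    w = np.ones(len(ages))                 # start from equal weights (OLS)
    var = {}
    for _ in range(n_iter):
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ logs)
        resid = logs - X @ beta
        # set-specific residual variances drive the next round of weights
        var = {s: float(np.mean(resid[set_id == s] ** 2)) for s in sets}
        w = np.array([1.0 / var[s] for s in set_id])
    return beta, var
```

The last element of `beta` is the common slope, i.e. −Z; the ratio of the two entries of `var` estimates the relative precision of the two abundance indices.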

3.
Variable selection is critical in competing risks regression with high-dimensional data. Although penalized variable selection methods and other machine learning-based approaches have been developed, many of them suffer from instability in practice. This paper proposes a novel method named Random Approximate Elastic Net (RAEN). Under the proportional subdistribution hazards model, RAEN provides a stable and generalizable solution to the large-p-small-n variable selection problem for competing risks data. The general framework allows the proposed algorithm to be applied to other time-to-event regression models, including competing risks quantile regression and accelerated failure time models. Extensive simulations show that the new, computationally intensive algorithm markedly improves variable selection and parameter estimation. A user-friendly R package, RAEN, is available for public use. We also apply the method to a cancer study to identify influential genes associated with death from or progression of bladder cancer.

4.
This study outlines two robust regression approaches, least median of squares (LMS) and iteratively re-weighted least squares (IRLS), and investigates their application in the instrumental analysis of nutraceuticals (fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (ΔF and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: ordinary least squares (OLS), LMS and IRLS. Linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully assessed for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and in all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition, particularly in the intercept. Under both linearity conditions, LOD and LOQ values after robust regression line fitting were lower than those obtained before data treatment. These results indicate that the linearity ranges for drug determination can be expanded to lower limits of quantitation by enhancing the regression equation parameters through data treatment. Assay results for lipoic acid in capsules, obtained by both fluorimetric methods with parametric OLS and with robust LMS and IRLS treatment, were compared for both linearity conditions.
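As a sketch of the IRLS idea for a calibration line: the Huber weight function and tuning constant c = 1.345 used below are common defaults, assumed here rather than taken from the paper.

```python
import numpy as np

def irls_line(x, y, c=1.345, n_iter=50):
    """Huber-weighted IRLS fit of y = b0 + b1*x.  Outlying points get
    weight c/|u| instead of 1, so they cannot dominate the fit."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        # robust scale estimate from the median absolute deviation
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-8)
        u = np.abs(r) / s
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta
```

Because the Huber loss is convex, the iterations converge to a unique fit; a single gross outlier that would badly tilt an OLS calibration line is effectively ignored.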

5.
A symmetric stepwise mutation model with reflecting boundaries is employed to evaluate microsatellite evolution under range constraints. Methods of estimating range constraints and mutation rates under the assumptions of the model are developed. Least squares procedures are employed to improve molecular distance estimation for use in phylogenetic reconstruction in the case where range constraints and mutation rates vary across loci. The bias and accuracy of these methods are evaluated using computer simulations, and they are compared to previously existing methods which do not assume range constraints. Range constraints are seen to have a substantial impact on phylogenetic conclusions based on molecular distances, particularly for more divergent taxa. Results indicate that if range constraints are in effect, the methods developed here should be used in both the preliminary planning and final analysis of phylogenetic studies employing microsatellites. It is also seen that in order to make accurate phylogenetic inferences under range constraints, a larger number of loci are required than in their absence.

7.
Phalanges are considered to be highly informative in the reconstruction of extinct primate locomotor behavior since these skeletal elements directly interact with the substrate during locomotion. Variation in shaft curvature and relative phalangeal length has been linked to differences in the degree of suspension and overall arboreal locomotor activities. Building on previous work, this study investigated these two skeletal characters in a comparative context to analyze function, while taking evolutionary relationships into account. This study examined the correspondence between proportions of suspension and overall substrate usage observed in 17 extant taxa and included angle of curvature and relative phalangeal length. Predictive models based on these traits are reported. Published proportions of different locomotor behaviors were regressed against each phalangeal measurement and a size proxy. The relationship between each behavior and skeletal trait was investigated using ordinary least-squares, phylogenetic generalized least-squares (pGLS), and two pGLS transformation methods to determine the model of best-fit. Phalangeal curvature and relative length had significant positive relationships with both suspension and overall arboreal locomotion. Cross-validation analyses demonstrated that relative length and curvature provide accurate predictions of relative suspensory behavior and substrate usage in a range of extant species when used together in predictive models. These regression equations provide a refined method to assess the amount of suspensory and overall arboreal locomotion characterizing species in the catarrhine fossil record.

8.
In the linear model with right-censored responses and many potential explanatory variables, regression parameter estimates may be unstable or, when the covariates outnumber the uncensored observations, not estimable. We propose an iterative algorithm for partial least squares, based on the Buckley-James estimating equation, to estimate the covariate effect and predict the response for a future subject with a given set of covariates. We use a leave-two-out cross-validation method for empirically selecting the number of components in the partial least-squares fit that approximately minimizes the error in estimating the covariate effect of a future observation. Simulation studies compare the methods discussed here with other dimension reduction techniques. Data from the AIDS Clinical Trials Group protocol 333 are used to motivate the methodology.
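A minimal PLS1 fit of the kind that sits inside such an algorithm can be sketched as follows (standard NIPALS deflation for an uncensored response; the Buckley-James censoring step and the cross-validated choice of component number are omitted):

```python
import numpy as np

def pls1(X, y, n_comp):
    """Minimal PLS1 regression (NIPALS deflation).  Returns the regression
    vector B and the centering constants needed for prediction."""
    xm, ym = X.mean(axis=0), y.mean()
    Xr, yr = X - xm, y - ym
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr                       # weight: covariance direction
        w = w / np.linalg.norm(w)
        t = Xr @ w                          # score
        tt = t @ t
        p = Xr.T @ t / tt                   # X loading
        q = (yr @ t) / tt                   # y loading
        Xr = Xr - np.outer(t, p)            # deflate X
        yr = yr - q * t                     # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)     # regression vector in X space
    return B, xm, ym

def pls1_predict(Xnew, B, xm, ym):
    return ym + (Xnew - xm) @ B
```

Because components are extracted one at a time, the fit remains defined even when predictors are collinear or outnumber the observations, which is exactly the setting the abstract addresses.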

9.
A confidence region for topologies is a data-dependent set of topologies that, with high probability, can be expected to contain the true topology. Because of the connection between confidence regions and hypothesis tests, implicitly or explicitly, the construction of confidence regions for topologies is a component of many phylogenetic studies. Existing methods for constructing confidence regions, however, often give conflicting results. The Shimodaira-Hasegawa test seems too conservative, including too many topologies, whereas the other commonly used method, the Swofford-Olsen-Waddell-Hillis test, tends to give confidence regions with too few topologies. Confidence regions are constructed here based on a generalized least squares test statistic. The methodology described is computationally inexpensive and broadly applicable to maximum likelihood distances. Assuming the model used to construct the distances is correct, the coverage probabilities are correct with large numbers of sites.

10.
The usual F test of regression coincidence, which is appropriate under a homoscedastic model, is examined under a multiplicatively heteroscedastic model. The departure of the test from its nominal level is slight when the sample of explanatory variables is symmetric, but the level may be substantially inflated when the sample has positive skew. Conversely, the nominal level may be slightly depressed when the sample has negative skew. The size of the perturbation from the nominal level depends on the degree of heteroscedasticity; however, its effect is more pronounced with positively skewed samples. Similar trends are evident for the usual F test of regression parallelism. There is no apparent pattern to the discrepancy in the level of the test, with regard to the data, that would permit empirical researchers to adjust their results.
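The statistic in question can be sketched as follows. Under homoscedastic normal errors it is referred to an F(2, n − 4) distribution; the paper's point is that its actual level under multiplicative heteroscedasticity departs from the nominal one, which can be checked by simulating its null rejection rate.

```python
import numpy as np

def coincidence_F(x1, y1, x2, y2):
    """F statistic for testing that two simple linear regressions coincide,
    i.e. share both intercept and slope."""
    def rss(x, y):
        X = np.column_stack([np.ones_like(x), x])
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        return float(np.sum((y - X @ b) ** 2))
    rss_sep = rss(x1, y1) + rss(x2, y2)                       # separate lines
    rss_pool = rss(np.concatenate([x1, x2]),                  # one common line
                   np.concatenate([y1, y2]))
    n = len(x1) + len(x2)
    # 2 restrictions (common intercept and slope); n - 4 error df
    return ((rss_pool - rss_sep) / 2.0) / (rss_sep / (n - 4))
```

A coincident pair of samples yields an F value near 1, while a shifted line drives it far above any reasonable critical value.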

11.
The method of generalized least squares (GLS) is used to assess the variance function for isothermal titration calorimetry (ITC) data collected for the 1:1 complexation of Ba²⁺ with 18-crown-6 ether. In the GLS method, the least squares (LS) residuals from the data fit are themselves fitted to a variance function, with iterative adjustment of the weighting function in the data analysis to produce consistency. The data are treated in a pooled fashion, providing 321 fitted residuals from 35 data sets in the final analysis. Heteroscedasticity (nonconstant variance) is clearly indicated. Data error terms proportional to qᵢ and qᵢ/v are well defined statistically, where qᵢ is the heat from the ith injection of titrant and v is the injected volume. The statistical significance of the variance function parameters is confirmed through Monte Carlo calculations that mimic the actual data set. For the data in question, which fall mostly in the range qᵢ = 100–2000 µcal, the contributions to the data variance from the terms in qᵢ² typically exceed the background constant term for qᵢ > 300 µcal and v < 10 µl. Conversely, this means that in reactions with qᵢ much less than this, heteroscedasticity is not a significant problem. Accordingly, in such cases the standard unweighted fitting procedures provide reliable results for the key parameters, K and ΔH°, and their statistical errors. These results also support an important earlier finding: in most ITC work on 1:1 binding processes, the optimal number of injections is 7–10, which is a factor of 3 smaller than the current norm. For high-q reactions, where weighting is needed for optimal LS analysis, tips are given for using the weighting option in the commercial software commonly employed to process ITC data.
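The residual-fitting step can be sketched as follows, using the simplified variance function Var(eᵢ) = a + b·qᵢ² (an assumed simplification for illustration; the paper's function also involves terms in qᵢ/v):

```python
import numpy as np

def fit_variance_function(q, resid, n_iter=15):
    """Fit Var(e_i) = a + b * q_i^2 to squared LS residuals.  Squared
    residuals are themselves heteroscedastic (Var(r_i^2) ~ 2 v_i^2 for
    normal errors), so the fit is iteratively re-weighted by 1/v_i^2."""
    Z = np.column_stack([np.ones_like(q), q ** 2])
    r2 = resid ** 2
    theta = np.linalg.lstsq(Z, r2, rcond=None)[0]    # unweighted start
    for _ in range(n_iter):
        # current variance estimate, floored to keep weights bounded
        v = np.clip(Z @ theta, 0.1 * np.median(r2), None)
        W = np.diag(1.0 / v ** 2)
        theta = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ r2)
    return theta
```

If the fitted b is statistically negligible for the heats encountered, the data are effectively homoscedastic and unweighted fitting is adequate, which is the practical message of the abstract.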

12.
Lu Deng  Han Zhang  Lei Song  Kai Yu 《Biometrics》2020,76(2):369-379
Mendelian randomization (MR) is a type of instrumental variable (IV) analysis that uses genetic variants as IVs for a risk factor to study its causal effect on an outcome. Extensive investigations on the performance of IV analysis procedures, such as the one based on the two-stage least squares (2SLS) procedure, have been conducted under the one-sample scenario, where measures on the IVs, the risk factor, and the outcome are assumed to be available for each study participant. Recent MR analyses are usually performed with data from two independent or partially overlapping genetic association studies (the two-sample setting), with one study providing information on the association between the IVs and the outcome, and the other on the association between the IVs and the risk factor. We investigate the performance of 2SLS in two-sample MR when the IVs are weakly associated with the risk factor. We derive closed-form formulas for the bias and mean squared error of the 2SLS estimate and verify them with numeric simulations under realistic circumstances. Using these analytic formulas, we can study the pros and cons of conducting MR analysis under one-sample and two-sample settings and assess the impact of having overlapping samples. We also propose and validate a bias-corrected estimator for the causal effect.
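A one-sample 2SLS estimate can be sketched as follows; the simulated genetic instruments, effect sizes, and confounder below are illustrative assumptions, not data from the paper.

```python
import numpy as np

def two_stage_ls(G, x, y):
    """Two-stage least squares.  G holds the instruments (one column per
    genetic variant), x the risk factor, y the outcome; returns the
    causal-effect estimate."""
    n = len(x)
    G1 = np.column_stack([np.ones(n), G])
    # stage 1: regress the risk factor on the instruments
    xhat = G1 @ np.linalg.lstsq(G1, x, rcond=None)[0]
    # stage 2: regress the outcome on the fitted risk factor
    X = np.column_stack([np.ones(n), xhat])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b[1]
```

With a shared confounder, naive OLS of y on x is biased away from the causal effect, while the 2SLS estimate is not, because the fitted x̂ from stage 1 depends only on the (randomly assigned) genotypes.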

13.
Exact test statistics and confidence intervals for a general split block ANOCOVA model are derived. With a single covariate, each statistic for testing main effect A, main effect B, and the A×B interaction has one less numerator degree of freedom than its counterpart in the ordinary ANOVA without a covariate. Sufficient conditions on the model parameters which allow these lost numerator degrees of freedom to be regained are given, as are exact statistics and confidence intervals for the corresponding reduced models. A note of caution is offered when constructing test statistics for reduced versions of the general model using the method of generalized least squares. General analysis-of-covariance models for two other block designs are also presented.

14.
Near-infrared spectroscopy (NIRS) is known to be a suitable technique for rapid fermentation monitoring. Industrial fermentation media are complex, both chemically (ill-defined composition) and physically (multiphase sample matrix), which poses an additional challenge to the development of robust NIRS calibration models. We investigated the use of NIRS for at-line monitoring of the concentration of clavulanic acid during an industrial fermentation. An industrial strain of Streptomyces clavuligerus was cultivated at 200-L scale for the production of clavulanic acid. Partial least squares (PLS) regression was used to develop calibration models between spectral and analytical data. In this work, two different variable selection methods, genetic algorithms (GA) and PLS-bootstrap, were studied and compared with models built using all the spectral variables. Calibration models for clavulanic acid concentration performed well both on internal and external validation. The two variable selection methods improved the predictive ability of the models up to 20%, relative to the calibration model built using the whole spectra.

15.
A new method for analyzing three-state protein unfolding equilibria is described that overcomes the difficulties created by direct effects of denaturants on circular dichroism (CD) and fluorescence spectra of the intermediate state. The procedure begins with a singular value analysis of the data matrix to determine the number of contributing species and perturbations. This result is used to choose a fitting model and remove all spectra from the fitting equation. Because the fitting model is a product of a matrix function which is nonlinear in the thermodynamic parameters and a matrix that is linear in the parameters that specify component spectra, the problem is solved with a variable projection algorithm. The advantages of this procedure are that perturbation spectra do not have to be estimated before fitting, that arbitrary assumptions about magnitudes of parameters describing the intermediate state are not required, and that multiple experiments involving different spectroscopic techniques can be analyzed simultaneously. Two tests of this method were performed. First, simulated three-state data were analyzed, and the original and recovered thermodynamic parameters agreed within one standard error, while recovered and original component spectra agreed within 0.5%. Second, guanidine-induced unfolding titrations of the human retinoid-X-receptor ligand-binding domain were analyzed according to a three-state model. The standard unfolding free energy changes in the absence of guanidine, and the guanidine concentrations at zero free-energy change for both transitions, were determined from a joint analysis of fluorescence and CD spectra. Realistic spectra of the three protein states were also obtained.
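The first step, counting the contributing species from the singular values of the data matrix, can be sketched as follows (the 1e-3 tolerance is an assumed noise threshold, not a value from the paper):

```python
import numpy as np

def n_species(data, tol=1e-3):
    """Estimate the number of spectrally distinct contributors from the
    singular values of a (wavelength x titration-point) data matrix:
    count singular values that rise above tol times the largest one."""
    s = np.linalg.svd(data, compute_uv=False)
    return int(np.sum(s >= tol * s[0]))
```

For clean three-state data the data matrix is (to within noise) a product of three component spectra and three concentration profiles, so exactly three singular values stand above the noise floor.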

16.
Information about the age distribution and survival of wild populations is of much interest in ecology and biodemography, but is hard to obtain. Established schemes such as capture-recapture often are not feasible. In the proposed residual demography paradigm, individuals are randomly sampled from the wild population at unknown ages and the resulting captive cohort is reared out in the laboratory until death. Under some basic assumptions one obtains a demographic convolution equation that involves the unknown age distribution of the wild population, the observed survival function of the captive cohort, and the observed survival function of a reference cohort that is independently raised in the laboratory from birth. We adopt a statistical penalized least squares method for the deconvolution of this equation, aiming at extracting the age distribution of the wild population under suitable constraints. Under stationarity of the population, the age density is proportional to the survival function of the wild population and can thus be inferred. Several extensions are discussed. Residual demography is demonstrated for data on fruit flies Bactrocera oleae.
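Discretising a convolution equation turns it into a linear system that can be solved by penalized least squares. The following sketch uses a generic kernel matrix and a second-difference roughness penalty; the paper's specific kernel and constraints (e.g. nonnegativity of the density) are omitted.

```python
import numpy as np

def penalized_deconvolve(A, b, lam=1e-3):
    """Penalized least squares solve of a discretised convolution
    equation A f = b: minimise ||A f - b||^2 + lam * ||D f||^2,
    where D is the second-difference (roughness) operator."""
    n = A.shape[1]
    D = np.diff(np.eye(n), n=2, axis=0)   # rows of [1, -2, 1]
    return np.linalg.solve(A.T @ A + lam * (D.T @ D), A.T @ b)
```

The penalty stabilises the inversion against noise in b while biasing the solution only toward smoothness, which is appropriate when the target is a smooth age density.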

17.
This paper has two major objectives. The first is to present a two-stage least squares procedure for estimating the parameters of a linear model whose parameters are themselves linear functions of some hyperparameters. The second, and perhaps more important, point is that the new estimator can be shown to be generally more precise than either the Bayesian or the generalized single-stage least squares estimator reported by Lindley and Smith (1972).

18.
Identifying the optimal treatment strategy for cancer is an important challenge, particularly for complex diseases like epithelial ovarian cancer (EOC) that are prone to recurrence. In this study we developed a quantitative, multivariate model to predict the extent of ovarian cancer cell death following treatment with an ErbB inhibitor (canertinib, CI-1033). A partial least squares regression model related the levels of ErbB receptors and ligands at the time of treatment to sensitivity to CI-1033. In this way, the model mimics the clinical problem by incorporating only information that would be available at the time of drug treatment. The full model was able to fit the training set data and was predictive. Model analysis demonstrated the importance of including both ligand and receptor levels in this approach, consistent with reports of the role of ErbB autocrine loops in EOC. A reduced multi-protein model was able to predict CI-1033 sensitivity of six distinct EOC cell lines derived from the three subtypes of EOC, suggesting that quantitatively characterizing the ErbB network could be used to broadly predict EOC response to CI-1033. Ultimately, this systems biology approach examining multiple proteins has the potential to uncover multivariate functions to identify subsets of tumors that are most likely to respond to a targeted therapy.

19.
This paper reviews a general framework, based on marked point processes, for modelling longitudinal data with random measurement times, and presents a worked example. We construct quite general regression models for longitudinal data, which may in particular include censoring that depends only on the past and outside random variation, as well as dependencies between measurement times and measurements. The modelling also generalises statistical counting process models. We review a non-parametric Nadaraya-Watson kernel estimator of the regression function, and a parametric analysis based on a conditional least squares (CLS) criterion. The parametric analysis presented is a conditional version of the generalised estimating equations of Liang and Zeger (1986). We conclude that the usual nonparametric and parametric regression modelling can be applied to this general set-up, with some modifications. The presented framework provides an easily implemented and powerful tool for model building for repeated measurements.
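The Nadaraya-Watson estimator itself is simple to sketch (a Gaussian kernel is assumed here; any symmetric kernel works):

```python
import numpy as np

def nadaraya_watson(x_obs, y_obs, x_eval, h):
    """Nadaraya-Watson kernel regression: at each evaluation point,
    a kernel-weighted average of the observed responses, with
    Gaussian kernel and bandwidth h."""
    d = (np.asarray(x_eval)[:, None] - np.asarray(x_obs)[None, :]) / h
    K = np.exp(-0.5 * d ** 2)
    return (K @ np.asarray(y_obs)) / K.sum(axis=1)
```

The bandwidth h trades bias against variance: a small h tracks the regression function closely but is noisy, a large h smooths noise away at the cost of flattening curvature.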

20.
Large populations of potential cellulosic biomass feedstocks are currently being screened for fuel and chemical applications. The monomeric sugar content, released through hydrolysis, is of particular importance and is currently measured with time‐consuming HPLC methods. A method for sugar detection is presented here that employs ¹H NMR spectra regressed against primary HPLC sugar concentration data to build partial least squares (PLS) models. The PLS2 model is able to predict concentrations of both major sugar components, like glucose and xylose, and minor sugars, such as arabinose and mannose, in biomass hydrolysates. The model was built with 65 samples from a variety of different biomass species and covers a wide range of sugar concentrations. Model predictions were validated with a set of 15 samples, all of which were within error of both HPLC and NMR integration measurements. The data collection time for these NMR measurements is less than 20 min, offering a significant improvement over the 1 h acquisition time required for HPLC. Biotechnol. Bioeng. 2013; 110: 721–728. © 2012 Wiley Periodicals, Inc.
