Similar literature
 20 similar articles found (search time: 31 ms)
1.
Zhang Hao Helen; Lu Wenbin 《Biometrika》2007,94(3):691-703
We investigate the variable selection problem for Cox's proportional hazards model, and propose a unified model selection and estimation procedure with desired theoretical properties and computational convenience. The new method is based on a penalized log partial likelihood with the adaptively weighted L1 penalty on regression coefficients, providing what we call the adaptive Lasso estimator. The method incorporates different penalties for different coefficients: unimportant variables receive larger penalties than important ones, so that important variables tend to be retained in the selection process, whereas unimportant variables are more likely to be dropped. Theoretical properties, such as consistency and rate of convergence of the estimator, are studied. We also show that, with proper choice of regularization parameters, the proposed estimator has the oracle properties. The convex optimization nature of the method leads to an efficient algorithm. Both simulated and real examples show that the method performs competitively.
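
As a hedged sketch of the criterion this abstract describes (the notation is assumed here, not quoted from the paper): write $\ell_n(\beta)$ for the log partial likelihood, $n$ for the sample size, and $\tilde{\beta}$ for an initial estimate such as the unpenalized maximizer of $\ell_n$. An adaptively weighted L1 penalty then takes the form

```latex
\hat{\beta} = \arg\min_{\beta} \left\{ -\frac{1}{n}\,\ell_n(\beta) + \lambda \sum_{j=1}^{p} w_j\,|\beta_j| \right\},
\qquad w_j = \frac{1}{|\tilde{\beta}_j|}
```

A coefficient whose initial estimate is close to zero receives a large weight $w_j$ and is therefore more likely to be shrunk exactly to zero, which matches the "larger penalties for unimportant variables" behaviour stated above.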

2.
The Cox proportional hazards regression model is the most popular approach to model covariate information for survival times. In this context, the development of high‐dimensional models where the number of covariates is much larger than the number of observations ( $p \,{\gg }\, n$ ) is an ongoing challenge. A practicable approach is to use ridge penalized Cox regression in such situations. Besides focusing on finding the best prediction rule, one is often interested in determining a subset of covariates that are the most important ones for prognosis. This could be a gene set in the biostatistical analysis of microarray data. Covariate selection can then, for example, be done by L1‐penalized Cox regression using the lasso (Tibshirani (1997). Statistics in Medicine 16, 385–395). Several approaches beyond the lasso that incorporate covariate selection have been developed in recent years. These include modifications of the lasso as well as nonconvex variants such as smoothly clipped absolute deviation (SCAD) (Fan and Li (2001). Journal of the American Statistical Association 96, 1348–1360; Fan and Li (2002). The Annals of Statistics 30, 74–99). The purpose of this article is to implement them practically into the model building process when analyzing high‐dimensional data with the Cox proportional hazards model. To evaluate penalized regression models beyond the lasso, we included SCAD variants and the adaptive lasso (Zou (2006). Journal of the American Statistical Association 101, 1418–1429). We compare them with “standard” applications such as ridge regression, the lasso, and the elastic net. Predictive accuracy, features of variable selection, and estimation bias will be studied to assess the practical use of these methods. We observed that the performance of SCAD and adaptive lasso is highly dependent on nontrivial preselection procedures. A practical solution to this problem does not yet exist. Since there is a high risk of missing relevant covariates when SCAD or adaptive lasso is applied after an inappropriate initial selection step, we recommend staying with the lasso or the elastic net in actual data applications. But given the promising results for truly sparse models, we see some advantage of SCAD and adaptive lasso if better preselection procedures were available. This requires further methodological research.

3.
The paper considers the problems in the adaptive evolution of life-history traits for individuals in the nonlinear Leslie model of age-structured population. The possibility to predict adaptation results as the values of organism’s traits (properties) that provide for the maximum of a certain function of traits (optimization criterion) is studied. An ideal criterion of this type is Darwinian fitness as a characteristic of vital success of an individual. Criticism of the optimization approach is associated with the fact that it does not take into account the changes in the environment (in a broad sense) caused by evolution, thereby leading to losses in the adequacy of the criterion. In addition, the justification for this criterion under stationary conditions is not usually rigorous. It has been suggested to overcome these objections in terms of the adaptive dynamics theory using the concept of invasive fitness. Reasons are given that favor the application of the average number of offspring for an individual, $R_L$, as an optimization criterion in the nonlinear Leslie model. According to the theory of quantitative genetics, the selection for fertility (that is, for a set of correlated quantitative traits determined by both multiple loci and the environment) leads to an increase in $R_L$. In terms of adaptive dynamics, the maximum $R_L$ corresponds to the evolutionary stability and, in certain cases, convergent stability of the values for traits. The search for evolutionarily stable values against the background of limited resources for reproduction is a problem of linear programming.

4.
Uncovering driver genes is crucial for understanding heterogeneity in cancer. L1-type regularization approaches have been widely used for uncovering cancer driver genes based on genome-scale data. Although the existing methods have been widely applied in the field of bioinformatics, they possess several drawbacks: subset size limitations, erroneous estimation results, multicollinearity, and heavy time consumption. We introduce a novel statistical strategy, called a Recursive Random Lasso (RRLasso), for high dimensional genomic data analysis and investigation of driver genes. For time-effective analysis, we consider a recursive bootstrap procedure in line with the random lasso. Furthermore, we introduce a parametric statistical test for driver gene selection based on bootstrap regression modeling results. The proposed RRLasso is not only rapid but also performs well for high dimensional genomic data analysis. Monte Carlo simulations and analysis of the “Sanger Genomics of Drug Sensitivity in Cancer” dataset from the Cancer Genome Project show that the proposed RRLasso is an effective tool for high dimensional genomic data analysis. The proposed methods provide reliable and biologically relevant results for cancer driver gene selection.

5.
An adaptive R-estimator θA and an adaptive trimmed mean MAT are proposed. The performance of these and a number of other robust estimators is studied on real data sets, drawn from the astronomical, behavioural, biomedical, chemical, engineering and physical sciences. In the case of sets that can be assumed to have come from symmetric distributions, the best performer is θA. The next best performers are the Hodges-Lehmann estimator, Bisquare (7.5) and Huber (1.5), in that order. MAT works well with all kinds of sets, symmetric or skewed. Extensions of these results to ANOVA and regression models are mentioned.
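
To make two of the estimators named above concrete, here is a minimal sketch in plain Python. It shows the Hodges-Lehmann estimator (median of the Walsh averages) and an ordinary fixed-proportion trimmed mean. The function names and the toy data are assumptions for illustration; the adaptive estimators θA and MAT proposed in the abstract choose their tuning from the data and are not reproduced here.

```python
import itertools
import statistics


def hodges_lehmann(sample):
    """Median of all Walsh averages (x_i + x_j) / 2 with i <= j."""
    walsh = [
        (a + b) / 2
        for a, b in itertools.combinations_with_replacement(sample, 2)
    ]
    return statistics.median(walsh)


def trimmed_mean(sample, proportion):
    """Mean after dropping int(proportion * n) values from each tail.

    This is the plain, non-adaptive version: `proportion` is fixed by the
    caller rather than chosen from the data.
    """
    data = sorted(sample)
    k = int(proportion * len(data))
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one gross outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
hl = hodges_lehmann(data)        # 3.0
tm = trimmed_mean(data, 0.2)     # 3.0
plain = sum(data) / len(data)    # 22.0, dragged up by the outlier
```

Both robust estimates stay at the centre of the bulk of the data, while the ordinary mean is pulled towards the outlier.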

6.
The paper considers the properties of individual life history corresponding to the Leslie model of age-structured population. The life history is modelled as a finite Markov chain with absorption at a death state of individual. In this model, individual longevity, average number of offspring $R_L$ (produced by an individual over the entire life), and some other known characteristics of the life history have been derived using simple probability methods that do not involve matrix calculus and their individual components have been interpreted. In the linear Leslie population model (derived by simple modification of a Markov chain), $R_L$ determines the growth or decline of a population. Individuals with higher $R_L$ values have evolutionary advantages in the population due to accelerated growth in their number. The selection of fertility as a factor of the increase in $R_L$ is considered. In the Leslie model, fertility is a set of correlated quantitative traits, where the age-specific fertility components are determined both by multiple loci and the environment. According to the genetic theory of quantitative trait selection, they evolve towards an increase in $R_L$. Taking into account the limited resources for reproduction, selection optimizes the fertility distribution according to age. Optimal distribution corresponds to the attainment of the maximum $R_L$. This complies with the maximization of the rate of population growth (r-selection), which is characteristic of linear population models. The search for the $R_L$ maximum and optimal distribution of fertility belongs to the field of linear programming.
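
The quantities mentioned in this abstract can be illustrated with a short sketch that walks the absorbing chain one age class at a time, without any matrix calculus. The conventions are assumptions made for the example, not taken from the paper: an individual starts in the first age class, `survival[x]` is the probability of moving from class x to class x + 1, death is certain after the last class, and `fertility[x]` is the mean number of offspring produced while in class x.

```python
def life_history_summary(fertility, survival):
    """Return (R_L, expected number of age classes lived).

    fertility[x]: mean offspring produced in age class x
    survival[x]:  probability of moving from class x to class x + 1
                  (one entry fewer than `fertility`)
    """
    alive = 1.0                   # probability of reaching the current class
    expected_classes_lived = 0.0
    r_l = 0.0
    for x, f in enumerate(fertility):
        expected_classes_lived += alive
        r_l += alive * f          # offspring weighted by chance of being alive
        if x < len(survival):
            alive *= survival[x]
    return r_l, expected_classes_lived


# Hypothetical three-class life history.
r_l, longevity = life_history_summary([0.0, 1.5, 2.0], [0.5, 0.4])
# r_l       = 0.5 * 1.5 + 0.5 * 0.4 * 2.0 = 1.15
# longevity = 1 + 0.5 + 0.2 = 1.7 age classes
```

With these made-up numbers the average individual leaves slightly more than one offspring, which in the linear model corresponds to a growing population.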

7.
This paper focuses on the problems of estimation and variable selection in the functional linear regression model (FLM) with functional response and scalar covariates. To this end, two different types of regularization (L1 and L2) are considered in this paper. On the one hand, a sample approach for functional LASSO in terms of basis representation of the sample values of the response variable is proposed. On the other hand, we propose a penalized version of the FLM by introducing a P-spline penalty in the least squares fitting criterion. But our aim is to propose P-splines as a powerful tool simultaneously for variable selection and functional parameters estimation. In that sense, the importance of smoothing the response variable before fitting the model is also studied. In summary, penalized (L1 and L2) and nonpenalized regression are combined with a presmoothing of the response variable sample curves, based on regression splines or P-splines, providing a total of six approaches to be compared in two simulation schemes. Finally, the most competitive approach is applied to a real data set based on the graft-versus-host disease, which is one of the most frequent complications (30%–50%) in allogeneic hematopoietic stem-cell transplantation.

8.
Cross-validation is the standard method for hyperparameter tuning, or calibration, of machine learning algorithms. The adaptive lasso is a popular class of penalized approaches based on weighted L1-norm penalties, with weights derived from an initial estimate of the model parameter. Although it violates the paramount principle of cross-validation, according to which no information from the hold-out test set should be used when constructing the model on the training set, a “naive” cross-validation scheme is often implemented for the calibration of the adaptive lasso. The unsuitability of this naive cross-validation scheme in this context has not been well documented in the literature. In this work, we recall why the naive scheme is theoretically unsuitable and how proper cross-validation should be implemented in this particular context. Using both synthetic and real-world examples and considering several versions of the adaptive lasso, we illustrate the flaws of the naive scheme in practice. In particular, we show that it can lead to the selection of adaptive lasso estimates that perform substantially worse than those selected via a proper scheme in terms of both support recovery and prediction error. In other words, our results show that the theoretical unsuitability of the naive scheme translates into suboptimality in practice, and call for abandoning it.
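
The distinction drawn in this abstract can be sketched in plain Python for a linear model. Everything below is an assumption made for illustration (the function names, the lasso-based initial estimate, the fold layout, the simulated data); it is not the authors' code. The point is only structural: in the proper scheme the initial estimate, and hence the weights, are recomputed from the training part of every fold, whereas a naive scheme would compute the weights once from the full data and cross-validate only the weighted step.

```python
import random


def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j]*|b_j|.

    Assumes centred data (no intercept). An infinite weight pins b_j at 0.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # running residual y - X beta
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0 or weights[j] == float("inf"):
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def adaptive_lasso(X, y, lam_init, lam):
    """Initial lasso fit, then weights 1/|b_init|, then weighted lasso."""
    p = len(X[0])
    init = weighted_lasso(X, y, lam_init, [1.0] * p)
    weights = [1.0 / abs(b) if b != 0.0 else float("inf") for b in init]
    return weighted_lasso(X, y, lam, weights)


def proper_cv_error(X, y, lam_init, lam, k=5):
    """K-fold CV in which the whole pipeline is refit on each training part."""
    n = len(X)
    total = 0.0
    for f in range(k):
        test_idx = list(range(f, n, k))
        held_out = set(test_idx)
        X_tr = [X[i] for i in range(n) if i not in held_out]
        y_tr = [y[i] for i in range(n) if i not in held_out]
        # Weights are derived from training rows only: no held-out leakage.
        beta = adaptive_lasso(X_tr, y_tr, lam_init, lam)
        for i in test_idx:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            total += (y[i] - pred) ** 2
    return total / n


# Simulated data: two relevant covariates, two pure-noise covariates.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [2.0 * row[0] - 1.0 * row[1] + rng.gauss(0, 0.1) for row in X]

errors = {lam: proper_cv_error(X, y, 0.05, lam) for lam in (0.001, 0.01, 0.1)}
best_lam = min(errors, key=errors.get)
```

Turning this into the naive scheme would mean computing `weights` once from all 60 rows before the fold loop, which is exactly the use of hold-out information that the abstract argues against.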

9.
Let $L$ be a Leslie population matrix. Leslie (1945) and others have shown that the matrix $L$ has a leading positive eigenvalue $\lambda_0$ and that in general: $$\lim_{t \to \infty} \frac{L^t X}{\lambda_0^t} = \gamma X_{\lambda_0} \qquad (1)$$ where $X_{\lambda_0}$ is an eigenvector corresponding to $\lambda_0$, $X$ is any initial population vector, and $\gamma$ is a scalar quantity determined by $X$. In this article we generalize (1) exhaustively by removing the mild restrictions on the fertility rates which most writers impose. The result is an oscillatory limit of a kind first noted by Bernardelli (1941) and Lewis (1942) and described by Bernardelli as “population waves”. We calculate in terms of $\lambda_0$ and the entries of the matrix $L$ the values of this oscillatory limit as well as its time-independent average over one period. This calculation includes as its leading special case the result of (1), confirming incidentally that $\gamma$ is nonzero. To stabilize a population, the matrix $L$ must be adjusted so that $\lambda_0 = 1$. The limits calculated for the oscillatory and non-oscillatory cases then have maximum significance since they represent the limiting population vectors. We discuss a simple scheme for accomplishing stabilization which yields as a byproduct an easily accessible scalar measure of $L$'s tendency to promote population growth. The reciprocal of this measure is the familiar net reproduction rate.
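
A small numerical sketch of the non-oscillatory case of limit (1), written in plain Python. The two-class matrix and all function names are hypothetical choices for illustration. The last step rescales the fertilities by the net reproduction rate, which is one simple way to obtain a leading eigenvalue of 1; it is not claimed to be the stabilization scheme the article itself discusses.

```python
def leslie_step(fertility, survival, x):
    """One projection x -> L x, where L has first row `fertility`
    and subdiagonal `survival`."""
    births = sum(f * v for f, v in zip(fertility, x))
    return [births] + [s * v for s, v in zip(survival, x)]


def dominant_eigen(fertility, survival, x0, steps=200):
    """Power iteration: (leading eigenvalue, age vector scaled to sum 1)."""
    x = list(x0)
    lam = 1.0
    for _ in range(steps):
        nxt = leslie_step(fertility, survival, x)
        total = sum(nxt)
        lam = total / sum(x)
        x = [v / total for v in nxt]
    return lam, x


def net_reproduction_rate(fertility, survival):
    """Sum over classes of fertility times probability of reaching the class."""
    rate, alive = 0.0, 1.0
    for i, f in enumerate(fertility):
        rate += alive * f
        if i < len(survival):
            alive *= survival[i]
    return rate


# Hypothetical matrix L = [[1, 4], [0.5, 0]]; its eigenvalues are 2 and -1.
fertility, survival = [1.0, 4.0], [0.5]
lam0, stable = dominant_eigen(fertility, survival, [10.0, 10.0])
# lam0 is about 2 and stable is about [0.8, 0.2]

r0 = net_reproduction_rate(fertility, survival)   # 1 + 0.5 * 4 = 3
scaled = [f / r0 for f in fertility]
lam_stab, _ = dominant_eigen(scaled, survival, [10.0, 10.0])
# lam_stab is about 1: the rescaled population neither grows nor declines
```

Because the first age class is fertile here, the matrix is primitive and the iterates settle on a fixed age structure; a matrix in which only the last class reproduces would instead show the oscillatory "population waves" behaviour the abstract refers to.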

10.
Biophysical understanding of membrane domains requires accurate knowledge of their structural details and elasticity. We report on a global small angle x-ray scattering data analysis technique for coexisting liquid-ordered (Lo) and liquid-disordered (Ld) domains in fully hydrated multilamellar vesicles. This enabled their detailed analysis for differences in membrane thickness, area per lipid, hydrocarbon chain length, and bending fluctuation as demonstrated for two ternary mixtures (DOPC/DSPC/CHOL and DOPC/DPPC/CHOL) at different cholesterol concentrations. Lo domains were found to be ∼10 Å thicker, and laterally up to 20 Å²/lipid more condensed than Ld domains. Their bending fluctuations were also reduced by ∼65%. Increase of cholesterol concentration caused significant changes in structural properties of Ld, while its influence on Lo properties was marginal. We further observed that temperature-induced melting of Lo domains is associated with a diffusion of cholesterol to Ld domains and controlled by Lo/Ld thickness differences.

11.
Gusta LV 《Plant physiology》1975,56(5):707-709
The freezing of water in acclimated and nonacclimated cereals was studied using pulsed nuclear magnetic resonance spectroscopy. The quantity of unfreezable water per unit dry matter was not strongly dependent on the degree of cold acclimation. In contrast, the fraction of water frozen which was tolerated by nonacclimated winter cereals and by an acclimated spring wheat (Triticum aestivum L.) was less than in acclimated hardy cereals. The freezing curves had the following form: $L_T = L_0 \Delta T_m / T + K$, where $L_T$ and $L_0$ are liquid water per unit dry matter at $T$ and 0 °C, respectively, $\Delta T_m$ is the melting point depression, and $K$ is the liquid water which does not freeze.

12.
Bechhofer and Turnbull (1978) proposed two procedures to compare k normal means with a standard. The procedures guarantee that (1) with probability at least P0* (specified), no category is selected when the best experimental category is sufficiently worse than the standard, and (2) with probability at least P1* (specified), the best experimental category is selected when it is sufficiently better than the second best and the standard. For the case of common known variance, they studied a single-stage procedure. For the case of common unknown variance, they studied a two-stage procedure. Under the same formulation as Bechhofer and Turnbull (1978) and for the same selection goals (1) and (2) described above, Wilcox (1984a) proposed a procedure for the case of unknown and unequal variances, and supplied a table of the necessary constants to implement the procedure. This paper considers the case of unknown and unequal variances for the same formulation as Bechhofer and Turnbull, and Wilcox, but assumes that μ0 is an unknown control. A two-stage procedure is proposed to solve the problem. A lower bound on the probability of a correct selection is derived; it takes the same form as the double integral that appeared in Rinott (1978), which was used for the lower bound on the probability of a correct selection for a different selection goal.

13.
A study of the metal-to-metal charge-transfer (MMCT) transition within the binuclear cyano-bridged complexes cis-[L13CoIII(μ-NC)FeII(CN)5] (L13 = 12-methyl-1,4,7,10-tetraazacyclotridecan-12-amine), trans-[L14CoIII(μ-NC)FeII(CN)5] (L14 = 6-methyl-1,4,8,11-tetraazacyclotetradecan-6-amine) and trans-[L15CoIII(μ-NC)FeII(CN)5] (L15 = 10-methyl-1,4,8,12-tetraazacyclopentadecan-10-amine) has been carried out in electrolyte solutions at varying concentrations. Using these data, as well as the reaction free energies obtained from electrochemical measurements, the reorganisation and activation free energies for the forward and reverse thermal electron-transfer processes have been estimated. The changes of these parameters with the electrolyte concentration, as well as those of the energy of the maximum MMCT band and the reaction free energy, are mainly due to ion-pairing effects.

14.
We describe the synthesis, structure and reactivity of novel bis(1-alkenyl)platinum(II) complexes, Pt[CH2(CH2)nCHCH2]2L2 (where L2 = dppp, dppe, dppm and n = 1, 2). The stability of the title complexes with the different ligands is discussed. The steric, chelating and electronic properties of the ligands have a significant impact on the structure as well as on the reactivity of the complexes. Novel reactions with elemental sulfur and carbon dioxide are described and discussed.

15.
Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3–40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31–0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time in accordance with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04–0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better, PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated.

16.
Nickel(II) complexes of N,N′-dimethyl-N,N′-bis(pyridyl-2yl-methyl)ethylene-diamine (L1), N,N′-dimethyl-N,N′-bis(pyridyl-2-ylmethyl)-1,2-diaminopropane (L2) and N,N′-dimethyl-N,N′-bis(pyridyl-2-ylmethyl)-1,3-diaminopropane (L3) were prepared and their spectroscopic and redox properties studied. The distorted octahedral structure was determined for [NiL3ClCH3OH](ClO4) by using X-ray crystallography. The electronic spectral behavior of the complexes at different pHs was analyzed; it is shown that a new band grew at the expense of the other band intensity in acid media. The redox properties of ligands and their complexes show the peaks of Ni(II) → Ni(III) and Ni(II) → Ni(0) as these were detected at low concentration while Ni(II) → Ni(I) process was detectable clearly at high concentration. Furthermore, the interaction studies of 2-mercaptoethanesulfonic acid as a simulator of coenzyme M reductase (CoM) with NiN4 chromophores are discussed.

17.
A new class of asymmetric N-capped (dianionic/trianionic) tripodal proligands [Hx(Ln)] (x = 2, n = 1-6; x = 3, n = 7, 8) which possess pendant arms with N2OS, N2S2 or NOS2 donor groups and with different chelate ring sizes {5,5,5} or {5,6,5} has been prepared. Treatment of these ligands with [WO2Cl2(dme)] (dme = 1,2-dimethoxyethane) in the presence of base (triethylamine or KOH) leads to the formation of cis-dioxotungsten(VI) complexes of the types [WO2(Ln)] (n = 1-6) and K[WO2(Ln)] (n = 7, 8). Reaction of these tetradentate ligands with [MoO2(acac)2] (acac = acetylacetonate) gives the corresponding Mo(VI) analogues [MoO2(Ln)] (n = 1-6) and K[MoO2(Ln)] (n = 7, 8). Moreover, a new five coordinate dioxomolybdenum(VI) complex with an NS2 tridentate ligand [MoO2(L9)] has been synthesised using a similar procedure. All these compounds have been spectroscopically characterised and the molecular structures of [MoO2(Ln)] (n = 2, 6) and [WO2(L6)] have been established by X-ray diffraction analysis. The electrochemistry and the catalytic activity for oxidation of allylic and benzylic alcohols of these dioxo complexes have also been investigated.

18.
In this work, we utilize micropipette aspiration and fluorescence imaging to examine the material properties of lipid vesicles made from mixtures of palmitoyloleoylphosphocholine (POPC) and dipalmitoylphosphatidylcholine (DPPC). At elevated temperatures/low DPPC fractions, these lipids are in a miscible liquid crystalline (Lα) state, whereas at lower temperatures/higher DPPC fractions they phase-separate into Lα and gel phases. We show that the elastic modulus, K, and critical tension, τc, of Lα vesicles are independent of DPPC fraction. However, as the sample temperature is increased from 15°C to 45°C, we measure decreases in both K and τc of 20% and 50%, respectively. The elasticity change is likely driven by a change in interfacial tension. We describe the reduction in critical tension using a simple model of thermally activated membrane pores. Vesicles with two-phase coexistence exhibit material properties that differ from Lα vesicles including critical tensions that are 20–40% lower. Fluorescence imaging of phase coexistent POPC/DPPC vesicles shows that the DPPC-rich domains exist in an extended network structure that exhibits characteristics of a solid. This gel network explains many of the unusual material properties of two-phase membranes.

19.
The von Bertalanffy growth equation (VBGE) is commonly used in ecology and fisheries management to model individual growth of an organism. Generally, a nonlinear regression is used with length-at-age data to recover key life history parameters: $L_\infty$ (asymptotic size), k (the growth coefficient), and $t_0$ (a time used to calculate size at age 0). However, age data are often unavailable for many species of interest, which makes the regression impossible. To confront this problem, we have developed a Bayesian model to find $L_\infty$ using only length data. We use length-at-age data for female blue shark, Prionace glauca, to test our hypothesis. Preliminary comparisons of the model output and the results of a nonlinear regression using the VBGE show similar estimates of $L_\infty$. We also developed a full Bayesian model that fits the VBGE to the same data used in the classical regression and the length-based Bayesian model. Classical regression methods are highly sensitive to missing data points, and our analysis shows that fitting the VBGE in a Bayesian framework is more robust. We investigate the assumptions made with the traditional curve fitting methods, and argue that either the full Bayesian or the length-based Bayesian models are preferable to classical nonlinear regressions. These methods clarify and address assumptions made in classical regressions using von Bertalanffy growth and facilitate more detailed stock assessments of species for which data are sparse.
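
For reference, the standard form of the growth curve named in this abstract is L(t) = L∞ (1 − exp(−k (t − t0))). The sketch below evaluates it; the function name and the parameter values are hypothetical and are not estimates from the blue shark data.

```python
import math


def von_bertalanffy_length(age, l_inf, k, t0):
    """Expected length at `age`: l_inf * (1 - exp(-k * (age - t0)))."""
    return l_inf * (1.0 - math.exp(-k * (age - t0)))


# Illustrative parameters only (length in cm, age in years).
l_inf, k, t0 = 300.0, 0.1, -1.0

length_at_t0 = von_bertalanffy_length(t0, l_inf, k, t0)      # 0.0 by construction
size_at_age_0 = von_bertalanffy_length(0.0, l_inf, k, t0)    # l_inf * (1 - exp(k * t0))
length_when_old = von_bertalanffy_length(1000.0, l_inf, k, t0)  # approaches l_inf
```

The curve passes through zero length at age t0 and flattens towards the asymptotic size, which is why t0 is described above as a time used to calculate size at age 0.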

20.