Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Spatial smoothing and hot spot detection for CGH data using the fused lasso   Total citations: 4, self-citations: 0, citations by others: 4
We apply the "fused lasso" regression method of (TSRZ2004) to the problem of "hot-spot detection", in particular, detection of regions of gain or loss in comparative genomic hybridization (CGH) data. The fused lasso criterion leads to a convex optimization problem, and we provide a fast algorithm for its solution. Estimates of false-discovery rate are also provided. Our studies show that the new method generally outperforms competing methods for calling gains and losses in CGH data.
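The abstract does not state the criterion itself. As a hedged illustration, the fused lasso for a one-dimensional signal is commonly written in the Lagrangian form below. The notation is an assumption of this sketch: $y_i$ is the measurement at the $i$-th genomic position, $\beta_i$ is the smoothed value, and $\lambda_1, \lambda_2 \ge 0$ are tuning parameters.

$$\hat\beta = \arg\min_{\beta} \; \frac{1}{2}\sum_{i=1}^{n} (y_i - \beta_i)^2 + \lambda_1 \sum_{i=1}^{n} |\beta_i| + \lambda_2 \sum_{i=2}^{n} |\beta_i - \beta_{i-1}|$$

The first penalty shrinks individual values toward zero (no gain or loss), while the second shrinks differences between neighbouring positions, which favours piecewise-constant regions. Both penalties are convex, consistent with the abstract's statement that the criterion leads to a convex optimization problem.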

2.
The Cox proportional hazards regression model is the most popular approach to model covariate information for survival times. In this context, the development of high-dimensional models where the number of covariates is much larger than the number of observations ($p \gg n$) is an ongoing challenge. A practicable approach is to use ridge penalized Cox regression in such situations. Besides focusing on finding the best prediction rule, one is often interested in determining a subset of covariates that are the most important ones for prognosis. This could be a gene set in the biostatistical analysis of microarray data. Covariate selection can then, for example, be done by L1-penalized Cox regression using the lasso (Tibshirani (1997). Statistics in Medicine 16, 385–395). Several approaches beyond the lasso that incorporate covariate selection have been developed in recent years. These include modifications of the lasso as well as nonconvex variants such as smoothly clipped absolute deviation (SCAD) (Fan and Li (2001). Journal of the American Statistical Association 96, 1348–1360; Fan and Li (2002). The Annals of Statistics 30, 74–99). The purpose of this article is to implement them practically into the model-building process when analyzing high-dimensional data with the Cox proportional hazards model. To evaluate penalized regression models beyond the lasso, we included SCAD variants and the adaptive lasso (Zou (2006). Journal of the American Statistical Association 101, 1418–1429). We compare them with "standard" applications such as ridge regression, the lasso, and the elastic net. Predictive accuracy, features of variable selection, and estimation bias will be studied to assess the practical use of these methods. We observed that the performance of SCAD and adaptive lasso is highly dependent on nontrivial preselection procedures. A practical solution to this problem does not yet exist. Since there is a high risk of missing relevant covariates when using SCAD or adaptive lasso applied after an inappropriate initial selection step, we recommend staying with the lasso or the elastic net in actual data applications. But with respect to the promising results for truly sparse models, we see some advantage of SCAD and adaptive lasso, if better preselection procedures were available. This requires further methodological research.
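To make the contrast between the lasso and SCAD penalties concrete, here is a minimal sketch of the two penalty functions for a single coefficient. It is illustrative only and is not the article's implementation; the function names are hypothetical, and `a=3.7` is the default value commonly suggested for SCAD.

```python
def lasso_penalty(beta, lam):
    """L1 penalty: grows linearly with |beta|, so large effects stay biased."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty for one coefficient (hypothetical helper, a > 2).

    Matches the lasso near zero, then bends over and becomes constant,
    so large coefficients are not shrunk further.
    """
    t = abs(beta)
    if t <= lam:
        # Same as the lasso for small coefficients.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Flat beyond a * lam: no additional penalty for large coefficients.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    for b in (0.5, 2.0, 10.0):
        print(b, lasso_penalty(b, 1.0), scad_penalty(b, 1.0))
```

The flat tail is what reduces estimation bias for strong effects, at the cost of a nonconvex objective.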

3.
Discovering regulatory interactions from time-course gene expression data constitutes a canonical problem in functional genomics and systems biology. The framework of graphical Granger causality allows one to estimate such causal relationships from these data. In this study, we propose adaptively thresholding estimates of Granger causal effects obtained from the lasso penalization method. We establish the asymptotic properties of the proposed technique, and discuss the advantages it offers over competing methods, such as the truncating lasso. Its performance and that of its competitors are assessed on a number of simulated settings, and it is applied to a data set that captures the activation of T-cells.

4.
Many approaches for variable selection with multiply imputed (MI) data in the development of a prognostic model have been proposed. However, no method prevails as uniformly best. We conducted a simulation study with a binary outcome and a logistic regression model to compare two classes of variable selection methods in the presence of MI data: (I) Model selection on bootstrap data, using backward elimination based on AIC or lasso, and fitting the final model based on the most frequently (e.g. ) selected variables over all MI and bootstrap data sets; (II) Model selection on original MI data, using lasso. The final model is obtained by (i) averaging estimates of variables that were selected in any MI data set or (ii) in 50% of the MI data; (iii) performing lasso on the stacked MI data, and (iv) as in (iii) but using individual weights as determined by the fraction of missingness. In all lasso models, we used both the optimal penalty and the 1-se rule. We considered recalibrating models to correct for overshrinkage due to the suboptimal penalty by refitting the linear predictor or all individual variables. We applied the methods to a real dataset of 951 adult patients with tuberculous meningitis to predict mortality within nine months. Overall, applying lasso selection with the 1-se penalty shows the best performance in both approaches I and II. Stacking MI data is an attractive approach because it does not require choosing a selection threshold when combining results from separate MI data sets.
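The "1-se rule" mentioned above can be sketched as follows. This is a minimal illustration under assumed inputs (a grid of penalties with cross-validated mean errors and their standard errors), not the authors' code; the function name and the numbers are hypothetical.

```python
def one_se_lambda(lambdas, cv_mean, cv_se):
    """Return (optimal penalty, 1-se penalty) from cross-validation results.

    lambdas : candidate penalty values
    cv_mean : mean cross-validated error for each penalty
    cv_se   : standard error of that mean for each penalty
    """
    # Index of the penalty with the lowest mean cross-validated error.
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    # Any penalty whose error is within one standard error of the minimum
    # is treated as statistically indistinguishable from the best one.
    threshold = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    # The 1-se rule picks the largest such penalty, i.e. the sparsest model.
    return lambdas[best], max(eligible)


if __name__ == "__main__":
    # Made-up cross-validation results, for illustration only.
    lambdas = [0.01, 0.1, 0.5, 1.0]
    cv_mean = [0.35, 0.28, 0.29, 0.40]
    cv_se = [0.02, 0.02, 0.02, 0.03]
    print(one_se_lambda(lambdas, cv_mean, cv_se))
```

Because the 1-se penalty is deliberately larger than the optimal one, coefficients are shrunk more strongly, which is the overshrinkage that the recalibration step in the abstract is meant to correct.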

5.
Microcin J25 (MccJ25) is a plasmid-encoded, ribosomally synthesized antibacterial peptide with a unique lasso structure. The lasso structure, produced with the aid of two processing enzymes, provides exceptional stability to MccJ25. We report the synthesis of six peptides (1-6), derived from the MccJ25 sequence, that are designed to adopt a folded conformation through disulfide bond formation and electrostatic or hydrophobic interactions. Two peptides (1 and 6) display good activity against Salmonella newport, and are the first synthetic derivatives of MccJ25 that are bactericidal. Peptide 1 displays potent activity against several Salmonella strains, including two MccJ25-resistant strains. The solution conformation and the stability studies of the active peptides suggest that they do not fold into a lasso conformation and that peptide 1 displays antimicrobial activity by inhibition of target cell respiration. Like MccJ25, the synthetic MccJ25 derivatives display minimal toxicity to mammalian cells, suggesting that these peptides act specifically on bacterial cells.

6.
Many estimators of the average effect of a treatment on an outcome require estimation of the propensity score, the outcome regression, or both. It is often beneficial to utilize flexible techniques, such as semiparametric regression or machine learning, to estimate these quantities. However, optimal estimation of these regressions does not necessarily lead to optimal estimation of the average treatment effect, particularly in settings with strong instrumental variables. A recent proposal addressed these issues via the outcome-adaptive lasso, a penalized regression technique for estimating the propensity score that seeks to minimize the impact of instrumental variables on treatment effect estimators. However, a notable limitation of this approach is that its application is restricted to parametric models. We propose a more flexible alternative that we call the outcome highly adaptive lasso. We discuss the large sample theory for this estimator and propose closed-form confidence intervals based on the proposed estimator. We show via simulation that our method offers benefits over several popular approaches.

7.
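[Objective] Lasso peptides, a class of ribosomally synthesized and post-translationally modified peptides (RiPPs), are widely distributed in actinomycetes and have attracted broad attention for their unique modified structures and diverse physiological activities. To better study uncharacterized lasso peptides, we sought to develop a Streptomyces-based cell-free transcription-translation platform (hereafter the "cell-free platform") for the cell-free synthesis of lasso peptides or their precursor peptides. [Methods] We first attempted to build cell-free synthesis platforms from different Streptomyces strains and optimized the platform yield using green fluorescent protein as the reporter protein. After establishing a suitable and stable expression system, we introduced plasmids carrying lasso peptide biosynthetic genes into the system to explore the cell-free synthesis of lasso peptides. [Results] After optimizing multiple parameters of the cell-free system based on the model strain Streptomyces lividans TK24, including the preparation process, system components, and reaction conditions, the system reached a maximum fluorescent protein expression level of 90 μg/mL. Using this system, we successfully expressed the precursor peptide of the target lasso peptide and increased its stability in the system by fusing a SUMO tag. [Conclusion] This study successfully established a Streptomyces cell-free platform, making it possible to express genes from a wide range of sources. Although the applicability of the system to expressing uncharacterized lasso peptide proteins still needs further improvement, cell-free platforms will play an increasingly important role in the exploration of natural products.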

8.
Background: For understanding biological cellular systems, it is important to analyze interactions between protein residues and RNA bases. A method based on conditional random fields (CRFs) was developed for predicting contacts between residues and bases, which receives multiple sequence alignments for given protein and RNA sequences, respectively, and learns the model with many parameters involved in relationships between neighboring residue-base pairs by maximizing the pseudo likelihood function. Methods: In this paper, we proposed a novel CRF-based model with more complicated dependency relationships between random variables than the previous model, but which uses fewer parameters to avoid overfitting to the training data. Results: We performed cross-validation experiments to evaluate the proposed model, and took the average of AUC (area under receiver operating characteristic curve) scores. The result suggests that the proposed CRF-based model without L1-norm regularization (lasso) outperforms the existing model with and without the lasso under several input observations to CRFs. Conclusions: We proposed a novel stochastic model for predicting protein-RNA residue-base contacts, and improved the prediction accuracy in terms of the AUC score. It implies that more dependency relationships in a CRF could be controlled by fewer parameters.

9.
Errors-in-variables models in high-dimensional settings pose two challenges in application. First, the number of observed covariates is larger than the sample size, while only a small number of covariates are true predictors under an assumption of model sparsity. Second, the presence of measurement error can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMulation-SELection-EXtrapolation (SIMSELEX) is proposed. This procedure makes double use of lasso methodology. First, the lasso is used to estimate sparse solutions in the simulation step, after which a group lasso is implemented to do variable selection. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors-in-variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown in the Supporting Information how SIMSELEX can be applied to spline-based regression models. A simulation study is conducted to compare the SIMSELEX estimators to existing methods in the linear and logistic model settings, and to evaluate performance compared to naive methods in the Cox and spline models. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.

10.
Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each dataset. However, in high-dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each dataset, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach with a lasso penalty, we induce sparsity in the canonical vectors. We compare the performance with other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic dataset.

11.
As many wildlife species, including wading birds, adapt to anthropogenic landscapes and, in some cases, exhibit altered behaviors, studies that involve capturing birds may require new methods better suited for use in urban areas and to accommodate altered animal behavior. We developed two novel techniques, a leg lasso and flip net, for capturing American White Ibises (Eudocimus albus) in urban environments in southern Florida, and also used a traditional technique (mist-nets) in non-urban wetland habitats. The flip net and leg lasso were developed to capture White Ibises habituated to the presence of humans. Ibises were captured in urban and wetland environments from October 2015 to August 2017 in Palm Beach, Broward, and Lee counties, Florida. We captured 6.0 ± 13.5 ibis/h with the flip net, 1.6 ± 0.8 ibis/h with the leg lasso, and 0.5 ± 2.6 ibis/h with mist-nets. We captured larger (higher mass to tarsus length ratio) birds using the flip net and leg lasso than using mist-nets, and captured more males with leg lassos than with the other two techniques. The novel techniques we used are efficient, cost effective, easy to use, and also potentially useful for capturing other species of birds. Leg lassos and flip nets are also safe to use in populated areas for both birds and humans.

12.
The antimicrobial peptide microcin J25 (MccJ25) is posttranslationally matured from a linear preprotein into its native lasso conformation by two enzymes. One of these enzymes cleaves the preprotein and the second enzyme installs the requisite isopeptide bond to establish the lasso structure. Analysis of a mimic of MccJ25 that can be cyclized without the influence of the maturation enzymes suggests that MccJ25 does not spontaneously adopt a near-lasso structure. In addition, we conducted atomistically detailed replica-exchange molecular dynamics simulations of pro-microcin J25 (pro-MccJ25), the 21-residue uncyclized analog of MccJ25, to determine the conformational ensemble explored in the absence of the leader sequence or maturation enzymes. We applied a nonlinear dimensionality reduction technique known as the diffusion map to the simulation trajectories to extract two global order parameters describing the fundamental dynamical motions of the system, and identify three distinct pathways. One path corresponds to the spontaneous adoption of a left-handed lasso, in which the N-terminus wraps around the C-terminus in the opposite sense to the right-handed topology of native MccJ25. Our computational and experimental results suggest a role for the MccJ25 leader sequence and/or its maturation enzymes in facilitating the adoption of the right-handed topology.

13.
Direct chemical labeling of antibody produces molecules with poorly defined modifications. Use of a small antibody-binding protein as an adapter can simplify antibody functionalization by forming a specific antibody-bound complex and introducing site-specific modifications. To stabilize a noncovalent antibody complex that may be used without chemical crosslinking, a bivalent antibody-binding protein is engineered with an improved affinity of interaction by joining two Z domains with a conformationally flexible linker. The linker is essential for the increase in affinity because it allows simultaneous binding of both domains. The molecule is further circularized using a split intein, creating a novel adapter protein ("lasso"), which binds human immunoglobulin G1 (IgG1) with $K_D$ = 0.53 nM and a dissociation rate that is 55- to 84-fold slower than Z. The lasso contains a unique cysteine for conjugation with a reporter and may be engineered to introduce other functional groups, including a biotin tag and protease recognition sequences. When used in enzyme-linked immunosorbent assay (ELISA), the lasso generates a stronger reporter signal compared to a secondary antibody and lowers the limit of detection by 12-fold. The small size of the lasso and a long half-life of dissociation make the peptide a useful tool in antibody detection and immobilization.

14.

With the increasing availability of microbiome 16S data, network estimation has become a useful approach to studying the interactions between microbial taxa. Network estimation on a set of variables is frequently explored using graphical models, in which the relationship between two variables is modeled via their conditional dependency given the other variables. Various methods for sparse inverse covariance estimation have been proposed to estimate graphical models in the high-dimensional setting, including graphical lasso. However, current methods do not address the compositional count nature of microbiome data, where abundances of microbial taxa are not directly measured, but are reflected by the observed counts in an error-prone manner. Adding to the challenge is that the sum of the counts within each sample, termed “sequencing depth,” is an experimental technicality that carries no biological information but can vary drastically across samples. To address these issues, we develop a new approach to network estimation, called BC-GLASSO (bias-corrected graphical lasso), which models the microbiome data using a logistic normal multinomial distribution with the sequencing depths explicitly incorporated, corrects the bias of the naive empirical covariance estimator arising from the heterogeneity in sequencing depths, and builds the inverse covariance estimator via graphical lasso. We demonstrate the advantage of BC-GLASSO over current approaches to microbial interaction network estimation under a variety of simulation scenarios. We also illustrate the efficacy of our method in an application to a human microbiome data set.
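For orientation, the standard graphical lasso estimator that the abstract builds on can be written as below. This is a hedged sketch in assumed notation rather than the paper's own formulation: $S$ denotes a covariance estimate, $\Theta$ the inverse covariance (precision) matrix, and $\lambda \ge 0$ a tuning parameter.

$$\hat\Theta = \arg\max_{\Theta \succ 0} \; \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert \Theta \rVert_1$$

A zero entry $\hat\Theta_{jk} = 0$ is read as conditional independence between variables $j$ and $k$ given the others, so the nonzero pattern defines the estimated network. According to the abstract, BC-GLASSO replaces the naive empirical covariance in the role of $S$ with a bias-corrected estimate that accounts for heterogeneous sequencing depths.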


15.
A shuttle vector pHSG396Sp was constructed to perform gene expression using Sphingomonas subterranea as a host. A new lasso peptide biosynthetic gene cluster, derived from Brevundimonas diminuta, was amplified by PCR and integrated to afford an expression vector, pHSG396Sp-12697L. The new lasso peptide brevunsin was successfully produced by S. subterranea harboring the expression vector, with a high production yield (10.2 mg from 1 L culture). The chemical structure of brevunsin was established by NMR and MS/MS experiments. Based on the information obtained from the NOE experiment, the three-dimensional structure of brevunsin was determined, which indicated that brevunsin possessed a typical lasso structure. This expression vector system provides a new heterologous production method for unexplored lasso peptides that are encoded by bacterial genomes.

16.
A method is proposed that aims at identifying clusters of individuals that show similar patterns when observed repeatedly. We consider linear mixed models, which are widely used for the modeling of longitudinal data. In contrast to the classical assumption of a normal distribution for the random effects, a finite mixture of normal distributions is assumed. Typically, the number of mixture components is unknown and has to be chosen, ideally by data-driven tools. For this purpose, an EM algorithm-based approach is considered that uses a penalized normal mixture as random effects distribution. The penalty term shrinks the pairwise distances of cluster centers based on the group lasso and the fused lasso method. The effect is that individuals with similar time trends are merged into the same cluster. The strength of regularization is determined by one penalization parameter. For finding the optimal penalization parameter, a new model choice criterion is proposed.

17.
The glucagon receptor antagonist BI-32169, recently isolated from Streptomyces sp., was described as a bicyclic peptide, although its primary structure comprises conserved elements of class I and class II lasso peptides. Tandem mass spectrometric and nuclear magnetic resonance spectroscopic studies revealed that BI-32169 is a lasso-structured peptide constituting the new class III of lasso peptides. The determined lasso fold opens new avenues to improve the promising biological activity of BI-32169.

18.
Lasso peptide isopeptidase is an enzyme that specifically hydrolyzes the isopeptide bond of lasso peptides, rendering these peptides linear. To carry out a detailed structure-activity analysis of the lasso peptide isopeptidase AtxE2 from Asticcacaulis excentricus, we solved NMR structures of its substrates astexin-2 and astexin-3. Using in vitro enzyme assays, we show that the C-terminal tail portion of these peptides is dispensable with regard to isopeptidase activity. A collection of astexin-2 and astexin-3 variants with alanine substitutions at each position within the ring and the loop was constructed, and we showed that all of these peptides except for one were cleaved by the isopeptidase. Thus, much like the lasso peptide biosynthetic enzymes, lasso peptide isopeptidase has broad substrate specificity. Quantitative analysis of the cleavage reactions indicated that alanine substitutions in loop positions of these peptides led to reduced cleavage, suggesting that the loop serves as a recognition element for the isopeptidase.

19.
Lu Xia, Bin Nan, Yi Li. Biometrics, 2023, 79(1): 344-357
Modeling and drawing inference on the joint associations between single-nucleotide polymorphisms and a disease has sparked interest in genome-wide association studies. In the motivating Boston Lung Cancer Survival Cohort (BLCSC) data, the presence of a large number of single-nucleotide polymorphisms of interest, though smaller than the sample size, challenges inference on their joint associations with the disease outcome. In similar settings, we find that neither the debiased lasso approach (van de Geer et al., 2014), which assumes sparsity on the inverse information matrix, nor the standard maximum likelihood method can yield confidence intervals with satisfactory coverage probabilities for generalized linear models. Under this "large n, diverging p" scenario, we propose an alternative debiased lasso approach by directly inverting the Hessian matrix without imposing the matrix sparsity assumption, which further reduces bias compared to the original debiased lasso and ensures valid confidence intervals with nominal coverage probabilities. We establish the asymptotic distributions of any linear combinations of the parameter estimates, which lays the theoretical ground for drawing inference. Simulations show that the proposed refined debiased estimating method performs well in removing bias and yields honest confidence interval coverage. We use the proposed method to analyze the aforementioned BLCSC data, a large-scale hospital-based epidemiology cohort study investigating the joint effects of genetic variants on lung cancer risks.
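The debiasing step described above is usually presented as a one-step correction of the lasso estimate. The following is a hedged sketch in assumed notation, not a formula quoted from the paper: $\hat\beta$ is the lasso estimate, $\ell_n$ is the average negative log-likelihood of the generalized linear model, and $\hat\Theta$ is an estimate of the inverse of its Hessian.

$$\hat b = \hat\beta - \hat\Theta \, \nabla \ell_n(\hat\beta)$$

In the original debiased lasso, $\hat\Theta$ is obtained by a sparsity-inducing procedure. Under the reading of the abstract taken here, the proposed alternative sets $\hat\Theta = \left[\nabla^2 \ell_n(\hat\beta)\right]^{-1}$ directly, which is feasible because the number of predictors is smaller than the sample size.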

20.
ABSTRACT: BACKGROUND: Variations in DNA copy number carry information on the modalities of genome evolution and mis-regulation of DNA replication in cancer cells. Their study can help localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of different high-throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand. This problem encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual. RESULTS: We present a segmentation method named generalized fused lasso (GFL) to reconstruct copy number variant regions that is based on penalized estimation and is capable of processing multiple signals jointly. Our approach is computationally very attractive and leads to sensitivity and specificity levels comparable to those of state-of-the-art specialized methodologies. We illustrate its applicability with simulated and real data sets. CONCLUSIONS: The flexibility of our framework makes it applicable to data obtained with a wide range of technologies. Its versatility and speed make GFL particularly useful in the initial screening stages of large data sets.
