Similar articles
Found 20 similar articles; search took 31 ms
1.
Errors‐in‐variables models in high‐dimensional settings pose two challenges in application. First, the number of observed covariates is larger than the sample size, while only a small number of covariates are true predictors under an assumption of model sparsity. Second, the presence of measurement error can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMulation‐SELection‐EXtrapolation (SIMSELEX) is proposed. This procedure makes double use of lasso methodology. First, the lasso is used to estimate sparse solutions in the simulation step, after which a group lasso is implemented to do variable selection. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors‐in‐variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown in the Supporting Information how SIMSELEX can be applied to spline‐based regression models. A simulation study is conducted to compare the SIMSELEX estimators to existing methods in the linear and logistic model settings, and to evaluate performance compared to naive methods in the Cox and spline models. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.
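The simulation and extrapolation steps that SIMSELEX builds on can be illustrated with classical SIMEX in a one‐covariate linear model. The following is a minimal Python sketch, not the authors' procedure: it omits the lasso and group lasso selection steps entirely, and the function name `simex_slope`, the grid of added‐noise levels, and the quadratic extrapolant are illustrative assumptions.

```python
import numpy as np


def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_sim=100, seed=0):
    """Classical SIMEX for the slope of y = a + b*x + e when only w = x + u is seen.

    sigma_u is the (assumed known) standard deviation of the measurement error u.
    Returns (naive_slope, simex_slope). Hypothetical helper for illustration only.
    """
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)

    # lambda = 0 corresponds to the naive fit that ignores measurement error.
    lam_grid = [0.0]
    slopes = [np.polyfit(w, y, 1)[0]]

    # Simulation step: add extra error with variance lambda * sigma_u**2,
    # refit, and average the slope over n_sim replicates for each lambda.
    for lam in lambdas:
        sims = []
        for _ in range(n_sim):
            w_lam = w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.shape)
            sims.append(np.polyfit(w_lam, y, 1)[0])
        lam_grid.append(lam)
        slopes.append(float(np.mean(sims)))

    # Extrapolation step: fit a quadratic in lambda and evaluate it at
    # lambda = -1, the level that corresponds to no measurement error.
    coef = np.polyfit(lam_grid, slopes, 2)
    return slopes[0], float(np.polyval(coef, -1.0))
```

On synthetic data with a known slope, the extrapolated value would be expected to sit closer to the truth than the attenuated naive slope, although a quadratic extrapolant typically removes only part of the bias.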

2.
3.
Data‐driven materials discovery has become increasingly important in identifying materials that exhibit specific, desirable properties from a vast chemical search space. Synergic prediction and experimental validation are needed to accelerate scientific advances related to critical societal applications. A design‐to‐device study that uses high‐throughput screens with algorithmic encodings of structure–property relationships is reported to identify new materials with panchromatic optical absorption, whose photovoltaic device applications are then experimentally verified. The data‐mining methods source 9431 dye candidates, which are auto‐generated from the literature using a custom text‐mining tool. These candidates are sifted via a data‐mining workflow that is tailored to identify optimal combinations of organic dyes that have complementary optical absorption properties such that they can harvest all available sunlight when acting as co‐sensitizers for dye‐sensitized solar cells (DSSCs). Six promising dye combinations are shortlisted for device testing, whereupon one dye combination yields co‐sensitized DSSCs with power conversion efficiencies comparable to those of the high‐performance, organometallic dye, N719. These results demonstrate how data‐driven molecular engineering can accelerate materials discovery for panchromatic photovoltaic or other applications.

4.
Summary: Second‐generation sequencing (sec‐gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T's, between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base‐calling. The complexity of the base‐calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across‐sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec‐gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base‐calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base‐calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base‐calling performance.

5.
Many research questions in fields such as personalized medicine, drug screens or systems biology depend on obtaining consistent and quantitatively accurate proteomics data from many samples. SWATH‐MS is a specific variant of data‐independent acquisition (DIA) methods and is emerging as a technology that combines deep proteome coverage capabilities with quantitative consistency and accuracy. In a SWATH‐MS measurement, all ionized peptides of a given sample that fall within a specified mass range are fragmented in a systematic and unbiased fashion using rather large precursor isolation windows. To analyse SWATH‐MS data, a strategy based on peptide‐centric scoring has been established, which typically requires prior knowledge about the chromatographic and mass spectrometric behaviour of peptides of interest in the form of spectral libraries and peptide query parameters. This tutorial provides guidelines on how to set up and plan a SWATH‐MS experiment, how to perform the mass spectrometric measurement and how to analyse SWATH‐MS data using peptide‐centric scoring. Furthermore, concepts on how to improve SWATH‐MS data acquisition, potential trade‐offs of parameter settings and alternative data analysis strategies are discussed.

6.
Evaluating the classification accuracy of a candidate biomarker signaling the onset of disease or disease status is essential for medical decision making. A good biomarker would accurately identify the patients who are likely to progress or die at a particular time in the future or who are in urgent need of active treatments. To assess the performance of a candidate biomarker, the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are commonly used. In many cases, the standard simple random sampling (SRS) design used for biomarker validation studies is costly and inefficient. In order to improve the efficiency and reduce the cost of biomarker validation, marker‐dependent sampling (MDS) may be used. In an MDS design, the selection of patients to assess true survival time is dependent on the result of a biomarker assay. In this article, we introduce a nonparametric estimator for time‐dependent AUC under an MDS design. The consistency and the asymptotic normality of the proposed estimator are established. Simulation shows the unbiasedness of the proposed estimator and a significant efficiency gain of the MDS design over the SRS design.
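For orientation, the ordinary (non‐time‐dependent) AUC mentioned above can be computed nonparametrically as the proportion of case–control pairs in which the case has the higher marker value, counting ties as one half. The Python sketch below shows only this basic quantity under simple random sampling; it is not the time‐dependent, MDS‐weighted estimator proposed in the article, and the function name `empirical_auc` is a hypothetical choice.

```python
import numpy as np


def empirical_auc(marker_cases, marker_controls):
    """Mann-Whitney estimate of the AUC: P(case > control) + 0.5 * P(tie).

    Illustrative helper only; it ignores censoring, time dependence,
    and any marker-dependent sampling weights.
    """
    cases = np.asarray(marker_cases, dtype=float)[:, None]
    controls = np.asarray(marker_controls, dtype=float)[None, :]
    # Broadcasting builds the full case-by-control comparison matrix.
    wins = cases > controls
    ties = cases == controls
    return float(np.mean(wins + 0.5 * ties))
```

A marker that separates the two groups perfectly gives a value of 1, while a marker with identical distributions in both groups gives a value near 0.5.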

7.
Summary: We propose a Bayesian chi‐squared model diagnostic for analysis of data subject to censoring. The test statistic has the form of Pearson's chi‐squared test statistic and is easy to calculate from standard output of Markov chain Monte Carlo algorithms. The key innovation of this diagnostic is that it is based only on observed failure times. Because it does not rely on the imputation of failure times for observations that have been censored, we show that under heavy censoring it can have higher power for detecting model departures than a comparable test based on the complete data. In a simulation study, we show that tests based on this diagnostic exhibit comparable power and better nominal Type I error rates than a commonly used alternative test proposed by Akritas (1988, Journal of the American Statistical Association 83, 222–230). An important advantage of the proposed diagnostic is that it can be applied to a broad class of censored data models, including generalized linear models and other models with nonidentically distributed and nonadditive error structures. We illustrate the proposed model diagnostic for testing the adequacy of two parametric survival models for Space Shuttle main engine failures.

8.
9.
10.
We present new inference methods for the analysis of low‐ and high‐dimensional repeated measures data from two‐sample designs. The designs may be unbalanced, the number of repeated measures per subject may be larger than the number of subjects, covariance matrices are not assumed to be spherical, and they can differ between the two samples. In comparison, we demonstrate how crucial it is for the popular Huynh‐Feldt (HF) method to make the restrictive and often unrealistic or unjustifiable assumption of equal covariance matrices. The new method is shown to maintain desired α‐levels better than the well‐known HF correction, as demonstrated in several simulation studies. The proposed test gains power when the number of repeated measures is increased in a manner that is consistent with the alternative. Thus, even increasing the number of measurements on the same subject may lead to an increase in power. Application of the new method is illustrated in detail, using two different real data sets. In one of them, the number of repeated measures per subject is smaller than the sample size, while in the other one, it is larger.

11.
This paper discusses two‐sample comparison in the case of interval‐censored failure time data. For this problem, one common approach is to employ nonparametric test procedures, which usually give p‐values but not a direct or exact quantitative measure of the survival or treatment difference of interest. In particular, these procedures cannot provide a hazard ratio estimate, which is commonly used to measure the difference between the two treatments or samples. For interval‐censored data, a few nonparametric test procedures have been developed, but there does not seem to exist a procedure for hazard ratio estimation. To address this, we present two procedures for nonparametric estimation of the hazard ratio of the two samples for interval‐censored data situations. They are generalizations of the corresponding procedures for right‐censored failure time data. An extensive simulation study is conducted to evaluate the performance of the two procedures and indicates that they work reasonably well in practice. For illustration, they are applied to a set of interval‐censored data arising from a breast cancer study.

12.
13.
Oligodendrocytes are a type of neuroglia that provide trophic support and insulation to axons in the central nervous system. The genesis and maturation of oligodendrocytes are essential processes for myelination and the course of CNS development. Using ion mobility‐enhanced, data‐independent acquisitions and 2D‐nanoUPLC fractionation operating at nanoscale flow rates, we established a comprehensive data set of proteins expressed by the human oligodendroglia cell line MO3.13. The final dataset incorporating all fractions comprised 223 531 identified peptides assigned to 10 390 protein hits, an improvement of 4.5 times on identified proteins described previously by our group using the same cell line. Identified proteins play pivotal roles in many biological processes such as cell growth and development and energy metabolism, providing a rich resource for future studies on oligodendrocyte development, myelination, axonal support, and the regulation of such processes. Our results can help further studies that use MO3.13 cells as a tool of investigation, not only in relation to oligodendrocyte maturation, but also to diseases that have oligodendrocytes as key players. All MS data have been deposited in the ProteomeXchange with identifier PXD004696.

14.
15.
16.
In order to assign the absolute configurations of 8‐tert‐butyl‐2‐hydroxy‐7‐methoxy‐8‐methyl‐9‐oxa‐6‐azaspiro[4.5]dec‐6‐en‐10‐one (2a, 2b), their esters (5a, 5b, 5c, 5d) with (R)‐ or (S)‐2‐methoxyphenylacetic acid (4a, 4b) have been synthesized. The absolute configurations of these compounds have been determined on the basis of NOESY correlations between the protons of the tert‐butyl group and the cyclopentane fragment of the molecules. The crucial part of this analysis was assignment of the absolute configuration at C‐5. Additionally, by calculation of the chemical shift anisotropy, δRS, for the relevant protons, it was also possible to confirm the absolute configurations at the C‐2 centres of compounds 2a, 2b and 5a, 5b, 5c, 5d. Chirality, 25:422–426, 2013. © 2013 Wiley Periodicals, Inc.

17.
Redundancy Analysis (RDA) is a well‐known method used to describe the directional relationship between related data sets. Recently, we proposed sparse Redundancy Analysis (sRDA) for high‐dimensional genomic data analysis to find explanatory variables that explain the most variance of the response variables. As more and more biomolecular data become available from different biological levels, such as genotypic and phenotypic data from different omics domains, a natural research direction is to apply an integrated analysis approach in order to explore the underlying biological mechanism of certain phenotypes of the given organism. We show that the multiset sparse Redundancy Analysis (multi‐sRDA) framework is a prominent candidate for high‐dimensional omics data analysis since it accounts for the directional information transfer between omics sets, and, through its sparse solutions, the interpretability of the result is improved. In this paper, we also describe a software implementation for multi‐sRDA, based on the Partial Least Squares Path Modeling algorithm. We test our method through simulation and real omics data analysis with data sets of 364,134 methylation markers, 18,424 gene expression markers, and 47 cytokine markers measured on 37 patients with Marfan syndrome.

18.
A rapid micro‐scale solid‐phase micro‐extraction (SPME) procedure coupled with gas chromatography with flame ionization detection (GC‐FID) was used to extract parts per billion levels of a principal basmati aroma compound, “2‐acetyl‐1‐pyrroline” (2‐AP), from bacterial samples. In the present investigation, optimization of bacterial incubation period, sample weight, pre‐incubation time, adsorption time and temperature, and precursors and their concentrations was studied. Under the optimized conditions, 2‐AP produced by Bacillus cereus ATCC10702 was detected at 85 μg/kg using only 0.5 g of sample. Along with 2‐AP, 15 other compounds produced by B. cereus were reported, 14 of them for the first time, consisting mainly of (E)‐2‐hexenal, pentadecanal, 4‐hydroxy‐2‐butanone, n‐hexanal, 2,6‐nonadienal, 3‐methoxy‐2(5H)‐furanone, 2‐acetyl‐1‐pyridine and octanal. High recovery of 2‐AP (87%) from very small amounts of B. cereus sample was observed. The method is reproducible and fast, and can be used for detection of 2‐AP production by B. cereus. © 2014 American Institute of Chemical Engineers Biotechnol. Prog., 30:1356–1363, 2014

19.
A sensitive and simple spectrofluorimetric method has been developed and validated for the determination of the anti‐epileptic drug carbamazepine (CBZ) in its dosage forms. The method was based on a nucleophilic substitution reaction of CBZ with 4‐chloro‐7‐nitrobenzo‐2‐oxa‐1,3‐diazole (NBD‐Cl) in borate buffer (pH 9) to form a highly fluorescent derivative that was measured at 530 nm after excitation at 460 nm. Factors affecting the formation of the reaction product were studied and optimized, and the reaction mechanism was postulated. The fluorescence–concentration plot is rectilinear over the range of 0.6–8 µg/mL with limit of detection of 0.06 µg/mL and limit of quantitation of 0.19 µg/mL. The method was applied to the analysis of commercial tablets and the results were in good agreement with those obtained using the reference method. Validation of the analytical procedures was evaluated according to ICH guidelines. Copyright © 2015 John Wiley & Sons, Ltd.
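One common way to obtain such limits under the ICH validation guideline uses the residual standard deviation σ of the calibration line and its slope S, with LOD = 3.3σ/S and LOQ = 10σ/S. The abstract does not say which approach the authors used, so the Python sketch below is only an illustration of that convention; the function name `lod_loq` and its inputs are hypothetical.

```python
def lod_loq(sigma, slope):
    """Detection and quantitation limits from a calibration line.

    sigma: standard deviation of the response (e.g. residual SD of the fit).
    slope: slope of the calibration curve (response per unit concentration).
    Returns (LOD, LOQ) in concentration units. Illustrative helper only.
    """
    if slope <= 0:
        raise ValueError("slope must be positive")
    lod = 3.3 * sigma / slope   # limit of detection
    loq = 10.0 * sigma / slope  # limit of quantitation
    return lod, loq
```

Under this convention the two limits always differ by a factor of about three, which is roughly the ratio between the values reported in the abstract.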

20.
Advanced electrode materials with bendability and stretchability are critical for the rapid development of fully flexible/stretchable lithium‐ion batteries. However, sufficiently stretchable lithium‐ion batteries remain underdeveloped, which is one of the biggest challenges to realizing fully deformable power sources. Here, a low‐temperature hydrothermal synthesis of a cathode material for stretchable lithium‐ion batteries is reported, based on the in situ growth of LiMn2O4 (LMO) nanocrystals inside 3D carbon nanotube (CNT) film networks. The LMO/CNT film composite demonstrates chemical bonding between the LMO active materials and CNT scaffolds, which is the most important characteristic of the stretchable electrodes. When coupled with a wrinkled MnOx/CNT film anode, a binder‐free, all‐manganese‐based stretchable full battery cell is assembled, which delivers a high average specific capacity of ≈97 mA h g⁻¹ and stabilizes after over 300 cycles with an enormous strain of 100%. Furthermore, combined with other merits such as low cost, natural abundance, and environmental friendliness, the all‐manganese design is expected to accelerate the practical applications of stretchable lithium‐ion batteries for fully flexible and biomedical electronics.
