Similar literature
1.
A modified amylose containing 10% of tritiated D-allose residues has been hydrolyzed by porcine pancreatic alpha amylase (PPA). This reaction produced a number of radioactive oligosaccharides of low molecular weight, including modified mono-, di-, and tri-saccharides, as well as larger products. Analysis of these products by chemical and enzymic methods identified D-allose, two isomers of modified maltose, and isomers of modified maltotriose. These results may be interpreted in terms of current PPA models to indicate that D-allose residues may be productively bound at all five subsites of the active site of the enzyme. The distribution of modified residues in these products, however, further suggests that productive binding of D-allose at the subsite where catalytic attack occurs (subsite 3) is less favorable than binding of D-glucose. These results are compared with results of a series of PPA substrates having modifications at C-3 and at other positions. Trends observed in enzyme hydrolysis of these modified substrates reflect factors that contribute to PPA catalysis, with respect to steric, electronic, and hydrogen-bonding interactions between enzyme and substrate.

2.
Attaining personalized healthy aging requires accurate monitoring of physiological changes and identifying subclinical markers that predict accelerated or delayed aging. Classic biostatistical methods rely mostly on supervised variables to estimate physiological aging and do not capture the full complexity of inter-parameter interactions. Machine learning (ML) is promising, but its black box nature eludes direct understanding, substantially limiting physician confidence and clinical usage. Using a broad population dataset from the National Health and Nutrition Examination Survey (NHANES) study including routine biological variables, and after selection of XGBoost as the most appropriate algorithm, we created an innovative explainable ML framework to determine a Personalized physiological age (PPA). PPA predicted both chronic disease and mortality independently of chronological age. Twenty-six variables were sufficient to predict PPA. Using SHapley Additive exPlanations (SHAP), we implemented a precise quantitative associated metric for each variable explaining physiological (i.e., accelerated or delayed) deviations from age-specific normative data. Among the variables, glycated hemoglobin (HbA1c) displays a major relative weight in the estimation of PPA. Finally, clustering profiles of identical contextualized explanations reveals different aging trajectories, opening opportunities for specific clinical follow-up. These data show that PPA is a robust, quantitative and explainable ML-based metric that monitors personalized health status. Our approach also provides a complete framework applicable to different datasets or variables, allowing precision physiological age estimation.

3.
A model-based rational strategy for the selection of chromatographic resins is presented. The main question being addressed is that of selecting the optimal chromatographic resin from a few promising alternatives. The methodology starts with chromatographic modeling, parameter acquisition, and model validation, followed by model-based optimization of the chromatographic separation for the resins of interest. Finally, the resins are rationally evaluated based on their optimized operating conditions and performance metrics such as product purity, yield, concentration, throughput, productivity, and cost. Resin evaluation proceeds by two main approaches. In the first approach, Pareto frontiers from multi-objective optimization of conflicting objectives are overlaid for different resins, enabling direct visualization and comparison of resin performances based on the feasible solution space. The second approach involves the transformation of the resin performances into weighted resin scores, enabling the simultaneous consideration of multiple performance metrics and the setting of priorities. The proposed model-based resin selection strategy was illustrated by evaluating three mixed mode adsorbents (ADH, PPA, and HEA) for the separation of a ternary mixture of bovine serum albumin, ovalbumin, and amyloglucosidase. In order of decreasing weighted resin score or performance, the top three resins for this separation were ADH > PPA > HEA. The proposed model-based approach could be a suitable alternative to column scouting during process development, the main strengths being that minimal experimentation is required and resins are evaluated under their ideal working conditions, enabling a fair comparison. This work also demonstrates the application of column modeling and optimization to mixed mode chromatography.
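The two evaluation approaches named in the abstract above (Pareto frontiers and weighted scores) can be illustrated with a minimal sketch. The function names, the metric names, and the assumption that every objective is to be maximized are hypothetical choices made here for illustration; the abstract does not state how the authors computed either quantity.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    Assumption: every objective is to be maximized. A point q dominates p
    when q is at least as good on every objective and strictly better on one.
    """
    front = []
    for p in points:
        dominated = any(
            all(qi >= pi for qi, pi in zip(q, p))
            and any(qi > pi for qi, pi in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


def weighted_score(metrics, weights):
    """Collapse several performance metrics into one score (hypothetical weights)."""
    return sum(weights[name] * metrics[name] for name in weights)


# Illustrative (purity, yield) operating points for one resin; made-up numbers.
candidates = [(0.90, 0.60), (0.80, 0.85), (0.70, 0.70)]
front = pareto_front(candidates)
score = weighted_score({"purity": 0.9, "yield": 0.8}, {"purity": 0.5, "yield": 0.5})
```

Overlaying the fronts of several resins corresponds to the first approach; ranking resins by a score of this kind corresponds to the second.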

4.
In this paper, we investigate K‐group comparisons on survival endpoints for observational studies. In clinical databases for observational studies, treatments for patients are chosen with probabilities that vary depending on their baseline characteristics. This often results in noncomparable treatment groups because of imbalance in baseline characteristics of patients among treatment groups. In order to overcome this issue, we conduct propensity analysis and match the subjects with similar propensity scores across treatment groups or compare weighted group means (or weighted survival curves for censored outcome variables) using the inverse probability weighting (IPW). To this end, multinomial logistic regression has been a popular propensity analysis method to estimate the weights. We propose to use the decision tree method as an alternative propensity analysis due to its simplicity and robustness. We also propose IPW rank statistics, called the Dunnett‐type test and the ANOVA‐type test, to compare 3 or more treatment groups on survival endpoints. Using simulations, we evaluate the finite sample performance of the weighted rank statistics combined with these propensity analysis methods. We demonstrate these methods with a real data example. The IPW method also allows unbiased estimation of the population parameters of each treatment group. In this paper, we limit our discussions to survival outcomes, but all the methods can be easily modified for any type of outcome, such as binary or continuous variables.
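As a rough illustration of the weighted group means mentioned in the abstract above, the sketch below computes normalized inverse-probability-weighted means for uncensored outcomes. It assumes the propensities have already been estimated (by multinomial logistic regression, a decision tree, or any other method); the function name and the example numbers are hypothetical, and the censored-data rank statistics proposed in the paper are not reproduced here.

```python
def ipw_group_means(outcomes, groups, propensities):
    """Normalized IPW mean of the outcome within each treatment group.

    propensities[i] is the estimated probability that subject i received
    the treatment it actually received (assumed to be supplied by the caller).
    """
    weighted_sums, weight_totals = {}, {}
    for y, g, p in zip(outcomes, groups, propensities):
        w = 1.0 / p  # subjects unlikely to get their treatment count more
        weighted_sums[g] = weighted_sums.get(g, 0.0) + w * y
        weight_totals[g] = weight_totals.get(g, 0.0) + w
    return {g: weighted_sums[g] / weight_totals[g] for g in weighted_sums}


# Made-up data: three subjects, two treatment groups.
means = ipw_group_means(
    outcomes=[1.0, 3.0, 2.0],
    groups=["A", "A", "B"],
    propensities=[0.5, 0.25, 0.5],
)
```

Dividing by the sum of the weights rather than by the sample size is a design choice made in this sketch; it keeps each group mean within the range of the observed outcomes.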

5.
In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero‐inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider the maximum likelihood function plus a penalty, including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). An EM (expectation‐maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinate descent algorithm. Furthermore, statistical properties including the standard error formulae are provided. A simulation study shows that the new algorithm not only has more accurate or at least comparable estimation, but also is more robust than the traditional stepwise variable selection. The proposed methods are applied to analyze the health care demand in Germany using the open‐source R package mpath.
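The three penalties named in the abstract above have standard closed forms for a single coefficient, sketched below. This shows the penalty functions only, under the usual textbook definitions; it is not the EM algorithm of the paper and not the interface of the mpath package. The tuning-parameter names (`lam`, `a`, `gamma`) and their default values are assumptions.

```python
def lasso_penalty(beta, lam):
    """LASSO: penalty grows linearly with |beta|."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD (requires a > 2): linear near zero, quadratic blend, then constant."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP (requires gamma > 1): penalty rate decays to zero at |beta| = gamma * lam."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2 * gamma)
    return gamma * lam * lam / 2
```

Unlike LASSO, both SCAD and MCP level off for large coefficients, so large effects are penalized no more than moderate ones.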

6.
Recent genomic evaluation studies using real data and predicting genetic gain by modeling breeding programs have reported moderate expected benefits from the replacement of classic selection schemes by genomic selection (GS) in small ruminants. The objectives of this study were to compare the cost, monetary genetic gain and economic efficiency of classic selection and GS schemes in the meat sheep industry. Deterministic methods were used to model selection based on multi-trait indices from a sheep meat breeding program. Decisional variables related to male selection candidates and progeny testing were optimized to maximize the annual monetary genetic gain (AMGG), that is, a weighted sum of the annual genetic gains in meat and maternal traits. For GS, a reference population of 2000 individuals was assumed and genomic information was available for evaluation of male candidates only. In the classic selection scheme, male breeding values were estimated from own and offspring phenotypes. In GS, different scenarios were considered, differing by the information used to select males (genomic only, genomic+own performance, genomic+offspring phenotypes). The results showed that all GS scenarios were associated with higher total variable costs than classic selection (if the cost of genotyping was 123 euros/animal). In terms of AMGG and economic returns, GS scenarios were found to be superior to classic selection only if genomic information was combined with their own meat phenotypes (GS-Pheno) or with their progeny test information. The predicted economic efficiency, defined as returns (proportional to number of expressions of AMGG in the nucleus and commercial flocks) minus total variable costs, showed that the best GS scenario (GS-Pheno) was up to 15% more efficient than classic selection. For all selection scenarios, optimization increased the overall AMGG, returns and economic efficiency.
In conclusion, our study shows that some forms of GS strategies are more advantageous than classic selection, provided that GS is already initiated (i.e. the initial reference population is available). Optimizing decisional variables of the classic selection scheme could be of greater benefit than including genomic information in optimized designs.

7.
South-east Queensland (Australia) streams were described by 21 local habitat variables that were chosen because of their potential association with fish distribution. An Assessment by a Nearest Neighbour Analysis (ANNA) model used large-scale variables that are robust to human influence to predict what the values of each of the 21 local habitat variables at each site would be without modification from human activity. The ANNA model used elevation, stream order, distance from source and longitude to predict the local habitat variables; other candidate predictor variables (mean rainfall, latitude and catchment area) were not found to be useful. The ANNA model was able to predict five of the 21 local habitat variables (average width, sand (%), cobble (%), rocks (%) and large woody debris) with an R² of at least 0.2. The observed values of these five local habitat variables were used to model the distributions of individual fish species. The species distribution models were developed using logistic regression based on a subset of the data (some of the data were withheld for model validation) and a forward stepwise model selection procedure. There was no difference in predictive performance of fish distribution models for model predictions based on observed values and model predictions based on ANNA predicted values of local habitat variables in the withheld data (p-value = 0.85). Therefore, it is possible to predict the suitability of sites as habitat for given fish species using estimated (estimates based on large-scale variables) natural values of local habitat variables.

8.
Summary: Several statistical methods for detecting associations between quantitative traits and candidate genes in structured populations have been developed for fully observed phenotypes. However, many experiments are concerned with failure‐time phenotypes, which are usually subject to censoring. In this article, we propose statistical methods for detecting associations between a censored quantitative trait and candidate genes in structured populations with complex multiple levels of genetic relatedness among sampled individuals. The proposed methods correct for continuous population stratification using both population structure variables as covariates and the frailty terms attributable to kinship. The relationship between the time‐at‐onset data and genotypic scores at a candidate marker is modeled via a parametric Weibull frailty accelerated failure time (AFT) model as well as a semiparametric frailty AFT model, where the baseline survival function is flexibly modeled as a mixture of Polya trees centered around a family of Weibull distributions. For both parametric and semiparametric models, the frailties are modeled via an intrinsic Gaussian conditional autoregressive prior distribution with the kinship matrix being the adjacency matrix connecting subjects. Simulation studies and applications to the Arabidopsis thaliana line flowering time data sets demonstrated the advantage of the new proposals over existing approaches.

9.
MOTIVATION: Although population-based association mapping may be subject to the bias caused by population stratification, alternative methods that are robust to population stratification such as family-based linkage analysis have lower mapping resolution. Recently, various statistical methods robust to population stratification were proposed for association studies, using unrelated individuals to identify associations between candidate genes and traits of interest. The association between a candidate gene and a quantitative trait is often evaluated via a regression model with inferred population structure variables as covariates, where the residual distribution is customarily assumed to be from a symmetric and unimodal parametric family, such as a Gaussian, although this may be inappropriate for the analysis of many real-life datasets. RESULTS: In this article, we proposed a new structured association (SA) test. Our method corrects for continuous population stratification by first deriving population structure and kinship matrices through a set of random genetic markers and then modeling the relationship between trait values, genotypic scores at a candidate marker and genetic background variables through a semiparametric model, where the error distribution is modeled as a mixture of Polya trees centered around a normal family of distributions. We compared our model to the existing SA tests in terms of model fit, type I error rate, power, precision and accuracy by application to a real dataset as well as simulated datasets.

10.
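Genome-wide selective sweep analysis of Large White and Tongcheng pigs   Cited by: 1 (self-citations: 0, citations by others: 1)
Li Xiuling, Yang Songbai, Tang Zhonglin, Li Kui, Liu Bang, Fan Bin. 《遗传》 (Hereditas), 2012, (10): 53-63
Long-term artificial selection has markedly improved the production performance of pigs, and the genomic regions under selection have accordingly acquired characteristic patterns of genetic variation (selection signatures). Breeds of different types have been subjected to different selection intensities and therefore carry different selection signatures, and selective sweep analysis has gradually become the main approach for detecting them. Based on Porcine 60K SNP chip genotyping data from the commercial Large White breed (n=45) and the indigenous Tongcheng breed (n=45), this study detected selection signatures using the genetic differentiation coefficient Fst. Quality-control criteria were set in the gPLINK software, and a total of 34 304 SNPs were retained for statistical analysis. The Genepop software package was used to calculate the genetic differentiation parameter Fst between the two breeds, giving a mean Fst of 0.3209. Taking Fst>0.7036 (i.e., the top 1% of all Fst values) as the threshold, a total of 344 SNPs were selected. Annotation of the SNP positions showed that these loci involve 79 candidate genes (Sus scrofa Build 9). Network analysis of the biological pathways of the candidate genes with the online software Ingenuity Pathway Analysis found that most of them are related to growth, reproduction and immune response, for example NCOA6, ERBB4, RUNX2 and APOB. The results provide a useful reference for further mining of candidate genes and causal mutations for traits such as meat production and disease resistance in pigs.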
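The thresholding step in the abstract above (keeping the SNPs whose Fst falls in the top 1%) can be sketched as follows. The per-SNP Fst here is a simple heterozygosity-based estimate for two equally weighted populations, used purely for illustration; it is an assumption and not the estimator computed by Genepop. The function names and the example allele frequencies are hypothetical.

```python
def fst_two_pops(p1, p2):
    """Fst for one biallelic SNP from the allele frequencies in two populations.

    Assumption: both populations are weighted equally, and
    Fst = (H_T - H_S) / H_T with expected heterozygosities.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # monomorphic in both populations: no differentiation
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def top_fraction(values, fraction=0.01):
    """Return (threshold, indices) for the highest `fraction` of values."""
    k = max(1, round(len(values) * fraction))
    threshold = sorted(values, reverse=True)[k - 1]
    return threshold, [i for i, v in enumerate(values) if v >= threshold]


# Made-up allele frequencies for four SNPs in two breeds.
freqs = [(0.5, 0.5), (1.0, 0.0), (0.6, 0.4), (0.9, 0.2)]
fst_values = [fst_two_pops(p1, p2) for p1, p2 in freqs]
threshold, outliers = top_fraction(fst_values, fraction=0.25)
```

With 34 304 SNPs and a 1% cut-off, a rule of this kind would retain roughly 343 SNPs, in line with the 344 reported.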

11.
Summary: Selections from factor and principal component analyses were compared with those from the Smith-Hazel index when selecting for several switchgrass (Panicum virgatum L.) traits. The objective of this study was to examine several alternatives to index selection. Such procedures would potentially eliminate problems of selection associated with Smith-Hazel indices, including errors in genetic parameter estimates and difficulty in assigning relative economic weights to traits. Selection was performed on 1,280 plants that were evaluated over 2 years at 1 location, in a randomized complete block design with 4 replicates. The plants were evaluated for forage yield and several forage quality traits. The comparisons of index selection with principal factor analysis, maximum-likelihood factor analysis and principal component analysis were made for three sets of traits (five traits per set) to estimate repeatability for the comparisons. Multivariate analyses were performed on both simple and genotypic correlation matrices. Comparisons were made by computing Spearman's rank correlations between selection index plant scores and scores computed from multivariate analysis and by determining the number of plants selected in common for the selection methods. Among the three multivariate analysis methods evaluated in this study, principal component analysis had the highest correlation with index selection. The high correlation for principal component analysis of simple correlation matrices indicates the potential for using this statistical method for selection purposes. This would permit the breeder to reduce field costs (e.g., time, labor, equipment) required to obtain the genetic parameter estimates necessary to construct selection indices.

12.
Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. Theory of statistical models is well‐established if the set of independent variables to consider is fixed and small. Hence, we can assume that effect estimates are unbiased and the usual methods for confidence interval estimation are valid. In routine work, however, it is not known a priori which covariates should be included in a model, and often we are confronted with the number of candidate variables in the range 10–30. This number is often too large to be considered in a statistical model. We provide an overview of various available variable selection methods that are based on significance or information criteria, penalized likelihood, the change‐in‐estimate criterion, background knowledge, or combinations thereof. These methods were usually developed in the context of a linear regression model and then transferred to more generalized linear models or models for censored survival data. Variable selection, in particular if used in explanatory modeling where effect estimates are of central interest, can compromise stability of a final model, unbiasedness of regression coefficients, and validity of p‐values or confidence intervals. Therefore, we give pragmatic recommendations for the practicing statistician on application of variable selection methods in general (low‐dimensional) modeling problems and on performing stability investigations and inference. We also propose some quantities based on resampling the entire variable selection process to be routinely reported by software packages offering automated variable selection algorithms.

13.
Imputation, weighting, direct likelihood, and direct Bayesian inference (Rubin, 1976) are important approaches for missing data regression. Many useful semiparametric estimators have been developed for regression analysis of data with missing covariates or outcomes. It has been established that some semiparametric estimators are asymptotically equivalent, but it has not been shown that many are numerically the same. We applied some existing methods to a bladder cancer case-control study and noted that they were the same numerically when the observed covariates and outcomes are categorical. To understand the analytical background of this finding, we further show that when observed covariates and outcomes are categorical, some estimators are not only asymptotically equivalent but also actually numerically identical. That is, although their estimating equations are different, they lead numerically to exactly the same root. This includes a simple weighted estimator, an augmented weighted estimator, and a mean-score estimator. The numerical equivalence may elucidate the relationship between imputing scores and weighted estimation procedures.

14.
Modified α-d-(1 → 4)-glucans containing a small proportion of 14C-labeled 2-deoxy-d-glucose or 2-amino-2-deoxy-d-glucose were examined as substrates for porcine pancreatic α-amylase (PPA). Cyclomaltoheptaose containing single 2-deoxy-d-glucose residues, synthesized by incubation of 2-deoxyglucosylglycogen with cyclomaltodextrin glucanotransferase in the presence of Triton X-100, was hydrolyzed by PPA to produce 2-deoxy-d-glucose, two isomers of 2-deoxymaltose, and a mixture of modified maltotrioses. These results indicate that 2-deoxy-d-glucose may be productively bound at all five subsites of the PPA active site. Reaction kinetics and the distribution of products formed suggest, however, that productive binding of the modified residue does not occur readily at the point of catalytic attack (subsite 3) and that the preferred position of hydrolysis of modified substrates may be different from that of unmodified substrates. Results of PPA hydrolysis of glycogen containing [14C]-2-amino-2-deoxy-d-glucose showed that a modified trisaccharide and a modified disaccharide were the smallest substituted products formed. Analysis of these products indicated that they did not contain modified residues at their reducing ends. Formation of the observed 2-amino-2-deoxy-maltooligosaccharides is consistent with a scheme where productive binding of 2-amino-2-deoxy-d-glucose is allowed at subsites 1, 2, 4, and 5, but not at subsite 3, the subsite at which hydrolysis occurs.

15.
16.
Underprivileged areas were identified by weighting several census variables that relate to social conditions, by using weights determined by means of a questionnaire sent to one in 10 of the general practitioners in the United Kingdom. The weighted variables were added (after statistical manipulation) to give a score for each of the 9265 electoral wards in England and Wales. Blank ward maps were sent to general practitioners in five family practitioner committee areas and they were asked to shade the wards according to the degree to which the population increased their workload or the pressure on their services. Maps of these same areas were then prepared by using the calculated scores, with the same cut off points between the worst, the intermediate, and the best areas as those used by the general practitioners. The two sets of maps were then compared to determine how well the maps that were based on scores agreed with the general practitioners' maps showing their assessment of the variation of workload in their areas. Overall, 6.3% of the wards differed in shading in any way between the two sets of maps. In the three areas where the general practitioners shaded complete wards and did not report having difficulties with shading only 1.2% of the wards differed. It may be possible to use these "underprivileged area" scores to indicate where problems occur for general practitioners and to extend this work to other primary health care workers.

17.
In clinical studies involving multiple variables, simultaneous tests are often considered where both the outcomes and hypotheses are correlated. This article proposes a multivariate mixture prior on treatment effects, that allows positive probability of zero effect for each hypothesis, correlations among effect sizes, correlations among binary outcomes of zero versus nonzero effect, and correlations among the observed test statistics (conditional on the effects). We develop a Bayesian multiple testing procedure, for the multivariate two-sample situation with unknown covariance structure, and obtain the posterior probabilities of no difference between treatment regimens for specific variables. Prior selection methods and robustness issues are discussed in the context of a clinical example.

18.
Longitudinal data are common in clinical trials and observational studies, where missing outcomes due to dropouts are always encountered. In this context, under the assumption of missing at random, the weighted generalized estimating equation (WGEE) approach is widely adopted for marginal analysis. Model selection on marginal mean regression is a crucial aspect of data analysis, and identifying an appropriate correlation structure for model fitting may also be of interest and importance. However, the existing information criteria for model selection in WGEE have limitations, such as separate criteria for the selection of marginal mean and correlation structures, unsatisfactory selection performance in small‐sample setups, and so forth. In particular, few studies have developed joint information criteria for selection of both marginal mean and correlation structures. In this work, by embedding empirical likelihood into the WGEE framework, we propose two innovative information criteria named a joint empirical Akaike information criterion and a joint empirical Bayesian information criterion, which can simultaneously select the variables for marginal mean regression and the correlation structure. In extensive simulation studies, these empirical‐likelihood‐based criteria exhibit robustness, flexibility, and superior performance compared with the other criteria, including the weighted quasi‐likelihood under the independence model criterion, the missing longitudinal information criterion, and the joint longitudinal information criterion. In addition, we provide a theoretical justification of our proposed criteria, and present two real data examples in practice for further illustration.

19.
Climate is one of the most important drivers for adaptive evolution in forest trees. Climatic selection contributes greatly to local adaptation and intraspecific differentiation, but this kind of selection could also have promoted interspecific divergence through ecological speciation. To test this hypothesis, we examined intra‐ and interspecific genetic variation at 25 climate‐related candidate genes and 12 reference loci in two closely related pine species, Pinus massoniana Lamb. and Pinus hwangshanensis Hisa, using population genetic and landscape genetic approaches. These two species occur in Southeast China but have contrasting ecological preferences in terms of several environmental variables, notably altitude, although hybrids form where their distributions overlap. One or more robust tests detected signals of recent and/or ancient selection at two‐thirds (17) of the 25 candidate genes, at varying evolutionary timescales, but only three of the 12 reference loci. The signals of recent selection were species specific, but signals of ancient selection were mostly shared by the two species likely because of the shared evolutionary history. FST outlier analysis identified six SNPs in five climate‐related candidate genes under divergent selection between the two species. In addition, a total of 24 candidate SNPs representing nine candidate genes showed significant correlation with altitudinal divergence in the two species based on the covariance matrix of population history derived from reference SNPs. Genetic differentiation between these two species was higher at the candidate genes than at the reference loci. Moreover, analysis using the isolation‐with‐migration model indicated that gene flow between the species has been more restricted for climate‐related candidate genes than the reference loci, in both directions. 
Taken together, our results suggest that species‐specific and divergent climatic selection at the candidate genes might have counteracted interspecific gene flow and played a key role in the ecological divergence of these two closely related pine species.

20.
We propose criteria for variable selection in the mean model and for the selection of a working correlation structure in longitudinal data with dropout missingness using weighted generalized estimating equations. The proposed criteria are based on a weighted quasi‐likelihood function and a penalty term. Our simulation results show that the proposed criteria frequently select the correct model in candidate mean models. The proposed criteria also have good performance in selecting the working correlation structure for binary and normal outcomes. We illustrate our approaches using two empirical examples. In the first example, we use data from a randomized double‐blind study to test the cancer‐preventing effects of beta carotene. In the second example, we use longitudinal CD4 count data from a randomized double‐blind study.
