Similar literature
Found 20 similar articles; search took 687 ms
1.
Ranked set sampling (RSS) is a sampling procedure that can be considerably more efficient than simple random sampling (SRS). When the variable of interest is binary, ranking of the sample observations can be implemented using the estimated probabilities of success obtained from a logistic regression model developed for the binary variable. The main objective of this study is to use substantial data sets to investigate the application of RSS to estimation of a proportion for a population that is different from the one that provides the logistic regression. Our results indicate that precision in estimation of a population proportion is improved through the use of logistic regression to carry out the RSS ranking and, hence, the sample size required to achieve a desired precision is reduced. Further, the choice and the distribution of covariates in the logistic regression model are not overly crucial for the performance of a balanced RSS procedure.
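
The ranking step described in this abstract can be illustrated with a minimal sketch. It assumes each unit already carries an estimated success probability from a previously fitted model; the synthetic population, the function name `ranked_set_sample`, and the set size and cycle count are all hypothetical and are not taken from the study.

```python
import random


def ranked_set_sample(units, score, set_size, cycles, rng):
    """Balanced RSS: in each cycle draw `set_size` sets of `set_size` units,
    rank every set by `score`, and keep the unit of rank r from the r-th set."""
    selected = []
    for _ in range(cycles):
        for r in range(set_size):
            candidates = rng.sample(units, set_size)  # one set, drawn without replacement
            candidates.sort(key=score)                # ranking uses the score only
            selected.append(candidates[r])            # only this unit is "measured"
    return selected


# Hypothetical population of (estimated_probability, binary_outcome) pairs.
# In practice the probabilities would come from a logistic regression fitted elsewhere.
rng = random.Random(0)
population = []
for _ in range(5000):
    p_hat = rng.random()
    outcome = 1 if rng.random() < p_hat else 0
    population.append((p_hat, outcome))

sample = ranked_set_sample(population, score=lambda u: u[0],
                           set_size=3, cycles=10, rng=rng)
estimate = sum(y for _, y in sample) / len(sample)  # RSS estimate of the proportion
print(len(sample), estimate)
```

Only the selected units need their binary outcome measured; the other units in each set contribute through the ranking alone, which is where the potential saving over SRS comes from.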

2.
Ranked set sampling (RSS) as suggested by McIntyre (1952) and Takahasi and Wakimoto (1968) may be used to estimate the parameters of the simple regression line. The objective is to use the RSS method to increase the efficiency of the estimators relative to the simple random sampling (SRS) method. Estimators of the slope and intercept are considered. Computer-simulated results are given, and an example using real data is presented to illustrate the computations.

3.
Nahhas RW  Wolfe DA  Chen H 《Biometrics》2002,58(4):964-971
McIntyre (1952, Australian Journal of Agricultural Research 3, 385-390) introduced ranked set sampling (RSS) as a method for improving estimation of a population mean in settings where sampling and ranking of units from the population are inexpensive when compared with actual measurement of the units. Two of the major factors in the usefulness of RSS are the set size and the relative costs of the various operations of sampling, ranking, and measurement. In this article, we consider ranking error models and cost models that enable us to assess the effect of different cost structures on the optimal set size for RSS. For reasonable cost structures, we find that the optimal RSS set sizes are generally larger than had been anticipated previously. These results will provide a useful tool for determining whether RSS is likely to lead to an improvement over simple random sampling in a given setting and, if so, what RSS set size is best to use in this case.

4.
Bayesian estimation of the parameter of a distribution is considered using ranked set sampling (RSS). It is shown that for at least one RSS plan, the Bayes estimator has smaller Bayes risk than the Bayes estimator using simple random sampling (SRS). Furthermore, for an exponential family with a conjugate prior, the Bayes estimator of the mean using balanced RSS dominates, in terms of its Bayes risk, the Bayes estimator of the mean using SRS. This procedure is used to estimate the average milk yield of 402 sheep. The empirical efficiency supports the theoretical findings.

5.
Chen H  Stasny EA  Wolfe DA 《Biometrics》2006,62(1):150-158
The application of ranked set sampling (RSS) techniques to data from a dichotomous population is currently an active research topic, and it has been shown that balanced RSS leads to improvement in precision over simple random sampling (SRS) for estimation of a population proportion. Balanced RSS, however, is not in general optimal in terms of variance reduction for this setting. The objective of this article is to investigate the application of unbalanced RSS in estimation of a population proportion under perfect ranking, where the probabilities of success for the order statistics are functions of the underlying population proportion. In particular, the Neyman allocation, which assigns sample units for each order statistic proportionally to its standard deviation, is shown to be optimal in the sense that it leads to minimum variance within the class of RSS estimators that are simple averages of the means of the order statistics. We also use a substantial data set, the National Health and Nutrition Examination Survey III (NHANES III) data, to demonstrate the feasibility and benefits of Neyman allocation in RSS for binary variables.
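
The allocation rule stated in this abstract can be worked through numerically. The sketch below assumes perfect ranking of k independent Bernoulli(p) units, so the r-th order statistic equals 1 exactly when at least k - r + 1 of the k units are successes; the function names and the values p = 0.3, k = 3, n = 30 are illustrative assumptions, and the allocations are left fractional rather than rounded to integers.

```python
from math import comb, sqrt


def order_stat_success_probs(p, k):
    """P(r-th smallest of k Bernoulli(p) draws equals 1) for r = 1..k."""
    def upper_tail(j):
        # P(at least j successes among k independent draws)
        return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(j, k + 1))
    return [upper_tail(k - r + 1) for r in range(1, k + 1)]


def neyman_allocation(p, k, n):
    """Split n measurements across ranks in proportion to each rank's std. dev."""
    sds = [sqrt(q * (1 - q)) for q in order_stat_success_probs(p, k)]
    total = sum(sds)
    return [n * s / total for s in sds]


def estimator_variance(p, k, allocation):
    """Variance of the simple average of the k per-rank sample means."""
    probs = order_stat_success_probs(p, k)
    return sum(q * (1 - q) / m for q, m in zip(probs, allocation)) / k**2


p, k, n = 0.3, 3, 30                      # illustrative values only
probs = order_stat_success_probs(p, k)
neyman = neyman_allocation(p, k, n)
var_neyman = estimator_variance(p, k, neyman)
var_balanced = estimator_variance(p, k, [n / k] * k)
var_srs = p * (1 - p) / n
print(neyman, var_neyman, var_balanced, var_srs)
```

Because the rank-specific success probabilities average to p, the simple average of the per-rank means remains unbiased under any allocation, and only the variance changes with the choice of allocation.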

6.
Wang YG  Chen Z  Liu J 《Biometrics》2004,60(2):556-561
Nahhas, Wolfe, and Chen (2002, Biometrics 58, 964-971) considered optimal set size for ranked set sampling (RSS) with fixed operational costs. This framework can be very useful in practice to determine whether RSS is beneficial and to obtain the optimal set size that minimizes the variance of the population estimator for a fixed total cost. In this article, we propose a scheme of general RSS in which more than one observation can be taken from each ranked set. This is shown to be more cost-effective in some cases when the cost of ranking is not so small. Using the example in Nahhas, Wolfe, and Chen (2002, Biometrics 58, 964-971), we demonstrate that taking two or more observations from one set can be more beneficial, even with the optimal set size from the RSS design.

7.
Ranked set sampling (RSS) as suggested by McIntyre (1952) and developed by Takahasi and Wakimoto (1968) is used to estimate the ratio. It is proved that using the RSS method increases the efficiency of the estimator relative to the simple random sampling (SRS) method. Computer-simulated results are given. An example using real data is presented to illustrate the computations.

8.
Summary. Colorectal cancer is the second leading cause of cancer-related deaths in the United States, with more than 130,000 new cases of colorectal cancer diagnosed each year. Clinical studies have shown that genetic alterations lead to different responses to the same treatment, despite the morphologic similarities of tumors. A molecular test prior to treatment could help in determining an optimal treatment for a patient with regard to both toxicity and efficacy. This article introduces a statistical method appropriate for predicting and comparing multiple endpoints given different treatment options and molecular profiles of an individual. A latent variable-based multivariate regression model with structured variance covariance matrix is considered here. The latent variables account for the correlated nature of multiple endpoints and accommodate the fact that some clinical endpoints are categorical variables and others are censored variables. The mixture normal hierarchical structure admits a natural variable selection rule. Inference was conducted using the posterior distribution sampling Markov chain Monte Carlo method. We analyzed the finite-sample properties of the proposed method using simulation studies. The application to the advanced colorectal cancer study revealed associations between multiple endpoints and particular biomarkers, demonstrating the potential of individualizing treatment based on genetic profiles.

9.
Ranked set sampling (RSS) as suggested by McIntyre (1952) may be modified to introduce a new sampling method called pair rank set sampling (PRSS), which might be used in some areas of application instead of RSS to increase the efficiency of the estimators relative to the simple random sampling (SRS) method. Estimators of the population mean are considered. An example using real data is presented to illustrate the computations.

10.
Ranked set sampling (RSS) as suggested by McIntyre (1952) and independently by Takahasi and Wakimoto (1968) may be used to estimate the parameters of the one-way layout. The objective is to use the RSS method to increase the efficiency of the estimators relative to the simple random sampling (SRS) method. Estimators of the population (treatment) effects are considered. Computer-simulated results are given, and an example using real data is presented to illustrate the computations.

11.
A nonparametric selected ranked set sampling method is suggested. The estimator of the population mean based on the new approach is compared with those using the simple random sampling (SRS), ranked set sampling (RSS) and median ranked set sampling (MRSS) methods. The estimator of the population mean using the new approach is found to be more efficient than its counterparts in almost all the cases considered.

12.
Lu TP  Lai LC  Tsai MH  Chen PC  Hsu CP  Lee JM  Hsiao CK  Chuang EY 《PloS one》2011,6(9):e24829
Numerous efforts have been made to elucidate the etiology and improve the treatment of lung cancer, but the overall five-year survival rate is still only 15%. Identification of prognostic biomarkers for lung cancer using gene expression microarrays poses a major challenge in that very few overlapping genes have been reported among different studies. To address this issue, we have performed concurrent genome-wide analyses of copy number variation and gene expression to identify genes reproducibly associated with tumorigenesis and survival in non-smoking female lung adenocarcinoma. The genomic landscape of frequent copy number variable regions (CNVRs) in at least 30% of samples was revealed, and their aberration patterns were highly similar to several studies reported previously. Further statistical analysis for genes located in the CNVRs identified 475 genes differentially expressed between tumor and normal tissues (p < 10^-5). We demonstrated the reproducibility of these genes in another lung cancer study (p = 0.0034, Fisher's exact test), and showed the concordance between copy number variations and gene expression changes by elevated Pearson correlation coefficients. Pathway analysis revealed two major dysregulated functions in lung tumorigenesis: survival regulation via AKT signaling and cytoskeleton reorganization. Further validation of these enriched pathways using three independent cohorts demonstrated effective prediction of survival. In conclusion, by integrating gene expression profiles and copy number variations, we identified genes/pathways that may serve as prognostic biomarkers for lung tumorigenesis.

13.

Background

Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study.

Methods

The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators.

Results

After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016), and between total bilirubin and current smokers (p<0.001).

Conclusion

The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.

14.
Yuan Z  Ghosh D 《Biometrics》2008,64(2):431-439
Summary. In medical research, there is great interest in developing methods for combining biomarkers. We argue that selection of markers should also be considered in the process. Traditional model/variable selection procedures ignore the underlying uncertainty after model selection. In this work, we propose a novel model-combining algorithm for classification in biomarker studies. It works by considering weighted combinations of various logistic regression models; five different weighting schemes are considered in the article. The weights and algorithm are justified using decision theory and risk-bound results. Simulation studies are performed to assess the finite-sample properties of the proposed model-combining method. It is illustrated with an application to data from an immunohistochemical study in prostate cancer.

15.
In preclinical cancer studies, three-dimensional (3D) cell spheroids and aggregates are preferred over monolayer cell cultures due to their architectural and functional similarity to solid tumors. We performed a proof-of-concept study to generate physiologically relevant and predictive preclinical models using non–small cell lung adenocarcinoma, and colon and colorectal adenocarcinoma cell line-derived 3D spheroids and aggregates. Distinct panels were designed to determine the expression profiles of frequently studied biomarkers of the two cancer subtypes. The lung adenocarcinoma panel included ALK, EGFR, TTF-1, and CK7 biomarkers, and the colon and colorectal adenocarcinoma panel included BRAF V600E, MSH2, MSH6, and CK20. Recent advances in immunofluorescence (IF) multiplexing and imaging technology enable simultaneous detection and quantification of multiple biomarkers on a single slide. In this study, we performed IF staining of multiple biomarkers per section on formalin-fixed paraffin-embedded 3D spheroids and aggregates. We optimized protocol parameters for automated IF and demonstrated staining concordance with automated chromogenic immunohistochemistry performed with validated protocols. Next, post-acquisition spectral unmixing of the captured fluorescent signals was utilized to delineate four differently stained biomarkers within a single multiplex IF image, followed by automated quantification of the expressed markers. This workflow has the potential to be adapted to preclinical high-throughput screening and drug efficacy studies utilizing 3D spheroids from cancer cell lines and patient-derived organoids. The process allows for cost, time, and resource savings through concurrent staining of several biomarkers on a single slide, provides the ability to study the interactions of multiple expressed proteins within a single region of interest, and enables quantitative assessment of biomarkers in cancer cells.

16.
Comprehensive and in-depth discovery of the disease proteome is an important issue in recent proteomics developments. Previous studies have shown a number of biomarkers discovered in various diseases, including lung cancer. Some of them are potentially useful in lung cancer diagnostics and prognostics. However, few of them can act as organ-specific biomarkers to extensively compare multiple cancer models. This article evaluates a recently published study employing comparative proteomics on multiple genetically engineered mouse models and sheds light on the usefulness and application of the discovered marker panel for human lung cancer diagnostics.

17.
The intravenous glucose tolerance test (IVGTT) interpreted with the minimal model provides individual indexes of insulin sensitivity (S(I)) and glucose effectiveness (S(G)). In population studies, the traditional approach, the standard two-stage (STS) method, fails to account for uncertainty in individual estimates, resulting in an overestimation of between-subject variability. Furthermore, in the presence of reduced sampling and/or insulin resistance, individual estimates may be unobtainable, biasing population information. Therefore, we investigated the use of two population approaches, the iterative two-stage (ITS) method and nonlinear mixed-effects modeling (NM), in a population (n = 235) of insulin-sensitive and insulin-resistant subjects under full (FSS, 33 samples) and reduced [RSS(240-min), 13 samples and RSS(180-min), 12 samples] IVGTT sampling schedules. All three population methods gave similar results with the FSS. Using RSS(240), the three methods gave similar results for S(I), but S(G) population means were overestimated. With RSS(180), S(I) and S(G) population means were higher for all three methods compared with their FSS counterparts. NM estimated similar between-subject variability (19% S(G), 53% S(I)) with RSS(180), whereas ITS showed regression to the mean for S(G) (0.01% S(G), 56% S(I)) and STS provided larger population variability in S(I) (29% S(G), 91% S(I)). NM provided individual estimates for all subjects, whereas the two-stage methods failed in 16-18% of the subjects using RSS(180) and 6-14% using RSS(240). We conclude that population approaches, specifically NM, are useful in studies with a sparsely sampled IVGTT (approximately 12 samples) of short duration (approximately 3 h) and when individual parameter estimates in all subjects are desired.

18.
Summary. Identification of novel biomarkers for risk assessment is important for both effective disease prevention and optimal treatment recommendation. Discovery relies on the precious yet limited resource of stored biological samples from large prospective cohort studies. The case-cohort sampling design provides a cost-effective tool in the context of biomarker evaluation, especially when the clinical condition of interest is rare. Existing statistical methods focus on making efficient inference on relative hazard parameters from the Cox regression model. Drawing on recent theoretical development on the weighted likelihood for semiparametric models under two-phase studies (Breslow and Wellner, 2007), we propose statistical methods to evaluate accuracy and predictiveness of a risk prediction biomarker, with censored time-to-event outcome under stratified case-cohort sampling. We consider nonparametric methods and a semiparametric method. We derive large sample properties of the proposed estimators and evaluate their finite sample performance using numerical studies. We illustrate the new procedures using data from the Framingham Offspring Study to evaluate the accuracy of a recently developed risk score incorporating biomarker information for predicting cardiovascular disease.

19.
Summary. Rigorous statistical evaluation of the predictive values of novel biomarkers is critical prior to applying novel biomarkers into routine standard care. It is important to identify factors that influence the performance of a biomarker in order to determine the optimal conditions for test performance. We propose a covariate-specific time-dependent positive predictive values curve to quantify the predictive accuracy of a prognostic marker measured on a continuous scale and with censored failure time outcome. The covariate effect is accommodated with a semiparametric regression model framework. In particular, we adopt a smoothed survival time regression technique (Dabrowska, 1997, The Annals of Statistics 25, 1510–1540) to account for the situation where risk for the disease occurrence and progression is likely to change over time. In addition, we provide asymptotic distribution theory and resampling-based procedures for making statistical inference on the covariate-specific positive predictive values. We illustrate our approach with numerical studies and a dataset from a prostate cancer study.

20.
One of the main causes of death in the world is lung cancer. According to the World Health Organization, the annual incidence of lung cancer increases significantly. Moreover, lung cancer accounts for one of the highest mortality rates, mainly due to late detection. Numerous studies have been conducted in order to identify new biomarkers for early diagnosis and for monitoring and evaluation of lung cancer stages. An ideal biomarker candidate is represented by the analysis of microRNAs expression. In this paper, we want to summarize microRNAs expressions in lung cancer. We also want to present the expression of microRNAs depending on the evolution of lung cancer. For this study, we analyzed the studies available in scientific databases, such as PubMed and Scopus. The studies were selected using the search keywords “microRNAs expression,” “lung cancer,” and “genetic biomarkers.” The most significant articles were selected for the study, following rigorous analysis. To evaluate and monitor lung cancer, the expression of microRNAs may be used successfully due to increased specificity and selectivity. However, further studies are needed on the assignment and validation of microRNAs for each type of lung cancer, respectively, for each stage of evolution.
