Similar articles (20 found)
1.
Background: In early-stage cancer, primary treatment can be considered effective at eliminating the tumor for a non-negligible proportion of patients, whereas for the others it leads to a lower tumor burden and thereby potentially prolonged survival. In this mixed population of patients, it is of great interest to detect complex differences in survival distributions associated with molecular markers that potentially activate latent downstream pathways implicated in tumor progression. Methods: We propose a novel model-based score test designed for identifying molecular markers with complex effects on survival in early-stage cancer. From a biological point of view, the proposed score test allows detection of complex changes in the survival distributions linked to either the tumor burden or its dynamic growth. Results: Simulation results show that the proposed statistic is powerful at identifying departure from the null hypothesis of no survival difference. The practical use of the proposed statistic is exemplified by analyzing the prognostic impact of Kras mutation in early-stage lung adenocarcinoma. This analysis leads to the conclusion that Kras mutation has a significant negative prognostic impact on survival. Moreover, it emphasizes that the complex role of Kras mutation on survival would have been overlooked by considering results from the classical logrank test. Conclusion: With the growing number of biological markers to be tested in early-stage cancer, the proposed score test statistic is a powerful tool for detecting molecular markers associated with complex survival patterns.

2.
In recent years, genome-wide association studies (GWAS) and gene-expression profiling have generated a large number of valuable datasets for assessing how genetic variations are related to disease outcomes. With such datasets, it is often of interest to assess the overall effect of a set of genetic markers, assembled based on biological knowledge. Genetic marker-set analyses have been advocated as more reliable and powerful approaches compared with the traditional marginal approaches (Curtis and others, 2005. Pathways to the analysis of microarray data. TRENDS in Biotechnology 23, 429-435; Efroni and others, 2007. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS One 2, 425). Procedures for testing the overall effect of a marker-set have been actively studied in recent years. For example, score tests derived under an Empirical Bayes (EB) framework (Liu and others, 2007. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models. Biometrics 63, 1079-1088; Liu and others, 2008. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC bioinformatics 9, 292-2; Wu and others, 2010. Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86, 929) have been proposed as powerful alternatives to the standard Rao score test (Rao, 1948. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44, 50-57). The advantages of these EB-based tests are most apparent when the markers are correlated, due to the reduction in the degrees of freedom. 
In this paper, we propose an adaptive score test which up- or down-weights the contributions from each member of the marker-set based on the Z-scores of their effects. Such an adaptive procedure gains power over the existing procedures when the signal is sparse and the correlation among the markers is weak. By combining evidence from both the EB-based score test and the adaptive test, we further construct an omnibus test that attains good power in most settings. The null distributions of the proposed test statistics can be approximated well either via simple perturbation procedures or via distributional approximations. Through extensive simulation studies, we demonstrate that the proposed procedures perform well in finite samples. We apply the tests to a breast cancer genetic study to assess the overall effect of the FGFR2 gene on breast cancer risk.

3.
Large-scale genetic-association studies that take advantage of an extremely dense set of genetic markers have begun to produce very compelling statistical associations between multiple markers exhibiting strong linkage disequilibrium (LD) in a single genomic region and a phenotype of interest. However, the ultimate biological or "functional" significance of these multiple associations has been difficult to discern. In fact, the LD relationships between not only the markers found to be associated with the phenotype but also potential functionally or causally relevant genetic variations that reside near those markers have been exploited in such studies. Unfortunately, LD, especially strong LD, between variations at neighboring loci can make it difficult to distinguish the functionally relevant variations from nonfunctional variations. Although there are (rare) situations in which it is impossible to determine the independent phenotypic effects of variations in LD, there are strategies for accommodating LD between variations at different loci, and they can be used to tease out their independent effects on a phenotype. These strategies make it possible to differentiate potentially causative from noncausative variations. We describe one such approach involving ridge regression. We showcase the method by using both simulated and real data. Our results suggest that ridge regression and related techniques have the potential to distinguish causative from noncausative variations in association studies.
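The idea of using ridge regression on markers in LD can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ridge_two_markers`, the toy genotypes and the penalty value `lam=1.0` are all assumptions made for illustration, and the example is restricted to two markers so that the penalised normal equations can be solved in closed form without any library.

```python
def ridge_two_markers(x1, x2, y, lam):
    """Ridge estimates (b1, b2) for two markers.

    Solves (X'X + lam * I) b = X'y on mean-centred data, so the
    intercept is not penalised. With lam > 0 the system stays solvable
    even when the two markers are perfectly correlated.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    c = [v - my for v in y]
    # Entries of X'X + lam * I and of X'y for the 2 x 2 system.
    s11 = sum(u * u for u in a) + lam
    s22 = sum(u * u for u in b) + lam
    s12 = sum(u * v for u, v in zip(a, b))
    t1 = sum(u * v for u, v in zip(a, c))
    t2 = sum(u * v for u, v in zip(b, c))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("singular system; use lam > 0")
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


# Hypothetical toy data: genotypes coded as 0/1/2 minor-allele counts.
# The linked marker differs from the causal one in a single individual,
# and the phenotype depends on the causal marker only.
causal = [0, 1, 2, 0]
linked = [0, 1, 2, 1]
phenotype = [0.0, 1.0, 2.0, 0.0]
b_causal, b_linked = ridge_two_markers(causal, linked, phenotype, lam=1.0)
print(b_causal, b_linked)
```

In this toy setting the penalised fit still assigns the larger coefficient to the causal marker, while two perfectly correlated markers would simply share the effect equally. Real analyses involve many markers and a data-driven choice of the penalty.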

4.
Gene set analysis allows the inclusion of knowledge from established gene sets, such as gene pathways, and potentially improves the power of detecting differentially expressed genes. However, conventional methods of gene set analysis focus on gene marginal effects in a gene set, and ignore gene interactions which may contribute to complex human diseases. In this study, we propose a method of gene interaction enrichment analysis, which incorporates knowledge of predefined gene sets (e.g. gene pathways) to identify enriched gene interaction effects on a phenotype of interest. In our proposed method, we also discuss the reduction of irrelevant genes and the extraction of a core set of gene interactions for an identified gene set, which contribute to the statistical variation of a phenotype of interest. The utility of our method is demonstrated through analyses on two publicly available microarray datasets. The results show that our method can identify gene sets that show strong gene interaction enrichments. The enriched gene interactions identified by our method may provide clues to new gene regulation mechanisms related to the studied phenotypes. In summary, our method offers a powerful tool for researchers to exhaustively examine the large numbers of gene interactions associated with complex human diseases, and can be a useful complement to classical gene set analyses, which consider only single genes in a gene set.

5.
Maity A, Lin X. Biometrics 2011, 67(4): 1271-1284
We propose in this article a powerful testing procedure for detecting a gene effect on a continuous outcome in the presence of possible gene-gene interactions (epistasis) in a gene set, e.g., a genetic pathway or network. Traditional tests for this purpose require a large number of degrees of freedom by testing the main effect and all the corresponding interactions under a parametric assumption, and hence suffer from low power. In this article, we propose a powerful kernel machine based test. Specifically, our test is based on a garrote kernel method and is constructed as a score test. Here, the term garrote refers to an extra nonnegative parameter that multiplies the covariate of interest, so that our score test can be formulated in terms of this nonnegative parameter. A key feature of the proposed test is that it is flexible and developed for both parametric and nonparametric models within a unified framework, and is more powerful than the standard test because it accounts for the correlation among genes and hence often uses far fewer degrees of freedom. We investigate the theoretical properties of the proposed test. We evaluate its finite sample performance using simulation studies, and apply the method to the Michigan prostate cancer gene expression data.

6.
Marginal models for longitudinal continuous proportional data   (cited 5 times in total: 0 self-citations, 5 by others)
Song PX, Tan M. Biometrics 2000, 56(2): 496-502
Summary. Continuous proportional data arise when the response of interest is a percentage between zero and one, e.g., the percentage of decrease in renal function at different follow‐up times from the baseline. In this paper, we propose methods to directly model the marginal means of the longitudinal proportional responses using the simplex distribution of Barndorff‐Nielsen and Jørgensen, which takes into account the fact that such responses are percentages restricted between zero and one and may also have large dispersion. Parameters in such a marginal model are estimated using an extended version of the generalized estimating equations where the score vector is a nonlinear function of the observed response. The method is illustrated with an ophthalmology study on the use of intraocular gas in retinal repair surgeries.
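For readers unfamiliar with the simplex distribution, the sketch below evaluates its density on (0, 1). The abstract gives no formula, so the parameterisation used here (mean `mu`, dispersion `sigma2`, and the unit deviance in `unit_deviance`) is an assumption based on the commonly cited form of the distribution; the function names are hypothetical.

```python
import math


def unit_deviance(y, mu):
    """Unit deviance d(y; mu) of the simplex distribution."""
    return (y - mu) ** 2 / (y * (1 - y) * mu ** 2 * (1 - mu) ** 2)


def simplex_pdf(y, mu, sigma2):
    """Density of a simplex-distributed proportion y in (0, 1).

    Assumed form:
    f(y) = [2*pi*sigma2*(y*(1-y))**3]**(-1/2) * exp(-d(y; mu) / (2*sigma2)).
    """
    if not (0 < y < 1 and 0 < mu < 1) or sigma2 <= 0:
        raise ValueError("need 0 < y < 1, 0 < mu < 1 and sigma2 > 0")
    norm = math.sqrt(2 * math.pi * sigma2 * (y * (1 - y)) ** 3)
    return math.exp(-unit_deviance(y, mu) / (2 * sigma2)) / norm


# Evaluate the density on a coarse grid for a hypothetical mean of 0.5.
for y in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(y, simplex_pdf(y, mu=0.5, sigma2=1.0))
```

Because the deviance grows without bound near 0 and 1, the density vanishes at the boundaries, which is what makes the distribution suitable for responses restricted to the unit interval.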

7.
Although there is growing interest in taking genomics into the complex realms of natural populations, there is a general shortage of genomic resources and tools available for wild species. This applies not least to birds, for which genomic approaches should be helpful for addressing questions such as adaptation, speciation and population genetics. In this study, we describe a genome-wide reference set of conserved avian gene markers, broadly applicable across birds. By aligning protein-coding sequences from the recently assembled chicken genome with orthologous sequences in zebra finch, we identified particularly conserved exonic regions flanking introns of suitable size for subsequent amplification and sequencing. Primers were designed for 242 gene markers evenly distributed across the chicken genome, with a mean inter-marker interval of 4.2 Mb. Between 78% and 93% of the markers amplified a specific product in five species tested (chicken, peregrine falcon, collared flycatcher, great reed warbler and blue tit). Two hundred markers were sequenced in collared flycatcher, yielding a total of 122.41 kb of genomic DNA sequence (12 096 bp coding sequence and 110 314 bp noncoding). Intron size of collared flycatcher and chicken was highly correlated, as was GC content. A polymorphism screening using these markers in a panel of 10 unrelated collared flycatchers identified 871 single nucleotide polymorphisms (pi = 0.0029) and 33 indels (mainly very short). Avian genome characteristics such as uniform genome size and low rate of syntenic rearrangements suggest that this marker set will find broad utility as a genome-wide reference resource for molecular ecological and population genomic analysis of birds. 
We envision that it will be particularly useful for obtaining large-scale orthologous targets in different species (important in, for instance, phylogenetics) and for large-scale identification of evenly distributed single nucleotide polymorphisms needed in linkage mapping or in studies of gene flow and hybridization.

8.
9.
In disease screening and prognosis studies, an important task is to determine useful markers for identifying high-risk subgroups. Once such markers are established, they can be incorporated into public health practice to provide appropriate strategies for treatment or disease monitoring based on each individual's predicted risk. In recent years, genetic and biological markers have been examined extensively for their potential to signal progression or risk of disease. In addition to these markers, it has often been argued that short-term outcomes may be helpful in making a better prediction of disease outcomes in clinical practice. In this paper we propose model-free non-parametric procedures to incorporate short-term event information to improve the prediction of a long-term terminal event. We include the optional availability of a single discrete marker measurement and assess the additional information gained by including the short-term outcome. We focus on the semi-competing risk setting where the short-term event is an intermediate event that may be censored by the terminal event while the terminal event is only subject to administrative censoring. Simulation studies suggest that the proposed procedures perform well in finite samples. Our procedures are illustrated using a data set of post-dialysis patients with end-stage renal disease.

10.
The genetic basis of complex diseases is expected to be highly heterogeneous, with complex interactions among multiple disease loci and environmental factors. Because interactions among a large number of genetic loci are multi-dimensional, efficient statistical approaches for handling high-order epistatic complexity have not been well developed. In this article, we introduce a new approach for testing genetic epistasis in multiple loci using an entropy-based statistic for a case-only design. The entropy-based statistic asymptotically follows a χ² distribution. Computer simulations show that the entropy-based approach has better control of type I error and higher power compared to the standard χ² test. Motivated by a schizophrenia data set, we propose a method for measuring and testing the relative entropy of a clinical phenotype, through which one can test the contribution or interaction of multiple disease loci to a clinical phenotype. A sequential forward selection procedure is proposed to construct a genetic interaction network which is illustrated through a tree-based diagram. The network information clearly shows the relative importance of a set of genetic loci on a clinical phenotype. To show the utility of the new entropy-based approach, it is applied to analyze two real data sets, a schizophrenia data set and a published malaria data set. Our approach provides a fast and testable framework for genetic epistasis study in a case-only design.

11.
The availability of highly polymorphic markers permits testing whether complex traits and diseases result from genomic interactions between nonallelic normal variants at separate loci. Such variants may be identified by deviations from the expected distributions of alleles at a high number of polymorphic loci, when individuals with the phenotype of interest are compared to normal controls of the same breeding unit, provided that both groups share the same remote ancestry and had no ancestors in common for the last three to four generations. The circumstances needed for such studies are ideally met on the island of Sardinia. The recurrent finding of the same type of association in separate breeding units between the phenotype of interest and a given genotype should allow a distinction between true genetic identity by descent and randomly occurring identities, as these will be obviously different in separate breeding units. The availability of several breeding units located in sharply different ecological environments will permit assessment of the role of nature/nurture factors in the degree of manifestation of each newly discovered genotype/phenotype association. A pilot study to evaluate the proposed strategy has been carried out in the Sardinian village of Carloforte, a community of about 8,000 individuals who have remained genetically homogeneous. Fifty-five control samples have been genotyped with six tetranucleotide microsatellites and with a subset of the 400 markers contained in the ABI PRISM linkage mapping panel, version 2. The allele frequencies for these microsatellite markers have been determined for these 55 individuals and compared to those from a random sampling of subsets of these 55 persons. For the six tetranucleotide microsatellites, a subset of as few as 20 people displayed the same allele frequency distributions as observed with the original 55 unrelated individuals. 
In conclusion, when samples are chosen from the same breeding unit, the number of individuals sufficient to draw the genomic profile of an isolated population can be relatively small. Likewise, the number of probands with the phenotype of interest can be even smaller when they are ascertained with the same genealogical criteria as the normal controls. By comparing the genomic profile of the probands to a fraction of the control samples within each of several separate breeding units of common remote ancestry, the search for genotype/phenotype association for mono- and multifactorial traits and diseases should be simplified and yield unequivocal results.

12.
In complex diseases, various combinations of genomic perturbations often lead to the same phenotype. On a molecular level, combinations of genomic perturbations are assumed to dys-regulate the same cellular pathways. Such a pathway-centric perspective is fundamental to understanding the mechanisms of complex diseases and the identification of potential drug targets. In order to provide an integrated perspective on complex disease mechanisms, we developed a novel computational method to simultaneously identify causal genes and dys-regulated pathways. First, we identified a representative set of genes that are differentially expressed in cancer compared to non-tumor control cases. Assuming that disease-associated gene expression changes are caused by genomic alterations, we determined potential paths from such genomic causes to target genes through a network of molecular interactions. Applying our method to sets of genomic alterations and gene expression profiles of 158 Glioblastoma multiforme (GBM) patients we uncovered candidate causal genes and causal paths that are potentially responsible for the altered expression of disease genes. We discovered a set of putative causal genes that potentially play a role in the disease. Combining an expression Quantitative Trait Loci (eQTL) analysis with pathway information, our approach allowed us not only to identify potential causal genes but also to find intermediate nodes and pathways mediating the information flow between causal and target genes. Our results indicate that different genomic perturbations indeed dys-regulate the same functional pathways, supporting a pathway-centric perspective of cancer. While copy number alterations and gene expression data of glioblastoma patients provided opportunities to test our approach, our method can be applied to any disease system where genetic variations play a fundamental causal role.

13.
The extremes of phenotype displayed by the domestic dog, as well as the largest number of naturally occurring inherited diseases in any mammalian species except man (>450), have generated a large interest in genomic linkage mapping in the species. Marker sets for linkage mapping should ideally show both high levels of polymorphism among the target group of animals and an even spacing of markers across the whole genome. Currently a microsatellite marker set known as Minimal Screening Set 2 (MSS2) is widely used. Here, we have extended this marker set by filling in gaps as noted from the marker positions in the CanFam genome assembly (1.0) and the 5000cR radiation hybrid (RH) map. An additional 183 markers have been positioned to increase the coverage of the MSS2 set wherever it contains a gap >9 Mb or 1000(5000) RH units. We have called the marker set derived from the MSS2 set and these 183 markers MSS3. The average physical spacing of markers in the complete 507-marker MSS3 set is 5 Mb, whereas average heterozygosity of the 183 new markers on a panel of 10 dogs of differing breeds is 0.74. This marker group will allow genome-wide scans in the dog to be conducted at close to 5 cM resolution.

14.
Recent technological advances have made it possible to collect high-dimensional genomic data along with clinical data on a large number of subjects. In the studies of chronic diseases such as cancer, it is of great interest to integrate clinical and genomic data to build a comprehensive understanding of the disease mechanisms. Despite extensive studies on integrative analysis, it remains an ongoing challenge to model the interaction effects between clinical and genomic variables, due to high dimensionality of the data and heterogeneity across data types. In this paper, we propose an integrative approach that models interaction effects using a single-index varying-coefficient model, where the effects of genomic features can be modified by clinical variables. We propose a penalized approach for separate selection of main and interaction effects. Notably, the proposed methods can be applied to right-censored survival outcomes based on a Cox proportional hazards model. We demonstrate the advantages of the proposed methods through extensive simulation studies and provide applications to a motivating cancer genomic study.

15.
Summary. The genomic distribution and genetic behavior of DNA sequences introduced into the tomato genome by Agrobacterium tumefaciens were investigated in the backcross progeny of 10 transformed Lycopersicon esculentum × L. pennellii hybrids. All transformants were found to represent single locus insertions based on the co-segregation of restriction fragments corresponding to the T-DNA left and right border sequences in the backcross progeny. Isozyme and restriction fragment length polymorphism (RFLP) markers were used to test linkage relationships of the insertion in each backcross family. The T-DNA inserts in 9 of the 10 transformants were mapped in relation to one or more of these markers, and each mapped to a different chromosomal location. Because only one insertion did not show linkage with the markers employed, it must be located somewhere other than the genomic regions covered by the markers assayed. We conclude that Agrobacterium-mediated insertion in the Lycopersicon genome appears to be random at the chromosomal level. No discrepancies were found between the T-DNA genotype and the nopaline phenotype in the 322 backcross progeny of the nopaline positive transformants. Backcross progeny of two nopaline negative transformants showed incomplete correspondence between the T-DNA genotype and the kanamycin resistance phenotype. No alteration of T-DNA was observed in progeny showing a discrepancy between T-DNA and kanamycin resistance. However, two kanamycin resistant progeny plants of one of these two transformants possessed altered T-DNA restriction patterns, indicating genetic instability of the T-DNA in this transformant. Journal article no. 1223 of the New Mexico Agricultural Experiment Station.

16.
Pang Z, Kuk AY. Biometrics 2007, 63(1): 218-227
Exchangeable binary data are often collected in developmental toxicity and other studies, and a whole host of parametric distributions for fitting this kind of data have been proposed in the literature. While these distributions can be matched to have the same marginal probability and intra-cluster correlation, they can be quite different in terms of shape and higher-order quantities of interest such as the litter-level risk of having at least one malformed fetus. A sensible alternative is to fit a saturated model (Bowman and George, 1995, Journal of the American Statistical Association 90, 871-879) using the expectation-maximization (EM) algorithm proposed by Stefanescu and Turnbull (2003, Biometrics 59, 18-24). The assumption of compatibility of marginal distributions is often made to link up the distributions for different cluster sizes so that estimation can be based on the combined data. Stefanescu and Turnbull proposed a modified trend test to test this assumption. Their test, however, fails to take into account the variability of an estimated null expectation and as a result leads to inaccurate p-values. This drawback is rectified in this article. When the data are sparse, the probability function estimated using a saturated model can be very jagged and some kind of smoothing is needed. We extend the penalized likelihood method (Simonoff, 1983, Annals of Statistics 11, 208-218) to the present case of unequal cluster sizes and implement the method using an EM-type algorithm. In the presence of a covariate, we propose a penalized kernel method that performs smoothing in both the covariate and response space. The proposed methods are illustrated using several data sets and the sampling and robustness properties of the resulting estimators are evaluated by simulations.

17.
Summary. Recently, meta‐analysis has been widely utilized to combine information across multiple studies to evaluate a common effect. Integrating data from similar studies is particularly useful in genomic studies where the individual study sample sizes are not large relative to the number of parameters of interest. In this article, we are interested in developing robust prognostic rules for the prediction of t‐year survival based on multiple studies. We propose to construct a composite score for prediction by fitting a stratified semiparametric transformation model that allows the studies to have related but not identical outcomes. To evaluate the accuracy of the resulting score, we provide point and interval estimators for the commonly used accuracy measures including the time‐specific receiver operating characteristic curves, and positive and negative predictive values. We apply the proposed procedures to develop prognostic rules for the 5‐year survival of breast cancer patients based on five breast cancer genomic studies.

18.
The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity in statistical analyses. Tackling these problems with a marker-set study such as SNP-set analysis can be an efficient solution. To construct SNP-sets, we first propose a clustering algorithm, which employs Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether the given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed based on this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. This proposed test assesses, based on Hamming distance, whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. In our proposed methodology, only genotype information is needed. No inference of haplotypes is required, and the SNPs under consideration need not be located in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. As compared with other existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs exerting a similar effect. In addition, the simulation studies demonstrated that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, employing the clustering algorithm before testing a large set of data improves the ability to confine the genetic regions that contain susceptibility markers.
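The role of Hamming distance in this abstract can be sketched as follows. This is only an illustration of the distance and of the between-group versus within-group comparison the abstract describes, not the HDAT statistic itself; the function names, the genotype coding (one character per SNP, 0/1/2) and the toy data are assumptions.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length genotype strings differ."""
    if len(a) != len(b):
        raise ValueError("genotype strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_pair_distances(cases, controls):
    """Mean Hamming distance for case-control pairs and for same-status pairs."""
    between = [hamming(a, b) for a, b in product(cases, controls)]
    within = [
        hamming(a, b)
        for group in (cases, controls)
        for a, b in combinations(group, 2)
    ]
    return sum(between) / len(between), sum(within) / len(within)


# Hypothetical SNP-set of three markers, one genotype string per individual.
cases = ["000", "001"]
controls = ["222", "221"]
print(mean_pair_distances(cases, controls))
```

A between-group mean that is clearly larger than the within-group mean suggests that genotype similarity tracks disease status. A formal test would additionally need a null distribution, for example from permuting the case and control labels.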

19.
GWAS have emerged as popular tools for identifying genetic variants that are associated with disease risk. Standard analysis of a case-control GWAS involves assessing the association between each individual genotyped SNP and disease risk. However, this approach suffers from limited reproducibility and difficulties in detecting multi-SNP and epistatic effects. As an alternative analytical strategy, we propose grouping SNPs together into SNP sets on the basis of proximity to genomic features such as genes or haplotype blocks, then testing the joint effect of each SNP set. Testing of each SNP set proceeds via the logistic kernel-machine-based test, which is based on a statistical framework that allows for flexible modeling of epistatic and nonlinear SNP effects. This flexibility and the ability to naturally adjust for covariate effects are important features of our test that make it appealing in comparison to individual SNP tests and existing multimarker tests. Using simulated data based on the International HapMap Project, we show that SNP-set testing can have improved power over standard individual-SNP analysis under a wide range of settings. In particular, we find that our approach has higher power than individual-SNP analysis when the median correlation between the disease-susceptibility variant and the genotyped SNPs is moderate to high. When the correlation is low, both individual-SNP analysis and the SNP-set analysis tend to have low power. We apply SNP-set analysis to analyze the Cancer Genetic Markers of Susceptibility (CGEMS) breast cancer GWAS discovery-phase data.

20.
Genetic mapping is one of the key steps in positional cloning. The traditional mapping strategies typically require genotyping a set of markers that are nearly evenly or randomly distributed across the genome or a region of interest. Such “grid” strategies work with low efficiency. We propose an improved mapping strategy by integrating the principle of one-dimensional optimization and information from the physical map into the standard mapping procedure used in experimental populations. Computer simulations based on a set of empirical data suggest that our new procedure can reduce the number of markers required for genotyping to less than one-fourth of that of the standard procedure. An illustrative application also demonstrates a pronounced reduction of the burden in genotyping. The proposed strategy offers quick and cost-effective access to the target gene for positional cloning without any extra expense except for making use of genomic sequence data. A Microsoft Excel spreadsheet, for performing easy calculations described in this article, is available on request from the authors.
