Similar articles
1.

Background

Genome-wide association studies (GWAS) owe their popularity to the expectation that they will make a major impact on the diagnosis, prognosis and management of disease by uncovering the genetics underlying clinical phenotypes. The dominant paradigm in GWAS data analysis so far relies extensively on methods that emphasize the contribution of individual SNPs to statistical association with phenotypes. Multivariate methods, however, can extract more information by considering associations of multiple SNPs simultaneously. Recent advances in other genomics domains pinpoint multivariate causal graph-based inference as a promising principled analysis framework for high-throughput data. Designed to discover biomarkers in the local causal pathway of the phenotype, these methods lead to accurate and highly parsimonious multivariate predictive models. In this paper, we investigate the applicability of the causal graph-based method TIE* to the analysis of GWAS data. To test the utility of TIE*, we focus on anti-CCP positive rheumatoid arthritis (RA) GWAS datasets, where there is a general consensus in the community about the major genetic determinants of the disease.

Results

Application of TIE* to the North American Rheumatoid Arthritis Cohort (NARAC) GWAS data results in six SNPs, mostly from the MHC locus. Using these SNPs we develop two predictive models that can classify cases and disease-free controls with an accuracy of 0.81 area under the ROC curve, as verified in independent testing data from the same cohort. The predictive performance of these models generalizes reasonably well to Swedish subjects from the closely related but not identical Epidemiological Investigation of Rheumatoid Arthritis (EIRA) cohort with 0.71-0.78 area under the ROC curve. Moreover, the SNPs identified by the TIE* method render many other previously known SNP associations conditionally independent of the phenotype.
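The area under the ROC curve quoted above can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `roc_auc` and the toy scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model outputs, higher meaning "more likely a case".
    labels: 1 for cases, 0 for disease-free controls.
    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example (hypothetical scores): three of the four case-control pairs are ranked correctly.
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.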

Conclusions

Our experiments demonstrate that application of TIE* captures the maximum amount of genetic information about RA in the data and recapitulates the major consensus findings about the genetic factors of this disease. In addition, TIE* yields reproducible markers and signatures of RA. This suggests that a principled multivariate causal and predictive framework for GWAS analysis empowers the community with a new tool for high-quality and more efficient discovery.

Reviewers

This article was reviewed by Prof. Anthony Almudevar, Dr. Eugene V. Koonin, and Prof. Marianthi Markatou.

2.
D J Diller, M R Redinbo, E Pohl, W G Hol. Proteins 1999, 36(4): 526-541
A significant portion of new protein structures contain folds that are related to those seen before. During the development of a computer program that can accurately position, in electron density maps, large protein domains with large structural deviations, it became apparent that the redundancy in protein folds could be used in a nontrivial manner during a protein structure determination. As a result, a computational procedure, Database Assisted Density Interpretation (DADI), was developed and tested to aid in the building of models in protein crystallography and to assist in interpreting electron density maps. The initial tests of the DADI procedure using a small database of protein domains are described. The philosophy is to first work with entire domains, then with the secondary structure elements of these domains, and finally with individual residues of the secondary structure elements via Monte Carlo, "chopping" and "clipping" procedures, respectively. The first test case was a traceable 3.2 Å multiple isomorphous replacement with anomalous scattering (MIRAS) electron density map of a human topoisomerase I-DNA complex. The second test case uses poor electron density for the third domain of the diphtheria toxin repressor resulting from a molecular replacement solution with the first two domains. Despite the fact that a fairly small database was employed in these test cases, the DADI procedure was able to find a large portion of the protein backbone with very few errors. In the first case nearly 45% of the backbone and more than 80% of the secondary structure was placed automatically. In the second test case nearly 50% of the third domain was automatically detected. A particularly encouraging result was that in both cases more than 75% of the beta sheet secondary structure was found automatically by the DADI procedure. Clearly, the procedures employed are promising avenues to exploit the current explosion of protein structures for the determination of future structures. Proteins 1999;36:526-541.

3.
Xu C, Li Z, Xu S. Genetics 2005, 169(2): 1045-1059
Joint mapping for multiple quantitative traits has shed new light on genetic mapping by pinpointing pleiotropic effects and close linkage. Joint mapping also can improve statistical power of QTL detection. However, such a joint mapping procedure has not been available for discrete traits. Most disease resistance traits are measured as one or more discrete characters. These discrete characters are often correlated. Joint mapping for multiple binary disease traits may provide an opportunity to explore pleiotropic effects and increase the statistical power of detecting disease loci. We develop a maximum-likelihood method for mapping multiple binary traits. We postulate a set of multivariate normal disease liabilities, each contributing to the phenotypic variance of one disease trait. The underlying liabilities are linked to the binary phenotypes through some underlying thresholds. The new method actually maps loci for the variation of multivariate normal liabilities. As a result, we are able to take advantage of existing methods of joint mapping for quantitative traits. We treat the multivariate liabilities as missing values so that an expectation-maximization (EM) algorithm can be applied here. We also extend the method to joint mapping for both discrete and continuous traits. Efficiency of the method is demonstrated using simulated data. We also apply the new method to a set of real data and detect several loci responsible for blast resistance in rice.

4.
5.
6.
Microarray experiments can generate enormous amounts of data, but large datasets are usually inherently complex, and the relevant information they contain can be difficult to extract. For the practicing biologist, we provide an overview of what we believe to be the most important issues that need to be addressed when dealing with microarray data. In a microarray experiment we are simply trying to identify which genes are the most "interesting" in terms of our experimental question, and these will usually be those that are either overexpressed or underexpressed (upregulated or downregulated) under the experimental conditions. Analysis of the data to find these genes involves first preprocessing of the raw data for quality control, including filtering of the data (e.g., detection of outlying values) followed by standardization of the data (i.e., making the data uniformly comparable throughout the dataset). This is followed by the formal quantitative analysis of the data, which will involve either statistical hypothesis testing or multivariate pattern recognition. Statistical hypothesis testing is the usual approach to "class comparison," where several experimental groups are being directly compared. The best approach to this problem is to use analysis of variance, although issues related to multiple hypothesis testing and probability estimation still need to be evaluated. Pattern recognition can involve "class prediction," for which a range of supervised multivariate techniques are available, or "class discovery," for which an even broader range of unsupervised multivariate techniques have been developed. Each technique has its own limitations, which need to be kept in mind when making a choice from among them. To put these ideas in context, we provide a detailed examination of two specific examples of the analysis of microarray data, both from parasitology, covering many of the most important points raised.

7.
An open-source Python library EMDA for cryo-EM map and model manipulation is presented with a specific focus on validation. The use of several functionalities in the library is presented through several examples. The utility of local correlation as a metric for identifying map-model differences and unmodeled regions in maps, and how it is used as a metric of map-model validation is demonstrated. The mapping of local correlation to individual atoms, and its use to draw insights on local signal variations are discussed. EMDA’s likelihood-based map overlay is demonstrated by carrying out a superposition of two domains in two related structures. The overlay is carried out first to bring both maps into the same coordinate frame and then to estimate the relative movement of domains. Finally, the map magnification refinement in EMDA is presented with an example to highlight the importance of adjusting the map magnification in structural comparison studies.

8.
9.
Platelet aggregation by oral streptococci
One proposed mechanism in the pathogenesis of infective endocarditis is the direct aggregation of platelets by the bacteria causing the disease. Some, but not all, strains of Streptococcus sanguis have been reported to aggregate platelets but the taxonomy of this and related taxa has changed recently. The ability to aggregate platelets by 24 genetically grouped laboratory stock strains was studied along with 8 recent isolates from cases of endocarditis. Strains belonging to S. sanguis could aggregate platelets, but not S. gordonii, "S. parasanguis", S. mitis, S. oralis or related taxa. Also, preliminary data indicate that certain biotypes of S. sanguis lack the ability to aggregate platelets. Of the recent clinical isolates, only 4 aggregated platelets and each of these showed phenotypes typical of S. sanguis. These data suggest that the ability to aggregate platelets is not essential for an organism to be able to cause endocarditis, although it may be a significant virulence factor.

10.
Gerbera hybrida is an economically important cut flower. In the production and transportation of gerbera, which involve unavoidable periods of high relative humidity, grey mould occurs and results in losses in quality and quantity of flowers. Considering the limitations of chemical use in greenhouses and the impossibility of using these chemicals at auction or after sale, breeding for resistant gerbera cultivars is considered the best practical approach. In this study, we developed two segregating F1 populations (called S and F). Four parental linkage maps were constructed using common and parental specific SNP markers developed from expressed sequence tag sequencing. Parental genetic maps, containing 30, 29, 27 and 28 linkage groups and a consensus map covering 24 of the 25 expected chromosomes, could be constructed. After evaluation of Botrytis disease severity using three different tests, whole inflorescence, bottom (of disc florets) and ray floret, quantitative trait locus (QTL) mapping was performed using the four individual parental maps. A total of 20 QTLs (including one identical QTL for whole inflorescence and bottom tests) were identified in the parental maps of the two populations. The number of QTLs found and the explained variance of most QTLs detected reflect the complex mechanism of Botrytis disease response.

11.
Many linkage studies are performed in inbred populations, either small isolated populations or large populations with a long tradition of marriages between relatives. In such populations, there exist very complex genealogies with unknown loops. Therefore, the true inbreeding coefficient of an individual is often unknown. Good estimators of the inbreeding coefficient (f) are important, since it has been shown that underestimation of f may lead to false linkage conclusions. When an individual is genotyped for markers spanning the whole genome, it should be possible to use this genomic information to estimate that individual's f. To do so, we propose a maximum-likelihood method that takes marker dependencies into account through a hidden Markov model. This methodology also allows us to infer the full probability distribution of the identity-by-descent (IBD) status of the two alleles of an individual at each marker along the genome (posterior IBD probabilities) and provides a variance for the estimates. We simulate a full genome scan mimicking the true autosomal genome for (1) a first-cousin pedigree and (2) a quadruple-second-cousin pedigree. In both cases, we find that our method accurately estimates f for different marker maps. We also find that the proportion of genome IBD in an individual with a given genealogy is very variable. The approach is illustrated with data from a study of demyelinating autosomal recessive Charcot-Marie-Tooth disease.
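The idea of estimating f by maximum likelihood through a hidden Markov model can be illustrated with a two-state chain (non-IBD, IBD) along the markers. The sketch below is not the authors' implementation and rests on several assumptions that the abstract does not state: the transition probabilities use one common parameterisation with a rate parameter `a` that is held fixed, genotyping error is ignored so that a heterozygous marker cannot be IBD, and f is found by a simple grid search. All function and parameter names are hypothetical.

```python
import math


def log_likelihood(f, a, genotypes, freqs, distances):
    """Forward algorithm for a two-state HMM (state 0 = non-IBD, state 1 = IBD).

    genotypes: list of (allele, allele) index pairs, one per marker.
    freqs:     list of allele-frequency lists, one per marker.
    distances: genetic distances between consecutive markers (length n - 1).
    a:         assumed rate at which IBD status changes along the genome.
    """
    def emission(k):
        a1, a2 = genotypes[k]
        p1, p2 = freqs[k][a1], freqs[k][a2]
        if a1 == a2:
            return p1 * p1, p1        # (non-IBD, IBD) for a homozygote
        return 2.0 * p1 * p2, 0.0     # a heterozygote cannot be IBD (no error model)

    e0, e1 = emission(0)
    alpha = [(1.0 - f) * e0, f * e1]  # stationary start: P(IBD) = f
    loglik = 0.0
    for k in range(len(genotypes)):
        if k > 0:
            stay = math.exp(-a * distances[k - 1])
            to_ibd_from_non = (1.0 - stay) * f
            to_ibd_from_ibd = (1.0 - stay) * f + stay
            prior1 = alpha[0] * to_ibd_from_non + alpha[1] * to_ibd_from_ibd
            prior0 = alpha[0] * (1.0 - to_ibd_from_non) + alpha[1] * (1.0 - to_ibd_from_ibd)
            e0, e1 = emission(k)
            alpha = [prior0 * e0, prior1 * e1]
        scale = alpha[0] + alpha[1]
        if scale == 0.0:
            return float("-inf")      # data impossible under this f
        loglik += math.log(scale)
        alpha = [alpha[0] / scale, alpha[1] / scale]
    return loglik


def estimate_f(genotypes, freqs, distances, a=1.0):
    """Grid-search maximum-likelihood estimate of f on {0.00, 0.01, ..., 0.99}."""
    grid = [i / 100 for i in range(100)]
    return max(grid, key=lambda f: log_likelihood(f, a, genotypes, freqs, distances))


# Toy data (hypothetical): 20 biallelic markers with allele frequencies 0.5 / 0.5.
freqs = [[0.5, 0.5]] * 20
distances = [1.0] * 19
print(estimate_f([(0, 1)] * 20, freqs, distances))  # all heterozygous
print(estimate_f([(0, 0)] * 20, freqs, distances))  # all homozygous
```

An individual who is heterozygous everywhere gives an estimate at the bottom of the grid, while long runs of homozygosity push the estimate upward. A fuller treatment would also estimate `a`, add a backward pass to obtain the posterior IBD probabilities at each marker, and report a variance for the estimate.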

12.
Cryo-electron microscopy of "single particles" is a powerful method to analyze structures of large macromolecular assemblies that are not amenable to investigation by traditional X-ray crystallographic methods. A key step in these studies is to obtain atomic interpretations of multiprotein complexes by fitting atomic structures of individual components into maps obtained from electron microscopic data. Here, we report the use of a "core-weighting" method, combined with a grid-threading Monte Carlo (GTMC) approach for this purpose. The "core" of an individual structure is defined to represent the part where the density distribution is least likely to be altered by other components that comprise the macromolecular assembly of interest. The performance of the method has been evaluated by its ability to determine the correct fit of (i) the α-chain of the T-cell receptor variable domain into a simulated map of the αβ complex at resolutions between 5 and 40 Å, and (ii) the E2 catalytic domain of the pyruvate dehydrogenase into an experimentally determined map, at 14 Å resolution, of the icosahedral complex formed by 60 copies of this enzyme. Using the X-ray structures of the two test cases as references, we demonstrate that, in contrast to more traditional methods, the combination of the core-weighting method and the grid-threading Monte Carlo approach can identify the correct fit reliably and rapidly from the low-resolution maps that are typical of structures determined with the use of single-particle electron microscopy.

13.
Recently we found that visual arrestin binds microtubules and that this interaction plays an important role in arrestin localization in photoreceptor cells. Here we use site-directed mutagenesis and spin labeling to explore the molecular mechanism of this novel regulatory interaction. The microtubule binding site maps to the concave sides of the two arrestin domains, overlapping with the rhodopsin binding site, which makes arrestin interactions with rhodopsin and microtubules mutually exclusive. Arrestin interaction with microtubules is enhanced by several "activating mutations" and involves multiple positive charges and hydrophobic elements. The comparable affinity of visual arrestin for microtubules and unpolymerized tubulin (K_D > 40 μM and > 65 μM, respectively) suggests that the arrestin binding site is largely localized on the individual αβ-dimer. The changes in the spin-spin interaction of a double-labeled arrestin indicate that the conformation of microtubule-bound arrestin differs from that of free arrestin in solution. In sharp contrast to rhodopsin, where tight binding requires an extended interdomain hinge, arrestin binding to microtubules is enhanced by deletions in this region, suggesting that in the process of microtubule binding the domains may move in the opposite direction. Thus, microtubule and rhodopsin binding induce different conformational changes in arrestin, suggesting that arrestin assumes three distinct conformations in the cell, likely with different functional properties.

14.
Banerjee S, Carlin BP. Biometrics 2004, 60(1): 268-275
Several recent papers (e.g., Chen, Ibrahim, and Sinha, 1999, Journal of the American Statistical Association 94, 909-919; Ibrahim, Chen, and Sinha, 2001a, Biometrics 57, 383-388) have described statistical methods for use with time-to-event data featuring a surviving fraction (i.e., a proportion of the population that never experiences the event). Such cure rate models and their multivariate generalizations are quite useful in studies of multiple diseases to which an individual may never succumb, or from which an individual may reasonably be expected to recover following treatment (e.g., various types of cancer). In this article we extend these models to allow for spatial correlation (estimable via zip code identifiers for the subjects) as well as interval censoring. Our approach is Bayesian, where posterior summaries are obtained via a hybrid Markov chain Monte Carlo algorithm. We compare across a broad collection of rather high-dimensional hierarchical models using the deviance information criterion, a tool recently developed for just this purpose. We apply our approach to the analysis of a smoking cessation study where the subjects reside in 53 southeastern Minnesota zip codes. In addition to the usual posterior estimates, our approach yields smoothed zip code level maps of model parameters related to the relapse rates over time and the ultimate proportion of quitters (the cure rates).
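The notion of a surviving fraction can be made concrete with the non-spatial cure rate model usually attributed to Chen, Ibrahim and Sinha (1999), in which the population survival function is S(t) = exp(-theta * F(t)) and the cure fraction is the limit exp(-theta). The sketch below assumes that form and, purely for illustration, an exponential distribution for F. It does not include the spatial correlation or interval censoring developed in the article, and the function names and parameter values are hypothetical.

```python
import math


def population_survival(t, theta, rate):
    """Population survival S(t) = exp(-theta * F(t)).

    F is taken to be an exponential CDF with the given rate, an assumption
    made for illustration only. theta > 0 controls the cure fraction.
    """
    cdf = 1.0 - math.exp(-rate * t)
    return math.exp(-theta * cdf)


def cure_fraction(theta):
    """Limit of S(t) as t grows: the share of subjects who never experience the event."""
    return math.exp(-theta)


# Hypothetical parameter values: survival decays from 1 and levels off at the cure fraction.
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, round(population_survival(t, theta=1.2, rate=0.5), 4))
print("cure fraction:", round(cure_fraction(1.2), 4))
```

In a smoking cessation setting, the event would be relapse and the cure fraction would correspond to the ultimate proportion of quitters.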

15.

Background

Studies on political ideology and health have found associations between individual ideology and health as well as between ecological measures of political ideology and health. Individual ideology and aggregate measures such as political regimes, however, have never been examined simultaneously.

Methodology/Principal Findings

Using adjusted logistic multilevel models to analyze data on individuals from 29 European countries and Israel, we found that individual ideology and political regime are independently associated with self-rated health. Individuals with rightwing ideologies report better health than leftwing individuals. Respondents from Eastern Europe and former Soviet republics report poorer health than individuals from social democratic, liberal, Christian conservative, and former Mediterranean dictatorship countries. In contrast to individual ideology and political regimes, country level aggregations of individual ideology are not related to reporting poor health.

Conclusions/Significance

This study shows that although both individual political ideology and contextual political regime are independently associated with individuals' self-rated health, individual political ideology appears to be more strongly associated with self-rated health than political regime.

16.
Regulators of complement activation (RCA) inhibit complement‐induced immune responses on healthy host tissues. We present crystal structures of human RCA (MCP, DAF, and CR1) and a smallpox virus homolog (SPICE) bound to complement component C3b. Our structural data reveal that up to four consecutive homologous CCP domains (i–iv), responsible for inhibition, bind in the same orientation and extended arrangement at a shared binding platform on C3b. Large sequence variations in CCP domains explain the diverse C3b‐binding patterns, with limited or no contribution of some individual domains, while all regulators show extensive contacts with C3b for the domains at the third site. A variation of ~100° rotation around the longitudinal axis is observed for domains binding at the fourth site on C3b, without affecting the overall binding mode. The data suggest a common evolutionary origin for both inhibitory mechanisms, called decay acceleration and cofactor activity, with variable C3b binding through domains at sites ii, iii, and iv, and provide a framework for understanding RCA disease‐related mutations and immune evasion.

17.
Microarray studies with human subjects often have limited sample sizes, which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparations, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to carry out an automatic correction of individual datasets to reduce the effect of such ‘latent variables’ (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis (SVA), but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method is able to improve agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern.

18.
Robyr D, Suka Y, Xenarios I, Kurdistani SK, Wang A, Suka N, Grunstein M. Cell 2002, 109(4): 437-446
Yeast contains a family of five related histone deacetylases (HDACs) whose functions are known at few genes. Therefore, we used chromatin immunoprecipitation and intergenic microarrays to generate genome-wide HDAC enzyme activity maps. Rpd3 and Hda1 deacetylate mainly distinct promoters and gene classes where they are recruited largely by novel mechanisms. Hda1 also deacetylates subtelomeric domains containing normally repressed genes that are used instead for gluconeogenesis, growth on carbon sources other than glucose, and adverse growth conditions. These domains have certain features of heterochromatin but are distinct from subtelomeric heterochromatin repressed by the deacetylase Sir2. Finally, Hos1/Hos3 and Hos2 preferentially affect ribosomal DNA and ribosomal protein genes, respectively. Thus, acetylation microarrays uncover the "division of labor" for yeast histone deacetylases.

19.
A central theoretical goal of epidemiology is the construction of spatial models of disease prevalence and risk, including maps for the potential spread of infectious disease. We provide three continent-wide maps representing the relative risk of malaria in Africa based on ecological niche models of vector species and risk analysis at a spatial resolution of 1 arc-minute (9 185 275 cells of approximately 4 sq km). Using a maximum entropy method we construct niche models for 10 malaria vector species based on species occurrence records since 1980, 19 climatic variables, altitude, and land cover data (in 14 classes). For seven vectors (Anopheles coustani, A. funestus, A. melas, A. merus, A. moucheti, A. nili, and A. paludis) these are the first published niche models. We predict that Central Africa has poor habitat for both A. arabiensis and A. gambiae, and that A. quadriannulatus and A. arabiensis have restricted habitats in Southern Africa as claimed by field experts in criticism of previous models. The results of the niche models are incorporated into three relative risk models which assume different ecological interactions between vector species. The "additive" model assumes no interaction; the "minimax" model assumes maximum relative risk due to any vector in a cell; and the "competitive exclusion" model assumes the relative risk that arises from the most suitable vector for a cell. All models include variable anthropophilicity of vectors and spatial variation in human population density. Relative risk maps are produced from these models. All models predict that human population density is the critical factor determining malaria risk. Our method of constructing relative risk maps is equally general. We discuss the limits of the relative risk maps reported here, and the additional data that are required for their improvement. The protocol developed here can be used for any other vector-borne disease.
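The difference between the three relative risk models can be sketched for a single map cell. The code below is one possible reading of the verbal definitions in the abstract, not the authors' formulas: it assumes that each vector's contribution is its niche-model suitability multiplied by an anthropophilicity weight, and that human population density enters as a common multiplier. The function name, parameter names and numbers are hypothetical.

```python
def relative_risk(suitability, anthropophilicity, model, density=1.0):
    """Combine per-vector contributions into one relative risk value for a cell.

    suitability:       niche-model suitability of each vector species in the cell.
    anthropophilicity: assumed weight for how strongly each vector feeds on humans.
    density:           assumed multiplier for human population density in the cell.
    """
    risks = [s * w for s, w in zip(suitability, anthropophilicity)]
    if model == "additive":
        # No interaction between vectors: contributions add up.
        combined = sum(risks)
    elif model == "minimax":
        # The cell takes the largest risk contributed by any single vector.
        combined = max(risks)
    elif model == "competitive_exclusion":
        # Only the vector with the highest habitat suitability counts.
        best = max(range(len(suitability)), key=lambda v: suitability[v])
        combined = risks[best]
    else:
        raise ValueError("unknown model: " + model)
    return density * combined


# Hypothetical cell with two vectors: the more suitable one is the less anthropophilic.
suit = [0.9, 0.6]
anthro = [0.2, 1.0]
for name in ("additive", "minimax", "competitive_exclusion"):
    print(name, relative_risk(suit, anthro, name))
```

In this toy cell the three models disagree because the best-adapted vector is not the one carrying the highest risk, which is the situation in which the choice of ecological assumption matters.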

20.

Background

Deep mining of healthcare data has provided maps of comorbidity relationships between diseases. In parallel, integrative multi-omics investigations have generated high-resolution molecular maps of putative relevance for understanding disease initiation and progression. Yet, it is unclear how to advance an observation of comorbidity relations (one disease to others) to a molecular understanding of the driver processes and associated biomarkers.

Results

Since chronic obstructive pulmonary disease (COPD) has emerged as a central hub in temporal comorbidity networks, we developed a systematic integrative data-driven framework to identify shared disease-associated genes and pathways, as a proxy for the underlying generative mechanisms inducing comorbidity. We integrated records from approximately 13 million patients from the Medicare database with disease-gene maps that we derived from several resources including a semantic-derived knowledge base. Using rank-based statistics, we not only recovered known comorbidities but also discovered a novel association between COPD and digestive diseases. Furthermore, our analysis provides the first set of COPD comorbidity candidate biomarkers, including IL15, TNF and JUP, and characterizes their association with aging and lifestyle conditions, such as smoking and physical activity.

Conclusions

The developed framework provides novel insights into COPD and especially into the mechanisms associated with COPD comorbidity. The methodology could be used to discover and decipher the molecular underpinnings of other comorbidity relationships and, furthermore, to identify candidate comorbidity biomarkers.
