Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. A substantial number of recent GWAS indicate that for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. This review is written from the viewpoint that findings from the GWAS provide preliminary genetic information that is available for additional analysis by statistical procedures that accumulate evidence, and that these secondary analyses are very likely to provide valuable information that will help prioritize the strongest constellations of results. We review and discuss three analytic methods to combine preliminary GWAS statistics to identify genes, alleles, and pathways for deeper investigations. Meta-analysis seeks to pool information from multiple GWAS to increase the chances of finding true positives among the false positives and provides a way to combine associations across GWAS, even when the original data are unavailable. Testing for epistasis within a single GWAS can identify the stronger results that are revealed when genes interact. Pathway analysis of GWAS results is used to prioritize genes and pathways within a biological context. Following a GWAS, association results can be assigned to pathways and tested in aggregate with computational tools and pathway databases. Reviews of published methods with recommendations for their application are provided within the framework for each approach.
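
As an illustration of how summary statistics can be pooled when the original genotype data are unavailable, the following is a minimal sketch of an inverse-variance-weighted fixed-effect meta-analysis for a single SNP. The review does not prescribe this particular estimator; the function name and the example inputs are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-study effect estimates for one SNP (illustrative sketch).

    betas: per-study effect sizes (e.g. log odds ratios)
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its variance: w_i = 1 / se_i^2
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p
```

Two studies that each report the same effect with the same standard error yield the same pooled effect with a standard error reduced by a factor of sqrt(2), which is why pooling can lift a true association above the significance threshold.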

Although genome-wide association studies (GWAS) of complex traits have yielded more reproducible associations than had been discovered using any other approach, the loci characterized to date do not account for much of the heritability of such traits and, in general, have not led to improved understanding of the biology underlying complex phenotypes. Using a web site we developed to serve results of expression quantitative trait locus (eQTL) studies in lymphoblastoid cell lines from HapMap samples (http://www.scandb.org), we show that single nucleotide polymorphisms (SNPs) associated with complex traits (from http://www.genome.gov/gwastudies/) are significantly more likely to be eQTLs than minor-allele-frequency-matched SNPs chosen from high-throughput GWAS platforms. These findings are robust across a range of thresholds for establishing eQTLs (p-values from 10^-4 to 10^-8), and a broad spectrum of human complex traits. Analyses of GWAS data from the Wellcome Trust studies confirm that annotating SNPs with a score reflecting the strength of the evidence that the SNP is an eQTL can improve the ability to discover true associations and clarify the nature of the mechanism driving the associations. Our results showing that trait-associated SNPs are more likely to be eQTLs and that application of this information can enhance discovery of trait-associated SNPs for complex phenotypes raise the possibility that we can utilize this information both to increase the heritability explained by identifiable genetic factors and to gain a better understanding of the biology underlying complex traits.
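
The comparison against minor-allele-frequency-matched SNPs can be sketched as a resampling test. The code below is an assumption-laden illustration, not the authors' pipeline: the tuple layout, the matching tolerance `maf_tol`, and the function name are all hypothetical.

```python
import random


def eqtl_enrichment(trait_snps, pool, n_draws=1000, maf_tol=0.05, seed=0):
    """Empirical test of eQTL enrichment among trait-associated SNPs.

    trait_snps, pool: lists of (maf, is_eqtl) tuples (hypothetical layout).
    Returns (observed eQTL count, one-sided empirical p-value).
    """
    rng = random.Random(seed)
    observed = sum(1 for _, is_eqtl in trait_snps if is_eqtl)
    # For each trait SNP, collect the eQTL flags of pool SNPs with similar MAF
    matches = []
    for maf, _ in trait_snps:
        candidates = [flag for m, flag in pool if abs(m - maf) <= maf_tol]
        if not candidates:
            raise ValueError(f"no MAF-matched SNP available for MAF={maf}")
        matches.append(candidates)
    hits = 0
    for _ in range(n_draws):
        # Draw one matched SNP per trait SNP and count eQTLs in the null set
        null_count = sum(1 for candidates in matches if rng.choice(candidates))
        if null_count >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly above zero
    return observed, (hits + 1) / (n_draws + 1)
```

A small p-value here means that randomly drawn, frequency-matched SNPs rarely contain as many eQTLs as the trait-associated set, which is the pattern the abstract reports.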

As the volume of genomic data grows, computational methods become essential for providing a first glimpse of gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with main focus on annotation precision, and heuristic alternatives, with main focus on scalability issues, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of the most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.

The study of double-stranded RNA unwinding by helicases is a problem of basic scientific interest. One such example is provided by studies on the hepatitis C virus (HCV) NS3 helicase using single molecule mechanical experiments. HCV currently infects nearly 3% of the world population and NS3 is a protein essential for viral genome replication. The objective of this study is to model the RNA unwinding mechanism based on previously published data and study its characteristics and their dependence on force, ATP and NS3 protein concentration. In this work, RNA unwinding by NS3 helicase is hypothesized to occur in a series of discrete steps, with the steps themselves occurring in accordance with an underlying point process. A point process driven change point model is employed to model the RNA unwinding mechanism. The results are largely in agreement with findings in previous studies. A gamma distribution based renewal process was found to model well the point process that drives the unwinding mechanism. The analysis suggests that the periods of constant extension observed during NS3 activity can indeed be classified into pauses and subpauses and that each depends on the ATP concentration. The step size is independent of external factors and seems to have a median value of 11.37 base pairs. The steps themselves are composed of a number of substeps with an average of about 4 substeps per step and an average substep size of about 3.7 base pairs. An interesting finding pertains to the stepping velocity. Our analysis indicates that stepping velocity may be of two kinds: a low and a high velocity.
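
To make the idea of a gamma-distribution-based renewal process concrete, the sketch below simulates event times whose waiting intervals are independent gamma draws. It is only an illustration: the `shape` and `scale` values a caller passes are placeholders, not estimates from the study, and the function name is hypothetical.

```python
import random


def simulate_gamma_renewal(shape, scale, t_end, seed=0):
    """Simulate event (step) times of a renewal process on (0, t_end].

    Waiting times between consecutive events are i.i.d. gamma(shape, scale),
    so the mean waiting time is shape * scale.
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        # random.gammavariate takes (alpha=shape, beta=scale)
        t += rng.gammavariate(shape, scale)
        if t > t_end:
            return times
        times.append(t)
```

With `shape = 1` the process reduces to a Poisson process; a shape above 1 makes waiting times more regular, which is one way a renewal model can capture pauses that are less variable than exponential waiting would predict.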

One of the key problems in the study of ancient DNA is that of authenticating sequences obtained from PCR amplifications of highly degraded samples. Contamination of ancient samples and postmortem damage to endogenous DNA templates are the major obstacles facing researchers in this task. In particular, the authentication of sequences obtained from ancient human remains is thought by many to be rather challenging. We propose a novel approach, based on the c statistic, that can be employed to help identify the sequence motif of an endogenous template, based on a sample of sequences that reflect the nucleotide composition of individual template molecules obtained from ancient tissues (such as cloned products from a PCR amplification). The c statistic exploits as information the most common form of postmortem damage observed among clone sequences in ancient DNA studies, namely, lesion-induced substitutions caused by cytosine deamination events. Analyses of simulated sets of templates with miscoding lesions and real sets of clone sequences from the literature indicate that the c-based approach is highly effective in identifying endogenous sequence motifs, even when they are not present among the sampled clones. The proposed approach is likely to be of general use to researchers working with DNA from ancient tissues, particularly from human remains, where authentication of results has been most challenging. [Reviewing Editor: Dr. Magnus Nordborg]

In this paper we propose an instrument for collecting sensitive data that allows each participant to customize the amount of information that she is comfortable revealing. Current methods adopt a uniform approach where all subjects are afforded the same privacy guarantees; however, privacy is a highly subjective property with intermediate points between total disclosure and non-disclosure: each respondent has a different criterion regarding the sensitivity of a particular topic. The method we propose empowers respondents in this respect while still allowing for the discovery of interesting findings through the application of well-known inferential procedures.

To incorporate quality by design concepts into the management of leachables, an emphasis is often put on understanding the extractable profile for the materials of construction for manufacturing disposables, container-closure, or delivery systems. Component manufacturing processes may also impact the extractable profile. An approach was developed to (1) identify critical components that may be sources of leachables, (2) enable an understanding of manufacturing process factors that affect extractable profiles, (3) determine if quantitative models can be developed that predict the effect of those key factors, and (4) evaluate the practical impact of the key factors on the product. A risk evaluation for an inhalation product identified injection molding as a key process. Designed experiments were performed to evaluate the impact of molding process parameters on the extractable profile from an ABS inhaler component. Statistical analysis of the resulting GC chromatographic profiles identified processing factors that were correlated with peak levels in the extractable profiles. The combination of statistically significant molding process parameters was different for different types of extractable compounds. ANOVA models were used to obtain optimal process settings and predict extractable levels for a selected number of compounds. The proposed paradigm may be applied to evaluate the impact of material composition and processing parameters on extractable profiles and utilized to manage product leachables early in the development process and throughout the product lifecycle. KEY WORDS: design of experiments, extractables, injection molding, leachables, process parameters, quality by design

BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even in the step of preliminary assembly of the genome. It is especially efficient at detecting unexpected genes horizontally acquired from distant bacterial or archaeal genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. The current version of BG7, which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally on any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge number of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak, in which BG7 was used to get the first annotations the day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been demonstrated for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Moreover, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future.



A fundamental issue in neuroscience is how to identify the multiple biophysical mechanisms through which neurons generate observed patterns of spiking activity. In previous work, we proposed a method for linking observed patterns of spiking activity to specific biophysical mechanisms based on a state space modeling framework and a sequential Monte Carlo, or particle filter, estimation algorithm. We have shown, in simulation, that this approach is able to identify a space of simple biophysical models that were consistent with observed spiking data (and included the model that generated the data), but have yet to demonstrate the application of the method to identify realistic currents from real spike train data. Here, we apply the particle filter to spiking data recorded from rat layer V cortical neurons, and correctly identify the dynamics of a slow, intrinsic current. The underlying intrinsic current is successfully identified in four distinct neurons, even though the cells exhibit two distinct classes of spiking activity: regular spiking and bursting. This approach, which links statistical, computational, and experimental neuroscience, provides an effective technique to constrain detailed biophysical models to specific mechanisms consistent with observed spike train data.

In various social species, animals have been observed to share friendly relationships with some group members and to resolve conflicts through reconciliation, the exchange of affiliative behaviour soon after a conflict that functions to restore the relationship between the former opponents. The valuable relationship hypothesis predicts that reconciliation should be observed more often after conflicts between friends. Friendly relationships can be described by three dimensions (i.e. value, security and compatibility); however, research into the relative importance of these dimensions for the occurrence of reconciliation is sparse. Moreover, reconciliation may depend on factors other than the social relationship between opponents including, for example, their social status or the context of the conflict. Our study aimed at analysing which factors are important determinants of reconciliation and at testing the valuable relationship hypothesis, by analysing the relative effects of relationship value, security and compatibility on the occurrence and timing of reconciliation. We collected data on two troops of wild Japanese macaques living on Yakushima Island, Japan, and selected the best predicting variables of reconciliation using linear mixed models. Our results show that reconciliation occurs more frequently, and earlier, after conflicts between opponents who exchange a higher percentage of grooming. Two additional variables related to relationship security and value were selected in the best models: frequency of aggression and of approaches resulting in tolerated co‐feeding. Among the variables not related to relationship quality, distance between opponents at the end of the conflict, kinship, sex of the opponents and context of conflict (i.e. during feeding or social time) were included in our models. Our findings support the valuable relationship hypothesis and, in particular, highlight that the fitness‐related benefits of social relationships (i.e. 
the relationship value) are important determinants of the evolution of friendly relationships and reconciliation.

With the aim of integrating HIV and tuberculosis care in rural Kenya, a team of researchers, clinicians, and technologists used the human-centered design approach to facilitate design, development, and deployment processes of a new patient-specific TB clinical decision support system for medical providers. In Kenya, approximately 1.6 million people are living with HIV and have a 20-times higher risk of dying of tuberculosis. Although tuberculosis prevention and treatment medication is widely available, proven to save lives, and prioritized by the World Health Organization, ensuring that it reaches the most vulnerable communities remains challenging. Human-centered design, used in the fields of industrial design and information technology for decades, is an approach to improving the effectiveness and impact of innovations that has been scarcely used in the health field. Using this approach, our team followed a 3-step process, involving mixed methods assessment to (1) understand the situation through the collection and analysis of site observation sessions and key informant interviews; (2) develop a new clinical decision support system through iterative prototyping, end-user engagement, and usability testing; and (3) implement and evaluate the system across 24 clinics in rural West Kenya. Through the application of this approach, we found that human-centered design facilitated the process of digital innovation in a complex and resource-constrained context.

The vast majority of genome-wide association study (GWAS) findings reported to date are from populations with European Ancestry (EA), and it is not yet clear how broadly the genetic associations described will generalize to populations of diverse ancestry. The Population Architecture Using Genomics and Epidemiology (PAGE) study is a consortium of multi-ancestry, population-based studies formed with the objective of refining our understanding of the genetic architecture of common traits emerging from GWAS. In the present analysis of five common diseases and traits, including body mass index, type 2 diabetes, and lipid levels, we compare direction and magnitude of effects for GWAS-identified variants in multiple non-EA populations against EA findings. We demonstrate that, in all populations analyzed, a significant majority of GWAS-identified variants have allelic associations in the same direction as in EA, with none showing a statistically significant effect in the opposite direction, after adjustment for multiple testing. However, 25% of tagSNPs identified in EA GWAS have significantly different effect sizes in at least one non-EA population, and these differential effects were most frequent in African Americans where all differential effects were diluted toward the null. We demonstrate that differential LD between tagSNPs and functional variants within populations contributes significantly to dilute effect sizes in this population. Although most variants identified from GWAS in EA populations generalize to all non-EA populations assessed, genetic models derived from GWAS findings in EA may generate spurious results in non-EA populations due to differential effect sizes. Regardless of the origin of the differential effects, caution should be exercised in applying any genetic risk prediction model based on tagSNPs outside of the ancestry group in which it was derived. 
Models based directly on functional variation may generalize more robustly, but the identification of functional variants remains challenging.
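
One simple way to ask whether "a significant majority" of variants share the direction of effect seen in EA populations is an exact one-sided sign test. The abstract does not state which test the authors used, so the sketch below is an assumption; the function name is hypothetical.

```python
from math import comb


def sign_test_concordance(n_same, n_total):
    """Exact one-sided binomial p-value for direction-of-effect concordance.

    Under the null, each variant is equally likely to have an effect in the
    same or the opposite direction (p = 0.5). Returns P(X >= n_same).
    """
    if not 0 <= n_same <= n_total:
        raise ValueError("n_same must lie between 0 and n_total")
    # Sum the upper tail of the Binomial(n_total, 0.5) distribution
    tail = sum(comb(n_total, k) for k in range(n_same, n_total + 1))
    return tail / 2 ** n_total
```

For example, if 9 of 10 variants pointed in the same direction, the function would return 11/1024, roughly 0.011. A test of this kind addresses direction only and says nothing about the differences in effect size that the abstract highlights.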

An immunological approach to the detection of taurine resulted in antibodies specific enough to be used for immunocytochemical studies. The experimental conditions were similar to those previously described for raising antibodies against some small-sized neurotransmitter molecules: antisera were obtained from rabbits immunized with taurine conjugated to carrier proteins via glutaraldehyde and purified by adsorption on the glutaraldehyde-treated protein carriers. Antibody affinity and specificity were determined in competition experiments between conjugated taurine and other conjugated amino acids or derivatives by enzyme-linked immunosorbent assay. The resulting cross-reactivity ratios, calculated at half-displacement, showed conjugated taurine to be the best recognized compound. Given the molecular structure of taurine and the method used to prepare the conjugate, it seemed necessary to perform an oxidation step. However, adsorption of antisera on reoxidized or nonreoxidized taurine conjugates suggested that reoxidation did not make a significant difference. Immunocytochemical application of the sera revealed populations of strongly immunopositive nerve cells in the cerebellum, striatum, and septum. The results confirmed that antitaurine antibodies can be used as specific tools for a better understanding of the role of taurine in the central nervous system.

Soil characteristics influence earthworm population dynamics, species distribution and community structure. Accordingly, in the present study an attempt was made to determine the soil physicochemical factors influencing earthworms of Kashmir valley with a view to improving soil productivity by enhancing earthworm diversity under different pedoecosystems. Data collection on 15 soil parameters from 20 earthworm inhabiting sites revealed significant variation within and among the sites in soil temperature (F23, 19 = 148.83, 9.71; P < 0.05), moisture (F23, 19 = 16.91, 46.20; P < 0.05), pH (F19 = 47.21; P < 0.05), electrical conductivity (F23, 19 = 11.67, 87.13; P < 0.05), sodium (F23, 19 = 2.46, 211.25; P < 0.05), potassium (F19 = 22.91; P < 0.05), calcium (F19 = 15.90; P < 0.05), magnesium (F23, 19 = 1.76, 104.51; P < 0.05), organic carbon (F23, 19 = 64.60, 222.50; P < 0.05), organic nitrogen (F23, 19 = 4.59, 3.81; P < 0.05) and phosphorus (F23, 19 = 5.11, 137.87; P < 0.05). Aporrectodea caliginosa trapezoides and A. rosea rosea exhibited a wide range of distribution whereas Octolasion cyaneum, A. c. trapezoides and A. parva showed restricted distribution. Hierarchical cluster analysis grouped the 20 earthworm collection sites into three clusters: earthworm absent sites, low earthworm diversity sites and moderate earthworm diversity sites. Principal component analysis of the data set from the 20 sites yielded four latent factors accounting for 77.95 % of total variance and indicated that the factors affecting earthworm communities are mainly related to a physical habitat factor, a chemical factor, a soil texture factor and a growth factor, accounting for 26.41, 20.16, 18.25 and 13.13 % of total variance respectively.



This is the first statistical demonstration that heterozygosity is not responsible for classical Mendelian FMF per se, but constitutes a susceptibility factor for clinically-similar multifactorial forms of the disease. We also provide a first estimate of the risk for heterozygotes to develop FMF.

