首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. A substantial number of recent GWAS indicate that for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. This review is written from the viewpoint that findings from the GWAS provide preliminary genetic information that is available for additional analysis by statistical procedures that accumulate evidence, and that these secondary analyses are very likely to provide valuable information that will help prioritize the strongest constellations of results. We review and discuss three analytic methods to combine preliminary GWAS statistics to identify genes, alleles, and pathways for deeper investigations. Meta-analysis seeks to pool information from multiple GWAS to increase the chances of finding true positives among the false positives and provides a way to combine associations across GWAS, even when the original data are unavailable. Testing for epistasis within a single GWAS study can identify the stronger results that are revealed when genes interact. Pathway analysis of GWAS results is used to prioritize genes and pathways within a biological context. Following a GWAS, association results can be assigned to pathways and tested in aggregate with computational tools and pathway databases. Reviews of published methods with recommendations for their application are provided within the framework for each approach.  相似文献   

2.
Although genome-wide association studies (GWAS) of complex traits have yielded more reproducible associations than had been discovered using any other approach, the loci characterized to date do not account for much of the heritability to such traits and, in general, have not led to improved understanding of the biology underlying complex phenotypes. Using a web site we developed to serve results of expression quantitative trait locus (eQTL) studies in lymphoblastoid cell lines from HapMap samples (http://www.scandb.org), we show that single nucleotide polymorphisms (SNPs) associated with complex traits (from http://www.genome.gov/gwastudies/) are significantly more likely to be eQTLs than minor-allele-frequency–matched SNPs chosen from high-throughput GWAS platforms. These findings are robust across a range of thresholds for establishing eQTLs (p-values from 10−4–10−8), and a broad spectrum of human complex traits. Analyses of GWAS data from the Wellcome Trust studies confirm that annotating SNPs with a score reflecting the strength of the evidence that the SNP is an eQTL can improve the ability to discover true associations and clarify the nature of the mechanism driving the associations. Our results showing that trait-associated SNPs are more likely to be eQTLs and that application of this information can enhance discovery of trait-associated SNPs for complex phenotypes raise the possibility that we can utilize this information both to increase the heritability explained by identifiable genetic factors and to gain a better understanding of the biology underlying complex traits.  相似文献   

3.
As volume of genomic data grows, computational methods become essential for providing a first glimpse onto gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with main focus on annotation precision, and heuristic alternatives, with main focus on scalability issues, have been described in literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.  相似文献   

4.
The study of double-stranded RNA unwinding by helicases is a problem of basic scientific interest. One such example is provided by studies on the hepatitis C virus (HCV) NS3 helicase using single molecule mechanical experiments. HCV currently infects nearly 3% of the world population and NS3 is a protein essential for viral genome replication. The objective of this study is to model the RNA unwinding mechanism based on previously published data and study its characteristics and their dependence on force, ATP and NS3 protein concentration. In this work, RNA unwinding by NS3 helicase is hypothesized to occur in a series of discrete steps and the steps themselves occurring in accordance with an underlying point process. A point process driven change point model is employed to model the RNA unwinding mechanism. The results are in large agreement with findings in previous studies. A gamma distribution based renewal process was found to model well the point process that drives the unwinding mechanism. The analysis suggests that the periods of constant extension observed during NS3 activity can indeed be classified into pauses and subpauses and that each depend on the ATP concentration. The step size is independent of external factors and seems to have a median value of 11.37 base pairs. The steps themselves are composed of a number of substeps with an average of about 4 substeps per step and an average substep size of about 3.7 base pairs. An interesting finding pertains to the stepping velocity. Our analysis indicates that stepping velocity may be of two kinds- a low and a high velocity.  相似文献   

5.
Development of post-GWAS (genome-wide association study) methods are greatly needed for characterizing the function of trait-associated SNPs. Strategies integrating various biological data sets with GWAS results will provide insights into the mechanistic role of associated SNPs. Here, we present a method that integrates RNA sequencing (RNA-seq) and allele-specific expression data with GWAS data to further characterize SNPs associated with follicular lymphoma (FL). We investigated the influence on gene expression of three established FL-associated loci—rs10484561, rs2647012, and rs6457327—by measuring their correlation with human-leukocyte-antigen (HLA) expression levels obtained from publicly available RNA-seq expression data sets from lymphoblastoid cell lines. Our results suggest that SNPs linked to the protective variant rs2647012 exert their effect by a cis-regulatory mechanism involving modulation of HLA-DQB1 expression. In contrast, no effect on HLA expression was observed for the colocalized risk variant rs10484561. The application of integrative methods, such as those presented here, to other post-GWAS investigations will help identify causal disease variants and enhance our understanding of biological disease mechanisms.  相似文献   

6.
7.
One of the key problems in the study of ancient DNA is that of authenticating sequences obtained from PCR amplifications of highly degraded samples. Contamination of ancient samples and postmortem damage to endogenous DNA templates are the major obstacles facing researchers in this task. In particular, the authentication of sequences obtained from ancient human remains is thought by many to be rather challenging. We propose a novel approach, based on the c statistic, that can be employed to help identify the sequence motif of an endogenous template, based on a sample of sequences that reflect the nucleotide composition of individual template molecules obtained from ancient tissues (such as cloned products from a PCR amplification). The c statistic exploits as information the most common form of postmortem damage observed among clone sequences in ancient DNA studies, namely, lesion-induced substitutions caused by cytosine deamination events. Analyses of simulated sets of templates with miscoding lesions and real sets of clone sequences from the literature indicate that the c-based approach is highly effective in identifying endogenous sequence motifs, even when they are not present among the sampled clones. The proposed approach is likely to be of general use to researchers working with DNA from ancient tissues, particularly from human remains, where authentication of results has been most challenging. [Reviewing Editor: Dr. Magnus Nordborg]  相似文献   

8.
9.
NIKLAS  KARL J. 《Annals of botany》1998,82(2):177-187
A method is presented to calculate the factor of safety (basedon any criterion of failure) for populations of equivalent biologicalstructures that normally sustain mechanical loads (e.g. stemsof the same order of branching). The method uses the empiricallydetermined mean and standard deviation of the actual loads andthe load capabilities provided that both types of loads arenormally distributed in the population. The factor of safetySiscalculated from the formulaS  相似文献   

10.
A statistical method is proposed for recognition of a bacterial species or for differentiation of two groups of bacterial strains. Comparison between two groups is done by the “t”-test. When the mean S-values (similarity value) of two groups are A and B, and the mean S-value for all possible combinations between strains of both groups is S, a condition necessary for defining the two groups as different species is to demonstrate the existence of equations A > S and B > S. Unless this condition is fulfilled, the two groups should be considered unseparable. The condition necessary for recognition of two groups as one species is to demonstrate the existence of equations A:= S and B = S. A few examples of the test were shown using the mycobacteria, and it was suggested that this statistical method is useful in the recognition of a species.  相似文献   

11.
In this paper we propose an instrument for collecting sensitive data that allows for each participant to customize the amount of information that she is comfortable revealing. Current methods adopt a uniform approach where all subjects are afforded the same privacy guarantees; however, privacy is a highly subjective property with intermediate points between total disclosure and non-disclosure: each respondent has a different criterion regarding the sensitivity of a particular topic. The method we propose empowers respondents in this respect while still allowing for the discovery of interesting findings through the application of well-known inferential procedures.  相似文献   

12.
《Cell metabolism》2020,31(4):741-754.e5
  1. Download : Download high-res image (160KB)
  2. Download : Download full-size image
  相似文献   

13.
14.
This article describes the preparation of starch particles, by spray drying, for possible application to a dry powder coating process. Dry powder coating consists of spraying a fine powder and a plasticizer on particles. The efficiency of the coating is linked to the powder morphological and dimensional characteristics. Different experimental parameters of the spray-drying process were analyzed, including type of solvent, starch concentration, rate of polymer feeding, pressure of the atomizing air, drying air flow, and temperature of drying air. An optimization and screening of the experimental parameters by a design of the experiment (DOE) approach have been done. Finally, the produced spray-dried starch particles were conveniently tested in a dry coating process, in comparison to the commercial initial starch. The obtained results, in terms of coating efficiency, demonstrated that the spray-dried particles led to a sharp increase of coating efficiency value.  相似文献   

15.
16.
To incorporate quality by design concepts into the management of leachables, an emphasis is often put on understanding the extractable profile for the materials of construction for manufacturing disposables, container-closure, or delivery systems. Component manufacturing processes may also impact the extractable profile. An approach was developed to (1) identify critical components that may be sources of leachables, (2) enable an understanding of manufacturing process factors that affect extractable profiles, (3) determine if quantitative models can be developed that predict the effect of those key factors, and (4) evaluate the practical impact of the key factors on the product. A risk evaluation for an inhalation product identified injection molding as a key process. Designed experiments were performed to evaluate the impact of molding process parameters on the extractable profile from an ABS inhaler component. Statistical analysis of the resulting GC chromatographic profiles identified processing factors that were correlated with peak levels in the extractable profiles. The combination of statistically significant molding process parameters was different for different types of extractable compounds. ANOVA models were used to obtain optimal process settings and predict extractable levels for a selected number of compounds. The proposed paradigm may be applied to evaluate the impact of material composition and processing parameters on extractable profiles and utilized to manage product leachables early in the development process and throughout the product lifecycle.KEY WORDS: design of experiments, extractables, injection molding, leachables, process parameters, quality by design  相似文献   

17.
Identifying antimicrobial resistant(AMR) bacteria in metagenomics samples is essential for public health and food safety. Next-generation sequencing(NGS) technology has provided a powerful tool in identifying the genetic variation and constructing the correlations between genotype and phenotype in humans and other species. However, for complex bacterial samples, there lacks a powerful bioinformatic tool to identify genetic polymorphisms or copy number variations(CNVs) for given genes. Here we provide a Bayesian framework for genotype estimation for mixtures of multiple bacteria, named as Genetic Polymorphisms Assignments(GPA). Simulation results showed that GPA has reduced the false discovery rate(FDR) and mean absolute error(MAE) in CNV and single nucleotide variant(SNV) identification. This framework was validated by whole-genome sequencing and Pool-seq data from Klebsiella pneumoniae with multiple bacteria mixture models, and showed the high accuracy in the allele fraction detections of CNVs and SNVs in AMR genes between two populations. The quantitative study on the changes of AMR genes fraction between two samples showed a good consistency with the AMR pattern observed in the individual strains. Also, the framework together with the genome annotation and population comparison tools has been integrated into an application, which could provide a complete solution for AMR gene identification and quantification in unculturable clinical samples. The GPA package is available at https://github.com/IID-DTH/GPA-package.  相似文献   

18.
19.
BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even in the step of preliminary assembly of the genome. It is especially efficient detecting unexpected genes horizontally acquired from bacterial or archaeal distant genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation which is frequently found in not fully assembled genomes. BG7 current version – which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally in any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge amount of genomes that are being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC Germany outbreak in which BG7 was used to get the first annotations right the next day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been proved for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. Besides, thanks to its plasticity, our system could be very easily adapted to work with new technologies in the future.  相似文献   

20.

Background

The study of the prehistoric origins and dispersal routes of domesticated plants is often based on the analysis of either archaeobotanical or genetic data. As more data become available, spatially explicit models of crop dispersal can be used to combine different types of evidence.

Methodology/Principal Findings

We present a model in which a crop disperses through a landscape that is represented by a conductance matrix. From this matrix, we derive least-cost distances from the geographical origin of the crop and use these to predict the age of archaeological crop remains and the heterozygosity of crop populations. We use measures of the overlap and divergence of dispersal trajectories to predict genetic similarity between crop populations. The conductance matrix is constructed from environmental variables using a number of parameters. Model parameters are determined with multiple-criteria optimization, simultaneously fitting the archaeobotanical and genetic data. The consilience reached by the model is the extent to which it converges around solutions optimal for both archaeobotanical and genetic data. We apply the modelling approach to the dispersal of maize in the Americas.

Conclusions/Significance

The approach makes possible the integrative inference of crop dispersal processes, while controlling model complexity and computational requirements.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号