
1.

Polymorphism Interaction Analysis (PIA): a method for investigating complex gene-gene interactions

Leah E Mechanic Brian T Luke Julie E Goodman Stephen J Chanock Curtis C Harris 《BMC bioinformatics》2008,9(1):146

Background

The risk of common diseases is likely determined by the complex interplay between environmental and genetic factors, including single nucleotide polymorphisms (SNPs). Traditional methods of data analysis are poorly suited for detecting complex interactions due to sparseness of data in high dimensions, which often occurs when data are available for a large number of SNPs for a relatively small number of samples. Validation of associations observed using multiple methods should be implemented to minimize the likelihood of false-positive associations. Moreover, high-throughput genotyping methods allow investigators to genotype thousands of SNPs at one time. Investigating associations for each individual SNP or interactions between SNPs using traditional approaches is inefficient and prone to false positives.

2.

SNP interaction detection with random forests in high-dimensional genetic data

SJ Winham CL Colby RR Freimuth X Wang M de Andrade M Huebner JM Biernacka 《BMC bioinformatics》2012,13(1):164

Background

Identifying variants associated with complex human traits in high-dimensional data is a central goal of genome-wide association studies. However, complicated etiologies such as gene-gene interactions are ignored by the univariate analysis usually applied in these studies. Random Forests (RF) are a popular data-mining technique that can accommodate a large number of predictor variables and allow for complex models with interactions. RF analysis produces measures of variable importance that can be used to rank the predictor variables. Thus, single nucleotide polymorphism (SNP) analysis using RFs is gaining popularity as a potential filter approach that considers interactions in high-dimensional data. However, the impact of data dimensionality on the power of RF to identify interactions has not been thoroughly explored. We investigate the ability of rankings from variable importance measures to detect gene-gene interaction effects and their potential effectiveness as filters compared to p-values from univariate logistic regression, particularly as the data becomes increasingly high-dimensional.

Results

RF effectively identifies interactions in low dimensional data. As the total number of predictor variables increases, probability of detection declines more rapidly for interacting SNPs than for non-interacting SNPs, indicating that in high-dimensional data the RF variable importance measures are capturing marginal effects rather than capturing the effects of interactions.

Conclusions

While RF remains a promising data-mining technique that extends univariate methods to condition on multiple variables simultaneously, RF variable importance measures fail to detect interaction effects in high-dimensional data in the absence of a strong marginal component, and therefore may not be useful as a filter technique that allows for interaction effects in genome-wide data.

3.

MegaSNPHunter: a learning approach to detect disease predisposition SNPs and high level interactions in genome wide association study

Xiang Wan Can Yang Qiang Yang Hong Xue Nelson LS Tang Weichuan Yu 《BMC bioinformatics》2009,10(1):13-15

Background

Interactions among multiple single nucleotide polymorphisms (SNPs) are widely hypothesized to affect an individual's susceptibility to complex diseases. Although much work has been done to identify and quantify the importance of multi-SNP interactions, few methods can handle genome-wide data, owing to the combinatorially explosive search space and the difficulty of statistically evaluating high-order interactions given limited samples.

4.

Graphle: Interactive exploration of large, dense graphs

Curtis Huttenhower Sajid O Mehmood Olga G Troyanskaya 《BMC bioinformatics》2009,10(1):417

Background

A wide variety of biological data can be modeled as network structures, including experimental results (e.g. protein-protein interactions), computational predictions (e.g. functional interaction networks), or curated structures (e.g. the Gene Ontology). While several tools exist for visualizing large graphs at a global level or small graphs in detail, previous systems have generally not allowed interactive analysis of dense networks containing thousands of vertices at a level of detail useful for biologists. Investigators often wish to explore specific portions of such networks from a detailed, gene-specific perspective, and balancing this requirement with the networks' large size, complex structure, and rich metadata is a substantial computational challenge.

5.

Predicting functionally important SNP classes based on negative selection

Mark A Levenstien Robert J Klein 《BMC bioinformatics》2011,12(1):26

Background

With the advent of cost-effective genotyping technologies, genome-wide association studies allow researchers to examine hundreds of thousands of single nucleotide polymorphisms (SNPs) for association with human disease. Recently, many researchers applying this strategy have detected strong associations to disease with SNP markers that are either not in linkage disequilibrium with any nonsynonymous SNP or located at large distances from any annotated gene. In such cases, no well-established standard practice for effective SNP selection for follow-up studies exists. We aim to identify and prioritize groups of SNPs that are more likely to affect phenotypes in order to facilitate efficient SNP selection for follow-up studies.

6.

Identification of SNP interactions using logic regression

Schwender H Ickstadt K 《Biostatistics (Oxford, England)》2008,9(1):187-198

Interactions of single nucleotide polymorphisms (SNPs) are assumed to be responsible for complex diseases such as sporadic breast cancer. Important goals of studies concerned with such genetic data are thus to identify combinations of SNPs that lead to a higher risk of developing a disease and to measure the importance of these interactions. There are many approaches based on classification methods such as CART and random forests that allow measuring the importance of single variables. But none of these methods enable the importance of combinations of variables to be quantified directly. In this paper, we show how logic regression can be employed to identify SNP interactions explanatory for the disease status in a case-control study and propose 2 measures for quantifying the importance of these interactions for classification. These approaches are then applied on the one hand to simulated data sets and on the other hand to the SNP data of the GENICA study, a study dedicated to the identification of genetic and gene-environment interactions associated with sporadic breast cancer.

7.

GLOSSI: a method to assess the association of genetic loci-sets with complex diseases

High-Seng Chai Hugues Sicotte Kent R Bailey Stephen T Turner Yan W Asmann Jean-Pierre A Kocher 《BMC bioinformatics》2009,10(1):102-10

Background

The development of high-throughput genotyping technologies, which enable the simultaneous genotyping of hundreds of thousands of single nucleotide polymorphisms (SNPs), has the potential to increase the benefits of genetic epidemiology studies. Although the enhanced resolution of these platforms increases the chance of interrogating functional SNPs that are themselves causative or in linkage disequilibrium with causal SNPs, commonly used single SNP-association approaches suffer from serious multiple hypothesis testing problems and provide limited insights into combinations of loci that may contribute to complex diseases. Drawing inspiration from Gene Set Enrichment Analysis developed for gene expression data, we have developed a method, named GLOSSI (Gene-loci Set Analysis), that integrates prior biological knowledge into the statistical analysis of genotyping data to test the association of a group of SNPs (loci-set) with complex disease phenotypes. The most significant loci-sets can be used to formulate hypotheses from a functional viewpoint that can be validated experimentally.

8.

A 34K SNP genotyping array for Populus trichocarpa: Design, application to the study of natural populations and transferability to other Populus species

A. Geraldes S. P. DiFazio G. T. Slavov P. Ranjan W. Muchero J. Hannemann L. E. Gunter A. M. Wymore C. J. Grassa N. Farzaneh I. Porth A. D. McKown O. Skyba E. Li M. Fujita J. Klápště J. Martin W. Schackwitz C. Pennacchio D. Rokhsar M. C. Friedmann G. O. Wasteneys R. D. Guy Y. A. El‐Kassaby S. D. Mansfield Q. C. B. Cronk J. Ehlting C. J. Douglas G. A. Tuskan 《Molecular ecology resources》2013,13(2):306-323

Genetic mapping of quantitative traits requires genotypic data for large numbers of markers in many individuals. For such studies, the use of large single nucleotide polymorphism (SNP) genotyping arrays still offers the most cost‐effective solution. Herein we report on the design and performance of a SNP genotyping array for Populus trichocarpa (black cottonwood). This genotyping array was designed with SNPs pre‐ascertained in 34 wild accessions covering most of the species' latitudinal range. We adopted a candidate gene approach to the array design that resulted in the selection of 34 131 SNPs, the majority of which are located in, or within 2 kb of, 3543 candidate genes. A subset of the SNPs on the array (539) was selected based on patterns of variation among the SNP discovery accessions. We show that more than 95% of the loci produce high quality genotypes and that the genotyping error rate for these is likely below 2%. We demonstrate that even among small numbers of samples (n = 10) from local populations over 84% of loci are polymorphic. We also tested the applicability of the array to other species in the genus and found that the number of polymorphic loci decreases rapidly with genetic distance, with the largest numbers detected in other species in section Tacamahaca. Finally, we provide evidence for the utility of the array to address evolutionary questions such as intraspecific studies of genetic differentiation, species assignment and the detection of natural hybrids.

9.

MS-DOCK: Accurate multiple conformation generator and rigid docking protocol for multi-step virtual ligand screening

Nicolas Sauton David Lagorce Bruno O Villoutreix Maria A Miteva 《BMC bioinformatics》2008,9(1):184

Background

The number of protein targets with a known or predicted three-dimensional structure and of drug-like chemical compounds is growing rapidly, and so is the need for new therapeutic compounds or chemical probes. Performing flexible structure-based virtual screening computations on thousands of targets with millions of molecules is neither tractable for most laboratories nor indeed desirable. Since shape complementarity is of primary importance for most protein-ligand interactions, we have developed a tool/protocol based on rigid-body docking to select compounds that fit well into binding sites.

10.

PDA: Pooled DNA analyzer

Hsin-Chou Yang Chia-Ching Pan Chin-Yu Lin Cathy SJ Fann 《BMC bioinformatics》2006,7(1):233-13

Background

Association mapping using abundant single nucleotide polymorphisms is a powerful tool for identifying disease susceptibility genes for complex traits and exploring possible genetic diversity. Genotyping large numbers of SNPs individually is performed routinely but is cost prohibitive for large-scale genetic studies. DNA pooling is a reliable and cost-saving alternative genotyping method. However, no software has been developed for complete pooled-DNA analyses, including data standardization, allele frequency estimation, and single/multipoint DNA pooling association tests. This motivated the development of the software, 'PDA' (Pooled DNA Analyzer), to analyze pooled DNA data.

11.

Highly interconnected genes in disease-specific networks are enriched for disease-associated polymorphisms

Barrenäs F Chavali S Alves AC Coin L Jarvelin MR Jörnsten R Langston MA Ramasamy A Rogers G Wang H Benson M 《Genome biology》2012,13(6):R46-9

Background

Complex diseases are associated with altered interactions between thousands of genes. We developed a novel method to identify and prioritize disease genes, which was generally applicable to complex diseases.

Results

We identified modules of highly interconnected genes in disease-specific networks derived from integrating gene-expression and protein interaction data. We examined if those modules were enriched for disease-associated SNPs, and could be used to find novel genes for functional studies. First, we analyzed publicly available gene expression microarray and genome-wide association study (GWAS) data from 13, highly diverse, complex diseases. In each disease, highly interconnected genes formed modules, which were significantly enriched for genes harboring disease-associated SNPs. To test if such modules could be used to find novel genes for functional studies, we repeated the analyses using our own gene expression microarray and GWAS data from seasonal allergic rhinitis. We identified a novel gene, FGF2, whose relevance was supported by functional studies using combined small interfering RNA-mediated knock-down and gene expression microarrays. The modules in the 13 complex diseases analyzed here tended to overlap and were enriched for pathways related to oncological, metabolic and inflammatory diseases. This suggested that this union of the modules would be associated with a general increase in susceptibility for complex diseases. Indeed, we found that this union was enriched with GWAS genes for 145 other complex diseases.

Conclusions

Modules of highly interconnected complex disease genes were enriched for disease-associated SNPs, and could be used to find novel genes for functional studies.

12.

IRIS: a method for reverse engineering of regulatory relations in gene networks

Sandro Morganella Pietro Zoppoli Michele Ceccarelli 《BMC bioinformatics》2009,10(1):444


13.

Design of a High Density SNP Genotyping Assay in the Pig Using SNPs Identified and Characterized by Next Generation Sequencing Technology

Antonio M. Ramos Richard P. M. A. Crooijmans Nabeel A. Affara Andreia J. Amaral Alan L. Archibald Jonathan E. Beever Christian Bendixen Carol Churcher Richard Clark Patrick Dehais Mark S. Hansen Jakob Hedegaard Zhi-Liang Hu Hindrik H. Kerstens Andy S. Law Hendrik-Jan Megens Denis Milan Danny J. Nonneman Gary A. Rohrer Max F. Rothschild Tim P. L. Smith Robert D. Schnabel Curt P. Van Tassell Jeremy F. Taylor Ralph T. Wiedmann Lawrence B. Schook Martien A. M. Groenen 《PloS one》2009,4(8)

Background

The dissection of complex traits of economic importance to the pig industry requires the availability of a significant number of genetic markers, such as single nucleotide polymorphisms (SNPs). This study was conducted to discover several hundreds of thousands of porcine SNPs using next generation sequencing technologies and use these SNPs, as well as others from different public sources, to design a high-density SNP genotyping assay.

Methodology/Principal Findings

A total of 19 reduced representation libraries derived from four swine breeds (Duroc, Landrace, Large White, Pietrain) and a Wild Boar population and three restriction enzymes (AluI, HaeIII and MspI) were sequenced using Illumina's Genome Analyzer (GA). The SNP discovery effort resulted in the de novo identification of over 372K SNPs. More than 549K SNPs were used to design the Illumina Porcine 60K+SNP iSelect Beadchip, now commercially available as the PorcineSNP60. A total of 64,232 SNPs were included on the Beadchip. Results from genotyping the 158 individuals used for sequencing showed a high overall SNP call rate (97.5%). Of the 62,621 loci that could be reliably scored, 58,994 were polymorphic yielding a SNP conversion success rate of 94%. The average minor allele frequency (MAF) for all scorable SNPs was 0.274.

Conclusions/Significance

Overall, the results of this study indicate the utility of using next generation sequencing technologies to identify large numbers of reliable SNPs. In addition, the validation of the PorcineSNP60 Beadchip demonstrated that the assay is an excellent tool that will likely be used in a variety of future studies in pigs.
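The two summary statistics quoted in the findings above (conversion success rate and minor allele frequency) can be illustrated with a minimal Python sketch. The function names and the 0/1/2 genotype coding are assumptions made for illustration, not taken from the article, and the conversion rate is assumed here to be polymorphic loci divided by reliably scored loci, which is the reading that matches the reported 94%.

```python
def minor_allele_frequency(genotypes):
    """MAF for one biallelic SNP.

    genotypes: per-individual counts of one allele (0, 1 or 2);
    None marks a missing call and is ignored. This coding is an
    illustrative assumption, not the article's data format.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1 - freq)              # the rarer allele defines the MAF


def conversion_rate(polymorphic, scorable):
    """Fraction of reliably scored loci that turned out to be polymorphic."""
    return polymorphic / scorable


# Counts quoted in the abstract: 58,994 polymorphic out of 62,621 scorable loci.
rate = conversion_rate(58_994, 62_621)

# Hypothetical genotypes for a single SNP, with one missing call.
toy_maf = minor_allele_frequency([0, 1, 2, 2, None, 1])
```

With the counts from the abstract, `rate` rounds to the reported 94%; the toy genotype list is made up and only shows how missing calls are skipped.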

14.

Genome‐wide single‐generation signatures of local selection in the panmictic European eel

J. M. Pujolar M. W. Jacobsen T. D. Als J. Frydenberg K. Munch B. Jónsson J. B. Jian L. Cheng G. E. Maes L. Bernatchez M. M. Hansen 《Molecular ecology》2014,23(10):2514-2528

Next‐generation sequencing and the collection of genome‐wide data allow identifying adaptive variation and footprints of directional selection. Using a large SNP data set from 259 RAD‐sequenced European eel individuals (glass eels) from eight locations between 34 and 64° N, we examined the patterns of genome‐wide genetic diversity across locations. We tested for local selection by searching for increased population differentiation using F_ST‐based outlier tests and by testing for significant associations between allele frequencies and environmental variables. The overall low genetic differentiation found (F_ST = 0.0007) indicates that most of the genome is homogenized by gene flow, providing further evidence for genomic panmixia in the European eel. The lack of genetic substructuring was consistent at both nuclear and mitochondrial SNPs. Using an extensive number of diagnostic SNPs, results showed a low occurrence of hybrids between European and American eel, mainly limited to Iceland (5.9%), although individuals with signatures of introgression several generations back in time were found in mainland Europe. Despite panmixia, a small set of SNPs showed high genetic differentiation consistent with single‐generation signatures of spatially varying selection acting on glass eels. After screening 50 354 SNPs, a total of 754 potentially locally selected SNPs were identified. Candidate genes for local selection constituted a wide array of functions, including calcium signalling, neuroactive ligand–receptor interaction and circadian rhythm. Remarkably, one of the candidate genes identified is PERIOD, possibly related to differences in local photoperiod associated with the >30° difference in latitude between locations. Genes under selection were spread across the genome, and there were no large regions of increased differentiation as expected when selection occurs within just a single generation due to panmixia. This supports the conclusion that most of the genome is homogenized by gene flow that removes any effects of diversifying selection from each new generation.

15.

A new way of identifying biomarkers in biomedical basic-research studies

Yassouridis A Ludwig T Steiger A Leisch F 《PloS one》2012,7(5):e35741

A simple, nonparametric and distribution-free method was developed for quick identification of the most meaningful biomarkers among a number of candidates in complex biological phenomena, especially in relatively small samples. This method is independent of rigid model forms or other link functions. It may be applied both to metric and non-metric data as well as to independent or matched parallel samples. With this method, identification of the most relevant biomarkers is not based on inferential methods; therefore, its application does not require corrections of the level of significance, even in cases of thousands of variables. Hence, the introduced method is appropriate to analyze and evaluate data of complex investigations in clinical and pre-clinical basic research, such as gene or protein expressions, phenotype-genotype associations in case-control studies on the basis of thousands of genes and SNPs (single nucleotide polymorphisms), search of prevalence in sleep EEG data, functional magnetic resonance imaging (fMRI) or others.

16.

Large-scale genotyping of complex DNA (total citations: 21; self-citations: 0; citations by others: 21)

Kennedy GC Matsuzaki H Dong S Liu WM Huang J Liu G Su X Cao M Chen W Zhang J Liu W Yang G Di X Ryder T He Z Surti U Phillips MS Boyce-Jacino MT Fodor SP Jones KW 《Nature biotechnology》2003,21(10):1233-1237

Genetic studies aimed at understanding the molecular basis of complex human phenotypes require the genotyping of many thousands of single-nucleotide polymorphisms (SNPs) across large numbers of individuals. Public efforts have so far identified over two million common human SNPs; however, the scoring of these SNPs is labor-intensive and requires a substantial amount of automation. Here we describe a simple but effective approach, termed whole-genome sampling analysis (WGSA), for genotyping thousands of SNPs simultaneously in a complex DNA sample without locus-specific primers or automation. Our method amplifies highly reproducible fractions of the genome across multiple DNA samples and calls genotypes at >99% accuracy. We rapidly genotyped 14,548 SNPs in three different human populations and identified a subset of them with significant allele frequency differences between groups. We also determined the ancestral allele for 8,386 SNPs by genotyping chimpanzee and gorilla DNA. WGSA is highly scalable and enables the creation of ultrahigh density SNP maps for use in genetic studies.

17.

Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

Ueki M Tamiya G 《BMC bioinformatics》2012,13(1):72

Background

Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to complex human diseases. However, individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), makes it difficult to set an overall p-value because of the complicated correlation structure; that is, the multiple testing problem causes unacceptable false negative results. The number of SNP-SNP pairs is larger than the sample size (the so-called large p, small n problem), which precludes simultaneous analysis using multiple regression. A method that overcomes these issues is thus needed.

Results

We adopt an up-to-date method for ultrahigh-dimensional variable selection, termed sure independence screening (SIS) [17], to handle the large number of SNP-SNP interactions appropriately by including them as predictor variables in logistic regression. We propose a ranking strategy that uses promising dummy coding methods, followed by the variable selection procedure of the SIS method, suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case-Control Consortium) data.

Conclusions

Based on the machine-learning principle, the proposed method gives a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
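The screening idea in the abstract above, ranking every SNP-SNP term by a marginal statistic and keeping only the top few, can be sketched in plain Python. This is an illustrative sketch, not EPISIS: it assumes a simple product coding of the two genotypes and uses absolute Pearson correlation with case-control status as the ranking statistic, whereas the article describes its own dummy codings and a modified SIS procedure. The names `pearson` and `screen_pairs` and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_pairs(genotypes, status, keep):
    """Rank all SNP-SNP product terms by |correlation| with status; return the top `keep` pairs.

    genotypes: list of samples, each a list of minor-allele counts (0, 1 or 2).
    status:    list of 0/1 case-control labels, one per sample.
    """
    n_snps = len(genotypes[0])
    scored = []
    for i, j in combinations(range(n_snps), 2):
        # Assumed product coding of the interaction term (not the article's coding).
        term = [row[i] * row[j] for row in genotypes]
        scored.append((abs(pearson(term, status)), (i, j)))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [pair for _, pair in scored[:keep]]


# Made-up toy data: a sample is a case exactly when SNP 0 and SNP 2
# both carry at least one minor allele.
toy_genotypes = [
    [0, 1, 0], [1, 0, 1], [2, 1, 1], [0, 2, 2],
    [1, 1, 0], [2, 0, 2], [0, 0, 1], [1, 2, 2],
]
toy_status = [1 if g[0] > 0 and g[2] > 0 else 0 for g in toy_genotypes]
top_pairs = screen_pairs(toy_genotypes, toy_status, keep=1)
```

The exhaustive loop over all pairs is the part the abstract reports offloading to GPGPU hardware; for p SNPs it visits p(p-1)/2 terms, which is why a cheap per-term statistic matters at genome-wide scale.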

18.

A novel permutation test for case‐only analysis identifies epistatic effects on human longevity in the FOXO gene family

Qihua Tan Mette Soerensen Torben A. Kruse Kaare Christensen Lene Christiansen 《Aging cell》2013,12(4):690-694


19.

Empirical Bayes analysis of single nucleotide polymorphisms

Holger Schwender Katja Ickstadt 《BMC bioinformatics》2008,9(1):144

Background

An important goal of whole-genome studies concerned with single nucleotide polymorphisms (SNPs) is the identification of SNPs associated with a covariate of interest such as the case-control status or the type of cancer. Since these studies often comprise the genotypes of hundreds of thousands of SNPs, methods are required that can cope with the corresponding multiple testing problem. For the analysis of gene expression data, approaches such as the empirical Bayes analysis of microarrays have been developed particularly for the detection of genes associated with the response. However, the empirical Bayes analysis of microarrays has only been suggested for binary responses when considering expression values, i.e. continuous predictors.

20.

A method for finding single-nucleotide polymorphisms with allele frequencies in sequences of deep coverage

Jianmin Wang Xiaoqiu Huang 《BMC bioinformatics》2005,6(1):220

Background

The allele frequencies of single-nucleotide polymorphisms (SNPs) are needed to select an optimal subset of common SNPs for use in association studies. Sequence-based methods for finding SNPs with allele frequencies may need to handle thousands of sequences from the same genome location (sequences of deep coverage).