Similar literature
 Found 20 similar articles; search took 15 ms
1.
2.
SUMMARY: The interpretation of genome-wide association results is confounded by linkage disequilibrium between nearby alleles. We have developed a flexible bioinformatics query tool for single-nucleotide polymorphisms (SNPs) to identify and to annotate nearby SNPs in linkage disequilibrium (proxies) based on HapMap. By offering functionality to generate graphical plots for these data, the SNAP server will facilitate interpretation and comparison of genome-wide association study results, and the design of fine-mapping experiments (by delineating genomic regions harboring associated variants and their proxies). AVAILABILITY: SNAP server is available at http://www.broad.mit.edu/mpg/snap/.
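As a rough illustration of what a "proxy" means here, the sketch below computes the standard r² measure of linkage disequilibrium between two biallelic SNPs from phased haplotypes and lists the SNPs whose r² with a query SNP reaches a cutoff. The function names, the toy panel, and the 0.8 threshold are assumptions made for this example; they are not taken from the SNAP server.

```python
def ld_r2(alleles_a, alleles_b):
    """r^2 between two biallelic SNPs, given 0/1 alleles on the same phased haplotypes."""
    n = len(alleles_a)
    p_a = sum(alleles_a) / n
    p_b = sum(alleles_b) / n
    # Frequency of haplotypes carrying allele 1 at both SNPs.
    p_ab = sum(1 for x, y in zip(alleles_a, alleles_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic, so r^2 is undefined; report 0
    return d * d / denom


def find_proxies(query, panel, threshold=0.8):
    """Names of SNPs in `panel` (hypothetical dict: name -> allele list) with r^2 >= threshold."""
    return sorted(
        snp for snp, alleles in panel.items()
        if snp != query and ld_r2(panel[query], alleles) >= threshold
    )


# Toy panel of eight haplotypes (made-up data).
panel = {
    "rsA": [1, 1, 0, 0, 1, 0, 1, 0],
    "rsB": [1, 1, 0, 0, 1, 0, 1, 0],
    "rsC": [1, 0, 1, 0, 1, 0, 1, 0],
}
print(find_proxies("rsA", panel))  # ['rsB']
```

In practice the haplotypes would come from a reference panel such as HapMap, and the search would be restricted to SNPs within some distance of the query.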

3.
4.
5.

Background

Multifactor dimensionality reduction (MDR) is widely used to analyze interactions of genes to determine the complex relationship between diseases and polymorphisms in humans. However, the astronomical number of high-order combinations makes MDR a highly time-consuming process which can be difficult to implement for multiple tests to identify more complex interactions between genes. This study proposes a new framework, named fast MDR (FMDR), which is a greedy search strategy based on the joint effect property.

Results

Six models with different minor allele frequencies (MAFs) and different sample sizes were used to generate the six simulation data sets. A real data set was obtained from the mitochondrial D-loop of chronic dialysis patients. Comparison of results from the simulation data and real data sets showed that FMDR identified significant gene–gene interaction with less computational complexity than the MDR in high-order interaction analysis.

Conclusion

FMDR eases the computational burden that MDR faces when analyzing high-order SNP combinations, and it can be used to evaluate the relative effect of each individual SNP on disease susceptibility. FMDR is freely available at http://bioinfo.kmu.edu.tw/FMDR.rar.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1717-8) contains supplementary material, which is available to authorized users.
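To make the cost of MDR concrete, the sketch below shows the core step of conventional MDR for a single SNP combination: each multi-locus genotype cell is labeled high or low risk by comparing its case:control ratio with the overall ratio, and the resulting one-dimensional classifier is scored by balanced accuracy. This is only the basic MDR step, which must be repeated for every combination; the greedy search used by FMDR is not described in enough detail in the abstract to reproduce. All function names and data are hypothetical, and the code assumes both cases and controls are present.

```python
from collections import Counter


def mdr_cell_labels(genotypes, status):
    """Label each genotype cell 1 (high risk) or 0 (low risk).

    genotypes: list of tuples, one multi-locus genotype per individual.
    status: list of 1 (case) / 0 (control), same order as genotypes.
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        n_ctrl = controls[cell]
        # Cells containing only cases are treated as high risk.
        labels[cell] = 1 if n_ctrl == 0 or cases[cell] / n_ctrl >= threshold else 0
    return labels


def balanced_accuracy(labels, genotypes, status):
    """Mean of sensitivity and specificity of the high/low-risk labeling."""
    tp = sum(1 for g, s in zip(genotypes, status) if s == 1 and labels[g] == 1)
    tn = sum(1 for g, s in zip(genotypes, status) if s == 0 and labels[g] == 0)
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    return 0.5 * (tp / n_cases + tn / n_controls)


# Made-up two-SNP genotypes for six individuals.
genotypes = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1), (0, 1)]
status = [0, 0, 1, 1, 1, 0]
labels = mdr_cell_labels(genotypes, status)
print(balanced_accuracy(labels, genotypes, status))
```

An exhaustive MDR analysis would run this for every k-way SNP combination, which is where the combinatorial cost mentioned in the abstract comes from.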

6.
Copy number variation (CNV) is one of the most prevalent genetic variations in the genome, leading to an abnormal number of copies of moderate to large genomic regions. High-throughput technologies such as next-generation sequencing often identify thousands of CNVs involved in biological or pathological processes. Despite the growing demand to filter and classify CNVs by factors such as frequency in population, biological features, and function, surprisingly, no online web server for CNV annotations has been made available to the research community. Here, we present CNVannotator, a web server that accepts an input set of human genomic positions in a user-friendly tabular format. CNVannotator can perform genomic overlaps of the input coordinates using various functional features, including a list of the reported 356,817 common CNVs, 181,261 disease CNVs, as well as 140,342 SNPs from genome-wide association studies. In addition, CNVannotator incorporates 2,211,468 genomic features, including ENCODE regulatory elements, cytoband, segmental duplication, genome fragile site, pseudogene, promoter, enhancer, CpG island, and methylation site. For cancer research community users, CNVannotator can apply various filters to retrieve a subgroup of CNVs pinpointed in hundreds of tumor suppressor genes and oncogenes. In total, 5,277,234 unique genomic coordinates with functional features are available to generate an output in a plain text format that is free to download. In summary, we provide a comprehensive web resource for human CNVs. The annotated results along with the server can be accessed at http://bioinfo.mc.vanderbilt.edu/CNVannotator/.

7.

Background

The diversity of viruses, the absence of universally common genes in them, and their ability to act as carriers of genetic material make assessment of evolutionary paths of viral genes very difficult. One important factor contributing to this complexity is horizontal gene transfer.

Results

We explore the possibility for the systematic identification of atypical genes within virus families, including viruses whose genome is not encoded by a double-stranded DNA. Our method is based on gene statistical features that differ in genes that were subject to recent horizontal gene transfer from those of the genome in which they are observed. We employ a one-class SVM approach to detect atypical genes within a virus family based on their statistical signatures and without explicit knowledge of the source species. The simplicity of the statistical features used makes the method applicable to various viruses irrespective of their genome size or type.

Conclusions

On simulated data, the method can robustly identify alien genes irrespective of the coding nucleic acid found in a virus. It also compares well to results obtained in related studies for double-stranded DNA viruses. Its value in practice is confirmed by the identification of isolated examples of horizontal gene transfer events that have already been described in the literature. A Python package implementing the method and the results for the analyzed virus families are available at http://svm-agp.bioinf.mpi-inf.mpg.de.

8.
9.
Dong C, Qian Z, Jia P, Wang Y, Huang W, Li Y. PLoS ONE, 2007, 2(12): e1262

Background

High-throughput genotyping chips have contributed greatly to genome-wide association (GWA) studies to identify novel disease susceptibility single nucleotide polymorphisms (SNPs). The high-density chips are designed using two different SNP selection approaches: the direct gene-centric approach, and the indirect quasi-random SNPs or linkage disequilibrium (LD)-based tagSNPs approaches. Although all these approaches can provide high genome coverage and ascertain variants in genes, it is not clear to what extent these approaches could capture the common genic variants. It is also important to characterize and compare the differences between these approaches.

Methodology/Principal Findings

In our study, by using both the Phase II HapMap data and the disease variants extracted from OMIM, a gene-centric evaluation was first performed to evaluate the ability of the approaches to capture the disease variants in the Caucasian population. Then the distribution patterns of SNPs were also characterized in genic regions, evolutionarily conserved introns and nongenic regions, ontologies and pathways. The results show that, no matter which SNP selection approach is used, the current high-density SNP chips provide very high coverage in genic regions and can capture most of the known common disease variants under the HapMap frame. The results also show that the differences between the direct and the indirect approaches are relatively small. Both have similar SNP distribution patterns in these gene-centric characteristics.

Conclusions/Significance

This study suggests that the indirect approaches not only have the advantage of high coverage but also are useful for studies focusing on various functional SNPs either in genes or in the conserved regions that the direct approach supports. The study and the annotation of characteristics will be helpful for designing and analyzing GWA studies that aim to identify genetic risk factors involved in common diseases, especially variants in genes and conserved regions.

10.
11.
12.
13.
The selective genotyping approach, where only individuals from the high and low extremes of the trait distribution are selected for genotyping and the remaining individuals are not genotyped, has been known as a cost-saving strategy to reduce genotyping work and can still maintain nearly equivalent efficiency to complete genotyping in QTL mapping. We propose a novel and simple statistical method based on the normal mixture model for selective genotyping when both genotyped and ungenotyped individuals are fitted in the model for QTL analysis. Compared to the existing methods, the main feature of our model is that we first provide a simple way for obtaining the distribution of QTL genotypes for the ungenotyped individuals and then use it, rather than the population distribution of QTL genotypes as in the existing methods, to fit the ungenotyped individuals in model construction. Another feature is that the proposed method is developed on the basis of a multiple-QTL model and has a simple estimation procedure similar to that for complete genotyping. As a result, the proposed method has the ability to provide better QTL resolution, analyze QTL epistasis, and tackle multiple QTL problem under selective genotyping. In addition, a truncated normal mixture model based on a multiple-QTL model is developed when only the genotyped individuals are considered in the analysis, so that the two different types of models can be compared and investigated in selective genotyping. The issue in determining threshold values for selective genotyping in QTL mapping is also discussed. Simulation studies are performed to evaluate the proposed methods, compare the different models, and study the QTL mapping properties in selective genotyping. The results show that the proposed method can provide greater QTL detection power and facilitate QTL mapping for selective genotyping. 
The results also show that selective genotyping using larger genotyping proportions may provide roughly equivalent power to complete genotyping, whereas smaller genotyping proportions have difficulty doing so. The R code of our proposed method is available at http://www.stat.sinica.edu.tw/chkao/.
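The selection step of selective genotyping can be sketched as follows: individuals are ranked by trait value and only those in the lower and upper tails are sent for genotyping. This is a minimal sketch of the selection only, not of the normal mixture model proposed in the abstract; the function name, the equal split between the two tails, and the data are assumptions made for illustration.

```python
def select_extremes(trait_values, proportion):
    """Return (low_tail, high_tail) index lists for selective genotyping.

    proportion: total fraction of individuals to genotype, split equally
    between the two tails (an assumption of this sketch).
    """
    n = len(trait_values)
    k = int(round(n * proportion / 2))  # individuals taken from each tail
    order = sorted(range(n), key=lambda i: trait_values[i])
    return order[:k], order[n - k:]


# Made-up trait values for ten individuals; genotype 40% of them.
traits = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0]
low, high = select_extremes(traits, 0.4)
print(low, high)  # [1, 5] [2, 9]
```

The threshold values discussed in the abstract correspond to the trait values at the boundaries of these two tails.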

14.
Genome-wide association study (GWAS) provides a powerful tool for investigating the genetic architecture of human polygenic diseases and is generally used to identify the genetic factors of disease susceptibility, clinical phenotypes, and treatment response. The differences in allele frequencies of single nucleotide polymorphisms (SNPs) distributed throughout the genome are analyzed with a microarray technique or other technologies that allow simultaneous genotyping at several tens of thousands to several millions of SNPs per sample. Owing to its power to detect highly reliable differences between patients and controls, GWAS has become a common approach to identifying the genetic susceptibility factors in complex diseases of a polygenic nature. Using multiple sclerosis (MS) as a prototype complex disease, the review considers the main achievements and challenges of using GWAS to identify the genes involved in the disease and, therefore, to better understand the pathogenetic molecular mechanisms and genetic risk factors.

15.
Identification of key metabolites for complex diseases is a challenging task in today's medicine and biology. A given disease is usually caused by the alteration of a series of functionally related metabolites having a global influence on the metabolic network. Moreover, the metabolites in the same metabolic pathway are often associated with the same or similar diseases. Based on these functional relationships between metabolites in the context of metabolic pathways, we here present a pathway-based random walk method called PROFANCY for prioritization of candidate disease metabolites. Our strategy not only takes advantage of the global functional relationships between metabolites but also sufficiently exploits the functionally modular nature of metabolic networks. Our approach proved successful in prioritizing known metabolites for 71 diseases with an AUC value of 0.895. We also assessed the performance of PROFANCY on 16 disease classes and found that 4 classes achieved an AUC value over 0.95. To investigate the robustness of PROFANCY, we repeated all the analyses in two metabolic networks and obtained similar results. Then we applied our approach to Alzheimer's disease (AD) and found that a top ranked candidate was potentially related to AD but had not been reported previously. Furthermore, our method was applicable to prioritizing the metabolites from metabolomic profiles of prostate cancer. PROFANCY could identify prostate cancer-related metabolites that are supported by the literature but not considered to be significantly differential by traditional differential analysis. We also developed a freely accessible web-based and R-based tool at http://bioinfo.hrbmu.edu.cn/PROFANCY.
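The general idea of ranking candidates by a random walk on a network can be sketched with a plain random walk with restart: a walker starts from known disease metabolites (the seeds), repeatedly moves to a random neighbor, and with some probability jumps back to a seed. Nodes are then ranked by their steady-state visiting probability. The abstract does not specify PROFANCY's exact formulation, so the restart probability, the function name, and the toy network below are assumptions, and the pathway-based weighting of the published method is not modeled.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state scores of a random walk with restart.

    adjacency: dict mapping each node to a list of its neighbors (every
    neighbor must also be a key; the graph is assumed to be undirected).
    seeds: set of start nodes (known disease metabolites in this setting).
    """
    nodes = sorted(adjacency)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            neighbors = adjacency[u]
            if not neighbors:
                continue  # isolated node: its probability mass is not propagated
            share = (1.0 - restart) * p[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path-shaped network A - B - C - D with A as the only seed (made-up data).
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(network, {"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['A', 'B', 'C', 'D']
```

Candidates closer to the seeds in the network receive higher scores, which is the property a prioritization method of this kind relies on.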

16.
The high tumor heterogeneity makes it very challenging to identify key tumorigenic pathways as therapeutic targets. The integration of multiple omics data is a promising approach to identify driving regulatory networks in patient subgroups. Here, we propose a novel conceptual framework to discover patterns of miRNA-gene networks that are frequently observed to be up- or down-regulated in a group of patients, and to use such networks for patient stratification in hepatocellular carcinoma (HCC). We developed an integrative subgraph mining approach, called iSubgraph, and identified altered regulatory networks frequently observed in HCC patients. The miRNA and gene expression profiles were jointly analyzed in a graph structure. We defined a method to transform microarray data into a graph representation that encodes miRNA and gene expression levels as well as the interactions between them. The iSubgraph algorithm was capable of detecting cooperative regulation of miRNAs and genes even if it occurred only in some patients. Next, the miRNA-mRNA modules were used in an unsupervised class prediction model to discover HCC subgroups via patient clustering by mixture models. The robustness analysis of the mixture model showed that the class predictions are highly stable. Moreover, the Kaplan-Meier survival analysis revealed that the HCC subgroups identified by the algorithm have different survival characteristics. The pathway analyses of the miRNA-mRNA co-modules identified by the algorithm demonstrate key roles of Myc, E2F1, let-7, TGFB1, TNF and EGFR in HCC subgroups. Thus, our method can integrate various omics data derived from different platforms and with different dynamic scales to better define molecular tumor subtypes. iSubgraph is available as MATLAB code at http://www.cs.umd.edu/~ozdemir/isubgraph/.

17.
Imputation-based association methods provide a powerful framework for testing untyped variants for association with phenotypes and for combining results from multiple studies that use different genotyping platforms. Here, we consider several issues that arise when applying these methods in practice, including: (i) factors affecting imputation accuracy, including choice of reference panel; (ii) the effects of imputation accuracy on power to detect associations; (iii) the relative merits of Bayesian and frequentist approaches to testing imputed genotypes for association with phenotype; and (iv) how to quickly and accurately compute Bayes factors for testing imputed SNPs. We find that imputation-based methods can be robust to imputation accuracy and can improve power to detect associations, even when average imputation accuracy is poor. We explain how ranking SNPs for association by a standard likelihood ratio test gives the same results as a Bayesian procedure that uses an unnatural prior assumption (specifically, that difficult-to-impute SNPs tend to have larger effects) and assess the power gained from using a Bayesian approach that does not make this assumption. Within the Bayesian framework, we find that good approximations to a full analysis can be achieved by simply replacing unknown genotypes with a point estimate: their posterior mean. This approximation considerably reduces computational expense compared with published sampling-based approaches, and the methods we present are practical on a genome-wide scale with very modest computational resources (e.g., a single desktop computer). The approximation also facilitates combining information across studies, using only summary data for each SNP. Methods discussed here are implemented in the software package BIMBAM, which is available from http://stephenslab.uchicago.edu/software.html.
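The point estimate mentioned above, the posterior mean genotype, is straightforward to compute once imputation has produced a probability for each of the three possible genotypes (0, 1 or 2 copies of a reference allele). The sketch below shows only this step; the function name and input layout are assumptions for illustration and do not reflect BIMBAM's file formats or interface.

```python
def posterior_mean_genotype(probs):
    """Expected allele count given posterior probabilities (p0, p1, p2)
    of carrying 0, 1 or 2 copies of the counted allele."""
    p0, p1, p2 = probs
    total = p0 + p1 + p2  # renormalize in case of rounding in the input
    if total <= 0:
        raise ValueError("genotype probabilities must sum to a positive value")
    return (p1 + 2.0 * p2) / total


# A genotype imputed as heterozygous or homozygous-alternate with equal probability.
print(posterior_mean_genotype((0.0, 0.5, 0.5)))  # 1.5
```

The resulting values, often called dosages, can be used in place of observed genotypes in a regression-based association test, which is what makes the approximation cheap compared with sampling over possible genotypes.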

18.
Genetic linkage maps are indispensable tools in genetic, genomic and breeding studies. As one of genotyping-by-sequencing methods, RAD-Seq (restriction-site associated DNA sequencing) has gained particular popularity for construction of high-density linkage maps. Current RAD analytical tools are being predominantly used for typing codominant markers. However, no genotyping algorithm has been developed for dominant markers (resulting from recognition site disruption). Given their abundance in eukaryotic genomes, utilization of dominant markers would greatly diminish the extensive sequencing effort required for large-scale marker development. In this study, we established, for the first time, a novel statistical framework for de novo dominant genotyping in mapping populations. An integrated package called RADtyping was developed by incorporating both de novo codominant and dominant genotyping algorithms. We demonstrated the superb performance of RADtyping in achieving remarkably high genotyping accuracy based on simulated and real mapping datasets. The RADtyping package is freely available at http://www2.ouc.edu.cn/mollusk/detailen.asp?id=727.

19.
Identifying molecular connections between developmental processes and disease can lead to new hypotheses about health risks at all stages of life. Here we introduce a new approach to identifying significant connections between gene sets and disease genes, and apply it to several gene sets related to human development. To overcome the limits of incomplete and imperfect information linking genes to disease, we pool genes within disease subtrees in the MeSH taxonomy, and we demonstrate that such pooling improves the power and accuracy of our approach. Significance is assessed through permutation. We created a web-based visualization tool to facilitate multi-scale exploration of this large collection of significant connections (http://gda.cs.tufts.edu/development). High-level analysis of the results reveals expected connections between tissue-specific developmental processes and diseases linked to those tissues, and widespread connections to developmental disorders and cancers. Yet interesting new hypotheses may be derived from examining the unexpected connections. We highlight and discuss the implications of three such connections, linking dementia with bone development, polycystic ovary syndrome with cardiovascular development, and retinopathy of prematurity with lung development. Our results provide additional evidence that plays a key role in the early pathogenesis of polycystic ovary syndrome. Our evidence also suggests that the VEGF pathway and downstream NFKB signaling may explain the complex relationship between bronchopulmonary dysplasia and retinopathy of prematurity, and may form a bridge between two currently-competing hypotheses about the molecular origins of bronchopulmonary dysplasia. Further data exploration and similar queries about other gene sets may generate a variety of new information about the molecular relationships between additional diseases.
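Assessing significance "through permutation" can be illustrated with a simple overlap test: count how many genes a gene set shares with a disease's genes, then compare that count with the overlaps obtained from random gene sets of the same size. The abstract does not describe the exact permutation scheme or the MeSH pooling procedure, so the null model below (uniform sampling from a gene universe), the function name, and the data are all assumptions made for this sketch.

```python
import random


def overlap_permutation_pvalue(gene_set, disease_genes, universe, n_perm=1000, seed=0):
    """Empirical p-value for the overlap between gene_set and disease_genes.

    Random gene sets of the same size as gene_set are drawn from `universe`;
    the p-value is the fraction whose overlap is at least the observed one,
    with the usual +1 correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    gene_set = set(gene_set)
    disease_genes = set(disease_genes)
    universe = sorted(universe)
    observed = len(gene_set & disease_genes)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(universe, len(gene_set))
        if len(disease_genes.intersection(sample)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up universe of 100 genes; the gene set coincides with the disease genes.
universe = [f"gene{i}" for i in range(100)]
p_value = overlap_permutation_pvalue(universe[:10], universe[:10], universe)
print(p_value)
```

Pooling genes across a disease subtree, as the abstract describes, would enlarge `disease_genes` before a test of this kind is run.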

20.
Although comparison of RNA-protein interaction profiles across different conditions has become increasingly important to understanding the function of RNA-binding proteins (RBPs), few computational approaches have been developed for quantitative comparison of CLIP-seq datasets. Here, we present an easy-to-use command line tool, dCLIP, for quantitative CLIP-seq comparative analysis. The two-stage method implemented in dCLIP, including a modified MA normalization method and a hidden Markov model, is shown to be able to effectively identify differential binding regions of RBPs in four CLIP-seq datasets, generated by HITS-CLIP, iCLIP and PAR-CLIP protocols. dCLIP is freely available at http://qbrc.swmed.edu/software/.
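MA normalization starts from the standard M and A values: for a region with read counts in two conditions, M is the log2 ratio of the counts and A is their average log2 intensity. The sketch below computes only these standard quantities; the modification used by dCLIP and the subsequent hidden Markov model are not described in the abstract and are not reproduced. The function name and the pseudocount of 1 are assumptions for this example.

```python
import math


def ma_values(count_1, count_2, pseudocount=1.0):
    """Return (M, A) for read counts of one region in two conditions.

    M = log2(c1) - log2(c2) and A = (log2(c1) + log2(c2)) / 2, where a
    pseudocount is added to each count so that zero counts are allowed.
    """
    log_1 = math.log2(count_1 + pseudocount)
    log_2 = math.log2(count_2 + pseudocount)
    return log_1 - log_2, 0.5 * (log_1 + log_2)


print(ma_values(7, 1))  # (2.0, 2.0)
```

A normalization step would then fit the trend of M against A over regions assumed to be unchanged and subtract it, so that remaining differences in M reflect differential binding rather than sequencing depth.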
