Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Lyu Chuqiao, Wang Lei, Zhang Juhua. BMC Genomics, 2018, 19(10): 905-165

Background

The DNase I hypersensitive sites (DHSs) are associated with the cis-regulatory DNA elements. An efficient method of identifying DHSs can enhance the understanding of chromatin accessibility. Despite a multitude of resources available online, including experimental datasets and computational tools, the complex language of DHSs remains incompletely understood.

Methods

Here, we address this challenge using an approach based on a state-of-the-art machine learning method. We present a novel convolutional neural network (CNN) that combines Inception-like networks with a gating mechanism to respond to multiple patterns and long-term associations in DNA sequences, and we use it to predict multi-scale DHSs in Arabidopsis, rice and Homo sapiens.

Results

Our method achieves an area under the curve (AUC) of 0.961 on Arabidopsis, 0.969 on rice and 0.918 on Homo sapiens.

Conclusions

Our method provides an efficient and accurate way to identify multi-scale DHS sequences by deep learning.

2.

Background

New technologies for acquisition of genomic data, while offering unprecedented opportunities for genetic discovery, also impose severe burdens of interpretation and penalties for multiple testing.

Methods

The Pathway-based Analyses Group of the Genetic Analysis Workshop 19 (GAW19) sought reduction of multiple-testing burden through various approaches to aggregation of high-dimensional data in pathways informed by prior biological knowledge.

Results

Experimental methods tested included the use of "synthetic pathways" (random sets of genes) to estimate power and false-positive error rate of methods applied to simulated data; data reduction via independent components analysis, single-nucleotide polymorphism (SNP)-SNP interaction, and use of gene sets to estimate genetic similarity; and general assessment of the efficacy of prior biological knowledge to reduce the dimensionality of complex genomic data.

Conclusions

The work of this group explored several promising approaches to managing high-dimensional data, with the caveat that these methods are necessarily constrained by the quality of external bioinformatic annotation.

3.
Min Xu, Zeng Wanwen, Chen Shengquan, Chen Ning, Chen Ting, Jiang Rui. BMC Bioinformatics, 2017, 18(13): 478-46

Background

With the rapid development of deep sequencing techniques in recent years, enhancers have been systematically identified in such projects as FANTOM and ENCODE, forming genome-wide landscapes in a series of human cell lines. Nevertheless, experimental approaches are still costly and time-consuming for large-scale identification of enhancers across a variety of tissues under different disease states, making computational identification of enhancers indispensable.

Results

To facilitate the identification of enhancers, we propose a computational framework, named DeepEnhancer, to distinguish enhancers from background genomic sequences. Our method relies purely on DNA sequences to predict enhancers in an end-to-end manner by using a deep convolutional neural network (CNN). We train our deep learning model on permissive enhancers and then adopt a transfer learning strategy to fine-tune the model on enhancers specific to a cell line. Results demonstrate the effectiveness and efficiency of our method in the classification of enhancers against random sequences, exhibiting advantages of deep learning over traditional sequence-based classifiers. We then construct a variety of neural networks with different architectures and show the usefulness of such techniques as max-pooling and batch normalization in our method. To make our approach interpretable, we further visualize convolutional kernels as sequence logos and successfully identify similar motifs in the JASPAR database.

Conclusions

DeepEnhancer enables the identification of novel enhancers using only DNA sequences via a highly accurate deep learning model. The proposed computational framework can also be applied to similar problems, thereby prompting the use of machine learning methods in life sciences.
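
As a rough illustration of how a convolutional kernel can act as a motif detector on raw DNA sequence, the sketch below one-hot encodes a sequence, slides a single kernel along it, and max-pools the activations. It is not the DeepEnhancer implementation: the function names, the example sequence, and the hand-built "TATA" kernel are assumptions made for illustration, and a real model would learn many kernels from data.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).
    Unknown bases such as N become an all-zero row."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence and return one
    ReLU-activated score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(max(0.0, total))  # ReLU
    return scores

def global_max_pool(scores):
    """Keep only the strongest response, regardless of position."""
    return max(scores) if scores else 0.0

# Hypothetical kernel that responds most strongly to the motif "TATA".
kernel = one_hot("TATA")
sequence = "GGCTATAAGC"  # illustrative sequence, not from the paper
scores = conv_scan(one_hot(sequence), kernel)
best = global_max_pool(scores)
print(scores, best)
```

Here the kernel is simply the one-hot pattern of the motif, so each score counts matching positions in a window; learned kernels generalize this to weighted, position-specific preferences, which is why they can be rendered as sequence logos.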

4.

Background

While determining continental-level ancestry from genomic information is relatively simple, distinguishing between individuals from closely associated sub-populations (e.g., from the same continent) is still a difficult challenge.

Methods

We study the problem of predicting human biogeographical ancestry from genomic data under resource constraints. In particular, we focus on the case where the analysis is constrained to using single nucleotide polymorphisms (SNPs) from just one chromosome. We propose methods to construct such ancestry informative SNP panels using correlation-based and outlier-based methods.

Results

We assessed the performance of the proposed SNP panels derived from just one chromosome, using data from the 1000 Genomes Project, Phase 3. For continental-level ancestry classification, we achieved an overall classification rate of 96.75% using 206 single nucleotide polymorphisms (SNPs). For sub-population level ancestry prediction, we achieved average pairwise binary classification rates as follows: subpopulations in Europe: 76.6% (58 SNPs); Africa: 87.02% (87 SNPs); East Asia: 73.30% (68 SNPs); South Asia: 81.14% (75 SNPs); America: 85.85% (68 SNPs).

Conclusion

Our results demonstrate that one single chromosome (in particular, Chromosome 1), if carefully analyzed, could hold enough information for accurate prediction of human biogeographical ancestry. This has significant implications in terms of the computational resources required for analysis of ancestry, and in the applications of such analyses, such as in studies of genetic diseases, forensics, and soft biometrics.
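
The abstract does not spell out how the correlation-based panel is built, so the following is only a minimal sketch of one plausible reading: rank SNPs by the absolute Pearson correlation between genotype (allele count 0, 1 or 2) and a binary population label, then keep the top k. The function names, SNP identifiers, and toy genotypes are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def select_snps(genotypes, labels, k):
    """Return the k SNP ids whose genotypes correlate most strongly
    (in absolute value) with the population labels."""
    ranked = sorted(
        genotypes,
        key=lambda snp: abs(pearson(genotypes[snp], labels)),
        reverse=True,
    )
    return ranked[:k]

# Toy data: allele counts for six individuals at three hypothetical SNPs.
genotypes = {
    "snp_a": [0, 0, 1, 2, 2, 2],  # differs between the two groups
    "snp_b": [1, 1, 1, 1, 1, 1],  # monomorphic, carries no information
    "snp_c": [2, 0, 1, 1, 0, 2],  # varies, but not with group membership
}
labels = [0, 0, 0, 1, 1, 1]       # two populations
panel = select_snps(genotypes, labels, k=1)
print(panel)
```

A real panel would be selected on training individuals only and evaluated on held-out individuals, since ranking and testing on the same samples inflates the apparent classification rate.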

5.

Background

The reconstruction of ancestral genomes must deal with the problem of resolution, necessarily involving a trade-off between trying to identify genomic details and being overwhelmed by noise at higher resolutions.

Results

We use the median reconstruction, at the synteny block level, of the ancestral genome of the order Gentianales, based on coffee, Rhazya stricta and grape, to exemplify the effects of resolution (granularity) on comparative genomic analyses.

Conclusions

We show how decreased resolution blurs the differences between evolving genomes, with respect to rate, mutational process and other characteristics.

6.
7.

Background

KASP (KBioscience Competitive Allele Specific PCR) and Amplifluor (Amplification with fluorescence) SNP markers are two prominent technologies based upon a shared identical Allele-specific PCR platform.

Methods

Amplifluor-like SNP and KASP analyses were carried out using both published and in-house designs of universal probes (UPs) and gene-specific primers (GSPs).

Results

Advantages of the Amplifluor-like system over KASP include the significantly lower costs and much greater flexibility in the adjustment and development of ‘self-designed’ dual fluorescently-labelled UPs and regular GSPs. The presented results include optimisation of ‘tail’ length in UPs and GSPs, protocol adjustment, and the use of various fluorophores in different qPCR instruments. The compatibility of the KASP Master-mix in both original and Amplifluor-like systems has been demonstrated in the presented results, proving their similar principles. Results of SNP scoring with rare alleles in addition to more frequently occurring alleles are shown.

Conclusions

The Amplifluor-like system produces SNP genotyping results with a level of sensitivity and accuracy comparable to KASP but at a significantly lower cost and with much greater flexibility for UPs with self-designed GSPs.

8.

Background

Cervical cancer is the fifth most common cancer among women and the third leading cause of cancer death in women worldwide. Brachytherapy is the most effective treatment for cervical cancer. For brachytherapy, computed tomography (CT) imaging is necessary since it conveys tissue density information which can be used for dose planning. However, the metal artifacts caused by brachytherapy applicators remain a challenge for the automatic processing of image data for image-guided procedures or accurate dose calculations. Therefore, an effective metal artifact reduction (MAR) algorithm for cervical CT images is in high demand.

Methods

A novel residual learning method based on convolutional neural network (RL-ARCNN) is proposed to reduce metal artifacts in cervical CT images. For MAR, a dataset is generated by simulating various metal artifacts in the first step, which will be applied to train the CNN. This dataset includes artifact-insert, artifact-free, and artifact-residual images. Numerous image patches are extracted from the dataset for training on deep residual learning artifact reduction based on CNN (RL-ARCNN). Afterwards, the trained model can be used for MAR on cervical CT images.

Results

The proposed method provides a good MAR result with a PSNR of 38.09 on the test set of simulated artifact images. The PSNR of residual learning (38.09) is higher than that of ordinary learning (37.79), which shows that CNN-based residual images achieve favorable artifact reduction. Moreover, for a 512 × 512 image, the average artifact removal time is less than 1 s.

Conclusions

The RL-ARCNN results indicate that residual learning with a CNN remarkably reduces metal artifacts and improves critical structure visualization and the confidence of radiation oncologists in target delineation. Metal artifacts are eliminated efficiently without the need for sinogram data or complicated post-processing procedures.
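
To make the reported metric concrete, the sketch below computes PSNR as 10·log10(MAX² / MSE) and shows the residual-learning idea of subtracting a predicted artifact image from the corrupted input. This is a toy illustration, not the RL-ARCNN code: the peak value of 255, the four-pixel "images", and the stand-in network output are assumptions.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images given as
    flat lists of pixel values. max_value is the assumed peak intensity."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def remove_artifacts(corrupted, predicted_residual):
    """Residual learning: the network predicts the artifact component,
    which is subtracted from the corrupted image."""
    return [c - r for c, r in zip(corrupted, predicted_residual)]

clean = [100.0, 120.0, 130.0, 90.0]          # artifact-free pixels (toy)
artifact = [10.0, -5.0, 0.0, 20.0]           # simulated artifact
corrupted = [c + a for c, a in zip(clean, artifact)]
predicted = [9.0, -5.0, 1.0, 18.0]           # hypothetical network output
restored = remove_artifacts(corrupted, predicted)

print(psnr(clean, corrupted), psnr(clean, restored))
```

Because PSNR is logarithmic in the mean squared error, a gap such as 38.09 versus 37.79 dB corresponds to a modest but real reduction in average pixel error.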

9.
10.

Background

Ocular images play an essential role in ophthalmological diagnoses. Having an imbalanced dataset is an inevitable issue in automated ocular disease diagnosis; the scarcity of positive samples always tends to result in the misdiagnosis of severe patients during the classification task. Exploring an effective computer-aided diagnostic method to deal with imbalanced ophthalmological datasets is crucial.

Methods

In this paper, we develop an effective cost-sensitive deep residual convolutional neural network (CS-ResCNN) classifier to diagnose ophthalmic diseases using retro-illumination images. First, the regions of interest (crystalline lens) are automatically identified via twice-applied Canny detection and Hough transformation. Then, the localized zones are fed into the CS-ResCNN to extract high-level features for subsequent use in automatic diagnosis. Second, the impacts of cost factors on the CS-ResCNN are further analyzed using a grid-search procedure to verify that our proposed system is robust and efficient.

Results

Qualitative analyses and quantitative experimental results demonstrate that our proposed method outperforms other conventional approaches and offers exceptional mean accuracy (92.24%), specificity (93.19%), sensitivity (89.66%) and AUC (97.11%) results. Moreover, the sensitivity of the CS-ResCNN is enhanced by over 13.6% compared to the native CNN method.

Conclusion

Our study provides a practical strategy for addressing imbalanced ophthalmological datasets and has the potential to be applied to other medical images. The developed and deployed CS-ResCNN could serve as computer-aided diagnosis software for ophthalmologists in clinical application.
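
The abstract does not state how the cost factors enter the CS-ResCNN, so the sketch below shows only one common form of cost-sensitive learning: a binary cross-entropy in which errors on the scarce positive class are weighted more heavily. The function name, the cost value of 5, and the toy predictions are assumptions.

```python
import math

def weighted_log_loss(y_true, p_pred, positive_cost=1.0, negative_cost=1.0):
    """Mean binary cross-entropy with a separate misclassification cost
    per class. With both costs at 1.0 this is the ordinary log loss."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if y == 1:
            total += -positive_cost * math.log(p)
        else:
            total += -negative_cost * math.log(1.0 - p)
    return total / len(y_true)

# Imbalanced toy batch: one positive case that the model under-scores.
y_true = [1, 0, 0, 0]
p_pred = [0.2, 0.1, 0.1, 0.1]

plain = weighted_log_loss(y_true, p_pred)
costly = weighted_log_loss(y_true, p_pred, positive_cost=5.0)
print(plain, costly)
```

Raising the positive-class cost increases the penalty for a missed positive, which pushes training toward higher sensitivity; the grid search described in the Methods would correspond to trying several such cost values and comparing the resulting metrics.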

11.
12.
Xie Rui, Wen Jia, Quitadamo Andrew, Cheng Jianlin, Shi Xinghua. BMC Genomics, 2017, 18(9): 845-49

Background

Gene expression is a key intermediate level through which genotypes lead to a particular trait. Gene expression is affected by various factors including genotypes of genetic variants. With an aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance.

Results

To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely based on genotypes align well with true gene expression patterns.

Conclusion

We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes’ contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.

13.

Introduction

Collecting feces is easy. It offers direct access to endogenous and microbial metabolites.

Objectives

In a context of lack of consensus about fecal sample preparation, especially in animal species, we developed a robust protocol allowing untargeted LC-HRMS fingerprinting.

Methods

The conditions of extraction (quantity, preparation, solvents, dilutions) were investigated in bovine feces.

Results

A rapid and simple protocol involving feces extraction with methanol (1/3, M/V) followed by centrifugation and a filtration step (10 kDa) was developed.

Conclusion

The workflow generated repeatable and informative fingerprints for robust metabolome characterization.

14.

Introduction

While atenolol is an effective antihypertensive agent, its use is also associated with adverse events including hyperglycemia and incident diabetes that may offset the benefits of blood pressure lowering. By combining metabolomic and genomic data acquired from hypertensive individuals treated with atenolol, it may be possible to better understand the pathways that most impact the development of an adverse glycemic state.

Objective

To identify biomarkers that can help predict susceptibility to blood glucose excursions during exposure to atenolol.

Methods

Plasma samples acquired from 234 Caucasian participants treated with atenolol in the Pharmacogenomic Evaluation of Antihypertensive Responses trial were analyzed by gas chromatography Time-Of-Flight Mass Spectroscopy. Metabolomics and genomics data were integrated by first correlating participants’ metabolomic profiles to change in glucose after treatment with atenolol, and then incorporating genotype information from genes involved in metabolite pathways associated with glucose response.

Results

Our findings indicate that the baseline level of β-alanine was associated with glucose change after treatment with atenolol (Q = 0.007, β = 2.97 mg/dL). Analysis of genomic data revealed that carriers of the G allele for SNP rs2669429 in gene DPYS, which codes for dihydropyrimidinase, an enzyme involved in β-alanine formation, had significantly higher glucose levels after treatment with atenolol when compared with non-carriers (Q = 0.05, β = 2.76 mg/dL). This finding was replicated in participants who received atenolol as an add-on therapy (P = 0.04, β = 1.86 mg/dL).

Conclusion

These results suggest that β-alanine and rs2669429 may be predictors of atenolol-induced hyperglycemia in Caucasian individuals and further investigation is warranted.

15.

Background

P-glycoprotein (P-gp) is a 170-kDa membrane protein. It provides a barrier function and helps to excrete toxins from the body as a transporter. Some bioflavonoids have been shown to block P-gp activity.

Objective

To evaluate the important amino acid residues within nucleotide binding domain 1 (NBD1) of P-gp that play a key role in molecular interactions with flavonoids using a structure-based pharmacophore model.

Methods

In molecular docking with NBD1 models, a putative binding site for flavonoids was proposed and compared with the site for ATP. The binding modes of the ligands were obtained using LigandScout to generate the P-gp–flavonoid pharmacophore models.

Results

Investigation of the binding pocket for flavonoids showed that these inhibitors compete with ATP for the binding site in NBD1, which includes the NBD1 amino acid residues identified by the in silico techniques as being involved in hydrogen bonding and van der Waals (hydrophobic) interactions with flavonoids.

Conclusion

These flavonoids occupy the same binding site as ATP in NBD1, suggesting that they may act as ATP-competitive inhibitors.

16.

Background

Identification of genes underlying production traits is a key aim of the mink research community. The recent availability of genomic tools has opened the possibility of faster genetic progress in mink breeding. The availability of a mink genome assembly allows genome-wide association studies in mink.

Results

In this study, we used genotyping-by-sequencing to obtain single nucleotide polymorphism (SNP) genotypes of 2496 mink. After multiple rounds of filtering, we retained 28,336 high quality SNPs and 2352 individuals for a genome-wide association study (GWAS). We performed the first GWAS for body weight and behavior, along with 10 traits related to fur quality, in mink.

Conclusions

Combining association results with existing functional information of genes and mammalian phenotype databases, we proposed WWC3, MAP2K4, SLC7A1 and USP22 as candidate genes for body weight and pelt length in mink.

17.

Introduction

Untargeted metabolomics is a powerful tool for biological discoveries. To analyze the complex raw data, significant advances in computational approaches have been made, yet it is not clear how exhaustive and reliable the data analysis results are.

Objectives

Assessment of the quality of raw data processing in untargeted metabolomics.

Methods

Five published untargeted metabolomics studies were reanalyzed.

Results

Omissions of at least 50 relevant compounds from the original results as well as examples of representative mistakes were reported for each study.

Conclusion

Incomplete raw data processing shows unexplored potential of current and legacy data.

18.

Background

Metabolic syndrome is a risk factor for type 2 diabetes and cardiovascular disease. We identified common genetic variants that alter the risk for metabolic syndrome in the Korean population. To isolate these variants, we conducted a multiple-genotype and multiple-phenotype genome-wide association analysis using the family-based quasi-likelihood score (MFQLS) test. For this analysis, we used 7211 and 2838 genotyped study subjects for discovery and replication, respectively. We also performed a multiple-genotype and multiple-phenotype analysis of a gene-based single-nucleotide polymorphism (SNP) set.

Results

We found an association between metabolic syndrome and an intronic SNP pair, rs7107152 and rs1242229, in the SIDT2 gene at 11q23.3. Both SNPs correlate with the expression of SIDT2 and TAGLN, whose products promote insulin secretion and lipid metabolism, respectively. This SNP pair showed statistical significance at the replication stage.

Conclusions

Our findings provide insight into an underlying mechanism that contributes to metabolic syndrome.

19.

Background

GAW20 working group 5 brought together researchers who contributed 7 papers with the aim of evaluating methods to detect genetic by epigenetic interactions. GAW20 distributed real data from the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study, including single-nucleotide polymorphism (SNP) markers, methylation (cytosine-phosphate-guanine [CpG]) markers, and phenotype information on up to 995 individuals. In addition, a simulated data set based on the real data was provided.

Results

The 7 contributed papers analyzed these data sets with a number of different statistical methods, including generalized linear mixed models, mediation analysis, machine learning, W-test, and sparsity-inducing regularized regression. These methods generally appeared to perform well. Several papers confirmed a number of causative SNPs in either the large number of simulation sets or the real data on chromosome 11. Findings were also reported for different SNPs, CpG sites, and SNP–CpG site interaction pairs.

Conclusions

In the simulation (200 replications), power appeared generally good for large interaction effects, but smaller effects will require larger studies or consortium collaboration to achieve sufficient power.

20.

Introduction

Feed optimization is a key step toward the environmental and economic sustainability of aquaculture, especially for carnivorous species. Plant-derived ingredients can help reduce costs and nitrogenous effluents while sparing wild fish stocks. However, the metabolic use of carbohydrates from vegetable sources by carnivorous fish is still not completely understood.

Objectives

We aimed to study the effects of diets with carbohydrates of different digestibilities, gelatinized starch (DS) and raw starch (RS), on the muscle metabolome of European seabass (Dicentrarchus labrax).

Methods

We followed an NMR-metabolomics approach, using two sample preparation procedures, the intact muscle (HRMAS) and the aqueous muscle extracts (1H NMR), to compare the variations in muscle metabolome between the two diets.

Results

In muscle, multivariate analysis revealed similar metabolome shifts for DS and RS diets when compared with the control (Ctr) diet. HRMAS of intact muscle, which included both hydrophobic and hydrophilic metabolites, showed increased lipid in DS-fed fish by univariate analysis. Regardless of the nature of the starch, increased glycine and phenylalanine and decreased proline were observed when compared to the Ctr diet. Combined univariate analysis of intact muscle and aqueous extracts indicated specific diet-related changes in lipid and amino acid metabolism, consistent with increased dietary carbohydrate supplementation.

Conclusions

Due to differential sample processing, outputs differ in detail but provide complementary information. After tracing nutritional alterations by profiling fillet components, DS seems to be the most promising alternative to fishmeal-based diets in aquaculture. This approach should be reproducible for other farmed fish species and provide valuable information on nutritional and organoleptic properties of the final product.
