Similar articles
20 similar articles found.
1.
Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline-transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure-specific (i.e., toxicant-specific) signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (<3 CpG/100 bp), termed CpG deserts, and a number of unique DNA sequence motifs. The rat genome was annotated for these and additional relevant features. The objective of the current study was to use a machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a mean size of 1 kb and 3,233 regions that had a minimum of three adjacent sites, with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified, and low-density CpG deserts were a critical feature among those selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of genome-wide predicted regions (1,503) and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the F3 generation of lineages exposed to the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC).
Analysis of this positive validation data set showed 100% prediction accuracy for all the DDT-MXC sperm epimutations. These observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome-wide set of potential epimutations that can be used to facilitate identification of epigenetic diagnostics for ancestral environmental exposures and disease susceptibility.
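The CpG-desert criterion mentioned above (<3 CpG per 100 bp) can be made concrete with a simple window scan. This is a minimal sketch under our own assumptions, not the study's annotation pipeline: the sequence is cut into non-overlapping 100 bp windows, CpGs that straddle a window boundary are ignored, and the function name `cpg_desert_windows` is hypothetical.

```python
from typing import List, Tuple


def cpg_desert_windows(seq: str, window: int = 100, max_cpg: int = 3) -> List[Tuple[int, int]]:
    """Return (start, cpg_count) for each window with fewer than `max_cpg` CpGs.

    Assumptions: non-overlapping windows of `window` bp; a trailing partial
    window is skipped, and CpGs spanning a window boundary are not counted.
    """
    seq = seq.upper()
    deserts = []
    for start in range(0, len(seq) - window + 1, window):
        # "CG" cannot overlap itself, so str.count gives the exact CpG count
        cpg_count = seq[start:start + window].count("CG")
        if cpg_count < max_cpg:
            deserts.append((start, cpg_count))
    return deserts


# Toy sequence: a CpG-rich 100 bp block followed by two CpG-poor blocks
toy = "CG" * 50 + "AT" * 50 + ("CG" + "A" * 48) * 2
print(cpg_desert_windows(toy))  # [(100, 0), (200, 2)]
```

Adjacent desert windows could then be merged into regions; the abstract does not say how the authors handled that step.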

2.
Background: Machine learning (ML) has been gradually integrated into oncologic research but has seldom been applied to prognostic prediction in cervical cancer (CC), and no model has been reported that predicts survival and site-specific recurrence simultaneously. We therefore aimed to develop ML models to predict survival and site-specific recurrence in CC and to guide individualized surveillance. Methods: We retrospectively collected data on CC patients treated at four hospitals from 2006 to 2017. The predictive value of the variables for survival or recurrence was analyzed using multivariate Cox, principal component, and K-means clustering analyses. The predictive performance of eight ML models was compared with that of logistic or Cox models. A novel web-based predictive calculator was developed based on the ML algorithms. Results: This study included 5112 women (268 deaths, 343 recurrences). (1) For site-specific recurrence, larger tumor size was associated with local recurrence, while positive lymph nodes were associated with distant recurrence. (2) The ML models exhibited better prognostic predictive performance than traditional models. (3) The ML models were superior to traditional models when multiple variables were used. (4) A novel web-based predictive calculator was developed and externally validated to predict survival and site-specific recurrence. Conclusion: ML models might be a better analytic approach than traditional models for prognostic prediction in CC, as they can predict survival and site-specific recurrence simultaneously, especially when multiple variables are used. Moreover, our novel web-based calculator may provide clinicians with useful information and help them make individualized postoperative follow-up plans and further treatment strategies.
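Of the analyses listed, K-means clustering is the most self-contained to illustrate. The sketch below is a generic Lloyd's algorithm on toy two-dimensional points, assuming the variables have already been numerically encoded and scaled; the function names, the toy data, and the fixed random seed are illustrative assumptions, and nothing here reproduces the authors' models or calculator.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


# Toy 2-D data standing in for (hypothetical) standardized patient variables
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(data, k=2)
print(labels)
```

On this toy data the first three points end up in one cluster and the last three in the other. Real clinical variables would need standardization first, because K-means relies on Euclidean distance.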

3.
4.
Variations and similarities in our individual genomes are part of our history, our heritage, and our identity. Some human genomic variants are associated with common traits such as hair and eye color, while others are associated with susceptibility to disease or response to drug treatment. Identifying the human variations that produce clinically relevant phenotypic changes is critical for providing accurate and personalized diagnosis, prognosis, and treatment for diseases. Furthermore, a better understanding of the molecular underpinnings of disease can lead to the development of new drug targets for precision medicine. Several resources have been designed for collecting and storing human genomic variations in highly structured, easily accessible databases. Unfortunately, a vast amount of information about these genetic variants and their functional and phenotypic associations is currently buried in the literature, accessible only through manual curation or sophisticated text-mining technology. In addition, the low cost of sequencing technologies coupled with increasing computational power has enabled the development of numerous computational methodologies to predict the pathogenicity of human variants. This review provides a detailed comparison of current human variant resources, including HGMD, OMIM, ClinVar, and UniProt/Swiss-Prot, followed by an overview of the computational methods and techniques used to leverage the available data to predict novel deleterious variants. We expect these resources and tools to become the foundation for understanding the molecular details of genomic variants leading to disease, which in turn will enable the promise of precision medicine.

5.
The evolution of omics and computational competency has accelerated the discovery of underlying biological processes in an unprecedented way. High-throughput methodologies, such as flow cytometry, can reveal deeper insights into cell processes, creating opportunities for scientific discoveries related to health and disease. However, working with cytometry data often poses complex computational challenges due to the high dimensionality, large size, and nonlinearity of the data structure. In addition, cytometry data frequently exhibit diverse patterns across biomarkers and suffer from substantial class imbalance, which can further complicate the problem. Existing methods of cytometry data analysis either predict cell populations or perform feature selection. In this study, we propose a “wisdom of the crowd” approach that simultaneously predicts rare cell populations and performs feature selection by integrating a pool of modern machine learning (ML) algorithms. Because our approach integrates top-performing ML models across different normalization techniques based on entropy and rank, it can detect diverse patterns across the model features. Furthermore, the method identifies a dynamic biomarker structure that divides the features into persistently selected, unselected, and fluctuating assemblies, indicating the role of each biomarker in rare cell prediction, which can subsequently aid studies of disease progression.
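The two ideas in this approach, pooling model predictions and grouping biomarkers by how consistently they are selected, can be sketched as follows. This is a minimal illustration assuming binary labels (1 = rare cell), a strict-majority vote, and that each model reports a set of selected features; the marker names are hypothetical, and the authors' entropy- and rank-based normalizations and model-pooling rules are not reproduced.

```python
from typing import Dict, List, Sequence, Set


def crowd_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Strict-majority vote over binary predictions (1 = rare cell, 0 = other).

    predictions[m][i] is model m's label for cell i; ties resolve to 0.
    """
    n_models = len(predictions)
    return [int(2 * sum(col) > n_models) for col in zip(*predictions)]


def feature_assemblies(selections: Sequence[Set[str]], features: Sequence[str]) -> Dict[str, str]:
    """Group features by how consistently the pooled models select them."""
    groups = {}
    for f in features:
        hits = sum(f in chosen for chosen in selections)
        if hits == len(selections):
            groups[f] = "persistently selected"
        elif hits == 0:
            groups[f] = "unselected"
        else:
            groups[f] = "fluctuating"
    return groups


# Hypothetical outputs from three models on three cells
print(crowd_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))  # [1, 0, 1]
print(feature_assemblies([{"CD3", "CD19"}, {"CD3"}, {"CD3", "CD56"}],
                         ["CD3", "CD19", "CD56", "CD4"]))
```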

6.
The digital information age has been a catalyst for renewed interest in Artificial Intelligence (AI) approaches, especially the subclass of computer algorithms popularly grouped under Machine Learning (ML). These methods allow one to go beyond the limits of human cognition in understanding the complexity of high-dimensional data. The medical sciences have used these methods steadily but have been slow to adopt them to improve patient care. Several significant impediments have hindered this effort, including the limited availability of curated, diverse data sets for model building, the difficulty of obtaining reliable human-level interpretation of these models, and the difficulty of reliably reproducing these methods for routine clinical use. Each of these aspects has several limiting conditions that must be balanced, weighing data and model-building effort, clinical implementation, and the integration cost of translational effort against minimal patient-level harm, all of which may directly affect future clinical adoption. In this review, we assess each aspect of the problem in the context of reliable use of ML methods in oncology, as a representative case study, with the goal of safeguarding utility and improving patient care in medicine in general.

7.
8.
9.
During mammalian evolution, complex systems of epigenetic gene regulation have been established: epigenetic mechanisms control tissue-specific gene expression, X chromosome inactivation in females, and genomic imprinting. Studies of DNA sequence conservation in imprinted genes make it evident that the evolution of gene function and the evolution of epigenetic gene regulation are tightly connected. Furthermore, comparative studies allow the identification of DNA sequence features that distinguish imprinted genes from biallelically expressed genes. Among these features are CpG islands, tandem repeats, and retrotransposed elements, which are known to play major roles in epigenetic gene regulation. More and more genetic and epigenetic data sets are becoming available. In the future, such data sets will provide the basis for more complex investigations of epigenetic variation in human populations. An exciting topic in this area will be the genetic and epigenetic variability of imprinted genes and its contribution to human disease.

10.
Advances in biological and medical technologies have been providing explosive volumes of biological and physiological data, such as medical images, electroencephalography recordings, and genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural networks and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, we present examples of deep learning applications, including medical image classification, genomic sequence analysis, and protein structure classification and prediction. Finally, we offer our perspectives on future directions in the field of deep learning.

11.
Population genetic data from multiple taxa can address comparative phylogeographic questions about community‐scale responses to environmental shifts, and a useful strategy to this end is to employ hierarchical co‐demographic models that directly test multi‐taxa hypotheses within a single, unified analysis. This approach has been applied to classical phylogeographic data sets such as mitochondrial barcodes, as well as to reduced‐genome polymorphism data sets that can yield tens of thousands of SNPs, produced by emergent technologies such as RAD‐seq and GBS. For the latter, a strategy was developed by adapting the site frequency spectrum to a novel summarization of population genomic data across multiple taxa, called the aggregate site frequency spectrum (aSFS), which can potentially be deployed under various inferential frameworks, including approximate Bayesian computation, random forest, and composite likelihood optimization. Here, we introduce the R package multi‐dice, a wrapper program that exploits existing simulation software for flexible execution of hierarchical model‐based inference using the aSFS derived from reduced‐genome data, as well as mitochondrial data. We validate several novel software features, such as applying alternative inferential frameworks, enforcing a minimal threshold of time surrounding co‐demographic pulses, and specifying flexible hyperprior distributions. In sum, multi‐dice provides comparative analysis within the familiar R environment while allowing a high degree of user customization, and will thus serve as a tool for comparative phylogeography and population genomics.
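multi‐dice itself is an R package; the Python sketch below only illustrates, under stated assumptions, how a per-taxon site frequency spectrum is tallied and how an aggregate SFS can be formed from several taxa. We assume unfolded spectra built from derived-allele counts with the same sample size n in every taxon, and that aggregation sorts the SNP counts across taxa within each frequency class so that taxon identity is removed. This is our reading of the aSFS construction; the multi‐dice documentation should be consulted for the exact definition, including any folding.

```python
from collections import Counter
from typing import List, Sequence


def sfs(derived_counts: Sequence[int], n: int) -> List[int]:
    """Unfolded SFS: entry i-1 counts SNPs whose derived allele occurs i times (i = 1..n-1)."""
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(1, n)]


def aggregate_sfs(spectra: Sequence[Sequence[int]]) -> List[List[int]]:
    """For each frequency class, sort SNP counts across taxa in descending order,
    so the summary no longer depends on which taxon is which (assumed aSFS construction)."""
    return [sorted(column, reverse=True) for column in zip(*spectra)]


taxon_a = sfs([1, 1, 2, 3], n=4)   # [2, 1, 1]
taxon_b = sfs([1, 2, 2, 2], n=4)   # [1, 3, 0]
print(aggregate_sfs([taxon_a, taxon_b]))  # [[2, 1], [3, 1], [1, 0]]
```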

12.
13.
Perovskite solar cells (PSCs) have recently received considerable attention due to the high energy conversion efficiency achieved within a few years of their inception. However, a machine learning (ML) approach to guide the development of high‐performing PSCs is still lacking. In this paper, ML is used to optimize material composition, develop design strategies, and predict the performance of PSCs. The ML models are developed using 333 data points selected from about 2000 peer-reviewed publications. These models guide the design of new perovskite materials and the development of high‐performing solar cells. Based on ML guidance, new perovskite compositions are experimentally synthesized to test the practicability of the model. The ML model also shows its ability to predict underlying physical phenomena as well as the performance of PSCs. The PSC model agrees well with the theoretical prediction of the Shockley–Queisser limit, a relationship that would be almost impossible for a human to discern from an ensemble of data points. Moreover, strategies for developing high‐performing PSCs with different bandgaps are also derived from the model. These findings show that ML is very promising not only for predicting performance but also for providing a deeper understanding of the physical phenomena associated with PSCs.

14.
Proper functioning of complex phenotypes requires that multiple traits work together. Examining the relationships among traits within and between complex characters, and how they interact to function as a whole organism, is critical to advancing our understanding of evolutionary developmental plasticity. Phenotypic integration refers to the relationships among multiple characters of a complex phenotype and their relationships with other functional units (modules) in an organism. In this review, I briefly summarize the history of the concept of phenotypic integration in plant and animal biology. Following an introduction of concepts, including modularity, I use an empirical case-study approach to highlight recent advances in clarifying the developmental and genomic basis of integration. I end by highlighting some novel approaches to genomic and epigenetic perturbations that offer promise in further addressing the role of phenotypic integration in evolutionary diversification. In the age of the phenotype, studies that examine the genomic and developmental changes in relationships of traits across environments will shape the next chapter in our quest to understand the evolution of complex characters.

15.
The identification of genetic and epigenetic alterations in primary tumor cells has become a common method for identifying genes critical to the development and progression of cancer. We seek to identify the genetic and epigenetic aberrations that have the most impact on gene function within the tumor. First, we performed a bioinformatic analysis of copy number variation (CNV) and DNA methylation covering the genetic landscape of ovarian cancer tumor cells. We separately examined CNV and DNA methylation for 42 primary serous ovarian cancer samples using MOMA-ROMA assays and for 379 tumor samples analyzed by The Cancer Genome Atlas. We identified 346 genes with significant deletions or amplifications among the tumor samples. Using associated gene expression data, we predicted 156 genes with altered copy number and correlated changes in expression. Among these genes, CCNE1, POP4, UQCRB, PHF20L1 and C19orf2 were identified in both data sets. We were specifically interested in copy number variation as our base genomic property for predicting tumor suppressors and oncogenes in the altered ovarian tumor. We therefore identified changes in DNA methylation and expression for all amplified and deleted genes. We statistically defined tumor suppressor and oncogenic features for these modalities and performed a correlation analysis with expression. By integrating these data types, we predicted 611 potential oncogene and tumor suppressor candidates. Genes with strong correlations for methylation-dependent expression changes across varying copy number aberrations include CDCA8, ATAD2, CDKN2A, RAB25, AURKA, BOP1 and EIF2C3. We provide copy number variation and DNA methylation analysis for over 11,500 individual genes covering the genetic landscape of ovarian cancer tumors. We show the extent of genomic and epigenetic alterations for known tumor suppressors and oncogenes and also use these defined features to identify potential ovarian cancer gene candidates.
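The correlation step can be sketched for a single gene. The decision rule below is a simplified, hypothetical heuristic, not the authors' statistical definition: expression must track copy number (Pearson r of at least `r_min`), a mean copy-number gain flags an oncogene candidate and a mean loss a tumor suppressor candidate, and the methylation-expression correlation is reported alongside for inspection. Inputs are assumed to be per-sample log2 copy-number ratios (0 = no change), expression values, and methylation values.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (raises ZeroDivisionError if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_gene(log2_cn: Sequence[float], expression: Sequence[float],
                  methylation: Sequence[float], r_min: float = 0.5) -> Tuple[str, float, float]:
    """Hypothetical heuristic: expression must track copy number (r >= r_min);
    a mean gain suggests an oncogene candidate, a mean loss a tumor suppressor candidate.
    A negative methylation-expression r is consistent with methylation-dependent silencing."""
    r_cn = pearson(log2_cn, expression)
    r_me = pearson(methylation, expression)
    mean_cn = sum(log2_cn) / len(log2_cn)
    if r_cn >= r_min and mean_cn > 0:
        label = "candidate oncogene"
    elif r_cn >= r_min and mean_cn < 0:
        label = "candidate tumor suppressor"
    else:
        label = "no call"
    return label, r_cn, r_me


# Toy per-sample values for one amplified gene
print(classify_gene([0.8, 1.0, 1.2, 0.9], [5, 6, 7, 5.5], [0.4, 0.3, 0.2, 0.35]))
```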

16.
In this paper, we present a novel combined methodology for finding an appropriate neural network architecture and weights using an evolutionary least-square-based algorithm (GALS). This paper focuses on aspects such as the heuristics of updating weights using an evolutionary least-square-based algorithm, finding the number of hidden neurons for a two-layer feed-forward neural network, the stopping criterion for the algorithm, and finally some comparisons of the results with other existing methods for searching for optimal or near-optimal solutions in the complex multidimensional search space comprising the architecture and weight variables. We explain how the weight-updating algorithm using the evolutionary least-square-based approach can be combined with the growing architecture model to find the optimum number of hidden neurons. We also discuss the issues of finding a probabilistic solution space as a starting point for the least-square method and address the problems involving fitness breaking. We apply the proposed approach to the XOR problem, the 10-bit odd parity problem, and many real-world benchmark data sets, such as the handwriting data set from CEDAR and the breast cancer and heart disease data sets from the UCI ML repository. The comparative results based on classification accuracy and time complexity are discussed.
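The core idea of pairing an evolutionary search with least squares can be sketched on the XOR problem. In this minimal, assumption-laden version, a mutation-only evolutionary loop with truncation selection searches the hidden-layer weights of a two-layer tanh network, the output weights of every candidate are fitted by ridge-regularized least squares, and hidden neurons are added one at a time until the mean squared error drops below a tolerance. The operators, population size, ridge term, stopping rule, and all function names are our own choices, not the GALS heuristics described in the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

XOR_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
XOR_Y = [0.0, 1.0, 1.0, 0.0]


def hidden_features(x: Sequence[float], w: Sequence[float], n_hidden: int) -> List[float]:
    """tanh hidden layer; w stores (w_1, w_2, bias) for each hidden neuron."""
    feats = [math.tanh(w[3 * j] * x[0] + w[3 * j + 1] * x[1] + w[3 * j + 2]) for j in range(n_hidden)]
    return feats + [1.0]  # constant column acts as the output bias


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system a @ x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def output_weights(h: List[List[float]], y: List[float], ridge: float = 1e-6) -> List[float]:
    """Least-squares output layer via ridge-regularized normal equations (H^T H + ridge*I) v = H^T y."""
    p = len(h[0])
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0) for j in range(p)] for i in range(p)]
    hty = [sum(row[i] * t for row, t in zip(h, y)) for i in range(p)]
    return solve(hth, hty)


def fitness(w: Sequence[float], n_hidden: int) -> Tuple[float, List[float]]:
    """MSE on XOR after fitting the output weights by least squares."""
    h = [hidden_features(x, w, n_hidden) for x in XOR_X]
    v = output_weights(h, XOR_Y)
    preds = [sum(a * b for a, b in zip(row, v)) for row in h]
    return sum((p - t) ** 2 for p, t in zip(preds, XOR_Y)) / len(XOR_Y), v


def evolve(n_hidden: int, pop_size: int = 20, generations: int = 30, sigma: float = 0.5, seed: int = 1):
    """Mutation-only evolution of hidden weights; truncation selection keeps the elite."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-2, 2) for _ in range(3 * n_hidden)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, n_hidden)[0])
        parents = pop[: pop_size // 2]
        children = [[g + rng.gauss(0, sigma) for g in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    best = min(pop, key=lambda w: fitness(w, n_hidden)[0])
    mse, v = fitness(best, n_hidden)
    return best, v, mse


def grow(max_hidden: int = 5, tol: float = 1e-3):
    """Add hidden neurons one at a time until the evolved network reaches `tol`."""
    for n_hidden in range(1, max_hidden + 1):
        w, v, mse = evolve(n_hidden)
        if mse < tol:
            break
    return n_hidden, w, v, mse


def predict(x: Sequence[float], w: Sequence[float], v: Sequence[float], n_hidden: int) -> float:
    return sum(a * b for a, b in zip(hidden_features(x, w, n_hidden), v))


n_hidden, w, v, mse = grow()
print(n_hidden, mse, [round(predict(x, w, v, n_hidden)) for x in XOR_X])
```

A single tanh neuron with a linear output cannot fit XOR exactly, so the growth loop needs at least two hidden neurons; with three, the 4 × 4 least-squares system is generically solvable exactly.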

17.
The EpiGRAPH web service enables biologists to uncover hidden associations in vertebrate genome and epigenome datasets. Users can upload sets of genomic regions and EpiGRAPH will test multiple attributes (including DNA sequence, chromatin structure, epigenetic modifications and evolutionary conservation) for enrichment or depletion among these regions. Furthermore, EpiGRAPH learns to predictively identify similar genomic regions. This paper demonstrates EpiGRAPH's practical utility in a case study on monoallelic gene expression and describes its novel approach to reproducible bioinformatic analysis.
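The enrichment and depletion testing described here can be illustrated with a one-sided hypergeometric (Fisher's exact) test for a binary attribute, such as overlap with a CpG island. This test is our illustrative choice; the abstract does not state which statistics EpiGRAPH uses. In the sketch, N counts all regions (query plus background), K of them carry the attribute, n are query regions, and k of those carry it; the function names are hypothetical.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N regions, K with the attribute, n query regions)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def depletion_p(k: int, n: int, K: int, N: int) -> float:
    """P(X <= k): probability of seeing at most k attribute-carrying query regions."""
    lower = max(0, n - (N - K))
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, k + 1))
    return hits / comb(N, n)


# Hypothetical counts: 30 of 40 query regions carry the attribute, 200 of 1,000 regions overall
print(enrichment_p(30, 40, 200, 1000))
```

Summing integer counts before the single division keeps the probabilities exact up to the final floating-point step.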

18.
Every year about one million people die of diseases transmitted by mosquitoes. The infection is transmitted to a person when an infected mosquito bites, injecting its saliva into the human body. To date, the best way to prevent mosquito-borne infection is to protect people from exposure to mosquito bites. This study proposes a Machine Learning (ML) and Deep Learning based system to detect the presence of two critical classes of disease-spreading mosquitoes, Aedes and Culex. The proposed system will effectively aid epidemiology in designing evidence-based policies and decisions by analyzing risks and transmission. The study proposes an effective methodology for classifying mosquitoes using ML and CNN models. A novel RIFS technique is introduced that integrates two types of feature selection techniques: ROI-based image filtering and the wrapper-based FFS technique. A comparative analysis of various ML and deep learning models was performed to determine the most appropriate model based on performance metrics as well as computational needs. The results show that ETC outperformed all other applied ML models, with an accuracy of 0.992, while VGG16 outperformed the other CNN models, with an accuracy of 0.986.
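Assuming FFS denotes forward feature selection (the abstract does not expand the acronym), the wrapper idea can be sketched as a greedy loop that repeatedly adds the feature giving the largest gain in a model-based score and stops when no remaining feature helps. Here `toy_score` and the feature names are hypothetical stand-ins for, e.g., the cross-validated accuracy of a classifier on image features; the ROI-based filtering stage is not shown.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def forward_select(features: Sequence[str], score: Callable[[List[str]], float],
                   max_features: Optional[int] = None) -> Tuple[List[str], float]:
    """Greedy wrapper-based forward selection driven by a caller-supplied score."""
    selected: List[str] = []
    best = float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # Try adding each remaining feature and keep the one that scores highest
        cand_score, cand = max(((score(selected + [f]), f) for f in remaining), key=lambda t: t[0])
        if cand_score <= best:
            break  # no remaining feature improves the wrapper score
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


USEFUL = {"hue": 0.5, "edge_density": 0.3, "texture": 0.1}  # hypothetical feature gains


def toy_score(subset: List[str]) -> float:
    # Stand-in for a classifier's validation accuracy, with a small per-feature cost
    return sum(USEFUL.get(f, 0.0) for f in subset) - 0.05 * len(subset)


print(forward_select(["hue", "noise", "edge_density", "texture"], toy_score))
```

Because each step retrains or rescores the model once per remaining feature, wrapper selection is costlier than filter methods, which is one reason to pre-filter with a cheaper stage first.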

19.
20.
Phylogeographic data sets have grown from tens to thousands of loci in recent years, but extant statistical methods do not take full advantage of these large data sets. For example, approximate Bayesian computation (ABC) is a commonly used method for the explicit comparison of alternate demographic histories, but it is limited by the “curse of dimensionality” and by issues related to the simulation and summarization of data when applied to next‐generation sequencing (NGS) data sets. We implement here several improvements to overcome these difficulties. We use a Random Forest (RF) classifier for model selection to circumvent the curse of dimensionality and apply a binned representation of the multidimensional site frequency spectrum (mSFS) to address issues related to the simulation and summarization of large SNP data sets. We evaluate the performance of these improvements using simulation and find low overall error rates (~7%). We then apply the approach to data from Haplotrema vancouverense, a land snail endemic to the Pacific Northwest of North America. Fifteen demographic models were compared, and our results support a model of recent dispersal from coastal to inland rainforests. Our results demonstrate that binning is an effective strategy for the construction of an mSFS and imply that the statistical power of RF when applied to demographic model selection is at least comparable to that of traditional ABC algorithms. Importantly, by combining these strategies, large sets of models with differing numbers of populations can be evaluated.
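The abstract does not specify how the mSFS was binned, so the sketch below assumes two populations and equal-width derived-allele-frequency bins, then flattens the binned joint spectrum into a fixed-length vector. Such vectors, computed from data simulated under each candidate demographic model, could serve as training features for a Random Forest classifier, which is not shown here; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def bin_index(count: int, n: int, n_bins: int) -> int:
    """Map a derived-allele count in 0..n to one of n_bins equal-width frequency bins."""
    return min(count * n_bins // (n + 1), n_bins - 1)


def binned_joint_sfs(snps: Sequence[Tuple[int, int]], n1: int, n2: int, n_bins: int) -> List[int]:
    """Tally SNPs into an n_bins x n_bins joint spectrum and flatten it row by row."""
    grid = [[0] * n_bins for _ in range(n_bins)]
    for c1, c2 in snps:
        grid[bin_index(c1, n1, n_bins)][bin_index(c2, n2, n_bins)] += 1
    return [cell for row in grid for cell in row]


# Toy data: derived-allele counts per SNP in two populations of 4 sampled alleles each
snps = [(0, 1), (1, 0), (2, 3), (4, 4), (3, 2)]
print(binned_joint_sfs(snps, n1=4, n2=4, n_bins=3))  # [2, 0, 0, 0, 2, 0, 0, 0, 1]
```

Binning trades resolution for a much shorter, less sparse summary vector, which is the motivation the abstract gives for using it with large SNP data sets.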


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417