1.
《Journal of molecular biology》2022,434(15):167666
In higher eukaryotic cells, chromosomes are folded inside the nucleus. Recent advances in whole-genome mapping technologies have revealed the multiscale features of 3D genome organization that are intertwined with fundamental genome functions. However, DNA sequence determinants that modulate the formation of 3D genome organization remain poorly characterized. In the past few years, predicting 3D genome organization based on DNA sequence features has become an active area of research. Here, we review the recent progress in computational approaches to unraveling important sequence elements for 3D genome organization. In particular, we discuss the rapid development of machine learning-based methods that facilitate the connections between DNA sequence features and 3D genome architectures at different scales. While much progress has been made in developing predictive models for revealing important sequence features for 3D genome organization, new research is urgently needed to incorporate multi-omic data and enhance model interpretability, further advancing our understanding of gene regulation mechanisms through the lens of 3D genome organization.
2.
Machine learning (ML) is rapidly revolutionizing many fields and is starting to change landscapes for physics and chemistry. With its ability to solve complex tasks autonomously, ML is being exploited as a radically new way to help find material correlations, understand materials chemistry, and accelerate the discovery of materials. Here, an in‐depth review of the application of ML to energy materials, including rechargeable alkali‐ion batteries, photovoltaics, catalysts, thermoelectrics, piezoelectrics, and superconductors, is presented. A conceptual framework is first provided for ML in materials science, with a broad overview of different ML techniques as well as best practices. This is followed by a critical discussion of how ML is applied in energy materials. This review is concluded with the perspectives on major challenges and opportunities in this exciting field.
3.
《IRBM》2022,43(1):2-12
Objectives: This study focuses on the integration of anatomical left ventricle myocardium features and an optimized extreme learning machine (ELM) for discrimination of subjects with normal, mild, moderate and severe abnormal ejection fraction (EF). The physiological alterations in the myocardium have diagnostic relevance to the etiology of cardiovascular diseases (CVD) with reduced EF. Materials and Methods: This assessment is carried out on cardiovascular magnetic resonance (CMR) images of 104 subjects available in the Kaggle Second Annual Data Science Bowl. The Segment CMR framework is used to segment the myocardium from cardiac MR images, and it is subdivided into 16 sectors. 86 clinically significant anatomical features are extracted and subjected to the ELM framework. The regularization coefficient and the number of hidden neurons influence the prediction accuracy of ELM. The optimal values for these parameters are obtained with the butterfly optimizer (BO). A comparative study of the BOELM framework with different activation functions and feature sets has been conducted. Results: Among the individual feature sets, myocardial volume at ED gives a better classification accuracy of 83.3% compared to the others. Further, the given BOELM framework is able to provide a higher multi-class accuracy of 95.2% with the entire feature set than ELM. Better discrimination is achieved for healthy and moderate abnormal subjects than for the other subgroups. Conclusion: BOELM assisted by the combined sector-wise anatomical myocardial features is able to predict the severity levels of CVDs. Thus, this study supports radiologists in the mass diagnosis of cardiac disorders.
4.
5.
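Lysine succinylation is a recently discovered post-translational modification that plays an important role in protein regulation and the control of cellular function, so accurate identification of succinylation sites in proteins is necessary. Traditional experiments are costly in materials and money. Computational prediction has recently been proposed as an efficient alternative. In this study, we developed a new prediction method, iSucc-PseAAC, which combines multiple classification algorithms with different feature extraction methods. We found that a support vector machine gave the best classification performance with sequence-coupling (PseAAC) feature extraction, and ensemble learning was used to address the data imbalance problem. Compared with existing prediction methods, iSucc-PseAAC is more meaningful and practical for distinguishing lysine succinylation sites.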
6.
In this work, an instance of the general problem that arises when optimizing multicomponent materials is treated: can components be optimized separately, or should the optimization occur simultaneously? This problem is investigated from a computational perspective in the domain of donor–acceptor pairs for organic photovoltaics, since most experimental research reports optimization of each component separately. A collection of organic donors and acceptors recently analyzed is used to train nonlinear machine learning models of different families to predict the power conversion efficiency of donor–acceptor pairs, considering computed electronic and structural parameters of both components. The trained models are then used to predict photovoltaic performance for donor–acceptor combinations for which experimental data are not available in the data set. The data structure and the usefulness of the trained models are critically assessed by predicting some donor–acceptor pairs that recently appeared in the literature, and the best combinations are proposed as worth investigating experimentally.
7.
Aimin Yan Andrzej Kloczkowski Heike Hofmann Robert L. Jernigan 《Journal of biomolecular structure & dynamics》2013,31(3):275-287
We develop ways to predict the side chain orientations of residues within a protein structure by using several different statistical machine learning methods. Here side chain orientation of a given residue i is measured by an angle Ω_i between the vector pointing from the center of the protein structure to the Cα_i atom and the vector pointing from the Cα_i atom to the center of its side chain atoms. To predict the Ω_i angles, we construct statistical models by using several different methods such as general linear regression, a regression tree and bagging, a neural network, and a support vector machine. The root mean square errors for the different models range only from 36.67 to 37.60 degrees and the correlation coefficients are all between 30% and 34%. The performances of different models in the test set are, thus, quite similar, and show the relative predictive power of these models to be significant in comparison with random side chain orientations.
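The angle definition in the abstract above can be illustrated with a minimal sketch. It assumes that the "center" of the protein and of the side chain are unweighted coordinate centroids, which the abstract does not specify; the function names `centroid` and `omega_angle` are hypothetical and are not taken from the paper.

```python
import math


def centroid(points):
    """Unweighted mean of a list of 3D points (assumed meaning of 'center')."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def omega_angle(protein_center, ca, side_chain_atoms):
    """Angle in degrees between (protein center -> C-alpha) and
    (C-alpha -> side-chain centroid)."""
    sc_center = centroid(side_chain_atoms)
    v1 = tuple(c - p for c, p in zip(ca, protein_center))   # center -> C-alpha
    v2 = tuple(s - c for s, c in zip(sc_center, ca))        # C-alpha -> side chain
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("angle is undefined for a zero-length vector")
    cos_omega = sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_omega = max(-1.0, min(1.0, cos_omega))
    return math.degrees(math.acos(cos_omega))
```

Under this convention, an angle near 0 degrees corresponds to a side chain pointing away from the protein center and an angle near 180 degrees to one pointing toward it.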
8.
The C3 pathways of CO2 reduction reaction (CO2RR) lead to the generation of high-value-added chemicals for broad industrial applications, which are still challenging for current electrocatalysis. Only limited electrocatalysts have been reported with the ability to achieve C3 products while the corresponding reaction mechanisms are highly unclear. To overcome such challenges, the first-principle machine learning (FPML) technique on graphdiyne-based atomic catalysts (GDY-ACs) is introduced to directly predict the reaction trends for the key C─C─C coupling processes and the conversions to different C3 products for the first time. All the prediction results are obtained only based on the learning dataset constructed by density functional theory (DFT) calculation results for C1 and C2 pathways, offering an efficient approach to screen promising electrocatalyst candidates for varied C3 products. More importantly, the ML predictions not only reveal the significant role of the neighboring effect and the small–large integrated cycle mechanisms but also supply important insights into the C─C─C coupling processes for understanding the competitive reactions among C1 to C3 pathways. This work has offered an advanced breakthrough for the complicated CO2RR processes, accelerating the future design of novel ACs for C3 products with high efficiency and selectivity.
9.
Benjamin Audit Cédric Vaillant Alain Arnéodo Yves d'Aubenton-Carafa Claude Thermes 《Journal of biological physics》2004,30(1):33-81
Analyses of genomic DNA sequences have shown in previous works that base pairs are correlated at large distances with scale-invariant statistical properties. We show in the present study that these correlations between nucleotides (letters) result in fact from long-range correlations (LRC) between sequence-dependent DNA structural elements (words) involved in the packaging of DNA in chromatin. Using the wavelet transform technique, we perform a comparative analysis of the DNA text and of the corresponding bending profiles generated with curvature tables based on nucleosome positioning data. This exploration through the optics of the so-called 'wavelet transform microscope' reveals a characteristic scale of 100-200 bp that separates two regimes of different LRC. We focus here on the existence of LRC in the small-scale regime (< 200 bp). Analysis of genomes in the three kingdoms reveals that this regime is specifically associated with the presence of nucleosomes. Indeed, small-scale LRC are observed in eukaryotic genomes and to a lesser extent in archaeal genomes, in contrast with their absence in eubacterial genomes. Similarly, this regime is observed in eukaryotic but not in bacterial viral DNA genomes. There is one exception for genomes of Poxviruses, the only animal DNA viruses that do not replicate in the cell nucleus and do not present small-scale LRC. Furthermore, no small-scale LRC are detected in the genomes of all examined RNA viruses, with one exception in the case of retroviruses. Altogether, these results strongly suggest that small-scale LRC are a signature of the nucleosomal structure. Finally, we discuss possible interpretations of these small-scale LRC in terms of the mechanisms that govern the positioning, the stability and the dynamics of the nucleosomes along the DNA chain.
This paper is mainly devoted to a pedagogical presentation of the theoretical concepts and physical methods which are well suited to perform a statistical analysis of genomic sequences. We review the results obtained with the so-called wavelet-based multifractal analysis when investigating the DNA sequences of various organisms in the three kingdoms. Some of these results have been announced in B. Audit et al. [1, 2].
10.
《Endocrine practice》2023,29(6):448-455
Objective: Using supervised machine learning algorithms (SMLAs), we built models to predict the probability that type 1 diabetes mellitus patients on insulin pump therapy meet insulin pump self-management behavioral (IPSMB) criteria and achieve a good glycemic response within 6 months. Methods: This was a single-center retrospective chart review of 100 adult type 1 diabetes mellitus patients on insulin pump therapy (≥6 months). Three SMLAs were deployed: multivariable logistic regression (LR), random forest (RF), and K-nearest neighbor (k-NN), validated using repeated three-fold cross-validation. Performance metrics included the area under the receiver operating characteristic curve for discrimination and Brier scores for calibration. Results: Variables predictive of adherence with IPSMB criteria were baseline hemoglobin A1c, continuous glucose monitoring, and sex. The models had comparable discriminatory power (LR = 0.74; RF = 0.74; k-NN = 0.72), with the RF model showing better calibration (Brier = 0.151). Predictors of a good glycemic response included baseline hemoglobin A1c, entering carbohydrates, and following the recommended bolus dose, with models comparable in discriminatory power (LR = 0.81, RF = 0.80, k-NN = 0.78) but the RF model being better calibrated (Brier = 0.099). Conclusion: These proof-of-concept analyses demonstrate the feasibility of using SMLAs to develop clinically relevant predictive models of adherence with IPSMB criteria and glycemic control within 6 months. Subject to further study, nonlinear prediction models may perform better.
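For readers unfamiliar with the calibration metric reported above, the following is a minimal sketch of the standard Brier score for binary outcomes, i.e. the mean squared difference between predicted probabilities and observed 0/1 labels (lower is better). The function name `brier_score` and the example values are illustrative only and do not come from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Illustrative values only: four patients, observed outcome vs. predicted probability.
example_outcomes = [1, 0, 1, 0]
example_probabilities = [0.9, 0.1, 0.8, 0.3]
example_score = brier_score(example_outcomes, example_probabilities)
```

A model that always predicts 0.5 scores 0.25 on any binary data set, which gives a rough reference point for the values of 0.151 and 0.099 quoted in the abstract.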
11.
《Molecular & cellular proteomics : MCP》2019,18(12):2492-2505
Highlights
- Fast and culture-free method for the identification of the 15 bacterial species causing UTIs.
- Combination of DIA analysis and machine learning algorithms to define a peptide signature.
- High accuracy, good linearity and reproducibility, sensitivity below standard threshold.
- Transferability to other laboratories and other mass spectrometers.
12.
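Machine learning makes it possible to process data intelligently and to make full use of the knowledge and value contained in data. To explore ways of applying machine learning-based intelligent analysis in landscape architecture, three experiments were carried out. Two relate to data analysis research: an urban color impression study based on color clustering of survey images, and a landscape visual quality assessment based on image recognition, together with the deployment of a web application platform. The third relates to digital design creation and proposes a terrain generation method for selecting design schemes, comprising two sub-projects: terrain generation with a deep learning generative adversarial network (GAN), and masking regions and predicting the elevation of the unknown areas. The three experiments apply algorithms from the three main branches of machine learning (classification, clustering, and regression) as well as deep learning GANs, and propose new machine learning-based methods for traditional research questions. Applying machine learning in landscape architecture can therefore help learn mutually reinforcing knowledge from multi-source data, identify problems, and propose new ways to solve them.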
13.
《Journal of molecular biology》2021,433(20):167196
Intrinsically disordered proteins (IDPs) constitute a broad set of proteins with few uniting and many diverging properties. IDPs, and intrinsically disordered regions (IDRs) interspersed between folded domains, are generally characterized as having no persistent tertiary structure; instead they interconvert between a large number of different and often expanded structures. IDPs and IDRs are involved in an enormously wide range of biological functions and reveal novel mechanisms of interactions, and while they defy the common structure-function paradigm of folded proteins, their structural preferences and dynamics are important for their function. We here discuss open questions in the field of IDPs and IDRs, focusing on areas where machine learning and other computational methods play a role. We discuss computational methods aimed to predict transiently formed local and long-range structure, including methods for integrative structural biology. We discuss the many different ways in which IDPs and IDRs can bind to other molecules, both via short linear motifs, as well as in the formation of larger dynamic complexes such as biomolecular condensates. We discuss how experiments are providing insight into such complexes and may enable more accurate predictions. Finally, we discuss the role of IDPs in disease and how new methods are needed to interpret the mechanistic effects of genomic variants in IDPs.
14.
Alexander T Xue Daniel R Schrider Andrew D Kern Agg Consortium 《Molecular biology and evolution》2021,38(3):1168
Identification of partial sweeps, which include both hard and soft sweeps that have not currently reached fixation, provides crucial information about ongoing evolutionary responses. To this end, we introduce partialS/HIC, a deep learning method to discover selective sweeps from population genomic data. partialS/HIC uses a convolutional neural network for image processing, which is trained with a large suite of summary statistics derived from coalescent simulations incorporating population-specific history, to distinguish between completed versus partial sweeps, hard versus soft sweeps, and regions directly affected by selection versus those merely linked to nearby selective sweeps. We perform several simulation experiments under various demographic scenarios to demonstrate partialS/HIC’s performance, which exhibits excellent resolution for detecting partial sweeps. We also apply our classifier to whole genomes from eight mosquito populations sampled across sub-Saharan Africa by the Anopheles gambiae 1000 Genomes Consortium, elucidating both continent-wide patterns as well as sweeps unique to specific geographic regions. These populations have experienced intense insecticide exposure over the past two decades, and we observe a strong overrepresentation of sweeps at insecticide resistance loci. Our analysis thus provides a list of candidate adaptive loci that may be relevant to mosquito control efforts. More broadly, our supervised machine learning approach introduces a method to distinguish between completed and partial sweeps, as well as between hard and soft sweeps, under a variety of demographic scenarios. As whole-genome data rapidly accumulate for a greater diversity of organisms, partialS/HIC addresses an increasing demand for useful selection scan tools that can track in-progress evolutionary dynamics.
15.
16.
Min‐Hsuan Lee 《Liver Transplantation》2019,9(26)
Ternary organic solar cells (OSCs) have progressed significantly in recent years due to the sufficient photon harvesting of the blend photoactive layer including three absorption‐complementary materials. With the rapid development of highly efficient ternary OSCs in photovoltaics, the precise energy‐level alignment of the three active components within ternary OSC devices should be taken into account. The machine‐learning technique is a computational method that can effectively learn from previous historical data to build predictive models. In this study, a dataset of 124 fullerene derivatives‐based ternary OSCs is manually constructed from a diverse range of literature along with their frontier molecular orbital theory levels, and device structures. Different machine‐learning algorithms are trained based on these electronic parameters to predict photovoltaic efficiency. Thus, the best predictive capability is provided by using the Random Forest approach beyond other machine‐learning algorithms in the dataset. Furthermore, the Random Forest algorithm yields valuable insights into the crucial role of lowest unoccupied molecular orbital energy levels of organic donors in the performance of ternary OSCs. The outcome of this study demonstrates a smart strategy for extracting underlying complex correlations in fullerene derivatives‐based ternary OSCs, thereby accelerating the development of ternary OSCs and related research fields.
17.
Background: Recent developments in neuroimaging and genetic testing technologies have made it possible to measure pathological features associated with Alzheimer's disease (AD) in vivo. Mining potential molecular markers of AD from high-dimensional, multi-modal neuroimaging and omics data will provide a new basis for early diagnosis and intervention in AD. In order to discover the real pathogenic mutations and even understand the pathogenic mechanism of AD, many machine learning methods have been designed and successfully applied to the analysis and processing of large-scale AD biomedical data. Objective: To introduce and summarize the applications and challenges of machine learning methods in Alzheimer's disease multi-source data analysis. Methods: The literature selected in the review is obtained from Google Scholar, PubMed, and Web of Science. The keywords of literature retrieval include Alzheimer's disease, bioinformatics, image genetics, genome-wide association research, molecular interaction network, multi-omics data integration, and so on. Conclusion: This study comprehensively introduces machine learning-based processing techniques for AD neuroimaging data and then shows the progress of computational analysis methods in omics data, such as the genome, proteome, and so on. Subsequently, machine learning methods for AD imaging analysis are also summarized. Finally, we elaborate on the current emerging technology of multi-modal neuroimaging and multi-omics data joint analysis, and present some outstanding issues and future research directions.
18.
《IRBM》2023,44(1):100725
Objectives: When the prognosis of COVID-19 disease can be detected early, the intense pressure on health services and the loss of workforce can be partially reduced. The primary purpose of this article is to determine the feature dataset, consisting of routine blood values (RBV) and demographic data, that affects the prognosis of COVID-19. The second is to identify severely and mildly infected COVID-19 patients at the time of admission by applying the feature dataset to supervised machine learning (ML) models. Material and methods: The sample of this study consists of severely (n = 192) and mildly (n = 4010) infected patients hospitalized with the diagnosis of COVID-19 between March and September 2021. The RBV data measured at the time of admission and the age and gender characteristics of these patients were analyzed retrospectively. For the selection of the features, the minimum-redundancy-maximum-relevance (MRMR) method, principal components analysis and forward multiple logistic regression analyses were used. The feature sets were statistically compared between mildly and severely infected patients. Then, the performances of various supervised ML models were compared in identifying severely and mildly infected patients using the feature set. Results: In this study, 28 RBV parameters and the age variable were found as the feature dataset. The effect of the features on the prognosis of the disease has been clinically proven. The ML models with the highest overall accuracy in identifying patient groups were, in order: local weighted learning (LWL), 97.86%; K-star (K*), 96.31%; Naive Bayes (NB), 95.36%; and k-nearest neighbor (KNN), 94.05%. The most successful models by area under the receiver operating characteristic curve (AUC) in identifying patient groups were, in order: LWL, 0.95; K*, 0.91; NB, 0.85; and KNN, 0.75. Conclusion: The findings in this article provide significant motivation for healthcare professionals to detect severely and mildly infected COVID-19 patients at admission.
19.
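Designing enzyme binding pockets that bind substrates of different chemical structures is a major challenge. Traditional wet-lab experiments must screen thousands or even millions of mutants to find those that bind a specific ligand, a process that consumes a great deal of time and resources. To accelerate screening, we propose a new workflow that combines molecular modeling with data-driven machine learning to generate mutant libraries with high enrichment rates, for efficient screening of protein mutants that recognize specific substrates. M. jannaschii tyrosyl-tRNA synthetase (Mj. TyrRS) recognizes specific unnatural amino acids and catalyzes the formation of aminoacyl-tRNA; its different mutants recognize unnatural amino acids of different structures, and many reports and data have accumulated. We therefore use TyrRS as an example for a proof of concept of this screening workflow. Based on several known crystal structures of Mj. TyrRS and the results of molecular modeling, we found that D158G/P is the key mutation affecting changes in the protein backbone of the α-helix spanning residues 158-163. Our simulation results show that, on test data containing 687 mutants, ranking by molecular modeling and scoring-function calculation improves the enrichment of target mutants 2-fold compared with random mutation, while calibration with a machine learning model trained on data for known mutants and their corresponding unnatural amino acids improves the screening enrichment 11-fold. This computational screening workflow combining molecular modeling and machine learning is very helpful for designing the substrate specificity of Mj. TyrRS and can greatly reduce the time and cost of wet-lab experiments. In addition, this new approach has broad application prospects in computational protein design.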