Found 20 similar articles (search time: 15 ms)
1.
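Lysine succinylation is a novel post-translational modification that plays an important role in protein regulation and the control of cellular function, so accurate identification of succinylation sites in proteins is necessary. Traditional experiments are costly in materials and money, and computational prediction has recently been proposed as an efficient alternative. In this study, we developed a new prediction method, iSucc-PseAAC, by combining multiple classification algorithms with different feature extraction methods. We found that a support vector machine gave the best classification performance with sequence-coupling (PseAAC) feature extraction, and ensemble learning was used to address the data imbalance problem. Compared with existing methods, iSucc-PseAAC is more meaningful and practical for distinguishing lysine succinylation sites.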
2.
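Extracellular matrix proteins play important roles in a range of cellular biological processes, and their abnormal regulation leads to many major diseases. Theoretical reference data for extracellular matrix proteins are the basis for identifying these proteins efficiently, and researchers have developed a series of machine-learning-based tools for predicting them. This paper first describes the basic workflow for building an extracellular matrix protein prediction tool from a machine learning model, then summarizes the existing tools one by one, and finally outlines the problems these tools currently face and possible ways to improve them.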
3.
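Based on light microscopy and scanning electron microscopy, the morphological structure of the antennae of mitten crabs was studied. The composition of the endopod and exopod of the first antenna and the characteristics of the sensory setae on the inner side of the exopod were analyzed, as were the shape of the endopod of the second antenna and the features of its extremely long sensory setae and surface scales. The possibility of using antennal morphology as a new taxonomic character in the systematics of mitten crabs is discussed.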
4.
Premise: Continental-scale leaf trait studies can help explain how plants survive in different environments, but large data sets are costly to assemble at this scale. Automating the measurement of digitized herbarium collections could rapidly expand the data available to such studies. We used machine learning to identify and measure leaves from existing, digitized herbarium specimens. The process was developed, validated, and applied to analyses of relationships between leaf size and climate within and among species for two genera: Syzygium (Myrtaceae) and Ficus (Moraceae). Methods: Convolutional neural network (CNN) models were used to detect and measure complete leaves in images. Predictions of a model trained with a set of 35 randomly selected images and a second model trained with 35 user-selected images were compared using a set of 50 labeled validation images. The validated models were then applied to 1227 Syzygium and 2595 Ficus specimens digitized by the National Herbarium of New South Wales, Australia. Leaf area measurements were made for each genus and used to examine links between leaf size and climate. Results: The user-selected training method for Syzygium found more leaves (9347 vs. 8423) using fewer training masks (218 vs. 225), and found leaves with a greater range of sizes than the random image training method. Within each genus, leaf size was positively associated with temperature and rainfall, consistent with previous observations. However, within species, the associations between leaf size and environmental variables were weaker. Conclusions: CNNs detected and measured leaves with levels of accuracy useful for trait extraction and analysis and illustrate the potential for machine learning of herbarium specimens to massively increase global leaf trait data sets. Within-species relationships were weak, suggesting that population history and gene flow have a strong effect at this level. Herbarium specimens and machine learning could expand sampling of trait data within many species, offering new insights into trait evolution.
5.
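Tree species diversity is a key topic in ecological research, and information on tree species and their spatial distribution can effectively support sustainable forest management. Under complex stand conditions, however, high-accuracy classification results are difficult to obtain. Unmanned aerial vehicle (UAV) remote sensing can acquire ultra-fine local data, which makes higher tree species classification accuracy possible. Using multi-source UAV remote sensing data, including visible-light, hyperspectral, and LiDAR data, this study explores their potential for tree species classification under subtropical stand conditions. The findings are: (1) the random forest classifier achieved the highest overall accuracy and per-species F1 scores and is suitable for classification mapping of multiple subtropical tree species, with an overall accuracy of 95.63% and a Kappa coefficient of 0.948 in distinguishing 13 classes (8 trees, 4 herbs); (2) using multi-source data significantly improves classification accuracy, with the all-feature model being the most accurate; hyperspectral and LiDAR data significantly affect the accuracy of the all-feature model, whereas visible-light texture data contribute little; (3) the classification features rank in importance, from highest to lowest, as structural information, vegetation indices, texture information, and minimum noise fraction transform components.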
6.
Echinodorus (Alismataceae) is a genus of aquatic and semi-aquatic herbs naturally distributed from Argentina to the USA but commonly used as ornamentals in aquaria worldwide. The phylogeny of the genus was studied on the basis of 96 morphological characters. The analysis resulted in a single most-parsimonious tree supporting a polyphyletic origin of the genus. However, subgenus Echinodorus together with Echinodorus nymphaeifolius formed a clade. Two large clades can be recognized in Echinodorus s.s., but previous subdivisions of the genus are not supported and some earlier proposed subspecific combinations were shown to be non-monophyletic. Addition of continuous characters coded as value ranges enhanced both the resolution and the support values of the tree. Hence, inclusion of continuous overlapping data is encouraged in phylogenetic studies. © 2006 The Linnean Society of London, Botanical Journal of the Linnean Society, 2006, 150, 291–305.
7.
Monocytes and neutrophils play key roles in the cytokine storm triggered by SARS-CoV-2 infection, which changes their conformation and function. These changes are detectable at the cellular and molecular level and may differ from those observed in other respiratory infections. Here, we applied machine learning (ML) to develop and validate an algorithm to diagnose COVID-19 using blood parameters. In this retrospective single-center study, 49 hemogram parameters from 12,321 patients with clinical suspicion of COVID-19 and tested by RT-PCR (4239 positive and 8082 negative) were analysed. The dataset was randomly divided into training and validation sets. Blood cell parameters and patient age were used to construct the predictive model with the support vector machine (SVM) tool. The model constructed from the training set (5936 patients) achieved an accuracy for diagnosis of SARS-CoV-2 infection of 0.952 (95% CI: 0.875–0.892). Test sensitivity and specificity were 0.868 and 0.899, respectively, with positive (PPV) and negative (NPV) predictive values of 0.896 and 0.872, respectively (prevalence 0.50). The validation set model (4964 patients) achieved an accuracy of 0.894 (95% CI: 0.883–0.903). Test sensitivity and specificity were 0.8922 and 0.8951, respectively, with positive (PPV) and negative (NPV) predictive values of 0.817 and 0.94, respectively (prevalence 0.34). The area under the receiver operating characteristic curve was 0.952 for the algorithm performance. This algorithm may allow a COVID-19 diagnosis to be ruled out with 94% probability. This represents a great advance for early diagnostic orientation and guiding clinical decisions.
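The predictive values quoted in the abstract above follow from sensitivity, specificity, and prevalence by Bayes' rule. The short Python sketch below reproduces that arithmetic for the training-set figures; the function name `predictive_values` is a hypothetical helper introduced here for illustration and is not part of the study's code.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV, accuracy) for a binary test.

    All inputs are proportions in [0, 1]. The four terms below are the
    expected fractions of the tested population in each outcome cell.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)   # P(disease | positive test)
    npv = true_neg / (true_neg + false_neg)   # P(no disease | negative test)
    accuracy = true_pos + true_neg            # fraction classified correctly
    return ppv, npv, accuracy


# Training-set figures as reported in the abstract.
ppv, npv, accuracy = predictive_values(0.868, 0.899, 0.50)
print(f"PPV={ppv:.3f} NPV={npv:.3f}")
```

With the training-set inputs this gives a PPV of about 0.896 and an NPV of about 0.872, in line with the reported values. The accuracy implied by the same inputs is about 0.884, which falls inside the quoted 95% CI of 0.875–0.892.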
9.
Introduction: Despite the unquestionable advantages of Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging in visualizing the spatial distribution and the relative abundance of biomolecules directly on-tissue, the resulting data are complex and high dimensional. Therefore, analysis and interpretation of this huge amount of information is mathematically, statistically and computationally challenging. Areas covered: This article reviews some of the challenges in data elaboration with particular emphasis on machine learning techniques employed in clinical applications, and can serve as a general entry point for those who want to study the computational aspects. Several characteristics of data processing are described, highlighting advantages and disadvantages. Different approaches for data elaboration focused on clinical applications are also provided. A practical tutorial based on Orange Canvas and Weka software is included to help readers become familiar with the data processing. Expert commentary: Recently, MALDI-MSI has gained considerable attention and has been employed for research and diagnostic purposes, with successful results. Data dimensionality constitutes an important issue and statistical methods for information-preserving data reduction represent one of the most challenging aspects. The most common data reduction methods are characterized by collecting independent observations into a single table. However, the incorporation of relational information can improve the discriminatory capability of the data.
10.
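To explore the phylogenetic relationships of the mielichhoferioid mosses with Bryaceae and Mniaceae, and to clarify the systematic position of Mielichhoferiaceae and the taxonomy of the family in China, this study carried out a detailed morphological examination of more than 4,000 specimens of Mielichhoferiaceae and related groups distributed in China. Combined data from four DNA regions (rps4, trnG, trnL-trnF, atpB-rbcL) for 40 samples representing 35 of these species were used for molecular phylogenetic analysis, and molecular trees were constructed with the neighbor-joining (NJ) and maximum likelihood (ML) methods. The results show that: (1) In the molecular trees, the mielichhoferioid mosses and the Bryaceae cluster on different, highly supported branches, and morphological characters such as leaf shape and leaf cells also differ considerably, so the mielichhoferioid mosses should be separated from Bryaceae sensu lato. (2) Although the mielichhoferioid mosses and the Mniaceae cluster on the same branch in the molecular trees, they share no morphological synapomorphies and should not be treated as a single monophyletic group. (3) Mielichhoferiaceae is a natural group whose genera are closely related. The main diagnostic characters of the family are: plants small, stems often branched; leaf shape and leaf cells Pohlia-like, leaves lanceolate to oblong, cells of the middle and upper leaf narrow and elongate, linear-rhomboidal or vermicular; gametoecia mostly borne at the tips of new branches; peristome double and alternate, often reduced to varying degrees or with one peristome layer missing, rarely with both layers missing. (4) Mielichhoferiaceae in China comprises five genera, namely Mielichhoferia, Pohlia, Pseudopohlia, Synthetodontium, and Epipterygium, with 34 species in total to date (including infraspecific taxa).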
11.
We present a study of buzzing sounds of several common species of bumblebees, with the focus on automatic classification of bumblebee species and types. Such classification is useful for bumblebee monitoring, which is important for evaluating the quality of their living environment and protecting the biodiversity of these important pollinators. We analysed natural buzzing frequencies for queens and workers of 12 species. In addition, we analysed changes in the buzzing of Bombus hypnorum workers for different types of behaviour. We developed a bumblebee classification application using machine learning algorithms. We extracted audio features from sound recordings using a large feature library. We used the best features to train a classification model, with Random Forest proving to be the best training algorithm on the testing set of samples. The web and mobile application also allows expert users to upload new recordings that can be later used to improve the classification model and expand it to include more species.
12.
The application of DNA microarray technology for analysis of gene expression creates enormous opportunities to accelerate the pace in understanding living systems and identification of target genes and pathways for drug development and therapeutic intervention. Parallel monitoring of the expression profiles of thousands of genes seems particularly promising for a deeper understanding of cancer biology and the identification of molecular signatures supporting the histological classification schemes of neoplastic specimens. However, the increasing volume of data generated by microarray experiments poses the challenge of developing equally efficient methods and analysis procedures to extract, interpret, and upgrade the information content of these databases. Herein, a computational procedure for pattern identification, feature extraction, and classification of gene expression data through the analysis of an autoassociative neural network model is described. The identified patterns and features contain critical information about gene-phenotype relationships observed during changes in cell physiology. They represent a rational and dimensionally reduced base for understanding the basic biology of the onset of diseases, defining targets of therapeutic intervention, and developing diagnostic tools for the identification and classification of pathological states. The proposed method has been tested on two different microarray datasets: Golub's analysis of acute human leukemia [Golub et al. (1999) Science 286:531-537], and the human colon adenocarcinoma study presented by Alon et al. [1999; Proc Natl Acad Sci USA 97:10101-10106]. The analysis of the neural network internal structure allows the identification of specific phenotype markers and the extraction of peculiar associations among genes and physiological states. At the same time, the neural network outputs provide assignment to multiple classes, such as different pathological conditions or tissue samples, for previously unseen instances.
13.
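The morphology of the immature stages of the caterpillar pest 波纹杂毛虫 is described in detail. In Fujian Province the insect has two generations per year and overwinters as third- to fourth-instar larvae in ground litter. The feeding process of each larval instar was observed systematically and food consumption was calculated. Laboratory and field control trials were carried out. The results provide a scientific basis for integrated control of this insect.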
14.
An important task of computational biology is to identify those parts of a polypeptide chain that are involved in interactions with other proteins. For this purpose, we have developed the program PresCont, which predicts in a robust manner amino acids that constitute protein-protein interfaces (PPIs). PresCont reaches state-of-the-art classification quality on the basis of only four residue properties that can be readily deduced from the 3D structure of an individual protein and a multiple sequence alignment (MSA) composed of homologs. The core of PresCont is a support vector machine, which assesses solvent-accessible surface area, hydrophobicity, conservation, and the local environment of each amino acid on the protein surface. For training and performance testing, we compiled three nonoverlapping datasets consisting of permanently formed or transient complexes, respectively. A comparison with SPPIDER, ProMate, and meta-PPISP showed that PresCont compares favorably with these highly sophisticated programs, and that its prediction quality is less dependent on the type of protein complex being considered. This balance is due to a mutual compensation of classification weaknesses observed for individual properties: for PPIs of permanent complexes, solvent-accessible surface and hydrophobicity contribute most to classification quality; for PPIs of transient complexes, the assessment of the local environment is most significant. Moreover, we show that for permanent complexes a segmentation of PPIs into core and rim residues has only a moderate influence on prediction quality. PresCont is available as a web service at http://www-bioinf.uni-regensburg.de/.
16.
Proteins play important roles in living organisms, and their function is directly linked with their structure. Due to the growing gap between the number of proteins being discovered and their functional characterization (in particular as a result of experimental limitations), reliable prediction of protein function through computational means has become crucial. This paper reviews the machine learning techniques used in the literature, following their evolution from simple algorithms such as logistic regression to more advanced methods like support vector machines and modern deep neural networks. Hyperparameter optimization methods adopted to boost prediction performance are presented. In parallel, the evolution of the features used by these algorithms, from classical physicochemical properties and amino acid composition to text-derived features from biomedical literature and feature representations learned with autoencoders, is also reviewed, together with feature selection and dimensionality reduction techniques. The success stories in the application of these techniques to both general and specific protein function prediction are discussed.
17.
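Abstract: To explore how well deep neural network models perform in the intelligent identification of conodont images, eight Ordovician conodont species were selected as the study objects. A total of 1188 conodont images were captured with a stereomicroscope and 778 conodont images were collected from published literature, and the image dataset was divided into a training set and a test set. Rotation, flipping, and filter-based enhancement of the training images addressed the shortage of training samples. Based on ResNet-18, ResNet...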
18.
Conflict among data sources can be frequent in evolutionary biology, especially in cases where one character set poses limitations to resolution. Earthworm taxonomy, for example, remains a challenge because of the limited number of taxonomically valuable morphological characters. One explanation for this may be morphological convergence due to adaptation to a homogeneous habitat, resulting in high degrees of homoplasy. This sometimes impedes clear morphological diagnosis of species. Combination of morphology with molecular techniques has recently aided taxonomy in many groups that are difficult to delimit morphologically. Here we apply an integrative approach by combining morphological and molecular data, including also some ecological features, to describe a new earthworm species in the family Hormogastridae, Hormogaster abbatissae sp. n., collected in Sant Joan de les Abadesses (Girona, Spain). Its anatomical and morphological characters are discussed in relation to the most similar Hormogastridae species, which are not the closest species in a phylogenetic analysis of molecular data. Species delimitation using the GMYC method and genetic divergences with the closest species are also considered. The information supplied by the morphological and molecular sources is contradictory, and thus we discuss issues with species delimitation in other similar situations. Decisions should be based on a profound knowledge of the morphology of the studied group but results from molecular analyses should also be considered.
20.
In recent years, the idea of “cancer big data” has emerged as a result of the significant expansion of various fields such as clinical research, genomics, proteomics and public health records. Advances in omics technologies are making a significant contribution to cancer big data in biomedicine and disease diagnosis. The increasing availability of extensive cancer big data has set the stage for the development of multimodal artificial intelligence (AI) frameworks. These frameworks aim to analyze high-dimensional multi-omics data, extracting meaningful information that is challenging to obtain manually. Although interpretability and data quality remain critical challenges, these methods hold great promise for advancing our understanding of cancer biology and improving patient care and clinical outcomes. Here, we provide an overview of cancer big data and explore the applications of both traditional machine learning and deep learning approaches in cancer genomic and proteomic studies. We briefly discuss the challenges and potential of AI techniques in the integrated analysis of omics data, as well as the future direction of personalized treatment options in cancer.