Similar Articles
1.
The Anatomical Therapeutic Chemical (ATC) classification system is a widely used and accepted drug classification system, recommended and maintained by the World Health Organization (WHO). In this system, each drug is assigned one or more ATC codes, indicating which classes it belongs to at each of the five levels. Given a chemical/drug, correctly identifying its ATC codes can help in understanding its therapeutic effects. Several computational methods have been proposed to identify the first-level ATC classes of any drug, most of them by building multi-label classifiers. One previous study proposed a quite different scheme containing two network methods, based on the shortest path (SP) and random walk with restart (RWR) algorithms, respectively, to infer novel chemicals/drugs for each first-level class. However, owing to the limitations of the SP and RWR algorithms, many hidden chemicals/drugs remain that these two methods cannot discover. This study employed another classic network algorithm, the Laplacian heat diffusion (LHD) algorithm, to construct a new computational method for recognizing novel latent chemicals/drugs of each first-level ATC class. The algorithm was applied to a chemical network containing abundant chemical interaction information to evaluate the associations between candidate chemicals/drugs and each ATC class. Three screening tests, which measured specificity and association to one ATC class, followed to yield more reliable potential members of each class. Several hidden chemicals/drugs that previous methods could not find were recognized, and they were extensively analyzed to confirm that they can be novel members of the corresponding ATC class.
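The heat-diffusion step can be illustrated with a minimal sketch, assuming the standard heat-kernel formulation h(t) = exp(-tL) h0 on the combinatorial Laplacian L = D - A, with one unit of heat placed on each known member of an ATC class. The toy network, its edge weights, and the diffusion time t are hypothetical, and the three screening tests are not reproduced.

```python
import numpy as np

def heat_diffusion(adjacency, seeds, t=1.0):
    """Diffuse unit heat from seed nodes over a weighted, undirected network.

    Computes h(t) = exp(-t L) h0 with L = D - A (combinatorial Laplacian).
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so exp(-tL) = V diag(exp(-t * lambda)) V^T.
    eigvals, eigvecs = np.linalg.eigh(L)
    kernel = eigvecs @ np.diag(np.exp(-t * eigvals)) @ eigvecs.T
    h0 = np.zeros(A.shape[0])
    h0[list(seeds)] = 1.0
    return kernel @ h0

# Hypothetical 5-node chemical network: nodes 0 and 1 are known class
# members, nodes 2-4 are candidates; weights stand in for interaction scores.
A = np.array([
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.1, 0.0],
    [0.8, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
])
heat = heat_diffusion(A, seeds=[0, 1], t=1.0)
candidates = sorted([2, 3, 4], key=lambda n: -heat[n])  # most heat first
print(candidates)
```

Because every row of L sums to zero, the total heat is conserved, so candidate scores are directly comparable within one diffusion run.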

2.
Li Y, Wang N, Perkins EJ, Zhang C, Gong P. PLoS ONE, 2010, 5(10): e13715
Monitoring, assessment, and prediction of the environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with the explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We developed an earthworm microarray containing 15,208 unique oligo probes and used it to profile gene expression in 248 earthworms exposed to TNT, RDX, or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset and construct classifier models that separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical algorithm of class comparison. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, which were ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-means clustering method were applied independently to refine the classifier, producing smaller subsets of 39 and 30 classifier genes, respectively, with 11 genes in common as potential biomarkers. The combined 58 genes were considered the refined subset and used to build MC-SVM and clustering models with classification accuracies of 83.5% and 56.9%, respectively. This study demonstrates that the machine learning approach can identify and optimize a small subset of classifier/biomarker genes from high-dimensional datasets and generate classification models of acceptable precision for multiple classes.
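The first, univariate filtering step can be illustrated with a one-way ANOVA F statistic computed per gene across the three treatment groups. The abstract does not name the exact class-comparison test, so ANOVA is an assumption here, and the simulated expression values are hypothetical. As a check on the reported numbers, two gene subsets of 39 and 30 sharing 11 genes have a union of 39 + 30 - 11 = 58 genes.

```python
import numpy as np

def anova_f(expr, labels):
    """One-way ANOVA F statistic for every gene (column) across sample groups."""
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = expr.shape[0], len(groups)
    grand_mean = expr.mean(axis=0)
    ss_between = np.zeros(expr.shape[1])
    ss_within = np.zeros(expr.shape[1])
    for g in groups:
        x = expr[labels == g]
        group_mean = x.mean(axis=0)
        ss_between += x.shape[0] * (group_mean - grand_mean) ** 2
        ss_within += ((x - group_mean) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical data: 10 earthworms per group, 2 genes.
rng = np.random.default_rng(0)
labels = np.repeat(["control", "TNT", "RDX"], 10)
shift = np.repeat([0.0, 3.0, 6.0], 10)
expr = np.column_stack([
    shift + rng.normal(0, 1, 30),  # gene 0 responds to treatment
    rng.normal(0, 1, 30),          # gene 1 is pure noise
])
f_stats = anova_f(expr, labels)
ranked_genes = np.argsort(-f_stats)  # most discriminating genes first
```

In a full pipeline, the top-ranked genes would then be passed to the tree-based selection and MC-SVM stages.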

3.
The recognition of protein folds is an important step in predicting protein structure and function, and an increasing number of researchers have recently sought to improve protein fold recognition methods. Since Ding and Dubchak constructed a dataset of 27 protein fold classes in 2001, prediction algorithms, feature parameters, and newly constructed datasets for protein fold prediction have all improved. In this study, we reorganized a dataset of 76 fold classes constructed by Liu et al. and used the increment of diversity, the average chemical shifts of secondary structure elements, and secondary structure motifs as feature parameters for multi-class protein fold recognition. Using the combined feature vector as input to the Random Forests algorithm with an ensemble classification strategy, we propose a novel method to identify the 76 protein fold classes. The overall accuracy on the independent test set was 66.69%; when the training and test sets were combined, 5-fold cross-validation gave an overall accuracy of 73.43%. The method was further applied to the test set of the original 27-fold-class dataset and to its corresponding structural classification, yielding overall accuracies of 79.66% and 93.40%, respectively; with the training and test sets combined, the 5-fold cross-validation accuracy was 81.21%. Overall, the approach improved prediction results on the 27-fold-class dataset constructed by Ding and Dubchak.
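The increment of diversity feature can be sketched from its usual definition: with the diversity measure D(X) = N ln N - Σ n_i ln n_i over symbol counts n_i (N = Σ n_i), the increment is ID(X, Y) = D(X + Y) - D(X) - D(Y). It is zero when two count vectors have identical composition and grows as they diverge. This sketch counts amino acids only; which sequence components the study actually counted is not stated in the abstract, and the sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def diversity(counts):
    """D(X) = N ln N - sum_i n_i ln n_i, skipping zero counts."""
    n = sum(counts)
    total = n * math.log(n) if n > 0 else 0.0
    return total - sum(c * math.log(c) for c in counts if c > 0)

def increment_of_diversity(x_counts, y_counts):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); 0 for identical compositions."""
    merged = [a + b for a, b in zip(x_counts, y_counts)]
    return diversity(merged) - diversity(x_counts) - diversity(y_counts)

def composition(seq):
    """Amino acid counts in a fixed alphabet order."""
    counts = Counter(seq)
    return [counts[a] for a in AMINO_ACIDS]

# Hypothetical query and fold-class "standard" sequences.
query = composition("MKVLAAGIVGLLAA")
fold_class = composition("MKVLAAGIVGLLAAMKVLAAGIVGLLAA")  # same proportions
unrelated = composition("WWYYFFHHCCNNQQ")                # disjoint residues
```

A smaller ID against a fold class's standard composition indicates closer similarity, so one ID value per class can serve as an input feature to the classifier.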

4.

Purpose

Mixtures of organic chemicals are a part of virtually all life cycles, but LCI data exist for only relatively few chemicals. Thus, estimation methods are required. However, these are often either very time-consuming or deliver results of low quality. This article compares existing and new methods in two scenarios and recommends a tiered approach of different methods for an efficient estimation of the production impacts of chemical mixtures.

Methods

Four approaches to estimate impacts of a large number of chemicals are compared in this article: extrapolation from existing data, substitution with generic datasets on chemicals, molecular structure-based models (MSMs, in this case the Finechem tool), and using process-based estimation methods. Two scenarios were analyzed as case studies: soft PVC plastic and a tobacco flavor, a mixture of 20 chemicals.

Results

Process models have the potential to deliver the best estimations, as existing information on production processes can be integrated. However, their estimation quality suffers when such data are not available and they are time-consuming to apply, which is problematic when estimating large numbers of chemicals. Extrapolation from known to unknown components and use of generic datasets are generally not recommended. In both case studies, these two approaches significantly underestimated the impacts of the chemicals compared to the process models. MSMs were generally able to estimate impacts on the same level as the more complex process models. A tiered approach using MSMs to determine the relevance of individual components in mixtures and applying process models to the most relevant components offered a simpler and faster estimation process while delivering results on the level of most process models.

Conclusions

The tiered combination of MSMs and process models gives LCA practitioners a relatively fast and simple way to estimate the LCIA results of chemicals, even for mixtures with a large number of components. Such mixtures previously presented a problem, because applying process models to all components was very time-consuming, while the existing simple approaches were shown in this study to be inadequate. We recommend the tiered approach as a significant improvement over previous approaches for estimating the LCA results of chemical mixtures.
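The tiered screening logic can be sketched as follows: MSM estimates, weighted by mass fraction, rank the components of a mixture, and only those that together cover a chosen share of the estimated impact are passed on to detailed process models. The 80% coverage cut-off, the component names, and all numbers are hypothetical; the article's own relevance criterion may differ.

```python
def tiered_selection(components, coverage=0.8):
    """Return the components to model in detail, largest screening contribution first.

    components: (name, mass_fraction, msm_impact_per_kg) tuples, where the last
    value is a screening estimate from a molecular structure-based model.
    """
    contributions = sorted(
        ((name, fraction * impact) for name, fraction, impact in components),
        key=lambda item: item[1],
        reverse=True,
    )
    total = sum(c for _, c in contributions)
    selected, covered = [], 0.0
    for name, contribution in contributions:
        if covered >= coverage * total:
            break  # remaining components keep their MSM estimates
        selected.append(name)
        covered += contribution
    return selected

# Hypothetical three-component mixture.
mixture = [("compound_A", 0.5, 2.0), ("compound_B", 0.3, 10.0), ("compound_C", 0.2, 1.0)]
to_model = tiered_selection(mixture)
```

Here compound_B alone covers 3.0 of the 4.2 total, short of 80%, so compound_A is added as well, and compound_C keeps its screening estimate.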

5.
Whole-cell biosensors are mostly non-specific with respect to the toxicants they detect, and therefore offer an interesting perspective for environmental monitoring. However, to fully exploit this feature, a robust classification method needs to be implemented in these sensor systems to allow further identification of the detected substances. Substance-specific information can be extracted from signals derived from biosensors harbouring one or multiple biological components; a major task here is identifying substance-specific information within considerable amounts of biosensor data. For this purpose, several approaches use statistical methods or machine learning algorithms. Genetic Programming (GP), a heuristic machine learning technique, offers several advantages over other machine learning approaches and may consequently be a promising tool for classifying biosensor data. In the present study, we evaluated the use of GP for classifying herbicides and herbicide classes (chemical classes) by analysing substance-specific patterns derived from a whole-cell multi-species biosensor. We re-analysed data from a previously described array-based biosensor system employing diverse microalgae (Podola and Melkonian, 2005), aiming at the identification of five individual herbicides as well as two herbicide classes. GP analyses were performed with the commercially available GP software 'Discipulus', yielding classifiers (computer programs) for the binary classification of each individual herbicide or herbicide class. GP-generated classifiers for both individual herbicides and herbicide classes achieved statistically significant identification of herbicides or herbicide classes, respectively. Most classifiers correctly classified about 80-95% of the test data sets (sensitivity), while the false positive rate (1 - specificity) was below 20% for most classifiers.
Results suggest that a larger number of data sets may lead to better classification performance. In the present paper, GP-based classification was combined with a biosensor for the first time. Our results demonstrate that GP was able to identify substance-specific information within complex biosensor response patterns and to use this information for successful toxicant classification in unknown samples. This warrants further research to assess the prospects and limitations of this approach in the field of biosensors.
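Because sensitivity, specificity, and the false positive rate are easily confused, the relations behind the reported figures can be made explicit. Specificity is the true negative rate, so the false positive rate equals 1 - specificity. The confusion-matrix counts below are hypothetical.

```python
def binary_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, false positive rate) for a binary classifier."""
    sensitivity = tp / (tp + fn)          # share of true positives detected
    specificity = tn / (tn + fp)          # share of true negatives rejected
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    return sensitivity, specificity, false_positive_rate

# Hypothetical test counts for one herbicide classifier.
sens, spec, fpr = binary_rates(tp=18, fn=2, tn=45, fp=5)
```

With these counts, sensitivity and specificity are both 0.9 and the false positive rate is 0.1.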

6.
Phenotypic screening through high-content automated microscopy is a powerful tool for evaluating the mechanism of action of candidate therapeutics. Despite more than a decade of development, however, high-content assays have yielded mixed results, identifying robust phenotypes in only a small subset of compound classes. This has led to a combinatorial explosion of assay techniques, analyzing cellular phenotypes across dozens of assays with hundreds of measurements. Here, using a minimalist three-stain assay and only 23 basic cellular measurements, we developed an analytical approach that leverages informative dimensions extracted by linear discriminant analysis to evaluate the similarity between the phenotypic trajectories of different compounds in response to a range of doses. This method enabled us to visualize biologically interpretable phenotypic tracks populated by compounds with similar mechanisms of action, cluster compounds according to phenotypic similarity, and classify novel compounds by comparing them to phenotypically active exemplars. Hierarchical clustering applied to 154 compounds from over a dozen different mechanistic classes demonstrated tight agreement with published compound mechanism classifications. Using 11 phenotypically active mechanism classes, classification was performed on all 154 compounds: 78% were correctly identified as belonging to one of the 11 exemplar classes or to a different, unspecified class, with accuracy increasing to 89% when less phenotypically active compounds were excluded. Importantly, several apparent clustering and classification failures, including rigosertib and 5-fluoro-2’-deoxycytidine, instead revealed more complex mechanisms or off-target effects verified by more recent publications.
These results show that a simple, easily replicated, minimalist high-content assay can reveal subtle variations in the cellular phenotype induced by compounds and can correctly predict mechanism of action, as long as the appropriate analytical tools are used.
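The exemplar-based classification can be sketched as a nearest-neighbour rule over dose-response trajectories, with an "other" outcome for compounds that resemble no exemplar. This sketch assumes each trajectory is already projected into the discriminant (LDA) space and sampled at the same doses, and it uses mean Euclidean distance with a fixed cut-off; the distance measure, cut-off, class names, and coordinates are hypothetical, not the authors' exact procedure.

```python
import numpy as np

def trajectory_distance(a, b):
    """Mean Euclidean distance between two trajectories sampled at the same doses (doses x dims)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b, axis=1).mean())

def classify(query, exemplars, max_distance):
    """Nearest exemplar class, or 'other' if even the nearest is farther than max_distance."""
    best_label, best_dist = "other", float("inf")
    for label, trajectory in exemplars.items():
        d = trajectory_distance(query, trajectory)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_distance else "other"

# Hypothetical exemplar tracks in a 2-D discriminant space, three doses each.
exemplars = {
    "tubulin_binder": [[0, 0], [1, 0], [2, 0]],
    "dna_damage": [[0, 0], [0, 1], [0, 2]],
}
near = [[0, 0], [0.9, 0.1], [2.1, 0]]
far = [[5, 5], [6, 6], [7, 7]]
```

The "other" fallback mirrors the abstract's option of assigning a compound to a different, unspecified class rather than forcing it into one of the exemplar classes.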

7.
Assigning biological functions to uncharacterized proteins is a fundamental problem in the postgenomic era. The increasing availability of large amounts of protein-protein interaction (PPI) data has led to a considerable number of computational methods for determining protein function in the context of a network. These algorithms, however, treat each functional class in isolation and therefore often suffer from the scarcity of labeled data. In reality, different functional classes naturally depend on one another. We propose a new algorithm, Multi-label Correlated Semi-supervised Learning (MCSL), that incorporates the intrinsic correlations among functional classes into protein function prediction by leveraging the relationships provided by the PPI network and the functional class network. The guiding intuition is that the classification function should be sufficiently smooth on subgraphs where the respective topologies of these two networks match well. We encode this intuition as regularized learning with intraclass and interclass consistency, which can be understood as an extension of the graph-based learning with local and global consistency (LGC) method. Cross-validation on the yeast proteome shows that MCSL consistently outperforms several state-of-the-art methods. Most notably, it effectively overcomes the problem of scarce labeled data. The supplementary files are freely available at http://sites.google.com/site/csaijiang/MCSL.
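Since MCSL is described as an extension of LGC, the base method is worth sketching. The code below implements the standard closed-form LGC solution F = (I - αS)^{-1} Y with S = D^{-1/2} W D^{-1/2} on a single network; it does not include the functional class network or the interclass consistency term that MCSL adds. It assumes no isolated nodes, and the toy network, labels, and α = 0.9 are hypothetical.

```python
import numpy as np

def lgc(W, Y, alpha=0.9):
    """Learning with local and global consistency: F = (I - alpha*S)^-1 Y.

    W: symmetric non-negative affinity matrix (n x n) with zero diagonal.
    Y: n x c matrix with 1 where node i is labeled with class j, else 0.
    """
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))  # assumes every node has an edge
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * S, np.asarray(Y, dtype=float))

# Hypothetical PPI network: triangles 0-1-2 and 3-4-5 joined by a weak edge 2-3.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w
Y = np.zeros((6, 2))
Y[0, 0] = 1.0  # protein 0 annotated with function 0
Y[5, 1] = 1.0  # protein 5 annotated with function 1
F = lgc(W, Y)
pred = F.argmax(axis=1)  # predicted function per protein
```

Labels spread within each densely connected module but barely across the weak edge, which is the smoothness assumption that MCSL extends to correlated functional classes.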

8.
Insect ryanodine receptors: molecular targets for novel pest control chemicals (cited 15 times: 0 self-citations, 15 by others)
Ryanodine receptors (RyRs) are a distinct class of ligand-gated calcium channels controlling the release of calcium from intracellular stores. They are located on the sarcoplasmic reticulum of muscle and the endoplasmic reticulum of neurons and many other cell types. Ryanodine, a plant alkaloid and an important ligand used to characterize and purify the receptor, has served as a natural botanical insecticide, but attempts to generate synthetic commercial analogues of ryanodine have proved unsuccessful. Recently, two classes of synthetic chemicals have emerged, resulting in commercial insecticides that target insect RyRs. The phthalic acid diamide class has yielded flubendiamide, the first synthetic ryanodine receptor insecticide to be commercialized. Shortly after the discovery of the phthalic diamides, the anthranilic diamides were discovered; this class has produced the insecticides Rynaxypyr® and Cyazypyr™. Here we review the structure and functions of insect RyRs and address the modes of action of phthalic acid diamides and anthranilic diamides on insect ryanodine receptors. Particularly interesting is the inherent selectivity that both chemical classes exhibit for insect RyRs over their mammalian counterparts. The future prospects for RyRs as a commercially validated target site for insect control chemicals are also considered.

9.
The identification of specific interactions between small molecules and human proteins of interest is a fundamental step in chemical biology and drug development. Here we describe an efficient chemical array approach for obtaining novel binding ligands of human proteins. Our method includes large-scale ligand screening with two libraries, proteins and chemicals; the use of cell lysates expressing proteins of interest fused with red fluorescent protein; and high-throughput screening by merged display analysis, which removes false positive signals from array experiments. Using this systematic platform, we detected novel inhibitors of carbonic anhydrase II. These results suggest that the platform is a rapid and robust approach for screening novel ligands of human proteins of interest.

10.
The new dermal acute toxic class (ATC) method is presented for one specific classification system for chemicals according to acute dermal toxicity, namely the system issued by the European Union. It is a stepwise procedure using three animals of one sex per step, with three possible starting doses. Assuming a probit model for the dose-response relationship, the probabilities of a correct, a less stringent, and a more stringent classification are calculated. These probabilities are shown to depend only weakly on the starting dose. The expected numbers of animals used and of animals that die are also derived as functions of the LD50 and the dose-response slope β, and the starting doses that minimize the expected number of animals are proposed. The results demonstrate that the dermal ATC method is a reliable alternative to the classical LD50 test while using significantly fewer animals.
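The classification probabilities rest on a probit dose-response model. A minimal sketch, assuming the probit is taken in log10 dose with slope β, so that P(death) = Φ(β(log10 d - log10 LD50)), and that deaths among the three animals of one step are binomially distributed. The EU class limits, the starting doses, and the stepwise decision rules are not reproduced, and the numbers below are hypothetical.

```python
import math

def death_probability(dose, ld50, slope):
    """Probit model: P(death) = Phi(slope * (log10(dose) - log10(ld50)))."""
    z = slope * (math.log10(dose) - math.log10(ld50))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

def outcome_distribution(dose, ld50, slope, n_animals=3):
    """Probabilities of 0, 1, ..., n_animals deaths among the animals dosed in one step."""
    p = death_probability(dose, ld50, slope)
    return [math.comb(n_animals, k) * p**k * (1 - p) ** (n_animals - k)
            for k in range(n_animals + 1)]

# Hypothetical substance: LD50 = 200 mg/kg, slope 2, tested at 200 mg/kg.
print(outcome_distribution(200, 200, 2.0))  # [0.125, 0.375, 0.375, 0.125]
```

Chaining such step outcomes through the procedure's decision rules yields the probabilities of correct, less stringent, and more stringent classification, as well as the expected animal numbers.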

11.
MOTIVATION: As more genomes are sequenced, the demand for fast gene classification techniques is increasing. To analyze a newly sequenced genome, the genes are first identified and translated into amino acid sequences, which are then classified into structural or functional classes. The best-performing protein classification methods are based on protein homology detection using sequence alignment methods. Alignment methods have recently been enhanced by discriminative methods such as support vector machines (SVMs) and by position-specific scoring matrices (PSSMs) obtained from PSI-BLAST. However, alignment methods are time-consuming if a new sequence must be compared with many known sequences, and the same holds for SVMs. Constructing a PSSM for the new sequence is even more time-consuming. The best-performing methods would take about 25 days on present-day computers to classify the sequences of a new genome (20,000 genes) as belonging to just one specific class, yet there are hundreds of classes. Another shortcoming of alignment algorithms is that they do not build a model of the positive class but measure the mutual distance between sequences or profiles. Among popular classification methods, only multiple alignments and hidden Markov models build a model of the positive class, but they show low classification performance. The advantage of a model is that it can be analyzed for chemical properties common to the class members to obtain new insights into protein function and structure. We propose a fast model-based recurrent neural network for protein homology detection, the 'Long Short-Term Memory' (LSTM). LSTM automatically extracts indicative patterns for the positive class, but, in contrast to profile methods, it also extracts negative patterns and uses correlations between all detected patterns for classification.
LSTM can automatically extract useful local and global sequence statistics, such as hydrophobicity, polarity, volume, and polarizability, and combine them with a pattern. These properties make LSTM complementary to alignment-based approaches, as it does not use predefined similarity measures such as the BLOSUM or PAM matrices. RESULTS: We applied LSTM to a well-known benchmark for remote protein homology detection, in which a protein must be classified as belonging to a SCOP superfamily. LSTM reaches state-of-the-art classification performance but classifies considerably faster than other approaches with comparable performance. LSTM is five orders of magnitude faster than methods that perform slightly better in classification and two orders of magnitude faster than the fastest SVM-based approaches (which, however, have lower classification performance than LSTM). Only PSI-BLAST and HMM-based methods have time complexity comparable to LSTM, but they cannot compete with LSTM in classification performance. To test the modeling capabilities of LSTM, we applied it to PROSITE classes and interpreted the extracted patterns. In 8 out of 15 classes, LSTM automatically extracted the PROSITE motif. In the remaining 7 cases, it generated alternative motifs that on average give better classification results than the PROSITE motifs. AVAILABILITY: The LSTM algorithm is available from http://www.bioinf.jku.at/software/LSTM_protein/.
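The recurrence that lets an LSTM accumulate evidence along a sequence can be sketched as a forward pass of a standard LSTM cell (with forget gate) over a one-hot-encoded protein sequence. This is a generic illustration, not the authors' architecture: the hidden size, the random untrained weights, and the sigmoid read-out are assumptions, and training is omitted, so the score carries no meaning until weights are learned.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Encode a protein sequence as a (length x 20) one-hot matrix."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AMINO_ACIDS.index(aa)] = 1.0
    return x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(hidden=8, seed=0):
    """Random, untrained weights (hypothetical sizes) for one LSTM layer plus a read-out."""
    rng = np.random.default_rng(seed)
    n_in = len(AMINO_ACIDS)
    W = rng.normal(0, 0.1, (4 * hidden, n_in))    # input weights for the 4 gates
    U = rng.normal(0, 0.1, (4 * hidden, hidden))  # recurrent weights
    b = np.zeros(4 * hidden)
    w_out = rng.normal(0, 0.1, hidden)
    return W, U, b, w_out, 0.0

def lstm_score(seq, params):
    """Run the LSTM over the sequence and map the final hidden state to a score in (0, 1)."""
    W, U, b, w_out, b_out = params
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in one_hot(seq):
        z = W @ x + U @ h + b
        i, f, o, g = np.split(z, 4)  # input, forget, output gates; candidate values
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g            # memory cell carries long-range evidence
        h = o * np.tanh(c)           # hidden state exposed to the next step
    return float(sigmoid(w_out @ h + b_out))

score = lstm_score("MKTAYIAKQR", init_params())
```

Classification cost is one pass over the sequence with fixed-size matrix products, which is consistent with the speed advantage over pairwise alignment described above.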

12.
The ITS2 gene class shows high sequence divergence among its members, which has complicated its annotation and its use for reconstructing phylogenies at higher taxonomic levels (beyond species and genus). Several alignment strategies have been implemented to improve ITS2 annotation quality and its use for phylogenetic inference. Although alignment-based methods have been pushed to the limits of their complexity to tackle both issues, no alignment-free approach has yet successfully addressed both topics. By contrast, simple alignment-free classifiers, such as topological indices (TIs) that encode information about the sequence and structure of ITS2, may prove useful both for gene prediction and for assessing the phylogenetic relationships of the ITS2 class in eukaryotes. We therefore used the TI2BioP (Topological Indices to BioPolymers) methodology [1], [2], freely available at http://ti2biop.sourceforge.net/, to calculate two different classes of TIs: one derived from artificial 2D structures of ITS2 generated from DNA strings, and the other from the secondary structure inferred by RNA folding algorithms. Two alignment-free models based on artificial neural networks were developed for ITS2 class prediction using these two classes of TIs. Both models performed similarly on the training and test sets, reaching overall classification accuracies above 95%. Because the ITS2 region is important for fungal identification, a novel ITS2 genomic sequence was isolated from Petrakia sp. This sequence and the test set were used to compare our models with conventional classification models based on multiple sequence alignments, such as Hidden Markov Model-based approaches, revealing that our models successfully identify novel ITS2 members. The isolated sequence was also assessed using traditional and alignment-free phylogenetic inference techniques to complement the taxonomy of the Petrakia sp. fungal isolate.

13.
This review, a sequel to the 1998 review, classifies 63 peer-reviewed articles on the basis of the reported preclinical pharmacological properties of marine chemicals derived from a diverse group of marine animals, algae, fungi, and bacteria. In all, 21 marine chemicals demonstrated anthelmintic, antibacterial, anticoagulant, antifungal, antimalarial, antiplatelet, antituberculosis, or antiviral activities. An additional 23 compounds had significant effects on the cardiovascular, sympathomimetic, or nervous system, or possessed anti-inflammatory, immunosuppressant, or fibrinolytic effects. Finally, 22 marine compounds were reported to act on a variety of molecular targets and thus could potentially contribute to several pharmacological classes. Thus, during 1999, pharmacological research with marine chemicals continued to contribute potentially novel chemical leads to the ongoing global search for therapeutic agents for the treatment of multiple disease categories.

14.
15.
IRBM, 2020, 41(4): 229-239
Feature selection algorithms are a cornerstone of machine learning. As the numbers of samples and of their properties (features) grow, feature selection algorithms pick out the significant features; methods that perform this function are collectively called feature selection algorithms. Their general purpose is to select the properties most relevant to the data classes and to increase classification performance, so features can be selected on the basis of their classification performance. In this study, we developed a feature selection algorithm, P-Score, based on support vector classification performance; the method can work with two different selection criteria. We tested the classification performance of the features selected with P-Score using three different classifiers, and we also compared P-Score with 13 feature selection algorithms from the literature. The results show that the P-Score feature selection algorithm can be used in the field of machine learning.
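The abstract does not give the P-Score criteria, so the sketch below only illustrates the general idea of ranking features by their classification performance: each feature is scored by the best training accuracy of a one-feature threshold classifier (a decision stump). This is a generic wrapper-style example, not P-Score itself; in practice accuracy should be estimated by cross-validation, and the data are hypothetical.

```python
import numpy as np

def stump_accuracy(x, y):
    """Best training accuracy of a one-feature threshold classifier for labels y in {0, 1}."""
    best = 0.0
    for t in np.unique(x):
        pred = (x >= t).astype(int)
        # Allow either orientation of the threshold rule.
        acc = max(np.mean(pred == y), np.mean((1 - pred) == y))
        best = max(best, acc)
    return best

def rank_features(X, y):
    """Return feature indices sorted by stump accuracy (best first) and the scores."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
    scores = np.array([stump_accuracy(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(-scores), scores

# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [[0, 5], [1, 3], [2, 4], [10, 5], [11, 3], [12, 4]]
y = [0, 0, 0, 1, 1, 1]
order, scores = rank_features(X, y)
```

A real selector would typically also account for redundancy between features, which single-feature scores ignore.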

16.
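A molecular connectivity index method for predicting the median lethal concentration of organic compounds to rainbow trout (cited 3 times: 0 self-citations, 3 by others)
Cao HY, Wang X, Tao S. Ecological Science (生态科学), 2003, 22(1): 9-12
Using measured median lethal concentration data for 212 compounds in 7 classes to rainbow trout, we studied the quantitative relationship between molecular connectivity indices and the median lethal concentration of organic compounds (log 1/LC50). The results show that, with the data currently available, it is difficult to build a single unified molecular connectivity index model, whereas independent models built for each class describe this quantitative relationship well. The adjusted coefficients of determination of the models range from 0.62 to 0.92, with a mean residual of 0.283 log units. Compounds with residuals exceeding 0.5 log units make up less than 12% of all compounds used in modeling. Among the 7 classes studied, the prediction model for organophosphorus compounds has the largest error.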
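Molecular connectivity indices are computed from the hydrogen-suppressed molecular graph. As an illustration, the first-order Randić index sums (δ_i δ_j)^(-1/2) over all bonds, where δ is the number of heavy-atom neighbours of an atom. The abstract does not say which orders or valence variants were used, so this sketch shows only the simple first-order index; per-class models would then regress log(1/LC50) on such indices.

```python
import math

def randic_chi1(edges):
    """First-order connectivity index: sum over bonds of (d_i * d_j) ** -0.5.

    edges: (atom_i, atom_j) bonds of the hydrogen-suppressed graph.
    """
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum((degree[i] * degree[j]) ** -0.5 for i, j in edges)

butane = [(0, 1), (1, 2), (2, 3)]     # C-C-C-C
isobutane = [(0, 1), (0, 2), (0, 3)]  # central carbon with three methyls
print(round(randic_chi1(butane), 3), round(randic_chi1(isobutane), 3))  # 1.914 1.732
```

Branching lowers the index (isobutane versus butane), which is one reason the index tracks size and shape-dependent properties such as toxicity within a homologous class.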

17.
We present an approach to predicting protein structural class that uses amino acid composition and hydrophobic pattern frequency information as input to two types of neural networks: (1) a three-layer back-propagation network and (2) a learning vector quantization network. The results of these methods are compared to those obtained from a modified Euclidean statistical clustering algorithm. The protein sequence data used to drive these algorithms consist of the normalized frequency of up to 20 amino acid types and six hydrophobic amino acid patterns. From these frequency values the structural class predictions for each protein (all-alpha, all-beta, or alpha-beta classes) are derived. Examples consisting of 64 previously classified proteins were randomly divided into multiple training (56 proteins) and test (8 proteins) sets. The best performing algorithm on the test sets was the learning vector quantization network using 17 inputs, obtaining a prediction accuracy of 80.2%. The Matthews correlation coefficients are statistically significant for all algorithms and all structural classes. The differences between algorithms are in general not statistically significant. These results show that information exists in protein primary sequences that is easily obtainable and useful for the prediction of protein structural class by neural networks as well as by standard statistical clustering algorithms.
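The per-class Matthews correlation coefficient used to evaluate the three structural classes can be computed one-vs-rest from true and predicted labels. The sketch returns 0 when the denominator is zero, a common convention that the abstract does not specify; the labels are hypothetical.

```python
import math

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def per_class_mcc(y_true, y_pred, classes=("all-alpha", "all-beta", "alpha-beta")):
    """One-vs-rest MCC for each structural class."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        scores[c] = matthews_cc(tp, tn, fp, fn)
    return scores

# Hypothetical labels for four test proteins.
y_true = ["all-alpha", "all-beta", "alpha-beta", "all-alpha"]
y_pred = ["all-alpha", "all-alpha", "alpha-beta", "all-alpha"]
```

Unlike plain accuracy, the MCC stays informative when the structural classes are unevenly represented in a small test set.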

18.
Membrane proteins are gatekeepers of the cell and essential to cellular function. Identifying the types of membrane proteins is an essential problem in cell biology, but doing so with traditional experimental methods is time-consuming and expensive. The alternative is to design effective computational methods that provide quick and reliable predictions. Several computational methods have been proposed to date, some of which use features extracted from the sequence information of individual proteins. Recently, networks have become increasingly popular for tackling protein-related problems, as they organize proteins at the system level and give an overview of all proteins. However, this representation weakens essential properties of proteins, such as their sequence information. In this study, a novel feature fusion scheme was proposed that integrates the information of protein sequences and the protein-protein interaction network. The fused features of a protein were defined as a linear combination of the sequence features of all proteins in the network, where the combination coefficients were the probabilities yielded by the random walk with restart algorithm with that protein as the seed node. Several models combining these fused features with different classification algorithms were built and evaluated. Their performance in predicting membrane protein types improved compared with models using only sequence features or only network information.
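The fusion scheme can be sketched directly from its description: run random walk with restart (RWR) from each protein, then use the resulting probability vector as the weights of a linear combination of all proteins' sequence features. The restart probability of 0.3, the column-normalized transition matrix, and the toy network and features are assumptions; the network must contain no isolated nodes.

```python
import numpy as np

def rwr(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart from `seed`; returns the steady-state visiting probabilities."""
    A = np.asarray(adjacency, dtype=float)
    W = A / A.sum(axis=0, keepdims=True)  # column-normalized transitions
    e = np.zeros(A.shape[0])
    e[seed] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def fused_features(adjacency, seq_features, restart=0.3):
    """Row i is sum_j p_i[j] * seq_features[j], where p_i is the RWR vector seeded at protein i."""
    X = np.asarray(seq_features, dtype=float)
    P = np.vstack([rwr(adjacency, i, restart) for i in range(X.shape[0])])
    return P @ X

# Hypothetical 4-protein PPI path 0-1-2-3 with 2-dimensional sequence features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
X = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0],
              [0.2, 0.8]])
fused = fused_features(A, X)
```

Because each RWR vector sums to 1, every fused feature vector is a weighted average of sequence features, dominated by the seed protein and its close network neighbours.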

19.
20.