Similar literature
1.
Chopra P  Lee J  Kang J  Lee S 《PloS one》2010,5(12):e14305
Recent studies suggest that the deregulation of pathways, rather than individual genes, may be critical in triggering carcinogenesis. The pathway deregulation is often caused by the simultaneous deregulation of more than one gene in the pathway. This suggests that robust gene pair combinations may exploit the underlying bio-molecular reactions that are relevant to the pathway deregulation and thus they could provide better biomarkers for cancer, as compared to individual genes. In order to validate this hypothesis, in this paper, we used gene pair combinations, called doublets, as input to the cancer classification algorithms, instead of the original expression values, and we showed that the classification accuracy was consistently improved across different datasets and classification algorithms. We validated the proposed approach using nine cancer datasets and five classification algorithms including Prediction Analysis for Microarrays (PAM), C4.5 Decision Trees (DT), Naive Bayesian (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN).
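The abstract does not say how two genes are combined into a doublet, so the following is only a minimal sketch under an assumption: each pair of genes is replaced by the sum or the difference of their expression values. The function name `make_doublets` and the `op` parameter are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def make_doublets(samples, op="sum"):
    """Build one derived feature per gene pair for every sample.

    samples: list of rows, one row of expression values per sample.
    op: "sum" or "diff" (assumed combination rules, not the paper's).
    Returns (features, pairs) where pairs[k] is the gene index pair
    that produced column k of features.
    """
    ops = {
        "sum": lambda a, b: a + b,
        "diff": lambda a, b: a - b,
    }
    if op not in ops:
        raise ValueError(f"unknown op: {op!r}")
    n_genes = len(samples[0]) if samples else 0
    pairs = list(combinations(range(n_genes), 2))
    combine = ops[op]
    features = [[combine(row[i], row[j]) for i, j in pairs] for row in samples]
    return features, pairs


# Two samples, three genes: three doublets per sample.
expression = [[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]]
features, pairs = make_doublets(expression, op="sum")
```

The number of doublets grows as n(n-1)/2 for n genes, so in practice some gene filtering step would be needed before forming pairs; the abstract does not describe one.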

2.
Residue contact map is essential for protein three‐dimensional structure determination. But most of the current contact prediction methods based on residue co‐evolution suffer from high false‐positives as introduced by indirect and transitive contacts (i.e., residues A–B and B–C are in contact, but A–C are not). Built on the work by Feizi et al. (Nat Biotechnol 2013; 31:726–733), which demonstrated a general network model to distinguish direct dependencies by network deconvolution, this study presents a new balanced network deconvolution (BND) algorithm to identify optimized dependency matrix without limit on the eigenvalue range in the applied network systems. The algorithm was used to filter contact predictions of five widely used co‐evolution methods. On the test of proteins from three benchmark datasets of the 9th critical assessment of protein structure prediction (CASP9), CASP10, and PSICOV (precise structural contact prediction using sparse inverse covariance estimation) database experiments, the BND can improve the medium‐ and long‐range contact predictions at the L/5 cutoff by 55.59% and 47.68%, respectively, without additional central processing unit cost. The improvement is statistically significant, with a P‐value < 5.93 × 10⁻³ in the Student's t‐test. A further comparison with the ab initio structure predictions in CASPs showed that the usefulness of the current co‐evolution‐based contact prediction to the three‐dimensional structure modeling relies on the number of homologous sequences existing in the sequence databases. BND can be used as a general contact refinement method, which is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/BND/ . Proteins 2015; 83:485–496. © 2014 Wiley Periodicals, Inc.
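To make the idea of removing transitive contacts concrete, here is a small sketch of the closed form used by the original network deconvolution of Feizi et al., on which BND builds. It is not the BND algorithm itself, whose balancing step the abstract does not describe. It assumes the observed matrix is the sum of all path powers of the direct matrix, G_obs = G_dir + G_dir² + ..., which gives G_dir = G_obs (I + G_obs)⁻¹. The toy matrix values are invented for illustration.

```python
import numpy as np


def network_deconvolution(g_obs):
    """Recover direct dependencies from an observed dependency matrix.

    Assumes g_obs = g_dir + g_dir^2 + g_dir^3 + ... (a convergent series),
    so that g_dir = g_obs @ inv(I + g_obs).
    """
    g_obs = np.asarray(g_obs, dtype=float)
    identity = np.eye(g_obs.shape[0])
    return g_obs @ np.linalg.inv(identity + g_obs)


# Toy example: A-B and B-C are direct contacts, A-C is not.
g_dir = np.array([[0.0, 0.3, 0.0],
                  [0.3, 0.0, 0.2],
                  [0.0, 0.2, 0.0]])

# The observed matrix picks up a spurious A-C signal through B.
g_obs = g_dir @ np.linalg.inv(np.eye(3) - g_dir)

recovered = network_deconvolution(g_obs)
```

In this toy case the A–C entry of `g_obs` is nonzero even though A and C are not in direct contact, and the deconvolved matrix removes it. The series only converges when the eigenvalues of the direct matrix are smaller than 1 in magnitude, which is the eigenvalue-range restriction that the abstract says BND avoids.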

3.

Background  

Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic for a particular disease. To date, this problem has received most attention in the context of cancer research, especially in tumor classification. Various feature selection methods and classifier design strategies also have been generally used and compared. However, most published articles on tumor classification have applied a certain technique to a certain dataset, and recently several researchers compared these techniques based on several public datasets. But, it has been verified that differently selected features reflect different aspects of the dataset and some selected features can obtain better solutions on some certain problems. At the same time, faced with a large amount of microarray data with little knowledge, it is difficult to find the intrinsic characteristics using traditional methods. In this paper, we attempt to introduce a combinational feature selection method in conjunction with ensemble neural networks to generally improve the accuracy and robustness of sample classification.

4.
The application of DNA microarray technology for analysis of gene expression creates enormous opportunities to accelerate the pace in understanding living systems and identification of target genes and pathways for drug development and therapeutic intervention. Parallel monitoring of the expression profiles of thousands of genes seems particularly promising for a deeper understanding of cancer biology and the identification of molecular signatures supporting the histological classification schemes of neoplastic specimens. However, the increasing volume of data generated by microarray experiments poses the challenge of developing equally efficient methods and analysis procedures to extract, interpret, and upgrade the information content of these databases. Herein, a computational procedure for pattern identification, feature extraction, and classification of gene expression data through the analysis of an autoassociative neural network model is described. The identified patterns and features contain critical information about gene-phenotype relationships observed during changes in cell physiology. They represent a rational and dimensionally reduced base for understanding the basic biology of the onset of diseases, defining targets of therapeutic intervention, and developing diagnostic tools for the identification and classification of pathological states. The proposed method has been tested on two different microarray datasets: Golub's analysis of acute human leukemia [Golub et al. (1999) Science 286:531-537], and the human colon adenocarcinoma study presented by Alon et al. [1999; Proc Natl Acad Sci USA 97:10101-10106]. The analysis of the neural network internal structure allows the identification of specific phenotype markers and the extraction of peculiar associations among genes and physiological states. At the same time, the neural network outputs provide assignment to multiple classes, such as different pathological conditions or tissue samples, for previously unseen instances.

5.
Image classification is a challenging problem in organizing a large image database. However, an effective method for such an objective is still under investigation. A method based on wavelet analysis to extract features for image classification is presented in this paper. After an image is decomposed by wavelet, the statistics of its features can be obtained by the distribution of histograms of wavelet coefficients, which are respectively projected onto two orthogonal axes, i.e., x and y directions. Therefore, the nodes of tree representation of images can be represented by the distribution. The high level features are described in low dimensional space including 16 attributes so that the computational complexity is significantly decreased. 2,800 images derived from seven categories are used in experiments. Half of the images were used for training neural network and the other images used for testing. The features extracted by wavelet analysis and the conventional features are used in the experiments to prove the efficacy of the proposed method. The classification rate on the training data set with wavelet analysis is up to 91%, and the classification rate on the testing data set reaches 89%. Experimental results show that our proposed approach for image classification is more effective.

6.
One of the major research directions in bioinformatics is that of predicting the protein superfamily in large databases and classifying a given set of protein domains into superfamilies. The classification reflects the structural, evolutionary and functional relatedness. These relationships are embodied in hierarchical classification such as Structural Classification of Protein (SCOP), which is manually curated. Such classification is essential for the structural and functional analysis of proteins. Yet, a large number of proteins remain unclassified. We have proposed an unsupervised machine-learning FuzzyART neural network algorithm to classify a given set of proteins into SCOP superfamilies. The proposed method is fast learning and uses an atypical non-linear pattern recognition technique. In this approach, we have constructed a similarity matrix from p-values of BLAST all-against-all, trained the network with FuzzyART unsupervised learning algorithm using the similarity matrix as input vectors and finally the trained network offers SCOP superfamily level classification. In this experiment, we have evaluated the performance of our method with existing techniques on six different datasets. We have shown that the trained network is able to classify a given similarity matrix of a set of sequences into SCOP superfamilies at high classification accuracy.
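As a rough illustration of how rows of a similarity matrix could be clustered with Fuzzy ART, the sketch below implements the standard textbook formulation: complement coding, the choice function |I ∧ w| / (α + |w|), a vigilance test against ρ, and the learning rule w ← β(I ∧ w) + (1 − β)w. The parameter values and the toy input rows are assumptions; the abstract does not give the authors' settings or how the BLAST p-values were scaled.

```python
def fuzzy_art(rows, rho=0.75, alpha=0.001, beta=1.0):
    """Cluster rows with values in [0, 1] using a basic Fuzzy ART pass.

    rho: vigilance, alpha: choice parameter, beta: learning rate
    (all assumed defaults). Returns (labels, weights).
    """
    weights = []  # one weight vector per discovered category
    labels = []
    for row in rows:
        # Complement coding: append 1 - x for every feature x.
        coded = list(row) + [1.0 - v for v in row]
        norm = sum(coded)  # equals len(row) under complement coding

        def overlap(w):
            # Size of the fuzzy AND (element-wise minimum) of input and w.
            return sum(min(a, b) for a, b in zip(coded, w))

        # Visit existing categories from highest to lowest choice value.
        order = sorted(
            range(len(weights)),
            key=lambda j: overlap(weights[j]) / (alpha + sum(weights[j])),
            reverse=True,
        )
        for j in order:
            if overlap(weights[j]) / norm >= rho:  # vigilance test
                weights[j] = [
                    beta * min(a, b) + (1.0 - beta) * b
                    for a, b in zip(coded, weights[j])
                ]
                labels.append(j)
                break
        else:
            # No category passed the vigilance test: create a new one.
            weights.append(coded)
            labels.append(len(weights) - 1)
    return labels, weights


# Toy similarity rows: two sequences resemble each other, two others do too.
rows = [[0.10, 0.10], [0.12, 0.10], [0.90, 0.90], [0.88, 0.92]]
labels, weights = fuzzy_art(rows)
```

Raising `rho` makes the network split the data into more, tighter categories; lowering it merges them. How the resulting categories map onto SCOP superfamilies is a separate evaluation step that this sketch does not cover.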

7.
Guo JT  Xu D  Kim D  Xu Y 《Nucleic acids research》2003,31(3):944-952
Structural domains are considered as the basic units of protein folding, evolution, function and design. Automatic decomposition of protein structures into structural domains, though after many years of investigation, remains a challenging and unsolved problem. Manual inspection still plays a key role in domain decomposition of a protein structure. We have previously developed a computer program, DomainParser, using network flow algorithms. The algorithm partitions a protein structure into domains accurately when the number of domains to be partitioned is known. However the performance drops when this number is unclear (the overall performance is 74.5% over a set of 1317 protein chains). Through utilization of various types of structural information including hydrophobic moment profile, we have developed an effective method for assessing the most probable number of domains a structure may have. The core of this method is a neural network, which is trained to discriminate correctly partitioned domains from incorrectly partitioned domains. When compared with the manual decomposition results given in the SCOP database, our new algorithm achieves higher decomposition accuracy (81.9%) on the same data set.

8.
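Leaf area index retrieval using neural networks based on classification knowledge
Chen Yanhua (陈艳华)  Zhang Wanchang (张万昌)  Yong Bin (雍斌) 《生态学报》(Acta Ecologica Sinica) 2007,27(7):2785-2793
Leaf area index (LAI) is a very important input parameter for land surface processes, and methods for retrieving it from remote sensing data have long been an active topic in remote sensing application research in China and abroad. Statistically based retrieval methods lack a physical basis, so their reliability and generality are poor. LAI retrieval based on physical canopy reflectance models overcomes these drawbacks, but because the inversion is ill-posed, the model inversion result is generally not unique. Introducing neural network algorithms can alleviate this problem to some extent, but the ill-posedness of model inversion still has not been satisfactorily resolved. Building on a sensitivity analysis of the PROSAIL model, a neural network retrieval method based on image classification is proposed. A soil reflectance index is introduced to replace the soil background reflectance parameter of the original model, which is difficult to determine, and a separate neural network is built for each vegetation type. A simulation experiment was carried out on atmospherically corrected Landsat ETM+ imagery, and the results were compared with field-measured LAI data. The results show that the retrieval accuracy of the method is fairly reliable for vegetated areas with LAI below 3, whereas for vegetated areas with LAI above 3 the retrieved LAI is too low. This is attributed to the canopy reflectance of dense vegetation tending to saturate once LAI exceeds 3, so that it can no longer sensitively reflect changes in LAI.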

9.
A neural network architecture for data classification
This article aims at showing an architecture of neural networks designed for the classification of data distributed among a high number of classes. A significant gain in the global classification rate can be obtained by using our architecture. This latter is based on a set of several little neural networks, each one discriminating only two classes. The specialization of each neural network simplifies their structure and improves the classification. Moreover, the learning step automatically determines the number of hidden neurons. The discussion is illustrated by tests on databases from the UCI machine learning database repository. The experimental results show that this architecture can achieve a faster learning, simpler neural networks and an improved performance in classification.
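The architecture described above, one small network per pair of classes, is commonly called a one-vs-one decomposition. The sketch below shows only the bookkeeping around it: enumerating the class pairs and combining the pairwise decisions. Majority voting with ties broken by the lowest class index is an assumption; the abstract does not say how the authors combine the small networks' outputs, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def class_pairs(n_classes):
    """List every unordered pair of classes; one small network per pair."""
    return list(combinations(range(n_classes), 2))


def majority_vote(pair_decisions):
    """Combine pairwise winners into a single predicted class.

    pair_decisions maps a class pair (a, b) to the class that the
    corresponding two-class network chose. Ties go to the lowest index.
    """
    counts = Counter(pair_decisions.values())
    best = max(counts.values())
    return min(cls for cls, votes in counts.items() if votes == best)


# Three classes need three pairwise networks.
pairs = class_pairs(3)
decisions = {(0, 1): 1, (0, 2): 2, (1, 2): 1}
predicted = majority_vote(decisions)
```

With K classes this scheme needs K(K-1)/2 networks, so the per-network simplicity mentioned in the abstract is traded against a count that grows quadratically with the number of classes.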

10.
Purpose

The classification of urinary stones is important prior to treatment because the treatments depend on three types of urinary stones, i.e., calcium, uric acid, and mixture stones. We have developed an automatic approach for the classification of urinary stones into the three types based on microcomputed tomography (micro-CT) images using a convolutional neural network (CNN).

Materials and methods

Thirty urinary stones from different patients were scanned in vitro using micro-CT (pixel size: 14.96 μm; slice thickness: 15 μm); a total of 2,430 images (micro-CT slices) were produced. The slices (227 × 227 pixels) were classified into the three categories based on their energy dispersive X-ray (EDX) spectra obtained via scanning electron microscopy (SEM). The images of urinary stones from each category were divided into three parts; 66%, 17%, and 17% of the dataset were assigned to the training, validation, and test datasets, respectively. The CNN model with 15 layers was assessed based on validation accuracy for the optimization of hyperparameters such as batch size, learning rate, and number of epochs with different optimizers. Then, the model with the optimized hyperparameters was evaluated for the test dataset to obtain classification accuracy and error.

Results

The validation accuracy of the developed approach with CNN with optimized hyperparameters was 0.9852. The trained CNN model achieved a test accuracy of 0.9959 with a classification error of 1.2%.

Conclusions

The proposed automated CNN-based approach could successfully classify urinary stones into three types, namely calcium, uric acid, and mixture stones, using micro-CT images.
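The per-category 66% / 17% / 17% division described in the methods can be sketched as a stratified split. The code below is an illustration only: the function names, the fixed seed, and the rounding rule (training and validation sizes are rounded, the test set takes the remainder) are assumptions, and the 100-images-per-class example does not reflect the study's actual class sizes.

```python
import random


def split_indices(n_items, fractions=(0.66, 0.17, 0.17), seed=0):
    """Shuffle positions 0..n_items-1 and cut them into three parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_items)
    n_val = round(fractions[1] * n_items)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def stratified_split(labels, seed=0):
    """Apply the same split inside each category, then pool the parts."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, val, test = [], [], []
    for offset, (_, members) in enumerate(sorted(by_class.items())):
        tr, va, te = split_indices(len(members), seed=seed + offset)
        train += [members[i] for i in tr]
        val += [members[i] for i in va]
        test += [members[i] for i in te]
    return train, val, test


# Hypothetical labels: 100 slices for each of the three stone types.
labels = ["calcium"] * 100 + ["uric acid"] * 100 + ["mixture"] * 100
train, val, test = stratified_split(labels)
```

One caveat worth keeping in mind: slices from the same stone are highly correlated, so splitting at the slice level, as this sketch does, can place near-identical images in both training and test sets. The abstract does not say whether the split was made per slice or per stone.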

11.
Numerous studies have contributed to efforts to boost the accuracy of the credit scoring model. Especially interesting are recent studies which have successfully developed the hybrid approach, which advances classification accuracy by combining different machine learning techniques. However, to achieve better credit decisions, it is not enough merely to increase the accuracy of the credit scoring model. It is necessary to conduct meaningful supplementary analyses in order to obtain knowledge of causal relations, particularly in terms of significant conceptual patterns or structures involving attributes used in the credit scoring model. This paper proposes a solution of integrating data preprocessing strategies and the Bayesian network classifier with the tree augmented Naïve Bayes search algorithm, in order to improve classification accuracy and to obtain improved knowledge of causal patterns, thus enhancing the validity of credit decisions.

12.
Understanding the interactions between the Earth's microbiome and the physical, chemical and biological environment is a fundamental goal of microbial ecology. We describe a bioclimatic modeling approach that leverages artificial neural networks to predict microbial community structure as a function of environmental parameters and microbial interactions. This method was better at predicting observed community structure than were any of several single-species models that do not incorporate biotic interactions. The model was used to interpolate and extrapolate community structure over time with an average Bray-Curtis similarity of 89.7. Additionally, community structure was extrapolated geographically to create the first microbial map derived from single-point observations. This method can be generalized to the many microbial ecosystems for which detailed taxonomic data are currently being generated, providing an observation-based modeling technique for predicting microbial taxonomic structure in ecological studies.
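For readers unfamiliar with the metric quoted above, the following sketch computes Bray-Curtis similarity between a predicted and an observed abundance profile. It assumes the reported value of 89.7 is on a 0 to 100 scale, i.e. 100 × (1 − Σ|x − y| / Σ(x + y)); the abstract does not state the scale, and the function name and toy abundances are illustrative.

```python
def bray_curtis_similarity(x, y):
    """Return Bray-Curtis similarity between two abundance vectors (0-100).

    100 means identical communities, 0 means no shared abundance.
    """
    if len(x) != len(y):
        raise ValueError("abundance vectors must have the same length")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("at least one abundance must be nonzero")
    difference = sum(abs(a - b) for a, b in zip(x, y))
    return 100.0 * (1.0 - difference / total)


# Toy taxa counts: predicted community versus observed community.
predicted = [2, 0]
observed = [1, 1]
similarity = bray_curtis_similarity(predicted, observed)
```

Because the measure depends on absolute abundances, profiles are usually normalized to the same total (for example, relative abundances) before comparison.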

13.

Background

Bacterial colony morphology is the first step of classifying the bacterial species before sending them to subsequent identification process with devices, such as VITEK 2 automated system and mass spectrometry microbial identification system. It is essential as a pre-screening process because it can greatly reduce the scope of possible bacterial species and will make the subsequent identification more specific and increase work efficiency in clinical bacteriology. But this work needs adequate clinical laboratory expertise of bacterial colony morphology, which is especially difficult for beginners to handle properly. This study presents automatic programs for bacterial colony classification task, by applying the deep convolutional neural networks (CNN), which has a widespread use of digital imaging data analysis in hospitals. The most common 18 bacterial colony classes from Peking University First Hospital were used to train this framework, and other images out of these training dataset were utilized to test the performance of this classifier.

Results

The feasibility of this framework was verified by the comparison between predicted result and standard bacterial category. The classification accuracy of all 18 bacteria can reach 73%, and the accuracy and specificity of each kind of bacteria can reach as high as 90%.

Conclusions

The supervised neural networks we use can have more promising classification characteristics for bacterial colony pre-screening process, and the unsupervised network should have more advantages in revealing novel characteristics from pictures, which can provide some practical indications to our clinical staffs.

14.
The primary function of the vestibuloocular reflex (VOR) is to maintain the stability of retinal images during head movements. This function is expressed through a complex array of dynamic and adaptive characteristics whose essential physiological basis is a disynaptic arc. We present a model of normal VOR function using a simple neural network architecture constrained by the physiological and anatomical characteristics of this disynaptic reflex arc. When tuned using a method of global optimization, this network is capable of exhibiting the broadband response characteristics observed in behavioral tests of VOR function. Examination of the internal units in the network shows that this performance is achieved by rediscovering the solution to VOR processing first proposed by Skavenski and Robinson (1973). Type I units at the intermediate level of the network possess activation characteristics associated with either pure position or pure velocity. When the network is made more complex either through adding more pairs of internal units or an additional level of units, the characteristic division of unit activation properties into position and velocity types remains unchanged. Although simple in nature, the results of our simulations reinforce the validity of bottom-up approaches to modeling of neural function. In addition, the architecture of the network is consistent with current ideas on the characteristics and site of adaptation of the reflex and should be compatible with current theories regarding learning rules for synaptic modification during VOR adaptation.

15.
MOTIVATION: Modern mass spectrometry allows the determination of proteomic fingerprints of body fluids like serum, saliva or urine. These measurements can be used in many medical applications in order to diagnose the current state or predict the evolution of a disease. Recent developments in machine learning allow one to exploit such datasets, characterized by small numbers of very high-dimensional samples. RESULTS: We propose a systematic approach based on decision tree ensemble methods, which is used to automatically determine proteomic biomarkers and predictive models. The approach is validated on two datasets of surface-enhanced laser desorption/ionization time of flight measurements, for the diagnosis of rheumatoid arthritis and inflammatory bowel diseases. The results suggest that the methodology can handle a broad class of similar problems.

16.
17.
Courtship songs produced by Drosophila males (wild-type, plus the cacophony and dissonance behavioral mutants) were examined with the aid of newly developed strategies for adaptive acoustic analysis and classification. This system used several techniques involving artificial neural networks (a.k.a. parallel distributed processing), including learned vector quantization of signals and non-linear adaption (back-propagation) of data analysis. Pulse song from several individual wild-type and mutant males were first vector-quantized according to their frequency spectra. The accumulated quantized data of this kind, for a given song, were then used to teach or adapt a multiple-layered feedforward artificial neural network, which classified that song according to its original genotype. Results are presented on the performance of the final adapted system when faced with novel test data and on acoustic features the system decides upon for predicting the song-mutant genotype in question. The potential applications and extensions of this new system are discussed, including how it could be used to screen for courtship mutants, search for novel behavior patterns or cause-and-effect relationships associated with reproduction, compress these kinds of data for digital storage, and analyze Drosophila behavior beyond the case of courtship song.

18.

Background  

Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance.

19.
Hayat M  Khan A  Yeasin M 《Amino acids》2012,42(6):2447-2460
Knowledge of the types of membrane protein provides useful clues in deducing the functions of uncharacterized membrane proteins. An automatic method for efficiently identifying uncharacterized proteins is thus highly desirable. In this work, we have developed a novel method for predicting membrane protein types by exploiting the discrimination capability of the difference in amino acid composition at the N and C terminus through split amino acid composition (SAAC). We also show that the ensemble classification can better exploit this discriminating capability of SAAC. In this study, membrane protein types are classified using three feature extraction and several classification strategies. An ensemble classifier Mem-EnsSAAC is then developed using the best feature extraction strategy. Pseudo amino acid (PseAA) composition, discrete wavelet analysis (DWT), SAAC, and a hybrid model are employed for feature extraction. The nearest neighbor, probabilistic neural network, support vector machine, random forest, and Adaboost are used as individual classifiers. The predicted results of the individual learners are combined using genetic algorithm to form an ensemble classifier, Mem-EnsSAAC yielding an accuracy of 92.4 and 92.2% for the Jackknife and independent dataset test, respectively. Performance measures such as MCC, sensitivity, specificity, F-measure, and Q-statistics show that SAAC-based prediction yields significantly higher performance compared to PseAA- and DWT-based systems, and is also the best reported so far. The proposed Mem-EnsSAAC is able to predict the membrane protein types with high accuracy and consequently, can be very helpful in drug discovery. It can be accessed at http://111.68.99.218/membrane.
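The split amino acid composition idea can be illustrated with a short sketch: the sequence is cut into an N-terminal segment, a C-terminal segment, and the region between them, and the 20-residue composition of each part is computed separately, giving a 60-dimensional feature vector. The segment length of 25 residues used here is an assumption, as are the function names; the abstract does not state the lengths used in Mem-EnsSAAC.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Fraction of each of the 20 standard residues in a segment."""
    if not segment:
        return [0.0] * len(AMINO_ACIDS)
    return [segment.count(aa) / len(segment) for aa in AMINO_ACIDS]


def saac(sequence, n_len=25, c_len=25):
    """Split amino acid composition: N-terminal, middle, C-terminal.

    n_len and c_len are assumed segment lengths, not values from the paper.
    Returns a list of 60 fractions (three blocks of 20).
    """
    sequence = sequence.upper()
    if len(sequence) < n_len + c_len:
        raise ValueError("sequence is shorter than the two terminal segments")
    n_part = sequence[:n_len]
    c_part = sequence[len(sequence) - c_len:]
    middle = sequence[n_len:len(sequence) - c_len]
    return composition(n_part) + composition(middle) + composition(c_part)


# Artificial sequence with a different residue dominating each region.
toy_sequence = "A" * 25 + "C" * 10 + "D" * 25
features = saac(toy_sequence)
```

Non-standard residue codes (such as X) count toward a segment's length but have no column of their own in this sketch, so the fractions in a block can sum to less than 1 for such sequences.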

20.

Background  

Polysomnography (PSG) is used to define physiological sleep and different physiological sleep stages, to assess sleep quality and diagnose many types of sleep disorders such as obstructive sleep apnea. However, PSG requires not only the connection of various sensors and electrodes to the subject but also spending the night in a bed that is different from the subject's own bed. This study is designed to investigate the feasibility of automatic classification of sleep stages and obstructive apneic epochs using only the features derived from a single-lead electrocardiography (ECG) signal.
