Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Many external and internal validity measures have been proposed to estimate the number of clusters in gene expression data, but as a rule they do not consider the stability of the groupings produced by a clustering algorithm. Based on an approach that assesses the predictive power or stability of a partitioning, we propose a new measure of cluster validation and a selection procedure to determine a suitable number of clusters. The validity measure is based on estimating the "clearness" of the consensus matrix, which is the result of a resampling clustering scheme (consensus clustering). According to the proposed selection procedure, the stable clustering result is determined with reference to the validity measure under the null hypothesis, which encodes the absence of clusters. The final number of clusters is selected by analyzing the distance between the validity plots for the initial and permuted data sets. We applied the selection procedure to estimate the clustering results on several datasets. The proposed procedure produced an accurate and robust estimate of the number of clusters, in agreement with biological knowledge and gold standards of cluster quality.
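
The consensus matrix mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common convention that each entry is the fraction of resampled clusterings in which two items were drawn together and also placed in the same cluster; the function name and the toy runs are hypothetical, and the authors' "clearness" measure is not reproduced because the abstract does not define it.

```python
from itertools import combinations

def consensus_matrix(n_items, runs):
    """Build a consensus matrix from resampled clusterings.

    runs: one dict per resampled clustering, mapping item index -> cluster
    label (only the items drawn in that resample appear in the dict).
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = []
    for i in range(n_items):
        row = []
        for j in range(n_items):
            if i == j:
                row.append(1.0)
            elif sampled[i][j]:
                row.append(together[i][j] / sampled[i][j])
            else:
                row.append(0.0)  # assumption: pair never drawn together counts as 0
        matrix.append(row)
    return matrix

# Hypothetical toy input: three items, three resampled clusterings.
runs = [{0: "a", 1: "a", 2: "b"}, {0: "x", 1: "x"}, {1: "p", 2: "p"}]
for row in consensus_matrix(3, runs):
    print(row)
```

Entries close to 0 or 1 indicate pairs that are consistently separated or consistently grouped, which is the intuition behind a "clear" matrix; intermediate values such as 0.5 indicate unstable assignments.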

2.
A hybrid neural network architecture is investigated for modeling purposes. The proposed hybrid is based on the multilayer perceptron (MLP) network. In addition to the usual hidden layers, the first hidden layer is selected to be an adaptive reference pattern layer. Each unit in this new layer incorporates a reference pattern that is located somewhere in the space spanned by the input variables. The outputs of these units are the component-wise squared differences between the elements of a reference pattern and the inputs. The reference pattern layer has some resemblance to the hidden layer of radial basis function (RBF) networks. Therefore the proposed design can be regarded as a sort of hybrid of MLP and RBF networks. The presented benchmark experiments show that the proposed hybrid can provide significant advantages over standard MLPs and RBFs in terms of fast and efficient learning, and compact network structure.
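
As a rough illustration of the reference pattern layer described above, the sketch below computes, for each unit, the component-wise squared differences between its reference pattern and the input vector. The function name and the sample values are assumptions for illustration; the adaptation (training) of the reference patterns is not shown.

```python
def reference_pattern_layer(inputs, reference_patterns):
    """Return one output vector per unit: (input - reference)^2 per component."""
    outputs = []
    for pattern in reference_patterns:
        if len(pattern) != len(inputs):
            raise ValueError("pattern and input must have the same length")
        outputs.append([(x - r) ** 2 for x, r in zip(inputs, pattern)])
    return outputs

# Hypothetical input vector and two reference patterns.
x = [1.0, 2.0]
patterns = [[0.0, 0.0], [1.0, 3.0]]
print(reference_pattern_layer(x, patterns))  # [[1.0, 4.0], [0.0, 1.0]]
```

Summing the components of one unit's output gives the squared Euclidean distance to its reference pattern, which is the quantity an RBF unit is built on; keeping the components separate lets the following MLP layers weight each input dimension individually.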

3.
MOTIVATION: Multilayer perceptrons (MLP) represent one of the widely used and effective machine learning methods currently applied to diagnostic classification based on high-dimensional genomic data. Since the dimensionalities of the existing genomic data often exceed the available sample sizes by orders of magnitude, the MLP performance may degrade owing to the curse of dimensionality and over-fitting, and may not provide acceptable prediction accuracy. RESULTS: Based on Fisher linear discriminant analysis, we designed and implemented an MLP optimization scheme for a two-layer MLP that effectively optimizes the initialization of MLP parameters and MLP architecture. The optimized MLP consistently demonstrated its ability in easing the curse of dimensionality in large microarray datasets. In comparison with a conventional MLP using random initialization, we obtained significant improvements in major performance measures including Bayes classification accuracy, convergence properties and area under the receiver operating characteristic curve (A_z). SUPPLEMENTARY INFORMATION: The Supplementary information is available at http://www.cbil.ece.vt.edu/publications.htm

4.
In this paper, we present an effective and efficient diagnosis system based on particle swarm optimization (PSO) enhanced fuzzy k-nearest neighbor (FKNN) for Parkinson's disease (PD) diagnosis. In the proposed system, named PSO–FKNN, both the continuous version and the binary version of PSO were used to perform parameter optimization and feature selection simultaneously. On the one hand, the neighborhood size k and the fuzzy strength parameter m in the FKNN classifier are adaptively specified by the continuous PSO. On the other hand, binary PSO is utilized to choose the most discriminative subset of features for prediction. The effectiveness of the PSO–FKNN model has been rigorously evaluated against the PD data set in terms of classification accuracy, sensitivity, specificity and the area under the receiver operating characteristic (ROC) curve (AUC). Compared to the existing methods in previous studies, the proposed system has achieved the highest classification accuracy reported so far via 10-fold cross-validation analysis, with a mean accuracy of 97.47%. The proposed diagnosis system might serve as a new candidate among powerful tools for diagnosing PD with excellent performance.
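
To make the roles of k and m concrete, here is a minimal sketch of a fuzzy k-nearest-neighbor membership computation. It assumes the widely used inverse-distance weighting with exponent 2/(m − 1) and crisp training labels; the abstract does not state which FKNN variant was used, and the function name, the epsilon guard and the toy data are hypothetical. The PSO search over k, m and the feature subset is not shown.

```python
import math

def fuzzy_knn_memberships(train, query, k=3, m=2.0):
    """Return class memberships of `query` from its k nearest neighbours.

    train: list of (feature_vector, label) pairs with crisp labels.
    m > 1 is the fuzzy strength parameter; larger m flattens the weighting.
    """
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    exponent = 2.0 / (m - 1.0)
    weights = {}
    for features, label in neighbours:
        # Assumption: a tiny floor on the distance avoids division by zero.
        distance = max(math.dist(features, query), 1e-12)
        weights[label] = weights.get(label, 0.0) + 1.0 / distance ** exponent
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Hypothetical two-feature toy data, not the PD data set.
train = [((0.0, 0.0), "healthy"), ((0.0, 1.0), "healthy"),
         ((5.0, 5.0), "pd"), ((6.0, 5.0), "pd")]
print(fuzzy_knn_memberships(train, (0.5, 0.5), k=3, m=2.0))
```

The memberships sum to 1, and the predicted class is the one with the largest membership; an optimizer such as PSO would treat k and m as the quantities to tune against cross-validated accuracy.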

5.
Numerous diseases have been linked to the malfunction of G-protein coupled receptors (GPCRs). Their adequate treatment requires rational design of new high-affinity and high-selectivity drugs targeting these receptors. In this work, we report three-dimensional models of the human MT1 and MT2 melatonin receptors, members of the GPCR family. The models are based on the X-ray structure of bovine rhodopsin. The computational approach employs an original procedure for optimization of receptor-ligand structures. It includes rotation of one of the transmembrane alpha-helices around its axis with simultaneous assessment of quality of the resulting complexes according to a number of criteria we have developed for this purpose. The optimal geometry of the receptor-ligand binding is selected based on the analysis of complementarity of hydrophobic/hydrophilic properties between the ligand and its protein environment in the binding site. The elaborated "optimized" models are employed to explore the details of protein-ligand interactions for melatonin and a number of its analogs with known affinity to MT1 and MT2 receptors. The models permit rationalization of experimental data, including those that were not used in model building. The perspectives opened by the constructed models and by the optimization procedure in the design of new drugs are discussed.

6.
The system of tree architecture proposed by Hallé and Oldeman consists of 23 models named after botanists who played leading roles in elucidating tree architecture. This system gives no indication why other models do not occur. A symbolism is presented here which can serve as a shorthand for recording tree architectures without assumptions about models, and which is immediately interpretable. Using this symbolism to represent the models proposed by Hallé and Oldeman permits the creation of general rules of tree architecture, some of which raise interesting theoretical questions. Two further tree models that might well be expected to exist, and several which would not be expected, are described.

7.
Previous work has shown that mutations in muscle LIM protein (MLP) can cause hypertrophic cardiomyopathy (HCM). In order to gain an insight into the molecular basis of the disease phenotype, we analysed the binding characteristics of wild-type MLP and of the (C58G) mutant MLP that causes hypertrophic cardiomyopathy. We show that MLP can form a ternary complex with two of its previously documented myofibrillar ligand proteins, N-RAP and α-actinin, which indicates the presence of distinct, non-overlapping binding sites. Our data also show that, in comparison to wild-type MLP, the capacity of the mutated MLP protein to bind both N-RAP and α-actinin is significantly decreased. In addition, this single point mutation prevents zinc coordination and proper folding of the second zinc-finger in the first LIM domain, which consequently renders the protein less stable and more susceptible to proteolysis. The molecular basis for HCM-causing mutations in the MLP gene might therefore be an alteration in the equilibrium of interactions of the ternary complex MLP–N-RAP–α-actinin. This assumption is supported by the previous observation that in the pathological situation accompanied by MLP downregulation, cardiomyocytes try to compensate for the decreased stability of MLP protein by increasing the expression of its ligand N-RAP, which might finally result in the development of myocyte disarray that is characteristic of this disease. This study was supported by a grant from the Deutsche Forschungsgemeinschaft to D.O.F.

8.
Variable selection is critical in competing risks regression with high-dimensional data. Although penalized variable selection methods and other machine learning-based approaches have been developed, many of these methods suffer from instability in practice. This paper proposes a novel method named Random Approximate Elastic Net (RAEN). Under the proportional subdistribution hazards model, RAEN provides a stable and generalizable solution to the large-p-small-n variable selection problem for competing risks data. Our general framework allows the proposed algorithm to be applied to other time-to-event regression models, including competing risks quantile regression and accelerated failure time models. We show through extensive simulations that variable selection and parameter estimation improve markedly with the new computationally intensive algorithm. A user-friendly R package, RAEN, has been developed for public use. We also apply our method to a cancer study to identify influential genes associated with death or progression from bladder cancer.

9.
In this work, we propose a novel method for individualized treatment selection when the treatment response is multivariate. Our method covers any number of treatments and can be applied to a broad set of models. The proposed method uses a Mahalanobis-type distance measure to establish an ordering of treatments based on treatment performance measures. Our investigation deals with means of responses conditional on lower-dimensional composite scores based on covariates, where these scores are built using single index models to approximate mean responses against patient covariates. Smoothed estimates of such conditional means are combined to construct an estimate of the aforementioned distance measure, which is then used to estimate the optimal treatment. An empirical study demonstrates the performance of the proposed method in finite samples. We also present a data analysis using HIV clinical trial data to show the applicability of the proposed procedure to real data.

10.
11.
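The fitting accuracy and generalization ability of a support vector regression (SVR) model depend on the choice of its parameters, and parameter selection is essentially an optimization search process. Exploiting the efficiency of the heuristic breadth-first search (HBFS) algorithm in solving optimization problems, we propose an SVR parameter selection method that takes the minimization of the k-fold cross-validation error as its objective and uses HBFS as the search strategy. Simulation experiments on three benchmark data sets show that the method substantially shortens training and modeling time while maintaining prediction accuracy, offering a new and effective approach to SVR parameter selection for large samples.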

12.
This paper presents a new scheme for training MLPs which employs a relaxation method for multi-objective optimization. The algorithm works by obtaining a reduced set of solutions, from which the one with the best generalization is selected. This approach allows balancing between the training error and the norm of the network weight vectors, which are the two objective functions of the multi-objective optimization problem. The method is applied to classification and regression problems and compared with Weight Decay (WD), Support Vector Machines (SVMs) and standard Backpropagation (BP). It is shown that the proposed systematic training procedure results in neural models with good generalization and outperforms traditional methods.

13.
Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 Å on average) and sequence-structure specificities reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8% (Q16 = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The Neq is linearly correlated with the Q16 prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%.
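
The abstract does not give the formula for the entropy index. The sketch below assumes the definition commonly associated with Protein Blocks, the exponential of the Shannon entropy of the 16 predicted PB probabilities at a sequence position; the function name is hypothetical and the probabilities are made-up inputs.

```python
import math

def neq(probabilities):
    """Equivalent number of states: exp of the Shannon entropy (natural log).

    Assumed definition: Neq = exp(-sum(p * ln p)) over the PB probabilities
    predicted for one position; zero probabilities contribute nothing.
    """
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0.0)
    return math.exp(entropy)

# A fully confident prediction versus a completely uninformative one.
print(neq([1.0] + [0.0] * 15))
print(neq([1.0 / 16] * 16))
```

Under this assumed definition the index ranges from 1, when a single PB is predicted with certainty, to 16, when all PBs are equally probable, so lower values correspond to more decisive local predictions.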

14.
Nonparametric feature selection for high-dimensional data is an important and challenging problem in the fields of statistics and machine learning. Most of the existing methods for feature selection focus on parametric or additive models, which may suffer from model misspecification. In this paper, we propose a new framework to perform nonparametric feature selection for both regression and classification problems. Under this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space. The space is generated by a novel tensor product kernel, which depends on a set of parameters that determines the importance of the features. Computationally, we minimize the empirical risk with a penalty to estimate the prediction and kernel parameters simultaneously. The solution can be obtained by iteratively solving convex optimization problems. We study the theoretical properties of the kernel feature space and prove the oracle selection property and Fisher consistency of our proposed method. Finally, we demonstrate the superior performance of our approach compared to existing methods via extensive simulation studies and applications to two real studies.

15.
16.
Detection and Architecture of Small Heat Shock Protein Monomers   (Cited: 1 in total; self-citations: 0; citations by others: 1)

Background

Small Heat Shock Proteins (sHSPs) are chaperone-like proteins involved in the prevention of the irreversible aggregation of misfolded proteins. Although many studies have already been conducted on sHSPs, the molecular mechanisms and structural properties of these proteins remain unclear. Here, we propose a better understanding of the architecture, organization and properties of the sHSP family through structural and functional annotations. We focused on the Alpha Crystallin Domain (ACD), a sandwich fold that is the hallmark of the sHSP family.

Methodology/Principal Findings

We developed a new approach for detecting sHSPs and delineating ACDs based on an iterative Hidden Markov Model algorithm using a multiple alignment profile generated from structural data on ACD. Using this procedure on the UniProt databank, we found 4478 sequences identified as sHSPs, showing a very good coverage with the corresponding PROSITE and Pfam profiles. ACD was then delimited and structurally annotated. We showed that taxonomic-based groups of sHSPs (animals, plants, bacteria) have unique features regarding the length of their ACD and, more specifically, the length of a large loop within ACD. We detailed highly conserved residues and patterns specific to the whole family or to some groups of sHSPs. For 96% of studied sHSPs, we identified in the C-terminal region a conserved I/V/L-X-I/V/L motif that acts as an anchor in the oligomerization process. The fragment defined from the end of ACD to the end of this motif has a mean length of 14 residues and was named the C-terminal Anchoring Module (CAM).

Conclusions/Significance

This work annotates structural components of ACD and quantifies properties of several thousand sHSPs. It gives a more accurate overview of the architecture of sHSP monomers.
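
The I/V/L-X-I/V/L motif and the CAM fragment described in this abstract can be sketched with a regular expression. This is only an illustration under stated assumptions: the ACD end position is taken as given (the HMM-based delineation is not shown), the first motif occurrence after the ACD is used, and the function name and the sequence are hypothetical.

```python
import re

# I, V or L, then any residue, then I, V or L.
ANCHOR_MOTIF = re.compile(r"[IVL].[IVL]")

def c_terminal_anchoring_module(sequence, acd_end):
    """Return the fragment from the end of the ACD to the end of the first
    I/V/L-X-I/V/L motif that follows it, or None if no motif is found.

    acd_end: 0-based index of the first residue after the ACD (assumed known).
    """
    match = ANCHOR_MOTIF.search(sequence, acd_end)
    if match is None:
        return None
    return sequence[acd_end:match.end()]

# Hypothetical sequence; the ACD is assumed to end before index 7.
print(c_terminal_anchoring_module("MKTAAGGSEPKRIPIQQ", 7))  # SEPKRIPI
```

Measuring the length of the returned fragment over many sequences would give the kind of statistic reported in the abstract, although the real analysis depends on accurate ACD boundaries.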

17.
This paper presents an optimization procedure for multi-criterion analysis essential in many biomechanical studies. The optimization is illustrated with a heel-toe running analysis wherein the rate of load and the passive load on the support leg are minimized concurrently. The goal of multi-criterion optimization is achieved by incorporating the criterion of Pareto optimality in the genetic algorithm. The proposed procedure can replace the popular weighted-sum approach for problems with multiple objectives. The selection of a final design from the Pareto optimum points (non-dominated designs) can be determined based on the min-max objective deviation criterion. Nevertheless, a different decision can be made in the final selection without incurring recalculations. The scheme is readily adaptable to parallel computing, which deserves further study to reduce the execution time in a complex biomechanical analysis.
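
The two selection ideas in this abstract, non-dominated (Pareto optimum) designs and a min-max choice among them, can be sketched as follows. The genetic algorithm itself is not shown, and the exact form of the min-max objective deviation criterion is an assumption here: each objective's deviation from its best value on the front is scaled by the front's range, and the design with the smallest worst-case deviation is picked. Function names and objective values are hypothetical.

```python
def dominates(a, b):
    """True if design a is no worse than b in every objective (minimization)
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(designs):
    """Keep only the non-dominated designs."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

def min_max_choice(front):
    """Pick the design whose largest scaled deviation from the per-objective
    best value is smallest (assumed form of the min-max criterion)."""
    n = len(front[0])
    best = [min(d[k] for d in front) for k in range(n)]
    worst = [max(d[k] for d in front) for k in range(n)]

    def worst_deviation(d):
        return max(
            (d[k] - best[k]) / (worst[k] - best[k]) if worst[k] > best[k] else 0.0
            for k in range(n)
        )

    return min(front, key=worst_deviation)

# Hypothetical (rate of load, passive load) pairs, both to be minimized.
designs = [(1.0, 9.0), (3.0, 4.0), (4.0, 5.0), (8.0, 1.0)]
front = pareto_front(designs)
print(front)                  # [(1.0, 9.0), (3.0, 4.0), (8.0, 1.0)]
print(min_max_choice(front))  # (3.0, 4.0)
```

Because the front is computed once and the final choice is a separate step, a different selection rule can be applied afterwards without rerunning the optimization, which matches the remark in the abstract about avoiding recalculations.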

18.
Purpose: Multiple Coulomb scattering (MCS) poses a challenge in proton CT (pCT) image reconstruction. The assumption of straight paths is replaced with Bayesian models of the most likely path (MLP). Current MLP-based pCT reconstruction approaches assume a water scattering environment. We propose an MLP formalism based on accurate determination of scattering moments in inhomogeneous media. Methods: Scattering power relative to water (RScP) was calculated for a range of human tissues and investigated against relative stopping power (RStP). Monte Carlo simulation was used to compare the new inhomogeneous MLP formalism to the water approach in a slab geometry and a human head phantom. An MLP-Spline-Hybrid method was investigated for improved computational efficiency. Results: A piecewise-linear correlation between RStP and RScP was shown, which may assist in iterative pCT reconstruction. The inhomogeneous formalism predicted Monte Carlo proton paths through a water cube with thick bone inserts to within 1.0 mm for beams ranging from 210 to 230 MeV incident energy. Improvement in accuracy over the conventional MLP ranged from 5% for a 230 MeV beam to 17% for 210 MeV. There was no noticeable gain in accuracy when predicting 200 MeV proton paths through a clinically relevant human head phantom. The MLP-Spline-Hybrid method reduced computation time by half while suffering negligible loss of accuracy. Conclusions: We have presented an MLP formalism that accounts for material composition. In most clinical cases a water scattering environment can be assumed; however, in certain cases of significant heterogeneity the proposed algorithm may improve proton path estimation.

19.
Phage display is a well-established procedure to isolate binders against a wide variety of antigens that can be performed on purified antigens, but also on intact cells. As selection steps are performed in vitro, it is possible to focus the outcome of the selection on relevant epitopes by performing some additional steps, such as depletion or competitive elutions. However, in practice, the efficiency of these steps is often limited and can lead to inconsistent results. We have designed a new selection method named masked selection, based on the blockade of unwanted epitopes to favor the targeting of relevant ones. We demonstrate the efficiency and flexibility of this method by selecting single-domain antibodies against a specific portion of a fusion protein, by selecting binders against several members of the seven transmembrane receptor family using transfected HEK cells, or by selecting binders against unknown breast cancer markers not expressed on normal samples. The relevance of this approach for antibody-based therapies was further validated by the identification of four of these markers, Epithelial cell adhesion molecule, Transferrin receptor 1, Metastasis cell adhesion molecule, and Sushi containing domain 2, using immunoprecipitation and mass spectrometry. This new phage display strategy can be applied to any type of antibody fragments or alternative scaffolds, and is especially suited for the rapid discovery and identification of cell surface markers.

Hybridoma (1) and phage-display recombinant antibody systems (2) are currently the predominant methods for isolating monoclonal antibodies (Abs). Display of recombinant Abs on the surface of bacteriophage M13 has numerous advantages compared with conventional hybridoma technology. When combined with the use of large non-immune libraries, phage Ab selection represents a rich source of binders that can be isolated in a fraction of the time needed for hybridoma-based approaches.
Moreover, this in vitro selection method permits the selection of binders against toxic, non-immunogenic or highly conserved antigens, which is not easily performed using the conventional hybridoma techniques. Importantly, it can be used to isolate fully human antibody fragments (3). Consequently, phage display rapidly became an established procedure for the isolation of binders against a wide variety of antigens.

Phage display-based antibody isolation typically relies on the use of recombinant proteins for several steps, including immunizations (if needed), library enrichment by selection on immobilized antigen, screening, and characterization of antibodies in terms of specificity and affinity (4). This procedure is efficient but depends on the availability of purified recombinant proteins. Unfortunately, some surface molecules, such as G-protein coupled receptors, cannot be easily expressed and purified in their native conformation. Some molecules with large extracellular domains may adopt a specific conformation upon interaction with other cell surface proteins, thereby forming complexes that are cumbersome to produce by recombinant expression. Moreover, many standard screening practices, such as the adsorption of recombinant proteins on plastic, may significantly alter protein conformations (5). For these reasons, Abs selected on the basis of binding to a recombinant protein may not bind the native conformation of this protein. It is thus of high interest to develop procedures entirely based on the use of intact cells expressing the receptor of choice. However, in this case, an extra step is necessary to enrich for phage-Abs binding to the receptor of interest rather than to other cell surface proteins.
Because selection steps are performed in vitro, it is possible to influence the outcome of a selection by performing some additional steps such as depletion steps (also named negative selection) prior to positive selections to remove unwanted specificities or cross-reactions (6), by alternating the source of the antigen (7), or by using a competitive elution with a ligand or an existing monoclonal antibody to favor the selection of binders against a precise epitope (8).

Along this line, it would be of very high interest to establish a procedure able to reliably guide the selection toward an unknown but relevant antigen within a complex mixture, such as a tumor marker overexpressed at the surface of intact cells, or in a cell lysate. Indeed, during the past two decades, there has been a growing interest in approaches aiming at discovering new diagnostic biomarkers and identifying new potential surface markers for targeted therapy. Several studies have described the use of phage display and libraries of recombinant antibodies for the isolation of tumor-specific binders (9–15), leading in some cases to the identification of new tumor markers (16, 17). Most of these strategies rely on the use of depletion steps on normal samples followed by a selection step on tumor samples. Unfortunately, this procedure often leads to inconsistent results and its efficiency can be a limiting factor in complex situations such as the selection of antibodies against unknown overexpressed tumor antigens.

We have designed a new selection method, named masked selection, which relies on the blockade of unwanted epitopes to favor the targeting of relevant ones.
We demonstrate the efficiency of this method by selecting binders against a specific portion of a fusion protein, by selecting binders against two members of the seven transmembrane receptor family and a tyrosine kinase receptor using intact transfected HEK cells, or by selecting binders against unknown breast cancer markers not expressed on normal samples, as shown by flow cytometry and immunohistochemistry. The universality and efficiency of this approach should ultimately lead to the rapid selection of specific binders and the development of diagnostic and targeted therapies in various settings.

20.