Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
We introduce a new approach to investigate the dual-nucleotide compositions of 11 Gram-positive and 12 Gram-negative eubacteria recently studied by Sorimachi and Okayasu. The approach first obtains a set of 16-dimensional dual-nucleotide vectors from the complete genome of an organism using the PN-curve. Each vector in the set corresponds to a single gene of the genome. We then reduce the 16-dimensional vector set to two dimensions by principal component analysis (PCA). The reduction avoids the possible loss of information caused by averaging all the 16-dimensional vectors. Finally, we suggest a 2D graphical representation based on the 2-dimensional vectors to investigate the classification patterns among different organisms.
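As a rough illustration of the per-gene 16-dimensional vector described above, the sketch below counts overlapping dinucleotides in one gene and normalizes them to relative frequencies. This is an assumption made for illustration only: the abstract derives its vectors from the PN-curve, whose construction is not given here, so plain frequency counting is a stand-in. The names `DINUCLEOTIDES` and `dinucleotide_vector` are hypothetical.

```python
from itertools import product

# All 16 ordered pairs over the DNA alphabet, in a fixed order (AA, AC, ..., TT).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_vector(gene):
    """Return the 16 relative frequencies of overlapping dinucleotides in a gene.

    Stand-in for the PN-curve construction, which the abstract does not specify.
    """
    seq = gene.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        # Sequence too short (or entirely ambiguous): return a zero vector.
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]
```

Stacking one such vector per gene gives the matrix that a PCA step would then project onto its first two principal components.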

2.
3.
Sparse sufficient dimension reduction   Cited by: 2 (self-citations: 0, citations by others: 2)
Li Lexin. Biometrika, 2007, 94(3):603-613
Existing sufficient dimension reduction methods suffer from the fact that each dimension reduction component is a linear combination of all the original predictors, so that it is difficult to interpret the resulting estimates. We propose a unified estimation strategy, which combines a regression-type formulation of sufficient dimension reduction methods and shrinkage estimation, to produce sparse and accurate solutions. The method can be applied to most existing sufficient dimension reduction methods such as sliced inverse regression, sliced average variance estimation and principal Hessian directions. We demonstrate the effectiveness of the proposed method by both simulations and real data analysis.

4.
5.
The solvent-accessible surface area of proteins is important in biological function for many reasons, including protein-protein interactions, protein folding, and catalytic sites. Here we present a chemical technique to oxidize amino acid side chains in a model protein, apomyoglobin, and subsequent elucidation of the effect of solvent accessibility on the sites of oxidation. Under conditions of low protein oxidation (zero to three oxygen atoms added per apomyoglobin molecule), we have positively identified five oxidation sites by liquid chromatography-tandem mass spectrometry and high-resolution Fourier transform mass spectrometry. Our results indicate that all oxidized amino acids, with the exception of methionine, have highly solvent-accessible side chains, but the rate of oxidation may not be dictated solely by solvent accessibility and amino acid identity.

6.
7.
Lu W, Li L. Biometrics, 2011, 67(2):513-523
Methodology of sufficient dimension reduction (SDR) has offered an effective means to facilitate regression analysis of high-dimensional data. When the response is censored, however, most existing SDR estimators cannot be applied, or require some restrictive conditions. In this article, we propose a new class of inverse censoring probability weighted SDR estimators for censored regressions. Moreover, regularization is introduced to achieve simultaneous variable selection and dimension reduction. Asymptotic properties and empirical performance of the proposed methods are examined.

8.

Background  

The most popular methods for significance analysis on microarray data are well suited to find genes differentially expressed across predefined categories. However, identification of features that correlate with continuous dependent variables is more difficult using these methods, and long lists of significant genes returned are not easily probed for co-regulations and dependencies. Dimension reduction methods are much used in the microarray literature for classification or for obtaining low-dimensional representations of data sets. These methods have an additional interpretation strength that is often not fully exploited when expression data are analysed. In addition, significance analysis may be performed directly on the model parameters to find genes that are important for any number of categorical or continuous responses. We introduce a general scheme for analysis of expression data that combines significance testing with the interpretative advantages of the dimension reduction methods. This approach is applicable both for explorative analysis and for classification and regression problems.

9.
Sufficient dimension reduction via Bayesian mixture modeling   Cited by: 1 (self-citations: 0, citations by others: 1)
Reich BJ, Bondell HD, Li L. Biometrics, 2011, 67(3):886-895
Dimension reduction is central to an analysis of data with many predictors. Sufficient dimension reduction aims to identify the smallest possible number of linear combinations of the predictors, called the sufficient predictors, that retain all of the information in the predictors about the response distribution. In this article, we propose a Bayesian solution for sufficient dimension reduction. We directly model the response density in terms of the sufficient predictors using a finite mixture model. This approach is computationally efficient and offers a unified framework to handle categorical predictors, missing predictors, and Bayesian variable selection. We illustrate the method using both a simulation study and an analysis of an HIV data set.

10.
Xia Yingcun. Biometrika, 2009, 96(1):133-148
Lack-of-fit checking for parametric and semiparametric models is essential in reducing misspecification. The efficiency of most existing model-checking methods drops rapidly as the dimension of the covariates increases. We propose to check a model by projecting the fitted residuals along a direction that adapts to the systematic departure of the residuals from the desired pattern. Consistency of the method is proved for parametric and semiparametric regression models. A bootstrap implementation is also discussed. Simulation comparisons with several existing methods are made, suggesting that the proposed methods are more efficient than the existing methods when the dimension increases. Air pollution data from Chicago are used to illustrate the procedure.

11.
In the analysis of high-throughput biological data, it is often believed that biological units such as genes behave interactively in groups, that is, pathways in our context. It is conceivable that using previously available pathway knowledge would greatly facilitate both interpretation and estimation in the statistical analysis of such high-dimensional biological data. In this article, we propose a 2-step procedure for identifying pathways that are related to and influence the clinical phenotype. In the first step, a nonlinear dimension reduction method is proposed, which permits flexible within-pathway gene interactions as well as nonlinear pathway effects on the response. In the second step, a regularized model-based pathway ranking and selection procedure is developed that is built upon the summary features extracted from the first step. Simulations suggest that the new method performs favorably compared to the existing solutions. An analysis of a glioblastoma microarray data set finds 4 pathways that have evidence of support from the biological literature.

12.
Characterization of melanophore morphology by fractal dimension analysis   Cited by: 1 (self-citations: 0, citations by others: 1)
Fractal or focal dimension (FD) analysis is a valuable tool to identify physiologic stimuli at the cellular and tissue levels that allows for quantification of cell perimeter complexity. The FD analysis was determined on fluorescence images of caffeine- or epinephrine-treated (or untreated control) killifish Fundulus heteroclitus (Linnaeus) melanophores in culture. Cell perimeters were indicated by rhodamine-phalloidin labeling of cortical microfilaments using box-counting FD analysis. Caffeine-treated melanophores displayed dispersed melanosomes in cells with less serrated edges and reduced FD and complexity. Complexity in epinephrine-treated cells was significantly higher than in the caffeine-treated cells or in the control. Cytoarchitectural variability of the cell perimeter is expected because cells change shape when cued with agents. Epinephrine-treated melanophores demonstrated aggregated melanosomes in cells with more serrated edges, significantly higher FD and thus complexity. Melanophores not treated with caffeine or epinephrine produced variable distributions of melanosomes and resulted in cells with variably serrated edges and intermediate FD with a larger SE of the regression and greater range of complexity. Dispersion of melanosomes occurs with rearrangements of the cytoskeleton to accommodate centrifugal distribution of melanosomes throughout the cell and to the periphery. The loading of melanosomes onto cortical microfilaments may provide a less complex cell contour, with the even distribution of the cytoskeleton and melanosomes. Aggregation of melanosomes occurs with rearrangements of the cytoskeleton to accommodate centripetal distribution of melanosomes. The aggregation of melanosomes may contribute to centripetal retraction of the cytoskeleton and plasma membrane. The FD analysis is, therefore, a convenient method to measure contrasting morphologic changes within stimulated cells.
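The box-counting estimate mentioned above can be sketched as follows: cover the perimeter pixels with grids of decreasing box size s, count the occupied boxes N(s), and take the least-squares slope of log N(s) against log(1/s). This is a generic textbook sketch, not the authors' image-analysis pipeline; the function names, the integer pixel-coordinate input, and the choice of box sizes are all assumptions.

```python
import math


def box_count(points, size):
    """Number of size-by-size grid boxes that contain at least one point."""
    return len({(x // size, y // size) for x, y in points})


def box_counting_dimension(points, sizes):
    """Estimate the fractal dimension of a pixel set.

    points: iterable of integer (x, y) pixel coordinates (e.g. a cell outline).
    sizes:  box edge lengths in pixels; at least two distinct values.
    Returns the least-squares slope of log N(s) versus log(1/s).
    """
    points = list(points)
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

On this reading, a straight edge gives a slope near 1 and a filled region a slope near 2, so a more serrated outline should fall further above 1 than a smooth one.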

13.
Protein modeling can be done at various levels of structural detail, from simplified lattice or continuous representations, through high resolution reduced models employing the united atom representation, to all-atom models of molecular mechanics. Here I describe a new high resolution reduced model, its force field and applications in structural proteomics. The model uses a lattice representation with 800 possible orientations of the virtual alpha carbon-alpha carbon bonds. The sampling scheme of the conformational space employs the Replica Exchange Monte Carlo method. Knowledge-based potentials of the force field include: generic protein-like conformational biases, statistical potentials for the short-range conformational propensities, a model of the main chain hydrogen bonds and context-dependent statistical potentials describing the side group interactions. The model is more accurate than the previously designed lattice models, and in many applications it is complementary to and competitive with the all-atom techniques. The test applications include: ab initio structure prediction, multitemplate comparative modeling and structure prediction based on sparse experimental data. In particular, the new approach to comparative modeling could be a valuable tool for structural proteomics. It is shown that the new approach goes beyond the range of applicability of the traditional methods of protein comparative modeling.

14.
In the linear model with right-censored responses and many potential explanatory variables, regression parameter estimates may be unstable or, when the covariates outnumber the uncensored observations, not estimable. We propose an iterative algorithm for partial least squares, based on the Buckley-James estimating equation, to estimate the covariate effect and predict the response for a future subject with a given set of covariates. We use a leave-two-out cross-validation method for empirically selecting the number of components in the partial least-squares fit that approximately minimizes the error in estimating the covariate effect of a future observation. Simulation studies compare the methods discussed here with other dimension reduction techniques. Data from the AIDS Clinical Trials Group protocol 333 are used to motivate the methodology.

15.
Ellingson L, Zhang J. PLoS ONE, 2012, 7(7):e40540
Comparison of the binding sites of proteins is an effective means for predicting protein functions based on their structure information. Despite the importance of this problem and much research in the past, it is still very challenging to predict the binding ligands from the atomic structures of protein binding sites. Here, we designed a new algorithm, TIPSA (Triangulation-based Iterative-closest-point for Protein Surface Alignment), based on the iterative closest point (ICP) algorithm. TIPSA aims to find the maximum number of atoms that can be superposed between two protein binding sites, where any pair of superposed atoms has a distance smaller than a given threshold. The search starts from similar tetrahedra between two binding sites obtained from 3D Delaunay triangulation and uses the Hungarian algorithm to find additional matched atoms. We found that, due to the plasticity of protein binding sites, matching the rigid body of point clouds of protein binding sites is not adequate for satisfactory binding ligand prediction. We further incorporated global geometric information, the radius of gyration of binding site atoms, and used nearest neighbor classification for binding site prediction. Tested on benchmark data, our method achieved a performance comparable to the best methods in the literature, while simultaneously providing the common atom set and atom correspondences.
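The global geometric descriptor mentioned above, the radius of gyration of the binding site atoms, can be sketched as the root-mean-square distance of the atoms from their centroid. The abstract does not say whether atoms are mass-weighted, so the unweighted form below is an assumption, as is the function name `radius_of_gyration`; it is not taken from TIPSA itself.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) atom positions.

    Assumption: every atom has equal weight; a mass-weighted variant would
    replace the plain means with mass-weighted means.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one atom")
    # Centroid of the point cloud.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance of the atoms from the centroid.
    msd = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(msd)
```

A single scalar like this is invariant to rotation and translation, which is presumably why it can complement a rigid-body superposition score.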

16.
OPTICS is a density-based clustering algorithm that performs well in a wide variety of applications. For a set of input objects, the algorithm creates a reachability plot that can either be used to produce cluster membership assignments, or interpreted itself as an expressive two-dimensional representation of the clustering structure of the input set, even if the input set is embedded in higher dimensions. The focus of this work is a visualization method that can be applied for comparing two independent hierarchical clusterings by assigning colors to all entries of the input database. We give two applications related to macromolecular structural properties: the first is a sequence-based clustering of the SwissProt database that is evaluated using NCBI taxonomy identifiers, and the second involves clustering locations of specific atoms in the serine protease enzyme family, with the clusters evaluated using SCOP structural classifications.

17.
18.
Structural genomics (SG) initiatives are expanding the universe of protein fold space by rapidly determining structures of proteins that were intentionally selected on the basis of low sequence similarity to proteins of known structure. Often these proteins have no associated biochemical or cellular functions. The SG success has resulted in an accelerated deposition of novel structures. In some cases the structural bioinformatics analysis applied to these novel structures has provided specific functional assignment. However, this approach has also uncovered limitations in the functional analysis of uncharacterized proteins using traditional sequence and backbone structure methodologies. A novel method, named pvSOAR (pocket and void Surface of Amino Acid Residues), of comparing the protein surfaces of geometrically defined pockets and voids was developed. pvSOAR was able to detect previously unrecognized and novel functional relationships between surface features of proteins. In this study, pvSOAR is applied to several structural genomics proteins. We examined the surfaces of YecM, BioH, and RpiB from Escherichia coli as well as the CBS domains from inosine-5'-monophosphate dehydrogenase from Streptococcus pyogenes, conserved hypothetical protein Ta549 from Thermoplasma acidophilum, and CBS domain protein mt1622 from Methanobacterium thermoautotrophicum with the goal of inferring information about their biochemical function.

19.
With the help of a microfabrication process and surface modification technology, a method of fabricating protein patterned chips was developed which can be utilized as a powerful tool for performing bioassays in a high-throughput manner. A digital micromirror array (MMA) system was used as a virtual photomask, so that a maskless photolithography process was able to be used to build patterned biomolecules on a chip by selective illumination onto the chip surface. We utilized the nitroveratryloxycarbonyl (NVOC) group as a photolabile protecting group for protein patterning. The NVOC-protected surface was selectively irradiated by a UV illuminator using an MMA. After removing the NVOC group, biotin was coupled to the NVOC-cleaved site, onto which a buffered streptavidin solution was eluted. At this point, we could obtain a streptavidin-patterned surface and observe the effect of the polymer-grafted surface in reducing nonspecific binding.

20.
Hundreds of protein crystal structures exist for proteins whose function cannot be confidently determined from sequence similarity. Surflex‐PSIM, a previously reported surface‐based protein similarity algorithm, provides an alternative method for hypothesizing function for such proteins. The method now supports fully automatic binding site detection and is fast enough to screen comprehensive databases of protein binding sites. The binding site detection methodology was validated on apo/holo cognate protein pairs, correctly identifying 91% of ligand binding sites in holo structures and 88% in apo structures where corresponding sites existed. For correctly detected apo binding sites, the cognate holo site was the most similar binding site 87% of the time. PSIM was used to screen a set of proteins that had poorly characterized functions at the time of crystallization, but were later biochemically annotated. Using a fully automated protocol, this set of 8 proteins was screened against ~60,000 ligand binding sites from the PDB. PSIM correctly identified functional matches that predated query protein biochemical annotation for five out of the eight query proteins. A panel of 12 currently unannotated proteins was also screened, resulting in a large number of statistically significant binding site matches, some of which suggest likely functions for the poorly characterized proteins. Proteins 2014; 82:679–694. © 2013 Wiley Periodicals, Inc.
