Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Single-molecule localization microscopy (SMLM) is a powerful tool for studying intracellular structure and macromolecular organization at the nanoscale. The increasingly massive pointillistic data sets generated by SMLM require the development of new and highly efficient quantification tools. Here we present FOCAL3D, an accurate, flexible and exceedingly fast (scaling linearly with the number of localizations) density-based algorithm for quantifying spatial clustering in large 3D SMLM data sets. Unlike DBSCAN, which is perhaps the most commonly employed density-based clustering algorithm, an optimum set of parameters for FOCAL3D may be objectively determined. We initially validate the performance of FOCAL3D on simulated datasets at varying noise levels and for a range of cluster sizes. These simulated datasets are used to illustrate the parametric insensitivity of the algorithm, in contrast to DBSCAN, and clustering metrics such as the F1 and Silhouette score indicate that FOCAL3D is highly accurate, even in the presence of significant background noise and mixed populations of variably sized clusters, once optimized. We then apply FOCAL3D to 3D astigmatic dSTORM images of the nuclear pore complex (NPC) in human osteosarcoma cells, illustrating both the validity of the parameter optimization and the ability of the algorithm to accurately cluster complex, heterogeneous 3D clusters in a biological dataset. FOCAL3D is provided as an open-source software package written in Python.
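The abstract does not spell out FOCAL3D's internals, so the following is only a generic sketch of grid-based density clustering. It is meant to show why binning localizations into voxels can give a run time roughly linear in the number of localizations. The parameter names `voxel_size`, `min_density` and `min_voxels` are hypothetical and are not taken from FOCAL3D.

```python
from collections import Counter, deque
from itertools import product
from math import floor

# Offsets to a voxel itself and its 26 neighbours.
OFFSETS = list(product((-1, 0, 1), repeat=3))


def grid_density_clusters(points, voxel_size, min_density, min_voxels):
    """Group 3D points into clusters of connected dense voxels.

    Hypothetical parameters (not taken from FOCAL3D):
      voxel_size  - edge length of a cubic voxel
      min_density - minimum count summed over a voxel and its 26 neighbours
      min_voxels  - minimum number of connected dense voxels per cluster
    """
    # One pass over the points: count localizations per occupied voxel.
    counts = Counter(
        tuple(floor(c / voxel_size) for c in p) for p in points
    )

    def local_density(v):
        # Sum the counts in the 3x3x3 neighbourhood of voxel v.
        return sum(
            counts.get((v[0] + dx, v[1] + dy, v[2] + dz), 0)
            for dx, dy, dz in OFFSETS
        )

    dense = {v for v in counts if local_density(v) >= min_density}

    # Flood-fill connected dense voxels into clusters.
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            v = queue.popleft()
            component.append(v)
            for dx, dy, dz in OFFSETS:
                n = (v[0] + dx, v[1] + dy, v[2] + dz)
                if n in dense and n not in seen:
                    seen.add(n)
                    queue.append(n)
        if len(component) >= min_voxels:
            clusters.append(sorted(component))
    return sorted(clusters)
```

Each point is touched once when it is binned, and each occupied voxel inspects a fixed number of neighbours, which is where the near-linear cost comes from in this sketch.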

2.
Backbone cluster identification in proteins by a graph theoretical method   Cited by: 4 (self-citations: 0, citations by others: 4)
A graph theoretical algorithm has been developed to identify backbone clusters of residues in proteins. The identified clusters show protein sites with the highest degree of interactions. An adjacency matrix is constructed from the non-bonded connectivity information in proteins. The diagonalization of such a matrix yields eigenvalues and eigenvectors, which contain the information on clusters. In graph theory, distinct clusters can be obtained from the second lowest eigenvector components of the matrix. However, in an interconnected graph, all the points appear as one single cluster. We have developed a method of identifying highly interacting centers (clusters) in proteins by truncating the vector components of high eigenvalues. This paper presents in detail the method adopted for identifying backbone clusters and the application of the algorithm to families of proteins like RNase-A and globin. The objective of this study was to show the efficiency of the algorithm as well as to detect conserved or similar backbone packing regions in a particular protein family. Three clusters in topologically similar regions in the case of the RNase-A family and three clusters around the porphyrin ring in the globin family were observed. The predicted clusters are consistent with the features of the family of proteins such as the topology and packing density. The method can be applied to problems such as identification of domains and recognition of structural similarities in proteins.

3.
Biophysical Journal, 2022, 121(15): 2906-2920
Single-molecule localization microscopy (SMLM) permits the visualization of cellular structures an order of magnitude smaller than the diffraction limit of visible light, and an accurate, objective evaluation of the resolution of an SMLM data set is an essential aspect of the image processing and analysis pipeline. Here, we present a simple method to estimate the localization spread function (LSF) of a static SMLM data set directly from acquired localizations, exploiting the correlated dynamics of individual emitters and properties of the pair autocorrelation function evaluated in both time and space. The method is demonstrated on simulated localizations, DNA origami rulers, and cellular structures labeled by dye-conjugated antibodies, DNA-PAINT, or fluorescent fusion proteins. We show that experimentally obtained images have LSFs that are broader than expected from the localization precision alone, due to additional uncertainty accrued when localizing molecules imaged over time.

4.

Background

Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families.

Results

The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appears to play an important role in the molecular structure and/or function.

Conclusions

Our results demonstrate that the method we present here, which uses a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins, can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.
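The interdependence measure mentioned in the Results can be illustrated with a small sketch that scores two aligned sites (alignment columns) against each other. Dividing the mutual information by the joint entropy is an assumption made here; the abstract does not state which normalization the authors use, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def normalized_mutual_information(col_x, col_y):
    """Return I(X;Y) / H(X,Y) for two alignment columns of equal length.

    Normalizing by the joint entropy is an assumption of this sketch,
    not a detail confirmed by the abstract.
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same length")
    h_x, h_y = entropy(col_x), entropy(col_y)
    h_xy = entropy(list(zip(col_x, col_y)))
    if h_xy == 0:  # both columns fully conserved: nothing to share
        return 0.0
    return (h_x + h_y - h_xy) / h_xy
```

With this normalization, two columns that vary in lockstep score 1 and two statistically independent columns score 0, which makes the values comparable across site pairs with different variability.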

5.
Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we propose studying the problem when much less information or assumptions are available. We model the structural alignment of proteins as a combinatorial problem. In the problem, each protein is simply a set of points in the 3D space, without sequence order information, and the objective is to discover all large enough alignments for any subset of the input. We propose a data-mining approach for this problem. We first perform geometric hashing of the structures such that points with similar locations in the 3D space are hashed into the same bin in the hash table. The novelty is that we consider each bin as a coincidence group and mine for frequent patterns, which is a well-studied technique in data mining. We observe that these frequent patterns are already potentially large alignments. Then a simple heuristic is used to extend the alignments if possible. We implemented the algorithm and tested it using real protein structures. The results were compared with existing tools. They showed that the algorithm is capable of finding conserved substructures that do not preserve sequence order, especially those existing in protein interfaces. The algorithm can also identify conserved substructures of functionally similar structures within a mixture with dissimilar ones. The running time of the program was smaller or comparable to that of the existing tools.
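The binning step described above can be sketched as follows. This is a simplified illustration only: it assumes all structures are already expressed in one common reference frame, whereas full geometric hashing repeats the binning over many candidate frames, and the frequent-pattern mining step is left out. The names `hash_points`, `coincidence_groups`, `bin_size` and `min_support` are hypothetical.

```python
from collections import defaultdict
from math import floor


def hash_points(structures, bin_size):
    """Map each grid bin to the set of structure ids with a point in it.

    `structures` maps a structure id to a list of (x, y, z) points that
    are assumed to be expressed in a common reference frame already.
    """
    table = defaultdict(set)
    for name, points in structures.items():
        for point in points:
            key = tuple(floor(c / bin_size) for c in point)
            table[key].add(name)
    return table


def coincidence_groups(table, min_support):
    """Return the bins that are shared by at least `min_support` structures."""
    return {k: ids for k, ids in table.items() if len(ids) >= min_support}
```

Each surviving bin plays the role of one coincidence group, that is, one transaction that a frequent-pattern miner could then process.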

6.
CFinder: locating cliques and overlapping modules in biological networks   Cited by: 6 (self-citations: 0, citations by others: 6)
Most cellular tasks are performed not by individual proteins, but by groups of functionally associated proteins, often referred to as modules. In a protein association network modules appear as groups of densely interconnected nodes, also called communities or clusters. These modules often overlap with each other and form a network of their own, in which nodes (links) represent the modules (overlaps). We introduce CFinder, a fast program locating and visualizing overlapping, densely interconnected groups of nodes in undirected graphs, and allowing the user to easily navigate between the original graph and the web of these groups. We show that in gene (protein) association networks CFinder can be used to predict the function(s) of a single protein and to discover novel modules. CFinder is also very efficient for locating the cliques of large sparse graphs. Availability: CFinder (for Windows, Linux and Macintosh) and its manual can be downloaded from http://angel.elte.hu/clustering. Supplementary information: Supplementary data are available on Bioinformatics online.
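The abstract does not name the underlying method. CFinder is generally described as an implementation of the clique percolation method, and the brute-force sketch below assumes that. It illustrates how overlapping modules can arise from adjacent k-cliques on a tiny graph; it is not CFinder's own, far more efficient, algorithm, and the function name is hypothetical.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Brute-force clique percolation on a small undirected graph.

    Two k-cliques are adjacent when they share k - 1 nodes; a community
    is the union of all k-cliques in one connected group of cliques.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every k-clique (exponential cost; tiny graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques that overlap in exactly k - 1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(sorted(c) for c in communities.values())
```

Because a node may belong to several cliques that never become adjacent, the same node can appear in more than one community, which is the overlap property the abstract emphasizes.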

7.
Functionally related genes often appear in each other's neighborhood on the genome; however, the order of the genes may not be the same. These groups or clusters of genes may have an ancient evolutionary origin or may signify some other critical phenomenon and may also aid in function prediction of genes. Such gene clusters also aid toward solving the problem of local alignment of genes. Similarly, clusters of protein domains, albeit appearing in different orders in the protein sequence, suggest common functionality in spite of being nonhomologous. In the paper, we address the problem of automatically discovering clusters of entities, be they genes or domains: we formalize the abstract problem as a discovery problem called the (pi)pattern problem and give an algorithm that automatically discovers the clusters of patterns in multiple data sequences. We take a model-less approach and introduce a notation for maximal patterns that drastically reduces the number of valid cluster patterns, without any loss of information. We demonstrate the automatic pattern discovery tool on motifs on E. coli protein sequences.

8.
Given the massive increase in the number of new sequences and structures, a critical problem is how to integrate these raw data into meaningful biological information. One approach, the Evolutionary Trace, or ET, uses phylogenetic information to rank the residues in a protein sequence by evolutionary importance and then maps those ranked at the top onto a representative structure. If these residues form structural clusters, they can identify functional surfaces such as those involved in molecular recognition. Now that a number of examples have shown that ET can identify binding sites and focus mutational studies on their relevant functional determinants, we ask whether the method can be improved so as to be applicable on a large scale. To address this question, we introduce a new treatment of gaps resulting from insertions and deletions, which streamlines the selection of sequences used as input. We also introduce objective statistics to assess the significance of the total number of clusters and of the size of the largest one. As a result of the novel treatment of gaps, ET performance improves measurably. We find evolutionarily privileged clusters that are significant at the 5% level in 45 out of 46 (98%) proteins drawn from a variety of structural classes and biological functions. In 37 of the 38 proteins for which a protein-ligand complex is available, the dominant cluster contacts the ligand. We conclude that spatial clustering of evolutionarily important residues is a general phenomenon, consistent with the cooperative nature of residues that determine structure and function. In practice, these results suggest that ET can be applied on a large scale to identify functional sites in a significant fraction of the structures in the protein databank (PDB). This approach to combining raw sequences and structure to obtain detailed insights into the molecular basis of function should prove valuable in the context of the Structural Genomics Initiative.

9.
The mammalian cell nucleus is functionally compartmentalized into various substructures. Nuclear speckles, also known as interchromatin granule clusters, are enriched with SR splicing factors and are implicated in gene expression. Here we report that nuclear speckle formation is developmentally regulated; in certain cases phosphorylated SR proteins are absent from the nucleus and are instead localized at granular structures in the cytoplasm. To investigate how the nuclear architecture is formed, we performed a phenotypic screen of HeLa cells treated with a series of small interfering RNAs. Depletion of Ran-binding protein 2 induced cytoplasmic intermediates of nuclear speckles in G1 phase. Detailed analyses of these structures suggested that a late step in the sequential nuclear entry of mitotic interchromatin granule components was disrupted and that phosphorylated SR proteins were sequestered in an SR protein kinase-dependent manner. As a result, the cells had an imbalanced subcellular distribution of phosphorylated and hypophosphorylated SR proteins, which affected alternative splicing patterns. This study demonstrates that the speckled distribution of phosphorylated pre-mRNA processing factors is regulated by the nucleocytoplasmic transport system in mammalian cells and that it is important for alternative splicing.

10.
The number of different cortical structures in mammalian brains and the number of extrinsic fibres linking these regions are both large. As with any complex system, systematic analysis is required to draw reliable conclusions about the organization of the complex neural networks comprising these numerous elements. One aspect of organization that has long been suspected is that cortical networks are organized into 'streams' or 'systems'. Here we report computational analyses capable of showing whether clusters of strongly interconnected areas are aspects of the global organization of cortical systems in macaque and cat. We used two different approaches to analyse compilations of corticocortical connection data from the macaque and the cat. The first approach, optimal set analysis, employed an explicit definition of a neural 'system' or 'stream', which was based on differential connectivity. We defined a two-component cost function that described the cost of the global cluster arrangement of areas in terms of the areas' connectivity within and between candidate clusters. Optimal cluster arrangements of cortical areas were then selected computationally from the very many possible arrangements, using an evolutionary optimization algorithm. The second approach, non-parametric cluster analysis (NPCA), grouped cortical areas on the basis of their proximity in multidimensional scaling representations. We used non-metric multidimensional scaling to represent the cortical connectivity structures metrically in two and five dimensions. NPCA then analysed these representations to determine the nature of the clusters for a wide range of different cluster shape parameters. The results from both approaches largely agreed. They showed that macaque and cat cortices are organized into densely intra-connected clusters of areas, and identified the constituent members of the clusters. 
These clusters reflected functionally specialized sets of cortical areas, suggesting that structure and function are closely linked at this gross, systems level.

11.
Cai XH, Jaroszewski L, Wooley J, Godzik A. Proteins, 2011, 79(8): 2389-2402
The protein universe can be organized in families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogeneous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize the structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting the maximal amount of information from new sequencing projects.

12.
Charge interactions are of great importance for protein function and structure, and for a variety of cellular and biochemical processes. We present a systematic approach to the detection of distinctive clusters, runs and periodic patterns of charged residues in a protein sequence. Criteria and formulae are set forth to assess statistical significance of these charge configurations. For the 80-odd proteins potentially encoded by the Epstein-Barr virus, only the major nuclear antigens of the latent state and the transactivator of the lytic cycle contain separated charge clusters of opposite sign as well as periodic charge patterns. From our studies of the polypeptides of the human herpesviruses and of a broad collection of human and other viral protein sequences, distinctive charge configurations appear to be associated with viral capsid and core proteins (positive clusters or runs, mostly at the carboxyl terminus), with many viral glycoproteins and membrane-associated proteins (negative charge clusters), and with transactivators and transforming proteins (multiple charge structures). The statistics developed in this paper apply more generally to other than charge properties of a protein and should aid in the evaluation of a large variety of sequence features.
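As a rough illustration of what scanning a sequence for charge clusters involves, the sketch below slides a fixed window along a protein sequence and reports windows rich in residues of one charge sign. The fixed `window` and `min_charged` threshold are illustrative assumptions; the paper instead derives criteria for statistical significance, which are not reproduced here. Treating K and R as positive and D and E as negative is likewise an assumption of this sketch.

```python
POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate


def charge_windows(sequence, window, min_charged, sign="+"):
    """Return start indices of windows rich in residues of one charge sign.

    The fixed window and threshold are illustrative assumptions, not the
    significance criteria developed in the paper.
    """
    charged = POSITIVE if sign == "+" else NEGATIVE
    flags = [1 if aa in charged else 0 for aa in sequence.upper()]
    if window > len(flags):
        return []
    hits = []
    count = sum(flags[:window])
    for start in range(len(flags) - window + 1):
        if start > 0:
            # Slide the window by one residue in constant time.
            count += flags[start + window - 1] - flags[start - 1]
        if count >= min_charged:
            hits.append(start)
    return hits
```

A real analysis would replace the hard threshold with a significance test that accounts for the overall charge composition of the sequence.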

13.
This paper presents a novel method to detect side-chain clusters in protein three-dimensional structures using a graph spectral approach. Protein side-chain interactions are represented by a labeled graph in which the nodes of the graph represent the Cβ atoms and the edges represent the distance between the Cβ atoms. The distance information and the non-bonded connectivity of the residues are represented in the form of a matrix called the Laplacian matrix. The constructed matrix is diagonalized and clustering information is obtained from the vector components associated with the second lowest eigenvalue and cluster centers are obtained from the vector components associated with the top eigenvalues. The method uses global information for clustering and a single numeric computation is required to detect clusters of interest. The approach has been adopted here to detect a variety of side-chain clusters and identify the residue which makes the largest number of interactions among the residues forming the cluster (cluster centers). Detecting such clusters and cluster centers is important from a protein structure and folding point of view. The crucial residues which are important in the folding pathway, as determined by Φ_F values (a measure of the effect of a mutation on the stability of the transition state of folding) obtained from protein engineering methods, can be identified from the vector components corresponding to the top eigenvalues. Expanded clusters are detected near the active and binding site of the protein, supporting the nucleation condensation hypothesis for folding. The method is also shown to detect domains in protein structures and conserved side-chain clusters in topologically similar proteins.
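The use of the eigenvector belonging to the second lowest Laplacian eigenvalue can be sketched on a toy graph. The example below makes several assumptions that are not in the abstract: edges are unweighted (the paper weights them by distance), the graph is split into only two clusters by sign, and the eigenvector is approximated with a plain power iteration so that no linear-algebra library is needed. The function names are hypothetical.

```python
def fiedler_vector(n, edges, iterations=500):
    """Approximate the Laplacian eigenvector of the second-lowest eigenvalue.

    Runs power iteration on (c*I - L) while projecting out the constant
    vector. Unweighted edges are an assumption of this sketch.
    """
    degree = [0] * n
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        neighbours[u].append(v)
        neighbours[v].append(u)

    c = 2 * max(degree) + 1  # exceeds the largest eigenvalue of L
    vec = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        mean = sum(vec) / n
        vec = [x - mean for x in vec]  # remove the constant eigenvector
        # Apply (c*I - L); (L v)_i = deg_i * v_i - sum of neighbour values.
        vec = [
            (c - degree[i]) * vec[i] + sum(vec[j] for j in neighbours[i])
            for i in range(n)
        ]
        norm = sum(x * x for x in vec) ** 0.5
        vec = [x / norm for x in vec]
    return vec


def spectral_bipartition(n, edges):
    """Split nodes into two clusters by the sign of the Fiedler vector."""
    vec = fiedler_vector(n, edges)
    side_a = {i for i, x in enumerate(vec) if x < 0}
    return side_a, set(range(n)) - side_a
```

For two triangles joined by a single edge, the sign pattern separates the triangles, which mirrors how the vector components carry the clustering information in the method described above.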

14.
Protein complex prediction via cost-based clustering   Cited by: 13 (self-citations: 0, citations by others: 13)
MOTIVATION: Understanding principles of cellular organization and function can be enhanced if we detect known and predict still undiscovered protein complexes within the cell's protein-protein interaction (PPI) network. Such predictions may be used as an inexpensive tool to direct biological experiments. The increasing amount of available PPI data necessitates an accurate and scalable approach to protein complex identification. RESULTS: We have developed the Restricted Neighborhood Search Clustering Algorithm (RNSC) to efficiently partition networks into clusters using a cost function. We applied this cost-based clustering algorithm to PPI networks of Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans to identify and predict protein complexes. We have determined functional and graph-theoretic properties of true protein complexes from the MIPS database. Based on these properties, we defined filters to distinguish between identified network clusters and true protein complexes. CONCLUSIONS: Our application of the cost-based clustering algorithm provides an accurate and scalable method of detecting and predicting protein complexes within a PPI network.

15.
Biophysical Journal, 2022, 121(12): 2279-2289
Modulation enhanced single-molecule localization microscopy (meSMLM) methods improve the localization precision by using patterned illumination to encode additional position information. Iterative meSMLM (imeSMLM) methods iteratively generate prior information on emitter positions, used to locally improve the localization precision during subsequent iterations. The Cramér-Rao lower bound cannot incorporate prior information to bound the best achievable localization precision because it requires estimators to be unbiased. By treating estimands as random variables with a known prior distribution, the Van Trees inequality (VTI) can be used to bound the best possible localization precision of imeSMLM methods. An imeSMLM method is considered, where the positions of in-plane standing-wave illumination patterns are controlled over the course of multiple iterations. Using the VTI, we analytically approximate a lower bound on the maximum localization precision of imeSMLM methods that make use of standing-wave illumination patterns. In addition, we evaluate the maximally achievable localization precision for different illumination pattern placement strategies using Monte Carlo simulations. We show that in the absence of background and under perfect modulation, the information content of signal photons increases exponentially as a function of the iteration count. However, the information increase is no longer exponential as a function of the iteration count under non-zero background, imperfect modulation, or limited mechanical resolution of the illumination positioning system. As a result, imeSMLM with two iterations reaches at most a fivefold improvement over SMLM at 8 expected background photons per pixel and 95% modulation contrast. Moreover, the information increase from imeSMLM is balanced by a reduced signal photon rate. Therefore, SMLM outperforms imeSMLM when considering an equal measurement time and illumination power per iteration. 
Finally, the VTI is an excellent tool for the assessment of the performance of illumination control and is therefore the method of choice for optimal design and control of imeSMLM methods.
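For reference, the scalar form of the Van Trees inequality is sketched below. The notation is chosen here and may differ from the paper's: theta is the emitter position treated as a random variable, pi is its prior density, J(theta) is the Fisher information of the measurement, and the expectation on the left is taken over both the data and theta.

```latex
\mathbb{E}\!\left[(\hat{\theta} - \theta)^{2}\right]
  \;\ge\; \frac{1}{\mathbb{E}_{\pi}\!\left[J(\theta)\right] + J_{\pi}},
\qquad
J_{\pi} = \mathbb{E}_{\pi}\!\left[
  \left(\frac{\partial \ln \pi(\theta)}{\partial \theta}\right)^{2}
\right]
```

The extra term J_pi is the information carried by the prior. It is what lets the bound account for knowledge gathered in earlier iterations, and the bound does not require the estimator to be unbiased.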

16.
Intraspecific genetic variation can have similar effects as species diversity on ecosystem function; understanding such variation is important, particularly for ecological key species. The brown trout plays central roles in many northern freshwater ecosystems, and several cases of sympatric brown trout populations have been detected in freshwater lakes based on apparent morphological differences. In some rare cases, sympatric, genetically distinct populations lacking visible phenotypic differences have been detected based on genetic data alone. Detecting such “cryptic” sympatric populations without prior grouping of individuals based on phenotypic characteristics is more difficult statistically, though. The aim of the present study is to delineate the spatial connectivity of two cryptic, sympatric genetic clusters of brown trout discovered in two interconnected, tiny subarctic Swedish lakes. The structures were detected using allozyme markers, and have been monitored over time. Here, we confirm their existence for almost three decades and report that these cryptic, sympatric populations exhibit very different connectivity patterns to brown trout of nearby lakes. One of the clusters is relatively isolated while the other one shows high genetic similarity to downstream populations. There are indications of different spawning sites as reflected in genetic structuring among parr from different creeks. We used >3000 SNPs on a subsample and find that the SNPs largely confirm the allozyme pattern but give considerably lower F_ST values, and potentially indicate further structuring within populations. This type of complex genetic substructuring over microgeographical scales might be more common than anticipated and needs to be considered in conservation management.

17.
MOTIVATION: We focus on the prediction of disulfide bridges in proteins starting from their amino acid sequence and from the knowledge of the disulfide bonding state of each cysteine. The location of disulfide bridges is a structural feature that conveys important information about the protein main chain conformation and can therefore help towards the solution of the folding problem. Existing approaches based on weighted graph matching algorithms do not take advantage of evolutionary information. Recursive neural networks (RNN), on the other hand, can handle in a natural way complex data structures such as graphs whose vertices are labeled by real vectors, allowing us to incorporate multiple alignment profiles in the graphical representation of disulfide connectivity patterns. RESULTS: The core of the method is the use of machine learning tools to rank alternative disulfide connectivity patterns. We develop an ad-hoc RNN architecture for scoring labeled undirected graphs that represent connectivity patterns. In order to compare our algorithm with previous methods, we report experimental results on the SWISS-PROT 39 dataset. We find that using multiple alignment profiles allows us to obtain significant prediction accuracy improvements, clearly demonstrating the important role played by evolutionary information. AVAILABILITY: The Web interface of the predictor is available at http://neural.dsi.unifi.it/cysteines

18.
This paper presents an attribute clustering method which is able to group genes based on their interdependence so as to mine meaningful patterns from the gene expression data. It can be used for gene grouping, selection, and classification. The partitioning of a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis. By clustering attributes, the search dimension of a data mining algorithm is reduced. The reduction of search dimension is especially important to data mining in gene expression data because such data typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are typically developed and optimized to scale to the number of tuples instead of the number of attributes. The situation becomes even worse when the number of attributes overwhelms the number of tuples, in which case the likelihood of reporting patterns that are actually irrelevant and arise by chance becomes rather high. It is for the aforementioned reasons that gene grouping and selection are important preprocessing steps for many data mining algorithms to be effective when applied to gene expression data. This paper defines the problem of attribute clustering and introduces a methodology for solving it. Our proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. By applying our algorithm to gene expression data, meaningful clusters of genes are discovered. The grouping of genes based on attribute interdependence within a group helps to capture different aspects of gene association patterns in each group. Significant genes selected from each group then contain useful information for gene expression classification and identification. 
To evaluate the performance of the proposed approach, we applied it to two well-known gene expression data sets and compared our results with those obtained by other methods. Our experiments show that the proposed method is able to find meaningful clusters of genes. By selecting a subset of genes which have high multiple-interdependence with others within clusters, significant classification information can be obtained. Thus, a small pool of selected genes can be used to build classifiers with a very high classification rate. From the pool, gene expressions of different categories can be identified.

19.
In this paper we describe a 160-kDa protein (p160) which is present in the nuclear matrix of rat, mouse, and human cells. Biochemical and ultrastructural analysis shows that p160 is associated with the internal matrix and is not present in the lamina-pore complex. Immunoelectron microscopy shows that the protein is part of the extranucleolar, fibrogranular network of the nuclear matrix. During an in vivo 42 °C heat treatment of HeLa cells, A431 human epidermoid cells, and T24 human bladder carcinoma cells, p160 transiently formed large clusters inside the nucleus. These p160 clusters are associated with the nuclear matrix network, as judged by immunolabeling on isolated nuclear matrices. The percentage of cells showing p160 clusters increased proportionally with longer heat treatments, reaching a maximum after a period of 3 h. At this time 70 ± 5% of the cells displayed these clusters. Clustering decreased after longer heat treatments and the anti-p160 staining pattern became diffuse granular again. Other nuclear components, such as the A1 antigen of hnRNP (ribonucleoprotein), the Sm antigen of snRNPs, and lamins A and C, did not cluster during the 42 °C treatment, indicating that this reallocation is characteristic for the p160 matrix protein. These results demonstrate that p160 is an internal nuclear matrix element with a dynamic spatial distribution.

20.
Podosomes are multimolecular mechanosensory assemblies that coordinate mesenchymal migration of tissue-resident dendritic cells. They have a protrusive actin core and an adhesive ring of integrins and adaptor proteins, such as talin and vinculin. We recently demonstrated that core actin oscillations correlate with intensity fluctuations of vinculin but not talin, suggesting different molecular rearrangements for these components. Detailed information on the mutual localization of core and ring components at the nanoscale is lacking. By dual-color direct stochastic optical reconstruction microscopy, we for the first time determined the nanoscale organization of individual podosomes and their spatial arrangement within large clusters formed at the cell–substrate interface. Superresolution imaging of three ring components with respect to actin revealed that the cores are interconnected and linked to the ventral membrane by radiating actin filaments. In core-free areas, αMβ2 integrin and talin islets are homogeneously distributed, whereas vinculin preferentially localizes proximal to the core and along the radiating actin filaments. Podosome clusters appear as self-organized contact areas, where mechanical cues might be efficiently transduced and redistributed. Our findings call for a reevaluation of the current “core–ring” model and provide a novel structural framework for further understanding the collective behavior of podosome clusters.
