Similar literature (20 results)
1.
Finding the common substructures shared by two proteins is considered one of the central issues in computational biology because of its usefulness in understanding the structure-function relationship and its application in drug and vaccine design. In this paper, we propose a novel algorithm called FAMCS (Finding All Maximal Common Substructures) for the common substructure identification problem. Our method works initially at the protein secondary structural element (SSE) level and starts with the identification of all structurally similar SSE pairs. These SSE pairs are then merged into sets using a modified Apriori algorithm, which tests the similarity of various sets of SSE pairs incrementally until all the maximal sets of SSE pairs that are deemed to be similar are found. The maximal common substructures of the two proteins are then formed from these maximal sets. A refinement algorithm is also proposed to fine-tune the alignment from the SSE level to the residue level. Comparison of FAMCS with other methods on various proteins shows that FAMCS can address all four requirements and infer interesting biological discoveries.
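
The level-wise merging described in this abstract can be illustrated with a minimal sketch. It is not the FAMCS implementation: the function name, the `compatible` predicate, and the use of a purely pairwise compatibility test are assumptions made for illustration, whereas the abstract says the method tests the similarity of whole sets of SSE pairs.

```python
def maximal_compatible_sets(pairs, compatible):
    """Level-wise (Apriori-style) search for maximal mutually compatible sets.

    pairs:      list of hashable items (hypothetical labels for SSE pairs).
    compatible: hypothetical predicate; compatible(p, q) is True when the
                two items may belong to the same set.
    """
    # Level 1: every single item is a candidate set.
    level = [frozenset([p]) for p in pairs]
    maximal = []
    while level:
        next_level = set()
        for s in level:
            extended = False
            for p in pairs:
                if p in s:
                    continue
                # Grow the set only if p is compatible with every member.
                if all(compatible(p, q) for q in s):
                    next_level.add(s | {p})
                    extended = True
            # A set that cannot be grown any further is maximal.
            if not extended:
                maximal.append(s)
        level = list(next_level)
    return maximal
```

With a pairwise predicate this enumerates maximal cliques of the compatibility graph, so its cost grows quickly with the number of items; it is meant only to show the incremental grow-and-test pattern.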

2.
To address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D, a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool, designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding.

3.
We report an application of current parallel processing transputer technology which has readily achieved a 25-fold reduction in computational time of peptide-solvent interactions.

4.
RNA tertiary structure is crucial to its many non-coding molecular functions. RNA architecture is shaped by its secondary structure, composed of stems (stacked canonical base pairs) enclosing loops. While stems are precisely captured by free-energy models, loops composed of non-canonical base pairs are not. Nor are distant interactions linking together those secondary structure elements (SSEs). Databases of conserved 3D geometries (a.k.a. modules) not captured by energetic models are leveraged for structure prediction and design, but the computational complexity has limited their study to local elements, namely loops. Representing the RNA structure as a graph has recently allowed this work to be extended to pairs of SSEs, uncovering a hierarchical organization of these 3D modules, at great computational cost. Systematically capturing recurrent patterns on a large scale is a main challenge in the study of RNA structures. In this paper, we present an efficient algorithm to compute maximal isomorphisms in edge-colored graphs. We extend this algorithm to a framework well suited to identifying RNA modules, and fast enough to considerably generalize previous approaches. To exhibit the versatility of our framework, we first reproduce results identifying all common modules spanning more than 2 SSEs, in a few hours instead of weeks. The efficiency of our new algorithm is demonstrated by computing the maximal modules between any pair of entire RNAs in the non-redundant corpus of known RNA 3D structures. We observe that the biggest modules our method uncovers compose large shared substructures spanning hundreds of nucleotides and base pairs between the ribosomes of Thermus thermophilus, Escherichia coli, and Pseudomonas aeruginosa.

5.
G Vriend  C Sander 《Proteins》1991,11(1):52-58
We present a fully automatic algorithm for three-dimensional alignment of protein structures and for the detection of common substructures and structural repeats. Given two proteins, the algorithm first identifies all pairs of structurally similar fragments and subsequently clusters into larger units pairs of fragments that are compatible in three dimensions. The detection of similar substructures is independent of insertion/deletion penalties and can be chosen to be independent of the topology of loop connections and to allow for reversal of chain direction. Using distance geometry filters and other approximations, the algorithm, implemented in the WHAT IF program, is so fast that structural comparison of a single protein with the entire database of known protein structures can be performed routinely on a workstation. The method reproduces known non-trivial superpositions such as plastocyanin on azurin. In addition, we report surprising structural similarity between ubiquitin and a (2Fe-2S) ferredoxin.

6.
7.
In order to understand the behavior of a gene regulatory network, it is essential to know the genes that belong to it. Identifying the correct members (e.g., in order to build a model) is a difficult task even for small subnetworks. Usually only a few members of a network are known and one needs to guess the missing members based on experience or informed speculation. It is beneficial if one can additionally rely on experimental data to support this guess. In this work we present a new method based on formal concept analysis to detect unknown members of a gene regulatory network from gene expression time series data. We show that formal concept analysis is able to find a list of candidate genes for inclusion into a partially known basic network. This list can then be reduced by a statistical analysis so that the resulting genes interact strongly with the basic network and therefore should be included when modeling the network. The method has been applied to the DNA repair system of Mycobacterium tuberculosis. In this application, our method produces comparable results to an already existing method of component selection while it is applicable to a broader range of problems.
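
For readers unfamiliar with formal concept analysis, the following minimal sketch enumerates the formal concepts of a small binary context. It is not the method of this paper: the function name is hypothetical, and how expression time series would be turned into binary attributes (for example "up-regulated at time t") is an assumption that the abstract does not specify.

```python
def formal_concepts(context):
    """Enumerate all formal concepts of a binary context.

    context: dict mapping each object (here, a hypothetical gene name)
             to the set of attributes it has.
    Returns a list of (extent, intent) pairs of frozensets.
    """
    all_attrs = set().union(*context.values())
    # Intents are exactly the intersections of object rows, plus the
    # full attribute set (the intent of the empty extent).
    intents = {frozenset(all_attrs)}
    for attrs in context.values():
        intents |= {frozenset(attrs) & i for i in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that has all attributes of the intent.
        extent = frozenset(o for o, a in context.items() if intent <= a)
        concepts.append((extent, intent))
    return concepts
```

In a setting like the one described, a candidate list could be read off the extents of concepts that also contain the known network members; that reading is an illustration, not a claim about the published procedure.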

8.
The local environment and land usage have changed a lot during the past one hundred years. Historical documents and materials are crucial in understanding and following these changes. Historical documents are, therefore, an important piece in the understanding of the impact and consequences of land usage change. This, in turn, is important in the search for restoration projects that can be conducted to reverse and reduce harmful and unsustainable effects originating from changes in land usage. This work extracts information on the historical location and geographical distribution of wetlands from hand-drawn maps. This is achieved by using deep learning (DL), and more specifically a convolutional neural network (CNN). The CNN model is trained on a manually pre-labelled dataset of historical wetlands in the area of Jönköping county in Sweden. These are all extracted from the historical map called “Generalstabskartan”. The presented CNN performs well and achieves an F1-score of 0.886 when evaluated using 10-fold cross-validation over the data. The trained models are additionally used to generate a GIS layer of the presumable historical geographical distribution of wetlands for the area that is depicted in the southern collection in Generalstabskartan, which covers the southern half of Sweden. This GIS layer is released as an open resource and can be freely used. To summarise, the presented results show that CNNs can be a useful tool in the extraction and digitalisation of non-textual information in historical documents, such as historical maps. A modern GIS material that can be used to further understand past land-usage change is produced within this research. Previously, no material of this detail and extent has been available, due to the large effort needed to create it manually. However, with the presented resource, better quantifications and estimations of historical wetlands that have been lost can be made.
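
As a reminder of what the reported metric measures, the sketch below computes an F1-score from true-positive, false-positive and false-negative counts and averages it over folds. The function names and the choice of averaging per-fold scores are assumptions; the abstract does not say how the 10 folds were aggregated.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f1(fold_counts):
    """Average the per-fold F1 over an iterable of (tp, fp, fn) tuples."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in fold_counts]
    return sum(scores) / len(scores)
```

Pooling the counts across folds before computing a single F1 is an equally common convention and can give a slightly different number.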

9.
The investigation of cell shapes mostly relies on the manual classification of 2D images, causing a subjective and time-consuming evaluation based on a portion of the cell surface. We present a dual-stage neural network architecture for analyzing fine shape details from confocal microscopy recordings in 3D. The system, tested on red blood cells, uses training data from both healthy donors and patients with a congenital blood disease, namely hereditary spherocytosis. Characteristic shape features are revealed from the spherical harmonics spectrum of each cell and are automatically processed to create a reproducible and unbiased shape recognition and classification. The results show the relation between the particular genetic mutation causing the disease and the shape profile. With the obtained 3D phenotypes, we suggest our method for diagnostics and theragnostics of blood diseases. Besides the application employed in this study, our algorithms can be easily adapted for the 3D shape phenotyping of other cell types and extend their use to other applications, such as industrial automated 3D quality control.

10.
Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download.

11.
Grassland birds are among the most globally threatened bird groups due to substantial degradation of native grassland habitats. However, the current network of grassland conservation areas may not be adequate for halting population declines and biodiversity loss. Here, we evaluate a network of grassland conservation areas within Wisconsin, U.S.A., that includes both large Focal Landscapes and smaller targeted conservation areas (e.g., Grassland Bird Conservation Areas, GBCAs) established within them. To date, this conservation network has lacked baseline information to assess whether the current placement of these conservation areas aligns with population hot spots of grassland‐dependent taxa. To do so, we fitted data from thousands of avian point‐count surveys collected by citizen scientists as part of Wisconsin's Breeding Bird Atlas II with multinomial N‐mixture models to estimate habitat–abundance relationships, develop spatially explicit predictions of abundance, and establish ecological baselines within priority conservation areas for a suite of obligate grassland songbirds. Next, we developed spatial randomization tests to evaluate the placement of this conservation network relative to randomly placed conservation networks. Overall, less than 20% of species' statewide populations were found within the current grassland conservation network. Spatial tests demonstrated a high representation of this bird assemblage within the entire conservation network, but with a bias toward birds associated with moderately tall grasses relative to those associated with short grasses or tall grasses. We also found that GBCAs had higher representation at Focal Landscape rather than statewide scales. Here, we demonstrated how combining citizen science data with hierarchical modeling is a powerful tool for estimating ecological baselines and conducting large‐scale evaluations of an existing conservation network for multiple grassland birds. Our flexible spatial randomization approach offers the potential to be applied to other protected area networks and serves as a complementary tool for conservation planning efforts globally.
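
The idea of comparing an existing network with randomly placed ones can be sketched as a simple randomization test. This is only an illustration under strong assumptions: the landscape is a set of grid cells with a predicted abundance each, a random network is a random draw of the same number of cells, and all names are hypothetical. The published tests presumably respect the size and contiguity of real conservation areas, which this sketch ignores.

```python
import random


def randomization_test(abundance, network_cells, n_random=999, seed=0):
    """Compare abundance inside a network with randomly placed networks.

    abundance:     dict mapping cell id -> predicted abundance.
    network_cells: set of cell ids inside the existing network.
    Returns (observed total, one-sided p-value).
    """
    rng = random.Random(seed)
    cells = list(abundance)
    k = len(network_cells)
    observed = sum(abundance[c] for c in network_cells)
    n_at_least = 0
    for _ in range(n_random):
        # A random "network": the same number of cells, placed anywhere.
        sample = rng.sample(cells, k)
        if sum(abundance[c] for c in sample) >= observed:
            n_at_least += 1
    # Add one to numerator and denominator so the p-value is never zero.
    p_value = (n_at_least + 1) / (n_random + 1)
    return observed, p_value
```

A small p-value indicates that the existing network captures more predicted abundance than most random placements of equal size.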

12.
Lauraceae and Fagaceae are two large woody plant families that are predominant in the low- and middle-altitude regions in Taiwan. The high interspecific similarity between some species of these families limits their management and utilization. This work proposes an approach for identifying 15 Lauraceae species and 20 Fagaceae species using leaf images and convolutional neural networks (CNNs). Leaf specimens of the 35 species were collected from the northern, central, and southern parts of Taiwan. Images of the leaves were acquired using flat-bed scanners. Three CNN architectures (DenseNet-121, MobileNet V2, and Xception) were trained. Xception achieved the highest mean test accuracy of 99.39%, and MobileNet V2 required the shortest mean test time of 17.1 ms per image using a GPU. The saliency maps revealed that the characteristics learned by the models matched the leaf features used by botanists. A pruning algorithm, gate decorator, was applied to the trained models, reducing the number of parameters and the number of floating-point operations of MobileNet V2 by 55.4% and 69.1%, respectively, while the model accuracy was maintained at 92.03%. Thus, MobileNet V2 has the potential to be used for identifying the Lauraceae and Fagaceae species on mobile devices.

13.
Fourteen natural products, known to inhibit other proteins of the Zincin-like fold class, were screened for inhibition of the Zincin-like fold metalloprotease thermolysin using mass spectrometry. Fourier Transform Mass Spectrometry was successful in identifying actinonin, a known inhibitor of astacin and stromelysin, as an inhibitor of thermolysin. Molecular modelling studies have shown that specificity within the Zincin-like fold is determined by protein fold topology.

14.
A central challenge in neuroscience is to understand the formation and function of three-dimensional (3D) neuronal networks. In vitro studies have been mainly limited to measurements of small numbers of neurons connected in two dimensions. Here we demonstrate the use of colloids as moveable supports for neuronal growth, maturation, transfection and manipulation, where the colloids serve as guides for the assembly of controlled 3D, millimeter-sized neuronal networks. Process growth can be guided into layered connectivity with a density similar to what is found in vivo. The colloidal superstructures are optically transparent, enabling remote stimulation and recording of neuronal activity using layer-specific expression of light-activated channels and indicator dyes. The modular approach toward in vitro circuit construction provides a stepping stone for applications ranging from basic neuroscience to neuron-based screening of targeted drugs.

15.

Background  

Virtual screening methods are now well established as effective for identifying hit and lead candidates and are fully integrated in most drug discovery programs. Ligand-based approaches make use of physico-chemical, structural and energetic properties of known active compounds to search large chemical libraries for related and novel chemotypes. While 2D-similarity search tools are known to be fast and efficient, 3D-similarity search methods can be very valuable to many research projects, because integrating 3D knowledge can facilitate the identification not only of related molecules but also of chemicals possessing scaffolds distant from that of the query, which makes these methods better suited to scaffold hopping. To date, very few methods performing this task are easily available to the scientific community.

16.
17.
Isotope labeling networks (ILNs) are graphs explaining the flow of isotope-labeled molecules in a metabolic network. Moreover, they are the structural backbone of metabolic flux analysis (MFA) by isotopic tracers, which has been established as a standard experimental tool in fluxomics. To configure an isotope labeling experiment (ILE) for MFA, the structure of the corresponding ILN must be understood to a certain extent even by a practitioner. Graph algorithms help to analyze the network structure but produce rather abstract results. Here, the major obstruction is the high dimension of these networks and the large number of network components which, consequently, are hard to figure out manually. At the interface between theory and experiment, the three-dimensional interactive visualization tool CumoVis has been developed for exploring the network structure in a step-by-step manner. Navigation and orientation within ILNs are supported by exploiting the natural 3D structure of an underlying metabolite network with stacked labeled particles on top of each metabolite node. Network exploration is facilitated by rotating, zooming, forward and backward path tracing and, most importantly, network component reduction. All features of CumoVis are explained with an educational example and a realistic network describing carbon flow in the citric acid cycle.

18.
19.
The crystal-structure-based model of the catalytic center of Ago2 revealed that the siRNA and the mRNA must be able to form an A-helix for correct positioning of the scissile phosphate bond for cleavage in RNAi. This suggests that base pairing of the target mRNA with itself, i.e. secondary structure, must be removed before cleavage. Early on in siRNA design, GC-rich target sites were avoided because of their potential to be involved in strong secondary structure. It is still unclear how important a factor mRNA secondary structure is in RNAi. However, it has been established that a difference in the thermostability of the ends of an siRNA duplex dictates which strand is loaded into the RNA-induced silencing complex. Here, we use a novel secondary structure prediction method and duplex-end differential calculations to investigate the importance of secondary structure in siRNA design. We found that the differential duplex-end stabilities alone account for functional prediction of 60% of the 80 siRNA sites examined, and that secondary structure predictions improve the prediction of site efficacy. A total of 80% of the non-functional sites can be eliminated using secondary structure predictions and the duplex-end differential.

20.
Glioma is a highly aggressive form of brain cancer, with some subtypes having 5-year survival rates of less than 5%. Tumour cell invasion into the surrounding parenchyma seems to be the primary driver of these poor outcomes, as most gliomas recur within 2 cm of the original surgically resected tumour. Many current approaches to the development of anticancer therapy attempt to target genetic weaknesses in a particular cancer, but may not take into account the microenvironment experienced by a tumour and the patient-specific genetic differences in susceptibility to treatment. Here we demonstrate the use of complementary approaches, 3D bioprinting and scaffold-free 3D tissue culture, to examine the invasion of glioma cells into neural-like tissue with 3D confocal microscopy. We found that, while both approaches were successful, the use of 3D tissue culture for organoid development offers the advantage of broad accessibility. As a proof-of-concept of our approach, we developed a system in which we could model the invasion of human glioma cells into mouse neural progenitor cell-derived spheroids. We show that we can follow invasion of human tumour cells using cell-tracking dyes and 3D laser scanning confocal microscopy, both in real time and in fixed samples. We validated these results using conventional cryosectioning. Our scaffold-free 3D approach has broad applicability, as we were easily able to examine invasion using different neural progenitor cell lines, thus mimicking differences that might be observed in patient brain tissue. These results, once applied to iPSC-derived cerebral organoids that incorporate the somatic genetic variability of patients, offer the promise of truly personalized treatments for brain cancer.
