Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Wang S, Zhu J. Biometrics, 2008, 64(2): 440-448
Summary. Variable selection in high-dimensional clustering analysis is an important yet challenging problem. In this article, we propose two methods that simultaneously separate data points into similar clusters and select informative variables that contribute to the clustering. Our methods are in the framework of penalized model-based clustering. Unlike the classical L1-norm penalization, the penalty terms that we propose make use of the fact that parameters belonging to one variable should be treated as a natural "group." Numerical results indicate that the two new methods tend to remove noninformative variables more effectively and provide better clustering results than the L1-norm approach.

2.
X-ray crystallography typically uses a single set of coordinates and B factors to describe macromolecular conformations. Refinement of multiple copies of the entire structure has been previously used in specific cases as an alternative means of representing structural flexibility. Here, we systematically validate this method by using simulated diffraction data, and we find that ensemble refinement produces better representations of the distributions of atomic positions in the simulated structures than single-conformer refinements. Comparison of principal components calculated from the refined ensembles and simulations shows that concerted motions are captured locally, but that correlations dissipate over long distances. Ensemble refinement is also used on 50 experimental structures of varying resolution and leads to decreases in R(free) values, implying that improvements in the representation of flexibility observed for the simulated structures may apply to real structures. These gains are essentially independent of resolution or data-to-parameter ratio, suggesting that even structures at moderate resolution can benefit from ensemble refinement.

3.
4.
Sequence comparison is one of the major tasks in bioinformatics and can serve as evidence of structural and functional conservation, as well as of evolutionary relations. There are several similarity/dissimilarity measures for sequence comparison, but challenges remain. This paper presents a binomial model-based measure to analyze biological sequences. With the help of a random indicator, the occurrence of a word at any position of a sequence can be regarded as a Bernoulli random variable, and the sum of the word occurrences is well known to follow a binomial distribution. By using a recursive formula, we computed the binomial probability of the word count and proposed a binomial model-based measure based on the relative entropy. The proposed measure was tested by extensive experiments, including classification of HEV genotypes and phylogenetic analysis, and further compared with alignment-based and alignment-free measures. The results demonstrate that the proposed measure based on the binomial model is more efficient.
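As an illustration of the two ingredients named in this abstract, the sketch below computes binomial probabilities with the standard recursion P(k+1) = P(k) · (n − k)/(k + 1) · p/(1 − p) and compares two such distributions by relative entropy. The abstract does not give the paper's exact recursion or how the final measure is assembled, so the function names `binomial_pmf` and `relative_entropy` and the way they are combined here are assumptions, not the authors' implementation.

```python
from math import log


def binomial_pmf(n, p):
    """Return [P(X=0), ..., P(X=n)] for X ~ Binomial(n, p).

    Uses the textbook recursion P(k+1) = P(k) * (n-k)/(k+1) * p/(1-p),
    which is assumed here; the paper's own recursion may differ.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    probs = [(1.0 - p) ** n]          # P(X = 0)
    ratio = p / (1.0 - p)
    for k in range(n):
        probs.append(probs[-1] * (n - k) / (k + 1) * ratio)
    return probs


def relative_entropy(p_dist, q_dist):
    """Relative entropy D(P || Q) in nats.

    Terms with P(k) = 0 contribute nothing; Q(k) must be positive
    wherever P(k) is positive.
    """
    return sum(p * log(p / q) for p, q in zip(p_dist, q_dist) if p > 0.0)
```

For example, `relative_entropy(binomial_pmf(20, 0.10), binomial_pmf(20, 0.15))` gives a non-negative dissimilarity between the word-count distributions of two hypothetical sequences in which the same word occurs with per-position probabilities 0.10 and 0.15.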

5.
We draw on Short’s work on Peirce’s theory of signs to propose a new general definition of interpretation. Short argues that Peirce’s semiotics rests on his naturalised teleology. Our proposal extends Short’s work by modifying his definition of interpretation so as to make it more generally applicable to putatively interpretative processes in biological systems. We use our definition as the basis of an account of different kinds of misinterpretation and we discuss some questions raised by the definition by reference to parallel problems in the field of teleosemantics. We propose that interpretative responses fulfilling the criteria of our definition may be made by relatively simple molecular entities and we suggest two specific empirical applications of the definition to experimental work in the field of origin of life research. Our wider aim is to suggest that a well formulated naturalistic definition of interpretation will allow a re-evaluation of the role of semiotic phenomena in biological systems, including the generation of empirically testable hypotheses.

6.

Background  

Alignment of RNA secondary structures is important in studying functional RNA motifs. In recent years, much progress has been made in RNA motif finding and structure alignment. However, existing tools either require a large number of prealigned structures or suffer from high time complexities. This makes it difficult for the tools to process RNAs whose prealigned structures are unavailable or process very large RNA structure databases.

7.
Bacterial chaperonin, GroEL, together with its co-chaperonin, GroES, facilitates the folding of a variety of polypeptides. Experiments suggest that GroEL stimulates protein folding by multiple cycles of binding and release. Misfolded proteins first bind to an exposed hydrophobic surface on GroEL. GroES then encapsulates the substrate and triggers its release into the central cavity of the GroEL/ES complex for folding. In this work, we investigate the possibility of facilitating protein folding in molecular dynamics simulations by mimicking the effects of GroEL/ES, namely repeated binding and release together with spatial confinement. During the binding stage, the (metastable) partially folded proteins are allowed to attach spontaneously to a hydrophobic surface within the simulation box. This destabilizes the structures, which are then transferred into a spatially confined cavity for folding. The approach has been tested by attempting to refine protein structural models generated using the ROSETTA procedure for ab initio structure prediction. Dramatic improvements in the deviation of protein models from the corresponding experimental structures were observed. The results suggest that the primary effects of the GroEL/ES system can be mimicked in a simple coarse-grained manner and be used to facilitate protein folding in molecular dynamics simulations. Furthermore, the results support the assumption that the spatial confinement in GroEL/ES assists the folding of encapsulated proteins.

8.
The novelty of the new human coronavirus COVID-19/SARS-CoV-2 and the lack of effective drugs and vaccines gave rise to a wide variety of strategies employed to fight this worldwide pandemic. Many of these strategies rely on the repositioning of existing drugs, which could shorten the time and reduce the cost compared to de novo drug discovery. In this study, we presented a new network-based algorithm for drug repositioning, called SAveRUNNER (Searching off-lAbel dRUg aNd NEtwoRk), which predicts drug–disease associations by quantifying the interplay between the drug targets and the disease-specific proteins in the human interactome via a novel network-based similarity measure that prioritizes associations between drugs and diseases located in the same network neighborhoods. Specifically, we applied SAveRUNNER on a panel of 14 selected diseases with a consolidated knowledge about their disease-causing genes and that have been found to be related to COVID-19 for genetic similarity (i.e., SARS), comorbidity (e.g., cardiovascular diseases), or for their association to drugs tentatively repurposed to treat COVID-19 (e.g., malaria, HIV, rheumatoid arthritis). Focusing specifically on the SARS subnetwork, we identified 282 repurposable drugs, including some of the most rumored off-label drugs for COVID-19 treatments (e.g., chloroquine, hydroxychloroquine, tocilizumab, heparin), as well as a new combination therapy of 5 drugs (hydroxychloroquine, chloroquine, lopinavir, ritonavir, remdesivir), actually used in clinical practice. Furthermore, to maximize the efficiency of putative downstream validation experiments, we prioritized 24 potential anti-SARS-CoV repurposable drugs based on their network-based similarity values. These top-ranked drugs include ACE-inhibitors, monoclonal antibodies (e.g., anti-IFNγ, anti-TNFα, anti-IL12, anti-IL1β, anti-IL6), and thrombin inhibitors. Finally, our findings were validated in silico by performing a gene set enrichment analysis, which confirmed that most of the network-predicted repurposable drugs may have a potential treatment effect against human coronavirus infections.

9.
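Construction and preliminary application of a minigenome of goose-origin Newcastle disease virus strain ZJ1   Total citations: 3; self-citations: 0; citations by others: 3
Based on the complete genome sequence of goose-origin Newcastle disease virus strain ZJ1, the entire coding region of the virus was replaced with an enhanced green fluorescent protein (eGFP) reporter gene, retaining only the regulatory sequences involved in viral replication, transcription, and virion packaging. The construct was cloned in reverse orientation into the transcription vector TVT7R(0.0) to build a minigenome of this strain. When the minigenome was transfected into Hep_2 cells infected with the helper virus ZJ1, the reporter gene was expressed, indicating that the minigenome RNA could be translated with the NP, P, and L proteins supplied by the helper virus. The NP, P, and L protein genes of the virus were also cloned separately into the eukaryotic expression vector pCI_neo to construct helper plasmids expressing these proteins. The minigenome was then used to verify the function of the expression products of the helper plasmids and to explore the optimal infectious dose of vaccinia virus in the virus rescue procedure. These studies lay a foundation for the successful rescue of the virus and for other related research.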

10.

Background

Measuring similarities between tree structured data is important for analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparison of tree structured data. However, it is known that computation of the edit distance for rooted unordered trees is NP-hard. Furthermore, there is almost no available software tool that can compute the exact edit distance for unordered trees.

Results

In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem and then efficient solvers for the maximum clique problem are applied. We applied the proposed method to similar structure search for glycan structures. The result suggests that our proposed method can efficiently compute the edit distance for moderate-size unordered trees. It also suggests that the accuracy of the proposed method is comparable to that of the edit distance for ordered trees and of an existing method for glycan search.

Conclusions

The proposed method is simple but useful for computation of the edit distance between unordered trees. The object code is available upon request.

11.
We have developed a method of searching for similar spatial arrangements of atoms around a given chemical moiety in proteins that bind a common ligand. The first step in this method is to consider a set of atoms that closely surround a given chemical moiety. Then, to compare the spatial arrangements of such surrounding atoms in different proteins, they are translated and rotated so that the chemical moieties are superposed on each other. Spatial arrangements of surrounding atoms in a pair of proteins are judged to be similar, when there are many corresponding atoms occupying similar spatial positions. Because the method focuses on the arrangements of surrounding atoms, it can detect structural similarities of binding sites in proteins that are dissimilar in their amino acid sequences or in their chain folds. We have applied this method to identify modes of nucleotide base recognition by proteins. An all-against-all comparison of the arrangements of atoms surrounding adenine moieties revealed an unexpected structural similarity between protein kinases, cAMP-dependent protein kinase (cAPK), and casein kinase-1 (CK1), and D-Ala:D-Ala ligase (DD-ligase) at their adenine-binding sites, despite a lack of similarity in their chain folds. The similar local structure consists of a four-residue segment and three sequentially separated residues. In particular the four-residue segments of these enzymes were found to have nearly identical conformations in their backbone parts, which are involved in the recognition of adenine. This common local structure was also found in substrate-free three-dimensional structures of other proteins that are similar to DD-ligase in the chain fold and of other protein kinases. As the proteins with different folds were found to share a common local structure, these proteins seem to constitute a remarkable example of convergent evolution for the same recognition mechanism. Received: 9 December 1996 / Accepted: 7 February 1997

12.
We describe an algorithm for finding nucleotide residues strongly correlated with the amino acid acceptor functions of transfer RNAs. The algorithm exploits the fact that each tRNA accepts only one of 20 amino acids. The algorithm is applied to 37 Saccharomyces cerevisiae transfer RNAs. Received on January 28, 1987

13.
Robinson H, Wang A H. Biochemistry, 1992, 31(13): 3524-3533
We have developed a simple and quantitative procedure (SPEDREF) for the refinement of DNA structures using experimental two-dimensional nuclear Overhauser effect (2D NOE) data. The procedure calculates the simulated 2D NOE spectrum using the full matrix relaxation method on the basis of a molecular model. The volume of all NOE peaks is measured and compared between the experimental and the calculated spectra. The difference between the experimental and simulated volumes is minimized by a conjugate gradient procedure to adjust the interproton distances in the model. An agreement factor (analogous to the crystallographic R-factor) is used to monitor the progress of the refinement. The agreement is considered to be complete when several parameters, including the R-factor, the energy associated with the molecule, the local conformation (as judged by the sugar pseudorotation), and the global conformation (as judged by the helical x-displacement), are refined to their respective convergence. With the B-DNA structure of d(CGATCG) as an example, we show that DNA structure may be refined to produce calculated NOE spectra that are in excellent agreement with the experimental 2D NOE spectra. This is judged to be effective by the low R-factor of approximately 15%. Moreover, we demonstrate that not only are NOE data very powerful in providing details of the local structure but, with appropriate weighting of the NOE constraints, the global structure of the DNA double helix can also be determined, even when starting with a grossly different model. The reliability and limitations of a DNA structure as determined by NMR spectroscopy are discussed.
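The abstract says only that the agreement factor is "analogous to the crystallographic R-factor". A minimal sketch under that assumption is shown below, using the form R = Σ|V_exp − V_calc| / Σ|V_exp| over matched NOE peak volumes. The function name `agreement_factor` and the exact normalization are hypothetical; SPEDREF may weight or normalize peaks differently.

```python
def agreement_factor(experimental, calculated):
    """R-factor-style agreement between matched peak volumes.

    Assumed form: sum(|V_exp - V_calc|) / sum(|V_exp|).
    Both sequences must list the same peaks in the same order.
    """
    if len(experimental) != len(calculated):
        raise ValueError("peak lists must have the same length")
    denominator = sum(abs(e) for e in experimental)
    if denominator == 0:
        raise ValueError("experimental volumes sum to zero")
    numerator = sum(abs(e - c) for e, c in zip(experimental, calculated))
    return numerator / denominator
```

With this definition, a value of 0.15 would correspond to the "approximately 15%" quoted in the abstract, and a refinement loop would recompute it after each adjustment of the model.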

14.
15.
16.
A method is described for predicting and solving crystal structures of linear homopolysaccharides. The method is based on the refinement of the structure with respect to either stereochemical constraints or X-ray diffraction intensities. In the refinement process, all conformational and packing features of the molecule, such as bond lengths, bond angles, conformational angles, nonbonded contacts, hydrogen bonds, etc., can be allowed to vary until the structure reaches both a conformation and crystalline packing that are in minimum disagreement with the stereochemical restraints and the diffraction data. In this fashion, both packing and conformational features of the structure can be simultaneously refined, and not separately as has been the custom in the past. The refinement procedure is based on a method of constrained optimization which possesses improved characteristics of reaching a solution and avoiding false minima, in comparison with least squares methods. The procedure is, in addition, capable of easily finding molecules of solvent of crystallization. The method was applied to further refining the previously solved crystal structure of V-amylose. The results indicated that contrary to the previously found six-fold molecular symmetry in the P2₁2₁2₁ space group, the V-amylose molecule exhibits only two-fold symmetry with the asymmetric unit consisting of three glucose residues in one-half turn of the helix. The three residues are nonequivalent principally due to unequal rotational positions of the hydroxymethyl groups. The crystal structure of V-amylose predicted from stereochemical refinement was identical in all details with that obtained from refining against X-ray data. The excellent agreement with the diffraction data was indicated by the crystallographic disagreement index R = 0.25.

17.
18.
The availability of computerized knowledge on biochemical pathways in the KEGG database opens new opportunities for developing computational methods to characterize and understand higher level functions of complete genomes. Our approach is based on the concept of graphs; for example, the genome is a graph with genes as nodes and the pathway is another graph with gene products as nodes. We have developed a simple method for graph comparison to identify local similarities, termed correlated clusters, between two graphs, which allows gaps and mismatches of nodes and edges and is especially suitable for detecting biological features. The method was applied to a comparison of the complete genomes of 10 microorganisms and the KEGG metabolic pathways, which revealed, not surprisingly, a tendency for formation of correlated clusters called FRECs (functionally related enzyme clusters). However, this tendency varied considerably depending on the organism. The relative number of enzymes in FRECs was close to 50% for Bacillus subtilis and Escherichia coli, but was <10% for Synechocystis and Saccharomyces cerevisiae. The FRECs collection is reorganized into a collection of ortholog group tables in KEGG, which represents conserved pathway motifs with the information about gene clusters in all the completely sequenced genomes.

19.
Organisms orient themselves to a stimulus by two general methods. One method is by directed orientation (taxis); the other is by undirected locomotory reaction (kinesis). An equation, and the methods for finding the necessary parameters of this equation, is derived for the distribution of organisms within a container, with the following limitations: (1) the organisms have no accommodation, (2) they are always active, and (3) the stimulus changes slowly with position. Necessary modifications of the equation are then derived, so that the last two limitations may be eliminated. The equation cannot be solved exactly because of its complexity; hence an approximation method must be used. This method is discussed, an approximate solution is found, and a time constant for equilibrium to be established is derived. Applications to various experiments in the literature are then made with fairly satisfactory results. A new interpretation of the theory of klino-kinesis with accommodation is found upon application of the equations developed to experimental work. Further limitations and uses of these equations are then discussed. This work was done while the author was Public Health Service Research Fellow of The National Institute of Mental Health, Federal Security Agency.

20.
A parallel algorithm for estimating the secondary structure of an RNA molecule is presented in this paper. The mathematical problem of computing an optimal folding based on free-energy minimization is mapped onto a graph planarization problem. In the planarization problem we want to maximize the number of edges in a plane with no two edges crossing each other. To solve a sequence of n bases, n(n − 1)/2 processing elements are used in our algorithm.
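The count n(n − 1)/2 matches the number of unordered position pairs (i, j) with i < j, that is, one processing element per candidate base pair. The sketch below enumerates those pairs and tests whether two pairs cross when drawn as arcs on the same side of the sequence line. Both helper names and the single-side crossing criterion are assumptions for illustration; the abstract does not describe the paper's processing-element dynamics or its exact crossing rule.

```python
from itertools import combinations


def candidate_pairs(n):
    """All unordered position pairs (i, j), i < j: n*(n-1)/2 of them."""
    return list(combinations(range(n), 2))


def edges_cross(a, b):
    """True if arcs a and b, drawn on the same side of the sequence, cross.

    Two arcs (i, j) and (k, l) cross exactly when i < k < j < l after
    ordering them by their left endpoint (assumed criterion).
    """
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l
```

For a sequence of 5 bases, `candidate_pairs(5)` yields 10 pairs; `edges_cross((0, 2), (1, 3))` is true, while the nested pairs `(0, 3)` and `(1, 2)` do not cross.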
