Found 20 similar documents; search took 15 ms
1.
Information on protein quaternary structure can help in understanding the biological functions of proteins. Because wet-lab experiments are both time-consuming and costly, we adopt a novel computational approach to assign proteins to 10 kinds of quaternary structure. Each protein was encoded using its biochemical and physicochemical properties, and feature selection was carried out using the Incremental Feature Selection (IFS) method. The resulting optimal feature set consisted of 97 features, with which the prediction model was built. The overall prediction success rate is 74.90% as evaluated by the jackknife test, much higher than the 10% (1/10) expected from random guessing. Further feature analysis indicates that protein secondary structure contributes most to the prediction of protein quaternary structure.
2.
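Similarity (or dissimilarity) analysis of biological sequences is an important method in bioinformatics research. This paper covers alignment-based methods for biological sequence similarity analysis, focusing on comparison methods based on hidden Markov models, and compares the advantages and disadvantages of the various alignment-based sequence analysis methods.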
3.
A bilateral similarity function is designed in this paper for analyzing the similarities of biological sequences such as DNA, RNA secondary structure, or protein. The function compares sequences comprehensively, in terms of both the Hamming distance between the two sequences and the corresponding location differences. Compared with existing methods for similarity analysis, the examination of similarities and dissimilarities shows that the proposed method, with a computational complexity of O(N), is effective for these three kinds of biological sequences and applies universally to them.
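The abstract above does not define the bilateral similarity function itself, so the following is only a minimal sketch of the two ingredients it names: the Hamming distance between two equal-length sequences and the positions at which they differ. The function names `hamming_distance` and `mismatch_positions` are hypothetical and are not taken from the paper. Both run in a single O(N) pass.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # One pass over the paired symbols, so the cost is O(N).
    return sum(x != y for x, y in zip(a, b))


def mismatch_positions(a, b):
    """Return the 0-based indices at which the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

For example, `hamming_distance("GATTACA", "GACTATA")` counts two mismatches, and `mismatch_positions` reports where they occur.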
4.
5.
Purpose
For compliance with the ISO standard 14044, comparative life cycle assessments are required to address data quality for time-related coverage, geographic coverage, technology coverage, precision, completeness, representativeness, consistency, reproducibility, sources of the data and uncertainty of the information. As the community of practitioners and data developers grows, the purpose of this commentary is to initiate discussion of current issues and opportunities for improvement in data quality analysis.
6.
The eosinophil cationic protein (ECP) is a small polypeptide that originates from activated eosinophil granulocytes. A wide range of stimuli has been shown to induce the secretion of ECP. The gene that encodes the human ECP is located on chromosome 14, and the protein shares the overall three-dimensional structure and the RNase active-site residues with other proteins in the RNase A superfamily. Several single-nucleotide polymorphisms in the human ECP gene have been described to date. ECP has many biological functions, including an immunoregulatory function, the regulation of fibroblast activity, and the induction of mucus secretion in the airway. Additionally, the protein is a potent cytotoxic molecule and has the capacity to kill mammalian and nonmammalian cells. The purpose of this article was to review the known biological and genetic characteristics of ECP that contribute to the understanding of this protein's role in the development and progression of a wide variety of diseases.
7.
Gerard Such-Sanmartín, Simone Sidoli, Estela Ventura-Espejo, Ole N. Jensen 《Biochemical and biophysical research communications》2014
We introduce the computer tool “Know Your Samples” (KYSS) for assessment and visualisation of large scale proteomics datasets, obtained by mass spectrometry (MS) experiments. KYSS facilitates the evaluation of sample preparation protocols, LC peptide separation, and MS and MS/MS performance by monitoring the number of missed cleavages, precursor ion charge states, number of protein identifications and peptide mass error in experiments. KYSS generates several different protein profiles based on protein abundances, and allows for comparative analysis of multiple experiments. KYSS was adapted for blood plasma proteomics and provides concentrations of identified plasma proteins. We demonstrate the utility of the KYSS tool for MS based proteome analysis of blood plasma and for assessment of hydrogel particles for depletion of abundant proteins in plasma. The KYSS software is open source and is freely available at http://kyssproject.github.io/.
8.
9.
10.
11.
Rocke DM 《Seminars in cell & developmental biology》2004,15(6):703-713
The design and analysis of experiments using gene expression microarrays is a topic of considerable current research, and work is beginning to appear on the analysis of proteomics and metabolomics data by mass spectrometry and NMR spectroscopy. The literature in this area is evolving rapidly, and commercial software for analysis of array or proteomics data is rarely up to date, and is essentially nonexistent for metabolomics data. In this paper, I review some of the issues that should concern any biologists planning to use such high-throughput biological assay data in an experimental investigation. Technical details are kept to a minimum, and may be found in the referenced literature, as well as in the many excellent papers which space limitations prevent my describing. There are usually a number of viable options for design and analysis of such experiments, but unfortunately, there are even more non-viable ones that have been used even in the published literature. This is an area in which up-to-date knowledge of the literature is indispensable for efficient and effective design and analysis of these experiments. In general, we concentrate on relatively simple analyses, often focusing on identifying differentially expressed genes and the comparable issues in mass spectrometry and NMR spectroscopy (consistent differences in peak heights or areas for example). Complex multivariate and pattern recognition methods also need much attention, but the issues we describe in this paper must be dealt with first. The literature on analysis of proteomics and metabolomics data is as yet sparse, so the main focus of this paper will be on methods devised for analysis of gene expression data that generalize to proteomics and metabolomics, with some specific comments near the end on analysis of metabolomics data by mass spectrometry and NMR spectroscopy.
12.
Metabolite fingerprinting: detecting biological features by independent component analysis (total citations: 18; self-citations: 0; citations by others: 18)
Scholz M Gatzek S Sterling A Fiehn O Selbig J 《Bioinformatics (Oxford, England)》2004,20(15):2447-2454
MOTIVATION: Metabolite fingerprinting is a technology for providing information from spectra of total compositions of metabolites. Here, spectra acquisition by microchip-based nanoflow-direct-infusion QTOF mass spectrometry, a simple and high-throughput technique, is tested for its informative power. As a simple test case we are using Arabidopsis thaliana crosses. The question is how metabolite fingerprinting reflects the biological background. In many applications the classical principal component analysis (PCA) is used for detecting relevant information. Here a modern alternative is introduced: independent component analysis (ICA). Due to its independence condition, ICA is more suitable for our questions than PCA. However, ICA has not been developed for a small number of high-dimensional samples; therefore a strategy is needed to overcome this limitation. RESULTS: To apply ICA successfully it is essential first to reduce the high dimension of the dataset by using PCA. The number of principal components significantly determines the quality of ICA; therefore we propose a criterion for estimating the optimal dimension automatically. The kurtosis measure is used to order the extracted components according to our interest. Applied to our A. thaliana data, ICA detects three relevant factors, two biological and one technical, and clearly outperforms PCA.
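The ordering step mentioned in the abstract above can be illustrated with a small sketch that ranks already-extracted components by their excess kurtosis. This is not the authors' implementation: the PCA and ICA steps are omitted, the names `excess_kurtosis` and `order_by_kurtosis` are hypothetical, and the sort direction is left as a parameter because the abstract does not say which end of the kurtosis scale counts as interesting.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment / variance^2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant component")
    return m4 / (m2 ** 2) - 3.0


def order_by_kurtosis(components, descending=False):
    """Sort component names by the excess kurtosis of their scores.

    components: dict mapping a component name to its list of sample scores.
    descending: assumption-level switch; the source does not fix a direction.
    """
    return sorted(
        components,
        key=lambda name: excess_kurtosis(components[name]),
        reverse=descending,
    )
```

With the default ascending order, a component whose scores split into two groups (strongly negative excess kurtosis) is listed before a component with a few outlying scores.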
13.
Mining of biological data I: identifying discriminating features via mean hypothesis testing (total citations: 1; self-citations: 0; citations by others: 1)
Large volumes of data are routinely collected during bioprocess operations and, more recently, in basic biological research using genomics-based technologies. While these data often lack sufficient detail to be used for mechanism identification, it is possible that the underlying mechanisms affecting cell phenotype or process outcome are reflected as specific patterns in the overall or temporal sensor logs. This raises the possibility of identifying outcome-specific fingerprints that can be used for process or phenotype classification and the identification of discriminating characteristics, such as specific genes or process variables. The aim of this work is to provide a systematic approach to identifying and modeling patterns in historical records and using this information for process classification. This approach differs from others in that emphasis is placed on analyzing the data structure first and thereby extracting potentially relevant features prior to model creation. The first step in this overall approach is to identify the discriminating features of the relevant measurements and time windows, which can then be used to discriminate among different classes of process behavior. This is achieved via a mean hypothesis testing algorithm. Next, the homogeneity of the multivariate data in each class is explored via a novel cluster analysis technique called PC1 Time Series Clustering to ensure that the data subsets used accurately reflect the variability displayed in the historical records. This will be the topic of the second paper in this series. We present here the method for identifying discriminating features in data via mean hypothesis testing, along with results from the analysis of case studies from industrial fermentations. Copyright 2000 Academic Press.
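The abstract above does not spell out its mean hypothesis testing algorithm, so the sketch below substitutes a standard stand-in: Welch's two-sample t statistic computed per feature, with features ranked by the absolute value of the statistic. The names `welch_t` and `rank_features`, and the choice of Welch's test, are assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means of two samples.

    Each sample needs at least two values, and the two samples must not
    both have zero variance (the denominator would be zero).
    """
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)
    return (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)


def rank_features(class_a, class_b):
    """Rank features by |t|, most discriminating first.

    class_a, class_b: dicts mapping a feature name to its measurements
    in each class; both dicts are assumed to share the same keys.
    """
    scores = {
        name: abs(welch_t(class_a[name], class_b[name])) for name in class_a
    }
    return sorted(scores, key=scores.get, reverse=True)
```

A feature whose class means are far apart relative to its within-class spread ranks first; a real analysis would also convert the statistic to a p-value and correct for the number of features tested.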
14.
From pull-down data to protein interaction networks and complexes with biological relevance (total citations: 1; self-citations: 0; citations by others: 1)
Motivation: Recent improvements in high-throughput Mass Spectrometry (MS) technology have expedited genome-wide discovery of protein–protein interactions by providing a capability of detecting protein complexes in a physiological setting. Computational inference of protein interaction networks and protein complexes from MS data are challenging. Advances are required in developing robust and seamlessly integrated procedures for assessment of protein–protein interaction affinities, mathematical representation of protein interaction networks, discovery of protein complexes and evaluation of their biological relevance. Results: A multi-step but easy-to-follow framework for identifying protein complexes from MS pull-down data is introduced. It assesses interaction affinity between two proteins based on similarity of their co-purification patterns derived from MS data. It constructs a protein interaction network by adopting a knowledge-guided threshold selection method. Based on the network, it identifies protein complexes and infers their core components using a graph-theoretical approach. It deploys a statistical evaluation procedure to assess biological relevance of each found complex. On Saccharomyces cerevisiae pull-down data, the framework outperformed other more complicated schemes by at least 10% in F1-measure and identified 610 protein complexes with high-functional homogeneity based on the enrichment in Gene Ontology (GO) annotation. Manual examination of the complexes brought forward the hypotheses on cause of false identifications. Namely, co-purification of different protein complexes as mediated by a common non-protein molecule, such as DNA, might be a source of false positives. Protein identification bias in pull-down technology, such as the hydrophilic bias could result in false negatives. Contact: samatovan{at}ornl.gov Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Jonathan Wren
Present address: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
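The F1-measure cited in the abstract above is the harmonic mean of precision and recall. The sketch below computes it for exact set overlap between predicted and reference items. How the paper matched predicted complexes to reference complexes is not stated in the abstract, so the exact-match rule and the name `f1_measure` are assumptions.

```python
def f1_measure(predicted, reference):
    """Harmonic mean of precision and recall for two collections of items."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For instance, with three predicted interactions of which two appear among four reference interactions, precision is 2/3, recall is 1/2, and the F1-measure is 4/7.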
15.
The "allometric cancellation" technique for determining similarity criteria (dimensionless numbers) ensures that these ratios are essentially free from size-dependent variation. However, with the traditional methods of calculating such values, other important sources of variation are not examined. A correlation analysis of residuals demonstrates that many similarity criteria are actually highly variable relationships among organisms and frequently have questionable empirical validity.
16.
Biclustering algorithms for biological data analysis: a survey (total citations: 7; self-citations: 0; citations by others: 7)
Madeira SC Oliveira AL 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2004,1(1):24-45
A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
17.
A unified theory of biological similarity is proposed, based on dimensional analysis (mass M, length L, diameter D, time T) and on three postulates: (1) the constancy of body density in terrestrial mammals; (2) the elastic similarity criterion (Rashevsky and McMahon) where ; and (3) the proportionality between length (L) and time (T), which is valid for relaxation oscillators. The postulated theoretical model provides a satisfactory correlation (r = 0.9937) between the predicted reduced exponent (b) and 96 allometric exponents (b) obtained from experimental data concerning a number of morphologic and physiologic parameters in animals of different size. The reformulation of a theory of biological similarity is posited mainly for the “internal” organization of organisms, whereas a “mechanical” similarity should be applied when inertial forces are present during animal locomotion (kinematics). Since biological rhythmicity is based on relaxation oscillators (T ∝ L), while in “mechanical” similarity the pendulum () is the paradigm of a self-sustained oscillation, these two are the limits of a continuous spectrum of similarity criteria, where the exponent of the time dimension (T) is the essential factor. The four-dimensional nature of biological space is discussed (W ∝ L^4), and due to the postulated isometry of length (L) and time (T), periodic phenomena conform to .
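As a worked step, two relations stated in the abstract above can be combined: the four-dimensional scaling of body weight with length and the isometry of length and time. The derivation below uses only those two relations, reading the source's "L4" as the fourth power of L. It is a consequence of the stated postulates, not a reconstruction of the formulas missing from the abstract text.

```latex
\begin{align*}
W &\propto L^{4} &&\Longrightarrow\quad L \propto W^{1/4}, \\
T &\propto L     &&\Longrightarrow\quad T \propto W^{1/4}.
\end{align*}
```

Under these two assumptions, the period of a biological rhythm would scale with the one-quarter power of body weight.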
18.
19.
20.
A primary objective in quantitative risk or safety assessment is characterization of the severity and likelihood of an adverse effect caused by a chemical toxin or pharmaceutical agent. In many cases data are not available at low doses or low exposures to the agent, and inferences at those doses must be based on the high-dose data. A modern method for making low-dose inferences is known as benchmark analysis, where attention centers on the dose at which a fixed benchmark level of risk is achieved. Both upper confidence limits on the risk and lower confidence limits on the "benchmark dose" are of interest. In practice, a number of possible benchmark risks may be under study; if so, corrections must be applied to adjust the limits for multiplicity. In this short note, we discuss approaches for doing so with quantal response data.