Found 20 similar articles; search took 15 ms
1.
Rey-Long Liu 《PloS one》2015,10(10)
Biomedical literature is an essential source of biomedical evidence. To translate the evidence for biomedicine study, researchers often need to carefully read multiple articles about specific biomedical issues. These articles thus need to be highly related to each other. They should share similar core contents, including research goals, methods, and findings. However, given an article r, it is challenging for search engines to retrieve highly related articles for r. In this paper, we present a technique PBC (Passage-based Bibliographic Coupling) that estimates inter-article similarity by seamlessly integrating bibliographic coupling with the information collected from context passages around important out-link citations (references) in each article. Empirical evaluation shows that PBC can significantly improve the retrieval of those articles that biomedical experts believe to be highly related to specific articles about gene-disease associations. PBC can thus be used to improve search engines in retrieving the highly related articles for any given article r, even when r is cited by very few (or even no) articles. The contribution is essential for those researchers and text mining systems that aim at cross-validating the evidence about specific gene-disease associations.
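As a point of reference for the abstract above, plain bibliographic coupling can be sketched in a few lines. This is not PBC itself: the passage-based weighting is not specified in the abstract, so the sketch only counts shared references. The function names and the reference keys are hypothetical.

```python
def coupling_strength(refs_a, refs_b):
    """Number of references two articles share (classic bibliographic coupling)."""
    return len(set(refs_a) & set(refs_b))


def coupling_similarity(refs_a, refs_b):
    """Shared references normalized by the union of both reference lists (Jaccard)."""
    a, b = set(refs_a), set(refs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical reference lists, identified by made-up keys.
article_r = ["ref1", "ref2", "ref3", "ref4"]
candidate = ["ref2", "ref3", "ref5"]

print(coupling_strength(article_r, candidate))    # 2 shared references
print(coupling_similarity(article_r, candidate))  # 2 shared / 5 distinct = 0.4
```

Because the score depends only on each article's own reference list, it is available even for an article that nobody has cited yet, which is the situation the abstract highlights.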
2.
3.
Semantic Similarity in Biomedical Ontologies  Total citations: 1, self-citations: 0, citations by others: 1
Catia Pesquita Daniel Faria André O. Falcão Phillip Lord Francisco M. Couto 《PLoS computational biology》2009,5(7)
In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization.
4.
In the present paper we investigate the concept of equidirection, i.e. similarity in the direction of variation, or parallelism in the broader sense, among m (m ≧ 2) time series, especially under the assumption that the time series are realizations of processes with independent increments. However, the processes need not be stationary. Furthermore, the probabilities for the direction of variation may be unstable, in which case only upper and lower bounds are known. A measure based on the concept of equidirection was developed that enables identification of clusters of similar time series and analysis of relationships among variables.
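The abstract does not give the measure itself, so the following is only an illustrative sketch of the idea of equidirection: it reports the share of time steps at which all m series change in the same direction. The function name, the handling of ties (a zero change counts as disagreement), and the sample data are assumptions, not the authors' definition.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def equidirection_rate(series):
    """Share of time steps at which all series move in the same non-zero direction."""
    if len(series) < 2:
        raise ValueError("need at least two time series")
    length = len(series[0])
    if length < 2 or any(len(s) != length for s in series):
        raise ValueError("series must share a common length of at least 2")
    agree = 0
    for t in range(1, length):
        directions = {sign(s[t] - s[t - 1]) for s in series}
        # Agreement means exactly one direction is present and it is not "no change".
        if len(directions) == 1 and 0 not in directions:
            agree += 1
    return agree / (length - 1)


# Hypothetical data: three series that rise and fall together except at the last step.
a = [1, 2, 3, 2, 4]
b = [10, 12, 15, 11, 11]
c = [0, 1, 5, 3, 9]
print(equidirection_rate([a, b, c]))  # 0.75
```

Only the signs of the increments are used, so the series may differ freely in scale and level.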
5.
Background
Numerous completely sequenced bacterial genomes harbor prophage elements. These elements have been implicated in increasing the virulence of the host and in phage immunity. The e14 element is a defective lambdoid prophage element present at 25 min in the Escherichia coli K-12 genome. e14 is a well-characterized prophage element and has been subjected to in-depth bioinformatic analysis.
6.
In the present paper, we have created several novel journal similarity metrics. The MeSH odds ratio measures the topical similarity of any pair of journals, based on the major MeSH headings assigned to articles in MEDLINE. The second metric employed the 2009 Author-ity author name disambiguation dataset as a gold standard for estimating the author odds ratio. This gives a straightforward, intuitive answer to the question: Given two articles in PubMed that share the same author name (lastname, first initial), how does knowing only the identity of the journals (in which the articles were published) predict the relative likelihood that they are written by the same person vs. different persons? The article pair odds ratio detects the tendency of authors to publish repeatedly in the same journal, as well as in specific pairs of journals. The metrics can be applied not only to estimate the similarity of a pair of journals, but to provide novel profiles of individual journals as well. For example, for each journal, one can define the MeSH cloud as the number of other journals that are topically more similar to it than expected by chance, and the author cloud as the number of other journals that share more authors than expected by chance. These metrics for journal pairs and individual journals have been provided in the form of public datasets that can be readily studied and utilized by others.
7.
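Image registration is important in clinical diagnosis, and many methods have been proposed to address it. Using regional similarity as the matching measure, this paper applies an improved segmentation method together with the Powell optimization algorithm to register multimodal CT/PET medical images. Experimental results show that the algorithm is easy to implement, registers quickly and accurately, and is fairly robust.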
8.
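To register medical images, this paper proposes a new Adaptive Exponential Weighted Mutual Information (AEWMI) measure. Analysis shows that exponentially weighting the Mutual Information (MI) measure can improve the peak sharpness and smoothness of the measure curve, and that the exponent weight can be determined adaptively by assessing the quality and resolution of the images to be registered. Simulation results confirm the analysis and also show that a registration scheme based on the proposed AEWMI measure is highly robust to image noise, resolution differences, and similar factors, and can effectively raise the registration success rate.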
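The abstract builds on the ordinary mutual information (MI) measure, which can be sketched as follows. This is plain MI computed from the joint intensity histogram of two images; the exponential weighting of AEWMI is not specified in the abstract and is deliberately not reproduced. The function name and the tiny flattened "images" are hypothetical.

```python
import math
from collections import Counter


def mutual_information(img_a, img_b):
    """Mutual information (in bits) of two equally sized images given as flat intensity lists."""
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and of equal size")
    n = len(img_a)
    joint = Counter(zip(img_a, img_b))  # joint intensity histogram
    marg_a = Counter(img_a)             # marginal histogram of image A
    marg_b = Counter(img_b)             # marginal histogram of image B
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (marg_a[a] * marg_b[b]))
    return mi


# Hypothetical four-pixel images: intensities differ between modalities,
# but the aligned pair has a one-to-one intensity correspondence.
fixed = [0, 0, 1, 1]
aligned = [5, 5, 9, 9]
misaligned = [5, 9, 5, 9]
print(mutual_information(fixed, aligned))     # 1.0
print(mutual_information(fixed, misaligned))  # 0.0
```

A registration scheme would evaluate such a measure for each candidate transformation and keep the one with the highest value, which is why the sharpness and smoothness of the measure curve matter.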
9.
Jason Robert Potas Newton Gonçalves de Castro Ted Maddess Marcio Nogueira de Souza 《PloS one》2015,10(9)
Experimental electrophysiological assessment of evoked responses from regenerating nerves is challenging due to the typical complex response of events dispersed over various latencies and poor signal-to-noise ratio. Our objective was to automate the detection of compound action potential events and derive their latencies and magnitudes using a simple cross-correlation template comparison approach. For this, we developed an algorithm called Waveform Similarity Analysis. To test the algorithm, challenging signals were generated in vivo by stimulating sural and sciatic nerves, whilst recording evoked potentials at the sciatic nerve and tibialis anterior muscle, respectively, in animals recovering from sciatic nerve transection. Our template for the algorithm was generated based on responses evoked from the intact side. We also simulated noisy signals and examined the output of the Waveform Similarity Analysis algorithm with imperfect templates. Signals were detected and quantified using Waveform Similarity Analysis, which was compared to event detection, latency and magnitude measurements of the same signals performed by a trained observer, a process we called Trained Eye Analysis. The Waveform Similarity Analysis algorithm could successfully detect and quantify simple or complex responses from nerve and muscle compound action potentials of intact or regenerated nerves. Incorrectly specifying the template outperformed Trained Eye Analysis for predicting signal amplitude, but produced consistent latency errors for the simulated signals examined. Compared to the trained eye, Waveform Similarity Analysis is automatic, objective, does not rely on the observer to identify and/or measure peaks, and can detect small clustered events even when signal-to-noise ratio is poor. 
Waveform Similarity Analysis provides a simple, reliable and convenient approach to quantify latencies and magnitudes of complex waveforms and therefore serves as a useful tool for studying evoked compound action potentials in neural regeneration studies.
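The cross-correlation template comparison that the abstract describes can be illustrated with a generic sliding normalized cross-correlation. This is a minimal sketch of the general technique, not the published Waveform Similarity Analysis algorithm; the function name, template, and signal are hypothetical.

```python
import math


def normalized_xcorr(signal, template):
    """Pearson correlation between the template and each same-length window of the signal."""
    m = len(template)
    t_mean = sum(template) / m
    t_dev = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for start in range(len(signal) - m + 1):
        window = signal[start:start + m]
        w_mean = sum(window) / m
        w_dev = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if t_norm == 0 or w_norm == 0:
            scores.append(0.0)  # flat window or template: correlation undefined, report 0
        else:
            scores.append(sum(a * b for a, b in zip(w_dev, t_dev)) / (w_norm * t_norm))
    return scores


# Hypothetical data: the event is a scaled copy of the template starting at sample 2.
template = [0.0, 1.0, 0.0, -1.0]
signal = [0.0, 0.0, 0.0, 2.0, 0.0, -2.0, 0.0, 0.0]
scores = normalized_xcorr(signal, template)
latency = max(range(len(scores)), key=scores.__getitem__)
print(latency)  # 2
```

Because each window is mean-centered and normalized, the score reflects waveform shape only; the event magnitude would have to be measured separately, for example from the raw window at the detected latency.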
10.
Courtney Schiffman Christina Lin Funan Shi Luonan Chen Lydia Sohn Haiyan Huang 《Statistics in biosciences》2017,9(1):200-216
One goal of single-cell RNA sequencing (scRNA seq) is to expose possible heterogeneity within cell populations due to meaningful, biological variation. Examining cell-to-cell heterogeneity, and further, identifying subpopulations of cells based on scRNA seq data has been of common interest in life science research. A key component to successfully identifying cell subpopulations (or clustering cells) is the (dis)similarity measure used to group the cells. In this paper, we introduce a novel measure, named SIDEseq, to assess cell-to-cell similarity using scRNA seq data. SIDEseq first identifies a list of putative differentially expressed (DE) genes for each pair of cells. SIDEseq then integrates the information from all the DE gene lists (corresponding to all pairs of cells) to build a similarity measure between two cells. SIDEseq can be implemented in any clustering algorithm that requires a (dis)similarity matrix. This new measure incorporates information from all cells when evaluating the similarity between any two cells, a characteristic not commonly found in existing (dis)similarity measures. This property is advantageous for two reasons: (a) borrowing information from cells of different subpopulations allows for the investigation of pairwise cell relationships from a global perspective and (b) information from other cells of the same subpopulation could help to ensure a robust relationship assessment. We applied SIDEseq to a newly generated human ovarian cancer scRNA seq dataset, a public human embryo scRNA seq dataset, and several simulated datasets. The clustering results suggest that the SIDEseq measure is capable of uncovering important relationships between cells, and outperforms or at least does as well as several popular (dis)similarity measures when used on these datasets.
11.
K. H. Nicholls 《International Review of Hydrobiology》1985,70(5):621-632
Four commonly used formulae for measuring percentage similarity (PS) of biological communities were tested for their usefulness in relating to two plankton community properties, species proportional differences and total density differences. The formula best combining species proportionality and total density in the expression of PS is new: [formula missing from source], where $\min(x_i, y_i)$ is the lesser percentage (doubly standardized) of a species in two samples X and Y and where $\sum_i z_i$, $\sum_i x_i$ and $\sum_i y_i$ are the total quantities of all species in samples Z, X and Y, respectively. Sample Z contains the highest density of all species in the set; $\sum_i z_i > (\sum_i x_i, \sum_i y_i)$. The new expression of PS is simple to use and has the additional advantage of offering the analyst an unlimited choice of weighting factors or importance values for proportionality of species content and total density. The method has been applied to data from Gravenhurst Bay (Ontario) and effectively demonstrates the consequences of phosphorus loading reductions for phytoplankton communities.
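For orientation, the classic proportional form of percentage similarity, PS = Σ min(x_i, y_i) with x_i and y_i expressed as percentages of each sample's total, can be sketched as below. This is not the new formula of the abstract, which is not reproduced in the listing; the function name and the species counts are hypothetical.

```python
def percentage_similarity(counts_x, counts_y):
    """Classic percentage similarity: sum over species of the lesser percentage share."""
    total_x, total_y = sum(counts_x), sum(counts_y)
    if total_x == 0 or total_y == 0:
        raise ValueError("each sample needs at least one individual")
    return sum(
        min(100.0 * x / total_x, 100.0 * y / total_y)
        for x, y in zip(counts_x, counts_y)
    )


# Hypothetical counts for three species in three samples.
sample_x = [50, 30, 20]
sample_y = [500, 300, 200]  # same proportions as X, ten times the density
sample_z = [20, 30, 50]     # same density as X, different proportions

print(percentage_similarity(sample_x, sample_y))  # 100.0
print(percentage_similarity(sample_x, sample_z))  # 70.0
```

The first result shows the limitation the abstract addresses: the proportional form rates X and Y as identical even though their total densities differ tenfold.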
12.
Hamed Nili Cai Wingfield Alexander Walther Li Su William Marslen-Wilson Nikolaus Kriegeskorte 《PLoS computational biology》2014,10(4)
Neuronal population codes are increasingly being investigated with multivariate pattern-information analyses. A key challenge is to use measured brain-activity patterns to test computational models of brain information processing. One approach to this problem is representational similarity analysis (RSA), which characterizes a representation in a brain or computational model by the distance matrix of the response patterns elicited by a set of stimuli. The representational distance matrix encapsulates what distinctions between stimuli are emphasized and what distinctions are de-emphasized in the representation. A model is tested by comparing the representational distance matrix it predicts to that of a measured brain region. RSA also enables us to compare representations between stages of processing within a given brain or model, between brain and behavioral data, and between individuals and species. Here, we introduce a Matlab toolbox for RSA. The toolbox supports an analysis approach that is simultaneously data- and hypothesis-driven. It is designed to help integrate a wide range of computational models into the analysis of multichannel brain-activity measurements as provided by modern functional imaging and neuronal recording techniques. Tools for visualization and inference enable the user to relate sets of models to sets of brain regions and to statistically test and compare the models using nonparametric inference methods. The toolbox supports searchlight-based RSA, to continuously map a measured brain volume in search of a neuronal population code with a specific geometry. Finally, we introduce the linear-discriminant t value as a measure of representational discriminability that bridges the gap between linear decoding analyses and RSA. In order to demonstrate the capabilities of the toolbox, we apply it to both simulated and real fMRI data. The key functions are equally applicable to other modalities of brain-activity measurement. 
The toolbox is freely available to the community under an open-source license agreement (http://www.mrc-cbu.cam.ac.uk/methods-and-resources/toolboxes/license/).
This is a PLOS Computational Biology Software Article.
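The core step of representational similarity analysis described above, building a representational distance matrix (RDM) and comparing a model's RDM with a measured one, can be sketched in plain Python. This does not use or mirror the Matlab toolbox's API. The correlation-distance choice, the Pearson comparison of the upper triangles (the toolbox abstract mentions nonparametric inference, which is omitted here), and all names and data are assumptions made for illustration.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    return sum(a * b for a, b in zip(du, dv)) / math.sqrt(
        sum(a * a for a in du) * sum(b * b for b in dv)
    )


def rdm(patterns):
    """Distance matrix with entries 1 - Pearson r between each pair of response patterns."""
    return [
        [0.0 if i == j else 1.0 - pearson(p, q) for j, q in enumerate(patterns)]
        for i, p in enumerate(patterns)
    ]


def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, row by row."""
    size = len(matrix)
    return [matrix[i][j] for i in range(size) for j in range(i + 1, size)]


# Hypothetical response patterns: one row per stimulus, one column per channel.
brain = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1]]
model = [[0, 1, 2, 3], [0, 1, 2, 3.2], [3, 2, 1, 0]]

brain_rdm = rdm(brain)
model_rdm = rdm(model)
agreement = pearson(upper_triangle(brain_rdm), upper_triangle(model_rdm))
print(agreement > 0.9)  # True: both group stimuli 0 and 1 and separate stimulus 2
```

Comparing RDMs rather than raw patterns is what lets a model and a brain region be related even when they have different numbers of channels.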
13.
Assessing the Impact of Case Sensitivity and Term Information Gain on Biomedical Concept Recognition
Concept recognition (CR) is a foundational task in the biomedical domain. It supports the important process of transforming unstructured resources into structured knowledge. To date, several CR approaches have been proposed, most of which focus on a particular set of biomedical ontologies. Their underlying mechanisms vary from shallow natural language processing and dictionary lookup to specialized machine learning modules. However, no prior approach considers the case sensitivity characteristics and the term distribution of the underlying ontology on the CR process. This article proposes a framework that models the CR process as an information retrieval task in which both case sensitivity and the information gain associated with tokens in lexical representations (e.g., term labels, synonyms) are central components of a strategy for generating term variants. The case sensitivity of a given ontology is assessed based on the distribution of so-called case sensitive tokens in its terms, while information gain is modelled using a combination of divergence from randomness and mutual information. An extensive evaluation has been carried out using the CRAFT corpus. Experimental results show that case sensitivity awareness leads to an increase of up to 0.07 F1 against a non-case sensitive baseline on the Protein Ontology and GO Cellular Component. Similarly, the use of information gain leads to an increase of up to 0.06 F1 against a standard baseline in the case of GO Biological Process and Molecular Function and GO Cellular Component. Overall, subject to the underlying token distribution, these methods lead to valid complementary strategies for augmenting term label sets to improve concept recognition.
14.
15.
16.
Summary. A rapid test for the identification of phenol-degrading bacteria was devised using a medium containing phenol as the sole source of carbon and energy. Degradation of phenol in such a medium was assessed by growth and change in pH value. The results obtained are in agreement with those obtained by methods requiring chemical analysis of phenol in spent culture fluid.
17.
This paper points out that Nei's genetic similarity is merely a distance or similarity coefficient describing how much two binary variables differ or resemble each other, and that it has no necessary connection with the degree of kinship between individuals. Based on the definition of the coefficient of relationship, a new formula for genetic similarity is proposed: $r_A(x,y) = 2N_{xy}^{2}/(N_x N_y)$ or $r_A(x,y) = N_{xy}^{2}/(N_x N_y)$. An example verifies that the formula can be used to judge the degree of kinship between individuals.
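The two kinds of coefficient contrasted in this abstract can be compared on binary band profiles. Two assumptions are made here and should be checked against the original paper: "Nei's genetic similarity" is taken to be the band-sharing form 2N_xy/(N_x + N_y), and the flattened formula in the listing is read as N_xy squared over N_x N_y, optionally doubled. The function names and the band profiles are hypothetical.

```python
def band_counts(x, y):
    """Return (shared bands, bands in x, bands in y) for two 0/1 band profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n_xy = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    return n_xy, sum(x), sum(y)


def band_sharing_similarity(x, y):
    """Assumed form of Nei's similarity: 2 * N_xy / (N_x + N_y)."""
    n_xy, n_x, n_y = band_counts(x, y)
    if n_x + n_y == 0:
        raise ValueError("at least one profile must contain a band")
    return 2 * n_xy / (n_x + n_y)


def relationship_similarity(x, y, doubled=False):
    """ASSUMED reading of the proposed formula: N_xy**2 / (N_x * N_y), times 2 if doubled."""
    n_xy, n_x, n_y = band_counts(x, y)
    if n_x == 0 or n_y == 0:
        raise ValueError("both profiles must contain at least one band")
    value = n_xy ** 2 / (n_x * n_y)
    return 2 * value if doubled else value


# Hypothetical profiles: four bands each, two of them shared.
x = [1, 1, 1, 1, 0, 0]
y = [1, 1, 0, 0, 1, 1]
print(band_sharing_similarity(x, y))                # 0.5
print(relationship_similarity(x, y))                # 0.25
print(relationship_similarity(x, y, doubled=True))  # 0.5
```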
18.
19.
Background
Phenotypic features associated with genes and diseases play an important role in disease-related studies and most of the available methods focus solely on the Online Mendelian Inheritance in Man (OMIM) database without considering the controlled vocabulary. The Human Phenotype Ontology (HPO) provides a standardized and controlled vocabulary covering phenotypic abnormalities in human diseases, and becomes a comprehensive resource for computational analysis of human disease phenotypes. Most of the existing HPO-based software tools cannot be used offline and provide only few similarity measures. Therefore, there is a critical need for developing a comprehensive and offline software for phenotypic features similarity based on HPO.
Results
HPOSim is an R package for analyzing phenotypic similarity for genes and diseases based on HPO data. Seven commonly used semantic similarity measures are implemented in HPOSim. Enrichment analysis of gene sets and disease sets are also implemented, including hypergeometric enrichment analysis and network ontology analysis (NOA).
Conclusions
HPOSim can be used to predict disease genes and explore disease-related function of gene modules. HPOSim is open source and freely available at SourceForge (https://sourceforge.net/p/hposim/).
20.
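A New Measure of DNA Sequence Information  Total citations: 1, self-citations: 3, citations by others: 1
Based on information theory, a new method for measuring the information in DNA sequences is presented, yielding information measures at four levels: Ib, If(1), If(2), and If(3). These four measures can be used to quantify the information content of the DNA base sequence, the codon sequence, the encoded protein sequence, and the functional protein sequence, respectively. Results computed from two relatively short protein-coding DNA sequences in the mitochondrial genome of M. edulis, and from simulated DNA sequences built from groups of degenerate codons with different degrees of degeneracy, show that these information measures can indeed be used to reveal …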