Found 20 similar documents; search took 9 ms
1.
Since biomedical texts contain a wide variety of domain-specific terms, building a large dictionary to perform term matching is of great relevance. However, due to the existence of null boundaries between adjacent terms, this matching is not a trivial problem. Moreover, it is known that generative words cannot be comprehensively included in a dictionary because their possible variations are infinite. In this study, we report our approach to dictionary building and term matching in biomedical texts. A large number of terms with/without part-of-speech (POS) and/or category information were gathered, and a completion program generated approximately 1.36 million term variants to avoid stemming problems when matching terms. The dictionary was stored in a relational database management system (RDBMS) for quick lookup, and used by a matching program. Since the matching operation is not restricted to a substring surrounded by space characters, we can avoid the problem of null boundaries. This feature is also useful for generative words. Experimental results on the GENIA corpus are promising: nearly half of the possible terms were correctly recognized as a meaningful segment, and most of the remaining half could be correctly recognized by post-processing such as chunking and further decomposition. It should be remarked that although we have not used term cost, connectivity cost, or syntactic information, reasonable segmentation and dictionary lookup were performed in most cases.
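The matching idea in this abstract (lookup that is not restricted to space-delimited substrings) can be illustrated with a minimal sketch. It assumes a greedy longest-match strategy and uses an in-memory Python set in place of the RDBMS; the function name `match_terms` and the sample terms are hypothetical and are not taken from the paper.

```python
def match_terms(text, dictionary):
    """Greedy longest-match lookup that may start and end at any character.

    `dictionary` is a plain set standing in for the database table
    described in the abstract (an assumption made for illustration).
    """
    max_len = max((len(term) for term in dictionary), default=0)
    matches = []
    i = 0
    while i < len(text):
        found = None
        # Try the longest candidate first so longer terms win over their prefixes.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                found = (i, j, text[i:j])
                break
        if found:
            matches.append(found)
            i = found[1]  # continue right after the matched term
        else:
            i += 1  # no term starts here; move one character forward
    return matches


# Hypothetical terms; note the missing space between the first two.
terms = {"interleukin-2", "receptor", "alpha"}
result = match_terms("interleukin-2receptor alpha", terms)
print(result)
```

Because candidates are tested at every character offset, "interleukin-2" and "receptor" are both found even though no space separates them, which is the null-boundary case the abstract mentions.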
2.
MOTIVATION: Attribute selection is a critical step in the development of document classification systems. As a standard practice, words are stemmed and the most informative ones are used as attributes in classification. Owing to the high complexity of biomedical terminology, general-purpose stemming algorithms are often conservative and could also remove informative stems. This can lead to accuracy reduction, especially when the number of labeled documents is small. To address this issue, we propose an algorithm that omits stemming and, instead, uses the most discriminative substrings as attributes. RESULTS: The approach was tested on five annotated sets of abstracts from iProLINK that report on the experimental evidence about five types of protein post-translational modifications. The experiments showed that Naive Bayes and support vector machine classifiers perform consistently better [with area under the ROC curve (AUC) accuracy in the range 0.92-0.97] when using the proposed attribute selection than when using attributes obtained by the Porter stemmer algorithm (AUC in the 0.86-0.93 range). The proposed approach is particularly useful when labeled datasets are small.
3.
Jin Wang Ping Liu Mary F.H. She Saeid Nahavandi Abbas Kouzani 《Biomedical signal processing and control》2013,8(6):634-644
Automatic analysis of biomedical time series such as electroencephalogram (EEG) and electrocardiographic (ECG) signals has attracted great interest in the community of biomedical engineering due to its important applications in medicine. In this work, a simple yet effective bag-of-words representation originally developed for text document analysis is extended for biomedical time series representation. In particular, similar to the bag-of-words model used in the text document domain, the proposed method treats a time series as a text document and extracts local segments from the time series as words. The biomedical time series is then represented as a histogram of codewords, each entry of which is the number of times a codeword appears in the time series. Although the temporal order of the local segments is ignored, the bag-of-words representation is able to capture high-level structural information because both local and global structural information are well utilized. The performance of the bag-of-words model is validated on three datasets extracted from real EEG and ECG signals. The experimental results demonstrate that the proposed method is not only insensitive to parameters of the bag-of-words model such as local segment length and codebook size, but also robust to noise.
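The histogram-of-codewords representation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the codebook is taken as given (the abstract does not say how it is built), segments are assigned to the nearest codeword by squared Euclidean distance, and the name `bag_of_words` and the toy data are hypothetical.

```python
def bag_of_words(series, codebook, segment_len, step=1):
    """Represent a time series as a histogram of codeword counts.

    Each sliding-window segment is treated as a "word" and mapped to its
    nearest codeword; the temporal order of segments is discarded.
    """
    hist = [0] * len(codebook)
    for start in range(0, len(series) - segment_len + 1, step):
        segment = series[start:start + segment_len]
        # Squared Euclidean distance from this segment to every codeword.
        dists = [
            sum((a - b) ** 2 for a, b in zip(segment, codeword))
            for codeword in codebook
        ]
        hist[dists.index(min(dists))] += 1
    return hist


# Toy codebook with two codewords of length 2 (assumed, not learned here).
codebook = [[0.0, 0.0], [1.0, 1.0]]
hist = bag_of_words([0.0, 0.1, 0.2, 0.9, 1.0], codebook, segment_len=2)
print(hist)
```

The two parameters the abstract names, local segment length and codebook size, correspond here to `segment_len` and `len(codebook)`.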
4.
Christopher S. Rogers 《Transgenic research》2016,25(3):345-359
To commemorate Transgenic Animal Research Conference X, this review summarizes the recent progress in developing genetically engineered livestock species as biomedical models. The first of these conferences was held in 1997, which turned out to be a watershed year for the field, with two significant events occurring. One was the publication of the first transgenic livestock animal disease model, a pig with retinitis pigmentosa. Before that, the use of livestock species in biomedical research had been limited to wild-type animals or disease models that had been induced or were naturally occurring. The second event was the report of Dolly, a cloned sheep produced by somatic cell nuclear transfer. Cloning subsequently became an essential part of the process for most of the models developed in the last 18 years and is still used prominently today. This review is intended to highlight the biomedical modeling achievements that followed those key events, many of which were first reported at one of the previous nine Transgenic Animal Research Conferences. Also discussed are the practical challenges of utilizing livestock disease models now that the technical hurdles of model development have been largely overcome.
5.
Brodersen KH Schofield TM Leff AP Ong CS Lomakina EI Buhmann JM Stephan KE 《PLoS computational biology》2011,7(6):e1002079
Decoding models, such as those underlying multivariate classification algorithms, have been increasingly used to infer cognitive or clinical brain states from measures of brain activity obtained by functional magnetic resonance imaging (fMRI). The practicality of current classifiers, however, is restricted by two major challenges. First, due to the high data dimensionality and low sample size, algorithms struggle to separate informative from uninformative features, resulting in poor generalization performance. Second, popular discriminative methods such as support vector machines (SVMs) rarely afford mechanistic interpretability. In this paper, we address these issues by proposing a novel generative-embedding approach that incorporates neurobiologically interpretable generative models into discriminative classifiers. Our approach extends previous work on trial-by-trial classification for electrophysiological recordings to subject-by-subject classification for fMRI and offers two key advantages over conventional methods: it may provide more accurate predictions by exploiting discriminative information encoded in 'hidden' physiological quantities such as synaptic connection strengths; and it affords mechanistic interpretability of clinical classifications. Here, we introduce generative embedding for fMRI using a combination of dynamic causal models (DCMs) and SVMs. We propose a general procedure of DCM-based generative embedding for subject-wise classification, provide a concrete implementation, and suggest good-practice guidelines for unbiased application of generative embedding in the context of fMRI. We illustrate the utility of our approach by a clinical example in which we classify moderately aphasic patients and healthy controls using a DCM of thalamo-temporal regions during speech processing. 
Generative embedding achieves a near-perfect balanced classification accuracy of 98% and significantly outperforms conventional activation-based and correlation-based methods. This example demonstrates how disease states can be detected with very high accuracy and, at the same time, be interpreted mechanistically in terms of abnormalities in connectivity. We envisage that future applications of generative embedding may provide crucial advances in dissecting spectrum disorders into physiologically more well-defined subgroups.
6.
7.
We present an algorithm to identify individual neural spikes observed on high-density multi-electrode arrays (MEAs). Our method can distinguish large numbers of distinct neural units, even when spikes overlap, and accounts for intrinsic variability of spikes from each unit. As MEAs grow larger, it is important to find spike-identification methods that are scalable, that is, the computational cost of spike fitting should scale well with the number of units observed. Our algorithm accomplishes this goal, and is fast, because it exploits the spatial locality of each unit and the basic biophysics of extracellular signal propagation. Human interaction plays a key role in our method, but effort is minimized and streamlined via a graphical interface. We illustrate our method on data from guinea pig retinal ganglion cells and document its performance on simulated data consisting of spikes added to experimentally measured background noise. We present several tests demonstrating that the algorithm is highly accurate: it exhibits low error rates on fits to synthetic data, low refractory violation rates, good receptive field coverage, and consistency across users.
8.
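To address the time-consuming trial-and-error search for the number of neighbors in the Local Linear Embedding (LLE) algorithm, an Enhanced Kernel Local Linear Embedding (EKLLE) algorithm is proposed that automatically assigns a neighborhood to each sample. The algorithm uses a Gaussian kernel function to improve the standard LLE distance metric and, combined with the class information of the samples, sets a different number of neighbors for each sample automatically and without manual intervention, avoiding the large amount of time that trial and error needs to reach the optimal result. Finally, with the number of neighbors differing across samples, the data are reduced in dimensionality and the test samples are classified. The EKLLE algorithm effectively maps high-dimensional gene expression profile data into a low-dimensional intrinsic space, overcoming the traditional LLE algorithm's poor handling of noisy or sparse data. Comparison with other tumor sample classification experiments verifies the real-time performance and accuracy of the proposed method.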
9.
The ongoing exponential rise in recording capacity calls for new approaches for analysing and interpreting neural data. Effective dimensionality has emerged as an important property of neural activity across populations of neurons, yet different studies rely on different definitions and interpretations of this quantity. Here, we focus on intrinsic and embedding dimensionality, and discuss how they might reveal computational principles from data. Reviewing recent works, we propose that the intrinsic dimensionality reflects information about the latent variables encoded in collective activity while embedding dimensionality reveals the manner in which this information is processed. We conclude by highlighting the role of network models as an ideal substrate for testing more specifically various hypotheses on the computational principles reflected through intrinsic and embedding dimensionality.
10.
11.
One of the major research directions in bioinformatics is that of predicting the protein superfamily in large databases and classifying a given set of protein domains into superfamilies. The classification reflects the structural, evolutionary and functional relatedness. These relationships are embodied in hierarchical classifications such as the Structural Classification of Proteins (SCOP), which is manually curated. Such classification is essential for the structural and functional analysis of proteins. Yet, a large number of proteins remain unclassified. We have proposed an unsupervised machine-learning FuzzyART neural network algorithm to classify a given set of proteins into SCOP superfamilies. The proposed method learns quickly and uses an atypical non-linear pattern recognition technique. In this approach, we have constructed a similarity matrix from p-values of BLAST all-against-all, trained the network with the FuzzyART unsupervised learning algorithm using the similarity matrix as input vectors and finally the trained network offers SCOP superfamily level classification. In this experiment, we have evaluated the performance of our method against existing techniques on six different datasets. We have shown that the trained network is able to classify a given similarity matrix of a set of sequences into SCOP superfamilies at high classification accuracy.
12.
Wilson-Sanders SE 《ILAR journal / National Research Council, Institute of Laboratory Animal Resources》2011,52(2):126-152
Invertebrate animals have been used as medicinals for 4,000 years and have served as models for research and teaching since the late 1800s. Interest in invertebrate models has increased over the past several decades as the research community has responded to public concerns about the use of vertebrate animals in research. As a result, invertebrates are being evaluated and recognized as models for many diseases and conditions. Their use has led to discoveries in almost every area of biology and medicine--from embryonic development to aging processes. Species range from terrestrial invertebrates such as nematodes and insects to freshwater and marine life including planarians, crustaceans, molluscs, and many others. The most often used models are the fruit fly Drosophila melanogaster and the minuscule nematode Caenorhabditis elegans. Topics in this article are categorized by biologic system, process, or disease with discussion of associated invertebrate models. Sections on bioactive products discovered from invertebrates follow the models section, and the article concludes with uses of invertebrates in teaching. The models reviewed can serve as references for scientists, researchers, veterinarians, institutional animal care and use committees (IACUCs), and others interested in alternatives to vertebrate animals.
13.
Biomedical trials often give rise to data having the form of time series of a common process on separate individuals. One model which has been proposed to explain variations in such series across individuals is a random effects model based on sample periodograms. The use of spectral coefficients enables models for individual series to be constructed on the basis of standard asymptotic theory, whilst variations between individuals are handled by permitting a random effect perturbation of model coefficients. This paper extends such methodology in two ways: first, by enabling a nonparametric specification of underlying spectral behaviour; second, by addressing some of the tricky computational issues which are encountered when working with this class of random effect models. This leads to a model in which a population spectrum is specified nonparametrically through a dynamic system, and the processes measured on individuals within the population are assumed to have a spectrum which has a random effect perturbation from the population norm. Simulation studies show that standard MCMC algorithms give effective inferences for this model, and applications to biomedical data suggest that the model itself is capable of revealing scientifically important structure in temporal characteristics both within and between individual processes.
14.
A neural network architecture for data classification (Total citations: 1, self-citations: 0, citations by others: 1)
Lezoray O 《International journal of neural systems》2001,11(1):33-42
This article presents a neural network architecture designed for the classification of data distributed among a high number of classes. A significant gain in the global classification rate can be obtained by using our architecture. The architecture is based on a set of several small neural networks, each one discriminating only two classes. The specialization of each neural network simplifies its structure and improves the classification. Moreover, the learning step automatically determines the number of hidden neurons. The discussion is illustrated by tests on databases from the UCI machine learning database repository. The experimental results show that this architecture can achieve faster learning, simpler neural networks and improved classification performance.
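The decomposition described in this abstract, one small classifier per pair of classes, can be sketched as follows. The abstract does not say how the pairwise outputs are combined, so majority voting is an assumption here; the pairwise models are simple nearest-center rules standing in for the small neural networks, and all names (`predict_pairwise`, `pair_models`, `centers`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def predict_pairwise(x, classes, pair_models):
    """Combine two-class models by majority vote (an assumed combination rule)."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        # Each pairwise model returns either a or b for input x.
        votes[pair_models[(a, b)](x)] += 1
    return votes.most_common(1)[0][0]


# Toy stand-ins for the small networks: each picks the nearer class center.
centers = {"low": 0.0, "mid": 5.0, "high": 10.0}
classes = sorted(centers)
pair_models = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - centers[a]) <= abs(x - centers[b]) else b)
    for a, b in combinations(classes, 2)
}

label = predict_pairwise(4.2, classes, pair_models)
print(label)
```

With K classes this scheme needs K(K-1)/2 pairwise models, each solving a simpler two-class problem than a single K-class network would.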
15.
This paper presents work on parameter estimation methods for bursting neural models. In our approach we use both geometrical features specific to bursting, as well as general features such as periodic orbits and their bifurcations. We use the geometry underlying bursting to introduce defining equations for burst initiation and termination, and restrict the estimation algorithms to the space of bursting periodic orbits when trying to fit periodic burst data. These geometrical ideas are combined with automatic differentiation to accurately compute parameter sensitivities for the burst timing and period. In addition to being of inherent interest, these sensitivities are used in standard gradient-based optimization algorithms to fit model burst duration and period to data. As an application, we fit Butera et al.'s (Journal of Neurophysiology 81, 382-397, 1999) model of preBötzinger complex neurons to empirical data both in control conditions and when the neuromodulator norepinephrine is added (Viemari and Ramirez, Journal of Neurophysiology 95, 2070-2082, 2006). The results suggest possible modulatory mechanisms in the preBötzinger complex, including modulation of the persistent sodium current.
16.
17.
Chains of coupled oscillators of simple “rotator” type have been used to model the central pattern generator (CPG) for locomotion in lamprey, among numerous applications in biology and elsewhere. In this paper, motivated by experiments on lamprey CPG with brainstem attached, we investigate a simple oscillator model with internal structure which captures both excitable and bursting dynamics. This model, and that for the coupling functions, is inspired by the Hodgkin–Huxley equations and two-variable simplifications thereof. We analyse pairs of coupled oscillators with both excitatory and inhibitory coupling. We also study traveling wave patterns arising from chains of oscillators, including simulations of “body shapes” generated by a double chain of oscillators providing input to a kinematic musculature model of lamprey.
Received: 25 November 1996 / Revised version: 9 December 1997
18.
19.
Background
Examining bacterial colony morphology is the first step in classifying bacterial species before sending them to subsequent identification with devices such as the VITEK 2 automated system and mass spectrometry microbial identification systems. It is essential as a pre-screening process because it can greatly reduce the scope of possible bacterial species, make the subsequent identification more specific and increase work efficiency in clinical bacteriology. But this work requires adequate clinical laboratory expertise in bacterial colony morphology, which is especially difficult for beginners to handle properly. This study presents automatic programs for the bacterial colony classification task that apply deep convolutional neural networks (CNNs), which are widely used for digital imaging data analysis in hospitals. The 18 most common bacterial colony classes from Peking University First Hospital were used to train this framework, and images outside the training dataset were used to test the performance of the classifier.
Results
The feasibility of this framework was verified by comparing the predicted results with the standard bacterial categories. The classification accuracy over all 18 bacteria can reach 73%, and the accuracy and specificity for each kind of bacteria can reach as high as 90%.
Conclusions
The supervised neural networks we use can have more promising classification characteristics for the bacterial colony pre-screening process, and the unsupervised network should have more advantages in revealing novel characteristics from pictures, which can provide some practical indications to our clinical staff.
20.
MOTIVATION: The sheer volume of textually described biomedical knowledge creates the need for natural language processing (NLP) applications in order to allow flexible and efficient access to relevant information. Specialized semantic networks (such as biomedical ontologies, terminologies or semantic lexicons) can significantly enhance these applications by supplying the necessary terminological information in a machine-readable form. With the explosive growth of bio-literature, new terms (representing newly identified concepts or variations of the existing terms) may not be explicitly described within the network and hence cannot be fully exploited by NLP applications. Linguistic and statistical clues can be used to extract many new terms from free text. The extracted terms still need to be correctly positioned relative to other terms in the network. Classification as a means of semantic typing represents the first step in updating a semantic network with new terms. RESULTS: The MaSTerClass system implements the case-based reasoning methodology for the classification of biomedical terms.