20 similar documents found.
1.
Background
The hierarchical clustering tree (HCT) with a dendrogram [1] and the singular value decomposition (SVD) with a dimension-reduced representative map [2] are popular methods for two-way sorting of the gene-by-array matrix map used in gene expression profiling. While HCT dendrograms tend to optimize locally coherent clustering patterns, the leading SVD eigenvectors usually better identify global grouping and transitional structures.
Results
This study proposes a flipping mechanism for a conventional agglomerative HCT that uses a rank-two ellipse (R2E, an improved SVD algorithm for sorting purposes) seriation by Chen [3] as an external reference. While HCTs always produce permutations with good local behaviour, the rank-two ellipse seriation gives the best global grouping patterns and smooth transitional trends. The resulting algorithm automatically integrates the desirable properties of each method, so that users have access to a clustering and visualization environment for gene expression profiles that preserves coherent local clusters and identifies global grouping trends.
Conclusion
We demonstrate, through four examples, that the proposed method not only possesses better numerical and statistical properties but also provides more meaningful biomedical insights than other sorting algorithms. We suggest that sorted proximity matrices for genes and arrays, in addition to the gene-by-array expression matrix, can greatly aid the search for a comprehensive understanding of gene expression structures. Software for the proposed methods can be obtained at http://gap.stat.sinica.edu.tw/Software/GAP.
2.
Background
The exploding growth of the biomedical literature presents many challenges for biological researchers. One such challenge arises from the heavy use of abbreviations. Extracting abbreviations and their definitions accurately is very helpful to biologists and also facilitates biomedical text analysis. Existing approaches fall into four broad categories: rule-based, machine-learning-based, text-alignment-based, and statistical. State-of-the-art methods either focus exclusively on acronym-type abbreviations or cannot recognize rare abbreviations. We propose a systematic method to extract abbreviations effectively. First, a scoring method classifies abbreviations as acronym-type or non-acronym-type. Their definitions are then identified by two different methods: a text alignment algorithm for the former and a statistical method for the latter.
3.
We present a biomedical text-mining system focused on four types of gene-related information: biological functions, associated diseases, related genes, and gene-gene relations. The aim of this system is to provide researchers with an easy-to-use bio-information service for rapidly surveying the burgeoning biomedical literature. AVAILABILITY: http://iir.csie.ncku.edu.tw/~yuhc/gis/
4.
N. F. Belyaeva V. N. Kashirtseva N. V. Medvedeva Yu. Yu. Khudoklinova O. M. Ipatova A. I. Archakov 《Biochemistry (Moscow) Supplemental Series B: Biomedical Chemistry》2009,3(4):343-350
Zebrafish (Danio rerio) is now firmly recognized as a powerful research model for many areas of biology and medicine. Here, we review some achievements of zebrafish-based assays in modeling human diseases and in drug discovery and development. For drug discovery, zebrafish is especially valuable during the early stages of research, as it provides a model organism in which to demonstrate a new treatment's efficacy and toxicity before more costly mammalian models are used. This review considers some examples of known compounds that exhibit both physiological activity and toxicity in humans and zebrafish. The major advantages of zebrafish embryos are their permeability to small molecules added to the incubation medium and the transparency of the chorion, which allows development to be observed easily. Assays of acute toxicity (LC50 estimation) in embryos can also include screening for developmental disorders as an indicator of teratogenic effects. We have used the zebrafish model for toxicity testing of new drugs based on phospholipid nanoparticles (e.g., doxorubicin). Genome organization and the pathways involved in the control of signal transduction appear to be highly conserved between zebrafish and humans, so zebrafish may be used to model human diseases. The review provides some examples of zebrafish applications in this field.
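The abstract mentions LC50 estimation in embryo assays but does not say how the value is computed. The function below is a minimal sketch that assumes a simple approach, not the authors' protocol. It interpolates linearly on log10(concentration) between the two tested concentrations that bracket 50% mortality. Probit or logistic dose-response fitting would be the more rigorous alternative. The function name and the concentrations and mortality fractions in the example are hypothetical.

```python
import math


def lc50_log_interpolation(concs, mortality):
    """Estimate LC50 by linear interpolation on log10(concentration).

    concs: tested concentrations, strictly increasing and all > 0.
    mortality: fraction of embryos dead at each concentration (0 to 1),
        assumed to be non-decreasing with concentration.
    Returns the estimated concentration at 50% mortality, or None when
    the tested range does not bracket 50% mortality.
    """
    if len(concs) != len(mortality):
        raise ValueError("concs and mortality must have the same length")
    points = list(zip(concs, mortality))
    for (c_lo, m_lo), (c_hi, m_hi) in zip(points, points[1:]):
        if m_lo <= 0.5 <= m_hi and m_hi > m_lo:
            x_lo, x_hi = math.log10(c_lo), math.log10(c_hi)
            # Fraction of the way from m_lo to m_hi at which mortality reaches 50%
            t = (0.5 - m_lo) / (m_hi - m_lo)
            return 10 ** (x_lo + t * (x_hi - x_lo))
    return None


if __name__ == "__main__":
    # Hypothetical dose-mortality data (concentration units are arbitrary)
    print(lc50_log_interpolation([0.1, 1.0, 10.0], [0.0, 0.2, 0.8]))
```

With the hypothetical data above, 50% mortality lies halfway between the 20% and 80% points on the log scale. The estimate is therefore 10 ** 0.5, about 3.16, in the same units as the input concentrations.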
5.
Yang Jin Ryan T McDonald Kevin Lerman Mark A Mandel Steven Carroll Mark Y Liberman Fernando C Pereira Raymond S Winters Peter S White 《BMC bioinformatics》2006,7(1):492
Background
The rapid proliferation of biomedical text makes it increasingly difficult for researchers to identify, synthesize, and utilize developed knowledge in their fields of interest. Automated information extraction procedures can assist in the acquisition and management of this knowledge. Previous efforts in biomedical text mining have focused primarily upon named entity recognition of well-defined molecular objects such as genes, but less work has been performed to identify disease-related objects and concepts. Furthermore, promise has been tempered by an inability to scale approaches efficiently in ways that minimize manual effort while still performing with high accuracy. Here, we have applied a machine-learning approach previously successful for identifying molecular entities to a disease concept, to determine whether the underlying probabilistic model generalizes effectively to unrelated concepts with minimal manual intervention for model retraining.
6.
MOTIVATION: The sheer volume of textually described biomedical knowledge creates a need for natural language processing (NLP) applications that allow flexible and efficient access to relevant information. Specialized semantic networks (such as biomedical ontologies, terminologies or semantic lexicons) can significantly enhance these applications by supplying the necessary terminological information in a machine-readable form. With the explosive growth of the bio-literature, new terms (representing newly identified concepts or variations of existing terms) may not be explicitly described within the network and hence cannot be fully exploited by NLP applications. Linguistic and statistical clues can be used to extract many new terms from free text. The extracted terms still need to be correctly positioned relative to other terms in the network. Classification as a means of semantic typing is the first step in updating a semantic network with new terms. RESULTS: The MaSTerClass system implements the case-based reasoning methodology for the classification of biomedical terms.
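The abstract names a case-based idea: classify a new term by reusing the semantic type of the most similar previously classified terms. The code below is a minimal nearest-neighbour sketch of that idea. It is not MaSTerClass. The character-level similarity from Python's `difflib.SequenceMatcher`, the `k` parameter, and the small case base of terms and types are all hypothetical choices made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    # Character-level similarity in [0, 1]; an illustrative stand-in for a
    # real term-similarity measure, not the one used by MaSTerClass.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify_term(term, case_base, k=1):
    """Retrieve the k stored cases most similar to `term` and reuse their
    majority semantic type (ties go to the type of the more similar case)."""
    ranked = sorted(case_base, key=lambda case: similarity(term, case[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical case base: (known term, semantic type)
CASES = [
    ("interleukin-2", "protein"),
    ("interleukin-6 receptor", "protein"),
    ("T lymphocyte", "cell type"),
    ("B lymphocyte", "cell type"),
    ("HeLa cell line", "cell line"),
]

if __name__ == "__main__":
    print(classify_term("interleukin-4", CASES))  # protein
```

The retrieve-and-reuse structure is the case-based part. In a real system, the similarity function and the contents of the case base would carry most of the domain knowledge.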
7.
8.
Zhehuan Zhao Zhihao Yang Ling Luo Lei Wang Yin Zhang Hongfei Lin Jian Wang 《BMC medical genomics》2017,10(5):73
Background
Automatic disease named entity recognition (DNER) is of utmost importance for the development of more sophisticated BioNLP tools. However, most conventional CRF-based DNER systems rely on well-designed features whose selection is labor-intensive and time-consuming. Although most deep learning methods can solve NER problems with little feature engineering, they employ an additional CRF layer to capture the correlations between neighboring labels, which makes them considerably more complicated.
Methods
In this paper, we propose a novel disease NER approach based on a multiple label convolutional neural network (MCNN). In place of a CRF layer, this approach employs a multiple label strategy (MLS), which we introduce here for the first time. First, character-level, word-level, and lexicon feature embeddings are concatenated. Then several convolutional layers are stacked over the concatenated embedding. Finally, the MLS is applied to the output layer to capture the correlations between neighboring labels.
Results
As the experimental results show, MCNN achieves state-of-the-art performance on both the NCBI and CDR corpora.
Conclusions
The proposed MCNN-based disease NER method achieves state-of-the-art performance with little feature engineering. The experimental results also demonstrate that the MLS is effective at capturing the correlations between neighboring labels.
9.
Hye-Jeong Song Byeong-Cheol Jo Chan-Young Park Jong-Dae Kim Yu-Seop Kim 《Biomedical engineering online》2018,17(2):158
Background
Biomedical named entity recognition (Bio-NER) is a fundamental task in processing biomedical text, in which terms such as RNA, protein, cell type, cell line, and DNA are identified. Bio-NER is one of the most elementary and core tasks in biomedical knowledge discovery from texts. The system described here was developed using the BioNLP/NLPBA 2004 shared task. Experiments are conducted on the training and evaluation sets provided by the task organizers.
Results
Our results show that, compared with a baseline having a 70.09% F1 score, the RNN Jordan- and Elman-type algorithms have F1 scores of approximately 60.53% and 58.80%, respectively. When we use CRF as a machine learning algorithm, CCA, GloVe, and Word2Vec have F1 scores of 72.73%, 72.74%, and 72.82%, respectively.
Conclusions
Using word embeddings constructed through unsupervised learning saves the time and cost required to construct the training data.
10.
Richard Tzong-Han Tsai Shih-Hung Wu Wen-Chi Chou Yu-Chun Lin Ding He Jieh Hsiang Ting-Yi Sung Wen-Lian Hsu 《BMC bioinformatics》2006,7(1):92
Background
Text mining in the biomedical domain is receiving increasing attention. A key component of this process is named entity recognition (NER). Generally speaking, two annotated corpora, GENIA and GENETAG, are most frequently used for training and testing biomedical named entity recognition (Bio-NER) systems. JNLPBA and BioCreAtIvE are two major Bio-NER tasks that use these corpora. The two tasks take different approaches to corpus annotation and use different matching criteria to evaluate system performance. This paper details these differences and describes alternative criteria. We then examine the impact of different criteria and annotation schemes on system performance by retesting systems that participated in the two tasks.
11.
Baumgartner WA Lu Z Johnson HL Caporaso JG Paquette J Lindemann A White EK Medvedeva O Cohen KB Hunter L 《Genome biology》2008,9(Z2):S9
Background:
Reliable information extraction applications have been a long-sought goal of the biomedical text mining community. If reached, this goal would provide valuable tools to benchside biologists in their increasingly difficult task of assimilating the knowledge contained in the biomedical literature. We present an integrated approach to concept recognition in biomedical text. Concept recognition provides key information that has been largely missing from previous biomedical information extraction efforts, namely direct links to well-defined knowledge resources that explicitly cement the concept's semantics. The BioCreative II tasks discussed in this special issue have provided a unique opportunity to demonstrate the effectiveness of concept recognition in the field of biomedical language processing.
Results:
Through the modular construction of a protein interaction relation extraction system, we present several use cases of concept recognition in biomedical text, and relate these use cases to potential uses by the benchside biologist.
Conclusion:
Current information extraction technologies are approaching performance standards at which concept recognition can begin to deliver high-quality data to the benchside biologist. Our system is available as part of the BioCreative Meta-Server project and at http://bionlp.sourceforge.net.
12.
Background
The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document retrieval, and semantic enrichment of biomedical articles.
13.
Genes are widely assumed to play a major role in the epidemiology of complex chronic diseases, yet attempts to characterize the genetic architecture of such traits have been frustrating. Understanding that evolution works by screening phenotypes rather than genotypes can help explain the source of this frustration. Complex traits are usually the result of long-term, often subtle, gene-environment interactions, such that individual life histories may be as important as population histories in predicting and explaining these traits. Recognizing that the problem is not due to technological limitations can help temper expectations and guide the design of future work in biomedical genetics, by allowing us to focus on better approaches where they exist and on those problems most likely to yield a genetic solution. We may even be forced to re-conceive complex biological causation.
14.
15.
Protein domain prediction is often the preliminary step in both experimental and computational protein research. Here we present a new method to predict the domain boundaries of a multidomain protein from its amino acid sequence using a fuzzy mean operator. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator assigns each residue of the query sequence a likelihood of belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for single-domain proteins and 58% for multidomain proteins. The presented model can use new sequence/structure information without re-parameterization after each RPS update. When tested on a current database using a four-year-old RPS, and on a database whose domain definitions differ from those used to train the models, our method consistently yielded the same accuracy, whereas two other published methods did not. A comparison with other domain prediction methods used in the CASP7 competition indicates that our method performs better than existing sequence-based methods.
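The abstract does not define the fuzzy mean operator. The sketch below therefore only illustrates the overall shape of the procedure it describes. First, per-residue boundary evidence from reference proteins is combined into a likelihood for each query residue. Then contiguous runs of high-likelihood residues are reported as boundary regions. A plain arithmetic mean stands in for the fuzzy mean operator, and the 0.5 threshold is an arbitrary assumption. The input format (one 0/1/None vote per query residue for each aligned reference protein) and the vote matrix in the example are hypothetical.

```python
def boundary_likelihood(votes):
    """Combine per-residue boundary votes into one score per query residue.

    votes: one list per aligned reference protein, each with one entry per
        query residue: 1 if the reference has a domain boundary at that
        residue, 0 if it does not, None if the residue is not aligned.
    Returns the arithmetic mean of the non-None votes at each residue
    (0.0 where no reference covers the residue).
    """
    n_residues = len(votes[0])
    scores = []
    for i in range(n_residues):
        column = [ref[i] for ref in votes if ref[i] is not None]
        scores.append(sum(column) / len(column) if column else 0.0)
    return scores


def boundary_regions(scores, threshold=0.5):
    """Return inclusive (start, end) index pairs of contiguous residues
    whose score is at or above `threshold`."""
    regions, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a new run of high-likelihood residues begins
        elif score < threshold and start is not None:
            regions.append((start, i - 1))  # the current run just ended
            start = None
    if start is not None:  # a run extends to the last residue
        regions.append((start, len(scores) - 1))
    return regions


if __name__ == "__main__":
    # Hypothetical votes from three reference proteins over a 6-residue query
    votes = [
        [0, 0, 1, 1, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [None, 0, 1, 1, 0, None],
    ]
    print(boundary_regions(boundary_likelihood(votes)))  # [(2, 3)]
```

Because it reports contiguous regions rather than single residues, this sketch tolerates small disagreements between references about the exact boundary position.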
16.
Alain Boucher Pablo J. Hidalgo Monique Thonnat Jordina Belmonte Carmen Galan Pierre Bonton Régis Tomczak 《Aerobiologia》2002,18(3-4):195-201
A semi-automatic system for pollen recognition is studied for the European project ASTHMA. The goal of such a system is to provide accurate pollen concentration measurements. This information can be used by palynologists, by clinicians, or by a forecast system that predicts pollen dispersion. Initially, our emphasis has been on the Cupressaceae, Olea, Poaceae and Urticaceae pollen types. The system is composed of two modules: pollen grain extraction and pollen grain recognition. In the first module, the pollen grains are observed in light microscopy and are extracted automatically from a pollen slide coloured with fuchsin and digitized in 3D. In the second module, the pollen grain is analyzed for recognition. To accomplish the recognition, it is necessary to work on 3D images and to use detailed palynological knowledge. This knowledge describes the pollen types according to their main visible characteristics and to those that are important for recognition. Some pollen structures are identified, such as the pore with annulus in Poaceae, the reticulum in Olea and similar pollen types, or the cytoplasm in Cupressaceae. The preliminary results show the recognition of some pollen types, such as Urticaceae or Poaceae, and of some groups of pollen types, such as the reticulate group.
17.
Lipidomics: a new window to biomedical frontiers
Lipids are a highly diverse class of molecules with crucial roles in cellular energy storage, structure and signaling. Lipid homeostasis is fundamental to maintaining health, and lipid defects are central to the pathogenesis of important and devastating diseases. Newly emerging advances have facilitated the development of so-called lipidomics technologies and offer an opportunity to elucidate the mechanisms leading to disease. These advances also provide the tools to unravel the complexity of the 'allostatic forces' that allow maintenance of normal cellular/tissue phenotypes through the application of bioenergetically inefficient adaptive mechanisms. An alternative strategy is to focus on tissues with limited allostatic capacity, such as the eye, that could be used as readouts of metabolic stress over time. Identification of these allostatic mechanisms and pathological 'scars' might provide a window to unknown pathogenic mechanisms, as well as facilitate identification of early biomarkers of disease.
18.
19.
Facchini F Fiori G 《Journal of PHYSIOLOGICAL ANTHROPOLOGY and Applied Human Science》2001,20(2):95-103
To assess the current situation in Kazakhstan with respect to the processes of modernization and transition to a market economy, and to highlight their effects on the biology and health status of its population, we have reviewed recently available data for this region (1993-1999). Kazakhstan is still characterized by a pyramid-shaped age distribution of its population, a high incidence of noncommunicable diseases, and deficiencies of nutrients and micronutrients, especially among children. However, the population of Kazakhstan does not seem to be immune to the diseases of modernization. Among women, for example, obesity is more frequent than underweight, especially in urban areas. In rural populations, the frequency of clinically relevant hypertension was low in the more isolated and traditionally living communities but rose to 20% in the less isolated one. A strong increase in the urbanized population is expected over the next 25 years. Currently, however, modernization is probably influencing the lifestyle and nutritional habits of only a minority of the inhabitants of Kazakhstan.
20.
I L Bennett 《Federation proceedings》1969,28(5):1592-1603