Similar Articles
 20 similar articles found
1.
Cluster Computing - High utility sequential pattern (HUSP) mining considers the nonbinary frequency values of items purchased in a transaction and the utility of each item. Incremental updates are...
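The abstract above is truncated, so the paper's own definitions are not visible. As a rough illustration of the utility notion it names, the following Python sketch assumes the definitions common in the HUSP literature: an item's utility in an itemset is its purchase quantity times a unit profit, and a pattern's utility in one quantitative sequence is the maximum summed utility over all of its ordered occurrences. The profit table and function names are hypothetical, and the incremental-update problem the paper addresses is not modelled.

```python
from functools import lru_cache

# Hypothetical external utilities (unit profit per item); not taken from the source.
UNIT_PROFIT = {"a": 2, "b": 5, "c": 1}

def itemset_utility(qitemset, items, profit):
    """Utility of `items` inside one quantitative itemset: sum of quantity * unit profit."""
    return sum(qitemset[i] * profit[i] for i in items)

def pattern_utility(qseq, pattern, profit):
    """Maximum utility of `pattern` (a list of frozensets) over all of its ordered
    occurrences in `qseq` (a list of {item: quantity} dicts); None if it never occurs."""
    @lru_cache(maxsize=None)
    def best(p_idx, s_idx):
        # All pattern itemsets matched: nothing more to add.
        if p_idx == len(pattern):
            return 0
        result = None
        # Try every later itemset that contains the current pattern itemset.
        for j in range(s_idx, len(qseq)):
            if pattern[p_idx].issubset(qseq[j]):
                rest = best(p_idx + 1, j + 1)
                if rest is not None:
                    u = itemset_utility(qseq[j], pattern[p_idx], profit) + rest
                    if result is None or u > result:
                        result = u
        return result
    return best(0, 0)

qseq = [{"a": 2, "b": 1}, {"c": 3}, {"a": 1, "b": 2}]
print(pattern_utility(qseq, [frozenset({"a"}), frozenset({"b"})], UNIT_PROFIT))  # 14
```

Taking the maximum over occurrences (rather than the sum) is the usual choice because it keeps a pattern's utility from being inflated by overlapping matches; an incremental miner would additionally need to update these values as new transactions arrive.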

2.
MedlineR: an open source library in R for Medline literature data mining   Total citations: 3; self-citations: 0; citations by others: 3  
SUMMARY: We describe an open source library written in the R programming language for Medline literature data mining. This MedlineR library includes programs to query Medline through the NCBI PubMed database, to construct the co-occurrence matrix, and to visualize the network topology of query terms. The open source nature of this library allows users to extend it freely in the statistical programming language R. To demonstrate its utility, we have built an application that analyzes term association using only 10 lines of code. We provide MedlineR as a library foundation for bioinformaticians and statisticians to build more sophisticated literature data mining applications. AVAILABILITY: The library is available from http://dbsr.duke.edu/pub/MedlineR.
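MedlineR itself is an R library, and its function names are not given here. The following Python sketch only illustrates what a term co-occurrence matrix contains, assuming each retrieved abstract has already been reduced to the set of query terms it mentions; the input format, the term list, and the function name are hypothetical.

```python
from itertools import combinations

def cooccurrence_matrix(docs, terms):
    """Count, for every pair of query terms, the documents that mention both.
    docs: iterable of collections of terms found in each abstract (hypothetical input).
    The diagonal holds the number of documents mentioning each term at all."""
    index = {t: k for k, t in enumerate(terms)}
    n = len(terms)
    matrix = [[0] * n for _ in range(n)]
    for found in docs:
        # Deduplicate so a term repeated in one abstract is counted once.
        present = sorted({index[t] for t in found if t in index})
        for k in present:
            matrix[k][k] += 1
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix

terms = ["p53", "apoptosis", "insulin"]
docs = [{"p53", "apoptosis"}, {"p53"}, {"insulin", "apoptosis", "p53"}]
print(cooccurrence_matrix(docs, terms))  # [[3, 2, 1], [2, 2, 1], [1, 1, 1]]
```

Thresholding the off-diagonal counts gives the edges of a term network, which is the kind of topology the abstract describes visualizing.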

3.
Cytotherapy, 2022, 24(5): 445-455
Bone marrow aspirate concentrate (BMAC) therapy has been spotlighted as a promising regenerative tool because it is an abundant source of mesenchymal stromal cells (MSCs) and growth factors. The range of applications of BMAC therapy continues to expand as its potential is harnessed for varied therapeutic purposes. As the therapy evolves, a comprehensive summary of progress is essential for a greater understanding of the field and for refining future directions. Technological developments such as data mining, graphic drawing and information analytics, combined with computational statistics, have made the visualization of scientometric data a reality. With this perspective, we use scientometric tools, including text mining, cocitation analysis, keyword analysis and cluster network analysis, to perform thematic trend mapping and hotspot analysis of the literature on BMAC therapy and to evaluate its progress in the management of osteoarthritis.

4.
The availability of sequence data derived from shotgun sequencing programs enables mining for simple sequence repeats (SSRs), providing useful genetic markers for crop improvement. This study presents the development and characterization of 40 SSR markers from Brassica oleracea shotgun sequence and their cross‐amplification across Brassica species. The markers show reliable amplification, genome specificity and considerable polymorphism, demonstrating the utility of SSRs for genetic analysis of commercial Brassica germplasm.
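How SSRs can be mined from raw sequence is illustrated below with a small scanner for perfect tandem repeats. This is a sketch, not the pipeline used in the study: the minimum copy numbers per motif length are assumed, illustrative thresholds, and compound or imperfect repeats are ignored.

```python
# Minimum number of tandem copies per motif length; illustrative thresholds (assumed).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit ('ATAT' is not)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats in `seq`."""
    seq = seq.upper()
    hits = []
    for k, needed in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            copies = 1
            # Extend while the next k-mer is an exact copy of the motif.
            while seq[i + copies * k: i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= needed and "N" not in motif and is_primitive(motif):
                hits.append((i, motif, copies))
                i += copies * k  # skip past the repeat just recorded
            else:
                i += 1
    return sorted(hits)

print(find_ssrs("GG" + "AT" * 7 + "C" + "A" * 12 + "GCC"))
# [(2, 'AT', 7), (17, 'A', 12)]
```

Rejecting non-primitive motifs keeps a run such as `AAAAAA` from also being reported as an `AA` or `AAA` repeat; primer design around each hit would be a separate step.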

5.
LARaLink 2.0 (Loci Analysis for Rearrangement Link) is an enabling web technology that permits the rapid retrieval of clinical cytogenetic and molecular data. Version 2.0 builds on LARaLink 1.0 and adds new data mining capabilities that extend the utility of the system for applications in both the clinical and basic sciences, including access to the Chromosomal Variation in Man database and the GEO database. Together, these new resources enhance the user's ability to associate genotype with phenotype and to identify potential candidate genes. Unlimited access for researchers exploring disease-gene relationships and for clinicians extending practice in patient care is available at LARaLink.bioinformatics.wayne.edu:8080/unigene.

6.
Protein expression profiling is increasingly being used to discover, validate and characterize biomarkers that can potentially be used for diagnostic purposes and to aid in pharmaceutical development. Correct analysis of data obtained from these experiments requires an understanding of the underlying analytic procedures used to obtain the data, statistical principles underlying high-dimensional data and clinical statistical tools used to determine the utility of the interpreted data. This review summarizes each of these steps, with the goal of providing the nonstatistician proteomics researcher with a working understanding of the various approaches that may be used by statisticians. Emphasis is placed on the process of mining high-dimensional data to identify a specific set of biomarkers that may be used in a diagnostic or other assay setting.
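Among the clinical statistical tools commonly used to judge a candidate biomarker are the area under the ROC curve and the sensitivity and specificity at a chosen cutoff. The sketch below assumes a single marker in which higher values indicate disease; the marker values and the cutoff are invented for illustration.

```python
def roc_auc(cases, controls):
    """Probability that a randomly chosen case scores higher than a randomly
    chosen control (ties count one half); equals the area under the ROC curve."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def sens_spec(cases, controls, cutoff):
    """Sensitivity and specificity when values >= cutoff are called positive."""
    sensitivity = sum(x >= cutoff for x in cases) / len(cases)
    specificity = sum(y < cutoff for y in controls) / len(controls)
    return sensitivity, specificity

cases = [3.1, 4.0, 2.5, 5.2]      # hypothetical marker levels in patients
controls = [1.0, 2.5, 1.8, 2.0]   # hypothetical marker levels in controls
print(roc_auc(cases, controls))          # 0.96875
print(sens_spec(cases, controls, 2.5))   # (1.0, 0.75)
```

The pairwise form is quadratic in sample size but makes the probabilistic meaning of the AUC explicit; with many markers, the same quantity is usually computed from ranks, and the cutoff should be chosen on data separate from that used to evaluate it.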

8.
Surface-based and probabilistic atlases of primate cerebral cortex   Total citations: 3; self-citations: 0; citations by others: 3  
Van Essen DC, Dierker DL. Neuron, 2007, 56(2): 209-225
Brain atlases play an increasingly important role in neuroimaging, as they are invaluable for analysis, visualization, and comparison of results across studies. For both humans and macaque monkeys, digital brain atlases of many varieties are in widespread use, each having its own strengths and limitations. For studies of cerebral cortex there is particular utility in hybrid atlases that capitalize on the complementary nature of surface and volume representations, are based on a population average rather than an individual brain, and include measures of variation as well as averages. Linking different brain atlases to one another and to online databases containing a growing body of neuroimaging data will enable powerful forms of data mining that accelerate discovery and improve research efficiency.

9.
10.
The spot price for tantalum, a metal used in high-performance consumer electronics, spiked in 2000, triggering a boom in artisanal mining of surface deposits in the Democratic Republic of Congo (DRC). The profit from columbite-tantalite ore, or coltan, is alleged to have funded militants during that country's civil war. One warlord famously claimed that in 2000 coltan delivered a million dollars per month. While coltan mining was neither a necessary nor a sufficient cause of the civil war, there is nevertheless a clear association between mining and conflict. To trace global flows of coltan out of the DRC, we used a high-resolution multiregion input-output (MRIO) table and a hybrid life cycle assessment (LCA) approach to follow exports through international supply chains and estimate a “coltan footprint” for various products. In this case study, our aim is to highlight the power and utility of hybrid LCA analysis using high-resolution global MRIO accounts. We estimate which supply chains, nations, and consumer goods carry the largest loads of embodied coltan. This hybrid LCA case study provides estimates of illicit flows of coltan, estimates a coltan footprint of consumption, and highlights the advantages and challenges of using hybrid monetary-physical input-output/LCA approaches to study and quantify a negative social impact as an input to production. If successful, the hybrid LCA approach could be a useful and expedient measurement tool for understanding flows of conflict minerals embodied in supply chains.
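The core of an input-output footprint is the Leontief demand-pull calculation: with a technical-coefficient matrix A (A[i][j] is the input from sector i per unit of sector j's output) and direct coltan intensities s (coltan per unit of each sector's output), the total coltan embodied in one unit of each sector's final demand is m = s(I - A)^-1, and the footprint of a final-demand vector y is the dot product of m and y. The sketch below uses a toy two-sector economy with invented coefficients; the study's hybrid monetary-physical MRIO/LCA model is considerably more detailed.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def footprint_multipliers(A, s):
    """Direct plus supply-chain intensity per unit of final demand, m = s (I - A)^-1,
    obtained by solving the transposed system (I - A)^T m = s."""
    n = len(s)
    lhs_t = [[(1.0 if i == j else 0.0) - A[j][i] for j in range(n)] for i in range(n)]
    return solve(lhs_t, s)

def consumption_footprint(A, s, y):
    """Coltan embodied in final demand y."""
    return sum(mi * yi for mi, yi in zip(footprint_multipliers(A, s), y))

# Toy economy (invented numbers): sector 0 = mining, sector 1 = electronics.
A = [[0.0, 0.1],
     [0.0, 0.2]]
s = [0.5, 0.0]  # only mining extracts coltan directly
print(footprint_multipliers(A, s))              # approximately [0.5, 0.0625]
print(consumption_footprint(A, s, [0.0, 100.0]))  # approximately 6.25
```

Solving the transposed system avoids forming the full Leontief inverse; for a productive economy I - A is nonsingular, so the elimination does not hit a zero pivot.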

11.
The strength of the rat as a model organism lies in its utility in pharmacology, biochemistry and physiology research. Data resulting from such studies are difficult to represent in databases, and creating user-friendly data mining tools has proved challenging. The Rat Genome Database (RGD) has developed a comprehensive ontology-based data structure and annotation system to integrate physiological data with environmental and experimental factors, as well as genetic and genomic information. RGD uses multiple ontologies to integrate complex biological information from the molecular level to the whole organism, and to develop data mining and presentation tools. This approach allows RGD to indicate not only the phenotypes seen in a strain but also the specific values under each diet and atmospheric condition, as well as gender differences. Harnessing ontologies in this way allows users to gather and filter data in a customized fashion; for example, a researcher can retrieve all phenotype readings for which high hypoxia is a factor. By using the same data structure for expression data, pathways and biological processes, RGD will provide a comprehensive research platform that allows users to investigate the conditions under which biological processes are altered and to elucidate the mechanisms of disease.

12.
LC MS/MS has become an established technology in proteomic studies, and with the maturation of the technology the bottleneck has shifted from data generation to data validation and mining. To address this bottleneck we developed Experimental Peptide Identification Repository (EPIR), which is an integrated software platform for storage, validation, and mining of LC MS/MS-derived peptide evidence. EPIR is a cumulative data repository where precursor ions are linked to peptide assignments and protein associations returned by a search engine (e.g. Mascot, Sequest, or PepSea). Any number of datasets can be parsed into EPIR and subsequently validated and mined using a set of software modules that overlay the database. These include a peptide validation module, a protein grouping module, a generic module for extracting quantitative data, a comparative module, and additional modules for extracting statistical information. In the present study, the utility of EPIR and associated software tools is demonstrated on LC MS/MS data derived from a set of model proteins and complex protein mixtures derived from MCF-7 breast cancer cells. Emphasis is placed on the key strengths of EPIR, including the ability to validate and mine multiple combined datasets, and presentation of protein-level evidence in concise, nonredundant protein groups that are based on shared peptide evidence.
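Grouping proteins by shared peptide evidence can be sketched with a simple parsimony rule: proteins identified by exactly the same peptides are merged into one group, and a group whose peptides are a strict subset of another group's adds no independent evidence. This is an assumed rule for illustration, not EPIR's documented grouping algorithm, and the input dictionary and identifiers are hypothetical.

```python
from collections import defaultdict

def group_proteins(assignments):
    """assignments: {protein_id: set of peptide sequences} (hypothetical input).
    Returns (groups, subsumed): groups are (members, peptides) for nonredundant
    evidence sets; subsumed lists proteins whose peptides are a strict subset
    of some other protein's peptides."""
    by_evidence = defaultdict(list)
    for protein, peptides in assignments.items():
        by_evidence[frozenset(peptides)].append(protein)
    groups, subsumed = [], []
    for evidence, members in by_evidence.items():
        # A strict subset of another evidence set carries no unique peptides.
        if any(evidence < other for other in by_evidence):
            subsumed.extend(members)
        else:
            groups.append((sorted(members), sorted(evidence)))
    return sorted(groups), sorted(subsumed)

assignments = {
    "P1": {"a", "b", "c"},
    "P2": {"a", "b", "c"},  # indistinguishable from P1
    "P3": {"a", "b"},       # subsumed by P1/P2
    "P4": {"d"},
    "P5": {"c", "e"},       # shares c but has unique peptide e
}
print(group_proteins(assignments))
# ([(['P1', 'P2'], ['a', 'b', 'c']), (['P4'], ['d']), (['P5'], ['c', 'e'])], ['P3'])
```

Real grouping tools typically also weigh peptide scores and handle shared peptides more carefully; this rule only shows how shared evidence collapses redundant protein entries.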

13.

Background

The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today.

Methodology/Results

We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets.

Conclusions

Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.
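The abstract does not specify which rank-based enrichment statistics the framework uses. One widely used option is the Wilcoxon rank-sum (Mann-Whitney) test of whether a gene set's members sit near the top of a ranked gene list, sketched below with a normal approximation and no tie correction (ranks in a list are unique); the gene names are placeholders.

```python
import math

def rank_enrichment_z(ranked_genes, gene_set):
    """Wilcoxon rank-sum (Mann-Whitney) z-score testing whether members of
    `gene_set` sit nearer the top of `ranked_genes` (rank 1 = top of the list).
    Positive z means the set is enriched toward the top."""
    n = len(ranked_genes)
    member_ranks = [r for r, g in enumerate(ranked_genes, start=1) if g in gene_set]
    n1 = len(member_ranks)
    n2 = n - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("gene set must overlap the list without covering all of it")
    # U = number of (member, non-member) pairs in which the member is ranked higher.
    u = n1 * n2 + n1 * (n1 + 1) / 2 - sum(member_ranks)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n + 1) / 12)
    return (u - mean) / sd

ranked = [f"g{i}" for i in range(1, 11)]
print(rank_enrichment_z(ranked, {"g1", "g2", "g3"}))   # about 2.39
print(rank_enrichment_z(ranked, {"g8", "g9", "g10"}))  # about -2.39
```

Because only ranks are used, the statistic can be compared across datasets measured on different platforms and scales, which is one reason rank-based methods suit heterogeneous public collections.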

14.
Metabolomics experiments seldom achieve their aim of comprehensively covering the entire metabolome. However, important information can be gleaned even from sparse datasets, which can be facilitated by placing the results within the context of known metabolic networks. Here we present a method that allows the automatic assignment of identified metabolites to positions within known metabolic networks and, furthermore, allows automated extraction of sub-networks of biological significance. This latter feature is made possible by a gap-filling algorithm. The utility of the algorithm in reconstructing and mining metabolomics data is shown on two independent datasets generated with LC–MS LTQ-Orbitrap mass spectrometry. Biologically relevant metabolic sub-networks were extracted from both datasets. Moreover, a number of metabolites whose presence eluded automatic selection within mass spectra could be identified retrospectively by virtue of their inferred presence through gap filling.
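One simple graph-based reading of gap filling: treat the known network as an undirected metabolite graph, connect pairs of identified metabolites by shortest paths, and report the unobserved intermediates on those paths as inferred metabolites. This is an assumption-laden sketch, not the authors' algorithm; the toy glycolysis-like network and the `max_gap` limit are hypothetical.

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from (metabolite, metabolite) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

def gap_fill(graph, observed, max_gap=2):
    """Join pairs of observed metabolites by shortest paths with at most `max_gap`
    intermediate nodes; return the sub-network nodes and the inferred ones."""
    nodes = set(observed)
    for a, b in combinations(sorted(observed), 2):
        path = shortest_path(graph, a, b)
        if path is not None and len(path) - 2 <= max_gap:
            nodes.update(path)
    return nodes, nodes - set(observed)

graph = build_graph([("glucose", "G6P"), ("G6P", "F6P"), ("G6P", "6PG"),
                     ("F6P", "F16BP"), ("F16BP", "G3P"), ("G3P", "pyruvate")])
nodes, inferred = gap_fill(graph, {"glucose", "F6P", "pyruvate"})
print(sorted(inferred))  # ['F16BP', 'G3P', 'G6P']
```

The `max_gap` limit keeps distant, weakly supported connections out of the extracted sub-network; without it, every connected pair would pull in arbitrarily long paths.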

15.
Protein profiling using high-throughput tandem mass spectrometry has become a powerful method for analyzing changes in global protein expression patterns in cells and tissues as a function of developmental, physiologic and disease processes. This review summarizes the utility and practical application of multidimensional protein identification technology as a platform for comprehensive proteomic profiling of complex biologic samples. The strengths, potential problems and limitations of this powerful technology are discussed, with emphasis on one of the biggest challenges currently facing large-scale expression profiling projects: data analysis. Complementary bioinformatic data mining strategies, such as clustering, functional annotation and statistical inference, are also discussed, as these are increasingly necessary for interpreting the results of global proteomic profiling studies.

17.
This Comment records the details of an unusual multipartner ecological research programme studying Temperate Highland Peat Swamps on Sandstone in the Sydney Basin‐Blue Mountains area of New South Wales. We draw lessons from the experience of designing and managing this programme, which was based on a nontraditional funding source: an enforceable undertaking required of a coal mining company in relation to an incident at a mine site. The research programme encompassed the geomorphology, ecohydrology and ecology of a number of sites. Given currently constrained funding for public-good environmental research and the pressure on both researchers and managers to find new, collaborative ways of funding and implementing research, lessons drawn from such innovative experiences may be of wider utility. In particular, the programme offers lessons about the time required to design collaborative processes and the need for explicit programme management capacities.

18.
The availability of expressed sequence data derived from gene discovery programs enables mining for simple sequence repeats (SSRs), providing useful genetic markers for crop improvement. These markers are inexpensive, require minimal labour to produce and can frequently be associated with functionally annotated genes. This study presents the development and characterization of 24 expressed sequence tag (EST)-SSR markers from Brassica napus and their cross-amplification across Brassica species. The markers show reliable amplification, genome specificity and considerable polymorphism, demonstrating the utility of EST-SSRs for genetic analysis of wild Brassica populations and commercial Brassica germplasm.

19.
The availability of expressed sequence data derived from gene discovery programmes enables mining for simple sequence repeats (SSRs), providing useful genetic markers for crop improvement. These markers are inexpensive, require minimal labour to produce and can frequently be associated with functionally annotated genes. This study reports on the development and characterization of expressed sequence tag (EST)–SSR markers in the cultivated strawberry, Fragaria×ananassa. Fourteen primer pairs were assessed for polymorphism in 13 F.×ananassa genotypes. The markers show reliable amplification and considerable polymorphism, demonstrating the utility of EST–SSRs for genetic analysis of commercial strawberry germplasm.

20.
We report on the data mining of publicly available Litopenaeus vannamei expressed sequence tags (ESTs) to generate simple sequence repeat (SSR) markers and on their transferability between related Penaeid shrimp species. Repeat motifs were found in 3.8% of the evaluated ESTs, at a frequency of one repeat every 7.8 kb of sequence data. A total of 206 primer pairs were designed, and 112 loci were amplified, with the highest success in L. vannamei. A high percentage (69%) of EST-SSRs were transferable within the genus Litopenaeus. More than half of the amplified products were polymorphic in a small testing panel of L. vannamei. Evaluation of those primers in a larger testing panel showed that 72% of the markers fit Hardy-Weinberg equilibrium, indicating their utility for population genetic analysis. Additionally, a set of 26 of the EST-SSRs was evaluated for Mendelian segregation. A high percentage of monomorphic markers (46%) proved to be polymorphic by single-stranded conformational polymorphism analysis. Because of the high number of ESTs available in public databases, a data mining approach similar to the one outlined here might yield high numbers of SSR markers in many animal taxa.
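A Hardy-Weinberg check for a multiallelic SSR locus can be sketched as a chi-square goodness-of-fit test, with expected genotype counts n·p_i² for homozygotes and 2n·p_i·p_j for heterozygotes, and k(k - 1)/2 degrees of freedom for k alleles. The abstract does not say which test was applied; with small panels and many alleles, expected counts become small and exact tests are usually preferred. The genotypes below (coded as fragment sizes) are invented.

```python
from collections import Counter
from itertools import combinations_with_replacement

def hwe_chi_square(genotypes):
    """Chi-square goodness-of-fit to Hardy-Weinberg proportions for one locus.
    genotypes: list of (allele1, allele2) tuples, e.g. SSR fragment sizes.
    Returns (chi-square statistic, degrees of freedom)."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    # Store genotypes unordered, so (152, 150) and (150, 152) are the same class.
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    alleles = sorted(freq)
    chi2 = 0.0
    for a, b in combinations_with_replacement(alleles, 2):
        expected = n * freq[a] ** 2 if a == b else 2 * n * freq[a] * freq[b]
        chi2 += (observed[(a, b)] - expected) ** 2 / expected
    k = len(alleles)
    return chi2, k * (k - 1) // 2

in_equilibrium = [(150, 150)] * 25 + [(150, 152)] * 50 + [(152, 152)] * 25
no_heterozygotes = [(150, 150)] * 50 + [(152, 152)] * 50
print(hwe_chi_square(in_equilibrium))    # (0.0, 1)
print(hwe_chi_square(no_heterozygotes))  # (100.0, 1)
```

The statistic is then compared with a chi-square distribution on the returned degrees of freedom; a heterozygote deficit like the second example would, for instance, be expected from null alleles or population substructure.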


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417