Similar Documents
 20 similar documents found.
1.
Text-mining algorithms make mistakes when extracting facts from natural-language texts. In biomedical applications that rely on text-mined data, it is critical to assess the quality of individual facts, i.e., the probability that each fact was correctly extracted, in order to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once, producing independent evaluations), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.
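As a rough illustration of the evaluation setup (not the authors' system), the sketch below trains a classifier on simulated per-fact features against human true/false verdicts and reports cross-validated ROC AUC; the feature names, the logistic-regression choice, and the synthetic data are all assumptions.

```python
# A minimal sketch, assuming synthetic data: score text-mined facts against
# human true/false evaluations and measure agreement with ROC AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
# Hypothetical per-fact features, e.g. extraction-rule confidence,
# sentence length, number of independent supporting sentences.
X = rng.normal(size=(n, 3))
# Simulated human verdicts (1 = fact correctly extracted).
y = (X @ np.array([2.0, -0.5, 1.0]) + rng.normal(size=n) > 0).astype(int)

clf = LogisticRegression(max_iter=1000)
scores = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
print("ROC AUC vs. human evaluations:", round(roc_auc_score(y, scores), 3))
```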

2.
It is necessary to decompose the intramuscular EMG signal to extract motor unit action potential (MUAP) waveforms and firing times. Several algorithms have been proposed in the literature to resolve superimposed MUAPs, including Peel-Off (PO), branch and bound (BB), genetic algorithm (GA), and particle swarm optimization (PSO). This study aimed to compare these algorithms in terms of overall accuracy and running time. Two sets of two-to-five MUAP templates were used (set 1: a wide range of energies; set 2: a high degree of similarity). The templates were time-shifted, and white Gaussian noise was added. A total of 1000 superpositions were simulated for each template and were resolved using the PO (and POI, an interpolated variant of PO), BB, GA, and PSO algorithms. The generalized estimating equation was used to identify which method significantly outperformed the others, while the overall rank product was used for overall ranking. The rankings were PSO, BB, GA, PO, and POI on the first set, and BB, PSO, GA, PO, and POI on the second. The overall ranking on the entire dataset was BB, PSO, GA, PO, and POI. Although the BB algorithm is generally fast, there are cases in which it is too slow, making it unsuitable for real-time applications.
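To make the superposition-resolution task concrete, here is a toy sketch: given two known MUAP templates, exhaustively search their time shifts so that the template sum best matches an observed noisy segment. This brute-force search is only illustrative; it is not the cited PO, BB, GA, or PSO implementations, and the templates and noise level are invented.

```python
# Toy superposition resolution: least-squares search over template time shifts.
import itertools
import numpy as np

def place(template, shift, length):
    """Insert a template into a zero signal of given length at 'shift'."""
    out = np.zeros(length)
    out[shift:shift + len(template)] = template
    return out

rng = np.random.default_rng(1)
L = 120
t = np.arange(30)
templates = [np.sin(2 * np.pi * t / 30), 0.7 * np.hanning(30)]
true_shifts = (20, 55)
observed = sum(place(tpl, s, L) for tpl, s in zip(templates, true_shifts))
observed += 0.05 * rng.normal(size=L)   # additive white Gaussian noise

best = min(
    itertools.product(range(L - 30), repeat=2),
    key=lambda shifts: np.sum(
        (observed - sum(place(tpl, s, L) for tpl, s in zip(templates, shifts))) ** 2
    ),
)
print("estimated shifts:", best, "true shifts:", true_shifts)
```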

3.

Background  

The choice of probe set algorithm for expression summary in a GeneChip study has a great impact on subsequent gene expression data analysis. Spiked-in cRNAs with known concentrations are often used to assess the relative performance of probe set algorithms. Given that the spiked-in cRNAs do not represent endogenously expressed genes in experiments, it becomes increasingly important to have methods for assessing whether a particular probe set algorithm is more appropriate for a specific dataset, without using such external reference data.

4.
Thrombus in a femoral artery may form under stagnant flow conditions, which vary depending on the local arterial waveform. Four different physiological flow waveforms – poor (blunt) monophasic, sharp monophasic, biphasic and triphasic – can exist in the femoral artery as a result of different levels of peripheral arterial disease progression. This study aims to examine the effect of different physiological waveforms on femoral artery haemodynamics. To this end, a fluid–structure interaction analysis was carried out on idealised models of a bifurcated common femoral artery. The results showed that recirculation zones occur under almost all flow waveforms; however, the sites where these vortices are initiated, and their size and structure, depend strongly on the type of flow waveform. It was shown that the reverse diastolic flow in the biphasic and triphasic waveforms gives rise to a retrograde flow that aids in ‘washout’ of the disturbed flow regions. This may limit the likelihood of thrombus formation, indicating the antithrombotic role of retrograde flow in femoral arteries. Furthermore, our data revealed that flow particles experience considerably higher residence times under the blunt and sharp monophasic waveforms than under the biphasic and triphasic waveforms. This confirms that the risk of atherothrombotic plaque initiation and development in femoral arteries is higher under blunt and sharp monophasic waveforms than under biphasic and triphasic flow waveforms.

5.
Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that provides each annotation with a confidence value (i.e., the posterior probability that it is correct). Our approach allows computing a certainty level for individual annotations, given annotator-specific parameters estimated from data. We developed two probabilistic models for performing this analysis, compared these models using computer simulation, and tested each model's actual performance on a large dataset generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach allows us to significantly increase the probability of choosing the correct annotation. Along with this publication we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we introduced in earlier work. The 10,000 sentences were all annotated three-fold by a group of eight experts, while a 1,000-sentence subset was further annotated five-fold by five new experts. While the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology.
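A much-simplified version of the core idea can be written as a naive-Bayes combination: given each annotator's estimated accuracy, conflicting votes are converted into a posterior over candidate labels. This sketch assumes a uniform prior, independent annotators, and errors spread evenly over the remaining labels; it is not either of the paper's two models.

```python
# A simplified sketch (not the authors' models): combine conflicting votes
# using per-annotator accuracies to obtain a posterior over candidate labels.
import numpy as np

def label_posterior(votes, accuracies, labels):
    """votes: dict annotator -> label; accuracies: dict annotator -> P(correct)."""
    log_post = {}
    for candidate in labels:
        lp = 0.0
        for annotator, vote in votes.items():
            acc = accuracies[annotator]
            # If 'candidate' is correct, the annotator votes for it with
            # probability acc, and spreads errors evenly over other labels.
            p = acc if vote == candidate else (1 - acc) / (len(labels) - 1)
            lp += np.log(p)
        log_post[candidate] = lp
    z = np.logaddexp.reduce(list(log_post.values()))
    return {c: float(np.exp(lp - z)) for c, lp in log_post.items()}

votes = {"A": "positive", "B": "positive", "C": "negative"}
accuracies = {"A": 0.9, "B": 0.7, "C": 0.6}
print(label_posterior(votes, accuracies, ["positive", "negative", "neutral"]))
```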

6.
Medical device manufacturers are increasingly applying artificial intelligence (AI) to innovate their products and to improve patient outcomes. Health institutions are also developing their own algorithms to address specific needs for which no commercial product exists. Although AI-based algorithms offer good prospects for improving patient outcomes, their wide adoption in clinical practice is still limited. The most significant barriers to the trust required for wider implementation are assurance of safety and clinical performance. Qualified medical physicist experts (MPEs) play a key role in assessing the safety and performance of such tools, before and during their integration into clinical practice. As AI methods drive clinical decision-making, their quality should be assured and tested. Occasionally, an MPE may also be involved in the in-house development of such an AI algorithm. It is therefore important for MPEs to be well informed about the current regulatory framework for medical devices. The new European Medical Device Regulation (EU MDR), with its date of application set for 26 May 2021, imposes stringent requirements that must be met before such tools can be applied in clinical practice. The objective of this paper is to give MPEs a perspective on how the EU MDR affects the development of AI-based medical device software. We present our perspective on how to implement a regulatory roadmap, from early-stage considerations through design and development, regulatory submission, and post-market surveillance. We have also included an explanation of how to set up a compliant quality management system to ensure reliable and consistent product quality, safety, and performance.

7.
Gan X, Liew AW, Yan H. Nucleic Acids Research 2006, 34(5):1608-1619
Gene expression measurements obtained with microarrays usually suffer from the missing value problem, yet many data analysis methods require a complete data matrix. Although existing missing value imputation algorithms have shown good performance in dealing with missing values, they also have limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimates when the data are dominated by global structure. In addition, these algorithms do not take any biological constraints into account in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge as a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets that take the biological characteristics of the data into consideration: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits the biological phenomenon of synchronization loss in microarray experiments. In cyclic systems, synchronization loss is a common phenomenon, and we construct a series of sets based on this phenomenon for our POCS imputation algorithm. Experiments show that our algorithm achieves a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods.
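The alternating-projection flavour of such schemes can be sketched very roughly as follows: repeatedly enforce the observed entries, then project onto a structural constraint. Here a rank-k truncation stands in for the paper's correlation-based sets (and, strictly, the rank-k set is not convex), so this is only a schematic illustration of the iteration, not the authors' algorithm.

```python
# Illustrative alternating-projection imputation, in the spirit of POCS but
# NOT the authors' convex sets: alternate between (1) enforcing observed
# entries and (2) projecting onto rank-k matrices via truncated SVD.
import numpy as np

def impute(X, observed_mask, rank=2, n_iter=100):
    filled = np.where(observed_mask, X, np.nanmean(X[observed_mask]))
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # structure constraint
        filled = np.where(observed_mask, X, low_rank)      # data constraint
    return filled

rng = np.random.default_rng(2)
true = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 20))  # genuinely rank-2
mask = rng.random(true.shape) > 0.2                         # ~20% missing
est = impute(np.where(mask, true, np.nan), mask)
print("RMSE on missing entries:",
      round(float(np.sqrt(np.mean((est - true)[~mask] ** 2))), 4))
```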

8.
We present two algorithms for calculating the quartet distance between all pairs of trees in a set of binary evolutionary trees on a common set of species. The algorithms exploit common substructure among the trees to speed up the pairwise distance calculations, and thus perform significantly better on large sets of trees than carrying out distinct pairwise distance calculations, as we illustrate experimentally; we observe a speedup factor of around 130 in the best case.

9.
Clustering algorithms divide a set of observations into groups so that members of the same group share common features. In most algorithms, tunable parameters are set arbitrarily or by trial and error, resulting in less than optimal clustering. This paper presents a global optimization strategy for the systematic and optimal selection of parameter values associated with a clustering method. In the process, a performance criterion for the optimization model is proposed and benchmarked against popular performance criteria from the literature (namely, the Silhouette coefficient, Dunn's index, and the Davies-Bouldin index). The tuning strategy is illustrated using the support vector clustering (SVC) algorithm and simulated annealing. In order to reduce the computational burden, the paper also proposes an alternative to the adjacency matrix method (used for the assignment of cluster labels), namely a contour plotting approach. The datasets tested include the iris and thyroid datasets from the UCI repository, as well as lymphoma and breast cancer data. The optimal tuning parameters are determined efficiently, while the contour plotting approach leads to significant reductions in computational effort (CPU time), especially for large datasets. The performance criteria comparisons give mixed results: the Silhouette coefficient and the Davies-Bouldin index perform better, while Dunn's index is on average worse, than the proposed performance index.
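The general tuning idea can be sketched with off-the-shelf pieces: simulated annealing over a single clustering parameter, scored by the silhouette coefficient. DBSCAN's eps parameter stands in for the SVC parameters used in the paper (support vector clustering is not part of scikit-learn), and the annealing schedule is an arbitrary choice for illustration.

```python
# A minimal sketch of parameter tuning by simulated annealing, scored with the
# silhouette coefficient; DBSCAN stands in for support vector clustering.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X = load_iris().data
rng = np.random.default_rng(3)

def score(eps):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    if len(set(labels)) < 2:            # silhouette needs >= 2 clusters
        return -1.0
    return silhouette_score(X, labels)

eps = best_eps = 0.5
current = best = score(eps)
for step in range(200):                  # simple annealing loop
    temp = 1.0 * (0.98 ** step)
    cand = float(np.clip(eps + rng.normal(scale=0.1), 0.05, 3.0))
    s = score(cand)
    if s > current or rng.random() < np.exp((s - current) / temp):
        eps, current = cand, s
        if s > best:
            best_eps, best = cand, s
print(f"best eps={best_eps:.3f}, silhouette={best:.3f}")
```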

10.
11.
MOTIVATION: The most commonly used microarrays for mRNA profiling (Affymetrix) include 'probe sets' consisting of a series of perfect match and mismatch probes (typically 22 oligonucleotides per probe set). There is an increasing number of reported 'probe set algorithms' that differ in their interpretation of a probe set to derive a single normalized 'signal' representative of the expression of each mRNA. These algorithms are known to differ in accuracy and sensitivity, and optimization has been done using a small set of standardized control microarray data. We hypothesized that different mRNA profiling projects have varying sources and degrees of confounding noise, and that these should alter the choice of a specific probe set algorithm. We also hypothesized that use of the Microarray Suite (MAS) 5.0 probe set detection p-value as a weighting function would improve the performance of all probe set algorithms. RESULTS: We built an interactive visual analysis software tool (HCE2W) to test and define parameters in Affymetrix analyses that optimize the ratio of signal (the desired biological variable) to noise (confounding uncontrolled variables). Five probe set algorithms were studied with and without statistical weighting of probe sets using the MAS 5.0 probe set detection p-values. The signal-to-noise ratio optimization method was tested in two large novel microarray datasets with different levels of confounding noise: a 105-sample U133A human muscle biopsy dataset (11 groups: mutation-defined, extensive noise) and a 40-sample U74A inbred mouse lung dataset (8 groups: little noise). Performance was measured by the ability of the specific probe set algorithm, with and without detection p-value weighting, to cluster samples into the appropriate biological groups (unsupervised agglomerative clustering with F-measure values). Of the total random sampling analyses, 50% showed a highly statistically significant difference between probe set algorithms by ANOVA [F(4,10) > 14, p < 0.0001], with weighting by MAS 5.0 detection p-value showing significance in the mouse data by ANOVA [F(1,10) > 9, p < 0.013] and paired t-test [t(9) = -3.675, p = 0.005]. Probe set detection p-value weighting had the greatest positive effect on the performance of the dChip difference model, ProbeProfiler and RMA algorithms. Importantly, probe set algorithms did indeed perform differently depending on the specific project, most probably owing to the degree of confounding noise. Our data indicate that significantly improved data analysis of mRNA profiling projects can be achieved by matching the choice of probe set algorithm to the noise levels intrinsic to a project, with the dChip difference model with continuous MAS 5.0 detection p-value weighting showing the best overall performance in both projects. Furthermore, both existing and newly developed probe set algorithms should incorporate detection p-value weighting to improve performance. AVAILABILITY: The Hierarchical Clustering Explorer 2.0 is available at http://www.cs.umd.edu/hcil/hce/. Murine arrays (40 samples) are publicly available at the PEPR resource (http://microarray.cnmcresearch.org/pgadatatable.asp; http://pepr.cnmcresearch.org; Chen et al., 2004).
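The weighting idea itself is simple to sketch: scale each probe set's normalized signal by a continuous weight derived from its detection p-value before downstream clustering. The weight form used below (1 − p) is an assumption for illustration, not the exact continuous weighting implemented in HCE2W, and the data are random.

```python
# Schematic detection p-value weighting of probe set signals (assumed form).
import numpy as np

rng = np.random.default_rng(4)
signal = rng.lognormal(mean=5, sigma=1, size=(1000, 12))   # probe sets x arrays
detection_p = rng.random(size=(1000, 12))                  # MAS 5.0 detection p-values

weight = 1.0 - detection_p           # confident ("present") calls get weight ~1
weighted = np.log2(signal) * weight  # weighted log-signal fed to clustering
print(weighted.shape)
```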

12.
With the discovery of diverse roles for RNA, its centrality in cellular functions has become increasingly apparent. A number of algorithms have been developed to predict RNA secondary structure, and their performance has been benchmarked by comparing structure predictions to reference secondary structures. Generally, algorithms are compared against each other and one is selected as best without statistical testing to determine whether the improvement is significant. In this work, it is demonstrated that the prediction accuracies of methods correlate with each other over sets of sequences; one possible reason for this correlation is that many algorithms use the same underlying principles. As an example, a previously published set of benchmarks for programs that predict a structure common to three or more sequences is statistically analyzed to show that it can be rigorously evaluated using paired two-sample t-tests. Finally, a pipeline of statistical analyses is proposed to guide the choice of dataset size and performance assessment for benchmarks of structure prediction. The pipeline is applied using 5S rRNA sequences as an example.
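The statistical step described here is a standard paired test; a small example with made-up per-sequence accuracies (the numbers are not from the paper's benchmarks) looks like this:

```python
# Paired two-sample t-test on per-sequence accuracies of two prediction
# methods evaluated on the same sequences (illustrative numbers only).
from scipy import stats

method_a = [0.81, 0.76, 0.92, 0.68, 0.73, 0.88, 0.79, 0.85]
method_b = [0.78, 0.74, 0.90, 0.70, 0.69, 0.84, 0.77, 0.83]

t_stat, p_value = stats.ttest_rel(method_a, method_b)
print(f"paired t = {t_stat:.3f}, p = {p_value:.4f}")
```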

13.
The artificial bee colony (ABC) algorithm is a recent class of swarm intelligence algorithms loosely inspired by the foraging behavior of honeybee swarms. It was introduced in 2005 using continuous optimization problems as an example application. As has happened with other swarm intelligence techniques, after the initial proposal several researchers have studied variants of the original algorithm. Unfortunately, these variants have often been tested under different experimental conditions and with different fine-tuning efforts for the algorithm parameters. In this article, we review variants of the original ABC algorithm and experimentally study nine ABC algorithms under two settings: either using the original parameter settings as proposed by the authors, or using an automatic algorithm configuration tool with the same tuning effort for each algorithm. We also study the effect of adding local search to the ABC algorithms. Our experimental results show that local search can considerably improve the performance of several ABC variants and that it strongly reduces the performance differences between the studied variants. We also show that the best ABC variants are competitive with recent state-of-the-art algorithms on the benchmark set we used, which establishes ABC algorithms as serious competitors in continuous optimization.
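For readers unfamiliar with ABC, the sketch below shows the structure of the original algorithm's main loop (employed, onlooker, and scout phases) on the sphere function. It is a compact simplification for illustration, not any of the nine tuned variants studied in the article, and the parameter values are arbitrary.

```python
# Compact sketch of the original ABC loop on the sphere function.
import numpy as np

def abc_minimize(f, dim=10, bounds=(-5.0, 5.0), n_food=20, limit=50, iters=500):
    rng = np.random.default_rng(5)
    lo, hi = bounds
    food = rng.uniform(lo, hi, size=(n_food, dim))
    fit = np.array([f(x) for x in food])
    trials = np.zeros(n_food)

    def try_neighbor(i):
        k = rng.integers(n_food - 1)
        k = k if k < i else k + 1                  # random partner != i
        j = rng.integers(dim)
        cand = food[i].copy()
        cand[j] += rng.uniform(-1, 1) * (food[i, j] - food[k, j])
        cand[j] = np.clip(cand[j], lo, hi)
        c = f(cand)
        if c < fit[i]:
            food[i], fit[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_food):                              # employed bee phase
            try_neighbor(i)
        probs = 1.0 / (1.0 + fit)
        probs /= probs.sum()
        for i in rng.choice(n_food, size=n_food, p=probs):   # onlooker phase
            try_neighbor(i)
        worn = np.argmax(trials)                             # scout phase
        if trials[worn] > limit:
            food[worn] = rng.uniform(lo, hi, size=dim)
            fit[worn] = f(food[worn])
            trials[worn] = 0
    return food[np.argmin(fit)], fit.min()

best_x, best_f = abc_minimize(lambda x: float(np.sum(x ** 2)))
print("best value:", round(best_f, 6))
```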

14.
Aim: Trait-based risk assessment for invasive species is becoming an important tool for identifying non-indigenous species that are likely to cause harm. Despite this, concerns remain that the invasion process is too complex for accurate predictions to be made. Our goal was to test risk assessment performance across a range of taxonomic and geographical scales, at different points in the invasion process, and with a range of statistical and machine learning algorithms. Location: Regional to global datasets. Methods: We selected six datasets differing in size, geography and taxonomic scope. For each dataset, we created seven risk assessment tools using a range of statistical and machine learning algorithms. The performance of the tools was compared to determine the effects of dataset size and scale and of the algorithm used, and to assess the overall performance of the trait-based risk assessment approach. Results: Risk assessment tools with good performance were generated for all datasets. Random forests (RF) and logistic regression (LR) consistently produced tools with high performance; other algorithms had varied performance. Despite their greater power and flexibility, machine learning algorithms did not systematically outperform statistical algorithms. Neither the geographic scope nor the size of the dataset systematically affected risk assessment performance. Main conclusions: Across six representative datasets, we were able to create risk assessment tools with high performance. Additional datasets could be generated for other taxonomic groups and regions, and these could support efforts to prevent the arrival of new invaders. RF and LR approaches performed well for all datasets and could be used as a standard approach to risk assessment development.
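A hedged sketch of the kind of comparison described: fit random forest and logistic regression classifiers on species traits and compare them by cross-validated ROC AUC. The trait names and the synthetic data are stand-ins, not the six datasets used in the study.

```python
# Trait-based risk assessment comparison on synthetic stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n = 400
traits = rng.normal(size=(n, 5))   # e.g. fecundity, climate match, dispersal ...
invasive = (traits[:, 0] + 0.5 * traits[:, 1] + rng.normal(size=n) > 0).astype(int)

models = {
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
}
for name, model in models.items():
    auc = cross_val_score(model, traits, invasive, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {auc.mean():.3f}")
```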

15.
The graphics processing unit (GPU), which was originally used exclusively for visualization purposes, has evolved into an extremely powerful co-processor. In the meantime, through the development of elaborate interfaces, the GPU can be used to process data and to handle computationally intensive applications. The speed-up factors attained compared to the central processing unit (CPU) depend on the particular application, as the GPU architecture gives the best performance for algorithms that exhibit high data parallelism and high arithmetic intensity. Here, we evaluate the performance of the GPU on a number of common algorithms used for three-dimensional image processing. The algorithms were developed on a new software platform called CUDA, which allows a direct translation from C code to the GPU. The implemented algorithms include spatial transformations, real-space and Fourier operations, as well as pattern recognition procedures, reconstruction algorithms and classification procedures. In our implementation, the direct porting of C code to the GPU achieves typical acceleration values on the order of 10-20 times compared to a state-of-the-art conventional processor, although they vary depending on the type of algorithm. The gained speed-up comes at no additional cost, since the software runs on the GPU of the graphics card of common workstations.
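The article's implementations were written against the CUDA C interface; as a rough, modern way to reproduce the flavour of a GPU-vs-CPU comparison for one Fourier-space operation, the sketch below times a 3-D FFT with NumPy and, if available, CuPy. CuPy here is purely an assumed stand-in and requires a CUDA-capable machine.

```python
# GPU-vs-CPU timing of a 3-D FFT (CuPy assumed as a stand-in for CUDA C).
import time
import numpy as np

vol = np.random.rand(128, 128, 128).astype(np.float32)

t0 = time.perf_counter()
np.fft.fftn(vol)                                        # CPU reference
cpu_s = time.perf_counter() - t0

try:
    import cupy as cp
    g = cp.asarray(vol)
    cp.fft.fftn(g); cp.cuda.Stream.null.synchronize()   # warm-up
    t0 = time.perf_counter()
    cp.fft.fftn(g); cp.cuda.Stream.null.synchronize()
    gpu_s = time.perf_counter() - t0
    print(f"CPU {cpu_s:.3f}s, GPU {gpu_s:.3f}s, speed-up ~{cpu_s / gpu_s:.0f}x")
except ImportError:
    print(f"CPU {cpu_s:.3f}s (CuPy not available, GPU timing skipped)")
```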

16.
C Wang, XJ Guo, JF Xu, C Wu, YL Sun, XF Ye, W Qian, XQ Ma, WM Du, J He. PLoS ONE 2012, 7(7):e40561

Background

The detection of signals of adverse drug events (ADEs) has increased because of the use of data mining algorithms in spontaneous reporting systems (SRSs). However, different data mining algorithms have different traits and conditions of application. The objective of our study was to explore the application of association rule (AR) mining to ADE signal detection and to compare its performance with that of other algorithms.

Methodology/Principal Findings

Monte Carlo simulation was applied to generate drug-ADE reports randomly according to the characteristics of SRS datasets. One thousand simulated datasets were mined by AR and by the other algorithms. On average, 108,337 reports were generated by the Monte Carlo simulation. Based on the predefined criterion that 10% of the drug-ADE combinations were true signals, with RR equal to 10, 4.9, 1.5, and 1.2, AR detected, on average, 284 suspected associations with a minimum support of 3 and a minimum lift of 1.2. The area under the receiver operating characteristic (ROC) curve for AR was 0.788, which was equivalent to that shown for the other algorithms. Additionally, AR was applied to reports submitted to the Shanghai SRS in 2009. Five hundred seventy combinations were detected using AR from 24,297 SRS reports, and they were compared with recognized ADEs identified by clinical experts and various other sources.

Conclusions/Significance

AR appears to be an effective method for ADE signal detection, in both simulated and real SRS datasets. The limitations of this method exposed by our study, i.e., non-uniform threshold settings and redundant rules, require further research.
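The screening step behind such rules is straightforward to sketch: compute support and lift for each drug-ADE pair from a table of reports and keep pairs meeting the thresholds mentioned above (support ≥ 3, lift ≥ 1.2). The report records below are toy data, not the simulated or Shanghai SRS datasets.

```python
# Support/lift screening of drug-ADE pairs from spontaneous reports (toy data).
from collections import Counter

reports = [  # (drug, adverse event) pairs extracted from individual reports
    ("drugA", "rash"), ("drugA", "rash"), ("drugA", "rash"), ("drugA", "nausea"),
    ("drugB", "rash"), ("drugB", "headache"), ("drugC", "nausea"),
    ("drugA", "rash"), ("drugB", "headache"), ("drugC", "headache"),
]
n = len(reports)
pair_count = Counter(reports)
drug_count = Counter(d for d, _ in reports)
ade_count = Counter(a for _, a in reports)

for (drug, ade), support in pair_count.items():
    lift = (support / n) / ((drug_count[drug] / n) * (ade_count[ade] / n))
    if support >= 3 and lift >= 1.2:
        print(f"signal: {drug} -> {ade} (support={support}, lift={lift:.2f})")
```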

17.
Forensic facial identification examiners are required to match the identity of faces in images that vary substantially, owing to changes in viewing conditions and in a person's appearance. These identifications affect the course and outcome of criminal investigations and convictions. Despite calls for research on sources of human error in forensic examination, existing scientific knowledge of face matching accuracy is based, almost exclusively, on people without formal training. Here, we administered three challenging face matching tests to a group of forensic examiners with many years' experience of comparing face images for law enforcement and government agencies. Examiners outperformed untrained participants and computer algorithms, thereby providing the first evidence that these examiners are experts at this task. Notably, computationally fusing responses of multiple experts produced near-perfect performance. Results also revealed qualitative differences between expert and non-expert performance. First, examiners' superiority was greatest at longer exposure durations, suggestive of more entailed comparison in forensic examiners. Second, experts were less impaired by image inversion than non-expert students, contrasting with face memory studies that show larger face inversion effects in high performers. We conclude that expertise in matching identity across unfamiliar face images is supported by processes that differ qualitatively from those supporting memory for individual faces.

18.
A novel experimental technique known as non-equilibrium response spectroscopy (NRS), based on ion channel responses to rapidly fluctuating voltage waveforms, was recently described (Millonas & Hanck, 1998a). It was demonstrated that such responses can be affected by subtle details of the kinetics that are otherwise invisible when conventional stepped pulses are applied. As a consequence, the kinetics can be probed in a much more sensitive way by supplementing conventional techniques with measurements of the responses to more complex voltage waveforms. In this paper we provide an analysis of the problem of designing and optimizing such waveforms. We introduce methods for determining the parametric uncertainty of a class of kinetic models for a particular data set. The parametric uncertainty allows a characterization of the amount of kinetic information acquired through a set of experiments, which can in turn be used to design new experiments that increase this information. We revisit the application of dichotomous noise (Millonas & Hanck, 1998a, b) and further consider applications of a more general class of continuous wavelet-based waveforms. A controlled illustration of these methods is provided using a simplified "toy" model of potassium channel kinetics.
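For concreteness, a dichotomous-noise stimulus of the kind referred to above can be generated as a two-level voltage waveform that switches between levels at exponentially distributed intervals. The rates, levels, and time step below are illustrative assumptions, not the waveforms used in the cited work.

```python
# Sketch of a dichotomous-noise (random telegraph) voltage stimulus.
import numpy as np

def dichotomous_noise(duration_ms, dt_ms, v_low, v_high, switch_rate_per_ms, seed=7):
    rng = np.random.default_rng(seed)
    n = int(duration_ms / dt_ms)
    v = np.empty(n)
    level, t_next = v_low, rng.exponential(1.0 / switch_rate_per_ms)
    for i in range(n):
        t = i * dt_ms
        if t >= t_next:                              # switch level
            level = v_high if level == v_low else v_low
            t_next = t + rng.exponential(1.0 / switch_rate_per_ms)
        v[i] = level
    return v

waveform = dichotomous_noise(duration_ms=200, dt_ms=0.05,
                             v_low=-80.0, v_high=-20.0, switch_rate_per_ms=0.5)
print(waveform[:10])
```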

19.
This paper presents a study of the performance of TRIBES, an adaptive particle swarm optimization algorithm. Particle swarm optimization (PSO) is a biologically inspired optimization method that researchers have recently used effectively to solve various optimization problems. However, like most optimization heuristics, PSO suffers from the drawback of being greatly influenced by the selection of its parameter values; the common belief is that the performance of a PSO algorithm is directly related to the tuning of such parameters, and such tuning is usually a lengthy, time-consuming and delicate process. A new adaptive PSO algorithm called TRIBES avoids manual tuning by defining adaptation rules that aim at automatically changing the particles' behaviors as well as the topology of the swarm. In TRIBES, the topology is changed according to the swarm behavior, and the strategies of displacement are chosen according to the performance of the particles. A comparative study carried out on a large set of benchmark functions shows that the performance of TRIBES is quite competitive with most other similar PSO algorithms that need manual tuning of parameters. The performance evaluation of TRIBES follows the testing procedure introduced at the 2005 IEEE Congress on Evolutionary Computation. The main objective of the present paper is to perform a global study of the behavior of TRIBES under several conditions, in order to determine the strengths and drawbacks of this adaptive algorithm.

20.
This paper proposes solutions for monitoring and balancing the load of a cloud data center. The proposed solutions work in two phases, and graph theoretical concepts are applied in both. In the first phase, the cloud data center is modeled as a network graph, which is augmented with the minimum dominating set concept from graph theory for monitoring its load. For constructing the minimum dominating set, this paper proposes a new variant of the minimum dominating set algorithm (V-MDS), which is compared with the existing construction algorithms proposed by Rooji and Fomin. The V-MDS approach to querying cloud data center load information is compared with a central-monitor approach. The second phase focuses on system- and network-aware live virtual machine migration for load balancing the cloud data center. For this, a new system- and traffic-aware live VM migration for load balancing (ST-LVM-LB) algorithm is proposed and compared with the existing benchmark algorithms, the dynamic management algorithm (DMA) and Sandpiper. To study the performance of the proposed algorithms, the CloudSim 3.0.3 simulator is used. The experimental results show that the V-MDS algorithm has quadratic time complexity, whereas the Rooji and Fomin algorithms have exponential time complexity. The V-MDS approach to querying cloud data center load information is then compared with the central-monitor approach, and the results show that the proposed approach halves the number of message updates. On load balancing, the results show that the ST-LVM-LB algorithm triggers fewer virtual machine migrations and requires less time and migration cost, with minimal network overhead. Thus the proposed algorithms improve the service delivery performance of the cloud data center by incorporating graph theoretical solutions into load monitoring and balancing.
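The monitoring idea in the first phase can be illustrated with a dominating set on a topology graph: pick a small set of monitor nodes so that every server is either a monitor or adjacent to one. The sketch below uses the standard greedy heuristic on a toy graph; it is not the V-MDS, Rooji, or Fomin constructions compared in the paper.

```python
# Greedy dominating-set sketch for choosing monitor nodes in a toy topology.
def greedy_dominating_set(adjacency):
    """adjacency: dict node -> set of neighbour nodes."""
    uncovered = set(adjacency)
    monitors = set()
    while uncovered:
        # Choose the node that newly covers the most uncovered nodes.
        node = max(adjacency, key=lambda v: len((adjacency[v] | {v}) & uncovered))
        monitors.add(node)
        uncovered -= adjacency[node] | {node}
    return monitors

topology = {                       # toy data-center graph
    "s1": {"s2", "s3"}, "s2": {"s1", "s4"}, "s3": {"s1", "s4", "s5"},
    "s4": {"s2", "s3", "s6"}, "s5": {"s3"}, "s6": {"s4"},
}
print("monitor nodes:", greedy_dominating_set(topology))
```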
