Similar Articles
1.
Remote sensing can be a valuable alternative or complement to traditional techniques for monitoring wildlife populations, but often entails operational bottlenecks at the image analysis stage. For example, photographic aerial surveys have several advantages over surveys employing airborne observers or other more intrusive monitoring techniques, but produce onerous amounts of imagery for manual analysis when conducted across vast areas, such as the Arctic. Deep learning algorithms, chiefly convolutional neural networks (CNNs), have shown promise for automatically detecting wildlife in large and/or complex image sets. But for sparsely distributed species, such as polar bears (Ursus maritimus), there may not be sufficient known instances of the animals in an image set to train a CNN. We investigated the feasibility of instead providing ‘synthesized’ training data to a CNN to detect polar bears throughout large volumes of aerial imagery from a survey of the Baffin Bay subpopulation. We harvested 534 miscellaneous images of polar bears from the Web that we edited to more closely resemble 21 known images of bears from the aerial survey, which were used solely for validation. We combined the Web images of polar bears with 6292 random background images from the aerial survey to train a CNN (ResNet-50), which subsequently correctly classified 20/21 (95%) bear images from the survey and 1172/1179 (99.4%) random background validation images. Given that even a small background misclassification rate could produce multitudinous false positives over many thousands of photos, we describe a potential workflow to efficiently screen out erroneous detections. We also discuss potential avenues to improve CNN accuracy, and the broader applicability of our approach to other image-based wildlife monitoring scenarios. Our results demonstrate the feasibility of using miscellaneously sourced images of animals to train deep neural networks for specific wildlife detection tasks.
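For readers unfamiliar with this kind of transfer learning, the sketch below shows how a pretrained ResNet-50 can be fine-tuned as a binary bear/background classifier. It is a minimal illustration, not the authors' code: the folder layout, hyperparameters, and class setup are assumptions.

```python
# Minimal transfer-learning sketch (PyTorch/torchvision); folder names,
# hyperparameters, and the two-class setup are assumptions, not the paper's.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# "train/bear" holds web-harvested bear images; "train/background" holds
# random background chips from the survey imagery (hypothetical layout).
train_set = datasets.ImageFolder("train/", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)  # bear vs. background head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```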

2.
There is a need to monitor biodiversity at multiple spatial and temporal scales to aid conservation efforts. Autonomous recording units (ARUs) can provide cost-effective, long-term and systematic species monitoring data for sound-producing wildlife, including birds, amphibians, insects and mammals, over large areas. Modern deep learning can efficiently automate the detection of species occurrences in these sound data with high accuracy. Further, citizen science can be leveraged to scale up the deployment of ARUs and to collect the reference vocalizations needed for training and validating deep learning models. In this study we develop a convolutional neural network (CNN) acoustic classification pipeline for detecting 54 bird species in Sonoma County, California, USA, with sound and reference vocalization data collected by citizen scientists within the Soundscapes to Landscapes project (www.soundscapes2landscapes.org). We trained three ImageNet-based CNN architectures (MobileNetv2, ResNet50v2, ResNet100v2), which function as a Mixture of Experts (MoE), to evaluate the usefulness of several methods to enhance model accuracy. Specifically, we: 1) quantify accuracy with fully-labeled 1-min soundscapes for an assessment of real-world conditions; 2) assess the effect on precision and recall of additional pre-training with an external sound archive (xeno-canto) prior to fine-tuning with vocalization data from our study domain; and 3) assess how detections and errors are influenced by the presence of coincident biotic and non-biotic sounds (i.e., soundscape components). In evaluating accuracy with soundscape data (n = 37 species) across CNN probability thresholds and models, we found that acoustic pre-training followed by fine-tuning improved average precision by 10.3% relative to no pre-training, although with a small average reduction in recall of 0.8%. In selecting an optimal CNN architecture for each species based on maximum F(β = 0.5), we found our MoE approach had a total precision of 84.5% and an average species precision of 85.1%. Our data exhibit multiple issues arising from applying citizen science and acoustic monitoring at the county scale, including deployment of ARUs with relatively low fidelity and recordings with background noise and overlapping vocalizations. In particular, human noise was significantly associated with more incorrect species detections (false positives, decreased precision), while physical interference (e.g., the recorder being hit by a branch) and geophony (e.g., wind) were associated with the classifier missing detections (false negatives, decreased recall). Our process surmounted these obstacles, and our final predictions demonstrate how deep learning applied to acoustic data from low-cost ARUs, paired with citizen science, can provide valuable bird diversity data for monitoring and conservation efforts.
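The front end of such a pipeline converts audio into spectrogram "images" that ImageNet-style CNNs can consume. Below is a minimal sketch of that step, assuming librosa; the clip length, sample rate, and mel-band count are illustrative, not the project's settings.

```python
# Hedged sketch: slicing an ARU recording into fixed-length clips and
# converting each to a log-mel spectrogram patch for a CNN.
import numpy as np
import librosa

def clips_to_logmel(path, sr=22050, clip_s=2.0, n_mels=128):
    y, _ = librosa.load(path, sr=sr)
    hop = int(clip_s * sr)
    patches = []
    for start in range(0, len(y) - hop + 1, hop):
        mel = librosa.feature.melspectrogram(
            y=y[start:start + hop], sr=sr, n_mels=n_mels)
        patches.append(librosa.power_to_db(mel, ref=np.max))
    return np.stack(patches)  # shape: (n_clips, n_mels, time_frames)
```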

3.
Epithelial and stromal tissues are components of the tumor microenvironment and play a major role in tumor initiation and progression. Distinguishing stroma from epithelial tissue is critically important for spatial characterization of the tumor microenvironment. Here, we propose BrcaSeg, an image analysis pipeline based on a convolutional neural network (CNN) model to classify epithelial and stromal regions in whole-slide hematoxylin and eosin (H&E) stained histopathological images. The CNN model is trained using well-annotated breast cancer tissue microarrays and validated with images from The Cancer Genome Atlas (TCGA) Program. BrcaSeg achieves a classification accuracy of 91.02%, outperforming other state-of-the-art methods. Using this model, we generate pixel-level epithelial/stromal tissue maps for 1000 TCGA breast cancer slide images that are paired with gene expression data. We subsequently estimate the epithelial and stromal ratios and perform correlation analysis to model the relationship between gene expression and tissue ratios. Gene Ontology (GO) enrichment analyses of genes that are highly correlated with tissue ratios suggest that the same tissue is associated with similar biological processes across breast cancer subtypes, whereas each subtype also has its own idiosyncratic biological processes governing the development of these tissues. Taken together, our approach can lead to new insights in exploring relationships between image-based phenotypes and their underlying genomic events and biological processes for all types of solid tumors. BrcaSeg can be accessed at https://github.com/Serian1992/ImgBio.
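The downstream correlation step is conceptually simple: for each gene, correlate its per-slide expression with the per-slide tissue ratio derived from the segmentation maps. A toy sketch with SciPy (the arrays below are invented placeholders, not study data):

```python
# Illustrative only: correlating a per-slide stromal ratio (from the
# segmentation maps) with matched expression values for one gene.
import numpy as np
from scipy.stats import pearsonr

stromal_ratio = np.array([0.31, 0.55, 0.42, 0.68, 0.24])  # placeholder values
expression = np.array([5.1, 7.9, 6.2, 8.4, 4.3])          # placeholder values

r, p = pearsonr(stromal_ratio, expression)
print(f"Pearson r = {r:.3f}, p = {p:.3g}")
```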

4.
Sickle cell disease, a genetic disorder affecting a sizeable global demographic, manifests in sickle red blood cells (sRBCs) with altered shape and biomechanics. sRBCs show heightened adhesive interactions with inflamed endothelium, triggering painful vascular occlusion events. Numerous studies employ microfluidic-assay-based monitoring tools to quantify characteristics of adhered sRBCs from high resolution channel images. The current image analysis workflow relies on detailed morphological characterization and cell counting by a specially trained worker. This is time- and labor-intensive, and prone to user-bias artifacts. Here we establish a morphology-based classification scheme to identify two naturally arising sRBC subpopulations—deformable and non-deformable sRBCs—utilizing novel visual markers that link to underlying cell biomechanical properties and hold promise for clinically relevant insights. We then set up a standardized, reproducible, and fully automated image analysis workflow designed to carry out this classification. This relies on a two-part deep neural network architecture whose components work in tandem for segmentation of channel images and classification of adhered cells into subtypes. Network training utilized an extensive data set of images generated by the SCD BioChip, a microfluidic assay which injects clinical whole blood samples into protein-functionalized microchannels, mimicking physiological conditions in the microvasculature. Here we carried out the assay with the sub-endothelial protein laminin. The machine learning approach segmented the resulting channel images with 99.1±0.3% mean IoU on the validation set across 5 k-folds, classified detected sRBCs with 96.0±0.3% mean accuracy on the validation set across 5 k-folds, and matched trained personnel in overall characterization of whole channel images with R2 = 0.992, 0.987 and 0.834 for total, deformable and non-deformable sRBC counts respectively. Average analysis time per channel image was also improved by two orders of magnitude (∼ 2 minutes vs ∼ 2-3 hours) over manual characterization. Finally, the network results show an order of magnitude less variance in counts on repeat trials than humans. This kind of standardization is a prerequisite for the viability of any diagnostic technology, making our system suitable for affordable and high throughput disease monitoring.
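Mean IoU, the segmentation metric reported above, is easy to state precisely. A minimal sketch for binary masks (not the authors' implementation):

```python
# Intersection-over-union for binary segmentation masks.
import numpy as np

def iou(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    return np.logical_and(pred, truth).sum() / union if union else 1.0
```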

5.
There are several identification tools that can assist researchers, technicians and the community in distinguishing Chagas vector insects (triatomines) from other insects with similar morphologies. They involve using dichotomous keys, field guides, expert knowledge or, in more recent approaches, classification by a neural network of high-quality photographs taken under standardized conditions. The aim of this research was to develop a deep neural network to recognize triatomines (insects associated with vectorial transmission of Chagas disease) directly from photos taken with any commonly available mobile device, without any other specialized equipment. To overcome the shortcomings of taking images with specific instruments in a controlled environment, an innovative machine-learning approach was used: fastai with PyTorch, a combination of open-source software for deep learning. The convolutional neural network (CNN) was trained with triatomine photos, reaching a correct identification rate of 94.3%. Results were validated using photos sent by citizen scientists from the GeoVin project, of which 91.4% of triatomines were correctly identified. The CNN provides a lightweight, robust method that works even with blurred images, poor lighting and the presence of other subjects and objects in the same frame. Future steps include incorporating the CNN into the framework of the GeoVin citizen science project, which would also allow the network to be trained further using the photos sent by citizen scientists. This would enable the community to participate in the identification and monitoring of the vector insects, particularly in regions where government-led monitoring programmes are infrequent owing to low accessibility and high costs.
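Since the abstract names fastai with PyTorch, a minimal fine-tuning sketch in that style is shown below; the folder layout, backbone, and epoch count are assumptions, not the GeoVin training recipe.

```python
# Minimal fastai sketch; assumes "photos/" contains one subfolder per class
# (e.g. triatomine/, other_insects/). Not the authors' actual pipeline.
from fastai.vision.all import *

dls = ImageDataLoaders.from_folder(
    "photos/", valid_pct=0.2, item_tfms=Resize(224),
    batch_tfms=aug_transforms())      # augmentation helps with blur/lighting
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(5)                    # transfer learning from ImageNet weights
```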

6.

Background

Images embedded in biomedical publications carry rich information that often concisely summarizes the key hypotheses adopted, methods employed, or results obtained in a published study. They therefore offer valuable clues for understanding the main content of a biomedical publication. Prior studies have pointed out the potential of mining images embedded in biomedical publications for automatically understanding and retrieving those images' associated source documents. Within the broad area of biomedical image processing, categorizing biomedical images is a fundamental step in building many advanced image analysis, retrieval, and mining applications. As in any automatic categorization effort, discriminative image features provide the most crucial aid in the process.

Method

We observe that many images embedded in biomedical publications carry versatile annotation text. Based on the locations of, and the spatial relationships between, these text elements in an image, we propose novel image features for image categorization that quantitatively characterize the spatial positions and distributions of text elements inside a biomedical image. We further adopt a sparse coding representation (SCR) based technique to categorize images embedded in biomedical publications by leveraging our newly proposed image features.
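To make the feature idea concrete, the sketch below computes one plausible text-layout descriptor: a grid histogram of text-box centres plus simple spread statistics. It is a hypothetical illustration of the kind of feature described, not the authors' exact definition.

```python
# Hypothetical text-layout feature: grid histogram of text-region centres
# plus mean/spread of the centres (all normalized to image size).
import numpy as np

def text_layout_features(boxes, width, height, grid=3):
    """boxes: list of (x, y, w, h) text regions from a text detector/OCR."""
    hist = np.zeros((grid, grid))
    centres = []
    for x, y, w, h in boxes:
        cx, cy = (x + w / 2) / width, (y + h / 2) / height
        centres.append((cx, cy))
        hist[min(int(cy * grid), grid - 1), min(int(cx * grid), grid - 1)] += 1
    centres = np.array(centres) if centres else np.zeros((1, 2))
    return np.concatenate([hist.ravel(), centres.mean(0), centres.std(0)])
```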

Results

We randomly selected 990 JPEG images for use in our experiments, of which 310 were used as training samples and the rest as testing cases. We first segmented the 310 sample images following our proposed procedure, producing a total of 1035 sub-images. We then manually labeled all these sub-images according to the two-level hierarchical image taxonomy proposed by [1]. Among our annotations, 316 are microscopy images, 126 are gel electrophoresis images, 135 are line charts, 156 are bar charts, 52 are spot charts, 25 are tables, 70 are flow charts, and the remaining 155 images are of the type "others". We report a series of experimental results. First, we present the categorization results for each image class, along with performance indexes such as precision, recall, and F-score. Second, we show how categorization performance differs between conventional image features and our proposed novel features. Third, we compare the accuracy of a support vector machine classifier against our proposed sparse-representation classification method. Finally, we compare our approach with three peer classification methods; the experimental results confirm its markedly improved performance.

Conclusions

Compared with conventional image features that do not exploit the positions and distributions of text inside images embedded in biomedical publications, our proposed image features coupled with the SCR-based classification model exhibit superior performance for classifying biomedical images, as demonstrated in our comparative benchmark study.

7.
With growing anthropogenic pressure on deep-sea ecosystems, large quantities of data are needed to understand their ecology, monitor changes over time and inform conservation managers. Current methods of image analysis are too slow to meet these requirements. Recently, computer vision (CV) has become more accessible to biologists and could help address this challenge. In this study we demonstrate a method by which non-specialists can train a YOLOv4 convolutional neural network (CNN) able to count and measure a single class of objects. We apply CV to the extraction of quantitative data on the density and population size structure of the xenophyophore Syringammina fragilissima from more than 58,000 images taken by an AUV at 1200 m depth in the North-East Atlantic. The workflow used open-source tools and cloud-based hardware, and required only a level of experience with CV commonly found among ecologists. The CNN performed well, achieving a recall of 0.84 and a precision of 0.91. Individual counts per image and size measurements derived from model predictions were highly correlated (0.96 and 0.92, respectively) with manually collected data. The analysis could be completed in less than 10 days, bringing novel insights into the population size structure and fine-scale distribution of this Vulnerable Marine Ecosystem. It showed that the distribution of S. fragilissima is patchy: the average density is 2.5 ind.m−2 but can reach 45 ind.m−2 only a few tens of metres away from areas where the species is almost absent. The average size is 5.5 cm, and the largest individuals (>15 cm) tend to occur in areas of low density. This study demonstrates how researchers could take advantage of CV to quickly and efficiently generate large quantitative datasets on the extent and distribution of benthic ecosystems. This, coupled with the large sampling capacity of AUVs, could bypass the bottleneck of image analysis and greatly facilitate future deep-ocean exploration and monitoring. It also illustrates the future potential of these new technologies to meet the goals set by the UN Ocean Decade.
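Converting detections to density estimates is the simple final step. The sketch below uses the ultralytics package as a stand-in detector API (the study used a YOLOv4 pipeline); the weights file and per-image seafloor footprint are invented for illustration.

```python
# Counting single-class detections per image and converting to density.
# The ultralytics API is a stand-in; weights and footprint are hypothetical.
from ultralytics import YOLO

model = YOLO("xeno.pt")      # hypothetical trained single-class weights
IMAGE_AREA_M2 = 4.0          # assumed seafloor footprint of one AUV image

def count_and_density(image_path, conf=0.5):
    result = model.predict(image_path, conf=conf, verbose=False)[0]
    count = len(result.boxes)
    return count, count / IMAGE_AREA_M2   # individuals, ind. per m^2
```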

8.
Deep learning is a powerful approach for distinguishing classes of images, and there is growing interest in applying these methods to delimit species, particularly in the identification of mosquito vectors. Visual identification of mosquito species is the foundation of mosquito-borne disease surveillance and management, but it can be hindered by cryptic morphological variation in mosquito vector species complexes such as the malaria-transmitting Anopheles gambiae complex. We sought to apply convolutional neural networks (CNNs) to images of mosquitoes as a proof of concept to determine the feasibility of automatic classification of mosquito sex, genus, species, and strain using whole-body, 2D images. We introduce a library of 1,709 images of adult mosquitoes collected from 16 colonies of mosquito vector species and strains originating from five geographic regions, including 4 cryptic species not readily distinguishable morphologically even by trained medical entomologists. We present a methodology for image processing, data augmentation, and training and validation of a CNN. Our best CNN configuration achieved high prediction accuracies of 96.96% for species identification and 98.48% for sex. Our results demonstrate that CNNs can delimit species with cryptic morphological variation, 2 strains of a single species, and specimens from a single colony stored using two different methods. We present visualizations of the CNN feature space and predictions for the interpretation of our results, and we further discuss how our findings may be applied in future malaria mosquito surveillance.
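Data augmentation is the usual way to stretch a modest image library of this size. A representative torchvision pipeline is sketched below; the specific transforms and parameters are assumptions, not the paper's recipe.

```python
# Representative augmentation pipeline for whole-body insect images;
# transform choices and parameters are illustrative assumptions.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),                        # specimens vary in pose
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
```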

9.
Inspection of insect sticky paper traps is an essential task for an effective integrated pest management (IPM) programme. However, identifying and counting the insect pests stuck to the traps is a very cumbersome task, so an efficient approach is needed to alleviate the problem and provide timely information on insect pests. In this research, an automatic method is proposed for the multi-class recognition of small greenhouse insect pests on sticky paper trap images acquired by wireless imaging devices. The developed algorithm features a cascaded approach that uses a convolutional neural network (CNN) object detector and CNN image classifiers separately. In the first stage, the object detector locates candidate objects in an image and a CNN classifier filters out non-insect objects among the detections. In the second stage, the remaining insect objects are classified into flies (Diptera: Drosophilidae), gnats (Diptera: Sciaridae), thrips (Thysanoptera: Thripidae) and whiteflies (Hemiptera: Aleyrodidae) using a multi-class CNN classifier. Advantages of this approach include flexibility in adding more classes to the multi-class insect classifier and sample control strategies to improve classification performance. The algorithm was developed and tested on images taken by multiple wireless imaging devices installed in several greenhouses under natural and variable lighting. Based on results from long-term experiments in greenhouses, the algorithm achieved average F1-scores of 0.92 and 0.90 and mean counting accuracies of 0.91 and 0.90, as tested on a separate 6-month image data set and on an image data set from a different greenhouse, respectively. The proposed method resolves important problems in the automated recognition of insect pests and provides instantaneous information on insect pest occurrences in greenhouses, offering vast potential for developing more efficient IPM strategies in agriculture.
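The two-stage cascade is mostly control flow; the sketch below shows its shape with placeholder models (the detector, binary filter, and species classifier are stand-ins, not the published networks).

```python
# Control-flow sketch of the cascade; all three models are placeholders.
def cascade(image, detector, insect_filter, species_classifier):
    """Detect candidates, drop non-insects, then assign one of the 4 classes."""
    results = []
    for box in detector(image):               # stage 1a: object proposals
        crop = image.crop(box)                # e.g. a PIL image crop
        if not insect_filter(crop):           # stage 1b: insect vs. non-insect
            continue
        results.append((box, species_classifier(crop)))  # stage 2: 4-class CNN
    return results   # e.g. [((x1, y1, x2, y2), "whitefly"), ...]
```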

10.
Understanding the environmental factors that influence forest health, as well as the occurrence and abundance of wildlife, is a central topic in forestry and ecology. However, the manual processing of field habitat data is time-consuming, and months are often needed to progress from data collection to data interpretation. To shorten this processing time, we propose Habitat-Net: a novel deep learning application based on convolutional neural networks (CNNs) to segment habitat images of tropical rainforests. Habitat-Net takes color images as input and, after multiple layers of convolution and deconvolution, produces a binary segmentation of the input image. We worked on two different types of habitat datasets that are widely used in ecological studies to characterize forest conditions: canopy closure and understory vegetation. We trained the model with 800 canopy images and 700 understory images separately and then used 149 canopy and 172 understory images to test the performance of Habitat-Net. We compared the performance of Habitat-Net with a simple threshold-based method, manual processing by a second researcher, and a CNN approach called U-Net, upon which Habitat-Net is based. Habitat-Net, U-Net and simple thresholding reduced total processing time to milliseconds per image, compared with 45 s per image for manual processing. However, the higher mean Dice coefficient of Habitat-Net (0.94 for canopy and 0.95 for understory) indicates that its accuracy is higher than that of both simple thresholding (0.64, 0.83) and U-Net (0.89, 0.94). Habitat-Net will be of great relevance for ecologists and foresters who need to monitor changes in forest structure. The automated workflow not only reduces processing time but also standardizes the analytical pipeline and thus reduces the uncertainty that would be introduced by manual processing of images by different people (either over time or between study sites).
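The Dice coefficient reported above is short to state precisely; a minimal version for binary masks (not the authors' code):

```python
# Dice coefficient between two binary segmentation masks.
import numpy as np

def dice(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    return 2 * np.logical_and(pred, truth).sum() / denom if denom else 1.0
```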

11.
Social feeding strategies of wintering red kites are analyzed in relation to age, food, roost sites and differences from resident kites. Whereas young and adult wintering kites gathered at roost sites almost daily, adult residents did not, and immature residents did so only occasionally. Kites using roost sites fed more often on prey located by others, while lone roosters foraged and discovered food alone. After finding food, kites tended to shift to a new roost site and foraging area. Two elements of the ‘information centre’ hypothesis were confirmed in our study: carcasses are unpredictably located food patches that are divisible among several individuals. However, carcasses disappeared quickly in the study area, and no increase over time in the number of birds consuming a carcass was observed, so information transmission was not confirmed. When kites left the roost in groups, no leader was detectable. It seems that other types of social foraging are operating, and the model best matching our results is network foraging.

12.
Programmes to reintroduce predatory birds are resource intensive and expensive, yet there are few long-term studies on the health of these reintroduced birds following release. A total of 326 red kites (Milvus milvus) were released at four sites in England between 1989 and 2006 as part of efforts to reintroduce this species to England and Scotland, resulting in the establishment of several rapidly expanding populations in the wild. Detailed post-mortem examinations were carried out on 162 individuals found dead between 1989 and 2007, involving both released and wild-fledged birds. Toxicological analysis of one or more compounds was performed on 110 of the 162 birds. Poisoning was diagnosed in 32 of these 110 kites, 19 from second-generation anticoagulant rodenticides, 9 from other pesticides and 6 from lead. Criteria for diagnosing anticoagulant rodenticide poisoning included visible haemorrhage on gross post-mortem examination and levels of anticoagulant rodenticide exceeding 100 ng/g, but levels were elevated above 100 ng/g in a further eight red kites without visible haemorrhages, suggesting poisoning may have occurred in more birds. The anticoagulant rodenticides difenacoum and bromadiolone were the most common vertebrate control agents involved during this period. Poisoning of red kites may be slowing their rate of population recovery and range expansion in England. Simple modifications of human activity, such as best practice in rodent control campaigns, tackling the illegal use of pesticides and the use of non-toxic alternatives to lead ammunition, can reduce our impact on red kites and probably other populations of predatory and scavenging species.

13.
14.
15.
Research on raptors in India is scanty in general, and it is practically non-existent for black kites (Milvus migrans govinda), the major scavenging raptor in many urban areas. The aim of this study was to analyse the seasonal abundance and roosting behaviour of black kites in an urban metropolis. Data on the abundance and behaviour of roosting black kites were collected using evening roost counts and ad libitum sampling, respectively. Analysis was performed using separate generalized linear models with roosting kite abundance, the number of black kites arriving to roost and the number of black kites showing pre-roosting displays as response variables. We found that black kites roosted communally and that their numbers varied across years and seasons, with abundance highest in the summer and lowest during the winter. Pre-roosting displays also varied seasonally, peaking during the monsoon and reaching a minimum in the winter. In our urban setting, black kites arrived at the roosting sites mostly after sunset, and their arrival was influenced by sunset time, temperature, relative humidity and season. Some behavioural aspects of black kites within the roosts were also documented. This is the first quantitative assessment of roosting black kite abundance in Kolkata, India, and our data provide insight into the roosting behaviour of these birds relative to various environmental parameters.
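For readers unfamiliar with this class of model, a count response like nightly roost abundance is commonly fitted as a Poisson GLM; the sketch below uses statsmodels with invented file and column names, not the study's actual data or model specification.

```python
# Hedged sketch of a Poisson GLM for roost counts; file and column names
# are hypothetical, not the study's data.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("roost_counts.csv")   # columns assumed: count, season,
                                       # temp, humidity, sunset_offset
model = smf.glm("count ~ season + temp + humidity + sunset_offset",
                data=df, family=sm.families.Poisson()).fit()
print(model.summary())
```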

16.
We present a supervised machine learning approach for markerless estimation of the full-body kinematics of a cyclist from an unconstrained colour image. This approach is motivated by the limitations of existing marker-based approaches, which are restricted by infrastructure, environmental conditions, and obtrusive markers. Using a discriminatively learned mixture-of-parts model, we construct a probabilistic tree representation to model the configuration and appearance of human body joints. During the learning stage, a Structured Support Vector Machine (SSVM) learns body part appearance and spatial relations. In the testing stage, the learned models are employed to recover body pose by searching over a pyramid structure in a test image. We focus on the movement modality of cycling to demonstrate the efficacy of our approach. In natura estimation of cycling kinematics from images is challenging because the rider's interaction with the bicycle causes frequent occlusions. We make no assumptions about the kinematic constraints of the model or the appearance of the scene. Our technique finds multiple high-quality hypotheses for the pose. We evaluate the precision of our method on two new datasets using loss functions. Our method achieves scores of 91.1 and 69.3 on the mean Probability of Correct Keypoint (PCK) measure and 88.7 and 66.1 on the Average Precision of Keypoints (APK) measure for the frontal and sagittal datasets, respectively. We conclude that our method opens new vistas for robust, user-interaction-free estimation of full-body kinematics, a prerequisite for motion analysis.
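PCK has a compact definition: a predicted keypoint counts as correct if it falls within a fraction α of a per-image reference scale of the ground truth. A minimal sketch (the scale convention and α are assumptions, not the paper's exact protocol):

```python
# Minimal PCK: a keypoint is correct when its error is within alpha times
# a per-image reference scale (e.g. torso size); conventions are assumed.
import numpy as np

def pck(pred, truth, scale, alpha=0.2):
    """pred, truth: (N, K, 2) keypoints; scale: (N,) reference sizes."""
    dists = np.linalg.norm(pred - truth, axis=-1)          # (N, K) pixel errors
    return (dists <= alpha * scale[:, None]).mean() * 100  # percentage correct
```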

17.
Two experiments examined the nature of visuo-spatial mental image generation and maintenance in 4-, 6-, 8- and 10-year-old children and adults (N = 211). The key questions were how image generation and maintenance develop (Experiment 1) and how accurately children and adults coordinate mental and visually perceived images (Experiment 2). Experiment 1 indicated that basic image generation and maintenance abilities are present at 4 years of age, but the precision with which images are generated and maintained improves particularly between 4 and 8 years. In addition to this increased precision, Experiment 2 demonstrated that generated and maintained mental images become increasingly similar to visually perceived objects. Altogether, the findings suggest that for simple tasks demanding image generation and maintenance, children attain adult-like precision at a younger age than previously reported. This research also sheds new light on the ability to coordinate mental images with visual images in children and adults.

18.
Aim

This study evaluated a convolutional neural network (CNN) for automatically delineating the liver on contrast-enhanced or non-contrast-enhanced CT, comparing it with a commercial automated technique (MIM Maestro®).

Background

Intensity-modulated radiation therapy requires careful, labor-intensive planning involving delineation of the target and organs on CT or MR images to ensure delivery of the effective dose to the target while avoiding organs at risk.

Materials and Methods

Contrast-enhanced planning CT images from 101 pancreatic cancer cases, together with mask images of manually delineated liver contours, were used to train the CNN to segment the liver. The trained CNN then performed liver segmentation on a further 20 contrast-enhanced and 15 non-contrast-enhanced CT image sets, producing three-dimensional mask images of the liver.

Results

For both contrast-enhanced and non-contrast-enhanced images, the mean Dice similarity coefficients between CNN segmentations and ground-truth manual segmentations were significantly higher than those between ground truth and the MIM Maestro software (p < 0.001). Although mean CT values of the liver were higher on contrast-enhanced than on non-contrast-enhanced CT, there were no significant differences in the Hausdorff distances of the CNN segmentations, indicating that the CNN could successfully segment the liver on both image types despite being trained only on contrast-enhanced images.

Conclusions

Our results suggest that a CNN can perform highly accurate automated delineation of the liver on CT images, irrespective of whether the images are contrast-enhanced.
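The Hausdorff distance mentioned above measures the worst-case disagreement between two contours. A minimal symmetric version using SciPy (extraction of surface points from the masks is assumed to happen upstream):

```python
# Symmetric Hausdorff distance between two sets of surface points taken
# from the CNN and ground-truth liver masks (point extraction not shown).
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(a, b):
    """a, b: (N, 3) arrays of contour/surface coordinates."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
```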

19.
This paper describes and explains design patterns for software that supports analysts in efficiently inspecting and classifying camera trap images for wildlife-related ecological attributes. Broadly speaking, a design pattern identifies a commonly occurring problem and a general reusable design approach to solve it; a developer can then use that approach to create a specific software solution appropriate to the situation at hand. In particular, design patterns for camera trap image analysis address solutions to problems wildlife biologists commonly face while inspecting large numbers of images and entering ecological data describing image attributes. We developed design patterns for image classification based on our understanding of biologists' needs, acquired over 8 years of developing and applying the freely available Timelapse image analysis system. For each design pattern presented, we describe the problem, a design approach that solves it, and a concrete example of how Timelapse addresses the pattern. Our design patterns offer both general and specific solutions related to: maintaining data consistency, efficiencies in image inspection, methods for navigating between images, efficiencies in data entry (including highly repetitious data entry), and sorting and filtering images into sequences, episodes, and subsets. These design patterns can inform the design of other camera trap systems and can help biologists assess how competing software products address their project-specific needs, along with determining an efficient workflow.

20.
Background

A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks that aim to extract useful information from biological, biomedical and biodiversity text sources, and a scientific name acts as an important metadata element for linking biological information.

Results

We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognizing scientific names, including the discovery of new species names, from text; it also handles misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify candidates based on their structural features and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9% and recall = 70.5%) than a popular dictionary-based approach (precision = 97.5% and recall = 54.3%) on a 600-page biodiversity book that was manually marked up by an annotator. On a small set of PubMed Central full-text articles annotated with scientific names, precision and recall are 98.5% and 96.2%, respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when run on the full MEDLINE database, and it successfully identifies almost all of the new species names mentioned within web pages. Additionally, we present a comparison of various machine learning algorithms on our annotated corpus; Naive Bayes and Maximum Entropy with Generalized Iterative Scaling (GIS) parameter estimation are the top two performers.

Conclusions

We present NetiNeti, a machine learning based approach for the identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.
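The candidate-generation step can be illustrated with a deliberately loose pattern for binomial name shapes; over-generation (e.g. "Specimens of") is expected, and filtering such false candidates is exactly the downstream classifier's job. This is a toy illustration, not NetiNeti's actual rules.

```python
# Toy candidate generator: a loose regex for binomial name shapes.
# It over-generates on purpose; a classifier would accept/reject candidates.
import re

CANDIDATE = re.compile(r"\b[A-Z][a-z]+ [a-z]+\b")

text = "Specimens of Ursus maritimus and Milvus milvus were recorded."
print([m.group(0) for m in CANDIDATE.finditer(text)])
# ['Specimens of', 'Ursus maritimus', 'Milvus milvus']  <- one false candidate
```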
