Similar Articles
1.
A graphical user interface is presented that allows users of taxonomic data to explore concept relationships between conflicting but related taxonomic classifications. Ecological analyses that use taxonomic metadata depend on accurate naming of specimens and taxa, and if the metadata involve several taxonomies, care must be taken to match concepts between them. Doing this accurately requires expert-defined concept relationships, which are more complex but more representative than the one-to-one mappings produced by simple name matching, and which can accommodate nomenclatural changes and differences in classification technique (cf. ‘lumpers’ versus ‘splitters’). In the SEEK-Taxon (Scientific Environment for Ecological Knowledge) project, we aim to help users of taxonomic datasets untangle and understand these relationships through a prototype visual interface that graphically displays these relationship structures, allowing users to comprehend such information and name their data more accurately.

2.
To date, little is known about the relative importance of dispersal-related versus local factors in shaping microbial metacommunities. A common criticism of existing datasets is that their taxonomic resolution may be too coarse to reliably assess microbial community structure and study biogeographical patterns. Moreover, few studies have assessed the importance of geographic distance between habitats, which may influence metacommunity dynamics through its effect on dispersal rates. We applied variation partitioning analyses to 15 separate regional datasets on diatoms found in lakes in Eurasia, Africa and Antarctica. These analyses quantified the relative contributions of dispersal-related and local factors in determining patterns of taxonomic turnover at the species and genus levels. In general, results were similar at both taxonomic levels. Local environmental factors accounted for most of the explained variation (median = 21%), whereas dispersal-related factors were much less important (median of significant fractions = 5.5% of variation explained) and, in the majority of datasets, failed to explain any significant variation independently of the environmental variables. However, the amount of variation explained by dispersal-related factors increased with increasing geographic distance and increasing taxonomic resolution. We extrapolated our regional-scale observations to the global scale by combining the regional datasets into a global dataset comprising 1039 freshwater lakes from both hemispheres and spanning a geographic distance of over 19,000 km. At this global scale, once environmental factors were partialled out, taxonomic turnover was lowest in highly connected habitats. As in many studies of macro-organisms, these analyses showed that both dispersal-related and local variables contribute significantly to the structure of global lacustrine diatom communities.
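The fractions reported in this abstract come from variation partitioning. As a minimal sketch (not the authors' code), the standard arithmetic below splits variation into a pure environmental fraction, a pure dispersal-related fraction, a shared fraction and an unexplained residual from three R² values. It assumes the R² inputs come from separate models (e.g. constrained ordinations) fitted to the environmental variables, the dispersal-related variables, and both together; the numbers in the usage line are hypothetical.

```python
from typing import Dict


def partition_variation(r2_env: float, r2_disp: float, r2_both: float) -> Dict[str, float]:
    """Split variation into pure, shared and unexplained fractions.

    r2_env  -- (adjusted) R^2 of the model using local environmental variables only
    r2_disp -- (adjusted) R^2 of the model using dispersal-related variables only
    r2_both -- (adjusted) R^2 of the model using both variable sets together
    """
    return {
        "pure_environment": r2_both - r2_disp,  # environment after partialling out dispersal
        "pure_dispersal": r2_both - r2_env,     # dispersal after partialling out environment
        "shared": r2_env + r2_disp - r2_both,   # explained jointly, not attributable to either set
        "unexplained": 1.0 - r2_both,           # residual variation
    }


# Hypothetical R^2 values, not taken from the study.
fractions = partition_variation(r2_env=0.25, r2_disp=0.08, r2_both=0.28)
print({name: round(value, 2) for name, value in fractions.items()})
```

The four fractions always sum to 1. With adjusted R² the shared fraction can come out slightly negative, which is usually read as no shared variation.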

3.
Complex multidimensional datasets are now pervasive in science and elsewhere in society. Better interactive tools are needed for visual data exploration so that patterns in such data can be easily discovered, data can be proofread, and subsets of data can be chosen for algorithmic analysis. In particular, synthetic research such as ecological interaction research demands effective ways to examine multiple datasets. This paper describes our integration of hundreds of food-web datasets into a common platform and the visualization software, EcoLens, that we developed for exploring this information. The publicly available application and integrated dataset have been useful for our research on predicting large, complex food webs, and EcoLens has been favorably reviewed by other researchers. Many habitats are not well represented in our large database. We confirm earlier results about the small size and lack of taxonomic resolution of early food webs, but find that these webs and a non-food-web source provide trophic information about a large number of taxa absent from more modern studies. Trophic links in Tuesday Lake can usually be corroborated across studies, but the lack of links among congeners may have several explanations. Although EcoLens does not provide every kind of analytical support, its label- and item-based approach is effective at addressing concerns about the comparability and taxonomic resolution of food-web data.

4.
An intricate network of interactions between organisms and their environment forms the ecosystems that sustain life on Earth. With a detailed understanding of these interactions, ecologists and biologists can make better-informed predictions about how different environmental factors will affect ecosystems. Despite the abundance of research data on biotic and abiotic interactions, no comprehensive and easily accessible data collection spans the taxonomic, geospatial, and temporal domains. Biotic-interaction datasets are effectively siloed, which inhibits cross-dataset comparisons. To pool resources and bring individual datasets to light, specialized research tools are needed to aggregate, normalize, and integrate existing datasets with standard taxonomies, ontologies, vocabularies, and structured data repositories. Global Biotic Interactions (GloBI) provides such tools through an open, community-driven infrastructure designed to lower the barrier for researchers to perform ecological systems analysis and modeling. GloBI provides a tool that (a) ingests, normalizes, and aggregates datasets, (b) integrates interoperable data with accepted ontologies (e.g., OBO Relations Ontology, Uberon, and Environment Ontology), vocabularies (e.g., Coastal and Marine Ecological Classification Standard), and taxonomies (e.g., Integrated Taxonomic Information System and National Center for Biotechnology Information Taxonomy Database), (c) makes data accessible through an application programming interface (API) and various data archives (Darwin Core, Turtle, and Neo4j), and (d) houses a data collection of about 700,000 species interactions across about 50,000 taxa, covering over 1100 references from 19 data sources. GloBI has taken an open-source and open-data approach to make integrated species-interaction data maximally accessible and to encourage users to provide feedback, contribute data, and improve data access methods.
The GloBI collection of datasets is currently used in the Encyclopedia of Life (EOL) and Gulf of Mexico Species Interactions (GoMexSI).

5.
Estimating taxonomic content is a key problem in metagenomic sequencing data analysis. However, extracting such content from high-throughput next-generation sequencing data is very time-consuming with currently available software. Here, we present CloudLCA, a parallel lowest common ancestor (LCA) algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time that grows nearly linearly with dataset size, (2) achieves linear speedup as the number of processors grows, especially for large datasets, and (3) processes nearly 215 million reads per minute on a cluster of ten thin nodes. Compared with MEGAN, a well-known metagenome analyzer, CloudLCA is up to 5 times faster, and its peak memory usage is approximately 18.5% of that of MEGAN running on a fat node. CloudLCA can run on a single multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate read analysis: it generates the same output as MEGAN, which can be imported directly into MEGAN to complete the downstream analysis. Moreover, CloudLCA is a general solution for finding the lowest common ancestor and can be applied in other fields that require an LCA algorithm.
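CloudLCA parallelizes the lowest-common-ancestor step of taxonomic binning. The sketch below is a simplified, serial illustration of that step, not CloudLCA's or MEGAN's code: each read is assigned to the deepest taxon shared by all lineages of the reference sequences it hit. The toy lineages and read IDs are hypothetical, and MEGAN-style filters (for example on alignment score) are omitted. Because each read is processed independently, the per-read loop is the part a parallel implementation can distribute across processors or nodes.

```python
from typing import Dict, List, Sequence

Lineage = Sequence[str]  # taxon names from the root down to the hit taxon


def lca_path(lineages: List[Lineage]) -> List[str]:
    """Longest common prefix of root-to-taxon lineages, i.e. the path to their LCA."""
    if not lineages:
        return []
    common: List[str] = []
    for ranks in zip(*lineages):  # compare all lineages one depth at a time
        if all(name == ranks[0] for name in ranks):
            common.append(ranks[0])
        else:
            break
    return common


def assign_reads(hits: Dict[str, List[Lineage]]) -> Dict[str, str]:
    """Assign each read to the LCA of the taxa its alignments hit.

    Reads are independent of one another, so this loop is what a parallel
    implementation can split across workers.
    """
    assignments: Dict[str, str] = {}
    for read_id, lineages in hits.items():
        path = lca_path(lineages)
        assignments[read_id] = path[-1] if path else "unassigned"
    return assignments


# Hypothetical toy hits (intermediate ranks omitted for brevity).
hits = {
    "read1": [["root", "Bacteria", "Proteobacteria", "Escherichia"],
              ["root", "Bacteria", "Proteobacteria", "Salmonella"]],
    "read2": [["root", "Bacteria", "Firmicutes", "Bacillus"]],
    "read3": [],
}
print(assign_reads(hits))
```

A read with a single hit keeps that taxon, a read whose hits diverge is pushed up to the shared ancestor, and a read with no hits stays unassigned.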

8.
Much ecological research relies on existing multispecies distribution datasets. Such datasets, however, can vary considerably in quality, extent, resolution or taxonomic coverage. We provide a framework for a spatially explicit evaluation of geographical representation within large-scale species distribution datasets, using the comparison of an occurrence atlas with a range atlas as a working example. Specifically, we compared occurrence maps for 3773 taxa from the widely used Atlas Florae Europaeae (AFE) with digitised range maps for 2049 taxa from the lesser-known Atlas of North European Vascular Plants. We calculated the level of agreement at a 50-km spatial resolution using average latitudinal and longitudinal species range, and area of occupancy. Agreement in species distribution was calculated and mapped using the Jaccard similarity index and a reduced major axis (RMA) regression analysis of species richness, both between the entire atlases (5221 taxa in total) and between co-occurring species (601 taxa). We found no difference in distribution ranges or in the frequency distribution of area of occupancy, indicating that the atlases overlapped sufficiently for a valid comparison. The similarity index map showed high levels of agreement for central, western, and northern Europe. The RMA regression confirmed that the geographical representation of AFE was low in areas with a sparse data recording history (e.g., Russia, Belarus and the Ukraine). For co-occurring species in south-eastern Europe, however, the Atlas of North European Vascular Plants gave considerably higher richness estimates. Geographical representation of atlas data can be much more heterogeneous than is often assumed. The level of agreement between datasets can be used to evaluate geographical representation within datasets. Merging atlases into a single dataset is worthwhile in spite of methodological differences, and helps to fill gaps in our knowledge of species distribution ranges.
Species distribution dataset mergers, such as the one exemplified here, can serve as a baseline for building comprehensive species distribution datasets.
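The atlas comparison rests on two standard measures. The sketch below is an illustration under stated assumptions, not the authors' workflow: it computes the Jaccard similarity between the taxon sets two atlases record for one grid cell, and the reduced major axis (RMA) slope and intercept between two series of per-cell richness values, using slope = sign(r) · s_y / s_x. The grid-cell data are hypothetical, and returning 0.0 for two empty cells is a convention assumed here.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; 0.0 for two empty sets (a convention assumed here)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rma_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Reduced major axis regression of y on x: slope = sign(r) * sd(y) / sd(x)."""
    mx, my = mean(x), mean(y)
    # The sign of the covariance is the sign of Pearson's r.
    cov_sign = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = math.copysign(stdev(y) / stdev(x), cov_sign)
    return slope, my - slope * mx


# Hypothetical taxa recorded in one 50-km cell by each atlas.
cell_afe = {"Betula nana", "Salix lapponum", "Carex bigelowii"}
cell_north = {"Salix lapponum", "Carex bigelowii", "Dryas octopetala"}
print(jaccard(cell_afe, cell_north))  # 2 shared taxa out of 4 distinct taxa

# Hypothetical per-cell species richness in the two atlases.
print(rma_fit([10, 20, 30, 40], [12, 25, 33, 48]))
```

RMA is used instead of ordinary least squares because neither atlas is an error-free predictor of the other; the fitted line treats both richness series symmetrically.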

9.
Recent improvements in online information communication and mobile location-aware technologies have led to the production of large volumes of volunteered geographic information. Widespread, large-scale efforts by volunteers to collect data can inform and drive scientific advances in diverse fields, including ecology and climatology. Traditional workflows for checking the quality of such volunteered information can be costly and time-consuming because they rely heavily on human intervention. Yet identifying factors that can influence data quality, such as inconsistency, is crucial when these data are used in modeling and decision-making frameworks. Recently developed workflows use simple statistical approaches that assume that the majority of the information is consistent. However, this assumption is not generalizable, and it ignores underlying geographic and environmental contextual variability that may explain apparent inconsistencies. Here we describe an automated workflow that checks inconsistency using the contextual environmental information available for sampling locations. The workflow consists of three steps: (1) dimensionality reduction to facilitate further analysis and interpretation of results, (2) model-based clustering to group observations according to their contextual conditions, and (3) identification of inconsistent observations within each cluster. The workflow was applied to volunteered observations of flowering in common and cloned lilac plants (Syringa vulgaris and Syringa x chinensis) in the United States for the period 1980 to 2013. About 97% of the observations for both common and cloned lilacs were flagged as consistent, indicating that volunteers provided reliable information for this case study.
Relative to the original dataset, the exclusion of inconsistent observations changed the apparent rate of change in lilac bloom dates by two days per decade, indicating the importance of inconsistency checking as a key step in data quality assessment for volunteered geographic information. Initiatives that leverage volunteered geographic information can adapt this workflow to improve the quality of their datasets and the robustness of their scientific analyses.
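Step (3) of the workflow can be illustrated with a small sketch. It assumes steps (1) and (2) have already produced a cluster label for every observation, and it flags observations whose bloom date is extreme relative to their own cluster using the median-based modified z-score with the commonly used threshold of 3.5. The abstract does not say which outlier rule the authors used, so this rule, the record layout and the sample values are all assumptions. The point it shows is that the same date can be consistent in one environmental context and inconsistent in another.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, float]  # (observation id, cluster label, bloom day of year)


def flag_inconsistent(records: Iterable[Record], threshold: float = 3.5) -> Set[str]:
    """Flag observations whose modified z-score within their cluster exceeds threshold."""
    by_cluster: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for obs_id, cluster, day in records:
        by_cluster[cluster].append((obs_id, day))

    flagged: Set[str] = set()
    for members in by_cluster.values():
        days = [day for _, day in members]
        med = median(days)
        mad = median(abs(day - med) for day in days)  # median absolute deviation
        if mad == 0:
            continue  # no spread in this cluster, so the score is undefined
        for obs_id, day in members:
            modified_z = 0.6745 * (day - med) / mad
            if abs(modified_z) > threshold:
                flagged.add(obs_id)
    return flagged


# Hypothetical observations already grouped by steps (1) and (2).
records = [
    ("o1", "warm", 100), ("o2", "warm", 102), ("o3", "warm", 101),
    ("o4", "warm", 99), ("o5", "warm", 103), ("o6", "warm", 150),
    ("o7", "cool", 149), ("o8", "cool", 151), ("o9", "cool", 150), ("o10", "cool", 152),
]
flagged = flag_inconsistent(records)
print(sorted(flagged), f"{1 - len(flagged) / len(records):.0%} consistent")
```

Day 150 is flagged in the "warm" cluster but would be unremarkable in the "cool" one, which is why the check is done within clusters rather than over the whole dataset.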

11.
Organismal taxonomy is often based on one or a few morphological characters. When these characters are morphologically simple or known to be plastic, we may not have great confidence in the taxonomic conclusions of analyses based on them. For example, calyptraeid gastropod shells are well known for their simplicity and plasticity, and appear to be subject to frequent evolutionary convergence, yet they are nevertheless the basis for calyptraeid taxonomy. In a case like this, it would be useful to know how the pattern of relationships inferred from the morphological features used in traditional taxonomy compares with the patterns inferred from other morphological characters or from DNA sequence data. In this paper, I examine the relative utility of traditional taxonomic characters (shell characters), anatomical characters and molecular characters for reconstructing the phylogeny of calyptraeid gastropods. The results of an incongruence length difference (ILD) test and comparisons of the recovered tree topologies suggest that the DNA sequence data and the morphological data conflict. Very few of the nodes recovered by the morphological data were recovered by any other dataset. Despite this conflict, including the morphological data increased the resolution and support of nodes in the topology recovered from a combined dataset. The retention indices (RIs) and consistency indices (CIs) of the morphological data on the best-estimate topology were no worse than those of the other datasets. This analysis demonstrates that although analyses can be misled by these convergences if morphological characters are used alone, these characters contribute significantly to the combined dataset. © 2003 The Linnean Society of London. Biological Journal of the Linnean Society, 2003, 78, 541–593.
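The consistency index (CI) and retention index (RI) measure how much homoplasy a dataset shows on a given tree. A minimal sketch of the usual ensemble versions follows, CI = Σm / Σs and RI = (Σg − Σs) / (Σg − Σm). It assumes the minimum (m), observed (s) and maximum possible (g) number of steps for each character have already been computed on the tree; the character values are hypothetical.

```python
from typing import Sequence, Tuple

# (m, s, g) per character: minimum possible steps, steps observed on the tree,
# and maximum possible steps on any tree.
CharSteps = Tuple[int, int, int]


def ensemble_ci(chars: Sequence[CharSteps]) -> float:
    """Ensemble consistency index: total minimum steps / total observed steps."""
    return sum(m for m, _, _ in chars) / sum(s for _, s, _ in chars)


def ensemble_ri(chars: Sequence[CharSteps]) -> float:
    """Ensemble retention index: (G - S) / (G - M) summed over all characters."""
    m_tot = sum(m for m, _, _ in chars)
    s_tot = sum(s for _, s, _ in chars)
    g_tot = sum(g for _, _, g in chars)
    if g_tot == m_tot:
        raise ValueError("no parsimony-informative variation: RI is undefined")
    return (g_tot - s_tot) / (g_tot - m_tot)


# Hypothetical: one character fits the tree perfectly, one is fully homoplastic.
chars = [(1, 1, 3), (1, 3, 3)]
print(ensemble_ci(chars), ensemble_ri(chars))
```

Both indices equal 1 when every character fits the tree without homoplasy; unlike CI, RI is not inflated by characters that cannot conflict with any tree (those with g = m).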

12.
The GRID: The General Repository for Interaction Datasets
We have developed a relational database, called the General Repository for Interaction Datasets (The GRID), to archive and display physical, genetic and functional interactions. The GRID displays data-rich interaction tables for any protein of interest, combines literature-derived and high-throughput interaction datasets, and is readily accessible via the web. Interactions parsed in The GRID can be viewed in graphical form with a versatile visualization tool called Osprey.

13.
Breitkreutz BJ, Stark C, Tyers M. Genome Biology, 2002, 3(12): preprint00
We have developed a relational database, called the General Repository for Interaction Datasets (The GRID; ) to archive and display physical, genetic and functional interactions. The GRID displays data-rich interaction tables for any protein of interest, combines literature-derived and high-throughput interaction datasets, and is readily accessible via the World Wide Web. Interactions parsed in The GRID can be viewed in graphical form with a versatile visualization tool called Osprey.

15.
Background: Traditional Chinese medicine (TCM) has recently attracted considerable attention from various disciplines. However, TCM remains mysterious because of its unique philosophy and theoretical thinking, and the lack of high-quality data makes a thorough understanding of TCM critically challenging. In this study, we introduce the Zhou Archive, a large-scale database of expert-specific electronic medical records (EMRs) containing information on more than 73,000 visits to one TCM doctor over more than 35 years. Covering the full spectrum of the diagnosis-treatment model behind TCM practice, the archive provides an opportunity to understand TCM from a data-driven perspective. Methods: Through a series of data processing steps applied to the text data in the archive, we transformed the semi-structured EMRs into a well-structured feature table. Based on this table, we performed a series of statistical analyses to learn principles of TCM clinical practice from the archive, including correlation analysis, enrichment analysis, embedding analysis and association pattern discovery. Results: The proposed data processing procedure generated a structured feature table of more than 14,000 features, with a feature codebook, a term dictionary and a term-feature map as byproducts. Statistical analysis of the feature table reveals underlying principles of the diagnosis-treatment model of TCM, helping us better understand TCM practice from a data-driven perspective. Conclusion: Expert-specific EMRs provide opportunities to understand TCM from a data-driven perspective. Taking advantage of recent progress in natural language processing (NLP) for Chinese, we can efficiently process a large number of TCM EMRs and gain insights through statistical analysis.
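Association pattern discovery can be illustrated with a minimal pairwise version of association-rule mining: each visit is treated as a set of features, and a rule A → B is kept when its support (share of visits containing both) and confidence (share of visits with A that also contain B) pass thresholds. The feature names, thresholds and the restriction to feature pairs are assumptions for illustration; the abstract does not describe the authors' algorithm.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def pair_rules(visits: List[Set[str]], min_support: float = 0.2,
               min_confidence: float = 0.6) -> List[Rule]:
    """Mine rules A -> B between pairs of features that co-occur in visits."""
    n = len(visits)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for features in visits:
        item_counts.update(features)
        pair_counts.update(combinations(sorted(features), 2))  # each unordered pair once

    rules: List[Rule] = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # test both rule directions
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical visits: symptoms and prescribed herbs recorded as features.
visits = [{"cough", "herb_A"}, {"cough", "herb_A"}, {"cough"}, {"insomnia"}]
for rule in pair_rules(visits):
    print(rule)
```

Here both directions pass: "herb_A → cough" has confidence 1.0, while "cough → herb_A" has confidence 2/3 because one cough visit had no herb_A.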

16.
DNA barcoding using a fragment of the mitochondrial cytochrome c oxidase subunit 1 gene (COI) has proven successful for species-level identification in many animal groups. However, most studies have focused on relatively small datasets or on large datasets of taxonomically high-ranked groups. We explore the quality of DNA barcodes for delimiting species in the diverse chironomid genus Tanytarsus (Diptera: Chironomidae) using different analytical tools. The genus Tanytarsus is the most species-rich taxon of the tribe Tanytarsini (Diptera: Chironomidae), with more than 400 species worldwide, some of which are notoriously difficult to identify to species level using morphology. Our dataset, based on sequences generated from our own material and publicly available data in BOLD, consists of 2790 DNA barcodes with a fragment length of at least 500 base pairs. A neighbor-joining tree of this dataset comprises 131 well-separated clusters representing 121 morphological species of Tanytarsus: 77 named, 16 unnamed and 28 unidentified theoretical species. For our geographically widespread dataset, DNA barcodes unambiguously discriminate 94.6% of the Tanytarsus species recognized through prior morphological study. Deep intraspecific divergences exist in some species complexes; resolving them requires further taxonomic study using appropriate nuclear markers as well as morphological and ecological data. The DNA barcodes cluster into 120–242 molecular operational taxonomic units (OTUs) depending on whether Objective Clustering, Automatic Barcode Gap Discovery (ABGD), the Generalized Mixed Yule Coalescent model (GMYC), the Poisson Tree Process (PTP), subjective evaluation of the neighbor-joining tree or Barcode Index Numbers (BINs) are used. We suggest that a 4–5% threshold is appropriate for delineating species of Tanytarsus non-biting midges.
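A distance threshold such as the suggested 4–5% can be illustrated with simple threshold clustering: barcodes whose uncorrected p-distance is at most the threshold are linked, and linked barcodes form one cluster (single linkage). This is a minimal sketch assuming pre-aligned sequences and hypothetical toy data. It is not an implementation of Objective Clustering, ABGD, GMYC, PTP or BINs, and real analyses often use model-corrected distances.

```python
from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance of two aligned sequences, skipping gaps and ambiguity codes."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    return diffs / compared if compared else 1.0


def threshold_clusters(seqs: Dict[str, str], threshold: float = 0.04) -> List[List[str]]:
    """Single-linkage clusters: sequences within `threshold` of each other are joined."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving keeps trees shallow
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)  # union the two clusters

    groups: Dict[str, List[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


base = "ACGT" * 25  # hypothetical 100-bp aligned fragment
seqs = {
    "s1": base,
    "s2": "TTT" + base[3:],      # 3 differences from s1 (3%)
    "s3": base[:90] + "T" * 10,  # 7 differences from s1 (7%)
}
print(threshold_clusters(seqs))
```

Raising the threshold merges clusters: at 10% all three toy barcodes fall into one unit, which is why the number of OTUs depends so strongly on the delimitation method and cut-off.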

17.
Expanding digital data sources, including social media, online news articles and blogs, provide an opportunity to understand better the context and intensity of human-nature interactions, such as wildlife exploitation. However, online searches encompassing large taxonomic groups can generate vast datasets, which can be overwhelming to filter for relevant content without the use of automated tools. The variety of machine learning models available to researchers, and the need for manually labelled training data with an even balance of labels, can make applying these tools challenging. Here, we implement and evaluate a hierarchical text classification pipeline which brings together three binary classification tasks with increasingly specific relevancy criteria. Crucially, the hierarchical approach facilitates the filtering and structuring of a large dataset, of which relevant sources make up a small proportion. Using this pipeline, we also investigate how the accuracy with which text classifiers identify relevant and irrelevant texts is influenced by the use of different models, training datasets, and the classification task. To evaluate our methods, we collected data from Facebook, Twitter, Google and Bing search engines, with the aim of identifying sources documenting the hunting and persecution of bats (Chiroptera). Overall, the ‘state-of-the-art’ transformer-based models were able to identify relevant texts with an average accuracy of 90%, with some classifiers achieving accuracy of >95%. Whilst this demonstrates that application of more advanced models can lead to improved accuracy, comparable performance was achieved by simpler models when applied to longer documents and less ambiguous classification tasks. Hence, the benefits from using more computationally expensive models are dependent on the classification context. 
We also found that stratifying the training data according to the presence of key search terms improved classification accuracy for less frequent topics within datasets, and therefore improves the applicability of classifiers to future data collection. Overall, whilst our findings reinforce the usefulness of automated tools for facilitating online analyses in conservation and ecology, they also highlight that the effectiveness and appropriateness of such tools are determined by the nature and volume of data collected, the complexity of the classification task, and the computational resources available to researchers.
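The hierarchical pipeline chains binary classifiers with increasingly specific relevance criteria, so that each stage only sees texts the previous stage accepted. The sketch below shows that control flow with three toy keyword rules standing in for trained models; the rules, keywords and texts are hypothetical, whereas the study itself used trained classifiers, including transformer-based models.

```python
import re
from typing import Callable, List, Sequence, Set

Stage = Callable[[str], bool]  # a binary relevance classifier


def words(text: str) -> Set[str]:
    """Lower-cased alphabetic tokens of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


# Hypothetical stand-ins for trained classifiers, from broad to specific.
def mentions_bats(text: str) -> bool:
    return bool(words(text) & {"bat", "bats"})


def mentions_harm(text: str) -> bool:
    return bool(words(text) & {"hunted", "hunting", "killed", "persecution"})


def documents_event(text: str) -> bool:
    return bool(words(text) & {"villagers", "reported", "roost"})


def hierarchical_depths(texts: Sequence[str], stages: Sequence[Stage]) -> List[int]:
    """Return how many consecutive stages each text passes (0..len(stages))."""
    depths = []
    for text in texts:
        depth = 0
        for is_relevant in stages:
            if not is_relevant(text):
                break  # more specific stages never see texts rejected earlier
            depth += 1
        depths.append(depth)
    return depths


texts = [
    "Baseball bat for sale",
    "Villagers hunted bats at the roost last week",
    "Weather forecast for Tuesday",
]
print(hierarchical_depths(texts, [mentions_bats, mentions_harm, documents_event]))
```

The first text shows why a cascade helps: a broad first stage lets through an irrelevant "bat" that the more specific second stage then rejects, and the depth reached gives a natural way to structure the filtered dataset.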

18.
Towards a collaborative, global infrastructure for biodiversity assessment
Biodiversity data are rapidly becoming available over the Internet in common formats that promote sharing and exchange. Currently, however, these data are somewhat problematic for use in ecological research, natural resources management and conservation decision-making, primarily with regard to geographic and taxonomic accuracy. Web-based georeferencing tools that use best practices and gazetteer databases can be employed to improve geographic data. Taxonomic data quality can be improved through web-enabled databases and services of valid taxon names, as well as through more efficient mechanisms for returning systematic research results and taxonomic misidentification rates to the biodiversity community. Both of these are under construction. A separate but related challenge will be developing web-based visualization and analysis tools for tracking biodiversity change. Our aim was to discuss how such tools, combined with data of enhanced quality, will help transform today's portals to raw biodiversity data into nexuses for the collaborative creation and sharing of biodiversity knowledge.

19.
Yan SK, Wu DX, Singh A, Li YL, Wei WS, Cui Y, Wang SL, Xu GB. Chinese Journal of Applied Ecology, 2011, 22(4): 1067–1074
This paper presents a new and simple method for assessing the quality of ecological monitoring data. The method formalizes the association between data reliability, treated as an ordinal variable with different numbers of classes, and data sources such as main natural ecological processes, secondary ecological processes, and extraneous and exotic processes; it also offers a new data quality index that estimates the quality of the whole dataset from the reasonableness ratio of observations. The assessment yields the reliability class of each dataset, well-founded explanations for outlier (or error) flagging decisions, and a quality value for the whole dataset. The method was applied to two tree growth datasets from the Chinese Ecosystem Research Network (CERN), and the results demonstrated that the new data quality index could quantitatively evaluate the quality of the tree growth datasets. The new method could facilitate the development of corresponding software.
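The abstract does not give the formula for the data quality index, so the following is only a minimal sketch under stated assumptions: each observation is assigned an ordinal reliability class (here 1 = plausible, 2 = explainable deviation, 3 = likely error, judged by a hypothetical rule for annual tree-diameter increments with made-up thresholds), and the index is taken as the share of observations whose class counts as reasonable.

```python
from collections import Counter
from typing import FrozenSet, Iterable


def classify_increment(delta_cm: float, max_growth_cm: float = 2.0,
                       tolerance_cm: float = 0.3) -> int:
    """Hypothetical ordinal reliability class for one annual diameter increment."""
    if 0.0 <= delta_cm <= max_growth_cm:
        return 1  # plausible growth
    if -tolerance_cm <= delta_cm < 0.0:
        return 2  # small apparent shrinkage, within assumed measurement error
    return 3      # implausible value, likely a recording error


def quality_index(classes: Iterable[int],
                  reasonable: FrozenSet[int] = frozenset({1, 2})) -> float:
    """Share of observations whose reliability class counts as reasonable."""
    counts = Counter(classes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty dataset")
    return sum(n for cls, n in counts.items() if cls in reasonable) / total


increments = [0.5, 1.2, -0.1, 5.0]  # hypothetical annual increments (cm)
classes = [classify_increment(d) for d in increments]
print(classes, quality_index(classes))
```

Keeping the per-observation class alongside the single index preserves an explanation for every flagged value, which matches the abstract's emphasis on justifying outlier decisions.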

20.
Spatial and/or temporal biases in biodiversity data can directly influence the utility, comparability, and reliability of ecological and evolutionary studies. While the effects of biased spatial coverage of biodiversity data are relatively well known, temporal variation in data quality (i.e., the congruence between recorded and actual information) has received much less attention. Here, we develop a conceptual framework for understanding the influence of time on biodiversity data quality based on three main processes: (1) the natural dynamics of ecological systems, such as species turnover or local extinction; (2) periodic taxonomic revisions; and (3) the loss of physical data and metadata due to inefficient curation, accidents, or funding shortfalls. Temporal decay in data quality driven by these three processes has fundamental consequences for the use and comparability of data collected in different time periods. Data decay can be partly ameliorated by adopting standard protocols for generating, storing, and sharing data and metadata. However, some data degradation is unavoidable because of natural variation in ecological systems. Consequently, changes in biodiversity data quality over time need to be carefully assessed and, where possible, taken into account when analyzing aging datasets.
