Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Implicit assumptions for most mark‐recapture studies are that individuals do not lose their markers and all observed markers are correctly recorded. If these assumptions are violated, e.g., due to loss or extreme wear of markers, estimates of population size and vital rates will be biased. Double‐marking experiments have been widely used to estimate rates of marker loss and adjust for associated bias, and we extended this approach to estimate rates of recording errors. We double‐marked 309 Piping Plovers (Charadrius melodus) with unique combinations of color bands and alphanumeric flags and used multi‐state mark recapture models to estimate the frequency with which plovers were misidentified. Observers were twice as likely to read and report an invalid color‐band combination (2.4% of the time) as an invalid alphanumeric code (1.0%). Observers failed to read matching band combinations or alphanumeric flag codes 4.5% of the time. Unlike previous band resighting studies, use of two resightable markers allowed us to identify when resighting errors resulted in reports of combinations or codes that were valid, but still incorrect; our results suggest this may be a largely unappreciated problem in mark‐resight studies. Field‐readable alphanumeric flags offer a promising auxiliary marker for identifying and potentially adjusting for false‐positive resighting errors that may otherwise bias demographic estimates.

2.
Molecular techniques for detecting microorganisms, macroorganisms and infectious agents are susceptible to false‐negative and false‐positive errors. If left unaddressed, these observational errors may yield misleading inference concerning occurrence, prevalence, sensitivity, specificity and covariate relationships. Occupancy models are widely used to account for false‐negative errors and more recently have even been used to address false‐positive errors, too. Current modelling options assume false‐positive errors only occur in truly negative samples, an assumption that yields biased inference concerning detection because a positive sample could be classified as such not because the target agent was successfully detected, but rather due to a false‐positive test result. We present an extension to the occupancy modelling framework that allows false‐positive errors in both negative and positive samples, thereby providing unbiased inference concerning occurrence and detection, as well as reliable conclusions about the efficacy of sampling designs, handling protocols and diagnostic tests. We apply the model to simulated data, showing that it recovers known parameters and outperforms other approaches that are commonly used when confronted with observation errors. We then apply the model to an experimental data set on Batrachochytrium dendrobatidis, a pathogenic fungus that is implicated in the global decline or extinction of hundreds of amphibian species. The model‐based approach we present is not only useful for obtaining reliable inference when data are contaminated with observational errors, but also eliminates the need for establishing arbitrary thresholds or decision rules that have hidden and unintended consequences.

3.
Diagnostic studies in ophthalmology frequently involve binocular data where pairs of eyes are evaluated, through some diagnostic procedure, for the presence of certain diseases or pathologies. The simplest approach of estimating measures of diagnostic accuracy, such as sensitivity and specificity, treats eyes as independent, consequently yielding incorrect estimates, especially of the standard errors. Approaches that account for the inter‐eye correlation include regression methods using generalized estimating equations and likelihood techniques based on various correlated binomial models. The paper proposes a simple alternative statistical methodology of jointly estimating measures of diagnostic accuracy for binocular tests based on a flexible model for correlated binary data. Method‐of‐moments estimation of model parameters is outlined and asymptotic inference is discussed. The resulting estimates are straightforward and easy to obtain, requiring no special statistical software but only elementary calculations. Results of simulations indicate that large‐sample and bootstrap confidence intervals based on the estimates have relatively good coverage properties when the model is correctly specified. The computation of the estimates and their standard errors is illustrated with data from a study on diabetic retinopathy.

4.
Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut‐offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high‐throughput environmental sequencing. This method provides rank‐flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the BLAST‐based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave‐one‐out cross‐validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut‐offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.

5.
This work builds upon the record-breaking speed and generous immediate release of new experimental three-dimensional structures of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins and complexes, which are crucial to downstream vaccine and drug development. We have surveyed those structures to catch the occasional errors that could be significant for those important uses and for which we were able to provide demonstrably higher-accuracy corrections. This process relied on new validation and correction methods such as CaBLAM and ISOLDE, which are not yet in routine use. We found such important and correctable problems in seven early SARS-CoV-2 structures. Two of the structures were soon superseded by new higher-resolution data, confirming our proposed changes. For the other five, we emailed the depositors a documented and illustrated report and encouraged them to make the model corrections themselves and use the new option at the worldwide Protein Data Bank for depositors to re-version their coordinates without changing the Protein Data Bank code. This quickly and easily makes the better-accuracy coordinates available to anyone who examines or downloads their structure, even before formal publication. The changes have involved sequence misalignments, incorrect RNA conformations near a bound inhibitor, incorrect metal ligands, and cis-trans or peptide flips that prevent good contact at interaction sites. These improvements have propagated into nearly all related structures done afterward. This process constitutes a new form of highly rigorous peer review, which is actually faster and more strict than standard publication review because it has access to coordinates and maps; journal peer review would also be strengthened by such access.

6.
Objective: The psychosocial functioning of overweight youth is a growing concern. Research has shown that overweight children report lower quality of life (QOL) than their non‐overweight peers. This study sought to extend the literature by examining the association between peer victimization, child depressive symptoms, parent distress, and health‐related QOL in overweight youth. Mediator models are used to assess the effect of child depressive symptoms on the relationship between psychosocial variables and QOL. Research Methods and Procedures: The sample consisted of 96 overweight and at‐risk‐for‐overweight children (mean age = 12.8 years) and their parents who were recruited from a Pediatric Endocrinology Obesity Clinic. Parents completed a demographic questionnaire, the Pediatric Quality of Life Inventory–parent‐proxy version, and the Brief Symptom Inventory. Children completed the Children's Depression Inventory–Short Form, the Schwartz Peer Victimization Scale, and the Pediatric Quality of Life Inventory. Results: Increased parent distress, child depressive symptoms, and peer victimization were associated with lower QOL by both parent‐proxy and self‐report. Child depressive symptoms mediated the relationship between psychosocial variables (parent distress and peer victimization) and self‐reported QOL, but not parent‐proxy‐reported QOL. Discussion: This study documented the important impact of peer victimization and parental distress on the QOL of overweight children. Expanding our understanding of how overweight children experience and interact with their environment is critical. Further research is needed to examine the mechanisms by which parent distress and peer victimization impact the development of depressive symptoms in overweight children, including coping and support strategies that may buffer these children against the development of depressive symptoms and ultimately lower QOL.

7.
Introduction – Orange (Citrus sinensis L.) juice comprises a complex mixture of volatile components that are difficult to identify and quantify. Classification and discrimination of the varieties on the basis of the volatile composition could help to guarantee the quality of a juice and to detect possible adulteration of the product. Objective – To provide information on the amounts of volatile constituents in fresh‐squeezed juices from four orange cultivars and to establish suitable discrimination rules to differentiate orange juices using new chemometric approaches. Methodology – Fresh juices of four orange cultivars were analysed by headspace solid‐phase microextraction (HS‐SPME) coupled with GC‐MS. Principal component analysis, linear discriminant analysis and heuristic methods, such as neural networks, allowed clustering of the data from HS‐SPME analysis while genetic algorithms addressed the problem of data reduction. To check the quality of the results the chemometric techniques were also evaluated on a sample. Results – Thirty volatile compounds were identified by HS‐SPME and GC‐MS analyses and their relative amounts calculated. Differences in composition of orange juice volatile components were observed. The chosen orange cultivars could be discriminated using neural networks, genetic relocation algorithms and linear discriminant analysis. Genetic algorithms applied to the data were also able to detect the most significant compounds. Conclusions – SPME is a useful technique to investigate orange juice volatile composition and a flexible chemometric approach is able to correctly separate the juices. Copyright © 2009 John Wiley & Sons, Ltd.

8.
As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank‐ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: for a given sequence alignment, alternative algorithms usually identify different top pairwise scores. We reconciled results from five commonly‐used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6‐bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column‐specific properties such as sequence entropy and random noise were subtracted; “central” positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints, detectable by divergent algorithms, that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions. Proteins 2015; 83:2293–2306. © 2015 Wiley Periodicals, Inc.
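
The contrast drawn above between top pairwise scores and network centrality can be illustrated with a small sketch. It is not the authors' pipeline: the six-position score matrix is invented for illustration, the function name `eigenvector_centrality` is hypothetical, and plain power iteration (with an identity shift to avoid oscillation) stands in for whatever software the study used.

```python
def eigenvector_centrality(weights, iterations=500, tol=1e-12):
    """Eigenvector centrality of a symmetric, non-negative weight matrix.

    Uses power iteration on (W + I); the identity shift keeps the same
    eigenvectors but prevents oscillation on bipartite-like networks.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = [
            scores[i] + sum(weights[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(v * v for v in updated) ** 0.5
        updated = [v / norm for v in updated]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical coevolution scores for six alignment positions (assumption).
# Position 0 has four moderate scores (0.6); positions 4 and 5 share the
# single strongest pairwise score (0.9).
W = [[0.0] * 6 for _ in range(6)]
for a, b, score in [(0, 1, 0.6), (0, 2, 0.6), (0, 3, 0.6), (0, 4, 0.6), (4, 5, 0.9)]:
    W[a][b] = W[b][a] = score

centrality = eigenvector_centrality(W)
most_central = max(range(6), key=lambda i: centrality[i])
```

In this toy network the strongest pair is (4, 5), yet the most central position is 0, which carries several moderate scores rather than one strong one.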

9.
In linear mixed‐effects models, random effects are used to capture the heterogeneity and variability between individuals due to unmeasured covariates or unknown biological differences. Testing for the need for random effects is a nonstandard problem because it requires testing on the boundary of parameter space where the asymptotic chi‐squared distribution of the classical tests such as likelihood ratio and score tests is incorrect. In the literature several tests have been proposed to overcome this difficulty; however, all of these tests rely on the restrictive assumption of i.i.d. measurement errors. The presence of correlated errors, which often happens in practice, makes testing random effects much more difficult. In this paper, we propose a permutation test for random effects in the presence of serially correlated errors. The proposed test not only avoids issues with the boundary of parameter space, but also can be used for testing multiple random effects and any subset of them. Our permutation procedure includes the permutation procedure in Drikvandi, Verbeke, Khodadadi, and Partovi Nia (2013) as a special case when errors are i.i.d., though the test statistics are different. We use simulations and a real data analysis to evaluate the performance of the proposed permutation test. We have found that random slopes for linear and quadratic time effects may not be significant when measurement errors are serially correlated.

10.
Speculation over a global rise in jellyfish populations has become widespread in the scientific literature, but until recently the purported ‘global increase’ had not been tested. Here we present a citation analysis of peer‐reviewed literature to track the evolution of the current perception of increases in jellyfish and identify key papers involved in its establishment. Trend statements and citation threads were reviewed and arranged in a citation network. Trend statements were assessed according to their degree of affirmation and spatial scale, and the appropriateness of the citations used to support statements was assessed. Analyses showed that 48.9% of publications misinterpreted the conclusions of cited sources, with a bias towards claiming jellyfish populations are increasing, with a single review having the most influence on the network. Collectively, these disparities resulted in a network based on unsubstantiated statements and citation threads. As a community, we must ensure our statements about scientific findings in general are accurately substantiated and carefully communicated such that incorrect perceptions, as in the case of jellyfish blooms, do not develop in the absence of rigorous testing.

11.
Radar systems have been increasingly used to monitor birds. To take full advantage of the large datasets provided by radars, researchers have implemented machine learning (ML) techniques that automatically read and attempt to classify targets. Here we used data collected from two locations in Portugal with two marine radar antennas (VSR and HSR) to apply and compare the performance of six ML algorithms that are widely used in the literature: random forests (RF), support vector machine (SVM), artificial neural networks (NN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and decision trees (DT), all trained with several dataset configurations. We found that all algorithms performed well (area under the receiver operating characteristic curve (AUC) and accuracy > 0.80, P < 0.001) when discriminating birds from non‐biological targets such as vehicles, rain or wind turbines, but greater variance in the performance among algorithms was apparent when separating different bird functional groups or bird species (e.g. herons vs. gulls). In our case study, only RF was able to hold an accuracy > 0.80 for all classification tasks, although SVM and DT also performed well. Further, all algorithms correctly classified 86% and 66% (VSR and HSR) of the target points, and only 2% and 4% of these points were misclassified by all algorithms. Our results suggest that ML algorithms are suitable for classifying radar targets as birds, and thereby separating them from other non‐biological targets. The ability of these algorithms to correctly distinguish among bird species functional groups was found to be much weaker, but if properly trained and supported by a good ground truthing dataset, targeted to the relevant species groups, some of these algorithms are still able to achieve high accuracies in classification tasks. Such results indicate that ML algorithms are suitable for use in near real‐time monitoring of bird movements, and may help to mitigate collision of birds with, for example, wind turbines or airplanes.

12.
Single nucleotide polymorphisms (SNPs) have become the marker of choice for genetic studies in organisms of conservation, commercial or biological interest. Most SNP discovery projects in nonmodel organisms apply a strategy for identifying putative SNPs based on filtering rules that account for random sequencing errors. Here, we analyse data used to develop 4723 novel SNPs for the commercially important deep‐sea fish, orange roughy (Hoplostethus atlanticus), to assess the impact of not accounting for systematic sequencing errors when filtering identified polymorphisms during SNP discovery. We used SAMtools to identify polymorphisms in a Velvet assembly of genomic DNA sequence data from seven individuals. The resulting set of polymorphisms was filtered to minimize ‘bycatch’, that is, polymorphisms caused by sequencing or assembly error. An Illumina Infinium SNP chip was used to genotype a final set of 7714 polymorphisms across 1734 individuals. Five predictors were examined for their effect on the probability of obtaining an assayable SNP: depth of coverage, number of reads that support a variant, polymorphism type (e.g. A/C), strand‐bias and Illumina SNP probe design score. Our results indicate that filtering out systematic sequencing errors could substantially improve the efficiency of SNP discovery. We show that BLASTX can be used as an efficient tool to identify single‐copy genomic regions in the absence of a reference genome. The results have implications for research aiming to identify assayable SNPs and build SNP genotyping assays for nonmodel organisms.

13.
The whale shark (Rhincodon typus) is an endangered marine fish species which can be adversely affected by the fishing activities of the industrial purse seine fleet targeting tropical tuna. Tuna tend to aggregate around all types of floating objects, including whale sharks. We analyzed and modeled the spatial distribution and environmental preferences of whale sharks based on the presence and absence data from fishing observations in the Atlantic Ocean. We used a thorough multialgorithm analysis, based on a new presence–absence dataset, and endeavored to follow the most recent recommendations on best practices in species distribution modeling. First, we selected a subset of relevant variables using a generalized linear model that addressed multicollinearity, statistical errors, and information criteria. We then used the selected variables to build a model ensemble including 19 different algorithms. After eliminating models with insufficient performance, we assessed the potential distribution of whale sharks using the mean of the predictions of the selected models. We also assessed the variance among the predictions of different algorithms, in order to identify areas with the highest model consensus. The results show that several coastal regions and warm shallow currents, such as the Gulf Stream and the Canary and Benguela currents, are the most suitable areas for whale sharks under current environmental conditions. Future environmental projections for the Atlantic Ocean suggest that some of the suitable regions will shift northward, but current concentration areas will continue to be suitable for whale shark, although with less productivity, which could have negative consequences for conservation of the species. We discuss the implications of these predictions for the conservation and management of this charismatic marine species.

14.
The sexes of non‐ratite birds can be determined routinely by PCR amplification of the CHD‐Z and CHD‐W genes. CHD‐based molecular sexing of four species of auklets revealed the presence of a polymorphism in the Z chromosome. No deviation from a 1:1 sex ratio was observed among the chicks, though the analyses were of limited power. Polymorphism in the CHD‐Z gene has not been reported previously in any bird, but if undetected it could lead to the incorrect assignment of sex. We discuss the potential difficulties caused by a polymorphism such as that identified in auklets and the merits of alternative CHD‐based sexing protocols and primers.

15.
Process life cycle assessment (PLCA) is widely used to quantify environmental flows associated with the manufacturing of products and other processes. As PLCA always depends on defining a system boundary, its application involves truncation errors. Different methods of estimating truncation errors are proposed in the literature; most of these are based on artificially constructed system‐complete counterfactuals. In this article, we review the literature on truncation errors and their estimates and systematically explore factors that influence truncation error estimates. We classify estimation approaches, together with underlying factors influencing estimation results, according to where in the estimation procedure they occur. By contrasting different PLCA truncation/error modeling frameworks using the same underlying input‐output (I‐O) data set and varying cut‐off criteria, we show that modeling choices can significantly influence estimates for PLCA truncation errors. In addition, we find that differences in I‐O and process inventory databases, such as missing service sector activities, can significantly affect estimates of PLCA truncation errors. Our results expose the challenges related to explicit statements on the magnitude of PLCA truncation errors. They also indicate that increasing the strictness of cut‐off criteria in PLCA has only limited influence on the resulting truncation errors. We conclude that applying an additional I‐O life cycle assessment or a path exchange hybrid life cycle assessment to identify where significant contributions are located in upstream layers could significantly reduce PLCA truncation errors.
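
One simple way to picture a truncation error is a layer cut-off in an input‐output system: the full footprint is the sum of contributions from all upstream layers, and a process inventory that stops after a given layer misses the rest. The sketch below assumes that framing, which is only one of the modeling choices the abstract says can change the estimate; the two-sector coefficient matrix `A`, intensities `f`, demand `y`, and the function names are all hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    n = len(vector)
    return [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(len(matrix))]


def layer_footprints(a, f, y, layers):
    """Footprint contributed by each upstream layer k = 0..layers, i.e. f . (A^k y)."""
    contributions = []
    demand = list(y)
    for _ in range(layers + 1):
        contributions.append(sum(fi * di for fi, di in zip(f, demand)))
        demand = mat_vec(a, demand)  # inputs required one layer further upstream
    return contributions


def truncation_error(a, f, y, cutoff_layer, total_layers=200):
    """Share of the full footprint missed when layers beyond cutoff_layer are ignored.

    The full footprint is approximated by summing many layers of the power
    series, which converges when the spectral radius of A is below 1.
    """
    contributions = layer_footprints(a, f, y, total_layers)
    covered = sum(contributions[: cutoff_layer + 1])
    return 1.0 - covered / sum(contributions)


# Hypothetical two-sector economy (all numbers are illustrative assumptions).
A = [[0.1, 0.2],
     [0.3, 0.1]]   # technical coefficients: inputs per unit of output
f = [1.0, 2.0]     # environmental flow per unit of sector output
y = [1.0, 0.0]     # final demand: one unit of sector 0

errors = [truncation_error(A, f, y, k) for k in range(4)]
```

With these numbers, covering only direct flows (layer 0) misses about half of the footprint, and including the first upstream layer reduces the miss to about 15%.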

16.
Gender assignment errors are common in some animal species and lead to inaccuracies in downstream analyses. Procedures for detecting gender misassignment are available for array‐based SNP data but are still being developed for genotyping‐by‐sequencing (GBS) data. In this study, we describe a method for using GBS data to predict gender using X and Y chromosomal SNPs. From a set of 1286 X chromosomal and 23 Y chromosomal deer (Cervus sp.) SNPs discovered from GBS sequence reads, a prediction model was built using a training dataset of 422 Red deer and validated using a test dataset of 868 Red deer and Wapiti deer. Prediction was based on the proportion of heterozygous genotypes on the X chromosome and the proportion of non‐missing genotypes on the Y chromosome observed in each individual. The concordance between recorded gender and predicted gender was 98.6% in the training dataset and 99.3% in the test dataset. The model identified five individuals across both datasets with incorrect recorded gender and was unable to predict gender for another five individuals. Overall, our method predicted gender with a high degree of accuracy and could be used for quality control in gender assignment datasets or for assigning gender when unrecorded, provided a suitable reference genome is available.
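
The two predictors named above (X‐chromosome heterozygosity and Y‐chromosome call rate) can be combined in a simple rule, sketched here under stated assumptions. The abstract does not give the fitted model or its cut-offs, so the thresholds `x_het_max_male` and `y_call_min_male`, the genotype encoding, and the function names are hypothetical and would need to be estimated from a training set.

```python
def proportion_heterozygous(genotypes):
    """Share of called genotypes whose two alleles differ; None if nothing was called."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(1 for a, b in called if a != b) / len(called)


def proportion_called(genotypes):
    """Share of SNPs with a non-missing genotype; None for an empty SNP list."""
    if not genotypes:
        return None
    return sum(1 for g in genotypes if g is not None) / len(genotypes)


def predict_sex(x_genotypes, y_genotypes, x_het_max_male=0.05, y_call_min_male=0.5):
    """Return "male", "female", or None when the two predictors disagree.

    Genotypes are (allele, allele) tuples, or None when missing.
    Both thresholds are illustrative assumptions, not published values.
    """
    x_het = proportion_heterozygous(x_genotypes)
    y_call = proportion_called(y_genotypes)
    if x_het is None or y_call is None:
        return None
    x_says_male = x_het <= x_het_max_male    # a single X cannot be heterozygous
    y_says_male = y_call >= y_call_min_male  # Y reads are expected only in males
    if x_says_male and y_says_male:
        return "male"
    if not x_says_male and not y_says_male:
        return "female"
    return None  # conflicting evidence: leave unassigned for manual review


likely_male = predict_sex(
    [("A", "A"), ("G", "G"), ("C", "C"), ("T", "T")],
    [("A", "A"), ("G", "G"), None],
)
likely_female = predict_sex(
    [("A", "G"), ("G", "G"), ("C", "T"), ("T", "T")],
    [None, None, None],
)
ambiguous = predict_sex(
    [("A", "G"), ("G", "G")],
    [("A", "A"), ("G", "G")],
)
```

Returning `None` for conflicting evidence mirrors the abstract's report that the model was unable to predict gender for some individuals; comparing predictions with recorded values is how misassignments would be flagged.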

17.
In perennial energy crop breeding programmes, it can take several years before a mature yield is reached when potential new varieties can be scored. Modern plant breeding technologies have focussed on molecular markers, but for many crop species, this technology is unavailable. Therefore, prematurity predictors of harvestable yield would accelerate the release of new varieties. Metabolic biomarkers are routinely used in medicine, but they have been largely overlooked as predictive tools in plant science. We aimed to identify biomarkers of productivity in the bioenergy crop, Miscanthus, that could be used prognostically to predict future yields. This study identified a metabolic profile reflecting productivity in Miscanthus by correlating the summer carbohydrate composition of multiple genotypes with final yield 6 months later. Consistent and strong, significant correlations were observed between carbohydrate metrics and biomass traits at two separate field sites over 2 years. Machine‐learning feature selection was used to optimize carbohydrate metrics for support vector regression models, which were able to predict interyear biomass traits with a correlation (R) of >0.67 between predicted and actual values. To identify a causal basis for the relationships between the glycome profile and biomass, a 13C‐labelling experiment compared carbohydrate partitioning between high‐ and low‐yielding genotypes. A lower yielding and slower growing genotype partitioned a greater percentage of the 13C pulse into starch compared to a faster growing genotype where a greater percentage was located in the structural biomass. These results supported a link between plant performance and carbon flow through two rival pathways (starch vs. sucrose), with higher yielding plants exhibiting greater partitioning into structural biomass, via sucrose metabolism, rather than starch. Our results demonstrate that the plant metabolome can be used prognostically to anticipate future yields and this is a method that could be used to accelerate selection in perennial energy crop breeding programmes.

18.
Powdery mildew, caused by Golovinomyces orontii and Podosphaera xanthii, is a widespread disease that causes important losses in cucurbit production. To determine the aetiology and the epidemiology of cucurbit powdery mildew disease in the North of Italy, observations on the occurrence of the main disease‐causing fungal species were conducted during the 2010, 2011 and 2012 growing seasons. Samples of infected leaves of zucchini, melon and pumpkin plants, either from field or greenhouse crops, were collected every 15–18 days from May to September/October. To identify the fungal species, both morphological observations based on the asexual stage and molecular identifications by a Multiplex‐PCR reaction with species‐specific primers were performed. Climatic parameters of temperature and relative humidity were also monitored. Pearson's correlation coefficient and Principal Component Analysis showed a negative significant correlation between the two species, and a peculiar epidemiological behaviour was also observed: the earlier infections were caused by G. orontii, which was the predominant species till the end of June–middle of July. At this time, this species progressively decreased in frequency and was replaced by P. xanthii that became the main species infecting cucurbits till the end of the growing season. As the two species have different ecological requirements, these seasonal variations in the cucurbit powdery mildew species composition could possibly be explained by the influence of temperature and relative humidity on the pathogen epidemiology during the growing season but also by the different overwintering strategies adopted by the two species.

19.
van den Oord EJ, Jiang Y, Riley BP, Kendler KS, Chen X. BioTechniques 2003; 34(3):610-6, 618-20, 622 passim
For technologies that are commonly used in ordinary laboratories such as fluorescence-polarization detection with template-directed, dye-terminator incorporation (FP-TDI), SNP genotype scoring is usually done manually. Here we study rates of errors and missing genotypes obtained with this procedure. We also introduce three statistical genotype scoring methods to examine whether they form a viable alternative. Data consisted of eight SNPs typed in about 1400 individuals from 268 pedigrees. The statistical procedures performed better on several internal criteria, such as the number of Mendelian errors, and showed much higher agreement with discrepant genotypes re-scored by two raters. The best results were obtained with the statistical procedure that incorporated information about regularities in the error structure of the FP-TDI data. We estimated that there were about 1.6% more errors if genotypes were scored manually. About 0.6% of these errors could be explained by data manipulation errors, leaving 1% as the result of possible incorrect scoring. There were 3.3% more missing genotypes in the manual scoring due to errors in data manipulation (1.7%) and conservative scoring (1.6%).

20.
Since the discovery of exceptionally preserved theropod dinosaurs with soft tissues in China in the 1990s, there has been much debate about the nature of filamentous structures observed in some specimens. Sinosauropteryx was the first non‐avian theropod to be described with these structures, and remains one of the most studied examples. Despite a general consensus that the structures represent feathers or feather homologues, a few identify them as degraded collagen fibres derived from the skin. This latter view has been based on observations of low‐quality images of Sinosauropteryx, as well as the suggestion that because superficially similar structures are seen in Jurassic ichthyosaurs they cannot represent feathers. Here, we highlight issues with the evidence put forward in support of this view, showing that integumentary structures have been misinterpreted based on sedimentary features and preparation marks, and that these errors have led to incorrect conclusions being drawn about the existence of collagen in Sinosauropteryx and the ichthyosaur Stenopterygius. We find that there is no evidence to support the idea that the integumentary structures seen in the two taxa are collagen fibres, and confirm that the most parsimonious interpretation of fossilized structures that look like feather homologues in Sinosauropteryx is that they are indeed the remains of feather homologues.
