Similar literature
20 similar articles found (search time: 15 ms)
1.
Species distribution models (SDMs) are often calibrated using presence‐only datasets plagued with environmental sampling bias, which leads to a decrease of model accuracy. In order to compensate for this bias, it has been suggested that background data (or pseudoabsences) should represent the area that has been sampled. However, spatially‐explicit knowledge of sampling effort is rarely available. In multi‐species studies, sampling effort has been inferred following the target‐group (TG) approach, where aggregated occurrence of TG species informs the selection of background data. However, little is known about the species‐specific response to this type of bias correction. The present study aims at evaluating the impacts of sampling bias and bias correction on SDM performance. To this end, we designed a realistic system of sampling bias and virtual species based on 92 terrestrial mammal species occurring in the Mediterranean basin. We manipulated presence and background data selection to calibrate four SDM types. Unbiased (unbiased presence data) and biased (biased presence data) SDMs were calibrated using randomly distributed background data. We used real and TG‐estimated sampling efforts in background selection to correct for sampling bias in presence data. Overall, environmental sampling bias had a deleterious effect on SDM performance. In addition, bias correction improved model accuracy, especially when based on spatially‐explicit knowledge of sampling effort. However, our results highlight important species‐specific variations in susceptibility to sampling bias, which were largely explained by range size: widely‐distributed species were most vulnerable to sampling bias, and bias correction was even detrimental for narrow‐ranging species. Furthermore, spatial discrepancies in SDM predictions suggest that bias correction effectively replaces an underestimation bias with an overestimation bias, particularly in areas of low sampling intensity. Thus, our results call for a better estimation of sampling effort in multispecies systems, and caution against the uninformed and automatic application of TG bias correction.
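The target-group idea described in this abstract, selecting background points in proportion to estimated sampling effort, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid cells, the record counts and the function name `sample_background` are all hypothetical, and effort is assumed to be approximated by the number of target-group records per cell.

```python
import random

def sample_background(cells, effort, n, seed=0):
    """Draw n background cells with probability proportional to sampling effort.

    cells  : list of cell identifiers (hypothetical grid cells)
    effort : non-negative weights, e.g. target-group record counts per cell
    """
    if len(cells) != len(effort):
        raise ValueError("cells and effort must have the same length")
    if sum(effort) <= 0:
        raise ValueError("at least one cell needs positive effort")
    rng = random.Random(seed)
    # random.choices samples with replacement, weighted by effort
    return rng.choices(cells, weights=effort, k=n)

cells = ["a", "b", "c", "d"]
tg_records = [0, 5, 1, 14]  # hypothetical target-group record counts per cell
background = sample_background(cells, tg_records, n=1000)
```

Cells with no target-group records are never drawn, which illustrates why the abstract warns that the correction can shift bias toward poorly sampled areas rather than remove it.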

2.
Species distribution modelling (SDM) has become an essential method in ecology and conservation. In the absence of survey data, the majority of SDMs are calibrated with opportunistic presence‐only data, incurring substantial sampling bias. We address the challenge of correcting for sampling bias in data‐sparse situations. We modelled the relative intensity of bat records in their entire range using three modelling algorithms under the point‐process modelling framework (GLMs with subset selection, GLMs fitted with an elastic‐net penalty, and Maxent). To correct for sampling bias, we applied model‐based bias correction by incorporating spatial information on site accessibility or sampling efforts. We evaluated the effect of bias correction on the models’ predictive performance (AUC and TSS), calculated on spatial‐block cross‐validation and a holdout data set. When evaluated with independent, but also sampling‐biased test data, correction for sampling bias led to improved predictions. The predictive performance of the three modelling algorithms was very similar. Elastic‐net models had intermediate performance, with a slight advantage for GLMs on cross‐validation and Maxent on hold‐out evaluation. Model‐based bias correction is very useful in data‐sparse situations, where detailed data are not available to apply other bias correction methods. However, bias correction success depends on how well the selected bias variables describe the sources of bias. In this study, accessibility covariates described bias in our data better than the effort covariate, and their use led to larger changes in predictive performance. Objectively evaluating bias correction requires bias‐free presence–absence test data, and without them the real improvement for describing a species’ environmental niche cannot be assessed.

3.
Next-generation sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among the challenges, GC bias in NGS data is known to aggravate genome assembly. However, it is not clear to what extent GC bias affects genome assembly in general. In this work, we conduct a systematic analysis of the effects of GC bias on genome assembly. Our analyses reveal that GC bias only lowers assembly completeness when the degree of GC bias is above a threshold. At a strong GC bias, the assembly fragmentation due to GC bias can be explained by the low coverage of reads in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data thus rescues the assembly fragmentation because of GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC contents. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These pieces of information provide guidance toward a better de novo genome assembly in the presence of GC bias.

4.
Presence-only data, where information is available concerning species presence but not species absence, are subject to bias due to observers being more likely to visit and record sightings at some locations than others (hereafter “observer bias”). In this paper, we describe and evaluate a model-based approach to accounting for observer bias directly – by modelling presence locations as a function of known observer bias variables (such as accessibility variables) in addition to environmental variables, then conditioning on a common level of bias to make predictions of species occurrence free of such observer bias. We implement this idea using point process models with a LASSO penalty, a new presence-only method related to maximum entropy modelling, that implicitly addresses the “pseudo-absence problem” of where to locate pseudo-absences (and how many). The proposed method of bias-correction is evaluated using systematically collected presence/absence data for 62 plant species endemic to the Blue Mountains near Sydney, Australia. It is shown that modelling and controlling for observer bias significantly improves the accuracy of predictions made using presence-only data, and usually improves predictions as compared to pseudo-absence or “inventory” methods of bias correction based on absences from non-target species. Future research will consider the potential for improving the proposed bias-correction approach by estimating the observer bias simultaneously across multiple species.

5.
A statistical model is proposed for the analysis of errors in microarray experiments and is employed in the analysis and development of a combined normalisation regime. Through analysis of the model and two-dye microarray data sets, this study found the following. The systematic error introduced by microarray experiments mainly involves spot intensity-dependent, feature-specific and spot position-dependent contributions. It is difficult to remove all these errors effectively without a suitable combined normalisation operation. Adaptive normalisation using a suitable regression technique is more effective in removing spot intensity-related dye bias than self-normalisation, while regional normalisation (block normalisation) is an effective way to correct spot position-dependent errors. However, dye-flip replicates are necessary to remove feature-specific errors, and also allow the analyst to identify the experimentally introduced dye bias contained in non-self-self data sets. In this case, the bias present in the data sets may include both experimentally introduced dye bias and the biological difference between two samples. Self-normalisation is capable of removing dye bias without identifying the nature of that bias. The performance of adaptive normalisation, on the other hand, depends on its ability to correctly identify the dye bias. If adaptive normalisation is combined with an effective dye bias identification method then there is no systematic difference between the outcomes of the two methods.

6.
7.
Diseased animals may exhibit behavioral shifts that increase or decrease their probability of being randomly sampled. In harvest-based sampling approaches, animal movements, changes in habitat utilization, changes in breeding behaviors during harvest periods, or differential susceptibility to harvest via behaviors like hiding or decreased sensitivity to stimuli may result in a non-random sample that biases prevalence estimates. We present a method that can be used to determine whether bias exists in prevalence estimates from harvest samples. Using data from harvested mule deer (Odocoileus hemionus) sampled in northcentral Colorado (USA) during fall hunting seasons 1996-98 and Akaike's information criterion (AIC) model selection, we detected within-year trends indicating potential bias in harvest-based prevalence estimates for chronic wasting disease (CWD). The proportion of CWD-positive deer harvested slightly increased through time within a year. We speculate that differential susceptibility to harvest or breeding season movements may explain the positive trend in proportion of CWD-positive deer harvested during fall hunting seasons. Detection of bias may provide information about temporal patterns of a disease, suggest biological hypotheses that could further understanding of a disease, or provide wildlife managers with information about when diseased animals are more or less likely to be harvested. Although AIC model selection can be useful for detecting bias in data, it has limited utility in determining underlying causes of bias. In cases where bias is detected in data using such model selection methods, design-based methods (i.e., experimental manipulation) may be necessary to assign causality.

8.
Transect techniques for censusing reef fishes, and the sources of bias inherent in them, are considered. A technique, derived from aerial survey methods, is demonstrated to correct a bias in density estimates due to the width of the transect being censused. This bias is sufficient on a transect 1 m wide to underestimate density by 11.1–26.7% for five species or species groups examined. The bias is still greater on wider transects. Because this bias varies in degree among species, comparisons among species should not be made using uncorrected transect data. Comments are made on other probable sources of bias in transect data, and on ways of minimising bias when making visual transect censuses.

9.
Researchers have suggested that certain individuals may show a self-positivity bias, rating themselves as possessing more positive personality traits than others. Previous evidence has shown that people evaluate self-related information in such a way as to maintain or enhance self-esteem. However, whether self-esteem would modulate the time course of self-positivity bias in explicit self-evaluation has never been explored. In the present study, 21 participants completed the Rosenberg self-esteem scale and then completed a task where they were instructed to indicate to what extent positive/negative traits described themselves. Behavioral data showed that participants endorsed positive traits as higher in self-relevance compared to the negative traits. Further, participants’ self-esteem levels were positively correlated with their self-positivity bias. Electrophysiological data revealed smaller N1 amplitude and larger late positive component (LPC) amplitude to stimuli consistent with the self-positivity bias (positive-high self-relevant stimuli) when compared to stimuli that were inconsistent with the self-positivity bias (positive-low self-relevant stimuli). Moreover, only in individuals with low self-esteem, the latency of P2 was more pronounced in processing stimuli that were consistent with the self-positivity bias (negative-low self-relevant stimuli) than to stimuli that were inconsistent with the self-positivity bias (positive-low self-relevant stimuli). Overall, the present study provides additional support for the view that low self-esteem as a personality variable would affect the early attentional processing.

10.
11.
A generalized interval mapping (GIM) method to map quantitative trait loci (QTL) for binary polygenic traits in a multi-family half-sib design is developed based on threshold theory and implemented using a Newton-Raphson algorithm. Statistical power and bias of QTL mapping for binary traits by GIM is compared with linear regression interval mapping (RIM) using simulation. Data on 20 paternal half-sib families were simulated with two genetic markers that bracketed an additive QTL. Data simulated and analysed were: (1) data on the underlying normally distributed liability (NDL) scale, (2) binary data created by truncating NDL data based on three thresholds yielding data sets with three different incidences, and (3) NDL data with polygenic and QTL effects reduced by a proportion equal to the ratio of the heritabilities on the binary versus NDL scale (reduced-NDL). Binary data were simulated with and without systematic environmental (herd) effects in an unbalanced design. GIM and RIM gave similar power to detect the QTL and similar estimates of QTL location, effects and variances. Presence of fixed effects caused differences in bias between RIM and GIM, where GIM showed smaller bias which was affected less by incidence. The original NDL data had higher power and lower bias in QTL parameter estimates than binary and reduced-NDL data. RIM for reduced-NDL and binary data gave similar power and estimates of QTL parameters, indicating that the impact of the binary nature of data on QTL analysis is equivalent to its impact on heritability.

12.
Trinquart L, Abbé A, Ravaud P. PLoS ONE, 2012, 7(4): e35219

Background

Indirect comparisons of competing treatments by network meta-analysis (NMA) are increasingly in use. Reporting bias has received little attention in this context. We aimed to assess the impact of such bias in NMAs.

Methods

We used data from 74 FDA-registered placebo-controlled trials of 12 antidepressants and their 51 matching publications. For each dataset, NMA was used to estimate the effect sizes for 66 possible pair-wise comparisons of these drugs, the probabilities of being the best drug and ranking the drugs. To assess the impact of reporting bias, we compared the NMA results for the 51 published trials and those for the 74 FDA-registered trials. To assess how reporting bias affecting only one drug may affect the ranking of all drugs, we performed 12 different NMAs for hypothetical analysis. For each of these NMAs, we used published data for one drug and FDA data for the 11 other drugs.

Findings

Pair-wise effect sizes for drugs derived from the NMA of published data and those from the NMA of FDA data differed in absolute value by at least 100% in 30 of 66 pair-wise comparisons (45%). Depending on the dataset used, the top 3 agents differed, in composition and order. When reporting bias hypothetically affected only one drug, the affected drug ranked first in 5 of the 12 NMAs but second (n = 2), fourth (n = 1) or eighth (n = 2) in the NMA of the complete FDA network.

Conclusions

In this particular network, reporting bias biased NMA-based estimates of treatment efficacy and modified ranking. The reporting bias effect in NMAs may differ from that in classical meta-analyses in that reporting bias affecting only one drug may affect the ranking of all drugs.

13.
Genomic sequences obtained through high-throughput sequencing are not uniformly distributed across the genome. For example, sequencing data of total genomic DNA show significant, yet unexpected enrichments on promoters and exons. This systematic bias is a particular problem for techniques such as chromatin immunoprecipitation, where the signal for a target factor is plotted across genomic features. We have focused on data obtained from Illumina's Genome Analyser platform, where at least three factors contribute to sequence bias: GC content, mappability of sequencing reads, and regional biases that might be generated by local structure. We show that relying on input control as a normalizer is not generally appropriate due to sample-to-sample variation in bias. To correct sequence bias, we present BEADS (bias elimination algorithm for deep sequencing), a simple three-step normalization scheme that successfully unmasks real binding patterns in ChIP-seq data. We suggest that this procedure be done routinely prior to data interpretation and downstream analyses.

14.
Abstract: Incomplete detection of all individuals leading to negative bias in abundance estimates is a pervasive source of error in aerial surveys of wildlife, and correcting that bias is a critical step in improving surveys. We conducted experiments using duck decoys as surrogates for live ducks to estimate bias associated with surveys of wintering ducks in Mississippi, USA. We found detection of decoy groups was related to wetland cover type (open vs. forested), group size (1–100 decoys), and interaction of these variables. Observers who detected decoy groups reported counts that averaged 78% of the decoys actually present, and this counting bias was not influenced by either covariate cited above. We integrated this sightability model into estimation procedures for our sample surveys with weight adjustments derived from probabilities of group detection (estimated by logistic regression) and count bias. To estimate variances of abundance estimates, we used bootstrap resampling of transects included in aerial surveys and data from the bias-correction experiment. When we implemented bias correction procedures on data from a field survey conducted in January 2004, we found bias-corrected estimates of abundance increased 36–42%, and associated standard errors increased 38–55%, depending on species or group estimated. We deemed our method successful for integrating correction of visibility bias in an existing sample survey design for wintering ducks in Mississippi, and we believe this procedure could be implemented in a variety of sampling problems for other locations and species. (JOURNAL OF WILDLIFE MANAGEMENT 72(3):808–813; 2008)
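The two-part correction described in this abstract (a logistic detection probability for each group, plus a constant counting bias of 78%) can be illustrated with a short sketch. The logistic coefficients below are invented for illustration and are not the fitted values from the study; using the observed count as the group-size covariate is also an assumption of this sketch.

```python
import math

COUNT_BIAS = 0.78  # from the abstract: counts averaged 78% of decoys present

def detection_probability(group_size, forested, coef):
    """Logistic model of group detection; coefficients are hypothetical."""
    b0, b_size, b_forest, b_inter = coef
    eta = (b0 + b_size * group_size + b_forest * forested
           + b_inter * group_size * forested)
    return 1.0 / (1.0 + math.exp(-eta))

def corrected_count(observed, forested, coef):
    """Inflate an observed group count for missed groups and undercounting."""
    p = detection_probability(observed, forested, coef)
    return observed / (p * COUNT_BIAS)

coef = (0.5, 0.02, -1.0, 0.01)  # hypothetical: intercept, size, forest, interaction
open_estimate = corrected_count(39, forested=0, coef=coef)
forest_estimate = corrected_count(39, forested=1, coef=coef)
```

With these made-up coefficients, a group seen in forested cover receives a larger upward adjustment than the same count in open cover, because its detection probability is lower.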

15.
Rao Y, Wu G, Wang Z, Chai X, Nie Q, Zhang X. DNA Research, 2011, 18(6): 499-512
Synonymous codons are used with different frequencies both among species and among genes within the same genome and are controlled by neutral processes (such as mutation and drift) as well as by selection. Up to now, a systematic examination of the codon usage for the chicken genome has not been performed. Here, we carried out a whole genome analysis of the chicken genome by the use of the relative synonymous codon usage (RSCU) method and identified 11 putative optimal codons, all of them ending with uracil (U), which departs significantly from the pattern observed in other eukaryotes. Optimal codons in the chicken genome are most likely the ones corresponding to highly expressed transfer RNAs (tRNAs) or tRNA gene copy numbers in the cell. Codon bias, measured as the frequency of optimal codons (Fop), is negatively correlated with G + C content and recombination rate, but positively correlated with gene expression, protein length, gene length and intron length. The positive correlation between codon bias and protein, gene and intron length is quite different from other multi-cellular organisms, as this trend has been only found in unicellular organisms. Our data show that regional G + C content explains a large proportion of the variance of codon bias in chicken. Stepwise selection model analyses indicate that G + C content of coding sequence is the most important factor for codon bias. It appears that variation in the G + C content of CDSs accounts for over 60% of the variation of codon bias. This study suggests that both mutation bias and selection contribute to codon bias. However, mutation bias is the driving force of the codon usage in the Gallus gallus genome. Our data also provide evidence that the negative correlation between codon bias and recombination rates in G. gallus is determined mostly by recombination-dependent mutational patterns.
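For readers unfamiliar with RSCU, the sketch below computes it for a single in-frame coding sequence under the standard genetic code: each codon's observed count is divided by the count expected if all synonymous codons for that amino acid were used equally. The function name `rscu` and the toy sequence are illustrative assumptions, not material from the study.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds):
    """Relative synonymous codon usage for one in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":                          # stop codons are excluded
            families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue                           # amino acid absent: undefined
        expected = total / len(family)         # count under equal usage
        for c in family:
            values[c] = counts[c] / expected
    return values

example = rscu("GCTGCTGCTGCC")  # toy sequence: alanine codons only
```

An RSCU above 1 marks a codon used more often than expected under equal usage; in the toy sequence, GCT accounts for three of four alanine codons in a four-codon family.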

16.
Biologists often use allometric equations that take the form of power functions (e.g., Y = aM^b, where M stands for mass and a and b are empirically fitted constants). Typically, these allometric equations are fitted by taking the antilog of log-log regressions. Predictions from these allometric equations are biased, and the bias may be appreciable. Methods for making predictions that correct for the bias are available, but they have rarely, if ever, been used by ecological and evolutionary physiologists. Just as physiologists would not use an instrument that was not properly calibrated, they should not use allometric equations to make predictions unless they account for the bias of those predictions. We analyzed 20 interspecific and 10 intraspecific data sets. We compared predictions from standard allometric equations with those from several alternative methods. Our analyses suggest that the bias of predictions from interspecific data sets may be substantial. For the intraspecific data sets we analyzed, the bias was likely to be small. Biologists, including ecological and evolutionary physiologists, should exercise care when using allometric equations to make predictions, particularly given that methods to adjust for bias are easily implemented.
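The back-transformation bias discussed here arises because the antilog of a fitted mean of ln(Y) estimates the geometric mean of Y rather than its arithmetic mean. The sketch below shows two widely used corrections: the lognormal factor exp(s²/2) and Duan's smearing estimator. The abstract does not name the alternative methods the authors compared, so these may or may not be among them, and the data are invented.

```python
import math

def fit_loglog(mass, y):
    """Least-squares fit of ln(y) on ln(mass); returns (ln_a, b, residuals)."""
    lx = [math.log(m) for m in mass]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    b = sum((x - mx) * (v - my) for x, v in zip(lx, ly)) / sxx
    ln_a = my - b * mx
    resid = [v - (ln_a + b * x) for x, v in zip(lx, ly)]
    return ln_a, b, resid

def predict(mass_new, ln_a, b, resid, method="naive"):
    """Predict Y at mass_new, optionally correcting back-transformation bias."""
    naive = math.exp(ln_a + b * math.log(mass_new))
    if method == "naive":
        return naive
    if method == "lognormal":
        s2 = sum(r * r for r in resid) / (len(resid) - 2)  # residual variance
        return naive * math.exp(s2 / 2)
    if method == "smearing":
        return naive * sum(math.exp(r) for r in resid) / len(resid)
    raise ValueError(f"unknown method: {method}")

mass = [1.0, 2.0, 4.0, 8.0, 16.0]   # hypothetical body masses
rate = [1.1, 1.6, 3.1, 4.2, 8.5]    # hypothetical response values
ln_a, b, resid = fit_loglog(mass, rate)
naive = predict(10.0, ln_a, b, resid)
smeared = predict(10.0, ln_a, b, resid, method="smearing")
lognormal = predict(10.0, ln_a, b, resid, method="lognormal")
```

Both corrections multiply the naive prediction by a factor of at least 1, so the uncorrected allometric prediction tends to underestimate the arithmetic mean; the size of the factor grows with the residual scatter on the log scale.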

17.
Phylogeny is deeply pertinent to evolutionary studies. Traits that perform a body function are expected to be strongly influenced by physical "requirements" of the function. We investigated if such traits exhibit phylogenetic signals, and, if so, how phylogenetic noises bias quantification of form-function relationships. A form-function system that is strongly influenced by physics, namely the relationship between eye morphology and visual optics in amniotes, was used. We quantified the correlation between form (i.e., eye morphology) and function (i.e., ocular optics) while varying the level of phylogenetic bias removal through adjusting Pagel's λ. Ocular soft-tissue dimensions exhibited the highest correlation with ocular optics when 1% of phylogenetic bias expected from Brownian motion was removed (i.e., λ = 0.01); the value for hard-tissue data was 8%. A small degree of phylogenetic bias therefore exists in morphology despite the stringent functional constraints. We also devised a phylogenetically informed discriminant analysis and recorded the effects of phylogenetic bias on this method using the same data. Use of proper λ values during phylogenetic bias removal improved misidentification rates in resulting classifications when prior probabilities were assumed to be equal. Even a small degree of phylogenetic bias affected the classification resulting from phylogenetically informed discriminant analysis.
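In its usual formulation, Pagel's λ rescales the off-diagonal entries (shared branch lengths) of the phylogenetic variance-covariance matrix while leaving the diagonal untouched, so that λ = 1 corresponds to Brownian motion on the given tree and λ = 0 to phylogenetically independent tips. A minimal sketch follows; the three-taxon matrix is hypothetical, and restricting λ to [0, 1] is a simplifying assumption of this sketch.

```python
def pagel_lambda_transform(vcv, lam):
    """Scale off-diagonal covariances by lam; leave tip variances unchanged."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("this sketch assumes 0 <= lambda <= 1")
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]

# Hypothetical covariance matrix for three taxa under Brownian motion:
# taxa 0 and 1 share 60% of their path from the root, taxon 2 shares none.
vcv = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
weak_signal = pagel_lambda_transform(vcv, 0.01)
```

The transformed matrix would then replace the Brownian-motion matrix in whatever phylogenetically informed analysis is being run, such as a generalized least squares fit.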

18.
Empirical Bayes models have been shown to be powerful tools for identifying differentially expressed genes from gene expression microarray data. An example is the WAME model, where a global covariance matrix accounts for array-to-array correlations as well as differing variances between arrays. However, the existing method for estimating the covariance matrix is very computationally intensive and the estimator is biased when data contains many regulated genes. In this paper, two new methods for estimating the covariance matrix are proposed. The first method is a direct application of the EM algorithm for fitting the multivariate t-distribution of the WAME model. In the second method, a prior distribution for the log fold-change is added to the WAME model, and a discrete approximation is used for this prior. Both methods are evaluated using simulated and real data. The first method shows equal performance compared to the existing method in terms of bias and variability, but is superior in terms of computer time. For large data sets (>15 arrays), the second method also shows superior computer run time. Moreover, for simulated data with regulated genes the second method greatly reduces the bias. With the proposed methods it is possible to apply the WAME model to large data sets with reasonable computer run times. The second method shows a small bias for simulated data, but appears to have a larger bias for real data with many regulated genes.

19.
MOTIVATION: Microarray experiments are affected by numerous sources of non-biological variation that contribute systematic bias to the resulting data. In a dual-label (two-color) cDNA or long-oligonucleotide microarray, these systematic biases are often manifested as an imbalance of measured fluorescent intensities corresponding to Sample A versus those corresponding to Sample B. Systematic biases also affect between-slide comparisons. Making effective corrections for these systematic biases is a requisite for detecting the underlying biological variation between samples. Effective data normalization is therefore an essential step in the confident identification of biologically relevant differences in gene expression profiles. Several normalization methods for the correction of systematic bias have been described. While many of these methods have addressed intensity-dependent bias, few have addressed both intensity-dependent and spatiality-dependent bias. RESULTS: We present a neural network-based normalization method for correcting the intensity- and spatiality-dependent bias in cDNA microarray datasets. In this normalization method, the dependence of the log-intensity ratio (M) on the average log-intensity (A) as well as on the spatial coordinates (X,Y) of spots is approximated with a feed-forward neural network function. Resistance to outliers is provided by assigning a weight to each spot based on how distant its M value is from the median over the spots whose A values are similar, as well as by using pseudospatial coordinates instead of spot row and column indices. A comparison of the robust neural network method with other published methods demonstrates its potential in reducing both intensity-dependent bias and spatial-dependent bias, which translates to more reliable identification of truly regulated genes.
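The M and A quantities named in this abstract are conventionally computed from the two channel intensities as M = log2(R/G) and A = ½·log2(R·G). The sketch below computes them and applies a global median centering of M. That centering is a deliberately simple stand-in and not the neural-network normalization the abstract describes, which models M as a function of A and spot position; the intensities are invented.

```python
import math
from statistics import median

def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities (all > 0)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a

def median_center(m):
    """Stand-in normalisation: subtract the global median of M."""
    mid = median(m)
    return [v - mid for v in m]

red = [100.0, 200.0, 400.0]    # hypothetical spot intensities, channel 1
green = [100.0, 100.0, 100.0]  # hypothetical spot intensities, channel 2
m, a = ma_values(red, green)
m_centered = median_center(m)
```

Intensity- or position-dependent methods replace the single median with a fitted surface, subtracting a predicted M for each spot's A value and coordinates instead of one global constant.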

20.

Background

One method of identifying cis regulatory differences is to analyze allele-specific expression (ASE) and identify cases of allelic imbalance (AI). RNA-seq is the most common way to measure ASE and a binomial test is often applied to determine statistical significance of AI. This implicitly assumes that there is no bias in estimation of AI. However, bias has been found to result from multiple factors including: genome ambiguity, reference quality, the mapping algorithm, and biases in the sequencing process. Two alternative approaches have been developed to handle bias: adjusting for bias using a statistical model and filtering regions of the genome suspected of harboring bias. Existing statistical models which account for bias rely on information from DNA controls, which can be cost prohibitive for large intraspecific studies. In contrast, data filtering is inexpensive and straightforward, but necessarily involves sacrificing a portion of the data.

Results

Here we propose a flexible Bayesian model for analysis of AI, which accounts for bias and can be implemented without DNA controls. In lieu of DNA controls, this Poisson-Gamma (PG) model uses an estimate of bias from simulations. The proposed model always has a lower type I error rate compared to the binomial test. Consistent with prior studies, bias dramatically affects the type I error rate. All of the tested models are sensitive to misspecification of bias. The closer the estimate of bias is to the true underlying bias, the lower the type I error rate. Correct estimates of bias result in a level alpha test.

Conclusions

To improve the assessment of AI, some forms of systematic error (e.g., map bias) can be identified using simulation. The resulting estimates of bias can be used to correct for bias in the PG model, without data filtering. Other sources of bias (e.g., unidentified variant calls) can be easily captured by DNA controls, but are missed by common filtering approaches. Consequently, as variant identification improves, the need for DNA controls will be reduced. Filtering does not significantly improve performance and is not recommended, as information is sacrificed without a measurable gain. The PG model developed here performs well when bias is known, or slightly misspecified. The model is flexible and can accommodate differences in experimental design and bias estimation.
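To make the role of bias concrete, the sketch below runs an exact two-sided binomial test of allelic imbalance against a null proportion that can be shifted away from 0.5. This is not the Poisson-Gamma model proposed in the abstract; it only illustrates how an assumed bias estimate (for example, one obtained from simulated reads) changes the conclusion of the simple binomial test. The read counts and the bias value 0.6 are hypothetical.

```python
from math import comb

def binom_two_sided_p(x, n, p0=0.5):
    """Exact two-sided binomial p-value for x reference-allele reads out of n.

    Sums the probabilities of all outcomes no more likely than the observed
    one under the null proportion p0.
    """
    pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]
    cutoff = pmf[x] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(v for v in pmf if v <= cutoff))

ref_reads, total_reads = 60, 100                                  # hypothetical counts
p_unadjusted = binom_two_sided_p(ref_reads, total_reads)          # null: no bias
p_adjusted = binom_two_sided_p(ref_reads, total_reads, p0=0.6)    # assumed map bias
```

If reads map preferentially to the reference allele, testing against 0.5 attributes that technical excess to biology; testing against the estimated bias removes it, which matches the abstract's point that the type I error rate depends on how close the bias estimate is to the truth.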

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-920) contains supplementary material, which is available to authorized users.
