Similar literature
 20 similar articles found
1.
Effects of sample size on the performance of species distribution models   (total citations: 8, self-citations: 0, citations by others: 8)
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence–absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30) and this should encourage highly conservative use of predictions based on small sample size and restrict their use to exploratory modelling.
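The AUC used above to score predictions can be read as the probability that a randomly chosen presence site receives a higher predicted score than a randomly chosen absence site. The following is a minimal sketch of that rank-based reading, not the evaluation code of the study; the function name `auc` and the example scores are hypothetical.

```python
def auc(presence_scores, absence_scores):
    """Rank-based AUC: chance that a presence site outscores an absence site.

    Ties between a presence score and an absence score count as half a win.
    Both argument names are illustrative, not taken from the study.
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical model scores at three presence sites and two absence sites.
example = auc([0.9, 0.4, 0.7], [0.5, 0.3])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of presence from absence sites.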

2.
A database of 209 Drosophila introns was extracted from Genbank (release number 64.0) and examined by a number of methods in order to characterize features that might serve as signals for messenger RNA splicing. A tight distribution of sizes was observed: while the smallest introns in the database are 51 nucleotides, more than half are less than 80 nucleotides in length, and most of these have lengths in the range of 59-67 nucleotides. Drosophila splice sites found in large and small introns differ in only minor ways from each other and from those found in vertebrate introns. However, larger introns have greater pyrimidine-richness in the region between 11 and 21 nucleotides upstream of 3' splice sites. The Drosophila branchpoint consensus matrix resembles C T A A T (in which branch formation occurs at the underlined A), and differs from the corresponding mammalian signal in the absence of G at the position immediately preceding the branchpoint. The distribution of occurrences of this sequence suggests a minimum distance between 5' splice sites and branchpoints of about 38 nucleotides, and a minimum distance between 3' splice sites and branchpoints of 15 nucleotides. The methods we have used detect no information in exon sequences other than in the few nucleotides immediately adjacent to the splice sites. However, Drosophila resembles many other species in that there is a discontinuity in A + T content between exons and introns, which are A + T rich.

3.
Effective population size (Ne) is a key parameter of population genetics. However, Ne remains challenging to estimate for natural populations as several factors are likely to bias estimates. These factors include sampling design, sequencing method, and data filtering. One issue inherent to the restriction site-associated DNA sequencing (RADseq) protocol is missing data and SNP selection criteria (e.g., minimum minor allele frequency, number of SNPs). To evaluate the potential impact of SNP selection criteria on Ne estimates (Linkage Disequilibrium method) we used RADseq data for a nonmodel species, the thornback ray. In this data set, the inbreeding coefficient FIS was positively correlated with the amount of missing data, implying data were missing nonrandomly. The precision of Ne estimates decreased with the number of SNPs. Mean Ne estimates (averaged across 50 random data sets with 2000 SNPs) ranged between 237 and 1784. Increasing the percentage of missing data from 25% to 50% increased Ne estimates between 82% and 120%, while increasing the minor allele frequency (MAF) threshold from 0.01 to 0.1 decreased estimates between 71% and 75%. Considering these effects is important when interpreting RADseq data-derived estimates of effective population size in empirical studies.

4.
MOTIVATION: Most sequence comparison methods assume that the data being compared are trustworthy, but this is not the case with raw DNA sequences obtained from automatic sequencing machines. Nevertheless, sequence comparisons need to be done on them in order to remove vector splice sites and contaminants. This step is necessary before other genomic data processing stages can be carried out, such as fragment assembly or EST clustering. A specialized tool is therefore needed to solve this apparent dilemma. RESULTS: We have designed and implemented a program that specifically addresses the problem. This program, called LUCY, has been in use since 1998 at The Institute for Genomic Research (TIGR). During this period, many rounds of experience-driven modifications were made to LUCY to improve its accuracy and its ability to deal with extremely difficult input cases. We believe we have finally obtained a useful program which strikes a delicate balance among the many issues involved in the raw sequence cleaning problem, and we wish to share it with the research community. AVAILABILITY: LUCY is available directly from TIGR (http://www.tigr.org/softlab). Academic users can download LUCY after accepting a free academic use license. Business users may need to pay a license fee to use LUCY for commercial purposes. CONTACT: Questions regarding the quality assessment module of LUCY should be directed to Michael Holmes (mholmes@tigr.org). Questions regarding other aspects of LUCY should be directed to Hui-Hsien Chou (hhchou@iastate.edu).

5.
If the origins of fragments are known in genome sequencing projects, it is straightforward to reconstruct diploid consensus sequences. In reality, however, this is not true. Although there are proposed methods to reconstruct haplotypes from genome sequencing projects, an accuracy assessment is required to evaluate the confidence of the estimated diploid consensus sequences. In this paper, we define the confidence score of diploid consensus sequences. It requires the calculation of the likelihood of an assembly. To calculate the likelihood, we propose a linear time algorithm with respect to the number of polymorphic sites. The likelihood calculation and confidence score are used for further improvements of haplotype estimation in two directions. One direction is that low-scored phases are disconnected. The other direction is that, instead of using nominal frequency 1/2, the haplotype frequency is estimated to reflect the actual contribution of each haplotype. Our method was evaluated on the simulated data whose polymorphism rate (1.2 percent) was based on Ciona intestinalis. As a result, the high accuracy of our algorithm was indicated: The true positive rate of the haplotype estimation was greater than 97 percent.

6.
The influence of egg size and composition on the size, quality and survival of lapwing chicks was examined on two farmland study sites in the Midland Valley of Scotland. Eggs comprised 33.1% yolk, 61.3% albumen and 5.6% shell. Whereas the yolk and shell proportions decreased with increasing egg size, the albumen proportion increased. Most variation in egg size was attributable to differences between females but was also influenced by clutch number (eggs in replacement clutches on the rough grazing, but not the arable, site were smaller), clutch size (eggs were smaller in smaller clutches), maternal body condition (females in good condition produced larger eggs) and habitat (since females on the arable site fed more successfully, they were in better condition and laid larger eggs). Chick size, weight and survival were all influenced by egg size. The incubation period varied between 21 and 28 days (mean = 25.2) and was shorter in clutches laid later in the season.

7.

Background  

Little overlap between independently developed gene signatures and poor inter-study applicability of gene signatures are two of the major concerns raised in the development of microarray-based prognostic gene signatures. One recent study suggested that thousands of samples are needed to generate a robust prognostic gene signature.

8.
9.
Interpreting consensus sequences based on plurality rule.   (total citations: 1, self-citations: 0, citations by others: 1)
Our goal is to help researchers interpret the results of a function, based on the concept of plurality rule, that calculates a consensus of a profile of molecular bases. By expressing the plurality rule function as a composition of simpler functions, we obtain both an algorithm to calculate the consensus result and an upper bound on the number of nonequivalent results. Consequently, when used to analyze molecular sequences such as DNA or RNA, the plurality rule function yields at most 48 nonequivalent consensus results. For problems of reasonable size, we describe an algorithm to calculate the probability that each consensus result would occur if the bases were equally likely to appear at every position of the plurality rule function's input profile.
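The abstract does not spell out its plurality rule function, so the following is only a sketch under a common reading: at each aligned position, the consensus is the set of bases that occur most often, with ties retained. The names `plurality_column`, `plurality_consensus` and `result_probabilities` are hypothetical, and the brute-force enumeration is a stand-in for, not a reconstruction of, the probability algorithm the authors describe. It does not attempt to reproduce their notion of nonequivalent results or the bound of 48.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def plurality_column(bases):
    """Return the set of most frequent bases in one profile column (ties kept)."""
    counts = Counter(bases)
    top = max(counts.values())
    return frozenset(b for b, c in counts.items() if c == top)


def plurality_consensus(sequences):
    """Apply the plurality rule column by column to equal-length sequences."""
    return [plurality_column(column) for column in zip(*sequences)]


def result_probabilities(k, alphabet="ACGT"):
    """Exact probability of each column result for k equally likely bases.

    Enumerates all len(alphabet)**k columns, so it is only practical for small k.
    """
    tally = Counter(plurality_column(col) for col in product(alphabet, repeat=k))
    total = len(alphabet) ** k
    return {result: Fraction(n, total) for result, n in tally.items()}


consensus = plurality_consensus(["ACGT", "ACGA", "TCGA"])
```

With three sequences, a column such as A, A, T yields the single base A, while a column with two bases and a tie yields both.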

10.
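Zhan Yueping, Zhou Min, He Zhang, Chen Zhongzheng, Duan Bisheng, Hu Haoyuan, Xiao Hui. Acta Ecologica Sinica (《生态学报》), 2013, 33(11): 3318-3323
The host-size model holds that the offspring sex ratio of parasitoid wasps is related to host size: wasps tend to produce more female offspring on large hosts and more male offspring on small hosts. We examined changes in the offspring production and sex ratio of the parasitoid wasp Spalangia endius (蝇蛹佣小蜂) with housefly pupae as hosts and, under single parasitism, the effects of host size and parasitism order on offspring sex ratio and related traits. The results showed that the oviposition period of the wasp was 8.93 ± 3.34 d, and that a single female produced 34.11 ± 16.34 female offspring and 11.04 ± 8.87 male offspring, with a male proportion of 0.24 ± 0.11. The proportion of male offspring increased significantly with the age of the adult wasp. When parasitizing housefly pupae, the wasp preferentially chose larger pupae; under single parasitism, it tended to produce more female offspring in larger housefly pupae.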

11.
Stochastic population theory makes clear predictions about the effects of reproductive potential and carrying capacity on characteristic time-scales of extinction. At the same time, the effects of habitat size and quality on reproduction and regulation have been hotly debated. To trace the causal relationships among these factors, we looked at the effects of habitat size and quality on extinction time in experimental populations of Daphnia magna. Replicate model systems representative of a broad-spectrum consumer foraging on a continuously supplied resource were established under crossed treatments of habitat size (two levels) and habitat quality (three levels) and monitored until eventual extinction of all populations. Using statistically derived estimates of key parameters, we related experimental treatments to persistence time through their effect on carrying capacity and the population growth rate. We found that carrying capacity and the intrinsic rate of increase were each influenced similarly by habitat size and quality, and that carrying capacity and the intrinsic rate of increase were in turn both correlated with time to population extinction. We expected habitat quality to have a greater influence on extinction. However, owing to an unexpected effect of habitat size on reproductive potential, habitat size and quality were similarly important for population persistence. These results support the idea that improving the population growth rate or carrying capacity will reduce extinction risk and demonstrate that both are possible by improving habitat quality or increasing habitat size.

12.
13.
14.
This study considers the effects of sample size on estimates of three parasitological indices (prevalence, mean abundance and mean intensity) in four different host–parasite systems, each showing a different pattern of infection. Monte Carlo simulation procedures were used in order to obtain an estimation of the parasitological indices, as well as their variance and bias, based on samples of different size. Although results showed that mean values of all indices were similar irrespective of sample size, estimates of prevalence were not significantly affected by sample size whereas mean abundance and mean intensity were affected in at least one sample. Underestimation of values was more perceptible in small (<40) sample sizes. Distribution of the estimated values revealed a different arrangement according to the host–parasite system and to the parasitological parameter. Monte Carlo simulation procedures are, therefore, suggested to be included in studies concerning estimation of parasitological parameters.
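The kind of Monte Carlo procedure described above can be sketched as repeated subsampling of hosts from a known set of parasite counts. This is an illustration under stated assumptions, not the authors' procedure: the function names, the sampling without replacement, the replicate count and the example counts are all hypothetical. The three indices follow their usual definitions: prevalence is the fraction of hosts infected, mean abundance is parasites per host over all hosts, and mean intensity is parasites per infected host.

```python
import math
import random
from statistics import mean


def indices(counts):
    """Return (prevalence, mean abundance, mean intensity) for per-host counts."""
    infected = [c for c in counts if c > 0]
    prevalence = len(infected) / len(counts)
    mean_abundance = mean(counts)
    # Mean intensity is undefined when no sampled host is infected.
    mean_intensity = mean(infected) if infected else math.nan
    return prevalence, mean_abundance, mean_intensity


def monte_carlo(population, sample_size, replicates=1000, seed=0):
    """Average each index over random subsamples drawn without replacement."""
    rng = random.Random(seed)
    draws = [indices(rng.sample(population, sample_size)) for _ in range(replicates)]
    prevalences = [d[0] for d in draws]
    abundances = [d[1] for d in draws]
    # Skip replicates in which mean intensity was undefined.
    intensities = [d[2] for d in draws if not math.isnan(d[2])]
    return (
        mean(prevalences),
        mean(abundances),
        mean(intensities) if intensities else math.nan,
    )


# Hypothetical parasite counts for four hosts.
hosts = [0, 0, 2, 4]
full_sample = monte_carlo(hosts, sample_size=4, replicates=5)
```

Comparing the output for small `sample_size` values against `indices(population)` gives the bias of each index at that sample size under these assumptions.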

15.
We carried out in vitro selection experiments to systematically probe the effects of TATA-box flanking sequences on its interaction with the TATA-box binding protein (TBP). This study validates our previous hypothesis that the effect of the flanking sequences on TBP/TATA-box interactions is much more significant when the TATA box has a context-dependent DNA structure. Several interesting observations, with implications for protein–DNA interactions in general, came out of this study. (i) Selected sequences are selection-method specific and TATA-box dependent. (ii) The variability in binding stability as a function of the flanking sequences for (T-A)4 boxes is as large as the variability in binding stability as a function of the core TATA box itself. Thus, for (T-A)4 boxes the flanking sequences completely dominate and determine the binding interaction. (iii) Binding stabilities of all but one of the individual selected sequences of the (T-A)4 form are significantly higher than that of their mononucleotide-based consensus sequence. (iv) Even though the (T-A)4 sequence is symmetric, the flanking sequence pattern is asymmetric. We propose that the plasticity of (T-A)n sequences increases the number of conformationally distinct TATA boxes without the need to extend the TBP contact region beyond the eight-base-pair long TATA box.

16.
Fiske IJ, Bruna EM, Bolker BM. PLoS ONE, 2008, 3(8): e3080

Background

Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (λ) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of λ: Jensen's inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of λ due to increased sampling variance. We investigated whether sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of λ.

Methodology/Principal Findings

Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating λ for increasingly larger populations drawn from a total population of 3842 plants. We then compared these estimates of λ with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models used to study plant demography.

Conclusions/Significance

We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more-realistic inverse J-shaped population structure exacerbated this bias. However, our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that are unexplored. We suggest more intensive sampling of populations when individual survival is low and greater sampling of stages with high elasticities.
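The Jensen's inequality argument in this abstract can be illustrated with a toy model that is much simpler than the study's plant matrices. The sketch below assumes a hypothetical two-stage matrix [[0, f], [s_j, s_a]], whose dominant eigenvalue has the closed form λ = (s_a + sqrt(s_a² + 4·f·s_j)) / 2, and assumes that only juvenile survival s_j is estimated, as the fraction of n sampled juveniles that survive. All names and parameter values are illustrative. In this toy model λ is concave in s_j, so an unbiased survival estimate gives a downward-biased λ; the direction and size of the bias in real matrices may differ.

```python
from math import comb, sqrt


def growth_rate(juvenile_survival, adult_survival=0.5, fecundity=2.0):
    """Dominant eigenvalue of the 2x2 matrix [[0, f], [s_j, s_a]]."""
    return (adult_survival + sqrt(adult_survival**2 + 4 * fecundity * juvenile_survival)) / 2


def expected_estimate(true_survival, n):
    """Exact expectation of the estimated growth rate over binomial samples of size n."""
    total = 0.0
    for k in range(n + 1):
        # Probability that k of the n sampled juveniles survive.
        prob = comb(n, k) * true_survival**k * (1 - true_survival) ** (n - k)
        total += prob * growth_rate(k / n)
    return total


def bias(true_survival, n):
    """Expected estimate of the growth rate minus its true value."""
    return expected_estimate(true_survival, n) - growth_rate(true_survival)


bias_small = bias(0.5, 5)
bias_large = bias(0.5, 50)
```

Because the expectation is computed exactly from the binomial distribution rather than by random draws, the comparison between `bias_small` and `bias_large` is deterministic, and the shrinking bias at larger n mirrors the qualitative pattern the abstract reports.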

17.
Measures of geographic range size: the effects of sample size   (total citations: 2, self-citations: 0, citations by others: 2)
A number of methods have been used for quantifying the sizes of the geographic ranges of species. The consequences of different levels of sampling (the proportion of actual spatial occurrences) are explored for eight of these, using data on the occurrences of butterfly species on a 10 × 10 km grid across Britain. For all methods, the percentage error of estimation (PEE) decreases with the number of 10 × 10 km squares which a species occupies, most rapidly for extent measures, and more rapidly for area measures than for measures of numbers of units occupied. The rate of decline in PEE itself falls as sampling effort increases. At a given sampling level, rank correlations between range sizes measured by different methods are generally high, but there is no consistent change in the magnitude of these correlations as the level of sampling increases. The composition of the set of species with the smallest range sizes changes with the level of sampling.

18.
N-Glycosylation is a cotranslational and post-translational process of proteins that may influence protein folding, maturation, stability, trafficking, and consequently cell surface expression of functional channels. Here we have characterized two consensus N-glycosylation sequences of a voltage-gated K+ channel (Kv3.1). Glycosylation of Kv3.1 protein from rat brain and infected Sf9 cells was demonstrated by an electrophoretic mobility shift assay. Digestion of total brain membranes with peptide N-glycosidase F (PNGase F) produced a much faster-migrating Kv3.1 immunoband than that of undigested brain membranes. To demonstrate N-glycosylation of wild-type Kv3.1 in Sf9 cells, cells were treated with tunicamycin. Also, partially purified proteins were digested with either PNGase F or endoglycosidase H. Attachment of simple-type oligosaccharides at positions 220 and 229 was directly shown by single (N229Q and N220Q) and double (N220Q/N229Q) Kv3.1 mutants. Functional measurements and membrane fractionation of infected Sf9 cells showed that unglycosylated Kv3.1s were transported to the plasma membrane. Unitary conductance of N220Q/N229Q was similar to that of the wild-type Kv3.1. However, whole cell currents of N220Q/N229Q channels had slower activation rates, and a slight positive shift in voltage dependence compared to wild-type Kv3.1. The voltage dependence of channel activation for N229Q and N220Q was much like that for N220Q/N229Q. These results demonstrate that the S1-S2 linker is topologically extracellular, and that N-glycosylation influences the opening of the voltage-dependent gate of Kv3.1. We suggest that occupancy of the sites is critical for folding and maturation of the functional Kv3.1 at the cell surface.

19.
Although the biochemistry of early trimming reactions by glucosidases and ER mannosidases occurring on asparagine-linked oligosaccharides has been known for a long time, their involvement in quality control of protein folding has become apparent only more recently. Here we review the evidence for the involvement of specific oligosaccharide trimming intermediates such as Glc(1)Man(9)GlcNAc(2) and Man(8)GlcNAc(2) B isomer in this fundamental cellular process and the subcellular distribution of components of the protein quality control machinery which indicates the involvement of both the ER and pre-Golgi intermediates in this process. In addition, recent studies on the subcellular distribution of endomannosidase in conjunction with previously obtained biochemical data will be reviewed which demonstrate that an alternative deglucosylation pathway exists in pre-Golgi intermediates and the Golgi apparatus.

20.
Parasite prevalence and host sample size   (total citations: 3, self-citations: 0, citations by others: 3)
Parasite prevalence is a summary statistic familiar to biologists. However, that there is an interspecific relationship between prevalence and sample size (the number of host individuals examined for parasites) is not widely appreciated. In this article, Richard Gregory and Tim Blackburn present some examples of this negative relationship, explain the mechanisms that underlie this pattern and discuss the potential problems this association might create for biological studies.
