1.

Stop codons in bacteria are not selectively equivalent

IS Povolotskaya FA Kondrashov A Ledda PK Vlasov 《Biology direct》2012,7(1):30

ABSTRACT: BACKGROUND: The evolution and genomic frequencies of stop codons have not been rigorously studied, except in the context of coding for non-canonical amino acids. Here we study the rate of evolution and frequency distribution of stop codons in bacterial genomes. RESULTS: We show that in bacteria stop codons evolve slower than synonymous sites, suggesting the action of weak negative selection. However, the frequency of stop codons relative to genomic nucleotide content indicated that this selection regime is not straightforward. The frequency of TAA and TGA stop codons is GC-content dependent, with TAA decreasing and TGA increasing with GC-content, while TAG frequency is independent of GC-content. Applying a formal, analytical model to these data we found that the relationship between stop codon frequencies and nucleotide content cannot be explained by mutational biases or selection on nucleotide content. However, with weak nucleotide content-dependent selection on TAG, -0.5 < Nes < 1.5, the model fits all of the data and recapitulates the relationship between TAG and nucleotide content. For biologically plausible rates of mutations we show that, in bacteria, the TAG stop codon is universally associated with lower fitness, with TAA being optimal for G-content < 16% while for G-content > 16% TGA has a higher fitness than TAG. CONCLUSIONS: Our data indicate that the TAG codon is universally suboptimal in the bacterial lineage, such that TAA is likely to be the preferred stop codon for low GC content while TGA is the preferred stop codon for high GC content. The optimization of stop codon usage may therefore be useful in genome engineering or gene expression optimization applications. REVIEWERS: This article was reviewed by Michail Gelfand, Arcady Mushegian and Shamil Sunyaev. For the full reviews, please go to the Reviewers' Comments section.
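
The two quantities this abstract relates, stop codon frequencies and GC-content, can be tabulated from annotated coding sequences. The following is a minimal sketch, not the authors' pipeline: the function names and the toy sequences are hypothetical, and it assumes each coding sequence is given 5' to 3' and ends with its stop codon.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")


def gc_content(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def stop_codon_frequencies(coding_sequences):
    """Relative frequency of each stop codon among the terminal codons."""
    counts = Counter()
    for cds in coding_sequences:
        last_codon = cds.upper()[-3:]
        if last_codon in STOP_CODONS:  # skip sequences without a terminal stop
            counts[last_codon] += 1
    total = sum(counts.values())
    return {stop: (counts[stop] / total if total else 0.0) for stop in STOP_CODONS}


# Hypothetical toy genes, for illustration only.
toy_genes = ["ATGGCCTAA", "ATGAAATGA", "ATGCCCGGGTAA", "ATGTTTTAG"]
freqs = stop_codon_frequencies(toy_genes)
genome_gc = gc_content("".join(toy_genes))
print(freqs, round(genome_gc, 3))
```

Computing these two values per genome and plotting one against the other would give the kind of frequency-versus-GC-content relationship the abstract describes; the selection model itself is not sketched here.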

2.

Estimate of nucleotide diversity in dogs with a pool-and-sequence method

James A. Brouillette Jennifer R. Andrew Patrick J. Venta 《Mammalian genome》2000,11(12):1079-1086

Nucleotide diversity (π), the average number of base differences per site for two homologous sequences randomly selected from a population, is an important parameter used to understand the structure and history of populations. It is also important for determining the feasibility of developing a genetic map for a species from single nucleotide polymorphisms (SNPs). Nucleotide diversity has never been estimated for dogs. Segments of twelve canine genes from ten diverse dog breeds were examined for nucleotide variation by using a pool-and-sequence method. We identified three SNPs in the coding regions (2501 bp) and 11 SNPs in the introns (2953 bp). Each of these putative SNPs was tested by restriction enzyme analysis, and all were verified. Six additional SNPs were identified in a single SINE contained in one gene. Using these data, canine sequence diversity across breeds was estimated to be 0.001 and 0.0004 in intronic and coding regions, respectively, with SNPs spaced every 400 bp on average. Discovery of useful SNPs in 7 of the 12 genes suggests that construction of a canine SNP-based map can be accomplished with current technology. Thirteen polymorphic SNPs were also found in 5847 bp in the cat, horse, ox, and pig, by using four of the same genes from which canine nucleotide diversity was estimated. These results suggest that these species may have similar amounts of nucleotide diversity. Received: 1 February 2000 / Accepted: 22 August 2000
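
The definition of π given above (average number of base differences per site between two randomly selected sequences) translates directly into a pairwise computation. This is a minimal sketch for a small set of aligned, gap-free sequences of equal length; it is not the pool-and-sequence estimator used in the study, and the function name and sample data are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site over all pairs of aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    # Count mismatching sites for every unordered pair of sequences.
    pair_diffs = [
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    ]
    return sum(pair_diffs) / (len(pair_diffs) * length)


# Hypothetical aligned sample of three sequences, 10 sites each.
sample = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(sample)
print(pi)
```

For the toy sample the three pairs differ at 1, 1 and 2 sites, so π is 4 differences over 3 pairs times 10 sites.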

3.

DNA Polymorphism Detectable by Restriction Endonucleases (total citations: 67; self-citations: 15; citations by others: 67)

Masatoshi Nei Fumio Tajima 《Genetics》1981,97(1):145-163

Data on DNA polymorphisms detected by restriction endonucleases are rapidly accumulating. With the aim of analyzing these data, several different measures of nucleon (DNA segment) diversity within and between populations are proposed, and statistical methods for estimating these quantities are developed. These statistical methods are applicable to both nuclear and nonnuclear DNAs. When evolutionary change of nucleons occurs mainly by mutation and genetic drift, all the measures can be expressed in terms of the product of mutation rate per nucleon and effective population size. A method for estimating nucleotide diversity from nucleon diversity is also presented under certain assumptions. It is shown that DNA divergence between two populations can be studied either by the average number of restriction site differences or by the average number of nucleotide differences. In either case, a large number of different restriction enzymes should be used for studying phylogenetic relationships among related organisms, since the effect of stochastic factors on these quantities is very large. The statistical methods developed have been applied to data of Shah and Langley on mitochondrial (mt)DNA from Drosophila melanogaster, simulans and virilis. This application has suggested that the evolutionary change of mtDNA in higher animals occurs mainly by nucleotide substitution rather than by deletion and insertion. The evolutionary distances among the three species have also been estimated.

4.

Context-dependent mutation rates may cause spurious signatures of a fixation bias favoring higher GC-content in humans (total citations: 1; self-citations: 0; citations by others: 1)

Hernandez RD Williamson SH Zhu L Bustamante CD 《Molecular biology and evolution》2007,24(10):2196-2202

Understanding the proximate and ultimate causes underlying the evolution of nucleotide composition in mammalian genomes is of fundamental interest to the study of molecular evolution. Comparative genomics studies have revealed that many more substitutions occur from G and C nucleotides to A and T nucleotides than the reverse, suggesting that mammalian genomes are not at equilibrium for base composition. Analysis of human polymorphism data suggests that mutations that increase GC-content tend to be at much higher frequencies than those that decrease or preserve GC-content when the ancestral allele is inferred via parsimony using the chimpanzee genome. These observations have been interpreted as evidence for a fixation bias in favor of G and C alleles due to either positive natural selection or biased gene conversion. Here, we test the robustness of this interpretation to violations of the parsimony assumption using a data set of 21,488 noncoding single nucleotide polymorphisms (SNPs) discovered by the National Institute of Environmental Health Sciences (NIEHS) SNPs project via direct resequencing of n = 95 individuals. Applying standard nonparametric and parametric population genetic approaches, we replicate the signatures of a fixation bias in favor of G and C alleles when the ancestral base is assumed to be the base found in the chimpanzee outgroup. However, upon taking into account the probability of misidentifying the ancestral state of each SNP using a context-dependent mutation model, the corrected distribution of SNP frequencies for GC-content increasing SNPs is nearly indistinguishable from the patterns observed for other types of mutations, suggesting that the signature of fixation bias is a spurious artifact of the parsimony assumption.

5.

Low Nucleotide Diversity in Man (total citations: 49; self-citations: 0; citations by others: 49)

W. H. Li L. A. Sadler 《Genetics》1991,129(2):513-523

The nucleotide diversity (pi) in humans is studied by using published cDNA and genomic sequences that have been carefully checked for sequencing accuracy. This measure of genetic variability is defined as the number of nucleotide differences per site between two randomly chosen sequences from a population. A total of more than 75,000 base pairs from 49 loci are compared. The DNA regions studied are the 5' and 3' untranslated regions and the amino acid coding regions. The coding regions are divided into nondegenerate sites (i.e., sites at which all possible changes are nonsynonymous), twofold degenerate sites (i.e., sites at each of which one of the three possible changes is synonymous) and fourfold degenerate sites (i.e., sites at which all three possible changes are synonymous). The pi values estimated are, respectively, 0.03 and 0.04% for the 5' and 3' UT regions, and 0.03, 0.06 and 0.11% for nondegenerate, twofold degenerate and fourfold degenerate sites. Since the highest pi value is only 0.11%, which is about one order of magnitude lower than those in Drosophila populations, the nucleotide diversity in humans is very low. The low diversity is probably due to a relatively small long-term effective population size rather than any severe bottleneck during human evolution.
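
The site classes defined in this abstract can be assigned mechanically: for a given codon position, count how many of the three possible base changes leave the encoded amino acid unchanged. The sketch below assumes the standard genetic code and uses hypothetical function names. The "threefold" label for two synonymous changes (the third position of isoleucine codons) is a conventional addition that the abstract does not mention.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_changes(codon, position):
    """Number of single-base changes at position (0, 1 or 2) that keep the amino acid."""
    codon = codon.upper()
    original = CODON_TABLE[codon]
    count = 0
    for base in BASES:
        if base == codon[position]:
            continue
        mutant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[mutant] == original:
            count += 1
    return count


def site_class(codon, position):
    """Label a codon position by how many of its three changes are synonymous."""
    labels = {0: "nondegenerate", 1: "twofold", 2: "threefold", 3: "fourfold"}
    return labels[synonymous_changes(codon, position)]


print(site_class("GGT", 2), site_class("TTT", 2), site_class("ATG", 2))
```

Diversity could then be computed separately over the sites in each class, which is the kind of partition the abstract reports.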

6.

Low diversity and biased substitution patterns in the mitochondrial DNA control region of sperm whales: implications for estimates of time since common ancestry

Lyrholm T; Leimar O; Gyllensten U 《Molecular biology and evolution》1996,13(10):1318-1326

The mitochondrial DNA (mtDNA) control region was sequenced in 37 sperm whales from a large part of the global range of the species. Nucleotide diversity was several-fold lower than that reported for control regions of abundant and outbred mammals, but similar to that for populations known to have experienced bottlenecks. Relative rate tests did not suggest that the low diversity is due to a lower substitution rate in sperm whale mtDNA. Rather, it is more likely that demographic factors have reduced diversity. The pattern of nucleotide substitutions was examined by cladistic methods, facilitated by the apparent monophyly of lineages from the Southern Hemisphere, as defined by a single base pair deletion. Substitutions were nonrandom in nature, confined to a few "hot spots," and parallel substitutions constituted a majority of the inferred changes. The substitution pattern fitted a negative binomial distribution better than a Poisson distribution, and the bias in number of substitutions among sites was considerably higher than previously reported for the mtDNA control region of any species. A novel method of estimating time since common ancestry was developed, which utilizes the transition/transversion ratio R and the number of substitutions inferred from a parsimony analysis. Using this method, we estimated the age of sperm whale mtDNA diversity to be about 6,000-25,000 years, and when the uncertainty of R was accounted for, a range of about 1,000-100,000 years was obtained.

7.

Genomic distribution and estimation of nucleotide diversity in natural populations: perspectives from the collared flycatcher (Ficedula albicollis) genome

Ludovic Dutoit Reto Burri Alexander Nater Carina F. Mugal Hans Ellegren 《Molecular ecology resources》2017,17(4):586-597

Properly estimating genetic diversity in populations of nonmodel species requires a basic understanding of how diversity is distributed across the genome and among individuals. To this end, we analysed whole‐genome resequencing data from 20 collared flycatchers (genome size ≈1.1 Gb; 10.13 million single nucleotide polymorphisms detected). Genomewide nucleotide diversity was almost identical among individuals (mean = 0.00394, range = 0.00384–0.00401), but diversity levels varied extensively across the genome (95% confidence interval for 200‐kb windows = 0.0013–0.0053). Diversity was related to selective constraint such that in comparison with intergenic DNA, diversity at fourfold degenerate sites was reduced to 85%, 3′ UTRs to 82%, 5′ UTRs to 70% and nondegenerate sites to 12%. There was a strong positive correlation between diversity and chromosome size, probably driven by a higher density of targets for selection on smaller chromosomes increasing the diversity‐reducing effect of linked selection. Simulations exploring the ability of sequence data from a small number of genetic markers to capture the observed diversity clearly demonstrated that diversity estimation from finite sampling of such data is bound to be associated with large confidence intervals. Nevertheless, we show that precision in diversity estimation in large outbred populations benefits from increasing the number of loci rather than the number of individuals. Simulations mimicking RAD sequencing showed that this approach gives accurate estimates of genomewide diversity. Based on the patterns of observed diversity and the performed simulations, we provide broad recommendations for how genetic diversity should be estimated in natural populations.

8.

Nucleotide diversity in gorillas (total citations: 9; self-citations: 0; citations by others: 9)

Yu N Jensen-Seaman MI Chemnick L Ryder O Li WH 《Genetics》2004,166(3):1375-1383

Comparison of the levels of nucleotide diversity in humans and apes may provide valuable information for inferring the demographic history of these species, the effect of social structure on genetic diversity, patterns of past migration, and signatures of past selection events. Previous DNA sequence data from both the mitochondrial and the nuclear genomes suggested a much higher level of nucleotide diversity in the African apes than in humans. Noting that the nuclear DNA data from the apes were very limited, we previously conducted a DNA polymorphism study in humans and another in chimpanzees and bonobos, using 50 DNA segments randomly chosen from the noncoding, nonrepetitive parts of the human genome. The data revealed that the nucleotide diversity (pi) in bonobos (0.077%) is actually lower than that in humans (0.087%) and that pi in chimpanzees (0.134%) is only 50% higher than that in humans. In the present study we sequenced the same 50 segments in 15 western lowland gorillas and estimated pi to be 0.158%. This is the highest value among the African apes but is only about two times higher than that in humans. Interestingly, available mtDNA sequence data also suggest a twofold higher nucleotide diversity in gorillas than in humans, but suggest a threefold higher nucleotide diversity in chimpanzees than in humans. The higher mtDNA diversity in chimpanzees might be due to the unique pattern in the evolution of chimpanzee mtDNA. From the nuclear DNA pi values, we estimated that the long-term effective population sizes of humans, bonobos, chimpanzees, and gorillas are, respectively, 10,400, 12,300, 21,300, and 25,200.
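
One common way to relate π to long-term effective population size is the neutral expectation π = 4·Ne·μ for diploid autosomal loci, where μ is the mutation rate per site per generation. The abstract does not state which mutation rates or generation times were used, so the rate below is a purely illustrative assumption and the result is not expected to match the figures quoted above. The function name is hypothetical.

```python
def effective_population_size(pi, mutation_rate):
    """Solve the neutral expectation pi = 4 * Ne * mu for Ne (diploid, autosomal)."""
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")
    return pi / (4.0 * mutation_rate)


pi_gorilla = 0.00158   # 0.158%, the gorilla value quoted in the abstract
assumed_mu = 2.0e-8    # ASSUMPTION: per-site, per-generation rate, illustration only
ne = effective_population_size(pi_gorilla, assumed_mu)
print(round(ne))
```

Because Ne scales inversely with the assumed μ, the estimate is only as reliable as the mutation rate supplied for each species.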

9.

Global patterns in peatmoss biodiversity

Shaw AJ Cox CJ Boles SB 《Molecular ecology》2003,12(10):2553-2570

DNA sequence data from the nuclear ribosomal internal transcribed spacers (ITS) and the trnL-trnF chloroplast DNA regions were used to quantify geographical partitioning of global biodiversity in peatmosses (Sphagnum), and to compare patterns of molecular diversity with patterns of species richness. Molecular diversity was estimated for boreal, tropical, Neotropical, nonboreal (tropical plus Southern Hemisphere), Old World and New World partitions, based on a total of 436 accessions. Diversity was partitioned among geographical regions in terms of combined nuclear and chloroplast sequence data and separately for the ITS and trnL-trnF data sets. Levels of variation were estimated using phylogenetic diversity (PD), which incorporates branch lengths from a phylogenetic tree, and the number of polymorphic nucleotide sites. Estimates of species richness suggest that peatmoss diversity is higher in New World than Old World regions, and that the Neotropics constitute a "hotspot" of diversity. Molecular estimates, in contrast, indicate that peatmoss biodiversity is almost evenly divided between New and Old World regions, and that the Neotropics account for only 20-35% of global peatmoss diversity. In general, levels of tropical and boreal peatmoss molecular diversity were comparable. Two species, S. sericeum from the Old World tropics and S. lapazense from Bolivia, are remarkably divergent in nucleotide sequences from all other Sphagna and together account for almost 20% of all peatmoss diversity, although they are represented by only three of the 436 accessions (0.7%). These species clearly demonstrate the nonequivalence of species biodiversity value.

10.

Chromatid structure: relationship between DNA content and nucleotide sequence diversity (total citations: 15; self-citations: 1; citations by others: 14)

Charles D. Laird 《Chromosoma》1971,32(4):378-406

Models of chromatid structure are based on inferences made from genetic, cytological, and cytochemical observations. An alternative approach can provide limits as to the number of identical subunits present in chromatids. This method is based on the demonstration that nucleotide sequence diversity may be estimated from the kinetics of renaturation of denatured DNA. Measurements of DNA content and renaturation rate constants are given for several eukaryotic DNAs. Control experiments involved measurements of renaturation kinetics of DNAs from bacteria and bacteriophage. These estimates show that most of the nucleotide sequences in mouse, Drosophila, and Ciona DNA are present only once per sperm. Since the reduction of DNA content during meiosis indicates that mouse sperm contain a haploid set of chromatids, it follows that a set of mouse meiotic chromatids contains a single copy of most sequences. Models of chromatid structure which postulate multiple subunits with identical nucleotide sequences are therefore not tenable for mouse meiotic chromatids. This method of analyzing nucleotide sequence diversity may be of general use in designing models of chromatid structure in other organisms.

11.

Time scale for cyclostome evolution inferred with a phylogenetic diagnosis of hagfish and lamprey cDNA sequences

Kuraku S Kuratani S 《Zoological science》2006,23(12):1053-1064

The Cyclostomata consists of the two orders Myxiniformes (hagfishes) and Petromyzoniformes (lampreys), and its monophyly has been unequivocally supported by recent molecular phylogenetic studies. Under this updated vertebrate phylogeny, we performed in silico evolutionary analyses using currently available cDNA sequences of cyclostomes. We first calculated the GC-content at four-fold degenerate sites (GC(4)), which revealed that an extremely high GC-content is shared by all the lamprey species we surveyed, whereas no striking pattern in GC-content was observed in any of the hagfish species surveyed. We then estimated the timing of diversification in cyclostome evolution using nucleotide and amino acid sequences. We obtained divergence times of 470-390 million years ago (Mya) in the Ordovician-Silurian-Devonian Periods for the interordinal split between Myxiniformes and Petromyzoniformes; 90-60 Mya in the Cretaceous-Tertiary Periods for the split between the two hagfish subfamilies, Myxininae and Eptatretinae; 280-220 Mya in the Permian-Triassic Periods for the split between the two lamprey subfamilies, Geotriinae and Petromyzoninae; and 30-10 Mya in the Tertiary Period for the split between the two lamprey genera, Petromyzon and Lethenteron. This evolutionary configuration indicates that Myxiniformes and Petromyzoniformes diverged shortly after the common ancestor of cyclostomes split from the future gnathostome lineage. Our results also suggest that intra-subfamilial diversification in hagfish and lamprey lineages (especially those distributed in the northern hemisphere) occurred in the Cretaceous or Tertiary Periods.

12.

An in-silico study of alphaherpesviruses ICP0 genes: positive selection or strong mutational GC-pressure?

Khrustalev VV Barkovsky EV 《IUBMB life》2008,60(7):456-460

The purpose of our work was to analyze how strong mutational GC-pressure influences the ratio between nonsynonymous (DN) and synonymous (DS) distances (DN/DS ratio). As material, we used the genes coding for ICP0 from five completely sequenced genomes of simplexviruses. DN/DS ratio, total GC-content (G + C), and GC-content in first, second, and third codon positions (1GC, 2GC, and 3GC, respectively) have been calculated separately for exon 2, the nonconserved part of exon 3, and the conserved part of exon 3 from ICP0 genes. Results showed that DN exceeds DS only in the conserved part of exon 3 of ICP0 genes from cercopithecine herpesvirus 2 and cercopithecine herpesvirus 16. However, the cause of this result (DN/DS = 2.54) is the GC-pressure acting on coding regions with 3GC = 99% rather than the biological process called positive selection. Only in these two viruses, because of the strong GC-pressure, 3GC has reached 99% in the conserved part of ICP0 exon 3, and so nucleotide substitutions that increase the GC-content practically cannot occur in third codon positions, where most substitutions are synonymous. In this case, GC-pressure has a substrate for nucleotide substitutions only in first and second codon positions, where most substitutions are nonsynonymous.
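
The per-position GC-content measures named in this abstract (1GC, 2GC and 3GC) can be computed by slicing an in-frame coding sequence at each codon offset. This is a minimal sketch with a hypothetical function name and a toy sequence; it assumes the sequence starts in frame and ignores any trailing partial codon.

```python
def gc_by_codon_position(cds):
    """Return (1GC, 2GC, 3GC): GC fraction at first, second and third codon positions."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon, if any
    fractions = []
    for offset in range(3):
        bases = cds[offset:usable:3]  # every third base starting at this offset
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)


# Hypothetical toy coding sequence: ATG GCC GCG TTC
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCGCGTTC")
print(gc1, gc2, gc3)
```

A 3GC value close to 1.0, as in the toy example, corresponds to the saturation the abstract describes, where third positions have almost no room left for GC-increasing substitutions.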

13.

Relative contributions of germline gene variation and somatic mutation to immunoglobulin diversity in the mouse (total citations: 3; self-citations: 1; citations by others: 2)

Gojobori T; Nei M 《Molecular biology and evolution》1986,3(2):156-167

The relative contributions of germline gene variation and somatic mutation to immunoglobulin diversity were studied by comparing germline gene sequences with their rearranged counterparts for the mouse VH, V kappa, and V lambda genes. The mutation rate at the amino acid level was estimated to be 7.0% in the first and second complementarity-determining regions (CDRs) and 2.0% in the framework regions (FRs). The difference in the mutation rate at the nucleotide level between the CDRs and FRs was of the same order of magnitude as that for the amino acid level. Analysis of amino acid diversity or nucleotide diversity indicated that the contribution of somatic mutation to immunoglobulin diversity is approximately 5%. However, the contribution of somatic mutation to the number of different amino acid sequences of immunoglobulins is much larger than that estimated by the analysis of amino acid diversity, and more than 90% of the different immunoglobulins seem to be generated by somatic mutation. Examination of the pattern of nucleotide substitution has suggested that clonal selection after somatic mutation may not be as strong as generally believed.

14.

CodSeqGen: A tool for generating synonymous coding sequences with desired GC-contents

《Genomics》2020,112(1):237-242

15.

Using Piper Species Diversity to Identify Conservation Priorities in the Chocó Region of Colombia

M. Alejandra Jaramillo 《Biodiversity and Conservation》2006,15(5):1695-1712

The forests of the Chocó Region are among the most diverse in the world; however, they are under imminent threat of significant degradation. This study uses species diversity and phylogenetic data in the plant genus Piper to select areas of maximum biological diversity to be considered as conservation priorities. Species distributions were obtained from herbarium collections and the literature. A molecular phylogeny based on nucleotide sequence data from the ITS region of the nuclear genome was used to estimate phylogenetic diversity indices. Three diversity indices were estimated: total species richness, number of endemic species, and phylogenetic diversity. Area selection was conducted by maximizing the total value for these indices and also by complementarity. Four regions were selected as the highest conservation priorities: the vicinity of Buenaventura, the Rio San Juan watershed (south of Quibdó), the department of Nariño, and the Rio Atrato watershed. All of them had the highest rankings for all or some of the diversity indices evaluated. Furthermore, this study shows that in the Chocó Region, Piper phylogenetic diversity increases with total number of species, but decreases with the proportion of endemics.

16.

Assessing the nucleotide diversity of three aphid species by RAPD

D. Martínez-Torres R. Carrió A. Latorre J. C. Simon A. Hermoso A. Moya 《Journal of evolutionary biology》1997,10(4):459-477

A method is presented for the estimation of nucleotide diversity and genetic structure of populations from RAPD (random amplified polymorphic DNA) data. It involves a modification of the technique developed by Lynch and Crease (1990) for the case of restriction sites as survey data. As new elements the method incorporates (i) dominance correction, (ii) values of asexual reproduction of the populations sampled, and (iii) an analytical variance of the number of nucleotide substitutions per site. Sampling was carried out at two geographic scales for three aphid species. At a macrogeographic scale, populations of Rhopalosiphum padi did not show statistical genetic differentiation. Aphis gossypii and Myzus persicae, which were sampled at a microgeographic scale, showed a higher genetic differentiation than R. padi, it being statistically significant in M. persicae. The major sources of sampling variance within- and between-populations were found to be nucleotide (i.e., the number of alleles used as a function of the number of primers used) and population (i.e., sample size) sampling. Extremely low estimates of nucleotide diversity were obtained for the species studied here. This result is consistent with previous reports on genetic diversity for the same or other aphid species which were based on allozyme polymorphism, mitochondrial DNA variation and qualitative analyses of RAPDs.

17.

Distribution of words with a predefined range of mismatches to a DNA probe in bacterial genomes

Melko OM Mushegian AR 《Bioinformatics (Oxford, England)》2004,20(1):67-74

MOTIVATION: Hybridization of oligonucleotides with longer nucleotide sequences is an essential step in nucleic acid biosynthesis in vitro and in vivo, in oligonucleotide-based diagnostics, and in therapeutic applications of oligonucleotides. A major factor determining sensitivity and selectivity of hybridization is the number of base pair mismatches that occur in an ungapped alignment of the oligonucleotide (probe) and a longer sequence (target). RESULTS: The k-distance match count between the probe and the target is defined as the number of ungapped alignments between the two sequences that have exactly k mismatches, and the k-neighbor match count is defined as the sum of the j-distance match counts for j between 0 and k. We derive a novel formula for the probability of a k-distance match. This formula is based on the assumption that the target is strand-symmetric Bernoulli text (i.e. nucleotides are independently, identically distributed in the target and satisfy Chargaff's second parity rule). Our model predicts that the GC-content in both the probe and the target significantly affects the match count expectation. The ratio of k-neighbor match counts in two distinct genomes for a given probe is a measure of its specificity. We calculated such ratios for pairs of bacterial genomes with different combinations of length, GC-content and phylogenetic distance. Examination of the extreme values of these ratios indicates that probes with a high discriminative power exist for each tested pair.
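
The two match counts defined in this abstract can be obtained by brute force: slide the probe along the target and record the number of mismatches in every ungapped alignment. The sketch below follows those definitions only; it scans a single strand of the target (whether the authors also count the reverse complement is not stated here), it does not implement their probability formula, and the function names and toy sequences are hypothetical.

```python
def distance_match_counts(probe, target):
    """counts[k] = number of ungapped alignments with exactly k mismatches."""
    m = len(probe)
    counts = [0] * (m + 1)
    for start in range(len(target) - m + 1):
        window = target[start:start + m]
        mismatches = sum(p != t for p, t in zip(probe, window))
        counts[mismatches] += 1
    return counts


def neighbor_match_count(probe, target, k):
    """Sum of the j-distance match counts for j from 0 to k inclusive."""
    return sum(distance_match_counts(probe, target)[:k + 1])


# Hypothetical toy probe and target.
counts = distance_match_counts("ACG", "ACGTACCACG")
within_one = neighbor_match_count("ACG", "ACGTACCACG", 1)
print(counts, within_one)
```

The specificity ratio mentioned in the abstract would then be the quotient of `neighbor_match_count` values for the same probe in two different genomes. The scan costs time proportional to probe length times target length, which is acceptable for a sketch but slow for whole genomes.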

18.

Comparison of Sample Preparation Methods Used for the Next-Generation Sequencing of Mycobacterium tuberculosis

Andrea D. Tyler Sara Christianson Natalie C. Knox Philip Mabon Joyce Wolfe Gary Van Domselaar Morag R. Graham Meenu K. Sharma 《PloS one》2016,11(2)

The advent and widespread application of next-generation sequencing (NGS) technologies to the study of microbial genomes has led to a substantial increase in the number of studies in which whole genome sequencing (WGS) is applied to the analysis of microbial genomic epidemiology. However, microorganisms such as Mycobacterium tuberculosis (MTB) present unique problems for sequencing and downstream analysis based on their unique physiology and the composition of their genomes. In this study, we compare the quality of sequence data generated using the Nextera and TruSeq isolate preparation kits for library construction prior to Illumina sequencing-by-synthesis. Our results confirm that MTB NGS data quality is highly dependent on the purity of the DNA sample submitted for sequencing and its guanine-cytosine content (or GC-content). Our data additionally demonstrate that the choice of library preparation method plays an important role in mitigating downstream sequencing quality issues. Importantly for MTB, the Illumina TruSeq library preparation kit produces more uniform data quality than the Nextera XT method, regardless of the quality of the input DNA. Furthermore, specific genomic sequence motifs are commonly missed by the Nextera XT method, as are regions of especially high GC-content relative to the rest of the MTB genome. As coverage bias is highly undesirable, this study illustrates the importance of appropriate protocol selection when performing NGS studies in order to ensure that sound inferences can be made regarding mycobacterial genomes.

19.

Estimation of evolutionary distance between nucleotide sequences (total citations: 34; self-citations: 9; citations by others: 25)

F Tajima M Nei 《Molecular biology and evolution》1984,1(3):269-285

A mathematical formula for estimating the average number of nucleotide substitutions per site (delta) between two homologous DNA sequences is developed by taking into account unequal rates of substitution among different nucleotide pairs. Although this formula is obtained for the equal-input model of nucleotide substitution, computer simulations have shown that it gives a reasonably good estimate for a wide range of nucleotide substitution patterns as long as delta is equal to or smaller than 1. Furthermore, the frequency of cases to which the formula is inapplicable is much lower than that for other similar methods recently proposed. This point is illustrated using insulin genes. A statistical method for estimating the number of nucleotide changes due to deletion and insertion is also developed. Application of this method to globin gene data indicates that the number of nucleotide changes per site increases with evolutionary time but the pattern of the increase is quite irregular.
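
To illustrate what a correction for multiple substitutions looks like, and why such formulas can become inapplicable, here is a sketch of the simpler Jukes-Cantor correction. This is explicitly not the formula developed in the paper: Jukes-Cantor assumes equal substitution rates among all nucleotide pairs, which is the assumption the abstract says the authors relax. Function names and the example sequences are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor_distance(p):
    """Substitutions per site under Jukes-Cantor: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        # The logarithm's argument is zero or negative: the formula is inapplicable.
        raise ValueError("p must be below 0.75 for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


observed = p_distance("ACGTACGTAC", "ACGTACGTAT")  # one difference in ten sites
corrected = jukes_cantor_distance(observed)
print(observed, round(corrected, 5))
```

The corrected distance is always at least as large as the observed proportion, because some sites have changed more than once. The `ValueError` branch is the simple-model analogue of the inapplicable cases the abstract mentions.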

20.

Average weighted nucleotide diversity is more precise than pixy in estimating the true value of π from sequence sets containing missing data

Maciej K. Konopiński 《Molecular ecology resources》2023,23(2):348-354

Nucleotide diversity remains an important statistic in population genetic/genomic studies. Although recent advances in massive sequencing make generating sequence data sets cheaper and faster, currently used technologies often introduce substantial amounts of missing nucleotides in their output. A novel method of estimating π from data sets containing missing data, pixy, has also recently been proposed. In this study, the pixy estimator, π_pixy, was compared to average weighted nucleotide diversity, π_W. The estimators were tested both on sequences simulated in fastsimcoal and real sequence sets. Both sets were modified by random insertion of missing nucleotides. Weighted nucleotide diversity performed better in all pairwise comparisons. It was characterized by a smaller error and a narrower distribution of the results. π_pixy tends to overestimate the nucleotide diversity when both the proportion of missing data and the level of variation are low. Of the two estimators, only π_W estimated the true nucleotide diversity in a part of the simulations. A simple formula for estimating π_W allows for easy integration of the estimator in packages such as pixy, which would allow obtaining more precise estimates of nucleotide diversity either in a sliding window or for discrete genomic regions.