Similar literature
1.
The distribution of the number of segregating sites among randomly sampled DNA sequences from a geographically structured population is studied. We assume the infinitely-many-sites model of neutral genes and no recombination. Employing the genealogical process, we derive an equation for the generating function of the distribution of the number of segregating sites. First we study the strong-migration limit and prove that the distribution converges to that for a panmictic population. We also study the case of two sampled DNA sequences in the d-dimensional torus model with homogeneous migration. Received 13 July 1995; received in revised form 21 April 1997

2.
F. Tajima. Genetics, 1989, 123(1): 229-240
Using the two-subpopulation model, the expected numbers of segregating sites in a number of DNA sequences randomly sampled from a subdivided population were examined for several types of population subdivision. It is shown that, when the pattern of migration is symmetrical, as in the finite island model, the expected number of segregating sites is independent of the migration rate when two or three DNA sequences are randomly sampled from the same subpopulation, but depends on the migration rate when more than three DNA sequences are sampled. It is also shown that population subdivision can, in some cases, increase the amount of DNA polymorphism even within a subpopulation.

3.
The extent to which natural selection shapes diversity within populations is a key question for population genetics. Thus, there is considerable interest in quantifying the strength of selection. A full likelihood approach for inference about selection at a single site within an otherwise neutral, fully linked sequence of sites is described here. A coalescent model of evolution is used to model the ancestry of a sample of DNA sequences in which the selected site is segregating. The mutation model, for the selected and neutral sites, is the infinitely-many-sites model, in which there is no back or parallel mutation at sites. Under this mutation model, a unique perfect phylogeny, a gene tree, can be constructed from the configuration of mutations on the sample sequences. The approach is general and can be used for any bi-allelic selection scheme. Selection is incorporated by modelling the frequencies of the selected and neutral allelic classes stochastically back in time, then using a subdivided population model in which the population frequencies through time are treated as variable population sizes. An importance sampling algorithm is then used to explore the space of coalescent trees consistent with the data. The method is applied to a simulated data set and to the gene tree presented in Verrelli et al. (2002).

4.
By means of simulations and DNA sequence analyses, standardized identity excess (a measure of linkage disequilibrium) between segregating nucleotide sites was studied in an effort to quantify the patchwork pattern among alleles of the major histocompatibility complex loci. It was found that the pattern expected under selective neutrality and/or no intralocus recombination does not fit the pattern observed in DNA sequences. However, the intensity and type of selection and the rate of recombination are difficult to estimate by comparing simulation results with the observed pattern. Received: 10 December 1999 / Accepted: 2 March 2000

5.
R. Nielsen, D. M. Weinreich. Genetics, 1999, 153(1): 497-506
McDonald/Kreitman tests performed on animal mtDNA consistently reveal significant deviations from strict neutrality in the direction of an excess number of polymorphic nonsynonymous sites, which is consistent with purifying selection acting on nonsynonymous sites. We show that under models of recurrent neutral and deleterious mutations, the mean age of segregating neutral mutations is greater than the mean age of segregating selected mutations, even in the absence of recombination. We develop a test of the hypothesis that the mean age of segregating synonymous mutations equals the mean age of segregating nonsynonymous mutations in a sample of DNA sequences. The power of this age-of-mutation test and the power of the McDonald/Kreitman test are explored by computer simulations. We apply the new test to 25 previously published mitochondrial data sets and find weak evidence for selection against nonsynonymous mutations.

6.
Y. X. Fu. Genetics, 1997, 146(4): 1489-1499
A coalescent theory for a sample of DNA sequences from a partially selfing diploid population and an algorithm for simulating such samples are developed in this article. Approximate formulas are given for the expectation and the variance of the number of segregating sites in a sample of k sequences from n individuals. Several new estimators of the important parameters θ = 4Nμ and the selfing rate s, where N and μ are, respectively, the effective population size and the mutation rate per sequence per generation, are proposed and their sampling properties are studied.

7.
The Effect of Change in Population Size on DNA Polymorphism (cited 61 times: 15 self-citations, 46 by others)
F. Tajima. Genetics, 1989, 123(3): 597-601
The expected number of segregating sites and the expectation of the average number of nucleotide differences among DNA sequences randomly sampled from a population that is not in equilibrium have been derived. The results indicate that, when the population size has changed drastically, the number of segregating sites is influenced by the size of the current population more strongly than is the average number of nucleotide differences, while the average number of nucleotide differences is affected by the size of the original population more severely than is the number of segregating sites. The results also indicate that the average number of nucleotide differences is affected by a population bottleneck more strongly than is the number of segregating sites.
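The two statistics contrasted in this abstract can be computed directly from an alignment. The following is a minimal Python sketch, not code from the paper: the function names and the toy sample are hypothetical, and the last function uses the standard equilibrium (Watterson) scaling S / a_n with a_n = 1 + 1/2 + ... + 1/(n-1), which assumes a constant-size neutral population and therefore does not reproduce the non-equilibrium expectations derived in the article.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns in which more than one base is observed."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of nucleotide differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def watterson_theta(seqs):
    """Equilibrium estimate of theta from S (assumes constant size, neutrality)."""
    a_n = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a_n


# Hypothetical toy sample of four aligned sequences.
sample = ["AAAA", "AAAT", "AATT", "AAAA"]
print(segregating_sites(sample))          # number of segregating sites
print(mean_pairwise_differences(sample))  # average number of nucleotide differences
print(watterson_theta(sample))            # S scaled by a_n
```

Under equilibrium both the mean pairwise difference and S / a_n estimate the same θ, so a gap between them is one informal way to see the unequal sensitivity to past and current population size that the abstract describes.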

8.
J. Wang. Genetics, 2006, 173(3): 1679-1692
A variety of estimators have been developed to use genetic marker information in inferring the admixture proportions (parental contributions) of a hybrid population. The majority of these estimators used allele frequency data, ignored molecular information that is available in markers such as microsatellites and DNA sequences, and assumed that mutations are absent since the admixture event. As a result, these estimators may fail to deliver an estimate or give rather poor estimates when admixture is ancient and thus mutations are not negligible. A previous molecular estimator based its inference of admixture proportions on the average coalescent times between pairs of genes taken from within and between populations. In this article I propose an estimator that considers the entire genealogy of all of the sampled genes and infers admixture proportions from the numbers of segregating sites in DNA sequence samples. By considering the genealogy of all sequences rather than pairs of sequences, this new estimator also allows the joint estimation of other interesting parameters in the admixture model, such as admixture time, divergence time, population size, and mutation rate. Comparative analyses of simulated data indicate that the new coalescent estimator generally yields better estimates of admixture proportions than the previous molecular estimator, especially when the parental populations are not highly differentiated. It also gives reasonably accurate estimates of other admixture parameters. A human mtDNA sequence data set was analyzed to demonstrate the method, and the analysis results are discussed and compared with those from previous studies.

9.
M. Brandström, H. Ellegren. Genetics, 2007, 176(3): 1691-1701
It is increasingly recognized that insertions and deletions (indels) are an important source of genetic as well as phenotypic divergence and diversity. We analyzed length polymorphisms identified through partial (0.25×) shotgun sequencing of three breeds of domestic chicken made by the International Chicken Polymorphism Map Consortium. A data set of 140,484 short indel polymorphisms in unique DNA was identified after filtering for microsatellite structures. There was a significant excess of tandem duplicates at indel sites, with deletions of a duplicate motif outnumbering the generation of duplicates through insertion. Indel density was lower in microchromosomes than in macrochromosomes, in the Z chromosome than in autosomes, and in 100 bp of upstream sequence, 5'-UTR, and first introns than in intergenic DNA and in other introns. Indel density was highly correlated with single nucleotide polymorphism (SNP) density. The mean density of indels in pairwise sequence comparisons was 1.9 × 10⁻⁴ indel events/bp, approximately 5% of the density of SNPs segregating in the chicken genome. The great majority of indels involved a limited number of nucleotides (median 1 bp), with A-rich motifs being overrepresented at indel sites. The overrepresentation of deletions at tandem duplicates indicates that replication slippage in duplicate sequences is a common mechanism behind indel mutation. The correlation between indel and SNP density indicates common effects of mutation and/or selection on the occurrence of indels and point mutations.

10.
Summary: Our previous experiments on maize (Zea mays L.) plants regenerated from tissue culture revealed genetic activity characteristic of the transposable element Activator (Ac) in the progeny of 2–3% of the plants tested, despite the lack of Ac activity in the progenitor plants. The objective of the present study was to determine whether the presence of Ac activity in tissue-culture-derived plants was associated with changes in the number or structure of Ac-homologous DNA sequences. Families segregating for Ac activity were obtained by crossing plants heterozygous for Ac activity onto Ac-responsive tester plants. A DNA probe derived from a previously isolated Ac sequence was used to examine the Ac-homologous sequences within individual progeny seedlings of segregating families and noncultured control materials. All plants tested had six or more Ac-homologous DNA sequences, regardless of whether Ac activity was present. In the segregating progeny of one tissue-culture-derived plant, a 30-kb Ac-homologous SstI restriction fragment and a 10-kb Ac-homologous BglII restriction fragment were found to cosegregate with Ac activity. We propose that these fragments contained a previously silent Ac sequence that had been activated during tissue culture. Although one or more Ac sequences were often hypomethylated at internal PvuII and HpaII sites in plants with Ac activity, hypomethylation was not a prerequisite for activity. Reduced methylation at these sites may have been a result rather than a cause of Ac activity.

11.
Unknown and foreign viruses can be detected using degenerate primers targeted at conserved sites in known viral gene sequences. Conserved sites are found by comparing sequences, so the usefulness of a set of primers depends crucially on how well the known sequences represent the target group, including unknown sequences. METHODOLOGY/PRINCIPAL FINDINGS: We developed a method for assessing the apparent stability of consensus sequences at sites over time using deposition dates from GenBank. We tested the method using 17 conserved sites in potyvirus genomes. The accumulation of knowledge of sequence variants over 20 years caused 'consensus decay' of the sites. Rates of decay were rapid at all sites but varied widely and, as a result, the ranking of the most conserved sites changed. The discovery and reporting of sequences from previously unknown and distinct species, rather than from strains of known species, dominated the decay, indicating that it was largely a sampling effect related to the progressive discovery of species, and that recent virus mutation was probably only a minor contributing factor. CONCLUSION/SIGNIFICANCE: We showed that, in the past, sampling bias has misled the choice of the most conserved target sites for genus-specific degenerate primers. The history of sequence discoveries indicates that primer designs should be updated regularly, and it provides an additional dimension for improving the design of degenerate primers.
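One way to picture 'consensus decay' is to track how degenerate a primer site would have to be to cover every sequence known up to a given date. The Python sketch below is only an illustration under stated assumptions and is not the authors' method: the function name, the degeneracy measure (the product of the number of distinct bases observed per column) and the example records are all hypothetical.

```python
from math import prod


def degeneracy_over_time(records):
    """Map each deposition year to the degeneracy of the cumulative consensus.

    `records` is an iterable of (year, aligned_site_sequence) pairs. The
    degeneracy is the number of distinct sequences that a fully degenerate
    primer covering every base seen so far in each column would encode.
    """
    result = {}
    seen = None
    for year, seq in sorted(records):
        if seen is None:
            seen = [set() for _ in seq]  # one set of observed bases per column
        for column, base in zip(seen, seq):
            column.add(base)
        result[year] = prod(len(column) for column in seen)
    return result


# Hypothetical deposition records for one six-base site.
records = [(1990, "ATGGCA"), (1995, "ATGGCG"), (2005, "ACGGCT")]
print(degeneracy_over_time(records))
```

A site whose degeneracy climbs quickly as records accumulate would drop in a ranking of conserved sites, which is the kind of re-ranking the abstract reports.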

12.
A computer program (PINCERS) is described for use in the design of synthetic genes and mixed-probe DNA sequences. A protein sequence is reverse translated, with synonymous codons generated at each position to produce a degenerate sequence. To locate potential restriction enzyme sites, the degenerate sequence is searched with a library of restriction enzymes for sites that can be formed by any combination of synonymous codons. These sites are indicated in a map so that they may be incorporated into the synthetic gene sequence. The program allows the user to select the appropriate codon usage table for the organism of interest and then to set a threshold usage frequency below which codons are not generated. PINCERS may also be used to assist in planning the synthesis of mixed-probe DNA sequences for cross-hybridization experiments. It can identify regions of specified length within the protein sequence that have the least overall degeneracy, thereby minimizing the number of probes to be synthesized and, therefore, maximizing the concentration of a given probe sequence.
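The reverse-translation search described above can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not PINCERS itself: the names `CODONS`, `compatible_codons` and `find_site` are hypothetical, the standard genetic code is assumed, and the codon-usage threshold mentioned in the abstract is not modelled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Amino acid -> list of its synonymous codons.
CODONS = {}
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append(b1 + b2 + b3)


def compatible_codons(aa, constraints):
    """Codons for `aa` that match {position_in_codon: base} constraints."""
    return [c for c in CODONS[aa]
            if all(c[pos] == base for pos, base in constraints.items())]


def find_site(protein, site):
    """Find nucleotide offsets where `site` fits some choice of synonymous codons.

    Returns a list of (offset, {codon_index: [usable codons]}) pairs.
    """
    hits = []
    for start in range(3 * len(protein) - len(site) + 1):
        per_codon = {}
        for k, base in enumerate(site):
            idx, pos = divmod(start + k, 3)  # codon index, position within codon
            per_codon.setdefault(idx, {})[pos] = base
        choices = {i: compatible_codons(protein[i], cons)
                   for i, cons in per_codon.items()}
        if all(choices.values()):  # every overlapped codon has at least one option
            hits.append((start, choices))
    return hits


# Glu-Phe can be encoded as GAA TTC, which contains an EcoRI site (GAATTC).
print(find_site("EF", "GAATTC"))  # [(0, {0: ['GAA'], 1: ['TTC']})]
```

Checking each overlapped codon independently is exact because the choice of codon at one residue does not restrict the choice at another; a position-by-position IUPAC consensus would instead admit some codon combinations that change the protein.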

13.
Statistical Properties of a DNA Sample under the Finite-Sites Model (cited 1 time: 0 self-citations, 1 by others)
Z. Yang. Genetics, 1996, 144(4): 1941-1950
Statistical properties of a DNA sample from a random-mating population of constant size are studied under the finite-sites model. It is assumed that there is no migration and that no recombination occurs within the locus. A Markov process model is used for nucleotide substitution, allowing for multiple substitutions at a single site. The evolutionary rates among sites are treated as either constant or variable. The general likelihood calculation using numerical integration involves intensive computation and is feasible for three or four sequences only; it may be used for validating approximate algorithms. Methods are developed to approximate the probability distribution of the number of segregating sites in a random sample of n sequences, with either constant or variable substitution rates across sites. Calculations using parameter estimates obtained for human D-loop mitochondrial DNAs show that among-site rate variation has a major effect on the distribution of the number of segregating sites; the distribution under the finite-sites model with variable rates among sites is quite different from that under the infinite-sites model.

14.
Preferential binding of the SeqA protein to hemi-methylated GATC sequences functions as a negative regulator for Escherichia coli initiation of chromosomal replication at oriC and is implicated in segregating replicated chromosomes for cell division. We demonstrate that sequential binding of one SeqA tetramer to a set of two hemi-methylated sites mediates formation of higher-order complexes. The absence of cross-binding to separate DNAs suggests that two monomers of a SeqA tetramer bind to two hemi-methylated sites on DNA. The interaction among SeqA proteins bound to at least six adjacent hemi-methylated sites induces aggregation of free proteins to bound proteins. Aggregation might be indicative of SeqA foci, which appear to track replication forks in vivo. Studies of the properties of SeqA binding will contribute to our understanding of the function of SeqA.

15.
Topoisomerase I-mediated integration of hepadnavirus DNA in vitro (cited 14 times: 4 self-citations, 10 by others)
Hepadnaviruses integrate in cellular DNA via an illegitimate recombination mechanism, and clonally propagated integrations are present in most hepatocellular carcinomas which arise in hepadnavirus carriers. Although integration is not specific for any viral or cellular sequence, highly preferred integration sites have been identified near the DR1 and DR2 sequences and in the cohesive overlap region of virion DNA. We have mapped a set of preferred topoisomerase I (Topo I) cleavage sites in the region of DR1 on plus-strand DNA and in the cohesive overlap near DR2 and have tested whether Topo I is capable of mediating illegitimate recombination of woodchuck hepatitis virus (WHV) DNA with cellular DNA by developing an in vitro assay for Topo I-mediated linking. Four in vitro-generated virus-cell hybrid molecules have been cloned, and sequence analysis demonstrated that Topo I can mediate both linkage of WHV DNA to 5'OH acceptor ends of heterologous DNA fragments and linkage of WHV DNA into internal sites of a linear double-stranded cellular DNA. The in vitro integrations occurred at preferred Topo I cleavage sites in WHV DNA adjacent to the DR1 and were nearly identical to a subset of integrations cloned from hepatocellular carcinomas. The end specificity and polarity of viral sequences in the integrations allow us to propose a prototype integration mechanism for both ends of a linearized hepadnavirus DNA molecule.

16.
The folding of DNA on the nucleosome core particle governs many fundamental issues in eukaryotic molecular biology. In this study, an updated set of sequence-dependent empirical “energy” functions, derived from the structures of other protein-bound DNA molecules, is used to investigate the extent to which the architecture of nucleosomal DNA is dictated by its underlying sequence. The potentials are used to estimate the cost of deforming a collection of sequences known to bind or resist uptake in nucleosomes along various left-handed superhelical pathways and to deduce the features of sequence contributing to a particular structural form. The deformation scores reflect the choice of template, the deviations of structural parameters at each step of the nucleosome-bound DNA from their intrinsic values, and the sequence-dependent “deformability” of a given dimer. The correspondence between the computed scores and binding propensities points to a subtle interplay between DNA sequence and nucleosomal folding, e.g., sequences with periodically spaced pyrimidine-purine steps deform at low cost along a kinked template whereas sequences that resist deformation prefer a smoother spatial pathway. Successful prediction of the known settings of some of the best-resolved nucleosome-positioning sequences, however, requires a template with “kink-and-slide” steps like those found in high-resolution nucleosome structures.

17.
Y. X. Fu. Genetics, 1996, 144(2): 829-838
The number of segregating sites in a sample of DNA sequences and the age of the most recent common ancestor (MRCA) of the sequences in the sample are positively correlated. The value of the former can be used to estimate the value of the latter. Using the coalescent approach, we derive in this paper the joint probability distribution of the number of segregating sites and the age of the MRCA of a sample under the neutral Wright-Fisher model. From this distribution, we are able to compute the likelihood function of the number of segregating sites and the posterior probability of the age of the MRCA of a sample. Three point estimators and one interval estimator of the age of the MRCA are developed; their relationships and properties are investigated. The estimation of the age of the MRCA of human Y chromosomes from a sample of no variation is discussed.

18.
Human mitochondrial DNA (mtDNA) sequences reveal an abundance of polymorphic sites in which the frequencies of the segregating bases are very different. A typical polymorphism involves one base at low frequency and the other base at high frequency. In contrast, nuclear gene data sets tend to show an excess of polymorphisms in which both segregating bases are at intermediate frequencies. A new statistical test of this difference finds significant differences between mtDNA and nuclear gene data sets reported in the literature. However, differences in the polymorphism patterns could be caused by different sample origins for the different data sets. To examine the mtDNA-nuclear difference more closely, DNA sequences were generated from a portion of the X-linked pyruvate dehydrogenase E1 alpha subunit (PDHA1) locus and from a portion of mitochondrial control region I (CRI) from each of eight individuals, four from sub-Saharan Africa. The two genes revealed a significant difference in the site frequency distribution of polymorphic sites. PDHA1 revealed an excess of intermediate-frequency polymorphisms, while CRI showed an excess of sites with the low-high frequency pattern. The discrepancy suggests that mitochondrial variation has been shaped by natural selection, and may not be ideal for some questions on human origins.

19.
Human prolactin. cDNA structural analysis and evolutionary comparisons (cited 33 times: 0 self-citations, 33 by others)
Prolactin (Prl), growth hormone, and chorionic somatomammotropin form a set (the "Prl set") of hormones which is thought to have evolved from a common ancestral gene. This assumption is based on several lines of evidence: overlap in their biological and immunological properties, similarities in their amino acid sequences, and homologies in the nucleic acid sequences of their structural genes. In the current study we report the cloning, amplification in bacteria, and sequence analysis of DNA complementary to Prl mRNA isolated from human pituitary Prl-secreting adenomas. The cloned DNA contains 914 bases, which include the entire coding sequence of human prePrl as well as portions of the 5'- and 3'-untranslated regions of the mRNA. The amino acid sequence predicted by our data differs from a previously reported amino acid sequence in 8 positions. With the results of this study we can now compare in one species the nucleotide sequences of the structural gene coding for each of the hormones of the Prl set. The sequence divergence at replacement sites is used to establish an evolutionary clock for the Prl set of genes. Using this clock, we postulate that the chromosomal segregation of human Prl and human growth hormone occurred about 392 million years ago and that growth hormone and chorionic somatomammotropin underwent an intrachromosomal recombination within the last 10 million years.

20.
BACKGROUND: A genome-wide set of single nucleotide polymorphisms (SNPs) is a valuable resource in genetic research and breeding and is usually developed by re-sequencing a genome. If a genome sequence is not available, an alternative strategy must be used. We previously reported the development of a pipeline (AGSNP) for genome-wide SNP discovery in coding sequences and other single-copy DNA without a complete genome sequence in self-pollinating (autogamous) plants. Here we updated this pipeline for SNP discovery in outcrossing (allogamous) species and demonstrated its efficacy in SNP discovery in walnut (Juglans regia L.). RESULTS: The first step in the original implementation of the AGSNP pipeline was the construction of a reference sequence and the identification of single-copy sequences in it. To identify single-copy sequences, multiple genome equivalents of short SOLiD reads of another individual were mapped to shallow genome coverage of long Sanger or Roche 454 reads making up the reference sequence. The relative depth of SOLiD reads was used to filter out repeated sequences from single-copy sequences in the reference sequence. The second step was a search for SNPs between SOLiD reads and the reference sequence. Polymorphism within the mapped SOLiD reads would have precluded SNP discovery; hence both individuals had to be homozygous. The AGSNP pipeline was updated here to use SOLiD or other types of short reads of a heterozygous individual for these two principal steps. A total of 32.6X walnut genome equivalents of SOLiD reads of vegetatively propagated walnut scion cultivar 'Chandler' were mapped to 48,661 'Chandler' bacterial artificial chromosome (BAC) end sequences (BESs) produced by Sanger sequencing during the construction of a walnut physical map. A total of 22,799 putative SNPs were initially identified. A total of 6,000 Infinium II type SNPs evenly distributed along the walnut physical map were selected for the construction of an Infinium BeadChip, which was used to genotype a walnut mapping population having 'Chandler' as one of the parents. Genotyping results were used to adjust the filtering parameters of the updated AGSNP pipeline. With the adjusted filtering criteria, 69.6% of SNPs discovered with the updated pipeline were real and could be mapped on the walnut genetic map. A total of 13,439 SNPs were discovered by BES re-sequencing. BESs harboring SNPs were in 677 FPC contigs covering 98% of the physical map of the walnut genome. CONCLUSION: The updated AGSNP pipeline is a versatile tool for high-throughput, genome-wide SNP discovery in both autogamous and allogamous species. With this pipeline, a large set of SNPs was identified in a single walnut cultivar.


Copyright © Beijing Qinyun Technology Development Co., Ltd.