Similar articles
20 similar articles found
1.
Recurrent deletions have been associated with numerous diseases and genomic disorders. Few, however, have been resolved at the molecular level because their breakpoints often occur in highly copy-number-polymorphic duplicated sequences. We present an approach that uses a combination of somatic cell hybrids, array comparative genomic hybridization, and the specificity of next-generation sequencing to determine breakpoints that occur within segmental duplications. Applying our technique to the 17q21.31 microdeletion syndrome, we used genome sequencing to determine copy-number-variant breakpoints in three deletion-bearing individuals with molecular resolution. For two cases, we observed breakpoints consistent with nonallelic homologous recombination involving only H2 chromosomal haplotypes, as expected. Molecular resolution revealed that the breakpoints occurred at different locations within a 145 kbp segment of >99% identity and disrupt KANSL1 (previously known as KIAA1267). Interestingly, the breakpoints mapped preferentially to gaps in the current reference genome assembly, which we resolved in this study. In the remaining case, we found that unequal crossover occurred interchromosomally between the H1 and H2 haplotypes and that this event was mediated by a homologous sequence that was once again missing from the human reference. Our method provides a strategy for the identification of breakpoints within complex regions of the genome harboring high-identity and copy-number-polymorphic segmental duplication. The approach should become particularly useful as high-quality alternate reference sequences become available and genome sequencing of individuals' DNA becomes more routine.

2.
The analysis of evolutionary rates is a popular approach to characterizing the effect of natural selection at the molecular level. Sequences contributing to species adaptation are expected to evolve faster than nonfunctional sequences because favourable mutations have a higher fixation probability than neutral ones. Such an accelerated rate of evolution might be due to factors other than natural selection, in particular GC-biased gene conversion. This is true of neutral sequences, but also of constrained sequences, which can be illustrated using the mouse Fxy gene. Several criteria can discriminate between the natural selection and biased gene conversion models. These criteria suggest that the recently reported human accelerated regions are most likely the result of biased gene conversion. We argue that these regions, far from contributing to human adaptation, might represent the Achilles' heel of our genome.

3.
Previous reports of preferential transmission of bipolar affective disorder (BP) from the maternal versus the paternal lines in families suggested that this disorder may be caused by mitochondrial DNA mutations. We have sequenced the mitochondrial genome in 25 BP patients with family histories of psychiatric disorder that suggest matrilineal inheritance. No polymorphism identified more than once in this sequencing showed any significant association with BP in association studies using 94 cases and 94 controls. To determine whether our BP sample showed evidence of selection against the maternal lineage, we determined genetic distances between all possible pairwise comparisons within the BP and control groups, based on multilocus mitochondrial polymorphism haplotypes. These analyses revealed fewer closely related haplotypes in the BP group than in the matched control group, suggesting selection against maternal lineages in this disease. Such selection is compatible with recurrent mitochondrial mutations, which are associated with slightly decreased fitness. Although such mismatch distribution comparisons have been used previously for analyses of population histories, this is, as far as we are aware, the first report of this method being used to study disease.

4.
Human polymorphisms originate as mutations, and the influence of context on mutagenesis should be reflected in the distribution of sequences surrounding single nucleotide polymorphisms (SNPs). We have performed a computational survey of nearly two million human SNPs to determine if sequence-dependent hotspots for polymorphism exist in the human genome. Here we show that sequences containing CpG dinucleotides, which occur at low frequencies in the human genome, are 6.7-fold more abundant at polymorphic sites than expected. In contrast, polymorphisms in CpG sequences located within CpG islands, important regulatory regions that modulate gene expression, are 6.8-fold less prevalent than expected. The distribution of polymorphic alleles at CpGs in CpG islands is also significantly different from that in non-island regions. These data strongly support a role for 5-methylcytosine deamination in the generation of human variation, and suggest that variation at CpGs in islands is suppressed.

5.
Canine X-linked progressive retinal atrophy (XLPRA) is an inherited blinding disorder caused by mutations in ORF15 of the RPGR gene and is homologous to human retinitis pigmentosa 3 (RP3). The disease occurs in two forms: XLPRA1 in the Siberian husky and Samoyed, and XLPRA2, derived from mongrel dogs. A third, neutral deletion has been described in red wolves. Haplotype analysis of the 633-kbp RP3 interval in 6 different canids confirmed a common descent for the XLPRA1 mutation in both affected breeds but suggested a recent and independent origin for the two forms of XLPRA. The RP3 interval was excluded from causative association with blindness in the red wolf and in the Akita, a breed closely related to Nordic sled dogs. Overall, these data suggest a limited distribution of the affected haplotypes and indicate that mutations in ORF15 are likely to be limited to the described dog breeds.

6.
We present models describing the acquisition and deletion of novel sequences in populations of microorganisms. We infer that most novel sequences are neutral. Thus, sequence duplications and gene transfer between organisms sharing the same environment are rarely expected to generate adaptive functions. Two classes of models are considered: (1) a homogeneous population with constant size, and (2) an island model in which the population is subdivided into patches that are in contact through slow migration. Distributions of gene frequencies are derived in a Moran model with overlapping generations. We find that novel, neutral or near-neutral coding sequences in microorganisms will not be fixed globally because they offer large target sizes for mutations and because the populations are so large. At most, such genes may have a transient presence in only a small fraction of the population. Consequently, a microbial population is expected to have a very large diversity of transient neutral gene content. Only sequences that are under strong selection, globally or in individual patches, can be expected to persist. We suggest that genome size is maintained in microorganisms by a quasi-steady state mechanism in which random fluctuations in the effective acquisition and deletion rates result in genome sizes that vary from patch to patch. We assign the genomic identity of a global population to those genes that are required for the participation of patches in the genetic sweeps that maintain the genomic coherence of the population. In contrast, we stress the influence of sequence loss on the isolation and the divergence (speciation) of novel patches from a global population.

7.
We have developed a mathematical algorithm to implement a method for localizing mutations using haplotype analysis. Our strategy infers haplotypes based on the determination of genotypes of a proximal and a distal marker for 21 chromosomal intervals distributed across the mouse genome (corresponding to two intervals for Chromosomes (Chrs) 1 and 2 and one for the remaining 17 autosomes). To simulate the analysis of mice homozygous for recessive mutations, we tested the efficacy of our method on over 200 data sets generated from two independent mapping panel data sets containing the genotypes of 46 F2 progeny of an intercross and 94 F2 progeny of a backcross. In all cases we were able to identify the chromosomal interval carrying the recessive mutation despite the fact that some of the data sets consisted of as few as 10 meioses. Our strategy proved sensitive and expedient, since the simulated genome-wide screen could be executed by genotype analysis of 40 microsatellite markers in small numbers of intercross or backcross progeny. Received: 2 June 1997 / Accepted: 22 October 1997

8.
High-throughput sequencing (HTS) metabarcoding is commonly applied to assess phytoplankton diversity. Usually, haplotypes are grouped into operational taxonomic units (OTUs) through clustering, whereby the resulting number of OTUs depends on chosen similarity thresholds. We applied, instead, a phylogenetic approach to infer taxa among 18S rDNA V4-metabarcode haplotypes gathered from 48 time-series samples using the marine planktonic diatoms Chaetoceros and Bacteriastrum as a test case. The 73 recovered taxa comprised both solitary haplotypes and polytomies, the latter each composed of a highly abundant, dominant haplotype and one to several minor, peripheral haplotypes. The solitary and dominant haplotypes usually matched reference sequences, enabling species assignment of taxa. We hypothesise that the super-abundance of reads in dominant haplotypes results from the homogenization effect of concerted evolution. Reads of populous peripheral haplotypes and dominant haplotypes show comparable distribution patterns over the sample dates, suggesting that they are part of the same population. Many taxa revealed marked seasonality, with closely related ones generally showing distinct periodicity, whereas others occur year-round. Phylogenies inferred from metabarcode haplotypes enable delineation of biologically meaningful taxa, whereas OTUs resulting from clustering algorithms often deviate markedly from such taxa.

9.
10.
A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries of genetic variation, like allele frequencies, are also correlated with recombination rate and whether these correlations can be explained solely by negative selection against deleterious mutations or whether positive selection acting on favorable alleles is also required. Here we attempt to address these questions by analyzing three different genome-wide resequencing datasets from European individuals. We document several significant correlations between different genomic features. In particular, we find that average minor allele frequency and diversity are reduced in regions of low recombination and that human diversity, human-chimp divergence, and average minor allele frequency are reduced near genes. Population genetic simulations show that either positive natural selection acting on favorable mutations or negative natural selection acting against deleterious mutations can explain these correlations. However, models with strong positive selection on nonsynonymous mutations and little negative selection predict a stronger negative correlation between neutral diversity and nonsynonymous divergence than observed in the actual data, supporting the importance of negative, rather than positive, selection throughout the genome. Further, we show that the widespread presence of weakly deleterious alleles, rather than a small number of strongly positively selected mutations, is responsible for the correlation between neutral genetic diversity and recombination rate. This work suggests that natural selection has affected multiple aspects of linked neutral variation throughout the human genome and that positive selection is not required to explain these observations.

11.
Direct polymerase chain reaction (PCR) detection of insertion/deletion (indel) polymorphisms requires sample homozygosity. For the indel polymorphisms that have the deletion allele with a relatively low frequency in the autosomal regions, direct PCR detection becomes difficult or impossible. The present study is, to our knowledge, the first designed to directly detect indel polymorphisms in a human autosomal region (i.e., the immunoglobulin V(H) region), through use of single haploid sperm cells as subjects. Unique marker sequences (n=32), spaced at approximately 5-kb intervals, were selected near the 3' end of the V(H) region. A two-round multiplex PCR protocol was used to amplify these sequences from single sperm samples from nine unrelated healthy donors. The parental haplotypes of the donors were determined by examining the presence or absence of these markers. Seven clustered markers in 6 of the 18 haplotypes were missing and likely represented a 35-40-kb indel polymorphism. The genotypes of the donors, with respect to this polymorphism, perfectly matched the expectation under Hardy-Weinberg equilibrium. Three V(H) gene segments, of which two are functional, are affected by this polymorphism. According to these results, >10% of individuals in the human population may not have these gene segments in their genome, and approximately 44% may have only one copy of these gene segments. The biological impact of this polymorphism would be very interesting to study. The approach used in the present study could be applied to understand the physical structure and diversity of all other autosomal regions.
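The population figures quoted in entry 11 follow from the haplotype counts under Hardy-Weinberg equilibrium. The sketch below is only an illustration of that arithmetic, not the authors' analysis: the function name `hardy_weinberg` and the dictionary keys are hypothetical, and the only inputs taken from the abstract are the 6 deletion haplotypes out of 18.

```python
def hardy_weinberg(deletion_haplotypes: int, total_haplotypes: int) -> dict:
    """Expected genotype frequencies for a biallelic indel under
    Hardy-Weinberg equilibrium (hypothetical helper, for illustration)."""
    if total_haplotypes <= 0 or not 0 <= deletion_haplotypes <= total_haplotypes:
        raise ValueError("haplotype counts are inconsistent")
    q = deletion_haplotypes / total_haplotypes  # deletion allele frequency
    p = 1.0 - q                                 # insertion allele frequency
    return {
        "no_copy": q * q,       # homozygous for the deletion
        "one_copy": 2 * p * q,  # heterozygous
        "two_copies": p * p,    # homozygous for the insertion
    }


# Counts reported in the abstract: 6 of 18 haplotypes carry the deletion.
freqs = hardy_weinberg(6, 18)
for genotype, frequency in freqs.items():
    print(f"{genotype}: {frequency:.3f}")
```

With q = 6/18 = 1/3, the expected fraction with no copy is q² = 1/9 (about 11%, hence ">10%") and the expected fraction with one copy is 2pq = 4/9 (about 44%), matching the values stated in the abstract.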

12.
Gompert Z, Buerkle CA. Genetics 2011, 187(3):903-917
The demography of populations and natural selection shape genetic variation across the genome and understanding the genomic consequences of these evolutionary processes is a fundamental aim of population genetics. We have developed a hierarchical Bayesian model to quantify genome-wide population structure and identify candidate genetic regions affected by selection. This model improves on existing methods by accounting for stochastic sampling of sequences inherent in next-generation sequencing (with pooled or indexed individual samples) and by incorporating genetic distances among haplotypes in measures of genetic differentiation. Using simulations we demonstrate that this model has a low false-positive rate for classifying neutral genetic regions as selected genes (i.e., Φ_ST outliers), but can detect recent selective sweeps, particularly when genetic regions in multiple populations are affected by selection. Nonetheless, selection affecting just a single population was difficult to detect and resulted in a high false-negative rate under certain conditions. We applied the Bayesian model to two large sets of human population genetic data. We found evidence of widespread positive and balancing selection among worldwide human populations, including many genetic regions previously thought to be under selection. Additionally, we identified novel candidate genes for selection, several of which have been linked to human diseases. This model will facilitate the population genetic analysis of a wide range of organisms on the basis of next-generation sequence data.

13.
Analyses of mitochondrial DNA (mtDNA) sequences have revealed non-neutral patterns, suggesting that many amino acid mutations in animal mtDNA may be mildly deleterious, but this has not been verified in human clinical series. Since sensorineural hearing impairment (SNHI) is a common manifestation in many of the syndromes caused by mutations in mtDNA, this may be regarded as the phenotype of choice in attempts to detect mutations that may have a mildly deleterious effect on mitochondrial function. On the basis of family history, we selected 32 subjects with SNHI in maternal relatives from among 117 unrelated SNHI patients, determined the entire coding region sequence of mtDNA and compared the sequence variation with that in 32 haplogroup-matched controls taken at random from 192 Finnish sequences. The 32 control sequences differed from the remaining 160 sequences by 36±9 substitutions (mean ± SD), while the difference for the 32 patients was 58±4 substitutions (P=0.005 for difference; Wilcoxon signed rank test). Differences were also found in the number of new haplotypes and new non-synonymous mutations or mutations in tRNA or rRNA genes. A total of 12 rare mtDNA variants were detected in the patients, and only 3 of these were considered to be neutral in effect. It is proposed that increased sequence variation in mtDNA may be a genetic risk factor for SNHI, and the increased frequency of rare haplotypes in these patients points to the presence of mildly deleterious mutations in mtDNA.

14.
Recombination is one of the main forces shaping genome diversity, but the information it generates is often overlooked. A recombination event creates a junction between two parental sequences that may be transmitted to the subsequent generations. Just like mutations, these junctions carry evidence of the shared past of the sequences. We present the IRiS algorithm, which detects past recombination events from extant sequences and specifies both the location of each recombination and which sequences are the recombinants. We have validated and calibrated IRiS for the human genome using coalescent simulations replicating standard human demographic history and a variable recombination rate model, and we have fine-tuned IRiS parameters to simultaneously optimize for false discovery rate, sensitivity, and accuracy in placing the recombination events in the sequence. Newer recombinations overwrite traces of past ones and our results indicate more recent recombinations are detected by IRiS with greater sensitivity. IRiS analysis of the MS32 region, previously studied using sperm typing, showed good concordance with estimated recombination rates. We also applied IRiS to haplotypes for 18 X-chromosome regions in HapMap Phase III populations. Recombination events detected for each individual were recoded as binary allelic states and combined into recotypes. Principal component analysis and multidimensional scaling based on recotypes reproduced the relationships between the eleven HapMap Phase III populations that can be expected from known human population history, thus further validating IRiS. We believe that our new method will contribute to the study of the distribution of recombination events across genomes and, for the first time, it will allow the use of recombination as a genetic marker to study human genetic variation.

15.
Evolutionary forces like Hill-Robertson interference and negative epistasis can lead to deleterious mutations being found on distinct haplotypes. However, the extent to which these forces depend on the selection and dominance coefficients of deleterious mutations and shape genome-wide patterns of linkage disequilibrium (LD) in natural populations with complex demographic histories has not been tested. In this study, we first used forward-in-time simulations to predict how negative selection impacts LD. Under models where deleterious mutations have additive effects on fitness, deleterious variants less than 10 kb apart tend to be carried on different haplotypes relative to pairs of synonymous SNPs. In contrast, for recessive mutations, there is no consistent ordering of how selection coefficients affect LD decay, due to the complex interplay of different evolutionary effects. We then examined empirical data of modern humans from the 1000 Genomes Project. LD between derived alleles at nonsynonymous SNPs is lower compared to pairs of derived synonymous variants, suggesting that nonsynonymous derived alleles tend to occur on different haplotypes more than synonymous variants. This result holds when controlling for potential confounding factors by matching SNPs for frequency in the sample (allele count), physical distance, magnitude of background selection, and genetic distance between pairs of variants. Lastly, we introduce a new statistic HR(j) which allows us to detect interference using unphased genotypes. Application of this approach to high-coverage human genome sequences confirms our finding that nonsynonymous derived alleles tend to be located on different haplotypes more often than are synonymous derived alleles. Our findings suggest that interference may play a pervasive role in shaping patterns of LD between deleterious variants in the human genome, and consequently influences genome-wide patterns of LD.
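To make "derived alleles on different haplotypes" in entry 15 concrete, the following minimal sketch computes the classical LD coefficient D and r² for two biallelic sites from phased haplotypes. It is not the HR(j) statistic from the abstract, which works on unphased genotypes and is not defined there. The function name `pairwise_ld` and the 0/1 coding (1 = derived allele) are assumptions made for illustration.

```python
def pairwise_ld(haplotypes):
    """Return (D, r_squared) for two biallelic sites.

    `haplotypes` is a list of (a, b) pairs, one per phased chromosome,
    where a and b are 0 (ancestral) or 1 (derived). Hypothetical helper.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n   # derived frequency, site A
    p_b = sum(b for _, b in haplotypes) / n   # derived frequency, site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # D < 0: derived alleles in repulsion
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Derived alleles always on different haplotypes (repulsion).
repulsion = [(1, 0)] * 5 + [(0, 1)] * 5
# Derived alleles always on the same haplotype (coupling).
coupling = [(1, 1)] * 5 + [(0, 0)] * 5

d_rep, r2_rep = pairwise_ld(repulsion)
d_cou, r2_cou = pairwise_ld(coupling)
print(d_rep, r2_rep)
print(d_cou, r2_cou)
```

A negative D between derived alleles corresponds to the repulsion pattern the abstract describes for nonsynonymous variants. Note that r² alone cannot distinguish repulsion from coupling, which is why the sign of D is reported as well.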

16.
A stepwise logistic-regression procedure is proposed for evaluation of the relative importance of variants at different sites within a small genetic region. By fitting statistical models with main effects, rather than modeling the full haplotype effects, we generate tests, with few degrees of freedom, that are likely to be powerful for detecting primary etiological determinants. The approach is applicable to either case/control or nuclear-family data, with case/control data modeled via unconditional and family data via conditional logistic regression. Four different conditioning strategies are proposed for evaluation of effects at multiple, closely linked loci when family data are used. The first strategy results in a likelihood that is equivalent to analysis of a matched case/control study with each affected offspring matched to three pseudocontrols, whereas the second strategy is equivalent to matching each affected offspring with between one and three pseudocontrols. Both of these strategies require that parental phase (i.e., the haplotypes present in the parents) can be inferred. Families in which phase cannot be determined must be discarded, which can considerably reduce the effective size of a data set, particularly when large numbers of loci that are not very polymorphic are being considered. Therefore, a third strategy is proposed in which knowledge of parental phase is not required, which allows those families with ambiguous phase to be included in the analysis. The fourth and final strategy is to use conditioning method 2 when parental phase can be inferred and to use conditioning method 3 otherwise. The methods are illustrated using nuclear-family data to evaluate the contribution of loci in the HLA region to the development of type 1 diabetes.
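The "three pseudocontrols" of the first conditioning strategy in entry 16 can be pictured as the three haplotype pairs the parents could have transmitted but did not. The sketch below assumes this reading and assumes that parental phase is known. The abstract does not spell out the construction, so the function name `pseudocontrols` and the haplotype labels are hypothetical.

```python
from itertools import product


def pseudocontrols(father, mother, child):
    """List the haplotype pairs the parents could have transmitted but did not.

    father, mother: tuples of two phased haplotypes each.
    child: (paternal_haplotype, maternal_haplotype) actually transmitted.
    Hypothetical helper; assumes parental phase is known.
    """
    possible = list(product(father, mother))  # 4 possible offspring
    if child not in possible:
        raise ValueError("child is not compatible with the parental haplotypes")
    remaining = possible.copy()
    remaining.remove(child)                   # drop one copy: the observed case
    return remaining


# Illustrative labels only; real data would be multilocus haplotypes.
controls = pseudocontrols(("A1", "A2"), ("B1", "B2"), ("A1", "B2"))
print(controls)
```

Each affected offspring together with its three pseudocontrols then forms one matched set for conditional logistic regression. When a parent is homozygous, some pseudocontrols coincide with the case and carry no information, which may be why the second strategy uses between one and three pseudocontrols.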

17.
An important computational technique for extracting the wealth of information hidden in human genomic sequence data is to compare the sequence with that from the corresponding region of the mouse genome, looking for segments that are conserved over evolutionary time. Moreover, the approach generalises to comparison of sequences from any two related species. The underlying rationale (which is abundantly confirmed by observation) is that a random mutation in a functional region is usually deleterious to the organism, and hence unlikely to become fixed in the population, whereas mutations in a non-functional region are free to accumulate over time. The potential value of this approach is so attractive that the public and private projects to sequence the human genome are now turning to sequencing the mouse, and you will soon be able to compare the human and mouse sequences of your favourite genomic region. We are currently witnessing an explosion of computer tools for comparative analysis of two genomic sequences. Here the capabilities of two new network servers for comparing genomic sequences from any pair of closely related species are sketched. The Syntenic Gene Prediction Program SGP-I utilises sequence comparisons to enhance the ability to locate protein coding segments in genomic data. PipMaker attempts to determine all conserved genomic regions, regardless of their function.

18.
C A Wise, M Sraml, S Easteal. Genetics 1998, 148(1):409-421
To test whether patterns of mitochondrial DNA (mtDNA) variation are consistent with a neutral model of molecular evolution, nucleotide sequences were determined for the 1041 bp of the NADH dehydrogenase subunit 2 (ND2) gene in 20 geographically diverse humans and 20 common chimpanzees. Contingency tests of neutrality were performed using four mutational categories for the ND2 molecule: synonymous and nonsynonymous mutations in the transmembrane regions, and synonymous and nonsynonymous mutations in the surface regions. The following three topological mutational categories were also used: intraspecific tips, intraspecific interiors, and interspecific fixed differences. The analyses reveal a significantly greater number of nonsynonymous polymorphisms within human transmembrane regions than expected based on interspecific comparisons, and they are inconsistent with a neutral equilibrium model. This pattern of excess nonsynonymous polymorphism is not seen within chimpanzees. Statistical tests of neutrality, such as Tajima's D test, and the D and F tests proposed by Fu and Li, indicate an excess of low frequency polymorphisms in the human data, but not in the chimpanzee data. This is consistent with recent directional selection, a population bottleneck or background selection of slightly deleterious mutations in human mtDNA samples. The analyses further support the idea that mitochondrial genome evolution is governed by selective forces that have the potential to affect its use as a "neutral" marker in evolutionary and population genetic studies.
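Entry 18 reads an excess of low-frequency polymorphisms off Tajima's D. The sketch below implements the standard published formula for D on a toy alignment, to show why such an excess gives a negative value. It is not the authors' code: the function name `tajimas_d`, the toy sequences, and the choice to return NaN when there are no segregating sites are all assumptions.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for equal-length aligned sequences (hypothetical helper)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in sequences}) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism

    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(sequences, 2)
    ) / n_pairs

    # Constants from Tajima's variance derivation.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: every variant is a singleton (excess of low-frequency variants).
singletons = ["TAAA", "ATAA", "AATA", "AAAT", "AAAA"]
# Toy data: two haplotypes at equal, intermediate frequency.
balanced = ["AAA", "AAA", "TTT", "TTT"]

d_singletons = tajimas_d(singletons)
d_balanced = tajimas_d(balanced)
print(d_singletons, d_balanced)
```

The singleton-only sample yields a negative D and the balanced sample a positive one. The abstract's finding for the human data corresponds to the first pattern.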

19.
General parameters of selection, such as the frequency and strength of positive selection in natural populations or the role of introgression, are still insufficiently understood. The house mouse (Mus musculus) is a particularly well-suited model system to approach such questions, since it has a defined history of splits into subspecies and populations and since extensive genome information is available. We have used high-density single-nucleotide polymorphism (SNP) typing arrays to assess genomic patterns of positive selection and introgression of alleles in two natural populations of each of the subspecies M. m. domesticus and M. m. musculus. Applying different statistical procedures, we find a large number of regions subject to apparent selective sweeps, indicating frequent positive selection on rare alleles or novel mutations. Genes in the regions include well-studied imprinted loci (e.g. Plagl1/Zac1), homologues of human genes involved in adaptations (e.g. alpha-amylase genes) or in genetic diseases (e.g. Huntingtin and Parkin). Haplotype matching between the two subspecies reveals a large number of haplotypes that show patterns of introgression from specific populations of the respective other subspecies, with at least 10% of the genome being affected by partial or full introgression. Using neutral simulations for comparison, we find that the size and the fraction of introgressed haplotypes are not compatible with a pure migration or incomplete lineage sorting model. Hence, it appears that introgressed haplotypes can rise in frequency due to positive selection and thus can contribute to the adaptive genomic landscape of natural populations. Our data support the notion that natural genomes are subject to complex adaptive processes, including the introgression of haplotypes from other differentiated populations or species at a larger scale than previously assumed for animals. 
This implies that some of the admixture found in inbred strains of mice may also have a natural origin.

20.
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
