Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Nonallelic homologous recombination (NAHR), occurring between low-copy repeats (LCRs) >10 kb in size and sharing >97% DNA sequence identity, is responsible for the majority of recurrent genomic rearrangements in the human genome. Recent studies have shown that transposable elements (TEs) can also mediate recurrent deletions and translocations, indicating the features of substrates that mediate NAHR may be significantly less stringent than previously believed. Using >4 kb length and >95% sequence identity criteria, we analyzed the genome-wide distribution of long interspersed element (LINE) retrotransposons and their potential to mediate NAHR. We identified 17 005 directly oriented LINE pairs located <10 Mbp from each other as potential NAHR substrates, placing 82.8% of the human genome at risk of LINE–LINE-mediated instability. Cross-referencing these regions with CNVs in the Baylor College of Medicine clinical chromosomal microarray database of 36 285 patients, we identified 516 CNVs potentially mediated by LINEs. Using long-range PCR of five different genomic regions in a total of 44 patients, we confirmed that the CNV breakpoints in each patient map within the LINE elements. To additionally assess the scale of the LINE–LINE/NAHR phenomenon in the human genome, we tested DNA samples from six healthy individuals on a custom aCGH microarray targeting LINE elements predicted to mediate CNVs and identified 25 LINE–LINE rearrangements. Our data indicate that LINE–LINE-mediated NAHR is widespread and under-recognized, and is an important mechanism of structural rearrangement contributing to human genomic variability.
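
The selection criteria in this abstract (length >4 kb, identity >95%, direct orientation, <10 Mbp apart) can be sketched as a simple pair filter. This is only an illustration, not the authors' pipeline: the `LineElement` record, the `candidate_pairs` function, the caller-supplied `identity` function, and the choice to measure distance as the gap between the two elements are all assumptions introduced here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class LineElement:
    """Hypothetical record for one annotated LINE element."""
    chrom: str
    start: int   # 0-based start coordinate, bp
    end: int     # exclusive end coordinate, bp
    strand: str  # "+" or "-"


# Thresholds quoted in the abstract.
MIN_LENGTH_BP = 4_000
MIN_IDENTITY = 0.95
MAX_DISTANCE_BP = 10_000_000


def candidate_pairs(elements, identity):
    """Return LINE pairs that satisfy the abstract's NAHR-substrate criteria.

    `identity` is a caller-supplied function (a, b) -> fraction of identical
    bases; how identity is computed is not specified in the abstract.
    """
    long_enough = [e for e in elements if e.end - e.start > MIN_LENGTH_BP]
    ordered = sorted(long_enough, key=lambda e: (e.chrom, e.start))
    pairs = []
    for a, b in combinations(ordered, 2):
        # Directly oriented pairs lie on the same chromosome and strand.
        if a.chrom != b.chrom or a.strand != b.strand:
            continue
        # Assumption: distance is the gap from the end of a to the start of b.
        if b.start - a.end >= MAX_DISTANCE_BP:
            continue
        if identity(a, b) > MIN_IDENTITY:
            pairs.append((a, b))
    return pairs
```

On a genome-scale annotation the all-pairs loop would be too slow; a windowed scan over the sorted elements would be the obvious refinement.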

2.
The diploid genome sequence of an individual human   Total citations: 4 (self-citations: 1, citations by others: 3)
Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, it involves 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
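
The 22% non-SNP share quoted above can be checked against the variant counts that the abstract itemizes. The tally below is a rough sketch: it uses only the five enumerated categories and leaves out segmental duplications and copy number variation regions, for which the abstract gives no counts. The dictionary keys and variable names are illustrative.

```python
# Variant counts as listed in the abstract (enumerated categories only).
variant_counts = {
    "SNP": 3_213_401,
    "block substitution": 53_823,
    "heterozygous indel": 292_102,
    "homozygous indel": 559_473,
    "inversion": 90,
}

total = sum(variant_counts.values())
non_snp = total - variant_counts["SNP"]
non_snp_share = non_snp / total

print(f"{total:,} variants, {non_snp:,} non-SNP ({non_snp_share:.1%})")
# prints: 4,118,889 variants, 905,488 non-SNP (22.0%)
```

The total of about 4.12 million agrees with the "more than 4.1 million DNA variants" stated in the abstract.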

3.
High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5–20 kb read with 4%–15% error per base), phasing performance was substantially improved.

4.
A simple isothermal nucleic-acid amplification reaction, primer generation–rolling circle amplification (PG–RCA), was developed to detect specific nucleic-acid sequences of sample DNA. This amplification method is achievable at a constant temperature (e.g. 60°C) simply by mixing circular single-stranded DNA probe, DNA polymerase and nicking enzyme. Unlike conventional nucleic-acid amplification reactions such as polymerase chain reaction (PCR), this reaction does not require exogenous primers, which often cause primer dimerization or non-specific amplification. Instead, ‘primers’ are generated and accumulated during the reaction. The circular probe carries only two sequences: (i) a hybridization sequence to the sample DNA and (ii) a recognition sequence of the nicking enzyme. In PG–RCA, the circular probe first hybridizes with the sample DNA, and then a cascade reaction of linear rolling circle amplification and nicking reactions takes place. In contrast with conventional linear rolling circle amplification, the signal amplification is in an exponential mode since many copies of ‘primers’ are successively produced by multiple nicking reactions. Under the optimized condition, we obtained a remarkable sensitivity of 84.5 ymol (50.7 molecules) of synthetic sample DNA and 0.163 pg (~60 molecules) of genomic DNA from Listeria monocytogenes, indicating strong applicability of PG–RCA to various molecular diagnostic assays.

5.
The human genes encoding α1-antitrypsin (α1AT, gene symbol PI), corticosteroid-binding globulin (CBG), α1-antichymotrypsin (AACT), and protein C inhibitor (PCI) are related by descent, and they all map to human chromosome 14q32.1. This serine protease inhibitor (serpin) gene cluster also contains an antitrypsin-related sequence (ATR, gene symbol PIL), but the precise molecular organization of this region has not been defined. In this report we describe the generation and characterization of a 370-kb cosmid contig that includes all five serpin genes. Moreover, a newly described serpin, kallistatin (KAL, gene symbol PI4), was also mapped within the region. Gene order within this interval is cen–CBG–ATR–α1AT–KAL–PCI–AACT–tel. The genes occupy 320 kb of genomic DNA, and they are organized into two discrete subclusters of three genes each that are separated by 170 kb. The distal subcluster includes KAL, PCI, and AACT; it occupies 63 kb of DNA, and all three genes are transcribed in a proximal-to-distal orientation. Within the subcluster, there is 12 kb of intergenic DNA between KAL and PCI and 19 kb between PCI and AACT. The proximal subcluster includes α1AT, ATR, and CBG; it occupies 90 kb of genomic DNA, with 12 kb of DNA between α1AT and ATR and 40 kb between ATR and CBG. These genes are all transcribed in a distal-to-proximal orientation. This represents the first detailed physical map of the serpin gene cluster on 14q32.1.

6.
Transformation-associated recombination (TAR) cloning in yeast is used to isolate a desired chromosomal region or gene from a complex genome without construction of a genomic library. The technique involves homologous recombination during yeast spheroplast transformation between genomic DNA and a TAR vector containing short 5′ and 3′ gene-specific targeting hooks. Efficient gene capture requires a high yield of transformants, and we demonstrate here that the transformant yield increases ~10-fold when the genomic DNA is sheared to 100–200 kb before being presented to the spheroplasts. Here we determine the most effective concentration of genomic DNA, and also show that the targeted sequences recombine much more efficiently with the vector’s targeting hooks when they are located at the ends of the genomic DNA fragment. We demonstrate that the yield of gene-positive clones increases ~20-fold after endonuclease digestion of genomic DNA, which caused double strand breaks near the targeted sequences. These findings have led to a greatly improved protocol.

7.
To determine the influence of increased gene expression and amplification in colorectal carcinoma on chromatin structure, the nuclear distances between pairs of bacterial artificial chromosome (BAC) clones with genomic separation from 800 to 29,000 kb were measured and compared between the tumor and parallel epithelial cells of six patients. The nuclear distances were measured between the loci in chromosomal bands 7p22.3–7p21.3; 7q35–7q36.3; 11p15.5–11p15.4; 20p13; 20p12.2; 20q11.21 and 20q12 where increased expression had been found in all types of colorectal carcinoma. The loci were visualized by three-dimensional fluorescence in situ hybridization using 22 BAC clones. Our results show that for short genomic separations, mean nuclear distance increases linearly with increased genomic separation. The results for some pairs of loci fell outside this linear slope, indicating the existence of different levels of chromatin folding. For the same genomic separations the nuclear distances were frequently shorter for tumor as compared with epithelial cells. Above the initial growing phase of the nuclear distances, a plateau phase was observed in both cell types where the increase in genomic separation was not accompanied by an increase in nuclear distance. The ratio of the mean nuclear distances between the corresponding loci in tumor and epithelium cells decreases with increasing amplification of loci. Our results further show that the large-scale chromatin folding might differ for specific regions of chromosomes and that it is basically preserved in tumor cells in spite of the amplification of many loci. Communicated by T. Hassold

8.
Mnt is a repressor from phage P22 that belongs to the ribbon–helix–helix family of DNA binding factors. Four amino acids from the N-terminus of the protein, Arg2, His6, Asn8 and Arg10, interact with the base pairs of the DNA to provide the sequence specificity. Raumann et al. (Nature Struct. Biol., 2, 1115–1122) identified position 6 as a ‘master residue’ that controls the specificity of the protein. Models for the interaction have residue 6 of Mnt interacting directly with position 5 of the operator. In vivo selections demonstrated that protein variants at residue 6 bound specifically to operator mutations at that position. Operators in which the wild-type G at position 5 was replaced by T specifically bound to several different protein variants, primarily hydrophobic residues. The obtained protein variants, plus some others, were used in in vitro selections to determine their preferred binding sites. The results showed that the residue at position 6 influenced the preference for binding site bases predominantly at position 5, but that the effects of altering it can extend over longer distances, consistent with its designation as a ‘master residue’. The similarities of binding sites for different residues do not correlate strongly with common measures of amino acid similarities.

9.
Detection of low-level DNA mutations can reveal recurrent, hotspot genetic changes of clinical relevance to cancer, prenatal diagnostics, organ transplantation or infectious diseases. However, the high excess of wild-type (WT) alleles, which are concurrently present, often hinders identification of salient genetic changes. Here, we introduce UV-mediated cross-linking minor allele enrichment (UVME), a novel approach that incorporates ultraviolet irradiation (∼365 nm UV) DNA cross-linking either before or during PCR amplification. Oligonucleotide probes matching the WT target sequence and incorporating a UV-sensitive 3-cyanovinylcarbazole nucleoside modification are employed for cross-linking WT DNA. Mismatches formed with mutated alleles reduce DNA binding and UV-mediated cross-linking and favor mutated DNA amplification. UV can be applied before PCR and/or at any stage during PCR to selectively block WT DNA amplification and enable identification of traces of mutated alleles. This enables a single-tube PCR reaction directly from genomic DNA combining optimal pre-amplification of mutated alleles, which then switches to UV-mediated mutation enrichment-based DNA target amplification. UVME cross-linking enables enrichment of mutated KRAS and p53 alleles, which can be screened directly via Sanger sequencing, high-resolution melting, TaqMan genotyping or digital PCR, resulting in the detection of mutation allelic frequencies of 0.001–0.1% depending on the endpoint detection method. UV-mediated mutation enrichment provides new potential for mutation enrichment in diverse clinical samples.

10.
Copy number variants (CNVs) are currently defined as genomic sequences that are polymorphic in copy number and range in length from 1000 to several million base pairs. Among current array-based CNV detection platforms, long-oligonucleotide arrays promise the highest resolution. However, the performance of currently available analytical tools suffers when applied to these data because of the lower signal:noise ratio inherent in oligonucleotide-based hybridization assays. We have developed wuHMM, an algorithm for mapping CNVs from array comparative genomic hybridization (aCGH) platforms comprised of 385 000 to more than 3 million probes. wuHMM is unique in that it can utilize sequence divergence information to reduce the false positive rate (FPR). We apply wuHMM to 385K-aCGH, 2.1M-aCGH and 3.1M-aCGH experiments comparing the 129X1/SvJ and C57BL/6J inbred mouse genomes. We assess wuHMM's performance on the 385K platform by comparison to the higher resolution platforms and we independently validate 10 CNVs. The method requires no training data and is robust with respect to changes in algorithm parameters. At a FPR of <10%, the algorithm can detect CNVs with five probes on the 385K platform and three on the 2.1M and 3.1M platforms, resulting in effective resolutions of 24 kb, 2–5 kb and 1 kb, respectively.

11.
We have investigated the large-scale organization of the human chAB4-related long-range multisequence family, a low copy-number repetitive DNA located in the pericentromeric heterochromatin of several human chromosomes. Analysis of genomic clones revealed large-scale (~100 kb or more) sequence conservation in the region flanking the prototype chAB4 element. We demonstrated that this low copy-number family is connected to another long-range repeat, the NF1-related (ΨNF1) multisequence. The two DNA types are joined by an ~2 kb-long tandem repeat of a 48-bp satellite. Although the chAB4- and NF1-like sequences were known to have essentially the same chromosomal localization, their close association is reported here for the first time. It indicates that they are not two independent long-range DNA families, but are parts of a single element spanning ~200 kb or more. This view is consistent both with their similar chromosomal localizations and the high levels of sequence conservation among copies found on different chromosomes. We suggest that the master copy of the linked chAB4–ΨNF1 DNA segment appeared first on the ancestor of human chromosome 17.

12.
We have assessed the numbers of potentially deleterious variants in the genomes of apparently healthy humans by using (1) low-coverage whole-genome sequence data from 179 individuals in the 1000 Genomes Pilot Project and (2) current predictions and databases of deleterious variants. Each individual carried 281–515 missense substitutions, 40–85 of which were homozygous, predicted to be highly damaging. They also carried 40–110 variants classified by the Human Gene Mutation Database (HGMD) as disease-causing mutations (DMs), 3–24 variants in the homozygous state, and many polymorphisms putatively associated with disease. Whereas many of these DMs are likely to represent disease-allele-annotation errors, between 0 and 8 DMs (0–1 homozygous) per individual are predicted to be highly damaging, and some of them provide information of medical relevance. These analyses emphasize the need for improved annotation of disease alleles both in mutation databases and in the primary literature; some HGMD mutation data have been recategorized on the basis of the present findings, an iterative process that is both necessary and ongoing. Our estimates of deleterious-allele numbers are likely to be subject to both overcounting and undercounting. However, our current best mean estimates of ∼400 damaging variants and ∼2 bona fide disease mutations per individual are likely to increase rather than decrease as sequencing studies ascertain rare variants more effectively and as additional disease alleles are discovered.

13.
14.
‘Indirect readout’ refers to the proposal that proteins can recognize the intrinsic three-dimensional shape or flexibility of a DNA binding sequence apart from direct protein contact with DNA base pairs. The differing affinities of human papillomavirus (HPV) E2 proteins for different E2 binding sites have been proposed to reflect indirect readout. DNA bending has been observed in X-ray structures of E2 protein–DNA complexes. X-ray structures of three different E2 DNA binding sites revealed differences in intrinsic curvature. DNA sites with intrinsic curvature in the direction of protein-induced bending were bound more tightly by E2 proteins, supporting the indirect readout model. We now report solution measurements of intrinsic DNA curvature for three E2 binding sites using a sensitive electrophoretic phasing assay. Measured E2 site curvature agrees well with the predictions of a dinucleotide model and supports an indirect readout hypothesis for DNA recognition by HPV E2.

15.
X-ray analysis of enzyme–DNA interactions is very informative in revealing molecular contacts, but provides neither quantitative estimates of the relative importance of these contacts nor information on the relative contributions of specific and nonspecific interactions to the total affinity of enzymes for specific DNA. A stepwise increase in the ligand complexity approach is used to estimate the relative contributions of virtually every nucleotide unit of synthetic DNA containing abasic sites to its affinity for apurinic/apyrimidinic endonuclease (APE1) from human placenta. It was found that APE1 interacts with 9–10 nt units or base pairs of single-stranded and double-stranded ribooligonucleotides and deoxyribooligonucleotides of different lengths and sequences, mainly through weak additive contacts with internucleotide phosphate groups. Such nonspecific interactions of APE1 with nearly every nucleotide within its DNA-binding cleft provide up to seven orders of magnitude (ΔG° ~ −8.7 to −9.0 kcal/mol) of the enzyme affinity for any DNA substrate. In contrast, interactions with the abasic site together with other specific APE1–DNA interactions provide only one order of magnitude (ΔG° ~ −1.1 to −1.5 kcal/mol) of the total affinity of APE1 for specific DNA. We conclude that the enzyme's specificity for abasic sites in DNA is mostly due to a great increase (six to seven orders of magnitude) in the reaction rate with specific DNA, with formation of the Michaelis complex contributing to the substrate preference only marginally.
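
The link between the free energies and the "orders of magnitude" quoted above follows from the standard relation ΔG° = −RT ln K. The sketch below converts a ΔG° value into the corresponding power of ten in affinity. The temperature of 298.15 K is an assumption, since the abstract does not state the experimental temperature, and the function name is illustrative.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def orders_of_magnitude(delta_g_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy to log10 of the affinity contribution.

    Uses dG = -RT ln K, so log10 K = -dG / (RT ln 10).
    The default temperature is an assumption, not taken from the abstract.
    """
    return -delta_g_kcal_per_mol / (
        R_KCAL_PER_MOL_K * temperature_k * math.log(10)
    )


for dg in (-8.7, -9.0, -1.1, -1.5):
    print(f"dG = {dg:5.1f} kcal/mol -> 10^{orders_of_magnitude(dg):.1f}")
```

Under this assumption, −8.7 to −9.0 kcal/mol corresponds to roughly 10^6.4 to 10^6.6, and −1.1 to −1.5 kcal/mol to roughly 10^0.8 to 10^1.1, in line with the "up to seven" and "one" orders of magnitude stated in the abstract.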

16.
The physical distance between DNA sequences in interphase nuclei was determined using eight cosmids containing fragments of the Chinese hamster genome that span 273 kb surrounding the dihydrofolate reductase (DHFR) gene. The distance between these sequences at the molecular level has been determined previously by restriction enzyme mapping (J.E. Looney and J.L. Hamlin, 1987, Mol. Cell Biol. 7: 569-577; C. Ma et al., 1988, Mol. Cell Biol. 8: 2316-2327). Fluorescence in situ hybridization was used to localize the DNA sequences in interphase nuclei of cells bearing only one copy of this genomic region. The distance between DNA sequences in interphase nuclei was correlated to molecular distance over a range of 25 to at least 250 kb. The observed relationship was such that genomic distance could be predicted to within 40 kb from interphase distance. The correct order of seven probes was derived from interphase distances measured for 19 pair-wise combinations of the probes. Measured distances between sequences approximately 200 kb apart indicate that the DNA is condensed 70- to 100-fold in hybridized nuclei relative to a linear DNA helix molecule. Cell lines with chromosome inversions were used to show that interphase distance increases with genomic distance in the 50-90 Mb range, but less steeply than in the 25-250 kb range.
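
To put the 70- to 100-fold condensation figure in physical units, the sketch below estimates the contour length of 200 kb of linear B-form DNA and the interphase distance implied by each compaction ratio. The rise of 0.34 nm per base pair is the textbook B-DNA value and is an assumption here, since the abstract does not state the value the authors used. The function names are illustrative.

```python
# Axial rise per base pair of B-form DNA, in nanometres (textbook value).
RISE_NM_PER_BP = 0.34


def contour_length_um(genomic_distance_bp):
    """Length of fully extended linear B-DNA, in micrometres."""
    return genomic_distance_bp * RISE_NM_PER_BP / 1000.0


def interphase_distance_um(genomic_distance_bp, fold_condensation):
    """Distance implied by a given linear compaction ratio, in micrometres."""
    return contour_length_um(genomic_distance_bp) / fold_condensation


extended = contour_length_um(200_000)
print(f"200 kb extended: {extended:.0f} um")
for fold in (70, 100):
    distance = interphase_distance_um(200_000, fold)
    print(f"{fold}-fold condensed: {distance:.2f} um")
```

With these assumptions, 200 kb spans about 68 µm when extended, so a 70- to 100-fold condensation corresponds to interphase distances of roughly 0.7 to 1.0 µm.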

17.
Rapid determination of short DNA sequences by the use of MALDI-MS   Total citations: 3 (self-citations: 3, citations by others: 0)
We have developed a protocol for rapid sequencing of short DNA stretches (15–20 nt) using MALDI-TOF-MS. The protocol is based on the Sanger concept with the modification that double-stranded template DNA is used and all four sequencing reactions are performed in one reaction vial. The sequencing products are separated and detected by MALDI-TOF-MS and the sequence is determined by comparing measured molecular mass differences to expected values. The protocol is optimized for low costs and broad applicability. One reaction typically includes 300 fmol template, 10 pmol primer and 200 pmol each nucleotide monomer. Neither the primer nor any of the nucleotide monomers are labeled. Solid phase purification, concentration and mass spectrometric sample preparation of the sequencing products are accomplished in a few minutes and parallel processing of 96 samples is possible. The mass spectrometric analyses and subsequent sequence read-out require only a few seconds per template.
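
The read-out step described above, comparing mass differences between successive termination products with expected values, can be sketched as follows. This is a simplified illustration rather than the published procedure: it assumes a clean ladder of singly charged peaks, uses approximate average residue masses for the four nucleotides within a DNA chain, and applies an arbitrary tolerance of 2 Da. The names `RESIDUE_MASS_DA` and `call_bases` are hypothetical.

```python
# Approximate average masses (Da) of nucleotide residues within a DNA chain.
RESIDUE_MASS_DA = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def call_bases(peak_masses, tolerance_da=2.0):
    """Read a sequence from the mass differences of adjacent ladder peaks.

    Each difference is matched to the nearest residue mass; differences
    outside the tolerance are reported as "N". The tolerance is an
    illustrative choice, smaller than half the closest spacing (A vs T, ~9 Da).
    """
    peaks = sorted(peak_masses)
    calls = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        difference = heavier - lighter
        base, expected = min(
            RESIDUE_MASS_DA.items(), key=lambda item: abs(item[1] - difference)
        )
        calls.append(base if abs(expected - difference) <= tolerance_da else "N")
    return "".join(calls)
```

Because each product ends in a dideoxynucleotide, the difference between neighbouring peaks equals the residue mass of the base added in the longer product, which is why the unmodified residue masses can be used for the comparison.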

18.
Plastid genomes exhibit different levels of variability in their sequences, depending on the respective kinds of genomic regions. Genes are usually more conserved while noncoding introns and spacers evolve at a faster pace. While a set of about thirty maximally variable noncoding genomic regions has been suggested to provide universally promising phylogenetic markers throughout angiosperms, applications often require several regions to be sequenced for many individuals. Our project aims to illuminate evolutionary relationships and species-limits in the genus Pyrus (Rosaceae)—a typical case with very low genetic distances between taxa. In this study, we have sequenced the plastid genome of Pyrus spinosa and aligned it to the already available P. pyrifolia sequence. The overall p-distance of the two Pyrus genomes was 0.00145. The intergenic spacers between ndhC–trnV, trnR–atpA, ndhF–rpl32, psbM–trnD, and trnQ–rps16 were the most variable regions, also comprising the highest total numbers of substitutions, indels and inversions (potentially informative characters). Our comparative analysis of further plastid genome pairs with similar low p-distances from Oenothera (representing another rosid), Olea (asterids) and Cymbidium (monocots) showed in each case a different ranking of genomic regions in terms of variability and potentially informative characters. Only two intergenic spacers (ndhF–rpl32 and trnK–rps16) were consistently found among the 30 top-ranked regions. We have mapped the occurrence of substitutions and microstructural mutations in the four genome pairs. High AT content in specific sequence elements seems to foster frequent mutations. We conclude that the variability among the fastest evolving plastid genomic regions is lineage-specific and thus cannot be precisely predicted across angiosperms. The often lineage-specific occurrence of stem-loop elements in the sequences of introns and spacers also governs lineage-specific mutations. Sequencing whole plastid genomes to find markers for evolutionary analyses is therefore particularly useful when overall genetic distances are low.
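
The p-distance reported above is the proportion of aligned sites at which two sequences differ. A minimal sketch is given below. It assumes that alignment columns containing a gap in either sequence are skipped, which is one common convention; the abstract does not say how gapped positions were treated. The function name is illustrative.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are ignored (an assumption;
    other treatments of gaps are possible).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


print(p_distance("ACGTACGTAC", "ACGTACGAAC"))  # one difference in ten sites
```

A p-distance of 0.00145 therefore means that between one and two sites in every thousand compared differ between the two Pyrus plastid genomes.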

19.
Restriction endonucleases are highly specific in recognizing the particular DNA sequence they act on. However, their activity is affected by sequence context, enzyme concentration and buffer composition. Changes in these factors may lead to either ineffective cleavage at the cognate restriction site or relaxed specificity allowing cleavage of degenerate ‘star’ sites. Additionally, uncharacterized restriction endonucleases and engineered variants present novel activities. Traditionally, restriction endonuclease activity is assayed on simple substrates such as plasmids and synthesized oligonucleotides. We present and use high-throughput Illumina sequencing-based strategies to assay the sequence specificity and flanking sequence preference of restriction endonucleases. The techniques use fragmented DNA from sequenced genomes to quantify restriction endonuclease cleavage on a complex genomic DNA substrate in a single reaction. By mapping millions of restriction site–flanking reads back to the Escherichia coli and Drosophila melanogaster genomes we were able to quantitatively characterize the cognate and star site activity of EcoRI and MfeI and demonstrate genome-wide decreases in star activity with engineered high-fidelity variants EcoRI-HF and MfeI-HF, as well as quantify the influence on MfeI cleavage conferred by flanking nucleotides. The methods presented are readily applicable to all type II restriction endonucleases that cleave both strands of double-stranded DNA.
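
The distinction between cognate and degenerate ‘star’ sites can be illustrated with a small scan of a reference sequence. The recognition sequences used are the known ones for EcoRI (GAATTC) and MfeI (CAATTG). Treating a star site as any window that differs from the cognate site at exactly one position is an assumption made for this sketch, and the function name is hypothetical; the abstract does not define star sites this way, nor does this reproduce the authors' read-mapping analysis.

```python
# Recognition sequences of the two enzymes named in the abstract.
COGNATE_SITES = {"EcoRI": "GAATTC", "MfeI": "CAATTG"}


def classify_sites(sequence, site):
    """Return (cognate_positions, star_positions) as 0-based offsets.

    A star site is taken here to be a window with exactly one mismatch
    to the cognate site (an illustrative definition). Both recognition
    sequences are palindromic, so scanning one strand finds every exact match.
    """
    sequence = sequence.upper()
    cognate, star = [], []
    for offset in range(len(sequence) - len(site) + 1):
        window = sequence[offset:offset + len(site)]
        mismatches = sum(x != y for x, y in zip(window, site))
        if mismatches == 0:
            cognate.append(offset)
        elif mismatches == 1:
            star.append(offset)
    return cognate, star


print(classify_sites("TTGAATTCAAGAATTAGG", COGNATE_SITES["EcoRI"]))
# prints: ([2], [10])
```

In a sequencing-based assay of the kind described, read ends mapping to each class of position would then be counted to compare cleavage at cognate and star sites.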

20.
The twist, rise, slide, shift, tilt and roll between adjoining base pairs in DNA depend on the identity of the bases. The resulting dependence of the double helix conformation on the nucleotide sequence is important for DNA recognition by proteins, packaging and maintenance of genetic material, and other interactions involving DNA. This dependence, however, is obscured by poorly understood variations in the stacking geometry of the same adjoining base pairs within different sequence contexts. In this article, we approach the problem of sequence-dependent DNA conformation by statistical analysis of X-ray and NMR structures of DNA oligomers. We evaluate the corresponding helical coherence length—a cumulative parameter quantifying sequence-dependent deviations from the ideal double helix geometry. We find, e.g. that the solution structure of synthetic oligomers is characterized by 100–200 Å coherence length, which is similar to ~150 Å coherence length of natural, salmon-sperm DNA. Packing of oligomers in crystals dramatically alters their helical coherence. The coherence length increases to 800–1200 Å, consistent with its theoretically predicted role in interactions between DNA at close separations.
