Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Yeast artificial chromosomes (YACs) have recently provided a potential route to long-range coverage of complex genomes in contiguous cloned DNA. In a pilot project for 50 Mb (1.5% of the human genome), a variety of techniques have been applied to assemble Xq24–q28 YAC contigs up to 8 Mb in length and assess their quality. The results indicate the relative strength of several approaches and support the adequacy of YAC-based methods for mapping the human genome.

2.
The genomes of several vertebrates, including six mammals, the chicken, Xenopus and four ray-finned fishes, have been sequenced or are currently being sequenced to provide a better understanding of the human genome through comparative analysis. However, this list does not include cartilaginous fishes, which are the most basal living jawed vertebrates [1]. The genomes of the current ‘popular’ cartilaginous fishes such as the nurse shark, dogfish, and horn shark are larger than the human genome (∼3800 Mb to 7000 Mb) [2], and are not attractive for whole-genome sequencing. Here, we report the characterization of the relatively small genome (1200 Mb) of a cartilaginous fish, the elephant fish (Callorhinchus milii), and propose it as a model for whole-genome sequencing.

3.
Novel sequences are DNA sequences present in an individual's genome but absent in the human reference assembly. They are predicted to be biologically important, both individual and population specific, and consistent with the known human migration paths. Recent works have shown that an average person harbors 2–5 Mb of such sequences and estimated that the human pan-genome contains as much as 19–40 Mb of novel sequences. To identify them in a de novo genome assembly, some existing sequence aligners have been used, but no computational method has been specifically proposed for this task. In this work, we developed NSIT (Novel Sequence Identification Tool), a software tool that can accurately and efficiently identify novel sequences in an individual's de novo whole genome assembly. We identified and characterized 1.1 Mb, 1.2 Mb, and 1.0 Mb of novel sequences in NA18507 (African), YH (Asian), and NA12878 (European) de novo genome assemblies, respectively. Our results show very high concordance with the previous work using the respective reference assembly. In addition, our results using the latest human reference assembly suggest that the amount of novel sequences per individual may not be as high as previously reported. We additionally developed a graphical viewer for comparisons of novel sequence contents. The viewer also helped in identifying sequence contamination; we found 130 kb of Epstein-Barr virus sequence in the previously published NA18507 novel sequences as well as 287 kb of zebrafish repeats in the NA12878 de novo assembly. NSIT requires 2 GB of RAM and 1.5–2 h on a commodity desktop. The program is applicable to input assemblies with varying contig/scaffold sizes, ranging from 100 bp to as high as 50 Mb. It works on both 32-bit and 64-bit systems and outperforms, by large margins, other fast sequence aligners previously applied to this task. To our knowledge, NSIT is the first software designed specifically for novel sequence identification in a de novo human genome assembly.

4.
In this report we present the results of the analysis of approximately 2.7 Mb of genomic information for the American mink (Neovison vison) derived through BAC end sequencing. Our study, which encompasses approximately 1/1000th of the mink genome, suggests that simple sequence repeats (SSRs) are less common in the mink than in the human genome, whereas the average GC content of the mink genome is slightly higher than that of its human counterpart. The 2.7 Mb mink genomic dataset also contained 2,416 repeat elements (retroids and DNA transposons) occupying almost 31% of the sequence space. Among repeat elements, LINEs were over-represented and endogenous viruses (aka LTRs) under-represented in comparison to the human genome. Finally, we present a virtual map of the mink genome constructed with reference to the human and canine genome assemblies using a comparative genomics approach and incorporating over 200 mink BESs with unique hits to the human genome.

5.
Recent genome sequencing papers have given genome sizes of 180 Mb for Drosophila melanogaster Iso-1 and 125 Mb for Arabidopsis thaliana Columbia. The former agrees with early cytochemical estimates, but numerous cytometric estimates of around 170 Mb imply that a genome size of 125 Mb for arabidopsis is an underestimate. In this study, nuclei of species pairs were compared directly using flow cytometry. Co-run Columbia and Iso-1 female gave a 2C peak for arabidopsis only approx. 15 % below that for drosophila, and 16C endopolyploid Columbia nuclei had approx. 15 % more DNA than 2C chicken nuclei (with >2280 Mb). Caenorhabditis elegans Bristol N2 (genome size approx. 100 Mb) co-run with Columbia or Iso-1 gave a 2C peak for drosophila approx. 75 % above that for 2C C. elegans, and a 2C peak for arabidopsis approx. 57 % above that for C. elegans. This confirms that 1C in drosophila is approx. 175 Mb and, combined with other evidence, leads us to conclude that the genome size of arabidopsis is not approx. 125 Mb, but probably approx. 157 Mb. It is likely that the discrepancy represents extra repeated sequences in unsequenced gaps in heterochromatic regions. Complete sequencing of the arabidopsis genome until no gaps remain at telomeres, nucleolar organizing regions or centromeres is still needed to provide the first precise angiosperm C-value as a benchmark calibration standard for plant genomes, and to ensure that no genes have been missed in arabidopsis, especially in centromeric regions, which are clearly larger than once imagined.
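
The ratio arithmetic behind the 157 Mb and 175 Mb figures can be sketched as follows. This is only an illustration, assuming the C. elegans value of approx. 100 Mb is used as the reference and that 2C peak positions scale linearly with DNA content; the function name `scaled_genome_size_mb` is hypothetical and does not come from the study.

```python
def scaled_genome_size_mb(reference_mb, percent_above):
    """Scale a reference genome size by a 2C peak lying `percent_above` % higher."""
    # Integer percentages keep the arithmetic exact for these inputs.
    return reference_mb * (100 + percent_above) / 100


C_ELEGANS_MB = 100  # approximate reference size quoted in the abstract

# Peak offsets relative to C. elegans, as reported in the abstract.
arabidopsis_mb = scaled_genome_size_mb(C_ELEGANS_MB, 57)
drosophila_mb = scaled_genome_size_mb(C_ELEGANS_MB, 75)

print(f"arabidopsis: approx. {arabidopsis_mb:.0f} Mb")
print(f"drosophila:  approx. {drosophila_mb:.0f} Mb")
```

Under these assumptions the two estimates follow directly from the reported peak offsets; the abstract's final figures also draw on "other evidence" that this sketch does not model.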

6.
Libraries of the entire human genome, or regions of the genome, have been made in bacteria, yeast, and somatic cells. We have expanded this strategy using overlapping YACs and P1s from human 21q22.2 (the Down syndrome region) to create a panel of transgenic mice containing DNA that encompasses this region of the human genome. Together the members of the in vivo library, each with a unique transgene (four YACs and four P1s), contain approximately 2 Mb of contiguous DNA. The integrity, stable inheritance, and expression of a coding sequence for each member of the YAC panel are demonstrated, and the uses of the panel are described.

7.
Henk DA, Fisher MC. PLoS ONE, 2012, 7(2): e31268
Fungal genomes range in size from 2.3 Mb for the microsporidian Encephalitozoon intestinalis up to 8000 Mb for Entomophaga aulicae, with a mean genome size of 37 Mb. Basidiobolus, a common inhabitant of vertebrate guts, is distantly related to all other fungi, and is unique in possessing both EF-1α and EFL genes. Using DNA sequencing and a quantitative PCR approach, we estimated a haploid genome size for Basidiobolus at 350 Mb. However, based on allelic variation, the nuclear genome is at least diploid, leading us to believe that the final genome size is at least 700 Mb. We also found that EFL was in three times the copy number of its putatively functionally overlapping paralog EF-1α. This suggests that gene or genome duplication may be an important feature of B. ranarum evolution, and also suggests that B. ranarum may have mechanisms in place that favor the preservation of functionally overlapping genes.

8.
Most proterminal regions of human chromosomes are GC-rich and gene-rich. Chromosome 3p is an exception. Its proterminal region is GC-poor, and likely to lose heterozygosity, thus causing a number of fatal diseases. Except for one gap left in the telomeric position, the proterminal region of human chromosome 3p has been completely sequenced. The detailed sequence analysis showed: (i) the GC content of this region was 38.5%, being the lowest among all the human proterminal regions; (ii) this region contained 20 known genes and 22 predicted genes, with an average gene size of 97.5 kb. The previously mapped gene Cntn3 was not found in this region, but instead located in the 74 Mb position of human chromosome 3p; (iii) the interspersed repeats of this region were more active than the average level of the whole human genome, especially (TA)n, the content of which was twice the genome average; (iv) this region had a conserved synteny extending from 104.1 Mb to 112.4 Mb on the mouse chromosome 6, which was 8% larger in size, not in accordance with the whole genome comparison, probably because the 3pter-p26 region was more likely to lose nucleotides and its mouse synteny had more active interspersed repeats.

9.
Genome sizes of six different Wolbachia strains from insect and nematode hosts have been determined by pulsed-field gel electrophoresis of purified DNA both before and after digestion with rare-cutting restriction endonucleases. Enzymes SmaI, ApaI, AscI, and FseI cleaved the studied Wolbachia strains at a small number of sites and were used for the determination of the genome sizes of wMelPop, wMel, and wMelCS (each 1.36 Mb), wRi (1.66 Mb), wBma (1.1 Mb), and wDim (0.95 Mb). The Wolbachia genomes studied were all much smaller than the genomes of free-living bacteria such as Escherichia coli (4.7 Mb), as is typical for obligate intracellular bacteria. There was considerable genome size variability among Wolbachia strains, especially between the more parasitic A group Wolbachia infections of insects and the mutualistic C and D group infections of nematodes. The studies described here found no evidence for extrachromosomal plasmid DNA in any of the strains examined. They also indicated that the Wolbachia genome is circular.

10.
Most proterminal regions of human chromosomes are GC-rich and gene-rich. Chromosome 3p is an exception. Its proterminal region is GC-poor, and likely to lose heterozygosity, thus causing a number of fatal diseases. Except for one gap left in the telomeric position, the proterminal region of human chromosome 3p has been completely sequenced. The detailed sequence analysis showed: (i) the GC content of this region was 38.5%, being the lowest among all the human proterminal regions; (ii) this region contained 20 known genes and 22 predicted genes, with an average gene size of 97.5 kb. The previously mapped gene Cntn3 was not found in this region, but instead located in the 74 Mb position of human chromosome 3p; (iii) the interspersed repeats of this region were more active than the average level of the whole human genome, especially (TA)n, the content of which was twice the genome average; (iv) this region had a conserved synteny extending from 104.1 Mb to 112.4 Mb on the mouse chromosome 6, which was 8% larger in size, not in accordance with the whole genome comparison, probably because the 3pter-p26 region was more likely to lose nucleotides and its mouse synteny had more active interspersed repeats.

11.
Recent segmental and gene duplications in the mouse genome (cited by: 2; self-citations: 0; citations by others: 2)

Background

The high quality of the mouse genome draft sequence and its associated annotations are an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, detecting recent sequence duplications can reveal potentially problematic regions of the genome assembly. We use BLAST-based computational heuristics to identify large (≥ 5 kb) and recent (≥ 90% sequence identity) segmental duplications in the mouse genome sequence. Here we present a database of recently duplicated regions of the mouse genome found in the mouse genome sequencing consortium (MGSC) February 2002 and February 2003 assemblies.

Results

We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003 mouse genome sequence assembly is involved in recent segmental duplications, which is less than that observed in the human genome (around 3.5-5%). From this dataset, 8.9 Mb (26%) of the duplication content consisted of 'unmapped' chromosome sequence. Moreover, we suspect that an additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence misassignment errors in this genome assembly. By searching for genes that are located within these regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen of these genes appear to have been duplicated independently in the human genome. From our dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect gene essential for embryogenesis in mice.
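
The percentages quoted in these results can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the helper name `percent` and the variable names are illustrative assumptions, not part of the authors' pipeline.

```python
def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


assembly_mb = 2695.0          # February 2003 mouse assembly size, from the abstract
duplicated_mb = 33.6          # sequence in recent segmental duplications
unmapped_duplicated_mb = 8.9  # duplicated sequence on 'unmapped' chromosome sequence

dup_pct = percent(duplicated_mb, assembly_mb)
unmapped_pct = percent(unmapped_duplicated_mb, duplicated_mb)

print(f"duplicated fraction of assembly: {dup_pct:.1f}%")
print(f"unmapped share of duplications:  {unmapped_pct:.0f}%")
```

Both values round to the figures reported in the abstract (1.2% and 26%).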

Conclusion

Our results provide an initial analysis of the recently duplicated sequence and gene content of the mouse genome. Many of these duplicated loci, as well as regions identified to be involved in potential sequence misassignment errors, will require further mapping and sequencing to achieve accuracy. A Genome Browser database was set up to display the identified duplication content presented in this work. This data will also be relevant to the growing number of investigators who use the draft genome sequence for experimental design and analysis.

12.
The diploid genome sequence of an individual human (cited by: 4; self-citations: 1; citations by others: 3)
Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels) (1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.

13.

Background

Although the human genome sequence was declared complete in 2004, the sequence was interrupted by 341 gaps of which 308 lay in an estimated approximately 28 Mb of euchromatin. While these gaps constitute only approximately 1% of the sequence, knowledge of the full complement of human genes and regulatory elements is incomplete without their sequences.

Results

We have used a combination of conventional chromosome walking (aided by the availability of end sequences) in fosmid and bacterial artificial chromosome (BAC) libraries, whole chromosome shotgun sequencing, comparative genome analysis and long PCR to finish 8 of the 11 gaps in the initial chromosome 22 sequence. In addition, we have patched four regions of the initial sequence where the original clones were found to be deleted, or contained a deletion allele of a known gene, with a further 126 kb of new sequence. Over 1.018 Mb of new sequence has been generated to extend into and close the gaps, and we have annotated 16 new or extended gene structures and one pseudogene.

Conclusion

Thus, we have made significant progress to completing the sequence of the euchromatic regions of human chromosome 22 using a combination of detailed approaches. Our experience suggests that substantial work remains to close the outstanding gaps in the human genome sequence.

14.
A library of yeast artificial chromosomes (YACs) with human DNA inserts has been assembled from a human/hamster somatic cell hybrid containing Xq24-Xqter human DNA. Screening of the agar-embedded transformants for human DNA used a manifold of 3000 stainless-steel pins to transfer colonies onto the surface of media. This facilitated the recovery of the 1 in 300 clones that contained a human DNA insert (the remainder had hamster DNA and were discarded). The library described here consists of about two genomic equivalents (102 Mb) of human DNA in 467 clones: 167 were generated by EcoRI partial digestion and contain 25.5 Mb of human DNA; 252 used partial digestion with TaqI and cover 64.2 Mb; and 48 were from sheared DNA inserts and cover 11.7 Mb. Clones were screened by hybridization with 70 probes previously assigned to Xq24-Xq28. Eleven probes did not hybridize to any YACs in the library, and 16 probes hybridized to one YAC each, 23 to two, 13 to three, and 7 to four. Also, individual YACs large enough to detect features like the clustering of polymorphic sequences in subregions of Xq24-Xqter have been obtained. For example, XY58 contained five probe sequences previously independently isolated. The overall yield of YACs containing probe sequences was indistinguishable from Poisson statistical expectations for random cloning (P = 0.9). Thus, YAC libraries such as the one described here can include most, if not all, of the sequences in the source DNA from which the library is derived. These results support the possibility that YACs may provide a reliable bridge between linkage studies and conventional recombinant DNA analyses in mapping of the human genome.
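
The comparison with Poisson expectations mentioned above can be illustrated with a short calculation. This is a hedged sketch, not the authors' analysis: the abstract does not say how P = 0.9 was obtained, so the code only computes expected probe counts under a Poisson model whose mean is estimated from the reported hit counts (an assumption). All names are illustrative.

```python
import math

# Probes by number of hybridizing YACs, as reported in the abstract.
observed = {0: 11, 1: 16, 2: 23, 3: 13, 4: 7}

n_probes = sum(observed.values())
# Assumption: estimate the Poisson mean from the observed hits per probe.
mean_hits = sum(k * count for k, count in observed.items()) / n_probes


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Expected number of probes with k hits if clones were recovered at random.
expected = {k: n_probes * poisson_pmf(k, mean_hits) for k in observed}

for k in sorted(observed):
    print(f"{k} YACs: observed {observed[k]:2d}, expected {expected[k]:5.1f}")
```

The estimated mean of roughly 1.8 hits per probe is in line with the "about two genomic equivalents" stated for the library; a formal goodness-of-fit test would be needed to reproduce a P value.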

15.
Mapping the whole human genome by fingerprinting yeast artificial chromosomes (cited by: 18; self-citations: 0; citations by others: 18)
Physical mapping of the human genome has until now been envisioned through single chromosome strategies. We demonstrate that by using large insert yeast artificial chromosomes (YACs) a whole genome approach becomes feasible. YACs (22,000) of 810 kb mean size (5 genome equivalents) have been fingerprinted to obtain individual patterns of restriction fragments detected by a LINE-1 (L1) probe. More than 1000 contigs were assembled. Ten randomly chosen contigs were validated by metaphase chromosome fluorescence in situ hybridization, as well as by analyzing the inter-Alu PCR patterns of their constituent YACs. We estimate that 15% to 20% of the human genome, mainly the L1-rich regions, is already covered with contigs larger than 3 Mb.

16.
The genomes of nonhuman primates are powerful references for better understanding the recent evolution of the human genome. Here we compare the order of 802 genomic markers mapped in a rhesus macaque (Macaca mulatta) radiation hybrid panel with the human genome, allowing for nearly complete cross-reference to the human genome at an average resolution of 3.5 Mb. At least 23 large-scale chromosomal rearrangements, mostly inversions, are needed to explain the changes in marker order between human and macaque. Analysis of the breakpoints flanking inverted chromosomal segments and estimation of their duplication divergence dates provide additional evidence implicating segmental duplications as a major mechanism of chromosomal rearrangement in recent primate evolution.

17.
Dan S, Chen F, Choy KW, Jiang F, Lin J, Xuan Z, Wang W, Chen S, Li X, Jiang H, Leung TY, Lau TK, Su Y, Zhang W, Zhang X. PLoS ONE, 2012, 7(2): e27835
Fetal chromosomal abnormalities are the most common reasons for invasive prenatal testing. Currently, G-band karyotyping and several molecular genetic methods have been established for diagnosis of chromosomal abnormalities. Although these testing methods are highly reliable, their major limitation is that they offer restricted resolution or achieve only limited coverage of the human genome at one time. Massively parallel sequencing (MPS) technologies, which can reach single-base-pair resolution, allow detection of genome-wide intragenic deletions and duplications, challenging karyotyping and microarrays as the tools for prenatal diagnosis. Here we report a novel and robust MPS-based method to detect aneuploidy and imbalanced chromosomal arrangements in amniotic fluid (AF) samples. We sequenced 62 AF samples on the Illumina GAIIx platform and, with an average of 0.01× whole-genome sequencing data, detected 13 samples with numerical chromosomal abnormalities by z-test. With up to 2× whole-genome sequencing data, we were able to detect microdeletions/microduplications (ranging from 1.4 Mb to 37.3 Mb) in 5 samples from chorionic villus sampling (CVS) using the SeqSeq algorithm. Our work demonstrates that MPS is a robust and accurate approach to detect aneuploidy and imbalanced chromosomal arrangements in prenatal samples.

18.
Lund H, Nyegaard M, Svarrer T, Grove A, Sunde L. Gene, 2012, 497(2): 280-284

Introduction

Hydatidiform mole is an abnormal human pregnancy, characterised by absent or abnormal embryonic differentiation, vesicular chorionic villi and trophoblastic hyperplasia. Although the mole phenotype has hitherto not been correlated to mutations in the molar genome, the aetiology of hydatidiform moles is clearly genetic: most molar genomes analysed have had either a relative excess of paternal genome sets relative to maternal genome sets, or a global error in maternally imprinted genes, giving them a “paternal pattern”. However, it remains to be specified which gene(s) in the molar genome actually cause the molar phenotype when present in a state of “paternal excess” or “maternal deficiency”.

Material and methods

A molar pregnancy in a woman with a balanced translocation, t(2;5), was subjected to histopathological evaluation and genetic analyses of ploidy and parental origin of the genome.

Results

Morphology: Partial hydatidiform mole. Karyotyping of metaphase chromosomes: 69,XXY,der(5)t(2;5)(q23;q33)mat. SNP array analysis mapped the breakpoints to 2q31.2 (genome position 179 Mb) and 5q34 (genome position 165 Mb). DNA microsatellite marker analysis showed that for the regions not involved in the translocation, the conceptus had two paternal and one maternal allele(s). Telomeric to the breakpoint on chromosome 2, the mole had two paternal and two maternal alleles and telomeric to the breakpoint on chromosome 5 the mole had paternal alleles, exclusively.

Conclusions

If the molar phenotype is caused by paternal excess of one gene only, it is unlikely that this gene is located telomeric to genome position 179 Mb on chromosome 2. Similarly, if the complete mole phenotype is caused by the presence of exclusively paternally imprinted alleles of one gene, this gene is not located telomeric to genome position 165 Mb on chromosome 5.

19.
To determine the physical length of the chromosome of Campylobacter jejuni, the genome was subjected to digestion by a series of restriction endonucleases to produce a small number of large restriction fragments. These fragments were then separated by pulsed-field gel electrophoresis with the contour-clamped homogeneous electric field system. The DNA of C. jejuni, with its low G+C content, was found to have no restriction sites for enzymes NotI and SfiI, which cut high-G+C regions. Most of the restriction enzymes that were used resulted in DNA fragments that were either too numerous or too small for genome size determination, with the exception of the enzymes SalI (5' ... G↓TCGAC ... 3'), SmaI (5' ... CCC↓GGG ... 3'), and KpnI (5' ... GGTAC↓C ... 3'). With SalI, six restriction fragments with average values of 48.5, 80, 110, 220, 280, and 980 kilobases (kb) were obtained when calibrated with both a lambda DNA ladder and yeast Saccharomyces cerevisiae chromosome markers. The sum of these fragments yielded an average genome size of 1.718 megabases (Mb). With SmaI, nine restriction fragments with average values ranging from 39 to 371 kb were obtained, yielding an average genome size of 1.726 Mb. With KpnI, 11 restriction fragments with sizes ranging from 35 to 387.5 kb were obtained, yielding an average genome size of 1.717 Mb. A SalI restriction map was derived by partial digestion of the C. jejuni DNA. The genome sizes of C. laridis, C. coli, and C. fetus were also determined with the contour-clamped homogeneous electric field system by SalI, SmaI, and KpnI digestion. Average genome sizes were found to be 1.714 Mb for C. coli, 1.267 Mb for C. fetus subsp. fetus, and 1.451 Mb for C. laridis.
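
The genome size estimate from the SalI digest is simply the sum of the fragment sizes. A minimal sketch of that arithmetic, using the six average fragment sizes quoted above (the function and variable names are illustrative assumptions):

```python
# Average SalI fragment sizes in kilobases, as listed in the abstract.
sali_fragments_kb = [48.5, 80, 110, 220, 280, 980]


def genome_size_mb(fragments_kb):
    """Sum restriction fragment sizes (kb) and convert the total to megabases."""
    return sum(fragments_kb) / 1000.0


size_mb = genome_size_mb(sali_fragments_kb)
print(f"SalI estimate: {size_mb:.4f} Mb")
```

The fragments sum to 1718.5 kb, which agrees with the reported average of 1.718 Mb to within rounding.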

20.
The sequence of the human genome is not yet complete, and major gaps remain at the centromere region of each chromosome, which is composed of repetitive alpha satellite DNA. In this article, we describe the sequences in the vicinity of the centromere that are included in the current genome assembly, analyze the approximately 7 Mb of alpha satellite that have been assembled thus far, and anticipate the nature of the sequences that remain to be accounted for.
