Similar Articles
1.
A "gene-island" sequencing strategy has been developed that expedites the targeted acquisition of orthologous gene sequences from related species for comparative genome analysis. A 152-kb bacterial artificial chromosome (BAC) clone from sorghum (Sorghum bicolor) encoding phytochrome A (PHYA) was fully sequenced, revealing 16 open reading frames with a gene density similar to many regions of the rice (Oryza sativa) genome. The sequences of genes in the orthologous region of the maize (Zea mays) and rice genomes were obtained using the gene-island sequencing method. BAC clones containing the orthologous maize and rice PHYA genes were identified, sheared, subcloned, and probed with the sorghum PHYA-containing BAC DNA. Sequence analysis revealed that approximately 75% of the cross-hybridizing subclones contained sequences orthologous to those within the sorghum PHYA BAC and less than 25% contained repetitive and/or BAC vector DNA sequences. The complete sequences of four genes, including up to 1 kb of their promoter regions, were identified in the maize PHYA BAC. Nine orthologous gene sequences were identified in the rice PHYA BAC. Sequence comparison of the orthologous sorghum and maize genes aided in the identification of exons and conserved regulatory sequences flanking each open reading frame. Within genomic regions where micro-colinearity of genes is absolutely conserved, gene-island sequencing is a particularly useful tool for comparative analysis of genomes between related species.

2.
Characterization of the segmental duplication LCR7-20 in the human genome
Liu X, Li X, Li M, Acimovic YJ, Li Z, Scherer SW, Estivill X, Tsui LC. Genomics, 2004, 83(2):262-269
Our previous study described the amplification of a genomic sequence containing exon 9 of CFTR in the human genome. Here we report that this CFTR sequence is part of a large duplicated sequence unit, provisionally named LCR7-20. Through successive screening of two human chromosome 7-specific cosmid libraries to construct a cosmid contig, we assembled two sequenced BAC clones into a single contig containing a prototypic LCR7-20 unit. Subsequent searches of existing human genome sequences identified six additional copies of LCR7-20-like sequences with more than 90% sequence homology. Additional genomic clones containing LCR7-20-like sequences were then isolated from total genomic BAC and PAC libraries. Restriction fragment analysis and limited sequencing data indicated that there could be around 30 copies of LCR7-20-like sequences in the human genome and that the average region of homology could extend over 120 kb. As indicated by fluorescence in situ hybridization analysis, LCR7-20-like sequences are dispersed on different chromosomes, mainly in the centromeric and pericentromeric regions, and some may exist in tandem copies. Our study also indicates that many genomic regions containing LCR7-20 copies either have been misassembled or are missing in current versions of the human genome sequence.

3.
Vertebrate whole genome sequence assembly can benefit from a priori knowledge of variability in the target genome, with researchers often selecting highly inbred individuals for sequencing. However, for most species highly inbred research lines are lacking, requiring the use of one or more outbred individuals. Here we examined the source DNA [Nicholas inbred (Nici)] of the CHORI-260 turkey bacterial artificial chromosome (BAC) library through analysis of microsatellites and BAC sequences. Heterozygosity of Nici was compared with that of individuals from several breeder lines. Seventy-eight microsatellites were screened for polymorphism in a total of 43 birds, identifying an average individual heterozygosity of 0.39, with Nici at 0.35. Additional loci (total of 147) were examined on a subset of individuals to obtain better genome coverage. The mean heterozygosity for this subset was 0.33 with Nici at 0.31. Examination of approximately 200 kb of genome sequence identified SNPs on the order of one per 200 bp in Nici. These data suggest that the heterozygosity of Nici is comparable to that of birds from selected breeder lines and that whole genome sequencing would yield an abundant resource of genome-wide polymorphisms.
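The heterozygosity figures above are per-bird values over microsatellite loci. As a rough illustration only, the Python sketch below assumes observed heterozygosity is the fraction of successfully scored loci at which a bird carries two different alleles (the abstract does not state its exact estimator); the allele sizes and SNP counts are invented.

```python
from typing import Optional, Sequence, Tuple

# One diploid genotype per locus, e.g. ("142", "146") for two microsatellite
# allele sizes; None marks a locus that could not be scored (hypothetical data).
Genotype = Optional[Tuple[str, str]]


def individual_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Observed heterozygosity: heterozygous loci / successfully scored loci."""
    scored = [g for g in genotypes if g is not None]
    if not scored:
        raise ValueError("no scored loci")
    heterozygous = sum(1 for a, b in scored if a != b)
    return heterozygous / len(scored)


def mean_snp_spacing_bp(sequence_bp: int, n_snps: int) -> float:
    """Average distance between SNPs over a stretch of sequence."""
    if n_snps == 0:
        raise ValueError("no SNPs observed")
    return sequence_bp / n_snps


# Hypothetical bird typed at four loci, one of which failed to amplify.
bird = [("142", "146"), ("150", "150"), None, ("98", "102")]
print(individual_heterozygosity(bird))      # 2 of 3 scored loci heterozygous
print(mean_snp_spacing_bp(200_000, 1_000))  # 200.0 bp between SNPs
```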

4.
We estimated the genome size of Korean ginseng (Panax ginseng C.A. Meyer), a medicinal herb, constructed a HindIII BAC library, and analyzed BAC-end sequences to provide an initial characterization of the library. The 1C nuclear DNA content of Korean ginseng was estimated to be 3.33 pg (3.12×10³ Mb). The BAC library consists of 106,368 clones with an average insert size of 98.61 kb, amounting to 3.34 genome equivalents. Sequencing of 2167 BAC clones generated 2492 BAC-end sequences with an average length of 400 bp. Analysis using BLAST and motif searches revealed that 10.2%, 20.9% and 3.8% of the BAC-end sequences contained protein-coding regions, transposable elements and microsatellites, respectively. A comparison of the functional categories represented by the protein-coding regions found in BAC-end sequences with those of Arabidopsis revealed that proteins pertaining to energy metabolism, subcellular localization, cofactor requirement and transport facilitation were more highly represented in the P. ginseng sample. In addition, a sequence encoding a glucosyltransferase-like protein implicated in the ginsenoside biosynthesis pathway was found. The majority of the transposable element sequences found belonged to the gypsy type (67.6%), followed by copia (11.7%) and LINE (8.0%) retrotransposons, whereas DNA transposons accounted for only 2.1% of the total in our sequence sample. Higher levels of transposable elements than protein-coding regions suggest that mobile elements have played an important role in the evolution of the genome of Korean ginseng and contributed significantly to its complexity. We also identified 103 microsatellites with 3–38 repeat units. The BAC library and BAC-end sequences will serve as a useful resource for physical mapping, positional cloning and genome sequencing of P. ginseng.
Communicated by M.-A. Grandbastien
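The abstract reports microsatellites found by motif searches but does not name the tool or its thresholds. A minimal Python sketch of perfect-repeat detection with a regular-expression backreference follows; the unit lengths (2-6 bp) and the three-repeat minimum are assumptions, and imperfect or interrupted repeats are not handled.

```python
import re
from typing import List, Tuple


def is_primitive(motif: str) -> bool:
    """False for motifs that are repeats of a shorter unit, e.g. 'ATAT' or 'AA'."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_microsatellites(seq: str, min_unit: int = 2, max_unit: int = 6,
                         min_repeats: int = 3) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # Group 2 captures one k-base unit; \2 then requires at least
        # (min_repeats - 1) further identical copies immediately after it.
        pattern = re.compile(rf"(([ACGT]{{{k}}})\2{{{min_repeats - 1},}})")
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(1)) // k))
    return sorted(hits)


print(find_microsatellites("GGCACACACACATTTAGGAGGAGGAGGCC"))
# [(2, 'CA', 5), (15, 'AGG', 4)]
```

Motifs that are themselves repeats of a shorter unit (such as `ATAT`) are skipped, so each repeat is reported once under its shortest unit.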

5.
Bread wheat (Triticum aestivum L.) is one of the most important crops globally and a high priority for genetic improvement, but its large and complex genome has been seen as intractable to whole genome sequencing. Isolation of individual wheat chromosome arms has facilitated large-scale sequence analyses. However, so far there has been no such survey of sequences from the A genome of wheat. Greater understanding of an A chromosome could facilitate wheat improvement and future sequencing of the entire genome. We constructed a BAC library from the long arm of T. aestivum chromosome 1A (1AL) and obtained BAC end sequences from 7,470 clones encompassing the arm. We obtained 13,445 (89.99%) useful sequences with a cumulative length of 7.57 Mb, representing 1.43% of 1AL and about 0.14% of the entire A genome. The GC content of the sequences was 44.7%, and 90% of the chromosome was estimated to comprise repeat sequences, while just over 1% encoded expressed genes. From the sequence data, we identified a large number of sites suitable for development of molecular markers (362 SSR and 6,948 ISBP), which will have utility for mapping this chromosome and for marker-assisted breeding. Of 44 putative ISBP markers tested, 23 (52.3%) were found to be useful. The BAC end sequence data also enabled the identification of genes and syntenic blocks specific to chromosome 1AL, suggesting regions of particular functional interest and targets for future research.

6.
7.
Previous studies in the chicken have identified a single microchromosome (GGA16) containing the ribosomal DNA (rDNA) and two genetically unlinked MHC regions, MHC-B and MHC-Y. Chicken DNA sequence from these loci was used to develop PCR primers for amplification of homologous fragments from the turkey (Meleagris gallopavo). PCR products were sequenced and overgo probes were designed to screen the CHORI 260 turkey BAC library. BAC clones corresponding to the turkey rDNA, MHC-B and MHC-Y were identified. BAC end and subclone sequencing confirmed identity and homology of the turkey BAC clones to the respective chicken loci. Based on subclone sequences, single-nucleotide polymorphisms (SNPs) segregating within the UMN/NTBF mapping population were identified and genotyped. Analysis of SNP genotypes found the B and Y loci to be genetically unlinked in the turkey. Silver staining of metaphase chromosomes identified a single pair of microchromosomes with nucleolar organizer regions (NORs). Physical locations of the rDNA and MHC loci were determined by fluorescence in situ hybridization (FISH) of the BAC clones to metaphase chromosomes. FISH clearly positioned the rDNA distal to the Y locus on the q-arm of the MHC chromosome and the MHC-B on the p-arm. An internal telomere array on the MHC chromosome separates the B and Y loci.
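Overgo probes are conventionally made from two oligonucleotides (often 24-mers) that share an 8-bp complementary overlap at their 3' ends; after annealing, the single-stranded overhangs are filled in with labelled nucleotides to give a 40-bp probe. The abstract does not give the oligo lengths used, so the 24/8 defaults in this Python sketch are assumptions, and practical checks (GC balance, uniqueness against repeats) are omitted.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def design_overgo(target: str, oligo_len: int = 24, overlap: int = 8) -> Tuple[str, str]:
    """Split the first (2 * oligo_len - overlap) bases of `target` into a forward
    oligo and a reverse oligo whose 3' ends are complementary over `overlap` bases."""
    span = 2 * oligo_len - overlap  # 40 bp with the default lengths
    region = target.upper()[:span]
    if len(region) < span:
        raise ValueError(f"need at least {span} bp, got {len(region)}")
    forward = region[:oligo_len]                           # top strand, bases 1-24
    reverse = reverse_complement(region[span - oligo_len:])  # bottom strand, bases 17-40
    return forward, reverse


fwd, rev = design_overgo("ATGCGTACGTTAGCCTAGGCTAACGTTGCAGTCCATGACG")
print(fwd)
print(rev)
```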

8.
Large-insert bacterial artificial chromosome (BAC) libraries are necessary for advanced genetics and genomics research. To facilitate gene cloning and characterization, genome analysis, and physical mapping of scallop, two BAC libraries were constructed from nuclear DNA of Zhikong scallop, Chlamys farreri Jones et Preston. The libraries were constructed in the BamHI and MboI sites of the vector pECBAC1, respectively. The BamHI library consists of 73,728 clones, and approximately 99% of the clones contain scallop nuclear DNA inserts with an average size of 110 kb, covering 8.0x haploid genome equivalents. Similarly, the MboI library consists of 7680 clones, with an average insert of 145 kb and no insert-empty clones, thus providing a genome coverage of 1.1x. Together, the libraries contain a total of 81,408 BAC clones arrayed in 212 384-well microtiter plates, representing 9.1x haploid genome equivalents, with a greater than 99% probability of recovering at least one positive clone for any single-copy sequence. High-density clone filters prepared from a subset of the two libraries were screened with nine pairs of Overgos designed from the cDNA or DNA sequences of six genes involved in the innate immune system of mollusks. Positive clones were identified for every gene, with an average of 5.3 BAC clones per gene probe. These results suggest that the two scallop BAC libraries provide useful tools for gene cloning, genome physical mapping, and large-scale sequencing in the species.
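The coverage and screening-probability figures follow from two standard relations: coverage c = (number of clones with inserts × mean insert size) / genome size, and the Clarke-Carbon probability that a given single-copy sequence is present, approximated as P = 1 - e^(-c). The abstract does not state the scallop genome size, so the ~1,000 Mb used in this Python sketch is a hypothetical value back-calculated from the reported coverages.

```python
import math


def genome_equivalents(n_clones: int, mean_insert_kb: float, genome_size_mb: float,
                       fraction_with_insert: float = 1.0) -> float:
    """Library coverage c = (clones carrying inserts * mean insert size) / genome size."""
    return n_clones * fraction_with_insert * mean_insert_kb / (genome_size_mb * 1000.0)


def p_at_least_one_clone(coverage: float) -> float:
    """Clarke-Carbon probability that a single-copy locus is represented,
    using the Poisson approximation P = 1 - exp(-c)."""
    return 1.0 - math.exp(-coverage)


# Hypothetical genome size (not given in the abstract), back-calculated
# so that the reported 8.0x and 1.1x coverages are reproduced.
GENOME_MB = 1000.0
bamhi = genome_equivalents(73_728, 110, GENOME_MB, fraction_with_insert=0.99)
mboi = genome_equivalents(7_680, 145, GENOME_MB)
print(round(bamhi, 1), round(mboi, 1))              # 8.0 1.1
print(f"{p_at_least_one_clone(bamhi + mboi):.4f}")  # 0.9999
```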

9.
Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technologies) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, 102 polymorphisms, including 90 SNPs, were predicted by both platforms. Indel calls varied more between sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNPs but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.
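Cross-platform comparisons of this kind reduce to set operations on variant calls, plus a check for whether a disagreement falls in a homopolymer. A minimal Python sketch, assuming variants are written as hypothetical (0-based position, ref, alt) tuples whose position points at a base inside the affected run, and an arbitrary 4-bp minimum run length:

```python
from typing import Set, Tuple

Variant = Tuple[int, str, str]  # (0-based reference position, ref allele, alt allele)


def homopolymer_run(ref: str, pos: int) -> int:
    """Length of the single-base run in `ref` that contains index `pos`."""
    base = ref[pos]
    start, end = pos, pos
    while start > 0 and ref[start - 1] == base:
        start -= 1
    while end + 1 < len(ref) and ref[end + 1] == base:
        end += 1
    return end - start + 1


def concordance(calls_a: Set[Variant], calls_b: Set[Variant]) -> Tuple[Set[Variant], Set[Variant]]:
    """Split calls into those made by both platforms and those made by only one."""
    return calls_a & calls_b, calls_a ^ calls_b


def in_homopolymer(ref: str, variants: Set[Variant], min_run: int = 4) -> Set[Variant]:
    """Variants whose position lies in a run of at least `min_run` identical bases."""
    return {v for v in variants if homopolymer_run(ref, v[0]) >= min_run}


ref = "ACGTTTTTACGGA"
platform_a = {(1, "C", "A"), (5, "T", "TT")}
platform_b = {(1, "C", "A"), (6, "T", "TT")}
shared, discordant = concordance(platform_a, platform_b)
print(shared)                          # {(1, 'C', 'A')}
print(in_homopolymer(ref, discordant) == discordant)  # True: both lie in TTTTT
```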

10.
The complete sequences of two Musa acuminata bacterial artificial chromosome (BAC) clones are presented, together with the first analysis of banana genome organization. One clone (MuH9) is 82,723 bp long with an overall G+C content of 38.2%. Twelve putative protein-coding sequences were identified, representing a gene density of one per 6.9 kb, which is slightly lower than that previously reported for Arabidopsis but similar to rice. One coding sequence was identified as a partial M. acuminata malate synthase, while the remaining sequences showed similarity to predicted or hypothetical proteins identified in genome sequence data. A second BAC clone (MuG9) is 73,268 bp long with an overall G+C content of 38.5%. Only seven putative coding regions were found, representing a gene density of only one gene per 10.5 kb, which is strikingly lower than that of the first BAC. One coding sequence showed significant homology to the soybean ribonucleotide reductase (large subunit). A transition point between coding regions and repeated sequences was found at approximately 45 kb, separating the gene-containing upstream end of the BAC from its downstream end, which mainly contained transposon-like sequences and regions similar to known repetitive sequences of M. acuminata. This gene organization resembles Gramineae genome sequences, where genes are clustered in gene-rich regions separated by gene-poor DNA containing abundant transposons.
Communicated by J.S. Heslop-Harrison
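The G+C content and gene-density figures are simple ratios. A Python sketch, assuming ambiguous bases are excluded from the G+C denominator (the abstract does not say how they were treated), reproduces the two reported densities:

```python
def gc_content(seq: str) -> float:
    """G+C fraction among unambiguous bases (N and other IUPAC codes are ignored)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases")
    return (s.count("G") + s.count("C")) / acgt


def kb_per_gene(length_bp: int, n_genes: int) -> float:
    """Average gene density expressed as kilobases of sequence per gene."""
    return length_bp / n_genes / 1000.0


print(round(kb_per_gene(82_723, 12), 1))  # MuH9: 6.9
print(round(kb_per_gene(73_268, 7), 1))   # MuG9: 10.5
```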

11.
Whole-genome sequencing and variant discovery in C. elegans
Massively parallel sequencing instruments enable rapid and inexpensive DNA sequence data production. Because these instruments are new, their data require characterization with respect to accuracy and utility. To address this, we sequenced a Caenorhabditis elegans N2 Bristol strain isolate using the Solexa Sequence Analyzer, and compared the reads to the reference genome to characterize the data and to evaluate coverage and representation. Massively parallel sequencing facilitates strain-to-reference comparison for genome-wide sequence variant discovery. Because the reads produced are short, we developed a revised approach to determine the regions of the genome to which short reads could be uniquely mapped. We then aligned Solexa reads from C. elegans strain CB4858 to the reference, and screened for single-nucleotide polymorphisms (SNPs) and small indels. This study demonstrates the utility of massively parallel short read sequencing for whole genome resequencing and for accurate discovery of genome-wide polymorphisms.
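The abstract does not describe the revised approach for finding uniquely mappable regions. A common simplification, sketched below in Python and not necessarily the authors' method, marks a reference position as uniquely mappable for reads of length k when the k-mer starting there occurs exactly once in the genome, counting both strands. It ignores mismatches, which real aligners tolerate, and holds every k-mer in memory, so it only suits small sequences.

```python
from collections import Counter
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def uniquely_mappable(ref: str, k: int) -> List[bool]:
    """mask[i] is True when the k-mer starting at ref[i] occurs once in ref (either strand)."""
    ref = ref.upper()
    kmers = [canonical(ref[i:i + k]) for i in range(len(ref) - k + 1)]
    counts = Counter(kmers)
    return [counts[km] == 1 for km in kmers]


print(uniquely_mappable("AAAACCCCAAAA", 4))
# [False, True, True, True, True, True, True, True, False]
```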

12.
ABSTRACT: BACKGROUND: A genome-wide set of single nucleotide polymorphisms (SNPs) is a valuable resource in genetic research and breeding and is usually developed by re-sequencing a genome. If a genome sequence is not available, an alternative strategy must be used. We previously reported the development of a pipeline (AGSNP) for genome-wide SNP discovery in coding sequences and other single-copy DNA without a complete genome sequence in self-pollinating (autogamous) plants. Here we updated this pipeline for SNP discovery in outcrossing (allogamous) species and demonstrated its efficacy in SNP discovery in walnut (Juglans regia L.). RESULTS: The first step in the original implementation of the AGSNP pipeline was the construction of a reference sequence and the identification of single-copy sequences in it. To identify single-copy sequences, multiple genome equivalents of short SOLiD reads of another individual were mapped to shallow genome coverage of long Sanger or Roche 454 reads making up the reference sequence. The relative depth of SOLiD reads was used to separate repeated sequences from single-copy sequences in the reference sequence. The second step was a search for SNPs between SOLiD reads and the reference sequence. Polymorphism within the mapped SOLiD reads would have precluded SNP discovery; hence both individuals had to be homozygous. The AGSNP pipeline was updated here to use SOLiD or other types of short reads from a heterozygous individual for these two principal steps. A total of 32.6X walnut genome equivalents of SOLiD reads of vegetatively propagated walnut scion cultivar 'Chandler' were mapped to 48,661 'Chandler' bacterial artificial chromosome (BAC) end sequences (BESs) produced by Sanger sequencing during the construction of a walnut physical map. A total of 22,799 putative SNPs were initially identified.
A total of 6,000 Infinium II type SNPs evenly distributed along the walnut physical map were selected for the construction of an Infinium BeadChip, which was used to genotype a walnut mapping population having 'Chandler' as one of the parents. Genotyping results were used to adjust the filtering parameters of the updated AGSNP pipeline. With the adjusted filtering criteria, 69.6% of SNPs discovered with the updated pipeline were real and could be mapped on the walnut genetic map. A total of 13,439 SNPs were discovered by BES re-sequencing. BESs harboring SNPs were in 677 FPC contigs covering 98% of the physical map of the walnut genome. CONCLUSION: The updated AGSNP pipeline is a versatile tool for high-throughput, genome-wide SNP discovery in both autogamous and allogamous species. With this pipeline, a large set of SNPs was identified in a single walnut cultivar.
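The two steps described above can be illustrated for a heterozygous individual: discard reference positions whose short-read depth is far above typical (likely repeats), then call a SNP where a second allele is carried by roughly half of the reads. This Python sketch is hypothetical; `Site` is an illustrative structure rather than part of AGSNP, and all thresholds are placeholders for the parameters the authors tuned against Infinium genotyping data.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Tuple


@dataclass
class Site:
    pos: int                 # position on the reference (BES) sequence
    ref_base: str            # base in the Sanger reference read
    counts: Dict[str, int]   # base -> number of mapped short reads supporting it


def depth(site: Site) -> int:
    return sum(site.counts.values())


def single_copy_sites(sites: List[Site], low: float = 0.5, high: float = 1.5) -> List[Site]:
    """Keep sites whose depth is near the median; much deeper sites are assumed
    to collect reads from several repeated copies."""
    med = median(depth(s) for s in sites)
    return [s for s in sites if low * med <= depth(s) <= high * med]


def heterozygous_snps(sites: List[Site], min_alt_reads: int = 3,
                      min_frac: float = 0.25, max_frac: float = 0.75) -> List[Tuple[int, str, str]]:
    """Call (pos, ref, alt) where the commonest non-reference base has a
    roughly 50% share of reads, as expected at a heterozygous site."""
    calls = []
    for s in single_copy_sites(sites):
        alts = {b: n for b, n in s.counts.items() if b != s.ref_base}
        if not alts:
            continue
        alt, n_alt = max(alts.items(), key=lambda kv: kv[1])
        if n_alt >= min_alt_reads and min_frac <= n_alt / depth(s) <= max_frac:
            calls.append((s.pos, s.ref_base, alt))
    return calls


sites = [
    Site(1, "A", {"A": 10, "G": 9}),   # balanced alleles: candidate SNP
    Site(2, "C", {"C": 20}),           # monomorphic
    Site(3, "T", {"T": 25, "C": 20}),  # over twice the median depth: likely repeat
    Site(4, "G", {"G": 18, "A": 1}),   # single discordant read: likely error
]
print(heterozygous_snps(sites))  # [(1, 'A', 'G')]
```

In this example, the third site shows an apparent second allele but has more than twice the median depth, so the depth filter drops it as a probable repeat rather than calling a SNP.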

13.
Large polyploid genomes of non-model species remain challenging targets for DNA polymorphism discovery despite the increasing throughput and continued reductions in cost of sequencing with new technologies. For these species especially, genomic DNA still needs to be enriched, both to discover polymorphisms in regions of interest despite the large genome size and to provide the sequence depth needed to estimate copy number. Various methods of enriching DNA have been utilised, but some recent methods enable the efficient sampling of large regions (e.g. the exome). We have utilised one of these methods, solution-based hybridization (Agilent SureSelect), to capture regions of the genome of two sugarcane genotypes (one Saccharum officinarum and one Saccharum hybrid) based mainly on gene sequences from the close relative Sorghum bicolor. The capture probes span approximately 5.8 megabases (Mb). The enrichment over whole-genome shotgun sequencing was 10-11-fold for the two genotypes tested. This level of enrichment has important consequences for detecting single nucleotide polymorphisms (SNPs) from a single lane of Illumina (Genome Analyzer) sequence reads. The sequence depth at or near probe sites enabled the detection of 270,000-280,000 SNPs within each genotype from a single lane of sequence using stringent detection parameters. The SNPs were present in 13,000-16,000 targeted genes, which would enable mapping of a large number of these chosen genes. SNP validation from 454 sequencing and between-genotype confirmations gave an 87%-91% validation rate.
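The abstract does not give its formula for enrichment over whole-genome shotgun sequencing. One common definition, assumed in this Python sketch, is the on-target read fraction in the capture library divided by the on-target fraction in a shotgun library (or, lacking a shotgun control, the target size divided by the genome size); the read counts are invented.

```python
def fold_enrichment(on_target_capture: int, total_capture: int,
                    on_target_wgs: int, total_wgs: int) -> float:
    """Ratio of on-target read fractions: capture library vs. shotgun library."""
    return (on_target_capture / total_capture) / (on_target_wgs / total_wgs)


def expected_wgs_fraction(target_bp: float, genome_bp: float) -> float:
    """On-target fraction expected by chance when no shotgun control exists."""
    return target_bp / genome_bp


# Hypothetical read counts: 30% of capture reads vs 2.8% of shotgun reads on target.
print(round(fold_enrichment(3_000_000, 10_000_000, 280_000, 10_000_000), 1))  # 10.7
```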

14.
Diagnosis of Pasteurella pneumotropica in laboratory animals relies on isolation of the organism, biochemical characterization, and, more recently, DNA-based diagnostic methods. 16S rRNA and rpoB gene sequences were examined for development of a real-time PCR assay. Partial sequencing of rpoB (456 bp) and 16S rRNA (1368 bp) of Pasteurella pneumotropica isolates identified by microbiologic and biochemical assays indicated that either gene sequence can be used to distinguish P. pneumotropica from other members of the Pasteurellaceae family. However, alignment of rpoB sequences from the Pasteurella pneumotropica Heyl (15 sequences) and Jawetz (16 sequences) biotypes with other Pasteurellaceae sequences from GenBank indicated that although rpoB DNA sequencing could be used for diagnosis, development of diagnostic primers and probes would be difficult, because the sequence variability between Heyl and Jawetz biotypes is not clustered in any particular region of the rpoB sequence. In contrast, alignment of 16S rRNA sequences revealed a region with unique and stable nucleotide motifs sufficient to permit development of a specific fluorogenic real-time PCR assay to confirm P. pneumotropica isolated by culture and to differentiate Heyl and Jawetz biotypes.
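The contrast drawn above between scattered rpoB variation and a localized 16S rRNA signature can be made concrete by scanning an alignment for windows that are identical across all target sequences and different from every non-target sequence. A hypothetical Python sketch; real primer and probe design would also weigh melting temperature, secondary structure, and mismatch position.

```python
from typing import List, Sequence, Tuple


def diagnostic_windows(targets: Sequence[str], others: Sequence[str],
                       width: int) -> List[Tuple[int, str]]:
    """Return (alignment column, motif) for ungapped windows that are identical
    in every target sequence and absent at that column in every other sequence."""
    length = len(targets[0])
    if any(len(s) != length for s in list(targets) + list(others)):
        raise ValueError("sequences must come from one alignment (equal lengths)")
    hits = []
    for i in range(length - width + 1):
        motif = targets[0][i:i + width]
        if "-" in motif:
            continue  # skip windows containing alignment gaps
        if any(s[i:i + width] != motif for s in targets[1:]):
            continue  # not conserved across the target group
        if any(s[i:i + width] == motif for s in others):
            continue  # shared with a non-target sequence, so not diagnostic
        hits.append((i, motif))
    return hits


# Toy alignment: two target sequences vs one related non-target sequence.
print(diagnostic_windows(["ACGTAC", "ACGTAC"], ["ACGAAC"], 3))
# [(1, 'CGT'), (2, 'GTA'), (3, 'TAC')]
```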

15.
One approach to identify potentially important segments of the human genome is to search for DNA regions with nonrandom patterns of human sequence variation. Previous studies have investigated these patterns primarily in and around candidate gene regions. Here, we determined patterns of DNA sequence variation in 2.5 Mb of finished sequence from five regions on human chromosome 21. By sequencing 13 individual chromosomes, we identified 1460 single-nucleotide polymorphisms (SNPs) and obtained unambiguous haplotypes for all chromosomes. For all five chromosomal regions, we observed segments with high linkage disequilibrium (LD), extending from 1.7 to >81 kb (average 21.7 kb), disrupted by segments of similar or larger size with no significant LD between SNPs. At least 25% of the contig sequences consisted of segments with high LD between SNPs. Each of these segments was characterized by a restricted number of observed haplotypes, with the major haplotype found in over 60% of all chromosomes. In contrast, the interspersed segments with low LD showed significantly more haplotype patterns. The position and extent of the segments of high LD with restricted haplotype variability did not coincide with the location of coding sequences. Our results indicate that LD and haplotype patterns need to be investigated with closely spaced SNPs throughout the human genome, independent of the location of coding sequences, to reliably identify regions with significant LD useful for disease association studies.
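Linkage disequilibrium between two SNPs is usually summarized from phased haplotypes by r² and |D'|. The abstract does not say which statistic or significance test was used, so this Python sketch of the standard two-locus formulas is illustrative only.

```python
from typing import Sequence, Tuple


def ld_stats(haplotypes: Sequence[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (r^2, |D'|) for two biallelic SNPs; each haplotype is
    (allele at SNP 1, allele at SNP 2) on one chromosome."""
    n = len(haplotypes)
    alleles_1 = sorted({h[0] for h in haplotypes})
    alleles_2 = sorted({h[1] for h in haplotypes})
    if len(alleles_1) != 2 or len(alleles_2) != 2:
        raise ValueError("both SNPs must be biallelic and polymorphic in the sample")
    a, b = alleles_1[0], alleles_2[0]
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == (a, b) for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


# Hypothetical sample of 13 chromosomes carrying only two haplotypes.
r2, d_prime = ld_stats([("A", "C")] * 6 + [("G", "T")] * 7)
print(f"r2={r2:.2f}, D'={d_prime:.2f}")  # r2=1.00, D'=1.00
```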

16.
Single nucleotide polymorphisms (SNPs) can significantly contribute to the characterization of the genes predisposing to iron overloads or deficiencies. We report an SNP survey of coding and non-coding regions of eight genes involved in iron metabolism, by two successive methods. First, we made use of the public domain sequence data, by using assembled expressed sequence tags, non-redundant sequences, and SNP database screening. We extracted 77 potential SNPs, of which only 31 could be further validated by sequencing DNA from 44 unrelated multi-ethnic individuals. Our results indicate that a bioinformatic approach may be effective only in those cases where candidate SNPs are extracted from two different data sources or in cases of experimentally confirmed SNPs. Second, additional systematic sequencing of DNA from 24 unrelated Breton subjects increased the total number of SNPs, across 86 kb of sequence, to 96. Both the average distance between SNPs and the minor allele frequencies were higher than those reported by other authors; this discrepancy may reflect the nature of the genes studied and the ethnic homogeneity of our test population.
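The recommendation above (trust in silico candidates reported by at least two independent data sources) amounts to a support count per variant. A minimal Python sketch with hypothetical source names, gene labels, and variants:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (gene, position, ref allele, alt allele)


def multi_source_candidates(sources: Dict[str, Iterable[Variant]],
                            min_sources: int = 2) -> Set[Variant]:
    """Keep candidate SNPs reported by at least `min_sources` distinct sources."""
    support: Dict[Variant, Set[str]] = defaultdict(set)
    for source_name, variants in sources.items():
        for v in variants:
            support[v].add(source_name)
    return {v for v, names in support.items() if len(names) >= min_sources}


# Hypothetical candidates from three in silico sources.
candidates = {
    "assembled_ESTs": [("geneA", 120, "A", "G"), ("geneB", 45, "C", "T")],
    "nonredundant_seqs": [("geneA", 120, "A", "G")],
    "snp_database": [("geneB", 60, "G", "A")],
}
print(multi_source_candidates(candidates))  # {('geneA', 120, 'A', 'G')}
```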

17.
Single nucleotide polymorphisms in rice and the current status of their application
Liu CG, Zhang GQ. Hereditas (Beijing), 2006, 28(6):737-744
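Single nucleotide polymorphisms (SNPs) are numerous in rice, densely distributed, and genetically highly stable. The main methods for discovering rice SNPs are direct sequencing of PCR products amplified from sample DNA, detection of SNPs within SSR regions, and direct searching of genome sequences. A variety of genotyping technologies have already been applied to rice SNP detection, and the high degree of automation in SNP detection makes rice SNP genotyping very convenient. SNPs have great potential for the construction of rice genetic maps, gene cloning and functional genomics research, marker-assisted selection breeding, classification of genetic resources, and studies of species evolution.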

18.
Although pioneering sequencing projects have shed light on the boxer and poodle genomes, a number of challenges need to be met before the sequencing and annotation of the dog genome can be considered complete. Here, we present the DNA sequence of the Jindo dog genome, sequenced to 45-fold average coverage using Illumina massively parallel sequencing technology. A comparison of the sequence to the reference boxer genome led to the identification of 4 675 437 single nucleotide polymorphisms (SNPs, including 3 346 058 novel SNPs), 71 642 indels and 8131 structural variations. Of these, 339 non-synonymous SNPs and 3 indels are located within coding sequences (CDS). In particular, 3 non-synonymous SNPs and a 26-bp deletion occur in the TCOF1 locus, implying that the difference observed in craniofacial morphology between Jindo and boxer dogs might be influenced by those variations. Through the annotation of the Jindo olfactory receptor gene family, we found 2 unique olfactory receptor genes and 236 olfactory receptor genes harbouring non-synonymous homozygous SNPs that are likely to affect smelling capability. In addition, we determined the DNA sequence of the Jindo dog mitochondrial genome and identified Jindo dog-specific mtDNA genotypes. These Jindo genome data improve our understanding of dog genomic architecture and will be a very valuable resource for investigating not only dog genetics and genomics but also human and dog disease genetics and comparative genomics.
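Deciding whether a coding SNP is non-synonymous requires translating the reference and alternate codons. This Python sketch assumes a forward-strand CDS that starts in frame at index 0 and uses the standard genetic code; it is not the annotation pipeline used in the study.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(cds: str, pos: int, alt: str) -> Tuple[str, str, str]:
    """Return (ref amino acid, alt amino acid, label) for a SNP at 0-based `pos`."""
    cds = cds.upper()
    start = pos - pos % 3  # first base of the affected codon
    offset = pos % 3
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete codon")
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, label


print(classify_coding_snp("ATGGCTTGA", 5, "C"))  # ('A', 'A', 'synonymous')
print(classify_coding_snp("ATGGCTTGA", 4, "A"))  # ('A', 'D', 'non-synonymous')
```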

19.
Molecular markers are used to provide the link between genotype and phenotype, for the production of molecular genetic maps and to assess genetic diversity within and between related species. Single nucleotide polymorphisms (SNPs) are the most abundant molecular genetic marker. SNPs can be identified in silico, but care must be taken to ensure that the identified SNPs reflect true genetic variation and are not a result of errors associated with DNA sequencing. The SNP detection method autoSNP has been developed to identify SNPs from sequence data for any species. Confidence in the predicted SNPs is based on sequence redundancy, and haplotype co-segregation scores are calculated for a further independent measure of confidence. We have extended the autoSNP method to produce autoSNPdb, which integrates SNP and gene annotation information with a graphical viewer. We have applied this software to public barley expressed sequences, and the resulting database is available over the Internet. SNPs can be viewed and searched by sequence, functional annotation or predicted synteny with a reference genome, in this case rice. The correlation between SNPs and barley cultivar, expressed tissue type and development stage has been collated for ease of exploration. An average of one SNP per 240 bp was identified, with SNPs more prevalent in the 5' regions and simple sequence repeat (SSR) flanking sequences. Overall, autoSNPdb can provide a wealth of genetic polymorphism information for any species for which sequence data are available.
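autoSNP bases confidence on sequence redundancy, that is, how many aligned sequences carry the variant allele. The Python sketch below is a hypothetical re-implementation of that idea only (it is not autoSNP's code and omits the haplotype co-segregation score): a column is reported when its second most common base appears in at least two reads, so single-read differences are treated as likely sequencing errors.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def redundancy_snps(rows: Sequence[str], min_redundancy: int = 2) -> List[Tuple[int, str, str, int]]:
    """Scan a gapped multiple alignment of reads and report (column, major,
    minor, minor-allele count) where the minor allele is seen >= min_redundancy times."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("alignment rows must have equal length")
    calls = []
    for col in range(width):
        counts = Counter(r[col] for r in rows if r[col] in "ACGT")  # ignore gaps and N
        if len(counts) < 2:
            continue
        (major, _), (minor, n_minor) = counts.most_common(2)
        if n_minor >= min_redundancy:
            calls.append((col, major, minor, n_minor))
    return calls


reads = ["ACGTA", "ACGTA", "ACTTA", "ACTTA", "ACGTC"]
print(redundancy_snps(reads))  # [(2, 'G', 'T', 2)]; the lone C in column 4 is ignored
```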

20.
The reported draft human genome sequence includes many contigs that are separated by gaps of unknown sequence. These gaps may be due to chromosomal regions that are not present in the Escherichia coli libraries used for DNA sequencing because they cannot be cloned efficiently, if at all, in bacteria. Using a yeast artificial chromosome (YAC)/bacterial artificial chromosome (BAC) library generated in yeast, we found that approximately 6% of human DNA sequences tested transformed E. coli cells less efficiently than yeast cells, and were less stable in E. coli than in yeast. When the ends of several YAC/BAC isolates cloned in yeast were sequenced and compared with the reported draft sequence, major inconsistencies were found with the sequences of those YAC/BAC isolates that transformed E. coli cells inefficiently. Two human genomic fragments were re-isolated from human DNA by transformation-associated recombination (TAR) cloning. Re-sequencing of these regions showed that the errors in the draft are the results of both misassembly and loss of specific DNA sequences during cloning in E. coli. These results show that TAR cloning might be a valuable method that could be widely used during the final stages of the Human Genome Project.

