Similar literature
 20 similar documents found (search time: 15 ms)
1.
2.
3.

Background

Most eukaryotic genomes include a substantial repeat-rich fraction termed heterochromatin, which is concentrated in centric and telomeric regions. The repetitive nature of heterochromatic sequence makes it difficult to assemble and analyze. To better understand the heterochromatic component of the Drosophila melanogaster genome, we characterized and annotated portions of a whole-genome shotgun sequence assembly.

Results

WGS3, an improved whole-genome shotgun assembly, includes 20.7 Mb of draft-quality sequence not represented in the Release 3 sequence spanning the euchromatin. We annotated this sequence using the methods employed in the re-annotation of the Release 3 euchromatic sequence. This analysis predicted 297 protein-coding genes and six non-protein-coding genes, including known heterochromatic genes, and regions of similarity to known transposable elements. Bacterial artificial chromosome (BAC)-based fluorescence in situ hybridization analysis was used to correlate the genomic sequence with the cytogenetic map in order to refine the genomic definition of the centric heterochromatin; on the basis of our cytological definition, the annotated Release 3 euchromatic sequence extends into the centric heterochromatin on each chromosome arm.

Conclusions

Whole-genome shotgun assembly produced a reliable draft-quality sequence of a significant part of the Drosophila heterochromatin. Annotation of this sequence defined the intron-exon structures of 30 known protein-coding genes and 267 protein-coding gene models. The cytogenetic mapping suggests that an additional 150 predicted genes are located in heterochromatin at the base of the Release 3 euchromatic sequence. Our analysis suggests strategies for improving the sequence and annotation of the heterochromatic portions of the Drosophila and other complex genomes.

4.
Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied to single-nucleotide polymorphism (SNP) genotyping platforms, but not to complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ~2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239±0.008 for starvation resistance and 0.230±0.012 for startle response. The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than that of GBLUP. Selection of the 5% of SNPs with either the highest absolute effect or the highest variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.
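
The workflow this abstract describes (build a genomic relationship matrix from SNPs, fit GBLUP, score predictive ability by cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the VanRaden-style scaling of the relationship matrix, the fixed heritability `h2` (a real analysis would estimate variance components, for example by REML), the 5-fold split, and the simulated genotypes are all assumptions made for the example, and every function name is hypothetical.

```python
import numpy as np

def genomic_relationship_matrix(genotypes):
    """VanRaden-style G from an (n_lines x n_snps) matrix of allele counts (0/1/2)."""
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0               # sample allele frequency per SNP
    Z = M - 2.0 * p                        # centre each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))    # assumed scaling, not taken from the abstract
    return Z @ Z.T / scale

def gblup_predict(G, y, train, test, h2=0.5):
    """Predict genetic values of test lines from the phenotypes of training lines."""
    lam = (1.0 - h2) / h2                  # ratio of residual to genetic variance
    G_tt = G[np.ix_(train, train)]
    G_st = G[np.ix_(test, train)]
    mu = y[train].mean()
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G_st @ alpha

def predictive_ability(G, y, n_folds=5, h2=0.5, seed=0):
    """Mean correlation between predicted values and observed phenotypes over folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correlations = []
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        predicted = gblup_predict(G, y, train, test, h2)
        correlations.append(np.corrcoef(predicted, y[test])[0, 1])
    return float(np.mean(correlations))

# Simulated stand-in data: 60 inbred lines (homozygous, coded 0 or 2) and 500 SNPs.
rng = np.random.default_rng(1)
genotypes = rng.choice([0, 2], size=(60, 500))
effects = rng.normal(size=500)
phenotypes = genotypes @ effects + rng.normal(scale=20.0, size=60)

G = genomic_relationship_matrix(genotypes)
r = predictive_ability(G, phenotypes)
print(f"cross-validated predictive ability: {r:.3f}")
```

Because the SNP columns are centred, every row of G sums to zero, which is a quick sanity check on the construction. The value printed for the simulated data says nothing about the Drosophila results reported above.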

5.
Parallel to improvements in DNA sequencing and computer technologies, the output of bio-information grows dramatically every year. More and more species with important commercial, medical and biological significance have been or are being sequenced. There are two kinds of whole-genome sequencing strategies: the clone-by-clone shotgun method (hierarchical shotgun) and the whole-genome shotgun (WGS) method, each with its individual strengths and drawbacks. In the clone-by-clone method, the a…

6.
Double-barreled (DB) data have been widely used for the assembly of large genomes. Based on the experience of building the whole-genome working draft of Oryza sativa L. ssp. indica, we present here the prevailing and improved uses of DB data in the assembly procedure and report on novel applications during the subsequent data-mining processes, such as acquiring precise insert fragment information for each clone across the genome, and a new kind of low-cost whole-genome microarray. With the increasing number of organisms being sequenced, we believe that DB data will play an important role both in other assembly procedures and in future genomic studies.

7.
Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species where it is possible to construct a mapping population.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0582-8) contains supplementary material, which is available to authorized users.

8.
Effects of tethering HP1 to euchromatic regions of the Drosophila genome   (Total citations: 7; self-citations: 0; citations by others: 7)
Heterochromatin protein 1 (HP1) is a conserved non-histone chromosomal protein enriched in heterochromatin. On Drosophila polytene chromosomes, HP1 localizes to centric and telomeric regions, along the fourth chromosome, and to specific sites within euchromatin. HP1 associates with centric regions through an interaction with methylated lysine nine of histone H3, a modification generated by the histone methyltransferase SU(VAR)3-9. This association correlates with a closed chromatin configuration and silencing of euchromatic genes positioned near heterochromatin. To determine whether HP1 is sufficient to nucleate the formation of silent chromatin at non-centric locations, HP1 was tethered to sites within euchromatic regions of Drosophila chromosomes. At 25 out of 26 sites tested, tethered HP1 caused silencing of a nearby reporter gene. The site that did not support silencing was upstream of an active gene, suggesting that the local chromatin environment did not support the formation of silent chromatin. Silencing correlated with the formation of ectopic fibers between the site of tethered HP1 and other chromosomal sites, some containing HP1. The ability of HP1 to bring distant chromosomal sites into proximity with each other suggests a mechanism for chromatin packaging. Silencing was not dependent on SU(VAR)3-9 dosage, suggesting a bypass of the requirement for histone methylation.

9.
The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%–5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then validate its organization and distribution experimentally. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages.

10.
A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species.

Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it. Milne 1926
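
The densities and percentages quoted in abstract 10 can be cross-checked with a little arithmetic. The abstract gives the region length only as "nearly 3 Mb", so the sketch below works backwards from the stated counts and densities; the helper names are hypothetical and the implied lengths are only as precise as the rounded densities allow.

```python
def implied_length_kb(feature_count, kb_per_feature):
    """Region length implied by a feature count and a 'one every N kb' density."""
    return feature_count * kb_per_feature

def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole

protein_coding_genes = 218
transposable_elements = 17

length_from_genes = implied_length_kb(protein_coding_genes, 13)    # 2834 kb
length_from_tes = implied_length_kb(transposable_elements, 171)    # 2907 kb

est_match_pct = percent(95, protein_coding_genes)     # about 43.6, reported as 44%
similarity_pct = percent(144, protein_coding_genes)   # about 66.1, reported as 66%

print(length_from_genes, length_from_tes)
print(round(est_match_pct), round(similarity_pct))
```

Both implied lengths fall a little under 3,000 kb, which agrees with the "nearly 3 Mb" figure, and the two percentages round to the reported 44% and 66%.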

11.
12.
13.
14.
We describe a cloned segment of unique DNA from the Oregon R strain of Drosophila melanogaster that contains a short type I insertion of the kind principally found within rDNA. The predominant type I rDNA insertion is 5 kb in length, but there is also a co-terminal subset of shorter type I elements that share a common right hand junction with the rDNA. The insertion that we now describe is another member of this subset. The right hand junction of the type I sequence with the unique DNA is identical to the right hand junction of the type I sequences with rDNA. There is no significant feature within the insertion sequence that could have determined the position of the left junction with the sequence into which it is inserted. Like the corresponding short type I insertions in rDNA, the insertion into the unique DNA is flanked on both sides by a duplicated sequence, which in this case is 10 base pairs long. The cloning of a sequence corresponding to the uninterrupted unique location was facilitated by the observation that the Karsnas strain of D. melanogaster contains only uninterrupted sequences of this kind. The duplicated sequence at the target site for the insertion is only present as a single copy in the uninterrupted DNA. The sequence of the target site for the insertion (ACTGTTCT) in the unique segment shows a striking homology to the target in rDNA (ACTGTCCC).
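
The similarity between the two target sites quoted at the end of abstract 14 can be made concrete with a position-by-position comparison. The sketch below is only an illustration of that comparison; the function name is hypothetical, and an ungapped alignment is assumed because the two sequences have the same length.

```python
def matching_positions(seq_a, seq_b):
    """Return the 1-based positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a == b]

unique_dna_target = "ACTGTTCT"   # target site in the unique segment (from the abstract)
rdna_target = "ACTGTCCC"         # target site in rDNA (from the abstract)

matches = matching_positions(unique_dna_target, rdna_target)
print(matches)                                   # positions that agree
print(f"{len(matches)} of {len(rdna_target)} positions identical")
```

The two sites agree at six of eight positions, including the first five bases (ACTGT), and differ only at positions 6 and 8.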

15.
16.
17.
A new method has been developed for rapidly closing a large number of gaps in a whole-genome shotgun sequencing project. The method employs multiplex PCR and a novel pooling strategy to minimize the number of laboratory procedures required to sequence the unknown DNA that falls in between contiguous sequences. Multiplex sequencing, a novel procedure in which multiple PCR primers are used in a single sequencing reaction, is used to interpret the multiplex PCR results. Two protocols are presented, one that minimizes pipetting and another that minimizes the number of reactions. The pipette-optimized multiplex PCR method has been employed in the final phases of closing the Streptococcus pneumoniae genome sequence, with excellent results.

18.
Phenol oxidase exists in Drosophila hemolymph as a prophenol oxidase, A1 and A3, that is activated in vivo with a native activating system, AMM-1, by limited proteolysis with time. The polypeptide in purified prophenol oxidase A3 has a molecular weight of approximately 77,000 Da. A PCR-based cDNA sequence coding A3 has 2501 bp encoding an open reading frame of 682 amino acid residues. The potential copper-binding sites, from Trp-196 to Tyr-245, and from Asn-366 to Phe-421, are highly homologous to the corresponding sites in other invertebrates. The availability of prophenol oxidase cDNA should be useful in revealing the biochemical differences between A1 and A3 isoforms in Drosophila melanogaster that are refractory or unable to activate prophenol oxidase.

19.
We have investigated at the molecular level four cases in which D. melanogaster middle repetitive DNA probes consistently hybridized to a particular band on chromosomes sampled from a D. melanogaster natural population. Two corresponded to true fixations of a roo and a Stalker element, and the others were artefacts of the in situ hybridization technique caused by the presence of genomic DNA flanking the transposable elements (TEs) in the probes. The two fixed elements are located in the beta-heterochromatin (20A and 80B, respectively) and are embedded in large clusters of other elements, many of which may also be fixed. We also found evidence that this accumulation is an ongoing process. These results support the hypothesis that TEs accumulate in the non-recombining part of the genome. Their implications for the effects of TEs on determining the chromatin structure of the host genomes are discussed in the light of recent evidence for the role of TE-derived small interfering RNAs as cis-acting determinants of heterochromatin formation.

20.
Heterozygosity is a major challenge to efficient, high-quality genomic assembly and to the full genomic survey of polymorphism and divergence. In Drosophila melanogaster, lines derived from equatorial populations are particularly resistant to inbreeding, thus imposing a major barrier to the determination and analysis of genomic variation in natural populations of this model organism. Here we present a simple genome sequencing protocol based on the whole-genome amplification of the gynogenetically derived haploid genome of a progeny of females mated to males homozygous for the recessive male sterile mutation, ms(3)K81. A single "lane" of paired-end sequences (2 × 76 bp) provides a good syntenic assembly with >95% high-quality coverage (more than five reads). The amplification of the genomic DNA moderately inflates the variation in coverage across the euchromatic portion of the genome. It also increases the frequency of chimeric clones, but the low frequency and random genomic distribution of the chimeric clones limit their impact on the final assemblies. This method provides a solid path forward for population genomic sequencing and offers applications to many other systems in which small amounts of genomic DNA have unique experimental relevance.
