Double-barreled (DB) data have been widely used for the assembly of large genomes. Based on the experience of building the whole-genome working draft of Oryza sativa L. ssp. indica, we present here the prevailing and improved uses of DB data in the assembly procedure and report on novel applications during the subsequent data-mining processes, such as acquiring precise insert fragment information for each clone across the genome and a new kind of low-cost whole-genome microarray. With the increasing number of organisms being sequenced, we believe that DB data will play an important role both in other assembly procedures and in future genomic studies.
Parallel to improvements in DNA sequencing and computer technologies, the output of biological information grows dramatically every year. More and more species with important commercial, medical and biological significance have been or are being sequenced. There are two kinds of whole-genome sequencing strategies: the clone-by-clone shotgun method (hierarchical shotgun) and the whole-genome shotgun (WGS) method, each with its individual strengths and drawbacks. In the clone-by-clone method, the a…
Two recombinant plasmids containing, respectively, three and eight tandem repeats of a 177 base-pair (bp) element from radish nuclear DNA have been isolated. These plasmids were used as probes to investigate the organization and the copy number of this element within the genome. This sequence is present in approximately 0.6 million copies. Restriction analysis provides evidence for sequence heterogeneity and reveals the occurrence of non-overlapping subfamilies. Nine units were sequenced and found to be remarkably conserved. However, sequences in the two clones clearly belong to two distinct subgroups. Our data suggest that these sequences evolved in a concerted manner and that homogenization mechanisms such as gene conversions certainly took place. The 177 bp sequence is made from three 60 bp blocks that are derived from a common ancestor. Exchanges between the three blocks probably occurred before they became fixed as a patchwork of short sequences, the 177 bp element. This unit of 177 bp was then amplified in several steps. The presence of such a repeated sequence can be detected in other Cruciferae when hybridizations are carried out under low stringency conditions. Direct comparison with a previously published mustard satellite DNA sequence indicates a similar organization and 75% homology. Homology was also found with shorter regions (approximately 60 bp) of broad bean and corn satellite DNA. Finally, homology was also found with several animal alphoid sequences, suggesting that this family also occurs in the plant genomes.
The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to the accepted standard for finished sequence? We are now in a position to answer these questions.
Results
Our finishing process was designed to close gaps, improve sequence quality and validate the assembly. Sequence traces derived from the WGS and draft sequencing of individual bacterial artificial chromosomes (BACs) were assembled into BAC-sized segments. These segments were brought to high quality, and then joined to constitute the sequence of each chromosome arm. Overall assembly was verified by comparison to a physical map of fingerprinted BAC clones. In the current version of the 116.9 Mb euchromatic genome, called Release 3, the six euchromatic chromosome arms are represented by 13 scaffolds with a total of 37 sequence gaps. We compared Release 3 to Release 2; in autosomal regions of unique sequence, the error rate of Release 2 was one in 20,000 bp.
Conclusions
The WGS strategy can efficiently produce a high-quality sequence of a metazoan genome while generating the reagents required for sequence finishing. However, the initial method of repeat assembly was flawed. The sequence we report here, Release 3, is a reliable resource for molecular genetic experimentation and computational analysis.
The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.
We determined nucleotide sequences of homologous 0.9-kb fragments of
mitochondrial DNAs (mtDNAs) derived from four species of old-world monkeys,
one species of new-world monkeys, and two species of prosimians. With these
nucleotide sequences and homologous sequences for five species of
hominoids, we constructed a phylogenetic tree for the four groups of
primates. The phylogeny obtained is generally consistent with evolutionary
trees constructed in previous studies. Our results also suggest that the
rate of nucleotide substitution for mtDNAs in hominines (human, chimpanzee,
and gorilla) may have slowed down compared with that for old-world monkeys.
This evolutionary feature of mitochondrial genes is similar to one found in
nuclear genes.
Field mice of the genus Calomys are small, mostly granivorous rodents common to several habitats in South America. To date, phylogenies for the genus have been proposed on the basis of morphological, chromosomal, and biochemical data, often with contradictory results due to incomplete species sampling or methodological shortcomings. In this paper, we propose relationships among 10 species of Calomys based on the complete cytochrome b gene sequence. Our analyses show that Calomys is constituted by two major clades, one mostly associated with mountain habitats with subsequent invasions to lowland habitats and another with species restricted to lowland habitats both north and south of the Amazon basin. The evolution of the genus was likely accompanied by a reduction of chromosome diploid numbers that occurred independently in each of the two evolutionary lineages. A "clock" calibrated on the split between Auliscomys and Loxodontomys suggests that the almost nonexistent fossil record for the genus greatly underestimates divergence times among its species.
The molecular mechanism involved in packaging centromeric heterochromatin is still poorly understood. CENP-B, a centromeric protein present in human cells, is thought to be involved in this process. This is a DNA-binding protein that localizes to the central domain of the centromere of human and mouse chromosomes due to its association with the 17-bp CENP-B box sequence. We have designed a biochemical approach to search for functional homologues of CENP-B in Drosophila melanogaster. This strategy relies upon the use of DNA fragments containing the CENP-B box to identify proteins that specifically bind this sequence. Three polypeptides were isolated by nuclear protein extraction, followed by sequential ion exchange columns and DNA affinity chromatography. All three proteins are present in the complex formed after gel retardation with the human alphoid satellite DNA that contains the CENP-B box. Footprinting analysis reveals that the complex occupies both strands of the CENP-B box, although it is still unclear which of the polypeptides actually makes contact with the DNA. Localization of fluorescein-labeled proteins after microinjection into early Drosophila embryos shows that they associate with condensed chromosomes. Immunostaining of embryos with a polyclonal serum made against all three polypeptides also shows chromosomal localization throughout mitosis. During metaphase and anaphase the antigens appear to localize preferentially to centromeric heterochromatin. Immunostaining of neuroblast chromosome spreads confirmed these results, though some staining of chromosomal arms is also observed. The data strongly suggest that the polypeptides we have identified are chromosomal binding proteins that accumulate mainly at the centromeric heterochromatin. Furthermore, DNA binding assays clearly indicate that they have a high specific affinity for the human CENP-B box. This would suggest that at least one of the three proteins isolated might be a functional homologue of the human CENP-B.
Whole genome shotgun sequence analysis has become the standard method for beginning to determine a genome sequence. The preparation of the shotgun sequence clones is, in fact, a biological experiment. It determines which segments of the genome can be cloned into Escherichia coli and which cannot. By analyzing the complete set of sequences from such an experiment, it is possible to identify genes lethal to E. coli. Among this set are genes encoding restriction enzymes which, when active in E. coli, lead to cell death by cleaving the E. coli genome at the restriction enzyme recognition sites. By analyzing shotgun sequence data sets we show that this is a reliable method to detect active restriction enzyme genes in newly sequenced genomes, thereby facilitating functional annotation. Active restriction enzyme genes have been identified, and their activity demonstrated biochemically, in the sequenced genomes of Methanocaldococcus jannaschii, Bacillus cereus ATCC 10987 and Methylococcus capsulatus.
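The idea in the abstract above, that regions absent from the shotgun clone set point to genes that cannot be cloned in E. coli, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the interval representation (half-open start/end coordinates), and the toy data are all assumptions made for illustration, and a real analysis would also have to separate lethal genes from gaps caused by chance or low coverage.

```python
def uncovered_regions(genome_length, clones, min_gap=1):
    """Return half-open (start, end) intervals covered by no clone.

    clones: iterable of (start, end) half-open intervals (hypothetical format).
    min_gap: smallest uncovered stretch worth reporting.
    """
    gaps = []
    cursor = 0  # right-most position covered so far
    for start, end in sorted(clones):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if genome_length - cursor >= min_gap:
        gaps.append((cursor, genome_length))
    return gaps


def genes_in_gaps(genes, gaps):
    """Return sorted names of genes that overlap any uncovered region.

    genes: dict mapping gene name -> (start, end) half-open interval.
    """
    hits = []
    for name, (g_start, g_end) in genes.items():
        if any(g_start < gap_end and gap_start < g_end
               for gap_start, gap_end in gaps):
            hits.append(name)
    return sorted(hits)


# Toy data, invented for illustration only.
clones = [(0, 1000), (800, 2000), (2600, 4000)]
genes = {"geneA": (100, 900), "geneB": (2100, 2500), "geneC": (3900, 4100)}
gaps = uncovered_regions(5000, clones)
candidates = genes_in_gaps(genes, gaps)
```

In this toy case the candidates are the genes that no clone spans intact; such genes would then be the ones to check for restriction enzyme activity.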
The cosmid clone, CX16-2D12, was previously localized to the centromeric region of the human X chromosome and shown to lack human X-specific satellite DNA. A 1.2 kb EcoRI fragment was subcloned from the CX16-2D12 cosmid and was named 2D12/E2. DNA sequencing revealed that this 1,205 bp fragment consisted of approximately five tandemly repeated DNA monomers of 220 bp. DNA sequence homology between the monomers of 2D12/E2 ranged from 72.8% to 78.6%. Interestingly, DNA sequence analysis of the 2D12/E2 clone displayed a change in monomer unit orientation between nucleotide positions 585–586 from a tail-to-head arrangement to a head-to-tail configuration. This may reflect the existence of at least one inversion within this repetitive DNA array in the centromeric region of the human X chromosome. The DNA consensus sequence derived from a compilation of these 220 bp monomers had approximately 62% DNA sequence similarity to the previously determined γ 8 satellite DNA consensus sequence. Comparison of the 2D12/E2 and γ 8 consensus sequences revealed a 20 bp DNA sequence that was well conserved in both DNA consensus sequences. Slot-blot analysis revealed that this repetitive DNA sequence comprises approximately 0.015% of the human genome, similar to that found with γ 8 satellite DNA. These observations suggest that this satellite DNA clone is derived from a subfamily of γ satellite DNA and is thus designated γ X satellite DNA. When genomic DNA from six unrelated males and two unrelated females was cut with SstI or HpaI and separated by pulsed-field gel electrophoresis, no restriction fragment length polymorphisms were observed for either γ X (2D12/E2) or γ 8 (50E4) probes. Fluorescence in situ hybridization localized the 2D12/E2 clone to the lateral sides of the primary constriction specifically on the human X chromosome.
Yersinia pestis is the causative agent of the bubonic, septicemic, and pneumonic plagues (also known as black death) and has been responsible for recurrent devastating pandemics throughout history. To further understand this virulent bacterium and to accelerate an ongoing sequencing project, two whole-genome restriction maps (XhoI and PvuII) of Y. pestis strain KIM were constructed using shotgun optical mapping. This approach constructs ordered restriction maps from randomly sheared individual DNA molecules directly extracted from cells. The two maps served different purposes; the XhoI map facilitated sequence assembly by providing a scaffold for high-resolution alignment, while the PvuII map verified genome sequence assembly. Our results show that such maps facilitated the closure of sequence gaps and, most importantly, provided a purely independent means for sequence validation. Given the recent advancements to the optical mapping system, increased resolution and throughput are enabling such maps to guide sequence assembly at a very early stage of a microbial sequencing project.
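Validating an assembly against an optical map, as described above, relies on comparing the measured map with an in silico digest of the assembled sequence. The following is a minimal sketch of such a digest, not the authors' software. It uses the standard recognition sites of XhoI (C^TCGAG) and PvuII (CAG^CTG) and, as a simplifying assumption, treats the sequence as linear although the Y. pestis chromosome is circular. The function names and the toy sequence are hypothetical.

```python
# Recognition site and cut offset within the site for each enzyme.
ENZYMES = {
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
    "PvuII": ("CAGCTG", 3),  # CAG^CTG (blunt)
}


def site_positions(sequence, site):
    """Return 0-based start positions of every occurrence of site."""
    seq = sequence.upper()
    positions = []
    idx = seq.find(site)
    while idx != -1:
        positions.append(idx)
        idx = seq.find(site, idx + 1)
    return positions


def fragment_lengths(sequence, enzyme):
    """Return ordered fragment lengths from digesting a linear sequence.

    Both sites are palindromic, so scanning one strand is sufficient.
    """
    site, offset = ENZYMES[enzyme]
    cuts = [pos + offset for pos in site_positions(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy 23-bp sequence with one site for each enzyme (invented for illustration).
toy = "AAAA" + "CTCGAG" + "TTTTT" + "CAGCTG" + "GG"
xho_map = fragment_lengths(toy, "XhoI")
pvu_map = fragment_lengths(toy, "PvuII")
```

The ordered fragment lengths form the predicted restriction map, which would then be aligned to the optical map; the alignment step itself is outside the scope of this sketch.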
The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSNP50 BeadChip and Illumina BovineHD BeadChip, to whole-genome sequence data is an attractive and less expensive approach to obtain whole-genome sequence genotypes for a large number of individuals than sequencing all individuals. Our objective was to investigate accuracy of imputation from lower density SNP panels to whole-genome sequence data in a typical dataset for cattle.
Methods
Whole-genome sequence data of chromosome 1 (1 737 471 SNPs) for 114 Holstein Friesian bulls were used. Beagle software was used for imputation from the BovineSNP50 (3132 SNPs) and BovineHD (40 492 SNPs) beadchips. Accuracy was calculated as the correlation between observed and imputed genotypes and assessed by five-fold cross-validation. Three scenarios S40, S60 and S80 with respectively 40%, 60%, and 80% of the individuals as reference individuals were investigated.
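As an illustration of the accuracy measure described in the Methods, the sketch below computes a per-SNP Pearson correlation between observed and imputed genotypes and builds a simple five-way partition of individuals. It is a minimal sketch under stated assumptions, not the study's code: genotypes are assumed to be coded 0/1/2, the function names are hypothetical, and the interleaved fold assignment is only one possible way to split the 114 bulls (the abstract does not say how folds were formed or how they map onto the S40, S60 and S80 scenarios).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a monomorphic SNP
    return sxy / sqrt(sxx * syy)


def per_snp_accuracy(observed, imputed):
    """Correlation per SNP; inputs are lists of per-individual genotype lists."""
    n_snps = len(observed[0])
    return [
        pearson([ind[j] for ind in observed], [ind[j] for ind in imputed])
        for j in range(n_snps)
    ]


def k_fold_indices(n_individuals, k=5):
    """Assign individuals to k folds by interleaving their indices."""
    return [list(range(i, n_individuals, k)) for i in range(k)]


# Toy genotypes (4 individuals x 2 SNPs), invented for illustration.
observed = [[0, 1], [1, 1], [2, 0], [1, 2]]
imputed = [[0, 1], [1, 1], [2, 1], [1, 1]]
accuracy = per_snp_accuracy(observed, imputed)
folds = k_fold_indices(114)
```

Returning None for a SNP whose observed or imputed genotypes do not vary is a design choice of this sketch; such SNPs would need to be excluded before averaging accuracies.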
Results
Mean accuracies of imputation per SNP from the BovineHD panel to sequence data and from the BovineSNP50 panel to sequence data for scenarios S40 and S80 ranged from 0.77 to 0.83 and from 0.37 to 0.46, respectively. Stepwise imputation from the BovineSNP50 to BovineHD panel and then to sequence data for scenario S40 improved accuracy per SNP to 0.65 but it varied considerably between SNPs.
Conclusions
Accuracy of imputation to whole-genome sequence data was generally high for imputation from the BovineHD beadchip, but was low from the BovineSNP50 beadchip. Stepwise imputation from the BovineSNP50 to the BovineHD beadchip and then to sequence data substantially improved accuracy of imputation. SNPs with a low minor allele frequency were more difficult to impute correctly and the reliability of imputation varied more. Linkage disequilibrium between an imputed SNP and the SNP on the lower density panel, minor allele frequency of the imputed SNP and size of the reference group affected imputation reliability.
The human alpha satellite repetitive DNA family is organized as distinct chromosomal subsets located at the centromeric regions of each human chromosome. Here, we describe a subset of the alpha satellite which is localized to human chromosome 11. The principal unit of repetition of this alpha satellite subset is an 850 bp XbaI fragment composed of five tandem diverged alphoid monomers, each 171 bp in length. The pentamer repeat units are themselves tandemly reiterated, present in 500 copies per chromosome 11. In filter hybridization experiments, the Alpha 11 probes are specific for the centromeric alpha satellite sequences of human chromosome 11. The complete nucleotide sequences of two independent copies of the XbaI pentamer reveal a pentameric configuration shared with the alphoid repeats of chromosomes 17 and X, consistent with the existence of an ancestral pentameric repeat common to the centromeric arrays of at least these three human chromosomes.
Radiation-induced single-strand breaks were found throughout the 172 bp repeat units of African green monkey component alpha DNA. Two kinds of 3'-ends of 5'-32P-labeled restriction fragments were found, as previously described by others. After irradiation in vitro, the yield of single-strand breaks was 4 × 10⁻⁵ breaks/nucleotide/Gy, as determined by analyses in DNA sequencing-type gels. Protection from X-ray damage was found when the DNA received 150 Gy in the presence of 2-mercaptoethanol. The results demonstrate a very sensitive quantitative means to study the role of indirect effects of ionizing radiation on strand-break induction and protection at the base sequence level. Component alpha DNA was isolated from irradiated CV-1 cells and was analyzed for single-strand breaks. Under these conditions the frequency of breaks was less than the frequency obtained when purified DNA was irradiated. The methodology is presented because of its relevance to the study of DNA strand breakage in living cells.
In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the use of imputed whole-genome sequence genotypes versus high-density SNP genotypes on (the persistency of) the reliability of genomic predictions using real cattle data.
Methods
Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training.
Results
Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed.
Conclusions
Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.
Electronic supplementary material
The online version of this article (doi:10.1186/s12711-015-0149-x) contains supplementary material, which is available to authorized users.
Chloroplast gene matK sequence data were used to estimate the phylogeny of 112 species of Crassulaceae sampled from 33 genera and all six recognized subfamilies. Our analyses suggest that five of six subfamilies recognized in the most recent comprehensive classification of the family are not monophyletic. Instead, we recovered a basal split in Crassulaceae between the southern African Crassula clade (Crassuloideae) and the rest of the family (Sedoideae). These results are compatible with recent studies of cpDNA restriction site analyses. Within Sedoideae, four subclades were also recovered: Kalanchoe, Leucosedum, Acre, and Aeonium; evidence also exists for a Telephium clade and Sempervivum clade. The genus Sedum is highly polyphyletic with representatives spread throughout the large Sedoideae clade. Sympetaly and polymerous flowers have arisen multiple times in Crassulaceae and thus are not appropriate characters upon which to base subfamilial limits, as has been done in the past. One floral character, haplostemy, appears to be confined to the well-supported Crassula clade. Our analyses suggest a southern African origin of the family, with subsequent dispersal northward into the Mediterranean region. From there, the family spread to Asia/eastern Europe and northern Europe; two separate lineages of European Crassulaceae subsequently dispersed to North America and underwent substantial diversification. Our analyses also suggest that the original base chromosome number in Crassulaceae is x = 8 and that polyploidy has played an important role in seven clades. Three of these clades are exclusively polyploid (Sempervivum clade and two subclades within the Kalanchoe and Aeonium clades), whereas four (Crassula, Telephium, Leucosedum, and Acre clades) comprise both diploid and polyploid taxa. Polyploidy is particularly rampant and cytological evolution especially complex in the Acre clade.
Despite the growing number of sequenced bovine genomes, the knowledge of the population-wide variation of sequences remains limited. In many studies, statistical methodology was not applied in order to relate findings in the sequenced samples to a population-wide level. Our goal was to assess the population-wide variation in DNA sequence based on whole-genome sequences of 32 Holstein–Friesian cows. The number of SNPs significantly varied across individuals. The number of identified SNPs increased with coverage, following a logarithmic curve. A total of 15,272,427 SNPs were identified, 99.16 % of them being bi-allelic. Missense SNPs were classified into three categories based on their genomic location: housekeeping genes, genes undergoing strong selection, and genes neutral to selection. The number of missense SNPs was significantly higher within genes neutral to selection than in the other two categories. The number of variants located within 3′UTR and 5′UTR regions was also significantly different across gene families. Moreover, the number of insertions and deletions differed significantly among cows varying between 261,712 and 330,103 insertions and from 271,398 to 343,649 deletions. Results not only demonstrate inter-individual variation in the number of SNPs and indels but also show that the number of missense SNPs differs across genes representing different functional backgrounds.