Similar literature
 Found 20 similar documents; search took 15 ms
1.

Background

Comparative evolutionary analysis of whole genomes requires not only accurate annotation of gene space, but also proper annotation of the repetitive fraction which is often the largest component of most if not all genomes larger than 50 kb in size.

Results

Here we present the Rice TE database (RiTE-db), a genus-wide collection of transposable elements and repeated sequences across 11 diploid species of the genus Oryza and the closely related out-group Leersia perrieri. The database consists of more than 170,000 entries divided into three main types: (i) a classified and curated set of publicly available repeated sequences, (ii) a set of consensus assemblies of highly repetitive sequences obtained from genome sequencing surveys of 12 species, and (iii) a set of full-length TEs, identified and extracted from 12 whole genome assemblies.

Conclusions

This is the first report of a repeat dataset that spans the majority of repeat variability within an entire genus, and one that includes complete elements as well as unassembled repeats. The database allows sequence browsing, downloading, and similarity searches. Because of the strategy adopted, the RiTE-db opens a new path to unprecedented direct comparative studies that span the entire nuclear repeat content of 15 million years of Oryza diversity.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1762-3) contains supplementary material, which is available to authorized users.

2.
Itoh Y  Kampf K  Arnold AP 《Chromosoma》2008,117(2):111-121
The zebra finch (Taeniopygia guttata) has a large Z chromosome and a highly condensed W chromosome. We used the random amplified polymorphic DNA (RAPD) polymerase chain reaction (PCR) technique to isolate the female-specific sequences ZBM1 and ZBM2. Southern blot hybridization to male and female zebra finch genomic DNA suggested that these sequences were located on the W chromosome, although homologous sequences appeared to be autosomal or Z-linked. Fluorescent in situ hybridization (FISH) using bacterial artificial chromosome (BAC) clones corresponding to ZBM sequences showed hybridization to the whole W chromosome, suggesting that the BACs encode sequences that are repeated across the entire W chromosome. Based on the sequencing of a ZBM repetitive sequence and Z chromosome-derived BAC clones, we demonstrate a random distribution of repeat sequences that are specific to the W chromosome or encoded by both Z and W. The positions of ZW-common repeat sequences mapped to a noncoding region of a Z chromosome BAC clone containing the CHD1Z gene. The apparent lineage-specificity of W chromosome repeat sequences in passerines and galliform birds suggests that the W chromosome had not differentiated well from the Z at the time of divergence of these lineages. Electronic supplementary material The online version of this article (doi:) contains supplementary material, which is available to authorized users.

3.
4.
A collection of 9,990 single-pass nuclear genomic sequences, corresponding to 5 Mb of tomato DNA, was obtained using a methylation filtration (MF) strategy and reduced to 7,053 unique undermethylated genomic islands (UGIs) distributed as follows: (1) 59% non-coding sequences, (2) 28% coding sequences, (3) 12% transposons, 96% of which are class I retroelements, and (4) 1% organellar sequences integrated into the nuclear genome over the past approximately 100 million years. A more detailed analysis of coding UGIs indicates that the unmethylated portion of tomato genes extends as far as 676 bp upstream and 766 bp downstream of coding regions, with an average of 174 and 171 bp, respectively. Based on the analysis of the UGI copy distribution, the undermethylated portion of the tomato genome is determined to account for the majority of the unmethylated genes in the genome and is estimated to constitute 61±15 Mb of DNA (~5% of the entire genome), which is significantly less than the 220 Mb estimated for the gene-rich euchromatic arms of the tomato genome. This result indicates that, while most genes reside in the euchromatin, a significant portion of euchromatin is methylated in the intergenic spacer regions. Implications of the results for sequencing the genome of tomato and other solanaceous species are discussed.

5.
We report the results of a comprehensive search of Drosophila melanogaster DNA sequences in GenBank for di-, tri-, and tetranucleotide repeats of more than four repeat units, and a DNA library screen for dinucleotide repeats. Dinucleotide repeats are more abundant (66%) than tri- (30%) or tetranucleotide (4%) repeats. We estimate that 1917 dinucleotide repeats with 10 or more repeat units are present in the euchromatic D. melanogaster genome and that, on average, they occur once every 60 kb. Relative to many other animals, dinucleotide repeats in D. melanogaster are short. Tri- and tetranucleotide repeats have even fewer repeat units on average than dinucleotide repeats. Our World Wide Web site (http://www.bio.cornell.edu/genetics/aquadro/aquadro.html) posts the complete list of 1298 microsatellites (≥ five repeat units) identified from the GenBank search. We also summarize assay conditions for 70 D. melanogaster microsatellites characterized in previous studies and an additional 56 newly characterized markers.
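The kind of search described in this abstract can be sketched with a regular expression. This is only an illustration, not the authors' method: the function name find_microsatellites and the min_units parameter are hypothetical, and the default of five units is chosen to mirror "more than four repeat units".

```python
import re

def find_microsatellites(seq, min_units=5):
    """Return (start, motif, units) for di-, tri- and tetranucleotide repeats.

    Hypothetical helper: scans one strand only and reports whichever
    rotation of the motif the leftmost regex match happens to pick.
    """
    seq = seq.upper()
    hits = []
    for k in (2, 3, 4):  # motif lengths: di-, tri-, tetranucleotide
        # One motif of length k followed by at least (min_units - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit ("AA", "CACA").
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

example = "GG" + "CA" * 6 + "CC" + "GAT" * 5 + "C"
print(find_microsatellites(example))  # [(2, 'CA', 6), (16, 'GAT', 5)]
```

Filtering out non-primitive motifs keeps a long CA array from also being counted as a CACA tetranucleotide repeat.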

6.
An overview of the apple genome through BAC end sequence analysis   Total citations: 1 (self-citations: 0, citations by others: 1)
The apple, Malus x domestica Borkh., is one of the most important fruit trees grown worldwide. A bacterial artificial chromosome (BAC)-based physical map of the apple genome has been recently constructed. Based on this physical map, a total of approximately 2,100 clones from different contigs (overlapping BAC clones) have been selected and sequenced at both ends, generating 3,744 high-quality BAC end sequences (BESs) including 1,717 BAC end pairs. Approximately 8.5% of BESs contain simple sequence repeats (SSRs), most of which are AT/TA dimer repeats. Potential transposable elements are identified in approximately 21% of BESs, and most of these elements are retrotransposons. About 11% of BESs have homology to the Arabidopsis protein database. The matched proteins cover a broad range of categories. The average GC content of the predicted coding regions of BESs is 42.4%, while that of the whole BESs is 39%. A small number of BES pairs were mapped to neighboring chromosome regions of A. thaliana and Populus trichocarpa, whereas no pairs were mapped to the Oryza sativa genome. The apple has a higher degree of synteny with the closely related Populus than with the distantly related Arabidopsis. BAC end sequencing can be used to anchor a small proportion of the apple genome to the Populus and possibly to the Arabidopsis genomes.

7.
Bread wheat (Triticum aestivum L.) is one of the most important crops globally and a high priority for genetic improvement, but its large and complex genome has been seen as intractable to whole genome sequencing. Isolation of individual wheat chromosome arms has facilitated large-scale sequence analyses. However, so far there is no such survey of sequences from the A genome of wheat. Greater understanding of an A chromosome could facilitate wheat improvement and future sequencing of the entire genome. We have constructed a BAC library from the long arm of T. aestivum chromosome 1A (1AL) and obtained BAC end sequences from 7,470 clones encompassing the arm. We obtained 13,445 (89.99%) useful sequences with a cumulative length of 7.57 Mb, representing 1.43% of 1AL and about 0.14% of the entire A genome. The GC content of the sequences was 44.7%, and 90% of the chromosome was estimated to comprise repeat sequences, while just over 1% encoded expressed genes. From the sequence data, we identified a large number of sites suitable for development of molecular markers (362 SSR and 6,948 ISBP), which will have utility for mapping this chromosome and for marker-assisted breeding. Of 44 putative ISBP markers tested, 23 (52.3%) were found to be useful. The BAC end sequence data also enabled the identification of genes and syntenic blocks specific to chromosome 1AL, suggesting regions of particular functional interest and targets for future research.
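The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes one read was attempted from each end of every clone (an assumption, although it reproduces the quoted 89.99%); the mean read length and the arm size are derived values that the abstract does not state, and the function name bes_summary is hypothetical.

```python
def bes_summary(clones, useful_reads, total_mb, arm_fraction):
    """Derive summary statistics from BAC end sequencing counts."""
    attempted = 2 * clones  # assumption: two end reads per clone
    return {
        "success_pct": round(100 * useful_reads / attempted, 2),
        "mean_read_bp": round(total_mb * 1e6 / useful_reads),
        "implied_arm_mb": round(total_mb / arm_fraction),
    }

# Inputs are the numbers given in the abstract for chromosome arm 1AL.
summary = bes_summary(clones=7470, useful_reads=13445,
                      total_mb=7.57, arm_fraction=0.0143)
print(summary)
# {'success_pct': 89.99, 'mean_read_bp': 563, 'implied_arm_mb': 529}

# Share of tested ISBP markers that were useful (23 of 44).
print(round(100 * 23 / 44, 1))  # 52.3
```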

8.
The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC‐by‐BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high‐resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high‐resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome‐scale analysis of repetitive sequences and revealed a ~800‐kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone‐by‐clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC‐contig physical map and validate sequence assembly on a chromosome‐arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome‐by‐chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules.
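For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of the usual definition follows: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the summed length. The function name n50 and the toy lengths are illustrative only and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to >= 50%.
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")

print(n50([2, 3, 4, 5, 6]))  # 5
```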

9.
Macrostructure of the tomato telomeres.   Total citations: 23 (self-citations: 3, citations by others: 20)
The macrostructure of the tomato telomeres has been investigated by in situ hybridization, genomic sequencing, and pulsed-field gel electrophoresis. In situ hybridizations with a cloned telomeric sequence from Arabidopsis thaliana indicated that the telomeric repeat of tomato cross-hybridizes with that of Arabidopsis and is located at all telomeres. Bal31 digestion kinetics confirmed that the tomato telomeric repeat represents the outermost DNA sequence of each tomato chromosome. Genomic sequencing of enriched tomato telomeric sequences, using primers derived from the Arabidopsis sequence, revealed that the consensus sequence of the tomato telomeric repeat is TT(T/A)AGGG compared with the Arabidopsis consensus sequence of TTTAGGG. Furthermore, as shown by pulsed-field gel electrophoresis, the telomeric repeat of tomato is separated by not more than a few hundred kilobases from a previously described 162-base pair satellite DNA repeat of tomato (TGR I) at 20 of the 24 telomeres. Together, these sequences are found in the heterochromatic terminal knob observed in pachytene chromosomes. Therefore, these two repeats determine the structure of 20 of the 24 tomato chromosome ends over approximately 2% of the total chromosome length.
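The consensus TT(T/A)AGGG translates directly into a regular expression, which gives a quick way to look for tandem telomeric arrays in a sequence. The sketch below is an illustration under simple assumptions: it scans the given strand only, requires perfect copies of the consensus, and uses a hypothetical function name.

```python
import re

# TT(T/A)AGGG: the third base may be T or A. The Arabidopsis consensus
# TTTAGGG is the special case with T at that position.
TOMATO_TELOMERE = re.compile(r"(?:TT[TA]AGGG)+")

def longest_telomere_array(seq):
    """Return the largest number of consecutive 7-bp telomeric units."""
    best = 0
    for m in TOMATO_TELOMERE.finditer(seq.upper()):
        best = max(best, len(m.group(0)) // 7)
    return best

print(longest_telomere_array("ACGT" + "TTTAGGG" + "TTAAGGG" + "TTTAGGG" + "CC"))  # 3
```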

10.
Here, we report genotyping conditions for 434 new polymorphic pig microsatellite markers containing trinucleotide and tetranucleotide repeat motifs. Microsatellite sequences were detected in silico from bacterial artificial chromosome (BAC) clone end sequences and mapped to the pig genome. A set of 22 microsatellites is described, which can be separated in a simultaneous electrophoresis by multiplexing across a large size range, in combination with 4-colour labelling. Marker information content and false pedigree exclusion probabilities are documented in five purebred populations, allowing assessment of this panel in pig parentage testing applications. Combined exclusion probabilities >99.7% were achieved in all pedigree test cases.
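A combined exclusion probability is commonly obtained by assuming the markers are independent, so that a false parent escapes exclusion only if every marker fails to exclude: P = 1 - (1 - p_1)(1 - p_2)...(1 - p_n). The abstract does not say how its figure was computed, so the sketch below shows this standard formula only, and the per-marker value of 0.25 is a made-up placeholder rather than data from the study.

```python
from math import prod

def combined_exclusion(per_marker):
    """Combine per-marker exclusion probabilities, assuming independence."""
    return 1.0 - prod(1.0 - p for p in per_marker)

# Hypothetical panel: 22 markers, each excluding a false parent 25% of the time.
panel = [0.25] * 22
print(combined_exclusion(panel) > 0.997)  # True
```

Even modest per-marker power compounds quickly across a panel of this size, which is why multiplexing many markers in one run is attractive for parentage testing.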

11.
We have developed the software package Tomato and Potato Assembly Assistance System (TOPAAS), which automates the assembly and scaffolding of contig sequences for low-coverage sequencing projects. The order of contigs predicted by TOPAAS is based on read pair information; alignments between genomic, expressed sequence tags, and bacterial artificial chromosome (BAC) end sequences; and annotated genes. The contig scaffold is used by TOPAAS for automated design of nonredundant sequence gap-flanking PCR primers. We show that TOPAAS builds reliable scaffolds for tomato (Solanum lycopersicum) and potato (Solanum tuberosum) BAC contigs that were assembled from shotgun sequences covering the target at 6- to 8-fold coverage. More than 90% of the gaps are closed by sequence PCR, based on the predicted ordering information. TOPAAS also assists the selection of large genomic insert clones from BAC libraries for walking. For this, tomato BACs are screened by automated BLAST analysis and, in parallel, high-density nonselective amplified fragment length polymorphism fingerprinting is used for constructing a high-resolution BAC physical map. BLAST and amplified fragment length polymorphism analysis are then used together to determine the precise overlap. Assembly onto the seed BAC consensus confirms that the BACs are properly selected for having an extremely short overlap and the largest extending insert. This method will be particularly applicable where related or syntenic genomes are sequenced, as shown here for the Solanaceae, and potentially useful for the monocots, Brassicaceae, and Leguminosae.

12.
Zhao C  Zhang T  Zhang X  Hu S  Xiang J 《Gene》2012,502(1):9-15
The sequencing of BAC clones (~100 kb) can reveal some characteristics of a genome that are challenging to obtain based on short sequences. Additionally, although the immune genes of the Zhikong scallop (Chlamys farreri) have been studied widely, few analyses have been conducted at the DNA level. In this study, four C. farreri BAC clones containing innate immune genes, including hsp70, lgbp (lipopolysaccharide and beta-1,3-glucan binding protein), serine protease and a gene with an immunoglobulin-like domain, were sequenced and analyzed both to explore the genomic characteristics of C. farreri based on long DNA sequences and to promote the study of C. farreri immune genes at the DNA level. The total length of the four BACs was 389.98 kb. A total of 34 genes were predicted in these sequences, and several features of protein-coding regions in the C. farreri genome were inferred based on this information. Two LGBP genes were located close together in a 22-kb region in one BAC clone, indicating the physical linkage of some immune genes in C. farreri. A cluster of membrane transport genes was also observed; these genes might play important roles in eliminating toxins in C. farreri, which lives as a filter feeder. Further analysis showed that 15.43% of the BAC sequence was repetitive. Tandem repeats were the most abundant repeat type, followed by transposable elements. A total of 31 SSRs were predicted in the four BACs. An IS10 family transposon was identified, and a suspected regulatory non-coding RNA gene for this transposon (RNA-OUT) was observed to overlap with it complementarily. This work will promote future studies on the genomics, immune system and non-coding regions of C. farreri.

13.
Aphids cause serious physical and economic damage to most major crops throughout the world, and there is a pressing requirement to isolate genes conferring aphid resistance. The Sd-1 locus in Malus spp. (apple) confers resistance against the rosy leaf-curling aphid (Dysaphis devecta Wlk.), and was recently positioned within a 1.3-cM region on linkage group 7, flanked by molecular markers. These markers were used as a basis for development of a BAC contig spanning the locus, together with adapter-mediated amplification of flanking sequences to obtain BAC insert-end sequences, and fingerprinting of BAC clones. Approximately 800 kb of the Sd-1 genomic region was covered by 19 overlapping BACs, with an average insert size of 75-150 kb. The physical-genetic distance ratio was estimated at 460 kb/cM, although the distribution of recombination events was irregular with respect to estimated physical distance. Recombinant analysis and development of new markers allowed Sd-1 to be positioned within an interval of approximately 180 kb located on either of two overlapping BACs. From one of these, an insert end sequence showed a significant degree of similarity to nucleotide binding site-leucine rich repeat (NBS-LRR) resistance genes. Fluorescent in situ hybridization (FISH) of BAC clones within the contig enabled positioning and orientation of the locus within a euchromatic region, very close to the telomere of linkage group 7.

14.
Libraries constructed in bacterial artificial chromosome (BAC) vectors have become the choice for clone sets in high throughput genomic sequencing projects primarily because of their high stability. BAC libraries have been proposed as a source for minimally overlapping clones for sequencing large genomic regions, and the use of BAC end sequences (i.e. sequences adjoining the insert sites) has been proposed as a primary means for selecting minimally overlapping clones for sequencing large genomic regions. For this strategy to be effective, high throughput methods for BAC end sequencing of all the clones in deep coverage BAC libraries needed to be developed. Here we describe a low cost, efficient, 96 well procedure for BAC end sequencing. These methods allow us to generate BAC end sequences from human and Arabidopsis libraries with an average read length of >450 bases and with a single pass sequencing average accuracy of >98%. Application of BAC end sequences in genomic sequencing is discussed.

15.
16.
The ankyrin repeat (ANK) protein family plays a crucial role in plant growth and development and in response to biotic and abiotic stresses. However, no detailed information concerning this family is available for tomato (Solanum lycopersicum) due to the limited information on whole genome sequences. In this study, we identified a total of 130 ANK genes in the tomato genome (SlANK), and these genes were distributed across all 12 chromosomes at various densities. Chromosomal localization indicated that 25 SlANK genes were involved in tandem duplications. Based on their domain composition, all of the SlANK proteins were grouped into 13 subgroups. A combined phylogenetic tree was constructed with the aligned SlANK protein sequences. This tree revealed that the SlANK proteins comprise five major groups. An analysis of the expression profiles of SlANK genes in tomato in different tissues and in response to stresses showed that the SlANK proteins play roles in plant growth, development and stress responses. To our knowledge, this is the first report of a genome-wide analysis of the tomato ANK gene family. This study provides valuable information regarding the classification and putative functions of SlANK genes in tomato.

17.
The zebra finch (Taeniopygia guttata) is an important model organism for studying behavior, neuroscience, avian biology, and evolution. To support the study of its genome, we constructed a BAC library (TG__Ba) using DNA from livers of females. The BAC library consists of 147,456 clones with 98% containing inserts of an average size of 134 kb and represents 15.5 haploid genome equivalents. By sequencing a whole BAC, a full-length androgen receptor open reading frame was identified, the first in an avian species. Comparison of BAC end sequences and the whole BAC sequence with the chicken genome draft sequence showed a high degree of conserved synteny between the zebra finch and the chicken genome.
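Library coverage in haploid genome equivalents is usually computed as (clones with inserts x mean insert size) / haploid genome size. The abstract gives the coverage but not the genome size it assumed, so the sketch below runs the relation in reverse to recover the implied size; both function names are hypothetical and the result of roughly 1.25 Gb is a back-calculation, not a figure from the source.

```python
def genome_equivalents(n_clones, insert_fraction, mean_insert_kb, genome_mb):
    """Haploid genome equivalents represented by a BAC library."""
    return n_clones * insert_fraction * mean_insert_kb / (genome_mb * 1000)

def implied_genome_mb(n_clones, insert_fraction, mean_insert_kb, coverage):
    """Genome size (Mb) implied by a stated library coverage."""
    return n_clones * insert_fraction * mean_insert_kb / coverage / 1000

# Numbers from the abstract: 147,456 clones, 98% with inserts, 134 kb mean
# insert size, 15.5 haploid genome equivalents.
size_mb = implied_genome_mb(147456, 0.98, 134, 15.5)
print(round(size_mb))  # 1249
```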

18.
Little is known about the physical makeup of heterochromatin in the soybean (Glycine max L. Merr.) genome. Using DNA sequencing and molecular cytogenetics, an initial analysis of the repetitive fraction of the soybean genome is presented. BAC 076J21, derived from linkage group L, has sequences conserved in the pericentromeric heterochromatin of all 20 chromosomes. FISH analysis of this BAC and three subclones on pachytene chromosomes revealed relatively strict partitioning of the heterochromatic and euchromatic regions. Sequence analysis showed that this BAC consists primarily of repetitive sequences such as a 102-bp tandem repeat with sequence identity to a previously characterized approximately 120-bp repeat (STR120). Fragments of Calypso-like retroelements, a recently inserted SIRE1 element, and a SIRE1 solo LTR were present within this BAC. Some of these sequences are methylated and are not conserved outside of G. max and G. soja, a close relative of soybean, except for STR102, which hybridized to a restriction fragment from G. latifolia. These data present a picture of the repetitive fraction of the soybean genome that is highly concentrated in the pericentromeric regions, consisting of rapidly evolving tandem repeats with interspersed retroelements.

19.
20.
The perennial grass switchgrass (Panicum virgatum L.) is a promising bioenergy crop and the target of whole genome sequencing. We constructed two bacterial artificial chromosome (BAC) libraries from the AP13 clone of switchgrass to gain insight into the genome structure and organization, initiate functional and comparative genomic studies, and assist with genome assembly. Together representing 16 haploid genome equivalents of switchgrass, each library comprises 101,376 clones with average insert sizes of 144 kb (HindIII-generated) and 110 kb (BstYI-generated). A total of 330,297 high quality BAC-end sequences (BES) were generated, accounting for 263.2 Mbp (16.4%) of the switchgrass genome. Analysis of the BES identified 279,099 known repetitive elements, >50,000 SSRs, and 2,528 novel repeat elements, named switchgrass repetitive elements (SREs). Comparative mapping of 47 full-length BAC sequences and 330K BES revealed high levels of synteny with the grass genomes of sorghum, rice, maize, and Brachypodium. Our data indicate that the sorghum genome has retained larger microsyntenous regions with switchgrass, in addition to high gene order conservation with rice. The resources generated in this effort will be useful for a broad range of applications.
