1.
Cheung J Wilson MD Zhang J Khaja R MacDonald JR Heng HH Koop BF Scherer SW 《Genome biology》2003,4(8):R47
Background
The high quality of the mouse genome draft sequence and its associated annotations are an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, detecting recent sequence duplications can reveal potentially problematic regions of the genome assembly. We use BLAST-based computational heuristics to identify large (≥ 5 kb) and recent (≥ 90% sequence identity) segmental duplications in the mouse genome sequence. Here we present a database of recently duplicated regions of the mouse genome found in the mouse genome sequencing consortium (MGSC) February 2002 and February 2003 assemblies.
Results
We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003 mouse genome sequence assembly is involved in recent segmental duplications, which is less than that observed in the human genome (around 3.5-5%). From this dataset, 8.9 Mb (26%) of the duplication content consisted of 'unmapped' chromosome sequence. Moreover, we suspect that an additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence misassignment errors in this genome assembly. By searching for genes that are located within these regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen of these genes appear to have been duplicated independently in the human genome. From our dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect gene essential for embryogenesis in mice.
Conclusion
Our results provide an initial analysis of the recently duplicated sequence and gene content of the mouse genome. Many of these duplicated loci, as well as regions identified to be involved in potential sequence misassignment errors, will require further mapping and sequencing to achieve accuracy. A Genome Browser database was set up to display the identified duplication content presented in this work. This data will also be relevant to the growing number of investigators who use the draft genome sequence for experimental design and analysis.
2.
Chudin Eugene Walker Randal Kosaka Alan Wu Sue X Rabert Douglas Chang Thomas K Kreder Dirk E 《Genome biology》2002,4(1):1-10
Background
The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries and different assembly methods.
Results
We present here a critical comparison of the two latest mouse genome assemblies. The utility of the combined genomes is further demonstrated by comparing them with the human 'golden path' and through a subsequent analysis of a resulting conserved sequence element (CSE) database, which allows us to identify over 6,000 potential novel genes and to derive independent estimates of the number of human protein-coding genes.
Conclusion
The Celera and public mouse assemblies differ in about 10% of the mouse genome. Each assembly has advantages over the other: Celera has higher accuracy in base-pairs and overall higher coverage of the genome; the public assembly, however, has higher sequence quality in some newly finished bacterial artificial chromosome clone (BAC) regions and the data are freely accessible. Perhaps most important, by combining both assemblies, we can get a better annotation of the human genome; in particular, we can obtain the most complete set of CSEs, one third of which are related to known genes and some others are related to other functional genomic regions. More than half the CSEs are of unknown function. From the CSEs, we estimate the total number of human protein-coding genes to be about 40,000. This searchable publicly available online CSEdb will expedite new discoveries through comparative genomics.
3.
The identification and classification of genes and pseudogenes in duplicated regions still constitutes a challenge for standard automated genome annotation procedures. Using an integrated homology and orthology analysis independent of current gene annotation, we have identified 9,484 and 9,017 gene duplicates in human and mouse, respectively. On the basis of the integrity of their coding regions, we have classified them into functional and inactive duplicates, allowing us to define the first consistent and comprehensive collection of 1,811 human and 1,581 mouse unprocessed pseudogenes. Furthermore, of the total of 14,172 human and mouse duplicates predicted to be functional genes, as many as 420 are not included in current reference gene databases and therefore correspond to likely novel mammalian genes. Some of these correspond to partial duplicates with less than half of the length of the original source genes, yet they are conserved and syntenic among different mammalian lineages. The genes and unprocessed pseudogenes obtained here will enable further studies on the mechanisms involved in gene duplication as well as of the fate of duplicated genes.
4.
Mukesh Jain Pushp Priya Shalu Jhanwar Aamir W. Khan Niraj Shah Vikas K. Singh Rohini Garg Ganga Jeena Manju Yadav Chandra Kant Priyanka Sharma Gitanjali Yadav Sabhyata Bhatia Akhilesh K. Tyagi Debasis Chattopadhyay 《The Plant journal : for cell and molecular biology》2013,74(5):715-729
Cicer arietinum L. (chickpea) is the third most important food legume crop. We have generated the draft sequence of a desi‐type chickpea genome using next‐generation sequencing platforms, bacterial artificial chromosome end sequences and a genetic map. The 520‐Mb assembly covers 70% of the predicted 740‐Mb genome length, and more than 80% of the gene space. Genome analysis predicts the presence of 27 571 genes and 210 Mb as repeat elements. The gene expression analysis performed using 274 million RNA‐Seq reads identified several tissue‐specific and stress‐responsive genes. Although segmental duplicated blocks are observed, the chickpea genome does not exhibit any indication of recent whole‐genome duplication. Nucleotide diversity analysis provides an assessment of a narrow genetic base within the chickpea cultivars. We have developed a resource for genetic markers by comparing the genome sequences of one wild and three cultivated chickpea genotypes. The draft genome sequence is expected to facilitate genetic enhancement and breeding to develop improved chickpea varieties.
5.
Jun Hyuck Lee Hye Yeon Koh Sung Gu Lee Shawn Doyle Brent C. Christner Hak Jun Kim 《Journal of bacteriology》2012,194(23):6636
We report the draft genome sequence of Paenisporosarcina sp. strain TG-20, which is 4.12 Mb in size and consists of 4,071 protein-coding genes and 76 RNA genes. The genome sequence of Paenisporosarcina sp. TG-20 may provide useful information about molecular adaptations that enhance survival in icy subsurface environments.
6.
7.
Gene duplication is one of the major driving forces shaping genome and organism evolution and is thought to be itself regulated by some intrinsic properties of the gene. Comparing essential genes between mouse and human, we observed that essential genes avoid duplication in mouse but tend to remain duplicated in humans. In this study, we explored the reasons behind such differences in gene essentiality through a cross-species comparison of human and mouse. We found that essential genes that are duplicated in humans are functionally more redundant than those in mouse. The proportion of paralog pseudogenization of essential genes is higher in mouse than in humans. These duplicates of essential genes are under more stringent dosage regulation in human than in mouse. We also observed a slower evolutionary rate in the paralogs of human essential genes than in their mouse counterparts. Together, these results indicate that human essential genes are retained as duplicates to serve as backup copies that may shield them from harmful mutations.
8.
Identification of six novel genes by experimental validation of GeneMachine predicted genes (cited 1 time in total: 0 self-citations, 1 citation by others)
Makalowska I Sood R Faruque MU Hu P Robbins CM Eddings EM Mestre JD Baxevanis AD Carpten JD 《Gene》2002,284(1-2):203-213
9.
10.
Pershouse M Li J Yang C Su H Brundage E Di W Biggs PJ Bradley A Chinault AC 《Genomics》2000,69(1):139-142
Even with the completion of a draft version of the human genome sequence only a fraction of the genes identified from this sequence have known functions. Chromosomal engineering in mouse cells, in concert with gene replacement assays to prove the functional significance of a given genomic region or gene, represents a rapid and productive means for understanding the role of a given set of genes. Both techniques rely heavily on detailed maps of chromosomal regions, initially to understand the scope of the regions being modified and finally to provide the cloned resources necessary to allow both finished sequencing and large insert complementation. This report describes the creation of a BAC clone contig on mouse chromosome 11 in a region showing conservation of synteny with sequences on human chromosome 17. We have created a detailed map of an approximately 3-cM region containing at least 33 genes through the use of multiple BAC mapping strategies, including chromosome walking and multiplex oligonucleotide hybridization and gap filling. The region described is one of the targets of a large effort to create a series of mice with regional deletions on mouse chromosome 11 (33-80 cM) that can subsequently be subjected to further mutagenesis.
11.
12.
Ancient duplications of the human proglucagon gene (cited 5 times in total: 0 self-citations, 5 citations by others)
Irwin DM 《Genomics》2002,79(5):741-746
The human proglucagon gene (GCG) is encoded within a finished 576-kb DNA sequence generated by the Human Genome Project. GCG is flanked by 18 kb and 65 kb of DNA, 5' and 3', respectively, that do not encode genes. The genomic sequence that includes GCG was found to have a long history of gene duplication events. Some members of the glucagon-like family of genes, GCG on chromosome 2 and GIP on chromosome 17, may be products of ancient genome duplications on the early vertebrate lineage. A large genomic tandem duplication event that included DPP4-like and GCG genes occurred before the amphibian-mammal divergence, but one of the duplicated copies of GCG has been lost on the human lineage. Recently, a processed pseudogene of the X-chromosome-linked gene TIMM8A was inserted downstream of GCG. Some ancient duplicates of GCG may retain physiological functions in other vertebrates.
13.
It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human–mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified.
Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.
14.
Christina Gabrielsen Dag A. Brede Pablo E. Hernández Ingolf F. Nes Dzung B. Diep 《Journal of bacteriology》2012,194(24):6976-6977
This work describes the draft genome sequence of Lactococcus garvieae DCC43. The 2.2-Mb draft genome contains 2,227 predicted protein-coding genes, among which is a region encoding the bacteriocin garvicin ML. No antibiotic resistance genes or capsule-related virulence genes were identified. Two plasmid replication regions indicate that this strain likely contains plasmids. Comparative genomics suggests that this strain displays a high degree of sequence variation from the previously sequenced L. garvieae strains.
15.
16.
17.
18.
The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of "reliable" overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our "reliable-overlap" algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.
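The "seven times better" figure in this abstract can be checked from the numbers it reports. The sketch below is an illustration only, not the authors' code: the function name `quality_score` is hypothetical, and "sequence missed" is assumed here to mean 100% minus the reported coverage of finished sequence.

```python
def quality_score(errors_per_10k: float, coverage_pct: float) -> float:
    """Quality as the inverse of (error rate x sequence missed).

    Assumption: 'sequence missed' is taken as 100 - coverage, in percent.
    The absolute value is arbitrary; only ratios between assemblies matter.
    """
    missed_pct = 100.0 - coverage_pct
    return 1.0 / (errors_per_10k * missed_pct)


# Figures quoted in the abstract for the 21 finished BACs.
atlas = quality_score(errors_per_10k=4.5, coverage_pct=93.4)
reliable = quality_score(errors_per_10k=1.1, coverage_pct=96.3)

# Ratio of the two scores: (4.5 * 6.6) / (1.1 * 3.7)
ratio = reliable / atlas
print(f"relative quality: {ratio:.1f}x")
```

Under that reading, the ratio comes out a little above 7, which agrees with the abstract's claim.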
19.
The primary objective of this study was to create a genome-wide high resolution map (i.e., >100 bp) of 'rearrangement hotspots' which can facilitate the identification of regions capable of mediating de novo deletions or duplications in humans. A hierarchical method was employed to fragment segmental duplications (SDs) into multiple smaller SD units. Combining an end space free pairwise alignment algorithm with a 'seed and extend' approach, we have exhaustively searched 409 million alignments to detect complex structural rearrangements within the reference-guided assembly of the NA18507 human genome (18× coverage), including the previously identified novel 4.8 Mb sequence from de novo assembly within this genome. We have identified 1,963 rearrangement hotspots within SDs which encompass 166 genes and display an enrichment of duplicated gene nucleotide variants (DNVs). These regions are correlated with increased non-allelic homologous recombination (NAHR) event frequency which presumably represents the origin of copy number variations (CNVs) and pathogenic duplications/deletions. Analysis revealed that 20% of the detected hotspots are clustered within the proximal and distal SD breakpoints flanked by the pathogenic deletions/duplications that have been mapped for 24 NAHR-mediated genomic disorders. FISH validation of selected complex regions revealed 94% concordance with in silico localization of the highly homologous derivatives. Other results from this study indicate that intra-chromosomal recombination is enhanced in genic compared with agenic duplicated regions, and that gene desert regions comprising SDs may represent reservoirs for creation of novel genes. The generation of genome-wide signatures of 'rearrangement hotspots', which likely serve as templates for NAHR, may provide a powerful approach towards understanding the underlying mutational mechanism(s) for development of constitutional and acquired diseases.
20.
Hoskins RA Smith CD Carlson JW Carvalho AB Halpern A Kaminker JS Kennedy C Mungall CJ Sullivan BA Sutton GG Yasuhara JC Wakimoto BT Myers EW Celniker SE Rubin GM Karpen GH 《Genome biology》2002,3(12):research0085.1-8516