Similar articles
1.
EcoGene: a genome sequence database for Escherichia coli K-12   Total citations: 5 (self-citations: 1; citations by others: 4)
The EcoGene database provides a set of gene and protein sequences derived from the genome sequence of Escherichia coli K-12. EcoGene is a source of re-annotated sequences for the SWISS-PROT and Colibri databases. EcoGene is used for genetic and physical map compilations in collaboration with the Coli Genetic Stock Center. The EcoGene12 release includes 4293 genes. EcoGene12 differs from the GenBank annotation of the complete genome sequence in several ways, including (i) the revision of 706 predicted or confirmed gene start sites, (ii) the correction or hypothetical reconstruction of 61 frame-shifts caused by either sequence error or mutation, (iii) the reconstruction of 14 protein sequences interrupted by the insertion of IS elements, and (iv) predictions that 92 genes are partially deleted gene fragments. A literature survey identified 717 proteins whose N-terminal amino acids have been verified by sequencing. 12 446 cross-references to 6835 literature citations and s are provided. EcoGene is accessible at a new website: http://bmb.med.miami.edu/EcoGene/EcoWeb. Users can search and retrieve individual EcoGene GenePages or they can download large datasets for incorporation into database management systems, facilitating various genome-scale computational and functional analyses.

2.
We present a relational database program developed in FoxBase+/Mac for the viewing and manipulation of ordered restriction maps and associated features of the Escherichia coli genome, including sequenced genes and the Kohara miniset of bacteriophage lambda clones. Use of this program allows easy access to the wealth of information being collected in a database of DNA sequences, maps and genetic data known as EcoSeq, EcoMap and EcoGene respectively.

3.
A growing variety of “genotype-by-sequencing” (GBS) methods use restriction enzymes and high throughput DNA sequencing to generate data for a subset of genomic loci, allowing the simultaneous discovery and genotyping of thousands of polymorphisms in a set of multiplexed samples. We evaluated a “double-digest” restriction-site associated DNA sequencing (ddRAD-seq) protocol by 1) comparing results for a zebra finch (Taeniopygia guttata) sample with in silico predictions from the zebra finch reference genome; 2) assessing data quality for a population sample of indigobirds (Vidua spp.); and 3) testing for consistent recovery of loci across multiple samples and sequencing runs. Comparison with in silico predictions revealed that 1) over 90% of predicted, single-copy loci in our targeted size range (178–328 bp) were recovered; 2) short restriction fragments (38–178 bp) were carried through the size selection step and sequenced at appreciable depth, generating unexpected but nonetheless useful data; 3) amplification bias favored shorter, GC-rich fragments, contributing to among locus variation in sequencing depth that was strongly correlated across samples; 4) our use of restriction enzymes with a GC-rich recognition sequence resulted in an up to four-fold overrepresentation of GC-rich portions of the genome; and 5) star activity (i.e., non-specific cutting) resulted in thousands of “extra” loci sequenced at low depth. Results for three species of indigobirds show that a common set of thousands of loci can be consistently recovered across both individual samples and sequencing runs. In a run with 46 samples, we genotyped 5,996 loci in all individuals and 9,833 loci in 42 or more individuals, resulting in <1% missing data for the larger data set. We compare our approach to similar methods and discuss the range of factors (fragment library preparation, natural genetic variation, bioinformatics) influencing the recovery of a consistent set of loci among samples.
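The in silico prediction step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not name the two enzymes, so the recognition sequences used in the usage note (GAATTC and CCGG) are placeholders, and the function names `find_sites` and `double_digest` are hypothetical. As a further simplification, each cut is placed at the start of its recognition site rather than at the enzyme's true cut offset.

```python
import re


def find_sites(seq, site):
    """Return start positions of every (possibly overlapping) occurrence of site."""
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]


def double_digest(seq, site_a, site_b, min_len, max_len):
    """Predict double-digest fragments within a size window.

    A fragment is kept only if its two ends come from different enzymes
    (one site_a cut and one site_b cut) and its length lies in
    [min_len, max_len]. Returns a list of (start, end) positions.
    """
    # Label each cut with the enzyme that produced it, then sort by position.
    cuts = sorted(
        [(p, "A") for p in find_sites(seq, site_a)]
        + [(p, "B") for p in find_sites(seq, site_b)]
    )
    fragments = []
    # Adjacent cuts delimit the fragments of a complete digest.
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        length = p2 - p1
        if e1 != e2 and min_len <= length <= max_len:
            fragments.append((p1, p2))
    return fragments
```

Under these assumptions, a call such as `double_digest(genome, "GAATTC", "CCGG", 178, 328)` would list candidate loci in the size range targeted by the study; comparing that list with the loci actually sequenced is what the recovery percentage in the abstract refers to.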

4.

Background

The cattle (Bos taurus) genome was originally selected for sequencing due to its economic importance and unique biology as a model organism for understanding other ruminants, or mammals. Currently, there are two cattle genome sequence assemblies (UMD3.1 and Btau4.6) from groups using dissimilar assembly algorithms, which were complemented by genetic and physical map resources. However, past comparisons between these assemblies revealed substantial differences. Consequently, such discordances have engendered ambiguities when using reference sequence data, impacting genomic studies in cattle and motivating construction of a new optical map resource--BtOM1.0--to guide comparisons and improvements to the current sequence builds. Accordingly, our comprehensive comparisons of BtOM1.0 against the UMD3.1 and Btau4.6 sequence builds tabulate large-to-immediate scale discordances requiring mediation.

Results

The optical map, BtOM1.0, spanning the B. taurus genome (Hereford breed, L1 Dominette 01449) was assembled from an optical map dataset consisting of 2,973,315 (439 X; raw dataset size before assembly) single molecule optical maps (Rmaps; 1 Rmap = 1 restriction mapped DNA molecule) generated by the Optical Mapping System. The BamHI map spans 2,575.30 Mb and comprises 78 optical contigs assembled by a combination of iterative (using the reference sequence: UMD3.1) and de novo assembly techniques. BtOM1.0 is a high-resolution physical map featuring an average restriction fragment size of 8.91 Kb. Comparisons of BtOM1.0 vs. UMD3.1, or Btau4.6, revealed that Btau4.6 presented far more discordances (7,463) vs. UMD3.1 (4,754). Overall, we found that Btau4.6 presented almost double the number of discordances than UMD3.1 across most of the 6 categories of sequence vs. map discrepancies, which are: COMPLEX (misassembly), DELs (extraneous sequences), INSs (missing sequences), ITs (Inverted/Translocated sequences), ECs (extra restriction cuts) and MCs (missing restriction cuts).

Conclusion

Alignments of UMD3.1 and Btau4.6 to BtOM1.0 reveal discordances commensurate with previous reports, and affirm the NCBI’s current designation of UMD3.1 sequence assembly as the “reference assembly” and the Btau4.6 as the “alternate assembly.” The cattle genome optical map, BtOM1.0, when used as a comprehensive and largely independent guide, will greatly assist improvements to existing sequence builds, and later serve as an accurate physical scaffold for studies concerning the comparative genomics of cattle breeds.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1823-7) contains supplementary material, which is available to authorized users.

5.
The accelerated rate of genomic sequencing has led to an abundance of completely sequenced genomes. Annotation of the open reading frames (ORFs) (i.e., gene prediction) in these genomes is an important task and is most often performed computationally based on features in the nucleic acid sequence. Using recent advances in proteomics, we set out to predict the set of ORFs for an organism based principally on expressed protein-based evidence. Using a novel search strategy, we mapped peptides detected in a whole-cell lysate of Mycoplasma pneumoniae onto a genomic scaffold and extended these "hits" into ORFs bound by traditional genetic signals to generate a "proteogenomic map". We were able to generate an ORF model for M. pneumoniae strain FH using proteomic data with a high correlation to models based on sequence features. Ultimately, we detected over 81% of the genomically predicted ORFs in M. pneumoniae strain M129 (the originally sequenced strain). We were also able to detect several new ORFs not originally predicted by genomic methods, various N-terminal extensions, and some evidence that would suggest that certain predicted ORFs are bogus. Some of these differences may be a result of the strain analyzed but demonstrate the robustness of protein analysis across closely related genomes. This technique is a cost-effective means to add value to genome annotation, and a prerequisite for proteome quantitation and in vivo interaction measures.
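The idea of placing detected peptides onto a genomic scaffold can be sketched as an exact search over a six-frame translation. This is only an illustration under stated assumptions, not the authors' search strategy: the names `map_peptide`, `translate`, and `revcomp` are hypothetical, matching is exact, and the step that extends hits into full ORFs is omitted. Because Mycoplasma species read TGA as tryptophan rather than as a stop, a variant table is provided alongside the standard one.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Variant in which TGA encodes tryptophan (as in Mycoplasma).
MYCOPLASMA_TABLE = dict(CODON_TABLE, TGA="W")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, table=CODON_TABLE):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def map_peptide(genome, peptide, table=CODON_TABLE):
    """Return (strand, frame, nt_start) for every exact peptide match.

    For the '-' strand, nt_start is a coordinate on the reverse complement,
    not on the original sequence.
    """
    hits = []
    for strand, dna in (("+", genome), ("-", revcomp(genome))):
        for frame in range(3):
            protein = translate(dna[frame:], table)
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, frame, frame + 3 * pos))
                pos = protein.find(peptide, pos + 1)
    return hits
```

In a fuller pipeline, each hit would then be extended upstream to a start codon and downstream to a stop codon to propose an ORF, which corresponds to the "ORFs bound by traditional genetic signals" described in the abstract.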

6.
It is widely accepted that people establish allocentric spatial representation after learning a map. However, it is unknown whether people can directly acquire egocentric representation after map learning. In two experiments, the participants learned a distal environment through a map and then performed the egocentric pointing tasks in that environment under three conditions: with the heading aligned with the learning perspective (baseline), after 240° rotation from the baseline (updating), and after disorientation (disorientation). Disorientation disrupted the internal consistency of pointing among objects when the participants learned the sequentially displayed map, on which only one object name was displayed at a time while the location of “self” remained on the screen all the time. However, disorientation did not affect the internal consistency of pointing among objects when the participants learned the simultaneously displayed map. These results suggest that the egocentric representation can be acquired from a sequentially presented map.

7.
Physical and linkage mapping underpin efforts to sequence and characterize the genomes of eukaryotic organisms by providing a skeleton framework for whole genome assembly. Hitherto, linkage and physical “contig” maps were generated independently prior to merging. Here, we develop a new and easy method, BAC HAPPY MAPPING (BAP mapping), that utilizes BAC library pools as a HAPPY mapping panel together with an Mbp-sized DNA panel to integrate the linkage and physical mapping efforts into one pipeline. Using Arabidopsis thaliana as an exemplar, a set of 40 Sequence Tagged Site (STS) markers spanning ∼10% of chromosome 4 were simultaneously assembled onto a BAP map compiled using both a series of BAC pools each comprising 0.7x genome coverage and dilute (0.7x genome) samples of sheared genomic DNA. The resultant BAP map overcomes the need for polymorphic loci to separate genetic loci by recombination and allows physical mapping in segments of suppressed recombination that are difficult to analyze using traditional mapping techniques. Even virtual “BAC-HAPPY-mapping” to convert BAC landing data into BAC linkage contigs is possible.

8.
Whole-Genome Shotgun Optical Mapping of Rhodospirillum rubrum   Total citations: 1 (self-citations: 0; citations by others: 1)
Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a “molecular cytogenetics” approach to solving problems in genomic analysis.

9.
PLoS ONE, 2013, 8(3)
A physically anchored consensus map is foundational to modern genomics research; however, construction of such a map in oat (Avena sativa L., 2n = 6x = 42) has been hindered by the size and complexity of the genome, the scarcity of robust molecular markers, and the lack of aneuploid stocks. Resources developed in this study include a modified SNP discovery method for complex genomes, a diverse set of oat SNP markers, and a novel chromosome-deficient SNP anchoring strategy. These resources were applied to build the first complete, physically-anchored consensus map of hexaploid oat. Approximately 11,000 high-confidence in silico SNPs were discovered based on nine million inter-varietal sequence reads of genomic and cDNA origin. GoldenGate genotyping of 3,072 SNP assays yielded 1,311 robust markers, of which 985 were mapped in 390 recombinant-inbred lines from six bi-parental mapping populations ranging in size from 49 to 97 progeny. The consensus map included 985 SNPs and 68 previously-published markers, resolving 21 linkage groups with a total map distance of 1,838.8 cM. Consensus linkage groups were assigned to 21 chromosomes using SNP deletion analysis of chromosome-deficient monosomic hybrid stocks. Alignments with sequenced genomes of rice and Brachypodium provide evidence for extensive conservation of genomic regions, and renewed encouragement for orthology-based genomic discovery in this important hexaploid species. These results also provide a framework for high-resolution genetic analysis in oat, and a model for marker development and map construction in other species with complex genomes and limited resources.

10.
Studies of the evolution of collective behavior consider the payoffs of individual versus social learning. We have previously proposed that the relative magnitude of social versus individual learning could be compared against the transparency of payoff, also known as the “transparency” of the decision, through a heuristic, two-dimensional map. Moving from west to east, the estimated strength of social influence increases. As the decision maker proceeds from south to north, transparency of choice increases, and it becomes easier to identify the best choice itself and/or the best social role model from whom to learn (depending on position on east–west axis). Here we show how to parameterize the functions that underlie the map, how to estimate these functions, and thus how to describe estimated paths through the map. We develop estimation methods on artificial data sets and discuss real-world applications such as modeling changes in health decisions.

11.
The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation.

12.
The genome for the marine pseudotemperate member of the Siphoviridae HSIC has been sequenced using a combination of linker amplification library construction, restriction digest library construction, and primer walking. HSIC enters into a pseudolysogenic relationship with its host, Listonella pelagia, characterized by sigmoidal growth curves producing >10^9 cells/ml and >10^11 phage/ml. The genome (37,966 bp; G+C content, 44%) contained 47 putative open reading frames (ORFs), 17 of which had significant BLASTP hits in GenBank, including a β subunit of DNA polymerase III, a helicase, a helicase-like subunit of a resolvasome complex, a terminase, a tail tape measure protein, several phage-like structural proteins, and 1 ORF that may assist in host pathogenicity (an ADP ribosyltransferase). The genome was circularly permuted, with no physical ends detected by sequencing or restriction enzyme digestion analysis, and lacked a cos site. This evidence is consistent with a headful packaging mechanism similar to that of Salmonella phage P22 and Shigella phage Sf6. Because none of the phage-like ORFs were closely related to any existing phage sequences in GenBank (i.e., none more than 62% identical and most <25% identical at the amino acid level), HSIC is unique among phages that have been sequenced to date. These results further emphasize the need to sequence phages from the marine environment, perhaps the largest reservoir of untapped genetic information.

13.
14.
Long intergenic noncoding RNAs (lincRNAs) represent a large fraction of transcribed loci in eukaryotic genomes. Although classified as noncoding, most lincRNAs contain open reading frames (ORFs), and it remains unclear why cytoplasmic lincRNAs are not or very inefficiently translated. Here, we analyzed signatures of hindered translation in lincRNA sequences from five eukaryotes, covering a range of natural selection pressures. In fission yeast and Caenorhabditis elegans, that is, species under strong selection, we detected significantly shorter ORFs, a suboptimal sequence context around start codons for translation initiation, and trinucleotides (“codons”) corresponding to less abundant tRNAs than for neutrally evolving control sequences, likely impeding translation elongation. For human, we detected signatures for cell-type-specific hindrance of lincRNA translation, in particular codons in abundant cytoplasmic lincRNAs corresponding to lower expressed tRNAs than control codons, in three out of five human cell lines. We verified that varying tRNA expression levels between cell lines are reflected in the amount of ribosomes bound to cytoplasmic lincRNAs in each cell line. We further propose that codons at ORF starts are particularly important for reducing ribosome-binding to cytoplasmic lincRNA ORFs. Altogether, our analyses indicate that in species under stronger selection lincRNAs evolved sequence features generally hindering translation and support cell-type-specific hindrance of translation efficiency in human lincRNAs. The sequence signatures we have identified may improve predicting peptide-coding and genuine noncoding lincRNAs in a cell type.
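The ORF-length comparison described in this abstract presupposes a way to enumerate ORFs in a transcript. A minimal sketch follows, assuming an ORF is defined as an ATG followed by in-frame codons up to the first in-frame stop codon (TAA, TAG, or TGA) on the given strand; the abstract does not state the authors' exact definition, and the function name `find_orfs` is hypothetical.

```python
import re

# ATG, then as few in-frame codons as possible, then the first in-frame stop.
# The lookahead lets ORFs that overlap in different frames all be reported.
ORF_PATTERN = re.compile(r"(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))")


def find_orfs(transcript):
    """Return (start, end) for every ATG-to-stop ORF, end exclusive.

    Every ATG that reaches an in-frame stop is reported, so an internal
    ATG yields its own, shorter ORF nested inside the longer one.
    """
    seq = transcript.upper().replace("U", "T")
    return [(m.start(1), m.end(1)) for m in ORF_PATTERN.finditer(seq)]


def longest_orf_length(transcript):
    """Length in nucleotides of the longest ORF, or 0 if none is found."""
    return max((end - start for start, end in find_orfs(transcript)), default=0)
```

With such a helper, the ORF lengths of lincRNA transcripts could be compared with those of control sequences, which is the kind of comparison the abstract reports for fission yeast and C. elegans.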

15.
16.
A fundamental strategy for organising connections in the nervous system is the formation of neural maps. Map formation has been most intensively studied in sensory systems where the central arrangement of axon terminals reflects the distribution of sensory neuron cell bodies in the periphery or the sensory modality. This straightforward link between anatomy and function has facilitated tremendous progress in identifying cellular and molecular mechanisms that underpin map development. Much less is known about the way in which networks that underlie locomotion are organised. We recently showed that in the Drosophila embryo, dendrites of motorneurons form a neural map, being arranged topographically in the antero-posterior axis to represent the distribution of their target muscles in the periphery. However, the way in which a dendritic myotopic map forms has not been resolved and whether postsynaptic dendrites are involved in establishing sets of connections has been relatively little explored. In this study, we show that motorneurons also form a myotopic map in a second neuropile axis, with respect to the ventral midline, and they achieve this by targeting their dendrites to distinct medio-lateral territories. We demonstrate that this map is “hard-wired”; that is, it forms in the absence of excitatory synaptic inputs or when presynaptic terminals have been displaced. We show that the midline signalling systems Slit/Robo and Netrin/Frazzled are the main molecular mechanisms that underlie dendritic targeting with respect to the midline. Robo and Frazzled are required cell-autonomously in motorneurons and the balance of their opposite actions determines the dendritic target territory. A quantitative analysis shows that dendritic morphology emerges as guidance cue receptors determine the distribution of the available dendrites, whose total length and branching frequency are specified by other cell intrinsic programmes. 
Our results suggest that the formation of dendritic myotopic maps in response to midline guidance cues may be a conserved strategy for organising connections in motor systems. We further propose that sets of connections may be specified, at least to a degree, by global patterning systems that deliver pre- and postsynaptic partner terminals to common “meeting regions.”

17.
The genome for the marine pseudotemperate member of the Siphoviridae phiHSIC has been sequenced using a combination of linker amplification library construction, restriction digest library construction, and primer walking. phiHSIC enters into a pseudolysogenic relationship with its host, Listonella pelagia, characterized by sigmoidal growth curves producing >10^9 cells/ml and >10^11 phage/ml. The genome (37,966 bp; G+C content, 44%) contained 47 putative open reading frames (ORFs), 17 of which had significant BLASTP hits in GenBank, including a beta subunit of DNA polymerase III, a helicase, a helicase-like subunit of a resolvasome complex, a terminase, a tail tape measure protein, several phage-like structural proteins, and 1 ORF that may assist in host pathogenicity (an ADP ribosyltransferase). The genome was circularly permuted, with no physical ends detected by sequencing or restriction enzyme digestion analysis, and lacked a cos site. This evidence is consistent with a headful packaging mechanism similar to that of Salmonella phage P22 and Shigella phage Sf6. Because none of the phage-like ORFs were closely related to any existing phage sequences in GenBank (i.e., none more than 62% identical and most <25% identical at the amino acid level), phiHSIC is unique among phages that have been sequenced to date. These results further emphasize the need to sequence phages from the marine environment, perhaps the largest reservoir of untapped genetic information.

18.
We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten “case” genomes from individuals with severe hemophilia A and ten “control” genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.
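The per-genome count of novel SNVs mentioned in this abstract can be pictured as a running set difference: each genome contributes only the variants not seen in any genome processed before it. The sketch below assumes each variant is represented by a hashable key such as a (chromosome, position, ref, alt) tuple; that representation and the function name `novel_variant_counts` are illustrative choices, not taken from the study.

```python
def novel_variant_counts(genomes):
    """Count, for each genome in order, the variants not seen in earlier genomes.

    genomes is an iterable of collections of hashable variant keys.
    The result depends on the order in which genomes are processed.
    """
    seen = set()
    counts = []
    for variants in genomes:
        new = set(variants) - seen  # variants first observed in this genome
        counts.append(len(new))
        seen |= new
    return counts
```

Plotting these counts against the number of genomes processed gives the kind of discovery curve that, according to the abstract, levels off after roughly the first 15 individuals.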

19.
20.
A genomic analysis of 18 P. aeruginosa phages, including nine newly sequenced DNA genomes, indicates a tremendous reservoir of proteome diversity, with 55% of open reading frames (ORFs) being novel. Comparative sequence analysis and ORF map organization revealed that most of the phages analyzed displayed little relationship to each other.
