Similar articles
 20 similar articles found (search time: 31 ms)
1.

Background

DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies however observe an encoded form of the sequence, rather than each DNA base individually. The encoded DNA sequence may contain technical errors, and therefore encoded sequencing errors must be incorporated when comparing an encoded DNA sequence to a reference DNA sequence.

Results

Although two-base encoding is currently used in practice, many other encoding schemes are possible, whereby two or more bases are encoded at a time. A generalized k-base encoding scheme is presented, whereby feasible higher order encodings are better able to differentiate errors in the encoded sequence from true DNA sequence variants. A generalized version of the previous two-base encoding DNA sequence comparison algorithm is used to compare a k-base encoded sequence to a DNA reference sequence. Finally, simulations are performed to evaluate the power, the false positive and false negative SNP discovery rates, and the performance time of k-base encoding compared to previous methods as well as to the standard DNA sequence comparison algorithm.

Conclusions

The novel generalized k-base encoding scheme and resulting local alignment algorithm permit the development of higher fidelity ligation-based next generation sequencing technology. This bioinformatic solution affords greater robustness to errors, as well as lower false SNP discovery rates, only at the cost of computational time.
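
The difference between an encoded sequencing error and a true variant can be illustrated with a minimal Python sketch. It assumes the commonly described two-base ("color") scheme in which each base gets a 2-bit code and each pair of adjacent bases is reported as the XOR of the two codes; the abstract does not spell out the scheme, so the code table and the function names `encode_two_base` and `decode_two_base` are illustrative assumptions, not the authors' implementation.

```python
# Assumed 2-bit base codes (not given in the abstract).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_two_base(seq):
    """Return one color (0-3) per pair of adjacent bases: the XOR of their codes."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_two_base(first_base, colors):
    """Rebuild the base sequence from a known first base and the color list."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


reference = "ACGGTCA"
ref_colors = encode_two_base(reference)

# A true single-base variant (G -> A at position 3) changes two adjacent colors.
variant = "ACGATCA"
variant_colors = encode_two_base(variant)
changed = [i for i, (r, v) in enumerate(zip(ref_colors, variant_colors)) if r != v]

# A single miscalled color corrupts every base decoded after it.
noisy_colors = list(ref_colors)
noisy_colors[2] ^= 1
noisy_decoded = decode_two_base(reference[0], noisy_colors)

print(ref_colors, changed, noisy_decoded)
```

Under these assumptions, an isolated color mismatch against the reference is more likely a measurement error, while two adjacent mismatches that decode consistently point to a real base change. A k-base scheme would encode longer windows in the same spirit.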

2.
3.
We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. 
Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon.

4.
Termination of DNA replication, complete topological unlinking of the parental template DNA strands, partition of the daughter chromosomes, and cell division follow in an ordered and interdependent sequence during normal bacterial growth. In Escherichia coli, topoisomerase IV (Topo IV), encoded by parE and parC, is responsible for decatenation of the two newly formed chromosomes. In an effort to uncover the pathway of information flow between the macromolecular processes that describe these events, we identified dnaX, encoding the τ and γ subunits of the DNA polymerase III holoenzyme, as a high-copy suppressor of the temperature-sensitive phenotype of the parE10 allele. We show that suppression derives from overexpression of the γ, but not the τ, subunit of the holoenzyme and that the partition defect of parE10 cells is nearly completely reverted at the nonpermissive temperature as well. These observations suggest a possible association between Topo IV and the replication machinery.

5.
6.
Several germline single nucleotide polymorphisms (SNPs) have been identified in the POLB gene, but little is known about their cellular and biochemical impact. DNA Polymerase β (Pol β), encoded by the POLB gene, is the main gap-filling polymerase involved in base excision repair (BER), a pathway that protects the genome from the consequences of oxidative DNA damage. In this study we tested the hypothesis that expression of the POLB germline coding SNP (rs3136797) in mammalian cells could induce a cancerous phenotype. Expression of this SNP in both human and mouse cells induced double-strand breaks, chromosomal aberrations, and cellular transformation. Following treatment with an alkylating agent, cells expressing this coding SNP accumulated BER intermediate substrates, including single-strand and double-strand breaks. The rs3136797 SNP encodes the P242R variant Pol β protein and biochemical analysis showed that P242R protein had a slower catalytic rate than WT, although P242R binds DNA similarly to WT. Our results suggest that people who carry the rs3136797 germline SNP may be at an increased risk for cancer susceptibility.

7.
Targeted gene alteration (TGA) is a strategy for correcting single base mutations in the DNA of human cells that cause inherited disorders. TGA aims to reverse a phenotype by repairing the mutant base within the chromosome itself, avoiding the introduction of exogenous genes. The process of how to accurately repair a genetic mutation is elucidated through the use of single‐stranded DNA oligonucleotides (ODNs) that can enter the cell and migrate to the nucleus. These specifically designed ODNs hybridize to the target sequence and act as a beacon for nucleotide exchange. The key to this reaction is the frequency with which the base is corrected; this will determine whether the approach becomes clinically relevant or not. Over the course of the last five years, workers have been uncovering the role played by the cells in regulating the gene repair process. In this essay, we discuss how the impact of the cell on TGA has evolved through the years and illustrate ways that inherent cellular pathways could be used to enhance TGA activity. We also describe the cost to cell metabolism and survival when certain processes are altered to achieve a higher frequency of repair.

8.
9.
Total DNA isolated from Rhizobium leguminosarum VF39SM cells is resistant to cleavage by the restriction endonuclease PstI. Plasmid curing and transfer studies localized this phenotype to pRleVF39b, the second smallest of six plasmids found in this bacterium. In vitro selection for vector modification was employed to isolate a presumptive methylase gene (M.Rle39BI) from a plasmid gene library. Total and plasmid DNAs isolated from E. coli containing M.Rle39BI were resistant to digestion by PstI. Sequence data suggested that a putative restriction endonuclease (R.Rle39BI) was also encoded on the same fragment. The two genes were flanked by identical copies of a putative insertion sequence, which was also present in several copies elsewhere in the VF39SM genome. The presence of this element in other strains examined suggested that this element is indeed an insertion sequence. The differences in G/C content between the DNA coding for the R/M system and that of the IS element suggest that this DNA region may have been acquired by horizontal transfer.

10.
The sequenced yeast genome offers a unique resource for the analysis of eukaryotic cell function and enables genome-wide screens for genes involved in cellular processes. We have identified genes involved in cell surface assembly by screening transposon-mutagenized cells for altered sensitivity to calcofluor white, followed by supplementary screens to further characterize mutant phenotypes. The mutated genes were directly retrieved from genomic DNA and then matched uniquely to a gene in the yeast genome database. Eighty-two genes with apparent perturbation of the cell surface were identified, with mutations in 65 of them displaying at least one further cell surface phenotype in addition to their modified sensitivity to calcofluor. Fifty of these genes were previously known, 17 encoded proteins whose function could be anticipated through sequence homology or previously recognized phenotypes and 15 genes had no previously known phenotype.

11.
Two genetic mouse models for human phenylketonuria have been characterized by DNA sequence analysis. For each, a distinct mutation was identified within the protein coding sequence of the phenylalanine hydroxylase gene. This establishes that the mutated locus is the same as that causing human phenylketonuria and allows a comparison between these mouse phenylketonuria models and the human disease. A genotype/phenotype relationship that is strikingly similar to the human disease emerges, underscoring the similarity of phenylketonuria in mouse and man. In PAHENU1, the phenotype is mild. The Pahenu1 mutation predicts a conservative valine to alanine amino acid substitution and is located in exon 3, a gene region where serious mutations are rare in humans. In PAHENU2, the phenotype is severe. The Pahenu2 mutation predicts a radical phenylalanine to serine substitution and is located in exon 7, a gene region where serious mutations are common in humans. In PAHENU2, the sequence information was used to devise a direct genotyping system based on the creation of a new Alw26I restriction endonuclease site.

12.
This paper describes genes from yeast and mouse with significant sequence similarities to a Drosophila gene that encodes the blood cell tumor suppressor pendulin. The protein encoded by the yeast gene, Srp1p, and mouse pendulin share 42% and 51% amino acid identity with Drosophila pendulin, respectively. All three proteins consist of 10.5 degenerate tandem repeats of ~42 amino acids each. Similar repeats occur in a superfamily of proteins that includes the Drosophila Armadillo protein. All three proteins contain a consensus sequence for a bipartite nuclear localization signal (NLS) in the N-terminal domain, which is not part of the repeat structure. Confocal microscopic analysis of yeast cells stained with antibodies against Srp1p reveals that this protein is intranuclear throughout the cell cycle. Targeted gene disruption shows that SRP1 is an essential gene. Despite their sequence similarities, Drosophila and mouse pendulin are unable to rescue the lethality of an SRP1 disruption. We demonstrate that yeast cells depleted of Srp1p arrest in mitosis with a G2 content of DNA. Arrested cells display abnormal structures and orientations of the mitotic spindles, aberrant segregation of the chromatin and the nuclei, and threads of chromatin emanating from the bulk of nuclear DNA. This phenotype suggests that Srp1p is required for the normal function of microtubules and the spindle pole bodies, as well as for nuclear integrity. We suggest that Srp1p interacts with multiple components of the cell nucleus that are required for mitosis and discuss its functional similarities to, and differences from, Drosophila pendulin.

13.
R Deuring, W Doerfler. Gene, 1983, 26(2-3): 283-289
In previous work we have described a symmetric recombinant (SYREC1) between Ad12 DNA and human KB cell DNA. This recombinant DNA molecule has been generated during productive infection and is encapsidated into virions. From the DNA of a similar symmetric recombinant (termed SYREC2) between the left terminus of Ad12 DNA and human KB cellular DNA, the site of linkage between the two DNAs was cloned and sequenced. It was demonstrated that the first 2081 Ad12 nucleotides counting from the left viral terminus are conserved and linked to a sequence of GC-rich (70.4% G + C) KB cell DNA which occurs about 20 times per cellular genome. Except for a common 5'-CTGGC-3' pentanucleotide between the Ad12 DNA and KB cell DNA sequences, extensive patch homologies were not apparent at the site of junction. Similarly, comparisons of the deleted Ad12 DNA sequence and the cellular sequence replacing it did not reveal patch homologies. The 304 bp abutting the Ad12 terminus were shown to hybridize to KB cell DNA. These results provided definitive proof for the occurrence of recombinants between viral and cellular DNAs in human cells productively infected by Ad12 as previously shown by less direct experiments (Burger and Doerfler, 1974; Schick et al., 1976). Across the site of junction, an open reading frame exists which extends the truncated 54-kDal protein of the E1b region of Ad12 DNA for another 66 amino acids encoded by KB cellular DNA. This sequence is terminated by two UGA translational termination signals. The hypothetical protein has not yet been isolated.

14.
The telomeric DNA of vertebrates consists of d(TTAGGG)n tandem repeats, which can form quadruplex DNA structures in vitro and likely in vivo. Despite the fact that the G-rich telomeric DNA is susceptible to oxidation, few biochemical studies of base excision repair in telomeric DNA and quadruplex structures have been done. Here, we show that telomeric DNA containing thymine glycol (Tg), 8-oxo-7,8-dihydroguanine (8-oxoG), guanidinohydantoin (Gh), or spiroiminodihydantoin (Sp) can form quadruplex DNA structures in vitro. We have tested the base excision activities of five mammalian DNA glycosylases (NEIL1, NEIL2, mNeil3, NTH1, and OGG1) on these lesion-containing quadruplex substrates and found that only mNeil3 had excision activity on Tg in quadruplex DNA and that the glycosylase exhibited a strong preference for Tg in the telomeric sequence context. Although Sp and Gh in quadruplex DNA were good substrates for mNeil3 and NEIL1, none of the glycosylases had activity on quadruplex DNA containing 8-oxoG. In addition, NEIL1 but not mNeil3 showed enhanced glycosylase activity on Gh in the telomeric sequence context. These data suggest that one role for Neil3 and NEIL1 is to repair DNA base damages in telomeres in vivo and that Neil3 and Neil1 may function in quadruplex-mediated cellular events, such as gene regulation via removal of damaged bases from quadruplex DNA.

15.
Host base excision repair (BER) proteins that repair oxidative damage enhance HIV infection. These proteins include the oxidative DNA damage glycosylases 8-oxo-guanine DNA glycosylase (OGG1) and mutY homolog (MYH) as well as DNA polymerase beta (Polβ). While deletion of oxidative BER genes leads to decreased HIV infection and integration efficiency, the mechanism remains unknown. One hypothesis is that BER proteins repair the DNA gapped integration intermediate. An alternative hypothesis considers that the most common oxidative DNA base damages occur on guanines. The subtle consensus sequence preference at HIV integration sites includes multiple G:C base pairs surrounding the points of joining. These observations suggest a role for oxidative BER during integration targeting at the nucleotide level. We examined the hypothesis that BER repairs a gapped integration intermediate by measuring HIV infection efficiency in Polβ null cell lines complemented with active site point mutants of Polβ. A DNA synthesis defective mutant, but not a 5′dRP lyase mutant, rescued HIV infection efficiency to wild type levels; this suggested that Polβ DNA synthesis activity is not necessary while 5′dRP lyase activity is required for efficient HIV infection. An alternate hypothesis that BER events in the host genome influence HIV integration site selection was examined by sequencing integration sites in OGG1 and MYH null cells. In the absence of these 8-oxo-guanine specific glycosylases the chromatin elements of HIV integration site selection remain the same as in wild type cells. However, the HIV integration site sequence preference at G:C base pairs is altered at several positions in OGG1 and MYH null cells. Inefficient HIV infection in the absence of oxidative BER proteins does not appear related to repair of the gapped integration intermediate; instead oxidative damage repair may participate in HIV integration site preference at the sequence level.

16.
We describe here the role of muramidases present in clones of metagenomic DNA that result in cell aggregation and biofilm formation by Escherichia coli. The metagenomic clones were obtained from uncultured Lachnospiraceae-affiliated bacteria resident in the foregut microbiome of the Tammar wallaby. One of these fosmid clones (p49C2) was chosen for more detailed studies and a variety of genetic methods were used to delimit the region responsible for the phenotype to an open reading frame of 1425 bp. Comparative sequence analysis with other fosmid clones giving rise to the same phenotype revealed the presence of muramidase homologues with the same modular composition. Phylogenetic analysis of the fosmid sequence data assigned these fosmid inserts to recently identified, but uncultured, phylogroups of Lachnospiraceae believed to be numerically dominant in the foregut microbiome of the Tammar wallaby. The muramidase is a modular protein containing a putative N-acetylmuramoyl-L-alanine amidase and an endo-β-N-acetylglucosaminidase catalytic module, with a similar organization and functional properties to some Staphylococcal autolysins that also confer adhesive properties and biofilm formation. We also show here that the cloned muramidases result in the production of extracellular DNA, which appears to be the key for biofilm formation and autoaggregation. Collectively, these findings suggest that biofilm formation and cell aggregation in gut microbiomes might occur via the concerted action of carbohydrate-active enzymes and the production of extracellular DNA to serve as a biofilm scaffold.

17.
Epstein-Barr virus (EBV) and humans have a common history that reaches back to our primate ancestors. The virus co-evolved with man and has established a largely harmless and highly complex co-existence. It is carried as a silent infection by almost all human adults. A serendipitous discovery established that it is the causative agent of infectious mononucleosis. Still, EBV first became known in 1964, in a rare, geographically prevalent malignant lymphoma of B-cell origin, Burkitt lymphoma (BL). Its association with a malignancy prompted intensive studies and its capacity to immortalize B-lymphocytes in vitro was soon demonstrated. Consequently, EBV was classified as a potentially tumorigenic virus. Despite this property, however, the virus carrier state itself does not lead to malignancies because the transformed cells are recognized by the immune response. Consequently, the EBV-induced proliferation of EBV-carrying B-lymphocytes is manifested only under immunosuppressive conditions. The expression of EBV-encoded genes is regulated by the cell phenotype. The virus genome can be found in malignancies originating from cell types other than the B-lymphocyte. Even in the EBV-infected B-cell, the direct transforming capacity is restricted to a defined window of differentiation. A complex interaction between virally encoded proteins and B-cell specific cellular proteins constitutes the proliferation-inducing program. In this short review we touch upon aspects which are the subject of our present work. We describe the mechanisms of some of the functional interactions between EBV-encoded and cellular proteins that determine the phenotype of latently infected B-cells. The growth-promoting EBV-encoded genes are not expressed in the virus-carrying BL cells. Still, EBV seems to contribute to the etiology of this tumor by modifying events that influence cell survival and proliferation. 
We describe a possible growth-promoting mechanism in the genesis of Burkitt lymphoma that depends on the presence of EBV.

18.
It is well recognized that base sequence exerts a significant influence on the properties of DNA and plays a significant role in protein–DNA interactions vital for cellular processes. Understanding and predicting base sequence effects requires an extensive structural and dynamic dataset which is currently unavailable from experiment. A consortium of laboratories was consequently formed to obtain this information using molecular simulations. This article describes results providing information not only on all 10 unique base pair steps, but also on all possible nearest-neighbor effects on these steps. These results are derived from simulations of 50–100 ns on 39 different DNA oligomers in explicit solvent and using a physiological salt concentration. We demonstrate that the simulations are converged in terms of helical and backbone parameters. The results show that nearest-neighbor effects on base pair steps are very significant, implying that dinucleotide models are insufficient for predicting sequence-dependent behavior. Flanking base sequences can notably lead to base pair step parameters in dynamic equilibrium between two conformational sub-states. Although this study only provides limited data on next-nearest-neighbor effects, we suggest that such effects should be analyzed before attempting to predict the sequence-dependent behavior of DNA.
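
The count of 10 unique base pair steps follows from treating a sequence and its reverse complement as the same piece of double-stranded DNA. The short Python sketch below enumerates them, and applies the same counting to tetranucleotides (a step plus one flanking base on each side) to show how many nearest-neighbor contexts exist. The helper name `unique_duplex_words` is hypothetical, and the tetranucleotide figure is derived here by enumeration rather than quoted from the abstract.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def unique_duplex_words(length):
    """List words of the given length, merging each word with its reverse complement."""
    canonical = set()
    for letters in product("ACGT", repeat=length):
        word = "".join(letters)
        # Keep one representative per duplex: the alphabetically smaller strand.
        canonical.add(min(word, reverse_complement(word)))
    return sorted(canonical)


steps = unique_duplex_words(2)
tetranucleotides = unique_duplex_words(4)

print(len(steps), steps)
print(len(tetranucleotides))
```

Run as written, this prints 10 for the dinucleotide steps and 136 for the tetranucleotide contexts, which gives a sense of why a set of oligomers rather than a single sequence is needed to sample every nearest-neighbor environment.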

19.
Sequence analysis of chloroplast and mitochondrial large subunit rRNA genes from over 75 green algae disclosed 28 new group I intron-encoded proteins carrying a single LAGLIDADG motif. These putative homing endonucleases form four subfamilies of homologous enzymes, with the members of each subfamily being encoded by introns sharing the same insertion site. We showed that four divergent endonucleases from the I-CreI subfamily cleave the same DNA substrates. Mapping of the 66 amino acids that are conserved among the members of this subfamily on the 3-dimensional structure of I-CreI bound to its recognition sequence revealed that these residues participate in protein folding, homodimerization, DNA recognition and catalysis. Surprisingly, only seven of the 21 I-CreI amino acids interacting with DNA are conserved, suggesting that I-CreI and its homologs use different subsets of residues to recognize the same DNA sequence. Our sequence comparison of all 45 single-LAGLIDADG proteins identified so far suggests that these proteins share related structures and that there is a weak pressure in each subfamily to maintain identical protein–DNA contacts. The high sequence variability we observed in the DNA-binding site of homologous LAGLIDADG endonucleases provides insight into how these proteins evolve new DNA specificity.

20.
Several regions of the human mitochondrial genome are refractory to cloning in plasmid and bacteriophage DNA vectors. For example, recovery of recombinant M13 clones containing a 462 basepair MboI-KpnI restriction fragment that spans nucleotide positions 15591 to 16053 of HeLa cell mitochondrial DNA was as much as 100-fold lower than the recovery of M13 clones containing other regions of the human mitochondrial genome. All of 50 recombinant M13 clones containing this ‘uncloneable’ fragment had one or more changes in nucleotide sequence. Each clone contained at least one alteration in two nucleotide positions within the tRNAThr gene that encode portions of the anticodon loop and D-stem of the HeLa mitochondrial tRNAThr. These results imply that the HeLa mitochondrial tRNAThr gene is responsible for the ‘uncloneable’ phenotype of this region of human mitochondrial (mt) DNA. A total of 61 nucleotide sequence alterations were identified in 50 independent clones containing the HeLa mt tRNAThr gene. 56 mutations were single-base substitutions; 5 were deletions. Approximately 80% of the base substitution mutations were A:T → G:C transitions. A preference for A:T → G:C transition mutations also characterizes polymorphic base substitution variants in the mitochondrial DNA of unrelated individuals. This similarity suggests that human mitochondrial DNA sequence variation within and between individuals may have a common origin.
