Similar literature
 Found 20 similar articles; search took 15 ms
1.
Molecular characterizations of bacteria often employ ribosomal DNA (rDNA) to establish the identity and relationships among organisms, but the use of rRNA sequences can be problematic as the result of alignment ambiguities caused by indels, the lack of informative characters, and varying functional constraints over the molecule. Although protein-coding regions have been used as an alternative to rRNA, there is neither consensus among the genes examined nor ways to rapidly obtain sequence information for such genes from uncharacterized bacterial species. To standardize the set of protein-coding loci assayed in bacterial genomes, we examined over 100 widely distributed genes to identify sets of universal primers for use in the PCR amplification of protein-coding regions that are common to virtually all bacteria. From this set, we developed primer sets that each target one of 10 genes spanning an array of genomic locations and functional categories. Although many of the primers contain sequence degeneracies that aid in targeting genes across diverse taxa, most are adequate for direct sequencing of amplification products, thereby eliminating intermediate cloning before sequence determination. We foresee the analysis of these protein-coding regions as being complementary to ribosomal DNA for answering questions pertaining to bacterial identification, classification, phylogenetics and evolution.
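
The "sequence degeneracies" mentioned above can be illustrated with a minimal sketch that expands a degenerate primer, written with the standard IUPAC nucleotide codes, into the concrete sequences it stands for. The function name `expand_primer` and the example primer are hypothetical and are not taken from the paper.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer: str) -> list[str]:
    """List every concrete sequence matched by a degenerate primer.

    Raises KeyError if the primer contains a non-IUPAC character.
    """
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]

# Hypothetical 3-base primer: R = A or G, Y = C or T.
print(expand_primer("ARY"))
```

The number of concrete sequences grows multiplicatively with each degenerate position, which is one reason highly degenerate primers can complicate direct sequencing.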

2.
The explosive growth in biological data in recent years has led to the development of new methods to identify DNA sequences. Many algorithms have recently been developed that search DNA sequences looking for unique DNA sequences. This paper considers the application of the Burrows-Wheeler transform (BWT) to the problem of unique DNA sequence identification. The BWT transforms a block of data into a format that is extremely well suited for compression. This paper presents a time-efficient algorithm to search for unique DNA sequences in a set of genes. This algorithm is applicable to the identification of yeast species and other DNA sequence sets.
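
As an illustration of the transform itself (not of the paper's search algorithm, which the abstract does not describe), the following is a minimal sketch of a naive Burrows-Wheeler transform and its inverse. The function names and the `$` sentinel are assumptions; sorting all rotations is quadratic in memory and is used here only for clarity.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending and re-sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]

transformed = bwt("GATTACA")
assert inverse_bwt(transformed) == "GATTACA"
```

Production tools typically build the transform from a suffix array rather than from explicit rotations; the sketch trades efficiency for readability.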

3.
The availability of bacterial genome sequences has created a need for improved methods for sequence-based functional analysis to facilitate moving from annotated DNA sequence to genetic materials for analyzing the roles that postulated genes play in bacterial phenotypes. A powerful cloning method that uses lambda integrase recombination to clone and manipulate DNA sequences has been adapted for use with the gram-negative alpha-proteobacterium Sinorhizobium meliloti in two ways that increase the utility of the system. Adding plasmid oriT sequences to a set of vehicles allows the plasmids to be transferred to S. meliloti by conjugation and also allows cloned genes to be recombined from one plasmid to another in vivo by a pentaparental mating protocol, saving considerable time and expense. In addition, vehicles that contain yeast Flp recombinase target recombination sequences allow the construction of deletion mutations where the end points of the deletions are located at the ends of the cloned genes. Several deletions were constructed in a cluster of 60 genes on the symbiotic plasmid (pSymA) of S. meliloti, predicted to code for a denitrification pathway. The mutations do not affect the ability of the bacteria to form nitrogen-fixing nodules on Medicago sativa (alfalfa) roots.

4.
Inference of haplotypes from PCR-amplified samples of diploid populations   Total citations: 51, self-citations: 0, citations by others: 51
Direct sequencing of genomic DNA from diploid individuals leads to ambiguities on sequencing gels whenever there is more than one mismatching site in the sequences of the two orthologous copies of a gene. While these ambiguities cannot be resolved from a single sample without resorting to other experimental methods (such as cloning in the traditional way), population samples may be useful for inferring haplotypes. For each individual in the sample that is homozygous for the amplified sequence, there are no ambiguities in the identification of the allele's sequence. The sequences of other alleles can be inferred by taking the remaining sequence after "subtracting off" the sequencing ladder of each known site. Details of the algorithm for extracting allelic sequences from such data are presented here, along with some population-genetic considerations that influence the likelihood for success of the method. The algorithm also applies to the problem of inferring haplotype frequencies of closely linked restriction-site polymorphisms.
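
The "subtracting off" idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm in full: each genotype is a tuple with one entry per site, a one-letter string for a homozygous site and a two-letter string for a heterozygous site, and the names `complement` and `infer_haplotypes` are hypothetical.

```python
def complement(genotype, haplotype):
    """Subtract a known haplotype from a genotype; return the other haplotype.

    Returns None if the haplotype is incompatible with the genotype.
    """
    other = []
    for site, base in zip(genotype, haplotype):
        if base not in site:
            return None
        if len(site) == 1:
            other.append(base)
        else:
            other.append(site[0] if site[1] == base else site[1])
    return "".join(other)

def infer_haplotypes(genotypes):
    """Greedy resolution: seed with unambiguous individuals, then subtract."""
    known, unresolved = set(), []
    for g in genotypes:
        heterozygous_sites = sum(1 for site in g if len(site) == 2)
        if heterozygous_sites <= 1:
            # At most one mismatching site: both haplotypes are unambiguous.
            first = "".join(site[0] for site in g)
            known.update({first, complement(g, first)})
        else:
            unresolved.append(g)
    progress = True
    while unresolved and progress:
        progress = False
        for g in list(unresolved):
            for hap in sorted(known):
                other = complement(g, hap)
                if other is not None:
                    known.add(other)
                    unresolved.remove(g)
                    progress = True
                    break
    return known, unresolved

sample = [("A", "C", "G"), ("A", "CT", "AG")]
print(infer_haplotypes(sample))
```

In a greedy scheme like this, the result can depend on the order in which genotypes and known haplotypes are tried, and some genotypes may remain unresolved if no known haplotype is compatible with them.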

5.
DNA methylation and epigenetic inheritance   Total citations: 6, self-citations: 0, citations by others: 6
Mammalian cell lines silence genes at low frequency by the methylation of promoter sequences. These silent genes can be reactivated at high frequency by the demethylating agent 5-azacytidine (5-aza-CR). The inactive and active epigenetic states of such genes are stably inherited. A method for silencing genes is now available. It involves treatment of permeabilized cells with 5-methyl deoxycytidine triphosphate (5-methyl dCTP) which is incorporated into DNA. The methylation of promoter sequences has been confirmed using the bisulfite genomic sequencing procedure. Methylated oligonucleotides homologous to promoter sequences might be used to specifically target and silence given genes, but results so far have not been conclusive. Treatments that silence or reactivate genes by changing DNA methylation can be referred to as epimutagens, as distinct from mutagens that act by changing DNA sequences. The epimutagen 5-aza-CR reactivates genes but has little mutagenic activity, whereas standard mutagens (such as ethyl methane sulfonate and ultraviolet light) have little reactivation activity. Nevertheless, much more information is required about the effects of DNA-damaging agents in changing DNA methylation and gene activity and also about the role of epimutations in tumor progression.

6.
R R Robinson  N Davidson 《Cell》1981,23(1):251-259
A recombinant DNA phage containing a cluster of Drosophila melanogaster tRNA genes has been isolated and analyzed. The insert of this phage has been mapped by in situ hybridization to chromosomal region 50AB, a known tRNA site. Nucleotide sequencing of the entire Drosophila tRNA coding region reveals seven tRNA genes spanning 2.5 kb of chromosomal DNA. This cluster is separated from other tRNA regions on the chromosome by at least 2.7 kb on one side, and 9.6 kb on the other. Two tRNA genes are nearly identical and contain intervening sequences of length 38 and 45 bases, respectively, in the anticodon loop. These two genes are assigned to be tRNALeu genes because of significant sequence homology with yeast tRNA3Leu, and secondary structure homology with yeast tRNA3Leu intervening sequence. In addition, an 8 base sequence (AAAAUCUU) is conserved in the same location in the intervening sequences of Drosophila tRNALeu genes and a yeast tRNA3Leu gene. Similar sequences occur in all other tRNAs containing intervening sequences. The remaining five genes are identical tRNAIle genes, which are also identical to a tRNAIle gene from chromosomal region 42A. The 5' flanking regions are only weakly homologous, but each set of isoacceptors contains short regions of strong homology approximately 20 nucleotides preceding the tRNA coding sequences: GCNTTTTG preceding tRNAIle genes; and GANTTTGG preceding tRNALeu genes. The genes are irregularly distributed on both DNA strands; spacing regions are divergent in sequence and length.

7.
Although remarkable progress in metagenomic sequencing of various environmental samples has been made, large numbers of fragment sequences have been registered in the international DNA databanks, primarily without information on gene function and phylotype, and thus with limited usefulness. Industrially useful biological activity is often carried out by a set of genes, such as those constituting an operon. In this connection, metagenomic approaches have a weakness because sets of the genes are usually split up, since the sequences obtained by metagenome analyses are fragmented into 1-kb or much shorter segments. Therefore, even when a set of genes responsible for an industrially useful function is found in one metagenome library, it is usually difficult to know whether a single genome harbors the entire gene set or whether different genomes have individual genes. By modifying Self-Organizing Map (SOM), we previously developed BLSOM for oligonucleotide composition, which allowed classification (self-organization) of sequence fragments according to genomes. Because BLSOM could reassociate genomic fragments according to genomes, BLSOM may ameliorate the abovementioned weakness of metagenome analyses. Here, we have developed a strategy for clustering of metagenomic sequences according to phylotypes and genomes, by testing a gene set contributing to environment preservation.
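
The kind of oligonucleotide-composition feature that such a map takes as input can be sketched as a k-mer frequency vector. This shows only the feature extraction, not BLSOM or any SOM training; the function name `kmer_frequencies` and the default k = 4 are assumptions made for illustration.

```python
from itertools import product

def kmer_frequencies(seq: str, k: int = 4) -> list[float]:
    """Return the relative frequency of every k-mer over the alphabet ACGT.

    Windows containing other characters (for example N) are skipped.
    The vector is ordered lexicographically: AAAA, AAAC, ..., TTTT for k = 4.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[word] / total for word in kmers]

vector = kmer_frequencies("ACGTACGTTTGACCA")
print(len(vector))  # 256 entries for k = 4
```

Fragments from the same genome tend to yield similar vectors, which is the property that composition-based clustering relies on.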

8.
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. More recently, these methods have been used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.

9.
One of the challenges to the effective utilization of cDNA microarray analysis in mouse models of oncogenesis is the choice of a critical set of probes that are informative for human disease. Given the thousands of genes with a potential role in human oncogenesis and the hundreds of thousands of mouse sequences available for use as probes, selection of an informative set of mouse probes can be an overwhelming task. We have developed a web-based sequence mining tool using DataBase Independent (DBI) Perl to annotate publicly available sequences. The Mouse Oncochip Design Tool uses the Mouse Genome Database (MGD) developed and maintained by the Jackson Laboratory for mouse DNA sequences. There are over 380 000 sequences in the database. The output list is ordered, using a candidate set of oncogenes, to present first the genes more likely to be informative in a mouse model of human cancer. Mouse sequences that represent genes that are homologous with a member of a human oncogene set are listed first. In addition, the tool provides a set of links for information on clone source gene function. Contact: http://nciarray.nci.nih.gov/cgi-bin/me/mouse_design.cgi

10.
The availability of bacterial genome sequences has created a need for improved methods for sequence-based functional analysis to facilitate moving from annotated DNA sequence to genetic materials for analyzing the roles that postulated genes play in bacterial phenotypes. A powerful cloning method that uses lambda integrase recombination to clone and manipulate DNA sequences has been adapted for use with the gram-negative α-proteobacterium Sinorhizobium meliloti in two ways that increase the utility of the system. Adding plasmid oriT sequences to a set of vehicles allows the plasmids to be transferred to S. meliloti by conjugation and also allows cloned genes to be recombined from one plasmid to another in vivo by a pentaparental mating protocol, saving considerable time and expense. In addition, vehicles that contain yeast Flp recombinase target recombination sequences allow the construction of deletion mutations where the end points of the deletions are located at the ends of the cloned genes. Several deletions were constructed in a cluster of 60 genes on the symbiotic plasmid (pSymA) of S. meliloti, predicted to code for a denitrification pathway. The mutations do not affect the ability of the bacteria to form nitrogen-fixing nodules on Medicago sativa (alfalfa) roots.

11.
12.
There are eight unlinked genes for yeast tyrosine transfer RNA. In previous work, nonsense suppressors have been isolated at each of the eight loci, and these loci have been genetically mapped (Hawthorne & Leupold, 1974). It has also been demonstrated by RNA-DNA hybridization that the genes are physically located on eight different EcoRI restriction fragments (Olson et al., 1977). The purpose of the present report is to cross-correlate the set of tyrosine-inserting suppressor loci with the set of tRNATyr-hybridizing restriction fragments. This cross-correlation was achieved for six of the eight loci by analyzing the meiotic and mitotic linkage between the tyrosine-inserting suppressors and the genetic determinants of naturally occurring size variants of the tRNATyr-hybridizing restriction fragments. Now that individual suppressor loci have been identified with specific DNA fragments, it should be possible to analyze the phenotypes of these mutant genes in terms of their DNA sequences. The method by which these assignments were made also offers a new approach to the general problem of correlating genes with restriction fragments; it is particularly suited to organisms with powerful genetic systems in which hybridization to chromosome spreads in situ is impractical.

13.
Structure of the mouse C-reactive protein gene   Total citations: 3, self-citations: 0, citations by others: 3
A genomic DNA clone corresponding to the mouse C-reactive protein (CRP) has been isolated and characterized. The mouse CRP gene is 1.9 kilobase pairs in length and contains a single intron of 213 base pairs which interrupts the codon for the 2nd amino acid residue of the mature CRP protein. We compared nucleotide sequences of the mouse and human CRP genes and discussed structures of possible regulatory sequences. With this characterization, the isolation and sequence analyses of a set of mouse and human pentraxin genes, i.e. CRP and serum amyloid P component genes, is now complete.

14.
Summary: In addition to the set of curved DNA segments isolated previously from Escherichia coli, another set of curved DNA segments has now been isolated. To gain an insight into the functional significance of these curved DNA sequences, systematic analyses were carried out, which included not only mapping of the precise locations of the segments on the E. coli chromosome but also clarification of the gene organization in the chromosomal regions surrounding the curved DNA sequences. It was demonstrated that most of the curved DNA sequences, which have been characterized so far, appear to be located immediately upstream of the coding sequences of adjacent genes. It was also demonstrated that an E. coli histone-like protein, named H-NS (or H1a), exhibits a strong affinity for naturally occurring curved DNA sequences in regions upstream of promoters.

15.
16.
Comparative ab initio prediction of gene structures using pair HMMs   Total citations: 3, self-citations: 0, citations by others: 3
We present a novel comparative method for the ab initio prediction of protein coding genes in eukaryotic genomes. The method simultaneously predicts the gene structures of two un-annotated input DNA sequences which are homologous to each other and retrieves the subsequences which are conserved between the two DNA sequences. It is capable of predicting partial, complete and multiple genes and can align pairs of genes which differ by events of exon-fusion or exon-splitting. The method employs a probabilistic pair hidden Markov model. We generate annotations using our model with two different algorithms: the Viterbi algorithm in its linear memory implementation and a new heuristic algorithm, called the stepping stone, for which both memory and time requirements scale linearly with the sequence length. We have implemented the model in a computer program called DOUBLESCAN. In this article, we introduce the method and confirm the validity of the approach on a test set of 80 pairs of orthologous DNA sequences from mouse and human. More information can be found at: http://www.sanger.ac.uk/Software/analysis/doublescan/

17.
Development of competence for DNA uptake by the bacterium Haemophilus influenzae is tightly regulated, and expression of the cell's complement of competence genes is absolutely dependent on the cAMP-CRP complex. A second regulator of competence may maximize competence under starvation conditions. Several investigators have recently identified a consensus sequence (competence regulatory element, CRE) in the promoter regions of some competence genes and have proposed that this may be a binding site for Sxy (TfoX), a putative positive regulator of competence. However, a scoring method that reliably ranks candidate binding sites according to affinity for the cognate binding protein predicts that the cAMP-CRP complex will bind CRE sequences with high affinity. Moreover, the predicted Sxy protein lacks recognizable DNA-binding motifs and has not been shown to bind DNA. No other consensus sequences (putative binding sites) were identified in the promoter regions of competence genes. These observations suggest that the proposed competence-specific regulatory elements are in fact CRP-binding sites, and highlight the central role of cAMP, an established bacterial mediator of the response to nutritional stress, in competence regulation. Minor sequence elements uniquely conserved in the set of CRE sequences are predicted to reduce CRP affinity, and a model is suggested in which a secondary regulator of competence genes may interact with CRP under certain conditions to stabilize the initiation complex.

18.
The use of automated fluorescent DNA sequencer systems and PCR-based DNA sequencing methods plays an important role in the current effort to improve the efficiency of large-scale DNA analysis. While dideoxy-terminators labeled with energy-transfer dyes (BigDyes) provide the most versatile method of automated DNA sequencing, premature terminations result in a substantially reduced reading length of the DNA sequence. Premature terminations are usually evidenced by base ambiguities and are often accompanied by diminished signal intensity from that point on in the sequence. I studied a two-step protocol for Taq cycle sequencing using the ABI BigDye terminator for reducing premature terminations in DNA sequences. I demonstrate that combining the annealing step with the extension step at one temperature (60°C) reduces premature terminations in DNA sequences that regularly contain premature terminations when the three temperature steps are used. This modification significantly increases the number of accurately read bases in DNA sequences.

19.
Evaluating Quantitative Variation in the Genome of ZEA MAYS   Total citations: 7, self-citations: 2, citations by others: 5
Genomic diversity within the species Zea mays has been examined by measuring the variation in the repetitive component of the nuclear genome among North American inbred lines and varieties. This was done by preparing a set of clones of repetitive maize sequences that differ in function, molecular arrangement and multiplicity and then using these as probes for quantitative hybridization to DNA from various maize genotypes. The comparison showed that the majority of repeated sequences are markedly variable in copy number among the ten maize strains tested. The clone sample contained the rDNA and 5S genes, the major repeat of the chromosome knobs, sequences functioning as origins of DNA replication in yeast (ARS sequences) and randomly cloned sequences of unknown function and chromosomal location. The sequences ranged in reiteration frequency from 200 to greater than 10^5 copies and included both tandemly arrayed and dispersed repeats. The copy numbers were measured by hybridizing labeled cloned sequences to aliquots of high molecular weight genomic DNA that were applied to nitrocellulose filters through a slotted template (slot blotting). The hybridization signal on an autoradiogram occurred in a narrow band that could be scored reliably with a densitometer. This provided a rapid method of determining the abundance of particular repeated sequences in individual plants and plant populations. Using this technique, we found that the copy number of repeated sequences of all types generally varied among the strains by two- to threefold, although at least one sequence showed no detectable variation. In contrast to the variability found between strains, individuals within an inbred line or variety were found to be indistinguishable in terms of specific sequence multiplicity. Each genotype has a different pattern of copy numbers for the set of repeated sequence clones, and this pattern is characteristic of all individuals of a particular genotype. The data also show that the copy number of each sequence varies independently. No strains had uniformly high or low copy numbers for the entire set of probes.

20.
MOTIVATION: Discovery of regulatory motifs in unaligned DNA sequences remains a fundamental problem in computational biology. Two categories of algorithms have been developed to identify common motifs from a set of DNA sequences. The first can be called a 'multiple genes, single species' approach. It proposes that a degenerate motif is embedded in some or all of the otherwise unrelated input sequences and tries to describe a consensus motif and identify its occurrences. It is often used for co-regulated genes identified through experimental approaches. The second approach can be called 'single gene, multiple species'. It requires orthologous input sequences and tries to identify unusually well conserved regions by phylogenetic footprinting. Both approaches perform well, but each has some limitations. It is tempting to combine the knowledge of co-regulation among different genes and conservation among orthologous genes to improve our ability to identify motifs. RESULTS: Based on the Consensus algorithm previously established by our group, we introduce a new algorithm called PhyloCon (Phylogenetic Consensus) that takes into account both conservation among orthologous genes and co-regulation of genes within a species. This algorithm first aligns conserved regions of orthologous sequences into multiple sequence alignments, or profiles, then compares profiles representing non-orthologous sequences. Motifs emerge as common regions in these profiles. Here we present a novel statistic to compare profiles of DNA sequences and a greedy approach to search for common subprofiles. We demonstrate that PhyloCon performs well on both synthetic and biological data. AVAILABILITY: Software available upon request from the authors. http://ural.wustl.edu/softwares.html
