Found 20 similar articles (search time: 9 ms)
1.
2.
3.
Whole genome DNA microarrays were constructed and used to investigate genomic diversity in 18 Campylobacter jejuni strains from diverse sources. New algorithms were developed that dynamically determine the boundary between the conserved and variable genes. Seven hypervariable plasticity regions (PR) were identified in the genome (PR1 to PR7) containing 136 genes (50%) of the variable gene pool. When comparisons were made with the sequenced strain NCTC11168, the number of absent or divergent genes ranged from 2.6% (40 genes) to 10.2% (163) and in total 16.3% (269) of the genes were variable. PR1 contains genes important in the utilisation of alternative electron acceptors for respiration and may confer a selective advantage to strains in restricted oxygen environments. PR2, 3 and 7 contain many outer membrane and periplasmic proteins and hypothetical proteins of unknown function that might be linked to phenotypic variation and adaptation to different ecological niches. PR4, 5 and 6 contain genes involved in the production and modification of antigenic surface structures.
4.
Optimized design and assessment of whole genome tiling arrays (total citations: 1; self-citations: 0; citations by others: 1)
Gräf S, Nielsen FG, Kurtz S, Huynen MA, Birney E, Stunnenberg H, Flicek P. Bioinformatics (Oxford, England), 2007, 23(13): i195-i204
MOTIVATION: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling array data is complicated by the presence of non-unique sequences on the array, which increases the overall noise in the data and may lead to false positive results due to cross-hybridization. The ability to create custom microarrays using maskless array synthesis has led us to consider ways to optimize array design characteristics for improving data quality and analysis. We have identified a number of design parameters to be optimized including uniqueness of the probe sequences within the whole genome, melting temperature and self-hybridization potential. RESULTS: We introduce the uniqueness score, U, a novel quality measure for oligonucleotide probes and present a method to quickly compute it. We show that U is equivalent to the number of shortest unique substrings in the probe and describe an efficient greedy algorithm to design mammalian whole genome tiling arrays using probes that maximize U. Using the mouse genome, we demonstrate how several optimizations influence the tiling array design characteristics. With a sensible set of parameters, our designs cover 78% of the mouse genome including many regions previously considered 'untilable' due to the presence of repetitive sequence. Finally, we compare our whole genome tiling array designs with commercially available designs. AVAILABILITY: Source code is available under an open source license from http://www.ebi.ac.uk/~graef/arraydesign/.
5.
6.
Assessing the significance of conserved genomic aberrations using high resolution genomic microarrays
Guttman M, Mies C, Dudycz-Sulicz K, Diskin SJ, Baldwin DA, Stoeckert CJ, Grant GR. PLoS Genetics, 2007, 3(8): e143
Genomic aberrations recurrent in a particular cancer type can be important prognostic markers for tumor progression. Typically in early tumorigenesis, cells incur a breakdown of the DNA replication machinery that results in an accumulation of genomic aberrations in the form of duplications, deletions, translocations, and other genomic alterations. Microarray methods allow for finer mapping of these aberrations than has previously been possible; however, data processing and analysis methods have not taken full advantage of this higher resolution. Attention has primarily been given to analysis on the single sample level, where multiple adjacent probes are necessarily used as replicates for the local region containing their target sequences. However, regions of concordant aberration can be short enough to be detected by only one, or very few, array elements. We describe a method called Multiple Sample Analysis for assessing the significance of concordant genomic aberrations across multiple experiments that does not require a-priori definition of aberration calls for each sample. If there are multiple samples, representing a class, then by exploiting the replication across samples our method can detect concordant aberrations at much higher resolution than can be derived from current single sample approaches. Additionally, this method provides a meaningful approach to addressing population-based questions such as determining important regions for a cancer subtype of interest or determining regions of copy number variation in a population. Multiple Sample Analysis also provides single sample aberration calls in the locations of significant concordance, producing high resolution calls per sample, in concordant regions. 
The approach is demonstrated on a dataset representing a challenging but important resource: breast tumors that have been formalin-fixed, paraffin-embedded, archived, and subsequently UV-laser capture microdissected and hybridized to two-channel BAC arrays using an amplification protocol. We demonstrate the accurate detection on simulated data, and on real datasets involving known regions of aberration within subtypes of breast cancer at a resolution consistent with that of the array. Similarly, we apply our method to previously published datasets, including a 250K SNP array, and verify known results as well as detect novel regions of concordant aberration. The algorithm has been fully implemented and tested and is freely available as a Java application at http://www.cbil.upenn.edu/MSA.
7.
Identification of small gains and losses in single cells after whole genome amplification on tiling oligo arrays
Jochen B. Geigl, Anna C. Obenauf, Julie Waldispuehl-Geigl, Eva M. Hoffmann, Martina Auer, Martina Hörmann, Maria Fischer, Zlatko Trajanoski, Michael A. Schenk, Lars O. Baumbusch, Michael R. Speicher. Nucleic Acids Research, 2009, 37(15): e105
Clinical DNA is often available in limited quantities requiring whole-genome amplification for subsequent genome-wide assessment of copy-number variation (CNV) by array-CGH. In pre-implantation diagnosis and analysis of micrometastases, even merely single cells are available for analysis. However, procedures allowing high-resolution analyses of CNVs from single cells well below resolution limits of conventional cytogenetics are lacking. Here, we applied amplification products of single cells and of cell pools (5 or 10 cells) from patients with developmental delay, cancer cell lines and polar bodies to various oligo tiling array platforms with a median probe spacing as high as 65 bp. Our high-resolution analyses reveal that the low amounts of template DNA do not result in a completely unbiased whole genome amplification but that stochastic amplification artifacts, which become more obvious on array platforms with tiling path resolution, cause significant noise. We implemented a new evaluation algorithm specifically for the identification of small gains and losses in such very noisy ratio profiles. Our data suggest that when assessed with sufficiently sensitive methods high-resolution oligo-arrays allow a reliable identification of CNVs as small as 500 kb in cell pools (5 or 10 cells), and of 2.6–3.0 Mb in single cells.
8.
Streptococcus suis type 2 (SS2) is an important swine pathogen and zoonotic agent. A/J mice are significantly more susceptible than C57BL/6 (B6) mice to SS2 infection, but the genetic basis is largely unknown. Here, alterations in gene expression in SS2 (strain HA9801)-infected mice were identified using Illumina mouse BeadChips. Microarray analysis revealed 3,692 genes differentially expressed in peritoneal macrophages between A/J and B6 mice due to SS2 infection. Between SS2-infected A/J and control A/J mice, 2646 genes were differentially expressed (1469 upregulated; 1177 downregulated). Between SS2-infected B6 and control B6 mice, 1449 genes were differentially expressed (778 upregulated; 671 downregulated). These genes were analyzed for significant Gene Ontology (GO) categories and signaling pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to generate a signaling network. Upregulated genes in A/J and B6 mice were related to response to bacteria, immune response, positive regulation of B cell receptor signaling pathway, type I interferon biosynthesis, defense and inflammatory responses. Additionally, upregulated genes in SS2-infected B6 mice were involved in antigen processing and presentation of exogenous peptides, peptide antigen stabilization, lymphocyte differentiation regulation, positive regulation of monocyte differentiation, antigen receptor-mediated signaling pathway and positive regulation of phagocytosis. Downregulated genes in SS2-infected B6 mice played roles in glycolysis, carbohydrate metabolic process, amino acid metabolism, behavior and muscle regulation. Microarray results were verified by quantitative real-time PCR (qRT-PCR) of 14 representative deregulated genes. Four genes differentially expressed between SS2-infected A/J and B6 mice, toll-like receptor 2 (Tlr2), tumor necrosis factor (Tnf), matrix metalloproteinase 9 (Mmp9) and pentraxin 3 (Ptx3), were previously implicated in the response to S. suis infection.
This study identified candidate genes that may influence susceptibility or resistance to SS2 infection in A/J and B6 mice, providing further validation of these models and contributing to understanding of S. suis pathogenic mechanisms.
9.
MOTIVATION: There is a growing literature on wavelet theory and wavelet methods showing improvements on more classical techniques, especially in the contexts of smoothing and extraction of fundamental components of signals. G+C patterns occur at different lengths (scales) and, for this reason, G+C plots are usually difficult to interpret. Current methods for genome analysis choose a window size and compute a χ² statistic of the average value for each window with respect to the whole genome. RESULTS: Firstly, wavelets are used to smooth G+C profiles to locate characteristic patterns in genome sequences. The method we use is based on computing a χ² statistic on the wavelet coefficients of a profile; thus we do not need to choose a fixed window size, in that the smoothing occurs at a set of different scales. Secondly, a wavelet scalogram is used as a measure for sequence profile comparison; this tool is very general and can be applied to other sequence profiles commonly used in genome analysis. We show applications to the analysis of Deinococcus radiodurans chromosome I, of two strains of Helicobacter pylori (26695, J99) and two of Neisseria meningitidis (serogroup B strain MC58 and serogroup A strain Z2491). We report a list of loci that have different G+C content with respect to the nearby regions; the analysis of N. meningitidis serogroup B shows two new large regions with low G+C content that are putative pathogenicity islands. AVAILABILITY: Software and numerical results (profiles, scalograms, high and low frequency components) for all the genome sequences analyzed are available upon request from the authors.
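The multiscale smoothing idea in this abstract can be illustrated with a minimal sketch. It assumes an unnormalized Haar transform (pairwise means and half-differences) applied to a 0/1 G+C indicator sequence. The function names `gc_indicator`, `haar_step` and `haar_pyramid` are hypothetical, the wavelet family is an assumption (the abstract does not name one), and the χ² test on the wavelet coefficients is not reproduced here.

```python
def gc_indicator(seq):
    """Map a DNA sequence to 1.0 for G or C and 0.0 for any other symbol."""
    return [1.0 if base in "GCgc" else 0.0 for base in seq]


def haar_step(signal):
    """One level of an unnormalized Haar transform (assumed wavelet).

    Returns (approximation, detail): pairwise means and pairwise
    half-differences. A trailing unpaired sample is dropped.
    """
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def haar_pyramid(signal, levels):
    """Smoothed G+C profiles at successively coarser scales (2, 4, 8, ... bases)."""
    profiles = []
    current = list(signal)
    for _ in range(levels):
        if len(current) < 2:
            break  # nothing left to pair up
        current, _detail = haar_step(current)
        profiles.append(current)
    return profiles


if __name__ == "__main__":
    # Toy sequence, not real genome data.
    profile = gc_indicator("GGCCATATGCGCATAT")
    for level, smoothed in enumerate(haar_pyramid(profile, 3), start=1):
        print(level, smoothed)
```

Each level halves the resolution, so no single window size has to be chosen in advance; a real analysis would also keep the detail coefficients, which this sketch discards.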
10.
Different foreign genes incidentally integrated into the same locus of the Streptococcus suis genome
Some strains of Streptococcus suis possess a type II restriction-modification (RM) system, whose genes are thought to be inserted into the genome between purH and purD from a foreign source by illegitimate recombination. In this study, we characterized the purHD locus of the S. suis genomes of 28 serotype reference strains by DNA sequencing. Four strains contained the RM genes in the locus, as described before, whereas 11 strains possessed other genetic regions of seven classes. The genetic regions contained a single gene or multiple genes that were either unknown or similar to hypothetical genes of other bacteria. The mutually exclusive localization of the genetic regions with the atypical G+C contents indicated that these regions were also acquired from foreign sources. No transposable element or long-repeat sequence was found in the neighboring regions. An alignment of the nucleotide sequences, including the RM gene regions, suggested that the foreign regions were integrated by illegitimate recombination via short stretches of nucleotide identity. By using a thermosensitive suicide plasmid, the RM genes were experimentally introduced into an S. suis strain that did not contain any foreign genes in that locus. Integration of the plasmid into the S. suis genome did not occur in the purHD locus but occurred at various chromosomal loci, where there were 2 to 10 bp of nucleotide identity between the chromosome and the plasmid. These results suggest that various foreign genes described here were incidentally integrated into the same locus of the S. suis genome.
11.
As whole genome sequences continue to expand in number and complexity, effective methods for comparing and categorizing both genes and species represented within extremely large datasets are required. Methods introduced to date have generally utilized incomplete and likely insufficient subsets of the available data. We have developed an accurate and efficient method for producing robust gene and species phylogenies using very large whole genome protein datasets. This method relies on multidimensional protein vector definitions supplied by the singular value decomposition (SVD) of a large sparse data matrix in which each protein is uniquely represented as a vector of overlapping tetrapeptide frequencies. Quantitative pairwise estimates of species similarity were obtained by summing the protein vectors to form species vectors, then determining the cosines of the angles between species vectors. Evolutionary trees produced using this method confirmed many accepted prokaryotic relationships. However, several unconventional relationships were also noted. In addition, we demonstrate that many of the SVD-derived right basis vectors represent particular conserved protein families, while many of the corresponding left basis vectors describe conserved motifs within these families as sets of correlated peptides (copeps). This analysis represents the most detailed simultaneous comparison of prokaryotic genes and species available to date.
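The species-similarity step described in this abstract (sum protein vectors into species vectors, then take cosines) can be sketched as follows. This is a simplified illustration under stated assumptions: it works directly on raw overlapping tetrapeptide counts and skips the SVD reduction the authors apply first, and the names `tetrapeptide_counts`, `species_vector` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tetrapeptide_counts(protein):
    """Count overlapping 4-residue substrings in one protein sequence."""
    return Counter(protein[i:i + 4] for i in range(len(protein) - 3))


def species_vector(proteins):
    """Sum the per-protein count vectors into a single species vector.

    Assumption: raw counts are summed; the SVD projection from the
    abstract is deliberately omitted in this sketch.
    """
    total = Counter()
    for protein in proteins:
        total.update(tetrapeptide_counts(protein))
    return total


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector has no defined direction
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy sequences, not real proteomes.
    species_a = species_vector(["MKVLAAGIV", "MKVLSAG"])
    species_b = species_vector(["MKVLAAGLV"])
    print(cosine(species_a, species_b))
```

A pairwise distance matrix for tree building could then be derived from these cosines, for example as one minus the cosine, though the abstract does not state which transformation the authors use.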
12.
Whole genome tiling arrays provide a high resolution platform for profiling of genetic, epigenetic, and gene expression polymorphisms. In this study we surveyed natural genomic variation in cytosine methylation among Arabidopsis thaliana wild accessions Columbia (Col) and Vancouver (Van) by comparing hybridization intensity difference between genomic DNA digested with either methylation-sensitive (HpaII) or -insensitive (MspI) restriction enzyme. Single Feature Polymorphisms (SFPs) were assayed on a full set of 1,683,620 unique features of Arabidopsis Tiling Array 1.0F (Affymetrix), while constitutive and polymorphic CG methylation were assayed on a subset of 54,519 features, which contain a 5'CCGG3' restriction site. 138,552 SFPs (1% FDR) were identified across enzyme treatments, which preferentially accumulated in pericentromeric regions. Our study also demonstrates that at least 8% of all analyzed CCGG sites were constitutively methylated across the two strains, while about 10% of all analyzed CCGG sites were differentially methylated between the two strains. Within euchromatin arms, both constitutive and polymorphic CG methylation accumulated in central regions of genes but were under-represented toward the 5' and 3' ends of the coding sequences. Nevertheless, polymorphic methylation occurred much more frequently in gene ends than constitutive methylation. Inheritance of methylation polymorphisms in reciprocal F1 hybrids was predominantly additive, with F1 plants generally showing levels of methylation intermediate between the parents. By comparing gene expression profiles, using matched tissue samples, we found that the magnitude of methylation polymorphism immediately upstream or downstream of the gene was inversely correlated with the degree of expression variation for that gene. In contrast, methylation polymorphism within the genic region showed a weak positive correlation with expression variation.
Our results demonstrated extensive genetic and epigenetic polymorphisms between Arabidopsis accessions and suggested a possible relationship between natural CG methylation variation and gene expression variation.
13.
Identification of competence pheromone responsive genes in Streptococcus pneumoniae by use of DNA microarrays (total citations: 6; self-citations: 0; citations by others: 6)
Peterson SN, Sung CK, Cline R, Desai BV, Snesrud EC, Luo P, Walling J, Li H, Mintz M, Tsegaye G, Burr PC, Do Y, Ahn S, Gilbert J, Fleischmann RD, Morrison DA. Molecular Microbiology, 2004, 51(4): 1051-1070
14.
Evidence for horizontal transfer of SsuDAT1I restriction-modification genes to the Streptococcus suis genome
Different strains of Streptococcus suis serotypes 1 and 2 isolated from pigs either contained a restriction-modification (R-M) system or lacked it. The R-M system was an isoschizomer of Streptococcus pneumoniae DpnII, which recognizes nucleotide sequence 5′-GATC-3′. The nucleotide sequencing of the genes encoding the R-M system in S. suis DAT1, designated SsuDAT1I, showed that the SsuDAT1I gene region contained two methyltransferase genes, designated ssuMA and ssuMB, as does the DpnII system. The deduced amino acid sequences of M.SsuMA and M.SsuMB showed 70 and 90% identity to M.DpnII and M.DpnA, respectively. However, the SsuDAT1I system contained two isoschizomeric restriction endonuclease genes, designated ssuRA and ssuRB. The deduced amino acid sequence of R.SsuRA was 49% identical to that of R.DpnII, and R.SsuRB was 72% identical to R.LlaDCHI of Lactococcus lactis subsp. cremoris DCH-4. The four SsuDAT1I genes overlapped and were bounded by purine biosynthetic gene clusters in the following gene order: purF-purM-purN-purH-ssuMA-ssuMB-ssuRA-ssuRB-purD-purE. The G+C content of the SsuDAT1I gene region (34.1%) was lower than that of the pur region (48.9%), suggesting horizontal transfer of the SsuDAT1I system. No transposable element or long-repeat sequence was found in the flanking regions. The SsuDAT1I genes were functional by themselves, as they were individually expressed in Escherichia coli. Comparison of the sequences between strains with and without the R-M system showed that only the region from 53 bp upstream of ssuMA to 5 bp downstream of ssuRB was inserted in the intergenic sequence between purH and purD and that the insertion target site was not the recognition site of SsuDAT1I. No notable substitutions or insertions could be found, and the structures were conserved among all the strains. These results suggest that the SsuDAT1I system could have been integrated into the S. suis chromosome by an illegitimate recombination mechanism.
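The G+C comparison used in this abstract as evidence of horizontal transfer (34.1% for the inserted region versus 48.9% for the flanking pur region) rests on a simple calculation, sketched below. The function names `gc_content` and `gc_difference` are hypothetical, the sequences are toy strings rather than S. suis data, and ignoring ambiguous bases is an assumption of this sketch.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    # Assumption: ambiguous symbols such as N are ignored entirely.
    return gc / total if total else 0.0


def gc_difference(region, flank):
    """G+C content of a candidate region minus that of its flanking sequence."""
    return gc_content(region) - gc_content(flank)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    inserted = "ATTAGATCATTTAAGCAT"
    flanking = "GCGCATGGCCATGCAGCT"
    print(gc_content(inserted), gc_content(flanking))
    print(gc_difference(inserted, flanking))
```

A strongly negative or positive difference only flags a region as compositionally atypical; on its own it does not establish foreign origin.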
15.
Genome evolution in prokaryotes is assisted by integration of gene pools from phages and plasmids. Regions downstream of tRNAs and tmRNAs are considered hot spots for the integration of these gene pools or genomic islands. To date, genomic islands have been identified only at tRNA/tmRNA genes in the enterobacterial genomes. The present work reports 10 distinct small RNAs as potent integration sites for genomic islands. A known tool, tRNAcc 1.0, has been used to identify genomic islands associated with small RNAs c0362, oxyS, ryaA, rybB, rybD, ryeB, ryeE, rtT, sraE and tmRNA. The coordinates of 25 such small RNA associated genomic islands in three E. coli (strains: CFT073, EDL933 and K12) and Shigella flexneri (strain: 301) genomes are presented. Moreover, cross-verification of the genomic sequences encoded within the identified genomic islands against a horizontal gene transfer database, GenBank annotation features and atypical sequence compositions supports our results. In addition, all 25 identified genomic integration sites exhibit genomic block rearrangements with respect to the associated small RNA. Similar to tRNAs/tmRNAs, the downstream regions of the small RNAs are found to be hotspots of integration.
16.
17.
18.
19.