Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Although multiple sclerosis (MS) is one of the most common central nervous system diseases in young adults, little is known about its etiology. Several human endogenous retroviruses (ERVs) are considered to play a role in MS. We are interested in which ERVs can be identified in the vicinity of MS-associated genetic markers, in order to find potential initiators of MS. We analysed the chromosomal regions surrounding 58 single nucleotide polymorphisms (SNPs) that are associated with MS and were identified in one of the latest major genome-wide association studies. We scanned these regions for putative endogenous retrovirus sequences with large open reading frames (ORFs). We observed that more retrovirus-related putative ORFs exist in the relatively close vicinity of SNP marker indices in multiple sclerosis compared to control SNPs. We found very high homologies to HERV-K, HCML-ARV, XMRV, Galidia ERV, HERV-H/env62 and XMRV-like mouse endogenous retrovirus mERV-XL. The associated genes (CYP27B1, CD6, CD58, MPV17L2, IL12RB1, CXCR5, PTGER4, TAGAP, TYK2, ICAM3, CD86, GALC, GPR65 as well as the HLA DRB1*1501) are mainly involved in the immune system, but also in vitamin D regulation. The most frequently detected ERV sequences are related to the multiple sclerosis-associated retrovirus, the human immunodeficiency virus 1, HERV-K, and the Simian foamy virus. Our data show that there is a relation between MS-associated SNPs and the number of retroviral elements compared to control. Our data identify new ERV sequences that have not so far been associated with MS.

2.
3.
4.
5.
Background: Genetically modified organisms (GMOs) have numerous biomedical, agricultural and environmental applications. Development of accurate methods for the detection of GMOs is a prerequisite for the identification and control of authorized and unauthorized release of these engineered organisms into the environment and into the food chain. Current detection methods are unable to detect uncharacterized GMOs, since either the DNA sequence of the transgene or the amino acid sequence of the protein must be known for DNA-based or immunological-based detection, respectively. Methods: Here we describe the application of an epigenetics-based approach for the detection of mammalian GMOs via analysis of chromatin structural changes occurring in the host nucleus upon the insertion of foreign or endogenous DNA. Results: Immunological methods combined with DNA next generation sequencing enabled direct interrogation of chromatin structure and identification of insertions of various size foreign (human or viral) DNA sequences, DNA sequences often used as genome modification tools (e.g. viral sequences, transposon elements), or endogenous DNA sequences into the nuclear genome of a model animal organism. Conclusions: The results provide a proof-of-concept that epigenetic approaches can be used to detect the insertion of endogenous and exogenous sequences into the genome of higher organisms where the method of genetic modification, the sequence of inserted DNA, and the exact genomic insertion site(s) are unknown. General significance: Measurement of chromatin dynamics as a sensor for detection of genomic manipulation and, more broadly, organism exposure to environmental or other factors affecting the epigenomic landscape are discussed.

6.
Typically, detection of protein sequences in collision-induced dissociation (CID) tandem MS (MS2) datasets is performed by mapping identified peptide ions back to protein sequences using a protein database search (PDS) engine. Finding a particular peptide sequence of interest in CID MS2 records very often requires manual evaluation of the spectrum, regardless of whether the peptide-associated MS2 scan is identified by the PDS algorithm or not. We have developed a compact cross-platform database-free command-line utility, pepgrep, which helps to find an MS2 fingerprint for a selected peptide sequence by pattern-matching of modelled MS2 data using a Peptide-to-MS2 scoring algorithm. pepgrep can incorporate dozens of mass offsets corresponding to a variety of post-translational modifications (PTMs) into the algorithm. Decoy peptide sequences are used with the tested peptide sequence to reduce false-positive results. The engine is capable of screening an MS2 data file at a high rate when using a cluster computing environment. The matched MS2 spectrum can be displayed by using a built-in graphical application programming interface (API) or optionally recorded to file. Using this algorithm, we were able to find extra peptide sequences in the studied CID spectra that were missed by PDS identification. We also found pepgrep especially useful for examining CID spectra of small fractions of peptides resulting from, for example, affinity purification techniques. The peptide sequences in such samples are less likely to be positively identified by the routine protein-centric algorithms implemented in PDS. The software is freely available at http://bsproteomics.essex.ac.uk:8080/data/download/pepgrep-1.4.tgz.
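The idea of matching a modelled MS2 pattern for one chosen peptide against observed peaks can be sketched as follows. This is a minimal illustration and not pepgrep's actual Peptide-to-MS2 scoring algorithm, which the abstract does not describe in detail. The function names `fragment_ions` and `match_fraction`, the 0.5 Da tolerance, and the simple "fraction of matched fragments" score are all assumptions made for the example. Only singly charged b- and y-ions are modelled, using standard monoisotopic residue masses.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_ions(peptide, offsets=None):
    """Return (b_ions, y_ions): singly charged m/z values for a peptide.

    offsets is a hypothetical way to model PTMs: it maps a 0-based
    residue index to a mass offset in Da added to that residue.
    """
    offsets = offsets or {}
    masses = [RESIDUE_MASS[aa] + offsets.get(i, 0.0)
              for i, aa in enumerate(peptide)]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:              # N-terminal prefixes
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):     # C-terminal suffixes
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def match_fraction(peptide, peaks, tol=0.5, offsets=None):
    """Fraction of modelled b/y ions found among observed peak m/z values."""
    b_ions, y_ions = fragment_ions(peptide, offsets)
    theoretical = b_ions + y_ions
    hits = sum(any(abs(p - t) <= tol for p in peaks) for t in theoretical)
    return hits / len(theoretical)
```

In the spirit of the decoy step mentioned in the abstract, one could compare `match_fraction(peptide, peaks)` with the same score for a decoy such as the reversed sequence `peptide[::-1]`, and keep a spectrum only when the target clearly outscores the decoy. How pepgrep itself builds decoys is not stated here, so that choice is also an assumption.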

7.
Genomic SELEX is a method for studying the network of nucleic acid–protein interactions within any organism. Here we report the discovery of several interesting and potentially biologically important interactions using genomic SELEX. We have found that bacteriophage MS2 coat protein binds several Escherichia coli mRNA fragments more tightly than it binds the natural, well-studied, phage mRNA site. MS2 coat protein binds mRNA fragments from rffG (involved in formation of lipopolysaccharide in the bacterial outer membrane), ebgR (lactose utilization repressor), as well as from several other genes. Genomic SELEX may yield experimentally induced artifacts, such as molecules in which the fixed sequences participate in binding. We describe several methods (annealing of oligonucleotides complementary to fixed sequences or switching fixed sequences) to eliminate some, or almost all, of these artifacts. Such methods may be useful tools for both randomized sequence SELEX and genomic SELEX.

8.
9.
Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
Database search tools, such as Sequest (3), Mascot (4), and InsPecT (5), are the most frequently used methods for reliable protein identification in tandem mass (MS/MS) spectrometry based proteomics. These operate by separately matching each MS/MS spectrum to peptide sequences from reference protein databases where all proteins of interest are presumably contained. 
But this assumption often does not hold true as many important proteins, such as monoclonal antibodies, are not contained in any database because mechanisms of antibody variation (including genetic recombination and somatic hyper-mutation (6)) constantly create new proteins with novel unique sequences. These mechanisms of variation are the foundation of adaptive immune systems and have enabled highly successful antibody-based therapeutic strategies (7, 8). Nevertheless, such variation also means that antibody MS/MS spectra are typically impossible to identify via standard database search techniques whenever the corresponding sequences are not known in advance. An inherent drawback of database search strategies is that they are only as good as the database(s) being searched and incomplete databases often result in proteins being misidentified or left unidentified (9).
Despite the importance of novel protein identification, few high-throughput methods have been developed for de novo sequencing of unknown proteins. Low-throughput Edman degradation is a well-known de novo sequencing approach that can accurately call amino acid sequences in N/C-terminal regions of unknown proteins but has drawbacks that make it unsuitable for sequencing proteins longer than 50 amino acids or proteins with post-translational modifications (10, 11). Many have recognized the potential of tandem mass spectrometry for protein sequencing. For example, in 1987 Johnson and Biemann (12) manually sequenced a complete protein from rabbit bone marrow. Meanwhile, automated de novo sequencing methods that rely on interpretations of individual MS/MS spectra are limited in that they typically cannot reconstruct long (8+ AA) sequences without mis-predicting 1 in 5 AA on average for low accuracy collision-induced dissociation (CID) spectra (13, 14). 
Recent advances in de novo peptide sequencing have improved sequencing accuracy to over 95% for high resolution higher energy collisional dissociation (HCD) spectra (15), but at limited sequence coverage (Chi H et al. report only 55% sequence coverage of peptides identified by database search). In fact, all current per-spectrum de novo sequencing strategies face a significant tradeoff between sequencing accuracy and coverage as spectra exhibiting complete peptide fragmentation rarely cover entire target proteins, yet are required to accurately reconstruct full-length peptide sequences. An alternative approach to separately sequencing individual spectra is to simultaneously interpret multiple MS/MS spectra from overlapping peptides. This Shotgun Protein Sequencing (SPS) paradigm differs from traditional algorithms by deriving consensus sequences from contigs - sets of multiple MS/MS spectra from distinct peptides with overlapping sequences (1, 16). Because SPS aggregates multiple spectra from overlapping peptides, protein sequences extending beyond the length of enzymatically digested peptides can be extracted from spectra with incomplete peptide fragmentation. Furthermore, SPS has been found to generate sequences that frequently cover 90–95+% of the target protein sequence(s) while mis-predicting only 1 out of every 20 amino acids on high resolution MS/MS spectra (2). But a remaining limitation of SPS is that it still generates fragmented sequences that do not singularly cover large regions of the target protein sequences, much less complete proteins: SPS sequences have an average length of 10–15 amino acids (depending on input data) and the longest recovered SPS de novo sequence is less than 45 amino acids long (1).
The considerable limitations of de novo sequencing strategies have typically been addressed by attempting to circumvent them using error-tolerant matching to known protein sequences. 
One such strategy (17) is to generate short de novo sequence tags and then match them exactly to protein databases without requiring matching the N/C-term flanking masses (to allow for unexpected polymorphisms or post-translational modifications). Short sequence tags are usually derived from parts of the spectrum with high signal-to-noise ratios and typically have higher sequencing accuracy than full-length de novo sequences (18). This approach was later extended in MS-Shotgun (19) and continues to be a popular technique for speeding up database search tools (5, 20–22). Homology matching of full length de novo sequences was first explored in CIDentify (23) and later in MS-BLAST (24) by searching de novo sequences using FASTA and WU-BLAST2 (respectively) to find homologous matches to sequences of related proteins; FASTS (25) also approached the problem using a modified version of FASTA. However, common de novo sequencing errors tend to produce sequences that are heavily penalized in pure sequence homology searches. For example, missing peaks in MS/MS spectra may easily cause GA subsequences to be reconstructed as Q or AG (same-mass sequences), thus making subsequent BLAST searches unlikely to succeed. This issue was partially considered in CIDentify and more thoroughly addressed in SPIDER (26) by explicitly modeling de novo sequencing errors together with BLOSUM scores in MS/MS-based sequence homology searches. In addition, OpenSea (27) further explored database matching of de novo sequences for analysis of unexpected post-translational modifications (PTMs). Finally, Shen et al. (28) used short unique de novo sequence tags, called UStags, to discover protein-localized PTMs.
Recent approaches to homology matching of de novo sequences have built on genome assembly and sequencing techniques to achieve database-assisted full-length sequencing of unknown proteins. 
Comparative Shotgun Protein Sequencing (cSPS) complemented SPS assembly techniques with usage of error tolerant matching of de novo sequences to find overlapping SPS de novo sequences that are then further assembled into full-length protein sequences (2). cSPS was designed to support the sequencing of highly divergent proteins that have regions close enough in homology to transfer matches from a reference. cSPS was shown to enable de novo sequencing of monoclonal antibodies at 95+% sequencing accuracy, while simultaneously tolerating and identifying unexpected PTMs (29). In difference from cSPS, Champs (30) de novo sequences individual spectra to obtain putative peptide sequences, which are then mapped to homologous proteins to correct sequencing errors and reconstruct protein sequences with 100% accuracy and 99% coverage. However, Champs is designed to only map peptides that differ from the reference sequence by one or two amino acids and does not handle PTMs. As such, its sequencing accuracy is not directly comparable to that of cSPS as Champs was not designed to sequence highly divergent proteins (such as monoclonal antibodies) with multiple PTMs, insertions, deletions, and/or recombinations. GenoMS (31) extended the approaches in cSPS/Champs by explicitly modeling protein splice variants as paths in splice graphs where nodes represent translated exon regions (32). MS/MS spectra are first searched for exact sequence matches against all possible protein isoforms. The remaining unidentified MS/MS spectra are then aligned to the matched peptides and de novo sequenced to extend the matched sequences into novel regions. Reported sequences are 97–99% accurate and cover 96–99% of target proteins depending on sequence similarity between the novel and reference sequences (31). 
However, GenoMS de novo sequences are usually extended less than 3 amino acids beyond matched peptides because sequencing accuracy degrades as sequences are extended, thus preventing the consistent extension of long (10+ AA) sequences. Altogether, the use of homology matching approaches for full-length de novo protein sequencing continues to be limited by 1) requiring the previous knowledge of closely related protein sequences and 2) the inherent difficulties in statistically significant homology-tolerant matching of error-prone short de novo sequences.
The Meta-SPS approach proposed here seeks to de novo sequence complete proteins, or long protein regions, without any use of a database. Meta-SPS builds upon SPS by treating SPS de novo sequences (contig sequences) as input spectra and further assembling them into longer de novo sequences (meta-contig sequences). We show that Meta-SPS extends de novo sequences to lengths over 100 AA while boosting sequencing accuracy to only 1 mistake per 40 amino acid predictions, thus enabling database-free de novo sequencing of completely novel proteins while also allowing error-tolerant matching approaches to support higher-divergence homologies (by searching longer, more accurate de novo sequences). Meta-SPS algorithms are demonstrated on CID and HCD MS/MS spectra and its limitations are discussed in relation to the underlying limitations of bottom-up tandem mass spectrometry.
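The same-mass confusion described in this entry (GA reconstructed as Q or AG) can be checked with a few lines of arithmetic. The sketch below is only an illustration using standard monoisotopic residue masses; the helper name `same_mass` and the 0.01 Da tolerance are assumptions for the example and are not taken from any of the tools cited.

```python
# Monoisotopic residue masses (Da) for the three residues involved.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "Q": 128.05858}


def total_mass(seq):
    """Sum of residue masses for a short amino acid sequence."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


def same_mass(seq_a, seq_b, tol=0.01):
    """True if two sequences are indistinguishable by total mass within tol (Da)."""
    return abs(total_mass(seq_a) - total_mass(seq_b)) <= tol
```

Here `same_mass("GA", "Q")` and `same_mass("GA", "AG")` both hold, because G + A sums to about 128.0586 Da, essentially the mass of Q. When the peak that separates G from A is missing, a de novo sequencer sees only the combined mass gap and cannot tell these alternatives apart.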

10.
11.
Gene, 1996, 173(2): 241-246
The glucose-6-phosphate dehydrogenase-encoding gene (G6PD) belongs to a group with constitutive expression in all tissues. The regulation of these housekeeping genes is poorly understood, as compared to what is known about many genes whose expression is restricted to a particular tissue or stage of development, and which are often regulated by locus control regions (LCR) able to act over wide distances. In order to identify sequences in human G6PD which are necessary for its expression, we have generated transgenic mice carrying a 20-kb G6PD construct, including only 2.5 kb of upstream and 2.0 kb of downstream flanking sequence. All mice which carried the transgene (TG) expressed it, and the levels of expression detected in a range of tissues from three independent lines of mice were comparable to that of the endogenous murine G6PD. The variation in enzyme activity from tissue to tissue was remarkably similar for both the TG and the endogenous gene, and was shown to be due in both cases to variations in the steady-state mRNA levels.

12.
Mass spectrometry (MS) was used in this study for the large-scale proteomic identification and verification of protein-encoding genes present in the silkworm (Bombyx mori) genome. Peptide sequences identified by MS were compared with those from an open reading frame (ORF) library of the B. mori genome and a cDNA library, to validate the coding attributes of ORFs. Two databases were created. The first was based on a 9× draft sequence of the silkworm genome and contained 14,632 putative proteins. The second was based on a B. mori pupal cDNA library containing 3,187 putative proteins of at least 30 amino acid residues in length. A total of 81,000 peptide sequences with a threshold score of 60% were generated by the MS/MS analysis, and 55,400 of these were chosen for a sequence alignment. By searching these two databases, 6,649 and 250 proteins were matched, which accounted for approximately 45.4% and 7.8% of the peptide sequences and putative proteins, respectively. Further analyses carried out by several bioinformatic tools suggested that the matches included proteins with predicted transmembrane domains (1,393) and preproteins with a signal peptide (976). These results provide a fundamental understanding of the expression and function of silkworm proteins.
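The two percentages quoted in this entry can be reproduced from the counts it gives. The check below is an inference, not something the abstract states outright: it assumes each percentage is the number of matched proteins divided by the number of putative proteins in the corresponding database. The helper name `percent` is hypothetical.

```python
def percent(matched, total):
    """Matched items as a percentage of the total, rounded to one decimal."""
    return round(100.0 * matched / total, 1)


# Counts taken from the abstract above.
genome_pct = percent(6649, 14632)  # matches against the genome-based database
cdna_pct = percent(250, 3187)      # matches against the pupal cDNA database
```

Under that assumption the results are 45.4 and 7.8, which agrees with the reported figures and suggests both percentages refer to the share of putative proteins in each database.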

13.
14.
15.
Myelination plays an important role in cognitive development and in demyelinating diseases like multiple sclerosis (MS), where failure of remyelination promotes permanent neuro-axonal damage. Modification of cell surface receptors with branched N-glycans coordinates cell growth and differentiation by controlling glycoprotein clustering, signaling, and endocytosis. GlcNAc is a rate-limiting metabolite for N-glycan branching. Here we report that GlcNAc and N-glycan branching trigger oligodendrogenesis from precursor cells by inhibiting platelet-derived growth factor receptor-α cell endocytosis. Supplying oral GlcNAc to lactating mice drives primary myelination in newborn pups via secretion in breast milk, whereas genetically blocking N-glycan branching markedly inhibits primary myelination. In adult mice with toxin (cuprizone)-induced demyelination, oral GlcNAc prevents neuro-axonal damage by driving myelin repair. In MS patients, endogenous serum GlcNAc levels inversely correlated with imaging measures of demyelination and microstructural damage. Our data identify N-glycan branching and GlcNAc as critical regulators of primary myelination and myelin repair and suggest that oral GlcNAc may be neuroprotective in demyelinating diseases like MS.

16.
Within the haploid genome there are approximately 1,000 copies of the human endogenous retrovirus-like sequence, HERV-H. Although these sequences are scattered throughout the entire genome, in situ hybridization experiments revealed that there are discrete clusters positioned on chromosomes 1p and 7q. In this study, we have located three HERV-H sequences which were unexpectedly clustered within a 300-kilobase region close to the GRPR locus on the X chromosome. In previous studies, no clustering of this sequence has been reported at this locus. Our finding demonstrates that, like other repetitive sequences, clustering of HERV-H occurs in the human genome, although these sequences may not always be detected by in situ hybridization methods.

17.
The HERV‐W family of human endogenous retroviruses represents a group of numerous sequences that show close similarity in genetic composition. It has been documented that some HERV‐W–derived expression products are thought to play a significant role in human pathology, such as multiple sclerosis or schizophrenia. Other members of the family are necessary to orchestrate physiological processes (eg, ERVWE1 coding syncytin‐1, which is engaged in syncytiotrophoblast formation). Therefore, an assay that would allow the recognition of particular HERV‐W members is highly desirable. A peptide nucleic acid (PNA)–mediated technique for the discrimination between multiple sclerosis‐associated retrovirus (MSRV) and ERVWE1 sequences has been developed. The assay uses a PNA probe that, being fully complementary to the ERVWE1 but not to the MSRV template, shows high selective potential. Single‐stranded DNA binding protein facilitates the PNA‐mediated, sequence‐specific formation of a strand invasion complex and, consequently, local DNA unwinding. The target DNA may then be excluded from further analysis in any downstream process such as single‐stranded DNA‐specific exonuclease action. Finally, the reaction conditions have been optimized, and several PNA probes that are targeted toward distinct loci along whole HERV‐W env sequences have been evaluated. We believe that the PNA/single‐stranded DNA binding protein–based application has the potential to selectively discriminate particular HERV‐W molecules, as they are at least suspected to play a pathogenic role in a broad range of medical conditions, from psycho‐neurologic disorders (multiple sclerosis and schizophrenia) and cancers (breast cancer) to those of an auto‐immunologic background (psoriasis and lupus erythematosus).

18.
The reported draft human genome sequence includes many contigs that are separated by gaps of unknown sequence. These gaps may be due to chromosomal regions that are not present in the Escherichia coli libraries used for DNA sequencing because they cannot be cloned efficiently, if at all, in bacteria. Using a yeast artificial chromosome (YAC)/bacterial artificial chromosome (BAC) library generated in yeast, we found that approximately 6% of human DNA sequences tested transformed E. coli cells less efficiently than yeast cells, and were less stable in E. coli than in yeast. When the ends of several YAC/BAC isolates cloned in yeast were sequenced and compared with the reported draft sequence, major inconsistencies were found with the sequences of those YAC/BAC isolates that transformed E. coli cells inefficiently. Two human genomic fragments were re-isolated from human DNA by transformation-associated recombination (TAR) cloning. Re-sequencing of these regions showed that the errors in the draft are the results of both misassembly and loss of specific DNA sequences during cloning in E. coli. These results show that TAR cloning might be a valuable method that could be widely used during the final stages of the Human Genome Project.

19.
Mass spectrometry in conjunction with de novo sequencing was used to determine the amino acid sequence of a 35 kDa lectin protein isolated from the serum of the American alligator that exhibits binding to mannose. The protein N-terminal sequence was determined using Edman degradation, and enzymatic digestion with different proteases was used to generate peptide fragments for analysis by liquid chromatography tandem mass spectrometry (LC MS/MS). Separate analysis of the protein digests with multiple enzymes enhanced the protein sequence coverage. De novo sequencing was accomplished using MASCOT Distiller and PEAKS software and the sequences were searched against the NCBI database using MASCOT and BLAST to identify homologous peptides. MS analysis of the intact protein indicated that it is present primarily as monomer and dimer in vitro. The isolated 35 kDa protein was ~98% sequenced and found to have 313 amino acids and nine cysteine residues and was identified as an alligator lectin. The alligator lectin sequence was aligned with other lectin sequences using DIALIGN and ClustalW software and was found to exhibit 58% and 59% similarity to human and mouse intelectin-1, respectively. The alligator lectin exhibited strong binding affinities toward mannan and mannose as compared to other tested carbohydrates.

20.