Similar documents
1.

Background

Existing clustering approaches for microarray data do not adequately differentiate between subsets of co-expressed genes. We devised a novel approach that integrates expression and sequence data in order to generate functionally coherent and biologically meaningful subclusters of genes. Specifically, the approach clusters co-expressed genes on the basis of similar content and distributions of predicted statistically significant sequence motifs in their upstream regions.
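
As an illustration only, the grouping of co-expressed genes by shared upstream motifs can be sketched as follows. The sketch is an assumption: it scores motif *content* alone with a Jaccard similarity and links genes above a hypothetical threshold (single linkage). It ignores the motif *distribution* information the authors also use, and the function names and motif labels are hypothetical.

```python
from itertools import combinations

# Illustrative sketch; names, motif labels and threshold are hypothetical.

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two motif sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def motif_subclusters(gene_motifs: dict[str, set[str]], threshold: float) -> list[set[str]]:
    """Group genes whose upstream motif sets overlap by at least `threshold`,
    using single linkage implemented with a union-find structure."""
    parent = {g: g for g in gene_motifs}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for g1, g2 in combinations(gene_motifs, 2):
        if jaccard(gene_motifs[g1], gene_motifs[g2]) >= threshold:
            parent[find(g1)] = find(g2)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for g in gene_motifs:
        groups.setdefault(find(g), set()).add(g)
    return list(groups.values())
```

With genes g1 {TATA, GATA}, g2 {TATA, GATA, CAAT}, g3 {EBOX} and g4 {EBOX, CAAT} and a threshold of 0.5, the sketch yields the subclusters {g1, g2} and {g3, g4}.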

Results

We applied our method to several sets of co-expressed genes and were able to define subsets with enrichment in particular biological processes and specific upstream regulatory motifs.

Conclusions

These results show the potential of our technique for functional prediction and regulatory motif identification from microarray data.

2.

Background

The human genome contains millions of single nucleotide polymorphisms (SNPs); many of these SNPs are intronic and have unknown functional significance. SNPs occurring within intron branchpoint sites, especially at the adenine (A), would presumably affect splicing; however, this has not been systematically studied. We employed a splicing prediction tool to identify human intron branchpoint sites and screened dbSNP to identify SNPs located in the predicted sites, generating a genome-wide branchpoint site SNP database.

Results

We identified 600 SNPs located within branchpoint sites, of which 216 showed a change in the A. After scoring the SNPs by counting the As in the ±10 nucleotide region, only four SNPs were identified without additional As (rs13296170, rs12769205, rs75434223, and rs67785924). Using minigene constructs, we examined the effects of these SNPs on splicing. The three SNPs with a nucleotide substitution at the A position (rs13296170, rs12769205, and rs75434223) resulted in abnormal splicing (exon skipping and/or intron inclusion). However, rs67785924, a 5-bp deletion that abolished the branchpoint A nucleotide, exhibited a normal RNA splicing pattern, presumably using two of the downstream As as alternative branchpoints. The influence of additional As on splicing was further confirmed by studying rs2733532, which contains three additional As in the ±10 nucleotide region.
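
The A-counting score described above can be sketched as below. This is a minimal sketch under stated assumptions: the branchpoint is given as a 0-based index into a genomic sequence string, the window is truncated at sequence ends, and the branchpoint A itself is excluded so that only *additional* As are counted. The function name is hypothetical.

```python
# Minimal sketch; the function name and coordinate convention are assumptions.

def count_flanking_adenines(seq: str, bp_index: int, flank: int = 10) -> int:
    """Count adenines within +/- `flank` nt of the branchpoint at `bp_index`,
    excluding the branchpoint nucleotide itself."""
    start = max(0, bp_index - flank)            # truncate at the 5' end
    end = min(len(seq), bp_index + flank + 1)   # truncate at the 3' end
    total = seq[start:end].upper().count("A")
    if seq[bp_index].upper() == "A":
        total -= 1  # the branchpoint A is not an "additional" A
    return total
```

A SNP whose window scores 0 corresponds to the situation the abstract highlights, where no nearby A is available to act as an alternative branchpoint.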

Conclusions

We generated a high-confidence genome-wide branchpoint site SNP database, experimentally verified the importance of the branchpoint A, and suggested that other nearby As can protect against abnormal splicing when the branchpoint A is substituted.

3.

Background

Motif analysis methods have long been central to studying the biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools are limited in their ability to search large motif spaces, so more complex motifs may not be included. There is thus a need for motif analysis methods tailored to analyzing specific complex motifs motivated by biological questions and hypotheses, rather than acting as screening-based motif finding tools.

Methods

We present Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in ranked lists of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank-based statistics. A modular setup and fast analytic p-value evaluation make Regmex applicable to diverse and potentially large-scale motif analysis problems.
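
The random-walk idea can be illustrated with a simplified sketch. This is not Regmex's implementation: Regmex computes exact per-sequence p-values with embedded Markov models, whereas the sketch below reduces each sequence to motif presence or absence, computes a centred cumulative sum along the ranked list (which returns to zero at the end, like a Brownian bridge), and assesses its maximum deviation with a permutation test. Function names are hypothetical.

```python
import random
import re

# Simplified sketch, not the Regmex package; function names are hypothetical.

def running_sum_statistic(hits: list[int]) -> float:
    """Max absolute deviation of the centred cumulative sum of motif hits
    along the ranked list (a Brownian-bridge-like statistic)."""
    p = sum(hits) / len(hits)
    s, best = 0.0, 0.0
    for h in hits:
        s += h - p
        best = max(best, abs(s))
    return best

def motif_rank_bias(ranked_seqs: list[str], pattern: str,
                    n_perm: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Return (statistic, permutation p-value) for a bias of regex-motif
    occurrence towards either end of the ranked list."""
    rx = re.compile(pattern)
    hits = [1 if rx.search(s) else 0 for s in ranked_seqs]
    observed = running_sum_statistic(hits)
    rng = random.Random(seed)
    shuffled = hits[:]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if running_sum_statistic(shuffled) >= observed - 1e-12:
            exceed += 1
    # add-one correction keeps the p-value strictly positive
    return observed, (exceed + 1) / (n_perm + 1)
```

Usage: `motif_rank_bias(seqs_ranked_by_fold_change, "T{5,}")` would test whether a hypothetical T-rich (U-rich in RNA) motif is concentrated at the top or bottom of the list.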

Results

We demonstrate use cases of combined motifs on simulated data and on expression data from microRNA transfection experiments. We confirm previously obtained results and demonstrate the usability of Regmex to test a specific hypothesis about the relative location of microRNA seed sites and U-rich motifs. We further compare the tool with an existing motif discovery tool and show increased sensitivity.

Conclusions

Regmex is a useful and flexible tool to analyze motif hypotheses that relate to large data sets in functional genomics. The method is available as an R package (https://github.com/muhligs/regmex).

4.

Background

STAT1 and IRF1 collaborate to induce interferon-γ (IFNγ) stimulated genes (ISGs), but the extent to which they act alone or together is unclear. The effect of single nucleotide polymorphisms (SNPs) on in vivo binding is also largely unknown.

Results

We show that IRF1 binds at proximal or distant ISG sites twice as often as STAT1, increasing to sixfold at the MHC class I locus. STAT1 almost always bound with IRF1, while most IRF1 binding events were isolated. Dual binding sites at remote or proximal enhancers distinguished ISGs that were responsive to IFNγ versus cell-specific resistant ISGs, which showed fewer and mainly single binding events. Surprisingly, inducibility in one cell type predicted ISG-responsiveness in other cells. Several dbSNPs overlapped with STAT1 and IRF1 binding motifs, and we developed methodology to rapidly assess their effects. We show that in silico prediction of SNP effects accurately reflects altered binding both in vitro and in vivo.

Conclusions

These data reveal broad cooperation between STAT1 and IRF1, explain cell type specific differences in ISG-responsiveness, and identify genetic variants that may participate in the pathogenesis of immune disorders.

5.

Background

While continental-level ancestry is relatively simple to infer from genomic information, distinguishing between individuals from closely related sub-populations (e.g., from the same continent) remains a difficult challenge.

Methods

We study the problem of predicting human biogeographical ancestry from genomic data under resource constraints. In particular, we focus on the case where the analysis is constrained to using single nucleotide polymorphisms (SNPs) from just one chromosome. We propose methods to construct such ancestry informative SNP panels using correlation-based and outlier-based methods.
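
One simple reading of a correlation-based panel is sketched below: SNPs are ranked by the absolute Pearson correlation between allele dosage (0/1/2) and a binary label for two subpopulations, and the top k are kept. This is an assumption for illustration; the abstract does not specify the exact correlation or outlier criteria. SNP identifiers and function names are hypothetical.

```python
from math import sqrt

# Illustrative sketch; SNP IDs, labels and function names are hypothetical.

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def select_panel(genotypes: dict[str, list[int]], labels: list[int], k: int) -> list[str]:
    """Rank SNPs by |r| between allele dosage (0/1/2) and a binary
    population label (0/1), and keep the top k."""
    ranked = sorted(genotypes,
                    key=lambda snp: abs(pearson(genotypes[snp], labels)),
                    reverse=True)
    return ranked[:k]
```

Taking the absolute value keeps SNPs whose minor allele is enriched in either population.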

Results

We assessed the performance of the proposed SNP panels derived from just one chromosome using data from the 1000 Genomes Project, Phase 3. For continental-level ancestry classification, we achieved an overall classification rate of 96.75% using 206 SNPs. For subpopulation-level ancestry prediction, we achieved the following average pairwise binary classification rates: Europe, 76.6% (58 SNPs); Africa, 87.02% (87 SNPs); East Asia, 73.30% (68 SNPs); South Asia, 81.14% (75 SNPs); America, 85.85% (68 SNPs).

Conclusion

Our results demonstrate that one single chromosome (in particular, Chromosome 1), if carefully analyzed, could hold enough information for accurate prediction of human biogeographical ancestry. This has significant implications in terms of the computational resources required for analysis of ancestry, and in the applications of such analyses, such as in studies of genetic diseases, forensics, and soft biometrics.

6.

Background

GAW20 working group 5 brought together researchers who contributed 7 papers with the aim of evaluating methods to detect genetic by epigenetic interactions. GAW20 distributed real data from the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study, including single-nucleotide polymorphism (SNP) markers, methylation (cytosine-phosphate-guanine [CpG]) markers, and phenotype information on up to 995 individuals. In addition, a simulated data set based on the real data was provided.

Results

The 7 contributed papers analyzed these data sets with a number of different statistical methods, including generalized linear mixed models, mediation analysis, machine learning, W-test, and sparsity-inducing regularized regression. These methods generally appeared to perform well. Several papers confirmed a number of causative SNPs in either the large number of simulation sets or the real data on chromosome 11. Findings were also reported for different SNPs, CpG sites, and SNP–CpG site interaction pairs.

Conclusions

In the simulation (200 replications), power appeared generally good for large interaction effects, but detecting smaller effects will require larger studies or consortium collaboration to achieve sufficient power.

7.

Background

False occurrences of functional motifs in protein sequences can be considered as random events due solely to the sequence composition of a proteome. Here we use a numerical approach to investigate the random appearance of functional motifs with the aim of addressing biological questions such as: How are organisms protected from undesirable occurrences of motifs otherwise selected for their functionality? Has the random appearance of functional motifs in protein sequences been affected during evolution?

Results

Here we analyse the occurrence of functional motifs in random sequences and compare it to that observed in biological proteomes; the behaviour of random motifs is also studied. Most motifs exhibit a number of false positives significantly similar to the number of times they appear in randomized proteomes (=expected number of false positives). Interestingly, about 3% of the analysed motifs show a different kind of behaviour and appear in biological proteomes less than they do in random sequences. In some of these cases, a mechanism of evolutionary negative selection is apparent; this helps to prevent unwanted functionalities which could interfere with cellular mechanisms.
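
The "expected number of false positives" can be estimated as sketched below: each sequence is shuffled to preserve its amino-acid composition while destroying residue order, and motif matches are counted over many rounds. The per-sequence shuffle is an assumption, since the abstract does not state how the proteomes were randomized. The toy pattern `N[^P][ST]` and the function names are hypothetical.

```python
import random
import re

# Illustrative sketch; the shuffling scheme, pattern and names are assumptions.

def count_matches(seqs: list[str], pattern: str) -> int:
    """Count non-overlapping regex matches of a motif across sequences."""
    rx = re.compile(pattern)
    return sum(len(rx.findall(s)) for s in seqs)

def expected_false_positives(seqs: list[str], pattern: str,
                             n_rounds: int = 100, seed: int = 0) -> float:
    """Mean motif count over composition-preserving shuffles of each sequence."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rounds):
        shuffled = []
        for s in seqs:
            residues = list(s)
            rng.shuffle(residues)  # keeps composition, destroys order
            shuffled.append("".join(residues))
        total += count_matches(shuffled, pattern)
    return total / n_rounds
```

Comparing `count_matches(proteome, pattern)` with `expected_false_positives(proteome, pattern)` flags motifs that occur less often than chance, the roughly 3% of motifs the authors associate with negative selection.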

Conclusion

Our thorough statistical and biological analysis showed that several mechanisms and evolutionary constraints affect the appearance of functional motifs in protein sequences.

8.

Background

An accumulation of evidence has revealed the important role of epigenetic factors in explaining the etiopathogenesis of human diseases. Several empirical studies have successfully incorporated methylation data into models for disease prediction. However, it is still a challenge to integrate different types of omics data into prediction models, and the contribution of methylation information to prediction remains to be fully clarified.

Results

A stratified drug-response prediction model was built based on an artificial neural network to predict the change in the circulating triglyceride level after fenofibrate intervention. Associated single-nucleotide polymorphisms (SNPs), methylation of selected cytosine-phosphate-guanine (CpG) sites, age, sex, and smoking status were included as predictors. The model with selected SNPs achieved a mean 5-fold cross-validation prediction error rate of 43.65%. After methylation information was added to the model, the error rate dropped to 41.92%. The combination of significant SNPs, CpG sites, age, sex, and smoking status achieved the lowest prediction error rate of 41.54%.
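
The mean 5-fold cross-validation error rate reported above can be computed with a generic loop like the one below. This sketch assumes the response has been dichotomized into classes (as an "error rate" implies) and uses round-robin fold assignment. The neural network is not reproduced: `majority_fit` is a hypothetical baseline classifier standing in for any model that is trained on one part of the data and returns a predictor.

```python
from typing import Callable, Sequence

# Illustrative sketch; fold scheme and names are hypothetical.

def kfold_error_rate(X: Sequence, y: Sequence[int],
                     fit: Callable, k: int = 5) -> float:
    """Mean misclassification rate over k folds (round-robin assignment;
    assumes len(y) >= k so that no test fold is empty)."""
    n = len(y)
    errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        wrong = sum(predict(X[i]) != y[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / k

def majority_fit(X_train: list, y_train: list[int]) -> Callable:
    """Baseline 'model' that always predicts the most common training label."""
    label = max(set(y_train), key=y_train.count)
    return lambda x: label
```

Comparing the error from a SNP-only feature set with that from SNPs plus CpG features, under the same folds, mirrors the 43.65% versus 41.92% comparison in the abstract.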

Conclusions

Compared with using SNP data only, adding methylation data to the prediction models slightly improved the error rate; a further reduction in prediction error was achieved by combining genomic, methylation, and environmental factors.

9.

Background

Systematic evaluation and study of single nucleotide polymorphisms (SNPs), made possible by high-throughput genotyping technologies and bioinformatics, promises to provide breakthroughs in the understanding of complex diseases. Understanding how the millions of SNPs in the human genome confer susceptibility or resistance to disease, or render a drug efficacious or toxic in an individual, is a major goal of the relatively new field of pharmacogenomics. Esophageal squamous cell carcinoma is a high-mortality cancer with complex etiology and progression involving both genetic and environmental factors. We examined the association between esophageal cancer risk and patterns of 61 SNPs in a case-control study of a population from Shanxi Province in North Central China, which has among the highest rates of esophageal squamous cell carcinoma in the world.

Methods

High-throughput MassCode mass spectrometry genotyping was performed on genomic DNA from 574 individuals (394 cases and 180 age-frequency matched controls). SNPs were chosen from genes encoding DNA repair enzymes and Phase I and Phase II enzymes. We developed a novel adaptation of the Decision Forest pattern recognition method, named Decision Forest for SNPs (DF-SNPs), designed to analyze SNP data.

Results

The classifier developed with DF-SNPs to separate cases from controls gave a concordance, sensitivity, and specificity of 94.7%, 99.0%, and 85.1%, respectively, suggesting its usefulness for hypothesizing which SNPs or combinations of SNPs could be involved in susceptibility to esophageal cancer. Importantly, the DF-SNPs algorithm incorporated a randomization test for assessing the relevance (or importance) of individual SNPs, SNP types (homozygous common, heterozygous, and homozygous variant), and patterns of SNP types (SNP patterns) that differentiate cases from controls. For example, we found that the different genotypes of SNP GADD45B E1122 are all associated with cancer risk.
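
For reference, the three reported measures follow from a case-control confusion matrix as sketched below, assuming "concordance" means overall accuracy (cases called cases plus controls called controls, over all individuals). The counts in the usage note are hypothetical, not the study's data.

```python
# Illustrative sketch; the interpretation of "concordance" is an assumption.

def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Concordance (overall accuracy), sensitivity and specificity."""
    return {
        "concordance": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # fraction of cases predicted as cases
        "specificity": tn / (tn + fp),  # fraction of controls predicted as controls
    }
```

For hypothetical counts of 90 true positives, 10 false negatives, 80 true negatives and 20 false positives, this gives a concordance of 0.85, a sensitivity of 0.9 and a specificity of 0.8. With unbalanced groups such as 394 cases and 180 controls, concordance is weighted towards the sensitivity.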

Conclusion

The DF-SNPs method can be used to differentiate esophageal squamous cell carcinoma cases from controls based on individual SNPs, SNP types and SNP patterns. The method could be useful to identify potential biomarkers from the SNP data and complement existing methods for genotype analyses.

10.

Background

Identification of genes underlying production traits is a key aim of the mink research community. The recent availability of genomic tools has opened the possibility of faster genetic progress in mink breeding, and the availability of a mink genome assembly allows genome-wide association studies in mink.

Results

In this study, we used genotyping-by-sequencing to obtain single nucleotide polymorphism (SNP) genotypes of 2496 mink. After multiple rounds of filtering, we retained 28,336 high-quality SNPs and 2352 individuals for a genome-wide association study (GWAS). We performed the first GWAS in mink for body weight and behavior, along with 10 traits related to fur quality.

Conclusions

Combining association results with existing functional information of genes and mammalian phenotype databases, we proposed WWC3, MAP2K4, SLC7A1 and USP22 as candidate genes for body weight and pelt length in mink.

12.

Background

Using the dataset provided for Genetic Analysis Workshop 14 by the Collaborative Study on the Genetics of Alcoholism, we performed genome-wide linkage analysis of age at onset of alcoholism to compare the utility of microsatellites and single-nucleotide polymorphisms (SNPs) in genetic linkage study.

Methods

A multipoint nonparametric variance component linkage analysis method was applied to the survival distribution function obtained from a semiparametric proportional hazards model of the age-at-onset phenotype of alcoholism. Three separate linkage analyses were carried out using 315 microsatellites, 2,467 SNPs, and 9,467 SNPs, respectively, spanning the 22 autosomal chromosomes.

Results

Heritability of age at onset was estimated to be approximately 12% (p < 0.001). We observed weak correlation, both in trend and strength, of genome-wide linkage signals between microsatellites and SNPs. Results from SNPs revealed more and stronger linkage signals across the genome compared with those from microsatellites. The only suggestive evidence of linkage from microsatellites was on chromosome 1 (LOD of 1.43). Differences in map densities between the two sets of SNPs used in this study did not appear to confer an advantage in terms of strength of linkage signals.

Conclusion

Our study supports the better performance of dense SNP maps compared with the sparse microsatellite maps currently available for linkage analysis of quantitative traits. This better performance could be attributable to the precise definition and high map resolution achievable with dense SNP maps, which result in increased power to detect loci affecting a given trait or disease.

13.

Background

Metabolic syndrome is a risk factor for type 2 diabetes and cardiovascular disease. We identified common genetic variants that alter the risk for metabolic syndrome in the Korean population. To isolate these variants, we conducted a multiple-genotype and multiple-phenotype genome-wide association analysis using the family-based quasi-likelihood score (MFQLS) test. For this analysis, we used 7211 and 2838 genotyped study subjects for discovery and replication, respectively. We also performed a multiple-genotype and multiple-phenotype analysis of a gene-based single-nucleotide polymorphism (SNP) set.

Results

We found an association between metabolic syndrome and an intronic SNP pair, rs7107152 and rs1242229, in the SIDT2 gene at 11q23.3. Both SNPs correlate with the expression of SIDT2 and TAGLN, whose products promote insulin secretion and lipid metabolism, respectively. This SNP pair showed statistical significance at the replication stage.

Conclusions

Our findings provide insight into an underlying mechanism that contributes to metabolic syndrome.

14.

Background

Genome wide association studies have identified microtubule associated protein tau (MAPT) H1 haplotype single nucleotide polymorphisms (SNPs) as leading common risk variants for Parkinson’s disease, progressive supranuclear palsy and corticobasal degeneration. The MAPT risk variants fall within a large 1.8 Mb region of high linkage disequilibrium, making it difficult to discern the functionally important risk variants. Here, we leverage the strong haplotype-specific expression of MAPT exon 3 to investigate the functionality of SNPs that fall within this H1 haplotype region of linkage disequilibrium.

Methods

In this study, we dissect the molecular mechanisms by which haplotype-specific SNPs confer allele-specific effects on the alternative splicing of MAPT exon 3. First, we use haplotype-hybrid whole-locus genomic MAPT vectors to identify functional SNPs. Next, we characterise the RNA-protein interactions at two loci by mass spectrometry. Lastly, we knock down candidate splice factors to determine their effect on MAPT exon 3 using a novel allele-specific qPCR assay.

Results

Using whole-locus genomic DNA expression vectors to express MAPT haplotype variants, we demonstrate that rs17651213 regulates exon 3 inclusion in a haplotype-specific manner. We further investigated the functionality of this region using RNA electrophoretic mobility shift assays, which showed differential RNA-protein complex formation at the H1 and H2 sequence variants of SNPs rs17651213 and rs1800547. We subsequently identified candidate trans-acting splicing factors interacting with these functional SNP sequences by RNA-protein pull-down experiments and mass spectrometry. Finally, gene knockdown of candidate splice factors identified by mass spectrometry demonstrated a role for hnRNP F and hnRNP Q in the haplotype-specific regulation of exon 3 inclusion.

Conclusions

We identified common splice factors hnRNP F and hnRNP Q regulating the haplotype-specific splicing of MAPT exon 3 through intronic variants rs1800547 and rs17651213. This work demonstrates an integrated approach to characterise the functionality of risk variants in large regions of linkage disequilibrium.

16.

Background

In recent years, both single-nucleotide polymorphism (SNP) arrays and functional magnetic resonance imaging (fMRI) have been widely used to study schizophrenia (SCZ). In addition, a few studies have integrated SNP and fMRI data for comprehensive analysis.

Methods

In this study, a novel sparse representation based variable selection (SRVS) method was proposed and tested on a simulated data set to demonstrate its multi-resolution properties. The SRVS method was then applied to an integrative analysis of two different SCZ data sets, a single-nucleotide polymorphism (SNP) data set and a functional magnetic resonance imaging (fMRI) data set, including 92 cases and 116 controls. Biomarkers for the disease were identified and validated with a multivariate classification approach followed by leave-one-out (LOO) cross-validation. We then compared the results with those of a previously reported sparse representation based feature selection method.

Results

Results showed that biomarkers from our proposed SRVS method gave significantly higher classification accuracy in discriminating SCZ patients from healthy controls than those from the previously reported sparse representation method. Furthermore, using biomarkers from both data sets led to better classification accuracy than using a single type of biomarker, which suggests the advantage of integrative analysis of different types of data.

Conclusions

The proposed SRVS algorithm is effective in identifying significant biomarkers for complex diseases such as SCZ. Integrating different types of data (e.g. SNP and fMRI data) may identify complementary biomarkers that improve diagnostic accuracy.

17.

Background

Hot spot residues are functional sites in protein interaction interfaces. Identifying hot spot residues with experimental methods is time-consuming and laborious. To address this issue, many computational methods have been developed to predict hot spot residues; most are based on structural features, sequence characteristics, and/or other protein features.

Results

This paper proposed an ensemble learning method to predict hot spot residues that uses only sequence features and the relative accessible surface area of amino acid sequences. In this work, a novel feature selection technique was developed; an auto-correlation function combined with a sliding window technique was applied to obtain the characteristics of amino acid residues in protein sequences; and an ensemble classifier with SVM and KNN base classifiers was built to achieve the best classification performance.
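
An auto-correlation feature over a sliding window might look like the sketch below. The paper's exact auto-correlation definition and residue property scales are not given here. This sketch assumes a Moreau-Broto-style form (the mean of P_i · P_{i+lag}) computed on a window of ±`half_width` residues around the target residue, with a caller-supplied, hypothetical property scale.

```python
# Illustrative sketch; the autocorrelation form, window size and scale are assumptions.

def autocorrelation(values: list[float], lag: int) -> float:
    """Moreau-Broto-style autocorrelation: mean of P_i * P_{i+lag}."""
    pairs = len(values) - lag
    if pairs <= 0:
        return 0.0
    return sum(values[i] * values[i + lag] for i in range(pairs)) / pairs

def window_features(seq: str, pos: int, scale: dict[str, float],
                    half_width: int = 3, max_lag: int = 2) -> list[float]:
    """Autocorrelation features (lags 1..max_lag) for the residue at `pos`,
    computed on a window of +/- half_width residues truncated at the ends."""
    start = max(0, pos - half_width)
    window = seq[start:pos + half_width + 1]
    values = [scale.get(aa, 0.0) for aa in window]  # unknown residues -> 0.0
    return [autocorrelation(values, lag) for lag in range(1, max_lag + 1)]
```

Each residue thus receives a fixed-length vector that can be concatenated with other per-residue features (such as relative accessible surface area) before being passed to the SVM and KNN base classifiers.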

Conclusion

The experimental results showed that our model yields the highest F1 score (0.92) and an MCC of 0.87 on the ASEdb dataset. Compared with other machine learning methods, our model achieves a substantial improvement in hot spot prediction.
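
The two reported measures have standard definitions, sketched below from a confusion matrix with hot spots as the positive class. That choice of positive class is an assumption about the evaluation setup. Zero denominators return 0.0 by convention.

```python
from math import sqrt

# Standard F1 and MCC definitions; the positive-class choice is an assumption.

def f1_and_mcc(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """F1 score and Matthews correlation coefficient from a confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc
```

Unlike F1, MCC also rewards correctly rejected non-hot-spot residues, which is why the two are often reported together on imbalanced data.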

18.
Xia Xiaoxuan, Weng Haoyi, Men Ruoting, Sun Rui, Zee Benny Chung Ying, Chong Ka Chun, Wang Maggie Haitian. BMC Genetics, 2018, 19(1):67-37

Background

Association studies using a single type of omics data have been successful in identifying disease-associated genetic markers, but the underlying mechanisms remain unaddressed. To provide a possible explanation of how these genetic factors affect the disease phenotype, multiple omics data need to be integrated.

Results

We propose a novel method, LIPID (likelihood inference proposal for indirect estimation), that uses both single nucleotide polymorphism (SNP) and DNA methylation data jointly to analyze the association between a trait and SNPs. The total effect of SNPs is decomposed into direct and indirect effects, where the indirect effects are the focus of our investigation. Simulation studies show that LIPID performs better in various scenarios than existing methods. Application to the GAW20 data also leads to encouraging results, as the genes identified appear to be biologically relevant to the phenotype studied.
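
To make the direct/indirect split concrete, the sketch below uses the classical linear product-of-coefficients mediation decomposition, a different and much simpler technique than LIPID's likelihood-based inference. A single SNP dosage `g` affects a methylation level `m` (path a), and the trait `y` is regressed on both (direct effect c' and path b). The indirect effect is a·b, and with ordinary least squares on the same sample the total effect c' + a·b equals the simple regression slope of y on g. All names are hypothetical.

```python
# Classical product-of-coefficients sketch, not LIPID; names are hypothetical.

def _centered(v: list[float]) -> list[float]:
    m = sum(v) / len(v)
    return [x - m for x in v]

def mediation_effects(g: list[float], m: list[float], y: list[float]) -> dict[str, float]:
    """Direct, indirect and total effect of SNP dosage g on trait y through
    methylation level m, from two ordinary least squares fits."""
    gc, mc, yc = _centered(g), _centered(m), _centered(y)
    sgg = sum(u * u for u in gc)
    smm = sum(u * u for u in mc)
    sgm = sum(u * v for u, v in zip(gc, mc))
    sgy = sum(u * v for u, v in zip(gc, yc))
    smy = sum(u * v for u, v in zip(mc, yc))
    a_path = sgm / sgg                       # slope of m ~ g
    det = sgg * smm - sgm * sgm              # 2x2 normal equations for y ~ g + m
    direct = (smm * sgy - sgm * smy) / det   # c'
    b_path = (sgg * smy - sgm * sgy) / det   # b
    indirect = a_path * b_path
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}
```

For example, with g = [0, 1, 2, 0, 1, 2], m = [0.1, 0.4, 1.1, -0.1, 0.6, 0.9] and y = g + 2m, the sketch recovers a direct effect of 1, an indirect effect of 1 (a = 0.5, b = 2) and a total effect of 2.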

Conclusions

The proposed LIPID method is shown to be meritorious in extensive simulations and in real-data analyses.

20.

Background

Heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction, and the control of gene expression. Knowledge of heme binding residues can provide crucial clues to understanding these activities and aid functional annotation; however, little work has been done on predicting heme binding residues from protein sequence information.

Methods

We propose a sequence-based approach for accurate prediction of heme binding residues by a novel integrative sequence profile coupling position specific scoring matrices with heme specific physicochemical properties. In order to select the informative physicochemical properties, we design an intuitive feature selection scheme by combining a greedy strategy with correlation analysis.
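
One plausible reading of "a greedy strategy combined with correlation analysis" is sketched below. Candidate physicochemical properties are visited in order of their absolute correlation with the binding label, and a property is kept only if it is not highly correlated with any property already kept. The exact criterion and threshold used by the authors are not stated, and the PSSM part of the profile is not shown. Property names and the threshold are hypothetical.

```python
from math import sqrt

# Illustrative sketch; the selection criterion, threshold and names are assumptions.

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def greedy_select(features: dict[str, list[float]], labels: list[float],
                  max_redundancy: float = 0.8) -> list[str]:
    """Greedily keep features in order of |r| with the labels, skipping any
    feature whose |r| with an already kept feature exceeds max_redundancy."""
    order = sorted(features, key=lambda f: abs(pearson(features[f], labels)),
                   reverse=True)
    kept: list[str] = []
    for f in order:
        if all(abs(pearson(features[f], features[k])) <= max_redundancy for k in kept):
            kept.append(f)
    return kept
```

The redundancy filter is what keeps the feature vector small, consistent with the abstract's point that good performance is reached with a reduced set of feature vector elements.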

Results

Our integrative sequence profile approach for prediction of heme binding residues outperforms the conventional methods using amino acid and evolutionary information on the 5-fold cross validation and the independent tests.

Conclusions

The novel feature of an integrative sequence profile achieves good performance using a reduced set of feature vector elements.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417