首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 148 毫秒
1.
2.
3.
4.
Recent studies of mammalian genomes have uncovered the vast extent of copy number variations (CNVs) that contribute to phenotypic diversity. Compared to SNP, a CNV can cover a wider chromosome region, which may potentially incur substantial sequence changes and induce more significant effects on phenotypes. CNV has been becoming an alternative promising genetic marker in the field of genetic analyses. Here we firstly report an account of CNV regions in the cattle genome in Chinese Holstein population. The Illumina Bovine SNP50K Beadchips were used for screening 2047 Holstein individuals. Three different programes (PennCNV, cnvPartition and GADA) were implemented to detect potential CNVs. After a strict CNV calling pipeline, a total of 99 CNV regions were identified in cattle genome. These CNV regions cover 23.24 Mb in total with an average size of 151.69 Kb. 52 out of these CNV regions have frequencies of above 1%. 51 out of these CNV regions completely or partially overlap with 138 cattle genes, which are significantly enriched for specific biological functions, such as signaling pathway, sensory perception response and cellular processes. The results provide valuable information for constructing a more comprehensive CNV map in the cattle genome and offer an important resource for investigation of genome structure and genomic variation underlying traits of interest in cattle.  相似文献   

5.
Although cis-regulatory binding sites (CRBSs) are at least as important as the coding sequences in a genome, our general understanding of them in most sequenced genomes is very limited due to the lack of efficient and accurate experimental and computational methods for their characterization, which has largely hindered our understanding of many important biological processes. In this article, we describe a novel algorithm for genome-wide de novo prediction of CRBSs with high accuracy. We designed our algorithm to circumvent three identified difficulties for CRBS prediction using comparative genomics principles based on a new method for the selection of reference genomes, a new metric for measuring the similarity of CRBSs, and a new graph clustering procedure. When operon structures are correctly predicted, our algorithm can predict 81% of known individual binding sites belonging to 94% of known cis-regulatory motifs in the Escherichia coli K12 genome, while achieving high prediction specificity. Our algorithm has also achieved similar prediction accuracy in the Bacillus subtilis genome, suggesting that it is very robust, and thus can be applied to any other sequenced prokaryotic genome. When compared with the prior state-of-the-art algorithms, our algorithm outperforms them in both prediction sensitivity and specificity.  相似文献   

6.
Currently, risk stratification is the most difficult problem in prostate cancer (PCa) management. Gleason grading cannot adequately predict cancer progression. This study aimed to identify chromosome-specific segment size alterations that could aid risk stratification and predict metastasis using a retrospective cohort-study strategy. A binary logistic regression model was generated using 16 chromosome-specific segments with size alterations (deletions and amplifications) that showed associations with disease stage (primary versus metastatic). The regression model was trained with the MSKCC PIK3R1 PCa cohort (n = 1417), and validated with the TCGA Firehose Legacy (n = 500), MSKCC Prostate Oncogenome Project (n = 218), and the SU2C/PCF Dream Team (n = 150) PCa cohorts. Furthermore, the capacity of the model to predict metastasis between primary tumours with metastasis (n = 54) and primary tumours without metastasis (n = 54) was tested. The accuracy, sensitivity, and specificity of the model at disease stage stratification ranged from 69.02% to 88.55%, 72.8% to 86.00% and 66.30% to 89.50%, respectively. The model also showed good performance at metastasis prediction with accuracy, sensitivity, and specificity of 57.41%, 62.96% and 51.85%, respectively. The study conclusion was that chromosome-specific segment size alterations can aid risk stratification and metastasis prediction. The significance of the study findings is that in combinations with clinical, biochemical, and histopathological variables, chromosome-specific alterations could improve current risk stratification and prediction models for PCa.  相似文献   

7.

Background

The advent of high throughput sequencing methods breeds an important amount of technical challenges. Among those is the one raised by the discovery of copy-number variations (CNVs) using whole-genome sequencing data. CNVs are genomic structural variations defined as a variation in the number of copies of a large genomic fragment, usually more than one kilobase. Here, we aim to compare different CNV calling methods in order to assess their ability to consistently identify CNVs by comparison of the calls in 9 quartets of identical twin pairs. The use of monozygotic twins provides a means of estimating the error rate of each algorithm by observing CNVs that are inconsistently called when considering the rules of Mendelian inheritance and the assumption of an identical genome between twins. The similarity between the calls from the different tools and the advantage of combining call sets were also considered.

Results

ERDS and CNVnator obtained the best performance when considering the inherited CNV rate with a mean of 0.74 and 0.70, respectively. Venn diagrams were generated to show the agreement between the different algorithms, before and after filtering out familial inconsistencies. This filtering revealed a high number of false positives for CNVer and Breakdancer. A low overall agreement between the methods suggested a high complementarity of the different tools when calling CNVs. The breakpoint sensitivity analysis indicated that CNVnator and ERDS achieved better resolution of CNV borders than the other tools. The highest inherited CNV rate was achieved through the intersection of these two tools (81%).

Conclusions

This study showed that ERDS and CNVnator provide good performance on whole genome sequencing data with respect to CNV consistency across families, CNV breakpoint resolution and CNV call specificity. The intersection of the calls from the two tools would be valuable for CNV genotyping pipelines.  相似文献   

8.
Development of joint application strategies for two microbial gene finders   总被引:2,自引:0,他引:2  
MOTIVATION: As a starting point in annotation of bacterial genomes, gene finding programs are used for the prediction of functional elements in the DNA sequence. Due to the faster pace and increasing number of genome projects currently underway, it is becoming especially important to have performant methods for this task. RESULTS: This study describes the development of joint application strategies that combine the strengths of two microbial gene finders to improve the overall gene finding performance. Critica is very specific in the detection of similarity-supported genes as it uses a comparative sequence analysis-based approach. Glimmer employs a very sophisticated model of genomic sequence properties and is sensitive also in the detection of organism-specific genes. Based on a data set of 113 microbial genome sequences, we optimized a combined application approach using different parameters with relevance to the gene finding problem. This results in a significant improvement in specificity while there is similarity in sensitivity to Glimmer. The improvement is especially pronounced for GC rich genomes. The method is currently being applied for the annotation of several microbial genomes. AVAILABILITY: The methods described have been implemented within the gene prediction component of the GenDB genome annotation system.  相似文献   

9.
10.
11.

Background

Somatically acquired structure variations (SVs) and copy number variations (CNVs) can induce genetic changes that are directly related to tumor genesis. Somatic SV/CNV detection using next-generation sequencing (NGS) data still faces major challenges introduced by tumor sample characteristics, such as ploidy, heterogeneity, and purity. A simulated cancer genome with known SVs and CNVs can serve as a benchmark for evaluating the performance of existing somatic SV/CNV detection tools and developing new methods.

Results

SCNVSim is a tool for simulating somatic CNVs and structure variations SVs. Other than multiple types of SV and CNV events, the tool is capable of simulating important features related to tumor samples including aneuploidy, heterogeneity and purity.

Conclusions

SCNVSim generates the genomes of a cancer cell population with detailed information of copy number status, loss of heterozygosity (LOH), and event break points, which is essential for developing and evaluating somatic CNV and SV detection methods in cancer genomics studies.  相似文献   

12.

Background

Lipids have critical functions in cellular energy storage, structure and signaling. Many individual lipid molecules have been associated with the evolution of prostate cancer; however, none of them has been approved to be used as a biomarker. The aim of this study is to identify lipid molecules from hundreds plasma apparent lipid species as biomarkers for diagnosis of prostate cancer.

Methodology/Principal Findings

Using lipidomics, lipid profiling of 390 individual apparent lipid species was performed on 141 plasma samples from 105 patients with prostate cancer and 36 male controls. High throughput data generated from lipidomics were analyzed using bioinformatic and statistical methods. From 390 apparent lipid species, 35 species were demonstrated to have potential in differentiation of prostate cancer. Within the 35 species, 12 were identified as individual plasma lipid biomarkers for diagnosis of prostate cancer with a sensitivity above 80%, specificity above 50% and accuracy above 80%. Using top 15 of 35 potential biomarkers together increased predictive power dramatically in diagnosis of prostate cancer with a sensitivity of 93.6%, specificity of 90.1% and accuracy of 97.3%. Principal component analysis (PCA) and hierarchical clustering analysis (HCA) demonstrated that patient and control populations were visually separated by identified lipid biomarkers. RandomForest and 10-fold cross validation analyses demonstrated that the identified lipid biomarkers were able to predict unknown populations accurately, and this was not influenced by patient''s age and race. Three out of 13 lipid classes, phosphatidylethanolamine (PE), ether-linked phosphatidylethanolamine (ePE) and ether-linked phosphatidylcholine (ePC) could be considered as biomarkers in diagnosis of prostate cancer.

Conclusions/Significance

Using lipidomics and bioinformatic and statistical methods, we have identified a few out of hundreds plasma apparent lipid molecular species as biomarkers for diagnosis of prostate cancer with a high sensitivity, specificity and accuracy.  相似文献   

13.
TWINSCAN is a new gene-structure prediction system that directly extends the probability model of GENSCAN, allowing it to exploit homology between two related genomes. Separate probability models are used for conservation in exons, introns, splice sites, and UTRs, reflecting the differences among their patterns of evolutionary conservation. TWINSCAN is specifically designed for the analysis of high-throughput genomic sequences containing an unknown number of genes. In experiments on high-throughput mouse sequences, using homologous sequences from the human genome, TWINSCAN shows notable improvement over GENSCAN in exon sensitivity and specificity and dramatic improvement in exact gene sensitivity and specificity. This improvement can be attributed entirely to modeling the patterns of evolutionary conservation in genomic sequence.  相似文献   

14.

Background

A newly recognized type of genetic variation, Copy Number Variation (CNV), is detected in mammalian genomes, e.g. the cattle genome. This form of variation can potentially cause phenotypic variation. Our objective was to determine whether dense SNP (single nucleotide polymorphisms) panels can capture the genetic variation due to a simple bi-allelic CNV, with the prospect of including the effect of such structural variations into genomic predictions.

Methods

A deletion type CNV on bovine chromosome 6 was predicted from its neighboring SNP with a multiple regression model. Our dataset consisted of CNV genotypes of 1,682 cows, along with 100 surrounding SNP genotypes. A prediction model was fitted considering 10 to 100 surrounding SNP and the accuracy obtained directly from the model was confirmed by cross-validation.

Results and conclusions

The accuracy of prediction increased with an increasing number of SNP in the model and the predicted accuracies were similar to those obtained by cross-validation. A substantial increase in accuracy was observed when the number of SNP increased from 10 to 50 but thereafter the increase was smaller, reaching the highest accuracy (0.94) with 100 surrounding SNP. Thus, we conclude that the genotype of a deletion type CNV and its putative QTL effect can be predicted with a maximum accuracy of 0.94 from surrounding SNP. This high prediction accuracy suggests that genetic variation due to simple deletion CNV is well captured by dense SNP panels. Since genomic selection relies on the availability of a dense marker panel with markers in close linkage disequilibrium to the QTL in order to predict their genetic values, we also discuss opportunities for genomic selection to predict the effects of CNV by dense SNP panels, when CNV cause variation in quantitative traits.  相似文献   

15.
Recent studies of mammalian genomes have uncovered the extent of copy number variation (CNV) that contributes to phenotypic diversity, including health and disease status. Here we report a first account of CNVs in the pig genome covering part of the chromosomes 4, 7, 14, and 17 already sequenced and assembled. A custom tiling oligonucleotide array was used with a median probe spacing of 409 bp for screening 12 unrelated Duroc boars that are founders of a large family material. After a strict CNV calling pipeline, 37 copy number variable regions (CNVRs) across all four chromosomes were identified, with five CNVRs overlapping segmental duplications, three overlapping pig unigenes and one overlapping a RefSeq pig mRNA. This CNV snapshot analysis is the first of its kind in the porcine genome and constitutes the basis for a better understanding of porcine phenotypes and genotypes with the prospect of identifying important economic traits.  相似文献   

16.
The prediction of novel pre-microRNA (miRNA) from genomic sequence has received considerable attention recently. However, the majority of studies have focused on the human genome. Previous studies have demonstrated that sensitivity (correctly detecting true miRNA) is sustained when human-trained methods are applied to other species, however they have failed to report the dramatic drop in specificity (the ability to correctly reject non-miRNA sequences) in non-human genomes. Considering the ratio of true miRNA sequences to pseudo-miRNA sequences is on the order of 1:1000, such low specificity prevents the application of most existing tools to non-human genomes, as the number of false positives overwhelms the true predictions. We here introduce a framework (SMIRP) for creating species-specific miRNA prediction systems, leveraging sequence conservation and phylogenetic distance information. Substantial improvements in specificity and precision are obtained for four non-human test species when our framework is applied to three different prediction systems representing two types of classifiers (support vector machine and Random Forest), based on three different feature sets, with both human-specific and taxon-wide training data. The SMIRP framework is potentially applicable to all miRNA prediction systems and we expect substantial improvement in precision and specificity, while sustaining sensitivity, independent of the machine learning technique chosen.  相似文献   

17.
Lymphadenectomy offers the only hope for cure when lymph nodes are involved. In gastric cancer, three approaches have been pursued to preoperatively predict node status in individual patients, modern radiological imaging techniques, sentinel node and technique that uses a computerized database of information to convert a large amount of information and experience to a treatment decision for an individual patient. The aim of this study was to evaluate accuracy in preoperative prediction of lymph node status in selected patients with the help of computer analysis for stage-appropriate surgery. With the help of computer programs Win Estimate and Microsoft Access, we constructed an artificial neural network that calculated a statistical prediction of nodal status in an observed patient with preoperatively gathered data. In 110 patients who have undergone R0 resection with D2 lymphadenectomy, the differences between the individual results generated by artificial neural network calculation and the actual data were compared. The accuracy of computerized predictions of N0 stage for study group is 91%, sensitivity 94% and specificity 87%. The results of accuracy of computerized preoperative prediction of N2 stage are 88%, with sensitivity 94% and specificity 88%. Preoperative analyses of patient data and tumour characteristics offers a rational approach to individualizing tumour therapy where the extent of lymph node dissection is tailored to the type, site, and stage of the tumour, thereby minimizing the disadvantages associated with the extensive operative procedure.  相似文献   

18.
19.
The genetic basis of phenotypic variation can be partially explained by the presence of copy-number variations (CNVs). Currently available methods for CNV assessment include high-density single-nucleotide polymorphism (SNP) microarrays that have become an indispensable tool in genome-wide association studies (GWAS). However, insufficient concordance rates between different CNV assessment methods call for cautious interpretation of results from CNV-based genetic association studies. Here we provide a cross-population, microarray-based map of copy-number variant regions (CNVRs) to enable reliable interpretation of CNV association findings. We used the Affymetrix Genome-Wide Human SNP Array 6.0 to scan the genomes of 1167 individuals from two ethnically distinct populations (Europe, N=717; Rwanda, N=450). Three different CNV-finding algorithms were tested and compared for sensitivity, specificity, and feasibility. Two algorithms were subsequently used to construct CNVR maps, which were also validated by processing subsamples with additional microarray platforms (Illumina 1M-Duo BeadChip, Nimblegen 385K aCGH array) and by comparing our data with publicly available information. Both algorithms detected a total of 42669 CNVs, 74% of which clustered in 385 CNVRs of a cross-population map. These CNVRs overlap with 862 annotated genes and account for approximately 3.3% of the haploid human genome.We created comprehensive cross-populational CNVR-maps. They represent an extendable framework that can leverage the detection of common CNVs and additionally assist in interpreting CNV-based association studies.  相似文献   

20.
MOTIVATION: Currently available methods for the prediction of subcellular location of mitochondrial proteins rely largely on the presence of mitochondrial targeting signals in the protein sequences. However, a large fraction of mitochondrial proteins lack such signals, making those tools ineffective for genome-scale prediction of mitochondria-targeted proteins. Here, we propose a method for genome-scale prediction of nucleus-encoded mitochondrial proteins. The new method, MITOPRED, is based on the Pfam domain occurrence patterns and the amino acid compositional differences between mitochondrial and non-mitochondrial proteins. RESULTS: MITOPRED could predict mitochondrial proteins with 100% specificity at a 44% sensitivity rate and with 67% specificity at 99% sensitivity. Additionally, it was sufficiently robust to predict mitochondrial proteins across different eukaryotic species with similar accuracy. Based on Matthews correlation coefficient measure, the prediction performance of MITOPRED is clearly superior (0.73) to those of the two popular methods TargetP (0.51) and PSORT (0.53). Using this method, we predicted the nucleus-encoded mitochondrial proteins from six complete genomes (three invertebrate, two vertebrate and one plant species) and estimated the total number in each genome. In human, our method estimated the existence of 1362 mitochondrial proteins corresponding to 4.8% of the total proteome. AVAILABILITY: MITOPRED program is freely accessible at http://mitopred.sdsc.edu. Source code is available on request from the authors. SUPPLEMENTARY INFORMATION: Training data sets are also available at http://mitopred.sdsc.edu  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号