Similar literature
 20 similar records found
1.

Background

With the advance of next generation sequencing (NGS) technologies, a large number of insertion and deletion (indel) variants have been identified in human populations. Despite much research into variant calling, it has been found that a non-negligible proportion of the identified indel variants might be false positives due to sequencing errors, artifacts caused by ambiguous alignments, and annotation errors.

Results

In this paper, we examine indel redundancy in dbSNP, one of the central databases for indel variants, and develop a standalone computational pipeline, dubbed Vindel, to detect redundant indels. The pipeline first applies indel position information to form candidate redundant groups, then performs indel mutations to the reference genome to generate corresponding indel variant substrings. Finally the indel variant substrings in the same candidate redundant groups are compared in a pairwise fashion to identify redundant indels. We applied our pipeline to check for redundancy in the human indels in dbSNP. Our pipeline identified approximately 8% redundancy in insertion type indels, 12% in deletion type indels, and overall 10% for insertions and deletions combined. These numbers are largely consistent across all human autosomes. We also investigated indel size distribution and adjacent indel distance distribution for a better understanding of the mechanisms generating indel variants.
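The three steps described above (grouping by position, applying each indel to the reference, and pairwise comparison of the resulting substrings) can be illustrated with a minimal Python sketch. This is not the Vindel implementation: the `Indel` record, the function names, and the `max_gap` and `flank` parameters are hypothetical choices made for illustration, and the grouping rule (same type, same length, nearby positions) is an assumption about what a candidate redundant group might look like.

```python
from itertools import combinations
from typing import NamedTuple


class Indel(NamedTuple):
    # Hypothetical record layout, not the dbSNP or Vindel format.
    name: str
    pos: int   # 0-based index into the reference string
    kind: str  # "ins" or "del"
    seq: str   # inserted bases, or the bases that are deleted


def apply_indel(ref, indel, start, end):
    """Return the window ref[start:end] after applying one indel."""
    if indel.kind == "ins":
        return ref[start:indel.pos] + indel.seq + ref[indel.pos:end]
    return ref[start:indel.pos] + ref[indel.pos + len(indel.seq):end]


def candidate_groups(indels, max_gap):
    """Chain indels of the same type and length whose positions are close."""
    groups = []
    for v in sorted(indels, key=lambda v: (v.kind, len(v.seq), v.pos)):
        last = groups[-1][-1] if groups else None
        if (last is not None and last.kind == v.kind
                and len(last.seq) == len(v.seq)
                and v.pos - last.pos <= max_gap):
            groups[-1].append(v)
        else:
            groups.append([v])
    return [g for g in groups if len(g) > 1]


def redundant_pairs(ref, indels, max_gap=10, flank=5):
    """Report pairs of indels that yield identical variant substrings."""
    pairs = []
    for group in candidate_groups(indels, max_gap):
        # One shared window per group so the substrings are comparable.
        start = max(0, min(v.pos for v in group) - flank)
        end = min(len(ref), max(v.pos + len(v.seq) for v in group) + flank)
        for a, b in combinations(group, 2):
            if apply_indel(ref, a, start, end) == apply_indel(ref, b, start, end):
                pairs.append((a.name, b.name))
    return pairs


ref = "GGCACACATT"
indels = [
    Indel("d1", 2, "del", "CA"),
    Indel("d2", 3, "del", "AC"),
    Indel("d3", 7, "del", "AT"),
    Indel("i1", 2, "ins", "CA"),
    Indel("i2", 3, "ins", "AC"),
]
print(redundant_pairs(ref, indels))
```

In this toy reference, the deletions `d1` and `d2` fall inside a CA repeat and produce the same sequence, as do the insertions `i1` and `i2`, so they are reported as redundant; `d3` changes the sequence differently and is not.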

Conclusions

Vindel, a simple yet effective computational pipeline, can be used to check whether a set of indels is redundant with respect to those already in a database of interest such as NCBI’s dbSNP. Of the approximately 5.9 million indels we examined, nearly 0.6 million are redundant, revealing a serious limitation in the current indel annotation. Statistical results confirm the consistency of the pipeline's indel redundancy detection across all 22 chromosomes. Apart from the standalone Vindel pipeline, the indel redundancy check algorithm is also implemented in the web server http://bioinformatics.cs.vt.edu/zhanglab/indelRedundant.php.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0359-1) contains supplementary material, which is available to authorized users.

2.
Selection signals of Korean cattle might be attributed largely to artificial selection for meat quality. Rapidly increased intragenic markers of newly annotated genes in the bovine genome would help overcome limited findings of genetic markers associated with meat quality at the selection signals in a previous study. The present study examined genetic associations of marbling score (MS) with intragenic nucleotide variants at selection signals of Korean cattle. A total of 39 092 nucleotide variants of 407 Korean cattle were utilized in the association analysis. A total of 129 variants were selected within newly annotated genes in the bovine genome. Their genetic associations were analyzed using the mixed model with random polygenic effects based on identical-by-state genetic relationships among animals in order to control for spurious associations produced by population structure. Genetic associations of MS were found (P < 3.88 × 10⁻⁴) with six intragenic nucleotide variants on bovine autosomes 3 (cache domain containing 1, CACHD1), 5 (like-glycosyltransferase, LARGE), 16 (cell division cycle 42 binding protein kinase alpha, CDC42BPA) and 21 (snurportin 1, SNUPN; protein tyrosine phosphatase, non-receptor type 9, PTPN9; chondroitin sulfate proteoglycan 4, CSPG4). In particular, the genetic associations with CDC42BPA and LARGE were confirmed using an independent data set of Korean cattle. The results implied that allele frequencies of functional variants and their proximity variants have been augmented by directional selection for greater MS and remain selection signals in the bovine genome. Further studies of fine mapping would be useful to incorporate favorable alleles in marker-assisted selection for MS of Korean cattle.

3.
Sawyer SL  Howell WM  Brookes AJ 《BioTechniques》2003,35(2):292-6, 298
Genome variation provides researchers with thousands of markers with which to study human demographic history and phenotypes. Insertion-deletion (indel) polymorphism is an important and abundant form of human genome variation, and convenient methods for genotyping indels are therefore needed. Here we evaluate dynamic allele-specific hybridization (DASH) for its ability to score indels. Evaluation of six model indel DASH assays based on synthetic oligonucleotides showed that length differences of 1-5 bp were accurately scored. Only single probes were required to assay indels of 3-4 bp or less, while longer indels tended to require the use of both allele probes serially. The best results were obtained by central placing of the probe over the indel. Model study findings were confirmed by running indel DASH assays upon PCR-amplified targets representing four polymorphisms from Alzheimer's disease candidate genes APBB1 and LRP1. These indels were genotyped in a set of 121 patients and 156 controls. While no disease association was found, the data quality confirmed that DASH is a robust and useful procedure for genotyping indels of the size range typically found in the human genome.

4.
Whole‐genome sequencing studies are vital to gain a thorough understanding of genomic variation. Here, we summarize the results of a whole‐genome sequencing study comprising 88 horses and ponies from diverse breeds at 19.1× average coverage. The paired‐end reads were mapped to the current EquCab3.0 horse reference genome assembly, and we identified approximately 23.5 million single nucleotide variants and 2.3 million short indel variants. Our dataset included at least 7 million variants that were not previously reported. On average, each individual horse genome carried ~5.7 million single nucleotide and 0.8 million small indel variants with respect to the reference genome assembly. The variants were functionally annotated. We provide two examples for potentially deleterious recessive alleles that were identified in a heterozygous state in individual genome sequences. Appropriate management of such deleterious recessive alleles in horse breeding programs should help to improve fertility and reduce the prevalence of heritable diseases. This comprehensive dataset has been made publicly available; it will represent a valuable resource for future horse genetic studies and supports the goal of accelerating the rates of genetic gain in the domestic horse.

5.
6.

Background  

Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence is known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or the conserved sequences of gene families. These arrays omit potentially important genes and sequence variants from the pan-genome.

7.
Mono-ADP-ribosyltransferase (ART) 4 belongs to a family of ectoenzymes that catalyze the transfer of ADP-ribose from NAD+ to a target protein. ART4 could be detected on HEL cells and erythrocytes by FACS analysis while it was absent from activated monocytes, despite the presence of ART4 mRNA in these cells. The predicted glycosylphosphatidylinositol (GPI) linkage of ART4 could be verified by showing that treatment of erythrocytes, HEL cells and ART4-transfected HEK-293-T cells with phosphatidylinositol-specific phospholipase C results in a decrease in ART4 expression. Furthermore, an ART4 construct carrying an Ala285Val mutation that is critical for the formation of a GPI anchor failed to be expressed in transfected C-33A cells. Analysis of the gene structure revealed that the first of the three exons was at least 236 bp longer than previously published and that splicing occurred in the coding region of the mRNA from HEL cells and monocytes. When carrying out 5' inverse RACE-PCR we confirmed the existence of 5 ATGs in the 5' untranslated region (5'UTR). By deletion and site-directed mutagenesis of the ATGs, we showed that the first two ATGs impair translation and that both the 3rd and 5th ATG can be used for translation initiation after expression in C-33A cells. On analysis of the 3'UTR, which contains 2 adenylate/uridylate-rich elements (AREs), we detected one variant in monocytes that would be devoid of a GPI-anchor signal and thus could represent a secreted form of ART4. Thus, alternative splicing and the use of regulatory elements in the 5'UTR and 3'UTR represent means to control ART4 expression.

8.

Background

Gene expression microarrays measure the levels of messenger ribonucleic acid (mRNA) in a sample using probe sequences that hybridize with transcribed regions. These probe sequences are designed using a reference genome for the relevant species. However, most model organisms and all humans have genomes that deviate from their reference. These variations, which include single nucleotide polymorphisms, insertions of additional nucleotides, and nucleotide deletions, can affect the microarray’s performance. Genetic experiments comparing individuals bearing different population-associated single nucleotide polymorphisms that intersect microarray probes are therefore subject to systemic bias, as the reduction in binding efficiency due to a technical artifact is confounded with genetic differences between parental strains. This problem has been recognized for some time, and earlier methods of compensation have attempted to identify probes affected by genome variants using statistical models. These methods may require replicate microarray measurement of gene expression in the relevant tissue in inbred parental samples, which are not always available in model organisms and are never available in humans.

Results

By using sequence information for the genomes of organisms under investigation, potentially problematic probes can now be identified a priori. However, there is no published software tool that makes it easy to eliminate these probes from an annotation. I present equalizer, a software package that uses genome variant data to modify annotation files for the commonly used Affymetrix IVT and Gene/Exon platforms. These files can be used by any microarray normalization method for subsequent analysis. I demonstrate how use of equalizer on experiments mapping germline influence on gene expression in a genetic cross between two divergent mouse species and in human samples significantly reduces probe hybridization-induced bias, reducing false positive and false negative findings.

Conclusions

The equalizer package reduces probe hybridization bias from experiments performed on the Affymetrix microarray platform, allowing accurate assessment of germline influence on gene expression.

9.
Recent attempts to discover genetic factors affecting cattle resistance/susceptibility to bovine spongiform encephalopathy (BSE) have led to the identification of two insertion/deletion (indel) polymorphisms, located within the promoter and intron 1 of the prion protein gene PRNP, showing a significant association with the occurrence of the classical form of the disease. Because the effect of the polymorphisms has been studied in only a few populations, in this study we investigated whether the previously described association of PRNP indel polymorphisms with BSE susceptibility in cattle is also present in the Polish cattle population. We found a significant relation between the investigated PRNP indel polymorphisms (23 and 12 bp indels) and susceptibility of Polish Holstein-Friesian cattle to classical BSE (P < 0.05). The deletion variants of both polymorphisms were related to increased susceptibility, whereas insertion variants were protective against BSE.

10.
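High-risk human papillomavirus (HPV) is the main causative agent of cervical cancer. Using biological software such as Arraydesigner2.0 and BLAST, the whole-genome sequences of 10 HPV types were analyzed, and highly specific 60-mer type-specific HPV oligonucleotide probes with similar melting temperatures (Tm) and GC contents were designed for the preparation of an HPV detection chip. The effectiveness of the probes for the four most common HPV types (HPV6, 11, 16, 18) was preliminarily validated. The results show that the designed probes have good type specificity and can be applied to the detection and typing of HPV.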

11.
In the present study, we describe the deep sequencing and structural analysis of the Holstein breed bull genome. Our aim was to obtain a high-quality Holstein bull genome reference sequence and to describe different types of variations in its genome compared with the Hereford breed as a reference. We generated four mate-paired libraries and one fragment library from 30 μg of genomic DNA. Colour space fasta files were mapped and paired to the reference cow (Bos taurus) genome assembly from Oct. 2011 (Baylor 4.6.1/bosTau7). Initial sequencing resulted in 4,864,054,296 50-bp reads. Average mapping efficiency was 71.7% and altogether 3,494,534,136 reads and 157,928,163,086 bp were successfully mapped, resulting in 60× coverage. This is the highest coverage for a bovine genome published so far. Tertiary analysis found 6,362,988 SNPs in the bull’s genome, comprising 4,045,889 heterozygous and 2,317,099 homozygous variants. Annotation revealed that 4,330,337 of all discovered SNPs were annotated in the dbSNP database (build 137) and therefore 2,032,651 SNPs were novel. Large indel variations numbered 312,879 and accounted for 245,947,845 bp of the variation in the entire genome. We also found that small indels (633,310 in total) accounted for a total variation of 2,542,552 nucleotides in the genome. Only 106,768 small indels were listed in dbSNP. Finally, we identified 2,758 inversions in the genome of the bull, covering in total 23,099,054 bp of the genome’s variation. The largest inversion was 87,440 bp in size. In conclusion, the present study discovered different types of novel variants in the bull’s genome after high-coverage sequencing. Better knowledge of the functions of these variations is needed.

12.
13.
14.
15.
We previously reported mutations in North American West Nile viruses (WNVs) with a small-plaque (sp), temperature-sensitive (ts), and/or mouse-attenuated (att) phenotype. Using an infectious clone, site-directed mutations and 3' untranslated region (3'UTR) exchanges were introduced into the WNV NY99 genome. Characterization of mutants demonstrated that a combination of mutations involving the NS4B protein (E249G) together with either a mutation in the NS5 protein (A804V) or three mutations in the 3'UTR (A10596G, C10774U, A10799G) produced sp, ts, and/or att variants. These results suggested that the discovery of North American WNV-phenotypic variants is rare because of the apparent requirement of concurrent polygenic mutations.

16.
DNA from 130 individuals was studied with up to 18 (primarily cDNA) probes for the frequency of variants in this initial experiment to determine the feasibility of this approach to screening for germinal gene mutations. This approach, a modification of the usual restriction enzyme mapping strategy, focuses on the detection of insertion/deletion/rearrangement (I/D/R) variants, because the DNA is digested with only two restriction enzymes before transfer to membranes and hybridization with an extensive series of unrelated probes. Some 4000 noncontiguous, independent DNA fragments ("loci"), functional loci, pseudogenes or anonymous fragments (a total of approximately 77,400 kb), were screened. 19 different classes and 31 copies of presumably I/D/R variants were detected while 4 different classes and 24 individuals exhibiting base substitution variants were observed. 18 of the 19 I/D/R classes were rare variants, that is, each was observed at a frequency, within this population, of less than 0.01; 3 of the base substitution classes existed at polymorphic frequencies and only 1 was a rare variant. 10 of the I/D/R classes, occurring in a total of 18 individuals, were detected with probes which are not known to be associated with repetitive elements. This is a variant frequency for I/D/R variants without known repetitive elements of 0.15 classes and 0.23 copies for each 1000 kb screened; this would extrapolate to 1600 such variant sites in the genome of each individual. Within the context of a mutation screening program, the rare variants, either with or without repetitive elements, would have a higher probability of being de novo mutations than would polymorphic variants; this former group would be the focus of family studies to test for the heritability of the allele (fragment pattern). Sufficient DNA probes are available to screen a significant portion of the human genome for genetic variation and de novo mutations of this type.

17.
A major genetic component of BSE susceptibility

Background  

Coding variants of the prion protein gene (PRNP) have been shown to be major determinants for the susceptibility to transmitted prion diseases in humans, mice and sheep. However, to date, the effects of polymorphisms in the coding and regulatory regions of bovine PRNP on bovine spongiform encephalopathy (BSE) susceptibility have been considered marginal or non-existent. Here we analysed two insertion/deletion (indel) polymorphisms in the regulatory region of bovine PRNP in BSE-affected animals and controls of four independent cattle populations from the UK and Germany.

18.
The domestic dog serves as an excellent model to investigate the genetic basis of disease. More than 400 heritable traits analogous to human diseases have been described in dogs. To further canine medical genetics research, we established the Dog Biomedical Variant Database Consortium (DBVDC) and present a comprehensive list of functionally annotated genome variants that were identified with whole genome sequencing of 582 dogs from 126 breeds and eight wolves. The genomes used in the study have a minimum coverage of 10× and an average coverage of ~24×. In total, we identified 23 133 692 single‐nucleotide variants (SNVs) and 10 048 038 short indels, including 93% undescribed variants. On average, each individual dog genome carried ~4.1 million single‐nucleotide and ~1.4 million short‐indel variants with respect to the reference genome assembly. About 2% of the variants were located in coding regions of annotated genes and loci. Variant effect classification showed 247 141 SNVs and 99 562 short indels having moderate or high impact on 11 267 protein‐coding genes. On average, each genome contained heterozygous loss‐of‐function variants in 30 potentially embryonic lethal genes and 97 genes associated with developmental disorders. More than 50 inherited disorders and traits have been unravelled using the DBVDC variant catalogue, enabling genetic testing for breeding and diagnostics. This resource of annotated variants and their corresponding genotype frequencies constitutes a highly useful tool for the identification of potential variants causative for rare inherited disorders in dogs.

19.
The role of the 5'-untranslated region (5'UTR) in the replication of enteroviruses has been studied by using a series of poliovirus type 3 (PV3) replicons containing the chloramphenicol acetyltransferase reporter gene in which the 5'UTR was replaced by the 5'UTR of either coxsackievirus B4 or human rhinovirus 14 or composite 5'UTRs derived from sequences of PV3, human rhinovirus 14, coxsackievirus B4, or encephalomyocarditis virus. The results indicate that efficient replication of an enterovirus genome requires a compatible interaction between the 5'-terminal cloverleaf structure and the coding and/or 3'-noncoding regions of the genome. A crucial determinant of this interaction is the stem-loop formed by nucleotides 46 to 81 (stem-loop d). The independence of the cloverleaf structure formed by the 5'-terminal 88 nucleotides and the ribosome landing pad or internal ribosome entry site (IRES) was investigated by constructing a 5'UTR composed of the PV3 cloverleaf and the IRES from encephalomyocarditis virus. Chloramphenicol acetyltransferase gene-containing replicons and viruses containing this recombinant 5'UTR showed levels of replication similar to those of the corresponding genomes containing the complete PV3 5'UTR, indicating that the cloverleaf and the IRES may be regarded as functionally independent and nonoverlapping elements.

20.
To find single-nucleotide polymorphisms (SNPs) in the human genome, three modern technologies of molecular genetic analysis were combined: the ligase detection reaction (LDR), rolling circle amplification (RCA), and immobilized microarray of gel elements (IMAGE). SNPs were detected in target DNA by selective ligation of allele-specific nucleotides in microarrays. The ligation product was assayed in microarray gel pads by RCA. Two variants of microarray analysis were compared. One included selective ligation of short oligonucleotides immobilized in a microarray with subsequent amplification with a preformed circular probe (a common circle). The probe was especially designed for human genome research. The other variant employed immobilized allele-specific padlock probes, which could be circularized as a result of selective ligation. Codon 72 SNP of the human p53 gene was used as a model. RCA in microarrays proved to be a quantitative assay and, in combination with LDR, allowed efficient discrimination of alleles. The principles and prospects of LDR/RCA in microarrays are discussed. Translated from Molekulyarnaya Biologiya, Vol. 39, No. 1, 2005, pp. 30–39. Original Russian Text Copyright © 2005 by Kashkin, Strizhkov, Gryadunov, Surzhikov, Grechishnikova, Kreindlin, Chupeeva, Evseev, Turygin, Mirzabekov.
