Similar literature
Found 20 similar articles (search time: 140 ms)
1.
Computer-aided prediction of protein-coding genes in uncharacterized genomic DNA sequences is one of the most important problems in biological signal processing. A modified filter method based on statistically optimal null filter (SONF) theory is proposed for recognizing protein-coding regions. The square deviation gain (SDG) between the input and output of the model is used to identify the coding regions. An effective SDG amplification model with Class I and Class II enhancement is designed to suppress the non-coding regions. An evaluation algorithm was also used to compare the modified model with most currently available gene prediction methods in terms of sensitivity, specificity, and precision. Performance in identifying protein-coding regions was evaluated at the nucleotide level on benchmark datasets, yielding a sensitivity of 91.4%, a specificity of 96%, and a precision of 93.7%. These results suggest that the proposed model is potentially useful in gene finding, where it can help recognize protein-coding regions with higher precision and speed than present algorithms.
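
The abstract does not define its three metrics, so the following is only a minimal sketch under stated assumptions. It uses the convention common in the gene-finding literature, where sensitivity is TP/(TP+FN) and specificity is TP/(TP+FP), both counted per nucleotide. The third value is the arithmetic mean of the two; this is an assumption, suggested only by the fact that (91.4 + 96) / 2 = 93.7 matches the reported figure. The function name and the toy masks are hypothetical.

```python
def nucleotide_level_metrics(true_mask, pred_mask):
    """Compare per-nucleotide coding annotations (1 = coding, 0 = non-coding).

    Assumed definitions (not taken from the abstract):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)   # gene-finding convention
      mean        = (sensitivity + specificity) / 2
    """
    if len(true_mask) != len(pred_mask):
        raise ValueError("masks must have the same length")
    pairs = list(zip(true_mask, pred_mask))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Toy example: the predicted exon is shifted by one nucleotide.
true_mask = [0, 1, 1, 1, 1, 0, 0, 0]
pred_mask = [0, 0, 1, 1, 1, 1, 0, 0]
metrics = nucleotide_level_metrics(true_mask, pred_mask)
```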

2.
G-protein coupled receptors (GPCRs) are a class of seven-helix transmembrane proteins that have been used in bioinformatics as targets to facilitate drug discovery for human diseases. Although thousands of GPCR sequences have been collected, the ligand specificity of many GPCRs is still unknown and only one crystal structure of the rhodopsin-like family has been solved. Therefore, identifying GPCR types from sequence data alone has become an important research issue. In this study, a novel technique for identifying GPCR types is introduced, based on the weighted Levenshtein distance between two receptor sequences and the nearest neighbor method (NNM); it can deal directly with receptor sequences of different lengths. In our experiments on classifying four classes (acetylcholine, adrenoceptor, dopamine, and serotonin) of the rhodopsin-like family of GPCRs, the error rates from the leave-one-out procedure and the leave-half-out procedure were 0.62% and 1.24%, respectively. These results are superior to those of the covariant discriminant algorithm, the support vector machine method, and the NNM with Euclidean distance.
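
The combination of a weighted Levenshtein distance with a nearest neighbor classifier can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the abstract does not give the weights, so the substitution cost is a pluggable function that defaults to the unweighted case, and the indel cost, function names, and toy sequences are all hypothetical.

```python
def weighted_levenshtein(a, b, sub_cost=None, indel_cost=1.0):
    """Edit distance where substitutions are priced by sub_cost(x, y).

    With the default sub_cost (0 for a match, 1 otherwise) this reduces
    to the ordinary Levenshtein distance. Sequences may differ in length.
    """
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,          # delete x
                curr[j - 1] + indel_cost,      # insert y
                prev[j - 1] + sub_cost(x, y),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def nearest_neighbor_label(query, labeled, **kwargs):
    """Return the label of the training sequence closest to the query."""
    return min(labeled, key=lambda item: weighted_levenshtein(query, item[0], **kwargs))[1]


def leave_one_out_error(labeled, **kwargs):
    """Fraction of sequences misclassified when each is held out in turn."""
    errors = 0
    for i, (seq, label) in enumerate(labeled):
        rest = labeled[:i] + labeled[i + 1:]
        if nearest_neighbor_label(seq, rest, **kwargs) != label:
            errors += 1
    return errors / len(labeled)


# Hypothetical toy data: two classes of short peptide-like strings.
toy = [("MKTAY", "a"), ("MKTAF", "a"), ("GGSSP", "b"), ("GGSTP", "b")]
loo_error = leave_one_out_error(toy)

# Hypothetical weighting: an I/L swap is treated as a cheap substitution.
def cheap_il(x, y):
    if x == y:
        return 0.0
    return 0.5 if {x, y} == {"I", "L"} else 1.0
```

Because the distance is computed by dynamic programming over both sequences, no alignment to a fixed length is needed before classification, which is the property the abstract highlights.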

3.
Helicoverpa armigera, the cotton bollworm, is one of the most destructive pests worldwide, threatening various food and economic crops. Functional genomic tools may provide efficient approaches for its management. The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) system, dependent on a single guide RNA (sgRNA), has been used to induce indels for targeted mutagenesis in cotton bollworm. However, genomic deletions may be more desirable for disrupting the function of noncoding genes or regulatory sequences. By injecting two sgRNAs with Cas9 protein targeting different exons, we obtained predictable genomic deletions of several hundred bases. We achieved this type of modification with different combinations of sgRNA pairs, including HaCad and HaABCC2. Our findings indicate that CRISPR/Cas9 can be used as an efficient tool to engineer genomes with chromosomal deletions in H. armigera.

4.
EST analysis in barley defines a unigene set comprising 4,000 genes   Total citations: 3, self-citations: 0, citations by others: 3
We report the generation of 13,109 EST (Expressed Sequence Tag) sequences from barley as a first step towards the generation of a unigene set for this organism. Sequences were generated from three libraries encompassing 7,568 cDNA clones. Comparisons to nucleic acid and protein sequence databases enabled the assignment of putative functions to the mRNAs. The results of the searches against protein databases were parsed and built into a regularly updated database, available over the World Wide Web. The Stack_Pack clustering system was applied to survey the level of redundancy, which was calculated to be 69%; we thus identified 4,000 different barley genes. To demonstrate the usability of the clustering results for further experiments, we subjected alignments with sequences similar to elongation factor 1 alpha to additional analysis. These sequences represented the largest group with identical putative functions (228 members), and clustering based on the analysis of 3′ sequences subdivided the group into five different assemblies. Alignments of the consensus sequences facilitated the development of PCR assays suitable for genetic mapping of four of the different gene-family members, which reside on chromosomes 2H, 4H and 5H, thus demonstrating the suitability of the cluster results as a basis for in-depth analyses of barley gene families. Received: 15 March 2001 / Accepted: 18 April 2001
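
The step from 69% redundancy to about 4,000 genes can be checked with a short calculation. The sketch below assumes that redundancy is defined as 1 minus the ratio of distinct genes to total ESTs; the abstract does not state the definition, and the function name is hypothetical.

```python
def estimate_unigenes(total_ests, redundancy):
    """Estimate distinct genes, assuming redundancy = 1 - unigenes / total_ests."""
    return total_ests * (1.0 - redundancy)


# Figures from the abstract: 13,109 ESTs at 69% redundancy.
estimated = estimate_unigenes(13109, 0.69)
```

Under that assumption the estimate is roughly 4,060, consistent with the rounded figure of 4,000 genes in the title.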

5.
A computational system for the prediction and classification of human G-protein coupled receptors (GPCRs) has been developed based on the support vector machine (SVM) method and protein sequence information. The feature vectors used to develop the SVM prediction models consist of statistically significant features selected from the single amino acid, dipeptide, and tripeptide compositions of protein sequences. Furthermore, the difference in length distribution between GPCRs and non-GPCRs has been exploited to improve prediction performance. Tests with annotated human protein sequences demonstrate that this system achieves good performance for both prediction and classification of human GPCRs.
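
The composition features named in the abstract can be sketched as normalized k-mer frequencies over the 20 standard amino acids. This is an illustration only: the statistical feature selection step and the SVM itself are omitted, appending the sequence length as a feature is an assumption about how the length difference might be used, and all names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq, k):
    """Relative frequency of every length-k peptide (20**k values).

    k=1 gives amino acid composition, k=2 dipeptide, k=3 tripeptide.
    Windows containing non-standard residues are skipped.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def feature_vector(seq):
    """Amino acid and dipeptide composition plus length (assumed feature).

    Tripeptide composition (8,000 values) is left out here for brevity;
    kmer_composition(seq, 3) would produce it.
    """
    return kmer_composition(seq, 1) + kmer_composition(seq, 2) + [float(len(seq))]


vec = feature_vector("ACAC")
```

Vectors of this kind could then be passed to any SVM implementation after feature selection.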

6.
Complete Genomic Sequence of a Chinese Isolate of Duck Hepatitis Virus   Total citations: 1, self-citations: 0, citations by others: 1
The complete genomic sequence of the Duck hepatitis virus 1 (DHV-1) ZJ-V isolate was determined to be 7,691 nucleotides (nt) in length, with a 5'-terminal untranslated region (UTR) of 626 nt and a 3'-terminal UTR of 315 nt (not including the poly(A) tail). One large open reading frame (ORF) was found within the genome (nt 627 to 7,373), coding for a polypeptide of 2,249 amino acids. Our data also showed that the poly(A) tail of DHV-1 has at least 22 A's. Sequence comparison revealed significant homology (from 91.9% to 95.7%) between the protein sequences of the ZJ-V isolate and those of 21 reference isolates. Although DHV-1 has been classified as an unassigned virus in the Picornaviridae family, its genome showed some unique characteristics. DHV-1 contains 3 copies of the 2A gene and only 1 copy of the 3B gene, and its 3'-NCR is longer than those of other picornaviruses. Phylogenetic analysis based on the VP1 protein sequences showed that the ZJ-V isolate shares high sequence homology with the reported DHV-1 isolates (from 92.9% to 99.2%), indicating that DHV-1 is genetically stable.
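
The genome coordinates in this abstract are internally consistent, which a short check makes explicit. The numbers come from the abstract; reading the 3 nt left between the ORF and the 3' UTR as a stop codon is an inference, not something the abstract states, and the function name is hypothetical.

```python
def check_genome_layout(genome_len, utr5_len, orf_start, orf_end, utr3_len, protein_len):
    """Return the ORF length and the nucleotides not covered by UTRs or ORF."""
    orf_len = orf_end - orf_start + 1
    uncovered = genome_len - utr5_len - orf_len - utr3_len
    return {
        "orf_len": orf_len,
        "orf_matches_protein": orf_len == 3 * protein_len,
        "orf_follows_utr5": orf_start == utr5_len + 1,
        "uncovered_nt": uncovered,  # assumed here to be the stop codon
    }


layout = check_genome_layout(
    genome_len=7691, utr5_len=626, orf_start=627, orf_end=7373,
    utr3_len=315, protein_len=2249,
)
```

The ORF spans 6,747 nt, exactly 3 x 2,249, and 626 + 6,747 + 3 + 315 = 7,691.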

7.
The NetAcet method has been developed to predict N-terminal acetylation sites, but more of the information in the data set could be utilized to improve the performance of the model. By employing a new way to extract patterns from sequences and using a sample balancing mechanism, we obtained a correlation coefficient of 0.85 and a sensitivity of 93% on an independent mammalian data set. A web server utilizing this method has been constructed and is available at http://166.111.24.5/acetylation.html.

8.
The analysis of repeats in DNA sequences is an important subject in bioinformatics. In this paper, we propose a novel projection-assemble algorithm to find unknown interspersed repeats in DNA sequences. The algorithm employs a random projection algorithm to obtain a candidate fragment set, then uses an exhaustive search over each pair of fragments in the candidate set to find potential linkage, and finally assembles the linked fragments. The complexity of our projection-assemble algorithm is nearly linear in the length of the genome sequence, and its memory usage is limited by the hardware. We tested our algorithm with both simulated data and real biological data, and the results show that our projection-assemble algorithm is efficient. By means of this algorithm, we found an unlabeled repeat region that occurs five times in the Escherichia coli genome, with a length of more than 5,000 bp and a mismatch probability of less than 4%.
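
The candidate-collection step can be illustrated with a generic random projection sketch. It is not the authors' algorithm: it covers only the projection stage, not the pairwise linkage search or the assembly, and the window length, number of sampled positions, number of trials, and function name are assumed values chosen for illustration.

```python
import random
from collections import defaultdict


def random_projection_candidates(sequence, l=12, k=6, trials=5, seed=0):
    """Collect pairs of window start positions that collide under projection.

    Each trial samples k of the l window positions and hashes every
    length-l window by the characters at those positions. Windows that
    share a bucket agree on the sampled positions, so near-identical
    repeats collide with high probability even with a few mismatches.
    """
    rng = random.Random(seed)
    candidates = set()
    for _ in range(trials):
        positions = sorted(rng.sample(range(l), k))
        buckets = defaultdict(list)
        for start in range(len(sequence) - l + 1):
            window = sequence[start:start + l]
            key = "".join(window[p] for p in positions)
            buckets[key].append(start)
        for starts in buckets.values():
            # Starts are appended in increasing order, so a < b below.
            for i, a in enumerate(starts):
                for b in starts[i + 1:]:
                    candidates.add((a, b))
    return candidates


# Toy input: the same 12-nt fragment at positions 0 and 20.
repeat = "ACGTTGCAAGCT"
pairs = random_projection_candidates(repeat + "GGGGGGGG" + repeat)
```

Exact copies always share a bucket, so the pair (0, 20) is guaranteed to appear; other pairs in the result are chance collisions that a later verification step would need to filter out.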

9.
RNA–protein interactions influence many biological processes. Identifying the binding sites of RNA-binding proteins (RBPs) remains one of the most fundamental and important challenges in the study of such interactions. Capturing RNA and RBPs via chemical crosslinking allows stringent purification procedures that remove most non-specific RNA and protein interactions. Two major types of chemical crosslinking strategies have been developed to date: UV-enabled crosslinking and enzymatic mechanism-based covalent capture. In this review, we compare these strategies and their current applications, with an emphasis on the technologies themselves rather than the biology that has been revealed. We hope that such methods will benefit a broader audience and that this review will encourage the development of new methods to study RNA–RBP interactions.

10.
Methionine synthase (MS) is grouped into two classes. Class One MS (MetH) and Class Two MS (MetE) share no homology and differ in their catalytic model. Based on the conserved sequences of metE genes from different organisms, a segment of the metE gene was first cloned from Pichia pastoris genomic DNA by PCR, and its 5′ and 3′ regions were further cloned by 5′- and 3′-rapid amplification of cDNA ends (RACE), respectively. The assembled sequence reveals an open reading frame encoding a polypeptide of 768 residues, and the deduced product shares 76% identity with MetE of Saccharomyces cerevisiae. P. pastoris methionine synthase (PpMetE) consists of two domains common to MetEs. The active site is located in the C-terminal domain, in which the residues involved in the interaction of zinc with substrates are conserved. Homologous expression of PpMetE in P. pastoris was achieved, and heterologous expression of PpMetE in the MetE-defective S. cerevisiae strain XJB3-1D restored the growth of the mutant on methionine-free minimal media. The gene sequence has been submitted to GenBank/EMBL/DDBJ under accession No. AY601648.

11.
12.
Kang HM  Zaitlen NA  Wade CM  Kirby A  Heckerman D  Daly MJ  Eskin E 《Genetics》2008,178(3):1709-1723
Genomewide association mapping in model organisms such as inbred mouse strains is a promising approach for the identification of risk factors related to human diseases. However, genetic association studies in inbred model organisms are confronted by the problem of complex population structure among strains. This induces inflated false positive rates, which cannot be corrected using standard approaches applied in human association studies such as genomic control or structured association. Recent studies demonstrated that mixed models successfully correct for the genetic relatedness in association mapping in maize and Arabidopsis panel data sets. However, the currently available mixed-model methods suffer from computational inefficiency. In this article, we propose a new method, efficient mixed-model association (EMMA), which corrects for population structure and genetic relatedness in model organism association mapping. Our method takes advantage of the specific nature of the optimization problem in applying mixed models for association mapping, which allows us to substantially increase the computational speed and reliability of the results. We applied EMMA to in silico whole-genome association mapping of inbred mouse strains involving hundreds of thousands of SNPs, in addition to Arabidopsis and maize data sets. We also performed extensive simulation studies to estimate the statistical power of EMMA under various SNP effects, varying degrees of population structure, and differing numbers of multiple measurements per strain. Despite the limited power of inbred mouse association mapping due to the limited number of available inbred strains, we are able to identify significantly associated SNPs, which fall into known QTL or genes identified through previous studies while avoiding an inflation of false positives. An R package implementation and webserver of our EMMA method are publicly available.

13.
14.
cDNA selection with YACs   Total citations: 1, self-citations: 0, citations by others: 1
Identification of expressed sequence tags (ESTs) in large genomic segments is an important step in positional cloning and genomic mapping studies. A simple and efficient polymerase chain reaction (PCR)-based approach is described here to identify coding sequences in large genomic fragments of DNA cloned in vectors such as yeast artificial chromosome (YAC) vectors. The method is based on blocking sequences such as repetitive and GC-rich sequences in genomic DNA immobilized on nylon paper discs prior to hybridization of the discs to a cDNA library, and on recovery of the selected cDNAs by PCR. Single or multiple cDNA libraries can be used in the selection procedure. The procedure has also been used successfully with total yeast DNA containing a YAC.

15.
We determined the nucleotide sequences of 64 TAC (transformation-competent artificial chromosome) clones selected from genomic libraries of Lotus japonicus accession Miyakojima MG-20 based on the sequence information of expressed sequence tags (ESTs), cDNAs, genes and DNA markers from L. japonicus and other legumes. The length of the DNA regions sequenced in this study was 6,370,255 bp, and the total length of the L. japonicus genome sequenced so far is 32,537,698 bp together with the nucleotide sequences of 256 TAC clones previously reported. Five hundred forty-eight potential protein-encoding genes with known or predicted functions, 127 gene segments and 224 pseudogenes were assigned to the newly sequenced regions by computer prediction and similarity searches against the sequences in protein and EST databases. Based on the nucleotide sequences of the clones, simple sequence repeat length polymorphism (SSLP) or derived cleaved amplified polymorphic sequence (dCAPS) markers were generated, and each clone was genetically localized onto the linkage map of two accessions of L. japonicus, MG-20 and Gifu B-129. The sequence data, gene information and mapping information are available through the World Wide Web at http://www.kazusa.or.jp/lotus/.

16.
FISH digital imaging microscopy in mosquito genomics   Total citations: 1, self-citations: 0, citations by others: 1
The yellow fever mosquito, Aedes aegypti, transmits pathogens that affect both humans and livestock, and has been the focus of extensive research to identify genetic loci that may be useful in control strategies. Fluorescence in situ hybridization (FISH) and digital imaging microscopy have provided a rapid mechanism to populate the physical map with probes derived from genetic markers, cDNAs and recombinant genomic libraries. When the physical and genetic linkage maps are aligned, map-based cloning will allow the rapid isolation of target genomic sequences. The strategy of FISH mapping and the results of initial hybridization studies are reviewed here by Martin Ferguson, Susan Brown and Dennis Knudson. An Ae. aegypti-specific genomic database, which collates data from mapping studies, sequences, references and other relevant information, is also discussed.

17.
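Yang Xinping (杨新平), Yu Changhai (于常海) 《生命科学》1999,11(4):189-191
cDNA capture, also called direct cDNA selection, is an expression-based gene isolation technique that uses genomic DNA from the region of interest to capture cDNAs directly, rapidly isolating expressed sequences from large genomic regions. The method has been applied successfully to positional cloning and to the construction of detailed transcript maps.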

18.
Using the sequence information of expressed sequence tags (ESTs), cDNAs and genes from Lotus japonicus and other legumes, 73 TAC (transformation-competent artificial chromosome) clones were selected from a genomic library of L. japonicus accession MG-20, and their nucleotide sequences were determined. The length of the DNA sequenced in this study was 7,455,959 bp, and the total length of the DNA regions sequenced so far is 26,167,443 bp together with the nucleotide sequences of 183 TAC clones previously reported. By similarity searches against the sequences in protein and EST databases and prediction by computer programs, a total of 699 potential protein-encoding genes with known or predicted functions, 163 gene segments and 267 pseudogenes were assigned to the newly sequenced regions. Based on the nucleotide sequences of the clones, simple sequence repeat length polymorphism (SSLP) or derived cleaved amplified polymorphic sequence (dCAPS) markers were generated, and each clone was located onto the linkage map of two accessions of L. japonicus, Gifu B-129 and Miyakojima MG-20. The sequence data, gene information and mapping information are available through the World Wide Web at http://www.kazusa.or.jp/lotus/.

19.
20.
FISH physical mapping with barley BAC clones   Total citations: 7, self-citations: 0, citations by others: 7
Fluorescence in situ hybridization (FISH) is a useful technique for physical mapping of genes, markers, and other single- or low-copy sequences. Since clones containing less than 10 kb of single-copy DNA do not reliably produce detectable signals with current FISH techniques in plants, a bacterial artificial chromosome (BAC) partial library of barley was constructed and a FISH protocol for detecting unique sequences in barley BAC clones was developed. The library has a 95 kb average barley insert, representing about 20% of a barley genome. Two BAC clones containing hordein gene sequences were identified and partially characterized. FISH using these two BAC clones as probes showed specific hybridization signals near the end of the short arm of one pair of chromosomes. Restriction digests of these two BAC clones were compared with restriction patterns of genomic DNA; all fragments contained in the BAC clones corresponded to bands present in the genomic DNA, and the two BAC clones were not identical. The barley inserts contained in these two BAC clones were faithful copies of the genomic DNA. FISH with four BAC clones with inserts varying from 20 to 150 kb showed distinct signals on paired chromatids. Physical mapping of single- or low-copy sequences in BAC clones by FISH will help to correlate the genetic and physical maps. FISH with BAC clones also provides an additional approach for saturating regions of interest with markers and for constructing contigs spanning those regions.
