Similar literature
20 similar records found
1.
Background

Genomic islands (GIs) are clusters of alien genes that are present in some bacterial genomes but absent from the genomes of other strains within the same genus. The detection of GIs is extremely important to the medical and environmental communities. Despite the discovery of GI-associated features, the accuracy of GI detection is still far from satisfactory.

Results

In this paper, we combined multiple GI-associated features, and applied and compared various machine learning approaches to evaluate the classification accuracy on GI datasets from three genera (Salmonella, Staphylococcus, and Streptococcus) and on a mixed dataset of all three genera. The experimental results show that, in general, the decision tree approach outperformed the other machine learning methods according to five performance evaluation metrics. Using J48 decision trees as base classifiers, we further applied four ensemble algorithms (AdaBoost, bagging, MultiBoost, and random forest) to the same datasets. We found that, overall, these ensemble classifiers could improve classification accuracy.
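The bagging idea named in the Results can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' J48-based setup: the base learner here is a one-level decision stump rather than a J48 tree, and the single feature and the toy values are hypothetical.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a one-feature threshold classifier (decision stump).

    samples: list of (feature_value, label) pairs with labels 0 or 1.
    Returns (threshold, polarity); polarity 1 predicts class 1 when x >= threshold,
    polarity 0 predicts class 1 when x < threshold.
    """
    best = None
    for t in sorted({x for x, _ in samples}):
        for polarity in (0, 1):
            errors = sum(
                1 for x, y in samples
                if ((x >= t) == bool(polarity)) != bool(y)
            )
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    return best[1], best[2]

def predict_stump(stump, x):
    t, polarity = stump
    return int((x >= t) == bool(polarity))

def bagging(samples, n_estimators=25, seed=0):
    """Train each stump on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [
        train_stump([rng.choice(samples) for _ in samples])
        for _ in range(n_estimators)
    ]

def predict_bagged(stumps, x):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, x) for s in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (feature value, 1 = GI, 0 = non-GI)
toy = [(0.01, 0), (0.02, 0), (0.03, 0), (0.04, 0),
       (0.09, 1), (0.10, 1), (0.12, 1), (0.15, 1)]
model = bagging(toy)
```

Each stump sees a slightly different resample, so the vote averages out the instability of any single tree; the other ensemble methods listed differ mainly in how the resamples or sample weights are chosen.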

Conclusions

We conclude that decision-tree-based ensemble algorithms can accurately classify GIs and non-GIs, and we recommend the use of these methods for future GI data analysis. The software package for detecting GIs can be accessed at http://www.esu.edu/cpsc/che_lab/software/GIDetector/.


2.
Botrytis cinerea is a filamentous plant pathogen of a wide range of plant species, and its infection may cause enormous damage both during plant growth and in the post-harvest phase. We have constructed a cDNA library from an isolate of B. cinerea and have sequenced 11,482 expressed sequence tags that were assembled into 1,003 contig sequences and 3,032 singletons. Approximately 81% of the unigenes showed significant similarity to genes coding for proteins with known functions: more than 50% of the sequences code for genes involved in cellular metabolism, 12% for transport of metabolites, and approximately 10% for cellular organization. Other functional categories include responses to biotic and abiotic stimuli, cell communication, cell homeostasis, and cell development. We carried out pair-wise comparisons with fungal databases to determine the B. cinerea unisequence set with relevant similarity to genes in other fungal pathogenic counterparts. Among the 4,035 non-redundant B. cinerea unigenes, 1,338 (23%) have significant homology with Fusarium verticillioides unigenes. Similar values were obtained for Saccharomyces cerevisiae and Aspergillus nidulans (22% and 24%, respectively). The lowest percentages of homology were with Magnaporthe grisea and Neurospora crassa (13% and 19%, respectively). Several genes involved in putative and known fungal virulence and general pathogenicity were identified. The results provide important information for future research on this fungal pathogen.

3.

Background  

Salinization has negative effects on plant productivity and poses an increasingly serious threat to the sustainability of agriculture. Wild soybean (Glycine soja) can survive in highly saline conditions and therefore provides an ideal candidate plant system for salt tolerance gene mining.

4.
Drought is the most crucial environmental factor limiting the productivity of many crop plants. Exploring novel genes and gene combinations is of primary importance in plant drought tolerance research. Stress-tolerant genotypes and species are known to express novel stress-responsive genes with unique functional significance. Hence, identification and characterization of stress-responsive genes from these tolerant species might be a reliable option for engineering drought tolerance. Safflower has been found to be a relatively drought-tolerant crop and was therefore chosen for characterizing the genes expressed under drought stress. In the present study, we evaluated the differential drought tolerance of two safflower cultivars, A1 and Nira, using selected physiological marker traits, and identified cultivar A1 as relatively drought tolerant. To identify the drought-responsive genes, we constructed a stress-subtracted cDNA library from cultivar A1 by subtractive hybridization. Analysis of ~1,300 cDNA clones resulted in the identification of 667 unique drought-responsive ESTs. A protein homology search revealed that 521 (78%) of the 667 ESTs showed significant similarity to known sequences in the database; the majority of them had previously been identified as drought stress-related genes and were found to be involved in a variety of cellular functions ranging from stress perception to cellular protection. The remaining 146 (22%) ESTs were not homologous to known sequences in the database and were therefore considered to be unique and novel drought-responsive genes of safflower. Since safflower is a stress-adapted oil-seed crop, this observation has great relevance. In addition, to validate the differential expression of the identified genes, expression profiles of selected clones were analyzed using dot blot (reverse northern) and northern blot analysis. We showed that these clones were differentially expressed under different abiotic stress conditions. The implications of the analyzed genes for abiotic stress tolerance are discussed in our study.

5.
Salinity is a major abiotic stress that greatly affects plant growth and crop production. Sodium ions in saline soil are toxic to plants because of their adverse effects on potassium nutrition, cytosolic enzyme activities, photosynthesis, and metabolism. It is important to identify genes involved in salinity tolerance from mangrove plants that survive under saline conditions. In this study, a total of 864 randomly selected cDNA clones were isolated and sequenced from the primary cDNA library of Acanthus ebracteatus. Among the 521 readable sequences, 138 were assembled into 43 contigs, whereas 383 were singletons. Sequence analyses demonstrated that 349 of these expressed sequence tags showed significant homology to functional proteins, of which 18% are particularly interesting as they correspond to genes involved in stress response. Some of these clones, including putative mannitol dehydrogenase, plastidic aldolase, secretory peroxidase, ascorbate peroxidase, and vacuolar H+-ATPase, may be related to osmotic homeostasis, ionic homeostasis, and detoxification.

6.
7.
Sesame (Sesamum indicum) is an important oilseed crop that produces seeds containing 50% oil, which have a distinct flavor and contain antioxidant lignans. Because sesame lignans are known to have antioxidant and health-protecting properties, metabolic pathways for lignans in developing sesame seeds have been of interest. As an initial approach to identifying genes involved in the accumulation of storage products and in the biosynthesis of antioxidant lignans, 3328 expressed sequence tags (ESTs) were obtained from a cDNA library of immature seeds 5-25 days old. ESTs were clustered and analyzed by the BLASTX or FASTAX program against the GenBank NR and Arabidopsis proteome databases. To compare gene expression profiles during development of green and non-green seeds, a comparative analysis was carried out between developing sesame and Arabidopsis seed ESTs. Analyses of these two seed EST sets have helped to identify similar and different gene expression profiles during seed development, and to identify a large number of sesame seed-specific genes. In particular, we have identified EST candidates for genes possibly involved in the biosynthesis of the sesame lignans sesamin and sesamolin, and we also suggest a possible metabolic pathway for the generation of cofactors required for the synthesis of storage lipid in non-green oilseeds. Seed-specific expression of several candidate genes has been confirmed by northern blot analysis.

8.
Lotus japonicus has received increased attention as a potential model legume plant. In order to study gene expression in reproductive organs and to identify genes that play a crucial role in sexual reproduction, we constructed a cDNA library from immature flower buds of L. japonicus containing anthers at the stage of developing tapetum cells, and characterized 919 expressed sequence tags (ESTs) randomly selected from this library. The 919 ESTs analyzed were clustered into 821 non-redundant EST groups. In a database search, 436 (53%) of the 821 groups showed sequence similarity to genes registered in the public database. Of these 436 groups, 109 showed similarity to genes encoding hypothetical proteins whose function had not yet been determined. Three hundred eighty-five groups (47%) showed no significant homology to known sequences and were classified as novel sequences. A comparison of the 821 non-redundant EST sequences with EST sequences derived from the whole plant of L. japonicus revealed that 474 EST sequences derived from immature flower buds were not found among the EST sequences of the whole plant. In order to confirm the expression pattern of potential reproductive-organ-specific EST clones, nine clones that did not match ESTs derived from the whole plant were selected and analyzed by RT-PCR. The RT-PCR analysis revealed two novel anther-specific clones. One clone was homologous to a gene encoding a human cleft lip and palate associated transmembrane protein (CLPTM1)-like protein, and the other clone did not show significant similarity to any gene deposited in the public database. These results indicate that the ESTs analyzed here represent a valuable resource for finding reproductive-organ-specific genes in Lotus japonicus.
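The group counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only restates the reported numbers; the variable and function names are illustrative and do not come from the paper.

```python
def share(part, total):
    """Percentage of `total` represented by `part`."""
    return 100.0 * part / total

total_groups = 821      # non-redundant EST groups
with_similarity = 436   # groups similar to genes in the public database
novel = 385             # groups with no significant homology

# The two categories partition the 821 groups
assert with_similarity + novel == total_groups

pct_similar = round(share(with_similarity, total_groups))
pct_novel = round(share(novel, total_groups))
print(pct_similar, pct_novel)
```

Rounded to whole percentages, the two shares agree with the 53% and 47% reported in the abstract.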

9.
10.
11.
12.
A normalized cDNA library was constructed from the adductor muscle of M. yessoensis, and 4595 high-quality expressed sequence tags (ESTs) were acquired. After clustering and assembly of the ESTs, 3061 unigenes comprising 654 contigs and 2407 singletons were identified. The contig length ranged from 266 bp to 2364 bp, with an average of 544 bp. A Blastx search against the nonredundant protein database showed that 1522 unigenes had significant homology to known genes (E value ≤ 10^-5). By comparison with Clusters of Orthologous Groups (COG) categories, 460 unigenes were annotated (E value ≤ 10^-10). Using the Kyoto Encyclopedia of Genes and Genomes (KEGG), 345 of the 3061 unigenes were assigned to 103 pathways (E value ≤ 10^-5). In InterProScan searches, 1237 unigenes were annotated, containing 727 different types of protein domains. Of these 1237 unigenes, 941 were annotated for Gene Ontology (GO) classification in at least one category (biological, cellular, and molecular) using Uniprot2GO associations. By sequence comparison and analysis against the Blastx NCBI nonredundant protein database and KEGG, 66 unigenes were identified that may be involved in genetic information processing, based on current knowledge. The study provides useful information and a material basis for the genomic analysis of shellfish.
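The E-value cutoffs in this abstract amount to a simple filter over search hits. The following is a minimal sketch under stated assumptions: the hit list, the unigene identifiers, and the function name are hypothetical, and only the 10^-5 cutoff and the unigene counts come from the abstract.

```python
def annotated_unigenes(hits, max_evalue=1e-5):
    """Return the IDs of unigenes with at least one hit at or below the cutoff.

    hits: iterable of (unigene_id, evalue) pairs, e.g. parsed from tabular
    search output (the parsing step is not shown here).
    """
    return {uid for uid, evalue in hits if evalue <= max_evalue}

# Hypothetical hits: unigene IDs and E values are made up for illustration
hits = [
    ("U0001", 3e-42),
    ("U0001", 2e-3),   # weaker second hit for the same unigene
    ("U0002", 1e-5),   # exactly at the cutoff, so it is kept
    ("U0003", 4e-4),   # above the cutoff, so it is dropped
]
kept = annotated_unigenes(hits)

# Counts reported in the abstract
contigs, singletons, blastx_annotated = 654, 2407, 1522
unigenes = contigs + singletons
blastx_share = round(100.0 * blastx_annotated / unigenes, 1)
```

With the reported counts, `unigenes` equals the 3061 stated in the abstract, and the Blastx-annotated fraction works out to just under half of them.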

13.
14.
15.
For comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 22,983 5' end expressed sequence tags (ESTs) were accumulated from normalized and size-selected cDNA libraries constructed from young (2-week-old) plants. The EST sequences were clustered into 7137 non-redundant groups. A similarity search against the public non-redundant protein database indicated that 3302 groups showed similarity to genes of known function and 1143 groups to hypothetical genes, while 2692 were novel sequences. Homologues of 5 nodule-specific genes that have been reported in other legume species were found among the collected ESTs, suggesting that the EST resource generated in this study will become a useful tool for identifying genes related to legume-specific biological processes. The sequence data of individual ESTs are available at the web site: http://www.kazusa.or.jp/en/plant/lotus/EST/.

16.
Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many projects are currently under way, leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited owing to the non-availability of reliable molecular markers. A large body of EST sequence data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. A total of 25,495 EST sequences were examined and assembled to obtain full-length sequences. Mononucleotide SSR motifs showed the highest frequency (60.44% in contigs and 62.16% in singletons), whereas the lowest frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The most frequent trinucleotide motifs code for glutamic acid (GAA), while AT/TA was the most frequent dinucleotide repeat. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of the SSR-containing sequences was carried out using the gene ontology terms biological process, cellular component, and molecular function.
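Extracting SSRs from EST sequences, as described in this abstract, can be illustrated with a small regular-expression scan. This is a sketch only: the abstract does not name the tool or the repeat thresholds used, so the minimum repeat counts below are assumptions, and the example sequence is invented.

```python
import re

# Assumed minimum number of repeats per motif length (not from the abstract)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq):
    """Return (motif, repeat_count, start) for each perfect SSR found in seq."""
    seq = seq.upper()
    found = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT"
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            found.append((motif, len(m.group(0)) // unit_len, m.start()))
    return found

# Invented example: an (AT)7 dinucleotide SSR and a (GAA)5 trinucleotide SSR
example = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"
ssrs = find_ssrs(example)
```

The reported start positions are zero-based; flanking primers would then be designed on the sequence on either side of each hit, which is outside the scope of this sketch.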

17.
18.
19.
Theobroma cacao L. expressed sequence tags (ESTs) were converted into useful genetic markers for fingerprinting individuals and for genetic linkage mapping. Primers were designed for microsatellite-containing ESTs. Twenty-two T. cacao accessions, parents of various mapping populations segregating for disease resistance and crop yield characteristics, were tested. Twenty-seven informative loci were discovered with 26 primer pairs. The number of detected alleles ranged from two to 11 and averaged 4.4 per locus. All 27 markers could be mapped in at least one of the existing F1 or F2 populations segregating for agronomically important traits.

20.