oalib

Search Results: 1-10 of 21 matches for "datamining"
All listed articles are free to download (OA articles).
Page 1 /21
A Perspective Missing Values In Data mining Applications
Dr. S. S. Dhenakaran, T. Kalaivani
International Journal of Engineering Trends and Technology, 2012
Abstract: In a large database, some attributes may have missing values. These missing values are computed by first identifying whether the attribute is discrete or continuous and then calculating the values using the mean or median. In this paper, the calculated values are used to impute the missing values in the data set. Methods are discussed for learning and applying decision rules to classify data with many missing values. A method is presented to induce decision rules from data with missing values in which the format of the rules is no different from that of rules learned from complete data, and no special preparation of the original data is required. Missing values complicate both the learning process and the application of the learned solution to new data. The most common preprocessing technique involves filling in the missing values.
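The discrete/continuous split and mean/median fill described in this abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' method: it assumes missing entries are stored as `None`, fills continuous attributes with the mean of the observed values, and fills discrete attributes with `statistics.median_low` so the filled value is always one that was actually observed. The function name `impute_column` is hypothetical.

```python
from statistics import mean, median_low
from typing import List, Optional, Union

Number = Union[int, float]


def impute_column(values: List[Optional[Number]], discrete: bool) -> List[Number]:
    """Fill the None entries of one attribute column.

    Assumption for illustration: continuous attributes use the mean of the
    observed values; discrete attributes use median_low, which always
    returns one of the observed values.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = median_low(observed) if discrete else mean(observed)
    return [fill if v is None else v for v in values]
```

For example, `impute_column([1.0, None, 3.0], discrete=False)` returns `[1.0, 2.0, 3.0]`.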
Nitrogen assimilation in Citrus based on CitEST data mining
Wickert, Ester; Marcondes, Jackson; Lemos, Manoel Victor; Lemos, Eliana G. M.
Genetics and Molecular Biology, 2007, DOI: 10.1590/S1415-47572007000500009
Abstract: The assimilation of nitrate and ammonium is vital for plant development and growth. Starting from these primary pathways of inorganic nitrogen assimilation, nitrogen metabolism integrates diverse pathways for the biosynthesis of macromolecules, such as amino acids and nucleotides, and central intermediate metabolism, such as carbon metabolism and photorespiration. This paper reports a search of the CitEST (Citrus Expressed Sequence Tag) database for the main genes involved in nitrogen metabolism that were previously described in other organisms. The results show that a complete cluster of genes involved in nitrogen assimilation and in the metabolism of glutamine, glutamate, aspartate, and asparagine can be found in the CitEST data. The main enzymes found were nitrate reductase (NR), nitrite reductase (NiR), glutamine synthetase (GS), glutamate synthase (GOGAT), glutamate dehydrogenase (GDH), aspartate aminotransferase (AspAT), and asparagine synthetase (AS). The enzymes involved in this metabolism were shown to be highly conserved among Citrus and Poncirus species. This work serves as a guide for future functional analysis of these enzymes in citrus.
Missing Value Imputation using Refined Mean Substitution
R. S. Somasundaram, R. Nedunchezhian
International Journal of Computer Science Issues, 2012
Abstract: Previous work [1] clearly showed that a very simple imputation method based on the Most Common attribute value, called MC, performed better than several complex imputation algorithms, and that MC performed almost as well as the best-performing imputation method, Event Covering (EC). This work therefore attempts to improve the performance of the simple MC method and proposes a new algorithm. The proposed algorithm is compared with other simple and efficient imputation methods, with performance measured at different percentages of missing values in the data set. The standard WDBC data set is used for the evaluation. The proposed algorithm performed very well, and the results obtained were significant and comparable.
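A minimal sketch of the MC (Most Common attribute value) baseline this abstract builds on, assuming missing entries are `None` and each attribute is imputed independently. The refined algorithm proposed in the paper is not described in the abstract, so it is not reproduced here; `most_common_impute` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, List, Optional


def most_common_impute(values: List[Optional[Hashable]]) -> List[Hashable]:
    """MC imputation: replace each None with the attribute's most common value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("attribute has no observed values")
    # most_common(1) returns [(value, count)]; equal counts are ordered by
    # first occurrence, so ties go to the value seen first.
    mode, _count = Counter(observed).most_common(1)[0]
    return [mode if v is None else v for v in values]
```

For example, `most_common_impute(["a", None, "b", "a"])` returns `["a", "a", "b", "a"]`.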
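Feature Selection for Classificatory Analysis Based on Information-Theoretic Criteria (分类分析中基于信息论准则的特征选取)
Huang Jinjie (黄金杰), Lü Ning (吕宁), Li Shuangquan (李双全), Cai Yunze (蔡云泽)
Acta Automatica Sinica (自动化学报), 2008, DOI: 10.3724/SP.J.1004.2008.00383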
Abstract: Feature selection aims to reduce the dimensionality of patterns for classificatory analysis by selecting the most informative features instead of irrelevant and/or redundant ones. In this study, two novel information-theoretic measures for feature ranking are presented. The first is an improved formula for estimating the conditional mutual information between a candidate feature f_i and the target class C given the subset of already-selected features S, i.e., I(C; f_i | S), under the assumption that the information of features is distributed uniformly. The second is a mutual information (MI) based constructive criterion that can capture both irrelevant and redundant input features under arbitrary distributions of feature information. With these two measures, two new feature selection algorithms are proposed, called the quadratic MI-based feature selection (QMIFS) approach and the MI-based constructive criterion (MICC) approach, in which no parameter such as β in Battiti's MIFS and Kwak and Choi's MIFS-U methods needs to be preset. Thus, the intractable problem of how to choose an appropriate value of β to trade off relevance to the target classes against redundancy with the already-selected features is avoided completely. Experimental results demonstrate the good performance of QMIFS and MICC on both synthetic and benchmark data sets.
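For context, the β-weighted criterion that QMIFS and MICC are designed to avoid is Battiti's MIFS, which greedily adds the feature f maximizing I(C; f) - β Σ_{s∈S} I(f; s). The sketch below implements that baseline, not the paper's QMIFS or MICC, using plug-in mutual information estimates on discrete data; `mutual_information` and `mifs` are hypothetical names.

```python
from collections import Counter
from math import log2
from typing import List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # I(X;Y) = sum_{a,b} p(a,b) * log2(p(a,b) / (p(a) p(b)))
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items()
    )


def mifs(features: List[Sequence], target: Sequence, k: int, beta: float) -> List[int]:
    """Battiti's greedy MIFS: repeatedly add the feature f maximising
    I(C; f) - beta * sum_{s in S} I(f; s), where S is the selected set."""
    relevance = [mutual_information(f, target) for f in features]
    selected: List[int] = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            redundancy = sum(mutual_information(features[i], features[s]) for s in selected)
            return relevance[i] - beta * redundancy

        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected
```

On a four-class target whose two bits are carried by a feature `high`, an exact duplicate of `high`, and a feature `low`, β = 1 selects `high` then `low`, while β = 0 selects `high` then the duplicate. That is the relevance/redundancy trade-off the abstract says must otherwise be tuned by hand.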
A Data Mining Based Approach to Customer Behaviour in an Electronic Settings
A. Tope-Oke, C. A. Afolalu, O. Omofade
Journal of Computer and Communications (JCC), 2019, DOI: 10.4236/jcc.2019.75004
Abstract: Understanding customer incidents and behaviour is crucial to the success of any organization. Evidence from the literature shows patterns for predicting products for customers; these studies predicted product characteristics while leaving out customer characteristics. To address this gap, this study designs a data mining system and implements it on the website of an e-commerce organization. Customer information and history (clickstreams) from the e-commerce website were used to predict customer behaviour, giving organizations meaningful and usable data patterns. The data mining system was built in Python, while PHP, HTML, and JavaScript were used for the e-commerce website. The paper briefly reviews the background of e-commerce and data mining and previous research on data mining in e-commerce settings, and establishes how those findings relate to this work. The data mining system uses a consensus clustering technique together with a clustering algorithm that follows a graphical-based approach. The interaction between the data mining system and the customer dataset on the e-commerce website was also defined. The system generated quantitative evidence for determining the number and membership of possible customer behavioural clusters within the dataset.
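The abstract does not spell out its consensus clustering procedure. One common form, evidence accumulation, combines several base clusterings through a co-association matrix. The sketch below assumes that approach: it links every pair of items that were co-clustered in more than `threshold` of the runs and returns the connected components as consensus clusters. It is an illustration under those assumptions, not the paper's graphical-based method, and `consensus_labels` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def consensus_labels(labelings: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Combine several base clusterings of the same n items into one.

    Builds a co-association count (how many base clusterings place items
    i and j together) and links every pair whose fraction exceeds
    `threshold`; connected components become the consensus clusters.
    """
    n = len(labelings[0])
    m = len(labelings)
    together = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                together[i][j] += 1

    parent = list(range(n))  # union-find forest over items

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if together[i][j] / m > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    relabel: Dict[int, int] = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]
```

With base labelings `[[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]`, the call returns `[0, 0, 1, 1, 1]`: items 0 and 1 are together in every run, and the pairs 2-3 and 3-4 are each together in two of the three runs.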
Data Mining of Cellulose Synthase Genes Related to Eucalyptus spp. Expressed Sequence Tags (Scientific Note)
Léo Zimback, Edson Seizo Mori, Mário Luiz Teixeira de Moraes, Edson Luiz Furtado
Revista do Instituto Florestal, 2008
Abstract: This study applies data mining to the expressed sequence tags (ESTs) of the Eucalyptus Genome Project (FORESTs) to find genes linked to non-hormonal growth control, specifically cellulose synthase genes, compared at the amino acid level. Using sequences derived from cDNA libraries induced and not induced by bacteria, the EST clusters EGBGFB1211D01.g, EGEZRT6201E10.g, EGCCFB1220G07.g, EGRFCL1206E01.g, EGEQST2006A06.g, EGRFCL1206E01.g, EGEQRT3001H05.b, and EGBFRT3106G11.g were identified as similar to the cellulose synthase proteins and subunits that control growth in Arabidopsis thaliana, Gossypium hirsutum, Populus tremuloides, Zea mays, and Nicotiana alata, as registered at the National Center for Biotechnology Information (NCBI). These mining results provide valuable information for future Eucalyptus breeding programs.
Intelligent Pattern Mining and Data Clustering for Pattern Cluster Analysis using Cancer Data
G. Raj Kumar, Dr. K. Duraiswamy, M. Thangamani, Dr. P. Thangaraj
International Journal of Engineering Science and Technology, 2010
Abstract: Data mining techniques are used for knowledge discovery in large data set environments. Clustering techniques group related data, using hierarchical or partitional clustering methods; clustering is a complex task with long processing times. A pattern extraction scheme, based on association rule mining, is applied to find frequent itemsets. The pattern extraction scheme and the clustering scheme are integrated into a simultaneous pattern extraction and clustering scheme, in which clustering is improved through pattern comparison and transaction transfer. The simultaneous clustering scheme is implemented to analyze cancer patient diagnosis reports. The system is implemented as four major modules: data set management, pattern extraction, clustering, and performance analysis. The data sets are preprocessed before pattern extraction, and the extracted patterns are used in the simultaneous clustering process. The performance analysis compares the data clustering scheme with the pattern clustering schemes, using processing time and memory usage as measures. Cluster accuracy is represented by fitness values. The system is enhanced with the K-means clustering algorithm.
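The K-means enhancement mentioned at the end of this abstract refers to a standard algorithm; a minimal sketch of Lloyd's K-means in Python follows. It covers only plain K-means on numeric points, not the simultaneous pattern extraction and clustering scheme, and the `seed` and `iters` defaults are arbitrary choices for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: Sequence[Point], k: int, iters: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids, labels
```

Random initialization can land in a poor local optimum on harder data, which is why practical systems usually run several restarts or use a seeding scheme such as k-means++.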
Sampling based Association Rules Mining - A Recent Overview
V. Umarani, Dr. M. Punithavalli
International Journal on Computer Science and Engineering, 2010
Abstract: Association rule discovery from large databases is a tedious task in data mining. Frequent itemset mining, the first step in mining association rules, is a computation- and I/O-intensive process that requires repeated passes over the entire database. Sampling has often been suggested as an effective way to reduce the size of the data set operated on, at some cost to accuracy. The data mining literature presents numerous sampling-based approaches to speed up association rule mining (ARM). Sampling is an important and popular data reduction technique for mining huge volumes of data efficiently, and it can speed up the mining of association rules. In this paper, we provide an overview of existing sampling-based association rule mining algorithms.
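To illustrate the general idea this survey covers, the sketch below mines a random sample at a lowered support threshold and then verifies the candidates with one pass over the full data. It is a simplified variant under stated assumptions: it uses brute-force itemset counting (only practical for small transactions) and omits the negative-border check of Toivonen's algorithm, so it never reports a false positive but can miss itemsets that the sample underrepresents. The names `frequent_itemsets`, `sample_then_verify`, and `slack` are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: List[Set[str]], min_support: float, max_size: int = 3) -> Set[Itemset]:
    """Brute-force count of every itemset of up to max_size items."""
    counts: Counter = Counter()
    for t in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(t), size):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {s for s, c in counts.items() if c / n >= min_support}


def sample_then_verify(transactions: List[Set[str]], min_support: float, sample_size: int,
                       slack: float = 0.8, seed: int = 0) -> Set[Itemset]:
    """Mine a random sample at a lowered threshold, then verify on the full data.

    The lowered threshold (min_support * slack) reduces, but does not remove,
    the chance of missing itemsets that are frequent overall.
    """
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)
    candidates = frequent_itemsets(sample, min_support * slack)
    n = len(transactions)
    # One pass over the full database, counting only the candidates.
    return {c for c in candidates if sum(c <= t for t in transactions) / n >= min_support}
```

The accuracy cost the abstract mentions shows up here as possible false negatives; Toivonen's negative-border check is the usual way to detect them and trigger a second pass.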
Application of SIG and OLAP technologies on IBGE databases as a decision support tool for the county administration
Rego, E. A.; Galante, A. C.; Brito, J. L. N. S.
Salesian Journal on Information Systems, 2008
Abstract: This paper presents the development of a decision support system for any Brazilian county. The system is free of any cost. To build it, data warehouse, OLAP, and GIS technologies are combined with IBGE's database to give the user a query-building tool that shows the results as maps and/or tables in a very simple and efficient way.
A Schematic Technique Using Data type Preserving Encryption to Boost Data Warehouse Security
M. Sreedhar Reddy, M. Rajitha Reddy, R. Viswanath, G. V. Chalam
International Journal of Computer Science Issues, 2011
Abstract: A typical data warehouse often contains information that must be considered extremely sensitive and proprietary. Protecting this information, important as it is, is too often complicated by the presence of diverse computing environments, managerial issues, difficulties in controlling data distribution, and careless attitudes towards information security. We present an information protection method based on an encryption scheme that preserves the data type of the plaintext. We believe this method is particularly well suited to complex data warehouse environments.
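The abstract does not describe its encryption scheme. As a toy illustration of what data type preservation means, the sketch below encrypts an even-length string of digits to another digit string of the same length, using a balanced Feistel network with an HMAC-SHA256 round function. This is an assumption for illustration only: it is not the paper's scheme, not NIST FF1/FF3-1, and has not been security-reviewed; all function names are hypothetical.

```python
import hashlib
import hmac
from typing import Tuple


def _round_value(key: bytes, rnd: int, half: int, modulus: int) -> int:
    """Keyed round function F: HMAC-SHA256 of (round number, half), reduced mod 10**h."""
    msg = rnd.to_bytes(1, "big") + half.to_bytes(16, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big") % modulus


def _split(digits: str) -> Tuple[int, int, int, int]:
    # 76 digits max keeps each half below 10**38 < 2**128, so it fits in 16 bytes.
    if not (digits.isascii() and digits.isdigit()) or len(digits) % 2 or len(digits) > 76:
        raise ValueError("sketch expects an even-length ASCII digit string of at most 76 digits")
    h = len(digits) // 2
    return h, 10 ** h, int(digits[:h]), int(digits[h:])


def encrypt_digits(key: bytes, plaintext: str, rounds: int = 10) -> str:
    """Balanced Feistel network over two h-digit halves: the output is again 2h digits."""
    h, modulus, left, right = _split(plaintext)
    for r in range(rounds):
        left, right = right, (left + _round_value(key, r, right, modulus)) % modulus
    return f"{left:0{h}d}{right:0{h}d}"


def decrypt_digits(key: bytes, ciphertext: str, rounds: int = 10) -> str:
    """Run the rounds backwards: L = (R' - F(L')) mod 10**h, R = L'."""
    h, modulus, left, right = _split(ciphertext)
    for r in reversed(range(rounds)):
        left, right = (right - _round_value(key, r, left, modulus)) % modulus, left
    return f"{left:0{h}d}{right:0{h}d}"
```

Because each round is invertible, `decrypt_digits(key, encrypt_digits(key, "4111111111111111"))` returns the original string, and the ciphertext is still a 16-digit numeric value that fits the original column type.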
Copyright © 2008-2017 Open Access Library. All rights reserved.