• Abdurrahman Abdurrahman STMIK BANDUNG
Keywords: web usage mining, classification of web users, ant colony optimization, Ant-Miner algorithm, cAnt-Miner algorithm, heuristic functions, preprocessing, accuracy of rules, simplification of rules


Web Usage Mining (WUM) is the application of data mining methods to extract knowledge from web usage data. One function of WUM is to support Business Intelligence (BI), for which an important piece of information is the classification of web users, which can be used for user acquisition, penetration, and retention activities. Two main problems arise in classifying web users. The first is determining the antecedent attributes that serve as terms of the classification rules, which is a major problem for the classification function of data mining in general. The second is preprocessing, which involves preparing the supporting data needed for web user classification and is the most difficult stage in WUM.

For web user classification, we propose a method based on ant colony optimization (ACO), a distributed intelligent system, using a heuristic function suited to the problem domain. The proposed heuristic function for classifying web users from web usage data uses the entropy of the antecedent candidate, the information gain from the attribute for the total number of web user accesses, and the average access duration of the web user. For preprocessing, we propose a data preparation method that supports the needs of web user classification. The data used consist of web access log data, web user profile data, and web transaction data. Preprocessing consists of parsing, data cleansing, and extraction of web user sessions using a heuristic method based on web page access timeout and differences in web browser agent.
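The session extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout `(user_id, timestamp, user_agent, url)`, the function name `extract_sessions`, and the 30-minute default timeout are all assumptions made for illustration. It only shows the general idea of starting a new session when the gap between requests exceeds a timeout or when the browser agent changes.

```python
from datetime import datetime, timedelta


def extract_sessions(records, timeout=timedelta(minutes=30)):
    """Group log records into sessions (illustrative sketch only).

    Each record is a hypothetical tuple (user_id, timestamp, user_agent, url).
    A new session starts when a user is seen for the first time, when the gap
    since that user's previous request exceeds `timeout` (assumed 30 minutes),
    or when the browser agent differs from the previous request.
    """
    sessions = []  # list of (user_id, user_agent, [(timestamp, url), ...])
    last = {}      # user_id -> (last timestamp, last agent, current page list)
    for user, ts, agent, url in sorted(records, key=lambda r: (r[0], r[1])):
        prev = last.get(user)
        if prev is None or ts - prev[0] > timeout or agent != prev[1]:
            pages = []
            sessions.append((user, agent, pages))
        else:
            pages = prev[2]
        pages.append((ts, url))
        last[user] = (ts, agent, pages)
    return sessions


if __name__ == "__main__":
    log = [
        ("u1", datetime(2020, 1, 1, 10, 0), "AgentA", "/home"),
        ("u1", datetime(2020, 1, 1, 10, 10), "AgentA", "/cart"),
        ("u1", datetime(2020, 1, 1, 11, 0), "AgentA", "/home"),
    ]
    for user, agent, pages in extract_sessions(log):
        print(user, agent, [url for _, url in pages])
```

In practice the user identifier might be an IP address or a login name taken from the profile data, and the timeout would be tuned to the site; both choices are left open here.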

Testing compares the performance of the proposed algorithm with that of the Ant-Miner, cAnt-Miner, and Continuous Ant-Miner algorithms. Results on four web data sets show that the proposed algorithm performs better in terms of rule accuracy and rule simplicity.

