|
Prediction for high-value mobile communication users based on efficient feature selection |
Submitted: 2017-01-06 |
DOI: |
Keywords: mobile communication user; imbalanced dataset; feature selection; Pearson correlation analysis; random forest; prediction model |
Funding: Supported by the National Natural Science Foundation of China (60975031). |
|
Abstract: |
Predicting high-value mobile communication users is an important part of telecom customer relationship management. To address the data-processing problems that arise when building such a prediction model, namely high dimensionality, large scale, and class imbalance, this paper proposes a prediction method based on efficient feature selection. Multiple balanced training sets are first extracted from the initial imbalanced dataset by under-sampling. A feature selection strategy that combines Pearson correlation analysis with random forest feature-importance assessment is then applied, and a weighting and voting mechanism embedded in the ensemble learning method yields the best feature subset. Finally, the prediction model is built with the random forest algorithm. Experimental results show that the proposed model effectively reduces the dimension of the feature set and improves prediction performance for high-value mobile communication users. |
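The under-sampling and voting steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not give the weighting scheme, the thresholds, or how the random forest importance scores are combined with the Pearson scores, so this sketch ranks features by absolute Pearson correlation alone and leaves the random forest step out. The function names and the parameters `n_subsets`, `top_k`, and `min_votes` are hypothetical, and binary labels are assumed.

```python
import random
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def balanced_subsets(labels, n_subsets, rng):
    """Under-sample the majority class into n_subsets balanced index lists."""
    minority_label = min(Counter(labels).items(), key=lambda kv: kv[1])[0]
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Every subset keeps all minority rows plus an equal-sized majority sample.
    return [minority + rng.sample(majority, len(minority))
            for _ in range(n_subsets)]


def select_features(rows, labels, n_subsets=5, top_k=2, min_votes=3, seed=0):
    """Keep features that rank in the top_k on at least min_votes subsets.

    Assumption: ranking uses |Pearson r| with the label only; the paper
    also uses random forest importance, which is omitted here.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    votes = Counter()
    for idx in balanced_subsets(labels, n_subsets, rng):
        ys = [labels[i] for i in idx]
        scores = [abs(pearson([rows[i][j] for i in idx], ys))
                  for j in range(n_features)]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        votes.update(ranked[:top_k])  # one vote per subset for each top feature
    return sorted(j for j, v in votes.items() if v >= min_votes)
```

Calling `select_features(rows, labels)` with `rows` as a list of numeric feature vectors and `labels` as a list of 0/1 class labels returns the indices of the features that survive the vote. A final classifier, a random forest in the paper, would then be trained on those columns only.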