Case study, SPSS, data analysis

A Study of Decision Trees in Data Mining


Full text length: about 9,000 characters  Written: 2022 or earlier

[Summary]


Abstract (Chinese version)

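In the era of information explosion, data mining technology has developed rapidly. Data mining is the process of analyzing data with automated or semi-automated tools in order to uncover hidden patterns in large, noisy, and ambiguous data sets. Among the many data mining methods, the decision tree is both efficient and simple. This paper takes the decision tree as its main object of study and focuses on its two stages: tree generation and pruning. The basic idea of a decision tree is to build a tree from a batch of known training data and then use that tree to make predictions on data. Pruning simplifies the tree, improves its generalization ability, and avoids overfitting the training set, which makes it an important research topic in decision tree learning. This paper discusses in detail the theoretical foundations of the ID3 and C4.5 algorithms, which are based on information entropy, and of the CART algorithm, which is based on the Gini index, and briefly compares other decision tree algorithms in terms of their splitting criteria and pruning methods.
Keywords: data mining  decision tree  ID3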

Abstract

    In the era of information explosion, data mining technology is developing rapidly. Data mining is the process of analyzing data with automated or semi-automated tools to extract implicit patterns from large volumes of noisy and ambiguous data. Among the many data mining techniques, the decision tree is an efficient and concise method. This paper focuses on the decision tree and studies its two stages, tree generation and pruning. A decision tree is first built from known training data, and the resulting tree is then used to make predictions on data. Pruning is an important topic in decision tree induction: it simplifies the tree, improves its generalization ability, and avoids overfitting the training set. We discuss in detail the theory behind the ID3 and C4.5 algorithms, which are based on information entropy, and the CART algorithm, which is based on the Gini index, and briefly compare other decision tree algorithms in terms of their splitting criteria and pruning methods.
Key words: Data mining  Decision tree  ID3
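
The two families of splitting criteria named in the abstract can be illustrated with a minimal sketch. It assumes categorical attributes and uses a tiny made-up data set; the function names, attribute names, and rows are hypothetical and are not taken from the paper. Information gain is the entropy-based criterion used by ID3 (C4.5 refines it into a gain ratio), and Gini impurity is the quantity CART minimizes. CART proper uses binary splits, so the multiway `weighted_gini` below is a simplification.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_by(rows, labels, attr):
    """Group the labels by the value each row takes on attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on `attr` (ID3-style criterion)."""
    n = len(labels)
    groups = split_by(rows, labels, attr).values()
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder


def weighted_gini(rows, labels, attr):
    """Size-weighted Gini impurity after a multiway split on `attr`.

    Simplification: CART itself only considers binary splits.
    """
    n = len(labels)
    groups = split_by(rows, labels, attr).values()
    return sum(len(g) / n * gini(g) for g in groups)


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]

# Attribute an ID3-style learner would choose at the root.
best_attr = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
```

On this toy data, splitting on `outlook` yields pure child nodes, so it has the higher information gain and the lower weighted Gini impurity, and both criteria pick it for the root. A full tree learner would apply the same selection recursively to each child node and then prune the result.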
 

 
