Data Mining Methods with Trees

Information management

Name and surname of author:

Marta Žambochová

Year:

2008

Issue:

1

Keywords:

data mining, decision tree, ID3 algorithm

JEL classification:

C19 - Econometric and Statistical Methods: Other, C44 - Statistical Decision Theory - Operations Research, C63 - Computational Techniques - Simulation Modeling

DOI (& full text):

http://

Annotation:

The present world is characterized by an ever-growing volume of data collected and stored in databases. Such data often cannot be analysed with standard statistical methods because they contain many missing values or are qualitative, and because some databases are very large. Every organization must be able to extract important information from an extensive database. These were the main reasons why data mining emerged. Tree structures are used in many diverse areas, and frequently in statistical data analysis, particularly in data mining. This paper describes decision trees, their data structure and their implementation in statistical data analysis. Decision trees offer a non-algebraic method for partitioning data. They are attractive because they offer visualization, simplicity of interpretation and high accuracy. They can be used to solve various classification and prediction tasks, and they are well suited to supporting managers in decision-making processes. Decision trees are also used to segment clients into groups in order to prepare special offers and campaigns. They can predict potential debtors, which helps to decide whether to grant or refuse a loan or insurance to a particular customer. Decision trees are also used to predict the potential of a new product designed for a targeted customer, to detect insurance fraud, or to forecast the number of people who will attend a competition, and so on. Quite a few algorithms for building decision trees have been described and are in use. Two of the basic ones are the ID3 algorithm and its improved version C4.5, both by J. R. Quinlan. The first is very illustrative and is important for acquiring a basic understanding of decision trees. The article contains an example application of the ID3 algorithm.
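
To give a sense of how ID3 partitions data, the following is a minimal Python sketch of the general idea: at each node, split on the attribute with the largest information gain (entropy reduction). It is not the example from the article. The function names, the attribute names (`income`, `debt`, `loan`) and the five-row data set are hypothetical and were made up for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def id3(rows, attributes, target):
    """Build a tree as nested dicts; leaves are class labels."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:          # pure node: stop
        return labels[0]
    if not attributes:                 # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {
        best: {
            value: id3([r for r in rows if r[best] == value], remaining, target)
            for value in sorted({r[best] for r in rows})
        }
    }


# Hypothetical toy data (invented for this sketch, not taken from the article).
rows = [
    {"income": "high", "debt": "low", "loan": "yes"},
    {"income": "high", "debt": "high", "loan": "yes"},
    {"income": "high", "debt": "high", "loan": "yes"},
    {"income": "low", "debt": "low", "loan": "yes"},
    {"income": "low", "debt": "high", "loan": "no"},
]

tree = id3(rows, ["income", "debt"], "loan")
print(tree)
```

On this toy data the sketch splits on `income` first, because that attribute has the larger information gain, and then on `debt` within the low-income group. The sketch handles only categorical attributes and does no pruning; C4.5 extends this basic scheme.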

Section:

Information management

Appendix (online electronic version):

14_zambochova.pdf (290.06 kB)

Links: