Gini Index, Information Gain, Mutual Information And Entropy
Tell Me …
Gini index, information gain, and entropy are three commonly used measures in decision tree algorithms for feature selection. They are used to determine the best way to split the data into subsets based on the values of a feature.
The Gini index is a measure of impurity used in classification problems. It gives the probability of misclassifying a randomly chosen item if it were labeled according to the class distribution of the dataset. A value of 0 means all items belong to the same class; the maximum, 1 - 1/n for n classes, is reached when the items are evenly distributed across all classes. Gini index:

Gini(D) = 1 - \sum_{k=1}^{n} p_k^2
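A minimal sketch of how this could be computed from a list of class labels; the function name `gini` is illustrative, not from any particular library:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions.

    Returns 0.0 when all labels are identical; approaches 1 - 1/n
    when the labels are spread evenly over n classes.
    """
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini(["a", "a", "a"]))       # 0.0  (pure node)
print(gini(["a", "a", "b", "b"]))  # 0.5  (maximum for two classes)
```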
Information gain is the reduction in entropy that results from splitting a dataset on the values of a feature. Entropy (defined below) measures the randomness or uncertainty of a dataset: a value of 0 indicates that the dataset is completely pure, while the maximum (1 for a binary problem, log2(n) in general) indicates that the classes are equally likely. The information gain of a split is equivalent to the mutual information between the feature and the class label. Information gain:

IG(D, A) = H(D) - \sum_{v \in Values(A)} \frac{|D_v|}{|D|} H(D_v)
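A sketch of this computation under the assumption that the data arrive as two parallel lists, one of class labels and one of feature values; `entropy` and `information_gain` are illustrative names:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy: -sum(p_k * log2(p_k)) over class proportions."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """IG(D, A) = H(D) minus the size-weighted entropy of the subsets
    of D induced by each distinct value of feature A."""
    total = len(labels)
    subsets = {}
    for label, value in zip(labels, feature_values):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# A feature that perfectly separates the classes recovers
# all of the entropy, so IG = H(D).
labels  = ["yes", "yes", "no", "no"]
feature = ["sunny", "sunny", "rainy", "rainy"]
print(information_gain(labels, feature))  # 1.0
```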
Entropy is a measure of the impurity of a dataset. It plays the same role as the Gini index but is computed with a logarithm, so the two criteria weight mixed nodes slightly differently. Entropy:

H(D) = -\sum_{k=1}^{n} p_k \log_2 p_k
where D := dataset; A := feature being considered; Values(A) := distinct values of A; D_v := subset of D where A takes value v; n := number of classes; p_k := proportion of items in class k; H := entropy.
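In practice, both impurity measures are exposed as split criteria in common libraries; a quick sketch, assuming scikit-learn is installed, that trains the same tree under each criterion:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for criterion in ("gini", "entropy"):
    # criterion selects the impurity measure used to score candidate splits
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0).fit(X, y)
    print(criterion, clf.score(X, y))
```

On most datasets the two criteria produce very similar trees; entropy is marginally more expensive to compute because of the logarithm.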