Gini Index, Information Gain, Mutual Information And Entropy

Tell Me …

Mi'kail Eli'yah
8 min read · Mar 5, 2023

Gini index, information gain, and entropy are three measures commonly used in decision tree algorithms for feature selection. They are used to determine the best split of data into subsets based on the values of a feature.

Gini index is a measure of impurity used in classification problems. It measures the probability of misclassifying a randomly chosen item in a dataset. A value of 0 indicates that all items in the dataset belong to the same class, while the maximum value, 1 − 1/n for n classes, is reached when the items are evenly distributed across all classes. Gini index:

Gini(D) = 1 − Σ_{k=1..n} p_k^2
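As a minimal sketch in plain Python (the function name and toy labels are illustrative, not from the original article), the Gini index of a list of class labels can be computed like this:

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity of a list of class labels: 1 - sum of p_k^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_index(["a", "a", "a", "a"]))  # 0.0 (pure: all one class)
print(gini_index(["a", "a", "b", "b"]))  # 0.5 (evenly split, 2 classes: 1 - 1/2)
```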

Information gain is a measure of the reduction in entropy that results from splitting a dataset based on the values of a feature; it is equivalent to the mutual information between the feature and the class label. Entropy is a measure of the randomness or uncertainty of a dataset. A value of 0 indicates that the dataset is completely pure, while the maximum value (1 bit for two equally likely classes, log2(n) in general) indicates that the dataset is completely random. Information gain:

IG(D, A) = H(D) − Σ_{v ∈ values(A)} (|D_v| / |D|) · H(D_v)
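A sketch of the same computation in plain Python (the function names and the toy weather feature are illustrative assumptions, not from the article):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum of p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Parent entropy minus the weighted entropy of the subsets
    produced by splitting on the feature."""
    n = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    children = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - children

# Splitting on this feature separates the classes perfectly,
# so the gain equals the full parent entropy of 1 bit.
labels  = ["yes", "yes", "no", "no"]
feature = ["sunny", "sunny", "rainy", "rainy"]
print(information_gain(labels, feature))  # 1.0
```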

Entropy is a measure of the impurity of a dataset. It is similar to the Gini index but is calculated using the logarithm of the class proportions (a sketch of the computation follows the symbol definitions below). Entropy:

H(D) = − Σ_{k=1..n} p_k · log2(p_k)

where D:= dataset; A:= feature being considered; n:= number of classes; p_k:= proportion of items in class k; D_v:= subset of D for which feature A takes value v; H:= entropy.
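A minimal sketch of the entropy computation in plain Python (the function name is an illustrative assumption):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum of p_k * log2(p_k).
    Counter only yields classes that actually occur, so the
    0 * log2(0) terms never arise."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["a", "a", "a", "a"]))  # 0.0 (pure)
print(entropy(["a", "a", "b", "b"]))  # 1.0 (evenly split, 2 classes)
```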

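Putting the measures side by side: the sketch below (toy data and names invented for illustration) scores two candidate splits with both information gain and the analogous Gini-based gain. Both measures agree that outlook is the better feature to split on here, which is the typical outcome; the two criteria rarely disagree in practice.

```python
import math
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_gain(labels, feature_values, impurity):
    """Impurity of the parent minus the weighted impurity of the children."""
    n = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    children = sum(len(s) / n * impurity(s) for s in subsets.values())
    return impurity(labels) - children

labels  = ["yes", "yes", "yes", "no", "no", "no"]
outlook = ["sunny", "sunny", "rainy", "rainy", "rainy", "rainy"]
windy   = ["true", "false", "true", "false", "true", "false"]

for name, feature in [("outlook", outlook), ("windy", windy)]:
    print(f"{name}: info gain = {split_gain(labels, feature, entropy):.3f}, "
          f"gini gain = {split_gain(labels, feature, gini):.3f}")
# outlook: info gain = 0.459, gini gain = 0.250
# windy:   info gain = 0.082, gini gain = 0.056
```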