Decision trees in python

Explained

Decision trees are a form of supervised learning that can be used for both classification and regression purposes. In my experience, they are typically utilized for classification purposes. The model takes in an instance and then goes down the tree, testing significant features against a determined conditional statement. Depending on the result, it will go down to the left or right child branch and onward after that. Typically the most significant features in the process will fall closer to the root of the tree.

Decision trees are becoming increasingly popular and can serve as a strong learning algorithm for any data scientist to have in their repertoire, especially when coupled with techniques like random forests, boosting, and bagging. Once again, use the video below for a more in-depth look into the underlying functionality of decision trees.

Youtube - Decision Tree

Now that you know a little more about decision trees and how they work, let’s go ahead and implement one in Python.

Getting Started

from sklearn import tree
df = pd.read_csv(‘iris_df.csv’)
df.columns = [‘X1’, ‘X2’, ‘X3’, ‘X4’, ‘Y’]
df.head()

Implementation

from sklearn.cross_validation import train_test_split
decision = tree.DecisionTreeClassifier(criterion=’gini’)
X = df.values[:, 0:4]
Y = df.values[:, 4]
trainX, testX, trainY, testY = train_test_split( X, Y, test_size = 0.3)
decision.fit(trainX, trainY)
print(‘Accuracy: \n’, decision.score(testX, testY))

Visualization

from sklearn.externals.six import StringIO 
from IPython.display import Image
import pydotplus as pydot
dot_data = StringIO()
tree.export_graphviz(decision, out_file=dot_data)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())

Decision trees in python

Published 06-14-2017 00:00:00

Explained

Getting Started

Implementation

Visualization