# Decision Tree

## Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features (as defined by scikit-learn).

### Case Study: We will use a decision tree to predict hiring decisions based on past data.

In [75]:
import numpy as np
import pandas as pd
from sklearn import tree

input_file = "PastHires.csv"
Past_Hires = pd.read_csv(input_file)

In [76]:
Past_Hires

Out[76]:
| | Years Experience | Employed? | Previous employers | Level of Education | Top-tier school | Interned | Hired |
|---|---|---|---|---|---|---|---|
| 0 | 10 | Y | 4 | BS | N | N | Y |
| 1 | 0 | N | 0 | BS | Y | Y | Y |
| 2 | 7 | N | 6 | BS | N | N | N |
| 3 | 2 | Y | 1 | MS | Y | N | Y |
| 4 | 20 | N | 2 | PhD | Y | N | N |
| 5 | 0 | N | 0 | PhD | Y | Y | Y |
| 6 | 5 | Y | 2 | MS | N | Y | Y |
| 7 | 3 | N | 1 | BS | N | Y | Y |
| 8 | 15 | Y | 5 | BS | N | N | Y |
| 9 | 0 | N | 0 | BS | N | N | N |
| 10 | 1 | N | 1 | PhD | Y | N | N |
| 11 | 4 | Y | 1 | BS | N | Y | Y |
| 12 | 0 | N | 0 | PhD | Y | N | Y |

Map Y/N to 1/0, and map level of education to an ordinal scale: 0 for BS, 1 for MS, and 2 for PhD.

In [77]:
d = {'Y': 1, 'N': 0}
Past_Hires['Hired'] = Past_Hires['Hired'].map(d)  # using Series.map
Past_Hires['Employed?'] = Past_Hires['Employed?'].map(d)
Past_Hires['Top-tier school'] = Past_Hires['Top-tier school'].map(d)
Past_Hires['Interned'] = Past_Hires['Interned'].map(d)

d = {'BS': 0, 'MS': 1, 'PhD': 2}
Past_Hires['Level of Education'] = Past_Hires['Level of Education'].map(d)
Past_Hires

Out[77]:
| | Years Experience | Employed? | Previous employers | Level of Education | Top-tier school | Interned | Hired |
|---|---|---|---|---|---|---|---|
| 0 | 10 | 1 | 4 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 2 | 7 | 0 | 6 | 0 | 0 | 0 | 0 |
| 3 | 2 | 1 | 1 | 1 | 1 | 0 | 1 |
| 4 | 20 | 0 | 2 | 2 | 1 | 0 | 0 |
| 5 | 0 | 0 | 0 | 2 | 1 | 1 | 1 |
| 6 | 5 | 1 | 2 | 1 | 0 | 1 | 1 |
| 7 | 3 | 0 | 1 | 0 | 0 | 1 | 1 |
| 8 | 15 | 1 | 5 | 0 | 0 | 0 | 1 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10 | 1 | 0 | 1 | 2 | 1 | 0 | 0 |
| 11 | 4 | 1 | 1 | 0 | 0 | 1 | 1 |
| 12 | 0 | 0 | 0 | 2 | 1 | 0 | 1 |
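As an aside, the same ordinal encoding can be expressed with an ordered `pandas.Categorical`, which makes the BS < MS < PhD ordering explicit instead of burying it in a dict. A minimal sketch (the small `edu` series here is illustrative, not from the dataset):

```python
import pandas as pd

# Ordered categorical: category position doubles as the ordinal code.
edu = pd.Categorical(['BS', 'MS', 'PhD', 'BS'],
                     categories=['BS', 'MS', 'PhD'], ordered=True)
print(edu.codes.tolist())  # [0, 1, 2, 0]
```

The `.map(d)` approach used above is equally valid; `Categorical` is simply self-documenting about the ordering.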

### Look at the features:

- Years Experience
- Employed?
- Previous employers
- Level of Education
- Top-tier school
- Interned
- Hired

### Hired is the target feature

Years Experience, Employed?, Previous employers, Level of Education, Top-tier school, and Interned are the features we will use to predict.

In [78]:
features = list(Past_Hires.columns[:6])
features

Out[78]:
['Years Experience',
'Employed?',
'Previous employers',
'Level of Education',
'Top-tier school',
'Interned']
In [79]:
type(features)

Out[79]:
list

### Construct the decision tree using DecisionTreeClassifier.

In [80]:
y = Past_Hires["Hired"]
X = Past_Hires[features]

In [81]:
clf = tree.DecisionTreeClassifier()

In [82]:
clf = clf.fit(X,y)

In [83]:
from io import StringIO  # sklearn.externals.six was removed in scikit-learn 0.23
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus

dot_data = StringIO()
export_graphviz(clf, out_file=dot_data,
                feature_names=features,
                filled=True, rounded=True,
                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())

Out[83]:
To read this decision tree: at each condition, branch left for "True" and right for "False". When you reach a leaf, the value array shows how many training samples fall into each target class.
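If Graphviz/pydotplus is not available, the same rules can be inspected as plain text with `sklearn.tree.export_text`. A self-contained sketch that rebuilds the mapped table shown above (the `random_state=0` is an added assumption for reproducibility):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Rebuild the mapped PastHires table from the Out[77] output above.
cols = ['Years Experience', 'Employed?', 'Previous employers',
        'Level of Education', 'Top-tier school', 'Interned', 'Hired']
rows = [[10,1,4,0,0,0,1], [0,0,0,0,1,1,1], [7,0,6,0,0,0,0], [2,1,1,1,1,0,1],
        [20,0,2,2,1,0,0], [0,0,0,2,1,1,1], [5,1,2,1,0,1,1], [3,0,1,0,0,1,1],
        [15,1,5,0,0,0,1], [0,0,0,0,0,0,0], [1,0,1,2,1,0,0], [4,1,1,0,0,1,1],
        [0,0,0,2,1,0,1]]
df = pd.DataFrame(rows, columns=cols)

clf = DecisionTreeClassifier(random_state=0).fit(df[cols[:6]], df['Hired'])
print(export_text(clf, feature_names=cols[:6]))  # indented if/else rules
```

Each `|---` line is one split condition; indentation depth mirrors the depth of the plotted tree.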
In [84]:
features

Out[84]:
['Years Experience',
'Employed?',
'Previous employers',
'Level of Education',
'Top-tier school',
'Interned']
In [85]:
print (clf.predict([[10, 1, 4, 0, 0,0 ]]))

[1]
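Beyond the hard 0/1 label, `predict_proba` reports the class fractions at the leaf a candidate reaches. A minimal sketch using the same mapped table; the probed candidate is row 0 of the training data, so a fully grown tree should return class 1 for it (passing a DataFrame with matching column names avoids scikit-learn's feature-name warning):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Rebuild the mapped PastHires table from the Out[77] output above.
cols = ['Years Experience', 'Employed?', 'Previous employers',
        'Level of Education', 'Top-tier school', 'Interned', 'Hired']
rows = [[10,1,4,0,0,0,1], [0,0,0,0,1,1,1], [7,0,6,0,0,0,0], [2,1,1,1,1,0,1],
        [20,0,2,2,1,0,0], [0,0,0,2,1,1,1], [5,1,2,1,0,1,1], [3,0,1,0,0,1,1],
        [15,1,5,0,0,0,1], [0,0,0,0,0,0,0], [1,0,1,2,1,0,0], [4,1,1,0,0,1,1],
        [0,0,0,2,1,0,1]]
df = pd.DataFrame(rows, columns=cols)
clf = DecisionTreeClassifier(random_state=0).fit(df[cols[:6]], df['Hired'])

# One row per candidate, columns in the same order as the training features.
candidate = pd.DataFrame([[10, 1, 4, 0, 0, 0]], columns=cols[:6])
print(clf.predict(candidate))        # hard label
print(clf.predict_proba(candidate))  # [P(not hired), P(hired)] at the leaf
```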


## Ensemble learning: using a random forest. We will build 10 decision trees

We'll use a random forest of 10 decision trees to predict hiring for specific candidate profiles:

In [86]:
from sklearn.ensemble import RandomForestClassifier

clf10TREE = RandomForestClassifier(n_estimators=10)
clf10TREE = clf10TREE.fit(X, y)


### Time to predict, Candidate #1: predict hiring of an employed candidate with 9 years of experience

In [87]:
print (clf10TREE.predict([[9, 1, 4, 0, 0, 0]]))

[1]


### Time to predict, Candidate #2: predict hiring of an unemployed candidate with 3 years of experience

In [88]:
print (clf10TREE.predict([[3, 0, 0, 0, 0, 0]]))

[0]
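A random forest also exposes `feature_importances_`, which summarizes how much each feature contributes to the trees' splits. A self-contained sketch on the same mapped table (fixing `random_state=0` is an added assumption so the averaged importances are reproducible):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Rebuild the mapped PastHires table from the Out[77] output above.
cols = ['Years Experience', 'Employed?', 'Previous employers',
        'Level of Education', 'Top-tier school', 'Interned', 'Hired']
rows = [[10,1,4,0,0,0,1], [0,0,0,0,1,1,1], [7,0,6,0,0,0,0], [2,1,1,1,1,0,1],
        [20,0,2,2,1,0,0], [0,0,0,2,1,1,1], [5,1,2,1,0,1,1], [3,0,1,0,0,1,1],
        [15,1,5,0,0,0,1], [0,0,0,0,0,0,0], [1,0,1,2,1,0,0], [4,1,1,0,0,1,1],
        [0,0,0,2,1,0,1]]
df = pd.DataFrame(rows, columns=cols)

rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf = rf.fit(df[cols[:6]], df['Hired'])

# Importances are normalized to sum to 1; print highest first.
for name, imp in sorted(zip(cols[:6], rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f'{name:20s} {imp:.3f}')
```

With only 13 rows the exact ranking is noisy; treat it as a rough indication, not a conclusion about hiring.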

In [89]:
# Thanks! This is sample code. (C) DataTiles.io, DataTiles.ai