AI Workshop – Predict employee leave: will they leave or will they stay?

Imagine you are an HR-Manager, and you would like to know which employees are likely to stay, and which might leave your company. Besides you would like to understand which factors contribute to leaving your company. You have gathered data in the past (well, in this case Kaggle simulated a dataset for you, but just imagine), and now you can start with this Hands On Lab to build your prediction model to see if that can help you.

In this lab, you will learn how to create a machine learning module with Azure Machine Learning Studio that predicts whether an employee will stay or leave your company. We are aware of the limitations of the dataset but the objective of this hands on lab is to inspire you to explore the possibilities of using machine learning for your own research, and not to build the next HR-solution.

You will follow several steps to explore the data and build a machine learning model to predict whether an employee will leave or not, and why.

  • Step 1: Get the starting experiment
  • Step 2: Get a first understanding of the data
  • Step 3: Prepare a training and a test set
  • Step 4: Train the model
  • Step 5: Score the test set
  • Step 6: Evaluate the results
  • Step 7: Gain insights on the why
  • Step 8: Publish your model
  • Step 9: Deploy your model as a web service
  • Step 10: Test your model

You will build this prediction model with the Azure Machine Learning Studio. The complete model will look like this:

Predict employee leave model

Prerequisites: Get Access to Azure Machine Learning Studio

There are several options to start with Azure ML. The easiest way is to go to https://azure.microsoft.com/en-us/services/machine-learning/ and click on the “Get started now” button.

get started with azure

You will need a Windows LiveID to sign in. If you don’t have one, you can sign up here: https://signup.live.com/

Hereafter, you can select the Free Workspace option:

Sign up for Azure Machine Learning Studio

Step 1: Get the starting experiment

We created a starting experiment for you on the Azure AI Gallery to give you a smooth start. This experiment uses a simulated dataset from Kaggle. You have to open the experiment in your studio by clicking on the green button “Open in Studio”. This will open the Azure Machine Learning Studio in a browsers, and you can copy the experiment to your free workspace.

Predict employee leave starting model

Step 2: Inspecting the data

In the Starting experiment: Predict Employee Leave experiment, you will find the Employee Leave data on the canvas, together with a Summarize Data module. If you look at the top corner right, you can see that the experiment is “in draft”. This means that is hasn’t been saved, nor that it has been executed before. Therefore, we start with running the experiment, by clicking on the run button in the menu at the bottom. After running the experiment, the top corner message will change into “Finished running”, and we can start inspecting our data.

run starting experiment

To get a first impression of the data, you can right-click the output port of the dataset and select “Visualize” from the menu to visualize the data. The output port is the little circle under every module on the canvas:

output port
inspect employee leave data

You can scroll through the different columns, and by selecting them, you get an overview in the panel on the right.

We can continue inspecting the dataset. The data comprise a wide range of topics which allow to explain employees’ leave behavior in relation with A) organizational factors (department); B) employment relational factors (i.e. tenure, the number of projects participated in; the average working hours per month; objective career development; salary); and C) job-related factors (performance evaluation; involvement in workplace accidents).

We have the following available variables in the dataset:

Organizational factors

  • Department

Employment relational factors

  • Time spent at the company
  • Number of projects
  • Average monthly hours
  • Salary
  • Whether they have had a promotion in the last 5 years

Job-related factors

  • Last evaluation
  • Whether they have had a work accident

Dependent variable

  • Whether the employee has left
summarize employee leave data

Another way to get a first impression of the data. Therefore we use the Summarize Data module, which gives us insights about the data.

predict employee leave data summary

After you ran the model, you can right-click on the output port of the Summarize Data module and select Visualize. We see that we have 14999 observations, and that we don’t miss any data. We also get an idea about the variance and distribution of the data.

Step 3: Prepare a training and a test set

We split the dataset into a training and a test set, using 70% of the data to train the model with, and 30% of the data to test the model later on. Therefore we drag the Split Data module on the canvas. You can find this module in the menu left, next to the canvas. You can either click throught the various options, or use the search function.

search module menu

When you have found the Split Data module, you can drag it on the canvas, and connect the output port of the dataset to the inport port of the Split Data module. You can connect the modules by left-clicking on the output port, and keep you mouse button down while draging it to the module you want to connect it to. We set a seed, so we can repeat this experiment.

Step 4: Train the model

Since we have split the data, we can continue to work with the training data set. We first select the Train model module and drag it on the canvas. But when we do so you will a little red exclamation mark. This is because we haven’t selected the variable that we want to predict and we haven’t defined the algorithm that we want to use to train the model with. First we will select the dependent variable. Therefore, we have to click on the Launch column selector.

train employee leave model

In order to set the dependent variable, we select the variable “left” (indicating whether an employee has left or not) from AVAILABLE COLUMNS and use the arrow button to get it to the right side, under “SELECTED COLUMNS”.

dependent variable employee leave

Furthermore, we have to select the algorithm to train the model with. In this experiment we use the Two-Class Boosted Decision Tree algorithm with the standard parametrization. We do add a seed to make this experiment replicable.

model algorithm employee leave

Step 5: Score the test set

After this, we are prepared to score the test set and see how our model performs. Therefore, we use the Score Model module and we connect both the output port of the Train model module, which contains the trained model, as the outcome of the Split Data set, containing the test data.

score employee leave model

Step 6: Evaluate the results

Finally, it’s time to evaluate the results of our model. We use the Evaluate Model module which we connect to the results of our prior scoring.

evaluate employee leave model

Let’s run the model, and then right click on the Evaluate Model module to visualize the results. We can predict with 98% accuracy and 98% precision.

Human resources analytics - model evaluation

Step 7: Gain insights on the why

Our final question was why employees were leaving. Therefore, we could add the Permutation Feature Importancy module. We connect the output port of the Train Model module and the output port of the Split Data module. Now we can compute the permutation feature importance scores of feature variables given this trained model and the test dataset. We set a seed to make the experiment replicable, and we focus on accuracy, meaning that we are both interested in selected correctly the people that leave, and the people that will not leave.

feature importance employee leave model

If we run the model, and right-click on the output port of the Permutation Feature Importance module, we find that satisfaction was one of the main factors when leaving, according to this dataset. Next to that, the number of project an employee got was important.

Human resources analytics - permutation feature importance results

Step 8: Publish your Model

Publish the Model as a Web Service

Make sure you have saved and ran the experiment. With the Starting experiment: Predict Employee Leave experiment open, click the SET UP WEB SERVICE icon at the bottom of the Azure ML Studio page and click Predictive Web Service [Recommended].

Set up web service with Azure Machine Learning Studio

A new Predictive Experiment tab will be automatically created. Verify that, with a bit of rearranging, the Predictive Experiment resembles this figure:

Published Azure Machine Learning model

You can now run the Predictive Experiment. When the experiment has finished running, visualize the output of the Score Model module and verify that only the Scored Labels column is returned.

Scored data from predictive experiment

9: Deploy and Use the Web Service

In the Starting Experiment: Predict Employee Leave [Predictive Exp.] experiment, click the Deploy Web Service icon at the bottom of the Azure ML Studio window and select the Deploy Web Service [Classic] option.

Deploy web service

Wait a few seconds for the dashboard page to appear. You have several options to connect to the webservice.

Consume the model

Step 10: Test your model

To test this webservice, you can i.e. click on New Web Services Experience (preview). This will open a new browser where you have the option to test your model (Test endpoint option under BASICS)

test model endpoint

When clicking on Test endpoint, you have the option to enable the usage of sample data, which will generate a sample record to test your model with.

enable sample data

After enabling this sample data, you will see the generated sample data.

Test request response of the model

The final step would be pressing the Test Request-Response button: will this person leave the company?

end result of the prediction test

Another option is to click on the blue TEST button, or to launch an Excel file. It is up to you to explore these options now.

Summary

By completing this lab, you have prepared your environment and data, and built and deployed your own Azure ML model. We hope you enjoyed this introductory lab and that you will build many more machine learning solutions!

Limitations

Of course there is much information missing. We don’t know anything about the dates of the obtained data, nor do we know anything between the data gathering and the moment that the employee left.

Inspiration

As mentioned before, this hands on lab is created to inspire you. If for whatever reason you were struggling to get the model built, you can also download the complete model from the Azure AI Gallery.

If you want to know more about how to work with the Azure Machine Learning Studio, we would like to invite you to take a look at the Principles of Machine Learning course. And if you want to learn more about data science in general, please check out the Microsoft Professional Program – Data Science Track or the other DataChangers’ learning options! .

We hope you enjoyed this workshop. Please feel free to leave us your comments!

2 Replies to “AI Workshop – Predict employee leave: will they leave or will they stay?”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.