“If I had an hour to solve a problem, I’d spend 55 minutes thinking about the problem and five minutes thinking about solutions.” – Albert Einstein
This is the second in a series of posts on machine learning in HR tech. If you haven’t already, we recommend you read the first post here.
In our last post, we explained why it takes more than data science and machine learning (ML) to build a knowledge graph for a job matching system. In a nutshell, to build a knowledge representation of sufficient quality, you need people who understand the knowledge you want to represent. More generally, if you want to solve a problem as opposed to putting out fires, you need people who understand the problem (and not firefighters). This is key not only to determining a good approach, but to the entire system lifecycle. So many attempts at innovative problem-solving fall flat because a) they target the wrong problem or b) the people involved did not understand the problem. So, before we get into the nitty-gritty of ML-based HR tech in the next post, let’s discuss the problem we want to solve, starting with the all-important results.
The discussion that follows is based around a job matching system as an example of HR tech. However, this discussion is highly relevant to any kind of HR analytics.
In a job matching system, at the most superficial level, you want to feed candidate and job profiles into a machine, let the machine do its magic and spit out a list of… well, depending on the problem you want to solve, it could be:
A significant part of understanding the problem is defining similarity. What makes one job similar to another? A first idea might be that two jobs are similar if they have similar or even the same job titles. So, when are job titles similar? Take a look at the following examples:
The job titles on the left all share the keyword Manager, but they are clearly not similar in any respect. On the other hand, most of the job titles on the right are similar in some sense. However, whether they are close enough depends on the problem you want to solve, and where the expressions came from. For instance, if you want to use the results of your system for, say, national statistics, these job titles may be close enough. If you want to recommend jobs to a software engineer, that person may not be interested in any of the other jobs, as they only cover a subset of the tasks and skills of a software engineer – in theory. In reality, you will find that the terms software engineer, software developer and software programmer are often confused or used interchangeably in job postings. In addition, even identical job titles often refer to very different jobs: an accountant for a global conglomerate is not the same as an accountant for a small local business; a project manager in the manufacturing department may not perform well if transferred to the marketing department; a carpenter with 18 months of training is probably not as skilled as a carpenter with 4 years of training. Business size, industry, training and many other factors all impact the skill set required or acquired in a given position. So let’s consider jobs in terms of skills.
We discussed skills in quite some detail in this post. The gist of it is that first of all, there is a lot of talk about skills, but no consensus on the most basic and pertinent question, namely what exactly constitutes a skill, i.e. the definition. Then there is the question of granularity, i.e. how much detail should be captured. The choice is highly dependent on the problem you want to solve. However, your system will typically need to understand skills at the finest level of granularity in any case so that it can correctly normalize any term it encounters to whatever level of detail you settle on. Which leads to the final point: When it comes to skills, we have a highly unfavorable combination of projection bias and a Tower of Babel. Given a term or expression describing a skill, most of us expect that everyone means the same thing when they use that term. In reality, there is significant confusion because everyone has their own interpretation based on their unique combination of experience, knowledge and other factors. We also discussed this in an episode of our podcast: Analyzing skills data. Long story short: There is much work to do in terms of skills and skills modeling.
Now, in an ideal world everyone would speak the same skills language. Realistically, this is simply not going to happen. What we can do is attempt to integrate this translation work within our system. Which is one of the key strengths of a knowledge representation. And we discussed in the last post of this series at length why a good knowledge graph cannot be generated by a system purely based on ML. So let’s suppose for a moment we have a solid definition of skills, the appropriate level of granularity for our problem and we are all talking about the same thing.
Again, we want to find a workable definition of similarity. Working with the premise that a job is determined by its skill set, if you have two jobs with the same skills, then the jobs are the same. This implies that the more skills two jobs have in common, the more similar the jobs. At the other end of the spectrum, two jobs that have no skills in common are not similar. Sounds logical, right? So, if we feed lots of job postings into our system to analyze them all in terms of skills, our system can identify similar jobs. Easy. Let’s look at an example. The following are all required skills listed in three real-life job postings:
According to our theory – and with an understanding that certain terms have similar meanings, like client centered and guest focus – jobs A and B must be similar and job C quite different. Think again.
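To make the naive theory concrete, here is a minimal sketch of skill-overlap similarity. We use the Jaccard index as one possible way to quantify "skills in common"; that choice is our assumption for illustration, and the skill sets below are invented and assumed to be already normalized (so that, say, client centered and guest focus have been mapped to the same term). They are not the skills from the postings above.

```python
def jaccard_similarity(skills_a: set[str], skills_b: set[str]) -> float:
    """Share of skills two jobs have in common, between 0.0 and 1.0."""
    if not skills_a and not skills_b:
        return 0.0  # no information at all: treated as "not similar" (arbitrary choice)
    return len(skills_a & skills_b) / len(skills_a | skills_b)


# Hypothetical, already-normalized skill sets (not taken from real postings).
job_x = {"customer focus", "communication", "teamwork"}
job_y = {"customer focus", "communication", "time management"}
job_z = {"welding", "blueprint reading"}

print(jaccard_similarity(job_x, job_y))  # 0.5
print(jaccard_similarity(job_x, job_z))  # 0.0
```

Note what the number does and does not say: a score of 0.5 only tells you that the postings share terms. Generic skills like communication appear in postings for wildly different jobs, which is exactly why this measure alone can mislead.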
Ok, but surely our system will discern more accurate and comparable skill sets if we feed it more data. Well, it might – if your system processes that data correctly. So far, these systems typically deliver results like the ones discussed in this post, where 1 in 15 online job postings for refuse workers in Europe apparently require knowledge in fine arts.
The more pertinent question, however, is how useful that would be to the problem you want to solve. As we have explained before, there is no such thing as a standard skill set for a given profession in a global sense. In addition, standardizing skill sets according to job titles is more or less equivalent to simply comparing job titles. So, if you have a matching or recommendation system, this approach will not be helpful. If you want to perform skills analysis, say, to develop a skills strategy for your business, it will not be helpful either. Instead, we need a viable definition of job similarity that consists of more than a set of features like job titles, skills, industries, company size, experience, education, and so on. The definition must also capture the importance of each feature depending on the context. For instance, specific training and certification are indispensable features of jobs for registered nurses and irrelevant for stocking and unloading jobs at a supermarket. Work experience is not helpful in matching graduates to entry-level jobs, but likely necessary when looking or hiring for a management position. Of course, if you’re designing a purely ML-based system, you probably want to leave the task of determining importance or weighting of the features to the system. However, somewhere along the line, you will (hopefully) want someone to check the results. And it shouldn’t be a data scientist.
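As a rough illustration of context-dependent feature importance, the sketch below combines per-feature similarity scores using a different set of weights for each context. Everything in it is hypothetical: the context names, the features, the weights and the scores are made up for the example. In a real system the weights might be learned or set by hand, and either way they should be reviewed by people who know the jobs.

```python
# Hypothetical per-context feature weights (invented for illustration).
CONTEXT_WEIGHTS = {
    "registered_nurse": {"certification": 0.6, "skills": 0.3, "experience": 0.1},
    "supermarket_stocking": {"certification": 0.0, "skills": 0.5, "experience": 0.5},
}


def weighted_similarity(feature_scores: dict[str, float], context: str) -> float:
    """Weighted average of per-feature similarity scores (each in 0.0..1.0)."""
    weights = CONTEXT_WEIGHTS[context]
    total_weight = sum(weights.values())
    weighted_sum = sum(
        weight * feature_scores.get(feature, 0.0)
        for feature, weight in weights.items()
    )
    return weighted_sum / total_weight


# A candidate with matching skills and experience but no certification.
scores = {"certification": 0.0, "skills": 0.9, "experience": 0.8}

print(round(weighted_similarity(scores, "registered_nurse"), 2))      # 0.35
print(round(weighted_similarity(scores, "supermarket_stocking"), 2))  # 0.85
```

The same feature scores produce a weak match in one context and a strong match in the other, which is the point: the features alone do not define similarity.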
Suppose you have found a workable definition of job similarity and you have a good knowledge graph to help your system understand all the different terms used in your job-related data. Now you want to build a matching engine based on ML. Again, you need to first think about the problem at hand. Apart from the semantic aspects discussed above, there are ethical considerations. One thing you will want to avoid for obvious reasons is bias – in any use case. Also, if you want the system to be used in any part of an HR tech stack, it must be both explainable and interpretable. For one thing, there is increasing evidence that legislation will eventually dictate explainability and interpretability in ML-based HR tech (AI Bill of Rights, EU AI Act, Canada’s AIDA, and so on). Arguably more important is the fact that HR tech is used in decisions that strongly affect real people’s lives and as such, must be transparent, fair and in all respects ethical. And even if your system is only going to produce job recommendations for users, explainable results could be an excellent value proposition by building trust and providing users with actionable insights. And with any HR analytics tool, you should want to know how the tool arrived at its decision or recommendation before acting on it. But first, we need to discuss what exactly we mean by interpretability and explainability in an ML-based system, and how the two differ.
These terms are often used interchangeably, and the difference between the two is not always clear. In short, an ML system is interpretable if a human can understand exactly how the model is generating results. For instance, a matching engine is interpretable if the designers know how the various features (job titles, skills, experience, etc.) and their weights determine the match results. An explainable ML system can tell humans why it generated a specific result. The difference is subtle, but key. Think of it like this: Just because the system told you why it did something (and the explanation sounds reasonable), that doesn’t mean that you know why it did it.
In our matching engine example, an explainable system will provide a written explanation giving some insight into what went into the decision-making process. You can see, say, a candidate’s strengths and weaknesses, which is certainly helpful for some use cases. But you don’t know exactly how this information played into the matching score and thus you cannot reproduce the result yourself. In fact, depending on how the system is designed, you can’t even be sure that the explanation given really matches the actual reasoning of the system. It is also unclear whether the explanation accounts for all factors that went into the decision. There could be hidden bias or other problematic features. Apart from the fact that these issues could cause difficulties in a legal dispute, it looks a lot like transparency washing – particularly because most people are unaware of the potential inaccuracy of these explanations.
In an interpretable system, you know precisely how the results are generated. They are reproducible and consistent, as well as accurately explainable. In addition, it is easier to identify and correct bias and build trust. Simply put, an interpretable ML system can easily be made ethical by design.
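One simple form an interpretable matching score could take is a linear model, where the score is a sum of per-feature contributions that anyone can recompute by hand. The sketch below assumes exactly that; the feature names, weights and scores are hypothetical, and this is not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchResult:
    score: float
    contributions: dict[str, float]  # feature name -> weight * feature score


def score_match(feature_scores: dict[str, float], weights: dict[str, float]) -> MatchResult:
    """Linear match score whose parts add up exactly to the total."""
    contributions = {
        feature: weight * feature_scores.get(feature, 0.0)
        for feature, weight in weights.items()
    }
    return MatchResult(score=sum(contributions.values()), contributions=contributions)


# Hypothetical weights and per-feature scores (each score in 0.0..1.0).
weights = {"job_title": 0.2, "skills": 0.5, "experience": 0.3}
candidate = {"job_title": 1.0, "skills": 0.6, "experience": 0.5}

result = score_match(candidate, weights)
print(round(result.score, 2))  # 0.65
for feature, value in result.contributions.items():
    print(f"{feature}: {value:.2f}")
```

Because the breakdown is the computation itself rather than a story generated after the fact, the same inputs always give the same result, and a disproportionate contribution from a problematic feature is visible at a glance.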
However, even in an interpretable system you still face the challenge of correctly processing the data. It doesn’t really matter what color the box is, black, white or any other color under the sun. It’s still opaque if your system can’t properly deal with the input data. Because mistakes in the processing inevitably lead to spurious reasoning. To perform any kind of reliable analysis, your system must – at the very least – be able to recognize and normalize the relevant information in a dataset correctly. Otherwise, we just end up back where we started:
Of course, no system is perfect, so the question is how much trash are you willing to accept? In other words, you need benchmarks and metrics. You need experts that are actually capable of evaluating the system and the results. And maybe let’s not set those benchmarks too low. After all, being better than appalling isn’t saying much…
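What might such a benchmark look like in practice? A minimal sketch, assuming domain experts have reviewed a sample and marked which matches are actually acceptable: compare the system's matches against that expert-approved set using precision and recall. The choice of these two metrics, the job IDs and the threshold are all our assumptions for illustration.

```python
def precision_recall(predicted: set[str], expert_approved: set[str]) -> tuple[float, float]:
    """Precision and recall of predicted matches against expert judgments."""
    true_positives = len(predicted & expert_approved)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expert_approved) if expert_approved else 0.0
    return precision, recall


# Hypothetical benchmark: the minimum share of matches that must be acceptable.
MIN_PRECISION = 0.9

# Hypothetical job IDs: what the system returned vs. what experts approved.
system_matches = {"job_1", "job_2", "job_3", "job_4"}
expert_matches = {"job_1", "job_2", "job_5"}

precision, recall = precision_recall(system_matches, expert_matches)
meets_benchmark = precision >= MIN_PRECISION

print(f"precision={precision:.2f}, recall={recall:.2f}, meets benchmark: {meets_benchmark}")
```

The numbers are only as good as the expert judgments behind them, which is why the people evaluating the results matter as much as the metric.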
As a fan of ML and data science (which, by the way, we are too – as long as it’s put to good use), you may still want to build a purely ML-based machine that can correctly process the data, feed it into an interpretable system and produce fair and accurate results with truthful explanations that can be understood by humans. So, in the next post, we’re going to dive into the nuts and bolts of this ML-based machine, looking at everything from training data through natural language processing and vector embeddings to bias mitigation, interpretability and explainability. Stay tuned.