2023 Jobs Desirability Index — Kenya

Jan 5, 2023

The year has just turned the corner, and I am grateful for 2023!

The new year comes with new possibilities and new resolutions (new year, new me, if you will). If you are looking for a change of workplace environment and are curious to find out which companies stand out, welcome.

Recap

Last year, the @WanjikuReports account on Twitter ran a poll asking where Kenyans would want to work in the new (now old) year, and yours truly collated the results in this article.

This year another poll was carried out:

and the results are as shown below:

The methodology for obtaining these results hasn't changed from last year; you can follow it in this article. The workings are also publicly available in this notebook.

Notably, the quality of the output has improved considerably, and I picked up a few learnings along the way, detailed below:

Testing and Bias

I added more tests to the implementation of the main algorithm to better understand the accuracy of its output. Almost immediately, a bias jumped out that I had not picked up on in last year's calculations: confirmation bias. Because I work at one of these companies, when the results confirmed my own implicit bias, I accepted them without further investigation. Bulletproofing the tests reduced the subjectivity this bias introduced.
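One way to avoid accepting results simply because they look right is a small known-answer test: hand-label a handful of replies, then measure how often the classifier agrees with the labels. The sketch below is illustrative only; the company names, the replies and the substring-based `naive_classify` are all hypothetical and do not come from the notebook.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical company list and hand-labelled replies (illustrative only, not poll data).
LISTED_COMPANIES = ["Acme Telecom", "Beta Bank", "Gamma Insurance"]
LABELLED_REPLIES: Dict[str, Optional[str]] = {
    "acme telecom for sure": "Acme Telecom",
    "I'd go to beta bank": "Beta Bank",
    "gamma insurance or nothing": "Gamma Insurance",
    "somewhere remote": None,  # names no listed company, so the classifier should return None
}

Classifier = Callable[[str], Optional[str]]


def naive_classify(reply: str) -> Optional[str]:
    """Hypothetical classifier: first listed company whose name appears (case-insensitively) in the reply."""
    lowered = reply.lower()
    for company in LISTED_COMPANIES:
        if company.lower() in lowered:
            return company
    return None


def accuracy(classify: Classifier, labelled: Dict[str, Optional[str]]) -> float:
    """Fraction of hand-labelled replies that `classify` maps to the expected company."""
    hits = sum(1 for reply, expected in labelled.items() if classify(reply) == expected)
    return hits / len(labelled)


def mismatches(classify: Classifier, labelled: Dict[str, Optional[str]]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Replies the classifier gets wrong, as {reply: (got, expected)}, for manual review."""
    return {r: (classify(r), e) for r, e in labelled.items() if classify(r) != e}
```

Dropping a line such as `assert accuracy(classify, LABELLED_REPLIES) >= 0.95, mismatches(classify, LABELLED_REPLIES)` into the notebook makes it fail loudly, listing the offending replies, rather than letting a plausible-looking ranking through.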

I was also able to catch errors caused by fuzzy matching a token against two similar entries in the companies list, which led either to double counting of specific tokens or to misclassification. Since this is a fuzzy matching problem, we must allow for some error in the aggregation, but it's our duty as data engineers/scientists to squash these errors where we can!
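A minimal sketch of the double-counting problem, assuming Python's standard `difflib` as the fuzzy matcher (the notebook's actual matcher may differ) and a hypothetical company list containing two deliberately similar names. Crediting every candidate above the cutoff counts one token twice; keeping only the best-scoring candidate credits it at most once.

```python
import difflib
from collections import Counter
from typing import Iterable, List, Optional

# Hypothetical company list with two deliberately similar names (not the real list).
COMPANIES = ["Beta Bank", "Beta Bancorp", "Acme Telecom"]
# Compare in lowercase, but report the canonical spelling.
LOWERED = {name.lower(): name for name in COMPANIES}


def match_all(token: str, cutoff: float = 0.6) -> List[str]:
    """Naive: every company scoring at or above `cutoff` is returned, so one token can be counted twice."""
    hits = difflib.get_close_matches(token.lower(), list(LOWERED), n=len(LOWERED), cutoff=cutoff)
    return [LOWERED[h] for h in hits]


def match_best(token: str, cutoff: float = 0.6) -> Optional[str]:
    """Safer: keep only the single highest-scoring company, or None if nothing clears `cutoff`."""
    hits = difflib.get_close_matches(token.lower(), list(LOWERED), n=1, cutoff=cutoff)
    return LOWERED[hits[0]] if hits else None


def tally(tokens: Iterable[str], best_only: bool = True) -> Counter:
    """Count company mentions using either the best-match or the naive all-matches strategy."""
    counts: Counter = Counter()
    for token in tokens:
        if best_only:
            best = match_best(token)
            if best is not None:
                counts[best] += 1
        else:
            counts.update(match_all(token))
    return counts


if __name__ == "__main__":
    tokens = ["beta bank", "Beta Bank", "acme telecom", "unrelated"]
    print(tally(tokens, best_only=False))  # "Beta Bancorp" is wrongly credited as well
    print(tally(tokens))  # each token is credited to at most one company
```

The best-match rule does not remove misclassification entirely (a token can still land on the wrong similar name), so the cutoff and the company list are worth checking against hand-labelled examples.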

Increase in Calculation Speeds

I tackled this by breaking the main procedure down into two functions and timing each one. To reduce the time taken by the “offending” function, Python's multiprocessing library came to the rescue:

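from multiprocessing import Pool

# prepare_token is defined here; it receives one (index, sentence)
# pair from enumerate(narr) and returns the tokens for that sentence

if __name__ == "__main__":  # guard needed when workers are spawned rather than forked (e.g. Windows, macOS on Python 3.8+)
    with Pool() as mp_pool:  # one worker process per CPU core by default
        sentences = enumerate(narr)
        final_tokens_arr = mp_pool.map(prepare_token, sentences)  # results come back in input order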

This made the code roughly three times faster: execution took ~60 secs before and ~20 secs with the new implementation on my 2-core machine running macOS, and more cores should translate into even faster execution.
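To check a speed-up like this on your own machine, a minimal timing harness could look like the sketch below. `slow_tokenize` is a hypothetical CPU-bound stand-in for `prepare_token`, and the sentences are made-up placeholders rather than poll data; the speed-up you observe will depend on the core count and on how much work each call does.

```python
import time
from multiprocessing import Pool
from typing import Callable, List, Sequence, Tuple

Tokens = List[List[str]]


def slow_tokenize(indexed_sentence: Tuple[int, str]) -> List[str]:
    """Hypothetical stand-in for prepare_token: busy work, then lowercase and split."""
    _, sentence = indexed_sentence
    sum(i * i for i in range(200_000))  # simulate expensive per-sentence work
    return sentence.lower().split()


def timed(fn: Callable[[], Tokens]) -> Tuple[Tokens, float]:
    """Run fn once and return (result, elapsed seconds) measured with a monotonic clock."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare(narr: Sequence[str], processes: int = 2) -> Tuple[float, float]:
    """Time a serial map against Pool.map; pool creation happens outside the timed section."""
    serial, t_serial = timed(lambda: list(map(slow_tokenize, enumerate(narr))))
    with Pool(processes) as pool:
        parallel, t_parallel = timed(lambda: pool.map(slow_tokenize, enumerate(narr)))
    assert serial == parallel  # same tokens, same order
    return t_serial, t_parallel


if __name__ == "__main__":
    sentences = ["Hypothetical poll reply number %d" % i for i in range(40)]
    t_serial, t_parallel = compare(sentences)
    print(f"serial {t_serial:.2f}s, parallel {t_parallel:.2f}s, speed-up {t_serial / t_parallel:.1f}x")
```

Only `slow_tokenize` is sent to the workers, so it has to live at module level where child processes can find it; the lambdas stay in the parent process.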

A further improvement could come from the venerable Ray library, as described in this Stack Overflow answer:

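import ray  # optional here: creating the Pool below initialises Ray if it isn't already running
from ray.util.multiprocessing import Pool  # mirrors the multiprocessing.Pool API

pool = Pool()
final_tokens_arr = pool.map(prepare_token, enumerate(narr))  # same call as the multiprocessing version
pool.close()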

Unfortunately, Ray categorically refused to install on my VM running Python 3.7.13, so I will look into implementing it for this project next year :), maybe on Python 3.11.*

Further analysis

I will perform some co-occurrence analysis in this notebook over the course of the year, so stay tuned 👍🏽
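As a rough idea of what such a co-occurrence analysis could involve, the sketch below counts how often two companies are named in the same reply. It assumes each reply has already been reduced to a list of matched company names; the names used are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def co_occurrences(replies: Iterable[List[str]]) -> Counter:
    """Count how often each unordered pair of companies is named in the same reply."""
    pairs: Counter = Counter()
    for companies in replies:
        # Dedupe within a reply and sort, so (A, B) and (B, A) share one key.
        for pair in combinations(sorted(set(companies)), 2):
            pairs[pair] += 1
    return pairs


if __name__ == "__main__":
    replies = [
        ["Acme Telecom", "Beta Bank"],
        ["Beta Bank", "Acme Telecom", "Beta Bank"],  # repeated mention counts once
        ["Gamma Insurance"],  # a single company forms no pair
    ]
    print(co_occurrences(replies).most_common())
```

Sorting the deduplicated names before pairing keeps the counts order-independent; the resulting pair counts could then feed a heatmap or a graph of companies that respondents tend to mention together.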

To a successful 2023!


Written by Dawid Kimana
