Data Engineer will soon be at the top of emerging jobs lists

LinkedIn recently released their 2017 U.S. Emerging Jobs Report.

Data Engineer (DE) should be near the top of the list, but the industry hasn’t standardized on this term. Right now, the people with the skills to do data engineering are spread across 20+ legacy job titles. If these 20+ legacy job titles were mapped to the emerging title ‘Data Engineer’, there would be a big spike in the data.


The titles ‘Data Scientist’ and ‘Machine Learning Engineer’ don’t have this mapping problem. This is great news for all the DB architects, ETL specialists, DBAs, BI developers, RDBMS engineers and everyone else doing ‘data engineering’ today who hasn’t yet updated their title to ‘Data Engineer’.
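To make the mapping idea concrete, here is a minimal Python sketch. It assumes a small lookup table built from the legacy titles named above; the sample list of titles is made up for illustration and is not taken from the LinkedIn report.

```python
from collections import Counter

# Legacy titles named in this post, mapped to the emerging title.
# The mapping is illustrative only, not an industry standard.
LEGACY_TO_EMERGING = {
    "db architect": "Data Engineer",
    "etl specialist": "Data Engineer",
    "dba": "Data Engineer",
    "bi developer": "Data Engineer",
    "rdbms engineer": "Data Engineer",
}


def normalize_title(title: str) -> str:
    """Return the emerging title for a known legacy title.

    Titles that are not in the lookup table pass through unchanged
    (apart from surrounding whitespace being stripped).
    """
    cleaned = title.strip()
    return LEGACY_TO_EMERGING.get(cleaned.lower(), cleaned)


def count_by_title(titles):
    """Count how many people hold each title after normalization."""
    return Counter(normalize_title(t) for t in titles)


# Hypothetical sample of job titles, invented for this example.
sample_titles = [
    "DBA",
    "ETL Specialist",
    "Data Scientist",
    "BI Developer",
    "Data Engineer",
    "dba ",
]

counts = count_by_title(sample_titles)
print(counts)
```

In this made-up sample, only one person carries the title ‘Data Engineer’ before the mapping, while five do after it, which is the kind of spike described above.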

Depending on the project, a healthy ratio of ‘Data Engineers’ to ‘Data Scientists’ ranges from 1:1 (1 DE : 1 DS) to 4:1 (4 DE : 1 DS). This means that for every data scientist, there needs to be one or more people doing data engineering. For an average data science project, 80% of the work is in data preparation, which is best done by the ‘data engineer’ role.
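As a quick back-of-the-envelope check, the ratio above can be turned into a headcount range. This is only a sketch: the function names and the team size of five data scientists are assumptions made for the example.

```python
import math


def data_engineers_needed(num_data_scientists: int, de_per_ds: float) -> int:
    """Return the number of data engineers implied by a DE-per-DS ratio.

    The result is rounded up, since a fraction of a person cannot be hired.
    """
    if num_data_scientists < 0 or de_per_ds < 0:
        raise ValueError("team size and ratio must be non-negative")
    return math.ceil(num_data_scientists * de_per_ds)


def data_engineer_range(num_data_scientists: int) -> tuple:
    """Apply the 1:1 and 4:1 bounds quoted in this post."""
    low = data_engineers_needed(num_data_scientists, 1)
    high = data_engineers_needed(num_data_scientists, 4)
    return low, high


# Hypothetical team of five data scientists.
low, high = data_engineer_range(5)
print(f"5 data scientists -> between {low} and {high} data engineers")
```

For a team of five data scientists, that works out to somewhere between 5 and 20 data engineers, depending on the project.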

We have had previous tutorials on the demand function and supply function. Today, well-qualified ‘Data Engineers’ are in short supply, and the demand for this skill set is only going to increase.

From the executive search industry’s viewpoint, there is an overarching demand for “data” talent. The lack of clarity in titles makes it challenging for many HR/recruiting departments to communicate the needs of the managers and businesses they support. They are getting better, but the need and the opportunities are definitely abundant!

However, one point needs to be clarified. Data engineering work is also spread across the Machine Learning Engineer and Data Scientist roles, even more heavily than across the legacy titles. These roles lead projects, and because there hasn’t been a clearly defined Data Engineer role, much of a Data Scientist’s or Machine Learning Engineer’s time goes to cleaning and ordering the data themselves so they can do their most important work with it!

In the startup world, the teams are smaller and people have to multitask more, but even when working with large organizations, the situation is quite similar! Even with a large team, those running the DS/ML projects have to closely coordinate with others to convey the need for, and reasoning behind, specialized collecting, labeling, and ordering of data. This job may not be split out for a long time, but it’ll eventually move in that direction as the field matures.

Many Data Scientists today are playing dual roles, ‘data scientist’ and ‘data engineer’, out of necessity. That doesn’t mean it’s the optimal solution. The best data scientists and the best data engineers have usually taken very different paths through life. While either can ‘play’ the other’s role, the best data scientists add the most value when they are doing data science.

Similarly, the best data engineers add the most value when they are doing data engineering. There is always going to be a grey area between the two roles. The data engineering team should be encouraged to spread into data science.

Likewise, data scientists should always be learning more about enterprise class database systems. Cross training is extremely important for project momentum and quality.

However, the most value comes from deep domain expertise and the ability to deliver high-quality work in their respective roles. As they say: “Jack of all trades, master of none”.

