Data Engineering — The Junior Problem

Joseph Thickpenny Ryan
May 13, 2021

As the industry begins to realise that Data Science can’t exist on its own, the hype around Data Engineering grows. Wherever we see hype around an industry or job role, we see keen individuals looking to get involved (for various reasons). Whether this is good or bad isn’t for me to say, but the industry is crying out for more Data Engineers, so the demand is definitely there. This drive of individuals towards Data Engineering means that the demand will start to be filled, but employing people isn’t the whole story; we also have the problem of upskilling them to meet the demands of the job.

I’ll be going over what I think the problems are when trying to get into Data Engineering, why we have these problems and my thoughts on the future.

Problems? What problems?

There is a combination of factors within the Data industry that makes it difficult to embrace as someone just coming into technology. Here we’ll introduce what I see as some of the key difficulties:

  • An immature industry
  • An ever-changing technology landscape
  • Big differences from more traditional Software Engineering routes

Data has always been important, but until fairly recently a number of industries were either content with their traditional Business Intelligence (BI) practices or were ignoring a significant amount of information that they could have been using to their advantage.

With more companies attempting to get value from their data, either as a new use case or as an extension of their BI capabilities, there are plenty of employers now just finding their feet. This creates an immature industry, with a small number of experts available to guide companies to their goals and a lack of senior people available to train people entering Data Engineering. It’s important to realise that because any industry could stand to benefit from better data practices, every business could feasibly hire data professionals. The shortage is real.

Waiting just around the corner to compound that problem is the technology landscape itself. In a rapidly evolving industry, some technologies lost relevance almost as quickly as they gained hype, and plenty of newcomers promise the world to fawning onlookers. This means search engines return a healthy amount of recent but largely irrelevant guides, either because the technologies have gone out of fashion (sorry, Hadoop) or because they’re locked behind adoption paywalls (hi, Snowflake).

The task for adopters of modern data practices, either as a company or as an individual, is therefore difficult. There is a lack of expert guidance available to hire, and a lack of trustworthy resources available to get started. This is easier to overcome as a company because you have the option of throwing money at the problem, but it’s significantly more difficult for individuals trying to find their way in the industry.

All of the factors I’ve highlighted are intertwined, but I think the first two are a good introduction to the final point. If we use the “full stack developer” as our image of a traditional Software Engineering route, this becomes quite apparent. If you enter this route as a junior you will likely land in a strictly defined role, where many of the technology decisions were made a long time ago (“we’re an X shop”). This gives you a clearly defined route that doesn’t force you to make many foundational technology decisions, and it puts you alongside senior team members who will help you develop. If a company does this properly, it’s the perfect environment for people to learn their trade before branching out into more technologies.

In an industry with lots of people still trying to find their way, and a technology landscape that sometimes changes overnight, the perfect learning environment hasn’t had time to settle. This makes it difficult for strictly junior-level individuals to find a route into the industry, because the number of opportunities is extremely limited. And we have this problem to deal with before we even start on why the technology choices are different. This is the junior problem.

The Future

This is a problem that businesses can’t afford to ignore if they want to foster a data-driven culture internally, so they have to do something about it. It’s easier said than done, but we can learn from the traditional software engineering route and tweak it to fit our own needs. This approach still suffers from the lack of available experts, but we can’t avoid that as a starting point.

In the short term, there are some obvious candidates for people who could move over into Data Engineering in order to fill the gaps:

  • Backend Engineers
  • DBAs and UN*X Sysadmins
  • Data Scientists who are actually doing Data Engineering and just have the wrong job title

These roles have enough in common with Data Engineering that people in them potentially have a base set of skills allowing them to move across with less friction. It’s not perfect, but it could act as a way to quickly gain more senior members, getting you sooner to a place where juniors can thrive. This is likely to be healthy for your company; the industry can’t just keep hiring seniors.

For the long term, we need to build internal processes that mirror learning experiences we know to work, and here the traditional software engineering pattern is probably the easiest target to aim for. That means hiring juniors and putting them through a strict set of learning experiences, rather than attaching them to a single project where the range of technologies is simply too broad for them to gain long-term value.

To do this, we ironically need to diverge a little from Software Engineering. The strength of the learning pattern comes from seniority and known technologies, but in Data Engineering we know that the technologies can change quite quickly. We therefore need to think about the key ideas our industry relies on, and how and why we apply certain technologies to achieve the goals set out by those ideas. For the Software Engineering learning path this is already a known quantity because the problem is well documented; Data Engineering hasn’t quite settled enough to claim this luxury.

Closing Thoughts

Data Engineering needs more junior opportunities, both for the good of the industry and for the good of interested individuals. We can only hope to achieve this by making internal changes within our own companies that allow us to drive this change, and it’ll be better for everyone involved. There are short-term options, but these must be used to further the long-term goals.

Joseph Thickpenny Ryan

Data Engineer. Double barrelled surname without a hyphen, it’s a problem I didn’t choose.