In Part 3 we introduced Kafka, but didn’t explore it very deeply. In this part we will look at how we can utilise Kafka for event collection, with a specific interest in the size of our event log outputs. We will also talk about how to land data off the queue, and what considerations you might have when doing so. Code for this series can be found here.

Batching Data

For this part of the series you’ll be happy to hear that we’ll be working with a single docker-compose file, so you won’t have to jump between stacks! …

In Part 2 we introduced an API that we can send user creation data to, and talked about running queries on the raw data created by that API. That approach has some problems, which this article tackles. Specifically, we’ll look at sending events rather than relying on the transactional system’s database, at some of the problems we can run into by doing that, and at potential solutions. As ever, the code can be found here.

Previously it was discussed that we could copy our API’s database into…

A technical starting point for Data Engineering. Here we’ll look at an example of how a traditional BI workflow might look, and then discuss how Data Engineering handles the same problem when scale and immediacy become an issue for the traditional approach. The example will have us looking at an API written in Python and backed by Postgres. We’ll use this to examine how to get data out of your system in a batch manner, and discuss this as an approach. Code is available here.

Making Our API

The first thing we need is an API, and we want it to do…

Part 1 will primarily serve as a non-technical introduction to Data Engineering, the areas it has a lot in common with, and how it all mixes together. The further parts will introduce the concepts through a technical lens, giving you some ideas for projects and some basic example code.

What is Data Engineering?

This is a question that you will either be asking yourself, or have already tried to research but still aren’t sure about. There’s no shame in that; the community itself recognises that the job role can be extremely broad, with lots of different approaches to solving the same…

As the industry begins to realise that Data Science can’t exist on its own, the hype around Data Engineering grows. Wherever we see hype around an industry or job role, we see keen individuals looking to get involved (for various reasons). Whether this is good or bad isn’t for me to say, but the industry is crying out for more Data Engineers, so the demand is definitely there. …

Joseph Thickpenny Ryan

Data Engineer. Double barrelled surname without a hyphen, it’s a problem I didn’t choose.
