After a couple of weeks of playing around in Azure, largely focusing on Azure Data Factory (ADF), Data Lakes, Synapse and other similar offerings from Microsoft, I feel that it is all starting to fall into place and make sense. Admittedly, the vague information we were given did not help in deciding which areas to focus on, but in the end, general curiosity has prevailed.
As part of my future work building various pipelines within ADF, I have started to make notes on the similarities and differences between the tools we currently use at work and Microsoft's offering, as I am aware that I will more than likely be writing internal documentation in the future (oh the joy!). Whilst doing this deep dive, I have come to love how linear and clean the different activities are laid out within the UI, but behind the clean UI and fairly simple design lies quite a powerhouse of ETL/ELT tooling. This power increases greatly once you realise how easy it is to integrate other services such as Azure Functions and Logic Apps.
Over the next few months, I am going to piece together and post here solutions to problems that I come across whilst building pipelines, as well as any useful tools and apps that I build along the way. It will be a long but fruitful journey.
Over the past few months at work, we have been hearing about the possibility that we will be moving our ETL package away from Alteryx to a Microsoft-based tool. For the team that I work in, this would be a significant shift, as a good 85% of our workload is made easier through the use of Alteryx, with a library of macros and flows that has been built up over the past four or five years. Due to the lack of clarity from the working group that was set up to assess the migration, I decided to take a look at Microsoft's ETL offerings. I knew we would either stay on Alteryx (potentially with a heavily limited number of licences) or, based on some research, move to either SQL Server Integration Services (SSIS) or Azure-based tools like Azure Data Factory.
As time went on, it became more and more likely that Alteryx was going to be replaced due to the cost of licensing. Fortunately, I had been working in SSIS, replicating several of our key ETL flows to make sure that all the functionality we needed was available. I was partially hoping for SSIS as it meant I could work with Visual Studio at work (I am not a professional developer, just a hobbyist), but alas, it was not meant to be: last week it was confirmed that Azure tools would be the solution we are moving to in the near future. Unfortunately, nothing is clear yet about how much access we will have to Azure or what functionality we can use without having to raise support tickets, and this will make a big difference to how usable it is for our team, especially during the initial replication process.
Until our questions are answered and we are given access to actually work in Azure, I thought it would be a good time to redeem the free 30-day Azure trial and start to build some workflows with dummy data that loosely represent the core functionality we will need at work. It is a bit of a win-win situation: I get to learn how to work with Azure Data Factory, Logic Apps, Databricks, etc.; there is no cost to me (for 30 days); and it gives me a head start at work when it comes to understanding how everything works. On another note, I thought that documenting my learning throughout my time with Azure would be a good reason to reboot the blog and would provide reference points for anyone else who wishes to start their own time with Azure.