Cloud Data Architecture
Recently I have been designing end-to-end data pipelines on Azure for my company. In this post, I document my thought process and the reasoning behind each design choice made when architecting a data pipeline. I also compile these into a checklist, which I will continue to refine and reference in future data architecture projects.
I will use the design of a sales data pipeline as an example. The goal is to extract sales data from the outlet POS system, store it in the cloud, make sales predictions, and then push the data to Power BI for business users.
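To make the four stages of the example concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: every name in it (`SaleRecord`, `extract_from_pos`, `store_in_cloud`, `predict_next_day_sales`, `build_report_rows`, `run_pipeline`) is hypothetical, a dictionary stands in for cloud storage, and a simple moving average stands in for the real prediction model. No Azure or Power BI API is called.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SaleRecord:
    outlet_id: str
    date: str      # ISO date string, e.g. "2024-01-01"
    amount: float


def extract_from_pos(raw_rows):
    """Stage 1: parse raw POS rows (plain dicts here) into typed records."""
    return [
        SaleRecord(r["outlet_id"], r["date"], float(r["amount"]))
        for r in raw_rows
    ]


def store_in_cloud(records, storage):
    """Stage 2: persist records. `storage` is a dict standing in for cloud storage."""
    for rec in records:
        storage.setdefault(rec.outlet_id, []).append(rec)
    return storage


def predict_next_day_sales(storage, window=3):
    """Stage 3: placeholder forecast, the mean of the last `window` amounts per outlet."""
    forecasts = {}
    for outlet_id, recs in storage.items():
        # ISO date strings sort chronologically, so a plain string sort is enough.
        recent = sorted(recs, key=lambda r: r.date)[-window:]
        forecasts[outlet_id] = mean(r.amount for r in recent)
    return forecasts


def build_report_rows(forecasts):
    """Stage 4: shape the output as flat rows that a BI tool could load."""
    return [
        {"outlet_id": outlet_id, "forecast": round(value, 2)}
        for outlet_id, value in sorted(forecasts.items())
    ]


def run_pipeline(raw_rows):
    """Chain the four stages in order: extract, store, predict, report."""
    storage = store_in_cloud(extract_from_pos(raw_rows), {})
    return build_report_rows(predict_next_day_sales(storage))
```

Calling `run_pipeline` with a list of dictionaries that each carry `outlet_id`, `date`, and `amount` returns one forecast row per outlet. In a real design each function would be replaced by the service chosen for that stage, which is what the rest of this post works through.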
Here is an overview of the whole architecture:
We divide the whole data architecture process into 6 sections:
Data Ingestion Pipeline
Data Transformation Pipeline
Machine Learning Pipeline (Coming Soon)
CI/CD Pipeline (Coming Soon)
Data Warehouse (Coming Soon)
BI Reporting (Coming Soon)
Checklist
Data Ingestion Pipeline
What business use case will this data pipeline be used for?
What are the data sources and data formats?
Are we collecting data in batches or streaming?
Where do we store collected data?
Data Transformation Pipeline
What data is stored in each data lake tier?
What tools should we use for data transformation?
How to orchestrate data pipelines?
How to monitor data pipelines?
Awesome Resources
Azure Data and AI Architect Handbook: Gives an overview of different data services and architecture concepts on Azure.
Fundamentals of Data Engineering: Gives an overview of modern data engineering. Highly recommended!
Exam AZ-305 - Designing Microsoft Azure Infrastructure Solutions: Official learning module for the Azure solutions architect certification.