Below is a conversation that took place in an interview for the role of Lead Data Architect. It was more like a business analysis and technical discussion between a client and a consultant. We have tried our best to narrate it the way it happened in the interview. We have changed the names for privacy purposes: the interviewer is Alison, and the interviewee is Vasumat.
Mr. Vasumat had already gone through 1 coding round and 3 technical discussions within a span of 8 weeks. Now, he was having the final discussion with senior leadership. He was informed that he should be prepared to spend 6 to 8 hours (including lunchtime) in their office. Vasumat reached the office and completed his entry formalities by 8:45 AM, although his interview was scheduled for 9:15 AM. HR introduced Vasumat to the interviewer.
.
Here you go:
.
Alison: It’s nice to meet you, Mr. Vasumat.
Vasumat: The honor is all mine, ma’am.
Alison: You can call me Alison.
• I am a chief program director working with XXXXX for the last 20 years and I take care of the digital transformation and modernization of our software applications.
• We have a humongous enterprise that includes thousands of servers, databases, and petabytes of data with heterogeneous database systems. We’ve been planning to migrate our applications to the cloud.
• That’s the reason I’ll be staying here in India for the next few months, to settle the tech stack and kick off the migration. We have hired cloud architects and are looking for a data architect who can handle end-to-end data migration.
• We have a partnership with Microsoft, so Azure will be the primary choice. However, down the line, we would like to go with a multi-cloud architecture, most probably with Azure and Google.
• Richard told me that you have considerable experience, especially in data platforms and cloud migrations. Also, from the previous discussions, they have a positive outlook on your profile. I just wanted to discuss it before finalizing.
• So Vasumat, I would like to hear about your experience with cloud migrations
.
Vasumat: Thanks for the detailed background. I have over 13 years of experience in dealing with enterprise database systems. My cloud journey started 7 years back, and since then I have gained extensive knowledge in migrating heterogeneous database systems from on-premises to the cloud. I have experience in both Azure and AWS, but my primary specialization is Azure.
Alison: Since you are an architect, you must have participated in migration planning and execution, haven’t you?
Vasumat: Certainly! It’s part of my role.
.
Alison: Great, can you describe the phases of cloud migration at a high level?
Vasumat: There are 5 (1+4) phases involved:
• Migration Preparation and Business Planning + Discovery, Planning, Execution, and Optimization.
Alison: I am listening…
Vasumat: Let me explain in detail.
• Migration Preparation and Business Planning: We create a business case for migration by projecting benefits in terms of cost, performance, and digitalization aspects and seek business approvals. Before starting the next phase, we must have a dedicated stakeholder (Cloud/Solution/Migration/Data Architect) who can lead and execute the migration project.
◦ Phase 0 – Output: A clear Business Case that tells us why we are migrating to the cloud
.
• Discovery & Assessment: We understand, analyze, and collect data from the source environment. This includes infrastructure, software, hardware, network & security, operational models, enterprise policies and agreements, and data landscape. I do take care of the data part.
◦ Software applications: web, desktop, mobile, message broker, web service, etc.
◦ Application dependencies: Custom libraries, parent and child apps, integrations, etc. Ex: Our web app is using a Currency conversion service; we built a custom library to convert an invoice query result to a PDF etc.
◦ Legacy applications: Unsupported / Older versions, mainframes, etc.
◦ Third-party solutions: Collibra, Meta-Center, HVR replication, Commvault, etc.
◦ Client Tools: Visual Studio, Eclipse, Oracle developer, SSMS, MySQL Workbench, etc.
◦ Source code repository: TFS, SVN, Git, Bitbucket, etc.
◦ Operating systems and capacity: Windows, Linux, Mac, Memory, CPU, IO, Disks, etc.
◦ Licensing: Opensource, corporate agreements, renewal status, licensing mobility, etc.
◦ Hardware mapping: Servers, Racks, Cables, Switches, Storage, Routers, Power Equipment, Cooling Systems, Network devices, etc.
◦ Servers: Physical / Virtual, Application, Web, Database, Proxy, Mail, FTP, Print, DNS, etc.
◦ Network and Security: SSL, TLS, Encryption, firewalls, network security groups, web application and third-party firewalls, end user connectivity, internet gateways, etc.
◦ Authentication and authorization: Domain controllers, Active Directory, Active Directory Federation Services, SSO integration, etc.
◦ Compliance requirements and solutions: Standard regulatory and industry-specific compliance Ex: GDPR, HIPAA, PCI DSS, SOX, CCPA, etc. Code scanners, auditing tools, and services.
◦ Operational models: Development, release & deployment, configuration, escalation, system maintenance, patching, virtualization, etc.
◦ Environments: DEV, QA, UAT, Pre-Prod, Prod, etc.
◦ Data Landscape:
· Data Architecture: Lambda, Kappa
· Sources: Web Apps, Mobile Apps, IoT, etc.
· Storage layers: databases, data warehouse, data lakes, delta lakes, file system/share, etc.
· Databases: RDBMS, NoSQL, Analytical, Caching, Big Data management systems, etc.
· Data Movement/Synchronization: Batch processing, streaming, pull/push, etc.
· HA-DR and Load Balancing: Agreements (RTO/RPO), tools, services, and native features used for HA, DR, and LB purposes. Ex: Oracle (RAC, Data Guard, Streams, Flashback, Fail-safe, etc.), SQL Server (Always-On, replication, log-shipping, Change Data Capture, etc.), third-party tools (HVR replication, Commvault backups, etc.), Linux/ Windows snapshots, etc.
· Data Integrations and Lineage: Managing and monitoring all data stores, access, classification, and data movement across the enterprise from a central location. It includes data lineage tools (Collibra, metacenter, etc.), data directories, and dictionaries.
· Data Ingestion, ETL, and ELT tools: Kafka, NiFi, Talend, SSIS, Informatica, etc.
· Reporting tools: SSRS, Power BI, Tableau, etc.
· Data Security: Data encryption features, data masking, access control, certificates, key stores/vaults, etc.
· Data Classification:
• Security (Public, Internal, Confidential, Restricted)
• Usage (Hot, Warm, Cool)
• Domain (Government, Health, Financial, PII, etc.)
• Type of Data (Transactional, Operational, Analytical, Master, Historical, system logs, backups, etc.)
◦ Assessment reports: We need to run assessment tools and services to identify the cloud migration readiness of our workloads.
◦ Benchmarks: Performance baselines, common errors and their possible resolutions, and the compatibility issues and migration blockers found by running assessment tools (Ex: Data Migration Assistant, Upgrade Advisor, etc.).
◦ Software and Hardware Maintenance: Maintenance and upgrade windows, predictive analysis tools, etc.
◦ Procedures and Policies: Configurations, decommission policies, security frameworks, etc.
◦ Application rankings: Rank applications and data stores based on their criticality, business dependency, complexity, revenue impact, data volume, etc.
◦ Business landscape: Operating regions, time zones, runtimes, shifts, critical applications, operational costs, vendors, accountabilities, point of contacts, approval matrix, escalation matrix, domain-specific hardware/software requirements, compliance certifications, etc.
◦ Documents: AS-IS architecture diagrams, Service Level Agreements, Functional Design Documents, Detailed Design Documents, Commercials, Network and security frameworks, Data Requirements, etc.
· Phase-1 – Output: Documented footprint/inventory of all resources, assets, and processes of the source environment. The discovery document describes the entire enterprise data and infrastructure landscape and migration readiness of the workloads.
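The application ranking step in the list above can be sketched as a simple weighted score. This is only an illustration: the criteria come from the list, but the weights, the 1 to 5 scale, and the application names are hypothetical and would need to be agreed with the business.

```python
from dataclasses import dataclass

# Hypothetical weights for the ranking criteria named in the discovery list.
WEIGHTS = {
    "criticality": 0.30,
    "business_dependency": 0.20,
    "complexity": 0.20,
    "revenue_impact": 0.20,
    "data_volume": 0.10,
}


@dataclass
class AppProfile:
    name: str
    scores: dict  # criterion -> score on an assumed 1 (low) to 5 (high) scale


def risk_score(app: AppProfile) -> float:
    """Weighted sum of the criterion scores; higher means riskier to move."""
    return round(sum(WEIGHTS[c] * app.scores[c] for c in WEIGHTS), 2)


def rank_for_migration(apps):
    """Lowest-risk applications first, so simple workloads can migrate early."""
    return sorted(apps, key=risk_score)


# Hypothetical inventory entries.
apps = [
    AppProfile("billing", {"criticality": 5, "business_dependency": 5,
                           "complexity": 4, "revenue_impact": 5, "data_volume": 4}),
    AppProfile("intranet-wiki", {"criticality": 1, "business_dependency": 1,
                                 "complexity": 2, "revenue_impact": 1, "data_volume": 1}),
    AppProfile("reporting", {"criticality": 3, "business_dependency": 3,
                             "complexity": 3, "revenue_impact": 2, "data_volume": 5}),
]

for app in rank_for_migration(apps):
    print(app.name, risk_score(app))
```

Sorting in ascending order of risk is a design choice that matches the idea of starting with the simplest workload; an enterprise could just as well rank by business value.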
.
Alison: That’s a long list man. I am curious to know how you collect all this information from the source environment.
Vasumat: That’s a valid question indeed. Collecting this data from data stores and file systems was familiar to me, but I had the same doubt the first time we needed to assess the infrastructure as part of a cloud migration. I have since learned how to do it, and here is the approach:
• Perform IT infrastructure audits to identify all assets (software, hardware, upgrades, maintenance, etc.) associated with our organization.
• We also need more granular details at the application level. An enterprise application typically has an architecture diagram that we call the “Detailed AS-IS Architecture”. Architecture diagrams give us the most information on all assets, resources, and integrations involved in that application. If no architecture diagram is available for some reason, it takes additional time and effort to prepare one.
• Along with AS-IS architectures, we may need to go through the existing application inventories, repositories, documentation, etc.
• Discussions may also be required with technology and business stakeholders, operators, vendors, and counterparts, including architects and the application, database, network, storage, security, audit, BA (Business Analyst), maintenance, and deployment teams.
• Assessing Data Platform: Some customers maintain data lineage using custom or native tools. In that case we can easily assess the data platform, because data lineage contains complete information including the type of data, classification, data movement, integrations, access control, data shares, etc. If there is no lineage, we may need to prepare the enterprise data lineage using tools, custom scripts, or by going through data repositories, catalogs, dictionaries, data tags, etc.
.
Alison: Interesting. Coming back to the main discussion, we have talked about Discovery. Shall we talk about the next phase?
Vasumat: Sure, next is the migration planning phase.
• Planning Phase: Based on the Discovery phase output, we should start planning for the migration. Here we talk about:
• Cloud model: Private, Public, Hybrid, multi-cloud
• Deployment type: PaaS, IaaS, SaaS
• Migration scope: Determine which applications and dependents are to be migrated.
• Subscriptions, Tenants: Based on region-specific requirements, environments (DEV, QA, PROD), and billing requirements we can go for multi subscription and tenant model.
• Resource mapping: For example, RDBMS with Azure SQL / Managed Instance; Oracle with Azure VM nodes; NoSQL with Cosmos DB APIs; Choose service tiers and performance levels based on computing and storage requirements; File-server with Azure File Shares; Object storage with Blob or Data Lake; Data warehouse with Synapse; Applications to Azure App service or host them on an Azure VM; ADF for Cloud-based ETL and SSIS workloads; Elastic jobs for SQL Agent jobs; Azure functions for running event-based programs; Azure Logic Apps for running workflows; Event Hub, Stream Analytics, IoT hub and Queue storage for streaming and analytics purpose; Azure HDInsight, Databricks, ML, Synapse Analytics, ADLS analytics, Stream Analytics, Cosmos DB, etc. for Big Data workloads; Azure Purview / Data Catalog for Data Lineage.
• Migration strategy: Lift-and-Shift, Phased, 6R principle
• Migration type: Offline, Online
• Migration approach/Tools/Services: DMS (Data Migration Service), Data Migration Assistant (DMA), SSMA (SQL Server Migration Assistant), Backup/Restore, Replication, ETL, etc.
• Migration Sequence: Migration sequence/priority for applications and dependents
• Timelines: Run POCs, calculate estimated timelines, and define business downtime. We may need to consider application development schedules, release activities, downtime requirements, etc.
• Budget and cost calculations: TCO (Total Cost of Ownership), ROI (Return On Investment), Migration Cost, User training cost, licensing cost, hidden cost like downtime cost or upgrading to the latest, etc.
• Cost modeling: Pay-As-You-Go, On-demand, Reservation, Long-term storage, etc.
• Legacy applications: Consider upgrading them before migrating.
• CI/CD Integration: Integrating code build and release with cloud-based continuous integration / continuous delivery (CI/CD) pipelines.
• Connectivity requirements: Between on-premises and cloud (Ex: for migration activity, hybrid cloud scenarios, VPN Gateways, ExpressRoute, etc.), internet gateway requirements, VPN-to-VPN connectivity, proxy requirements, Load Balancers, etc.
• Cloud Security: Consider storage and transport layer encryption, Data Masking, customer-managed keys, firewalls, Network Security Groups, RBAC (Role-Based Access Control), ACL (Access Control Lists), Active Directory mapping, SSO integration, Multifactor Authentication, AKV (Azure Key Vault), Azure storage encryption, Azure cloud backbone/private data transfer, Database TDE (Transparent Data Encryption), Azure ATP (Advanced Threat Protection), etc.
• Documentation: Keep all collected and prepared documentation in Phase-1 and 2 at central document storage.
• Architectures: Application level as-is and to-be architectures
• Policies and procedures: Infrastructure, security frameworks, policies, SLA, Vendor contacts, etc.
• Migration Problem Resolution document: We must have a central document that should include all possible migration blockers, detailed root cause analysis, and possible resolutions for both the short and long term.
• Performance baselines: Capture the current on-premises baselines and document them so that we can compare them with the cloud baselines once migrated.
• Business and User Impact Analysis: Identify and document the impact on end customers after migration.
• Test cases: Along with the application-level test case documents, we should create and maintain a test case document for workload migration.
• Communication: We must work on a training plan and communication with the external users and stakeholders
◦ Phase 2 – Output: “TO BE Architecture” means cloud architecture for our on-premises workloads with a detailed migration plan.
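The resource mapping step of the planning phase can be sketched as a lookup from discovered source types to candidate Azure targets. The targets below are taken from the resource mapping bullet; the dictionary keys, asset names, and the function name are hypothetical labels, and a real mapping would also weigh service tiers and performance levels.

```python
# Candidate targets taken from the resource mapping bullet above.
# The keys are hypothetical labels for entries in the discovery inventory.
TARGET_MAP = {
    "rdbms": "Azure SQL / Managed Instance",
    "oracle": "Oracle on Azure VM nodes",
    "nosql": "Cosmos DB APIs",
    "file_server": "Azure File Shares",
    "object_storage": "Blob or Data Lake",
    "data_warehouse": "Synapse",
    "ssis": "ADF",
    "sql_agent_job": "Elastic jobs",
    "web_app": "Azure App Service or Azure VM",
}


def map_inventory(inventory):
    """Split (asset_name, source_type) pairs into mapped and unmapped assets."""
    mapped, unmapped = {}, []
    for name, source_type in inventory:
        target = TARGET_MAP.get(source_type)
        if target is None:
            unmapped.append(name)  # no default target; needs an architect's decision
        else:
            mapped[name] = target
    return mapped, unmapped


# Hypothetical inventory from the discovery phase.
mapped, unmapped = map_inventory([
    ("crm-db", "rdbms"),
    ("erp-db", "oracle"),
    ("mainframe-batch", "mainframe"),
])
print(mapped)
print("needs review:", unmapped)
```

Keeping the unmapped assets in a separate list makes legacy workloads visible early, which feeds the migration scope and the problem resolution document.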
.
Vasumat: Any questions at this phase?
Alison: Well, I have some, but I would prefer to talk about them after discussing the next 2 phases.
Vasumat: Sure. After successful planning, we proceed to the execution phase.
• Execution: After assessment and migration planning, we go for business approvals on commercials/budget/resource billing/vendor, etc. Once everything is in place, we need to follow the plan and execute the migration.
• In an enterprise environment, we almost always choose the simplest and most stable non-production workload for the first migration. There are three key aspects that we need to capture and document during the migration process:
◦ Testing: Application, connectivity, integration, functionality, performance, and security
◦ Statistics: Capture the migration statistics including configurations, time durations, errors, surprises, warnings, data volume, etc.
◦ Solutions: Conclude both short-term and long-term solutions for the identified errors and issues during the migration.
• This process is iterative until the last application and its dependents are migrated to the cloud as per the plan.
◦ Phase 3 – Output: A successful migration results in a fully tested application running on cloud infrastructure.
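The three things captured during each migration run (testing, statistics, solutions) can be sketched as one record per workload. This is a minimal illustration: the class, field names, and sample values are hypothetical, and only the six test areas come from the list above.

```python
from dataclasses import dataclass, field

# The six test areas listed under "Testing" above.
TEST_AREAS = ("application", "connectivity", "integration",
              "functionality", "performance", "security")


@dataclass
class MigrationRun:
    workload: str
    environment: str
    tests: dict = field(default_factory=dict)       # test area -> passed?
    statistics: dict = field(default_factory=dict)  # durations, volumes, error counts
    solutions: list = field(default_factory=list)   # (issue, short_term, long_term)

    def is_signed_off(self) -> bool:
        """A run is complete only when every test area has passed."""
        return all(self.tests.get(area, False) for area in TEST_AREAS)


def open_issues(runs):
    """Issues across all runs that still lack a long-term solution."""
    return [(run.workload, issue)
            for run in runs
            for issue, short_term, long_term in run.solutions
            if not long_term]


# Hypothetical pilot run on a simple non-production workload.
pilot = MigrationRun(
    workload="intranet-wiki",
    environment="DEV",
    tests={area: True for area in TEST_AREAS},
    statistics={"duration_minutes": 42, "data_gb": 12, "errors": 1},
    solutions=[("login timeout after cutover", "raised the timeout", "")],
)
print(pilot.is_signed_off(), open_issues([pilot]))
```

Because the process is iterative, the list of runs doubles as the lessons-learned log for the next workload in the sequence.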
.
• Optimize: Once we migrate our workloads to the cloud, we need to closely monitor, review, and optimize them, especially in terms of performance and cost. That includes:
• Resizing the compute resources: Upgrading or downgrading between service tiers, storage allocation, etc.
• Leveraging the modernized features: For example, automatic tuning, geo-replication, advanced threat detection, RBAC, Multi-Factor Authentication, etc.
• Seek more automation opportunities that improve operational efficiency Ex: Azure Automation, ARM Templates, Azure DevOps CI/CD pipelines, etc.
• Slowly integrate with the Cloud-based logging, monitoring, and alerting tools Ex: Application Insights, Azure Monitor, Log Analytics, Network watcher, Resource Health, etc.
◦ Phase 4 – Output: An optimized (better performance with the least possible cost) enterprise workload.
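The resizing decision in the optimization phase can be sketched as a check of observed utilization against thresholds. The 30% and 80% thresholds, the function name, and the use of a simple high-percentile estimate are assumptions for illustration only; in practice the samples would come from the monitoring tools mentioned above.

```python
def resize_recommendation(cpu_samples, low=30.0, high=80.0):
    """Suggest a tier change from CPU utilization samples (percent).

    Uses a simple high-percentile estimate (roughly the 95th percentile)
    so that a single quiet or busy minute does not drive the decision.
    The thresholds are hypothetical defaults, not vendor guidance.
    """
    if not cpu_samples:
        raise ValueError("at least one utilization sample is required")
    ordered = sorted(cpu_samples)
    index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    peak = ordered[index]
    if peak < low:
        return "downgrade"
    if peak > high:
        return "upgrade"
    return "keep"


print(resize_recommendation([10, 12, 15, 20, 18]))
print(resize_recommendation([85, 90, 95, 70]))
print(resize_recommendation([40, 50, 60]))
```

A real review would look at memory, IO, and cost alongside CPU before changing a service tier.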
.
Alison: Impressive explanation, thanks for that. I’ve got a couple of questions.
Alison: Between SaaS, PaaS, and IaaS which is the best deployment model?
Vasumat: Each service model has its benefits, so there is no single best one. It is our responsibility to assess the feasibility, compatibility, and differences among the three, and the right choice is the one that fits our business.
.
Alison: Assume I am a newbie to the Cloud world; can you describe cloud deployment models with some understandable real-time examples?
Vasumat: Sure, I’ll compare them in transportation terms.
• IaaS (Infrastructure as a Service – Host on it):
◦ Renting infrastructure (server/VM, hardware, network, etc.) – Like a rental car, we don’t own it, we just drive ourselves, and we can return, change, or upgrade whenever we wish.
◦ We can rent a VM, install our app, use it, and return the VM to the cloud once our business is done. We need to drive (control/take care of) our application.
• PaaS (Platform as a Service – Build on it):
◦ Renting the platform (Operating systems, middleware, database/warehouse, storage, functions, app service, Cosmos DB, Redis Cache, event grid, etc.) – Like a taxi/Uber, we book a ride, we don’t own the car, we don’t even drive, a driver takes us to the destination.
◦ We can rent a database, create our schema, populate with the data set, use it and release it once our job is done. Database installation and maintenance will be taken care of by the cloud. We just need to use the service.
• SaaS (Software as a Service – Consume it):
◦ Renting software – Like public transport, the routes are pre-set and we share the ride with other people.
◦ We can rent Office365 as an email service for our enterprise.
• Private Cloud / Own Data center: Like your own car. It’s a capital expenditure, we are purchasing an asset hence we have full control over it. But we must handle upfront and maintenance costs.
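The transportation analogy can also be read as a split of who manages which layer of the stack. The sketch below is a simplified illustration under assumptions: the layer names and the exact split points are a common textbook simplification, not an official matrix, and in every model the customer still remains responsible for its own data and access.

```python
# Simplified stack, bottom to top (assumed layer names).
LAYERS = ["hardware", "network", "virtualization",
          "operating_system", "middleware_runtime", "application"]

# First layer the customer manages in each model; None means nothing in the stack.
CUSTOMER_FROM = {
    "on_premises": "hardware",   # own car: we manage everything
    "iaas": "operating_system",  # rental car: we drive it ourselves
    "paas": "application",       # taxi: we only say where to go
    "saas": None,                # public transport: we just consume the service
}


def who_manages(model):
    """Return which layers the provider and the customer manage for a model."""
    start = CUSTOMER_FROM[model]
    split = len(LAYERS) if start is None else LAYERS.index(start)
    return {"provider": LAYERS[:split], "customer": LAYERS[split:]}


for model in CUSTOMER_FROM:
    print(model, "-> customer manages:", who_manages(model)["customer"])
```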
.
Alison: You described four different phases in migration. Which is the most significant phase, the one that determines the success or failure of the migration project? I know all are equally important, but rank them and tell me about the top ranker.
Vasumat: The migration planning phase is the key one. But again, it depends on the Assessment phase. I would say that Assessment + Migration Planning ranks first.
Alison: Can you justify? Take an example if required.
Vasumat: Sure! I reckon you have heard the quote “IF YOU FAIL TO PLAN, YOU ARE PLANNING TO FAIL”, commonly attributed to Benjamin Franklin. People mostly use this quote in the financial world, but I believe it applies to every choice in our life, and it applies to Cloud Migration too. Let me explain with an example.
• Use Case Background: Assume person A wants to spend a week of the holiday season in Switzerland and is making the travel arrangements. Mr. A and his family already had their tourist visas stamped, so he searched for flights, shortlisted them by departure time and cost, and quickly booked tickets from New York to Zurich.
• Problem-1 (Cost): On the day of travel, they reached the airport, and while collecting the boarding passes he realized that the baggage allowance was much lower than the general standard, so they had to pay for the extra luggage.
• Problem-2 (Time): The itinerary was New York-US > Boston-US > Lisbon-Portugal > Zurich-Switzerland. The layover was 4 hours in Boston and 10 hours 50 minutes in Lisbon, and the total travel time including layovers was 28 hours.
• Problem-3 (Readiness/Compatibility): Finally, they reached Zurich airport, and when the immigration officer verified Mr. A’s Covid-19 vaccination certificate, he found that the surname was not fully mentioned on it. The officer consulted his superiors for confirmation, and Mr. A and his family had to wait 2 more hours for approval. Eventually the problem was resolved and they were allowed to enter Switzerland.
• Problem-4 (Readiness/Compatibility): By then it was 2:30 AM in Switzerland, with heavy snowfall outside. They reached the hotel and got a final surprise when the reception asked for a negative Covid-19 test report from the last 48 hours. Since it was mandatory, they couldn’t get a room. Mr. A struggled to find a hotel with no such restriction, found one on the outskirts, and they finally checked in at 6:30 in the morning.
• Problem-5 (Readiness/Compatibility): Unusually, Switzerland was facing extreme weather conditions that year, and according to the forecast, it could get worse over the next 5 days.
• Mr. A ended up with an awful holiday trip.
• Use Case Analysis:
◦ Failures in Assessment and Discovery: Mr. A didn’t properly check the airline limitations, layover times, immigration rules, hotel policy, arrival time, and weather conditions.
◦ Execution Phase: He made a flawed plan and followed it blindly. We can apply the same lessons to a cloud migration project.
◦ Do not choose the target resource just based on its pricing. For example, serverless comes at a cheaper price, but that doesn’t mean it suits all our requirements.
◦ Eagle View: While planning the migration, consider all possible aspects. If we ignore the migration timelines, for example, the project team might be planning critical releases in the same period, or our migration activity might overlap with a maintenance window in the target environment.
◦ Resource Mapping: Map resources and compute tiers only after properly analyzing the current workload statistics and performance baselines. For example, a standard SSD is cheaper, so I might choose it for my production workload to show a bigger margin in the cost-saving sheet. It doesn’t work like that: we can only make the right choice after considering the IOPS and throughput requirements. As a rule of thumb, though, a premium SSD is generally the recommended minimum for production workloads.
◦ To conclude, we need a sound plan, which requires comprehensive analysis. A sound plan with proper execution puts our migration project on a successful path.
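The point about choosing storage by requirements rather than by price can be sketched as picking the cheapest tier that still meets the measured baseline. The tier names echo the discussion, but every number in the catalog below is illustrative only and is NOT a real Azure limit or price; the 20% headroom is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskTier:
    name: str
    max_iops: int
    max_mbps: int
    monthly_cost: float


# Illustrative numbers only, NOT real Azure limits or prices.
CATALOG = [
    DiskTier("standard-hdd", 500, 60, 20.0),
    DiskTier("standard-ssd", 2000, 150, 40.0),
    DiskTier("premium-ssd", 7500, 250, 120.0),
]


def pick_tier(required_iops, required_mbps, headroom=1.2, catalog=CATALOG):
    """Cheapest tier that meets baseline IOPS and throughput plus headroom."""
    candidates = [
        tier for tier in catalog
        if tier.max_iops >= required_iops * headroom
        and tier.max_mbps >= required_mbps * headroom
    ]
    if not candidates:
        raise ValueError("no tier in the catalog meets the requirement")
    return min(candidates, key=lambda tier: tier.monthly_cost)


# Baseline of 1800 IOPS and 100 MB/s captured on-premises (hypothetical).
print(pick_tier(1800, 100).name)
```

With these made-up figures the cheaper standard SSD looks tempting, yet it fails the IOPS check once headroom is applied, which is the mistake the cost-saving sheet example warns about.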
.
Alison: (Smiling) I think we need some coffee now!
Vasumat: I’m afraid I am talking too much, am I?
Alison: No no, I didn’t mean it! I was just trying to map your thought process with our technology infrastructure. By the way, Benjamin’s Quote is my favorite too. Let’s have a 10 min break and then we’ll continue our discussion.
.
I am holding here, and we will see the remaining discussion in the next part.