Links to other parts
A discussion between a CxO and a senior Data Architect Part 1
A discussion between a CxO and a senior Data Architect Part 3
A discussion between a CxO and a senior Data Architect Part 4
A discussion between a CxO and a senior Data Architect Part 5
.
Background: We have been following a discussion that took place between senior leadership and a data architect. They took a coffee break, and here Part 2 continues.
.
<During the break, they had some casual talk about personal life, hobbies, etc., which is irrelevant here. We’ll pick up the discussion after the break.>
.
Discussion Follows:
.
Alison: So, Vasumat, let’s say a CxO-level executive asked you, “Why should we migrate to the cloud?” Assume that person cares about business, not technology. You need to convince him or her. Can you?
Vasumat: Well, that’s a reasonable question, and answering it is what we call a Business Case. That’s what we do in Phase 0, “Migration Preparation and Business Planning”. Before starting the migration project, we need to establish a business case for the migration and set expectations for cost, return, and the timing of those costs and returns. Money is everything in business, so the migration business case presents TCO (Total Cost of Ownership) and ROI (Return On Investment) calculations that quantify the benefit of migrating to the cloud.
Alison: Can you define the terms TCO and ROI?
Vasumat: Sure! Those are the key parameters for establishing the business case. Together, TCO and ROI describe the money we pay out of our own pocket and the gain or return we get on that investment.
Alison: I am listening…
Vasumat:
• TCO (Total Cost of Ownership): the sum of the expenses associated with purchasing, deploying, using, and retiring a product/equipment/asset; essentially, the purchase price plus the operating cost of an asset over its lifetime. There are two types of costs involved: direct and indirect. Ex: When we buy a car:
◦ Direct cost: Car retail price, registration tax, road tax, insurance premium, excise duty, state tax, monthly fuel expenses, car maintenance and servicing cost, etc.
◦ Indirect costs: Parking tickets and toll charges count as hidden or indirect costs.
◦ Car TCO: The total of all direct and indirect expenses is the car’s TCO. Likewise, we need to calculate the overall TCO of our organization’s technology infrastructure, which includes hardware, software, and operations costs.
• On-premises TCO calculation: We usually perform infrastructure audits to identify the direct and indirect costs:
◦ Direct costs: Direct costs are relatively easy to calculate, as they appear on our organization’s balance sheet. They are categorized into hardware (physical assets), software (licensing, maintenance, upgrades, etc.), management (architect, hosting, reporting, etc.), support (support staff), and implementation (development, customization, and integration). Ex: Physical servers, storage devices, network devices, printers, software licenses, maintenance contracts, warranties, supplies, material, spare parts, real estate/space, internet facility, labor/resources, hiring, onboarding, training employees, maintenance, upgrades, security, disaster recovery, etc.
◦ Indirect costs: Harder to calculate but just as important as direct costs. Ex: Unscheduled maintenance due to breakdowns or failures, business impact of unexpected downtime, power and energy consumption, depreciation of asset performance and value over time, etc.
• Cloud TCO calculation: Almost all cloud providers offer a TCO calculator. Based on our on-premises analysis, we input the infrastructure details (servers, OS, cores, memory, storage, licensing, etc.), and the calculator estimates and reports the cloud TCO. Additionally, we need to account for the one-time investment in the migration itself.
◦ Direct costs: Cloud services and subscriptions, integration and testing, consulting and training, and migration costs. Ex: Database services, App Service, VMs, load balancers, data ingestion charges, storage, application testing, training on cloud technology, hiring for the cloud migration, the cost of the chosen 6R strategy (e.g., upgrading to new versions), etc.
◦ Indirect costs: Excess compute and storage reservations; premium service packages without a proper SLA (no guaranteed gains in performance, cost, or support); software licenses used or forfeited (lost or given up) in the transition; infrastructure still required on-premises after the migration (hybrid model); unexpected cost increases from hosting without a POC; and user behavior (untrained users may provision many unnecessary services without understanding the cost consequences), etc.
• ROI (Return on Investment): It tells us how much profit or loss our investment has earned. ROI is the net profit (or loss) over a period, typically a fiscal year, divided by the cost of the investment and expressed as a percentage. Here is the simple formula:
◦ General formula: ROI = (Net Profit / Cost of Investment) x 100
◦ *Net Profit = Gain from the investment – Cost of Investment
.
◦ ROI formula commonly used in cloud migrations:
◦ ROI = ((FVI – IVI) / Cost of Investment) X 100
◦ *FVI: Final Value of Investment
◦ *IVI: Initial Value of Investment
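As a rough illustration, the TCO arithmetic above can be sketched in Python. The `Estate` and `CostItem` names and every dollar figure are hypothetical placeholders; the model deliberately ignores depreciation, discounting, and price changes, and simply adds a one-time cost (hardware purchase on-premises, the migration project in the cloud) to the recurring direct and indirect costs over a chosen number of years.

```python
from dataclasses import dataclass, field


@dataclass
class CostItem:
    name: str
    annual_cost: float   # recurring cost per year, in dollars
    direct: bool = True  # False marks an indirect (hidden) cost


@dataclass
class Estate:
    one_time_cost: float  # e.g. hardware purchase or migration project
    items: list[CostItem] = field(default_factory=list)

    def split(self) -> tuple[float, float]:
        """Return (direct, indirect) recurring costs per year."""
        direct = sum(i.annual_cost for i in self.items if i.direct)
        indirect = sum(i.annual_cost for i in self.items if not i.direct)
        return direct, indirect

    def tco(self, years: int) -> float:
        """TCO over `years`: one-time cost plus all recurring direct and indirect costs."""
        direct, indirect = self.split()
        return self.one_time_cost + (direct + indirect) * years


# All figures below are hypothetical placeholders, not benchmarks.
on_prem = Estate(one_time_cost=120_000, items=[
    CostItem("software licenses", 40_000),
    CostItem("operations staff", 60_000),
    CostItem("power and cooling", 20_000, direct=False),
    CostItem("unplanned downtime", 15_000, direct=False),
])
cloud = Estate(one_time_cost=50_000, items=[  # one-time migration cost
    CostItem("cloud services", 90_000),
    CostItem("cloud operations staff", 30_000),
    CostItem("over-provisioned resources", 8_000, direct=False),
])

for label, estate in (("On-premises", on_prem), ("Cloud", cloud)):
    print(f"{label} 3-year TCO: ${estate.tco(3):,.0f}")
```

With these made-up inputs the cloud estate comes out cheaper over three years ($434,000 vs. $525,000); different inputs can easily flip the result, which is why the on-premises audit matters.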
.
Alison: Vasumat, I’ll give you a small business case, would you be able to calculate ROI?
Vasumat: Honestly, I am not experienced in calculating overall business ROI. When calculating business ROI, finance teams consider employee wages, business revenue, profit, sales and marketing, promotions, cash flows, IRR (Internal Rate of Return), NPV (Net Present Value), and many other factors. My role was to provide the IT infrastructure inputs.
.
Alison: I understand, but you are heading in the right direction. Do one thing: take a simple example and explain how to calculate ROI. If you are not comfortable, we can move on to the next topic.
Vasumat: Sure, I’ll try, based on my understanding of ROI. Assume “Mr. V” bought 1,000 shares of a company called “C” at $10 per share. One year later, he sold the shares for $14.5 per share. He also earned a $700 dividend (profit the company shares with its shareholders) during the one-year holding period. He spent a total of $145 on trading commissions ($68 for buying and $77 for selling). Now let’s calculate the ROI of Mr. V’s investment.
.
> Cost of Investment = ($10 * 1,000) = $10,000
> Initial Value of Investment = ($10 * 1,000) + $68 = $10,068
> Final Value of Investment = ($14.5 * 1,000) + $700 – $77 = $15,123
ROI = (FVI – IVI) / Cost of Investment * 100
ROI = (($15,123 – $10,068) / $10,000) * 100 = 50.55%
ROI of Mr. V’s investment = 50.55%
.
*We add the buying commission to the initial value because we pay it out of our own pocket.
*We subtract the selling commission from the final value because it is deducted from the sale proceeds.
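Mr. V’s calculation can be reproduced with a few lines of Python as a sanity check of the FVI/IVI formula. The function and variable names are our own; the figures are the ones from the example.

```python
def roi_fvi_ivi(fvi: float, ivi: float, cost: float) -> float:
    """ROI = ((FVI - IVI) / Cost of Investment) x 100."""
    return (fvi - ivi) / cost * 100


shares = 1_000
buy_price, sell_price = 10.0, 14.5
dividend = 700.0
buy_commission, sell_commission = 68.0, 77.0

cost = buy_price * shares                                # $10,000
ivi = cost + buy_commission                              # $10,068: paid on top of the purchase
fvi = sell_price * shares + dividend - sell_commission   # $15,123: deducted from the proceeds

roi = roi_fvi_ivi(fvi, ivi, cost)
print(f"ROI = {roi:.2f}%")  # ROI = 50.55%
```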
We calculate the ROI of our business in the same way; the trading commissions are roughly comparable to the operating costs we pay in the cloud. In the first year after migration, ROI may be low or even negative if the migration cost exceeds the savings. However, over a 3-year or 5-year model, the cumulative savings can grow substantially.
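The same formula, applied cumulatively, shows why the time horizon matters. This is a minimal sketch with hypothetical figures: it treats the one-time migration cost as the cost of investment and the yearly run-cost savings as the gain, and it applies no discounting (a real business case would also look at NPV).

```python
def cumulative_roi(years: int, annual_savings: float, migration_cost: float) -> float:
    """ROI (%) after `years`: cumulative savings minus the one-time migration cost,
    divided by that cost. No discounting (NPV) is applied."""
    net_gain = years * annual_savings - migration_cost
    return net_gain / migration_cost * 100


MIGRATION_COST = 300_000  # hypothetical one-time cost
ANNUAL_SAVINGS = 150_000  # hypothetical yearly run-cost savings after migration

for years in (1, 3, 5):
    print(f"{years}-year ROI: {cumulative_roi(years, ANNUAL_SAVINGS, MIGRATION_COST):.0f}%")
```

With these numbers the first year shows -50%, the investment breaks even in year 2, and the 3-year and 5-year views show 50% and 150%.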
.
Alison: Thanks for the explanation. So, TCO and ROI are the key drivers of cloud migration decision-making, aren’t they?
Vasumat: Yes! In addition, we must consider the time factor (we can provision a new server in the cloud in a few minutes), modernization, innovation, accessibility, flexibility, and security. These factors lead to more productivity, and hence more business and more money.
.
Alison: Agreed. During the TCO and ROI process, have you come across CapEx and OpEx?
Vasumat: Yes! Those are standard terms for two kinds of spending:
• Capital Expenditures (CapEx, upfront costs) are spending on assets purchased for long-term benefit; these assets remain on the organization’s balance sheet and depreciate over time (ex: servers, hardware, insurance tied to asset use, etc.). In other words, “predict future usage and invest in advance”. The closest cloud equivalent of a CapEx-style investment is Reserved Instances: we commit to VM usage for a long term (typically one or three years) in exchange for discounted prices.
• Operational Expenditures (OpEx, pay as we use) are ongoing expenses for services consumed over a period. Ex: cloud and SaaS subscriptions and pay-as-you-go pricing.
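The Reserved Instance trade-off can be sketched as a break-even calculation. The hourly rates are hypothetical, not real AWS or Azure prices, and the model assumes the reservation is billed for every hour of the year whether or not the VM runs, while pay-as-you-go is billed only for the hours used.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def annual_cost_payg(hourly_rate: float, utilization: float) -> float:
    """OpEx style: pay only for the fraction of hours the VM actually runs."""
    return hourly_rate * HOURS_PER_YEAR * utilization


def annual_cost_reserved(effective_hourly_rate: float) -> float:
    """Commitment style: the reservation is paid for every hour, used or not."""
    return effective_hourly_rate * HOURS_PER_YEAR


def break_even_utilization(payg_rate: float, reserved_rate: float) -> float:
    """Fraction of the year the VM must run before the reservation becomes cheaper."""
    return reserved_rate / payg_rate


PAYG_RATE = 0.40      # hypothetical $/hour
RESERVED_RATE = 0.25  # hypothetical effective $/hour for a 1-year reservation

print(f"Break-even utilization: {break_even_utilization(PAYG_RATE, RESERVED_RATE):.1%}")
for utilization in (0.3, 1.0):
    payg = annual_cost_payg(PAYG_RATE, utilization)
    reserved = annual_cost_reserved(RESERVED_RATE)
    cheaper = "reserved" if reserved < payg else "pay-as-you-go"
    print(f"{utilization:.0%} utilization: PAYG ${payg:,.0f} vs reserved ${reserved:,.0f} -> {cheaper}")
```

This is the “predict future usage” part of the decision: below the break-even utilization, pay-as-you-go wins; above it, the commitment pays off.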
.
Alison: Tell me, which is the best approach: lift-and-shift or phased migration?
Vasumat: There is no single best method; it always depends on the requirements, which are unique to each business. But in general, to make the cloud journey more optimal, lower-risk, and cost-effective in the long term, I would recommend a phased approach.
Alison: I think I disagree with you. Do you know the benefits of the lift-and-shift approach?
Vasumat: I agree that lift-and-shift (also called rehosting or forklifting) is the fastest migration method. We simply move our systems to the cloud as-is without re-architecting them. It is also less resource-intensive because we don’t upgrade, tweak, or refactor the source application.
Alison: So, are you changing your recommendation from phased to lift-and-shift?
Vasumat: I reckon I still prefer a phased approach.
Alison: Then you need to convince me to choose a phased approach over lift-and-shift.
Vasumat: Well, let me emphasize again that one solution doesn’t fit every business requirement. I am not saying that lift-and-shift is a bad approach, but considering all the factors, a phased approach suits most business cases. In my experience, for one client we used the lift-and-shift approach and successfully migrated their 85 applications and 200+ TB of data to Azure. However, let me explain the challenges involved:
Lift-and-shift is like moving our infrastructure from one data center to another. We can’t make our applications cloud-optimized, which means we do not get the full advantage of cloud features. Highly customized or complex applications may not run effectively in the cloud without modification, and legacy applications are also difficult to handle. In the long run, keeping a rehosted application as-is (since it is not cloud-optimized) can raise cloud costs by up to 15% due to overprovisioned compute resources. Organizations across the world, including the technology giants, have reported four main problems with the lift-and-shift approach:
A) Technical difficulties and integration issues with public cloud
B) Unexpected costs
C) Performance degradation
D) Unable to use PaaS features.
A well-known product company moved its workloads as-is via lift-and-shift and had to roll back (return to its local data center) after its application’s performance degraded.
.
Alison: Tell me something: what are the possible causes of performance issues after a cloud migration?
Vasumat: In my experience, lack of planning and testing is the major reason for cloud migration failures. Poor planning leads to wrong cost and performance estimates and unrealistic timelines, and hence to failure. Other reasons include:
• Wrong selection of cloud provider, feature, infrastructure, service, data layer, configurations, etc.
• Ignoring integrations, security compliance, critical business cases, cloud readiness, business schedules, etc.
• Inadequate skills/talent/resources, underestimating the effort of a cloud migration, etc.
.
Alison: In your experience, has any of your customers approached you saying that their application performance dropped after cloud migration?
Vasumat: Yes, but not from our vertical. In our organization, I was positioned as the data platform CoE (Center of Excellence) lead for Asia Pacific, and a customer from another business unit approached me for help.
• The problem statement was: “For one of the apps, they had been facing drastic performance degradation after migrating it to the cloud. The app’s database server was migrated to a cloud VM because they needed more control over the database server and the OS. For the production server, they used the same configuration on-premises (before migration) and in the cloud, yet application performance dropped by 35%. In the meantime, they added more compute (memory and CPU), with no effect. Moreover, both the database and the OS were on the latest versions.”
• My observation: The very first thing that caught my attention was the database edition: they were using Standard edition. When I inquired, I learned that they had been using Enterprise edition on-premises. Since they had not been using any Enterprise edition features, they had simply downgraded to Standard edition. When performance dropped, they added more CPU (48 cores vs. 24 cores on-premises) and memory (512 GB vs. 256 GB on-premises), and it still did not help. That is expected, because the edition is not just about feature support: Standard edition also limits compute utilization. It can use at most 24 cores and 128 GB of buffer pool memory.
• Solution: We upgraded the edition to Enterprise and scaled the VM back down to match the on-premises compute. Immediately, the app started performing within the expected baseline.
• Technical mapping (finding the right platform and resources in the cloud) must be done with an eagle-eye view of the whole system. Otherwise, we end up with these kinds of issues.
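The effect of the edition limits can be sketched as a simple mapping check. This is a simplified model, assuming SQL Server Standard edition caps of 24 cores and 128 GB of buffer pool memory (the real core limit is the lesser of 4 sockets or 24 cores) and treating Enterprise edition as uncapped; the `DbServer` class and `mapping_gaps` helper are hypothetical.

```python
from dataclasses import dataclass

# Assumed caps (simplified): Standard edition uses at most 24 cores and
# 128 GB of buffer pool memory; Enterprise edition is treated as uncapped here.
EDITION_CAPS = {
    "standard": {"cores": 24, "memory_gb": 128},
    "enterprise": {"cores": None, "memory_gb": None},
}


@dataclass
class DbServer:
    edition: str
    cores: int
    memory_gb: int

    def usable(self) -> tuple[int, int]:
        """Cores and buffer-pool memory the engine can use after edition caps."""
        caps = EDITION_CAPS[self.edition]
        cores = self.cores if caps["cores"] is None else min(self.cores, caps["cores"])
        memory = (self.memory_gb if caps["memory_gb"] is None
                  else min(self.memory_gb, caps["memory_gb"]))
        return cores, memory


def mapping_gaps(source: DbServer, target: DbServer) -> list[str]:
    """List resources where the target can use less than the source did."""
    (src_cores, src_mem), (tgt_cores, tgt_mem) = source.usable(), target.usable()
    gaps = []
    if tgt_cores < src_cores:
        gaps.append(f"cores: {tgt_cores} usable vs {src_cores} on-premises")
    if tgt_mem < src_mem:
        gaps.append(f"memory: {tgt_mem} GB usable vs {src_mem} GB on-premises")
    return gaps


on_prem = DbServer("enterprise", cores=24, memory_gb=256)
first_try = DbServer("standard", cores=48, memory_gb=512)  # scaled up, still capped
fixed = DbServer("enterprise", cores=24, memory_gb=256)    # edition upgraded, VM resized

print(mapping_gaps(on_prem, first_try))  # ['memory: 128 GB usable vs 256 GB on-premises']
print(mapping_gaps(on_prem, fixed))      # []
```

The scaled-up VM shows no core gap even though it has twice the cores: the extra 24 cores are simply unusable, which matches the observation that adding compute did not help.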
.
Alison: Suppose we have 50 applications running on-premises, which one do you choose first for the migration?
Vasumat: I will choose the easiest one first.
Alison: What is your definition of “easiest”? How do you identify it?
Vasumat: We assess each application’s complexity and rank the applications from 1 (the easiest) to 50 (the most critical) based on three major parameters:
A) Business impact (if the application goes down: number of affected users/transactions and revenue impact per hour)
B) Application architecture complexity (number of dependencies, integrations, etc.)
C) Data volume
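The ranking step can be sketched as a weighted score over the three parameters. The 1-5 scales, the weights, the data-volume buckets, and the sample apps are all hypothetical; in practice they would come from the discovery and assessment phase.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    business_impact: int  # 1 (low) .. 5 (high): users/transactions/revenue hit if down
    complexity: int       # 1 (low) .. 5 (high): dependencies, integrations
    data_volume_tb: float


# Hypothetical weights; tune them to the organization's priorities.
WEIGHTS = {"business_impact": 0.5, "complexity": 0.3, "data_volume": 0.2}


def data_volume_score(tb: float) -> int:
    """Bucket data volume into a 1..5 score (hypothetical thresholds in TB)."""
    for score, limit in enumerate((0.1, 1, 10, 100), start=1):
        if tb <= limit:
            return score
    return 5


def migration_score(app: App) -> float:
    """Higher score = harder / more critical to migrate."""
    return (WEIGHTS["business_impact"] * app.business_impact
            + WEIGHTS["complexity"] * app.complexity
            + WEIGHTS["data_volume"] * data_volume_score(app.data_volume_tb))


def rank(apps: list[App]) -> list[tuple[int, str]]:
    """Return (rank, name) pairs: rank 1 is the easiest app to migrate first."""
    ordered = sorted(apps, key=migration_score)
    return [(i, app.name) for i, app in enumerate(ordered, start=1)]


apps = [
    App("billing", business_impact=5, complexity=4, data_volume_tb=40),
    App("intranet-wiki", business_impact=1, complexity=1, data_volume_tb=0.05),
    App("hr-portal", business_impact=2, complexity=3, data_volume_tb=2),
]
for position, name in rank(apps):
    print(position, name)
```

Weighting business impact most heavily reflects the idea that the first migration should be low-risk, so a small, isolated app is a natural pilot.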
.
Alison: Makes sense to me. But which business cases are usually a good fit for lift-and-shift?
Vasumat: Well, in my experience, there are four typical cases that suit lift-and-shift:
A) Due to some restrictions at the source, we cannot establish proper connectivity between on-premises and the cloud. Ex: Some DMZ environments have a lot of restrictions on the network.
B) Looking for faster migration: Because the phased approach takes a lot of time and effort, especially for large-scale migrations.
C) When the business says “Do not touch our application architecture”: The business may be using legacy or tightly coupled applications, or may not be interested in investing in re-architecting them.
D) Application is flexible enough to be optimized after the migration: We migrate the workload as-is and then slowly apply the optimization by taking advantage of cloud features.
.
Alison: Do you know what the 6R strategy is?
Vasumat: Well, the 6Rs are six cloud migration strategies; together they answer the question of how to migrate our IT assets to the cloud.
• Rehost (Lift-and-shift): Migrate applications without any changes. It’s one of the quickest and easiest cloud migration strategies. Ex: Create AWS EC2 instances or Azure VMs and migrate on-premises apps and databases to them without any changes. Suits large-scale migrations running against a deadline.
• Re-platform (lift-tinker-and-shift): Make a few configuration changes so the apps better suit the cloud environment, without changing the core architecture. Ex: Migrate web applications to VMs (IaaS) as-is and move database instances to a database-as-a-service platform such as AWS RDS or Azure Database (PaaS). Suits cases where we want to leverage cloud benefits without refactoring the app.
• Refactor/Re-architect: Rewrite our applications from scratch using cloud-native technologies. Ex: microservices architecture, serverless, containers, function-as-a-service, load balancers, etc. It is the most expensive, resource-intensive, and time-consuming migration approach, but in the long run we can see significant gains in productivity, business outcomes, cost savings, and performance. Suits: A) a strong business need for scalability, speed, and performance; B) migrating to the cloud with long-term benefits in mind; C) migrating legacy applications to the cloud.
• Repurchase (Drop & Shop): Replace the on-premises application with vendor-packaged, cloud-native software (SaaS). We drop the existing on-premises license and start a new subscription with the SaaS vendor. Ex: Customer Relationship Management (CRM) to Salesforce.com, HR system to Workday, Content Management System (CMS) to Drupal, mailboxes to Office 365, Collibra to Azure Purview, etc. Migration is simple and fast and eliminates a lot of effort. Suits business cases where we are replacing traditional software with a Software-as-a-Service model.
• Retire (Decommission): Remove applications that are no longer needed. After the assessment and discovery phase, we can identify the unnecessary applications (components, tools, services, data, processes, etc.). According to commonly cited IT industry statistics, 10-20% of an enterprise IT portfolio (apps, infra, compute power, etc.) is no longer useful at the time of a cloud migration and can simply be turned off. Ex: Cleaning up the environment before migration helps fix the migration scope and can deliver immediate savings.
• Retain (Revisit): If an application doesn’t fit any of the other five Rs, keep running it on-premises as-is. This typically happens with applications that require significant refactoring before they can be migrated to the cloud. Suits: A) hybrid cloud deployments; B) compliance or regulatory constraints that do not allow the app to be in the cloud; C) legacy apps that are not compatible with the cloud when the business does not want to invest further; D) applications that will be retired soon; E) cases where you need more time to decide and would like to revisit the app later.
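As a first-pass triage, the guidance above can be encoded as an ordered set of rules. This is a hypothetical heuristic built only from the criteria in this answer, not a standard algorithm; the `AppProfile` fields are illustrative, and real decisions also weigh cost, risk, and timelines per application.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    RETIRE = "Retire"
    RETAIN = "Retain"
    REPURCHASE = "Repurchase"
    REFACTOR = "Refactor/Re-architect"
    REPLATFORM = "Re-platform"
    REHOST = "Rehost"


@dataclass
class AppProfile:
    still_needed: bool = True
    cloud_allowed: bool = True             # compliance/regulation permits cloud hosting
    retiring_soon: bool = False
    saas_replacement: bool = False         # a vendor SaaS product can replace it
    needs_scale_and_budget: bool = False   # strong scalability need and budget to rebuild
    managed_db_fits: bool = False          # e.g. the database can move to a PaaS service


def choose_strategy(app: AppProfile) -> Strategy:
    """Hypothetical triage: rules are checked in order, first match wins."""
    if not app.still_needed:
        return Strategy.RETIRE
    if not app.cloud_allowed or app.retiring_soon:
        return Strategy.RETAIN
    if app.saas_replacement:
        return Strategy.REPURCHASE
    if app.needs_scale_and_budget:
        return Strategy.REFACTOR
    if app.managed_db_fits:
        return Strategy.REPLATFORM
    return Strategy.REHOST  # default: move as-is, optimize later


print(choose_strategy(AppProfile(still_needed=False)).value)    # Retire
print(choose_strategy(AppProfile(managed_db_fits=True)).value)  # Re-platform
print(choose_strategy(AppProfile()).value)                      # Rehost
```

The order puts the cheapest decisions first: there is no point assessing refactoring for an app that can be retired or replaced by SaaS.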
.
Alison: Thanks for the great explanation. Can you define public, private, and hybrid cloud models?
Vasumat: Sure.
• Public cloud (infrastructure shared between organizations): cloud computing in which a third-party provider (Azure, AWS, GCP, Oracle, etc.) manages the infrastructure; we host our applications on it, and our users access them over the Internet.
• Private cloud (dedicated infra): is cloud computing that is dedicated solely to our organization.
• Hybrid cloud (combination of both): any environment that uses both public and private clouds.
• Multi-Cloud: Similar to hybrid cloud, but the enterprise uses more than one cloud provider (e.g., AWS, Azure, and GCP), public or private.
.
Alison: How do you choose the right cloud model for a business?
Vasumat: After evaluating the source environment, match it with the correct model.
• Public Cloud: A) Unpredictable workload demands: the public cloud is highly scalable and flexible. B) Reducing CapEx: Don’t buy servers; instead, buy a service via the pay-as-you-go model. C) IT infrastructure management overhead: Push it to the cloud provider and focus on the business. D) Cost savings: If we properly optimize our applications, we see substantial savings over the long term. E) Faster innovation: We can provision the required infrastructure within minutes and start development.
• Private Cloud: A) Highly regulated industries. Ex: nuclear, finance, military agencies, etc. B) Highly sensitive data. Ex: Personally Identifiable Information (Social Security Number, passport number, Aadhaar, etc.), medical formulas, government agency data, etc. C) Full control over infrastructure: we need strong control over the data center and security. D) Industry-specific hardware: our apps require customized, advanced data center technology that is not currently available in the public cloud.
• Hybrid Cloud: A) We have multiple verticals with different IT security, regulatory, and performance requirements. B) We want to take advantage of both private and public cloud platforms. Ex: We host sensitive data on the private cloud and non-sensitive data on the public cloud, or we retain legacy applications on-premises and use the cloud for modern apps. C) An application is retiring in the near future, so we do not want to invest in migrating it.
• Multi-Cloud: A) The business wants to take advantage of each cloud’s specialization. Ex: We run Oracle systems, so for better compatibility and performance we migrate the database instances to Oracle Cloud Infrastructure (OCI), while for cost savings and flexibility we host the applications on Azure. B) To meet regulatory requirements, we sometimes need multiple cloud providers because of data center availability in the required zones and regions.
.
Alison: That answers my question. OK, now let’s take a break. You must have received a card along with your visitor badge, right?
Vasumat: Yes!
Alison: Excellent. In the right-hand corner, you will find a snack bar with plenty of food options. You can use that card (no limits, ha) to get some snacks and drinks. You will also see a beautiful garden outside where you can sit and relax. We’ll meet here again in 60 minutes. I hope that’s OK with you.
Vasumat: I’m starving! I would love a break now.
.
Discussion continues in the next part.