A discussion between a CxO and a senior Data Architect Part 5


.

Links to other parts

A discussion between a CxO and a senior Data Architect Part 1

A discussion between a CxO and a senior Data Architect Part 2

A discussion between a CxO and a senior Data Architect Part 3

A discussion between a CxO and a senior Data Architect Part 4

.

Background: We have been going through a discussion that took place between senior leadership and a data architect. Here the final part of the series continues.

.

Discussion Follows:

.

Alison: We currently hold stakeholder financial portfolios, customer personal identities, and other sensitive information classified as confidential and restricted. When we talk about moving to the cloud, the first thing that comes to my mind is data security. We are going to store our corporate data in a public cloud data center such as Azure or AWS. Since you are the data owner, you need to convince me about the cloud migration by explaining the public cloud's security capabilities. Considering I have zero knowledge about cloud security, can you list all possible security risks and how cloud providers handle them?

Vasumat: Security is the top concern for any business. In a public cloud platform, security is a shared responsibility between the cloud provider (Azure, AWS, Google Cloud, etc.) and the customer. The three fundamental objectives of data security are: A) Confidentiality – ensuring data privacy; B) Integrity – protecting data from accidental or intentional alteration or deletion without proper authorization; C) Availability / Data Resiliency – despite incidents, data continues to be available at the required level of performance.

 Things to be protected: We need to protect everything that belongs to our enterprise infrastructure. Broadly, this is categorized as: cloud endpoints, network, data, applications, resources, keys & identities, backups, logs, and the cloud datacenter (physical device protection).

.

Possible security risks, reasons, solutions / preventive measures:

 Account hijacking: Compromised login credentials can put our entire enterprise at risk.

Reason: Weak credentials, unchanged passwords, keylogging (monitoring keystrokes), sharing credentials, etc.

Prevention: Strong credentials, no credential sharing, defined expiry times for tokens, an enforced password policy, multi-factor authentication, no passwords written in clear text, keys and certificates stored in Azure Key Vault, access allowed only from specific IP addresses, no public computers or Wi-Fi used to connect to cloud portals, etc.
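To make the "no clear-text passwords; keep keys and certificates in a key vault" point concrete, here is a minimal Python sketch of fetching a secret from Azure Key Vault at runtime. It assumes the azure-identity and azure-keyvault-secrets packages; the vault URL and secret name are placeholders, not real resources.

```python
# Minimal sketch: read a connection string from Azure Key Vault instead of
# hard-coding it. Vault URL and secret name below are placeholders.
from azure.identity import DefaultAzureCredential   # pip install azure-identity
from azure.keyvault.secrets import SecretClient     # pip install azure-keyvault-secrets

# DefaultAzureCredential resolves a managed identity, environment variables,
# or an Azure CLI login -- so no password appears in code or config files.
credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://my-demo-vault.vault.azure.net",  # placeholder vault
    credential=credential,
)

# Fetch the secret at runtime; it can be rotated in Key Vault without redeploying.
secret = client.get_secret("sql-connection-string")     # placeholder secret name
connection_string = secret.value
```

Because the credential is resolved from a managed identity or an interactive Azure login, no password ever needs to live in source control or configuration files.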

.

 Human error: It is an indirect threat to our cloud workloads. Ex: Unknowingly deleting a resource, downloading insecure applications, misconfigurations, etc.

Reason: Low clarity of goals, untrained staff, unclear policies, no proper data handover process during employee exit formalities, etc.

Prevention: Train staff, strengthen IT policies (Ex: password expiry, restricting risky apps, games, pirated software downloads, internet gateways), create tight monitoring controls, etc.

.

 Application Security failures: Web applications are increasingly targeted by malicious attacks that exploit commonly known vulnerabilities. Ex: SQL injection (a code injection technique where malicious SQL statements are inserted into application input fields to access information that was never intended to be displayed), cross-site scripting (an attacker sends malicious code/payload to the server through a feedback form, comment section, etc.), etc.

Reasons: Not sanitizing inputs, not implementing a timeout policy, displaying session IDs in URLs, not using SSL/TLS, not encrypting passwords, failing to verify the source of incoming requests, exposing object references (table/view/function, database, file, storage, server, etc.), exposing error-handling information to the end client, running unnecessary services, using outdated software and plugins, not having a standard audit policy, etc.

Prevention: Properly sanitize user inputs; configure session timeouts based on requirements; do not expose unnecessary information (error details, object references, session IDs, app metadata, etc.) to the end client; always make sure the underlying app components are patched to the latest version; avoid redirects altogether, and where they are necessary, keep a static list of valid locations to redirect to; equip apps with SSL/TLS, multi-factor authentication, etc.; establish a strong security clearance layer, meaning every time new code is deployed we review, scan, and identify security loopholes; enable a Web Application Firewall (WAF), which sits between the application and the internet, filters traffic, and protects the app from common attacks such as cross-site request forgery, cross-site scripting (XSS), file inclusion, and SQL injection. A cloud-based WAF is recommended because it is updated automatically to handle the latest threats. Schedule periodic audits of application code. We use vulnerability scanners like Grabber, which performs automatic black-box testing and identifies security vulnerabilities.
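As an illustration of the input-sanitization point, here is a small, self-contained Python sketch showing why parameterized queries defeat SQL injection. It uses the standard-library sqlite3 module purely as a stand-in database, not as a statement about our actual stack.

```python
# Minimal sketch: never concatenate user input into SQL text;
# pass it as a bound parameter instead.
import sqlite3  # stdlib database, used here only to keep the demo self-contained

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

user_input = "alice' OR '1'='1"   # a classic injection payload

# UNSAFE (do not do this): string concatenation lets the payload change the query.
# rows = conn.execute(f"SELECT * FROM users WHERE name = '{user_input}'").fetchall()

# SAFE: the driver binds the value, so the payload is treated as plain data.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)   # [] -- the injected condition is never executed
```

The same parameter-binding pattern applies to SQL Server, Oracle, MySQL, and other drivers; only the placeholder syntax changes.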

.

 Data Breach & Theft of Intellectual Property: Altering, deleting, uploading, or downloading our corporate data without authorization is called a data breach. If it happens to sensitive data (patents, trade secrets, PII – Personally Identifiable Information, financial info, etc.), we must notify the victims, and it can critically damage our organization’s image, sometimes leading to legal action and heavy penalties. Ex: cyber attacks, phishing attacks, malware injections, etc.

Reason: Data breach and theft of IP are consequences of a failed security framework; almost any security weak point can cause them. Ex: leaked credentials, human error, application loopholes, weak or missing IT policies, storing encryption keys along with the encrypted data, etc.

Prevention: We must be able to control the entire workflow and data flow in our cloud workload. When a request enters or leaves our cloud network, we (our policies, standards, and security posture) must drive the flow: who can enter our network, how the request is sanitized based on its source and access pattern, the network route it can take, the resource it can reach, the data it can access, the actions it can perform, and the results it can carry back to the request initiator (service, app, browser, etc.).

 To implement this, we need strong authentication and authorization, least-privilege permissions, threat detection, access restricted to specific IP addresses, data protection features (classification, data masking, encryption, etc.), secured backup and log files, frequent audits (data and application) with prompt fixes, serious patching discipline (IaaS), clear data boundary standards with policies implemented accordingly, encryption keys and certificates stored separately (in a key vault), etc.
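As one small illustration of "restrict access to specific IP addresses", the sketch below checks a caller's IP against an allow-list using only the Python standard library; the CIDR ranges are made-up placeholders. In Azure this control is normally enforced by firewall rules and NSGs rather than application code; the sketch only shows the underlying logic.

```python
# Minimal sketch: accept a request only if its source IP falls inside an
# approved CIDR allow-list. Ranges below are illustrative placeholders.
import ipaddress

ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.10.0.0/16"),     # corporate VPN range (placeholder)
    ipaddress.ip_network("52.160.10.0/24"),   # partner gateway range (placeholder)
]

def is_ip_allowed(source_ip: str) -> bool:
    """Return True only when the caller's IP is inside an approved network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in ALLOWED_NETWORKS)

print(is_ip_allowed("10.10.4.25"))    # True  -> request may proceed
print(is_ip_allowed("203.0.113.50"))  # False -> reject, log, and alert
```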

.

 Data Loss: Data loss is any process or event that results in data being corrupted, deleted, and/or made unreadable by a user and/or software or application.

Reason: Hardware failure, power failure, datacenter failures, natural disasters, accidental deletion, not understanding or having proper agreements (data retention period), not having proper backups, no or weak disaster recovery plan, not performing backup health checks, not having tight protection controls for backups, etc.

Prevention: Understand the SLA (Service Level Agreement) covering the data retention policy (how long data must be kept and how it is disposed of), Recovery Point Objective (RPO – maximum allowed data loss), and Recovery Time Objective (RTO – maximum allowed downtime), and plan your backup and disaster recovery accordingly. Depending on data volume and operations, perform DR drills to ensure that backups are healthy. Wherever possible, keep secondary copies across regions, utilize long-term backup features, etc.
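To make RPO concrete, here is a tiny Python sketch with hypothetical timestamps: it measures the gap between the last successful backup and a failure and compares it with an assumed 15-minute RPO.

```python
# Minimal sketch: check whether a failure stays within the agreed RPO
# (maximum allowed data loss). All values below are hypothetical.
from datetime import datetime, timedelta

RPO = timedelta(minutes=15)                              # agreed maximum data loss
last_successful_backup = datetime(2023, 5, 1, 10, 47)    # hypothetical backup time
failure_time = datetime(2023, 5, 1, 11, 5)               # hypothetical incident time

data_loss_window = failure_time - last_successful_backup
if data_loss_window > RPO:
    print(f"RPO breached: up to {data_loss_window} of changes could be lost")
else:
    print(f"Within RPO: at most {data_loss_window} of changes at risk")
```

The same comparison, made between actual recovery time and the promised RTO, tells us whether the downtime commitment was met.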

.

 Compliance issues: Regulatory compliance exists to mitigate risk and protect our data (both enterprise and customer). Our cloud infrastructure must comply with the regulatory standards defined by our enterprise business team. If we fail to follow the standards and a data breach or loss happens, our organization will be in a difficult position both legally and financially (penalties can run into the tens of millions of dollars; GDPR fines, for example, can reach €20 million or 4% of annual global turnover). The most common regulations are GDPR (General Data Protection Regulation) and the data privacy regulation CCPA; likewise we have regulations for healthcare (HIPAA), payment cards (PCI DSS), financial reporting (SOX), etc. Sample compliance rules: a user must not have access to both prod and non-prod servers, block public internet access to VMs, restrict the number of administrators, enable a password policy, store keys and certificates separately from data, treat missing security patches as a compliance issue, etc.

Reason: Companies do not take regulatory compliance seriously; many are still in the awareness stage; others hesitate over the investment and implementation effort, which requires collaboration, strategy, and skill sets. Programmers and IT developers often treat regulatory compliance as the lowest priority.

Prevention: It is the responsibility of the big bosses (IT decision makers, cloud/data architects) to insist on compliance with regulatory standards. On-premises, we may need third-party tools to audit our infrastructure and validate regulatory compliance, but in the cloud we have built-in support. We can use Azure Policy to implement the required standards. The “Regulatory compliance dashboard” in Azure Security Center is one of my favorite features: it monitors, validates, and reports non-compliant issues so that we can fix them and stay compliant with the regulatory standard. It validates almost all aspects, e.g., network, cloud endpoints, data protection, threat detection, vulnerability management, privileged access, backup & recovery, etc.
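Purely as an illustration of what such a rule evaluation does (Azure Policy and the regulatory compliance dashboard do this automatically and far more thoroughly), the Python sketch below checks a hypothetical resource inventory against two of the sample rules mentioned above.

```python
# Illustrative sketch only: evaluate a hypothetical resource inventory against
# two sample compliance rules. Resource data below is made up for the demo.
resources = [
    {"name": "vm-web-01", "public_internet": True,  "patched": False},
    {"name": "vm-db-01",  "public_internet": False, "patched": True},
]

rules = {
    "Block public internet access to VM": lambda r: not r["public_internet"],
    "Security patches up to date":        lambda r: r["patched"],
}

for resource in resources:
    for rule_name, check in rules.items():
        if not check(resource):
            print(f"NON-COMPLIANT: {resource['name']} -> {rule_name}")
```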



A discussion between a CxO and a senior Data Architect Part 4


Links to other parts

A discussion between a CxO and a senior Data Architect Part 1

A discussion between a CxO and a senior Data Architect Part 2

A discussion between a CxO and a senior Data Architect Part 3

A discussion between a CxO and a senior Data Architect Part 5

.

Background: We have been going through a discussion that took place between senior leadership and a data architect. Part 4 continues.

.

Discussion Follows:

.

Alison: Can you tell me the most complex migration that you have done till today?

Vasumat: Sure! We migrated workloads for a European engineering and construction firm. They operate from 85 offices across the globe with a staff of 13,000 (4,300 in IT), revenues over $6.5 billion, and a customer base of 0.4 million. We created the business case around two major challenges: A) Data management on expensive storage – they generate enormous amounts of data and struggle to maintain and manage it and to handle disaster recovery at their data centers; B) Scalability – their systems cannot scale to handle heavy workloads. We proposed a cloud solution addressing these two challenges and projected Azure as the suitable target considering TCO, ROI, and compatibility (the majority of their workloads use Microsoft products: .NET, Windows OS, SQL Server, and the Office suite including email and SharePoint).

.

Alison: Great! Can you summarize the migration?

Vasumat: Sure!

 Migration challenges included legacy systems (SQL Server 2000, Windows 2003), migrating huge data sets, Oracle to Azure SQL Database (a customer requirement), different migration strategies for various applications (refactoring, re-platforming, lift-and-shift, phased), critical application migration (with near-zero downtime), hybrid cloud for 5 applications (some components on-premises and some in Azure), heterogeneous database systems (SQL Server, Oracle, MySQL, MongoDB), diversified service selection (PaaS, SaaS, IaaS, serverless), etc.
 Tools used:
 Azure Migrate service (to migrate VMs & SQL Server – discovery & assessment, server migration);
 Azure Data Box (shipping over 40 TB of data in a physical device);
 DMA & DMS (Data Migration Assistant and Database Migration Service, for migrating databases);
 Azure Data Factory (migrating huge datasets);
 AzCopy (migrating storage; an illustrative upload sketch follows this list);
 Microsoft Virtual Machine Converter (MVMC) / Virtual Machine Manager (VMM) (for converting VMs on VMware hosts or physical computers to VMs running on Microsoft Hyper-V);
 Azure Site Recovery (ASR), Azure’s DRaaS – Disaster Recovery as a Service for VMs; in some scenarios we also used it for VM migration;
 Recovery Services vault (for storing VM backups);
 VPN Gateway / ExpressRoute (to establish a proper communication channel between on-premises and Azure);
 Azure Synapse Pathway (when migrating a data warehouse to Synapse, it converts DDL and DML statements to be compliant with Azure Synapse Analytics);
 SQL Server Migration Assistant (SSMA – to migrate from other RDBMS such as MySQL, Oracle, etc. to SQL Server and Synapse Analytics);
 Azure AD Connect (to synchronize the on-premises Active Directory with Azure Active Directory); etc.
 Migration Outcome:
 As part of the analysis, we identified unused/unnecessary applications and retired 180 on-premises VMs. Successfully migrated 260 applications, 1,400+ servers and VMs (1,100 Windows, 300+ Linux),
 650+ database instances (SQL Server, Oracle, MySQL, MongoDB),
 Project environments migrated – DEV, TEST, STAGE, PRE-PROD, PROD,
 38 file shares, 12000+ active users, 25+ domains, 30+ messaging services,
 Data warehouses, reporting, and ETL solutions (SQL, Informatica, SSIS, SSRS),
 1.2 petabytes (1,200 TB) of data migrated.
 Finally, we closed/decommissioned 97% of on-premises legacy data center resources.
 Auto-scaling cloud infrastructure helped them handle peak loads with zero issues. On-premises, they used to experience an average of 3 to 4 outages per month due to infrastructure issues; after cloud migration, they had zero outages in the first 8 months and only 1 outage in the first 12 months.
 Also, based on utilization frequency, we leveraged the storage data tiers (Hot, Cool & Archive), which saved the business a significant amount of cost. Application development, management, and support became faster and more efficient.
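As referenced in the tools list above, here is a minimal Python sketch, not the project's actual tooling, of the kind of transfer that AzCopy and Azure Data Factory performed at scale: uploading a single file to Azure Blob Storage with the azure-storage-blob SDK. The connection string, container name, and file names are placeholders.

```python
# Minimal sketch: upload one file to Azure Blob Storage. Connection string,
# container, and file names are placeholders, not real resources.
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("migrated-fileshare")    # placeholder container

with open("quarterly_report.xlsx", "rb") as data:                 # placeholder file
    container.upload_blob(
        name="finance/quarterly_report.xlsx",  # target path inside the container
        data=data,
        overwrite=True,
    )
print("Upload complete")
```

For bulk transfers at the scale described above, AzCopy, Data Box, and Data Factory are the practical choices; the SDK call simply shows what a single transfer looks like.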



A discussion between a CxO and a senior Data Architect Part 3


Links to other parts

A discussion between a CxO and a senior Data Architect Part 1

A discussion between a CxO and a senior Data Architect Part 2

A discussion between a CxO and a senior Data Architect Part 4

A discussion between a CxO and a senior Data Architect Part 5

.

Background: We have been going through a discussion that took place between senior leadership and a data architect. After a 45-minute lunch break, Part 3 continues.

.

Discussion Follows:

.

Alison: Hope you enjoyed your food.

Vasumat: Yup, more than that, I liked that elegant garden, especially the spectacular mountain view from the corner.

Alison: Hmm, yes, that’s a favorite coffee spot for everyone on the floor.

Alison: Coming to the business, now I would like to discuss your area of expertise. I believe you have extensive experience in migrating data platforms. Isn’t it?

Vasumat: True!

.

Alison: Can you describe the steps involved in migrating database systems from on-premises to the cloud? Well, let me simplify my question: suppose we have some heterogeneous database systems like Microsoft, Oracle, etc. I would like to know how you handle the database migrations. Since we already discussed the phases, you can explain the series of steps involved. I would appreciate it if you could explain it on the whiteboard. Take your time and explain in as much detail as possible.

Vasumat: Sure! Assuming we already have an approved business case and talent is assigned.

.

Migration Planning:

 Assess and Discover: Collect all possible information and prepare a data estate / data sheet / metadata catalog covering data (file shares, network shares, data lakes, etc.) and database systems (SQL, NoSQL, etc.) from the source environment. It includes database instances (RDBMS, versions, editions, licenses, etc.), instance type (database engine, reporting, analytics, operational, etc.), configurations (trace flags, collation, etc.), features (FileStream, PolyBase, etc.), host details (virtualization, physical servers, OS type, licenses, etc.), computing power (memory, disk, CPU, throughput, etc.), storage – data volume (data size, data growth rate, etc.), partition info (keys, distribution rate), data type (transactional, analytical, archive, warehouse, reporting, staging, etc.), environment (DEV, Testing, QA, Staging, Pre-Prod, Prod), tools (development, deployment, monitoring), networking, security (NSGs, firewalls, SSL, encryption, logins – passwords, service accounts, permissions, etc.), backup strategy, high availability, disaster recovery, load balancing, data dependencies, database IT operations (change request, incident, and problem management, etc.), key performance baselines and indicators (IO, CPU, memory, query timeouts, errors), etc.
 Integration Matrix: A data entity can be a “data provider”, a “data consumer”, or both. We must prepare a matrix that captures all dependencies of our data entity/database. This means identifying the list of applications, services, and programs that use our databases, and also defining and tagging each application with priority (based on revenue impact), complexity (number of components involved), and route (source, destination, IP, port, port type, inbound/outbound, access credentials, etc.) information. It plays a vital role in determining the migration scope.
 Define the Migration Scope: Based on the assessment details, we can classify the data entities into those moving to the cloud, those retained on-premises, and those being retired/decommissioned.
 Define the Cloud solution: Need to identify
 Cloud model: Public, private, Hybrid, or Multi-Cloud
 Cloud Service provider: Azure, AWS, Google Cloud, Oracle Cloud Infrastructure, etc.
 Cloud Service model: Platform-as-a-Service, Infrastructure-as-a-Service, Software-as-a-Service. From the DB side, we deal with PaaS and IaaS. We conclude this in the resource mapping step.
 Resource mapping: Map on-premises resources to cloud resources (a simplified mapping sketch follows this list). If we plan to re-platform or refactor, check compatibility and perform a POC (Proof of Concept) in the cloud. Size the cloud resources by considering the expected performance and the allocated budget. In general, we allocate slightly more compute than required, which helps speed up the migration; once migration is done, we can downgrade the service tier to the exact required size.
 For example, map database systems to Azure SQL Database, elastic pools, Managed Instance, AWS RDS/Aurora, Azure SQL Serverless, AWS Aurora Serverless, Azure VM, AWS EC2, etc.; servers or virtual machines to Azure VM, AWS EC2; network drives and shared folders to Azure Files, AWS Elastic File System (EFS); web applications to Azure App Service, AWS Elastic Beanstalk; virtual server disks to Azure Managed Disks, AWS Elastic Block Store (EBS); storage solutions to Azure Blob Storage, Data Lake, AWS Simple Storage Service (S3); NoSQL to Azure Cosmos DB APIs, AWS DynamoDB, SimpleDB, DocumentDB; load balancers to Azure Load Balancer, AWS Network Load Balancer; programs to Elastic Jobs, Azure Functions; workflows and jobs to Logic Apps; ETL solutions to Data Factory, Databricks, Synapse pipelines, AWS Glue, etc.; data warehouses to Azure Synapse Analytics, AWS Redshift, etc.; CI/CD pipelines to Azure DevOps release pipelines, AWS CodePipeline, etc.; data shares to Azure Data Share, AWS Lake Formation, etc.
 Define the cloud migration approach and tools: For each component marked for migration, tag it with one of the cloud migration strategies (rehost, refactor, re-architect, etc.). From the database perspective, we mostly do either rehost (IaaS) or refactor (PaaS). Based on the migration requirement (online or offline), we need to select the right migration tool. Ex: Database Migration Service, Data Migration Assistant, backup & restore, replication, AlwaysOn, export/import, Visual Studio, bulk load, ETL tools, third-party tools, etc.
 Define the rollback plan: We need to prepare a rollback plan, typically for 4 scenarios.
 A) Migrated, application testing failed, rollback.
 B) Migrated, started using the system in the cloud (data changes, schema changes), realized something is not working or failing after a few days, rollback
 C) Rollback from PaaS deployment to On-premises
 D) Rollback from IaaS to On-premises.
 Summary: Scenario A is easy, as we simply point the connection string back to on-premises. Scenarios B & C require a lot of effort, as we need to compare the databases between source and target, identify the changes, and apply those changes to the on-premises database. Scenario D is a little easier than B & C, as we can simply perform a backup-restore from cloud to on-premises or configure an HA & DR solution between the cloud VM and the on-premises database instance.
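As mentioned in the resource-mapping step above, here is a simplified, purely illustrative Python sketch of the kind of resource-mapping table produced during planning. The components, targets, and strategies below are hypothetical examples, not recommendations for any specific workload.

```python
# Illustrative sketch only: a simplified resource-mapping table pairing each
# on-premises component with a candidate Azure target and migration strategy.
resource_map = [
    {"on_prem": "SQL Server 2016 instance (OLTP)",
     "target": "Azure SQL Managed Instance",     "strategy": "refactor (PaaS)"},
    {"on_prem": "Windows file share",
     "target": "Azure Files",                    "strategy": "rehost"},
    {"on_prem": "SSIS ETL packages",
     "target": "Azure Data Factory (SSIS IR)",   "strategy": "re-platform"},
    {"on_prem": "Web app on IIS VM",
     "target": "Azure App Service",              "strategy": "refactor (PaaS)"},
]

for row in resource_map:
    print(f"{row['on_prem']:36} -> {row['target']:28} [{row['strategy']}]")
```

In practice this table would also carry sizing, environment, dependency, and rollback columns drawn from the assessment and integration-matrix steps.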

.

Pre-Migration Activity:

