Authors
Jevgenijs Jelistratovs
Gary Hallam

In early 2021, Adam Selipsky (AWS CEO) stated that less than 5% of IT workloads had migrated to the cloud. Then, at the following year's AWS re:Invent conference, he mentioned that this number had increased to somewhere between 5% and 15%. Given the vast scale of global data and compute volumes, this represents notable progress in just over a year!

However, this figure is still far from the high double digits, and substantial work remains. At FWD View (part of the Nagarro group), there has been a significant increase in demand for cloud migration projects. Even when projects are centered on digital transformation, data modernization, or process automation, they often include components related to migration or the introduction of cloud capabilities. In the context of migration, data modernization can take several forms:

  • A lift-and-shift migration from on-premises to cloud infrastructure, maintaining all existing technologies and processes. This often includes relocating RDBMS-based data warehouses (Oracle, MS SQL Server) or data lakes (Hadoop clusters) to the cloud with minimal changes to the underlying architecture.
  • A more advanced process for migrating to a modern data stack, focusing on key data consumption workloads. This may involve moving to data platforms like Snowflake or Databricks or building an end-to-end data product factory using cloud-native tools in AWS, Azure, or GCP.
  • A hybrid approach that keeps on-prem production systems in place and migrates data workloads to the cloud.

Cloud security in practice: Lessons from Capital One

Even a lift-and-shift migration has merits: it benefits from the strong foundational services offered by cloud providers, such as logging and access segregation. “Data Governance: The Definitive Guide” uses the 2019 Capital One data breach as an example that clearly demonstrates how cloud capabilities help customers. The breach occurred because a web application firewall was misconfigured, which allowed the perpetrator to obtain temporary credentials and access files containing sensitive and personal information of Capital One’s customers. Because the stolen files resided in a cloud storage bucket, all access was logged. That allowed the FBI to use the logs to trace the perpetrator’s IP address to just a few houses and quickly apprehend the perpetrator.

The pace of cloud migration: progress vs. gaps

Public and private clouds provide significant benefits by abstracting hardware, offering infrastructure as code, enabling precise control and measurement of usage, and delivering economies of scale. While security was once a primary concern, cloud platforms now deliver some of the strongest security postures, with out-of-the-box features across infrastructure, platform, and SaaS layers. When we look at data modernization in the cloud, there are several key areas where improvements can be applied across hybrid cloud infrastructures:

  • Data Governance
  • Data Privacy/Protection
  • Data Lineage
  • Data Quality
  • Data Cataloguing

In the Capital One example above, we highlighted the fact that cloud capabilities helped to quickly identify the perpetrator and prevent a more significant impact to Capital One and its customers. However, implementing a proactive data governance program could further reduce the risk surface. If there were an easy-to-use, out-of-the-box data privacy and protection solution, we could better segregate application access, and anonymize, encrypt, and protect customer data.

It's true that cloud providers and data platforms offer out-of-the-box capabilities to address some of the functions listed above, especially in data privacy. For example, Snowflake includes native features for classifying and tagging data and for enforcing row-level policies, and Databricks’ Unity Catalog provides centralized access control for data. Likewise, tools like Azure Purview, GCP Dataplex, and AWS Lake Formation serve similar purposes. However, these tools are largely aimed at technical users and do not fully cover governance aspects such as the business ownership of data, data asset lifecycle management, and traceability or lineage that business users can easily access. Without those data governance capabilities in place to keep IT, data, and business teams collaborating, it is very difficult to guarantee end-to-end protection of your own and your customers’ data in the cloud. It is difficult to control what is not readily visible to users in their daily work.

T-Mobile suffered data breaches in three consecutive years. The biggest of these breaches cost T-Mobile $350 million. The Federal Communications Commission (FCC) investigated T-Mobile and found several violations: failure to meet its legal duty to protect the confidentiality of private information; impermissibly using, disclosing, or permitting access to private information without customer approval; failure to take reasonable measures to discover and protect against attempts to gain unauthorized access to private information; unjust and unreasonable information security practices; and making misrepresentations to customers about its information security practices. These incidents underline the critical importance of implementing GDPR controls and robust data governance.

There’s a clear case for implementing data protection and governance across hybrid on-prem and cloud environments. However, given the immense capability of the cloud and the “as-a-service” platforms, software features, and tools it has spawned, that task can seem very challenging. This explosion of functionality creates the opportunity to rapidly design truly unique and efficient data architectures for competitive advantage. At the same time, the number of available options can make navigation and selection difficult and even overwhelming, as the infographic in Figure 1 shows.

Figure 1 – Matt Turck’s “MAD” Landscape

You can use many of these tools in any combination with any cloud provider, which makes the ecosystem very powerful but also difficult to navigate when trying to determine the best choice. Despite advancements in modern applications, software alone, without a supporting operating model, is unlikely to solve enterprise data challenges. In fact, it can add to the challenge considerably, especially where access to the data is restricted by the API design and performance of SaaS software models.

Modern approaches to data migration

At FWD View (now part of Nagarro), recurring patterns across on-prem and cloud architectures have informed the development of a comprehensive package to assist organizations in implementing data governance and data privacy/protection to address the hybrid data environment challenge. We have created best practices and operating models alongside data protection and governance tool selection and suggestions. One scenario leverages Collibra and Perforce Delphix.

Firstly, here is a brief overview of both platforms:

  • Collibra offers a solid data intelligence platform to accompany your modern data stack and fill in most of the data governance gaps. Its focus on the business audience brings in much-needed ownership and accountability, so that business, data, and IT teams work together to make sure company standards are owned, understood, and enforced across your data cloud.
  • Perforce Delphix is a DevOps data platform that automates secure, compliant data delivery for on-premises and cloud environments. It combines advanced data virtualization and automated masking to protect sensitive information while enabling fast, high-volume data synchronization.
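To make the idea of automated masking more concrete, here is a minimal Python sketch of deterministic masking: the same input always yields the same token, so masked keys still join across tables, while the original value is not exposed. This illustrates the general technique only and is not Delphix's algorithm; `mask_value`, `mask_row`, and the secret key are hypothetical names introduced for this example.

```python
import hashlib
import hmac


def mask_value(value: str, secret: bytes, length: int = 12) -> str:
    """Return a deterministic, non-reversible token for a sensitive value."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_row(row: dict, sensitive_columns: set, secret: bytes) -> dict:
    """Mask only the columns flagged as sensitive; leave the rest untouched."""
    return {
        column: mask_value(str(value), secret) if column in sensitive_columns else value
        for column, value in row.items()
    }


if __name__ == "__main__":
    # Hypothetical sample row and key, for illustration only.
    row = {"id": 7, "email": "ann@example.com", "country": "LV"}
    print(mask_row(row, {"email"}, secret=b"demo-secret"))
```

A keyed hash (HMAC) is used here rather than a plain hash so that tokens cannot be reproduced without the key; a real deployment would also need key management and format-preserving options, which this sketch leaves out.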

We have expertise in both platforms and use them to implement data protection and security best practices. Check out this case study where FWD View and Delphix helped with data protection to improve GDPR compliance.

Implementing and maintaining GDPR compliance requires a significant amount of effort, including the manual input needed to describe business processes across your organization. Non-compliance exposes your organization to significant financial risks, but maintaining GDPR controls manually can be an expensive and error-prone exercise. To provide robust and efficient data protection across your on-prem and non-production environments, we have worked with our partners Collibra and Delphix to automate data protection and controls, significantly reducing the manual effort required and enhancing coverage and protection.

Figure 2 – Hybrid data protection and control with Collibra and Perforce Delphix

This initiative has helped many of our customers across a variety of sectors, ranging from financial services to property management. In many cases, this has included an end-to-end migration of the customer’s on-prem data stack. The approach works whether your target architecture is cloud-only or hybrid.

We observed a couple of common challenges that stood out:

  • Complexity of enforcing adequate data obfuscation policies across hybrid data platforms
  • Significant manual effort and a lack of communication between GRC (governance, risk & compliance) and data teams

This is exactly why we chose Collibra and Delphix to tackle this challenge.

To manage complexity at organizational scale, we leverage Collibra as a leading data governance platform to promote data culture, establish clear ownership, and create communication processes across the organization. We use Collibra’s business lineage capability to provide a view of the data flow across the systems that need protection, to make sure there are no gaps and that policy lineage is clear (Figure 3 – Collibra business lineage). The picture becomes even more complex when we take development cycles into consideration and need to ensure the compliance of development environments.

Figure 3 - Collibra business lineage

This is where Delphix plays a tremendous role. It can automate the provisioning of customers’ development environments and do so in a protected manner. Technical metadata from Delphix is shared with Collibra Catalog, where data stewards can run custom validation rules to check whether any technical metadata identified as sensitive or personal is not yet covered by data masking rules. Every high-level data privacy policy is aligned to technical metadata and implemented by Delphix via masking algorithms in the databases. Data stewards can push down masking algorithms, which are then executed by Delphix jobs (Figure 4 – Delphix masking jobs).

 Figure 4 - Delphix masking jobs 
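The validation step described above, finding sensitive or personal columns that no masking rule covers yet, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual Collibra validation rules: the `Column` type, the classification labels, and the rule mapping keyed by `(table, column)` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    table: str
    name: str
    classification: str  # hypothetical labels: "personal", "sensitive", "public"


def find_unmasked_columns(columns, masking_rules):
    """Return columns classified as sensitive or personal that have no masking rule.

    masking_rules maps (table, column) to the name of a masking algorithm.
    """
    protected = {"sensitive", "personal"}
    return [
        column
        for column in columns
        if column.classification in protected
        and (column.table, column.name) not in masking_rules
    ]


if __name__ == "__main__":
    columns = [
        Column("customers", "email", "personal"),
        Column("customers", "country", "public"),
        Column("orders", "card_number", "sensitive"),
    ]
    rules = {("customers", "email"): "email_mask"}
    for gap in find_unmasked_columns(columns, rules):
        print(f"No masking rule for {gap.table}.{gap.name} ({gap.classification})")
```

In this sample, only `orders.card_number` is reported, because `customers.email` already has a rule and `customers.country` is not classified as needing protection.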

Previously, these processes could take days or weeks to assign responsibility, identify sensitive information, and create and execute appropriate rules. With Collibra and Delphix, this takes minutes and is executed in a self-service fashion.

Once all jobs are executed and sensitive data is protected, information from Delphix is synchronized with Collibra again. We created a customized Collibra dashboard to surface the aggregated data protection information for Chief Data Officers (CDOs) and Chief Information Security Officers (CISOs). Responsible stakeholders can immediately access aggregated information, drill down to columns and masking algorithms, review the state of data protection, and provide audit evidence when required. Beyond protecting data across hybrid environments and improving compliance with GDPR and similar regulations (CCPA, LGPD, etc.), organizations also achieve stronger data governance. The benefits it brings include up to 50% faster time-to-insights and up to 30% lower operational costs, due to better data quality, fewer errors, and reduced data discovery and wrangling time.
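As a rough illustration of the kind of aggregation such a dashboard might surface, the following Python sketch computes masking coverage per system from column-level records. The record fields (`system`, `column`, `sensitive`, `masked`) are assumptions made for this example and do not reflect the real Collibra or Delphix data model.

```python
from collections import defaultdict


def summarize_protection(records):
    """Aggregate column-level records into per-system masking coverage.

    Each record is a dict with the hypothetical keys
    'system', 'column', 'sensitive' (bool), and 'masked' (bool).
    """
    totals = defaultdict(lambda: {"sensitive": 0, "masked": 0})
    for record in records:
        if not record["sensitive"]:
            continue  # only sensitive columns count towards coverage
        totals[record["system"]]["sensitive"] += 1
        if record["masked"]:
            totals[record["system"]]["masked"] += 1
    # Every system in totals has at least one sensitive column, so no division by zero.
    return {
        system: {**counts, "coverage": counts["masked"] / counts["sensitive"]}
        for system, counts in totals.items()
    }


if __name__ == "__main__":
    records = [
        {"system": "crm", "column": "email", "sensitive": True, "masked": True},
        {"system": "crm", "column": "phone", "sensitive": True, "masked": False},
        {"system": "crm", "column": "country", "sensitive": False, "masked": False},
        {"system": "billing", "column": "iban", "sensitive": True, "masked": True},
    ]
    for system, stats in summarize_protection(records).items():
        print(system, stats)
```

Keeping the column-level records alongside the summary is what would allow a stakeholder to drill down from a system-level figure to individual columns and algorithms.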

To achieve a state where data policies can be requested, created, and enforced automatically in a self-service fashion, we have created a microservice between Collibra and Delphix. It synchronizes technical metadata and data discovery information, and automatically pulls and pushes policy information for information sharing and policy enforcement. To help our customers quick-start the automation of their data protection and migration, we have aggregated our knowledge, best practices, and the integration mentioned above into a “Masking Accelerator” package. As part of this package, we help our customers:

  • Tailor the operating model of privacy and data governance teams to function in an automated, collaborative manner.
  • Do end-to-end installation and configuration of Masking Accelerator.
  • Configure Collibra and Delphix.
  • Facilitate rollout and onboard users to Collibra and Delphix.
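For readers curious about what the synchronization microservice described above might look like conceptually, here is a minimal Python sketch of one sync cycle. It deliberately avoids the real Collibra and Delphix APIs: `GovernanceCatalog`, `MaskingEngine`, and the in-memory stand-ins are hypothetical interfaces invented for this illustration.

```python
from typing import Protocol


class GovernanceCatalog(Protocol):
    """Hypothetical stand-in for the governance side (not the Collibra API)."""
    def upsert_columns(self, columns: list) -> None: ...
    def approved_policies(self) -> dict: ...


class MaskingEngine(Protocol):
    """Hypothetical stand-in for the masking side (not the Delphix API)."""
    def discovered_columns(self) -> list: ...
    def apply_rules(self, rules: dict) -> None: ...


def sync_once(catalog: GovernanceCatalog, engine: MaskingEngine) -> int:
    """Run one sync cycle and return the number of rules pushed to the engine."""
    # 1. Share technical metadata and discovery results with the catalog.
    catalog.upsert_columns(engine.discovered_columns())
    # 2. Pull approved policies and push them down as masking rules.
    rules = catalog.approved_policies()
    engine.apply_rules(rules)
    return len(rules)


class InMemoryCatalog:
    def __init__(self, policies: dict):
        self.columns, self.policies = [], policies

    def upsert_columns(self, columns: list) -> None:
        self.columns = list(columns)

    def approved_policies(self) -> dict:
        return dict(self.policies)


class InMemoryEngine:
    def __init__(self, columns: list):
        self.columns, self.rules = columns, {}

    def discovered_columns(self) -> list:
        return list(self.columns)

    def apply_rules(self, rules: dict) -> None:
        self.rules.update(rules)


if __name__ == "__main__":
    catalog = InMemoryCatalog({"customers.email": "email_mask"})
    engine = InMemoryEngine(["customers.email", "customers.country"])
    print(sync_once(catalog, engine), engine.rules)
```

Defining both sides as narrow interfaces keeps the sync logic independent of either product, which is one plausible way to make such an integration testable with in-memory fakes.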

 

If you want to learn more about the “FWDview Masking Accelerator with Collibra and Delphix”, please download our data sheet or reach out to Jevgenijs Jelistratovs (jevgenijs.jelistratovs@nagarro.com) or to Gary Hallam (gary.hallam@perforce.com), our partner at Perforce Delphix.
