GDPR Compliance Through Multi-Region Architecture: An Engineering Deep Dive
Starting this month, Gleen AI customers can choose where their customer data resides in order to meet regulatory requirements, in particular GDPR in Europe. In this engineering post, we discuss how we built this GDPR support from the ground up, including the challenges we faced and the engineering decisions we made.
Architecture Overview
Before this implementation, Gleen AI's infrastructure was primarily hosted in the US West regions, reflecting the geographical origins of our founding team. To support GDPR requirements, we needed to re-architect our system to allow data operations within the EU for European users.
Key Requirements and Challenges
- Data Residency: Ensure compute and data operations for European users occur within EU regions.
- Cross-Region Authentication: Maintain a seamless login experience across regions (admins only).
- External Integrations: Handle OAuth flows and webhooks across multiple integrations.
- Data Consistency: Ensure unique identifiers and prevent data duplication across regions.
Engineering Decision
Option 1: Data Segregation at Storage Level
Our initial approach was to separate data at the storage level by using different database connections and employing a load balancer to direct traffic to either EU or non-EU regions. While this seemed straightforward at first, we quickly realized it would add complexity throughout the codebase. Every flow would need to determine the viewer context, switch connections accordingly, and run custom privacy checks, which made this option less appealing.
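To illustrate why this approach spreads complexity, here is a minimal sketch of what each data path would have had to do under Option 1. It is not our production code: the `Viewer` type, the `CONNECTIONS` map, the connection strings, and the `load_tickets` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical region -> connection string map (illustrative values only).
CONNECTIONS = {
    "us": "postgres://us-db.internal.example/app",
    "eu": "postgres://eu-db.internal.example/app",
}


@dataclass(frozen=True)
class Viewer:
    account_id: str
    region: str  # "us" or "eu"


def connection_for(viewer: Viewer) -> str:
    """Resolve the database for this viewer; every flow would need this call."""
    if viewer.region not in CONNECTIONS:
        raise ValueError(f"unknown region: {viewer.region!r}")
    return CONNECTIONS[viewer.region]


def load_tickets(viewer: Viewer, query_fn: Callable[[str, str], List[str]]) -> List[str]:
    """One example flow: it cannot run a query without first resolving the region."""
    dsn = connection_for(viewer)
    return query_fn(dsn, viewer.account_id)
```

Every read and write path would need the same region lookup before touching storage, and a single missed call could send EU data to the wrong database.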
Option 2: Separate Data Centers with Replica Production Environments
We ultimately chose to create separate data centers with complete replicas of our production environment, including new databases and services. This clean-slate approach allowed us to treat each region as an independent production environment, simplifying our deployment and management processes.
Overcoming the Integration Hurdle
The biggest challenge with our chosen approach was managing external integrations. Many of our integrations, such as Zendesk, Freshdesk, and Shopify, were designed to send webhooks to a single domain. To address this, we introduced the concept of an edge server.
Edge Servers
We routed the URLs of all external integrations through edge servers. Each edge server copies the incoming request and multicasts it to all environments: US, EU, and any additional regions we might add in the future. An integration can therefore keep sending to a single domain while every region still receives the request.
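The multicast step can be sketched roughly as follows. This is a simplified illustration, not the actual edge server: the `WebhookRequest` type, the regional base URLs, and the injected `send` callable are assumptions made so the example stays self-contained and independent of any HTTP library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class WebhookRequest:
    path: str
    headers: Dict[str, str]
    body: bytes


# Hypothetical regional backends; a new region is just one more entry.
REGION_BASE_URLS = {
    "us": "https://us.internal.example",
    "eu": "https://eu.internal.example",
}

# A sender takes (url, request) and returns an HTTP status code.
Sender = Callable[[str, WebhookRequest], int]


def multicast(
    request: WebhookRequest,
    send: Sender,
    regions: Dict[str, str] = REGION_BASE_URLS,
) -> Dict[str, Optional[int]]:
    """Forward a copy of the request to every region and collect the statuses."""
    results: Dict[str, Optional[int]] = {}
    for region, base_url in regions.items():
        # Copy the request so one region's handling cannot mutate another's.
        copy = WebhookRequest(request.path, dict(request.headers), request.body)
        try:
            results[region] = send(base_url + request.path, copy)
        except Exception:
            # One failing region must not block delivery to the others.
            results[region] = None
    return results
```

In this sketch a failure in one region is recorded as `None` rather than raised, so the remaining regions still receive the webhook. Retry and deduplication policies are left out.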
Multiple Entry Points and Shadow Identifiers
To provide flexibility for administrators, we enabled login from any endpoint (app.gleen.ai or eu.gleen.ai). However, this created a challenge: how to maintain session information across regions without violating GDPR principles?
Our solution was to introduce "shadow identifiers" - partial admin or bot profile representations that don't contain actual user information. These shadow identifiers maintain session continuity across regions, allowing customers to seamlessly switch between US and EU environments.
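As a rough illustration of the idea, a shadow identifier can be modeled as a projection of a full profile that keeps only opaque, non-personal fields. The field names below (`profile_id`, `home_region`, `kind`) and the `to_shadow` helper are assumptions for this sketch, not our actual schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AdminProfile:
    profile_id: str   # opaque identifier, assumed to carry no personal data
    home_region: str  # region where the full profile is stored
    email: str        # personal data: stays in the home region
    full_name: str    # personal data: stays in the home region


@dataclass(frozen=True)
class ShadowIdentifier:
    profile_id: str
    home_region: str
    kind: str  # "admin" or "bot"


def to_shadow(profile: AdminProfile, kind: str = "admin") -> ShadowIdentifier:
    """Build the cross-region representation, dropping all personal fields."""
    return ShadowIdentifier(profile.profile_id, profile.home_region, kind)


def shadow_field_names() -> set:
    """Names of the fields that are allowed to leave the home region."""
    return {f.name for f in fields(ShadowIdentifier)}
```

Only the `ShadowIdentifier` would be shared with other regions. Resolving it back to a full profile would happen in the profile's home region.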
Cross-Region Session Management
To maintain sessions across regions, we implemented a Redis queue system. While this initially introduced latency for EU region access, we mitigated this by adding a secondary replica that receives events from the primary's heartbeat. This approach reduced the latency to a one-time 300-400ms delay during cookie creation, with subsequent accesses using the EU-hosted secondary replica.
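The flow can be modeled with a small in-memory sketch. It uses a `deque` as a stand-in for the Redis queue and plain dictionaries for the session stores, so it shows only the shape of the replication, not real Redis calls. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class PrimaryStore:
    """Stand-in for the primary session store and its event queue."""

    def __init__(self) -> None:
        self.sessions: Dict[str, str] = {}
        self.events: Deque[Tuple[str, str]] = deque()

    def create_session(self, session_id: str, shadow_id: str) -> None:
        # Cookie creation: write to the primary and enqueue a replication event.
        self.sessions[session_id] = shadow_id
        self.events.append((session_id, shadow_id))


class RegionalReplica:
    """Stand-in for the EU-hosted secondary replica."""

    def __init__(self, primary: PrimaryStore) -> None:
        self.primary = primary
        self.sessions: Dict[str, str] = {}

    def sync(self) -> int:
        """Apply all pending events from the primary; returns how many were applied."""
        applied = 0
        while self.primary.events:
            session_id, shadow_id = self.primary.events.popleft()
            self.sessions[session_id] = shadow_id
            applied += 1
        return applied

    def lookup(self, session_id: str) -> Optional[str]:
        # Later accesses are served locally, without a cross-region hop.
        return self.sessions.get(session_id)
```

In this model the cross-region cost is paid once, when the session is created and replicated. Every later `lookup` reads from the regional copy.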
Simplifying Complex Integrations: The Shopify Case Study
Our Shopify integration presented a unique challenge due to its multiple setup pathways - direct installation from Shopify or integration from an existing bot profile, each with variations based on user login status. We addressed this by creating a streamlined product experience that guides users through region selection and bot profile creation as needed.
Conclusion
Building GDPR support for Gleen AI involved overcoming significant technical challenges and making crucial engineering decisions. By introducing edge servers, managing unique IDs, and ensuring seamless user experiences across regions, we have set a robust foundation for compliance and global operations. This initiative not only supports GDPR compliance but also enhances our commitment to respecting user privacy and data sovereignty across all regions.