
AWS to Snowflake Migration

Problem:

  • Needed to migrate weather data from AWS to Snowflake to enable centralized data warehousing, reporting, and improved analytics. 


Solution:

  • Data Ingestion: Utilized AWS Lambda to process incoming CSV data and store it in DynamoDB, triggering downstream data pipelines (a minimal handler sketch follows this list).

  • Data Migration: Built an event-driven pipeline using DynamoDB Streams, AWS S3, and Snowpipe to automatically migrate data from AWS to Snowflake.

  • Data Delivery: Ensured fault-tolerant data delivery using AWS SQS for notifications and data delivery confirmation.

  • Impact: Reduced data latency by 30%, enabling faster and more accurate data availability in Snowflake for reporting.
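As a rough illustration of the ingestion step, here is a minimal sketch of what the first Lambda could look like. The table name, environment variable, event shape, and CSV schema are assumptions for illustration only; the project does not name them.

```python
import csv
import io
import os

import boto3

# Hypothetical table name; the real identifier is not given in the post.
TABLE = boto3.resource("dynamodb").Table(os.environ.get("TABLE_NAME", "weather_data"))

REQUIRED_FIELDS = {"station_id", "observed_at", "temperature"}  # assumed CSV schema


def handler(event, context):
    """Lambda #1: parse the incoming CSV payload, validate each row,
    and store the valid rows in DynamoDB (which feeds the stream)."""
    rows = csv.DictReader(io.StringIO(event["body"]))  # assumes the CSV arrives in the event body
    written = 0
    with TABLE.batch_writer() as batch:
        for row in rows:
            # Basic validation: skip rows missing any required field.
            if not all(row.get(f) for f in REQUIRED_FIELDS):
                continue
            batch.put_item(Item=row)
            written += 1
    return {"statusCode": 200, "body": f"stored {written} rows"}
```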

Figure: DB Migration Considerations

System Architecture

  • Data Source: Weather data arriving daily as CSV from an API, ingested into AWS Lambda for processing.

  • Data Processing:

    • Lambda Function #1: Ingested and validated data before storing it in DynamoDB.

    • DynamoDB Streams: Triggered a second Lambda function when new records were inserted (the one-time wiring is sketched below).
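Enabling the stream and pointing Lambda #2 at it is one-time setup. A sketch of how that could be done with boto3 follows; the table and function names are hypothetical:

```python
import boto3

dynamodb = boto3.client("dynamodb")
lambda_client = boto3.client("lambda")

# Emit the full new item image whenever a record is written.
dynamodb.update_table(
    TableName="weather_data",  # hypothetical table name
    StreamSpecification={"StreamEnabled": True, "StreamViewType": "NEW_IMAGE"},
)

# Look up the stream ARN and subscribe the second Lambda to it.
stream_arn = dynamodb.describe_table(TableName="weather_data")["Table"]["LatestStreamArn"]
lambda_client.create_event_source_mapping(
    EventSourceArn=stream_arn,
    FunctionName="copy-to-s3",  # hypothetical name for Lambda #2
    StartingPosition="LATEST",
    BatchSize=100,
)
```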

  • Data Migration:

    • Lambda Function #2: Captured the DynamoDB Stream event and copied the data to S3 (sketched below).

    • S3 Bucket: Acted as the staging area for Snowflake ingestion.
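A minimal sketch of Lambda #2, assuming it flattens the stream records into newline-delimited JSON (which Snowpipe can load directly) and writes them to a hypothetical staging bucket:

```python
import json
import os
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = os.environ.get("STAGING_BUCKET", "snowpipe-staging")  # hypothetical bucket


def handler(event, context):
    """Lambda #2: copy newly inserted DynamoDB items to the S3 staging area."""
    lines = []
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]
        # Strip DynamoDB's type wrappers ({"S": "12.3"} -> "12.3"); a sketch-level
        # simplification that treats every attribute as a scalar.
        lines.append(json.dumps({k: next(iter(v.values())) for k, v in image.items()}))
    if lines:
        key = f"weather/{uuid.uuid4()}.json"  # unique key so files are never overwritten
        s3.put_object(Bucket=BUCKET, Key=key, Body="\n".join(lines).encode())
```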

  • Data Load into Snowflake:

    • Snowpipe: Automatically ingested data from S3 into Snowflake tables.

      • Needs a table to copy into.

      • Needs an external stage object.

        • This in turn requires a storage integration object and a file format.

      • Needs an event notification configured on the AWS S3 bucket.

    • External Stage: Configured a Snowflake storage integration to pull data from S3 (a setup sketch covering all of these objects follows).
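The checklist above maps to a handful of Snowflake DDL statements. Here is a sketch that runs them through the snowflake-connector-python driver; every identifier, credential, ARN, and URL is a placeholder, and creating a storage integration normally requires the ACCOUNTADMIN role:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="weather_db", schema="public",
)

for stmt in [
    # 1. A table to copy into (one VARIANT column for the raw JSON).
    "CREATE TABLE IF NOT EXISTS weather_raw (v VARIANT)",
    # 2. A file format matching what Lambda #2 writes to S3.
    "CREATE FILE FORMAT IF NOT EXISTS ndjson_fmt TYPE = JSON",
    # 3. A storage integration granting Snowflake read access to the bucket.
    """CREATE STORAGE INTEGRATION IF NOT EXISTS s3_int
         TYPE = EXTERNAL_STAGE STORAGE_PROVIDER = 'S3' ENABLED = TRUE
         STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-read'
         STORAGE_ALLOWED_LOCATIONS = ('s3://snowpipe-staging/weather/')""",
    # 4. An external stage built on the integration and file format.
    """CREATE STAGE IF NOT EXISTS weather_stage
         URL = 's3://snowpipe-staging/weather/'
         STORAGE_INTEGRATION = s3_int
         FILE_FORMAT = (FORMAT_NAME = 'ndjson_fmt')""",
    # 5. The pipe itself; AUTO_INGEST reacts to the S3 event notifications.
    """CREATE PIPE IF NOT EXISTS weather_pipe AUTO_INGEST = TRUE AS
         COPY INTO weather_raw FROM @weather_stage""",
]:
    conn.cursor().execute(stmt)
```

Once the pipe exists, SHOW PIPES exposes its notification_channel (an SQS queue ARN), which is the target for the S3 event notification from the checklist above.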

  • Data Delivery:

    • AWS SQS (Simple Queue Service): Sent confirmation messages for successful data loads, ensuring fault-tolerant, at-least-once delivery (sketched below).
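Snowpipe's auto-ingest itself rides on an SQS queue for the S3 event notifications; the confirmation messages described here would use a separate application queue. A minimal sketch of publishing one, with a placeholder queue URL:

```python
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/load-confirmations"  # placeholder


def confirm_delivery(s3_key, row_count):
    """Publish a confirmation once a batch has been staged; SQS's
    at-least-once delivery provides the fault tolerance noted above."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"key": s3_key, "rows": row_count, "status": "delivered"}),
    )
```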

Figure: System Architecture

Video Walkthrough

In this video, I walk through the full migration pipeline, including:

  • Setting up AWS Lambda to ingest CSV data and push it to DynamoDB.

  • Configuring DynamoDB Streams to trigger real-time data processing.

  • Writing a second Lambda Function to copy data from DynamoDB to S3.

  • Using Snowpipe to automatically ingest data from S3 into Snowflake.

  • Demonstrating how Snowflake’s dashboard visualizes the incoming data in near real-time.

Conclusion

Working on the AWS to Snowflake Migration Project provided me with invaluable experience in real-time data ingestion, cloud-to-cloud migration, and data warehousing. By leveraging DynamoDB Streams, Lambda, S3, and Snowpipe, I was able to successfully reduce data latency by 30%, enabling faster access to data for reporting.


This project strengthened my understanding of:

  • Building real-time data pipelines.

  • Integrating AWS services with Snowflake.

  • Ensuring fault-tolerant and automated data migration.


This project closely mirrors real-world cloud migration tasks, significantly enhancing my data engineering skill set.
