AWS: “Hello World” implementation of DMS
AWS Database Migration Service (DMS) is one of the important skillsets to have for any Data Engineer working on migrating data in or out of AWS. In this project, let me explain you on how to use DMS for one of the frequent industry use-case of migrating data from on-premises MySQL to AWS RDS MySQL.
Architecture:
Preparing Source Database:
- As this is a migration use-case, it should have a source database, as already mentioned above should be an on-premises MySQL database. However, as we can’t “literally” have this on-premises database for this project, we are going to mimic the same by provisioning a windows EC2 instance and installing MySQL client on top of the instance.
- For simplicity, I have deployed this EC2 instance in a public subnet having direct connectivity to the internet.
- Make sure the security group associated with this instance has the following two inbound rules as shown in the below image. To elaborate, this SG has an inbound rule for RDP restricted to my IP address and a self-inbound rule for all TCP connections.
4. Once the instance is successfully provisioned, RDP into the instance and download Chrome from Internet Explorer by disabling Settings > Internet Options > Security > Internet > Enable Protected mode option and adding google.com to Trusted Sites.
5. Now, from Chrome download and install MySQL client onto the EC2 instance (just a google search would give you the required downloads webpage). Please note that while installing MySQL, please set Root user password and add an Admin user by configuring the password which would be used to connect to the database through MySQL workbench.
6. In order to get the test data to load onto the source database, from Chrome browser in the source EC2 instance, go to the GitHub repository at https://github.com/datacharmer/test_db, download the code files and extract them to a directory. After extracting the files, go to employees.sql file and update the absolute path for all the dump files to be loaded as shown in the image below.
7. In order to load the files to the database, go to C:\ > Program Files > MySQL > MySQL Server > bin and open command terminal at this location and execute below commands:
mysql -u root -p
# Enter password of root user
# Now, the control would be passed to mysql shell.
# Execute the script as shown below
mysql> source C:/Users/Downloads/test_db-master/test_db-master/employees.sql
After above commands are successfully executed, open MySQL workbench and verify that employees database has been populated and execute below commands to verify if the records are present in the tables.
select * from employees.departments;
select * from employees.employees;
Now, we have the data on source database ready to be migrated to target database.
Preparing Target Database:
- As a target database, let’s create an RDS database with latest MySQL engine installed for this migration.
- On the AWS DMS dashboard, click Create database and follow the fields navigation by providing inputs. Here, for the target database, I have selected the same Security Group as that of source EC2 instance for easier demonstration such that the source instance is able to connect to this database. However, in real-world use-case this is not mandatory, and one can create a separate Security Group for databases with appropriate traffic and ports open.
- Now, on the source EC2 instance, open MySQL workbench and try connecting to target RDS database by providing hostname, username and password. Validate that you are able to successfully connect to target database.
As of now, we have both source and target databases ready for migration.
Creating facilitator between Source and Target, the Replication Instance:
- Replication Instance is just an EC2 instance with replication software installed on it by AWS.
- Creating the replication instance is fairly straightforward by clicking on Create replication instance on DMS dashboard and providing appropriate inputs for each field.
- Here as well, do not forget to select the same Security Group as that of source EC2 instance for the replication instance.
Creating Source and Target Endpoints:
- In order to perform data migration using DMS, we should create endpoints for source and target databases each.
- On the DMS dashboard, navigate to Endpoints and create both source and target endpoints by selecting/providing appropriate values for engine, server name, username and password fields (Source Endpoint image is shown below, create a similar one for Target as well).
3. Once both the endpoints are created, you can test the connection by selecting Connections tab and clicking Test Connections as shown below.
Creating Database Migration Task:
- The database migration task acts as glue which binds all the components of DMS together. On the DMS dashboard, navigate to Database migration tasks section and select Create task.
- While creating the task, select the already created Replication Instance, Source and Target Endpoints and choose Migrate existing data as Migration type.
- For Selection rules, add a new rule by selecting schema as Employees as we need to migrate all the tables of this schema from source to destination.
- Keep default for all other settings and create the task.
- Data migration would get triggered automatically after the task creation (as the same default option is chosen). Once, the load gets completed, on MySQL workbench, connect to target RDS instance and verify that the data has been migrated on to it.