AWS: Converting JSON file into CSV
Hello, in this article, let us see how we can convert JSON data uploaded to S3 bucket into CSV file using Lambda.
AWS Services used:
- S3
- Lambda
- AWS Identity and Access Management
GitHub Repository:
Code files and required contents could be found at https://github.com/prashanthvinay/Automation_convert-json-to-csv
Why do we need to convert JSON data into CSV?
To understand the motivation behind this conversion, let's see the differences between a JSON and CSV file.
JSON: 1. Ideal for exchanging structure data. 2. Supports hierarchical relational data. 3. Larger file size. 4. Less secure. 5. Easily scalable and suitable for large datasets.
CSV: 1. Ideal for storing tabular data in a delimited text file. 2. Doesn’t support hierarchical structure. 3. More compact. 4. More secure. 5. Convenient for small datasets.
When there arises a need where having the data in CSV format is more suitable to achieve a business requirement, we execute this process to convert JSON data into CSV format.
Steps:
- First, let us create a Lambda function with runtime as Python 3.9 whose code could be found at GitHub repository (json-to-csv.py). Mind that you could use the default execution role created during function creation to which we are going to add further permissions later.
- Now, increase the execution time of Lambda function to 1 minute. Also, as we are using Pandas library in the code to read the JSON data, we should be adding an AWS layer called AWSSDKPandas-Python39 to the function.
- Now, let us create a new S3 bucket which stores the JSON and corresponding CSV files generated.
- Inside the newly created S3 bucket, create two folders: json/ into which the JSON files would be uploaded and csv/ in which the converted CSV files would be generated.
5. Now, in order to invoke the Lambda function on the upload of JSON files into S3 bucket folder, configure an Event notification on S3 bucket. To configure this, go to S3 bucket > Properties > Event notifications > Create event notification and set the fields as shown in the images below. Please select the lambda function created in step# 1 as destination.
6. Now, for Lambda function to read JSON files from S3 and create corresponding CSV files back into S3 bucket folder, you can create a custom policy with only required S3 permissions or optionally attach AmazonS3FullAccess managed policy to the execution role of Lambda function.
7. Finally, in order to test the setup, upload a JSON file whose sample file could be found at GitHub repository (Countries.json) to json/ folder of S3 bucket and verify that a corresponding CSV file would be generated in csv/ folder of the same S3 bucket.
Architecture Diagram: