Using Lambda Function to read data from S3 Bucket on AWS

In this post we will see how we automatically trigger a lambda function to read data from s3 bucket using cloud watch logs

Pre-requisites
- AWS account

STEP 1
create an Iam role
we will create an iam role and attach this permission

Login to your aws management console and on the navigate to IAM and on the sidebar click role on the iam console
then create a new role
under the use case, Choose Lambda

create role

click on next then add permissions, Look for the permissions stated below and click on next

  • AmazonS3FullAccess

  • AWSLambdaBasicExecutionRole

  • AWSXRayDaemonWriteAccess

  • AmazonDMSCloudWatchLogsRole

  • CloudWatchEventsFullAccess

Enter the role Name i would use “s3-lambda-role”and then create role

Once you have create the role

STEP 2

create the bucket, this bucket is used to trigger your lambda function and read any csv data file sent to your bucket.

Navigate to the search and type S3 then click

then click Create Bucket

add the name “pharma-data-logs”

leave all configurations as default and click on Create

once the bucket has been created

STEP 3

we create the lambda function, using the lambda console

  • Navigate to the lambda console and open the functions page

  • click on “Author from scratch”

  • name the functions —  “s3-datalogs-read-lambda’’

  • click on runtime and choose the runtime to use, select python and choose version 3.9 as the runtime “this is latest version 3.9 as at the timing of writing”.

  • the architecture is x86_64

  • then Change the default execution role and choose an execution role you created. click “use an existing role” and choose the role ‘s3-lambda-role’

then click on create

Once the function is created, we need to set up a trigger using Amazon S3. To create a trigger:

  • Navigate to the configuration console, select ‘Source’, and choose ‘S3’ as the trigger source.

  • Select the bucket name ‘Pharma-data-logs’.

  • For event types, choose ‘All object create events’. This setting allows the Lambda function to be triggered by any actions such as PUT, POST, or DELETE, whenever a file is uploaded to the bucket.

  • For the suffix, use the default setting, which can help determine specific criteria for triggering the function.”

Now we create the code to run the function, replace the default code with the code of choice.

import json
import boto3
import csv
import io

s3Client = boto3.client('s3')  #this is used to call the object in the bucket

def lambda_handler(event, context):

    # here it will check the event records the first one 0 then it checks the s3 and find the bucket and the name using the event name 
    bucket = event['Records'][0]['s3']['bucket']['name']  
    key = event['Records'][0]['s3']['object']['key']

    print(bucket)
    print(key)

    #determine the object
    response = s3Client.get_object(Bucket=bucket, Key=key)

    #Process it
    data = response['Body'].read().decode('utf-8')
    reader = csv.reader(io.StringIO(data))
    next(reader)
    for row in reader:
        print(str.format("SKU = {}, Product = {}, Location = {}", row[0], row[1], row[2]))

Since we aren’t importing pandas modules we wont be adding layers, the layers can be used to add SDKpandas — pythons

now we click on deploy to save it.

Now we can upload the CSV file into the s3 bucket which will trigger the lambda function and it will display the data in your cloud watch logs

move to S3 console and upload the file which contains list of pharmaceutical products

then click on add files and upload

once the files are uploaded it will trigger the lambda function which will be executed and once you navigate to your lambda console click on monitor and click on view Cloud watch logs then you can now check the logs files on the logs window.

IN SUMMARY

AWS Lambda is a serverless compute service that allows you to execute code via a lambda function without the need to provision or manage servers. Using Python as our runtime, Lambda runs the code on a high-availability compute infrastructure and manages all the administrative tasks associated with the compute resources, including server and operating system maintenance, automatic scaling, and logging.

Here, we have used Amazon S3 to trigger real-time data processing in Lambda via event-based processing. This is beneficial for automating business tasks in the pharmaceutical industry that do not require a constantly running server or a scheduled job to manage the infrastructure.