Multiple file upload into an Amazon S3 bucket

Prince Francis
3 min read · Jul 4, 2019


If we want to upload hundreds of files into an Amazon S3 bucket, there are three options.

  1. Upload asynchronously (in parallel). This may require a lot of CPU and network usage, so we should cap the maximum number of threads to balance the performance (a sketch follows this list).
  2. Upload one by one. This may require a lot of time; at roughly 15 seconds per file, 1,000 files would take a minimum of 250 minutes to finish uploading.
  3. Zip all the files, upload the archive to S3, then extract it there. We can use an AWS Lambda function to get this done.
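
For option 1, the sketch below shows one way to bound the concurrency with boto3 and a thread pool. The bucket name, file list, and thread cap are illustrative placeholders; boto3 clients are thread-safe, so a single client can be shared across the workers.

import os
from concurrent.futures import ThreadPoolExecutor

import boto3

s3_client = boto3.client('s3')
BUCKET = 'my-bucket'   # placeholder bucket name
MAX_THREADS = 10       # cap the thread count to balance CPU and network usage

def upload_one(path):
    # Use the file's base name as the object key
    s3_client.upload_file(path, BUCKET, os.path.basename(path))

local_files = ['file-1.txt', 'file-2.txt']  # hundreds of local paths in practice
with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
    list(pool.map(upload_one, local_files))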

The following are the steps to create and configure the Lambda function that unzips the archive inside the S3 bucket.

1. Create an IAM role

Open the IAM console, create a new role, select Lambda as the trusted service, and click on ‘Next’.

Search for ‘s3’ in the policy list and select ‘AmazonS3FullAccess’, then search for ‘lambda’ and select ‘AWSLambdaBasicExecutionRole’, and click on the ‘Next’ button twice. On the page that follows, enter a name for the role.

Now click on the ‘Create role’ button.

Now you have successfully created a role.
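
If you prefer to script this step, here is a minimal sketch of the same role setup using boto3; the role name ‘lambda-s3-unzip-role’ is a hypothetical placeholder, while the two policy ARNs are the AWS-managed policies selected above.

import json
import boto3

iam = boto3.client('iam')

# Trust policy that lets the Lambda service assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

iam.create_role(
    RoleName='lambda-s3-unzip-role',  # hypothetical role name
    AssumeRolePolicyDocument=json.dumps(trust_policy))

# Attach the two managed policies chosen in the console
iam.attach_role_policy(
    RoleName='lambda-s3-unzip-role',
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess')
iam.attach_role_policy(
    RoleName='lambda-s3-unzip-role',
    PolicyArn='arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole')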

2. Create Lambda function

Open the Lambda console and click on the ‘Create function’ button.

Now give the function a name and select Python as the runtime language.

In the permissions section, select ‘Use an existing role’ and choose the role created in step 1.

Click on the ‘Create function’ button.
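
This step can also be scripted with boto3, sketched below under the assumption that the handler code is packaged in a local ‘function.zip’; the function name, runtime version, and role ARN are placeholders.

import boto3

lambda_client = boto3.client('lambda')

# Deployment package: a zip containing lambda_function.py
with open('function.zip', 'rb') as f:
    code_bytes = f.read()

lambda_client.create_function(
    FunctionName='s3-unzip',  # hypothetical function name
    Runtime='python3.9',      # pick a Python runtime your account supports
    Role='arn:aws:iam::123456789012:role/lambda-s3-unzip-role',  # role from step 1
    Handler='lambda_function.lambda_handler',
    Code={'ZipFile': code_bytes},
    Timeout=300,              # extracting large archives can exceed the 3-second default
    MemorySize=512)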

3. Add trigger

Now click on the ‘Add trigger’ button.

There you select ‘S3’, then select your bucket and choose ‘PUT’ as the event type.

Give a prefix: the name of the folder in the bucket to which we will upload the zip file (here, ‘bulk-upload’).

Give ‘.zip’ as the suffix.

Now click on the ‘Add’ button.
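
Adding the trigger from code takes two calls, sketched below with placeholder names and ARNs: one to allow S3 to invoke the function, and one to register the notification with the prefix and suffix filters described above.

import boto3

lambda_client = boto3.client('lambda')
s3_client = boto3.client('s3')

# Let S3 invoke the function
lambda_client.add_permission(
    FunctionName='s3-unzip',
    StatementId='allow-s3-invoke',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::my-bucket')

# Fire the function on PUTs of .zip objects under the bulk-upload/ prefix
s3_client.put_bucket_notification_configuration(
    Bucket='my-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:s3-unzip',
            'Events': ['s3:ObjectCreated:Put'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': 'bulk-upload/'},
                {'Name': 'suffix', 'Value': '.zip'}]}}}]})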

4. Write the code in the Lambda function

import boto3
import string
import random
import os
import zipfile
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    s3_client = boto3.client('s3')

    # Bucket and key of the zip file that fired the event; object keys
    # arrive URL-encoded in S3 events, so decode them first
    bucketName = event['Records'][0]['s3']['bucket']['name']
    zip_key = unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Create a uniquely named working directory under /tmp
    chars = string.ascii_uppercase + string.digits
    randomName = ''.join(random.choice(chars) for _ in range(8))
    tmpFolder = '/tmp/' + randomName + '/'
    os.makedirs(tmpFolder)
    unzipTmpFile = randomName + '.zip'
    extension = ".zip"
    targetDirectory = 'folder-to-which-files-to-be-extracted'

    # Download the uploaded archive into the working directory
    s3_client.download_file(bucketName, zip_key, tmpFolder + unzipTmpFile)
    dir_name = tmpFolder
    os.chdir(dir_name)

    # Extract every .zip found in the working directory, then delete the archive
    for item in os.listdir(tmpFolder):
        if item.endswith(extension):
            file_name = os.path.abspath(item)
            zip_ref = zipfile.ZipFile(file_name)
            zip_ref.extractall(dir_name)
            zip_ref.close()
            os.remove(file_name)

    # Walk the extracted tree and upload each file into the target directory
    extractedFiles = []
    # r=root, d=directories, f=files
    for r, d, f in os.walk(tmpFolder):
        for file in f:
            extractedFiles.append(os.path.join(r, file))
    for file_name in extractedFiles:
        s3_client.upload_file(file_name, bucketName,
                              targetDirectory + '/' + file_name.replace(tmpFolder, '', 1))
        os.remove(file_name)

    return {
        'statusCode': 200,
        'body': zip_key
    }

5. Upload a zip file into the folder ‘bulk-upload’ (the prefix we set in the trigger section). You can see the extracted files in the directory mentioned in the code.
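
To try it out, upload an archive into the trigger prefix and, after the function runs, list the extracted objects; ‘photos.zip’ and the bucket name below are placeholders.

import boto3

s3_client = boto3.client('s3')

# Uploading the zip fires the PUT trigger and runs the Lambda
s3_client.upload_file('photos.zip', 'my-bucket', 'bulk-upload/photos.zip')

# Once the function has finished, list the extracted files
resp = s3_client.list_objects_v2(
    Bucket='my-bucket',
    Prefix='folder-to-which-files-to-be-extracted/')
for obj in resp.get('Contents', []):
    print(obj['Key'])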
