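Case:
There is a very large zip file in an S3 bucket that contains a large number of images. Is there a way to read its metadata, or find out how many files the zip contains, without downloading the whole file?
When the file is local, in Python I can open it with zipfile.ZipFile() and call the namelist() method, which returns a list of all the files inside, and then count them. However, I am not sure how to do this when the file resides in S3 without downloading it. It would also be ideal if this could be done from Lambda.
I think this will solve your problem: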
import io
import zipfile


def fetch(bucket_name, key_name, start, length, client_s3):
    """
    Range-fetches `length` bytes of an S3 object, starting at byte `start`.
    """
    end = start + length - 1  # HTTP byte ranges are inclusive
    s3_object = client_s3.get_object(Bucket=bucket_name, Key=key_name, Range="bytes=%d-%d" % (start, end))
    return s3_object['Body'].read()


def parse_int(data):
    """
    Parses 2 or 4 little-endian bytes into their corresponding integer value.
    """
    val = data[0] + (data[1] << 8)
    if len(data) > 3:
        val += (data[2] << 16) + (data[3] << 24)
    return val


def list_files_in_s3_zipped_object(bucket_name, key_name, client_s3):
    """
    List files in s3 zipped object, without downloading it. Returns the number of files inside the zip file.
    See : https://stackoverflow.com/questions/41789176/how-to-count-files-inside-zip-in-aws-s3-without-downloading-it
    Based on : https://stackoverflow.com/questions/51351000/read-zip-files-from-s3-without-downloading-the-entire-file
    bucket_name: name of the bucket
    key_name: path to zipfile inside bucket
    client_s3: an object created using boto3.client("s3")
    """
    response = client_s3.head_object(Bucket=bucket_name, Key=key_name)
    size = response['ContentLength']
    # The end-of-central-directory (EOCD) record is the last 22 bytes of the
    # file. This assumes the archive has no trailing comment and is not ZIP64.
    eocd = fetch(bucket_name, key_name, size - 22, 22, client_s3)
    # start offset and size of the central directory
    cd_start = parse_int(eocd[16:20])
    cd_size = parse_int(eocd[12:16])
    # fetch central directory, append EOCD, and open as zipfile!
    cd = fetch(bucket_name, key_name, cd_start, cd_size, client_s3)
    zip_file = zipfile.ZipFile(io.BytesIO(cd + eocd))
    print("there are %s files in the zipfile" % len(zip_file.filelist))
    for entry in zip_file.filelist:
        print("filename: %s (%s bytes uncompressed)" % (entry.filename, entry.file_size))
    return len(zip_file.filelist)


if __name__ == "__main__":
    import boto3
    import sys
    client_s3 = boto3.client("s3")
    bucket_name = sys.argv[1]
    key_name = sys.argv[2]
    list_files_in_s3_zipped_object(bucket_name, key_name, client_s3)
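
The code above reads exactly the last 22 bytes, which only works when the archive has no trailing comment. As a rough sketch of one way around that (not part of the original answer), the variant below reads up to the last 22 + 65535 bytes and searches backwards for the EOCD signature. The names list_zip_names and read_range are hypothetical: read_range(start, length) stands in for any range reader, such as a small wrapper around the S3 range fetch, and here it is backed by an in-memory archive so the idea can be tried without AWS. ZIP64 archives, and comments that themselves contain the signature bytes, are not handled.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"   # marks the start of the EOCD record
EOCD_MIN_SIZE = 22               # EOCD size when there is no comment
MAX_COMMENT_SIZE = 0xFFFF        # the comment length is a 16-bit field


def list_zip_names(read_range, size):
    """Return the member names of a zip, reading only its tail.

    read_range(start, length) -> bytes is a hypothetical callable that
    returns `length` bytes of the archive starting at offset `start`.
    `size` is the total size of the archive in bytes.
    """
    # Read enough of the tail to cover the EOCD plus the longest comment.
    tail_len = min(size, EOCD_MIN_SIZE + MAX_COMMENT_SIZE)
    tail = read_range(size - tail_len, tail_len)

    # The EOCD starts at the last occurrence of its signature.
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise zipfile.BadZipFile("end-of-central-directory record not found")
    eocd = tail[pos:]  # EOCD record plus any trailing comment

    # Bytes 12..16 hold the central directory size, 16..20 its offset.
    cd_size, cd_start = struct.unpack("<II", eocd[12:20])
    cd = read_range(cd_start, cd_size)

    # zipfile can list entries from the central directory + EOCD alone.
    with zipfile.ZipFile(io.BytesIO(cd + eocd)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    # Build a small archive with a comment, entirely in memory.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "hello")
        zf.writestr("dir/b.txt", "world")
        zf.comment = b"archive comment"
    data = buf.getvalue()

    names = list_zip_names(lambda start, length: data[start:start + length], len(data))
    print("there are %d files in the zipfile" % len(names))
```

Against S3, read_range could simply forward to a ranged get_object call, with size taken from head_object as in the answer; the extra tail read costs at most about 64 KB.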