S3 Interview questions
Basic Level
Question 1: Access Denied Error — In production, users report "Access Denied" when accessing objects in an S3 bucket via a web app. Walk through your troubleshooting steps.
Answer: First, verify the bucket and object exist using the AWS CLI:
aws s3 ls s3://bucket-name --recursiveCheck the IAM identity used by the app with:
aws sts get-caller-identityand test permissions with the IAM policy simulator. Then examine bucket policy, ACLs, and S3 Block Public Access settings. Finally, confirm encryption settings if SSE-KMS is used — ensure the role/user has required KMS permissions (e.g., kms:Decrypt). This resolved a similar issue by fixing a missing kms:Decrypt permission.
Question 2: High S3 Costs Suddenly — Your team's S3 bill spiked 3x without increased usage. How do you investigate and optimize?
Answer: Use S3 Storage Lens or Cost Explorer to break down costs by prefix, storage class, and requests. Identify unneeded Standard storage and consider lifecycle policies or Intelligent-Tiering to auto-move infrequent data. Example outcome: combined with CloudFront OAI restrictions and tiering, costs dropped from $250 to $5/month. Consider enabling Requester Pays if partners are responsible for access costs.
Question 3: Enable Versioning Issue — After enabling versioning on a production bucket with 1M objects, delete operations fail. Explain why and fix.
Answer: Deletes create delete markers rather than removing object data, so object count appears to double. List versions with:
aws s3api list-object-versions --bucket my-bucketClean up using lifecycle rules such as "Permanently delete noncurrent versions after X days" and "Remove expired delete markers" to prevent storage bloat.
Intermediate Level
Question 4: Lifecycle Policy Not Transitioning — Logs in S3/logs/ prefix aren't transitioning to IA after 30 days per policy. Bucket has versioning. Troubleshoot.
Answer: Get the lifecycle config:
aws s3api get-bucket-lifecycle-configuration --bucket your-bucketWith versioning enabled, objects become current/noncurrent versions; ensure your lifecycle rules target current and/or noncurrent versions appropriately (e.g., transition noncurrent to IA at 30 days). Verify the rule filters the correct prefix ("logs/") and test in staging.
Question 5: Cross-Account Read Access — Dev account needs read access to prod S3 bucket in another account for CI/CD. How to set up securely?
Answer: In the prod account, create an IAM role granting s3:GetObject and s3:ListBucket and a trust policy that allows the dev account to assume the role. Add a bucket policy permitting that role ARN (or use aws:SourceAccount conditions). In the dev account, let your CI role/user assume that role. Prefer cross-account roles over sharing keys.
Question 6: Slow Downloads in App — Users complain of slow S3 object downloads >100MB during peak hours. Optimize performance.
Answer: Implement exponential backoff for retries. Use key prefix partitioning (e.g., date-based keys like 2026/01/13/) to distribute request load and avoid per-prefix request limits. Front S3 with CloudFront for edge caching. For large uploads/downloads, use multipart uploads and consider S3 Transfer Acceleration for global users. These steps reduced latency ~70% in past projects.
Question 7: Block Public Access Override — Legacy app broke after enabling Block Public Access; it relied on public bucket policy. Fix without disabling Block Public Access.
Answer: Since Block Public Access blocks ACLs/policies that allow public access, switch to CloudFront with Origin Access Control (OAC) and update the bucket policy to allow only CloudFront (or the CloudFront OAC) to access the bucket. Add policy conditions such as "aws:SecureTransport": "true" to enforce HTTPS. This preserves Block Public Access while allowing controlled access via CloudFront.
Advanced Level
Question 8: Replication Lag in CRR — CRR setup for compliance replicates 99% objects but lags >15min on deletes during high churn. Fix.
Answer: Enable S3 Replication Time Control (RTC) to get a 15-minute SLA (99.99%) and metrics/events. For deletes, ensure "Replicate delete markers" is configured if you need deletes replicated; otherwise manage deletions via lifecycle rules. If using KMS, ensure keys are valid/symmetric in both regions. Monitor via CloudWatch and adjust replication rules/tags.
Question 9: Encryption Mismatch KMS — App fails uploading to S3 with SSE-KMS due to "Access Denied" on kms:GenerateDataKey*. Prod uses custom KMS key.
Answer: Grant the IAM role kms:GenerateDataKey and kms:Decrypt in the KMS key policy (or via IAM policies referencing the key ARN). Ensure the bucket policy and upload requests align with "s3:x-amz-server-side-encryption": "aws:kms" and the correct key ARN. Audit with Access Analyzer to detect gaps.
Question 10: Batch Operations Fail Throttling — S3 Batch copy job on 10M objects fails with SlowDown. Prod env has concurrent jobs. Troubleshoot.
Answer: SlowDown indicates exceeding request rate. Split the job into smaller batches (by prefix), add exponential backoff when generating manifests and during retries, and poll job status less aggressively (e.g., every 5 min instead of every few seconds). Ensure manifest CSV is UTF-8 without BOM and keys <1024 chars. Retry with ~1M object batches.
Question 11: Cost Optimization Multi-Tier Logs — Processing 10TB daily logs; costs high from Standard + retrievals. Design lifecycle for optimization.
Answer: Example lifecycle rules:
Transition >30 days to IA (or Intelligent-Tiering)
Transition >90 days to Glacier Flexible
Transition >365 days to Glacier Deep Archive
Expire after 7 years
Separate small objects (<128 KB) into a different prefix for specialized handling. Use S3 Analytics/Storage Lens to refine policies. This approach saved ~60% vs Standard in a production case.
Question 12: Secure Static Site Hosting — Host React app on S3 with custom domain, HTTPS, invalidate cache on deploy, restrict to CloudFront.
Answer: Host the build artifacts in an S3 bucket (no public ACLs), create a CloudFront distribution with an ACM certificate, and configure Origin Access Control (OAC) so only CloudFront can access the bucket. Use CloudFront behaviors and custom error pages to support SPA routing (serve index.html on 404). Invalidate CloudFront cache on deploy via:
Block direct S3 access entirely via bucket policy.
Question 13: Versioning Cleanup Explosion — Bucket object count jumped 2x after mass deletes; noncurrent versions piling up. Prod impact? Clean.
Answer: Configure lifecycle rules:
NoncurrentVersionExpiration after 30 days
ExpiredObjectDeleteMarker: true
For one-time cleanup, use S3 Inventory + S3 Batch Delete (manifest from inventory CSV). Query inventory with Athena to plan deletes; this avoided large bills in a production case.
Additional Scenarios
Question 14: VPC Endpoint for S3 Access — EC2 in private subnet can't reach S3; public internet blocked. Costs rising from NAT Gateway.
Answer: Create a VPC Gateway Endpoint for S3 and add it to the subnet route tables (free). Use an endpoint policy to restrict access (e.g., condition "aws:SourceVpce"). For KMS, create Interface Endpoints if needed. This reduced NAT Gateway egress costs significantly.
Question 15: Multipart Upload Cleanup — Prod ETL leaves orphaned multipart uploads, consuming space. How to detect/clean?
Answer: Use a lifecycle rule:
Use S3 Inventory + Athena to find multipart uploads older than X days and then S3 Batch Abort for large-scale cleanup. This reclaimed substantial space in production.
Question 16: S3 Analytics for Perf — High latency in EU region for game data. Use analytics to optimize.
Answer: Enable S3 Storage Class Analysis to export usage patterns and identify infrequently accessed data. Transition those objects to cheaper storage classes or use regional caching (CloudFront) to reduce latency. Observed ~50% reduction in peak latency after changes.
Question 17: Requester Pays for Sharing — Third-party submits 1TB data to your bucket daily, high transfer costs. Mitigate.
Answer: Enable Requester Pays on the bucket so requesters pay GET/PUT costs, and document requirements for partners. Combine with Transfer Acceleration if partners need faster uploads. This shift reduced your monthly bill significantly.
Question 18: Object Lock Compliance — Regulatory req: logs immutable 7y for WORM. Implement for new/existing.
Answer: Enable Object Lock in the bucket and use GOVERNANCE mode for testing, then set default retention to 7 years. For existing objects, use S3 Batch Copy with object lock retention options to apply retention. Use Compliance mode after audits to prevent deletions even by root.
Question 19: Metrics Alarm for Throttling — Prod app throttled on S3 PUTs during spikes. Proactive monitoring?
Answer: Create CloudWatch alarms on ThrottledRequests metrics (e.g., >5 in 5 min) and send notifications via SNS/PagerDuty. Parse access logs with metric filters for 503 SlowDown. Combine alarms with application changes (prefix partitioning) to mitigate throttling.
Question 20: Disaster Recovery Test Fail — CRR to DR region lags; test restore fails on encrypted objects. Fix.
Answer: Use symmetric multi-region KMS keys and ensure replication grants are present in both regions. Enable Replication Time Control (RTC) for SLA and test via controlled chaos (delete subset in source and verify destination within 15 min). Consider MFA Delete for critical data if required.
References / Further reading:
