A serverless file upload service I built with Node.js and AWS.
Key features:
- RESTful API with Express.js
- AWS Lambda for metadata extraction
- S3 storage with encryption
- DynamoDB for metadata storage
- API Gateway HTTP API deployment
- Lambda authorizer with API key authentication
- Comprehensive test suite (53 unit tests)
- Security hardening (XSS, injection prevention)
Tech stack I chose:
- Node.js + Express
- AWS S3, Lambda, DynamoDB, API Gateway
- Multer for file uploads
- wkhtmltopdf for PDF testing
Author: Leonardo da Silva Calado
Nexton File Upload & Metadata Extraction Service
A serverless file upload service I built with Node.js and AWS. The service accepts file uploads via a REST API, stores them in S3, and automatically extracts metadata using Lambda functions.
Architecture Overview
┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│   Client    │─────>│ Express API  │─────>│  Amazon S3  │
│ Application │      │  (/upload)   │      │  (Storage)  │
└─────────────┘      └──────────────┘      └─────────────┘
                            │                     │
                            │                     │ S3 Event
                            V                     │ Trigger
                     ┌──────────────┐             │
                     │   DynamoDB   │             V
                     │  (Metadata)  │<─────┌─────────────┐
                     └──────────────┘      │ AWS Lambda  │
                            ^              │ (Metadata   │
                            │              │ Extraction) │
                            │              └─────────────┘
                     ┌──────────────┐
                     │ Express API  │
                     │ (/metadata)  │
                     └──────────────┘
Components I'm Using
- Express REST API - Handles file uploads and metadata queries
- Amazon S3 - File storage with encryption
- AWS Lambda - Event-driven metadata extraction
- DynamoDB - Metadata storage
- API Gateway - Serverless API deployment with Lambda authorizer
Features
- RESTful API for file uploads and metadata retrieval
- PDF and image file support (JPEG, PNG)
- Automatic metadata extraction (file size, type, PDF page count, text content)
- User-defined metadata with validation
- S3 encryption for secure file storage
- Event-driven processing with Lambda
- Unique file identifiers (UUID v4)
- API key authentication with Lambda authorizer
- Input validation and sanitization
Prerequisites
- Node.js >= 18.0.0 (tested with v24.11.1)
- AWS CLI configured with appropriate credentials
- AWS Account with permissions to create:
  - S3 buckets
  - DynamoDB tables
  - Lambda functions
  - IAM roles and policies
- Yarn package manager
Installation & Setup
1. Clone and Install Dependencies
cd nexton_upload   # the cloned project root
yarn install
2. Configure Environment Variables
cp .env.example .env
Edit .env with your AWS configuration (you can fill in the remaining values from the setup script's output in step 3):
AWS_REGION=us-east-1
AWS_ACCOUNT_ID=your-account-id
S3_BUCKET_NAME=nexton-file-uploads
DYNAMODB_TABLE_NAME=nexton-file-metadata
LAMBDA_FUNCTION_NAME=nexton-metadata-extractor
PORT=3000
NODE_ENV=development
MAX_FILE_SIZE_MB=10
ALLOWED_FILE_TYPES=application/pdf,image/jpeg,image/png,image/jpg
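As an illustration of how these variables might be consumed, here is a minimal sketch that parses them into a typed config object at startup. The `loadConfig` name and its defaults are assumptions chosen to mirror the example .env above, not necessarily how the code in src/ reads them.

```javascript
// Hypothetical helper: turns the .env values above into a typed config object.
// Defaults mirror the example .env; the real service may differ.
function loadConfig(env = process.env) {
  const maxFileSizeMb = Number(env.MAX_FILE_SIZE_MB ?? 10);
  if (!Number.isFinite(maxFileSizeMb) || maxFileSizeMb <= 0) {
    throw new Error(`Invalid MAX_FILE_SIZE_MB: ${env.MAX_FILE_SIZE_MB}`);
  }
  return {
    region: env.AWS_REGION ?? 'us-east-1',
    bucketName: env.S3_BUCKET_NAME,
    tableName: env.DYNAMODB_TABLE_NAME,
    port: Number(env.PORT ?? 3000),
    // MB -> bytes, the unit multer's `limits.fileSize` option expects
    maxFileSizeBytes: maxFileSizeMb * 1024 * 1024,
    // Comma-separated list -> trimmed, non-empty MIME types
    allowedFileTypes: (env.ALLOWED_FILE_TYPES ?? '')
      .split(',')
      .map((t) => t.trim())
      .filter(Boolean),
  };
}
```

Failing fast on a malformed MAX_FILE_SIZE_MB means a typo in .env surfaces at startup rather than as a confusing upload error later.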
3. Deploy AWS Infrastructure
Run the automated setup script to create all AWS resources:
./scripts/setup-aws-resources.sh
This script will:
- Create an S3 bucket with versioning, encryption, and private access
- Create a DynamoDB table with optimized indexes
- Create IAM roles with least-privilege permissions
- Deploy the Lambda function for metadata extraction
- Configure S3 event notifications to trigger Lambda
Update your .env file with the output from the setup script.
4. Start the API Server
npm start
# or for development with auto-reload:
npm run dev
The server will start on http://localhost:3000 (or your configured PORT).
API Documentation
Authentication
All API requests require authentication using an API key.
Include the API key in the x-api-key header:
# Get your API key
API_KEY=$(cat .api-key)
# Use it in requests
curl -H "x-api-key: $API_KEY" https://your-api-endpoint.com/...
Without a valid API key, you'll receive:
{
"message": "Unauthorized"
}
Status: 401 Unauthorized
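The key check performed by the Lambda authorizer can be sketched as follows. This assumes an HTTP API authorizer using payload format version 2.0 with simple responses enabled (so it returns `{ isAuthorized }`), and a hypothetical `API_KEY` environment variable holding the expected key; the authorizer actually deployed by scripts/add-api-authorization.sh may store and compare keys differently.

```javascript
const crypto = require('crypto');

// Constant-time comparison so response timing doesn't reveal how many
// characters of the key matched (the key's length is still observable).
function keysMatch(provided, expected) {
  if (typeof provided !== 'string' || typeof expected !== 'string' || expected === '') {
    return false;
  }
  const a = Buffer.from(provided);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so check length first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// HTTP API payload format 2.0 lowercases header names, hence 'x-api-key'.
// API_KEY is a hypothetical env var holding the expected key.
async function handler(event) {
  const provided = event.headers?.['x-api-key'];
  return { isAuthorized: keysMatch(provided, process.env.API_KEY) };
}

module.exports = { handler, keysMatch };
```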
Health Check
GET /health
Check service status.
Response:
{
"status": "healthy",
"timestamp": "2025-12-01T10:00:00.000Z",
"service": "nexton-file-upload-service",
"version": "1.0.0"
}
Upload File
POST /upload
Upload a file with optional metadata.
Request:
- Content-Type: multipart/form-data
- Body:
  - file: File to upload (required)
  - metadata: JSON string with user metadata (optional)
Example using curl:
# Load API key
API_KEY=$(cat .api-key)
# Upload file
curl -X POST http://localhost:3000/upload \
-H "x-api-key: $API_KEY" \
-F "file=@document.pdf" \
-F 'metadata={"author":"John Doe","expirationDate":"2025-12-31","description":"Important document"}'
Example using JavaScript:
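// Browser example: assumes fileInput is an <input type="file"> element
const formData = new FormData();
formData.append('file', fileInput.files[0]);
formData.append('metadata', JSON.stringify({
  author: 'John Doe',
  expirationDate: '2025-12-31',
  description: 'Important document'
}));
const response = await fetch('http://localhost:3000/upload', {
  method: 'POST',
  headers: {
    // Don't set Content-Type yourself; fetch adds the multipart boundary
    'x-api-key': 'your-api-key-here'
  },
  body: formData
});
const result = await response.json();
if (!response.ok) {
  // 400 responses carry `error`/`details`; gateway 401s carry `message`
  throw new Error(`Upload failed (${response.status}): ${result.error || result.message}`);
}
console.log('File ID:', result.file_id);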
Success Response (201):
{
"success": true,
"file_id": "123e4567-e89b-12d3-a456-426614174000",
"message": "File uploaded successfully. Metadata extraction in progress.",
"details": {
"fileName": "document.pdf",
"fileSize": 102400,
"contentType": "application/pdf",
"uploadedAt": "2025-12-01T10:00:00.000Z",
"status": "processing"
}
}
Error Response (400):
{
"success": false,
"error": "Validation failed",
"details": [
"File size exceeds maximum allowed size of 10MB"
]
}
Retrieve Metadata
GET /metadata/:file_id
Retrieve complete metadata for a file.
Example:
# Load API key
API_KEY=$(cat .api-key)
# Get metadata
curl -H "x-api-key: $API_KEY" \
http://localhost:3000/metadata/123e4567-e89b-12d3-a456-426614174000
Success Response (200):
{
"success": true,
"file_id": "123e4567-e89b-12d3-a456-426614174000",
"metadata": {
"fileName": "document.pdf",
"contentType": "application/pdf",
"fileSize": 102400,
"uploadedAt": "2025-12-01T10:00:00.000Z",
"processedAt": "2025-12-01T10:00:05.000Z",
"status": "completed",
"userMetadata": {
"author": "john-doe",
"expirationdate": "2025-12-31",
"description": "Important document"
},
"extractedMetadata": {
"fileSize": 102400,
"fileSizeFormatted": "100 KB",
"contentType": "application/pdf",
"type": "pdf",
"numberOfPages": 5,
"pdfInfo": {
"producer": "Adobe PDF Library 15.0",
"creator": "Microsoft Word",
"title": "Document Title"
},
"textSample": "This is the beginning of the document...",
"textLength": 5420,
"hasText": true,
"extractedAt": "2025-12-01T10:00:05.000Z"
},
"s3Location": "s3://nexton-file-uploads/uploads/123e4567-e89b-12d3-a456-426614174000"
}
}
Status Values:
- processing: Lambda function is still extracting metadata
- completed: Metadata extraction completed successfully
- failed: Metadata extraction failed (see the errorMessage field)
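The derived fields in extractedMetadata (fileSizeFormatted, textSample, textLength, hasText) can be produced with small helpers like these. This is a sketch assuming 1024-based units and a hypothetical 200-character sample length; the Lambda's actual formatting rules and sample length aren't documented here.

```javascript
// Hypothetical helper: 1024-based units, rounded to at most two decimals.
function formatFileSize(bytes) {
  if (bytes === 0) return '0 Bytes';
  const units = ['Bytes', 'KB', 'MB', 'GB'];
  const i = Math.min(Math.floor(Math.log(bytes) / Math.log(1024)), units.length - 1);
  const value = Math.round((bytes / 1024 ** i) * 100) / 100;
  return `${value} ${units[i]}`;
}

// Hypothetical helper: builds textSample / textLength / hasText from
// extracted PDF text, collapsing runs of whitespace first.
function summarizeText(text, sampleLength = 200) {
  const normalized = (text || '').replace(/\s+/g, ' ').trim();
  return {
    textSample: normalized.length > sampleLength
      ? `${normalized.slice(0, sampleLength)}...`
      : normalized,
    textLength: normalized.length,
    hasText: normalized.length > 0,
  };
}
```

With these rules, `formatFileSize(102400)` yields "100 KB", consistent with the sample response.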
Error Response (404):
{
"success": false,
"error": "File not found",
"file_id": "123e4567-e89b-12d3-a456-426614174000"
}
Testing
Run Unit Tests
npm test
Run Integration Tests
Ensure the API server is running and AWS resources are deployed:
npm start # In one terminal
npm test # In another terminal
Manual Testing
- Upload a PDF:
API_KEY=$(cat .api-key)
curl -X POST http://localhost:3000/upload \
-H "x-api-key: $API_KEY" \
-F "file=@test.pdf" \
-F 'metadata={"author":"Test User"}'
- Check metadata (wait a few seconds for Lambda processing):
curl -H "x-api-key: $API_KEY" http://localhost:3000/metadata/<file_id_from_upload>
- Verify in AWS Console:
  - Check S3 bucket for uploaded file
  - Check DynamoDB table for metadata
  - Check CloudWatch Logs for Lambda execution
Security Measures I Implemented
- Authentication & Authorization
  - API Key Authentication - I secured all endpoints with a Lambda authorizer
  - IAM roles with least-privilege access
  - S3 bucket with private access (no public reads)
  - Lambda execution roles limited to specific resources
  - API keys stored securely (.api-key file excluded from git)
- Data Protection
  - S3 server-side encryption (AES-256)
  - S3 versioning enabled for data recovery
  - HTTPS-only API through API Gateway
- Input Validation
  - File type validation (MIME type + extension check)
  - File size limits (default 10MB, configurable)
  - Metadata sanitization to prevent injection attacks
  - UUID validation for file IDs
- Error Handling
  - Sanitized error messages (no sensitive data exposed)
  - Server-side logging for debugging
  - Graceful degradation on service failures
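The file checks listed under Input Validation can be sketched like this. The function names, the MIME-to-extension map, and the error strings are assumptions (the size message mirrors the 400 response shown earlier); this is not the code in src/utils/validation.js. The UUID pattern is deliberately generic, since a stricter check could also pin the version digit to 4.

```javascript
// Hypothetical validation helpers modeled on the rules above.
const EXTENSIONS_BY_MIME = new Map([
  ['application/pdf', ['.pdf']],
  ['image/jpeg', ['.jpg', '.jpeg']],
  ['image/jpg', ['.jpg', '.jpeg']], // non-standard alias some clients send
  ['image/png', ['.png']],
]);

// Generic 8-4-4-4-12 hex UUID format (any version).
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// `file` follows multer's shape: { originalname, mimetype, size }.
function validateFile(file, maxSizeMb = 10) {
  if (!file) return ['No file provided'];
  const errors = [];
  const allowedExts = EXTENSIONS_BY_MIME.get(file.mimetype);
  const name = file.originalname || '';
  const dot = name.lastIndexOf('.');
  const ext = dot >= 0 ? name.slice(dot).toLowerCase() : '';
  if (!allowedExts) {
    errors.push(`File type ${file.mimetype} is not allowed`);
  } else if (!allowedExts.includes(ext)) {
    // Extension must agree with the declared MIME type.
    errors.push(`File extension ${ext || '(none)'} does not match ${file.mimetype}`);
  }
  if (file.size > maxSizeMb * 1024 * 1024) {
    errors.push(`File size exceeds maximum allowed size of ${maxSizeMb}MB`);
  }
  return errors;
}

// Rejects anything that isn't a UUID before it reaches S3 keys or DynamoDB lookups.
function isValidFileId(id) {
  return typeof id === 'string' && UUID_PATTERN.test(id);
}
```

Returning an array of messages (rather than throwing on the first problem) lets the API report every failure at once in the `details` field.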
Monitoring & Debugging
CloudWatch Logs
- Lambda Logs: /aws/lambda/nexton-metadata-extractor
- API Logs: Configure application logging as needed
Useful AWS CLI Commands
# Check Lambda function status
aws lambda get-function --function-name nexton-metadata-extractor --region us-east-1
# View recent Lambda logs
aws logs tail /aws/lambda/nexton-metadata-extractor --follow --region us-east-1
# List files in S3 bucket
aws s3 ls s3://nexton-file-uploads/uploads/
# Query DynamoDB for file
aws dynamodb get-item \
--table-name nexton-file-metadata \
--key '{"fileId":{"S":"<your-file-id>"}}' \
--region us-east-1
# Check DynamoDB table status
aws dynamodb describe-table \
--table-name nexton-file-metadata \
--region us-east-1
Why I Chose These Technologies
Express.js
- In my opinion, it's the most straightforward framework for building APIs
- Great middleware ecosystem (I'm using multer for file uploads)
- Easy to understand and maintain
S3 for Storage
- Virtually unlimited scalability
- Built-in encryption and versioning
- Event notifications trigger my Lambda functions automatically
- Cost-effective
Lambda for Processing
- I only pay for actual compute time
- Auto-scaling without managing servers
- Perfect for event-driven architecture
DynamoDB for Metadata
- Fast response times
- Flexible schema - I can store different metadata for different file types
- Automatic scaling
API Gateway + Lambda Authorizer
- I chose this for serverless deployment
- Lambda authorizer lets me validate API keys on every request
- No servers to manage, scales automatically
Design Patterns I Used
- Event-Driven Architecture - S3 events trigger Lambda asynchronously
- Separation of Concerns - API, storage, and processing are decoupled
- Idempotency - Unique UUIDs prevent duplicate processing
- Graceful Degradation - Uploads succeed even if metadata extraction fails later; the failure is recorded in the status field instead of breaking the upload
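On the Lambda side, the event-driven step begins by pulling the bucket and object key out of the S3 notification. A minimal sketch, assuming the uploads/<file_id> key layout shown in the s3Location field; `parseS3Event` is a hypothetical name. S3 URL-encodes object keys in event notifications (spaces arrive as `+`), so they must be decoded before use.

```javascript
// Hypothetical helper for the metadata-extractor Lambda: turns an S3
// "ObjectCreated" notification into the pieces the extractor needs.
function parseS3Event(event) {
  return (event.Records || [])
    .filter((r) => r.eventSource === 'aws:s3' && r.eventName.startsWith('ObjectCreated:'))
    .map((r) => {
      // Keys arrive URL-encoded, with spaces encoded as '+'.
      const key = decodeURIComponent(r.s3.object.key.replace(/\+/g, ' '));
      return {
        bucket: r.s3.bucket.name,
        key,
        // Assumes keys look like "uploads/<file_id>".
        fileId: key.split('/').pop(),
        size: r.s3.object.size,
      };
    });
}
```

A handler would loop over the returned records, since a single invocation can carry more than one.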
Challenges I Solved
Asynchronous Processing
The Problem: Metadata extraction takes time; I didn't want clients waiting.
My Solution:
- Upload returns immediately with a file_id
- Status field indicates processing state
- Clients poll /metadata/:file_id to check completion
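The client side of this pattern can be sketched as a polling loop over GET /metadata/:file_id. It assumes Node 18+ (global `fetch`), a fixed interval, and a hypothetical attempt cap; `waitForMetadata` is not a client shipped with this project, and `fetchFn` is injectable so the loop can be exercised without a live server.

```javascript
// Hypothetical polling helper for GET /metadata/:file_id.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForMetadata(baseUrl, fileId, apiKey, {
  intervalMs = 2000,
  maxAttempts = 15,
  fetchFn = fetch, // injectable for testing
} = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchFn(`${baseUrl}/metadata/${fileId}`, {
      headers: { 'x-api-key': apiKey },
    });
    if (!res.ok) throw new Error(`Metadata request failed with status ${res.status}`);
    const body = await res.json();
    const { status } = body.metadata;
    // Stop on either terminal state; 'processing' means try again later.
    if (status === 'completed' || status === 'failed') return body.metadata;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Metadata for ${fileId} still processing after ${maxAttempts} attempts`);
}
```

Returning on `failed` as well as `completed` lets the caller inspect `errorMessage` instead of polling until the attempt cap.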
Large File Uploads
The Problem: Memory constraints with large files.
My Solution:
- I set configurable file size limits (default 10MB)
- Using multer's memory storage to send files directly to S3 (no disk I/O); the size limit caps how much memory each upload can consume
- Tuned Lambda memory allocation to 512MB
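With multer's memory storage, the uploaded bytes sit in `req.file.buffer` and can be handed straight to S3. A sketch of the PutObject input, assuming the uploads/<file_id> key layout from the s3Location field and SSE-S3 (AES-256) encryption; `buildUpload` is a hypothetical helper, not the code in src/utils/s3.js.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical helper: prepares the S3 PutObject input for a file that
// multer's memoryStorage() has already buffered into `file.buffer`.
function buildUpload(bucket, file, userMetadata = {}) {
  const fileId = randomUUID(); // RFC 4122 version 4 UUID
  const input = {
    Bucket: bucket,
    Key: `uploads/${fileId}`, // same layout as the s3Location field shown earlier
    Body: file.buffer, // bytes go straight from memory to S3, no temp file
    ContentType: file.mimetype,
    ServerSideEncryption: 'AES256', // SSE-S3
    // S3 user-metadata values must be strings, and S3 stores the keys lowercased.
    Metadata: Object.fromEntries(
      Object.entries(userMetadata).map(([k, v]) => [k.toLowerCase(), String(v)]),
    ),
  };
  return { fileId, input };
}
```

With the AWS SDK for JavaScript v3, the input would be sent with `s3Client.send(new PutObjectCommand(input))` from @aws-sdk/client-s3. A single PutObject is adequate at the 10MB limit.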
Concurrent Uploads
The Problem: Multiple uploads could cause race conditions.
My Solution:
- UUID v4 for globally unique file identifiers
- DynamoDB conditional writes prevent overwrites
- S3 object versioning for data protection
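A conditional write of this kind is expressed with a `ConditionExpression` on PutItem. The sketch below only builds the request input, so it runs without AWS credentials. It assumes the table's partition key is `fileId` (as in the get-item CLI example above), and the attribute set and `buildInitialRecordInput` name are illustrative.

```javascript
// Hypothetical builder for the initial "processing" record.
// attribute_not_exists(fileId) makes DynamoDB reject the write if an item
// with this key already exists (ConditionalCheckFailedException).
function buildInitialRecordInput(tableName, { fileId, fileName, contentType, fileSize, uploadedAt }) {
  return {
    TableName: tableName,
    Item: {
      fileId: { S: fileId },
      fileName: { S: fileName },
      contentType: { S: contentType },
      fileSize: { N: String(fileSize) }, // the low-level API sends numbers as strings
      uploadedAt: { S: uploadedAt },
      status: { S: 'processing' },
    },
    ConditionExpression: 'attribute_not_exists(fileId)',
  };
}
```

With the AWS SDK for JavaScript v3 this would be sent as `client.send(new PutItemCommand(input))` from @aws-sdk/client-dynamodb; a colliding fileId then fails loudly instead of silently overwriting the existing record.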
Security
The Problem: File uploads are a common attack vector.
My Solution:
- Strict MIME type validation
- File extension verification
- Metadata sanitization to prevent injection attacks
- S3 private access only
- IAM least-privilege principles
- API key authentication on all endpoints
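Metadata sanitization could look roughly like this. It is a sketch under assumptions: the key pattern, the key and length caps, and lowercasing keys (matching how S3 stores user metadata keys) are illustrative choices, not the rules in src/utils/validation.js. Stripping characters is a blunt instrument, so escaping values wherever they are rendered remains the primary XSS defense.

```javascript
// Hypothetical sanitizer for the optional `metadata` form field.
const MAX_KEYS = 10;
const MAX_VALUE_LENGTH = 256;
// Must start with a letter or digit, which also rules out keys like "__proto__".
const KEY_PATTERN = /^[a-z0-9][a-z0-9_-]{0,63}$/;

function sanitizeMetadata(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw ?? '{}');
  } catch {
    throw new Error('metadata must be valid JSON');
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('metadata must be a JSON object');
  }
  const entries = Object.entries(parsed);
  if (entries.length > MAX_KEYS) throw new Error(`metadata may contain at most ${MAX_KEYS} keys`);
  const clean = {};
  for (const [key, value] of entries) {
    const k = key.toLowerCase();
    // Keep only flat string/number/boolean values under safe keys.
    if (!KEY_PATTERN.test(k) || !['string', 'number', 'boolean'].includes(typeof value)) continue;
    clean[k] = String(value)
      .replace(/[\u0000-\u001f\u007f<>]/g, '') // drop control chars and angle brackets
      .trim()
      .slice(0, MAX_VALUE_LENGTH);
  }
  return clean;
}
```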
Project Scope
File Types: I'm supporting PDF, JPEG, and PNG files.
File Sizes: I set a 10MB limit, which I think is suitable for most documents.
Metadata Storage: User metadata is stored as key-value pairs in DynamoDB.
Scalability: S3, Lambda, and DynamoDB all scale automatically, so the serverless design can absorb bursts of concurrent uploads without provisioning servers, within AWS account quotas such as Lambda's per-region concurrency limit.
Data Retention: No automatic expiration - files are kept indefinitely unless manually deleted.
Geographic Distribution: Single region deployment (us-east-1).
Deployment
I'm using API Gateway + Lambda for serverless deployment:
# Deploy or update the API
bash scripts/deploy-api-gateway.sh
# Add authorization (if not configured)
bash scripts/add-api-authorization.sh
See DEPLOYMENT_STATUS.md for complete deployment documentation.
Cleanup
To delete all AWS resources:
./scripts/cleanup-aws-resources.sh
Warning: This will permanently delete all uploaded files and metadata!
Project Structure
nexton_upload/
├── src/
│ ├── api/
│ │ └── server.js # Express API server
│ ├── lambda/
│ │ ├── metadataExtractor.js # Lambda function code
│ │ └── package.json # Lambda dependencies
│ ├── utils/
│ │ ├── s3.js # S3 operations
│ │ ├── dynamodb.js # DynamoDB operations
│ │ └── validation.js # Input validation
│ └── tests/ # Unit, integration, E2E tests
├── scripts/
│ ├── setup-aws-resources.sh # Infrastructure setup
│ ├── deploy-api-gateway.sh # API Gateway deployment
│ ├── add-api-authorization.sh # Add Lambda authorizer
│ └── cleanup-aws-resources.sh # Resource cleanup
├── package.json
├── .env.example
└── README.md
Author
Leonardo da Silva Calado