Episode 3 — NodeJS MongoDB Backend Architecture / 3.7 — Handling Files with Express

Interview Questions: Handling Files with Express (Episode 3)

How to use this material (instructions)

  1. Read 3.7.a through 3.7.f.
  2. Answer aloud, then compare below.
  3. Pair with 3.7-Exercise-Questions.md.

Beginner Level

Q1: What is multipart/form-data and why is it required for file uploads?

Why interviewers ask: Tests understanding of HTTP encoding types and why standard body parsers fail.

Model answer:

multipart/form-data is an HTTP media type (sent in the Content-Type header) that splits the request body into multiple parts, separated by a boundary string. Each part can carry either text or binary data with its own Content-Type header. It is required for file uploads because the other two common encodings -- application/json (parsed by express.json()) and application/x-www-form-urlencoded (parsed by express.urlencoded()) -- can only carry text data. Binary file content must be sent as a separate part with its own metadata. In HTML forms, you enable this by setting enctype="multipart/form-data" on the <form> tag. Without it, the browser sends the file input's value as a plain filename string rather than the actual file data.
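Outside HTML forms, the same encoding is produced programmatically with FormData. A minimal sketch (the /api/upload endpoint and field names are hypothetical):

```javascript
// Build a multipart body by hand; FormData and Blob are globals in
// browsers and in Node 18+.
const fileBlob = new Blob([Uint8Array.from([0xff, 0xd8, 0xff])], { type: 'image/jpeg' });

const form = new FormData();
form.append('avatar', fileBlob, 'photo.jpg'); // binary part with its own metadata
form.append('caption', 'hello');              // plain-text part

// To send it, pass the FormData as the body; the runtime picks a boundary
// and sets "Content-Type: multipart/form-data; boundary=..." itself --
// never set that header manually:
// await fetch('/api/upload', { method: 'POST', body: form });
```

Note that setting Content-Type yourself would omit the boundary and break parsing on the server.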


Q2: What is Multer and how does it fit into the Express middleware chain?

Why interviewers ask: Tests knowledge of the primary file upload library in Express.

Model answer:

Multer is an Express middleware built on top of the busboy streaming parser. It intercepts multipart/form-data requests, parses the boundary-separated parts, extracts text fields into req.body, and extracts files into req.file (single upload) or req.files (multiple uploads). Multer is applied as route-level middleware, not globally, because only specific routes need file handling. It provides four upload methods: single(fieldName) for one file, array(fieldName, maxCount) for multiple same-field files, fields([...]) for multiple named fields, and none() for text-only multipart forms. Multer also accepts a configuration object for storage engine, file size limits, and a file filter function.
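A wiring sketch of the four upload methods as route-level middleware (route paths, field names, and the uploads/ destination are illustrative, not prescribed):

```javascript
const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer({ dest: 'uploads/' }); // disk storage with auto-generated names

// one file from the "avatar" field -> req.file; text fields -> req.body
app.post('/profile', upload.single('avatar'), (req, res) => {
  res.json({ file: req.file.filename, fields: req.body });
});

// up to 5 files from the "photos" field -> req.files (array)
app.post('/gallery', upload.array('photos', 5), (req, res) => {
  res.json({ count: req.files.length });
});

// multiple named fields -> req.files (object keyed by field name)
app.post('/listing', upload.fields([
  { name: 'cover', maxCount: 1 },
  { name: 'images', maxCount: 8 },
]), (req, res) => res.sendStatus(201));

// text-only multipart form -> req.body, any file part is rejected
app.post('/survey', upload.none(), (req, res) => res.json(req.body));

app.listen(3000);
```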


Q3: What is the difference between Multer's disk storage and memory storage?

Why interviewers ask: Tests understanding of storage trade-offs in file handling.

Model answer:

Disk storage (multer.diskStorage()) streams the file directly to the filesystem. You configure a destination function (where to save) and a filename function (what to name the file). The uploaded file is available via req.file.path and req.file.filename. It is suitable when you need to keep files on the server permanently or process them with tools like Sharp or ffmpeg.

Memory storage (multer.memoryStorage()) holds the entire file as a Buffer in RAM at req.file.buffer. No file is written to disk. This is ideal when you plan to immediately upload the buffer to a cloud service like Cloudinary, or when performing in-memory validation (magic bytes check). The risk is memory exhaustion if many large files are uploaded concurrently -- always pair memory storage with strict limits.fileSize.
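The two storage engines side by side, as a configuration sketch (destination path, filename scheme, and the 2 MB limit are example choices):

```javascript
const multer = require('multer');
const path = require('node:path');
const crypto = require('node:crypto');

// Disk storage: you control where the file goes and what it is called.
const disk = multer.diskStorage({
  destination: (req, file, cb) => cb(null, 'uploads/'),
  // random UUID + original extension avoids collisions and path tricks
  filename: (req, file, cb) =>
    cb(null, crypto.randomUUID() + path.extname(file.originalname)),
});

// Memory storage: file arrives as req.file.buffer; always cap the size
// so concurrent large uploads cannot exhaust RAM.
const memory = multer.memoryStorage();

const uploadToDisk = multer({ storage: disk });
const uploadToRam = multer({
  storage: memory,
  limits: { fileSize: 2 * 1024 * 1024 }, // 2 MB, example limit
});
```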


Q4: How do you handle Multer errors (e.g., file too large) in Express?

Why interviewers ask: Tests error-handling patterns in Express middleware.

Model answer:

Multer throws MulterError instances for known issues like LIMIT_FILE_SIZE, LIMIT_UNEXPECTED_FILE, and LIMIT_FILE_COUNT. The fileFilter callback can also throw custom errors. Both must be caught by an Express error-handling middleware (four parameters: err, req, res, next). You check if (err instanceof multer.MulterError) to identify Multer-specific errors, then switch on err.code to return appropriate messages. Custom errors from fileFilter are regular Error instances. A production-grade handler distinguishes between Multer errors, file-filter errors, and unexpected server errors, returning 400 for client mistakes and 500 for server issues.
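A sketch of such a handler (messages and status codes are illustrative; register it after the routes that use Multer):

```javascript
const multer = require('multer');

// Express recognizes error handlers by their four-parameter signature.
function uploadErrorHandler(err, req, res, next) {
  if (err instanceof multer.MulterError) {
    const messages = {
      LIMIT_FILE_SIZE: 'File exceeds the size limit',
      LIMIT_FILE_COUNT: 'Too many files',
      LIMIT_UNEXPECTED_FILE: 'Unexpected field name',
    };
    return res.status(400).json({ error: messages[err.code] ?? err.message });
  }
  if (err) {
    // custom errors thrown from fileFilter arrive here as plain Errors
    return res.status(400).json({ error: err.message });
  }
  next();
}

// app.use(uploadErrorHandler);  // after all upload routes
```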


Intermediate Level

Q5: How do you validate that an uploaded file is actually the type it claims to be?

Why interviewers ask: Tests security awareness -- MIME spoofing is a real attack vector.

Model answer:

File validation should happen at three levels. Level 1: Check the file extension in the fileFilter callback. This is the weakest check because extensions are trivially renamed. Level 2: Check file.mimetype in fileFilter. This is better but still spoofable -- the MIME type is sent by the client and can be forged. Level 3 (most reliable): After the file is received (in memory or on disk), read the first few bytes -- the magic bytes or file signature -- and compare them against known patterns. For example, JPEG files start with FF D8 FF, PNG files start with 89 50 4E 47. The file-type npm package automates this by reading the buffer and returning the detected MIME type. If the detected type does not match the claimed type, reject the file and delete it from disk if it was already saved.
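The Level 3 check can be sketched without a library by comparing the buffer's first bytes against a small signature table (only JPEG and PNG are covered here; the file-type package handles many more formats):

```javascript
// Known file signatures ("magic bytes") at offset 0.
const SIGNATURES = [
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47] },
];

// Returns the detected MIME type, or null if no signature matches.
function detectMime(buffer) {
  const match = SIGNATURES.find((sig) =>
    sig.bytes.every((b, i) => buffer[i] === b));
  return match ? match.mime : null;
}

// A file claiming image/png but starting with an EXE header ("MZ")
// is detected as unknown and should be rejected:
const fake = Buffer.from([0x4d, 0x5a, 0x90, 0x00]);
console.log(detectMime(fake)); // null -> reject

const png = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a]);
console.log(detectMime(png)); // 'image/png'
```

Compare the detected type against file.mimetype and reject on mismatch.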


Q6: Explain the production upload pipeline: client to cloud to database.

Why interviewers ask: Tests understanding of real-world architecture for file handling.

Model answer:

The standard production pipeline has seven steps. (1) The client sends the file as multipart/form-data via a POST or PUT request. (2) Multer with memory storage receives the file as a Buffer in RAM -- no disk write. (3) Server-side validation runs: MIME type check, magic bytes verification, and file size enforcement. (4) The buffer is streamed to a cloud service like Cloudinary using upload_stream(). (5) The cloud service processes the image (resize, optimize, CDN cache) and returns a response containing secure_url and public_id. (6) The application stores only the URL and public_id in the database (e.g., MongoDB via Mongoose) -- never the file itself. (7) The API returns the cloud URL to the client, which uses it directly for rendering. This architecture scales across multiple servers, survives redeployments, and leverages CDN caching.
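Steps 4-6 can be sketched as follows, assuming Multer memory storage has already placed the validated file in req.file.buffer (folder name and the database update are illustrative):

```javascript
const cloudinary = require('cloudinary').v2;
const { Readable } = require('node:stream');

// Wrap Cloudinary's callback-style upload_stream in a Promise and pipe
// the in-memory buffer into it -- no temp file on disk.
function uploadBuffer(buffer) {
  return new Promise((resolve, reject) => {
    const stream = cloudinary.uploader.upload_stream(
      { folder: 'avatars' }, // example folder
      (err, result) => (err ? reject(err) : resolve(result)),
    );
    Readable.from(buffer).pipe(stream);
  });
}

// In the route handler (User model and req.userId are assumptions):
// const { secure_url, public_id } = await uploadBuffer(req.file.buffer);
// await User.updateOne({ _id: req.userId },
//   { avatarUrl: secure_url, avatarId: public_id }); // store only URL + id
// res.json({ url: secure_url });
```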


Q7: When would you use multer-storage-cloudinary vs the manual buffer-to-Cloudinary approach?

Why interviewers ask: Tests practical decision-making between convenience and control.

Model answer:

multer-storage-cloudinary is a Multer storage engine that uploads directly to Cloudinary during the parsing phase. The file goes straight from the incoming stream to Cloudinary's API -- you configure it once, and req.file.path contains the Cloudinary URL. It is simpler to set up and produces cleaner route handlers.

The manual approach uses memory storage, receives the buffer, then calls cloudinary.uploader.upload_stream() in the route handler. This gives more control: you can validate magic bytes on the buffer before uploading, apply conditional transformations, choose different cloud folders based on request data, retry on failure, or upload to multiple services.

Rule of thumb: Use multer-storage-cloudinary for simple projects where you trust the fileFilter for validation. Use the manual approach when you need magic-bytes validation, conditional logic, or multi-cloud support.


Q8: What are Cloudinary URL-based transformations and why are they valuable?

Why interviewers ask: Tests knowledge of cloud image optimization patterns.

Model answer:

Cloudinary allows you to transform images by inserting parameters into the URL path between /upload/ and the public_id. For example, w_200,h_200,c_fill,g_face resizes to 200x200, fills the frame, and centers on the detected face. q_auto,f_auto lets Cloudinary pick the optimal quality level and format (WebP for Chrome, JPEG for Safari). These transformations are generated on-the-fly on the first request and then cached on the CDN globally. The key advantage is that you upload the original high-resolution image once and generate any number of variants purely by changing the URL -- no re-upload, no server-side processing, no storage of multiple sizes. This is critical for responsive design where different devices need different image dimensions.
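The URL manipulation is plain string work; a hypothetical helper (not part of the Cloudinary SDK) makes the pattern concrete:

```javascript
// Insert transformation parameters between /upload/ and the version/public_id.
function transformedUrl(originalUrl, params) {
  return originalUrl.replace('/upload/', `/upload/${params}/`);
}

const original =
  'https://res.cloudinary.com/demo/image/upload/v1234/avatars/user42.jpg';

console.log(transformedUrl(original, 'w_200,h_200,c_fill,g_face,q_auto,f_auto'));
// https://res.cloudinary.com/demo/image/upload/w_200,h_200,c_fill,g_face,q_auto,f_auto/v1234/avatars/user42.jpg
```

The same original asset yields any number of variants by varying the params string; nothing is re-uploaded.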


Advanced Level

Q9: How would you handle file upload for a horizontally scaled application behind a load balancer?

Why interviewers ask: Tests architecture thinking at scale.

Model answer:

In a horizontally scaled setup, requests hit different server instances via a load balancer. If files are stored on local disk, a file uploaded to Server A is invisible to Server B -- subsequent requests for that file may 404. The solution is to never store files on the application server.

Option 1 (recommended): Use memory storage and upload directly to a cloud service (Cloudinary, S3). All servers reference the same cloud URLs. No shared filesystem needed.

Option 2: Use a shared filesystem (NFS, EFS on AWS). All servers mount the same volume. Adds complexity and a single point of failure.

Option 3: Use object storage (S3) with presigned URLs. The client uploads directly to S3, bypassing the application server entirely. The server generates a presigned URL, the client uploads to it, then notifies the server of the S3 key. This eliminates the load balancer problem and offloads bandwidth from the application tier.

Additionally, enable sticky sessions or use a token-based scheme if any intermediate processing requires the same server to handle multiple steps of an upload flow.
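Option 3's server-side half can be sketched with the AWS SDK v3 (bucket name, key scheme, and expiry are assumptions):

```javascript
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const { getSignedUrl } = require('@aws-sdk/s3-request-presigner');

const s3 = new S3Client({ region: 'us-east-1' });

// Returns a short-lived URL the client can PUT the file to directly,
// bypassing the application server.
async function presignUpload(key, contentType) {
  const cmd = new PutObjectCommand({
    Bucket: 'my-app-uploads', // assumption: bucket already exists
    Key: key,
    ContentType: contentType,
  });
  return getSignedUrl(s3, cmd, { expiresIn: 300 }); // valid for 5 minutes
}

// Flow: client asks GET /uploads/presign -> { url, key }; client PUTs the
// file to url; client then POSTs the key back so the server can record it.
```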


Q10: What security risks exist with file uploads and how do you mitigate each one?

Why interviewers ask: Tests security depth -- file uploads are one of the highest-risk features.

Model answer:

| Risk | Description | Mitigation |
| --- | --- | --- |
| Remote Code Execution | Attacker uploads a .php or .js file and accesses its URL to execute it | Store files outside the web root; never serve the uploads/ folder directly; use cloud storage |
| MIME spoofing | Attacker renames malware.exe to photo.jpg | Validate magic bytes, not just MIME type or extension |
| Path traversal | A filename like ../../etc/passwd overwrites system files | Sanitize filenames; use path.basename(); generate UUIDs |
| Denial of Service | Uploading huge files exhausts disk or memory | Set limits.fileSize and limits.files; use disk storage for large files |
| Zip bomb | A small compressed file expands to gigabytes | Limit decompressed size; scan archives before extraction |
| Stored XSS | SVG file containing <script> tags | Serve user files with Content-Disposition: attachment, or validate SVG content |
| Unrestricted file types | Accepting any file type widens the attack surface | Whitelist allowed MIME types and extensions in fileFilter |

Defense-in-depth: combine all these layers. No single check is sufficient.


Q11: Compare Cloudinary, ImageKit, and AWS S3 for file storage in a Node.js application.

Why interviewers ask: Tests awareness of the cloud storage ecosystem and trade-off analysis.

Model answer:

| Criteria | Cloudinary | ImageKit | AWS S3 + CloudFront |
| --- | --- | --- | --- |
| Built-in transformations | Yes (URL-based) | Yes (URL-based) | No (need Lambda@Edge or imgproxy) |
| CDN | Included (Akamai) | Included (CloudFront) | CloudFront (separate service) |
| Free tier | 25 GB storage, 25 GB bandwidth | 20 GB storage, 20 GB bandwidth | 5 GB S3, limited CloudFront |
| Pricing at scale | Can be expensive | Moderate | Cheapest for raw storage |
| SDK quality | Excellent (Node, Python, etc.) | Good | Excellent (AWS SDK) |
| Video support | Yes (encoding, streaming) | Yes (basic) | Yes (with MediaConvert) |
| Complexity | Low (all-in-one) | Low (all-in-one) | High (assemble multiple services) |
| Best for | Small-to-medium, image-heavy apps | Image-heavy, budget-conscious apps | Large-scale, cost-sensitive, custom pipelines |

For most Express projects starting out, Cloudinary or ImageKit is the pragmatic choice. S3 becomes more cost-effective at very high volume or when you need custom processing pipelines.


Q12: How would you implement resumable (chunked) file uploads in an Express application?

Why interviewers ask: Tests knowledge of handling large file uploads beyond basic Multer.

Model answer:

Standard Multer uploads are all-or-nothing -- if the connection drops at 90%, the entire upload is lost. For large files (video, datasets), resumable uploads are essential.

Approach 1: tus protocol. Use the tus-node-server package, which implements the open tus protocol. The client splits the file into chunks and sends each with an offset header. The server persists each chunk. If the connection drops, the client queries the server for the last-received offset and resumes from there. Libraries like tus-js-client handle the client side.

Approach 2: Presigned multipart upload to S3. The server generates presigned URLs for each chunk (using S3's multipart upload API). The client uploads chunks directly to S3 in parallel. When all chunks are uploaded, the server calls CompleteMultipartUpload. This offloads bandwidth and processing from the application server entirely.

Approach 3: Custom chunked endpoint. The client splits the file and sends POST /upload/chunk with chunk index and total count. The server writes each chunk to a temp directory. After the final chunk, the server concatenates them, validates, and moves to permanent storage. This requires managing state (which chunks have arrived) and cleanup of incomplete uploads.

In all approaches, the key challenges are: tracking upload state, handling out-of-order chunks, cleaning up stale incomplete uploads, and providing progress feedback to the client.
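The state-tracking challenge from Approach 3 can be sketched as a small in-memory tracker (in production this state would live in Redis or a database, not process memory):

```javascript
// Track which chunks of one upload have arrived; chunks may come out of order.
class ChunkTracker {
  constructor(totalChunks) {
    this.total = totalChunks;
    this.received = new Set();
  }
  receive(index) {
    if (index < 0 || index >= this.total) throw new Error('Bad chunk index');
    this.received.add(index);
    return this.isComplete();
  }
  isComplete() {
    return this.received.size === this.total;
  }
  // Lets a resuming client ask which chunks to resend after a dropped connection.
  missing() {
    const out = [];
    for (let i = 0; i < this.total; i++) {
      if (!this.received.has(i)) out.push(i);
    }
    return out;
  }
}

const t = new ChunkTracker(3);
t.receive(2);
t.receive(0);
console.log(t.missing()); // [1]
console.log(t.receive(1)); // true -> safe to concatenate temp files now
```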


Quick-fire

| # | Question | One-line answer |
| --- | --- | --- |
| 1 | What Content-Type triggers Multer? | multipart/form-data |
| 2 | HTML attribute for file forms | enctype="multipart/form-data" |
| 3 | Single-file property on req | req.file |
| 4 | Multi-file property on req | req.files |
| 5 | Disk storage: file location property | req.file.path |
| 6 | Memory storage: file data property | req.file.buffer |
| 7 | Error class for Multer errors | multer.MulterError |
| 8 | Error code for oversized files | LIMIT_FILE_SIZE |
| 9 | JPEG magic bytes | FF D8 FF |
| 10 | PNG magic bytes | 89 50 4E 47 |
| 11 | Delete from Cloudinary | cloudinary.uploader.destroy(publicId) |
| 12 | Best storage for cloud pipeline | Memory storage (no disk writes) |

<- Back to 3.7 -- Handling Files with Express (README)