Interview Questions: How the Internet Works

Practice questions with model answers for networking fundamentals commonly asked in software engineering and infrastructure interviews.

How to use this material (instructions)

Read lessons in order — README.md, then 1.1.a → 1.1.e.
Practice out loud — 60–120 seconds per question before reading the model answer.
Draw — DNS recursion, NAT, and “type a URL” timelines reward diagrams.
Pair with exercises — 1.1-Exercise-Questions.md.
Quick review — 1.1-Quick-Revision.md the night before.

Beginner (Q1–Q7)

Q1. What happens when you type a URL in the browser and press Enter?

Why interviewers ask: This is the most common “whole stack” question—it checks whether you can connect DNS, TCP/TLS, HTTP, caching, and rendering without hand-waving.

Model answer (structured walkthrough):

Parse the URL — The browser splits the string into scheme (https), host (example.com), path, query, fragment. It may use HSTS or the preload list to force HTTPS.
DNS resolution — If the hostname is not already cached (browser cache, then OS stub resolver cache, then recursive resolver cache), the recursive resolver walks the delegation chain (root → TLD → authoritative) until it gets A/AAAA records (or a CNAME that it then resolves to A/AAAA). Sites behind a CDN often answer with a CNAME to a CDN hostname, which needs its own lookup.
Pick a server IP — Often from multiple A/AAAA records (load balancing, anycast, geo).
TCP connection — SYN → SYN-ACK → ACK to the server IP on port 443 (for HTTPS). Congestion control and retransmission apply.
TLS handshake — ClientHello (ciphers, SNI), server certificate chain, key exchange, Finished. Now you have an encrypted channel; SNI lets the server pick the right virtual host/certificate.
HTTP request — e.g. GET /path HTTP/1.1 with Host:, cookies, and other headers. HTTP/2 multiplexes streams on one connection; HTTP/3 runs over QUIC, which uses UDP and integrates the TLS 1.3 handshake instead of layering TLS on top of TCP.
Server processing — CDN edge may serve from cache; origin may hit app servers, DB, etc.
HTTP response — Status, headers (Content-Type, caching headers), body. Browser may follow redirects (3xx).
Render pipeline — Parse HTML, build DOM, fetch CSS/JS (more DNS/TCP/TLS/HTTP), layout, paint. Same-origin policy and CORS govern cross-origin fetches.
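The first step of the walkthrough above (splitting the URL) can be sketched with Python's standard library. This is a minimal illustration, not how a browser implements it; the helper name `parse_url` and the `DEFAULT_PORTS` table are assumptions made for this example.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URL does not name a port explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}

def parse_url(url: str) -> dict:
    """Split a URL into the pieces the browser needs before any network I/O."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",   # an empty path is requested as "/"
        "query": parts.query,
        "fragment": parts.fragment,  # never sent to the server
    }

print(parse_url("https://example.com/search?q=dns#results"))
```

The host goes to DNS, the port decides where the TCP (or QUIC) connection is opened, and only the path and query end up in the HTTP request line.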

ASCII overview:

Browser                    Recursive DNS              Origin / CDN
   |                            |                          |
   |---- query A/AAAA --------->|                          |
   |<--- IP(s) -----------------|                          |
   |                                                       |
   |---- TCP SYN ----------------------------------------->|
   |<--- SYN-ACK ------------------------------------------|
   |---- ACK -------------------------------------------->|
   |                                                       |
   |---- TLS ClientHello --------------------------------->|
   |<--- cert, key exchange, Finished ---------------------|
   |                                                       |
   |---- HTTP GET /path ---------------------------------->|
   |<--- 200 OK + body ------------------------------------|

Follow-ups to expect: DNS cache layers, HTTP/2 vs HTTP/3, role of CDN, what changes with a SPA (still initial document + assets).

Q2. What is TCP, and how does it differ from UDP?

Model answer:

TCP (Transmission Control Protocol) is connection-oriented, reliable, and ordered. It uses handshakes, sequence numbers, acknowledgments, retransmissions, and flow/congestion control. Good when correctness and completeness matter more than minimal latency (web pages, APIs, file transfer, most “download the whole thing” workloads).
UDP (User Datagram Protocol) is connectionless and best-effort: the protocol itself has no ordering or retransmission. It has lower overhead and never stalls waiting for a lost packet at the protocol layer, which suits latency-sensitive or loss-tolerant traffic (VoIP, many games). DNS typically uses UDP for small queries, and QUIC builds its own reliability on top of UDP.

Comparison table:

Aspect	TCP	UDP
Connection	Yes (handshake)	No
Reliability	Retransmits, ordered delivery	No guarantees
Overhead	Higher (state, headers)	Lower
Use cases	HTTP (classic), SMTP, SSH	DNS (typical), QUIC base, streaming
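To make the connection vs. no-connection difference concrete, here is a small sketch that sends the same payload over UDP and over TCP on the loopback interface. The function names are made up for this example, and it assumes a machine where binding to 127.0.0.1 is allowed.

```python
import socket

def udp_roundtrip(payload: bytes) -> bytes:
    """One datagram, no handshake: sendto() just hands the packet to the network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
        receiver.settimeout(2.0)          # UDP gives no delivery guarantee
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data

def tcp_roundtrip(payload: bytes) -> bytes:
    """Connection first (the kernel performs SYN / SYN-ACK / ACK), then a byte stream."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            conn, _addr = listener.accept()
            with conn:
                conn.settimeout(2.0)
                client.sendall(payload)
                # recv() may return fewer bytes on a real stream; fine for a tiny payload.
                return conn.recv(2048)

print(udp_roundtrip(b"ping"), tcp_roundtrip(b"ping"))
```

Note that the UDP sender never learns whether the datagram arrived, while the TCP client cannot send anything until the handshake has completed.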

Q3. What is DNS?

Model answer:

DNS (Domain Name System) maps human-readable names (e.g. example.com) to machine-oriented records (most famously A and AAAA for IPv4/IPv6 addresses). It is a hierarchical, distributed database: the public tree is split into zones delegated from the root (.) → TLD (.com) → the domain's own zone (example.com). A recursive resolver follows these delegations, querying the authoritative name servers at each level, and caches each answer for the duration of its TTL. DNS is critical for almost every user-facing request on the web.
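The hierarchy described above can be illustrated by listing the names a resolver may pass through for a given hostname. This is a simplified sketch: the function name `delegation_chain` is hypothetical, and in real DNS not every label boundary is a zone cut.

```python
def delegation_chain(name: str) -> list:
    """Return the names from the root down to the full hostname."""
    labels = name.rstrip(".").split(".")
    chain = ["."]  # the root zone
    # Walk right to left: "com.", then "example.com.", then "www.example.com."
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(delegation_chain("www.example.com"))
# ['.', 'com.', 'example.com.', 'www.example.com.']
```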

Q4. What is the difference between HTTP and HTTPS?

Model answer:

HTTP sends requests and responses in plain text (at the application layer)—anyone on the path can read or alter content (MITM).
HTTPS is HTTP over TLS (historically “HTTP over SSL/TLS”). It provides encryption (confidentiality), integrity (tamper detection), and authentication of the server via X.509 certificates (and optionally mutual TLS for the client). Browsers show a secure context; HSTS can force HTTPS.
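As a small illustration of the server-authentication part, the sketch below inspects the defaults of Python's `ssl.create_default_context()`. The helper name is an assumption for this example, and no network connection is made.

```python
import ssl

def describe_default_tls_context() -> dict:
    """Report whether the default client context authenticates the server."""
    ctx = ssl.create_default_context()
    return {
        # The server must present a certificate chain that validates.
        "verifies_certificate": ctx.verify_mode == ssl.CERT_REQUIRED,
        # The certificate must also match the hostname being contacted.
        "checks_hostname": ctx.check_hostname,
    }

print(describe_default_tls_context())
```

To use such a context, wrap a connected socket with `ctx.wrap_socket(sock, server_hostname="example.com")`; the `server_hostname` value is sent as SNI and checked against the certificate.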

Q5. What is the difference between IPv4 and IPv6?

Model answer:

IPv4 uses 32-bit addresses (e.g. 203.0.113.10), widely deployed, NAT-heavy in practice due to address exhaustion.
IPv6 uses 128-bit addresses (e.g. 2001:db8::1), a vastly larger space, a simpler fixed-length base header, and no need for NAT to conserve addresses (though NAT64 and NPTv6 exist for transition and edge scenarios). Coexistence is common: dual stack, tunneling, or translation.

Aspect	IPv4	IPv6
Address size	32 bits	128 bits
Notation	Dotted decimal	Hex groups with `::` compression
Typical LAN practice	Private RFC1918 + NAT	Global or ULA + different edge patterns
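The size and notation differences in the table can be checked with Python's `ipaddress` module. This is a minimal sketch; the `describe` helper is a name invented for this example.

```python
import ipaddress

def describe(addr: str) -> dict:
    """Summarize an IPv4 or IPv6 address using the standard library."""
    ip = ipaddress.ip_address(addr)
    return {
        "version": ip.version,
        "bits": ip.max_prefixlen,              # 32 for IPv4, 128 for IPv6
        "full_form": ip.exploded,              # undoes the "::" compression
        "address_space": 2 ** ip.max_prefixlen,
    }

print(describe("203.0.113.10"))
print(describe("2001:db8::1"))
```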

Q6. What is the difference between a MAC address and an IP address?

Model answer:

MAC (Media Access Control) address: Layer 2 identifier on a local link (Ethernet/Wi‑Fi). Usually flat and vendor-assigned (burned in or configured); used for switching on the same broadcast domain. Not routable across the whole Internet.
IP address: Layer 3 identifier for end-to-end (or hop-by-hop) routing across networks. Hierarchical (network prefix + host); routers forward based on longest-prefix match.

Analogy: IP is like a postal address for a building anywhere in the world; MAC is like an apartment number on the local hallway, meaningful only on that segment. When a router forwards the packet to another network, the frame gets new source and destination MACs for the next link.
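Longest-prefix match, mentioned above as the rule routers use to forward IP packets, can be sketched in a few lines. The routing table and next-hop names below are hypothetical, and a real router would use a trie or hardware lookup rather than a linear scan.

```python
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical routing table: prefix -> next hop.
ROUTES = {
    "0.0.0.0/0": "isp-gateway",      # default route
    "10.0.0.0/8": "core-router",
    "10.1.2.0/24": "branch-router",
}

def next_hop(dest: str, routes: dict = ROUTES) -> Optional[str]:
    """Pick the matching route with the longest (most specific) prefix."""
    addr = ip_address(dest)
    best = None
    for prefix, hop in routes.items():
        net = ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None

print(next_hop("10.1.2.7"))      # branch-router (the /24 beats the /8 and the default)
print(next_hop("198.51.100.1"))  # isp-gateway (only the default route matches)
```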

Q7. What is the OSI model? How does it map to TCP/IP?

Model answer:

The OSI model is a 7-layer reference model for networking:

Layer	Name	Examples / concepts
7	Application	HTTP, DNS, SMTP (as “application protocols”)
6	Presentation	Encryption/encoding (often folded into app in practice)
5	Session	Session management (often folded into app/TLS)
4	Transport	TCP, UDP
3	Network	IP, ICMP, routing
2	Data link	Ethernet frames, MAC, switches
1	Physical	Bits on wire/fiber/radio

TCP/IP is often described as 4 layers: Link, Internet, Transport, Application—many OSI layers collapse into the Application layer in real stacks.

Interview tip: Interviewers often want to hear where something lives (e.g. “TLS sits above L4 in the OSI teaching model, but in practice it is a library layered directly on the transport socket”).

Intermediate (Q8–Q13)

Q8. What is NAT, and why do we use it?

Model answer:

NAT (Network Address Translation) maps private IP addresses inside a network to one or more public IPs on the outside (typical: NAPT/PAT maps (private IP, local port) ↔ (public IP, public port)). Reasons: IPv4 address conservation, simple topology hiding, and easy home/office sharing of one public IP.

Trade-offs: Breaks end-to-end connectivity unless you add port forwarding, UPnP, NAT traversal, or IPv6. Stateful NAT is a single point of complexity for debugging (timeouts, ALGs).

Tiny diagram:

[ Laptop 10.0.0.5:4444 ] -----> [ NAT router ] -----> Internet 203.0.113.1:60001
[ Phone  10.0.0.12:5555 ] ----> [ 10.0.0.1   ] -----> same public IP, different mapped port
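The mapping in the diagram can be modeled as a pair of lookup tables. This is a toy sketch of NAPT state only: the class name `Napt` and the sequential port allocation are assumptions for illustration, and real devices also track protocol, remote endpoint, and timeouts.

```python
class Napt:
    """Toy port-translating NAT: many private endpoints share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 60001):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (private_ip, private_port) -> public_port
        self.back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, private_ip: str, private_port: int):
        """Translate an outgoing packet's source, creating a mapping if needed."""
        key = (private_ip, private_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.out[key])

    def inbound(self, public_port: int):
        """Translate a reply back; None means no mapping, so the packet is dropped."""
        return self.back.get(public_port)

nat = Napt("203.0.113.1")
print(nat.outbound("10.0.0.5", 4444))   # ('203.0.113.1', 60001)
print(nat.outbound("10.0.0.12", 5555))  # ('203.0.113.1', 60002)
print(nat.inbound(60001))               # ('10.0.0.5', 4444)
print(nat.inbound(61000))               # None: unsolicited inbound traffic has no mapping
```

The last line shows why NAT breaks end-to-end connectivity: without an existing mapping (or a port-forwarding rule), the router has no idea which inside host an inbound packet is for.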

Q9. What is a CDN, and how does it speed up the web?

Model answer:

A CDN (Content Delivery Network) places caches and edge servers close to users (geographically and in terms of network path). Static assets (and sometimes dynamic or personalized content via advanced patterns) are served from an edge PoP using:

Shorter RTT to the user
Offload from origin
Anycast or DNS-based routing to a “good” edge
TLS termination at the edge

Headers like Cache-Control, ETag, and CDN-specific controls govern freshness.
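A rough sketch of how a cache might decide freshness from `Cache-Control` follows. It is deliberately simplified: it only looks at `max-age`, `no-store`, and `no-cache`, and ignores `s-maxage`, `Expires`, and heuristic freshness. The function names are invented for this example.

```python
def max_age(cache_control: str):
    """Return the max-age value in seconds, or None if the directive is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None

def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """True if a cached copy of this age may be served without contacting the origin."""
    directives = {d.strip().lower() for d in cache_control.split(",")}
    if "no-store" in directives or "no-cache" in directives:
        return False  # must not be stored, or must be revalidated first
    limit = max_age(cache_control)
    return limit is not None and age_seconds < limit

print(is_fresh("public, max-age=3600", 120))   # True
print(is_fresh("public, max-age=3600", 4000))  # False
```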

Q10. Walk through the DNS resolution process in detail.

Model answer:

Stub resolver (OS/library) checks local cache (browser cache, OS cache).
If miss, query a recursive resolver (e.g. ISP, 1.1.1.1, 8.8.8.8).
Recursive resolver checks its cache; if miss, it acts as a full resolver:
- Query a root server: “who is .com?” → referral to TLD servers.
- Query TLD servers: “who is example.com?” → referral to authoritative servers for that zone.
- Query authoritative servers for the exact name (may follow CNAME chains within limits).
Return final answer (A/AAAA, etc.) to stub; each record has a TTL for caching.

Note: DNSSEC adds a chain of trust; DoH/DoT only changes transport to the resolver, not the recursive→authoritative logic.
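The steps above can be simulated with an in-memory toy resolver. Everything here is hypothetical (the `SERVERS` data, the server names, and the `ToyResolver` class); it only models referrals and TTL-based caching, not the DNS wire protocol.

```python
import time

# Hypothetical "servers": each maps a name suffix to a referral or a final answer.
SERVERS = {
    "root": {"com.": ("referral", "tld-com")},
    "tld-com": {"example.com.": ("referral", "ns-example")},
    "ns-example": {"www.example.com.": ("answer", "203.0.113.10", 300)},
}

class ToyResolver:
    def __init__(self, clock=time.monotonic):
        self.cache = {}          # name -> (address, expiry time)
        self.clock = clock
        self.queries_sent = 0

    def resolve(self, name: str):
        hit = self.cache.get(name)
        if hit and hit[1] > self.clock():
            return hit[0]        # cache hit: no upstream queries at all
        server = "root"
        while True:
            self.queries_sent += 1
            record = self._best_match(SERVERS[server], name)
            if record is None:
                return None      # nothing known for this name
            if record[0] == "referral":
                server = record[1]   # follow the delegation one level down
                continue
            _kind, address, ttl = record
            self.cache[name] = (address, self.clock() + ttl)
            return address

    @staticmethod
    def _best_match(zone: dict, name: str):
        matches = [k for k in zone if name == k or name.endswith("." + k)]
        return zone[max(matches, key=len)] if matches else None

resolver = ToyResolver()
print(resolver.resolve("www.example.com."), resolver.queries_sent)  # 3 queries: root, TLD, authoritative
print(resolver.resolve("www.example.com."), resolver.queries_sent)  # still 3: served from cache
```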

Q11. Explain common HTTP status codes. What is the difference between 401 and 403?

Model answer:

Code	Meaning	Typical use
200	OK	Success with body
204	No Content	Success, no body
301	Moved Permanently	URL changed forever (SEO)
302	Found	Temporary redirect (historically abused)
304	Not Modified	Conditional GET satisfied
400	Bad Request	Malformed client request
401	Unauthorized	Authentication required or failed (“who are you?”)
403	Forbidden	Authenticated (or not), but not allowed (“I know who you are; you can’t do this”)
404	Not Found	Resource doesn’t exist (or hidden)
429	Too Many Requests	Rate limiting
500	Internal Server Error	Server bug/unhandled error
502	Bad Gateway	Upstream invalid response
503	Service Unavailable	Overload/maintenance
504	Gateway Timeout	Upstream too slow

401 vs 403 trick (what good candidates say):

RFC semantics: 401 = authentication problem; response should include WWW-Authenticate. 403 = authorization/refusal; server understands the request but refuses it.
Reality: APIs and frameworks blur lines; some use 403 for “not logged in” (bad for pedants but common). Strong answer: prefer 401 when credentials are missing/invalid, 403 when identity is known (or anonymous is fine) but policy forbids the action.
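The recommended rule can be written as a tiny decision function. This is a sketch with invented names (`access_status`, `allowed`); a real service would also attach a `WWW-Authenticate` header to the 401.

```python
from typing import Optional

def access_status(user: Optional[str], allowed: set) -> int:
    """user is None when credentials are missing or failed validation."""
    if user is None:
        return 401  # "who are you?": authenticate and retry
    if user not in allowed:
        return 403  # "I know who you are; you can't do this"
    return 200

print(access_status(None, {"alice"}))     # 401
print(access_status("bob", {"alice"}))    # 403
print(access_status("alice", {"alice"}))  # 200
```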

Q12. What is the difference between a hub, a switch, and a router?

Model answer:

Device	Layer (typical)	Behavior
Hub	1 (physical/repeater)	Repeats every incoming signal out of all other ports; all ports share one collision domain; obsolete for Ethernet.
Switch	2	Learns MACs, forwards frames to the correct port; separates collision domains; same broadcast domain per VLAN.
Router	3	Forwards packets by IP prefix; connects different IP subnets; does not forward broadcast; NAT/firewall often here.
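The switch row (learn MACs, then forward to the right port) can be sketched as follows. The class name, port numbers, and shortened MAC strings are made up for illustration; VLANs and table aging are left out.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningSwitch:
    """Toy L2 switch: learns source MACs, floods unknown or broadcast destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list:
        """Return the list of ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port          # learn where the sender lives
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            return [p for p in self.ports if p != in_port]  # flood
        if out_port == in_port:
            return []                               # destination is on the same port
        return [out_port]                           # forward to the one known port

sw = LearningSwitch([1, 2, 3])
print(sw.receive(1, "aa", "bb"))  # [2, 3]: "bb" is unknown, so flood
print(sw.receive(2, "bb", "aa"))  # [1]: "aa" was learned on port 1
print(sw.receive(1, "aa", "bb"))  # [2]: "bb" is now known
```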

Q13. What is BGP, and what is a BGP hijack?

Model answer:

BGP (Border Gateway Protocol) is the path-vector protocol the Internet uses for inter-domain routing between autonomous systems (ASes). Routers advertise IP prefixes together with AS paths; route selection is driven by local policy first (e.g. local preference), then by tie-breakers such as the shortest AS path.

BGP hijack: A misconfiguration or malicious AS announces prefixes it does not legitimately own (or more-specific prefixes), attracting traffic. Mitigations include RPKI (Route Origin Authorization), IRR, monitoring, and coordination—none are perfect without widespread adoption and operations discipline.
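Route origin validation against ROAs can be sketched as below. The ROA data and AS numbers are hypothetical (taken from documentation ranges), and the function is a simplified reading of the valid / invalid / not-found outcomes, not a full RPKI validator.

```python
from ipaddress import ip_network

# Hypothetical ROAs: (prefix, max prefix length, authorized origin AS).
ROAS = [("203.0.113.0/24", 24, 64500)]

def validate_origin(prefix: str, origin_as: int, roas=ROAS) -> str:
    """Classify a BGP announcement as 'valid', 'invalid', or 'not-found'."""
    route = ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_as in roas:
        roa_net = ip_network(roa_prefix)
        if route.version == roa_net.version and route.subnet_of(roa_net):
            covered = True  # some ROA speaks for this address space
            if roa_as == origin_as and route.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "not-found"

print(validate_origin("203.0.113.0/24", 64500))    # valid: the legitimate origin
print(validate_origin("203.0.113.0/24", 64501))    # invalid: wrong origin AS
print(validate_origin("203.0.113.128/25", 64501))  # invalid: more-specific hijack attempt
print(validate_origin("192.0.2.0/24", 64501))      # not-found: no ROA covers it
```

The "not-found" case is why adoption matters: a prefix without any ROA cannot be protected by origin validation at all.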

Advanced (Q14–Q18)

Q14. Compare DNSSEC, DoH, and DoT.

Model answer:

Mechanism	What it solves	Transport / placement
DNSSEC	Data integrity & authenticity of DNS answers (signed records, chain of trust); not encryption of queries by itself	Classic DNS (UDP/TCP 53) with signatures
DoH (DNS over HTTPS)	Encrypts stub→resolver traffic inside HTTPS; hides DNS from some intermediaries; easy to reuse HTTP infra	HTTPS (often port 443)
DoT (DNS over TLS)	Same privacy goal as DoH for stub→resolver	Dedicated TCP 853 with TLS

Key distinction: DNSSEC authenticates DNS data. DoH/DoT protect the channel to the resolver. They are complementary. None of them hides your queries from the recursive resolver (or from the authoritative servers it contacts) unless you add oblivious/OHTTP-style designs.
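To show that DoH changes only the transport, the sketch below builds an ordinary DNS query in wire format and places it in a DoH GET URL (base64url without padding in the `dns` parameter). The resolver URL `https://dns.example/dns-query` is a placeholder, and no request is actually sent.

```python
import base64
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query message (qtype 1 = A record)."""
    # Header: ID=0, flags=0x0100 (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    return header + question

def doh_get_url(resolver_url: str, name: str) -> str:
    """Encode the same wire-format query for a DoH GET request."""
    wire = build_dns_query(name)
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"

print(doh_get_url("https://dns.example/dns-query", "example.com"))
```

The bytes inside the URL are the same ones a classic resolver would receive over UDP port 53; only the envelope (HTTPS) differs.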

Q15. What is anycast, and how do CDNs use it?

Model answer:

Anycast advertises the same IP prefix from multiple locations. Routing (BGP) sends packets to the topologically nearest (by ISP’s routing policy) site. CDNs and DNS use anycast so users hit a nearby edge automatically without application-level geo logic for that first hop.

Contrast: Unicast = one owner per IP; multicast = one-to-many (different use case); broadcast = everyone on L2.

Q16. How would you design a system for 100 million users?

Model answer (interview-style, not a single “right” architecture):

Clarify read/write ratio, latency, consistency, data locality, and compliance. Then outline layers:

Edge: CDN for static assets; WAF/DDoS protection; TLS at edge.
API tier: Stateless services behind load balancers; horizontal scaling; auto-scaling on CPU/latency/queue depth; multi-AZ for HA.
Data: Sharding by user/tenant/geo; read replicas and caching (Redis/Memcached); CQRS if reads dominate; event-driven decoupling (Kafka, etc.).
Consistency: Choose strong vs eventual per feature; idempotency keys; sagas/outbox for distributed workflows.
Identity & traffic: OAuth/OIDC; rate limits; feature flags.
Observability: Metrics, traces, logs; SLOs and error budgets.
Multi-region: Active-active or active-passive; global load balancing; data replication latency trade-offs.

Diagram (logical):

Users -> CDN -> LB -> App pods (stateless) -> Cache
                              |
                              v
                         Primary DB / shards
                              |
                              v
                         Async workers / search index
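Sharding by user, from the data layer above, can be sketched with a stable hash. The function name and shard count are assumptions for this example; note that simple modulo hashing remaps most keys when the shard count changes, which is why production systems often use consistent hashing or a lookup directory instead.

```python
import hashlib

def shard_for(user_id: str, shard_count: int) -> int:
    """Map a user ID to a shard index that is stable across processes and restarts."""
    # Python's built-in hash() is randomized per process for strings, so use SHA-256.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

for uid in ("user-1", "user-2", "user-3"):
    print(uid, "-> shard", shard_for(uid, 4))
```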

Q17. Deep dive: “What happens when you type a URL?” — focus on caching and QUIC.

Model answer:

DNS: Hits may be served from the browser cache, OS cache, or recursive resolver cache. Each layer keeps a record until its TTL expires, so a changed record can still be served from cache until then.
TCP/TLS: Session resumption (tickets/session IDs) reduces handshake cost.
HTTP: Conditional requests (ETag, If-None-Match), Cache-Control, CDN edge caches; Service Worker caches for PWAs.
HTTP/3 / QUIC: UDP-based, encryption integrated, 0-RTT resumption possible (with replay considerations), connection migration—changes the story from “TCP then TLS” to “QUIC handshake.”
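The conditional-request part of the HTTP caching story can be sketched as a toy origin handler. The names `make_etag` and `handle_get` are invented, the ETag scheme (a truncated SHA-256 of the body) is just one possible choice, and the comparison ignores weak validators and multi-value `If-None-Match` headers.

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Derive a validator from the content; any change to the body changes the ETag."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def handle_get(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""   # client copy is still good: send no body
    return 200, {"ETag": etag, "Cache-Control": "max-age=60"}, body

status, headers, payload = handle_get(b"hello")
print(status, headers["ETag"])                               # 200 on the first fetch
print(handle_get(b"hello", headers["ETag"])[0])               # 304 on revalidation
print(handle_get(b"hello, world", headers["ETag"])[0])        # 200 after the content changed
```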

Q18. How does TLS relate to the OSI layers, and where does encryption sit?

Model answer:

Pedagogically, TLS is often placed between L4 and L7: it provides transport security for application protocols. In practice, HTTP speaks over a TLS-protected byte stream; the same socket carries encrypted records. IPsec is an alternative at L3 for VPNs. Interviewers care that you don’t claim “encryption is only L7” without nuance.

Quick-Fire (Yes / No + one line)

Question	Answer	One-line rationale
Can two devices on the same LAN have the same IP?	No (not if they need stable IP communication on that subnet)	IP clash causes ARP/route confusion and broken connectivity.
Does your MAC address change at each router hop?	No, but the frame's MACs do	Your NIC's MAC stays the same, but it never leaves the local segment: each router rewrites the L2 header with the source and destination MACs of the next link.
Is DNS always UDP?	No	Truncated or large responses (common with DNSSEC) are retried over TCP; DoT and DoH run over TLS.
Does HTTPS hide the hostname from your ISP?	Mostly no (traditional TLS)	SNI has historically been sent in cleartext, and the destination IP is visible anyway; Encrypted Client Hello (ECH, the successor to ESNI) hides the name where deployed.
Is a switch enough to connect two different IP subnets?	No	A plain L2 switch does not route IP; you need a router (or an L3 switch) to forward between subnets.
Does NAT provide security by itself?	Not really	It’s not a firewall; stateful firewalling is separate (often combined in home routers).
Can UDP be “reliable”?	Yes, at app layer	Apps or protocols like QUIC add ACKs/retransmits atop UDP.
Is 403 always “logged in but forbidden”?	No	Some APIs misuse 403 for unauthenticated users; RFC-wise prefer 401 for auth failure.

Interview Tips

Start high-level, then drill down. Offer a skeleton (“DNS → TCP → TLS → HTTP → render”), then ask if they want depth on one hop.
Name the caches at each layer (DNS TTL, TCP connection reuse, HTTP cache, CDN, browser).
Use precise vocabulary: authoritative vs recursive resolver, SYN flood vs congestion, TLS vs “SSL” (say TLS).
Admit trade-offs. NAT breaks E2E; BGP trusts operators; DNSSEC deployment friction; QUIC 0-RTT replay.
401 vs 403: State the RFC distinction, then acknowledge real-world inconsistency—shows maturity.
System design: Clarify requirements before drawing boxes; cite scalability patterns (sharding, cache-aside, async) tied to data access patterns.
Draw timelines when explaining handshakes—interviewers love sequence over prose walls.

Use this doc alongside hands-on practice: packet captures (tcpdump, Wireshark), dig, curl -v, and a simple local proxy to see headers and TLS in action.