Design principles, versioning, pagination, rate limiting, caching, and API gateway patterns.
Welcome everyone to this deep dive on API architecture. Today we're going to explore the design principles, patterns, and tradeoffs that go into building production-grade APIs. Whether you're designing your first API or refactoring an existing one, understanding these fundamentals will help you make better architectural decisions. We'll cover everything from REST maturity levels to caching strategies, authentication patterns, and how to choose between REST and GraphQL. Let's dive in.
Every well-designed API, regardless of protocol or framework, follows these four core principles. Looking at the cards on this slide, first we have consistency. This means using the same naming conventions, error formats, and pagination everywhere. If your users endpoint returns data and meta, your orders endpoint should too. Next is discoverability. Clients should be able to navigate your API without constantly reading docs. Use HATEOAS links and predictable URL structures. Third, backwards compatibility. Never break existing clients. You can add new fields freely, but never remove or rename them without bumping the version. And finally, least surprise. When I send a DELETE request to users slash forty-two, it should delete user forty-two, not do something unexpected. These principles form the foundation of good API design.
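The consistency principle above is easiest to see in code. Here is a minimal sketch of shared response builders; the function names and envelope fields (`data`, `meta`, `error`) are illustrative, not from the slides.

```javascript
// Hypothetical response builders enforcing one envelope everywhere:
// every collection endpoint returns { data, meta }, and errors share one shape.
function collectionResponse(items, page, limit, total) {
  return {
    data: items,
    meta: { page, limit, total },
  };
}

function errorResponse(code, message) {
  return { error: { code, message } };
}

// GET /users and GET /orders both use the same builders,
// so clients can reuse one parsing path for both.
const users = collectionResponse([{ id: 42, name: 'Ada' }], 1, 20, 1);
const orders = collectionResponse([{ id: 7, total: 99.5 }], 1, 20, 1);
```

Centralizing the envelope in helpers like these is one way to keep new endpoints from drifting toward their own formats.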
Not all REST APIs are created equal. Leonard Richardson defined four levels of maturity. Looking at the table, Level zero is what he calls the swamp of POX, plain old XML. This is essentially RPC-style with a single endpoint where the action is in the request body. Level one introduces resources, so you have multiple endpoints but still just one HTTP method. Level two adds proper HTTP verbs. Now you're using GET for retrieval, POST for creation, DELETE for deletion. This is where most production APIs land. Level three adds hypermedia, or HATEOAS, where responses include navigation links. GitHub's API is a well-known Level three example. While Level three adds discoverability, it also increases response size and complexity. Most teams find Level two hits the sweet spot between RESTful principles and practical implementation.
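To make Level three concrete, here is a sketch of a hypermedia-enriched response. The `_links` field naming follows a common HAL-style convention; it is an illustration, not GitHub's actual format.

```javascript
// Sketch of a Level 3 (HATEOAS) representation: the resource carries
// navigation links, so clients discover related actions instead of
// hard-coding URLs.
function withLinks(user) {
  const base = `/users/${user.id}`;
  return {
    ...user,
    _links: {
      self:   { href: base },
      orders: { href: `${base}/orders` },
      delete: { href: base, method: 'DELETE' },
    },
  };
}

const resource = withLinks({ id: 42, name: 'Ada' });
// resource._links.self.href === '/users/42'
```

The extra `_links` object is exactly the response-size cost mentioned above; whether the discoverability is worth it depends on how generic your clients are.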
Breaking changes are inevitable, so you need to choose a versioning strategy early. Looking at the table, we have four common approaches. URL path versioning puts the version right in the path, like slash v2 slash users. Query parameter versioning adds it as a query string. Header versioning uses the Accept header. And content negotiation uses media types. Each has tradeoffs. URL path versioning is the most explicit and visible in logs, but it duplicates your entire route tree. Query parameters are easy to forget and hard to enforce. Headers are clean but invisible, making them harder to test. The recommendation here is URL path versioning. It's explicit, easy to route at the gateway level, and most widely understood by API consumers. You'll see it in logs, monitoring tools, and it's trivial for clients to specify.
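A small sketch of why URL path versioning is easy to route at the gateway level: the version can be parsed straight out of the path. `parseVersion` and `route` are hypothetical helpers, not part of any framework.

```javascript
// Extract the major version from a /vN/ path prefix, or null if absent.
function parseVersion(path) {
  const match = path.match(/^\/v(\d+)\//);
  return match ? Number(match[1]) : null;
}

// A gateway can then dispatch on the parsed version:
function route(path, handlers) {
  const version = parseVersion(path);
  const handler = handlers[version];
  if (!handler) throw new Error(`Unsupported API version in ${path}`);
  return handler(path);
}

const handlers = {
  1: () => 'legacy users handler',
  2: () => 'current users handler',
};
// route('/v2/users', handlers) → 'current users handler'
```

Header or query-parameter versions need the same dispatch logic, but the version is no longer visible in access logs or shareable URLs.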
Let's look at three common pagination approaches, each with distinct tradeoffs. Looking at this code, first we have offset-based pagination. This is the simplest approach with page and limit parameters. The problem is it can skip or duplicate items when there are concurrent writes, because inserts and deletes shift the offsets between page requests. Second is cursor-based pagination. The client receives an opaque token, usually base64 encoded, that points to the last item. This is stable even when rows are inserted or deleted between pages. Third is keyset-based pagination, which uses a sortable field like a created timestamp. It's very fast on large datasets. The recommendation here is cursor-based pagination as your default. It handles concurrent modifications gracefully, works well for infinite scroll feeds, and the opaque cursor gives you flexibility to change the underlying implementation without breaking clients.
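The slide's code isn't reproduced here, so this is a self-contained sketch of the cursor-based approach over an in-memory, id-ordered list. The cursor format and helper names are illustrative; a real implementation would query a database with a `WHERE id > ?` clause instead of filtering an array.

```javascript
// The cursor is an opaque base64 token wrapping the last-seen id.
function encodeCursor(id) {
  return Buffer.from(JSON.stringify({ lastId: id })).toString('base64');
}

function decodeCursor(cursor) {
  return JSON.parse(Buffer.from(cursor, 'base64').toString('utf8')).lastId;
}

function paginate(rows, { cursor = null, limit = 2 } = {}) {
  const lastId = cursor === null ? -Infinity : decodeCursor(cursor);
  // Take everything strictly after the cursor position, up to the limit.
  const page = rows.filter((r) => r.id > lastId).slice(0, limit);
  // Only hand out a next cursor when the page was full.
  const nextCursor =
    page.length === limit ? encodeCursor(page[page.length - 1].id) : null;
  return { data: page, nextCursor };
}

const rows = [{ id: 1 }, { id: 2 }, { id: 3 }];
const first = paginate(rows, { limit: 2 });                            // ids 1, 2
const second = paginate(rows, { cursor: first.nextCursor, limit: 2 }); // id 3
```

Because the cursor encodes a position rather than an offset, inserting a new row with id 4 between the two calls cannot shift or duplicate what the second page returns.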
Rate limiting protects your API from abuse and ensures fair usage across all clients. Looking at this code, we're setting up two rate limiters with Express. The global limiter allows one hundred requests per fifteen-minute window. The auth limiter is much stricter, only five requests per window, and it only counts failed attempts. This prevents brute force attacks on login endpoints. The key here is that standard headers are enabled, so clients receive the draft-standard RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset headers with every response; the X-RateLimit-prefixed variants are the legacy versions of these headers. Well-behaved clients can see when they're approaching the limit and back off proactively. The message field defines what users see when they hit the limit. This pattern is essential for any production API.
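The slide's Express middleware isn't reproduced here; as a self-contained stand-in, here is a minimal fixed-window limiter with the same windowMs and max knobs, reporting the same header values a middleware like express-rate-limit would attach. Class and field names are illustrative.

```javascript
// Minimal fixed-window rate limiter: one counter per key per window.
class FixedWindowLimiter {
  constructor({ windowMs, max }) {
    this.windowMs = windowMs;
    this.max = max;
    this.hits = new Map(); // key → { count, windowStart }
  }

  check(key, now = Date.now()) {
    let entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      entry = { count: 0, windowStart: now }; // start a fresh window
      this.hits.set(key, entry);
    }
    entry.count += 1;
    return {
      allowed: entry.count <= this.max,
      // Draft-standard header values a middleware would set:
      headers: {
        'RateLimit-Limit': this.max,
        'RateLimit-Remaining': Math.max(0, this.max - entry.count),
        'RateLimit-Reset': Math.ceil(
          (entry.windowStart + this.windowMs - now) / 1000
        ),
      },
    };
  }
}

// Strict limiter for login attempts: 5 per 15 minutes per client key.
const authLimiter = new FixedWindowLimiter({ windowMs: 15 * 60 * 1000, max: 5 });
```

A production limiter would keep these counters in Redis rather than process memory so the limit holds across multiple API instances.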
Caching at the right layer eliminates redundant work and dramatically reduces latency. Looking at the table, we have four caching layers. At the CDN or edge, you use Cache-Control headers with TTLs in minutes to hours. This works great for public, rarely-changing data like a product catalog. The API gateway can cache responses by route and parameters for seconds to minutes. The application layer uses Redis or Memcached for database query results. And finally, the client can use ETags. Looking at the code example, we're implementing ETag-based caching. We hash the response data to generate an ETag, then check if the client's If-None-Match header matches. If it does, we return a three-oh-four Not Modified status with no body, saving bandwidth. If not, we send the full response with the current ETag. This pattern is perfect for configuration endpoints or other data that clients fetch repeatedly.
An API gateway is a single entry point that handles cross-cutting concerns so your backend services don't have to. Looking at the diagram, clients hit the gateway, which then handles authentication, rate limiting, and response caching before routing requests to the appropriate service. Looking at the cards below, the gateway handles request routing, so version-based routing lives here. It offloads authentication, validating JWTs once so backend services receive a trusted user context. And it can do request aggregation, combining multiple backend calls into a single client response. This is the backend-for-frontend pattern. The gateway reduces client round-trips and keeps your microservices focused on business logic rather than infrastructure concerns. Popular gateway implementations include Kong, AWS API Gateway, and Envoy.
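Here is a sketch of the request-aggregation idea: the gateway fans out to several services in parallel and returns one combined response. The service calls are stubbed in-memory for illustration; real ones would be HTTP calls to the backends.

```javascript
// Backend-for-frontend style aggregation: one client request,
// three parallel backend calls, one combined response.
async function getUserDashboard(userId, services) {
  const [profile, orders, notifications] = await Promise.all([
    services.users(userId),
    services.orders(userId),
    services.notifications(userId),
  ]);
  return { profile, orders, notifications };
}

// Stub services standing in for real backends:
const services = {
  users: async (id) => ({ id, name: 'Ada' }),
  orders: async (id) => [{ id: 7, userId: id }],
  notifications: async () => [],
};
```

Because the three calls run concurrently, the client pays roughly the latency of the slowest backend rather than the sum of all three round-trips.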
Authentication answers who are you, while authorization answers what can you do. These are separate concerns. Looking at the table, we have four common patterns. API keys are simple static keys in headers, great for server-to-server communication. JWT bearer tokens are signed tokens with claims, perfect for user-facing apps and microservices. OAuth two-point-oh provides delegated access for third-party integrations. And session cookies work well for traditional web apps. Looking at the code example, we're implementing role-based access control with JWT. The authorize middleware checks if a user is authenticated, then verifies they have one of the required roles. Usage is simple, just wrap your route handler with authorize and pass the allowed roles. This keeps authorization logic centralized and declarative rather than scattered throughout your codebase.
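The slide's middleware isn't reproduced here, so this is a framework-agnostic sketch of the authorize pattern described: req, res, and next follow the Express middleware shape, and req.user is assumed to have been attached by an earlier JWT-verification step.

```javascript
// Role-based access control middleware: authentication check first,
// then role check, then hand off to the route handler.
function authorize(...allowedRoles) {
  return (req, res, next) => {
    if (!req.user) {
      return res.status(401).json({ error: 'Unauthenticated' });
    }
    if (!allowedRoles.some((role) => req.user.roles.includes(role))) {
      return res.status(403).json({ error: 'Forbidden' });
    }
    next();
  };
}

// Usage with an Express-style router (illustrative):
// router.delete('/users/:id', authorize('admin'), deleteUserHandler);
```

Keeping the 401-versus-403 distinction inside this one function is what makes the authorization logic declarative: routes state which roles they need and never inspect the token themselves.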
OpenAPI, formerly known as Swagger, is the industry standard for describing REST APIs. The best practice is to write the spec first, then generate code from it. Looking at this YAML example, we're defining a users endpoint with a GET operation. The spec describes the role query parameter with its allowed values, the response structure including the data array and pagination metadata, and references to schema components. The power of OpenAPI is what you can generate from this single source of truth. You can auto-generate server stubs in any language, client SDKs for TypeScript, Python, or Go, validation middleware that enforces the schema, and interactive documentation with Swagger UI or Redoc. This spec-first approach ensures your documentation never drifts from your implementation.
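The YAML from the slide isn't reproduced here; the following is a hedged reconstruction of the kind of spec described, with the role query parameter, the data array, and pagination metadata. Paths, enum values, and schema names are illustrative.

```yaml
openapi: 3.0.3
info:
  title: Example API
  version: 2.0.0
paths:
  /users:
    get:
      summary: List users
      parameters:
        - name: role
          in: query
          schema:
            type: string
            enum: [admin, editor, viewer]
      responses:
        '200':
          description: A paginated list of users
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: '#/components/schemas/User'
                  meta:
                    $ref: '#/components/schemas/PaginationMeta'
components:
  schemas:
    User:
      type: object
      properties:
        id: { type: integer }
        name: { type: string }
        role: { type: string }
    PaginationMeta:
      type: object
      properties:
        nextCursor: { type: string, nullable: true }
```

Everything downstream, including server stubs, client SDKs, validation middleware, and Swagger UI, is generated from this one file, which is the point of writing it first.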
You can't improve what you can't measure, so instrument these four signals from day one. Looking at the cards, first track request metrics including request count, latency percentiles at P50, P95, and P99, error rate, and status code distribution per endpoint. Second, implement distributed tracing. Propagate a trace ID across all services so you can see the full request journey and identify which service was slow or where errors originated. Third, use structured logging. Log JSON with request ID, HTTP method, path, status code, duration, and user ID. This enables correlation across services. And fourth, set up alerting. Alert on error rate above one percent, P99 latency above five hundred milliseconds, or any spike in five-hundred-level errors. Use SLOs to define what's acceptable and alert when you breach them. These four signals give you the visibility you need to debug issues and optimize performance.
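To make the structured-logging signal concrete, here is a sketch of a one-JSON-object-per-request log line carrying the fields listed above. The field names are illustrative; the essential property is that every service logs the same requestId so lines can be correlated.

```javascript
// Build a single structured log line for a completed request.
function requestLog({ requestId, method, path, status, durationMs, userId }) {
  return JSON.stringify({
    ts: new Date().toISOString(),
    level: status >= 500 ? 'error' : 'info', // 5xx lines surface in alerts
    requestId,
    method,
    path,
    status,
    durationMs,
    userId,
  });
}

const line = requestLog({
  requestId: 'req-123',
  method: 'GET',
  path: '/v2/users',
  status: 200,
  durationMs: 42,
  userId: 'u-7',
});
// line is one JSON string, ready for a log shipper to index.
```

Because every line is machine-parseable, queries like "all lines for req-123 across all services, sorted by timestamp" fall out for free, which is most of what you need when chasing a slow or failing request.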
The GraphQL versus REST debate isn't religious, it's contextual. Each excels in different scenarios. Looking at the table, if you have few, known clients like a mobile app and web app, REST works great. But if you have many diverse clients with different data needs, GraphQL shines. For predictable, resource-oriented data, use REST. For deeply nested, relationship-heavy data, GraphQL is better. If you need HTTP caching at the CDN or proxy level, REST is your friend. If client-side caching is sufficient, GraphQL works well. For backend-driven development with a mature ecosystem, REST has universal support. For frontend-driven, rapid iteration, GraphQL excels, especially in TypeScript and React ecosystems. The key insight is you can run both. Many companies use REST for public APIs and simple CRUD, while using GraphQL internally where frontends need flexible data fetching. A hybrid approach lets you choose the right tool for each use case.