Performance

Web servers (programs) are expected to serve requests quickly and to handle more than one TCP/IP connection at a time.

The main performance parameters, measured under a varying load of clients and requests per client, are:

- number of requests per second (depending on the type of request, etc.);
- latency (response time) in milliseconds for each new connection or request;
- throughput in bytes per second (depending on file size, cached or not cached content, available network bandwidth, etc.).
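
As a rough illustration of how the three parameters above relate to raw measurements, the following sketch derives them from a list of per-request timings. The `Sample` record and the `summarize` function are hypothetical names introduced only for this example, and the mean is used as the latency figure purely for simplicity; real load testing tools usually also report percentiles.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One completed request (hypothetical record used only in this sketch)."""
    start: float      # seconds, when the request was sent
    end: float        # seconds, when the full response was received
    body_bytes: int   # size of the response body in bytes


def summarize(samples):
    """Return (requests per second, mean latency in ms, throughput in bytes/s)."""
    if not samples:
        raise ValueError("need at least one sample")
    # Measurement window: from the first request sent to the last response received.
    window = max(s.end for s in samples) - min(s.start for s in samples)
    if window <= 0:
        raise ValueError("samples must span a positive time window")
    rps = len(samples) / window
    mean_latency_ms = 1000 * sum(s.end - s.start for s in samples) / len(samples)
    throughput = sum(s.body_bytes for s in samples) / window
    return rps, mean_latency_ms, throughput


# Four made-up requests spread over a two-second window.
samples = [
    Sample(0.0, 0.5, 1000),
    Sample(0.0, 1.0, 3000),
    Sample(1.0, 2.0, 2000),
    Sample(1.5, 2.0, 2000),
]
rps, latency_ms, throughput = summarize(samples)
print(rps, latency_ms, throughput)  # 2.0 750.0 4000.0
```

Note that requests per second and throughput are computed over the whole measurement window, while latency is a per-request quantity, which is why the three figures can move independently of one another.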

These three parameters vary noticeably with the number of active connections, so a fourth parameter is the concurrency level supported by a web server under a specific configuration.

Finally, the specific server model used to implement a web server program can affect the performance and scalability that can be reached under heavy load or on high-end hardware (many CPUs, disks, etc.).

The performance of a web server is typically measured using an automated load testing tool.
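
To make the idea of an automated load test concrete, here is a minimal sketch that uses only the Python standard library. It is not a substitute for a real load testing tool: the local test server, the `fetch` and `run_load_test` helpers, and the default values for concurrency and request count are all assumptions chosen for illustration.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BODY = b"hello\n"

# Opener that ignores any proxy settings, so requests go straight to localhost.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class Handler(BaseHTTPRequestHandler):
    """Tiny target server that answers every GET with a fixed body."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def fetch(url):
    """Issue one GET and return (status, latency in seconds, bytes read)."""
    start = time.perf_counter()
    with OPENER.open(url, timeout=5) as resp:
        data = resp.read()
        status = resp.status
    return status, time.perf_counter() - start, len(data)


def run_load_test(concurrency=4, total_requests=100):
    """Send total_requests GETs using `concurrency` client threads."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    try:
        began = time.perf_counter()
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            results = list(pool.map(fetch, [url] * total_requests))
        elapsed = time.perf_counter() - began
    finally:
        server.shutdown()
        server.server_close()
    return results, elapsed


results, elapsed = run_load_test()
latencies = sorted(lat for _, lat, _ in results)
print(f"requests/s: {len(results) / elapsed:.0f}")
print(f"median latency: {1000 * latencies[len(latencies) // 2]:.2f} ms")
print(f"throughput: {sum(n for _, _, n in results) / elapsed:.0f} bytes/s")
```

Running the sketch again with different `concurrency` values gives a feel for how the reported figures change with the number of simultaneous clients. The absolute numbers depend entirely on the machine and say nothing about a production server.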

A web server (program) has defined load limits: it can handle only a limited number of concurrent client connections per IP address and TCP port (usually between 2 and 60,000; by default between 500 and 1,000), and it can serve only a certain maximum number of requests per second, depending on:

- its own settings;
- the HTTP request type;
- content origin (static or dynamic);
- whether the served content is cached;
- the hardware and software limits of the OS on which it runs.
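
The limit on concurrent connections can be pictured as a simple counter that admits new connections only while capacity remains. The sketch below is one possible way to model such a cap, not how any particular web server implements it; `ConnectionLimiter` and its methods are hypothetical names, and what happens to a refused connection is left to the caller.

```python
import threading


class ConnectionLimiter:
    """Tracks active connections against a fixed maximum (hypothetical helper)."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self._active = 0
        self._lock = threading.Lock()  # the counter may be touched by many threads

    def try_acquire(self):
        """Admit a new connection if capacity remains; return True on success."""
        with self._lock:
            if self._active >= self.max_connections:
                return False
            self._active += 1
            return True

    def release(self):
        """Mark one admitted connection as closed."""
        with self._lock:
            if self._active == 0:
                raise RuntimeError("release() without a matching try_acquire()")
            self._active -= 1


limiter = ConnectionLimiter(max_connections=2)
admitted = [limiter.try_acquire() for _ in range(3)]
print(admitted)  # [True, True, False]

limiter.release()                 # one client disconnects
readmitted = limiter.try_acquire()
print(readmitted)  # True
```

The maximum of 2 is used only to keep the example short; it corresponds to the configurable connection limit described above.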

When a web server is near or over its limits, it becomes overloaded and may become unresponsive.