HAProxy TCP/HTTP Load Balancer

[[#What is HAProxy?]]
[[#Load Balancer Concepts]]
[[#Round Robin Load Balancing]]
[[#HAProxy Pros and Cons]]
[[#Load Balancer Architecture with Reverse Proxy]]
[[#SSL Termination]]
[[#Access Control Lists (ACLs)]]
[[#HAProxy Setup Guide]]
[[#Example Configuration]]

What is HAProxy?

HAProxy (High Availability Proxy) is a free, open-source load balancing and proxying solution for TCP and HTTP-based applications. It is particularly suited for high traffic websites and provides reliability, high performance, and protection against common overload scenarios.

HAProxy has become the standard for many large-scale infrastructures including GitHub, Twitter, Reddit, Stack Overflow, and many others due to its performance, reliability, and feature set.

Load Balancer Concepts

A load balancer distributes network traffic across multiple servers to ensure no single server becomes overwhelmed, improving the reliability and capacity of applications.

Key Load Balancing Concepts:

Load Distribution: Spreading requests across multiple servers
Health Checks: Monitoring backend servers to ensure they're responding properly
Failover: Automatically routing traffic away from failed servers
Session Persistence: Ensuring a client's requests go to the same server (also called sticky sessions)
Layer 4 vs Layer 7 Load Balancing:
- Layer 4 (Transport): Works at the TCP/UDP level, faster but less feature-rich
- Layer 7 (Application): Works at the HTTP level, more intelligent routing but more resource-intensive

Load Balancing Algorithms:

Round Robin: Requests are distributed sequentially across servers
Least Connections: Routes to the server with the fewest active connections
IP Hash: Uses client IP address to determine which server receives the request
URI Hash: Routes based on the requested URI
Weighted Algorithms: Administrators can assign different weights to servers based on capacity

Round Robin Load Balancing

Round Robin is one of the most common load balancing algorithms used in HAProxy and other load balancers.

How Round Robin Works:

Requests are distributed to each server in sequence
After reaching the last server, the algorithm starts again from the first server
Each server receives roughly the same number of requests

Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1 (and the cycle continues)

Weighted Round Robin:

HAProxy supports weighted round robin, allowing administrators to handle servers with different capacities:

backend web_servers
    balance roundrobin
    server web1 10.0.0.1:80 weight 3
    server web2 10.0.0.2:80 weight 1

In this example, web1 will receive 3 requests for every 1 request sent to web2.

Round Robin Advantages:

Simple to implement and understand
Works well when servers have similar capabilities
Good for stateless applications

Round Robin Limitations:

Doesn't account for varying request processing times
Doesn't consider current server load
May not be ideal for sessions requiring persistence

HAProxy Pros and Cons

Pros:

Performance:
- Extremely fast and efficient
- Can handle tens of thousands of connections with minimal resources
- Low latency
Reliability:
- Battle-tested in high-traffic environments
- Stable across versions
- Excellent in production environments
Features:
- Advanced load balancing algorithms
- Detailed health checking
- Content switching and ACLs
- SSL termination
- Detailed statistics
Security:
- DDoS mitigation
- Rate limiting
- Connection limiting
- ACL-based security controls
Monitoring:
- Built-in stats page
- Extensive logging capabilities
- Integration with monitoring tools

Cons:

Learning Curve:
- Configuration syntax can be complex for beginners
- Advanced features require deeper understanding
No Built-in Web GUI:
- Configuration is primarily text-based
- Third-party GUIs exist but aren't part of the core project
Limited HTTP Caching:
- Not designed as a full-featured caching proxy
- NGINX may be better for caching static content
No Traffic Scripting:
- Limited ability to modify requests/responses compared to some alternatives
- No native scripting language (though there is Lua support in newer versions)
No Native Service Discovery:
- Requires additional tools for dynamic scaling in cloud environments
- Configuration changes require reload

Load Balancer Architecture with Reverse Proxy

A common architecture pattern is to deploy HAProxy as a load balancer in front of a cluster of reverse proxies (like NGINX).

Typical Flow:

[Clients] ↔ [HAProxy Load Balancer] ↔ [NGINX Reverse Proxies] ↔ [Application Servers]

Advantages of This Architecture:

Separation of Concerns:
- HAProxy focuses on load distribution and high availability
- NGINX handles content caching, compression, and complex HTTP processing
Defense in Depth:
- Multiple layers of security and protection
- Can isolate different types of traffic
Specialized Optimization:
- Configure HAProxy for optimal load balancing
- Configure NGINX for optimal HTTP processing
Scalability:
- Can scale reverse proxies and application servers independently
- Can handle more concurrent connections

Example Deployment Architecture:

Internet
   ↓
[Firewall]
   ↓
[HAProxy Load Balancer Cluster]
   ↓
[NGINX Reverse Proxy Pool]
   ↓
[Application Server Cluster]
   ↓
[Database Cluster]

Configuration Considerations:

X-Forwarded Headers: Ensure proper headers are passed through the proxies to maintain client information
Health Checks: Configure health checks at each layer
SSL: Decide where SSL termination occurs (typically at the HAProxy layer)
Session Persistence: Ensure sticky sessions work through all layers if needed

SSL Termination

SSL Termination is an important feature of HAProxy, allowing it to handle encrypted traffic efficiently.

What is SSL Termination?

SSL termination is the process of decrypting incoming HTTPS traffic at the load balancer, allowing backend servers to receive unencrypted HTTP requests.

[Client] ←(HTTPS)→ [HAProxy] ←(HTTP)→ [Backend Servers]

Benefits of SSL Termination:

Performance:
- Reduces CPU load on backend servers
- Centralizes encryption/decryption processing
- Improves overall system throughput
Certificate Management:
- SSL certificates managed in one place
- Easier certificate renewals
- Simplified TLS version and cipher management
Security Features:
- Centralized TLS policy enforcement
- Better monitoring of encrypted traffic
- Can add security headers to all responses

Implementation in HAProxy:

frontend https-in
    bind *:443 ssl crt /etc/ssl/certs/combined.pem
    mode http
    http-response set-header Strict-Transport-Security "max-age=31536000"
    default_backend app-servers

backend app-servers
    mode http
    server app1 10.0.0.1:80 check
    server app2 10.0.0.2:80 check

The combined.pem file contains both the certificate and private key.

Access Control Lists (ACLs)

Access Control Lists (ACLs) in HAProxy are powerful tools for creating rules based on various conditions.

ACL Syntax:

acl NAME CRITERION VALUE

Common ACL Types:

# Path-based
acl is_api path_beg /api
acl is_static path_end .jpg .png .gif

# Host-based
acl is_example hdr(host) -i example.com

# Method-based
acl is_post method POST

# Source IP-based
acl internal_network src 10.0.0.0/8

# Custom header-based
acl auth_token hdr(Authorization) -m found

Using ACLs:

frontend http-in
    bind *:80

    # Block specific IPs
    acl blocked_ips src 192.168.1.100
    http-request deny if blocked_ips

    # Route API requests to API servers
    acl is_api path_beg /api
    use_backend api_servers if is_api

    # Rate limiting
    acl too_many_requests sc0_gpc0_rate() gt 100
    http-request deny if too_many_requests

    # Default backend
    default_backend web_servers

ACL Use Cases:

Traffic Routing: Direct traffic to different backends based on criteria
Access Control: Restrict access based on IP, headers, etc.
Rate Limiting: Prevent abuse by limiting request rates
Request Filtering: Block malicious requests
Content Switching: Route based on content type or URL patterns
A/B Testing: Direct percentage of traffic to different backends

HAProxy Setup Guide

This guide covers basic HAProxy installation and configuration on a Linux system.

Installation:

Debian/Ubuntu:

sudo apt update
sudo apt install haproxy

CentOS/RHEL:

sudo yum install haproxy

Verify Installation:

haproxy -v

Basic Configuration:

Edit the configuration file:
```
sudo vi /etc/haproxy/haproxy.cfg
```

Basic configuration structure:

global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000
    timeout client  50000
    timeout server  50000

frontend http-in
    bind *:80
    default_backend servers

backend servers
    balance roundrobin
    server server1 192.168.1.10:80 check
    server server2 192.168.1.11:80 check

Check configuration syntax:

sudo haproxy -c -f /etc/haproxy/haproxy.cfg

Start/Restart HAProxy:
```
sudo systemctl restart haproxy
```
Enable at boot:
```
sudo systemctl enable haproxy
```

Setting Up SSL Termination:

Prepare SSL certificate:

cat /path/to/your_cert.pem /path/to/your_key.pem > /etc/ssl/private/combined.pem
chmod 600 /etc/ssl/private/combined.pem

Configure HAProxy for SSL:

frontend https-in
    bind *:443 ssl crt /etc/ssl/private/combined.pem
    http-request set-header X-Forwarded-Proto https
    default_backend servers

Setting Up Health Checks:

backend servers
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    server server1 192.168.1.10:80 check inter 5s fall 3 rise 2
    server server2 192.168.1.11:80 check inter 5s fall 3 rise 2

This checks the /health endpoint every 5 seconds. The server is considered down after 3 consecutive failures and back up after 2 consecutive successful checks.

Enabling Statistics Page:

listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:your_password

Access the stats page at http://your_server_ip:8404/stats

Example Configuration

Here's a comprehensive HAProxy configuration example with common features:

global
    log /dev/log local0
    log /dev/log local1 notice
    maxconn 50000
    user haproxy
    group haproxy
    daemon
    # SSL settings
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000
    timeout client  50000
    timeout server  50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

# HTTPS frontend with SSL termination
frontend www-https
    bind *:443 ssl crt /etc/ssl/certs/mydomain.pem
    mode http
    option forwardfor
    # HTTP security headers
    http-response set-header Strict-Transport-Security "max-age=31536000"
    http-response set-header X-Content-Type-Options nosniff
    http-response set-header X-Frame-Options DENY
    # ACLs for routing
    acl is_api path_beg /api
    acl is_webapp path_beg /app
    acl is_static path_end .jpg .png .css .js
    # Rate limiting
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    acl too_many_requests sc0_http_req_rate(www-https) gt 20
    http-request deny if too_many_requests
    # Route to different backends based on path
    use_backend api_servers if is_api
    use_backend webapp_servers if is_webapp
    use_backend static_servers if is_static
    default_backend web_servers

# HTTP frontend with redirect to HTTPS
frontend www-http
    bind *:80
    mode http
    option httplog
    # Redirect HTTP to HTTPS
    redirect scheme https code 301 if !{ ssl_fc }

# API Backend
backend api_servers
    mode http
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    cookie SERVERID insert indirect nocache
    server api1 10.0.0.1:8080 check cookie s1
    server api2 10.0.0.2:8080 check cookie s2
    server api3 10.0.0.3:8080 check cookie s3 backup

# Web App Backend
backend webapp_servers
    mode http
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    cookie SERVERID insert indirect nocache
    server web1 10.0.1.1:8080 check cookie s1 weight 3
    server web2 10.0.1.2:8080 check cookie s2 weight 2

# Static Content Backend
backend static_servers
    mode http
    balance roundrobin
    option httpchk HEAD /
    server static1 10.0.2.1:80 check
    server static2 10.0.2.2:80 check

# Default Web Backend
backend web_servers
    mode http
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    cookie SERVERID insert indirect nocache
    server web1 10.0.3.1:8080 check cookie s1
    server web2 10.0.3.2:8080 check cookie s2

# Stats page
listen stats
    bind *:8404
    mode http
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST
    stats auth admin:secretpassword

This configuration includes:

SSL termination
Multiple backend definitions
Path-based routing
HTTP to HTTPS redirection
Cookie-based sticky sessions
Health checks
Rate limiting
Security headers
Statistics page

Table of Contents​

What is HAProxy?​

Load Balancer Concepts​

Key Load Balancing Concepts:​

Load Balancing Algorithms:​

Round Robin Load Balancing​

How Round Robin Works:​

Weighted Round Robin:​

Round Robin Advantages:​

Round Robin Limitations:​

HAProxy Pros and Cons​

Pros:​

Cons:​

Load Balancer Architecture with Reverse Proxy​

Typical Flow:​

Advantages of This Architecture:​

Example Deployment Architecture:​

Configuration Considerations:​

SSL Termination​

What is SSL Termination?​

Benefits of SSL Termination:​

Implementation in HAProxy:​

Access Control Lists (ACLs)​

ACL Syntax:​

Common ACL Types:​

Using ACLs:​

ACL Use Cases:​

HAProxy Setup Guide​

Installation:​

Debian/Ubuntu:​

CentOS/RHEL:​

Verify Installation:​

Basic Configuration:​

Setting Up SSL Termination:​

Setting Up Health Checks:​

Enabling Statistics Page:​

Example Configuration​

Table of Contents