Posted by Irving Popovetsky on April 7th, 2009
Let me start by saying this: I am not a fan of CoyotePoint load balancers. My support experiences so far have all been atrocious. The system architecture is a cheap imitation of F5's BigIP architecture from a decade ago, and it constantly limits me. I'm convinced that people only buy these things because they're cheap.
I've been working with a customer who's exceeded the throughput capabilities of their Equalizer E350si load balancer. Although the marketing materials will tell you that this unit is capable of throughput up to 700 Mbps (hah!), we were maxing out and dropping packets above 50 Mbps.
Rant: You see, the problem lies in Coyote's system architecture. The E350si is powered by a NetBurst-architecture Pentium 4 at 2.8 GHz, with HyperThreading disabled. Coyote uses a FreeBSD 4-based kernel, which was well known for its slow timers, slow interrupt handling, and immature device polling implementation. In this classic system architecture, each incoming packet generates an Interrupt Request (IRQ), which must be serviced by the CPU in a time-slotted fashion. So what we have is a load balancer that reports its CPU as mostly idle, but in reality cannot handle packets quickly enough. THIS IS NOT HOW YOU DESIGN NETWORK EQUIPMENT, PEOPLE. End rant.
Good news: in version 8, Coyote introduced a new mode of operation called DSR, or Direct Server Return. DSR is quite clever, really, because it gets around Coyote's packet handling limitation (to a large degree) by handling the incoming network packets but allowing the web servers to respond directly to clients. This cuts the number of TCP packets the Coyote has to process in half, and cuts the Ethernet traffic by far more when you consider that the return packets are much larger.
Here's how it works. In a traditional setup, the Coyote receives a packet on its external interface (em1), makes a load balancing decision, and then forwards the packet along to a host behind its internal interface (em0). Most shops NAT here as well, for security and/or IP address conservation reasons. So the Coyote must perform Layer 2–4 (or 7) processing of the packet as it receives it, then make a load balancing decision, then translate the packet (that's the T in NAT), then re-process the packet going out the internal interface. Then rinse, lather, repeat for the return packet. Such is the life of a typical load balancer.
In DSR mode, you start by chopping off the internal interface of the load balancer altogether and eliminating NAT. This requires moving your webservers onto publicly routable IP addresses, so please make sure they are firewalled properly (a quick sketch follows). Now you have your load balancer and webservers all on the same Ethernet segment. You create a VIP (Virtual IP) on the load balancer, and then add that SAME VIP address as a loopback address to the webservers!
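Since "firewalled properly" is doing a lot of work in that sentence, here's a minimal iptables sketch for the webservers. This is my own suggestion, not anything from the Coyote docs, and the admin network (10.0.0.0/8) is a placeholder:
# Allow loopback, established flows, and web traffic; SSH only from inside; drop the rest.
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -s 10.0.0.0/8 -j ACCEPT
iptables -P INPUT DROP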
You're probably scratching your head, wondering how this is going to work. I know that I was. Here's the magical part. Only the load balancer responds to ARP requests for the VIP. The webservers have Apache listening on the VIP address, but don't answer ARP for it at all. Each incoming packet is sent from the router to the MAC address of the load balancer, which makes a load balancing decision and then re-sends that same packet to the MAC address of the chosen web server. Let me say that again. The load balancer performs no translation at all; it rewrites nothing above Layer 2, it literally just copies the packet over to the webserver. The webserver accepts the packet because it holds the VIP on its loopback, and it replies straight out through its default gateway (the router), skipping the load balancer entirely.
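One Linux wrinkle worth flagging (my addition, not Coyote's): many kernels will happily answer ARP on eth0 for any local address, including ones configured on lo. If your webservers are stealing ARP for the VIP from the load balancer, the standard LVS-DR-style fix is a pair of sysctls:
# Only answer ARP when the target IP is on the receiving interface,
# and prefer that interface's own address as the ARP source:
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2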
Sounds a bit scary, but works well. Except for one thing. In their brilliance, the Coyote folks created a section in the Manual with configuration instructions for “Linux/Unix Systems”, but ACTUALLY put in instructions for BSD-like systems only. Who runs FreeBSD anymore? DON’T TRY THESE INSTRUCTIONS ON A LINUX SERVER UNLESS YOU WANT TO LOCK YOURSELF OUT.
On Linux, the correct way to create the loopback address is by adding a "labelled" loopback alias, but ALWAYS set the netmask of your new interface to "255.255.255.255". If you match the real netmask of the VIP instead, the kernel installs a route for the entire subnet via the loopback, and your webserver will stop responding to packets on its external interface. Very bad.
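To make the failure mode concrete, this is the command NOT to run (my illustration; <VIP> stands in for your virtual IP and 255.255.255.0 for its real subnet mask):
# WRONG on Linux -- routes the whole subnet via loopback and locks you out:
/sbin/ifconfig lo:vip inet <VIP> netmask 255.255.255.0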
So, assume your public VIP address is 203.0.113.100 (fake, to protect the innocent), and your webserver's address is 203.0.113.10 on the same subnet. Create a loopback address like so:
/sbin/ifconfig lo:vip inet 203.0.113.100 netmask 255.255.255.255
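That alias won't survive a reboot, so bake it into your network config as well. Here's a sketch for a Debian-style /etc/network/interfaces (the file location and stanza syntax are distro assumptions on my part; Red Hat-style boxes use an equivalent ifcfg alias file instead):
auto lo:vip
iface lo:vip inet static
    address 203.0.113.100
    netmask 255.255.255.255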
Then, the output of “ifconfig -a” looks something like this:
eth0      Link encap:Ethernet  HWaddr 00:40:A4:8E:B0:1A
          inet addr:203.0.113.10  Bcast:203.0.113.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:47131905 errors:0 dropped:0 overruns:0 frame:0
          TX packets:77804088 errors:0 dropped:0 overruns:0 carrier:0
          RX bytes:5111837334 (4875.0 Mb)  TX bytes:104047655003 (99227.5 Mb)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1311164 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1311164 errors:0 dropped:0 overruns:0 carrier:0
          RX bytes:439975560 (419.5 Mb)  TX bytes:439975560 (419.5 Mb)

lo:vip    Link encap:Local Loopback
          inet addr:203.0.113.100  Mask:255.255.255.255
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
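One detail from earlier worth pinning down: Apache has to actually listen on the VIP. A minimal httpd.conf sketch using the example addresses above (this assumes Apache; a bare "Listen 80" on all addresses works too):
Listen 203.0.113.10:80     # the webserver's own address, handy for direct health checks
Listen 203.0.113.100:80    # the VIP -- where the load-balanced traffic arrives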
If it all works, you should be able to confirm correct operation by running tcpdump or Wireshark/Ethereal on the webserver and verifying that the SOURCE address of the outgoing responses is your VIP address and that you're seeing lots of 200 OK messages.
tshark -n -i eth0 -R http.response port 80
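Or the plain-tcpdump equivalent, watching for responses sourced from the example VIP:
tcpdump -n -i eth0 src host 203.0.113.100 and tcp port 80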