flannel
Flannel is another example of a dual CNI plugin design:
- Connectivity is taken care of by the flannel binary. This binary is a metaplugin – a plugin that wraps other reference CNI plugins. In the simplest case, it generates a bridge plugin configuration and "delegates" the connectivity setup to it.
- Reachability is taken care of by the DaemonSet running flanneld. Here's an approximate sequence of actions when the daemon starts:
  - It queries the Kubernetes Node API to discover its local PodCIDR and ClusterCIDR. This information is saved in /run/flannel/subnet.env and is used by the flannel metaplugin to generate the host-local IPAM configuration.
  - It creates a VXLAN interface called flannel.1 and updates the Kubernetes Node object with its MAC address (along with its own Node IP).
  - Using the Kubernetes API, it discovers the VXLAN MAC information of other Nodes and builds a local unicast head-end replication (HER) table for its VXLAN interface.
Info
This plugin assumes that daemons have a way to exchange information (e.g. VXLAN MACs). Previously, this required a separate database (hosted etcd), which was considered a big disadvantage. The current version of the plugin uses the Kubernetes API to store that information in annotations of the Node API object.
The fully converged IP and MAC tables will look like this:
Lab
Assuming that the lab is already set up, flannel can be enabled with the following three commands:
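A hypothetical sequence along these lines (the exact manifest URL, plugin version and lab tooling are assumptions and may differ):

```shell
# Install the reference CNI plugin binaries that flannel delegates to
# (version and install path are examples)
curl -sL https://github.com/containernetworking/plugins/releases/download/v1.4.0/cni-plugins-linux-amd64-v1.4.0.tgz \
  | tar -xz -C /opt/cni/bin
# Deploy the flanneld DaemonSet
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
```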
Check that the flannel daemonset has reached the READY state:
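With the upstream manifest, the DaemonSet lives in the kube-flannel namespace (namespace and DaemonSet names are assumptions and may differ in the lab):

```shell
# Wait for the DaemonSet rollout to complete on all Nodes
kubectl -n kube-flannel rollout status daemonset/kube-flannel-ds
# Confirm DESIRED == READY
kubectl -n kube-flannel get daemonset
```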
Now we need to “kick” all Pods to restart and pick up the new CNI plugin:
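One blunt way to do this, assuming the lab Pods are safe to restart, is to delete them all and let their controllers recreate them:

```shell
# Recreated Pods will be wired up by the new CNI plugin
kubectl delete pods --all --all-namespaces
```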
Here’s how the information from the diagram can be validated (using worker2 as an example):
- Pod IP and default route
- Node routing table
- Static ARP entries for NextHops
- VXLAN forwarding database
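A sketch of the commands that could be used to check each of these on worker2 (the Pod name placeholder and interface names are assumptions based on flannel's defaults):

```shell
# Pod IP and default route (run inside the Pod's network namespace)
kubectl exec <pod-name> -- ip addr show eth0
kubectl exec <pod-name> -- ip route
# Node routing table
ip route
# Static ARP entries for next hops, provisioned by flanneld
ip neigh show dev flannel.1
# VXLAN forwarding database (MAC -> remote VTEP IP)
bridge fdb show dev flannel.1
```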
A day in the life of a Packet
Let’s track what happens when Pod-1 tries to talk to Pod-3.
Note
We’ll assume that the ARP and MAC tables are converged and fully populated.
1. Pod-1 wants to send a packet to 10.244.0.2. Its network stack looks up the routing table to find the NextHop IP:
2. The packet reaches the cbr0 bridge in the root network namespace, where the lookup is performed again:
3. The NextHop and the outgoing interface are set, and the ARP table lookup returns the static entry provisioned by flanneld:
4. Next, the FDB of the VXLAN interface is consulted to find out the destination VTEP IP:
5. The packet is VXLAN-encapsulated and sent to the control-node where flannel.1 matches the VNI and the VXLAN MAC:
6. The packet gets decapsulated and its original destination IP is looked up in the main routing table:
7. The ARP and bridge tables are then consulted to find the outgoing veth interface:
8. Finally, the packet arrives in the Pod-3’s network namespace where it gets processed by the local network stack:
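The lookups above can be reproduced from the command line; a sketch, assuming the destination Pod IP is 10.244.0.2 and flannel's default interface names:

```shell
# Steps 1-2: routing lookup for the destination Pod IP (on worker2)
ip route get 10.244.0.2
# Step 3: static ARP entry for the NextHop
ip neigh show dev flannel.1
# Step 4: FDB lookup mapping the NextHop's MAC to the remote VTEP IP
bridge fdb show dev flannel.1
# Steps 6-7: on the control-node, route and bridge lookups for the inner packet
ip route get 10.244.0.2
bridge fdb show br cbr0
```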
SNAT functionality
Similar to kindnet, flanneld sets up SNAT rules to enable egress connectivity for the Pods; the only difference is that it does this directly in the POSTROUTING chain:
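For illustration, the rules typically look like this (assuming a ClusterCIDR of 10.244.0.0/16; the live chain can be inspected with iptables -t nat -S POSTROUTING):

```shell
# Intra-cluster traffic is excluded from NAT
iptables -t nat -A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
# Everything else leaving the Pod CIDR is masqueraded behind the Node IP
iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
```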
Caveats and Gotchas
- The official installation manifest does not install the CNI binary by default. This binary is distributed as part of the reference CNI plugins and needs to be installed separately.
- flannel can run in a direct routing mode, which works by installing static routes for hosts on the same subnet.
- flannel can use generic UDP encapsulation instead of VXLAN.