The official documentation outlines a number of requirements that any CNI plugin implementation should support. Rephrasing it in a slightly different way, a CNI plugin must provide at least the following two things:
eth0 interface with IP reachable from the root network namespace of the hosting Node.
Connectivity requirement is the most straight-forward one to understand – every Pod must have a NIC to communicate with anything outside of its own network namespace. Some local processes on the Node (e.g. kubelet) need to reach PodIP from the root network namespace (e.g. to perform health and readiness checks), hence the root NS connectivity requirement.
There are a number of reference CNI plugins that can be used to setup connectivity, most notable examples are:
Reachability, on the other hand, may require a bit of unpacking:
PodCIDR range configured on the Node.
PodCIDRs assigned to other Nodes, allocations are normally managed by the controller-manager based on the
--cluster-cidr configuration flag.
PodCIDRs may require different methods:
The above mechanisms are not determined exclusively by the underlying network. Plugins can use a mixture of different methods (e.g. host-based static routes for the same L2 segment and overlays for anything else) and the choice can be made purely based on operational complexity (e.g. overlays over BGP).
It goes without saying that the base underlying assumption is that Nodes can reach each other using their Internal IPs. It is the responsibility of the infrastructure provider (IaaS) to fulfil this requirement.
In addition to the base functionality described above, there’s always a need to do things like:
These functions can be performed by the same monolithic plugin or via a plugin chaining, where multiple plugins are specified in the configuration file and get invoked sequentially by the container runtime.
CNI plugins repository provides reference implementations of the most commonly used plugins.
Contrary to the typical network plugin design approach that includes a long-lived stateful daemon, CNI Specification defines an interface – a set of input/output parameters that a CNI binary is expected to ingest/produce. This makes for a very clean design that is also very easy to swap and upgrade. The most beautiful thing is that the plugin becomes completely stateless – it’s just a binary file on a disk that gets invoked whenever a Pod gets created or deleted. Here’s a sequence of steps that a container runtime has to do whenever a new Pod gets created:
The last step, if done manually, would look something like this:
cni_plugin < /etc/cni/net.d/01-cni.conf
The CNI plugin then does all of the required interface plumbing and IP allocation and returns back (prints to stdout) the resulting data structure. In the case of plugin chaining, all this information (original inputs + result) gets passed to all plugins along the chain.
Despite its design simplicity, unless you have something else that takes care of establishing end-to-end reachability (e.g. cloud controller), a CNI binary must be accompanied by a long-running stateful daemon/agent. This daemon usually runs in the root network namespace and manages the Node’s network stack between CNI binary invocations – at the very least it adds and removes static routes as Nodes are added to or removed from the cluster. Its operation is not dictated by any standard and the only requirement is to establish Pod-to-Pod reachability.
In reality, this daemon does a lot more than just manage reachability and may include a kube-proxy replacement, Kubernetes controller, IPAM etc.
See meshnet-cni for an example of binary+daemon architecture.
To learn more about CNI, you can search for the “Kubernetes and the CNI: Where We Are and What’s Next”, which I cannot recommend highly enough. It is what’s shaped my current view of the CNI and heavily inspired the current article. Some other links I can recommend: