Experiments with container networking: Part 3
Kubernetes with BaGPipe BGP and CNI
The third part of our discussion of CNI, Kubernetes and EVPN BGP brings us to the solution shown in the topology below. We want our solution to stitch together a Kubernetes-orchestrated datacenter and provide seamless inter- and intra-DC communication between pods, all implemented along a service-oriented model.
To show that we can deliver this design, we will prepare a proof-of-concept environment. For this we need a way to integrate the BaGPipe BGP daemon into the Kubernetes environment. Luckily, kubelet (the process responsible for managing pods on a Kubernetes node) supports the CNI interface.
How CNI plugin talks to BaGPipe BGP
First of all, let's understand how CNI can talk to the BaGPipe BGP daemon. Fortunately, the BaGPipe BGP project implements a REST API that we can use from our CNI plugin, as discussed in Part 2.
There is a JSON structure that BaGPipe BGP expects to receive. Sending this kind of message allows us to avoid interacting with the bagpipe-rest-attach command-line tool.
- import_rt - import route-target community string, e.g. 65000:123 (ASN:Number)
- export_rt - export route-target community string, e.g. 65000:123 (ASN:Number). Route targets are used to control route import and export policy in a VRF. For the curious: BaGPipe BGP uses RouterID:Number as the Route Distinguisher attribute (e.g. 192.168.33.22:5). This is a Type 1 RD, chosen according to the RFC 7432 recommendation.
- vpn_type - the CNI plugin I wrote supports only evpn at this time.
- vpn_instance_id - unique identifier for the VPN instance. I use the name of the veth interface.
- ip_address - IP address that will be assigned to the pod or container.
- local_port - the same value as vpn_instance_id, but in nested JSON format.
- gateway_ip - IP address of the default gateway that will be assigned to the pod or container.
- mac_address - MAC address of the interface inside the container's namespace.
- advertise_subnet - optional attribute; if set to True, the VRF will advertise the whole subnet. By default only a /32 is advertised.
- readvertise - used to readvertise subnets received from the given route targets.
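Putting the attributes above together, an attach message could look roughly like this. The field names follow the list above, but the nested local_port key, the veth name and the addresses are illustrative assumptions rather than values taken from the BaGPipe documentation:

```json
{
  "import_rt": "65000:123",
  "export_rt": "65000:123",
  "vpn_type": "evpn",
  "vpn_instance_id": "veth9064fbd1",
  "ip_address": "10.27.1.10/24",
  "local_port": {"linuxif": "veth9064fbd1"},
  "gateway_ip": "10.27.1.1",
  "mac_address": "52:54:00:12:34:56",
  "advertise_subnet": false
}
```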
I implemented a function inside our plugin to send the BaGPipe BGP JSON message. This was easy to do, as Go's standard library provides HTTP support via the net/http package.
This function could certainly be improved, since the response from the BGP speaker should be handled properly, but it is adequate for this proof-of-concept lab.
Our plugin has two functions that implement adding and deleting the pod's network interface:
The cmdDel function is implemented as well: please refer to the BaGPipe CNI plugin implementation. To delete the interface we need to enter the pod's namespace and delete the veth link from there. We also need to form a JSON message and send it to the BaGPipe REST API with the detach command.
I encountered certain difficulties implementing the cmdDel procedure, because BaGPipe BGP requires the local interface name to be passed as an argument (which is basically the interface belonging to the root namespace, e.g. vethXXXXX). Nevertheless, thanks to the ethtool Go package I managed to get the index of the root-namespace veth leg and then pass its name to the sendBagpipeReq function.
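The name lookup can be sketched like this. In the plugin, the index would come from the container-side veth leg via the ethtool Go package (reading a veth "peer_ifindex" statistic is my understanding of how that works, so treat it as an assumption); the name is then resolved in the root namespace:

```go
package main

import (
	"fmt"
	"net"
)

// peerName resolves an interface name from its ifindex in the current
// namespace. In the plugin, the index would be obtained from the
// container-side veth leg using the ethtool Go package.
func peerName(index int) (string, error) {
	iface, err := net.InterfaceByIndex(index)
	if err != nil {
		return "", err
	}
	return iface.Name, nil
}

func main() {
	// Index 1 is conventionally the loopback device on Linux.
	name, err := peerName(1)
	fmt.Println(name, err)
}
```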
The diagram below shows the logical flow:
The CNI plugin receives its configuration via the CNI_* environment variables set by the external application that executes it; for Kubernetes that is kubelet. Bear in mind that the network configuration itself is taken from a configuration file stored at /etc/cni/net.d/*.conf; an example configuration file can be found in Part 1. The plugin then creates a veth interface pair, assigns an IP address using the ipam plugin, and sends the attach command to the BaGPipe BGP daemon.
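For illustration, this is roughly the environment a caller such as kubelet sets before executing the plugin binary (the container ID and paths are made up for the example):

```shell
# Environment a CNI runtime sets for an ADD operation (values illustrative).
export CNI_COMMAND=ADD
export CNI_CONTAINERID=example0123456789
export CNI_NETNS=/var/run/netns/example
export CNI_IFNAME=eth0
export CNI_PATH=/opt/cni/bin
# The plugin reads these variables plus the network config on stdin.
env | grep '^CNI_' | sort
```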
To better understand the EVPN BGP NLRI format, please find below a tcpdump (Wireshark) snippet with an explanation.
Proof of concept BaGPipe BGP, CNI plugin and Kubernetes
Now we can bring everything together and show the BaGPipe CNI plugin in action. For the PoC lab we will use two Kubernetes nodes running BaGPipe BGP, the BaGPipe CNI plugin, kubelet, kube-proxy and docker, and one master node running kube-apiserver, kube-controller-manager, kube-scheduler and etcd.
In the kubelet configuration we should specify the CNI plugin as the network plugin. This can be done in the /etc/kubernetes/kubelet file. The commands below should be executed on both Kubernetes nodes.
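The relevant kubelet arguments could look like this; the exact flag names varied across early kubelet releases, so treat this fragment as an assumption for the era of this PoC:

```shell
# /etc/kubernetes/kubelet (fragment)
KUBELET_ARGS="--network-plugin=cni --network-plugin-dir=/etc/cni/net.d"
```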
Next we should clone the CNI GitHub repository into the /opt directory:
Now we should clone the BaGPipe CNI plugin into the /opt/cni/plugins/main/bagpipe directory. We also need to download the ethtool Go package.
Then we should go to /opt/cni and build our plugins:
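The whole sequence could look roughly like this; the repository URLs, the ethtool package path and the build script name are assumptions based on the CNI project layout at the time, so substitute the actual locations:

```shell
cd /opt
git clone https://github.com/containernetworking/cni.git
cd cni
# Clone the BaGPipe CNI plugin (substitute the actual repository URL):
git clone https://example.com/bagpipe-cni.git plugins/main/bagpipe
# Fetch the ethtool Go package used by cmdDel:
go get github.com/safchain/ethtool
# Build all plugins:
./build
```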
Drop the configuration into /etc/cni/net.d/10-bagpipe.conf and configure the network. For this example we will use the host-local ipam plugin with different IP ranges on each node. A dedicated BaGPipe ipam plugin is on the TODO list; in the meantime we can use range-start and range-end to avoid overlapping pod IP addresses.
Node 1:
Node 2:
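A node 1 configuration along these lines would match the description above. The subnet, ranges and route-target values are illustrative, and the importrt/exportrt key names are my assumption about the plugin's config schema; node 2 would use the same file with a non-overlapping rangeStart/rangeEnd:

```json
{
  "name": "bagpipenet",
  "type": "bagpipe",
  "importrt": "64512:90",
  "exportrt": "64512:90",
  "ipam": {
    "type": "host-local",
    "subnet": "10.27.0.0/16",
    "rangeStart": "10.27.1.2",
    "rangeEnd": "10.27.1.254"
  }
}
```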
Next we should set up BaGPipe BGP and start it:
Node 1:
Node 2:
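For reference, a node 1 bgp.conf could look roughly like the fragment below. The section and option names are based on my recollection of the bagpipe-bgp sample configuration, and the addresses are illustrative (node 2 would use its own local_address; peers would point at the route reflector):

```ini
[BGP]
local_address=192.168.33.22
peers=192.168.33.10
my_as=64512
enable_rtc=True

[API]
api_host=localhost
api_port=8082

[DATAPLANE_DRIVER_EVPN]
dataplane_driver=linux_vxlan

[DATAPLANE_DRIVER_IPVPN]
dataplane_driver=dummy
```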
We will use the GoBGP Route Reflector from Part 2.
After the configuration is complete, restart the kubelet service on both nodes.
Finally, from the master node we run our pods.
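For example, a minimal test pod manifest could look like this (the pod name and image are illustrative, not the ones from the original lab), started with kubectl create -f busybox-test.yaml:

```yaml
# busybox-test.yaml - minimal long-running pod for connectivity tests
apiVersion: v1
kind: Pod
metadata:
  name: busybox-test
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
```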
Now we can run an experiment and ping from one Kubernetes node to the other. Let's execute ping inside a docker container running in a pod; in this example we ping from 10.27.1.10 to 10.27.0.124.
And we again use tcpdump to capture this traffic and look inside the UDP datagrams. As you can see, the ICMP traffic of the 10.27.0.0/16 network is encapsulated inside VXLAN UDP datagrams.
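The capture can be reproduced on the node's underlay interface with something like the following (the interface name is illustrative; 4789 is the IANA-assigned VXLAN port):

```shell
tcpdump -ni eth1 udp port 4789
```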
All in all, we have shown that it is possible to use the BaGPipe BGP EVPN implementation together with Kubernetes thanks to the exceptional flexibility of the CNI interface. In this PoC we used my own CNI plugin implementation, which I named BaGPipe CNI. Anyone who is interested in contributing or has questions about this PoC is always welcome to email me or send a pull request.