Maps the port and the packet forwarded port label node
2020 6th IEEE International Conference on Network Softwarization (NetSoft)
PolKA: Polynomial Key-based Architecture for Source Routing in Network Fabrics
Cristina Dominicini and Diego Mafioletti | Ana C. Locateli, Rodolfo Villaca, | Alexander Gorodnik |
---|---|---|
Federal Institute of Education Science | Magnos Martinello, and Mois´es Ribeiro | School of Mathematics |
and Technology of Esp´ırito Santo | Federal University of Esp´ırito Santo | University of Bristol |
Esp´ırito Santo, Brazil | Esp´ırito Santo, Brazil | Bristol, UK |
cristina.dominicini@ifes.edu.br, | magnos@inf.ufes.br, moises@ele.ufes.br {ana.locateli,rodolfo.villaca}@ufes.br, | a.gorodnick@bristol.ac.uk |
diego.rossi@ifes.edu.br |
architectures evolution. In this sense, an emerging architecture
for software-defined data centers (DCs) and WANs is the
in order to adapt to highly variable traffic patterns.
This is far from a trivial challenge due to the scale, dynam-
However, the resulting designs require large numbers of table
entries [2], are constrained by the limited capacity of switch
1The edge node may be a virtual switch in a server, a hypervisor, a top-of-rack (ToR) switch, or a ingress domain gateway.
978-1-7281-5684-2/20/$31.00 ©2020 IEEE
Table I: Comparison of routing methods
Edge | 1,1,3 | Controller | s4 |
|
|
||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | s1 | 3 | 0 | ||||||||||||||||||||||||||||
node | |||||||||||||||||||||||||||||||
1 | |||||||||||||||||||||||||||||||
2 | |||||||||||||||||||||||||||||||
0 | 1 | ||||||||||||||||||||||||||||||
s3 | |||||||||||||||||||||||||||||||
s2 | 1 | 3 | 0 | ||||||||||||||||||||||||||||
Core Network |
Figure 1: Example of Port Switching SR for a SDN fabric.
In summary, the contributions of this work are: (i) we propose PolKA, a fully stateless RNS-based SR scheme for network fabrics that is compatible with binary polynomial arithmetic; (ii) we propose a technique to enable the imple-mentation of the polynomial mod operation in P4-enabled programmable switches [11] by reusing Cyclic Redundancy Check (CRC) hardware; (iii) we implement emulated and hardware prototypes to demonstrate that PolKA can achieve similar performance to Port Switching; and (iv) we demon-strate how PolKA can use intrinsic properties from the RNS to enable advanced routing features, like agile path migration, fast failure reaction, and service function chaining (SFC). This paper is structured as follows. Section II discusses related works. Section III presents the mathematical back-ground of PolKA. Section IV presents PolKA, and performs a scalability analysis. Section V studies how to implement PolKA and Port Switching according to P4 architecture, followed by the implementation and evaluation of proof-of-concept prototypes in Sections VI and VII. Finally, Section VIII discusses conclusions and future works.
This section describes the mathematical formulation that supports the proposal of PolKA. More information about finite fields and polynomials rings can be found in [9], [20].
Galois Field of order 2, whose elements are residue classes Polynomial Ring over GF(2): Let GF(2) = {0, 1} be the
Euclidean Division Theorem for Polynomials: Let f(t) and unique polynomials q(t) and r(t) over GF(2) such that f(t) = g(t) be polynomials over GF(2) , where g(t) �= 0. There exist g(t).q(t) + r(t), where either r(t) = 0 or deg(r) < deg(g). The polynomial r(t) is called the remainder of the division of f(t) by g(t), and will be denoted by < f(t) >g(t).
Polynomial congruence: Given f(t), g(t), and h(t) polyno-mials over GF(2), we say that f(t) is congruent to h(t) modulo g(t), and write f(t) ≡ h(t) mod g(t), if h(t) =< f(t) >g(t). Irreducible Polynomials: A non-zero polynomial g(t), is called a divisor of f(t) over GF(2) if f(t) = a(t).g(t), for some polynomial a(t) over GF(2). Two polynomials f(t) and g(t) over GF(2) are coprime if their only common divisor is 1. A non-constant polynomial f(t) over GF(2) is called irreducible over GF(2) if its only divisors are possibly a constant polynomial and itself.
= � M(t)/si(t)
oi(t) · mi(t) · ni(t) (2)
(3)
PolKA architecture is composed of three elements: (i) edge nodes that embed the route labels into the packets, (ii) core nodes that execute basic packet transport, and (iii) a logically centralized SDN Controller that selects routing paths and configures the nodes. In this architecture, the SR relies on three polynomial identifiers over GF(2): (i) routeID: a route identifier, calculated by the controller using the polynomial CRT and embedded into the packet by the edge nodes; (ii) nodeID: an identifier previously assigned to core nodes by the controller in a network configuration phase; and (iii) portID: an identifier assigned to the output ports of each core node.
0 | S1 |
|
1 | 1 | 2 | ||||
---|---|---|---|---|---|---|---|---|---|
0 | S2 | 0 | S3 | 5 | |||||
o1 = (1)dec = (1)bin | 3 | 7 | 6 | ||||||
o2 = (2)dec = (10)bin | o3 = (6)dec = (110)bin |
In this paper, all polynomials will be considered as poly-nomials over GF(2). Note that a polynomial f(t) = antn+ an−1tn−1+ . . . + a1t1+ a0t0can be represented by the bit string anan−1 . . . a1a0. Thus, an identifier is represented by a bit string formed by the coefficients of a polynomial, which are either 0 or 1. Also, the bit length of the identifier is len(f). Consider a packet should be routed via a selected path, represented by N core nodes and their respective output ports.
polynomials representing the nodeIDs of the nodes in this path. Let S = {s1(t), s2(t), . . . , sN(t)} be the multiset of the
The routeID is embedded in the packet by the edge element, and the forwarding operation in each core node calculates the output port as the remainder of the euclidean division of the routeID in the packet by its nodeID: oi(t) = < R(t) >si(t) The Controller may proactively compute R(t) or calculate it when the first packet of a flow arrives. On the other hand, core nodes only execute a simple mod operation per packet.
B. Usage example
output port set O is composed by the polynomials: Considering the path defined in Fig. 2 (s1 → s2 → s3), the
o1(t) = 1, o2(t) = t = 10, o3(t) = t2+ t = 110
|
---|
Thus, R(t) must satisfy the conditions of Eq. (5):
M(t) = (t + 1) · (t2+ t + 1) · (t3+ t + 1)
m1(t) = s2(t) · s3(t) = (t2+ t + 1) · (t3+ t + 1) m2(t) = s1(t) · s3(t) = (t + 1) · (t3+ t + 1) m3(t) = s1(t) · s2(t) = (t + 1) · (t2+ t + 1) And solving Eq. (4), we find the polynomials ni(t):
n1(t) = 1, | n2(t) = 1, |
|
---|
Therefore, packets should embed the routeID 10000, so each node can calculate its portID by dividing this routeID by its own nodeID. For example, the remainder of R(t) = 10000 divided by s3(t) = 1011 is o3(t) = 110 (port label 6).
C. Scalability analysis of the routeID
Topology | nports | diam. | size | Bits for PolKA |
|
---|---|---|---|---|---|
Two-tier S16 L16* | 24 | 3 | 32 | 21 | 15 |
|
16 | 5 | 320 | 55 | 20 |
4 | 7 | 20 | 42 | 14 | |
8 | 7 | 30 | 49 | 21 |
Table II compares the scalability of PolKA and Port Switch-ing for two common DC topologies (fat-tree and two-tier), and two continental backbone topologies [21] (ARPANET and GEANT2). This topology set covers diverse properties for number of ports, diameter and size. For backbone topologies, as the number of ports varies per node, we considered nports as the maximum number of ports of any node.
The results for the integer RNS-based approach were omit-ted, because they are very close to PolKA. This means that using PolKA instead of the integer version does not incur in reserving extra bits for the routeID header. The Port Switching method consumes less bits for representing the routeID than RNS-based SR approaches, specially in the cases where the size of the topology is high (e.g., fat-tree). These cases demand a large number of irreducible polynomials for PolKA, causing the selection of polynomials of high degree or large integer numbers even when the node does not demand many ports. For the topologies under worst case analysis in Table 1, the maximum len(R) results show that PolKA fits existing packet headers (e.g., 96 bits of Ethernet source and destination addresses, or a stack of MPLS labels with 20 bits per label). When deploying RNS-based SR, the cost to exploit RNS features may be to reserve more bits in the packet header for representing the routeID. Nevertheless, there are known techniques that make optimal assignment of nodeIDs (avoiding the worst case scenario), and reduce routeID length [12], [14].
Authorized licensed use limited to: University of Texas at San Antonio. Downloaded on October 03,2020 at 14:44:07 UTC from IEEE Xplore. Restrictions apply. 329
Ethernet | bos |
|
bos | port | ... | bos | port | IP | data |
---|
start | extract | start | lookahead | |
---|---|---|---|---|
ethernet header | ethernet header |
Ethernet | routeID | IP | data |
---|
(b) PolKA header with fixed length | drop packet | F | drop packet | F | ||
---|---|---|---|---|---|---|
|
|
|||||
IPv4 header, encapsulate the SR header, and change the | T | T | ||||
routeID | ||||||
etherType to TYPE SR=0x1234. |
B. PolKA pipeline
The PolKA header contains a routeID, whose maximum length depends on the network topology, as explained in Section IV-C. Fig. 3(b) shows the format of PolKA header with a fixed length field for storing the routeID, after the Ethernet header. At edge switches, PolKA pipeline for encapsulating the SR header is similar to Sourcey, but the result of the table lookup is an action that sets the output port to the directly connected core switch and encapsulates a single routeID. At core switches, the pipeline of Fig. 4(b) is executed. When etherType is TYPE SR, as PolKA only needs read access to packet headers, it uses the lookahead method of P4 language that evaluates a set of bits from the input packet without advancing the packet index pointer. To discover the output port, the switch has to perform a mod operation between the routeID in the packet and its own nodeID. Besides, in PolKA, there is no information in the header to identify the last hop. This is not a problem in fabric networks, because the packet delivery to an edge switch represents the end of the SR path. At the edge switch, the SR header is removed and the etherType is changed to TYPE IPv4.
end | pop header | end | port = routeID |
---|---|---|---|
mod nodeID | |||
set output port | set output port | ||
& emit packet | & emit packet | ||
Figure 4: P4 pipelines for core switches.
(yellow in Fig. 4(a)). On the other hand, in PolKA (Fig. 4(b)), the packet remains unchanged along the path;• In Sourcey, the output port is directly available in the SR header, while PolKA requires an arithmetic operation over the routeID to calculate the output port.
Authorized licensed use limited to: University of Texas at San Antonio. Downloaded on October 03,2020 at 14:44:07 UTC from IEEE Xplore. Restrictions apply. 330
Therefore, the switch calculates the output port by using two SHIFT, one CRC, and two XOR operations, which is more computationally efficient than executing a division.
With the use of this technique, PolKA can be deployed in P4 targets that allow the configuration of generator polynomials. Since P4 supports CRC operations through the use of external libraries [11], the support for customized polynomials depends on the architecture models and how specific targets implement them. The PSA2and v1model3architectures support cus-tomized polynomials of 16 and 32 bits. In terms of targets, the software switch bmv24and the hardware switch Tofino from Barefoot support customized CRC polynomials, while Netronome SmartNICs only support fixed CRC polynomials.
2https://p4.org/p4-spec/docs/PSA.html
3https://github.com/p4lang/p4c/blob/master/p4include/v1model.p4 4https://github.com/p4lang/behavioral-model
5https://www.netronome.com/
6http://pisces.cs.princeton.edu/
7http://p4.elte.hu/
8https://github.com/intrig-unicamp/macsad/
9https://github.com/p4lang/p4app/tree/rc-2.0.0/
TX |
|
P0 | ||
---|---|---|---|---|
V0 | ||||
RX | edge.p4 | P1 | core.p4 | |
V1 |
namespace ns1
SmartNICs
Figure 5: SmartNIC setup.
2) SmartNICs features: Netronome SmartNICs give access to hardware timestamps to P4 programs.They partially imple-ment the functionalities of v1model for P4 16, but it currently supports only a small set of fixed CRC polynomials, which restricts the use of its CRC hardware in PolKA. Nevertheless,
10https://www.sympy.org/
H1 | H2 | H3 | H10 | |
---|---|---|---|---|
E1 | E2 | E3 | E10 | |
S1 | S2 | S3 | S10 |
we could use them for measuring forwarding latency in one core node, as proposed in the next section.
3) P4 programs: The P4 programs are adaptations of the codes used in the emulated prototype. The new edge code encapsulates both SR headers and a timestamp header for executing latency measurements, which is composed of an egress (tsout) and an ingress timestamp (tsin).
The test uses the linear fabric topology of Fig. 6 to compare PolKA and Sourcey as the number of hops increases in the core network (e.g., from 0 for path H1 → H11 to 9 for path H1 → H10). The following experiments were executed: (i) round trip time (RTT): host H1 sends 1 ICMP packet/s during 60s to each of the other hosts using ping tool; (ii) jitter: host H1 sends an UDP traffic of 5Mbps (half of link capacity) with big packets to each of the other hosts during 60s using the iperf tool; and (iii) flow completion time (FCT): host H1 transmits a file of 100Mb with big packets over a TCP connection to each of the other hosts using the iperf tool (3 repetitions). Fig. 7 shows the comparison between PolKA and Sourcey solutions for RTT, jitter, and FCT experiments.
In the RTT experiments (Fig. 7(a) and Fig. 7(b)), it is possible to observe that: the RTT grows linearly with the increase of the number of hops for both solutions; (ii) Sourcey solution presents better RTT performance than PolKA solu-tion; and (iii) the standard deviation is small and in the same order of magnitude for both solutions. Besides, there is no significant difference in the results for different packet sizes in the RTT experiments. This is because Mininet does not consider transmission time in the emulation. In addition, jitter (Fig. 7(c)) is small and equivalent for both solutions. Finally, the FCT experiment (Fig. 7(d)) shows that both solutions require approximately the same time to transfer the file and the standard deviation is small.
C. Emulated prototype: Agile path migration in PolKA
This experiment shows how traffic engineering can benefit from SR for bandwidth allocation with agile path migration.
RTT (ms) | 12 |
|
Jitter (ms) | 0.20 |
|
FCT (s) | 150 |
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 0.15 | 125 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | 0.10 | 100 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | 75 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | 50 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
0.05 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | 25 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
0 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0.00 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Number of core hops | Number of core hops | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
(b) RTT big packet |
|
(d) FCT |
Average core latency (us) | 14 | Average core latency (us) | 14 | Average core latency (us) | 14 | Average core latency (us) | 14 |
|
|||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|||||||||||||||||||||||||||||||||||||||||||
12 | 12 | 12 | 12 | ||||||||||||||||||||||||||||||||||||||||
10 | 10 | 10 | 10 | ||||||||||||||||||||||||||||||||||||||||
8 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 8 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|
8 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|
8 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |||||
Path length | Path length | Path length | Path length | ||||||||||||||||||||||||||||||||||||||||
(a) Low throughput, small packets | (b) Low throughput, big packets |
E11 | L1 | E21 |
|
S2 |
|
E41 | L4 | Throughput (Mbps) | 10 | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
L3 | 5 | |||||||||||||
E22 | E31 | |||||||||||||
H11 | H21 | H22 | H31 | H32 | H41 |
|
0 | |||||||
|
25 | |||||||||||||
Time (s) | ||||||||||||||
(a) Topology | (b) Throughput at H21 |
flow A is allocated to blue path (E11 − L1 − S1 − L2 − E21) and flow B is allocated to red path (E12−L1−S1−L3−E31). Thus, these flows compete for bandwidth at link L1−S1 in the interval of 10s to 40s. The effects of TCP congestion control for fair share of total bandwidth can be seen in Fig. 9(b) that shows the throughput at the destination of flow A (H21). At 40s, the traffic engineering decides to exploit idle links and migrates flow A from the blue path to the green path (E11 −L1−S2−L2−E21). From this moment, there is no competition with flow B, so flow A consumes all the link bandwidth.
To perform path migration, the SDN Controller only has to modify a single flow entry at the edge switch E11 for destination H21. The only field that has to be modified is the routeID to embed the new route through green path. Once this single operation is executed, all the packets of flow A that leave H11 will be tagged with the routeID of the new path.
E1 | 1 | S1 | S3 |
|
S8 | S9 | S2 | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | |||||||||||||
VMS | |||||||||||||
S2 | |||||||||||||
E2 | S7 |
|
S10 | S3 |
|
||||||||
4 | |||||||||||||
S4 | |||||||||||||
S4 | S6 | 1 | E6 | ||||||||||
S7 | |||||||||||||
VNF | S5 | 3 | VMD | S6 |
|
||||||||
8 |
|
||||||||||||
CNPq, FAPES, CTIC, RNP, and EU H2020 (FUTEBOL). |
Throughput (Mbps) | Throughput (Mbps) | 6 | 0 |
|
|||||||
---|---|---|---|---|---|---|---|---|---|---|---|
|
|||||||||||
4 | 20 | 40 | |||||||||
2 | |||||||||||
0 | 20 | 40 | 60 | 0 | |||||||
Time (s) | Time (s) | ||||||||||
(d) Throughput at S3 |
Figure 10: Fast failure reaction for failure of link S4-S6.
veloped a simple control plane application that causes link failures and makes port failure information available to the data plane by populating a table of faulty ports. In addition, we modified the core pipeline to perform a lookup in this table before sending the packet to the output port. If there is a hit, the packet is randomly deflected to one of the other healthy ports. Otherwise, the packet is transmitted normally. The generation of a random value within an interval is provided by v1model and could also be replaced by a hash function if the objective is to always select the same port per flow. Failure detection mechanisms are not in the scope of this work.
|
|||||
---|---|---|---|---|---|
ser. HotSDN ’12. |
|
||||
|
|||||
Research. | |||||
New York, NY, | |||||
|
|||||
Student Workshop. |
|
||||
|
|||||
[20] I. Herstein, Topics in Algebra, 2nd ed. |
|
||||
|