I am currently finishing a project where I am designing a custom Quality of Service policy for a customer’s suite of business critical applications. The customer uses MPLS as their WAN, and the MPLS bandwidth differs depending on the site: datacenters have more bandwidth, branches have decent bandwidth, and small kiosks only have a few megabits of bandwidth. The customer noticed that they were experiencing some congestion at the small kiosks, and at some branches, during peak times of the day and wanted to see if enabling Quality of Service would protect the traffic to their business critical applications.
Like most people, the customer tried enabling QoS everywhere to see if that just fixed the problem, and like most people, they found that just enabling QoS everywhere isn’t going to fix the problem. Quality of Service is an optimization tool used to prioritize traffic, and like most optimization tools a shotgun approach won’t provide the best results. As my customer realized as they started working with me, Quality of Service requires planning and periodic tuning to get those desired results. This project has inspired me to write a detailed series of blog posts going over when to use QoS, how to classify traffic, and finally how to shape and queue traffic. In this post we will be going over the first question: when to use QoS?
WHAT IS QUALITY OF SERVICE
Before we talk about when to use QoS, let’s first back up a bit and talk about what Quality of Service is. As Cisco describes it:
“QoS technologies refer to the set of tools and techniques to manage network resources and are considered the key enabling technology for network convergence. The objective of QoS technologies is to make voice, video and data convergence appear transparent to end users. QoS technologies allow different types of traffic to contend inequitably for network resources.”
Enterprise QoS Solution Reference Network Design Guide
So let’s break down what this means. First, QoS is not just one thing, but a set of tools and technologies that together allow network engineers to prioritize certain types of traffic over other types of traffic. Every device, from the application itself to the network devices the traffic will pass through, can act on and/or manipulate this prioritization to optimize the traffic flow, potentially reducing delay for mission critical applications.
“The objective of QoS technologies is to make voice, video and data convergence appear transparent to end users.” Certain applications are sensitive to delay and jitter, such as voice and video traffic, while other types of traffic can handle delay and still appear to be working to the end users, such as email and transactional data. QoS technologies aim to carve up the network resources in a way that satisfies the different needs of each application.
Finally, “QoS technologies allow different types of traffic to contend inequitably for network resources.” The key word here is “inequitably”. Interfaces have a maximum bandwidth and the CPU of a router can only process packets so fast, so the devices need a way to carve up the finite resources and provide them to the applications that need them most to satisfy the individual application requirements. The most common example of this is that we want to guarantee that voice and video traffic has a certain amount of bandwidth so that we don’t get any delay, while at the same time we don’t mind queuing up email traffic. QoS allows us to identify the different applications so we can treat them differently.
HOW QUALITY OF SERVICE WORKS
So now that we know what QoS is, how does it work? As we just learned, QoS is a set of tools and techniques, not just one thing. So this raises the question: what are these tools? The first, and arguably the most important, technology is the TOS field in the IP packet header and the COS field in the Ethernet header. Both provide similar functions and they are the “thing” that is being manipulated to identify the application.
The second technology is the traditional way network devices interact with the TOS/COS fields. Traditionally, Cisco devices use policy-maps to interact with these fields. This can be anything from changing the values in the fields (Marking), adding the packet to a specific queue (Queuing), limiting the rate of certain traffic by delaying or dropping the excess (Shaping and Policing), or simply changing the way the router handles routing of the traffic (Policy-based routing).
The last technology we will talk about is the modern or custom application that uses the values in the TOS/COS fields to change the network behavior. These apps include things like SD-WAN controllers dynamically changing the WAN link to use for traffic based on both the delay and jitter of the circuit plus the value in the TOS/COS fields. I won’t go into detail with these technologies; I just want to point out that they exist.
At this point we should talk about the values that are put in the TOS/COS fields. Over the years, the way we classify traffic has changed, and because of this you will see a bunch of different names for the values: IP Precedence, DSCP, and COS. In order to keep this blog post short and sweet, just know that IP Precedence and DSCP are related, that DSCP is a more granular way of expressing IP Precedence, and that both are located in the IP TOS field. COS is also roughly based on IP Precedence but works at layer 2 and, as you guessed, is located in the COS field. Now let’s talk about DSCP and COS a bit more in depth.
Differentiated Services Code Point (DSCP) is a layer 3 classification tool for QoS. It can be broken up into three different categories:
- Expedited Forwarding (EF): This marking typically guarantees the highest priority. Applications that require low loss and low latency, such as VoIP systems, should be marked with this marking.
- Assured Forwarding (AF): These markings are typically used for applications that request higher priority than default traffic. AF has a large range of different codepoint values, allowing for granular classification between applications.
- Class Selector (CS): These markings provide backwards compatibility with the older style of classification called IP Precedence. As we will see below, CS and AF markings are closely related; however, CS is a less granular classification than AF.
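To make the three categories concrete, here is a minimal Python sketch that sorts a 6-bit DSCP value into EF, AF, or CS by looking at its bit pattern. The function name `dscp_category` is my own for illustration, and the sketch assumes the standard codepoints (EF is 46, CS values use only the top three bits, AF covers classes 1 through 4 with three drop levels each).

```python
EF_DSCP = 46  # Expedited Forwarding codepoint (binary 101110)

def dscp_category(dscp: int) -> str:
    """Return 'EF', 'AF', 'CS', or 'other' for a 6-bit DSCP value.

    Illustrative helper only; the name is hypothetical.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    if dscp == EF_DSCP:
        return "EF"
    class_bits = dscp >> 3     # top three bits (the old IP Precedence bits)
    low_bits = dscp & 0b111    # bottom three bits
    if low_bits == 0:
        return "CS"            # CS0-CS7 use only the top three bits
    # AF classes 1-4, drop level 1-3 in the next two bits, last bit always 0
    if 1 <= class_bits <= 4 and low_bits in (0b010, 0b100, 0b110):
        return "AF"
    return "other"

print(dscp_category(46))  # EF
print(dscp_category(26))  # AF (this is AF31)
print(dscp_category(24))  # CS (this is CS3)
```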
To better explain what I mean when I say that CS and AF markings are closely related, I’m going to have to get really into the weeds. Below is a snapshot from notes I took while attending a CCIE bootcamp.
On the right are the ToS bits in the layer 3 header and how IP Precedence (second row) and DSCP (third and fourth rows) use those bits to represent their markings. Looking at the last row, you can see that CS maps 1:1 to IP Precedence, which is why it’s backwards compatible. The next two bits are used by DSCP AF markings to give more granularity.
On the left we see the “formula” for calculating the AF marking. The first number (x) represents the CS bits, and the second number (y) represents the more granular DP bits. To give an example:
We can see that CS1’s binary representation is a 1 in the CS bits and a 0 in the DP bits. AF11, AF12, and AF13 all have a 1 in the CS bits but have a 1, 2, or 3 respectively in the DP bits. This means that AF has a relationship to the CS markings. The reason we say that DSCP is an extension to IP Precedence is due to this relationship. In a way you can think of the CS1 marking as AF10.
Now that you understand the relationship, let’s talk about priority. EF is the highest priority, then starting with CS7 we move from 7 being the highest priority to 1 being the lowest. When you get to CS4, that’s where the AF granularity comes into play, with AFx1 being the highest and AFx3 being the lowest priority within each class. This priority comes into play with weighted tail drop, which I will cover in the third blog post in this series.
Class of Service (COS) is a layer 2 classification tool for QoS. The one thing to note about COS is that its field is part of the 802.1Q header, meaning that COS can only come into play on trunked interfaces. Devices that are layer 2 only, such as a layer 2 switch, cannot read the layer 3 header and therefore cannot read the DSCP value in that header. This means that we need to represent the contents of the layer 3 header in the layer 2 header, and so COS is born.
The values of COS have the same granularity as IP Precedence. Like IP Precedence, COS is a 3-bit field and so can represent 8 different values (0-7). The higher the value the higher the priority, with the exception of 0, which represents default, and 1, which represents the scavenger class. So the priority order, from lowest to highest, is 1, 0, 2…7.
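The CS/AF relationship described above can be worked through in a few lines of Python. This is only a sketch: the helper names `cs_dscp`, `af_dscp`, and `ip_precedence` are mine, and it assumes the usual layout where the class (x) sits in the top three bits of the 6-bit DSCP value and the drop level (y) in the next two, which gives DSCP = 8x + 2y for AFxy and DSCP = 8x for CSx.

```python
def cs_dscp(x: int) -> int:
    """DSCP value for Class Selector CSx (x = 0-7): 8 * x."""
    if not 0 <= x <= 7:
        raise ValueError("CS class must be 0-7")
    return x << 3

def af_dscp(x: int, y: int) -> int:
    """DSCP value for AFxy (x = 1-4, y = 1-3): 8 * x + 2 * y."""
    if not (1 <= x <= 4 and 1 <= y <= 3):
        raise ValueError("AF class must be 1-4 and drop level 1-3")
    return (x << 3) | (y << 1)

def ip_precedence(dscp: int) -> int:
    """The top three bits of the DSCP value are the old IP Precedence."""
    return dscp >> 3

markings = [("CS1", cs_dscp(1)), ("AF11", af_dscp(1, 1)),
            ("AF12", af_dscp(1, 2)), ("AF13", af_dscp(1, 3))]
for name, value in markings:
    print(f"{name}: DSCP {value:2d} = {value:06b}, IP Precedence {ip_precedence(value)}")
```

All four markings share the same top three bits (001), so a device that only understands IP Precedence sees every one of them as precedence 1.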
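As a rough illustration, the sketch below derives a COS value from a DSCP value by copying the top three bits, and ranks COS values using the priority order just described. Copying the top three bits is an assumption on my part (it is a common default, but real switches use configurable DSCP-to-COS maps), and the helper names are made up for this example.

```python
# Priority order from the text, lowest to highest: scavenger (1), default (0), then 2-7
COS_PRIORITY_ORDER = [1, 0, 2, 3, 4, 5, 6, 7]

def cos_from_dscp(dscp: int) -> int:
    """Assumed default mapping: COS is the top three bits of the 6-bit DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp >> 3

def cos_rank(cos: int) -> int:
    """Position of a COS value in the priority order (0 = lowest priority)."""
    return COS_PRIORITY_ORDER.index(cos)

print(cos_from_dscp(46))               # 5 (EF lands in COS 5)
print(cos_from_dscp(10))               # 1 (AF11 lands in COS 1)
print(cos_rank(1) < cos_rank(0))       # True (scavenger ranks below default)
```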
WHEN TO USE QOS
Now that we have a high-level understanding of QoS, let’s finally answer the question: when to use QoS? For end to end QoS we first need to understand the network flow. What is my path from point A to point B? Does every device and every circuit support QoS? Where are my bottlenecks?
These three questions are key in understanding where to enable QoS. The first question is pretty self-explanatory: what is my path from point A to point B? Identify which devices the traffic you want to classify flows through. This question will give you a list of devices you will need to configure for QoS. Additionally, identify which applications you want to classify, and whether the application itself is QoS aware (that is, whether the application natively sets the TOS field on the client or server). This information is key when it comes to classifying traffic, which we will go over in the next blog.
Does every device and every circuit support QoS? In order for end to end QoS to work, every device on that list has to support QoS. It is important to know that the vast majority of commodity internet circuits don’t honor QoS markings. This matters because if congestion occurs on the carrier’s network, setting QoS won’t do anything to fix the problem. On the other hand, private WAN circuits such as MPLS do support QoS. Note that in some cases the provider will need to get involved to enable QoS support on their circuits (this was the case with my last QoS project), but I have yet to work with an MPLS provider that doesn’t offer QoS support.
Finally, where are the bottlenecks? Identifying where the network congestion occurs is important because it tells you where to shape and queue traffic. QoS, namely queuing, really only takes effect when there is congestion. I have often been asked, how can I verify that the router is queuing and dropping traffic correctly? The problem is that if there isn’t any congestion, then there is no need for the device to use the queues you created, and therefore until there is congestion it’s nearly impossible to see if everything works as intended. Identifying bottlenecks is key both because that is where QoS will be the most effective, and because those are the points in the network you will go to in order to verify QoS is working as intended.
Hopefully now you understand QoS a bit better, and when and where to apply QoS settings. As stated throughout this post, I will be posting two more blogs on this subject that go deeper into Classification and Marking, and Shaping and Queuing.
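For a sense of what “QoS aware” can look like on the application side, here is a small Python sketch of a program marking its own traffic. It assumes a Linux-style host where the `IP_TOS` socket option is honored (some operating systems ignore or restrict it), and the helper names are hypothetical. The DSCP value occupies the upper six bits of the TOS byte, so it is shifted left by two before being written.

```python
import socket

def tos_byte_for_dscp(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp << 2

def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP.

    Assumes the operating system honors IP_TOS; that is not guaranteed everywhere.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte_for_dscp(dscp))
    return sock

print(tos_byte_for_dscp(46))  # 184 (EF expressed as a TOS byte)
```

A voice application might then call `open_marked_udp_socket(46)` to send its media as EF. Whether the network trusts that marking is a separate decision made on the devices along the path.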
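The point that queuing only matters under congestion can be seen in a toy model. The Python sketch below is a deliberately simplified strict-priority link of my own invention, not how any particular router is implemented: it looks at a single interval, and anything that does not fit is dropped rather than buffered.

```python
def transmit(arrivals, capacity):
    """Send one interval's worth of packets over a link.

    arrivals: list of (name, priority) tuples in arrival order; a higher
              priority number means more important traffic.
    capacity: number of packets the link can send in this interval.
    Returns (sent, dropped).
    """
    if len(arrivals) <= capacity:
        # No congestion: everything goes out in arrival order, priority is never consulted
        return list(arrivals), []
    # Congestion: serve higher priority first; sorted() is stable, so packets
    # of equal priority keep their arrival order
    ranked = sorted(arrivals, key=lambda pkt: -pkt[1])
    return ranked[:capacity], ranked[capacity:]

packets = [("email-1", 0), ("voice-1", 5), ("email-2", 0), ("voice-2", 5)]

sent, dropped = transmit(packets, capacity=4)
print([name for name, _ in sent], [name for name, _ in dropped])

sent, dropped = transmit(packets, capacity=2)
print([name for name, _ in sent], [name for name, _ in dropped])
```

With a capacity of 4 every packet is sent untouched, so there is nothing to observe. Only when the capacity drops to 2 does the voice traffic get served first while the email traffic is dropped, which is why a bottleneck is the place to verify a policy.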
As always if you have any questions on Quality of Service for you and your business and would like to schedule a free consultation with us, please reach out to us at firstname.lastname@example.org and we’ll be happy to help!
Trevor Butler, Network Architect