From d0a81f67cd6286d32f42a167d19c7a387c23db79 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Thu, 3 Nov 2016 14:56:01 +0100 Subject: [PATCH 1/3] net: make default TX queue length a defined constant The default TX queue length of Ethernet devices have been a magic constant of 1000, ever since the initial git import. Looking back in historical trees[1][2] the value used to be 100, with the same comment "Ethernet wants good queues". The commit[3] that changed this from 100 to 1000 didn't describe why, but from conversations with Robert Olsson it seems that it was changed when Ethernet devices went from 100Mbit/s to 1Gbit/s, because the link speed increased x10 the queue size were also adjusted. This value later caused much heartache for the bufferbloat community. This patch merely moves the value into a defined constant. [1] https://git.kernel.org/cgit/linux/kernel/git/davem/netdev-vger-cvs.git/ [2] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/ [3] https://git.kernel.org/tglx/history/c/98921832c232 Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller --- include/net/pkt_sched.h | 2 ++ net/ethernet/eth.c | 3 ++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index cd334c9584e9..f1b76b8e6d2d 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -6,6 +6,8 @@ #include #include +#define DEFAULT_TX_QUEUE_LEN 1000 + struct qdisc_walker { int stop; int skip; diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index d9e2fe1da724..8c5a479681ca 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -62,6 +62,7 @@ #include #include #include +#include __setup("ether=", netdev_boot_setup); @@ -359,7 +360,7 @@ void ether_setup(struct net_device *dev) dev->min_mtu = ETH_MIN_MTU; dev->max_mtu = ETH_DATA_LEN; dev->addr_len = ETH_ALEN; - dev->tx_queue_len = 1000; /* Ethernet wants good queues */ + dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN; dev->flags = IFF_BROADCAST|IFF_MULTICAST; dev->priv_flags |= IFF_TX_SKB_SHARING; From 1159708432f7067b82388695e29d7105e79bd293 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Thu, 3 Nov 2016 14:56:06 +0100 Subject: [PATCH 2/3] net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len The flag IFF_NO_QUEUE marks virtual device drivers that doesn't need a default qdisc attached, given they will be backed by physical device, that already have a qdisc attached for pushback. It is still supported to attach a qdisc to a IFF_NO_QUEUE device, as this can be useful for difference policy reasons (e.g. bandwidth limiting containers). For this to work, the tx_queue_len need to have a sane value, because some qdiscs inherit/copy the tx_queue_len (namely, pfifo, bfifo, gred, htb, plug and sfb). Commit a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling ether_setup()") caught situations where some drivers didn't initialize tx_queue_len. The problem with the commit was choosing 1 as the fallback value. A qdisc queue length of 1 causes more harm than good, because it creates hard to debug situations for userspace. It gives userspace a false sense of a working config after attaching a qdisc. As low volume traffic (that doesn't activate the qdisc policy) works, like ping, while traffic that e.g. needs shaping cannot reach the configured policy levels, given the queue length is too small. This patch change the value to DEFAULT_TX_QUEUE_LEN, given other IFF_NO_QUEUE devices (that call ether_setup()) also use this value. Fixes: a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling ether_setup()") Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index f23e28668f32..0260ad314506 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7651,7 +7651,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, if (!dev->tx_queue_len) { dev->priv_flags |= IFF_NO_QUEUE; - dev->tx_queue_len = 1; + dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN; } dev->num_tx_queues = txqs; From 84c46dd865384a1613e797b967552e4a428b5202 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Thu, 3 Nov 2016 14:56:11 +0100 Subject: [PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device It is a clear misconfiguration to attach a qdisc to a device with tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred, htb, plug and sfb) inherit/copy this value as their queue length. Why should the kernel catch such a misconfiguration? Because prior to introducing the IFF_NO_QUEUE device flag, userspace found a loophole in the qdisc config system that allowed them to achieve the equivalent of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from a device. The loophole on older kernels is setting tx_queue_len=0, *prior* to device qdisc init (the config time is significant, simply setting tx_queue_len=0 doesn't trigger the loophole). This loophole is currently used by Docker[1] to get better performance and scalability out of the veth device. The Docker developers were warned[1] that they needed to adjust the tx_queue_len if ever attaching a qdisc. The OpenShift project didn't remember this warning and attached a qdisc, this were caught and fixed in[2]. [1] https://github.com/docker/libcontainer/pull/193 [2] https://github.com/openshift/origin/pull/11126 Instead of fixing every userspace program that used this loophole, and forgot to reset the tx_queue_len, prior to attaching a qdisc. Let's catch the misconfiguration on the kernel side. Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller --- net/sched/sch_api.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 206dc24add3a..f337f1bdd1d4 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -960,6 +960,17 @@ static struct Qdisc *qdisc_create(struct net_device *dev, sch->handle = handle; + /* This exist to keep backward compatible with a userspace + * loophole, what allowed userspace to get IFF_NO_QUEUE + * facility on older kernels by setting tx_queue_len=0 (prior + * to qdisc init), and then forgot to reinit tx_queue_len + * before again attaching a qdisc. + */ + if ((dev->priv_flags & IFF_NO_QUEUE) && (dev->tx_queue_len == 0)) { + dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN; + netdev_info(dev, "Caught tx_queue_len zero misconfig\n"); + } + if (!ops->init || (err = ops->init(sch, tca[TCA_OPTIONS])) == 0) { if (qdisc_is_percpu_stats(sch)) { sch->cpu_bstats =