linux/include
Eric Dumazet 3d9a0d2f82 dql: dql_queued() should write first to reduce bus transactions
While doing high throughput test on a BQL enabled NIC,
I found a very high cost in ndo_start_xmit() when accessing BQL data.

It turned out the problem was caused by compiler trying to be
smart, but involving a bad MESI transaction :

  0.05 │  mov    0xc0(%rax),%edi    // LOAD dql->num_queued
  0.48 │  mov    %edx,0xc8(%rax)    // STORE dql->last_obj_cnt = count
 58.23 │  add    %edx,%edi
  0.58 │  cmp    %edi,0xc4(%rax)
  0.76 │  mov    %edi,0xc0(%rax)    // STORE dql->num_queued += count
  0.72 │  js     bd8

I got an incredible 10 % gain [1] by making sure cpu do not attempt
to get the cache line in Shared mode, but directly requests for
ownership.

New code :
	mov    %edx,0xc8(%rax)  // STORE dql->last_obj_cnt = count
	add    %edx,0xc0(%rax)  // RMW   dql->num_queued += count
	mov    0xc4(%rax),%ecx  // LOAD dql->adj_limit
	mov    0xc0(%rax),%edx  // LOAD dql->num_queued
	cmp    %edx,%ecx

The TX completion was running from another cpu, with high interrupts
rate.

Note that I am using barrier() as a soft hint, as mb() here could be
too heavy cost.

[1] This was a netperf TCP_STREAM with TSO disabled, but GSO enabled.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:04:55 -04:00
..
acpi PCI updates for v3.17: 2014-09-19 10:50:30 -07:00
asm-generic
clocksource
crypto crypto: drbg - backport "fix maximum value checks on 32 bit systems" 2014-09-05 15:52:28 +08:00
drm
dt-bindings
keys
kvm
linux dql: dql_queued() should write first to reduce bus transactions 2014-09-29 00:04:55 -04:00
math-emu
media [media] vb2: fix VBI/poll regression 2014-09-21 20:57:30 -03:00
memory
misc
net net_sched: remove the first parameter from tcf_exts_destroy() 2014-09-28 17:29:01 -04:00
pcmcia
ras
rdma IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get 2014-09-19 09:55:42 -07:00
rxrpc include/rxrpc/types.h: Remove unused header 2014-08-29 20:33:39 -07:00
scsi [SCSI] fix regression that accidentally disabled block-based tcq 2014-09-19 13:23:32 +01:00
soc/tegra
sound
target
trace net: treewide: Fix typo found in DocBook/networking.xml 2014-09-05 17:35:28 -07:00
uapi Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2014-09-28 17:19:15 -04:00
video
xen xen/arm: introduce XENFEAT_grant_map_identity 2014-09-11 18:11:52 +00:00
Kbuild