Commit Graph

74720 Commits

Author SHA1 Message Date
Saeed Mahameed e725440e75 net/mlx5_core: Set/Query port MTU commands
Introduce set/Query low level functions to access MTU in hardware. To be
used by the netdev.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:24:08 -07:00
Rana Shahout 90b3e38d04 net/mlx5_core: Modify CQ moderation parameters
Introduce mlx5_core_modify_cq_moderation() to be used by the netdev, to
set hardware coalescing.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:23:59 -07:00
Rana Shahout 4c916a7980 net/mlx5_core: Implement get/set port status
Implemet get/set port status low level functions to be exposed by the
netdev.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:23:46 -07:00
Saeed Mahameed adb0c9545b net/mlx5_core: Implement access functions of ptys register fields
Those registers will be used by the ethtool to set/get settings.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:23:31 -07:00
Saeed Mahameed 938fe83c8d net/mlx5_core: New device capabilities handling
- Query all supported types of dev caps on driver load.
- Store the Cap data outbox per cap type into driver private data.
- Introduce new Macros to access/dump stored caps (using the auto
  generated data types).
- Obsolete SW representation of dev caps (no need for SW copy for each
  cap).
- Modify IB driver to use new macros for checking caps.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:23:22 -07:00
Saeed Mahameed e281682bf2 net/mlx5_core: HW data structs/types definitions cleanup
mlx5_ifc.h was heavily modified here since it is now generated by a
script from the device specification (PRM rev 0.25). This specification
is backward compatible to existing hardware.

Some structures/fields were added here in order to enable the Ethernet
functionality of the driver.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:23:11 -07:00
Saeed Mahameed db058a186f net/mlx5_core: Set irq affinity hints
Preparation for upcoming ethernet driver.
- Move msix array from eq_table struct to priv since its not related to
  eq_table
- Intorduce irq_info struct to hold all irq information
- Move name from mlx5_eq to irq_info struct since it is irq property.
- Set IRQ affinity hints

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:22:48 -07:00
Amir Vadai 64ffaa2159 net/mlx5_core,mlx5_ib: Do not use vmap() on coherent memory
As David Daney pointed in mlx4_core driver [1], mlx5_core is also
misusing the DMA-API.

This patch is removing the code that vmap() memory allocated by
dma_alloc_coherent().

After this patch, users of this drivers might fail allocating resources
on memory fragmeneted systems.  This will be fixed later on.

[1] - https://patchwork.ozlabs.org/patch/458531/

CC: David Daney <david.daney@cavium.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:22:37 -07:00
David S. Miller 8ed9b5e1c8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-05-28

This series contains updates to ethtool, ixgbe, i40e and i40evf.

John adds helper routines for ethtool to pass VF to rx_flow_spec.  Since
the ring_cookie is 64 bits wide which is much larger than what could be
used for actual queue index values, provide helper routines to pack a VF
index into the cookie.  Then John provides a ixgbe patch to allow flow
director to use the entire queue space.

Neerav provides a i40e patch to collect XOFF Rx stats, where it was not
being collected before.

Anjali provides ATR support for tunneled packets, as well as stats to
count tunnel ATR hits.  Cleaned up PF struct members which are
unnecessary, since we can use the stat index macro directly.  Cleaned
up flow director ATR/SB messages to a higher debug level since they
are not useful unless silicon validation is happening.

Greg provides a patch to disable offline diagnostics if VFs are enabled
since ethtool offline diagnostic tests are not designed (out of scope)
to disable VF functions for testing and re-enable afterward.  Also cleans
up TODO comment that is no longer needed.

Vasu provides a fix an FCoE EOF case where i40e_fcoe_ctxt_eof() maybe
called before i40e_fcoe_eof_is_supported() is called.

Jesse adds skb->xmit_more support for i40evf.  Then provides a performance
enhancement for i40evf by inlining some functions which provides a 15%
gain in small packet performance.  Also cleans up the use of time_stamp
since it is no longer used to determine if there is a tx_hang and was
a part of a previous tx_hang design which is no longer used.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:09:58 -07:00
Vivien Didelot f4fb874cf0 if_vlan: fix vlaue -> value typo
Fixes "vlaue" for "value" in include/linux/if_vlan.h.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 17:52:28 -07:00
Alexei Starovoitov 37e82c2f97 bpf: allow BPF programs access skb->skb_iif and skb->dev->ifindex fields
classic BPF already exposes skb->dev->ifindex via SKF_AD_IFINDEX extension.
Allow eBPF program to access it as well. Note that classic aborts execution
of the program if 'skb->dev == NULL' (which is inconvenient for program
writers), whereas eBPF returns zero in such case.
Also expose the 'skb_iif' field, since programs triggered by redirected
packet need to known the original interface index.
Summary:
__skb->ifindex         -> skb->dev->ifindex
__skb->ingress_ifindex -> skb->skb_iif

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 17:51:13 -07:00
Mathieu Olivari 5790cf3c00 stmmac: add phy-handle support to the platform layer
On stmmac driver, PHY specification in device-tree was done using the
non-standard property "snps,phy-addr". Specifying a PHY on a different
MDIO bus that the one within the stmmac controller doesn't seem to be
possible when device-tree is used.

This change adds support for the phy-handle property, as specified in
Documentation/devicetree/bindings/net/ethernet.txt.

Signed-off-by: Mathieu Olivari <mathieu@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 17:04:36 -07:00
Stephen Boyd f7b81d67d0 clk: qcom: Add support for NSS/GMAC clocks and resets
Add the NSS/GMAC clocks and the TCM clock and NSS resets.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Mathieu Olivari <mathieu@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 17:04:36 -07:00
John Fastabend 8cf6f497de ethtool: Add helper routines to pass vf to rx_flow_spec
The ring_cookie is 64 bits wide which is much larger than can be used
for actual queue index values. So provide some helper routines to
pack a VF index into the cookie. This is useful to steer packets to
a VF ring without having to know the queue layout of the device.

CC: Alex Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-05-28 03:31:12 -07:00
Eric Dumazet ed2dfd9009 tcp/dccp: warn user for preferred ip_local_port_range
After commit 07f4c90062 ("tcp/dccp: try to not exhaust
ip_local_port_range in connect()") it is advised to have an even number
of ports described in /proc/sys/net/ipv4/ip_local_port_range

This means start/end values should have a different parity.

Let's warn sysadmins of this, so that they can update their settings
if they want to.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-27 14:35:36 -04:00
Sunil Goutham e5c4708b2b pci: Add Cavium PCI vendor id
This vendor id will be used by network (vNIC), USB (xHCI),
SATA (AHCI), GPIO, I2C, MMC and maybe other drivers
for ThunderX SoC.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-27 14:19:44 -04:00
Florian Westphal d6b915e29f ip_fragment: don't forward defragmented DF packet
We currently always send fragments without DF bit set.

Thus, given following setup:

mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
   A           R1              R2         B

Where R1 and R2 run linux with netfilter defragmentation/conntrack
enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1
will respond with icmp too big error if one of these fragments exceeded
1400 bytes.

However, if R1 receives fragment sizes 1200 and 100, it would
forward the reassembled packet without refragmenting, i.e.
R2 will send an icmp error in response to a packet that was never sent,
citing mtu that the original sender never exceeded.

The other minor issue is that a refragmentation on R1 will conceal the
MTU of R2-B since refragmentation does not set DF bit on the fragments.

This modifies ip_fragment so that we track largest fragment size seen
both for DF and non-DF packets, and set frag_max_size to the largest
value.

If the DF fragment size is larger or equal to the non-df one, we will
consider the packet a path mtu probe:
We set DF bit on the reassembled skb and also tag it with a new IPCB flag
to force refragmentation even if skb fits outdev mtu.

We will also set DF bit on each fragment in this case.

Joint work with Hannes Frederic Sowa.

Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-27 13:03:31 -04:00
Florian Fainelli e463d88c36 net: phy: Add phy_interface_is_rgmii helper
RGMII interfaces come in 4 different flavors that the PHY library needs
to care about: regular RGMII (no delays), RGMII with either RX or TX
delay, and both. In order to avoid errors of checking only for one type
of RGMII interface and miss the 3 others, introduce a convenience
function which tests for all values.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-27 00:27:35 -04:00
Eric Dumazet 095dc8e0c3 tcp: fix/cleanup inet_ehash_locks_alloc()
If tcp ehash table is constrained to a very small number of buckets
(eg boot parameter thash_entries=128), then we can crash if spinlock
array has more entries.

While we are at it, un-inline inet_ehash_locks_alloc() and make
following changes :

- Budget 2 cache lines per cpu worth of 'spinlocks'
- Try to kmalloc() the array to avoid extra TLB pressure.
  (Most servers at Google allocate 8192 bytes for this hash table)
- Get rid of various #ifdef

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-26 19:48:46 -04:00
Eric Dumazet 7f1598678d ipv6: ipv6_select_ident() returns a __be32
ipv6_select_ident() returns a 32bit value in network order.

Fixes: 286c2349f6 ("ipv6: Clean up ipv6_select_ident() and ip6_fragment()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 20:27:11 -04:00
Martin KaFai Lau d52d3997f8 ipv6: Create percpu rt6_info
After the patch
'ipv6: Only create RTF_CACHE routes after encountering pmtu exception',
we need to compensate the performance hit (bouncing dst->__refcnt).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:35 -04:00
Martin KaFai Lau 8d0b94afdc ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister
This patch keeps track of the DST_NOCACHE routes in a list and replaces its
dev with loopback during the iface down/unregister event.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:34 -04:00
Martin KaFai Lau 3da59bd945 ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set
This patch always creates RTF_CACHE clone with DST_NOCACHE
when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to
the fl6->daddr.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Julian Anastasov <ja@ssi.bg>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:34 -04:00
Martin KaFai Lau b197df4f0f ipv6: Add rt6_get_cookie() function
Instead of doing the rt6->rt6i_node check whenever we need
to get the route's cookie.  Refactor it into rt6_get_cookie().
It is a prep work to handle FLOWI_FLAG_KNOWN_NH and also
percpu rt6_info later.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:34 -04:00
Martin KaFai Lau 45e4fd2668 ipv6: Only create RTF_CACHE routes after encountering pmtu exception
This patch creates a RTF_CACHE routes only after encountering a pmtu
exception.

After ip6_rt_update_pmtu() has inserted the RTF_CACHE route to the fib6
tree, the rt->rt6i_node->fn_sernum is bumped which will fail the
ip6_dst_check() and trigger a relookup.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:33 -04:00
Martin KaFai Lau 2647a9b070 ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST
When creating a RTF_CACHE route, RTF_ANYCAST is set based on rt6i_dst.
Also, rt6i_gateway is always set to the nexthop while the nexthop
could be a gateway or the rt6i_dst.addr.

After removing the rt6i_dst and rt6i_src dependency in the last patch,
we also need to stop the caller from depending on rt6i_gateway and
RTF_ANYCAST.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:33 -04:00
Martin KaFai Lau fd0273d793 ipv6: Remove external dependency on rt6i_dst and rt6i_src
This patch removes the assumptions that the returned rt is always
a RTF_CACHE entry with the rt6i_dst and rt6i_src containing the
destination and source address.  The dst and src can be recovered from
the calling site.

We may consider to rename (rt6i_dst, rt6i_src) to
(rt6i_key_dst, rt6i_key_src) later.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:32 -04:00
Martin KaFai Lau 286c2349f6 ipv6: Clean up ipv6_select_ident() and ip6_fragment()
This patch changes the ipv6_select_ident() signature to return a
fragment id instead of taking a whole frag_hdr as a param to
only set the frag_hdr->identification.

It also cleans up ip6_fragment() to obtain the fragment id at the
beginning instead of using multiple "if" later to check fragment id
has been generated or not.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:32 -04:00
Hannes Frederic Sowa a60e3cc7c9 net: make skb_splice_bits more configureable
Prepare skb_splice_bits to be able to deal with AF_UNIX sockets.

AF_UNIX sockets don't use lock_sock/release_sock and thus we have to
use a callback to make the locking and unlocking configureable.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 00:06:59 -04:00
Hannes Frederic Sowa be12a1fe29 net: skbuff: add skb_append_pagefrags and use it
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 00:06:58 -04:00
David S. Miller 36583eb54d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cadence/macb.c
	drivers/net/phy/phy.c
	include/linux/skbuff.h
	net/ipv4/tcp.c
	net/switchdev/switchdev.c

Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD}
renaming overlapping with net-next changes of various
sorts.

phy.c was a case of two changes, one adding a local
variable to a function whilst the second was removing
one.

tcp.c overlapped a deadlock fix with the addition of new tcp_info
statistic values.

macb.c involved the addition of two zyncq device entries.

skbuff.h involved adding back ipv4_daddr to nf_bridge_info
whilst net-next changes put two other existing members of
that struct into a union.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-23 01:22:35 -04:00
Linus Torvalds 0b6280c620 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Don't leak ipvs->sysctl_tbl, from Tommi Rentala.

 2) Fix neighbour table entry leak in rocker driver, from Ying Xue.

 3) Do not emit bonding notifications for unregistered interfaces, from
    Nicolas Dichtel.

 4) Set ipv6 flow label properly when in TIME_WAIT state, from Florent
    Fourcot.

 5) Fix regression in ipv6 multicast filter test, from Henning Rogge.

 6) do_replace() in various footables netfilter modules is missing a
    check for 0 counters in the datastructure provided by the user.  Fix
    from Dave Jones, and found with trinity.

 7) Fix RCU bug in packet scheduler classifier module unloads, from
    Daniel Borkmann.

 8) Avoid deadlock in tcp_get_info() by using u64_sync.  From Eric
    Dumzaet.

 9) Input packet processing can race with inetdev_destroy() teardown,
    fix potential OOPS in ip_error() by explicitly testing whether the
    inetdev is still attached.  From Eric W Biederman.

10) MLDv2 parser in bridge multicast code breaks too early while
    parsing.  Fix from Thadeu Lima de Souza Cascardo.

11) Asking for settings on non-zero PHYID doesn't work because we do not
    import the command structure from the user and use the PHYID
    provided there.  Fix from Arun Parameswaran.

12) Fix UDP checksums with IPV6 RAW sockets, from Vlad Yasevich.

13) Missing NF_TABLES depends for TPROXY etc can cause build failures,
    fix from Florian Westphal.

14) Fix netfilter conntrack to handle RFC5961 challenge ACKs properly,
    from Jesper Dangaard Brouer.

15) If netlink autobind retry fails, we have to reset the sockets portid
    back to zero.  From Herbert Xu.

16) VXLAN netns exit code unregisters using wrong device, from John W
    Linville.

17) Add some USB device IDs to ath3k and btusb bluetooth drivers, from
    Dmitry Tunin and Wen-chien Jesse Sung.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  bridge: fix lockdep splat
  net: core: 'ethtool' issue with querying phy settings
  bridge: fix parsing of MLDv2 reports
  ARM: zynq: DT: Use the zynq binding with macb
  net: macb: Disable half duplex gigabit on Zynq
  net: macb: Document zynq gem dt binding
  ipv4: fill in table id when replacing a route
  cdc_ncm: Fix tx_bytes statistics
  ipv4: Avoid crashing in ip_error
  tcp: fix a potential deadlock in tcp_get_info()
  net: sched: fix call_rcu() race on classifier module unloads
  net: phy: Make sure phy_start() always re-enables the phy interrupts
  ipv6: fix ECMP route replacement
  ipv6: do not delete previously existing ECMP routes if add fails
  Revert "netfilter: bridge: query conntrack about skb dnat"
  netfilter: ensure number of counters is >0 in do_replace()
  netfilter: nfnetlink_{log,queue}: Register pernet in first place
  tcp: don't over-send F-RTO probes
  tcp: only undo on partial ACKs in CA_Loss
  net/ipv6/udp: Fix ipv6 multicast socket filter regression
  ...
2015-05-22 15:44:50 -07:00
Linus Torvalds 1c8df7bd48 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Three small fixes that have been picked up the last few weeks.
  Specifically:

   - Fix a memory corruption issue in NVMe with malignant user
     constructed request.  From Christoph.

   - Kill (now) unused blk_queue_bio(), dm was changed to not need this
     anymore.  From Mike Snitzer.

   - Always use blk_schedule_flush_plug() from the io_schedule() path
     when flushing a plug, fixing a !TASK_RUNNING warning with md.  From
     Shaohua"

* 'for-linus' of git://git.kernel.dk/linux-block:
  sched: always use blk_schedule_flush_plug in io_schedule_out
  nvme: fix kernel memory corruption with short INQUIRY buffers
  block: remove export for blk_queue_bio
2015-05-22 15:15:30 -07:00
David S. Miller 572152adfb Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contain Netfilter fixes for your net tree, they are:

1) Fix a race in nfnetlink_log and nfnetlink_queue that can lead to a crash.
   This problem is due to wrong order in the per-net registration and netlink
   socket events. Patch from Francesco Ruggeri.

2) Make sure that counters that userspace pass us are higher than 0 in all the
   x_tables frontends. Discovered via Trinity, patch from Dave Jones.

3) Revert a patch for br_netfilter to rely on the conntrack status bits. This
   breaks stateless IPv6 NAT transformations. Patch from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22 14:25:45 -04:00
Eric Dumazet d654976cbf tcp: fix a potential deadlock in tcp_get_info()
Taking socket spinlock in tcp_get_info() can deadlock, as
inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
while packet processing can use the reverse locking order.

We could avoid this locking for TCP_LISTEN states, but lockdep would
certainly get confused as all TCP sockets share same lockdep classes.

[  523.722504] ======================================================
[  523.728706] [ INFO: possible circular locking dependency detected ]
[  523.734990] 4.1.0-dbg-DEV #1676 Not tainted
[  523.739202] -------------------------------------------------------
[  523.745474] ss/18032 is trying to acquire lock:
[  523.750002]  (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
[  523.758129]
[  523.758129] but task is already holding lock:
[  523.763968]  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
[  523.774661]
[  523.774661] which lock already depends on the new lock.
[  523.774661]
[  523.782850]
[  523.782850] the existing dependency chain (in reverse order) is:
[  523.790326]
-> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
[  523.796599]        [<ffffffff811126bb>] lock_acquire+0xbb/0x270
[  523.802565]        [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
[  523.808628]        [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
[  523.815273]        [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
[  523.822067]        [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
[  523.828199]        [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
[  523.834331]        [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
[  523.840202]        [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
[  523.847214]        [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
[  523.853440]        [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
[  523.859624]        [<ffffffff81659db7>] ip_rcv+0x307/0x420

Lets use u64_sync infrastructure instead. As a bonus, 64bit
arches get optimized, as these are nop for them.

Fixes: 0df48c26d8 ("tcp: add tcpi_bytes_acked to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22 13:46:06 -04:00
Florian Westphal bd5850d39f net: sched: pkt_cls: remove unused macros from uapi
Jamal points out that this header also contains kernel internal magic that
cannot be used from userspace for anything meaningful.

Lets remove what the kernel doesn't use anymore and wrap remainder with
__KERNEL__.

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 23:26:51 -04:00
Marcelo Ricardo Leitner 2efd055c53 tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
This patch tracks the total number of inbound and outbound segments on a
TCP socket. One may use this number to have an idea on connection
quality when compared against the retransmissions.

RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut

These are a 32bit field each and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->segs_out was placed near tp->snd_nxt for good data
locality and minimal performance impact, while tp->segs_in was placed
near tp->bytes_received for the same reason.

Join work with Eric Dumazet.

Note that received SYN are accounted on the listener, but sent SYNACK
are not accounted.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 23:25:21 -04:00
Linus Torvalds 865d872280 xen: bug fixes for 4.1-rc4
- Fix ARM build regression.
 - Fix VIRQ_CONSOLE related oops.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJVXGwGAAoJEFxbo/MsZsTR63EH/32TrMPyQXa/hkRbwhxAb63z
 6FyPLfoJ0I9eVS0VE+EICDQkhCX/6553dgTu3vJqLzVrB8Q3ULGqOywUA6yIJAed
 s/BPz6IcYl5sVaZt5S5KHdB3eCeF2SlFhWpyD52e3Ejs+Ykkd8F+YbS0XN2aIyDI
 VaSFQ9gW1gtBIOe/ie0nC7vhsZUTTysYctBspic4QrTJcVRv2rlvMbSb/XpZhb87
 S8DQ1PlPIvOFz/l3NVx0MH/rz281sS7ooaDc0QHTrlz5A+weIAc7YJtrUt0Hm9Vd
 HbPyhFNEwuPv6A1FlVO6XR6DeHqUW6Kbt83tgtnboly6Ye4UQfRBiGOD8rB7HuU=
 =Xsha
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull two xen bugfixes from David Vrabel:

 - fix ARM build regression.

 - fix VIRQ_CONSOLE related oops.

* tag 'for-linus-4.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/events: don't bind non-percpu VIRQs with percpu chip
  xen/arm: Define xen_arch_suspend()
2015-05-21 20:19:38 -07:00
Linus Torvalds 6efdb114b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
 "Bugfixes for HID subsystem that should go in 4.1.  Important
  highlights:

   - the patch that extended support for HID++ protocol for TK820
     touchpad turns out to be causing regressions due to firmware
     issues; patch reverting back to basic support from Benjamin
     Tissoires

   - Wacom driver can oops for devices that report non-touch data on
     touch interfaces.  Fix from Ping Cheng

   - gpiolib is not mandatory for i2c-hid, so the driver shouldn't fail
     if gpiolib is not enabled.  Fix from Mika Westerberg"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: wacom: fix an Oops caused by wacom_wac_finger_count_touches
  HID: usbhid: Add HID_QUIRK_NOGET for Aten DVI KVM switch
  HID: hid-sensor-hub: Fix debug lock warning
  Revert "HID: logitech-hidpp: support combo keyboard touchpad TK820"
  HID: i2c-hid: Do not fail probing if gpiolib is not enabled
2015-05-21 17:23:11 -07:00
Linus Torvalds 3d854120e9 The first set of clk fixes for 4.1 are all driver bugs, with the
exception of a single locking fix in the core code. All driver fixes are
 for code that was merged recently. The Samsung stuff is mostly fixes
 around suspend/resume, the Qualcomm fixes are for invalid hardware
 configuration data and the Silicon Labs patches are fixes following
 their move away from platform_data to Device Tree.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVW/1LAAoJEKI6nJvDJaTUuTEP/i0mykL1zAXOoUDAVOsULkLq
 55MUnknjfoh4CHeLRru+XeBkGNDjLKPNg4y47NadrMvRsav3sqY9WR6HZasB3roa
 uGanJZRr1nVqaD3l2ki8twc4FgNfeEaFfY018Z3dDLSYbnsq990hWQChgTGLxuDN
 iz9fh8DdOALa9dp2qo604oUjVN9QU+rqRClA1d8JhtiEfeCkTjzUBO8pkJ5ZO8ve
 sRgQ6TkY0u69rVTYspot1cwkPLL8gCiX+nTazakhKNxEg6W1q7PrrHtZsWms47P5
 SpemNaSnwwRBy1wYnbZAxgsLOmg/g7seICN/uz/OyR9ZgzRJz66HeI9yPceaCbnz
 9eNxGtsDWvE5uejoqIBqivFULTtuUdo5ZOe5Xwz/b35Hy0l79T5Ag7NmPNwkAqOd
 WOSMuv581tnRpgHz3bG+PFknsjqzCKCoMfW3bjHIH56lZA9AMlJhKuV6g85XY3WA
 WAF+QtBWqD76TMzcf3DMg9egGn2321jvs5/I8jRMa9xCgV0Ycam0DwzDZ6G8KAhK
 nWWoUyQSJU0tElphfylqSFR+00anovbB/BORGwMhZ9gohDR4UCdtCK2FwOxRY7kg
 SWU1ruDpFSfLMstpwU9Rb+9F0TtSkxWWPtk23bto83erYXzx0ezwhLIA/pEPi+mO
 chOqdtgPvtfS3NeBwB/5
 =yjsw
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Michael Turquette:
 "The first set of clk fixes for 4.1 are all driver bugs, with the
  exception of a single locking fix in the core code.

  All driver fixes are for code that was merged recently.  The Samsung
  stuff is mostly fixes around suspend/resume, the Qualcomm fixes are
  for invalid hardware configuration data and the Silicon Labs patches
  are fixes following their move away from platform_data to Device Tree"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: si5351: Do not pass struct clk in platform_data
  clk: si5351: Mention clock-names in the binding documentation
  clk: add missing lock when call clk_core_enable in clk_set_parent
  clk: exynos5420: Restore GATE_BUS_TOP on suspend
  clk: qcom: Fix MSM8916 gfx3d_clk_src configuration
  clk: qcom: Fix MSM8916 venus divider value
  clk: exynos5433: Fix wrong PMS value of exynos5433_pll_rates
  clk: exynos5433: Fix wrong parent clock of sclk_apollo clock
  clk: exynos5433: Fix CLK_PCLK_MONOTONIC_CNT clk register assignment
  clk: exynos5433: Fix wrong offset of PCLK_MSCL_SECURE_SMMU_JPEG
  clk: Use CONFIG_ARCH_EXYNOS instead of CONFIG_ARCH_EXYNOS5433
2015-05-21 16:57:50 -07:00
Eric Dumazet f5af1f57a2 inet_hashinfo: remove bsocket counter
We no longer need bsocket atomic counter, as inet_csk_get_port()
calls bind_conflict() regardless of its value, after commit
2b05ad33e1 ("tcp: bind() fix autoselection to share ports")

This patch removes overhead of maintaining this counter and
double inet_csk_get_port() calls under pressure.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Flavio Leitner <fbl@redhat.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 18:55:32 -04:00
Alexei Starovoitov 04fd61ab36 bpf: allow bpf programs to tail-call other bpf programs
introduce bpf_tail_call(ctx, &jmp_table, index) helper function
which can be used from BPF programs like:
int bpf_prog(struct pt_regs *ctx)
{
  ...
  bpf_tail_call(ctx, &jmp_table, index);
  ...
}
that is roughly equivalent to:
int bpf_prog(struct pt_regs *ctx)
{
  ...
  if (jmp_table[index])
    return (*jmp_table[index])(ctx);
  ...
}
The important detail that it's not a normal call, but a tail call.
The kernel stack is precious, so this helper reuses the current
stack frame and jumps into another BPF program without adding
extra call frame.
It's trivially done in interpreter and a bit trickier in JITs.
In case of x64 JIT the bigger part of generated assembler prologue
is common for all programs, so it is simply skipped while jumping.
Other JITs can do similar prologue-skipping optimization or
do stack unwind before jumping into the next program.

bpf_tail_call() arguments:
ctx - context pointer
jmp_table - one of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table
index - index in the jump table

Since all BPF programs are idenitified by file descriptor, user space
need to populate the jmp_table with FDs of other BPF programs.
If jmp_table[index] is empty the bpf_tail_call() doesn't jump anywhere
and program execution continues as normal.

New BPF_MAP_TYPE_PROG_ARRAY map type is introduced so that user space can
populate this jmp_table array with FDs of other bpf programs.
Programs can share the same jmp_table array or use multiple jmp_tables.

The chain of tail calls can form unpredictable dynamic loops therefore
tail_call_cnt is used to limit the number of calls and currently is set to 32.

Use cases:
Acked-by: Daniel Borkmann <daniel@iogearbox.net>

==========
- simplify complex programs by splitting them into a sequence of small programs

- dispatch routine
  For tracing and future seccomp the program may be triggered on all system
  calls, but processing of syscall arguments will be different. It's more
  efficient to implement them as:
  int syscall_entry(struct seccomp_data *ctx)
  {
     bpf_tail_call(ctx, &syscall_jmp_table, ctx->nr /* syscall number */);
     ... default: process unknown syscall ...
  }
  int sys_write_event(struct seccomp_data *ctx) {...}
  int sys_read_event(struct seccomp_data *ctx) {...}
  syscall_jmp_table[__NR_write] = sys_write_event;
  syscall_jmp_table[__NR_read] = sys_read_event;

  For networking the program may call into different parsers depending on
  packet format, like:
  int packet_parser(struct __sk_buff *skb)
  {
     ... parse L2, L3 here ...
     __u8 ipproto = load_byte(skb, ... offsetof(struct iphdr, protocol));
     bpf_tail_call(skb, &ipproto_jmp_table, ipproto);
     ... default: process unknown protocol ...
  }
  int parse_tcp(struct __sk_buff *skb) {...}
  int parse_udp(struct __sk_buff *skb) {...}
  ipproto_jmp_table[IPPROTO_TCP] = parse_tcp;
  ipproto_jmp_table[IPPROTO_UDP] = parse_udp;

- for TC use case, bpf_tail_call() allows to implement reclassify-like logic

- bpf_map_update_elem/delete calls into BPF_MAP_TYPE_PROG_ARRAY jump table
  are atomic, so user space can build chains of BPF programs on the fly

Implementation details:
=======================
- high performance of bpf_tail_call() is the goal.
  It could have been implemented without JIT changes as a wrapper on top of
  BPF_PROG_RUN() macro, but with two downsides:
  . all programs would have to pay performance penalty for this feature and
    tail call itself would be slower, since mandatory stack unwind, return,
    stack allocate would be done for every tailcall.
  . tailcall would be limited to programs running preempt_disabled, since
    generic 'void *ctx' doesn't have room for 'tail_call_cnt' and it would
    need to be either global per_cpu variable accessed by helper and by wrapper
    or global variable protected by locks.

  In this implementation x64 JIT bypasses stack unwind and jumps into the
  callee program after prologue.

- bpf_prog_array_compatible() ensures that prog_type of callee and caller
  are the same and JITed/non-JITed flag is the same, since calling JITed
  program from non-JITed is invalid, since stack frames are different.
  Similarly calling kprobe type program from socket type program is invalid.

- jump table is implemented as BPF_MAP_TYPE_PROG_ARRAY to reuse 'map'
  abstraction, its user space API and all of verifier logic.
  It's in the existing arraymap.c file, since several functions are
  shared with regular array map.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 17:07:59 -04:00
Eric Dumazet eb9344781a tcp: add a force_schedule argument to sk_stream_alloc_skb()
In commit 8e4d980ac2 ("tcp: fix behavior for epoll edge trigger")
we fixed a possible hang of TCP sockets under memory pressure,
by allowing sk_stream_alloc_skb() to use sk_forced_mem_schedule()
if no packet is in socket write queue.

It turns out there are other cases where we want to force memory
schedule :

tcp_fragment() & tso_fragment() need to split a big TSO packet into
two smaller ones. If we block here because of TCP memory pressure,
we can effectively block TCP socket from sending new data.
If no further ACK is coming, this hang would be definitive, and socket
has no chance to effectively reduce its memory usage.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 16:56:40 -04:00
Florian Westphal faecbb45eb Revert "netfilter: bridge: query conntrack about skb dnat"
This reverts commit c055d5b03b.

There are two issues:
'dnat_took_place' made me think that this is related to
-j DNAT/MASQUERADE.

But thats only one part of the story.  This is also relevant for SNAT
when we undo snat translation in reverse/reply direction.

Furthermore, I originally wanted to do this mainly to avoid
storing ipv6 addresses once we make DNAT/REDIRECT work
for ipv6 on bridges.

However, I forgot about SNPT/DNPT which is stateless.

So we can't escape storing address for ipv6 anyway. Might as
well do it for ipv4 too.

Reported-and-tested-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-20 13:51:25 +02:00
Andy Zhou 06b2c61c92 ip: remove unused function prototype
ip_do_nat() function was removed prior to kernel 3.4. Remove the
unnecessary function prototype as well.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 16:54:36 -04:00
Daniel Borkmann 492135557d tcp: add rfc3168, section 6.1.1.1. fallback
This work as a follow-up of commit f7b3bec6f5 ("net: allow setting ecn
via routing table") and adds RFC3168 section 6.1.1.1. fallback for outgoing
ECN connections. In other words, this work adds a retry with a non-ECN
setup SYN packet, as suggested from the RFC on the first timeout:

  [...] A host that receives no reply to an ECN-setup SYN within the
  normal SYN retransmission timeout interval MAY resend the SYN and
  any subsequent SYN retransmissions with CWR and ECE cleared. [...]

Schematic client-side view when assuming the server is in tcp_ecn=2 mode,
that is, Linux default since 2009 via commit 255cac91c3 ("tcp: extend
ECN sysctl to allow server-side only ECN"):

 1) Normal ECN-capable path:

    SYN ECE CWR ----->
                <----- SYN ACK ECE
            ACK ----->

 2) Path with broken middlebox, when client has fallback:

    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
            SYN ----->
                <----- SYN ACK
            ACK ----->

In case we would not have the fallback implemented, the middlebox drop
point would basically end up as:

    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)

In any case, it's rather a smaller percentage of sites where there would
occur such additional setup latency: it was found in end of 2014 that ~56%
of IPv4 and 65% of IPv6 servers of Alexa 1 million list would negotiate
ECN (aka tcp_ecn=2 default), 0.42% of these webservers will fail to connect
when trying to negotiate with ECN (tcp_ecn=1) due to timeouts, which the
fallback would mitigate with a slight latency trade-off. Recent related
paper on this topic:

  Brian Trammell, Mirja Kühlewind, Damiano Boppart, Iain Learmonth,
  Gorry Fairhurst, and Richard Scheffenegger:
    "Enabling Internet-Wide Deployment of Explicit Congestion Notification."
    Proc. PAM 2015, New York.
  http://ecn.ethz.ch/ecn-pam15.pdf

Thus, when net.ipv4.tcp_ecn=1 is being set, the patch will perform RFC3168,
section 6.1.1.1. fallback on timeout. For users explicitly not wanting this
which can be in DC use case, we add a net.ipv4.tcp_ecn_fallback knob that
allows for disabling the fallback.

tp->ecn_flags are not being cleared in tcp_ecn_clear_syn() on output, but
rather we let tcp_ecn_rcv_synack() take that over on input path in case a
SYN ACK ECE was delayed. Thus a spurious SYN retransmission will not prevent
ECN being negotiated eventually in that case.

Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Dave That <dave.taht@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 16:53:37 -04:00
Parav Pandit c9a70d4346 net-next: ethtool: Added port speed macros.
Signed-off-by: Parav Pandit <parav.pandit@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 16:32:18 -04:00
David Vrabel 77bb3dfdc0 xen/events: don't bind non-percpu VIRQs with percpu chip
A non-percpu VIRQ (e.g., VIRQ_CONSOLE) may be freed on a different
VCPU than it is bound to.  This can result in a race between
handle_percpu_irq() and removing the action in __free_irq() because
handle_percpu_irq() does not take desc->lock.  The interrupt handler
sees a NULL action and oopses.

Only use the percpu chip/handler for per-CPU VIRQs (like VIRQ_TIMER).

  # cat /proc/interrupts | grep virq
   40:      87246          0  xen-percpu-virq      timer0
   44:          0          0  xen-percpu-virq      debug0
   47:          0      20995  xen-percpu-virq      timer1
   51:          0          0  xen-percpu-virq      debug1
   69:          0          0   xen-dyn-virq      xen-pcpu
   74:          0          0   xen-dyn-virq      mce
   75:         29          0   xen-dyn-virq      hvc_console

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org>
2015-05-19 19:55:36 +01:00
Eric Dumazet b5d721d761 inet: properly align icsk_ca_priv
tcp_illinois and upcoming tcp_cdg require 64bit alignment of
icsk_ca_priv

x86 does not care, but other architectures might.

Fixes: 05cbc0db03 ("ipv4: Create probe timer for tcp PMTU as per RFC4821")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Fan Du <fan.du@intel.com>
Acked-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 11:08:00 -04:00
Andy Zhou 49d16b23cd bridge_netfilter: No ICMP packet on IPv4 fragmentation error
When bridge netfilter re-fragments an IP packet for output, all
packets that can not be re-fragmented to their original input size
should be silently discarded.

However, current bridge netfilter output path generates an ICMP packet
with 'size exceeded MTU' message for such packets, this is a bug.

This patch refactors the ip_fragment() API to allow two separate
use cases. The bridge netfilter user case will not
send ICMP, the routing output will, as before.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 00:15:39 -04:00