Commit Graph

55576 Commits

Author SHA1 Message Date
Gao feng c296bb4d5d netfilter: nf_conntrack: refactor l4proto support for netns
Move the code that register/unregister l4proto to the
module_init/exit context.

Given that we have to modify some interfaces to accomodate
these changes, it is a good time to use shorter function names
for this using the nf_ct_* prefix instead of nf_conntrack_*,
that is:

nf_ct_l4proto_register
nf_ct_l4proto_pernet_register
nf_ct_l4proto_unregister
nf_ct_l4proto_pernet_unregister

We same many line breaks with it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 14:40:53 +01:00
Gao feng 6330750d56 netfilter: nf_conntrack: refactor l3proto support for netns
Move the code that register/unregister l3proto to the
module_init/exit context.

Given that we have to modify some interfaces to accomodate
these changes, it is a good time to use shorter function names
for this using the nf_ct_* prefix instead of nf_conntrack_*,
that is:

nf_ct_l3proto_register
nf_ct_l3proto_pernet_register
nf_ct_l3proto_unregister
nf_ct_l3proto_pernet_unregister

We same many line breaks with it.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 14:39:20 +01:00
Gao feng 04d8700179 netfilter: nf_ct_proto: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:56:33 +01:00
Gao feng 5f69b8f521 netfilter: nf_ct_labels: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:56:23 +01:00
Gao feng 5e615b2200 netfilter: nf_ct_helper: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:56:13 +01:00
Gao feng 8684094cf1 netfilter: nf_ct_timeout: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:56:02 +01:00
Gao feng 3fe0f943d4 netfilter: nf_ct_ecache: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:55:50 +01:00
Gao feng 73f4001a52 netfilter: nf_ct_tstamp: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:55:39 +01:00
Gao feng b7ff3a1fae netfilter: nf_ct_acct: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:55:29 +01:00
Gao feng 83b4dbe198 netfilter: nf_ct_expect: move initialization out of pernet_operations
Move the global initial codes to the module_init/exit context.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:55:00 +01:00
Gao feng f94161c1bb netfilter: nf_conntrack: move initialization out of pernet operations
nf_conntrack initialization and cleanup codes happens in pernet
operations function. This task should be done in module_init/exit.
We can't use init_net to identify if it's the right time to initialize
or cleanup since we cannot make assumption on the order netns are
created/destroyed.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23 12:53:35 +01:00
Pablo Neira Ayuso 8a454ab95e netfilter: add missing xt_connlabel.h header in installation
In (c539f01 netfilter: add connlabel conntrack extension), it
was missing the change to the Kbuild file to install the header
in the system.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-21 13:46:49 +01:00
Pablo Neira Ayuso e7db3cbcd6 netfilter: add missing xt_bpf.h header in installation
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-21 12:30:59 +01:00
Willem de Bruijn e6f30c7317 netfilter: x_tables: add xt_bpf match
Support arbitrary linux socket filter (BPF) programs as x_tables
match rules. This allows for very expressive filters, and on
platforms with BPF JIT appears competitive with traditional
hardcoded iptables rules using the u32 match.

The size of the filter has been artificially limited to 64
instructions maximum to avoid bloating the size of each rule
using this new match.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-21 12:20:19 +01:00
Florian Westphal 9b21f6a909 netfilter: ctnetlink: allow userspace to modify labels
Add the ability to set/clear labels assigned to a conntrack
via ctnetlink.

To allow userspace to only alter specific bits, Pablo suggested to add
a new CTA_LABELS_MASK attribute:

The new set of active labels is then determined via

active = (active & ~mask) ^ changeset

i.e., the mask selects those bits in the existing set that should be
changed.

This follows the same method already used by MARK and CONNMARK targets.

Omitting CTA_LABELS_MASK is the same as setting all bits in CTA_LABELS_MASK
to 1: The existing set is replaced by the one from userspace.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-18 00:28:17 +01:00
Florian Westphal 0ceabd8387 netfilter: ctnetlink: deliver labels to userspace
Introduce CTA_LABELS attribute to send a bit-vector of currently active labels
to userspace.

Future patch will permit userspace to also set/delete active labels.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-18 00:28:16 +01:00
Florian Westphal c539f01717 netfilter: add connlabel conntrack extension
similar to connmarks, except labels are bit-based; i.e.
all labels may be attached to a flow at the same time.

Up to 128 labels are supported.  Supporting more labels
is possible, but requires increasing the ct offset delta
from u8 to u16 type due to increased extension sizes.

Mapping of bit-identifier to label name is done in userspace.

The extension is enabled at run-time once "-m connlabel" netfilter
rules are added.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-18 00:28:15 +01:00
Kevin Cernekee 7266507d89 netfilter: nf_ct_sip: support Cisco 7941/7945 IP phones
Most SIP devices use a source port of 5060/udp on SIP requests, so the
response automatically comes back to port 5060:

    phone_ip:5060 -> proxy_ip:5060   REGISTER
    proxy_ip:5060 -> phone_ip:5060   100 Trying

The newer Cisco IP phones, however, use a randomly chosen high source
port for the SIP request but expect the response on port 5060:

    phone_ip:49173 -> proxy_ip:5060  REGISTER
    proxy_ip:5060 -> phone_ip:5060   100 Trying

Standard Linux NAT, with or without nf_nat_sip, will send the reply back
to port 49173, not 5060:

    phone_ip:49173 -> proxy_ip:5060  REGISTER
    proxy_ip:5060 -> phone_ip:49173  100 Trying

But the phone is not listening on 49173, so it will never see the reply.

This patch modifies nf_*_sip to work around this quirk by extracting
the SIP response port from the Via: header, iff the source IP in the
packet header matches the source IP in the SIP request.

Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-17 21:12:44 +01:00
David S. Miller 4b87f92259 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	Documentation/networking/ip-sysctl.txt
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c

Both conflicts were simply overlapping context.

A build fix for qlcnic is in here too, simply removing the added
devinit annotations which no longer exist.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-15 15:05:59 -05:00
David S. Miller 47fb3a26e2 Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains netfilter fixes for 3.8-rc3,
they are:

* fix possible BUG_ON if several netns are in use and the nf_conntrack
  module is removed, initial patch from Gao feng, final patch from myself.

* fix unset return value if conntrack zone are disabled at
  compile-time, reported by Borislav Petkov, fix from myself.

* fix display error message via dmesg for arp_tables, from Jan Engelhardt.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 18:26:41 -05:00
Paul Moore 5dbbaf2de8 tun: fix LSM/SELinux labeling of tun/tap devices
This patch corrects some problems with LSM/SELinux that were introduced
with the multiqueue patchset.  The problem stems from the fact that the
multiqueue work changed the relationship between the tun device and its
associated socket; before the socket persisted for the life of the
device, however after the multiqueue changes the socket only persisted
for the life of the userspace connection (fd open).  For non-persistent
devices this is not an issue, but for persistent devices this can cause
the tun device to lose its SELinux label.

We correct this problem by adding an opaque LSM security blob to the
tun device struct which allows us to have the LSM security state, e.g.
SELinux labeling information, persist for the lifetime of the tun
device.  In the process we tweak the LSM hooks to work with this new
approach to TUN device/socket labeling and introduce a new LSM hook,
security_tun_dev_attach_queue(), to approve requests to attach to a
TUN queue via TUNSETQUEUE.

The SELinux code has been adjusted to match the new LSM hooks, the
other LSMs do not make use of the LSM TUN controls.  This patch makes
use of the recently added "tun_socket:attach_queue" permission to
restrict access to the TUNSETQUEUE operation.  On older SELinux
policies which do not define the "tun_socket:attach_queue" permission
the access control decision for TUNSETQUEUE will be handled according
to the SELinux policy's unknown permission setting.

Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Eric Paris <eparis@parisplace.org>
Tested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 18:16:59 -05:00
Florian Fainelli f9a8f83b04 net: phy: remove flags argument from phy_{attach, connect, connect_direct}
The flags argument of the phy_{attach,connect,connect_direct} functions
is then used to assign a struct phy_device dev_flags with its value.
All callers but the tg3 driver pass the flag 0, which results in the
underlying PHY drivers in drivers/net/phy/ not being able to actually
use any of the flags they would set in dev_flags. This patch gets rid of
the flags argument, and passes phydev->dev_flags to the internal PHY
library call phy_attach_direct() such that drivers which actually modify
a phy device dev_flags get the value preserved for use by the underlying
phy driver.

Acked-by: Kosta Zertsekel <konszert@marvell.com>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 15:11:50 -05:00
Benjamin LaHaise c1b52739e4 pkt_sched: namespace aware act_mirred
Eric Dumazet pointed out that act_mirred needs to find the current net_ns,
and struct net pointer is not provided in the call chain.  His original
patch made use of current->nsproxy->net_ns to find the network namespace,
but this fails to work correctly for userspace code that makes use of
netlink sockets in different network namespaces.  Instead, pass the
"struct net *" down along the call chain to where it is needed.

This version removes the ifb changes as Eric has submitted that patch
separately, but is otherwise identical to the previous version.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 15:09:36 -05:00
YOSHIFUJI Hideaki / 吉藤英明 6059283378 ipv6 netevent: Remove old_neigh from netevent_redirect.
The only user is cxgb3 driver.

old_neigh is used to check device change, but it must not happen
on redirect.  In this sense, we can remove old_neigh argument.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 15:04:59 -05:00
Linus Torvalds 9bbcbad438 Sound fixes for 3.8-rc4
Most of commits found here are for ASoC device specific fixes,
 arizona, cs4271, wm5102, wm2200, etc, in addition to a couple of
 memory leak fixes in ASoC core.
 
 Other than that, regression fixes in HD-audio and USB-audio, and
 a fix for new Realtek codecs.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ88mfAAoJEGwxgFQ9KSmkbXkP/3NGkP2N7DX6qK9pJ00sR9dY
 gMmu0yM+W/jJZ0Xf2yU7x/sZdWrUbtICOZ67cOrG7M9W3DIOzMtvaiMCIh12+LUH
 xU1KRPfVe+oLGy6eQBR5o45z4DNNwgNZ9LpKv8/DySrRMJ60hJ4jxJKSnMmTyCe7
 vMlp9k1nJP41jkZX1rRW3JrBBSMB9NHWiltm+GIZJ3VKqM3+uiJJZaMAWcjnMiYy
 tnVOxomJzNPCxv9Bv2GzgD0MII0G8olYhXeVJP3B76EnNbK6G5PbxmkTp+/oKVQf
 fU2Z5FheqqT3mjYeYo++vj0E5Dau+iYU02z0uHMMErUCpKTvwGa2szBH1PPOqcHU
 H85EBJ12OQt8vNNpGSKdPBtAqqOJm9C1jcT1387Dubq9Oz2X113Zl7QI3xH5u91s
 8lDhJI4XhlDwjSe5SUCHsdIwG4uQM9UOdc8hEJkCMemu2cFQTJUN5mwRJsd9B6Po
 ACB231GwNsRRq28nh1AxtroWUlO7exWOpsbqwoF6zuhwtUzR1Vm87Vd4go/T2wTD
 Lvyt5rDSnhFHWELkVBTk3/U/6fn0aQ8pR0wuScF1VoQCF1vphlEBWuHUOnorFmd+
 gt1T4pp4jc/xmCZLdMXlCcsctPb+5HRVO+DkfCh9RYDvanh8HfgUF/WlhF361qXr
 Vu4uLsazWMJSFX/gXTzn
 =iVG2
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Most of commits found here are for ASoC device specific fixes,
  arizona, cs4271, wm5102, wm2200, etc, in addition to a couple of
  memory leak fixes in ASoC core.

  Other than that, regression fixes in HD-audio and USB-audio, and a fix
  for new Realtek codecs."

* tag 'sound-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (30 commits)
  ALSA: usb-audio: Fix NULL dereference by access to non-existing substream
  ALSA: hda - Add support of new codec ALC284
  ALSA: usb-audio: Make ebox44_table static
  ALSA: hdspm - Fix wordclock status on AES32
  Revert "ALSA: hda - Shut up pins at power-saving mode with Conexnat codecs"
  ALSA: hda - Disable runtime D3 for Intel CPT & co
  ALSA: pxa27x: fix ac97 warm reset
  ALSA: pxa27x: fix ac97 cold reset
  ASoC: wm_adsp: Ensure that block writes are from DMA aligned addresses
  ASoC: wm2000: Fix sense of speech clarity enable
  ASoC: wm5100: Remove DSP B and left justified formats
  ASoC: arizona: Remove DSP B and left justified AIF modes
  ASoC: wm2200: Remove DSP B and left justified AIF modes
  ASoC: wm5102: Improve speaker enable performance
  ASoC: core: fix the memory leak in case of remove_aux_dev()
  ASoC: core: fix the memory leak in case of device_add() failure
  ASoC: cs42l52: Catch no-match case in cs42l52_get_clk
  ASoC: lm49453: Update lm49453_reg_defs values as per LM49453 HW revision-B
  ASoC: lm49453: Fix adc, mic and sidetone volume ranges
  ASoC: arizona: Correct FLL source definitions
  ...
2013-01-14 10:56:05 -08:00
YOSHIFUJI Hideaki / 吉藤英明 38675170e4 ipv6: 64bit version of ipv6_prefix_equal().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:01 -05:00
YOSHIFUJI Hideaki / 吉藤英明 2ef9733203 ipv6: Remove __ipv6_prefix_equal().
ipv6_prefix_equal() just casts its arguments and it is the only
user of __ipv6_prefix_equal().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:01 -05:00
YOSHIFUJI Hideaki / 吉藤英明 5206c579da ipv6: 64bit version of ipv6_addr_set().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:01 -05:00
YOSHIFUJI Hideaki / 吉藤英明 a04d40b895 ipv6: 64bit version of ipv6_addr_v4mapped().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:00 -05:00
YOSHIFUJI Hideaki / 吉藤英明 e287656b36 ipv6: 64bit version of ipv6_addr_loopback().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:00 -05:00
YOSHIFUJI Hideaki / 吉藤英明 9f2e73345a ipv6: 64bit version of ipv6_addr_diff().
Introduce __ipv6_addr_diff64() to to find the first different
bit between two addresses on 64bit architectures.

32bit version is still available as __ipv6_addr_diff32(),
and __ipv6_addr_diff() automatically selects appropriate
version.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14 13:17:00 -05:00
Linus Torvalds 3441f0d26d Driver core fixes for 3.8-rc3
Here are two patches for 3.8-rc3.
 
 One removes the __dev* defines from init.h now that all usages of it are gone
 from your tree.  The other fix is for debugfs's paramater that was using the
 wrong base for the option.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlDzjcAACgkQMUfUDdst+ykJVwCcDqiKrO9p0dcH9WXN5aukBWX/
 N8EAoK786v7PjtiVyNOJ/cPUDU8OHUpg
 =U4nL
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg Kroah-Hartman:
 "Here are two patches for 3.8-rc3.

  One removes the __dev* defines from init.h now that all usages of it
  are gone from your tree.  The other fix is for debugfs's paramater
  that was using the wrong base for the option.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'driver-core-3.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  debugfs: convert gid= argument from decimal, not octal
  Remove __dev* markings from init.h
2013-01-14 09:07:11 -08:00
Linus Torvalds 6843cc0e0f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix regression allowing IP_TTL setting of zero, fix from Cong Wang.

 2) Fix leak regressions in tunap, from Jason Wang.

 3) be2net driver always returns IRQ_HANDLED in INTx handler, fix from
    Sathya Perla.

 4) qlge doesn't really support NETIF_F_TSO6, don't set that flag.  Fix
    from Amerigo Wang.

 5) Add 802.11ad Atheros wil6210 driver, from Vladimir Kondratiev.

 6) Fix MTU calculations in mac80211 layer, from T Krishna Chaitanya.

 7) Station info layer of mac80211 needs to use del_timer_sync(), from
    Johannes Berg.

 8) tcp_read_sock() can loop forever, because we don't immediately stop
    when recv_actor() returns zero.  Fix from Eric Dumazet.

 9) Fix WARN_ON() in tcp_cleanup_rbuf().  We have to use sk_eat_skb() in
    tcp_recv_skb() to handle the case where a large GRO packet is split
    up while it is use by a splice() operation.  Fix also from Eric
    Dumazet.

10) addrconf_get_prefix_route() in ipv6 tests flags incorrectly, it
    does:

        if (X && (p->flags & Y) != 0)

    when it really meant to go:

        if (X && (p->flags & X) != 0)

    fix from Romain Kuntz.

11) Fix lost Kconfig dependency for bfin_mac driver hardware
    timestamping.  From Lars-Peter Clausen.

12) Fix regression in handling of RST without ACK in TCP, from Eric
    Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits)
  be2net: fix unconditionally returning IRQ_HANDLED in INTx
  tuntap: fix leaking reference count
  tuntap: forbid calling TUNSETIFF when detached
  tuntap: switch to use rtnl_dereference()
  net, wireless: overwrite default_ethtool_ops
  qlge: remove NETIF_F_TSO6 flag
  tcp: accept RST without ACK flag
  net: ethernet: xilinx: Do not use NO_IRQ in axienet
  net: ethernet: xilinx: Do not use axienet on PPC
  bnx2x: Allow management traffic after boot from SAN
  bnx2x: Fix fastpath structures when memory allocation fails
  bfin_mac: Restore hardware time-stamping dependency on BF518
  tun: avoid owner checks on IFF_ATTACH_QUEUE
  bnx2x: move debugging code before the return
  tuntap: refuse to re-attach to different tun_struct
  ipv6: use addrconf_get_prefix_route for prefix route lookup [v2]
  ipv6: fix the noflags test in addrconf_get_prefix_route
  tcp: fix splice() and tcp collapsing interaction
  tcp: splice: fix an infinite loop in tcp_read_sock()
  net: prevent setting ttl=0 via IP_TTL
  ...
2013-01-14 08:27:10 -08:00
YOSHIFUJI Hideaki / 吉藤英明 25d46f43a9 ipv6: Move comment to right place.
IN6ADDR_* and in6addr_* are not exported to userspace, and are defined
in include/linux/in6.h.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 21:04:37 -05:00
YOSHIFUJI Hideaki / 吉藤英明 dd3332bfcb ipv6: Store Router Alert option in IP6CB directly.
Router Alert option is very small and we can store the value
itself in the skb.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 daad151263 ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().
Move generalized version of ipv6_is_mld() to header,
and use it from ip6_mc_input().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 e7219858ac ipv6: Use ipv6_get_dsfield() instead of ipv6_tclass().
Commit 7a3198a8 ("ipv6: helper function to get tclass") introduced
ipv6_tclass(), but similar function is already available as
ipv6_get_dsfield().

We might be able to call ipv6_tclass() from ipv6_get_dsfield(),
but it is confusing to have two versions.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:14 -05:00
YOSHIFUJI Hideaki / 吉藤英明 6502ca527f ipv6: Introduce ip6_flowinfo() to extract flowinfo (tclass + flowlabel).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明 3e4e4c1f2d ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel.
This is not only for readability but also for optimization.
What we do here is to build the 32bit word at the beginning of the ipv6
header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and
we do not need to read the tclass portion of the target buffer.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:13 -05:00
Pablo Neira Ayuso 1e47ee8367 netfilter: nf_conntrack: fix BUG_ON while removing nf_conntrack with netns
canqun zhang reported that we're hitting BUG_ON in the
nf_conntrack_destroy path when calling kfree_skb while
rmmod'ing the nf_conntrack module.

Currently, the nf_ct_destroy hook is being set to NULL in the
destroy path of conntrack.init_net. However, this is a problem
since init_net may be destroyed before any other existing netns
(we cannot assume any specific ordering while releasing existing
netns according to what I read in recent emails).

Thanks to Gao feng for initial patch to address this issue.

Reported-by: canqun zhang <canqunzhang@gmail.com>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-12 14:12:36 +01:00
Stanislaw Gruszka d07d7507bf net, wireless: overwrite default_ethtool_ops
Since:

commit 2c60db0370
Author: Eric Dumazet <edumazet@google.com>
Date:   Sun Sep 16 09:17:26 2012 +0000

    net: provide a default dev->ethtool_ops

wireless core does not correctly assign ethtool_ops.

After alloc_netdev*() call, some cfg80211 drivers provide they own
ethtool_ops, but some do not. For them, wireless core provide generic
cfg80211_ethtool_ops, which is assigned in NETDEV_REGISTER notify call:

        if (!dev->ethtool_ops)
                dev->ethtool_ops = &cfg80211_ethtool_ops;

But after Eric's commit, dev->ethtool_ops is no longer NULL (on cfg80211
drivers without custom ethtool_ops), but points to &default_ethtool_ops.

In order to fix the problem, provide function which will overwrite
default_ethtool_ops and use it by wireless core.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11 15:55:48 -08:00
Linus Torvalds c727b4c63c Merge branch 'akpm' (incoming fixes from Andrew)
Merge misc fixes from Andrew Morton:
 "The audit fixes have been floating around for a while - Al and Eric
  aren't responding to either myself or Kees so I asked Kees to
  re-review them and here they are."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
  lib/rbtree.c: avoid the use of non-static __always_inline
  MAINTAINERS: Omar had moved
  mm: compaction: partially revert capture of suitable high-order page
  linux/audit.h: move ptrace.h include to kernel header
  kernel/audit.c: avoid negative sleep durations
  audit: catch possible NULL audit buffers
  audit: create explicit AUDIT_SECCOMP event type
  MAINTAINERS: fix a status pattern
  MAINTAINERS: fix arch/arm/plat-omap/include/plat/omap_hwmod.h
  mm: thp: acquire the anon_vma rwsem for write during split
  mm: mmap: annotate vm_lock_anon_vma locking properly for lockdep
  lockdep, rwsem: provide down_write_nest_lock()
  arch/mn10300/Kconfig: select CONFIG_GENERIC_ATOMIC64
  mm: bootmem: fix free_all_bootmem_core() with odd bitmap alignment
  mm: use aligned zone start for pfn_to_bitidx calculation
  fs/exec.c: work around icc miscompilation
  mm: compaction: fix echo 1 > compact_memory return error issue
  mm: memblock: fix wrong memmove size in memblock_merge_regions()
  drivers/video/ssd1307fb.c: fix bit order bug in the byte translation function
  mm: migrate: check page_count of THP before migrating
  ...
2013-01-11 14:55:15 -08:00
Michel Lespinasse 3cb7a56344 lib/rbtree.c: avoid the use of non-static __always_inline
lib/rbtree.c declared __rb_erase_color() as __always_inline void, and
then exported it with EXPORT_SYMBOL.

This was because __rb_erase_color() must be exported for augmented
rbtree users, but it must also be inlined into rb_erase() so that the
dummy callback can get optimized out of that call site.

(Actually with a modern compiler, none of the dummy callback functions
should even be generated as separate text functions).

The above usage is legal C, but it was unusual enough for some compilers
to warn about it.  This change makes things more explicit, with a static
__always_inline ____rb_erase_color function for use in rb_erase(), and a
separate non-inline __rb_erase_color function for use in
rb_erase_augmented call sites.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 14:54:56 -08:00
Mel Gorman 8fb74b9fb2 mm: compaction: partially revert capture of suitable high-order page
Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
waiting for POLLIN on a local TCP socket.  It was easier to trigger if
there was disk IO and dirty pages at the same time and he bisected it to
commit 1fb3f8ca0e ("mm: compaction: capture a suitable high-order page
immediately when it is made available").

The intention of that patch was to improve high-order allocations under
memory pressure after changes made to reclaim in 3.6 drastically hurt
THP allocations but the approach was flawed.  For Eric, the problem was
that page->pfmemalloc was not being cleared for captured pages leading
to a poor interaction with swap-over-NFS support causing the packets to
be dropped.  However, I identified a few more problems with the patch
including the fact that it can increase contention on zone->lock in some
cases which could result in async direct compaction being aborted early.

In retrospect the capture patch took the wrong approach.  What it should
have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
was allocating for THP and avoided races that way.  While the patch was
showing to improve allocation success rates at the time, the benefit is
marginal given the relative complexity and it should be revisited from
scratch in the context of the other reclaim-related changes that have
taken place since the patch was first written and tested.  This patch
partially reverts commit 1fb3f8ca0e ("mm: compaction: capture a
suitable high-order page immediately when it is made available").

Reported-and-tested-by: Eric Wong <normalperson@yhbt.net>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 14:54:56 -08:00
Mike Frysinger c0a3a20b6c linux/audit.h: move ptrace.h include to kernel header
While the kernel internals want pt_regs (and so it includes
linux/ptrace.h), the user version of audit.h does not need it.  So move
the include out of the uapi version.

This avoids issues where people want the audit defines and userland
ptrace api.  Including both the kernel ptrace and the userland ptrace
headers can easily lead to failure.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 14:54:56 -08:00
Kees Cook 7b9205bd77 audit: create explicit AUDIT_SECCOMP event type
The seccomp path was using AUDIT_ANOM_ABEND from when seccomp mode 1
could only kill a process.  While we still want to make sure an audit
record is forced on a kill, this should use a separate record type since
seccomp mode 2 introduces other behaviors.

In the case of "handled" behaviors (process wasn't killed), only emit a
record if the process is under inspection.  This change also fixes
userspace examination of seccomp audit events, since it was considered
malformed due to missing fields of the AUDIT_ANOM_ABEND event type.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Julien Tinnes <jln@google.com>
Acked-by: Will Drewry <wad@chromium.org>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 14:54:55 -08:00
Jiri Kosina 1b963c81b1 lockdep, rwsem: provide down_write_nest_lock()
down_write_nest_lock() provides a means to annotate locking scenario
where an outer lock is guaranteed to serialize the order nested locks
are being acquired.

This is analogoue to already existing mutex_lock_nest_lock() and
spin_lock_nest_lock().

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mel Gorman <mel@csn.ul.ie>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 14:54:55 -08:00
David Decotigny 896f97ea95 lib: cpu_rmap: avoid flushing all workqueues
In some cases, free_irq_cpu_rmap() is called while holding a lock (eg
rtnl).  This can lead to deadlocks, because it invokes
flush_scheduled_work() which ends up waiting for whole system workqueue
to flush, but some pending works might try to acquire the lock we are
already holding.

This commit uses reference-counting to replace
irq_run_affinity_notifiers().  It also removes
irq_run_affinity_notifiers() altogether.

[akpm@linux-foundation.org: eliminate free_cpu_rmap, rename cpu_rmap_reclaim() to cpu_rmap_release(), propagate kref_put() retval from cpu_rmap_put()]
Signed-off-by: David Decotigny <decot@googlers.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 14:54:54 -08:00
Linus Torvalds 52b820d917 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull intel DRM fixes from Dave Airlie:
 "Just intel fixes, including getting the Ironlake systems back to the
  state they were in for 3.6."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/i915: Revert shrinker changes from "Track unbound pages"
  drm/i915: Use pixel size for computing linear offsets into a sprite
  drm/i915: Add DEBUG messages to all intel_create_user_framebuffer error paths
  drm/i915: The sprite scaler on Ironlake also support YUV planes
  drm: Only evict the blocks required to create the requested hole
  drm/i915: Treat crtc->mode.clock == 0 as disabled
  Revert "drm/i915: no lvds quirk for Zotac ZDBOX SD ID12/ID13"
  drm/i915; Only increment the user-pin-count after successfully pinning the bo
2013-01-11 09:05:28 -08:00
Mel Gorman 47ecfcb7d0 mm: compaction: Partially revert capture of suitable high-order page
Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
waiting for POLLIN on a local TCP socket.  It was easier to trigger if
there was disk IO and dirty pages at the same time and he bisected it to
commit 1fb3f8ca0e ("mm: compaction: capture a suitable high-order page
immediately when it is made available").

The intention of that patch was to improve high-order allocations under
memory pressure after changes made to reclaim in 3.6 drastically hurt
THP allocations but the approach was flawed.  For Eric, the problem was
that page->pfmemalloc was not being cleared for captured pages leading
to a poor interaction with swap-over-NFS support causing the packets to
be dropped.  However, I identified a few more problems with the patch
including the fact that it can increase contention on zone->lock in some
cases which could result in async direct compaction being aborted early.

In retrospect the capture patch took the wrong approach.  What it should
have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
was allocating for THP and avoided races that way.  While the patch was
showing to improve allocation success rates at the time, the benefit is
marginal given the relative complexity and it should be revisited from
scratch in the context of the other reclaim-related changes that have
taken place since the patch was first written and tested.  This patch
partially reverts commit 1fb3f8ca "mm: compaction: capture a suitable
high-order page immediately when it is made available".

Reported-and-tested-by: Eric Wong <normalperson@yhbt.net>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 09:02:00 -08:00