Commit Graph

2259 Commits

Author SHA1 Message Date
Yuchung Cheng 356d7d88e0 netfilter: nf_conntrack: fix tcp_in_window for Fast Open
Currently the conntrack checks if the ending sequence of a packet
falls within the observed receive window. However it does so even
if it has not observe any packet from the remote yet and uses an
uninitialized receive window (td_maxwin).

If a connection uses Fast Open to send a SYN-data packet which is
dropped afterward in the network. The subsequent SYNs retransmits
will all fail this check and be discarded, leading to a connection
timeout. This is because the SYN retransmit does not contain data
payload so

end == initial sequence number (isn) + 1
sender->td_end == isn + syn_data_len
receiver->td_maxwin == 0

The fix is to only apply this check after td_maxwin is initialized.

Reported-by: Michael Chan <mcfchan@stanford.edu>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-10 18:36:22 +02:00
Dan Carpenter e4d091d7bf netfilter: nfnetlink_{log,queue}: fix information leaks in netlink message
These structs have a "_pad" member.  Also the "phw" structs have an 8
byte "hw_addr[]" array but sometimes only the first 6 bytes are
initialized.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-05 17:36:04 +02:00
Pablo Neira Ayuso a206bcb3b0 netfilter: xt_TCPOPTSTRIP: fix possible off by one access
Fix a possible off by one access since optlen()
touches opt[offset+1] unsafely when i == tcp_hdrlen(skb) - 1.

This patch replaces tcp_hdrlen() by the local variable tcp_hdrlen
that stores the TCP header length, to save some cycles.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-01 11:45:15 +02:00
Pablo Neira Ayuso 71ffe9c77d netfilter: xt_TCPMSS: fix handling of malformed TCP header and options
Make sure the packet has enough room for the TCP header and
that it is not malformed.

While at it, store tcph->doff*4 in a variable, as it is used
several times.

This patch also fixes a possible off by one in case of malformed
TCP options.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-01 11:42:53 +02:00
Eric Dumazet baf60efa58 netfilter: xt_socket: fix broken v0 support
commit 681f130f39 ("netfilter: xt_socket: add XT_SOCKET_NOWILDCARD
flag") added a potential NULL dereference if an old iptables package
uses v0 of the match.

Fix this by removing the test on @info in fast path.

IPv6 can remove the test as well, as it uses v1 or v2.

Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-15 11:15:21 +02:00
Pablo Neira Ayuso f09eca8db0 netfilter: ctnetlink: fix incorrect NAT expectation dumping
nf_ct_expect_alloc leaves unset the expectation NAT fields. However,
ctnetlink_exp_dump_expect expects them to be zeroed in case they are
not used, which may not be the case. This results in dumping the NAT
tuple of the expectation when it should not.

Fix it by zeroing the NAT fields of the expectation.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-15 11:14:51 +02:00
David S. Miller 0c1072ae02 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/freescale/fec_main.c
	drivers/net/ethernet/renesas/sh_eth.c
	net/ipv4/gre.c

The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
and the splitting of the gre.c code into seperate files.

The FEC conflict was two sets of changes adding ethtool support code
in an "!CONFIG_M5272" CPP protected block.

Finally the sh_eth.c conflict was between one commit add bits set
in the .eesr_err_check mask whilst another commit removed the
.tx_error_check member and assignments.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 14:55:13 -07:00
Florian Westphal 496e4ae7dc netfilter: nf_queue: add NFQA_SKB_CSUM_NOTVERIFIED info flag
The common case is that TCP/IP checksums have already been
verified, e.g. by hardware (rx checksum offload), or conntrack.

Userspace can use this flag to determine when the checksum
has not been validated yet.

If the flag is set, this doesn't necessarily mean that the packet has
an invalid checksum, e.g. if NIC doesn't support rx checksum.

Userspace that sucessfully enabled NFQA_CFG_F_GSO queue feature flag can
infer that IP/TCP checksum has already been validated if either the
SKB_INFO attribute is not present or the NFQA_SKB_CSUM_NOTVERIFIED
flag is unset.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-30 18:15:48 +02:00
Julian Anastasov 4d0c875dcc ipvs: add sync_persist_mode flag
Add sync_persist_mode flag to reduce sync traffic
by syncing only persistent templates.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Alexander Frolkin eba3b5a787 ipvs: SH fallback and L4 hashing
By default the SH scheduler rejects connections that are hashed onto a
realserver of weight 0.  This patch adds a flag to make SH choose a
different realserver in this case, instead of rejecting the connection.

The patch also adds a flag to make SH include the source port (TCP, UDP,
SCTP) in the hash as well as the source address.  This basically allows
for deterministic round-robin load balancing (i.e., where any director
in a cluster of directors with identical config will send the same
packet the same way).

The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options
can be set per service.  They are set using a new option to ipvsadm.

Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Julian Anastasov acaac5d8bb ipvs: drop SCTP connections depending on state
Drop SCTP connections under load (dropentry context) depending
on the protocol state, just like for TCP: INIT conns are
dropped immediately, established are dropped randomly while
connections in progress or shutdown are skipped.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Julian Anastasov 61e7c420b4 ipvs: replace the SCTP state machine
Convert the SCTP state table, so that it is more readable.
Change the states to be according to the diagram in RFC 2960
and add more states suitable for middle box. Still, such
change in states adds incompatibility if systems in sync
setup include this change and others do not include it.

With this change we also have proper transitions in INPUT-ONLY
mode (DR/TUN) where we see packets only from client. Now
we should not switch to 10-second CLOSED state at a time
when we should stay in ESTABLISHED state.

The short names for states are because we have 16-char space
in ipvsadm and 11-char limit for the connection list format.
It is a sequence of the TCP implementation where the longest
state name is ESTABLISHED.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Alexander Frolkin c6c96c1883 ipvs: sloppy TCP and SCTP
This adds support for sloppy TCP and SCTP modes to IPVS.

When enabled (sysctls net.ipv4.vs.sloppy_tcp and
net.ipv4.vs.sloppy_sctp), allows IPVS to create connection state on any
packet, not just a TCP SYN (or SCTP INIT).

This allows connections to fail over from one IPVS director to another
mid-flight.

Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:46 +09:00
Julian Anastasov bba54de5bd ipvs: provide iph to schedulers
Before now the schedulers needed access only to IP
addresses and it was easy to get them from skb by
using ip_vs_fill_iph_addr_only.

New changes for the SH scheduler will need the protocol
and ports which is difficult to get from skb for the
IPv6 case. As we have all the data in the iph structure,
to avoid the same slow lookups provide the iph to schedulers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26 18:01:45 +09:00
Florian Westphal 797a7d66d2 netfilter: ctnetlink: send event when conntrack label was modified
commit 0ceabd8387
(netfilter: ctnetlink: deliver labels to userspace) sets the event bit
when we raced with another packet, instead of raising the event bit
when the label bit is set for the first time.

commit 9b21f6a909
(netfilter: ctnetlink: allow userspace to modify labels) forgot to update
the event mask in the "conntrack already exists" case.

Both issues result in CTA_LABELS attribute not getting included in the
conntrack event.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-24 11:32:56 +02:00
Balazs Peter Odor 5aed93875c netfilter: nf_nat_sip: fix mangling
In (b20ab9c netfilter: nf_ct_helper: better logging for dropped packets)
there were some missing brackets around the logging information, thus
always returning drop.

Closes https://bugzilla.kernel.org/show_bug.cgi?id=60061

Signed-off-by: Balazs Peter Odor <balazs@obiserver.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-24 11:32:40 +02:00
Eric Dumazet 681f130f39 netfilter: xt_socket: add XT_SOCKET_NOWILDCARD flag
xt_socket module can be a nice replacement to conntrack module
in some cases (SYN filtering for example)

But it lacks the ability to match the 3rd packet of TCP
handshake (ACK coming from the client).

Add a XT_SOCKET_NOWILDCARD flag to disable the wildcard mechanism.

The wildcard is the legacy socket match behavior, that ignores
LISTEN sockets bound to INADDR_ANY (or ipv6 equivalent)

iptables -I INPUT -p tcp --syn -j SYN_CHAIN
iptables -I INPUT -m socket --nowildcard -j ACCEPT

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-20 20:28:49 +02:00
Florian Westphal 6547a22187 netfilter: nf_conntrack: avoid large timeout for mid-stream pickup
When loose tracking is enabled (default), non-syn packets cause
creation of new conntracks in established state with default timeout for
established state (5 days).  This causes the table to fill up with UNREPLIED
when the 'new ack' packet happened to be the last-ack of a previous,
already timed-out connection.

Consider:

A 192.168.x.52792 > 10.184.y.80: F, 426:426(0) ack 9237 win 255
B 10.184.y.80 > 192.168.x.52792: ., ack 427 win 123
<61 second pause>
C 10.184.y.80 > 192.168.x.52792: F, 9237:9237(0) ack 427 win 123
D 192.168.x.52792 > 10.184.y.80: ., ack 9238 win 255

B moves conntrack to CLOSE_WAIT and will kill it after 60 second timeout,
C is ignored (FIN set), but last packet (D) causes new ct with 5-days timeout.

Use UNACK timeout (5 minutes) instead to get rid of these entries sooner
when in ESTABLISHED state without having seen traffic in both directions.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-20 11:20:13 +02:00
Daniel Borkmann 130ffbc263 netfilter: check return code from nla_parse_tested
These are the only calls under net/ that do not check nla_parse_nested()
for its error code, but simply continue execution. If parsing of netlink
attributes fails, we should return with an error instead of continuing.
In nearly all of these calls we have a policy attached, that is being
type verified during nla_parse_nested(), which we would miss checking
for otherwise.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-20 11:20:13 +02:00
David S. Miller d98cae64e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/ath/ath9k/Kconfig
	drivers/net/xen-netback/netback.c
	net/batman-adv/bat_iv_ogm.c
	net/wireless/nl80211.c

The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.

The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().

Batman conflict resolution provided by Antonio Quartulli, basically
keep everything in both conflict hunks.

The nl80211 conflict is a little more involved.  In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported.  Meanwhile in 'net-next' the handlers were converted
to use pre and post doit handlers which use a flag to determine
whether to hold the RTNL mutex around the operation.

However, the dump handlers to not use this logic.  Instead they have
to explicitly do the locking.  There were apparent bugs in the
conversion of nl80211_dump_wiphy() in that we were not dropping the
RTNL mutex in all the return paths, and it seems we very much should
be doing so.  So I fixed that whilst handling the overlapping changes.

To simplify the initial returns, I take the RTNL mutex after we try
to allocate 'tb'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 16:49:39 -07:00
Julian Anastasov 06f3d7f973 ipvs: SCTP ports should be writable in ICMP packets
Make sure that SCTP ports are writable when embedded in ICMP
from client, so that ip_vs_nat_icmp can translate them safely.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-19 09:53:52 +09:00
Joe Perches fe2c6338fd net: Convert uses of typedef ctl_table to struct ctl_table
Reduce the uses of this unnecessary typedef.

Done via perl script:

$ git grep --name-only -w ctl_table net | \
  xargs perl -p -i -e '\
	sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
        s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

Reflow the modified lines that now exceed 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13 02:36:09 -07:00
Phil Oester b396966c46 netfilter: xt_TCPMSS: Fix missing fragmentation handling
Similar to commit bc6bcb59 ("netfilter: xt_TCPOPTSTRIP: fix
possible mangling beyond packet boundary"), add safe fragment
handling to xt_TCPMSS.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-12 11:06:19 +02:00
Phil Oester 70d19f805f netfilter: xt_TCPMSS: Fix IPv6 default MSS too
As a followup to commit 409b545a ("netfilter: xt_TCPMSS: Fix violation
of RFC879 in absence of MSS option"), John Heffner points out that IPv6
has a higher MTU than IPv4, and thus a higher minimum MSS. Update TCPMSS
target to account for this, and update RFC comment.

While at it, point to more recent reference RFC1122 instead of RFC879.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-12 11:04:41 +02:00
Eric Dumazet 45203a3b38 net_sched: add 64bit rate estimators
struct gnet_stats_rate_est contains u32 fields, so the bytes per second
field can wrap at 34360Mbit.

Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
and switch the kernel to use this structure natively.

This structure is dumped to user space as a new attribute :

TCA_STATS_RATE_EST64

Old tc command will now display the capped bps (to 34360Mbit), instead
of wrapped values, and updated tc command will display correct
information.

Old tc command output, after patch :

eric:~# tc -s -d qd sh dev lo
qdisc pfifo 8001: root refcnt 2 limit 1000p
 Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
 rate 34360Mbit 189696pps backlog 0b 0p requeues 0

This patch carefully reorganizes "struct Qdisc" layout to get optimal
performance on SMP.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-11 02:51:03 -07:00
Pablo Neira Ayuso ed82c43732 netfilter: xt_TCPOPTSTRIP: don't use tcp_hdr()
In (bc6bcb5 netfilter: xt_TCPOPTSTRIP: fix possible mangling beyond
packet boundary), the use of tcp_hdr was introduced. However, we
cannot assume that skb->transport_header is set for non-local packets.

Cc: Florian Westphal <fw@strlen.de>
Reported-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-11 01:55:07 +02:00
Dan Carpenter a8241c6351 ipvs: info leak in __ip_vs_get_dest_entries()
The entry struct has a 2 byte hole after ->port and another 4 byte
hole after ->stats.outpkts.  You must have CAP_NET_ADMIN in your
namespace to hit this information leak.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-10 14:53:00 +02:00
Pablo Neira Ayuso 7b8dfe289f netfilter: nfnetlink_queue: fix missing HW protocol
Locally generated IPv4 and IPv6 traffic gets skb->protocol unset,
thus passing zero.

ip6tables -I OUTPUT -j NFQUEUE
libmnl/examples/netfilter# ./nf-queue 0 &
ping6 ::1
packet received (id=1 hw=0x0000 hook=3)
                         ^^^^^^

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-07 18:55:20 +02:00
David S. Miller 143554ace8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Conflicts:
	net/netfilter/nf_log.c

The conflict in nf_log.c is that in 'net' we added CONFIG_PROC_FS
protection around foo_proc_entry() calls to fix a build failure,
whereas in Pablo's tree a guard if() test around a call is
remove_proc_entry() was removed.  Trivially resolved.

Pablo Neira Ayuso says:

====================
The following patchset contains the first batch of
Netfilter/IPVS updates for your net-next tree, they are:

* Three patches with improvements and code refactorization
  for nfnetlink_queue, from Florian Westphal.

* FTP helper now parses replies without brackets, as RFC1123
  recommends, from Jeff Mahoney.

* Rise a warning to tell everyone about ULOG deprecation,
  NFLOG has been already in the kernel tree for long time
  and supersedes the old logging over netlink stub, from
  myself.

* Don't panic if we fail to load netfilter core framework,
  just bail out instead, from myself.

* Add cond_resched_rcu, used by IPVS to allow rescheduling
  while walking over big hashtables, from Simon Horman.

* Change type of IPVS sysctl_sync_qlen_max sysctl to avoid
  possible overflow, from Zhang Yanfei.

* Use strlcpy instead of strncpy to skip zeroing of already
  initialized area to write the extension names in ebtables,
  from Chen Gang.

* Use already existing per-cpu notrack object from xt_CT,
  from Eric Dumazet.

* Save explicit socket lookup in xt_socket now that we have
  early demux, also from Eric Dumazet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06 01:03:06 -07:00
David S. Miller 6bc19fb82d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge 'net' bug fixes into 'net-next' as we have patches
that will build on top of them.

This merge commit includes a change from Emil Goode
(emilgoode@gmail.com) that fixes a warning that would
have been introduced by this merge.  Specifically it
fixes the pingv6_ops method ipv6_chk_addr() to add a
"const" to the "struct net_device *dev" argument and
likewise update the dummy_ipv6_chk_addr() declaration.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-05 16:37:30 -07:00
Phil Oester 409b545ac1 netfilter: xt_TCPMSS: Fix violation of RFC879 in absence of MSS option
The clamp-mss-to-pmtu option of the xt_TCPMSS target can cause issues
connecting to websites if there was no MSS option present in the
original SYN packet from the client. In these cases, it may add a
MSS higher than the default specified in RFC879. Fix this by never
setting a value > 536 if no MSS option was specified by the client.

This closes netfilter's bugzilla #662.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-05 13:59:22 +02:00
Florian Westphal 7f87712c01 netfilter: nfnetlink_queue: only add CAP_LEN attr when needed
CAP_LEN contains the size of the network packet we're queueing to
userspace, i.e. normally it is the same as the NFQA_PAYLOAD attribute len.

Include it only in the unlikely case when NFQA_PAYLOAD is truncated due
to copy_range limitations.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-05 12:40:54 +02:00
Florian Westphal 9cefbbc9c8 netfilter: nfnetlink_queue: cleanup copy_range usage
For every packet queued, we check if configured copy_range
is 0, and treat that as 'copy entire packet'.

We can move this check to the queue configuration, and can
set copy_range appropriately.

Also, convert repetitive '0xffff - NLA_HDRLEN' to a macro.

[ queue initialization still used 0xffff, although its harmless
  since the initial setting is overwritten on queue config ]

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-05 12:40:28 +02:00
Pablo Neira Ayuso 37bc4f8dfa netfilter: nfnetlink_cttimeout: fix incomplete dumping of objects
Fix broken incomplete object dumping if the list of objects does not
fit into one single netlink message.

Reported-by: Gabriel Lazar <Gabriel.Lazar@com.utcluj.ro>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-05 12:36:37 +02:00
Pablo Neira Ayuso 991a6b735f netfilter: nfnetlink_acct: fix incomplete dumping of objects
Fix broken incomplete object dumping if the list of objects does not
fit into one single netlink message.

Reported-by: Gabriel Lazar <Gabriel.Lazar@com.utcluj.ro>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-05 12:36:36 +02:00
Simon Horman 938177e9f3 netfilter: Correct calculation using skb->tail and skb-network_header
This corrects an regression introduced by "net: Use 16bits for *_headers
fields of struct skbuff" when NET_SKBUFF_DATA_USES_OFFSET is not set. In
that case skb->tail will be a pointer whereas skb->network_header
will be an offset from head. This is corrected by using wrappers that
ensure that calculations are always made using pointers.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31 16:38:25 -07:00
Jan Beulich a70b9641e6 ipvs: ip_vs_sh: fix build
kfree_rcu() requires offsetof(..., rcu_head) < 4096, which can
get violated with a sufficiently high CONFIG_IP_VS_SH_TAB_BITS.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-29 17:50:39 +02:00
Michal Kubeček d660164d79 netfilter: xt_LOG: fix mark logging for IPv6 packets
In dump_ipv6_packet(), the "recurse" parameter is zero only if
dumping contents of a packet embedded into an ICMPv6 error
message. Therefore we want to log packet mark if recurse is
non-zero, not when it is zero.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-29 12:29:18 +02:00
Jiri Pirko 351638e7de net: pass info struct via netdevice notifier
So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

v2->v3: fix typo on simeth
	shortened dev_getter
	shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-28 13:11:01 -07:00
Jeff Mahoney 4e7dba99c9 netfilter: Implement RFC 1123 for FTP conntrack
The FTP conntrack code currently only accepts the following format for
 the 227 response for PASV:
 227 Entering Passive Mode (148,100,81,40,31,161).

 It doesn't accept the following format from an obscure server:
 227 Data transfer will passively listen to 67,218,99,134,50,144

 From RFC 1123:
 The format of the 227 reply to a PASV command is not
 well standardized.  In particular, an FTP client cannot
 assume that the parentheses shown on page 40 of RFC-959
 will be present (and in fact, Figure 3 on page 43 omits
 them).  Therefore, a User-FTP program that interprets
 the PASV reply must scan the reply for the first digit
 of the host and port numbers.

 This patch adds support for the RFC 1123 clarification by:
 - Allowing a search filter to specify NUL as the terminator so that
   try_number will return successfully if the array of numbers has been
   filled when an unexpected character is encountered.
 - Using space as the separator for the 227 reply and then scanning for
   the first digit of the number sequence. The number sequence is parsed
   out using the existing try_rfc959 but with a NUL terminator.

References: https://bugzilla.novell.com/show_bug.cgi?id=466279
References: http://bugzilla.netfilter.org/show_bug.cgi?id=574
Reported-by: Mark Post <mpost@novell.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netfilter-devel@vger.kernel.org
Cc: netfilter@vger.kernel.org
Cc: coreteam@netfilter.org
Cc: netdev@vger.kernel.org
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-27 13:32:43 +02:00
Grzegorz Lyczba dc7b3eb900 ipvs: Fix reuse connection if real server is dead
Expire cached connection for new TCP/SCTP connection if real
server is down. Otherwise, IPVS uses the dead server for the
reused connection, instead of a new working one.

Signed-off-by: Grzegorz Lyczba <grzegorz.lyczba@gmail.com>
Acked-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-27 13:00:45 +02:00
Florian Westphal 9d5242b192 netfilter: nfnetlink_queue: avoid peer_portid test
The portid is set to NETLINK_CB(skb).portid at create time.
The run-time check will always be false.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-26 22:05:11 +02:00
Zhang Yanfei 0799567424 ipvs: change type of netns_ipvs->sysctl_sync_qlen_max
This member of struct netns_ipvs is calculated from nr_free_buffer_pages
so change its type to unsigned long in case of overflow.  Also, type of
its related proc var sync_qlen_max and the return type of function
sysctl_sync_qlen_max() should be changed to unsigned long, too.

Besides, the type of ipvs_master_sync_state->sync_queue_len should be
changed to unsigned long accordingly.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-05-26 08:17:33 +09:00
Simon Horman a38e5e230e ipvs: use cond_resched_rcu() helper when walking connections
This avoids the situation where walking of a large number of connections
may prevent scheduling for a long time while also avoiding excessive
calls to rcu_read_unlock() and rcu_read_lock().

Note that in the case of !CONFIG_PREEMPT_RCU this will
add a call to cond_resched().

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23 14:23:18 +02:00
Pablo Neira Ayuso 6d11cfdba5 netfilter: don't panic on error while walking through the init path
Don't panic if we hit an error while adding the nf_log or pernet
netfilter support, just bail out.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
2013-05-23 14:22:30 +02:00
Florian Westphal 2a7851bffb netfilter: add nf_ipv6_ops hook to fix xt_addrtype with IPv6
Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812:

[ ip6tables -m addrtype ]
When I tried to use in the nat/PREROUTING it messes up the
routing cache even if the rule didn't matched at all.
[..]
If I remove the --limit-iface-in from the non-working scenario, so just
use the -m addrtype --dst-type LOCAL it works!

This happens when LOCAL type matching is requested with --limit-iface-in,
and the default ipv6 route is via the interface the packet we test
arrived on.

Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation
creates an unwanted cached entry, and the packet won't make it to the
real/expected destination.

Silently ignoring --limit-iface-in makes the routing work but it breaks
rule matching (--dst-type LOCAL with limit-iface-in is supposed to only
match if the dst address is configured on the incoming interface;
without --limit-iface-in it will match if the address is reachable
via lo).

The test should call ipv6_chk_addr() instead.  However, this would add
a link-time dependency on ipv6.

There are two possible solutions:

1) Revert the commit that moved ipt_addrtype to xt_addrtype,
   and put ipv6 specific code into ip6t_addrtype.
2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions.

While the former might seem preferable, Pablo pointed out that there
are more xt modules with link-time dependeny issues regarding ipv6,
so lets go for 2).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23 11:58:55 +02:00
Eric Dumazet 00028aa370 netfilter: xt_socket: use IP early demux
With IP early demux added in linux-3.6, we perform TCP lookup in IP
layer before iptables hooks.

We can avoid doing a second lookup in xt_socket.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23 11:09:53 +02:00
Eric Dumazet 27e7190efd netfilter: xt_CT: optimize XT_CT_NOTRACK
The percpu untracked ct are not currently used for XT_CT_NOTRACK.

xt_ct_tg_check()/xt_ct_target() provides a single ct.

Thats not optimal as the ct->ct_general.use cache line will bounce among
cpus.

Use the intended [1] thing : xt_ct_target() should select the percpu
object.

[1] Refs :
commit 5bfddbd46a ("netfilter: nf_conntrack: IPS_UNTRACKED bit")
commit b3c5163fe0 ("netfilter: nf_conntrack: per_cpu untracking")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-23 11:09:29 +02:00
Pablo Neira Ayuso bc6bcb59dd netfilter: xt_TCPOPTSTRIP: fix possible mangling beyond packet boundary
This target assumes that tcph->doff is well-formed, that may be well
not the case. Add extra sanity checkings to avoid possible crash due
to read/write out of the real packet boundary. After this patch, the
default action on malformed TCP packets is to drop them. Moreover,
fragments are skipped.

Reported-by: Rafal Kupka <rkupka@telemetry.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-16 17:35:53 +02:00
Hans Schillstrom 8cdb46da06 netfilter: log: netns NULL ptr bug when calling from conntrack
Since (69b34fb netfilter: xt_LOG: add net namespace support
for xt_LOG), we hit this:

[ 4224.708977] BUG: unable to handle kernel NULL pointer dereference at 0000000000000388
[ 4224.709074] IP: [<ffffffff8147f699>] ipt_log_packet+0x29/0x270

when callling log functions from conntrack both in and out
are NULL i.e. the net pointer is invalid.

Adding struct net *net in call to nf_logfn() will secure that
there always is a vaild net ptr.

Reported as netfilter's bugzilla bug 818:
https://bugzilla.netfilter.org/show_bug.cgi?id=818

Reported-by: Ronald <ronald645@gmail.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-15 14:11:07 +02:00