OVS locking was recently changed to have private OVS lock which
simplified overall locking. Therefore there is no need to have
another global genl lock to protect OVS data structures. Following
patch uses of parallel_ops genl family for OVS. This also allows
more granual OVS locking using ovs_mutex for protecting OVS data
structures, which gives more concurrencey. E.g multiple genl
operations OVS_PACKET_CMD_EXECUTE can run in parallel, etc.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All genl callbacks are serialized by genl-mutex. This can become
bottleneck in multi threaded case.
Following patch adds an parameter to genl_family so that a
particular family can get concurrent netlink callback without
genl_lock held.
New rw-sem is used to protect genl callback from genl family unregister.
in case of parallel_ops genl-family read-lock is taken for callbacks and
write lock is taken for register or unregistration for any family.
In case of locked genl family semaphore and gel-mutex is locked for
any openration.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 068a2de57d (net: release dst entry while
cache-hot for GSO case too)
Before GSO packet segmentation, we already take care of skb->dst if it
can be released.
There is no point adding extra test for every segment in the gso loop.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, packet_sock has a struct tpacket_stats stats member for
TPACKET_V1 and TPACKET_V2 statistic accounting, and with TPACKET_V3
``union tpacket_stats_u stats_u'' was introduced, where however only
statistics for TPACKET_V3 are held, and when copied to user space,
TPACKET_V3 does some hackery and access also tpacket_stats' stats,
although everything could have been done within the union itself.
Unify accounting within the tpacket_stats_u union so that we can
remove 8 bytes from packet_sock that are there unnecessary. Note that
even if we switch to TPACKET_V3 and would use non mmap(2)ed option,
this still works due to the union with same types + offsets, that are
exposed to the user space.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a 4 byte hole in packet_ring_buffer structure before
prb_bdqc, that can be filled with 'pending' member, thus we can
reduce the overall structure size from 224 bytes to 216 bytes.
This also has the side-effect, that in struct packet_sock 2*4 byte
holes after the embedded packet_ring_buffer members are removed,
and overall, packet_sock can be reduced by 1 cacheline:
Before: size: 1344, cachelines: 21, members: 24
After: size: 1280, cachelines: 20, members: 24
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, there is no way to find out which timestamp is reported in
tpacket{,2,3}_hdr's tp_sec, tp_{n,u}sec members. It can be one of
SOF_TIMESTAMPING_SYS_HARDWARE, SOF_TIMESTAMPING_RAW_HARDWARE,
SOF_TIMESTAMPING_SOFTWARE, or a fallback variant late call from the
PF_PACKET code in software.
Therefore, report in the tp_status member of the ring buffer which
timestamp has been reported for RX and TX path. This should not break
anything for the following reasons: i) in RX ring path, the user needs
to test for tp_status & TP_STATUS_USER, and later for other flags as
well such as TP_STATUS_VLAN_VALID et al, so adding other flags will
do no harm; ii) in TX ring path, time stamps with PACKET_TIMESTAMP
socketoption are not available resp. had no effect except that the
application setting this is buggy. Next to TP_STATUS_AVAILABLE, the
user also should check for other flags such as TP_STATUS_WRONG_FORMAT
to reclaim frames to the application. Thus, in case TX ts are turned
off (default case), nothing happens to the application logic, and in
case we want to use this new feature, we now can also check which of
the ts source is reported in the status field as provided in the docs.
Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we only have software timestamping for the TX ring buffer
path, but this limitation stems rather from the implementation. By
just reusing tpacket_get_timestamp(), we can also allow hardware
timestamping just as in the RX path.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When transmit timestamping is enabled at the socket level, record a
timestamp on packets written to a PACKET_TX_RING. Tx timestamps are
always looped to the application over the socket error queue. Software
timestamps are also written back into the packet frame header in the
packet ring.
Reported-by: Paul Chavent <paul.chavent@onera.fr>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following patchset contains fixes for recently applied
Netfilter/IPVS updates to the net-next tree, most relevantly
they are:
* Fix sparse warnings introduced in the RCU conversion, from
Julian Anastasov.
* Fix wrong endianness in the size field of IPVS sync messages,
from Simon Horman.
* Fix missing if checking in nf_xfrm_me_harder, from Dan Carpenter.
* Fix off by one access in the IPVS SCTP tracking code, again from
Dan Carpenter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This if statement was accidentally dropped in (aaa795a netfilter:
nat: propagate errors from xfrm_me_harder()) so now it returns
unconditionally.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
John W. Linville says:
====================
Here is one last(?) big wireless bits pull request before the merge window...
Regarding the mac80211 bits, Johannes says:
"Here's another big pull request for the -next stream. This one has a ton
of driver updates, which hopefully addresses all drivers, but maybe you
have more new drivers than I have in my tree? Not entirely sure, let me
know if this is the case and then I can merge wireless-next.
I'm including a large number of small changes, see the shortlog. The two
bigger things are making VHT compatible with not using channel contexts
(from Karl) and the stop-while-suspended fixes I developed together with
Stanislaw."
...and...
"This time I have a relatively large number of fixes and small
improvements, the most important one being Bob's RCU fix. The two big
things are Felix's work on rate scaling tables (with a big thanks to
Karl too) and my own work on CSA handling to finally properly handle HT
(and some VHT.)"
As for the iwlwifi bits, Johannes says:
"The biggest work here is Bluetooth coexistence and power saving. Other
than that, I have a few small fixes that weren't really needed for 3.9
and a new PCI ID."
About the NFC bits, Samuel says:
"With this one we have:
- A major pn533 update. The pn533 framing support has been changed in order to
easily support all pn533 derivatives. For example we now support the ACR122
USB dongle.
- An NFC MEI physical layer code factorization through the mei_phy NFC API.
Both the microread and the pn544 drivers now use it.
- LLCP aggregation support. This allows NFC p2p devices to send aggregated
frames containing all sort of LLCP frames except SYMM and aggregation
frames.
- More LLCP socket options for getting the remote device link parameters.
- Fixes for the LLCP socket option code added with the first pull request for
3.10.
- Some support for LLCP corner cases like 0 length SDUs and general DISC
(tagged with a 0,0 dsap ssap couple) handling.
- RFKILL support for NFC."
For the b43 bits, Rafał says:
"Let me remind the changes for b43:
> Changes include:
> 1) Minor improvements for HT-PHY code (BCM4331)
> 2) Code cleaning for HT-PHY and N-PHY"
Concerning the bluetooth bits, Gustavo says:
"A set of changes intended for 3.10. The biggest changes here are from David
Herrmann, he rewrote most of the HIDP layer making it more reliable. Marcel
added a driver setup stage for device that need special handling on their
early initialization. Other than that we have the usual clean ups, bugfixes
and small improvements."
Along with all that, there is the usual collection of random/various
updates to ath9k, mwifiex, brcmfmac, brcmsmac, rt2x00, and wil6210.
I also included a pull of the wireless tree to resolve a merge conflict.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Depending of the kernel configuration (CONFIG_UIDGID_STRICT_TYPE_CHECKS), we can
get the following errors:
net/netlink/af_netlink.c: In function ‘netlink_queue_mmaped_skb’:
net/netlink/af_netlink.c:663:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kuid_t’
net/netlink/af_netlink.c:664:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kgid_t’
net/netlink/af_netlink.c: In function ‘netlink_ring_set_copied’:
net/netlink/af_netlink.c:693:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kuid_t’
net/netlink/af_netlink.c:694:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kgid_t’
We must use the helpers to get the uid and gid, and also take care of user_ns.
Fix suggested by Eric W. Biederman <ebiederm@xmission.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/netlink/diag.c: In function 'sk_diag_put_rings_cfg':
net/netlink/diag.c:28:17: error: 'struct netlink_sock' has no member named 'pg_vec_lock'
net/netlink/diag.c:29:29: error: 'struct netlink_sock' has no member named 'rx_ring'
net/netlink/diag.c:31:30: error: 'struct netlink_sock' has no member named 'tx_ring'
net/netlink/diag.c:33:19: error: 'struct netlink_sock' has no member named 'pg_vec_lock'
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
cc: Daniel Martensson <Daniel.Martensson@stericsson.com>
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove my soon bouncing email address.
Also remove the "Contact:" line in file header.
The MAINTAINERS file is a better place to find the
contact person anyway.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct ip_vs_sync_mesg and ip_vs_sync_mesg_v0 are both sent across the wire
and used internally to store IPVS synchronisation messages.
Up until now the scheme used has been to convert the size field
to network byte order before sending a message on the wire and
convert it to host byte order when sending a message.
This patch changes that scheme to always treat the field
as being network byte order. This seems appropriate as
the structure is sent across the wire. And by consistently
treating the field has network byte order it is now possible
to take advantage of sparse to flag any future miss-use.
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
The sctp_events[] come from sch->type in set_sctp_state(). They are
between 0-255 so that means we need 256 elements in the array.
I believe that because of how the code is aligned there is normally a
hole after sctp_events[] so this patch doesn't actually change anything.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
There are two motivations for this:
1. It improves readability to my eyes
2. Using nested min() calls results in a shadowed _min1 variable,
which is a bit untidy. Sparse complained about this.
I have also replaced (size_t)64 with a variable of type size_t and value 64.
This also improves readability to my eyes.
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Some service fields are in network order:
- netmask: used once in network order and also as prefix len for IPv6
- port
Other parameters are in host order:
- struct ip_vs_flags: flags and mask moved between user and kernel only
- sync state: moved between user and kernel only
- syncid: sent over network as single octet
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
kbuild test robot reports for sparse warnings in
commits c2a4ffb70e ("ipvs: convert lblc scheduler to rcu")
and c5549571f9 ("ipvs: convert lblcr scheduler to rcu").
Fix it by removing extra __rcu annotation.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
- RCU annotations for ip_vs_info_seq_start and _stop
- __percpu for cpustats
- properly dereference svc->pe in ip_vs_genl_fill_service
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
kbuild test robot reports for sparse warnings
in commit 088339a57d ("ipvs: convert connection locking"):
net/netfilter/ipvs/ip_vs_conn.c:962:13: warning: context imbalance
in 'ip_vs_conn_array' - wrong count at exit
include/linux/rcupdate.h:326:30: warning: context imbalance in
'ip_vs_conn_seq_next' - unexpected unlock
include/linux/rcupdate.h:326:30: warning: context imbalance in
'ip_vs_conn_seq_stop' - unexpected unlock
Fix it by running ip_vs_conn_array under RCU lock
to avoid conditional locking and by adding proper RCU
annotations.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Use rcu_dereference_protected to resolve
sparse warning, found by kbuild test robot:
net/netfilter/ipvs/ip_vs_ctl.c:1464:35: warning: dereference of
noderef expression
Problem from commit 026ace060d
("ipvs: optimize dst usage for real server")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
batadv_mesh_free() schedules some RCU callbacks which need the bat_priv struct
to do their jobs, while free_netdev(), which is called immediately after, is
destroying the private data.
Put an rcu_barrier() in the middle so that free_netdev() is invoked only after
all the callbacks returned.
This bug has been introduced by ab8f433dd39be94e8617cff2dfe9f7eca162eb15
("batman-adv: Move deinitialization of soft-interface to destructor")
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/emulex/benet/be_main.c
drivers/net/ethernet/intel/igb/igb_main.c
drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
include/net/scm.h
net/batman-adv/routing.c
net/ipv4/tcp_input.c
The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
cleanup in net-next to now pass cred structs around.
The be2net driver had a bug fix in 'net' that overlapped with the VLAN
interface changes by Patrick McHardy in net-next.
An IGB conflict existed because in 'net' the build_skb() support was
reverted, and in 'net-next' there was a comment style fix within that
code.
Several batman-adv conflicts were resolved by making sure that all
calls to batadv_is_my_mac() are changed to have a new bat_priv first
argument.
Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
rewrite in 'net-next', mostly overlapping changes.
Thanks to Stephen Rothwell and Antonio Quartulli for help with several
of these merge resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
struct sctp_packet is currently embedded into sctp_transport or
sits on the stack as 'singleton' in sctp_outq_flush(). Therefore,
its member 'malloced' is always 0, thus a kfree() is never called.
Because of that, we can just remove this code.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return value from list_netdevice() is not used and no need, so remove it.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the rate selection table to mac80211 from minstrel_update_stats.
Only rates for sample attempts are set in info->control.rates, with deferred
sampling, only the second slot gets changed.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pass the rate selection table to mac80211 from minstrel_ht_update_stats.
Only rates for sample attempts are set in info->control.rates.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow rate control modules to pass a rate selection table to mac80211
and the driver. This allows drivers to fetch the most recent rate
selection from the sta pointer for already buffered frames. This allows
rate control to respond faster to sudden link changes and it is also a
step towards adding minstrel_ht support to drivers like iwlwifi.
When a driver sets IEEE80211_HW_SUPPORTS_RC_TABLE, mac80211 will not
fill info->control.rates with rates from the rate table (to preserve
explicit overrides by the rate control module). The driver then
explicitly calls ieee80211_get_tx_rates to merge overrides from
info->control.rates with defaults from the sta rate table.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some protocols need a more reliable connection to complete
successful in reasonable time. This patch adds a user-space
API to indicate the wireless driver that a critical protocol
is about to commence and when it is done, using nl80211 primitives
NL80211_CMD_CRIT_PROTOCOL_START and NL80211_CRIT_PROTOCOL_STOP.
There can be only on critical protocol session started per
registered cfg80211 device.
The driver can support this by implementing the cfg80211 callbacks
.crit_proto_start() and .crit_proto_stop(). Examples of protocols
that can benefit from this are DHCP, EAPOL, APIPA. Exactly how the
link can/should be made more reliable is up to the driver. Things
to consider are avoid scanning, no multi-channel operations, and
alter coexistence schemes.
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Reviewed-by: Franky (Zhenhui) Lin <frankyl@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Initialize {mp,mi}->{max_tp_rate,max_tp_rate2,max_prob_rate} in
minstrel_ht's rate_init and rate_update.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
minstrel_ht initializes max_tp_rate max_tp_rate2 and max_prob_rate to
zero both for minstrel_ht_sta and minstrel_mcs_group_data.
This is wrong since there is no guarantee that the 1st rate of any
group is supported.
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
[fix some indentation on the way]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The RCU docs used to state that rcu_barrier() included a wait
for an RCU grace period; however the comments for rcu_barrier()
as of commit f0a0e6f... "rcu: Clarify memory-ordering properties
of grace-period primitives" contradict this.
So add back synchronize_{rcu,net}() to where they once were,
but keep the rcu_barrier()s for the call_rcu() callbacks.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Bob Copeland <bob@cozybit.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some driver implementations need to know whether mandatory
admission control is required by the AP for some ACs. Add
a parameter to the TX queue parameters indicating this.
As there's currently no support for admission control in
mac80211's AP implementation, it's only ever set for the
client implementation.
Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In cfg80211_can_use_iftype_chan(), check for P2P Device
first, and then for netdevs. This doesn't really change
anything but makes the code a bit easier to read since
it may not be obvious for everyone at first that a P2P
device has no netdev.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
cfg80211_stop_p2p_device() requires the devlist_mtx to
be held, but nl80211_stop_p2p_device() doesn't acquire
it which is a locking error and causes a warning (when
lockdep is enabled). Fix this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add missing return statement for CONFIG_BUG=n.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following leak is reported by kmemleak:
[ 86.812073] kmemleak: Found object by alias at 0xffff88006ecc76f0
[ 86.816019] Pid: 739, comm: kworker/u:1 Not tainted 3.9.0-rc5+ #842
[ 86.816019] Call Trace:
[ 86.816019] <IRQ> [<ffffffff81151c58>] find_and_get_object+0x8c/0xdf
[ 86.816019] [<ffffffff8190e90d>] ? vlan_info_rcu_free+0x33/0x49
[ 86.816019] [<ffffffff81151cbe>] delete_object_full+0x13/0x2f
[ 86.816019] [<ffffffff8194bbb6>] kmemleak_free+0x26/0x45
[ 86.816019] [<ffffffff8113e8c7>] slab_free_hook+0x1e/0x7b
[ 86.816019] [<ffffffff81141c05>] kfree+0xce/0x14b
[ 86.816019] [<ffffffff8190e90d>] vlan_info_rcu_free+0x33/0x49
[ 86.816019] [<ffffffff810d0b0b>] rcu_do_batch+0x261/0x4e7
The reason is that in vlan_info_rcu_free() we don't take the VLAN protocol
into account when iterating over the vlan_devices_array.
Reported-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If one does do something unfortunate and allow a
bad offload bug into the kernel, this the
skb_warn_bad_offload can effectively live-lock the
system, filling the logs with the same error over
and over.
Add rate limitation to this so that box remains otherwise
functional in this case.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following patchset contains a small batch of Netfilter
updates for your net-next tree, they are:
* Three patches that provide more accurate error reporting to
user-space, instead of -EPERM, in IPv4/IPv6 netfilter re-routing
code and NAT, from Patrick McHardy.
* Update copyright statements in Netfilter filters of
Patrick McHardy, from himself.
* Add Kconfig dependency on the raw/mangle tables to the
rpfilter, from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>