linux

Commit Graph

Author	SHA1	Message	Date
Sage Weil	c5c6b19d4b	ceph: explicitly specify page alignment in network messages The alignment used for reading data into or out of pages used to be taken from the data_off field in the message header. This only worked as long as the page alignment matched the object offset, breaking direct io to non-page aligned offsets. Instead, explicitly specify the page alignment next to the page vector in the ceph_msg struct, and use that instead of the message header (which probably shouldn't be trusted). The alloc_msg callback is responsible for filling in this field properly when it sets up the page vector. Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-09 12:43:17 -08:00
Sage Weil	b7495fc2ff	ceph: make page alignment explicit in osd interface We used to infer alignment of IOs within a page based on the file offset, which assumed they matched. This broke with direct IO that was not aligned to pages (e.g., 512-byte aligned IO). We were also trusting the alignment specified in the OSD reply, which could have been adjusted by the server. Explicitly specify the page alignment when setting up OSD IO requests. Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-09 12:43:12 -08:00
Sage Weil	e98b6fed84	ceph: fix comment, remove extraneous args The offset/length arguments aren't used. Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-09 12:24:53 -08:00
Eric Dumazet	332dd96f7a	net/dst: dst_dev_event() called after other notifiers Followup of commit `ef885afbf8` (net: use rcu_barrier() in rollback_registered_many) dst_dev_event() scans a garbage dst list that might be feeded by various network notifiers at device dismantle time. Its important to call dst_dev_event() after other notifiers, or we might enter the infamous msleep(250) in netdev_wait_allrefs(), and wait one second before calling again call_netdevice_notifiers(NETDEV_UNREGISTER, dev) to properly remove last device references. Use priority -10 to let dst_dev_notifier be called after other network notifiers (they have the default 0 priority) Reported-by: Ben Greear <greearb@candelatech.com> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Octavian Purdila <opurdila@ixiacom.com> Reported-by: Benjamin LaHaise <bcrl@kvack.org> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-09 12:17:16 -08:00
Kulikov Vasiliy	88f8a5e3e7	net: tipc: fix information leak to userland Structure sockaddr_tipc is copied to userland with padding bytes after "id" field in union field "name" unitialized. It leads to leaking of contents of kernel stack memory. We have to initialize them to zero. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-09 09:25:46 -08:00
Eric Dumazet	18943d292f	inet: fix ip_mc_drop_socket() commit `8723e1b4ad` (inet: RCU changes in inetdev_by_index()) forgot one call site in ip_mc_drop_socket() We should not decrease idev refcount after inetdev_by_index() call, since refcount is not increased anymore. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Miles Lane <miles.lane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-09 08:26:42 -08:00
Luiz Augusto von Dentz	63ce0900d7	Bluetooth: fix not setting security level when creating a rfcomm session This cause 'No Bonding' to be used if userspace has not yet been paired with remote device since the l2cap socket used to create the rfcomm session does not have any security level set. Signed-off-by: Luiz Augusto von Dentz <luiz.dentz-von@nokia.com> Acked-by: Ville Tervo <ville.tervo@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-11-09 00:56:10 -02:00
Gustavo F. Padovan	4f8b691c9f	Bluetooth: fix endianness conversion in L2CAP Last commit added a wrong endianness conversion. Fixing that. Reported-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-11-09 00:56:09 -02:00
steven miao	bfaaeb3ed5	Bluetooth: fix unaligned access to l2cap conf data In function l2cap_get_conf_opt() and l2cap_add_conf_opt() the address of opt->val sometimes is not at the edge of 2-bytes/4-bytes, so 2-bytes/4 bytes access will cause data misalignment exeception. Use get_unaligned_le16/32 and put_unaligned_le16/32 function to avoid data misalignment execption. Signed-off-by: steven miao <realmz6@gmail.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-11-09 00:56:00 -02:00
Johan Hedberg	bdb7524a75	Bluetooth: Fix non-SSP auth request for HIGH security level sockets When initiating dedicated bonding a L2CAP raw socket with HIGH security level is used. The kernel is supposed to trigger the authentication request in this case but this doesn't happen currently for non-SSP (pre-2.1) devices. The reason is that the authentication request happens in the remote extended features callback which never gets called for non-SSP devices. This patch fixes the issue by requesting also authentiation in the (normal) remote features callback in the case of non-SSP devices. This rule is applied only for HIGH security level which might at first seem unintuitive since on the server socket side MEDIUM is already enough for authentication. However, for the clients we really want to prefer the server side to decide the authentication requrement in most cases, and since most client sockets use MEDIUM it's better to be avoided on the kernel side for these sockets. The important socket to request it for is the dedicated bonding one and that socket uses HIGH security level. The patch is based on the initial investigation and patch proposal from Andrei Emeltchenko <endrei.emeltchenko@nokia.com>. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-11-09 00:55:27 -02:00
Randy Dunlap	96c99b473a	Bluetooth: fix hidp kconfig dependency warning Fix kconfig dependency warning to satisfy dependencies: warning: (BT_HIDP && NET && BT && BT_L2CAP && INPUT \|\| USB_HID && HID_SUPPORT && USB && INPUT) selects HID which has unmet direct dependencies (HID_SUPPORT && INPUT) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-11-09 00:55:27 -02:00
Brian Cavagnolo	352ffad646	mac80211: unset SDATA_STATE_OFFCHANNEL when cancelling a scan For client STA interfaces, ieee80211_do_stop unsets the relevant interface's SDATA_STATE_RUNNING state bit prior to cancelling an interrupted scan. When ieee80211_offchannel_return is invoked as part of cancelling the scan, it doesn't bother unsetting the SDATA_STATE_OFFCHANNEL bit because it sees that the interface is down. Normally this doesn't matter because when the client STA interface is brought back up, it will probably issue a scan. But in some cases (e.g., the user changes the interface type while it is down), the SDATA_STATE_OFFCHANNEL bit will remain set. This prevents the interface queues from being started. So we cancel the scan before unsetting the SDATA_STATE_RUNNING bit. Signed-off-by: Brian Cavagnolo <brian@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-08 16:53:47 -05:00
Felix Fietkau	3cc25e510d	cfg80211: fix a crash in dev lookup on dump commands IS_ERR and PTR_ERR were called with the wrong pointer, leading to a crash when cfg80211_get_dev_from_ifindex fails. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-08 16:53:47 -05:00
Pavel Emelyanov	aa58163a76	rds: Fix rds message leak in rds_message_map_pages The sgs allocation error path leaks the allocated message. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 12:17:09 -08:00
Junchang Wang	eb589063ed	pktgen: correct uninitialized queue_map This fix a bug reported by backyes. Right the first time pktgen's using queue_map that's not been initialized by set_cur_queue_map(pkt_dev); Signed-off-by: Junchang Wang <junchangwang@gmail.com> Signed-off-by: Backyes <backyes@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 12:17:07 -08:00
Shan Wei	f46421416f	ipv6: fix overlap check for fragments The type of FRAG6_CB(prev)->offset is int, skb->len is unsigned int, and offset is int. Without this patch, type conversion occurred to this expression, when (FRAG6_CB(prev)->offset + prev->len) is less than offset. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 12:17:06 -08:00
stephen hemminger	4c46ee5258	classifier: report statistics for basic classifier The basic classifier keeps statistics but does not report it to user space. This showed up when using basic classifier (with police) as a default catch all on ingress; no statistics were reported. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 12:17:05 -08:00
Dmitry Torokhov	86c2c0a8a4	NET: pktgen - fix compile warning This should fix the following warning: net/core/pktgen.c: In function ‘pktgen_if_write’: net/core/pktgen.c:890: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Reviewed-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-07 05:28:01 -08:00
Linus Torvalds	4b4a2700f4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits) inet_diag: Make sure we actually run the same bytecode we audited. netlink: Make nlmsg_find_attr take a const nlmsghdr*. fib: fib_result_assign() should not change fib refcounts netfilter: ip6_tables: fix information leak to userspace cls_cgroup: Fix crash on module unload memory corruption in X.25 facilities parsing net dst: fix percpu_counter list corruption and poison overwritten rds: Remove kfreed tcp conn from list rds: Lost locking in loop connection freeing de2104x: fix panic on load atl1 : fix panic on load netxen: remove unused firmware exports caif: Remove noisy printout when disconnecting caif socket caif: SPI-driver bugfix - incorrect padding. caif: Bugfix for socket priority, bindtodev and dbg channel. smsc911x: Set Ethernet EEPROM size to supported device's size ipv4: netfilter: ip_tables: fix information leak to userland ipv4: netfilter: arp_tables: fix information leak to userland cxgb4vf: remove call to stop TX queues at load time. cxgb4: remove call to stop TX queues at load time. ...	2010-11-05 15:25:48 -07:00
Nelson Elhage	22e76c849d	inet_diag: Make sure we actually run the same bytecode we audited. We were using nlmsg_find_attr() to look up the bytecode by attribute when auditing, but then just using the first attribute when actually running bytecode. So, if we received a message with two attribute elements, where only the second had type INET_DIAG_REQ_BYTECODE, we would validate and run different bytecode strings. Fix this by consistently using nlmsg_find_attr everywhere. Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-04 12:26:34 -07:00
Eric Dumazet	1f1b9c9990	fib: fib_result_assign() should not change fib refcounts After commit `ebc0ffae5` (RCU conversion of fib_lookup()), fib_result_assign() should not change fib refcounts anymore. Thanks to Michael who did the bisection and bug report. Reported-by: Michael Ellerman <michael@ellerman.id.au> Tested-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-04 12:05:32 -07:00
Jan Engelhardt	cccbe5ef85	netfilter: ip6_tables: fix information leak to userspace Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:55:39 -07:00
David S. Miller	758cb41106	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6	2010-11-03 18:52:32 -07:00
Herbert Xu	c00b2c9e79	cls_cgroup: Fix crash on module unload Somewhere along the lines net_cls_subsys_id became a macro when cls_cgroup is built as a module. Not only did it make cls_cgroup completely useless, it also causes it to crash on module unload. This patch fixes this by removing that macro. Thanks to Eric Dumazet for diagnosing this problem. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:50 -07:00
andrew hendry	a6331d6f9a	memory corruption in X.25 facilities parsing Signed-of-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:50 -07:00
Xiaotian Feng	41bb78b4b9	net dst: fix percpu_counter list corruption and poison overwritten There're some percpu_counter list corruption and poison overwritten warnings in recent kernel, which is resulted by `fc66f95c`. commit `fc66f95c` switches to use percpu_counter, in ip6_route_net_init, kernel init the percpu_counter for dst entries, but, the percpu_counter is never destroyed in ip6_route_net_exit. So if the related data is freed by kernel, the freed percpu_counter is still on the list, then if we insert/remove other percpu_counter, list corruption resulted. Also, if the insert/remove option modifies the ->prev,->next pointer of the freed value, the poison overwritten is resulted then. With the following patch, the percpu_counter list corruption and poison overwritten warnings disappeared. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:07 -07:00
Pavel Emelyanov	8200a59f24	rds: Remove kfreed tcp conn from list All the rds_tcp_connection objects are stored list, but when being freed it should be removed from there. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:07 -07:00
Pavel Emelyanov	58c490babd	rds: Lost locking in loop connection freeing The conn is removed from list in there and this requires proper lock protection. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:06 -07:00
sjur.brandeland@stericsson.com	47d1ff1765	caif: Remove noisy printout when disconnecting caif socket Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:04 -07:00
André Carvalho de Matos	f2527ec436	caif: Bugfix for socket priority, bindtodev and dbg channel. Changes: o Bugfix: SO_PRIORITY for SOL_SOCKET could not be handled in caif's setsockopt, using the struct sock attribute priority instead. o Bugfix: SO_BINDTODEVICE for SOL_SOCKET could not be handled in caif's setsockopt, using the struct sock attribute ifindex instead. o Wrong assert statement for RFM layer segmentation. o CAIF Debug channels was not working over SPI, caif_payload_info containing padding info must be initialized. o Check on pointer before dereferencing when unregister dev in caif_dev.c Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:03 -07:00
Vasiliy Kulikov	b5f15ac4f8	ipv4: netfilter: ip_tables: fix information leak to userland Structure ipt_getinfo is copied to userland with the field "name" that has the last elements unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-03 08:45:06 +01:00
Vasiliy Kulikov	1a8b7a6722	ipv4: netfilter: arp_tables: fix information leak to userland Structure arpt_getinfo is copied to userland with the field "name" that has the last elements unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-03 08:44:12 +01:00
Sage Weil	df9f86faf3	ceph: fix small seq message skipping If the client gets out of sync with the server message sequence number, we normally skip low seq messages (ones we already received). The skip code was also incrementing the expected seq, such that all subsequent messages also appeared old and got skipped, and an eventual timeout on the osd connection. This resulted in some lagging requests and console messages like [233480.882885] ceph: skipping osd22 10.138.138.13:6804 seq 2016, expected 2017 [233480.882919] ceph: skipping osd22 10.138.138.13:6804 seq 2017, expected 2018 [233480.882963] ceph: skipping osd22 10.138.138.13:6804 seq 2018, expected 2019 [233480.883488] ceph: skipping osd22 10.138.138.13:6804 seq 2019, expected 2020 [233485.219558] ceph: skipping osd22 10.138.138.13:6804 seq 2020, expected 2021 [233485.906595] ceph: skipping osd22 10.138.138.13:6804 seq 2021, expected 2022 [233490.379536] ceph: skipping osd22 10.138.138.13:6804 seq 2022, expected 2023 [233495.523260] ceph: skipping osd22 10.138.138.13:6804 seq 2023, expected 2024 [233495.923194] ceph: skipping osd22 10.138.138.13:6804 seq 2024, expected 2025 [233500.534614] ceph: tid 6023602 timed out on osd22, will reset osd Reported-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-01 15:49:23 -07:00
Tom Herbert	df32cc193a	net: check queue_index from sock is valid for device In dev_pick_tx recompute the queue index if the value stored in the socket is greater than or equal to the number of real queues for the device. The saved index in the sock structure is not guaranteed to be appropriate for the egress device (this could happen on a route change or in presence of tunnelling). The result of the queue index being bad would be to return a bogus queue (crash could prersumably follow). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-01 12:55:52 -07:00
Dr. David Alan Gilbert	6f9b901823	l2tp: kzalloc with swapped params in l2tp_dfs_seq_open 'sparse' spotted that the parameters to kzalloc in l2tp_dfs_seq_open were swapped. Tested on current git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git at `1792f17b72` build, boots and I can see that directory, but there again I could see /sys/kernel/debug/l2tp with it swapped; I don't have any l2tp in use. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-01 06:56:02 -07:00
Thomas Graf	5ec1cea057	text ematch: check for NULL pointer before destroying textsearch config While validating the configuration em_ops is already set, thus the individual destroy functions are called, but the ematch data has not been allocated and associated with the ematch yet. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-31 09:37:38 -07:00
Linus Torvalds	3985c7ce85	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: isdn: mISDN: socket: fix information leak to userland netdev: can: Change mail address of Hans J. Koch pcnet_cs: add new_id net: Truncate recvfrom and sendto length to INT_MAX. RDS: Let rds_message_alloc_sgs() return NULL RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace RDS: Clean up error handling in rds_cmsg_rdma_args RDS: Return -EINVAL if rds_rdma_pages returns an error net: fix rds_iovec page count overflow can: pch_can: fix section mismatch warning by using a whitelisted name can: pch_can: fix sparse warning netxen_nic: Fix the tx queue manipulation bug in netxen_nic_probe ip_gre: fix fallback tunnel setup vmxnet: trivial annotation of protocol constant vmxnet3: remove unnecessary byteswapping in BAR writing macros ipv6/udp: report SndbufErrors and RcvbufErrors phy/marvell: rename 88ec048 to 88e1318s and fix mscr1 addr	2010-10-30 18:42:58 -07:00
Linus Torvalds	253eacc070	net: Truncate recvfrom and sendto length to INT_MAX. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:44:07 -07:00
Andy Grover	d139ff0907	RDS: Let rds_message_alloc_sgs() return NULL Even with the previous fix, we still are reading the iovecs once to determine SGs needed, and then again later on. Preallocating space for sg lists as part of rds_message seemed like a good idea but it might be better to not do this. While working to redo that code, this patch attempts to protect against userspace rewriting the rds_iovec array between the first and second accesses. The consequences of this would be either a too-small or too-large sg list array. Too large is not an issue. This patch changes all callers of message_alloc_sgs to handle running out of preallocated sgs, and fail gracefully. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:34:18 -07:00
Andy Grover	fc8162e3c0	RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace Change rds_rdma_pages to take a passed-in rds_iovec array instead of doing copy_from_user itself. Change rds_cmsg_rdma_args to copy rds_iovec array once only. This eliminates the possibility of userspace changing it after our sanity checks. Implement stack-based storage for small numbers of iovecs, based on net/socket.c, to save an alloc in the extremely common case. Although this patch reduces iovec copies in cmsg_rdma_args to 1, we still do another one in rds_rdma_extra_size. Getting rid of that one will be trickier, so it'll be a separate patch. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:34:17 -07:00
Andy Grover	f4a3fc03c1	RDS: Clean up error handling in rds_cmsg_rdma_args We don't need to set ret = 0 at the end -- it's initialized to 0. Also, don't increment s_send_rdma stat if we're exiting with an error. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:34:17 -07:00
Andy Grover	a09f69c49b	RDS: Return -EINVAL if rds_rdma_pages returns an error rds_cmsg_rdma_args would still return success even if rds_rdma_pages returned an error (or overflowed). Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:34:16 -07:00
Linus Torvalds	1b1f693d7a	net: fix rds_iovec page count overflow As reported by Thomas Pollet, the rdma page counting can overflow. We get the rdma sizes in 64-bit unsigned entities, but then limit it to UINT_MAX bytes and shift them down to pages (so with a possible "+1" for an unaligned address). So each individual page count fits comfortably in an 'unsigned int' (not even close to overflowing into signed), but as they are added up, they might end up resulting in a signed return value. Which would be wrong. Catch the case of tot_pages turning negative, and return the appropriate error code. Reported-by: Thomas Pollet <thomas.pollet@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:34:16 -07:00
Eric Dumazet	3285ee3bb2	ip_gre: fix fallback tunnel setup Before making the fallback tunnel visible to lookups, we should make sure it is completely setup, once ipgre_tunnel_init() had been called and tstats per_cpu pointer allocated. move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from ipgre_fb_tunnel_init() to ipgre_init_net() Based on a patch from Pavel Emelyanov Reported-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:21:28 -07:00
Eric Dumazet	870be39258	ipv6/udp: report SndbufErrors and RcvbufErrors commit `a18135eb93` (Add UDP_MIB_{SND,RCV}BUFERRORS handling.) forgot to make the necessary changes in net/ipv6/proc.c to report additional counters in /proc/net/snmp6 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:17:23 -07:00
Linus Torvalds	1840897ab5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits) b43: Fix warning at drivers/mmc/core/core.c:237 in mmc_wait_for_cmd mac80211: fix failure to check kmalloc return value in key_key_read libertas: Fix sd8686 firmware reload ath9k: Fix incorrect access of rate flags in RC netfilter: xt_socket: Make tproto signed in socket_mt6_v1(). stmmac: enable/disable rx/tx in the core with a single write. net: atarilance - flags should be unsigned long netxen: fix kdump pktgen: Limit how much data we copy onto the stack. net: Limit socket I/O iovec total length to INT_MAX. USB: gadget: fix ethernet gadget crash in gether_setup fib: Fix fib zone and its hash leak on namespace stop cxgb3: Fix panic in free_tx_desc() cxgb3: fix crash due to manipulating queues before registration 8390: Don't oops on starting dev queue dccp ccid-2: Stop polling dccp: Refine the wait-for-ccid mechanism dccp: Extend CCID packet dequeueing interface dccp: Return-value convention of hc_tx_send_packet() igbvf: fix panic on load ...	2010-10-29 14:17:12 -07:00
David S. Miller	a4765fa7bf	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-10-29 12:23:15 -07:00
Jesper Juhl	520efd1ace	mac80211: fix failure to check kmalloc return value in key_key_read I noticed two small issues in mac80211/debugfs_key.c::key_key_read while reading through the code. Patch below. The key_key_read() function returns ssize_t and the value that's actually returned is the return value of simple_read_from_buffer() which also returns ssize_t, so let's hold the return value in a ssize_t local variable rather than a int one. Also, memory is allocated dynamically with kmalloc() which can fail, but the return value of kmalloc() is not checked, so we may end up operating on a null pointer further on. So check for a NULL return and bail out with -ENOMEM in that case. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-29 14:33:26 -04:00
Eric Dumazet	d817d29d0b	netfilter: fix nf_conntrack_l4proto_register() While doing __rcu annotations work on net/netfilter I found following bug. On some arches, it is possible we publish a table while its content is not yet committed to memory, and lockless reader can dereference wild pointer. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-29 19:59:40 +02:00
Patrick McHardy	64e4674922	netfilter: nf_nat: fix compiler warning with CONFIG_NF_CT_NETLINK=n net/ipv4/netfilter/nf_nat_core.c:52: warning: 'nf_nat_proto_find_get' defined but not used net/ipv4/netfilter/nf_nat_core.c:66: warning: 'nf_nat_proto_put' defined but not used Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-29 16:28:07 +02:00
Al Viro	51139adac9	convert get_sb_pseudo() users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:16:33 -04:00
Al Viro	fc14f2fef6	convert get_sb_single() users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:16:28 -04:00
David S. Miller	089282fb02	netfilter: xt_socket: Make tproto signed in socket_mt6_v1(). Otherwise error indications from ipv6_find_hdr() won't be noticed. This required making the protocol argument to extract_icmp6_fields() signed too. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 12:59:53 -07:00
Nelson Elhage	448d7b5daf	pktgen: Limit how much data we copy onto the stack. A program that accidentally writes too much data to the pktgen file can overflow the kernel stack and oops the machine. This is only triggerable by root, so there's no security issue, but it's still an unfortunate bug. printk() won't print more than 1024 bytes in a single call, anyways, so let's just never copy more than that much data. We're on a fairly shallow stack, so that should be safe even with CONFIG_4KSTACKS. Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 11:47:53 -07:00
David S. Miller	8acfe468b0	net: Limit socket I/O iovec total length to INT_MAX. This helps protect us from overflow issues down in the individual protocol sendmsg/recvmsg handlers. Once we hit INT_MAX we truncate out the rest of the iovec by setting the iov_len members to zero. This works because: 1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial writes are allowed and the application will just continue with another write to send the rest of the data. 2) For datagram oriented sockets, where there must be a one-to-one correspondance between write() calls and packets on the wire, INT_MAX is going to be far larger than the packet size limit the protocol is going to check for and signal with -EMSGSIZE. Based upon a patch by Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 11:47:52 -07:00
Pavel Emelyanov	4aa2c466a7	fib: Fix fib zone and its hash leak on namespace stop When we stop a namespace we flush the table and free one, but the added fn_zone-s (and their hashes if grown) are leaked. Need to free. Tries releases all its stuff in the flushing code. Shame on us - this bug exists since the very first make-fib-per-net patches in 2.6.27 :( Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 10:27:03 -07:00
Gerrit Renker	1c0e0a0569	dccp ccid-2: Stop polling This updates CCID-2 to use the CCID dequeuing mechanism, converting from previous continuous-polling to a now event-driven mechanism. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 10:27:01 -07:00
Gerrit Renker	b1fcf55eea	dccp: Refine the wait-for-ccid mechanism This extends the existing wait-for-ccid routine so that it may be used with different types of CCID, addressing the following problems: 1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for example has a full TX queue and becomes network-limited just as the application wants to close, then waiting for CCID-2 to become unblocked could lead to an indefinite delay (i.e., application "hangs"). 2) Since each TX CCID in turn uses a feedback mechanism, there may be changes in its sending policy while the queue is being drained. This can lead to further delays during which the application will not be able to terminate. 3) The minimum wait time for CCID-3/4 can be expected to be the queue length times the current inter-packet delay. For example if tx_qlen=100 and a delay of 15 ms is used for each packet, then the application would have to wait for a minimum of 1.5 seconds before being allowed to exit. 4) There is no way for the user/application to control this behaviour. It would be good to use the timeout argument of dccp_close() as an upper bound. Then the maximum time that an application is willing to wait for its CCIDs to can be set via the SO_LINGER option. These problems are addressed by giving the CCID a grace period of up to the `timeout' value. The wait-for-ccid function is, as before, used when the application (a) has read all the data in its receive buffer and (b) if SO_LINGER was set with a non-zero linger time, or (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ state (client application closes after receiving CloseReq). In addition, there is a catch-all case of __skb_queue_purge() after waiting for the CCID. This is necessary since the write queue may still have data when (a) the host has been passively-closed, (b) abnormal termination (unread data, zero linger time), (c) wait-for-ccid could not finish within the given time limit. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 10:27:01 -07:00
Gerrit Renker	dc841e30ea	dccp: Extend CCID packet dequeueing interface This extends the packet dequeuing interface of dccp_write_xmit() to allow 1. CCIDs to take care of timing when the next packet may be sent; 2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds). The main purpose is to take CCID-2 out of its polling mode (when it is network- limited, it tries every millisecond to send, without interruption). The mode of operation for (2) is as follows: * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(), * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full), * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER', * dccp_write_xmit() returns without further action; * after some time the wait-condition for CCID becomes true, * that CCID schedules the tasklet, * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(), * since the wait-condition is now true, ccid_hc_tx_packet() returns "send now", * packet is sent, and possibly more (since dccp_write_xmit() loops). Code reuse: the taskled function calls dccp_write_xmit(), the timer function reduces to a wrapper around the same code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 10:27:00 -07:00
Gerrit Renker	fe84f4140f	dccp: Return-value convention of hc_tx_send_packet() This patch reorganises the return value convention of the CCID TX sending function, to permit more flexible schemes, as required by subsequent patches. Currently the convention is * values < 0 mean error, * a value == 0 means "send now", and * a value x > 0 means "send in x milliseconds". The patch provides symbolic constants and a function to interpret return values. In addition, it caps the maximum positive return value to 0xFFFF milliseconds, corresponding to 65.535 seconds. This is possible since in CCID-3/4 the maximum possible inter-packet gap is fixed at t_mbi = 64 sec. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 10:27:00 -07:00
Sanchit Garg	f6ac55b6c1	net/9p: Return error on read with NULL buffer This patch ensures that a read(fd, NULL, 10) returns EFAULT on a 9p file. Signed-off-by: Sanchit Garg <sancgarg@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:49 -05:00
Venkateswararao Jujjuri (JV)	b165d60145	9p: Add datasync to client side TFSYNC/RFSYNC for dotl SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:49 -05:00
Aneesh Kumar K.V	7b3bb3fe16	net/9p: Return error if we fail to encode protocol data We need to return error in case we fail to encode data in protocol buffer. This patch also return error in case of a failed copy_from_user. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:49 -05:00
Venkateswararao Jujjuri (JV)	52f44e0d08	net/9p: Add waitq to VirtIO transport. If there is not enough space for the PDU on the VirtIO ring, current code returns -EIO propagating the error to user. This patch introduced a wqit_queue on the channel, and lets the process wait on this queue until VirtIO ring frees up. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:48 -05:00
Venkateswararao Jujjuri (JV)	419b39561e	[net/9p]Serialize virtqueue operations to make VirtIO transport SMP safe. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:48 -05:00
M. Mohan Kumar	329176cc2c	9p: Implement TREADLINK operation for 9p2000.L Synopsis size[4] TReadlink tag[2] fid[4] size[4] RReadlink tag[2] target[s] Description Readlink is used to return the contents of the symoblic link referred by fid. Contents of symboic link is returned as a response. target[s] - Contents of the symbolic link referred by fid. Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:48 -05:00
M. Mohan Kumar	1d769cd192	9p: Implement TGETLOCK Synopsis size[4] TGetlock tag[2] fid[4] getlock[n] size[4] RGetlock tag[2] getlock[n] Description TGetlock is used to test for the existence of byte range posix locks on a file identified by given fid. The reply contains getlock structure. If the lock could be placed it returns F_UNLCK in type field of getlock structure. Otherwise it returns the details of the conflicting locks in the getlock structure getlock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK start[8] - Starting offset for lock length[8] - Number of bytes to check for the lock If length is 0, check for lock in all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock/owns the task in case of reply client[4] - Client id of the system that owns the process which has the conflicting lock Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:47 -05:00
M. Mohan Kumar	a099027c77	9p: Implement TLOCK Synopsis size[4] TLock tag[2] fid[4] flock[n] size[4] RLock tag[2] status[1] Description Tlock is used to acquire/release byte range posix locks on a file identified by given fid. The reply contains status of the lock request flock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK flags[4] - Flags could be either of P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a conflicting lock exists, wait for that lock to be released. P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is trying to reclaim a lock after a server restrart (due to crash) start[8] - Starting offset for lock length[8] - Number of bytes to lock If length is 0, lock all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock client_id[4] - Unique client id status[1] - Status of the lock request, can be P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or P9_LOCK_GRACE(3) P9_LOCK_SUCCESS - Request was successful P9_LOCK_BLOCKED - A conflicting lock is held by another process P9_LOCK_ERROR - Error while processing the lock request P9_LOCK_GRACE - Server is in grace period, it can't accept new lock requests in this period (except locks with P9_LOCK_FLAGS_RECLAIM flag set) Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:47 -05:00
Venkateswararao Jujjuri (JV)	920e65dc69	[9p] Introduce client side TFSYNC/RFSYNC for dotl. SYNOPSIS size[4] Tfsync tag[2] fid[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:47 -05:00
jvrao	8e44a0805f	net/9p: Add a Warning to catch NULL fids passed to p9_client_clunk(). Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:45 -05:00
Arun R Bharadwaj	4f7ebe8072	net/9p: This patch implements TLERROR/RLERROR on the 9P client. Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:45 -05:00
Eric Dumazet	6b1686a71e	netfilter: nf_conntrack: allow nf_ct_alloc_hashtable() to get highmem pages commit `ea781f197d` (use SLAB_DESTROY_BY_RCU and get rid of call_rcu()) did a mistake in __vmalloc() call in nf_ct_alloc_hashtable(). I forgot to add __GFP_HIGHMEM, so pages were taken from LOWMEM only. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-28 12:34:21 +02:00
Linus Torvalds	22cdbd1d57	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits) ehea: Fixing statistics bonding: Fix lockdep warning after bond_vlan_rx_register() tunnels: Fix tunnels change rcu protection caif-u5500: Build config for CAIF shared mem driver caif-u5500: CAIF shared memory mailbox interface caif-u5500: CAIF shared memory transport protocol caif-u5500: Adding shared memory include drivers/isdn: delete double assignment drivers/net/typhoon.c: delete double assignment drivers/net/sb1000.c: delete double assignment qlcnic: define valid vlan id range qlcnic: reduce rx ring size qlcnic: fix mac learning ehea: fix use after free inetpeer: __rcu annotations fib_rules: __rcu annotates ctarget tunnels: add __rcu annotations net: add __rcu annotations to protocol ipv4: add __rcu annotations to routes.c qlge: bugfix: Restoring the vlan setting. ...	2010-10-27 18:28:00 -07:00
Pavel Emelyanov	74b0b85b88	tunnels: Fix tunnels change rcu protection After making rcu protection for tunnels (ipip, gre, sit and ip6) a bug was introduced into the SIOCCHGTUNNEL code. The tunnel is first unlinked, then addresses change, then it is linked back probably into another bucket. But while changing the parms, the hash table is unlocked to readers and they can lookup the improper tunnel. Respective commits are `b7285b79` (ipip: get rid of ipip_lock), `1507850b` (gre: get rid of ipgre_lock), `3a43be3c` (sit: get rid of ipip6_lock) and `94767632` (ip6tnl: get rid of ip6_tnl_lock). The quick fix is to wait for quiescent state to pass after unlinking, but if it is inappropriate I can invent something better, just let me know. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 14:20:08 -07:00
Jouni Malinen	dc9f48ce7c	mac80211: Fix scan_ies_len to include DS Params Commit `651b52254f` added DS Parameter Set information into Probe Request frames that are transmitted on 2.4 GHz band, but it failed to increment local->scan_ies_len to cover this new information. This variable needs to be updated to match the maximum IE data length so that the extra buffer need gets reduced from the driver limit. Signed-off-by: Jouni Malinen <j@w1.fi> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-27 15:46:51 -04:00
Eric Dumazet	b914c4ea92	inetpeer: __rcu annotations Adds __rcu annotations to inetpeer (struct inet_peer)->avl_left (struct inet_peer)->avl_right This is a tedious cleanup, but removes one smp_wmb() from link_to_pool() since we now use more self documenting rcu_assign_pointer(). Note the use of RCU_INIT_POINTER() instead of rcu_assign_pointer() in all cases we dont need a memory barrier. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:33 -07:00
Eric Dumazet	7a2b03c517	fib_rules: __rcu annotates ctarget Adds __rcu annotation to (struct fib_rule)->ctarget Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:32 -07:00
Eric Dumazet	b33eab0844	tunnels: add __rcu annotations Add __rcu annotations to : (struct ip_tunnel)->prl (struct ip_tunnel_prl_entry)->next (struct xfrm_tunnel)->next struct xfrm_tunnel tunnel4_handlers struct xfrm_tunnel tunnel64_handlers And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:32 -07:00
Eric Dumazet	e0ad61ec86	net: add __rcu annotations to protocol Add __rcu annotations to : struct net_protocol inet_protos struct net_protocol inet6_protos And use appropriate casts to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:31 -07:00
Eric Dumazet	1c31720a74	ipv4: add __rcu annotations to routes.c Add __rcu annotations to : (struct dst_entry)->rt_next (struct rt_hash_bucket)->chain And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:31 -07:00
Ursula Braun	853dc2e03d	ipv6: fix refcnt problem related to POSTDAD state After running this bonding setup script modprobe bonding miimon=100 mode=0 max_bonds=1 ifconfig bond0 10.1.1.1/16 ifenslave bond0 eth1 ifenslave bond0 eth3 on s390 with qeth-driven slaves, modprobe -r fails with this message unregister_netdevice: waiting for bond0 to become free. Usage count = 1 due to twice detection of duplicate address. Problem is caused by a missing decrease of ifp->refcnt in addrconf_dad_failure. An extra call of in6_ifa_put(ifp) solves it. Problem has been introduced with commit `f2344a131b`. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:30 -07:00
Ben Hutchings	66c68bcc48	net: NETIF_F_HW_CSUM does not imply FCoE CRC offload NETIF_F_HW_CSUM indicates the ability to update an TCP/IP-style 16-bit checksum with the checksum of an arbitrary part of the packet data, whereas the FCoE CRC is something entirely different. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Cc: stable@kernel.org [2.6.32+] Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:29 -07:00
Ben Hutchings	af1905dbec	net: Fix some corner cases in dev_can_checksum() dev_can_checksum() incorrectly returns true in these cases: 1. The skb has both out-of-band and in-band VLAN tags and the device supports checksum offload for the encapsulated protocol but only with one layer of encapsulation. 2. The skb has a VLAN tag and the device supports generic checksumming but not in conjunction with VLAN encapsulation. Rearrange the VLAN tag checks to avoid these. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:29 -07:00
Linus Torvalds	426e1f5cec	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) split invalidate_inodes() fs: skip I_FREEING inodes in writeback_sb_inodes fs: fold invalidate_list into invalidate_inodes fs: do not drop inode_lock in dispose_list fs: inode split IO and LRU lists fs: switch bdev inode bdi's correctly fs: fix buffer invalidation in invalidate_list fsnotify: use dget_parent smbfs: use dget_parent exportfs: use dget_parent fs: use RCU read side protection in d_validate fs: clean up dentry lru modification fs: split __shrink_dcache_sb fs: improve DCACHE_REFERENCED usage fs: use percpu counter for nr_dentry and nr_dentry_unused fs: simplify __d_free fs: take dcache_lock inside __d_path fs: do not assign default i_ino in new_inode fs: introduce a per-cpu last_ino allocator new helper: ihold() ...	2010-10-26 17:58:44 -07:00
Eric Dumazet	518de9b39e	fs: allow for more than 2^31 files Robin Holt tried to boot a 16TB system and found af_unix was overflowing a 32bit value : <quote> We were seeing a failure which prevented boot. The kernel was incapable of creating either a named pipe or unix domain socket. This comes down to a common kernel function called unix_create1() which does: atomic_inc(&unix_nr_socks); if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) goto out; The function get_max_files() is a simple return of files_stat.max_files. files_stat.max_files is a signed integer and is computed in fs/file_table.c's files_init(). n = (mempages * (PAGE_SIZE / 1024)) / 10; files_stat.max_files = n; In our case, mempages (total_ram_pages) is approx 3,758,096,384 (0xe0000000). That leaves max_files at approximately 1,503,238,553. This causes 2 * get_max_files() to integer overflow. </quote> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long integers, and change af_unix to use an atomic_long_t instead of atomic_t. get_max_files() is changed to return an unsigned long. get_nr_files() is changed to return a long. unix_nr_socks is changed from atomic_t to atomic_long_t, while not strictly needed to address Robin problem. Before patch (on a 64bit kernel) : # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max -18446744071562067968 After patch: # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max 2147483648 # cat /proc/sys/fs/file-nr 704 0 2147483648 Reported-by: Robin Holt <holt@sgi.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Reviewed-by: Robin Holt <holt@sgi.com> Tested-by: Robin Holt <holt@sgi.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:15 -07:00
Glenn Wurster	7a876b0efc	IPv6: Temp addresses are immediately deleted. There is a bug in the interaction between ipv6_create_tempaddr and addrconf_verify. Because ipv6_create_tempaddr uses the cstamp and tstamp from the public address in creating a private address, if we have not received a router advertisement in a while, tstamp + temp_valid_lft might be < now. If this happens, the new address is created inside ipv6_create_tempaddr, then the loop within addrconf_verify starts again and the address is immediately deleted. We are left with no temporary addresses on the interface, and no more will be created until the public IP address is updated. To avoid this, set the expiry time to be the minimum of the time left on the public address or the config option PLUS the current age of the public interface. Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-26 12:35:13 -07:00
Glenn Wurster	aed65501e8	IPv6: Create temporary address if none exists. If privacy extentions are enabled, but no current temporary address exists, then create one when we get a router advertisement. Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-26 12:35:12 -07:00
Eric Dumazet	ded85aa86b	fib_hash: fix rcu sparse and logical errors While fixing CONFIG_SPARSE_RCU_POINTER errors, I had to fix accesses to fz->fz_hash for real. - &fz->fz_hash[fn_hash(f->fn_key, fz)] + rcu_dereference(fz->fz_hash) + fn_hash(f->fn_key, fz) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-26 11:42:39 -07:00
Eric Dumazet	ebb9fed2de	fib: fix fib_nl_newrule() Some panic reports in fib_rules_lookup() show a rule could have a NULL pointer as a next pointer in the rules_list. This can actually happen because of a bug in fib_nl_newrule() : It checks if current rule is the destination of unresolved gotos. (Other rules have gotos to this about to be inserted rule) Problem is it does the resolution of the gotos before the rule is inserted in the rules_list (and has a valid next pointer) Fix this by moving the rules_list insertion before the changes on gotos. A lockless reader can not any more follow a ctarget pointer, unless destination is ready (has a valid next pointer) Reported-by: Oleg A. Arkhangelsky <sysoleg@yandex.ru> Reported-by: Joe Buehler <aspam@cox.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-26 11:42:38 -07:00
David S. Miller	78fd9c4491	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-10-26 11:32:28 -07:00
Linus Torvalds	4390110fef	Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linux * 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits) svcrpc: svc_tcp_sendto XPT_DEAD check is redundant svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue svcrpc: assume svc_delete_xprt() called only once svcrpc: never clear XPT_BUSY on dead xprt nfsd4: fix connection allocation in sequence() nfsd4: only require krb5 principal for NFSv4.0 callbacks nfsd4: move minorversion to client nfsd4: delay session removal till free_client nfsd4: separate callback change and callback probe nfsd4: callback program number is per-session nfsd4: track backchannel connections nfsd4: confirm only on succesful create_session nfsd4: make backchannel sequence number per-session nfsd4: use client pointer to backchannel session nfsd4: move callback setup into session init code nfsd4: don't cache seq_misordered replies SUNRPC: Properly initialize sock_xprt.srcaddr in all cases SUNRPC: Use conventional switch statement when reclassifying sockets sunrpc/xprtrdma: clean up workqueue usage sunrpc: Turn list_for_each-s into the ..._entry-s ... Fix up trivial conflicts (two different deprecation notices added in separate branches) in Documentation/feature-removal-schedule.txt	2010-10-26 09:55:25 -07:00
Linus Torvalds	a4dd8dce14	Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: net/sunrpc: Use static const char arrays nfs4: fix channel attribute sanity-checks NFSv4.1: Use more sensible names for 'initialize_mountpoint' NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure NFS: client needs to maintain list of inodes with active layouts NFS: create and destroy inode's layout cache NFSv4.1: pnfs: filelayout: introduce minimal file layout driver NFSv4.1: pnfs: full mount/umount infrastructure NFS: set layout driver NFS: ask for layouttypes during v4 fsinfo call NFS: change stateid to be a union NFSv4.1: pnfsd, pnfs: protocol level pnfs constants SUNRPC: define xdr_decode_opaque_fixed NFSD: remove duplicate NFS4_STATEID_SIZE	2010-10-26 09:52:09 -07:00
David S. Miller	7932c2e55c	netfilter: Add missing CONFIG_SYSCTL checks in ipv6's nf_conntrack_reasm.c Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-26 09:08:53 -07:00
Joe Perches	411b5e0561	net/sunrpc: Use static const char arrays Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-25 22:19:52 -04:00
Christoph Hellwig	85fe4025c6	fs: do not assign default i_ino in new_inode Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Al Viro	7de9c6ee3e	new helper: ihold() Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Eric Dumazet	7e360c38ab	fs: allow for more than 2^31 files Andrew, Could you please review this patch, you probably are the right guy to take it, because it crosses fs and net trees. Note : /proc/sys/fs/file-nr is a read-only file, so this patch doesnt depend on previous patch (sysctl: fix min/max handling in __do_proc_doulongvec_minmax()) Thanks ! [PATCH V4] fs: allow for more than 2^31 files Robin Holt tried to boot a 16TB system and found af_unix was overflowing a 32bit value : <quote> We were seeing a failure which prevented boot. The kernel was incapable of creating either a named pipe or unix domain socket. This comes down to a common kernel function called unix_create1() which does: atomic_inc(&unix_nr_socks); if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) goto out; The function get_max_files() is a simple return of files_stat.max_files. files_stat.max_files is a signed integer and is computed in fs/file_table.c's files_init(). n = (mempages * (PAGE_SIZE / 1024)) / 10; files_stat.max_files = n; In our case, mempages (total_ram_pages) is approx 3,758,096,384 (0xe0000000). That leaves max_files at approximately 1,503,238,553. This causes 2 * get_max_files() to integer overflow. </quote> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long integers, and change af_unix to use an atomic_long_t instead of atomic_t. get_max_files() is changed to return an unsigned long. get_nr_files() is changed to return a long. unix_nr_socks is changed from atomic_t to atomic_long_t, while not strictly needed to address Robin problem. Before patch (on a 64bit kernel) : # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max -18446744071562067968 After patch: # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max 2147483648 # cat /proc/sys/fs/file-nr 704 0 2147483648 Reported-by: Robin Holt <holt@sgi.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Reviewed-by: Robin Holt <holt@sgi.com> Tested-by: Robin Holt <holt@sgi.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:18:20 -04:00
J. Bruce Fields	42d7ba3d6d	svcrpc: svc_tcp_sendto XPT_DEAD check is redundant The only caller (svc_send) has already checked XPT_DEAD. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-25 17:59:34 -04:00
J. Bruce Fields	01dba075d5	svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue If any xprt marked DEAD is also left BUSY for the rest of its life, then the XPT_DEAD check here is superfluous--we'll get the same result from the XPT_BUSY check just after. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-25 17:59:33 -04:00
J. Bruce Fields	ac9303eb74	svcrpc: assume svc_delete_xprt() called only once As long as DEAD exports are left BUSY, and svc_delete_xprt is called only with BUSY held, then svc_delete_xprt() will never be called on an xprt that is already DEAD. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-25 17:59:32 -04:00

1 2 3 4 5 ...

17192 Commits