linux

Commit Graph

Author	SHA1	Message	Date
Stephen Hemminger	b1153f29ee	netlink: make socket filters work on netlink Make socket filters work for netlink unicast and notifications. This is useful for applications like Zebra that get overrun with messages that are then ignored. Note: netlink messages are in host byte order, but packet filter state machine operations are done as network byte order. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 15:46:12 -07:00
Denis V. Lunev	edf0208702	[NET]: Make netlink_kernel_release publically available as sk_release_kernel. This staff will be needed for non-netlink kernel sockets, which should also not pin a namespace like tcp_socket and icmp_socket. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:18:32 -08:00
Denis V. Lunev	9dfbec1fb2	[NETLINK]: No need for a separate __netlink_release call. Merge it to netlink_kernel_release. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:17:56 -08:00
Pavel Emelyanov	910d6c320c	[GENETLINK]: Relax dances with genl_lock. The genl_unregister_family() calls the genl_unregister_mc_groups(), which takes and releases the genl_lock and then locks and releases this lock itself. Relax this behavior, all the more so the genl_unregister_mc_groups() is called from genl_unregister_family() only. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-12 22:16:33 -08:00
Al Viro	0c11b9428f	[PATCH] switch audit_get_loginuid() to task_struct * all callers pass something->audit_context Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-02-01 14:04:59 -05:00
Pavel Emelyanov	23fe18669e	[NETNS]: Fix race between put_net() and netlink_kernel_create(). The comment about "race free view of the set of network namespaces" was a bit hasty. Look (there even can be only one CPU, as discovered by Alexey Dobriyan and Denis Lunev): put_net() if (atomic_dec_and_test(&net->refcnt)) /* true / __put_net(net); queue_work(...); / * note: the net now has refcnt 0, but still in * the global list of net namespaces / == re-schedule == register_pernet_subsys(&some_ops); register_pernet_operations(&some_ops); (some_ops)->init(net); /* * we call netlink_kernel_create() here * in some places / netlink_kernel_create(); sk_alloc(); get_net(net); / refcnt = 1 / / * now we drop the net refcount not to * block the net namespace exit in the * future (or this can be done on the * error path) / put_net(sk->sk_net); if (atomic_dec_and_test(&...)) / * true. BOOOM! The net is * scheduled for release twice */ When thinking on this problem, I decided, that getting and putting the net in init callback is wrong. If some init callback needs to have a refcount-less reference on the struct net, _it_ has to be careful himself, rather than relying on the infrastructure to handle this correctly. In case of netlink_kernel_create(), the problem is that the sk_alloc() gets the given namespace, but passing the info that we don't want to get it inside this call is too heavy. Instead, I propose to crate the socket inside an init_net namespace and then re-attach it to the desired one right after the socket is created. After doing this, we also have to be careful on error paths not to drop the reference on the namespace, we didn't get the one on. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:22 -08:00
Patrick McHardy	01480e1cf5	[NETLINK]: Add nla_append() Used to append data to a message without a header or padding. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:09 -08:00
Denis V. Lunev	775516bfa2	[NETNS]: Namespace stop vs 'ip r l' race. During network namespace stop process kernel side netlink sockets belonging to a namespace should be closed. They should not prevent namespace to stop, so they do not increment namespace usage counter. Though this counter will be put during last sock_put. The raplacement of the correct netns for init_ns solves the problem only partial as socket to be stoped until proper stop is a valid netlink kernel socket and can be looked up by the user processes. This is not a problem until it resides in initial namespace (no processes inside this net), but this is not true for init_net. So, hold the referrence for a socket, remove it from lookup tables and only after that change namespace and perform a last put. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:08 -08:00
Denis V. Lunev	b7c6ba6eb1	[NETNS]: Consolidate kernel netlink socket destruction. Create a specific helper for netlink kernel socket disposal. This just let the code look better and provides a ground for proper disposal inside a namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:07 -08:00
Denis V. Lunev	869e58f870	[NETNS]: Double free in netlink_release. Netlink protocol table is global for all namespaces. Some netlink protocols have been virtualized, i.e. they have per/namespace netlink socket. This difference can easily lead to double free if more than 1 namespace is started. Count the number of kernel netlink sockets to track that this table is not used any more. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:05 -08:00
Ilpo Järvinen	3f25252675	[NETLINK] af_netlink: kill some bloat net/netlink/af_netlink.c: netlink_realloc_groups \| -46 netlink_insert \| -49 netlink_autobind \| -94 netlink_clear_multicast_users \| -48 netlink_bind \| -55 netlink_setsockopt \| -54 netlink_release \| -86 netlink_kernel_create \| -47 netlink_change_ngroups \| -56 9 functions changed, 535 bytes removed, diff: -535 net/netlink/af_netlink.c: netlink_table_ungrab \| +53 1 function changed, 53 bytes added, diff: +53 net/netlink/af_netlink.o: 10 functions changed, 53 bytes added, 535 bytes removed, diff: -482 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:50 -08:00
Eric Dumazet	9a429c4983	[NET]: Add some acquires/releases sparse annotations. Add __acquires() and __releases() annotations to suppress some sparse warnings. example of warnings : net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong count at exit net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' - unexpected unlock Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:31 -08:00
Eric Dumazet	ea72912c88	[NETLINK]: kzalloc() conversion nl_pid_hash_alloc() is renamed to nl_pid_hash_zalloc(). It is now returning zeroed memory to its callers. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:06 -08:00
Patrick McHardy	6ac552fdc6	[NETLINK]: af_netlink.c checkpatch cleanups Fix large number of checkpatch errors. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:50 -08:00
Denis V. Lunev	e372c41401	[NET]: Consolidate net namespace related proc files creation. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:28 -08:00
Denis V. Lunev	022cbae611	[NET]: Move unneeded data to initdata section. This patch reverts Eric's commit `2b008b0a8e` It diets .text & .data section of the kernel if CONFIG_NET_NS is not set. This is safe after list operations cleanup. Signed-of-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-13 03:23:50 -08:00
Patrick McHardy	c3d8d1e30c	[NETLINK]: Fix unicast timeouts Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts by moving the schedule_timeout() call to a new function that doesn't propagate the remaining timeout back to the caller. This means on each retry we start with the full timeout again. ipc/mqueue.c seems to actually want to wait indefinitely so this behaviour is retained. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:15:12 -08:00
Pavel Emelyanov	6257ff2177	[NET]: Forget the zero_it argument of sk_alloc() Finally, the zero_it argument can be completely removed from the callers and from the function prototype. Besides, fix the checkpatch.pl warnings about using the assignments inside if-s. This patch is rather big, and it is a part of the previous one. I splitted it wishing to make the patches more readable. Hope this particular split helped. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-01 00:39:31 -07:00
Eric W. Biederman	2b008b0a8e	[NET]: Marking struct pernet_operations __net_initdata was inappropriate It is not safe to to place struct pernet_operations in a special section. We need struct pernet_operations to last until we call unregister_pernet_subsys. Which doesn't happen until module unload. So marking struct pernet_operations is a disaster for modules in two ways. - We discard it before we call the exit method it points to. - Because I keep struct pernet_operations on a linked list discarding it for compiled in code removes elements in the middle of a linked list and does horrible things for linked insert. So this looks safe assuming __exit_refok is not discarded for modules. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 22:54:53 -07:00
Denis V. Lunev	5c58298c25	[NETLINK]: Fix ACK processing after netlink_dump_start Revert to original netlink behavior. Do not reply with ACK if the netlink dump has bees successfully started. libnl has been broken by the `cd40b7d398` The following command reproduce the problem: /nl-route-get 192.168.1.1 Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-23 21:27:51 -07:00
Jesper Juhl	f937f1f46b	[NETLINK]: Don't leak 'listeners' in netlink_kernel_create() The Coverity checker spotted that we'll leak the storage allocated to 'listeners' in netlink_kernel_create() when the if (!nl_table[unit].registered) check is false. This patch avoids the leak. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:32 -07:00
Denis V. Lunev	cd40b7d398	[NET]: make netlink user -> kernel interface synchronious This patch make processing netlink user -> kernel messages synchronious. This change was inspired by the talk with Alexey Kuznetsov about current netlink messages processing. He says that he was badly wrong when introduced asynchronious user -> kernel communication. The call netlink_unicast is the only path to send message to the kernel netlink socket. But, unfortunately, it is also used to send data to the user. Before this change the user message has been attached to the socket queue and sk->sk_data_ready was called. The process has been blocked until all pending messages were processed. The bad thing is that this processing may occur in the arbitrary process context. This patch changes nlk->data_ready callback to get 1 skb and force packet processing right in the netlink_unicast. Kernel -> user path in netlink_unicast remains untouched. EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock drop, but the process remains in the cycle until the message will be fully processed. So, there is no need to use this kludges now. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 21:15:29 -07:00
Denis V. Lunev	aed815601f	[NET]: unify netlink kernel socket recognition There are currently two ways to determine whether the netlink socket is a kernel one or a user one. This patch creates a single inline call for this purpose and unifies all the calls in the af_netlink.c No similar calls are found outside af_netlink.c. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 21:14:32 -07:00
Denis V. Lunev	7ee015e0fa	[NET]: cleanup 3rd argument in netlink_sendskb netlink_sendskb does not use third argument. Clean it and save a couple of bytes. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 21:14:03 -07:00
Denis V. Lunev	3b71535f35	[NET]: Make netlink processing routines semi-synchronious (inspired by rtnl) v2 The code in netfilter/nfnetlink.c and in ./net/netlink/genetlink.c looks like outdated copy/paste from rtnetlink.c. Push them into sync with the original. Changes from v1: - deleted comment in nfnetlink_rcv_msg by request of Patrick McHardy Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 21:13:32 -07:00
Pavel Emelyanov	cf7732e4cc	[NET]: Make core networking code use seq_open_private This concerns the ipv4 and ipv6 code mostly, but also the netlink and unix sockets. The netlink code is an example of how to use the __seq_open_private() call - it saves the net namespace on this private. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:33 -07:00
Pavel Emelyanov	4665079cbb	[NETNS]: Move some code into __init section when CONFIG_NET_NS=n With the net namespaces many code leaved the __init section, thus making the kernel occupy more memory than it did before. Since we have a config option that prohibits the namespace creation, the functions that initialize/finalize some netns stuff are simply not needed and can be freed after the boot. Currently, this is almost not noticeable, since few calls are no longer in __init, but when the namespaces will be merged it will be possible to free more code. I propose to use the __net_init, __net_exit and __net_initdata "attributes" for functions/variables that are not used if the CONFIG_NET_NS is not set to save more space in memory. The exiting functions cannot just reside in the __exit section, as noticed by David, since the init section will have references on it and the compilation will fail due to modpost checks. These references can exist, since the init namespace never dies and the exit callbacks are never called. So I introduce the __exit_refok attribute just like it is already done with the __init_refok. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:58 -07:00
Denis Cheng	26ff5ddc5a	[NETLINK]: the temp variable name max is ambiguous with the macro max provided by <linux/kernel.h>, so changed its name to a more proper one: limit Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:25 -07:00
Denis Cheng	99406c885a	[NETLINK]: use the macro min(x,y) provided by <linux/kernel.h> instead Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:25 -07:00
Herbert Xu	0cfad07555	[NETLINK]: Avoid pointer in netlink_run_queue I was looking at Patrick's fix to inet_diag and it occured to me that we're using a pointer argument to return values unnecessarily in netlink_run_queue. Changing it to return the value will allow the compiler to generate better code since the value won't have to be memory-backed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:24 -07:00
Eric W. Biederman	077130c0cf	[NET]: Fix race when opening a proc file while a network namespace is exiting. The problem: proc_net files remember which network namespace the are against but do not remember hold a reference count (as that would pin the network namespace). So we currently have a small window where the reference count on a network namespace may be incremented when opening a /proc file when it has already gone to zero. To fix this introduce maybe_get_net and get_proc_net. maybe_get_net increments the network namespace reference count only if it is greater then zero, ensuring we don't increment a reference count after it has gone to zero. get_proc_net handles all of the magic to go from a proc inode to the network namespace instance and call maybe_get_net on it. PROC_NET the old accessor is removed so that we don't get confused and use the wrong helper function. Then I fix up the callers to use get_proc_net and handle the case case where get_proc_net returns NULL. In that case I return -ENXIO because effectively the network namespace has already gone away so the files we are trying to access don't exist anymore. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:22 -07:00
Thomas Graf	8f4c1f9b04	[NETLINK]: Introduce nested and byteorder flag to netlink attribute This change allows the generic attribute interface to be used within the netfilter subsystem where this flag was initially introduced. The byte-order flag is yet unused, it's intended use is to allow automatic byte order convertions for all atomic types. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:16 -07:00
Eric W. Biederman	b4b510290b	[NET]: Support multiple network namespaces with netlink Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:09 -07:00
Eric W. Biederman	1b8d7ae42d	[NET]: Make socket creation namespace safe. This patch passes in the namespace a new socket should be created in and has the socket code do the appropriate reference counting. By virtue of this all socket create methods are touched. In addition the socket create methods are modified so that they will fail if you attempt to create a socket in a non-default network namespace. Failing if we attempt to create a socket outside of the default network namespace ensures that as we incrementally make the network stack network namespace aware we will not export functionality that someone has not audited and made certain is network namespace safe. Allowing us to partially enable network namespaces before all of the exotic protocols are supported. Any protocol layers I have missed will fail to compile because I now pass an extra parameter into the socket creation code. [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:07 -07:00
Eric W. Biederman	457c4cbc5a	[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:06 -07:00
Denis Cheng	32b21e034b	[NETLINK]: use container_of instead This could make future redesign of struct netlink_sock easier. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:35 -07:00
Thomas Graf	79d310d01e	[GENETLINK]: Correctly report errors while registering a multicast group Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-24 15:34:53 -07:00
Thomas Graf	2c04ddb707	[GENETLINK]: Fix adjustment of number of multicast groups The current calculation of the maximum number of genetlink multicast groups seems odd, fix it. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-24 15:33:51 -07:00
Thomas Graf	79dc4386ae	[GENETLINK]: Fix race in genl_unregister_mc_groups() family->mcast_groups is protected by genl_lock so it must be held while accessing the list in genl_unregister_mc_groups(). Requires adding a non-locking variant of genl_unregister_mc_group(). Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-24 15:32:46 -07:00
Johannes Berg	2dbba6f773	[GENETLINK]: Dynamic multicast groups. Introduce API to dynamically register and unregister multicast groups. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 15:47:52 -07:00
Johannes Berg	84659eb529	[NETLIKN]: Allow removing multicast groups. Allow kicking listeners out of a multicast group when necessary (for example if that group is going to be removed.) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 15:47:05 -07:00
Johannes Berg	b4ff4f0419	[NETLINK]: allocate group bitmaps dynamically Allow changing the number of groups for a netlink family after it has been created, use RCU to protect the listeners bitmap keeping netlink_has_listeners() lock-free. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 15:46:06 -07:00
Johannes Berg	eb49653449	[NETLINK]: negative groups in netlink_setsockopt Reading netlink_setsockopt it's not immediately clear why there isn't a bug when you pass in negative numbers, the reason being that the >= comparison is really unsigned although 'val' is signed because nlk->ngroups is unsigned. Make 'val' unsigned too. [ Update the get_user() cast to match. --DaveM ] Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 02:07:51 -07:00
Philippe De Muyter	56b3d975bb	[NET]: Make all initialized struct seq_operations const. Make all initialized struct seq_operations in net/ const Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 23:07:31 -07:00
Patrick McHardy	1092cb2197	[NETLINK]: attr: add nested compat attribute type Add a nested compat attribute type that can be used to convert attributes that contain a structure to nested attributes in a backwards compatible way. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:38 -07:00
Patrick McHardy	ef7c79ed64	[NETLINK]: Mark netlink policies const Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:10 -07:00
Randy Dunlap	e63340ae6b	header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:07 -07:00
Patrick McHardy	9e71efcd6d	[NETLINK]: Remove bogus BUG_ON Remove bogus BUG_ON(mutex_is_locked(nlk_sk(sk)->cb_mutex)), when the netlink_kernel_create caller specifies an external mutex it might validly be locked. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-04 12:15:11 -07:00
Patrick McHardy	188ccb5583	[NETLINK]: Fix use after free in netlink_recvmsg When the user passes in MSG_TRUNC the skb is used after getting freed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:27:01 -07:00
Herbert Xu	3f660d66df	[NETLINK]: Kill CB only when socket is unused Since we can still receive packets until all references to the socket are gone, we don't need to kill the CB until that happens. This also aligns ourselves with the receive queue purging which happens at that point. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:17:14 -07:00
Adrian Bunk	42bad1da50	[NETLINK]: Possible cleanups. - make the following needlessly global variables static: - core/rtnetlink.c: struct rtnl_msg_handlers[] - netfilter/nf_conntrack_proto.c: struct nf_ct_protos[] - make the following needlessly global functions static: - core/rtnetlink.c: rtnl_dump_all() - netlink/af_netlink.c: netlink_queue_skip() Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 00:57:41 -07:00
Patrick McHardy	ffa4d7216e	[NETLINK]: don't reinitialize callback mutex Don't reinitialize the callback mutex the netlink_kernel_create caller handed in, it is supposed to already be initialized and could already be held by someone. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:06 -07:00
Patrick McHardy	af65bdfce9	[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it Switch cb_lock to mutex and allow netlink kernel users to override it with a subsystem specific mutex for consistent locking in dump callbacks. All netlink_dump_start users have been audited not to rely on any side-effects of the previously used spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:03 -07:00
Johannes Berg	d30045a0bc	[NETLINK]: introduce NLA_BINARY type This patch introduces a new NLA_BINARY attribute policy type with the verification of simply checking the maximum length of the payload. It also fixes a small typo in the example. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:05 -07:00
Thomas Graf	c702e8047f	[NETLINK]: Directly return -EINTR from netlink_dump_start() Now that all users of netlink_dump_start() use netlink_run_queue() to process the receive queue, it is possible to return -EINTR from netlink_dump_start() directly, therefore simplying the callers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:33 -07:00
Thomas Graf	1d00a4eb42	[NETLINK]: Remove error pointer from netlink message handler The error pointer argument in netlink message handlers is used to signal the special case where processing has to be interrupted because a dump was started but no error happened. Instead it is simpler and more clear to return -EINTR and have netlink_run_queue() deal with getting the queue right. nfnetlink passed on this error pointer to its subsystem handlers but only uses it to signal the start of a netlink dump. Therefore it can be removed there as well. This patch also cleans up the error handling in the affected message handlers to be consistent since it had to be touched anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:30 -07:00
Thomas Graf	45e7ae7f71	[NETLINK]: Ignore control messages directly in netlink_run_queue() Changes netlink_rcv_skb() to skip netlink controll messages and don't pass them on to the message handler. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:29 -07:00
Thomas Graf	d35b685640	[NETLINK]: Ignore !NLM_F_REQUEST messages directly in netlink_run_queue() netlink_rcv_skb() is changed to skip messages which don't have the NLM_F_REQUEST bit to avoid every netlink family having to perform this check on their own. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:29 -07:00
Thomas Graf	33a0543cd9	[NETLINK]: Remove unused groups variable Leftover from dynamic multicast groups allocation work. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:27 -07:00
Arnaldo Carvalho de Melo	b529ccf279	[NETLINK]: Introduce nlmsg_hdr() helper For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the number of direct accesses to skb->data and for consistency with all the other cast skb member helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:34 -07:00
Arnaldo Carvalho de Melo	4305b54135	[SK_BUFF]: Convert skb->end to sk_buff_data_t Now to convert the last one, skb->data, that will allow many simplifications and removal of some of the offset helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:29 -07:00
Arnaldo Carvalho de Melo	27a884dc3c	[SK_BUFF]: Convert skb->tail to sk_buff_data_t So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:28 -07:00
Arnaldo Carvalho de Melo	badff6d01a	[SK_BUFF]: Introduce skb_reset_transport_header(skb) For the common, open coded 'skb->h.raw = skb->data' operation, so that we can later turn skb->h.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple cases: skb->h.raw = skb->data; skb->h.raw = {skb_push\|[__]skb_pull}() The next ones will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:15 -07:00
Adrian Bunk	cb69cc5236	[TCP/DCCP/RANDOM]: Remove unused exports. This patch removes the following not or no longer used exports: - drivers/char/random.c: secure_tcp_sequence_number - net/dccp/options.c: sysctl_dccp_feat_sequence_window - net/netlink/af_netlink.c: netlink_set_err Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:03 -07:00
David S. Miller	b558ff7999	[NETLINK]: Mirror UDP MSG_TRUNC semantics. If the user passes MSG_TRUNC in via msg_flags, return the full packet size not the truncated size. Idea from Herbert Xu and Thomas Graf. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:35 -07:00
Denis Lunev	ac57b3a9ce	[NETLINK]: Don't attach callback to a going-away netlink socket There is a race between netlink_dump_start() and netlink_release() that can lead to the situation when a netlink socket with non-zero callback is freed. Here it is: CPU1: CPU2 netlink_release(): netlink_dump_start(): sk = netlink_lookup(); /* OK / netlink_remove(); spin_lock(&nlk->cb_lock); if (nlk->cb) { / false / ... } spin_unlock(&nlk->cb_lock); spin_lock(&nlk->cb_lock); if (nlk->cb) { / false / ... } nlk->cb = cb; spin_unlock(&nlk->cb_lock); ... sock_orphan(sk); / * proceed with releasing * the socket */ The proposal it to make sock_orphan before detaching the callback in netlink_release() and to check for the sock to be SOCK_DEAD in netlink_dump_start() before setting a new callback. Signed-off-by: Denis Lunev <den@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-18 17:05:58 -07:00
Arjan van de Ven	da7071d7e3	[PATCH] mark struct file_operations const 8 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
YOSHIFUJI Hideaki	746fac4dcd	[NET] NETLINK: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:19:58 -08:00
Mariusz Kozlowski	5e7c001c62	[AF_NETLINK]: module_put cleanup This patch removes redundant argument check for module_put(). Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-03 18:38:15 -08:00
Josef Sipek	6db5fc5d53	[PATCH] struct path: convert netlink Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:48 -08:00
Jamal Hadi Salim	48d4ed7a86	[GENETLINK]: Fix misplaced command flags. The command flags for dump and do were swapped.. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 20:06:25 -08:00
Jamal Hadi Salim	334c29a645	[GENETLINK]: Move command capabilities to flags. This patch moves command capabilities to command flags. Other than being cleaner, saves several bytes. We increment the nlctrl version so as to signal to user space that to not expect the attributes. We will try to be careful not to do this too often ;-> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:38:41 -08:00
Jamal Hadi Salim	a4d1366d50	[GENETLINK]: Add cmd dump completion. Remove assumption that generic netlink commands cannot have dump completion callbacks. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:32:09 -08:00
Thomas Graf	4e9b826935	[NETLINK]: Remove unused dst_pid field in netlink_skb_parms The destination PID is passed directly to netlink_unicast() respectively netlink_multicast(). Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:43 -08:00
Thomas Graf	e94ef68205	[GENETLINK] ctrl: Avoid empty CTRL_ATTR_OPS attribute when dumping Based on Jamal's patch but compiled and even tested. :-) Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:23 -08:00
Thomas Graf	17c157c889	[GENL]: Add genlmsg_put_reply() to simplify building reply headers By modyfing genlmsg_put() to take a genl_family and by adding genlmsg_put_reply() the process of constructing the netlink and generic netlink headers is simplified. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:42 -08:00
Thomas Graf	81878d27fd	[GENL]: Add genlmsg_reply() to simply unicast replies to requests A generic netlink user has no interest in knowing how to address the source of the original request. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:41 -08:00
Thomas Graf	339bf98ffc	[NETLINK]: Do precise netlink message allocations where possible Account for the netlink message header size directly in nlmsg_new() instead of relying on the caller calculate it correctly. Replaces error handling of message construction functions when constructing notifications with bug traps since a failure implies a bug in calculating the size of the skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:11 -08:00
Heiko Carstens	a27b58fed9	[NET]: fix uaccess handling Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:41 -08:00
Thomas Graf	eb328111ef	[GENL]: Provide more information to userspace about registered genl families Additionaly exports the following information when providing the list of registered generic netlink families: - protocol version - header size - maximum number of attributes - list of available operations including - id - flags - avaiability of policy and doit/dumpit function libnl HEAD provides a utility to read this new information: 0x0010 nlctrl version 1 hdrsize 0 maxattr 6 op GETFAMILY (0x03) [POLICY,DOIT,DUMPIT] 0x0011 NLBL_MGMT version 1 hdrsize 0 maxattr 0 op unknown (0x02) [DOIT] op unknown (0x03) [DOIT] .... Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:51 -07:00
Thomas Graf	5176f91ea8	[NETLINK]: Make use of NLA_STRING/NLA_NUL_STRING attribute validation Converts existing NLA_STRING attributes to use the new validation features, saving a couple of temporary buffers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:25 -07:00
Thomas Graf	a5531a5d85	[NETLINK]: Improve string attribute validation Introduces a new attribute type NLA_NUL_STRING to support NUL terminated strings. Attributes of this kind require to carry a terminating NUL within the maximum specified in the policy. The `old' NLA_STRING which is not required to be NUL terminated is extended to provide means to specify a maximum length of the string. Aims at easing the pain with using nla_strlcpy() on temporary buffers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:24 -07:00
YOSHIFUJI Hideaki	ef047f5e10	[NET]: Use BUILD_BUG_ON() for checking size of skb->cb. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:15 -07:00
Thomas Graf	d387f6ad10	[NETLINK]: Add notification message sending interface Adds nlmsg_notify() implementing proper notification logic. The message is multicasted to all listeners in the group. The applications the requests orignates from can request a unicast back report in which case said socket will be excluded from the multicast to avoid duplicated notifications. nlmsg_multicast() is extended to take allocation flags to allow notification in atomic contexts. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:49 -07:00
Thomas Graf	bf8b79e444	[NETLINK]: Convert core netlink handling to new netlink api Fixes a theoretical memory and locking leak when the size of the netlink header would exceed the skb tailroom. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:44 -07:00
Thomas Graf	fe4944e59c	[NETLINK]: Extend netlink messaging interface Adds: nlmsg_get_pos() return current position in message nlmsg_trim() trim part of message nla_reserve_nohdr(skb, len) reserve room for an attribute w/o hdr nla_put_nohdr(skb, len, data) add attribute w/o hdr nla_find_nested() find attribute in nested attributes Fixes nlmsg_new() to take allocation flags and consider size. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:43 -07:00
Akinobu Mita	fab2caf62e	[NETLINK]: Call panic if nl_table allocation fails This patch makes crash happen if initialization of nl_table fails in initcalls. It is better than getting use after free crash later. Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-29 21:22:18 -07:00
Panagiotis Issaris	0da974f4f3	[NET]: Conversions from kmalloc+memset to k(z\|c)alloc. Signed-off-by: Panagiotis Issaris <takis@issaris.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-21 14:51:30 -07:00
Arjan van de Ven	6abd219c6e	[PATCH] bcm43xx: netlink deadlock fix reported by Jure Repinc: > > http://bugzilla.kernel.org/show_bug.cgi?id=6773 > > checked out dmesg output and found the message > > > > ====================================================== > > [ BUG: hard-safe -> hard-unsafe lock order detected! ] > > ------------------------------------------------------ > > > > starting at line 660 of the dmesg.txt that I will attach. The patch below should fix the deadlock, albeit I suspect it's not the "right" fix; the right fix may well be to move the rx processing in bcm43xx to softirq context. [it's debatable, ipw2200 hit this exact same bug; at some point it's better to bite the bullet and move this to the common layer as my patch below does] Make the nl_table_lock irq-safe; it's taken for read in various netlink functions, including functions that several wireless drivers (ipw2200, bcm43xx) want to call from hardirq context. The deadlock was found by the lock validator. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michael Buesch <mb@bu3sch.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Jeff Garzik <jeff@garzik.org> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: jamal <hadi@cyberus.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-03 15:26:58 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Darrel Goeddel	c7bdb545d2	[NETLINK]: Encapsulate eff_cap usage within security framework. This patch encapsulates the usage of eff_cap (in netlink_skb_params) within the security framework by extending security_netlink_recv to include a required capability parameter and converting all direct usage of eff_caps outside of the lsm modules to use the interface. It also updates the SELinux implementation of the security_netlink_send and security_netlink_recv hooks to take advantage of the sid in the netlink_skb_params struct. This also enables SELinux to perform auditing of netlink capability checks. Please apply, for 2.6.18 if possible. Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-29 16:57:55 -07:00
Linus Torvalds	532f57da40	Merge branch 'audit.b10' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b10' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] Audit Filter Performance [PATCH] Rework of IPC auditing [PATCH] More user space subject labels [PATCH] Reworked patch for labels on user space messages [PATCH] change lspp ipc auditing [PATCH] audit inode patch [PATCH] support for context based audit filtering, part 2 [PATCH] support for context based audit filtering [PATCH] no need to wank with task_lock() and pinning task down in audit_syscall_exit() [PATCH] drop task argument of audit_syscall_{entry,exit} [PATCH] drop gfp_mask in audit_log_exit() [PATCH] move call of audit_free() into do_exit() [PATCH] sockaddr patch [PATCH] deal with deadlocks in audit_free()	2006-05-01 21:43:05 -07:00
Steve Grubb	e7c3497013	[PATCH] Reworked patch for labels on user space messages The below patch should be applied after the inode and ipc sid patches. This patch is a reworking of Tim's patch that has been updated to match the inode and ipc patches since its similar. [updated: > Stephen Smalley also wanted to change a variable from isec to tsec in the > user sid patch. ] Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-05-01 06:09:58 -04:00
Soyoung Park	09493abfdb	[NETLINK]: cleanup unused macro in net/netlink/af_netlink.c 1 line removal, of unused macro. ran 'egrep -r' from linux-2.6.16/ for Nprintk and didn't see it anywhere else but here, in #define... Signed-off-by: Soyoung Park <speattle@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-29 18:33:13 -07:00
Alan Stern	e041c68341	[PATCH] Notifier chain update: API changes The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-27 08:44:50 -08:00
Ingo Molnar	14cc3e2b63	[PATCH] sem2mutex: misc static one-file mutexes Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Jens Axboe <axboe@suse.de> Cc: Neil Brown <neilb@cse.unsw.edu.au> Acked-by: Alasdair G Kergon <agk@redhat.com> Cc: Greg KH <greg@kroah.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-26 08:56:55 -08:00
Patrick McHardy	4277a083ec	[NETLINK]: Add netlink_has_listeners for avoiding unneccessary event message generation Keep a bitmask of multicast groups with subscribed listeners to let netlink users check for listeners before generating multicast messages. Queries don't perform any locking, which may result in false positives, it is guaranteed however that any new subscriptions are visible before bind() or setsockopt() return. Signed-off-by: Patrick McHardy <kaber@trash.net> ACKed-by: Jamal Hadi Salim<hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 18:52:01 -08:00
Patrick McHardy	cc9a06cd8d	[NETLINK]: Fix use-after-free in netlink_recvmsg The skb given to netlink_cmsg_recv_pktinfo is already freed, move it up a few lines. Coverity #948 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-12 20:39:38 -08:00
Jamal Hadi Salim	e200bd8065	[NETLINK] genetlink: Fix bugs spotted by Andrew Morton. - panic() doesn't return. - Don't forget to unlock on genl_register_family() error path - genl_rcv_msg() is called via pointer so there's no point in declaring it `inline'. Notes: genl_ctrl_event() ignores the genlmsg_multicast() return value. lots of things ignore the genl_ctrl_event() return value. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-13 15:51:24 -08:00
Alexey Kuznetsov	a70ea994a0	[NETLINK]: Fix a severe bug netlink overrun was broken while improvement of netlink. Destination socket is used in the place where it was meant to be source socket, so that now overrun is never sent to user netlink sockets, when it should be, and it even can be set on kernel socket, which results in complete deadlock of rtnetlink. Suggested fix is to restore status quo passing source socket as additional argument to netlink_attachskb(). A little explanation: overrun is set on a socket, when it failed to receive some message and sender of this messages does not or even have no way to handle this error. This happens in two cases: 1. when kernel sends something. Kernel never retransmits and cannot wait for buffer space. 2. when user sends a broadcast and the message was not delivered to some recipients. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-02-09 16:43:38 -08:00

1 2 3 4

188 Commits