linux

Commit Graph

Author	SHA1	Message	Date
David S. Miller	9e0064a545	phonet: Fix build warning. net/phonet/socket.c: In function ‘pn_res_seq_show’: net/phonet/socket.c:726: warning: format ‘%02X’ expects type ‘unsigned int’, but argument 3 has type ‘long int’ Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-15 21:34:41 -07:00
Rémi Denis-Courmont	507215f8d0	Phonet: list subscribed resources via proc_fs Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-15 21:31:33 -07:00
Rémi Denis-Courmont	b6a563b2af	Phonet: look up the resource routing table when forwarding Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-15 21:31:33 -07:00
Rémi Denis-Courmont	7417fa83c1	Phonet: hook resource routing to userspace via ioctl()'s I wish we could use something cleaner, such as bind(). But that would not work since resource subscription is orthogonal/in addition to the normal object ID allocated via bind(). This is similar to multicasting which also uses ioctl()'s. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-15 21:31:32 -07:00
Rémi Denis-Courmont	4e3d16ce5e	Phonet: resource routing backend When both destination device and object are nul, Phonet routes the packet according to the resource field. In fact, this is the most common pattern when sending Phonet "request" packets. In this case, the packet is delivered to whichever endpoint (socket) has registered the resource. This adds a new table so that Linux processes can register their Phonet sockets to Phonet resources, if they have adequate privileges. (Namespace support is not implemented at the moment.) Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-15 21:31:32 -07:00
Rémi Denis-Courmont	6482f554e2	Phonet: remove dangling pipe if an endpoint is closed early Closing a pipe endpoint is not normally allowed by the Phonet pipe, other than as a side after-effect of removing the pipe between two endpoints. But there is no way to prevent Linux userspace processes from being killed or suffering from bugs, so this can still happen. We might as well forcefully close Phonet pipe endpoints then. The cellular modem supports only a few existing pipes at a time. So we really should not leak them. This change instructs the modem to destroy the pipe if either of the pipe's endpoint (Linux socket) is closed too early. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-15 21:31:31 -07:00
David S. Miller	7fedd7e5df	Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6	2010-09-15 20:21:48 -07:00
Arnd Bergmann	0a5f1d476a	irda/irnet: use noop_llseek There may be applications trying to seek on the irnet character device, so we should use noop_llseek to avoid returning an error when the default llseek changes to no_llseek. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-15 19:29:55 -07:00
Eric Dumazet	3a43be3c32	sit: get rid of ipip6_lock As RTNL is held while doing tunnels inserts and deletes, we can remove ipip6_lock spinlock. My initial RCU conversion was conservative and converted the rwlock to spinlock, with no RTNL requirement. Use appropriate rcu annotations and modern lockdep checks as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-15 19:29:47 -07:00
Eric Dumazet	1507850b40	gre: get rid of ipgre_lock As RTNL is held while doing tunnels inserts and deletes, we can remove ipgre_lock spinlock. My initial RCU conversion was conservative and converted the rwlock to spinlock, with no RTNL requirement. Use appropriate rcu annotations and modern lockdep checks as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-15 19:29:46 -07:00
Eric Dumazet	b7285b7912	ipip: get rid of ipip_lock As RTNL is held while doing tunnels inserts and deletes, we can remove ipip_lock spinlock. My initial RCU conversion was conservative and converted the rwlock to spinlock, with no RTNL requirement. Use appropriate rcu annotations and modern lockdep checks as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-15 19:29:46 -07:00
Ben Hutchings	e0de7c93b9	ethtool: Remove unimplemented flow specification types struct ethtool_rawip4_spec and struct ethtool_ether_spec are neither commented nor used by any driver, so remove them. Adjust padding in the user-visible unions that included these structures. Fix references to struct ethtool_rawip4_spec in ethtool_get_rx_ntuple(), which should use struct ethtool_usrip4_spec. struct ethtool_usrip4_spec cannot hold IPv6 host addresses and there is no separate structure that can, so remove ETH_RX_NFC_IP6 and the reference to it in niu. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-15 14:42:13 -07:00
Gerrit Renker	37efb03fbd	dccp ccid-3: Simplify and consolidate tx_parse_options This simplifies and consolidates the TX option-parsing code: 1. The Loss Intervals option is not currently used, so dead code related to this option is removed. I am aware of no plans to support the option, but if someone wants to implement it (e.g. for inter-op tests), it is better to start afresh than having to also update currently unused code. 2. The Loss Event and Receive Rate options have a lot of code in common (both are 32 bit, both have same length etc.), so this is consolidated. 3. The test against GSR is not necessary, because - on first loading CCID3, ccid_new() zeroes out all fields in the socket; - ccid3_hc_tx_packet_recv() treats 0 and ~0U equivalently, due to pinv = opt_recv->ccid3or_loss_event_rate; if (pinv == ~0U \|\| pinv == 0) hctx->p = 0; - as a result, the sequence number field is removed from opt_recv. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-09-15 12:36:02 +02:00
Gerrit Renker	d2c726309d	dccp ccid-3: remove buggy RTT-sampling history lookup This removes the RTT-sampling function tfrc_tx_hist_rtt(), since 1. it suffered from complex passing of return values (the return value both indicated successful lookup while the value doubled as RTT sample); 2. when for some odd reason the sample value equalled 0, this triggered a bug warning about "bogus Ack", due to the ambiguity of the return value; 3. on a passive host which has not sent anything the TX history is empty and thus will lead to unwanted "bogus Ack" warnings such as ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-28197148 ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-26641606. The fix is to replace the implicit encoding by performing the steps manually. Furthermore, the "bogus Ack" warning has been removed, since it can actually be triggered due to several reasons (network reordering, old packet, (3) above), hence it is not very useful. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-09-15 12:36:02 +02:00
Gerrit Renker	20cbd3e120	dccp ccid-3: A lower bound for the inter-packet scheduling algorithm This fixes a subtle bug in the calculation of the inter-packet gap and shows that t_delta, as it is currently used, is not needed. The algorithm from RFC 5348, 8.3 below continually computes a send time t_nom, which is initialised with the current time t_now; t_gran = 1E6 / HZ specifies the scheduling granularity, s the packet size, and X the sending rate: t_distance = t_nom - t_now; // in microseconds t_delta = min(t_ipi, t_gran) / 2; // `delta' parameter in microseconds if (t_distance >= t_delta) { reschedule after (t_distance / 1000) milliseconds; } else { t_ipi = s / X; // inter-packet interval in usec t_nom += t_ipi; // compute the next send time send packet now; } Problem: -------- Rescheduling requires a conversion into milliseconds (sk_reset_timer()). The highest jiffy resolution with HZ=1000 is 1 millisecond, so using a higher granularity does not make much sense here. As a consequence, values of t_distance < 1000 are truncated to 0. This issue has so far been resolved by using instead if (t_distance >= t_delta + 1000) reschedule after (t_distance / 1000) milliseconds; This is unnecessarily large, a lower bound is t_delta' = max(t_delta, 1000). And it implies a further simplification: a) when HZ >= 500, then t_delta <= t_gran/2 = 10^6/(2HZ) <= 1000, so that t_delta' = MAX(1000, t_delta) = 1000 (constant value); b) when HZ < 500, then t_delta = 1/2MIN(rtt, t_ipi, t_gran) <= t_gran/2, so that 1000 <= t_delta' <= t_gran/2. The maximum error of using a constant t_delta in (b) is less than half a jiffy. Fix: ---- The patch replaces t_delta with a constant, whose value depends on CONFIG_HZ, changing the above algorithm to: if (t_distance >= t_delta') reschedule after (t_distance / 1000) milliseconds; where t_delta' = 10^6/(2*HZ) if HZ < 500, and t_delta' = 1000 otherwise. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-09-15 12:36:01 +02:00
andrew hendry	21a4591794	X.25 remove bkl in connect Connect already has socket locking. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-14 20:39:09 -07:00
Andrew Hendry	141646ce56	X.25 remove bkl in accept Accept already has socket locking. [ Extend socket locking over TCP_LISTEN state test. -DaveM ] Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-14 20:38:54 -07:00
andrew hendry	90c27297a9	X.25 remove bkl in bind Accept updates socket values in 3 lines so wrapped with lock_sock. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-14 20:34:52 -07:00
andrew hendry	25aa4efe4f	X.25 remove bkl in listen Listen updates socket values and needs lock_sock. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-14 20:34:52 -07:00
Joe Perches	55b1804c67	net/irda: Use static const char * const where possible Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-14 20:22:05 -07:00
Eric Dumazet	83b6b1f5d1	flow: better memory management Allocate hash tables for every online cpus, not every possible ones. NUMA aware allocations. Dont use a full page on arches where PAGE_SIZE > 1024sizeof(void ) misc: __percpu , __read_mostly, __cpuinit annotations flow_compare_t is just an "unsigned long" Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-13 20:02:50 -07:00
stephen hemminger	9ca7f87622	pkt_sched: remov unnecessary bh_disable Now that est_tree_lock is acquired with BH protection, the other call is unnecessary. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-10 12:47:59 -07:00
Eric Dumazet	a034ee3cca	fib: cleanups Use rcu_dereference_rtnl() helper Change hard coded constants in fib_flag_trans() 7 -> RTN_UNREACHABLE 8 -> RTN_PROHIBIT Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-10 12:32:02 -07:00
David S. Miller	e548833df8	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/main.c	2010-09-09 22:27:33 -07:00
Paul Gortmaker	b2abd4c033	tipc: Optimize handling excess content on incoming messages Remove code that trimmed excess trailing info from incoming messages arriving over an Ethernet interface. TIPC now ignores the extra info while the message is being processed by the node, and only trims it off if the message is retransmitted to another node. (This latter step is done to ensure the extra info doesn't cause the sk_buff to exceed the outgoing interface's MTU limit.) The outgoing buffer is guaranteed to be linear. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-09 21:34:14 -07:00
Eric Dumazet	49d61e2390	tunnels: missing rcu_assign_pointer() xfrm4_tunnel_register() & xfrm6_tunnel_register() should use rcu_assign_pointer() to make sure previous writes (to handler->next) are committed to memory before chain insertion. deregister functions dont need a particular barrier. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-09 15:02:39 -07:00
Namhyung Kim	f39234d606	net/core: add lock context change annotations in net/core/sock.c __lock_sock() and __release_sock() releases and regrabs lock but were missing proper annotations. Add it. This removes following warning from sparse. (Currently __lock_sock() does not emit any warning about it but I think it is better to add also.) net/core/sock.c:1580:17: warning: context imbalance in '__release_sock' - unexpected unlock Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-09 15:02:39 -07:00
Namhyung Kim	a700d8be73	net/core: remove address space warnings on verify_iovec() move_addr_to_kernel() and copy_from_user() requires their argument as __user pointer but were missing proper markups. Add it. This removes following warnings from sparse. net/core/iovec.c:44:52: warning: incorrect type in argument 1 (different address spaces) net/core/iovec.c:44:52: expected void [noderef] <asn:1>uaddr net/core/iovec.c:44:52: got void msg_name net/core/iovec.c:55:34: warning: incorrect type in argument 2 (different address spaces) net/core/iovec.c:55:34: expected void const [noderef] <asn:1>from net/core/iovec.c:55:34: got struct iovec msg_iov Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-09 15:02:38 -07:00
Joe Perches	123031c0ee	sctp: fix test for end of loop Add a list_has_sctp_addr function to simplify loop Based on a patches by Dan Carpenter and David Miller Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-09 15:00:29 -07:00
David S. Miller	cf0ac2b8a7	Merge branch 'for-davem' of git://oss.oracle.com/git/agrover/linux-2.6	2010-09-09 14:58:11 -07:00
David S. Miller	e199e6136c	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2010-09-08 23:49:04 -07:00
Eric Dumazet	719f835853	udp: add rehash on connect() commit `30fff923` introduced in linux-2.6.33 (udp: bind() optimisation) added a secondary hash on UDP, hashed on (local addr, local port). Problem is that following sequence : fd = socket(...) connect(fd, &remote, ...) not only selects remote end point (address and port), but also sets local address, while UDP stack stored in secondary hash table the socket while its local address was INADDR_ANY (or ipv6 equivalent) Sequence is : - autobind() : choose a random local port, insert socket in hash tables [while local address is INADDR_ANY] - connect() : set remote address and port, change local address to IP given by a route lookup. When an incoming UDP frame comes, if more than 10 sockets are found in primary hash table, we switch to secondary table, and fail to find socket because its local address changed. One solution to this problem is to rehash datagram socket if needed. We add a new rehash(struct socket *) method in "struct proto", and implement this method for UDP v4 & v6, using a common helper. This rehashing only takes care of secondary hash table, since primary hash (based on local port only) is not changed. Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-08 21:45:01 -07:00
Eric Dumazet	e0386005ff	net: inet_add_protocol() can use cmpxchg() Use cmpxchg() to get rid of spinlocks in inet_add_protocol() and friends. inet_protos[] & inet6_protos[] are moved to read_mostly section Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-08 21:31:35 -07:00
Andy Grover	20c72bd5f5	RDS: Implement masked atomic operations Add two CMSGs for masked versions of cswp and fadd. args struct modified to use a union for different atomic op type's arguments. Change IB to do masked atomic ops. Atomic op type in rds_message similarly unionized. Signed-off-by: Andy Grover <andy.grover@oracle.com>	2010-09-08 18:16:51 -07:00
Zach Brown	59f740a6ae	RDS/IB: print string constants in more places This prints the constant identifier for work completion status and rdma cm event types, like we already do for IB event types. A core string array helper is added that each string type uses. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:50 -07:00
Zach Brown	4518071ac1	RDS: cancel connection work structs as we shut down Nothing was canceling the send and receive work that might have been queued as a conn was being destroyed. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:49 -07:00
Zach Brown	ffcec0e110	RDS: don't call rds_conn_shutdown() from rds_conn_destroy() rds_conn_shutdown() can return before the connection is shut down when it encounters an existing state that it doesn't understand. This lets rds_conn_destroy() then start tearing down the conn from under paths that are still using it. It's more reliable the shutdown work and wait for krdsd to complete the shutdown callback. This stopped some hangs I was seeing where krdsd was trying to shut down a freed conn. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:48 -07:00
Zach Brown	5adb5bc65f	RDS: have sockets get transport module references Right now there's nothing to stop the various paths that use rs->rs_transport from racing with rmmod and executing freed transport code. The simple fix is to have binding to a transport also hold a reference to the transport's module, removing this class of races. We already had an unused t_owner field which was set for the modular transports and which wasn't set for the built-in loop transport. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:47 -07:00
Zach Brown	77510481c0	RDS: remove old rs_transport comment rs_transport is now also used by the rdma paths once the socket is bound. We don't need this stale comment to tell us what cscope can. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:46 -07:00
Zach Brown	fe8ff6b58f	RDS: lock rds_conn_count decrement in rds_conn_destroy() rds_conn_destroy() can race with all other modifications of the rds_conn_count but it was modifying the count without locking. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:45 -07:00
Zach Brown	ea819867b7	RDS/IB: protect the list of IB devices The RDS IB device list wasn't protected by any locking. Traversal in both the get_mr and FMR flushing paths could race with additon and removal. List manipulation is done with RCU primatives and is protected by the write side of a rwsem. The list traversal in the get_mr fast path is protected by a rcu read critical section. The FMR list traversal is more problematic because it can block while traversing the list. We protect this with the read side of the rwsem. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:44 -07:00
Zach Brown	1bde04a63d	RDS/IB: print IB event strings as well as their number It's nice to not have to go digging in the code to see which event occurred. It's easy to throw together a quick array that maps the ib event enums to their strings. I didn't see anything in the stack that does this translation for us, but I also didn't look very hard. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:43 -07:00
Chris Mason	8576f374ac	RDS: flush fmrs before allocating new ones Flushing FMRs is somewhat expensive, and is currently kicked off when the interrupt handler notices that we are getting low. The result of this is that FMR flushing only happens from the interrupt cpus. This spreads the load more effectively by triggering flushes just before we allocate a new FMR. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-09-08 18:16:42 -07:00
Chris Mason	b4e1da3c9a	RDS: properly use sg_init_table This is only needed to keep debugging code from bugging. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2010-09-08 18:16:41 -07:00
Zach Brown	f046011cd7	RDS/IB: track signaled sends We're seeing bugs today where IB connection shutdown clears the send ring while the tasklet is processing completed sends. Implementation details cause this to dereference a null pointer. Shutdown needs to wait for send completion to stop before tearing down the connection. We can't simply wait for the ring to empty because it may contain unsignaled sends that will never be processed. This patch tracks the number of signaled sends that we've posted and waits for them to complete. It also makes sure that the tasklet has finished executing. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:40 -07:00
Zach Brown	ef87b7ea39	RDS: remove __init and __exit annotation The trivial amount of memory saved isn't worth the cost of dealing with section mismatches. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:39 -07:00
Andy Grover	c20f5b9633	RDS/IB: Use SLAB_HWCACHE_ALIGN flag for kmem_cache_create() We are definitely counting cycles as closely as DaveM, so ensure hwcache alignment for our recv ring control structs. Signed-off-by: Andy Grover <andy.grover@oracle.com>	2010-09-08 18:16:38 -07:00
Zach Brown	d455ab6409	RDS/IB: always process recv completions The recv refill path was leaking fragments because the recv event handler had marked a ring element as free without freeing its frag. This was happening because it wasn't processing receives when the conn wasn't marked up or connecting, as can be the case if it races with rmmod. Two observations support always processing receives in the callback. First, buildup should only post receives, thus triggering recv event handler calls, once it has built up all the state to handle them. Teardown should destroy the CQ and drain the ring before tearing down the state needed to process recvs. Both appear to be true today. Second, this test was fundamentally racy. There is nothing to stop rmmod and connection destruction from swooping in the moment after the conn state was sampled but before real receive procesing starts. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:36 -07:00
Zach Brown	80c51be56f	RDS: return to a single-threaded krdsd We were seeing very nasty bugs due to fundamental assumption the current code makes about concurrent work struct processing. The code simpy isn't able to handle concurrent connection shutdown work function execution today, for example, which is very much possible once a multi-threaded krdsd was introduced. The problem compounds as additional work structs are added to the mix. krdsd is no longer perforance critical now that send and receive posting and FMR flushing are done elsewhere, so the safest fix is to move back to the single threaded krdsd that the current code was built around. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:35 -07:00
Zach Brown	515e079dab	RDS/IB: create a work queue for FMR flushing This patch moves the FMR flushing work in to its own mult-threaded work queue. This is to maintain performance in preparation for returning the main krdsd work queue back to a single threaded work queue to avoid deep-rooted concurrency bugs. This is also good because it further separates FMRs, which might be removed some day, from the rest of the code base. Signed-off-by: Zach Brown <zach.brown@oracle.com>	2010-09-08 18:16:34 -07:00

1 2 3 4 5 ...

16504 Commits