Commit Graph

16513 Commits

Author SHA1 Message Date
Andy Grover 2fa57129df RDS: Bypass workqueue when queueing cong updates
Now that rds_send_xmit() does not block, we can call it directly
instead of going through the helper thread.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:16 -07:00
Andy Grover a7d3a28148 RDS: Call rds_send_xmit() directly from sendmsg()
rds_sendmsg() is calling the send worker function to
send the just-queued datagrams, presumably because it wants
the behavior where anything not sent will re-call the send
worker. We now ensure all queued datagrams are sent by retrying
from the send completion handler, so this isn't needed any more.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:15 -07:00
Andy Grover 2ad8099b58 RDS: rds_send_xmit() locking/irq fixes
rds_message_put() cannot be called with irqs off, so move it after
irqs are re-enabled.

Spinlocks throughout the function do not to use _irqsave because
the lock of c_send_lock at top already disabled irqs.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:13 -07:00
Andy Grover 049ee3f500 RDS: Change send lock from a mutex to a spinlock
This change allows us to call rds_send_xmit() from a tasklet,
which is crucial to our new operating model.

* Change c_send_lock to a spinlock
* Update stats fields "sem_" to "_lock"
* Remove unneeded rds_conn_is_sending()

About locking between shutdown and send -- send checks if the
connection is up. Shutdown puts the connection into
DISCONNECTING. After this, all threads entering send will exit
immediately. However, a thread could be *in* send_xmit(), so
shutdown acquires the c_send_lock to ensure everyone is out
before proceeding with connection shutdown.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:12 -07:00
Andy Grover f17a1a55fb RDS: Refill recv ring directly from tasklet
Performance is better if we use allocations that don't block
to refill the receive ring. Since the whole reason we were
kicking out to the worker thread was so we could do blocking
allocs, we no longer need to do this.

Remove gfp params from rds_ib_recv_refill(); we always use
GFP_NOWAIT.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:11 -07:00
Andy Grover 77dd550e55 RDS: Stop supporting old cong map sending method
We now ask the transport to give us a rm for the congestion
map, and then we handle it normally. Previously, the
transport defined a function that we would call to send
a congestion map.

Convert TCP and loop transports to new cong map method.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:10 -07:00
Andy Grover e32b4a7049 RDS/IB: Do not wait for send ring to be empty on conn shutdown
Now that we are signaling send completions much less, we are likely
to have dirty entries in the send queue when the connection is
shut down (on rmmod, for example.) These are cleaned up a little
further down in conn_shutdown, but if we wait on the ring_empty_wait
for them, it'll never happen, and we hand on unload.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:09 -07:00
Andy Grover ff3d7d3613 RDS: Perform unmapping ops in stages
Previously, RDS would wait until the final send WR had completed
and then handle cleanup. With silent ops, we do not know
if an atomic, rdma, or data op will be last. This patch
handles any of these cases by keeping a pointer to the last
op in the message in m_last_op.

When the TX completion event fires, rds dispatches to per-op-type
cleanup functions, and then does whole-message cleanup, if the
last op equalled m_last_op.

This patch also moves towards having op-specific functions take
the op struct, instead of the overall rm struct.

rds_ib_connection has a pointer to keep track of a a partially-
completed data send operation. This patch changes it from an
rds_message pointer to the narrower rm_data_op pointer, and
modifies places that use this pointer as needed.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:08 -07:00
Andy Grover aa0a4ef4ac RDS: Make sure cmsgs aren't used in improper ways
It hasn't cropped up in the field, but this code ensures it is
impossible to issue operations that pass an rdma cookie (DEST, MAP)
in the same sendmsg call that's actually initiating rdma or atomic
ops.

Disallowing this perverse-but-technically-allowed usage makes silent
RDMA heuristics slightly easier.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:07 -07:00
Andy Grover 2c3a5f9abb RDS: Add flag for silent ops. Do atomic op before RDMA
Add a flag to the API so users can indicate they want
silent operations. This is needed because silent ops
cannot be used with USE_ONCE MRs, so we can't just
assume silent.

Also, change send_xmit to do atomic op before rdma op if
both are present, and centralize the hairy logic to determine if
we want to attempt silent, or not.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:06 -07:00
Andy Grover 7e3bd65ebf RDS: Move some variables around for consistency
Also, add a comment.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:05 -07:00
Andy Grover 940786eb0a RDS: queue failure notifications for dropped atomic ops
When dropping ops in the send queue, we notify the client
of failed rdma ops they asked for notifications on, but not
atomic ops. It should be for both.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:04 -07:00
Andy Grover ee4c7b47e4 RDS: Add a warning if trying to allocate 0 sgs
rds_message_alloc_sgs() only works when nents is nonzero.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:03 -07:00
Andy Grover 372cd7dedf RDS: Do not set op_active in r_m_copy_from_user().
Do not allocate sgs for data for 0-length datagrams

Set data.op_active in rds_sendmsg() instead of
rds_message_copy_from_user().

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:02 -07:00
Andy Grover 5b2366bd28 RDS: Rewrite rds_send_xmit
Simplify rds_send_xmit().

Send a congestion map (via xmit_cong_map) without
decrementing send_quota.

Move resetting of conn xmit variables to end of loop.

Update comments.

Implement a special case to turn off sending an rds header
when there is an atomic op and no other data.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:12:01 -07:00
Andy Grover 6c7cc6e469 RDS: Rename data op members prefix from m_ to op_
For consistency.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:59 -07:00
Andy Grover f8b3aaf2ba RDS: Remove struct rds_rdma_op
A big changeset, but it's all pretty dumb.

struct rds_rdma_op was already embedded in struct rm_rdma_op.
Remove rds_rdma_op and put its members in rm_rdma_op. Rename
members with "op_" prefix instead of "r_", for consistency.

Of course this breaks a lot, so fixup the code accordingly.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:58 -07:00
Andy Grover d0ab25a83c RDS: purge atomic resources too in rds_message_purge()
Add atomic_free_op function, analogous to rdma_free_op,
and call it in rds_message_purge().

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:57 -07:00
Andy Grover 4324879df0 RDS: Inline rdma_prepare into cmsg_rdma_args
cmsg_rdma_args just calls rdma_prepare and does a little
arg checking -- not quite enough to justify its existence.
Plus, it is the only caller of rdma_prepare().

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:56 -07:00
Andy Grover 241eef3e2f RDS: Implement silent atomics
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:55 -07:00
Andy Grover d37c935905 RDS: Move loop-only function to loop.c
Also, try to better-document the locking around the
rm and its m_inc in loop.c.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:54 -07:00
Andy Grover c8de3f1005 RDS/IB: Make all flow control code conditional on i_flowctl
Maybe things worked fine with the flow control code running
even in the non-flow-control case, but making it explicitly
conditional helps the non-fc case be easier to read.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:53 -07:00
Andy Grover 1d34f17571 RDS: Remove unsignaled_bytes sysctl
Removed unsignaled_bytes sysctl and code to signal
based on it. I believe unsignaled_wrs is more than
sufficient for our purposes.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:52 -07:00
Andy Grover da5a06cef5 RDS: rewrite rds_ib_xmit
Now that the header always goes first, it is possible to
simplify rds_ib_xmit. Instead of having a path to handle 0-byte
dgrams and another path to handle >0, these can both be handled
in one path. This lets us eliminate xmit_populate_wr().

Rename sent to bytes_sent, to differentiate better from other
variable named "send".

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:51 -07:00
Andy Grover 919ced4ce7 RDS/IB: Remove ib_[header/data]_sge() functions
These functions were to cope with differently ordered
sg entries depending on RDS 3.0 or 3.1+. Now that
we've dropped 3.0 compatibility we no longer need them.

Also, modify usage sites for these to refer to sge[0] or [1]
directly. Reorder code to initialize header sgs first.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:50 -07:00
Andy Grover 6f3d05db0d RDS/IB: Remove dead code
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:49 -07:00
Andy Grover f147dd9eca RDS/IB: Disallow connections less than RDS 3.1
RDS 3.0 connections (in OFED 1.3 and earlier) put the
header at the end. 3.1 connections put it at the head.
The code has significant added complexity in order to
handle both configurations. In OFED 1.6 we can
drop this and simplify the code by only supporting
"header-first" configuration.

This patch checks the protocol version, and if prior
to 3.1, does not complete the connection.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:48 -07:00
Andy Grover 9c030391e8 RDS/IB: eliminate duplicate code
both atomics and rdmas need to convert ib-specific completion codes
into RDS status codes. Rename rds_ib_rdma_send_complete to
rds_ib_send_complete, and have it take a pointer to the function to
call with the new error code.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:47 -07:00
Andy Grover 809fa148a2 RDS: inc_purge() transport function unused - remove it
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:46 -07:00
Andy Grover 6200ed7799 RDS: Whitespace
Tidy up some whitespace issues.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:44 -07:00
Andy Grover d22faec22c RDS: Do not mask address when pinning pages
This does not appear to be necessary.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:43 -07:00
Andy Grover 40589e74f7 RDS: Base init_depth and responder_resources on hw values
Instead of using a constant for initiator_depth and
responder_resources, read the per-QP values when the
device is enumerated, and then use these values when creating
the connection.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:42 -07:00
Andy Grover 15133f6e67 RDS: Implement atomic operations
Implement a CMSG-based interface to do FADD and CSWP ops.

Alter send routines to handle atomic ops.

Add atomic counters to stats.

Add xmit_atomic() to struct rds_transport

Inline rds_ib_send_unmap_rdma into unmap_rm

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:41 -07:00
Andy Grover a63273d499 RDS: Clear up some confusing code in send_remove_from_sock
The previous code was correct, but made the assumption that
if r_notifier was non-NULL then either r_recverr or r_notify
was true. Valid, but fragile. Changed to explicitly check
r_recverr (shows up in greps for recverr now, too.)

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:40 -07:00
Andy Grover f4dd96f7b2 RDS: make sure all sgs alloced are initialized
rds_message_alloc_sgs() now returns correctly-initialized
sg lists, so calleds need not do this themselves.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:39 -07:00
Andy Grover ff87e97a9d RDS: make m_rdma_op a member of rds_message
This eliminates a separate memory alloc, although
it is now necessary to add an "r_active" flag, since
it is no longer to use the m_rdma_op pointer as an
indicator of if an rdma op is present.

rdma SGs allocated from rm sg pool.

rds_rm_size also gets bigger. It's a little inefficient to
run through CMSGs twice, but it makes later steps a lot smoother.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:38 -07:00
Andy Grover 21f79afa5f RDS: fold rdma.h into rds.h
RDMA is now an intrinsic part of RDS, so it's easier to just have
a single header.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:37 -07:00
Andy Grover fc445084f1 RDS: Explicitly allocate rm in sendmsg()
r_m_copy_from_user used to allocate the rm as well as kernel
buffers for the data, and then copy the data in. Now, sendmsg()
allocates the rm, although the data buffer alloc still happens
in r_m_copy_from_user.

SGs are still allocated with rm, but now r_m_alloc_sgs() is
used to reserve them. This allows multiple SG lists to be
allocated from the one rm -- this is important once we also
want to alloc our rdma sgl from this pool.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:36 -07:00
Andy Grover 3ef13f3c22 RDS: cleanup/fix rds_rdma_unuse
First, it looks to me like the atomic_inc is wrong.
We should be decrementing refcount only once here, no? It's
already being done by the mr_put() at the end.

Second, simplify the logic a bit by bailing early (with a warning)
if !mr.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:35 -07:00
Andy Grover e779137aa7 RDS: break out rdma and data ops into nested structs in rds_message
Clearly separate rdma-related variables in rm from data-related ones.
This is in anticipation of adding atomic support.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:33 -07:00
Andy Grover 8690bfa17a RDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisons
Favor "if (foo)" style over "if (foo != NULL)".

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:11:32 -07:00
Andy Grover 2dc3935734 RDS: move rds_shutdown_worker impl. to rds_conn_shutdown
This fits better in connection.c, rather than threads.c.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:10:13 -07:00
Andy Grover 9de0864cf5 RDS: Fix locking in send on m_rs_lock
Do not nest m_rs_lock under c_lock

Disable interrupts in {rdma,atomic}_send_complete

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:07:32 -07:00
Andy Grover 7c82eaf00e RDS: Rewrite rds_send_drop_to() for clarity
This function has been the source of numerous bugs; it's just
too complicated. Simplified to nest spinlocks cleanly within
the second loop body, and kick out early if there are no
rms to drop.

This will be a little slower because conn lock is grabbed for
each entry instead of "caching" the lock across rms, but this
should be entirely irrelevant to fastpath performance.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:07:32 -07:00
Tina Yang 35b52c7053 RDS: Fix corrupted rds_mrs
On second look at this bug (OFED #2002), it seems that the
collision is not with the retransmission queue (packet acked
by the peer), but with the local send completion.  A theoretical
sequence of events (from time t0 to t3) is thought to be as
follows,

Thread #1
t0:
    sock_release
    rds_release
    rds_send_drop_to /* wait on send completion */
t2:
    rds_rdma_drop_keys()   /* destroy & free all mrs */

Thread #2
t1:
    rds_ib_send_cq_comp_handler
    rds_ib_send_unmap_rm
    rds_message_unmapped   /* wake up #1 @ t0 */
t3:
    rds_message_put
    rds_message_purge
    rds_mr_put   /* memory corruption detected */

The problem with the rds_rdma_drop_keys() is it could
remove a mr's refcount more than its due (i.e. repeatedly
as long as it still remains in the tree (mr->r_refcount > 0)).
Theoretically it should remove only one reference - reference
by the tree.

        /* Release any MRs associated with this socket */
        while ((node = rb_first(&rs->rs_rdma_keys))) {
                mr = container_of(node, struct rds_mr, r_rb_node);
                if (mr->r_trans == rs->rs_transport)
                        mr->r_invalidate = 0;
                rds_mr_put(mr);
        }

I think the correct way of doing it is to remove the mr from
the tree and rds_destroy_mr it first, then a rds_mr_put()
to decrement its reference count by one.  Whichever thread
holds the last reference will free the mr via rds_mr_put().

Signed-off-by: Tina Yang <tina.yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:07:31 -07:00
Andy Grover 9e2effba2c RDS: Fix BUG_ONs to not fire when in a tasklet
in_interrupt() is true in softirqs. The BUG_ONs are supposed
to check for if irqs are disabled, so we should use
BUG_ON(irqs_disabled()) instead, duh.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08 18:07:31 -07:00
Jianzhao Wang ae2688d59b net: blackhole route should always be recalculated
Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error
triggered by IKE for example), hence this kind of route is always
temporary and so we should check if a better route exists for next
packets.
Bug has been introduced by commit d11a4dc18b.

Signed-off-by: Jianzhao Wang <jianzhao.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 14:35:43 -07:00
Jarek Poplawski f6b085b69d ipv4: Suppress lockdep-RCU false positive in FIB trie (3)
Hi,
Here is one more of these warnings and a patch below:

Sep  5 23:52:33 del kernel: [46044.244833] ===================================================
Sep  5 23:52:33 del kernel: [46044.269681] [ INFO: suspicious rcu_dereference_check() usage. ]
Sep  5 23:52:33 del kernel: [46044.277000] ---------------------------------------------------
Sep  5 23:52:33 del kernel: [46044.285185] net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection!
Sep  5 23:52:33 del kernel: [46044.293627]
Sep  5 23:52:33 del kernel: [46044.293632] other info that might help us debug this:
Sep  5 23:52:33 del kernel: [46044.293634]
Sep  5 23:52:33 del kernel: [46044.325333]
Sep  5 23:52:33 del kernel: [46044.325335] rcu_scheduler_active = 1, debug_locks = 0
Sep  5 23:52:33 del kernel: [46044.348013] 1 lock held by pppd/1717:
Sep  5 23:52:33 del kernel: [46044.357548]  #0:  (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20
Sep  5 23:52:33 del kernel: [46044.367647]
Sep  5 23:52:33 del kernel: [46044.367652] stack backtrace:
Sep  5 23:52:33 del kernel: [46044.387429] Pid: 1717, comm: pppd Not tainted 2.6.35.4.4a #3
Sep  5 23:52:33 del kernel: [46044.398764] Call Trace:
Sep  5 23:52:33 del kernel: [46044.409596]  [<c12f9aba>] ? printk+0x18/0x1e
Sep  5 23:52:33 del kernel: [46044.420761]  [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0
Sep  5 23:52:33 del kernel: [46044.432229]  [<c12b7235>] trie_firstleaf+0x65/0x70
Sep  5 23:52:33 del kernel: [46044.443941]  [<c12b74d4>] fib_table_flush+0x14/0x170
Sep  5 23:52:33 del kernel: [46044.455823]  [<c1033e92>] ? local_bh_enable_ip+0x62/0xd0
Sep  5 23:52:33 del kernel: [46044.467995]  [<c12fc39f>] ? _raw_spin_unlock_bh+0x2f/0x40
Sep  5 23:52:33 del kernel: [46044.480404]  [<c12b24d0>] ? fib_sync_down_dev+0x120/0x180
Sep  5 23:52:33 del kernel: [46044.493025]  [<c12b069d>] fib_flush+0x2d/0x60
Sep  5 23:52:33 del kernel: [46044.505796]  [<c12b06f5>] fib_disable_ip+0x25/0x50
Sep  5 23:52:33 del kernel: [46044.518772]  [<c12b10d3>] fib_netdev_event+0x73/0xd0
Sep  5 23:52:33 del kernel: [46044.531918]  [<c1048dfd>] notifier_call_chain+0x2d/0x70
Sep  5 23:52:33 del kernel: [46044.545358]  [<c1048f0a>] raw_notifier_call_chain+0x1a/0x20
Sep  5 23:52:33 del kernel: [46044.559092]  [<c124f687>] call_netdevice_notifiers+0x27/0x60
Sep  5 23:52:33 del kernel: [46044.573037]  [<c124faec>] __dev_notify_flags+0x5c/0x80
Sep  5 23:52:33 del kernel: [46044.586489]  [<c124fb47>] dev_change_flags+0x37/0x60
Sep  5 23:52:33 del kernel: [46044.599394]  [<c12a8a8d>] devinet_ioctl+0x54d/0x630
Sep  5 23:52:33 del kernel: [46044.612277]  [<c12aabb7>] inet_ioctl+0x97/0xc0
Sep  5 23:52:34 del kernel: [46044.625208]  [<c123f6af>] sock_ioctl+0x6f/0x270
Sep  5 23:52:34 del kernel: [46044.638046]  [<c109d2b0>] ? handle_mm_fault+0x420/0x6c0
Sep  5 23:52:34 del kernel: [46044.650968]  [<c123f640>] ? sock_ioctl+0x0/0x270
Sep  5 23:52:34 del kernel: [46044.663865]  [<c10c3188>] vfs_ioctl+0x28/0xa0
Sep  5 23:52:34 del kernel: [46044.676556]  [<c10c38fa>] do_vfs_ioctl+0x6a/0x5c0
Sep  5 23:52:34 del kernel: [46044.688989]  [<c1048676>] ? up_read+0x16/0x30
Sep  5 23:52:34 del kernel: [46044.701411]  [<c1021376>] ? do_page_fault+0x1d6/0x3a0
Sep  5 23:52:34 del kernel: [46044.714223]  [<c10b6588>] ? fget_light+0xf8/0x2f0
Sep  5 23:52:34 del kernel: [46044.726601]  [<c1241f98>] ? sys_socketcall+0x208/0x2c0
Sep  5 23:52:34 del kernel: [46044.739140]  [<c10c3eb3>] sys_ioctl+0x63/0x70
Sep  5 23:52:34 del kernel: [46044.751967]  [<c12fca3d>] syscall_call+0x7/0xb
Sep  5 23:52:34 del kernel: [46044.764734]  [<c12f0000>] ? cookie_v6_check+0x3d0/0x630

-------------->

This patch fixes the warning:
 ===================================================
 [ INFO: suspicious rcu_dereference_check() usage. ]
 ---------------------------------------------------
 net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection!

 other info that might help us debug this:

 rcu_scheduler_active = 1, debug_locks = 0
 1 lock held by pppd/1717:
  #0:  (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20

 stack backtrace:
 Pid: 1717, comm: pppd Not tainted 2.6.35.4a #3
 Call Trace:
  [<c12f9aba>] ? printk+0x18/0x1e
  [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0
  [<c12b7235>] trie_firstleaf+0x65/0x70
  [<c12b74d4>] fib_table_flush+0x14/0x170
  ...

Allow trie_firstleaf() to be called either under rcu_read_lock()
protection or with RTNL held. The same annotation is added to
node_parent_rcu() to prevent a similar warning a bit later.

Followup of commits 634a4b20 and 4eaa0e3c.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 14:14:20 -07:00
Namhyung Kim fb8621bb6c net: remove address space warnings in net/socket.c
Casts __kernel to __user pointer require __force markup, so add it. Also
sock_get/setsockopt() takes @optval and/or @optlen arguments as user pointers
but were taking kernel pointers, use new variables 'uoptval' and/or 'uoptlen'
to fix it. These remove following warnings from sparse:

 net/socket.c:1922:46: warning: cast adds address space to expression (<asn:1>)
 net/socket.c:3061:61: warning: incorrect type in argument 4 (different address spaces)
 net/socket.c:3061:61:    expected char [noderef] <asn:1>*optval
 net/socket.c:3061:61:    got char *optval
 net/socket.c:3061:69: warning: incorrect type in argument 5 (different address spaces)
 net/socket.c:3061:69:    expected int [noderef] <asn:1>*optlen
 net/socket.c:3061:69:    got int *optlen
 net/socket.c:3063:67: warning: incorrect type in argument 4 (different address spaces)
 net/socket.c:3063:67:    expected char [noderef] <asn:1>*optval
 net/socket.c:3063:67:    got char *optval
 net/socket.c:3064:45: warning: incorrect type in argument 5 (different address spaces)
 net/socket.c:3064:45:    expected int [noderef] <asn:1>*optlen
 net/socket.c:3064:45:    got int *optlen
 net/socket.c:3078:61: warning: incorrect type in argument 4 (different address spaces)
 net/socket.c:3078:61:    expected char [noderef] <asn:1>*optval
 net/socket.c:3078:61:    got char *optval
 net/socket.c:3080:67: warning: incorrect type in argument 4 (different address spaces)
 net/socket.c:3080:67:    expected char [noderef] <asn:1>*optval
 net/socket.c:3080:67:    got char *optval

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 13:46:13 -07:00
Changli Gao 6febfca98f net: rps: add the shortcut for one rps_cpus
When there is only one rps_cpus, skb_get_rxhash() can be eliminated.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 13:10:53 -07:00
Diego Elio 'Flameeyes' Pettenò 65040c33ee sctp: implement SIOCINQ ioctl() (take 3)
This simple patch copies the current approach for SIOCINQ ioctl() from DCCP
into SCTP so that the userland code working with SCTP can use a similar
interface across different protocols to know how much space to allocate for
a buffer.

Signed-off-by: Diego Elio Pettenò <flameeyes@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 13:08:05 -07:00
Julian Anastasov 6523ce1525 ipvs: fix active FTP
- Do not create expectation when forwarding the PORT
  command to avoid blocking the connection. The problem is that
  nf_conntrack_ftp.c:help() tries to create the same expectation later in
  POST_ROUTING and drops the packet with "dropping packet" message after
  failure in nf_ct_expect_related.

- Change ip_vs_update_conntrack to alter the conntrack
  for related connections from real server. If we do not alter the reply in
  this direction the next packet from client sent to vport 20 comes as NEW
  connection. We alter it but may be some collision happens for both
  conntracks and the second conntrack gets destroyed immediately. The
  connection stucks too.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 10:39:57 -07:00
Jarek Poplawski 64289c8e68 gro: Re-fix different skb headrooms
The patch: "gro: fix different skb headrooms" in its part:
"2) allocate a minimal skb for head of frag_list" is buggy. The copied
skb has p->data set at the ip header at the moment, and skb_gro_offset
is the length of ip + tcp headers. So, after the change the length of
mac header is skipped. Later skb_set_mac_header() sets it into the
NET_SKB_PAD area (if it's long enough) and ip header is misaligned at
NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the
original skb was wrongly allocated, so let's copy it as it was.

bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
fixes commit: 3d3be4333f

Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 10:32:15 -07:00
Linus Torvalds 608307e6de Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
  pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimator
  ipvs: avoid oops for passive FTP
  Revert "sky2: don't do GRO on second port"
  gro: fix different skb headrooms
  bridge: Clear INET control block of SKBs passed into ip_fragment().
  3c59x: Remove incorrect locking; correct documented lock hierarchy
  sky2: don't do GRO on second port
  ipv4: minor fix about RPF in help of Kconfig
  xfrm_user: avoid a warning with some compiler
  net/sched/sch_hfsc.c: initialize parent's cl_cfmin properly in init_vf()
  pxa168_eth: fix a mdiobus leak
  net sched: fix kernel leak in act_police
  vhost: stop worker only if created
  MAINTAINERS: Add ehea driver as Supported
  ath9k_hw: fix parsing of HT40 5 GHz CTLs
  ath9k_hw: Fix EEPROM uncompress block reading on AR9003
  wireless: register wiphy rfkill w/o holding cfg80211_mutex
  netlink: Make NETLINK_USERSOCK work again.
  irda: Correctly clean up self->ias_obj on irda_bind() failure.
  wireless extensions: fix kernel heap content leak
  ...
2010-09-07 14:06:10 -07:00
David S. Miller 6f86b32518 ipv4: Fix reverse path filtering with multipath routing.
Actually iterate over the next-hops to make sure we have
a device match.  Otherwise RP filtering is always elided
when the route matched has multiple next-hops.

Reported-by: Igor M Podlesny <for.poige@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07 13:57:24 -07:00
Tetsuo Handa 8df73ff90f UNIX: Do not loop forever at unix_autobind().
We assumed that unix_autobind() never fails if kzalloc() succeeded.
But unix_autobind() allows only 1048576 names. If /proc/sys/fs/file-max is
larger than 1048576 (e.g. systems with more than 10GB of RAM), a local user can
consume all names using fork()/socket()/bind().

If all names are in use, those who call bind() with addr_len == sizeof(short)
or connect()/sendmsg() with setsockopt(SO_PASSCRED) will continue

  while (1)
        yield();

loop at unix_autobind() till a name becomes available.
This patch adds a loop counter in order to give up after 1048576 attempts.

Calling yield() for once per 256 attempts may not be sufficient when many names
are already in use, for __unix_find_socket_byname() can take long time under
such circumstance. Therefore, this patch also adds cond_resched() call.

Note that currently a local user can consume 2GB of kernel memory if the user
is allowed to create and autobind 1048576 UNIX domain sockets. We should
consider adding some restriction for autobind operation.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07 13:57:23 -07:00
Dan Carpenter cf9b94f88b irda: off by one
This is an off by one.  We would go past the end when we NUL terminate
the "value" string at end of the function.  The "value" buffer is
allocated in irlan_client_parse_response() or
irlan_provider_parse_command().

CC: stable@kernel.org
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07 13:57:22 -07:00
Nicolas Dichtel 1ee89bd0fe netfilter: discard overlapping IPv6 fragment
RFC5722 prohibits reassembling IPv6 fragments when some data overlaps.

Bug spotted by Zhang Zuotao <zuotao.zhang@6wind.com>.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07 13:57:21 -07:00
Nicolas Dichtel 70789d7052 ipv6: discard overlapping fragment
RFC5722 prohibits reassembling fragments when some data overlaps.

Bug spotted by Zhang Zuotao <zuotao.zhang@6wind.com>.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07 13:57:21 -07:00
Helmut Schaa deabc772f3 net: fix tx queue selection for bridged devices implementing select_queue
When a net device is implementing the select_queue callback and is part of
a bridge, frames coming from the bridge already have a tx queue associated
to the socket (introduced in commit a4ee3ce329,
"net: Use sk_tx_queue_mapping for connected sockets"). The call to
sk_tx_queue_get will then return the tx queue used by the bridge instead
of calling the select_queue callback.

In case of mac80211 this broke QoS which is implemented by using the
select_queue callback. Furthermore it introduced problems with rt2x00
because frames with the same TID and RA sometimes appeared on different
tx queues which the hw cannot handle correctly.

Fix this by always calling select_queue first if it is available and only
afterwards use the socket tx queue mapping.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-07 13:57:20 -07:00
Eric Dumazet db40980fcd net: poll() optimizations
No need to test twice sk->sk_shutdown & RCV_SHUTDOWN

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-06 18:48:45 -07:00
Joe Perches f6c9322c3a net/caifcaif_dev.c: Use netdev_<level>
Convert pr_<level>("%s" ..., (struct netdev *)->name ...)
to netdev_<level>((struct netdev *), ...)

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-06 18:48:43 -07:00
Joe Perches b31fa5bad5 net/caif: Use pr_fmt
This patch standardizes caif message logging prefixes.

Add #define pr_fmt(fmt) KBUILD_MODNAME ":%s(): " fmt, __func__
Add missing "\n"s to some logging messages
Convert pr_warning to pr_warn

This changes the logging message prefix from CAIF: to caif:
for all uses but caif_socket.c and chnl_net.c.  Those now use
their filename without extension.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-06 18:48:43 -07:00
Julia Lawall 29af9309db net/9p/trans_fd.c: Fix unsigned return type
The function has an unsigned return type, but returns a negative constant
to indicate an error condition.  The result of calling the function is
always stored in a variable of type (signed) int, and thus unsigned can be
dropped from the return type.

A sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@exists@
identifier f;
constant C;
@@

 unsigned f(...)
 { <+...
*  return -C;
 ...+> }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-06 18:48:42 -07:00
Eric Dumazet 1fd63041c4 net: pskb_expand_head() optimization
pskb_expand_head() blindly takes references on fragments before calling
skb_release_data(), potentially releasing these references.

We can add a fast path, avoiding these atomic operations, if we own the
last reference on skb->head.

Based on a previous patch from David

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-06 18:24:59 -07:00
Allan Stephens d1fb62796c tipc: Fix misleading error code when enabling Ethernet bearers
Cause TIPC to return EAGAIN if it is unable to enable a new Ethernet
bearer because one or more recently disabled Ethernet bearers are
temporarily consuming resources during shut down.  (The previous error
code, EDQUOT, is now returned only if all available Ethernet bearer
data structures are fully enabled at the time the request to enable an
additional bearer is received.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-06 18:12:57 -07:00
Allan Stephens 9fbfca0131 tipc: Ensure outgoing messages on Ethernet have sufficient headroom
Add code to expand the headroom of an outgoing TIPC message if the
sk_buff has insufficient room to hold the header for the associated
Ethernet device.  This change is necessary to ensure that messages
TIPC does not create itself (eg. incoming messages that are being
routed to another node) do not cause problems, since TIPC has no
control over the amount of headroom available in such messages.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-06 18:12:56 -07:00
Allan Stephens 5d9c54c1e9 tipc: Minor optimizations to name table translation code
Optimizes TIPC's name table translation code to avoid unnecessary
manipulation of the node address field of the resulting port id when
name translation fails.  This change is possible because a valid port
id cannot have a reference field of zero, so examining the reference
only is sufficient to determine if the translation was successful.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-06 18:12:56 -07:00
Eric Dumazet 52ee7a04a0 net: remove two kmemcheck annotations
__alloc_skb() uses a memset() to clear all the beginning of skb,
including bitfields contained in 'flags1' & 'flags2'.

We dont need any more to use kmemcheck_annotate_bitfield() on these
fields. However, we still need it for the clone part, which is not
cleared.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-03 09:44:51 -07:00
Thomas Graf c3bccac2fa ipv6: add special mode forwarding=2 to send RS while configured as router
Similar to accepting router advertisement, the IPv6 stack does not send router
solicitations if forwarding is enabled.

This patch enables this behavior to be overruled by setting forwarding to the
special value 2.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-03 09:43:14 -07:00
Thomas Graf 65e9b62d45 ipv6: add special mode accept_ra=2 to accept RA while configured as router
The current IPv6 behavior is to not accept router advertisements while
forwarding, i.e. configured as router.

This does make sense, a router is typically not supposed to be auto
configured. However there are exceptions and we should allow the
current behavior to be overwritten.

Therefore this patch enables the user to overrule the "if forwarding
enabled then don't listen to RAs" rule by setting accept_ra to the
special value of 2.

An alternative would be to ignore the forwarding switch alltogether
and solely accept RAs based on the value of accept_ra. However, I
found that if not intended, accepting RAs as a router can lead to
strange unwanted behavior therefore we it seems wise to only do so
if the user explicitely asks for this behavior.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-03 09:43:13 -07:00
Jarek Poplawski 0b5d404e34 pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimator
This patch fixes a lockdep warning:

[  516.287584] =========================================================
[  516.288386] [ INFO: possible irq lock inversion dependency detected ]
[  516.288386] 2.6.35b #7
[  516.288386] ---------------------------------------------------------
[  516.288386] swapper/0 just changed the state of lock:
[  516.288386]  (&qdisc_tx_lock){+.-...}, at: [<c12eacda>] est_timer+0x62/0x1b4
[  516.288386] but this lock took another, SOFTIRQ-unsafe lock in the past:
[  516.288386]  (est_tree_lock){+.+...}
[  516.288386] 
[  516.288386] and interrupts could create inverse lock ordering between them.
...

So, est_tree_lock needs BH protection because it's taken by
qdisc_tx_lock, which is used both in BH and process contexts.
(Full warning with this patch at netdev, 02 Sep 2010.)

Fixes commit: ae638c47dc
("pkt_sched: gen_estimator: add a new lock")

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-02 13:22:11 -07:00
David S. Miller 7162f6691e Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-09-02 12:45:44 -07:00
John W. Linville 78ab952717 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-09-02 13:30:07 -04:00
Changli Gao deffd77759 net: arp: code cleanup
Clean the code up according to Documentation/CodingStyle.

Don't initialize the variable dont_send in arp_process().

Remove the temporary varialbe flags in arp_state_to_flags().

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-02 10:12:06 -07:00
Eric Dumazet c07b68e841 net: dev_add_pack() & __dev_remove_pack() changes
Add a small helper ptype_head() to get the head to manipulate

dev_add_pack() & __dev_remove_pack() can use a spinlock without
blocking BH, since softirq use RCU, and these functions are run from
process context only.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-02 10:12:06 -07:00
Julian Anastasov 7bcbf81a22 ipvs: avoid oops for passive FTP
Fix Passive FTP problem in ip_vs_ftp:

- Do not oops in nf_nat_set_seq_adjust (adjust_tcp_sequence) when
  iptable_nat module is not loaded

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-02 10:05:00 -07:00
Julian Anastasov 8ed2163ff3 ipvs: use pkts for SCTP too
Use correctly the in_pkts packet counter also for SCTP

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-02 10:04:18 -07:00
Eric Dumazet 95f4b45bc6 net: another last_rx round
Kill last_rx use in l2tp and two net drivers

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-02 09:19:32 -07:00
Eric Dumazet 3d3be4333f gro: fix different skb headrooms
Packets entering GRO might have different headrooms, even for a given
flow (because of implementation details in drivers, like copybreak).
We cant force drivers to deliver packets with a fixed headroom.

1) fix skb_segment()

skb_segment() makes the false assumption headrooms of fragments are same
than the head. When CHECKSUM_PARTIAL is used, this can give csum_start
errors, and crash later in skb_copy_and_csum_dev()

2) allocate a minimal skb for head of frag_list

skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to
allocate a fresh skb. This adds NET_SKB_PAD to a padding already
provided by netdevice, depending on various things, like copybreak.

Use alloc_skb() to allocate an exact padding, to reduce cache line
needs:
NET_SKB_PAD + NET_IP_ALIGN

bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626

Many thanks to Plamen Petrov, testing many debugging patches !
With help of Jarek Poplawski.

Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 19:17:35 -07:00
David S. Miller 87f94b4e91 bridge: Clear INET control block of SKBs passed into ip_fragment().
In a similar vain to commit 17762060c2
("bridge: Clear IPCB before possible entry into IP stack")

Any time we call into the IP stack we have to make sure the state
there is as expected by the ipv4 code.

With help from Eric Dumazet and Herbert Xu.

Reported-by: Bandan Das <bandan.das@stratus.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 19:17:34 -07:00
Gerrit Renker 0705c6f0e2 tcp: update also tcp_output with regard to RFC 5681
Thanks to Ilpo Jarvinen, this updates also the initial window
setting for tcp_output with regard to RFC 5681.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 18:15:00 -07:00
stephen hemminger fa50d64576 net: make rx_queue sysfs_ops const
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 18:12:20 -07:00
Nicolas Dichtel 750e9fad8c ipv4: minor fix about RPF in help of Kconfig
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 14:29:36 -07:00
Nicolas Dichtel 928497f020 xfrm_user: avoid a warning with some compiler
Attached is a small patch to remove a warning ("warning: ISO C90 forbids
mixed declarations and code" with gcc 4.3.2).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 14:29:35 -07:00
Michal Soltys 3b2eb6131e net/sched/sch_hfsc.c: initialize parent's cl_cfmin properly in init_vf()
This patch fixes init_vf() function, so on each new backlog period parent's
cl_cfmin is properly updated (including further propgation towards the root),
even if the activated leaf has no upperlimit curve defined.

Signed-off-by: Michal Soltys <soltys@ziu.info>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 14:29:35 -07:00
Jeff Mahoney 0f04cfd098 net sched: fix kernel leak in act_police
While reviewing commit 1c40be12f7, I
 audited other users of tc_action_ops->dump for information leaks.

 That commit covered almost all of them but act_police still had a leak.

 opt.limit and opt.capab aren't zeroed out before the structure is
 passed out.

 This patch uses the C99 initializers to zero everything unused out.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 14:29:34 -07:00
John W. Linville 85f72bc839 mac80211: only cancel software-based scans on suspend
Otherwise the hardware scan handler could access an invalid scan request
structure.  The driver should cancel any pending hardware scans during
the suspend process anyway, so also add a warning if the hardware scan
is still pending when the device resumes.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-09-01 16:12:28 -04:00
David S. Miller a3f86ec002 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-09-01 12:01:05 -07:00
Eric Dumazet 6602cebb5b net: skbuff.c cleanup
(skb->data - skb->head) can be changed by skb_headroom(skb)

Remove some uses of NET_SKBUFF_DATA_USES_OFFSET, using
(skb_end_pointer(skb) - skb->head) or
(skb_tail_pointer(skb) - skb->head) : compiler does the right thing,
and this is more readable for us ;)

(struct skb_shared_info *) casts in pskb_expand_head() to help memcpy()
to use aligned moves.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 10:57:55 -07:00
Eric Dumazet 86cac58b71 skge: add GRO support
- napi_gro_flush() is exported from net/core/dev.c, to avoid
  an irq_save/irq_restore in the packet receive path.
- use napi_gro_receive() instead of netif_receive_skb()
- use napi_gro_flush() before calling __napi_complete()
- turn on NETIF_F_GRO by default
- Tested on a Marvell 88E8001 Gigabit NIC

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 10:57:55 -07:00
Eric Dumazet 875168a933 net: tunnels should use rcu_dereference
tunnel4_handlers, tunnel64_handlers, tunnel6_handlers and
tunnel46_handlers are protected by RCU, but we dont use appropriate rcu
primitives to scan them. rcu_lock() is already held by caller.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-01 10:57:53 -07:00
Eric Dumazet 1639ab6f78 gro: unexport tcp4_gro_receive and tcp4_gro_complete
tcp4_gro_receive() and tcp4_gro_complete() dont need to be exported.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-31 13:37:07 -07:00
Eric Dumazet ba4fd9d828 pktgen: remove non used variable
remove non used variable "queue" in pg_cleanup

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-31 13:37:06 -07:00
Jiri Pirko 72ed62f7c9 vlan: Use vlan_dev_real_dev in vlan_hwaccel_do_receive
[patch net-next-2.6] vlan: Use vlan_dev_real_dev in vlan_hwaccel_do_receive

Use helper as in other places.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-31 13:37:05 -07:00
Rémi Denis-Courmont 01b38606bd Phonet: do not set POLLOUT in case of send buffer overflow
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-31 13:04:33 -07:00
Rémi Denis-Courmont 02ac3268a5 Phonet: correct sendmsg() error code from sock_alloc_send_skb()
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-31 13:04:32 -07:00
Rémi Denis-Courmont 1a98214fee Phonet: restore flow control credits when sending fails
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-31 13:04:32 -07:00
John W. Linville 18145c6934 mac80211: cancel scan in ieee80211_restart_hw if software scan pending
This function exists to clean-up after a hardware error or something
similar.  The restart is accomplished using the same infrastructure used
to resume after a suspend.  The suspend path cancels running scans, so
it seems appropriate to do that here as well for software-based scans.
If a hardware-based scan is pending, issue a warning message since this
indicates that the drivers has failed to clean-up after itself.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-31 15:20:45 -04:00
John W. Linville c3d34d5d96 wireless: register wiphy rfkill w/o holding cfg80211_mutex
Otherwise lockdep complains...

https://bugzilla.kernel.org/show_bug.cgi?id=17311

[ INFO: possible circular locking dependency detected ]
2.6.36-rc2-git4 #12
-------------------------------------------------------
kworker/0:3/3630 is trying to acquire lock:
 (rtnl_mutex){+.+.+.}, at: [<ffffffff813396c7>] rtnl_lock+0x12/0x14

but task is already holding lock:
 (rfkill_global_mutex){+.+.+.}, at: [<ffffffffa014b129>]
rfkill_switch_all+0x24/0x49 [rfkill]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (rfkill_global_mutex){+.+.+.}:
       [<ffffffff81079ad7>] lock_acquire+0x120/0x15b
       [<ffffffff813ae869>] __mutex_lock_common+0x54/0x52e
       [<ffffffff813aede9>] mutex_lock_nested+0x34/0x39
       [<ffffffffa014b4ab>] rfkill_register+0x2b/0x29c [rfkill]
       [<ffffffffa0185ba0>] wiphy_register+0x1ae/0x270 [cfg80211]
       [<ffffffffa0206f01>] ieee80211_register_hw+0x1b4/0x3cf [mac80211]
       [<ffffffffa0292e98>] iwl_ucode_callback+0x9e9/0xae3 [iwlagn]
       [<ffffffff812d3e9d>] request_firmware_work_func+0x54/0x6f
       [<ffffffff81065d15>] kthread+0x8c/0x94
       [<ffffffff8100ac24>] kernel_thread_helper+0x4/0x10

-> #1 (cfg80211_mutex){+.+.+.}:
       [<ffffffff81079ad7>] lock_acquire+0x120/0x15b
       [<ffffffff813ae869>] __mutex_lock_common+0x54/0x52e
       [<ffffffff813aede9>] mutex_lock_nested+0x34/0x39
       [<ffffffffa018605e>] cfg80211_get_dev_from_ifindex+0x1b/0x7c [cfg80211]
       [<ffffffffa0189f36>] cfg80211_wext_giwscan+0x58/0x990 [cfg80211]
       [<ffffffff8139a3ce>] ioctl_standard_iw_point+0x1a8/0x272
       [<ffffffff8139a529>] ioctl_standard_call+0x91/0xa7
       [<ffffffff8139a687>] T.723+0xbd/0x12c
       [<ffffffff8139a727>] wext_handle_ioctl+0x31/0x6d
       [<ffffffff8133014e>] dev_ioctl+0x63d/0x67a
       [<ffffffff8131afd9>] sock_ioctl+0x48/0x21d
       [<ffffffff81102abd>] do_vfs_ioctl+0x4ba/0x509
       [<ffffffff81102b5d>] sys_ioctl+0x51/0x74
       [<ffffffff81009e02>] system_call_fastpath+0x16/0x1b

-> #0 (rtnl_mutex){+.+.+.}:
       [<ffffffff810796b0>] __lock_acquire+0xa93/0xd9a
       [<ffffffff81079ad7>] lock_acquire+0x120/0x15b
       [<ffffffff813ae869>] __mutex_lock_common+0x54/0x52e
       [<ffffffff813aede9>] mutex_lock_nested+0x34/0x39
       [<ffffffff813396c7>] rtnl_lock+0x12/0x14
       [<ffffffffa0185cb5>] cfg80211_rfkill_set_block+0x1a/0x7b [cfg80211]
       [<ffffffffa014aed0>] rfkill_set_block+0x80/0xd5 [rfkill]
       [<ffffffffa014b07e>] __rfkill_switch_all+0x3f/0x6f [rfkill]
       [<ffffffffa014b13d>] rfkill_switch_all+0x38/0x49 [rfkill]
       [<ffffffffa014b821>] rfkill_op_handler+0x105/0x136 [rfkill]
       [<ffffffff81060708>] process_one_work+0x248/0x403
       [<ffffffff81062620>] worker_thread+0x139/0x214
       [<ffffffff81065d15>] kthread+0x8c/0x94
       [<ffffffff8100ac24>] kernel_thread_helper+0x4/0x10

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
2010-08-31 14:48:47 -04:00
Julia Lawall 3653910714 net/wireless: Remove double test
The same expression is tested twice and the result is the same each time.

The sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@expression@
expression E;
@@

(
* E
  || ... || E
|
* E
  && ... && E
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-31 14:20:40 -04:00
Jouni Malinen 391a200a89 mac80211: Do not generate CQM events based on first Beacon frames
The signal strength value in a single RX frame is not that reliable,
so it is better to delay start of CQM events until there is a real
average signal strength from more than a single Beacon frame
available.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-31 14:20:40 -04:00
Jouni Malinen 3ba06c6fbd mac80211: Fix signal strength average initialization for CQM events
The ave_beacon_signal value uses 1/16 dB unit and as such, must be
initialized with the signal level of the first Beacon frame multiplied
by 16. This fixes an issue where the initial CQM events are reported
incorrectly with a burst of events while the running average
approaches the correct value after the incorrect initialization. This
could cause user space -based roaming decision process to get quite
confused at the moment when we would like to go through authentication
and DHCP.

Cc: stable@kernel.org
Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-31 14:20:40 -04:00
David S. Miller b963ea89f0 netlink: Make NETLINK_USERSOCK work again.
Once we started enforcing the a nl_table[] entry exist for
a protocol, NETLINK_USERSOCK stopped working.  Add a dummy
table entry so that it works again.

Reported-by: Thomas Voegtle <tv@lio96.de>
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-31 09:51:37 -07:00
David S. Miller 628e300ccc irda: Correctly clean up self->ias_obj on irda_bind() failure.
If irda_open_tsap() fails, the irda_bind() code tries to destroy
the ->ias_obj object by hand, but does so wrongly.

In particular, it fails to a) release the hashbin attached to the
object and b) reset the self->ias_obj pointer to NULL.

Fix both problems by using irias_delete_object() and explicitly
setting self->ias_obj to NULL, just as irda_release() does.

Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 18:37:56 -07:00
Eric Dumazet 3ff2cfa55f ipv6: struct xfrm6_tunnel in read_mostly section
tunnel6_handlers chain being scanned for each incoming packet,
make sure it doesnt share an often dirtied cache line.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:50:46 -07:00
Eric Dumazet 6dcd814bd0 net: struct xfrm_tunnel in read_mostly section
tunnel4_handlers chain being scanned for each incoming packet,
make sure it doesnt share an often dirtied cache line.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:50:45 -07:00
Gerrit Renker 89858ad143 dccp ccid-3: use per-route RTO or TCP RTO as fallback
This makes RTAX_RTO_MIN also available to CCID-3, replacing the compile-time
RTO lower bound with a per-route tunable value.

The original Kconfig option solved the problem that a very low RTT (in the
order of HZ) can trigger too frequent and unnecessary reductions of the
sending rate.

This tunable does not affect the initial RTO value of 2 seconds specified in
RFC 5348, section 4.2 and Appendix B. But like the hardcoded Kconfig value,
it allows to adapt to network conditions.

The same effect as the original Kconfig option of 100ms is now achieved by

> ip route replace to unicast 192.168.0.0/24 rto_min 100j dev eth0

(assuming HZ=1000).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:45:28 -07:00
Gerrit Renker 4886fcad6e dccp ccid-2: Share TCP's minimum RTO code
Using a fixed RTO_MIN of 0.2 seconds was found to cause problems for CCID-2
over 802.11g: at least once per session there was a spurious timeout. It
helped to then increase the the value of RTO_MIN over this link.

Since the problem is the same as in TCP, this patch makes the solution from
commit "05bb1fad1cde025a864a90cfeb98dcbefe78a44a"
       "[TCP]: Allow minimum RTO to be configurable via routing metrics."
available to DCCP.

This avoids reinventing the wheel, so that e.g. the following works in the
expected way now also for CCID-2:

> ip route change 10.0.0.2 rto_min 800 dev ath0

Luckily this useful rto_min function was recently moved to net/tcp.h,
which simplifies sharing code originating from TCP.

Documentation also updated (plus minor whitespace fixes).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:45:27 -07:00
Gerrit Renker 22b71c8f4f tcp/dccp: Consolidate common code for RFC 3390 conversion
This patch consolidates initial-window code common to TCP and CCID-2:
 * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
 * CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:45:26 -07:00
Gerrit Renker d26eeb07fd dccp ccid-2: Remove wrappers around sk_{reset,stop}_timer()
This removes the wrappers around the sk timer functions, since not much is
gained from using them: the BUG_ON in start_rto_timer will never trigger
since that function is called only if:

 * the RTO timer expires (rto_expire, and then timer_pending() is false);
 * in tx_packet_sent only if !timer_pending() (BUG_ON is redundant here);
 * previously in new_ack, after stopping the timer (timer_pending() false).

Removing the wrappers also clears the way for eventually replacing the
RTO timer with the icsk-retransmission-timer, as it is already part of the
DCCP socket.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:45:26 -07:00
Gerrit Renker d82b6f85c1 dccp ccid-2: Use u32 timestamps uniformly
Since CCID-2 is de facto a mini implementation of TCP, it makes sense to share
as much code as possible.

Hence this patch aligns CCID-2 timestamping with TCP timestamping.
This also halves the space consumption (on 64-bit systems).

The necessary include file <net/tcp.h> is already included by way of
net/dccp.h. Redundant includes have been removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:45:25 -07:00
Johannes Berg 42da2f948d wireless extensions: fix kernel heap content leak
Wireless extensions have an unfortunate, undocumented
requirement which requires drivers to always fill
iwp->length when returning a successful status. When
a driver doesn't do this, it leads to a kernel heap
content leak when userspace offers a larger buffer
than would have been necessary.

Arguably, this is a driver bug, as it should, if it
returns 0, fill iwp->length, even if it separately
indicated that the buffer contents was not valid.

However, we can also at least avoid the memory content
leak if the driver doesn't do this by setting the iwp
length to max_tokens, which then reflects how big the
buffer is that the driver may fill, regardless of how
big the userspace buffer is.

To illustrate the point, this patch also fixes a
corresponding cfg80211 bug (since this requirement
isn't documented nor was ever pointed out by anyone
during code review, I don't trust all drivers nor
all cfg80211 handlers to implement it correctly).

Cc: stable@kernel.org [all the way back]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-30 16:35:17 -04:00
Jerry Chu dca43c75e7 tcp: Add TCP_USER_TIMEOUT socket option.
This patch provides a "user timeout" support as described in RFC793. The
socket option is also needed for the the local half of RFC5482 "TCP User
Timeout Option".

TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int,
when > 0, to specify the maximum amount of time in ms that transmitted
data may remain unacknowledged before TCP will forcefully close the
corresponding connection and return ETIMEDOUT to the application. If
0 is given, TCP will continue to use the system default.

Increasing the user timeouts allows a TCP connection to survive extended
periods without end-to-end connectivity. Decreasing the user timeouts
allows applications to "fail fast" if so desired. Otherwise it may take
upto 20 minutes with the current system defaults in a normal WAN
environment.

The socket option can be made during any state of a TCP connection, but
is only effective during the synchronized states of a connection
(ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, or LAST-ACK).
Moreover, when used with the TCP keepalive (SO_KEEPALIVE) option,
TCP_USER_TIMEOUT will overtake keepalive to determine when to close a
connection due to keepalive failure.

The option does not change in anyway when TCP retransmits a packet, nor
when a keepalive probe will be sent.

This option, like many others, will be inherited by an acceptor from its
listener.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-30 13:23:33 -07:00
Johannes Berg 071249b1d5 mac80211: delete work timer
The new workqueue changes helped me find this bug
that's been lingering since the changes to the work
processing in mac80211 -- the work timer is never
deleted properly. Do that to avoid having it fire
after all data structures have been freed. It can't
be re-armed because all it will do, if running, is
schedule the work, but that gets flushed later and
won't have anything to do since all work items are
gone by now (by way of interface removal).

Cc: stable@kernel.org [2.6.34+]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-30 16:02:34 -04:00
Stephen Rothwell 2c70b51962 IPVS: include net/ip6_checksum.h for csum_ipv6_magic
Fixes this build error:

net/netfilter/ipvs/ip_vs_core.c: In function 'ip_vs_nat_icmp_v6':
net/netfilter/ipvs/ip_vs_core.c:640: error: implicit declaration of function 'csum_ipv6_magic'

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-29 21:15:26 -07:00
Linus Torvalds 29cfcddc0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  net/ipv4: Eliminate kstrdup memory leak
  net/caif/cfrfml.c: use asm/unaligned.h
  ax25: missplaced sock_put(sk)
  qlge: reset the chip before freeing the buffers
  l2tp: test for ethernet header in l2tp_eth_dev_recv()
  tcp: select(writefds) don't hang up when a peer close connection
  tcp: fix three tcp sysctls tuning
  tcp: Combat per-cpu skew in orphan tests.
  pxa168_eth: silence gcc warnings
  pxa168_eth: update call to phy_mii_ioctl()
  pxa168_eth: fix error handling in prope
  pxa168_eth: remove unneeded null check
  phylib: Fix race between returning phydev and calling adjust_link
  caif-driver: add HAS_DMA dependency
  3c59x: Fix deadlock between boomerang_interrupt and boomerang_start_tx
  qlcnic: fix poll implementation
  netxen: fix poll implementation
  bridge: netfilter: fix a memory leak
2010-08-28 15:42:44 -07:00
Akinobu Mita 6a499b242f phonet: use for_each_set_bit
Replace open-coded loop with for_each_set_bit().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-28 15:37:04 -07:00
Akinobu Mita 762c29164e econet: kill unnecessary spin_lock_init()
The spinlock aun_queue_lock is initialized statically. It is unnecessary
to initialize by spin_lock_init() at module load time.

This is detected by the semantic patch.

// <smpl>
@def@
declarer name DEFINE_SPINLOCK;
identifier spinlock;
@@

DEFINE_SPINLOCK(spinlock);

@@
identifier def.spinlock;
@@

- spin_lock_init(&spinlock);
// </smpl>

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Julia Lawall <julia@diku.dk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-28 15:37:03 -07:00
Julia Lawall c34186ed00 net/ipv4: Eliminate kstrdup memory leak
The string clone is only used as a temporary copy of the argument val
within the while loop, and so it should be freed before leaving the
function.  The call to strsep, however, modifies clone, so a pointer to the
front of the string is kept in saved_clone, to make it possible to free it.

The sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
local idexpression x;
expression E;
identifier l;
statement S;
@@

*x= \(kasprintf\|kstrdup\)(...);
...
if (x == NULL) S
... when != kfree(x)
    when != E = x
if (...) {
  <... when != kfree(x)
* goto l;
  ...>
* return ...;
}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-27 19:31:56 -07:00
Johannes Berg 5b714c6a37 mac80211: fix offchannel queue stop
Somebody noticed this problem, and I outlined
to them how to fix it, but haven't heard back
from them. So while I was adding the state
field I figured I could use it to fix it.

The problem, as I understand it, is that when
we go offchannel while the driver has a queue
stopped, the driver will likely start draining
the queue and then enable it while offchannel.
This in turn will enable the interface queue,
and that leads to transmitting data frames on
the wrong channel.

Fix this by keeping track of offchannel status
per interface, and not enabling the interface
queues on interfaces that are offchannel when
the driver enables a queue.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:53:31 -04:00
Johannes Berg 34d4bc4d41 mac80211: support runtime interface type changes
Add support to mac80211 for changing the interface
type even when the interface is UP, if the driver
supports it.

To achieve this
 * add a new driver callback for switching,
 * split some of the interface up/down code out
   into new functions (do_open/do_stop), and
 * maintain an own __SDATA_RUNNING bit that will
   not be set during interface type, so that any
   other code doesn't use the interface.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:53:31 -04:00
Johannes Berg 87490f6db3 mac80211: split out concurrent vif checks
Split the concurrent virtual interface checks
into a new function that can be used to check
for any given new interface type.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:53:31 -04:00
Johannes Berg bf533e0bfd mac80211: simplify zero address checks
The libertas_tf special code for zero addresses
is a bit too complex, it compares against a stack
value instead of using is_zero_ether_addr() and
tries to update all interfaces even if just the
one that's being brought up needs to be changed.
Additionally, the repeated check for a valid MAC
address need only be done if we actually changed
it on the fly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:53:30 -04:00
Johannes Berg 26a58456be mac80211: switch to ieee80211_sdata_running
Since the introduction of ieee80211_sdata_running(),
some new code was introduced that uses netif_running()
instead. Switch all these instances over.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:53:30 -04:00
Johannes Berg b9dcf712d1 mac80211: clean up ifdown/cleanup paths
There's a lot of redundant code in mac80211's
interface cleanup/down, for example freeing
AP beacons is done both when the interface is
set DOWN as well as when it is torn down, of
which only the former has any effect.

Also, a bunch of things should be closer to
where they matter, like the MLME timers that
we should cancel when disassociating, rather
than only when the interface is set DOWN.

Clean up all this code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:53:30 -04:00
Johannes Berg 2337db8db8 mac80211: use subqueue helpers
There are subqueue helpers so that we don't
need to get the TX queue and then wake/stop
it, use those helpers.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:27:08 -04:00
Johannes Berg a621fa4d6a mac80211: allow changing port control protocol
Some vendor specified mechanisms for 802.1X-style
functionality use a different protocol than EAP
(even if EAP is vendor-extensible). Support this
in mac80211 via the cfg80211 API for it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:27:07 -04:00
Johannes Berg c0692b8fe2 cfg80211: allow changing port control protocol
Some vendor specified mechanisms for 802.1X-style
functionality use a different protocol than EAP
(even if EAP is vendor-extensible). Allow setting
the ethertype for the protocol when a driver has
support for this. The default if unspecified is
EAP, of course.

Note: This is suitable only for station mode, not
      for AP implementation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:27:07 -04:00
Johannes Berg 3ffc2a905b mac80211: allow vendor specific cipher suites
Allow drivers to specify their own set of cipher
suites to advertise vendor-specific ciphers. The
driver is then required to implement hardware
crypto offload for it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:27:07 -04:00
Johannes Berg 7d64b7cc1f cfg80211: allow vendor specific cipher suites
cfg80211 currently rejects all cipher suites it
doesn't know about for key length checking
purposes. This can lead to inconsistencies when
a driver advertises an algorithm that cfg80211
doesn't know about. Remove this rejection so
drivers can specify any algorithm they like.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:27:07 -04:00
Johannes Berg 8789d459bc mac80211: allow scan to complete from any context
The ieee80211_scan_completed() function was a frequent
source of potential deadlocks, since it is called by
drivers but may call back into drivers, so drivers had
to make sure to call it without any locks held, which
frequently lead to more complex code in drivers. Avoid
that problem by allowing the function to be called in
any context, and queueing the actual work it does.
Also update the documentation for it to indicate this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:27:06 -04:00
Johannes Berg 5f33c92d18 mac80211: remove unused scan expire define
Since cfg80211 manages the BSS list completely,
this define hasn't been used for a long time
and will never be used again.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-27 13:27:06 -04:00
Eric Dumazet 40d0802b3e gro: __napi_gro_receive() optimizations
compare_ether_header() can have a special implementation on 64 bit
arches if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is defined.

__napi_gro_receive() and vlan_gro_common() can avoid a conditional
branch to perform device match.

On x86_64, __napi_gro_receive() has now 38 instructions instead of 53

As gcc-4.4.3 still choose to not inline it, add inline keyword to this
performance critical function.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-26 22:03:08 -07:00
Jeff Mahoney 7e368739e3 net/caif/cfrfml.c: use asm/unaligned.h
caif does not build on ia64 starting with 2.6.32-rc1.  Using
asm/unaligned.h instead of linux/unaligned/le_byteshift.h fixes the issue.

include/linux/unaligned/le_byteshift.h:40:50: error: redefinition of 'get_unaligned_le16'
include/linux/unaligned/le_byteshift.h:45:50: error: redefinition of 'get_unaligned_le32'
include/linux/unaligned/le_byteshift.h:50:50: error: redefinition of 'get_unaligned_le64'
include/linux/unaligned/le_byteshift.h:55:51: error: redefinition of 'put_unaligned_le16'
include/linux/unaligned/le_byteshift.h:60:51: error: redefinition of 'put_unaligned_le32'
include/linux/unaligned/le_byteshift.h:65:51: error: redefinition of 'put_unaligned_le64'
include/linux/unaligned/le_struct.h:31:51: note: previous definition of 'put_unaligned_le64' was here

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-26 16:11:08 -07:00
Bernard Pidoux F6BVP d71b0e9c00 ax25: missplaced sock_put(sk)
This patch moves a missplaced sock_put(sk) after
bh_unlock_sock(sk)
like in other parts of AX25 driver.

Signed-off-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-26 15:18:27 -07:00
Changli Gao 53f91dc1f7 net: use scnprintf() to avoid potential buffer overflow
strlcpy() returns the total length of the string they tried to create, so
we should not use its return value without any check. scnprintf() returns
the number of characters written into @buf not including the trailing '\0',
so use it instead here.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-26 14:11:49 -07:00
Joe Perches 145ce502e4 net/sctp: Use pr_fmt and pr_<level>
Change SCTP_DEBUG_PRINTK and SCTP_DEBUG_PRINTK_IPADDR to
use do { print } while (0) guards.
Add SCTP_DEBUG_PRINTK_CONT to fix errors in log when
lines were continued.
Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Add a missing newline in "Failed bind hash alloc"

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-26 14:11:48 -07:00
Eric Dumazet bfc960a8ee l2tp: test for ethernet header in l2tp_eth_dev_recv()
close https://bugzilla.kernel.org/show_bug.cgi?id=16529

Before calling dev_forward_skb(), we should make sure skb head contains
at least an ethernet header, even if length included in upper layer said
so. Use pskb_may_pull() to make sure this ethernet header is present in
skb head.

Reported-by: Thomas Heil <heil@terminal-consulting.de>
Reported-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-26 13:29:38 -07:00
Simon Horman dee06e4702 ipvs: switch to GFP_KERNEL allocations
Switch from GFP_ATOMIC allocations to GFP_KERNEL ones in
ip_vs_add_service() and ip_vs_new_dest(), as we hold a mutex and are
allowed to sleep in this context.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-26 13:21:29 -07:00
Simon Horman 4f72816ef0 IPVS: convert __ip_vs_securetcp_lock to a spinlock
Also rename __ip_vs_securetcp_lock to ip_vs_securetcp_lock.

Spinlock conversion was suggested by Eric Dumazet.

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-26 13:21:29 -07:00
Simon Horman bd14455048 IPVS: convert __ip_vs_sched_lock to a spinlock
Also rename __ip_vs_sched_lock to ip_vs_sched_lock.

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-26 13:21:28 -07:00
Simon Horman 8870f8427b IPVS: ICMPv6 checksum calculation
Cc: Xiaoyu Du <tingsrain@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-26 13:21:26 -07:00
KOSAKI Motohiro d84ba638e4 tcp: select(writefds) don't hang up when a peer close connection
This issue come from ruby language community. Below test program
hang up when only run on Linux.

	% uname -mrsv
	Linux 2.6.26-2-486 #1 Sat Dec 26 08:37:39 UTC 2009 i686
	% ruby -rsocket -ve '
	BasicSocket.do_not_reverse_lookup = true
	serv = TCPServer.open("127.0.0.1", 0)
	s1 = TCPSocket.open("127.0.0.1", serv.addr[1])
	s2 = serv.accept
	s2.close
	s1.write("a") rescue p $!
	s1.write("a") rescue p $!
	Thread.new {
	  s1.write("a")
	}.join'
	ruby 1.9.3dev (2010-07-06 trunk 28554) [i686-linux]
	#<Errno::EPIPE: Broken pipe>
	[Hang Here]

FreeBSD, Solaris, Mac doesn't. because Ruby's write() method call
select() internally. and tcp_poll has a bug.

SUS defined 'ready for writing' of select() as following.

|  A descriptor shall be considered ready for writing when a call to an output
|  function with O_NONBLOCK clear would not block, whether or not the function
|  would transfer data successfully.

That said, EPIPE situation is clearly one of 'ready for writing'.

We don't have read-side issue because tcp_poll() already has read side
shutdown care.

|        if (sk->sk_shutdown & RCV_SHUTDOWN)
|                mask |= POLLIN | POLLRDNORM | POLLRDHUP;

So, Let's insert same logic in write side.

- reference url
  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31065
  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31068

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-25 23:02:48 -07:00
Eric Dumazet c5ed63d66f tcp: fix three tcp sysctls tuning
As discovered by Anton Blanchard, current code to autotune 
tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and
sysctl_max_syn_backlog makes little sense.

The bigger a page is, the less tcp_max_orphans is : 4096 on a 512GB
machine in Anton's case.

(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
is much bigger if spinlock debugging is on. Its wrong to select bigger
limits in this case (where kernel structures are also bigger)

bhash_size max is 65536, and we get this value even for small machines. 

A better ground is to use size of ehash table, this also makes code
shorter and more obvious.

Based on a patch from Anton, and another from David.

Reported-and-tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-25 23:02:17 -07:00
stephen hemminger aa7c6e5fa0 bridge: avoid ethtool on non running interface
If bridge port is offline, don't call ethtool to query speed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-25 16:36:51 -07:00
Stephen Hemminger 944c794d64 bridge: fix locking comment
The carrier check is not called from work queue in current code.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-25 16:36:50 -07:00
Julia Lawall b2aff96327 net/netfilter/ipvs: Eliminate memory leak
__ip_vs_service_get and __ip_vs_svc_fwm_get increment a reference count, so
that reference count should be decremented before leaving the function in an
error case.

A simplified version of the semantic match that finds this problem is:
(http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
local idexpression x;
expression E;
identifier f1;
iterator I;
@@

x = __ip_vs_service_get(...);
<... when != x
     when != true (x == NULL || ...)
     when != if (...) { <+...x...+> }
     when != I (...) { <+...x...+> }
(
 x == NULL
|
 x == E
|
 x->f1
)
...>
* return ...;
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-25 16:36:50 -07:00
John W. Linville e569aa78ba Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	drivers/net/wireless/libertas/if_sdio.c
2010-08-25 14:51:42 -04:00
Johannes Berg 5eb5a52da6 mac80211: fix mesh advertisement
When a mac80211-based driver advertises mesh mode
support, this will be advertised to userspace.
However, if mac80211 was compiled without mesh
support, then that won't actually be true. Fix
this by removing the bit for mesh if mesh isn't
compiled in.

Since this synchronizes what we advertise to
cfg80211 and actually support, it means we can
now rely on cfg80211's interface type checks
and need not check again in mac80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-08-25 14:34:56 -04:00