Commit Graph

518 Commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo 276f2edc52 [TFRC]: Migrate TX history to singly-linked lis
This patch was based on another made by Gerrit Renker, his changelog was:

    ------------------------------------------------------
The patch set migrates TFRC TX history to a singly-linked list.

The details are:
 * use of a consistent naming scheme (all TFRC functions now begin with `tfrc_');
 * allocation and cleanup are taken care of internally;
 * provision of a lookup function, which is used by the CCID TX infrastructure
   to determine the time a packet was sent (in turn used for RTT sampling);
 * integration of the new interface with the present use in CCID3.
    ------------------------------------------------------

Simplifications I did:

. removing the tfrc_tx_hist_head that had a pointer to the list head and
  another for the slabcache.
. No need for creating a slabcache for each CCID that wants to use the TFRC
  tx history routines, create a single slabcache when the dccp_tfrc_lib module
  init routine is called.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:11 -08:00
Pavel Emelyanov 8d8ad9d7c4 [NET]: Name magic constants in sock_wake_async()
The sock_wake_async() performs a bit different actions
depending on "how" argument. Unfortunately this argument
ony has numerical magic values.

I propose to give names to their constants to help people
reading this function callers understand what's going on
without looking into this function all the time.

I suppose this is 2.6.25 material, but if it's not (or the
naming seems poor/bad/awful), I can rework it against the
current net-2.6 tree.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:03 -08:00
Gerrit Renker ce865a61c8 [DCCP]: Add support for abortive release
This continues from the previous patch and adds support for actively aborting
a DCCP connection, using a Reset Code 2, "Aborted" to inform the peer of an
abortive release.

I have tried this in various client/server settings and it works as expected.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:02 -08:00
Gerrit Renker d83bd95bf1 [DCCP]: Check for unread data on close
This removes one FIXME with regard to close when there is still unread data.
The mechanism is implemented similar to TCP: with regard to DCCP-specifics,
a Reset with Code 2, "Aborted" is sent to the peer.

This corresponds in part to RFC 4340, 8.1.1 and 8.1.5.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:01 -08:00
Gerrit Renker dcfbc7e97a [CCID2]: Remove misleading comment
This removes a comment which identifies an `issue' with dccp_write_xmit() where there is none.
The comment assumes it is possible that a packet is sent between the calls to

	ccid_hc_tx_send_packet(),
	dccp_transmit_skb(),
	ccid_hc_tx_packet_sent()

(in the above order) in dccp_write_xmit().

I think that this is impossible, since dccp_write_xmit() is always called under lock:

 * when called as dccp_write_xmit(sk, 1) from dccp_send_close(), the socket is locked
   (see code comment above dccp_send_close());
 * when called as dccp_write_xmit(sk, 0) from dccp_send_msg(), it is after lock_sock() has been called;
 * when called as dccp_write_xmit(sk, 0) from dccp_write_xmit_timer(), bh_lock_sock() has been called
   and the if/else statement has made sure that sk_lock.owner is not set;
 * there are no other places where dccp_write_xmit() is called.

Furthermore, the debug statement for printing the sequence number of the packet just sent has been
removed, since the entire list is being printed anyway and so the entry of that number appears last.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:01 -08:00
Gerrit Renker a302002516 [CCID2]: Remove redundant ack-counting variable
The code used two different variables to count Acks, one of them redundant.
This patch reduces the number of Ack counters to one.

The type of the Ack counter has also been changed to u32 (twice the range of int);
and the variable has been renamed into `packets_acked' - for consistency with
RFC 3465 (and similarly named variables are used by TCP and SCTP).

Lastly, a slightly less aggressive `maxincr' increment is used (for even Ack Ratios,
maxincr was Ack Ratio/2 + 1 instead of Ack Ratio/2).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:55:00 -08:00
Gerrit Renker 83399361c3 [CCID2]: Remove redundant synchronisation variable
This removes the synchronisation variable `ccid2hctx_sendwait', which is set to 1
when the CCID2 sender may send a new packet, and which is set to 0 otherwise

The variable is redundant, since it is only used in combination with the hc_tx_send_packet/
hc_tx_packet_sent function pair. Both functions are called under socket lock, so the
following happens when the CCID2 may send a new packet:

 * it sets sendwait = 1 in tx_send_packet and returns 0;
 * the subsequent call to tx_packet_sent clears the sendwait flag;
 * since tx_send_packet returns 0 if and only if sendwait == 1, the BUG_ON condition
   in tx_packet_sent is never satisfied, since that function is never called when
   tx_send_packet returns a value different from 0 (cf. dccp_write_xmit);
 * the call to tx_packet_sent clears the flag so that the condition "!sendwait" is
   true the next time tx_packet_sent is called.

In other words, it is sufficient to just return 0 / not-0 to synchronise tx_send_packet
and tx_packet_sent -- which is what the patch does.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:59 -08:00
Gerrit Renker da98e0b5d4 [CCID2]: Redundant debugging output
This reduces the amount of redundant debugging messages:

 * pipe/cwnd are printed in both tx_send_packet() and tx_packet_sent().
   Both functions are called immediately after one another, so one occurrence is sufficient.

 * Since tx_packet_sent() prints pipe/cwnd already, the second printk for pipe is redundant.

 * In tx_packet_sent() the check_sanity function is called twice (at the begin and at the end).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:59 -08:00
Gerrit Renker 95b21d7e9d [CCID2]: Replace pipe assignment-function with assignment
The function ccid2_change_pipe only does an assignment. This patch simplifies the code by
replacing the function with the assignment it performs.

Furthermore, the type of pipe is promoted from `signed' to unsigned (increasing the range).
As a result, a BUG_ON test for negative values now becomes obsolete (for safety not removed,
but replaced with a less annoying `DCCP_BUG').

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:58 -08:00
Gerrit Renker 3deeadd74b [CCID2]: Replace cwnd assignment-function with assignment
The current function ccid2_change_cwnd in effect makes only an assignment, as
the test whether cwnd has reached 0 is only required when cwnd is halved.

This patch simplifies the code by replacing the function with the assignment
it performs.

Furthermore, since ssthresh derives from cwnd and appears in many assignments and
comparisons, the type of ssthresh has also been changed to match that of cwnd.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:57 -08:00
Gerrit Renker 63df18ad7f [CCID2]: Replace read-only variable with constant
This replaces the field member `numdupack', which was used as a read-only
constant in the code, with a #define.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:57 -08:00
Gerrit Renker 7792cd8885 [CCID2]: Remove unused variable
This removes a variable `ccid2hctx_sent' which is incremented but
never referenced/read (i.e., dead code).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:56 -08:00
Gerrit Renker 900bfed471 [CCID2]: Disable broken Ack Ratio adaptation algorithm
This comments out a problematic section comprising a half-finished algorithm:

 - The variable `ccid2hctx_ackloss' is never initialised to a value different from 0 and
   hence in fact is a read-only constant.
 - The `arsent' variable counts packets other than Acks (it is incremented for every packet),
   and there is no test for Ack Loss.
 - The concept of counting Acks as such leads to a complex calculation, and the calculation
   at the moment is inconsistent with this concept.
   The problem is that the number of Acks - rather than the number of windows - is counted,
   which leads to a complex (cubic/quadratic) expression - this is not even implemented.

In its current state, the commented-out algorithm interfers with normal processing by
changing Ack Ratio incorrectly, and at the wrong times.

A new algorithm is necessary, which will not necessarily use the same variables as used by
the unfinished one; hence the old variables have been removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:55 -08:00
Gerrit Renker b00d2bbc45 [CCID2]: Larger initial windows also for CCID2
RFC 4341, sec. 5 states that "The cwnd parameter is initialized to at most
four packets for new connections, following the rules from [RFC3390]", which
is implemented by this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:55 -08:00
Arnaldo Carvalho de Melo e18d7a9857 [DCCP]: Initialize dccp_sock before calling the ccid constructors
This is because in the next patch CCID2 will assume that dccps_mss_cache is
non-zero.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:54 -08:00
Gerrit Renker d50ad163e6 [CCID2]: Deadlock and spurious timeouts when Ack Ratio > cwnd
This patch removes a bug in the current code. I agree with Andrea's comment
that there is a problem here but the way it is treated does not fix it.

The problem is that whenever Ack Ratio > cwnd, starvation/deadlock occurs:
 * the receiver will not send an Ack until (Ack Ratio - cwnd) data packets
   have arrived;
 * the sender will not send any data packet before the receipt of an Ack
   advances the send window.
The only way that the connection then progresses was via RTO timeout. In one
extreme case (bulk transfer), it was observed that this happened for every single
packet; i.e. hundreds of packets, each a RTO timeout of 1..3 seconds apart:
a transfer which normally would take a fraction of a second thus grew to
several minutes.

The solution taken by this approach is to observe the relation

                   "Ack Ratio <= cwnd"

by using the constraint (1) from RFC 4341, 6.1.2; i.e. set

                 Ack Ratio = ceil(cwnd / 2)

and update it whenever either Ack Ratio or cwnd change. This ensures that
the deadlock problem can not arise.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:53 -08:00
Gerrit Renker df054e1d00 [CCID2]: Don't assign negative values to Ack Ratio
Since it makes not sense to assign negative values to Ack Ratio, this
patch disallows this possibility.

As a consequence, a Bug test for negative Ack Ratio values becomes obsolete.

Furthermore, a check against overflow (as Ack Ratio may not exceed 2 bytes,
due to RFC 4340, 11.3) has been added.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:53 -08:00
Gerrit Renker cfbbeabc88 [CCID2]: Fix sequence number arithmetic/comparisons
This replaces use of normal subtraction with modulo-48 subtraction.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:52 -08:00
Gerrit Renker 3de5489f47 [CCID2]: Bug in reading Ack Vectors
In CCID2 the receiver-history is sorted in ascending order of sequence number,
but the processing of received Ack Vectors requires the list traversal in the
opposite direction.

The current code has a bug in this regard: the list traversal is upwards. As a
consequence, only Ack Vectors with a run length of 1 will pass, in all other
Ack Vectors the remaining (acked) sequence numbers are missed, and may later
falsely be identified as lost.

Note: This bug is only visible when Ack Ratio > 1, since otherwise the run
      lengths of Ack Vectors are 0.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:51 -08:00
Gerrit Renker a47c51044a [ACKVEC]: Reduce length of identifiers
This is reduces the length of the struct ackvec/ackvec_record fields. It is
a purely text-based replacement:

	s#dccpavr_#avr_#g;
	s#dccpav_#av_#g;

and increases readability somewhat.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:51 -08:00
Gerrit Renker c86ab2b6a5 [DCCP]: Ignore Ack Vectors / Elapsed Time on DCCP-Request also
Small update with regard to RFC 4340 (references added as documentation):
on Requests, Ack Vectors / Elapsed Time should be ignored.
Length handling of Elapsed Time also simplified.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:47 -08:00
Gerrit Renker 6d57b43bf8 [DCCP]: Remove redundant dependency on IP_DCCP
This cleans up the consequences of an earlier patch which
introduced the `if IP_DCCP' clause into net/dccp/Kconfig.

The CCID Kconfig menu is sourced within this clause; as a
consequence, all tests of type `depends on IP_DCCP' are now
redundant.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:46 -08:00
Gerrit Renker e333b3edc4 [DCCP]: Promote CCID2 as default CCID
This patch addresses the following problems:

 1. DCCP relies for its proper functioning on having at least one CCID module
    enabled (as in TCP plugable congestion control). Currently it is possible to
    disable both CCIDs and thus leave the DCCP module in a compiled, but entirely
    non-functional state: no sockets can be created when no CCID is available.
    Furthermore, the protocol is (again like TCP) not intended to be used without
    CCIDs. Last, a non-empty CCID list is needed for doing CCID feature negotiation.

 2. Internally the default CCID that is advertised by the Linux host is set to CCID2
    (DCCPF_INITIAL_CCID in include/linux/dccp.h). Disabling CCID2 in the Kconfig
    menu without changing the defaults leads to a failure `module not found' when
    trying to load the dccp module (which internally tries to load the default CCID).

 3. The specification (RFC 4340, sec. 10) treats CCID2 somewhat like a
    `minimum common denominator'; the specification says that:

    * "New connections start with CCID 2 for both endpoints"

    * "A DCCP implementation intended for general use, such as an implementation in a
       general-purpose operating system kernel, SHOULD implement at least CCID 2.
       The intent is to make CCID 2 broadly available for interoperability [...]"

    Providing CCID2 as minimum-required CCID (like Reno/Cubic in TCP) thus seems reasonable.

Hence this patch automatically selects CCID2 when DCCP is enabled. Documentation also added.

Discussions with Ian McDonald on this subject are gratefully acknowledged.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:46 -08:00
Gerrit Renker 8e8c71f1ab [DCCP]: Honour and make use of shutdown option set by user
This extends the DCCP socket API by honouring any shutdown(2) option set by the user.
The behaviour is, as much as possible, made consistent with the API for TCP's shutdown.

This patch exploits the information provided by the user via the socket API to reduce
processing costs:
 * if the read end is closed (SHUT_RD), it is not necessary to deliver to input CCID;
 * if the write end is closed (SHUT_WR), the same idea applies, but with a difference -
   as long as the TX queue has not been drained, we need to receive feedback to keep
   congestion-control rates up to date. Hence SHUT_WR is honoured only after the last
   packet (under congestion control) has been sent;
 * although SHUT_RDWR seems nonsensical, it is nevertheless supported in the same manner
   as for TCP (and agrees with test for SHUTDOWN_MASK in dccp_poll() in net/dccp/proto.c).

Furthermore, most of the code already honours the sk_shutdown flags (dccp_recvmsg() for
instance sets the read length to 0 if SHUT_RD had been called); CCID handling is now added
to this by the present patch.

There will also no longer be any delivery when the socket is in the final stages, i.e. when
one of dccp_close(), dccp_fin(), or dccp_done() has been called - which is fine since at
that stage the connection is its final stages.

Motivation and background are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/shutdown

A FIXME has been added to notify the other end if SHUT_RD has been set (RFC 4340, 11.7).

Note: There is a comment in inet_shutdown() in net/ipv4/af_inet.c which asks to "make
      sure the socket is a TCP socket". This should probably be extended to mean
      `TCP or DCCP socket' (the code is also used by UDP and raw sockets).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:44 -08:00
Gerrit Renker c3ada46a00 [CCID3]: Inline for moving average
The moving average computation occurs so frequently in the CCID 3 code that
it merits an inline function  of its own. This is uses a suggestion by
Arnaldo as per http://www.mail-archive.com/dccp@vger.kernel.org/msg01662.html

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:43 -08:00
Gerrit Renker a5358fdc9c [CCID3]: Accurately determine idle & application-limited periods
This fixes/updates the handling of idle and application-limited periods in CCID3,
which currently is broken: there is no detection as to how long a sender has been
idle - there is only one flag which is toggled in between function calls.

Being obsolete now, the `idle' flag is removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:42 -08:00
Gerrit Renker eb279b79c4 [CCID3]: Ignore trivial amounts of elapsed time
This patch fixes a previously undiscovered bug; the problem is in computing
the elapsed time as the time between `receiving' the packet (i.e. skb enters
CCID module) and sending feedback:

     - there is no layer-processing, queueing, or delay involved,
     - hence the elapsed time is in the order of 1 function call
     - this is in the dimension of maximally 50..100usec
     - which renders the use of elapsed time almost entirely useless.

The fix is simply to ignore such trivial amounts of elapsed time.

As a further advantage, the now useless elapsed_time field can be removed from
the socket, which reduces the socket structure by another four bytes.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:42 -08:00
Gerrit Renker 6c08b2cf48 [CCID3]: Revert use of MSS instead of s
This updates the CCID3 code with regard to two instances of using `MSS' in place of `s':

 1. The RFC3390-based initial rate: both rfc3448bis as well as the Faster Restart
    draft now consistently use `s' instead of MSS.

 2. Now agrees with section 4.2 of rfc3448bis: "If the sender is ready to send data when
    it does not yet have a round trip sample, the value of X is set to s bytes per
    second, for segment size s [...]"

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:54:41 -08:00
Pavel Emelyanov b24b8a247f [NET]: Convert init_timer into setup_timer
Many-many code in the kernel initialized the timer->function
and  timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:53:35 -08:00
Joe Perches 5e8e034cc5 [DCCP]: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-20 13:59:39 -08:00
Joe Perches 4756daa3b6 [DCCP]: Add missing "space"
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-19 23:46:02 -08:00
Eric Dumazet 230140cffa [INET]: Remove per bucket rwlock in tcp/dccp ehash table.
As done two years ago on IP route cache table (commit
22c047ccbc) , we can avoid using one
lock per hash bucket for the huge TCP/DCCP hash tables.

On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for
litle performance differences. (we hit a different cache line for the
rwlock, but then the bucket cache line have a better sharing factor
among cpus, since we dirty it less often). For netstat or ss commands
that want a full scan of hash table, we perform fewer memory accesses.

Using a 'small' table of hashed rwlocks should be more than enough to
provide correct SMP concurrency between different buckets, without
using too much memory. Sizing of this table depends on
num_possible_cpus() and various CONFIG settings.

This patch provides some locking abstraction that may ease a future
work using a different model for TCP/DCCP table.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:11 -08:00
David S. Miller c62cf5cb17 [DCCP]: Use DEFINE_PROTO_INUSE infrastructure.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:09:01 -08:00
Gerrit Renker 24c667db59 [CCID2/3]: Initialisation assignments of 0 are redundant
Assigning initial values of `0' is redundant when loading a new CCID structure,
since in net/dccp/ccid.c the entire CCID structure is zeroed out prior to
initialisation in ccid_new():

    	struct ccid {
    		struct ccid_operations *ccid_ops;
    		char		       ccid_priv[0];
    	};

    	// ...
    	if (rx) {
    		memset(ccid + 1, 0, ccid_ops->ccid_hc_rx_obj_size);
    		if (ccid->ccid_ops->ccid_hc_rx_init != NULL &&
    		    ccid->ccid_ops->ccid_hc_rx_init(ccid, sk) != 0)
    			goto out_free_ccid;
    	} else {
    		memset(ccid + 1, 0, ccid_ops->ccid_hc_tx_obj_size);
    		/* analogous to the rx case */
    	}

This patch therefore removes the redundant assignments. Thanks to Arnaldo for
the inspiration.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-10-24 10:53:01 -02:00
Gerrit Renker 76fd1e87d9 [DCCP]: Unaligned pointer access
This fixes `unaligned (read) access' errors of the type

Kernel unaligned access at TPC[100f970c] dccp_parse_options+0x4f4/0x7e0 [dccp]
Kernel unaligned access at TPC[1011f2e4] ccid3_hc_tx_parse_options+0x1ac/0x380 [dccp_ccid3]
Kernel unaligned access at TPC[100f9898] dccp_parse_options+0x680/0x880 [dccp]

by using the get_unaligned macro for parsing options.

Commiter note: Preserved the sparse __be{16,32} annotations.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-10-24 10:46:58 -02:00
Gerrit Renker d8ef2c29a0 [DCCP]: Convert Reset code into socket error number
This adds support for converting the 11 currently defined Reset codes into system
error numbers, which are stored in sk_err for further interpretation.

This makes the externally visible API behaviour similar to TCP, since a client
connecting to a non-existing port will experience ECONNREFUSED.

* Code 0, Unspecified, is interpreted as non-error (0);
* Code 1, Closed (normal termination), also maps into 0;
* Code 2, Aborted, maps into "Connection reset by peer" (ECONNRESET);
* Code 3, No Connection and
  Code 7, Connection Refused, map into "Connection refused" (ECONNREFUSED);
* Code 4, Packet Error, maps into "No message of desired type" (ENOMSG);
* Code 5, Option Error, maps into "Illegal byte sequence" (EILSEQ);
* Code 6, Mandatory Error, maps into "Operation not supported on transport endpoint" (EOPNOTSUPP);
* Code 8, Bad Service Code, maps into "Invalid request code" (EBADRQC);
* Code 9, Too Busy, maps into "Too many users" (EUSERS);
* Code 10, Bad Init Cookie, maps into "Invalid request descriptor" (EBADR);
* Code 11, Aggression Penalty, maps into "Quota exceeded" (EDQUOT)
  which makes sense in terms of using more than the `fair share' of bandwidth.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-10-24 10:27:48 -02:00
Gerrit Renker 1238d0873b [DCCP]: One more exemption from full sequence number checks
This fixes the following problem: client connects to peer which has no DCCP
enabled or loaded; ICMP error messages ("Protocol Unavailable") can be seen
on the wire, but the application hangs. Reason: ICMP packets don't get through
to dccp_v4_err.

When reporting errors, a sequence number check is made for the DCCP packet
that had caused an ICMP error to arrive.
Such checks can not be made if the socket is in state LISTEN, RESPOND (which
in the implementation is the same as LISTEN), or REQUEST, since update_gsr()
has not been called in these states, hence the sequence window is 0..0.

This patch fixes the problem by adding the REQUEST state as another exemption
to the window check. The error reporting now works as expected on connecting.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-24 10:18:06 -02:00
Gerrit Renker fde20105f3 [DCCP]: Retrieve packet sequence number for error reporting
This fixes a problem when analysing erroneous packets in dccp_v{4,6}_err:
* dccp_hdr_seq currently takes an skb
* however, the transport headers in the skb are shifted, due to the
  preceding IPv4/v6 header.
Fixed for v4 and v6 by changing dccp_hdr_seq to take a struct dccp_hdr as
argument. Verified that the correct sequence number is now reported in the
error handler.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-10-24 10:12:09 -02:00
Arnaldo Carvalho de Melo 6273172e17 [DCCP]: Implement SIOCINQ/FIONREAD
Just like UDP.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Leandro Melo de Sales <leandroal@gmail.com>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-23 21:27:50 -07:00
Jean Delvare 7131c6c736 [INET]: Use MODULE_ALIAS_NET_PF_PROTO_TYPE where possible.
Now that we have this new macro, use it where possible.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-22 02:59:54 -07:00
Jean Delvare 305e1e9691 [INET]: Let inet_diag and friends autoload
By adding module aliases to inet_diag, tcp_diag and dccp_diag, we let
them load automatically as needed. This makes tools like "ss" run
faster.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-22 02:59:54 -07:00
Ingo Molnar bd5435e76a [DCCP]: fix link error with !CONFIG_SYSCTL
Do not define the sysctl_dccp_sync_ratelimit sysctl variable in the
CONFIG_SYSCTL dependent sysctl.c module - move it to input.c instead.

This fixes the following build bug:

 net/built-in.o: In function `dccp_check_seqno':
 input.c:(.text+0xbd859): undefined reference to `sysctl_dccp_sync_ratelimit'
 distcc[29953] ERROR: compile (null) on localhost failed
 make: *** [vmlinux] Error 1

Found via 'make randconfig' build testing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17 19:33:06 -07:00
Herbert Xu e5bbef20e0 [IPV6]: Replace sk_buff ** with sk_buff * in input handlers
With all the users of the double pointers removed from the IPv6 input path,
this patch converts all occurances of sk_buff ** to sk_buff * in IPv6 input
handlers.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-15 12:50:28 -07:00
Gerrit Renker 4a5409a5a8 [DCCP]: Twice the wrong reset code in receiving connection-Requests
This fixes two bugs in processing of connection-Requests in
v{4,6}_conn_request:

 1. Due to using the variable `reset_code', the Reset code generated
    internally by dccp_parse_options() is overwritten with the
    initialised value ("Too Busy") of reset_code, which is not what is
    intended.

 2. When receiving a connection-Request on a multicast or broadcast
    address, no Reset should be generated, to avoid storms of such
    packets. Instead of jumping to the `drop' label, the
    v{4,6}_conn_request functions now return 0. Below is why in my
    understanding this is correct:

    When the conn_request function returns < 0, then the caller,
    dccp_rcv_state_process(), returns 1. In all instances where
    dccp_rcv_state_process is called (dccp_v4_do_rcv, dccp_v6_do_rcv,
    and dccp_child_process), a return value of != 0 from
    dccp_rcv_state_process() means that a Reset is generated.

    If on the other hand the conn_request function returns 0, the
    packet is discarded and no Reset is generated.

Note: There may be a related problem when sending the Response, due to
the following.

	if (dccp_v6_send_response(sk, req, NULL))
		goto drop_and_free;
	/* ... */
	drop_and_free:
		return -1;

In this case, if send_response fails due to transmission errors, the
next thing that is generated is a Reset with a code "Too Busy". I
haven't been able to conjure up such a condition, but it might be good
to change the behaviour here also (not done by this patch).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:38 -07:00
Gerrit Renker dcad856fe8 [DCCP]: Wrong format in printk
The elapsed time uses u32, but printk was using %d, not %u.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-10-10 16:54:36 -07:00
Gerrit Renker 451bc0473f [DCCP]: Tidy-up -- minisock initialisation
This

 * removes a declaration of a non-existent function
   __dccp_minisock_init;

 * shifts the initialisation function dccp_minisock_init() from
   options.c to minisocks.c, where it is more naturally expected to
   be.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:36 -07:00
Gerrit Renker 5e28599a6e [CCID2]: Sequence number wraparound issues
This replaces several uses of standard arithmetic with the DCCP
sequence number arithmetic functions. The problem here is that the
sequence number wrap-around was not taken into consideration.

 * Condition "seqp->ccid2s_seq <= prev->ccid2s_seq" has been replaced
   by

   	dccp_delta_seqno(seqp->ccid2s_seq, prev->ccid2s_seq) >= 0

   since if seqp is `before' prev, then the delta_seqno() is positive.

 * The test whether sequence numbers `a' and `b' are consecutive has
   the form

   	dccp_delta_seqno(a, b) == 1

 * Increment of ccid2hctx_rpseq could be done using dccp_inc_seqno(),
   but since here the incremented ccid2hctx_rpseq == seqno, used
   assignment instead.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:35 -07:00
Gerrit Renker 6c58324808 [CCID2]: Remove redundant case block
skb's passed to ccid2_hc_tx_send_packet() are headerless, the packet
type is decided later, in dccp_write_xmit(). Therefore the first test
of the switch/case block is always true, the others are never reached.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:35 -07:00
Gerrit Renker ee196c2186 [CCID2]: Remove redundant BUG_ON
This removes a test for `val < 1' which would only have been triggered
when val < 0, due to a preceding test for 0.  Fixed by using an
unsigned type for cwnd (as in TCP) instead.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:34 -07:00
Gerrit Renker 7d9e8931f9 [CCID2]: Remove ugly BUG_ON
This removes an ugly BUG_ON which has been pointed out by Arnaldo.

Instead of freezing up the machine, a `critical' message is now issued
to the system log.

There is potential of doing this more gracefully (eg. there are a few
internal variables which could be updated despite the lack of memory),
but that requires more complicated changes to the algorithm; thus a
`FIXME' has been added.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:34 -07:00
Gerrit Renker cd1f7d347c [CCID2]: Simplify interface
This patch simplifies the interface of ccid2_hc_tx_alloc_seq():

   * ccid2_hc_tx_alloc_seq() is always called with an argument of
     CCID2_SEQBUF_LEN;

   * other code - ccid2_hc_tx_check_sanity() - even depends on the
     assumption that ccid2_hc_tx_alloc_seq() has been called with this
     particular size;

   * passing the `gfp_t' argument to ccid2_hc_tx_alloc_seq() is
     redundant with gfp_any().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:33 -07:00
Gerrit Renker 042d18f9f3 [DCCP]: Make all `debug' parameters bool
This just sets the parameter to bool, since debugging messages are
either on or off.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:32 -07:00
Gerrit Renker 7c559a9e44 [DCCP]: Add socket option to query the current MPS
This enables applications to query the current value of the Maximum
Packet Size via a socket option, suggested as a SHOULD in (RFC 4340,
p. 102).

This socket option is useful to avoid the annoying bail-out via
`-EMSGSIZE'.  In particular, as fragmentation is not currently
supported (and its use is partly discouraged in RFC 4340).

With this option, it is possible to size buffers accordingly, e.g.

	int buflen = dccp_get_cur_mps(sockfd);

	/* or */
	if (msgsize > dccp_get_cur_mps(sockfd))
		die("message is too large for this path");

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:31 -07:00
Gerrit Renker bc8498721d [DCCP]: Wait for CCID
This performs a minor optimisation: when ccid_hc_tx_send_packet
returns a value greater zero, then the same call previously was done
again at the begin of the while loop in dccp_wait_for_ccid.

This patch exploits the available information and schedule-timeouts
directly instead.

Documentation also added.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:31 -07:00
Arnaldo Carvalho de Melo bb293e6a24 [CCID3]: Remove ifdef surrounding BUG_ON
As suggested by DaveM.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:45 -07:00
Gerrit Renker cecd8d0ec4 [DCCP]: Reduce the number of writable states
Since DCCP requires to close both ends of a connection simultaneously,
permission to write in state DCCP_CLOSING is removed in dccp_sendmsg():
 * if the sending end closed, it would encounter a write error anyhow;
 * if the other end has closed the connection, it accepts no more data.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:45 -07:00
Gerrit Renker e356d37a09 [DCCP]: Factor out common code for generating Resets
This factors code common to dccp_v{4,6}_ctl_send_reset into a separate function,
and adds support for filling in the Data 1 ... Data 3 fields from RFC 4340, 5.6.

It is useful to have this separate, since the following Reset codes will always
be generated from the control socket rather than via dccp_send_reset:
 * Code 3, "No Connection", cf. 8.3.1;
 * Code 4, "Packet Error" (identification for Data 1 added);
 * Code 5, "Option Error" (identification for Data 1..3 added, will be used later);
 * Code 6, "Mandatory Error" (same as Option Error);
 * Code 7, "Connection Refused" (what on Earth is the difference to "No Connection"?);
 * Code 8, "Bad Service Code";
 * Code 9, "Too Busy";
 * Code 10, "Bad Init Cookie" (not used).

Code 0 is not recommended by the RFC, the following codes would be used in
dccp_send_reset() instead, since they all relate to an established DCCP connection:
 * Code 1, "Closed";
 * Code 2, "Aborted";
 * Code 11, "Aggression Penalty" (12.3).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:44 -07:00
Gerrit Renker 9bf55cda9b [DCCP]: Sequence number wrap-around when sending reset
This replaces normal addition with mod-48 addition so that sequence number
wraparound is respected.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:43 -07:00
Gerrit Renker a94f0f9705 [DCCP]: Rate-limit DCCP-Syncs
This implements a SHOULD from RFC 4340, 7.5.4:
 "To protect against denial-of-service attacks, DCCP implementations SHOULD
  impose a rate limit on DCCP-Syncs sent in response to sequence-invalid packets,
  such as not more than eight DCCP-Syncs per second."

The rate-limit is maintained on a per-socket basis. This is a more stringent
policy than enforcing the rate-limit on a per-source-address basis and
protects against attacks with forged source addresses.

Moreover, the mechanism is deliberately kept simple. In contrast to
xrlim_allow(), bursts of Sync packets in reply to sequence-invalid packets
are not supported.  This foils such attacks where the receipt of a Sync
triggers further sequence-invalid packets. (I have tested this mechanism against
xrlim_allow algorithm for Syncs, permitting bursts just increases the problems.)

In order to keep flexibility, the timeout parameter can be set via sysctl; and
the whole mechanism can even be disabled (which is however not recommended).

The algorithm in this patch has been improved with regard to wrapping issues
thanks to a suggestion by Arnaldo.

Commiter note: Rate limited the step 6 DCCP_WARN too, as it says we're
               sending a sync.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:43 -07:00
Gerrit Renker ee1a15922d [DCCP]: Remove duplicate code for Reset from connected socket
In this patch, duplicated code is removed for the case when a Reset packet is
sent from a connected socket. This code duplication is between dccp_make_reset
and dccp_transmit_skb, which already contained an (up to now entirely unused)
switch statement to fill in the reset code from the DCCP_SKB_CB.

The only thing that has been removed is the call to dst_clone(dst), since
the queue_xmit functions use sk_dst_cache anyway.

I wasn't sure which purpose inet_sk_rebuild_header served, so I left it in.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:42 -07:00
Gerrit Renker 0430ee3451 [DCCP]: Add Support for Data 1 .. 3 fields of Reset packets
This adds fields to support the informational Data 1..3 fields of the
DCCP-Reset packets (RFC 4340, 5.6), and makes minor cosmetic changes
to documentation.
Code which fills in these fields follows in subsequent patches, it is
primarily used for reporting option-processing and feature-negotiation
errors.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:42 -07:00
Gerrit Renker 727ecc5faa [DCCP]: Add FIXME for send_delayed_ack
This adds a FIXME to signal that the function dccp_send_delayed_ack is nowhere
used in the entire DCCP/CCID code.

Using a delayed Ack timer is suggested in 11.3 of RFC 4340, but it has also
rather subtle implications for the Ack-Ratio-accounting.

CCID2 does not use this (maybe it should).

I think leaving the function in is good, in case someone wants to implement
this.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:41 -07:00
Gerrit Renker 2e86908f7d [CCID3]: Move NULL-protection into function
This moves several instances of testing against NULL into the function which is
used to de-reference the CCID-private data.

Committer note: Made the BUG_ON depend on having CONFIG_IP_DCCP_CCID3_DEBUG, as it
                is too much to have this on production code. Also made sure that
                the macro is used only after checking if sk_state is not LISTEN,
                to make it equivalent to what we had before.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:41 -07:00
Gerrit Renker 08831700cc [DCCP]: Send Reset upon Sync in state REQUEST
This fixes the code to correspond to RFC 4340, 7.5.4, which states the
exception that a Sync received in state REQUEST generates a Reset (not
a SyncAck).

To achieve this, only a small change is required. Since
dccp_rcv_request_sent_state_process() already uses the correct Reset Code
number 4 ("Packet Error"), we only need to shift the if-statement a few
lines further down.

(To test this case: replace DCCP_PKT_RESPONSE with DCCP_PKT_SYNC
                    in dccp_make_response.)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-10-10 16:52:40 -07:00
Gerrit Renker b0d045ca45 [DCCP]: Parameter renaming
The parameter `seq' of dccp_send_sync() is in fact an acknowledgement number
and not a sequence number - thus renamed by this patch into `ackno'.

Secondly, a `critical' warning is added when a Sync/SyncAck could not be sent.

Sanity: I have checked all other functions that are called in dccp_transmit_skb,
        there are no clashes with the use of dccpd_ack_seq; no other function is
        using this slot at the same time.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:37 -07:00
Gerrit Renker e155d76922 [DCCP]: Fix Reset/Sync-Flood Bug
This updates sequence number checking with regard to RFC 4340, 7.5.4.
Missing in the code was an exception for sequence-invalid Reset packets,
which get a Sync acknowledging GSR, instead of (as usual) P.seqno.

This can lead to an oscillating ping-pong flood of Reset packets.

In fact, it has been observed on the wire as follows:

 1. client establishes connection to server;
 2. before server can write to client, client crashes without notifying
    the server (NB: now no longer possible due to ABORT function);
 3. server sends DCCP-Data packet (has no ackno);
 4. client generates Reset "No Connection", seqno=0, increments seqno;
 5. server replies with Sync, using ackno = P.seqno;
 6. client generates Reset "No Connection" with seqno = ackno + 1;
 7. goto (5).

The difference is that now in (5) the server uses GSR.  This causes the
Reset sent by the client in (6) to become sequence-valid, so that in (7)
the vicious circle is broken; the Reset is then enqueued and causes the
socket to enter TIMEWAIT state.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:37 -07:00
Gerrit Renker cbe1f5f88a [DCCP]: Shorten variable names in dccp_check_seqno
This patch is in part required by the next patch; it

 * replaces 6 instances of `DCCP_SKB_CB(skb)->dccpd_seq' with `seqno';
 * replaces 7 instances of `DCCP_SKB_CB(skb)->dccpd_ack_seq' with `ackno';
 * replaces 1 use of dccp_inc_seqno() by unfolding `ADD48' macro in place.

No changes in algorithm, all changes are text replacement/substitution.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:36 -07:00
Gerrit Renker 3393da8241 [DCCP]: Simplify interface of dccp_sample_rtt
The third parameter of dccp_sample_rtt now becomes useless and is removed.

Also combined the subtraction of the timestamp echo and the elapsed time.
This is safe, since (a) presence of timestamp echo is tested first and (b)
elapsed time is either present and non-zero or it is not set and equals 0
due to the memset in dccp_parse_options.

To avoid measuring option-processing time, the timestamp for measuring the
initial Request/Response RTT sample is taken directly when the function is
called (the Linux implementation always adds a timestamp on the Request,
so there is no loss in doing this).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:35 -07:00
Gerrit Renker 4c70f383e0 [DCCP]: Provide 10s of microsecond timesource
This provides a timesource, conveniently used for DCCP timestamps, which
returns the elapsed time in 10s of microseconds since initialisation.
This makes for a wrap-around time of about 11.9 hours, which should be
sufficient for most applications.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:35 -07:00
Gerrit Renker aa97efd97a [DCCP]: Reuse ktime_get_real() calls again
This patch reduces the number of timestamps taken in the receive path
for each packet.

The ccid3_hc_tx_update_x() routine is called in
 * the receive path for each CCID3-controlled packet
 * for the nofeedback timer (if no feedback arrives during 4 RTT)

Currently, when there is no loss, each packet gets timestamped twice.
The patch resolves this by recycling the first timestamp taken on packet
reception for RTT sampling.

When the no_feedback_timer() is called, then the timestamp argument is
simply set to NULL - so that ccid3_hc_tx_update_x() takes care of the logic.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:34 -07:00
Eric W. Biederman 457c4cbc5a [NET]: Make /proc/net per network namespace
This patch makes /proc/net per network namespace.  It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:06 -07:00
Micah Gruber c726187225 [DCCP]: Remove unneeded pointer newdp from dccp_v4_request_recv_sock()
This trivial patch removes the unneeded pointer newdp, which is never used.

Signed-off-by: Micah Gruber <micah.gruber@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:59 -07:00
Ilpo Järvinen 172589ccdd [NET]: DIV_ROUND_UP cleanup (part two)
Hopefully captured all single statement cases under net/. I'm
not too sure if there is some policy about #includes that are
"guaranteed" (ie., in the current tree) to be available through
some other #included header, so I just added linux/kernel.h to
each changed file that didn't #include it previously.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:37 -07:00
Arnaldo Carvalho de Melo 6168b96c07 [DCCP]: Nuke the timeval helpers now that we fully converted to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:17 -07:00
Arnaldo Carvalho de Melo 8fb8354af9 [DCCP]: Nuke dccp_timestamp and dccps_epoch, not used anymore
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:17 -07:00
Arnaldo Carvalho de Melo 234748954a [DCCP] options: convert dccp_insert_option_timestamp to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:16 -07:00
Arnaldo Carvalho de Melo 19ac21465e [DCCP]: Convert dccps_timestamp_time to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:16 -07:00
Arnaldo Carvalho de Melo 0740d49c24 [DCCP] packet_history: Convert dccphtx_tstamp to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:15 -07:00
Arnaldo Carvalho de Melo e7c2335794 [DCCP] packet_history: convert dccphrx_tstamp to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:14 -07:00
Arnaldo Carvalho de Melo b8bda9d708 [DCCP] ackvec: Convert to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:14 -07:00
Arnaldo Carvalho de Melo 668348a423 [DCCP] CCID3: Stop using dccp_timestamp
Now to convert the ackvec code to ktime_t so that we can get rid of
dccp_timestamp and the epoch thing in dccp_sock.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:13 -07:00
Arnaldo Carvalho de Melo 9823b7b554 [DCCP]: Convert dccp_sample_rtt to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:13 -07:00
Arnaldo Carvalho de Melo e7a81c6d62 [DCCP]: Convert ccid3hcrx_tstamp_last_feedback to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:12 -07:00
Arnaldo Carvalho de Melo 1faf0a1f5d [DCCP]: Convert ccid3hcrx_tstamp_last_ack to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:11 -07:00
Arnaldo Carvalho de Melo 23f062af6e [DCCP]: Convert ccid3hctx_t_ld to ktime_t
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:11 -07:00
Arnaldo Carvalho de Melo ac198ea8d9 [DCCP]: Make ccid3_hc_tx_update_x get a timestamp if needed
The code was too complicated, if p > 0 in ccid3_hc_tx_no_feedback_timer the
timestamp was being obtained to be passed to ccid3_hc_tx_update_x, where only
if p > 0 the timestamp was needed, so just leave it to ccid3_hc_tx_update_x to
obtain the timestamp if needed.

This will help in the upcoming changesets where we'll convert t_ld to ktime_t.
We'll eventually try to reuse ktime_get_real() calls again.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:48:10 -07:00
Gerrit Renker 39dad26c37 [DCCP]: Allocation in atomic context
This fixes the following bug reported in syslog:

[ 4039.051658] BUG: sleeping function called from invalid context at /usr/src/davem-2.6/mm/slab.c:3032
[ 4039.051668] in_atomic():1, irqs_disabled():0
[ 4039.051670] INFO: lockdep is turned off.
[ 4039.051674]  [<c0104c0f>] show_trace_log_lvl+0x1a/0x30
[ 4039.051687]  [<c0104d4d>] show_trace+0x12/0x14
[ 4039.051691]  [<c0104d65>] dump_stack+0x16/0x18
[ 4039.051695]  [<c011371e>] __might_sleep+0xaf/0xbe
[ 4039.051700]  [<c0157b66>] __kmalloc+0xb1/0xd0
[ 4039.051706]  [<f090416f>] ccid2_hc_tx_alloc_seq+0x35/0xc3 [dccp_ccid2]
[ 4039.051717]  [<f09048d6>] ccid2_hc_tx_packet_sent+0x27f/0x2d9 [dccp_ccid2]
[ 4039.051723]  [<f085486b>] dccp_write_xmit+0x1eb/0x338 [dccp]
[ 4039.051741]  [<f085603d>] dccp_sendmsg+0x113/0x18f [dccp]
[ 4039.051750]  [<c03907fc>] inet_sendmsg+0x2e/0x4c
[ 4039.051758]  [<c033a47d>] sock_aio_write+0xd5/0x107
[ 4039.051766]  [<c015abc1>] do_sync_write+0xcd/0x11c
[ 4039.051772]  [<c015b296>] vfs_write+0x118/0x11f
[ 4039.051840]  [<c015b932>] sys_write+0x3d/0x64
[ 4039.051845]  [<c0103e7c>] syscall_call+0x7/0xb
[ 4039.051848]  =======================

The problem was that GFP_KERNEL was used; fixed by using gfp_any().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-21 20:58:06 -07:00
Jesper Juhl e576de82ee [DCCP]: fix memory leak and clean up style - dccp_feat_empty_confirm()
There's a memory leak in net/dccp/feat.c::dccp_feat_empty_confirm().  If we
hit the 'default:' case of the 'switch' statement, then we return without
freeing 'opt', thus leaking 'struct dccp_opt_pend' bytes.

The leak is fixed easily enough by adding a kfree(opt); before the return
statement.

The patch also changes the layout of the 'switch' to be more in line with
CodingStyle.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-13 22:52:10 -07:00
Oleg Nesterov d725fdc802 [DCCP]: fix theoretical ccids_{read,write}_lock() race
Make sure that spin_unlock_wait() is properly ordered wrt atomic_inc().

(akpm: can't we convert this code to use rwlocks?)

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-13 22:52:09 -07:00
Paul Mundt 20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Linus Torvalds ce8c2293be Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (25 commits)
  [TG3]: Fix msi issue with kexec/kdump.
  [NET] XFRM: Fix whitespace errors.
  [NET] TIPC: Fix whitespace errors.
  [NET] SUNRPC: Fix whitespace errors.
  [NET] SCTP: Fix whitespace errors.
  [NET] RXRPC: Fix whitespace errors.
  [NET] ROSE: Fix whitespace errors.
  [NET] RFKILL: Fix whitespace errors.
  [NET] PACKET: Fix whitespace errors.
  [NET] NETROM: Fix whitespace errors.
  [NET] NETFILTER: Fix whitespace errors.
  [NET] IPV4: Fix whitespace errors.
  [NET] DCCP: Fix whitespace errors.
  [NET] CORE: Fix whitespace errors.
  [NET] BLUETOOTH: Fix whitespace errors.
  [NET] AX25: Fix whitespace errors.
  [PATCH] mac80211: remove rtnl locking in ieee80211_sta.c
  [PATCH] mac80211: fix GCC warning on 64bit platforms
  [GENETLINK]: Dynamic multicast groups.
  [NETLIKN]: Allow removing multicast groups.
  ...
2007-07-19 10:23:21 -07:00
Michael Ellerman 9e367d8592 jprobes: remove JPROBE_ENTRY()
AFAICT now that jprobe.entry is a void *, JPROBE_ENTRY doesn't do anything
useful - so remove it ..

I've left a do-nothing version so that out-of-tree jprobes code will still
compile without modifications.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:44 -07:00
YOSHIFUJI Hideaki 23248005fb [NET] DCCP: Fix whitespace errors.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-07-19 10:43:28 +09:00
YOSHIFUJI Hideaki bb4dbf9e61 [IPV6]: Do not send RH0 anymore.
Based on <draft-ietf-ipv6-deprecate-rh0-00.txt>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:55:49 -07:00
Adrian Bunk 4fda25a2cd [DCCP]: Make struct dccp_li_cachep static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:18:52 -07:00
Gerrit Renker 49d66a70cf [CCID3]: Fix a bug in the send time processing
ccid3_hc_tx_send_packet currently returns 0 when the time difference between
current time and t_nom is less than 1000 microseconds.

In this case the packet is sent immediately; but, unlike other packets that can
be emitted on first attempt, it will not have its window counter updated and
its options set as required. This is a bug.

Fix: Require the time difference to be at least 1000 microseconds. The
algorithm then converges: time differences > 1000 microseconds trigger the
timer in dccp_write_xmit; after timer expiry this function is tried again; when
the time difference is less than 1000, the packet will have its options added
and window counter updated as required.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-07-10 22:15:34 -07:00
Gerrit Renker 8132da4d41 [CCID3]: Sending time: update to ktime_t
This updates the computation of t_nom and t_last_win_count to use the newer
gettimeofday interface.

Committer note: used ktime_to_timeval to set the 'now' variable to t_ld in
                ccid3hctx_no_feedback_timer

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
2007-07-10 22:15:27 -07:00
Arnaldo Carvalho de Melo dd36a9aba4 loss_interval: make struct dccp_li_hist_entry private
net/dccp/ccids/lib/loss_interval.c is the only place where this struct is used.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-07-10 22:15:24 -07:00
Arnaldo Carvalho de Melo cc4d6a3a34 loss_interval: Nuke dccp_li_hist
It had just a slab cache, so, for the sake of simplicity just make
dccp_trfc_lib module init routine create the slab cache, no need for users of
the lib to create a private loss_interval object.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-07-10 22:15:23 -07:00
Arnaldo Carvalho de Melo c70b729e66 loss_interval: Make dccp_li_hist_entry_{new,delete} private
Not used outside the loss_interval code anymore.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-07-10 22:15:22 -07:00