This corrects an error in the computation of the open loss interval I_0:
* the interval length is (highest_seqno - start_seqno) + 1
* and not (highest_seqno - start_seqno).
This condition was not fully clear in RFC 3448, but reflects the current
revision state of rfc3448bis and is also consistent with RFC 4340, 6.1.1.
Further changes:
----------------
* variable renamed due to line length constraints;
* explicit typecast to `s64' to avoid implicit signed/unsigned casting.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This fixes a bug in the logic of the TFRC loss detection:
* new_loss_indicated() should not be called while a loss is pending;
* but the code allows this;
* thus, for two subsequent gaps in the sequence space, when loss_count
has not yet reached NDUPACK=3, the loss_count is falsely reduced to 1.
To avoid further and similar problems, all loss handling and loss detection is
now done inside tfrc_rx_hist_handle_loss(), using an appropriate routine to
track new losses.
Further changes:
----------------
* added a reminder that no RX history operations should be performed when
rx_handle_loss() has identified a (new) loss, since the function takes
care of packet reordering during loss detection;
* made tfrc_rx_hist_loss_pending() bool (thanks to an earlier suggestion
by Arnaldo);
* removed unused functions.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
RFC 4340, 7.7 specifies up to 6 bytes for the NDP Count option, whereas the code
is currently limited to up to 3 bytes. This seems to be a relict of an earlier
draft version and is brought up to date by the patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
The TFRC loss detection code used the wrong loss condition (RFC 4340, 7.7.1):
* the difference between sequence numbers s1 and s2 instead of
* the number of packets missing between s1 and s2 (one less than the distance).
Since this condition appears in many places of the code, it has been put into a
separate function, dccp_loss_free().
Further changes:
----------------
* tidied up incorrect typing (it was using `int' for u64/s64 types);
* optimised conditional statements for common case of non-reordered packets;
* rewrote comments/documentation to match the changes.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This fixes a bug in computing the inter-packet-interval t_ipi = s/X:
scaled_div32(a, b) uses u32 for b, but in "scaled_div32(s, X)" the type of the
sending rate `X' is u64. Since X is scaled by 2^6, this truncates rates greater
than 2^26 Bps (~537 Mbps).
Using full 64-bit division now.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This fixes a bug in the reverse lookup of p: given a value f(p), instead of p,
the function returned the smallest tabulated value f(p).
The smallest tabulated value of
10^6 * f(p) = sqrt(2*p/3) + 12 * sqrt(3*p/8) * (32 * p^3 + p)
for p=0.0001 is 8172.
Since this value is scaled by 10^6, the outcome of this bug is that a loss
of 8172/10^6 = 0.8172% was reported whenever the input was below the table
resolution of 0.01%.
This means that the value was over 80 times too high, resulting in large spikes
of the initial loss interval, thus unnecessarily reducing the throughput.
Also corrected the printk format (%u for u32).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This patch fixes the following sparse warnings:
* nested min(max()) expression:
net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one
net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one
* Declaration of function prototypes in .c instead of .h file, resulting in
"should it be static?" warnings.
* Declared "struct dccpw" static (local to dccp_probe).
* Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3
("Receivers SHOULD implement delayed acknowledgement timers ...").
* Used a different local variable name to avoid
net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one
net/dccp/ackvec.c:238:33: originally declared here
* Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This hooks up the TFRC Loss Interval database with CCID 3 packet reception.
In addition, it makes the CCID-specific computation of the first loss
interval (which requires access to all the guts of CCID3) local to ccid3.c.
The patch also fixes an omission in the DCCP code, that of a default /
fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while
at it, the upper bound of 4 seconds for an RTT sample has been reduced to
match the initial TCP RTO value of 3 seconds from[RFC 1122, 4.2.3.1].
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves two inlines back to packet_history.h: these are not private
to packet_history.c, but are needed by CCID3/4 to detect whether a new
loss is indicated, or whether a loss is already pending.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A ringbuffer-based implementation of loss interval history is easier to
maintain, allocate, and update.
The `swap' routine to keep the RX history sorted is due to and was written
by Arnaldo Carvalho de Melo, simplifying an earlier macro-based variant.
Details:
* access to the Loss Interval Records via macro wrappers (with safety checks);
* simplified, on-demand allocation of entries (no extra memory consumption on
lossless links); cache allocation is local to the module / exported as service;
* provision of RFC-compliant algorithm to re-compute average loss interval;
* provision of comprehensive, new loss detection algorithm
- support for all cases of loss, including re-ordered/duplicate packets;
- waiting for NDUPACK=3 packets to fill the hole;
- updating loss records when a late-arriving packet fills a hole.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves the inlines (which were previously declared as macros) back into
packet_history.h since the loss detection code needs to be able to read entries
from the RX history in order to create the relevant loss entries: it needs at
least tfrc_rx_hist_loss_prev() and tfrc_rx_hist_last_rcv(), which in turn
require the definition of the other inlines (macros).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This separates RX/TX initialisation and puts all packet history / loss intervals
initialisation into tfrc.c.
The organisation is uniform: slab declaration -> {rx,tx}_init() -> {rx,tx}_exit()
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moved up the comment "Receiver routines" above the first occurrence of
RX history routines.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Credit here goes to Gerrit Renker, that provided the initial implementation for
this new codebase.
I modified it just to try to make it closer to the existing API, renaming some
functions, add namespacing and fix one bug where the tfrc_rx_hist_alloc was not
freeing the allocated ring entries on the error path.
Original changeset comment from Gerrit:
-----------
This provides a new, self-contained and generic RX history service for TFRC
based protocols.
Details:
* new data structure, initialisation and cleanup routines;
* allocation of dccp_rx_hist entries local to packet_history.c,
as a service exported by the dccp_tfrc_lib module.
* interface to automatically track highest-received seqno;
* receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
* a generic function to test for `data packets' as per RFC 4340, sec. 7.7.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is in preparation for merging the new rx history code written by Gerrit Renker.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is in preparation for merging the new rx history code written by Gerrit Renker.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the tfrc_lib module in the following manner:
(1) a dedicated tfrc source file to call the packet history &
loss interval init/exit functions.
(2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'.
Commiter note: renamed tfrc_module.c to tfrc.c, and made CONFIG_IP_DCCP_CCID3
select IP_DCCP_TFRC_LIB.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on a previous patch by Gerrit Renker.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch was based on another made by Gerrit Renker, his changelog was:
------------------------------------------------------
The patch set migrates TFRC TX history to a singly-linked list.
The details are:
* use of a consistent naming scheme (all TFRC functions now begin with `tfrc_');
* allocation and cleanup are taken care of internally;
* provision of a lookup function, which is used by the CCID TX infrastructure
to determine the time a packet was sent (in turn used for RTT sampling);
* integration of the new interface with the present use in CCID3.
------------------------------------------------------
Simplifications I did:
. removing the tfrc_tx_hist_head that had a pointer to the list head and
another for the slabcache.
. No need for creating a slabcache for each CCID that wants to use the TFRC
tx history routines, create a single slabcache when the dccp_tfrc_lib module
init routine is called.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The moving average computation occurs so frequently in the CCID 3 code that
it merits an inline function of its own. This is uses a suggestion by
Arnaldo as per http://www.mail-archive.com/dccp@vger.kernel.org/msg01662.html
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It had just a slab cache, so, for the sake of simplicity just make
dccp_trfc_lib module init routine create the slab cache, no need for users of
the lib to create a private loss_interval object.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This fixes a bug which uses an invalid comparison.
The bug resulted in the use of invalid loss intervals.
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No code change at all.
This reorders the source file to follow the same order as the corresponding
header file.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
No code change at all.
To make the header file easier to read, the following ordering is established
among the declarations:
* hist_new
* hist_delete
* hist_entry_new
* hist_head
* hist_find_entry
* hist_add_entry
* hist_entry_delete
* hist_purge
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This removes the `dccphtx_ccval' field since it is nowhere used in the code and
in fact not necessary for the accounting.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
In migrating towards using the newer functions scaled_div/scaled_div32
for TFRC computations mapped from floating-point onto integer arithmetic,
this completes the last stage of modifications.
In particular, the overflow case for computing X_calc is circumvented by
* breaking the computation into two stages
* the first stage, res = (s*1E6)/R, cannot overflow due to use of u64
* in the second stage, res = (res*1E6)/f, overflow on u32 is avoided due
to (i) returning UINT_MAX in this case (which is logically appropriate)
and (ii) issuing a warning message into the system log (since very likely
there is a problem somewhere else with the parameters)
Lastly, all such scaling operations are now exported into tfrc.h, since
actually this form of scaled computation is specific to TFRC and not to CCID3.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_ATOMIC is an alias of GFP_ATOMIC
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This replaces the linear search algorithm for reverse lookup with
binary search.
It has the advantage of better scalability: O(log2(N)) instead of O(N).
This means that the average number of iterations is reduced from 250
(linear search if each value appears equally likely) down to at most 9.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This
* adds documentation about the lowest resolution that is possible within
the bounds of the current lookup table
* defines a constant TFRC_SMALLEST_P which defines this resolution
* issues a warning if a given value of p is below resolution
* combines two previously adjacent if-blocks of nearly identical
structure into one
This patch does not change the algorithm as such.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
1) For the forward X_calc lookup, it
* protects effectively against RTT=0 (this case is possible), by
returning the maximal lookup value instead of just setting it to 1
* reformulates the array-bounds exceeded condition: this only happens
if p is greater than 1E6 (due to the scaling)
* the case of negative indices can now with certainty be excluded,
since documentation shows that the formulas are within bounds
* additional protection against p = 0 (would give divide-by-zero)
2) For the reverse lookup, it warns against
* protects against exceeding array bounds
* now returns 0 if f(p) = 0, due to function definition
* warns about minimal resolution error and returns the smallest table
value instead of p=0 [this would mask congestion conditions]
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This fixes the following small error in tfrc_calc_x_reverse_lookup.
1) The table is generated by the following equations:
lookup[index][0] = g((index+1) * 1000000/TFRC_CALC_X_ARRSIZE);
lookup[index][1] = g((index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE);
where g(q) is 1E6 * f(q/1E6)
2) The reverse lookup assigns an entry in lookup[index][small]
3) This index needs to match the above, i.e.
* if small=0 then
p = (index + 1) * 1000000/TFRC_CALC_X_ARRSIZE
* if small=1 then
p = (index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE
These are exactly the changes that the patch makes; previously the code did
not conform to the way the lookup table was generated (this difference resulted
in a mean error of about 1.12%).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This adds documentation for the TCP Reno throughput equation which is at
the heart of the TFRC sending rate / loss rate calculations.
It spells out precisely how the values were determined and what they mean.
The equations were derived through reverse engineering and found to be
fully accurate (verified using test programs).
This patch does not change any code.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This reaps the benefit of the earlier patch, which changed the type of
CCID 3 states to use enums, in that many conditions are now simplified
and the number of possible (unexpected) values is greatly reduced.
In a few instances, this also allowed to simplify pre-conditions; where
care has been taken to retain logical equivalence.
[DCCP]: Introduce a consistent BUG/WARN message scheme
This refines the existing set of DCCP messages so that
* BUG(), BUG_ON(), WARN_ON() have meaningful DCCP-specific counterparts
* DCCP_CRIT (for severe warnings) is not rate-limited
* DCCP_WARN() is introduced as rate-limited wrapper
Using these allows a faster and cleaner transition to their original
counterparts once the code has matured into a full DCCP implementation.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This fixes CCID3 to give much closer performance to RFC4342.
CCID3 is meant to alter sending rate based on RTT and loss.
The performance was verified against:
http://wand.net.nz/~perry/max_download.php
For example I tested with netem and had the following parameters:
Delayed Acks 1, MSS 256 bytes, RTT 105 ms, packet loss 5%.
This gives a theoretical speed of 71.9 Kbits/s. I measured across three
runs with this patch set and got 70.1 Kbits/s. Without this patchset the
average was 232 Kbits/s which means Linux can't be used for CCID3 research
properly.
I also tested with netem turned off so box just acting as router with 1.2
msec RTT. The performance with this is the same with or without the patch
at around 30 Mbit/s.
Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>