Commit Graph

73488 Commits

Author SHA1 Message Date
Paul Mundt 3d46b2e2fa libata: Support PIO polling-only hosts.
By default ata_host_activate() expects a valid IRQ in order to
successfully register the host. This patch enables a special case
for registering polling-only hosts that either don't have IRQs
or have buggy IRQ generation (either in terms of handling or
sensing), which otherwise work fine.

Hosts that want to use polling mode can simply set ATA_FLAG_PIO_POLLING
and pass in an invalid IRQ.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-11-08 13:08:41 -05:00
Mark Lord 6004bda1cc libata sata_qstor conversion to new error handling (EH).
Convert sata_qstor to use the newer libata EH mechanisms.
Based on earlier work by Jeff Garzik.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-11-08 13:08:41 -05:00
Mark Lord 904c7bad99 libata sata_qstor workaround for spurious interrupts
The qstor hardware generates spurious interrupts from time to time when
switching in and out of packet mode.  These eventually result in the
IRQ being disabled, which kills other devices sharing this IRQ with us.

This workaround isn't perfect, but it's about the best we can do for
this hardware.  Spurious interrupts will still happen, but won't be
logged as such, and therefore won't cause the IRQ to be inadvertently
disabled.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-11-08 13:08:41 -05:00
Mark Lord 12ee7d3ceb libata sata_qstor nuke idle state
We're really only ever in one of two hardware states:  packet, or mmio.
Get rid of unnecessary "qs_state_idle" state.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-11-08 13:08:41 -05:00
Fernando Luis Vázquez Cao 647c595dad nv_hardreset: update dangling reference to bugzilla entry
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-11-08 13:08:40 -05:00
Yann Chachkoff 62320e23c3 ata_piix: add SATELLITE PRO U200 to broken suspend list
Please warmly welcome the PRO variant of Satellite U200 to the broken
suspend list.

Original patch is from Yann Chachkoff.  Patch reformatted and
forwarded by Tejun Heo.

Signed-off-by: Yann Chachkoff <yann.chachkoff@myrealbox.com>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-11-08 13:08:40 -05:00
Benny Halevy ef19454bd4 [LIB] crc32c: Keep intermediate crc state in cpu order
crypto/crc32c.c:chksum_final() is computing the digest as
*(__le32 *)out = ~cpu_to_le32(mctx->crc);
so the low-level crc32c_le routines should just keep
the crc in cpu order, otherwise it is getting swabbed
one too many times on big-endian machines.
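
The double swap can be illustrated with a small user-space sketch. It assumes a big-endian CPU, where cpu_to_le32() amounts to a byte swap; swab32_sketch(), be_cpu_to_le32() and the two digest helpers are hypothetical names for illustration, not the kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swab32_sketch(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
}

/* Assumption: what cpu_to_le32() does on a big-endian machine. */
static uint32_t be_cpu_to_le32(uint32_t x)
{
	return swab32_sketch(x);
}

/* Old behaviour: the low-level routine already swaps, the final step swaps again. */
static uint32_t digest_double_swab(uint32_t crc_cpu)
{
	uint32_t state = be_cpu_to_le32(crc_cpu);	/* first swap */
	return ~be_cpu_to_le32(state);			/* second swap undoes it */
}

/* Intended behaviour: state stays in cpu order, only the final step swaps. */
static uint32_t digest_single_swab(uint32_t crc_cpu)
{
	return ~be_cpu_to_le32(crc_cpu);
}
```

With a cpu-order crc of 0x11223344, the double-swab variant ends up storing the unswapped complement, while the single-swab variant stores the little-endian digest.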

Signed-off-by: Benny Halevy <bhalevy@fs1.bhalevy.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2007-11-08 21:34:09 +08:00
Roland Dreier 8578007065 mmc: Fix sg helper copy-and-paste error
Commit 45711f1a ("[SG] Update drivers to use sg helpers") had the
following bogus change in drivers/mmc/card/queue.c:

    > -			src_buf = page_address(src->page) + src->offset;
    > +			src_buf = sg_virt(dst);

(Notice that "src" is converted to "dst").  Turn this "dst" back into
the intended "src".

Signed-off-by: Roland Dreier <roland@digitalvampire.org>
Tested-by: Romano Giannetti <romano.giannetti@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-08 13:50:58 +01:00
Kevin Hilman a8fa9ba623 [ARM] 4644/2: fix flush_tlb_kern_range() in module space
For kernel addresses between TASK_SIZE and PAGE_OFFSET,
flush_tlb_kern_range() does not work as would be expected.

The TLB invalidate works with a matching ASID, or on entries marked as
global.  The set_pte_at() macro marks addresses >= PAGE_OFFSET as
global, but not addresses from TASK_SIZE to PAGE_OFFSET, which are
also kernel addresses.

The result is that the entries in this range are not actually
invalidated by flush_tlb_kern_range().

This patch instead marks addresses >= TASK_SIZE as global.

Signed-off-by: Satoru Fujii <s-fujii@ct.jp.nec.com>
Signed-off-by: Kevin Hilman <khilman@mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-11-08 10:07:14 +00:00
Paul Mundt 541c547731 Merge branch 'page_colouring_despair' 2007-11-08 17:01:42 +09:00
Tejun Heo fffe487d59 pktcdvd: fix BUG caused by sysfs module reference semantics change
pkt_setup_dev() expects module reference to be held on invocation.
This used to be true for sysfs callbacks but not anymore.  Test and
grab module reference around pkt_setup_dev() in
class_pktcdvd_store_add().

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-08 08:00:24 +01:00
Paul Mackerras 688016f4e2 Merge branch 'for-2.6.24' of master.kernel.org:/pub/scm/linux/kernel/git/jwboyer/powerpc-4xx into merge 2007-11-08 14:28:14 +11:00
Linas Vepstas 2c84b4076c [POWERPC] EEH: Make sure warning message is printed
Fix old buglet; a warning message should have been printed
when a hardware reset takes too long.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:34 +11:00
Johannes Berg 2e6f40deb7 [POWERPC] Make altivec code in swsusp_32.S depend on CONFIG_ALTIVEC
This makes the altivec code in swsusp_32.S depend on CONFIG_ALTIVEC to
avoid build failures for systems that don't have altivec. I'm not sure
whether the code will actually work for other systems, but it was merged
for just ppc32 rather than powermac a very long time ago.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:34 +11:00
Johannes Berg 67b60518b0 [POWERPC] windfarm: Fix windfarm thread freezer interaction
When I first fixed the windfarm freezer interaction in commit
1ed2ddf380, an earlier version of the patch was committed rather than
the one I came up with after review comments. This has come back to haunt
us now because commit d5d8c5976d changed
the freezer to no longer send signals. Fix it by removing the windfarm
thread's signal logic and restoring the original try_to_freeze().

We could simply revert 1ed2ddf380 now,
but I feel that the assertion that no signal is delivered to the
windfarm thread need not be there.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:34 +11:00
Benjamin Herrenschmidt a792e75d9b [POWERPC] Fix si_addr value on low level hash failures
If the low level MMU hash table insertion returns an error (which
can happen in some rare circumstances when the hypervisor refuses
the insertion of a PTE, typically if you try to access junk via
/dev/mem), the generated signal had an incorrect si_addr value due
to a bug in the assembly, which was loading it as a 32-bit quantity
instead of a 64-bit quantity.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:34 +11:00
Olof Johansson 7e22fa4a1d [POWERPC] Refresh ppc64_defconfig and enable pasemi-related options
Refresh ppc64_defconfig, add PPC_PASEMI and various options that the
common boards there need:

* Chip drivers (iommu, ethernet, IDE, CF, EDAC, MDIO/PHY)
* PCMCIA
* PATA_PCMCIA
* RTC_CLASS
* SATA_MV
* SATA_SIL24
* IP_PNP + NFS_ROOT for diskless booting

Plus possibly some other options that I may have failed to list.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:34 +11:00
Olof Johansson 0c83ddfeb4 [POWERPC] pasemi: Update defconfig
Update pasemi_defconfig.  Add a few missing options for default devices
on electra boards, enable tickless and hrtimers, etc, etc.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:33 +11:00
Stephen Rothwell 7992344fde [POWERPC] iSeries: Fix ref counting in vio setup
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:33 +11:00
Li Zefan aca71ef882 [POWERPC] Fix memset size error
The size passed to memset is wrong.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:33 +11:00
Stephen Rothwell e95c91821f [POWERPC] Fix link errors for allyesconfig
An allyesconfig build creates a .text section that is so big that the
.text.init.refok and .fixup sections are too far away for the relocations
to be fixed up correctly. This patch fixes that by linking all the
relevant text sections for each file together.

Suggested by Paul Mackerras.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:33 +11:00
Stephen Rothwell 18244cfbc3 [POWERPC] iSeries_init_IRQ non-PCI tidy
ppc_md.init_IRQ is not called if it is NULL, so we don't need an empty
routine in the non PCI case.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:33 +11:00
Patrick Mansfield f2205fbb5a [POWERPC] Change fallocate to match unistd.h on powerpc
Fix the fallocate system call on powerpc to match its unistd.h.

This implies none of these system calls are currently working with the
unistd.h sys call values:
	fallocate
	signalfd
	timerfd
	eventfd
	sync_file_range2

Signed-off-by: Patrick Mansfield <patmans@us.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:32 +11:00
Linas Vepstas b37ceefe7c [POWERPC] EEH: Avoid crash on null device
Bugfix: avoid crash if there's no PCI device for a given
openfirmware node.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:32 +11:00
Linas Vepstas 2a50f144fc [POWERPC] EEH: Drivers that need reset trump others
Bugfix: if a driver controlling one part of a multi-function PCI card
has asked for a reset, honor that request above all others.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:32 +11:00
Linas Vepstas 638799b335 [POWERPC] EEH: Clean up comments
Clean up commentary, remove dead code.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:32 +11:00
Paul Mackerras 43875cc0a5 [POWERPC] Fix off-by-one error in setting decrementer on Book E/4xx (v2)
The decrementer in Book E and 4xx processors interrupts on the
transition from 1 to 0, rather than on the 0 to -1 transition as on
64-bit server and 32-bit "classic" (6xx/7xx/7xxx) processors.  At the
moment we subtract 1 from the count of how many decrementer ticks are
required before the next interrupt before putting it into the
decrementer, which is correct for server/classic processors, but could
possibly cause the interrupt to happen too early on Book E and 4xx if
the timebase/decrementer frequency is low.

This fixes the problem by making set_dec subtract 1 from the count for
server and classic processors, instead of having the callers subtract
1.  Since set_dec already had a bunch of ifdefs to handle different
processor types, there is no net increase in ugliness. :)

Note that calling set_dec(0) may not generate an interrupt on some
processors.  To make sure that decrementer_set_next_event always calls
set_dec with an interval of at least 1 tick, we set min_delta_ns of
the decrementer_clockevent to correspond to 2 ticks (2 rather than 1
to compensate for truncations in the conversions between ticks and
ns).
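
The off-by-one can be sketched in user-space C. This is a toy model under stated assumptions, not the kernel's set_dec(): the enum, dec_load_value() and ticks_until_interrupt() are hypothetical names, and the decrementer is modelled as a plain counter that drops by one per tick:

```c
#include <assert.h>
#include <stdint.h>

enum cpu_family { CPU_CLASSIC, CPU_BOOKE };

/* Value to load so that the interrupt fires after exactly `ticks` ticks. */
static uint32_t dec_load_value(enum cpu_family f, uint32_t ticks)
{
	assert(ticks >= 1);
	/* Server/classic fires on 0 -> -1, so one extra tick is implied. */
	return f == CPU_CLASSIC ? ticks - 1 : ticks;
}

/* Count ticks until the modelled decrementer raises its interrupt. */
static uint32_t ticks_until_interrupt(enum cpu_family f, uint32_t loaded)
{
	int64_t dec = loaded;
	uint32_t n = 0;

	/* In this model, loading 0 on Book E/4xx never sees a 1 -> 0 transition. */
	assert(f == CPU_CLASSIC || loaded >= 1);

	for (;;) {
		int64_t prev = dec--;
		n++;
		if (f == CPU_BOOKE && prev == 1)	/* 1 -> 0 transition */
			return n;
		if (f == CPU_CLASSIC && prev == 0)	/* 0 -> -1 transition */
			return n;
	}
}
```

Keeping the adjustment in one place means callers pass the tick count they actually want, whichever processor family they run on.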

This also removes a redundant call to set the decrementer to
0x7fffffff - it was already set to that earlier in timer_interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:31 +11:00
will schmidt 465ccab9eb [POWERPC] Fix switch_slb handling of 1T ESID values
Now that we have 1TB segment size support, we need to be using the
GET_ESID_1T macro when comparing ESID values for pc, stack, and
unmapped_base within switch_slb().   A new helper function called
esids_match() contains the logic for deciding when to call GET_ESID
and GET_ESID_1T.

This fixes a duplicate-slb-entry inspired machine-check exception I
was seeing when trying to run java on a power6 partition.

Tested on power6 and power5.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:31 +11:00
Tony Breeds e7bda183d4 [POWERPC] Fix build failure when CONFIG_VIRT_CPU_ACCOUNTING is not defined
Without this patch I get the following build failure:
  CC      arch/powerpc/platforms/celleb/setup.o
arch/powerpc/platforms/celleb/setup.c:151: error: 'generic_calibrate_decr' undeclared here (not in a function)

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:31 +11:00
will schmidt aa39be09df [POWERPC] Include udbg.h when using udbg_printf
This fixes the error
	error: implicit declaration of function "udbg_printf"

We have a few spots where we reference udbg_printf() without #including
udbg.h.  These are within #ifdef DEBUG blocks, so they go unnoticed until
we do a #define DEBUG or #define DEBUG_LOW nearby.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:31 +11:00
Benjamin Herrenschmidt 20474abda6 [POWERPC] Fix cache line vs. block size confusion
We had a historical confusion in the kernel between cache line
size and cache block size. The former is an implementation detail of
the L1 cache which can be useful for performance optimisations;
the latter is the actual size on which the cache control
instructions operate, which can be different.

For some reason, we had a weird hack reading the right property
on powermac and the wrong one on any other 64-bit platform (32-bit is
unaffected as it only uses the cputable for cache block size
information at this stage).

This fixes the booting-without-of.txt documentation to mention
the right properties, and fixes the 64-bit initialization code
to look for the block size first, with a fallback to the line
size if the property is missing.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:30 +11:00
Alexey Dobriyan fb293ae1c0 [POWERPC] Fix sysctl table check failure on PowerMac
kernel was marked with 0755. Everywhere else it's 0555.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:30 +11:00
Olof Johansson 4bfac36891 [POWERPC] Fix CONFIG_SMP=n build break
Fix two build errors on powerpc allyesconfig + CONFIG_SMP=n:

arch/powerpc/platforms/built-in.o: In function `cpu_affinity_set':
arch/powerpc/platforms/cell/spu_priv1_mmio.c:78: undefined reference to `.iic_get_target_id'
arch/powerpc/platforms/built-in.o: In function `iic_init_IRQ':
arch/powerpc/platforms/cell/interrupt.c:397: undefined reference to `.iic_setup_cpu'

Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:30 +11:00
Scott Wood aeb4552fad [POWERPC] bootwrapper: Revert ps3 binary flag usage, and remove .bin suffix
The ps3 target produces two images, and the binary one is not the
"primary" image that corresponds to the -o flag; thus, it no longer
uses the generic binary flag.

On platforms which do use the binary flag, it no longer produces a
.bin suffix, so that the output file matches what was passed to the -o flag.

This should fix the zImage ln problems for the ps3 target.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:30 +11:00
Dale Farnsworth d102c9d5d3 [POWERPC] Fix mv64x60_pci sysfs .read and .write functions
Commit 91a69029 introduced an additional parameter to the .read and .write
methods for sysfs binary attributes.  Two mv64x60_pci functions
were missed in that patch, resulting in these warnings:
	/cache/git/linux-2.6/arch/powerpc/sysdev/mv64x60_pci.c:77: warning: initialization from incompatible pointer type
	/cache/git/linux-2.6/arch/powerpc/sysdev/mv64x60_pci.c:78: warning: initialization from incompatible pointer type

Add the missing "struct bin_attribute *" parameter.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:29 +11:00
Aurelien Jarno 3a800ff50a [POWERPC] i8259: Add disable method
Since commit 76d2160147, the NE2000 card
is not working anymore on PPC and POWERPC and produces WATCHDOG
timeouts.

This patch fixes that the same way it has been done on x86, x86_64
and MIPS.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:29 +11:00
Michael Ellerman 1db3e890ae [POWERPC] Read back MSI message in rtas_setup_msi_irqs() so restore works
There are plans afoot to use pci_restore_msi_state() to restore MSI
state after a device reset.  In order for this to work for the RTAS MSI
backend, we need to read back the MSI message from config space after
it has been setup by firmware.

This should be sufficient for restoring the MSI state after a device
reset, however we will need to revisit this for suspend to disk if that
is ever implemented on pseries.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:29 +11:00
Olof Johansson bdd71eec9b [POWERPC] Fix build break in arch/ppc/syslib/m8260_setup.c
Fix build break and warnings in current mainline git:

arch/ppc/syslib/m8260_setup.c: In function 'm8260_setup_arch':
arch/ppc/syslib/m8260_setup.c:63: error: implicit declaration of function 'identify_ppc_sys_by_name_and_id'
arch/ppc/syslib/m8260_setup.c:64: warning: passing argument 1 of 'in_be32' makes pointer from integer without a cast
arch/ppc/syslib/m8260_setup.c: In function 'm8260_show_cpuinfo':
arch/ppc/syslib/m8260_setup.c:158: warning: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%d' expects type 'int', but argument 6 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%u' expects type 'unsigned int', but argument 7 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%u' expects type 'unsigned int', but argument 8 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%u' expects type 'unsigned int', but argument 9 has type 'long unsigned int'
make[1]: *** [arch/ppc/syslib/m8260_setup.o] Error 1
make[1]: *** Waiting for unfinished jobs....

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:29 +11:00
Paul Mundt 6d1c76d4e7 sh: Kill off broken snapgear ds1302 code.
This will force the snapgear boards to use the on-chip SH RTC instead,
until the rtc-ds1302 driver is merged. The current code is broken
and hasn't built in some time, so just kill it off and get the board
working again.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-11-08 11:24:33 +09:00
Stephen Smalley 45e5421eb5 SELinux: add more validity checks on policy load
Add more validity checks at policy load time to reject malformed
policies and prevent subsequent out-of-range indexing when in permissive
mode.  Resolves the NULL pointer dereference reported in
https://bugzilla.redhat.com/show_bug.cgi?id=357541.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2007-11-08 08:56:23 +11:00
KaiGai Kohei 6d2b685564 SELinux: fix bug in new ebitmap code.
The "e_iter = e_iter->next;" statement in the inner for loop is a
bug.  It should be moved outside of the for loop.

Signed-off-by: KaiGai Kohei <kaigai@kaigai.gr.jp>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2007-11-08 08:55:10 +11:00
Stephen Rothwell 57002bfb31 SELinux: suppress a warning for 64k pages.
On PowerPC allmodconfig build we get this:

security/selinux/xfrm.c:214: warning: comparison is always false due to limited range of data type

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: James Morris <jmorris@namei.org>
2007-11-08 08:55:04 +11:00
Russell King 70dfa3f875 [ARM] Allow watchdog drivers to be selected again
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-11-07 14:13:35 +00:00
Jens Axboe 8ec680e4c3 ioprio: allow sys_ioprio_set() value of 0 to reset ioprio setting
Normally io priorities follow the CPU nice, unless a specific scheduling
class has been set. Once that is set, there's no way to reset the
behaviour to 'none' so that it follows CPU nice again.

Currently, passing in 0 as the ioprio class/value returns -1/EINVAL.
Change that to allow resetting of a previously set scheduling class.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-07 13:54:07 +01:00
Oleg Nesterov 0e7be9edb9 cfq_idle_class_timer: add paranoid checks for jiffies overflow
In theory, if the queue was idle long enough, cfq_idle_class_timer may have
a false (and very long) timeout because jiffies can wrap into the past wrt
->last_end_request.
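
As a rough illustration of the wrap problem (not the code from this patch), the sketch below models jiffies as a 32-bit counter. tick_after_naive() and tick_after() are hypothetical helpers; tick_after() uses the same signed-difference idea as the kernel's time_after() macro:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Naive comparison: wrong as soon as the counter wraps around. */
static bool tick_after_naive(uint32_t a, uint32_t b)
{
	return a > b;
}

/*
 * Wrap-safe comparison: true if a is later than b, valid only while
 * the two timestamps are less than 2^31 ticks apart.
 */
static bool tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}
```

Even the wrap-safe form gives the wrong answer once a timestamp is more than half the counter range old, which is the "idle long enough" case the description refers to.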

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-07 13:51:35 +01:00
Patrick McHardy c3d8d1e30c [NETLINK]: Fix unicast timeouts
Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
by moving the schedule_timeout() call to a new function that doesn't
propagate the remaining timeout back to the caller. This means on each
retry we start with the full timeout again.
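
The effect can be sketched with a toy model. fake_wait() is a hypothetical stand-in for schedule_timeout() that is assumed to be woken after woke_after ticks and to return the unused part of the timeout; none of these functions are the netlink code itself:

```c
#include <assert.h>

/* Stand-in for schedule_timeout(): returns the ticks left unused. */
static long fake_wait(long timeout, long woke_after)
{
	assert(timeout >= 0 && woke_after >= 0);
	return woke_after >= timeout ? 0 : timeout - woke_after;
}

/* Broken pattern: every retry starts again with the full timeout. */
static long waited_restarting(long timeo, int retries, long woke_after)
{
	long total = 0;

	for (int i = 0; i < retries; i++) {
		long remaining = fake_wait(timeo, woke_after);
		total += timeo - remaining;
	}
	return total;
}

/* Fixed pattern: the remaining timeout is carried into the next retry. */
static long waited_propagating(long timeo, int retries, long woke_after)
{
	long total = 0;

	for (int i = 0; i < retries && timeo > 0; i++) {
		long remaining = fake_wait(timeo, woke_after);
		total += timeo - remaining;
		timeo = remaining;
	}
	return total;
}
```

With a 10-tick timeout, 5 retries and a wake-up every 4 ticks, the restarting variant waits 20 ticks in total, while the propagating variant never exceeds the original 10.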

ipc/mqueue.c seems to actually want to wait indefinitely so this
behaviour is retained.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:12 -08:00
Eric Dumazet 230140cffa [INET]: Remove per bucket rwlock in tcp/dccp ehash table.
As was done two years ago for the IP route cache table (commit
22c047ccbc), we can avoid using one
lock per hash bucket for the huge TCP/DCCP hash tables.

On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, for
little performance difference. (We hit a different cache line for the
rwlock, but the bucket cache line then has a better sharing factor
among CPUs, since we dirty it less often.) For netstat or ss commands
that want a full scan of the hash table, we perform fewer memory accesses.

Using a 'small' table of hashed rwlocks should be more than enough to
provide correct SMP concurrency between different buckets, without
using too much memory. Sizing of this table depends on
num_possible_cpus() and various CONFIG settings.
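
The idea of sharing a small lock table among many buckets can be sketched as follows. The sizes, struct sketch_lock and bucket_lock() are hypothetical placeholders; the real table size is chosen at run time and the real locks are rwlocks:

```c
#include <assert.h>
#include <stddef.h>

#define EHASH_BUCKETS 65536u	/* hypothetical number of hash buckets */
#define EHASH_LOCKS   256u	/* hypothetical lock count, a power of two */

/* Placeholder for an rwlock; only the mapping matters in this sketch. */
struct sketch_lock {
	int placeholder;
};

static struct sketch_lock ehash_locks[EHASH_LOCKS];

/* Map a bucket index to the lock that protects it. */
static struct sketch_lock *bucket_lock(unsigned int bucket)
{
	assert(bucket < EHASH_BUCKETS);
	return &ehash_locks[bucket & (EHASH_LOCKS - 1)];
}
```

Buckets whose indices differ by a multiple of the lock count share a lock, so memory use depends on the lock table size rather than on the number of buckets.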

This patch provides some locking abstraction that may ease future
work using a different model for the TCP/DCCP table.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:11 -08:00
Rumen G. Bogdanovski efac52762b [IPVS]: Synchronize closing of Connections
This patch makes the master daemon sync a connection when it is about
to close, so that connections on the backup close or time out
according to their state.  Previously the sync was performed only while the
connection was in the ESTABLISHED state, which always made connections on
the backup time out after the hard-coded 3 minutes.  Andy Gospodarek's patch
([IPVS]: use proper timeout instead of fixed value) effectively did nothing
more than increase this to 15 minutes (the ESTABLISHED state timeout).  This
patch makes use of the proper timeout because it syncs connections on
state changes to FIN_WAIT (2 minute timeout) and CLOSE (10 second timeout).
If the backup misses CLOSE, hopefully it did not miss FIN_WAIT; otherwise
we just have to wait for the ESTABLISHED state timeout, as is the case
without this patch.  This keeps the number of hanging connections
on the backup to a minimum, and very few of them are left to
expire with a long timeout.

This is important if we want to make use of the fix for the real server
overcommit on master/backup fail-over.

Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:10 -08:00
Rumen G. Bogdanovski 1e356f9cdf [IPVS]: Bind connections on standby if the destination exists
This patch fixes the problem with node overload on director fail-over.
Consider 2 nodes, each accepting 3 connections at a time, and 2
directors.  If director fail-over occurs when the nodes are fully loaded (6
connections to the cluster), the new director will assign
another 6 connections to the cluster, provided the same real servers exist
there.

The problem turned out to be that the inherited connections were not bound to
the real servers (destinations) on the backup director.  As a result,
"ipvsadm -l" reports 0 connections:
root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          0
  -> node484.local:5999           Route   1000   0          0

while "ipvsadm -lnc" is right:
root@test2:~# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999
TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999

So the patch I am sending fixes the problem by binding the received
connections to the appropriate service on the backup director, if it
exists, else the connection will be handled the old way. So if the
master and the backup directors are synchronized in terms of real
services there will be no problem with server over-committing since
new connections will not be created on the nonexistent real services
on the backup. However if the service is created later on the backup,
the binding will be performed when the next connection update is
received. With this patch the inherited connections will show as
inactive on the backup:

root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          1
  -> node484.local:5999           Route   1000   0          1

rumen@test2:~$ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C0A800DE:176F wlc
  -> C0A80033:176F      Route   1000   0          1
  -> C0A80032:176F      Route   1000   0          1

Regards,
Rumen Bogdanovski

Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2007-11-07 04:15:09 -08:00
Adrian Bunk c183783e28 [NET]: Remove Documentation/networking/pt.txt
There's no point in keeping documentation for a driver that was
removed many years ago.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:08 -08:00