This fixes the previous fix, which was completely wrong on closer
inspection. This version has been manually tested with a user-space
test harness and generates sane values. A nearly identical patch has
been boot-tested.
The problem arose from changing how kmalloc/kfree handled alignment
padding without updating ksize to match. This brings it in sync.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes the RX/TX byte counters for IPIP, GRE and SIT more
consistent. Previously we included the external IP headers on the
way out but not when the packet is inbound.
The new scheme is to count payload only in both directions. For
IPIP and SIT this simply means the exclusion of the external IP
header. For GRE this means that we exclude the GRE header as
well.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for Ethernet over GRE encapsulation.
This is exposed to user-space with a new link type of "gretap"
instead of "gre". It will create an ARPHRD_ETHER device in
lieu of the usual ARPHRD_IPGRE.
Note that to preserver backwards compatibility all Transparent
Ethernet Bridging packets are passed to an ARPHRD_IPGRE tunnel
if its key matches and there is no ARPHRD_ETHER device whose
key matches more closely.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a netlink interface that will eventually displace
the existing ioctl interface. It utilises the elegant rtnl_link_ops
mechanism.
This also means that user-space no longer needs to rely on the
tunnel interface being of type GRE to identify GRE tunnels. The
identification can now occur using rtnl_link_ops.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the dev->mtu setting out of ipgre_tunnel_bind_dev.
This is in prepartion of using rtnl_link where we'll need to make
the MTU setting conditional on whether the user has supplied an
MTU. This also requires the move of the ipgre_tunnel_bind_dev
call out of the dev->init function so that we can access the user
parameters later.
This patch also adds a check to prevent setting the MTU below
the minimum of 68.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have dev->needed_headroom, we can use it instead of
having a bogus dev->hard_header_len. This also allows us to
include dev->hard_header_len in the MTU computation so that when
we do have a meaningful hard_harder_len in future it is included
automatically in figuring out the MTU.
Incidentally, this fixes a bug where we ignored the needed_headroom
field of the underlying device in calculating our own hard_header_len.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't need to export the governors for use as the default governor,
because the default governor will be built-in anyway and we can access
the symbol directly.
This also fixes the following sparse warnings:
drivers/cpufreq/cpufreq_conservative.c:578:25: warning: symbol 'cpufreq_gov_conservative' was not declared. Should it be static?
drivers/cpufreq/cpufreq_ondemand.c:582:25: warning: symbol 'cpufreq_gov_ondemand' was not declared. Should it be static?
drivers/cpufreq/cpufreq_performance.c:39:25: warning: symbol 'cpufreq_gov_performance' was not declared. Should it be static?
drivers/cpufreq/cpufreq_powersave.c:38:25: warning: symbol 'cpufreq_gov_powersave' was not declared. Should it be static?
drivers/cpufreq/cpufreq_userspace.c:190:25: warning: symbol 'cpufreq_gov_userspace' was not declared. Should it be static?
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Dave Jones <davej@redhat.com>
Use get_cpu_idle_time_us() to get micro-accounted idle information.
This enables ondemand to get more accurate idle and busy timings
than the jiffy based calculation. As a result, we can decrease
the ondemand safety gaurd band from 80-10 to 95-3.
Results in more aggressive power savings.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
export get_cpu_idle_time_us() for it to be used in ondemand governor.
Last update time can be current time when the CPU is currently non-idle,
accounting for the busy time since last idle.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Use a parameter for down differential, instead of hardcoded 10%. Follow-on
patch changes the down-differential dynamically, based on whether
we are using idle micro-accounting or not.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Preparatory changes for doing idle micro-accounting in ondemand governor.
get_cpu_idle_time() gets extra parameter and returns idle time and also the
wall time that corresponds to the idle time measurement.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Change the load calculation algorithm in ondemand to work well with software
coordination of frequency across the dependent cpus.
Multiply individual CPU utilization with the average freq of that logical CPU
during the measurement interval (using getavg call). And find the max CPU
utilization number in terms of CPU freq. That number is then used to
get to the target freq for next sampling interval.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination where policy->cpu may not be same as the CPU on which we
want to getavg frequency.
A follow-on patch will use this parameter to getavg freq from all cpus
in policy->cpus.
Change since last patch. Fix the offline/online and suspend/resume
oops reported by Youquan Song <youquan.song@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Venki Pallipadi made a similar change to the ondemand governor a while
back (in commit 28287033e1). It seems to
work just as well in the conservative governor, leading to fewer wakeups
as reported by powertop.
Signed-off-by: Ben Slusky <sluskyb@paranoiacs.org>
Signed-off-by: Dave Jones <davej@redhat.com>
After calling cpufreq_cpu_get, error handling code should call
cpufreq_cpu_put.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r@
expression x,E;
statement S;
position p1,p2,p3;
@@
(
if ((x = cpufreq_cpu_get@p1(...)) == NULL || ...) S
|
x = cpufreq_cpu_get@p1(...)
... when != x
if (x == NULL || ...) S
)
<...
if@p3 (...) { ... when != cpufreq_cpu_put(x)
when != if (x) { ... cpufreq_cpu_put(x); ...}
return@p2 ...;
}
...>
(
return x;
|
return 0;
|
x = E
|
E = x
|
cpufreq_cpu_put(x)
)
@exists@
position r.p1,r.p2,r.p3;
expression x;
int ret != 0;
statement S;
@@
* x = cpufreq_cpu_get@p1(...)
<...
* if@p3 (...)
S
...>
* return@p2 \(NULL\|ret\);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dave Jones <davej@redhat.com>
Add error handling for cpufreq_register_governor() error
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: cpufreq@lists.linux.org.uk
Signed-off-by: Dave Jones <davej@redhat.com>
add error handling for cpufreq_register_driver() error
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: cpufreq@lists.linux.org.uk
Signed-off-by: Dave Jones <davej@redhat.com>
Replace the no longer working links and email address in the
documentation and in source code.
Signed-off-by: Márton Németh <nm127@freemail.hu>
Signed-off-by: Dave Jones <davej@redhat.com>
1. arch/powerpc/platforms/pasemi/gpio_mdio.c also needs to be
converted over to mdiobus_{alloc,free}().
2. drivers/net/phy/fixed.c used to embed a struct mii_bus into its
struct fixed_mdio_bus and then use container_of() to go from the
former to the latter. Since mii bus structures are no longer
embedded, we need to do something like use the mii bus private
pointer to go from mii_bus to fixed_mdio_bus instead.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add /proc/sys/kernel/sched_domain/cpu0/domain0/name, to make
it easier to see which specific scheduler domain remained at
that entry.
Since we process the scheduler domain tree and
simplify it, it's not always immediately clear during debugging
which domain came from where.
depends on CONFIG_SCHED_DEBUG=y.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Enable driver checking of the DMI product name (when enabled) on
an Abit AT8 32X, instead of falling back to a manual probe. This
eliminates false negatives and eventually will help avoid
unnecessary bus probes on unsupported mainboards.
Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk>
Tested-by: Daniel Exner <dex@dragonslave.de>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
The table for the Abit AT8 32X was incorrectly missing an entry
for the sixth ("AUX3") fan. Add this entry, exporting the fan
reading to userspace.
Closes lm-sensors.org ticket #2339.
Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk>
Tested-by: Daniel Exner <dex@dragonslave.de>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Describe the sysfs files that were introduced in the ibmaem driver.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
On the Shuttle SN68PT, FAN_CTL2 is apparently not connected to a fan,
but to something else. One user has reported instant system power-off
when changing the PWM2 duty cycle, so we disable it.
I use the board name string as the trigger in case the same board is
ever used in other systems.
This closes lm-sensors ticket #2349:
pwmconfig causes a hard poweroff
http://www.lm-sensors.org/ticket/2349
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Creates a name file in the sysfs directory, that
is needed for the libsensors library to work.
Also rename fan1_pwm to pwm1 and scale its value as needed.
This fixes bug #11520:
http://bugzilla.kernel.org/show_bug.cgi?id=11520
Signed-off-by: Corentin Chary <corentincj@iksaif.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Fix kernel-doc in new functions:
Error(mmotm-2008-1002-1617//fs/block_dev.c:895): duplicate section name 'Description'
Error(mmotm-2008-1002-1617//fs/block_dev.c:924): duplicate section name 'Description'
Warning(mmotm-2008-1002-1617//fs/block_dev.c:1282): No description found for parameter 'pathname'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Since all bio_split calls refer the same single bio_split_pool, the bio_split
function can use bio_split_pool directly instead of the mempool_t parameter;
then the mempool_t parameter can be removed from bio_split param list, and
bio_split_pool is only referred in fs/bio.c file, can be marked static.
Signed-off-by: Denis ChengRq <crquan@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Helper function to find the sector offset in a bio given bvec index
and page offset.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This is a wrapper for accessing a gendisk's integrity bits. It allows
the integrity support in MD to be compiled with BLK_DEV_INTEGRITY off.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The DM and MD integrity support now depends on being able to use
gendisks instead of block_devices when comparing integrity profiles.
Change function parameters accordingly.
Also update comparison logic so that two NULL profiles are a valid
configuration.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
- kobject_del already puts the parent.
- Set integrity profile to NULL to prevent stale data.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
A filesystem might supply its own integrity metadata. Introduce a
flag that indicates whether the filesystem or the block layer owns the
integrity buffer.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The whole bio_integrity() definition is inside an #ifdef
CONFIG_BLK_DEV_INTEGRITY, there's no need for the conditional code.
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch removes end_queued_request() and end_dequeued_request(),
which are no longer used.
As a results, users of __end_request() became only end_request().
So the actual code in __end_request() is moved to end_request()
and __end_request() is removed.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch converts elevator to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
Related 'uptodate' arguments is converted to 'error'.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch converts gdrom to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
gd.transfer is '1' in error cases and '0' in non-error cases,
so gdrom hasn't been propagating any error code to the block layer.
We can just convert error cases to '-EIO'.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch converts memstick to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch converts virtio_blk to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
Related 'uptodate' argument is converted to 'error'.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Define as 32, which is is what BDEVNAME_SIZE is/was as well. This keeps
the user interface the same and gets rid of the difference between
kernel and user api here.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds an new interface, blk_lld_busy(), to check lld's
busy state from the block layer.
blk_lld_busy() calls down into low-level drivers for the checking
if the drivers set q->lld_busy_fn() using blk_queue_lld_busy().
This resolves a performance problem on request stacking devices below.
Some drivers like scsi mid layer stop dispatching request when
they detect busy state on its low-level device like host/target/device.
It allows other requests to stay in the I/O scheduler's queue
for a chance of merging.
Request stacking drivers like request-based dm should follow
the same logic.
However, there is no generic interface for the stacked device
to check if the underlying device(s) are busy.
If the request stacking driver dispatches and submits requests to
the busy underlying device, the requests will stay in
the underlying device's queue without a chance of merging.
This causes performance problem on burst I/O load.
With this patch, busy state of the underlying device is exported
via q->lld_busy_fn(). So the request stacking driver can check it
and stop dispatching requests if busy.
The underlying device driver must return the busy state appropriately:
1: when the device driver can't process requests immediately.
0: when the device driver can process requests immediately,
including abnormal situations where the device driver needs
to kill all requests.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
blk_start_queueing() should act like the generic queue unplugging
and kicking and ignore a stopped queue. Such a queue may not be
run until after a call to blk_start_queue().
Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This header file is of interest for user space programming, i.e.
for tools that process blktrace data.
We would like to use it for a tool on-top of blktrace which processes
data provided by blktrace. For this purpose, it would be helpful
if the blktrace API would make it to /usr/include/linux.
The git tree for the blktrace tools comes with its own copy of this header
file. I didn't manage to replace that copy with the file generated
by the patch below yet. A few more cleanups would be needed.
For example, the blktrace ioctl numbers, which are currently defined in
usr/include/fs.h, might need to be moved. Should be feasible, though.
Signed-off-by: Sven Schuetz <sven@linux.vnet.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
By only allowing async IO to consume 3/4 ths of the tag depth, we
always have slots free to serve sync IO. This is important to avoid
having writes fill the entire tag queue, thus starving reads.
Original patch and idea from Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
We really need to know about the hardware tagging support as well,
since if the SSD does not do tagging then we still want to idle.
Otherwise have the same dependent sync IO vs flooding async IO
problem as on rotational media.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>