There's really no reason to keep vfs_stat_fd and vfs_lstat_fd with
Oleg's vfs_fstatat. Use vfs_fstatat for the few cases having the
directory fd, and switch all others to vfs_stat / vfs_lstat.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is a version incorporating Christoph's suggestion.
Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Remove open-coded memdup_user().
Note this changes some GFP_NOFS to GFP_KERNEL, since copy_from_user() may
cause pagefault, it's pointless to pass GFP_NOFS to kmalloc().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Documentation/filesystems/vfs.txt incorrectly states that the kernel is
locked during the call to statfs (Documentation/filesystems/Locking
correctly says it is not). This patch removes the offending sentence.
remove reference to BKL being held in statfs
Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
bug to generic code which had been extant for a long time in the XFS
version -- it started to call through into lookup_one_len() and hence
into the file systems' ->lookup() methods without i_mutex held on the
directory.
This patch fixes it by locking the directory's i_mutex again before
calling the filldir functions. The original deadlocks which commit
14f7dd63 was designed to avoid are still avoided, because they were due
to fs-internal locking, not i_mutex.
While we're at it, fix the return type of nfsd_buffered_readdir() which
should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
Sparse would have caught that, if it wasn't so busy bitching about
__cold__.
Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
code") introduced a similar problem with calling lookup_one_len()
without i_mutex, which this patch also addresses. To fix that, it was
necessary to fix the called functions so that they expect i_mutex to be
held; that part was done by J. Bruce Fields.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Umm-I-can-live-with-that-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Tested-by: J. Bruce Fields <bfields@citi.umich.edu>
LKML-Reference: <8036.1237474444@jrobl>
Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In file included from fs/compat_ioctl.c:61:
include/linux/loop.h:59: error: field 'lo_bio_list' has incomplete type
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
mnt should remain the same for all iterations through the list;
as it is, if we have a busy mount, mnt follows into it and isn't
restored for the next iteration.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
AFAICS, we have a subtle bug there: if we have crossed mountpoint
*and* it got mount --move'd away, we'll be holding only one
reference to fs containing dentry - exp->ex_path.mnt. IOW, we
ought to dput() before exp_put().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We shouldn't just touch the namespace of current process
Caught-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Missing conversion from kernel to userland dev_t; this sucker
breaks as soon as we get sufficiently many autofs mounts for
new_encode_dev(s_dev) != s_dev.
Note: this is the minimal fix.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
mmap2 uses a fixed page shift of 12, regardless of the PAGE_SIZE setting.
Fix up the mmap2 code to add some sanity checks on the mapping, and to
update pgoff accordingly.
Error handling bits based on 4280e3126f
("frv: fix mmap2 error handling").
Signed-off-by: Toshinobu Sugioka <sugioka@itonet.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
reada_for_balance was using the wrong index into the path node array,
so it wasn't reading the right blocks. We never directly used the
results of the read done by this function because the btree search is
started over at the end.
This fixes reada_for_balance to reada in the correct node and to
avoid searching past the last slot in the node. It also makes sure to
hold the parent lock while we are finding the nodes to read.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The extent_io writepage call updates the writepage index in the inode
as it makes progress. But, it was doing the update after unlocking the page,
which isn't legal because page->mapping can't be trusted once the page
is unlocked.
This lead to an oops, especially common with compression turned on. The
fix here is to update the writeback index before unlocking the page.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Btrfs is using WRITE_SYNC_PLUG to send down synchronous IOs with a
higher priority. But, the checksumming helper threads prevent it
from being fully effective.
There are two problems. First, a big queue of pending checksumming
will delay the synchronous IO behind other lower priority writes. Second,
the checksumming uses an ordered async work queue. The ordering makes sure
that IOs are sent to the block layer in the same order they are sent
to the checksumming threads. Usually this gives us less seeky IO.
But, when we start mixing IO priorities, the lower priority IO can delay
the higher priority IO.
This patch solves both problems by adding a high priority list to the async
helper threads, and a new btrfs_set_work_high_prio(), which is used
to make put a new async work item onto the higher priority list.
The ordering is still done on high priority IO, but all of the high
priority bios are ordered separately from the low priority bios. This
ordering is purely an IO optimization, it is not involved in data
or metadata integrity.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Part of reducing fsync/O_SYNC/O_DIRECT latencies is using WRITE_SYNC for
writes we plan on waiting on in the near future. This patch
mirrors recent changes in other filesystems and the generic code to
use WRITE_SYNC when WB_SYNC_ALL is passed and to use WRITE_SYNC for
other latency critical writes.
Btrfs uses async worker threads for checksumming before the write is done,
and then again to actually submit the bios. The bio submission code just
runs a per-device list of bios that need to be sent down the pipe.
This list is split into low priority and high priority lists so the
WRITE_SYNC IO happens first.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] fix allmodconfig compilation breakage.
[IA64] smp_flush_tlb_mm() should only send IPI's to cpus in cpu_vm_mask
[IA64] export smp_send_reschedule
This patch fixes the following compilation error caused by recursive
inclusion of kernel.h which defines BUILD_BUG_ON().
In this case, the case it catches will be caught by the case
CONFIG_PARAVIRT=n, so removing it would not hurt compile time check
very much. So fix the breakage by removing it.
CC arch/ia64/kernel/asm-offsets.s
In file included from include/linux/bitops.h:17,
from include/linux/kernel.h:15,
from include/linux/sched.h:52,
from arch/ia64/kernel/asm-offsets.c:9:
arch/ia64/include/asm/bitops.h: In function 'set_bit':
arch/ia64/include/asm/bitops.h:47: error: implicit declaration of function 'BUILD_BUG_ON'
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
agp: zero pages before sending to userspace
drm: check for minor master before allowing drop master.
drm: set/clear is_master when master changed
drm: clean dirty memory after device release
drm: count reaches -1
* 'for-linus' of git://neil.brown.name/md:
md: support bitmaps on RAID10 arrays larger then 2 terabytes
md: update sync_completed and reshape_position even more often.
md: improve usefulness and accuracy of sysfs file md/sync_completed.
md: allow setting newly added device to 'in_sync' via sysfs.
md: tiny md.h cleanups
Add MAINTAINERS record for FS-Cache and CacheFiles.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stop the FRV arch from attempting to #include <linux/blk.h> as it doesn't
exist.
Reported-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
notice one system /proc/iomem some entries missed the name for pci_devices
it turns that dev->dev.kobj name is changed after device_add.
for pci code: via acpi_pci_root_driver.ops.add (aka acpi_pci_root_add)
==> pci_acpi_scan_root is used to scan pci bus/device, and at the same
time we read the resource for pci_dev in the pci_read_bases, we have
res->name = pci_name(pci_dev); pci_name is calling dev_name.
later via acpi_pci_root_driver.ops.start (aka acpi_pci_root_start) ==>
pci_bus_add_device to add all pci_dev in kobj tree. pci_bus_add_device
will call device_add.
actually in device_add
/* first, register with generic layer. */
error = kobject_add(&dev->kobj, dev->kobj.parent, "%s", dev_name(dev));
if (error)
goto Error;
will get one new name for that kobj, old name is freed.
[Impact: fix corrupted names in /proc/iomem ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows for the possibility of returning VM_FAULT_OOM as
well as VM_FAULT_SIGBUS. This ensures that the correct action
is taken.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
The dirty bit can get set during the inode glock sync. Its too
complicated to change that at the moment, so this is the quick
fix - to clear the bit again at the end of the function.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
.. and other arrays with components larger than 2 terabytes.
We use a "long" rather than a "sector_t" in part of the bitmap
size calculations, which is sad.
Reported-by: "Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE>
Signed-off-by: NeilBrown <neilb@suse.de>
AGP pages might be mapped into userspace finally, so the pages should be
set to zero before userspace can use it. Otherwise there is potential
information leakage.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When fast user switching a lot eventually we get to the point,
where we were checking for the wrong thing in this function.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The variable is_master is being used to track the drm_file that is currently
master, so its value needs to be updated accordingly when the master is
changed.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In current code we register/unregister connector object by
drm_sysfs_connector_add/remove function.
However under some cases, we need to dynamically register or unregister device
multiple times, so we have to go through register -> unregister ->register
routine.
Because after device_unregister function our memory is dirty, we need to do
clean operation in order to re-register the device, otherwise the system
will crash. The patch intends to clean device after device release.
Signed-off-by: Ma Ling <ling.ma@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With a postfix decrement in the test count will reach -1 rather than 0,
subsequent tests fail.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Commit 900af0d973 (PM: Change suspend
code ordering) changed the ordering of suspend code in such a way
that the platform .prepare() callback is now executed after the
device drivers' late suspend callbacks have run. Unfortunately, this
turns out to break ARM platforms that need to talk via I2C to power
control devices during the .prepare() callback.
For this reason introduce two new platform suspend callbacks,
.prepare_late() and .wake(), that will be called just prior to
disabling non-boot CPUs and right after bringing them back on line,
respectively, and use them instead of .prepare() and .finish() for
ACPI suspend. Make the PM core execute the .prepare() and .finish()
platform suspend callbacks where they were executed previously (that
is, right after calling the regular suspend methods provided by
device drivers and right before executing their regular resume
methods, respectively).
It is not necessary to make analogous changes to the hibernation
code and data structures at the moment, because they are only used
by ACPI platforms.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Len Brown <len.brown@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: hda - Set function_id only on FG nodes
ALSA: MAINTAINERS - Update SOUND
ALSA: emu10k1 - off by 1 in snd_emu10k1_wait()
ASoC: OMAP: Fix FS polarity in OSK5912 machine driver
ASoC: OMAP: Fix DSP_B format in OMAP McBSP DAI driver
ASoC: Fix include build error in s3c2412-i2s.c
ASoC: Fix s3c-i2s-v2.c snd_soc_dai changes
ASoC: s3c-i2s-v2.c fix for s3c_i2sv2_iis_calc_rate
ASoC: Fix jive_wm8750.c build problems
ASoC: pxa-ssp: allow setting of dai format 0
ALSA: hda - Add upper-limit of mixer amp for AD1884A-laptop model, too
ALSA: hda - Fix headphone-detection on some machines with STAC/IDT codecs
ALSA: Intel8x0: Add hp_only quirk for SSID 0x1028016a (Dell Inspiron 8600)
ALSA: Intel8x0: Remove conflicting quirk for SSID 0x103c0934
ALSA: hda_intel.c - Consolidate bitfields
This reverts commit 1c55f18717.
Ingo Brueckl was assuming that reverting to 1:1 mapping for chars >= 128
was not useful, but it happens to be: due to the limitations of the
Linux console, when a blind user wants to read BIG5 on it, he has no
other way than loading a font without SFM and let the 1:1 mapping permit
the screen reader to get the BIG5 encoding.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
<linux/seccomp.h> uses EINVAL so should include <linux/errno.h>. This
fixes a build error on 64-bit MIPS if CONFIG_SECCOMP is disabled.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 0a1c01c947 ("Make relatime
default") when a file system is mounted explicitely with noatime it gets
both the MNT_RELATIME and MNT_NOATIME bits set.
This shows up like this in /proc/mounts:
/dev/xxx /yyy ext3 rw,noatime,relatime,errors=continue,data=writeback 0 0
That looks strange. The VFS uses noatime in this case, but both flags
are set. So it's more a cosmetic issue, but still better to fix.
Cc: mjg@redhat.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert noted that we don't actually document that lguest is 32-bit only,
nor that PAE must be off (CONFIG_PAE is now prompted for if HIGHMEM is
set to "off).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: lguest@ozlabs.org
Cc: "Robert P. J. Day" <rpjday@crashcourse.ca>
Break out of wait_event_interruptible() if freezing has been requested,
in the vballoon thread. Without this change vballoon refuses to stop and
the system can't suspend.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Fixes guest crash 'lguest: bad read address 0x4800000 len 256'
The new per-cpu allocator ends up handing a non-linear address to
write_gdt_entry. We do __pa() on it, and hand it to the host, which
kills us.
I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT
code, but had no pressing reason until now.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: lguest@ozlabs.org
Typical message: 'lguest: unhandled trap 6 at 0x418726 (0x0)'
vmlinux guests were broken by 4cd8b5e2a1
'lguest: use KVM hypercalls', which rewrites guest text from kvm hypercalls
to trap 31.
The Launcher mmaps the kernel image. The Guest executes and
immediately faults in the first text page (read-only). Then it hits a
hypercall, and we rewrite that hypercall, causing a copy-on-write.
But the Guest pagetables still refer to the old page: we fault again,
but as Host we see the hypercall already rewritten, and pass the fault
back to the Guest. The Guest hasn't set up an IDT yet, so we kill it.
This doesn't happen with bzImages: they unpack themselves and so the
text pages are already read-write.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Patrick McHardy <kaber@trash.net>