linux

Commit Graph

Author	SHA1	Message	Date
Michael Ellerman	c90bfeb80f	MAINTAINERS: add hvc_console Add a MAINTAINERS entry for the hypervisor virtual console driver. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:49 -07:00
Martin Schwidefsky	58984ce21d	mm: do_xip_mapping_read: fix length calculation The calculation of the value nr in do_xip_mapping_read is incorrect. If the copy required more than one iteration in the do while loop the copied variable will be non-zero. The maximum length that may be passed to the call to copy_to_user(buf+copied, xip_mem+offset, nr) is len-copied but the check only compares against (nr > len). This bug is the cause for the heap corruption Carsten has been chasing for so long: *** glibc detected *** /bin/bash: free(): invalid next size (normal): 0x00000000800e39f0 *** ======= Backtrace: ========= /lib64/libc.so.6[0x200000b9b44] /lib64/libc.so.6(cfree+0x8e)[0x200000bdade] /bin/bash(free_buffered_stream+0x32)[0x80050e4e] /bin/bash(close_buffered_stream+0x1c)[0x80050ea4] /bin/bash(unset_bash_input+0x2a)[0x8001c366] /bin/bash(make_child+0x1d4)[0x8004115c] /bin/bash[0x8002fc3c] /bin/bash(execute_command_internal+0x656)[0x8003048e] /bin/bash(execute_command+0x5e)[0x80031e1e] /bin/bash(execute_command_internal+0x79a)[0x800305d2] /bin/bash(execute_command+0x5e)[0x80031e1e] /bin/bash(reader_loop+0x270)[0x8001efe0] /bin/bash(main+0x1328)[0x8001e960] /lib64/libc.so.6(__libc_start_main+0x100)[0x200000592a8] /bin/bash(clearerr+0x5e)[0x8001c092] With this bug fix the commit `0e4a9b5928` "ext2/xip: refuse to change xip flag during remount with busy inodes" can be removed again. Cc: Carsten Otte <cotte@de.ibm.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:49 -07:00
Anton Blanchard	417b43d4b7	random: align rekey_work's timer Align rekey_work. Even though it's infrequent, we may as well line it up. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Matt Mackall <mpm@selenic.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:49 -07:00
Anton Blanchard	98f4ebb290	mm: align vmstat_work's timer Even though vmstat_work is marked deferrable, there are still benefits to aligning it. For certain applications we want to keep OS jitter as low as possible and aligning timers and work so they occur together can reduce their overall impact. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
Jeff Layton	d2caa3c549	writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3) The dirtied_when value on an inode is supposed to represent the first time that an inode has one of its pages dirtied. This value is in units of jiffies. It's used in several places in the writeback code to determine when to write out an inode. The problem is that these checks assume that dirtied_when is updated periodically. If an inode is continuously being used for I/O it can be persistently marked as dirty and will continue to age. Once the time compared to is greater than or equal to half the maximum of the jiffies type, the logic of the time_*() macros inverts and the opposite of what is needed is returned. On 32-bit architectures that's just under 25 days (assuming HZ == 1000). As the least-recently dirtied inode, it'll end up being the first one that pdflush will try to write out. sync_sb_inodes does this check: /* Was this inode dirtied after sync_sb_inodes was called? */ if (time_after(inode->dirtied_when, start)) break; ...but now dirtied_when appears to be in the future. sync_sb_inodes bails out without attempting to write any dirty inodes. When this occurs, pdflush will stop writing out inodes for this superblock. Nothing can unwedge it until jiffies moves out of the problematic window. This patch fixes this problem by changing the checks against dirtied_when to also check whether it appears to be in the future. If it does, then we consider the value to be far in the past. This should shrink the problematic window of time to such a small period (30s) as not to matter. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
Andrew Morton	846c151a4d	__tty_open(): use the correct type for saved_flags filp->f_flags is unsigned, so use that type for the local copy. Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
Wu Fengguang	b6fac63cc1	vfs: skip I_CLEAR state inodes clear_inode() will switch inode state from I_FREEING to I_CLEAR, and do so _outside_ of inode_lock. So any I_FREEING testing is incomplete without a coupled testing of I_CLEAR. So add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and add_dquot_ref(). Masayoshi MIZUMA discovered the bug in drop_pagecache_sb() and Jan Kara reminds fixing the other two cases. Masayoshi MIZUMA has a nice panic flow (steps in time order): [process A] prune_icache(): spin_lock(&inode_lock); inode->i_state |= I_FREEING; spin_unlock(&inode_lock). [process B] drop_pagecache() -> drop_pagecache_sb(): spin_lock(&inode_lock). [process A] dispose_list(): list_del(); clear_inode(): inode->i_state = I_CLEAR. [process B] if (inode->i_state & (I_FREEING|I_WILL_FREE)) continue; <==== NOT MATCH (DANGER from here on! Accessing disposing inode!) __iget(): list_move() <===== PANIC on poisoned list !! Reported-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
David Howells	33e5d76979	nommu: fix a number of issues with the per-MM VMA patch Fix a number of issues with the per-MM VMA patch: (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on a NOMMU system with more than 2G pages. Makes no difference on a 32-bit system. (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value, lest it overflow. (3) Move the allocation of the vm_area_struct slab back for fork.c. (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs. (5) Use BUG_ON() rather than if () BUG(). (6) Make the default validate_nommu_regions() a static inline rather than a #define. (7) Make free_page_series()'s objection to pages with a refcount != 1 more informative. (8) Adjust the __put_nommu_region() banner comment to indicate that the semaphore must be held for writing. (9) Limit the number of warnings about munmaps of non-mmapped regions. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
Sergey Senozhatsky	5482415a5e	fb: nvidiafb recognizes geforcego 7300 chip as mobile nvidiafb recognizes geforcego 7300 chip as mobile Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
Akinobu Mita	ee3b4290ae	generic debug pagealloc: build fix This fixes a build failure with generic debug pagealloc: mm/debug-pagealloc.c: In function 'set_page_poison': mm/debug-pagealloc.c:8: error: 'struct page' has no member named 'debug_flags' mm/debug-pagealloc.c: In function 'clear_page_poison': mm/debug-pagealloc.c:13: error: 'struct page' has no member named 'debug_flags' mm/debug-pagealloc.c: In function 'page_poison': mm/debug-pagealloc.c:18: error: 'struct page' has no member named 'debug_flags' mm/debug-pagealloc.c: At top level: mm/debug-pagealloc.c:120: error: redefinition of 'kernel_map_pages' include/linux/mm.h:1278: error: previous definition of 'kernel_map_pages' was here mm/debug-pagealloc.c: In function 'kernel_map_pages': mm/debug-pagealloc.c:122: error: 'debug_pagealloc_enabled' undeclared (first use in this function) by fixing - debug_flags should be in struct page - define DEBUG_PAGEALLOC config option for all architectures Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
Alex Deucher	029a2edbd3	drm/radeon: load the right microcode on rs780 Copy/paste error. The RV670 microcode should work ok, so it's not a show stopper. Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2009-04-03 10:31:23 +10:00
Dave Airlie	5f3dbedf27	Merge branch 'drm-intel-next' of ../anholt-2.6 into drm-linus	2009-04-03 10:27:21 +10:00
Jesse Barnes	7a1fb5d06d	drm: remove unused "can_grow" parameter from drm_crtc_helper_initial_config Cleanup some leftovers from the X port. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2009-04-03 10:21:44 +10:00
Guennadi Liakhovetski	8c6db1bbf8	dma: Add SoF and EoF debugging to ipu_idmac.c, minor cleanup Add Start-of-Frame and End-of-Frame debugging to ipu_idmac.c, in the future it might also be needed for the actual video processing in mx3-camera, at which point, the ISRs will have to be transferred to mx3_camera.c, for which ipu_irq_map() and ipu_irq_unmap() functions will have to be exported. Also simplify a couple of pointer-dereferences. Signed-off-by: Guennadi Liakhovetski <lg@denx.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-04-02 16:59:10 -07:00
Huang Weiyi	6c8ad3b07f	glge: remove unused #include <version.h> Remove unused #include <version.h> in drivers/net/qlge/qlge_ethtool. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-02 16:31:46 -07:00
Huang Weiyi	345bec6434	dnet: remove unused #include <version.h> Remove unused #include <version.h> in drivers/net/dnet.c. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-02 16:31:45 -07:00
Ilpo Järvinen	9eb9362e56	tcp: miscounts due to tcp_fragment pcount reset It seems that trivial reset of pcount to one was not sufficient in tcp_retransmit_skb. Multiple counters experience a positive miscount when skb's pcount gets lowered without the necessary adjustments (depending on skb's sacked bits which exactly), at worst a packets_out miscount can crash at RTO if the write queue is empty! Triggering this requires mss change, so bidir tcp or mtu probe or like. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Tested-by: Uwe Bugla <uwe.bugla@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-02 16:31:45 -07:00
Ilpo Järvinen	797108d134	tcp: add helper for counter tweaking due mid-wq change We need full-scale adjustment to fix a TCP miscount in the next patch, so just move it into a helper and call for that from the other places. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-02 16:31:44 -07:00
Jan Dumon	0de8ca597d	hso: fix for the 'invalid frame length' messages Some devices cannot send very short usb transfers. To get around this the firmware adds a known pattern and flags the driver that it should check for this pattern on short transfers. This flag was not taken into account by the driver. Signed-off-by: Jan Dumon <j.dumon@option.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-02 16:31:44 -07:00
Jan Dumon	3b7d2b319d	hso: fix for crash when unplugging the device Changed the order in which things are freed. This fixes an oops when unplugging the device while network traffic is ongoing. Signed-off-by: Jan Dumon <j.dumon@option.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-02 16:31:43 -07:00
Jesse Barnes	b94ee65289	drm: fix EDID backward compat check EDIDs should be backward compatible, so don't bail if we see a version of 3 (which is out there now) and print a message if we see something newer, but allow it to be parsed. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2009-04-03 09:21:46 +10:00
yakui_zhao	6714977b45	drm: sync the mode validation for INTERLACE/DBLSCAN Check whether the INTERLACE/DBLSCAN is supported by output device. If not, the mode containing the flag of INTERLACE/DBLSCAN will be marked as unsupported. Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2009-04-03 09:21:31 +10:00
Dave Airlie	16456c872e	drm: fix typo in edid vendor parsing. Should be, edid_vendor[2] = (edid->mfg_id[1] & 0x1f) + '@'; Since vendor ID has only two bytes only, I am somewhat surprised why gcc doesn't complain this. Reported-by: Guo, Chaohong <chaohong.guo@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2009-04-03 09:10:33 +10:00
Jean Delvare	3c6fc3521a	DRM: drm_crtc_helper.h doesn't actually need i2c.h Remove an include that isn't actually needed to prevent needless rebuilds. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2009-04-03 09:08:25 +10:00
Dave Airlie	522b5cc7ce	drm: fix missing inline function on 32-bit powerpc. The readq/writeq really need to be static inline on the arches which don't provide them. Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2009-04-03 09:07:07 +10:00
Russell King	67a52bb90b	[ARM] fix build-breaking `7a192ec` commit The commit: platform driver: fix incorrect use of 'platform_bus_type' with 'struct device_driver' contains this: -static int __exit pxa2xx_flash_remove(struct device *dev) +static int __exit pxa2xx_flash_remove(struct platform_device *dev) ... - .remove = __exit_p(pxa2xx_flash_remove), + .remove = __devexit_p(pxa2xx_flash_remove), which leads to the following build error: `pxa2xx_flash_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.exit.text' of drivers/built-in.o This is not the only instance of it in this patch - all __exit_p's touched by this patch have been converted to __devexit_p's without regard to the original function. Let's revert this change and, if we are going to convert functions to be __devexit/__devinit, lets have that as a _separate_ patch doing just that change. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2009-04-02 23:23:43 +01:00
Russell King	cd02938a82	Merge branch 'smsc911x-armplatforms' of git://github.com/steveglen/linux-2.6	2009-04-02 23:22:11 +01:00
Jesse Barnes	1055f9ddad	drm: Use pgprot_writecombine in GEM GTT mapping to get the right bits for !PAT. Otherwise, the PAGE_CACHE_WC would end up getting us a UC-only mapping, and the write performance of GTT maps dropped 10x. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> [anholt: cleaned up unused var] Signed-off-by: Eric Anholt <eric@anholt.net>	2009-04-02 14:28:32 -07:00
Stoyan Gaydarov	c293498be6	Btrfs: BUG to BUG_ON changes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 17:05:11 -04:00
Segher Boessenkool	b6bc978b36	fsl_pq_mdio: Fix compile failure Add EXPORT_SYMBOL_GPL(fsl_pq_mdio_bus_name) for module builds Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-02 13:57:30 -07:00
Dan Carpenter	3e7ad38d20	Btrfs: remove dead code Remove an unneeded return statement and conditional Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:46:06 -04:00
Dan Carpenter	ff0a5836ac	Btrfs: remove dead code merge is always NULL at this point. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:46:06 -04:00
Wu Fengguang	d4a789474a	Btrfs: fix typos in comments Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:46:06 -04:00
Jim Owens	2e966ed22c	Btrfs: remove unused ftrace include Signed-off-by: jim owens <jowens@hp.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 17:02:55 -04:00
Heiko Carstens	93dbfad7ac	Btrfs: fix __ucmpdi2 compile bug on 32 bit builds We get this on 32 builds: fs/built-in.o: In function `extent_fiemap': (.text+0x1019f2): undefined reference to `__ucmpdi2' Happens because of a switch statement with a 64 bit argument. Convert this to an if statement to fix this. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 10:33:45 -04:00
Shen Feng	09771430f3	Btrfs: free inode struct when btrfs_new_inode fails btrfs_new_inode doesn't call iput to free the inode when it fails. Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:46:06 -04:00
Amit Gud	b5555f7711	Btrfs: fix race in worker_loop Need to check kthread_should_stop after schedule_timeout() before calling schedule(). This causes threads to sleep with potentially no one to wake them up causing mount(2) to hang in btrfs_stop_workers waiting for threads to stop. Signed-off-by: Amit Gud <gud@ksu.edu> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 17:01:27 -04:00
Sage Weil	dccae99995	Btrfs: add flushoncommit mount option The 'flushoncommit' mount option forces any data dirtied by a write in a prior transaction to commit as part of the current commit. This makes the committed state a fully consistent view of the file system from the application's perspective (i.e., it includes all completed file system operations). This was previously the behavior only when a snapshot is created. This is used by Ceph to ensure that completed writes make it to the platter along with the metadata operations they are bound to (by BTRFS_IOC_TRANS_{START,END}). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:59:01 -04:00
Sage Weil	3a5e14048a	Btrfs: notreelog mount option Add a 'notreelog' mount option to disable the tree log (used by fsync, O_SYNC writes). This is much slower, but the tree logging produces inconsistent views into the FS for ceph. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:49:40 -04:00
Eric Paris	a9572a15a8	Btrfs: introduce btrfs_show_options btrfs options can change at times other than mount, yet /proc/mounts shows the options string used when the fs was mounted (an example would be when btrfs determines that barriers aren't useful and turns them off.) This patch instead outputs the actual options in use by btrfs. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-02 16:46:06 -04:00
Chris Mason	fa9c0d795f	Btrfs: rework allocation clustering Because btrfs is copy-on-write, we end up picking new locations for blocks very often. This makes it fairly difficult to maintain perfect read patterns over time, but we can at least do some optimizations for writes. This is done today by remembering the last place we allocated and trying to find a free space hole big enough to hold more than just one allocation. The end result is that we tend to write sequentially to the drive. This happens all the time for metadata and it happens for data when mounted -o ssd. But, the way we record it is fairly racey and it tends to fragment the free space over time because we are trying to allocate fairly large areas at once. This commit gets rid of the races by adding a free space cluster object with dedicated locking to make sure that only one process at a time is out replacing the cluster. The free space fragmentation is somewhat solved by allowing a cluster to be comprised of smaller free space extents. This part definitely adds some CPU time to the cluster allocations, but it allows the allocator to consume the small holes left behind by cow. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 09:47:43 -04:00
Chris Mason	8e73f27501	Btrfs: Optimize locking in btrfs_next_leaf() btrfs_next_leaf was using blocking locks when it could have been using faster spinning ones instead. This adds a few extra checks around the pieces that block and switches over to spinning locks. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 10:14:18 -04:00
Chris Mason	c8c42864f6	Btrfs: break up btrfs_search_slot into smaller pieces btrfs_search_slot was doing too many things at once. This breaks it up into more reasonable units. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 10:14:18 -04:00
Josef Bacik	04018de5d4	Btrfs: kill the pinned_mutex This patch removes the pinned_mutex. The extent io map has an internal tree lock that protects the tree itself, and since we only copy the extent io map when we are committing the transaction we don't need it there. We also don't need it when caching the block group since searching through the tree is also protected by the internal map spin lock. Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-04-03 10:14:18 -04:00
Josef Bacik	6226cb0a5e	Btrfs: kill the block group alloc mutex This patch removes the block group alloc mutex used to protect the free space tree for allocations and replaces it with a spin lock which is used only to protect the free space rb tree. This means we only take the lock when we are directly manipulating the tree, which makes us a touch faster with multi-threaded workloads. This patch also gets rid of btrfs_find_free_space and replaces it with btrfs_find_space_for_alloc, which takes the number of bytes you want to allocate, and empty_size, which is used to indicate how much free space should be at the end of the allocation. It will return an offset for the allocator to use. If we don't end up using it we _must_ call btrfs_add_free_space to put it back. This is the tradeoff to kill the alloc_mutex, since we need to make sure nobody else comes along and takes our space. Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-04-03 10:14:18 -04:00
Josef Bacik	2552d17e32	Btrfs: clean up find_free_extent I've replaced the strange looping constructs with a list_for_each_entry on space_info->block_groups. If we have a hint we just jump into the loop with the block group and start looking for space. If we don't find anything we start at the beginning and start looking. We never come out of the loop with a ref on the block_group _unless_ we found space to use, then we drop it after we set the trans block_group. Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-04-03 10:14:19 -04:00
Josef Bacik	70cb074345	Btrfs: free space cache cleanups This patch cleans up the free space cache code a bit. It better documents the idiosyncrasies of tree_search_offset and makes the code make a bit more sense. I took out the info allocation at the start of __btrfs_add_free_space and put it where it makes more sense. This was left over cruft from when alloc_mutex existed. Also all of the re-searches we do to make sure we inserted properly. Signed-off-by: Josef Bacik <jbacik@redhat.com>	2009-04-03 10:14:19 -04:00
Chris Mason	bedf762ba3	Btrfs: unplug in the async bio submission threads Btrfs pages being written get set to writeback, and then may go through a number of steps before they hit the block layer. This includes compression, checksumming and async bio submission. The end result is that someone who writes a page and then does wait_on_page_writeback is likely to unplug the queue before the bio they cared about got there. We could fix this by marking bios sync, or by doing more frequent unplugs, but this commit just changes the async bio submission code to unplug after it has processed all the bios for a device. The async bio submission does a fair job of collection bios, so this shouldn't be a huge problem for reducing merging at the elevator. For streaming O_DIRECT writes on a 5 drive array, it boosts performance from 386MB/s to 460MB/s. Thanks to Hisashi Hifumi for helping with this work. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 10:32:58 -04:00
Chris Mason	b765ead57d	Btrfs: keep processing bios for a given bdev if our proc is batching Btrfs uses async helper threads to submit write bios so the checksumming helper threads don't block on the disk. The submit bio threads may process bios for more than one block device, so when they find one device congested they try to move on to other devices instead of blocking in get_request_wait for one device. This does a pretty good job of keeping multiple devices busy, but the congested flag has a number of problems. A congested device may still give you a request, and other procs that aren't backing off the congested device may starve you out. This commit uses the io_context stored in current to decide if our process has been made a batching process by the block layer. If so, it keeps sending IO down for at least one batch. This helps make sure we do a good amount of work each time we visit a bdev, and avoids large IO stalls in multi-device workloads. It's also very ugly. A better solution is in the works with Jens Axboe. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2009-04-03 10:27:10 -04:00
Mikulas Patocka	99360b4c18	dm: set queue ordered mode Set queue ordered mode. It doesn't really matter what we set here because we don't ever put any requests on the queue. But we need to set something other than QUEUE_ORDERED_NONE so that __generic_make_request passes barrier requests to us. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2009-04-02 19:55:39 +01:00
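
The jiffies wraparound described in commit d2caa3c549 can be reproduced in user space. The following is a minimal sketch, not the kernel patch itself: it assumes a 32-bit tick counter, and the helper names (`time_after32`, `time_after_eq32`, `dirtied_after_naive`, `dirtied_after_guarded`) are hypothetical stand-ins for the kernel's `time_*()` macros and the writeback check. It shows how a naive "dirtied after start" test inverts once the inode is older than half the counter range, and how additionally checking that `dirtied_when` is not in the future relative to the current time restores the intended answer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wrap-safe comparisons on a 32-bit tick counter, modelled on the
 * kernel's time_after()/time_after_eq() idiom. The unsigned-to-signed
 * conversion assumes the usual two's-complement behaviour.
 */
static bool time_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;      /* true if a is later than b */
}

static bool time_after_eq32(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;     /* true if a is b or later */
}

/* Naive check: was the inode dirtied after `start`? */
static bool dirtied_after_naive(uint32_t dirtied_when, uint32_t start)
{
    return time_after32(dirtied_when, start);
}

/*
 * Guarded check: a dirtied_when that appears to lie in the future
 * relative to `now` cannot be genuine, so it is treated as far in
 * the past and the function reports "not dirtied after start".
 */
static bool dirtied_after_guarded(uint32_t dirtied_when, uint32_t start,
                                  uint32_t now)
{
    bool ret = time_after32(dirtied_when, start);

    ret = ret && time_after_eq32(now, dirtied_when);
    return ret;
}
```

With `dirtied_when = 1000` and `start` more than 2^31 ticks later, the naive check wrongly reports the inode as dirtied after `start`, while the guarded check does not.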

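
The length calculation fixed in commit 58984ce21d can be illustrated the same way. This is a sketch under stated assumptions rather than the actual `do_xip_mapping_read` code: the function `xip_chunk_len` and its parameter `avail_in_page` are hypothetical, and only the clamping rule from the commit message (the per-iteration length may not exceed `len - copied`) is taken from the source.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes to copy in one loop iteration.
 * avail_in_page is how much the current page could supply, len is the
 * total length requested, copied is what earlier iterations already
 * delivered. The caller must guarantee copied <= len.
 */
static size_t xip_chunk_len(size_t avail_in_page, size_t len, size_t copied)
{
    size_t nr = avail_in_page;
    size_t remaining = len - copied;

    /* Comparing against len alone would over-copy once copied > 0. */
    if (nr > remaining)
        nr = remaining;
    return nr;
}
```

For a 5000-byte read over 4096-byte pages, the second iteration is limited to the 904 bytes still outstanding instead of a full page.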

140463 Commits
