linux/fs
Neil Horman e9e3d724e2 nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3)
The "bad_page()" page allocator sanity check was reported recently (call
chain as follows):

  bad_page+0x69/0x91
  free_hot_cold_page+0x81/0x144
  skb_release_data+0x5f/0x98
  __kfree_skb+0x11/0x1a
  tcp_ack+0x6a3/0x1868
  tcp_rcv_established+0x7a6/0x8b9
  tcp_v4_do_rcv+0x2a/0x2fa
  tcp_v4_rcv+0x9a2/0x9f6
  do_timer+0x2df/0x52c
  ip_local_deliver+0x19d/0x263
  ip_rcv+0x539/0x57c
  netif_receive_skb+0x470/0x49f
  :virtio_net:virtnet_poll+0x46b/0x5c5
  net_rx_action+0xac/0x1b3
  __do_softirq+0x89/0x133
  call_softirq+0x1c/0x28
  do_softirq+0x2c/0x7d
  do_IRQ+0xec/0xf5
  default_idle+0x0/0x50
  ret_from_intr+0x0/0xa
  default_idle+0x29/0x50
  cpu_idle+0x95/0xb8
  start_kernel+0x220/0x225
  _sinittext+0x22f/0x236

It occurs because an skb with a fraglist was freed from the tcp
retransmit queue when it was acked, but a page on that fraglist had
PG_Slab set (indicating it was allocated from the Slab allocator (which
means the free path above can't safely free it via put_page.

We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to fill convert the passed in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in
xs_sendpages.  __nfs4_proc_set_acl just converts each page in the buffer
to a page struct via virt_to_page, but the vfs allocates the buffer via
kmalloc, meaning the PG_slab bit is set.  We can't create a buffer with
kmalloc and free it later in the tcp ack path with put_page, so we need
to either:

1) ensure that when we create the list of pages, no page struct has
   PG_Slab set

 or

2) not use a page list to send this data

Given that these buffers can be multiple pages and arbitrarily sized, I
think (1) is the right way to go.  I've written the below patch to
allocate a page from the buddy allocator directly and copy the data over
to it.  This ensures that we have a put_page free-able page for every
entry that winds up on an skb frag list, so it can be safely freed when
the frame is acked.  We do a put page on each entry after the
rpc_call_sync call so as to drop our own reference count to the page,
leaving only the ref count taken by tcp_sendpages.  This way the data
will be properly freed when the ack comes in

Successfully tested by myself to solve the above oops.

Note, as this is the result of a setacl operation that exceeded a page
of data, I think this amounts to a local DOS triggerable by an
uprivlidged user, so I'm CCing security on this as well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: security@kernel.org
CC: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04 17:28:52 -08:00
..
9p switch 9p 2011-01-12 20:03:43 -05:00
adfs switch adfs 2011-01-12 20:02:45 -05:00
affs switch affs 2011-01-12 20:03:42 -05:00
afs afs: Fix oops in afs_unlink_writeback 2011-02-25 11:12:37 -08:00
autofs4 autofs4: clean ->d_release() and autofs4_free_ino() up 2011-01-18 01:21:29 -05:00
befs befs: don't pass huge structs by value 2011-01-13 08:03:15 -08:00
bfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
btrfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable 2011-02-25 14:03:39 -08:00
cachefiles llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2011-02-21 15:01:38 -08:00
cifs [CIFS] update cifs version 2011-02-21 22:31:47 +00:00
coda Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-01-13 10:27:28 -08:00
configfs configfs: change depends -> select SYSFS 2011-01-16 21:22:29 +00:00
cramfs cramfs: generate unique inode number for better inode cache usage 2011-01-13 08:03:23 -08:00
debugfs convert get_sb_single() users 2010-10-29 04:16:28 -04:00
devpts convert get_sb_single() users 2010-10-29 04:16:28 -04:00
dlm dlm: use single thread workqueues 2011-02-11 16:50:47 -06:00
ecryptfs eCryptfs: Copy up lower inode attrs in getattr 2011-02-21 14:46:36 -06:00
efs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
exofs exofs: i_nlink races in rename() 2011-03-03 01:28:17 -05:00
exportfs fs: dcache per-inode inode alias locking 2011-01-07 17:50:31 +11:00
ext2 ext2: Fix link count corruption under heavy link+rename load 2011-03-02 11:03:52 +01:00
ext3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 2011-01-21 07:33:37 -08:00
ext4 ext4: serialize unaligned asynchronous DIO 2011-02-12 08:17:34 -05:00
fat switch fat to ->s_d_op, close exportfs races there 2011-01-12 20:02:43 -05:00
freevxfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
fscache FS-Cache: Fix operation handling 2011-01-14 09:23:36 -08:00
fuse fuse: fix truncate after open 2011-02-25 14:44:58 +01:00
gfs2 mm: prevent concurrent unmap_mapping_range() on the same inode 2011-02-23 19:52:52 -08:00
hfs hfs: fix rename() over non-empty directory 2011-03-03 01:28:40 -05:00
hfsplus hfsplus: fix up a comparism in hfsplus_file_extend 2011-02-03 16:34:18 -07:00
hostfs switch hostfs 2011-01-12 20:03:42 -05:00
hpfs hpfs_setattr error case avoids unlock_kernel 2011-01-17 05:11:37 -05:00
hppfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
hugetlbfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
isofs fix isofs d_op handling 2011-01-12 20:02:43 -05:00
jbd fix comment typos concerning "consistent" 2010-12-10 16:04:28 +01:00
jbd2 jbd2: call __jbd2_log_start_commit with j_state_lock write locked 2011-02-12 08:18:24 -05:00
jffs2 Merge git://git.infradead.org/mtd-2.6 2011-01-17 11:15:30 -08:00
jfs Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
lockd NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!" 2011-01-25 15:24:47 -05:00
logfs Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
minix minix: i_nlink races in rename() 2011-03-03 01:28:16 -05:00
ncpfs move internal-only parts of ncpfs headers to fs/ncpfs 2011-01-12 20:03:43 -05:00
nfs nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3) 2011-03-04 17:28:52 -08:00
nfs_common NFS: Prevent memory allocation failure in nfsacl_encode() 2011-01-25 15:24:47 -05:00
nfsd nfsd: correctly handle return value from nfsd_map_name_to_* 2011-02-16 18:31:05 -05:00
nilfs2 Merge branch 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2011-03-03 15:37:59 -08:00
nls
notify Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
ntfs NTFS: Fix invalid pointer dereference in ntfs_mft_record_alloc(). 2011-01-31 12:58:11 +10:00
ocfs2 ocfs2: Check heartbeat mode for kernel stacks only 2011-02-20 02:36:28 -08:00
omfs new helper: mount_bdev() 2010-10-29 04:16:13 -04:00
openpromfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
partitions ldm: corrupted partition table can cause kernel oops 2011-02-25 15:07:36 -08:00
proc of/flattree: Drop an uninteresting message to pr_debug level 2011-03-02 13:45:18 -07:00
qnx4 fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
quota quota: Fix deadlock during path resolution 2011-01-12 19:14:55 +01:00
ramfs convert get_sb_nodev() users 2010-10-29 04:16:31 -04:00
reiserfs fix reiserfs mkdir() breakage 2011-03-03 01:28:40 -05:00
romfs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
squashfs squashfs: fix use of uninitialised variable in zlib & xz decompressors 2011-01-26 10:50:05 +10:00
sysfs kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
sysv sysv: i_nlink races in rename() 2011-03-03 01:28:16 -05:00
ubifs fs: icache RCU free inodes 2011-01-07 17:50:26 +11:00
udf udf: fix i_nlink limit 2011-03-03 01:28:40 -05:00
ufs ufs: i_nlink races in rename() 2011-03-03 01:28:16 -05:00
xfs xfs: zero proper structure size for geometry calls 2011-03-01 21:21:13 -06:00
Kconfig kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
Kconfig.binfmt coredump: default CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y 2010-10-27 18:03:12 -07:00
Makefile Merge 'staging-next' to Linus's tree 2010-10-28 09:44:56 -07:00
aio.c aio: fix race between io_destroy() and io_submit() 2011-02-25 15:07:37 -08:00
anon_inodes.c sanitize vfsmount refcounting changes 2011-01-16 13:47:07 -05:00
attr.c
bad_inode.c fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
binfmt_aout.c Don't dump task struct in a.out core-dumps 2010-10-14 10:57:40 -07:00
binfmt_elf.c binfmt_elf: cleanups 2011-01-13 08:03:12 -08:00
binfmt_elf_fdpic.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c convert get_sb_single() users 2010-10-29 04:16:28 -04:00
binfmt_script.c
binfmt_som.c
bio-integrity.c bio-integrity: mark kintegrityd_wq highpri and CPU intensive 2011-01-03 15:01:48 +01:00
bio.c bio: take care not overflow page count when mapping/copying user data 2010-11-10 14:40:43 +01:00
block_dev.c fs/block_dev.c: fix new kernel-doc warning 2011-02-28 18:08:31 -08:00
buffer.c fs: Use this_cpu_inc_return in buffer.c 2010-12-17 15:18:05 +01:00
char_dev.c Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
compat.c compat: copy missing fields in compat_statfs64 to user 2011-01-17 04:54:38 -05:00
compat_binfmt_elf.c
compat_ioctl.c Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 2011-01-07 14:39:20 -08:00
dcache.c fs: fix new dcache.c kernel-doc warnings 2011-01-22 20:32:38 -08:00
dcookies.c
direct-io.c fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio 2011-01-20 17:02:05 -08:00
drop_caches.c
eventfd.c Docbook: add fs/eventfd.c and fix typos in it 2011-02-21 15:07:04 -08:00
eventpoll.c epoll: prevent creating circular epoll structures 2011-02-25 15:07:36 -08:00
exec.c vfs: sparse: add __FMODE_EXEC 2011-02-02 16:03:19 -08:00
fcntl.c vfs: sparse: add __FMODE_EXEC 2011-02-02 16:03:19 -08:00
fifo.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
file.c
file_table.c CRED: Fix kernel panic upon security_file_alloc() failure. 2011-02-04 10:40:29 -08:00
filesystems.c fs: rcu-walk for path lookup 2011-01-07 17:50:27 +11:00
fs-writeback.c fs/fs-writeback.c: fix sync_inodes_sb() return value kernel-doc 2011-01-13 17:32:48 -08:00
fs_struct.c sanitize vfsmount refcounting changes 2011-01-16 13:47:07 -05:00
generic_acl.c fs: provide simple rcu-walk generic_check_acl implementation 2011-01-07 17:50:29 +11:00
inode.c Merge branch 'for-linus' of git://neil.brown.name/md 2011-02-25 11:13:26 -08:00
internal.h Fix over-zealous flush_disk when changing device size. 2011-02-24 17:25:47 +11:00
ioctl.c fs: make block fiemap mapping length at least blocksize long 2011-02-02 16:03:20 -08:00
ioprio.c ioprio: grab rcu_read_lock in sys_ioprio_{set,get}() 2010-11-15 10:23:31 +01:00
libfs.c pass default dentry_operations to mount_pseudo() 2011-01-12 20:03:43 -05:00
locks.c Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux 2011-01-14 13:17:26 -08:00
mbcache.c ext2: Resolve 'dereferencing pointer to incomplete type' when enabling EXT2_XATTR_DEBUG 2011-01-10 19:04:08 +01:00
mpage.c fs/mpage.c: consolidate code 2011-01-13 17:32:32 -08:00
namei.c vfs: fix BUG_ON() in fs/namei.c:1461 2011-02-16 08:56:55 -08:00
namespace.c Unlock vfsmount_lock in do_umount 2011-02-24 02:10:57 -05:00
nfsctl.c
no-block.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
open.c Fix possible filp_cachep memory corruption 2011-02-11 15:53:38 -08:00
pipe.c Fix broken "pipe: use event aware wakeups" optimization 2011-01-20 16:21:59 -08:00
pnode.c fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
pnode.h
posix_acl.c NFS: Prevent memory allocation failure in nfsacl_encode() 2011-01-25 15:24:47 -05:00
read_write.c fix signedness mess in rw_verify_area() on 64bit architectures 2011-01-12 20:06:58 -05:00
read_write.h
readdir.c
select.c fs/select.c: fix information leak to userspace 2011-01-13 08:03:12 -08:00
seq_file.c fs: take dcache_lock inside __d_path 2010-10-25 21:26:12 -04:00
signalfd.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2010-10-26 10:13:10 -07:00
splice.c Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
stack.c
stat.c Add an AT_NO_AUTOMOUNT flag to suppress terminal automount 2011-01-15 20:07:33 -05:00
statfs.c
super.c vfs: call rcu_barrier after ->kill_sb() 2011-02-11 16:12:19 -08:00
sync.c
timerfd.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
utimes.c
xattr.c
xattr_acl.c