linux

Author	SHA1	Message	Date
Eric W. Biederman	5e4a08476b	userns: Require CAP_SYS_ADMIN for most uses of setns. Andy Lutomirski <luto@amacapital.net> found a nasty little bug in the permissions of setns. With unprivileged user namespaces it became possible to create new namespaces without privilege. However the setns calls were relaxed to only require CAP_SYS_ADMIN in the user namespace of the target namespace. Which made the following nasty sequence possible. pid = clone(CLONE_NEWUSER | CLONE_NEWNS); if (pid == 0) { /* child */ system("mount --bind /home/me/passwd /etc/passwd"); } else if (pid != 0) { /* parent */ char path[PATH_MAX]; snprintf(path, sizeof(path), "/proc/%u/ns/mnt", pid); fd = open(path, O_RDONLY); setns(fd, 0); system("su -"); } Prevent this possibility by requiring CAP_SYS_ADMIN in the current user namespace when joining all but the user namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-12-14 16:12:03 -08:00
Eric W. Biederman	520d9eabce	Fix cap_capable to only allow owners in the parent user namespace to have caps. Andy Lutomirski pointed out that the current behavior of allowing the owner of a user namespace to have all caps when that owner is not in a parent user namespace is wrong. Add a test to ensure the owner of a user namespace is in the parent of the user namespace to fix this bug. Thankfully this bug did not apply to the initial user namespace, keeping the mischief that can be caused by this bug quite small. This bug was introduced in v3.5 by commit `783291e690` "Simplify the user_namespace by making userns->creator a kuid." But it did not matter until the permissions required to create a user namespace were relaxed, allowing a user namespace to be created inside of a user namespace. The bug made it possible for the owner of a user namespace to be present in a child user namespace. Since the owner of a user namespace is granted all capabilities it became possible for users in a grandchild user namespace to have all privileges over their parent user namespace. Reorder the checks in cap_capable. This should make the common case faster and make it clear that nothing magic happens in the initial user namespace. The reordering is safe because cred->user_ns can only be in targ_ns or targ_ns->parent but not both. Add a comment at the top of the loop to make the logic of the code clear. Add a distinct variable ns that changes as we walk up the user namespace hierarchy to make it clear which variable is changing. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-12-14 13:50:32 -08:00
Eric W. Biederman	98f842e675	proc: Usable inode numbers for the namespace file descriptors. Assign a unique proc inode to each namespace, and use that inode number to ensure we only allocate at most one proc inode for every namespace in proc. A single proc inode per namespace allows userspace to test to see if two processes are in the same namespace. This has been a long requested feature and only blocked because a naive implementation would put the id in a global space and would ultimately require having a namespace for the names of namespaces, making migration and certain virtualization tricks impossible. We still don't have per superblock inode numbers for proc, which appears necessary for application unaware checkpoint/restart and migrations (if the application is using namespace file descriptors) but that is now allowed by the design if it becomes important. I have preallocated the ipc and uts initial proc inode numbers so their structures can be statically initialized. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-20 04:19:49 -08:00
Eric W. Biederman	bf056bfa80	proc: Fix the namespace inode permission checks. Change the proc namespace files into symlinks so that we won't cache the dentries for the namespace files which can bypass the ptrace_may_access checks. To support the symlinks create an additional namespace inode with its own set of operations distinct from the proc pid inode and dentry methods as those no longer make sense. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-20 04:19:48 -08:00
Eric W. Biederman	33d6dce607	proc: Generalize proc inode allocation Generalize the proc inode allocation so that it can be used without having to create a proc_dir_entry. This will allow namespace file descriptors to remain lightweight entities but still have the same inode number when the backing namespace is the same. Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-20 04:19:19 -08:00
Eric W. Biederman	4f326c0064	userns: Allow unprivileged mounts of proc and sysfs - The context in which proc and sysfs are mounted has no effect on the uid/gid of their files so no conversion is needed except allowing the mount. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-20 04:19:18 -08:00
Eric W. Biederman	c450f371d4	userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file To keep things sane in the context of file descriptor passing derive the user namespace that uids are mapped into from the opener of the file instead of from current. When writing to the maps file the lower user namespace must always be the parent user namespace, or setting the mapping simply does not make sense. Enforce that the opener of the file was in the parent user namespace or the user namespace whose mapping is being set. Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-20 04:18:55 -08:00
Eric W. Biederman	e9f238c304	procfs: Print task uids and gids in the userns that opened the proc file Instead of using current_userns() use the userns of the opener of the file so that if the file is passed between processes the contents of the file do not change. Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-20 04:18:15 -08:00
Eric W. Biederman	b2e0d98705	userns: Implement unshare of the user namespace - Add CLONE_THREAD to the unshare flags if CLONE_NEWUSER is selected, as changing user namespaces is only valid if there is only a single thread. - Restore the code to add CLONE_VM if CLONE_THREAD is selected and the code to add CLONE_SIGHAND if CLONE_VM is selected, making the constraints in the code clear. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-20 04:18:14 -08:00
Eric W. Biederman	cde1975bc2	userns: Implement proc namespace operations This allows entering a user namespace, and the ability to store a reference to a user namespace with a bind mount. Addition of missing userns_ns_put in userns_install from Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-20 04:18:13 -08:00
Eric W. Biederman	4c44aaafa8	userns: Kill task_user_ns The task_user_ns function hides the fact that it is getting the user namespace from struct cred on the task. struct cred may go away as soon as the rcu lock is released. This leads to a race where we can dereference a stale user namespace pointer. To make it obvious a struct cred is involved kill task_user_ns. To kill the race modify the users of task_user_ns to only reference the user namespace while the rcu lock is held. Cc: Kees Cook <keescook@chromium.org> Cc: James Morris <james.l.morris@oracle.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-20 04:17:44 -08:00
Eric W. Biederman	bcf58e725d	userns: Make create_new_namespaces take a user_ns parameter Modify create_new_namespaces to explicitly take a user namespace parameter, instead of implicitly through the task_struct. This allows an implementation of unshare(CLONE_NEWUSER) where the new user namespace is not stored onto the current task_struct until after all of the namespaces are created. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-20 04:17:43 -08:00
Eric W. Biederman	142e1d1d5f	userns: Allow unprivileged use of setns. - Push the permission check from the core setns syscall into the setns install methods where the user namespace of the target namespace can be determined, and used in a ns_capable call. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-20 04:17:42 -08:00
Eric W. Biederman	b33c77ef23	userns: Allow unprivileged users to create new namespaces If an unprivileged user has the appropriate capabilities in their current user namespace allow the creation of new namespaces. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-20 04:17:41 -08:00
Eric W. Biederman	37657da3c5	userns: Allow setting a userns mapping to your current uid. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-20 04:17:40 -08:00
Eric W. Biederman	7fa294c899	userns: Allow chown and setgid preservation - Allow chown if CAP_CHOWN is present in the current user namespace and the uid of the inode maps into the current user namespace, and the destination uid or gid maps into the current user namespace. - Allow preserving setgid when changing an inode if CAP_FSETID is present in the current user namespace and the owner of the file has a mapping into the current user namespace. Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-20 04:17:24 -08:00
Eric W. Biederman	5eaf563e53	userns: Allow unprivileged users to create user namespaces. Now that we have been through every permission check in the kernel having uid == 0 and gid == 0 in your local user namespace no longer adds any special privileges. Even having a full set of caps in your local user namespace is safe because capabilities are relative to your local user namespace, and do not confer unexpected privileges. Over the long term this should allow much more of the kernel's functionality to be safely used by non-root users. Functionality like unsharing the mount namespace that is only unsafe because it can fool applications whose privileges are raised when they are executed. Since those applications have no privileges in a user namespace it becomes safe to spoof and confuse those applications all you want. Those capabilities will still need to be enabled carefully because we may still need things like rlimits on the number of unprivileged mounts but that is to avoid DOS attacks not to avoid fooling root owned processes. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 05:59:24 -08:00
Eric W. Biederman	3cdf5b45ff	userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped When performing an exec where the binary lives in one user namespace and the execing process lives in another user namespace there is the possibility that the target uids can not be represented. Instead of failing the exec simply ignore the suid/sgid bits and run the binary with lower privileges. We already do this in the case of MNT_NOSUID so this should be a well tested code path. As the user and group are not changed this should not introduce any security issues. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 05:59:23 -08:00
Zhao Hongjiang	ae11e0f184	userns: fix return value on mntns_install() failure Change return value from -EINVAL to -EPERM when the permission check fails. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 05:59:22 -08:00
Eric W. Biederman	0c55cfc416	vfs: Allow unprivileged manipulation of the mount namespace. - Add a filesystem flag to mark filesystems that are safe to mount as an unprivileged user. - Add a filesystem flag to mark filesystems that don't need MNT_NODEV when mounted by an unprivileged user. - Relax the permission checks to allow unprivileged users that have CAP_SYS_ADMIN permissions in the user namespace referred to by the current mount namespace to be allowed to mount, unmount, and move filesystems. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 05:59:21 -08:00
Eric W. Biederman	7a472ef4be	vfs: Only support slave subtrees across different user namespaces Sharing mount subtrees with mount namespaces created by unprivileged users allows unprivileged mounts created by unprivileged users to propagate to mount namespaces controlled by privileged users. Prevent nasty consequences by changing shared subtrees to slave subtrees when an unprivileged user creates a new mount namespace. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 05:59:20 -08:00
Eric W. Biederman	771b137168	vfs: Add a user namespace reference from struct mnt_namespace This will allow for support for unprivileged mounts in a new user namespace. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 05:59:19 -08:00
Eric W. Biederman	8823c079ba	vfs: Add setns support for the mount namespace setns support for the mount namespace is a little tricky as an arbitrary decision must be made about what to set fs->root and fs->pwd to, as there is no expectation of a relationship between the two mount namespaces. Therefore I arbitrarily find the root mount point, and follow every mount on top of it to find the top of the mount stack. Then I set fs->root and fs->pwd to that location. The topmost root of the mount stack seems like a reasonable place to be. Bind mount support for the mount namespace inodes has the possibility of creating circular dependencies between mount namespaces. Circular dependencies can result in loops that prevent mount namespaces from ever being freed. I avoid creating those circular dependencies by adding a sequence number to the mount namespace and require all bind mounts be of a younger mount namespace into an older mount namespace. Add a helper function proc_ns_inode so it is possible to detect when we are attempting to bind mount a namespace inode. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 05:59:18 -08:00
Eric W. Biederman	a85fb273c9	vfs: Allow chroot if you have CAP_SYS_CHROOT in your user namespace Once you are confined to a user namespace applications can not gain privilege and escape the user namespace so there is no longer a reason to restrict chroot. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 05:59:17 -08:00
Eric W. Biederman	50804fe373	pidns: Support unsharing the pid namespace. Unsharing of the pid namespace unlike unsharing of other namespaces does not take effect immediately. Instead it affects the children created with fork and clone. The first of these children becomes the init process of the new pid namespace, the rest become oddball children of pid 0. From the point of view of the new pid namespace the process that created it is pid 0, as its pid does not map. A couple of different semantics were considered but this one was settled on because it is easy to implement and it is usable from pam modules. The core reasons for the existence of unshare. I took a survey of the callers of pam modules and the following appears to be a representative sample of their logic. { setup stuff include pam child = fork(); if (!child) { setuid() exec /bin/bash } waitpid(child); pam and other cleanup } As you can see there is a fork to create the unprivileged user space process. Which means that the unprivileged user space process will appear as pid 1 in the new pid namespace. Further most login processes do not cope with extraneous children which means shifting the duty of reaping extraneous child process to the creator of those extraneous children makes the system more comprehensible. The practical reason for this set of pid namespace semantics is that it is simple to implement and verify they work correctly. Whereas an implementation that requires changing the struct pid on a process comes with a lot more races and pain. Not the least of which is that glibc caches getpid(). These semantics are implemented by having two notions of the pid namespace of a process. There is task_active_pid_ns which is the pid namespace the process was created with and the pid namespace that all pids are presented to that process in. The task_active_pid_ns is stored in the struct pid of the task. Then there is the pid namespace that will be used for children; that pid namespace is stored in task->nsproxy->pid_ns. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 05:59:16 -08:00
Eric W. Biederman	1c4042c29b	pidns: Consolidate initialization of special init task state Instead of setting child_reaper and SIGNAL_UNKILLABLE one way for the system init process, and another way for pid namespace init processes, test pid->nr == 1 and use the same code for both. For the global init this results in SIGNAL_UNKILLABLE being set much earlier in the initialization process. This is a small cleanup and it paves the way for allowing unshare and enter of the pid namespace as that path like our global init also will not set CLONE_NEWPID. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 05:59:15 -08:00
Eric W. Biederman	57e8391d32	pidns: Add setns support - Pid namespaces are designed to be inescapable so verify that the passed in pid namespace is a child of the currently active pid namespace or the currently active pid namespace itself. Allowing the currently active pid namespace is important so the effects of an earlier setns can be cancelled. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 05:59:14 -08:00
Eric W. Biederman	225778d68d	pidns: Deny strange cases when creating pid namespaces. task_active_pid_ns(current) != current->ns_proxy->pid_ns will soon be allowed to support unshare and setns. The definition of creating a child pid namespace when task_active_pid_ns(current) != current->ns_proxy->pid_ns could be that we create a child pid namespace of current->ns_proxy->pid_ns. However that leads to strange cases like trying to have a single process be init in multiple pid namespaces, which is racy and hard to think about. The definition of creating a child pid namespace when task_active_pid_ns(current) != current->ns_proxy->pid_ns could be that we create a child pid namespace of task_active_pid_ns(current). While that seems less racy it does not provide any utility. Therefore define the semantics of creating a child pid namespace when task_active_pid_ns(current) != current->ns_proxy->pid_ns to be that the pid namespace creation fails. That is easy to implement and easy to think about. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 05:59:13 -08:00
Eric W. Biederman	af4b8a83ad	pidns: Wait in zap_pid_ns_processes until pid_ns->nr_hashed == 1 Looking at pid_ns->nr_hashed is a bit simpler and it works for disjoint process trees that an unshare or a join of a pid_namespace may create. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 05:59:12 -08:00
Eric W. Biederman	5e1182deb8	pidns: Don't allow new processes in a dead pid namespace. Set nr_hashed to -1 just before we schedule the work to cleanup proc. Test nr_hashed just before we hash a new pid and if nr_hashed is < 0 fail. This guarantees that processes never enter a pid namespace after we have cleaned up the state to support processes in a pid namespace. Currently sending SIGKILL to all of the processes in a pid namespace as init exits gives us this guarantee but we need something a little stronger to support unsharing and joining a pid namespace. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 05:59:11 -08:00
Eric W. Biederman	0a01f2cc39	pidns: Make the pidns proc mount/umount logic obvious. Track the number of pids in the proc hash table. When the number of pids goes to 0 schedule work to unmount the kernel mount of proc. Move the mount of proc into alloc_pid when we allocate the pid for init. Remove the surprising calls of pid_ns_release_proc in fork and proc_flush_task. Those code paths really shouldn't know about proc namespace implementation details and people have demonstrated several times that finding and understanding those code paths is difficult and non-obvious. Because of the call path detach pid is always called with the rtnl_lock held free_pid is not allowed to sleep, so the work of unmounting proc is moved to a work queue. This has the side benefit of not blocking the entire world waiting for the unnecessary rcu_barrier in deactivate_locked_super. In the process of making the code clear and obvious this fixes a bug reported by Gao feng <gaofeng@cn.fujitsu.com> where we would leak a mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns succeeded and copy_net_ns failed. Acked-by: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 05:59:10 -08:00
Eric W. Biederman	17cf22c33e	pidns: Use task_active_pid_ns where appropriate The expressions tsk->nsproxy->pid_ns and task_active_pid_ns aka ns_of_pid(task_pid(tsk)) should have the same number of cache line misses with the practical difference that ns_of_pid(task_pid(tsk)) is released later in a process's life. Furthermore by using task_active_pid_ns it becomes trivial to write an unshare implementation for the pid namespace. So I have used task_active_pid_ns everywhere I can. In fork since the pid has not yet been attached to the process I use ns_of_pid, to achieve the same effect. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 05:59:09 -08:00
Eric W. Biederman	49f4d8b93c	pidns: Capture the user namespace and filter ns_last_pid - Capture the user namespace that creates the pid namespace - Use that user namespace to test if it is ok to write to /proc/sys/kernel/ns_last_pid. Zhao Hongjiang <zhaohongjiang@huawei.com> noticed I was missing a put_user_ns when destroying a pid_ns. I have folded his patch into this one so that bisects will work properly. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 05:57:31 -08:00
Eric W. Biederman	ae06c7c83f	procfs: Don't cache a pid in the root inode. Now that we have s_fs_info pointing to our pid namespace the original reason for the proc root inode having a struct pid is gone. Caching a pid in the root inode has led to some complicated code. Now that we don't need the struct pid, just remove it. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 03:09:35 -08:00
Eric W. Biederman	e656d8a6f7	procfs: Use the proc generic infrastructure for proc/self. I had visions at one point of splitting proc into two filesystems. If that had happened proc/self being the part of proc that actually deals with pids would have been a nice cleanup. As it is proc/self requires a lot of unnecessary infrastructure for a single file. The only user visible change is that a mounted /proc for a pid namespace that is dead now shows a broken proc symlink, instead of being completely invisible. I don't think anyone will notice or care. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-19 03:09:34 -08:00
Eric W. Biederman	dd34ad35c3	userns: On mips modify check_same_owner to use uid_eq The kbuild test robot <fengguang.wu@intel.com> reported the following error when building mips with user namespace support enabled. All error/warnings: arch/mips/kernel/mips-mt-fpaff.c: In function 'check_same_owner': arch/mips/kernel/mips-mt-fpaff.c:53:22: error: invalid operands to binary == (have 'kuid_t' and 'kuid_t') arch/mips/kernel/mips-mt-fpaff.c:54:15: error: invalid operands to binary == (have 'kuid_t' and 'kuid_t') Replacing "a == b" with uid_eq(a, b) removes this error and allows the code to work with user namespaces enabled. Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-11-19 03:05:15 -08:00
Eric W. Biederman	038e7332b8	userns: make each net (net_ns) belong to a user_ns The user namespace which creates a new network namespace owns that namespace and all resources created in it. This way we can target capability checks for privileged operations against network resources to the user_ns which created the network namespace in which the resource lives. Privilege to the user namespace which owns the network namespace, or any parent user namespace thereof, provides the same privilege to the network resource. This patch is reworked from a version originally by Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-18 22:46:23 -08:00
Eric W. Biederman	d727abcb23	netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NS The copy of copy_net_ns used when the network stack is not built is broken as it does not return -EINVAL when attempting to create a new network namespace. We don't even have a previous network namespace. Since we need a copy of copy_net_ns in net/net_namespace.h that is available when the networking stack is not built at all move the correct version of copy_net_ns from net_namespace.c into net_namespace.h Leaving us with just 2 versions of copy_net_ns. One version for when we compile in network namespace support and another stub for all other occasions. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-18 22:46:19 -08:00
Eric W. Biederman	499dcf2024	userns: Support fuse interacting with multiple user namespaces Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data. The connection between a fuse filesystem and a fuse daemon is established when a fuse filesystem is mounted and provided with a file descriptor the fuse daemon created by opening /dev/fuse. For now restrict the communication of uids and gids between the fuse filesystem and the fuse daemon to the initial user namespace. Enforce this by verifying the file descriptor passed to the mount of fuse was opened in the initial user namespace. Ensuring the mount happens in the initial user namespace is not necessary as mounts from non-initial user namespaces are not yet allowed. In fuse_req_init_context convert the current fsuid and fsgid into the initial user namespace for the request that will be sent to the fuse daemon. In fuse_fill_attr convert the uid and gid passed from the fuse daemon from the initial user namespace into kuids and kgids. In iattr_to_fattr called from fuse_setattr convert kuids and kgids into the uids and gids in the initial user namespace before passing them to the fuse filesystem. In fuse_change_attributes_common called from fuse_dentry_revalidate, fuse_permission, fuse_getattr, fuse_setattr, and fuse_iget convert the uid and gid from the fuse daemon into a kuid and a kgid to store on the fuse inode. By default fuse mounts are restricted to tasks whose uid, suid, and euid match the fuse user_id and whose gid, sgid, and egid match the fuse group id. Convert the user_id and group_id mount options into kuids and kgids at mount time, and use uid_eq and gid_eq to compare them in fuse_allow_task. Cc: Miklos Szeredi <miklos@szeredi.hu> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-14 22:05:33 -08:00
Eric W. Biederman	45634cd8cb	userns: Support autofs4 interacting with multiple user namespaces Use kuid_t and kgid_t in struct autofs_info and struct autofs_wait_queue. When creating directories and symlinks default the uid and gid of the mount requester to the global root uid and gid. autofs4_wait will update these fields when a mount is requested. When generating autofsv5 packets report the uid and gid of the mount requester in the user namespace of the process that opened the pipe, reporting unmapped uids and gids as overflowuid and overflowgid. In autofs_dev_ioctl_requester return the uid and gid of the last mount requester converted into the calling process's user namespace. When the uid or gid don't map return overflowuid and overflowgid as appropriate, allowing failure to find a mount requester to be distinguished from failure to map a mount requester. The uid and gid mount options specifying the user and group of the root autofs inode are converted into kuid and kgid as they are parsed, defaulting to the current uid and current gid of the process that mounts autofs. Mounting of autofs for the present remains confined to processes in the initial user namespace. Cc: Ian Kent <raven@themaw.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-14 22:05:32 -08:00
Linus Torvalds	8f0d8163b5	Linux 3.7-rc3	2012-10-28 12:24:48 -07:00
Linus Torvalds	5a5210c6ad	Merge tag 'ktest-v3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest Pull ktest confusion fix from Steven Rostedt: "With the v3.7-rc2 kernel, the network cards on my target boxes were not being brought up. I found that the modules for the network was not being installed. This was due to the config CONFIG_MODULES_USE_ELF_RELA that came before CONFIG_MODULES, and confused ktest in thinking that CONFIG_MODULES=y was not found. Ktest needs to test all configs and not just stop if something starts with CONFIG_MODULES." * tag 'ktest-v3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest: ktest: Fix ktest confusion with CONFIG_MODULES_USE_ELF_RELA	2012-10-28 11:14:52 -07:00
Linus Torvalds	8e99165a6f	spi: Some minor MXS fixes Merge tag 'spi-mxs' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc Pull minor spi MXS fixes from Mark Brown: "These fixes are both pretty minor ones and are driver local." * tag 'spi-mxs' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc: spi: mxs: Terminate DMA in case of DMA timeout spi: mxs: Assign message status after transfer finished	2012-10-28 11:13:54 -07:00
Linus Torvalds	065c8012b2	arm-soc: fixes for v3.7-rc3 Bug fixes for a number of ARM platforms, mostly OMAP, imx and at91. These come a little later than I had hoped but unfortunately we had a few of these patches cause regressions themselves and had to work out how to deal with those in the meantime. Merge tag 'fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull arm-soc fixes from Arnd Bergmann: "Bug fixes for a number of ARM platforms, mostly OMAP, imx and at91. These come a little later than I had hoped but unfortunately we had a few of these patches cause regressions themselves and had to work out how to deal with those in the meantime." * tag 'fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (38 commits) Revert "ARM i.MX25: Fix PWM per clock lookups" ARM: versatile: fix versatile_defconfig ARM: mvebu: update defconfig with 3.7 changes ARM: at91: fix at91x40 build ARM: socfpga: Fix socfpga compilation with early_printk() enabled ARM: SPEAr: Remove unused empty files MAINTAINERS: Add arm-soc tree entry ARM: dts: mxs: add the "clock-names" for gpmi-nand ARM: ux500: Correct SDI5 address and add some format changes ARM: ux500: Specify AMBA Primecell IDs for Nomadik I2C in DT ARM: ux500: Fix build error relating to IRQCHIP_SKIP_SET_WAKE ARM: at91: drop duplicated config SOC_AT91SAM9 entry ARM: at91/i2c: change id to let i2c-at91 work ARM: at91/i2c: change id to let i2c-gpio work ARM: at91/dts: at91sam9g20ek_common: Fix typos in buttons labels. ARM: at91: fix external interrupt specification in board code ARM: at91: fix external interrupts in non-DT case ARM: at91: at91sam9g10: fix SOC type detection ARM: at91/tc: fix typo in the DT document ARM: AM33XX: Fix configuration of dmtimer parent clock by dmtimer driverDate:Wed, 17 Oct 2012 13:55:55 -0500 ...	2012-10-28 11:12:38 -07:00
Mikulas Patocka	1a25b1c4ce	Lock splice_read and splice_write functions Functions generic_file_splice_read and generic_file_splice_write access the pagecache directly. For block devices these functions must be locked so that block size is not changed while they are in progress. This patch is an additional fix for commit `b87570f5d3` ("Fix a crash when block device is read and block size is changed at the same time") that locked aio_read, aio_write and mmap against block size change. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-28 10:59:37 -07:00
Mikulas Patocka	1bf11c5353	percpu-rw-semaphores: use rcu_read_lock_sched Use rcu_read_lock_sched / rcu_read_unlock_sched / synchronize_sched instead of rcu_read_lock / rcu_read_unlock / synchronize_rcu. This is an optimization. The RCU-protected region is very small, so there will be no latency problems if we disable preempt in this region. So we use rcu_read_lock_sched / rcu_read_unlock_sched that translates to preempt_disable / preempt_enable. It is smaller (and supposedly faster) than preemptible rcu_read_lock / rcu_read_unlock. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-28 10:59:36 -07:00
Mikulas Patocka	5c1eabe685	percpu-rw-semaphores: use light/heavy barriers This patch introduces new barrier pair light_mb() and heavy_mb() for percpu rw semaphores. This patch fixes a bug in percpu-rw-semaphores where a barrier was missing in percpu_up_write. This patch improves performance on the read path of percpu-rw-semaphores: on non-x86 cpus, there was a smp_mb() in percpu_up_read. This patch changes it to a compiler barrier and removes the "#if defined(X86) ..." condition. From: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-28 10:59:36 -07:00
Arnd Bergmann	943bb48755	Revert "ARM i.MX25: Fix PWM per clock lookups" This reverts commit `92063cee11`, it was applied prematurely, causing this build error for imx_v4_v5_defconfig: arch/arm/mach-imx/clk-imx25.c: In function 'mx25_clocks_init': arch/arm/mach-imx/clk-imx25.c:206:26: error: 'pwm_ipg_per' undeclared (first use in this function) arch/arm/mach-imx/clk-imx25.c:206:26: note: each undeclared identifier is reported only once for each function it appears in Sascha Hauer explains: > There are several gates missing in clk-imx25.c. I have a patch which > adds support for them and I seem to have missed that the above depends > on it. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2012-10-27 17:46:56 +02:00
Arnd Bergmann	5b627ba0f5	ARM: versatile: fix versatile_defconfig With the introduction of CONFIG_ARCH_MULTIPLATFORM, versatile is no longer the default platform, so we need to enable CONFIG_ARCH_VERSATILE explicitly in order for that to be selected rather than the multiplatform configuration. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2012-10-27 17:46:56 +02:00
Thomas Petazzoni	e09348c757	ARM: mvebu: update defconfig with 3.7 changes The split of 370 and XP into two Kconfig options and the multiplatform kernel support has changed a few Kconfig symbols, so let's update the mvebu_defconfig file with the latest changes. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2012-10-27 17:46:55 +02:00
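
The ktest fix listed above (5a5210c6ad) comes down to prefix matching versus exact matching on config symbol names. The following is a minimal Python sketch of that failure mode, not ktest's actual code (ktest itself is a Perl script). The function names `buggy_config_enabled` and `config_enabled` and the sample config text are hypothetical, chosen only for illustration.

```python
def buggy_config_enabled(config_text: str, symbol: str) -> bool:
    """Hypothetical reproduction of the bug: stop at the first line that
    merely starts with the symbol name, even if it is a longer symbol."""
    for line in config_text.splitlines():
        if line.startswith(symbol):
            # CONFIG_MODULES_USE_ELF_RELA=y also starts with CONFIG_MODULES,
            # so the search ends here with the wrong answer.
            return line == f"{symbol}=y"
    return False


def config_enabled(config_text: str, symbol: str) -> bool:
    """Check every line and compare the full symbol name exactly."""
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and "# CONFIG_FOO is not set" comments
        name, sep, value = line.partition("=")
        if sep and name == symbol:
            return value == "y"
    return False


# Assumed sample: the longer symbol appears before CONFIG_MODULES,
# as described in the commit message.
sample = "CONFIG_MODULES_USE_ELF_RELA=y\nCONFIG_MODULES=y\n"

print(buggy_config_enabled(sample, "CONFIG_MODULES"))
print(config_enabled(sample, "CONFIG_MODULES"))
```

With the sample text, the prefix-based check gives up on the `CONFIG_MODULES_USE_ELF_RELA` line and reports the symbol as missing, while the exact-match check keeps scanning and finds `CONFIG_MODULES=y`.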

335263 Commits