Commit Graph

335263 Commits

Author SHA1 Message Date
Eric W. Biederman
5e4a08476b userns: Require CAP_SYS_ADMIN for most uses of setns.
Andy Lutomirski <luto@amacapital.net> found a nasty little bug in
the permissions of setns.  With unprivileged user namespaces it
became possible to create new namespaces without privilege.

However the setns calls were relaxed to only require CAP_SYS_ADMIN in
the user nameapce of the targed namespace.

Which made the following nasty sequence possible.

pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
if (pid == 0) { /* child */
	system("mount --bind /home/me/passwd /etc/passwd");
}
else if (pid != 0) { /* parent */
	char path[PATH_MAX];
	snprintf(path, sizeof(path), "/proc/%u/ns/mnt");
	fd = open(path, O_RDONLY);
	setns(fd, 0);
	system("su -");
}

Prevent this possibility by requiring CAP_SYS_ADMIN
in the current user namespace when joing all but the user namespace.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-14 16:12:03 -08:00
Eric W. Biederman
520d9eabce Fix cap_capable to only allow owners in the parent user namespace to have caps.
Andy Lutomirski pointed out that the current behavior of allowing the
owner of a user namespace to have all caps when that owner is not in a
parent user namespace is wrong.  Add a test to ensure the owner of a user
namespace is in the parent of the user namespace to fix this bug.

Thankfully this bug did not apply to the initial user namespace, keeping
the mischief that can be caused by this bug quite small.

This is bug was introduced in v3.5 by commit 783291e690
"Simplify the user_namespace by making userns->creator a kuid."
But did not matter until the permisions required to create
a user namespace were relaxed allowing a user namespace to be created
inside of a user namespace.

The bug made it possible for the owner of a user namespace to be
present in a child user namespace.  Since the owner of a user nameapce
is granted all capabilities it became possible for users in a
grandchild user namespace to have all privilges over their parent user
namspace.

Reorder the checks in cap_capable.  This should make the common case
faster and make it clear that nothing magic happens in the initial
user namespace.  The reordering is safe because cred->user_ns
can only be in targ_ns or targ_ns->parent but not both.

Add a comment a the top of the loop to make the logic of
the code clear.

Add a distinct variable ns that changes as we walk up
the user namespace hierarchy to make it clear which variable
is changing.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-14 13:50:32 -08:00
Eric W. Biederman
98f842e675 proc: Usable inode numbers for the namespace file descriptors.
Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.

A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.

This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowd by the design if it becomes important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-20 04:19:49 -08:00
Eric W. Biederman
bf056bfa80 proc: Fix the namespace inode permission checks.
Change the proc namespace files into symlinks so that
we won't cache the dentries for the namespace files
which can bypass the ptrace_may_access checks.

To support the symlinks create an additional namespace
inode with it's own set of operations distinct from the
proc pid inode and dentry methods as those no longer
make sense.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-20 04:19:48 -08:00
Eric W. Biederman
33d6dce607 proc: Generalize proc inode allocation
Generalize the proc inode allocation so that it can be
used without having to having to create a proc_dir_entry.

This will allow namespace file descriptors to remain light
weight entitities but still have the same inode number
when the backing namespace is the same.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-20 04:19:19 -08:00
Eric W. Biederman
4f326c0064 userns: Allow unprivilged mounts of proc and sysfs
- The context in which proc and sysfs are mounted have no
  effect on the the uid/gid of their files so no conversion is
  needed except allowing the mount.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:19:18 -08:00
Eric W. Biederman
c450f371d4 userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
To keep things sane in the context of file descriptor passing derive the
user namespace that uids are mapped into from the opener of the file
instead of from current.

When writing to the maps file the lower user namespace must always
be the parent user namespace, or setting the mapping simply does
not make sense.  Enforce that the opener of the file was in
the parent user namespace or the user namespace whose mapping
is being set.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:18:55 -08:00
Eric W. Biederman
e9f238c304 procfs: Print task uids and gids in the userns that opened the proc file
Instead of using current_userns() use the userns of the opener
of the file so that if the file is passed between processes
the contents of the file do not change.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:18:15 -08:00
Eric W. Biederman
b2e0d98705 userns: Implement unshare of the user namespace
- Add CLONE_THREAD to the unshare flags if CLONE_NEWUSER is selected
  As changing user namespaces is only valid if all there is only
  a single thread.
- Restore the code to add CLONE_VM if CLONE_THREAD is selected and
  the code to addCLONE_SIGHAND if CLONE_VM is selected.
  Making the constraints in the code clear.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:18:14 -08:00
Eric W. Biederman
cde1975bc2 userns: Implent proc namespace operations
This allows entering a user namespace, and the ability
to store a reference to a user namespace with a bind
mount.

Addition of missing userns_ns_put in userns_install
from Gao feng <gaofeng@cn.fujitsu.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:18:13 -08:00
Eric W. Biederman
4c44aaafa8 userns: Kill task_user_ns
The task_user_ns function hides the fact that it is getting the user
namespace from struct cred on the task.  struct cred may go away as
soon as the rcu lock is released.  This leads to a race where we
can dereference a stale user namespace pointer.

To make it obvious a struct cred is involved kill task_user_ns.

To kill the race modify the users of task_user_ns to only
reference the user namespace while the rcu lock is held.

Cc: Kees Cook <keescook@chromium.org>
Cc: James Morris <james.l.morris@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:17:44 -08:00
Eric W. Biederman
bcf58e725d userns: Make create_new_namespaces take a user_ns parameter
Modify create_new_namespaces to explicitly take a user namespace
parameter, instead of implicitly through the task_struct.

This allows an implementation of unshare(CLONE_NEWUSER) where
the new user namespace is not stored onto the current task_struct
until after all of the namespaces are created.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:17:43 -08:00
Eric W. Biederman
142e1d1d5f userns: Allow unprivileged use of setns.
- Push the permission check from the core setns syscall into
  the setns install methods where the user namespace of the
  target namespace can be determined, and used in a ns_capable
  call.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:17:42 -08:00
Eric W. Biederman
b33c77ef23 userns: Allow unprivileged users to create new namespaces
If an unprivileged user has the appropriate capabilities in their
current user namespace allow the creation of new namespaces.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:17:41 -08:00
Eric W. Biederman
37657da3c5 userns: Allow setting a userns mapping to your current uid.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:17:40 -08:00
Eric W. Biederman
7fa294c899 userns: Allow chown and setgid preservation
- Allow chown if CAP_CHOWN is present in the current user namespace
  and the uid of the inode maps into the current user namespace, and
  the destination uid or gid maps into the current user namespace.

- Allow perserving setgid when changing an inode if CAP_FSETID is
  present in the current user namespace and the owner of the file has
  a mapping into the current user namespace.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-20 04:17:24 -08:00
Eric W. Biederman
5eaf563e53 userns: Allow unprivileged users to create user namespaces.
Now that we have been through every permission check in the kernel
having uid == 0 and gid == 0 in your local user namespace no
longer adds any special privileges.  Even having a full set
of caps in your local user namespace is safe because capabilies
are relative to your local user namespace, and do not confer
unexpected privileges.

Over the long term this should allow much more of the kernels
functionality to be safely used by non-root users.  Functionality
like unsharing the mount namespace that is only unsafe because
it can fool applications whose privileges are raised when they
are executed.  Since those applications have no privileges in
a user namespaces it becomes safe to spoof and confuse those
applications all you want.

Those capabilities will still need to be enabled carefully because
we may still need things like rlimits on the number of unprivileged
mounts but that is to avoid DOS attacks not to avoid fooling root
owned processes.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:24 -08:00
Eric W. Biederman
3cdf5b45ff userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
When performing an exec where the binary lives in one user namespace and
the execing process lives in another usre namespace there is the possibility
that the target uids can not be represented.

Instead of failing the exec simply ignore the suid/sgid bits and run
the binary with lower privileges.   We already do this in the case
of MNT_NOSUID so this should be a well tested code path.

As the user and group are not changed this should not introduce any
security issues.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:23 -08:00
Zhao Hongjiang
ae11e0f184 userns: fix return value on mntns_install() failure
Change return value from -EINVAL to -EPERM when the permission check fails.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:22 -08:00
Eric W. Biederman
0c55cfc416 vfs: Allow unprivileged manipulation of the mount namespace.
- Add a filesystem flag to mark filesystems that are safe to mount as
  an unprivileged user.

- Add a filesystem flag to mark filesystems that don't need MNT_NODEV
  when mounted by an unprivileged user.

- Relax the permission checks to allow unprivileged users that have
  CAP_SYS_ADMIN permissions in the user namespace referred to by the
  current mount namespace to be allowed to mount, unmount, and move
  filesystems.

Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 05:59:21 -08:00
Eric W. Biederman
7a472ef4be vfs: Only support slave subtrees across different user namespaces
Sharing mount subtress with mount namespaces created by unprivileged
users allows unprivileged mounts created by unprivileged users to
propagate to mount namespaces controlled by privileged users.

Prevent nasty consequences by changing shared subtrees to slave
subtress when an unprivileged users creates a new mount namespace.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 05:59:20 -08:00
Eric W. Biederman
771b137168 vfs: Add a user namespace reference from struct mnt_namespace
This will allow for support for unprivileged mounts in a new user namespace.

Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 05:59:19 -08:00
Eric W. Biederman
8823c079ba vfs: Add setns support for the mount namespace
setns support for the mount namespace is a little tricky as an
arbitrary decision must be made about what to set fs->root and
fs->pwd to, as there is no expectation of a relationship between
the two mount namespaces.  Therefore I arbitrarily find the root
mount point, and follow every mount on top of it to find the top
of the mount stack.  Then I set fs->root and fs->pwd to that
location.  The topmost root of the mount stack seems like a
reasonable place to be.

Bind mount support for the mount namespace inodes has the
possibility of creating circular dependencies between mount
namespaces.  Circular dependencies can result in loops that
prevent mount namespaces from every being freed.  I avoid
creating those circular dependencies by adding a sequence number
to the mount namespace and require all bind mounts be of a
younger mount namespace into an older mount namespace.

Add a helper function proc_ns_inode so it is possible to
detect when we are attempting to bind mound a namespace inode.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:18 -08:00
Eric W. Biederman
a85fb273c9 vfs: Allow chroot if you have CAP_SYS_CHROOT in your user namespace
Once you are confined to a user namespace applications can not gain
privilege and escape the user namespace so there is no longer a reason
to restrict chroot.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 05:59:17 -08:00
Eric W. Biederman
50804fe373 pidns: Support unsharing the pid namespace.
Unsharing of the pid namespace unlike unsharing of other namespaces
does not take affect immediately.  Instead it affects the children
created with fork and clone.  The first of these children becomes the init
process of the new pid namespace, the rest become oddball children
of pid 0.  From the point of view of the new pid namespace the process
that created it is pid 0, as it's pid does not map.

A couple of different semantics were considered but this one was
settled on because it is easy to implement and it is usable from
pam modules.  The core reasons for the existence of unshare.

I took a survey of the callers of pam modules and the following
appears to be a representative sample of their logic.
{
	setup stuff include pam
	child = fork();
	if (!child) {
		setuid()
                exec /bin/bash
        }
        waitpid(child);

        pam and other cleanup
}

As you can see there is a fork to create the unprivileged user
space process.  Which means that the unprivileged user space
process will appear as pid 1 in the new pid namespace.  Further
most login processes do not cope with extraneous children which
means shifting the duty of reaping extraneous child process to
the creator of those extraneous children makes the system more
comprehensible.

The practical reason for this set of pid namespace semantics is
that it is simple to implement and verify they work correctly.
Whereas an implementation that requres changing the struct
pid on a process comes with a lot more races and pain.  Not
the least of which is that glibc caches getpid().

These semantics are implemented by having two notions
of the pid namespace of a proces.  There is task_active_pid_ns
which is the pid namspace the process was created with
and the pid namespace that all pids are presented to
that process in.  The task_active_pid_ns is stored
in the struct pid of the task.

Then there is the pid namespace that will be used for children
that pid namespace is stored in task->nsproxy->pid_ns.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:16 -08:00
Eric W. Biederman
1c4042c29b pidns: Consolidate initialzation of special init task state
Instead of setting child_reaper and SIGNAL_UNKILLABLE one way
for the system init process, and another way for pid namespace
init processes test pid->nr == 1 and use the same code for both.

For the global init this results in SIGNAL_UNKILLABLE being set
much earlier in the initialization process.

This is a small cleanup and it paves the way for allowing unshare and
enter of the pid namespace as that path like our global init also will
not set CLONE_NEWPID.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:15 -08:00
Eric W. Biederman
57e8391d32 pidns: Add setns support
- Pid namespaces are designed to be inescapable so verify that the
  passed in pid namespace is a child of the currently active
  pid namespace or the currently active pid namespace itself.

  Allowing the currently active pid namespace is important so
  the effects of an earlier setns can be cancelled.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:14 -08:00
Eric W. Biederman
225778d68d pidns: Deny strange cases when creating pid namespaces.
task_active_pid_ns(current) != current->ns_proxy->pid_ns will
soon be allowed to support unshare and setns.

The definition of creating a child pid namespace when
task_active_pid_ns(current) != current->ns_proxy->pid_ns could be that
we create a child pid namespace of current->ns_proxy->pid_ns.  However
that leads to strange cases like trying to have a single process be
init in multiple pid namespaces, which is racy and hard to think
about.

The definition of creating a child pid namespace when
task_active_pid_ns(current) != current->ns_proxy->pid_ns could be that
we create a child pid namespace of task_active_pid_ns(current).  While
that seems less racy it does not provide any utility.

Therefore define the semantics of creating a child pid namespace when
task_active_pid_ns(current) != current->ns_proxy->pid_ns to be that the
pid namespace creation fails.  That is easy to implement and easy
to think about.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 05:59:13 -08:00
Eric W. Biederman
af4b8a83ad pidns: Wait in zap_pid_ns_processes until pid_ns->nr_hashed == 1
Looking at pid_ns->nr_hashed is a bit simpler and it works for
disjoint process trees that an unshare or a join of a pid_namespace
may create.

Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 05:59:12 -08:00
Eric W. Biederman
5e1182deb8 pidns: Don't allow new processes in a dead pid namespace.
Set nr_hashed to -1 just before we schedule the work to cleanup proc.
Test nr_hashed just before we hash a new pid and if nr_hashed is < 0
fail.

This guaranteees that processes never enter a pid namespaces after we
have cleaned up the state to support processes in a pid namespace.

Currently sending SIGKILL to all of the process in a pid namespace as
init exists gives us this guarantee but we need something a little
stronger to support unsharing and joining a pid namespace.

Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:11 -08:00
Eric W. Biederman
0a01f2cc39 pidns: Make the pidns proc mount/umount logic obvious.
Track the number of pids in the proc hash table.  When the number of
pids goes to 0 schedule work to unmount the kernel mount of proc.

Move the mount of proc into alloc_pid when we allocate the pid for
init.

Remove the surprising calls of pid_ns_release proc in fork and
proc_flush_task.  Those code paths really shouldn't know about proc
namespace implementation details and people have demonstrated several
times that finding and understanding those code paths is difficult and
non-obvious.

Because of the call path detach pid is alwasy called with the
rtnl_lock held free_pid is not allowed to sleep, so the work to
unmounting proc is moved to a work queue.  This has the side benefit
of not blocking the entire world waiting for the unnecessary
rcu_barrier in deactivate_locked_super.

In the process of making the code clear and obvious this fixes a bug
reported by Gao feng <gaofeng@cn.fujitsu.com> where we would leak a
mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns
succeeded and copy_net_ns failed.

Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 05:59:10 -08:00
Eric W. Biederman
17cf22c33e pidns: Use task_active_pid_ns where appropriate
The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
aka ns_of_pid(task_pid(tsk)) should have the same number of
cache line misses with the practical difference that
ns_of_pid(task_pid(tsk)) is released later in a processes life.

Furthermore by using task_active_pid_ns it becomes trivial
to write an unshare implementation for the the pid namespace.

So I have used task_active_pid_ns everywhere I can.

In fork since the pid has not yet been attached to the
process I use ns_of_pid, to achieve the same effect.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 05:59:09 -08:00
Eric W. Biederman
49f4d8b93c pidns: Capture the user namespace and filter ns_last_pid
- Capture the the user namespace that creates the pid namespace
- Use that user namespace to test if it is ok to write to
  /proc/sys/kernel/ns_last_pid.

Zhao Hongjiang <zhaohongjiang@huawei.com> noticed I was missing a put_user_ns
in when destroying a pid_ns.  I have foloded his patch into this one
so that bisects will work properly.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 05:57:31 -08:00
Eric W. Biederman
ae06c7c83f procfs: Don't cache a pid in the root inode.
Now that we have s_fs_info pointing to our pid namespace
the original reason for the proc root inode having a struct
pid is gone.

Caching a pid in the root inode has led to some complicated
code.  Now that we don't need the struct pid, just remove it.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 03:09:35 -08:00
Eric W. Biederman
e656d8a6f7 procfs: Use the proc generic infrastructure for proc/self.
I had visions at one point of splitting proc into two filesystems.  If
that had happened proc/self being the the part of proc that actually deals
with pids would have been a nice cleanup.  As it is proc/self requires
a lot of unnecessary infrastructure for a single file.

The only user visible change is that a mounted /proc for a pid namespace
that is dead now shows a broken proc symlink, instead of being completely
invisible.  I don't think anyone will notice or care.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-19 03:09:34 -08:00
Eric W. Biederman
dd34ad35c3 userns: On mips modify check_same_owner to use uid_eq
The kbuild test robot <fengguang.wu@intel.com> report the following error
when building mips with user namespace support enabled.

All error/warnings:
arch/mips/kernel/mips-mt-fpaff.c: In function 'check_same_owner':
arch/mips/kernel/mips-mt-fpaff.c:53:22: error: invalid operands to binary == (have 'kuid_t' and 'kuid_t')
arch/mips/kernel/mips-mt-fpaff.c:54:15: error: invalid operands to binary == (have 'kuid_t' and 'kuid_t')

Replace "a == b" with uid_eq(a, b) removes this error and allows the
code to work with user namespaces enabled.

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-11-19 03:05:15 -08:00
Eric W. Biederman
038e7332b8 userns: make each net (net_ns) belong to a user_ns
The user namespace which creates a new network namespace owns that
namespace and all resources created in it.  This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives.  Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.

This patch is reworked from a version originally by
Serge E. Hallyn <serge.hallyn@canonical.com>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-18 22:46:23 -08:00
Eric W. Biederman
d727abcb23 netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NS
The copy of copy_net_ns used when the network stack is not
built is broken as it does not return -EINVAL when attempting
to create a new network namespace.  We don't even have
a previous network namespace.

Since we need a copy of copy_net_ns in net/net_namespace.h that is
available when the networking stack is not built at all move the
correct version of copy_net_ns from net_namespace.c into net_namespace.h
Leaving us with just 2 versions of copy_net_ns.  One version for when
we compile in network namespace suport and another stub for all other
occasions.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-18 22:46:19 -08:00
Eric W. Biederman
499dcf2024 userns: Support fuse interacting with multiple user namespaces
Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data.

The connection between between a fuse filesystem and a fuse daemon is
established when a fuse filesystem is mounted and provided with a file
descriptor the fuse daemon created by opening /dev/fuse.

For now restrict the communication of uids and gids between the fuse
filesystem and the fuse daemon to the initial user namespace.  Enforce
this by verifying the file descriptor passed to the mount of fuse was
opened in the initial user namespace.  Ensuring the mount happens in
the initial user namespace is not necessary as mounts from non-initial
user namespaces are not yet allowed.

In fuse_req_init_context convert the currrent fsuid and fsgid into the
initial user namespace for the request that will be sent to the fuse
daemon.

In fuse_fill_attr convert the uid and gid passed from the fuse daemon
from the initial user namespace into kuids and kgids.

In iattr_to_fattr called from fuse_setattr convert kuids and kgids
into the uids and gids in the initial user namespace before passing
them to the fuse filesystem.

In fuse_change_attributes_common called from fuse_dentry_revalidate,
fuse_permission, fuse_geattr, and fuse_setattr, and fuse_iget convert
the uid and gid from the fuse daemon into a kuid and a kgid to store
on the fuse inode.

By default fuse mounts are restricted to task whose uid, suid, and
euid matches the fuse user_id and whose gid, sgid, and egid matches
the fuse group id.  Convert the user_id and group_id mount options
into kuids and kgids at mount time, and use uid_eq and gid_eq to
compare the in fuse_allow_task.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-14 22:05:33 -08:00
Eric W. Biederman
45634cd8cb userns: Support autofs4 interacing with multiple user namespaces
Use kuid_t and kgid_t in struct autofs_info and struct autofs_wait_queue.

When creating directories and symlinks default the uid and gid of
the mount requester to the global root uid and gid.  autofs4_wait
will update these fields when a mount is requested.

When generating autofsv5 packets report the uid and gid of the mount
requestor in user namespace of the process that opened the pipe,
reporting unmapped uids and gids as overflowuid and overflowgid.

In autofs_dev_ioctl_requester return the uid and gid of the last mount
requester converted into the calling processes user namespace.  When the
uid or gid don't map return overflowuid and overflowgid as appropriate,
allowing failure to find a mount requester to be distinguished from
failure to map a mount requester.

The uid and gid mount options specifying the user and group of the
root autofs inode are converted into kuid and kgid as they are parsed
defaulting to the current uid and current gid of the process that
mounts autofs.

Mounting of autofs for the present remains confined to processes in
the initial user namespace.

Cc: Ian Kent <raven@themaw.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-11-14 22:05:32 -08:00
Linus Torvalds
8f0d8163b5 Linux 3.7-rc3 2012-10-28 12:24:48 -07:00
Linus Torvalds
5a5210c6ad With the v3.7-rc2 kernel, the network cards on my target boxes
were not being brought up. I found that the modules for the
 network was not being installed. This was due to the config
 CONFIG_MODULES_USE_ELF_RELA that came before CONFIG_MODULES, and
 confused ktest in thinking that CONFIG_MODULES=y was not found.
 
 Ktest needs to test all configs and not just stop if something starts
 with CONFIG_MODULES.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQig8hAAoJEOdOSU1xswtMkFgIALLcnba79RHsPdGCTX247Hcg
 UdteytZgyd1XayDSPLOVAR5f1vJeZ/6/L5dwWqZpf+j6wUTBwdUTc4DlBwHNpi8V
 XDKbwAYWAQp4BVaQkKcrxKZZepE791NWxCelG7T7S0d7jIkwFTA4BhT4+8+QFztZ
 H5IDL+HA73Jvehfv3gpJW6yDQ/QSyUjIK4QCsJS+wodB9nDzhAEiZ6ZKflSXFGq4
 J+Fl/UfRfnA+j0aP75ecL7hewfdiLOmK67vKvW3l8wZ7s0y3NnIsxymmaa6sTIAQ
 lIAsmSPdqOzXExIKLBHnsHCog6UW4a91MmEqM05IDpt+AcCnwDbk4EfbJEXa8ug=
 =vl1t
 -----END PGP SIGNATURE-----

Merge tag 'ktest-v3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest

Pull ktest confusion fix from Steven Rostedt:
 "With the v3.7-rc2 kernel, the network cards on my target boxes were
  not being brought up.

  I found that the modules for the network was not being installed.
  This was due to the config CONFIG_MODULES_USE_ELF_RELA that came
  before CONFIG_MODULES, and confused ktest in thinking that
  CONFIG_MODULES=y was not found.

  Ktest needs to test all configs and not just stop if something starts
  with CONFIG_MODULES."

* tag 'ktest-v3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
  ktest: Fix ktest confusion with CONFIG_MODULES_USE_ELF_RELA
2012-10-28 11:14:52 -07:00
Linus Torvalds
8e99165a6f spi: Some minor MXS fixes
These fixes are both pretty minor ones and are driver local.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQjEZeAAoJELSic+t+oim9RyMP/RjRpB1GanO9CAYpW9cQ7R69
 vPmXfvUMWBTojC70EYvHV8juvbpJCBOLlUwbgFBgnmWasNd7lqOlyOAJ4NlhPDTs
 6+iHenM/lDBTZ/9bQUuW3DFsUW172R2j1wI7kVsxqLDKDKdgch5YuTxXiTMKcKAB
 dX4rJ2m0X/6VGx7D3wIeNz+LPW7+4jbJLf5P4bC4Yg1xxpG6QxPPraXLznElFme8
 yfrW3Vo6BnXiL29YDF95RTpLhcZ8ZVM0juT2VJPQ8EvcZLcWpywqdMV8EnHRJrgT
 BlXT2xuxJsH5et0KYrgFAinbEdwnbIHHu31hKVzUddZ0j2BLYtpfv27f84bE9DcW
 c1QMu41yf/KYzLwgBNDpuA/q3/8SJEeaK/c17TbEvjZXwu8rZXCGHHnMRCIveEiC
 B9wFzAX1xo44YOQGXYrEEP/QkTZAUUvMHJl36/ErWGk5RaiHSFwgzb5JdxGgVdnO
 NvwjRiyDLWUV1WScD91D662/7D3njuKM8Ft1pq5WUxJYWQY3g8rQ09E26aMyY4w4
 +H3GDKS6gXTZwoRSFWBOETnCSKorM98nV2pE7JQcfaW+GZ7x7VYPp86Dw8JtAXiZ
 4a8X9LaNmxUXIKDJqjwzcdk5lH+KkhodaSTbYnPZbf0VezlNvcp8MMmSHMZ9gyg/
 9CcFfKmZMJ7alSGccohc
 =3xEy
 -----END PGP SIGNATURE-----

Merge tag 'spi-mxs' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc

Pull minor spi MXS fixes from Mark Brown:
 "These fixes are both pretty minor ones and are driver local."

* tag 'spi-mxs' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc:
  spi: mxs: Terminate DMA in case of DMA timeout
  spi: mxs: Assign message status after transfer finished
2012-10-28 11:13:54 -07:00
Linus Torvalds
065c8012b2 arm-soc: fixes for v3.7-rc3
Bug fixes for a number of ARM platforms, mostly OMAP, imx and at91.
 These come a little later than I had hoped but unfortunately we
 had a few of these patches cause regressions themselves and had to
 work out how to deal with those in the meantime.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIVAwUAUIwDWmCrR//JCVInAQJREBAAwkH8kI/Xl3JqTfP4A69P8fOdD0p1ZC08
 QHzRdgXixpssiIC2wKwM4N4Ine23p1sbGIHHjnDMyTytFXGl7RIRjIXucm3NVBq5
 bw5uW5HziO8Pg+uA0ieZiqDEvroIw6U0AxKEKrZ9Fpc9XBr9RArIsRtTNyoFli+2
 JBgQ5eHYq4cq3cmX1XkU4q7RVUUA6XE/Vqs9IT6dfK4x56RR0Huri/ldkxqsLNj+
 HdN+7QoTz4wUjhF1tqCZt/3bo1dUONpDu4DJPnzscQA77HplQsSF3MsY5AEajjsA
 8mKG6AOjmvZsqJFjGYsq/r4DerPj2ME+1z84y5xrMI5WUxJL/6fj5uGTNsdVxifW
 scywLEG9bRjCehgoAg26XZWNKy6NuzkONxR9fjbrj9vGopje23VT5OXgeygesUD2
 WTbI3qeZz/O1esDBQ9D025K3a9kTCsJltstO2oVubGWgqvG2oK8LTqjeu8DwM2ti
 tloNQmylOKOaxnYm9TSouDRpQ0MPFVxMxe1VwFxzry7Mz3+lfyC2/fiYpZLC+OgQ
 2TjclUB4aIXLPVJAsAxu9Z8vEhx11EtghkeWy5Hk4TT3dXgn77MnyAPWp594DjQ0
 WdHrCNCK+K0Kk7R2FDkaZi2CvdCd1+AS6xyXjO3CmA7HbWLDEUlRg4/24/AzLK3j
 rO+bw62yQKg=
 =IDdm
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull arm-soc fixes from Arnd Bergmann:
 "Bug fixes for a number of ARM platforms, mostly OMAP, imx and at91.

  These come a little later than I had hoped but unfortunately we had a
  few of these patches cause regressions themselves and had to work out
  how to deal with those in the meantime."

* tag 'fixes-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (38 commits)
  Revert "ARM i.MX25: Fix PWM per clock lookups"
  ARM: versatile: fix versatile_defconfig
  ARM: mvebu: update defconfig with 3.7 changes
  ARM: at91: fix at91x40 build
  ARM: socfpga: Fix socfpga compilation with early_printk() enabled
  ARM: SPEAr: Remove unused empty files
  MAINTAINERS: Add arm-soc tree entry
  ARM: dts: mxs: add the "clock-names" for gpmi-nand
  ARM: ux500: Correct SDI5 address and add some format changes
  ARM: ux500: Specify AMBA Primecell IDs for Nomadik I2C in DT
  ARM: ux500: Fix build error relating to IRQCHIP_SKIP_SET_WAKE
  ARM: at91: drop duplicated config SOC_AT91SAM9 entry
  ARM: at91/i2c: change id to let i2c-at91 work
  ARM: at91/i2c: change id to let i2c-gpio work
  ARM: at91/dts: at91sam9g20ek_common: Fix typos in buttons labels.
  ARM: at91: fix external interrupt specification in board code
  ARM: at91: fix external interrupts in non-DT case
  ARM: at91: at91sam9g10: fix SOC type detection
  ARM: at91/tc: fix typo in the DT document
  ARM: AM33XX: Fix configuration of dmtimer parent clock by dmtimer driverDate:Wed, 17 Oct 2012 13:55:55 -0500
  ...
2012-10-28 11:12:38 -07:00
Mikulas Patocka
1a25b1c4ce Lock splice_read and splice_write functions
Functions generic_file_splice_read and generic_file_splice_write access
the pagecache directly. For block devices these functions must be locked
so that block size is not changed while they are in progress.

This patch is an additional fix for commit b87570f5d3 ("Fix a crash
when block device is read and block size is changed at the same time")
that locked aio_read, aio_write and mmap against block size change.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-28 10:59:37 -07:00
Mikulas Patocka
1bf11c5353 percpu-rw-semaphores: use rcu_read_lock_sched
Use rcu_read_lock_sched / rcu_read_unlock_sched / synchronize_sched
instead of rcu_read_lock / rcu_read_unlock / synchronize_rcu.

This is an optimization. The RCU-protected region is very small, so
there will be no latency problems if we disable preempt in this region.

So we use rcu_read_lock_sched / rcu_read_unlock_sched that translates
to preempt_disable / preempt_disable. It is smaller (and supposedly
faster) than preemptible rcu_read_lock / rcu_read_unlock.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-28 10:59:36 -07:00
Mikulas Patocka
5c1eabe685 percpu-rw-semaphores: use light/heavy barriers
This patch introduces new barrier pair light_mb() and heavy_mb() for
percpu rw semaphores.

This patch fixes a bug in percpu-rw-semaphores where a barrier was
missing in percpu_up_write.

This patch improves performance on the read path of
percpu-rw-semaphores: on non-x86 cpus, there was a smp_mb() in
percpu_up_read. This patch changes it to a compiler barrier and removes
the "#if defined(X86) ..." condition.

From: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-28 10:59:36 -07:00
Arnd Bergmann
943bb48755 Revert "ARM i.MX25: Fix PWM per clock lookups"
This reverts commit 92063cee11, it
was applied prematurely, causing this build error for
imx_v4_v5_defconfig:

arch/arm/mach-imx/clk-imx25.c: In function 'mx25_clocks_init':
arch/arm/mach-imx/clk-imx25.c:206:26: error: 'pwm_ipg_per' undeclared (first use in this function)
arch/arm/mach-imx/clk-imx25.c:206:26: note: each undeclared identifier is reported only once for each function it appears in

Sascha Hauer explains:
> There are several gates missing in clk-imx25.c. I have a patch which
> adds support for them and I seem to have missed that the above depends
> on it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2012-10-27 17:46:56 +02:00
Arnd Bergmann
5b627ba0f5 ARM: versatile: fix versatile_defconfig
With the introduction of CONFIG_ARCH_MULTIPLATFORM, versatile is
no longer the default platform, so we need to enable
CONFIG_ARCH_VERSATILE explicitly in order for that to be selected
rather than the multiplatform configuration.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2012-10-27 17:46:56 +02:00
Thomas Petazzoni
e09348c757 ARM: mvebu: update defconfig with 3.7 changes
The split of 370 and XP into two Kconfig options and the multiplatform
kernel support has changed a few Kconfig symbols, so let's update the
mvebu_defconfig file with the latest changes.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2012-10-27 17:46:55 +02:00