In cred_alloc_blank() since 2.6.32, abort_creds(new) is called with
new->security == NULL and new->magic == 0 when security_cred_alloc_blank()
returns an error. As a result, BUG() will be triggered if SELinux is enabled
or CONFIG_DEBUG_CREDENTIALS=y.
If CONFIG_DEBUG_CREDENTIALS=y, BUG() is called from __invalid_creds() because
cred->magic == 0. Failing that, BUG() is called from selinux_cred_free()
because selinux_cred_free() is not expecting cred->security == NULL. This does
not affect smack_cred_free(), tomoyo_cred_free() or apparmor_cred_free().
Fix these bugs by
(1) Set new->magic before calling security_cred_alloc_blank().
(2) Handle null cred->security in creds_are_invalid() and selinux_cred_free().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (30 commits)
MAINTAINERS: Add tomoyo-dev-en ML.
SELinux: define permissions for DCB netlink messages
encrypted-keys: style and other cleanup
encrypted-keys: verify datablob size before converting to binary
trusted-keys: kzalloc and other cleanup
trusted-keys: additional TSS return code and other error handling
syslog: check cap_syslog when dmesg_restrict
Smack: Transmute labels on specified directories
selinux: cache sidtab_context_to_sid results
SELinux: do not compute transition labels on mountpoint labeled filesystems
This patch adds a new security attribute to Smack called SMACK64EXEC. It defines label that is used while task is running.
SELinux: merge policydb_index_classes and policydb_index_others
selinux: convert part of the sym_val_to_name array to use flex_array
selinux: convert type_val_to_struct to flex_array
flex_array: fix flex_array_put_ptr macro to be valid C
SELinux: do not set automatic i_ino in selinuxfs
selinux: rework security_netlbl_secattr_to_sid
SELinux: standardize return code handling in selinuxfs.c
SELinux: standardize return code handling in selinuxfs.c
SELinux: standardize return code handling in policydb.c
...
Remove path.h from sched.h and other files.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
security/smack/smack_lsm.c
Verified and added fix by Stephen Rothwell <sfr@canb.auug.org.au>
Ok'd by Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
dget_locked was a shortcut to avoid the lazy lru manipulation when we already
held dcache_lock (lru manipulation was relatively cheap at that point).
However, how that the lru lock is an innermost one, we never hold it at any
caller, so the lock cost can now be avoided. We already have well working lazy
dcache LRU, so it should be fine to defer LRU manipulations to scan time.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
using dcache_lock for these anyway (eg. using i_mutex).
Note: if we change the locking rule in future so that ->d_child protection is
provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
But it would be an exception to an otherwise regular locking scheme, so we'd
have to see some good results. Probably not worthwhile.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
unix_release() can asynchornously set socket->sk to NULL, and
it does so without holding the unix_state_lock() on "other"
during stream connects.
However, the reverse mapping, sk->sk_socket, is only transitioned
to NULL under the unix_state_lock().
Therefore make the security hooks follow the reverse mapping instead
of the forward mapping.
Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2f90b865 added two new netlink message types to the netlink route
socket. SELinux has hooks to define if netlink messages are allowed to
be sent or received, but it did not know about these two new message
types. By default we allow such actions so noone likely noticed. This
patch adds the proper definitions and thus proper permissions
enforcement.
Signed-off-by: Eric Paris <eparis@redhat.com>
sidtab_context_to_sid takes up a large share of time when creating large
numbers of new inodes (~30-40% in oprofile runs). This patch implements a
cache of 3 entries which is checked before we do a full context_to_sid lookup.
On one system this showed over a x3 improvement in the number of inodes that
could be created per second and around a 20% improvement on another system.
Any time we look up the same context string sucessivly (imagine ls -lZ) we
should hit this cache hot. A cache miss should have a relatively minor affect
on performance next to doing the full table search.
All operations on the cache are done COMPLETELY lockless. We know that all
struct sidtab_node objects created will never be deleted until a new policy is
loaded thus we never have to worry about a pointer being dereferenced. Since
we also know that pointer assignment is atomic we know that the cache will
always have valid pointers. Given this information we implement a FIFO cache
in an array of 3 pointers. Every result (whether a cache hit or table lookup)
will be places in the 0 spot of the cache and the rest of the entries moved
down one spot. The 3rd entry will be lost.
Races are possible and are even likely to happen. Lets assume that 4 tasks
are hitting sidtab_context_to_sid. The first task checks against the first
entry in the cache and it is a miss. Now lets assume a second task updates
the cache with a new entry. This will push the first entry back to the second
spot. Now the first task might check against the second entry (which it
already checked) and will miss again. Now say some third task updates the
cache and push the second entry to the third spot. The first task my check
the third entry (for the third time!) and again have a miss. At which point
it will just do a full table lookup. No big deal!
Signed-off-by: Eric Paris <eparis@redhat.com>
selinux_inode_init_security computes transitions sids even for filesystems
that use mount point labeling. It shouldn't do that. It should just use
the mount point label always and no matter what.
This causes 2 problems. 1) it makes file creation slower than it needs to be
since we calculate the transition sid and 2) it allows files to be created
with a different label than the mount point!
# id -Z
staff_u:sysadm_r:sysadm_t:s0-s0:c0.c1023
# sesearch --type --class file --source sysadm_t --target tmp_t
Found 1 semantic te rules:
type_transition sysadm_t tmp_t : file user_tmp_t;
# mount -o loop,context="system_u:object_r:tmp_t:s0" /tmp/fs /mnt/tmp
# ls -lZ /mnt/tmp
drwx------. root root system_u:object_r:tmp_t:s0 lost+found
# touch /mnt/tmp/file1
# ls -lZ /mnt/tmp
-rw-r--r--. root root staff_u:object_r:user_tmp_t:s0 file1
drwx------. root root system_u:object_r:tmp_t:s0 lost+found
Whoops, we have a mount point labeled filesystem tmp_t with a user_tmp_t
labeled file!
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Reviewed-by: James Morris <jmorris@namei.org>
We duplicate functionality in policydb_index_classes() and
policydb_index_others(). This patch merges those functions just to make it
clear there is nothing special happening here.
Signed-off-by: Eric Paris <eparis@redhat.com>
The sym_val_to_name type array can be quite large as it grows linearly with
the number of types. With known policies having over 5k types these
allocations are growing large enough that they are likely to fail. Convert
those to flex_array so no allocation is larger than PAGE_SIZE
Signed-off-by: Eric Paris <eparis@redhat.com>
In rawhide type_val_to_struct will allocate 26848 bytes, an order 3
allocations. While this hasn't been seen to fail it isn't outside the
realm of possibiliy on systems with severe memory fragmentation. Convert
to flex_array so no allocation will ever be bigger than PAGE_SIZE.
Signed-off-by: Eric Paris <eparis@redhat.com>
selinuxfs carefully uses i_ino to figure out what the inode refers to. The
VFS used to generically set this value and we would reset it to something
useable. After 85fe4025c6 each filesystem sets this value to a default
if needed. Since selinuxfs doesn't use the default value and it can only
lead to problems (I'd rather have 2 inodes with i_ino == 0 than one
pointing to the wrong data) lets just stop setting a default.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
security_netlbl_secattr_to_sid is difficult to follow, especially the
return codes. Try to make the function obvious.
Signed-off-by: Eric Paris <eparis@redhat.com>
selinuxfs.c has lots of different standards on how to handle return paths on
error. For the most part transition to
rc=errno
if (failure)
goto out;
[...]
out:
cleanup()
return rc;
Instead of doing cleanup mid function, or having multiple returns or other
options. This doesn't do that for every function, but most of the complex
functions which have cleanup routines on error.
Signed-off-by: Eric Paris <eparis@redhat.com>
selinuxfs.c has lots of different standards on how to handle return paths on
error. For the most part transition to
rc=errno
if (failure)
goto out;
[...]
out:
cleanup()
return rc;
Instead of doing cleanup mid function, or having multiple returns or other
options. This doesn't do that for every function, but most of the complex
functions which have cleanup routines on error.
Signed-off-by: Eric Paris <eparis@redhat.com>
policydb.c has lots of different standards on how to handle return paths on
error. For the most part transition to
rc=errno
if (failure)
goto out;
[...]
out:
cleanup()
return rc;
Instead of doing cleanup mid function, or having multiple returns or other
options. This doesn't do that for every function, but most of the complex
functions which have cleanup routines on error.
Signed-off-by: Eric Paris <eparis@redhat.com>
Privileged syslog operations currently require CAP_SYS_ADMIN. Split
this off into a new CAP_SYSLOG privilege which we can sanely take away
from a container through the capability bounding set.
With this patch, an lxc container can be prevented from messing with
the host's syslog (i.e. dmesg -c).
Changelog: mar 12 2010: add selinux capability2:cap_syslog perm
Changelog: nov 22 2010:
. port to new kernel
. add a WARN_ONCE if userspace isn't using CAP_SYSLOG
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Acked-By: Kees Cook <kees.cook@canonical.com>
Cc: James Morris <jmorris@namei.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: "Christopher J. PeBenito" <cpebenito@tresys.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: James Morris <jmorris@namei.org>
The SELinux ip postroute code indicates when policy rejected a packet and
passes the error back up the stack. The compat code does not. This patch
sends the same kind of error back up the stack in the compat code.
Based-on-patch-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the SELinux netlink code returns a fatal error when the error might
actually be transient. This patch just silently drops packets on
potentially transient errors but continues to return a permanant error
indicator when the denial was because of policy.
Based-on-comments-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SELinux netfilter hooks just return NF_DROP if they drop a packet. We
want to signal that a drop in this hook is a permanant fatal error and is not
transient. If we do this the error will be passed back up the stack in some
places and applications will get a faster interaction that something went
wrong.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The addition of CONFIG_SECURITY_DMESG_RESTRICT resulted in a build
failure when CONFIG_PRINTK=n. This is because the capabilities code
which used the new option was built even though the variable in question
didn't exist.
The patch here fixes this by moving the capabilities checks out of the
LSM and into the caller. All (known) LSMs should have been calling the
capabilities hook already so it actually makes the code organization
better to eliminate the hook altogether.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves. For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
/selinux/policy allows a user to copy the policy back out of the kernel.
This patch allows userspace to actually mmap that file and use it directly.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
There is interest in being able to see what the actual policy is that was
loaded into the kernel. The patch creates a new selinuxfs file
/selinux/policy which can be read by userspace. The actual policy that is
loaded into the kernel will be written back out to userspace.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
AVTAB_MAX_SIZE was a define which was supposed to be used in userspace to
define a maximally sized avtab when userspace wasn't sure how big of a table
it needed. It doesn't make sense in the kernel since we always know our table
sizes. The only place it is used we have a more appropiately named define
called AVTAB_MAX_HASH_BUCKETS, use that instead.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Range transition rules are placed in the hash table in an (almost)
arbitrary order. This patch inserts them in a fixed order to make policy
retrival more predictable.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
With the (long ago) interface change to have the secid_to_secctx functions
do the string allocation instead of having the caller do the allocation we
lost the ability to query the security server for the length of the
upcoming string. The SECMARK code would like to allocate a netlink skb
with enough length to hold the string but it is just too unclean to do the
string allocation twice or to do the allocation the first time and hold
onto the string and slen. This patch adds the ability to call
security_secid_to_secctx() with a NULL data pointer and it will just set
the slen pointer.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Right now secmark has lots of direct selinux calls. Use all LSM calls and
remove all SELinux specific knowledge. The only SELinux specific knowledge
we leave is the mode. The only point is to make sure that other LSMs at
least test this generic code before they assume it works. (They may also
have to make changes if they do not represent labels as strings)
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Paul Moore <paul.moore@hp.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: James Morris <jmorris@namei.org>
All security modules shouldn't change sched_param parameter of
security_task_setscheduler(). This is not only meaningless, but also
make a harmful result if caller pass a static variable.
This patch remove policy and sched_param parameter from
security_task_setscheduler() becuase none of security module is
using it.
Cc: James Morris <jmorris@namei.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch fixes up coding-style problem at this commit:
4f27a7d49789b04404eca26ccde5f527231d01d5
selinux: fast status update interface (/selinux/status)
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: James Morris <jmorris@namei.org>
While the previous change to the selinux Makefile reduced the window
significantly for this failure, it is still possible to see a compile
failure where cpp starts processing selinux files before the auto
generated flask.h file is completed. This is easily reproduced by
adding the following temporary change to expose the issue everytime:
- cmd_flask = scripts/selinux/genheaders/genheaders ...
+ cmd_flask = sleep 30 ; scripts/selinux/genheaders/genheaders ...
This failure happens because the creation of the object files in the ss
subdir also depends on flask.h. So simply incorporate them into the
parent Makefile, as the ss/Makefile really doesn't do anything unique.
With this change, compiling of all selinux files is dependent on
completion of the header file generation, and this test case with
the "sleep 30" now confirms it is functioning as expected.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: James Morris <jmorris@namei.org>
Selinux has an autogenerated file, "flask.h" which is included by
two other selinux files. The current makefile has a single dependency
on the first object file in the selinux-y list, assuming that will get
flask.h generated before anyone looks for it, but that assumption breaks
down in a "make -jN" situation and you get:
selinux/selinuxfs.c:35: fatal error: flask.h: No such file or directory
compilation terminated.
remake[9]: *** [security/selinux/selinuxfs.o] Error 1
Since flask.h is included by security.h which in turn is included
nearly everywhere, make the dependency apply to all of the selinux-y
list of objs.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch provides a new /selinux/status entry which allows applications
read-only mmap(2).
This region reflects selinux_kernel_status structure in kernel space.
struct selinux_kernel_status
{
u32 length; /* length of this structure */
u32 sequence; /* sequence number of seqlock logic */
u32 enforcing; /* current setting of enforcing mode */
u32 policyload; /* times of policy reloaded */
u32 deny_unknown; /* current setting of deny_unknown */
};
When userspace object manager caches access control decisions provided
by SELinux, it needs to invalidate the cache on policy reload and setenforce
to keep consistency.
However, the applications need to check the kernel state for each accesses
on userspace avc, or launch a background worker process.
In heuristic, frequency of invalidation is much less than frequency of
making access control decision, so it is annoying to invoke a system call
to check we don't need to invalidate the userspace cache.
If we can use a background worker thread, it allows to receive invalidation
messages from the kernel. But it requires us an invasive coding toward the
base application in some cases; E.g, when we provide a feature performing
with SELinux as a plugin module, it is unwelcome manner to launch its own
worker thread from the module.
If we could map /selinux/status to process memory space, application can
know updates of selinux status; policy reload or setenforce.
A typical application checks selinux_kernel_status::sequence when it tries
to reference userspace avc. If it was changed from the last time when it
checked userspace avc, it means something was updated in the kernel space.
Then, the application can reset userspace avc or update current enforcing
mode, without any system call invocations.
This sequence number is updated according to the seqlock logic, so we need
to wait for a while if it is odd number.
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Acked-by: Eric Paris <eparis@redhat.com>
--
security/selinux/include/security.h | 21 ++++++
security/selinux/selinuxfs.c | 56 +++++++++++++++
security/selinux/ss/Makefile | 2 +-
security/selinux/ss/services.c | 3 +
security/selinux/ss/status.c | 129 +++++++++++++++++++++++++++++++++++
5 files changed, 210 insertions(+), 1 deletions(-)
Signed-off-by: James Morris <jmorris@namei.org>
type is not used at all, stop declaring and assigning it.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
tty: fix fu_list abuse
tty code abuses fu_list, which causes a bug in remount,ro handling.
If a tty device node is opened on a filesystem, then the last link to the inode
removed, the filesystem will be allowed to be remounted readonly. This is
because fs_may_remount_ro does not find the 0 link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".
Taking idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes
and avoid meddling with the vfs, and avoids this bug.
The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.
[ Arguably filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs: cleanup files_lock locking
Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
unistd: add __NR_prlimit64 syscall numbers
rlimits: implement prlimit64 syscall
rlimits: switch more rlimit syscalls to do_prlimit
rlimits: redo do_setrlimit to more generic do_prlimit
rlimits: add rlimit64 structure
rlimits: do security check under task_lock
rlimits: allow setrlimit to non-current tasks
rlimits: split sys_setrlimit
rlimits: selinux, do rlimits changes under task_lock
rlimits: make sure ->rlim_max never grows in sys_setrlimit
rlimits: add task_struct to update_rlimit_cpu
rlimits: security, add task_struct to setrlimit
Fix up various system call number conflicts. We not only added fanotify
system calls in the meantime, but asm-generic/unistd.h added a wait4
along with a range of reserved per-architecture system calls.
Fix build error caused by a stale security/selinux/av_permissions.h in the $(src)
directory which will override a more recent version in $(obj) that is it
appears to strike only when building with a separate object directory.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Current selinux policy can have over 3000 types. The type_attr_map in
policy is an array sized by the number of types times sizeof(struct ebitmap)
(12 on x86_64). Basic math tells us the array is going to be of length
3000 x 12 = 36,000 bytes. The largest 'safe' allocation on a long running
system is 16k. Most of the time a 32k allocation will work. But on long
running systems a 64k allocation (what we need) can fail quite regularly.
In order to deal with this I am converting the type_attr_map to use
flex_arrays. Let the library code deal with breaking this into PAGE_SIZE
pieces.
-v2
rework some of the if(!obj) BUG() to be BUG_ON(!obj)
drop flex_array_put() calls and just use a _get() object directly
-v3
make apply to James' tree (drop the policydb_write changes)
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
execmod "could" show up on non regular files and non chr files. The current
implementation would actually make these checks against non-existant bits
since the code assumes the execmod permission is same for all file types.
To make this line up for chr files we had to define execute_no_trans and
entrypoint permissions. These permissions are unreachable and only existed
to to make FILE__EXECMOD and CHR_FILE__EXECMOD the same. This patch drops
those needless perms as well.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>