Now that we can safely increase the avtab max buckets without
triggering high order allocations and have a hash function that
will make better use of the larger number of buckets, increase
the max buckets to 2^16.
Original:
101421 entries and 2048/2048 buckets used, longest chain length 374
With new hash function:
101421 entries and 2048/2048 buckets used, longest chain length 81
With increased max buckets:
101421 entries and 31078/32768 buckets used, longest chain length 12
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
This function, based on murmurhash3, has much better distribution than
the original. Using the current default of 2048 buckets, there are many
fewer collisions:
Before:
101421 entries and 2048/2048 buckets used, longest chain length 374
After:
101421 entries and 2048/2048 buckets used, longest chain length 81
The difference becomes much more significant when buckets are increased.
A naive attempt to expand the current function to larger outputs doesn't
yield any significant improvement; so this function is a prerequisite
for increasing the bucket size.
sds: Adapted from the original patches for libsepol to the kernel.
Signed-off-by: John Brooks <john.brooks@jolla.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Previously we shrank the avtab max hash buckets to avoid
high order memory allocations, but this causes avtab lookups to
degenerate to very long linear searches for the Fedora policy. Convert to
using a flex_array instead so that we can increase the buckets
without such limitations.
This change does not alter the max hash buckets; that is left to a
separate follow-on change.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Move the NetLabel secattr MLS category import logic into
mls_import_netlbl_cat() where it belongs, and use the
mls_import_netlbl_cat() function in security_netlbl_secattr_to_sid().
Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: Paul Moore <pmoore@redhat.com>
If hashtab_create() returns a NULL pointer then we should return -ENOMEM
but instead the current code returns success.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Restructure to keyword=value pairs without spaces. Drop superfluous words in
text. Make invalid_context a keyword. Change result= keyword to seresult=.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[Minor rewrite to the patch subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Historically the NetLabel LSM secattr catmap functions and data
structures have had very long names which makes a mess of the NetLabel
code and anyone who uses NetLabel. This patch renames the catmap
functions and structures from "*_secattr_catmap_*" to just "*_catmap_*"
which improves things greatly.
There are no substantial code or logic changes in this patch.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
The NetLabel secattr catmap functions, and the SELinux import/export
glue routines, were broken in many horrible ways and the SELinux glue
code fiddled with the NetLabel catmap structures in ways that we
probably shouldn't allow. At some point this "worked", but that was
likely due to a bit of dumb luck and sub-par testing (both inflicted
by yours truly). This patch corrects these problems by basically
gutting the code in favor of something less obtuse and restoring the
NetLabel abstractions in the SELinux catmap glue code.
Everything is working now, and if it decides to break itself in the
future this code will be much easier to debug than the code it
replaces.
One noteworthy side effect of the changes is that it is no longer
necessary to allocate a NetLabel catmap before calling one of the
NetLabel APIs to set a bit in the catmap. NetLabel will automatically
allocate the catmap nodes when needed, resulting in less allocations
when the lowest bit is greater than 255 and less code in the LSMs.
Cc: stable@vger.kernel.org
Reported-by: Christian Evans <frodox@zoho.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
With the introduction of fair queued rwlock, recursive read_lock()
may hang the offending process if there is a write_lock() somewhere
in between.
With recursive read_lock checking enabled, the following error was
reported:
=============================================
[ INFO: possible recursive locking detected ]
3.16.0-rc1 #2 Tainted: G E
---------------------------------------------
load_policy/708 is trying to acquire lock:
(policy_rwlock){.+.+..}, at: [<ffffffff8125b32a>]
security_genfs_sid+0x3a/0x170
but task is already holding lock:
(policy_rwlock){.+.+..}, at: [<ffffffff8125b48c>]
security_fs_use+0x2c/0x110
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(policy_rwlock);
lock(policy_rwlock);
This patch fixes the occurrence of recursive read_lock() of
policy_rwlock by adding a helper function __security_genfs_sid()
which requires caller to take the lock before calling it. The
security_fs_use() was then modified to call the new helper function.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
The cond_read_node() should free the given node on error path as it's
not linked to p->cond_list yet. This is done via cond_node_destroy()
but it's not called when next_entry() fails before the expr loop.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Paul Moore <pmoore@redhat.com>
The node->cur_state and len can be read in a single call of next_entry().
And setting len before reading is a dead write so can be eliminated.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
(Minor tweak to the length parameter in the call to next_entry())
Signed-off-by: Paul Moore <pmoore@redhat.com>
There're some code duplication for reading a string value during
policydb_read(). Add str_read() helper to fix it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Paul Moore <pmoore@redhat.com>
ARRAY_SIZE is more concise to use when the size of an array is divided
by the size of its type or the size of its first element.
The Coccinelle semantic patch that makes this change is as follows:
// <smpl>
@@
type T;
T[] E;
@@
- (sizeof(E)/sizeof(E[...]))
+ ARRAY_SIZE(E)
// </smpl>
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
After silencing the sleeping warning in mls_convert_context() I started
seeing similar traces from hashtab_insert. Do a cond_resched there too.
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
security_xfrm_policy_alloc can be called in atomic context so the
allocation should be done with GFP_ATOMIC. Add an argument to let the
callers choose the appropriate way. In order to do so a gfp argument
needs to be added to the method xfrm_policy_alloc_security in struct
security_operations and to the internal function
selinux_xfrm_alloc_user. After that switch to GFP_ATOMIC in the atomic
callers and leave GFP_KERNEL as before for the rest.
The path that needed the gfp argument addition is:
security_xfrm_policy_alloc -> security_ops.xfrm_policy_alloc_security ->
all users of xfrm_policy_alloc_security (e.g. selinux_xfrm_policy_alloc) ->
selinux_xfrm_alloc_user (here the allocation used to be GFP_KERNEL only)
Now adding a gfp argument to selinux_xfrm_alloc_user requires us to also
add it to security_context_to_sid which is used inside and prior to this
patch did only GFP_KERNEL allocation. So add gfp argument to
security_context_to_sid and adjust all of its callers as well.
CC: Paul Moore <paul@paul-moore.com>
CC: Dave Jones <davej@redhat.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Fan Du <fan.du@windriver.com>
CC: David S. Miller <davem@davemloft.net>
CC: LSM list <linux-security-module@vger.kernel.org>
CC: SELinux list <selinux@tycho.nsa.gov>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When writing policy via /sys/fs/selinux/policy I wrote the type and class
of filename trans rules in CPU endian instead of little endian. On
x86_64 this works just fine, but it means that on big endian arch's like
ppc64 and s390 userspace reads the policy and converts it from
le32_to_cpu. So the values are all screwed up. Write the values in le
format like it should have been to start.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Setting an empty security context (length=0) on a file will
lead to incorrectly dereferencing the type and other fields
of the security context structure, yielding a kernel BUG.
As a zero-length security context is never valid, just reject
all such security contexts whether coming from userspace
via setxattr or coming from the filesystem upon a getxattr
request by SELinux.
Setting a security context value (empty or otherwise) unknown to
SELinux in the first place is only possible for a root process
(CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
if the corresponding SELinux mac_admin permission is also granted
to the domain by policy. In Fedora policies, this is only allowed for
specific domains such as livecd for setting down security contexts
that are not defined in the build host policy.
Reproducer:
su
setenforce 0
touch foo
setfattr -n security.selinux foo
Caveat:
Relabeling or removing foo after doing the above may not be possible
without booting with SELinux disabled. Any subsequent access to foo
after doing the above will also trigger the BUG.
BUG output from Matthew Thode:
[ 473.893141] ------------[ cut here ]------------
[ 473.962110] kernel BUG at security/selinux/ss/services.c:654!
[ 473.995314] invalid opcode: 0000 [#6] SMP
[ 474.027196] Modules linked in:
[ 474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G D I
3.13.0-grsec #1
[ 474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
07/29/10
[ 474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
ffff8805f50cd488
[ 474.183707] RIP: 0010:[<ffffffff814681c7>] [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[ 474.219954] RSP: 0018:ffff8805c0ac3c38 EFLAGS: 00010246
[ 474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
0000000000000100
[ 474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
ffff8805e8aaa000
[ 474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
0000000000000006
[ 474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
0000000000000006
[ 474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
0000000000000000
[ 474.453816] FS: 00007f2e75220800(0000) GS:ffff88061fc00000(0000)
knlGS:0000000000000000
[ 474.489254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
00000000000207f0
[ 474.556058] Stack:
[ 474.584325] ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
ffff8805f1190a40
[ 474.618913] ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
ffff8805e8aac860
[ 474.653955] ffff8805c0ac3cb8 000700068113833a ffff880606c75060
ffff8805c0ac3d94
[ 474.690461] Call Trace:
[ 474.723779] [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
[ 474.778049] [<ffffffff81468824>] security_compute_av+0xf4/0x20b
[ 474.811398] [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
[ 474.843813] [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
[ 474.875694] [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
[ 474.907370] [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
[ 474.938726] [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
[ 474.970036] [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
[ 475.000618] [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
[ 475.030402] [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
[ 475.061097] [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
[ 475.094595] [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
[ 475.148405] [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
[ 475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
[ 475.255884] RIP [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[ 475.296120] RSP <ffff8805c0ac3c38>
[ 475.328734] ---[ end trace f076482e9d754adc ]---
Reported-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Pull audit update from Eric Paris:
"Again we stayed pretty well contained inside the audit system.
Venturing out was fixing a couple of function prototypes which were
inconsistent (didn't hurt anything, but we used the same value as an
int, uint, u32, and I think even a long in a couple of places).
We also made a couple of minor changes to when a couple of LSMs called
the audit system. We hoped to add aarch64 audit support this go
round, but it wasn't ready.
I'm disappearing on vacation on Thursday. I should have internet
access, but it'll be spotty. If anything goes wrong please be sure to
cc rgb@redhat.com. He'll make fixing things his top priority"
* git://git.infradead.org/users/eparis/audit: (50 commits)
audit: whitespace fix in kernel-parameters.txt
audit: fix location of __net_initdata for audit_net_ops
audit: remove pr_info for every network namespace
audit: Modify a set of system calls in audit class definitions
audit: Convert int limit uses to u32
audit: Use more current logging style
audit: Use hex_byte_pack_upper
audit: correct a type mismatch in audit_syscall_exit()
audit: reorder AUDIT_TTY_SET arguments
audit: rework AUDIT_TTY_SET to only grab spin_lock once
audit: remove needless switch in AUDIT_SET
audit: use define's for audit version
audit: documentation of audit= kernel parameter
audit: wait_for_auditd rework for readability
audit: update MAINTAINERS
audit: log task info on feature change
audit: fix incorrect set of audit_sock
audit: print error message when fail to create audit socket
audit: fix dangling keywords in audit_log_set_loginuid() output
audit: log on errors from filter user rules
...
Two of the conditions in selinux_audit_rule_match() should never happen and
the third indicates a race that should be retried. Remove the calls to
audit_log() (which call audit_log_start()) and deal with the errors in the
caller, logging only once if the condition is met. Calling audit_log_start()
in this location makes buffer allocation and locking more complicated in the
calling tree (audit_filter_user()).
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Hello.
I got below leak with linux-3.10.0-54.0.1.el7.x86_64 .
[ 681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
Below is a patch, but I don't know whether we need special handing for undoing
ebitmap_set_bit() call.
----------
>>From fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 6 Jan 2014 16:30:21 +0900
Subject: [PATCH] SELinux: Fix memory leak upon loading policy
Commit 2463c26d "SELinux: put name based create rules in a hashtable" did not
check return value from hashtab_insert() in filename_trans_read(). It leaks
memory if hashtab_insert() returns error.
unreferenced object 0xffff88005c9160d0 (size 8):
comm "systemd", pid 1, jiffies 4294688674 (age 235.265s)
hex dump (first 8 bytes):
57 0b 00 00 6b 6b 6b a5 W...kkk.
backtrace:
[<ffffffff816604ae>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811cba5e>] kmem_cache_alloc_trace+0x12e/0x360
[<ffffffff812aec5d>] policydb_read+0xd1d/0xf70
[<ffffffff812b345c>] security_load_policy+0x6c/0x500
[<ffffffff812a623c>] sel_write_load+0xac/0x750
[<ffffffff811eb680>] vfs_write+0xc0/0x1f0
[<ffffffff811ec08c>] SyS_write+0x4c/0xa0
[<ffffffff81690419>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
However, we should not return EEXIST error to the caller, or the systemd will
show below message and the boot sequence freezes.
systemd[1]: Failed to load SELinux policy. Freezing.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Eric Paris <eparis@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Revert "selinux: consider filesystem subtype in policies"
This reverts commit 102aefdda4.
Explanation from Eric Paris:
SELinux policy can specify if it should use a filesystem's
xattrs or not. In current policy we have a specification that
fuse should not use xattrs but fuse.glusterfs should use
xattrs. This patch has a bug in which non-glusterfs
filesystems would match the rule saying fuse.glusterfs should
use xattrs. If both fuse and the particular filesystem in
question are not written to handle xattr calls during the mount
command, they will deadlock.
I have fixed the bug to do proper matching, however I believe a
revert is still the correct solution. The reason I believe
that is because the code still does not work. The s_subtype is
not set until after the SELinux hook which attempts to match on
the ".gluster" portion of the rule. So we cannot match on the
rule in question. The code is useless.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Dynamically allocate a couple of the larger stack variables in order to
reduce the stack footprint below 1024. gcc-4.8
security/selinux/ss/services.c: In function 'security_load_policy':
security/selinux/ss/services.c:1964:1: warning: the frame size of 1104 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
Also silence a couple of checkpatch warnings at the same time.
WARNING: sizeof policydb should be sizeof(policydb)
+ memcpy(oldpolicydb, &policydb, sizeof policydb);
WARNING: sizeof policydb should be sizeof(policydb)
+ memcpy(&policydb, newpolicydb, sizeof policydb);
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Update the policy version (POLICYDB_VERSION_CONSTRAINT_NAMES) to allow
holding of policy source info for constraints.
Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Conflicts:
security/selinux/hooks.c
Pull Eric's existing SELinux tree as there are a number of patches in
there that are not yet upstream. There was some minor fixup needed to
resolve a conflict in security/selinux/hooks.c:selinux_set_mnt_opts()
between the labeled NFS patches and Eric's security_fs_use()
simplification patch.
Not considering sub filesystem has the following limitation. Support
for SELinux in FUSE is dependent on the particular userspace
filesystem, which is identified by the subtype. For e.g, GlusterFS,
a FUSE based filesystem supports SELinux (by mounting and processing
FUSE requests in different threads, avoiding the mount time
deadlock), whereas other FUSE based filesystems (identified by a
different subtype) have the mount time deadlock.
By considering the subtype of the filesytem in the SELinux policies,
allows us to specify a filesystem subtype, in the following way:
fs_use_xattr fuse.glusterfs gen_context(system_u:object_r:fs_t,s0);
This way not all FUSE filesystems are put in the same bucket and
subjected to the limitations of the other subtypes.
Signed-off-by: Anand Avati <avati@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently the packet class in SELinux is not checked if there are no
SECMARK rules in the security or mangle netfilter tables. Some systems
prefer that packets are always checked, for example, to protect the system
should the netfilter rules fail to load or if the nefilter rules
were maliciously flushed.
Add the always_check_network policy capability which, when enabled, treats
SECMARK as enabled, even if there are no netfilter SECMARK rules and
treats peer labeling as enabled, even if there is no Netlabel or
labeled IPSEC configuration.
Includes definition of "redhat1" SELinux policy capability, which
exists in the SELinux userpace library, to keep ordering correct.
The SELinux userpace portion of this was merged last year, but this kernel
change fell on the floor.
Signed-off-by: Chris PeBenito <cpebenito@tresys.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Rather than passing pointers to memory locations, strings, and other
stuff just give up on the separation and give security_fs_use the
superblock. It just makes the code easier to read (even if not easier to
reuse on some other OS)
Signed-off-by: Eric Paris <eparis@redhat.com>
We only have 6 options, so char is good enough, but use a short as that
packs nicely. This shrinks the superblock_security_struct just a little
bit.
Signed-off-by: Eric Paris <eparis@redhat.com>
The /sys/fs/selinux/policy file is not valid on big endian systems like
ppc64 or s390. Let's see why:
static int hashtab_cnt(void *key, void *data, void *ptr)
{
int *cnt = ptr;
*cnt = *cnt + 1;
return 0;
}
static int range_write(struct policydb *p, void *fp)
{
size_t nel;
[...]
/* count the number of entries in the hashtab */
nel = 0;
rc = hashtab_map(p->range_tr, hashtab_cnt, &nel);
if (rc)
return rc;
buf[0] = cpu_to_le32(nel);
rc = put_entry(buf, sizeof(u32), 1, fp);
So size_t is 64 bits. But then we pass a pointer to it as we do to
hashtab_cnt. hashtab_cnt thinks it is a 32 bit int and only deals with
the first 4 bytes. On x86_64 which is little endian, those first 4
bytes and the least significant, so this works out fine. On ppc64/s390
those first 4 bytes of memory are the high order bits. So at the end of
the call to hashtab_map nel has a HUGE number. But the least
significant 32 bits are all 0's.
We then pass that 64 bit number to cpu_to_le32() which happily truncates
it to a 32 bit number and does endian swapping. But the low 32 bits are
all 0's. So no matter how many entries are in the hashtab, big endian
systems always say there are 0 entries because I screwed up the
counting.
The fix is easy. Use a 32 bit int, as the hashtab_cnt expects, for nel.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Currently, the ebitmap_node structure has a fixed size of 32 bytes. On
a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being
used as bitmaps. The overhead ratio is 1/4.
On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes
are left for bitmap purpose and the overhead ratio is 1/2. With a
3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit()
function to be called about 9 million times. The average number of
ebitmap_node traversal is about 3.7.
This patch increases the size of the ebitmap_node structure to 64
bytes for 64-bit system to keep the overhead ratio at 1/4. This may
also improve performance a little bit by making node to node traversal
less frequent (< 2) as more bits are available in each node.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
While running the high_systime workload of the AIM7 benchmark on
a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel
(with HT on), it was found that a pretty sizable amount of time was
spent in the SELinux code. Below was the perf trace of the "perf
record -a -s" of a test run at 1500 users:
5.04% ls [kernel.kallsyms] [k] ebitmap_get_bit
1.96% ls [kernel.kallsyms] [k] mls_level_isvalid
1.95% ls [kernel.kallsyms] [k] find_next_bit
The ebitmap_get_bit() was the hottest function in the perf-report
output. Both the ebitmap_get_bit() and find_next_bit() functions
were, in fact, called by mls_level_isvalid(). As a result, the
mls_level_isvalid() call consumed 8.95% of the total CPU time of
all the 24 virtual CPUs which is quite a lot. The majority of the
mls_level_isvalid() function invocations come from the socket creation
system call.
Looking at the mls_level_isvalid() function, it is checking to see
if all the bits set in one of the ebitmap structure are also set in
another one as well as the highest set bit is no bigger than the one
specified by the given policydb data structure. It is doing it in
a bit-by-bit manner. So if the ebitmap structure has many bits set,
the iteration loop will be done many times.
The current code can be rewritten to use a similar algorithm as the
ebitmap_contains() function with an additional check for the
highest set bit. The ebitmap_contains() function was extended to
cover an optional additional check for the highest set bit, and the
mls_level_isvalid() function was modified to call ebitmap_contains().
With that change, the perf trace showed that the used CPU time drop
down to just 0.08% (ebitmap_contains + mls_level_isvalid) of the
total which is about 100X less than before.
0.07% ls [kernel.kallsyms] [k] ebitmap_contains
0.05% ls [kernel.kallsyms] [k] ebitmap_get_bit
0.01% ls [kernel.kallsyms] [k] mls_level_isvalid
0.01% ls [kernel.kallsyms] [k] find_next_bit
The remaining ebitmap_get_bit() and find_next_bit() functions calls
are made by other kernel routines as the new mls_level_isvalid()
function will not call them anymore.
This patch also improves the high_systime AIM7 benchmark result,
though the improvement is not as impressive as is suggested by the
reduction in CPU time spent in the ebitmap functions. The table below
shows the performance change on the 2-socket x86-64 system (with HT
on) mentioned above.
+--------------+---------------+----------------+-----------------+
| Workload | mean % change | mean % change | mean % change |
| | 10-100 users | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| high_systime | +0.1% | +0.9% | +2.6% |
+--------------+---------------+----------------+-----------------+
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
There currently doesn't exist a labeling type that is adequate for use with
labeled NFS. Since NFS doesn't really support xattrs we can't use the use xattr
labeling behavior. For this we developed a new labeling type. The native
labeling type is used solely by NFS to ensure NFS inodes are labeled at runtime
by the NFS code instead of relying on the SELinux security server on the client
end.
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
avc_add_callback now just used for registering reset functions
in initcalls, and the callback functions just did reset operations.
So, reducing the arguments to only one event is enough now.
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
It's possible that the caller passed a NULL for scontext. However if this
is a defered mapping we might still attempt to call *scontext=kstrdup().
This is bad. Instead just return the len.
Signed-off-by: Eric Paris <eparis@redhat.com>
Because Fedora shipped userspace based on my development tree we now
have policy version 27 in the wild defining only default user, role, and
range. Thus to add default_type we need a policy.28.
Signed-off-by: Eric Paris <eparis@redhat.com>
When new objects are created we have great and flexible rules to
determine the type of the new object. We aren't quite as flexible or
mature when it comes to determining the user, role, and range. This
patch adds a new ability to specify the place a new objects user, role,
and range should come from. For users and roles it can come from either
the source or the target of the operation. aka for files the user can
either come from the source (the running process and todays default) or
it can come from the target (aka the parent directory of the new file)
examples always are done with
directory context: system_u:object_r:mnt_t:s0-s0:c0.c512
process context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[no rule]
unconfined_u:object_r:mnt_t:s0 test_none
[default user source]
unconfined_u:object_r:mnt_t:s0 test_user_source
[default user target]
system_u:object_r:mnt_t:s0 test_user_target
[default role source]
unconfined_u:unconfined_r:mnt_t:s0 test_role_source
[default role target]
unconfined_u:object_r:mnt_t:s0 test_role_target
[default range source low]
unconfined_u:object_r:mnt_t:s0 test_range_source_low
[default range source high]
unconfined_u:object_r:mnt_t:s0:c0.c1023 test_range_source_high
[default range source low-high]
unconfined_u:object_r:mnt_t:s0-s0:c0.c1023 test_range_source_low-high
[default range target low]
unconfined_u:object_r:mnt_t:s0 test_range_target_low
[default range target high]
unconfined_u:object_r:mnt_t:s0:c0.c512 test_range_target_high
[default range target low-high]
unconfined_u:object_r:mnt_t:s0-s0:c0.c512 test_range_target_low-high
Signed-off-by: Eric Paris <eparis@redhat.com>
The semantic patch that makes this change is available
in scripts/coccinelle/api/alloc/drop_kmalloc_cast.cocci.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: James Morris <jmorris@namei.org>
My @hp.com will no longer be valid starting August 5, 2011 so an update is
necessary. My new email address is employer independent so we don't have
to worry about doing this again any time soon.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When policy version is less than POLICYDB_VERSION_FILENAME_TRANS,
skip file_name_trans_write().
Signed-off-by: Roy.Li <rongqing.li@windriver.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Right now security_get_user_sids() will pass in a NULL avd pointer to
avc_has_perm_noaudit(), which then forces that function to have a dummy
entry for that case and just generally test it.
Don't do it. The normal callers all pass a real avd pointer, and this
helper function is incredibly hot. So don't make avc_has_perm_noaudit()
do conditional stuff that isn't needed for the common case.
This also avoids some duplicated stack space.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>