Reschedule code to read ar.bsp as early as possible. To enable this,
don't bother clearing some of the registers when we're returning to
kernel stacks. Also, instead of trying to support the pNonSys case
(which makes no sense), do a bugcheck instead (with break 0). Finally,
remove a clear of r14 which is a leftover from the previous patch.
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Why is this a good idea? Clearing b7 to 0 is guaranteed to do us no
good, and writing it with the address of __kernel_syscall_via_epc()
yields a 6 cycle improvement _if_ the application performs another
EPC-based system call without overwriting b7, which is not all that
uncommon. Well worth the minimal cost of 1 bundle of code.
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Decreases syscall overhead by approximately 6 cycles.
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This by itself is good for a 1-2 cycle speedup. The effect is bigger
when combined with the later patches.
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Sadly, I goofed in this syscall-tuning patch:

  ChangeSet 1.1966.1.40 2005/01/22 13:31:05 davidm@hpl.hp.com
    [IA64] Improve ia64_leave_syscall() for McKinley-type cores.

    Optimize ia64_leave_syscall() a bit better for McKinley-type cores.
    The patch looks big, but that's mostly due to renaming r16/r17 to r2/r3.
    Good for a 13 cycle improvement.

The problem is that the size of the physical stacked registers was
loaded into the wrong register (r3 instead of r17). Since r17 by
coincidence always had the value 1, this had the effect of turning
rse_clear_invalid into a no-op. That poses the risk of leaking kernel
state back to user-land and is hence not acceptable.
The fix below is simple, but unfortunately it costs us about 28 cycles
in syscall overhead. ;-(
Unfortunately, there isn't much we can do about that since those
registers have to be cleared one way or another.
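
To illustrate the failure mode described above, here is a toy C model. It
is not the kernel code (which is IA-64 assembly) and only sketches the
idea. It assumes the clearing routine is driven by a byte count and clears
one 8-byte register per full 8 bytes; clear_invalid(), leaked_after_exit(),
NUM_STACKED and KERNEL_JUNK are names made up for this sketch. With a stale
count of 1, no register is cleared and all of them still hold kernel data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_STACKED 96            /* toy value: number of stacked registers */
#define KERNEL_JUNK 0xdeadbeefULL /* stand-in for leftover kernel state */

/* Zero as many registers as count_bytes covers; return how many were cleared.
 * Assumption of this model: the count is in bytes, 8 bytes per register. */
static size_t clear_invalid(uint64_t *regs, size_t count_bytes)
{
    size_t nregs = count_bytes / sizeof(uint64_t);

    assert(regs != NULL);
    if (nregs > NUM_STACKED)
        nregs = NUM_STACKED;
    for (size_t i = 0; i < nregs; i++)
        regs[i] = 0;
    return nregs;
}

/* Fill the register file with junk, run the clear with the given count,
 * and return how many registers still hold non-zero (leaked) values. */
static size_t leaked_after_exit(size_t count_bytes)
{
    uint64_t regs[NUM_STACKED];
    size_t leaked = 0;

    for (size_t i = 0; i < NUM_STACKED; i++)
        regs[i] = KERNEL_JUNK;
    clear_invalid(regs, count_bytes);
    for (size_t i = 0; i < NUM_STACKED; i++)
        if (regs[i] != 0)
            leaked++;
    return leaked;
}
```

In this model, leaked_after_exit(NUM_STACKED * 8) stands for the path with
the correct size in the count register and returns 0, while
leaked_after_exit(1) stands for the stale value and returns NUM_STACKED.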
--david
Signed-off-by: Tony Luck <tony.luck@intel.com>
Recently I noticed that clearing ar.ssd/ar.csd right before srlz.d is
causing significant stalling in the syscall path. The patch below
fixes that by moving the register-writes after srlz.d. On a Madison,
this drops break-based getpid() from 241 to 226 cycles (-15 cycles).
Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!