Xen wants a dedicated page for the GDT. I believe VMI likes it too.
lguest, KVM and native don't care.
Simple transformation to page-aligned "struct gdt_page".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Now we have an explicit per-cpu GDT variable, we don't need to keep the
descriptors around to use them to find the GDT: expose cpu_gdt directly.
We could go further and make load_gdt() pack the descriptor for us, or even
assume it means "load the current cpu's GDT" which is what it always does.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
paravirt.c used to implement native versions of all low-level
functions. Far cleaner is to have the native versions exposed in the
headers and as inline native_XXX, and if !CONFIG_PARAVIRT, then simply
#define XXX native_XXX.
There are several nice side effects:
1) write_dt_entry() now takes the correct "struct Xgt_desc_struct *"
not "void *".
2) load_TLS is reintroduced to the for loop, not manually unrolled
with a #error in case the bounds ever change.
3) Macros become inlines, with type checking.
4) Access to the native versions is trivial for KVM, lguest, Xen and
others who might want it.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now we are no longer dynamically allocating the GDT, we don't need the
"cpu_gdt_table" at all: we can switch straight from "boot_gdt_table" to the
per-cpu GDT. This means initializing the cpu_gdt array in C.
The boot CPU uses the per-cpu var directly, then in smp_prepare_cpus() it
switches to the per-cpu copy just allocated. For secondary CPUs, the
early_gdt_descr is set to point directly to their per-cpu copy.
For UP the code is very simple: it keeps using the "per-cpu" GDT as per SMP,
but we never have to move.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Allocating PDA and GDT at boot is a pain. Using simple per-cpu variables adds
happiness (although we need the GDT page-aligned for Xen, which we do in a
followup patch).
[akpm@linux-foundation.org: build fix]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When I implemented the DECLARE_PER_CPU(var) macros, I was careful that
people couldn't use "var" in a non-percpu context, by prepending
percpu__. I never considered that this would allow them to overload
the same name for a per-cpu and a non-percpu variable.
It is only one of many horrors in the i386 boot code, but let's rename
the non-perpcu cpu_gdt_descr to early_gdt_descr (not boot_gdt_descr,
that's something else...)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
===================================================================
It turns out that the most called ops, by several orders of magnitude,
are the interrupt manipulation ops. These are obvious candidates for
patching, so mark them up and create infrastructure for it.
The method used is that the ops structure has a patch function, which
is called for each place which needs to be patched: this returns a
number of instructions (the rest are NOP-padded).
Usually we can spare a register (%eax) for the binary patched code to
use, but in a couple of critical places in entry.S we can't: we make
the clobbers explicit at the call site, and manually clobber the
allowed registers in debug mode as an extra check.
And:
Don't abuse CONFIG_DEBUG_KERNEL, add CONFIG_DEBUG_PARAVIRT.
And:
AK: Fix warnings in x86-64 alternative.c build
And:
AK: Fix compilation with defconfig
And:
^From: Andrew Morton <akpm@osdl.org>
Some binutlises still like to emit references to __stop_parainstructions and
__start_parainstructions.
And:
AK: Fix warnings about unused variables when PARAVIRT is disabled.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Create a paravirt.h header for all the critical operations which need to be
replaced with hypervisor calls, and include that instead of defining native
operations, when CONFIG_PARAVIRT.
This patch does the dumbest possible replacement of paravirtualized
instructions: calls through a "paravirt_ops" structure. Currently these are
function implementations of native hardware: hypervisors will override the ops
structure with their own variants.
All the pv-ops functions are declared "fastcall" so that a specific
register-based ABI is used, to make inlining assember easier.
And:
+From: Andy Whitcroft <apw@shadowen.org>
The paravirt ops introduce a 'weak' attribute onto memory_setup().
Code ordering leads to the following warnings on x86:
arch/i386/kernel/setup.c:651: warning: weak declaration of
`memory_setup' after first use results in unspecified behavior
Move memory_setup() to avoid this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Clean up the espfix code:
- Introduced PER_CPU() macro to be used from asm
- Introduced GET_DESC_BASE() macro to be used from asm
- Rewrote the fixup code in asm, as calling a C code with the altered %ss
appeared to be unsafe
- No longer altering the stack from a .fixup section
- 16bit per-cpu stack is no longer used, instead the stack segment base
is patched the way so that the high word of the kernel and user %esp
are the same.
- Added the limit-patching for the espfix segment. (Chuck Ebbert)
[jeremy@goop.org: use the x86 scaling addressing mode rather than shifting]
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Zachary Amsden <zach@vmware.com>
Acked-by: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
This patch removes the default_ldt[] array, as it has been unused since
iBCS stopped being supported. This means it is now possible to actually
set an empty LDT segment.
In order to deal with this, the set_ldt_desc/load_LDT pair has been
replaced with a single set_ldt() operation which is responsible for both
setting up the LDT descriptor in the GDT, and reloading the LDT register.
If there are no LDT entries, the LDT register is loaded with a NULL
descriptor.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Fix pack_descriptor:
1. flags are bits 20-23 in the high word
2. limit's 4 msb are bits 16-19 in the high word
These haven't mattered so far, because all users have had small limits
and a flags setting of 0.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
===================================================================
The implementation comes from Zach's [RFC, PATCH 10/24] i386 Vmi
descriptor changes:
Descriptor and trap table cleanups. Add cleanly written accessors for
IDT and GDT gates so the subarch may override them. Note that this
allows the hypervisor to transparently tweak the DPL of the descriptors
as well as the RPL of segments in those descriptors, with no unnecessary
kernel code modification. It also allows the hypervisor implementation
of the VMI to tweak the gates, allowing for custom exception frames or
extra layers of indirection above the guest fault / IRQ handlers.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Recent GDT changes broke the SMP boot sequence if the booting CPU is
numbered anything other than zero. There's also a subtle source of error
in that the boot time CPU now uses cpu_gdt_table (which is actually the GDT
for booting CPUs in head.S). This patch fixes both problems by making GDT
descriptors themselves allocated from a per_cpu area and switching to them
in cpu_init(), which now means that cpu_gdt_table is exclusively used for
booting CPUs again.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Matt Tolentino <metolent@snoqualmie.dp.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make GDT page aligned and page padded to support running inside of a
hypervisor. This prevents false sharing of the GDT page with other hot
data, which is not allowed in Xen, and causes performance problems in
VMware.
Rather than go back to the old method of statically allocating the GDT
(which wastes unneded space for non-present CPUs), the GDT for APs is
allocated dynamically.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add an accessor function for getting the per-CPU gdt. Callee must already
have the CPU.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
GCC can generate better code around descriptor update and access functions
when there is not an explicit "eax" register constraint.
Testing: You won't boot if this is messed up, since the TSS descriptor will be
corrupted. Verified the assembler and booted.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
i386 inline assembler cleanup.
This change encapsulates descriptor and task register management. Also,
it is possible to improve assembler generation in two cases; savesegment
may store the value in a register instead of a memory location, which
allows GCC to optimize stack variables into registers, and MOV MEM, SEG
is always a 16-bit write to memory, making the casting in math-emu
unnecessary.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!