Commit Graph

12046 Commits

Author SHA1 Message Date
Avi Kivity d46164dbd9 KVM: x86 emulator: implement IMUL REG, R/M, IMM (opcode 69)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:16 +02:00
Avi Kivity 7db41eb762 KVM: x86 emulator: add Src2Imm decoding
Needed for 3-operand IMUL.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:15 +02:00
Avi Kivity 39f21ee546 KVM: x86 emulator: consolidate immediate decode into a function
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:14 +02:00
Avi Kivity 48bb5d3c40 KVM: x86 emulator: implement RDTSC (opcode 0F 31)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:14 +02:00
Avi Kivity 7077aec0bc KVM: x86 emulator: remove SrcImplicit
Useless.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:13 +02:00
Avi Kivity 5c82aa2998 KVM: x86 emulator: implement IMUL REG, R/M (opcode 0F AF)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:12 +02:00
Avi Kivity f3a1b9f496 KVM: x86 emulator: implement IMUL REG, R/M, imm8 (opcode 6B)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:12 +02:00
Avi Kivity 40ece7c729 KVM: x86 emulator: implement RET imm16 (opcode C2)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:12 +02:00
Avi Kivity b250e60589 KVM: x86 emulator: add SrcImmU16 operand type
Used for RET NEAR instructions.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:12 +02:00
Avi Kivity 0ef753b8c3 KVM: x86 emulator: implement CALL FAR (FF /3)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:12 +02:00
Avi Kivity 7af04fc05c KVM: x86 emulator: implement DAS (opcode 2F)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:11 +02:00
Avi Kivity fb2c264105 KVM: x86 emulator: Use a register for ____emulate_2op() destination
Most x86 two operand instructions allow the destination to be a memory operand,
but IMUL (for example) requires that the destination be a register.  Change
____emulate_2op() to take a register for both source and destination so we
can invoke IMUL.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:11 +02:00
Avi Kivity b3b3d25a12 KVM: x86 emulator: pass destination type to ____emulate_2op()
We'll need it later so we can use a register for the destination.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:11 +02:00
Wei Yongjun f2f3184534 KVM: x86 emulator: add LOOP/LOOPcc instruction emulation
Add LOOP/LOOPcc instruction emulation (opcode 0xe0~0xe2).

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:11 +02:00
Wei Yongjun e8b6fa70e3 KVM: x86 emulator: add CBW/CWDE/CDQE instruction emulation
Add CBW/CWDE/CDQE instruction emulation.(opcode 0x98)
Used by FreeBSD's boot loader.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:11 +02:00
Avi Kivity 0fa6ccbd28 KVM: x86 emulator: fix REPZ/REPNZ termination condition
EFLAGS.ZF needs to be checked after each iteration, not before.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:10 +02:00
Avi Kivity f6b33fc504 KVM: x86 emulator: implement SCAS (opcodes AE, AF)
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:10 +02:00
Avi Kivity 5c56e1cf7a KVM: x86 emulator: fix INTn emulation not pushing EFLAGS and CS
emulate_push() only schedules a push; it doesn't actually push anything.
Call writeback() to flush out the write.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:09 +02:00
Wei Yongjun a13a63faa6 KVM: x86 emulator: remove dup code of in/out instruction
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:09 +02:00
Wei Yongjun 41167be544 KVM: x86 emulator: change OUT instruction to use dst instead of src
Change OUT instruction to use dst instead of src, so we can
reuse those code for all out instructions.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:09 +02:00
Wei Yongjun 943858e275 KVM: x86 emulator: introduce DstImmUByte for dst operand decode
Introduce DstImmUByte for dst operand decode, which
will be used for out instruction.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:09 +02:00
Wei Yongjun c483c02ad3 KVM: x86 emulator: remove useless label from x86_emulate_insn()
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:09 +02:00
Wei Yongjun ee45b58efe KVM: x86 emulator: add setcc instruction emulation
Add setcc instruction emulation (opcode 0x0f 0x90~0x9f)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:08 +02:00
Wei Yongjun 92f738a52b KVM: x86 emulator: add XADD instruction emulation
Add XADD instruction emulation (opcode 0x0f 0xc0~0xc1)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:06 +02:00
Wei Yongjun 31be40b398 KVM: x86 emulator: put register operand write back to a function
Introduce function write_register_operand() to write back the
register operand.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:06 +02:00
Mohammed Gamal 8ec4722dd2 KVM: Separate emulation context initialization in a separate function
The code for initializing the emulation context is duplicated at two
locations (emulate_instruction() and kvm_task_switch()). Separate it
in a separate function and call it from there.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:04 +02:00
Wei Yongjun d9574a25af KVM: x86 emulator: add bsf/bsr instruction emulation
Add bsf/bsr instruction emulation (opcode 0x0f 0xbc~0xbd)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:04 +02:00
Mohammed Gamal 8c5eee30a9 KVM: x86 emulator: Fix emulate_grp3 return values
This patch lets emulate_grp3() return X86EMUL_* return codes instead
of hardcoded ones.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:04 +02:00
Mohammed Gamal 3f9f53b0d5 KVM: x86 emulator: Add unary mul, imul, div, and idiv instructions
This adds unary mul, imul, div, and idiv instructions (group 3 r/m 4-7).

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:04 +02:00
Wei Yongjun ba7ff2b76d KVM: x86 emulator: mask group 8 instruction as BitOp
Mask group 8 instruction as BitOp, so we can share the
code for adjust the source operand.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:03 +02:00
Wei Yongjun 3885f18fe3 KVM: x86 emulator: do not adjust the address for immediate source
adjust the dst address for a register source but not adjust the
address for an immediate source.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:02 +02:00
Wei Yongjun 35c843c485 KVM: x86 emulator: fix negative bit offset BitOp instruction emulation
If bit offset operands is a negative number, BitOp instruction
will return wrong value. This patch fix it.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:01 +02:00
Mohammed Gamal 8744aa9aad KVM: x86 emulator: Add stc instruction (opcode 0xf9)
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:01 +02:00
Wei Yongjun c034da8b92 KVM: x86 emulator: using SrcOne for instruction d0/d1 decoding
Using SrcOne for instruction d0/d1 decoding.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:00 +02:00
Wei Yongjun 36089fed70 KVM: x86 emulator: disable writeback when decode dest operand
This patch change to disable writeback when decode dest
operand if the dest type is ImplicitOps or not specified.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:00 +02:00
Wei Yongjun 06cb704611 KVM: x86 emulator: use SrcAcc to simplify stos decoding
Use SrcAcc to simplify stos decoding.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:00 +02:00
Mohammed Gamal 6e154e56b4 KVM: x86 emulator: Add into, int, and int3 instructions (opcodes 0xcc-0xce)
This adds support for int instructions to the emulator.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:00 +02:00
Mohammed Gamal 160ce1f1a8 KVM: x86 emulator: Allow accessing IDT via emulator ops
The patch adds a new member get_idt() to x86_emulate_ops.
It also adds a function to get the idt in order to be used by the emulator.

This is needed for real mode interrupt injection and the emulation of int
instructions.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:59 +02:00
Wei Yongjun d3ad624329 KVM: x86 emulator: simplify two-byte opcode check
Two-byte opcode always start with 0x0F and the decode flags
of opcode 0xF0 is always 0, so remove dup check.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:59 +02:00
Alexander Graf ba49296236 KVM: Move kvm_guest_init out of generic code
Currently x86 is the only architecture that uses kvm_guest_init(). With
PowerPC we're getting a second user, but the signature is different there
and we don't need to export it, as it uses the normal kernel init framework.

So let's move the x86 specific definition of that function over to the x86
specfic header file.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:49 +02:00
Mohammed Gamal 34698d8c61 KVM: x86 emulator: Fix nop emulation
If a nop instruction is encountered, we jump directly to the done label.
This skip updating rip. Break from the switch case instead

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:41 +02:00
Avi Kivity 2dbd0dd711 KVM: x86 emulator: Decode memory operands directly into a 'struct operand'
Since modrm operand can be either register or memory, decoding it into
a 'struct operand', which can represent both, is simpler.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:40 +02:00
Avi Kivity 1f6f05800e KVM: x86 emulator: change invlpg emulation to use src.mem.addr
Instead of using modrm_ea, which will soon be gone.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:39 +02:00
Avi Kivity 342fc63095 KVM: x86 emulator: switch LEA to use SrcMem decoding
The NoAccess flag will prevent memory from being accessed.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:38 +02:00
Avi Kivity 5a506b125f KVM: x86 emulator: add NoAccess flag for memory instructions that skip access
Use for INVLPG, which accesses the tlb, not memory.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:37 +02:00
Avi Kivity b27f38563d KVM: x86 emulator: use struct operand for mov reg,dr and mov dr,reg for reg op
This is an ordinary modrm source or destination; use the standard structure
representing it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:36 +02:00
Avi Kivity 1a0c7d44e4 KVM: x86 emulator: use struct operand for mov reg,cr and mov cr,reg for reg op
This is an ordinary modrm source or destination; use the standard structure
representing it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:35 +02:00
Avi Kivity cecc9e3916 KVM: x86 emulator: mark mov cr and mov dr as 64-bit instructions in long mode
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:35 +02:00
Avi Kivity 7f9b4b75be KVM: x86 emulator: introduce Op3264 for mov cr and mov dr instructions
The operands for these instructions are 32 bits or 64 bits, depending on
long mode, and ignoring REX prefixes, or the operand size prefix.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:35 +02:00
Avi Kivity 1e87e3efe7 KVM: x86 emulator: simplify REX.W check
(x && (x & y)) == (x & y)

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:34 +02:00
Avi Kivity d4709c78ee KVM: x86 emulator: drop use_modrm_ea
Unused (and has never been).

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:34 +02:00
Avi Kivity 91ff3cb43c KVM: x86 emulator: put register operand fetch into a function
The code is repeated three times, put it into fetch_register_operand()

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:33 +02:00
Avi Kivity 3d9e77dff8 KVM: x86 emulator: use SrcAcc to simplify xchg decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:33 +02:00
Avi Kivity 4515453964 KVM: x86 emulator: simplify xchg decode tables
Use X8() to avoid repetition.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:33 +02:00
Avi Kivity 1a6440aef6 KVM: x86 emulator: use correct type for memory address in operands
Currently we use a void pointer for memory addresses.  That's wrong since
these are guest virtual addresses which are not directly dereferencable by
the host.

Use the correct type, unsigned long.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:33 +02:00
Avi Kivity 09ee57cdae KVM: x86 emulator: push segment override out of decode_modrm()
Let it compute modrm_seg instead, and have the caller apply it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:33 +02:00
Joerg Roedel dbe7758482 KVM: SVM: Check for asid != 0 on nested vmrun
This patch lets a nested vmrun fail if the L1 hypervisor
left the asid zero. This fixes the asid_zero unit test.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:32 +02:00
Joerg Roedel 52c65a30a5 KVM: SVM: Check for nested vmrun intercept before emulating vmrun
This patch lets the nested vmrun fail if the L1 hypervisor
has not intercepted vmrun. This fixes the "vmrun intercept
check" unit test.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:32 +02:00
Xiao Guangrong 4132779b17 KVM: MMU: mark page dirty only when page is really written
Mark page dirty only when this page is really written, it's more exacter,
and also can fix dirty page marking in speculation path

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:32 +02:00
Xiao Guangrong 8672b7217a KVM: MMU: move bits lost judgement into a separate function
Introduce spte_has_volatile_bits() function to judge whether spte
bits will miss, it's more readable and can help us to cleanup code
later

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:31 +02:00
Xiao Guangrong 251464c464 KVM: MMU: using kvm_set_pfn_accessed() instead of mark_page_accessed()
It's a small cleanup that using using kvm_set_pfn_accessed() instead
of mark_page_accessed()

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:30 +02:00
Gleb Natapov 4fc40f076f KVM: x86 emulator: check io permissions only once for string pio
Do not recheck io permission on every iteration.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:29 +02:00
Avi Kivity 9928ff608b KVM: x86 emulator: fix LMSW able to clear cr0.pe
LMSW is documented not to be able to clear cr0.pe; make it so.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:50:28 +02:00
Gleb Natapov e85d28f8e8 KVM: x86 emulator: don't update vcpu state if instruction is restarted
No need to update vcpu state since instruction is in the middle of the
emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:27 +02:00
Avi Kivity 63540382cc KVM: x86 emulator: convert some push instructions to direct decode
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:26 +02:00
Avi Kivity d0e533255d KVM: x86 emulator: allow repeat macro arguments to contain commas
Needed for repeating instructions with execution functions.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:25 +02:00
Avi Kivity 73fba5f4fe KVM: x86 emulator: move decode tables downwards
So they can reference execution functions.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:25 +02:00
Avi Kivity dde7e6d12a KVM: x86 emulator: move x86_decode_insn() downwards
No code changes.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:24 +02:00
Avi Kivity ef65c88912 KVM: x86 emulator: allow storing emulator execution function in decode tables
Instead of looking up the opcode twice (once for decode flags, once for
the big execution switch) look up both flags and function in the decode tables.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:22 +02:00
Avi Kivity 9aabc88fc8 KVM: x86 emulator: store x86_emulate_ops in emulation context
It doesn't ever change, so we don't need to pass it around everywhere.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:21 +02:00
Avi Kivity ab85b12b1a KVM: x86 emulator: move ByteOp and Dst back to bits 0:3
Now that the group index no longer exists, the space is free.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:20 +02:00
Avi Kivity 3885d530b0 KVM: x86 emulator: drop support for old-style groups
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:19 +02:00
Avi Kivity 9f5d3220e3 KVM: x86 emulator: convert group 9 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:18 +02:00
Avi Kivity 2cb20bc8af KVM: x86 emulator: convert group 8 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:18 +02:00
Avi Kivity 2f3a9bc9eb KVM: x86 emulator: convert group 7 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:16 +02:00
Avi Kivity b67f9f0741 KVM: x86 emulator: convert group 5 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:15 +02:00
Avi Kivity 591c9d20a3 KVM: x86 emulator: convert group 4 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:14 +02:00
Avi Kivity ee70ea30ee KVM: x86 emulator: convert group 3 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:13 +02:00
Avi Kivity 99880c5cd5 KVM: x86 emulator: convert group 1A to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:12 +02:00
Avi Kivity 5b92b5faff KVM: x86 emulator: convert group 1 to new style
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:11 +02:00
Avi Kivity 120df8902d KVM: x86 emulator: allow specifying group directly in opcode
Instead of having a group number, store the group table pointer directly in
the opcode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:10 +02:00
Avi Kivity 793d5a8d6b KVM: x86 emulator: reserve group code 0
We'll be using that to distinguish between new-style and old-style groups.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:09 +02:00
Avi Kivity 42a1c52095 KVM: x86 emulator: move group tables to top
No code changes.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:08 +02:00
Avi Kivity fd853310a1 KVM: x86 emulator: Add wrappers for easily defining opcodes
Once 'struct opcode' grows, its initializer will become more complicated.
Wrap the simple initializers in a D() macro, and replace the empty initializers
with an even simpler N macro.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:08 +02:00
Avi Kivity d65b1dee40 KVM: x86 emulator: introduce 'struct opcode'
This will hold all the information known about the opcode.  Currently, this
is just the decode flags.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:07 +02:00
Avi Kivity ea9ef04e19 KVM: x86 emulator: drop parentheses in repreat macros
The parenthese make is impossible to use the macros with initializers that
require braces.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:06 +02:00
Mohammed Gamal 62bd430e6d KVM: x86 emulator: Add IRET instruction
Ths patch adds IRET instruction (opcode 0xcf).
Currently, only IRET in real mode is emulated. Protected mode support is to be added later if needed.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Reviewed-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:05 +02:00
Joerg Roedel 7a190667bb KVM: SVM: Emulate next_rip svm feature
This patch implements the emulations of the svm next_rip
feature in the nested svm implementation in kvm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:04 +02:00
Joerg Roedel 3f6a9d1693 KVM: SVM: Sync efer back into nested vmcb
This patch fixes a bug in a nested hypervisor that heavily
switches between real-mode and long-mode. The problem is
fixed by syncing back efer into the guest vmcb on emulated
vmexit.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:03 +02:00
Xiao Guangrong 19ada5c4b6 KVM: MMU: remove valueless output message
After commit 53383eaad08d, the '*spte' has updated before call
rmap_remove()(in most case it's 'shadow_trap_nonpresent_pte'), so
remove this information from error message

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:02 +02:00
Avi Kivity d359192fea KVM: VMX: Use host_gdt variable wherever we need the host gdt
Now that we have the host gdt conveniently stored in a variable, make use
of it instead of querying the cpu.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:01 +02:00
Avi Kivity e071edd5ba KVM: x86 emulator: unify the two Group 3 variants
Use just one group table for byte (F6) and word (F7) opcodes.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:50:00 +02:00
Avi Kivity dfe11481d8 KVM: x86 emulator: Allow LOCK prefix for NEG and NOT
Opcodes F6/2, F6/3, F7/2, F7/3.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:59 +02:00
Avi Kivity 4968ec4e26 KVM: x86 emulator: simplify Group 1 decoding
Move operand decoding to the opcode table, keep lock decoding in the group
table.  This allows us to get consolidate the four variants of Group 1 into one
group.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:58 +02:00
Avi Kivity 52811d7de5 KVM: x86 emulator: mix decode bits from opcode and group decode tables
Allow bits that are common to all members of a group to be specified in the
opcode table instead of the group table.  This allows some simplification
of the decode tables.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:58 +02:00
Avi Kivity 047a481809 KVM: x86 emulator: add Undefined decode flag
Add a decode flag to indicate the instruction is invalid.  Will come in useful
later, when we mix decode bits from the opcode and group table.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:57 +02:00
Avi Kivity 2ce495365f KVM: x86 emulator: Make group storage bits separate from operand bits
Currently group bits are stored in bits 0:7, where operand bits are stored.

Make group bits be 0:3, and move the existing bits 0:3 to 16:19, so we can
mix group and operand bits.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:55 +02:00
Avi Kivity 880a188378 KVM: x86 emulator: consolidate Jcc rel32 decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:55 +02:00
Avi Kivity be8eacddbd KVM: x86 emulator: consolidate CMOVcc decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:53 +02:00
Avi Kivity b6e6153885 KVM: x86 emulator: consolidate MOV reg, imm decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:52 +02:00
Avi Kivity b3ab3405fe KVM: x86 emulator: consolidate Jcc rel8 decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:51 +02:00
Avi Kivity 3849186c38 KVM: x86 emulator: consolidate push/pop reg decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:49 +02:00
Avi Kivity 749358a6b4 KVM: x86 emulator: consolidate inc/dec reg decoding
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:48 +02:00
Avi Kivity 83babbca46 KVM: x86 emulator: add macros for repetitive instructions
Some instructions are repetitive in the opcode space, add macros for
consolidating them.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:48 +02:00
Avi Kivity 91269b8f94 KVM: x86 emulator: fix handling for unemulated instructions
If an instruction is present in the decode tables but not in the execution
switch, it will be emulated as a NOP.  An example is IRET (0xcf).

Fix by adding default: labels to the execution switches.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:49:47 +02:00
Jiri Slaby e4072a9a9d x86, printk: Get rid of <0> from stack output
The stack output currently looks like this:

 7fffffffffffffff 0000000a00000000 ffffffff81093341 0000000000000046
<0> ffff88003a545fd8 0000000000000000 0000000000000000 00007fffa39769c0
<0> ffff88003e403f58 ffffffff8102fc4c ffff88003e403f58 ffff88003e403f78

The superfluous <0> are caused by recent printk KERN_CONT
change. <*> is now ignored in printk unless some text follows
the level and even then it still has to be the first in the
format message.

Note that the log_lvl parameter is now completely ignored in
show_stack_log_lvl and the stack is dumped with the default
level (like for quite some time already). It behaves the same as
the rest of the dump, function traces are dumped in the very
same manner. Only Code and maybe some lines are printed with
EMERG level.

Unfortunately I see no way how to fix this conceptually to have
the whole oops/BUG/panic output with the same level, so this
removed only the superfluous characters for the time being.

Just for illustration:

<4>Process kworker/0:0 (pid: 0, threadinfo ffff88003c8a6000, task ffff88003c85c100)
<0>Stack:
<4> ffffffff818022c0 0000000a00000001 0000000000000001 0000000000000046
<4> ffff88003c8a7fd8 0000000000000001 ffff88003c8a7e58 0000000000000000
<4> ffff88003e503f48 ffffffff8102fc4c ffff88003e503f48 ffff88003e503f68
<0>Call Trace:
<0> <IRQ>
<4> [<ffffffff8102fc4c>] ? call_softirq+0x1c/0x30 ...
<0>Code: 00 01 00 00 65 8b 04 25 80 c5 00 00 c7 45 ...

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: jirislaby@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1287586131-16222-1-git-send-email-jslaby@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-23 20:03:03 +02:00
Linus Torvalds 02f36038c5 Merge branches 'softirq-for-linus', 'x86-debug-for-linus', 'x86-numa-for-linus', 'x86-quirks-for-linus', 'x86-setup-for-linus', 'x86-uv-for-linus' and 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'softirq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  softirqs: Make wakeup_softirqd static

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, asm: Restore parentheses around one pushl_cfi argument
  x86, asm: Fix ancient-GAS workaround
  x86, asm: Fix CFI macro invocations to deal with shortcomings in gas

* 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA

* 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: HPET force enable for CX700 / VIA Epia LT

* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, setup: Use string copy operation to optimze copy in kernel compression

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Use allocated buffer in tlb_uv.c:tunables_read()

* 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers.
2010-10-23 08:25:36 -07:00
Linus Torvalds 10f2a2b0f6 Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32, mm: Add an initial page table for core bootstrapping
2010-10-22 20:37:50 -07:00
Linus Torvalds 8814011679 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kdb,debug_core: adjust master cpu switch logic against new debug_core locking
  debug_core: refactor locking for master/slave cpus
  x86,kgdb: remove unnecessary call to kgdb_correct_hw_break()
  debug_core: disable hw_breakpoints on all cores in kgdb_cpu_enter()
  kdb,kgdb: fix sparse fixups
  kdb: Fix oops in kdb_unregister
  kdb,ftdump: Remove reference to internal kdb include
  kdb: Allow kernel loadable modules to add kdb shell functions
  debug_core: stop rcu warnings on kernel resume
  debug_core: move all watch dog syncs to a single function
  x86,kgdb: fix debugger hw breakpoint test regression in 2.6.35
2010-10-22 20:35:12 -07:00
Linus Torvalds 0fc0531e0a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: update comments to reflect that percpu allocations are always zero-filled
  percpu: Optimize __get_cpu_var()
  x86, percpu: Optimize this_cpu_ptr
  percpu: clear memory allocated with the km allocator
  percpu: fix build breakage on s390 and cleanup build configuration tests
  percpu: use percpu allocator on UP too
  percpu: reduce PCPU_MIN_UNIT_SIZE to 32k
  vmalloc: pcpu_get/free_vm_areas() aren't needed on UP

Fixed up trivial conflicts in include/linux/percpu.h
2010-10-22 17:31:36 -07:00
Dongdong Deng 39a0715f5a x86,kgdb: remove unnecessary call to kgdb_correct_hw_break()
The kernel debug_core invokes hw breakpoint install and removal via
call backs.  The architecture specific kgdb stubs only need to
implement the call backs and not actually call the functions.

Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
CC: x86@kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
2010-10-22 15:34:13 -05:00
Jason Wessel 91b152aa85 kdb,kgdb: fix sparse fixups
Fix the following sparse warnings:

kdb_main.c:328:5: warning: symbol 'kdbgetu64arg' was not declared. Should it be static?
kgdboc.c:246:12: warning: symbol 'kgdboc_early_init' was not declared. Should it be static?
kgdb.c:652:26: warning: incorrect type in argument 1 (different address spaces)
kgdb.c:652:26:    expected void const *ptr
kgdb.c:652:26:    got struct perf_event *[noderef] <asn:3>*pev

The one in kgdb.c required the (void * __force) because of the return
code from register_wide_hw_breakpoint looking like:

        return (void __percpu __force *)ERR_PTR(err);

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-10-22 15:34:12 -05:00
Jason Wessel fad99fac26 x86,kgdb: fix debugger hw breakpoint test regression in 2.6.35
HW breakpoints events stopped working correctly with kgdb as a result
of commit: 018cbffe68 (Merge commit
'v2.6.33' into perf/core), later commit:
ba773f7c51 (x86,kgdb: Fix hw breakpoint
regression) allowed breakpoints to propagate to the debugger core but
did not completely address the original regression in functionality
found in 2.6.35.

When the DR_STEP flag is set in dr6 along with any of the DR_TRAP
bits, the kgdb exception handler will enter once from the
hw_breakpoint API call back and again from the die notifier for
do_debug(), which causes the debugger to stop twice and also for the
kgdb regression tests to fail running under kvm with:

echo V2I1 > /sys/module/kgdbts/parameters/kgdbts

To address the problem, the kgdb overflow handler needs to implement
the same logic as the ptrace overflow handler call back with respect
to updating the virtual copy of dr6.  This will allow the kgdb
do_debug() die notifier to properly handle the exception and the
attached debugger, or kgdb test suite, will only receive a single
notification.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: x86@kernel.org
2010-10-22 15:34:10 -05:00
Stefano Stabellini 0e058e5277 xen: add a missing #include to arch/x86/pci/xen.c
Add missing #include <asm/io_apic.h> to arch/x86/pci/xen.c.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2010-10-22 21:26:02 +01:00
Stefano Stabellini ff12849a7a xen: mask the MTRR feature from the cpuid
We don't want Linux to think that the cpu supports MTRRs when running
under Xen because MTRR operations could only be performed through
hypercalls.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:26:02 +01:00
Juan Quintela 4ec5387cc3 xen: add the direct mapping area for ISA bus access
add the direct mapping area for ISA bus access when running as initial
domain

Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:47 +01:00
Stefano Stabellini 801fd14a72 xen: use vcpu_ops to setup cpu masks
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:45 +01:00
Jeremy Fitzhardinge 98511f3532 xen: map a dummy page for local apic and ioapic in xen_set_fixmap
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:44 +01:00
Qing He f731e3ef02 xen: remap MSIs into pirqs when running as initial domain
Implement xen_create_msi_irq to create an msi and remap it as pirq.
Use xen_create_msi_irq to implement an initial domain specific version
of setup_msi_irqs.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:44 +01:00
Jeremy Fitzhardinge 38aa66fcb7 xen: remap GSIs as pirqs when running as initial domain
Implement xen_register_gsi to setup the correct triggering and polarity
properties of a gsi.
Implement xen_register_pirq to register a particular gsi as pirq and
receive interrupts as events.
Call xen_setup_pirqs to register all the legacy ISA irqs as pirqs.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:43 +01:00
Stefano Stabellini 6b0661a5e6 xen: introduce XEN_DOM0 as a silent option
Add XEN_DOM0 to arch/x86/xen/Kconfig as a silent compile time option
that gets enabled when xen and basic x86, acpi and pci support are
selected.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:43 +01:00
Stefano Stabellini 809f9267bb xen: map MSIs into pirqs
Map MSIs into pirqs, writing 0 in the MSI vector data field and the pirq
number in the MSI destination id field.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:43 +01:00
Stefano Stabellini 3942b740e5 xen: support GSI -> pirq remapping in PV on HVM guests
Disable pcifront when running on HVM: it is meant to be used with pv
guests that don't have PCI bus.

Use acpi_register_gsi_xen_hvm to remap GSIs into pirqs.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:42 +01:00
Jeremy Fitzhardinge 90f6881e64 xen: add xen hvm acpi_register_gsi variant
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-10-22 21:25:42 +01:00
Jeremy Fitzhardinge 2f065aef17 acpi: use indirect call to register gsi in different modes
Rather than using a tree of conditionals, use function pointer
for acpi_register_gsi.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-10-22 21:25:41 +01:00
Stefano Stabellini 42a1de56f3 xen: implement xen_hvm_register_pirq
xen_hvm_register_pirq allows the kernel to map a GSI into a Xen pirq and
receive the interrupt as an event channel from that point on.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 21:25:41 +01:00
Stefano Stabellini 67ba37293e Merge commit 'konrad/stable/xen-pcifront-0.8.2' into 2.6.36-rc8-initial-domain-v6 2010-10-22 21:24:06 +01:00
Ian Campbell 9e9a5fcb04 xen: use host E820 map for dom0
When running as initial domain, get the real physical memory map from
xen using the XENMEM_machine_memory_map hypercall and use it to setup
the e820 regions.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-22 13:19:19 -07:00
Ian Campbell 375b2a9ada xen: correctly rebuild mfn list list after migration.
Otherwise the second migration attempt fails because the mfn_list_list
still refers to all the old mfns.

We need to update the entires in both p2m_top_mfn and the mid_mfn
pages which p2m_top_mfn refers to.

In order to do this we need to keep track of the virtual addresses
mapping the p2m_mid_mfn pages since we cannot rely on
mfn_to_virt(p2m_top_mfn[idx]) since p2m_top_mfn[idx] will still
contain the old MFN after a migration, which may now belong to another
domain and hence have a different mapping in the m2p.

Therefore add and maintain a third top level page, p2m_top_mfn_p[],
which tracks the virtual addresses of the mfns contained in
p2m_top_mfn[].

We also need to update the content of the p2m_mid_missing_mfn page on
resume to refer to the page's new mfn.

p2m_missing does not need updating since the migration process takes
care of the leaf p2m pages for us.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:36 -07:00
Jeremy Fitzhardinge 3654581e47 xen: don't add extra_pages for RAM after mem_end
If an E820 region is entirely beyond mem_end, don't attempt to truncate
it and add the truncated pages to extra_pages, as they will be negative.

Also, make sure the extra memory region starts after all BIOS provided
E820 regions (and in the case of RAM regions, post-clipping).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:32 -07:00
Jeremy Fitzhardinge 41f2e4771a xen: add support for PAT
Convert Linux PAT entries into Xen ones when constructing ptes.  Linux
doesn't use _PAGE_PAT for ptes, so the only difference in the first 4
entries is that Linux uses _PAGE_PWT for WC, whereas Xen (and default)
use it for WT.

xen_pte_val does the inverse conversion.

We hard-code assumptions about Linux's current PAT layout, but a
warning on the wrmsr to MSR_IA32_CR_PAT should point out any problems.
If necessary we could go to a more general table-based conversion between
Linux and Xen PAT entries.

hugetlbfs poses a problem at the moment, the x86 architecture uses the
same flag for _PAGE_PAT and _PAGE_PSE, which changes meaning depending
on which pagetable level we're using.  At the moment this should be OK
so long as nobody tries to do a pte_val on a hugetlbfs pte.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:31 -07:00
Jeremy Fitzhardinge 2f7acb2085 xen: make sure xen_max_p2m_pfn is up to date
Keep xen_max_p2m_pfn up to date with the end of the extra memory
we're adding.  It is possible that it will be too high since memory
may be truncated by a "mem=" option on the kernel command line, but
that won't matter.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:30 -07:00
Jeremy Fitzhardinge 698bb8d14a xen: limit extra memory to a certain ratio of base
If extra memory is very much larger than the base memory size
then all of the base memory can be filled with structures reserved to
describe the extra memory, leaving no space for anything else.

Even at the maximum ratio there will be little space for anything else,
but this change is intended to at least allow the system to boot rather
than crash mysteriously.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:29 -07:00
Jeremy Fitzhardinge b5b43ced7a xen: add extra pages for E820 RAM regions, even if beyond mem_end
If an entire E820 RAM region is beyond mem_end, still add its
pages to the extra area so that space can be used by the kernel.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:29 -07:00
Jeremy Fitzhardinge 36bc251b87 xen: make sure xen_extra_mem_start is beyond all non-RAM e820
If Xen gives us non-RAM E820 entries (dom0 only, typically), then
make sure the extra RAM region is beyond them.  It's OK for
the extra space to grow into E820 regions, however.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:28 -07:00
Jeremy Fitzhardinge 42ee1471e9 xen: implement "extra" memory to reserve space for pages not present at boot
When using the e820 map to get the initial pseudo-physical address space,
look for either Xen-provided memory which doesn't lie within an E820
region, or an E820 RAM region which extends beyond the Xen-provided
memory range.

Count these pages, and add them to a new "extra memory" range.  This range
has an E820 RAM range to describe it - so the kernel will allocate page
structures for it - but it is also marked reserved so that the kernel
will not attempt to use it.

The balloon driver can then add this range as a set of currently
ballooned-out pages, which can be used to extend the domain beyond its
original size.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:27 -07:00
Ian Campbell 35ae11fd14 xen: Use host-provided E820 map
Rather than simply using a flat memory map from Xen, use its provided
E820 map.  This allows the domain builder to tell the domain to reserve
space for more pages than those initially provided at domain-build time.

It also allows the host to specify holes in the address space (for
PCI-passthrough, for example).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:27 -07:00
Jeremy Fitzhardinge cfd8951e08 xen: don't map missing memory
When setting up a pte for a missing pfn (no matching mfn), just create
an empty pte rather than a junk mapping.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:26 -07:00
Jeremy Fitzhardinge 33a847502b xen: defer building p2m mfn structures until kernel is mapped
When building mfn parts of p2m structure, we rely on being able to
use mfn_to_virt, which in turn requires kernel to be mapped into
the linear area (which is distinct from the kernel image mapping
on 64-bit).  Defer calling xen_build_mfn_list_list() until after
xen_setup_kernel_pagetable();

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:25 -07:00
Jeremy Fitzhardinge c3798062f1 xen: add return value to set_phys_to_machine()
set_phys_to_machine() can return false on failure, which means a memory
allocation failure for the p2m structure.  It can only fail if setting
the mfn for a pfn in previously unused address space.  It is guaranteed
to succeed if you're setting a mapping to INVALID_P2M_ENTRY or updating
the mfn for an existing pfn.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:25 -07:00
Jeremy Fitzhardinge 58e05027b5 xen: convert p2m to a 3 level tree
Make the p2m structure a 3 level tree which covers the full possible
physical space.

The p2m structure contains mappings from the domain's pfns to system-wide
mfns.  The structure has 3 levels and two roots.  The first root is for
the domain's own use, and is linked with virtual addresses.  The second
is all mfn references, and is used by Xen on save/restore to allow it to
update the p2m mapping for the domain.

At boot, the domain builder provides a simple flat p2m array for all the
initially present pages.  We construct the two levels above that using
the early_brk allocator.  After early boot time, set_phys_to_machine()
will allocate any missing levels using the normal kernel allocator
(at GFP_KERNEL, so it must be called in a normal blocking context).

Because the early_brk() API requires us to pre-reserve the maximum amount
of memory we could allocate, there is still a CONFIG_XEN_MAX_DOMAIN_MEMORY
config option, but its only negative side-effect is to increase the
kernel's apparent bss size.  However, since all unused brk memory is
returned to the heap, there's no real downside to making it large.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:24 -07:00
Jeremy Fitzhardinge bbbf61eff9 xen: make install_p2mtop_page() static
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:23 -07:00
Jeremy Fitzhardinge 1f2d9dd309 xen: set the actual extent of the mfn_list_list
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:23 -07:00
Jeremy Fitzhardinge b7eb4ad391 xen: set shared_info->arch.max_pfn to max_p2m_pfn
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:22 -07:00
Jeremy Fitzhardinge 1e17fc7eff xen: remove noise about registering vcpu info
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:21 -07:00
Jeremy Fitzhardinge 764f0138b9 xen: allocate level1_ident_pgt
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:20 -07:00
Jeremy Fitzhardinge f0991802bb xen: use early_brk for level2_kernel_pgt
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:19 -07:00
Jeremy Fitzhardinge a2e8752987 xen: allocate p2m size based on actual max size
Allocate p2m tables based on the actual runtime maximum pfn rather than
the static config-time limit.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:19 -07:00
Jeremy Fitzhardinge a171ce6e7b xen: dynamically allocate p2m space
Use early brk mechanism to allocate p2m tables, to save memory when
booting non-Xen.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:18 -07:00
Jeremy Fitzhardinge 5e941c0939 x86: add RESERVE_BRK_ARRAY() helper
Useful when converting static arrays into boottime brk allocated objects.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:17 -07:00
Linus Torvalds db08bf0877 Merge git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
* git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic/io.h: allow people to override individual funcs
  bitops: remove duplicated extern declarations
  bitops: make asm-generic/bitops/find.h more generic
  asm-generic: kdebug.h: Checkpatch cleanup
  asm-generic: fcntl: make exported headers use strict posix types
  asm-generic: cmpxchg does not handle non-long arguments
  asm-generic: make atomic_add_unless a function
2010-10-22 11:17:06 -07:00
Linus Torvalds 092e0e7e52 Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
  vfs: make no_llseek the default
  vfs: don't use BKL in default_llseek
  llseek: automatically add .llseek fop
  libfs: use generic_file_llseek for simple_attr
  mac80211: disallow seeks in minstrel debug code
  lirc: make chardev nonseekable
  viotape: use noop_llseek
  raw: use explicit llseek file operations
  ibmasmfs: use generic_file_llseek
  spufs: use llseek in all file operations
  arm/omap: use generic_file_llseek in iommu_debug
  lkdtm: use generic_file_llseek in debugfs
  net/wireless: use generic_file_llseek in debugfs
  drm: use noop_llseek
2010-10-22 10:52:56 -07:00
Linus Torvalds 91151240ed Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE
  x86: Move alloc_desk_mask variables inside ifdef
  x86-32: Align IRQ stacks properly
  x86: Remove CONFIG_4KSTACKS
  x86: Always use irq stacks

Fixed up trivial conflicts in include/linux/{irq.h, percpu-defs.h}
2010-10-22 08:54:21 -07:00
Linus Torvalds 211baf4ffc Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Hpet: Avoid the comparator readback penalty
2010-10-22 08:47:45 -07:00
Rakib Mullick a69a0612c4 [CPUFREQ]: x86, cpufreq: Mark longrun_get_policy with __cpuinit.
This patch fixes the following warning. The function
longrun_cpu_init() is marked with __cpuinit which calls
longrun_get_policy() which is a __init function. So make
longrun_get_policy with __cpuinit.

WARNING: arch/x86/kernel/cpu/cpufreq/longrun.o(.cpuinit.text+0x4c5):
Section mismatch in reference from the function longrun_cpu_init() to
the function .init.text:longrun_get_policy()
The function __cpuinit longrun_cpu_init() references
a function __init longrun_get_policy().
If longrun_get_policy is only used by longrun_cpu_init then
annotate longrun_get_policy with a matching annotation.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-10-22 11:44:47 -04:00
Julia Lawall b2a33c1728 [CPUFREQ] arch/x86/kernel/cpu/cpufreq: Fix unsigned return type
In each case, the function has an unsigned return type, but returns a
negative constant to indicate an error condition.  Each function is only
called once.  For nforce2_detect_chipset, the result is only compared to 0,
and for longrun_determine_freqs, the result is stored in a variable of type
(signed) int.  Thus, for both functions, unsigned can be dropped from the
return type.

A sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@exists@
identifier f;
constant C;
@@

 unsigned f(...)
 { <+...
*  return -C;
 ...+> }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-10-22 11:44:47 -04:00
Andi Kleen 46e387bbd8 Merge branch 'hwpoison-hugepages' into hwpoison
Conflicts:
	mm/memory-failure.c
2010-10-22 17:40:48 +02:00
Peter Zijlstra 96681fc3c9 perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations
For performance reasons its best to use memory node local memory for
per-cpu buffers.

This logic comes from a much larger patch proposed by Stephane.

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.514465326@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:26 +02:00
Peter Zijlstra f80c9e304b perf, x86: Clean up reserve_ds_buffers() signature
Now that reserve_ds_buffers() never fails, change it to return
void and remove all code dealing with the error return.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.462621937@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:26 +02:00
Peter Zijlstra 6809b6ea73 perf, x86: Less disastrous PEBS/BTS buffer allocation failure
Currently PEBS/BTS buffers are allocated when we instantiate the first
event, when this fails everything fails.

This is a problem because esp. BTS tries to allocate a rather large
buffer (64K), which can easily fail.

This patch changes the logic such that when either buffer allocation
fails, we simply don't allow events that would use these facilities,
but continue functioning for all other events.

This logic comes from a much larger patch proposed by Stephane.

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.354429461@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:26 +02:00
Peter Zijlstra 5553be2620 perf, x86: Fixup the precise_ip computation
In case we don't have PEBS, the LBR fixup doesn't make sense.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.354429461@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:25 +02:00
Peter Zijlstra 65af94baca perf, x86: Extract DS alloc/free functions
Again, mostly a cleanup to unclutter the reserve_ds_buffer() code.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.304495776@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:25 +02:00
Peter Zijlstra 5ee25c8731 perf, x86: Extract PEBS/BTS allocation functions
Mostly a cleanup.. it reduces code indentation and makes the code flow
of reserve_ds_buffers() clearer.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.253453452@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:25 +02:00
Peter Zijlstra b39f88acd7 perf, x86: Extract PEBS/BTS buffer free routines
So that we may grow additional call-sites..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Stephane Eranian <eranian@google.com>
LKML-Reference: <20101019134808.196793164@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 14:18:24 +02:00
Jan Beulich 07bd8516a2 x86, asm: Restore parentheses around one pushl_cfi argument
These were (intentionally) stripped by "fix CFI macro
invocations to deal with shortcomings in gas" to expose problems
with unexpected splitting of arguments by older gas also on
newer versions, but as it turns out there is at least one distro
(Ubuntu 6.06) where even not having *any* spaces in a macro
argument doesn't reliably prevent splitting into multiple
arguments.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4CC157DB020000780001E8A2@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22 10:51:44 +02:00
Linus Torvalds 3044100e58 Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits)
  x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S
  xen: Cope with unmapped pages when initializing kernel pagetable
  memblock, bootmem: Round pfn properly for memory and reserved regions
  memblock: Annotate memblock functions with __init_memblock
  memblock: Allow memblock_init to be called early
  memblock/arm: Fix memblock_region_is_memory() typo
  x86, memblock: Remove __memblock_x86_find_in_range_size()
  memblock: Fix wraparound in find_region()
  x86-32, memblock: Make add_highpages honor early reserved ranges
  x86, memblock: Fix crashkernel allocation
  arm, memblock: Fix the sparsemem build
  memblock: Fix section mismatch warnings
  powerpc, memblock: Fix memblock API change fallout
  memblock, microblaze: Fix memblock API change fallout
  x86: Remove old bootmem code
  x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve
  x86: Remove not used early_res code
  x86, memblock: Replace e820_/_early string with memblock_
  x86: Use memblock to replace early_res
  x86, memblock: Use memblock_debug to control debug message print out
  ...

Fix up trivial conflicts in arch/x86/kernel/setup.c and kernel/Makefile
2010-10-21 18:52:11 -07:00
Linus Torvalds e36f561a2c Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflags
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflags:
  Fix IRQ flag handling naming
  MIPS: Add missing #inclusions of <linux/irq.h>
  smc91x: Add missing #inclusion of <linux/irq.h>
  Drop a couple of unnecessary asm/system.h inclusions
  SH: Add missing consts to sys_execve() declaration
  Blackfin: Rename IRQ flags handling functions
  Blackfin: Add missing dep to asm/irqflags.h
  Blackfin: Rename DES PC2() symbol to avoid collision
  Blackfin: Split the BF532 BFIN_*_FIO_FLAG() functions to their own header
  Blackfin: Split PLL code from mach-specific cdef headers
2010-10-21 14:37:27 -07:00
Linus Torvalds 157b6ceb13 Merge branch 'x86-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, iommu: Update header comments with appropriate naming
  ia64, iommu: Add a dummy iommu_table.h file in IA64.
  x86, iommu: Fix IOMMU_INIT alignment rules
  x86, doc: Adding comments about .iommu_table and its neighbors.
  x86, iommu: Utilize the IOMMU_INIT macros functionality.
  x86, VT-d: Make Intel VT-d IOMMU use IOMMU_INIT_* macros.
  x86, GART/AMD-VI: Make AMD GART and IOMMU use IOMMU_INIT_* macros.
  x86, calgary: Make Calgary IOMMU use IOMMU_INIT_* macros.
  x86, xen-swiotlb: Make Xen-SWIOTLB use IOMMU_INIT_* macros.
  x86, swiotlb: Make SWIOTLB use IOMMU_INIT_* macros.
  x86, swiotlb: Simplify SWIOTLB pci_swiotlb_detect routine.
  x86, iommu: Add proper dependency sort routine (and sanity check).
  x86, iommu: Make all IOMMU's detection routines return a value.
  x86, iommu: Add IOMMU_INIT macros, .iommu_table section, and iommu_table_entry structure
2010-10-21 14:23:48 -07:00
Linus Torvalds 4a60cfa945 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (96 commits)
  apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets
  apic, x86: Check if EILVT APIC registers are available (AMD only)
  x86: ioapic: Call free_irte only if interrupt remapping enabled
  arm: Use ARCH_IRQ_INIT_FLAGS
  genirq, ARM: Fix boot on ARM platforms
  genirq: Fix CONFIG_GENIRQ_NO_DEPRECATED=y build
  x86: Switch sparse_irq allocations to GFP_KERNEL
  genirq: Switch sparse_irq allocator to GFP_KERNEL
  genirq: Make sparse_lock a mutex
  x86: lguest: Use new irq allocator
  genirq: Remove the now unused sparse irq leftovers
  genirq: Sanitize dynamic irq handling
  genirq: Remove arch_init_chip_data()
  x86: xen: Sanitise sparse_irq handling
  x86: Use sane enumeration
  x86: uv: Clean up the direct access to irq_desc
  x86: Make io_apic.c local functions static
  genirq: Remove irq_2_iommu
  x86: Speed up the irq_remapped check in hot pathes
  intr_remap: Simplify the code further
  ...

Fix up trivial conflicts in arch/x86/Kconfig
2010-10-21 14:11:46 -07:00
Linus Torvalds 5fe8321b88 Merge branch 'x86-x2apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-x2apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, x2apic: Simplify apic init in SMP and UP builds
  x86, intr-remap: Remove IRTE setup duplicate code
  x86, intr-remap: Set redirection hint in the IRTE
2010-10-21 13:54:05 -07:00
Linus Torvalds 709d9f54cc Merge branch 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, paravirt: Remove alloc_pmd_clone hook, only used by VMI
  x86, vmware: Remove deprecated VMI kernel support

Fix up trivial #include conflict in arch/x86/kernel/smpboot.c
2010-10-21 13:53:24 -07:00
Linus Torvalds cca8209ed9 Merge branch 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, olpc: XO-1 uses/depends on PCI
  x86, olpc: Register XO-1 platform devices
  x86, olpc: Add XO-1 poweroff support
  x86, olpc: Don't retry EC commands forever
  x86, olpc: Rework BIOS signature check
  x86, olpc: Only enable PCI configuration type override on XO-1
2010-10-21 13:52:01 -07:00
Linus Torvalds d77bdc423d Merge branch 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mtrr: Support mtrr lookup for range spanning across MTRR range
  x86, mtrr: Refactor MTRR type overlap check code
2010-10-21 13:51:41 -07:00
Linus Torvalds 87affd0b94 Merge branch 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: sfi: Make local functions static
  x86, earlyprintk: Add hsu early console for Intel Medfield platform
  x86, earlyprintk: Add earlyprintk for Intel Moorestown platform
  x86: Add two helper macros for fixed address mapping
  x86, mrst: A function in a header file needs to be marked "inline"
2010-10-21 13:47:54 -07:00
Linus Torvalds c3b86a2942 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32, percpu: Correct the ordering of the percpu readmostly section
  x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 || HIGHMEM64G
  x86: Spread tlb flush vector between nodes
  percpu: Introduce a read-mostly percpu API
  x86, mm: Fix incorrect data type in vmalloc_sync_all()
  x86, mm: Hold mm->page_table_lock while doing vmalloc_sync
  x86, mm: Fix bogus whitespace in sync_global_pgds()
  x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation
  x86, mm: Add RESERVE_BRK_ARRAY() helper
  mm, x86: Saving vmcore with non-lazy freeing of vmas
  x86, kdump: Change copy_oldmem_page() to use cached addressing
  x86, mm: fix uninitialized addr in kernel_physical_mapping_init()
  x86, kmemcheck: Remove double test
  x86, mm: Make spurious_fault check explicitly check the PRESENT bit
  x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes
  x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions
  x86, mm: Avoid unnecessary TLB flush
2010-10-21 13:47:29 -07:00
Linus Torvalds 8d8d2e9ccd Merge branch 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mem: Optimize memmove for small size and unaligned cases
  x86, mem: Optimize memcpy by avoiding memory false dependece
  x86, mem: Don't implement forward memmove() as memcpy()
2010-10-21 13:46:28 -07:00
Linus Torvalds 2a8b67fb72 Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line
  x86, hotplug: Move WBINVD back outside the play_dead loop
  x86, hotplug: Use mwait to offline a processor, fix the legacy case
  x86, mwait: Move mwait constants to a common header file
2010-10-21 13:45:38 -07:00
Linus Torvalds b6f7e38dbb Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, fpu: Merge fpu_save_init()
  x86-32, fpu: Rewrite fpu_save_init()
  x86, fpu: Remove PSHUFB_XMM5_* macros
  x86, fpu: Remove unnecessary ifdefs from i387 code.
  x86-32, fpu: Remove math_emulate stub
  x86-64, fpu: Simplify constraints for fxsave/fxtstor
  x86-64, fpu: Fix %cs value in convert_from_fxsr()
  x86-64, fpu: Disable preemption when using TS_USEDFPU
  x86, fpu: Merge __save_init_fpu()
  x86, fpu: Merge tolerant_fwait()
  x86, fpu: Merge fpu_init()
  x86: Use correct type for %cr4
  x86, xsave: Disable xsave in i387 emulation mode

Fixed up fxsaveq-induced conflict in arch/x86/include/asm/i387.h
2010-10-21 13:34:32 -07:00
Alok Kataria 76fac077db x86, kexec: Make sure to stop all CPUs before exiting the kernel
x86 smp_ops now has a new op, stop_other_cpus which takes a parameter
"wait" this allows the caller to specify if it wants to stop until all
the cpus have processed the stop IPI.  This is required specifically
for the kexec case where we should wait for all the cpus to be stopped
before starting the new kernel.  We now wait for the cpus to stop in
all cases except for panic/kdump where we expect things to be broken
and we are doing our best to make things work anyway.

This patch fixes a legitimate regression, which was introduced during
2.6.30, by commit id 4ef702c10b.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1286833028.1372.20.camel@ank32.eng.vmware.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: <stable@kernel.org> v2.6.30-36
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-21 13:30:44 -07:00
Linus Torvalds 214515b578 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Remove pr_<level> uses of KERN_<level>
  therm_throt.c: Trivial printk message fix for a unsuitable abbreviation of 'thermal'
  x86: Use {push,pop}{l,q}_cfi in more places
  i386: Add unwind directives to syscall ptregs stubs
  x86-64: Use symbolics instead of raw numbers in entry_64.S
  x86-64: Adjust frame type at paranoid_exit:
  x86-64: Fix unwind annotations in syscall stubs
2010-10-21 13:20:32 -07:00
Linus Torvalds bf70030dc0 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cpu: Fix X86_FEATURE_NOPL
  x86, cpu: Re-run get_cpu_cap() after adjusting the CPUID level
2010-10-21 13:18:36 -07:00
Linus Torvalds d60a2793ba Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Remove stale pmtimer_64.c
  x86, cleanups: Use clear_page/copy_page rather than memset/memcpy
  x86: Remove unnecessary #ifdef ACPI/X86_IO_ACPI
  x86, cleanup: Remove obsolete boot_cpu_id variable
2010-10-21 13:18:06 -07:00
Linus Torvalds 781c5a67f1 Merge branch 'x86-bios-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-bios-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, bios: Make the x86 early memory reservation a kernel option
  x86, bios: By default, reserve the low 64K for all BIOSes
2010-10-21 13:06:49 -07:00
Linus Torvalds e990c77d06 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64, asm: If the assembler supports fxsave64, use it
  i386: Make kernel_execve() suitable for stack unwinding
2010-10-21 13:06:00 -07:00
Linus Torvalds 2f0384e5fc Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, amd_nb: Enable GART support for AMD family 0x15 CPUs
  x86, amd: Use compute unit information to determine thread siblings
  x86, amd: Extract compute unit information for AMD CPUs
  x86, amd: Add support for CPUID topology extension of AMD CPUs
  x86, nmi: Support NMI watchdog on newer AMD CPU families
  x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs
  x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB
  x86, k8-gart: Decouple handling of garts and northbridges
  x86, cacheinfo: Fix dependency of AMD L3 CID
  x86, kvm: add new AMD SVM feature bits
  x86, cpu: Fix allowed CPUID bits for KVM guests
  x86, cpu: Update AMD CPUID feature bits
  x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit
  x86, AMD: Remove needless CPU family check (for L3 cache info)
  x86, tsc: Remove CPU frequency calibration on AMD
2010-10-21 13:01:08 -07:00
Linus Torvalds bc4016f481 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits)
  sched: Export account_system_vtime()
  sched: Call tick_check_idle before __irq_enter
  sched: Remove irq time from available CPU power
  sched: Do not account irq time to current task
  x86: Add IRQ_TIME_ACCOUNTING
  sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time
  sched: Add a PF flag for ksoftirqd identification
  sched: Consolidate account_system_vtime extern declaration
  sched: Fix softirq time accounting
  sched: Drop group_capacity to 1 only if local group has extra capacity
  sched: Force balancing on newidle balance if local group has capacity
  sched: Set group_imb only a task can be pulled from the busiest cpu
  sched: Do not consider SCHED_IDLE tasks to be cache hot
  sched: Drop all load weight manipulation for RT tasks
  sched: Create special class for stop/migrate work
  sched: Unindent labels
  sched: Comment updates: fix default latency and granularity numbers
  tracing/sched: Add sched_pi_setprio tracepoint
  sched: Give CPU bound RT tasks preference
  sched: Try not to migrate higher priority RT tasks
  ...
2010-10-21 12:55:43 -07:00
Linus Torvalds 5d70f79b5e Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (163 commits)
  tracing: Fix compile issue for trace_sched_wakeup.c
  [S390] hardirq: remove pointless header file includes
  [IA64] Move local_softirq_pending() definition
  perf, powerpc: Fix power_pmu_event_init to not use event->ctx
  ftrace: Remove recursion between recordmcount and scripts/mod/empty
  jump_label: Add COND_STMT(), reducer wrappery
  perf: Optimize sw events
  perf: Use jump_labels to optimize the scheduler hooks
  jump_label: Add atomic_t interface
  jump_label: Use more consistent naming
  perf, hw_breakpoint: Fix crash in hw_breakpoint creation
  perf: Find task before event alloc
  perf: Fix task refcount bugs
  perf: Fix group moving
  irq_work: Add generic hardirq context callbacks
  perf_events: Fix transaction recovery in group_sched_in()
  perf_events: Fix bogus AMD64 generic TLB events
  perf_events: Fix bogus context time tracking
  tracing: Remove parent recording in latency tracer graph options
  tracing: Use one prologue for the preempt irqs off tracer function tracers
  ...
2010-10-21 12:54:49 -07:00
Linus Torvalds 1053e6bba0 Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/amd-iommu: Update copyright headers
  x86/amd-iommu: Reenable AMD IOMMU if it's mysteriously vanished over suspend
  AGP: Warn when GATT memory cannot be set to UC
  x86, GART: Disable GART table walk probes
  x86, GART: Remove superfluous AMD64_GARTEN
2010-10-21 12:49:15 -07:00
Daniel Drake 260586d2b4 Add OLPC XO-1 rfkill driver
Add a software rfkill switch for the WLAN interface in the OLPC XO-1
laptop. It uses the OLPC embedded controller to cut/restore power to
the Marvell WLAN chip on the motherboard.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2010-10-21 10:10:44 -04:00
Konrad Rzeszutek Wilk 5bba6c56dc X86/PCI: Remove the dependency on isapnp_disable.
This looks to be vestigial dependency that had never been used even
in the original code base (2.6.18) from which this driver
was up-ported. Without this fix, with the CONFIG_ISAPNP, we get this
compile failure:

arch/x86/pci/xen.c: In function 'pci_xen_init':
arch/x86/pci/xen.c:138: error: 'isapnp_disable' undeclared (first use in this function)
arch/x86/pci/xen.c:138: error: (Each undeclared identifier is reported only once
arch/x86/pci/xen.c:138: error: for each function it appears in.)

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-21 09:36:07 -04:00
Ian Campbell de1ef2065c xen/privcmd: move remap_domain_mfn_range() to core xen code and export.
This allows xenfs to be built as a module, previously it required flush_tlb_all
and arbitrary_virt_to_machine to be exported.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-20 16:22:34 -07:00
Jeremy Fitzhardinge 1246ae0bb9 xen: add variable hypercall caller
Allow non-constant hypercall to be called, for privcmd.

[ Impact: make arbitrary hypercalls; needed for privcmd ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-20 16:22:27 -07:00
Jeremy Fitzhardinge eba3ff8b99 xen: add xen_set_domain_pte()
Add xen_set_domain_pte() to allow setting a pte mapping a page from
another domain.  The common case is to map from DOMID_IO, the pseudo
domain which owns all IO pages, but will also be used in the privcmd
interface to map other domain pages.

[ Impact: new Xen-internal API for cross-domain mappings ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-20 16:22:27 -07:00
FUJITA Tomonori 66f2b06154 x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 || HIGHMEM64G
Set CONFIG_ARCH_DMA_ADDR_T_64BIT when we set dma_addr_t to 64 bits in
<asm/types.h>; this allows Kconfig decisions based on this property.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <201010202255.o9KMtZXu009370@imap1.linux-foundation.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 16:02:42 -07:00
Shaohua Li 9329672021 x86: Spread tlb flush vector between nodes
Currently flush tlb vector allocation is based on below equation:
	sender = smp_processor_id() % 8
This isn't optimal, CPUs from different node can have the same vector, this
causes a lot of lock contention. Instead, we can assign the same vectors to
CPUs from the same node, while different node has different vectors. This has
below advantages:
a. if there is lock contention, the lock contention is between CPUs from one
node. This should be much cheaper than the contention between nodes.
b. completely avoid lock contention between nodes. This especially benefits
kswapd, which is the biggest user of tlb flush, since kswapd sets its affinity
to specific node.

In my test, this could reduce > 20% CPU overhead in extreme case.The test
machine has 4 nodes and each node has 16 CPUs. I then bind each node's kswapd
to the first CPU of the node. I run a workload with 4 sequential mmap file
read thread. The files are empty sparse file. This workload will trigger a
lot of page reclaim and tlbflush. The kswapd bind is to easy trigger the
extreme tlb flush lock contention because otherwise kswapd keeps migrating
between CPUs of a node and I can't get stable result. Sure in real workload,
we can't always see so big tlb flush lock contention, but it's possible.

[ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ]

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 14:44:42 -07:00
Borislav Petkov b40827fa72 x86-32, mm: Add an initial page table for core bootstrapping
This patch adds an initial page table with low mappings used exclusively
for booting APs/resuming after ACPI suspend/machine restart. After this,
there's no need to add low mappings to swapper_pg_dir and zap them later
or create own swsusp PGD page solely for ACPI sleep needs - we have
initial_page_table for that.

Signed-off-by: Borislav Petkov <bp@alien8.de>
LKML-Reference: <20101020070526.GA9588@liondog.tnic>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 14:23:55 -07:00
H. Peter Anvin d25e6b0b32 Merge branch 'x86/cleanups' into x86/trampoline 2010-10-20 14:22:45 -07:00
H. Peter Anvin e44dea35cc Merge branch 'x86/vmware' into x86/trampoline 2010-10-20 13:18:17 -07:00
Borislav Petkov f01f7c56a1 x86, mm: Fix incorrect data type in vmalloc_sync_all()
arch/x86/mm/fault.c: In function 'vmalloc_sync_all':
arch/x86/mm/fault.c:238: warning: assignment makes integer from pointer without a cast

introduced by 617d34d9e5.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20101020103642.GA3135@kryptos.osrc.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-20 12:54:04 -07:00
Robert Richter 27afdf2008 apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets
We want the BIOS to setup the EILVT APIC registers. The offsets
were hardcoded and BIOS settings were overwritten by the OS.
Now, the subsystems for MCE threshold and IBS determine the LVT
offset from the registers the BIOS has setup. If the BIOS setup
is buggy on a family 10h system, a workaround enables IBS. If
the OS determines an invalid register setup, a "[Firmware Bug]:
" error message is reported.

We need this change also for upcomming cpu families.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1286360874-1471-3-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-20 04:42:13 +02:00
Robert Richter a68c439b19 apic, x86: Check if EILVT APIC registers are available (AMD only)
This patch implements checks for the availability of LVT entries
(APIC500-530) and reserves it if used. The check becomes
necessary since we want to let the BIOS provide the LVT offsets.
 The offsets should be determined by the subsystems using it
like those for MCE threshold or IBS.  On K8 only offset 0
(APIC500) and MCE interrupts are supported. Beginning with
family 10h at least 4 offsets are available.

Since offsets must be consistent for all cores, we keep track of
the LVT offsets in software and reserve the offset for the same
vector also to be used on other cores. An offset is freed by
setting the entry to APIC_EILVT_MASKED.

If the BIOS is right, there should be no conflicts. Otherwise a
"[Firmware Bug]: ..." error message is generated. However, if
software does not properly determines the offsets, it is not
necessarily a BIOS bug.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1286360874-1471-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-20 04:42:13 +02:00
Ingo Molnar 14d4962dc8 Merge branch 'linus' into irq/core
Merge reason: update to almost-final-.36

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-20 04:38:59 +02:00
Jan Beulich 3234282f33 x86, asm: Fix CFI macro invocations to deal with shortcomings in gas
gas prior to (perhaps) 2.16.90 has problems with passing non-
parenthesized expressions containing spaces to macros. Spaces, however,
get inserted by cpp between any macro expanding to a number and a
subsequent + or -. For the +, current x86 gas then removes the space
again (future gas may not do so), but for the - the space gets retained
and is then considered a separator between macro arguments.

Fix the respective definitions for both the - and + cases, so that they
neither contain spaces nor make cpp insert any (the latter by adding
seemingly redundant parentheses).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4CBDBEBA020000780001E05A@vpn.id2.novell.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-19 14:28:02 -07:00
Jeremy Fitzhardinge 617d34d9e5 x86, mm: Hold mm->page_table_lock while doing vmalloc_sync
Take mm->page_table_lock while syncing the vmalloc region.  This prevents
a race with the Xen pagetable pin/unpin code, which expects that the
page_table_lock is already held.  If this race occurs, then Xen can see
an inconsistent page type (a page can either be read/write or a pagetable
page, and pin/unpin converts it between them), which will cause either
the pin or the set_p[gm]d to fail; either will crash the kernel.

vmalloc_sync_all() should be called rarely, so this extra use of
page_table_lock should not interfere with its normal users.

The mm pointer is stashed in the pgd page's index field, as that won't
be otherwise used for pgds.

Reported-by: Ian Campbell <ian.cambell@eu.citrix.com>
Originally-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4CB88A4C.1080305@goop.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-19 13:57:08 -07:00
Jeremy Fitzhardinge 44235dcde4 x86, mm: Fix bogus whitespace in sync_global_pgds()
Whitespace cleanup only.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-19 13:56:03 -07:00
Avi Kivity 9581d442b9 KVM: Fix fs/gs reload oops with invalid ldt
kvm reloads the host's fs and gs blindly, however the underlying segment
descriptors may be invalid due to the user modifying the ldt after loading
them.

Fix by using the safe accessors (loadsegment() and load_gs_index()) instead
of home grown unsafe versions.

This is CVE-2010-3698.

KVM-Stable-Tag.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-19 14:21:45 -02:00
Yinghai Lu 9717967c4b x86: ioapic: Call free_irte only if interrupt remapping enabled
On a system that support intr-rempping when booting with "intremap=off"

[  177.895501] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
[  177.913316] IP: [<ffffffff8145fc18>] free_irte+0x47/0xc0
...
[  178.173326] Call Trace:
[  178.173574]  [<ffffffff810515b4>] destroy_irq+0x3a/0x75
[  178.192934]  [<ffffffff81051834>] arch_teardown_msi_irq+0xe/0x10
[  178.193418]  [<ffffffff81458dc3>] arch_teardown_msi_irqs+0x56/0x7f
[  178.213021]  [<ffffffff81458e79>] free_msi_irqs+0x8d/0xeb

Call free_irte only when interrupt remapping is enabled.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CBCB274.7010108@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-19 09:25:33 +02:00
Venkatesh Pallipadi e82b8e4ea4 x86: Add IRQ_TIME_ACCOUNTING
This patch adds IRQ_TIME_ACCOUNTING option on x86 and runtime enables it
when TSC is enabled.

This change just enables fine grained irq time accounting, isn't used yet.
Following patches use it for different purposes.

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286237003-12406-6-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18 20:52:25 +02:00
Peter Zijlstra e360adbe29 irq_work: Add generic hardirq context callbacks
Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.

Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.

The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.

Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[ various fixes ]
Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18 19:58:50 +02:00
Stephane Eranian ba0cef3d14 perf_events: Fix bogus AMD64 generic TLB events
PERF_COUNT_HW_CACHE_DTLB:READ:MISS had a bogus umask value of 0 which
counts nothing. Needed to be 0x7 (to count all possibilities).

PERF_COUNT_HW_CACHE_ITLB:READ:MISS had a bogus umask value of 0 which
counts nothing. Needed to be 0x3 (to count all possibilities).

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: <stable@kernel.org> # as far back as it applies
LKML-Reference: <4cb85478.41e9d80a.44e2.3f00@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18 19:58:48 +02:00
Konrad Rzeszutek Wilk 74226b8c8a xen/pci: Request ACS when Xen-SWIOTLB is activated.
It used to done in the Xen startup code but that is not really
appropiate.

[v2: Update Kconfig with PCI requirement]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-18 10:49:38 -04:00
Alex Nixon b5401a96b5 xen/x86/PCI: Add support for the Xen PCI subsystem
The frontend stub lives in arch/x86/pci/xen.c, alongside other
sub-arch PCI init code (e.g. olpc.c).

It provides a mechanism for Xen PCI frontend to setup/destroy
legacy interrupts, MSI/MSI-X, and PCI configuration operations.

[ Impact: add core of Xen PCI support ]
[ v2: Removed the IOMMU code and only focusing on PCI.]
[ v3: removed usage of pci_scan_all_fns as that does not exist]
[ v4: introduced pci_xen value to fix compile warnings]
[ v5: squished fixes+features in one patch, changed Reviewed-by to Ccs]
[ v7: added Acked-by]
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Qing He <qing.he@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
2010-10-18 10:49:35 -04:00
Stefano Stabellini 294ee6f89c x86: Introduce x86_msi_ops
Introduce an x86 specific indirect mechanism to setup MSIs.
The MSI setup functions become function pointers in an x86_msi_ops
struct, that defaults to the implementation in io_apic.c and msi.c.

[v2: Use HAVE_DEFAULT_* knobs]
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-18 10:49:34 -04:00
Jeremy Fitzhardinge 5ee01f49c9 x86/PCI: make sure _PAGE_IOMAP it set on pci mappings
When mapping pci space via /sys or /proc, make sure we're really
doing a hardware mapping by setting _PAGE_IOMAP.

[ Impact: bugfix; make PCI mappings map the right pages ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: x86@kernel.org
2010-10-18 10:49:31 -04:00
Alex Nixon 44de3395a4 x86/PCI: Clean up pci_cache_line_size
Separate out x86 cache_line_size initialisation code into its own
function (so it can be shared by Xen later in this patch series)

[ Impact: cleanup ]

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: x86@kernel.org
2010-10-18 10:49:30 -04:00
Jeremy Fitzhardinge 7b586d7185 x86/io_apic: add get_nr_irqs_gsi()
Impact: new interface to get max GSI

Add get_nr_irqs_gsi() to return nr_irqs_gsi.  Xen will use this to
determine how many irqs it needs to reserve for hardware irqs.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-18 10:40:30 -04:00
Jeremy Fitzhardinge d8e0420603 xen: define BIOVEC_PHYS_MERGEABLE()
Impact: allow Xen control of bio merging

When running in Xen domain with device access, we need to make sure
the block subsystem doesn't merge requests across pages which aren't
machine physically contiguous.  To do this, we define our own
BIOVEC_PHYS_MERGEABLE.  When CONFIG_XEN isn't enabled, or we're not
running in a Xen domain, this has identical behaviour to the normal
implementation.  When running under Xen, we also make sure the
underlying machine pages are the same or adjacent.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-18 10:40:28 -04:00
Alex Nixon 23ace955c2 xen: Don't disable the I/O space
If a guest domain wants to access PCI devices through the frontend
driver (coming later in the patch series), it will need access to the
I/O space.

[ Impact: Allow for domU IO access, preparing for pci passthrough ]

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2010-10-18 10:40:27 -04:00
Justin P. Mattock 50a23e6eec Update broken web addresses in arch directory.
The patch below updates broken web addresses in the arch directory.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-18 11:03:21 +02:00
Bjorn Helgaas 1ca98fa652 x86/PCI: MMCONFIG: fix region end calculation
The end of an MMCONFIG region depends on the ending bus number, not on the
number of buses the region covers.  We previously computed the wrong ending
address whenever the starting bus number was non-zero, e.g.,:

  MMCONFIG for [bus 00-1f] at [mem 0xe0000000-0xe1ffffff] (base 0xe0000000)
  MMCONFIG for [bus 20-3f] at [mem 0xe2000000-0xe1ffffff] (base 0xe0000000)

The correct regions are:

  MMCONFIG for [bus 00-1f] at [mem 0xe0000000-0xe1ffffff] (base 0xe0000000)
  MMCONFIG for [bus 20-3f] at [mem 0xe2000000-0xe3ffffff] (base 0xe0000000)

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-17 20:03:07 -07:00
Seth Heasley cb04e95bdd PCI: update Intel chipset names and defines
This patch updates the defines for Intel devices in
include/linux/pci_ids.h, referenced in arch/x86/pci/irq.c and
drivers/i2c/busses/i2c-i801.c, reflecting approved legal branding, and
using fuller code-names for products under development.

Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-17 20:03:04 -07:00
Seth Heasley 25143fd127 x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs
This patch adds the LPC Controller DeviceIDs for the Intel Patsburg PCH.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-15 13:09:52 -07:00
Daniel Drake 80e7b19ae1 PCI: OLPC: Only enable PCI configuration type override on XO-1
This configuration type override is for XO-1 only and must not happen
on XO-1.5.

Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-15 13:09:51 -07:00
Thomas Gleixner 40ffa93791 x86: Remove stale pmtimer_64.c
This file is unused since the apic unification in 2.6.29, but nobody
noticed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-15 21:18:59 +02:00
Thomas Gleixner 940b3c7b19 x86: sfi: Make local functions static
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Len Brown <lenb@kernel.org>
2010-10-15 19:37:39 +02:00
Arnd Bergmann 6038f373a3 llseek: automatically add .llseek fop
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time.  Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
//   but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
   *off = E
|
   *off += E
|
   func(..., off, ...)
|
   E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
  *off = E
|
  *off += E
|
  func(..., off, ...)
|
  E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
 ...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
 .llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
 .read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
 .write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
 .open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
...  .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
...  .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
...  .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+	.llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
 .write = write_f,
 .read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
2010-10-15 15:53:27 +02:00
Robert Richter b47fad3bfb oprofile, x86: Add support for IBS periodic op counter extension
The count value for IBS op sampling has been extended by 7 bits. The
feature is reflected in bit 6 (OpCntExt) of the IBS capability
register (CPUID Fn8000_001B_EAX).

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:43 +02:00
Robert Richter 25da695047 oprofile, x86: Add support for IBS branch target address reporting
This patch adds support for IBS branch target address reporting. A new
MSR (MSRC001_103B IBS Branch Target Address) has been added that
provides the logical address in canonical form for the branch
target. The size of the IBS sample that is transferred to the userland
has been increased.

For backward compatibility, the userland daemon must explicit enable
the feature by writing to the oprofilefs file

 ibs_op/branch_target

After enabling branch target address reporting, the userland daemon
must handle the extended size of the IBS sample.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:42 +02:00
Robert Richter 53b39e9480 oprofile, x86: Introduce struct ibs_state
This patch introduces struct ibs_state that will extended by additinal
members in follow-on patches.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:41 +02:00
Robert Richter fc889aa23f oprofile, x86: Remove duplicate check for IBS_CAPS_OPCNT
Since oprofile is setting up ibs_op/dispatched_ops in the fs only if
the feature is available, its corresponding variable
ibs_config.dispatched_ops is only set, if the feature is
available. Thus the check is duplicate and can be removed.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:41 +02:00
Robert Richter 4ac945f002 oprofile, x86: Check IBS capability bits 1 and 2
There are IBS CPUID feature flags in CPUID Fn8000_001B to detect if
the cpu supports IBS fetch sampling (FetchSam) and/or IBS execution
sampling (OpSam). This patch adds checks if the both features are
available.

Spec:

 http://support.amd.com/us/Processor_TechDocs/31116.pdf

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:40 +02:00
Robert Richter e63414740e oprofile, x86: Add support for AMD family 14h
This patch adds support for AMD family 14h (Ontario/Zacate) cpus.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:39 +02:00
Robert Richter 3acbf0849b oprofile, x86: Add support for AMD family 12h
This patch adds support for AMD family 12h (Llano) cpus.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-15 12:50:39 +02:00
Robert Richter 6268464b37 Merge remote branch 'tip/perf/core' into oprofile/core
Conflicts:
	arch/arm/oprofile/common.c
	kernel/perf_event.c
2010-10-15 12:45:00 +02:00
Randy Dunlap 9e9006e909 x86, olpc: XO-1 uses/depends on PCI
olpc-xo1 uses pci_*() interfaces so it should depend on PCI.

Otherwise we get build failure like:

 arch/x86/kernel/olpc-xo1.c:65: error: implicit declaration of function 'pci_enable_device_io'
 arch/x86/kernel/olpc-xo1.c:71: error: implicit declaration of function 'pci_request_region'
 arch/x86/kernel/olpc-xo1.c:80: error: implicit declaration of function 'pci_release_region'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Daniel Drake <dsd@laptop.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20101014101313.adf7eb2a.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-15 09:41:38 +02:00
Ingo Molnar 0fdf13606b Merge branch 'tip/perf/recordmcount-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-10-15 06:12:28 +02:00
Steven Rostedt cf4db2597a ftrace: Rename config option HAVE_C_MCOUNT_RECORD to HAVE_C_RECORDMCOUNT
The config option used by archs to let the build system know that
the C version of the recordmcount works for said arch is currently
called HAVE_C_MCOUNT_RECORD which enables BUILD_C_RECORDMCOUNT. To
be more consistent with the name that all archs may use, it has been
renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since
we are building a C recordmcount and not a mcount_record.

Suggested-by: Ingo Molnar <mingo@elte.hu>
Cc: <linux-arch@vger.kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-14 23:32:44 -04:00
Ingo Molnar d9d572a9c0 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core 2010-10-15 05:12:45 +02:00
Steven Rostedt 72441cb1fd ftrace/x86: Add support for C version of recordmcount
This patch adds the support for the C version of recordmcount and
compile times show ~ 12% improvement.

After verifying this works, other archs can add:

 HAVE_C_MCOUNT_RECORD

in its Kconfig and it will use the C version of recordmcount
instead of the perl version.

Cc: <linux-arch@vger.kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-14 16:52:41 -04:00
Frederic Weisbecker ebc8827f75 x86: Barf when vmalloc and kmemcheck faults happen in NMI
In x86, faults exit by executing the iret instruction, which then
reenables NMIs if we faulted in NMI context. Then if a fault
happens in NMI, another NMI can nest after the fault exits.

But we don't yet support nested NMIs because we have only one NMI
stack. To prevent from that, check that vmalloc and kmemcheck
faults don't happen in this context. Most of the other kernel faults
in NMIs can be more easily spotted by finding explicit
copy_from,to_user() calls on review.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
2010-10-14 20:43:36 +02:00
Linus Torvalds 0eead9ab41 Don't dump task struct in a.out core-dumps
akiphie points out that a.out core-dumps have that odd task struct
dumping that was never used and was never really a good idea (it goes
back into the mists of history, probably the original core-dumping
code).  Just remove it.

Also do the access_ok() check on dump_write().  It probably doesn't
matter (since normal filesystems all seem to do it anyway), but he
points out that it's normally done by the VFS layer, so ...

[ I suspect that we should possibly do "vfs_write()" instead of
  calling ->write directly.  That also does the whole fsnotify and write
  statistics thing, which may or may not be a good idea. ]

And just to be anal, do this all for the x86-64 32-bit a.out emulation
code too, even though it's not enabled (and won't currently even
compile)

Reported-by: akiphie <akiphie@lavabit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-14 10:57:40 -07:00
Jeremy Fitzhardinge 67e87f0a1c x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S
head_64.S maps up to 512 MiB, but that is not necessarity true for
other entry paths, such as Xen.

Thus, co-locate the setting of max_pfn_mapped with the code to
actually set up the page tables in head_64.S.  The 32-bit code is
already so co-located.  (The Xen code already sets max_pfn_mapped
correctly for its own use case.)

-v2:

 Yinghai fixed the following bug in this patch:

 |
 | max_pfn_mapped is in .bss section, so we need to set that
 | after bss get cleared. Without that we crash on bootup.
 |
 | That is safe because Xen does not call x86_64_start_kernel().
 |

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Fixed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <4CB6AB24.9020504@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-14 09:06:49 +02:00
Masami Hiramatsu 3cba11d32b kconfig/x86: Add HAVE_TEXT_POKE_SMP config for stop_machine dependency
Since the text_poke_smp() definately depends on actual
stop_machine() on smp, add that dependency to Kconfig.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LKML-Reference: <20101014031042.4100.90877.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-14 08:55:29 +02:00
Masami Hiramatsu 3caa37519c x86: Use __stop_machine() in text_poke_smp()
Use __stop_machine() in text_poke_smp() because the caller
must get online_cpus before calling text_poke_smp(), but
stop_machine() do it again. We don't need it.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LKML-Reference: <20101014031036.4100.83989.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-14 08:55:28 +02:00
Randy Dunlap 03f1a17cd5 x86/vsmp: Eliminate kconfig dependency warning
Fix kconfig dependency warning to satisfy dependencies:

warning: (X86_VSMP && X86_64 && PCI && X86_EXTENDED_PLATFORM ||
XEN && PARAVIRT_GUEST && (X86_64 || X86_32 && X86_PAE && !X86_VISWS) && X86_CMPXCHG && X86_TSC || KVM_CLOCK && PARAVIRT_GUEST || KVM_GUEST && PARAVIRT_GUEST || LGUEST_GUEST && PARAVIRT_GUEST && X86_32) selects PARAVIRT which has unmet direct dependencies (PARAVIRT_GUEST)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
LKML-Reference: <20101013210023.9a033222.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-14 07:04:30 +02:00
Linus Torvalds 509d4486bd Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, numa: For each node, register the memory blocks actually used
  x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order
  x86, mce, therm_throt.c: Fix missing curly braces in error handling logic
2010-10-13 16:34:23 -07:00
Jeremy Fitzhardinge fef5ba7979 xen: Cope with unmapped pages when initializing kernel pagetable
Xen requires that all pages containing pagetable entries to be mapped
read-only.  If pages used for the initial pagetable are already mapped
then we can change the mapping to RO.  However, if they are initially
unmapped, we need to make sure that when they are later mapped, they
are also mapped RO.

We do this by knowing that the kernel pagetable memory is pre-allocated
in the range e820_table_start - e820_table_end, so any pfn within this
range should be mapped read-only.  However, the pagetable setup code
early_ioremaps the pages to write their entries, so we must make sure
that mappings created in the early_ioremap fixmap area are mapped RW.
(Those mappings are removed before the pages are presented to Xen
as pagetable pages.)

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
LKML-Reference: <4CB63A80.8060702@goop.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-13 16:07:13 -07:00
H. Peter Anvin d7acb92fea x86-64, asm: If the assembler supports fxsave64, use it
Kbuild allows for us to probe for the existence of specific constructs
in the assembler, use them to find out if we can use fxsave64 and
permit the compiler to generate better code.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-13 16:00:29 -07:00
Daniel Drake 447b1d43de x86, olpc: Register XO-1 platform devices
The upcoming XO-1 rfkill driver (for drivers/platform/x86) will register
itself with the name "xo1-rfkill", and the already-merged XO-1 poweroff
code uses name "olpc-xo1"

Add the necessary mechanics so that these devices are properly
initialized on XO-1 laptops.

Signed-off-by: Daniel Drake <dsd@laptop.org>
LKML-Reference: <20101013181042.90C8F9D401B@zog.reactivated.net>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-13 11:51:25 -07:00
Ingo Molnar 3d8a1a6a8a Merge branch 'amd-iommu/2.6.37' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu 2010-10-13 15:44:24 +02:00
Joerg Roedel 5d0d71569e x86/amd-iommu: Update copyright headers
This patch updates the copyright headers in all source files
of the AMD IOMMU driver.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-10-13 11:13:21 +02:00
Matthew Garrett 5bcd757f93 x86/amd-iommu: Reenable AMD IOMMU if it's mysteriously vanished over suspend
AMD's reference BIOS code had a bug that could result in the
firmware failing to reenable the iommu on resume. It
transpires that this causes certain less than desirable
behaviour when it comes to PCI accesses, to whit them ending
up somewhere near Bristol when the more desirable outcome
was Edinburgh. Sadness ensues, perhaps along with filesystem
corruption.  Let's make sure that it gets turned back on,
and that we restore its configuration so decisions it makes
bear some resemblance to those made by reasonable people
rather than crack-addled lemurs who spent all your DMA on
Thunderbird.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-10-13 11:11:46 +02:00
Daniel Drake bf1ebf0079 x86, olpc: Add XO-1 poweroff support
Add a pm_power_off handler for the OLPC XO-1 laptop.

The driver can be built modular and follows the behaviour of the
APM driver, setting pm_power_off to NULL on unload. However, the
ability to unload the module will probably be removed (with a simple
__module_get(THIS_MODULE)) if/when XO-1 suspend/resume support is
added to this file at a later date.

Signed-off-by: Daniel Drake <dsd@laptop.org>
LKML-Reference: <20101010094032.9AE669D401B@zog.reactivated.net>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-12 17:31:15 -07:00
Thomas Gleixner 2ee3906598 x86: Switch sparse_irq allocations to GFP_KERNEL
No callers from atomic context (except boot) anymore.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:46 +02:00
Thomas Gleixner c2f31c37b7 x86: lguest: Use new irq allocator
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2010-10-12 16:53:45 +02:00
Thomas Gleixner ad9f43340f x86: Use sane enumeration
Instead of looping through all interrupts, use the bitmap lookup to
find the next.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:44 +02:00
Thomas Gleixner 48b2650196 x86: uv: Clean up the direct access to irq_desc
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:43 +02:00
Thomas Gleixner 1a8ce7ff68 x86: Make io_apic.c local functions static
No users outside of io_apic.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:43 +02:00
Thomas Gleixner 1a0730d664 x86: Speed up the irq_remapped check in hot pathes
irq_2_iommu is in struct irq_cfg, so we can do the irq_remapped check
based on irq_cfg instead of going through a lookup function. That's
especially interesting in the eoi_ioapic_irq() hotpath.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:42 +02:00
Thomas Gleixner 423f085952 x86: Embedd irq_2_iommu into irq_cfg
That interrupt remapping code is x86 specific and tied to the io_apic
code. No need for separate allocator functions in the interrupt
remapping code. This allows to simplify the code and irq_2_iommu is
small (13 bytes on 64bit) so it's not a real problem even if interrupt
remapping is runtime disabled. If it's compile time disabled the
impact is zero.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:41 +02:00
Thomas Gleixner bc5fdf9f3a x86: io_apic: Remove the now unused sparse_irq arch_* functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:40 +02:00
Thomas Gleixner fbc6bff04a x86: ioapic: Cleanup sparse irq code
Switch over to the new allocator and remove all the magic which was
caused by the unability to destroy irq descriptors. Get rid of the
create_irq_nr() loop for sparse and non sparse irq.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:40 +02:00
Yinghai Lu fe6dab4e79 x86: Don't setup ioapic irq for sci twice
The sparseirq rework triggered a warning in the iommu code, which was
caused by setting up ioapic for ACPI irq 9 twice. This function is
solely to handle interrupts which are on a secondary ioapic and
outside the legacy irq range.

Replace the sparse irq_to_desc check with a non ifdeffed version.

[ tglx: Moved it before the ioapic sparse conversion and simplified
  	the inverse logic ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB00122.3030301@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:40 +02:00
Thomas Gleixner f981a3dc19 x86: io_apic: Prepare alloc/free_irq_cfg()
Rename the grossly misnamed get_one_free_irq_cfg() to alloc_irq_cfg().
Add a (not yet used) irq number argument to free_irq_cfg()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:40 +02:00
Thomas Gleixner 08c33db6d0 x86: Implement new allocator functions
Implement new allocator functions which make use of the core changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:40 +02:00
Thomas Gleixner 6e2fff50a5 x86: ioapic: Cleanup get_one_free_irq_cfg()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:39 +02:00
Thomas Gleixner 7e495529b6 x86: ioapic: Cleanup some more
Cleanup after the irq_chip conversion a bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:39 +02:00
Thomas Gleixner be5b7bf738 x86: Convert ht set_affinity to new chip function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:39 +02:00
Thomas Gleixner 0e09ddf2d7 x86: Cleanup hpet affinity setting
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:39 +02:00
Thomas Gleixner fe52b2d259 x86: Convert dmar affinity setting to new chip function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
2010-10-12 16:53:39 +02:00
Thomas Gleixner b5d1c46579 x86: Convert remapped msi to new chip.irq_set_affinity function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:38 +02:00
Thomas Gleixner f19f5ecc92 x86: Convert remapped ioapic affinity setting to new irq chip function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
2010-10-12 16:53:38 +02:00
Thomas Gleixner 5346b2a78f x86: Convert msi affinity setting to new chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:38 +02:00
Thomas Gleixner f7e909eae4 x86: Prepare the affinity common functions for taking struct irq_data *
While at it rename it to sensible function names and fix the return
value from unsigned to int for __ioapic_set_affinity (set_desc_affinity).
Returning -1 in a function returning unsigned int is somewhat strange.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:38 +02:00
Thomas Gleixner 60c69948e5 x86: ioapic: Clean up the direct access to irq_desc
Most of it is useless pseudo optimization.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:38 +02:00
Thomas Gleixner e9f7ac664b ht: Convert to new irq_chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:37 +02:00
Thomas Gleixner 5c2837fbaa dmar: Convert to new irq chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <dwmw2@infradead.org>
2010-10-12 16:53:37 +02:00
Thomas Gleixner d0fbca8f93 x86: ioapic/hpet: Convert to new chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:37 +02:00
Thomas Gleixner 90297c5fe7 x86: ioapic: Convert mask to new irq_chip function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:37 +02:00
Thomas Gleixner 61a38ce3f5 x86: io_apic: Convert startup to new irq_chip function
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:37 +02:00
Thomas Gleixner dd5f15e5cf x86: Cleanup io_apic
Sanitize functions. Remove irq_desc pointer magic.
Preparatory patch for further cleanups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:36 +02:00
Thomas Gleixner d4eba29770 x86: Cleanup access to irq_data
Fixup the open coded access to 
      irq_desc->[handler_data|chip_data|msi-desc]

Use the macros and inline functions for it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:36 +02:00
Thomas Gleixner 4305df947c x86: i8259: Convert to new irq_chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:36 +02:00
Thomas Gleixner 020dd984d7 x86: Cleanup visws interrupt handling
Remove the open coded access to irq_desc and convert to the new irq
chip functions. Change the mask function of piix4_virtual_irq_type so
we can use the generic irq handling function for the virtual interrupt
instead of open coding it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:35 +02:00
Thomas Gleixner fe25c7fc2e x86: lguest: Convert to new irq chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2010-10-12 16:53:35 +02:00
Thomas Gleixner a5ef2e7040 x86: Sanitize apb timer interrupt handling
Disable the interrupt in CPU_DEAD where it belongs. Remove the
open coded irq_desc manipulation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
2010-10-12 16:53:35 +02:00
Thomas Gleixner a3c08e5d80 x86: Convert irq_chip access to new functions
Before moving the irq chips to the new functions, fixup direct callers.

The cpu offline irq fixup code needs to become generic and archs need
to honour the "force" flag as an indicator, but that's for later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:35 +02:00
Thomas Gleixner 011d578fda x86: Remove useless reinitialization of irq descriptors
The descriptors are already initialized in exactly this way.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:53:34 +02:00
Thomas Gleixner 39431acb1a pci: Cleanup the irq_desc mess in msi
Handing down irq_desc to msi just so that msi can access
irq_desc.irq_data.msi_desc is a pretty stupid idea. The calling code
can hand down a pointer to msi_desc so msi code does not need to know
about the irq descriptor at all.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-12 16:53:34 +02:00
Thomas Gleixner 1c9db52534 pci: Convert msi to new irq_chip functions
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
2010-10-12 16:53:34 +02:00
Thomas Gleixner 7c5f13519a Merge branch 'x86/urgent' of into irq/sparseirq
Reason: Pull in the latest io_apic bugfixes

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-12 16:41:26 +02:00
Thomas Gleixner 5e62feabcc Merge branch 'x86/cleanups' into irq/sparseirq
Reason: Avoid conflicts with removal of boot_cpu_id

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-12 16:40:42 +02:00
Thomas Gleixner 8ffcfa4e2d Merge branch 'x86/x2apic' into irq/sparseirq
Reason: Avoid conflicts with the x2apic modifications

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-12 16:39:53 +02:00
Thomas Gleixner b683de2b3c genirq: Query arch for number of early descriptors
sparse irq sets up NR_IRQS_LEGACY irq descriptors and archs then go
ahead and allocate more.

Use the unused return value of arch_probe_nr_irqs() to let the
architecture return the number of early allocations. Fix up all users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 16:39:08 +02:00
Michal Marek 239060b93b Merge branch 'kbuild/rc-fixes' into kbuild/kconfig
We need to revert the temporary hack in 71ebc01, hence the merge.
2010-10-12 15:09:06 +02:00
Zhang Rui dab5fff14d acpi-cpufreq: fix a memleak when unloading driver
We didn't free per_cpu(acfreq_data, cpu)->freq_table
when acpi_freq driver is unloaded.

Resulting in the following messages in /sys/kernel/debug/kmemleak:

unreferenced object 0xf6450e80 (size 64):
  comm "modprobe", pid 1066, jiffies 4294677317 (age 19290.453s)
  hex dump (first 32 bytes):
    00 00 00 00 e8 a2 24 00 01 00 00 00 00 9f 24 00  ......$.......$.
    02 00 00 00 00 6a 18 00 03 00 00 00 00 35 0c 00  .....j.......5..
  backtrace:
    [<c123ba97>] kmemleak_alloc+0x27/0x50
    [<c109f96f>] __kmalloc+0xcf/0x110
    [<f9da97ee>] acpi_cpufreq_cpu_init+0x1ee/0x4e4 [acpi_cpufreq]
    [<c11cd8d2>] cpufreq_add_dev+0x142/0x3a0
    [<c11920b7>] sysdev_driver_register+0x97/0x110
    [<c11cce56>] cpufreq_register_driver+0x86/0x140
    [<f9dad080>] 0xf9dad080
    [<c1001130>] do_one_initcall+0x30/0x160
    [<c10626e9>] sys_init_module+0x99/0x1e0
    [<c1002d97>] sysenter_do_call+0x12/0x26
    [<ffffffff>] 0xffffffff

https://bugzilla.kernel.org/show_bug.cgi?id=15807#c21

Tested-by: Toralf Forster <toralf.foerster@gmx.de>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-12 00:58:28 -04:00
H. Peter Anvin 8e4029ee35 Merge branch 'x86/urgent' into core/memblock
Reason for merge:

Forward-port urgent change to arch/x86/mm/srat_64.c to the memblock tree.

Resolved Conflicts:
	arch/x86/mm/srat_64.c

Originally-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-11 17:05:11 -07:00
Nikanth Karthikesan 50f2d7f682 x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA
commit d9c2d5ac6a "x86, numa: Use near(er)
online node instead of roundrobin for NUMA" changed NUMA initialization on
Intel to choose the nearest online node or first node.  Fake NUMA would be
better of with round-robin initialization, instead of the all CPUS on
first node.  Change the choice of first node, back to round-robin.

For testing NUMA kernel behaviour without cpusets and NUMA aware
applications, it would be better to have cpus in different nodes, rather
than all in a single node.  With cpusets migration of tasks scenarios
cannot not be tested.

I guess having it round-robin shouldn't affect the use cases for all cpus
on the first node.

The code comments in arch/x86/mm/numa_64.c:759 indicate that this used to
be the case, which was changed by commit d9c2d5ac6.  It changed from
roundrobin to nearer or first node.  And I couldn't find any reason for
this change in its changelog.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2010-10-11 16:16:56 -07:00
Jeremy Fitzhardinge 236260b90d memblock: Allow memblock_init to be called early
The Xen setup code needs to call memblock_x86_reserve_range() very early,
so allow it to initialize the memblock subsystem before doing so.  The
second memblock_init() is ignored.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
LKML-Reference: <4CACFDAD.3090900@goop.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-11 15:59:01 -07:00
Yinghai Lu 73cf624d02 x86, numa: For each node, register the memory blocks actually used
Russ reported SGI UV is broken recently. He said:

| The SRAT table shows that memory range is spread over two nodes.
|
| SRAT: Node 0 PXM 0 100000000-800000000
| SRAT: Node 1 PXM 1 800000000-1000000000
| SRAT: Node 0 PXM 0 1000000000-1080000000
|
|Previously, the kernel early_node_map[] would show three entries
|with the proper node.
|
|[    0.000000]     0: 0x00100000 -> 0x00800000
|[    0.000000]     1: 0x00800000 -> 0x01000000
|[    0.000000]     0: 0x01000000 -> 0x01080000
|
|The problem is recent community kernel early_node_map[] shows
|only two entries with the node 0 entry overlapping the node 1
|entry.
|
|    0: 0x00100000 -> 0x01080000
|    1: 0x00800000 -> 0x01000000

After looking at the changelog, Found out that it has been broken for a while by
following commit

|commit 8716273cae
|Author: David Rientjes <rientjes@google.com>
|Date:   Fri Sep 25 15:20:04 2009 -0700
|
|    x86: Export srat physical topology

Before that commit, register_active_regions() is called for every SRAT memory
entry right away.

Use nodememblk_range[] instead of nodes[] in order to make sure we
capture the actual memory blocks registered with each node.  nodes[]
contains an extended range which spans all memory regions associated
with a node, but that does not mean that all the memory in between are
included.

Reported-by: Russ Anderson <rja@sgi.com>
Tested-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB27BDF.5000800@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org> 2.6.33 .34 .35 .36
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-11 15:26:15 -07:00
Zachary Amsden 47008cd887 KVM: x86: Move TSC reset out of vmcb_init
The VMCB is reset whenever we receive a startup IPI, so Linux is setting
TSC back to zero happens very late in the boot process and destabilizing
the TSC.  Instead, just set TSC to zero once at VCPU creation time.

Why the separate patch?  So git-bisect is your friend.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-11 12:36:07 +02:00
Zachary Amsden 58877679fd KVM: x86: Fix SVM VMCB reset
On reset, VMCB TSC should be set to zero.  Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-11 12:36:07 +02:00
Borislav Petkov 6dcbfe4f0b x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order
This fixes possible cases of not collecting valid error info in
the MCE error thresholding groups on F10h hardware.

The current code contains a subtle problem of checking only the
Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM
thresholding group) in its first iteration and breaking out if
the bit is cleared.

But (!), this MSR contains an offset value, BlkPtr[31:24], which
points to the remaining MSRs in this thresholding group which
might contain valid information too. But if we bail out only
after we checked the valid bit in the first MSR and not the
block pointer too, we miss that other information.

The thing is, MC4_MISC0[BlkPtr] is not predicated on
MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked
prior to iterating over the MCI_MISCj thresholding group,
irrespective of the MC4_MISC0[Valid] setting.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-11 11:04:36 +02:00
Akinobu Mita 708ff2a009 bitops: make asm-generic/bitops/find.h more generic
asm-generic/bitops/find.h has the extern declarations of find_next_bit()
and find_next_zero_bit() and the macro definitions of find_first_bit()
and find_first_zero_bit(). It is only usable by the architectures which
enables CONFIG_GENERIC_FIND_NEXT_BIT and disables
CONFIG_GENERIC_FIND_FIRST_BIT.

x86 and tile enable both CONFIG_GENERIC_FIND_NEXT_BIT and
CONFIG_GENERIC_FIND_FIRST_BIT. These architectures cannot include
asm-generic/bitops/find.h in their asm/bitops.h. So ifdefed extern
declarations of find_first_bit and find_first_zero_bit() are put in
linux/bitops.h.

This makes asm-generic/bitops/find.h usable by these architectures
and use it. Also this change is needed for the forthcoming duplicated
extern declarations cleanup.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Chris Metcalf <cmetcalf@tilera.com>
2010-10-09 21:51:44 +02:00
Konrad Rzeszutek Wilk 6e96366933 x86, iommu: Update header comments with appropriate naming
The header comments diverged a bit from the implementation. Lets
re-sync them.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1286564028-2352-3-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-08 13:11:21 -07:00
Ingo Molnar 7cd2541cf2 Merge commit 'v2.6.36-rc7' into perf/core
Conflicts:
	arch/x86/kernel/module.c

Merge reason: Resolve the conflict, pick up fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:46:27 +02:00
Jin Dongming b62be8ea9d x86, mce, therm_throt.c: Fix missing curly braces in error handling logic
When the feature PTS is not supported by CPU, the sysfile
package_power_limit_count for package should not be
generated.

This patch is used for fixing missing { and }.

The patch is not complete as there are other error handling
problems in this function - but that can wait until the
merge window.

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@initel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: Brown Len <len.brown@intel.com>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org>
LKML-Reference: <4C7625D1.4060201@np.css.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:29:20 +02:00
Paul Fox 286e5b97eb x86, olpc: Don't retry EC commands forever
Avoids a potential infinite loop.

It was observed once, during an EC hacking/debugging
session - not in regular operation.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Cc: dilinger@queued.net
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:06:09 +02:00
Feng Tang 4d033556f1 x86, earlyprintk: Add hsu early console for Intel Medfield platform
Intel Medfield platform has a high speed UART device, which
could act as a early console. To enable early printk of HSU
console, simply add "earlyprintk=hsu" in kernel command line.

Currently we put the code in the early_printk_mrst.c as it is
also for Intel MID platforms like the mrst early console

Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: greg@kroah.com
LKML-Reference: <1284361736-23011-5-git-send-email-feng.tang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:01:47 +02:00
Feng Tang c20b5c3318 x86, earlyprintk: Add earlyprintk for Intel Moorestown platform
Intel Moorestown platform has a spi-uart device(Maxim3110),
which connects to a Designware spi core controller. This patch
will add early console function based on it.

As it will be used long before Linux spi subsystem get
initialised, we simply directly manipulate the spi controller's
register to acheive the early console func. This is safe as it
will be disabled when devices subsytem get initialised.

To use it, user need enable CONFIG_X86_MRST_EARLY_PRINTK in
kenrel config and add "earlyprintk=mrst" in kernel command line.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: greg@kroah.com
LKML-Reference: <1284361736-23011-4-git-send-email-feng.tang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:01:47 +02:00
Feng Tang 5a47c7dae8 x86: Add two helper macros for fixed address mapping
Sometimes fixmap will be used to map an physical address which
is not PAGE align, so to use it we need first map it and then
add the address offset to the mapped fixed address. These 2 new
helpers are suggested by Ingo Molnar to make the process
simpler.

For a physicall address like "phys", a directly usable virtual
address can be get by
	virt = (void *)set_fixmap_offset(fixed_idx, phys);
or
	virt = (void *)set_fixmap_offset_nocache(fixed_idx, phys);
(depends on whether the physical address is cachable or not).

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: alan@linux.intel.com
Cc: greg@kroah.com
Cc: x86@kernel.org
LKML-Reference: <1284361736-23011-3-git-send-email-feng.tang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 10:01:46 +02:00
Andi Kleen f672b49b07 x86: HWPOISON: Report correct address granuality for huge hwpoison faults
An earlier patch fixed the hwpoison fault handling to encode the
huge page size in the fault code of the page fault handler.

This is needed to report this information in SIGBUS to user space.

This is a straight forward patch to pass this information
through to the signal handling in the x86 specific fault.c

Cc: x86@kernel.org
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: fengguang.wu@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:46 +02:00
Ingo Molnar 153db80f8c Merge commit 'v2.6.36-rc7' into core/memblock
Merge reason: Update from -rc3 to -rc7.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-08 09:15:00 +02:00
Zhao Yakui 68f4d5a00a x86, setup: Use string copy operation to optimze copy in kernel compression
The kernel decompression code parses the ELF header and then copies
the segment to the corresponding destination.  Currently it uses slow
byte-copy code.  This patch makes it use the string copy operations
instead.

In the test the copy performance can be improved very significantly after using
the string copy operation mechanism.
        1. The copy time can be reduced from 150ms to 20ms on one Atom machine
	2. The copy time can be reduced about 80% on another machine
		The time is reduced from 7ms to 1.5ms when using 32-bit kernel.
		The time is reduced from 10ms to 2ms when using 64-bit kernel.

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
LKML-Reference: <1286502453-7043-1-git-send-email-yakui.zhao@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-07 21:23:09 -07:00
H. Peter Anvin 55572b293b x86, mrst: A function in a header file needs to be marked "inline"
A function in a header file needs to be explicitly marked "inline", or
gcc will complain if it is not used.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: <stable@kernel.org> v2.6.36
LKML-Reference: <1274295685-6774-3-git-send-email-jacob.jun.pan@linux.intel.com>
2010-10-07 16:45:18 -07:00
Namhyung Kim a416e9e1dd x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation
On 32-bit non-PAE system, cast to 'phys_addr_t' truncates value
before subtraction. Subtracting before cast produce same result
but remove following warnings from sparse:

 arch/x86/include/asm/pgtable_types.h:255:38: warning: cast truncates bits from constant value (100000000 becomes 0)
 arch/x86/include/asm/pgtable_types.h:270:38: warning: cast truncates bits from constant value (100000000 becomes 0)
 arch/x86/include/asm/pgtable.h:127:32: warning: cast truncates bits from constant value (100000000 becomes 0)
 arch/x86/include/asm/pgtable.h:132:32: warning: cast truncates bits from constant value (100000000 becomes 0)
 arch/x86/include/asm/pgtable.h:344:31: warning: cast truncates bits from constant value (100000000 becomes 0)

64-bit or PAE machines will not be affected by this change.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
LKML-Reference: <1285770588-14065-1-git-send-email-namhyung@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-07 16:36:17 -07:00
David Howells df9ee29270 Fix IRQ flag handling naming
Fix the IRQ flag handling naming.  In linux/irqflags.h under one configuration,
it maps:

	local_irq_enable() -> raw_local_irq_enable()
	local_irq_disable() -> raw_local_irq_disable()
	local_irq_save() -> raw_local_irq_save()
	...

and under the other configuration, it maps:

	raw_local_irq_enable() -> local_irq_enable()
	raw_local_irq_disable() -> local_irq_disable()
	raw_local_irq_save() -> local_irq_save()
	...

This is quite confusing.  There should be one set of names expected of the
arch, and this should be wrapped to give another set of names that are expected
by users of this facility.

Change this to have the arch provide:

	flags = arch_local_save_flags()
	flags = arch_local_irq_save()
	arch_local_irq_restore(flags)
	arch_local_irq_disable()
	arch_local_irq_enable()
	arch_irqs_disabled_flags(flags)
	arch_irqs_disabled()
	arch_safe_halt()

Then linux/irqflags.h wraps these to provide:

	raw_local_save_flags(flags)
	raw_local_irq_save(flags)
	raw_local_irq_restore(flags)
	raw_local_irq_disable()
	raw_local_irq_enable()
	raw_irqs_disabled_flags(flags)
	raw_irqs_disabled()
	raw_safe_halt()

with type checking on the flags 'arguments', and then wraps those to provide:

	local_save_flags(flags)
	local_irq_save(flags)
	local_irq_restore(flags)
	local_irq_disable()
	local_irq_enable()
	irqs_disabled_flags(flags)
	irqs_disabled()
	safe_halt()

with tracing included if enabled.

The arch functions can now all be inline functions rather than some of them
having to be macros.

Signed-off-by: David Howells <dhowells@redhat.com> [X86, FRV, MN10300]
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> [Tile]
Signed-off-by: Michal Simek <monstr@monstr.eu> [Microblaze]
Tested-by: Catalin Marinas <catalin.marinas@arm.com> [ARM]
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> [AVR]
Acked-by: Tony Luck <tony.luck@intel.com> [IA-64]
Acked-by: Hirokazu Takata <takata@linux-m32r.org> [M32R]
Acked-by: Greg Ungerer <gerg@uclinux.org> [M68K/M68KNOMMU]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [MIPS]
Acked-by: Kyle McMartin <kyle@mcmartin.ca> [PA-RISC]
Acked-by: Paul Mackerras <paulus@samba.org> [PowerPC]
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [S390]
Acked-by: Chen Liqin <liqin.chen@sunplusct.com> [Score]
Acked-by: Matt Fleming <matt@console-pimps.org> [SH]
Acked-by: David S. Miller <davem@davemloft.net> [Sparc]
Acked-by: Chris Zankel <chris@zankel.net> [Xtensa]
Reviewed-by: Richard Henderson <rth@twiddle.net> [Alpha]
Reviewed-by: Yoshinori Sato <ysato@users.sourceforge.jp> [H8300]
Cc: starvik@axis.com [CRIS]
Cc: jesper.nilsson@axis.com [CRIS]
Cc: linux-cris-kernel@axis.com
2010-10-07 14:08:55 +01:00
Linus Torvalds 34984f54b7 Merge branch 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
  xen: do not initialize PV timers on HVM if !xen_have_vector_callback
  xen: do not set xenstored_ready before xenbus_probe on hvm
2010-10-06 09:51:28 -07:00
Jeremy Fitzhardinge 161b0275e2 x86, mm: Add RESERVE_BRK_ARRAY() helper
This is useful when converting static arrays into boot-time brk
allocated objects.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
LKML-Reference: <4C805EEA.1080205@goop.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-05 22:16:54 -07:00
Yinghai Lu 16c36f743b x86, memblock: Remove __memblock_x86_find_in_range_size()
Fold it into memblock_x86_find_in_range(), and change bad_addr_size()
to check_reserve_memblock().

So whole memblock_x86_find_in_range_size() code is more readable.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CAA4DEC.4000401@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-05 21:45:43 -07:00
Yinghai Lu 1d931264af x86-32, memblock: Make add_highpages honor early reserved ranges
Originally the only early reserved range that is overlapped with high
pages is "KVA RAM", but we already do remove that from the active ranges.

However, It turns out Xen could have that kind of overlapping to support memory
ballooning.x

So we need to make add_highpage_with_active_regions() to subtract
memblock reserved just like low ram; this is the proper design anyway.

In this patch, refactering get_freel_all_memory_range() to make it can
be used by add_highpage_with_active_regions().  Also we don't need to
remove "KVA RAM" from active ranges.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CABB183.1040607@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-05 21:44:35 -07:00
Yinghai Lu 9f4c13964b x86, memblock: Fix crashkernel allocation
Cai Qian found crashkernel is broken with the x86 memblock changes.

1. crashkernel=128M@32M always reported that range is used, even if
   the first kernel is small and does not usethat range

2. we always got following report when using "kexec -p"
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that generic memblock_find_in_range() will try to
allocate from the top of the range, whereas the kexec code was written
assuming that allocation was always near the bottom and that it could
blindly extend memory upward.  Unfortunately the kexec code doesn't
have a system for requesting the range that it really needs, so this
is subject to probabilistic failures.

This patch hacks around the problem by limiting the target range
heuristically to below the traditional bzImage max range.  This number
is arbitrary and not always correct, and a much better result would be
obtained by having kexec communicate this number based on the kernel
header information and any appropriate command line options.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CABAF2A.5090501@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-10-05 21:43:14 -07:00
Linus Torvalds 39c12be86a Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf trace scripting: Fix extern struct definitions
  perf ui hist browser: Fix segfault on 'a' for annotate
  perf tools: Fix build breakage
  perf, x86: Handle in flight NMIs on P4 platform
  oprofile, ARM: Release resources on failure
  oprofile: Add Support for Intel CPU Family 6 / Model 29
2010-10-05 11:57:37 -07:00
Linus Torvalds 5336377d62 modules: Fix module_bug_list list corruption race
With all the recent module loading cleanups, we've minimized the code
that sits under module_mutex, fixing various deadlocks and making it
possible to do most of the module loading in parallel.

However, that whole conversion totally missed the rather obscure code
that adds a new module to the list for BUG() handling.  That code was
doubly obscure because (a) the code itself lives in lib/bugs.c (for
dubious reasons) and (b) it gets called from the architecture-specific
"module_finalize()" rather than from generic code.

Calling it from arch-specific code makes no sense what-so-ever to begin
with, and is now actively wrong since that code isn't protected by the
module loading lock any more.

So this commit moves the "module_bug_{finalize,cleanup}()" calls away
from the arch-specific code, and into the generic code - and in the
process protects it with the module_mutex so that the list operations
are now safe.

Future fixups:
 - move the module list handling code into kernel/module.c where it
   belongs.
 - get rid of 'module_bug_list' and just use the regular list of modules
   (called 'modules' - imagine that) that we already create and maintain
   for other reasons.

Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-05 11:29:27 -07:00
Stefano Stabellini 31e7e931cd xen: do not initialize PV timers on HVM if !xen_have_vector_callback
if !xen_have_vector_callback do not initialize PV timer unconditionally
because we still don't know how many cpus are available and if there is
more than one we won't be able to receive the timer interrupts on
cpu > 0.

This patch fixes an hang at boot when Xen does not support vector
callbacks and the guest has multiple vcpus.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
2010-10-05 13:39:23 +01:00
Andi Kleen c62f981f93 perf, gcc-4.6: Fix set but unused variable
Just dead code I believe.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: andi@firstfloor.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-05 09:48:07 +02:00
Ingo Molnar 00e8976200 Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/util/ui/browsers/hists.c

Merge reason: fix the conflict and merge in changes for dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-05 09:47:14 +02:00
Borislav Petkov 366d4a43b1 x86, cpu: Fix X86_FEATURE_NOPL
ba0593bf55 cleared the aforementioned
cpuid bit only on 32-bit due to various problems with Virtual PC. This
somehow got lost during the 32- + 64-bit merge so restore the feature
bit on 64-bit. For that, set it explicitly for non-constant arguments of
cpu_has(). Update comment for future reference.

Signed-off-by: Borislav Petkov <bp@alien8.de>
LKML-Reference: <20101004073127.GA20305@liondog.tnic>
Cc: Ryan O'Neill <ryan@innosecc.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-04 11:22:24 -07:00
Linus Torvalds 5a4bbd01c8 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix memory leaks in pcc_cpufreq_do_osc
  [CPUFREQ] acpi-cpufreq: add missing __percpu markup
2010-10-04 11:14:21 -07:00
Thomas Gleixner 3bb9808e99 x86: Use genirq Kconfig
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100927121843.314600915@linutronix.de>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
2010-10-04 11:01:15 +02:00
Andreas Herrmann 5c80cc78de x86, amd_nb: Enable GART support for AMD family 0x15 CPUs
AMD CPU family 0x15 still supports GART for compatibility reasons.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930124316.GG20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:32 -07:00
Andreas Herrmann d4fbe4f035 x86, amd: Use compute unit information to determine thread siblings
This information is vital for different load balancing policies.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930124156.GF20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:32 -07:00
Andreas Herrmann 6057b4d331 x86, amd: Extract compute unit information for AMD CPUs
Get compute unit information from CPUID Fn8000_001E_EBX.
(See AMD CPUID Specification - publication # 25481, revision 2.34,
September 2010.)

Note that each core on a compute unit still has a core_id of its own.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930123857.GE20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:32 -07:00
Andreas Herrmann 23588c38a8 x86, amd: Add support for CPUID topology extension of AMD CPUs
Node information (ID, number of internal nodes) is provided via
CPUID Fn8000_001e_ECX.

See AMD CPUID Specification (Publication # 25481, Revision 2.34,
September 2010).

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930123628.GD20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:32 -07:00
Andreas Herrmann 420b13b60a x86, nmi: Support NMI watchdog on newer AMD CPU families
CPU families 0x12, 0x14 and 0x15 support this functionality.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930123357.GC20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:32 -07:00
Andreas Herrmann 3fdbf004c1 x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs
Instead of adapting the CPU family check in amd_special_default_mtrr()
for each new CPU family assume that all new AMD CPUs support the
necessary bits in SYS_CFG MSR.

Tom2Enabled is architectural (defined in APM Vol.2).
Tom2ForceMemTypeWB is defined in all BKDGs starting with K8 NPT.
In pre K8-NPT BKDG this bit is reserved (read as zero).

W/o this adaption Linux would unnecessarily complain about bad MTRR
settings on every new AMD CPU family, e.g.

[    0.000000] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 4863MB of RAM.

Cc: stable@kernel.org # .32.x, .35.x
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100930123235.GB20545@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-01 16:18:31 -07:00
H. Peter Anvin 86ffb08519 Merge remote branch 'origin/x86/cpu' into x86/amd-nb 2010-10-01 16:18:11 -07:00
Linus Torvalds f4a3330d76 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hpet: Fix bogus error check in hpet_assign_irq()
  x86, irq: Plug memory leak in sparse irq
  x86, cpu: After uncapping CPUID, re-run CPU feature detection
2010-10-01 15:02:41 -07:00
Robert Richter 5140434d5f oprofile, x86: Simplify init/exit functions
Now, that we only call the exit function if init succeeds with commit:

 979048e oprofile: don't call arch exit code from init code on failure

we can simplify the x86 init/exit functions too. Variable using_nmi
becomes obsolete.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-01 17:05:47 +02:00
Jiri Olsa f6dedecc37 oprofile, x86: Adding backtrace dump for 32bit process in compat mode
This patch implements the oprofile backtrace  generation for 32 bit
applications running in the 64bit environment (compat mode).

With this change it's possible to get backtrace for 32bits applications
under the 64bits environment using oprofile's callgraph options.

opcontrol --setup -c ...
opreport -l -cg ...

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-01 16:07:18 +02:00
Jiri Olsa 40c6b3cb64 oprofile, x86: Using struct stack_frame for 64bit processes dump
Removing unnecessary struct frame_head and replacing it with
struct stack_frame.

The struct stack_frame is already defined and used in other places
in kernel, so there's no reason to define new structure.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-01 16:07:09 +02:00
Thomas Gleixner 0219896228 x86, hpet: Fix bogus error check in hpet_assign_irq()
create_irq() returns -1 if the interrupt allocation failed, but the
code checks for irq == 0.

Use create_irq_nr() instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <alpine.LFD.2.00.1009282310360.2416@localhost6.localdomain6>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-30 15:57:35 -07:00
Thomas Gleixner 1cf180c94e x86, irq: Plug memory leak in sparse irq
free_irq_cfg() is not freeing the cpumask_vars in irq_cfg. Fixing this
triggers a use after free caused by the fact that copying struct
irq_cfg is done with memcpy, which copies the pointer not the cpumask.

Fix both places.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <alpine.LFD.2.00.1009282052570.2416@localhost6.localdomain6>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-30 15:57:35 -07:00
Pekka Enberg 3682930623 [CPUFREQ] Fix memory leaks in pcc_cpufreq_do_osc
If acpi_evaluate_object() function call doesn't fail, we must kfree()
output.buffer before returning from pcc_cpufreq_do_osc().

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-09-30 16:14:23 -04:00
Namhyung Kim 86cf147494 [CPUFREQ] acpi-cpufreq: add missing __percpu markup
acpi_perf_data is a percpu pointer but was missing __percpu markup.
Add it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-09-30 16:14:22 -04:00
Cyrill Gorcunov 03e22198d2 perf, x86: Handle in flight NMIs on P4 platform
Stephane reported we've forgot to guard the P4 platform
against spurious in-flight performance IRQs. Fix it.

This fixes potential spurious 'dazed and confused' NMI
messages.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: fweisbec@gmail.com
Cc: peterz@infradead.org
Cc: Robert Richter <robert.richter@amd.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <1285815698-4298-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-30 09:17:59 +02:00
Dan Carpenter b365a85c68 x86, UV: Use allocated buffer in tlb_uv.c:tunables_read()
The original code didn't check that the value returned from
snprintf() was less than the size of the buffer.  Although it
didn't cause a runtime bug in this case, it makes the static
checkers complain.

Andrew Morton suggested a dynamically sized buffer would be
cleaner.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: Cliff Wickman <cpw@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
LKML-Reference: <20100929083118.GA6376@bicker>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-30 09:11:27 +02:00
Namhyung Kim bd126b23a2 ACPI: add missing __percpu markup in arch/x86/kernel/acpi/cstate.c
cpu_cstate_entry is a percpu pointer
but was missing __percpu markup.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-09-28 21:38:20 -04:00
H. Peter Anvin d900329e20 x86, cpu: After uncapping CPUID, re-run CPU feature detection
After uncapping the CPUID level, we need to also re-run the CPU
feature detection code.

This resolves kernel bugzilla 16322.

Reported-by: boris64 <bugzilla.kernel.org@boris64.net>
Cc: <stable@kernel.org> v2.6.29..2.6.35
LKML-Reference: <tip-@git.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-28 16:33:14 -07:00
Linus Torvalds 050026feae Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Avoid 'constant_test_bit()' misoptimization due to cast to non-volatile
2010-09-27 21:19:27 -07:00