Commit Graph

49 Commits

Author SHA1 Message Date
whitequark 42754ce710 Add profiling support, through the rustc -Z profile flag.
When -Z profile is passed, the GCDAProfiling LLVM pass is added
to the pipeline, which uses debug information to instrument the IR.
After compiling with -Z profile, the $(OUT_DIR)/$(CRATE_NAME).gcno
file is created, containing initial profiling information.
After running the program built, the $(OUT_DIR)/$(CRATE_NAME).gcda
file is created, containing branch counters.

The created *.gcno and *.gcda files can be processed using
the "llvm-cov gcov" and "lcov" tools. The profiling data LLVM
generates does not faithfully follow the GCC's format for *.gcno
and *.gcda files, and so it will probably not work with other tools
(such as gcov itself) that consume these files.
2017-05-01 09:16:20 +00:00
A.J. Gardner 9240054b3e Expose LLVM appendModuleInlineAsm 2017-04-12 19:12:49 -05:00
Vadim Chugunov 0f87203e2e Make sure that -lole32 ends up *after* LLVM libs on the linker command line. 2017-03-30 16:31:46 -07:00
Philipp Oppermann b44805875e Add support for x86-interrupt calling convention
Tracking issue: https://github.com/rust-lang/rust/issues/40180

This calling convention can be used for definining interrupt handlers on
32-bit and 64-bit x86 targets. The compiler then uses `iret` instead of
`ret` for returning and ensures that all registers are restored to their
original values.

Usage:

```
extern "x86-interrupt" fn handler(stack_frame: &ExceptionStackFrame) {…}
```

for interrupts and exceptions without error code and

```
extern "x86-interrupt" fn page_fault_handler(stack_frame: &ExceptionStackFrame,
                                             error_code: u64) {…}
```

for exceptions that push an error code (e.g., page faults or general
protection faults). The programmer must ensure that the correct version
is used for each interrupt.

For more details see the [LLVM PR][1] and the corresponding [proposal][2].

[1]: https://reviews.llvm.org/D15567
[2]: http://lists.llvm.org/pipermail/cfe-dev/2015-September/045171.html
2017-03-02 19:01:15 +01:00
bors 05a7f25cc4 Auto merge of #39456 - nagisa:mir-switchint-everywhere, r=nikomatsakis
[MIR] SwitchInt Everywhere

Something I've been meaning to do for a very long while. This PR essentially gets rid of 3 kinds of conditional branching and only keeps the most general one - `SwitchInt`. Primary benefits are such that dealing with MIR now does not involve dealing with 3 different ways to do conditional control flow. On the other hand, constructing a `SwitchInt` currently requires more code than what previously was necessary to build an equivalent `If` terminator. Something trivially "fixable" with some constructor methods somewhere (MIR needs stuff like that badly in general).

Some timings (tl;dr: slightly faster^1 (unexpected), but also uses slightly more memory at peak (expected)):

^1: Not sure if the speed benefits are because of LLVM liking the generated code better or the compiler itself getting compiled better. Either way, its a net benefit. The CORE and SYNTAX timings done for compilation without optimisation.

```
AFTER:
Building stage1 std artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
    Finished release [optimized] target(s) in 31.50 secs
    Finished release [optimized] target(s) in 31.42 secs
Building stage1 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
    Finished release [optimized] target(s) in 439.56 secs
    Finished release [optimized] target(s) in 435.15 secs

CORE: 99% (24.81 real, 0.13 kernel, 24.57 user); 358536k resident
CORE: 99% (24.56 real, 0.15 kernel, 24.36 user); 359168k resident
SYNTAX: 99% (49.98 real, 0.48 kernel, 49.42 user); 653416k resident
SYNTAX: 99% (50.07 real, 0.58 kernel, 49.43 user); 653604k resident

BEFORE:
Building stage1 std artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
    Finished release [optimized] target(s) in 31.84 secs
Building stage1 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
    Finished release [optimized] target(s) in 451.17 secs

CORE: 99% (24.66 real, 0.20 kernel, 24.38 user); 351096k resident
CORE: 99% (24.36 real, 0.17 kernel, 24.18 user); 352284k resident
SYNTAX: 99% (52.24 real, 0.56 kernel, 51.66 user); 645544k resident
SYNTAX: 99% (51.55 real, 0.48 kernel, 50.99 user); 646428k resident
```

cc @nikomatsakis @eddyb
2017-02-13 02:32:09 +00:00
Matt Ickstadt 68fff62542 [LLVM 4.0] Fix CreateCompileUnit 2017-02-11 15:15:28 -06:00
Simonas Kazlauskas f3bd723101 Fix intcast, use it where appropriate 2017-02-10 19:47:09 +02:00
bors 4053276354 Auto merge of #38109 - tromey:main-subprogram, r=michaelwoerister
Emit DW_AT_main_subprogram

This changes rustc to emit DW_AT_main_subprogram on the "main" program.
This lets gdb suitably stop at the user's main in response to
"start" (rather than the library's main, which is what happens
currently).

Fixes #32620
r? michaelwoerister
2017-02-09 17:09:50 +00:00
Corey Farwell 3053494a9a Rollup merge of #38699 - japaric:lsan, r=alexcrichton
LeakSanitizer, ThreadSanitizer, AddressSanitizer and MemorySanitizer support

```
$ cargo new --bin leak && cd $_

$ edit Cargo.toml && tail -n3 $_
```

``` toml
[profile.dev]
opt-level = 1
```

```
$ edit src/main.rs && cat $_
```

``` rust
use std::mem;

fn main() {
    let xs = vec![0, 1, 2, 3];
    mem::forget(xs);
}
```

```
$ RUSTFLAGS="-Z sanitizer=leak" cargo run --target x86_64-unknown-linux-gnu; echo $?
    Finished dev [optimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/leak`

=================================================================
==10848==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x557c3488db1f in __interceptor_malloc /shared/rust/checkouts/lsan/src/compiler-rt/lib/lsan/lsan_interceptors.cc:55
    #1 0x557c34888aaa in alloc::heap::exchange_malloc::h68f3f8b376a0da42 /shared/rust/checkouts/lsan/src/liballoc/heap.rs:138
    #2 0x557c34888afc in leak::main::hc56ab767de6d653a $PWD/src/main.rs:4
    #3 0x557c348c0806 in __rust_maybe_catch_panic ($PWD/target/debug/leak+0x3d806)

SUMMARY: LeakSanitizer: 16 byte(s) leaked in 1 allocation(s).
23
```

```
$ cargo new --bin racy && cd $_

$ edit src/main.rs && cat $_
```

``` rust
use std::thread;

static mut ANSWER: i32 = 0;

fn main() {
    let t1 = thread::spawn(|| unsafe { ANSWER = 42 });
    unsafe {
        ANSWER = 24;
    }
    t1.join().ok();
}
```

```
$ RUSTFLAGS="-Z sanitizer=thread" cargo run --target x86_64-unknown-linux-gnu; echo $?
==================
WARNING: ThreadSanitizer: data race (pid=12019)
  Write of size 4 at 0x562105989bb4 by thread T1:
    #0 racy::main::_$u7b$$u7b$closure$u7d$$u7d$::hbe13ea9e8ac73f7e $PWD/src/main.rs:6 (racy+0x000000010e3f)
    #1 _$LT$std..panic..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h2e466a92accacc78 /shared/rust/checkouts/lsan/src/libstd/panic.rs:296 (racy+0x000000010cc5)
    #2 std::panicking::try::do_call::h7f4d2b38069e4042 /shared/rust/checkouts/lsan/src/libstd/panicking.rs:460 (racy+0x00000000c8f2)
    #3 __rust_maybe_catch_panic <null> (racy+0x0000000b4e56)
    #4 std::panic::catch_unwind::h31ca45621ad66d5a /shared/rust/checkouts/lsan/src/libstd/panic.rs:361 (racy+0x00000000b517)
    #5 std:🧵:Builder::spawn::_$u7b$$u7b$closure$u7d$$u7d$::hccfc37175dea0b01 /shared/rust/checkouts/lsan/src/libstd/thread/mod.rs:357 (racy+0x00000000c226)
    #6 _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::hd880bbf91561e033 /shared/rust/checkouts/lsan/src/liballoc/boxed.rs:605 (racy+0x00000000f27e)
    #7 std::sys:👿🧵:Thread:🆕:thread_start::hebdfc4b3d17afc85 <null> (racy+0x0000000abd40)

  Previous write of size 4 at 0x562105989bb4 by main thread:
    #0 racy::main::h23e6e5ca46d085c3 $PWD/src/main.rs:8 (racy+0x000000010d7c)
    #1 __rust_maybe_catch_panic <null> (racy+0x0000000b4e56)
    #2 __libc_start_main <null> (libc.so.6+0x000000020290)

  Location is global 'racy::ANSWER::h543d2b139f819b19' of size 4 at 0x562105989bb4 (racy+0x0000002f8bb4)

  Thread T1 (tid=12028, running) created by main thread at:
    #0 pthread_create /shared/rust/checkouts/lsan/src/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:902 (racy+0x00000001aedb)
    #1 std::sys:👿🧵:Thread:🆕:hce44187bf4a36222 <null> (racy+0x0000000ab9ae)
    #2 std:🧵:spawn::he382608373eb667e /shared/rust/checkouts/lsan/src/libstd/thread/mod.rs:412 (racy+0x00000000b5aa)
    #3 racy::main::h23e6e5ca46d085c3 $PWD/src/main.rs:6 (racy+0x000000010d5c)
    #4 __rust_maybe_catch_panic <null> (racy+0x0000000b4e56)
    #5 __libc_start_main <null> (libc.so.6+0x000000020290)

SUMMARY: ThreadSanitizer: data race $PWD/src/main.rs:6 in racy::main::_$u7b$$u7b$closure$u7d$$u7d$::hbe13ea9e8ac73f7e
==================
ThreadSanitizer: reported 1 warnings
66
```

```
$ cargo new --bin oob && cd $_

$ edit src/main.rs && cat $_
```

``` rust
fn main() {
    let xs = [0, 1, 2, 3];
    let y = unsafe { *xs.as_ptr().offset(4) };
}
```

```
$ RUSTFLAGS="-Z sanitizer=address" cargo run --target x86_64-unknown-linux-gnu; echo $?
=================================================================
==13328==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff29f3ecd0 at pc 0x55802dc6bf7e bp 0x7fff29f3ec90 sp 0x7fff29f3ec88
READ of size 4 at 0x7fff29f3ecd0 thread T0
    #0 0x55802dc6bf7d in oob::main::h0adc7b67e5feb2e7 $PWD/src/main.rs:3
    #1 0x55802dd60426 in __rust_maybe_catch_panic ($PWD/target/debug/oob+0xfe426)
    #2 0x55802dd58dd9 in std::rt::lang_start::hb2951fc8a59d62a7 ($PWD/target/debug/oob+0xf6dd9)
    #3 0x55802dc6c002 in main ($PWD/target/debug/oob+0xa002)
    #4 0x7fad8c3b3290 in __libc_start_main (/usr/lib/libc.so.6+0x20290)
    #5 0x55802dc6b719 in _start ($PWD/target/debug/oob+0x9719)

Address 0x7fff29f3ecd0 is located in stack of thread T0 at offset 48 in frame
    #0 0x55802dc6bd5f in oob::main::h0adc7b67e5feb2e7 $PWD/src/main.rs:1

  This frame has 1 object(s):
    [32, 48) 'xs' <== Memory access at offset 48 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow $PWD/src/main.rs:3 in oob::main::h0adc7b67e5feb2e7
Shadow bytes around the buggy address:
  0x1000653dfd40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfd50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfd60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfd70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x1000653dfd90: 00 00 00 00 f1 f1 f1 f1 00 00[f3]f3 00 00 00 00
  0x1000653dfda0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfdb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfdc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfdd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1000653dfde0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==13328==ABORTING
1
```

```
$ cargo new --bin uninit && cd $_

$ edit src/main.rs && cat $_
```

``` rust
use std::mem;

fn main() {
    let xs: [u8; 4] = unsafe { mem::uninitialized() };
    let y = xs[0] + xs[1];
}
```

```
$ RUSTFLAGS="-Z sanitizer=memory" cargo run; echo $?
==30198==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x563f4b6867da in uninit::main::hc2731cd4f2ed48f8 $PWD/src/main.rs:5
    #1 0x563f4b7033b6 in __rust_maybe_catch_panic ($PWD/target/debug/uninit+0x873b6)
    #2 0x563f4b6fbd69 in std::rt::lang_start::hb2951fc8a59d62a7 ($PWD/target/debug/uninit+0x7fd69)
    #3 0x563f4b6868a9 in main ($PWD/target/debug/uninit+0xa8a9)
    #4 0x7fe844354290 in __libc_start_main (/usr/lib/libc.so.6+0x20290)
    #5 0x563f4b6864f9 in _start ($PWD/target/debug/uninit+0xa4f9)

SUMMARY: MemorySanitizer: use-of-uninitialized-value $PWD/src/main.rs:5 in uninit::main::hc2731cd4f2ed48f8
Exiting
77
```
2017-02-08 23:55:43 -05:00
Jorge Aparicio 775a93646c build/test the sanitizers only when --enable-sanitizers is used 2017-02-08 18:51:43 -05:00
Jorge Aparicio 9af6aa3889 sanitizer support 2017-02-08 18:51:43 -05:00
Corey Farwell 7709c4d2b9 Rollup merge of #39529 - dylanmckay:llvm-4.0-align32, r=alexcrichton
[LLVM 4.0] Use 32-bits for alignment

LLVM 4.0 changes this. This change is fine to make for LLVM 3.9 as we
won't have alignments greater than 2^32-1.
2017-02-08 10:19:49 -05:00
Tom Tromey b037c5211b Emit DW_AT_main_subprogram
This changes rustc to emit DW_AT_main_subprogram on the "main" program.
This lets gdb suitably stop at the user's main in response to
"start" (rather than the library's main, which is what happens
currently).

Fixes #32620
r? michaelwoerister
2017-02-04 23:19:39 -07:00
Dylan McKay b4e6f70eda [llvm] Use 32-bits for alignment
LLVM 4.0 changes this. This change is fine to make for LLVM 3.9 as we
won't have alignments greater than 2^32-1.
2017-02-04 23:51:10 +13:00
Dylan McKay 768c6c081e Support a debug info API change for LLVM 4.0
Instead of directly creating a 'DIGlobalVariable', we now have to create
a 'DIGlobalVariableExpression' which itself contains a reference to a
'DIGlobalVariable'.

This is a straightforward change.

In the future, we should rename 'DIGlobalVariable' in the FFI
bindings, assuming we will only refer to 'DIGlobalVariableExpression'
and not 'DIGlobalVariable'.
2017-02-04 23:22:05 +13:00
Simonas Kazlauskas 1363cdaec9 Remove unnecessary LLVMRustPersonalityFn binding
LLVM Core C bindings provide this function for all the versions back to what we support (3.7), and
helps to avoid this unnecessary builder->function transition every time. Also a negative diff.
2017-01-26 23:49:17 +02:00
Jorge Aparicio 6296d52ba6 calling convention for MSP430 interrupts
This calling convention is used to define interrup handlers on MSP430
microcontrollers. Usage looks like this:

``` rust
#[no_mangle]
#[link_section = "__interrupt_vector_10"]
pub static TIM0_VECTOR: unsafe extern "msp430-interrupt" fn() = tim0;

unsafe extern "msp430-interrupt" fn tim0() {
  P1OUT.write(0x00);
}
```

which generates the following assembly:

``` asm
Disassembly of section __interrupt_vector_10:

0000fff2 <TIM0_VECTOR>:
    fff2:       10 c0           interrupt service routine at 0xc010

Disassembly of section .text:

0000c010 <_ZN3msp4tim017h3193b957fd6a4fd4E>:
    c010:       c2 43 21 00     mov.b   #0,     &0x0021 ;r3 As==00
    c014:       00 13           reti
        ...
```
2017-01-18 20:42:54 -05:00
Simonas Kazlauskas ee69cd7925 Calculate discriminant bounds within 64 bits
Since discriminants do not support i128 yet, lets just calculate the boundaries within the 64 bits
that are supported. This also avoids an issue with bootstrapping on 32 bit systems due to #38727.
2016-12-31 04:55:29 +02:00
Simonas Kazlauskas 9aad2d551e Add a way to retrieve constant value in 128 bits
Fixes rebase fallout, makes code correct in presence of 128-bit constants.

This commit includes manual merge conflict resolution changes from a rebase by @est31.
2016-12-30 15:17:26 +01:00
Simonas Kazlauskas d9eb756cbf Wrapping<i128> and attempt at LLVM 3.7 compat
This commit includes manual merge conflict resolution changes from a rebase by @est31.
2016-12-30 15:17:26 +01:00
Simonas Kazlauskas b0e55a83a8 Such large. Very 128. Much bits.
This commit introduces 128-bit integers. Stage 2 builds and produces a working compiler which
understands and supports 128-bit integers throughout.

The general strategy used is to have rustc_i128 module which provides aliases for iu128, equal to
iu64 in stage9 and iu128 later. Since nowhere in rustc we rely on large numbers being supported,
this strategy is good enough to get past the first bootstrap stages to end up with a fully working
128-bit capable compiler.

In order for this strategy to work, number of locations had to be changed to use associated
max_value/min_value instead of MAX/MIN constants as well as the min_value (or was it max_value?)
had to be changed to use xor instead of shift so both 64-bit and 128-bit based consteval works
(former not necessarily producing the right results in stage1).

This commit includes manual merge conflict resolution changes from a rebase by @est31.
2016-12-30 15:15:44 +01:00
Alex Crichton bcfd504744 Rollup merge of #38559 - japaric:ptx2, r=alexcrichton
PTX support, take 2

- You can generate PTX using `--emit=asm` and the right (custom) target. Which
  then you can run on a NVIDIA GPU.

- You can compile `core` to PTX. [Xargo] also works and it can compile some
  other crates like `collections` (but I doubt all of those make sense on a GPU)

[Xargo]: https://github.com/japaric/xargo

- You can create "global" functions, which can be "called" by the host, using
  the `"ptx-kernel"` ABI, e.g. `extern "ptx-kernel" fn kernel() { .. }`. Every
  other function is a "device" function and can only be called by the GPU.

- Intrinsics like `__syncthreads()` and `blockIdx.x` are available as
  `"platform-intrinsics"`. These intrinsics are *not* in the `core` crate but
  any Rust user can create "bindings" to them using an `extern
  "platform-intrinsics"` block. See example at the end.

- Trying to emit PTX with `-g` (debuginfo); you get an LLVM error. But I don't
  think PTX can contain debuginfo anyway so `-g` should be ignored and a warning
  should be printed ("`-g` doesn't work with this target" or something).

- "Single source" support. You *can't* write a single source file that contains
  both host and device code. I think that should be possible to implement that
  outside the compiler using compiler plugins / build scripts.

- The equivalent to CUDA `__shared__` which it's used to declare memory that's
  shared between the threads of the same block. This could be implemented using
  attributes: `#[shared] static mut SCRATCH_MEMORY: [f32; 64]` but hasn't been
  implemented yet.

- Built-in targets. This PR doesn't add targets to the compiler just yet but one
  can create custom targets to be able to emit PTX code (see the example at the
  end). The idea is to have people experiment with this feature before
  committing to it (built-in targets are "insta-stable")

- All functions must be "inlined". IOW, the `.rlib` must always contain the LLVM
  bitcode of all the functions of the crate it was produced from. Otherwise, you
  end with "undefined references" in the final PTX code but you won't get *any*
  linker error because no linker is involved. IOW, you'll hit a runtime error
  when loading the PTX into the GPU. The workaround is to use `#[inline]` on
  non-generic functions and to never use `#[inline(never)]` but this may not
  always be possible because e.g. you could be relying on third party code.

- Should `--emit=asm` generate a `.ptx` file instead of a `.s` file?

TL;DR Use Xargo to turn a crate into a PTX module (a `.s` file). Then pass that
PTX module, as a string, to the GPU and run it.

The full code is in [this repository]. This section gives an overview of how to
run Rust code on a NVIDIA GPU.

[this repository]: https://github.com/japaric/cuda

- Create a custom target. Here's the 64-bit NVPTX target (NOTE: the comments
  are not valid because this is supposed to be a JSON file; remove them before
  you use this file):

``` js
// nvptx64-nvidia-cuda.json
{
  "arch": "nvptx64",  // matches LLVM
  "cpu": "sm_20",  // "oldest" compute capability supported by LLVM
  "data-layout": "e-i64:64-v16:16-v32:32-n16:32:64",
  "llvm-target": "nvptx64-nvidia-cuda",
  "max-atomic-width": 0,  // LLVM errors with any other value :-(
  "os": "cuda",  // matches LLVM
  "panic-strategy": "abort",
  "target-endian": "little",
  "target-pointer-width": "64",
  "target-vendor": "nvidia",  // matches LLVM -- not required
}
```

(There's a 32-bit target specification in the linked repository)

- Write a kernel

``` rust

extern "platform-intrinsic" {
    fn nvptx_block_dim_x() -> i32;
    fn nvptx_block_idx_x() -> i32;
    fn nvptx_thread_idx_x() -> i32;
}

/// Copies an array of `n` floating point numbers from `src` to `dst`
pub unsafe extern "ptx-kernel" fn memcpy(dst: *mut f32,
                                         src: *const f32,
                                         n: usize) {
    let i = (nvptx_block_dim_x() as isize)
        .wrapping_mul(nvptx_block_idx_x() as isize)
        .wrapping_add(nvptx_thread_idx_x() as isize);

    if (i as usize) < n {
        *dst.offset(i) = *src.offset(i);
    }
}
```

- Emit PTX code

```
$ xargo rustc --target nvptx64-nvidia-cuda --release -- --emit=asm
   Compiling core v0.0.0 (file://..)
   (..)
   Compiling nvptx-builtins v0.1.0 (https://github.com/japaric/nvptx-builtins)
   Compiling kernel v0.1.0

$ cat target/nvptx64-nvidia-cuda/release/deps/kernel-*.s
//
// Generated by LLVM NVPTX Back-End
//

.version 3.2
.target sm_20
.address_size 64

        // .globl       memcpy

.visible .entry memcpy(
        .param .u64 memcpy_param_0,
        .param .u64 memcpy_param_1,
        .param .u64 memcpy_param_2
)
{
        .reg .pred      %p<2>;
        .reg .s32       %r<5>;
        .reg .s64       %rd<12>;

        ld.param.u64    %rd7, [memcpy_param_2];
        mov.u32 %r1, %ntid.x;
        mov.u32 %r2, %ctaid.x;
        mul.wide.s32    %rd8, %r2, %r1;
        mov.u32 %r3, %tid.x;
        cvt.s64.s32     %rd9, %r3;
        add.s64         %rd10, %rd9, %rd8;
        setp.ge.u64     %p1, %rd10, %rd7;
        @%p1 bra        LBB0_2;
        ld.param.u64    %rd3, [memcpy_param_0];
        ld.param.u64    %rd4, [memcpy_param_1];
        cvta.to.global.u64      %rd5, %rd4;
        cvta.to.global.u64      %rd6, %rd3;
        shl.b64         %rd11, %rd10, 2;
        add.s64         %rd1, %rd6, %rd11;
        add.s64         %rd2, %rd5, %rd11;
        ld.global.u32   %r4, [%rd2];
        st.global.u32   [%rd1], %r4;
LBB0_2:
        ret;
}
```

- Run it on the GPU

``` rust
// `kernel.ptx` is the `*.s` file we got in the previous step
const KERNEL: &'static str = include_str!("kernel.ptx");

driver::initialize()?;

let device = Device(0)?;
let ctx = device.create_context()?;
let module = ctx.load_module(KERNEL)?;
let kernel = module.function("memcpy")?;

let h_a: Vec<f32> = /* create some random data */;
let h_b = vec![0.; N];

let d_a = driver::allocate(bytes)?;
let d_b = driver::allocate(bytes)?;

// Copy from host to GPU
driver::copy(h_a, d_a)?;

// Run `memcpy` on the GPU
kernel.launch(d_b, d_a, N)?;

// Copy from GPU to host
driver::copy(d_b, h_b)?;

// Verify
assert_eq!(h_a, h_b);

// `d_a`, `d_b`, `h_a`, `h_b` are dropped/freed here
```

---

cc @alexcrichton @brson @rkruppe

> What has changed since #34195?

- `core` now can be compiled into PTX. Which makes it very easy to turn `no_std`
  crates into "kernels" with the help of Xargo.

- There's now a way, the `"ptx-kernel"` ABI, to generate "global" functions. The
  old PR required a manual step (it was hack) to "convert" "device" functions
  into "global" functions. (Only "global" functions can be launched by the host)

- Everything is unstable. There are not "insta stable" built-in targets this
  time (\*). The users have to use a custom target to experiment with this
  feature. Also, PTX instrinsics, like `__syncthreads` and `blockIdx.x`, are now
  implemented as `"platform-intrinsics"` so they no longer live in the `core`
  crate.

(\*) I'd actually like to have in-tree targets because that makes this target
more discoverable, removes the need to lug around .json files, etc.

However, bundling a target with the compiler immediately puts it in the path
towards stabilization. Which gives us just two cycles to find and fix any
problem with the target specification. Afterwards, it becomes hard to tweak
the specification because that could be a breaking change.

A possible solution could be "unstable built-in targets". Basically, to use an
unstable target, you'll have to also pass `-Z unstable-options` to the compiler.
And unstable targets, being unstable, wouldn't be available on stable.

> Why should this be merged?

- To let people experiment with the feature out of tree. Having easy access to
  the feature (in every nightly) allows this. I also think that, as it is, it
  should be possible to start prototyping type-safe single source support using
  build scripts, macros and/or plugins.

- It's a straightforward implementation. No different that adding support for
  any other architecture.
2016-12-29 17:26:15 -08:00
Alex Crichton 03bc2cf35a Fallout from updating bootstrap Cargo 2016-12-29 08:47:26 -08:00
Jorge Aparicio 18d49288d5 PTX support
- `--emit=asm --target=nvptx64-nvidia-cuda` can be used to turn a crate
  into a PTX module (a `.s` file).

- intrinsics like `__syncthreads` and `blockIdx.x` are exposed as
  `"platform-intrinsics"`.

- "cabi" has been implemented for the nvptx and nvptx64 architectures.
  i.e. `extern "C"` works.

- a new ABI, `"ptx-kernel"`. That can be used to generate "global"
  functions. Example: `extern "ptx-kernel" fn kernel() { .. }`. All
  other functions are "device" functions.
2016-12-26 21:06:23 -05:00
Ivan Molodetskikh c461cdfdf6
Fixed fastcall not applying inreg attributes to arguments like the C/C++ fastcall. 2016-12-21 21:44:40 +03:00
Mark-Simulacrum cbbdb73eb0 Remove FunctionContext::cleanup, replacing it with a Drop impl.
Move alloca and initial entry block creation into FunctionContext::new.
2016-12-20 20:03:27 -07:00
Jake Goulding 5bce12c95f [LLVM 4.0] Move debuginfo alignment argument
Alignment was removed from createBasicType and moved to

- createGlobalVariable
- createAutoVariable
- createStaticMemberType (unused in Rust)
- createTempGlobalVariableFwdDecl (unused in Rust)

e69c459a6e
2016-12-12 09:00:04 -05:00
bors 1692c0b587 Auto merge of #37973 - vadimcn:dllimport, r=alexcrichton
Implement RFC 1717

Implement the first two points from #37403.

r? @alexcrichton
2016-12-06 10:54:45 +00:00
Michael Woerister d1a6d47f94 Make LLVM symbol visibility FFI types more stable. 2016-12-05 11:05:25 -05:00
bors 125474de07 Auto merge of #37857 - shepmaster:llvm-4.0-dinodes, r=michaelwoerister
[LLVM 4.0] Handle new DIFlags enum
2016-12-04 02:30:23 +00:00
bors 08faff49c3 Auto merge of #38055 - rkruppe:rm-unused-llvm-ffi, r=alexcrichton
Remove unused functions from rustc_llvm
2016-12-03 03:57:57 +00:00
Jake Goulding dbdd60e6d7 [LLVM] Introduce a stable representation of DIFlags
In LLVM 4.0, this enum becomes an actual type-safe enum, which breaks
all of the interfaces. Introduce our own copy of the bitflags that we
can then safely convert to the LLVM one.
2016-12-02 21:13:31 -05:00
Vadim Chugunov a9a6f8c8ed Remove the "linked_from" feature. 2016-12-01 16:56:49 -08:00
Robin Kruppe 6e35cc94de Remove unused functions from rustc_llvm 2016-11-28 21:59:26 +01:00
Robin Kruppe 85dc08e525 Don't assume llvm::StringRef is null terminated
StringRefs have a length and their contents are not usually null-terminated.
The solution is to either copy the string data (in rustc_llvm::diagnostic) or take the size into account (in LLVMRustPrintPasses).
I couldn't trigger a bug caused by this (apparently all the strings returned in practice are actually null-terminated) but this is more correct and more future-proof.
2016-11-28 17:33:13 +01:00
Robin Kruppe 730400167a Support LLVM 4.0 in OptimizationDiagnostic FFI
- getMsg() changed to return std::string by-value. Fix: copy the data to a rust String during unpacking.
- getPassName() changed to return StringRef
2016-11-24 17:33:47 +01:00
Seo Sanghyeon c45f3dee10 Restore compatibility with LLVM 3.7 and 3.8 2016-11-21 20:30:05 +09:00
bors 0bd2ce62b2 Auto merge of #37831 - rkruppe:llvm-attr-fwdcompat, r=eddyb
[LLVM 4.0] Use llvm::Attribute APIs instead of "raw value" APIs

The latter will be removed in LLVM 4.0 (see 4a6fc8bacf).

The librustc_llvm API remains mostly unchanged, except that llvm::Attribute is no longer a bitflag but represents only a *single* attribute.
The ability to store many attributes in a small number of bits and modify them without interacting with LLVM is only used in rustc_trans::abi and closely related modules, and only attributes for function arguments are considered there.
Thus rustc_trans::abi now has its own bit-packed representation of argument attributes, which are translated to rustc_llvm::Attribute when applying the attributes.

cc #37609
2016-11-19 16:39:25 -06:00
Robin Kruppe 30daedf603 Use llvm::Attribute API instead of "raw value" APIs, which will be removed in LLVM 4.0.
The librustc_llvm API remains mostly unchanged, except that llvm::Attribute is no longer a bitflag but represents only a *single* attribute.
The ability to store many attributes in a small number of bits and modify them without interacting with LLVM is only used in rustc_trans::abi and closely related modules, and only attributes for function arguments are considered there.
Thus rustc_trans::abi now has its own bit-packed representation of argument attributes, which are translated to rustc_llvm::Attribute when applying the attributes.
2016-11-17 21:12:26 +01:00
Jorge Aparicio 456ceba137 fix `extern "aapcs" fn`
to actually use the AAPCS calling convention

closes #37810

This is technically a [breaking-change] because it changes the ABI of
`extern "aapcs"` functions that (a) involve `f32`/`f64` arguments/return
values and (b) are compiled for arm-eabihf targets from
"aapcs-vfp" (wrong) to "aapcs" (correct).

Appendix:

What these ABIs mean?

- In the "aapcs-vfp" ABI or "hard float" calling convention: Floating
point values are passed/returned through FPU registers (s0, s1, d0, etc.)

- Whereas, in the "aapcs" ABI or "soft float" calling convention:
Floating point values are passed/returned through general purpose
registers (r0, r1, etc.)

Mixing these ABIs can cause problems if the caller assumes that the
routine is using one of these ABIs but it's actually using the other
one.
2016-11-16 18:38:32 -05:00
Alex Crichton 88b46460fa rustc: Flag all builtins functions as hidden
When compiling compiler-rt you typically compile with `-fvisibility=hidden`
which to ensure that all symbols are hidden in shared objects and don't show up
in symbol tables. This is important for these intrinsics being linked in every
crate to ensure that we're not unnecessarily bloating the public ABI of Rust
crates.

This should help allow the compiler-builtins project with Rust-defined builtins
start landing in-tree as well.
2016-11-12 10:46:15 -08:00
Srinivas Reddy Thatiparthy 9972d17ecf
run rustfmt on librustc_llvm folder 2016-10-22 18:37:35 +05:30
Michael Woerister db4a9b3446 debuginfo: Remove some outdated stuff from LLVM DIBuilder binding. 2016-10-14 14:56:33 -04:00
Matt Ickstadt b9a8c1a063 Fix incorrect LLVM Linkage enum
The `Linkage` enum in librustc_llvm got out of sync with the version in LLVM and it caused two variants of the #[linkage=""] attribute to break.

This adds the functions `LLVMRustGetLinkage` and `LLVMRustSetLinkage` which convert between the Rust Linkage enum and the LLVM one, which should stop this from breaking every time LLVM changes it.

Fixes #33992
2016-09-04 16:12:01 -05:00
CensoredUsername 516519ee9a Allow specification of the system V AMD64 ABI constraint.
This can be specified using `extern sysV64 fn` on all platforms
2016-08-30 16:01:40 +02:00
Vadim Chugunov cf6461168f Fix debug line info for macro expansions.
Macro expansions produce code tagged with debug locations that are completely different from the surrounding expressions.  This wrecks havoc on debugger's ability the step over source lines.

In order to have a good line stepping behavior in debugger, we overwrite debug locations of macro expansions with that of the outermost expansion site.
2016-08-25 00:40:42 -07:00
Cameron Hart cbb88faad7 Merge branch 'master' into issue-30961 2016-08-06 15:50:48 +10:00
Ariel Ben-Yehuda 3041a97b1a finish type-auditing rustllvm 2016-08-03 15:08:47 +03:00
Ariel Ben-Yehuda 24874170b4 split the FFI part of rustc_llvm to rustc_llvm::ffi 2016-08-03 15:08:47 +03:00