config.gcc (spu-*-elf*): Add spu_cache.h to extra_headers.

2009-10-26  Ben Elliston  <bje@au.ibm.com>
	    Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Ulrich Weigand  <uweigand@de.ibm.com>

	* config.gcc (spu-*-elf*): Add spu_cache.h to extra_headers.
	* config/spu/spu_cache.h: New file.

	* config/spu/cachemgr.c: New file.
	* config/spu/cache.S: New file.

	* config/spu/spu.h (ASM_OUTPUT_SYMBOL_REF): Define.
	(ADDR_SPACE_EA): Define.
	(TARGET_ADDR_SPACE_KEYWORDS): Define.
	* config/spu/spu.c (EAmode): New macro.
	(TARGET_ADDR_SPACE_POINTER_MODE): Define.
	(TARGET_ADDR_SPACE_ADDRESS_MODE): Likewise.
	(TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P): Likewise.
	(TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS): Likewise.
	(TARGET_ADDR_SPACE_SUBSET_P): Likewise.
	(TARGET_ADDR_SPACE_CONVERT): Likewise.
	(TARGET_ASM_SELECT_SECTION): Likewise.
	(TARGET_ASM_UNIQUE_SECTION): Likewise.
	(TARGET_ASM_UNALIGNED_SI_OP): Likewise.
	(TARGET_ASM_ALIGNED_DI_OP): Likewise.
	(ea_symbol_ref): New function.
	(spu_legitimate_constant_p): Handle __ea qualified addresses.
	(spu_addr_space_legitimate_address_p): New function.
	(spu_addr_space_legitimize_address): Likewise.
	(cache_fetch): New global.
	(cache_fetch_dirty): Likewise.
	(ea_alias_set): Likewise.
	(ea_load_store): New function.
	(ea_load_store_inline): Likewise.
	(expand_ea_mem): Likewise.
	(spu_expand_mov): Handle __ea qualified memory references.
	(spu_addr_space_pointer_mode): New function.
	(spu_addr_space_address_mode): Likewise.
	(spu_addr_space_subset_p): Likewise.
	(spu_addr_space_convert): Likewise.
	(spu_section_type_flags): Handle "._ea" section.
	(spu_select_section): New function.
	(spu_unique_section): Likewise.
	* config/spu/spu-c.c (spu_cpu_cpp_builtins): Support __EA32__
	and __EA64__ predefined macros.
	* config/spu/spu-elf.h (LIB_SPEC): Handle -mcache-size= and
	-matomic-updates switches.

	* config/spu/t-spu-elf (MULTILIB_OPTIONS): Define.
	(EXTRA_MULTILIB_PARTS): Add libgcc_cachemgr.a,
	libgcc_cachemgr_nonatomic.a, libgcc_cache8k.a, libgcc_cache16k.a,
	libgcc_cache32k.a, libgcc_cache64k.a, libgcc_cache128k.a.
	($(T)cachemgr.o, $(T)cachemgr_nonatomic.o): New target.
	($(T)cache8k.o, $(T)cache16k.o, $(T)cache32k.o, $(T)cache64k.o,
	$(T)cache128k.o): Likewise.
	($(T)libgcc_%.a): Likewise.

	* config/spu/spu.h (TARGET_DEFAULT): Add MASK_ADDRESS_SPACE_CONVERSION.
	* config/spu/spu.opt (-mea32/-mea64): Add switches.
	(-maddress-space-conversion): Likewise.
	(-mcache-size=): Likewise.
	(-matomic-updates): Likewise.
	* doc/invoke.texi (-mea32/-mea64): Document.
	(-maddress-space-conversion): Likewise.
	(-mcache-size=): Likewise.
	(-matomic-updates): Likewise.

Co-Authored-By: Michael Meissner <meissner@linux.vnet.ibm.com>
Co-Authored-By: Ulrich Weigand <uweigand@de.ibm.com>

From-SVN: r153575

gcc/config.gcc
@@ -2449,7 +2449,7 @@ sparc64-*-netbsd*)
spu-*-elf*)
tm_file="dbxelf.h elfos.h spu/spu-elf.h spu/spu.h newlib-stdint.h"
tmake_file="spu/t-spu-elf"
extra_headers="spu_intrinsics.h spu_internals.h vmx2spu.h spu_mfcio.h vec_types.h"
extra_headers="spu_intrinsics.h spu_internals.h vmx2spu.h spu_mfcio.h vec_types.h spu_cache.h"
extra_modes=spu/spu-modes.def
c_target_objs="${c_target_objs} spu-c.o"
cxx_target_objs="${cxx_target_objs} spu-c.o"

gcc/config/spu/cache.S (new file)
@@ -0,0 +1,43 @@
/* Copyright (C) 2008, 2009 Free Software Foundation, Inc.
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.
GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
Under Section 7 of GPL version 3, you are granted additional
permissions described in the GCC Runtime Library Exception, version
3.1, as published by the Free Software Foundation.
You should have received a copy of the GNU General Public License and
a copy of the GCC Runtime Library Exception along with this program;
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
.data
.p2align 7
.global __cache
__cache:
.rept __CACHE_SIZE__ * 8
.fill 128
.endr
.p2align 7
.global __cache_tag_array
__cache_tag_array:
.rept __CACHE_SIZE__ * 2
.long 1, 1, 1, 1
.fill 128-16
.endr
__end_cache_tag_array:
.globl __cache_tag_array_size
.set __cache_tag_array_size, __end_cache_tag_array-__cache_tag_array

gcc/config/spu/cachemgr.c (new file)
@@ -0,0 +1,438 @@
/* Copyright (C) 2008, 2009 Free Software Foundation, Inc.
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.
GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
Under Section 7 of GPL version 3, you are granted additional
permissions described in the GCC Runtime Library Exception, version
3.1, as published by the Free Software Foundation.
You should have received a copy of the GNU General Public License and
a copy of the GCC Runtime Library Exception along with this program;
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
#include <spu_mfcio.h>
#include <spu_internals.h>
#include <spu_intrinsics.h>
#include <spu_cache.h>
extern unsigned long long __ea_local_store;
extern char __cache_tag_array_size;
#define LINE_SIZE 128
#define TAG_MASK (LINE_SIZE - 1)
#define WAYS 4
#define SET_MASK ((int) &__cache_tag_array_size - LINE_SIZE)
#define CACHE_LINES ((int) &__cache_tag_array_size / \
sizeof (struct __cache_tag_array) * WAYS)
struct __cache_tag_array
{
unsigned int tag_lo[WAYS];
unsigned int tag_hi[WAYS];
void *base[WAYS];
int reserved[WAYS];
vector unsigned short dirty_bits[WAYS];
};
extern struct __cache_tag_array __cache_tag_array[];
extern char __cache[];
/* In order to make the code seem a little cleaner, and to avoid having
64/32 bit ifdefs all over the place, we use macros. */
#ifdef __EA64__
typedef unsigned long long addr;
#define CHECK_TAG(_entry, _way, _tag) \
((_entry)->tag_lo[(_way)] == ((_tag) & 0xFFFFFFFF) \
&& (_entry)->tag_hi[(_way)] == ((_tag) >> 32))
#define GET_TAG(_entry, _way) \
((unsigned long long)(_entry)->tag_hi[(_way)] << 32 \
| (unsigned long long)(_entry)->tag_lo[(_way)])
#define SET_TAG(_entry, _way, _tag) \
(_entry)->tag_lo[(_way)] = (_tag) & 0xFFFFFFFF; \
(_entry)->tag_hi[(_way)] = (_tag) >> 32
#else /*__EA32__*/
typedef unsigned long addr;
#define CHECK_TAG(_entry, _way, _tag) \
((_entry)->tag_lo[(_way)] == (_tag))
#define GET_TAG(_entry, _way) \
((_entry)->tag_lo[(_way)])
#define SET_TAG(_entry, _way, _tag) \
(_entry)->tag_lo[(_way)] = (_tag)
#endif
/* In GET_ENTRY, we cast away the high 32 bits,
as the tag is only in the low 32. */
#define GET_ENTRY(_addr) \
((struct __cache_tag_array *) \
si_to_uint (si_a (si_and (si_from_uint ((unsigned int) (addr) (_addr)), \
si_from_uint (SET_MASK)), \
si_from_uint ((unsigned int) __cache_tag_array))))
#define GET_CACHE_LINE(_addr, _way) \
((void *) (__cache + ((_addr) & SET_MASK) * WAYS) + ((_way) * LINE_SIZE));
#define CHECK_DIRTY(_vec) (si_to_uint (si_orx ((qword) (_vec))))
#define SET_EMPTY(_entry, _way) ((_entry)->tag_lo[(_way)] = 1)
#define CHECK_EMPTY(_entry, _way) ((_entry)->tag_lo[(_way)] == 1)
#define LS_FLAG 0x80000000
#define SET_IS_LS(_entry, _way) ((_entry)->reserved[(_way)] |= LS_FLAG)
#define CHECK_IS_LS(_entry, _way) ((_entry)->reserved[(_way)] & LS_FLAG)
#define GET_LRU(_entry, _way) ((_entry)->reserved[(_way)] & ~LS_FLAG)
static int dma_tag = 32;
static void
__cache_evict_entry (struct __cache_tag_array *entry, int way)
{
addr tag = GET_TAG (entry, way);
if (CHECK_DIRTY (entry->dirty_bits[way]) && !CHECK_IS_LS (entry, way))
{
#ifdef NONATOMIC
/* Non-atomic writes. */
unsigned int oldmask, mach_stat;
char *line = ((void *) 0);
/* Enter critical section. */
mach_stat = spu_readch (SPU_RdMachStat);
spu_idisable ();
/* Issue DMA request. */
line = GET_CACHE_LINE (entry->tag_lo[way], way);
mfc_put (line, tag, LINE_SIZE, dma_tag, 0, 0);
/* Wait for DMA completion. */
oldmask = mfc_read_tag_mask ();
mfc_write_tag_mask (1 << dma_tag);
mfc_read_tag_status_all ();
mfc_write_tag_mask (oldmask);
/* Leave critical section. */
if (__builtin_expect (mach_stat & 1, 0))
spu_ienable ();
#else
/* Allocate a buffer large enough that we know it has 128 bytes
that are 128 byte aligned (for DMA). */
char buffer[LINE_SIZE + 127];
qword *buf_ptr = (qword *) (((unsigned int) (buffer) + 127) & ~127);
qword *line = GET_CACHE_LINE (entry->tag_lo[way], way);
qword bits;
unsigned int mach_stat;
/* Enter critical section. */
mach_stat = spu_readch (SPU_RdMachStat);
spu_idisable ();
do
{
/* We atomically read the current memory into a buffer
modify the dirty bytes in the buffer, and write it
back. If writeback fails, loop and try again. */
mfc_getllar (buf_ptr, tag, 0, 0);
mfc_read_atomic_status ();
/* The method we're using to write 16 dirty bytes into
the buffer at a time uses fsmb which in turn uses
the least significant 16 bits of word 0, so we
load the bits and rotate so that the first bit of
the bitmap is in the first bit that fsmb will use. */
bits = (qword) entry->dirty_bits[way];
bits = si_rotqbyi (bits, -2);
/* Si_fsmb creates the mask of dirty bytes.
Use selb to nab the appropriate bits. */
buf_ptr[0] = si_selb (buf_ptr[0], line[0], si_fsmb (bits));
/* Rotate to next 16 byte section of cache. */
bits = si_rotqbyi (bits, 2);
buf_ptr[1] = si_selb (buf_ptr[1], line[1], si_fsmb (bits));
bits = si_rotqbyi (bits, 2);
buf_ptr[2] = si_selb (buf_ptr[2], line[2], si_fsmb (bits));
bits = si_rotqbyi (bits, 2);
buf_ptr[3] = si_selb (buf_ptr[3], line[3], si_fsmb (bits));
bits = si_rotqbyi (bits, 2);
buf_ptr[4] = si_selb (buf_ptr[4], line[4], si_fsmb (bits));
bits = si_rotqbyi (bits, 2);
buf_ptr[5] = si_selb (buf_ptr[5], line[5], si_fsmb (bits));
bits = si_rotqbyi (bits, 2);
buf_ptr[6] = si_selb (buf_ptr[6], line[6], si_fsmb (bits));
bits = si_rotqbyi (bits, 2);
buf_ptr[7] = si_selb (buf_ptr[7], line[7], si_fsmb (bits));
bits = si_rotqbyi (bits, 2);
mfc_putllc (buf_ptr, tag, 0, 0);
}
while (mfc_read_atomic_status ());
/* Leave critical section. */
if (__builtin_expect (mach_stat & 1, 0))
spu_ienable ();
#endif
}
/* In any case, marking the lo tag with 1 which denotes empty. */
SET_EMPTY (entry, way);
entry->dirty_bits[way] = (vector unsigned short) si_from_uint (0);
}
void
__cache_evict (__ea void *ea)
{
addr tag = (addr) ea & ~TAG_MASK;
struct __cache_tag_array *entry = GET_ENTRY (ea);
int i = 0;
/* Cycles through all the possible ways an address could be at
and evicts the way if found. */
for (i = 0; i < WAYS; i++)
if (CHECK_TAG (entry, i, tag))
__cache_evict_entry (entry, i);
}
static void *
__cache_fill (int way, addr tag)
{
unsigned int oldmask, mach_stat;
char *line = ((void *) 0);
/* Reserve our DMA tag. */
if (dma_tag == 32)
dma_tag = mfc_tag_reserve ();
/* Enter critical section. */
mach_stat = spu_readch (SPU_RdMachStat);
spu_idisable ();
/* Issue DMA request. */
line = GET_CACHE_LINE (tag, way);
mfc_get (line, tag, LINE_SIZE, dma_tag, 0, 0);
/* Wait for DMA completion. */
oldmask = mfc_read_tag_mask ();
mfc_write_tag_mask (1 << dma_tag);
mfc_read_tag_status_all ();
mfc_write_tag_mask (oldmask);
/* Leave critical section. */
if (__builtin_expect (mach_stat & 1, 0))
spu_ienable ();
return (void *) line;
}
static void
__cache_miss (__ea void *ea, struct __cache_tag_array *entry, int way)
{
addr tag = (addr) ea & ~TAG_MASK;
unsigned int lru = 0;
int i = 0;
int idx = 0;
/* If way > 4, then there are no empty slots, so we must evict
the least recently used entry. */
if (way >= 4)
{
for (i = 0; i < WAYS; i++)
{
if (GET_LRU (entry, i) > lru)
{
lru = GET_LRU (entry, i);
idx = i;
}
}
__cache_evict_entry (entry, idx);
way = idx;
}
/* Set the empty entry's tag and fill its cache line. */
SET_TAG (entry, way, tag);
entry->reserved[way] = 0;
/* Check if the address is just an effective address within the
SPU's local store. */
/* Because the LS is not 256k aligned, we can't do a nice and mask
here to compare, so we must check the whole range. */
if ((addr) ea >= (addr) __ea_local_store
&& (addr) ea < (addr) (__ea_local_store + 0x40000))
{
SET_IS_LS (entry, way);
entry->base[way] =
(void *) ((unsigned int) ((addr) ea -
(addr) __ea_local_store) & ~0x7f);
}
else
{
entry->base[way] = __cache_fill (way, tag);
}
}
void *
__cache_fetch_dirty (__ea void *ea, int n_bytes_dirty)
{
#ifdef __EA64__
unsigned int tag_hi;
qword etag_hi;
#endif
unsigned int tag_lo;
struct __cache_tag_array *entry;
qword etag_lo;
qword equal;
qword bit_mask;
qword way;
/* This first chunk, we merely fill the pointer and tag. */
entry = GET_ENTRY (ea);
#ifndef __EA64__
tag_lo =
si_to_uint (si_andc
(si_shufb
(si_from_uint ((addr) ea), si_from_uint (0),
si_from_uint (0x00010203)), si_from_uint (TAG_MASK)));
#else
tag_lo =
si_to_uint (si_andc
(si_shufb
(si_from_ullong ((addr) ea), si_from_uint (0),
si_from_uint (0x04050607)), si_from_uint (TAG_MASK)));
tag_hi =
si_to_uint (si_shufb
(si_from_ullong ((addr) ea), si_from_uint (0),
si_from_uint (0x00010203)));
#endif
/* Increment LRU in reserved bytes. */
si_stqd (si_ai (si_lqd (si_from_ptr (entry), 48), 1),
si_from_ptr (entry), 48);
missreturn:
/* Check if the entry's lo_tag is equal to the address' lo_tag. */
etag_lo = si_lqd (si_from_ptr (entry), 0);
equal = si_ceq (etag_lo, si_from_uint (tag_lo));
#ifdef __EA64__
/* And the high tag too. */
etag_hi = si_lqd (si_from_ptr (entry), 16);
equal = si_and (equal, (si_ceq (etag_hi, si_from_uint (tag_hi))));
#endif
if ((si_to_uint (si_orx (equal)) == 0))
goto misshandler;
if (n_bytes_dirty)
{
/* way = 0x40,0x50,0x60,0x70 for each way, which is also the
offset of the appropriate dirty bits. */
way = si_shli (si_clz (si_gbb (equal)), 2);
/* To create the bit_mask, we set it to all 1s (uint -1), then we
shift it over (128 - n_bytes_dirty) times. */
bit_mask = si_from_uint (-1);
bit_mask =
si_shlqby (bit_mask, si_from_uint ((LINE_SIZE - n_bytes_dirty) / 8));
bit_mask =
si_shlqbi (bit_mask, si_from_uint ((LINE_SIZE - n_bytes_dirty) % 8));
/* Rotate it around to the correct offset. */
bit_mask =
si_rotqby (bit_mask,
si_from_uint (-1 * ((addr) ea & TAG_MASK) / 8));
bit_mask =
si_rotqbi (bit_mask,
si_from_uint (-1 * ((addr) ea & TAG_MASK) % 8));
/* Update the dirty bits. */
si_stqx (si_or (si_lqx (si_from_ptr (entry), way), bit_mask),
si_from_ptr (entry), way);
};
/* We've definitely found the right entry, set LRU (reserved) to 0
maintaining the LS flag (MSB). */
si_stqd (si_andc
(si_lqd (si_from_ptr (entry), 48),
si_and (equal, si_from_uint (~(LS_FLAG)))),
si_from_ptr (entry), 48);
return (void *)
si_to_uint (si_a
(si_orx
(si_and (si_lqd (si_from_ptr (entry), 32), equal)),
si_from_uint (((unsigned int) (addr) ea) & TAG_MASK)));
misshandler:
equal = si_ceqi (etag_lo, 1);
__cache_miss (ea, entry, (si_to_uint (si_clz (si_gbb (equal))) - 16) >> 2);
goto missreturn;
}
void *
__cache_fetch (__ea void *ea)
{
return __cache_fetch_dirty (ea, 0);
}
void
__cache_touch (__ea void *ea __attribute__ ((unused)))
{
/* NO-OP for now. */
}
void __cache_flush (void) __attribute__ ((destructor));
void
__cache_flush (void)
{
struct __cache_tag_array *entry = __cache_tag_array;
unsigned int i;
int j;
/* Cycle through each cache entry and evict all used ways. */
for (i = 0; i < CACHE_LINES / WAYS; i++)
{
for (j = 0; j < WAYS; j++)
if (!CHECK_EMPTY (entry, j))
__cache_evict_entry (entry, j);
entry++;
}
}

gcc/config/spu/spu-c.c
@@ -201,6 +201,17 @@ spu_cpu_cpp_builtins (struct cpp_reader *pfile)
if (spu_arch == PROCESSOR_CELLEDP)
builtin_define_std ("__SPU_EDP__");
builtin_define_std ("__vector=__attribute__((__spu_vector__))");
switch (spu_ea_model)
{
case 32:
builtin_define_std ("__EA32__");
break;
case 64:
builtin_define_std ("__EA64__");
break;
default:
gcc_unreachable ();
}
if (!flag_iso)
{

gcc/config/spu/spu-elf.h
@@ -68,8 +68,14 @@
#define LINK_SPEC "%{mlarge-mem: --defsym __stack=0xfffffff0 }"
#define LIB_SPEC \
"-( %{!shared:%{g*:-lg}} -lc -lgloss -)"
#define LIB_SPEC "-( %{!shared:%{g*:-lg}} -lc -lgloss -) \
%{mno-atomic-updates:-lgcc_cachemgr_nonatomic; :-lgcc_cachemgr} \
%{mcache-size=128:-lgcc_cache128k; \
mcache-size=64 :-lgcc_cache64k; \
mcache-size=32 :-lgcc_cache32k; \
mcache-size=16 :-lgcc_cache16k; \
mcache-size=8 :-lgcc_cache8k; \
:-lgcc_cache64k}"
/* Turn off warnings in the assembler too. */
#undef ASM_SPEC

gcc/config/spu/spu.c
@@ -154,6 +154,8 @@ static tree spu_builtin_decl (unsigned, bool);
static unsigned char spu_scalar_mode_supported_p (enum machine_mode mode);
static unsigned char spu_vector_mode_supported_p (enum machine_mode mode);
static bool spu_legitimate_address_p (enum machine_mode, rtx, bool);
static bool spu_addr_space_legitimate_address_p (enum machine_mode, rtx,
bool, addr_space_t);
static rtx adjust_operand (rtx op, HOST_WIDE_INT * start);
static rtx get_pic_reg (void);
static int need_to_save_reg (int regno, int saving);
@@ -203,15 +205,23 @@ static bool spu_return_in_memory (const_tree type, const_tree fntype);
static void fix_range (const char *);
static void spu_encode_section_info (tree, rtx, int);
static rtx spu_legitimize_address (rtx, rtx, enum machine_mode);
static rtx spu_addr_space_legitimize_address (rtx, rtx, enum machine_mode,
addr_space_t);
static tree spu_builtin_mul_widen_even (tree);
static tree spu_builtin_mul_widen_odd (tree);
static tree spu_builtin_mask_for_load (void);
static int spu_builtin_vectorization_cost (bool);
static bool spu_vector_alignment_reachable (const_tree, bool);
static tree spu_builtin_vec_perm (tree, tree *);
static enum machine_mode spu_addr_space_pointer_mode (addr_space_t);
static enum machine_mode spu_addr_space_address_mode (addr_space_t);
static bool spu_addr_space_subset_p (addr_space_t, addr_space_t);
static rtx spu_addr_space_convert (rtx, tree, tree);
static int spu_sms_res_mii (struct ddg *g);
static void asm_file_start (void);
static unsigned int spu_section_type_flags (tree, const char *, int);
static section *spu_select_section (tree, int, unsigned HOST_WIDE_INT);
static void spu_unique_section (tree, int);
static rtx spu_expand_load (rtx, rtx, rtx, int);
static void spu_trampoline_init (rtx, tree, rtx);
@@ -270,6 +280,10 @@ spu_libgcc_cmp_return_mode (void);
static enum machine_mode
spu_libgcc_shift_count_mode (void);
/* Pointer mode for __ea references. */
#define EAmode (spu_ea_model != 32 ? DImode : SImode)
/* Table of machine attributes. */
static const struct attribute_spec spu_attribute_table[] =
@@ -282,6 +296,25 @@ static const struct attribute_spec spu_attribute_table[] =
/* TARGET overrides. */
#undef TARGET_ADDR_SPACE_POINTER_MODE
#define TARGET_ADDR_SPACE_POINTER_MODE spu_addr_space_pointer_mode
#undef TARGET_ADDR_SPACE_ADDRESS_MODE
#define TARGET_ADDR_SPACE_ADDRESS_MODE spu_addr_space_address_mode
#undef TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P
#define TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P \
spu_addr_space_legitimate_address_p
#undef TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS
#define TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS spu_addr_space_legitimize_address
#undef TARGET_ADDR_SPACE_SUBSET_P
#define TARGET_ADDR_SPACE_SUBSET_P spu_addr_space_subset_p
#undef TARGET_ADDR_SPACE_CONVERT
#define TARGET_ADDR_SPACE_CONVERT spu_addr_space_convert
#undef TARGET_INIT_BUILTINS
#define TARGET_INIT_BUILTINS spu_init_builtins
#undef TARGET_BUILTIN_DECL
@@ -296,6 +329,15 @@ static const struct attribute_spec spu_attribute_table[] =
#undef TARGET_LEGITIMIZE_ADDRESS
#define TARGET_LEGITIMIZE_ADDRESS spu_legitimize_address
/* The current assembler doesn't like .4byte foo@ppu, so use the normal .long
and .quad for the debugger. When it is known that the assembler is fixed,
these can be removed. */
#undef TARGET_ASM_UNALIGNED_SI_OP
#define TARGET_ASM_UNALIGNED_SI_OP "\t.long\t"
#undef TARGET_ASM_ALIGNED_DI_OP
#define TARGET_ASM_ALIGNED_DI_OP "\t.quad\t"
/* The .8byte directive doesn't seem to work well for a 32 bit
architecture. */
#undef TARGET_ASM_UNALIGNED_DI_OP
@@ -412,6 +454,12 @@ static const struct attribute_spec spu_attribute_table[] =
#undef TARGET_SECTION_TYPE_FLAGS
#define TARGET_SECTION_TYPE_FLAGS spu_section_type_flags
#undef TARGET_ASM_SELECT_SECTION
#define TARGET_ASM_SELECT_SECTION spu_select_section
#undef TARGET_ASM_UNIQUE_SECTION
#define TARGET_ASM_UNIQUE_SECTION spu_unique_section
#undef TARGET_LEGITIMATE_ADDRESS_P
#define TARGET_LEGITIMATE_ADDRESS_P spu_legitimate_address_p
@@ -3613,6 +3661,29 @@ exp2_immediate_p (rtx op, enum machine_mode mode, int low, int high)
return FALSE;
}
/* Return true if X is a SYMBOL_REF to an __ea qualified variable. */
static int
ea_symbol_ref (rtx *px, void *data ATTRIBUTE_UNUSED)
{
rtx x = *px;
tree decl;
if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == PLUS)
{
rtx plus = XEXP (x, 0);
rtx op0 = XEXP (plus, 0);
rtx op1 = XEXP (plus, 1);
if (GET_CODE (op1) == CONST_INT)
x = op0;
}
return (GET_CODE (x) == SYMBOL_REF
&& (decl = SYMBOL_REF_DECL (x)) != 0
&& TREE_CODE (decl) == VAR_DECL
&& TYPE_ADDR_SPACE (TREE_TYPE (decl)));
}
/* We accept:
- any 32-bit constant (SImode, SFmode)
- any constant that can be generated with fsmbi (any mode)
@@ -3624,6 +3695,12 @@ spu_legitimate_constant_p (rtx x)
{
if (GET_CODE (x) == HIGH)
x = XEXP (x, 0);
/* Reject any __ea qualified reference. These can't appear in
instructions but must be forced to the constant pool. */
if (for_each_rtx (&x, ea_symbol_ref, 0))
return 0;
/* V4SI with all identical symbols is valid. */
if (!flag_pic
&& GET_MODE (x) == V4SImode
@@ -3662,8 +3739,14 @@ spu_legitimate_address_p (enum machine_mode mode,
switch (GET_CODE (x))
{
case LABEL_REF:
return !TARGET_LARGE_MEM;
case SYMBOL_REF:
case CONST:
/* Keep __ea references until reload so that spu_expand_mov can see them
in MEMs. */
if (ea_symbol_ref (&x, 0))
return !reload_in_progress && !reload_completed;
return !TARGET_LARGE_MEM;
case CONST_INT:
@@ -3707,6 +3790,20 @@ spu_legitimate_address_p (enum machine_mode mode,
return FALSE;
}
/* Like spu_legitimate_address_p, except with named addresses. */
static bool
spu_addr_space_legitimate_address_p (enum machine_mode mode, rtx x,
bool reg_ok_strict, addr_space_t as)
{
if (as == ADDR_SPACE_EA)
return (REG_P (x) && (GET_MODE (x) == EAmode));
else if (as != ADDR_SPACE_GENERIC)
gcc_unreachable ();
return spu_legitimate_address_p (mode, x, reg_ok_strict);
}
/* When the address is reg + const_int, force the const_int into a
register. */
rtx
@@ -3738,6 +3835,17 @@ spu_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
return x;
}
/* Like spu_legitimate_address, except with named address support. */
static rtx
spu_addr_space_legitimize_address (rtx x, rtx oldx, enum machine_mode mode,
addr_space_t as)
{
if (as != ADDR_SPACE_GENERIC)
return x;
return spu_legitimize_address (x, oldx, mode);
}
/* Handle an attribute requiring a FUNCTION_DECL; arguments as in
struct attribute_spec.handler. */
static tree
@@ -4241,6 +4349,233 @@ address_needs_split (rtx mem)
return 0;
}
static GTY(()) rtx cache_fetch; /* __cache_fetch function */
static GTY(()) rtx cache_fetch_dirty; /* __cache_fetch_dirty function */
static alias_set_type ea_alias_set = -1; /* alias set for __ea memory */
/* MEM is known to be an __ea qualified memory access. Emit a call to
fetch the ppu memory to local store, and return its address in local
store. */
static void
ea_load_store (rtx mem, bool is_store, rtx ea_addr, rtx data_addr)
{
if (is_store)
{
rtx ndirty = GEN_INT (GET_MODE_SIZE (GET_MODE (mem)));
if (!cache_fetch_dirty)
cache_fetch_dirty = init_one_libfunc ("__cache_fetch_dirty");
emit_library_call_value (cache_fetch_dirty, data_addr, LCT_NORMAL, Pmode,
2, ea_addr, EAmode, ndirty, SImode);
}
else
{
if (!cache_fetch)
cache_fetch = init_one_libfunc ("__cache_fetch");
emit_library_call_value (cache_fetch, data_addr, LCT_NORMAL, Pmode,
1, ea_addr, EAmode);
}
}
/* Like ea_load_store, but do the cache tag comparison and, for stores,
dirty bit marking, inline.
The cache control data structure is an array of
struct __cache_tag_array
{
unsigned int tag_lo[4];
unsigned int tag_hi[4];
void *data_pointer[4];
int reserved[4];
vector unsigned short dirty_bits[4];
} */
static void
ea_load_store_inline (rtx mem, bool is_store, rtx ea_addr, rtx data_addr)
{
rtx ea_addr_si;
HOST_WIDE_INT v;
rtx tag_size_sym = gen_rtx_SYMBOL_REF (Pmode, "__cache_tag_array_size");
rtx tag_arr_sym = gen_rtx_SYMBOL_REF (Pmode, "__cache_tag_array");
rtx index_mask = gen_reg_rtx (SImode);
rtx tag_arr = gen_reg_rtx (Pmode);
rtx splat_mask = gen_reg_rtx (TImode);
rtx splat = gen_reg_rtx (V4SImode);
rtx splat_hi = NULL_RTX;
rtx tag_index = gen_reg_rtx (Pmode);
rtx block_off = gen_reg_rtx (SImode);
rtx tag_addr = gen_reg_rtx (Pmode);
rtx tag = gen_reg_rtx (V4SImode);
rtx cache_tag = gen_reg_rtx (V4SImode);
rtx cache_tag_hi = NULL_RTX;
rtx cache_ptrs = gen_reg_rtx (TImode);
rtx cache_ptrs_si = gen_reg_rtx (SImode);
rtx tag_equal = gen_reg_rtx (V4SImode);
rtx tag_equal_hi = NULL_RTX;
rtx tag_eq_pack = gen_reg_rtx (V4SImode);
rtx tag_eq_pack_si = gen_reg_rtx (SImode);
rtx eq_index = gen_reg_rtx (SImode);
rtx bcomp, hit_label, hit_ref, cont_label, insn;
if (spu_ea_model != 32)
{
splat_hi = gen_reg_rtx (V4SImode);
cache_tag_hi = gen_reg_rtx (V4SImode);
tag_equal_hi = gen_reg_rtx (V4SImode);
}
emit_move_insn (index_mask, plus_constant (tag_size_sym, -128));
emit_move_insn (tag_arr, tag_arr_sym);
v = 0x0001020300010203LL;
emit_move_insn (splat_mask, immed_double_const (v, v, TImode));
ea_addr_si = ea_addr;
if (spu_ea_model != 32)
ea_addr_si = convert_to_mode (SImode, ea_addr, 1);
/* tag_index = ea_addr & (tag_array_size - 128) */
emit_insn (gen_andsi3 (tag_index, ea_addr_si, index_mask));
/* splat ea_addr to all 4 slots. */
emit_insn (gen_shufb (splat, ea_addr_si, ea_addr_si, splat_mask));
/* Similarly for high 32 bits of ea_addr. */
if (spu_ea_model != 32)
emit_insn (gen_shufb (splat_hi, ea_addr, ea_addr, splat_mask));
/* block_off = ea_addr & 127 */
emit_insn (gen_andsi3 (block_off, ea_addr_si, spu_const (SImode, 127)));
/* tag_addr = tag_arr + tag_index */
emit_insn (gen_addsi3 (tag_addr, tag_arr, tag_index));
/* Read cache tags. */
emit_move_insn (cache_tag, gen_rtx_MEM (V4SImode, tag_addr));
if (spu_ea_model != 32)
emit_move_insn (cache_tag_hi, gen_rtx_MEM (V4SImode,
plus_constant (tag_addr, 16)));
/* tag = ea_addr & -128 */
emit_insn (gen_andv4si3 (tag, splat, spu_const (V4SImode, -128)));
/* Read all four cache data pointers. */
emit_move_insn (cache_ptrs, gen_rtx_MEM (TImode,
plus_constant (tag_addr, 32)));
/* Compare tags. */
emit_insn (gen_ceq_v4si (tag_equal, tag, cache_tag));
if (spu_ea_model != 32)
{
emit_insn (gen_ceq_v4si (tag_equal_hi, splat_hi, cache_tag_hi));
emit_insn (gen_andv4si3 (tag_equal, tag_equal, tag_equal_hi));
}
/* At most one of the tags compare equal, so tag_equal has one
32-bit slot set to all 1's, with the other slots all zero.
gbb picks off low bit from each byte in the 128-bit registers,
so tag_eq_pack is one of 0xf000, 0x0f00, 0x00f0, 0x000f, assuming
we have a hit. */
emit_insn (gen_spu_gbb (tag_eq_pack, spu_gen_subreg (V16QImode, tag_equal)));
emit_insn (gen_spu_convert (tag_eq_pack_si, tag_eq_pack));
/* So counting leading zeros will set eq_index to 16, 20, 24 or 28. */
emit_insn (gen_clzsi2 (eq_index, tag_eq_pack_si));
/* Allowing us to rotate the corresponding cache data pointer to slot0.
(rotating eq_index mod 16 bytes). */
emit_insn (gen_rotqby_ti (cache_ptrs, cache_ptrs, eq_index));
emit_insn (gen_spu_convert (cache_ptrs_si, cache_ptrs));
/* Add block offset to form final data address. */
emit_insn (gen_addsi3 (data_addr, cache_ptrs_si, block_off));
/* Check that we did hit. */
hit_label = gen_label_rtx ();
hit_ref = gen_rtx_LABEL_REF (VOIDmode, hit_label);
bcomp = gen_rtx_NE (SImode, tag_eq_pack_si, const0_rtx);
insn = emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
hit_ref, pc_rtx)));
/* Say that this branch is very likely to happen. */
v = REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100 - 1;
REG_NOTES (insn)
= gen_rtx_EXPR_LIST (REG_BR_PROB, GEN_INT (v), REG_NOTES (insn));
ea_load_store (mem, is_store, ea_addr, data_addr);
cont_label = gen_label_rtx ();
emit_jump_insn (gen_jump (cont_label));
emit_barrier ();
emit_label (hit_label);
if (is_store)
{
HOST_WIDE_INT v_hi;
rtx dirty_bits = gen_reg_rtx (TImode);
rtx dirty_off = gen_reg_rtx (SImode);
rtx dirty_128 = gen_reg_rtx (TImode);
rtx neg_block_off = gen_reg_rtx (SImode);
/* Set up mask with one dirty bit per byte of the mem we are
writing, starting from top bit. */
v_hi = v = -1;
v <<= (128 - GET_MODE_SIZE (GET_MODE (mem))) & 63;
if ((128 - GET_MODE_SIZE (GET_MODE (mem))) >= 64)
{
v_hi = v;
v = 0;
}
emit_move_insn (dirty_bits, immed_double_const (v, v_hi, TImode));
/* Form index into cache dirty_bits. eq_index is one of
0x10, 0x14, 0x18 or 0x1c. Multiplying by 4 gives us
0x40, 0x50, 0x60 or 0x70 which just happens to be the
offset to each of the four dirty_bits elements. */
emit_insn (gen_ashlsi3 (dirty_off, eq_index, spu_const (SImode, 2)));
emit_insn (gen_spu_lqx (dirty_128, tag_addr, dirty_off));
/* Rotate bit mask to proper bit. */
emit_insn (gen_negsi2 (neg_block_off, block_off));
emit_insn (gen_rotqbybi_ti (dirty_bits, dirty_bits, neg_block_off));
emit_insn (gen_rotqbi_ti (dirty_bits, dirty_bits, neg_block_off));
/* Or in the new dirty bits. */
emit_insn (gen_iorti3 (dirty_128, dirty_bits, dirty_128));
/* Store. */
emit_insn (gen_spu_stqx (dirty_128, tag_addr, dirty_off));
}
emit_label (cont_label);
}
static rtx
expand_ea_mem (rtx mem, bool is_store)
{
rtx ea_addr;
rtx data_addr = gen_reg_rtx (Pmode);
rtx new_mem;
ea_addr = force_reg (EAmode, XEXP (mem, 0));
if (optimize_size || optimize == 0)
ea_load_store (mem, is_store, ea_addr, data_addr);
else
ea_load_store_inline (mem, is_store, ea_addr, data_addr);
if (ea_alias_set == -1)
ea_alias_set = new_alias_set ();
/* We generate a new MEM RTX to refer to the copy of the data
in the cache. We do not copy memory attributes (except the
alignment) from the original MEM, as they may no longer apply
to the cache copy. */
new_mem = gen_rtx_MEM (GET_MODE (mem), data_addr);
set_mem_alias_set (new_mem, ea_alias_set);
set_mem_align (new_mem, MIN (MEM_ALIGN (mem), 128 * 8));
return new_mem;
}
int
spu_expand_mov (rtx * ops, enum machine_mode mode)
{
@@ -4298,9 +4633,17 @@ spu_expand_mov (rtx * ops, enum machine_mode mode)
}
}
if (MEM_P (ops[0]))
return spu_split_store (ops);
{
if (MEM_ADDR_SPACE (ops[0]))
ops[0] = expand_ea_mem (ops[0], true);
return spu_split_store (ops);
}
if (MEM_P (ops[1]))
return spu_split_load (ops);
{
if (MEM_ADDR_SPACE (ops[1]))
ops[1] = expand_ea_mem (ops[1], false);
return spu_split_load (ops);
}
return 0;
}
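
As a rough illustration (the names below are hypothetical, not taken from the patch), source like the following is what flows through spu_expand_mov and expand_ea_mem: each __ea-qualified access is rewritten to address the local-store cache copy, via a __cache_fetch/__cache_fetch_dirty library call at -O0 or -Os, or via the inline tag-compare sequence from ea_load_store_inline at higher optimization levels.

/* Hypothetical fragment; accesses to "counters" are __ea MEMs.  */
extern __ea unsigned int counters[64];

unsigned int
read_counter (int i)
{
  /* The load becomes a software-cache lookup (library call or inline check).  */
  return counters[i];
}

void
write_counter (int i, unsigned int v)
{
  /* The store additionally marks the touched bytes dirty
     (__cache_fetch_dirty path).  */
  counters[i] = v;
}
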
@@ -6442,6 +6785,113 @@ spu_builtin_vec_perm (tree type, tree *mask_element_type)
return d->fndecl;
}
/* Return the appropriate mode for a named address pointer. */
static enum machine_mode
spu_addr_space_pointer_mode (addr_space_t addrspace)
{
switch (addrspace)
{
case ADDR_SPACE_GENERIC:
return ptr_mode;
case ADDR_SPACE_EA:
return EAmode;
default:
gcc_unreachable ();
}
}
/* Return the appropriate mode for a named address address. */
static enum machine_mode
spu_addr_space_address_mode (addr_space_t addrspace)
{
switch (addrspace)
{
case ADDR_SPACE_GENERIC:
return Pmode;
case ADDR_SPACE_EA:
return EAmode;
default:
gcc_unreachable ();
}
}
/* Determine if one named address space is a subset of another. */
static bool
spu_addr_space_subset_p (addr_space_t subset, addr_space_t superset)
{
gcc_assert (subset == ADDR_SPACE_GENERIC || subset == ADDR_SPACE_EA);
gcc_assert (superset == ADDR_SPACE_GENERIC || superset == ADDR_SPACE_EA);
if (subset == superset)
return true;
/* If we have -mno-address-space-conversion, treat __ea and generic as not
being subsets but instead as disjoint address spaces. */
else if (!TARGET_ADDRESS_SPACE_CONVERSION)
return false;
else
return (subset == ADDR_SPACE_GENERIC && superset == ADDR_SPACE_EA);
}
/* Convert from one address space to another. */
static rtx
spu_addr_space_convert (rtx op, tree from_type, tree to_type)
{
addr_space_t from_as = TYPE_ADDR_SPACE (TREE_TYPE (from_type));
addr_space_t to_as = TYPE_ADDR_SPACE (TREE_TYPE (to_type));
gcc_assert (from_as == ADDR_SPACE_GENERIC || from_as == ADDR_SPACE_EA);
gcc_assert (to_as == ADDR_SPACE_GENERIC || to_as == ADDR_SPACE_EA);
if (to_as == ADDR_SPACE_GENERIC && from_as == ADDR_SPACE_EA)
{
rtx result, ls;
ls = gen_const_mem (DImode,
gen_rtx_SYMBOL_REF (Pmode, "__ea_local_store"));
set_mem_align (ls, 128);
result = gen_reg_rtx (Pmode);
ls = force_reg (Pmode, convert_modes (Pmode, DImode, ls, 1));
op = force_reg (Pmode, convert_modes (Pmode, EAmode, op, 1));
ls = emit_conditional_move (ls, NE, op, const0_rtx, Pmode,
ls, const0_rtx, Pmode, 1);
emit_insn (gen_subsi3 (result, op, ls));
return result;
}
else if (to_as == ADDR_SPACE_EA && from_as == ADDR_SPACE_GENERIC)
{
rtx result, ls;
ls = gen_const_mem (DImode,
gen_rtx_SYMBOL_REF (Pmode, "__ea_local_store"));
set_mem_align (ls, 128);
result = gen_reg_rtx (EAmode);
ls = force_reg (EAmode, convert_modes (EAmode, DImode, ls, 1));
op = force_reg (Pmode, op);
ls = emit_conditional_move (ls, NE, op, const0_rtx, Pmode,
ls, const0_rtx, EAmode, 1);
op = force_reg (EAmode, convert_modes (EAmode, Pmode, op, 1));
if (EAmode == SImode)
emit_insn (gen_addsi3 (result, op, ls));
else
emit_insn (gen_adddi3 (result, op, ls));
return result;
}
else
gcc_unreachable ();
}
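
At the source level, the conversions handled by this hook correspond to pointer casts like the sketch below (hypothetical names, hedged example; both directions require the default -maddress-space-conversion). Converting toward the generic space subtracts the __ea_local_store base, so the result is only meaningful for objects that actually reside in the local-store window, and a null pointer remains null in either direction.

extern __ea int ppu_var;        /* hypothetical __ea object */

void
conversion_sketch (int *ls_ptr)
{
  /* generic -> __ea: also allowed implicitly; roughly ea = ls + __ea_local_store.  */
  __ea int *ea_ptr = (__ea int *) ls_ptr;

  /* __ea -> generic: explicit cast; roughly ls = ea - __ea_local_store,
     meaningful only when ppu_var is mapped into the local store.  */
  int *back = (int *) &ppu_var;

  (void) ea_ptr;
  (void) back;
}
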
/* Count the total number of instructions in each pipe and return the
maximum, which is used as the Minimum Iteration Interval (MII)
in the modulo scheduler. get_pipe() will return -2, -1, 0, or 1.
@@ -6534,9 +6984,46 @@ spu_section_type_flags (tree decl, const char *name, int reloc)
/* .toe needs to have type @nobits. */
if (strcmp (name, ".toe") == 0)
return SECTION_BSS;
/* Don't load _ea into the current address space. */
if (strcmp (name, "._ea") == 0)
return SECTION_WRITE | SECTION_DEBUG;
return default_section_type_flags (decl, name, reloc);
}
/* Implement targetm.select_section. */
static section *
spu_select_section (tree decl, int reloc, unsigned HOST_WIDE_INT align)
{
/* Variables and constants defined in the __ea address space
go into a special section named "._ea". */
if (TREE_TYPE (decl) != error_mark_node
&& TYPE_ADDR_SPACE (TREE_TYPE (decl)) == ADDR_SPACE_EA)
{
/* We might get called with string constants, but get_named_section
doesn't like them as they are not DECLs. Also, we need to set
flags in that case. */
if (!DECL_P (decl))
return get_section ("._ea", SECTION_WRITE | SECTION_DEBUG, NULL);
return get_named_section (decl, "._ea", reloc);
}
return default_elf_select_section (decl, reloc, align);
}
/* Implement targetm.unique_section. */
static void
spu_unique_section (tree decl, int reloc)
{
/* We don't support unique section names in the __ea address
space for now. */
if (TREE_TYPE (decl) != error_mark_node
&& TYPE_ADDR_SPACE (TREE_TYPE (decl)) != 0)
return;
default_unique_section (decl, reloc);
}
/* Generate a constant or register which contains 2^SCALE. We assume
the result is valid for MODE. Currently, MODE must be V4SFmode and
SCALE must be SImode. */

gcc/config/spu/spu.h
@@ -51,7 +51,7 @@ extern GTY(()) int spu_tune;
/* Default target_flags if no switches specified. */
#ifndef TARGET_DEFAULT
#define TARGET_DEFAULT (MASK_ERROR_RELOC | MASK_SAFE_DMA | MASK_BRANCH_HINTS \
| MASK_SAFE_HINTS)
| MASK_SAFE_HINTS | MASK_ADDRESS_SPACE_CONVERSION)
#endif
@@ -469,6 +469,17 @@ targetm.resolve_overloaded_builtin = spu_resolve_overloaded_builtin; \
#define ASM_OUTPUT_LABELREF(FILE, NAME) \
asm_fprintf (FILE, "%U%s", default_strip_name_encoding (NAME))
#define ASM_OUTPUT_SYMBOL_REF(FILE, X) \
do \
{ \
tree decl; \
assemble_name (FILE, XSTR ((X), 0)); \
if ((decl = SYMBOL_REF_DECL ((X))) != 0 \
&& TREE_CODE (decl) == VAR_DECL \
&& TYPE_ADDR_SPACE (TREE_TYPE (decl))) \
fputs ("@ppu", FILE); \
} while (0)
/* Instruction Output */
#define REGISTER_NAMES \
@@ -590,6 +601,13 @@ targetm.resolve_overloaded_builtin = spu_resolve_overloaded_builtin; \
} while (0)
/* Address spaces. */
#define ADDR_SPACE_EA 1
/* Named address space keywords. */
#define TARGET_ADDR_SPACE_KEYWORDS ADDR_SPACE_KEYWORD ("__ea", ADDR_SPACE_EA)
/* Builtins. */
enum spu_builtin_type

gcc/config/spu/spu.opt
@@ -82,3 +82,24 @@ Generate code for given CPU
mtune=
Target RejectNegative Joined Var(spu_tune_string)
Schedule code for given CPU
mea32
Target Report RejectNegative Var(spu_ea_model,32) Init(32)
Access variables in 32-bit PPU objects (default)
mea64
Target Report RejectNegative Var(spu_ea_model,64) VarExists
Access variables in 64-bit PPU objects
maddress-space-conversion
Target Report Mask(ADDRESS_SPACE_CONVERSION)
Allow conversions between __ea and generic pointers (default)
mcache-size=
Target Report RejectNegative Joined UInteger
Size (in KB) of software data cache
matomic-updates
Target Report
Atomically write back software data cache lines (default)

gcc/config/spu/spu_cache.h (new file)
@@ -0,0 +1,39 @@
/* Copyright (C) 2008, 2009 Free Software Foundation, Inc.
This file is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your option)
any later version.
This file is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
Under Section 7 of GPL version 3, you are granted additional
permissions described in the GCC Runtime Library Exception, version
3.1, as published by the Free Software Foundation.
You should have received a copy of the GNU General Public License and
a copy of the GCC Runtime Library Exception along with this program;
see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
<http://www.gnu.org/licenses/>. */
#ifndef _SPU_CACHE_H
#define _SPU_CACHE_H
void *__cache_fetch_dirty (__ea void *ea, int n_bytes_dirty);
void *__cache_fetch (__ea void *ea);
void __cache_evict (__ea void *ea);
void __cache_flush (void);
void __cache_touch (__ea void *ea);
#define cache_fetch_dirty(_ea, _n_bytes_dirty) \
__cache_fetch_dirty(_ea, _n_bytes_dirty)
#define cache_fetch(_ea) __cache_fetch(_ea)
#define cache_touch(_ea) __cache_touch(_ea)
#define cache_evict(_ea) __cache_evict(_ea)
#define cache_flush() __cache_flush()
#endif
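
A minimal usage sketch of this interface (the __ea variable and the surrounding function are invented for illustration): cache_fetch_dirty maps the line holding the object into local store and records how many bytes will be dirtied; cache_flush writes all dirty lines back to PPU memory.

#include <spu_cache.h>

extern __ea int ppu_counter;    /* hypothetical PPU-side variable */

void
bump_counter (void)
{
  /* Bring the line holding ppu_counter into local store and mark
     sizeof (int) bytes dirty so the update is written back later.  */
  int *p = (int *) cache_fetch_dirty (&ppu_counter, sizeof (int));
  *p += 1;

  /* Flush every dirty line back to PPU memory now.  */
  cache_flush ();
}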

gcc/config/spu/t-spu-elf
@@ -66,14 +66,39 @@ fp-bit.c: $(srcdir)/config/fp-bit.c $(srcdir)/config/spu/t-spu-elf
# Don't let CTOR_LIST end up in sdata section.
CRTSTUFF_T_CFLAGS =
#MULTILIB_OPTIONS=mlarge-mem/mtest-abi
#MULTILIB_DIRNAMES=large-mem test-abi
#MULTILIB_MATCHES=
# Multi-lib support.
MULTILIB_OPTIONS=mea64
# Neither gcc or newlib seem to have a standard way to generate multiple
# crt*.o files. So we don't use the standard crt0.o name anymore.
EXTRA_MULTILIB_PARTS = crtbegin.o crtend.o
EXTRA_MULTILIB_PARTS = crtbegin.o crtend.o libgcc_cachemgr.a libgcc_cachemgr_nonatomic.a \
libgcc_cache8k.a libgcc_cache16k.a libgcc_cache32k.a libgcc_cache64k.a libgcc_cache128k.a
$(T)cachemgr.o: $(srcdir)/config/spu/cachemgr.c
$(GCC_FOR_TARGET) $(LIBGCC2_CFLAGS) $(MULTILIB_CFLAGS) -c $< -o $@
# Specialised rule to add a -D flag.
$(T)cachemgr_nonatomic.o: $(srcdir)/config/spu/cachemgr.c
$(GCC_FOR_TARGET) $(LIBGCC2_CFLAGS) $(MULTILIB_CFLAGS) -DNONATOMIC -c $< -o $@
$(T)libgcc_%.a: $(T)%.o
$(AR_FOR_TARGET) -rcs $@ $<
$(T)cache8k.o: $(srcdir)/config/spu/cache.S
$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -D__CACHE_SIZE__=8 -o $@ -c $<
$(T)cache16k.o: $(srcdir)/config/spu/cache.S
$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -D__CACHE_SIZE__=16 -o $@ -c $<
$(T)cache32k.o: $(srcdir)/config/spu/cache.S
$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -D__CACHE_SIZE__=32 -o $@ -c $<
$(T)cache64k.o: $(srcdir)/config/spu/cache.S
$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -D__CACHE_SIZE__=64 -o $@ -c $<
$(T)cache128k.o: $(srcdir)/config/spu/cache.S
$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -D__CACHE_SIZE__=128 -o $@ -c $<
LIBGCC = stmp-multilib
INSTALL_LIBGCC = install-multilib

gcc/doc/invoke.texi
@@ -846,7 +846,11 @@ See RS/6000 and PowerPC Options.
-msafe-dma -munsafe-dma @gol
-mbranch-hints @gol
-msmall-mem -mlarge-mem -mstdmain @gol
-mfixed-range=@var{register-range}}
-mfixed-range=@var{register-range} @gol
-mea32 -mea64 @gol
-maddress-space-conversion -mno-address-space-conversion @gol
-mcache-size=@var{cache-size} @gol
-matomic-updates -mno-atomic-updates}
@emph{System V Options}
@gccoptlist{-Qy -Qn -YP,@var{paths} -Ym,@var{dir}}
@@ -16358,6 +16362,46 @@ useful when compiling kernel code. A register range is specified as
two registers separated by a dash. Multiple register ranges can be
specified separated by a comma.
@item -mea32
@itemx -mea64
@opindex mea32
@opindex mea64
Compile code assuming that pointers to the PPU address space accessed
via the @code{__ea} named address space qualifier are either 32 or 64
bits wide. The default is 32 bits. As this is an ABI changing option,
all object code in an executable must be compiled with the same setting.
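
Concretely, the option only changes how wide such pointers are; a hypothetical snippet (not part of the patch):

extern __ea char ppu_buffer[];  /* data residing in PPU memory */
__ea char *p = ppu_buffer;      /* sizeof (p) is 4 with -mea32, 8 with -mea64 */
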
@item -maddress-space-conversion
@itemx -mno-address-space-conversion
@opindex maddress-space-conversion
@opindex mno-address-space-conversion
Allow/disallow treating the @code{__ea} address space as superset
of the generic address space. This enables explicit type casts
between @code{__ea} and generic pointer as well as implicit
conversions of generic pointers to @code{__ea} pointers. The
default is to allow address space pointer conversions.
@item -mcache-size=@var{cache-size}
@opindex mcache-size
This option controls the version of libgcc that the compiler links to an
executable and selects a software-managed cache for accessing variables
in the @code{__ea} address space with a particular cache size. Possible
options for @var{cache-size} are @samp{8}, @samp{16}, @samp{32}, @samp{64}
and @samp{128}. The default cache size is 64KB.
@item -matomic-updates
@itemx -mno-atomic-updates
@opindex matomic-updates
@opindex mno-atomic-updates
This option controls the version of libgcc that the compiler links to an
executable and selects whether atomic updates to the software-managed
cache of PPU-side variables are used. If you use atomic updates, changes
to a PPU variable from SPU code using the @code{__ea} named address space
qualifier will not interfere with changes to other PPU variables residing
in the same cache line from PPU code. If you do not use atomic updates,
such interference may occur; however, writing back cache lines will be
more efficient. The default behavior is to use atomic updates.
@item -mdual-nops
@itemx -mdual-nops=@var{n}
@opindex mdual-nops