Commit Graph

267 Commits

Author SHA1 Message Date
Nobuhiro Iwamatsu d8902adcc1 dmaengine: sh: Add Support SuperH DMA Engine driver
This supported all DMA channels, and it was tested in SH7722,
SH7780, SH7785 and SH7763.
This can not use with SH DMA API.

Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Reviewed-by: Matt Fleming <matt@console-pimps.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:56:02 -07:00
Dan Williams bbb20089a3 Merge branch 'dmaengine' into async-tx-next
Conflicts:
	crypto/async_tx/async_xor.c
	drivers/dma/ioat/dma_v2.h
	drivers/dma/ioat/pci.c
	drivers/md/raid5.c
2009-09-08 17:55:21 -07:00
Dan Williams 3e48e65690 Merge branch 'iop-raid6' into async-tx-next 2009-09-08 17:53:57 -07:00
Atsushi Nemoto 657a77fa72 dmaengine: Move all map_sg/unmap_sg for slave channel to its client
Dan Williams wrote:
... DMA-slave clients request specific channels and know the hardware
details at a low level, so it should not be too high an expectation to
push dma mapping responsibility to the client.

Also this patch includes DMA_COMPL_{SRC,DEST}_UNMAP_SINGLE support for
dw_dmac driver.

Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:05 -07:00
Ira Snyder bbea0b6e0d fsldma: Add DMA_SLAVE support
Use the DMA_SLAVE capability of the DMAEngine API to copy/from a
scatterlist into an arbitrary list of hardware address/length pairs.

This allows a single DMA transaction to copy data from several different
devices into a scatterlist at the same time.

This also adds support to enable some controller-specific features such as
external start and external pause for a DMA transaction.

[dan.j.williams@intel.com: rebased on tx_list movement]
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Acked-by: Li Yang <leoli@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:04 -07:00
Ira Snyder e6c7ecb64e fsldma: split apart external pause and request count features
When using the Freescale DMA controller in external control mode, both the
request count and external pause bits need to be setup correctly. This was
being done with the same function.

The 83xx controller lacks the external pause feature, but has a similar
feature called external start. This feature requires that the request count
bits be setup correctly.

Split the function into two parts, to make it possible to use the external
start feature on the 83xx controller.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:04 -07:00
Dan Williams 162b96e63e ioat2,3: cacheline align software descriptor allocations
All the necessary fields for handling an ioat2,3 ring entry can fit into
one cacheline.  Move ->len prior to ->txd in struct ioat_ring_ent, and
move allocation of these entries to a hw-cache-aligned kmem cache to
reduce the number of cachelines dirtied for descriptor management.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:04 -07:00
Dan Williams 0803172778 dmaengine: kill tx_list
The tx_list attribute of struct dma_async_tx_descriptor is common to
most, but not all dma driver implementations.  None of the upper level
code (dmaengine/async_tx) uses it, so allow drivers to implement it
locally if they need it.  This saves sizeof(struct list_head) bytes for
drivers that do not manage descriptors with a linked list (e.g.: ioatdma
v2,3).

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:04 -07:00
Dan Williams 1979b186b8 txx9dmac: implement a private tx_list
Drop txx9dmac's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.

Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:03 -07:00
Dan Williams 285a3c7164 at_hdmac: implement a private tx_list
Drop at_hdmac's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.

Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:03 -07:00
Dan Williams 64203b6727 mv_xor: implement a private tx_list
Drop mv_xor's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.

Cc: Saeed Bishara <saeed@marvell.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:03 -07:00
Dan Williams ea25968a32 ioat: implement a private tx_list
Drop ioatdma's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.

Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:02 -07:00
Dan Williams 308136d1ab iop-adma: implement a private tx_list
Drop iop-adma's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:02 -07:00
Dan Williams eda3423457 fsldma: implement a private tx_list
Drop fsldma's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.

Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:02 -07:00
Dan Williams e0bd0f8cb0 dw_dmac: implement a private tx_list
Drop dw_dmac's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.

Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:53:02 -07:00
Dan Williams e12c4fa377 Merge branch 'ioat' into dmaengine 2009-09-08 17:52:57 -07:00
Roland Dreier a6417dd58d I/OAT: Convert to PCI_VDEVICE()
Trivial cleanup to make the PCI ID table easier to read.

[dan.j.williams@intel.com: extended to v3.2 devices]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:43:03 -07:00
Roland Dreier 6506cbca6b Add MODULE_DEVICE_TABLE() so ioatdma module is autoloaded
The ioatdma module is missing aliases for the PCI devices it supports,
so it is not autoloaded on boot.  Add a MODULE_DEVICE_TABLE() to get
these aliases.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:43:03 -07:00
Dan Williams e3232714d4 ioat3: segregate raid engines
The cleanup routine for the raid cases imposes extra checks for handling
raid descriptors and extended descriptors.  If the channel does not
support raid it can avoid this extra overhead by using the ioat2 cleanup
path.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:43:02 -07:00
Tom Picard b265b11fc1 ioat3: ioat3.2 pci ids for Jasper Forest
Jasper Forest introduces raid offload support via ioat3.2 support.  When
raid offload is enabled two (out of 8 channels) will report raid5/raid6
offload capabilities.  The remaining channels will only report ioat3.0
capabilities (memcpy).

Signed-off-by: Tom Picard <tom.s.picard@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:43:01 -07:00
Dan Williams 58c8649e0e ioat3: interrupt descriptor support
The async_tx api uses the DMA_INTERRUPT operation type to terminate a
chain of issued operations with a callback routine.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:43:00 -07:00
Dan Williams ae786624c2 ioat3: support xor via pq descriptors
If a platform advertises pq capabilities, but not xor, then use
ioat3_prep_pqxor and ioat3_prep_pqxor_val to simulate xor support.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:43:00 -07:00
Dan Williams d69d235b7d ioat3: pq support
ioat3.2 adds support for raid6 syndrome generation (xor sum of galois
field multiplication products) using up to 8 sources.  It can also
perform an pq-zero-sum operation to validate whether the syndrome for a
given set of sources matches a previously computed syndrome.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:59 -07:00
Dan Williams 9de6fc717b ioat3: xor self test
This adds a hardware specific self test to be called from ioat_probe.
In the ioat3 case we will have tests for all the different raid
operations, while ioat1 and ioat2 will continue to just test memcpy.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:58 -07:00
Dan Williams b094ad3be5 ioat3: xor support
ioat3.2 adds xor offload support for up to 8 sources.  It can also
perform an xor-zero-sum operation to validate whether all given sources
sum to zero, without writing to a destination.  Xor descriptors differ
from memcpy in that one operation may require multiple descriptors
depending on the number of sources.  When the number of sources exceeds
5 an extended descriptor is needed.  These descriptors need to be
accounted for when updating the DMA_COUNT register.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:57 -07:00
Dan Williams e61dacaeb3 ioat3: enable dca for completion writes
Tag completion writes for direct cache access to reduce the latency of
checking for descriptor completions.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:57 -07:00
Dan Williams 5669e31c5a ioat: add 'ioat' sysfs attributes
Export driver attributes for diagnostic purposes:
'ring_size': total number of descriptors available to the engine
'ring_active': number of descriptors in-flight
'capabilities': supported operation types for this channel
'version': Intel(R) QuickData specfication revision

This also allows some chattiness to be removed from the driver startup
as this information is now available via sysfs.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:56 -07:00
Dan Williams bf40a6869c ioat3: split ioat3 support to its own file, add memset
Up until this point the driver for Intel(R) QuickData Technology
engines, specification versions 2 and 3, were mostly identical save for
a few quirks.  Version 3.2 hardware adds many new capabilities (like
raid offload support) requiring some infrastructure that is not relevant
for v2.  For better code organization of the new funcionality move v3
and v3.2 support to its own file dma_v3.c, and export some routines from
the base files (dma.c and dma_v2.c) that can be reused directly.

The first new capability included in this code reorganization is support
for v3.2 memset operations.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:55 -07:00
Dan Williams 2aec048cdc ioat3: hardware version 3.2 register / descriptor definitions
ioat3.2 adds raid5 and raid6 offload capabilities.

Signed-off-by: Tom Picard <tom.s.picard@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:54 -07:00
Dan Williams 128f2d567f ioat2+: add fence support
In preparation for adding more operation types to the ioat3 path the
driver needs to honor the DMA_PREP_FENCE flag.  For example the async_tx api
will hand xor->memcpy->xor chains to the driver with the 'fence' flag set on
the first xor and the memcpy operation.  This flag in turn sets the 'fence'
flag in the descriptor control field telling the hardware that future
descriptors in the chain depend on the result of the current descriptor, so
wait for all writes to complete before starting the next operation.

Note that ioat1 does not prefetch the descriptor chain, so does not
require/support fenced operations.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:53 -07:00
Dan Williams 83544ae9f3 dmaengine, async_tx: support alignment checks
Some engines have transfer size and address alignment restrictions.  Add
a per-operation alignment property to struct dma_device that the async
routines and dmatest can use to check alignment capabilities.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:53 -07:00
Dan Williams 9308add6ea dmaengine: cleanup unused transaction types
No drivers currently implement these operation types, so they can be
deleted.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:52 -07:00
Dan Williams 138f4c359d dmaengine, async_tx: add a "no channel switch" allocator
Channel switching is problematic for some dmaengine drivers as the
architecture precludes separating the ->prep from ->submit.  In these
cases the driver can select ASYNC_TX_DISABLE_CHANNEL_SWITCH to modify
the async_tx allocator to only return channels that support all of the
required asynchronous operations.

For example MD_RAID456=y selects support for asynchronous xor, xor
validate, pq, pq validate, and memcpy.  When
ASYNC_TX_DISABLE_CHANNEL_SWITCH=y any channel with all these
capabilities is marked DMA_ASYNC_TX allowing async_tx_find_channel() to
quickly locate compatible channels with the guarantee that dependency
chains will remain on one channel.  When
ASYNC_TX_DISABLE_CHANNEL_SWITCH=n async_tx_find_channel() may select
channels that lead to operation chains that need to cross channel
boundaries using the async_tx channel switch capability.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:51 -07:00
Dan Williams f9dd213437 Merge branch 'md-raid6-accel' into ioat3.2
Conflicts:
	include/linux/dmaengine.h
2009-09-08 17:42:29 -07:00
Dan Williams 4b652f0db3 net_dma: poll for a descriptor after allocation failure
Handle descriptor allocation failures by polling for a descriptor.  The
driver will force forward progress when polled.  In the best case this
polling interval will be the time it takes for one dma memcpy
transaction to complete.  In the worst case, channel hang, we will need
to wait 100ms for the cleanup watchdog to fire (ioatdma driver).

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:38:54 -07:00
Dan Williams a309218ace ioat2,3: dynamically resize descriptor ring
Increment the allocation order of the descriptor ring every time we run
out of descriptors up to a maximum of allocation order specified by the
module parameter 'ioat_max_alloc_order'.  After each idle period
decrement the allocation order to a minimum order of
'ioat_ring_alloc_order' (i.e. the default ring size, tunable as a module
parameter).

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:38:54 -07:00
Dan Williams 09c8a5b85e ioat: switch watchdog and reset handler from workqueue to timer
In order to support dynamic resizing of the descriptor ring or polling
for a descriptor in the presence of a hung channel the reset handler
needs to make progress while in a non-preemptible context.  The current
workqueue implementation precludes polling channel reset completion
under spin_lock().

This conversion also allows us to return to opportunistic cleanup in the
ioat2 case as the timer implementation guarantees at least one cleanup
after every descriptor is submitted.  This means the worst case
completion latency becomes the timer frequency (for exceptional
circumstances), but with the benefit of avoiding busy waiting when the
lock is contended.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:30:24 -07:00
Dan Williams ad643f54c8 ioat1: trim ioat_dma_desc_sw
Save 4 bytes per software descriptor by transmitting tx_cnt in an unused
portion of the hardware descriptor.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:30:24 -07:00
Dan Williams 345d852391 ioat: ___devinit annotate the initialization paths
Mark all single use initialization routines with __devinit.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:30:24 -07:00
Dan Williams f6ab95b557 ioat: preserve chanctrl bits when re-arming interrupts
The register write in ioat_dma_cleanup_tasklet is unfortunate in two
ways:
1/ It clears the extra 'enable' bits that we set at alloc_chan_resources time
2/ It gives the impression that it disables interrupts when it is in
   fact re-arming interrupts

[ Impact: fix, persist the value of the chanctrl register when re-arming ]

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:30:24 -07:00
Dan Williams bb32078630 ioat: ignore reserved bits for chancnt and xfercap
Don't trust that the reserved bits are always zero, also sanity check
the returned value.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:30:24 -07:00
Dan Williams 4fb9b9e8d5 ioat: cleanup completion status reads
The cleanup path makes an effort to only perform an atomic read of the
64-bit completion address.  However in the 32-bit case it does not
matter if we read the upper-32 and lower-32 non-atomically because the
upper-32 will always be zero.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:30:24 -07:00
Dan Williams 6df9183a15 ioat: add some dev_dbg() calls
Provide some output for debugging the driver.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:30:23 -07:00
Dan Williams 38e12f64a1 ioat1: kill unused unmap parameters
The unified ioat1/ioat2 ioat_dma_unmap() implementation derives the
source and dest addresses from the unmap descriptor.  There is no longer
a need to track this information in struct ioat_desc_sw.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:30:23 -07:00
Dan Williams 5cbafa65b9 ioat2,3: convert to a true ring buffer
Replace the current linked list munged into a ring with a native ring
buffer implementation.  The benefit of this approach is reduced overhead
as many parameters can be derived from ring position with simple pointer
comparisons and descriptor allocation/freeing becomes just a
manipulation of head/tail pointers.

It requires a contiguous allocation for the software descriptor
information.

Since this arrangement is significantly different from the ioat1 chain,
move ioat2,3 support into its own file and header.  Common routines are
exported from driver/dma/ioat/dma.[ch].

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:29:55 -07:00
Dan Williams dcbc853af6 ioat: prepare the code for ioat[12]_dma_chan split
Prepare the code for the conversion of the ioat2 linked-list-ring into a
native ring buffer.  After this conversion ioat2 channels will share
less of the ioat1 infrastructure, but there will still be places where
sharing is possible.  struct ioat_chan_common is created to house the
channel attributes that will remain common between ioat1 and ioat2
channels.

For every routine that accesses both common and hardware specific fields
the old unified 'ioat_chan' pointer is split into an 'ioat' and  'chan'
pointer.  Where 'chan' references common fields and 'ioat' the
hardware/version specific.

[ Impact: pure structure member movement/variable renames, no logic changes ]

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:29:55 -07:00
Dan Williams a6a39ca1ba ioat: fix self test interrupts
If a callback is to be attached to a descriptor the channel needs to
know at ->prep time so it can set the interrupt enable bit.  This is in
preparation for moving descriptor ioat2 descriptor preparation from
->submit to ->prep.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:29:55 -07:00
Dan Williams a0587bcf3e ioat1: move descriptor allocation from submit to prep
The async_tx api assumes that after a successful ->prep a subsequent
->submit will not fail due to a lack of resources.

This also fixes a bug in the allocation failure case.  Previously the
descriptors allocated prior to the allocation failure would not be
returned to the free list.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:29:55 -07:00
Dan Williams c7984f4e4e ioat: define descriptor control bit-field
This cleans up a mess of and'ing and or'ing bit definitions, and allows
simple assignments from the specified dma_ctrl_flags parameter.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:29:55 -07:00
Dan Williams 77867fff03 ioat: fix type mismatch for ->dmacount
->dmacount tracks the sequence number of active descriptors.  It is
written to the DMACOUNT register to update the channel's view of pending
descriptors in the chain.  The register is 16-bits so ->dmacount should
be unsigned and 16-bit as well.  Also modify ->desccount to maintain
alignment.

This was never a problem in practice because we never compared dmacount
values, but this is a bug waiting to happen.

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:29:54 -07:00