Commit Graph

194 Commits

Author SHA1 Message Date
Tejun Heo
8ea7645c5a libata: leave port thawed after reset failure
libata EH intentionally left a port frozen if it failed
ata_eh_reset().  The intention was avoiding continuous loop of resets
when the controller or attached device is flaky and reporting spurious
hotplug events.  Once port enters this state, it can be recovered with
manual rescan, which seemed reasonable.

However, outside of my convoluted test setup, there have been very few
reports justifying this choice while there have been more cases where
the automatic freezing of the port after hotplug attempt of a faulty
device caused confusion and led to unnecessary resets.

This patch changes the behavior so that the port is thawed after reset
failure.  This change doesn't necessarily solve but makes it easier
and more intuitive to work around hotplug related problems
(ie. re-pluggin or power cycling the device) as reported in the
followings.

  https://bugzilla.kernel.org/show_bug.cgi?id=34712
  http://thread.gmane.org/gmane.linux.kernel/1123265/focus=49548

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Reartes Guillermo <rtguille@gmail.com>
Reported-by: Bruce Stenning <b.stenning@indigovision.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23 17:57:36 -04:00
Joe Perches
a9a79dfec2 ata: Convert ata_<foo>_printk(KERN_<LEVEL> to ata_<foo>_<level>
Saves text by removing nearly duplicated text format strings by
creating ata_<foo>_printk functions and printf extension %pV.

ata defconfig size shrinks ~5% (~8KB), allyesconfig ~2.5% (~13KB)

Format string duplication comes from:

 #define ata_link_printk(link, lv, fmt, args...) do { \
       if (sata_pmp_attached((link)->ap) || (link)->ap->slave_link)    \
               printk("%sata%u.%02u: "fmt, lv, (link)->ap->print_id,   \
                      (link)->pmp , ##args); \
       else \
               printk("%sata%u: "fmt, lv, (link)->ap->print_id , ##args); \
       } while(0)

Coalesce long formats.

$ size drivers/ata/built-in.*
   text	   data	    bss	    dec	    hex	filename
 544969	  73893	 116584	 735446	  b38d6	drivers/ata/built-in.allyesconfig.ata.o
 558429	  73893	 117864	 750186	  b726a	drivers/ata/built-in.allyesconfig.dev_level.o
 141328	  14689	   4220	 160237	  271ed	drivers/ata/built-in.defconfig.ata.o
 149567	  14689	   4220	 168476	  2921c	drivers/ata/built-in.defconfig.dev_level.o

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23 17:57:36 -04:00
Tejun Heo
8c56cacc72 libata: fix unexpectedly frozen port after ata_eh_reset()
To work around controllers which can't properly plug events while
reset, ata_eh_reset() clears error states and ATA_PFLAG_EH_PENDING
after reset but before RESET is marked done.  As reset is the final
recovery action and full verification of devices including onlineness
and classfication match is done afterwards, this shouldn't lead to
lost devices or missed hotplug events.

Unfortunately, it forgot to thaw the port when clearing EH_PENDING, so
if the condition happens after resetting an empty port, the port could
be left frozen and EH will end without thawing it, making the port
unresponsive to further hotplug events.

Thaw if the port is frozen after clearing EH_PENDING.  This problem is
reported by Bruce Stenning in the following thread.

 http://thread.gmane.org/gmane.linux.kernel/1123265

stable: I think we should weather this patch a bit longer in -rcX
	before sending it to -stable.  Please wait at least a month
	after this patch makes upstream.  Thanks.

-v2: Fixed spelling in the comment per Dave Howorth.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Bruce Stenning <b.stenning@indigovision.com>
Cc: stable@kernel.org
Cc: Dave Howorth <dhoworth@mrc-lmb.cam.ac.uk>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-06-07 15:55:55 -04:00
Kristen Carlson Accardi
8a745f1f39 libata: Power off empty ports
Give users the option of completely powering off unoccupied
SATA ports using the existing min_power link_power_management_policy
option.  When the use selects this option on an empty port, we
will power the port off by setting DET to off.  For occupied ports,
behavior is unchanged.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-05-19 20:50:53 -04:00
Tejun Heo
5f6f12ccf3 libata: fix oops when LPM is used with PMP
ae01b2493c (libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65)
added ATA_FLAG_NO_DIPM and made ata_eh_set_lpm() check the flag.
However, @ap is NULL if @link points to a PMP link and thus the
unconditional @ap->flags dereference leads to the following oops.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
  IP: [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
  ...
  Pid: 295, comm: scsi_eh_4 Tainted: P            2.6.38.5-core2 #1 System76, Inc. Serval Professional/Serval Professional
  RIP: 0010:[<ffffffff813f98e1>]  [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
  RSP: 0018:ffff880132defbf0  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff880132f40000 RCX: 0000000000000000
  RDX: ffff88013377c000 RSI: ffff880132f40000 RDI: 0000000000000000
  RBP: ffff880132defce0 R08: ffff88013377dc58 R09: ffff880132defd98
  R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000000
  R13: 0000000000000000 R14: ffff88013377c000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff8800bf700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000018 CR3: 0000000001a03000 CR4: 00000000000406e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process scsi_eh_4 (pid: 295, threadinfo ffff880132dee000, task ffff880133b416c0)
  Stack:
   0000000000000000 ffff880132defcc0 0000000000000000 ffff880132f42738
   ffffffff813ee8f0 ffffffff813eefe0 ffff880132defd98 ffff88013377f190
   ffffffffa00b3e30 ffffffff813ef030 0000000032defc60 ffff880100000000
  Call Trace:
   [<ffffffff81400867>] sata_pmp_error_handler+0x607/0xc30
   [<ffffffffa00b273f>] ahci_error_handler+0x1f/0x70 [libahci]
   [<ffffffff813faade>] ata_scsi_error+0x5be/0x900
   [<ffffffff813cf724>] scsi_error_handler+0x124/0x650
   [<ffffffff810834b6>] kthread+0x96/0xa0
   [<ffffffff8100cd64>] kernel_thread_helper+0x4/0x10
  Code: 8b 95 70 ff ff ff b8 00 00 00 00 48 3b 9a 10 2e 00 00 48 0f 44 c2 48 89 85 70 ff ff ff 48 8b 8d 70 ff ff ff f6 83 69 02 00 00 01 <48> 8b 41 18 0f 85 48 01 00 00 48 85 c9 74 12 48 8b 51 08 48 83
  RIP  [<ffffffff813f98e1>] ata_eh_recover+0x9a1/0x1510
   RSP <ffff880132defbf0>
  CR2: 0000000000000018

Fix it by testing @link->ap->flags instead.

stable: ATA_FLAG_NO_DIPM was added during 2.6.39 cycle but was
        backported to 2.6.37 and 38.  This is a fix for that and thus
        also applicable to 2.6.37 and 38.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: "Nathan A. Mourey II" <nmoureyii@ne.rr.com>
LKML-Reference: <1304555277.2059.2.camel@localhost.localdomain>
Cc: Connor H <cmdkhh@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-05-14 14:51:40 -04:00
Tejun Heo
ae01b2493c libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM
is used.  Implement ATA_FLAG_NO_DIPM and apply it.

This problem was reported by Stefan Bader in the following thread.

 http://thread.gmane.org/gmane.linux.ide/48841

stable: applicable to 2.6.37 and 38.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:32:16 -04:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
James Bottomley
0e0b494ca8 libata: separate error handler into usable components
Right at the moment, the libata error handler is incredibly
monolithic.  This makes it impossible to use from composite drivers
like libsas and ipr which have to handle error themselves in the first
instance.

The essence of the change is to split the monolithic error handler
into two components: one which handles a queue of ata commands for
processing and the other which handles the back end of readying a
port.  This allows the upper error handler fine grained control in
calling libsas functions (and making sure they only get called for ATA
commands whose lower errors have been fixed up).

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02 02:36:45 -05:00
James Bottomley
c34aeebc06 libata: fix eh locking
The SCSI host eh_cmd_q should be protected by the host lock (not the
port lock).  This probably doesn't matter that much at the moment,
since we try to serialise the add and eh pieces, but it might matter
in future for more convenient error handling.  Plus this switches
libata to the standard eh pattern where you lock, remove from the cmd
queue to a local list and unlock and then operate on the local list.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02 02:36:45 -05:00
Tejun Heo
eb0e85e36b libata: fix hotplug for drivers which don't implement LPM
ata_eh_analyze_serror() suppresses hotplug notifications if LPM is
being used because LPM generates spurious hotplug events.  It compared
whether link->lpm_policy was different from ATA_LPM_MAX_POWER to
determine whether LPM is enabled; however, this is incorrect as for
drivers which don't implement LPM, lpm_policy is always
ATA_LPM_UNKNOWN.  This disabled hotplug detection for all drivers
which don't implement LPM.

Fix it by comparing whether lpm_policy is greater than
ATA_LPM_MAX_POWER.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02 02:34:20 -05:00
Tejun Heo
e5005b15c9 libata: issue DIPM enable commands with LPM state updated
Low level drivers may behave differently depending on the current
link->lpm_policy.  During ata_eh_set_lpm(), DIPM enable commands are
issued after the successful completion of ap->ops->set_lpm(), which
means that the controller is already in the target state.  This causes
DIPM enable commands to be processed with mismatching controller power
state and link->lpm_policy value.

In ahci, link->lpm_policy is used to ignore certain PHY events if LPM
is enabled; however, as DIPM commands are issued with stale
link->lpm_policy, they sometimes end up triggering these conditions
and get aborted leading to LPM configuration failure.

Fix it by updating link->lpm_policy before issuing DIPM enable
commands.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-12-24 13:34:34 -05:00
Tejun Heo
c0c362b60e libata: implement cross-port EH exclusion
In libata, the non-EH code paths should always take and release
ap->lock explicitly when accessing hardware or shared data structures.
However, once EH is active, it's assumed that the port is owned by EH
and EH methods don't explicitly take ap->lock unless race from irq
handler or other code paths are expected.  However, libata EH didn't
guarantee exclusion among EHs for ports of the same host.  IOW,
multiple EHs may execute in parallel on multiple ports of the same
controller.

In many cases, especially in SATA, the ports are completely
independent of each other and this doesn't cause problems; however,
there are cases where different ports share the same resource, which
lead to obscure timing related bugs such as the one fixed by commit
213373cf (ata_piix: fix locking around SIDPR access).

This patch implements exclusion among EHs of the same host.  When EH
begins, it acquires per-host EH ownership by calling ata_eh_acquire().
When EH finishes, the ownership is released by calling
ata_eh_release().  EH ownership is also released whenever the EH
thread goes to sleep from ata_msleep() or explicitly and reacquired
after waking up.

This ensures that while EH is actively accessing the hardware, it has
exclusive access to it while allowing EHs to interleave and progress
in parallel as they hit waiting stages, which dominate the time spent
in EH.  This achieves cross-port EH exclusion without pervasive and
fragile changes while still allowing parallel EH for the most part.

This was first reported by yuanding02@gmail.com more than three years
ago in the following bugzilla.  :-)

  https://bugzilla.kernel.org/show_bug.cgi?id=8223

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reported-by: yuanding02@gmail.com
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:05 -04:00
Tejun Heo
97750cebb3 libata: add @ap to ata_wait_register() and introduce ata_msleep()
Add optional @ap argument to ata_wait_register() and replace msleep()
calls with ata_msleep() which take optional @ap in addition to the
duration.  These will be used to implement EH exclusion.

This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:05 -04:00
Tejun Heo
6c8ea89cec libata: implement LPM support for port multipliers
Port multipliers can do DIPM on fan-out links fine.  Implement support
for it.  Tested w/ SIMG 57xx and marvell PMPs.  Both the host and
fan-out links enter power save modes nicely.

SIMG 37xx and 47xx report link offline on SStatus causing EH to detach
the devices.  Blacklisted.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:04 -04:00
Tejun Heo
6b7ae9545a libata: reimplement link power management
The current LPM implementation has the following issues.

* Operation order isn't well thought-out.  e.g. HIPM should be
  configured after IPM in SControl is properly configured.  Not the
  other way around.

* Suspend/resume paths call ata_lpm_enable/disable() which must only
  be called from EH context directly.  Also, ata_lpm_enable/disable()
  were called whether LPM was in use or not.

* Implementation is per-port when it should be per-link.  As a result,
  it can't be used for controllers with slave links or PMP.

* LPM state isn't managed consistently.  After a link reset for
  whatever reason including suspend/resume the actual LPM state would
  be reset leaving ap->lpm_policy inconsistent.

* Generic/driver-specific logic boundary isn't clear.  Currently,
  libahci has to mangle stuff which libata EH proper should be
  handling.  This makes the implementation unnecessarily complex and
  fragile.

* Tied to ALPM.  Doesn't consider DIPM only cases and doesn't check
  whether the device allows HIPM.

* Error handling isn't implemented.

Given the extent of mismatch with the rest of libata, I don't think
trying to fix it piecewise makes much sense.  This patch reimplements
LPM support.

* The new implementation is per-link.  The target policy is still
  port-wide (ap->target_lpm_policy) but all the mechanisms and states
  are per-link and integrate well with the rest of link abstraction
  and can work with slave and PMP links.

* Core EH has proper control of LPM state.  LPM state is reconfigured
  when and only when reconfiguration is necessary.  It makes sure that
  LPM state is reset when probing for new device on the link.
  Controller agnostic logic is now implemented in libata EH proper and
  driver implementation only has to deal with controller specifics.

* Proper error handling.  LPM config failure is attributed to the
  device on the link and LPM is disabled for the link if it fails
  repeatedly.

* ops->enable/disable_pm() are replaced with single ops->set_lpm()
  which takes @policy and @hints.  This simplifies driver specific
  implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:04 -04:00
Tejun Heo
c93b263e0d libata: clean up lpm related symbols and sysfs show/store functions
Link power management related symbols are in confusing state w/ mixed
usages of lpm, ipm and pm.  This patch cleans up lpm related symbols
and sysfs show/store functions as follows.

* lpm states - NOT_AVAILABLE, MIN_POWER, MAX_PERFORMANCE and
  MEDIUM_POWER are renamed to ATA_LPM_UNKNOWN and
  ATA_LPM_{MIN|MAX|MED}_POWER.

* Pre/postfixes are unified to lpm.

* sysfs show/store functions for link_power_management_policy were
  curiously named get/put and unnecessarily complex.  Renamed to
  show/store and simplified.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:04 -04:00
Gwendal Grignou
d9027470b8 [libata] Add ATA transport class
This is a scheleton for libata transport class.
All information is read only, exporting information from libata:
- ata_port class: one per ATA port
- ata_link class: one per ATA port or 15 for SATA Port Multiplier
- ata_device class: up to 2 for PATA link, usually one for SATA.

Signed-off-by: Gwendal Grignou <gwendal@google.com>
Reviewed-by: Grant Grundler <grundler@google.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-10-21 20:21:03 -04:00
Tejun Heo
e2f3d75fc0 libata: skip EH autopsy and recovery during suspend
For some mysterious reason, certain hardware reacts badly to usual EH
actions while the system is going for suspend.  As the devices won't
be needed until the system is resumed, ask EH to skip usual autopsy
and recovery and proceed directly to suspend.

Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09 22:27:59 -04:00
Linus Torvalds
3b7433b8a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
  workqueue: mark init_workqueues() as early_initcall()
  workqueue: explain for_each_*cwq_cpu() iterators
  fscache: fix build on !CONFIG_SYSCTL
  slow-work: kill it
  gfs2: use workqueue instead of slow-work
  drm: use workqueue instead of slow-work
  cifs: use workqueue instead of slow-work
  fscache: drop references to slow-work
  fscache: convert operation to use workqueue instead of slow-work
  fscache: convert object to use workqueue instead of slow-work
  workqueue: fix how cpu number is stored in work->data
  workqueue: fix mayday_mask handling on UP
  workqueue: fix build problem on !CONFIG_SMP
  workqueue: fix locking in retry path of maybe_create_worker()
  async: use workqueue for worker pool
  workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
  workqueue: implement unbound workqueue
  workqueue: prepare for WQ_UNBOUND implementation
  libata: take advantage of cmwq and remove concurrency limitations
  workqueue: fix worker management invocation without pending works
  ...

Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
2010-08-07 12:42:58 -07:00
FUJITA Tomonori
acad76272c [libata] add ATA_CMD_DSM to ata_get_cmd_descript
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-08-01 19:36:03 -04:00
Tejun Heo
ad72cf9885 libata: take advantage of cmwq and remove concurrency limitations
libata has two concurrency related limitations.

a. ata_wq which is used for polling PIO has single thread per CPU.  If
   there are multiple devices doing polling PIO on the same CPU, they
   can't be executed simultaneously.

b. ata_aux_wq which is used for SCSI probing has single thread.  In
   cases where SCSI probing is stalled for extended period of time
   which is possible for ATAPI devices, this will stall all probing.

#a is solved by increasing maximum concurrency of ata_wq.  Please note
that polling PIO might be used under allocation path and thus needs to
be served by a separate wq with a rescuer.

#b is solved by using the default wq instead and achieving exclusion
via per-port mutex.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
2010-07-02 10:59:24 +02:00
Tejun Heo
fe06e5f9b7 libata-sff: separate out BMDMA EH
Some of error handling logic in ata_sff_error_handler() and all of
ata_sff_post_internal_cmd() are for BMDMA.  Create
ata_bmdma_error_handler() and ata_bmdma_post_internal_cmd() and move
BMDMA part into those.

While at it, change DMA protocol check to ata_is_dma(), fix
post_internal_cmd to call ap->ops->bmdma_stop instead of directly
calling ata_bmdma_stop() and open code hardreset selection so that
ata_std_error_handler() doesn't have to know about sff hardreset.

As these two functions are BMDMA specific, there's no reason to check
for bmdma_addr before calling bmdma methods if the protocol of the
failed command is DMA.  sata_mv and pata_mpc52xx now don't need to set
.post_internal_cmd to ATA_OP_NULL and pata_icside and sata_qstor don't
need to set it to their bmdma_stop routines.

ata_sff_post_internal_cmd() becomes noop and is removed.

This fixes p3 described in clean-up-BMDMA-initialization patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-05-19 13:36:46 -04:00
Tejun Heo
c429137a67 libata-sff: port_task is SFF specific
port_task is tightly bound to the standard SFF PIO HSM implementation.
Using it for any other purpose would be error-prone and there's no
such user and if some drivers need such feature, it would be much
better off using its own.  Move it inside CONFIG_ATA_SFF and rename it
to sff_pio_task.

The only function which is exposed to the core layer is
ata_sff_flush_pio_task() which is renamed from ata_port_flush_task()
and now also takes care of resetting hsm_task_state to HSM_ST_IDLE,
which is possible as it's now specific to PIO HSM.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-05-19 13:35:49 -04:00
Jeff Garzik
a09bf4cd53 libata: ensure NCQ error result taskfile is fully initialized
before returning it via qc->result_tf.

Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-04-22 21:59:13 -04:00
Tejun Heo
fa41efdae7 libata: fix locking around blk_abort_request()
blk_abort_request() expectes queue lock to be held by the caller.
Grab it before calling the function.

Lack of this synchronization led to infinite loop on corrupt
q->timeout_list.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-04-22 21:47:52 -04:00
Tejun Heo
534ead7092 libata: retry FS IOs even if it has failed with AC_ERR_INVALID
libata currently doesn't retry if a command fails with AC_ERR_INVALID
assuming that retrying won't get it any further even if retried.
However, a failure may be classified as invalid through hardware
glitch (incorrect reading of the error register or firmware bug) and
there isn't whole lot to gain by not retrying as actually invalid
commands will be failed immediately.  Also, commands serving FS IOs
are extremely unlikely to be invalid.  Retry FS IOs even if it's
marked invalid.

Transient and incorrect invalid failure was seen while debugging
firmware related issue on Samsung n130 on bko#14314.

  http://bugzilla.kernel.org/show_bug.cgi?id=14314

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-01-20 14:25:11 -05:00
Tejun Heo
6013efd886 libata: retry failed FLUSH if device didn't fail it
If ATA device failed FLUSH, it means that the device failed to write
out some amount of data and the error needs to be reported to upper
layers. As retries can't recover the lost data, FLUSH failures need to
be reported immediately in general.

However, if FLUSH fails due to transmission errors, the FLUSH needs to
be retried; otherwise, filesystems may switch to RO mode and/or raid
array may drop a drive for a random transmission glitch.

This condition can be rather easily reproduced on certain ahci
controllers which go through a PHY event after powersave mode switch +
ext4 combination.  Powersave mode switch is often closely followed by
flush from the filesystem failing the FLUSH with ATA bus error which
makes the filesystem code believe that data is lost and drop to RO
mode.  This was reported in the following bugzilla bug.

  http://bugzilla.kernel.org/show_bug.cgi?id=14543

This patch makes libata EH retry FLUSH if it wasn't failed by the
device.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-12-03 02:46:35 -05:00
Tejun Heo
4f7c287499 libata: fix PMP initialization
Commit 842faa6c1a fixed error handling
during attach by not committing detected device class to dev->class
while attaching a new device.  However, this change missed the PMP
class check in the configuration loop causing a new PMP device to go
through ata_dev_configure() as if it were an ATA or ATAPI device.

As PMP device doesn't have a regular IDENTIFY data, this makes
ata_dev_configure() tries to configure a PMP device using an invalid
data.  For the most part, it wasn't too harmful and went unnoticed but
this ends up clearing dev->flags which may have ATA_DFLAG_AN set by
sata_pmp_attach().  This means that SATA_PMP_FEAT_NOTIFY ends up being
disabled on PMPs and on PMPs which honor the flag breaks hotplug
support.

This problem was discovered and reported by Ethan Hsiao.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ethan Hsiao <ethanhsiao@jmicron.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-10-16 06:21:54 -04:00
Tejun Heo
3b761d3d43 libata: fix incorrect link online check during probe
While trying to work around spurious detection retries for
non-existent devices on slave links, commit
816ab89782 incorrectly added link
offline check logic before ata_eh_thaw() was called.  This means that
if an occupied link goes down briefly at the time that offline check
was performed, device class will be cleared to ATA_DEV_NONE and libata
wouldn't retry thus failing detection of the device.

The offline check should be done after the port is thawed together
with online check so that such link glitches can be detected by the
interrupt handler and handled properly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tim Blechmann <tim@klingt.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-10-06 20:58:18 -04:00
Robert Hancock
6521148c64 libata: add command name parsing for error output
This patch improve libata's output for error/notification messages
to allow easier comprehension and debugging:

When ATAPI commands issued through the SCSI layer fail, use SCSI
functions to print the CDB in human-readable form instead of just
dumping out the CDB in hex.

Print out the name of the failed command (as defined by the ATA
specification) in error handling output along with the raw register
contents.

When reporting status of ACPI taskfile commands executed on resume,
also output the names of the commands being executed (or not) in
readable form.

Since the extra data for printing command names increases kernel
size slightly, a config option has been added to allow disabling
command name output (as well as some of the error register parsing)
for those highly sensitive to kernel text size.

Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-01 19:47:20 -04:00
Tejun Heo
1e641060c4 libata: clear eh_info on reset completion
Resets are done with port frozen but some controllers still issue
interrupts during reset and they may end up recording error conditions
in ehi leading to unnecessary EH retrials.

This patch makes ata_eh_reset() clear ehi on reset completion.  As
reset is the most severe recovery action, there's nothing to lose by
clearing ehi on its completion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-01 19:47:19 -04:00
Jeff Garzik
54c38444fa [libata] EH: freeze port before aborting commands
Call the ->freeze() hook before aborting qc's, because some hardware
requires special handling prior to accessing the taskfile registers
(for diagnosis/analysis/reset).  Most notably, hardware may wish to
disable the DMA engine or interrupts in the ->freeze() hook.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-01 19:47:19 -04:00
Bartlomiej Zolnierkiewicz
705d201414 libata: add missing NULL pointer check to ata_eh_reset()
drivers/ata/libata-eh.c +2403 ata_eh_reset(80) warning: variable derefenced before check 'slave'

Please note that this is _not_ a real bug at the moment since ata_eh_context
structure is embedded into ata_list structure and the code alwas checks for
'slave' before accessing 'sehc'.

Anyway lets add missing check and always have a valid 'sehc' pointer (which
makes code easier to understand and prevents introducing some possible bugs
in the future).

Reported-by: Dan Carpenter <error27@gmail.com>
Cc: corbet@lwn.net
Cc: eteo@redhat.com
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-07-28 21:05:41 -04:00
Tejun Heo
fe2c4d018f libata: fix follow-up SRST failure path
ata_eh_reset() was missing error return handling after follow-up SRST
allowing EH to continue the normal probing path after reset failure.
This was discovered while testing new WD 2TB drives which take longer
than 10 secs to spin up and cause the first follow-up SRST to time
out.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-07-14 22:41:28 -04:00
Martin Olsson
98a1708de1 trivial: fix typos s/paramter/parameter/ and s/excute/execute/ in documentation and source comments.
Signed-off-by: Martin Olsson <martin@minimum.se>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12 18:01:46 +02:00
Tejun Heo
6f9c1ea2c1 libata: clear ering on resume
Error timestamps are in jiffies which doesn't run while suspended and
PHY events during resume isn't too uncommon.  When the two are
combined, it can lead to unnecessary speed downs if the machine is
suspended and resumed repeatedly.  Clear error history on resume.

This was reported and verified in bnc#486803 by Vladimir Botka.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Vladimir Botka <vbotka@novell.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-05-11 14:30:59 -04:00
Tejun Heo
842faa6c1a libata: fix attach error handling
New device attach path in ata_eh_revalidate_and_attach() is divided
into two separate loops because ATA requires IDENTIFY to be issued to
slave first while the user expects to see device probe messages from
the master device.  new_mask is used to track which devices are the
new ones between the first loop and the second.

This usually works well but if an error occurs during configuration
stage, ata_dev_revalidate_and_attach() returns with error code and
forgets new_mask.  On the retry run, dev->class is set and new_mask
for the device is clear, so the device just gets revalidated and thus
ends up skipping post-configuration procedure including scheduling of
SCSI_HOTPLUG for the device.  When this occurs, ATA part of probing
works fine but SCSI probing usually doesn't happen and makes the
device unreachable.

The behavior has been around for a very long time but it has been
uncovered with the recent addition of 1_5_GBPS horkage which uses
-EAGAIN return value from ata_dev_configure() to restart the probing
sequence after forcing cable speed.

This can be fixed by making sure dev->class is permanently set only
after all configurations are successfully complete.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tim Connors <tconnors+linuxkml@astro.swin.edu.au>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-05-11 14:26:01 -04:00
Alan Cox
c96f1732e2 [libata] Improve timeout handling
On a timeout call a device specific handler early in the recovery so that
we can complete and process successful commands which timed out due to IRQ
loss or the like rather more elegantly.

[Revised to exclude the timeout handling on a few devices that inherit from
 SFF but are not SFF enough to use the default timeout handler]

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-03-24 22:52:39 -04:00
Tejun Heo
d6515e6ff4 libata: make sure port is thawed when skipping resets
When SCR access is available and the link is offline, softreset is
skipped as it only wastes time and some controllers don't respond very
well.  However, the skip path forgot to thaw the port, which not only
blocks further event notification from the port but also causes
repeated EH invocations on the same event on drivers which rely on
->thaw() to clear events if the IRQ is shared with another device or
port.

This problem has always been there but is uncovered by recent sata_nv
nf2/3 change which dropped hardreset support while maintaining SCR
access.  nf2/3 doesn't clear hotplug event mask from the interrupt
handler but relies on ->thaw() to clear them.  When the hardreset was
there, the reset action was never skipped and the port was always
thawed but, with the hardreset gone, ->prereset() determines that
there's no need for softreset and both ->softreset() and ->thaw() are
skipped.  This leads to stuck hotplug event in the IRQ status register
triggering hotplug event whenever IRQ is delieverd on the same IRQ.
As the controller shares the same IRQ for both ports, this happens on
every IO if one port is occpupied and the other isn't.

This patch fixes the problem by making sure that the port is thawed on
reset-skip path.

bko#11615 reports this problem.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Robert Hancock <hancockrwd@gmail.com>
Reported-by: Dan Andresan <danyer@gmail.com>
Reported-by: Arne Woerner <arne_woerner@yahoo.com>
Reported-by: Stefan Lippers-Hollmann <s.L-H@gmx.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-03-05 07:25:43 -05:00
Tejun Heo
b535708146 libata: don't use on-stack sense buffer
sense_buffer is used as DMA target and shouldn't be allocated on
stack.  Use ap->sector_buf instead.  This problem is spotted by Chuck
Ebbert.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-03-05 07:25:10 -05:00
Tejun Heo
cf9a590a9e libata: add no penalty retry request for EH device handling routines
Let -EAGAIN from EH device handling routines trigger EH retry without
consuming its tries count.  This will be used to implement link SPD
horkage which requires hardreset to adjust SPD without affecting other
EH decisions.  As it bypasses the forward progress guarantee provided
by the tries count, the requester is responsible for ensuring forward
progress.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:04:19 -05:00
Tejun Heo
c2c7a89c5e libata: improve probe failure handling
When link is flaky at high speed, it isn't uncommon for a device to
repeatedly fail probing sequence early after successfully negotiating
high link speed.  This often leads to consecutive hotplug events
without successful probing.

This patch improves libata EH such that it remembers probing trials
and if there have been more than two unsuccessful trials in the past
60 seconds, slows down link speed to 1.5Gbps.

As link speed negotiation is the duty of the PHY layer proper, the
goal of this fallback mechanism is to provide the last resort when
everything else fails, which unfortunately happens not too
infrequently, so no fancy 6->3->1.5 speeding down or highest
successful transmission speed seen kind of logics (yet).

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:34 -05:00
Tejun Heo
a07d499b47 libata: add @spd_limit to sata_down_spd_limit()
Add @spd_limit to sata_down_spd_limit() so that the caller can specify
the SPD limit it wants.  This parameter doesn't get in the way even
when it's too low.  The closest possible limit is applied.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:22 -05:00
Tejun Heo
99cf610aa4 libata: clear dev->ering in smarter way
dev->ering used to be cleared together with the rest of ata_device in
ata_dev_init() which is called whenever a probing event occurs.
dev->ering is about to be used to track probing failures so it needs
to remain persistent over multiple porbing events.  This patch
achieves this by doing the following.

* Instead of CLEAR_OFFSET, define CLEAR_BEGIN and CLEAR_END and only
  clear between BEGIN and END.  ering is moved after END.  The split
  of persistent area is to allow hotter items remain at the head.

* ering is explicitly cleared on ata_dev_disable() and when device
  attach succeeds.  So, ering is persistent throug a device's life
  time (unless explicitly cleared of course) and also through periods
  inbetween disablement of an attached device and successful detection
  of the next one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:17 -05:00
Tejun Heo
678afac678 libata: move ata_dev_disable() to libata-eh.c
ata_dev_disable() is about to be more tightly integrated into EH
logic.  Move it to libata-eh.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:00 -05:00
Tejun Heo
d89293abd9 libata: fix EH device failure handling
The dev->pio_mode > XFER_PIO_0 test is there to avoid unnecessary
speed down warning messages but it accidentally disabled SATA link spd
down during configuration phase after reset where PIO mode is always
zero.

This patch fixes the problem by moving the test where it belongs.
This makes libata probing sequence behave better when the connection
is flaky at higher link speeds which isn't too uncommon for eSATA
devices.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:02:57 -05:00
Tejun Heo
ece180d1cf libata: perform port detach in EH
ata_port_detach() first made sure EH saw ATA_PFLAG_UNLOADING and then
assumed EH context belongs to it and performed detach operation
itself.  However, UNLOADING doesn't disable all of EH and this could
lead to problems including triggering WARN_ON()'s in EH path.

This patch makes port detach behave more like other EH actions such
that ata_port_detach() requests EH to detach and waits for completion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-12-28 22:43:21 -05:00
Tejun Heo
1eca4365be libata: beef up iterators
There currently are the following looping constructs.

* __ata_port_for_each_link() for all available links
* ata_port_for_each_link() for edge links
* ata_link_for_each_dev() for all devices
* ata_link_for_each_dev_reverse() for all devices in reverse order

Now there's a need for looping construct which is similar to
__ata_port_for_each_link() but iterates over PMP links before the host
link.  Instead of adding another one with long name, do the following
cleanup.

* Implement and export ata_link_next() and ata_dev_next() which take
  @mode parameter and can be used to build custom loop.
* Implement ata_for_each_link() and ata_for_each_dev() which take
  looping mode explicitly.

The following iteration modes are implemented.

* ATA_LITER_EDGE		: loop over edge links
* ATA_LITER_HOST_FIRST		: loop over all links, host link first
* ATA_LITER_PMP_FIRST		: loop over all links, PMP links first

* ATA_DITER_ENABLED		: loop over enabled devices
* ATA_DITER_ENABLED_REVERSE	: loop over enabled devices in reverse order
* ATA_DITER_ALL			: loop over all devices
* ATA_DITER_ALL_REVERSE		: loop over all devices in reverse order

This change removes exlicit device enabledness checks from many loops
and makes it clear which ones are iterated over in which direction.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-12-28 22:43:20 -05:00
Tejun Heo
19b723218b libata: fix last_reset timestamp handling
ehc->last_reset is used to ensure that resets are not issued too
close to each other.  It's initialized to jiffies minus one minute
on EH entry.  However, when new links are initialized after PMP is
probed, new links have zero for this timestamp resulting in long wait
depending on the current jiffies.

This patch makes last_set considered iff ATA_EHI_DID_RESET is set, in
which case last_reset is always initialized.  As an added precaution,
WARN_ON() is added so that warning is printed if last_reset is
in future.

This problem is spotted and debugged by Shane Huang.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Shane Huang <Shane.Huang@amd.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-11-11 03:01:21 -05:00
Tejun Heo
90484ebfc9 libata: clear saved xfer_mode and ncq_enabled on device detach
libata EH saves xfer_mode and ncq_enabled at start to later set
DUBIOUS_XFER flag if it has changed.  These values need to be cleared
on device detach such that hot device swap doesn't accidentally miss
DUBIOUS_XFER.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-10-27 23:55:40 -04:00