@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from the perspective of
another CPU not holding that lock. In short, a ACQUIRE followed by an
RELEASE may -not- be assumed to be a full memory barrier.
Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
imply a full memory barrier. If it is necessary for a RELEASE-ACQUIRE
pair to produce a full barrier, the ACQUIRE can be followed by an
smp_mb__after_unlock_lock() invocation. This will produce a full barrier
(including transitivity) if either (a) the RELEASE and the ACQUIRE are
executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
the same variable. The smp_mb__after_unlock_lock() primitive is free
on many architectures. Without smp_mb__after_unlock_lock(), the CPU's
execution of the critical sections corresponding to the RELEASE and the
ACQUIRE can cross, so that:
Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
not imply a full memory barrier. Therefore, the CPU's execution of the
critical sections corresponding to the RELEASE and the ACQUIRE can cross,
so that:
*A = a;
RELEASE M
@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
a sleep-unlock race, but the locking primitive needs to resolve
such races properly in any case.
With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
For example, with the following code, the store to *A will always be
seen by other CPUs before the store to *B:
*A = a;
RELEASE M
ACQUIRE N
smp_mb__after_unlock_lock();
*B = b;
The operations will always occur in one of the following orders:
STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
If the RELEASE and ACQUIRE were instead both operating on the same lock
variable, only the first of these alternatives can occur. In addition,
the more strongly ordered systems may rule out some of the above orders.
But in any case, as noted earlier, the smp_mb__after_unlock_lock()
ensures that the store to *A will always be seen as happening before
the store to *B.
Locks and semaphores may not provide any guarantee of ordering on UP compiled
systems, and so cannot be counted on in such a situation to actually achieve
anything at all - especially with respect to I/O accesses - unless combined
@ -2154,40 +2125,6 @@ But it won't see any of:
*E, *F or *G following RELEASE Q
However, if the following occurs:
CPU 1 CPU 2
=============================== ===============================
WRITE_ONCE(*A, a);
ACQUIRE M [1]
WRITE_ONCE(*B, b);
WRITE_ONCE(*C, c);
RELEASE M [1]
WRITE_ONCE(*D, d); WRITE_ONCE(*E, e);
ACQUIRE M [2]
smp_mb__after_unlock_lock();
WRITE_ONCE(*F, f);
WRITE_ONCE(*G, g);
RELEASE M [2]
WRITE_ONCE(*H, h);
CPU 3 might see:
*E, ACQUIRE M [1], *C, *B, *A, RELEASE M [1],
ACQUIRE M [2], *H, *F, *G, RELEASE M [2], *D
But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
*B, *C, *D, *F, *G or *H preceding ACQUIRE M [1]
*A, *B or *C following RELEASE M [1]
*F, *G or *H preceding ACQUIRE M [2]
*A, *B, *C, *E, *F or *G following RELEASE M [2]
Note that the smp_mb__after_unlock_lock() is critically important
here: Without it CPU 3 might see some of the above orderings.
Without smp_mb__after_unlock_lock(), the accesses are not guaranteed
to be seen in order unless CPU 3 holds lock M.
ACQUIRES VS I/O ACCESSES
------------------------