用户态的一些同步设计(Linux Futex、Java JUC)

Linux Futex

2002 年的《Fuss, Futexes and Furwocks: Fast Userlevel Locking in Linux》中提及,在实现用户态同步设计之前,用户态程序线程/进程需要对共享资源上锁,就得依赖内核态的信号量(在 System V 下提供 semop() 等函数、在 POSIX 下提供 semaphore 结构体;但由于 POSIX 格外提供线程同步函数,尽管 Futex 未被应用,内部也通过用户态自旋锁实现线程同步,但只局限线程)。

用户在旧版的内核 POSIX 中进行多进程的信号量同步,进程在 Trap in 内核态后,POSIX 会执行 LOCK 指令锁定总线来读取信号量值,如果发现值不满足,则会通知调度器放入等待队列。

Futex 的思路是把总线 LOCK 替换成自旋,而且把自旋部分放在用户态。

pthread_mutex_lock 函数将交由 lowlevellock 的 lll_lock() 执行自旋,自旋的第一步是尝试把 futex 标记变量从 0 置为 1,标记着从空闲到请求但无竞争,一旦原子性的变量覆盖成功,意味着锁获取成功,否则执行 __lll_lock_wait() 自旋,把标记置为 2,标记存在竞争并陷入内核,执行 futex() 系统调用对线程/进程进行阻塞。

/* This is an expression rather than a statement even though its value is
    void, so that it can be used in a comma expression or as an expression
    that's cast to void.  / / The inner conditional compiles to a call to __lll_lock_wait_private if
    private is known at compile time to be LLL_PRIVATE, and to a call to
    __lll_lock_wait otherwise.  / / If FUTEX is 0 (not acquired), set to 1 (acquired with no waiters) and
    return.  Otherwise, ensure that it is >1 (acquired, possibly with waiters)
    and then block until we acquire the lock, at which point FUTEX will still be
      The lock is always acquired on return.  */   
   define __lll_lock(futex, private)                                      \
   ((void)                                                               \
      ({                                                                   \
        int *__futex = (futex);                                            \
        if (__glibc_unlikely                                               \
            (atomic_compare_and_exchange_bool_acq (__futex, 1, 0)))        \
          {                                                                \
            if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \
              __lll_lock_wait_private (__futex);                           \
            else                                                           \
              __lll_lock_wait (__futex, private);                          \
          }                                                                \
      }))
   define lll_lock(futex, private)    \
   __lll_lock (&(futex), private) 

futex_wait() 最终由 sysdep.h 的汇编指令陷入内核,并执行 futex() 的系统调用,由内核的偏函数 /kernel/futex.c#do_futex() 接受。以上就是 Futex 在用户态的设计,也就是类似客户端的设计。

Futex 的内核态,也就是服务端,其首要任务是判断客户端给予的参数信息是否过期,比如客户端以为现在存在锁竞争,但实际上锁已被释放,这些无效请求将返回客户端重新处理;然后,基于 HashMap 记录 futex 标记变量的地址,以便于 unlock 时候索引到阻塞的任务;最终,执行调度器的 /kernel/sched/core.c#schedule() 函数添加到等待队列,并放弃当前线程的执行。

同理,unlock 也如此,在用户态,无条件把 futex 标记变量置为 0,观察之前的数值,如果等于 2 即存在等待的任务,通知内核态唤醒;内核态 do_futex() 偏函数收到后,指派给 mark_wake_futex(),通过 HashMap 寻找到竞争同一资源的任务,执行调度器的 wake_q_add() 把等待队列的任务放回到唤醒队列。

除此之外,Futex 还实现 Priority-inheritance 优先级继承:

       Linux supports priority-inheritance (PI) futexes in order to
       handle priority-inversion problems that can be encountered with
       normal futex locks.  Priority inversion is the problem that
       occurs when a high-priority task is blocked waiting to acquire a
       lock held by a low-priority task, while tasks at an intermediate
       priority continuously preempt the low-priority task from the CPU.
       Consequently, the low-priority task makes no progress toward
       releasing the lock, and the high-priority task remains blocked.

       Priority inheritance is a mechanism for dealing with the
       priority-inversion problem.  With this mechanism, when a high-
       priority task becomes blocked by a lock held by a low-priority
       task, the priority of the low-priority task is temporarily raised
       to that of the high-priority task, so that it is not preempted by
       any intermediate level tasks, and can thus make progress toward
       releasing the lock.  To be effective, priority inheritance must
       be transitive, meaning that if a high-priority task blocks on a
       lock held by a lower-priority task that is itself blocked by a
       lock held by another intermediate-priority task (and so on, for
       chains of arbitrary length), then both of those tasks (or more
       generally, all of the tasks in a lock chain) have their
       priorities raised to be the same as the high-priority task.

Java JUC

Java JUC 是源自《The java.util.concurrent Synchronizer Framework》,其基于 CLH Lock。

对比 Futex 模型,Futex 本质是 TAS 模型,在相同的共享资源下,并发自旋同一个变量,只是得益于调度器的兜底,防止了并发增长的情况下,依旧在自旋的任务不至于过多;但是对于任务小,任务量大,竞争激烈,频繁的陷入内核态及进行调度,可能会不及于为 Scalable 设计的 CLH Lock。

CLH Lock 基础实现是,为每一个线程创建一个结点,上锁前先设置 Tail hint,然后检测前驱结点状态,当有新线程也想获取锁时候,通过 Tail hint 找到前驱结点;解锁的时候把当前结点状态置空,后面等待的线程检测到前驱释放锁就取消自旋阻塞。

public class CLHLock {
    private AtomicReference<QNode> tail = new AtomicReference<QNode>(new QNode());
    private ThreadLocal<QNode> myPred, myNode;
    private class QNode {
        volatile boolean locked = false;
    }
    public CLHLock() {
        myPred = new ThreadLocal<QNode>() {
            protected QNode initialValue() {
                return null;
            }
        };
        myNode = new ThreadLocal<QNode>() {
            protected QNode initialValue() {
                return new QNode();
            }
        };
    }
    public void lock() {
        QNode qnode = myNode.get();
        qnode.locked = true;
        QNode pred = tail.getAndSet(qnode);
        myPred.set(pred);
        while (pred.locked);
    }
    public void unlock() {
        QNode qnode = myNode.get();
        qnode.locked = false;
        myNode.set(myPred.get()); // GC
    }
}

在条件变量的实现上,JUC 与 ObjectMonitor 的 ObjectWaiter、pthread_cond_wait 的设计相似,即,每个条件变量,对应着一条等待队列,具体表现为:

  • 在 wait 的时候执行
    1. 获取锁(在上文由用户显式调用)
    2. (锁获取成功后)添加结点到等待队列
    3. 释放锁(内部隐式调用)
    4. 休眠线程(注意唤醒后会再次获取锁)
  • 在 notify 的时候执行
    1. 获取锁(在上文由用户显式调用)
    2. (锁获取成功后)取出等待队列的结点
    3. 释放锁(在下文由用户显式调用)
    4. 唤醒休眠线程

但由于 JUC 的锁操作与条件变量的设计均使用队列,因此在代码体现上,条件变量的操作反映在 CLH 队列与条件等待队列:

  • 比如有 T1 线程执行 wait 操作
    1. 添加 LOCK 结点到 CLH 队列
    2. (LOCK 成功后)添加 CONDITION 结点到等待队列
    3. 修改 LOCK 结点为释放状态并唤醒后继结点(即解锁操作,此时线程相当于不持有锁)
    4. 休眠 T1 线程
  • 然后有 T2 执行 notify 操作
    1. 添加 LOCK 结点到 CLH 队列
    2. (LOCK 成功后)取出等待队列的 CONDITION 结点转为 LOCK 结点并将其添加到 CLH 队列(此时 T1 有 LOCK 结点在 CLH 队列,相当于 T1 在等待获取锁)
    3. 修改 LOCK 结点为释放并唤醒后继(此时后继可能就是 T1 的 LOCK 结点)。

JUC 还设计了任务取消、任务超时处理实现(待补充)。

Linux Mutex Subsystem

《Generic Mutex Subsystem》把内核内部程序的基于信号量设计的锁,直接换成 MCS Lock。MCS Lock 虽然在 NUMA 架构上非常实用,但《The java.util.concurrent Synchronizer Framework》提及,CLH Lock 在任务取消以及超时处理上更优。

参考资料

  1. https://www.kernel.org/doc/ols/2002/ols2002-pages-479-495.pdf
  2. http://gee.cs.oswego.edu/dl/papers/aqs.pdf
  3. https://classes.engineering.wustl.edu/cse539/web/lectures/locks.pdf
  4. https://www.lagou.com/lgeduarticle/63917.html
  5. https://lwn.net/Articles/164802/

发表评论

您的电子邮箱地址不会被公开。 必填项已用*标注