69 lines
2.3 KiB
Diff
Subject: sched: Queue RT tasks to head when prio drops
From: Thomas Gleixner <tglx@linutronix.de>
Date: Tue, 04 Dec 2012 08:56:41 +0100

The following scenario does not work correctly:

Runqueue of CPUx contains two runnable and pinned tasks:
 T1: SCHED_FIFO, prio 80
 T2: SCHED_FIFO, prio 80

T1 is on the CPU and executes the following syscalls (classic priority
ceiling scenario):

 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
 ...
 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
 ...

Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
to sleep the scheduler picks T2. Surprise!

The same happens without actual preemption when T1 is forced into the
scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes
pick_next_task() which returns T2. So T1 gets preempted and scheduled
out.

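For reference, T1's side of the scenario can be written from user space with the POSIX sched_setscheduler() call, which takes a struct sched_param (field sched_priority) rather than the .prio shorthand used above. This is a minimal sketch, assuming Linux with glibc: set_fifo_prio() and ceiling_section() are hypothetical helper names, pid 0 selects the calling thread, and raising a SCHED_FIFO priority needs CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO. On its own it does not reproduce the misordering; that also needs T2 pinned to the same CPU and a higher priority T3.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Switch @pid (0 = the calling thread) to SCHED_FIFO at @prio.
 * Returns 0 on success or a negative errno value.
 */
int set_fifo_prio(pid_t pid, int prio)
{
	struct sched_param param = { .sched_priority = prio };

	if (sched_setscheduler(pid, SCHED_FIFO, &param) == -1)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper mirroring T1: boost to the ceiling priority,
 * run the critical section, then drop back to the base priority.
 * Requires CAP_SYS_NICE or an RLIMIT_RTPRIO of at least @ceiling.
 */
int ceiling_section(int base, int ceiling, void (*critical)(void))
{
	int ret;

	assert(ceiling > base);
	ret = set_fifo_prio(0, ceiling);	/* boost */
	if (ret)
		return ret;
	critical();
	return set_fifo_prio(0, base);		/* deboost */
}
```

Calling ceiling_section(80, 90, fn) from a SCHED_FIFO prio 80 thread performs the boost/deboost pair described above. Out-of-range priorities such as 0 or 100 are rejected by the kernel for SCHED_FIFO, so set_fifo_prio() returns a negative value for them.
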
This happens because sched_setscheduler() dequeues T1 from the prio 90
list and then enqueues it on the tail of the prio 80 list behind T2.
This violates the POSIX spec and surprises user space, which relies on
the guarantee that SCHED_FIFO tasks are not scheduled out unless they
give the CPU up voluntarily or are preempted by a higher priority
task. In the latter case the preempted task must get back on the CPU
after the preempting task schedules out again.

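The ordering problem can be reproduced in a toy model with one FIFO list per priority level, roughly like the buckets of the RT prio array. This is a sketch under stated assumptions, not kernel code: struct rq_model, setprio_tail() and replay_scenario_prepatch() are hypothetical names, priorities use the user-space numbering (higher number = more important), and a priority change is modelled as the dequeue plus tail enqueue described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not kernel code. User-space numbering: higher = more important. */
#define NR_PRIO    100
#define MAX_QUEUED 8

struct task {
	const char *name;
	int prio;
};

/* One FIFO list per priority level; index 0 is the head. */
struct rq_model {
	struct task *list[NR_PRIO][MAX_QUEUED];
	int len[NR_PRIO];
};

static void enqueue_tail(struct rq_model *rq, struct task *t)
{
	int p = t->prio;

	assert(rq->len[p] < MAX_QUEUED);
	rq->list[p][rq->len[p]++] = t;
}

static void dequeue(struct rq_model *rq, struct task *t)
{
	int p = t->prio;

	for (int i = 0; i < rq->len[p]; i++) {
		if (rq->list[p][i] == t) {
			/* Close the gap; the remaining tasks keep their order. */
			memmove(&rq->list[p][i], &rq->list[p][i + 1],
				(size_t)(rq->len[p] - i - 1) * sizeof(rq->list[p][0]));
			rq->len[p]--;
			return;
		}
	}
	assert(!"task not queued");
}

/* Return the head of the highest priority non-empty list. */
static struct task *pick_next(const struct rq_model *rq)
{
	for (int p = NR_PRIO - 1; p >= 0; p--)
		if (rq->len[p] > 0)
			return rq->list[p][0];
	return NULL;
}

/* Pre-patch behaviour: dequeue, change prio, always enqueue at the tail. */
static void setprio_tail(struct rq_model *rq, struct task *t, int newprio)
{
	dequeue(rq, t);
	t->prio = newprio;
	enqueue_tail(rq, t);
}

/* Replay the scenario; returns the name of the task picked after T3 sleeps. */
const char *replay_scenario_prepatch(void)
{
	struct rq_model rq;
	struct task t1 = { "T1", 80 }, t2 = { "T2", 80 }, t3 = { "T3", 95 };

	memset(&rq, 0, sizeof(rq));
	enqueue_tail(&rq, &t1);		/* T1 at the head: it owns the CPU */
	enqueue_tail(&rq, &t2);
	assert(pick_next(&rq) == &t1);

	setprio_tail(&rq, &t1, 90);	/* boost to the ceiling */
	setprio_tail(&rq, &t1, 80);	/* deboost: T1 lands behind T2 */

	enqueue_tail(&rq, &t3);		/* T3 wakes up and preempts T1 */
	assert(pick_next(&rq) == &t3);
	dequeue(&rq, &t3);		/* T3 goes back to sleep */

	return pick_next(&rq)->name;
}
```

replay_scenario_prepatch() returns "T2": after the deboost T1 sits behind T2 in the prio 80 list, so once T3 leaves, the head of that list is T2.
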
We already fixed a similar issue in commit 60db48c (sched: Queue a
deboosted task to the head of the RT prio queue). The same treatment
is necessary for sched_setscheduler(). So enqueue to the head of the
prio bucket list if the priority of the task is lowered.

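The fixed rule fits the same kind of toy model: re-enqueue at the head of the new priority list unless the priority was raised. Again a sketch under assumptions, not the kernel implementation: all names are hypothetical, priorities use the user-space numbering, and the head/tail choice mirrors the condition in the hunk below translated to that numbering (a lower or unchanged priority goes to the head).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not kernel code. User-space numbering: higher = more important. */
#define NR_PRIO    100
#define MAX_QUEUED 8

struct task {
	const char *name;
	int prio;
};

/* One FIFO list per priority level; index 0 is the head. */
struct rq_model {
	struct task *list[NR_PRIO][MAX_QUEUED];
	int len[NR_PRIO];
};

static void enqueue(struct rq_model *rq, struct task *t, bool head)
{
	int p = t->prio;

	assert(rq->len[p] < MAX_QUEUED);
	if (head) {
		/* Shift the queued tasks back by one slot. */
		memmove(&rq->list[p][1], &rq->list[p][0],
			(size_t)rq->len[p] * sizeof(rq->list[p][0]));
		rq->list[p][0] = t;
	} else {
		rq->list[p][rq->len[p]] = t;
	}
	rq->len[p]++;
}

static void dequeue(struct rq_model *rq, struct task *t)
{
	int p = t->prio;

	for (int i = 0; i < rq->len[p]; i++) {
		if (rq->list[p][i] == t) {
			memmove(&rq->list[p][i], &rq->list[p][i + 1],
				(size_t)(rq->len[p] - i - 1) * sizeof(rq->list[p][0]));
			rq->len[p]--;
			return;
		}
	}
	assert(!"task not queued");
}

/* Return the head of the highest priority non-empty list. */
static struct task *pick_next(const struct rq_model *rq)
{
	for (int p = NR_PRIO - 1; p >= 0; p--)
		if (rq->len[p] > 0)
			return rq->list[p][0];
	return NULL;
}

/* Patched rule: tail if the priority was raised, head otherwise. */
static void setprio(struct rq_model *rq, struct task *t, int newprio)
{
	int oldprio = t->prio;

	dequeue(rq, t);
	t->prio = newprio;
	enqueue(rq, t, newprio <= oldprio);
}

/* Replay the scenario; returns the name of the task picked after T3 sleeps. */
const char *replay_scenario_patched(void)
{
	struct rq_model rq;
	struct task t1 = { "T1", 80 }, t2 = { "T2", 80 }, t3 = { "T3", 95 };

	memset(&rq, 0, sizeof(rq));
	enqueue(&rq, &t1, false);	/* T1 at the head: it owns the CPU */
	enqueue(&rq, &t2, false);

	setprio(&rq, &t1, 90);		/* boost: tail of the prio 90 list */
	setprio(&rq, &t1, 80);		/* deboost: head of the prio 80 list */

	enqueue(&rq, &t3, false);	/* T3 wakes up and preempts T1 */
	assert(pick_next(&rq) == &t3);
	dequeue(&rq, &t3);		/* T3 goes back to sleep */

	return pick_next(&rq)->name;
}
```

replay_scenario_patched() returns "T1": the boost still goes to the tail of the prio 90 list, but the deboost puts T1 back at the head of the prio 80 list, ahead of T2.
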
It might be possible that existing user space relies on the current
behaviour, but it can be considered highly unlikely due to the corner
case nature of the application scenario.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: stable-rt@vger.kernel.org
---
 kernel/sched/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4168,8 +4168,13 @@ recheck:
 
 	if (running)
 		p->sched_class->set_curr_task(rq);
-	if (on_rq)
-		enqueue_task(rq, p, 0);
+	if (on_rq) {
+		/*
+		 * We enqueue to tail when the priority of a task is
+		 * increased (user space view).
+		 */
+		enqueue_task(rq, p, oldprio <= p->prio ? ENQUEUE_HEAD : 0);
+	}
 
 	check_class_changed(rq, p, prev_class, oldprio);
 	task_rq_unlock(rq, p, &flags);
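The comparison oldprio <= p->prio reads backwards because kernel-internal priorities are inverted: for an RT task the kernel uses prio = MAX_RT_PRIO - 1 - rt_priority with MAX_RT_PRIO = 100, so a smaller value means a more important task. The sketch below restates the patched condition in isolation; kernel_prio() and enqueue_flags() are hypothetical helpers, and the ENQUEUE_HEAD value is a local stand-in rather than the kernel's definition.

```c
#include <assert.h>

#define MAX_RT_PRIO  100	/* number of RT priority levels */
#define ENQUEUE_HEAD 1		/* stand-in value for this sketch only */

/* Map a user-space SCHED_FIFO/SCHED_RR priority (1..99) to the inverted kernel scale. */
int kernel_prio(int rt_priority)
{
	assert(rt_priority >= 1 && rt_priority <= MAX_RT_PRIO - 1);
	return MAX_RT_PRIO - 1 - rt_priority;
}

/*
 * Restates the patched condition: enqueue at the head of the new list
 * unless the kernel prio value got smaller, i.e. unless user space
 * raised the task's priority.
 */
int enqueue_flags(int oldprio, int newprio)
{
	return oldprio <= newprio ? ENQUEUE_HEAD : 0;
}
```

With the scenario's numbers, kernel_prio(90) is 9 and kernel_prio(80) is 19, so the deboost from 9 to 19 yields ENQUEUE_HEAD, the boost from 19 to 9 yields 0 (tail), and an unchanged priority also goes to the head.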