Browse Source

tcp: allow splice() to build full TSO packets

vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)

The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.

4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)

This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)

In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.

Call tcp_push() only if MSG_MORE is not set in the flags parameter.

This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()

In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)

Reported-by: Tom Herbert <>
Cc: Nandita Dukkipati <>
Cc: Neal Cardwell <>
Cc: Tom Herbert <>
Cc: Yuchung Cheng <>
Cc: H.K. Jerry Chu <>
Cc: Maciej Żenczykowski <>
Cc: Mahesh Bandewar <>
Cc: Ilpo Järvinen <>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <>
Eric Dumazet 10 years ago
committed by David S. Miller
  1. 2


@ -860,7 +860,7 @@ wait_for_memory:
if (copied)
if (copied && !(flags & MSG_MORE))
tcp_push(sk, flags, mss_now, tp->nonagle);
return copied;