linux/net/ipv4
Konstantin Khorenko 565b7b2d2e tcp: do not send reset to already closed sockets
i've found that tcp_close() can be called for an already closed
socket, but still sends reset in this case (tcp_send_active_reset())
which seems to be incorrect.  Moreover, a packet with reset is sent
with different source port as original port number has been already
cleared on socket.  Besides that incrementing stat counter for
LINUX_MIB_TCPABORTONCLOSE also does not look correct in this case.

Initially this issue was found on 2.6.18-x RHEL5 kernel, but the same
seems to be true for the current mainstream kernel (checked on
2.6.35-rc3).  Please, correct me if i missed something.

How that happens:

1) the server receives a packet for socket in TCP_CLOSE_WAIT state
   that triggers a tcp_reset():

Call Trace:
 <IRQ>  [<ffffffff8025b9b9>] tcp_reset+0x12f/0x1e8
 [<ffffffff80046125>] tcp_rcv_state_process+0x1c0/0xa08
 [<ffffffff8003eb22>] tcp_v4_do_rcv+0x310/0x37a
 [<ffffffff80028bea>] tcp_v4_rcv+0x74d/0xb43
 [<ffffffff8024ef4c>] ip_local_deliver_finish+0x0/0x259
 [<ffffffff80037131>] ip_local_deliver+0x200/0x2f4
 [<ffffffff8003843c>] ip_rcv+0x64c/0x69f
 [<ffffffff80021d89>] netif_receive_skb+0x4c4/0x4fa
 [<ffffffff80032eca>] process_backlog+0x90/0xec
 [<ffffffff8000cc50>] net_rx_action+0xbb/0x1f1
 [<ffffffff80012d3a>] __do_softirq+0xf5/0x1ce
 [<ffffffff8001147a>] handle_IRQ_event+0x56/0xb0
 [<ffffffff8006334c>] call_softirq+0x1c/0x28
 [<ffffffff80070476>] do_softirq+0x2c/0x85
 [<ffffffff80070441>] do_IRQ+0x149/0x152
 [<ffffffff80062665>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff80008a2e>] __handle_mm_fault+0x6cd/0x1303
 [<ffffffff80008903>] __handle_mm_fault+0x5a2/0x1303
 [<ffffffff80033a9d>] cache_free_debugcheck+0x21f/0x22e
 [<ffffffff8006a263>] do_page_fault+0x49a/0x7dc
 [<ffffffff80066487>] thread_return+0x89/0x174
 [<ffffffff800c5aee>] audit_syscall_exit+0x341/0x35c
 [<ffffffff80062e39>] error_exit+0x0/0x84

tcp_rcv_state_process()
...  // (sk_state == TCP_CLOSE_WAIT here)
...
        /* step 2: check RST bit */
        if(th->rst) {
                tcp_reset(sk);
                goto discard;
        }
...
---------------------------------
tcp_rcv_state_process
 tcp_reset
  tcp_done
   tcp_set_state(sk, TCP_CLOSE);
     inet_put_port
      __inet_put_port
       inet_sk(sk)->num = 0;

   sk->sk_shutdown = SHUTDOWN_MASK;

2) After that the process (socket owner) tries to write something to
   that socket and "inet_autobind" sets a _new_ (which differs from
   the original!) port number for the socket:

 Call Trace:
  [<ffffffff80255a12>] inet_bind_hash+0x33/0x5f
  [<ffffffff80257180>] inet_csk_get_port+0x216/0x268
  [<ffffffff8026bcc9>] inet_autobind+0x22/0x8f
  [<ffffffff80049140>] inet_sendmsg+0x27/0x57
  [<ffffffff8003a9d9>] do_sock_write+0xae/0xea
  [<ffffffff80226ac7>] sock_writev+0xdc/0xf6
  [<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
  [<ffffffff8001fb49>] __pollwait+0x0/0xdd
  [<ffffffff8008d533>] default_wake_function+0x0/0xe
  [<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
  [<ffffffff800f0b49>] do_readv_writev+0x163/0x274
  [<ffffffff80066538>] thread_return+0x13a/0x174
  [<ffffffff800145d8>] tcp_poll+0x0/0x1c9
  [<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
  [<ffffffff800f0dd0>] sys_writev+0x49/0xe4
  [<ffffffff800622dd>] tracesys+0xd5/0xe0

3) sendmsg fails at last with -EPIPE (=> 'write' returns -EPIPE in userspace):

F: tcp_sendmsg1 -EPIPE: sk=ffff81000bda00d0, sport=49847, old_state=7, new_state=7, sk_err=0, sk_shutdown=3

Call Trace:
 [<ffffffff80027557>] tcp_sendmsg+0xcb/0xe87
 [<ffffffff80033300>] release_sock+0x10/0xae
 [<ffffffff8016f20f>] vgacon_cursor+0x0/0x1a7
 [<ffffffff8026bd32>] inet_autobind+0x8b/0x8f
 [<ffffffff8003a9d9>] do_sock_write+0xae/0xea
 [<ffffffff80226ac7>] sock_writev+0xdc/0xf6
 [<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
 [<ffffffff8001fb49>] __pollwait+0x0/0xdd
 [<ffffffff8008d533>] default_wake_function+0x0/0xe
 [<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800f0b49>] do_readv_writev+0x163/0x274
 [<ffffffff80066538>] thread_return+0x13a/0x174
 [<ffffffff800145d8>] tcp_poll+0x0/0x1c9
 [<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
 [<ffffffff800f0dd0>] sys_writev+0x49/0xe4
 [<ffffffff800622dd>] tracesys+0xd5/0xe0

tcp_sendmsg()
...
        /* Wait for a connection to finish. */
        if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
                int old_state = sk->sk_state;
                if ((err = sk_stream_wait_connect(sk, &timeo)) != 0) {
if (f_d && (err == -EPIPE)) {
        printk("F: tcp_sendmsg1 -EPIPE: sk=%p, sport=%u, old_state=%d, new_state=%d, "
                "sk_err=%d, sk_shutdown=%d\n",
                sk, ntohs(inet_sk(sk)->sport), old_state, sk->sk_state,
                sk->sk_err, sk->sk_shutdown);
        dump_stack();
}
                        goto out_err;
                }
        }
...

4) Then the process (socket owner) understands that it's time to close
   that socket and does that (and thus triggers sending reset packet):

Call Trace:
...
 [<ffffffff80032077>] dev_queue_xmit+0x343/0x3d6
 [<ffffffff80034698>] ip_output+0x351/0x384
 [<ffffffff80251ae9>] dst_output+0x0/0xe
 [<ffffffff80036ec6>] ip_queue_xmit+0x567/0x5d2
 [<ffffffff80095700>] vprintk+0x21/0x33
 [<ffffffff800070f0>] check_poison_obj+0x2e/0x206
 [<ffffffff80013587>] poison_obj+0x36/0x45
 [<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
 [<ffffffff80023481>] dbg_redzone1+0x1c/0x25
 [<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
 [<ffffffff8000ca94>] cache_alloc_debugcheck_after+0x189/0x1c8
 [<ffffffff80023405>] tcp_transmit_skb+0x764/0x786
 [<ffffffff8025df8a>] tcp_send_active_reset+0xf9/0x14d
 [<ffffffff80258ff1>] tcp_close+0x39a/0x960
 [<ffffffff8026be12>] inet_release+0x69/0x80
 [<ffffffff80059b31>] sock_release+0x4f/0xcf
 [<ffffffff80059d4c>] sock_close+0x2c/0x30
 [<ffffffff800133c9>] __fput+0xac/0x197
 [<ffffffff800252bc>] filp_close+0x59/0x61
 [<ffffffff8001eff6>] sys_close+0x85/0xc7
 [<ffffffff800622dd>] tracesys+0xd5/0xe0

So, in brief:

* a received packet for socket in TCP_CLOSE_WAIT state triggers
  tcp_reset() which clears inet_sk(sk)->num and put socket into
  TCP_CLOSE state

* an attempt to write to that socket forces inet_autobind() to get a
  new port (but the write itself fails with -EPIPE)

* tcp_close() called for socket in TCP_CLOSE state sends an active
  reset via socket with newly allocated port

This adds an additional check in tcp_close() for already closed
sockets. We do not want to send anything to closed sockets.

Signed-off-by: Konstantin Khorenko <khorenko@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-24 21:54:58 -07:00
..
netfilter net - IP_NODEFRAG option for IPv4 socket 2010-06-23 13:16:38 -07:00
af_inet.c net - IP_NODEFRAG option for IPv4 socket 2010-06-23 13:16:38 -07:00
ah4.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
arp.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
cipso_ipv4.c net: Remove unnecessary returns from void function()s 2010-05-17 23:23:14 -07:00
datagram.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
devinet.c arp_notify: allow drivers to explicitly request a notification event. 2010-05-31 00:27:44 -07:00
esp4.c xfrm: SA lookups signature with mark 2010-02-22 16:20:22 -08:00
fib_frontend.c ipv4: add LINUX_MIB_IPRPFILTER snmp counter 2010-06-03 03:18:19 -07:00
fib_hash.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
fib_lookup.h ipv4: cleanup - remove two unused parameters from fib_semantic_match(). 2009-05-18 15:16:37 -07:00
fib_rules.c net: rtnetlink: decouple rtnetlink address families from real address families 2010-04-26 16:13:54 +02:00
fib_semantics.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
fib_trie.c net: Remove unnecessary returns from void function()s 2010-05-17 23:23:14 -07:00
icmp.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
igmp.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
inet_connection_sock.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
inet_diag.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
inet_fragment.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
inet_hashtables.c net: reserve ports for applications using fixed port numbers 2010-05-15 23:28:40 -07:00
inet_lro.c net/ipv4: Move && and || to end of previous line 2009-11-23 10:41:23 -08:00
inet_timewait_sock.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
inetpeer.c inetpeer: restore small inet_peer structures 2010-06-16 11:55:39 -07:00
ip_forward.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
ip_fragment.c ipfrag : frag_kfree_skb() cleanup 2010-06-15 18:12:44 -07:00
ip_gre.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
ip_input.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
ip_options.c net: Remove unnecessary returns from void function()s 2010-05-17 23:23:14 -07:00
ip_output.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-06-23 18:26:27 -07:00
ip_sockglue.c net - IP_NODEFRAG option for IPv4 socket 2010-06-23 13:16:38 -07:00
ipcomp.c xfrm: SA lookups signature with mark 2010-02-22 16:20:22 -08:00
ipconfig.c ipconfig: send host-name in DHCP requests 2010-06-02 07:05:03 -07:00
ipip.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
ipmr.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-06-11 13:32:31 -07:00
Kconfig syncookies: remove Kconfig text line about disabled-by-default 2010-06-04 15:56:01 -07:00
Makefile
netfilter.c Merge branch 'master' of /repos/git/net-next-2.6 2010-06-15 17:31:06 +02:00
proc.c ipv4: add LINUX_MIB_IPRPFILTER snmp counter 2010-06-03 03:18:19 -07:00
protocol.c net: constify struct net_protocol 2009-09-14 17:03:01 -07:00
raw.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
route.c inetpeer: restore small inet_peer structures 2010-06-16 11:55:39 -07:00
syncookies.c syncookies: check decoded options against sysctl settings 2010-06-16 14:42:15 -07:00
sysctl_net_ipv4.c net: reserve ports for applications using fixed port numbers 2010-05-15 23:28:40 -07:00
tcp_bic.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_cong.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
tcp_cubic.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_diag.c tcp: diag: Dont report negative values for rx queue 2009-12-03 16:06:13 -08:00
tcp_highspeed.c
tcp_htcp.c net/ipv4: Move && and || to end of previous line 2009-11-23 10:41:23 -08:00
tcp_hybla.c TCP: tcp_hybla: Fix integer overflow in slow start increment 2010-06-02 07:15:48 -07:00
tcp_illinois.c
tcp_input.c tcp: unify tcp flag macros 2010-06-15 11:56:19 -07:00
tcp_ipv4.c inetpeer: restore small inet_peer structures 2010-06-16 11:55:39 -07:00
tcp_lp.c net/ipv4: Move && and || to end of previous line 2009-11-23 10:41:23 -08:00
tcp_minisocks.c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-04-11 14:53:53 -07:00
tcp_output.c tcp: unify tcp flag macros 2010-06-15 11:56:19 -07:00
tcp_probe.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
tcp_scalable.c tcp: add helper for AI algorithm 2009-03-02 03:00:15 -08:00
tcp_timer.c TCP: avoid to send keepalive probes if receiving data 2010-04-27 12:53:25 -07:00
tcp_vegas.c tcp: tcp_vegas ssthresh bugfix 2009-05-25 22:44:59 -07:00
tcp_vegas.h
tcp_veno.c net/ipv4: Move && and || to end of previous line 2009-11-23 10:41:23 -08:00
tcp_westwood.c
tcp_yeah.c net/ipv4: Move && and || to end of previous line 2009-11-23 10:41:23 -08:00
tcp.c tcp: do not send reset to already closed sockets 2010-06-24 21:54:58 -07:00
tunnel4.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
udp_impl.h net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
udp.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
udplite.c net: spread __net_init, __net_exit 2010-01-17 19:16:02 -08:00
xfrm4_input.c net: Use ip_route_input_noref() in input path 2010-05-17 17:18:51 -07:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
xfrm4_output.c netfilter: ipv4: use NFPROTO values for NF_HOOK invocation 2010-03-25 16:00:30 +01:00
xfrm4_policy.c net-next: remove useless union keyword 2010-06-10 23:31:35 -07:00
xfrm4_state.c xfrm: remove useless forward declarations 2008-11-25 01:05:54 -08:00
xfrm4_tunnel.c