| Subject: | Re: BUG: crasher [IMPORTANT PATCH] |
| From: | Roberto Nibali <ratz,AT,tac,DOT,ch> |
| Date: | Fri, 18 Jan 2002 18:12:57 +0100 |
| In-reply-to: | <E16NgCt-0001da-00@bigred.inka.de> |
Hi,

> > Holy cow! That's why our cipe tunnels always crashed when
> > connecting with netcat and sending 7bytes + 1byte return!!!
>
> Were you able to reestablish the connection(s) without
> rebooting/reloading the module?

Nope, I had to rmmod the kernel module. And if it was the first vpn
device I couldn't even rmmod the module anymore. This is another bug I
hope to fix soon. And you have to 'ip link set vpn-device down' or
'ifconfig vpn-device down'.

The kernel module happily works with other tunnels and you can add or
remove tunnels to your liking. But you need to either reboot the
machine or write a kernel module that flushes the receive queue of the
crashed vpn-device.

Olaf, despite the missing check and the oops it generates, why the hell
doesn't the kernel close the socket? Example output documented below:

Jan 18 09:11:46 tm Unable to handle kernel paging request at virtual address 84000000
Jan 18 09:11:46 tm current->tss.cr3 = 033e7000, %cr3 = 033e7000
Jan 18 09:11:46 tm *pde = 00000000
Jan 18 09:11:46 tm Oops: 0000
Jan 18 09:11:46 tm CPU:    0
Jan 18 09:11:46 tm EIP:    0010:[<8482d33a>]
Jan 18 09:11:46 tm EFLAGS: 00000282
Jan 18 09:11:46 tm eax: fcfe1050   ebx: 84000000   ecx: 71737d50   edx: fcfe10a3
Jan 18 09:11:46 tm esi: 8482f08c   edi: 8311c010   ebp: 8366dd70   esp: 8366dd10
Jan 18 09:11:46 tm ds: 0018   es: 0018   ss: 0018
Jan 18 09:11:46 tm Process frapd-cb (pid: 493, process nr: 10, stackpage=8366d000)
Jan 18 09:11:46 tm Stack: 8482cd21 80fe10a8 fffffffc 8311c02c 8022e000 832d2980 83ee2904 80fe10b0
Jan 18 09:11:46 tm        8311c010 00000246 8366dd6c 8366c000 8482b6c3 8311c010 80fe10a8 8366dd70
Jan 18 09:11:46 tm        00000246 832d2840 8311c010 8366df18 831f2400 836e6000 e1270015 080217ac
Jan 18 09:11:46 tm Call Trace: [<8482cd21>] [<8482b6c3>] [<8482b9b4>] [<8016dacb>] [<801457e9>] [<801115fe>] [<80146400>]
Jan 18 09:11:46 tm        [<80111792>] [<801115fe>] [<8011186e>] [<80109fa3>] [<80146480>] [<801184a5>] [<80146b7d>] [<80109083>]
Jan 18 09:11:46 tm        [<80109060>]
Jan 18 09:11:46 tm Code: 32 03 0f b6 c0 c1 e9 08 33 0c 86 43 89 d0 4a 85 c0 75 eb 89

#oops (not decoded) after sending 8bytes through a cipe
#tunnel via netcat.

Jan 18 09:14:50 tm frcb10: new peer 172.23.2.8:10210

#Started new cipe tunnel to prove that the LKM still
#works.

Jan 18 09:15:36 tm cipe_dev_close: not owned??

#ifconfig crashed_cipe_device down. -> removed from list
#entry but still in established state :((

Jan 18 09:16:15 tm frcb: CIPE driver vers 1.5.2 (c) Olaf Titz 1996-2000, 250 channels, debug=0
Jan 18 09:17:53 tm frcb: CIPE driver vers 1.5.2 (c) Olaf Titz 1996-2000, 100 channels, debug=1
Jan 18 09:17:53 tm frcb: cipe_alloc_dev 0

#remove LKM and insmod it again. No problem. But
#you cannot initialize the crashed_cipe_device
#anymore, because: (netstat -an)

Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp     2448      0 0.0.0.0:10209           0.0.0.0:*               ESTABLISHED

This, IMHO, is a kernel bug. I mean, even if the cipe LKM oopses and
continues to work, the kernel should eventually close the socket. The
problem with the situation above is that even though the device is no
longer listed in kernel space, we can still send packets to this IP,
fill up the RX queue and waste kernel memory. This goes up to 132kb!

Has anyone done any further analysis on this? We've been chasing this
bug for months already. Sorry for not releasing an advisory on bugtraq
nor contacting you, Olaf. We thought it had something to do with the
other patches I made to the kernel, but it even crashes on a plain
vanilla kernel.

HTH,
Roberto Nibali, ratz
-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc