Subject: Re: BUG: crasher [IMPORTANT PATCH]
From: Roberto Nibali <ratz,AT,tac,DOT,ch>
Date: Fri, 18 Jan 2002 18:12:57 +0100
In-reply-to: <E16NgCt-0001da-00@bigred.inka.de>

Hi,

> > Holy cow! That's why our cipe tunnels always crashed when
> > connecting with netcat and sending 7 bytes + 1 byte return!!!
> 
> Were you able to reestablish the connection(s) without
> rebooting/reloading the module?

Nope, I had to rmmod the kernel module, and if it was the
first VPN device I couldn't even rmmod the module anymore.
This is another bug I hope to fix soon. You also have to
bring the device down with 'ip link set vpn-device down' or
'ifconfig vpn-device down'. The kernel module happily keeps
working with the other tunnels, and you can add or remove
tunnels as you like. But you need to either reboot the
machine or write a kernel module that flushes the receive
queue of the crashed VPN device. Olaf, apart from the
missing check and the oops it generates, why the hell
doesn't the kernel close the socket? Example output below,
with my comments marked '#':

Jan 18 09:11:46 tm Unable to handle kernel paging request at virtual address 84000000
Jan 18 09:11:46 tm current->tss.cr3 = 033e7000, %cr3 = 033e7000
Jan 18 09:11:46 tm *pde = 00000000
Jan 18 09:11:46 tm Oops: 0000
Jan 18 09:11:46 tm CPU:    0
Jan 18 09:11:46 tm EIP:    0010:[<8482d33a>]
Jan 18 09:11:46 tm EFLAGS: 00000282
Jan 18 09:11:46 tm eax: fcfe1050   ebx: 84000000   ecx: 71737d50   edx: fcfe10a3
Jan 18 09:11:46 tm esi: 8482f08c   edi: 8311c010   ebp: 8366dd70   esp: 8366dd10
Jan 18 09:11:46 tm ds: 0018   es: 0018   ss: 0018
Jan 18 09:11:46 tm Process frapd-cb (pid: 493, process nr: 10, stackpage=8366d000)
Jan 18 09:11:46 tm Stack: 8482cd21 80fe10a8 fffffffc 8311c02c 8022e000 832d2980 83ee2904 80fe10b0
Jan 18 09:11:46 tm 8311c010 00000246 8366dd6c 8366c000 8482b6c3 8311c010 80fe10a8 8366dd70
Jan 18 09:11:46 tm 00000246 832d2840 8311c010 8366df18 831f2400 836e6000 e1270015 080217ac
Jan 18 09:11:46 tm Call Trace: [<8482cd21>] [<8482b6c3>] [<8482b9b4>] [<8016dacb>] [<801457e9>] [<801115fe>] [<80146400>]
Jan 18 09:11:46 tm [<80111792>] [<801115fe>] [<8011186e>] [<80109fa3>] [<80146480>] [<801184a5>] [<80146b7d>] [<80109083>]
Jan 18 09:11:46 tm [<80109060>]
Jan 18 09:11:46 tm Code: 32 03 0f b6 c0 c1 e9 08 33 0c 86 43 89 d0 4a 85 c0 75 eb 89

#Oops (not decoded) after sending 8 bytes through a cipe
#tunnel via netcat.
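The Code: line above is not decoded, but on i386 it starts at EIP, and the bytes read as: xor al,[ebx]; movzx eax,al; shr ecx,8; xor ecx,[esi+eax*4]; inc ebx; mov eax,edx; dec edx; test eax,eax; jnz (back). That looks like the inner loop of a table-driven CRC: ebx is the data pointer and equals the faulting address 84000000, esi points at a 256-entry table, ecx holds the running CRC, and edx is a down-counting length that still reads fcfe10a3. A remaining count that large suggests the length handed to the loop had underflowed, so the loop walked off the end of the packet buffer into unmapped memory. This is a reading of the register dump, not something checked against the CIPE 1.5.2 source. Below is a minimal C sketch of the same loop shape with a minimum-length guard in front of it; `payload_crc`, `trailer_len` and the use of the reflected CRC-32 polynomial 0xEDB88320 are hypothetical choices for illustration, not CIPE's actual code:

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 (polynomial 0xEDB88320) lookup table, filled on first use.
 * ASSUMPTION: CIPE's real table/polynomial is not shown in the oops. */
static uint32_t crc_table[256];
static int crc_table_ready;

static void crc32_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[n] = c;
    }
    crc_table_ready = 1;
}

/* Same shape as the decoded loop (assumed mapping: len ~ edx, p ~ ebx,
 * crc ~ ecx, crc_table ~ esi). If len has wrapped around, this reads
 * far past the end of the buffer. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        crc = crc_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
    return crc;
}

/*
 * HYPOTHETICAL helper: checksums a packet minus a trailer of trailer_len
 * bytes. Names and layout are illustrative only, not CIPE's.
 * Returns 0 and stores the CRC on success, -1 if the packet is too short.
 */
static int payload_crc(const unsigned char *pkt, size_t pkt_len,
                       size_t trailer_len, uint32_t *out)
{
    /* The guard: without it, pkt_len - trailer_len wraps to a huge
     * unsigned value and crc32_update() runs off the buffer. */
    if (pkt_len < trailer_len)
        return -1;
    if (!crc_table_ready)
        crc32_init();
    *out = crc32_update(0xFFFFFFFFu, pkt, pkt_len - trailer_len) ^ 0xFFFFFFFFu;
    return 0;
}
```

With the guard, a packet shorter than its trailer is rejected (payload_crc returns -1) instead of being checksummed over a wrapped-around length. For the usual check string "123456789" followed by a 4-byte trailer, payload_crc yields 0xCBF43926, the standard CRC-32 check value.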

Jan 18 09:14:50 tm frcb10: new peer 172.23.2.8:10210

#Started a new cipe tunnel to prove that the module still
#works.

Jan 18 09:15:36 tm cipe_dev_close: not owned??

#ifconfig crashed_cipe_device down -> removed from the
#device list, but the socket is still ESTABLISHED :((

Jan 18 09:16:15 tm frcb: CIPE driver vers 1.5.2 (c) Olaf Titz 1996-2000, 250 channels, debug=0
Jan 18 09:17:53 tm frcb: CIPE driver vers 1.5.2 (c) Olaf Titz 1996-2000, 100 channels, debug=1
Jan 18 09:17:53 tm frcb: cipe_alloc_dev 0

#Removed the module and insmod'ed it again. No problem. But
#you cannot initialize the crashed_cipe_device anymore,
#because:

(netstat -an)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp     2448      0 0.0.0.0:10209           0.0.0.0:*               ESTABLISHED

This, IMHO, is a kernel bug. Even if the cipe module oopses
and keeps working, the kernel should eventually close the
socket. The problem with the above is that even though the
device is no longer listed in kernel space, we can still
send packets to this IP, filling up the RX queue and wasting
kernel memory. The queue grows up to 132 kB!
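The stuck socket can also be watched without netstat. netstat takes its Recv-Q for UDP from /proc/net/udp, where each line carries the local address as hex IP:port, a hex state (01 is what netstat prints as ESTABLISHED), and tx_queue:rx_queue as hex byte counts. A minimal user-space sketch, assuming that column layout (it may differ between kernel versions); `struct udp_sock_info`, `parse_proc_net_udp_line` and `rx_queue_for_port` are hypothetical names:

```c
#include <stdio.h>

/* Subset of one /proc/net/udp entry (ASSUMED column layout). */
struct udp_sock_info {
    unsigned local_port;
    unsigned state;          /* 01 = shown by netstat as ESTABLISHED */
    unsigned long tx_queue;  /* bytes, netstat's Send-Q */
    unsigned long rx_queue;  /* bytes, netstat's Recv-Q */
};

/* Parse one data line; returns 1 on success, 0 for the header or junk. */
static int parse_proc_net_udp_line(const char *line, struct udp_sock_info *out)
{
    unsigned long laddr, raddr;
    unsigned rport;
    /* "   0: 00000000:27E1 00000000:0000 01 00000000:00000990 ..." */
    int n = sscanf(line, " %*d: %lx:%x %lx:%x %x %lx:%lx",
                   &laddr, &out->local_port, &raddr, &rport,
                   &out->state, &out->tx_queue, &out->rx_queue);
    return n == 7;
}

/* Scan an open /proc/net/udp stream for a local port.
 * Returns 1 and stores the RX queue size if found, else 0. */
static int rx_queue_for_port(FILE *f, unsigned port, unsigned long *rx_queue)
{
    char line[512];
    struct udp_sock_info info;

    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_proc_net_udp_line(line, &info) && info.local_port == port) {
            *rx_queue = info.rx_queue;
            return 1;
        }
    }
    return 0;
}
```

On the affected box, opening /proc/net/udp with fopen() and calling rx_queue_for_port(f, 10209, &rxq) should report the same 2448 bytes netstat shows above; polling it while sending packets to the dead tunnel shows whether the queue keeps growing toward the 132 kB ceiling.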

Has anyone done any further analysis on this? We've been
chasing this bug for months already. Sorry for neither
releasing an advisory on Bugtraq nor contacting you, Olaf.
We thought it had something to do with the other patches I
made to the kernel, but it crashes even on a plain vanilla
kernel.

HTH,
Roberto Nibali, ratz

-- 
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc




