To: <cipe-l,AT,inka,DOT,de>
Subject: RE: Tunnel collapses with "Bad file descriptor"
From: "Mark" <msalists,AT,gmx,DOT,net>
Date: Fri, 14 May 2004 11:08:17 -0700
Importance: Normal
In-reply-to: <40A4ACEF.5020209@ray.fi>

Since I upgraded the tunnels to release 18 two days ago, it has not crashed
any more. It had worked without problems before that, though; the crashes
pretty much started overnight, and from then on it would barely run for an
hour without crashing. I have some frequent ping traffic going over the
tunnel, but besides that almost no traffic.
I also upgraded the kernel when I upgraded to 1.4.5-18, so maybe that fixed
the problem. I do not remember whether the crashes originally started after
a kernel upgrade, but I don't think I made any kernel changes around the
time they started.

Anyway, I installed your new bugfix release number 19 - thanks for providing
this. Let's hope my problems are fixed :)

MARK

-----Original Message-----
From: owner-cipe-l,AT,inka,DOT,de [mailto:owner-cipe-l,AT,inka,DOT,de] On Behalf Of Tommi Kyntola
Sent: Friday, May 14, 2004 4:27 AM
Cc: cipe-l,AT,inka,DOT,de
Subject: Re: Tunnel collapses with "Bad file descriptor"

Hello Mark,

I very much doubt that the version difference is causing those problems.
I've had different versions running together without any problems. Besides,
that portion of the code should not have anything to do with the other end.

Mike's posting below, on the other hand, is totally valid. I've also had cipe
daemons stopping because of EINTRs occurring there on Fedora kernels.
Moreover, Mike's fix seems valid, and I haven't experienced a single problem
since applying it.
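
For illustration only, here is a minimal sketch of the general technique
involved: retrying read() when it fails with EINTR. This is not Mike's actual
patch and not the code in ciped.c; the helper name read_full() and its return
convention are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the cipe sources): read exactly len
 * bytes from fd, restarting read() when a signal interrupts it.
 * Returns 0 on success, -1 on a real error (errno set by read())
 * or on unexpected end of file. */
int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real error, e.g. EBADF */
        }
        if (n == 0)
            return -1;      /* EOF; not expected from /dev/urandom */
        p += n;             /* short read: keep going for the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

With a loop like this, an "Interrupted system call" never reaches the caller,
while a genuine EBADF still does.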

I've made a cipe-1.4.5-19 rpm for Fedora that incorporates that fix at:
http://www.hut.fi/u/tkyntola/linux/

However, what you described was an EBADF in the same place, which is really
strange. You might still want to try it with that fix included to rule out
the possibility of a wrong error message being printed there, because I
cannot see how the fd for /dev/urandom could get closed or corrupted.
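
If the EBADF turns out to be real, one way to narrow it down would be to
check the descriptor at a few points before the failing read(). The sketch
below is only an assumption about how one might do that; check_fd() is a
hypothetical helper and nothing like it exists in cipe. It relies on
fcntl(fd, F_GETFD) failing with EBADF when fd is not an open descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic (not part of cipe): report whether fd still
 * refers to an open descriptor. "where" is a label for the log line.
 * Returns 1 if fd is open, 0 if it is closed or invalid. */
int check_fd(const char *where, int fd)
{
    assert(where != NULL);
    if (fcntl(fd, F_GETFD) == -1) {
        /* F_GETFD fails with EBADF when fd is not open */
        fprintf(stderr, "%s: fd %d is not open: %s\n",
                where, fd, strerror(errno));
        return 0;
    }
    return 1;
}
```

Calling something like this on entry to the key exchange code would show
whether the descriptor was already bad before the read() was attempted.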

cheers,
        Tommi "Kynde" Kyntola           tommi.kyntola,AT,ray,DOT,fi

> I just noticed that the two tunnel ends were running different 
> versions of cipe. One had 1.4.5-18 (Fedora/RedHat), the other had 
> 1.4.5-16 (Fedora/RedHat). Could this have been the reason for the 
> problems?
> 
> Thanks,
> 
> MARK
> 
> 
> -----Original Message-----
> From: owner-cipe-l,AT,inka,DOT,de [mailto:owner-cipe-l,AT,inka,DOT,de] On Behalf Of Michael Fischer
> Sent: Sunday, April 18, 2004 3:27 PM
> To: Mark
> Cc: cipe-l,AT,inka,DOT,de
> Subject: Re: Tunnel collapses with "Bad file descriptor"
> 
> 
> Dear Mark,
> 
> I've had problems in the past with my cipe daemon dying at this same 
> spot in the code, but I got the error "Interrupted system call", which 
> can happen normally during a blocking read and is not handled properly 
> by the cipe code.  (See ciped.c in the source around the line that 
> contains the text "kxchg: read(r)".)  What that code is trying to do 
> is to get some random bits from /dev/urandom.  /dev/urandom is opened 
> by main().  Its file descriptor is passed to mainloop() as the third 
> argument and is in turn passed to kxchg(), which generates the 
> observed error message.  The return from the original open() call is 
> checked for validity, so it isn't at all clear why the read() call in 
> kxchg() finds the descriptor bad.  Perhaps /dev/urandom is somehow 
> getting closed, or the descriptor is getting corrupted, or an 
> incorrect error message is getting logged and you're really 
> encountering the same problem that I was.
> 
> You can find my old posting about the "Interrupted system call" error 
> on the cipe-l archives at 
> http://sites.inka.de/bigred/archive/cipe-l/2003-12/msg00003.html
> Good luck at tracking this one down!
> 
> --Mike
> 
> Mark wrote:
> 
> 
>>Hi,
>>
>>I have a cipe (v.1.4.5) tunnel running between a Red Hat 9 and a Fedora 
>>Core 1 machine. The tunnel gets established successfully and works 
>>fine until at some point - sometimes after a few hours, sometimes 
>>after a few days - it collapses with this error message:
>>
>>Apr 14 19:22:50 lvd1 ciped-cb[2658]: kxchg: read(r): Bad file descriptor
>>Apr 14 19:22:50 lvd1 ciped-cb[2658]: Interface stats 22552096 201358 4 0 0 1 0 0 0 0 0 0 0 0 0 0
>>Apr 14 19:22:50 lvd1 ciped-cb[2658]: KX stats: rreq=0, req=335, ind=336, indb=0, ack=328, ackb=0, unknown=0
>>Apr 14 19:22:50 lvd1 ciped-cb[2658]: cipcb1: daemon exiting
>>
>>Any idea where this might come from?
>>The machines are sitting locally within a LAN - it's just a test setup 
>>for now...
>>
>>Thanks,
>>
>>MARK
>>
>>
>>--
>>Message sent by the cipe-l,AT,inka,DOT,de mailing list.
>>Unsubscribe: mail majordomo,AT,inka,DOT,de, "unsubscribe cipe-l" in body 
>>Other commands available with "help" in body to the same address. CIPE 
>>info and list archive: 
>><URL:http://sites.inka.de/~bigred/devel/cipe.html>
>>
> 
> 
