[ << | Thread Index | >> ]    [ << | Date Index | >> ]

Subject: Re: [PATCH 2.2.17] _SMP_ doesn't export tasklist_lock needed in cipe.h
From: ratz <ratz,AT,tac,DOT,ch>
Date: Fri, 1 Dec 2000 14:58:34 +0100
In-reply-to: <3A22A136.35147233@tac.ch>

Hi Olaf,

Apologies for the late reply but I was sick (no worries, it wasn't
because of a non-funtional cipe-tunnel) and therefor haven't been
able to reply.

Olaf Titz wrote:
> 
> > DON'T try to run cipe-1.4.x on a SMP machine. It crashes really
> > badly. I try to reconstruct it but this will take some time.
> 
> For which .x and (more importantly) for which Linux version? I'd very
> much like to know. I don't know much about the whole SMP stuff and
> don't have an SMP system to test it.

It was 1.4.3 and btw I don't know much either about SMP I just can't
get it working under 2.2.17. As soon as I try to ping the kernel crashes
really badly. Not even sysrq is working. I haven't tried any kdb attemps
yet because on UP with the same configuration it's working and I do need
it now. However I'd like to find out anyway what this behavior is like.
Ahem, Kambiz, you stated that you're running the same cipe-version with
an SMP kernel. Do you have SMP on both sides of the cipe tunnel?
 
> About the tasklist lock: I have changed tasklist_LOCK() to NOP in CIPE
> 1.4.5 and 1.5 for Linux 2.2. I'm not really sure if this is correct

I'll check out that version first because I did the same thing for 1.4.3.
The NOP seems to cure the problem. But I have to investigate further.

> but if it isn't correct, the result should be a race condition with
> fork() and exit(), which could cause CIPE to accidentally re-assign a
> used device, leading to EBUSY in ciped, at worst. Surely not a
> reproducable crasher.

ouch!
 
> If it does crash because of this problem, then the fix for the Linux
> kernel is the right one, as modules which need to access the task list
> _must_ have access to that lock, then.

Agreed.
 
> > the ../include/asm/spinlock.h is causing the trouble:
> 
> Are you _sure_ you're reading this right? (Not my code, btw.)
> That not working would basically mean Linux under SMP were unusable at
> all.

Nope, I'm not sure, just trying to follow into the depths of SMP
process locking. If you code does a tasklist_LOCK() then IMHO it
has to call the read_lock(&tasklist_lock) which would be defined
in the above mentioned file for SMP.
 
> > /*
> >  * On x86, we implement read-write locks as a 32-bit counter
> >  * with the high bit (sign) being the "write" bit.
> >  *
> >  * The inline assembly is non-obvious. Think about it.
> >  */
> > #define read_lock(rw)   \
> >         asm volatile("\n1:\t" \
> >                      "lock ; incl %0\n\t" \
> >                      "js 2f\n" \
> >                      ".section .text.lock,\"ax\"\n" \
> >                      "2:\tlock ; decl %0\n" \
> >                      "3:\trep; nop\n\t" \
> >                      "cmpl $0,%0\n\t" \
> >                      "js 3b\n\t" \
> >                      "jmp 1b\n" \
> >                      ".previous" \
> >                      :"=m" (__dummy_lock(&(rw)->lock)))
> > #define read_unlock(rw) \
> >         asm volatile("lock ; decl %0" \
> >                 :"=m" (__dummy_lock(&(rw)->lock)))
> >
> > Are you sure about this, Olaf?
> 
> As stated in the header file, it is non-obvious. ;-)

Do you really get it? I don't, honestly this is too complicated for me.
Wait another 4 or 5 years and I tell you what this means ;)

> The .section and .previous commands cause code to be moved around, and
> the actual generated machine code looks like this:

Jeez', you really understand it :)

>   ; Start of read_lock();
> 1:      lock; incl %0
>         js 2f           ; Jump to spinning code if write-locked
>   ; End of read_lock(); C code continues here
> 
>   ; Elsewhere in the code segment:
> 2:      lock; decl %0
> 3:      rep; nop
>         cmpl $0, %0
>         js 3b           ; Respin if still locked
>         jmp 1b          ; Unlocked, retry to acquire lock
> 
> which looks much less weird. The reason for moving the code around is to
> make the common case (i.e. unlocked and getting the lock succeeds) as
> short as possible and get the non-common case out of the cache lines.

Thank you, this make sense now. Ok, I agree with you, that this isn't
to cause of the trouble.
 
> > > This trivial patch includes it. However I ask the author to cross-
> > > check its correctness since I'm not sure if he really wanted it
> > > that way because I can load the module but I can't get a tunnel
> > > interface configured and running with ciped-cb -o myconfig. But
> > > this may also be due to my lack of knowledge about how cipe works.
> 
> What are the error messages, if any? (First, the argument to -o has to
> be an absolute path.)

Yep, found that out too :)
Now under UP I got it working once but then I changed something in the
source code (actually DEVNAME from 'cip' to 'vpn') and now it doesn't
work anymore. It sais:

kxchg: send: Invalid argument

[note: I had this before when trying to call connect() with a wrong IP
configured in myconfig. It calls err("opendev: bind"); which is not
really helpful. I found it out by stracing ciped. Would you mind including
following patch?

--------------------------------------------------------------------------
--- ciped.c~    Fri Dec  1 14:33:38 2000
+++ ciped.c     Fri Dec  1 14:33:55 2000
@@ -419,7 +419,7 @@
             err("opendev: bind");
         if (OA(peer)) {
             if (connect(f, (struct sockaddr *)OA(peer), sizeof(*OA(peer)))<0)
-                err("opendev: bind");
+                err("opendev: connect");
         } else
             amiss("peer");
     }
--------------------------------------------------------------------------

Anyway, I think the return value of -EINVAL is not the best choice.]

vpncb0: changing my address: 10.1.2.1 [note: this is the peer's physical
address!]
vpncb0: re-keying cryptpad
vpncb0: encrypt typ 0 pad 7 len 104
vpncb0: returning 9 flags 86f4c3a8
 
> > Is Olaf reading this mailinglist or should I contact him privately?
> 
> I'm reading the mailinglist and trying to respond to anything I
> consider important (and possible bugs in my code do certainly fall
> into that category ;-)

I'm sorry being so impatient. Hope you're not too grumpy now ;)

Best regards and thank you for replying and clarification,
Roberto Nibali, ratz

-- 
mailto: `echo NrOatSz,AT,tPacA,DOT,cMh | sed 's/[NOSPAM]//g'`





[ << | Thread Index | >> ]    [ << | Date Index | >> ]