[Kittyhawk] KH is not fault-tolerant/fault oblivious

Eric Van Hensbergen ericvh at gmail.com
Mon Sep 12 14:44:40 EDT 2011


Got a new behavior during some poking around today:

   {0}.0: bgtree: inject fifo timed out!

...and then everything hung.  Of course, this could just be the IO node
responding to a jammed up compute node tree level.
FWIW, I hit this on a 64-node bump doing the same sort of operations I
was doing the other day, although it was less deterministic (it seemed
to happen after I poked it several times rather than on the first
try).  There were no other error messages on the console.

         -eric


On Mon, Sep 12, 2011 at 9:16 AM, Jonathan Appavoo <jappavoo at bu.edu> wrote:
> We looked at it briefly on Friday and are going to start debugging it now.
>
> Jonathan.
>
> On Sep 12, 2011, at 9:29 AM, Eric Van Hensbergen wrote:
>
>> Actually -- looking at the code, what Jan says is consistent: assuming
>> it's returning from the alloc_skb, it should be printing a message about
>> not being able to alloc (within the tree driver) and dropping the
>> packet, which should dequeue it from the fifo.  But I'm never seeing the
>> message because TRACE() is defined NULL.  I'll fix that for my future
>> runs.
>>
>> It could be just other parts of the system are more unhappy than us
>> without GFP_ATOMIC allocation, but we are only seeing stack dumps from
>> the tree receive interrupt....
>>
>> On Mon, Sep 12, 2011 at 6:39 AM, Jan Stoess <jan.stoess at kit.edu> wrote:
>>> On 9/9/2011 11:15 PM, ron minnich wrote:
>>>
>>> ow.
>>>
>>> Wouldn't it make sense to have a less fatal reaction to out of memory
>>> conditions, like "drop packet" ...
>>>
>>> It seems to me that this is just a stack dump, right? Doesn't mean that this
>>> is killing Linux already. Might be that it's just unresponsive as hell
>>> dumping out all those skb allocation failure messages, but eventually bails
>>> out somewhere else.
>>>
>>>
>>> --
>>> Dr. Jan Stoess, KIT System Architecture Group
>>> Phone: +49 (721) 6084 4056
>> _______________________________________________
>> Kittyhawk mailing list
>> Kittyhawk at cs.bu.edu
>> http://cs-mailman.bu.edu/mailman/listinfo/kittyhawk
>
>
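
For reference, the drop-on-allocation-failure path described in the quoted
thread can be sketched in user-space C roughly as below.  This is only an
illustration of the idea (log the failure, drop the packet, still dequeue it
so the fifo drains); it is not the Kittyhawk tree driver code.  The names
fake_fifo, rx_stats, fifo_push, try_alloc and rx_one are hypothetical, the
TRACE definition shown is an assumed replacement for an empty one, and
malloc() stands in for an atomic skb allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical trace macro. If it expanded to nothing, the drop
 * path in rx_one() would still run but would never be visible. */
#define TRACE(...) fprintf(stderr, __VA_ARGS__)

#define FIFO_SLOTS 8

/* Toy stand-in for a hardware receive fifo: a ring of packet lengths. */
struct fake_fifo {
    int head;             /* index of the next packet to read */
    int count;            /* number of packets waiting */
    int len[FIFO_SLOTS];  /* payload length of each queued packet */
};

struct rx_stats {
    unsigned delivered;
    unsigned dropped;
};

/* Queue a packet of len bytes; returns -1 if the fifo is full. */
static int fifo_push(struct fake_fifo *f, int len)
{
    assert(len > 0);  /* keeps malloc(0) out of the picture */
    if (f->count == FIFO_SLOTS)
        return -1;
    f->len[(f->head + f->count) % FIFO_SLOTS] = len;
    f->count++;
    return 0;
}

/* Stand-in for an allocation that may fail; simulate_oom forces failure. */
static void *try_alloc(size_t len, int simulate_oom)
{
    return simulate_oom ? NULL : malloc(len);
}

/* Handle one packet. Returns 0 if the fifo was empty, 1 otherwise. */
static int rx_one(struct fake_fifo *f, struct rx_stats *st, int simulate_oom)
{
    if (f->count == 0)
        return 0;

    int len = f->len[f->head];
    void *buf = try_alloc((size_t)len, simulate_oom);

    /* Dequeue in both cases, so a failed allocation cannot leave the
     * packet stuck at the head of the fifo. */
    f->head = (f->head + 1) % FIFO_SLOTS;
    f->count--;

    if (buf == NULL) {
        TRACE("rx: cannot allocate %d bytes, dropping packet\n", len);
        st->dropped++;
        return 1;
    }

    /* A real driver would copy the payload and hand it up the stack. */
    st->delivered++;
    free(buf);
    return 1;
}
```

Calling rx_one() with simulate_oom set to 1 on a non-empty fifo counts one
dropped packet and still shrinks the fifo by one entry, which is the
"less fatal reaction" suggested in the thread.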


