[Kittyhawk] Another trace

Eric Van Hensbergen ericvh at gmail.com
Mon Sep 19 16:52:11 EDT 2011

Yeah, my slightly hacky approach was probably similar to your fix
(essentially I added a tree_lock and restructured a bit of code).  I am
now getting the bad page fault in con_flush, which doesn't make sense
to me:

{4}.0 IAR:  0xc029c580    LR:  0xc01d7aa4  spin_lock_irqsave in bgtree_xmit
{4}.1 IAR:  0xc0007b74    LR:  0xc0007b88  cpu_idle
{4}.2 IAR:  0xc0010cb8    LR:  0xc0010c54  smp_call_function_map (spinning IPI)
{4}.3 IAR:  0xc0007b74    LR:  0xc0007b88  cpu_idle
haredebug[ANL-R00-M0-N00-64]>{4} dump_stack
Sorry, you are not allowed to run dump_stack
Type "help" to get a list of supported JTAG commands
haredebug[ANL-R00-M0-N00-64]>{4} dump_stacks
{4}.0 Dump stack
Core 0 Stack Dump
0xdfeb98e0  0x00000000
0xdfeb9930  0xc01dec64  bgtree_xmit
0xdfeb9950  0xc01dece0  do_write
0xdfeb9970  0xc002d9ec  __call_console_drivers
0xdfeb99b0  0xc002df08  release_console_sem
0xdfeb9a40  0xc002e7ec vprintk
0xdfeb9a80  0xc002e900 printk
0xdfeb9a90  0xc0013030 bad_page_fault
0xdfeb9b50  0xc000fa70
0xdfeb9b80  0xc01de64c __bgtree_xmit call in con_flush

Of course, the bad page fault is only half the problem; the other
half is that we need some way of detecting that we are faulting inside
the console/tree code, so that we don't deadlock trying to transmit the
console message that reports the error.
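A minimal userspace sketch of that detection, assuming a per-CPU "inside console transmit" marker is acceptable. Everything named here (console_lock, in_console_xmit, console_write, report_error, flush_deferred) is hypothetical: the lock stands in for bgcon->lock/tree->lock, a C11 thread-local flag stands in for a per-CPU variable, and a direct call to report_error stands in for the fault handler's printk. When the flag is set, the error path stashes the message instead of re-taking the lock it already holds. Build with -pthread.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the console/tree lock. A default (non-recursive)
 * mutex: re-locking it from the same thread would hang, much like re-taking
 * a spinlock from a fault handler on the same CPU. */
static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-CPU marker, modeled as a thread-local flag:
 * true while this thread is inside the console transmit path. */
static _Thread_local bool in_console_xmit;

/* Messages that could not be sent without deadlocking. Protected by
 * console_lock, because only the lock holder can have in_console_xmit set. */
static char deferred[256];
static size_t deferred_len;

/* Error reporting as a fault handler would do it (stand-in for printk). */
static void report_error(const char *msg)
{
    if (in_console_xmit) {
        /* Faulted inside the console path: stash the message instead of
         * taking console_lock again. Truncate if the buffer is full. */
        size_t room = sizeof(deferred) - 1 - deferred_len;
        size_t n = strlen(msg);
        if (n > room)
            n = room;
        memcpy(deferred + deferred_len, msg, n);
        deferred_len += n;
        deferred[deferred_len] = '\0';
        return;
    }
    pthread_mutex_lock(&console_lock);
    fputs(msg, stdout);
    pthread_mutex_unlock(&console_lock);
}

/* Console write path; simulate_fault makes it "fault" mid-transmit. */
static void console_write(const char *msg, bool simulate_fault)
{
    pthread_mutex_lock(&console_lock);
    in_console_xmit = true;
    fputs(msg, stdout);
    if (simulate_fault)
        report_error("fault inside console path\n");
    in_console_xmit = false;
    pthread_mutex_unlock(&console_lock);
}

/* Emit deferred messages once no console transmit is in progress.
 * Returns the number of bytes flushed. */
static size_t flush_deferred(void)
{
    pthread_mutex_lock(&console_lock);
    size_t n = deferred_len;
    if (n > 0)
        fputs(deferred, stdout);
    deferred_len = 0;
    deferred[0] = '\0';
    pthread_mutex_unlock(&console_lock);
    return n;
}
```

For comparison, several Linux serial console drivers of this era handle the same situation by using spin_trylock instead of spin_lock while oops_in_progress is set, trading a possibly garbled message for not hanging.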


On Mon, Sep 19, 2011 at 2:31 PM, Dan Schatzberg <dschatz at bu.edu> wrote:
> Jonathan and I looked at this for a bit. We've found a lock ordering issue
> that can cause a deadlock. Working on confirming it and a fix.
> ---
> Dan Schatzberg
> On Sun, Sep 18, 2011 at 10:37 AM, Eric Van Hensbergen <ericvh at gmail.com> wrote:
>> Looks like maybe a deadlock if we print during a fault handler while
>> someone else is holding tree->lock (?)
>> Of course, the fourth core is timing out talking to the other cores
>> because they aren't responding to an IPI, so maybe there is some sort
>> of deadlock between bgcon->lock ({5}.0) and tree->lock ({5}.2).
>>       -eric
>> haredebug[ANL-R00-M0-N14-64]>{5} dumpier
>> OK
>> {5}.0 IAR:  0xc029c18c    LR:  0xc01de72c      (spin_lock in con_flush) (bgcon->lock)
>> {5}.1 IAR:  0xc0007b74    LR:  0xc0007b88      (cpu_idle)
>> {5}.2 IAR:  0xc029c4c8    LR:  0xc01d8094      (spin_lock_irqsave in bgtree_enable_inject_wm_interrupt) (tree->lock)
>> {5}.3 IAR:  0xc029c4c8    LR:  0xc01d7a64      (spin_lock in bgtree_xmit) (tree->lock)
>> haredebug[ANL-R00-M0-N14-64]>{5} dump_stacks
>> OK
>> {5}.0 Dump stack
>> Core 0 Stack Dump
>> 0xeff89c90  0xc01de7d0     con_flush
>> 0xeff89cc0  0xc01d83f4     bgtree_inject_interrupt
>> 0xeff89ce0  0xc00576e4    handle_IRQ_event
>> 0xeff89d00  0xc0059754    handle_fasteoi_irq
>> 0xeff89d20  0xc0004418    do_IRQ
>> 0xeff89de0  0xc000fc04     ret_from_except
>> 0xeff89e10  0xc01deae0    bg_tty_write
>> 0xeff89e20  0xc019101c    tty_default_put_char
>> 0xeff89e30  0xc0196778    opost
>> 0xeff89f30  0xc0197908     n_tty_receive_buf
>> 0xeff89f60  0xc0191d60     flush_to_ldisc
>> 0xeff89f90  0xc0044bf0      run_workqueue
>> 0xeff89fd0  0xc0045314     worker_thread
>> 0xeff89ff0  0xc004a0d4      kthread
>> 0x00000000  0xc0010658
>> {5}.1 Dump stack
>> Core 1 Stack Dump
>> 0xeffc5fe0  0xc0007b88     cpu_idle
>> 0xeffc5ff0  0xc00109bc
>> 0x00000000  0xc0000258
>> {5}.2 Dump stack
>> Core 2 Stack Dump
>> 0xc049de10  0xc01d7a88   bgtree_xmit
>> 0xc049de20  0xc01de9ac   enqueue_retransmit (calling bgtree_enable_inj_wm_interrupt)
>> 0xc049de50  0xc01deae0   bg_tty_write (calling enqueue_retransmit)
>> 0xc049deb0  0xc0198808  write_chan
>> 0xc049def0  0xc0193808   tty_write
>> 0xc049df10  0xc008c7e0   vfs_write
>> 0xc049df40  0xc008c9b8   file_pos_write
>> 0x7f8e6960  0xc000f5c4    ret_from_syscall
>> Invalid value in Link register: Oldframe=0xc049df40  Newframe=0x7f8e6960
>> {5}.3 Dump stack
>> Core 3 Stack Dump
>> 0xdfeebb80  0xc140692c
>> 0xdfeebbd0  0xc01dde44     do_write
>> 0xdfeebbf0  0xc01ddec0      bg_console_write
>> 0xdfeebc10  0xc002d9ec     __call_console_drivers
>> 0xdfeebc50  0xc002df08     release_console_sem
>> 0xdfeebce0  0xc002e7ec     vprintk
>> 0xdfeebd20  0xc002e900     printk
>> 0xdfeebd70  0xc0010cf0      smp_call_function_map (timing out waiting for a response from another CPU)
>> 0xdfeebda0  0xc0010e08     smp_flush_tlb_page
>> 0xdfeebde0  0xc006d4c0     do_wp_page
>> 0xdfeebe70  0xc006ffa8       __handle_mm_fault
>> 0xdfeebf40  0xc00132a4      do_page_fault
>> 0x7f8e6940  0xc000fa00      handle_page_fault
>> Invalid value in Link register: Oldframe=0xdfeebf40  Newframe=0x7f8e6940
>> _______________________________________________
>> Kittyhawk mailing list
>> Kittyhawk at cs.bu.edu
>> http://cs-mailman.bu.edu/mailman/listinfo/kittyhawk

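The lock-ordering problem Dan describes, and the cycle Eric suspects between bgcon->lock ({5}.0) and tree->lock ({5}.2), is the classic ABBA pattern: one path holds A and wants B while another holds B and wants A. A minimal userspace sketch of the usual cure, assuming the chosen rule is "bgcon->lock before tree->lock" (the actual order in the fix is not stated in the thread): every path that needs both locks goes through one helper that acquires them in that fixed order. The mutexes, helper names and worker are hypothetical stand-ins, not the Kittyhawk driver code. Build with -pthread.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for bgcon->lock and tree->lock. */
static pthread_mutex_t bgcon_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

static long shared_counter; /* modified only while holding both locks */

/* Assumed ordering rule: bgcon_lock is always taken before tree_lock. */
static void lock_bgcon_then_tree(void)
{
    pthread_mutex_lock(&bgcon_lock);
    pthread_mutex_lock(&tree_lock);
}

/* Release in reverse order of acquisition. */
static void unlock_tree_then_bgcon(void)
{
    pthread_mutex_unlock(&tree_lock);
    pthread_mutex_unlock(&bgcon_lock);
}

#define ITERATIONS 100000

/* Used for both modeled paths (a con_flush-like path and a
 * bg_tty_write/enqueue_retransmit-like path). If either path took
 * tree_lock first instead, the two threads could each end up holding
 * one lock while waiting forever for the other. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_bgcon_then_tree();
        shared_counter++;
        unlock_tree_then_bgcon();
    }
    return NULL;
}

/* Runs both paths concurrently; returns the final counter, or -1 if a
 * thread could not be created. */
static long run_both_paths(void)
{
    pthread_t flush_thread, tty_thread;

    shared_counter = 0;
    if (pthread_create(&flush_thread, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&tty_thread, NULL, worker, NULL) != 0) {
        pthread_join(flush_thread, NULL);
        return -1;
    }
    pthread_join(flush_thread, NULL);
    pthread_join(tty_thread, NULL);
    return shared_counter;
}
```

In the kernel a fixed order alone is not sufficient when one of the locks is also taken from interrupt context, as con_flush is when reached from bgtree_inject_interrupt in the trace above: the non-interrupt paths must also take that lock with interrupts disabled (for example with spin_lock_irqsave), or an interrupt on the same CPU can spin on a lock its own CPU already holds.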