[haiku-bugs] Re: [Haiku] #8650: KDL launching WebPositive development version

  • From: "bonefish" <trac@xxxxxxxxxxxx>
  • Date: Sat, 30 Jun 2012 18:07:30 -0000

#8650: KDL launching WebPositive development version
----------------------+----------------------------
   Reporter:  aldeck  |      Owner:  nobody
       Type:  bug     |     Status:  new
   Priority:  high    |  Milestone:  R1
  Component:  System  |    Version:  R1/Development
 Resolution:          |   Keywords:
 Blocked By:          |   Blocking:
Has a Patch:  0       |   Platform:  All
----------------------+----------------------------

Comment (by bonefish):

 Replying to [comment:8 anevilyak]:
 > Replying to [comment:5 bonefish]:
 > > So maybe it's not an unsafe syscall after all, but the faults are
 caused by a stack corruption with an entirely different cause. Syscall
 tracing might help to clarify that point, but I don't really have the time
 to play with that ATM.
 >
 > That case unfortunately appears to be the more likely cause. I've gone
 through and reviewed all usage of user_strlcpy in the kernel, and there
 doesn't appear to be a tangible culprit to fit the bill as far as that's
 concerned.

 Have you checked the `user_memcpy()` uses as well? There are significantly
 more of those.

 A kernel stack address of 0 is not permissible, but the address
 specification is `B_ANY_KERNEL_ADDRESS` in this case, i.e. the address
 passed in is ignored. I don't think mapping the stacks has anything to do
 with the problem. It's just the last VM operation when creating the
 thread.

 If you want to track this further, I'd recommend the following:
  * Add a `panic()` right after the `dprintf()` we see in the syslog
 ([http://cgit.haiku-os.org/haiku/tree/src/system/kernel/vm/vm.cpp#n4037]).
 When I tested it, the stack was already corrupt at that point, so there is
 no reason to continue.
  * Enable kernel tracing for threads and syscalls. This helps to see the
 syscalls after the creation of the thread until its demise. Whether the
 cause is a `user_memcpy()` or not, it is very likely that the corruption
 happens in a syscall or code called from it. To help with checking the
 `user_memcpy()` theory add a `ktrace_printf()` in it. Since
 `user_memcpy()` in syscalls usually copies from or to the kernel stack, it
 is probably not easy to see whether the stack is overwritten
 erroneously. But maybe something catches the eye (like source and
 destination addresses both being kernel addresses or a huge copy size).
  * If the thread overwrites its own stack, then likely the last syscall is
 to blame (well, there could be an interrupt where this happens, but
 interrupts happen all the time). Inserting `ktrace_printf()`s at some
 places and enabling stack traces for them (they need to be long enough)
 should help to narrow down when exactly the stack gets broken.
  * If some other thread overwrites this thread's stack or if the heap is
 corrupted, things are more complicated. A possible strategy is to try and
 find out what is corrupted and add respective sanity checks at various
 places. This would at least narrow it down. Hopefully some more useful
 lead presents itself.
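
 To illustrate the `user_memcpy()` check suggested above, here is a minimal
 standalone sketch of the kind of heuristic one might evaluate before
 tracing a copy. It is not the kernel's code: the `sketch_*` names, the
 kernel base address, and the "huge" threshold are all assumptions made up
 for illustration, and `printf()` stands in for `ktrace_printf()` so the
 snippet compiles in userland.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed values for illustration only. The real kernel address space
 * layout depends on the architecture, and the threshold is arbitrary. */
#define SKETCH_KERNEL_BASE ((uintptr_t)0x80000000u)
#define SKETCH_HUGE_COPY   ((size_t)(64u * 1024u))

/* Hypothetical helper: true if the address lies in the assumed kernel range. */
static bool
sketch_is_kernel_address(uintptr_t address)
{
	return address >= SKETCH_KERNEL_BASE;
}

/* Flags the two patterns mentioned above: source and destination both being
 * kernel addresses, or a huge copy size. */
static bool
sketch_copy_looks_suspicious(uintptr_t to, uintptr_t from, size_t size)
{
	bool bothKernel = sketch_is_kernel_address(to)
		&& sketch_is_kernel_address(from);
	bool hugeCopy = size >= SKETCH_HUGE_COPY;
	return bothKernel || hugeCopy;
}

/* Stand-in for a trace entry added inside user_memcpy(). In the kernel this
 * would be a ktrace_printf() call; printf() is used here instead. */
static bool
sketch_trace_copy(uintptr_t to, uintptr_t from, size_t size)
{
	bool suspicious = sketch_copy_looks_suspicious(to, from, size);
	printf("user_memcpy(to %#lx, from %#lx, size %zu)%s\n",
		(unsigned long)to, (unsigned long)from, size,
		suspicious ? " SUSPICIOUS" : "");
	return suspicious;
}
```

 Marking the suspicious entries makes them easy to search for in the
 tracing output. A hit only says the copy is worth a closer look, not that
 it is the culprit.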

-- 
Ticket URL: <http://dev.haiku-os.org/ticket/8650#comment:9>
Haiku <http://dev.haiku-os.org>
Haiku - the operating system.
