[csw-maintainers] [ITP] gdb 7.2

Peter FELECAN pfelecan at opencsw.org
Thu Jan 6 17:15:42 CET 2011


For those curious, here is a strange domino game that I discovered
while testing the new gdb package. It's quite long, but I wrote it in
the spirit of sharing and documenting a typical issue encountered when
creating packages. Enjoy.

When debugging a trivial program:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
	printf("simple test\n");
	exit(0);
}

compiled and loaded into gdb as follows:

gcc -ggdb3 tgdb.c -o tgdb

gdb tgdb

an internal error is encountered:

GNU gdb (GDB) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-pc-solaris2.10".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /karmamaya/tmp/tgdb...done.
(gdb) b main
Breakpoint 1 at 0x80506e0: file tgdb.c, line 5.
(gdb) r
Starting program: /karmamaya/tmp/tgdb 
[New LWP 2]
[LWP 2 exited]
thread.c:598: internal-error: is_thread_state: Assertion `tp' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

thread.c:598: internal-error: is_thread_state: Assertion `tp' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) y
simple test
Abort (core dumped)

The stack trace as given by dbx is:

For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.6' in your .dbxrc
Reading gdb
core file header read successfully
Reading ld.so.1
Reading libintl.so.8.0.2
Reading libdl.so.1
Reading libncurses.so.5.7
Reading libz.so.1.2.5
Reading libsocket.so.1
Reading libnsl.so.1
Reading libm.so.2
Reading libiconv.so.2.5.0
Reading libresolv.so.2
Reading librt.so.1
Reading libpthread.so.1
Reading libpython2.6.so.1.0
Reading libexpat.so.1.5.2
Reading libc.so.1
Reading libsec.so.1
Reading libaio.so.1
Reading libmd.so.1
Reading libm.so.1
Reading libavl.so.1
Reading libc_db.so.1
t at 1 (l at 1) program terminated by signal ABRT (Abort)
0xfea19315: __lwp_kill+0x0015:  jae      __lwp_kill+0x23        [ 0xfea19323, .+0xe ]
Current function is set_internal_problem_cmd
 1229   {
(dbx) where                                                                  
current thread: t at 1
  [1] __lwp_kill(0x1, 0x6), at 0xfea19315 
  [2] _thr_kill(0x1, 0x6), at 0xfea14188 
  [3] raise(0x6), at 0xfe9c1d73 
  [4] abort(0x4, 0x8047590, 0x80475b8, 0x80989c0, 0x7fffffff, 0x7fffffff), at 0xfe9a1bbd 
=>[5] set_internal_problem_cmd(args = 0x1 "<bad address 0x1>", from_tty = 6), line 1229 in "utils.c"
  [6] internal_vproblem(problem = (nil), file = 0x80475b8 "\xd8u^D^H\x94\x8a^I^HV^B", line = 598, fmt = 0x832b865 "%s: Assertion `%s' failed.", ap = 0x804760c "\xeeY+^Hb\xb82^H^B"), line 1176 in "utils.c"
  [7] internal_verror(file = (nil), line = 598, fmt = 0x832b865 "%s: Assertion `%s' failed.", ap = 0x804760c "\xeeY+^Hb\xb82^H^B"), line 1191 in "utils.c"
  [8] wrap_here(indent = (nil)), line 2302 in "utils.c"
  [9] is_stopped(ptid = RECORD), line 605 in "thread.c"
  [10] is_exited(ptid = RECORD), line 611 in "thread.c"
  [11] switch_to_thread(ptid = RECORD), line 899 in "thread.c"
  [12] startup_inferior(ntraps = 134510296), line 471 in "fork-child.c"
  [13] procfs_create_inferior(ops = 0x83dbf18, exec_file = 0x83d9648 "/karmamaya/tmp/tgdb", allargs = 0x8502410 "", env = 0x84f45b8, from_tty = 1), line 4714 in "procfs.c"
  [14] target_create_inferior(exec_file = (nil), args = 0x8047808 "Hx^D^H\xdc\xec^T^HH\x96=^H^P$P^H\xb8EO^H^A", env = 0x84f45b8, from_tty = 0), line 486 in "target.c"
  [15] run_command_1(args = (nil), from_tty = 0, tbreak_at_main = 0), line 565 in "infcmd.c"
  [16] execute_command(p = 0x83c7cb9 "", from_tty = 134510712), line 422 in "top.c"
  [17] command_handler(command = 0x8047898 "\xc8x^D^H\x9a~^V^H^T"), line 498 in "event-top.c"
  [18] command_line_handler(rl = 0x8502380 "r"), line 702 in "event-top.c"
  [19] rl_callback_read_char(), line 205 in "callback.c"
  [20] rl_callback_read_char_wrapper(client_data = (nil)), line 178 in "event-top.c"
  [21] handle_file_event(data = UNION), line 817 in "event-loop.c"
  [22] process_event(), line 399 in "event-loop.c"
  [23] gdb_do_one_event(data = (nil)), line 464 in "event-loop.c"
  [24] catch_errors(func = 0x8167244 = &gdb_do_one_event(void *data), func_args = (nil), errstring = 0x8320fd0 "", mask = 6), line 518 in "exceptions.c"
  [25] tui_command_loop(data = (nil)), line 171 in "tui-interp.c"
  [26] current_interp_command_loop(), line 291 in "interps.c"
  [27] captured_command_loop(data = (nil)), line 227 in "main.c"
  [28] catch_errors(func = 0x808f4e0 = &`gdb`main.c`captured_command_loop(void *data), func_args = (nil), errstring = 0x8318b53 "", mask = 6), line 518 in "exceptions.c"
  [29] captured_main(data = (nil)), line 910 in "main.c"
  [30] catch_errors(func = 0x808f518 = &`gdb`main.c`captured_main(register void *data), func_args = 0x8047b80, errstring = 0x8318b53 "", mask = 6), line 518 in "exceptions.c"
  [31] gdb_main(args = (nil)), line 919 in "main.c"
  [32] main(argc = 0, argv = (nil)), line 34 in "gdb.c"
(dbx) quit

Analysis of the source shows that when the SHELL environment
variable is set, gdb uses that shell to start its inferior process
and then waits for the first significant event.
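
To make the launch mechanism concrete, here is a minimal sketch of
starting a program through the user's shell. It is not gdb's code: the
function names are hypothetical, the /bin/sh default for an unset SHELL
is an assumption, no quoting of special characters is attempted, and
the tracing setup that a debugger performs before the exec is left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Build the string handed to "$SHELL -c": "exec <program> <args>".
   Simplification: special characters are not quoted.
   Returns 0 on success, -1 if the buffer is too small. */
int build_shell_command(char *buf, size_t size,
                        const char *exec_file, const char *allargs)
{
    int n = snprintf(buf, size, "exec %s %s", exec_file, allargs);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Fork and start exec_file through the user's shell.
   Returns the child's pid, or -1 on failure. */
pid_t launch_via_shell(const char *exec_file, const char *allargs)
{
    char command[4096];
    const char *shell = getenv("SHELL");

    if (shell == NULL)
        shell = "/bin/sh";      /* assumed default when SHELL is unset */
    if (build_shell_command(command, sizeof command, exec_file, allargs) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the shell runs first, then replaces itself with the program. */
        execl(shell, shell, "-c", command, (char *)NULL);
        _exit(127);             /* only reached if the shell could not be run */
    }
    return pid;
}
```

The point of the sketch is that everything the shell does before the
exec, including any thread it creates and destroys, happens inside the
process that the debugger is already watching.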

In our case SHELL is set to /opt/csw/bin/bash and, as we can see,
the process tree of gdb is

 ptree $(pgrep gdb)
 ...
    2725  /karmamaya/tmp/gdb-7.2/gdb/gdb /karmamaya/tmp/tgdb
      2726  /opt/csw/bin/bash -c exec /karmamaya/tmp/tgdb 

 and the process flags of the shell (PID 2726) are:

 pflags 2726
:   /opt/csw/bin/bash -c exec /karmamaya/tmp/tgdb 
    data model = _ILP32  flags = RLC|MSACCT|MSFORK
    flttrace = 0xfffffbff
    sigtrace = 0x67c5deff 0x0000fff6
        HUP|INT|QUIT|ILL|TRAP|ABRT|EMT|FPE|BUS|SEGV|SYS|PIPE|TERM|USR1|USR2|PWR|STOP|TSTP|CONT|TTIN|TTOU|XCPU|XFSZ|FREEZE|THAW|LOST|XRES|JVM1|JVM2|RTMIN|RTMIN+1|RTMIN+2|RTMIN+3|RTMAX-3|RTMAX-2|RTMAX-1|RTMAX
    entryset = 0x00000001 0x00000000 0x00000000 0x00000000
               0x80000000 0x00000000 0x00000000 0x00000000
    exitset  = 0x00000400 0x04000000 0x00000000 0x00000000
               0xc0000000 0x00000000 0x00000000 0x00000000
 /1:    flags = STOPPED|ISTOP|ASLEEP  lwp_wait(0x2,0x8047a8c)
    why = PR_REQUESTED
 /2:    flags = STOPPED|ISTOP  lwp_exit()
    why = PR_SYSENTRY  what = lwp_exit
    sigmask = 0xffbffeff,0x0000ffff

The first significant event is the entry into a system call, lwp_exit,
and gdb's thread management removes the exiting thread from the active
list. Here is the logic fault: gdb then tries to switch to the first
thread of the inferior process in order to continue its execution, that
is, to execute the debugged program. Unfortunately, the thread chosen as
the target of the context switch is the one that is already dead. Its
identifier is no longer in the active thread list, which triggers the
assertion and aborts gdb.
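
The fault described above can be modelled in a few lines. This is a
simplified sketch and not gdb's code: struct thread_table and every
function name are hypothetical. It shows that a lookup of LWP 2 fails
once its exit has been processed, and what a defensive choice of the
thread to switch to might look like.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 8

/* Hypothetical, much simplified stand-in for a debugger's table of live LWPs. */
struct thread_table {
    int lwp[MAX_THREADS];
    size_t count;
};

/* Return the slot holding lwp, or -1 when lwp is not (or no longer) listed. */
int find_thread(const struct thread_table *t, int lwp)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->lwp[i] == lwp)
            return (int)i;
    return -1;
}

/* Register a newly created LWP. */
void add_thread(struct thread_table *t, int lwp)
{
    assert(t->count < MAX_THREADS);
    t->lwp[t->count++] = lwp;
}

/* Forget lwp, as happens when its lwp_exit event is processed. */
void delete_thread(struct thread_table *t, int lwp)
{
    int slot = find_thread(t, lwp);
    if (slot < 0)
        return;
    t->lwp[slot] = t->lwp[--t->count];  /* order is not preserved */
}

/* Defensive choice of the thread to switch to: keep the wanted one if it
   is still listed, otherwise fall back to any live thread.  Returns -1
   when the table is empty instead of asserting. */
int pick_thread_to_switch_to(const struct thread_table *t, int wanted)
{
    if (find_thread(t, wanted) >= 0)
        return wanted;
    return t->count > 0 ? t->lwp[0] : -1;
}
```

With LWPs 1 and 2 registered and LWP 2 then deleted, find_thread() no
longer finds LWP 2, which is the condition behind the failed assertion;
pick_thread_to_switch_to() falls back to LWP 1 instead.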

This behavior shows up only when we use the Open CSW bash; with
the SUN provided version everything works correctly.

Trussing a simple bash session (start and immediate exit):

truss -u a.out -u ld:: -u :: -o bash-truss /opt/csw/bin/bash

we observe the following sequence:

lwp_create(0x080478A0, LWP_SUSPENDED, 0x08047AC4) = 2
/2:	lwp_create()	(returning as new lwp ...)	= 0
/1:	schedctl()					= 0xFEFF4000
/1:	lwp_continue(2)					= 0
/2:	setustack(0xFEAD0260)
/2:	schedctl()					= 0xFEFF4010
/2:	lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0x0000FFF7) = 0xFFBFFEFF [0x0000FFFF]
/2:	lwp_exit()
/1:	lwp_wait(2, 0x08047AF8)				= 0

Setting a breakpoint in the thread creation:

stop in __lwp_create

reveals that the ephemeral thread is created when bash sets its
text domain by calling bindtextdomain() in the libintl from the
ggettext package.

After a dull exploration of the corresponding sources we
discover that, in order to test the library's behavior with
regard to multi-threading, a one-time (singleton) test is made in
gettext-runtime/intl/lock.h:

#if USE_POSIX_THREADS

/* Use the POSIX threads library.  */

# if PTHREAD_IN_USE_DETECTION_HARD

/* The function to be executed by a dummy thread.  */
static void *
dummy_thread_func (void *arg)
{
  return arg;
}

int
glthread_in_use (void)
{
  static int tested;
  static int result; /* 1: linked with -lpthread, 0: only with libc */

  if (!tested)
    {
      pthread_t thread;

      if (pthread_create (&thread, NULL, dummy_thread_func, NULL) != 0)
	/* Thread creation failed.  */
	result = 0;
      else
	{
	  /* Thread creation works.  */
	  void *retval;
	  if (pthread_join (thread, &retval) != 0)
	    abort ();
	  result = 1;
	}
      tested = 1;
    }
  return result;
}

# endif

PTHREAD_IN_USE_DETECTION_HARD is defined in the configure script where
we find this comment:

# On Solaris and HP-UX, most pthread functions exist also in libc.
# Therefore pthread_in_use() needs to actually try to create a
# thread: pthread_create from libc will fail, whereas
# pthread_create will actually create a thread.

After verification it appears that libc prior to Solaris 10
contains the POSIX threads symbols, but their definitions are not
operational: if the binary is not linked with libpthread, the
threading operations don't work. This is exactly the situation
tested by glthread_in_use(), which is an astute hack but, in our
case, an invasive one.

Unfortunately, because we support Solaris 9 and ggettext is built
on that operating environment, we face a delicate decision: how
to make gdb operational when the inferior shell was linked with
the libintl supplied by our ggettext.

How to avoid this situation?

These are the solutions that I explored:

1. correct gdb's logic

  difficult to implement and the required tests are extensive

2. document the issue and recommend that a non Open CSW shell
   be used with gdb

   the least costly and gives the opportunity to somebody else to
   test the other shells that we deliver

3. enforce programmatically that no Open CSW shell is used with gdb

  not very elegant but avoids surprises for the uninformed user;
  minimal programming effort: replace any Open CSW shell with
  /sbin/sh.

4. use the SUN provided libintl when linking our shells

  this is an elegant solution, but it doesn't fall within my perimeter
  of action and needs strong coordination with many maintainers, some
  of whom may be retired, which would force me to take responsibility
  for packages in which I'm less interested.

5. find another robust test for libpthread linkage in libintl

  Quite difficult, and it involves a lot of fudging around with symbols
  and other esoteric incantations (this is a complex and sensitive
  package).
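
As an illustration of option 3, the shell substitution could be
sketched as follows. This is not the change made to the package: the
function name startup_shell_for_gdb is hypothetical, recognizing an
Open CSW shell by the /opt/csw/ path prefix is an assumption, and so
is falling back to /sbin/sh when SHELL is unset.

```c
#include <string.h>

#define CSW_PREFIX     "/opt/csw/"  /* assumed marker of an Open CSW shell */
#define FALLBACK_SHELL "/sbin/sh"   /* replacement shell named in option 3 */

/* Return the shell that should start the inferior: the user's shell,
   unless it lives under the Open CSW prefix or is not set at all. */
const char *startup_shell_for_gdb(const char *shell)
{
    if (shell == NULL)
        return FALLBACK_SHELL;
    if (strncmp(shell, CSW_PREFIX, strlen(CSW_PREFIX)) == 0)
        return FALLBACK_SHELL;
    return shell;
}
```

Called with the value of getenv("SHELL"), it maps /opt/csw/bin/bash to
/sbin/sh and leaves any other shell untouched.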

Finally I'm choosing 3 and, of course, 2...

-- 
Peter

