Author |
Message |
Jerry Leichte #1 / 10
|
 Strange failure (internal assertion failure?)
I have a C++ program on AIX 4.3.3 that exited with the following message:

    The assert subroutine failed: dg_cv != (cond_t *) NULL, file clnt_dg.c, line 167

This isn't coming from anywhere in my code; it seems to be internal. After printing this message, the process vanished. (I catch various signals like SIGSEGV; none of them were sent.)

The code is heavily multithreaded, and seemed to be running normally when it suddenly died. I haven't been able to reproduce the failure since, despite repeated tries.

Anyone have any clues as to what might cause this message to appear?

-- Jerry
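One thing worth checking on the "process vanished" point: a failed assert() ends in abort(), which raises SIGABRT, so handlers for SIGSEGV and similar signals would not fire. Below is a minimal sketch of a SIGABRT handler that at least logs that this path was taken. The names `on_abort`, `install_abort_handler` and `abort_seen` are made up for illustration, and whether the original program already caught SIGABRT is not stated.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t abort_seen = 0;

/* Hypothetical handler: records that SIGABRT arrived and logs one line.
 * Only async-signal-safe calls (write) are used here. */
static void on_abort(int signo)
{
    static const char msg[] = "caught SIGABRT (assert/abort path)\n";
    (void)signo;
    abort_seen = 1;
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
}

/* Installs the handler; returns 0 on success, -1 on failure (as sigaction does). */
int install_abort_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_abort;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGABRT, &sa, NULL);
}
```

Calling `install_abort_handler()` early in main would be enough to try this. When abort() itself is the source of the signal, the process still terminates after the handler returns, so this only buys a chance to log, not to recover.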
|
Tue, 06 Jul 2004 08:13:49 GMT |
|
 |
Mark M #2 / 10
|
 Strange failure (internal assertion failure?)
Are you getting a core file created after the program exits? It sounds like your program has a memory leak. Mark
|
Wed, 07 Jul 2004 02:52:04 GMT |
|
 |
Paul Pluzhniko #3 / 10
|
 Strange failure (internal assertion failure?)
news:3c477d68$0$181$7586b60c@news.frii.com...
There are clnt_ symbols in libc.a, but surely you do not expect shipping libc.a to be compiled with assert()s in place, do you? Anyway, from my 4.3.3.0 system:

    $ nm /lib/libc.a | grep 'clnt_.* f'
    ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_generic.c f -
    ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_perror.c f -
    ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_raw.c f -
    ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_simple.c f -
    ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_tcp.c f -
    ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_udp.c f -

To the OP: the assert is coming from one of the object modules that is linked into your executable. Find out which one has it by running 'nm' on each object or library on your link line.
|
Fri, 09 Jul 2004 15:09:15 GMT |
|
 |
Jerry Leichte #4 / 10
|
 Strange failure (internal assertion failure?)
Funny you should ask that question. Yes, there is a core file. However, all the debuggers I've tried on it - dbx, xldb, gdb, even adb and Bob Hablutzel's coralist - die trying to read it.

adb complains that it's a 4.3 core file, which it can't read. (It helpfully suggests changing the dumpfile format to pre-403, which of course is no help with an existing core file!) gdb says the file is not a core file - probably the same thing. dbx takes a long time before dying with, I think, a SEGV. xldb produces:

    xldb (map.C, line 304): A file or directory in the path name does not exist.
    Segmentation fault

So I have no clue where the program was when it died.

Memory leak or memory scribble? I can believe the latter - though I have no other evidence that it's occurring (since other runs of the same code - on multiple OS's - don't show any sign of it). It would help to know just what consistency test is failing - and even more to have a way to get at least *some* information out of the core file!

-- Jerry
|
Fri, 09 Jul 2004 23:08:47 GMT |
|
 |
Nicholas Drone #5 / 10
|
 Strange failure (internal assertion failure?)
On the contrary, I expect it not to be compiled with asserts. However, no build system is perfect, nor is IBM. Regards, Nicholas Dronen -- --------------------------------------------------------------------------- Certified AIX Advanced Technical Expert Boulder, Colorado ---------------------------------------------------------------------------
|
Sat, 10 Jul 2004 00:37:14 GMT |
|
 |
Jerry Leichte #6 / 10
|
 Strange failure (internal assertion failure?)
Phil Budne suggested something that provided new information: determining where the error message actually comes from. Because the code refers to a cond_t, I had assumed this was coming from the pthreads library. When debugging, never assume - check everything! In fact, this message comes not from the threads library, but from the name resolver - libnsl_r.a, to be exact.

Does that suggest anything to anyone?

-- Jerry
|
Sat, 10 Jul 2004 04:32:46 GMT |
|
 |
Gary R. Hoo #7 / 10
|
 Strange failure (internal assertion failure?)
Well, the code probably came from Sun that way. It's reporting a severe logic problem, and in all likelihood, Jerry has discovered a defect. There are two variables in the file that are both valid or both NULL. The assert occurs because one is non-NULL and the other NULL, and this should never happen. Granted, the function ought to do something more intelligent than assert, but it may be a pretty fatal condition. As far as I can see, all the initialization, sanity checking, and locking is correct in this part of the code. -- Gary R. Hook / AIX PartnerWorld for Developers / These opinions are MINE ________________________________________________________________________
|
Sun, 11 Jul 2004 03:03:07 GMT |
|
 |
Gary R. Hoo #8 / 10
|
 Strange failure (internal assertion failure?)
Yes. See my other post. -- Gary R. Hook / AIX PartnerWorld for Developers / These opinions are MINE ________________________________________________________________________
|
Sun, 11 Jul 2004 03:04:03 GMT |
|
 |
Jerry Leichte #9 / 10
|
 Strange failure (internal assertion failure?)
OK, there's been some progress. I can now reproduce the problem in a fairly small test program. What the test program basically does is repeated name resolution in multiple threads for a short list of names. When run, the program will do one of two things:

- Hang, with one of the threads stuck inside the name resolver call;
- Crash, reporting the assertion failure.

Now for the fun: We have 3 AIX machines here. Two of them exhibit these failures; the other one never does - the program runs fine. The difference: the machines that fail are much more up-to-date on patches than the one that doesn't. Apparently this is a bug introduced in some recent patch. It's hard to know for sure which libraries might be relevant, but here's a guess:

    What                 Works      Fails      Comments
    bos.net.nis.client   4.3.3.1    4.3.3.75
    bos.net.nis.server   4.3.3.0    4.3.3.75
    bos.net.nisplus      ???????    4.3.3.75   (Not on old machine)
    bos.net.tcp.client   4.3.3.29   4.3.3.75   Fileset containing libnsl[_r]

Ring any bells now?

-- Jerry
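For anyone who wants to try this on their own machines, here is a rough sketch of that kind of test: several threads each resolving a short list of names in a loop. It is not the actual test program. It assumes getaddrinfo() as the resolver call (which call the original used is not stated), and `stress_resolver`, `resolve_loop`, `struct worker` and `MAX_THREADS` are names made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_THREADS 64  /* arbitrary cap for this sketch */

/* Per-thread work description plus result counters. */
struct worker {
    const char *const *names;
    int n_names;
    int iterations;
    long attempted;
    long failed;
};

/* Thread body: resolve every name in the list, `iterations` times over. */
static void *resolve_loop(void *arg)
{
    struct worker *w = arg;
    struct addrinfo hints;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    for (int i = 0; i < w->iterations; i++) {
        for (int j = 0; j < w->n_names; j++) {
            struct addrinfo *res = NULL;
            w->attempted++;
            if (getaddrinfo(w->names[j], NULL, &hints, &res) != 0)
                w->failed++;          /* lookup failed; count it and go on */
            else
                freeaddrinfo(res);
        }
    }
    return NULL;
}

/* Runs n_threads resolver threads and waits for all of them.
 * Returns the total number of lookups attempted, or -1 if the thread
 * count is out of range or a thread could not be started.
 * The number of failed lookups is stored in *failed_out if non-NULL. */
long stress_resolver(const char *const *names, int n_names,
                     int n_threads, int iterations, long *failed_out)
{
    pthread_t tids[MAX_THREADS];
    struct worker workers[MAX_THREADS];
    long attempted = 0, failed = 0;
    int started = 0;

    if (n_threads < 1 || n_threads > MAX_THREADS)
        return -1;

    for (int t = 0; t < n_threads; t++) {
        workers[t] = (struct worker){ names, n_names, iterations, 0, 0 };
        if (pthread_create(&tids[t], NULL, resolve_loop, &workers[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        attempted += workers[t].attempted;
        failed += workers[t].failed;
    }
    if (failed_out)
        *failed_out = failed;
    return started == n_threads ? attempted : -1;
}
```

A main() would just call `stress_resolver()` with a handful of host names, a few threads and a large iteration count. On a healthy system it returns; a hang inside pthread_join or an abort from the library would correspond to the two symptoms listed above.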
|
Sat, 17 Jul 2004 06:35:42 GMT |
|
 |
Jose Pina Coelh #10 / 10
|
 Strange failure (internal assertion failure?)
bos.rte.libc 4.3.3.78

Get it from ftp.software.ibm.com
Or use fixdist before they pull the plug

-- Doing AIX support was the most monty-pythonesque activity available at the time.
|
Sat, 24 Jul 2004 09:26:27 GMT |
|
 |
|