 Strange failure (internal assertion failure?)
I have a C++ program on AIX 4.3.3 that exited with the following
message:

        The assert subroutine failed: dg_cv != (cond_t *) NULL, file clnt_dg.c, line 167

This isn't coming from anywhere in my code; it seems to be internal.
After printing this message, the process vanished.  (I catch various
signals like SIGSEGV; none of them were sent.)
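The process vanishing without any of the caught signals is consistent with how assert() behaves. Per the C standard, a failed assertion writes a diagnostic to stderr and then calls abort(), which raises SIGABRT, not SIGSEGV. The following is a minimal sketch, assuming a POSIX system, of a handler that at least records that SIGABRT arrived before letting the process die. The names abort_logger and install_abort_logger are hypothetical, not part of any AIX library:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Runs when SIGABRT arrives, e.g. from abort() after a failed assert().
 * Only async-signal-safe calls are used here: write() and raise(). */
static void abort_logger(int sig)
{
    static const char msg[] =
        "fatal: SIGABRT received (abort() or a failed assert)\n";
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
    /* SA_RESETHAND restored the default action on entry, so re-raising
     * makes the process terminate (at the latest when this handler
     * returns) and still allows a core dump, if core dumps are enabled. */
    raise(sig);
}

/* Hypothetical helper: install abort_logger for SIGABRT.
 * Returns 0 on success, -1 on error (errno set by sigaction). */
int install_abort_logger(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = abort_logger;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGABRT, &sa, NULL);
}
```

Calling install_abort_logger() early in main(), before any threads start, would at least leave a line in the log distinguishing an abort from the signals already being caught.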

The code is heavily multithreaded, and seemed to be running normally when
it suddenly died.  I haven't been able to reproduce the failure since,
despite repeated tries.

Anyone have any clues as to what might cause this message to appear?

                                                        -- Jerry



 Tue, 06 Jul 2004 08:13:49 GMT   
Are you getting a core file created after the program exits?  It
sounds like your program has a memory leak.

Mark



 Wed, 07 Jul 2004 02:52:04 GMT   

There are clnt_ symbols in libc.a, but surely you do not expect
shipping libc.a to be compiled with assert()s in place, do you?

Anyway, from my 4.3.3.0 system:
  $ nm /lib/libc.a | grep 'clnt_.* f'
  ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_generic.c f          -
  ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_perror.c f          -
  ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_raw.c f          -
  ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_simple.c f          -
  ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_tcp.c f          -
  ../../../../../../../src/bos/usr/ccs/lib/libc/clnt_udp.c f          -

To the OP: the assert is coming from one of the object modules
that is linked into your executable. Find out which one has it
by running 'nm' on each object or library on your link line.



 Fri, 09 Jul 2004 15:09:15 GMT   

Funny you should ask that question.

Yes, there is a core file.  However, all the debuggers I've tried on it
- dbx, xldb, gdb, even adb and Bob Hablutzel's coralist - die trying to
read it.  adb complains that it's a 4.3 core file, which it can't
read.  (It helpfully suggests changing the dumpfile format to pre-403,
which of course is no help with an existing core file!)  gdb says the
file is not a core file - probably the same thing.  dbx takes a long
time before dying with, I think, a SEGV.  xldb produces:

xldb (map.C, line 304): A file or directory in the path name does not
exist.
Segmentation fault

So I have no clue where the program was when it died.

Memory leak or memory scribble?  I can believe the latter - though I
have no other evidence that it's occurring (since other runs of the same
code - on multiple OS's - don't show any sign of it).

It would help to know just what consistency test is failing - and even
more to have a way to get at least *some* information out of the core
file!
                                                        -- Jerry



 Fri, 09 Jul 2004 23:08:47 GMT   

On the contrary, I expect it not to be compiled with asserts.
However, no build system is perfect, nor is IBM.

Regards,

Nicholas Dronen

--
---------------------------------------------------------------------------
Certified AIX Advanced Technical Expert
Boulder, Colorado
---------------------------------------------------------------------------



 Sat, 10 Jul 2004 00:37:14 GMT   

Phil Budne suggested something that provided new information:
Determining where the error message actually comes from.  Because the
code refers to a cond_t, I had assumed this was coming from the
pthreads library.  When debugging never assume - check everything!
In fact, this message comes not from the threads library, but from the
name resolver - libnsl_r.a, to be exact.

Does that suggest anything to anyone?
                                                        -- Jerry



 Sat, 10 Jul 2004 04:32:46 GMT   

Well, the code probably came from Sun that way.  It's reporting a
severe logic problem, and in all likelihood, Jerry has discovered a
defect.  There are two variables in the file that should always be either
both valid or both NULL.  The assert occurs because one is non-NULL and the other
NULL, and this should never happen.  Granted, the function ought to
do something more intelligent than assert, but it may be a pretty
fatal condition. As far as I can see, all the initialization, sanity
checking, and locking is correct in this part of the code.
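The clnt_dg.c source isn't shown in the thread, so the following is only a hypothetical sketch of the kind of invariant Gary describes: two lazily created tables (invented names fd_locks and fd_cvs) that must be either both NULL or both valid. Publishing both from a single pthread_once() routine is one way to guarantee that no thread ever observes one without the other; it is not a claim about how the AIX library actually does it.

```c
#include <pthread.h>
#include <stdlib.h>

#define TABLE_SIZE  64   /* hypothetical table size */
#define MAX_RACERS  16   /* cap on threads started by race_init() */

/* Hypothetical pair of tables: by design, both NULL or both valid. */
static pthread_mutex_t *fd_locks = NULL;
static pthread_cond_t  *fd_cvs   = NULL;

static pthread_once_t tables_once = PTHREAD_ONCE_INIT;
static int tables_status = -1;   /* becomes 0 once both tables exist */

/* Runs exactly once. Both pointers are published together, and only
 * after both allocations succeeded, so callers of tables_init() can
 * never see one table without the other. */
static void tables_alloc(void)
{
    pthread_mutex_t *locks = malloc(TABLE_SIZE * sizeof *locks);
    pthread_cond_t  *cvs   = malloc(TABLE_SIZE * sizeof *cvs);
    if (locks == NULL || cvs == NULL) {
        free(locks);
        free(cvs);
        return;                  /* both stay NULL: still consistent */
    }
    for (int i = 0; i < TABLE_SIZE; i++) {
        pthread_mutex_init(&locks[i], NULL);
        pthread_cond_init(&cvs[i], NULL);
    }
    fd_locks = locks;
    fd_cvs = cvs;
    tables_status = 0;
}

/* Safe to call from any thread; returns 0 if the tables are ready. */
int tables_init(void)
{
    pthread_once(&tables_once, tables_alloc);
    return tables_status;
}

/* The invariant a check like the one in the assert message enforces. */
int tables_consistent(void)
{
    return (fd_locks == NULL) == (fd_cvs == NULL);
}

static void *racer(void *arg)
{
    int *bad = arg;
    if (tables_init() != 0 || !tables_consistent())
        *bad = 1;
    return NULL;
}

/* Start up to MAX_RACERS threads that race to initialize the tables;
 * return how many of them observed a broken state (expected: 0). */
int race_init(int n)
{
    pthread_t tids[MAX_RACERS];
    int bad[MAX_RACERS] = {0};
    int started = 0, failures = 0;

    if (n > MAX_RACERS)
        n = MAX_RACERS;
    for (; started < n; started++)
        if (pthread_create(&tids[started], NULL, racer, &bad[started]) != 0)
            break;
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        failures += bad[i];
    }
    return failures;
}
```

If the two pointers were instead assigned at different times without this kind of synchronization, another thread could see exactly the state the assertion reports: one pointer set and the other still NULL.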

--
Gary R. Hook / AIX PartnerWorld for Developers / These opinions are MINE
________________________________________________________________________



 Sun, 11 Jul 2004 03:03:07 GMT   

Yes.  See my other post.

--
Gary R. Hook / AIX PartnerWorld for Developers / These opinions are MINE
________________________________________________________________________



 Sun, 11 Jul 2004 03:04:03 GMT   

OK, there's been some progress.

I can now reproduce the problem in a fairly small test program, which
repeatedly resolves a short list of names in multiple threads.  When
run, the program will do one of two things:

        - Hang, with one of the threads stuck inside the name resolver
                call;
        - Crash, reporting the assertion failure.
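Jerry's test program isn't posted, and the thread doesn't say which resolver call it used. The following is a minimal sketch of the same idea, assuming POSIX threads and getaddrinfo() (an assumption; the original may have used a different lookup call), with a hypothetical two-entry name list:

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_THREADS 64

/* Hypothetical short list of names: "localhost" normally comes from
 * /etc/hosts, and the numeric address needs no lookup service at all. */
static const char *const names[] = { "localhost", "127.0.0.1" };
#define NUM_NAMES (sizeof names / sizeof names[0])

struct worker_args {
    int  iterations;  /* lookups this thread should perform */
    long attempts;    /* lookups actually performed */
    long failures;    /* lookups that returned an error code */
};

/* Each thread cycles through the name list, one lookup per pass. */
static void *resolve_loop(void *p)
{
    struct worker_args *a = p;
    for (int i = 0; i < a->iterations; i++) {
        struct addrinfo hints;
        struct addrinfo *res = NULL;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        int rc = getaddrinfo(names[i % NUM_NAMES], NULL, &hints, &res);
        a->attempts++;
        if (rc == 0)
            freeaddrinfo(res);
        else
            a->failures++;
    }
    return NULL;
}

/* Start nthreads workers (capped at MAX_THREADS), wait for all of them,
 * and return the total number of lookups performed. If failures_out is
 * non-NULL, the number of failed lookups is stored there. */
long run_resolver_stress(int nthreads, int iterations, long *failures_out)
{
    pthread_t tids[MAX_THREADS];
    struct worker_args args[MAX_THREADS];
    int started = 0;
    long total = 0, failed = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (; started < nthreads; started++) {
        args[started] = (struct worker_args){ iterations, 0, 0 };
        if (pthread_create(&tids[started], NULL, resolve_loop,
                           &args[started]) != 0)
            break;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += args[t].attempts;
        failed += args[t].failures;
    }
    if (failures_out != NULL)
        *failures_out = failed;
    return total;
}
```

On an unaffected system, run_resolver_stress(8, 1000, NULL) should simply return 8000. On a machine with the defect described here, one would instead expect one of the two symptoms above: the call never returns, or the process aborts with the clnt_dg.c assertion.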

Now for the fun:  We have 3 AIX machines here.  Two of them exhibit these
failures.  The other one never does - the program runs fine.  The
difference:  The machines that fail are much more up-to-date on patches
than the one that doesn't.  Apparently this is a bug introduced in some
recent patch.

It's hard to know for sure which libraries might be relevant, but here's
a guess:

    Fileset             Works     Fails      Comments
    bos.net.nis.client  4.3.3.1   4.3.3.75
    bos.net.nis.server  4.3.3.0   4.3.3.75
    bos.net.nisplus     (none)    4.3.3.75   Not installed on old machine
    bos.net.tcp.client  4.3.3.29  4.3.3.75   Fileset containing libnsl[_r]

Ring any bells now?
                                                -- Jerry



 Sat, 17 Jul 2004 06:35:42 GMT   

bos.rte.libc 4.3.3.78

Get it from ftp.software.ibm.com

Or use fixdist, at least until they pull the plug on it.

--
Doing AIX support was the most monty-pythonesque
activity available at the time.



 Sat, 24 Jul 2004 09:26:27 GMT   
 



 