Any idea what could be the cause? I suspect we have pjproject packaged badly, so it won't be reproducible with your version.
I don't need a full answer; if you can't tell me exactly what's wrong, just a hint would be good.
The version is 20200124.
I have no clue what they're talking about there, but seems that's it.
There's a possibility that only our version is buggy, since Guix builds software reproducibly in its own isolated environment, which may differ from how you build pjproject.
Could someone check it out, please? I've really tried my best to find the cause in our package, but haven't found anything so far.
I don't know if it's fixed in your version, but it caused really bad crashes during video calls.
The issue is going to be fixed in pjproject 2.10, but it isn't yet released.
I hope this information will be useful.
I tested Jami with the patch applied, but it still fails, though in a different place: when clicking the disconnect button instead of while starting a call.
I guess I can't do anything until you upgrade pjproject to 2.10.
This weirdness can have two sources: the first is reproducibility; the second is a bug in Guix itself, but then other packages would be affected too.
The only thing I can think of is an outdated gnutls. I'll wait for a new Guix release where it's bumped to the latest version, or I'll try nagging some more experienced devs to package the latest gnutls for me.
One bug gave me a hint that something's wrong with glibc, but no other packages were affected. It is also going to be updated soon.
If it isn't something I've mentioned, then I have no idea.
The weird thing is that gnutls doesn't have anything to do with this error (by which I mean the invite reference counting). It's related to our code in ice_transport, to the pjsip stack, or to both at the same time.
One more thing: which compiler (GCC, I guess) and which version do you use?
On Guix the binary seed has been reduced from ~250 MB to ~120 MB and recently to ~60 MB.
We use a custom compiler (GNU Mes) to build tcc, which builds some other tools and then GCC, which in turn builds the whole system.
Because until now every GCC version was built with a previous GCC version (a chicken-and-egg problem), hypothetically some ancient bug hidden inside those binaries could affect the behavior of Jami.
I tried two things:
- Building pjproject with the tarballs bundled with its source instead of Guix packages
- Building pjproject and all other Jami components with GCC 8 (the version Debian 10 uses)
Both end with the same result: it still fails. Is it possible that something network-related causes this, or that the problem is not only in pjproject but an effect of incorrect client-daemon communication?
I mean, we make a lot of audio calls in a lot of configurations (I have about 10 devices on 10 different networks); it doesn't make any sense to me that we hit this every time.
Also, one other thing: asserts don't make sense in release builds, and there were asserts in pjproject that don't make sense at all. We gave feedback on one of them weeks ago, for example: https://trac.pjsip.org/repos/changeset/6111
I tested today: it doesn't work with that line commented out either; it just crashes with a memory protection violation instead of the assertion message.
I sent my work to Guix; it should be available in the master branch and will be available as a pre-built binary in the Guix package manager. I'd be glad if someone could check what's wrong with it.
There can also be a problem with starting the daemon on foreign distributions (Guix installed on top of another distro), because it is started via D-Bus. I'll try to fix this as fast as possible; for now, start the daemon manually (if possible).
I don't really know what fixed it, but the bug is resolved now.
I updated Jami to 20200509, and Guix also got core-updates merged.
Did you change anything in the code causing the issue?
It seems the issue occurs only when the client on Guix both makes the call and ends it. If the Android client calls Guix and Guix disconnects, nothing happens; the same goes when the Android client calls and the Android client disconnects.
So this is it:
dring: ../src/pjsip-ua/sip_inv.c:203: pjsip_inv_add_ref: Assertion `inv && inv->ref_cnt' failed.

Thread 47 "dring" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff907f0700 (LWP 1229)]
0x00007ffff4511aba in raise () from /gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libc.so.6
(gdb) bt
#0  0x00007ffff4511aba in raise () from /gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libc.so.6
#1  0x00007ffff4512bf5 in abort () from /gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libc.so.6
#2  0x00007ffff450a70a in __assert_fail_base () from /gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libc.so.6
#3  0x00007ffff450a782 in __assert_fail () from /gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libc.so.6
#4  0x00007ffff7df9d51 in pjsip_inv_add_ref () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#5  0x00007ffff7dfa0f7 in inv_set_state () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#6  0x00007ffff7e001f4 in inv_handle_bye_response () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#7  0x00007ffff7e028a6 in inv_on_state_confirmed () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#8  0x00007ffff7dfaa4d in mod_inv_on_tsx_state () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#9  0x00007ffff7e47182 in pjsip_dlg_on_tsx_state () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#10 0x00007ffff7e47b73 in mod_ua_on_tsx_state () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#11 0x00007ffff7e3e630 in tsx_set_state () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#12 0x00007ffff7e41b16 in tsx_on_state_proceeding_uac () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#13 0x00007ffff7e40f30 in tsx_on_state_calling () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#14 0x00007ffff7e3f7ce in pjsip_tsx_recv_msg () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#15 0x00007ffff7e3dadf in mod_tsx_layer_on_rx_response () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#16 0x00007ffff7e25e09 in pjsip_endpt_process_rx_data () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#17 0x00007ffff7e260cd in endpt_on_rx_msg () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#18 0x00007ffff7e2fe63 in pjsip_tpmgr_receive_packet () from /gnu/store/9m42df53ain3nr79kily5bbncjrpv8d8-libring-20200509.1.a7603a6/lib/libring.so.0
#19 0x00007ffff7d400ed in jami::tls::ChanneledSIPTransport::handleEvents (this=0x7fffb40e8720) at channeled_transport.cpp:221
#20 0x00007ffff7ba89a1 in std::function<void ()>::operator()() const (this=<optimized out>) at /gnu/store/rn75fm7adgx3pw5j8pg3bczfqq1y17lk-gcc-7.5.0/include/c++/bits/std_function.h:706
#21 jami::ScheduledExecutor::loop (this=<optimized out>) at scheduled_executor.cpp:122
#22 0x00007ffff7ba8b35 in jami::ScheduledExecutor::<lambda()>::operator() (__closure=0x7fffb400e1d8) at scheduled_executor.cpp:27
#23 std::__invoke_impl<void, jami::ScheduledExecutor::ScheduledExecutor()::<lambda()> > (__f=...) at /gnu/store/rn75fm7adgx3pw5j8pg3bczfqq1y17lk-gcc-7.5.0/include/c++/bits/invoke.h:60
#24 std::__invoke<jami::ScheduledExecutor::ScheduledExecutor()::<lambda()> > (__fn=...) at /gnu/store/rn75fm7adgx3pw5j8pg3bczfqq1y17lk-gcc-7.5.0/include/c++/bits/invoke.h:95
#25 std::thread::_Invoker<std::tuple<jami::ScheduledExecutor::ScheduledExecutor()::<lambda()> > >::_M_invoke<0> (this=0x7fffb400e1d8) at /gnu/store/rn75fm7adgx3pw5j8pg3bczfqq1y17lk-gcc-7.5.0/include/c++/thread:234
#26 std::thread::_Invoker<std::tuple<jami::ScheduledExecutor::ScheduledExecutor()::<lambda()> > >::operator() (this=0x7fffb400e1d8) at /gnu/store/rn75fm7adgx3pw5j8pg3bczfqq1y17lk-gcc-7.5.0/include/c++/thread:243
#27 std::thread::_State_impl<std::thread::_Invoker<std::tuple<jami::ScheduledExecutor::ScheduledExecutor()::<lambda()> > > >::_M_run(void) (this=0x7fffb400e1d0) at /gnu/store/rn75fm7adgx3pw5j8pg3bczfqq1y17lk-gcc-7.5.0/include/c++/thread:186
#28 0x00007ffff4c4fcdf in execute_native_thread_routine () from /gnu/store/01b4w3m6mp55y531kyi1g8shh722kwqm-gcc-7.5.0-lib/lib/libstdc++.so.6
#29 0x00007ffff6692f64 in start_thread () from /gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libpthread.so.0
#30 0x00007ffff45d19af in clone () from /gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libc.so.6
I decided to check one thing by downloading all the code from git instead of using the tarballs (because they sometimes fail), but I ran into a problem: applying the pjproject 2.10 patches from the "release/202005" branch of ring-daemon fails. Is that normal? Is the commit number stated in this file up to date, or did you upstream some patches and build Jami from a more recent commit?
After several months, I managed to fix the bug by passing the "-DNDEBUG" flag to the compiler when building pjproject, as advised on the PJSIP website, to turn off assertions:
"Release mode. Don't forget to set the appropriate compiler optimization flag, and disable assertion with -DNDEBUG."
Is this documented somewhere on Jami's wiki?
If not, please consider adding a note about it; it would make life easier for maintainers of source-based distributions like Nix or Gentoo. Something like:
"When building pjproject, it is mandatory to pass the -DNDEBUG flag to disable assertions and prevent Jami from crashing."
You can close the issue now, or I will do it later.