Off-by-two error when reading NAT-PMP responses
The documentation for libnatpmp (see here or here) says that
For each request sent, readnatpmpresponseorretry() should be called as long as it returns NATPMP_TRYAGAIN.
where a "request" means a call to sendpublicaddressrequest
or sendnewportmappingrequest
. However, the NatPmp::removeAllMappings
function in DHTNet doesn't respect this: it calls sendnewportmappingrequest
twice (once for TCP and once for UDP) without trying to read the response. (Despite its name, the sendnewportmappingrequest
isn't only used to open new mappings, but also to close existing mappings, cf. section 3.4 of the RFC for NAT-PMP.)
The NatPmp::sendMappingRequest
function, which also uses sendnewportmappingrequest
, does call readnatpmpresponseorretry
afterward, but it simply assumes without checking that the response it read was for the mapping it requested. In reality, the first two calls to NatPmp::sendMappingRequest
will actually read the responses to the requests made by NatPmp::removeAllMappings
, and each subsequent call will read the response to a previous mapping request (more precisely, call number n
ends up reading the response to mapping request number n - 2
).
Note: This behavior was observed while using the system version of libnatpmp on Ubuntu 22.04, which is quite old (20150609-7). More recent versions of the library, including the one currently used by Jami, at least check that a response is of the right type (TCP or UDP) before returning it, which seems to be enough to prevent the off-by-two error from occurring. However, there are still issues in Jami due to the fact that mapping requests are made from several threads and nothing guarantees that the responses will be read in the correct order:
- Some requests fail with a
NOPENDINGREQ
("no pending request") error:
[1725025492.789|52166] NAT-PMP: Read response on IGD 192.168.0.1 failed with error NOPENDINGREQ
[1725025492.789|52166] NAT-PMP: Request for mapping JAMI-TCP:14621 on 192.168.0.1 failed with error -6: NOPENDINGREQ
This can happen because libnatpmp isn't designed to keep track of multiple concurrent requests; it simply sets a boolean to true whenever a request is sent and sets it back to false when a response is successfully read.
- The status of some requests is determined based on the response to a different request of the same type. E.g. the logs could show that a request for TCP port 12422 succeeded (even though it actually failed) because a previous request for TCP port 11284 did succeed.