There are several users on pfSense that have netmap inline IPS mode running. I have essentially the same Netmap v14 patch in Suricata used on pfSense since August of last year, but have not seen reports of stalling traffic logged by users there.

Currently the release branch of pfSense is on FreeBSD 12.3-STABLE. The new development branch is FreeBSD 14, but I'm not sure how many of the development snapshot users are running netmap and inline IPS mode there. I don't know if the newer FreeBSD version may be in play here.

I have not audited the patch we are using in pfSense against the latest updates to the Netmap v14 API in Suricata 6.x or 7.x, so it's possible there is a subtle difference in the patch that is impacting the OPNsense users. The current patch used on pfSense is here. The core code changes should be the same because it is the code Jeff and I collaborated on last year. There have been some additions by Jeff to wrap more sections of code with the USE_NEW_NETMAP_API logic that we are not currently using within pfSense.

Is anything logged by either Suricata or the system when the stall occurs? The symptoms here seem suspiciously similar to the old flow manager bug that was fixed last year (back in late June of 2021, as I recall). The symptoms with that bug were a sudden stoppage of traffic on a netmap interface. And just as described for this bug, the problem took some time to surface (many times more than 12 hours of normal operation, and then the stall). Absolutely nothing was logged when that stall occurred. The network traffic on the interface just stopped.

Investigating further showed a type of thread deadlock. The problem only surfaced on FreeBSD because FreeBSD is very strict about allowing only the thread that obtained a lock to release that lock. The old flow manager code, in an effort to improve efficiency, was recycling queues (or something similar; my memory is foggy now) by passing them to other threads for use. The receiving thread would then attempt to unlock the queue prior to using it. This worked on Linux and most other systems, but failed on FreeBSD because it is undefined behavior for a non-owner thread to release a lock there.
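For illustration only, here is a minimal C sketch of that cross-thread unlock pattern. This is not the actual Suricata flow manager code; a single pthread mutex stands in for the recycled queue's lock, and an error-checking mutex type is used so the ownership violation is reported rather than left undefined:

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    static pthread_mutex_t q_lock;

    static void *receiver(void *arg)
    {
        (void)arg;
        /* Unlock attempted by a thread that does not own the lock. */
        int rc = pthread_mutex_unlock(&q_lock);
        if (rc != 0)
            fprintf(stderr, "non-owner unlock failed: %s\n", strerror(rc));
        return NULL;
    }

    int main(void)
    {
        pthread_mutexattr_t attr;
        pthread_t t;

        /* The error-checking type makes the violation visible (EPERM).
         * With a default mutex, POSIX leaves this undefined, which is
         * the trap the old flow manager code fell into on FreeBSD. */
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutex_init(&q_lock, &attr);

        pthread_mutex_lock(&q_lock);              /* main thread is the owner */
        pthread_create(&t, NULL, receiver, NULL); /* hand off to another thread */
        pthread_join(t, NULL);                    /* its unlock fails with EPERM */

        /* The mutex is still held by main; any other thread that tried
         * to take it now would block forever. */
        pthread_mutex_unlock(&q_lock);
        pthread_mutex_destroy(&q_lock);
        return 0;
    }

Compiled with cc sketch.c -lpthread, the receiver's unlock fails with "Operation not permitted" and the lock remains owned by the first thread, which is how the stalled threads ended up parked forever.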
Could this possibly be related to some new variation of that old bug?

My test virtual machine with 6.0.9 in netmap IPS mode is still operating normally. I will let it run a full 24 hours, then switch the threading model to "workers" mode and test again.

I somewhat suspect that Suricata is tripping a bug someplace else, like in a driver. I think this is due to the fact that I restarted Suricata after the above so I could run it again and see if the backtrace (bt) was different. I triggered the issue again and the WAN dropped once more. However, I was unable to get it back after that second time. Even with Suricata stopped and not running, the WAN remained down.

To check whether a similar lock is happening, try the following:

1. Set up a reliable console session with the test system (meaning one that is not dependent on the interface running Suricata). If you are testing a VM, then directly opening the VM's console will work.
2. Determine the Process ID of the running Suricata process.
3. Start a download or otherwise reproduce the stall condition.
4. After the stall occurs, execute this command: procstat -t <pid> (where <pid> is the Suricata process ID).
5. Look at the list of active threads and see if any are in the condition umtxn.

That umtxn state was the thread deadlock condition caused by the previous Flow Manager bug: the state a thread locked up in when a non-owner thread attempted to release a lock previously obtained by some other thread. I just want to see if a similar lock is happening.
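A rough sketch of what that console session might look like (the PID, interface, and thread names below are invented for illustration, and the exact procstat column layout may differ on your FreeBSD version):

    # pgrep suricata
    61234
    # procstat -t 61234
      PID    TID COMM     TDNAME        CPU PRI STATE  WCHAN
    61234 100231 suricata suricata       -1 120 sleep  select
    61234 100245 suricata W#01-vtnet1    -1 120 sleep  umtxn
    61234 100246 suricata FM#01          -1 120 sleep  umtxn

A thread showing umtxn in the WCHAN column is sleeping on a userland mutex. If the same threads stay stuck in that state while traffic is stalled, it points at the kind of lock problem described above.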