Details
Description
I think we discovered a malfunction in QtOpcUa when using monitoring on a QOpcUaNode with open62541 backend. The problem: If you monitor a OPC UA node value that doesn't change for a long time, the QOpcUaClient disconnects 60 seconds after enabling monitoring.
To ensure it's not a problem located in my code, I created a small application which just monitors a static (never changing) OPC UA node to reproduce the error - it's attached below. (See README.md for details!)
What is happening here?
- Once you enable node monitoring, the open62541 client sends a "CreateSubscriptionRequest" followed by a "CreateMonitoredItemsRequest", these will be accepted by the server. As specified in OPC UA protocol, the client has to provide "PublishRequest" messages which act like one time tokens the server responds to with "PublishResponse" messages on value changes or before monitoring keep-alive timeout is reached.
- Right after enabling monitoring, the client sends 10 PublishRequest messages and the server first time responds with the initial value of the variable.
- After each server response, the client sends a new PublishRequest to ensure the server never runs out of tokens.
- These PublishRequest messages have a "TimeoutHint" field, telling the server how long the token is valid before it expires. By default open62541 sets this to 60,000 ms (1 minute).
- When you monitor a node that never changes it's value and the server just sends a few empty responses (which also require a token) to prevent timeout, the amount of tokens provided at the beginning is likely to expire.
- When a token expires, the server behaves correctly by sending a ServiceFault message referring to the expired token with status code "BadTimeout".
- The open62541 client then logs "Received Timeout for Publish Response". Which I found in the open62541 sources in function `UA_Client_Subscriptions_processPublishResponse` (ua_client_subscriptions.c L1101) which itself just logs a warning and calls the configured "inactivityCallback".
- The callback configured by QtOpcUa open62541 plugin is function `Open62541AsyncBackend::inactivityCallback` (qopen62541backend.cpp) which then just disconnects the client instance.
I think the correct behaviour (as seen with other clients like UaExpert) should be: just ignore the BadTimeout code if referred to a PublishRequest, as the open62541 library already logs a warning on that and ensures to send a new PublishRequest to the server. But I had no time yet, to dig deeper in the Open62541AsyncBackend code to check if "inactivityCallback" is called in other places as well.
My example application provides the following log output:
Starting MonitoringTest... qt.opcua.plugins.open62541.sdk.userland: "AcceptAll Certificate Verification. Any remote certificate will be accepted." qt.opcua.plugins.open62541.sdk.securechannel: "Connection 724 | SecureChannel 2745215654 | Opened SecureChannel with SecurityPolicy http://opcfoundation.org/UA/SecurityPolicy#None" qt.opcua.plugins.open62541.sdk.client: "Client Status: ChannelState: Open, SessionState: Closed, ConnectStatus: Good" qt.opcua.plugins.open62541.sdk.client: "Selected Endpoint opc.tcp://WKS-00005:4840 with SecurityMode None and SecurityPolicy http://opcfoundation.org/UA/SecurityPolicy#None" qt.opcua.plugins.open62541.sdk.client: "Selected UserTokenPolicy Anonymous-[0]-None-None with UserTokenType Anonymous and SecurityPolicy http://opcfoundation.org/UA/SecurityPolicy#None" qt.opcua.plugins.open62541.sdk.client: "Client Status: ChannelState: Open, SessionState: Created, ConnectStatus: Good" qt.opcua.plugins.open62541.sdk.client: "Client Status: ChannelState: Open, SessionState: Activated, ConnectStatus: Good" Connected to OPC UA server! Created OPC UA node! Node attributes read: QFlags<QOpcUa::NodeAttribute>(Value) Read node value: QVariant(uint, 0) Enabled node monitoring! Running with no problems for 1.00 s Running with no problems for 10.00 s Running with no problems for 11.00 s Running with no problems for 20.02 s Running with no problems for 30.02 s Running with no problems for 40.01 s Running with no problems for 50.01 s Running with no problems for 60.02 s qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadTimeout" qt.opcua.plugins.open62541.sdk.client: "Received Timeout for Publish Response" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadTimeout" qt.opcua.plugins.open62541.sdk.client: "Received Timeout for Publish Response" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadTimeout" qt.opcua.plugins.open62541.sdk.client: "Received Timeout for Publish Response" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadTimeout" qt.opcua.plugins.open62541.sdk.client: "Received Timeout for Publish Response" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadTimeout" qt.opcua.plugins.open62541.sdk.client: "Received Timeout for Publish Response" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadNoSubscription" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadNoSubscription" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadNoSubscription" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadNoSubscription" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadNoSubscription" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadNoSubscription" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadNoSubscription" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadNoSubscription" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadNoSubscription" qt.opcua.plugins.open62541.sdk.client: "Received a ServiceFault response" qt.opcua.plugins.open62541.sdk.client: "The ServiceResult has the StatusCode BadNoSubscription" qt.opcua.plugins.open62541.sdk.client: "Client Status: ChannelState: Closed, SessionState: Closed, ConnectStatus: Good" Disconnected from OPC UA server! Failed after 62.01 s
So we can see what exactly happens:
- 60s after enabling monitoring, we receive 5 ServiceFault messages, telling us that 5 previously submitted PublishRequest tokens expired.
- Due to call chain `Open62541AsyncBackend::inactivityCallback -> Open62541AsyncBackend::clientStateCallback -> Open62541AsyncBackend::cleanupSubscriptions; ` the client sends a DeleteSubscriptionsRequest (not logged)
- For all other available PublishRequests, the server sends more ServiceFault messages, telling there is no active subscription (status code BadNoSubscription).
- A timer started in `Open62541AsyncBackend::clientStateCallback` closes the OPC UA connection.
Additionally to the test application, I attached a Wireshark capture that shows the behaviour described above.
If you need any further information or something tested, I'm glad to help!
Dominic