Forwarding stops working
I am facing a problem that shows up every few days: sensor value forwarding suddenly stops working.
This happens both on my home server (Atom Z8350, Ubuntu) and on my camper server (RPi 3B+).
I have not found any reliable and repeatable procedure to recover from the error.
Sometimes it recovers after restoring the previous day's backup, sometimes after rebooting the Pi or disconnecting and reconnecting the serial gateway.
Last Sunday I installed a clean MyController instance, copied the backups over from the previous one and restored the last working backup.
This morning I switched off the display that receives all the forwarded messages, and after it reconnected, forwarding just stopped working.
I restored the last backup and now also the connection with the external InfluxDB stopped working.
I don't know where to start.
I can only rule out hardware, RF or supply problems, since the MySensors network is proven to work reliably.
Any help would be appreciated.
@fsgrazzutti Do you see data being received by the MyController server?
Do you see any errors in the mycontroller.log?
@jkandasa I see only sporadic data coming in, while InfluxDB receives all of it.
I connected the serial gateway to the pc using MYScontroller and the data flows regularly.
Now I created a new fresh install and I am restarting from the beginning.
I still have to recreate the forwardings, which is a pain.
I will let you know how it goes.
@jkandasa I installed the fresh instance and the nodes showed up immediately.
I came back after a few hours to start adding the forwardings, but the sensors list is empty.
Looking at the log, I found this:
2021-05-04 21:15:44,641 ERROR [mc-th-pool-0] [org.mycontroller.standalone.provider.EngineAbstract:279] Throws exception while processing!, [MessageImpl(gatewayId=1, nodeEui=5, sensorId=SENSOR_BC, type=Internal, subType=Pre sleep notification, ack=0, payload=500, isTxMessage=false, timestamp=1620155744629, properties=null)]
java.util.concurrent.RejectedExecutionException: Task org.mycontroller.standalone.provider.ResourcesLogger@1d223de rejected from java.util.concurrent.ThreadPoolExecutor@a957bf[Running, pool size = 70, active threads = 70, queued tasks = 100, completed tasks = 551833]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
    at org.mycontroller.standalone.McThreadPoolFactory.execute(McThreadPoolFactory.java:52)
    at org.mycontroller.standalone.provider.ExecuterAbstract.execute(ExecuterAbstract.java:134)
    at org.mycontroller.standalone.provider.EngineAbstract.auditQueue(EngineAbstract.java:274)
    at org.mycontroller.standalone.provider.EngineAbstract.run(EngineAbstract.java:133)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
This is repeated for every sensor variable. The sensor variables are all present in InfluxDB (remote server).
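The executor state in the exception message (pool size = 70, active threads = 70, queued tasks = 100) is consistent with every worker thread being busy and the task queue being full when the task was rejected. To see whether this holds for every rejection, the counters can be pulled out of the log. Below is a minimal Python sketch, assuming the message format shown above; the function names `pool_states` and `summarize` are made up for illustration and are not part of MyController.

```python
import re

# Matches the executor state that appears in the RejectedExecutionException
# message in the log above. The format is assumed from that single sample.
POOL_STATE = re.compile(
    r"pool size = (?P<pool>\d+), active threads = (?P<active>\d+), "
    r"queued tasks = (?P<queued>\d+), completed tasks = (?P<completed>\d+)"
)


def pool_states(lines):
    """Yield one dict of integer counters per line that reports a rejected task."""
    for line in lines:
        if "RejectedExecutionException" not in line:
            continue
        match = POOL_STATE.search(line)
        if match:
            yield {name: int(value) for name, value in match.groupdict().items()}


def summarize(lines):
    """Count rejections and check whether all workers were busy each time."""
    states = list(pool_states(lines))
    return {
        "rejections": len(states),
        "always_saturated": bool(states)
        and all(s["active"] == s["pool"] for s in states),
        "max_queued": max((s["queued"] for s in states), default=0),
    }
```

It could be run over the log with something like `summarize(open("mycontroller.log", errors="replace"))`, where the path is a placeholder for wherever the log actually lives.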
What am I missing?
Thanks and kind regards,
@fsgrazzutti We have some fixes on sensor related issues on the SNAPSHOT version.
Can you please try with the SNAPSHOT version?
We are going to release this soon. Thanks!
SNAPSHOT Version: https://forum.mycontroller.org/topic/58/download-snapshot-build/1
@jkandasa already done yesterday, when I found it, but unfortunately it did not work.
I must say that the system has lots of sensor variables (around 200).
When I migrated, as described in the dirty way method, they were already all there.
I will have a look later today. Is there any information I could collect?
Btw, the sensors are shown in the node details and contain all the info.
The Action menu also works, but as soon as you encounter a drop-down menu, it's empty.
@jkandasa I don't know if it might be related, but yesterday when rewiring my gateways I entered a condition where the GW was 'up' and showing last message count OK, but no update on any sensor graphs attached to that GW - I wonder if it is an issue between myc and the db as I assume all graph updates come from db?
The cure was to remove power from GW and then restore power. All seems OK after that....
@jkandasa sorry for the late reply.
I don't get any error, just the lists are empty.
When loading the Sensors subpage, it stays there forever.
It seems to be a database-related issue; maybe I have too many sensors reporting (average duration is 20 ms, 600 messages per minute).
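As a rough sanity check on those numbers, here is a small Python calculation. It assumes that "average duration" means the processing time per message and that the pool has the 70 threads reported in the log earlier in the thread; both are assumptions, and `worker_utilisation` is just an illustrative helper.

```python
def worker_utilisation(messages_per_minute, avg_duration_ms, workers=1):
    """Fraction of the available worker time spent processing messages."""
    busy_seconds_per_minute = messages_per_minute * avg_duration_ms / 1000.0
    return busy_seconds_per_minute / (60.0 * workers)


# 600 messages/minute at 20 ms each (figures from the post above).
single = worker_utilisation(600, 20)
# Same load spread over a 70-thread pool (pool size taken from the log).
pool = worker_utilisation(600, 20, workers=70)
print(f"{single:.1%} of one thread, {pool:.2%} of a 70-thread pool")
```

If that reading is right, the raw message rate alone would keep a single thread busy only about a fifth of the time, so the saturated pool might point to tasks blocking (for example on slow database queries) rather than to sheer message volume.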
In the meantime I took an ESP32 that cyclically queries InfluxDB for all the sensors the display expects and then forwards them node-to-node to the display (I use MySensors and nRF24 radios). There are still a couple of variable casting problems on V_PERCENTAGE, but it works.
Of course, I look forward to getting back to MyController.
@jkandasa I will give the snapshot a try, thanks!
@fsgrazzutti Sure, kindly report back your findings on this; they can help us narrow down the issue.
@jkandasa today I installed the latest SNAPSHOT, but I wanted to start from a fresh configuration, so I did not restore any backup from the previous snapshot.
So far, looks like it is running much faster, and the sensors dropdown list is filled in a few seconds instead of minutes as before.
Great job ! Happy to be back with MyController !
I will still keep the current solution for the display.
This way the data from InfluxDB to the display doesn't go through MyController. I added a second serial gateway (different channel and network ID) just for that purpose, and so far only for date and time synchronisation.
@jkandasa I just installed the latest SNAPSHOT on the home server and restored the last backup, and it works smoothly.
@fsgrazzutti That is good news. Let us know if you see the issue again. Thanks!