Recently I worked on a product solution using a development platform based on the Nordic nRF9151 chipset. One of the key requirements was to integrate a serial-based RFID module.
The development environment of choice for this board is the nRF Connect SDK (NCS), which is built on Nordic's fork of the Zephyr RTOS.
This was my first foray into Zephyr.
I picked C++ over C so that I could encapsulate the module's various features in its own class.
Data received from the module would trigger an interrupt, and a Message Queue would serve as a simple way to hold the data until a thread could process it. The queue was defined using the following macro (1-byte messages, 50 slots, 4-byte buffer alignment):
K_MSGQ_DEFINE(test_msgq, 1, 50, 4);
Within the Interrupt Service Routine (ISR), the data would be put to the queue using:
k_msgq_put(&test_msgq, &testDataForQueue, K_NO_WAIT);
And here is where a problem occurred: although the module was successfully transferring serial data, that data could not be put into the queue.
Because this call is made inside the ISR, the timeout value can only be K_NO_WAIT (meaning do not wait around for a slot to become available), so that the routine either puts to the Message Queue or fails immediately, and completes quickly.
On the first attempt to add an item to the queue, the call returned error -35 (ENOMSG). Normally, this would indicate a full queue, but the queue was completely empty. This was easy to check: the k_msgq_num_free_get() and k_msgq_num_used_get() functions reported 50 free entries and none used.
From here, first port of call was to do a quick Duck search around the traps to find out if this was a common problem. Turned up pretty much nothing.
The next stop was to give AI a try (tech's equivalent of a lying schoolboy). The poor model ran itself in circles arguing that the queue was full, or that I needed to allow some time delay in the ISR, which is bad advice. After supplying much debugging information to AI, it ended up having no idea.
The next step in tracking down the issue was to create a minimal reproducible example.
I created a simple C++ app with just a Message Queue, an ISR based on a button press, and a thread. I added just a single byte to the queue to eliminate the source of data having anything to do with the problem. Same result: error -35 (ENOMSG).
What if I removed the ISR from the equation and just added the byte to the queue from a thread where time is not constrained?
Same: error -35 (ENOMSG).
I decided to try a different timeout parameter from the thread. Instead of K_NO_WAIT, I went for K_SECONDS(10):
k_msgq_put(&test_msgq, &testDataForQueue, K_SECONDS(10));
This resulted in the thread waiting for 10 seconds before returning error -11 (EAGAIN), which means the waiting period timed out. I guess it did indeed time out waiting for a free slot.
The same question remains: it's an empty queue, so why act like it's full?
For fun I tried the final option of K_FOREVER:
k_msgq_put(&test_msgq, &testDataForQueue, K_FOREVER);
This just resulted in the thread blocking forever. No slot was ever going to free up in the queue.
My final test was to eliminate Zephyr threads entirely and try the three tests again from the main thread. The results were exactly the same as the thread tests.
Ok, so where to from here?
What if I started over with a simple “hello world” application in C and tried again from there? Lo and behold, it works from the main thread! I can add a single byte to the Message Queue. What gives?
Then the penny started to drop… is it possible that the difference could be lying in the fact that my application is C++ and the hello world is C?
I switched back to my minimal C++ application and put it to AI. Was there a difference?
But before I get to that, I was given something I probably should have read at the very start when digging into Zephyr: https://docs.zephyrproject.org/latest/develop/languages/cpp/index.html (C++ Language Support — Zephyr Project Documentation).
C++ usage is not just a given in Zephyr. It needs to be configured. Though this configuration was not necessary for my particular issue, it should be done to avoid further issues down the track.
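As a rough illustration of that configuration, a prj.conf fragment might look like the one below. This is a minimal sketch rather than a complete setup: the option names are those used by recent Zephyr releases (older releases used CONFIG_CPLUSPLUS instead of CONFIG_CPP), and the choice of C++17 is only an assumption, so check the linked page against your SDK version.

```conf
# Enable C++ support in the build (CONFIG_CPLUSPLUS in older Zephyr releases)
CONFIG_CPP=y
# Optional: pick a language standard; C++17 is just an example choice
CONFIG_STD_CPP17=y
```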
What I learned was that the K_MSGQ_DEFINE macro statically creates a queue object with a particular size and layout in memory when it is expanded in a C file. Expanded in a C++ file, the object might or might not end up the same size, for example if the C++ compiler pads the object differently.
When calling functions on the queue object, for example k_msgq_num_free_get(), the queue might report correctly, but other behaviour may not work as intended. That was the case when trying to put something to the queue: the Message Queue's internal offsets were out when it was defined in a C++ file, so it could not correctly determine whether something could be put into it.
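One way to test a layout theory like this is to report the size and member offsets of the structure from a C file and from a C++ file, then compare the two. The sketch below shows the idea for the C++ side only. The demo_msgq struct and its member names are hypothetical stand-ins so that the snippet builds on a host machine; they are not Zephyr's real definition. In a real project you would include the Zephyr kernel header and use struct k_msgq with its actual members.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for a kernel object. NOT Zephyr's struct k_msgq.
struct demo_msgq {
    std::size_t msg_size;
    unsigned int max_msgs;
    char *buffer_start;
    char *read_ptr;
    char *write_ptr;
    unsigned int used_msgs;
};

// Values worth comparing between a C and a C++ translation unit.
struct layout_report {
    std::size_t size;
    std::size_t align;
    std::size_t off_max_msgs;
    std::size_t off_used_msgs;
};

// Collect the layout as this compiler, in this language mode, sees it.
inline layout_report demo_msgq_layout()
{
    layout_report r;
    r.size = sizeof(demo_msgq);
    r.align = alignof(demo_msgq);
    r.off_max_msgs = offsetof(demo_msgq, max_msgs);
    r.off_used_msgs = offsetof(demo_msgq, used_msgs);
    return r;
}

// Print one line per translation unit; tag says which side produced it.
inline void print_layout(const char *tag)
{
    const layout_report r = demo_msgq_layout();
    std::printf("%s: size=%zu align=%zu max_msgs@%zu used_msgs@%zu\n",
                tag, r.size, r.align, r.off_max_msgs, r.off_used_msgs);
}
```

Calling print_layout("cpp") here and an equivalent function from a C file gives two lines that should match exactly; any difference points at the member where the layouts diverge.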
The first way to fix this was to move the creation of the Message Queue into a C file, which could then be declared and used from the C++ code. This does work, but it's not ideal, at least from my perspective.
In the end, I opted not to use the K_MSGQ_DEFINE macro at all. Instead, I created my own buffer for the queue and declared the queue object itself with:
static char test_queue_buf[50 * sizeof(unsigned char)];
static struct k_msgq test_msgq;
And then initialise it with:
k_msgq_init(&test_msgq, test_queue_buf, sizeof(unsigned char), 50);
Now it works from the main thread, from Zephyr threads and from an ISR.
As ever, the simplest answer is usually the right one. But if you can't get to any answer, simple or not, cutting a copy of your code down to a minimal reproducible example is an excellent tool for helping you discover it.
Enjoy your journey into Zephyr.