N bed with Lora, Lily and mDot

Sorry, I couldn’t help myself.
I meant mbed with LoraWAN, Llilum and mDot.

I have been very interested in the LoRaWAN space since it started to flourish last year.

Here is why:

  1. Low power: possibly running off small mAh batteries for longer than a gnat’s life.

  2. Long range: possibly farther than I can hit a baseball [yes, I know the theoretical distances are much greater].

  3. Each end node in the field doesn’t have to be configured with network access credentials [as you know, a major pain for IoT devices].

[The home healthcare solutions I am working on prove to be very challenging if an elderly person has to do some magic to authenticate with their WiFi network.]

I am also one of the 3 access points for http://thethingsnetwork.org/c/austin/ here in Austin.

[My gateway is not active yet; I am trying to configure the Conduit to do packet forwarding, except when the node ID is one of mine, in which case it sends the data to my web API in Azure.]

I have been focusing my attention on MultiTech’s hardware.

Gateway - MultiConnect Conduit

Soon to be Microsoft IoT certified.

mDot - mbed programmable Cortex M4 end node radio.

I like MultiTech for the following reasons:

  1. They have a complete line of LoRaWAN products.
  2. They have been doing industrial device communication for about 40 years.
  3. Made in the USA, Michigan
  4. 20 million devices out there in the wild.
  5. The soon-to-be-released xDot has a lower-power M0! http://www.multitech.com/news/article?id=30883
  6. My sales rep gets that we, the small innovators, not the big corporations, will create the next big hit.

Which brings me to my next topic - Llilum/mbed.

The mDots are programmed with ARM’s mbed platform, which is C++ code. I have been using mbed for a while, but C++ is still a foreign language to me. For me, C# is preferable.

I was thinking that if Llilum is used on mbed and mDot is also, why can’t we get Llilum on mDot?

So I was having this conversation offline with @ .Peter., @ JSimoes, Lorenzo and Steve Maillet (Llilum Gods) about having a talk with MultiTech about getting Llilum to run on mDot.
Last Friday Lorenzo and I had a conference call with Paul Jaeger, MultiTech’s Technical Sales and Marketing Director and Brandon Dalida, Regional Sales Manager - South (Peter and Jose’s time zone didn’t fit the schedule).
Lorenzo described Llilum to them and we discussed the topic at hand.

Here is my follow up email to Peter and Jose:

Well I thought the call went well.
We talked about MultiTech getting the Conduit certified for the Azure IoT Suite.
MultiTech doesn’t think there will be a problem with certification if we use Llilum.
Lorenzo gave us a good description of how Llilum works.
I didn’t ask any hardware questions because Lorenzo was on the phone and he didn’t feel the need to ask any, so he must not think it is an issue.
They said there is 500 KB for our programs and 96 KB of RAM on the current mDot.
They are building a new mDot with the super-low-power M0; the model # is NXP KL17, I think.
I asked if there was anything we needed to watch out for when we lay the new firmware down, so we won’t brick the chip, and Paul said no, they are pretty bulletproof.
The impression I was left with was if someone in the community wants to move forward with this topic, MultiTech does not mind. It didn’t sound like they were going to put a man on the project, but I am sure we will have access to some tech support. Lorenzo stated that he will…
Well I will let Lorenzo speak for himself, I don’t want to put words in his mouth.
I know this isn’t the detail you were looking for, but there it is.
Hopefully Lorenzo will give us his take on the phone call.

Lorenzo has not chimed in yet, but I think his take is that he will help out if the community wants to dive into this endeavor.

I had another call today with Brandon and he said that they took our topic to several other departments and they were receptive to the idea and are contemplating it.

If we can get MultiTech to spearhead working with Lorenzo and Steve on getting Llilum to run on the mDot, then we will be in business.

Attached are pictures of the Conduit gateway, the M4 mDot and the soon to be released M0 xDot.


@ terrence - something I tried some time ago … NUCLEO-F411RE & Llilum …

taken from a previous post I made …

… after getting a .NET 4.5 managed Hello World app deployed to an STM32F411RE … I let that sink in for a moment …

… so after a few moments of thoughtfulness …

For those interested:

Prepped the requirements for LLVM
Built the needed LLVM compiler (5 hrs)
Prepped the SDK
Built the SDK
Installed the SDK
Ran an SDK template and changed it for a 411RE-based MCU

Solution structure of the SDK based template:

Managed Project, targeting .Net 4.5
Native project, lots of CPP - here the mDot mbed library will probably need to be included and its functions imported so they can be used from the managed part …

This is real infancy time, but the idea of running a 4.5 app (or UWP) on an MCU, like the STM32F411RE or even an STM32L152, takes some, ehm, getting used to …

For additional info see this as well: [url]https://www.ghielectronics.com/community/forum/topic?id=22734[/url]


@ terrence - I will definitely be very interested and motivated to help out anybody that wants to try the port to mDot!

I did some performance investigation. I only spent a couple of hours, so it is not very deep; nevertheless, I have some interesting results I wanted to share.
I used a simple program to toggle a GPIO pin, and I used the K64F board. All measurements are relative, so the results should apply to any other board.
The program I compared to is the following:

 
 #include "mbed.h"

DigitalOut gpo(D0);
int main()
{
    while (true) {
        gpo.write( 0 );
        gpo.write( 1 ); 
    }
}

Simple, and likely as fast as possible on mbed: DigitalOut::write calls gpio_write directly, and that is the GPIO API in the mbed 2.0 HAL. Likely, one cannot do better than that using mbed.

I compiled the program on the Web UI, deployed it to my K64F board, and I got 2.5 MHz. Since one can only download the .bin file, it is not immediately obvious exactly what machine code one is executing. I then exported the program to ARM GCC, and after recompiling for release (flag NDEBUG), I redeployed and got 1.1 MHz, less than half… interesting. Obviously the libraries used are very different, so I went and checked what the exported program is doing, and I see that:



00001194 <gpio_write.constprop.1>:
    1194:	b5f0      	push	{r4, r5, r6, r7, lr}
    1196:	4b0d      	ldr	r3, [pc, #52]	; (11cc <gpio_write.constprop.1+0x38>)
    1198:	4d0d      	ldr	r5, [pc, #52]	; (11d0 <gpio_write.constprop.1+0x3c>)
    119a:	681e      	ldr	r6, [r3, #0]
    119c:	4686      	mov	lr, r0
    119e:	cd0f      	ldmia	r5!, {r0, r1, r2, r3}
    11a0:	b087      	sub	sp, #28
    11a2:	ac01      	add	r4, sp, #4
    11a4:	c40f      	stmia	r4!, {r0, r1, r2, r3}
    11a6:	682b      	ldr	r3, [r5, #0]
    11a8:	6023      	str	r3, [r4, #0]
    11aa:	b2f7      	uxtb	r7, r6
    11ac:	ab06      	add	r3, sp, #24
    11ae:	1336      	asrs	r6, r6, #12
    11b0:	eb03 0686 	add.w	r6, r3, r6, lsl #2
    11b4:	2301      	movs	r3, #1
    11b6:	f856 2c14 	ldr.w	r2, [r6, #-20]
    11ba:	f01e 0fff 	tst.w	lr, #255	; 0xff
    11be:	fa03 f307 	lsl.w	r3, r3, r7
    11c2:	bf0c      	ite	eq
    11c4:	6093      	streq	r3, [r2, #8]
    11c6:	6053      	strne	r3, [r2, #4]
    11c8:	b007      	add	sp, #28
    11ca:	bdf0      	pop	{r4, r5, r6, r7, pc}
    11cc:	1fff04d4 	svcne	0x00ff04d4
    11d0:	00006930 	andeq	r6, r0, r0, lsr r9

000011d4 <main>:
    11d4:	b508      	push	{r3, lr}
    11d6:	2000      	movs	r0, #0
    11d8:	f7ff ffdc 	bl	1194 <gpio_write.constprop.1>
    11dc:	2001      	movs	r0, #1
    11de:	f7ff ffd9 	bl	1194 <gpio_write.constprop.1>
    11e2:	e7f8      	b.n	11d6 <main+0x2>

That looks pretty minimal, although the bitfield struct for the K64F GPIO controller does not help in understanding what the compiler is doing (see file MK64F12_gpio.h)… and it seems to be doing a lot. Likely RVDS, used on ARM’s side for the Web UI, is doing way better. Also, the exported program seems to contain only debug libraries. In fact, when I dump the content of libmbed.a, I still see the asserts, which are in theory only enabled for debug builds (see mbed_assert.h). There is also code in mbed that uses <assert.h> from the STD library, and that complicates the analysis further… e.g. see file fsl_gpio_hal.h:


void GPIO_HAL_WritePinOutput(uint32_t baseAddr, uint32_t pin, uint32_t output)
{
    assert(pin < 32);
    if (output != 0U)
    {
        HW_GPIO_PSOR_WR(baseAddr, 1U << pin); /* Set pin output to high level.*/
    }
    else
    {
        HW_GPIO_PCOR_WR(baseAddr, 1U << pin); /* Set pin output to low level.*/
    }
}

So what we are really executing here is some sub-par assembly, interleaved with some asserts that crept in because of bad abstraction layers around macros.
Nevertheless, we have a first few interesting leads here:

  • for a busy loop performance analysis, the choice of compiler is fundamental (RVDS vs. GCC)
  • check your library flavors and dependencies
  • a few more instructions and you are stuck with 1/2 the throughput

I then moved to LLILUM, and I tried the following program, which is the moral equivalent of the one I used for mbed:



        private void Run()
        {
            System.Diagnostics.Debug.WriteLine( "User program starting..." );

            //
            // Test pin toggling speed with the Llilum GPIO API
            // 
            {
                var pinLlilum = Microsoft.CortexM0OnMBED.HardwareModel.GpioPin.Create( (int)Llilum.K64F.PinName.D0 );

                pinLlilum.Direction = Llilum.Devices.Gpio.PinDirection.Output;

                while(true)
                {
                    pinLlilum.Write( 0 );
                    pinLlilum.Write( 1 );
                }
            }
            
            //
            // Test pin toggling speed with the UWP GPIO API implemented with the Llilum GPIO API
            // 
            {
                var controller = GpioController.GetDefault();
                var pinUWP     = controller.OpenPin( (int)Llilum.K64F.PinName.D1 );

                pinUWP.SetDriveMode( GpioPinDriveMode.Output );

                while(true)
                {
                    pinUWP.Write( GpioPinValue.Low );
                    pinUWP.Write( GpioPinValue.High );
                }
            }
        }

The upper half uses the lowest level GPIO API in LLILUM, and the lower half uses the UWP ones.
Please note that you need to comment out the upper half to run the UWP portion.

My finding on the K64F is that we execute at 0.822 MHz with the low-level API when I compile for release (again, flag NDEBUG). Please note that the default today is to compile for DEBUG, even in the LLILUM SDK!

If I compile for release and I kill a pretty useless check at \Zelig\os_layer\ports\mbed\mbed_gpio.cpp, then I get 0.857 MHz:

HRESULT LLOS_GPIO_Write(LLOS_Context pin, int32_t value)
{
    /* KILL THIS CHECK
    LLOS_MbedGpio* pGpio = (LLOS_MbedGpio*)pin;

    if (pGpio == NULL)
    {
        return LLOS_E_INVALID_PARAMETER;
    }
    */

    gpio_write( &((LLOS_MbedGpio*)pin)->Pin, value );

    return S_OK;
}


If I compile for debug, then I get 0.770 MHz. Oooops...

The UWP portion of the program compiled for release executes at 0.203 MHz.
If you look at the UWP API, I have to say it is poorly specified. It makes a lot of demands on the implementation; e.g., one needs to check at every toggle whether the object is disposed, remember the last value written, etc.… not good in a busy loop, but we need to respect the UWP semantics.

In summary, this is what I found in a one-hour investigation on a K64F board:

LLILUM low-level GPIO API:

[quote]
release, no null check in GPIO functions: 0.857 MHz
release, with null check in GPIO functions: 0.822 MHz
debug, no null check in GPIO functions: 0.770 MHz
[/quote]


A few simple instructions cause a rather quick performance degradation. This is not surprising, since we are executing a rather minimal busy loop.

A few considerations:
1) LLILUM does not need C/C++ drivers. We used mbed drivers because they are there, readily available, but really we should implement them in C#, for all basic cases on a specific board.
2) GCC is not that great.
3) We use debug builds by default!
4) Class libraries need to be written for performance when you need performance. UWP was not written for performance. The LLILUM low-level GPIO library is a bit better, but could do even better.

All in all, it is not so difficult to make it go faster, but correctness first!  
(and remember that managed code does a lot of "free" null checks for you) 

I am more concerned with the build size.  That is really managed code overhead that takes a while to shave off... 

If somebody wants to dig a bit deeper, please let me know; I can point you to a few tricks. Also, it would be interesting to try and write a GPIO API really designed for performance on top of the actual GPIO controller, with no C/C++ in the middle!

There should be points for just the length and technical depth of posts.

@ lt72 - Hi Lorenzo, your analysis is very impressive. I agree with @ Mr. John Smith: lots of points for you, as “newbie” doesn’t quite fit.
Most of it is way over my pay grade :slight_smile:

For those that do not know, Lorenzo is the man behind Llilum.

Hopefully we can get @ .Peter., @ JSimoes, @ mcalsyn, @ justin and all the others who are interested in LoRaWAN to review your findings and pitch in.

Thank you again for your support.
