Galileo first impressions

Compiling IL to native code has existed since the very first version of the .NET framework. On the desktop, IL has ALWAYS been compiled to native code.

Compiling IL to native code for ARM has existed since NETCF. It’s not new, and it predates Windows Phone by something like a decade.

The real problem (IMNSHO) is that there’s no such thing as a “.NET Micro Framework team”. Four months ago MS committed to providing more resources to NETMF, and what have we seen since? Nothing.

https://www.ghielectronics.com/community/forum/topic?id=15304

So if IL could run natively on a uC, then it would run at full speed, right? It would be as fast as Thumb.

Full speed yes, but not as fast as Thumb. The main reason I am so confident it would not be as fast as Thumb is that IL is stack based. Stack based languages (IL, Java ByteCode, etc.) are without a doubt the simplest targets to compile for and interpret, but they suffer when it comes to execution speed, even if the CPU natively supports the language.
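To illustrate both points at once (why a stack machine is the simplest target to implement, and why it pays for that in memory traffic), here is a minimal sketch of a stack based dispatch loop in C#. The opcode set and names are invented for illustration, not taken from any real VM:

  // Minimal sketch of a stack-machine dispatch loop. The opcodes are
  // invented for illustration. Note that even a simple Add touches the
  // stack array (i.e. RAM) three times: two reads and one write.
  enum Op { Push, Add, Sub, Halt }

  static int Run((Op op, int arg)[] code)
  {
      var stack = new int[64];   // the evaluation stack lives in RAM
      int sp = 0, pc = 0;
      while (true)
      {
          var (op, arg) = code[pc++];
          switch (op)
          {
              case Op.Push: stack[sp++] = arg; break;                    // 1 write
              case Op.Add:  stack[sp - 2] += stack[sp - 1]; sp--; break; // 2 reads, 1 write
              case Op.Sub:  stack[sp - 2] -= stack[sp - 1]; sp--; break;
              case Op.Halt: return stack[--sp];                          // 1 read
          }
      }
  }

Run(new[] { (Op.Push, 10), (Op.Push, 8), (Op.Sub, 0), (Op.Halt, 0) }) returns 2, and even that trivial program makes six stack memory accesses.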

Registers on a CPU are implemented using very fast on-chip memory. This memory is not your typical SRAM or SDRAM; it is normally built from dedicated D flip-flops, each just a few NAND gates, one flip-flop per bit. Implementing memory like this is fast because it is accessible right where you need it: no need to load an address onto the address bus in one clock cycle, signal a read/write line on the next clock cycle, wait some number of clock cycles for data to become available on the data bus, and so on. Sadly it is expensive in terms of silicon real estate, so there cannot be much of it, which is one of the reasons the number of registers is limited.

Now contrast this with a stack based language. Here you almost have to use external RAM to maintain the stack, unless you are going to severely limit your stack depth, which is not practical for a language that relies so heavily on the stack. Sure, you can do all kinds of optimizations, like mirroring the top N stack elements in register memory, but look at IL: everything is on the stack. Local variables are accessed on the stack, function arguments are on the stack, call frames are on the stack, and every expression uses the stack as intermediate storage; simply adding two numbers uses the stack. So ultimately your data has to spill to the RAM based stack.
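As a concrete illustration (my own sketch, though it matches what release-mode codegen typically produces), even adding two arguments round-trips through the evaluation stack in IL, where a register machine needs a single instruction:

  static int Sum(int a, int b)
  {
      // IL (roughly):
      //   ldarg.0    // push a onto the evaluation stack
      //   ldarg.1    // push b
      //   add        // pop both operands, push the sum
      //   ret        // pop the return value
      // A register based CPU does the work in one instruction: ADD r0, r0, r1
      return a + b;
  }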

You might argue that even on a register based CPU you need to move data from slow memory to the registers. True, but consider a simple routine to find the GCD of two numbers.

Before anyone shouts: yes, this is a simple example that could take advantage of mirroring the top stack elements, but take it for what it is, an example showing the difference between a stack based language and a register based CPU.


; Load the initial values
    MOV   r0, #10
    MOV   r1, #8

; Using registers, calculate the GCD
while:
    CMP   r0, r1
    BEQ   endwhile        ; done when r0 == r1

    SUBGT r0, r0, r1      ; if r0 > r1 then r0 = r0 - r1
    SUBLE r1, r1, r0      ; otherwise  r1 = r1 - r0
    B     while
endwhile:

Contrast this with IL, where in your while loop you would be pushing and popping the stack repeatedly to execute the expressions, compare the data on the stack, etc.


  IL_0000:  br.s       IL_0012
  IL_0002:  ldarg.0 
  IL_0003:  ldarg.1 
  IL_0004:  ble.s      IL_000d 
  IL_0006:  ldarg.0 
  IL_0007:  ldarg.1 
  IL_0008:  sub      
  IL_0009:  starg.s    a 
  IL_000b:  br.s       IL_0012
  IL_000d:  ldarg.1 
  IL_000e:  ldarg.0 
  IL_000f:  sub 
  IL_0010:  starg.s    b 
  IL_0012:  ldarg.0 
  IL_0013:  ldarg.1 
  IL_0014:  bne.un.s   IL_0002 

Every instruction except the unconditional branches (br.s) performs multiple stack accesses. IL_0008, for example, is a sub, which will do pop, pop, subtract, push, i.e. pull the two top stack elements, subtract them and then push the result back onto the stack.
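For reference, here is a C# method along the lines of what plausibly produced the IL above (a sketch; I have matched the argument names to the starg.s targets, and the trailing ret is not shown in the listing):

  // A sketch of the kind of C# that compiles to roughly the IL above.
  // The argument names 'a' and 'b' match the starg.s targets.
  static int Gcd(int a, int b)
  {
      while (a != b)      // ldarg.0, ldarg.1, bne.un.s IL_0002
      {
          if (a > b)      // ldarg.0, ldarg.1, ble.s IL_000d
              a = a - b;  // ldarg.0, ldarg.1, sub, starg.s a
          else
              b = b - a;  // ldarg.1, ldarg.0, sub, starg.s b
      }
      return a;
  }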

NOTE: I am not saying it won’t be fast, but it will not be as fast as Thumb.

Btw, this has been done more than once for Java ByteCode. Sun had the picoJava chip, which did not achieve much; they eventually released the FPGA code (Verilog) for it. And ARM has Jazelle, which implements a subset of the Java ByteCodes in hardware. Reading the Wikipedia entry, I would say it has been a non-starter.


I would say that @ taylorza is correct on all counts. Furthermore, a CPU that could execute IL directly would be quite a complex CPU!

Complex high-level instruction sets make for complex, expensive, power-hungry CPUs. This is essentially the RISC vs CISC argument, though nowadays even our nice CISC CPUs (such as x86) are essentially RISC CPUs with lots of complex microcode :)

Could FORTRAN be making a comeback for devices? When I wrote FORTRAN we used to snicker at all those slow stack based languages, especially when we broke out the supercomputers.

Apart from @ Justin, has anyone else got their hands on one of these dev kits yet, or seen what these devices are capable of? Whilst my interest (and enjoyment) in NETMF and all things Gadgeteer has been rekindled in recent weeks (thanks @ Justin), I can’t help but think that this will be a horses for courses race where each platform will have its own niche! It would, of course, be nice to understand what MS thinks the Galileo’s niche is, given the reach, form factor and capabilities of NETMF.

It’s not really about the language. We’re talking about C# either way, whether the language is interpreted, compiled for a register machine, or compiled for a stack machine.

Also, when we say “stack”, this can mean different things. Stack frames containing local variables, return addresses, frame pointers, etc. reside mostly in memory (main memory, possibly cached) no matter what kind of machine executes the code. Expression stacks on the other hand, with intermediate results, are what would be kept in fast “stack registers” on a stack machine. A good stack machine, like the M-code machine for the Lilith computer ages ago, would have extremely good code density, thanks to its many zero-address instructions (operands are implicitly the top elements of the expression stack), and code density is still relevant in the microcontroller domain.

There is certainly much more research into and experience with optimizations for register machines, though. In particular, it is far easier for a register machine to correctly schedule operations in parallel. This is where the more fundamental speed-up comes from (although the last Transputer did an incredible job of parallelizing its stack operations). But this is hardly relevant for today’s microcontrollers with fast on-chip RAM and no instructions issued in parallel.

But all the nit-picking aside, I agree that the industry has converged on register machines, and it doesn’t make sense to tack something like Java byte codes onto an ARM controller, or to try to compete with ARM using some stack machine design. Even XMOS (the ex-Transputer guys) now uses a register machine design.

Regarding C# and ARM, I’d expect most of the [em]practical[/em] speed differences to come down to the effort a compiler can or cannot put into optimizations. A JIT would result in relatively slow code and use up much memory on the microcontroller, while a highly optimizing cross-compiler like .NET Native (https://github.com/dotnet/coreclr) should provide near-C performance.

The point I was addressing is the statement that a custom processor built to execute IL would run that IL as fast as an ARM processor runs Thumb, which is why I contrast a stack based language with a register based language: both are executed natively by the CPU, regardless of the source language.

I knew this would come up, which is why I had said “Sure you can do all kinds of optimization like having the top N stack elements mirrored in register memory…”.

[quote=“taylorza”]
The point I was addressing is the statement that a custom processor built to execute IL would run that IL as fast as an ARM processor[/quote]
Ah, ok. Yes, I’d never try to build such a processor either.

@ cuno @ taylorza

The attached image sums up your mini conversation nicely for me :)


OK, agreed that trying to implement IL at the silicon level is more of a CISC vs RISC argument; the chip would be too expensive and would defeat the purpose of a low cost, low power uC. GHI’s dual approach with RLP is the best blend of speed, cost and supervisory code.

Sadly, if the technology isn’t tied to some company’s revenue (like GHI’s), then the tech will die, and IoT will die shortly thereafter (because nobody else is really offering a good solution at this time).

If a processor could be devised that would run IL instructions at the same instructions-per-second speed as a processor that could run Thumb2 instructions, then the IL processor would indeed get more “useful” work done in a given time frame. The tradeoffs there would be power consumption, die space, complexity, etc. This is simply because the average IL instruction performs more “useful” work than the average Thumb2 instruction.
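As a hedged illustration of that density argument, consider a bounds-checked array load: it is a single IL instruction, while the equivalent work on a Thumb2 machine takes several instructions (the Thumb2 sequence below is my own sketch with illustrative offsets, not actual compiler output):

  static int Element(int[] arr, int i)
  {
      // IL (roughly): ldarg.0, ldarg.1, ldelem.i4, ret
      // ldelem.i4 alone performs the bounds check, the element address
      // computation and the 32-bit load.
      //
      // Thumb2 sketch of the same work:
      //   LDR  r2, [r0, #LENGTH_OFFSET]  ; load array length
      //   CMP  r1, r2                    ; bounds check
      //   BHS  throw_range               ; branch if index out of range
      //   ADD  r2, r0, r1, LSL #2        ; compute element address
      //   LDR  r0, [r2, #DATA_OFFSET]    ; load the element
      return arr[i];
  }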

I don’t think you have to worry about that. At the end of the day, it’s all tied to ARM’s, Intel’s, ST’s, NXP’s, and others’ revenue.

No, I don’t think so. All these companies could just write off IoT as a loss and bury the tech if they had to.

IoT is such a minuscule part of what these companies do, it’s not even an accounting error. The hardware we use for IoT is going nowhere, because it’s not being produced for IoT.

What does this mean? Leaving aside IoT as a buzzword, there are numerous microcontrollers with built-in Ethernet controllers, for example, and I have seen nothing to suggest that these MCUs are unsuccessful. On the contrary, over the last 20 years the use of industrial versions of Ethernet, e.g. for industrial automation, has steadily increased. This may only be the traditional part of what is now called IoT, but it is not a negligible market for MCU producers. Connectivity for MCUs, in all its forms, will stay with us, even if the predictions of Cisco et al. turn out to have been far too optimistic (“50 billion IoT devices by 2020”).

Right, sorry, I think it’s an English ambiguity. When I said “going nowhere”, I meant “will not be disappearing”, as in, “will continue to be available regardless of the success or failure of what is currently hyped as the Internet of Things”.

Like you said, these MCUs and other hardware, sensors, etc. are ubiquitous and are used for a lot of things every day. There’s no worry that Atmel (for example) will suddenly lose interest in the “IoT” and give up production of AVR chips.

Ah :)

I don’t know this guy, but here are his first impressions of the Galileo.

Spoiler - GPIO is sloooooow.

That’s been a common theme with the Gen1 boards.

From the Gen2 datasheet:

“12 GPIOs fully native for greater speed and improved drive strength.”

https://communities.intel.com/docs/DOC-22795