Debugging RLP code

Does anyone have any pointers on debugging RLP code? I’m running into trouble with my RLP shift register driver, and I’m not quite sure where to turn.

For starters, I’m wondering if my pin assignments are correct. On the managed side, I’m doing this:


public ShiftRegisterRLP(Cpu.Pin clock, Cpu.Pin data, Cpu.Pin latch, Cpu.Pin clear)
{
	var elf = Resources.GetBytes(Resources.BinaryResources.RLPShiftRegister);

	RLP.LoadELF(elf);
	RLP.InitializeBSSRegion(elf);

	m_init_proc    = RLP.GetProcedure(elf, "Initialize");
	m_release_proc = RLP.GetProcedure(elf, "Release");
	m_writeb_proc  = RLP.GetProcedure(elf, "WriteByte");
	m_writes_proc  = RLP.GetProcedure(elf, "WriteShort");

	elf = null;

	Debug.GC(true);

	m_init_proc.Invoke((uint)clock, (uint)data, (uint)latch, (uint)clear);
}

That is, I’m first casting FEZ_Pin to Cpu.Pin, and then casting to uint to pass to the RLP code, which looks like this:


unsigned int port_clock = RLP_GPIO_NONE;
unsigned int port_data  = RLP_GPIO_NONE;
unsigned int port_latch = RLP_GPIO_NONE;
unsigned int port_clear = RLP_GPIO_NONE;

int Initialize(unsigned int *generalArray, void **args, unsigned int argsCount, unsigned int *argSize)
{
	port_clock = *(unsigned int*)args[0];
	port_data  = *(unsigned int*)args[1];
	port_latch = *(unsigned int*)args[2];
	port_clear = *(unsigned int*)args[3];

	// make sure the ports aren't already reserved (latch and clear are optional)
	if( RLPext->GPIO.IsReserved(port_clock) == RLP_TRUE )
		return -1;
	if( RLPext->GPIO.IsReserved(port_data) == RLP_TRUE )
		return -2;
	if( port_latch != RLP_GPIO_NONE && RLPext->GPIO.IsReserved(port_latch) == RLP_TRUE )
		return -3;
	if( port_clear != RLP_GPIO_NONE && RLPext->GPIO.IsReserved(port_clear) == RLP_TRUE )
		return -4;

	// reserve the clock port
	if( RLPext->GPIO.ReservePin(port_clock, RLP_TRUE) != RLP_TRUE )
	{
		Release(0, 0, 0, 0);
		return -1;
	}

	// reserve the data port
	if( RLPext->GPIO.ReservePin(port_data, RLP_TRUE) != RLP_TRUE )
	{
		Release(0, 0, 0, 0);
		return -2;
	}

	// reserve the latch port
	if( port_latch != RLP_GPIO_NONE && RLPext->GPIO.ReservePin(port_latch, RLP_TRUE) != RLP_TRUE )
	{
		Release(0, 0, 0, 0);
		return -3;
	}

	// reserve the clear port
	if( port_clear != RLP_GPIO_NONE && RLPext->GPIO.ReservePin(port_clear, RLP_TRUE) != RLP_TRUE )
	{
		Release(0, 0, 0, 0);
		return -4;
	}

	// enable output mode on the pins, all low initially
	RLPext->GPIO.EnableOutputMode(port_clock, RLP_FALSE);
	RLPext->GPIO.EnableOutputMode(port_data,  RLP_FALSE);
	
	if( port_latch != RLP_GPIO_NONE )
		RLPext->GPIO.EnableOutputMode(port_latch, RLP_FALSE);
	
	if( port_clear != RLP_GPIO_NONE )
		RLPext->GPIO.EnableOutputMode(port_clear, RLP_FALSE);

	// we're done
	return 0;
}

Is that a valid way to get pin numbers? For my Panda II, they map like this:
FEZ_Pin.Digital.Di20 -> pin 52
FEZ_Pin.Digital.Di21 -> pin 51
FEZ_Pin.Digital.Di22 -> pin 54
FEZ_Pin.Digital.Di23 -> pin 53

You can debug by sending some informative messages to a serial port. Sometimes an LED can help as well.

Ah, yeah, I figured out my problem. I was initializing the clear pin to low, and never changing it (it’s active low :-[).

Now, I’ve got it working, even though my binary math is off somewhere. That I can deal with. Oh, and when you said it would be faster, you weren’t kidding:

// 16-bit test:
// Completed [managed] in 232296ms.
// Completed [managed] in 228876ms.
// Completed [rlp] in 996ms.
// Completed [rlp] in 996ms.
// Completed [rlp] in 996ms.

:o :o :o

Don’t know about the pin mapping, but have you checked the return value of the Initialize method?

Oops, too late :)
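For anyone landing here later: checking that return value is the cheapest first step, since Initialize above already returns a distinct code per pin. A minimal sketch of turning those codes into text for a debug message follows. The helper name describe_init_result is hypothetical and not part of RLP; the codes are the ones used in the Initialize listing earlier in this thread.

```c
#include <stddef.h>

/* Hypothetical helper (not part of RLP): maps the return codes used by
   the Initialize procedure in this thread to a short description. */
const char *describe_init_result(int code)
{
	switch (code)
	{
		case  0: return "ok";
		case -1: return "clock pin already reserved or reservation failed";
		case -2: return "data pin already reserved or reservation failed";
		case -3: return "latch pin already reserved or reservation failed";
		case -4: return "clear pin already reserved or reservation failed";
		default: return "unknown code";
	}
}
```

The same mapping could just as well live on the managed side, next to the call to m_init_proc.Invoke, where the code is actually received.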

Hm, bad news. My binary math wasn’t off, I was only running the 8-bit test (outputting 0-255) instead of the 16-bit test (outputting 0-65535).

Turns out the RLP driver isn’t all that much faster. In fact, it’s slower. Possibly because it’s triggering all sorts of garbage collections, and I’m not sure why (I’m using **args instead of *generalArray, could that be it?). The updated numbers are as follows:

// Completed [managed] in 232296ms.
// Completed [managed] in 228876ms.
// Completed [rlp] in 251128ms.

What compiler are you using for the native code?

The YAGARTO one. My makefile looks like this:


OUTFILE=RLPShiftRegister
LINKERSCRIPT = RLP_LinkScript.lds

INCL=./include

CC		=arm-none-eabi-gcc-4.5.2
LD		=arm-none-eabi-gcc-4.5.2

CCFLAGS=  -g -mlittle-endian -mcpu=arm7tdmi  -Wall -I. -I$(INCL)
CCFLAGS+= -mapcs-frame -fno-builtin

LDFLAGS =-nostartfiles -Wl,--Map -Wl,./Output/$(OUTFILE).map
LDFLAGS+=-lc -lgcc -Wl,--omagic
LDFLAGS+=-T $(LINKERSCRIPT)

OBJS+= RLPShiftRegister.o

rebuild: clean all del_o

all: $(OBJS)
	$(LD) $(LDFLAGS) -o ./Output/$(OUTFILE).elf $(OBJS)

RLPShiftRegister.o: RLPShiftRegister.c 
	$(CC) -c $(CCFLAGS) RLPShiftRegister.c -o RLPShiftRegister.o

clean:
	-rm *.o ./Output/*.elf ./Output/*.map

del_o:
	-rm *.o

del_map:
	-rm ./Output/*.map

What are you trying to do? What C# code are you comparing this against?

There is a lot of optimization everywhere in the GHI drivers, so RLP will not be faster than the built-in drivers. RLP is good for processor-intensive work or real-time operations.

I’m bit-banging some output ports for interfacing with a shift register (SIPO).

My C# code, essentially, is this (8-bit version):

private OutputPort m_clock;
private OutputPort m_data;
private OutputPort m_latch;
private OutputPort m_clear;

public void Write(byte data, bool clear, bool lsbFirst)
{
	if( m_clear != null && clear )
	{
		m_clear.Write(false);
		m_clear.Write(true);
	}

	if( m_latch != null )
		m_latch.Write(false);

	for( var i = 0; i < 8; i++ )
	{
		// first, bring the clock low
		m_clock.Write(false);

		// then, write the data
		m_data.Write(data >= 128);

		// finally, bring the clock high
		m_clock.Write(true);

		// shift left
		data = (byte)(data << 1);
	}

	if( m_latch != null )
		m_latch.Write(true);
}

My RLP code is as follows:

int WriteByte(unsigned int *generalArray, void **args, unsigned int argsCount, unsigned int *argSize)
{
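	unsigned char data  = *(unsigned char*)args[0];
	unsigned int  clear = *(unsigned int*)args[1];
	unsigned int  lsbf  = *(unsigned int*)args[2]; // not used yet: bits always go out MSB first
	unsigned int  i; // loop counter for the bit loop below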

	// clear (if we're clearing, and we have a clear pin)
	if( clear == RLP_TRUE && port_clear != RLP_GPIO_NONE )
	{
		RLPext->GPIO.WritePin(port_clear, RLP_FALSE);
		RLPext->GPIO.WritePin(port_clear, RLP_TRUE);
	}

	// latch false (if we're latching data)
	if( port_latch != RLP_GPIO_NONE )
		RLPext->GPIO.WritePin(port_latch, RLP_FALSE);

	// write the data
	for( i = 0; i < 8; i++ )
	{
		// first, bring the clock low
		RLPext->GPIO.WritePin(port_clock, RLP_FALSE);

		// then, write the data
		RLPext->GPIO.WritePin(port_data, data >= 128);

		// finally, bring the clock high
		RLPext->GPIO.WritePin(port_clock, RLP_TRUE);

		// shift left
		data = data << 1;
	}

	// latch true (if we're latching data)
	if( port_latch != RLP_GPIO_NONE )
		RLPext->GPIO.WritePin(port_latch, RLP_TRUE);

	return 0;
}
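Both versions take an LSB-first flag but always shift MSB first. Below is a minimal sketch of how the flag could be honored, runnable off-device. The names write_data_pin and shift_out are made up for illustration, and write_data_pin is a recording stub standing in for RLPext->GPIO.WritePin on the data pin; clock and latch handling is left out to keep the bit order in focus.

```c
#include <stddef.h>

/* Hypothetical stand-in for RLPext->GPIO.WritePin on the data pin:
   records each level so the bit order can be checked off-device. */
static unsigned char g_bits[16];
static size_t g_count = 0;

static void write_data_pin(int level)
{
	if (g_count < sizeof g_bits)
		g_bits[g_count++] = (unsigned char)(level ? 1 : 0);
}

/* Shift one byte out, MSB first unless lsb_first is nonzero. */
static void shift_out(unsigned char data, int lsb_first)
{
	int i;

	for (i = 0; i < 8; i++)
	{
		if (lsb_first)
		{
			/* send bit 0, then move the next bit down */
			write_data_pin(data & 0x01);
			data = (unsigned char)(data >> 1);
		}
		else
		{
			/* send bit 7, then move the next bit up */
			write_data_pin(data & 0x80);
			data = (unsigned char)(data << 1);
		}
	}
}
```

Testing the top bit with a mask (data & 0x80) is equivalent to the data >= 128 comparison used above for an 8-bit value, and the same pattern extends to the low bit.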

RLP should be faster, but you see the real advantage when you transfer whole buffers, not a single byte.

Keep in mind the overhead of passing data from C# to C.

[quote]RLP should be faster, but you see the real advantage when you transfer whole buffers, not a single byte.

Keep in mind the overhead of passing data from C# to C.[/quote]

Right, even with two-byte buffers I’m seeing no advantage at all. There are a LOT of collections happening while running the RLP code, and none while running the managed code. I’m going to rework the system to use the *generalArray parameter instead of the **args parameter, since that method is said to be more efficient (hopefully eliminating the GC activity…)
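Roughly what I have in mind for the native side is sketched below. This is untested on hardware and rests on assumptions: the managed side passes a byte[] as the general array and the byte count as the first argument, and write_clock and write_data are hypothetical stubs standing in for RLPext->GPIO.WritePin so the sketch can run off-device. WriteBuffer is a new name, not something in my driver yet. Whether this actually removes the GC activity remains to be seen.

```c
#include <stddef.h>

/* Hypothetical stubs standing in for RLPext->GPIO.WritePin. The data level
   is sampled on each rising clock edge, as a SIPO shift register would. */
static int g_data_level = 0;
static int g_clock_level = 0;
static unsigned char g_captured[8];
static unsigned int g_bit_count = 0;

static void write_data(int level) { g_data_level = level ? 1 : 0; }

static void write_clock(int level)
{
	level = level ? 1 : 0;
	if (level && !g_clock_level && g_bit_count < 8 * sizeof g_captured)
	{
		unsigned int idx = g_bit_count / 8;
		g_captured[idx] = (unsigned char)((g_captured[idx] << 1) | g_data_level);
		g_bit_count++;
	}
	g_clock_level = level;
}

/* Same signature as the other procedures. Assumption: generalArray holds
   the bytes to send and args[0] points to the number of bytes. */
int WriteBuffer(unsigned int *generalArray, void **args, unsigned int argsCount, unsigned int *argSize)
{
	unsigned char *buffer = (unsigned char*)generalArray;
	unsigned int length, n;
	int i;

	(void)argSize; /* not needed in this sketch */

	if (generalArray == NULL || argsCount < 1)
		return -1;

	length = *(unsigned int*)args[0];

	/* one native call shifts out the whole buffer, MSB first */
	for (n = 0; n < length; n++)
	{
		unsigned char data = buffer[n];

		for (i = 0; i < 8; i++)
		{
			write_clock(0);
			write_data(data & 0x80);
			write_clock(1);
			data = (unsigned char)(data << 1);
		}
	}

	return 0;
}
```

The point of the design is that the per-call overhead of crossing from C# to C is paid once per buffer instead of once per byte; latch and clear handling would wrap the outer loop the same way they wrap the bit loop in WriteByte.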