void FFT1D(int dir,int m,double *x, double *y)
{
long nn, i, i1, j, k, i2, l, l1, l2;
double c1, c2, tx, ty, t1, t2, u1, u2, z;
/* Calculate the number of points */
nn = 1;
for (i = 0; i < m; i++)
nn *= 2;
/* Do the bit reversal */
i2 = nn >> 1;
j = 0;
for (i = 0; i < nn - 1; i++)
{
if (i < j)
{
tx = x[i];
ty = y[i];
x[i] = x[j];
y[i] = y[j];
x[j] = tx;
y[j] = ty;
}
k = i2;
while (k <= j)
{
j -= k;
k >>= 1;
}
j += k;
}
/* Compute the FFT */
c1 = -1.0;
c2 = 0.0;
l2 = 1;
for (l = 0; l < m; l++)
{
l1 = l2;
l2 <<= 1;
u1 = 1.0;
u2 = 0.0;
for (j = 0; j < l1; j++)
{
for (i = j; i < nn; i += l2)
{
i1 = i + l1;
t1 = u1 * x[i1] - u2 * y[i1];
t2 = u1 * y[i1] + u2 * x[i1];
x[i1] = x[i] - t1;
y[i1] = y[i] - t2;
x[i] += t1;
y[i] += t2;
}
z = u1 * c1 - u2 * c2;
u2 = u1 * c2 + u2 * c1;
u1 = z;
}
c2 = sqrt((1.0 - c1) / 2.0);
if (dir == 1)
c2 = -c2;
c1 = sqrt((1.0 + c1) / 2.0);
}
/* Scaling for forward transform */
if (dir == 1)
{
for (i = 0; i < nn; i++)
{
x[i] /= (double)nn;
y[i] /= (double)nn;
}
}
}

This is a function of FFT transform in RLP in FezSpider.

I find that it takes 15sec to execute. I want to improve its performance.
If I use FezHydra then how much performance it will improve?
If you have another option then please share.

@ Duke Nukem -
in this function
nx=160
ny=160
dir=1

void FFT2D(COMPLEX *c, int nx, int ny, int dir)
{
COMPLEX *Output;
Output = RLPext->malloc(25600 * sizeof(COMPLEX));
//// Transform the Rows
index = 0;
for (i=0;i<=nx -1;i++)
{
for (j = 0; j <= ny - 1; j++)
{
Output[index].real = c[index].real;
Output[index].imag = c[index].imag;
index++;
}
}
for (j = 0; j < ny; j++)
{
for (i = 0; i < nx; i++)
{
real[i] = c[j*ny+i].real;
imag[i] = c[j * ny + i].imag;
}
// Calling 1D FFT Function for Rows
m = 7;//(int)Log(nx,2);//Finding power of 2 for current number of points e.g. for nx=512 m=9
FFT1D(dir, m,real,imag);
for (i = 0; i < nx; i++)
{
Output[j * ny + i].real = real[i];
Output[j * ny + i].imag = imag[i];
}
}
// Transform the columns
//*real = new double[ny];
//imag = new double[ny];*/
for (i = 0; i < nx; i++)
{
for (j = 0; j < ny; j++)
{
real[j] = Output[j * ny + i].real;
imag[j] = Output[j * ny + i].imag;
}
// Calling 1D FFT Function for Columns
m = 7;//(int)Log((double)ny, 2);//Finding power of 2 for current number of points e.g. for nx=512 m=9
FFT1D(dir, m,real,imag);
for (j = 0; j < ny; j++)
{
Output[j * ny + i].real = real[j];
Output[j * ny + i].imag = imag[j];
}
}
//// transfer output values in to fourior///////////////
index=0;
for (i=0;i<=nx -1;i++)
{
for (j = 0; j <= ny - 1; j++)
{
Fourier[index].real = Output[index].real;
Fourier[index].imag = Output[index].imag;
index++;
}
}
//if(Output)
RLPext->free(Output);
}

this image are given to this fft2d function .but fft1d takes 15 seconds to execute . how to improve performance

Thereâ€™s different way of computing an FFT according to its size, the radix 2 is not necessarly the best one (see the FFTW project or the spiral project from CMU).

Moreover try to extract the nx-1, ny-1 computations from the loop.

Even if Hydra is Faster than Spider in term of MHz, the only thing that will result in a real speed up is a perfect match between the architecture of the processor and the algorithm. In this field we often talk about architecture, algorithm adequation ( A-cube paradigm). Therefore you need to hand tune your FFT algorithm in order to help the compiler to extract butterfly pattern that will be executed in an efficient way (usually by a MAC operator).

Hydra is about 5 times faster than spider but hydra has floating point processor so it will be a lot faster. I am not sure by how much but maybe 30 times faster. I would love to know if anyone has tried this.

Well I might have to post this in Code Share as I put together a PI Calculator and Iâ€™m using that for performance comparisons. Iâ€™m just waiting on the Spider to calculate 200 places of PI, but it is the slowest of my mainboards (Spider, Hydra, Cerberus, Mountaineer USB, ArgonR1), but the Spider is still my go to board for prototyping as it can do anything. The Mountaineer board is the most surprising as its quick and the Cerberus gives huge bang for the buck.

The hydra will be plenty of times faster. You are using floating point (float/double), and the EMX (FEZ Spider) doesnâ€™t have a floating-point-unit.
So all floating point operations result in a huge blob of assembly code. The Hydra has a floating-point-unit.

When I was developing the 3D engine on the hydra I was surprised that it does more then a million floating point operations each second, donâ€™t expect this on EMX.

@ WouterH you are correct and hence why I had to post the code for this as you canâ€™t do a performance test and not post the code so people can see exactly what your testing. Perhaps someone has a Whetstone or other FPU tests they can test and post as Iâ€™ll run it against my collection of mainboards.

Fez Spider does not have support FPU. so we are thinking about to use fixed point library like â€śLibfixmathâ€ť. fez hydra does not support camera so we can not use fez hydra. if we use Fixed Point library how much performance incraese? Any suggestion Please tell me.