Dear all,
I am currently testing data transfer with the FX3, from GPIF II to USB 3.0. I took the GpifToUsb project as a basis. Since I would like to have a clock of 32 MHz, I reduced the PCLK clock (clkdiv=12 instead of 4), and I work with Streamer C# (a modified version that does data logging).
First, I noticed that 4 buffers of 32 kB do not work properly: the throughput is very low. To solve this problem, I reduced the buffers to 4 x 8 kB and got almost the expected throughput (32 bits x 32 MHz -> 128 MB/s; I measured about 120-125 MB/s).
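For reference, the expected figure above can be reproduced with a small host-side helper. This is only a sketch: the function name is made up for illustration, it assumes one bus word is transferred on every PCLK cycle, and it counts 1 MB as 10^6 bytes.

```c
#include <stdint.h>

/* Theoretical GPIF II payload rate in bytes per second.
 * Assumption: exactly one bus word is clocked in per PCLK cycle.
 * gpif_bytes_per_second is a hypothetical helper, not an FX3 SDK call. */
uint64_t gpif_bytes_per_second(uint32_t bus_width_bits, uint64_t pclk_hz)
{
    return (uint64_t)(bus_width_bits / 8u) * pclk_hz;
}
```

With a 32-bit bus, gpif_bytes_per_second(32, 32000000) gives 128000000 bytes/s (128 MB/s), and the same bus at 16 MHz gives 64 MB/s.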
So, my first question is: why does the system not work well with large buffers but work well with small ones? This is really odd, because I would expect the opposite.
Then I tried to connect an 8-bit counter to 8 GPIF pins to see whether I lose data. Because of latency and long wires (I am using a breadboard), I had to lower PCLK to 16 MHz. Then, to get a throughput of about 64 MB/s, I had to select buffers of 4 x 2 kB, because 4 x 8 kB gave me a very low throughput.
With the original firmware, I lose 28 words of 32 bits every time a buffer is full. This seems normal, given that there is only 1 thread/socket and there is a latency while switching buffers (as explained in AN75779).
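The kind of check used to count the lost words can be sketched as follows on the host side. This is an illustration under assumptions, not the exact logging code: it assumes the 8-bit counter sits in the low byte of each logged 32-bit word, that it increments by one per word, and that any single gap is shorter than 256 words (longer gaps alias). The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Count how many counter values are missing from a logged stream.
 * Assumption: the 8-bit counter is in bits 7..0 of every 32-bit word
 * and wraps from 255 back to 0. */
size_t count_missing_words(const uint32_t *words, size_t n)
{
    size_t missing = 0;
    for (size_t i = 1; i < n; i++) {
        uint8_t prev = (uint8_t)(words[i - 1] & 0xFFu);
        uint8_t cur  = (uint8_t)(words[i] & 0xFFu);
        /* Modulo-256 distance minus one: 0 when cur directly follows prev. */
        missing += (uint8_t)(cur - prev - 1u);
    }
    return missing;
}
```

For example, a log whose counter bytes read 0, 1, 2, 31, 32 reports 28 missing words, while 254, 255, 0, 1 reports none because the wrap-around is handled by the modulo-256 arithmetic.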
So I made some modifications in order to use 2 sockets/threads to transfer data from GPIF II to FX3 memory. Here is the code, using CyU3PDmaMultiChannelConfig_t dmaMultiCfg; (instead of CyU3PDmaChannelConfig_t dmaCfg;):
dmaMultiCfg.size = 2048;
dmaMultiCfg.count = 4;
dmaMultiCfg.validSckCount = 2;
dmaMultiCfg.prodSckId [0] = (CyU3PDmaSocketId_t)CY_U3P_PIB_SOCKET_0;
dmaMultiCfg.prodSckId [1] = (CyU3PDmaSocketId_t)CY_U3P_PIB_SOCKET_1;
dmaMultiCfg.consSckId [0] = (CyU3PDmaSocketId_t)CY_FX_EP_CONSUMER_SOCKET;
dmaMultiCfg.prodAvailCount = 0;
dmaMultiCfg.dmaMode = CY_U3P_DMA_MODE_BYTE;
dmaMultiCfg.prodHeader = 0;
dmaMultiCfg.prodFooter = 0;
dmaMultiCfg.consHeader = 0;
dmaMultiCfg.notification = CY_U3P_DMA_CB_CONS_SUSP;
dmaMultiCfg.cb = GpifToUsbDmaMultiCallback;
apiRetStatus = CyU3PDmaMultiChannelCreate (&glDmaMultiChHandle, CY_U3P_DMA_TYPE_AUTO_MANY_TO_ONE, &dmaMultiCfg);
if (apiRetStatus != CY_U3P_SUCCESS)
{
    CyU3PDebugPrint (4, "CyU3PDmaMultiChannelCreate failed, Error code = %d\n", apiRetStatus);
    CyFxAppErrorHandler(apiRetStatus);
}
/* Set DMA Channel transfer size */
apiRetStatus = CyU3PDmaMultiChannelSetXfer (&glDmaMultiChHandle, CY_FX_GPIFTOUSB_DMA_TX_SIZE, 0);
if (apiRetStatus != CY_U3P_SUCCESS)
{
    CyU3PDebugPrint (4, "CyU3PDmaMultiChannelSetXfer failed, Error code = %d\n", apiRetStatus);
    CyFxAppErrorHandler(apiRetStatus);
}
As you can see, I basically modified the code to use the "DmaMulti" objects instead of the "Dma" ones. For the state machine, I made some adaptations. There are 2 threads: the 1st thread fills in data until the !DMA_RDY_TH0 flag is set, then the state machine switches to the 2nd thread until the !DMA_RDY_TH1 flag is set. Please see the attached file.
With this configuration, I get better results, but they are not perfect. Instead of losing 28 words of 32 bits while GPIF II switches buffers, I lose only 1 word of 32 bits.
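To put the two loss figures in terms of time, the number of lost words can be converted into an approximate stall duration. This is a rough sketch that assumes one word would have been captured per PCLK cycle and that PCLK was 16 MHz during these tests; the helper name is made up for illustration.

```c
#include <stdint.h>

/* Approximate time during which no data was captured, in nanoseconds.
 * Assumption: one word is lost per PCLK cycle while the interface stalls.
 * The integer division truncates any fractional nanosecond. */
uint64_t stall_ns(uint64_t lost_words, uint64_t pclk_hz)
{
    return lost_words * 1000000000ull / pclk_hz;
}
```

Under these assumptions, 28 lost words at 16 MHz correspond to a gap of 1750 ns, and 1 lost word corresponds to a single clock period of about 62 ns.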
So I have 2 issues:
- Why do I have to decrease the buffer size in order to get a correct transfer?
- Why do I lose 1 word of 32 bits when I use 2 threads?
Thank you in advance,
Best regards,
Christian