Implementation of Transmission Control Protocol in FPGA
In this project, the Transmission Control Protocol (TCP) is implemented on the Xilinx XUPV5-LX110T Evaluation Platform. However, the implementation can easily be adapted to any FPGA with an adequate amount of Block RAM.
The Transmission Control Protocol is responsible for the reliability (guaranteed delivery of data segments) of end-to-end network and internet connections. It is a connection-oriented protocol: it processes each incoming and outgoing packet according to the connection it belongs to, which makes it more complex than the other protocols in the TCP/IP stack. TCP is also a stream-oriented protocol, guaranteeing that received data bytes are delivered in exactly the order in which they were transmitted. Together, these properties make TCP behave like an agent with its own built-in intelligence.
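As an illustration of the stream-oriented guarantee, the following minimal Python sketch shows how a receiver can use sequence numbers to hold segments that arrive out of order and release bytes to the application only in order. It is a software model, not the project's VHDL. The class name `ReorderBuffer` is hypothetical, and the sketch assumes that sequence numbers do not wrap around 2^32 and that buffered out-of-order segments do not overlap one another.

```python
from typing import Dict


class ReorderBuffer:
    """Deliver a TCP byte stream in order from segments that may arrive out of order.

    Simplifying assumptions (for illustration only): sequence numbers never wrap
    around 2**32, and out-of-order segments held in the buffer never overlap.
    """

    def __init__(self, initial_seq: int) -> None:
        self.rcv_nxt = initial_seq              # next byte the application expects
        self.pending: Dict[int, bytes] = {}     # held segments keyed by start sequence number

    def receive(self, seq: int, data: bytes) -> bytes:
        """Accept one segment and return any bytes that are now deliverable in order."""
        if seq + len(data) <= self.rcv_nxt:
            return b""                          # duplicate of data already delivered
        if seq < self.rcv_nxt:                  # trim the part that was already delivered
            data = data[self.rcv_nxt - seq:]
            seq = self.rcv_nxt
        self.pending[seq] = data
        delivered = bytearray()
        # Drain every held segment that starts exactly at the next expected byte.
        while self.rcv_nxt in self.pending:
            chunk = self.pending.pop(self.rcv_nxt)
            delivered += chunk
            self.rcv_nxt += len(chunk)
        return bytes(delivered)
```

For example, with `initial_seq=1000`, a segment carrying `b"world"` at sequence 1005 is held until `b"hello"` arrives at 1000, at which point `b"helloworld"` is released in one call.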
The goal of our project is to implement this protocol fully in an FPGA, in contrast to the solutions offered by NIC (Network Interface Card) vendors, which are essentially TOEs (TCP Offload Engines). These TCP Offload Engines implement TCP's processing-intensive functions, such as checksum calculation and large segment offload. Some high-end Ethernet hardware also implements large receive offload and TCP acknowledgement offload. However, none of these solutions implements the complete TCP protocol in hardware.
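Checksum calculation is typical of the per-byte work that TOEs move into hardware. For reference, the following Python sketch computes the standard TCP checksum defined in RFC 793 and RFC 1071: the 16-bit one's-complement of the one's-complement sum over an IPv4 pseudo-header and the TCP segment. It is a software model useful for checking results, not the project's hardware design, and the function names are illustrative.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum (RFC 1071); odd-length data is padded with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry keeps the sum in 16 bits
    return total


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """TCP checksum over the IPv4 pseudo-header plus the segment.

    `segment` is the TCP header and payload with the checksum field
    (bytes 16-17 of the header) set to zero.
    """
    # Pseudo-header: source IP, destination IP, zero byte, protocol 6 (TCP), TCP length.
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return (~ones_complement_sum16(pseudo + segment)) & 0xFFFF
```

A receiver can verify a segment by running the same function over the segment with its checksum field left in place: a correct segment yields 0. In a 64-bit datapath, one option is to add four 16-bit words per clock and fold the carries at the end; that is a design possibility, not something the original text specifies.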
The advantages of implementing TCP fully in hardware are significant. For example, processing a full-duplex gigabit link in software continuously uses over 80% of a 2.4 GHz Pentium 4 processor; a full hardware implementation of TCP removes this load entirely. Apart from the processing time, the processor has to service a huge number of interrupts for acknowledgement generation and TCP's many other small tasks. In addition, the PCI interface, which is usually the standard data transfer interface between the host processor and the NIC, is very inefficient at transferring very small data segments such as TCP headers. All of these problems are eliminated if the whole of TCP is implemented in separate hardware.
Returning to our implementation, the TCP architecture we developed is based on a 64-bit bus. The TCP Core module consists of three main sub-modules that work in parallel. The first processes segments arriving from a remote (foreign) TCP; this is the module that implements the well-known TCP state machine. The second processes user commands. TCP is, in a sense, a passive element: the application-layer user must specify which services it requires from TCP, and there must also be a proper interface through which the application-layer user can send data via the local TCP to a remote host. This is provided by the User Calls Module. The third core module is the Timeout Handling Module. Many processes require time-based triggers, such as segment retransmission, systematic removal of TCP connections, delayed acknowledgement generation and keeping idle connections alive.
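To make the three-way split more concrete, the following Python sketch models a subset of the TCP connection state machine from RFC 793 as a transition table. It is a behavioural model, not the project's VHDL. Grouping the events by the module that would raise them (user calls, arriving segments, timeouts) is our reading of the architecture described above; simultaneous open, RST handling and sequence-number validation are omitted, and every received ACK is assumed to acknowledge our own SYN or FIN.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class State(Enum):
    CLOSED = auto()
    LISTEN = auto()
    SYN_SENT = auto()
    SYN_RECEIVED = auto()
    ESTABLISHED = auto()
    FIN_WAIT_1 = auto()
    FIN_WAIT_2 = auto()
    CLOSE_WAIT = auto()
    CLOSING = auto()
    LAST_ACK = auto()
    TIME_WAIT = auto()


class Event(Enum):
    # Raised by the user-command side
    PASSIVE_OPEN = auto()
    ACTIVE_OPEN = auto()
    CLOSE = auto()
    # Raised by the arriving-segment side (RCV_ACK means "ACK of our SYN or FIN")
    RCV_SYN = auto()
    RCV_SYN_ACK = auto()
    RCV_ACK = auto()
    RCV_FIN = auto()
    # Raised by the timeout side
    TIMEOUT_2MSL = auto()


# (current state, event) -> (next state, segment to send or None); subset of RFC 793's diagram
TRANSITIONS = {
    (State.CLOSED, Event.PASSIVE_OPEN): (State.LISTEN, None),
    (State.CLOSED, Event.ACTIVE_OPEN): (State.SYN_SENT, "SYN"),
    (State.LISTEN, Event.RCV_SYN): (State.SYN_RECEIVED, "SYN+ACK"),
    (State.LISTEN, Event.CLOSE): (State.CLOSED, None),
    (State.SYN_SENT, Event.RCV_SYN_ACK): (State.ESTABLISHED, "ACK"),
    (State.SYN_SENT, Event.CLOSE): (State.CLOSED, None),
    (State.SYN_RECEIVED, Event.RCV_ACK): (State.ESTABLISHED, None),
    (State.SYN_RECEIVED, Event.CLOSE): (State.FIN_WAIT_1, "FIN"),
    (State.ESTABLISHED, Event.CLOSE): (State.FIN_WAIT_1, "FIN"),
    (State.ESTABLISHED, Event.RCV_FIN): (State.CLOSE_WAIT, "ACK"),
    (State.FIN_WAIT_1, Event.RCV_ACK): (State.FIN_WAIT_2, None),
    (State.FIN_WAIT_1, Event.RCV_FIN): (State.CLOSING, "ACK"),
    (State.FIN_WAIT_2, Event.RCV_FIN): (State.TIME_WAIT, "ACK"),
    (State.CLOSING, Event.RCV_ACK): (State.TIME_WAIT, None),
    (State.CLOSE_WAIT, Event.CLOSE): (State.LAST_ACK, "FIN"),
    (State.LAST_ACK, Event.RCV_ACK): (State.CLOSED, None),
    (State.TIME_WAIT, Event.TIMEOUT_2MSL): (State.CLOSED, None),
}


def step(state: State, event: Event) -> Tuple[State, Optional[str]]:
    """Return the next state and the segment to emit; unlisted events are ignored in this sketch."""
    return TRANSITIONS.get((state, event), (state, None))
```

Walking an active open and close through `step` gives CLOSED, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT and back to CLOSED once the 2MSL timer expires. A table like this maps naturally onto a hardware state register plus a next-state lookup.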
The current implementation of the TCP Core is optimized for Ethernet connections, and the maximum MSS (Maximum Segment Size) supported is 1460 bytes. The TCP Core currently supports 16 connections, and this number can easily be increased as required. However, increasing the number of connections has an inherent drawback: the per-connection buffer sizes (Send Queue, Receive Queue and Retransmit Queue) are reduced. Increasing the depth of the PCAM[1] (Parameterizable Content Addressable Memory) also increases synthesis time. Currently, each connection has a 5.84 KB buffer for each of its send and receive queues, and 8 KB for its retransmission queue. The maximum clock frequency of the design is currently 106.59 MHz, limited by the FPGA development environment; further optimizing the TCP code could probably raise it.
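The buffer figures can be checked with a little arithmetic. The following Python sketch is a hedged back-of-the-envelope model, not the project's memory map: it assumes that "5.84 Kbytes" means 5840 bytes (exactly four 1460-byte segments) and that "8 Kbytes" means 8 KiB. It then shows why a fixed memory budget shrinks each connection's queues as connections are added.

```python
MSS = 1460                  # maximum segment size quoted in the text (bytes)
SEND_QUEUE = 4 * MSS        # 5840 bytes: our reading of "5.84 Kbytes" as four full segments
RECV_QUEUE = 4 * MSS        # 5840 bytes, same assumption
RETX_QUEUE = 8 * 1024       # assumption: "8 Kbytes" taken to mean 8 KiB


def per_connection_bytes() -> int:
    """Queue memory reserved for one connection (send + receive + retransmit)."""
    return SEND_QUEUE + RECV_QUEUE + RETX_QUEUE


def total_queue_bytes(connections: int) -> int:
    """Queue memory needed for a given number of connections at the current sizes."""
    return connections * per_connection_bytes()


def share_per_connection(budget_bytes: int, connections: int) -> int:
    """Per-connection queue memory when a fixed budget is split evenly."""
    return budget_bytes // connections
```

Under these assumptions, 16 connections need 19,872 bytes each and 317,952 bytes in total. If that total is held fixed and the connection count is doubled to 32, each connection's share drops to 9,936 bytes, so a 5,840-byte send queue would shrink to 2,920 bytes, or two full segments.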
At a 100 MHz clock, the estimated TCP processing speed is around 5 Gbps[2] in each of the receive and transmit directions for large segments (1460-byte payloads). The aggregate processing speed is therefore around 10 Gbps; this value needs to be recalculated once the TCP implementation is complete. However, the processing speed drops to around 1 to 2 Gbps per direction in the unlikely scenario of continuously processing very small segments (headers only).
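A simple cycle-count model shows where these numbers come from. The following Python sketch is an assumption-laden estimate, not the method used to derive the figures above: it assumes one 64-bit bus word is handled per clock, a 20-byte TCP header without options and no IP or Ethernet framing on the bus, and it adds a hypothetical fixed per-segment overhead (`overhead_cycles`) for work such as connection lookup and TCB update.

```python
import math

BUS_BITS = 64                  # datapath width quoted in the text
CLOCK_HZ = 100_000_000         # 100 MHz clock used for the estimate


def peak_bps(bus_bits: int = BUS_BITS, clock_hz: int = CLOCK_HZ) -> int:
    """Raw datapath bandwidth if one bus word is consumed every clock cycle."""
    return bus_bits * clock_hz


def cycles_per_segment(payload: int, header: int = 20, overhead_cycles: int = 0,
                       bus_bits: int = BUS_BITS) -> int:
    """Bus beats for header plus payload, plus a hypothetical fixed per-segment overhead."""
    beats = math.ceil((header + payload) * 8 / bus_bits)
    return beats + overhead_cycles


def payload_bps(payload: int, header: int = 20, overhead_cycles: int = 0,
                bus_bits: int = BUS_BITS, clock_hz: int = CLOCK_HZ) -> float:
    """Payload throughput when segments of this size are processed back to back."""
    cycles = cycles_per_segment(payload, header, overhead_cycles, bus_bits)
    return payload * 8 * clock_hz / cycles


def segments_per_second(payload: int, header: int = 20, overhead_cycles: int = 0,
                        bus_bits: int = BUS_BITS, clock_hz: int = CLOCK_HZ) -> float:
    """Segment rate, which is what limits header-only traffic."""
    return clock_hz / cycles_per_segment(payload, header, overhead_cycles, bus_bits)
```

The raw ceiling is 6.4 Gbps per direction. With zero overhead, a 1460-byte payload occupies 185 bus beats and the model gives about 6.31 Gbps, an upper bound; any per-segment overhead lowers it. For header-only segments the fixed overhead dominates the cycle count, which is consistent with the much lower figure quoted for small segments.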
Many conflicting situations arise when developing TCP in hardware. One of them is memory access conflicts. For example, the TCB (Transmission Control Block), which stores all the information about each TCP connection, is a shared resource, yet sometimes it has to be modified by two processes running in parallel. The implementation handles these situations explicitly.
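The text does not say exactly how these conflicts are resolved. One common hardware technique is to arbitrate access to the shared memory so that only one process performs its read-modify-write of a TCB entry per cycle while the other stalls. The following Python sketch models that with a round-robin arbiter. The arbiter, the `apply_updates` helper, the field names `snd_una`/`snd_nxt` (borrowed from RFC 793's SND.UNA and SND.NXT) and the one-write-per-cycle assumption are all illustrative, not taken from the project.

```python
from typing import Dict, List, Optional, Tuple


class RoundRobinArbiter:
    """Grant at most one of n requesters per cycle, rotating priority after each grant."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1              # requester 0 has priority on the first cycle

    def grant(self, requests: List[bool]) -> Optional[int]:
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i
                return i
        return None                    # no requests this cycle


Update = Tuple[int, str, int]          # (connection id, TCB field, new value)


def apply_updates(queues: List[List[Update]], tcb: Dict[int, Dict[str, int]]) -> int:
    """Apply each process's queued TCB updates through a single write port.

    Each granted update is one atomic read-modify-write of a TCB entry, so two
    processes touching the same entry can never overwrite each other's fields.
    Returns the number of cycles used.
    """
    arbiter = RoundRobinArbiter(len(queues))
    cycles = 0
    while any(queues):
        winner = arbiter.grant([bool(q) for q in queues])
        assert winner is not None      # at least one queue is non-empty here
        conn, field, value = queues[winner].pop(0)
        tcb[conn][field] = value       # the losing process simply retries next cycle
        cycles += 1
    return cycles
```

Rotating priority keeps one module from starving the other; a fixed priority, for example always favouring arriving segments, is a simpler alternative.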
To increase processing speed, pipelining has been implemented; this nearly doubles the processing speed of the TCP Core.
The Application User (residing on the host computer) communicates with the TCP Core (residing in the FPGA) over a PCI Express (PCIe) interface. Unlike with a software implementation on the host, the application layer cannot directly read the state of a TCP connection, so communication between the Application User and the TCP Core uses message passing. The Application User sends commands to the FPGA specifying how TCP should operate, and the TCP Core returns status and error messages reporting the progress of each operation. The user commands and returned messages follow specific formats that we have defined, and the Application User must adhere to them.
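The actual command and message formats are not given here. Purely to illustrate the message-passing idea, the following Python sketch packs a user command into one 64-bit word, matching the core's 64-bit bus width. The field layout (opcode, connection id, length, 32-bit argument), the opcode values and the `encode_command`/`decode_command` helpers are hypothetical; only the command names follow the user calls defined in RFC 793.

```python
import struct
from enum import IntEnum
from typing import Tuple


class Cmd(IntEnum):
    # Hypothetical opcode values; the names follow the RFC 793 user calls.
    OPEN = 1
    SEND = 2
    RECEIVE = 3
    CLOSE = 4
    ABORT = 5
    STATUS = 6


# Hypothetical layout: opcode (1 byte), connection id (1 byte),
# length (2 bytes), argument (4 bytes), big-endian = one 64-bit word.
CMD_FORMAT = "!BBHI"


def encode_command(cmd: Cmd, conn_id: int, length: int = 0, arg: int = 0) -> bytes:
    """Pack a command into 8 bytes; struct.error is raised if a field is out of range."""
    return struct.pack(CMD_FORMAT, cmd, conn_id, length, arg)


def decode_command(word: bytes) -> Tuple[Cmd, int, int, int]:
    """Unpack an 8-byte command word back into its fields."""
    opcode, conn_id, length, arg = struct.unpack(CMD_FORMAT, word)
    return Cmd(opcode), conn_id, length, arg
```

For example, `encode_command(Cmd.SEND, 5, 1460)` produces the 8-byte word `02 05 05 b4 00 00 00 00`, asking connection 5 to send 1460 bytes. Returned status and error messages could use a similar fixed-width layout so that both directions map onto whole bus words.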
Finally, an important point is that this implementation can be custom-built to the user's requirements. Whether TCP has to work in an Ethernet-based environment or in a WAN (Wide Area Network) environment, it can be tailored to better match the conditions. Scalability and performance can also be fine-tuned.
[1] The PCAM is used to search for and identify the connection of each incoming segment. The depth of the PCAM corresponds to the number of connections supported by the TCP implementation.
[2] These speeds have not yet been verified.