
ClearSpeed Advance™ X620 and e620 Accelerator Boards


  • ClearSpeed Advance™ X620 and e620 Accelerator Boards

    ClearSpeed Advance accelerator boards are double-precision, IEEE 754 compliant floating-point accelerator boards. Each Advance board is equipped with two CSX600 processors, the world’s highest performance, most power efficient processors for double precision floating point arithmetic. Designed for use in server and workstation systems based on 32 bit and 64 bit x86 compatible architectures, the boards deliver over 66 GFLOPS of sustained double precision matrix multiply (DGEMM) performance while averaging only 25 watts power consumption.

    The Advance X620 is a standard height, two-thirds length PCI-X board designed for new and existing systems whose architecture incorporates the PCI-X standard. Building on the success of the Advance X620 and delivering the same computational performance, the Advance e620 is a complementary and smaller form factor PCIe based accelerator. It brings all the benefits of ClearSpeed’s acceleration technology to the latest generation of multi-core industry standard servers that incorporate the PCIe standard.

    Together the existing Advance X620 and the Advance e620 significantly increase the number of server platforms that can take advantage of ClearSpeed acceleration.

    ClearSpeed Advance X620 ClearSpeed Advance e620
    Advancing performance without impacting infrastructure costs
    ClearSpeed Advance boards simply plug into otherwise empty PCI-X or PCIe slots in industry standard servers or workstations without any modification or need for additional power connections. Functioning as true coprocessors in conjunction with the host processor they accelerate the most computationally intensive portions of an application. Each Advance board delivers:

    Blazing performance, capable of over 66 GFLOPS sustained double precision DGEMM
    Greater accuracy, IEEE 754 compatible 64-bit floating point operations
    Low power consumption, typically averaging only 25 Watts power dissipation per board

    Users can take advantage of ClearSpeed's acceleration through standard libraries and applications that have been ported to run on the Advance board family and the CSX processor.

    The ClearSpeed CSX600
    ClearSpeed's CSX600 is an embedded low power data parallel coprocessor. It provides 25 GFLOPS of sustained single or double precision floating point performance, while dissipating an average of 10 Watts. Using 64-bit addressing, each CSX600 can support multi-gigabyte DDR2 SDRAMs via a local ECC protected memory interface.

    The CSX600 processor is actually a system-on-a-chip (SoC), based around the combination of ClearSpeed's patented multi-threaded array processor (MTAP) and ClearConnect™ Network on Chip (NoC) technology. The MTAP architecture has been designed to provide unparalleled performance-per-watt, while the low-power ClearConnect NoC provides straightforward system-wide concurrent bandwidth.


    10W average power dissipation
    25 GFLOPS sustained double precision DGEMM
    96 Gbytes/s internal memory
    3.2 Gbytes/s external memory
    2 x 3.2 Gbytes/s chip-to-chip bandwidth

    96 high-performance processing element (PE) cores, each with dedicated memory
    6 Kbytes high bandwidth memory per processing element
    128 Kbytes on-chip scratchpad memory
    64-bit DDR2 DRAM interface with ECC support
    ClearConnect provides on-chip and inter-chip data network
    Host interface and debug port
    64-bit virtual, 48-bit physical addressing
    On-chip instruction and data caches
    On-chip DMA controller
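
    A few derived figures follow directly from the two lists above. The snippet below is only a back-of-the-envelope sketch: the constant names are made up for illustration, and it assumes the quoted sustained performance is spread evenly across all 96 PEs.

```python
# Figures quoted in the CSX600 lists above.
PE_COUNT = 96               # processing elements per CSX600
PE_MEMORY_KBYTES = 6        # local memory per PE
SUSTAINED_GFLOPS = 25.0     # sustained double precision DGEMM
AVERAGE_WATTS = 10.0        # average power dissipation

# Total PE-local memory across the array, in Kbytes.
total_pe_memory_kbytes = PE_COUNT * PE_MEMORY_KBYTES

# Sustained performance per watt for one CSX600.
gflops_per_watt = SUSTAINED_GFLOPS / AVERAGE_WATTS

# Sustained performance per PE (assumes an even spread across PEs).
gflops_per_pe = SUSTAINED_GFLOPS / PE_COUNT

if __name__ == "__main__":
    print(f"PE-local memory: {total_pe_memory_kbytes} Kbytes")
    print(f"Efficiency: {gflops_per_watt:.2f} GFLOPS/W")
    print(f"Per PE: {gflops_per_pe:.3f} GFLOPS")
```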

    The CSX600 comprises an MTAP processor core, external DRAM interface, high-speed inter-processor I/O ports and embedded SRAM integrated onto a single chip. All subsystems on the chip are interconnected via the ClearConnect on-chip network. The MTAP contains an array of 96 Processing Elements (PEs) or cores. Each PE includes multiple processing units and has a high level of internal instruction and data parallelism. Each PE also has its own local memory providing high-bandwidth access to frequently used data.

    ClearConnect™ NoC
    Interconnect on the CSX600 is achieved using ClearConnect, a packet switched on-chip network (NoC). All memory based data transactions are converted into packets and then transmitted over the network. It also supports multiple concurrent transfers, for example, the processor can access data in the on-chip SRAM at the same time as data is transferred to the DDR2 interface from one of the bridge ports. This enables extremely high aggregate bandwidth with low power consumption. ClearConnect is also used, via bridge ports, to provide communication between CSX processors.

    Memory system
    External memory is connected via a 64-bit DDR2 DRAM interface. When used with 72-bit wide DRAM modules, the interface can support Error Checking and Correction (ECC). Each processor supports up to 4 Gbytes of local DRAM. The processor supports 64-bit addressing so that large data sets can be processed. The 64-bit address space is flexibly mapped into a 48-bit physical address space distributed across multiple processors. For embedded systems and backward compatibility a simple 32-bit addressing mode is provided.

    Interrupt and semaphore unit
    The on-chip DMA controller can be programmed to transfer data to and from the external memory interface and any other device on the ClearConnect NoC. On-chip SRAM is included for frequently accessed code and data.

    The Interrupt and Semaphore Unit (ISU) supports low latency synchronization between threads and external events such as memory to memory communications. Both pin and message signalled interrupts are supported for flexible support of multiple devices in various host environments.

    Host/debug port (hdp)
    A host interface allows the CSX600 to communicate with, and be controlled by, the system's host processor. This port can also be used as a hardware and software debug port as it provides full access to all the internal registers on the device. Finally, an IEEE 1149.1 Test Access Port (TAP) supports boundary scan for system test.

    CSX600 Processor Interconnection
    Advance™ Accelerator Board
    ClearSpeed's CSX600 Advance™ Accelerator Board provides application acceleration without impacting power, cooling or space requirements. In a PCI-X form factor one or more boards can be easily added to a workstation or server. And because it operates at the standard math library level, application users see only the performance gain with none of the hassle of changing their code.

    Requiring no more space than a free PCI-X slot for each ClearSpeed Advance board, adding acceleration to your workstations or servers is easy. Drawing on average only 25 watts it won't increase your power or cooling requirements.

    Adding a single Advance board to your workstation gives you up to an additional 50 GFLOPS sustained performance - that's like having five workstations under your desk. For many applications there's no need to port code: the board accelerates standard math libraries (e.g. Level 3 BLAS and FFTW) that are used by applications such as Mathematica and MATLAB.

    Using multiple boards to accelerate your cluster enables even bigger breakthroughs in performance per node. Combined with IEEE 754 compliant 64-bit floating point, this frees teams to advance their science, tackling ever bigger problems with greater accuracy.

    How it works
    The Advance board works by offloading compute-intensive math library routines called by applications running on the host processor.

    When an application calls a ClearSpeed supported standard math library, the call is intercepted by CSXL, ClearSpeed's accelerated math library, which determines whether the function call is worth off-loading. When it is, CSXL transfers the required data to the board to compute the function. The answer is calculated on the board and the results are read back into host memory before control returns to the application.

    Throughout this process, the only perceivable difference between a function running on the host system and a function running on the Advance board is the speed. The acceleration is transparent to the end user and the application.
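
    The "is it worth off-loading" step can be pictured as a simple cost model. The sketch below is purely illustrative and is not CSXL's actual heuristic, which the text does not describe. The function name worth_offloading, the host and board throughput figures, and the bus bandwidth are all hypothetical values chosen for the example.

```python
def dgemm_flops(m: int, n: int, k: int) -> float:
    """Floating-point operations for C = A*B + C, with A (m x k) and B (k x n)."""
    return 2.0 * m * n * k


def dgemm_bytes_moved(m: int, n: int, k: int) -> float:
    """Bytes of double-precision data crossing the bus.

    A, B and C are sent to the board and C is read back; 8 bytes per element.
    """
    return 8.0 * (m * k + k * n + 2 * m * n)


def worth_offloading(
    m: int,
    n: int,
    k: int,
    host_gflops: float = 10.0,       # hypothetical host DGEMM rate
    board_gflops: float = 50.0,      # hypothetical board DGEMM rate
    bus_gbytes_per_s: float = 1.0,   # hypothetical host<->board bandwidth
) -> bool:
    """Return True if compute-on-board plus transfer beats compute-on-host."""
    flops = dgemm_flops(m, n, k)
    host_time = flops / (host_gflops * 1e9)
    transfer_time = dgemm_bytes_moved(m, n, k) / (bus_gbytes_per_s * 1e9)
    board_time = flops / (board_gflops * 1e9) + transfer_time
    return board_time < host_time


if __name__ == "__main__":
    for size in (10, 100, 1000, 4000):
        print(size, worth_offloading(size, size, size))
```

    With these made-up parameters, small matrices stay on the host because the transfer cost dominates, while large ones are sent to the board. A real library would also have to account for call latency and any overlap of transfer with compute.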

    The hardware consists of a single-slot PCI-X board with two CSX600 processors that can be used to accelerate a single desktop machine or nodes of a cluster. Multiple boards may be used in one system. The two CSX600s and an FPGA are daisy-chained together via bridges on ClearConnect™, ClearSpeed's high speed network-on-chip (NoC). ClearConnect is also extended into the FPGA, enabling a bridge to the host system to be implemented as a hardware block. This provides an efficient universal memory architecture between the CSX600 processors and the host's memory system.

    Each processor has 512 Mbytes of DDR2 SDRAM local memory. This memory also forms part of the universal memory architecture, with DMA engines providing automated movement of data between local and host memory.

    The ClearSpeed CSX600 is a high-performance coprocessor based around an advanced multi-threaded array processor (MTAP) with 96 Processing Elements (PEs). Each PE has a dual 64-bit FPU and 6 Kbytes of local memory. This architecture provides very high performance combined with extremely low power.

    Hardware specifications

    PCI-X 2.0 mode 1 with 3.3 V signaling, 66 to 133 MHz
    Two CSX600 processors
    1 Gbyte DDR2 SDRAM
    Integrates with ClearSpeed's SDK
    Bus mastering DMA between host and board
    Single slot width, full height
    Length: 7.98 inches (202.9mm)
    Width: 3.896 inches (98.06mm)

    Supporting Software
    The board is provided with supporting software in the form of standard software libraries used in a variety of applications. Currently available or under development are: Level 3 BLAS and FFTW. As well as standard libraries and application software, the CSX600 board is supported by ClearSpeed's Software Development Kit (SDK). This includes a C compiler, a debugger based on gdb, and a full suite of supporting tools and libraries.

    Measured Linpack performance
    The Linpack Benchmark solves a dense system of linear equations. The Top500, which tracks the 500 most powerful known commercially available computer systems on a semi-annual basis, uses a version of the benchmark that allows the user to scale the size of the problem and to optimize the software in order to achieve the best performance for a given machine.

    In benchmarks performed by ClearSpeed Technology in August 2006, a single server with two 3.0 GHz Intel® Xeon® 5160 (Woodcrest) dual core processors delivered 34 GFLOPS without acceleration. A cluster of four such nodes delivered an impressive 136 GFLOPS from its 8 Intel® Xeon® 5160 (Woodcrest) dual core processors while consuming 1,940 Watts of power.

    With two Advance accelerator boards in each server, a single node delivered 90 GFLOPS and the cluster performance was increased to over 364 GFLOPS while adding only 200 Watts to the overall power levels, representing over 1 GFLOPS of additional Linpack performance per additional Watt. The ClearSpeed accelerated cluster completed the Linpack benchmark run in just 18.4 minutes while using only 40% of the energy required by the non-accelerated cluster, which took 48.4 minutes to finish.
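
    The per-watt and energy claims can be checked from the quoted numbers alone. The following is a rough sketch, not ClearSpeed's own calculation: it assumes each cluster draws constant power for the whole run, and that the accelerated cluster draws exactly 1,940 + 200 Watts. The constant and function names are made up for illustration.

```python
# Figures quoted in the benchmark description.
BASE_GFLOPS = 136.0      # four-node cluster, no acceleration
ACCEL_GFLOPS = 364.0     # four-node cluster, two boards per node
BASE_WATTS = 1940.0      # non-accelerated cluster power
ADDED_WATTS = 200.0      # extra power drawn by the boards
BASE_MINUTES = 48.4      # non-accelerated run time
ACCEL_MINUTES = 18.4     # accelerated run time


def added_gflops_per_added_watt() -> float:
    """Extra Linpack GFLOPS gained per extra watt consumed."""
    return (ACCEL_GFLOPS - BASE_GFLOPS) / ADDED_WATTS


def energy_kwh(watts: float, minutes: float) -> float:
    """Energy for a run at constant power (an assumption), in kWh."""
    return watts * (minutes / 60.0) / 1000.0


def energy_ratio() -> float:
    """Accelerated-run energy as a fraction of the non-accelerated run."""
    accelerated = energy_kwh(BASE_WATTS + ADDED_WATTS, ACCEL_MINUTES)
    baseline = energy_kwh(BASE_WATTS, BASE_MINUTES)
    return accelerated / baseline


if __name__ == "__main__":
    print(f"{added_gflops_per_added_watt():.2f} GFLOPS per added watt")
    print(f"{energy_ratio():.1%} of the baseline energy")
```

    Under these assumptions the figures work out to about 1.14 GFLOPS per added watt and roughly 42% of the baseline energy, which is consistent with the rounded claims in the post.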

    To put these results in context, the performance delivered by the four node ClearSpeed accelerated cluster (a total of 16 CPU cores) is equivalent to that of the world's most powerful supercomputer according to the Top500 results from November 1996. That supercomputer was a massive 2048 CPU Hitachi system at the Center for Computational Science at the University of Tsukuba in Japan that delivered 368.2 GFLOPS.

  • #2
    Re: ClearSpeed Advance™ X620 and e620 Accelerator Boards

    While that is interesting, why did you post it? Do you work with these?


    • #3
      Re: ClearSpeed Advance™ X620 and e620 Accelerator Boards

      If you've got between 8,000 and 9,000 euros.


      • #4
        Re: ClearSpeed Advance™ X620 and e620 Accelerator Boards

        Originally posted by RandomGuy
        While that is interesting, why did you post it? Do you work with these?

        I had to copy it because I could not place a link, sorry.

        Check out the tech data.



