Figure: NVIDIA GeForce 8800 processor cores
Figure 4: AMD's ATI Radeon HD 2900 has 320 cores
Figure 5: AMD's Griffin processor supports HyperTransport 3 bus technology
Figure 6: Intel's 45nm Penryn chip
Almost everyone knows the power and excellent performance of Sun's recently introduced 8-core, 64-thread UltraSPARC T2 processor, but many people do not know that Sun had already disclosed the details of its development plans for the UltraSPARC T2, code-named Niagara 2, at last year's Hot Chips conference. So what new chip technologies appeared at this year's 19th Hot Chips conference, held on August 20 and 21? Below we present the latest research results from the major manufacturers in several areas: graphics processors, server processors, mobile processors, and network processors.
Server Processors: A Leap in Performance
IBM's POWER 6 is currently the world's fastest microprocessor. At the meeting, IBM introduced three related new technologies: fault-tolerant design, system scalability, and a third-generation flexible high-performance interface. IBM also described the features of its next-generation mainframe processor, the Z6, and Sun presented technology from its next-generation server processor, code-named Victoria Falls.
IBM's dual-core POWER 6 processor, launched in May of this year, uses 65nm process technology, contains 790 million transistors, and is clocked at 4.7GHz. It runs about twice as fast as the POWER 5 while consuming the same power for operation and cooling. Each POWER 6 microprocessor unit (MPU) is implemented as a 2-way single-chip multiprocessor (CMP): a single 340 square millimeter die integrates two simultaneous multithreading (SMT) processor cores, each with a dedicated L2 cache. The POWER 6 L3 cache reaches 32MB and is 16-way set associative.
IBM attaches great importance to reliability, availability and serviceability (RAS). POWER 6 applies fault detection and correction throughout the system. Processor state is saved in a recovery unit and protected by error-correcting code (ECC). For example, operations that change processor state, such as register or cache writes, are parity-checked, and failures are handled with ECC. If the error can be corrected, the change is normally committed to the processor state registers. If it cannot be corrected, for example when a parity check or a control array fails, the recovery logic records the type of error and restarts execution from the last known-good state. Any transient error is resolved this way. Repeated errors are reported, and the known-good state is transferred to another CPU, which then resumes execution.
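The article does not describe the POWER 6 ECC circuits themselves. As a generic illustration of how a few parity bits can both detect and correct an error, here is a minimal sketch of the textbook Hamming(7,4) code; the function names are hypothetical and this is not IBM's scheme. Production cache and memory ECC typically adds one overall parity bit (SEC-DED) so that double-bit errors are at least detected, which corresponds to the "cannot be corrected" path above.

```python
# Textbook Hamming(7,4) single-error-correcting code (illustrative only,
# not the POWER 6 ECC design). Codeword positions 1..7 hold p1 p2 d1 p3 d2 d3 d4.

def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c: list[int]) -> tuple[list[int], int]:
    """Return (data bits, error position); position 0 means no error was found."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1          # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], syndrome

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                          # simulate a transient flip at position 5
print(hamming74_decode(word))         # ([1, 0, 1, 1], 5)
```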
The RAS logic within the chip provides a high level of redundancy and error checking. Once a CPU error is detected, the system records the CPU state, activates a spare CPU, and restores the original state on it. POWER 6 RAS not only includes every function of the POWER 5 chip, but the CPU chip can also record its state on every cycle. If an error is found, the chip runs a self-check, and if the check still finds too many errors, the CPU is taken offline. This largely avoids the performance loss that an excessive emphasis on reliability would otherwise cause. POWER 6 thus offers a very high operating frequency, lower power consumption, a flexible and scalable memory system, and mainframe-class reliability, availability and serviceability.
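The checkpoint, retry and CPU-sparing policy described above can be modeled as a small state machine. This is a toy sketch under stated assumptions: the class names, the retry threshold of 2, and the dictionary standing in for "processor state" are all hypothetical and do not reflect IBM's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Core:
    name: str
    online: bool = True
    error_count: int = 0

@dataclass
class RecoveryUnit:
    """Toy model of checkpoint, retry and CPU sparing (hypothetical design)."""
    cores: list[Core]
    max_retries: int = 2                      # assumed threshold, not IBM's
    checkpoint: dict = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def commit(self, state: dict) -> None:
        # Only state that passed its checks becomes the known-good checkpoint.
        self.checkpoint = dict(state)

    def handle_error(self, core: Core, error_type: str) -> tuple[Core, dict]:
        """Record the error, then retry on the same core or fail over to a spare."""
        core.error_count += 1
        self.log.append(f"{core.name}: {error_type}")
        if core.error_count <= self.max_retries:
            # Treated as transient: restart the same core from the checkpoint.
            return core, dict(self.checkpoint)
        # Treated as persistent: take the core offline and move state to a spare.
        core.online = False
        spare = next(c for c in self.cores if c.online)  # StopIteration if none left
        return spare, dict(self.checkpoint)

cores = [Core("cpu0"), Core("cpu1")]
unit = RecoveryUnit(cores)
unit.commit({"pc": 0x100, "r1": 42})
for attempt in range(3):
    core, state = unit.handle_error(cores[0], "parity error")
    print(attempt + 1, core.name, state)
```

With `max_retries=2`, the first two errors on `cpu0` restart it from the checkpoint, and the third takes it offline and hands the checkpointed state to `cpu1`.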
The POWER 6 processor provides high bandwidth. At 5GHz, each MPU can reach 300Gb/s of bandwidth, of which approximately 80Gb/s comes from the L3 cache, 80Gb/s from the bus inside the MCM, 75Gb/s from memory, 50Gb/s from remote processors, and 20Gb/s from local I/O.
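A quick arithmetic check of the breakdown above, using only the figures quoted in the article:

```python
# Per-MPU bandwidth breakdown at 5GHz, as quoted in the text (Gb/s).
bandwidth = {
    "L3 cache": 80,
    "MCM internal bus": 80,
    "memory": 75,
    "remote processors": 50,
    "local I/O": 20,
}
total = sum(bandwidth.values())
for source, bw in bandwidth.items():
    print(f"{source:18s} {bw:4d}  ({bw / total:.0%})")
print(f"{'total':18s} {total:4d}")
```

The components sum to 305Gb/s, consistent with the quoted total of roughly 300Gb/s; the L3 cache and the MCM bus together account for just over half of it.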
IBM's Z6 is a 4-core processor that can be thought of as a sibling of the POWER 6. Although the two run in different environments, most of the Z6 design is consistent with the POWER 6. To support the features of z/Architecture, the Z6 also uses some new design techniques, such as an instruction set of 894 instructions. The instruction set includes a number of decimal arithmetic instructions to ensure accuracy in computation.
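Why decimal arithmetic matters for accuracy can be shown in a few lines. This sketch uses Python's standard `decimal` module purely as a software analogy; it says nothing about how the Z6 decimal instructions are implemented in hardware.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so rounding errors appear.
binary_sum = 0.1 + 0.2
print(binary_sum)                       # 0.30000000000000004
print(binary_sum == 0.3)                # False

# Decimal arithmetic keeps base-10 values exact, as business code expects.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum)                      # 0.3
print(decimal_sum == Decimal("0.3"))    # True
```

Doing this in software costs many instructions per operation, which is the motivation for providing decimal arithmetic directly in the instruction set.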
Sun is developing a new generation of server processor, code-named Victoria Falls, whose die area and power consumption are close to those of the current UltraSPARC T2 processor. The UltraSPARC T2 currently provides 8 processor cores, each of which can execute eight threads.
Victoria Falls SPARC server chips will be configured with two built-in coherence hub modules, capable of a data transfer rate of 65Gb/s. The 4-way version of the Victoria Falls server chip module has 4 built-in coherence hubs, doubling the data transfer rate. These modules solve the memory-control problem, since the MCH provides external routing for data storage. Servers using the Victoria Falls processor are expected to come out in 2008.
Graphics Processors: Multicore Parallelism
In his keynote speech, AMD CTO Phil Hester said that the traditional PC-centric model built around a single CPU will become outdated, and that CPUs need to integrate graphics chips and other peripherals.
Shortly before this conference, at Siggraph 2007, NVIDIA and AMD displayed their new GeForce 8800 and ATI Radeon HD 2900 graphics cards, giving people a concrete sense of multicore parallel graphics processors. At this meeting, both companies described the technology involved.
The NVIDIA GeForce 8800 GPU has 128 processor cores, with a peak floating-point rate of up to 576 billion operations per second per chip, and it consumes up to 150W when running 3D games and graphics software. It not only supports Microsoft DirectX 10 Shader Model 4.0, but also offers NVIDIA SLI multi-GPU, PureVideo HD and other advanced technologies, along with large-capacity high-speed GDDR3 memory and support for the High-bandwidth Digital Content Protection (HDCP) specification. Unlike traditional multicore processors, the cores of the GeForce 8800 GPU often work on the same task at the same time: the GeForce 8800 is designed so that a group of 8 cores works simultaneously on one program.
At the meeting, NVIDIA presented its GPU parallel computing architecture and the CUDA programming model. CUDA (Compute Unified Device Architecture) is a general-purpose computing architecture, and the GeForce 8800 already applies it. CUDA is the foundation of a new system that not only supports physics simulation on the graphics chip but also adds the first C compiler development environment for the GPU. CUDA provides a C-based low-level library that can directly use features not available in D3D or OpenGL, letting the GPU cores work in concert on general-purpose computation and greatly improving computing speed. Thanks to the C compiler development environment, the GPU can now handle complex computation in product design, data analysis, technical processing, game physics, and other areas.
Using CUDA, programmers can write one program for all types of GPUs. The GPUs in the GeForce 8000 series differ in their number of cores, and the application does not even need to know the total number of processors. CUDA breaks the limitations of the traditional GPU data-processing model, allowing the GPU cores to cooperate and share data simultaneously.
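The key idea, that a kernel is written once per data element and never asks how many cores exist, can be sketched as a Python simulation of a 1-D CUDA-style launch. The names `launch` and `saxpy_kernel` are hypothetical; real CUDA code is written in C and launched with `kernel<<<grid, block>>>(...)`, with the hardware distributing blocks across however many cores the chip has.

```python
def launch(kernel, grid_dim: int, block_dim: int, *args) -> None:
    """Run `kernel` once per (block, thread) pair, like a 1-D CUDA launch.

    On a GPU, blocks are scheduled onto however many cores the chip has;
    in this simulation they simply run one after another.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    # Each thread handles one element from its global index,
    # mirroring blockIdx.x * blockDim.x + threadIdx.x in CUDA C.
    i = block_idx * block_dim + thread_idx
    if i < len(x):  # guard: the last block may be only partly used
        out[i] = a * x[i] + y[i]

n = 1000
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n
block = 128
grid = (n + block - 1) // block   # enough blocks to cover all n elements
launch(saxpy_kernel, grid, block, 2.0, x, y, out)
print(out[:3], out[-1])
```

Changing the block size or the number of blocks changes how work is divided, not the result, which is why the same program can run on GPUs with different core counts.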
NVIDIA's GPU performs well not only in graphics; it also provides strong support for non-graphics software. The meeting presented three typical example applications: MRI (magnetic resonance imaging) image reconstruction, fluid dynamics, and H.264 video encoding.
Although all three applications can be parallelized to some extent, their demands on the GeForce 8800 vary. In MRI image reconstruction, the 8800 is 416 times faster than an Athlon 64 2800+, although the Athlon 64 2800+ is a 2004 product. For fluid dynamics, using the LBM code from SPEC CPU2006, the GPU is 12 times faster than the CPU. In H.264 video encoding, the GPU is 20 times faster than the CPU, even though the algorithm is not optimal for the GPU.
The NVIDIA GeForce 8800 GPU is designed to showcase next-generation DirectX 10 games and visual applications on Windows Vista, and NVIDIA offers world-class DirectX 10 support. Paired with top games, it will give gaming under Windows a new look.
AMD's ATI Radeon HD 2900 has 320 cores running at 742MHz and delivers 475 billion floating-point operations per second. Although this is less than the NVIDIA GeForce 8800's 576 billion, the difference is not noticeable in actual applications; both GPUs perform well.
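Both peak figures follow from the usual formula cores × clock × floating-point operations per core per cycle. For the Radeon, the 320 cores and 742MHz come from the text, and one multiply-add (2 flops) per cycle is an assumption. For the GeForce, the 1.5GHz shader clock and 3 flops per cycle are assumptions not stated in the article, chosen because they reproduce the quoted 576 billion.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_core_per_clock: int) -> float:
    """Theoretical peak = cores x clock (GHz) x flops per core per cycle."""
    return cores * clock_ghz * flops_per_core_per_clock

# Radeon HD 2900: 320 cores at 742MHz, assuming one multiply-add (2 flops) per cycle.
radeon = peak_gflops(320, 0.742, 2)
# GeForce 8800: 128 cores; 1.5GHz and 3 flops/cycle are assumptions.
geforce = peak_gflops(128, 1.5, 3)
print(f"Radeon HD 2900: {radeon:.1f} GFLOPS")   # 474.9
print(f"GeForce 8800:   {geforce:.1f} GFLOPS")  # 576.0
```

These are theoretical peaks that assume every core issues its maximum operations every cycle, which real workloads rarely sustain; this is one reason the gap is hard to notice in practice.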
Another difference between the AMD and NVIDIA products is the memory interface: NVIDIA's is 384 bits wide, while AMD's reaches 512 bits. This means AMD may need about 33% more pins and memory chips.
Whereas each of NVIDIA's 8-core groups can work simultaneously on a different program, AMD takes a different approach: it combines cores into groups of 5 that execute a predetermined set of 5 instructions in each clock cycle. This allows the 64 core groups to perform different tasks independently.
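Running 5 instructions per group per cycle only pays off if the compiler can find 5 independent operations to place together (64 groups × 5 = 320 cores). The sketch below is a generic greedy list scheduler that packs operations into 5-slot bundles; it is an assumed illustration, not AMD's actual shader compiler, and `pack_bundles` is a hypothetical name. Operation names must be unique.

```python
def pack_bundles(ops, width=5):
    """Greedily pack operations into VLIW bundles of at most `width` slots.

    `ops` is a list of (name, set_of_dependencies). An op can only be placed
    once every op it depends on has been issued in an earlier bundle.
    """
    remaining = list(ops)
    issued = set()
    bundles = []
    while remaining:
        bundle = []
        for name, deps in remaining:
            if len(bundle) < width and deps <= issued:
                bundle.append(name)
        if not bundle:
            raise ValueError("cyclic or unsatisfiable dependencies")
        issued.update(bundle)
        remaining = [(n, d) for n, d in remaining if n not in bundle]
        bundles.append(bundle)
    return bundles

# A small shading-style computation: four independent multiplies, then a reduction.
ops = [
    ("mul_r", set()), ("mul_g", set()), ("mul_b", set()), ("mul_a", set()),
    ("add_rg", {"mul_r", "mul_g"}), ("add_ba", {"mul_b", "mul_a"}),
    ("sum", {"add_rg", "add_ba"}),
]
bundles = pack_bundles(ops)
slots_used = sum(len(b) for b in bundles)
print(bundles)
print(f"slot utilization: {slots_used}/{5 * len(bundles)}")   # 7/15
```

The dependency chain leaves 8 of 15 slots empty here, which illustrates why this design relies heavily on the compiler finding independent work.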
Mobile Processors: Advancing by the Day
With the popularity of mobile devices, mobile processors are developing rapidly and new technologies keep emerging. At the conference, Intel and AMD introduced two new upcoming products.
Intel's upcoming Penryn is a next-generation chip for desktops, laptops and servers, built on a 45nm process and based on the Core architecture. Penryn adds the SSE4 instruction set, supports a 1333MHz FSB (front-side bus), and exceeds 3GHz. The L2 cache of the dual-core Penryn will be increased to 6MB, and quad-core Penryn processors will have a staggering 12MB of L2 cache. At the same frequency, Penryn delivers at least a 5% to 10% performance improvement over Core.
With the new SSE4 instruction set, Penryn will improve performance in gaming, video decoding, 3D graphics, and Web services. Penryn is also optimized for video, providing strong support for high-definition video technology and the Clear Video UDI interface specification. About 15 processors are currently planned on the 45nm process, including versions with Hyper-Threading Technology, the dual-core Wolfdale, the quad-core Bloomfield, the native quad-core Yorkfield, and others.
Compared with current processors, the 45nm Penryn-based processors deliver higher computing performance, substantially increasing the computing power of notebooks and other mobile devices, and the Mobile Penryn processor is also equipped with more advanced power management technology. Soon, even compact notebook computers will be able to perform large-scale, complex data calculations.
AMD will also introduce a new generation of mobile CPU, Griffin. Like the Turion 64, Griffin uses a 65nm process. It has two physical cores and an integrated DDR2 memory controller, supports HyperTransport 3 bus technology, has 2MB of L2 cache, and consumes about 35W.
HyperTransport 3 bus technology has several advantages: higher frequency, more flexible allocation of resources, support for the HTX interface, and hot-swap support. It also supports dynamic power management, allowing the operating system to adjust the HyperTransport bus frequency and link width dynamically, reducing power consumption while still meeting performance requirements.
Griffin uses the same core as the Barcelona server processor, but is optimized for mobile applications. Since a mobile processor does not need to run its floating-point units at full speed at all times, Griffin can shut down one of its cores completely to maximize battery life. The voltages of the two CPU cores and the north bridge are controlled independently, and between 1/8 load and full load each core has 9 frequency-reduction steps, striking a good balance between performance and power consumption. In addition, Griffin supports the deep-sleep C4 state. The dual-core processor is expected to be available in 2008.
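Independent voltage control matters because dynamic CMOS switching power scales with the square of the supply voltage: P = activity × C × V² × f. The sketch below uses hypothetical capacitance, voltage and frequency values (not Griffin's real operating points) to show that lowering voltage together with frequency saves far more than lowering frequency alone.

```python
def dynamic_power(c_eff: float, voltage: float, freq_hz: float, activity: float = 1.0) -> float:
    """Classic CMOS switching power: P = activity * C * V^2 * f (watts)."""
    return activity * c_eff * voltage ** 2 * freq_hz

# Hypothetical operating points, chosen only for illustration.
full = dynamic_power(c_eff=1e-8, voltage=1.2, freq_hz=2.0e9)
slow = dynamic_power(c_eff=1e-8, voltage=0.9, freq_hz=1.0e9)
print(f"full speed: {full:.1f} W, half speed at lower voltage: {slow:.1f} W")
print(f"power ratio: {slow / full:.2f}")   # 0.28
```

Halving the frequency alone would halve the power; dropping the voltage from 1.2V to 0.9V as well brings it down to about 28%. Static leakage power is ignored in this model.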
Network Processors: Flexible and Powerful
Network processors, with their flexible architecture and powerful processing capabilities, organically combine programmability with ASIC-class processing power. Major manufacturers have already introduced network processors for 10Gb/s to 20Gb/s, and a market for 40Gb/s network processors is gradually taking shape.
At the meeting, Bay Microsystems demonstrated its Chesapeake network processor chip, which has a data switching rate of 50Gb/s. A network processing unit (NPU) is a programmable hardware device that combines the low cost and high flexibility of RISC processors with the high speed and scalability of ASIC hardware. Network processors are dedicated to networking, with fixed customer interfaces, programmable cores, and other network-specific features.
Xelerated's programmable network processor, the X10q, comes in 3 models that differ in interface speed and power consumption. The X10q uses a dataflow approach with about 200 processors. As a packet moves along the pipeline, each processor performs one or a few operations on it. To sustain high-speed operation, each processor can spend at most 4 instructions on each packet, which means none of those 4 instructions are wasted on load, store, or I/O operations. The X10q design won the annual Best Extreme Processor award.
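The tight per-packet instruction budget follows from worst-case line-rate arithmetic. The sketch below uses standard Ethernet framing (a 64-byte minimum frame plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap); the X10q clock rate is not given in the article, so only packet timing and the total work per packet are computed, using the "about 200 processors" and "4 instructions" from the text.

```python
# Worst case: back-to-back minimum-size Ethernet frames.
WIRE_BYTES_PER_MIN_FRAME = 64 + 8 + 12   # frame + preamble/SFD + inter-frame gap

def max_packet_rate(line_rate_bps: float) -> float:
    """Packets per second at line rate with minimum-size frames."""
    return line_rate_bps / (WIRE_BYTES_PER_MIN_FRAME * 8)

def per_packet_time_ns(line_rate_bps: float) -> float:
    """Time available to accept each new packet, in nanoseconds."""
    return 1e9 / max_packet_rate(line_rate_bps)

for gbps in (10, 20, 40):
    rate = gbps * 1e9
    print(f"{gbps:2d} Gb/s: {max_packet_rate(rate) / 1e6:6.2f} Mpps, "
          f"{per_packet_time_ns(rate):5.1f} ns per packet")

# In a dataflow pipeline each stage works on a different packet at the same time,
# so throughput is set by the per-stage budget, while the total work applied to
# each packet grows with pipeline depth.
STAGES = 200           # "about 200 processors"
INSTR_PER_STAGE = 4    # at most 4 instructions per packet per processor
print("instructions applied per packet:", STAGES * INSTR_PER_STAGE)
```

At 20Gb/s a new minimum-size packet can arrive every 33.6ns, yet the pipeline still applies up to 800 instructions to each packet because 200 processors share the work.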
Xelerated recently introduced its next-generation X11. The X11 still uses dataflow pipeline technology, but its pipeline is more compact and simpler than the X10q's. It adds 24 FE/GE MACs and cuts cost by 50%, supporting 20Gb/s bidirectional Ethernet or 10Gb/s bidirectional SONET applications.
By the end of this year, Intel plans to make available Tolapai, a processor that fully integrates north bridge and south bridge functions. Tolapai is based on the Pentium M processor, with a built-in 256KB L2 cache and clock frequencies of 600MHz, 1.06GHz and 1.2GHz. Its DDR2 memory interface supports data rates of 400MHz to 800MHz, its I/O capabilities have been improved, it has 3 built-in Gigabit Ethernet ports (RGMII or RMII), and it supports up to 2GB of memory. The processor can run all 32-bit operating systems, and its power consumption is estimated at 13W to 25W. It targets the embedded and industrial computer markets.
In wireless networking, as standards advance, a variety of wireless network processors are on the way.
802.11n WiFi products use MIMO-OFDM (Multiple Input Multiple Output - Orthogonal Frequency Division Multiplexing) technology, and transmission speeds are expected to reach 600Mbps next year. This provides the range, speed and reliability needed to carry multimedia streams and will ultimately enable comprehensive wireless media distribution throughout the home. Meanwhile, the new HomePlug AV powerline standard has already reached transmission speeds of 200Mb/s.
MIMO-OFDM creates independent, parallel channels in space to transmit multiple data streams simultaneously, increasing spectral efficiency and system throughput without increasing system bandwidth. It uses antenna arrays in an OFDM transmission system to achieve spatial diversity, which improves signal quality and tolerance to multipath and brings a qualitative improvement in the effective transfer rate of wireless networks. In addition, building on the 802.11 standard, 802.11n optimizes the MAC-layer protocol, changing the data frame structure to increase the proportion of payload and reduce the share of bytes spent on management and error detection, which greatly increases network throughput.
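The 600Mbps figure is a direct product of spatial multiplexing and OFDM parameters. The parameters below are the standard 802.11n top-rate settings rather than values from the article: a 40MHz channel with 108 data subcarriers, 64-QAM (6 bits per subcarrier), rate-5/6 coding, 3.6µs symbols with the short guard interval, and 4 spatial streams. The function name is hypothetical.

```python
def ht_phy_rate_mbps(data_subcarriers: int, bits_per_subcarrier: int,
                     coding_rate: float, symbol_time_us: float,
                     spatial_streams: int) -> float:
    """OFDM PHY rate = streams x subcarriers x bits x coding rate / symbol time."""
    bits_per_ofdm_symbol = data_subcarriers * bits_per_subcarrier * coding_rate
    return spatial_streams * bits_per_ofdm_symbol / symbol_time_us

rate = ht_phy_rate_mbps(108, 6, 5 / 6, 3.6, 4)
print(f"{rate:.0f} Mbps")   # 600 Mbps
```

Each spatial stream carries 150Mbps, so the 4 parallel MIMO channels multiply the rate by 4 within the same 40MHz of spectrum, which is the gain in spectral efficiency described above.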
US-based SiBEAM has launched a 60GHz millimeter-wave communications chipset that wirelessly carries the uncompressed HDTV signal currently sent over HDMI, at a transmission speed of 4Gbps. The transmission system provides 7GHz of RF bandwidth with an effective power of 8W. The chipset integrates a tiny 36-element antenna array, an RF transceiver module about 20mm square, and a digital baseband/MAC processing LSI.
SiBEAM uses 2.5GHz of signal bandwidth, with an effective range of about 10m. It also has a 40Mbps digital back channel that lets the video receiver communicate with the sender. This technology will allow home entertainment systems, TVs, HD DVD players, cameras and other devices to communicate without video or audio cables. A time to market for the chipset has not yet been set; it is expected to be demonstrated at the Consumer Electronics Show (CES) in Las Vegas in early 2008.
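A rough calculation shows why a 4Gbps link is enough for uncompressed HDTV. The video format (1080p at 60 frames per second, 24 bits per pixel) is an assumption, not stated in the article, and the sketch counts only active pixels, ignoring the blanking intervals and coding overhead that an HDMI link also carries.

```python
def uncompressed_video_gbps(width: int, height: int, fps: int, bits_per_pixel: int) -> float:
    """Raw active-pixel bit rate in Gb/s, ignoring blanking and link overhead."""
    return width * height * fps * bits_per_pixel / 1e9

# Assumed format: 1080p60 with 8 bits per RGB channel.
rate = uncompressed_video_gbps(1920, 1080, 60, 24)
print(f"1080p60 raw video: {rate:.2f} Gb/s vs. a 4 Gb/s link")   # 2.99
```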
Links: Chip Technology Foresight
Sun's mmProximity chip connection technology
Hot Chips not only introduced a variety of recently launched and upcoming products, but also presented many forward-looking technologies that herald the trends driving chip development over the next few years or even decades.
Sun plans to launch a new chip connection technology, mmProximity, within the next four years. It bypasses the connections used in existing chip designs, letting chips communicate directly without circuit boards, wires or pin interfaces. Because data moves between chips faster, overall machine performance will gradually increase and energy consumption will fall.
Over the past 20 years, even as processor performance kept improving, input/output interfaces became a performance bottleneck. Until now, a chip could connect to the circuit board only through the metal pins of its package, and because these pins are large, they limit interface space and severely restrict bandwidth. Sun said that its connection method can greatly increase inter-chip bandwidth, to 10Tb/s per square millimeter, 10 times that of an ordinary chip connection. With this higher bandwidth, Proximity technology can also eliminate some of a chip's cache, reducing manufacturing costs.
IBM's Shahidi predicted that an 11nm manufacturing process will appear in about 8 years, although it is not feasible at the current technical level. Intel's Mayberry said that Intel had recently abandoned simple CMOS scaling in favor of a new type of device scaling, which will be able to continue much longer. Kubiatowicz of UC Berkeley has been studying quantum computers. He explained that the quantum computers he studies are based on ion traps and have a close and complex relationship with microprocessors. The basic unit of a quantum computer is the quantum bit (qubit). Because of its unique nature, a qubit can take not only the value 0 or 1 but also a superposition of both 0 and 1. Finally, Stanford University Professor Horowitz said that CMOS technology will eventually be replaced, but because no technology can replace it yet, designers will continue to develop smaller process technologies.
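The idea that a qubit holds "both 0 and 1" can be made concrete with the standard state-vector description: a qubit is a pair of complex amplitudes, and a measurement yields each value with probability equal to the squared magnitude of its amplitude. This is a minimal pure-Python sketch of that textbook model, not a model of ion-trap hardware; the function names are hypothetical.

```python
import math

# A qubit state is (alpha, beta), the amplitudes of |0> and |1>,
# with |alpha|^2 + |beta|^2 = 1.
def hadamard(state: tuple[complex, complex]) -> tuple[complex, complex]:
    """Apply the Hadamard gate, which maps |0> to an equal superposition."""
    alpha, beta = state
    s = 1 / math.sqrt(2)
    return (s * (alpha + beta), s * (alpha - beta))

def probabilities(state: tuple[complex, complex]) -> tuple[float, float]:
    """Measurement probabilities for outcomes 0 and 1."""
    return tuple(abs(a) ** 2 for a in state)

zero = (1 + 0j, 0j)
plus = hadamard(zero)          # equal superposition of 0 and 1
print(probabilities(zero))     # (1.0, 0.0)
print(probabilities(plus))     # approximately (0.5, 0.5)

# Describing n qubits classically takes 2**n amplitudes.
for n in (1, 10, 50):
    print(n, "qubits ->", 2 ** n, "amplitudes")
```

The exponential growth in amplitudes is what makes large quantum systems hard to simulate on conventional processors.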
Norm Jouppi of HP Labs reviewed 40 years of IC manufacturing. IC physical dimensions have shrunk to 1% of their original size, computing power has increased 10,000 times, and operating frequency has increased 100 times. For the past 40 years we pursued more transistors per chip; the next 40 years should focus on each part of the circuit. Photonics and nanotechnology will be applied to improve CMOS, and photonic switching has advantages over silicon transistors because it can achieve higher bandwidth.