2025-03-12, 5:03 PM — recording (read-only)

To be honest, I couldn't imagine the stage would be this large — I assumed I would be speaking in a relatively small seminar room, so I'm a bit excited. Please understand if I sometimes make mistakes. Alright. The purpose of this seminar is basically to spark your interest in the field of high-speed wireline, which is becoming more and more important in high-performance computing systems, especially in industry. The economic and technical importance of chip-to-chip wireline communication keeps growing.

Here is a very brief introduction of myself, so that you know who is speaking in front of you. At the bottom is the trajectory of my last ten years or so. I was studying in Switzerland, and after finishing my PhD I moved to South Korea, spent a couple of years in Daejeon and Seoul, and finally arrived here — my office is in the circle over there. And this is a brief one-page summary of the research activities in my group. We work on high-speed wireline transceiver circuits and systems. One major part is high-speed data converters — ultra-high-speed analog-to-digital converters, which are essential blocks for modern high-speed SerDes systems. We are mainly interested in DSP-based (digital-signal-processor-based) SerDes systems, and in hardware verification platforms used before chip fabrication to increase the chance of first-silicon success. What we developed recently is based on an RFSoC; it uses a commercial board, and with this board you can verify the DAC and ADC, which is very useful. So far, that was the brief introduction of myself and the group. Now let's talk about wireline communication systems — chip-to-chip interfaces — very briefly. Can you see the mouse pointer?
OK, so let's say this chip and this chip are located at different places and want to communicate with each other. To perform any meaningful computation at large scale, one chip alone cannot do it — multiple chips must communicate seamlessly, and for that, a wired interconnection like this is essential. Usually the channel is composed of several components, such as vias, the ball grid array, PCB traces, connectors, and so on, and these multiple channel components can be expressed with a single, simple channel symbol. When you see this kind of channel symbol, it does not mean there is one single trace — it already includes all the end-to-end channel components.

This illustration shows chip 1 and chip 2 interconnected with channels, wanting to send and receive data at very high speed — typically above 50 Gb/s these days, even more than 200 Gb/s. Usually these processors include digital signal processors inside, and the DSP operates at multiple GHz, not tens of GHz. "Slow" is relative, but it is a relatively low frequency compared to the tens-of-gigabits-per-second data rate; the clock is multi-GHz, but the processing happens in parallel. To transmit the data produced in this DSP to another chip, chip 2, you need a wired interconnect. We don't want to use many pads — we want to minimize the number of pads used to transmit the data, because pads are a very precious resource. If you use many pads, you occupy area and the chip fabrication cost increases as well. So we serialize the parallel data and transmit it through a driver into the channel. On the receiver side it's the same: you receive the signal through a limited number of pads at very high speed, and then deserialize the data so the DSP can process it. This symbolic view is illustrated on top here.
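The serialize-then-deserialize idea described above can be sketched in a few lines of Python. This is only a conceptual model — the word width, function names, and MSB-first ordering are illustrative choices, not details from the talk:

```python
# Minimal sketch of a serializer/deserializer: wide, slow parallel
# words are flattened into one fast serial bit stream on the TX side,
# then regrouped into words on the RX side.

def serialize(words, width=8):
    """Flatten parallel words (MSB first) into a serial bit stream."""
    bits = []
    for w in words:
        for i in range(width - 1, -1, -1):
            bits.append((w >> i) & 1)
    return bits

def deserialize(bits, width=8):
    """Regroup the serial bit stream back into parallel words."""
    words = []
    for k in range(0, len(bits), width):
        w = 0
        for b in bits[k : k + width]:
            w = (w << 1) | b
        words.append(w)
    return words

tx_words = [0xA5, 0x3C]
stream = serialize(tx_words)          # 16 bits on one serial lane
assert deserialize(stream) == tx_words  # lossless round trip
```

In hardware this is done with shift registers and multiplexer trees rather than loops, but the data-flow contract is the same: N parallel bits in, N serial bits out, and the reverse on the far end.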
So you have slow parallel data, the serializer transmits it into the channel, and the receiver deserializes it. This is the basics of high-speed wireline data communication, and in principle it's very simple: you transmit a zero or one on the transmitter side, synchronized to a clock source, and on the receiver side you receive the data and apply a certain threshold for the decision. At every rising edge of the clock, say, you compare the incoming signal with the reference: if the incoming signal is higher than Vref, it is determined to be logic level 1; if it is smaller than Vref, it is interpreted as logic level 0.

Ideally it works like this, but in reality it's not that simple — that's why high-speed wireline transceivers are required. In reality, the physical channel has a low-pass-filter-like frequency response. So whatever you transmit on the transmitter side — let's say sharp pulses — after the signal passes through the channel, what you see at the receiver front end is a low-pass-filtered signal. It takes time to rise to the highest value, and returning to zero takes several unit intervals, several bit periods. I'm not going to go into too much detail, but these postcursors and precursors — the slow responses of the channel — cause so-called inter-symbol interference. At low speed you can have a widely open eye diagram, but as the data rate grows — meaning the pulse width is reduced to transmit at a higher data rate — the eye diagram closes more and more, and at very high speed the eye is so closed that you cannot determine whether the incoming signal is 1 or 0.
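The threshold decision and the inter-symbol interference described above can be modeled with a toy three-tap channel. The tap weights, voltage levels, and Vref below are illustrative numbers, not measured channel data:

```python
# Toy model of NRZ slicing with ISI: symbols pass through a channel
# with one precursor and one postcursor tap, and the receiver compares
# each sample against a threshold (Vref) at the decision instant.

def nrz_levels(bits, v_high=0.9, v_low=0.0):
    """Map bits to NRZ voltage levels."""
    return [v_high if b else v_low for b in bits]

def channel(samples, taps=(0.1, 0.7, 0.2)):
    """FIR channel: (precursor, main cursor, postcursor) weights."""
    out = []
    padded = [0.0] + samples + [0.0]
    for i in range(1, len(padded) - 1):
        out.append(taps[0] * padded[i + 1]     # precursor ISI
                   + taps[1] * padded[i]       # main cursor
                   + taps[2] * padded[i - 1])  # postcursor ISI
    return out

def slice_bits(samples, vref=0.45):
    """Decide 1 if the sample exceeds Vref, else 0."""
    return [1 if s > vref else 0 for s in samples]

bits = [1, 0, 1, 1, 0, 0, 1]
rx = slice_bits(channel(nrz_levels(bits)))
assert rx == bits  # with this mild ISI, the eye is still open
```

With these gentle taps the decisions are still correct; if you grow the precursor and postcursor weights at the expense of the main cursor — which is what happens as the data rate rises on a fixed channel — samples start crossing Vref on the wrong side, which is exactly the closed-eye situation the talk describes.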
So the purpose of wireline transceivers — composed of these blocks in the transmitter and a combination of many equalizers in the receiver — is to open the eye diagram and improve signal integrity, reducing the probability of bit errors. That's the basics. Now let's look at some applications where wireline transceivers are widely used.

One typical, major example is the data center. You may have heard about data centers many times. You can see that there are so many cables; the reason the cables are there is basically to transmit and receive binary bits for data communication. Inside the data center you have racks, and inside a rack you have many chassis, and the chassis have to communicate with each other — you cannot do this wirelessly. Even in 2025, data communication inside the data center is still done over wireline. So you have to connect this chassis and this chassis using a cable, and between racks it is the same: if this server and this server want to communicate with each other, a cable connection must be made. This can be illustrated with the figure here. The green plate is a server board, and this one is another server board. Each board includes a processor, and if these processors want to communicate with each other, you need a copper cable. There are also optical links, but optical communication is usually preferred for longer reach, like more than 10 meters of cable. For relatively short reach, like 1 to 5 meters, copper cables are economically preferred. One interesting thing: if you look at global data traffic, more than 70% of network traffic happens inside the data center, more than 13% is between data centers, and less than 15% is outside the data center.
So you can see how important the data center is for network traffic — and wherever network traffic happens, there are wireline transmitters and receivers. The chart on the bottom left shows the stock price history of Broadcom, an industry leader these days. Broadcom is the world number one in network switch SoCs (systems on chip), which contain a bunch of transmitters and receivers. The stock price of this company rose more than tenfold over the last 10 years, so you can see that the wireline transceiver, or SerDes, business is growing significantly.

Inside a server there are also many different types of interconnects. On the server board there are not only chips and discrete components but also many wires and interfaces, such as edge-finger PCIe slots, USB, and so on. The most commonly used high-speed data communication standard for high-performance computing is PCI Express. PCI Express has seven generations so far, from Gen 1 to Gen 7. You can only rarely find products using PCIe Gen 6; PCIe Gen 5 is widely used these days, PCIe Gen 6 is still being adopted by industry, and PCIe Gen 7 is just starting — the standard documents are being defined these days. You can see that the data rate has increased by approximately a factor of two every generation. This is how a board looks: there are so many PCIe interfaces on one single board. For example, this one is a GPU slot. I think many of you are familiar with GPUs; usually the GPU has an edge-finger connector for wide-bandwidth data communication between GPUs or between GPU and CPU. This is another type of board, a compute accelerator card; many accelerator cards have a PCIe edge-finger connector as their main interface.
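The roughly 2x-per-generation growth mentioned above can be checked against the published per-lane transfer rates of the PCIe generations. The small table below uses the well-known raw signaling rates from the PCI-SIG specifications:

```python
# Per-lane raw transfer rates of PCIe generations, in GT/s.
# The rate doubles each generation except Gen 2 -> Gen 3 (1.6x),
# where the switch from 8b/10b to the more efficient 128b/130b
# encoding kept the *effective* bandwidth close to doubling anyway.
PCIE_GTS = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0, 6: 64.0, 7: 128.0}

for gen in range(2, 8):
    ratio = PCIE_GTS[gen] / PCIE_GTS[gen - 1]
    print(f"Gen {gen}: {PCIE_GTS[gen]:6.1f} GT/s  ({ratio:.1f}x over Gen {gen - 1})")
```

Running this shows the Gen 3 step as the lone 1.6x outlier in an otherwise steady doubling from 2.5 GT/s up to 128 GT/s at Gen 7.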
This one is an Ethernet interface — so there are different types of high-speed wireline data communication. Ethernet is the most widely used ultra-high-speed wireline standard between servers. So you have this server in the rack and that server, and between these two servers you may communicate using the Ethernet standard; within a server, on the other hand, you tend to communicate with PCI Express. Even NVIDIA, known as a GPU or AI infrastructure company, has its own network interface card. It's not sold to the public, but they use this kind of network interface card very widely in their servers. So you can see the PCI Express slot, and this one is the Ethernet side.

Another interesting application is instruments — measurement instruments for electronics characterization. This is a Keysight chassis where pluggable modules can be mounted; inside you can see the backplane connectors, which are used for communication between the pluggable modules over the PCI Express standard. And here is another example, an arbitrary waveform generator — the most advanced arbitrary waveform generator to date. Its per-lane sampling rate is 128 GSa/s, and to support such a high sampling rate, wide-bandwidth interconnects are of course essential; you can see that on the back side of this AWG module there are backplane connectors supporting PCI Express communication.

One last example is the upper hierarchy of the data center. So far we talked about this area, where the servers and top-of-rack switches are. For the higher hierarchy, since the physical distance between those switches is too long, copper interconnects cannot be used because of signal integrity issues, so optical communication is usually used there. The reason optical communication is not widely used for short-reach interconnects is mainly economic.
Optical cables and optical modules are much more expensive than electrical modules and cables — so the main reason is economics. And this is an open view of a network switch chassis. The main component of the chassis is the SoC placed here in the middle. If you look at the detail of this switch, it has a switch core, which is fully digital, and surrounding the edges of the switch core is an array of SerDes — SerDes is another term for wireline transceiver. A modern network switch ASIC includes more than 256, 512, or even 1024 SerDes slices, and each SerDes slice, or SerDes lane, looks like this. Just to transmit and receive binary data, so many building blocks are needed: an FIR filter, a digital-to-analog converter, a PLL clock-generation circuit, a passive or active output network, and on the receiver side various types of equalizers, the ADC, the DSP, and clocking units — all just to transmit 1s and 0s.

The bandwidth of network switch ASICs has grown significantly over more than 20 years. Over the last 10 years, more than 80 times the bandwidth came to be supported, while the power consumption of the ASIC grew by a factor of 10 or so. The interesting thing is that the power efficiency of the SerDes is not improving as fast as required. The problem is this: 15 years ago, the entire switch power was dominated by the core power, mainly the DSP, the digital part. As the process node gets smaller and smaller — from 28 nm down to 14 nm, 10 nm, down to 3 nm — the digital core benefits significantly in area and power, and in speed as well. However, the analog part does not scale down as well as the digital, and the SerDes is a mixed-signal system, so it includes many analog parts. So as time goes on, the data rate must increase, the SerDes becomes more and more complicated, and the analog power does not scale down.
So today the switch silicon power is still dominated by the core, but the portion of the analog part — the portion of the SerDes — is becoming larger and larger. In the future we can expect the switch power to be dominated by the SerDes, and the overall power consumption of the switch ASIC to keep growing as well. The increase in power consumption is not only a matter of the electricity bill: it demands a more complicated cooling system, and as a more powerful cooling system is required, the power consumed by that giant cooling system, the system-level complexity, and the space it occupies all become problems too.

This is one example, a Marvell switch chassis. Inside the chassis you can see the core switch ASIC, and you can also see the giant fans behind it — all just to communicate data. This one may not be very visible; it is also a cooling fan. The heat dissipation of network switches is becoming more and more important. This is a system Marvell demonstrated at a conference last year; the SerDes sitting inside was presented at ISSCC 2022 by the same company, so if you are interested, you can take a look at that paper to learn more about it.

This is the economic significance of high-speed interfaces. The y-axis is port shipments in millions: every year more and more wireline transceiver ports are shipped, and the increase is basically exponential. So we can conclude that the wired data communication business is growing significantly. One important note is that NVIDIA even started its own networking business. The purpose is not to sell publicly; it is to be used within their own supercomputing infrastructure. They acquired a company and started designing their own network switch units and network interface cards as well.
This is one of the slides from NVIDIA when they showcased their new GPU technology; you can see they are highlighting the interconnects. The NVIDIA H100 server includes 8 GPUs in one chassis, and two chassis hold 16 GPUs. If you have multiple GPUs, of course you want to use them all together as if they were a single, very large GPU, so wide-bandwidth, seamless wireline communication is essential. You can see there are network switches and NVSwitch: inside one single GPU chassis there are 8 GPUs and 6 switch ASICs just for their chip-to-chip communication.

This is another example, from Tesla. Tesla has its own training system; they designed their own ASIC, and the system is named Dojo. Dojo is a supercomputing platform with a scalable architecture, and the silicon is really large — 645 mm² is nearly the reticle size limit for a single die. A 5x5 matrix is formed using these dies. The point of this disaggregated, scalable architecture is that it requires ultra-wide-bandwidth chip-to-chip communication so that this die and that die can communicate with each other — and you can see the highlighted 9 TB/s bandwidth per edge. 9 TB/s is even larger than a modern network switch ASIC's bandwidth.

So far we took a look at the applications of wireline transceivers in real systems; now let's take a brief look at recent trends in high-speed wireline. From 2017 there was one major change in ultra-high-speed wireline. The simplest modulation scheme is of course non-return-to-zero (NRZ) signaling, also called 2-PAM or PAM2, two-level pulse amplitude modulation. To send binary data, say 0 and 1, you modulate the bits to a low voltage and a high voltage — say 0 V and 0.9 V — to map 0 and 1. And this one on the right-hand side is 4-PAM, four-level pulse amplitude modulation; with this modulation scheme, you encode 2 bits into 4 different voltage levels.
For example, the lowest voltage level carries bits 00, the next level 01, and the upper two levels the remaining combinations, 10 and 11. The transition from NRZ (PAM2) to PAM4 happened around 2017 or 2018, and most wireline transceivers at 56 Gb/s or faster employ PAM4 modulation. You can see from this chart that from 2017, most links faster than 56 Gb/s employ PAM4.

The energy-efficiency plot is shown on the bottom right. Even with PAM4 at the highest data rates, the major trend is that energy efficiency keeps getting better. The x-axis is energy efficiency — as you go to the left, the efficiency is better. Usually, when you go to higher speed, the energy efficiency gets worse; so it is quite surprising that even at higher data rates the energy efficiency keeps improving. Of course, this couldn't happen without significant effort in both circuit design and process node development. On the upper right, energy efficiency versus year is plotted: as the years progress, the efficiency gets better and better, thanks to circuit design, architectural innovation, and process node shrinkage as well. On the bottom, the dotted lines mark the 14 nm limit, then 7 nm, 5 nm, 4 nm, and 3 nm. So you can see that most PAM4 SerDes are fabricated in very miniaturized technologies, and of course this is one major reason the energy efficiency could improve.

Here are some recent publications from the ISSCC conference, which is the most competitive conference in the circuit design field. In 2018 the first 100 Gb/s transmitter was shown, while receivers were on the order of 56 Gb/s — keep in mind that usually the higher-speed transmitter comes first and then the high-speed receiver catches up.
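The 2-bits-per-level mapping described above can be sketched in a few lines. The specific voltage levels and the Gray-coded ordering (00, 01, 11, 10 — chosen so a decision error to an adjacent level flips only one bit) are illustrative conventions, not details taken from the slide:

```python
# PAM4 sketch: each pair of bits maps to one of four voltage levels,
# so the symbol (baud) rate is half the bit rate.
# Gray-coded level ordering: 00, 01, 11, 10 (bottom to top).
GRAY_PAM4 = {(0, 0): 0.0, (0, 1): 0.3, (1, 1): 0.6, (1, 0): 0.9}

def pam4_encode(bits):
    """Encode an even-length bit list as PAM4 voltage levels."""
    assert len(bits) % 2 == 0, "PAM4 consumes bits two at a time"
    return [GRAY_PAM4[(bits[i], bits[i + 1])]
            for i in range(0, len(bits), 2)]

symbols = pam4_encode([0, 0, 0, 1, 1, 1, 1, 0])
# 8 bits become 4 symbols: the same bit rate at half the symbol rate.
```

Halving the symbol rate is exactly why PAM4 took over at 56 Gb/s and above: the channel only has to pass half the Nyquist frequency that NRZ would need, at the cost of one third of the vertical eye opening per level.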
In 2018 the first ADC/DSP-based receiver was shown, and the next year, in 2019, there were more ADC/DSP-based receivers. The world's first 100 Gb/s receiver was also shown there, while most long-reach receivers remained at lower rates. In 2020 there were many 112 Gb/s receivers — so you can see the data-rate race is very competitive; three of the papers were from South Korea. The major high-speed SerDes were ADC- and DSP-based, so the importance of digital-signal-processing techniques built on analog-to-digital converters kept growing. In 2021, except for the transmitter-only papers, every receiver or transceiver paper included an ADC and DSP in its receiver. In 2021 there was also the world's first 224 Gb/s transmitter, while receivers remained at 120 Gb/s, and by that time I doubted we could reach 224 Gb/s with PAM4 signaling. However, the next year, the world's first 224 Gb/s receiver was shown by Intel — also, obviously, an ADC- and DSP-based architecture.

Two years ago the race temporarily paused: no 224 Gb/s links were showcased in the wireline transceiver session at ISSCC, but more idea-driven, innovative architectures were shown instead. The next year, in 2024, the headline data rate was again 224 Gb/s. Achieving such high data rates requires advanced nodes below 7 nm — most such links are fabricated in 5 nm or 4 nm. You can see in the titles that this one is 5 nm FinFET, this one is fabricated in 3 nm FinFET, and this one is also 3 nm. So it's becoming more and more challenging, and the design cost, the fabrication cost — everything is getting tougher. Still, it's so important that we cannot stop working on it. And such 200 Gb/s links of course include data-converter-based front ends and digital signal processing.
This is this year's ISSCC. You can see some 200 Gb/s transceivers, and this one is also a 200 Gb/s link, and so on — those are all ADC/DSP-based receivers. What I want to say with this slide is that, to support ultra-wide-bandwidth data communication in SerDes, high-sampling-rate data converters are essential and DSP techniques are becoming more and more important. Of course, analog technique is still very important, especially at the front end, because the analog bandwidth itself doubles every generation: to support 100 Gb/s you need something like 25 GHz of analog bandwidth, to support 200 Gb/s you need 50 GHz, and so on. So the next generation will need 100 GHz of analog bandwidth, from DC to the highest frequency, which will be very challenging from an RF and analog circuit perspective.

Another point: as the process node is miniaturized, it becomes more digital-friendly than analog-friendly. With a miniaturized FinFET node, the area shrinks, power consumption improves, and speed increases; however, the supply voltage usually drops — for example, in 28 nm you have a 0.9 V nominal supply, but in 5 nm you have something like 0.65 V. Still, the logic — the ASIC — prefers the miniaturized, advanced node, so the analog circuits must also follow, supported by digital circuits. Many sub-blocks of the ASIC become digital-intensive to benefit from process node shrinkage. The analog front end of a wireline receiver has been, as the name says, a truly analog circuit, but even such analog circuits — the continuous-time linear equalizer, the variable-gain amplifiers — are becoming digital-like, inverter-based architectures. So the inverter is becoming more and more widely used in high-speed amplifier applications. Actually, the title of this slide is wrong — it should be die-to-die interconnects. For die-to-die interconnects, the interconnect reach is extremely short.
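The bandwidth figures above follow from simple arithmetic: PAM4 carries 2 bits per symbol, so the symbol rate is half the bit rate, and the first Nyquist frequency is half the symbol rate. A quick back-of-envelope check (the helper name is mine, not from the talk):

```python
# Back-of-envelope analog bandwidth estimate for a PAM4 link:
# bit rate -> symbol rate (bits per symbol) -> Nyquist frequency.

def nyquist_ghz(bit_rate_gbps, bits_per_symbol=2):
    """First Nyquist frequency in GHz for the given line rate."""
    baud = bit_rate_gbps / bits_per_symbol  # symbol rate in GBaud
    return baud / 2                         # Nyquist in GHz

assert nyquist_ghz(100) == 25.0  # 100 Gb/s PAM4 -> ~25 GHz
assert nyquist_ghz(200) == 50.0  # 200 Gb/s PAM4 -> ~50 GHz
assert nyquist_ghz(400) == 100.0 # next generation -> ~100 GHz
```

The same formula also shows why PAM4 displaced NRZ: `nyquist_ghz(112, bits_per_symbol=1)` is 56 GHz, while the PAM4 version of the same link needs only 28 GHz of channel bandwidth.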
So it's also called an extra-short-reach (XSR) link. For extra-short-reach links there is no room for the ADC, DAC, and DSP because of power consumption and area occupancy. But for long-reach SerDes — for example, server chassis-to-chassis interconnects like 200 Gb/s links — the transceiver architecture is becoming like this: a DSP and a DAC on the transmitter side, and an ADC at the front end with a DSP on the receiver side.

This is the second-to-last slide. To support beyond-200 Gb/s data rates for next-generation SerDes, the modulation order should go higher — like the transition from NRZ to PAM4, at some point there will be a transition from PAM4 to PAM6 or PAM8, something like that. We can also explore more advanced modulation schemes that are widely used in wireless communications. And there is a lot of effort to bring the optics closer to the interface, so silicon photonics is also a very important research topic in the SerDes community.

To summarize this talk: even if we don't realize it, wireline interfaces are really everywhere around us — even this mouse is wired, and USB, DisplayPort, HDMI, these are all wireline. The economy of SerDes is becoming larger and larger. High-performance computing systems are no longer bottlenecked by the processor itself; the interface is becoming the bottleneck. With process node scaling, digital-friendly architectures are explored more and more, and DSP is more widely used in modern SerDes systems. So that's it. Thank you very much for your attention. Now I'm ready to take your questions, if you have any.
