http://2014.apricot.net/program#session/22283
26 February 2014
APNIC Plenary: Anatomy of CGN

>>Dean Pemberton: Welcome this morning to the APNIC Plenary: Anatomy of a CGN. What we want to talk about today -- I think everyone here is aware that there is something up with IPv4 addresses. There used to be a lot of them, then we wanted to actually use them, then we allocated them in a way that now we might -- let's not use the word "regret", but I think you know where I'm going. We came out with IPv6, and two decades later we decided to use it. But we have this problem now. We have deployed most of the Internet that we know and love on IPv4, and now we are trying to deploy a lot of it on IPv6, because we have run out of IPv4, but being humans, we don't do anything until the last minute. We have an issue about how we support the IPv4 habit that we have had, while we are moving to IPv6.

One of these measures is NAT. We have been using this for ages at home. In my little router at home I use private addressing internally, I get one address from my upstream and I use NAT. What about if we could do that for an entire carrier? Carrier grade NAT? What we are looking at today is the idea of carrier grade NAT. Is it the answer to how we extend the life of v4 a little bit further, to give us time to implement v6, or is it actually the answer, full stop? Do we just want to get better and better at doing carrier grade NAT, and then we won't need this troublesome IPv6?

I've got a couple of speakers on stage who are going to give varying experiences with carrier grade NAT. Some of them have deployed carrier grade NAT, some of them have opinions about carrier grade NAT. What we want to do with this session, we very much want to make it more interactive, less slides. My speakers do have some slides -- exhibit A on the screen. What we want to do is run through these and then get some more feedback from the audience. Your feedback is really important. The mics are there, step up to the mic, name and affiliation, and just have at these guys.

My first speaker is Geoff Huston. While we were discussing what we would say in his bio, we came up with, "He talks a bit, he writes a bit, he has a couple of opinions." I think we might be understating the "couple". Geoff is Chief Scientist with APNIC. We are going to start off with some observations/opinions about carrier grade NAT.

>>Randy Bush: Use the microphone.

>>Dean Pemberton: Awesome, thank you.

APPLAUSE

Do we have any questions from the floor? I've got some points. Does anyone want to get up and have a comment about Geoff's presentation?

You make an interesting point about lawful intercept. The more of the carrier grade NAT stuff you do doesn't actually mean that the police want to catch fewer people. Are people actually keeping this NAT state? You seemed to imply they weren't. Does that actually mean that people are anonymous until they are targeted?

>>Dean Pemberton: Really, IPv4 could be the choice of terrorists worldwide?

>>Dean Pemberton: Right. I had another point -- state. Roland Dobbins presented at NZNOG recently about how the more state you have in your network, the more fragile it makes it in terms of defending against things like denial of service attacks. Carrier grade NAT seems to be, with the 5-tuple and all that, the mother of all stateful devices. Does deploying it perhaps mean that you are making your network more fragile?

>>Pindar Wong: You mentioned earlier about shifting cost.
Do you see a role for the end user, or in educating the end user, to prevent this trajectory that you are presenting?

>>Pindar Wong: If I can twist that round a little, if you got the customers to say, "I want v6" as the answer to that -- not necessarily understanding all the details, but a very simple message, v6, which I think we have been saying for a few decades -- could that be the message for the wider population to take away? For the very technical reasons that you have gone into -- long story short, v6 cloud, as you say.

>>Cameron Byrne: Someone said the customers have to ask for v6. When we are talking about NAT, we are usually talking about the real consumers, the people with houses and home routers and cellphones. These people don't know what IPv4 or IPv6 is; they will never, ever ask for IPv6. When people say, "The customers should ask for IPv6," I agree with you, that is wacko land, it's not going to happen. What is feasible is that people understand the reality of the carrier grade NAT -- and by "people" I mean carriers, the people in this room, the network operators. They understand what deploying NAT means, what it means in terms of heat in your facility and space and power and customer care calls when the application layer gateways don't work for PPTP or whatever programs people are running. Really, there is a balance, there is a chasm to be jumped between here we are in CGN land and here we are with the end-to-end Internet, and once you jump the chasm -- so there's going to be a pull, right? A pull from the v6 simplicity of restoring end-to-end, versus the pushback of whatever momentum is required to get from the IPv4 world to the IPv6 world. Really, it comes down to the carriers having to make the decision, and it is a business case within the carrier to simplify their network, to not go down this escalating fragility associated with the CGN.

>>Dean Pemberton: Okay, that was really good. I think now we have got a little bit of a taste of how complex and hard this is and what some of the issues can be around carrier grade NATs. So no one would actually be running those, right? Except people are. Luckily, now we have Dr Shin Miyakawa from NTT to share some of their experiences with carrier grade NATs.

Miyakawa-san joined NTT in 1995 as a researcher, right after he received his doctoral degree in computer science from Tokyo Institute of Technology. Since then he has worked on research, development and standardization of Internet protocol technologies. He is also an active participant in the IETF and author of several RFCs, including requirements for IPv6 prefix delegation, common requirements for carrier grade NATs and so on. He is a guest professor at the Japan Advanced Institute of Science and Technology, a member of the Japanese governmental committee on the strategy of information security policy and a senior visiting researcher at SSC Lab, Keio University.

>>Shin Miyakawa: Thank you for the introduction. I would like to talk in a little bit more detail about the carrier grade NAT, but first of all I would like to thank Kaname-san over there; he is my team member, an excellent engineer, and today's talk is mostly his work, so just an acknowledgment for him.

Why does this skip a lot?

>>Randy Bush: It's going through carrier grade NAT.

>>Shin Miyakawa: I think so. I would like to talk about some machines and their performance, and what has happened recently around them. I will skip this part, because you can download the slides.
First of all, maybe some people here in this room are already operating carrier grade NAT -- or maybe not. But just to clarify, we already have testbeds and a considerable number of commercial implementations on the market. The reason why I am one of the authors of RFC 6888, "Common Requirements for Carrier-Grade NATs", is that so many vendors come to me saying, "Please evaluate our machines," and they show me their implementations. So this is just an example of today's implementations.

Today's carrier grade NAT machines can usually handle 10 million to 100 million concurrent sessions as a maximum. Another very important number is how fast they can create TCP state. When one TCP session comes through the carrier grade NAT, of course a NAT table entry must be created; if that is not quick enough, it stalls the TCP connection at the end. We observe that 10K to 50K new connections per second can be processed. These are the basic numbers.

Also, carrier grade NAT today is mostly deployed by mobile operators first, rather than terrestrial ones. In that case, many operators care about high availability; I will talk about that later. Usually, carrier grade NAT machines have HA functions of the active-active or active-standby type. The form factor is 1U to 4U or something like that, and 1G to 40G bps ethernet interfaces are commonly used on those machines.

One important thing: usually the specification in the catalog is way better than the actual performance, so be careful about that. Sometimes it claims double or maybe triple the real figure, just like with a car -- the catalog spec is usually way better than actual driving, right? So you need to be careful about that.

Talking about HA: if we install two CGNs in parallel, side by side, just for back-up, they must copy all the state in a realtime manner. If one TCP session goes through the left side, its NAT table entry must be copied to the right side. Usually the CGN has 10G ethernet, maybe 40G ethernet, just for HA, to copy this kind of state very fast. Some CGNs are very good: sometimes we see no packets lost, no TCP sessions lost, even if we switch off one CGN. That is an actual achievement the vendors have made.

I cannot disclose who vendors A, B and C are. Please come to me if you would like to know about that secretly; we have NDAs. Vendor A, the best one I know, can handle 67 million sessions according to the catalog, but we tested it and it can actually handle up to 60 million. That is the real number. Then the DNS handling: this is also good, because according to my RFC, UDP can also be translated, but DNS especially can be excluded, because DNS usually does not need to keep any state, so that can save tonnes of memory, as Geoff said. Also, we mostly recommend that you use a fullcone scheme, the destination address independent NAT; for the well-known ports, they don't need to do that, and that can also save tonnes of memory. Logging matters a lot -- Geoff said that. This vendor A's logging is very good, and the log servers can be multiple, so that is also very important for logging. Also, HA is supported. As for vendors B and C, you can check the slides. We have more than that -- D, E, F and whatever -- but if you have an implementation, please ring me to test it; I will give good feedback.

Then, talking about how to implement this, here is a sample network design.
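As a brief editorial aside before the design discussion: the memory argument behind the fullcone scheme Miyakawa-san recommends can be sketched in a few lines of Python. This is an illustration with invented names, not any vendor's code -- endpoint-independent mapping (the "fullcone" behaviour of RFC 4787) keeps one binding per inside address and port, reused for every destination, instead of one entry per 5-tuple.

```python
# Illustrative sketch of endpoint-independent ("fullcone") mapping: one
# binding per inside (ip, port), reused for every destination, rather than
# one NAT table entry per 5-tuple. Invented names, not vendor code.
class FullConeNat:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.bindings = {}            # (inside_ip, inside_port) -> public_port
        self.next_port = 1024

    def translate(self, inside_ip, inside_port, dst_ip, dst_port):
        key = (inside_ip, inside_port)
        if key not in self.bindings:  # allocate once, reuse for all destinations
            self.bindings[key] = self.next_port
            self.next_port += 1
        return (self.public_ip, self.bindings[key], dst_ip, dst_port)

nat = FullConeNat("203.0.113.1")
nat.translate("10.0.0.5", 40000, "198.51.100.7", 80)
nat.translate("10.0.0.5", 40000, "198.51.100.9", 443)
assert len(nat.bindings) == 1         # two destinations, still one table entry
```

A destination-dependent (symmetric) NAT would hold one entry per destination, so the table above would already contain two; across millions of subscribers, that difference is where the "tonnes of memory" go.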
This is maybe a little bit too expensive, but Japanese ISPs especially are so nervous about breakdowns that we have a fully redundant backbone in the design. Up to this point, we can implement a CGN. Based on the numbers I already showed, like 67 million sessions or so, we can implement the CGN like this. But a very important thing is that usually a CGN cannot speak BGP. A CGN would rather use its memory and CPU for TCP translation than for handling BGP; that is one reason why a CGN usually doesn't talk BGP. In that case, in an example like this, we can divide an edge router into two pieces, put the CGNs into that section, and run HA like that. You need to think really deeply about these kinds of things, based on the performance and whatever you need.

Then, a very important tip: today, I would like to say that IPv6 should (or must) be introduced when a CGN needs to be there, because carrier grade NAT is quite an expensive device. Of course, it degrades the service, which means we cannot charge the customer more for introducing carrier grade NAT, so from the business point of view there is no hope of recovering the cost. Thanks to Google and Facebook, the famous applications already support IPv6. When you introduce IPv6 today -- I can show you the example -- about 30 to 40 per cent of the volume of traffic can be diverted to IPv6, so the stress on the carrier grade NAT can be reduced a lot. That also saves money on the CGN operation. This is a good argument to persuade the board members to introduce IPv6, to save the cost of the carrier grade NAT. That is one of the things I would like to talk about today.

Then, talking about Internet applications: we did intensive testing of lots of applications last year. Back in 2008, some of us had already noticed this, and I gave a presentation at the Google headquarters back in 2008 where I used the Google Maps example -- you know, when CGN is introduced, Google Maps breaks, with white boxes appearing; that was my presentation originally. Six years on, lots of applications now use fewer sessions than before. But still, as Geoff said, they use many concurrent TCP sessions.

Here are some results of our observations. For webmail, the average consumption of TCP sessions is 65, which is a lot. For video streaming, the average is 83. For adult sites, it is less, 47 -- I'm not sure why. Online gaming uses 100 concurrent sessions. According to this survey, we should secure at least 100, or I recommend 500, sessions per user to give good customer satisfaction using carrier grade NAT. That number is very important: it determines how many customers you can handle with one single IP address.

Also, you need to think about the logging as well. Geoff said that if we do not log, we do not need to disclose anything. In some jurisdictions that is true, but in some jurisdictions we have to log, so it depends on the legal system you are in. Be careful about that; it impacts the design of the carrier grade NAT operation a lot.

Then, one example I would like to show of how IPv6 can improve the situation. Last year we had a conference of web folks who are now very busy developing HTML5 technology. In Japan, there is a community developing HTML5, and we supported their network at the conference. That happened last November. About 1,300 people attended.
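As an editorial aside, the "how many customers per IP address" question above is simple arithmetic worth making explicit. A back-of-envelope sketch, using the 100-to-500 sessions-per-user figures from the talk and assuming one public IPv4 address with the well-known ports excluded:

```python
# Back-of-envelope check of the session budgets quoted above: with roughly
# 64,512 usable ports per public IPv4 address (1024-65535), how many
# subscribers fit behind one address at a given per-user session budget?
USABLE_PORTS = 65536 - 1024   # one public IP, well-known ports excluded

for sessions_per_user in (100, 500):
    users = USABLE_PORTS // sessions_per_user
    print(f"{sessions_per_user} sessions/user -> ~{users} users per public IP")
# 100 sessions/user -> ~645 users per public IP
# 500 sessions/user -> ~129 users per public IP
```

At the recommended 500-session budget, barely 129 subscribers share one address, which is why the per-application session counts matter so much to CGN sizing.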
We introduced carrier grade NAT machines, and also some firewalls, with IPv6 running simultaneously. The maximum number of terminals was about 1,000. We got logs from the wireless access servers, and the very impressive result is that the maximum IPv6 usage reached 60 per cent. We never told people, "You should use IPv6" -- no. We just deployed it. We didn't say what would happen, because web designers don't care which protocol they are using. What we provided was simply v4 with CGN and v6 alongside. Depending on the operating system -- Apple, or Windows 7 or 8 -- if IPv6 is there, the usual operating system now uses IPv6 with a higher priority than before. That is the reason why, once we introduced IPv6 without telling people, they just switched to IPv6, and that is how usage could reach 60 per cent.

The funny thing is that so many Apple machines were there -- that is because it was an HTML5 conference -- and I don't know what implementation the other machines use, but lots of iOS devices can speak IPv6. What is interesting is that what they looked at was mostly Google, Akamai, Twitter and Facebook type of sites. These famous DNS names were already IPv6 compatible by the time we held this event last year. Even just a few sites like this can save many sessions, because many people are going through Google, or reading Gmail, over IPv6. If they use Twitter or maybe Facebook, they probably use IPv6 -- I'm not sure about Twitter, but Facebook for sure. So they can move so many TCP sessions onto IPv6 transport.

Then, best of all, less IPv4: there were very few IPv4 sessions we had to process. In this experiment, the number was limited to almost 30 or so per user, because of the offload to IPv6. That is a very important number. Usually an application mix needs 100 sessions or something like that, but once you introduce IPv6, that can drop to 30 or so. That is a big improvement.

Also, this is an interesting number: the percentage of high ports, I mean the ports that need to be treated as fullcone. To ensure that applications are transparent, fullcone is needed, and overall 60 per cent of sessions are above port 1024. That is also a very important number that you need to think about when you design networks.

Then, that is the result. Even today, only Google, Facebook and a few sites are IPv6 ready, but they are so major that if we introduce IPv6, on average about 40 to 50 per cent of traffic by volume can be diverted to IPv6. That means we can save on the cost of the carrier grade NAT when we introduce IPv6.

This is the final slide. There are several carrier grade NAT implementations commercially available in the market today. Mostly they work well, but there are some issues, especially around HA -- we tested the HA function, and we lost lots of sessions when the takeover happened. So please be careful about that. Again, catalog specs are a bit suspicious, so be careful about that too. Also, many cellular phone operators have deployed CGN already. Some terrestrial services are following this trend, but again, IPv6 introduction will reduce the carrier grade NAT load a lot and reduce the cost. That is my comment about carrier grade NAT today. Thank you very much. That ends my presentation. Any questions or discussion?

>>Dean Pemberton: Thank you very much.

APPLAUSE

Awesome. Do we have any questions from the floor?
Make your way to the mic, otherwise you have to listen to me some more. One of the interesting points for me was the conference where 60 per cent of the traffic was IPv6. I'm trying to see if we can get a similar number for this conference. I am trying to get what the numbers are, rather than trying to -- anyway. Do you think carriers have a realistic understanding of how much of their traffic doesn't need carrier grade NAT? Because if they just look at all the IPv4 traffic now -- maybe it's X many terabytes -- and they assume all of it is going to need to be carrier grade NATed, that is going to lead to decisions down this path. If 60 per cent of it won't need this, it could be a completely different thing. Do you think they have that understanding?

>>Shin Miyakawa: Yes. I would like to talk about that this afternoon. But at least in Japan we have already decided to move to IPv6. KDDI and SoftBank already offer a native IPv6 service to the people, and NTT is following, just because carrier grade NAT costs a lot -- a very simple reason. We say to the people at the higher level, "OK, Mr President, we should save the cost, let's introduce IPv6." That's it. Very simple reasons.

>>Shin Miyakawa: But again, we operators would like to kill the CGN, so I don't think so; but we may have a chance to delay IPv4 in that case. Then maybe the web browser people need to change their attitude. I'm not sure. Again, we will see soon, because the inter-island Japanese network is going dual-stack soon. So that situation will change. I don't know.

>>Dean Pemberton: Before I unpack that too much, I might throw over to our next presenter, Sunny Yeung. Sunny has been working for Telstra for over eight years in wireless mobile engineering. He is responsible for their network planning, architecture and deployment of Telstra's 3G and 4G data networks. Sunny's work has spanned diverse areas related to the wireless data core, including network monitoring, security and software-defined networking, and for over three years he has overseen the design, deployment and trial of IPv6 in mobile wireless.

>>Sunny Yeung: Thank you. Some of the things I am going to be talking about today: the CGN solution for Telstra mobile, so I will talk a little bit about how we have deployed CGN in our solution; fabricated reality -- what is CGN, what is the message we are trying to get across? I have put a bit of a sci-fi reference in there, which I will talk about later; and the truth of reality, what we think you should be doing with CGN.

In Telstra we have deployed CGN ever since the inception of 2.5G data, so it has been in the network for almost a decade now. It was introduced very early on in the development of the mobile network, to maximize address utilization efficiency -- simply something that most wireless carriers are doing globally. Users are usually allocated a private IPv4 address, which we then translate into a public v4 address. We deployed it before the APNIC exhaustion of IPv4 addresses, so that is a very important point to stress.

Recently we have deployed PAT -- so I don't think it is the 5-tuple -- and we have regionalized our deployment of CGN. Instead of centralizing our CGN deployment into one location, with all the users exiting at one central point, we have regionalized the deployment into all the different states in the country, to exit as locally as possible and as close to the users as possible.
We have deployed PAT recently, just to allow for a little bit of manoeuvring room during the transition period. It is not a long-term solution, but it is a way for us to get over the hump while we are transitioning to IPv6.

Traffic to the Internet originating from our users, we allow to pass without much interruption at all. We do obviously place security in there: you have our standard firewalls and access control lists, internally on our own ISP side, and on the Internet border router on the other side of that we have control mechanisms to only allow certain ASs and certain IPv4 ranges to exit into the network as well. So there are certain security measures in place. Internet-originating traffic is essentially dropped; we see no reason for traffic coming from the Internet to enter into the handsets.

We have a distributed model. I am going to talk a little bit about what other things go into a CGN deployment, some things you need to consider, and that we have considered inside our environment. Let's put up a hypothetical situation: you need to allocate your users a private address at the user gateway, to your user equipment; you have a finite number of public IPv4 addresses you can use; you have a finite number of private IPv4 addresses you can use; and you can't get any more of either. What are your options?

You can centralize or regionalize your network design. We have chosen to regionalize. Centralization simply increases your latency and resource utilization; you don't really want to do that. To save cost, what we have done -- the carrier grade NAT in this diagram is on the right-hand side with the arrow pointing to it -- is integrate the routing and the CGN into the same box. Instead of choosing a smaller box to do only CGN, we are doing the BGP routing on the same, larger box that can do the processing involved and has sufficient memory and CPU resources. That is something you might want to consider if you have budget constraints when looking at CGN. Whether you centralize or regionalize, you have to look closely at your design to make sure you have sufficient resources to do the correct routing. Regionalization just means that you need smaller public IPv4 spaces in each gateway, and that means you don't need such a large range of NATing or PAT going on. It doesn't need to chew up as many sessions.

You can place CGN closer to the user; that is your second option. But if you do that, you might increase the number of public IPv4 addresses that get used. The question here is: if you do that, what if the users are trying to reach content services that are hosted inside your own carrier network? Then that public IPv4 address is essentially wasted. You don't really want to do that either. There is a reason why we have public IPv4 addresses. The IPv6 reality is different: we have lots of public IPv6 addresses, so there it doesn't really matter.

You can do NAT444; that is your third option. We looked at this and we didn't want to do it. One of the things about working in Australia is that there are very, very strict regulatory requirements, and compliance with those regulatory and legal requirements means we need to log. Logging is very important for us. We don't want to waste money on purchasing more hard disk space just to have this logging in place.
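As an editorial aside, the disk cost behind that remark is easy to estimate. A rough sizing sketch follows; every figure in it is an assumption chosen for illustration, not a Telstra number:

```python
# Illustrative sizing of per-session NAT444 logging; all figures assumed.
# The point: records * retention adds up to serious storage very quickly.
subscribers = 1_000_000
sessions_per_sub_per_sec = 0.5        # assumed average new-session rate
bytes_per_record = 150                # timestamp, 5-tuple, translation, etc.
retention_days = 180                  # assumed regulatory retention period

records_per_day = subscribers * sessions_per_sub_per_sec * 86_400
total_tb = records_per_day * retention_days * bytes_per_record / 1e12
print(f"~{total_tb:,.0f} TB of raw session logs for {retention_days} days")
# ~1,166 TB
```

Under these assumptions, a million subscribers generate on the order of a petabyte of raw session logs over a six-month retention window, before indexing or replication.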
More importantly, we don't want to complicate the matter for our operations and troubleshooting teams, because in the end it's not just the legal area that we need to worry about, it's about servicing our customers. If they call up our service centre to try to troubleshoot an issue with their connection and we have NAT444, it is just going to make matters worse. We have to be smart about what we really want to do. But if you have major issues of private IPv4 address depletion on top of your public IPv4 address run-out, it is a possibility. Again, though, you have to worry about the legal, troubleshooting and investigation issues if you are going to do that.

>>Dean Pemberton: We might have some feedback for the presentation set-up.

>>Sunny Yeung: The question for everyone here is: before you even consider doing CGN, if you haven't deployed it yet, do you actually have an IPv6 strategy?

If you go two slides forward -- using a sci-fi reference, the blue pill and the red pill from The Matrix -- the blue pill represents the blissful ignorance of illusion, which basically means that CGN will save the world. It's not going to. Many believe that CGN is going to resolve the issue of the IPv4 depletion problem, or extend it, and that IPv4 depletion will simply disappear and you can manage everything with CGN. That's not the case. We have already run out of addresses in APNIC. Operators are already looking at private IP address utilization, probably for the first time ever, and this is especially the case in some fixed operators, not just wireless. We need to be very careful about defining and differentiating tunnels and translators when we are looking at CGN.

A single CGN in a traffic flow can maybe help you with your IPv4 depletion, and that is one of the things we have done inside Telstra: we have basically one stage. But if you do NAT444, you encounter all the issues we mentioned before, about troubleshooting and compliance. Scalability and reliability become another issue as well. How do you scale properly? How do you manage the address range utilization properly? Another issue is: how do you actually get two ends to talk to each other when they sit behind the same address? You can't really do that. You need to have good IP address planning and management before you deploy CGN, and even after you deploy it you need to look very carefully at what is left available, if you really need to do NAT444.

You don't really want to do that, because the last issue is that a lot of applications will break. NAT44 already requires ALGs to fix these applications, and I am sure Miyakawa-san already has a list of the ALGs that are needed. If you do NAT444, that list gets even longer. You don't want to do that because it just complicates the matter even more. Something to note here is that, in a way, in mobile networks, if you have a CGN already in your network you are already running NAT444, because on the UE, when you are tethering, the UE becomes a router and a DHCP server that allocates another IPv4 range to your tethered devices. We are all blissfully ignorant of that, but that's the reality; it has already been happening.

What about dual-stack IPv6? We have been talking about that as well. How does it relate? Dual-stack does nothing to alleviate the situation. If you deploy CGN and dual-stack without a final IPv6 design to aim for, then really you haven't solved the problem, you have just made it worse.

The red pill is the painful truth of reality.
CGN will extend your public IPv4 allocation, but the v4 depletion issue isn't going to disappear. Will it cost more in the future to obtain more IPv4 addresses? It probably will. These are some of the questions you need to ask yourself before you do this. Is it better to fund development of ways to overcome the issues caused by NAT444, compared to funding IPv6 directly? To avoid that situation, obviously you need to have an IPv6 strategy, and that's what we have done. Our end goal is to have an IPv6 strategy, and to deploy CGN and treat it as a hop or a transition method into IPv6. It's not a way to solve IPv4 depletion; it's a stepping stone to v6. CGN should be used to give yourself time to deploy IPv6, and it should be deployed for IPv6 single stack only. Really, the real solution to resolve IPv4 depletion is to deploy IPv6 anyway.

If you are really stuck, you can deploy PAT. We have chosen to do that as well, because we are in a bit of a pickle in Telstra; we really need to launch IPv6 to solve our problem. But if you deploy PAT, you will really need to look at your logging very closely, because it also increases your requirement to log properly.

One of the other things you can look at is trying to recover as many IPv4 addresses as possible. One of the things we have noticed is that we have deployed much larger IPv4 private ranges for our infrastructure allocations than needed. Previously, the simple rule that a lot of vendors taught network engineers was to double whatever you have been allocating, for future growth. Well, we have run out of IPv4 addresses, so you cannot really do that any more. Why not look at recovering some of those addresses by reducing the subnet sizes, and try to recover as much as you can during this transition period, or migrate some of those subnets to much smaller ones so you can recover a bigger supernet?

The next question is: do I use a tunnel encapsulator or a translator for IPv6 traffic going to IPv4? Depending on what you are doing as a carrier -- if you are fixed, you can use DS-Lite or MAP-T/MAP-E or 464XLAT or NAT64. Those are some of the things you might want to consider. In our case, as the PLAT solution for our wireless environment, we will be using 464XLAT. It is not the best way forward in the long term of things, but the way we see it, a lot of our mobile users are going to Facebook and YouTube as their primary websites, so we expect that when we turn on IPv6 and we have the user base out there, a lot of the traffic will go to native IPv6 directly -- well, hopefully, anyway. That's something that you need to think about as well: look closely at where your customers are actually going. What we have seen so far is that they are going to the major sites, and those are IPv6 enabled -- given all the browser issues we have been talking about today -- and we are hopeful that most of these users will be going to native IPv6 on day one. If they are not, then you need a 464XLAT or a NAT64 to do that translation for you. We do need a NAT, because not everything on the Internet is going to be IPv6 enabled on day one. You need some sort of translator, and that is the reference there. Alternatively, you can do dual-stack, but it doesn't solve your IPv4 depletion issue, and that is the issue we are trying to resolve here.

You do need a DNS64.
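As an editorial aside, what DNS64 actually does can be sketched in a few lines: it synthesizes an AAAA record for a v4-only destination by embedding the IPv4 address in a NAT64 prefix. The sketch below assumes the well-known prefix 64:ff9b::/96 of RFC 6052; it is an illustration, not any resolver's implementation.

```python
# Editorial sketch of DNS64 AAAA synthesis (RFC 6052 / RFC 6147) using the
# well-known NAT64 prefix 64:ff9b::/96. Illustrative only.
import ipaddress

NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

def synthesize_aaaa(ipv4_literal):
    """Embed a v4 address in the low 32 bits of the NAT64 prefix."""
    v4 = ipaddress.IPv4Address(ipv4_literal)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))

# An IPv6-only client resolving a v4-only site receives a synthetic AAAA,
# which steers its traffic to the NAT64 translator:
print(synthesize_aaaa("192.0.2.33"))   # -> 64:ff9b::c000:221
```

The synthesized address routes to the NAT64, which strips the prefix back off and forwards the flow to the original IPv4 destination.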
We believe you do need one, because 464XLAT, although it is there, doesn't really solve some cases -- for example, if you don't have a DNS64 and the application you are running doesn't call up the 464XLAT service, then you have a broken network. So we believe it is actually needed inside your environment, and that is something to consider when you are deploying NAT64 on your CGN as well. Is it complicating things more? Not really, because we are trying to go for IPv6. As Cameron talked about in his session yesterday, 464XLAT is just to assist with the transition into full native IPv6. It won't be required when the Internet has gone full IPv6 anyway.

The last point I want to make is that we don't want to forget about application layer gateways. NAT444 is going to break a lot more. Remember that you need to make sure your vendors have developed sufficient ALGs -- and they haven't -- to support NAT444, if you are going to do that. Not only that, make sure you double-check NAT64 as well. We are doing that, looking at NAT64 carefully, looking at what ALGs the vendors are actually supporting, and not all of them are supporting everything today, day one. So that is something to be very careful about when doing NAT64 on the CGN. Obviously NAT64 is a stepping stone. We don't expect it to last in the network until v4 is completely gone, which is in 10 to 20 years' time, but as more and more websites and content become v6 enabled, it won't be doing as many translations, hopefully.

Truth of reality: from the Telstra perspective, we are deploying IPv6. We have a CGN deployed today. We have been testing IPv6 for the past three years, and the technology has been maturing. It is part of the strategy during the transition. It is not our long-term strategy; it is just part of the transition strategy for IPv6.

To summarize for all of you here, some of the things that you might want to consider, and that we have considered as well: deploy IPv6 in the infrastructure today. You can do it today -- no issues. You have to talk to your vendors, obviously, to make sure you have proper IPv6 support, but there's no reason why you can't start testing and deploying today. Deploy CGN at the Internet border router with NAT64. As Miyakawa-san said earlier, you want to save costs, so you want to combine your NAT64 CGN with your NAT44 at the same time. Obviously, CPU and memory resources are important things to consider -- you need a fairly powerful box -- but if you can combine all the functionalities together in one location, it means fewer hops and less routing for the network, which reduces latency. Introduce the DNS64 function into your existing DNS resolver; again, it is a cost-saving measure and it makes sense. Connect your user devices using single stack IPv6 and implement 464XLAT on the user devices. Those are things that you really should be pushing, regardless of whether you are doing a wireless environment.

The last comment I have is that there will eventually be a time when the only function the CGN performs is the NAT64, hopefully, and later it will eventually be completely removed as well. We are going to aim for this as an end goal when going down the rabbit hole.

>>Dean Pemberton: Awesome. Thank you.

APPLAUSE

I might forgo a couple of questions, because we have about 20 minutes to go. I'm going to throw to AJ, then we will hopefully have some time for questions at the end.
AJ, Alastair Johnson, is a senior product line manager at Alcatel-Lucent in California. In his role he works with customers globally to define requirements and feature development for the Alcatel-Lucent family of routers, particularly relating to IPv6. He has experience in carrier and ISP operations as well as network engineering and has worked across many countries in the APAC region. He is active in the InternetNZ community as a member, and in the NZNOG community as a member of the organizing committee.

>>Alastair Johnson: Thanks, Dean. Actually, the bio is a little out of date now, since I haven't been on the NZNOG committee for a little while. I probably should update that.

>>Dean Pemberton: I had noticed that, but I wasn't going to call you out in front of all these people. But now I have.

>>Alastair Johnson: Good morning, everyone, I'm AJ. As Dean mentioned, I work a lot on IPv6. I worked in the APAC region with operators here, and more recently, in the last three years, I have moved into a global role and work with our customers all around the world. By day, I work on IPv6 and I talk to customers specifically focusing on IPv6 requirements, features, deployment and how they get there from here. That always ends up bringing me into talking about NAT at some point. Unfortunately, NAT and v6, as Geoff mentioned, are often intrinsically linked. I am going to talk about what we have observed in some customer networks and some of the operational experience we have gleaned in deploying NAT with a few customers around the world.

When we set about actually developing NAT functionality for our router -- the CGN blade that Geoff mentioned in his presentation -- a lot of us felt like the guy on the slide there and just wanted to run away. CGN, NAT at the carrier level, was a dirty word; it was deliberately introducing fragility into the network and putting a lot of extra state into the routers. That said, it was 2007, we could see the writing on the wall, and we knew we needed to do something. When we set about developing the NAT functionality, our main focus was on making NAT suck as little as possible. That's pretty challenging, but we have done a reasonable job. I will talk a bit about what we have seen. I tried to cut down the slides as much as possible; looking at the time we have left, that's a good thing. I have four main points to hit, some of which have already come up across the earlier presenters.

The first area we see operators being concerned about, and rightly so, is: what is the application compatibility like when I deploy CGN into my network? Everyone is concerned about what will break. What apps work, what don't? We perform testing, as every vendor does or should do, in our labs to make sure that we understand what works. We have a big list of apps that we have collected from operators around the world that are, in some cases, specific to countries. The protocols and applications in use in China differ significantly from the apps and protocols that we see in use in Germany, which are different again from what we see in the US versus South America, and so on.

That then turns us to: what ALGs do we need? Sunny touched on those quite a number of times in his presentation. The big four that we typically see people talking about or asking about are SIP, FTP, RTSP and PPTP. It has been fun for some operators deploying without a PPTP ALG and wondering why their customers' VPNs suddenly break. We have to develop and make these particular ALGs highly scalable.
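As an editorial aside, the FTP case shows in miniature why ALGs exist at all: an active-mode client announces its own (private) address and port inside the control-channel payload, so the NAT has to rewrite application data, not just IP and TCP headers. A minimal sketch, with invented names and behaviour, not any vendor's ALG:

```python
# Sketch of an FTP ALG fix-up: the PORT command carries "h1,h2,h3,h4,p1,p2"
# (four address octets plus the port split into two octets), all of which
# refer to the subscriber's private binding and must be rewritten to the
# CGN's outside binding. Illustrative only.
def rewrite_port_command(line, public_ip, public_port):
    """Rewrite 'PORT h1,h2,h3,h4,p1,p2' to carry the NAT's outside binding."""
    if not line.startswith("PORT "):
        return line                       # only the PORT verb needs fixing up
    h1, h2, h3, h4 = public_ip.split(".")
    p1, p2 = divmod(public_port, 256)     # port encoded as two octets
    return f"PORT {h1},{h2},{h3},{h4},{p1},{p2}\r\n"

# Private 10.0.0.5:4242 is rewritten to the CGN's outside 203.0.113.1:50000:
print(rewrite_port_command("PORT 10,0,0,5,16,146\r\n", "203.0.113.1", 50000))
# -> PORT 203,0,113,1,195,80
```

Doing that payload inspection and rewrite at line rate, for millions of flows, is what makes carrier-scale ALGs hard.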
When we are talking about 30,000 to 50,000 subscribers behind a single NAT, we have to make sure we can support the translations and fix-ups that we need for the protocols at that level of scale. There are a lot more ALGs that you might want to consider, and that we have had customers ask about, but they are not necessarily something we can easily do in a very large scale NAT. If you look at what an enterprise NAT or a small business NAT is going to do, or even if you look at the Linux kernel and all the NAT ALGs it supports, our list looks small in comparison; but we have this immense amount of state that we have to track, we have potentially millions of flows, and we need to scale the ALGs as much as we can, which means impacting as little as possible while covering the biggest set of apps possible.

If you have an application that really needs an ALG that your vendor does not support, then you are probably looking at not NATing that particular type of subscriber. That does introduce further complexity into the network, but usually it's unavoidable, and many operators are differentiating -- particularly in the wireline broadband space, as opposed to the wireless space -- premium subscribers versus non-premium, those that can be safely NATed versus those that can't. That means some analytics in your network, taking a look at what your customers are doing and potentially, if you are highly revenue driven, who you can charge more to avoid your NAT. There is also some impact depending on the type of NAT you are deploying. NAT64 needs a different set of ALGs versus NAT44, and DS-Lite needs a different set again, and we have dependencies in the CPE and in the carrier grade components when we move to distributed NAT functionality.

That said, the good news, everybody -- said somewhat cynically -- is that LSN44 can be deployed almost transparently. We have had some pretty good experiences here in APAC and in Europe and in some trial markets in the United States where we have dropped in a NAT and nobody has noticed. Now, as someone who comes from the perspective of not liking to break traffic flows and wanting the network to be as simple as possible, that kind of scares me. Coming from my day job vendor role, that makes me very happy: we built a product that works.

This is where the keyboard doesn't want to work. Application impact: we typically observe two areas. The first is inbound connectivity, where subscribers are running a server. The typical example could be a Skype client -- I want to receive a phone call, I am running some other type of IP telephony or video conferencing, I'm hosting a game on my Xbox at home. We need to allow end subscribers to signal that we need to allow inbound ports. Maybe we do that with a static port forwarding service through a portal, or we can use PCP for the CPE to signal to the carrier grade NAT: I would like an external port; please tell me what port and what my external IP address is. Or we don't NAT this type of subscriber, as I mentioned before.

The other area, where applications move away from being a straight protocol like FTP or HTTP to something right above that, is anti-abuse systems run by content providers. One of the points I heard about, and talked about back at APNIC in Busan a couple of years ago, was that Twitter has a rate limit on the number of tweets per second from a single IP address. If you have a lot of people tweeting from behind a single IP, bad things can happen.
There is a major e-retailer as well that looks on with a little bit of surprise when they see 100 subscribers behind a single IP address and 100 different accounts being used. To them, that looks like 100 accounts have just been compromised and a malicious user is trying to buy a whole bunch of books or something along those lines. There is a similar story for a fairly significant music, TV and movie content purchase and distribution system. This leads me into discussing, later on, the number of ports we overload.

The second area that operators have concern about, and Miyakawa-san touched on this, is redundancy and resiliency of NAT in the network. Broadly speaking, there are two areas we need to consider. The first is intra-chassis redundancy. We designed a CGN blade based solution where you load in blades based on your resource requirements. If you look at the top right, I have a chassis that has two green cards in it and one blue card. The way that we load balance across those cards is the same way that most routing functions work: the ingress line card makes the egress forwarding decision and it chooses where it sprays traffic. This means we do an ingress hash. When we have two available line cards supporting NAT, we hash the traffic 50/50. If one of the line cards fails, we either need to recalculate the hash or we need the line card to be backed up by another one. You can see there the card failing -- or it should be failing -- and a back-up card swapping in to take over. In an intra-chassis redundancy mode, generally speaking, we want to have a back-up card available, simply because a recalculation of the ingress hash would actually put subscriber traffic onto different cards and we would lose some of the state bindings that we have created. This is one of the approaches we took in order to build a very efficient but scalable NAT, but it comes with trade-offs.

The second area we look at and consider is inter-chassis redundancy. Theoretically it is possible to support, and if you have looked at some more enterprise NAT deployments or products, you have probably seen it is possible, with active-active or active-standby. When we crunched the numbers, it wasn't so much about the traffic you needed to run between the two nodes: with 6 million flows and 30,000 transactions a second, you have about 32 megabits of traffic running between the two nodes to synchronize state; at 100,000 transactions, it is up around 100 megabits. The problem is that this actually takes away resources from establishing new flows. We are spending a lot of time synchronizing state, making sure the state is in sync, because if the state is not completely and totally in sync, it is just as bad as having no state sync whatsoever. So far we have actually made the decision that we will do active-standby inter-chassis with no state sync. If a chassis is going down hard, there is going to be so much churn in the network anyway that some interruption is to be expected. It is not ideal, and we are introducing fragility into the network; this is one of the things that personally worries me about NAT. But so far this has worked well enough, and if we have a tightly coupled NAT that is very close to the subscriber, perhaps in the first IP node the subscriber hits, then if that chassis is going down because of a fibre cut, power failure or planned reboot, the subscriber was going down anyway. So we have this coupled behaviour, which also allows us to scale NAT, particularly in terms of resource utilization on the chassis.
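As an editorial aside, the sync-bandwidth figures AJ quotes can be reproduced if one assumes a state record of roughly 130 bytes per flow setup; the record size below is that assumption, not a published number:

```python
# Reproducing the speaker's inter-chassis sync arithmetic. The per-record
# size is an assumption chosen to match the quoted figures.
def sync_mbps(setups_per_sec, record_bytes=133):
    return setups_per_sec * record_bytes * 8 / 1e6

for rate in (30_000, 100_000):
    print(f"{rate:,} flow setups/s -> ~{sync_mbps(rate):.0f} Mbit/s of sync")
# 30,000 flow setups/s -> ~32 Mbit/s of sync
# 100,000 flow setups/s -> ~106 Mbit/s of sync
```

The raw bandwidth is modest; as the talk notes, the real cost is the CPU spent keeping that state perfectly in sync instead of establishing new flows.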
When we look at logging impact, this is also a big consideration for operators; we have heard from the other speakers why it is important. We see a few different deployment models in the field. One is no logging whatsoever. In fact, some operators have said, "We don't really even want to be able to see transient state, just in CLI or in SNMP, because maybe the regulators will say, 'You do have some capability to log and you had better keep that transient state forever.' Please don't let us see that."

We see pretty simple logging, and I have an example where we break an external IP up into port ranges of about 500 ports per block and generate a logging message which says this block of 500 was assigned to this external IP at this time. Timing precision is important. As Geoff mentioned, we need to make sure that when someone comes to us and asks, "Who was using this IP address?", we have the port information provided to us, or we will never be able to identify the subscriber.

That brings us to: do we need full flow logging? Just the same way we might use flow data for analyzing end-to-end traffic across the network, we can do this with NAT as well. We have seen some jurisdictions say that if you are deploying NAT, you need to do full logging using IPFIX or other flow exports, where we can see every translation, every source port, every destination port, every source IP, every destination IP, the duration the flow was in flight for, et cetera. This translates to a lot of data to keep and warehouse. We see customers saying, "We need to do this, but we want a turnkey solution." Now, we build routers and build NAT, but we don't build data warehousing systems; we are not very good at that and it's not a business we want to get into. We push it back to the operator and say, "If this is a requirement, there are companies that are good at it, but they charge a lot of money." This tends to defer NAT deployment in the operators that have this very strict requirement. Unfortunately, most of the operators facing this haven't yet faced IPv4 exhaustion, so that tells you the regions where we have seen it. Some operators have had to pursue an IP address acquisition strategy simply because they decided flow logging was so expensive that it was cheaper to buy more IPs. A flow logging message would look like this -- this is a NAT64 example, based off the IPFIX NAT logging draft -- and it gives you an example of the type of record you would see.

That brings me to the final point: how much overloading is too much? This is always a question that comes up. I personally try to push operators to very low overloading: 2:1 to 5:1 is probably the range I would recommend for initial deployments. When I see an operator coming to me and asking, "I would like to do 100:1 subscriber to IP address overloading from the first day I turn on my NAT," I go, "Why do you want to do that? If you have enough IPv4 resources for your subscriber base today, going to 5:1 is giving you a huge reduction in actual IP consumption. Are you sure you want to go down the path of 100:1?" It certainly introduces some fun, particularly with anti-abuse systems, but also in the logging requirements. If you have 2:1 or 5:1, the number of problem cases reduces substantially if the only input data you get from a law enforcement agency or abuse report is an IP address. This also leads us to how quickly we need to recycle ports.
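As an editorial aside, the 500-port block logging model AJ described a moment ago can be sketched very compactly; the field names and message format below are invented for illustration, not Alcatel-Lucent's schema:

```python
# Minimal sketch of block-based NAT logging: assign ~500-port blocks per
# subscriber and log one line per block assignment, not one per flow.
# Invented names and format, for illustration only.
import time

BLOCK_SIZE = 500
FIRST_PORT = 1024

def assign_block(subscriber, external_ip, block_index):
    start = FIRST_PORT + block_index * BLOCK_SIZE
    end = start + BLOCK_SIZE - 1
    ts = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    # One log line covers 500 translations; a precise timestamp lets a query
    # of (external IP, port, time) map back to exactly one subscriber.
    return f"{ts} NAT-BLOCK-ASSIGN sub={subscriber} ext={external_ip} ports={start}-{end}"

print(assign_block("subscriber-0042", "203.0.113.17", 0))
# e.g. 2014-02-26T01:23:45Z NAT-BLOCK-ASSIGN sub=subscriber-0042 ext=203.0.113.17 ports=1024-1523
```

One line per block instead of one per flow cuts the log volume by orders of magnitude, which is the trade against full IPFIX flow export.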
If we have big port blocks available for subscribers, we don't need to recycle ports so quickly or start playing with TCP and UDP timers to wind down how quickly the NAT will close out a translation. As Geoff mentioned, that can be pretty bad. I have sat behind some very bad hotel NAT before, which made it impossible for me to keep up my IPv6 tunnel back to the office unless I ran a constant ping to keep the UDP translation that IPsec runs over active in the hotel NAT. We give operators a lot of tools to tune this, but we actually follow what was defined by the IETF in RFC 4787 and RFC 5382, which define some of the behaviours that NATs should adopt to be conservative and break things the least, or suck the least. I would also say: try to keep your overloading as small as possible and don't go insane. You get a lot of advantages and a lot of benefit from 5:1.

In summary, NAT has been deployed in a number of networks; we have done it, other vendors have done it. As Sunny mentioned, wireless operators have been doing this since the invention of packet data, but wireline operators are learning it now. There is a lot of operational overhead: port block management, logging management, outside IP address management -- how do we swing pools around on different NATs and load balance the pools? If you go for a centralized NAT model, you need to think about what it does to your traffic engineering. Are you suddenly carrying a lot of additional traffic in your network core to get to the NAT that previously would have exited out of local peering and transit arrangements? We have new redundancy and resiliency engineering choke points that we have to consider, and we have to consider how we couple typical network failures with NAT failures as well.

Finally, any NAT deployment we work on is always strongly encouraged to deploy IPv6 at the same time. We are pretty successful with the argument, "It means you pay us less because you buy less NAT." Sales guys hate me for saying this. I am not incentivized on sales, so I don't care, but it gives me a little bit of an ability to sleep better at night by trying to get people to deploy as little NAT as they can get away with. Hopefully that was useful from our perspective; I tried to keep it as short as possible. Thanks.

>>Dean Pemberton: Well done. Thank you.

APPLAUSE

Just trying to sum up all of the four speakers we have had today: we have heard that all is not rosy with CGN, and there are pitfalls. All is not rosy with what vendors tell you their CGNs can support, and whereabouts you might want to put them in your network, and you definitely want to do some testing of that. Some form of CGN is pretty much inevitable -- people have been doing it for so long anyway -- and there are deployments and there are workable solutions.

In the couple of minutes we have left, I want to throw it open to the floor and get a bit of feedback. From anyone who is currently deploying a CGN: what were the pros and cons you thought about, what are the trade-offs? Or from anyone looking at deploying a CGN: how do you think that deployment will change over time? Will it be in one mode now, just post IPv4 exhaustion, and how will that change as we look at more IPv6 adoption? Does anyone have any input from the floor?

>>Srinath Beldona: I am with APNIC. Before this, I used to work for Wipro and we did a solution for some of the big providers in India.
One of the challenges that comes up in the mobile environment is that when you export data from the CGN device to the logging devices, which are required from a lawful intercept perspective, there needs to be some integrity of the information. The logs that are being collected can be lost because of network conditions or anything. So what is the solution that is being looked at for that? I think this question is more towards Alastair.

>>Alastair Johnson: That is a good question. We have had operators that have said, "Yes, we want something more reliable than just simple Syslog." We actually offer in our platforms a RADIUS-based logging message, so you get confirmation that the log was received by the receiver. When you move to a full detailed logging protocol, like IPFIX, the overhead of trying to acknowledge every transaction becomes enormous. We can correlate approximately -- we have exported this many flows, the flow receivers received this many flows -- so you can see if you lost anything, but it's not always possible to pinpoint what was lost. This is a huge amount of state and a huge amount of logging information to keep; it is very difficult to do this accurately 100 per cent of the time, although I think everyone in this space tries as hard as they can.

>>Shin Miyakawa: Geoff used static assignment, so there is no need to log; that is another solution.

>>Kaname Nishizuka: I would like to add a comment about the conference network. The conference network, as Shin Miyakawa said, ran on a very tight schedule, and there was no complaint about the network experiment except one, which was about Google Cloud Messaging. Google Cloud Messaging uses the high ports 5228 to 5230 and keeps connections idle for a very long time; if the port mapping times out, the connection is not stable, so Google Hangouts could not work in that situation. The new recommendation for CGN is to set the port time-out long for those specific port numbers. That is important to know about carrier grade NAT.

>>Dean Pemberton: Thank you very much. Any more questions before we break?

>>Pindar Wong: A quick question about NTP and time synchronization. Any views on how that might evolve, or should evolve?

>>Dean Pemberton: On that note, I would like to thank everyone for coming, and I would like to thank the panelists. Thank you very much.