APOPS session transcript
Table of contents
- Realities of IPv6 IPSEC deployment
- Experience sharing: IPv6 at the Hong Kong Internet Exchange
- IPv6 address architecture on P2P link
- What's happening with four-byte AS numbers?
- The Day in the Life of the Internet project
- RIPE NCC Information Services
- IP emergency services
- DNS: report on security advisory
Wednesday, 27 August 2008
APOPS
1400-1530
PHILIP SMITH:
Hello, I think we should try to make a start to this afternoon's session. This is the first of two operational sessions, I suppose, with what we've called APOPS, or the Asia-Pacific Operators' Forum. So before I get launched into what APOPS is all about and what the agenda for the afternoon is, I have the usual housekeeping list I have to read out to you. It is quite a substantial list of items, so please bear with me.
The first item is that we would like to thank the session sponsors, Google and the DotNZ registry.
Secondly, the Helpdesk is located in meeting room 7 so you get there by going out the door at the back and turning right and up the stairs and the first room on the left.
If you have any queries at all, please visit the Helpdesk.
Thirdly, the prayer room is literally next door to this hall, Hall C. Go out the door and turn right and right again and you can find the prayer room in the Star Room next door.
We've all had lunch, so I don't think I need to mention that again. Item five, the hostmaster consultation, that's also available if you go and book an appointment via the Helpdesk, the hostmasters would be delighted to meet you as well.
Tonight, of course, is the Vocus cultural night. I know most of you picked up a nice big A3 sheet as you walked out before lunch this morning. Basically, you'll experience a night of Maori culture, with a Maori performance and a hangi. Please assemble at the Convention Centre main entrance, out towards my left looking out from the back here. There are several buses leaving; the first one will leave at 6:50pm and the last one at 7:10pm, so I would suggest it is in your interest to be around by 6:50 to ensure that you don't get left behind.
The first buses will leave for returning to the Convention Centre at 10:20pm.
As we will have a short tour around the village, please wear comfortable shoes and wear warm clothing. Most of you have got your APNIC fleeces so I'm sure that that would help to keep the winter chill out.
On Friday evening, there is an informal dinner, and we're heading to the restaurant on the Gondola. I really recommend that experience; you'll get a fantastic view, if the skies are clear, of Christchurch, the Banks Peninsula and Lyttelton, and cuisine made from the best of the Canterbury ingredients. Again, highly recommended. So, if you're interested in coming, transportation is provided and the cost will be NZ $70. If you want to go, please register by the end of today at the registration desk, as spaces obviously have to be booked and transport has to be arranged.
Looking at the remainder of the program today, after the APOPS session, we have lightning talks that start at 6:00, or probably pretty much after the APOPS session finishes. We've had some good submissions for lightning talks, so you're encouraged to stick around. When the lightning talks finish, we go and catch the bus to the cultural evening.
For tomorrow, the big event is of course the Policy Special Interest Group. What you say and, indeed, what you do not say will influence the outcome of policy discussions and proposals that have been made to change APNIC policies. They can affect you and they can affect your network's number resources. Please come along and especially please express your opinion. If you do not express your opinion, you can't really be justified in complaining afterwards that nobody asked you. So, please come along to the Policy SIG tomorrow. Then, tomorrow evening, we have the APNIC Vendor Reception, so we start your Thursday evening with the APNIC Vendor Reception and enjoy a Christchurch nightclub with a game of billiards, finger food and beverages. And those are all of the lengthy housekeeping notes; I'm sure they'll be repeated again later on at some stage today.
Let's get on to the agenda. As I was saying, this is the Asia-Pacific Operators' Forum. A little bit of background for those of you who don't know what all of this is and are wondering what on earth this is doing in the middle of an APNIC meeting. APOPS is really the operational content, the operational piece, within the APNIC meetings and, for those of you who have been to APRICOT, within that conference as well. We've been stumbling along and looking after the APOPS thing for the last few years. The website is www.APOPS.net. Basically, it just points out where the APOPS sessions are, where the next events are and so forth. There is a mailing list, of course. The mailing list has probably been running for at least 10 or 12 years now. It was fashioned after NANOG, but thankfully, or maybe not thankfully, it is a lot quieter than the NANOG mailing list. In fact, it is probably a bit too quiet for our liking, but it is the Asia-Pacific region's operational mailing list. If you're interested in seeking help from others or discussing operational issues, the APOPS mailing list is the place to go. Subscription information is there on the slide if you want to follow it. So, as I was saying, APOPS is part of the regular APNIC program. We do a general call for contributions. There is a program committee, for want of a better description, which reviews the content. These are the Special Interest Group chairs, who work with myself and the APNIC Secretariat to determine what content fits into the session.
You will probably have noticed that the APNIC Special Interest Groups are not meeting, with the exception of the Policy and NIR SIGs. The other SIGs are not meeting, and the content each SIG would have generated is appearing within this session. So I think I should probably move on, because I'm rabbiting on too much here. Before I introduce the speakers: if you have any questions, please use the microphone and please say who you are, because this session is webcast and it is useful for the online participants to know who you are.
Also, please have some sympathy for the stenographers who are trying to capture all spoken words, so I'd like to ask speakers and those asking questions to be clear, state who you are and so forth, so that we don't end up with confusion on the screen on my left. So, without further ado, I would like to introduce Merike Kaeo, who will be talking about IPSEC in IPv6.
Realities of IPv6 IPSEC deployment
MERIKE KAEO:
Hello and good afternoon. Here I'll be talking about IPSEC and current deployment in IPv6, which is probably closer to zero than I would really like. But I am going to cover basically what is happening within the IPv6 standards process. Basically, it is done, but as with everything else, it is still evolving, and I think that's a good thing, right, you want to keep making things better. I'll talk about practical deployment considerations, some personal observations and also sample configurations. While I'm not going to go through the sample configurations in detail, if you go home and have some equipment, there are examples of how you can configure using Linux devices and Vista and also Cisco.
So, for those of you who may not be familiar with IPSEC, just in a nutshell, there are three main components: the Authentication Header (AH), the Encapsulating Security Payload (ESP) and the Internet Key Exchange (IKE). AH, not a lot of people use it, and every now and then people say, yep, we need it, but the reality is that the standards say you must support ESP and may support AH, and most vendors don't really test AH too much. Neither is it tested much in interoperability events. So that's just a practical piece of information.
I went through the list of standards. Now, if you start looking at IPSEC, you're going to be bombarded with about 40 or 50 documents, so for those of you who are at all interested in looking at it, these are basically the ones that I consider relevant. I've narrowed down a bunch of 50 documents into these, which are basically the most current ones and the ones that would be most relevant to you if you have any interest in configuring it and want to understand pieces of it in more detail. The standard is complex, but my take has always been, for the last, oh, 10 or 11 years, that the implementation shouldn't be, and definitely not user configuration; it should be as easy as routing. OK, both are very complex, but you know, people don't really need to know how it works to configure it.
There is an IPSEC maintenance working group. The first meeting was at the last IETF in Dublin in July, and there's a specific charter item in there for IPv6. I've written down the sentence as it is in the charter: basically, there's a standards-track extension to IKEv2 that provides full IPv6 support for IPSEC remote access clients that use configuration payloads, and the draft listed there is the one that is specific to that. There's also some work that is relevant if you're going to be using IPSEC in your environments, and I'll get to that in a later slide.
So, it's been interesting. I mean, really there's no difference in terms of IPv4 and IPv6 as to whether or not to use IPSEC. The only real difference is that we have a small window of opportunity now to get vendors to have consistent implementations in terms of defaults and making configurations easier. You know, if we lose that window of opportunity, then we're going to be in the same mess that we were in before, and people probably won't use IPSEC because the operational headache is just too much to bear.
But, considerations for using IPSEC: I always ask people, when they're dealing with v6, to think about it really carefully. I mean, for years and years, the v6 marketing kept saying "Security is built in". They were referring to the fact that implementations would be required to support IPSEC and that it would therefore be ubiquitous. Think about whether or not it makes sense from an end-to-end perspective in your environment with the size of the network. Critical is how trustworthy the end hosts are, and whether you can set up communication policies between the end hosts.
There used to be something called opportunistic IPSEC. Whether or not people are going to be using that more, I don't know, but to actually use IPSEC today, you have to set up security policies on both ends. Make sure vendors support it, and then, you know, you also have to ask whether other mitigation mechanisms are as good or good enough, or whether IPSEC would actually provide some improvement. So, in terms of deployment issues, some non-vendor-specific issues. There's a historical perception that the configurations suck, and well, they don't suck as much any more! But they're still pretty horrible, and that doesn't have to happen; from my personal perspective, it is because there are no defaults. The historical perception that it is not interoperable is basically a fallacy today; what you see is operational error, people who do not know how to configure it in a heterogeneous environment, and I think that's really the vendors' fault because of the terms that they use.
From a performance perception, it's been really interesting to hear really well-respected security folks say, oh, of course, we all know IPSEC performance, you know, it's horrible, let's come up with new key mechanisms and blah, blah, blah, and I go and say, where's the empirical data? Where's the proof? I don't see it. I personally come from a performance background, and when I look at the difference for using IPSEC in an integrity setting, where you're basically doing hash functions, what's the difference? Sure, you're going to have more state, and I keep asking the implementers of IPSEC, is there really that much overhead in terms of state? And they're like, no, not really. If there is a performance issue, I want to see the empirical data, and if not, I want to find out whether that's a myth or reality.
The standards need cohesion and IPv6 certification needs cohesion. About a year ago, we were at the Chicago IETF and there were about 10 people that sat around together during one of the breaks: the people who were doing IPv6 Ready logo testing, people from the TAHI Project, people from the big interoperability event consortia and other people that were heavily into IPSEC and IPv6, and we all hashed out what kind of testing we decided would need to be conformed to, and that was about a year ago. So I'm very happy to say that there's actually cohesion on that front.
Vendor deployment issues: this is really the big problem in my mind. Lack of interoperable defaults. I ask the IPSEC people, what's the issue? And they say, we don't want to define somebody's security policy. Have you ever run a network? A default is changeable. Let's get together and figure out what we think should be the defaults for the 80% of people who just know that they want some integrity protection and maybe encryption. That's all they know. The other 20% are going to be smart enough to change their defaults.
Configuration complexity: there are way too many knobs. The vendor-specific terminology is interesting, because everybody is making up terms, and that's why people think that IPSEC is so complex: there are four different terms and they actually all mean the same thing.
Now, the good news is that actually most vendors do support IPv6. About a year and a half ago, I was the only non-vendor at the IKEv2 interoperability test, and that was solely because I had the opportunity to speak at one of the forums and they were like, oh yeah, we should do IPv6 testing, so I was there to do some tests. I wasn't there this past March, but I did see the IPSEC tests that they ran through.
So, some other things that I want to explain, just so everybody is on the same page: with IPSEC ESP, you can run in two modes, tunnel mode and transport mode. All VPN implementations run in tunnel mode, where a gateway runs IPSEC on behalf of the hosts whose traffic needs to be protected. When the two endpoints of the communication are themselves the devices doing the encryption and decryption, basically that's called transport mode. I've actually had drafts on IPSEC performance testing running forever; you know, depending what time of day it is, I'm either interested again or I'm not, because nobody else seems to be interested in IPSEC. But I've done work on performance testing of IPSEC to get everyone to compare apples to apples when doing testing, and I've been asked many times to get rid of transport mode testing, and I said, "Are you crazy?", because this is what we want to achieve, so I'm not going to listen to you as a vendor; I want to see what is real, maybe, in future networks.
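A rough sketch of the two packet layouts being described (this diagram is illustrative, not from the slides; extension headers are omitted):

    Transport mode (end host to end host; only the upper-layer payload is protected):
      [IPv6 header][ESP header][TCP/UDP + data][ESP trailer][ESP ICV]

    Tunnel mode (gateway to gateway; the whole original packet is wrapped):
      [outer IPv6 header][ESP header][inner IPv6 header][TCP/UDP + data][ESP trailer][ESP ICV]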
And one of the things I started thinking about was, you know, we all know, if we're running an operational network and some of us might be in the security arena, that the port scans out there are absolutely ridiculous. There are hundreds, probably millions of scripts that are run per day, and port scans, you know, are probably one of the first things that people do for reconnaissance. Just as a thought exercise, if you had IPSEC integrity using ESP, would that alleviate some port scans, where the potential hacker wouldn't actually get a reply? I've had some conversations this week where I've been told, if you just do filtering, that's almost as good. I think it's not, but just as a thought exercise, think about it and think whether that would really help you in your environment.
Another thing that's always surprised me is that people who are really smart in terms of networking, but may not be that experienced with security, always say, oh no, I can't use IPSEC, because then I can't use my network-based IDS and firewalls any more, and I'm like, why? And it's, if the packets are encrypted, I can't see inside them. And I'm like, IPSEC does not equal encryption. I'll repeat that: IPSEC does not equal encryption. If anything, I would hope that it was used for integrity protection first; think about confidentiality if you have a policy in place that says I need it.
So a couple of concerns when thinking about it from an IPv6 environment: are enough people aware that IKEv2 is not backwards compatible with IKEv1? It really depends on your vendor right now. A lot of the vendors shipping IPSEC that you can use with IPv6 all use IKEv1. For everybody that's implementing IKEv2, you know, it's not a sure bet that they'll fall back to IKEv1 in the same way, so this is something where there's a whole catch-22. I talk to the vendors, and I know most of the vendors in IPSEC because I've been a pain in their butt for 10 years, and the interesting thing is that they constantly tell me the same thing: no user is asking for it. And I know that no user is asking for it because it is too complex and not interoperable, and they use it for VPNs because somebody told them to. For me, it is this whole catch-22, and if we really think that IPSEC might be useable in our environments, let's fix the vendor issues, you know, make it easier to use. And also, if you're actually thinking that you might want to use it in your IPv6 environments, ask for IKEv2, ask when it is shipping, and ask how in the world you are going to know whether it is going to be backwards compatible with IKEv1.
It was interesting too, I started playing with v6 about four or five years ago and right away I wanted to try IPSEC and not everybody had it. I was like, oh, that's interesting.
OSPF has been really interesting, because initially, for v6, the standard just said use IPSEC, and everybody implemented AH, because that's how you do integrity. Just about a year ago, a draft became a standard, a document that followed the IPSEC architecture documents, where it says that ESP must be supported and AH may be used, and the whole time I was going, why isn't anybody bringing up this issue? And people were like, "I don't know". So you've got all of the implementations now that use AH, and some may or may not support ESP.
Another is the transport mode interoperability status. A lot of people don't use it, and there are also PKI issues when working with peers. Generally there are two modes that people use, either a pre-shared key or a PKI infrastructure, and there are also mobility scenarios. How are they actually going to test it?
And so, there are a couple of enhancements needed to use IPSEC in v6 environments. Dealing with standards, this issue came up about a year ago. When I talked to somebody about it, they were like, oh yeah, this guy just wrote a draft about it, and I was like, "Oh, interesting", because I was thinking about it at the same time. Basically, when they wrote IPSEC, they didn't take into consideration that you could get a prefix advertisement from a router and then the host could create its own IP address; they just thought that the host would have its own IP address already. So that's getting fixed. The second point is that now that people realise we can use ESP either for integrity only or also for encryption, how can we easily discern whether or not the traffic is encrypted? I'll point out why that's an issue, and why there is a discussion going on right now as to how to change things, if at all. And on the usability front, I'll be like a broken record.
So, in terms of being able to look at your ESP packets and figure out whether the traffic is integrity protected or whether it is confidential and encrypted, so that you can't use your Internet firewalls or IDS systems: basically, this shows what an ESP-protected packet would look like for IPv6. And the thing to point out is that you have, you know, the IPv6 header, and you might have some extension headers, and of course you'll definitely have the ESP header. The way ESP works is that you also add a trailer to the end of the packet, and this is where the issue comes in. You probably won't be able to read this, but the first blue parts are just part of the header.
The green is what's encrypted, and the last field of the green part is the Next Header field. Basically, I'm showing the entire packet, so what you have is the ESP header, but you also have the data. After the data, there's the ESP trailer, the blue part over here, and the ESP ICV, the blue square box over there. So what you have is: if you want to discern what the next header is, and whether the packet is encrypted or only integrity protected, you have to read the entire packet. If you think about performance enhancements and how hardware works, hardware does not typically read the entire packet; it reads in the first X bytes, and this is where the issue comes in. So people are now trying to figure out, well, can we do something heuristically? Or are we going to change things so the protocol changes, which means that you have to touch everything in the end host; i.e., do we create different protocol numbers so that right away you can say, this is ESP and it's integrity protected? That controversy is still going on, so if you're interested, the working group mailing lists are where the discussion is.
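A rough sketch of the layout problem being described (field sizes are schematic, not to scale):

    [IPv6 header][ext headers][ESP header: SPI, sequence number]
      [ payload data ... padding | pad length | Next Header ]   <- possibly encrypted
    [ESP ICV]

    To learn the Next Header value (TCP, UDP, OSPF...), a middlebox has to read
    all the way to the ESP trailer at the end of the packet, and it cannot even
    tell from the header whether that region is encrypted or integrity-only.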
Now, the biggest issue for me is not really those technical details, because they will get worked out and will get implemented. To me, the biggest reason people aren't using it is that every time you look at different vendors, they all pick their own little defaults, which means that as a user, you can't just have an easy configuration. You have to go and ask, what are the defaults? And what gets even worse are the terminology issues. You know, I talk in detail about IPSEC, I did it two days ago and I've done this very often in the last five years, and people always come up and say, man, you totally helped me. And you know, it is basically this slide here. You'll be able to configure God knows how much stuff, whereas before, you were like, I don't know what the hell I'm doing.
So, you've got IKE version 1, which works in two phases. For phase one, depending on which vendor you're trying to configure, one calls it IKE Phase 1, one calls it the IKE SA and one calls it the ISAKMP SA, and they're all the same thing. I think it is just that the marketing engineers don't know enough IPSEC; whatever the engineer called it, that's what it must be, so they just picked the word they saw referenced in the document, and they're all referenced there. The problem is that as a user who doesn't know IPSEC, guess what, you don't know they're all the same thing. The ones that really crack me up are the terms for IKE phase two: IKE Phase 2, the IPSEC SA and Quick Mode. That has to change; let's pick one.
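For reference, the vendor terms she lists line up like this (each row is one and the same thing):

    IKE phase 1  =  IKE SA    =  ISAKMP SA
    IKE phase 2  =  IPSEC SA  =  Quick Mode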
So, in terms of configuration, this goes back a few years. Between 2002 and 2003, it had to be 2003, I got myself in a room where someone had a Cisco box and a Juniper box, and I was on a kick to see if we could move beyond MD5 authentication for routing protocols; Dave Ward did actually have a draft out, but nobody was interested. He presented it at NANOG at my request. It was interesting, because they basically said, use IPSEC. The problem is the Juniper box didn't use IKE, the Internet Key Exchange, so you had to do something called manual keying, which, if you don't know what it is, is really complex. Cisco, now, you know, you would have this whole convoluted configuration, but they used IKE. What was interesting to me was, man, if you take the really easy way to configure it, like Juniper had done, wow, solution.
But it should be as simple as this, at least from their command line. And one of the things I specifically put down was syslog and TFTP. And they said, yeah, we know that we have to secure the logging, and TFTP is still not that secure. I'm like, why not just use IPSEC? Well, they're not going to use IPSEC, because it is way too complex to configure. Let's go around in circles again. So for me, let's just figure out as a community what the interoperable defaults are. This isn't the right community; the right community is the people implementing IPSEC, but if we decide to use it, we need to tell them that we want to use it and we need it to be easy to use. For me, my wish list: common terminology, interoperable defaults, and RFC 4308, which has these crypto suites, which are basically code words for defaults.
It was a good start, but they need to be updated. Interoperability tests for transport and tunnel mode. It would be nice to have the API standards completed, and it would be nice to have repeatable performance data, so that if people say your performance stinks, I can say, I want to see the proof.
So, this is just an example of a policy where I was saying, you know, this is pretty good; if you don't know what to use, use this. And the next couple of slides are not something I'm going to go through in detail. These are basically something that you can pick up from the website and use to play in your own environments; it is a cheat sheet. How do I play with IPSEC and not get too much of a headache? I always do this, I offer and some people take me up on it: if you try something, you run into a problem and you've tried debugging it, send me an e-mail and I can probably help you. I won't help you for weeks on end, but if you want to try it, I'll help you. So here is documentation for how to configure stuff. These are the ones that I thought were pretty good.
And the rest of the slides, I've done this with workshops, so I know that this works: here are the steps that you go through to configure the Cisco stuff. There are a lot of them; the terminology is very convoluted.
I myself prefer racoon, and I'm going to start playing with racoon again when I'm home, which I'm not very often. But anyway, there are just some pointers. In terms of using racoon, basically, this is what you need to do. There are a couple of files. The psk.txt file is the one that people mess up the most. They know how to edit racoon.conf and then they're like, ah, the peers don't create an association, and basically what happens is that there's no shared key or there's no certificate. Well, there's a file called psk.txt, so I use vi, and I vi the text file and put in the password. That's all it is; the file only contains your shared secrets for the peers, and that's it.
A couple of other things: there's also ipsec.conf, so there are three files to deal with in all. Here's how you actually create the security policy database and the security associations between the peers; you have to do this on both sides if they're both Linux boxes. And in the psk.txt file, simply enter, you know, the peer's IPv6 address, a space, and your shared secret, and put it in on both sides.
Also, make sure that you set the permissions, because, you know, this is a password; some people sometimes forget this. Just make sure that it is not readable by everybody. This is too small for you to see right now, but it is a sample of what the racoon.conf file should look like. And here's just how you would test it and create logging information, so if something doesn't happen, you can actually look at what the logging information says to see where it might have failed.
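A minimal sketch of the files being described, assuming a racoon/ipsec-tools install; the file paths, addresses and secret are placeholders:

    # /etc/racoon/psk.txt: one line per peer, <peer address> <shared secret>
    2001:db8::2    mysharedsecret

    # the file holds passwords, so make it readable by root only
    chmod 600 /etc/racoon/psk.txt

    # setkey policy (e.g. in /etc/ipsec-tools.conf): require ESP transport
    # mode between the two hosts, in both directions
    spdadd 2001:db8::1 2001:db8::2 any -P out ipsec esp/transport//require;
    spdadd 2001:db8::2 2001:db8::1 any -P in ipsec esp/transport//require;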
These slides you definitely can't see. I did do some testing on Vista, and I really like the new implementation. They've done a pretty good job of making the jargon more user friendly and not so geeky for people who really don't know the IPSEC standards, and I've shown screen shots of what they do, which I know you definitely can't see unless you've got x-ray vision. But this is just to help you out if you want to actually try playing with it. And the one thing that I like about the Microsoft implementation is that there's a button where you can say, "Where are the defaults?", because it is really nice to see, here are my defaults; at least from the other side, if you're in a heterogeneous environment, it is good to do that. Most things work there. I was in Kathmandu the other week at SANOG, and we were checking whether the defaults were the same, and supposedly they were, absolutely yes, but they weren't. So double check.
As a conclusion, IPSEC is definitely a complex standard. I mean, if you just went out and looked at the drafts and RFCs, the number of them that have IPSEC in them is astounding.
But really, the implementations and user configurations should be much, much simpler. Using IPSEC does not mean that you have to encrypt the traffic. Don't leave IPSEC out; everybody leaves it out. Play with it if you can. It may not make sense for you, you know. But I figure, if we don't think about it, then how do we know? And quite frankly, the window of opportunity to make things a lot simpler to configure is pretty fast running out. So anyway, thank you.
APPLAUSE
STEVE KENT:
Steve Kent from BBN. You included RFC 4301. It assumes IKEv2, so I'm just warning people: I'm not sure that I would mix it with the historical background of IKEv1 if people are going to be simultaneously looking at both, because it could be confusing.
MERIKE KAEO:
The reason I included IKEv1 is that currently every shipping implementation supporting IPSEC with IPv6 is using IKEv1.
STEVE KENT:
But not 4301?
MERIKE KAEO:
Yes.
STEVE KENT:
I worry that the inconsistency may be confusing.
MERIKE KAEO:
I don't agree that this is that confusing.
STEVE KENT:
I say that it is.
MERIKE KAEO:
OK, we agree to disagree.
STEVE KENT:
So, you're absolutely right, people screw up the terminology and it is inconsistent, and I'm 100% behind you on making it consistent. But in the interest of accuracy, they not only define different key lengths but different generators. That's why the other names are more appropriate names for them than key length.
MERIKE KAEO:
And you're correct. The thing is, I'm trying to do a synopsis, but Steve is absolutely correct, obviously. I mean, he was the co-author or author of many of the standards, so you know, trust him over me, OK. But don't ignore me!
RANDY BUSH:
Randy Bush, IIJ. Good talk. Steve, the reason the presentation is inconsistent with the specs is that we all have to deal with implementations that are inconsistent, and that split between spec and practice is what happens to me every day.
Where is the URL for the wiki? The vendors are not going to fix this next month, but we could paper over that with a wiki that says, here's an interesting set of defaults: on your Cisco, here's how to configure it; on the Juniper, here's how to configure it.
MERIKE KAEO:
I forgot to point this out: for IPv4, with Paul Hoffman, I actually did one of the first configuration profiles, for NetBSD, and if you go to the site, www.VPNC.org, somewhere near the top there's a whole slew of links, and one of them is configuration profiles. Under that link, there are 40 or 50 vendors with all of their products configured for a consistent scenario. It is only for v4 right now, but something like that to mimic for v6 would be nice.
RANDY BUSH:
For both. I just want, I'm old and stupid and I just want to be able to go somewhere and cut and paste and bring it somewhere.
MERIKE KAEO:
You know, I've been meaning to ask about that, and for most of the configurations, if you substitute the v4 address with a v6 address, it works. The issue I have is that it covers only tunnel mode VPN scenarios, not transport, but let's work on that. I agree with you.
STEVE KENT:
Just a quick response to Randy: you're absolutely right on the implementations versus the specs, it leaves a lot to be desired. The observation I was making is that when one reads 4301, it talks about a lot of things that do not exist in IKEv1, and that confuses people. I agree that for the most part it is the same, but if you're trying to read it to understand what's going on, you might get confused because of that.
MERIKE KAEO:
What is the number? 4301?
RANDY BUSH:
With all due respect to the authors of the documents, I wouldn't advocate reading one of them. This is an operators' forum.
MERIKE KAEO:
You know something, that slide, I actually did it this morning, and I was like, will I even put some references up there? And I thought, yeah, some people might be interested. But let's see how the implementations work. If they really stink and you want to use it, but you're not sure, have a look.
PHILIP SMITH:
OK, thank you very much for that. So, next up we have Che-Hoo, who will be sharing experiences of IPv6 at the Hong Kong Internet Exchange.
Experience sharing: IPv6 at the Hong Kong Internet Exchange
CHE-HOO CHENG:
Good afternoon, I'm Che-Hoo from the Chinese University of Hong Kong and we're running the Hong Kong Internet exchange. And we try our very best to be as professional as a service provider.
Before I go into details, I want to quickly go through an introduction of HKIX. We set up in 1995, and we run a layer-2 exchange infrastructure, so you can do bilateral peering, and the MLPA, the multilateral peering, is mandatory; you can see the route server AS in the AS path, and we do route filtering. If you do bilateral peering on top of that, you probably can get more routes from the peers.
We do implement port security, so you can only have one router for each switch port, and the best part of the exchange is that there's no port charge, because we're not for profit; but if you want a 10GE port, we do have to charge for that. The servers are still located in the university, and the exchange is considered critical infrastructure in Hong Kong, and that's why we were told not to do any change during the Olympics!
I'd better not take too long, because there is a time limit coming soon! OK, here is a simplified diagram of the exchange. We use a very simple layer-2 structure; the peering sessions go to the route servers, but the traffic goes directly between the participants rather than through the route server. We have HKIX2, a satellite exchange set up away from our campus around November 2004. It is linked up with two 10GE ports, but it is the same layer-2 domain, and you cannot do bilateral peering across the two sites.
To be honest, this is not a switch for an IX; it is more a switch for the data centre environment, but I can give more information after this session. Because metro Ethernet is very popular in Hong Kong, most participants connect to our switches over Ethernet.
Now the ISPs can connect directly to the exchange. There are even some overseas ISPs which connect directly through long-distance Ethernet services; those overseas ISPs do not even have routers in Hong Kong. And, well, we have more than 90 participants now, and on the multilateral peering part, we have seen 26K routes; right now, it is about 18K routes. People are tuning their announcements to control the traffic, so you see the fluctuation, and the traffic peak is over 65G.
From here, you can see a peak last Friday. The reason is that there was a typhoon in Hong Kong; everybody stayed at home, and they didn't have anything else to do, they just stayed online. About three weeks ago, it was also because of a typhoon. So whenever there are typhoons, we need to watch the usage closely.
And very coincidentally, the two typhoons, the first one happened just before the Olympics, and the last one just happened right after the Olympics. So very good, very lucky!
And another peak in this period, in June. Do you know why? It's because Euro 2008 was going on, and Hong Kong residents could only see the games by paying pay-TV stations, so people were actually using peer-to-peer to watch the games free of charge.
Well, we also try to keep intra-Asia traffic within Asia. Hong Kong is in a very good position; you can see a lot of overseas and mainland China academic networks with a presence in Hong Kong.
Hopefully we can carry out the plan for 2008 to replace the Cisco with a higher-end layer-2 switch, mainly because we need more 10GE ports. We want to support link aggregation with port security, and sFlow also; so, replace the one Cisco Catalyst 6513. If you have any suggestions, let me know. With the earthquake in December 2006, we lost 90% of the overseas connectivity. IP connectivity within Hong Kong was OK because of HKIX, but the top-level domain servers, including dot com and dot net, were not reachable locally, even though the IP connectivity was OK.
So after that, we tried hard to talk to VeriSign and tell them to set up something in Hong Kong and connect it to HKIX, and now they have successfully connected to HKIX. We have other players like Afilias and APNIC connected to HKIX directly or indirectly.
OK, moving to IPv6. Let's talk about Hong Kong: Hong Kong is lagging behind very much. I don't think you hear very much about IPv6 in Hong Kong. Well, we definitely need to catch up. In terms of service, only a handful of backbone ISPs provide IPv6 transit service in Hong Kong, such as NTT Com and Reliance Globalcom, which was called FLAG before. And very, very few retail ISPs provide IPv6 access service, and they are not active at all; examples are NTT-HKNet, which is a local NTT subsidiary, ECN and CITIC 1616. What they have is for business customers only, and there are no residential customers using IPv6.
We notice that two mobile phone operators are testing out IPv6, CSL/Telstra and China Mobile Peoples, but they are not actually using it yet. There is no IPv6 tunnel broker in Hong Kong, but hopefully Hurricane Electric will hit Hong Kong soon. There's no attraction for people to switch, because content providers are not ready for IPv6. As for academic networks, before HKIX did something on it, HARNET relied on Abilene for IPv6; of course, Abilene could not provide full routes, so even for HARNET, most overseas routes could not be obtained, and the traffic had to go through the US.
Well, of course, we are committed to helping Internet development in Hong Kong; we have done that since 1995, and we have supported IPv6 since March 2004. We are doing it with dual stack, and today, 16 different ASes have been assigned IPv6 addresses at HKIX and have joined our multilateral peering, and of course they can do bilateral peering also. As for root servers, there is an instance of the F root server, which is now supporting IPv6 transport at HKIX. While we are running dual stack, dual stack is a good thing, but a bad thing for us, because we don't get to see how much IPv6 traffic there really is; my wild guess is that the traffic is 0.001%. Here is the list of IPv6 participants at HKIX. As you can see, they are APNIC, mobile operators, academic networks, ASCC-ASNET.
CNGI-6IX, our own university, and then academic networks from Korea, KREONET2, and Reliance and Samsung, etc. This is a good mix of academic networks and commercial networks.
We recently did something, because we want to promote IPv6 in Hong Kong over HKIX. We noticed there are a number of academic networks which announce several hundred IPv6 routes to us, so we removed the route filters to allow people to do some kind of transit exchange, and we added BGP community tagging to distinguish upstream routes from downstream routes. And just last week, we had NTT Com announcing free IPv6 transit service for HKIX participants; the free offer will be available until December 31, 2008. If you are on HKIX and you want to do this, please contact NTT Com.
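For illustration, a hypothetical IOS-style sketch of the kind of community tagging being described; the AS number, community values and route-map names are placeholders, not HKIX's actual configuration:

    ! tag routes learnt from downstream (customer-like) sessions
    route-map FROM-DOWNSTREAM permit 10
     set community 64496:2000
    !
    ! tag routes learnt from upstream (transit-like) sessions
    route-map FROM-UPSTREAM permit 10
     set community 64496:1000
    !
    ! a participant can then match the tags to decide what to accept
    ip community-list 10 permit 64496:2000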
OK, some observations. Dual stack seems to be the norm for IXes. I think the main reason is that if you separate v4 and v6 into two different infrastructures, then people will not be too interested in joining the IPv6 part, because it will cost them an extra circuit and extra equipment, and it is not cost-justified; but with dual stack, there's no need to have separate equipment and connections for IPv6, so it is easier to justify. We see that most providers use the same AS for both IPv6 and IPv4. The only exception that I see is Pacnet, who use two different ASes for IPv4 and IPv6. But, as I said, because of dual stack and because of the Cisco Catalyst 6513, sorry for that, we don't have full knowledge of how much IPv6 traffic volume we have now. Hopefully with sFlow, we can have a better idea. We see that, especially for IX connections, using tunnelling seems not quite acceptable to the community, but anyway, in our case, because all or most of our participants are using physical circuit connections to us, it shouldn't be a problem.
The first day that we removed the route filters, the action that we took three or four weeks ago, someone from Germany immediately noticed that and immediately sent a message to the others saying that this was not good practice. But anyway, at least to me it is a good thing, because it shows some people do care about IPv6 routing, and because of that, we implemented the BGP community strings to distinguish routes of upstream and downstream. We also observe that the commercial providers care very much about routing and operations, and they treat v4 and v6 the same as much as possible; of course, their concepts of transit and peering are very clear, not like academic networks, and also, of course, they treat customers and peers differently.
Now, we have only a /64 for the IX, and because we have HKIX and HKIX2, and adopting a very conservative approach, we used a /120 for each. But I know APNIC has a new policy for IXes now; we can get a /48, so we probably will get a /48 for our two exchanges and then use a /64 for each. Of course, we will have to go through some migration process, but if we cannot do it now, I think we cannot do it in the future, because there will be more participants.
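As a worked example of that plan, using the documentation prefix as a stand-in for whatever APNIC actually assigns:

    2001:db8:0::/48      IX assignment under the new policy
    2001:db8:0:1::/64    HKIX peering LAN
    2001:db8:0:2::/64    HKIX2 peering LAN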
During the process, we also noticed that some participants blindly prefer routes learnt from us, from our multilateral peering, mostly because we cost less, so they prefer us by setting a higher local preference, but this is probably not a good thing. So, in the future, we will try to implement more BGP community tagging for them to control their traffic more easily, and then tell them not to blindly prefer routes from us. Also related to the exchange, many, many times people ask us to do blackholing, but of course we cannot do that unless there's a new feature. Another observation related to IPv6: HARNET, the academic network for the universities, has a /32 allocated by APNIC, but each university was assigned a /48, and some universities are multihomed; there is one university connecting to overseas academic networks, and that university was assigned a /35, and it's very, very messy and very difficult to control the inbound traffic. So I think that this should be changed, and hopefully HARNET IPv6 activity can be improved with better collaboration and co-operation.
Another observation: MyAPNIC. I've just learnt that there is a new version, but I'm a bit disappointed that it still does not include features like allowing easier reverse DNS object set-up for v6, and also it doesn't give the same treatment to route6 objects as to route objects. Hopefully in the next release from APNIC this can be improved; for better IPv6 usage, MyAPNIC should provide the same functionality for IPv4 and IPv6.
Just recently, there was a change of policy for the initial allocation criteria, and it seems that not many people are aware of that. As you may know, if you are an APNIC member and you have an existing IPv4 allocation, you can get an IPv6 allocation more easily now, but it seems it is not well known, so I'm trying to promote this within Hong Kong. We also see top-level domain servers that do not support AAAA records or IPv6 transport; specifically, .hk does not support the records. That's why, this morning, when I tried to connect back to my own network, there was a problem: although we are v6 enabled, there was some problem with the quad-A records there.
Well, Hong Kong is still far from universal deployment of IPv6, especially in commercial networks, because nobody is pushing and there's no demand, and IPv6 knowledge is poor. Our university is trying to use HKIX as a platform to promote v6; hopefully there will be at least more v6 trials in Hong Kong. And the last point I want to make is, Macs support IPv6, but I don't know why iPhone 2.0 does not support IPv6, and I couldn't use the v6 services this morning using my iPhone. Although I love it, I hope iPhone 2.1 or 2.2 can support IPv6. OK, that's it.
APPLAUSE
PHILIP SMITH:
Any questions for Che-Hoo?
GEORGE KUO:
I have a question, or rather a comment, regarding MyAPNIC. Just to use this opportunity, for everyone: I'm in the process of reviewing the procedures in MyAPNIC, and I'd like that feedback from Che-Hoo, and I would like all of you to send your feedback to me or to the Helpdesk.
PHILIP SMITH:
OK, thank you, Che-Hoo. Next up, we have Matsuzaki Yoshinobu from IIJ, who will be talking about IPv6 address architecture on point-to-point links.
IPv6 address architecture on P2P link
MATSUZAKI YOSHINOBU:
OK, I am Matsuzaki Yoshinobu from IIJ. Today I will talk about an IPv6 operational issue. Good, OK. So, I will go fast. IPv4 and IPv6 are similar in routing; the concept is almost the same, but IPv6 has more bits in the address field.
This is a very basic scene: direct delivery on the same segment. In the IPv4 case on Ethernet, ARP is used to resolve the MAC address. In the IPv6 case, direct delivery on the same segment works the same way: a MAC address is needed on the Ethernet, and neighbour discovery is used to perform the resolution. Then, let's talk about routing. Here is a network: there are two inter-router connections and there are two router-host segments. This is the IPv4 case. We usually assign appropriately sized net blocks like a /28 or /30 based on the needs.
OK, this is the IPv6 case: we can assign a /64 everywhere. Then, point-to-point links. A point-to-point link is mainly used as an inter-router link; typical examples are POS, serial and SONET. It is useful for long-distance cabling and inter-country or transpacific networks, and it is easy for troubleshooting. A tunnel is also a point-to-point link. The router just sends packets to the opposite router via the link, so we don't need layer-2 address resolution like ARP on a point-to-point link.
Before, we used to configure the addresses of the individual routers on each point-to-point link, but nowadays, we use a /30 or a /64 for the point-to-point link, as if there were a segment on the link. It's easy for us, because we don't have to care about the media of the link. OK, this is a /30 on the link: the first address is the network address, the second and the third are for the routers, and the last address is the broadcast address, in the IPv4 case.
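A worked example of that /30 layout, using a documentation prefix:

    192.0.2.0/30 on a point-to-point link:
      192.0.2.0   network address
      192.0.2.1   Router A
      192.0.2.2   Router B
      192.0.2.3   broadcast address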
Then, this is a /64 for an inter-router link. The first address is reserved as the subnet-router anycast address, you need two addresses for the routers, and all the others are left unused. Sometimes, operations people want to assign a tighter prefix for the inter-router link, so let's consider a /126 for the link. The first address is still reserved for the subnet-router anycast address, but there is still one more unused address there. Let's imagine a packet coming to an unused address. On a multi-access segment, layer-2 address resolution is performed to find the destination; there is no such host in this case, so the host is unreachable. But in the point-to-point link case, the packet will simply be sent on the link.
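A worked example of the /126 case, again with a documentation prefix:

    2001:db8::/126 on a point-to-point link:
      2001:db8::    reserved (subnet-router anycast)
      2001:db8::1   Router A
      2001:db8::2   Router B
      2001:db8::3   unused; a packet for it is still sent onto the link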
Suppose one assigns a /24 to the point-to-point link. We never do this, but let's assume it: .1 is for Router A and .2 is for Router B. In this case, what happens if the destination of the packet is .13? Yes, the packet will loop on the link until the TTL expires.
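Step by step, with a documentation prefix, the loop looks like this:

    198.51.100.0/24 on the link, .1 = Router A, .2 = Router B
    A packet arrives for 198.51.100.13:
      Router A: not my address; the /24 points out the p2p link -> forward
      Router B: not my address; the /24 points back out the same link -> forward
      Router A: same decision again ... and so on, until the TTL reaches zero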
So, a packet destined for an unused address on a point-to-point link can loop on the link. In the IPv4 case, there is almost no unused address space, because we usually use a /30 or /31 for these links, but in the IPv6 case, there are many vacant addresses on the link. Of course, this issue was noticed early and has been discussed, and we have an RFC.
So, RFC 4443: there is a special case in the ICMPv6 error messages. The RFC says that when you use a /64 for a point-to-point link and a packet is destined for an unused address like ::13, if the incoming interface and the outgoing interface are the same one, and the destination address is on the point-to-point link itself, the router should drop the packet and send back a destination unreachable message rather than forwarding it onto the link.
The first mitigation is link-local addressing for the inter-router link. In the IPv6 case, the router does not require a global address for an inter-router link; only the loopback interface needs a global address, and neighbouring routers can exchange routing information using link-local addresses. But there are several issues, of course: we cannot ping the link from a remote site to monitor it, and for eBGP sessions we have to rewrite the next-hop address. The second mitigation could be a packet filter, what we call an infrastructure ACL: we allocate the address block for the infrastructure first, and then put in a packet filter that denies packets from outside to those addresses. IPv6 has more bits, so we can carve out infrastructure addresses more easily than in the IPv4 case.
But the issue is how to maintain these kinds of packet filters, and we also need special exceptions for inter-AS connections like eBGP peering addresses.
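A hypothetical IOS-style sketch of such an infrastructure ACL; the prefix and peer addresses are placeholders:

    ! assume 2001:db8:ffff::/48 holds all infrastructure (router) addresses
    ipv6 access-list INFRA-ACL
     ! exception first: the external eBGP peer must reach our peering address
     permit tcp host 2001:db8:ffff::2 host 2001:db8:ffff::1 eq bgp
     ! then deny everything from outside into infrastructure space
     deny ipv6 any 2001:db8:ffff::/48
     permit ipv6 any any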
Then there is the /127 for the inter-router link, like the /31 in the IPv4 case. There are only two addresses in a /127, so there are no vacant addresses on the link. And this was actually discussed before.
There is an RFC whose title is "Use of /127 Prefix Length Between Routers Considered Harmful". The concern is that the first address is reserved for the subnet-router anycast address, so it might cause problems in the future. But almost no routers support the subnet-router anycast address at this moment, so we could write an RFC saying do not use the subnet-router anycast address with /127 addressing. But this brings another special case, and there could be issues with prefixes other than the /64. Personally, I'm pushing for the /127 for point-to-point links. The key thing is that these are spec issues, not implementation issues, so we have to check implementations before using it, and please, please let vendors know when you find an issue on the equipment. Any questions?
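A minimal sketch of /127 addressing on both ends of a link, with documentation addresses chosen so that neither end lands on the all-zeros subnet-router anycast address (as he says, check your implementation's /127 support first):

    ! Router A                           ! Router B
    interface pos0/0                     interface pos0/0
     ipv6 address 2001:db8::a/127         ipv6 address 2001:db8::b/127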
RANDY BUSH:
Randy Bush, IIJ. Flip back one, please. Yeah. The first address is reserved for anycast; am I correct that the word anycast there is not anycast as we know it and use it on the Internet? This is another case of, "Leave your RFCs at home, we're now entering reality", and that is not a real problem.
MATSUZAKI YOSHINOBU:
OK.
RANDY BUSH:
The anycast we know is the same address being announced from multiple places on the Internet. This was some artefact deep inside IPv6 that nobody uses, nobody ever will use, etc., so there is no real problem with use of /127. Am I correct?
PHILIP SMITH:
Any other questions? If not, thank you very much.
APPLAUSE.
PHILIP SMITH:
Now, we have a bit of a scheduling issue, but rather than making George put his presentation into three minutes, we'll have a break now and come back at 3:50. And Geoff, I hope you're OK with coming back after the coffee break and it will run a bit longer than the 5:30 advertised finish, but then the lightning talks don't start until 6:00 so we'll steal a bit of time at either end. Please have the coffee break now and we'll come back at 3:50.
(End of session)
APOPS
Wednesday, 27 August, 2008
1550-1730
PHILIP SMITH:
We are running a bit late, and rather than shoe-horning Geoff into five minutes, we will go straight into the first item of the second session. There are five presentations. I apologize, we are going to steal a bit of the break between APOPS and the lightning talks at 6pm, but we will try to get through the agenda as best we can. If each speaker can be poised for when the previous speaker finishes, it would be appreciated. First up, we have Geoff, who is going to be talking about 4-byte AS numbers.
What's happening with four-byte AS numbers?
GEOFF HUSTON:
Do I need the thingo? I don't need one. Good afternoon, everyone, or those of you who are left. I'm shifted to the right? Why? This is 7/8ths of me; the rest has disappeared into no-bit land. The 16-bit AS number field is defined by the well-known protocol BGP, buggered if I know what it is. If you know, please tell me. You want to try to fix this? Woo!
Expertise! Catch that man, frame him, stuff him, use him again.
A long time ago, and I think it was around 1989, in defining BGP, they needed a number system to be able to uniquely determine the path of a route in order for BGP to be loop protected. The only way you know a loop is happening is if the path contains your identifier. At the time, BGP was meant to span around about 200 or 300 routing domains, so the idea of a 16-bit field with 65,000 identifiers seemed relatively conservative. Also, we were working, at the time, on the next generation of inter-domain routing protocols, which was IDRP or something, which was meant to be the mother of all protocols and do everything, so we wouldn't need AS numbers. It was meant to be something to last for a little while, so the 16-bit decision seemed quite reasonable.
Why is it used in BGP? It is used for loop detection, and that is the major reason why AS paths, sorry, AS numbers are actually manipulated. The other thing we do in BGP is use it in the path metric calculation. Because you are in inter-domain space, the usual administrative metrics in your favourite IGP of choice don't work; your metrics are not mine, and we don't expose that level of intra-domain routing policy. Instead, we expose a very crude metric where the shortest path by default wins. The final use is peer identification: you actually like to know who your partner is when setting up a BGP session, and the way you can do that is by making sure the AS numbers exchanged are the ones you expect.
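A worked example of the loop-detection rule, using private AS numbers:

    AS 65001 originates a prefix; the AS path grows as the route propagates:
      seen at AS 65002:  65001
      seen at AS 65003:  65002 65001
    If an update ever arrives back at AS 65001 carrying the path 65003 65002 65001,
    AS 65001 finds its own number in the path and discards the route.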
That was then, this is now. Out of those 65,536 numbers, the IETF has reserved 1,026, IANA has 16,384 left in its pool, and the other 48,126 are either allocated or in the process of being allocated through the RIRs. You have been busy and you are chewing through numbers, congratulations.
Colour, light and numbers: this is the coloured version of the same thing. Each stack is 256 AS numbers; the blues are in the routing table, the reds have gone into the great AS retirement home in the sky, and the greens are the ones I think the RIRs still hold in their pools. What is interesting is that, even now, out of every block of 256, we only get to see around 192 or 193 ASes in the routing table. Even now, you don't use all the AS numbers. I suspect that the rest are actually being used in MPLS VPN identifiers, because you can stuff an AS number into the identifier field.
The early numbers, there are a whole lot of unused ones; it seems this industry doesn't really like old AS numbers, they seem to go senile and you stop using them and you like bright, shiny new ones. Every time you do the graph, a few more retire into the red and we move to the right. So that is the kind of big march. If you look over time at what happened, this is an amazing picture of this industry's boom and bust, because AS numbers are actually a reflection of the growth of independent players with their own routing policy who take their own AS number. Can I have that laser pointer? Woo! The first part is actually the Internet boom taking place. And you have to say that is strong exponential growth, and the numbers agree, it is very strong growth. If you wanted to know the day when the boom finished, almost to the hour, it is that point, just there. Right.
So around about, I think it is midsummer 2001, the industry decided to stop growing, and the dynamic that took over was this one, which is actually much slower growth than the previous one. This line is the unannounced AS numbers, and like I said, that pool just keeps on growing in size over time. Old AS numbers are indeed senile and no longer useful in routers; God knows why.
So now, the question is how long will all this last? When does all this crunch? It is possible to look at the RIRs and ask, "How many ASes are given out every single day?" You notice a lot of noise; that is just weekdays and weekends. Interestingly enough, we are still working on weekends over in RIR-land, 7 days a week we hand out the numbers, the numbers don't go down to 0. But in essence, the daily rate of AS number handouts back in 2005 was around 10 AS numbers a day, and it has been growing steadily and surely, and today we are moving at 15 AS numbers per day, which is quite phenomenal. You can probably analyse that and find business cycles in there, if that is your bent.
What we can do, oddly enough, is the same kind of mathematical modelling of the entire system. You use the technique called mathematics: you do a first-order derivative, fit a curve that matches, it seems to work out, you model the RIRs and you can find a date when things run out. Oddly enough, it is about the same time as v4. And I have to say, it is coincidental, bizarrely.
SPEAKER FROM THE FLOOR:
Can we have another set of key rings?
GEOFF HUSTON:
So, the crunch date is up there. This one hasn't moved a lot; when I first did the work, it was November 2010. AS numbers are more stable than IP addresses. I suspect this is actually business: the momentum of the way in which people invest in the Internet, the rate at which the number of distinct entities grows, is phenomenally stable - the investment pace doesn't change. So the curves are remarkably stable compared to the address curves. There is something about AS numbers as a business indicator in there somewhere. Not quite sure where.
So anyway, the IETF had a plan and, unlike v6, this plan is a ripper - this one works. Where v6 went from 32 bits to 128, this goes from 16 to 32. We will never have 4.4 billion entries in the routing table! Never, never, never! Never... well, maybe never... hopefully not never. Well, not in the next few years, anyway. So, for the interim, while we haven't got there, what we are going to do this time is deliver on the promise of those magic words, 'backward compatibility'. Remember those words; it is where v6 failed completely.
In BGP you can do that, because BGP is not end-to-end, it is actually hop-by-hop. All you need an AS number for is to detect loops, so you are only looking for yourself. You can do some really hinky translations, and the IETF standard is really quite cool. I might be a sexy 4-byte BGP speaker who knows about 32-bit quantities, but you, Matt, are an old unreformed speaker who only understands the 16-bit things.
Where I have numbers with zeros in the top 16 bits, I strip them out; where I have numbers with a non-zero top, I substitute 23456. You are looking for yourself, and your number is a 16-bit number, so you don't care. So magically it kind of works. You will see AS 23456 becoming the AS of the next millennium, because from your world it will pop up everywhere, but from my world I see the real numbers. The beauty of it, unlike v6, is that there is no router transition. What you have still works, just fine. It doesn't have to change at all. Your routers and your router software may have to change for other reasons, but the deployment of 4-byte ASes intrinsically doesn't require you to upgrade your routers: leave them alone, they are just fine. The only folk who have a bit of a problem are the folk who are building new networks, or who have somehow figured out their old AS number is senile, beyond its use-by date, and want a sexy new one.
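A minimal sketch of that substitution rule (not any vendor's implementation): a 32-bit AS whose high 16 bits are zero passes through to a 2-byte-only peer unchanged, and anything else becomes the transition AS, 23456.

    AS_TRANS = 23456  # the reserved transition AS number

    def as_seen_by_old_speaker(asn):
        """What a 16-bit-only BGP peer sees for a given 32-bit AS."""
        if asn < 2**16:      # high-order bits are zero: strip them, use as-is
            return asn
        return AS_TRANS      # non-zero top half: substitute AS_TRANS

    # 131072 is 2.0 in dotted notation, 196609 is 3.1
    path = [4608, 131072, 196609]
    print([as_seen_by_old_speaker(a) for a in path])  # [4608, 23456, 23456]

In the standard, the full 32-bit path also travels alongside in a separate transitive attribute, so 4-byte speakers at either end can reconstruct it; the old speaker in the middle never looks.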
It is going to be big, and if that is the case, you had better have routing systems that cope. All the new networks, 15 a day, are going to need, at some point, a BGP that understands big numbers. That was the plan - a good plan. They needed to open up a number registry, and the plan has been executed: it has been defined in a spec, the registry has been opened, and we are looking good. There is a proposed standard, so vendors know how to build it, and they don't have to build it off a Wiki page.
The RIRs had a matching policy. We knew this was happening years ago - around about 2002, 2003 for the first real projections, but the work started with Enke Chen back in 2000. In 2006, we pushed the policy through the RIRs to say: let's give everyone planning time. Remember that word, 'planning'? Rather than waiting until the last microsecond to figure out you have got no more 16-bit AS numbers, let's make you aware.
From January 2007, if you wanted to play, the numbers were available to play with. You had to ask; if you didn't ask for them, you didn't get them. That was to give a strong signal that, for new players, life is going to have to change, so vendors, operating support systems and the different infrastructure you use to support your routing environment need to change. Then we said, "Let's work out a date." So from January 1, 2009, short AS numbers are still available and you can get them if you want them, but if you say nothing, you will get a big number. You don't need to give any particular reason - just say so if you want a short number - but if you say nothing, in a few months' time, you will get a big number. So if you are thinking about this, remember: after January, if your routing system doesn't support big numbers, say so, and you will get a short number.
A year later - about a year before we are going to run out anyway - we drop all the distinctions and just treat the AS number pool as one pool, 32 bits long. In theory, we then go to the pub, congratulate ourselves, and life is cool. The idea of doing this was to give planning time because, quite frankly, it is a massive industry out there and everyone waits on everyone else to spend the money before we go and do anything, right? We were trying to give vendors, suppliers, ISPs and everyone in the supply chain enough time to get 32-bit support.
It is not just routers; it is the operating support systems, the scripts, the lookups in various databases that configure your systems automatically. Anything that manipulates an AS number, you need to look at and change, because the numbers have grown longer.
The other objective was to say to the industry, when you look at your product development life cycles, "Here are some dates to work against." What a phenomenally different concept! So the idea was to give you milestones to work to - gasp, horror, and shock - to do advance planning. I can see you are all looking stunned at this. 'Planning', what is that word? And to 'avoid disruptive exhaustion' - remember those words, you will need them - of the 16-bit AS number pool, because we want to get it done without too much stuffing about.
Has it happened? Yes, it has. Most of you are running old BGP code, but there are at the moment twelve 32-bit numbers flying around that you would see as AS 23456, and they have been doing it for over a year. So far, out of the 43,000-odd numbers we have allocated, I can see 12 of them in my neck of the woods. Because there are only 12, here they are - isn't it great? That is APNIC, that is APNIC, that is RIPE.
Oddly enough, the RIRs are doing some of the experimentation, but there are others playing in this space and it is very, very encouraging. You should do more too. However, most of you don't write your own BGP; you normally rely on vendors. So the real issue is: what are the vendors doing? Folk might like to comment at the microphone with updates to this, but this is my personal understanding.
There are folk from vendor-land here and they might like to comment. I understand from Cisco it is IOS-XR 3.4 and greater; from Juniper, JUNOSe 4-1-0. But the gentleman from Cisco should probably correct this and give me authoritative information. I have been told Redback does it; there might be other versions, but I don't know. It would be good if you knew - it would be good if all of us knew - so the list probably needs updating; we will see how we can. There are two open source BGP implementations out there that support it now: Quagga supports it in version 0.99.10 - I have it, it is cool - and OpenBGPD has patches; I wrote patches yonks ago that will run it.
You might think, "Why is Geoff rabbiting on for the last 10 minutes. I have just been told there are 10 out there, why am I wasting my time?" Good question, I can't answer it. But if you're thinking you might be interested, the real question you have got to ask yourself: When two customers - not just one, but two - hop up with 4 byte numbers and want me to upstream, will my operating system get the two confused when I try to do the route filters and do all the rest of my internal configuration? Can you support customers, peers, and transits, all appearing to you on the routing plane as 23456?
If you do use AS numbers in your operating support systems and manipulate them as indices against customers, the real question is what happens when other folk start presenting big numbers at me while my current routing system says they are all 23456. Don't bother changing the router - it works - but you will need to change your software. And if you intend to do the dance because your AS number is senile, old and demented, and you need a sexy new one, you will need a new version of BGP. That is the cost.
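A hypothetical illustration of that operating-support-system trap: the routing itself is fine, but a provisioning database keyed on the AS number seen in BGP collides as soon as a second 4-byte customer arrives looking like 23456. The names and structure here are invented.

    customers = {}  # maps the AS number seen in BGP to a customer record

    def provision(asn_seen, name):
        if asn_seen in customers:
            # Two distinct customers now share the key 23456
            print(f"CLASH: AS{asn_seen} already keys {customers[asn_seen]!r}; "
                  f"cannot also key {name!r} by it")
        else:
            customers[asn_seen] = name

    provision(23456, "first 4-byte customer")
    provision(23456, "second 4-byte customer")  # key by session or true 32-bit ASN instead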
Here is one place with a whole bunch of information - the URL, isn't it cute? It has colours. There is information on the Wiki for those who want to update it, and the URL will be in the slide pack. I know I have skated through this at phenomenal speed - last week at AusNOG, James Spenceley spent a great deal of time talking about it - and there are questions I have not answered, but if there are questions, or plaudits, now would be a good time to ask.
PHILIP SMITH:
Any questions?
GEOFF HUSTON:
What is the latest information from Cisco?
PHILIP SMITH:
Yes, XR 3.4 and XOS 4.01; otherwise, March, April time next year.
GEOFF HUSTON:
There is an annoying period between then when your plans and our policy are slightly skewed.
PHILIP SMITH:
Yes.
BEATTY LANE-DAVIS:
It is.
GEOFF HUSTON:
You are from Juniper? Is there a date, or is it currently released?
BEATTY LANE-DAVIS:
It is in release.
JAMES SPENCELEY:
James Spenceley from Vocus. More a comment on AS 23456. I did a presentation last week at AusNOG, but the interesting thing we found internally is that we are not ready to support a bunch of customers on what is effectively a shared AS, so primarily we wouldn't be accepting 4-byte customers until the routers could support that. I would encourage everybody to look at the problems surrounding having many, many routes on the network from the same AS, appearing from what looks to us like the same customer. That was my comment.
GEOFF HUSTON:
I have to emphasize, in a routing sense, all of the next hops differ, BGP will never get confused. So this is nothing to do with the routing system itself. It is the operating support system that you are using that is keying customers by AS number that gets confused.
JAMES SPENCELEY:
Yes, I mean, we generally identify customers by AS across the many sessions with the customer. For us - people's experience may vary - it is certainly something to look at. When you see a whole bunch of routes from a single AS and they are different for different customers, it is a pretty strange feeling.
GEOFF HUSTON:
Or AS 23456 is the next dark horse taking over the universe.
TOMOYA YOSHIDA:
The recent update: asplain is in 7.7.1.
GEOFF HUSTON:
7.7.1 from Force10.
TOMOYA YOSHIDA:
Yes.
GEOFF HUSTON:
Using single integer notation.
TOMOYA YOSHIDA:
Yes, and 7.8.1. So in Juniper, 9.1 is the base, but later, 9.2, adds the additional notation, so Juniper have two implementations. In Japan they are turning to asdot.
GEOFF HUSTON:
I should say something about notation, because my slides didn't. In the original drafts submitted to the inter-domain routing working group on this, the authors - Enke Chen, and I have forgotten the other gentleman - used a notation that had 16 bits, followed by a colon, with another 16 bits, so a big one might be 1:2. When I, as the author of the policy proposal to all the RIRs putting forward the staggered dates, looked at that, I suggested inside the policy proposal that we use a notation consistent with that, A:B. Most in the RIR community said nothing, except for ARIN. When they reviewed it, they said, "You will get very confused with community notations." So they said, "Hmm, why not use a dot?" So then the policy proposal had notation A.B, and further along the line, in the next iteration, the ARIN community said, "Actually, why are you giving notation to us in a policy proposal? We are not the notation people, take it away." So it got taken away, and now there is no particular standard out there; there are a number of implementations and a number of usages in a number of contexts.
The operator community has come back with the view that any kind of delimiter is different from current practice, and it points out the use of regular expression matching in router configs as a good reason not to use one. Myself and a colleague at APNIC submitted two drafts to the IDR working group and said, "Here are two documents, take your pick." At this point, I have to say the overwhelming consensus emerging in that particular area for a standard notation is one that reflects plain integer numbers. It has to go through the process, but as an early reading of the tea leaves, it would appear to be an integer. To make a long story even more boring: back to your email.
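For reference, a small sketch of the two notations in play: "asdot" splits a 32-bit AS at the 16-bit boundary, and "asplain" is the single-integer form the consensus is heading toward.

    def to_asdot(asn):
        high, low = divmod(asn, 65536)
        return f"{high}.{low}" if high else str(low)

    def to_asplain(s):
        if "." in s:
            high, low = s.split(".")
            return int(high) * 65536 + int(low)
        return int(s)

    print(to_asdot(196609))    # "3.1"
    print(to_asplain("3.1"))   # 196609

The operator objection mentioned above is visible in the first form: the dot collides with the '.' wildcard in the regular expressions commonly used in router filters.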
SWAMY:
Some 6.1 patches.
GEOFF HUSTON:
7600 patch.
PHILIP SMITH:
Could I ask folks with information to update the Wiki that Geoff mentioned? It would be a good place to have central information.
GEOFF HUSTON:
Customers want to know that if they don't ask for a short number, they will get a big number in January. It would be very helpful if you would help us all by updating that information, thank you.
PHILIP SMITH:
Thank you, Geoff, and thanks to the questioners. So, whoops, I need to do this. Next up we have George Michaelson. George is going to be talking about the day in the life of the Internet project.
The Day in the Life of the Internet project
GEORGE MICHAELSON:
A day in the life... is this working? Yes, OK. So this is a presentation about a day in the life - 10,000 holes in Blackburn, Lancashire. I will give a summary of what DITL is, what APNIC did, the outcomes, and where to go from here.
So DITL is "A Day in the Life of the Internet." It was a long-standing plan going on at least twice, possibly three times now, to capture an entire day's worth of data at significant points across the net. What is really going on? And it goes to this question that has been bugging people in Internet measurement for a while, do you count things, sample things, measure things? In a sense it is saying we have got an opportunity here to collect, maybe the court is still out on whether something is better than measuring, let's collect and go back and look at the data any time we feel like it.
That is really quite a nice thing. I suspect we won't cure cancer or find the ozone hole, but if you look at how the ozone hole was found, in part it was found a long time before we knew it. It was found by NASA using satellite measurement techniques but, unfortunately, the data massaging hid it because it looked like noise. When they went back to the tapes, they discovered - they 're-discovered' - the hole. Dobson found it independently, but in practice it had been known for some time. The ability for NASA to go back and look at raw data and tapes is useful: when you discover questions you didn't think of at the time, you can reflect on the past and the future. You get the 'what-if' behaviour. If you can hold on to the data, you can go back and look at it again if there is something you didn't understand.
Data archiving does raise some issues; there is clearly a rights issue around the data - we are talking collection, not counting. OK, so this is an initiative from CAIDA, and in an ideal world someone from CAIDA would be up here giving you a better talk. Given we are in New Zealand, I suspect you know about it, because New Zealand is very, very strongly involved in the initiative. Does anyone know which uni Neville is at? There you go, up in the North Island.
He is someone who is critical to doing the work. CAIDA's website is worth the visit: there is a lot of information there, they are acting as a safe harbour for the data, they have a strong methodology covering the terms and conditions to access and publish, and they have a public commitment to be a long-term archive for the information.
There is another agency involved in this, which is OARC - an outgrowth of an initiative from the ISC, but now actually a fully independent legal entity; it is no longer just something ISC does. APNIC was very pleased to be involved in the creation of the entity; we were an early supporter of OARC on their council. OARC and ISC were able to arrange a thumper to make available for this, and Duane Wessels did the software support and management for the data collection.
So, 24 hours of data. Well, it turns out that to get 24 you have to capture more; we actually captured 48. Some of that is to do with just the sheer logistics of organizing this thing, but I'm going to put my hand up to making some pretty big mistakes in time zone arithmetic. You would think, living 10 hours offset from the country of my birth, I would get it right, but I got it wrong.
Another experience I haven't put on the slide: if you have long-stable machines with up-times measured in excess of a year, I advise you to check they are running NTP, because we thought our machines were and it turns out they weren't. They had been stably drifting backwards over time and were several hours out of sync with the rest of humanity, and no other process picked up on it because they were running quite happily back in the past. That was a bit embarrassing.
There was knowledge in CAIDA and OARC that they would actually have to take more data: everyone knew you were buying in for 48 hours of capture, and it is a lot of data. Our own contribution was 300 gig. That is a lot more data than I would routinely feel like putting to disc or shipping on the network to someone else.
This is a picture of what 59, I believe, points of capture of 48 hours of data looks like. As you can see, there are some quite big holes: anywhere that is white or pale, if that is visible, is a data loss. If you look across it, you will see there isn't a complete 24-hour slice. We did get a significant number of capture points and most people were able to come up with data, but it turns out to be administratively quite difficult. This, to zoom in, is a subset of the data that shows my inability to perform clock arithmetic, because I started collecting five hours before everyone else.
Each of the vertical lines is a sample upload, and I chose a large block size: I thought I would collect an hour, send an hour. That was a pretty bad design decision. If you look at everyone else, they realized quickly that you want to parallelize, shipping data off your box faster than you collect it. Next time I'll be more clever. There were 156 collection points across the participating agencies, and at least three of the RIRs were involved - LACNIC, and I believe ARIN - plus R&D collection points just interested in data in general. The total collection was around 200 terabytes. We did every nameserver we operate: the primary function, our core mission to do reverse DNS, and the secondary service we offer to the other RIRs and also to ccTLDs.
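A rough sketch of the parallelizing lesson: rotate the capture into fixed-length files and ship each closed chunk in the background, so uploads overlap collection. The interface, host and file names are invented for illustration; this is not the actual DITL tooling.

    import subprocess
    import threading
    import time
    from pathlib import Path

    def uploader():
        shipped = set()
        while True:
            # All but the newest file are closed and safe to ship
            for pcap in sorted(Path(".").glob("dns-*.pcap"))[:-1]:
                if pcap.name not in shipped:
                    subprocess.run(["scp", str(pcap),
                                    "collector.example.net:incoming/"])
                    shipped.add(pcap.name)
            time.sleep(60)

    threading.Thread(target=uploader, daemon=True).start()

    # -G 3600 makes tcpdump rotate the output file every hour while the
    # capture keeps running, so collection never stops to wait for the network.
    subprocess.run(["tcpdump", "-i", "eth0", "-s", "0",
                    "-w", "dns-%Y%m%d%H%M.pcap", "-G", "3600", "port", "53"])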
We had a mix of information about the reverse Domain Name System and the general Domain Name System. All of this had to be shipped to OARC using an SSH upload path; it took three or four days after the capture. I would really like to encourage anyone who is in a position to participate in a mass data capture to do it - it is a really good thing to do. It is not just about the community contribution, though I think that is the main driver for me; there is also the aspect of going out and making your technology able to do mass data capture.
I plan to learn how to read a clock and to do smaller blocks. I had a very strong sense I had built the engine at the limits of the technology. I thought I would have two weeks of retention. But we retooled the DNS and discovered three to four times unsatisfied demand: we weren't seeing red-lining, but the effective round-trip time of the Domain Name System service we were offering suddenly got radically better, and if you can do something better in the Domain Name System, everyone in the world says, "We'll use you." So we acquired three times the traffic, which means my retention has come down from three weeks to three days. It is very hard to plan for something like that. The jury is still out on the question: do you sample things or measure things? My own personal measurement in APNIC, which is an initiative Randy Bush urged, is a sampled measure, but I'm comfortable it tracks reality in some measure. I have spoken to some people; Neville is strong on the measure side, he thinks you should measure packets. But if you have technology like this to do data capture, why not do both? There is no reason not to.
How did APNIC do the capture? We had been doing it on-server with tcpdump and, as I'm sure you know if you have done it on a very busy server, it impacts your service. There is a lot more chance of packet loss; we saw the collector losing 15%-20%. And you know it is hitting the real service: if you are losing data, the servers weren't designed for it. So for this exercise we decided to take it off the machine and rescale accordingly. We thought about doing a port span on the switch, and we very quickly decided not to. Maybe other people have a different experience, but the sense we got is that you get a radical increase in CPU in the switching fabric when you do this stuff routinely.
Instead, we selected a copper-based TAP technology from a vendor in Australia. It has worked extremely well for us, but it is copper. We are not yet in a situation where the locations we deploy to have fibre; I guess you guys are probably closer to the core and may have fibre connections to the parties, but we are still buying copper infrastructure, and the TAPs have worked well for us. The vendor guarantees a one-packet loss: if you turn it off, it goes passive within one packet, and I think the Internet can stand one packet. We were confident it was a fail-safe design.
The other thing we liked about the design: by taking the TAP to another machine, where it presented as traffic, we didn't have to re-engineer any of our existing measurement methodology. We have five years' investment in Domain Name System sampling. It is working pretty well for us; we are getting interesting counts at the economy level and on the ratio between v4 and v6 - some of that stuff is in the presentations Geoff has done.
We have also been using a tool from OARC called DSC, which I would encourage anyone who runs Domain Name System infrastructure to use. It is a generalized tool that allows you to aggregate data into a centralized repository. It is a simple model: it does an XML data retention that gets mapped into simple 2-D files that are easy to graph from. Deploying OARC's tool made a big difference to our sense of what the Domain Name System was doing.
It is also good for looking at what other people look at. Both of these were absolutely ideal to shift to the TAP methodology; we had no loss of continuity. We are currently using an EL-series operating system on a Dell with dual 750-gig mirrors, so if I run out of disc I can split it in half. I don't know if you have noticed, but hosts these days come with dual Ethernet, and if you need more you have to find cards, and finding good cards that do Ethernet is tricky - but we eventually got there.
So, where can you see the outcome? It is unreadable on the slide pack, I'm afraid - CAIDA gives extremely long URLs - but if you can get the PowerPoint you can get the URL. Just to give you a sense of the flavour, these are the general stats they publish comparing the 2007 and 2008 experiments. You can see they focus on the Domain Name System and the root servers; they went from covering 4 of the roots to 10, and there is a significant increase in the number of measured nodes worldwide. The collection exercise has really captured all of the significant traffic at that collection of the roots.
Query count: well, it is about double the amount of traffic. Client count is interesting. I have been looking at the number of unique clients who do Domain Name System against us, and I have begun to think: are there only 1.5 million Domain Name System nodes in the world? Maybe we should ship them a CD every week of every known address and get rid of the Domain Name System entirely?
The recursive query count gets interesting for people. TCP ratios too. So this is an indication of change in the IPv6 query load: in January 2007 it was virtually 0, but across the time difference there was a fairly significant increase in the number of Domain Name System servers that had v6 bindings, including root nodes, and it is reflected in the number of queries they saw.
This one is a little hard to look at, but it has been the subject of a bit of debate. People have been talking about the extent to which EDNS is enabled, because they are interested in the ability to deploy it. The question of whether the capability is there matters a lot.
Recently there was a presentation at the IETF where a guy said, we are convinced it is close to saturation, close to 90% uptake. I beg to differ, and this is the reason why: CAIDA's measurements show that while people may be EDNS capable, the evidence is that it is not being used on the wire. So there is a question about what you see when you collect the data versus what people say they see when they look at the views. OK, I'm done - short and sweet.
PHILIP SMITH:
Any questions for Geoff?
GEORGE MICHAELSON:
George.
PHILIP SMITH:
What did I say? OK. Apologies to George; thanks very much, George, for no questions. Put the main screen back. The next presentation is from Mark Dranse from the RIPE NCC. While he is getting set up - packing more things into the afternoon - there will be finger food in the lobby of the Convention Centre at 6:30pm. I realize, George, this is going to conflict slightly with the lightning talks and so forth.
GEORGE MICHAELSON:
No problem.
PHILIP SMITH:
Maybe we can do both at the same time. Dinner is going to be fairly late tonight, which is why we have finger food to keep your hunger at bay. I always wonder about these events: so much food available, morning, noon, and night. Anyway, finger food at 6:30 before heading off to the cultural evening. So now, push the button and Mark can start his presentation.
RIPE NCC Information Services
MARK DRANSE:
Good afternoon, I'm Mark Dranse from the RIPE NCC. As you realise, I'm not Axel - I apologize for any disappointment; if you are looking for the update, come back on Friday morning. Just a quick introduction in case anyone doesn't realize: the RIPE NCC is APNIC's counterpart in Europe. We cover Europe, the Middle East and parts of Central Asia - we cover the green blob. We are more than just a registry: we do things like training, we run whois and Domain Name System services, and one of the areas in which we excel is information services.
That is quite a bland and generic term, so what I'm going to cover today is what we do in those areas. What we basically do is run a lot of tools: tools for network operators, tools for people doing research and analysis, and tools to look at resource usage and at trends in the development of different things on the Internet.
Now, I have heard a lot this week about v6 - the topic of the day at the moment - so there will be a lot of focus on v6 in what I say; prepare yourself for that. In order to do the measurements, we have a couple of measurement networks. The first one is the Routing Information Service (RIS) network: we have 16 remote collectors at different locations, we collect BGP data from about 640 peers, and we keep all of it. More about RIS a bit later. The second network is a more generic one, with 80 nodes on it at the moment. Those are hosted by people like you and us: ISPs, universities, RIRs - APNIC host one for us, down in Brisbane. As you'll see from the map, there is a European focus, because the service began in our region, but you can see there are nodes in Japan, in Brisbane, Melbourne, and Hamilton in New Zealand. It is not free to participate in this - there is a small fee, as we run the stuff on a cost-recovery basis - but you do get good value and benefits, and also a warm, fuzzy feeling, guaranteed, for contributing to community efforts.
Looking at the probes themselves, we use 1U rackmount servers with a GPS antenna. We are doing precise timing, so we use GPS to get accuracy within 10 microseconds. In the Asia Pacific region, finding good time sources is not always easy, so one of the benefits and offshoots of having the GPS in the box is that you get a local Stratum-1 NTP server on your network. The platform is very flexible: you can run many applications on it, and we do - TTM, DNSMON, ad-hoc measurements. It operates in a full mesh, as shown at the bottom left, and a large number of the nodes are v6 capable, which means we can do v4 and v6 point-to-point measurements on the links for a lot of the connections we have.
So the first and oldest application running on the network is Test Traffic Measurements, TTM. It has been running for almost 9 or 10 years now. What we do is one-way measurements of delay, loss, and jitter between each pair of nodes - every single node in the full mesh, in both directions - and we plot all of that. Also, if you want to do anything useful, you need to know the traceroutes, so you can see which path the data has taken when you go to investigate what went wrong.
As an example of what you can use TTM for in troubleshooting, I took a look at the boxes in the APNIC region and found an interesting glitch to look at. Bear in mind the NCC is completely neutral; we are just looking. I found the box in Hamilton and the one in Japan. If you look at the very top left, on the red line there, there is a little glitch. The red line is the number of hops between the two boxes - you can see it increases for a short while - and the black bits underneath are the delay, which also increases when the number of hops increases. I'm a fairly inquisitive person, so I had a closer look. That is the hop line itself: there are two extra hops, from 17 to 19, and the delay increases from about 75 to about 120 milliseconds at the same time. You can actually craft your own custom plots for specific points in time.
You can then see more closely what was going on. Just at the very top of the graph you can see the red bits showing that the hops went above and beyond 30 at some point, so let's look closer and find out why.
On the traceroute side, we keep everything in a database: we traceroute every so often and store all the hops, and you can search and find anything over any period of time. This is the traceroute before the glitch appeared; it shows 17 hops from New Zealand to Japan. If you take a closer look - the previous one was too big to read - there are 17 hops there. What we saw at 12:30 was a routing loop in reach.com. I have cut it off, but it was flipping back and forth, getting to 30 hops and giving up. What we saw after that was that the Reach path disappears and we started seeing the path going via ALTER.NET to reach the same place, with two extra IIJ hops on the way to Tokyo. So we saw it increase from 17 to 19. Half an hour later, we move away from that, the extra hops have disappeared, and we are back to 17 again.
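A toy version of that history search: keep each traceroute with a timestamp, then scan for the moments the path changed. The records below are fabricated stand-ins for the TTM database.

    history = [
        ("12:00", ["akl-gw", "syd-gw", "tyo-gw"]),
        ("12:30", ["akl-gw", "alt-1", "iij-1", "iij-2", "tyo-gw"]),  # detour appears
        ("13:00", ["akl-gw", "syd-gw", "tyo-gw"]),                   # back to normal
    ]

    prev_path = None
    for stamp, path in history:
        if prev_path is not None and path != prev_path:
            print(f"{stamp}: path changed, {len(prev_path)} -> {len(path)} hops")
        prev_path = path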
It shows you can look at the plots, see something happening, trace it, and find out exactly why; the data is good for going back in history and finding things out. Here is a different view of the traceroutes, using geolocation, so you can plot the stuff on a map. I'm not entirely convinced the geolocation data is accurate, because it has the path going past New York and other places in the US, so I'm not sure if it is right or not. One of the benefits of having v4 and v6 hosts in the mesh is that we can look at the same two points over both v4 and v6 and compare the two.
Here is some of the data from TTM, going back to a presentation from earlier in the year. What we have here is pairs of columns - I'm not sure if you can read it in the room - and each pair shows you the relationship between two of the different probes. We are looking at the delay between them: red is v4 and blue is v6.
In fact, in every case, the v6 delay between the nodes is larger than the v4, which is probably what you might expect at the moment. These ones particularly stand out, and I'm not too sure why, because they are close to each other geographically. The loss is more pronounced: v4 loss is very, very low, but if you look at the v6 it jumps massively - there is a massive one I have highlighted which stands out. Something else we use is a v6 tunnel discovery tool, built on the TTM data. What we actually do is look at all of the peering relationships between the v4 and v6 capable nodes and try to work out whether there is some sort of tunnel between them. We plot it on this chart, and you can use the MTU. There are 77 v6-enabled nodes and 702 potential paths, of which the TTM data says there are probably 557 native and 145 tunnelled - 26%. Fifty-one GRE and 94 6in4 tunnels. You can zoom in and see the hop where we think the tunnel begins, in case you're interested in that sort of stuff.
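A hedged sketch of the MTU idea behind the tunnel discovery: compare the path MTU seen between two v6 nodes against the overheads the common encapsulations would leave. The thresholds below are the textbook header sizes, not the RIPE NCC's actual classifier.

    def classify_v6_path(path_mtu):
        if path_mtu >= 1500:
            return "probably native"
        if path_mtu == 1480:
            return "looks like 6in4 (1500 minus a 20-byte IPv4 header)"
        if path_mtu == 1476:
            return "looks like GRE (1500 minus 24 bytes of overhead)"
        return "tunnelled, encapsulation unclear"

    for mtu in (1500, 1480, 1476, 1280):
        print(mtu, "->", classify_v6_path(mtu))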
Another application that we run across the probes is called ad-hoc. We take each of the measurement nodes and, instead of having them talk to each other, we aim them at a different target outside the network. We have built a nice interface, and you can do anything - it is plug-in based. You just configure your test, run it, and it gives you the results. So we have a configuration page - I don't know if it is big enough to read - where you plug in the configuration at the bottom and set the parameters. There are a number of limits built in to stop any sort of abuse; we don't want to be a drag on the measurement network.
So when your test has run, you get a nice summary page; in fact, while it is running, you can look at the data in real time. It shows you what the probes are doing. Here I looked at the APNIC website from locations in New Zealand, Japan, London, and Melbourne; you can see the different response times on the graph. It is fairly much what you would expect: London is the longest, Japan is the quickest, and Australia and New Zealand are quite fast. At the bottom there is a chunk of the data you actually get; you can grab all the results as a CSV and plot your own graphs. One of the things you can do with that is a different test I have run here, which is grabbing the APNIC website over v6 and measuring the amount of time it takes to retrieve the data we are pulling across - just the images and things like that.
Again, I'm not sure if the key is big enough, but the USA is at the top in blue, Germany in green, and Japan at the bottom, and it is faster. Because we have v6 and v4, you can run them both simultaneously and compare the data. What is interesting - I didn't expect to find this - is that when we grabbed the APNIC website from the Japanese node, it was faster over v6 than v4. That is not something I have seen before when looking at the data. The US is slower than Germany, ever so slightly.
There is more data we caught when we did the same against the RIPE website around the last RIPE meeting. I think the effect is a bit more pronounced there; you can see the quite wide difference between v6 and v4. So that is some of the stuff you can do with the ad-hoc testing.
We have another application, called DNSMON. We built it to monitor the servers the RIPE NCC runs, but we have given it to the community for everyone to use. Again, we have stopped aiming the probes at each other and aimed them outside the network; we monitor ccTLDs and gTLDs. When it is green, it is good; yellow, not so good. What we found at the start of the year, when we enabled v6 on the root we monitor and started watching it: there are two v6-enabled NZ servers, and what looking back at the previous data shows is the overall quality of v6 service on the open Internet - it is not indicative of the server itself, there could be network issues built in.
DNSMON is lovely for root operators, but what if you want to host a probe - what is the benefit? We have the probe view, which lets you gauge the connectivity between your probe and the locations of the 200 or so Domain Name System servers we monitor. They sit in well-connected, interesting, diverse places, so you would be interested to see your service from those locations.
This is showing the probe located in New Zealand. You can see there are a few interesting bits on the graph, which I will take a closer look at. This just shows how we look out from the probe to the servers. There is a big red line followed by a white line, vertically. What we are seeing is that the box itself probably lost connectivity, which is why it is red: all the queries were being dropped. At some point the probe realised it couldn't see anything, so it stopped taking measurements, which is when it goes white. There is also a horizontal red line - the old B root server address, which I assume is switched off but still being monitored for some reason. At the top there is stuff which I think is the G root server, which seems to have fairly recurrent daily problems being queried. It seems to be the same time every day, in the afternoon, quite wide.
So, back to the Routing Information Service: this is a looking glass with history. We collect routing information over BGP; we have 640 peers and 16 remote route collectors at different IXPs around the globe. We have three months of live data searchable online, but everything is stored going back to 2000. If you want to go back to that, the tools are free of charge to everyone, at that website.
Here is the map showing the locations. There is an RRC in Japan but nothing much closer to us; we have Sao Paulo, Miami, Moscow. It all supports v4 and v6 and does 32-bit AS numbers, or whatever you want to call them. Geoff said he could see 12? We see 13, I'm not sure why. Maybe we can compare notes later.
Great. So, among the tools you can use to query all of this, we have the AS dashboard. I have picked on - I'm not sure which AS - but there is a chart at the top which shows the number of prefixes being announced. It shot up at the end of January, when a load of /24s started being announced. There is no v6 prefix appearing, which might or might not be interesting. The pie charts in the middle show the AS path length from each of the collectors: for this AS the average path length is 5.35, for RIPE's AS it is 3.75, and the global average is 4.75. Taking a closer look at the prefix chart, one thing I noticed was that a lot of very small prefixes appeared and were announced: a load of /26s, /27s and /28s appeared for 9 minutes in August. No idea why. But what is interesting to me is how visible some of these very small prefixes were - /27s being seen by a very large number of our peers. I thought this was interesting; it might not interest you.
I then pulled up what we call the prefix dashboard for a /8 - sorry, a /16 - which I found on the path when I did a traceroute from the hotel. In the chart there are a lot of overlapping, more specific announcements within the /16; no idea why, maybe a policy somewhere. You can cycle through it and get a map. We use traffic lights to show the visibility of the prefix at the different RRC locations, and the one beside it shows you how stable it is; we measure that by seeing how many updates there are for the prefix at any point in time. This is a small sample of the tools; they are all listed on the website, down there. Please go and have a look - you can use all of this stuff.
One of the tools listed is called BGPlay. It is the prettiest thing, and it is huge and confusing. This is a slide from the Middle Eastern cable cuts. Down the left is the purple histogram showing the number of updates; the origin AS is marked in red, with all the RIS peers and paths over one week. You can actually animate these things. Here is something which I think has been presented at APNIC before, so I have trimmed it down to save time. It shows the YouTube incident in February, when a more specific /24 of the YouTube /22 was announced. We can set it playing: you can see the /24 appearing from Pakistan Telecom, then at some point YouTube recognized it and grabbed back the traffic, and now no-one is going there anymore. BGPlay uses the RIS visibility data and it animates everything. You can use it to visualize anything within your own AS, and visit those of your friends, or anyone at all.
One thing we got asked when this - I don't like to use the word 'hijack' - when this accident happened, was what could people do to find out if it happens to them? We came back and said we have something we call MyASN, a tool we run, based again on the RIS data, that lets you configure alarms to let you know if someone announces a prefix of yours. You can use expressions to create complicated rules and trigger notifications to come to you if your prefixes appear where they shouldn't. You need to bear in mind that if we send you email or syslog, you might not get it, so we think we might offer other notification methods at some point to get around that. MyASN is free; we check you are who you claim you are and you get your account.
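The alarm idea reduces to a simple comparison, sketched here with invented data; the real service works from the live RIS feeds and much richer rules.

    expected_origin = {"203.0.113.0/24": 64500}   # what the prefix owner registered

    observed = [
        ("203.0.113.0/24", 64500),   # normal announcement
        ("203.0.113.0/24", 64511),   # someone else originating our prefix
    ]

    for prefix, origin in observed:
        want = expected_origin.get(prefix)
        if want is not None and origin != want:
            print(f"ALARM: {prefix} seen with origin AS{origin}, expected AS{want}")

A production rule set also has to watch for more-specifics, as in the YouTube case, where the announcement was a /24 inside the victim's /22.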
The last bit of RIS: we run loads of reports and statistics - the stuff that Geoff does, but better. If you are interested, you might want to have a look at the reports, and fix stuff if you find yourself listed there.
So, there are links to all of the stuff I have mentioned. We did some fairly in-depth analysis of the cable cuts and the YouTube incident; they are linked at the bottom there. All of this is under constant development - all the tools and the services we offer - and we really love feedback. So if you use any of this, or if you're interested in it, please talk to us, let us know; if you don't like it, please tell us why. Most of all, we would like to get more users in this region, so please go and play with it, and if you're interested in finding out more, please come and talk to me, or ask me a question, about now. Thank you.
PHILIP SMITH:
Any questions for Mark? No-one is rushing to the microphone. Thank you very much for that, Mark.
IP emergency services
MATT LEPINSKI:
I'm Matt Lepinski, and I'm going to talk about emergency services and emergency communication over IP. So, first of all, I'm going to talk briefly about what emergency communication is for the purposes of this talk. Basically, it's citizens calling for help, like requesting the assistance of the local fire department, or else it's the government trying to notify citizens of an emergency, like an impending earthquake or another natural disaster. I'm going to talk briefly about why you would do emergency communication over IP - that should be easy for this group - and then give a couple of high-level examples of how emergency communication might work over IP. Now, there are ongoing standardization efforts in this area, so I'm just trying to paint a hypothetical picture to give you a sense of how this might work.
Then I'd like to talk about what's required in order to actually make emergency communication work over IP, and the short answer is location. If you're going to warn people about an impending earthquake, then you need to know whether the people in question are within a particular region that's going to get hit.
Finally, I'd like to talk about the ongoing standardization efforts that are happening for emergency communications, and most importantly, I'd like to talk about how you could find out more or get involved in the standardization efforts because ultimately, the more people who provide input into this process, the more likely it is that we develop standards that are actually going to interoperate globally and meet the needs of the stakeholders.
So, having said that, again in a little bit more detail: for the purposes of this talk, I'm considering two types of emergency communication. Citizen-to-authority communication is best characterized by the 1-1-2 service that works on GSM phones in Europe. There's a 1-1-1 service in New Zealand with a similar function, or the 9-1-1 number in North America. Basically, these are all numbers that you call and your call gets routed to someone who is able to dispatch police, fire, maybe medical services to your current location.
Then, the other type of emergency communication that we're interested in is authority-to-citizen. This is geographically targeted disaster warnings: earthquakes, tsunamis, hurricanes, those sorts of things. So, why emergency communication over IP? I'm sure everybody in this room knows there's a growing number of mobile devices supporting IP - iPhones, 3GPP handsets, Blackberries - and people are using their laptops in more and more places. I look out in the audience and see that almost everyone in this convention centre has a laptop plugged in.
Additionally, voice and realtime text messaging applications are becoming more and more popular: Skype is huge, Google Talk has millions of users, Vonage offers a large commercial VoIP service in many parts of the world, and a number of cable operators are offering voice over IP services. So, in light of the fact that there is more and more realtime communication happening from IP devices in more and more locations, a bunch of people who think about emergency services and emergency communications got together and said, "Wouldn't it be wonderful if, at some point in the future, any IP device on any network could request or receive emergency services - receive help when they're in trouble, receive warnings when something bad is going to happen - and let's have this interoperate everywhere in the world."
So, obviously this is an ambitious goal - it's not going to happen any time soon - but we're working in this direction and we'd love to get your help. Here's a high-level example of how citizen-to-authority communication might work over an IP network. I show the device in question - it could be a laptop running Skype, or a Blackberry, or an iPhone - and the first thing that happens is that the device requests its location from the network: either the network knows where the device is, because the network has information indicating where the device is connected, or else perhaps the device has GPS on it and the network is just providing GPS assist data. Then the device initiates signalling for whatever realtime communication mechanism you're using: you contact a routing element, which might be a Skype server or a Google server or a proxy, anything like that.
And this routing element, which sees the signalling containing the location, can then make a request to the location-to-service mapping database. The database I'm talking about here is one where, for any given location - like Christchurch, or perhaps at a city level, county level, township, whatever is appropriate for the jurisdiction - you provide the location at the local granularity and you get back a reference to an emergency dispatcher who is able to provide assistance in your region.
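As a toy stand-in for that mapping database: given a location at some local granularity, return a contact for the dispatcher that serves it. The entries and URIs below are hypothetical; the IETF work mentioned later standardizes a proper protocol for this lookup.

    service_map = {
        ("NZ", "Christchurch"): "sips:dispatch@christchurch.example.nz",
        ("NZ", "Wellington"):   "sips:dispatch@wellington.example.nz",
    }

    def find_dispatcher(country, locality):
        # Fall back to a coarser granularity when no city-level entry exists
        return service_map.get(
            (country, locality),
            f"sips:dispatch@national.{country.lower()}.example",
        )

    print(find_dispatcher("NZ", "Christchurch"))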
Then, the routing element, having determined who the local dispatcher is - police, fire, medical, whatever - continues to route your signalling to the dispatcher, and the dispatcher can have an immediate connection to the originating device, carrying probably either realtime text or voice traffic. Just to provide another picture of how this might work: there are some privacy concerns, obviously, with attaching location information to signalling that's going to be sent over the public Internet. So another idea that's gaining traction is that, instead of a device obtaining its actual location, it sends out a reference, a pointer, to its location. Therefore, the only parties who actually need to see the location are the database that is going to determine the appropriate dispatcher and perhaps the dispatcher himself. You're alleviating some privacy concerns by not attaching the location to the call signalling.
So, alternatively, authority-to-citizen could work in many ways. One way people envision it: a device obtains its location from the network; then you have a similar location-to-service mapping database which contains information like, for Christchurch, who is it that would be sending out earthquake warnings or tsunami warnings? You get back a reference to an authority that could help you in that area, and then the device registers with the authority and receives an alert if such a disaster were to occur.
Now, alternatively, for scaling reasons, instead of having every device register independently with the emergency authority, it probably works better if there's a network element close to the device - like perhaps the 802.11 routers that are providing us access in this room. The network element gets its location, again figures out who the appropriate authority is to issue the alerts, and sends the registration; all of the alerts then pass through the network element, which broadcasts them to all of the devices downstream. So, in any case, these are just high-level pictures meant to be illustrative of the things people are thinking about for emergency communication. So, what's required to make emergency communication happen over IP? Obviously, there's some work that needs to be done as far as actually building the databases that point to the appropriate dispatchers or emergency alert generators. And there's some work for application developers to do if they actually want to route to your local emergency dispatcher. But the big thing, from an Internet community point of view, is that we need location. In particular, location is required in order to route requests for assistance to somebody local who is going to be able to help me, and also to know which devices are in a region that's going to be affected by some disaster or emergency.
In this regard, the access network - the network elements which are closest to the device - is best equipped to provide the assistance necessary to locate the device. That might mean providing a server that has access to network topology information and is able to answer queries about the device's location in a wired setting; or, in a wireless setting, maybe there's a database of WiFi triangulation information, or GPS assist data that needs to be pushed to help with location.
Now, the good news in this regard is that although locating IP devices is a challenge, it's a challenge that's being tackled for a large number of reasons, not only emergency communication. In particular, there are people who have business plans related to location-based advertising - giving, say, a traveller with money the opportunity to spend that money at local destinations that choose this form of advertising. Also, there are services like: I'm in a new city and I would like to know how to get to the grocery store; or, which of my buddies are nearby so I can hook up with them? All of these types of services, in addition to emergency communication, are driving the push for location-enabling both the devices and, hopefully, the networks as well.
But the one concern that I have - and I think many people have - is that in order to actually get global interoperability for emergency services, so that a Skype customer who shows up in New Zealand after buying his laptop and installing Skype in Europe is still able to get the emergency assistance he needs, it's going to require standardized interfaces to location information. In particular, a device that wants to support emergency communication needs a small number of mechanisms to obtain its location from the network. There are a lot of protocols that have been proposed for this, but the important thing is to have a manageable list that the device could implement and be able to get location from any one of the variety of networks it might decide to attach to.
Additionally, developers will need an API, and location will need to be in a standardized format - in particular, one that the emergency service providers around the world can all deal with reasonably.
So, that's the challenge. As I said, there is a lot of ongoing work in this area. The IETF has active work in the ECRIT and GEOPRIV working groups. There are standards efforts going on for next-generation IP-enabled phones. The OMA consortium does a lot of work on location formats and geolocation. The WiMAX Forum does 802.16-related standardization. ETSI - the European Telecommunications Standards Institute - has the TISPAN working group. The OASIS organization does a number of data format standards, including data formats for emergency and disaster alerts. So there are a lot of bodies working in this area, but in order for the standards they produce to be useful, it's important that they get vetted by a broad group of stakeholders, so the emergency services workshop series was started in October of 2006.
Workshops have been happening every six months; the next workshop is in Vienna, Austria. Remote participation is possible, although the time zones between here and Austria really don't sync up nicely. So far, we've been able to have broad participation from Europe and North America - standards developers, regulators, network operators and emergency service providers. However, we haven't been very successful in getting the participation we would like from the Asia-Pacific region, and so we'd really like to do a workshop in 2009 somewhere in the Asia-Pacific region. My primary reason for coming here to talk to you is to try to encourage those of you who might have some interest to read about what's going on, to talk to me, to talk to other people who are involved in this effort, and to make sure that the standards being produced meet the needs of operators here in this region as well as in Europe and North America. And again, the primary goals of this workshop are to make sure that the standards bodies are producing things that interoperate, and that the standards are going to be deployable in a wide variety of networks in a wide variety of places.
So, if you would like to learn more, the first thing you should do is contact me. If you think at all about emergency services as part of your job or day-to-day life, I'd love to talk to you. If you're doing anything location-related, I would love to talk to you because I think that we can reuse a lot of great location work that's being done in other contexts.
If you're not doing anything location-related now but think you might be interested in doing something location-related in the future, I would also love to talk to you. The worst-case situation from our point of view, as the standards designers, is that five years from now you decide you want to deploy some location technology and it doesn't interoperate with our emergency services standards, because what we created was done without any thought for your particular network deployment situation - we didn't get your input in time and so we have something broken. So if you're thinking you might want to do anything location-related in the future, I would love to talk to you; I would love to make sure that what we're doing is going to interoperate with anything you might deploy in the future. I would love to get some e-mails from people, or just catch me at the social event. Also, the emergency services workshop series has a website that I put up on the slide, and there's a mailing list out of Columbia University. The website has information on how to get on it if you're interested, or I can point you in the right direction.
So thank you very much for listening to me speak and I hope to hear from people who might be interested in this work.
PHILIP SMITH:
Are there any questions for Matt? Any questions or observations? No, it must be afternoon fatigue or something!
APPLAUSE
Finally, we've got Sam Sargeant, who is going to be talking about the recent DNS security advisory and giving his report.
DNS: report on security advisory
SAM SARGEANT:
Right, good afternoon. I'm Sam Sargeant from Modica Group, based in Wellington here in New Zealand. There's been a lot of talk recently about a new DNS attack, and I want to talk about what DNS is - a very simplified version, and sorry to any experts in the room who may cringe at my explanation - and what a cache is. I want to talk about the impact of a poisoned cache: what does it mean when your DNS cache is poisoned, and who cares? Now, I'm not a security professional, nor a vendor of DNS software; I'm not trying to sell you anything new or get you to upgrade, I'm just here to suggest. I'm from New Zealand. So, DNS: if I point my laptop at the APNIC website and a nice little browser page comes up, what happens? My laptop will create a query and fire it off to the conference DNS server here in the office. That conference DNS server obviously doesn't know about apnic.net, so it fires the query off to APNIC's DNS server, which is in Australia. APNIC's DNS server then responds with the exact IP address to go to, sends it back to the conference DNS server, that comes back to my laptop, and I get the website - via v6.
The conference DNS server will cache that answer. Obviously, it doesn't want to go and ask APNIC's server every single time anybody looks up an address, so it maintains a table of all the recent look-ups. For each of the entries, you can see that it has a lifetime, so perhaps after five minutes or a day, it will expire that entry. But up until that point, it will keep serving that cached entry, and that is the basis for cache poisoning.
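A toy version of that cache table, sketched in Python with a placeholder address, might look like the following. The point to notice is that whatever lands in the table, legitimate or not, gets served until its lifetime runs out.

    import time

    cache = {}   # name -> (address, expiry time)

    def cache_put(name, address, ttl):
        cache[name] = (address, time.time() + ttl)

    def cache_get(name):
        entry = cache.get(name)
        if entry is None:
            return None          # never asked: go off to the authoritative server
        address, expiry = entry
        if time.time() > expiry:
            del cache[name]      # lifetime is over, expire the entry
            return None
        return address           # still fresh: answer straight from the cache

    cache_put("www.apnic.net", "203.0.113.80", ttl=300)   # placeholder address
    print(cache_get("www.apnic.net"))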
The first way to go about it is spoofing. First we get the server in the middle to ask the right question; we can't just give it an answer it wasn't expecting, as it won't do anything with it. Then, when we spoof an answer packet, we have to match a few different fields to make sure that it looks right: the source IP address, which in my example is APNIC's; the destination port, since the local DNS server used a port for its query and we hope to hit the right one; and a query ID or transaction ID, which is a 16-bit number, just a way to assign a number to a particular packet so you know which answer belongs to which question.
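Roughly speaking, the resolver's acceptance test looks like the sketch below (the field names here are made up for illustration): all three values have to line up, and they are exactly the fields an attacker must forge.

    from collections import namedtuple

    Query = namedtuple("Query", "server_ip src_port txid")
    Reply = namedtuple("Reply", "src_ip dst_port txid")

    def accept_reply(q, r):
        return (r.src_ip == q.server_ip       # appears to come from the server we asked
                and r.dst_port == q.src_port  # arrives on the port the query left from
                and r.txid == q.txid)         # carries the matching 16-bit ID

    q = Query("203.0.113.53", 33000, 0x1A2B)
    print(accept_reply(q, Reply("203.0.113.53", 33000, 0x1A2B)))   # True: accepted
    print(accept_reply(q, Reply("203.0.113.53", 33000, 0x9999)))   # False: thrown away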
So, another example of some spoofing. My laptop now sends a query for www.APNIC.net; the conference server creates a query, assigns a 16-bit ID number, and sends it off to APNIC's DNS server. In the meantime, I create my own answer packets and trial the numbers. I don't know which transaction ID has been selected, so I invent them, and finally one of my answers gets through with the right ID. At this point, the conference DNS server says, great, you put in the right ID, and I'm going to put that into my cache table. When the real answer comes back, it gets thrown away: the server is no longer interested in this query, it has the answer in the cache, it's done. At the same time, I specify a lifetime as long as I like, in this case a million seconds, so nobody else has a chance to overwrite it.
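Because the attacker is guessing the 16-bit ID blind while racing the real answer, the odds in any single race are thin. A small simulation, assuming 200 forged answers land inside the window before the real reply, shows roughly how often a single race pays off.

    import random

    FORGED_ANSWERS = 200     # assumed packets landed before the real reply arrives

    def race_once():
        real_txid = random.getrandbits(16)                     # the resolver's secret ID
        guesses = random.sample(range(65536), FORGED_ANSWERS)  # the attacker's forged IDs
        return real_txid in guesses

    trials = 10000
    wins = sum(race_once() for _ in range(trials))
    print("cache poisoned in %.2f%% of races" % (100.0 * wins / trials))

On these assumptions, a race succeeds only about 0.3 per cent of the time, which is why this older attack depended on luck and patience.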
How would you mitigate this? This was the old method. The transaction IDs are a 16-bit number, and in the mid-90s they were simply incremental, starting at 1, then 2 and 3 and 4, so it was very easy to work out what you needed to fake. Since then, we've learnt some lessons and now you see random numbers. It is harder, though it depends on the quality of the random number generator, but it is harder.
So, that's the first, old method of cache spoofing. The next method uses something called the RRSet. The answers that come back can include more than one record: they can say, here's the answer for the question you asked, and by the way, here's something else that you may need to know. So the first step again is to get the server to ask the right question, and you respond with the right answer, but you also overload it with some extra evilness, adding another record into that set that says, hey, here's something else you should add to your cache. So, my attacker sends off a query that says, who is www.evil.com? It goes to the conference DNS server and then to my DNS server, which says, here is the IP address, and by the way, here's another record you might be interested in:
APNIC.net. It comes back, the conference server adds it in, and I have poisoned the cache. Again, this was a method that was valid quite a long time ago. To mitigate it, additional records must now be relevant to the question. So what happens now is that if I ask about www.APNIC.net and I send back an additional record for NS2.APNIC.net along with the answer, that's fine. But if I supply an additional record that is not relevant, for example, if I ask about evil.com and the answer comes back with evil.com and, by the way, APNIC.net, the server will throw the extra record away; it doesn't care. So, those are two different methods of cache poisoning: spoofing, and the RRSet.

Earlier this year, Dan Kaminsky discovered a new method of cache poisoning. He notified the vendors first; he didn't tell people in the wider community, he spoke to the vendors and said, this is a problem. On July 8, he came out and said, hey, this is a big deal. CERT issued an announcement saying everyone needs to upgrade, the patched software came out, and it was an amazing coordinated release of the update by many vendors. Later that month, the details were leaked, or guessed, it's not clear, and it was public knowledge. The exploit code was available within a day; you could Google it and it was easy to find. In early August, Kaminsky revealed the full details and the implications of what was going on. A quick way to check if you're vulnerable: if you're on a Mac or Linux you can run a command (fortunately, this machine is not vulnerable), or visit Doxpara.com.

So, how does this particular attack work? My attacker host will send a query for a name which, in this case, doesn't happen to exist; it doesn't matter what the record is. The example here is aaa.APNIC.net. I send my query packet to the conference DNS server. At the same time, I craft an answer, my regular spoofing attack, that says aaa.APNIC.net doesn't exist, and by the way, here is an additional record. We are not violating the relevance check, because you asked a question about APNIC.net and we're supplying records in the same zone, for www.APNIC.net. So, we try to send that off to the conference server as quickly as we can, and we try to fake all the details. While the query is out and the real answer is coming back, you've got many different packets and have to guess the right transaction ID. You have a small window of opportunity, and sometimes it just doesn't work: your answer arrives at the wrong time and the conference DNS server no longer cares and throws away the packet. So we try again with aab.APNIC.net, and we can go on as long as we want to. We have to guess a number between 1 and 65,535, and we can try as many times as we need. There is a statistical effect, related to the birthday attack, which says that the more chances you have to guess a number, the more likely you are to hit it. So finally we try aac.APNIC.net, and that gets through. The current thinking is that on a fast Internet connection, it takes about ten minutes of packets, spewing them out and trying the different random IDs, hoping that one will finally get through. The longer you try, the more likely you are to succeed.
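That "ten minutes of packets" figure is just compounding odds: every invented name buys a fresh race, and a lost race costs nothing. A back-of-envelope Python calculation, assuming 100 forged answers land per race, shows how quickly the success probability climbs.

    SPACE = 65536            # possible 16-bit transaction IDs
    FORGED_PER_RACE = 100    # assumed forged answers landed per made-up name

    p_race = FORGED_PER_RACE / SPACE
    for races in (100, 500, 1000, 5000):
        p_any = 1 - (1 - p_race) ** races    # chance of winning at least one race
        print("%5d names tried -> %4.1f%% chance the cache is poisoned"
              % (races, 100 * p_any))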
Then finally, our forged answer arrives before the real answer does, and we've won: we've poisoned the cache, not just with the record for aac but with the additional record too, and subverted the server into believing that APNIC's server is somewhere else. So, in short, that's how the new cache poisoning works: spoofing, in a clever combination with the old techniques.
So, obviously, we want to fix this. The patch that came out in early July for DNS software was source port randomization. Currently, in most DNS implementations, the query that gets fired off to the wider Internet comes from a single source port, number 53. Nice and simple. What they've done is increase the difficulty by also randomizing the port number. So as well as having to guess the transaction ID, one of about 65,000 numbers, the attacker now also has to guess the source port, which is another 65,000 or so numbers, so we have around about 32 bits to play with. It is a lot harder to guess within a very small window of opportunity. As I said, the patch has been out since July, and there is no excuse at this stage for not having applied it. Unfortunately, there is a small niggle: if your resolver is behind NAT, then on the way out, NAT may rewrite the source port of the packet. So you may want to check your NAT implementation to make sure that it is not doing anything particularly stupid.
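The arithmetic of the fix is easy to show: randomizing the source port multiplies the two guess spaces together. Note that usable ephemeral ports are somewhat fewer than 65,536, so "around about 32 bits" is an upper bound rather than an exact figure.

    TXID_BITS = 16    # the transaction ID
    PORT_BITS = 16    # the randomized source port, at best

    print("before the patch:", 2 ** TXID_BITS, "combinations to guess")
    print("after the patch: ", 2 ** (TXID_BITS + PORT_BITS), "combinations to guess")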
There are a couple of ways your attacker could trigger the attack. As a customer of the service provider, I obviously have a recursive DNS server available to me, and I can keep sending queries. Some service providers, including some in New Zealand, have open resolvers, so anyone can do it: you send packets to the DNS server and it will happily go away and keep making requests on your behalf.
Of course, even if you close those holes, phishing websites and cleverly built pages can load images from many hostnames on a single page, much like Nathan's IPv6 testing, forcing the DNS server to resolve them all; you could potentially send e-mail into a corporate network and fire it off that way. The users trigger all your DNS look-ups for you and you poison the cache from there. So, that's basically how cache poisoning works. That's all well and good, and most people go, great, why does that matter? Why is that a big deal? The biggest problem is having an attacker in the middle. If I control where your DNS goes, I can say that APNIC's web server is at my IP address. What I can then do is perhaps give you a webpage that says, I'm sorry, we're down, or give you a webpage that looks exactly like APNIC's website and says, please log in.
If we poison the cache at the ISP, it now directs all of the ISP's customers to me. I ask, what's your log-in and password, and they'll give it to me, because they don't see any difference at this stage. Sure, here are my details.
Some banks use two-factor authentication. That's going to help a lot, but I can go and request the real webpage from the bank, because I know the real IP address, collect the two-factor authentication from you along with your password or your passphrase, pass it on to the bank, authenticate, and take your money. Some say, use HTTPS, which has all the host verification, and people won't be fooled. This is the error message my bank gives me, and in this situation, users can just continue; they won't care. It actually tells you exactly what is happening: it says you might be connecting to a website which is pretending to be your bank, which could put your confidential information at risk. OK, continue, I want to get on with my banking, thank you very much. I don't care, make it work. So I wouldn't rely on that.
You could redirect your users to a malware site; this happened to an ISP in China using this attack last week. Any user of that ISP who typed Google.cn got redirected to a site that looked like Google but was in fact serving malware, a nice way to do it. You could mount a denial-of-service attack: I could take a competitor's website down by poisoning caches, so users go to the website and it doesn't work.
I could create fake SPF records so that a company can't send e-mail anywhere, because I've said their mail can only come from some random IP address. Perhaps that's a bit of a stretch, but these are real and possible attacks. I could create a poisoned entry for Google.com and send everyone here to my target, who is on DSL, and they'll collapse under the load. It's an effective way to throw traffic at someone.
So, that's all fine for e-mail and the web, but there are other protocols we use that rely on DNS. They assume that it is working, so why should I care? If you gather log-in details with a fake FTP server, users won't know any different; they'll give you the password. OpenSSH does a really good job of warning you if it thinks there's someone in the middle. It will give you a huge warning and won't continue; it will just stop dead in its tracks and say, nope, this is broken, and it won't let you do anything else. That's good behaviour.
And potentially VPNs: if you don't do any host validation, you think this is all nice and easy, you connect to your VPN, give it your password and start using your secure services, but you're talking to an attacker, you don't know any better, and they're using your credentials for another attack. The big risks are the resolvers that ISPs run for their customers; that's the ISP that you want to...
(Technical Fault - one minute duration 1735)
...you can't do it on your own. You can do the work yourself, but if other people don't operate DNSSEC, it won't do any good. You're sitting on your own little island, and the other people out there, like Google, are sending back unsigned answers; yes, you can still poison those. So, how do we solve this? We upgrade the DNS server software and make sure that it is up to date. You can limit the exposure by making sure that your resolvers will only answer questions for your customers; there's no reason why your DNS server should answer questions for people who aren't your customers. And perhaps we need attention on protocol design. We need to increase the transaction ID from 16 bits to 32 bits, or perhaps we need to look at a better way through this, because if the authority disappears and you keep sending queries, you can keep running the race as long as you need to. It will take a huge amount of time, but it will still work.
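Restricting recursion to your own customers is normally a configuration matter in your resolver software, but the underlying check is simple enough to sketch in Python; the prefixes here are documentation ranges standing in for an ISP's real customer blocks.

    import ipaddress

    CUSTOMER_PREFIXES = [ipaddress.ip_network("192.0.2.0/24"),
                         ipaddress.ip_network("2001:db8::/32")]

    def allow_recursion(source_ip):
        addr = ipaddress.ip_address(source_ip)
        return any(addr in prefix for prefix in CUSTOMER_PREFIXES)

    print(allow_recursion("192.0.2.10"))     # True: one of our customers
    print(allow_recursion("198.51.100.7"))   # False: refuse; don't be an open resolver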
So, it sounds like the old problems we had in the 1990s with the horrible cache poisoning, but it is a clever combination of those old problems. DNSSEC will help; it's worth going down that path, but it will only work if we all go down the path together.
And perhaps you shouldn't rely on DNS as the single source of truth for who you connect to and how you trust them. Perhaps you should have strict host checking, for example.
So, finally, I want to show a little video. Earlier on, I mentioned that you can go and check for yourself to see if your server is vulnerable. What's been happening since then is that the website has been collecting all that information, and they've made a little video that shows which servers have been tested in each location. The red ones are servers which are unpatched; the yellow ones are patched but behind NAT, so potentially not very helpful; and the green ones are patched, and we're happy about that. You'll see some in New Zealand that were unpatched; I checked some open ones which were still unpatched. There was a huge flurry of activity in July; people were very excited about this. If your region is still red, then yes, there's work to do. That's where we stood on the 22nd. So, that's all from me. Any queries?
ERIK KLINE:
Concerned citizen. Forgive my ignorance, but is TCP also an adequate defence? If you don't do UDP queries, would TCP be fine?
SAM SARGEANT:
Some people decide, we'll filter TCP, and they just filter it away. There are a huge number of servers out there that you can't contact by TCP.
ERIK KLINE:
So you couldn't even have long-lived connections that you don't have to set up again; that would probably not be allowed.
HEATHER SCHILLER:
Heather Schiller from Verizon Business. Aren't we being lulled into a false sense of security with the patch? It just requires more patience to compromise.
SAM SARGEANT:
Absolutely; some of the protocol design needs to be reworked.
DAVID WOODGATE:
David Woodgate from Telstra. Just a quick question: do you think that DNSSEC will ever happen, given that it has been ten years at least?
SAM SARGEANT:
Definitely. I can report that, recently, it was said that, yes, we're going to sign the root, and yes, it is going to happen in the next 12 months. So the root will be signed soon. It is certainly not the same scale as v6, but there's a cost to deploying it, so it is just a matter of people getting on with it and understanding why you need to do it.
DAVID WOODGATE:
So probably another ten years? That seems consistent with a 20-year deployment.
SAM SARGEANT:
Or there's a giant problem and we say oh, perhaps we need to fix this.
PHILIP SMITH:
No other questions for Sam. Oh, one more.
MARK FOSTER:
Mark Foster with New Zealand Defence Force. It was interesting, the comment about DNSSEC; today I received a heads-up that the United States Government is mandating that all must have... (INAUDIBLE)
So I suspect that some people are standing up and paying attention. I guess it is up to the rest of us, in the areas where it is not mandated by Government, to take the appropriate measures ourselves.
PHILIP SMITH:
All right, thank you very much for that, Sam. As I say, we're running a little bit late. I would like you all to thank our speakers that we had for the APOPS session this afternoon and thank you to my colleagues, the SIG chairs and the APNIC Secretariat for helping to put together this agenda.
One final announcement: I was just notified a few moments ago that the MRTG graphs are now available if you go to the APOPS section under the program. The hard-working staff, Jonny and so forth, have put together this page, basically showing the traffic load on the network at various times in the day. So, that's the final announcement I want to make.
OK, so the lightning talks will start at 6:00, that's about 16 minutes from now. A reminder that there is going to be a finger buffet at 6:30; the food will be brought into the back of the room here, so during the lightning talks, I guess that's going to require some skilful chairing from George. A reminder also for the social event: the buses start leaving at 6:50pm, the last bus departs for the village at 7:10pm, and the buses start leaving again at 10:20pm to come back to the Convention Centre.
I think that's all I have to say. Thank you to the presenters, thank you for listening, and thank you to the stenographers for trying to keep up with us for most of the afternoon. We'll see you all later on.
(End of session)
APPLAUSE