I thought about that, and just thought it read better the other way, but also I'm the oldest 40-something on Hacker News so "class C" and "class B" might just be more vivid for me than for normal people.
If there is something with a socket of any sort that you can't do on Fly, I'd like to know about it, so I can make Kurt write some BPF code for a change.
In the spirit of nitpicking, class A, B, and C referred to specific address blocks, not network sizes. A /24 in the class A range was still class A rather than class C.
I worked on network/firewall at a rather large bank from 2009-2019 and we used the class labels and "slash 24s" etc. interchangeably when talking. Not that you're incorrect, but that the slang was used and everyone knew what people meant by it.
If I were to run a container that handled tcp connections, is there any sort of max connection duration?
One of the projects I have worked on involved accepting tcp connections from cellular connected race timing boxes, dealing with their protocol and sending the resulting data into another system.
I guess you could think along the lines of syslog or MQTT.
This isn't very bandwidth or CPU intensive, but does require keeping the sockets open for hours at a time.
This has come up before! If you like, you can hit me up at thomas@fly.io and I can take a whack at telling you how well this'll work out-of-the-box. What I can say is that "ordinary" TCP connections at Fly run through our Anycast proxy infrastructure, which does expect that it can bounce connections whenever it needs to --- but we can route around that for long-term connections, which is a thing that people doing, for instance, WebRTC want.
From my understanding, WireGuard makes it so that all traffic between two physical nodes shares the same 5-tuple. In addition, neither ECN nor DSCP bits are copied into or out of the encapsulated packet. Do you ever run into trouble with ensuring your flows are well balanced across ECMP?
> ...when Fly.io launched, it had a pretty simple use case: taking conventional web applications and speeding them up, by converting them into Firecracker MicroVMs that we can run close to users on a network of servers around the world.
I'm a little confused as to why you don't route a GUA /128 to each container (out of a per-customer GUA /64, say). If this is from an address range you own, you can filter and drop packets at ingress/egress from the customer address space in case traffic is somehow improperly routed (not via the WG tunnel). As containers come on- and offline, you can distribute this state through the system, and if a user reaches an IP that you don't know to be assigned, you can consult an oracle as to its location.
That was the first design I considered. The "distribute the state through the system" thing is the hard part.
(There's no possibility in our system of routing customer traffic outside of WireGuard, because our entire connectivity fabric is WireGuard; nothing but WireGuard, for instance, would know what to do with an fdaa address, or, for that matter, how to reach an instance's IPv4 addresses.)
We can filter and drop packets at ingress/egress with this design as well.
What's the advantage you see to having assignments out of per-customer /64s?
This sounds quite a bit more complicated than what we're doing, so the answer is "we angrily avoid complexity".
One interesting problem we have is that apps span regions, so the less information we have to propagate when VMs come up, the better. The 6PN routing is already established. When a VM boots it's available immediately because there's nothing to propagate.
I love the writing style here. It seems to have the passion and perspective of a founder but is super technical. I’m curious how writing like this gets executed at a company like this. What’s the workflow?
This is from a super technical founder, so we're lucky! The process isn't much of a process. Extracting this type of content from other technical folks is a nut we haven't cracked yet.
The workflow is literally: tptacek writes a post, we fix inevitable typos, and then we merge it.
I guess there's no reason you can't have defense in depth, but I'm wondering if this private virtual IPv6 network for your whole organization might end up being considered "secure enough" by the complacent? Like being behind the firewall in the old days.
Careful; "organization" here is just a Fly.io account term. You can have multiple organizations, and segment however you'd like. What we're doing is analogous to AWS VPCs, just with IPv6 and static routing. (Probably, we should revisit the terminology).
> Our WireGuard mesh sees IPv6 addresses that look like fdaa:host:host::/48 but the rest of our system sees fdaa:net:net::/48.
Is it not feasible to make wireguard more flexible with a bit mask? The annoying thing with having to use 1:1 NAT in a packet pipeline (which is what you’ve done here) is that logs, metrics, etc get all screwed up for correlation because wireguard doesn’t see the same things everyone else does.
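The rewrite being discussed — swapping one /48 for another while leaving the low bits alone — is a pure bit operation. A minimal sketch in Python with the stdlib `ipaddress` module (the function name and addresses are illustrative, not Fly's actual code):

```python
import ipaddress

def rewrite_prefix(addr: str, new_prefix: str) -> str:
    """Swap the top 48 bits of an address for another /48,
    keeping the low 80 bits: the 1:1 NAT described above."""
    old = int(ipaddress.IPv6Address(addr))
    new = int(ipaddress.IPv6Network(new_prefix).network_address)
    low_mask = (1 << 80) - 1  # everything below the /48 boundary
    return str(ipaddress.IPv6Address((new & ~low_mask) | (old & low_mask)))

# A host-indexed address as WireGuard sees it, rewritten to the
# network-indexed /48 the rest of the system expects:
print(rewrite_prefix("fdaa:0:2::3", "fdaa:0:18::/48"))  # fdaa:0:18::3
```

This is why logs and metrics disagree across the boundary: the 5-tuple WireGuard observes has different top bits than the one everything downstream of the rewrite observes.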
I guess putting wireguard in the kernel wasn’t such a hot idea after all?
It's totally feasible; I could probably write the PR for that in a day, and I'm not a kernel dev. But it's an oddball feature, and one of the big reasons WireGuard works in the kernel is that it's deliberately tiny.
Think about how big the BPF program is that implements the switcheroo we do to get around this. It's not, like, a big ask.
> To lock an instance into a 6PN network, all we really need is a trivial BPF program that enforces the “don’t cross the streams” rule: you can’t send packets between different 6PN prefixes.
Are you using eBPF for this egress filtering? My (hopefully mis)understanding is that XDP programmes, the higher-performance cousin of eBPF, can only work on ingress packets currently.
> The unlucky bit is WireGuard. XDP doesn't really work on WireGuard; it only pretends to (with the "xdpgeneric" interface that runs in the TCP/IP stack, after socket buffers are allocated). Among the problems: WireGuard doesn't have link-layer headers, and XDP wants it to; the discrepancy jams up the socket code if you try to pass a packet with XDP_PASS. We janked our way around this with XDP_REDIRECT, and Jason Donenfeld even wrote a patch, but the XDP developers were not enthused, just about the concept of XDP running on WireGuard at all, and so we ended up implementing the worker side of this in TC BPF.
If only fearsome-bagel is supposed to access serf, can serf do a reverse lookup on the source IP and then make sure it matches regexp /^fearsome-bagel-\d+\.internal$/?
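A sketch of that check in Python, with the classic caveat that a reverse lookup alone is spoofable by whoever controls the reverse zone for the source address; you also want to forward-confirm that the returned name resolves back to the IP (FCrDNS). Names and the helper are illustrative:

```python
import re
import socket

ALLOWED = re.compile(r"^fearsome-bagel-\d+\.internal$")

def peer_allowed(ip: str) -> bool:
    """Reverse-resolve the source IP, check the name against the
    allowed pattern, then forward-confirm the name maps back to ip."""
    try:
        name, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not ALLOWED.match(name):
        return False
    try:
        addrs = {info[4][0] for info in socket.getaddrinfo(name, None)}
    except OSError:
        return False
    return ip in addrs
```

Note the anchors in the regexp matter: without `$`, `fearsome-bagel-1.internal.attacker.example` would pass.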
Fascinating — I have considered something similar. How are you dealing with the difficulties for core load balancing and ECMP that WG's information hiding creates?
Sorry: The above was meant as a reply to this question [0].
> I love the writing style here. It seems to have the passion and perspective of a founder but is super technical. I’m curious how writing like this gets executed at a company like this. What’s the workflow?
Well, 6PNs are effectively VPCs, and within-org ACLs could be analogous to security groups - not every app within an org needs to be able to talk to every other app, but there are no clear "islands" of communication within the graph, so using separate orgs for distinct communities of apps isn't a tenable strategy. Segmenting the bottom bits into app ID / app instance ID and then doing more shenanigans at the WG routing layer would provide you with some of this functionality. This is something I am thinking about for my employer (Netflix), which does not offer a public PaaS, but is also trying to scale and secure a big flat network.
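The encoding being suggested is simple to sketch: pack an app ID and instance ID into the low bits under the org's ULA prefix, so an ACL check at the routing layer can recover the app from the address alone. A hypothetical layout (24 bits each, under a made-up org /48 — not Fly's actual scheme):

```python
import ipaddress

ORG_NET = ipaddress.IPv6Network("fdaa:0:18::/48")  # hypothetical org prefix

def instance_addr(app_id: int, instance_id: int) -> ipaddress.IPv6Address:
    """Encode app and instance IDs into the low 48 bits of the org
    prefix (24 bits each -- an illustrative layout)."""
    assert app_id < 2**24 and instance_id < 2**24
    low = (app_id << 24) | instance_id
    return ipaddress.IPv6Address(int(ORG_NET.network_address) | low)

def app_of(addr: ipaddress.IPv6Address) -> int:
    """Recover the app ID, e.g. for a security-group-style ACL check."""
    return (int(addr) >> 24) & (2**24 - 1)
```

With this, "may app 7 talk to app 9?" becomes a lookup keyed on two integers extracted from the packet header, with no state to propagate when instances come and go.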
I do believe that the dual goals of network identity and scalability are difficult to solve for simultaneously (in the old world we relied on subnets, but those are the opposite of multi-tenant container hosts), so I think we want simple physical networks with routing, and then some other means of trusted network identity, which leads you to crypto-derived solutions. (Really, it seems like WG and IPsec are the biggest contenders here, and both have pros and cons.)
Anyway, seeing your post was validating about this line of reasoning and I am curious to follow along to see how you are thinking about it more.
You're not wrong. The article is trying to do two things at once: convey to Fly users a new feature we launched and what to do with it, and also talk to the Internet about how we build stuff just as a part of, like, a broader tech industry conversation. The two goals ask for different tones, and they clash.
The right thing to do would be to write two blog posts, but there's no better way to get me not to write something than to tell me I have to write two somethings.
Tons of PaaS providers are no help at all if you want to run an app that uses ZeroMQ or basically anything that isn't HTTP.
One super small nitpick though, and maybe it's just me, but
> Technically, what we end up delegating to each instance is a /112, which is the IPv6 equivalent of an IPv4 Class B address
> WireGuard peers get /120 delegations (the equivalent of an IPv4 class C)
Should be
> Technically, what we end up delegating to each instance is a /112, which is the IPv6 equivalent of an IPv4 /16 network.
> WireGuard peers get /120 delegations (the equivalent of an IPv4 /24)
Classes haven't really been a thing for almost 30 years.
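The arithmetic behind the correction is just host-bit counting: a prefix of length n leaves 128 − n host bits in IPv6 and 32 − n in IPv4, so the honest equivalence is between prefixes that leave the same number of host bits, not between classes:

```python
def hosts_v6(prefix_len: int) -> int:
    """Addresses under an IPv6 prefix of the given length."""
    return 2 ** (128 - prefix_len)

def hosts_v4(prefix_len: int) -> int:
    """Addresses under an IPv4 prefix of the given length."""
    return 2 ** (32 - prefix_len)

# A /112 leaves 16 host bits, like an IPv4 /16 (the old class B *size*)...
assert hosts_v6(112) == hosts_v4(16) == 65536
# ...and a /120 leaves 8, like an IPv4 /24 (the old class C *size*).
assert hosts_v6(120) == hosts_v4(24) == 256
```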