
Looks neat. When you detect anomalies, how can you tell whether it's the cloud provider or the public internet or a transient peer tho?


For data and control plane, I can determine the issue from the API request/response logs (e.g., network timeout, 5xx). Network tests are trickier, and we don't have a great way to validate the failure cause for each of those events (for example, we don't capture a traceroute on failure), other than to evaluate results from multiple endpoint combinations (e.g., AWS us-east-1 to us-west-1 fails while us-east-2 to us-west-1 succeeds).
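To make the multi-endpoint comparison concrete, here is a rough sketch of the idea. The function name, endpoint labels, and data shape are all hypothetical and not the site's actual code; it simply flags endpoints for which every observed test failed.

```python
from collections import defaultdict

def suspect_endpoints(results):
    """Return endpoints for which every test involving them failed.

    `results` maps (source, destination) -> True if the test passed.
    This is an assumed data shape, used only for illustration.
    """
    total = defaultdict(int)
    failed = defaultdict(int)
    for (src, dst), ok in results.items():
        for endpoint in (src, dst):
            total[endpoint] += 1
            if not ok:
                failed[endpoint] += 1
    # An endpoint is suspect only if it never took part in a passing test.
    return sorted(e for e in total if failed[e] == total[e])

results = {
    ("aws:us-east-1", "aws:us-west-1"): False,  # fails
    ("aws:us-east-2", "aws:us-west-1"): True,   # succeeds
}
print(suspect_endpoints(results))
```

This only narrows down which endpoint is common to the failures. It can't say whether the cause is the provider, the path in between, or a transient peer, which is why a traceroute on failure would still help.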


For networking, the site only reports uptime % for zonal, regional, cross-region, or cross-cloud tests. It excludes last-mile network tests, as those fail frequently due to many hops and endpoint unreliability (we use RIPE Atlas and Globalping.io endpoints, which are not always reliable even with redundant probes per test).
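As a simplified illustration of that filtering, assuming each test result is a (scope, succeeded) pair and that uptime is the share of passing tests (the scope labels and the count-based formula are assumptions; the real calculation may be time-weighted):

```python
# Scopes included in the reported uptime figure (labels are hypothetical).
REPORTED_SCOPES = {"zonal", "regional", "cross-region", "cross-cloud"}

def uptime_percent(tests):
    """Percentage of passing tests, ignoring scopes outside REPORTED_SCOPES.

    `tests` is an iterable of (scope, succeeded) tuples.
    Returns None when no test falls in a reported scope.
    """
    counted = [ok for scope, ok in tests if scope in REPORTED_SCOPES]
    if not counted:
        return None
    return 100.0 * sum(counted) / len(counted)

tests = [
    ("zonal", True),
    ("regional", True),
    ("cross-region", False),
    ("cross-cloud", True),
    ("last-mile", False),  # excluded from the reported figure
    ("last-mile", False),  # excluded from the reported figure
]
print(uptime_percent(tests))
```

Leaving last-mile results in would drag the same sample down to 50%, even though the failures there mostly reflect probe and hop unreliability rather than the cloud network.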



