‘You can't use Brotli for dynamic content’ and other misconceptions (certsimple.com)
136 points by nailer on June 26, 2018 | 80 comments


We just tried brotli and decided not to implement it yet.

One of our devs just returned from a conference super excited about brotli. I'm the ops guy and want to support that excitement, so I spent the better part of 2 days setting up an experiment, but I was skeptical. I'd been keeping my eye on brotli for years.

I first went out and looked for some existing evaluations of brotli, and found years-old write-ups: one from Cloudflare from 2015, and another whose details I can't remember. They both concluded that brotli wasn't ready yet, but they were also old...

Our stack uses Apache on Ubuntu 16.04, and the first stumbling block was that the Apache version on 16.04 is just a bit older than the one where mod_brotli was added. So I used Ansible to spin up an 18.04 web server, which did have the right version of Apache. But that Apache is built without brotli support, so I ended up having to build my own custom packages, which went fairly smoothly.

We have one particular asset that right now we would like to speed up: a big JSON data structure. The dev did some testing and found that his 350K (gzipped) data set came out as 320K with brotli.

Maintaining our own custom Apache packages, and possibly having to upgrade our entire stack from 16.04 to 18.04, plus resolve known problems with OS packages that we've seen in 18.04, for a <10% compression gain, for our cherry-picked "best example for brotli wins"?

Our decision was that if we could have just turned it on via changes to the Apache configs, we would have done it. But as it is, it's not worth it for us to do.
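For the record, the change we were hoping we could just make looks roughly like this (an untested sketch, assuming Apache >= 2.4.26 built with mod_brotli; directive names as in the mod_brotli docs):

```apache
# Load the module (the path varies by distro build)
LoadModule brotli_module modules/mod_brotli.so

<IfModule mod_brotli.c>
    # Compress common text types on the fly for clients that
    # send Accept-Encoding: br
    AddOutputFilterByType BROTLI_COMPRESS text/html text/css application/json application/javascript
    # Moderate quality for dynamic responses (the module's default is 5)
    BrotliCompressionQuality 5
</IfModule>
```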

YMMV.


Cloudflare now supports Brotli compression through their Dashboard (it's in the Performance tab). We've been using it for at least a month and haven't had any issues with it.

We didn't do any before/after testing to see how much data it saved our users but the small tests we've done in the office showed around a 2-5% size reduction for our static assets which was great to see for all the effort of toggling a switch.


I understand this isn't because of some evil plot or crazy corporate machination, and I know these are different teams, but it's interesting how Google (which spends a lot of resources on developer relations and web performance education) created Brotli, released it, and then mostly stopped advocating for it.


Maybe because Brotli was mainly for the WOFF2 font format, so mission accomplished?


None of that has anything to do with brotli. Blame the distro or the package maintainer.


Does it matter? The OP's evaluation methodology, for better or worse, is the one a big segment of the population uses to evaluate new technologies. It's critical to make sure that packaging is smooth, defaults are set reasonably, and tutorials are up to date --- otherwise, nobody's going to use your stuff no matter how good it is.


His point is that brotli's advantages aren't big enough to justify it until it becomes ubiquitous. Compression improvements of less than 10% just don't justify much effort to get ahead of the curve (in most situations).


Part of Operations is weighing new and shiny features versus the maintenance cost of supporting them. You can make an amazing amount of work for yourself by deciding you are going to run the latest and greatest, and becoming the maintainer of your own custom distribution. My shop is on the small side for deciding to bite that off though, so we try to leverage Ubuntu's expertise in doing that whenever possible.


2.5 is the development branch. That should not be used in production.


The versions I was speaking of: Ubuntu 18.04 has Apache 2.4.29, Ubuntu 16.04 has 2.4.18; mod_brotli was introduced in 2.4.26.


The problem you're describing is more about your choice of stack than brotli. Nginx solved this problem with dynamic modules. I would argue that compiling your own software is a core part of running servers, and a part of running any web service for which you develop custom software. It should be an automated process with release schedules the same as anything else, because not knowing how to build your stack will become painfully obvious when the 'official' releases have a problem.


Apache has dynamic modules too, and they're generally better designed. In nginx they were added later and a module compiled against one version of nginx won't work against any other. With Apache you could download a mod_brotli binary module and use it with any (2.4) Apache build.

(I used to work on mod_pagespeed)


That's what I had hoped for, but I didn't come across any stand-alone builds of mod_brotli, only the one included with 2.4.


We're going to have to agree to disagree there. Leveraging other people's work in building an LTS is money well spent, especially when you can get it for free. Managing your own custom builds is a fairly big job when done right, even when automated.

I know how to build my stack, it's encoded in Debian packages all around the Internet.


Cloudflare's analysis divides size difference by time difference. That ratio cannot be used for meaningful comparisons, which leads to faulty numbers and faulty analysis.


Speaking of Cloudflare, they do brotli encoding for you at the free tier. Not sure about other CDNs, but I'm sure it's a matter of time.


It makes sense at the scale of a CDN, but I can't see it being worth it for anyone else (re: other threads here).


If you want to attempt building nginx with brotli support, let me save you 15 minutes: Google's original module is more or less abandoned. Look here instead: https://github.com/eustas/ngx_brotli

Another alternative is exploring h2o/caddy as a web server replacement.


(deleted, I see your point now)


But that PPA is using Google's repo as upstream; what's the point of the OS alerting you to updates, if there are none?


This still relies on a special build of Nginx (from the PPA repo) instead of LTS Nginx, although I agree that it's already more convenient than unpackaged sources.


The CertSimple article completely ignores CPU and memory usage. That is where Brotli can become completely unacceptable for dynamic compression – it uses vastly more CPU and RAM than gzip.

Brotli is fine if you can precompress static files on a prep server or desktop.

It's unprofessional and unscientific to evaluate compression codecs without measuring how much CPU and memory they use. Brotli is a disappointment, coming 20 years after gzip. In 20 years we should expect something better than Brotli, something that is better optimized for modern CPUs, vectorized by default, and with a much better dictionary.


CPU usage (but not memory usage) seems to be addressed by the section "Brotli can compress faster than gzip and still produce smaller files", quoting https://blogs.akamai.com/2016/02/understanding-brotlis-poten...?



I find zstd to be almost twice the speed of brotli when compressing files (my simple test case is a 55M directory of small files - about 2k each)

However, zstd doesn't have the browser support that brotli does. So you can't easily replace brotli with zstd for web use.


Zstd compresses 5% less densely. Zstd compresses slower to a given density. Decoding zstd may use 128 MB of RAM whereas Brotli may use up to 16 MB.


Isn't it like within 10% of the default gzip settings in every case?


FWIW, at Cloudflare we've been running brotli for dynamic content for a while now. However, the Cloudflare gzip library is much faster than brotli. You can find a benchmark here: https://blog.cloudflare.com/arm-takes-wing/


Why not standardize zstd? It seems to beat brotli in ratio and comp/decomp speed based on the benchmarks below:

https://facebook.github.io/zstd/


Just get browsers to implement it. I think by the time Zstandard was open-sourced, there was already some browser support for Brotli (probably because Google controls a major website, a major browser, and the standard library for a major OS).

Timeline: Zstandard was released August 31, 2016. By that point, Firefox, Chrome, and Opera already had Brotli support. Safari and Edge got it shortly afterwards.


I think browser support for Brotli is mostly a side effect of supporting the WOFF2 font format (which uses Brotli).


One reason might be that browser manufacturers are not keen to have a decompressor that might allocate 128 MB per connection.


Always look for third-party benchmarking. High-quality benchmarks can be found on the lzturbo site or on encode.ru.


Worth noting for backend engineers: every major browser supports Brotli compression, and 85.22% of users run a browser that supports it[1].

[1] https://caniuse.com/#feat=brotli


I wish it wasn't, but IE11 is still a major browser, at least too big to just dump completely in my experience. The UK is worse than most countries in that regard though with higher than average IE11 usage.


Indeed, but this is not a problem for this enhancement specifically. IE11 asks for gzipped files, and the server can still send them when the client doesn't support Brotli, based on the Accept-Encoding header.
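A rough sketch of that negotiation (not any real server's implementation; real servers also honour q-values in Accept-Encoding, which this ignores for brevity):

```python
# Pick Brotli only when the client's Accept-Encoding header advertises it,
# otherwise fall back to gzip (or no compression at all).
def pick_encoding(accept_encoding: str) -> str:
    offered = {token.split(";")[0].strip() for token in accept_encoding.split(",")}
    if "br" in offered:
        return "br"
    if "gzip" in offered:
        return "gzip"
    return "identity"

print(pick_encoding("gzip, deflate, br"))  # a Brotli-capable browser -> br
print(pick_encoding("gzip, deflate"))      # IE11 never sends 'br' -> gzip
```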


It's kind of sad that more people use Internet Explorer 11 than Edge…


9 IE users for every 1 Edge user in our stats. Compared with 9 Win7 users for every 7 Win10 users.

We serve small to medium sized businesses for our customer base in NZ.

Although I am not surprised: the Edge UI is buggy and the browser engine has its problems with sites too. And we have put a lot of work into ensuring Edge works properly for our clients.


Mirrors our experience: https://blog.cloudflare.com/results-experimenting-brotli/ but we chose to use Brotli level 5.


Does Cloudflare pass a dictionary when compressing for your clients? Would be interesting to hear about excursions in testing dictionary sizes/variations at that scale.


I'll suggest to someone internally that they write up all the details of how we support Brotli.


Edit: the title was changed again to its current form (a moment ago it was 'brotli can be smaller and faster than gzip'). Thanks HN mods.

Original post below.

>

Author here. This was submitted with:

    'You can't use Brotli for dynamic content' and other horses**t (2018)
Odd that the title was changed; you can see from the timestamp that I spent yesterday updating it for 2018 (brotli is more of a big deal now that iOS users can use it). Also, the original title highlighted the very specific point that a lot of what is written about brotli online is incorrect. Most HN users are grown-ups and can handle a starred-out swear word. Oh well.


I never understood the starring out thing. If the forum is inappropriate for that word, then using stars is also inappropriate.


I like your title much better than the title it was replaced with.

The title as it stands now is meaningless, because it applies equally to nearly any pair of compression algorithms.


OK, we've put the title back closer to the original. “Horseshit” (we don't need to bowdlerize) is clickbait, so we've gone with “misconception”. Happy to update again if there's a better word.


Brotli is amazing. But...

> Brotli can compress faster than gzip and still produce smaller files

Huh?

> For dynamic content, we'll use 4, which still produces smaller responses but takes less time to compress than gzip or brotli on a higher setting.

Oh if you compare to gzip max. But who runs gzip max? Gzip 6 gives decent low cost compression. Gzip 9 is barely smaller but much slower.

Algorithms have their sweet spots: Brotli and LZMA (small), LZ4 (fast), gzip (somewhere in between).

Knowing nothing about my clients, bandwidth, or dynamism, I can turn on gzip without really a second thought.
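For instance, a quick stdlib check of the level curve (zlib implements the same DEFLATE algorithm and level scale as gzip; the payload is arbitrary):

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 2000

# Levels 1/6/9 trade CPU for density; the 6 -> 9 gain is typically tiny
for level in (1, 6, 9):
    print(f"level {level}: {len(zlib.compress(data, level))} bytes")
```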


This reminds me that I have to develop and send a PR sometime for Axios to support Brotli. We turned brotli on and got only garbage, because Axios itself also needs to understand how to handle brotli: https://github.com/axios/axios/blob/cb630218303095c0075182b5...


> Modern websites in particular often have large JavaScript bundles - the front page of CertSimple is 242 K gzipped, and would be 1.1MB uncompressed!

What puzzles me though... How much of this 1.1MB is really needed? You can do form validation with HTML5. I don't see any other candidate on their front page. So instead of getting a 14% advantage, what about pushing it to 100% or so?


HTML5’s built-in validation is actually not all it’s cracked up to be just yet, because the currently defined and supported CSS selectors don’t cover all cases needed: you don’t want all your fields to be styled invalid on page load, only once the user has interacted with them or tried to submit. Firefox has :-moz-ui-invalid, I don’t think others have anything, and :user-invalid is the direction Selectors Level 4 is going, see https://drafts.csswg.org/selectors-4/#user-pseudos.

If you want a pleasant experience, you can’t use HTML form validation by itself. You need some added JavaScript.

(I am not defending over a megabyte of JavaScript. I like to do such things from scratch and optimise for size, and all the functionality that I see on that page looks like it’d comfortably fit in under ten kilobytes of JavaScript. The CSS can probably also be decimated. But these things only happen that way if you value such performance, and few do—perhaps justifiably, perhaps not.)


So you looked at their front page and concluded that they should be able to get rid of all javascript?

CertSimple is a full dashboard and application that does a lot of crypto in the browser which requires quite a lot of code.

Sure, there are probably gains to be had with code splitting and lazy loading, but those get significantly more complex and can cause issues for codebases that aren't set up to take advantage of it from the start.

And improvements are improvements. This was a blog post showing you how and why to implement brotli, not a request for how to improve page loading speed for their application specifically.


I looked at the front page because they specifically mentioned it in the article:

> the front page of CertSimple is 242 K gzipped, and would be 1.1MB uncompressed!

Of course the full app is a completely different thing.


> Of course the full app is a completely different thing.

Author here: no. The app is the front page.

/blog is a separate AMP-based site and tiny. Search 'discify' in this thread for a breakdown of what's in the module.


For those who wish to put Brotli support on Microsoft's radar: https://windowsserver.uservoice.com/forums/310252-iis-and-we...


>Modern websites in particular often have large JavaScript bundles - the front page of CertSimple is 242 K gzipped, and would be 1.1MB uncompressed!

Mike, I’m afraid you have bigger problems than inefficient compression...


CertSimple is a fairly complex web app with a bunch of Ractive components that use WebCrypto (requiring ASN.1 and PKI libraries, etc.) and URLs (using a WHATWG URL polyfill), which takes up a large chunk of the bundle.

We use https://github.com/131/discify to analyse this and should be able to cut down a little further once the URL spec is more widely implemented.


If you're interested in analysing CertSimple performance, removing the artificial 8s delay on the CertSimple blog might be another line of inquiry.

    animation:-amp-start 8s steps(1,end) 0s 1 normal both


Blog performance is also decent - on all platforms (including desktop), you're viewing an AMP page, so the only js is AMP itself.

I don't know specifically what amp.js is doing there with the animation but since the entire purpose of AMP is performance, I'm sure there's a solid motivation behind it.


It's so users don't get to see the horror that's the non-JS-ified view of the page before the AMP scripts kick in. Pretty bad reasoning for something so annoying in my book.


> I don't know specifically what amp.js is doing there with the animation

The snippet I posted is a small bit of inline CSS at the top of your page that hides all page content for 8s. There is no JS involved, and the mechanism is quite clear if you know CSS.

> the entire purpose of AMP is performance

I'm afraid you are sorely mistaken. (See the comments on any AMP-related HN submission if this is news to you).

> I'm sure there's a solid motivation behind it.

What makes you so sure? The motivation is to artificially degrade the experience for any users that choose to block 3rd-party AMP tracking scripts. There is no other reason behind it.

> I don't know specifically what amp.js is doing

Normally, one can overlook people throwing random 3rd-party scripts into their blog with no clue what they do, but for someone blogging about using brotli and commenting about applying discify, this kind of wilful ignorance is a little ironic. Particularly after someone points out that there is an 8s delay in your page load.


> The snippet I posted is a small bit of inline CSS at the top of your page that hides all page content for 8s. There is no JS involved

No, the inline CSS is added by amp.js.

> > the entire purpose of AMP is performance

> I'm afraid you are sorely mistaken (See the comments on any AMP-related HN submission if this is news to you).

I wrote a bunch of those comments. Now Malte is allowing people to host AMP content on their own domains I trust it a little more.

> The motivation is to artificially degrade the experience for any users that choose to block 3rd-party AMP tracking scripts.

AFAIK amp.js doesn't do any tracking on its own - you need an analytics module to do that. Do you know differently?

>> I don't know specifically what amp.js is doing

> this kind of wilful ignorance is a little ironic

Not really. Do you know your CPU microcode well or would you consider yourself wilfully ignorant? Perhaps rather than being wilfully ignorant, I choose to focus my time on what matters to my customers (with a side of arguing on Hacker News).


> No, the inline CSS is added by amp.js.

It definitely isn't. It's in the raw source, sans JS. Whoever developed the blog theme placed it there explicitly. (It is a snippet given to you to copy-paste while implementing AMP, but it serves no purpose other than to degrade performance for non-AMP users.)

> Malte is allowing people to host AMP content on their own domains I trust it a little more.

That's not how "AMP cache" hosting works. AMP is a system whereby the site source is crawled and rehosted on the server of an indexer. You can host it on your own domain to make yourself feel a little better, but the only people who see that copy will be those to whom you give that direct link (e.g. HN readers). Indexers are free to rehost your AMP content on their own "AMP cache". i.e. it's still going to be served from https://www.google.com/amp/s/certsimple.com/blog/nginx-brotl... for any mobile search traffic coming from Google (note the link is referer-dependent).

> AFAIK amp.js doesn't do any tracking on it's own - you need an analytics module to do that. Do you know differently?

You need an analytics module to use an analytics service for your own tracking. There's nothing to indicate that Google do no internal tracking of their own.

> Do you know your CPU microcode well or would you consider yourself wilfully ignorant?

The intent of my comment was to point out that being aware of this issue involves a relatively low level of knowledge/investigation/interest. So low in fact that in my very first comment, I posted the entire 53 character sourcecode of the issue. Also, CPU microcode changes aren't causing 8s page load delays for me.


You're right, it's not from amp.js, it's from https://github.com/ampproject/amphtml/blob/master/spec/amp-b...

From that: AMP HTML documents must contain the following boilerplate in their head tag.
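(For reference, an abridged version of that boilerplate - the real one repeats each rule with vendor prefixes:)

```html
<style amp-boilerplate>
  /* Hide the page for up to 8s; amp.js cancels this once it has loaded */
  body{animation:-amp-start 8s steps(1,end) 0s 1 normal both}
  @keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}
</style>
<noscript>
  <!-- Browsers with JavaScript disabled entirely skip the delay -->
  <style amp-boilerplate>body{animation:none}</style>
</noscript>
```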

> That's not how "AMP cache" hosting works.

With prefetching, my understanding (from what's been written publicly) is that content will be allowed to be hosted on your own domain, i.e. your domain in the address bar. Are you saying otherwise?

> There's nothing to indicate that Google do no internal tracking of their own.

OK so you just disagree on where the burden of proof lies.

> The intent of my comment was to point out that being aware of this issue involves a relatively low level of knowledge/investigation/interest.

You think it causes an 8 second delay. It doesn't. The page loads unbrowsercached in less than a second. The low level of knowledge/investigation/interest still seems to be above your own.


> With prefetching my understanding (from what's been written publically) is that content will be allowed to be hosted on your own domain. Ie, your domain in the address bar. Are you saying otherwise?

No. As I said, you can host it where you want. You can visit that domain and see your content. But Google are still going to rehost your content on their own server and direct their users to their copy, on their domain.

> You think it causes an 8 second delay. It doesn't.

Apologies, I assumed you'd have some understanding of frontend technologies when I posted the code above, so I'll explain how it works:

- you have an inline CSS snippet that hides all page content for 8s (there is no reason for this). This is executed immediately.

- you then have your amp.js file, pulled from Google's server, which should download and parse relatively fast (especially if you've put the script tag in the document head, as is recommended)

- once amp.js is loaded and is parsed, it enables some further inline CSS which overrides the first inline CSS snippet, unhiding all page content instantaneously

As I said above, the 8s delay is for non-AMP users; anyone not loading the amp.js resource from Google's servers, or anyone for whom the resource fails to load for any reason.


> Google are still going to rehost your content on their own server and direct their users to their copy, on their domain.

As long as my domain is in the address bar, and the content isn't modified - which I understand will be the case - I don't care.

> As I said above, the 8s delay is for non-AMP users;

No, that's not what you 'said' (wrote) above.

You wrote:

>> removing the artificial 8s delay on the CertSimple blog might be another line of inquiry.

You wrote:

>> The snippet I posted is a small bit of inline CSS at the top of your page that hides all page content for 8s.

Implying there was an 8 second delay that normal users would see.

Even now you're jumping through hoops to say 'non-AMP users' where in reality you mean a tiny niche of users with broken network connectivity or who deliberately disable JavaScript, etc.

Thanks for wasting my time. If you'd said what you honestly knew - "I dislike this as it won't show the site if users can't load the JavaScript" - I would have said "fine, I don't care". Instead you pretended something was actually wrong, and I wasted time communicating with you and being insulted.

Please do not communicate with me again. This is why HN needs a block button.


Are you doing code splitting? If not, this would be my first target for gains :)


Separate bundles for different parts of the app, but not doing lazy loading or anything like that. Might look at it again if/when ES6 modules ever become popular (I just built a smaller project (https://mikemaccana.com) using ES6 modules, and right now the lack of ES6 modules on npm makes any gains from tree shaking not worth the productivity loss).

Perf is always fun to optimise, but we've got features customers want that we need to work on first!


This together with cache-control: immutable is a great bandwidth saver: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ca...
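E.g. a minimal nginx sketch for fingerprinted files that never change (the `/assets/` path is just an example):

```nginx
location /assets/ {
    # Hashed filenames never change content, so let browsers cache for
    # a year and skip revalidation entirely
    add_header Cache-Control "public, max-age=31536000, immutable";
}
```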


See the Static/Dynamic web content compression benchmark, one of the best benchmarks of web compression, covering brotli plus different gzip-compatible libraries:

https://sites.google.com/site/powturbo/home/web-compression


Afaik combining compression and dynamic content has a tendency to weaken encryption (BREACH).

Compression ratios give information about the contents of a message. Hence, it seems to make sense to only compress static content from a security standpoint.
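A toy stdlib illustration of the principle behind CRIME/BREACH (the page content and "guesses" here are made up): when attacker-controlled input reflected into a response matches a secret elsewhere in the same compressed stream, DEFLATE encodes the repetition as a backreference and the response shrinks measurably.

```python
import zlib

# A page containing a secret token, into which attacker input is reflected
secret_page = b"<input type=hidden name=csrf value=8f2a51c9d7e30b64>"

def response_size(reflected: bytes) -> int:
    # Size of the compressed response: secret + attacker-controlled input
    return len(zlib.compress(secret_page + reflected, 6))

wrong_guess = response_size(b"csrf=zqjxwvkyghmbpdtn")
right_guess = response_size(b"csrf=8f2a51c9d7e30b64")
print(wrong_guess, right_guess)  # the correct guess compresses smaller
```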


Sane defaults matter. Why does brotli do maximum compression by default?


There really is no such thing as a sane default for compression in general. There are three common use cases, and each one demands different tradeoffs. Maximum compression suits compressing once and decompressing many times. Lower compression is useful for compressing and decompressing once. A middle value is useful for compressing once and decompressing rarely. Then you choose different values based on the relative costs of CPU, memory, network bandwidth, disk space, etc.

“Maximum” is good because it handles a common use case. When you’re running a CLI tool, you’re much more likely to be processing static assets which are served many times. So “maximum” is a good guess of what the user wants.


The offline brotli compression tool uses maximum compression by default. The ngx_brotli module uses 6. I think that's reasonable. If you're compressing a file offline, you probably don't have big time constraints, so it makes sense to try to get the smallest size possible.


We chose max as default because Brotli actually compresses more at max setting. Gzip does not compress much more at max than it does at faster middle qualities, so they chose a default from middle qualities.


Is there a good Brotli library for Python?


Have you tried searching Pypi?

https://brotlipy.readthedocs.io/en/latest/


I wish there was a simpler way to use Brotli with nginx: at the moment you either have to compile from source or rely on a PPA to use a special build of nginx (and an extra module) instead of the one from the LTS packages, even in 18.04.

I'd especially like to use Brotli because it has good support in modern browsers (pretty much everything except IE11) and would be a good replacement for gzip (which should be disabled for HTTPS).
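For what it's worth, once the module is built, the nginx side is small. A sketch assuming ngx_brotli built as a dynamic module (directive names as in the ngx_brotli README):

```nginx
load_module modules/ngx_http_brotli_filter_module.so;  # on-the-fly encoder
load_module modules/ngx_http_brotli_static_module.so;  # serves .br files

http {
    brotli on;            # compress dynamic responses
    brotli_comp_level 6;  # the module default; lower = faster
    brotli_types text/css application/json application/javascript image/svg+xml;
    brotli_static on;     # prefer pre-compressed .br files when present
}
```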


> "gzip (which should be disabled for HTTPS)"

?

Please explain.


BREACH and CRIME [0,1] are attacks on the combination of HTTPS and compression.

There are workarounds / mitigations with various effectiveness, but disabling compression when TLS is used is the simple way to prevent the attack.

0: https://en.wikipedia.org/wiki/BREACH

1: https://en.wikipedia.org/wiki/CRIME


I thought the TLS length-hiding extension was supposed to fix this; I wonder why it did not progress. Maybe it is just easier to disable compression for dynamic content.

[0] https://tools.ietf.org/html/draft-pironti-tls-length-hiding-...


The fact that there is no native Java encoder for Brotli is an obstacle to adoption in the enterprise segment.



