Scientific notation bug in MySQL left AWS WAF vulnerable to SQL injection

dragonwriter · on Oct 22, 2021

I thought this was an SQL injection in WAF, turns out it just is a bug in the SQL injection protection provided by WAF, because WAF’s generic SQL injection protection isn’t bug-compatible with MySQL SQL parsing, so that a vulnerable app behind WAF would not be protected by WAF from an attack using the shouldn't-be-valid SQL.

Honestly, I think this highlights a fundamental weakness in trying to put generic SQL injection protection in an external layer rather than just avoiding treating untrustworthy input as SQL in the first place. Conceptually, you need to replicate every protected SQL engine and the manner in which every input might be concatenated to get SQL, or you are going to have exploitable holes.

icecap12 · on Oct 22, 2021

A WAF is obviously not a control designed to be a full-scale replacement for good input validation and development best practices. So yes, its inherent that there are fundamental weaknesses to WAFs.

However, there are still benefits. You deploy a WAF as part of a defense in depth strategy, with one of the best use-cases being situations where you have legacy web systems that nobody is maintaining. Additionally, you can get TLS upscaling, easy HTTP rewrite capabilities, DDoS protection, and other granular controls with some SaaS offerings. So while it's true that a WAF won't stop a determined attacker, there are certainly benefits to operating them, particularly in large enterprise environments.

edit: spelling

freeqaz · on Oct 22, 2021

This is true when you think about it from a security perspective, but it's definitely not always the case. Many people _do_ rely on a WAF to outsource their security. It's often political, from my experience, that security is either punted or not prioritized in the promo process for most SWEs.

It's the unfortunate reality for many companies to just not actually care about security. If you only promote people based on them shipping features but not fixing security issues, then you slowly remove all incentives to care at an individual level. It's just game theory because there are few penalties for not caring about security.

Fortunately, the winds of change are blowing now with new regulations. Society is beginning to force companies to care. We're still far from living in a reality where most companies have a strong security posture with their tech though. It's going to take time and energy for frameworks and development methodologies to catch up (and legacy software to die off or be updated).

Kalium · on Oct 22, 2021

Agreed, it's almost always political. Product has a schedule, developers find it convenient to say "Well, we have a firewall", and now suddenly they can hit the deadline! That they got there by deciding to not validate their inputs isn't important. They can handwave it away as "We make certain assumptions about the traffic coming in over the network" and Product learns a neat trick to ship faster.

I've seen this exact dynamic play out several times. Lots of vulnerabilities were created, and infosec pointing to this only attracted the ire of Product. And angered several Director-grade people in engineering who were spreading the idea that a firewall is a replacement for input validation.

chrisandchris · on Oct 22, 2021

> […] AWS Web Application Firewall (WAF) customers were left unprotected to SQL injection […]

If a WAF is your _last_ line of defense, you‘re doing it wrong.

sebazzz · on Oct 23, 2021

This. It is _a_ line of defense, like so many others. Same as obscurity can also be a line of defense.

collegeburner · on Oct 22, 2021

This is why in my new project I decided to use prepared statements wherever possible, it seemed like simply the best choice because sanitizing is never perfect. Here's the article that convinced me in case anybody else wants to read: https://kevinsmith.io/sanitize-your-inputs/

MattPalmer1086 · on Oct 22, 2021

This is absolutely the best mitigation for SQL injection.

It can also speed up your queries as it allows better caching of the query plan.

It always annoys me when SAST products on detecting injection advise using better input validation. No! Use parameterised queries and be done with it.

illvm · on Oct 22, 2021

Input validation is more general and helps mitigate other types of vulnerabilities, such as overflows and other injection attacks.

MattPalmer1086 · on Oct 22, 2021

I'm all for defence in depth, and agreed there are many other types of injection attack this doesn't mitigate.

Shorn · on Oct 22, 2021

Until someone wants to do something slightly different, can't figure out how to make it work because they've tried nothing and are all out of ideas - so they just use string concatenation and execute the string dynamically. Ask me how I know.

deepsun · on Oct 22, 2021

It still fascinates me why people take human-focused formats (SQL, shell commands in one line) and use them in automated programs. If we'd used application-focused formats, there would be much less room for injections. E.g. MsgPack for SQL statements. If a program does not interpret shell commands, but composes raw ["program", "argument"] and runs it, there's no room for sh-injections.

Same can be said for HTML (it was designed for human writing/reading), but it's too late to change that. But SQL and shell commands can be fixed today in application access.

defanor · on Oct 22, 2021

> It still fascinates me why people take human-focused formats (SQL, shell commands in one line) and use them in automated programs.

I tend to assume that it happens because it is what's available, and possibly easier (at least in some cases, but then the lowest common denominator wins). Unintended and not quite appropriate use of technologies in general seems to often happen that way.

> Same can be said for HTML (it was designed for human writing/reading), but it's too late to change that. But SQL and shell commands can be fixed today in application access.

Well, HTML 5 includes the XML-based syntax, which is quite machine-friendly. And for running programs one can indeed skip a shell and just run it with provided arguments and environment variables when it's done programmatically. While to SQL there's usually no alternatives in relational databases.

Though I think even SQL isn't inherently bad: a complete, specified, and sensible grammar, together with a proper parser, would allow sane composition and processing, and exclude the bugs such as this one.

rini17 · on Oct 22, 2021

The root problem is programming languages don't have the symbolic facilities to construct SQL statement safely, or they are incredibly awkward. Distinguishing keywords, parameters and table names at source code level would allow for strict version of SQL with simple unambiguous syntax throughout.

But nope, let's just concatenate some strings.

zokier · on Oct 22, 2021

What about LINQ?

rini17 · on Oct 22, 2021

Never worked with dotnet so no idea.

kayodelycaon · on Oct 22, 2021

Ideally, a web application firewall is a first line of defense. (If you use one.) It’s effectively an advanced form of security through obscurity. It stops routine attacks, reducing the work of the machines behind the firewall.

If it’s your only protection against SQL injection, you’re going to learn WAF’s limitations at a very inconvenient time.

snvzz · on Oct 22, 2021

AWS WAF might be very useful, but there's only so much you can do when the database is mysql.

Postgresql has been around for a while. There's literally no reason to use mysql except legacy code in life support mode.

paulryanrogers · on Oct 22, 2021

MySQL has undo logs while PostgreSQL has vacuum based MVCC. It's a significant difference for some workloads. And MySQL is closer to standard SQL on some things like PL/PSM.

I personally still prefer PostgreSQL but can't deny the benefits of a well configured MySQL installation.

mrweasel · on Oct 22, 2021

> There's literally no reason to use mysql except legacy code in life support mode.

Multi-master setups? I don't think Postgresql is capable of that yet.

Does anyone actually use/depend on a WAF (AWS or others)? It feeling like a clever solution, but also remarkably like snake oil.

dagw · on Oct 22, 2021

Multi-master setups? I don't think Postgresql is capable of that yet.

It doesn't support it "out of the box", but there at least two different third-party solutions that add multi-master replication to PostgreSQL.

snvzz · on Oct 23, 2021

There's also Amazon's own Aurora in postgresql mode.

Relevant because AWS context (AWS WAF was discussed).

a-dub · on Oct 22, 2021

> The situation is complex. If requests are malformed, it is natural that security products wouldn’t consider them valid SQL, thus making them unnecessary to block.

what?! "this doesn't look like valid sql, so let it pass and have the potentially vulnerable sql server with all the data and creds and tunnels on it parse and reject" sounds ridiculous to me!

isn't the whole point of having a security proxy to check if requests are well-formed, and if they're not, block those requests so that the more complicated implementation in the actual server on the production host is protected from malformed requests that are potentially malicious?

i get it, building a parser and validator that is independent of the one that is in the actual database server, yet implements the same language is hard... but in my mind, the whole point of using some sort of firewall is that it looks for quasi-valid things and blocks them! (where the parser in the database server, and the parser in the firewall, are two independently hardened pieces of software developed from the language spec that provide redundancy.)

edit: ok, i think i see the complexity now. forgot that sql injections are not complete queries, but rather fragments of sql, so they're harder to detect since sql fragments can look so general. (ie; a strict parser for finding them is required otherwise all sorts of reasonable things would get blocked)

isn't this why we have pre-tokenized sql queries? they've been around a long, long time.

gruez · on Oct 22, 2021

>isn't this why we have pre-tokenized sql queries? they've been around a long, long time.

Because it's not a silver bullet. At least for postgres, they only allow parameters with literals. Something like

    SELECT * FROM users WHERE username = :username

works fine, because :username is substituted with a string literal (eg. 'user@example.com'). However, you can't do

    SELECT * FROM users WHERE :column = :value

(to do dynamic lookups), because you can't use :column to select the "username" or "email" column. Of course, using an ORM or query builder can potentially solve this issue, but getting those to work and/or refactoring your code to do so is non-trivial, and it's not guaranteed that they're safe from sql injections either.

a-dub · on Oct 22, 2021

> (to do dynamic lookups), because you can't use :column to select the "username" or "email" column.

that's by design. in 99% of cases you should never, ever do this.

maybe if you're building a database exploration/management/manipulation tool, sure... but for all other cases (including weird manual table sharding schemes), the queries should be literals that are included in static code (possibly codegened if there's too many of them).

under no circumstances should a user string ever be used to construct a column name.

> Of course, using an ORM or query builder can potentially solve this issue, but getting those to work and/or refactoring your code to do so is non-trivial,

orms are really complicated. query builders are probably okay for solving the issue most of the time. if i was building production software that i thought may be targeted, i would either write all the queries by hand, or codegen them ahead of time. if i had to pick them at runtime based on user input, i'd have one lookup function that takes the user data and returns a handle to the prepared query.

> and it's not guaranteed that they're safe from sql injections either.

is that actually true? i'm pretty sure if you use a prepared/tokenized query, all your sql injection nightmares will go away.

sanitize untrusted strings and use prepared queries. problem solved.

personally i think database drivers should have a mode that removes any capability for doing queries with raw sql strings.

fabian2k · on Oct 22, 2021

Let's say I have a table in my application, and I want the user to be able to select which columns of a very wide table to show. I could of course return all columns all the time and filter on the frontend, but the more efficient way would be to determine the columns to be fetched from user input. And at that point I need to derive column names from a user string. Of course this should be implemented as picking from a whitelisted set of column names.

function_seven · on Oct 22, 2021

I've had to do similar in the past. I ended up enumerating all the columns, intersecting that set with the ones submitted by the user, and using the result in the SELECT clause of the query.

Not as easy as just dropping in the user's choices, but guarantees they have no control over the resulting query string.

a-dub · on Oct 22, 2021

in terms of performance, i would err on the side of reducing work for the database, assuming the middleware/rpc layer is written in a language that has a fast runtime. i suspect that using a smaller number of prepared queries at the cost of shipping more data around (especially if the number of rows per query is limited) would be more performant than custom queries. prepared queries avoid repetitive string parsing of the query language and give the db backend more opportunities to optimize. less entropy in the overall workload also can result in better utilization of caching. if there are a few common display filters, additional prepared queries could be added to optimize them. in practice, it's unlikely that there would be a lot of entropy in column selection.

gruez · on Oct 22, 2021

>i suspect that using a smaller number of prepared queries at the cost of shipping more data around (especially if the number of rows per query is limited) would be more performant than custom queries.

that depends entirely on how many columns/rows you're returning.

> prepared queries avoid repetitive string parsing of the query language and give the db backend more opportunities to optimize.

On postgres at least, prepared statements only last for a session, so the benefits you state is dubious. Even for databases that support persistent prepared statements, the fact that they're part of the database makes it a nightmare to deploy. Every time you update a query, you'll need a mechanism to update the corresponding prepared statement.

a-dub · on Oct 22, 2021

> On postgres at least, prepared statements only last for a session, so the benefits you state is dubious. Even for databases that support persistent prepared statements, the fact that they're part of the database makes it a nightmare to deploy. Every time you update a query, you'll need a mechanism to update the corresponding prepared statement.

sessions should live for the lifetime of the processes of the backend application server, so they would get updated every time you deploy. if you don't have a long living backend, then usually a long living proxy is used.

if you're setting up a new session on every request, it seems there's not really much point to worrying about performance!

anyway, every application is different and it's worth benchmarking. if i were building something today that used a sql backend, i would err on the side of using prepared statements as much as possible, both for security and performance reasons. maybe string queries could make sense, but they're both more expensive to run and more likely to result in a vulnerability, even if they were coded carefully at first implementation.

Kalium · on Oct 22, 2021

How sure are you that you can get your parser and validator correct in every case? And how sure are you that you can correctly define and then notice "quasi-valid" at the edge of the defined spec and implementation? How sure are you of your version-specific bug-for-bug compatibility, beyond the edges of the spec?

I find it helps to remember that a firewall should be just one layer. You should have others, each of which blocks things detectably malicious and lets the rest go. It's not reasonable to expect a firewall to stop any and all potentially malicious traffic. It's not a replacement for input validation and good development practices, even though I have had coworkers who thought this.

a-dub · on Oct 22, 2021

> I find it helps to remember that a firewall should be just one layer.

totally agree.

Traubenfuchs · on Oct 22, 2021

Why would anyone ever need a WAF against SQL injections?

Isn‘t injection safe SQL development beaten into everyones head from all directions, all the time?

RNCTX · on Oct 22, 2021

Reminiscent of the talk in which “SELECT 1 / 0” was offered as reason enough in one line of code to not use MySQL.

n3t · on Oct 22, 2021

What talk are you referring to?

RNCTX · on Oct 22, 2021

https://vimeo.com/43536445

jamesfinlayson · on Oct 22, 2021

Interesting bug, but on reflection I find it more interesting that AWS WAF uses MySQL under the hood.

edit: I see WAF doesn't use MySQL - thanks. I've seen a few false positives in Akamai's SQL-in-URL detection so not surprising that AWS let something like this slip through.

jcims · on Oct 22, 2021

It doesn’t use mysaql under the covers, it’s just that AWS WAF doesn’t know about this syntax so it doesn’t filter for it.

Singling out AWS WAF is a bit awkward. I’m sure there are other products that don’t protect against this. WAF in general is a mess, the entire philosophy is broken and it’s at best a speed bump. I don’t know why the industry allowed the moniker.

jamesfinlayson · on Oct 22, 2021

Yep right you are. Agreed though - I think Akamai thinks URLs containing an = are suspicious, even if the URL looks nothing like SQL and very much like a path to a .jpg file.

dikei · on Oct 22, 2021

Yup. Just consider WAF as a low-pass filter that clean out the noise and let you focus on the real attacks that get through.

smolder · on Oct 22, 2021

Another good use case is hardening web applications that you aren't licensed to modify, or otherwise can't secure in a way that's internal to the app. (That could be due to limited time, too much complexity, inaccessible source code, or missing domain knowledge, maybe.)

Using a SaaS WAF for greenfield development always seemed to me like... I guess you'd say an "architecture smell" as opposed to code smell. More nodes in a network graph are worth avoiding if you can consolidate without losing functionality, velocity, etc.

Using WAF to get defense in depth starts making sense to me only when the paranoia level is sufficiently high and you're out of relatively effective ways to harden the other layers further.