
I found this interesting, so I tried to test it:

On LC_ALL=en_US.UTF-8:

    $ echo $'O\u011Fuz' | grep -E '^.{4}$'
    Oğuz
    $ echo $'O\u011Fuz' | rg '^.{4}$'
    Oğuz
On LC_ALL=en_US.ISO-8859-1:

    $ echo $'O\xF0uz' | grep -E '^.{4}$'
    O�uz
    $ echo $'O\xF0uz' | rg '^.{4}$'
Strangely, rg doesn't find anything at all, even with a pattern that should match any line:

    $ echo $'O\xF0uz' | rg '^.*$' | wc -c
    0
It only matches once the $ anchor is removed:

    $ echo $'O\xF0uz' | rg '^.*' | wc -c
    5


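It's not strange: ripgrep doesn't understand non-UTF-8 data by default (unless there's a UTF-16 BOM, in which case ripgrep detects the encoding automatically). The lone \xF0 byte is not valid UTF-8, so . can't match it, and ^.*$ can never span the whole line. Without the $ anchor, ^.* matches just the leading O, which is enough to print the line. But you can tell ripgrep which encoding to use: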

    $ echo $'O\xF0uz' | rg -E iso-8859-1 '^.{4}$'
    Oðuz
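For anyone following along, here is a rough sketch of what changes once the input is transcoded. It assumes iconv is installed, and to_utf8 is just a hypothetical helper name used for illustration, not anything ripgrep provides. It checks that the raw line is not valid UTF-8, then compares byte counts before and after converting from ISO-8859-1:

```shell
# Hypothetical helper (not part of ripgrep): transcode ISO-8859-1 input to UTF-8.
to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Ask iconv to read the raw line as UTF-8. A lone 0xF0 (octal 360) followed
# by 'u' is an incomplete multi-byte sequence, so the conversion should fail.
if printf 'O\360uz\n' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    raw_valid=yes
else
    raw_valid=no
fi

# Raw line: O, 0xF0, u, z, newline.
raw_bytes=$(printf 'O\360uz\n' | wc -c | tr -d '[:space:]')

# After transcoding, 0xF0 becomes the two-byte sequence 0xC3 0xB0.
utf8_bytes=$(printf 'O\360uz\n' | to_utf8 | wc -c | tr -d '[:space:]')

echo "valid UTF-8: $raw_valid, raw: $raw_bytes bytes, transcoded: $utf8_bytes bytes"
```

The -E iso-8859-1 flag has ripgrep do a comparable transcoding step before searching, which is why the match above prints ð instead of the raw byte.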
The person you're responding to has been trolling in this thread (and others) by twisting words and claiming multiple false things. When I've fixed their errors, they don't acknowledge them as mistakes and just keep on twisting words.



