--- title: Regular Expressions TARGET DECK: Obsidian::STEM FILE TAGS: linux::cli posix::awk regexp tags: - awk --- ## Overview Most `awk` patterns are regular expressions delimited with `/`. We can use `~` and `!~` to perform more complicated regexp filtering: ```bash # Matches any line with `li` somewhere. $ awk '/li/' data $ awk '$0 ~ /li/' data # Matches any line with `li` somewhere in the first field. $ awk '$1 ~ /li/' data ``` `awk`'s implementation of regexps are a superset of [[posix/regexp|POSIX EREs]]. %%ANKI Basic What is the result of the following? ```bash $ echo aaaabcd | awk '{ sub(/a+/, ""); print }' ``` Back: `bcd` Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf) END%% %%ANKI Basic How is the following equivalently written using `~`? ```bash $ awk '/li/' data ``` Back: ```bash $ awk '$0 ~ /li/' data ``` Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf) END%% %%ANKI Basic What operator is used for regexp matching? Back: `~` Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf) END%% %%ANKI Basic What operator is used for regexp non-matching? Back: `!~` Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf) END%% %%ANKI Basic How do we write a pattern where the second field matches regexp `/li/`? Back: ```bash $ awk '$2 ~ /li/' {...} ``` Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf) END%% %%ANKI Cloze In `awk`, `/.../` is to a {regexp} constant whereas `"..."` is to a {string} constant. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf) END%% %%ANKI Basic How are string constants processed differently from regexp constants? Back: The string constant is scanned twice. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf) END%% %%ANKI Basic What term describes a regexp that isn't a regexp constant? Back: A dynamic regexp. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf) END%% %%ANKI Basic How is `*` escaped in a regexp constant? Back: `/\*/` Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf) END%% %%ANKI Basic How is `*` escaped in a string constant (dynamic regexp)? Back: `"\\*"` Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf) END%% %%ANKI Basic Why is it recommended to avoid using `^` and `$$` in `RS`? Back: These anchors match the beginning and end of a string, not of a line. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf) END%% ## References * Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. [https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf)