title TARGET DECK FILE TAGS tags
Regular Expressions Obsidian::STEM linux::cli posix::awk regexp
awk

Overview

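Most awk patterns are regular expressions delimited with `/`. A bare `/regexp/` pattern is tested against the whole record (`$0`). We can use `~` (matches) and `!~` (does not match) to test a regexp against a specific expression, such as a single field: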

# Matches any line with `li` somewhere.
$ awk '/li/' data
$ awk '$0 ~ /li/' data
# Matches any line with `li` somewhere in the first field.
$ awk '$1 ~ /li/' data

awk's implementation of regexps is a superset of posix/regexp.
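The examples above only show `~`. As a minimal sketch of both operators side by side, the snippet below filters a small inline sample; the sample rows and the variable names `sample`, `matched`, and `unmatched` are assumptions for illustration, since the contents of `data` are not shown here.

```shell
# Hypothetical sample input standing in for the `data` file.
sample='alice 10
bob 20
charlie 30'

# Records whose first field contains "li".
matched=$(printf '%s\n' "$sample" | awk '$1 ~ /li/')

# Records whose first field does NOT contain "li".
unmatched=$(printf '%s\n' "$sample" | awk '$1 !~ /li/')

printf 'matched:\n%s\nunmatched:\n%s\n' "$matched" "$unmatched"
```

With this sample, `alice` and `charlie` both contain `li`, so only the `bob` record is selected by `!~`.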

%%ANKI
Basic
What is the result of the following?

$ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'

Back: <A>bcd
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%
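The card above depends on leftmost-longest matching: `a+` consumes the entire run of `a`s instead of stopping after the first one. A minimal sketch contrasting it with a single-character pattern (the variable names `longest` and `single` are illustrative only):

```shell
# a+ matches the longest run of a's beginning at the leftmost match position.
longest=$(echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }')

# A bare a matches only the first character, leaving the rest of the run intact.
single=$(echo aaaabcd | awk '{ sub(/a/, "<A>"); print }')

printf '%s\n%s\n' "$longest" "$single"
```

The first command prints `<A>bcd` and the second prints `<A>aaabcd`.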

%%ANKI
Basic
How is the following equivalently written using ~?

$ awk '/li/' data

Back:

$ awk '$0 ~ /li/' data

Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI
Basic
What operator is used for regexp matching?
Back: ~
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI
Basic
What operator is used for regexp non-matching?
Back: !~
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI
Basic
How do we write a pattern where the second field matches regexp /li/?
Back:

$ awk '$2 ~ /li/ {...}'

Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI
Cloze
In awk, /.../ delimits a {regexp} constant whereas "..." delimits a {string} constant.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI
Basic
How are string constants processed differently from regexp constants?
Back: The string constant is scanned twice: once when awk reads the program text as a string, and again when that string is interpreted as a regexp.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI
Basic
What term describes a regexp that isn't a regexp constant?
Back: A dynamic regexp.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%
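As a rough illustration of a dynamic regexp, the sketch below passes the pattern in through a variable instead of writing it between slashes. The input names and the variables `names`, `re`, and `hits` are hypothetical and chosen only for this example.

```shell
# Hypothetical input, one name per line.
names='alice
bob
charlie'

# The right-hand side of ~ is a variable, so its string value is used as
# a dynamic regexp rather than a regexp constant.
hits=$(printf '%s\n' "$names" | awk -v re='^c' '$0 ~ re')

printf '%s\n' "$hits"
```

Only `charlie` starts with `c`, so it is the single record printed. Any expression that evaluates to a string can appear on the right-hand side of `~` in this way.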

%%ANKI
Basic
How is * escaped in a regexp constant?
Back: /\*/
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI
Basic
How is * escaped in a string constant (dynamic regexp)?
Back: "\\*"
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%
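The two escaping cards can be checked side by side. This is a minimal sketch with a made-up two-line input; the variable names `input`, `const`, and `dynamic` are assumptions for illustration.

```shell
# Hypothetical input: one line with a literal star, one without.
input='a*b
aab'

# Regexp constant: a single backslash makes the star literal.
const=$(printf '%s\n' "$input" | awk '/a\*b/')

# String constant used as a dynamic regexp: the string is scanned twice,
# so the backslash is doubled to survive the first (string) scan.
dynamic=$(printf '%s\n' "$input" | awk '$0 ~ "a\\*b"')

printf '%s\n%s\n' "$const" "$dynamic"
```

Both commands select only the `a*b` line. Writing a single backslash inside the string constant is best avoided, since how awk treats `"\*"` varies between implementations.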

%%ANKI
Basic
Why is it recommended to avoid using ^ and $ in RS?
Back: These anchors match the beginning and end of a string, not of a line.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

References