title | TARGET DECK | FILE TAGS | tags | |
---|---|---|---|---|
Regular Expressions | Obsidian::STEM | regexp |
|
Overview
The following ERE (Extended Regular Expression) operators were defined to achieve consistency between programs like `grep`, `sed`, and `awk`. In POSIX, regexps are greedy.
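As a minimal sketch of greedy matching, assuming a `sed` that accepts `-E` for EREs (GNU and BSD `sed` both do), the pattern `a+` below consumes the whole run of `a` characters rather than stopping after the first one. The sample input is made up for illustration.

```shell
# a+ is greedy: it consumes all three "a" characters, so a single
# substitution replaces the entire run instead of only the first "a".
greedy=$(printf 'aaab\n' | sed -E 's/a+/X/')
echo "$greedy"   # prints: Xb
```

A lazy match would have stopped after one `a` and produced `Xaab` instead.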
%%ANKI
Cloze
Regular expressions are either {greedy} or {lazy}.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
Are POSIX regexps greedy or lazy?
Back: Greedy.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
%%ANKI
Basic
What does it mean for a regexp to be greedy?
Back: The regexp matches as many characters as it can.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
What does it mean for a regexp to be lazy?
Back: The regexp matches as few characters as it can.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
What is the POSIX ERE standard?
Back: The Extended Regular Expression standard. A standard based on the regexps accepted by `egrep`.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
- `.` matches any single character.
  - There exist application-specific exclusions. For instance, newlines and the `NUL` character are often not matched.
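A small sketch of the `.` operator, assuming a `grep` that supports `-E`. The three input lines are invented for illustration.

```shell
# "c.t" needs exactly one character between "c" and "t".
# "cat" and "c-t" match; "ct" has no middle character, so it does not.
dot_matches=$(printf 'cat\nc-t\nct\n' | grep -E -c '^c.t$')
echo "$dot_matches"   # prints: 2
```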
%%ANKI
Cloze
The {`.`} operator matches {any single character}.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
What two common exclusions are made with `.`?
Back: Newlines and the `NUL` character.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
- `[...]`, the bracket expression, matches any enclosed character.
  - An optional `-` can be included to denote a range.
  - `-` is treated literally if it's the first or last specified character.
  - `]` is treated literally if it's the first specified character.
  - `^` complements the set if it's the first specified character.
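The bracket expression rules above can be sketched with `grep -E`. This assumes the C locale (set through `LC_ALL=C`) so that the range `a-c` means exactly `a`, `b`, and `c`. The inputs are hypothetical.

```shell
# Range: [a-c] matches "a" and "b" but not "d" or "-".
range=$(printf 'a\nb\nd\n-\n' | LC_ALL=C grep -E -c '^[a-c]$')

# A trailing "-" is literal: [ac-] matches "a" and "-".
literal_dash=$(printf 'a\nb\nd\n-\n' | LC_ALL=C grep -E -c '^[ac-]$')

# A leading "]" is literal: []a] matches "]" and "a".
literal_bracket=$(printf ']\na\nb\n' | LC_ALL=C grep -E -c '^[]a]$')

# A leading "^" complements the set: [^0-9] matches only "x".
negated=$(printf '1\nx\n2\n' | LC_ALL=C grep -E -c '^[^0-9]$')

echo "$range $literal_dash $literal_bracket $negated"   # prints: 2 2 2 1
```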
%%ANKI
Basic
What name is given to the `[...]` operator?
Back: The bracket expression.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
What three characters are interpreted specially in a bracket expression?
Back: `^`, `-`, and `]`
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
When is `-` interpreted literally in a bracket expression?
Back: When it is the first or last specified character.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
When is `^` interpreted literally in a bracket expression?
Back: When it is not the first specified character.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
When is `]` interpreted literally in a bracket expression?
Back: When it is the first specified character.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
- `^` is the leading anchor. It matches the starting position of a string.
- `$` is the trailing anchor. It matches the ending position of a string.
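A brief sketch of both anchors with `grep -E`, which applies the pattern to each input line as its own string. The word list is made up for illustration.

```shell
input='foobar
barfoo
foo'

# ^foo: lines that start with "foo" (foobar, foo).
starts=$(printf '%s\n' "$input" | grep -E -c '^foo')

# foo$: lines that end with "foo" (barfoo, foo).
ends=$(printf '%s\n' "$input" | grep -E -c 'foo$')

# ^foo$: lines that are exactly "foo".
whole=$(printf '%s\n' "$input" | grep -E -c '^foo$')

echo "$starts $ends $whole"   # prints: 2 2 1
```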
%%ANKI
Cloze
The {`^`} operator matches {the starting position of a string}.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Cloze
The {`$$`} operator matches {the ending position of a string}.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
`^` and `$$` belong to what operator category?
Back: Anchors
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
- `*` matches the preceding element zero or more times.
- `+` matches the preceding element one or more times.
- `?` matches the preceding element zero or one times.
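The three repetition operators can be compared side by side. This is a minimal sketch using `grep -E` on a hypothetical input in which `b` appears zero, one, and two times.

```shell
input='ac
abc
abbc'

# b* allows zero or more "b": ac, abc, abbc.
star=$(printf '%s\n' "$input" | grep -E -c '^ab*c$')

# b+ requires at least one "b": abc, abbc.
plus=$(printf '%s\n' "$input" | grep -E -c '^ab+c$')

# b? allows at most one "b": ac, abc.
opt=$(printf '%s\n' "$input" | grep -E -c '^ab?c$')

echo "$star $plus $opt"   # prints: 3 2 2
```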
%%ANKI
Basic
What does the `*` operator do?
Back: Matches the preceding element zero or more times.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
How is the `*` operator written equivalently as an interval expression?
Back: {0,}
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
What does the `+` operator do?
Back: Matches the preceding element one or more times.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
How is the `+` operator written equivalently as an interval expression?
Back: {1,}
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
What does the `?` operator do?
Back: Matches the preceding element zero or one times.
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
%%ANKI
Basic
How is the `?` operator written equivalently as an interval expression?
Back: {0,1}
Reference: “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
END%%
- `{n}`, an interval expression, matches the preceding element `n` times.
- `{n,}` matches the preceding element at least `n` times.
- `{n,m}` matches the preceding element between `n` and `m` times.
- Interval expressions cannot contain repetition counts `> 255`. Results for larger counts are undefined.
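A short sketch of the three interval forms, again using `grep -E`. The input, runs of one to four `a` characters, is invented for illustration.

```shell
input='a
aa
aaa
aaaa'

# a{2}: exactly two repetitions (aa).
exact=$(printf '%s\n' "$input" | grep -E -c '^a{2}$')

# a{2,}: two or more repetitions (aa, aaa, aaaa).
at_least=$(printf '%s\n' "$input" | grep -E -c '^a{2,}$')

# a{2,3}: between two and three repetitions (aa, aaa).
between=$(printf '%s\n' "$input" | grep -E -c '^a{2,3}$')

echo "$exact $at_least $between"   # prints: 1 3 2
```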
%%ANKI
Basic
What name is given to, e.g., the `{n,m}` operator?
Back: The interval expression.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
%%ANKI
Basic
What does the `{n}` operator do?
Back: Matches the preceding element `n` times.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
%%ANKI
Basic
What does the `{n,}` operator do?
Back: Matches the preceding element at least `n` times.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
%%ANKI
Basic
What does the `{n,m}` operator do?
Back: Matches the preceding element between `n` and `m` times.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
%%ANKI
Basic
What interval expression repetition counts lead to undefined behavior?
Back: Counts greater than `255`.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
- `|` is the alternation operator. It allows specifying match alternatives.
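The sketch below shows alternation with `grep -E`, along with the effect of `|` having the lowest precedence. The ERE grouping parentheses `( )` are used here although this note does not otherwise cover them. The word list is hypothetical.

```shell
input='catfish
hotdog
cat
bird'

# Without grouping, the pattern reads as (^cat) or (dog$):
# catfish and cat start with "cat", hotdog ends with "dog".
ungrouped=$(printf '%s\n' "$input" | grep -E -c '^cat|dog$')

# With grouping, both anchors apply to each alternative,
# so only the line that is exactly "cat" matches.
grouped=$(printf '%s\n' "$input" | grep -E -c '^(cat|dog)$')

echo "$ungrouped $grouped"   # prints: 3 1
```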
%%ANKI
Basic
What name is given to, e.g., the `|` operator?
Back: The alternation operator.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
%%ANKI
Basic
What does the `|` operator do?
Back: Matches different regexp alternatives.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
%%ANKI
Basic
Which regexp operator has the least precedence?
Back: |
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
Character Classes
Character classes are a notation for describing a class of characters specific to a given locale/character set.
%%ANKI
Basic
What portability issue do character classes introduce?
Back: Matching characters are dependent on locale/character set.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
%%ANKI
Basic
How are character classes denoted?
Back: [:class:]
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
Class | Similar To | Meaning |
---|---|---|
`[:alnum:]` | `[A-Za-z0-9]` | Alphanumeric characters |
`[:alpha:]` | `[A-Za-z]` | Alphabetic characters |
`[:blank:]` | `[ \t]` | `' '` and TAB characters |
`[:cntrl:]` | | Control characters |
`[:digit:]` | `[0-9]` | Numeric characters |
`[:graph:]` | `[^ [:cntrl:]]` | Printable and visible characters |
`[:lower:]` | `[a-z]` | Lowercase alphabetic characters |
`[:print:]` | `[ [:graph:]]` | Printable characters |
`[:punct:]` | | All graphic characters except letters and digits |
`[:space:]` | `[ \t\n\r\f\v]` | Whitespace characters |
`[:upper:]` | `[A-Z]` | Uppercase alphabetic characters |
`[:xdigit:]` | `[0-9A-Fa-f]` | Hexadecimal digits |
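A few of the classes in the table can be exercised with `grep -E`. Note that a character class is only valid inside a bracket expression, hence the doubled brackets such as `[[:digit:]]`. The sketch pins the C locale through `LC_ALL=C` so the results do not depend on the local environment. The input lines are made up.

```shell
input='a1
B
?'

# Lines containing at least one digit (a1).
digits=$(printf '%s\n' "$input" | LC_ALL=C grep -E -c '[[:digit:]]')

# Lines that are a single uppercase letter (B).
upper=$(printf '%s\n' "$input" | LC_ALL=C grep -E -c '^[[:upper:]]$')

# Lines that are a single punctuation character (?).
punct=$(printf '%s\n' "$input" | LC_ALL=C grep -E -c '^[[:punct:]]$')

# Lines made up entirely of alphanumeric characters (a1, B).
alnum=$(printf '%s\n' "$input" | LC_ALL=C grep -E -c '^[[:alnum:]]+$')

echo "$digits $upper $punct $alnum"   # prints: 1 1 1 2
```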
%%ANKI
Basic
Generally speaking, what is a printable character?
Back: Characters that can be displayed on screen or printed on paper.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
%%ANKI
Basic
Is `'a'` (i.e. the letter a) printable and/or visible?
Back: It is printable and visible.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
%%ANKI
Basic
Is `' '` (i.e. the space character) printable and/or visible?
Back: It is printable but not visible.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
%%ANKI
Basic
Is `'\t'` (i.e. the tab character) printable and/or visible?
Back: It is neither printable nor visible.
Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf
END%%
References
- “POSIX Basic Regular Expressions,” accessed February 4, 2024, https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions.
- Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf