notebook/notes/gawk/variables.md

15 KiB

title TARGET DECK FILE TAGS tags
Variables Obsidian::STEM linux::cli gawk
gawk

Overview

Variables are defined like var=val. They can be specified in two different places:

  1. Via the -v command line flag. Using this allows accessing the variable value from within a BEGIN rule.
  2. In the file list. Using this allows accessing the variable value in all subsequent file processing.

%%ANKI Basic Where in an awk invocation can variables be assigned? Back: As a -v argument or in the file list. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic The -v flag was introduced to accommodate what functionality? Back: Accessing variables from a BEGIN rule. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic Describe what the following command does in in a single sentence:

$ awk 'program' pass=1 data pass=2 data

Back: Evaluates 'program' against the data file twice with a different value of pass on each run. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic How is stdin specified in awk's file list? Back: - Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

Predefined Variables

There exists a number of useful predefined variables:

  • NR (Number of Records)
    • The 1-indexed number of records so far read.
    • The count includes the current record.
  • FNR (File Number of Records)
    • The 1-indexed number of records so far read from the current file.
    • The count includes the current record.

%%ANKI Cloze The {NR} variable specifies the {number of read input records}. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Cloze The {FNR} variable specifies the {number of read input records for the current file}. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

  • RT (Record Text)
    • The matching separator used to distinguish the currently read record.

%%ANKI Cloze The {RT} variable matches the {input characters that matched RS}. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic Barring the final record, when is RT always equal to RS? Back: When RS is a string containing a single character. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

  • RS (Record Separator)
    • The separator used to distinguish records from one another.
    • Defaults to "\n".
RS == ?? Description
"\n" Records are separated by the newline character. This is the default.
any single character Records are separated by each occurrence of the character. Multiple successive occurrences delimit empty records.
"" Records are separated by one or more blank lines. Leading/trailing newlines in a file are ignored. If FS is a single character, then "\n" also serves as a field separator.
regexp Records are separated by occurrences of characters that match regexp. Leading/trailing matches delimit empty records.

%%ANKI Cloze The {RS} variable is used to change the {record separator}. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What is the default value of RS? Back: "\n" Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Cloze If RS is a string with {more than one character}, it is treated as a {regexp}. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What value of RS may gawk not process correctly? Back: A regexp with optional trailing part, e.g. AB(XYZ)?. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What implementation detail inspires avoiding RS = "\0"? Back: Most awk implementations store strings internally as C-style strings. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What equivalent assignment do most awk implementations interpret RS = "\0" as? Back: RS = "" Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic How is RS = "" interpreted? Back: "" indicates one or more blank lines should be treated as the record separator. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What distinguishes RS value "" and \n\n+? Back: When set to the former, awk strips leading/trailing newlines from the file. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What distinguishes RS value "" and \n? Back: The former separates on one or more blank lines, not just a newline character. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What regexp is closest to mirroring RS = "" behavior? Back: \n\n+ Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Cloze If RS = "" and FS is set to {a single character}, the {newline character} always acts as a field separator. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

  • NF (Number of Fields)
    • The 1-indexed number of fields found in the current record.

%%ANKI Basic What is the arithmetical value of ${NF + 1}? Back: 0 Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What is the printed value of ${NF + 1}? Back: "" Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What value is $${NF + 1} given when we run ${NF + 2} = "test"? Back: "" Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Cloze The {NF} variable specifies the {number of fields in the current record}. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What two things does incrementing NF do? Back: Creates the field and rebuilds the record. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What two things does decrementing NF do? Back: Throws away fields and rebuilds the record. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

  • FS (Field Separator)
    • The separator used to distinguish fields from one another.
FS == ?? Description
" " Fields are separated by runs of whitespace. Leading/trailing whitespace is ignored. This is the default.
any other single character Fields are separated by each occurrence of the character. Multiple successive occurrences delimit empty fields, as do leading/trailing occurrences.
"\n" Specific instance of the above row. It is used to treat the record as a single field (assuming newlines separate records).
regexp Fields are separated by occurrences of characters that match regexp. Leading/trailing matches delimit empty fields.
"" Each individual character in the record becomes a separate field.

%%ANKI Cloze The {FS} variable is used to change the {field separator}. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Cloze {FS} is to awk as {IFS} is to Bash. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf END%%

%%ANKI Basic What is the default value of FS? Back: " " Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What value of FS is specially handled? Back: " " Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic How is FS = " " interpreted? Back: As a contiguous sequence of spaces, tabs, and newlines. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What distinguishes FS value " " and [ \t\n]+? Back: When set to the former, awk strips leading/trailing whitespace from each record. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Cloze Setting FS to {""} allows examining {each character of a record separately}. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Cloze If RS has its default value, setting FS to {"\n"} treats the {record as the single field}. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic What value of FS ensures $1 = $0? Back: RS Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

%%ANKI Basic Why does awk support a CSV mode? Back: Because CSV fields may contain commas and newlines. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

  • OFS (Output Field Separator)
    • Specifies the field separator used on printing.

%%ANKI Cloze The {OFS} variable is used to change the {output field separator}. Reference: Robbins, Arnold D. “GAWK: Effective AWK Programming,” October 2023. https://www.gnu.org/software/gawk/manual/gawk.pdf

END%%

References