7.2 Text Processing Commands

7.2.3 awk, nawk, gawk

awk is a pattern scanning and processing language. Its name comes from the last initials of its three authors: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. nawk is "new awk", a revised version of the language, and gawk is the GNU Project's implementation.

awk searches its input for patterns and performs the specified action on each line, or on fields of the line, that contains those patterns. You can specify the pattern-matching statements for awk either on the command line or by putting them in a file and using the -f program_file option.

Syntax

    awk program [file ...]

where program is composed of one or more statements of the form:

    pattern { action }

Each input line is checked against every pattern in turn, and the action is taken for each pattern that matches. When the full sequence of patterns has been tried, the next line of input is read. A pattern with no action prints the matching line, and an action with no pattern is applied to every line.

Input is divided into records and fields. The default record separator is <newline>, and the variable NR holds the number of records read so far. The default field separator is whitespace (spaces and tabs), and the variable NF holds the number of fields in the current record. The input field separator, FS, and record separator, RS, can be set at any time to any single character (nawk and gawk also accept a regular expression for FS). The output field separator, OFS, and record separator, ORS, can likewise be changed to any string, as desired.

$n, where n is an integer, represents the nth field of the input record, while $0 represents the entire input record.

BEGIN and END are special patterns that match the beginning of input, before the first record is read, and the end of input, after the last record is read, respectively.

Printing is done with the print statement and the formatted-print statement, printf.

Patterns may be regular expressions, arithmetic relational expressions, string-valued expressions, or boolean combinations of any of these. Patterns can be combined with the boolean operators below, using parentheses to group them:

    ||    or
    &&    and
    !     not

Comma-separated patterns define a range to which the action applies, e.g.:

    /first/,/last/

selects all lines starting with the one containing first and continuing, inclusively, through the one containing last. To select lines 15 through 20, use the pattern range:

    NR == 15, NR == 20

Regular expressions must be enclosed in slashes (/), and metacharacters can be escaped with the backslash (\). Regular expressions can also use the operators:

    |     or, to separate alternatives
    +     one or more of the preceding item
    ?     zero or one of the preceding item

A regular expression match is tested with either of:

    ~     contains the expression
    !~    does not contain the expression

So the program:

    $1 ~ /[Ff]rank/

is true if the first field, $1, contains "Frank" or "frank" anywhere within the field. To match a field identical to "Frank" or "frank", use:

    $1 ~ /^[Ff]rank$/
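The pattern forms above can be tried directly from the shell. The following is a small sketch, assuming a POSIX-compatible awk (nawk, gawk, or mawk) on the PATH; the sample function and its name/number data are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: a name and a number on each line.
sample() {
    printf '%s\n' 'frank 10' 'Frank 20' 'franklin 30' 'alice 40' 'bob 50'
}

# Unanchored: $1 CONTAINS "Frank" or "frank" (so "franklin" matches too).
sample | awk '$1 ~ /[Ff]rank/ { print "contains:", $1 }'

# Anchored with ^ and $: $1 must be exactly "Frank" or "frank".
sample | awk '$1 ~ /^[Ff]rank$/ { print "exact:", $1 }'

# !~ selects records whose first field does not contain "rank".
sample | awk '$1 !~ /rank/ { print "no match:", $1 }'

# Boolean combination: $1 starts with F or f AND the number is not above 25.
sample | awk '$1 ~ /^[Ff]/ && !($2 > 25) { print "bool:", $1 }'

# Range pattern: from the first line matching /Frank/ through the next line matching /alice/.
sample | awk '/Frank/,/alice/ { print "range:", $0 }'

# Line-number range using NR: records 2 through 3.
sample | awk 'NR == 2, NR == 3 { print NR ": " $0 }'

# BEGIN runs before the first record, END after the last; NF is the field count.
sample | awk 'BEGIN { print "start" } { total += NF } END { print NR, "records,", total, "fields" }'
```

With this data the exact-match command prints only frank and Frank, and the range command prints the Frank, franklin, and alice lines. The regular expression /Frank/ is case-sensitive, so the first line, frank 10, does not start the range.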
Relational expressions

    <     less than
    <=    less than or equal to
    ==    equal to
    >=    greater than or equal to
    !=    not equal to
    >     greater than

awk variables carry no declared type, so offhand you don't know whether a value is a string or a number. In nawk and gawk, if both operands are numbers, or input fields that look like numbers, a numeric comparison is done; if either operand is a string, such as a string constant or a field that does not look like a number, a string comparison is done. So if $1 is 10 and $2 is 9, the pattern:

    $1 > $2

compares the numeric values and is true, whereas "10" > "9" compares strings and is false. To ensure that a value is treated as a number, add zero to it:

    ( $1 + 0 ) > $2

The mathematical functions exp, log, and sqrt are built in.
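The comparison rules are easiest to see side by side. This sketch assumes a POSIX awk such as nawk, gawk, or mawk, where input fields that look like numbers compare numerically; the input values are arbitrary.

```shell
#!/bin/sh
# Both fields look numeric, so they are compared as numbers: 10 > 9.
echo "10 9" | awk '{ if ($1 > $2) print "numeric: 10 > 9"; else print "string: 10 <= 9" }'

# String constants are compared character by character: "10" sorts before "9".
awk 'BEGIN { if ("10" < "9") print "string: \"10\" < \"9\"" }'

# Fields that do not look numeric are compared as strings.
echo "apple banana" | awk '{ if ($1 < $2) print "string: apple < banana" }'

# Adding zero makes the left operand a number before comparing.
echo "10 9" | awk '{ if (($1 + 0) > $2) print "forced numeric: 10 > 9" }'

# Concatenating "" makes the left operand a string, forcing a string comparison.
echo "10 9" | awk '{ if (($1 "") > $2) print "unexpected"; else print "forced string: \"10\" < \"9\"" }'

# Built-in math functions.
awk 'BEGIN { printf "sqrt(16)=%g exp(0)=%g log(1)=%g\n", sqrt(16), exp(0), log(1) }'
```

Adding zero and concatenating an empty string are the two usual ways to force a value's type when the default rules would pick the wrong comparison.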
Some other built-in functions include:

    index(s,t)      returns the position in string s where string t first occurs, or 0 if it does not occur
    length(s)       returns the length of string s
    substr(s,m,n)   returns the n-character substring of s, beginning at position m

Arrays are created automatically when they are used, e.g.:

    arr[i] = $1

assigns the first field of the current input record to the ith element of the array.

Flow control statements using if-else, while, and for are allowed with C-style syntax:

    for (i=1; i <= NF; i++) {actions}
    while (i <= NF) {actions}
    if (i < NF) {actions}

Common Options

    -f program_file   read the commands from program_file
    -Fc               use character c as the field separator character

Examples

    % cat filex | tr a-z A-Z | awk -F: '{printf ("7R %-6s %-9s %-24s \n",$1,$2,$3)}' > upload.file

cats filex, which is formatted as follows:

    nfb791:99999999:smith
    7ax791:999999999:jones
    8ab792:99999999:chen
    8aa791:999999999:mcnulty

changes all lowercase characters to uppercase with the tr utility, and formats the file into the following, which is written to the file upload.file:

    7R NFB791 99999999  SMITH
    7R 7AX791 999999999 JONES
    7R 8AB792 99999999  CHEN
    7R 8AA791 999999999 MCNULTY
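The built-in functions, arrays, and loops described above can be exercised together in one short script. It is a sketch assuming a POSIX awk; the string "pattern scanning" is arbitrary, and the two input records simply reuse the filex format from the example above.

```shell
#!/bin/sh
# index, length, and substr on a literal string (positions start at 1).
awk 'BEGIN {
    s = "pattern scanning"
    print index(s, "scan")   # first position of "scan" in s: 9
    print index(s, "xyz")    # not found: 0
    print length(s)          # 16 characters
    print substr(s, 9, 4)    # 4 characters from position 9: scan
}'

# Arrays and flow control, using -F: on data in the filex format.
printf '%s\n' 'nfb791:99999999:smith' '7ax791:999999999:jones' |
awk -F: '
{
    names[NR] = $3                   # array element is created on first use
    for (i = 1; i <= NF; i++)        # visit every field of the record
        if (length($i) > 8)          # report fields longer than 8 characters
            print "long field:", $i
}
END {
    i = NR                           # NR still holds the record count here
    while (i >= 1) {                 # print the stored names in reverse order
        print "name", i ":", names[i]
        i--
    }
}'
```

The script prints 9, 0, 16, and scan, then reports 999999999 as the only long field and lists jones before smith.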
Introduction to Unix - 14 AUG 1996