A field extractor command line utility
Do you use lots of shell pipelines and find yourself choosing between cut and awk to select columns from input? pk is designed as a middle-ground tool: flexible enough to handle variable numbers of delimiters, fixed-format files, quoted or escaped delimiters, and more.
    Usage: pk [OPTION...] [STRING...]
    A field extraction utility

      -b, --backslash            Backslash escapes delimiters
      -d, --delimiters=STRING    Characters used as input delimiters
      -e, --empty                Allow empty fields
      -E, --excludes[=STRINGS]   Strings excluded from output (separated by :)
      -f, --file=FILE            Read input from file instead of stdin
      -N, --null[=STRING]        Change output text used for empty fields
      -q, --quotes[=STRING]      Ignore delimiters within quotes
      -S, --separator=STRING     Separator used in output text
      -T, --trim                 Trim non-alphanumerics characters before printing
      -?, --help                 Give this help list
          --usage                Give a short usage message
      -V, --version              Print program version
It may help to remember that, among the short flags, lower-case flags affect how the input is processed (for example, changing the field delimiters), while upper-case flags affect the output of pk (for example, the output separator or whether tokens are trimmed of non-alphanumeric characters).
pk is a tool for grabbing columns from an input stream and printing them on stdout. Each argument represents something you would like to see appearing in the output.
If an argument string is a positive integer then it is a request to see that column from the input printed in the output. The first column in the input stream has an index of 1.
    $ df | pk 6 5
    Mounted Use%
    / 9%
    /dev 1%
    /run 1%
    /run/lock 0%
    /run/shm 0%
An argument string can also represent a range of fields. The following range formats are supported:
|Range|Meaning|
|X..Y|Print each field from index X through to index Y|
|X..|Print each field from index X through to the last field|
|..Y|Print each field from field 1 through to field Y|
|..|Print every field|
Here are examples of each type of range:
    $ echo A B C D E | pk 2..4
    B C D
    $ echo A B C D E | pk 2..
    B C D E
    $ echo A B C D E | pk ..3
    A B C
    $ echo A B C D E | pk ..
    A B C D E
Finally, any argument that does not look like an index or a range is printed as-is. This can be useful for putting together command lines or quick scripts:
    $ ls -l /etc | pk 9 is owned by 3
    adduser.conf is owned by root
    alternatives/ is owned by root
    apparmor/ is owned by root
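The three kinds of argument described above (index, range, literal string) can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not pk's actual implementation: the function name `select` is hypothetical, and silently skipping an index that is past the end of the line is an assumption.

```python
import re

# "X..Y", "X..", "..Y" or ".." where X and Y are optional decimal integers.
RANGE_RE = re.compile(r"([0-9]*)\.\.([0-9]*)")
# A positive integer (field indexes start at 1).
INDEX_RE = re.compile(r"[1-9][0-9]*")


def select(fields, args):
    """Return the output tokens for one already-split input line."""
    out = []
    for arg in args:
        range_match = RANGE_RE.fullmatch(arg)
        if INDEX_RE.fullmatch(arg):
            idx = int(arg)
            # Assumption: an index past the last field is skipped.
            if idx <= len(fields):
                out.append(fields[idx - 1])
        elif range_match:
            start = int(range_match.group(1)) if range_match.group(1) else 1
            end = int(range_match.group(2)) if range_match.group(2) else len(fields)
            out.extend(fields[max(start, 1) - 1:end])
        else:
            # Anything else is copied to the output unchanged.
            out.append(arg)
    return out


if __name__ == "__main__":
    line = "A B C D E".split()
    print(" ".join(select(line, ["2..4"])))          # B C D
    print(" ".join(select(line, ["1", "is", "5"])))  # A is E
```

Checking for an index first and a range second means every other string, including `0`, falls through to the literal branch.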
In its default mode of operation pk uses both <tab> and <space> as delimiter characters. Additionally, multiple delimiters appearing next to each other in the input stream are treated as a single delimiter. This makes it good for parsing input streams where the columns are separated by varying amounts of whitespace. Many system tools such as ps, df, last, etc. use this kind of output.
By using the -d flag you can change the set of delimiters. Note that the string argument to -d is a set of characters, each of which is a delimiter. It is not a fixed string that separates each field.
By default the fields selected for printing are output separated by single spaces. Use the -S flag to change the string that separates output fields:
    $ ps aux | pk -S, 1 2 | head
    USER,PID
    root,1
    root,2
    root,3
Fixed formats, such as /etc/passwd, can have different delimiters and possibly contain empty fields. Using the -e flag tells pk that adjacent delimiter characters in the input line represent empty fields.
    $ cat /etc/passwd | pk -e -d: 1 7
    root /bin/bash
    daemon /bin/sh
    bin /bin/sh
    sys /bin/sh
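The difference between the default splitting and -e can be illustrated with a small Python model. This is a sketch under stated assumptions rather than pk's real code: `split_fields` and its parameters are hypothetical names, the default delimiter set of space and tab is assumed, and the treatment of leading or trailing delimiters in -e mode is a guess.

```python
def split_fields(line, delimiters=" \t", allow_empty=False):
    """Split a line on any character in `delimiters`.

    allow_empty=False models the default: runs of delimiters collapse.
    allow_empty=True models -e: adjacent delimiters yield empty fields.
    """
    fields = []
    current = []
    for ch in line:
        if ch in delimiters:
            # Close the current field; keep it if non-empty or if -e is on.
            if current or allow_empty:
                fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    # Flush whatever follows the last delimiter.
    if current or allow_empty:
        fields.append("".join(current))
    return fields


if __name__ == "__main__":
    print(split_fields("root   1  0.0"))
    print(split_fields("root:x:0:0::/root:/bin/bash", ":", allow_empty=True))
```

With `allow_empty=True` the sample passwd line splits into seven fields, so index 7 is the shell, as in the example above. Without it the empty fifth field disappears and the shell moves to index 6.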
If you tell pk to print an empty field it will print out the string NULL by default. This can be set to another string using the -N flag. If this flag is used without an argument then empty fields are not printed.
Note that pk will not print trailing empty fields unless you request them directly by index. That is, trailing empty fields are not printed when they are selected as part of a range.
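The output-side behaviour of -S and -N can be sketched as follows. This is a rough Python model, not pk's implementation: `render` is a hypothetical name, `null_text=None` stands in for -N given without an argument, and the trailing-empty-field rule for ranges is deliberately left out.

```python
def render(fields, separator=" ", null_text="NULL"):
    """Join selected fields into one output line.

    separator models -S; null_text models -N.
    null_text=None models -N with no argument: empty fields are dropped.
    """
    out = []
    for field in fields:
        if field == "":
            if null_text is None:
                continue
            out.append(null_text)
        else:
            out.append(field)
    return separator.join(out)


if __name__ == "__main__":
    print(render(["root", "", "/bin/bash"]))                  # root NULL /bin/bash
    print(render(["root", "", "/bin/bash"], null_text="-"))   # root - /bin/bash
    print(render(["USER", "PID"], separator=","))             # USER,PID
```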
The -T flag trims non-alphanumeric characters from the left and right side of a field before printing it to stdout. This is useful for removing quotes, parentheses or other visual delineations.
    $ echo "'Example User' <firstname.lastname@example.org>" | pk -T 3
    firstname.lastname@example.org
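A minimal Python sketch of the trimming rule described for -T, assuming that "alphanumeric" means whatever `str.isalnum` accepts (pk itself may use a narrower, locale-dependent definition). The name `trim` is hypothetical.

```python
def trim(token):
    """Strip non-alphanumeric characters from both ends of a token."""
    start, end = 0, len(token)
    # Advance past leading non-alphanumerics.
    while start < end and not token[start].isalnum():
        start += 1
    # Back up over trailing non-alphanumerics.
    while end > start and not token[end - 1].isalnum():
        end -= 1
    return token[start:end]


if __name__ == "__main__":
    print(trim("<firstname.lastname@example.org>"))
    print(trim("(Bilbo Baggins)"))
```

Only the ends are affected: punctuation inside the token, such as the dots and the `@` sign above, is left alone.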
Some text formats allow delimiters to be ignored inside quotes. pk supports simple quoting: you can select either a single quote character or a pair of characters (start and end). Nested quotes and multiple types of quoting on the same line are not supported.
Use the -q flag to enable quoting support. Double quote is the default quote character:
    $ cat input
    "Bilbo Baggins", "The Hobbit"
    "Frodo Baggins", "The Lord of the Rings"
    $ pk -f input -d", " -q 2
    "The Hobbit"
    "The Lord of the Rings"
The quote character can be changed by supplying a one character argument to the -q flag:
    $ cat input
    'Bilbo Baggins', 'The Hobbit'
    'Frodo Baggins', 'The Lord of the Rings'
    $ pk -f input -d", " -q"'" 1
    'Bilbo Baggins'
    'Frodo Baggins'
A two character argument supplied to the -q flag is used to specify the open and close quote characters:
    $ cat input
    (Bilbo Baggins) (The Hobbit)
    (Frodo Baggins) (The Lord of the Rings)
    $ pk -f input -q"()" 1
    (Bilbo Baggins)
    (Frodo Baggins)
    $ pk -f input -q"()" -T 1
    Bilbo Baggins
    Frodo Baggins
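The quoting rules above (one quote character, or an open/close pair, with no nesting) can be modelled as a small state machine. The following Python is an illustrative sketch only: `split_quoted` is a hypothetical name, and the handling of an unterminated quote (the rest of the line becomes one field) is an assumption.

```python
def split_quoted(line, delimiters=" \t", quotes='"'):
    """Split on delimiters, ignoring any that appear inside quotes.

    `quotes` is one character (same open and close) or two (open, close).
    Quote characters are kept in the field, as in the examples above.
    """
    open_q, close_q = quotes[0], quotes[-1]
    fields = []
    current = []
    in_quote = False
    for ch in line:
        if in_quote:
            # Inside quotes every character, delimiter or not, is kept.
            current.append(ch)
            if ch == close_q:
                in_quote = False
        elif ch == open_q:
            current.append(ch)
            in_quote = True
        elif ch in delimiters:
            # Runs of delimiters collapse, as in the default mode.
            if current:
                fields.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        fields.append("".join(current))
    return fields


if __name__ == "__main__":
    print(split_quoted('"Bilbo Baggins", "The Hobbit"', delimiters=", "))
    print(split_quoted("(Bilbo Baggins) (The Hobbit)", quotes="()"))
```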
A list of strings that will always be excluded from the output can be supplied to pk via the -E flag. One use case is handling lists of servers, where you may want to strip fully qualified hostnames down to their local names automatically before passing them on to another process in the pipeline.
    $ cat input
    foo.example.com 192.168.1.1 active
    bar.example.com 192.168.1.2 repair
    baz.example.net 192.168.1.3 active
    $ cat input | pk -E.example.com:.example.net 1 3
    foo active
    bar repair
    baz active
Alternatively, as this is likely to be a regular request, it can be set using an environment variable. When using the environment variable the -E flag without arguments can be used to ignore the setting. The -E flag used with arguments will override the environment variable.
    $ export PK_EXCLUDES=.example.com:.example.net
    $ cat input | pk 1 3
    foo active
    bar repair
    baz active
    $ cat input | pk -E 1 3
    foo.example.com active
    bar.example.com repair
    baz.example.net active
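The precedence between the -E flag and the PK_EXCLUDES environment variable can be sketched in Python as below. This models the rules as described, not pk's source: the function names are hypothetical, and removing every occurrence of an excluded string (rather than only the first) is an assumption.

```python
import os


def effective_excludes(flag_given, flag_value=None, environ=None):
    """Work out which exclude strings apply.

    -E with an argument overrides PK_EXCLUDES; -E alone disables it;
    without -E the environment variable is used if it is set.
    """
    environ = os.environ if environ is None else environ
    if flag_given:
        raw = flag_value or ""
    else:
        raw = environ.get("PK_EXCLUDES", "")
    # Exclude strings are separated by ':'; ignore empty entries.
    return [s for s in raw.split(":") if s]


def strip_excludes(token, excludes):
    """Remove each excluded string from the token."""
    for s in excludes:
        token = token.replace(s, "")
    return token


if __name__ == "__main__":
    env = {"PK_EXCLUDES": ".example.com:.example.net"}
    print(strip_excludes("baz.example.net", effective_excludes(False, environ=env)))
    print(strip_excludes("baz.example.net", effective_excludes(True, environ=env)))
```

The first call prints `baz` because the environment variable applies. The second prints `baz.example.net` because a bare -E switches the excludes off.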
Copyright 2012-Present John Morrow
I am providing code in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code is from me and not my employer (Facebook).
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.