Dr. Dobb's is part of the Informa Tech Division of Informa PLC

The Postman's Sort


August 1992/The Postman's Sort/Sidebar

psort Technical Reference


Description

psort sorts lines of the standard input file and writes the result on the standard output. The default key is the entire line. Default ordering is lexicographic by bytes in machine collating sequence.

Synopsis of Usage

psort [-t <dir>] [-s <record size>] [ [-k <keys>] ] [ [-f <range>] [-c <range>]... ]...

-t Use the following name for the temporary directory. By default, the program takes the directory from the environment variable %TMP%.

-s Record size. If you specify a range (e.g., 10-124), records are of variable length and delimited by a newline; if the program encounters a record larger than the indicated maximum size, it terminates with an error message. If you specify a single value, records are fixed in length and the newline has no special significance. If you specify no record size, the program assumes variable-length records of up to 511 bytes.
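
The two record formats can be modeled in a few lines. What follows is a minimal Python sketch of the behavior described above, not psort's own code: `split_records` is a hypothetical name, a `(low, high)` tuple stands for a range such as 10-124, and an `int` stands for a single fixed length. It assumes the whole input is already in memory and that the size limit excludes the newline; the lower bound of a range is not checked, because the text does not say how psort uses it.

```python
from typing import List, Tuple, Union

DEFAULT_MAX = 511  # default maximum length of a variable-length record

def split_records(data: bytes,
                  size: Union[int, Tuple[int, int], None] = None) -> List[bytes]:
    """Split raw input into records following the -s rules.

    None        -> variable length, newline-delimited, at most 511 bytes
    (low, high) -> variable length, newline-delimited, at most `high` bytes
    int         -> fixed length; a newline is just another byte
    """
    if isinstance(size, int):
        # Assumption: input that is not a whole number of records is an error.
        if len(data) % size:
            raise ValueError("input is not a multiple of the record size")
        return [data[i:i + size] for i in range(0, len(data), size)]

    max_len = DEFAULT_MAX if size is None else size[1]
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the empty piece after a final newline is not a record
    for line in lines:
        # Assumption: the limit counts record bytes, excluding the newline.
        if len(line) > max_len:
            raise ValueError(f"record of {len(line)} bytes exceeds {max_len}")
    return lines
```

For example, `split_records(b"ab\ncd\n", 3)` returns `[b"ab\n", b"cd\n"]`: with a fixed record size, the newlines are kept as ordinary data.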

-k Specify the collating sequence for the sorting fields that follow. You specify the collating sequence as one or more ranges of values, and the program assigns collating values to characters in the order of their specification. For example, to sort lower-case alphabetic characters, use -k 'a'-'z'. Any character not specified is assigned a collating value of 0, and characters in a field beyond a character with a 0 collating value are not included in the sorting field. Hence, of the records b<0>a and b<0>c (where <0> stands for any character with collating value 0), either may precede the other in the output file. Any number of ranges may appear within a key specification. For example, if the sorting field may contain any combination of lower-case letters, digits, and spaces, use -k ' ' '0'-'9' 'a'-'z'. Spaces will then sort before digits, which sort before lower-case letters. If no key collating sequence is specified, the program uses a default collating sequence of all printable ASCII characters.
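
A collating sequence of this kind amounts to a lookup table. The Python sketch below is a model under stated assumptions, not psort's implementation: `build_collation` and `key_of` are hypothetical names, assigned values start at 1 because 0 is reserved for unlisted characters, and a range such as 'z'-'a' is allowed to run downward.

```python
from typing import Dict, List, Sequence, Tuple

def build_collation(ranges: Sequence[Tuple[str, str]]) -> Dict[str, int]:
    """Give characters collating values 1, 2, 3, ... in the order the
    ranges list them. A single character is written as (c, c); unlisted
    characters are absent from the table and count as 0."""
    table: Dict[str, int] = {}
    value = 1
    for first, last in ranges:
        step = 1 if ord(last) >= ord(first) else -1
        for code in range(ord(first), ord(last) + step, step):
            table[chr(code)] = value
            value += 1
    return table

def key_of(field: str, table: Dict[str, int]) -> List[int]:
    """Collating values of a field, cut off at the first value-0 character."""
    key: List[int] = []
    for ch in field:
        v = table.get(ch, 0)
        if v == 0:
            break  # characters after a value-0 character do not count
        key.append(v)
    return key
```

With `t = build_collation([(" ", " "), ("0", "9"), ("a", "z")])`, which corresponds to -k ' ' '0'-'9' 'a'-'z', sorting `["b2", "a 1", "10"]` with `key=lambda r: key_of(r, t)` gives `["10", "a 1", "b2"]`. The records b-a and b-c receive the same key, since '-' is not in the sequence.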

-r Repeat the previous collating sequence. For example, to fold upper-case letters to lower case for purposes of determining sorting priority, use -k 'a'-'z' -r 'A'-'Z'. This assigns the first character following the -r the same collating value as the first character assigned in the previous range, the second the same value as the second, and so on. To give various white-space characters equal weight, use -k ' ' -r '\t' -r '_'.
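
One way to model -r is to let a repeated range restart numbering at the first value of the range before it. The Python sketch below is hypothetical: `build_folded_collation` takes `(repeat, first, last)` triples in command-line order, numbering starts at 1 because 0 is reserved, and the rule that a later new range continues after the highest value assigned so far is an assumption the text does not settle.

```python
from typing import Dict, Sequence, Tuple

def build_folded_collation(spec: Sequence[Tuple[bool, str, str]]) -> Dict[str, int]:
    """spec holds (repeat, first, last) triples in command-line order.

    repeat=False: a new range, numbered after the values used so far.
    repeat=True:  an -r range, numbered from the first value of the
                  previous range, so it shares that range's values.
    """
    table: Dict[str, int] = {}
    next_value = 1  # 0 is reserved for unlisted characters
    prev_start = 1
    for repeat, first, last in spec:
        start = prev_start if repeat else next_value
        step = 1 if ord(last) >= ord(first) else -1
        v = start
        for code in range(ord(first), ord(last) + step, step):
            table[chr(code)] = v
            v += 1
        prev_start = start
        # Assumption: a repeated range never moves the counter backward.
        next_value = max(next_value, v)
    return table
```

Under this model, -k 'a'-'z' -r 'A'-'Z' gives 'A' and 'a' the same value, so apple, Banana, and cherry sort in that order, whereas a plain byte comparison would put Banana first.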

-n Numeric sort on the key. This is an alternative to -k. Numeric fields may contain a leading sign and/or a decimal point, and should look like [' ']...[+\-]['0'-'9']...[.]['0'-'9']... The first non-numeric character terminates the field. The field is sorted by numeric value.
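
The field pattern above can be read as a regular expression. This Python sketch of a numeric key is hypothetical (`numeric_key` is an illustrative name). It uses `decimal.Decimal` to avoid binary rounding, and it assumes that a field containing no digits sorts as zero, which the text does not specify.

```python
import re
from decimal import Decimal

# Leading spaces, optional sign, digits, optional decimal point, digits.
NUMERIC = re.compile(r" *([+-]?)([0-9]*)(?:\.([0-9]*))?")

def numeric_key(field: str) -> Decimal:
    """Value of the numeric prefix of a field, in the manner of -n.

    Parsing stops at the first character that does not fit the pattern.
    Assumption: a field with no digits at all sorts as zero.
    """
    m = NUMERIC.match(field)  # always matches, possibly the empty string
    sign, whole, frac = m.group(1), m.group(2), m.group(3) or ""
    if not whole and not frac:
        return Decimal(0)
    value = Decimal(f"{whole or '0'}.{frac or '0'}")
    return -value if sign == "-" else value
```

For instance, `numeric_key("  -3.5abc")` yields `Decimal("-3.5")`, and `sorted(["10", "9", "-2.5", " 3"], key=numeric_key)` orders the fields by value rather than by their characters.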

-i Invert the sequence of the sort for the last key specified.

-u Output only records that are unique according to the sorting key fields.
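
Both switches can be modeled with ordinary library sorting. In the Python sketch below, `sort_records` is a hypothetical helper that takes `(key function, inverted)` pairs in order of precedence; it relies on Python's stable sort and sorts by the least significant key first. Stability is a property of this sketch only; psort itself makes no such guarantee for records with equal sorting fields.

```python
from typing import Callable, List, Sequence, Tuple

# (key function, inverted?) pairs, most significant key first
KeySpec = Tuple[Callable[[str], object], bool]

def sort_records(records: Sequence[str], keys: Sequence[KeySpec],
                 unique: bool = False) -> List[str]:
    """Multi-key sort in which each key may be inverted, as with -i,
    optionally keeping only the first record of each run of equal keys (-u)."""
    out = list(records)
    # Stable sorts applied from the least to the most significant key
    # produce the combined order.
    for fn, inverted in reversed(keys):
        out.sort(key=fn, reverse=inverted)
    if unique:
        result: List[str] = []
        prev = None
        for rec in out:
            k = tuple(fn(rec) for fn, _ in keys)
            if not result or k != prev:
                result.append(rec)
            prev = k
        out = result
    return out
```

For instance, with `keys=[(lambda r: r[0], False), (lambda r: r[2], True)]`, the records `b 2`, `a 1`, `b 1`, `a 2` come out as `a 2`, `a 1`, `b 2`, `b 1`: ascending on the first character, descending on the third.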

-f Sort on one or more fields. Fields are groups of characters separated by a delimiter character, numbered starting at 0; that is, -f 0 refers to the first field, at the start of the record. A field specification may contain a range of fields, as in -f 2-4, to indicate that the sorting sequence is determined by the third, fourth, and fifth fields in turn. A range must have a definite end; -f 2- is not permitted. A field range need not be increasing: -f 3-2 is permitted and sorts first by the fourth field, then by the third.
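
Field selection amounts to splitting on the delimiter and indexing. The Python sketch below is hypothetical: `field_range` returns the fields of a -f range in the order they are compared, including the decreasing case, and treats a field number past the end of the record as an empty field, an assumption the text does not address. The tab default mirrors the -d description.

```python
from typing import List

def field_range(record: str, start: int, end: int, delim: str = "\t") -> List[str]:
    """Fields start..end of a record (0-based, both ends inclusive) in the
    order a -f range compares them; 3-2 yields field 3, then field 2."""
    fields = record.split(delim)
    step = 1 if end >= start else -1
    # Assumption: a field number past the end of the record reads as empty.
    return [fields[i] if i < len(fields) else ""
            for i in range(start, end + step, step)]
```

Sorting with `key=lambda r: field_range(r, 3, 2)` then compares the fourth fields first and uses the third fields only to break ties.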

-c Sort on one or more characters within the indicated fields. Character positions are counted from 0. For example, -f 1 -c 2-3 sorts on the third and fourth characters of the second field. Several character ranges may be specified for a given field. For example, -f 2 -c 5-6 -c 3-4 -c 1-2 specifies three sorting fields of two characters each, within the third delimited field. In a character range within a field, the second number must be greater than or equal to the first; -c 7-3 cannot be used. An indefinite character range can be specified, as in -c 4-, which indicates all characters from the fifth to the end of the field.
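
Character ranges are slices of a field. In this Python sketch, `char_ranges` is a hypothetical helper, an indefinite range such as -c 4- is written `(4, None)`, and positions past the end of the field are assumed to yield an empty piece, which the text does not specify.

```python
from typing import List, Optional, Tuple

def char_ranges(field: str, ranges: List[Tuple[int, Optional[int]]]) -> List[str]:
    """Pieces of a field selected by -c ranges, in the order given.

    Positions are 0-based and both ends are inclusive; (4, None) models
    -c 4-, which runs to the end of the field.
    """
    pieces: List[str] = []
    for start, end in ranges:
        if end is not None and end < start:
            raise ValueError(f"-c {start}-{end}: end must not precede start")
        stop = len(field) if end is None else end + 1
        pieces.append(field[start:stop])
    return pieces
```

Combined with field selection, -f 2 -c 5-6 -c 3-4 -c 1-2 corresponds to calling `char_ranges` on the third field with `[(5, 6), (3, 4), (1, 2)]`; for a field `abcdefgh` that yields `['fg', 'de', 'bc']`.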

<range> Specifies ranges of fields, displacements within a field, and collating values. The common syntax is <start>[-[<end>]]. <start> alone indicates a single value. <start>- indicates a range from <start> to an indefinitely large number; for example, -c 2- specifies all characters from the third to the end of the field (-f does not accept such open-ended ranges). <start>-<end> indicates a range from <start> to <end>. The start and end numbers can be written in several formats: numbers starting with 0x are taken as hexadecimal, other numbers starting with 0 are taken as octal, remaining numbers are decimal, and characters within apostrophes are converted to ASCII. Hence -k ' ' 'a'-'z' and -k 0x20 'a'-122 are equivalent.
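
The value syntax can be parsed with a small hand-written scanner. This Python sketch is illustrative: `parse_number`, `take_value`, and `parse_range` are hypothetical names, a quoted character is taken to be exactly one character between apostrophes, and `None` marks an open-ended <start>- range. Octal is handled explicitly because Python's `int(s, 0)` rejects a bare leading zero such as `017`.

```python
from typing import Optional, Tuple

def parse_number(tok: str) -> int:
    """Decimal, leading-0 octal, or 0x hexadecimal."""
    if tok[:2].lower() == "0x":
        return int(tok[2:], 16)
    if len(tok) > 1 and tok[0] == "0":
        return int(tok[1:], 8)
    return int(tok, 10)

def take_value(spec: str, i: int) -> Tuple[int, int]:
    """Read one value starting at spec[i]; return (value, next index)."""
    if spec[i] == "'":
        # A quoted character; this also allows the hyphen itself: '-'
        if i + 2 >= len(spec) or spec[i + 2] != "'":
            raise ValueError(f"bad character literal in {spec!r}")
        return ord(spec[i + 1]), i + 3
    j = spec.find("-", i)
    j = len(spec) if j == -1 else j
    return parse_number(spec[i:j]), j

def parse_range(spec: str) -> Tuple[int, Optional[int]]:
    """<start> -> (s, s); <start>- -> (s, None); <start>-<end> -> (s, e)."""
    start, i = take_value(spec, 0)
    if i == len(spec):
        return start, start
    if spec[i] != "-":
        raise ValueError(f"unexpected text in {spec!r}")
    if i + 1 == len(spec):
        return start, None  # open-ended range such as 2-
    end, j = take_value(spec, i + 1)
    if j != len(spec):
        raise ValueError(f"unexpected text in {spec!r}")
    return start, end
```

Both -k ' ' 'a'-'z' and -k 0x20 'a'-122 parse to the pairs (32, 32) and (97, 122), matching the equivalence stated above.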

-d The following character is the field delimiter, for example -d '|'. The default field delimiter is a tab (0x09).

If no sorting fields are specified, the whole record is taken as a sorting field. Sorting proceeds according to the precedence indicated by the sequence of the sorting fields. Records with the same sorting fields will be output in an unpredictable sequence.

Remember that characters not specified within a collating sequence are given collating value zero. This can result in unexpected behavior when fields are not the same length. The following is the result of sorting a small file with -k 'z'-'a'.

def
cad
basdf
a
aa

This was probably not the result intended. To get the desired result, use -k 'a'-'z' -i, which produces:

def
cad
basdf
aa
a
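
The two outputs above can be reproduced with a short model. In this Python sketch, `collation` and `collating_key` are hypothetical helpers, and the input order of the records is arbitrary. The model compares records as lists of collating values, so a record that is a prefix of another sorts first; it omits the cut-off at value-0 characters because the sample contains only letters.

```python
from typing import Dict, List

def collation(first: str, last: str) -> Dict[str, int]:
    """Collating values 1, 2, ... for first..last; the range may run
    downward, as in -k 'z'-'a'."""
    step = 1 if last >= first else -1
    codes = range(ord(first), ord(last) + step, step)
    return {chr(c): v for v, c in enumerate(codes, start=1)}

def collating_key(record: str, table: Dict[str, int]) -> List[int]:
    # Unlisted characters get 0; the sample data contains none.
    return [table.get(ch, 0) for ch in record]

records = ["aa", "basdf", "def", "a", "cad"]

down = collation("z", "a")   # -k 'z'-'a'
print(sorted(records, key=lambda r: collating_key(r, down)))
# ['def', 'cad', 'basdf', 'a', 'aa']

up = collation("a", "z")     # -k 'a'-'z' -i
print(sorted(records, key=lambda r: collating_key(r, up), reverse=True))
# ['def', 'cad', 'basdf', 'aa', 'a']
```

In this model the difference lies in prefixes: reversing the collating sequence still leaves a shorter prefix such as `a` ahead of `aa`, while -i inverts the whole comparison, prefixes included.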

The following switches normally need not be used. They are included for purposes of fine tuning and debugging.

-m Maximum memory, in kilobytes, to be allocated for the sort. This can be useful in a multitasking environment, so that psort does not take all the memory available in the system.

-b Specify the size, in kilobytes, of the buffers used for standard input and output. The default size is 30.

-sb Specify the size of the output buffer used to write data to the temporary file. The current default is 30.

-rb Specify the size of the input buffer used to read data from the temporary file. The current default is 30.

-v Specify visible mode, which displays statistics for each distribution pass over the file. It is useful for debugging and fine tuning. If only the top levels of distribution are wanted, use -v <number of levels>.

-l Length, in kilobytes, of the segments used for internal storage. The current default is 16.

© 1991 by Robert Ramey, all rights reserved

