Lab 7 Preparation

Regular expressions are a powerful way to write a pattern that describes a set of strings. They are constructed like arithmetic expressions, using various operators to combine smaller expressions. The utility program grep understands two versions of regular expression syntax: basic (grep) and extended (egrep, also available as grep -E), which is the basic set with some alterations.

Match a single character

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Therefore, a regular expression like "abc123" matches exactly that sequence of characters, and a grep or egrep using it will find any line that contains that string.

A list or range of characters enclosed by [ and ] brackets matches any single character in that list or range; however, if the first character of the list is the caret ^ then it matches any character not in the list. For example, the regular expression [0123456789] matches any single digit, as does the range [0-9]. On the other hand, [^0-9] matches any character that is not a digit.

There are also named classes of characters predefined for use within brackets, such as [:alnum:], [:alpha:], [:digit:], and more. Look them up in grep(1). As an example, [[:digit:]] means the same as [0-9] -- any single digit.

And finally, the period . matches any single character whatever it is.
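The single-character forms above can be tried on a few made-up lines before moving to real data. This is only a sketch: the sample text and the variable names are invented for illustration.

```shell
# Hypothetical sample lines standing in for real input.
sample=$(printf '%s\n' 'abc123' 'room 42' 'no digits here' '2024' 'x')

# A literal string matches itself wherever it appears in a line.
literal=$(printf '%s\n' "$sample" | grep 'abc123')

# [0-9] and [[:digit:]] select the same lines: those containing a digit.
with_range=$(printf '%s\n' "$sample" | grep '[0-9]')
with_class=$(printf '%s\n' "$sample" | grep '[[:digit:]]')

# [^0-9] matches one non-digit, so this counts the lines that
# contain at least one character that is not a digit.
nondigit_count=$(printf '%s\n' "$sample" | grep -c '[^0-9]')

# Each period matches any one character, so '..' needs a line
# that is at least two characters long.
two_chars=$(printf '%s\n' "$sample" | grep -c '..')

echo "$with_range"
```

Note that [^0-9] does not mean "lines without digits"; it only requires that some character on the line is not a digit, which is why 'abc123' still counts.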

Anchors

If you don't anchor your regular expression to some fixed location or value, it will be found anywhere it exists, even as part of something else. The caret ^ (when outside [ and ]) and the dollar sign $ match the beginning and end of a line, respectively. The symbols \< and \> match the beginning and end of a word. There are many more special codes and metacharacters for a variety of locations.
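To see how much difference an anchor makes, the same three letters can be searched for with and without one. The word list below is invented, and the \< and \> forms are assumed to be available (they are in GNU grep; other versions of grep may differ).

```shell
# Hypothetical word list; every line contains the letters "cat".
lines=$(printf '%s\n' 'cat' 'catalog' 'concatenate' 'the cat sat' 'bobcat')

# Unanchored: found anywhere, even inside another word.
anywhere=$(printf '%s\n' "$lines" | grep -c 'cat')

# ^ ties the match to the start of the line, $ to the end.
at_start=$(printf '%s\n' "$lines" | grep -c '^cat')
at_end=$(printf '%s\n' "$lines" | grep -c 'cat$')

# Both anchors together: the line must be exactly "cat".
whole_line=$(printf '%s\n' "$lines" | grep -c '^cat$')

# \< and \> tie the match to word boundaries instead of line boundaries.
whole_word=$(printf '%s\n' "$lines" | grep -c '\<cat\>')

echo "anywhere=$anywhere start=$at_start end=$at_end line=$whole_line word=$whole_word"
# → anywhere=5 start=2 end=2 line=1 word=2
```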

Repetition

A basic regular expression may be followed by one of several repetition operators:

* The preceding item will be matched zero or more times.

\{n\} The preceding item is matched exactly n times.

\{n,m\} The preceding item is matched at least n times, but not more than m times. If m is omitted, as in \{n,\}, there is no upper limit.

An extended regular expression has more:

? The preceding item is optional and matched at most once.

+ The preceding item will be matched one or more times.

{n,m} Same as the \{n,m\} forms, but written without the backslashes.
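The repetition operators are easier to compare side by side. The sketch below uses invented userids and writes the extended patterns with grep -E, which accepts the same syntax as egrep.

```shell
# Hypothetical userids: letters followed by zero or more digits.
ids=$(printf '%s\n' 'smit0042' 'allisor' 'lee7' 'wong123456')

# Basic: * means zero or more, so this keeps lines made only of letters.
letters_only=$(printf '%s\n' "$ids" | grep '^[a-z]*$')

# Basic: \{4\} means exactly four digits before the end of the line.
exactly_four=$(printf '%s\n' "$ids" | grep '^[a-z]*[0-9]\{4\}$')

# Extended: + means one or more, {4,} means four or more (no backslashes).
four_or_more=$(printf '%s\n' "$ids" | grep -E '^[a-z]+[0-9]{4,}$')

# Extended: ? makes the single trailing digit optional.
optional_digit=$(printf '%s\n' "$ids" | grep -E '^[a-z]+[0-9]?$')

echo "$exactly_four"     # → smit0042
echo "$four_or_more"     # → smit0042 and wong123456, one per line
```

The anchors matter here: without ^ and $, the pattern [0-9]\{4\} would also match wong123456, because four of its six digits are enough to satisfy it.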

List

A list of regular expressions separated by the OR operator | matches any string matching any of the sub-expressions. These are often grouped in parentheses. In basic syntax, write the operator as \| and the parentheses as \( \); in extended syntax, omit the backslashes.
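The same list can be written in either syntax. In this sketch the three records are shortened, invented stand-ins for password file lines, and the basic form relies on \| being accepted, which is true of GNU grep but not guaranteed elsewhere.

```shell
# Hypothetical, shortened records (userid:x:uid).
records=$(printf '%s\n' 'allisor:x:601' 'ayalac:x:602' 'smit0042:x:2001')

# Basic syntax: the group and the OR operator are backslashed.
basic=$(printf '%s\n' "$records" | grep '^\(allisor\|ayalac\):')

# Extended syntax: the same pattern without the backslashes.
extended=$(printf '%s\n' "$records" | grep -E '^(allisor|ayalac):')

echo "$basic"
```

Grouping keeps the ^ anchor and the trailing colon applied to both names; without the parentheses, the pattern would mean "^allisor" or "ayalac:".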

Examples

First, get a copy of the full system password file:

ypcat passwd > ypcat-passwd

Now experiment with these and other regular expressions:

grep '^allisor:' ypcat-passwd # display myself

grep '^\(allisor\|ayalac\):' ypcat-passwd # me or another

grep '^.ll' ypcat-passwd # another way

grep ':[SDJ][a-z]' ypcat-passwd # a few profs

Note

Regular expressions can look confusingly like file globs, so take care not to mix up the two:

Character   Glob meaning               Reg Exp meaning
?           Any single character       0 or 1 of preceding object
*           Any string of characters   0 or more of preceding object
[ ]         Range or list              Range or list
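The difference shows up when the same pattern text is tested both ways. The two helper functions below are hypothetical names made up for this sketch: one applies a pattern as a glob using the shell's case statement, the other applies it as an extended regular expression that must match the whole line (grep -E -x).

```shell
# Hypothetical helper: does STRING match PATTERN treated as a glob?
glob_match() {
    case $2 in
        $1) echo yes ;;
        *)  echo no ;;
    esac
}

# Hypothetical helper: does STRING match PATTERN treated as an
# extended regular expression covering the whole line?
regex_match() {
    if printf '%s\n' "$2" | grep -Eqx -- "$1"; then
        echo yes
    else
        echo no
    fi
}

# ? is "any one character" as a glob, "optional a" as a regular expression.
g1=$(glob_match  'a?c' 'abc')    # yes
r1=$(regex_match 'a?c' 'abc')    # no
g2=$(glob_match  'a?c' 'ac')     # no
r2=$(regex_match 'a?c' 'ac')     # yes

# * is "any string" as a glob, "zero or more a's" as a regular expression.
g3=$(glob_match  'a*c' 'abbbc')  # yes
r3=$(regex_match 'a*c' 'abbbc')  # no

echo "$g1 $r1 $g2 $r2 $g3 $r3"
# → yes no no yes yes no
```

This is also why the patterns in the examples are quoted: an unquoted * or ? on the command line may be expanded by the shell as a glob before grep ever sees it.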

Password file

The password file (/etc/passwd) no longer contains actual passwords, but it does contain an entry for every valid user of the system. Each entry has 7 fields, separated by colons:

  1. userid -- faculty ids are the first 6 letters of the family name plus the personal name's initial letter; student userids are the first 4 letters of the family name plus a 4-digit (or more) number to get an 8-character id.

  2. x (formerly encrypted password)

  3. uid (numeric user id) -- should be unique

  4. gid (numeric group id) -- students are all in group 503, but each faculty member has a group number from 600 to 699.

  5. name or description, can contain spaces; students are normally all upper-case "FAMILY PERSONAL", while faculty are usually "Personal Family".

  6. home directory -- usually /home/userid_for_record

  7. default shell to run

See passwd(5) for more details.
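Knowing the field layout makes it possible to aim a regular expression at one field rather than the whole line. The sketch below runs on three invented records that follow the conventions described above; the names, uids and shells are assumptions, not real accounts.

```shell
# Hypothetical records in the 7-field format described above.
passwd_sample=$(printf '%s\n' \
    'smit0042:x:2001:503:SMITH JOHN:/home/smit0042:/bin/bash' \
    'wong1234:x:2002:503:WONG AMY:/home/wong1234:/bin/sh' \
    'johnsoa:x:1001:601:Alice Johnson:/home/johnsoa:/bin/bash')

# [^:]*: skips one whole field, so three of them reach field 4 (gid).
# Students: gid is exactly 503.
student_ids=$(printf '%s\n' "$passwd_sample" |
    grep '^[^:]*:[^:]*:[^:]*:503:' | cut -d: -f1)

# Faculty: gid from 600 to 699, using extended syntax to repeat the skip.
faculty_ids=$(printf '%s\n' "$passwd_sample" |
    grep -E '^([^:]*:){3}6[0-9]{2}:' | cut -d: -f1)

# Student userid shape: four letters, then four or more digits.
student_shape=$(printf '%s\n' "$passwd_sample" |
    grep -c '^[a-z]\{4\}[0-9]\{4,\}:')

echo "$student_ids"
echo "$faculty_ids"
```

Anchoring with ^ and counting colons keeps a number such as 503 from matching by accident in the uid or in some other field.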