Friday, September 19, 2014

Tip #3 [sed]: Group capturing

Suppose you have the following text file (let's name it file.txt):

date:2014-01-01 value [10cm]
date:2014-01-02 value [11cm]
date:2014-01-03 value [15cm]
date:2014-01-04 value [19cm]

and you want to extract the date and the numeric value, like so:

2014-01-01 10
2014-01-02 11
2014-01-03 15
2014-01-04 19

You can easily achieve this on the command line using group capturing with sed:

sed -r 's/.*:(....-..-..) value \[(.*)cm\]/\1 \2/g' file.txt

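sed captures the first and second groups (each delimited by a pair of parentheses) and inserts them into the replacement through the back-references \1 and \2. Everything else on the line is matched by the pattern but left out of the replacement, so it is dropped.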
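As a variation on the command above, here is a minimal sketch with a stricter pattern. The digit-only groups, the anchors, and the -n/p combination are my own choices for illustration, not part of the original tip. It uses '-E' for extended regular expressions; swap in '-r' if your sed only accepts that.

```shell
# Sample input: two well-formed lines plus one line that should be skipped.
input='date:2014-01-01 value [10cm]
date:2014-01-02 value [11cm]
some unrelated line'

# Group 1: a date made of digits and dashes.
# Group 2: the digits right before "cm".
# -n suppresses default output; the trailing p prints only lines where
# the substitution succeeded, so the unrelated line is not printed.
result=$(printf '%s\n' "$input" |
  sed -n -E 's/^date:([0-9]{4}-[0-9]{2}-[0-9]{2}) value \[([0-9]+)cm\]$/\1 \2/p')

echo "$result"
```

Restricting each group to digits means a line that does not follow the expected layout is skipped instead of being partially rewritten.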

The '-r' option enables extended regular expressions. It is a GNU sed option, so it is usually what you will find on Linux distros. Mac OS X ships a BSD sed, where you will probably have to use '-E' instead (recent GNU sed versions accept '-E' as well). If in doubt, check the manual by typing man sed.
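If you would rather not depend on either flag, the same capture can be written with basic regular expressions, which is sed's default syntax. In that syntax the grouping parentheses are escaped as \( and \). The snippet below is a sketch under that assumption; the variable names are only for illustration.

```shell
# One line in the same layout as file.txt.
line='date:2014-01-03 value [15cm]'

# No -r or -E: in basic regular expressions, groups are written \( ... \).
# The opening bracket is escaped as \[; a bare ] is already literal
# outside a bracket expression, so it needs no backslash.
portable=$(printf '%s\n' "$line" |
  sed 's/.*:\(....-..-..\) value \[\(.*\)cm]/\1 \2/')

echo "$portable"   # prints: 2014-01-03 15
```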

You can find more information on group capturing with regular expressions here.
