[LUAU] Using SED to extract song, artist, and album from my iTunes Music List Export

Tim Newsham newsham at lava.net
Tue Sep 26 10:59:13 PDT 2006


> I am wanting to generate a human-readable list of my iTunes library.

> I want it to look something like this.  Automagically wrapping quotes around 
> the song title would be ideal.
>
> 1. The Happy Organ Baby Dave Cortez Rockin Instrumentals

I can't tell whether your file is tab-separated or whether the columns
are aligned by padding with spaces.  It makes a difference to the
solution and to the tools you might use.
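
One quick way to check, assuming a system with the standard od tool, is
to dump a line of the file character by character.  The sample line
below is made up for illustration; on the real export you would run
something like head -n 1 yourfile | od -c, where yourfile is a
placeholder for the export's file name.

```shell
# A made-up line with a tab between the title and the artist.
# od -c prints one character per column: a tab shows up as \t,
# while space padding shows up as a run of blank columns.
printf 'The Happy Organ\tBaby Dave Cortez\n' | od -c
```

If you see \t between the fields, the file is tab-separated; if you only
see runs of spaces, it is fixed-width.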

If the file is laid out in fixed-width columns, it's really easy to get
at the fields using the "cut" command.  For example, if you want
characters 6 to 7 and 11 to 15 of each line:

     $ echo this is a test |cut --output-delimiter=' : ' -c6-7,11-15
     is : test

Cut can't reorder the columns, and it can't put quotes around your
results (although it can put a separator between them, as shown above).
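
A quick way to see the reordering limitation: listing the ranges in the
opposite order does not swap them, because cut always writes the selected
characters in the order they appear in the input line.  As in the example
above, --output-delimiter is an option of GNU cut and may be missing from
other versions.

```shell
# Ranges requested "backwards": 11-15 first, then 6-7.
# cut still prints them left to right, same as the example above.
echo this is a test | cut --output-delimiter=' : ' -c11-15,6-7
# prints: is : test
```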

You can also use sed for this situation.  It's actually not that
complicated, just a little messier.  We use the rule

      's/pattern/repl/g'

to globally replace pattern with repl.  Here each dot is a wildcard that
matches exactly one character, and the leading ^ anchors the match to the
start of the line.  We count out as many dots as we want to skip and as
many dots as we want to capture, and put the captures in parentheses (sed
requires the parentheses to be escaped with backslashes).  The final '.*'
matches whatever is left of the line (any number of characters, including
none).  That makes up the pattern; the second half, the replacement
string, uses \1 and \2 to fill in the parts we captured with the
parentheses:

     $ echo this is a test |sed 's/^.....\(..\)...\(....\).*/\1 \2/g'
     is test

or with the fields reversed and quoted:

     $ echo this is a test |sed 's/^.....\(..\)...\(....\).*/"\2" "\1"/g'
     "test" "is"
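
Counting dots by hand gets error-prone for wide columns.  The basic
regular expressions that sed uses by default also accept an interval
\{n\}, meaning "exactly n of the preceding item", so the reversed, quoted
example can be written with the column widths as numbers.  The trailing g
is dropped here because a pattern anchored with ^ can only match once per
line anyway:

```shell
# .\{5\} is the same as ..... (skip 5 characters); \(.\{2\}\) captures 2,
# .\{3\} skips 3 more, and \(.\{4\}\) captures the next 4.
echo this is a test | sed 's/^.\{5\}\(.\{2\}\).\{3\}\(.\{4\}\).*/"\2" "\1"/'
# prints: "test" "is"
```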

If, on the other hand, the fields are tab-separated, you can use cut with
its -d flag (sets the field delimiter) and -f flag (picks out fields):

     $ echo 'this<tab>is<tab>a<tab>test' |
       cut --output-delimiter=' : ' -d'<tab>' -f2,4
     is : test

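(You might have to type control-v and then <tab> to get a literal tab in
your shell; in bash you can also write $'\t'.  Since tab is cut's default
delimiter, the -d flag can actually be left off here.)  There's a similar
solution using sed, but it's a bit more complicated than the previous one,
and this turns out to be really easy in awk, so I'll skip the sed
solution here and give this instead: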


     $ echo 'this<tab>is<tab>a<tab>test' |
       awk -F '<tab>' '{ print $4, $2; }'
     test is

     $ echo 'this<tab>is<tab>a<tab>test' |
       awk -F '<tab>' '{ printf("\"%s\" \"%s\"\n", $4, $2); }'
     "test" "is"

Here you're telling awk that fields are separated by tabs (the -F flag)
and giving it a little script that prints the fourth and second fields.
The print command is simple: it just separates its arguments with a
single space, which doesn't let you format the results very much.  The
printf command is more complicated but gives you more control (here we
used it to put quotes around each field).
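
Putting those pieces together for the list you asked for, here is a
minimal sketch.  It assumes the export is tab-separated, that its first
line is a header, and that the song name, artist and album are the first
three columns.  Those column positions are a guess for illustration, so
check the header line of your own file and adjust $1, $2 and $3.
Depending on the iTunes version, the export may also use a different text
encoding or line ending, which would need converting before awk sees it.
awk's built-in NR (the current line number) supplies the numbering:

```shell
# Made-up sample standing in for the iTunes export: a header line and one
# tab-separated record. The column order (Name, Artist, Album) is assumed.
# In awk, NR > 1 skips the header line and NR - 1 numbers songs from 1.
printf 'Name\tArtist\tAlbum\nThe Happy Organ\tBaby Dave Cortez\tRockin Instrumentals\n' |
  awk -F '\t' 'NR > 1 { printf("%d. \"%s\" %s %s\n", NR - 1, $1, $2, $3) }'
# prints: 1. "The Happy Organ" Baby Dave Cortez Rockin Instrumentals
```

To run it on the real export, drop the printf and give awk the file name
as its last argument instead.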

If you know perl or python or some other general-purpose scripting
language with good regular expression support, it's probably easier and
cleaner to implement what we did above in that language, and as others
suggested, that's the way to go.  But if you don't know one of those
languages, learning a little bit of sed or awk (or even cut) can get the
job done and is a lot easier to pick up than a whole new general-purpose
language.

> --scott

Hope that helps.

Tim Newsham
http://www.thenewsh.com/~newsham/
