Thursday, April 4, 2013

Parsing columns in BASH

I sometimes need "cut" functionality that works for fields separated by more than a single space. For instance, I sadly have a script just to kill nepomukindexer because it constantly goes berserk and uses all 8 GB of my RAM.
So I need to parse the process IDs of all the processes that contain the word nepomukindexer (there are hundreds at times).

First try

I figured cut would do the trick.
ps -ef | grep nepomukindexer | cut -d " " -f 5 
This only works if there are exactly 4 spaces between the first two fields, because cut counts every single space as its own delimiter, and that isn't always the case.
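To see why the field number shifts, here is a small sketch using two made-up lines in the shape of ps -ef output (the user name, PIDs, and other values are invented for illustration). With 4 spaces of padding the PID lands in field 5; with 3 spaces, field 5 silently becomes the next column instead.

```shell
# Hypothetical sample lines (values are made up).
# %-8s pads "user" with 4 spaces; %-7s pads it with only 3.
line_a=$(printf '%-8s%s 1 0 10:00 ? 00:00:01 nepomukindexer' user 1234)
line_b=$(printf '%-7s%s 1 0 10:00 ? 00:00:01 nepomukindexer' user 12345)

# cut treats every single space as a delimiter, so each extra space
# in a run produces an empty field and shifts the numbering.
pid_a=$(printf '%s\n' "$line_a" | cut -d " " -f 5)
pid_b=$(printf '%s\n' "$line_b" | cut -d " " -f 5)

echo "4 spaces -> '$pid_a'"   # prints: 4 spaces -> '1234'
echo "3 spaces -> '$pid_b'"   # prints: 3 spaces -> '1' (the parent PID column)
```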

Second try

At first I didn't understand why sed doesn't treat the "+" regular expression symbol the way I expected. The following seems like it should replace all instances of one or more spaces with a single space, but it does nothing, because sed uses basic regular expressions by default and an unescaped + there is just a literal plus sign.
ps -ef | grep nepomukindexer | sed 's/ +/ /g'
Edit: I just found a workaround for this. I changed this to
ps -ef | grep nepomukindexer | sed 's/  */ /g'
                                      ^^ two spaces
and it worked. The only difference is the pattern, which now matches a single space followed by zero or more additional spaces and replaces the whole run with a single space.
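For comparison, here is a rough sketch of a few equivalent ways to squeeze runs of spaces, run against a made-up line. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do), where + means "one or more"; tr -s is a standard alternative. Once the spaces are squeezed, cut can address the PID as field 2.

```shell
# Made-up input with runs of spaces between fields.
line='user      1234     1  0 nepomukindexer'

# Basic regex (sed default): one space followed by zero or more spaces.
bre=$(printf '%s\n' "$line" | sed 's/  */ /g')

# Extended regex: -E makes + mean "one or more".
ere=$(printf '%s\n' "$line" | sed -E 's/ +/ /g')

# tr -s squeezes each run of the listed character down to a single one.
squeezed=$(printf '%s\n' "$line" | tr -s ' ')

# With single spaces between fields, cut finds the PID in field 2.
pid=$(printf '%s\n' "$squeezed" | cut -d ' ' -f 2)
echo "$pid"
```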

Third try

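The solution I found is to use awk, which splits fields on runs of whitespace (that is its default behavior, so the -F" " below is optional).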
ps -ef | grep nepomuk | awk -F" " '{print $2}'
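Using this command, I pass the resulting PIDs to kill (kill takes PIDs as arguments rather than on standard input, so something like xargs kill does the piping) and all is well.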
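Here is a minimal sketch of the awk step on made-up sample text standing in for the ps -ef output (the PIDs and other values are invented). The full kill pipeline is left commented out so the snippet is safe to paste; the '[n]epomukindexer' pattern in it is my own addition, a common trick to keep grep from matching its own entry in the process list.

```shell
# Sample text standing in for `ps -ef | grep nepomukindexer` (made-up values).
sample='user      1234     1  0 10:00 ?        00:00:01 nepomukindexer
user     12345     1  0 10:01 ?        00:00:02 nepomukindexer'

# awk splits on runs of whitespace by default, so $2 is the PID
# no matter how many spaces pad the columns.
pids=$(printf '%s\n' "$sample" | awk '{print $2}')
echo "$pids"

# kill expects PIDs as arguments, so xargs turns the piped list into arguments.
# The [n] bracket keeps grep's own command line from matching the pattern.
# ps -ef | grep '[n]epomukindexer' | awk '{print $2}' | xargs kill
```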
