match regexp help

worker201
guru
Posts: 668
Joined: Sun Jun 13, 2004 6:38 pm
Location: Hawaii

Post by worker201 » Tue Aug 02, 2005 4:59 pm

I have a text file that looks kinda like this:
adbathyqc.grd: Title: surface
adbathyqc.grd: Command: surface adbathyqc.xyz -R-92:06/-91:53/28:02/28:08 -I0.25c/0.25c -Gadbathyqc.grd -V
adbathyqc.grd: Remark:
adbathyqc.grd: Normal node registration used
adbathyqc.grd: grdfile format # 0
adbathyqc.grd: x_min: -92.1 x_max: -91.8833 x_inc: 6.94444e-05 units: user_x_unit nx: 3121
adbathyqc.grd: y_min: 28.0333 y_max: 28.1333 y_inc: 6.94444e-05 units: user_y_unit ny: 1441
adbathyqc.grd: z_min: -112.58 z_max: -60.9849 units: user_z_unit
adbathyqc.grd: scale_factor: 1 add_offset: 0
What I want to do is match the -R part of the second line (in this case, -R-92:06/-91:53/28:02/28:08), and use it as a variable in a shell script. I think I could do this with sed, but I don't have my trusty sed/awk manual with me right now. Apparently, this sort of thing can be done in Perl, but I don't know Perl, and don't have time to learn it at the moment.

So some sort of shell script subroutine is what I need. Anyone got any pointers or ideas?

Void Main
Site Admin
Posts: 5712
Joined: Wed Jan 08, 2003 5:24 am
Location: Tuxville, USA

Post by Void Main » Tue Aug 02, 2005 7:45 pm

This stuff is SO easy in Perl compared to trying to do it in a shell script. But if you must:

$ grep Command adbathyqc.grd | sed "s/.*-R\(.*\) -I.*/\1/"
Or if you wanted to stick the result of that into a variable:

MYVAR="`grep Command adbathyqc.grd | sed \"s/.*-R\(.*\) -I.*/\1/\"`"
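
The same capture can also be written with the $(...) form of command substitution, which avoids escaping the inner quotes. This is only a sketch: it assumes the Command line from the first post is fed in directly, and the variable names line and range are made up for illustration.

```shell
# Sample input: the "Command" line from the first post.
line='adbathyqc.grd: Command: surface adbathyqc.xyz -R-92:06/-91:53/28:02/28:08 -I0.25c/0.25c -Gadbathyqc.grd -V'

# $(...) runs the pipeline like backticks do, but the inner quotes
# need no backslashes.
range="$(printf '%s\n' "$line" | grep Command | sed 's/.*-R\(.*\) -I.*/\1/')"

echo "$range"   # -92:06/-91:53/28:02/28:08
```

Single quotes around the sed expression are safe here because nothing inside it needs to be expanded by the shell.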


Post by worker201 » Wed Aug 03, 2005 12:47 pm

That works, all right. But I would like to understand why it works, for next time. I understand the grep part of the command: it is searching the file for lines containing Command. But the sed part is losing me, especially the "s/ part. Can you help me break this down?

I swear, I'm going to learn perl eventually :oops:


Post by Void Main » Wed Aug 03, 2005 1:34 pm

The same command works in Perl (very nearly the same anyway). Basically it's just a "search and replace" using regular expressions. This is the search and replace:

s///

You put the pattern in between the first and second "/" and put what you want to change any matching text to in between the second and third "/". There are switches you can add to that, like:

s///g

Which means don't just change the first occurrence, change every occurrence that matches your pattern (global). You can also make it case insensitive by adding an "i" (in sed this is a GNU extension):

s///i
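
A quick way to see the difference the "g" switch makes is a throwaway test like the one below. The sample sentence and the variable names are made up for illustration; the case-insensitive switch is left out because not every sed supports it.

```shell
text="one fish two fish"

# Without a switch, only the first match on the line is replaced.
first_only="$(echo "$text" | sed 's/fish/cat/')"

# With g, every match on the line is replaced.
all="$(echo "$text" | sed 's/fish/cat/g')"

echo "$first_only"   # one cat two fish
echo "$all"          # one cat two cat
```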

Actually the sed man page describes this more clearly:

       s/regexp/replacement/
              Attempt  to  match regexp against the pattern space.  If successful, replace that portion matched with replacement.  The replacement may contain the
              special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding
              matching sub-expressions in the regexp.
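
As a rough illustration of the "&" and "\1" pieces that the man page excerpt mentions, here is a small sketch. The input string and variable names are invented for the example.

```shell
# & stands for the whole text that the pattern matched.
whole="$(echo "hello world" | sed 's/world/[&]/')"

# \1 and \2 stand for what the first and second \( \) groups matched.
swapped="$(echo "hello world" | sed 's/\(hello\) \(world\)/\2 \1/')"

echo "$whole"     # hello [world]
echo "$swapped"   # world hello
```
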
Now the magical part of all of this is the regular expressions. These things look very intimidating but let me tell you they are not hard to learn, and they are SOOOOO powerful and cool that it will definitely not be a waste of time to learn the basics. Here's a nice site to get you started:

http://www.regular-expressions.info/

So basically, in what I gave you, the ".*-R" matches all characters up to and including the "-R". The parens with the wildcard, "\(.*\)", match any number of any character after the "-R", up to the space and the "-I". The trailing " -I.*" matches the rest of the line. Because the pattern matches the whole line, the whole line gets replaced, and the replacement is just "\1", which stands for whatever the first set of parens matched. Everything else gets wiped out. That's not a very good explanation but once you experiment with them it will all become clear.
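
One way to see which piece of the pattern grabs which piece of the line is to wrap all three pieces in parens and print them in brackets instead of throwing two of them away. This is just a sketch for experimenting; the variable names line and parts are made up.

```shell
line='adbathyqc.grd: Command: surface adbathyqc.xyz -R-92:06/-91:53/28:02/28:08 -I0.25c/0.25c -Gadbathyqc.grd -V'

# Group 1: everything up to and including -R
# Group 2: the part we actually want
# Group 3: the space, the -I, and the rest of the line
parts="$(printf '%s\n' "$line" | sed 's/\(.*-R\)\(.*\)\( -I.*\)/[\1][\2][\3]/')"

echo "$parts"
```

The middle bracket holds the same text that "\1" produced in the original command.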


Post by worker201 » Wed Aug 03, 2005 2:14 pm

I'm going to try to break this down, and make sure that I know what each character is doing. I've used _ to represent a space. First, I give the bit of the command, and then my interpretation of what it is doing.


MYVAR=
# assign the following value to $MYVAR
"
# beginning of what goes into the variable
'
# beginning of the grep command
grep
# the finding tool
Command
# the pattern grep is searching for
adbathyqc.grd
# the file grep is searching in
|
# send that output to...
sed
# the stream editor
\"
# escape the dbl-quote, so that we know it isn't the end of the var definition
s/
# begin the sed search
.*-R
# match at least one character and then anything up to -R
\(.*\)
# no idea - this is the part that confuses me most
_-I.*
# match at least one character and anything else after _-I
/
# end of the match pattern, beginning of replace pattern
\1
# what's left after the match? does this get thrown onto a stack or something?
/
# end sed s&r
\"
# escape dbl-quote to end sed
'
# end grep
"
# end variable assignment

Is this about right? Can you help me fill in the gaps? I know a little bit about this stuff, but your code goes beyond my familiarity.


Post by Void Main » Wed Aug 03, 2005 2:46 pm

You are really close. The important part is that whatever is matched within the parens \(.*\) is what the \1 in the replacement stands for. Everything else just gets thrown out. So think of it as picking out the part between the -R before the parens and the " -I" after the parens. In Perl (and in GNU sed) "\s" matches any whitespace character, so I could have used that instead of " ". Also, ".*" actually matches 0 or more of any character, not 1 or more. Other than that you have it. The easiest way to play with this is just by piping echo statements to sed:

echo "This is a test" | sed "s/ is / was /"
Results in "This was a test". Notice I put spaces around the words "is" and "was" (in the pattern I could have used "\s" to represent those spaces). If I didn't do that it would have matched the "is" in the word "This" and produced "Thwas is a test". Not what I was looking for. If you want to use part of the matched original text as part of the replacement you have to use the parens and the \1, \2, etc. syntax. For instance:

echo "This is a test" | sed "s/ \(is\) / \1n't /"
Results in "This isn't a test".


Post by worker201 » Wed Aug 03, 2005 3:08 pm

Weird. Because . matches a single character. So you would think that .* would match a single character followed by whatever. But I guess the * modifies the ., changing its meaning.


Post by Void Main » Wed Aug 03, 2005 3:27 pm

worker201 wrote: Weird. Because . matches a single character. So you would think that .* would match a single character followed by whatever. But I guess the * modifies the ., changing its meaning.

Exactly, memorize this page:

http://www.regular-expressions.info/reference.html

It will all be clear after that.

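If you wanted one or more of any character you would use ".+"; zero or more is ".*". (With sed that means running "sed -E"; in sed's default basic syntax a plain "+" is just a literal plus sign.)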
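
To see the difference between zero-or-more and one-or-more, something like the following can be tried. It is a sketch that assumes a sed which accepts -E for extended syntax (current GNU and BSD versions do); the test strings and variable names are made up.

```shell
# .* allows zero characters between a and b, so "ab" matches.
zero_ok="$(echo "ab" | sed -E 's/a.*b/X/')"

# .+ needs at least one character between a and b, so "ab" is left alone.
one_needed="$(echo "ab" | sed -E 's/a.+b/X/')"

# With something in between, .+ matches as well.
both="$(echo "a-b" | sed -E 's/a.+b/X/')"

echo "$zero_ok"      # X
echo "$one_needed"   # ab
echo "$both"         # X
```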


Post by worker201 » Wed Aug 03, 2005 3:54 pm

Next problem:

at the command-line, this works:
% echo "adpoly.xy" | sed "s/\(.*\)poly.xy/\1/"

it produces the output "ad", which is what I wanted.

Now, what I want to do is pass $1 (first script invocation argument) to this sed statement, and then produce a name out of it. Does that make any sense? I want to give XXpoly.xy on the command-line, and have the shell script read what the XX is, and then create a file called XX.info. What I have so far doesn't quite do it:
#!/bin/bash
# command line variables
POLY=$1
GRID=$2
#
PREF="'echo $POLY | sed \"s/\(.*\)poly.xy/\1/\"'"
echo $PREF
infofile=$PREF.info
grdinfo $GRID > infofile
RANGE="`grep Command infofile | sed \"s/.*-R\(.*\) -I.*/\1/\"`"
echo $RANGE
Once I can figure out how to extract the "prefix" and use it to make filenames, then I have lots more fun to add to this. But I have to get this part working first.

Here's the output:
[lholcombe@holcombe2 histograms]$ ./test.script adpoly.xy adbathyqc.grd
'echo adpoly.xy | sed "s/\(.*\)poly.xy/\1/"'
-92:06/-91:53/28:02/28:08
So it works! But it is creating an extremely inconvenient filename!


Post by Void Main » Wed Aug 03, 2005 5:50 pm

You are using single quotes (apostrophes) instead of backquotes (backticks) in your PREF= statement. It should be this:

PREF="`echo $POLY | sed \"s/\(.*\)poly.xy/\1/\"`"
not this:

PREF="'echo $POLY | sed \"s/\(.*\)poly.xy/\1/\"'"
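
For reference, here is a rough sketch of how the whole script might look with the quoting fixed. It is not the original script: a here-document stands in for the output of grdinfo so the example runs without GMT installed, POLY is hard-coded instead of taken from $1, and the temporary directory (workdir) is only there to keep the test file out of the way.

```shell
#!/bin/bash
# Stand-in for the first command-line argument ($1 in the real script).
POLY="adpoly.xy"

# $(...) (or backticks) runs the command; apostrophes would store it as text.
PREF="$(echo "$POLY" | sed 's/\(.*\)poly\.xy/\1/')"

# Hypothetical scratch directory so the example does not clutter anything.
workdir="$(mktemp -d)"
infofile="$workdir/$PREF.info"

# Stand-in for: grdinfo $GRID > "$infofile"
cat > "$infofile" <<'EOF'
adbathyqc.grd: Title: surface
adbathyqc.grd: Command: surface adbathyqc.xyz -R-92:06/-91:53/28:02/28:08 -I0.25c/0.25c -Gadbathyqc.grd -V
EOF

# Use "$infofile" (the variable), not the literal word infofile.
RANGE="$(grep Command "$infofile" | sed 's/.*-R\(.*\) -I.*/\1/')"

echo "$PREF"    # ad
echo "$RANGE"   # -92:06/-91:53/28:02/28:08

rm -rf "$workdir"
```

Note that the script as posted redirects into a file literally named infofile rather than into $infofile, so the name built from $PREF is never actually used there.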


Post by worker201 » Wed Aug 03, 2005 5:57 pm

dammit!

It's hard to tell the difference between an apostrophe and a tick. I think the keyboard needs a couple more symbols.


Post by Void Main » Wed Aug 03, 2005 6:28 pm

copy/paste never gets it wrong. :)


Post by worker201 » Wed Aug 03, 2005 6:28 pm

Things are going well now. Just one more thing, and I think I might have it.

I've been using this kind of construction to build filenames:
infofile=$PREF.info
which in the case of $PREF=ad, produces ad.info. Simple enough, and somewhat effective.

What if I wanted to construct adinfo.info? How would that work? Would sed do this? Or is there some simpler way?


Post by Void Main » Wed Aug 03, 2005 6:34 pm

You mean like "${PREF}info.info" or am I missing something?


Post by worker201 » Wed Aug 03, 2005 7:12 pm

Yeah, that would be the one. Care to comment on why that works?
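
A short sketch of why the braces matter, assuming bash and no variable named PREFinfo in the environment: letters, digits, and underscores can all be part of a variable name, so the shell needs the braces to know where the name ends. A period cannot be part of a name, which is why $PREF.info worked without them. The names wrong and right are just for illustration.

```shell
PREF="ad"

# The shell reads $PREFinfo as a variable named PREFinfo, which is unset,
# so it expands to nothing.
wrong="$PREFinfo.info"

# Braces mark exactly where the variable name ends.
right="${PREF}info.info"

echo "$wrong"   # .info
echo "$right"   # adinfo.info
```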