awkage

worker201
guru
Posts: 668
Joined: Sun Jun 13, 2004 6:38 pm
Location: Hawaii

Post by worker201 » Fri Apr 08, 2005 1:48 pm

I was using awk yesterday, and I ran into a snag that forced me to use a spreadsheet to do the column manipulations. I'll explain, and then you can tell me what I should have done.

Ok, I have a file which has 3 columns, x, y, and z. I needed to edit the z-values, so I used awk to separate the file into 2 parts - an xy and a z. After editing the z values, I wanted to add them to the xy file, thus recreating the original xyz file with new z values. Reasons - I was using gedit to do a find-replace on the z, and didn't want to disturb the xy (this could have been done with a combination of grep and awk, but the find-replace seemed easier).

So how can I take two columns from one file and add a third column from another file? Awk, being the one program that will always make *nix superior to Windows, no matter what, has got to be able to do this. I just don't know how.
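The column split described in the first post can be done in a single awk pass. This is a minimal sketch that assumes whitespace-separated x y z records. `split_xyz` and its argument order are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# split_xyz IN XY Z  (hypothetical helper)
# One pass over IN: columns 1-2 go to the file named XY,
# column 3 goes to the file named Z.
split_xyz() {
    awk -v xy="$2" -v z="$3" '{
        print $1, $2 > xy   # x and y columns
        print $3 > z        # z column only
    }' "$1"
}
```

For example, `split_xyz survey.xyz survey.xy survey.z` would put the x/y pairs in one file and the depths in the other.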

caveman
programmer
Posts: 130
Joined: Sun Feb 09, 2003 1:08 pm
Location: Midrand Gauteng, South Africa

Post by caveman » Fri Apr 08, 2005 5:01 pm

Now - there are various ways of doing this...

given that the records in both files are still in the same order, for example:

use getline to read a record from the second file for every record read
from the first file - something like the following
{
getline line < "second.file"
print $0 "," line
}
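Here is a runnable version of the getline approach. It is a minimal sketch that assumes both files have the same number of records in the same order. It joins with a space rather than a comma so that the result is still an x y z file. The name `merge_columns` is hypothetical. The sketch checks the return value of `getline` (1 when a record was read, 0 at end of file, -1 on error) so that it does not print a stale value once the second file runs out.

```shell
#!/bin/sh
# merge_columns XY Z  (hypothetical helper)
# Prints each record of XY followed by the matching record of Z.
merge_columns() {
    awk -v zfile="$2" '{
        # getline var < file returns 1 when a record was read
        if ((getline z < zfile) > 0)
            print $0, z
        else {
            # Z is shorter than XY (or unreadable): stop with an error status
            exit 1
        }
    }' "$1"
}
```

For this particular job, the standard `paste` utility does the same thing without awk: `paste -d ' ' xy z`.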

or - a little combo programming

create a little awk program that will prefix the record number for
each record and store it in a new file.
Then use "join" to join the two files.
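- something like the following to create the new files
(zero-padding the record number keeps the files in the sorted order
that join expects once there are more than 9 records)
{
printf "%08d,%s\n", NR, $0
}

and then use join where the default join field is the first
(-t ',' makes join split on the comma instead of on blanks)
join -t ',' first.file.pref second.file.pref > combo.file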
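The sketch below runs the join approach from start to finish. Two details matter. First, `join` splits fields on blanks unless you give it `-t`, so records prefixed with a comma need `-t ','`. Second, `join` expects its input sorted on the join field, and plain record numbers stop being sorted as text once `10` comes before `2`. Zero-padding with `printf` keeps them in order. The helper names and temporary files are hypothetical, and the final `tr` assumes the data itself contains no commas.

```shell
#!/bin/sh
# number_records FILE  (hypothetical helper)
# Prefixes each record with a zero-padded record number and a comma.
number_records() {
    awk '{ printf "%08d,%s\n", NR, $0 }' "$1"
}

# join_by_record XY Z  (hypothetical helper)
# Joins two files on their record numbers, drops the number, and
# turns the remaining commas into spaces (assumes the data has none).
join_by_record() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || return 1
    number_records "$1" > "$tmp_a"
    number_records "$2" > "$tmp_b"
    join -t ',' "$tmp_a" "$tmp_b" | cut -d ',' -f 2- | tr ',' ' '
    rm -f "$tmp_a" "$tmp_b"
}
```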

hope this makes sense 8)

Q - why didn't you use awk from the word go to manipulate the 3rd
field in any case?

Post by worker201 » Fri Apr 08, 2005 5:20 pm

Things have changed a bit since the first post. I'll try and give you a little background to what I am doing here.

A bathymetric survey (long, lat, depth) from the '30s was measured in fathoms (6 feet) and feet. Unfortunately, they used a confusing decimal notation to record this: what looks like 16.5 fathoms is actually 16 fathoms and 5 feet, or 16.8333 fathoms, which totally screws up the mapping. The replacement algorithm is as follows:
.1 >> .1667
.2 >> .3333
.3 >> .5
.4 >> .6667
.5 >> .8333
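A fathom is 6 feet, so each entry in the table above is just the feet digit divided by 6 (for example, 5/6 = 0.8333). That means a sketch can compute the value instead of looking it up. This version assumes column 3 is written as whole fathoms, a point, and a single feet digit from 0 to 5. The function name is hypothetical. The sign is handled separately because -74.4 means -(74 + 4/6), not -74 + 4/6.

```shell
#!/bin/sh
# fathoms_feet_to_fathoms FILE  (hypothetical helper)
# Reads "long lat depth" records where depth is fathoms.feet and
# prints depth as decimal fathoms with 4 decimal places.
fathoms_feet_to_fathoms() {
    awk '{
        depth = $3
        sign = 1
        if (depth ~ /^-/) { sign = -1; depth = substr(depth, 2) }
        dot = index(depth, ".")
        if (dot > 0) {
            whole = substr(depth, 1, dot - 1)
            feet = substr(depth, dot + 1)
        } else {
            whole = depth
            feet = 0
        }
        # 1 fathom = 6 feet, so each foot is 1/6 of a fathom
        print $1, $2, sprintf("%.4f", sign * (whole + feet / 6))
    }' "$1"
}
```

Unlike a find/replace, this only ever touches column 3, and it writes every depth with four decimals (a plain `3` becomes `3.0000`).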

Of course those values might occur in the long or lat columns, so a find/replace with gedit wouldn't work. And I had no idea how to do a find/replace in awk. So, I dumped the columns into 2 files, as explained earlier.

Now, I have realized that awk is capable of doing this by itself. Here is my script so far (which does not yet work):

Code:

#! /bin/awk -f
{if ($3 ~ /.5/){
	sub(/.5/, ".8333", $3)
else
if ($3 ~ /.4/)
	sub(/.4/, ".6667", $3)
else
if ($3 ~ /.3/)
	sub(/.3/, ".5", $3)
else
if ($3 ~ /.2/)
	sub(/.2/, ".3333", $3)
else
if ($3 ~ /.1/)
	sub(/.1/, ".1667", $3)
else
	sub(/.0/, ".0", $3)
}
print $1,$2,$3 ARGV > testB}
This script is supposed to read from testA (given as an arg on the command line) and output testB. Never used 'sub()' before, but apparently, you give it the pattern to search for, the replacement, and where to do the replacement. The problem seems to be in my bracketing of the if/else statements, which I cannot seem to find good documentation for. I finally broke down and bought the O'Reilly book, so maybe that will help. FYI, the syntax error seems to come on line 4, the first 'else' statement.
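A small sketch shows how `sub()` behaves. `sub(regex, replacement, target)` replaces only the first match in `target`, which defaults to `$0` when omitted. It changes `target` in place and returns the number of replacements made (0 or 1), so the return value can drive an if/else. When the target is a field such as `$3`, awk also rebuilds `$0`. The function name is hypothetical. The pattern here is escaped and anchored (`\.5$`), so a `.5` in the long or lat column is left alone.

```shell
#!/bin/sh
# sub_demo  (hypothetical helper): shows what sub() returns and changes.
sub_demo() {
    echo "157.5 21.5 16.5" | awk '{
        n = sub(/\.5$/, ".8333", $3)  # only field 3 is searched
        print n, $0                   # n is 1; $0 is rebuilt from the fields
        n = sub(/\.5$/, ".8333", $3)  # field 3 now ends in .8333, no match
        print n, $0                   # n is 0; nothing changed
    }'
}
```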

Post by caveman » Fri Apr 08, 2005 6:16 pm

Hmm - just some quick info:
an if-else in awk looks like the following
if (x == y)
{
    # what to do when
    # the if is true
}
else
{
    # what to do when
    # the if is false
}

nesting if-else statements - the braces are crucial
(I put the braces vertically aligned with the if-else):
{
    if ($3 ~ /.5/)
    {
        sub(/.5/, ".8333", $3)
    }
    else if ($3 ~ /.4/)
    {
        sub(/.4/, ".6667", $3)
    }
    else if ($3 ~ /.3/)
    {
        sub(/.3/, ".5", $3)
    }
    else if ($3 ~ /.2/)
    {
        sub(/.2/, ".3333", $3)
    }
    else if ($3 ~ /.1/)
    {
        sub(/.1/, ".1667", $3)
    }
    else
    {
        sub(/.0/, ".0", $3)
    }
}
hope this helps
NB! - this example will leave xxx.6 through xxx.9 unchanged

Post by worker201 » Fri Apr 08, 2005 6:34 pm

Here's my modified script:

Code:

#! /bin/awk -f
{   if ($3 ~ /.5/)
    {   sub(/.5/, ".8333", $3)
    }
    else if ($3 ~ /.4/)
    {   sub(/.4/, ".6667", $3)
    }
    else if ($3 ~ /.3/)
    {   sub(/.3/, ".5", $3)
    }
    else if ($3 ~ /.2/)
    {   sub(/.2/, ".3333", $3)
    }
    else if ($3 ~ /.1/)
    {   sub(/.1/, ".1667", $3)
    }
    else
    {   sub(/.0/, ".0", $3)
    }
    print $1,$2,$3 > testB }
And here's the error message:
awk: ./testconv:20: (FILENAME=testA FNR=1) fatal: expression for `>' redirection has null string value

What I want to do is put the new values into a new file, called testB. Unfortunately, line 20 above (the print statement) is the only way I know how to do this. Otherwise, this looks fine so far.

BTW, by definition, this file will have no decimal digits greater than 5 (there are only 6 feet in a fathom), so a case-based approach is probably the most efficient way to do it. No need to test for .6 through .9, because I know that there are none.
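POSIX awk has no `switch`/case statement (gawk adds one as an extension). One way to get case-based behaviour is a lookup table in an associative array, keyed on the digit after the point. This is a minimal sketch that assumes the depth is in column 3 with a single feet digit from 1 to 5. The array and function names are hypothetical. Values ending in .0, or with no point at all, pass through unchanged.

```shell
#!/bin/sh
# convert_depths FILE  (hypothetical helper)
convert_depths() {
    awk 'BEGIN {
        # feet digit -> fraction of a fathom (1 fathom = 6 feet)
        frac["1"] = ".1667"; frac["2"] = ".3333"; frac["3"] = ".5000"
        frac["4"] = ".6667"; frac["5"] = ".8333"
    }
    {
        # a literal point followed by one digit 1-5 at the end of field 3
        if (match($3, /\.[1-5]$/)) {
            digit = substr($3, RSTART + 1, 1)
            $3 = substr($3, 1, RSTART - 1) frac[digit]
        }
        print $1, $2, $3
    }' "$1"
}
```

`match()` sets `RSTART` to the position of the point, so the whole-fathom part and the feet digit can be sliced out with `substr()` and no second regex pass is needed.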

Post by caveman » Fri Apr 08, 2005 6:44 pm

right - a quick reply afore I'm off to bed - it is now 02h47 where I live

your statement
print $1,$2,$3 > testB
means output to the file named by the variable testB - that variable was never set, so its value is the empty string (hence the "null string value" error)
change to the following
print $1,$2,$3 > "testB"
note the quotes
this means output to a file named testB
or
varoutfile = "testB"
print $1,$2,$3 > varoutfile
this means output to a file named testB as defined in variable varoutfile
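Building on the `varoutfile` form, you can also supply the output file name on the command line with awk's `-v` option, so the script does not hard-code it. This is a minimal sketch, and the wrapper name is hypothetical. awk keeps the file open after the first `print >`, so later records are added to it rather than overwriting it.

```shell
#!/bin/sh
# copy_fields IN OUT  (hypothetical helper)
# -v sets the awk variable varoutfile before any input is read.
copy_fields() {
    awk -v varoutfile="$2" '{ print $1, $2, $3 > varoutfile }' "$1"
}
```

For example: `copy_fields testA testB`.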

Post by worker201 » Fri Apr 08, 2005 7:03 pm

Okay, the script produces output now. I think I can find the rest of the answers on my own - it is matching .3 and 3., which produces some erroneous garbage. Just need to check on my regexp character.

Thanks for your help, Caveman. One of these days, I'm going to write a graphics program in this language. It is so great - nothing else like it!

Post by worker201 » Mon Apr 11, 2005 2:09 pm

Yeah, ran into troubles - the script works...too well. It goes through and makes replacements on replacements, creating a bunch of garbage.

What regexp pattern can I use to match something only at the end of the line? I tried using a newline character, but that doesn't work.

Void Main
Site Admin
Posts: 5716
Joined: Wed Jan 08, 2003 5:24 am
Location: Tuxville, USA

Post by Void Main » Mon Apr 11, 2005 2:18 pm

End of line regex is '$'.

Post by worker201 » Mon Apr 11, 2005 2:34 pm

Yeah, I got that. Turns out that is not my problem. The problem is the decimal point in my pattern - I think it is being evaluated as the regex metacharacter '.', which matches any single character. So if $3 is -74.4, it replaces the 74 instead of the .4. I tried using /\.4$/ as my pattern - I think a backslash escapes the dot. But the results were not what I predicted. Is there some other method for escaping the dot?
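The behaviour described above can be reproduced in a few lines. In a regex, an unescaped `.` matches any single character, so `/.4/` first matches the `74` in `-74.4`. By contrast, `\.` matches only a literal point, and `$` anchors the match to the end of the string. This is a minimal sketch, and the function name is hypothetical.

```shell
#!/bin/sh
# dot_demo  (hypothetical helper): compares the two patterns on one value.
dot_demo() {
    awk 'BEGIN {
        a = "-74.4"; sub(/.4/, ".6667", a)     # unescaped dot matches the 7
        b = "-74.4"; sub(/\.4$/, ".6667", b)   # only the final point-4
        print a
        print b
    }'
}
```

The first line printed is `-.6667.4` (the `74` was replaced), and the second is `-74.6667`. Both the `if` test and the `sub()` call need the escaped pattern, because each one does its own matching.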

Post by worker201 » Mon Apr 11, 2005 3:38 pm

Okay, problem solved. I was escaping the dot in the 'if' statements, but also needed to do so in the 'sub' statements. Here is the working version of the program:

Code:

#! /bin/awk -f
# Column 3 holds depth as fathoms.feet; rewrite the feet digit as a
# fraction of a fathom (1 fathom = 6 feet). Other columns are untouched.
{   if ($3 ~ /\.5$/)
    {   sub(/\.5/, ".8333", $3)
    }
    else if ($3 ~ /\.4$/)
    {   sub(/\.4/, ".6667", $3)
    }
    else if ($3 ~ /\.3$/)
    {   sub(/\.3/, ".5000", $3)
    }
    else if ($3 ~ /\.2$/)
    {   sub(/\.2/, ".3333", $3)
    }
    else if ($3 ~ /\.1$/)
    {   sub(/\.1/, ".1667", $3)
    }
    print $1,$2,$3 > "testB.xyz"
}
Notice a lot of slashes! :D

Okay, so the program takes a command line argument as its input. What if I wanted to take a 2nd command line argument as the output? How do I refer to that in the last line, where I print to the file? I want to be able to use this from the command line, by typing:

Code:

./program inputfile outputfile
at a shell prompt.

Post by worker201 » Tue Apr 12, 2005 1:59 pm

Well, trying to work this out still. I have found, by printing some variables, that ARGV[0] = awk, ARGV[1] = infile, and ARGV[2] = outfile, given two arguments on the command line.

What I don't know is how to get awk to read from ARGV[1] and write to ARGV[2]. Print statements such as:

Code:

print $1,$2,$3 > "ARGV[2]"
print $1,$2,$3 > 'ARGV[2]'
print $1,$2,$3 > ARGV[2]
don't work - they throw the script into some sort of endless loop that creates the output file, but the file is unopenable. I have to ctrl-c the operation.

Any ideas?
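A likely explanation for the endless loop with `print ... > ARGV[2]`: awk treats every non-empty `ARGV` element as an input file. After it finishes the first file, it starts reading `ARGV[2]`, which is the very file it is appending to, and every line it reads writes another one. (The quoted form `"ARGV[2]"` is different: it names a file literally called `ARGV[2]`.) One fix is to save the name in `BEGIN` and then blank `ARGV[2]`, since POSIX awk skips empty `ARGV` elements. This is a minimal sketch written as a shell function so it can run directly. In an `#!/bin/awk -f` script, the same `BEGIN` block would go at the top. The names `convert_file` and `outfile` are hypothetical.

```shell
#!/bin/sh
# convert_file IN OUT  (hypothetical helper)
convert_file() {
    awk 'BEGIN {
        outfile = ARGV[2]   # remember the second argument...
        ARGV[2] = ""        # ...and stop awk from reading it as input
    }
    {
        # (the depth conversion would go here)
        print $1, $2, $3 > outfile
    }' "$1" "$2"
}
```

Alternatively, the program can print to standard output and leave the redirection to the shell: `./program inputfile > outputfile`.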

Post by Void Main » Tue Apr 12, 2005 2:08 pm

I would be happy to do it in Perl for you. :)

Post by worker201 » Tue Apr 12, 2005 2:18 pm

Are you telling me I know more awk than you do?
:D :D :D :D

Post by Void Main » Tue Apr 12, 2005 3:23 pm

Awk is just awkful. :)
