Mombu the Programming Forum

1 19th October 07:03
External User
 
Posts: 1
Default How do you do this in awk?



Say, for example, you have lines on which there's a - let's say -
/word\.[1-9]/

On each line there's one of these, but its location is random.

You want, not the whole line, but just that word.n printed.

Can you do that? How?

2 19th October 07:03
ed morton
External User
 
Posts: 1
Default How do you do this in awk?



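# Store the text matched by regexp in RMATCH ("" if none);
# return the match position, which is 0 (false) when nothing matches.
function extract(str, regexp)
{
    RMATCH = (match(str, regexp) ? substr(str, RSTART, RLENGTH) : "")
    return RSTART
}
# The call serves as the pattern, so RMATCH is printed only on a match.
extract($0, "word\\.[1-9]") { print RMATCH }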

Regards,

Ed.
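
A minimal sketch of how the function above might be run from a shell, assuming a POSIX awk is on the PATH. The wrapper name `extract_word` and the sample lines are made up for illustration; only the awk function and the pattern come from the post.

```sh
# Hypothetical wrapper: print only the word.N token from each input line.
extract_word() {
  awk '
    # Store the matched text in RMATCH; return its start position (0 = no match).
    function extract(str, regexp) {
      RMATCH = (match(str, regexp) ? substr(str, RSTART, RLENGTH) : "")
      return RSTART
    }
    # The function call is the pattern, so the action runs only on a match.
    extract($0, "word\\.[1-9]") { print RMATCH }
  '
}

# Made-up sample input: two lines contain a match, one does not.
printf '%s\n' 'alpha word.3 beta' 'no match here' 'word.7 at the start' | extract_word
```

With this input the sketch should print `word.3` and `word.7`, one per line, and nothing for the middle line.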
3 10th November 03:12
External User
 
Posts: 1
Default Storing just the match in a line, Was: Re: How do you do this in awk?


I'm hitting up against the same thing, but it's not working correctly
for me. My gawk script, running in a WinXP console window (sorry,
nothing I can do about that), has to read a file and find matches with
the pattern "foo.bar" where "foo" is unknown... all I know is the
".bar" part. These matches may exist anywhere in a line, and I want
just "foo.bar" stored to a variable. I haven't yet scripted the part
about storing the substring to the variable, as I haven't yet been able
to verify I have the substring correctly.

I copied the function (extract) exactly. After the BEGIN statement, I have:
====
while ((getline line < filename) > 0) {
#Look for all words that match foo.bar and store them
if (line ~ / *\.bar/ ) {
#print line;
extract(line," *\\.bar") { print RMATCH }
}
}
====
gawk tells me there's a syntax error at the curly brace at the start of
{ print RMATCH }. I know the rest of that piece is working because if
I uncomment "print line" then I do indeed see every line of the file
containing that match. And yes, I must use gawk.

Thanks,
JP
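
The `pattern { action }` form is only valid at the top level of an awk program. Inside an action, such as a BEGIN block with a getline loop, the same test has to be an ordinary `if` statement, which is a likely reason for the syntax error reported above. The sketch below shows that shape under some assumptions: the wrapper name `list_bar_names`, the temporary file, and the stand-in regex `[^ ]+\\.bar` are illustrative only and are not the poster's final code.

```sh
# Hypothetical wrapper: read the file named in $1 and print each foo.bar-style token.
list_bar_names() {
  awk -v filename="$1" '
    function extract(str, regexp) {
      RMATCH = (match(str, regexp) ? substr(str, RSTART, RLENGTH) : "")
      return RSTART
    }
    BEGIN {
      while ((getline line < filename) > 0) {
        # Inside an action, test the return value with if instead of pattern { action }.
        if (extract(line, "[^ ]+\\.bar"))
          print RMATCH
      }
      close(filename)
    }
  '
}

# Made-up sample file
tmp=$(mktemp)
printf '%s\n' 'see foo.bar here' 'nothing on this line' > "$tmp"
list_bar_names "$tmp"
rm -f "$tmp"
```

Because `extract` returns RSTART, the `if` body runs only for lines where a match was found.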
4 10th November 03:12
jon labadie
External User
 
Posts: 1
Default Storing just the match in a line, Was: Re: How do you do this in awk?


Not the cause of your syntax error, but your pattern for matching
is flawed. I presume you mean 'a space' to ensure the start of a
word, followed by '*' to match anything up to the '.bar'.

The problem is that the pattern uses "Regular Expression" syntax
and the asterisk ('*') doesn't mean what you think. In your context
it matches "zero or more spaces" in front of '.bar'. The asterisk
does not stand alone; it is paired with the character in front of
it, so A* is zero or more A's, x* is zero or more x's.

It probably finds what you were looking for just because each '.bar'
in your data was preceded by zero spaces.

I think this would be more like what you wanted:

/ [^ ]*\.bar/

A space followed by zero or more characters that are not spaces
before the '.bar'. If there has to be at least 1 char between
the space and the period, change the asterisk to a plus sign.

However, note this will fail to properly match foo.bar at the
beginning of the line (no space) or after a tab or punctuation
(it matches too much, all the way back to a space).
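
The difference between the two patterns discussed in this post can be checked with a small sketch. The wrapper name `compare_patterns`, the output labels, and the sample lines are assumptions made for illustration; the two regular expressions are the ones from the thread.

```sh
# Hypothetical helper: show what each of the two patterns actually matches.
compare_patterns() {
  awk '
    {
      # Space followed by star: zero or more spaces, then a literal .bar
      if (match($0, / *\.bar/))
        print "space-star: [" substr($0, RSTART, RLENGTH) "]"
      # One space, any run of non-space characters, then a literal .bar
      if (match($0, / [^ ]*\.bar/))
        print "non-space : [" substr($0, RSTART, RLENGTH) "]"
    }
  '
}

# Made-up sample lines: a match in mid-line, then one at the start of the line.
printf '%s\n' 'see foo.bar here' 'foo.bar first' | compare_patterns
```

For the first line the original pattern captures only `.bar`, while the revised one captures ` foo.bar` with its leading space. For the second line the revised pattern prints nothing, which is the start-of-line limitation noted above.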
5 10th November 03:12
External User
 
Posts: 1
Default Storing just the match in a line, Was: Re: How do you do this in awk?


You are correct. [snip]


Since foo.bar will never occur at the start of a line and will only
occur after a space, this is okay, but I will polish up my regexp knowledge.


So now I have:
===
if (line ~ / [^ ]*\.bar/ ) {
print line;
extract(line," [^ ]*\.bar"); { print RMATCH }
}
===
... and it works! Note I had to add a semicolon after the function
call, or else I got that same parse error.

Thank you!
JP
6 10th November 03:13
External User
 
Posts: 1
Default Storing just the match in a line, Was: Re: How do you do this in awk?


This is relevant to the previous discussion in this thread but is
different enough that it's less confusing for me to start from scratch
in this post. The main problem is that the regular expression is not
always finding the correct pattern match.

Background:
I'm using gawk in a Win32 console window, though that may not be
relevant. I need to extract, in this case, the names of .c or .cpp
source files that may appear at random in an ASCII text file.
Fortunately only one instance of a source file name may appear in any
one line.

Provided this function:
====
# Extract a substring that matches the regular expression
function extract(str,regexp)
{ RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
return RSTART
}
====
... I have this snippet of code (note the source file name is always in
quotation marks in the text file):
====
if (line ~ /[^\"]*\.cp*/ ) {
extract(line,"[^\"]+\.cp*");
print "The source file is named " RMATCH;
}
====
... to try to get the quoted name of the source file. When the line is
as follows, the match is "foo.c", which is what I want:
RelativePath="foo.c"

... but when the name of the source file begins with a c, the match is
either:
RelativePath="c
or
RelativePath="cp

... (including all the leading whitespace) depending on whether the
source file is "coo.c" or "coo.cpp".

I do not understand why the file name starting with a "c" will change
the behavior, since the regular expression specifies the match must
have a period before the c. Help, please?

Thank you,
JP
7 10th November 03:13
External User
 
Posts: 1
Default Storing just the match in a line, Was: Re: How do you do this in awk?


I'll add that if I don't call the function, but instead use the guts of
the function directly, the name of the source file is always found
correctly:
====
if (line ~ /[^\"]*\.cp*/ ) {
match(line,/[^\"]*\.cp*/);
temp = substr(line,RSTART,RLENGTH);
print "The source file is named " temp;
}
====
... could this be a bug in gawk, or is there a finer point of the
language that eludes me?

Thanks,
JP
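
One plausible explanation for the difference is escape processing rather than a gawk bug. A regex literal such as `/\./` passes the backslash to the matcher, but a string constant is processed for escapes first, so a string regex needs a doubled backslash (`"\\."`), as in the first reply of this thread. With a single backslash in a string, gawk treats `\.` as a plain `.` (and warns about it), leaving a dot that matches any character. The sketch below compares the three cases; the wrapper name `compare_escapes` and the output labels are illustrative assumptions, and the third pattern spells out the any-character dot directly so that the result does not depend on how a given awk handles `"\."`.

```sh
# Hypothetical helper: compare a regex literal, a correctly escaped string,
# and the pattern that results when the backslash is lost.
compare_escapes() {
  awk '
    {
      # Regex literal: the backslash reaches the matcher, so the dot is literal.
      if (match($0, /[^"]+\.cp*/))
        print "literal : " substr($0, RSTART, RLENGTH)
      # String with a doubled backslash: the string holds \. after escape processing.
      if (match($0, "[^\"]+\\.cp*"))
        print "doubled : " substr($0, RSTART, RLENGTH)
      # Bare dot: matches any character, here the quotation mark itself.
      if (match($0, "[^\"]+.cp*"))
        print "any-char: " substr($0, RSTART, RLENGTH)
    }
  '
}

# Sample line taken from the post
printf '%s\n' 'RelativePath="coo.c"' | compare_escapes
```

The first two calls should report `coo.c`, while the third reports `RelativePath="c`, which is the unwanted result described in the previous post.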
8 10th November 03:13
peter volsted
External User
 
Posts: 1
Default Storing just the match in a line, Was: Re: How do you do this in awk?


hi

In your negated class, include the ".", i.e. [^\".] (the dot needs no
escape inside a bracket expression), and change the quantifier from "*"
to "+", unless you expect lines starting with ".c".

--
good luck

peter
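
A small sketch of the suggested class, written as a regex literal so that no string escaping is involved. The wrapper name `source_name` and the second sample line are assumptions for illustration. One trade-off worth noting: because the class excludes the dot, a file name that itself contains a dot is matched only from its last dot-free segment.

```sh
# Hypothetical helper: print the source file name using the class [^".]+
source_name() {
  awk '
    # Stop at a quote or a dot, require at least one character before .c / .cpp
    match($0, /[^".]+\.cp*/) { print substr($0, RSTART, RLENGTH) }
  '
}

# First line is from the thread; the dotted name is a made-up edge case.
printf '%s\n' 'RelativePath="coo.cpp"' 'RelativePath="my.file.c"' | source_name
```

This should print `coo.cpp` for the first line and `file.c` for the second, so the suggestion fits best when file names contain a single dot.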