A character string matches a hex numeric pattern
Ok, to continue this hilarity and isolate my problem:
(Yikes, this is getting long!)
Below is a tiny AWK script that demonstrates my problem. I think it
boils down to either a basic misunderstanding of the ~ (matches)
operator or of the pattern I'm using. The awk script file is named
test-note2.awk. It has, as comments, the failed pattern-matching
schemes as well; they give false positives.
BEGIN {
    hex_pattern = "[0-9a-f]+"
    printf ("hex_pattern = <%s>\n", hex_pattern)
}
/^[0-9a-f]+/ { #*** WORKS
#$1 ~ hex_pattern { #*** FAILS
#$1 ~ "[0-9a-f]+" { #*** FAILS
#$1 ~ /^[0-9a-f]+/ { #*** FAILS
    printf ("Field 1 <%s> matches; line is <%s>\n", $1, $0)
}
The only scheme that works is using the literal hex pattern as a
condition and anchoring it to the start of the line. So why don't I
just use that? Because this is in a script that has worked nicely for
the past 9 years and I had a reason to use the $1 ~ construct. Now I
discover that this construct gets broken by a new string it has not
encountered before. Yes, I could rewrite the whole script to reverse
the order of states and pattern matches but that would require a whole
big debugging process again.
Bottom line: I simply need to know:
(a) How I can use the ~ operator in an IF statement to test a single
variable for this hex pattern, without the false positives I'm getting.
OR
(b) Perhaps this is not the correct pattern-matching string for a hex
number.
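For what it's worth, one possible reading of the false positives: ~
succeeds when the pattern matches anywhere in the string, so
"[0-9a-f]+" is satisfied by any field that contains at least one hex
digit. A minimal sketch, assuming that is the cause, anchors the
pattern at both ends with ^ and $. The function name check_fields and
the sample fields are made up for illustration; they are not from my
real data.

```sh
# Hypothetical sample fields; only the first two are pure lowercase hex.
check_fields() {
    printf '%s\n' '0040a1f' 'deadbeef' 'feedback' 'note2' 'xyz' |
    awk '
    BEGIN { hex_pattern = "^[0-9a-f]+$" }   # anchored at both ends
    {
        # The ~ operator yields 1 on a match and 0 otherwise.
        loose  = ($1 ~ "[0-9a-f]+")    # matches a hex digit anywhere in $1
        strict = ($1 ~ hex_pattern)    # whole field must be hex digits
        printf("%s unanchored=%d anchored=%d\n", $1, loose, strict)
    }'
}
check_fields
```

With these inputs the unanchored test accepts "feedback" and "note2"
(each contains a hex letter), while the anchored one accepts only
"0040a1f" and "deadbeef".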
In the original version of the script, the hex_pattern was:
"[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]" - 8
hex digits. And the test was something like:
condition {
    if ($1 ~ hex_pattern) {
        printf ("Field 1 <%s> matches; line is <%s>\n", $1, $0)
    }
}
This is what worked for the 9 years, no false positives, no incorrect
negatives. I changed hex_pattern to the "[0-9a-f]+" pattern when I
encountered a situation with only 7 hex digits that broke my 8-digit
code. But now I realize that the IF test is giving false positives.
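If the intent is to accept only 7- or 8-digit values (an assumption on
my part, based on the two cases seen so far), a sketch of the IF test
could pair a fully anchored pattern with a length check, which avoids
relying on interval expressions. The function name check_width and the
sample fields are hypothetical.

```sh
# Hypothetical sample fields: 7 digits, 8 digits, too short, trailing junk.
check_width() {
    printf '%s\n' '040a1f2' '0040a1f2' 'beef' '0040a1f2z' |
    awk '
    BEGIN { hex_pattern = "^[0-9a-f]+$" }   # whole field must be hex digits
    {
        n = length($1)
        # Accept only all-hex fields that are 7 or 8 characters long.
        if (($1 ~ hex_pattern) && n >= 7 && n <= 8) {
            printf("Field 1 <%s> matches; line is <%s>\n", $1, $0)
        }
    }'
}
check_width
```

Only the first two sample fields pass: "beef" is all hex but too
short, and "0040a1f2z" fails the anchored pattern.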
Note: the POSIX [:xdigit:] rejects all strings - it may not be
recognized by gawk. Let's not go there.
(Whew!)
Anyone willing to point me in the right direction?
Thanks mucho.
-- B.N.