Mombu the Programming Forum

1 10th November 03:14
A character string matches a hex numeric pattern



GNU Awk 3.0.3

Greetings.

In an awk script (executed by gawk on my environment) I have a pattern
to recognize when the first field on a line is a hex number:
hex_digit_pattern = "[0-9a-f]"
hex_pattern = hex_digit_pattern "+"

So hex_pattern == "[0-9a-f]+"

My script hummed along until it ran into a line starting with the word
"Note:". Amazingly, this matched the hex pattern, messing up my
script. I kluged around this with an additional check inside the
action for $1 ~ hex_pattern:
if ($1 ~ "[A-Zg-z]+") { next }

My problem is superficially solved. But why did this happen?

Thanks.
-- J.S.


 


2 10th November 03:14
chris f.a. johnson



Probably because you didn't anchor the pattern to the first
character in the line:
/^[0-9a-f]+/

--
Chris F.A. Johnson, author | <http://cfaj.freeshell.org>
Shell Scripting Recipes: | My code in this post, if any,
A Problem-Solution Approach | is released under the
2005, Apress | GNU General Public Licence
3 10th November 03:14
beau nanaz


Actually, Chris, the pattern match I discussed was not a [pattern
{action}] sequence.

Here's the relevant snippet of code:

space_section == 1 {
    if ($1 ~ hex_pattern)
    {
        # The word "Note:" matched the hex_pattern. Patch problem
        # with this extra check:
        #
        if ($1 ~ "[A-Zg-z]+") { next }
        dbspnum = $2              # dbspace number
        high_dbsnum = dbspnum     # Limit my looping at END
        dbspname[dbspnum] = $NF   # Keep the dbspace name as well
    }
}

The input data has about a dozen line layouts in two main sections. If
the line I'm scanning is in the "space_section" state, I check if the
first field matches the hex pattern - that's the if {} code you see up
there. The 3 comment lines and the "if" matching for [A-Zg-z] are the
kluge I described. But it is my contention that I should not have had
to do this, that the string "Note:" should not have matched the
hex_pattern. This is a string that has crept into a newer version of
an Informix utility that produces the input to my script.

As you can see from the fuller code (which I had originally
omitted for brevity. Ha!), anchoring has nothing to do with this
situation. Otherwise, it was a very reasonable suggestion.

Thanks.

-- J.
4 10th November 03:14
chris f.a. johnson


[please don't top post]

I repeat: anchor the pattern to the beginning of the string:

hex_pattern = "^[0-9a-f]+"

--
Chris F.A. Johnson, author | <http://cfaj.freeshell.org>
Shell Scripting Recipes: | My code in this post, if any,
A Problem-Solution Approach | is released under the
2005, Apress | GNU General Public Licence
5 10th November 03:14
beau nanaz


Ok, to continue this hilarity and isolate my problem:

(Yikes, this is getting long!)

Below is a tiny AWK script that demonstrates my problem. I think it
boils down to either a basic misunderstanding of the ~ (matches)
operator or of the pattern I'm using. The awk script file is named
test-note2.awk. It has, as comments, the failed pattern-matching
schemes as well; they give false positives.

BEGIN {
    hex_pattern = "[0-9a-f]+"
    printf ("hex_pattern = <%s>\n", hex_pattern)
}
/^[0-9a-f]+/ { #*** WORKS
#$1 ~ hex_pattern { #*** FAILS
#$1 ~ "[0-9a-f]+" { #*** FAILS
#$1 ~ /^[0-9a-f]+/ { #*** FAILS
    printf ("Field 1 <%s> matches; line is <%s>\n", $1, $0)
}

The only scheme that works is using the literal hex pattern as a
condition and anchoring it to the start of the line. So why don't I
just use that? Because this is in a script that has worked nicely for
the past 9 years and I had a reason to use the $1 ~ construct. Now I
discover that this construct gets broken by a new string it has not
encountered before. Yes, I could rewrite the whole script to reverse
the order of states and pattern matches but that would require a whole
big debugging process again.

Bottom line: I simply need to know:
(a) How I can use the ~ operator in an IF statement to test a single
variable for this hex pattern, without the false positives I'm getting.
OR
(b) Perhaps this is not the correct pattern-matching string for a hex
number.

In the original version of the script, the hex_pattern was:
"[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]" - 8
hex digits. And the test was something like:
condition {
    if ($1 ~ hex_pattern) {
        printf ("Field 1 <%s> matches; line is <%s>\n", $1, $0)
    }
}

This is what worked for the 9 years, no false positives, no incorrect
negatives. I changed hex_pattern to the "[0-9a-f]+" pattern when I
encountered a situation with only 7 hex digits that broke my 8-digit
code. But now I realize that the IF test is giving false positives.

Note: the POSIX [:xdigit:] rejects all strings - it may not be
recognized by gawk. Let's not go there.

(Whew!)

Anyone willing to point me in the right direction?

Thanks mucho.

-- B.N.
6 10th November 03:14
ed morton


a) Please stop top-posting.

b) Chris already told you the solution twice. The only thing you may
additionally want to do is anchor the end of the string, e.g.

hex_pattern = "^[0-9a-f]+$"

c) [:xdigit:] would work too, i.e.:

hex_pattern = "^[[:xdigit:]]+$"

In either case, you'd use:

$1 ~ hex_pattern { print }

If the above isn't working for you, post a small sample input set and
expected output plus exactly the small script you're running. I cut out
the rest of your posting as all the top-posting makes it hard to read
and if the posted solution isn't working then it's because the
preceding postings don't clearly explain the problem anyway.

Regards,

Ed.
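
For the if-statement case inside a state, a minimal sketch follows. It
assumes a POSIX shell with an awk on the PATH. The section headers
"Dbspaces" and "Chunks", the function name extract_dbspaces, and the
sample lines are hypothetical stand-ins, since the real Informix output
was not posted; only the anchored hex_pattern and the $2 / $NF handling
come from the thread.

```sh
# Sketch only: section headers and sample rows are invented for illustration.
extract_dbspaces() {
    awk '
    BEGIN { hex_pattern = "^[0-9a-f]+$" }      # anchored at both ends

    /^Dbspaces/ { space_section = 1; next }    # hypothetical start of section
    /^Chunks/   { space_section = 0; next }    # hypothetical end of section

    space_section == 1 {
        if ($1 ~ hex_pattern) {                # "Note:" no longer matches
            dbspnum = $2                       # dbspace number
            dbspname[dbspnum] = $NF            # keep the dbspace name as well
            if (dbspnum + 0 > high_dbsnum)
                high_dbsnum = dbspnum + 0      # limit the loop at END
        }
    }

    END {
        for (i = 1; i <= high_dbsnum; i++)
            if (i in dbspname)
                print i, dbspname[i]
    }'
}

sample='Dbspaces
a1b2c3d4 1 rootdbs
a1b2c7e 2 datadbs
Note: 2 is not a dbspace line
Chunks
deadbeef 9 chunk1'

printf '%s\n' "$sample" | extract_dbspaces
# prints:
# 1 rootdbs
# 2 datadbs
```

With the unanchored "[0-9a-f]+" the "Note:" row in this sample would be
accepted and would overwrite the name stored for dbspace 2.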
7 10th November 03:14
anton treuenfels


/[0-9a-f]+/ matches the "e" in "Note"

/^[0-9a-f]+/ matches only strings beginning with legal hex digits, but does
not guarantee those strings also end with them.

/^[0-9a-f]+$/ matches strings containing only legal hex digits.

/[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]+/ matches strings
containing seven or more consecutive hex digits, although they may contain
other characters as well.

/[0-9a-f]{7,}/ is a more compact way of specifying the same thing (gawk 3.x
recognizes interval expressions only with the --re-interval or --posix
option).

/^[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]?$/
matches only strings of seven or eight consecutive hex digits.

/^[0-9a-f]{7,8}$/ is a more compact way of specifying the same thing.

- Anton Treuenfels
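
The difference between the first three patterns can be checked with a
small sketch like the one below. It assumes a POSIX shell with an awk on
the PATH; the function name classify and every sample field except
"Note:" are hypothetical.

```sh
# Sketch only: compare unanchored, start-anchored, and fully anchored tests.
classify() {
    printf '%s\n' "$@" | awk '
    {
        loose  = ($1 ~ "[0-9a-f]+")   ? "match" : "no"   # hex digit anywhere
        start  = ($1 ~ "^[0-9a-f]+")  ? "match" : "no"   # begins with hex digit
        strict = ($1 ~ "^[0-9a-f]+$") ? "match" : "no"   # hex digits only
        printf "%s loose=%s start=%s strict=%s\n", $1, loose, start, strict
    }'
}

classify Note: 1a2b3c4 face: 00ff12ab
# prints:
# Note: loose=match start=no strict=no
# 1a2b3c4 loose=match start=match strict=match
# face: loose=match start=match strict=no
# 00ff12ab loose=match start=match strict=match
```

The "face:" row shows why the trailing $ matters: anchoring only the
start still accepts a field that merely begins with hex digits.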
8 10th November 03:14
beau nanaz


(OK, sorry about the top posting. My last remark is at end.)


Anton,
Prostrating myself in grateful humility: _/()\o_

The setting: hex_pattern = "^[0-9a-f]+$" did the trick. No need to add
extra checks. No repeated literals.

Thanks so much!

-- B.N.