2
24th February 06:25
External User
Posts: 1
|
Try this:

    grep -E '[ {$][A-Z]+[ }$]' file

or

    grep '[ {$][A-Z][A-Z]*[ }$]' file

Chuck Demas

--
Eat Healthy, Stay Fit, Die Anyway
Nothing would be done at all,
If a man waited to do it so well,
That no one could find fault with it.
demas@theworld.com
http://world.std.com/~cpd
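
A note on the bracket expressions used in this post: inside `[...]` a `$` is a literal dollar sign, not an end-of-line anchor, so the patterns as posted only match an upper-case run that has a space, brace, or dollar sign on both sides. The sketch below compares the posted pattern with a variant that anchors the line edges explicitly. The sample text and the variable names are made up for illustration, and `LC_ALL=C` is set on the assumption that `A-Z` should mean ASCII upper case only.

```shell
# Force the C locale so that A-Z covers ASCII upper-case letters only.
export LC_ALL=C

# Made-up sample: acronyms at the start, in the middle, and at the end of a line.
sample=$(printf '%s\n' 'NASA launched it' 'the {GNU} project' 'no caps here' 'written in AWK')

# Pattern as posted: the $ inside [...] matches a literal dollar sign.
posted=$(printf '%s\n' "$sample" | grep -E '[ {$][A-Z]+[ }$]')

# Variant: alternation with ^ and $ so acronyms at the line edges also match.
anchored=$(printf '%s\n' "$sample" | grep -E '(^|[ {])[A-Z]+([ }]|$)')

printf 'posted:\n%s\n\nanchored:\n%s\n' "$posted" "$anchored"
```

On this sample the posted pattern selects only the `{GNU}` line, while the anchored variant also selects the lines that begin with `NASA` and end with `AWK`. Both still accept a single capital letter, since `[A-Z]+` needs only one.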
|
|
4
26th February 13:39
External User
Posts: 1
|
% Would e.g.
%
%   grep '[ {$][A-Z].[ }$]' file
%
% suffice? Unfortunately, it does not work for me.

Suffice for what? Ah, I see, the first part of the question is way up there in the subject line. I hate that.

If this were a grep newsgroup, I might suggest

    grep '\<[A-Z][A-Z][A-Z]*\>' file

which would extract lines containing acronyms. Since this isn't a grep newsgroup and you say you want the acronyms themselves, I will propose an awk solution.

My suggestion is to first identify lines with strings of upper-case letters, then test each field to see if it's an acronym. Put everything but letters and maybe dashes into the field separator. If you don't want duplicates, you can store the acronyms in an array and spit them out at the end.

    BEGIN { FS = "[^A-Za-z-]+" }
    END { for (nym in acro) printf "%-10s %5d\n", nym, acro[nym] }
    /[A-Z][A-Z]+/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[A-Z][A-Z]+$/) acro[$i]++
            # might want to do extra processing here to deal with - used as
            # punctuation rather than composition
        }
    }

If accented letters are important to this problem, you should use the POSIX [:upper:] instead of A-Z (for gawk users, this requires either the `-W posix' or `-W re-interval' switch):

    BEGIN { FS = "[^[:upper:]-]+" }
    END { for (nym in acro) printf "%-10s %5d\n", nym, acro[nym] }
    /[[:upper:]]{2,}/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[[:upper:]]{2,}$/) acro[$i]++
            # might want to do extra processing here to deal with - used as
            # punctuation rather than composition
        }
    }

--
Patrick TJ McPhee
East York Canada
ptjm@interlog.com
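
For anyone who wants to try the first awk program, here is a minimal sketch that runs it on a few made-up lines. The sample text and the `sample` and `counts` variables are assumptions for illustration. The output is piped through `sort` because awk does not guarantee any order for `for (nym in acro)`.

```shell
# Force the C locale so that A-Z covers ASCII letters only.
export LC_ALL=C

# Made-up sample: GNU appears twice; X-RAY exercises the dash kept out of FS.
sample=$(printf '%s\n' \
    'The GNU project ships GNU awk.' \
    'POSIX defines awk; NASA does not.' \
    'An X-RAY is not an acronym.')

# Count every field that consists of two or more upper-case letters.
counts=$(printf '%s\n' "$sample" | awk '
BEGIN { FS = "[^A-Za-z-]+" }
/[A-Z][A-Z]+/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[A-Z][A-Z]+$/) acro[$i]++
}
END { for (nym in acro) printf "%-10s %5d\n", nym, acro[nym] }
' | sort)

printf '%s\n' "$counts"
```

With this input the table lists GNU with a count of 2 and NASA and POSIX with 1 each. `X-RAY` stays a single field because the dash is not a separator, so it fails the all-upper-case test and is skipped, which is the case the comment about dashes in the program refers to.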
|
|
7
26th February 13:39
External User
Posts: 1
|
% BEGIN { FS = "[^[:upper:]-]+" }

But intended to write

    BEGIN { FS = "[^[:alpha:]-]+" }

--
Patrick TJ McPhee
East York Canada
ptjm@interlog.com
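
One reading of why the correction matters: with `[:upper:]` in the field separator, lower-case letters become separators too, so the upper-case prefix of a mixed-case word is split off and looks like an acronym. The sketch below compares the two separators on a made-up line. The `list_acronyms` helper and the sample are hypothetical, and the script assumes an awk that understands POSIX character classes (it avoids interval expressions so no extra gawk switch is needed).

```shell
# Force the C locale; [:upper:] and [:alpha:] then cover ASCII letters only.
export LC_ALL=C

# Made-up sample: "GNUstep" is a mixed-case word, "GNU" a genuine acronym.
sample='GNUstep builds on GNU tools'

# Hypothetical helper: print each all-upper-case field for a given separator.
list_acronyms() {
    printf '%s\n' "$sample" | awk -v FS="$1" '
    { for (i = 1; i <= NF; i++) if ($i ~ /^[[:upper:]][[:upper:]]+$/) print $i }'
}

with_upper=$(list_acronyms '[^[:upper:]-]+')   # separator as first posted
with_alpha=$(list_acronyms '[^[:alpha:]-]+')   # separator as corrected

printf 'upper FS:\n%s\nalpha FS:\n%s\n' "$with_upper" "$with_alpha"
```

With the first separator the line yields GNU twice, once from `GNUstep`; with the corrected one only the standalone GNU is reported.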
|