Mombu the Programming Forum
1 24th February 06:25
till halbach
Extracting upper-case acronyms from a text



Would e.g.

grep '[ {$][A-Z].[ }$]' file

suffice? Unfortunately, it does not work for me.

TIA,
Till


2 24th February 06:25
demas



Try this:

grep -E '[ {$][A-Z]+[ }$]' file

or

grep '[ {$][A-Z][A-Z]*[ }$]' file
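Inside a bracket expression, `$` and `{` are ordinary characters, so `[ {$]` matches a space, an opening brace or a dollar sign, never the end of a line, and an acronym at the very start or end of a line is missed. A minimal sketch of the same line filter with real anchors, assuming a POSIX `grep -E` (the function name `lines_with_acronyms` is hypothetical):

```shell
# Hypothetical helper: print lines containing a run of two or more
# capitals that stands alone, i.e. is bounded by a non-letter or by
# the start/end of the line.
lines_with_acronyms() {
    # (^|[^A-Za-z]) : start of line, or a non-letter before the run
    # ([^A-Za-z]|$) : a non-letter after the run, or end of line
    LC_ALL=C grep -E '(^|[^A-Za-z])[A-Z]{2,}([^A-Za-z]|$)' "$@"
}

printf 'NASA launched it\nnothing here\nas did ESA\nNot AnAcronym\n' |
    lines_with_acronyms
```

The alternation `(^|...)` is the usual ERE way to say "start of line or some character", since an anchor cannot go inside brackets.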


Chuck Demas

--
Eat Healthy | _ _ | Nothing would be done at all,
Stay Fit | @ @ | If a man waited to do it so well,
Die Anyway | v | That no one could find fault with it.
demas@theworld.com | \___/ | http://world.std.com/~cpd
3 24th February 06:26
doug mcclure


{
    # You probably don't want single-letter acronyms!
    while (match($0, /[A-Z][A-Z]+/))
    {
        acronym = substr($0, RSTART, RLENGTH)
        print acronym
        # continue scanning the rest of the line after this match
        $0 = substr($0, RSTART+RLENGTH)
    }
}
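Run as a complete command, the `match()` loop might look like the sketch below; the wrapper name `extract_acronyms` is hypothetical, and `LC_ALL=C` is set so that `A-Z` means only the ASCII capitals. Like the loop it wraps, it does not check word boundaries, so the capitals inside a word such as `iPHONE` are reported too.

```shell
# Hypothetical wrapper around the match()/substr() loop: prints every
# run of two or more capitals, one per line, in the order found.
extract_acronyms() {
    LC_ALL=C awk '{
        # match() sets RSTART/RLENGTH to the leftmost-longest run of capitals
        while (match($0, /[A-Z][A-Z]+/)) {
            print substr($0, RSTART, RLENGTH)
            # drop everything up to the end of the match and keep scanning
            $0 = substr($0, RSTART + RLENGTH)
        }
    }' "$@"
}

printf 'The FBI and the CIA met at HQ.\nA quiet day\n' | extract_acronyms
```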

DKM


To contact me directly, send EMAIL to (single letters all)
DEE KAY EMM AT CEE TEE ESS D0T CEE OH EMM
4 26th February 13:39
ptjm


% Would e.g.
%
% grep '[ {$][A-Z].[ }$]' file
%
% suffice? Unfortunately, it does not work for me.

Suffice for what? Ah, I see, the first part of the question is way
up there in the subject line. I hate that.

If this were a grep newsgroup, I might suggest
grep '\<[A-Z][A-Z][A-Z]*\>' file

which would extract lines containing acronyms. Since this isn't a grep
newsgroup and you say you want the acronyms themselves, I will propose
an awk solution.
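If GNU grep is at hand, its `-o` option (an extension, not POSIX) prints each match on its own line instead of the whole line, which gets the acronyms themselves from the same pattern. A sketch, assuming GNU grep for both `-o` and the `\<`/`\>` word anchors; the function name `acronyms_only` is hypothetical:

```shell
# Hypothetical helper; -o and the \< \> word anchors are GNU grep
# extensions, not POSIX.
acronyms_only() {
    LC_ALL=C grep -o '\<[A-Z][A-Z][A-Z]*\>' "$@"
}

printf 'Send the PDF via FTP, not HTTP.\nSee the README.\n' | acronyms_only
```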

My suggestion is to first identify lines with strings of upper-case
letters, then test each field to see if it's an acronym. Put everything
but letters and maybe dashes into the field separator. If you don't want
duplicates, you can store the acronyms in an array and spit them out at
the end.

BEGIN { FS = "[^A-Za-z-]+" }

END { for (nym in acro) printf "%-10s %5d\n", nym, acro[nym] }

/[A-Z][A-Z]+/ {
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^[A-Z][A-Z]+$/)
            acro[$i]++
        # might want to do extra processing here to deal with - used as
        # punctuation rather than composition
    }
}
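The order of a `for (nym in acro)` loop is unspecified, so piping the report through `sort` gives repeatable output. A minimal sketch that runs the counting program on a sample text; the function name `count_acronyms` is hypothetical, and `LC_ALL=C` is an assumption so that `A-Z` means only the ASCII capitals. Note that `HP-UX` is not counted, because the dash stays inside the field: exactly the case the comment in the program warns about.

```shell
# Hypothetical wrapper around the field-splitting program:
# counts each acronym and sorts the report by name.
count_acronyms() {
    LC_ALL=C awk '
        BEGIN { FS = "[^A-Za-z-]+" }      # letters and dashes form words
        /[A-Z][A-Z]+/ {                    # only lines with 2+ capitals
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[A-Z][A-Z]+$/)  # whole field is capitals
                    acro[$i]++
        }
        END { for (nym in acro) printf "%-10s %5d\n", nym, acro[nym] }
    ' "$@" | LC_ALL=C sort
}

printf 'IBM and HP sold PCs.\nIBM again, then HP-UX.\n' | count_acronyms
```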

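If accented letters are important to this problem, you should use the
POSIX [:upper:] class instead of A-Z, since [:upper:] follows the current
locale. The {2,} interval expressions below require either the `-W posix'
or `-W re-interval' switch in gawk releases before 4.0; newer gawk
accepts them by default: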

BEGIN { FS = "[^[:upper:]-]+" }

END { for (nym in acro) printf "%-10s %5d\n", nym, acro[nym] }

/[[:upper:]]{2,}/ {
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^[[:upper:]]{2,}$/)
            acro[$i]++
        # might want to do extra processing here to deal with - used as
        # punctuation rather than composition
    }
}
--

Patrick TJ McPhee
East York Canada
ptjm@interlog.com
5 26th February 13:39
chris f.a. johnson


tr -c '[A-Z]' '\n' < file | grep '[A-Z][A-Z]'
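`tr -c` complements the set, so every character that is not a capital becomes a newline and each run of capitals lands on a line of its own; `grep` then keeps the runs of two or more. A sketch of the same pipeline that also squeezes repeated newlines (`-s`) and counts duplicates with `sort | uniq -c`; the counting step and the function name `count_with_tr` are additions for illustration. It uses the plain range `A-Z`, because a POSIX `tr` takes the brackets in `'[A-Z]'` literally, so the one-liner above also keeps `[` and `]`.

```shell
# Hypothetical helper built on the tr pipeline: one acronym per line,
# then a count of how often each occurs. Reads standard input.
count_with_tr() {
    # -c: complement the set (everything except A-Z)
    # -s: squeeze the resulting runs of newlines into one
    LC_ALL=C tr -cs 'A-Z' '\n' |
        LC_ALL=C grep '[A-Z][A-Z]' |
        LC_ALL=C sort | uniq -c
}

printf 'USA, UK and USA again. I agree.\n' | count_with_tr
```

Used on a file, this would be `count_with_tr < file`.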

--
Chris F.A. Johnson http://cfaj.freeshell.org
===================================================================
My code (if any) in this post is copyright 2003, Chris F.A. Johnson
and may be copied under the terms of the GNU General Public License
6 26th February 13:39
andreas kahari


Assuming POSIX awk:

awk 'BEGIN { RS=" " } /^[[:upper:]]+$/ { print }' file
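With `RS=" "` a record ends only at a space, so a newline stays inside the record (the last word of one line and the first word of the next form a single record), trailing punctuation as in `NASA,` defeats the `^...$` anchors, and `+` also accepts single letters such as `I`. On the sample below, the one-liner prints only `I`. A sketch of the same idea that walks the default whitespace-separated fields instead, trims punctuation, and requires two or more capitals; the function name `print_acronyms` is hypothetical, and the ASCII range `A-Z` stands in for `[[:upper:]]` (an assumption, so it also runs under awks without POSIX classes):

```shell
# Hypothetical variant of the one-liner: split on any whitespace
# (including newlines) and trim punctuation before testing each word.
print_acronyms() {
    LC_ALL=C awk '{
        for (i = 1; i <= NF; i++) {
            w = $i
            sub(/^[^A-Za-z]+/, "", w)     # strip leading punctuation
            sub(/[^A-Za-z]+$/, "", w)     # strip trailing punctuation
            if (w ~ /^[A-Z][A-Z]+$/)      # two or more capitals only
                print w
        }
    }' "$@"
}

printf 'I saw NASA,\nthen (ESA) and the UN.\n' | print_acronyms
```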

--
Andreas Kähäri
7 26th February 13:39
ptjm


% BEGIN { FS = "[^[:upper:]-]+" }

But I intended to write

BEGIN { FS = "[^[:alpha:]-]+" }
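The difference presumably matters for words that mix cases: when only capitals are kept out of the separator, lower-case letters split words, so a plural like `NASAs` yields a field `NASA` that is counted; when all letters are kept out, `NASAs` stays one field and the `^...$` test rejects it. A small comparison sketch, using the ASCII ranges `A-Z` and `A-Za-z` in place of `[:upper:]` and `[:alpha:]` (an assumption, so it runs under awks without POSIX classes); the function name `acronyms_with_fs` is hypothetical:

```shell
# Hypothetical helper: list the acronyms found with a given field separator.
acronyms_with_fs() {
    LC_ALL=C awk -v fs="$1" '
        BEGIN { FS = fs }
        { for (i = 1; i <= NF; i++) if ($i ~ /^[A-Z][A-Z]+$/) print $i }
    '
}

text='The NASAs and the IBM team'
# Separator excludes only capitals: lower-case letters split words.
printf '%s\n' "$text" | acronyms_with_fs '[^A-Z-]+'
# Separator excludes all letters: "NASAs" stays one field and is rejected.
printf '%s\n' "$text" | acronyms_with_fs '[^A-Za-z-]+'
```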


--

Patrick TJ McPhee
East York Canada
ptjm@interlog.com