2
18th September 20:47
External User
Posts: 1
|
I've attached a Ruby solution below. It takes 2.593646 seconds on a
2.33 GHz Intel Core 2 Duo (although only one core is used). This is an
interpreted speed. There are at least four highly active parallel
efforts to create a compiled Ruby underway as I type this. They might
produce faster times.

My code not only keeps track of the unique words, it counts how many
times each appears in the text. Then, after the timing result is
produced, the code outputs information about the most frequent word(s)
(which happens to be "the") and the least frequent words (4004 appear
only once, from "abaddon" to "zuzims"). But again, that output does not
affect the timing.

Also, since I wanted to be able to access the data from the web
directly, there's a little bit of code to allow it to skip the
non-included material at the top and bottom of the file.

Eric

====

Are you interested in on-site Ruby training that uses well-designed,
real-world, hands-on exercises? http://LearnRuby.com

========

# Reads a file containing the text of the Bible and, after processing
# the data slightly, prints out how many total words and how many
# unique words it contained. See http://tinyurl.com/354hry for the
# full problem description.

# This solution is offered by LearnRuby.com (http://learnruby.com).

# If there is a file named "kjv12.txt" in the current directory, the
# data will be read from that file. Otherwise, the data will be read
# from the URI "http://patriot.net/users/bmcgin/kjv12.txt".

start_time = Time.now

Bible_Filename = "kjv12.txt"
Bible_URI = "http://patriot.net/users/bmcgin/kjv12.txt"

input = begin
          open Bible_Filename
        rescue
          require 'open-uri'
          puts "NOTE: time taken is invalid since it includes web access\n\n"
          open Bible_URI
        end

state = :skip_top
word_count = 0
words_seen = Hash.new(0)

input.each_line do |line|
  state = :process if state == :skip_top && line =~ /Book\s+01\s+Genesis/
  next unless state == :process
  state = :skip_bottom if line =~ /022:021.*Amen\./

  # remove apostrophe between letters
  mod_line = line.gsub(/([[:alpha:]])'([[:alpha:]])/, '\1\2')

  # convert sequences of non-letters to single spaces, remove white
  # space at either end, and convert letters to lower case
  mod_line.gsub!(/[^[:alpha:]]+/, ' ').strip!.downcase!

  words = mod_line.split
  word_count += words.size
  words.each { |word| words_seen[word] += 1 }
end

input.close

puts "Number of words: %d" % word_count
puts "Number of unique words: %d" % words_seen.size

end_time = Time.now
puts "Time taken to compute: %f seconds" % (end_time - start_time)

#
# Extra information, just for the fun of it...
#

# figure out the counts for the most and least frequent words
word_counts = words_seen.values
top_word_count = word_counts.max
bottom_word_count = word_counts.min

# put together a list of the most frequent word(s) and the least
# frequent word(s)
top_words =
  words_seen.select { |word, count| count == top_word_count }.
    map { |e| e[0] }
bottom_words =
  words_seen.select { |word, count| count == bottom_word_count }.
    map { |e| e[0] }

# output information about the most and least frequent words
puts("\nThe following %d most frequent word(s) each appeared %d time(s):" %
     [top_words.size, top_word_count])
puts top_words.sort.join("\n").gsub(/^/, ' ')

puts "\nThe following %d least frequent word(s) each appeared %d time(s):" %
  [bottom_words.size, bottom_word_count]
puts bottom_words.sort.join("\n").gsub(/^/, ' ')

====
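For anyone who wants to try the normalization and counting rules without downloading the Bible text, here is a minimal sketch that applies the same three steps (drop apostrophes between letters, turn runs of non-letters into a space, lower-case) to two sample lines. It is not part of the timed program: the helper name `count_words` and the sample text are hypothetical, and it uses the non-bang string methods on the assumption that a line might have nothing to substitute or strip.

```ruby
# Hypothetical helper (not from the original post): counts words in an
# array of lines using the same normalization rules as the script above.
def count_words(lines)
  words_seen = Hash.new(0)
  lines.each do |line|
    # remove apostrophe between letters, e.g. "God's" -> "Gods"
    mod_line = line.gsub(/([[:alpha:]])'([[:alpha:]])/, '\1\2')
    # runs of non-letters become one space; trim the ends; lower-case
    mod_line = mod_line.gsub(/[^[:alpha:]]+/, ' ').strip.downcase
    mod_line.split.each { |word| words_seen[word] += 1 }
  end
  words_seen
end

# Made-up sample input, only loosely modelled on the real file.
sample = ["In the beginning God created the heaven and the earth.\n",
          "And God's Spirit moved; the END"]

counts = count_words(sample)
total = counts.values.sum

puts "Number of words: %d" % total
puts "Number of unique words: %d" % counts.size
```

Note that the second sample line has no trailing newline and ends in a letter, which is the kind of input where a chained `strip!` could return nil.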
|
|
|
|
4
18th September 20:50
External User
Posts: 1
|
<snip source>
Congratulations,
very interesting translation of my
http://home.earthlink.net/~dave_gemini/wc.f90 fortran source
that NO-ONE expected to see after almost 3yrs from the original challenge.

I note you have removed any benchmark timing, surely the code will beat
Eric's RUBY version = 2.6 sec ??

Prove that PL/I "once upon a time" supported EASY distribution of a windows
exe program by making your exe available to run on our PCs, even tho I
sense that no longer is supported with the "web-sphere PL/I system"

OTOH, I can EASILY make my windows exe (1 file) available on request for
anyone's use, and note that it will process ANY text file

Perhaps someone will confirm your source is valid for their compiler..
|
|
5
18th September 20:50
External User
Posts: 1
|
|> This is a translation to PL/I of DF's Fortran code.
|> Like that code, it's ASCII specific.
| <snip source>
| Congratulations,
| very interesting translation of my
| http://home.earthlink.net/~dave_gemini/wc.f90 fortran source
| that NO-ONE expected to see after almost 3yrs from the original challenge.
|
| > .......
| > 3 zuph
| > 5 zur
| > 1 zuriel
| > 5 zurishaddai
| > 1 zuzims
| > total words = 789781
| > unique words = 12691
| > COLLISIONS= 4318;
| > */
|
| I note you have removed any benchmark timing, surely the code will beat
| Eric's RUBY version = 2.6 sec ??

Benchmark timings are meaningless unless all the benchmarks are run on
the same CPU, same operating system, same hard drive(s), etc.
Say I ran some code on my computer, and it ran in 2.2 seconds.
So what? I'd have to run the Fortran code (with a particular
Fortran compiler) on that same computer also.

I also wish you would write the code to not be ASCII dependent.
______________________________________________________ Gerard S.

| Prove that PL/I "once upon a time" supported EASY distribution of a windows
| exe program by
| making your exe available to run on our PCs, even tho I sense that no
| longer is supported with the
| "web-sphere PL/I system"
|
| OTOH, I can EASILY make my windows exe (1 file) available on request for
| anyone's use, and note that it will process
| ANY text file
|
| Perhaps someone will confirm your source is valid for their compiler..
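To make the same-machine point concrete, here is a rough sketch of timing two candidates back to back on one computer. Everything in it is illustrative: `time_best_of` is a hypothetical helper, and the two lambdas are stand-ins for the programs being compared, not anyone's posted code. It takes the fastest of several runs using Ruby's monotonic clock.

```ruby
# Hypothetical helper: runs the block `runs` times and returns the
# fastest wall-clock time in seconds, measured with a monotonic clock.
def time_best_of(runs)
  best = Float::INFINITY
  runs.times do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    best = elapsed if elapsed < best
  end
  best
end

# Made-up input so both candidates see exactly the same data.
text = "the quick brown fox jumps over the lazy dog " * 10_000

# Two stand-in word counters to compare under identical conditions.
hash_count = lambda do
  text.split.each_with_object(Hash.new(0)) { |word, seen| seen[word] += 1 }
end
uniq_count = lambda { text.split.uniq.size }

puts "hash counter: %f seconds" % time_best_of(5, &hash_count)
puts "uniq counter: %f seconds" % time_best_of(5, &uniq_count)
```

The absolute numbers still say nothing about another machine; only the ratio between the two lines printed on the same box is worth quoting.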
|