scRUBYt!
WWW::Mechanize and Hpricot on Steroids

Briefly...
scRUBYt! is a simple to learn and use, yet powerful web scraping toolkit written in Ruby. The idea behind making scRUBYt! was to show a few simple concepts of Web extraction as a practical extension of this tutorial.
September 28th, 2007 at 9:28 am
Very nice to see a new release.
Keep up the good work
It is sad, though, that there is no Windows version.
September 28th, 2007 at 11:22 am
Sweet! Glad to hear you guys are working hard. Scrubyt makes my life easier.
September 29th, 2007 at 2:57 pm
[…] scRUBYt 0.3.4 - a hot new release is out. […]
October 2nd, 2007 at 7:30 pm
NOOB question
pattern lamda {|x| x.gsub('x','y').downcase}
can you please explain better how to use this?
what is lamda or lambda?
how about a real life example on how to use this?
October 2nd, 2007 at 7:48 pm
In 0.3.4, the Rakefile has s.add_dependency('RubyInline', '= 3.6.3').
Why isn't the dependency s.add_dependency('RubyInline', '>= 3.6.3') or s.add_dependency('RubyInline', '= 3.6.4')?
October 2nd, 2007 at 11:41 pm
seofacile,
Thanks for the typo correction. It should be lambda of course.
lambda is a synonym for Kernel::proc (as of Ruby 1.8) - check out the definition here.
lambda/proc is used when you would like to pass a block as a parameter to a function - like in this case. A script pattern needs a block as a parameter so it can execute it.
The block’s parameter (x in the above example) is the output of the parent pattern (it always works like this in scRUBYt! - if you have a pattern ‘book’, which has a child ‘author’, then author’s input is book’s output). Its output is the result of executing the block on that parameter.
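To make the input/output flow concrete, here is a minimal sketch in plain Ruby with no scRUBYt! involved. The helper run_script_pattern and the sample string are hypothetical stand-ins. They only imitate the idea that a script pattern receives its parent's output and returns the result of the block, and they say nothing about how scRUBYt! implements it internally.

```ruby
# Hypothetical stand-in for a script pattern: it receives the parent's
# output and a lambda, and returns whatever the lambda produces.
def run_script_pattern(parent_output, script)
  script.call(parent_output)
end

# The block from the question above, with the lambda spelling corrected.
replace_and_downcase = lambda { |x| x.gsub('x', 'y').downcase }

# Pretend the parent pattern extracted this string (made-up sample data).
parent_output = 'Xerox BOX'

result = run_script_pattern(parent_output, replace_and_downcase)
puts result  # prints "xeroy box"
```

Note that gsub('x', 'y') only replaces the lowercase x, and downcase is applied afterwards, which is why the leading X survives as a lowercase x.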
Example: let’s take the classic google example. Let’s say you are interested in the sites (and not the full URL) of each result.
    require 'rubygems'
    require 'scrubyt'

    google_data = Scrubyt::Extractor.define do
      fetch 'http://www.google.com/ncr'
      fill_textfield 'q', 'ruby'
      submit

      link "Ruby Programming Language/@href" do
        site lambda {|x| x.scan(/.+\.(.+\..+?)\//)[0][0]}, :type => :script
      end
      next_page "Next", :limit => 2
    end

    puts google_data.to_xml

Of course this example is a bit contrived, since you can do the same in Ruby after the extraction ends (even if I think it is cleaner and more concise like this) - script patterns help a lot when used in the midst of the extraction.
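The block inside the site pattern can be tried on its own, outside of any extractor. The sketch below assumes the regular expression was meant to have backslash-escaped dots and a backslash-escaped trailing slash, and the two URLs are made-up sample inputs rather than real Google results.

```ruby
# The same lambda as in the 'site' script pattern, assuming the dots and
# the trailing slash in the regular expression were meant to be escaped.
extract_site = lambda { |x| x.scan(/.+\.(.+\..+?)\//)[0][0] }

# Hypothetical hrefs, standing in for the output of the parent 'link' pattern.
urls = [
  'http://www.ruby-lang.org/en/',
  'http://www.rubyonrails.org/'
]

sites = urls.map { |url| extract_site.call(url) }
puts sites
# prints:
# ruby-lang.org
# rubyonrails.org
```

The expression relies on the URL containing at least two dots before the first slash of the path, so an href such as http://example.org/ would make scan return an empty array and [0][0] would raise a NoMethodError. A more defensive block would check for that case first.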
October 3rd, 2007 at 12:04 am
hoge: Well, because (AFAIK) it needs exactly those versions…
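For readers unsure what the difference between '= 3.6.3' and '>= 3.6.3' means in practice, the following sketch uses RubyGems' own Gem::Requirement and Gem::Version classes to check which versions each constraint accepts. It only illustrates the constraint syntax. Whether scRUBYt! really breaks with other RubyInline versions is not something this example can show.

```ruby
require 'rubygems'

# The two styles of constraint discussed in the comments above.
exact    = Gem::Requirement.new('= 3.6.3')
at_least = Gem::Requirement.new('>= 3.6.3')

pinned = Gem::Version.new('3.6.3')
newer  = Gem::Version.new('3.6.4')

puts exact.satisfied_by?(pinned)    # prints "true"
puts exact.satisfied_by?(newer)     # prints "false"
puts at_least.satisfied_by?(newer)  # prints "true"
```

With the exact constraint, having only RubyInline 3.6.4 installed does not satisfy the dependency, which is the behaviour the question is asking about.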