scRUBYt!
WWW::Mechanize and Hpricot on Steroids

Briefly...
scRUBYt! is a simple to learn and use, yet powerful web scraping toolkit written in Ruby. The idea behind making scRUBYt! was to show a few simple concepts of Web extraction as a practical extension of this tutorial.
March 8th, 2007 at 6:51 pm
How can I display results in a non-Latin language, such as Chinese, correctly? The results parsed from Google China are all garbled. Thank you!
March 9th, 2007 at 12:42 am
ant21,
Well, frankly I have no idea.
The problem is that you are the first to ask for non-Latin support, so I have not even thought about this (rather obvious) problem yet. This feature should be added ASAP, though I have no experience with encoding in Ruby and I hear it is a real PITA, so it may take some time until this gets done…
March 20th, 2007 at 1:04 pm
ant21,
It is not so complicated.
(Admin> How can you start writing a simple web scraping tool without thinking about the encoding?)
1) Enable UTF-8 in Ruby. There are a lot of tutorials.
2) Use Mechanize or FireWatir with Hpricot. Again, there are a lot of tutorials. I would not use scRUBYt!, as it may generate invalid XML.
3) Use iconv to convert an element/attribute value.
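Step 3 can be sketched roughly as follows. This is only an illustration under a few assumptions: it targets Ruby 1.9 or later, where String#encode does the job that the iconv library did on Ruby 1.8, and the helper name to_utf8 and the GBK sample bytes are made up for the example.

```ruby
# Assumption: Ruby 1.9+, so String#encode stands in for the iconv library.
# to_utf8 is a hypothetical helper, not part of scRUBYt!, Mechanize or Hpricot.
def to_utf8(raw, source_encoding)
  raw.dup                                # leave the caller's string untouched
     .force_encoding(source_encoding)    # label the raw bytes with their real encoding
     .encode("UTF-8", invalid: :replace, undef: :replace)
end

# The two characters of "Chinese text" (U+4E2D U+6587) as GBK bytes,
# standing in for an element value scraped from a GBK-encoded page.
gbk_value = "\xD6\xD0\xCE\xC4".b

utf8_value = to_utf8(gbk_value, "GBK")
puts utf8_value.encoding   # the converted string is tagged as UTF-8
puts utf8_value.length     # counted in characters, not bytes
```

The source encoding has to be known (from the HTTP headers or the page's meta tag); the sketch does not try to detect it.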
March 20th, 2007 at 1:18 pm
Orthros,
Well, maybe because 99% of scRUBYt! users are not interested in encoding but in every other kind of stuff? And btw, as I have said, encoding is coming. It is just that I got 53934884 requests for other features and 2 requests for encoding. It is that simple.
What do you mean by 2? Sure, scRUBYt! is still very much beta and may have bugs, but I could fill 3 blogs with the problems I have had with Hpricot, Mechanize and FireWatir (which was not that easy to even get up and running on Linux, btw). scRUBYt! makes a lot of things easier and faster, but of course if you would like to stick with Hpricot, FireWatir and Mechanize, that’s OK with me too. I just don’t think it will be that much easier, and surely not without problems and bugs.
just my 2c.
March 22nd, 2007 at 3:40 pm
Hi admin,
You are from Europe, aren’t you? Encoding is important in European countries; for an example, check the comments.
How can your scRUBYt! make FireWatir easier to run? It is very tough on Linux, and you have to install it too.
scRUBYt! is more limited than Ruby + Hpricot + Mechanize or FireWatir. It is nice, but limited. And surely it has bugs and problems too.
Btw. hihi, agreeable girls in Bratislava.
March 23rd, 2007 at 12:09 am
Yes, I am from Europe, but 99% of the users are from English-speaking countries (mostly US, UK and Canada), so why does it matter where I am from? Don’t you think that if it were so important for my users/clients I would have added it earlier?
And as I have said, encoding is on the TODO list and will be added soon.
scRUBYt! can make FireWatir easier to use in the same way as it makes Mechanize easier to use. I have not had any feedback yet (except yours) that someone did not like the DSL built upon Mechanize. The same DSL will be used for FireWatir, just making it possible to navigate AJAX/JS pages.
About the limits: well, this is the same as a high-level programming language vs. a low-level one. In the low-level one you have control over everything but it takes more time to develop in, while the high-level one may not do everything the way you want and may have bugs and problems, but it is much faster to code in.
Btw, why don’t you do the same as everyone else: if you find a bug, report it (or even better, fix it)? You know, people in the Ruby community are used to the ‘show, don’t tell’ mantra, and you have not shown anything concrete yet. So please go on: report the bug and it will be fixed for you. Complaining in general terms about an open source project that someone works on in his free time won’t get anybody any further.
September 4th, 2007 at 4:39 am
Hi!
Trying to export an Extractor, I get this error:
/usr/lib/ruby/gems/1.8/gems/ruby2ruby-1.1.7/lib/ruby2ruby.rb:51:in `process': undefined method `from_array' for Sexp:Module (NoMethodError)
from /usr/lib/ruby/gems/1.8/gems/scrubyt-0.3.0/lib/scrubyt/output/export.rb:70:in `export'
from /usr/lib/ruby/gems/1.8/gems/scrubyt-0.3.0/lib/scrubyt/output/scrubyt_result.rb:21:in `old_export'
from /usr/lib/ruby/gems/1.8/gems/scrubyt-0.3.0/lib/scrubyt/output/scrubyt_result.rb:8:in `export'
from programmazionescraper.rb:16
I think yours is a powerful tool, but without the production export, relying only on examples is a limitation.
Thanks in advance for your time
September 4th, 2007 at 4:42 am
J.or.dan,
Of course. The learning part is there just to create the rules (and quickly recreate them if the page changes), not for the actual scraping.
Could you please send me the extractor so I can check what the problem is? Through a private pastie, or to scrubyt @at@ scrubyt.org?
September 4th, 2007 at 6:50 am
admin found the problem: the latest versions of the gems conflict.
You need to remove the installed gems and then install:
sudo gem install --version 3.6.3 RubyInline
sudo gem install --version 1.7.1 ParseTree
sudo gem install --version 1.1.6 ruby2ruby
sudo gem install ParseTreeReloaded
sudo gem install RubyInlineAccelleration
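To see whether a given installed version matches the pins above, something like the following sketch may help. It relies only on the Gem::Version and Gem::Requirement classes that ship with RubyGems; the PINNED table and the pinned_version? helper are hypothetical names introduced for illustration.

```ruby
require "rubygems"

# Versions taken from the install commands in this comment.
PINNED = {
  "RubyInline" => "3.6.3",
  "ParseTree"  => "1.7.1",
  "ruby2ruby"  => "1.1.6",
}

# Hypothetical helper: true only if `version` is exactly the pinned one for `name`.
def pinned_version?(name, version)
  requirement = Gem::Requirement.new("= #{PINNED.fetch(name)}")
  requirement.satisfied_by?(Gem::Version.new(version))
end

# ruby2ruby 1.1.7 is the version that appears in the stack trace earlier in the thread.
puts pinned_version?("ruby2ruby", "1.1.7")
puts pinned_version?("ruby2ruby", "1.1.6")
```

The check only compares version strings; it does not inspect which gems are actually installed on the machine.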