<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>scRUBYt!</title>
	<atom:link href="http://scrubyt.org/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://scrubyt.org/blog</link>
	<description>WWW::Mechanize and Hpricot on Steroids</description>
	<pubDate>Sat, 31 Jan 2009 01:25:20 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
	<language>en</language>
			<item>
		<title>scRUBYt! gem on Github</title>
		<link>http://scrubyt.org/blog/scrubyt-gem-on-github/</link>
		<comments>http://scrubyt.org/blog/scrubyt-gem-on-github/#comments</comments>
		<pubDate>Sat, 31 Jan 2009 01:18:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[News &amp; Announcements]]></category>

		<guid isPermaLink="false">http://scrubyt.org/blog/?p=101</guid>
		<description><![CDATA[I have finally fixed scrubyt.gemspec and committed it to github, so you can now install scRUBYt! as a github gem, which I will update quite often (&#8216;real&#8217; releases to rubyforge, with announcements etc., will happen much less frequently). So if you would like to keep up with the newest stuff and get the latest bugfixes [...]]]></description>
			<content:encoded><![CDATA[<p>I have finally fixed scrubyt.gemspec and committed it to github, so you can now install scRUBYt! as a github gem, which I will update quite often (&#8216;real&#8217; releases to rubyforge, with announcements etc., will happen much less frequently). So if you would like to keep up with the newest stuff and get the latest bugfixes and whatnot, be sure to follow <a href="http://github.com/scrubber/scrubyt/tree/master">scRUBYt! on github</a> and install the newest gem.</p>

<p>You can do so by running the following (if you haven&#8217;t already):</p>

<div class="synthi_code" style="display:none;" ><pre style="width:100%;overflow:auto;">
gem sources -a http://gems.github.com
</pre></div><div class="synthi_code" style="display:block;" ><div class="ruby" style="font-family: monospace;"><ol><li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">gem sources -a http://gems.<span style="color:#9900CC;">github</span>.<span style="color:#9900CC;">com</span> </div></li></ol></div></div>

<p>and then installing scRUBYt! with:</p>

<div class="synthi_code" style="display:none;" ><pre style="width:100%;overflow:auto;">
sudo gem install scrubber-scrubyt
</pre></div><div class="synthi_code" style="display:block;" ><div class="ruby" style="font-family: monospace;"><ol><li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">sudo gem install scrubber-scrubyt </div></li></ol></div></div>

<p>If you do so right now, you will get version 0.4.11, which contains a number of bug fixes, so be sure to check it out! More goodies to come soon.</p>
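<p>Because the github gem will be updated more often than the rubyforge releases, it can be handy to check which version you actually have installed. Below is a minimal sketch (not part of scRUBYt!) that lists the locally installed versions of <i>scrubber-scrubyt</i> and compares them against 0.4.11 using RubyGems' <i>Gem::Version</i>. The helper name <i>recent_enough?</i> is made up for illustration. The reason for using <i>Gem::Version</i> is that a plain string comparison would wrongly rank 0.4.9 above 0.4.11.</p>

```ruby
require 'rubygems'

# The version mentioned in this post; change it to whatever you need.
REQUIRED_VERSION = Gem::Version.new('0.4.11')

# Hypothetical helper: true if version_string is at least REQUIRED_VERSION.
# Gem::Version compares numeric segments, so '0.4.9' counts as older than
# '0.4.11' (a plain string comparison would say the opposite).
def recent_enough?(version_string)
  Gem::Version.new(version_string) >= REQUIRED_VERSION
end

# All locally installed versions of the github gem (empty if none).
installed = Gem::Specification.find_all_by_name('scrubber-scrubyt').map { |spec| spec.version.to_s }

if installed.empty?
  puts 'scrubber-scrubyt is not installed'
else
  installed.each do |v|
    puts "#{v}: #{recent_enough?(v) ? 'up to date' : 'older than ' + REQUIRED_VERSION.to_s}"
  end
end
```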
]]></content:encoded>
			<wfw:commentRss>http://scrubyt.org/blog/scrubyt-gem-on-github/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Bye-bye Beast, Hello googlegroups</title>
		<link>http://scrubyt.org/blog/bye-bye-beast-hello-googlegroups/</link>
		<comments>http://scrubyt.org/blog/bye-bye-beast-hello-googlegroups/#comments</comments>
		<pubDate>Tue, 27 Jan 2009 21:04:10 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[News &amp; Announcements]]></category>

		<guid isPermaLink="false">http://scrubyt.org/blog/?p=99</guid>
		<description><![CDATA[I am sure most of you have noticed that the forums have been down for some time now. I put quite a lot of energy into fixing the issue, but it&#8217;s a very old install (using the archaic fcgi way) running the old Beast, and I had no time to convert it to nginx/mongrel or Phusion Passenger (besides the fact [...]]]></description>
			<content:encoded><![CDATA[<p>I am sure most of you have noticed that the forums have been down for some time now. I put quite a lot of energy into fixing the issue, but it&#8217;s a very old install (using the archaic fcgi way) running the old Beast, and I had no time to convert it to nginx/mongrel or Phusion Passenger (besides the fact that Beast is abandonware).</p>

<p>To make a long story short: I have created a <a href="http://groups.google.com/group/scrubyt?hl=en">google groups mailing list for scRUBYt!</a> - please subscribe and place your questions there. It&#8217;s not very likely that the forums will be back (though it would be great to share all the info that piled up there in some way; I will think about it). A mailing list is easier for everyone.</p>
]]></content:encoded>
			<wfw:commentRss>http://scrubyt.org/blog/bye-bye-beast-hello-googlegroups/feed/</wfw:commentRss>
		</item>
		<item>
		<title>At last: scRUBYt! 0.4.1 is out</title>
		<link>http://scrubyt.org/blog/at-last-scrubyt-041-is-out/</link>
		<comments>http://scrubyt.org/blog/at-last-scrubyt-041-is-out/#comments</comments>
		<pubDate>Thu, 11 Dec 2008 14:04:40 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[News &amp; Announcements]]></category>

		<guid isPermaLink="false">http://scrubyt.org/blog/?p=83</guid>
		<description><![CDATA[After more than a year, I&#8217;d like to announce a new release of scRUBYt! and set &#8220;scRUBYt!&#8221;.is_vaporware? = false. w00t!

Thanks to Glenn Gillen, it is now possible to use FireWatir as the agent for navigation, enabling AJAX scraping and more robust scraping via Firefox/FireWatir.

Another big piece of news is that the RubyInline, ParseTree and Ruby2Ruby dependencies were dropped, since we [...]]]></description>
			<content:encoded><![CDATA[<p>After more than a year, I&#8217;d like to announce a new release of scRUBYt! and set &#8220;scRUBYt!&#8221;.is_vaporware? = false. w00t!</p>

<p>Thanks to <a href="http://rubypond.com">Glenn Gillen</a>, it is now possible to use FireWatir as the agent for navigation, enabling AJAX scraping and more robust scraping via Firefox/FireWatir.</p>

<p>Another big piece of news is that the RubyInline, ParseTree and Ruby2Ruby dependencies were dropped, since for a whole year we couldn&#8217;t solve the installation problems they caused on win32. Yay for the Windows users (and for users of other OSes juggling various versions of the above stuff).</p>

<p>Of course a lot of bugs were fixed as well.</p>

<p>On the non-source-code front, we have:</p>

<ul>



<li>An <a href="http://scrubyt.org">all new homepage</a> with all the useful links
</li>
<li>
A <a href="http://github.com/scrubber/scrubyt/tree/master">github repository</a>
</li>
<li>
A small (for now!) <a href="http://github.com/scrubber/scrubyt_examples/tree/master">scraper repository</a>
</li>

<li>
A <a href="http://scrubyt.lighthouseapp.com/projects/18686-scrubyt/overview">Lighthouse tracker</a>
</li>
<li>
A <a href="http://github.com/scrubber/scrubyt_tmbundle/tree/master">TextMate bundle</a>
</li>

</ul>

<p>and probably other cool stuff which I can&#8217;t remember right now! Will update the article later.</p>

<h3>What&#8217;s next?</h3>

<p>The biggest news is that scRUBYt! is going to be rewritten from scratch; the work has already been started by Glenn Gillen. scRUBYt! has grown too big for our taste, so we decided to start anew, aiming for 100% RSpec coverage, refactored code, speed/performance optimization and leaving all the cruft behind. scRUBYt! 0.4.1, the last release based on the original codebase, will be supported until the new, rewritten one (0.5.0) comes out and takes its place.</p>
]]></content:encoded>
			<wfw:commentRss>http://scrubyt.org/blog/at-last-scrubyt-041-is-out/feed/</wfw:commentRss>
		</item>
		<item>
		<title>TextMate Bundle for scRUBYt!</title>
		<link>http://scrubyt.org/blog/textmate-bundle-for-scrubyt/</link>
		<comments>http://scrubyt.org/blog/textmate-bundle-for-scrubyt/#comments</comments>
		<pubDate>Thu, 11 Dec 2008 02:44:34 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[News &amp; Announcements]]></category>

		<guid isPermaLink="false">http://scrubyt.org/blog/?p=80</guid>
		<description><![CDATA[As stupid as this sounds from the original author after countless hours of scRUBYt! usage and development, I still occasionally had to open some older scrapers to look up the exact logger, exporter, click_link_and_wait etc. syntax. Even though I know 95% of the possible commands, I thought it&#8217;d be great to cut down on typing time [...]]]></description>
			<content:encoded><![CDATA[<p>As stupid as this sounds from the original author after countless hours of scRUBYt! usage and development, I still occasionally had to open some older scrapers to look up the exact logger, exporter, click_link_and_wait etc. syntax. Even though I know 95% of the possible commands, I thought it&#8217;d be great to cut down on typing time, since a typical scRUBYt! extractor has tons of boilerplate code.</p>

<p>So I decided to create a <a href="http://github.com/scrubber/scrubyt_tmbundle/tree/master">TextMate bundle and host it on github</a>. It&#8217;s pretty rudimentary right now, consisting of about two dozen snippets, but hey, it&#8217;s a start.</p>

<p>I bet scgoog-&gt;TAB will become a big favorite right away (it spits the classic Google extractor example into your editor), but there are other useful snippets included as well. With their help it&#8217;s literally possible to create a scraper in a few seconds.</p>

<p>If you have further ideas, would like to contribute etc. please drop me a mail (scrubyt -nice try spambot! NOT.- at scrubyt dot org).</p>
]]></content:encoded>
			<wfw:commentRss>http://scrubyt.org/blog/textmate-bundle-for-scrubyt/feed/</wfw:commentRss>
		</item>
		<item>
		<title>scRUBYt! on GitHub and LightHouse</title>
		<link>http://scrubyt.org/blog/scrubyt-on-github-and-lighthouse/</link>
		<comments>http://scrubyt.org/blog/scrubyt-on-github-and-lighthouse/#comments</comments>
		<pubDate>Thu, 23 Oct 2008 12:43:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[News &amp; Announcements]]></category>

		<guid isPermaLink="false">http://scrubyt.org/blog/?p=77</guid>
		<description><![CDATA[The release is almost ready. I have to finish one more important feature: printing the generated XPaths for patterns that were specified by example. Extractor#export() is not supported at the moment, since RubyInline, ParseTree and ruby2ruby were dropped to make the installation smoother possible :-). Until it is added back, you can at [...]]]></description>
			<content:encoded><![CDATA[<p>The release is almost ready. I have to finish one more important feature: printing the generated XPaths for patterns that were specified by example. Extractor#export() is not supported at the moment, since RubyInline, ParseTree and ruby2ruby were dropped to make the installation <del datetime="2008-10-23T12:13:42+00:00">smoother</del> possible :-). Until it is added back, you can at least replace the examples with XPaths by hand; not a great solution, but a working one. I&#8217;ll also test the whole thing once again, fix some minor bugs and create some nice examples (if you have suggestions, drop me a comment and let&#8217;s see what I can do). I guess this should be done by the weekend.</p>

<p>Until then, if you&#8217;d like to see the present state, check out <a href="http://github.com/scrubber/scrubyt/">scRUBYt! on github</a>. I have also just set up a <a href="http://scrubyt.lighthouseapp.com/projects/18686-scrubyt/overview">LightHouse tracker for scRUBYt!</a>: if you find any bugs or have feature requests/ideas etc., just drop them there.</p>
]]></content:encoded>
			<wfw:commentRss>http://scrubyt.org/blog/scrubyt-on-github-and-lighthouse/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Installing FireWatir for Firefox 3.0+</title>
		<link>http://scrubyt.org/blog/installing-firewatir-for-firefox-30/</link>
		<comments>http://scrubyt.org/blog/installing-firewatir-for-firefox-30/#comments</comments>
		<pubDate>Wed, 15 Oct 2008 21:10:25 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[News &amp; Announcements]]></category>

		<guid isPermaLink="false">http://scrubyt.org/?p=71</guid>
		<description><![CDATA[After nearly a year of silence, a new version, firescRUBYt! (scRUBYt! integrated with FireWatir), is ready for release. In fact, firescRUBYt! has been ready for a few months already; the problem was that FireWatir [1] was not ready for Firefox 3 (which I believe everyone has been using for some time now), and forcing users [...]]]></description>
			<content:encoded><![CDATA[<p>After nearly a year of silence, a new version, firescRUBYt! (scRUBYt! integrated with FireWatir), is ready for release. In fact, firescRUBYt! has been ready for a few months already; the problem was that <span id="back_1">FireWatir <a href="#note_1">[1]</a></span> was not ready for Firefox 3 (which I believe everyone has been using for some time now), and forcing users to go back to FF2 just to try out the new release would not have been a wise move, I guess :-).</p>

<p>To make a long story short: the release is coming in the next few days. Until then, get ready by installing FireWatir (and jssh, its prerequisite). Let&#8217;s start with jssh.</p>

<p>Check out the files attached to the FireWatir project <a href="http://wiki.openqa.org/pages/viewpageattachments.action?pageId=13893658">here</a> (in case the link is broken, go to the <a href="http://wiki.openqa.org/display/WTR/FireWatir">FireWatir site</a> and navigate from there). Pick the file matching your Firefox version and OS (for example, take <a href="http://wiki.openqa.org/download/attachments/13893658/jssh-20080924-Darwin.xpi">this xpi</a> if you are on OS X and using FF 3.0+; see all the combinations in part 2 of <a href="http://wiki.openqa.org/display/WTR/FireWatir+Installation">the official installation guide</a>) and install jssh, which comes as an xpi file (if it doesn&#8217;t open automatically, open it in Firefox via File -&gt; Open). Restart Firefox after the installation to ensure that the add-on is activated.</p>

<p>To test whether jssh is working, close your current Firefox instance, go to the command line and start Firefox with the -jssh option:</p>

<div class="synthi_code" style="display:none;" ><pre style="width:100%;overflow:auto;">
firefox -jssh
</pre></div><div class="synthi_code" style="display:block;" ><div class="bash" style="font-family: monospace;"><ol><li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">firefox -jssh </div></li></ol></div></div>

<p>In a separate window, try to connect to Firefox via jssh with telnet:</p>

<div class="synthi_code" style="display:none;" ><pre style="width:100%;overflow:auto;">
telnet 127.0.0.1 9997
</pre></div><div class="synthi_code" style="display:block;" ><div class="bash" style="font-family: monospace;"><ol><li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">telnet <span style="color: #cc66cc;">127.0</span><span style="color: #cc66cc;">.0</span><span style="color: #cc66cc;">.1</span> <span style="color: #cc66cc;">9997</span> </div></li></ol></div></div>

<p>You should see something similar to this (the input after each &gt; prompt is typed by you):</p>

<div class="synthi_code" style="display:none;" ><pre style="width:100%;overflow:auto;">
Macintosh-4:~ mbp$ telnet 127.0.0.1 9997
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Welcome to the Mozilla JavaScript Shell!

> 'Hello, world!'
Hello, world!
> exit()
Goodbye!
Connection closed by foreign host.
</pre></div><div class="synthi_code" style="display:block;" ><div class="bash" style="font-family: monospace;"><ol><li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Macintosh<span style="color: #cc66cc;">-4</span>:~ mbp$ telnet <span style="color: #cc66cc;">127.0</span><span style="color: #cc66cc;">.0</span><span style="color: #cc66cc;">.1</span> <span style="color: #cc66cc;">9997</span></div></li>
<li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Trying <span style="color: #cc66cc;">127.0</span><span style="color: #cc66cc;">.0</span><span style="color: #cc66cc;">.1</span>...</div></li>
<li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Connected to localhost.</div></li>
<li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Escape character is <span style="color: #ff0000;">'^]'</span>.</div></li>
<li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Welcome to the Mozilla JavaScript Shell!</div></li>
<li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div></li>
<li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&gt; <span style="color: #ff0000;">'Hello, world!'</span></div></li>
<li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Hello, world!</div></li>
<li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&gt; <span style="color: #000066;">exit</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span></div></li>
<li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Goodbye!</div></li>
<li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Connection closed by foreign host. </div></li></ol></div></div>

<p>If you see this, jssh is properly installed, and you are through with the hardest part.
Now you need to install the firewatir gem:</p>

<div class="synthi_code" style="display:none;" ><pre style="width:100%;overflow:auto;">
gem install firewatir
</pre></div><div class="synthi_code" style="display:block;" ><div class="bash" style="font-family: monospace;"><ol><li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">gem install firewatir </div></li></ol></div></div>
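<p>If you would like to repeat both checks from Ruby (for example, before running a scraper), here is a minimal sketch using only the standard library. It assumes jssh listens on 127.0.0.1:9997, as in the telnet test above, and greets new connections with the &#8220;Welcome to the Mozilla JavaScript Shell!&#8221; banner. <i>jssh_alive?</i> and <i>loadable?</i> are made-up helper names, not part of FireWatir or scRUBYt!.</p>

```ruby
require 'socket'

# True if something on host:port answers with the jssh welcome banner.
def jssh_alive?(host = '127.0.0.1', port = 9997, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) do |sock|
    # Wait at most `timeout` seconds for the banner to arrive.
    return false unless IO.select([sock], nil, nil, timeout)
    sock.readpartial(1024).include?('Mozilla JavaScript Shell')
  end
rescue SystemCallError, IOError, SocketError
  # Connection refused, timed out, closed early, or host not resolvable.
  false
end

# True if `require lib` succeeds, e.g. after `gem install firewatir`.
def loadable?(lib)
  require lib
  true
rescue LoadError
  false
end

puts "jssh:      #{jssh_alive? ? 'OK' : 'not reachable (is Firefox running with -jssh?)'}"
puts "firewatir: #{loadable?('firewatir') ? 'OK' : 'missing (run gem install firewatir)'}"
```

<p>The socket check only looks at the greeting, so it does not send any JavaScript to Firefox; the telnet session above remains the more thorough test.</p>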

<p>That&#8217;s it: you are ready to roll with FireWatir! The official release of scRUBYt!, along with a few tutorials, is coming soon. Stay tuned!</p>

<div style='border-top: 1px solid black; padding-top:40px; margin-top:40px'></div>

<div id="note_1">[1] Or, to be more precise, jssh: a small component that allows other programs to establish a JavaScript connection to a running Firefox process. <a href="#back_1">back</a></div>
]]></content:encoded>
			<wfw:commentRss>http://scrubyt.org/blog/installing-firewatir-for-firefox-30/feed/</wfw:commentRss>
		</item>
		<item>
		<title>My EURUKO 2007 slides</title>
		<link>http://scrubyt.org/blog/my-euruko-2007-slides/</link>
		<comments>http://scrubyt.org/blog/my-euruko-2007-slides/#comments</comments>
		<pubDate>Tue, 13 Nov 2007 10:25:26 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[News &amp; Announcements]]></category>

		<guid isPermaLink="false">http://scrubyt.org/my-euruko-2007-slides/</guid>
		<description><![CDATA[You can download my EURUKO (the European Ruby Conference) 2007 slides from here. Enjoy!
]]></description>
			<content:encoded><![CDATA[<p>You can download my <a href="http://scrubyt.org/presentation_euruko2007.pdf">EURUKO (the European Ruby Conference) 2007 slides from here</a>. Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://scrubyt.org/blog/my-euruko-2007-slides/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Announcing JscRUBYt! - no more win32 problems (?)</title>
		<link>http://scrubyt.org/blog/announcing-jscrubyt-no-more-win32-problems/</link>
		<comments>http://scrubyt.org/blog/announcing-jscrubyt-no-more-win32-problems/#comments</comments>
		<pubDate>Tue, 02 Oct 2007 13:28:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[News &amp; Announcements]]></category>

		<guid isPermaLink="false">http://scrubyt.org/announcing-jscrubyt-no-more-win32-problems/</guid>
		<description><![CDATA[Thanks to Paul Nikitochkin a.k.a. pftg, scRUBYt! has made a great leap towards win32 compatibility. Paul created JscRUBYt!, the JRuby version of scRUBYt!, which should be easy to install under win32 even if you are not a level 64 Microsoft compiling ninja. In fact, it requires no compiling, no fiddling around with C/C++ and nothing [...]]]></description>
			<content:encoded><![CDATA[<p>Thanks to <a href="http://www.pftg.net.ru/">Paul Nikitochkin</a> a.k.a. <a href="http://agora.scrubyt.org/users/120">pftg</a>, scRUBYt! has made a great leap towards win32 compatibility. Paul created JscRUBYt!, the JRuby version of scRUBYt!, which should be easy to install under win32 even if you are not a level 64 Microsoft compiling ninja. In fact, it requires no compiling, no fiddling around with C/C++ and nothing outside (J)Ruby-land (well, except for installing JRuby, of course).</p>

<p>Please <a href='http://rubyforge.org/frs/download.php/26165/scrubyt-jruby-0.3.4.rar'>download JscRUBYt! from here</a> and read the <a href="http://pftg.blogspot.com/2007/10/installing-jruby-install-jdk-from.html">installation instructions</a> written up by Paul.</p>

<p>Please let us know if you run into any problems, and tell us about your experience using this package!</p>
]]></content:encoded>
			<wfw:commentRss>http://scrubyt.org/blog/announcing-jscrubyt-no-more-win32-problems/feed/</wfw:commentRss>
		</item>
		<item>
		<title>A Hot New Release, 0.3.4 is Out - What&#8217;s New?</title>
		<link>http://scrubyt.org/blog/a-hot-new-release-034-is-out-whats-new/</link>
		<comments>http://scrubyt.org/blog/a-hot-new-release-034-is-out-whats-new/#comments</comments>
		<pubDate>Thu, 27 Sep 2007 20:45:16 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[News &amp; Announcements]]></category>

		<guid isPermaLink="false">http://scrubyt.org/a-hot-new-release-034-is-out-whats-new/</guid>
		<description><![CDATA[
After a long, long time, lots of bugfixes, brainstorming sessions, coding, coding, coding, cans of Red Bull and coding, we are proud to present scRUBYt! 0.3.4!



Judging from the posts on the forum, many people are not aware of quite a lot of powerful features (which is mainly my fault, as I was too lazy to write any documentation for [...]]]></description>
			<content:encoded><![CDATA[<p>
After a long, long time, lots of bugfixes, brainstorming sessions, coding, coding, coding, cans of Red Bull and coding, we are proud to present scRUBYt! 0.3.4!
</p>

<p>
Judging from the posts on the <a href='http://agora.scrubyt.org'>forum</a>, many people are not aware of quite a lot of powerful features (which is mainly my fault, as I was too lazy to write any documentation for the last 2 releases; a cheatsheet and reference are on the way). To fix that, at least for this release, I&#8217;d like to introduce a few of the new features added in scRUBYt! 0.3.4.
</p>

<p>
First of all, there are <b><i>3 new pattern types</i></b>, two of which are particularly interesting. Let&#8217;s start with the not-so-interesting one:
<ul>
  <li><b>Constant pattern:</b>
    <div class="synthi_code" style="display:none;" ><pre style="width:100%;overflow:auto;">
      pattern 'some constant text', :type => :constant
    </pre></div><div class="synthi_code" style="display:block;" ><div class="ruby" style="font-family: monospace;"><ol><li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">pattern 'some constant text', :type =&gt; :constant </div></li></ol></div></div>
    Sometimes I needed a piece of text or data which was not contained in the web page (or was always the same, so scraping it would be unnecessary overhead), for example a comment, or a required field in a feed or another predefined schema. The constant pattern comes in handy exactly in these cases: the above example will produce:
   <div class="synthi_code" style="display:none;" ><pre style="width:100%;overflow:auto;">
     &lt;pattern&gt;some constant text&lt;/pattern&gt;
   </pre></div><div class="synthi_code" style="display:block;" ><div class="ruby" style="font-family: monospace;"><ol><li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&lt;pattern&gt;some constant text&lt;/pattern&gt; </div></li></ol></div></div>
  </li>
</ul>
<p>
The two interesting ones are at a very alpha stage (in fact, one of them was implemented just 2 days ago for a specific scenario), so they are more of a preview of what to expect in future releases than full-fledged features. They are already usable to some extent, but a lot of tweaking, polishing and new functionality can be expected in the near future.
</p>

<ul>
  <li><b>Text pattern:</b>
    <div class="synthi_code" style="display:none;" ><pre style="width:100%;overflow:auto;">
      pattern 'td[some text]:all', :type => :text
    </pre></div><div class="synthi_code" style="display:block;" ><div class="ruby" style="font-family: monospace;"><ol><li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">pattern 'td<span style="color:#006600; font-weight:bold;">&#91;</span>some text<span style="color:#006600; font-weight:bold;">&#93;</span>:all', :type =&gt; :text </div></li></ol></div></div>
  A text pattern works differently from an XPath one: while the XPath pattern relies on the structure of the document, the text pattern doesn&#8217;t. This is essential for some sites (the most typical example is perhaps Wikipedia) which don&#8217;t use a single template to present their content and/or whose structure changes often, but which contain text labels or other constant chunks of text that can aid scraping. The semantics of the above example are:
<pre>
Find all &lt;td&gt; tags which contain the text 'some text', wherever on the page.
</pre>
I am sure you noticed the :all notation. Besides :all, :index is currently supported (where index is a number, so :0, :1 etc.), meaning &#8216;give me the first (:0), second (:1) etc. occurrence of the match&#8217;.
  A lot of additions can be expected for the text pattern in the future (for example, &#8216;give me the longest text in a &lt;td&gt;&#8217; or &#8216;give me the &lt;td&gt;s matching a certain regexp&#8217;). As always, suggestions are warmly welcome!
  </li>
  <li><b>Script pattern:</b>
  <div class="synthi_code" style="display:none;" ><pre style="width:100%;overflow:auto;">
    pattern lambda {|x| x.gsub('x','y').downcase}
  </pre></div><div class="synthi_code" style="display:block;" ><div class="ruby" style="font-family: monospace;"><ol><li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">pattern <span style="color:#CC0066; font-weight:bold;">lambda</span> <span style="color:#006600; font-weight:bold;">&#123;</span>|x| x.<span style="color:#CC0066; font-weight:bold;">gsub</span><span style="color:#006600; font-weight:bold;">&#40;</span>'x','y'<span style="color:#006600; font-weight:bold;">&#41;</span>.<span style="color:#9900CC;">downcase</span><span style="color:#006600; font-weight:bold;">&#125;</span> </div></li></ol></div></div>
  A script pattern is a way to execute an arbitrary Ruby block during scraping. Its input, as always, is the output of its parent pattern, represented by &#8216;x&#8217;.
  While this pattern type will be enhanced a lot in the future (allowing you to choose more, arbitrary patterns as input, to specify custom input, and a simpler syntax: the &#8216;lambda {|x| }&#8217; part is always the same, so it will most probably be dropped), it is already quite powerful as it is. The simplest use cases include filtering and modifying URLs, stripping whitespace, other string modifications such as substitutions on the result, primitive branching etc. But really, only your imagination is the limit here: you could perform different operations on scraped prices or stock data, or run scraped coordinates through a geocoder. I am quite sure that the script pattern will be a lot of fun and lead to some interesting uses.
  </li>
</ul>
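<p>To make the text pattern&#8217;s semantics concrete, here is a plain-Ruby sketch of how a spec such as <i>td[some text]:all</i> or <i>td[some text]:1</i> can be read. It is not scRUBYt!&#8217;s implementation: the page is modelled as a hypothetical flat list of <i>Element</i> structs (tag name plus text) instead of a parsed HTML document, and <i>parse_text_spec</i> and <i>text_pattern</i> are made-up names. What it does show is that only the tag name and the contained text decide a match; where the element sits in the page does not matter.</p>

```ruby
# Hypothetical, simplified page model: each element is just a tag name and its text.
Element = Struct.new(:tag, :text)

# Splits a spec like 'td[some text]:all' or 'td[some text]:1' into
# [tag, text, selector], where selector is :all or an Integer index.
def parse_text_spec(spec)
  m = spec.match(/\A(\w+)\[(.+)\]:(all|\d+)\z/)
  raise ArgumentError, "unsupported spec: #{spec}" unless m
  selector = m[3] == 'all' ? :all : m[3].to_i
  [m[1], m[2], selector]
end

# Returns every matching element for :all, or at most one element for an index.
# Only the tag name and the contained text are checked; nesting is ignored.
def text_pattern(elements, spec)
  tag, text, selector = parse_text_spec(spec)
  matches = elements.select { |e| e.tag == tag && e.text.include?(text) }
  selector == :all ? matches : matches.values_at(selector).compact
end

page = [
  Element.new('td', 'price: some text'),
  Element.new('div', 'some text in a div'),
  Element.new('td', 'unrelated'),
  Element.new('td', 'more some text')
]

p text_pattern(page, 'td[some text]:all').map(&:text)  # => ["price: some text", "more some text"]
p text_pattern(page, 'td[some text]:1').map(&:text)    # => ["more some text"]
```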

</p>
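<p>The script pattern example above can be tried out in plain Ruby, outside scRUBYt!. In this sketch <i>parent_output</i> is a hypothetical stand-in for whatever the parent pattern produced; each value is passed to the lambda as <i>x</i>, as described above.</p>

```ruby
# Hypothetical output of a parent pattern (e.g. scraped link texts).
parent_output = ['  Example X  ', 'Another x-ray']

# The script from the example above: replace 'x' with 'y', then downcase.
script = lambda { |x| x.gsub('x', 'y').downcase }

# Each value produced by the parent is passed to the script as x.
results = parent_output.map { |x| script.call(x) }
p results  # => ["  eyample x  ", "another y-ray"]

# Scripts are ordinary Ruby, so they can be chained, e.g. to strip whitespace.
strip = lambda { |x| x.strip }
cleaned = parent_output.map { |x| strip.call(script.call(x)) }
p cleaned  # => ["eyample x", "another y-ray"]
```

<p>Note that <i>gsub</i> is case-sensitive and runs before <i>downcase</i>, so the capital X survives as an x: the order of operations inside a script matters.</p>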

<p>
There are some additions to the <b><i>output functionality</i></b>: <i>to_hash</i> now accepts a custom delimiter (for cases when the output contains a comma, the default delimiter), and there is a new method, <i>to_flat_xml</i>, which produces feed-like, flat XML instead of the hierarchical output generated by <i>to_xml</i>.
</p>
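<p>Why a custom delimiter matters can be seen with plain Ruby strings, assuming (as the description above suggests) that the delimiter is used to join multiple scraped values into a single field. <i>join_values</i> is a hypothetical helper, not the actual signature of <i>to_hash</i>:</p>

```ruby
# Hypothetical helper: joins several scraped values into one string field.
def join_values(values, delimiter = ',')
  values.join(delimiter)
end

# Hypothetical scraped prices; the first one contains a thousands separator.
prices = ['1,299.00', '899.00']

# With the default comma, splitting the field again yields three values, not two:
p join_values(prices).split(',')       # => ["1", "299.00", "899.00"]

# A delimiter that never occurs in the data keeps the values intact:
p join_values(prices, '|').split('|')  # => ["1,299.00", "899.00"]
```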

<p>
<b><i>Logging</i></b> was reworked completely by Tim Fletcher. The most notable difference is that by default, you won&#8217;t be flooded with all the debug messages pouring out of scRUBYt!. To enable logging, you have to explicitly add a line before your extractor:
<div class="synthi_code" style="display:none;" ><pre style="width:100%;overflow:auto;">
Scrubyt.logger = Scrubyt::Logger.new
#your extractor begins here
</pre></div><div class="synthi_code" style="display:block;" ><div class="ruby" style="font-family: monospace;"><ol><li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Scrubyt.<span style="color:#9900CC;">logger</span> = Scrubyt::Logger.<span style="color:#9900CC;">new</span></div></li>
<li style="font-weight: bold;"><div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#008000; font-style:italic;">#your extractor begins here </span></div></li></ol></div></div>
</p>
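<p>The opt-in behaviour is easy to mimic in your own code. The sketch below uses Ruby&#8217;s standard <i>Logger</i> rather than <i>Scrubyt::Logger</i>, and <i>MyScraper</i> is a hypothetical module: like scRUBYt! 0.3.4, it stays silent until a logger is explicitly assigned.</p>

```ruby
require 'logger'
require 'stringio'

# Hypothetical module with the same opt-in design: no logger, no output.
module MyScraper
  class << self
    attr_accessor :logger
  end

  # Forwards to the logger only if one has been assigned.
  def self.log(message)
    logger.debug(message) if logger
  end
end

MyScraper.log('fetching page 1')   # silently ignored: no logger yet

buffer = StringIO.new              # could just as well be $stdout
MyScraper.logger = Logger.new(buffer)
MyScraper.log('fetching page 2')   # now written to buffer

puts buffer.string
```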

<p>Last but not least, a lot of bugs were fixed: the infamous regexp pattern bug, the encoding bug (scraping UTF-8 pages should be OK now), plus lots of fixes in the download pattern and other places.</p>

<p>jscRUBYt! and firescRUBYt! are on the way, so stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://scrubyt.org/blog/a-hot-new-release-034-is-out-whats-new/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Let the Dogs Out!</title>
		<link>http://scrubyt.org/blog/let-the-dogs-out/</link>
		<comments>http://scrubyt.org/blog/let-the-dogs-out/#comments</comments>
		<pubDate>Wed, 19 Sep 2007 09:54:47 +0000</pubDate>
		<dc:creator>CopperMonkey</dc:creator>
		
		<category><![CDATA[News &amp; Announcements]]></category>

		<guid isPermaLink="false">http://scrubyt.org/let-the-dogs-out/</guid>
		<description><![CDATA[


scRUBYt! has dug its way into the investment business :). No joke - check out this scrummy tutorial created by Doug Bromley to find out more. 

A refreshing mix of business and technology is served here: a nice scraper that returns dividend yields summarized all in one place and a useful example for form filling [...]]]></description>
			<content:encoded><![CDATA[<p><img src='http://scrubyt.org/wp-content/uploads/2007/09/dogs_dow.png' alt='dogs_dow' />
<br />
<br />
scRUBYt! has dug its way into the investment business :). No joke - check out this scrummy <a href="http://www.straw-dogs.co.uk/09/05/scrubyt-tutorial-dogs-of-the-ftse/">tutorial</a> created by Doug Bromley to find out more. 
<br />
A refreshing mix of business and technology is served here: a nice scraper that returns dividend yields summarized all in one place, and a useful example of form filling and submitting, page navigation and constraints. 
<br />
<br />
Thnx Doug for the excellent job! </p>
]]></content:encoded>
			<wfw:commentRss>http://scrubyt.org/blog/let-the-dogs-out/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>

