A: Extracting structured data from web pages is a common task in automation, data analysis, and content aggregation. In Ruby, two of the most widely used libraries for this are Nokogiri and scRUBYt. With Nokogiri, you start by fetching the page and parsing it:
require 'open-uri'
require 'nokogiri'

# Fetch the raw HTML and parse it into a searchable document
html = URI.open("https://example.com/data").read
doc = Nokogiri::HTML(html)
This loads the HTML into a parsable document. From here, you can extract fields based on CSS selectors or XPath expressions.
doc.css('.item').each do |block|
  # &. guards against nil when a child node is missing
  title = block.at('.title')&.text
  price = block.at('.price')&.text
  puts "Title: #{title}, Price: #{price}"
end
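The same loop can be written with XPath instead of CSS selectors. Here is a minimal sketch against the same assumed .item/.title/.price markup:

doc.xpath("//*[@class='item']").each do |block|
  # XPath equivalents of the CSS selectors above; note that
  # @class='item' only matches an exact class attribute value
  title = block.at_xpath(".//*[@class='title']")&.text
  price = block.at_xpath(".//*[@class='price']")&.text
  puts "Title: #{title}, Price: #{price}"
end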
Each .item block acts like a slot in a structured list, holding consistent data points like title and price. This “slot-based” layout is ideal for scraping, since it can be mapped cleanly into arrays, tables, or JSON objects.
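For example, the same loop can collect each slot into a hash and serialize the result as JSON (a sketch; the URL and selectors are the assumptions carried over from above):

require 'json'

# Map each .item block to a hash, giving an array of records
items = doc.css('.item').map do |block|
  {
    title: block.at('.title')&.text,
    price: block.at('.price')&.text
  }
end

puts JSON.pretty_generate(items)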
If you prefer a DSL (domain-specific language) approach, scRUBYt allows you to define patterns and structure in a clean, readable way:
require 'scrubyt'

# scRUBYt patterns are typically defined with XPath expressions
# (or literal example strings) rather than CSS selectors
data = Scrubyt::Extractor.define do
  fetch 'https://example.com/products'

  product "//div[@class='product']" do
    name "//h2[@class='name']"
    price "//span[@class='cost']"
  end
end

puts data.to_xml
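Whichever library you choose, real pages fail in predictable ways, so it helps to wrap the fetch defensively. A minimal sketch with Nokogiri, reusing the same assumed URL and selectors:

require 'open-uri'
require 'nokogiri'

begin
  html = URI.open("https://example.com/data", read_timeout: 10).read
rescue OpenURI::HTTPError, SocketError => e
  warn "Fetch failed: #{e.message}"
  exit 1
end

doc = Nokogiri::HTML(html)

# Skip blocks that are missing expected fields instead of crashing
doc.css('.item').each do |block|
  title = block.at('.title')&.text&.strip
  next if title.nil? || title.empty?
  puts title
end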
Whether you’re building a price tracker, article archiver, or content aggregator, Ruby offers solid options for structured scraping. Focus on sites with repeatable slot-based layouts and clear HTML structure to simplify your extraction logic and minimize errors.