Data Scraping in Rails by Processing CSV

This Ruby on Rails application scrapes the links uploaded in a CSV file and counts how many times a given link occurs on each scraped page.

The user uploads a CSV file and provides a list of email addresses to which the processed CSV will be sent.

The CSV has two columns:

  • refferal_link: the page to scrape
  • home_link: the link to count on that page
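To make the input format concrete, here is a minimal sketch of parsing such a file with Ruby's standard CSV library. The sample rows, the example.com URLs, and the helper name parse_links are made up for illustration; only the two column names come from the application.

```ruby
require 'csv'

# Hypothetical sample input; the URLs are placeholders, not real data.
SAMPLE_CSV = <<~DATA
  refferal_link,home_link
  https://example.com/blog,https://example.com/
  https://example.com/about,https://example.com/contact
DATA

# Parse CSV text into an array of hashes keyed by the header names as symbols.
def parse_links(csv_text)
  CSV.parse(csv_text, headers: true, header_converters: :symbol).map(&:to_h)
end

parse_links(SAMPLE_CSV).each do |row|
  puts "scrape #{row[:refferal_link]} and count links to #{row[:home_link]}"
end
```

The header_converters: :symbol option is what lets the rows be read with symbol keys such as row[:refferal_link].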

First of all, we will create the Rails application:

$ rails new scrape_data

$ cd scrape_data

Then we will generate the UploadCsv scaffold by running the command below:

$ rails g scaffold UploadCsv generated_csv:string csv_file:string

This creates the model, controller, views, and migration for UploadCsv.

We start with uploading the file and saving it on the record.

Replace the contents of app/views/upload_csvs/_form.html.erb with the code below, which adds a file field to the form:

<%= form_with(model: upload_csv, local: true) do |form| %>
  <% if upload_csv.errors.any? %>
    <div id="error_explanation">
      <h2><%= pluralize(upload_csv.errors.count, "error") %> prohibited this upload_csv from being saved:</h2>

      <ul>
        <% upload_csv.errors.full_messages.each do |message| %>
          <li><%= message %></li>
        <% end %>
      </ul>
    </div>
  <% end %>

  <div class="field">
    <%= form.label :csv_file %>
    <%= form.file_field :csv_file %>
  </div>

  <div class="actions">
    <%= form.submit %>
  </div>
<% end %>

Then we will add the CarrierWave gem to handle the CSV upload.

Add the line below to the Gemfile:

gem 'carrierwave', '~> 2.0'

$ bundle install

Then we will create the uploader with CarrierWave:

$ rails generate uploader Avatar

We will attach the uploader to the model:

class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader
end

Before moving further, check that your application is working.

Run the command below:

$ rake db:create db:migrate

Update the routes:

Rails.application.routes.draw do
  resources :upload_csvs
  root 'upload_csvs#index'
end


$ rails s

Then we will create a job that reads the CSV file and scrapes each link in it. The generated file will be saved in the generated_csv column of that record.

Generate the job as shown below:

$ rails generate job genrate_csv
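As a rough illustration of the CSV-writing step the job performs, the sketch below writes result rows with a header line using only Ruby's standard library. The helper name write_report, the temporary file, and the sample row are assumptions for demonstration and are not part of the application.

```ruby
require 'csv'
require 'tempfile'

# Write result rows to a CSV file at `path`, with a header line first.
def write_report(path, rows)
  headers = %w[referal_link home_link count]
  CSV.open(path, 'w', write_headers: true, headers: headers) do |csv|
    rows.each { |row| csv << row }
  end
  path
end

# Hypothetical result row: page scraped, link searched for, occurrences found.
rows = [['https://example.com/blog', 'https://example.com/', 3]]

Tempfile.create(['report', '.csv']) do |tmp|
  write_report(tmp.path, rows)
  puts File.read(tmp.path)
end
```

With write_headers: true and a headers array, CSV.open emits the header row itself, so the rows appended afterwards only need to contain data.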

Add the gems below and run bundle install:

gem 'httparty'

gem 'nokogiri'

Then we will replace the contents of app/jobs/genrate_csv_job.rb with the code below:

require 'csv'

class GenrateCsvJob < ApplicationJob
  queue_as :default

  def perform(upload_csv)
    processed_csv(upload_csv)

    # Write the scraped counts to a CSV file and store its path on the record
    file = "#{Rails.root}/public/product_data.csv"
    headers = ['referal_link', 'home_link', 'count']
    CSV.open(file, 'w', write_headers: true, headers: headers) do |writer|
      @new_array.each do |new_array|
        writer << new_array
      end
    end
    upload_csv.update(generated_csv: file)

    # NotificationMailer still needs to be generated; follow the usual mailer steps
    NotificationMailer.send_csv(upload_csv).deliver_now! if @new_array.present?
  end

  # Collects [refferal_link, home_link, count] for every row of the uploaded CSV
  def processed_csv(upload_csv)
    @new_array = []
    CSV.foreach(upload_csv.csv_file.path, headers: true, header_converters: :symbol) do |row|
      row_map = row.to_h
      # Fetch the page and parse its HTML body
      page = HTTParty.get(row_map[:refferal_link])
      page_parse = Nokogiri::HTML(page.body)
      # Collect every href on the page and count how often each one occurs
      link_array = page_parse.css('a').map { |link| link['href'] }
      link_array_group = link_array.group_by(&:itself).map { |k, v| [k, v.length] }.to_h
      @new_array.push([row_map[:refferal_link], row_map[:home_link], link_array_group[row_map[:home_link]].to_s])
    end
  end
end
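The counting step in the job can be tried in isolation on a plain array of href values. The sketch below is only an illustration: the helper names count_link and link_counts and the sample hrefs are hypothetical, and no HTTP request or HTML parsing is involved.

```ruby
# Count how often `target` appears in a list of href values.
# nil entries (anchors without an href) are dropped by compact.
def count_link(hrefs, target)
  hrefs.compact.count(target)
end

# Grouping approach: build a { href => occurrences } hash once, which is
# handy when several targets are looked up against the same page.
def link_counts(hrefs)
  hrefs.compact.group_by(&:itself).transform_values(&:length)
end

hrefs = ['/', '/about', '/', nil, '/contact'] # hypothetical hrefs from one page

puts count_link(hrefs, '/')               # prints 2
puts link_counts(hrefs).fetch('/blog', 0) # prints 0, since '/blog' never occurs
```

Both approaches compare strings exactly, so a relative href such as / will not match an absolute home_link such as https://example.com/, and a trailing slash difference also counts as a different link.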




Then we will run the job from an after_create callback on UploadCsv and add a presence validation for csv_file.

Update app/models/upload_csv.rb as follows:

class UploadCsv < ApplicationRecord

  mount_uploader :csv_file, AvatarUploader