This Ruby on Rails application scrapes the links uploaded in a CSV file and counts how often a given link occurs on each scraped page.

In the application, the user uploads a CSV file and supplies a list of email addresses to which the parsed CSV will be sent.

The CSV has two columns:

  • refferal_link
  • home_link

Each row pairs a page to scrape (refferal_link) with the link to count on that page (home_link).
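
To make the input format concrete, here is a minimal sketch of such a file and how Ruby's standard CSV library reads it. The sample URLs are placeholders chosen for illustration, not values from the original post; only the two header names come from the application.

```ruby
require 'csv'

# Hypothetical sample input; the URLs are placeholders for illustration.
SAMPLE_CSV = <<~CSV
  refferal_link,home_link
  https://blog.example.com/post-1,https://www.example.com/
  https://blog.example.com/post-2,https://www.example.com/pricing
CSV

# header_converters: :symbol turns the header row into symbol keys.
rows = CSV.parse(SAMPLE_CSV, headers: true, header_converters: :symbol).map(&:to_h)

rows.each do |row|
  puts "scrape #{row[:refferal_link]} and count links to #{row[:home_link]}"
end
```

The symbol keys are what the job later in this post uses to look up each column.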

First, create the Rails application:

$ rails new scrape_data

$ cd scrape_data

Next, generate the UploadCsv scaffold by running the command below:

$ rails g scaffold UploadCsv generated_csv:string csv_file:string

This creates the model, controller, views, and migration for UploadCsv.

Start by uploading the file and saving it in the database.

Replace the contents of app/views/upload_csvs/_form.html.erb with the code below, which adds a file field to the form:

<%= form_with(model: upload_csv, local: true) do |form| %>
  <% if upload_csv.errors.any? %>
    <div id="error_explanation">
      <h2><%= pluralize(upload_csv.errors.count, "error") %> prohibited this upload_csv from being saved:</h2>
      <ul>
        <% upload_csv.errors.full_messages.each do |message| %>
          <li><%= message %></li>
        <% end %>
      </ul>
    </div>
  <% end %>

  <div class="field">
    <%= form.label :csv_file %>
    <%= form.file_field :csv_file %>
  </div>

  <div class="actions">
    <%= form.submit %>
  </div>
<% end %>

Next, add a gem to handle the CSV upload.

Add the line below to the Gemfile:

gem 'carrierwave', '~> 2.0'

$ bundle install

Then create the CarrierWave uploader:

$ rails generate uploader Avatar

Mount the uploader on the model:

app/models/upload_csv.rb

class UploadCsv < ApplicationRecord

  mount_uploader :csv_file, AvatarUploader

end

Before moving on, check that the application works.

Run the commands below:

$ rake db:create db:migrate

Update the routes in config/routes.rb:

Rails.application.routes.draw do

  resources :upload_csvs

  root 'upload_csvs#index'

end

$ rails s

Next, create a job that reads the CSV file and scrapes the links in it. The path of the generated file is saved in the generated_csv column of that record.

Generate the job as follows:

$ rails generate job genrate_csv

Add the gems below to the Gemfile and run bundle install:

gem 'httparty'
gem 'nokogiri'

Then replace the contents of app/jobs/genrate_csv_job.rb with the code below:

require 'csv'

class GenrateCsvJob < ApplicationJob
  queue_as :default

  HEADERS = %w[referal_link home_link count].freeze

  def perform(upload_csv)
    rows = processed_csv(upload_csv)

    # Write the results to public/product_data.csv and store the path on the record.
    file = Rails.root.join('public', 'product_data.csv').to_s
    CSV.open(file, 'w', write_headers: true, headers: HEADERS) do |writer|
      rows.each { |row| writer << row }
    end
    upload_csv.update(generated_csv: file)

    # NotificationMailer is generated in the mailer steps later in this post.
    NotificationMailer.send_csv(upload_csv).deliver_now! if rows.present?
  end

  # Scrapes each refferal_link and counts how many of its <a> tags point at home_link.
  # Returns one [refferal_link, home_link, count] array per CSV row.
  def processed_csv(upload_csv)
    CSV.foreach(upload_csv.csv_file.path, headers: true, header_converters: :symbol).map do |row|
      page = HTTParty.get(row[:refferal_link])
      hrefs = Nokogiri::HTML(page.body).css('a').map { |link| link['href'] }
      [row[:refferal_link], row[:home_link], hrefs.count(row[:home_link])]
    end
  end
end
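
The counting step can be tried on its own, without any network calls. The following is a minimal sketch, assuming the href values have already been extracted from a page. The helper name `count_link` and the sample URLs are hypothetical, and `Array#tally` (Ruby 2.7+) is used here as a compact way to build the per-link counts.

```ruby
# Hypothetical helper: counts how often `target` appears in a list of hrefs.
# nil entries (anchors without an href attribute) are dropped first.
def count_link(hrefs, target)
  hrefs.compact.tally.fetch(target, 0)
end

hrefs = [
  'https://www.example.com/',
  'https://www.example.com/pricing',
  nil,
  'https://www.example.com/'
]

puts count_link(hrefs, 'https://www.example.com/')      # prints 2
puts count_link(hrefs, 'https://www.example.com/about') # prints 0
```

The comparison is an exact string match, so https://www.example.com and https://www.example.com/ count as different links. Whether to normalise URLs before counting is a design decision the tutorial leaves open.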

Then enqueue the job once an UploadCsv record has been created, and add a validation that requires csv_file.

Update app/models/upload_csv.rb:

class UploadCsv < ApplicationRecord
  mount_uploader :csv_file, AvatarUploader

  validates :csv_file, presence: true

  # Enqueue after the transaction commits so the job can always find the record.
  after_create_commit :processed_csv

  def processed_csv
    GenrateCsvJob.perform_later(self)
  end
end

After uploading a file, the scraped results are written to a generated CSV, which you can find at

/scrape_data/public/product_data.csv
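
As a rough illustration of what that file contains, the sketch below builds the same kind of output in memory with the standard CSV library. The result rows are made-up placeholders; in the application they come from the scraping job.

```ruby
require 'csv'

headers = %w[referal_link home_link count]

# Hypothetical result rows; in the application these come from the scraping job.
rows = [
  ['https://blog.example.com/post-1', 'https://www.example.com/', 2],
  ['https://blog.example.com/post-2', 'https://www.example.com/pricing', 0]
]

# CSV.generate returns the CSV text as a string instead of writing a file.
output = CSV.generate(write_headers: true, headers: headers) do |csv|
  rows.each { |row| csv << row }
end

puts output
```

The first line of the output is the header row, followed by one line per scraped page.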

The generated file can also be sent by email, using the steps below.

First, generate the mailer:

$ rails generate mailer NotificationMailer

Update app/mailers/notification_mailer.rb:

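class NotificationMailer < ApplicationMailer
  def send_csv(upload_csv)
    @greeting = 'Hi'
    # Attach the generated CSV stored at the path saved on the record.
    attachments['parsed.csv'] = File.read(upload_csv.generated_csv)
    # Replace the redacted placeholder below with a real recipient address.
    mail(to: "[email protected]", subject: 'CSV is parsed successfully.')
  end
end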
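
The mailer above sends to a single hard-coded address, while the application is meant to accept a list of recipients. One possible way to turn a user-supplied string into such a list is sketched below. The helper name `parse_recipients` and the sample addresses are hypothetical; `URI::MailTo::EMAIL_REGEXP` is part of Ruby's standard library.

```ruby
require 'uri'

# Hypothetical helper: splits a comma- or whitespace-separated string of
# addresses and keeps only the entries that look like valid emails.
def parse_recipients(raw)
  raw.to_s
     .split(/[\s,]+/)
     .reject(&:empty?)
     .select { |address| address.match?(URI::MailTo::EMAIL_REGEXP) }
     .uniq
end

p parse_recipients('alice@example.com, bob@example.com not-an-email')
```

Action Mailer's `mail(to:)` accepts an array of addresses, so a list like this could be passed to it directly.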

Also configure mail delivery in config/environments/development.rb or production.rb by adding the lines below:

config.action_mailer.default_url_options = { host: 'https://sample-scrape.herokuapp.com/' }
config.action_mailer.delivery_method = :smtp
config.action_mailer.smtp_settings = {
  user_name: '[email protected]',
  password: '*******123456',
  domain: 'gmail.com',
  address: 'smtp.gmail.com',
  port: '587',
  authentication: :plain
}
config.action_mailer.raise_delivery_errors = false
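
Rather than committing real credentials to the repository, the SMTP settings could be read from environment variables. This is only a sketch: the helper name `smtp_settings_from` and the variable names SMTP_USERNAME and SMTP_PASSWORD are assumptions, not part of the original setup.

```ruby
# Hypothetical helper: builds the smtp_settings hash from environment
# variables. SMTP_USERNAME and SMTP_PASSWORD are assumed names.
def smtp_settings_from(env = ENV)
  {
    user_name: env.fetch('SMTP_USERNAME'),
    password: env.fetch('SMTP_PASSWORD'),
    domain: 'gmail.com',
    address: 'smtp.gmail.com',
    port: 587,
    authentication: :plain
  }
end

# Demonstration with a stand-in hash instead of the real environment.
settings = smtp_settings_from({ 'SMTP_USERNAME' => 'sender@example.com',
                                'SMTP_PASSWORD' => 'secret' })
puts settings[:address]
```

Because `fetch` raises a KeyError when a variable is missing, a misconfigured environment fails loudly instead of silently sending with blank credentials.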

Also update the view, app/views/notification_mailer/send_csv.html.erb:

<h1>CSV has been processed, Thanks!</h1>

<p>

  <%= @greeting %>, please find the parsed CSV attached to this email.

</p>

Thank you!
