In Part 1 of our WebScrape tutorial, we took a look at a simple PHP script to scrape offensive stats for all MLB teams.  If you missed Part 1, you can READ IT HERE.  In Part 2 of the tutorial, we’ll take a look at how you can run a script (or program) to scrape stats and how you can save the data to use however you wish.

How Can I Run My Program or Script?

Depending on the programming language you choose (PHP, Python, Ruby, etc.), there are multiple ways you can run a script or program and have it scrape data off a particular website.  One method is to set up some sort of server on your machine at home.  This is pretty complicated and quite frankly I wouldn’t recommend it.  But, if you insist on doing it this way, I’m sure you can find plenty of good tutorials online that discuss running a program or script in your programming language of choice.

What we strongly recommend is paying a few bucks a month for a web host.  A few benefits of having a web host include:

  • Your web host will likely stay up to date on security threats and detect them before they become an issue.  If you decide to do all of this without a web host, you are responsible for handling every security risk yourself.
  • Web hosts typically provide very user-friendly graphical interfaces for creating the tables you will need to store data behind the scenes.  Most web hosts also have tutorials that walk you through almost any process you would need, including creating a MySQL table.  Without an interface provided by a web host, you would need to be fairly proficient at writing SQL code.
  • You’re able to easily access your data from any location.  Let’s face it, with smartphones, Wi-Fi, etc., accessing your information on the web is quite simple.
  • Your web host will back up your data frequently.  Our web host backs up all of our data weekly, but we have modified the schedule to back up our data much more frequently.  If you ever corrupt or accidentally delete your data, you can have your web host restore all of it, or you can restore it from your own backed-up copy.  Without a web host, you’re responsible for this on your own.
  • You can set up cron jobs to run your program/script as frequently as you wish.  For web scraping stats, once per day is typically enough.  A cron job is an automated process that runs your program/script for you at a set time every day, so you don’t have to worry about manually running anything at all.
  • Most web hosts provide you with as many customized email addresses as you want.  One example for our site is lineprojector@lineprojector.com.  Your email address could be WhatEverYouWant@YourWebsite.com
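
To make the cron job idea above a little more concrete, here is a minimal sketch of a script entry point that is meant to be run unattended once per day.  The scraper from Part 1 is written in PHP; Python is used here purely for illustration.  The names scrape_team_stats and run_daily_scrape, the file paths, and the crontab line in the comment are all assumptions made for this example, not something your web host requires.

```python
import datetime


def scrape_team_stats():
    """Hypothetical placeholder for the real scraping logic from Part 1."""
    # Assumption: the real function would fetch and parse the stats page.
    return [{"team": "Example", "runs": 0}]


def run_daily_scrape(scrape=scrape_team_stats, now=None):
    """Run one scrape and return (exit_code, log_line).

    cron treats exit code 0 as success, so a failed scrape returns 1.
    """
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    try:
        rows = scrape()
    except Exception as exc:  # report the failure instead of dying silently
        return 1, f"{stamp} scrape failed: {exc}"
    return 0, f"{stamp} rows scraped: {len(rows)}"


def main():
    """Print one log line and hand the exit code back to the caller."""
    code, line = run_daily_scrape()
    print(line)
    return code


# Example crontab entry (assumed paths) that runs the script at 6:00 AM daily.
# The five fields are: minute, hour, day of month, month, day of week.
#   0 6 * * * /usr/bin/python3 /home/youruser/scrape_stats.py >> /home/youruser/scrape.log 2>&1
```

In a real script you would end the file with a call such as sys.exit(main()) so that cron sees the exit code.  Most web host control panels let you enter the schedule and the command through a form, so you may never need to type the crontab line by hand.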

A Web Host Sounds Expensive.  Is it?

Absolutely not.  Most likely you’ll be able to use a particular web host’s cheapest plan.  Our site is powered by HostGator.  We’ve been fairly happy with them and their plans are very affordable.

For storing your own data, I would look for a web host that offers unlimited SQL tables.  HostGator meets this criterion.  Their cheapest plan, the Hatchling Plan, should be more than enough for a typical web scraper and runs $7.16 per month.  If you want to pre-pay for a year, the price drops to $5.56 per month.  As you can see, we’re not talking about much money at all.  In our opinion, the minimal monthly fee is definitely worth the peace of mind of not having to worry about security threats, backups, etc.

Something else you’ll want is a domain name.  A domain name is basically what you’d type in to get to your website.  Our site, for example, has the domain name www.LineProjector.com.  You don’t have to buy your domain name from your web host, so you can shop around.  For example, we bought our domain from GoDaddy.com.  Domains are cheap and can currently be bought at GoDaddy for $9.99 per year.  With a web hosting plan and domain name, you’re looking at less than $10 per month: $7.16 for hosting plus about $0.83 for the domain ($9.99 / 12) comes to roughly $8.  Quite a bargain in our opinion.

Just because you have a domain name doesn’t mean you have to build a full-fledged website to access your data.  Typically, for stuff we don’t necessarily want to make available to the public, we just have the results of our SQL statements display on a page that only site admins can access.  You could do the same thing, though you may never need to.  More than likely, you’ll just want to log in to your web host’s control panel so you can access the SQL tables where your scraped data is stored.
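
If you do want to display query results on a page, the idea can be sketched roughly as follows.  This is an illustration under stated assumptions, not the code that runs our site: it uses Python and an in-memory SQLite database only so the example runs without a server, whereas the tutorial itself uses PHP and MySQL.  The table name team_stats, its columns, and the sample rows are made up for the example, and the admin-only access check is not shown.

```python
import html
import sqlite3


def rows_to_html_table(cursor):
    """Render the result set of an executed SELECT as a plain HTML table."""
    headers = [col[0] for col in cursor.description]
    header_cells = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    parts = ["<table>", f"<tr>{header_cells}</tr>"]
    for row in cursor.fetchall():
        # Escape every value so scraped text cannot inject markup into the page.
        cells = "".join(f"<td>{html.escape(str(v))}</td>" for v in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    return "\n".join(parts)


def demo():
    """Build a throwaway table (hypothetical schema) and render one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE team_stats (team TEXT, runs INTEGER)")
    conn.executemany(
        "INSERT INTO team_stats VALUES (?, ?)",
        [("Team A", 5), ("Team B", 3)],
    )
    cur = conn.execute("SELECT team, runs FROM team_stats ORDER BY runs DESC")
    page = rows_to_html_table(cur)
    conn.close()
    return page
```

Calling demo() returns the HTML for a two-row table.  On a real site you would run the query against your host’s MySQL database instead and put the page behind a login.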

If you want to see what a web host control panel looks like, CLICK HERE.  This demo actually allows you to click around and play with a “mock” control panel.  Within the demo, there’s a MySQL wizard, video tutorials, etc.

What’s next for the tutorial?

For Part 3 of the tutorial, we’ll look at building a MySQL table to store your scraped data.  We’ll also get into the basics of running a SQL statement to pull back a result set.