CS290I - Scalable Internet Services and Systems

Thorsten von Eicken - UCSB - Spring 2001

Handout #3 - Project 1

Due Thursday, April 19th

Objective

In this project you get to learn Perl, the details of HTTP and HTML, and SQL. Your task is to write a robot in Perl which searches the web for stock information about a number of technology companies and stores the data in a SQL database. This information is going to be used in the next projects to create a web site that displays information about the companies, draws stock value graphs, and eventually allows you to trade stocks.

Documentation

  • CS290I: http://www.cs.ucsb.edu/~tve/cs290i-sp01
  • Perl.com by O'Reilly: http://www.perl.com
  • Comprehensive Perl Archive Network: http://www.cpan.org
  • Web Scripts in Perl: http://www.cpan.org/scripts/Web/index.html
  • Perl Script for fetching a page: http://www.cpan.org/authors/id/JNOLAN/timefetch-1.02
  • The Web Robots FAQ: http://info.webcrawler.com/mak/projects/robots/faq.html
  • MySQL home page and documentation: http://web.mysql.com/

Set-up

All of you will have access to the Sun Solaris machines in PSL; see the course web site for details. At a minimum, bugatti.cs.ucsb.edu will be available. The SQL database runs on bugatti and you must store your data there. However, the database is accessible remotely, so you can work on the project from any machine you choose.

To use perl on bugatti, run /usr/local/bin/perl (not /usr/pubsw/bin/perl or /usr/bin/perl) so that you get a slew of HTTP- and HTML-related modules. You may want to put /usr/local/bin in your path ahead of the other directories. Use "perl -V" to check: the library paths listed at the end of the output should be under /usr/local/lib.

A MySQL server is running on bugatti, and each of you will have your own empty database set up on it. You will receive mail with your MySQL password. You have all permissions on your database; please refer to the MySQL documentation for more details. As with perl, the mysql command-line client is in /usr/local/bin. You will need it, plus the DBI Perl module.

You can log in to your database and create tables using:

    # mysql -h bugatti -u username -p username
    Enter password: *****

The first "username" is your MySQL user name; the second, at the end of the command, is the name of your database.
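From Perl, the same connection goes through the DBI module. The following is a minimal sketch, assuming the DBD::mysql driver is installed alongside DBI and that your database name equals your user name, as in the command above. The helper names mysql_dsn and connect_db are made up for illustration.

```perl
use strict;
use warnings;

# Build the DBI data source name for a MySQL database on a given host.
sub mysql_dsn {
    my ($database, $host) = @_;
    return "DBI:mysql:database=$database;host=$host";
}

# Open a connection to your database on bugatti.
# Assumption: the database is named after the MySQL user.
# DBI is loaded lazily so this file compiles even where DBI is missing.
sub connect_db {
    my ($user, $password) = @_;
    require DBI;
    return DBI->connect(mysql_dsn($user, 'bugatti'), $user, $password,
                        { RaiseError => 1, AutoCommit => 1 });
}
```

A robot would call connect_db once at start-up, for example "my $dbh = connect_db('username', $password);", and reuse the handle for all inserts. With RaiseError set, failed database calls die instead of returning silently.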

Robot

The objective of the project is to write a robot that, given the URL of a Yahoo company category page, fetches stock information on all the companies listed there. To start out, the robot needs to fetch the list of companies we will be tracking and trading this quarter from http://biz.yahoo.com/p/_techno-cmptrs.html.
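Fetching a page and pulling out its links could be sketched roughly as follows. This assumes the LWP::UserAgent and HTML::LinkExtor modules (both on CPAN) are among those installed on bugatti. The function names and the agent string are placeholders, and deciding which links are company links is left out because it depends on the layout of the page.

```perl
use strict;
use warnings;

# Fetch a URL and return the page body; die with the HTTP status on failure.
sub fetch_page {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    $ua->agent('cs290i-robot/0.1');   # placeholder agent string
    $ua->timeout(30);
    my $response = $ua->get($url);
    die "GET $url failed: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->content;
}

# Return the absolute URLs of all <a href="..."> links in an HTML string.
# $base is the URL the page came from; it is used to resolve relative links.
sub extract_links {
    my ($html, $base) = @_;
    require HTML::LinkExtor;
    my @links;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @links, "$attr{href}" if $tag eq 'a' && defined $attr{href};
    }, $base);
    $parser->parse($html);
    $parser->eof;
    return @links;
}
```

The robot would then call extract_links(fetch_page($url), $url) and keep only the links that point to company pages. LWP also ships LWP::RobotUA, a user agent that honors robots.txt, which may be worth a look given the Web Robots FAQ listed under Documentation.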

Each of the company links on that page leads to a company information page, from which the robot needs to gather the information stored in the companies table described under Database.

In addition to this fairly static data, the robot needs to retrieve historical stock quote information from http://chart.yahoo.com/d by filling out the form there appropriately (you need the company's stock ticker). In particular, the robot needs to fetch the following data for every day from May 1st 2000 through March 30th 2001: high price, low price, closing price, and volume. The quotes table described under Database also stores the opening price and the adjusted closing price. Note that the robot will need to fetch multiple pages to get data for every day in this range. The robot also needs to record when a stock splits so that your project 2 web site can display split-adjusted prices. The easiest approach is probably to infer the splits from the adjusted closing price provided by Yahoo.
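One way to infer splits from the adjusted closing price is to watch the ratio close / adjClose, which should stay roughly constant between splits and jump on the day a split takes effect. The sketch below rests on that assumption. The function name, the record layout, and the threshold are illustrative choices, and the threshold may need tuning if the adjusted price reflects more than splits.

```perl
use strict;
use warnings;

# Each quote is a hash ref: { date => 'YYYY-MM-DD', close => ..., adjClose => ... }.
# Returns one { date => ..., factor => ... } hash ref per detected split, where
# factor is the ratio of new shares to old (2 for a 2:1 split).
# $threshold is how far the factor must be from 1 to count as a split.
sub infer_splits {
    my ($threshold, @quotes) = @_;
    # ISO dates sort chronologically as plain strings.
    my @sorted = sort { $a->{date} cmp $b->{date} } @quotes;
    my (@splits, $prev_ratio);
    for my $q (@sorted) {
        next unless $q->{adjClose} > 0;
        my $ratio = $q->{close} / $q->{adjClose};
        if (defined $prev_ratio) {
            my $factor = $prev_ratio / $ratio;
            push @splits, { date => $q->{date}, factor => $factor }
                if abs($factor - 1) > $threshold;
        }
        $prev_ratio = $ratio;
    }
    return @splits;
}
```

For example, if a stock closes at 102 with an adjusted close of 51 on one day and at 52 with an adjusted close of 52 on the next, the ratio drops from 2 to 1 and the function reports a split with factor 2 on the second day.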

Database

As your robot fetches the information from Yahoo, it needs to talk to MySQL and populate the two tables described below. You may create the tables manually, but the robot must insert the data as it crawls.

companies table:

    Field        Type
    ticker       varchar(8)
    name         varchar(64)
    bizDesc      text
    finDesc      text
    marketCap    decimal(7,2)
    shareOut     decimal(7,2)
    shareFloat   decimal(7,2)

quotes table:

    Field        Type
    ticker       varchar(8)
    date         date
    open         decimal(9,4)
    high         decimal(9,4)
    low          decimal(9,4)
    close        decimal(9,4)
    volume       bigint(20) unsigned
    adjClose     decimal(9,4)
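As a rough sketch, the two tables and a per-quote insert could be driven from Perl through DBI as shown here. The column definitions are copied from the tables above. The function names are invented for illustration, no keys or indexes are declared because the handout does not specify any, and the backquotes around a few column names are only a precaution against clashes with SQL keywords.

```perl
use strict;
use warnings;

# Return the CREATE TABLE statements for the companies and quotes tables.
sub create_table_statements {
    my $companies = <<'SQL';
CREATE TABLE IF NOT EXISTS companies (
    ticker     varchar(8),
    name       varchar(64),
    bizDesc    text,
    finDesc    text,
    marketCap  decimal(7,2),
    shareOut   decimal(7,2),
    shareFloat decimal(7,2)
)
SQL
    my $quotes = <<'SQL';
CREATE TABLE IF NOT EXISTS quotes (
    ticker   varchar(8),
    `date`   date,
    `open`   decimal(9,4),
    `high`   decimal(9,4),
    `low`    decimal(9,4),
    `close`  decimal(9,4),
    volume   bigint(20) unsigned,
    adjClose decimal(9,4)
)
SQL
    return ($companies, $quotes);
}

# Create both tables using an open DBI database handle.
sub create_tables {
    my ($dbh) = @_;
    $dbh->do($_) for create_table_statements();
}

# Insert one day's quote; $q is a hash ref keyed by the quotes column names.
# Placeholders let DBI handle the quoting of values.
sub insert_quote {
    my ($dbh, $q) = @_;
    my $sth = $dbh->prepare_cached(
        'INSERT INTO quotes '
      . '(ticker, `date`, `open`, `high`, `low`, `close`, volume, adjClose) '
      . 'VALUES (?, ?, ?, ?, ?, ?, ?, ?)');
    return $sth->execute(
        @{$q}{qw(ticker date open high low close volume adjClose)});
}
```

Using prepare_cached means the INSERT statement is prepared once and reused for every row, which suits a robot that inserts a quote for each day as it crawls.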

What to turn in

You will need to hand in the Perl source code for your robot as well as a one-page project overview. We expect the robot code to be well commented so that its functioning is self-evident. The overview should describe the overall structure of the robot and explain any tricks you discovered while writing and running it. We will also verify the results in our database.