Page snatcher

Posted by Aaron Feng Sun, 17 Feb 2008 03:02:00 GMT

A couple of months ago I wrote a utility that downloads a web page along with its dependencies (CSS and images) to your hard drive. All the references in the web page are changed to point to your local copy.

I wrote it as a prototype in 30 to 40 minutes, so I'm sure there is room for improvement. I pointed it at a few web pages, such as Amazon, eBay, Google, and my blog, and it worked pretty well!

The code requires why the lucky stiff's Hpricot library (and the rio gem for the downloads). Without further ado, here is the code:

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'rio'

module Hpricot
  class Elem
    def is_css
      if self.name == "link"
        self["type"] == "text/css"
      else
        false
      end
    end
    def is_full_path
      if self.name == "link"
        self["href"][0..6] == "http://"
      elsif self.name == "img"
        self["src"][0..6] == "http://"
      else
        false
      end
    end
  end
end

if ARGV.size.zero?
  puts "Missing web page you wish to snatch."
  exit
end

url_scheme = "http://"
url = ARGV[0]
doc = Hpricot(open(url_scheme + url))

Dir.mkdir(url) unless File.directory?(url)

doc.search("link") do |item|
  if item.is_css
    if item.is_full_path
      rio(item['href']) > rio(url)
    else
      rio(url_scheme + url + item['href']) > rio(url)
    end

    # nested style sheets in another style sheet
    css_path = File.dirname(item['href'])
    # keep only the file name up to ".css", dropping anything after it (e.g. a query string)
    css_file = File.basename(item['href'])[/.*?\.css/m].to_s

    file = File.open(url + "/" + css_file,"r")

    inner_css = file.read.scan(/@import '(.*?\.css)';/m).flatten
    inner_css.each do |css|
      css_url = url_scheme + url + css_path + "/" + css
      rio(css_url) > rio(url)
    end
    file.close

    item['href'] = css_file
  end
end

doc.search("img") do |item|
  if item.is_full_path
    rio(item["src"]) > rio(url)
  else
    rio(url_scheme + url + item["src"]) > rio(url)
  end
  item["src"] = item["src"].split("/")[-1]
end

File.open(url + "/" + url + ".html", "w") do |file|
  file << doc.to_s
end
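
The script builds asset URLs by gluing strings together, which only works when the page sits at the site root and its links are root-relative. Here is a minimal sketch of a more forgiving approach, assuming Ruby's standard uri library. The helper name absolute_url and the example.com addresses are made up for illustration and are not part of the original script.

```ruby
require 'uri'

# Hypothetical helper: resolve a link found in a page against the page's own URL.
# Absolute links come back unchanged; relative ones are resolved the way a
# browser would resolve them.
def absolute_url(page_url, link)
  URI.join(page_url, link).to_s
end

page = "http://example.com/blog/index.html"
puts absolute_url(page, "css/site.css")                  # relative to the page's directory
puts absolute_url(page, "/images/logo.png")              # relative to the site root
puts absolute_url(page, "http://cdn.example.com/a.png")  # already absolute
```

With something like this, the is_full_path branches could collapse into a single download call, though I have not tried it against the sites mentioned above.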

Slime video and transcript

Posted by Aaron Feng Thu, 14 Feb 2008 05:30:00 GMT

Learning Lisp has been a challenge so far. I'm plowing my way through Practical Common Lisp at a decent pace though. Learning the Lisp development environment is another story. Since all the hardcore Lisp hackers use emacs with slime, I figured I would do the same. I have no previous experience with emacs, so I'm learning 3 things at once.

One night I came across Marco Baringer's slime video. After I watched the video, all the dots in my head started to connect. In the video, Marco demonstrates the usage of slime while developing a Morse code application. I kept rewinding to figure out the keystrokes he used. Recently Peter Christensen wrote a transcript for the slime video, which makes the video much easier to follow. Christensen also has a transcript for Baringer's Hello World video, which uses the UnCommon Web framework.

Besides the transcripts, Christensen is also working on an emacs/slime cheat sheet. As a noob learning Lisp, I need all the help I can get.

Coming out of the Lisp dungeon

Posted by Aaron Feng Thu, 14 Feb 2008 02:03:00 GMT

For the past few weeks I have been playing around with Lisp during pretty much all my free time. Yes, Lisp. This is not my first encounter with Lisp. I played around with it when I was in college, and I hated every moment of it. Functional languages have always been foreign to me, and I tried to stay far away from that path. I'm at the point in my career where I can no longer ignore the existence of functional languages.

Last year I decided to learn Erlang since it received a lot of buzz in the community. Shortly after learning the basics, I lost interest. Even though I no longer keep up with Erlang on a regular basis, it has eased me into the realm of functional languages.

Back in 2004, I read Hackers & Painters by Paul Graham. The chapter entitled "Beating the Averages" has stuck in my mind since I first read it. In this chapter, Graham describes how he was able to overtake the big corporations during the internet boom using an unconventional programming language: Lisp. I decided to give the chapter another read to refresh my memory. After reading it again, I was inspired to give Lisp another try. To set up my Lisp environment, I installed emacs, slime, and sbcl on my MacBook.

After going through a few online Common Lisp tutorials, I just couldn't get enough of it. The more I learned about it, the deeper I wanted to understand. Eventually I stumbled onto the book Practical Common Lisp by Peter Seibel. The book is very well written and easy to read. By the nature of the language, Lisp tends to be more theoretical. Seibel connects the theoretical with the practical, which makes the book relevant and enjoyable. Best of all, the whole book is freely available online. Even if you have no desire to learn Lisp, just read the first 3 chapters (they are very short). Who knows, you might just continue to read the whole thing.

Project Euler Solutions 1 - 5

Posted by Aaron Feng Mon, 31 Dec 2007 23:49:00 GMT

Click on each problem for a more detailed solution.

1. Add all the natural numbers below 1000 that are multiples of 3 or 5.

start = Time.now
total = 0
(1...1000).each do |n|
  total += n if (n % 3).zero? or (n % 5).zero?
end

puts "Took: #{Time.now - start} seconds"
puts total

Took: 0.000977 seconds
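
The loop above is plenty fast, but the same sum can also be worked out with the arithmetic series formula and inclusion-exclusion. This is a sketch of that alternative rather than the approach used in this post. The method names are mine, and the small example is the one given in the problem statement (multiples below 10).

```ruby
# Sum of the positive multiples of k that are strictly below limit,
# using k * (1 + 2 + ... + m) = k * m * (m + 1) / 2.
def sum_of_multiples_below(k, limit)
  m = (limit - 1) / k          # how many multiples of k are below limit
  k * m * (m + 1) / 2
end

# Inclusion-exclusion: multiples of 15 are counted under both 3 and 5,
# so they are subtracted once.
def sum_3_or_5_below(limit)
  sum_of_multiples_below(3, limit) +
    sum_of_multiples_below(5, limit) -
    sum_of_multiples_below(15, limit)
end

puts sum_3_or_5_below(10)  # the problem statement's example: 3 + 5 + 6 + 9
```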

2. Find the sum of all the even-valued terms in the Fibonacci sequence which do not exceed one million.

start = Time.now
def fib(n1, n2, total)
  return total if n2 > 1000000 
  total += n2 if (n2 % 2).zero? 
  fib(n2, n1 + n2, total)
end

# call fib before reading the clock so the timing covers the computation
result = fib(1, 2, 0)
puts "Took: #{Time.now - start} seconds"
puts result

Took: 1.0e-05 seconds

3. Find the largest prime factor of 317584931803.

def next_prime(start_num, max_num)
  is_prime = true
  prime = 0
  (start_num + 1..max_num).each do |n|
    prime = n
    (2..n - 1).each do |nn| 
      if (n % nn).zero? then is_prime = false; break; end
    end
    if is_prime then break; end
    is_prime = true
  end
  prime
end

start = Time.now
n = 1
biggest_prime = 0
num = 317584931803

# divide out each prime factor completely before moving on to the next prime
while num != 1
  n = next_prime(n, num)
  while (num % n).zero?
    biggest_prime = n
    num = num / n
  end
end

puts "Took: #{Time.now - start} seconds"
puts biggest_prime

Took: 0.477182 seconds
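
The solution above checks each candidate for primality before dividing. A common alternative is plain trial division, which skips the primality test entirely. This is a sketch of that technique, not the code that produced the timing above. The method name is mine, and 13195 is the example from the problem statement.

```ruby
# Largest prime factor by trial division: each factor is stripped out
# completely before moving on, so every divisor that succeeds must be prime.
def largest_prime_factor(num)
  largest = nil
  factor = 2
  while factor * factor <= num
    if (num % factor).zero?
      largest = factor
      num /= factor
    else
      factor += 1
    end
  end
  # Whatever is left above 1 is itself prime and larger than any factor found.
  num > 1 ? num : largest
end

puts largest_prime_factor(13195)  # 13195 = 5 * 7 * 13 * 29
```

Stopping at the square root is what keeps this cheap: once factor * factor exceeds the remaining number, that number has no smaller divisors left.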

4. Find the largest palindrome made from the product of two 3-digit numbers.

start = Time.now
result = 0
left = 0
right = 0

(100...1000).to_a.reverse.each do |l|
  (100...1000).to_a.reverse.each do |r|
    temp = (l * r).to_s 

    if temp == temp.reverse and temp.to_i > result
      result = temp.to_i
      left = l
      right = r
    end     
  end
end

puts "Took: #{Time.now - start} seconds"
puts "#{left} * #{right} = #{result}"

Took: 1.501993 seconds

5. What is the smallest number divisible by each of the numbers 1 to 20?

def is_prime(num)
  if num < 2  then return false; end
  (2..num - 1).each do |n|
    if (num % n).zero?
      return false
    end
  end
  return true
end

def smallest_factor(num)
  (2..num - 1).each do |n|
    if (num % n).zero? then return n; end
  end
  return num
end

start = Time.now
result = 1 
(1..20).each do |n|
  if is_prime(n) 
    result = result * n
  elsif not (result % n).zero? 
    result = result * smallest_factor(n) 
  end
end

puts "Took: #{Time.now - start} seconds"
puts result

Took: 0.00018 seconds
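
Another way to look at this problem: the answer is the least common multiple of 1 through 20. Newer Ruby versions ship Integer#lcm in core, so a sketch of that approach is very short. This is an alternative, not the code timed above, and I am assuming a Ruby that has Integer#lcm built in.

```ruby
# Fold the range with Integer#lcm: lcm(1, 2, ..., n) is the smallest number
# that every integer from 1 to n divides evenly.
def smallest_multiple(n)
  (1..n).reduce(1) { |acc, k| acc.lcm(k) }
end

puts smallest_multiple(10)  # the problem statement's example covers 1 to 10
```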

Project Euler

Posted by Aaron Feng Tue, 25 Dec 2007 03:54:00 GMT

A while back my friend James Horsley told me about Project Euler, and I just pushed it onto my stack of things to look into. Recently Steve Eichert reminded me of it at work, so I decided to give it a try.

Project Euler contains a collection of mathematical problems of varying difficulty. A problem can be solved with pencil and paper or with a computer program. The only requirement for a program is that it should run in under one minute. It's an honor system, because all you need to submit is the answer.

My weapon of choice is Ruby. I have been spectating Ruby for the past 5 years (not entirely true, but mostly), so I figured it's time to roll up my sleeves. I'll be posting my solutions in batches for those who are interested. In addition to posting the solutions, I'll also post the amount of time each solution took. However, I will not post the answers. You can run the code on your own machine if you wish to see them. This way I will not ruin it for people who are interested in solving the problems themselves. All the code has been run on my MacBook with a 2.0 GHz Intel Core 2 Duo and Ruby 1.8.
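
For anyone who wants to time their own solutions, here is a small sketch that wraps the measurement in one place using Benchmark.realtime from Ruby's standard library. The helper name timed and the sum-of-squares workload are invented for illustration. It is not the timing code used for the posted solutions.

```ruby
require 'benchmark'

# Hypothetical helper: run a block, print how long it took, and return its result.
def timed(label)
  result = nil
  seconds = Benchmark.realtime { result = yield }
  puts "#{label} took: #{seconds} seconds"
  result
end

# Placeholder workload, not a Project Euler answer.
total = timed("sum of squares") { (1..1000).inject(0) { |sum, n| sum + n * n } }
puts total
```

Because the block's result is captured inside the timed region, the computation cannot accidentally run after the clock has already been read.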

Hardcore Erlang

Posted by Aaron Feng Thu, 29 Nov 2007 03:01:00 GMT

There is definitely a lot of momentum behind Erlang recently, and more is about to come. A few months ago Joe Armstrong released Programming Erlang, which set off the initial Erlang awareness for many people, including myself. Earlier this month, Channel 9 posted two videos with Armstrong on Erlang (part1 and part2).

Now another Erlang book is in progress: Hardcore Erlang by Joel Reymont. It is another Pragmatic Programmers book. The initial project for the book was a poker server, but now the focus is on a stock exchange program. A quote from Reymont:

"So lets build a stock exchange! Not just any stock exchange but one running on the biggest Erlang cluster in the world. This cluster does not exist yet but can be put together on a moments notice, using Amazon EC2."

This book might just keep Erlang on the hotness list for the year of 2008.

Another reason for Rhino Mocks Generic Constraint

Posted by Aaron Feng Mon, 08 Oct 2007 11:22:00 GMT

Jeffrey Palermo recently had a post about Generic Constraints for Rhino Mocks - make unit tests more readable. I would like to touch on another reason why you might want to use a Generic Constraint. I'll reiterate Jeffrey's example before I start, with minor modifications, and offer alternative ways the tests could be written.

Jeffrey implemented a GenericConstraint class to capture the parameter of a method call on a mock object. The code below resembles his original example:

[Test]
public void ShouldSaveObjectWithAllInformation() {
    string firstName = "Aaron";
    string lastName = "Feng";

    MockRepository mocks = new MockRepository();
    IPersonRepository personRepository = mocks.CreateMock<IPersonRepository>();

    personRepository.SavePerson(null);
    GenericConstraint<Person> personConstraint = new GenericConstraint<Person>();
    LastCall.On(personRepository).Constraints(personConstraint);

    mocks.ReplayAll();

    PersonController controller = new PersonController(personRepository);
    controller.PersonFirstName = "Aaron";
    controller.PersonLastName = "Feng";
    controller.Save();

    mocks.VerifyAll();

    Person person = personConstraint.GetParameterObject();
    Assert.AreEqual(person.FirstName, firstName);
    Assert.AreEqual(person.LastName, lastName);
}

Once personConstraint has captured the Person that is being saved, he asserts that the values are as expected. This makes the test look more like a typical unit test.

Jeffrey's goal was to avoid the following code:

public delegate void Proc<P>(P p);

[Test]
public void ShouldSaveObjectWithAllInformationUsingBuildInConstraint() {
    string firstName = "Aaron";
    string lastName = "Feng";

    MockRepository mocks = new MockRepository();
    IPersonRepository personRepository = mocks.CreateMock<IPersonRepository>();

    personRepository.SavePerson(null);
    LastCall.On(personRepository).IgnoreArguments().Do(
        new Proc<Person>(delegate(Person person) {
            Assert.AreEqual(person.FirstName, firstName);
            Assert.AreEqual(person.LastName, lastName);
        })
    );

    mocks.ReplayAll();

    PersonController controller = new PersonController(personRepository);
    controller.PersonFirstName = "Aaron";
    controller.PersonLastName = "Feng";
    controller.Save();

    mocks.VerifyAll();
    // Notice no asserts
}

An astute reader might say: "Hey you don't have to do that, just implement the Equals method on the Person class." Which would look like the following:

[Test]
public void ShouldSaveObjectWithAllInformationUsingEquals() {
    MockRepository mocks = new MockRepository();
    IPersonRepository personRepository = mocks.CreateMock<IPersonRepository>();

    // Have to implement Equals on Person
    personRepository.SavePerson(new Person("Aaron", "Feng"));

    mocks.ReplayAll();

    PersonController controller = new PersonController(personRepository);
    controller.PersonFirstName = "Aaron";
    controller.PersonLastName = "Feng";
    controller.Save();

    mocks.VerifyAll();
    // Notice no asserts again
}

The last example relies on implementing Equals on the Person object, which makes the test look too clean: the asserts are invisible. On top of that, you have to implement an Equals method on an object even though the real system might never call it. I believe this is the real power behind Jeffrey's Generic Constraint approach. One should avoid writing code that the real system does not use just to satisfy a test.

List comprehension kicks ass

Posted by Aaron Feng Sat, 29 Sep 2007 02:50:00 GMT

I've been on an Erlang high recently, so I have tried to play around with it as much as I can. It's very common for an application to create a new list based on an existing list. For example, in C# you would do something like the following:

public List<string> QualifiedUserNames(List<User> users) {
  List<string> names = new List<string>();
  foreach(User user in users) {
    if(user.Age >= 30) {
      names.Add(user.Name);
    }
  }
  return names;
}

Equivalent code in Erlang:

qualified_user_names(Users) -> [Name || {user,{name,Name},{age,Age}} <- Users, Age >= 30].

The Erlang function uses a list comprehension to do all the dirty work. It loops through every item in the Users list and extracts only the items of the user "type", meaning those that match the pattern {user,{name,Name},{age,Age}}. The pattern is needed because Erlang is dynamically typed, so the list doesn't have to contain homogeneous items. Age >= 30 is a predicate that checks whether the user qualifies, and if so, the Name is added to the newly created list.
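
For comparison, here is a rough Ruby analog of the same filter-and-extract step. It is only a sketch: the nested-array record layout, the sample data, and the method names are invented to mirror the Erlang tuples, and Ruby needs an explicit shape check where Erlang's pattern does the matching.

```ruby
# Mixed list, mirroring the Erlang example: only entries shaped like
# [:user, [:name, name], [:age, age]] are treated as users.
records = [
  [:user, [:name, "Ann"], [:age, 34]],
  [:user, [:name, "Bob"], [:age, 25]],
  [:group, "admins"],                    # skipped: does not match the shape
  [:user, [:name, "Cy"], [:age, 30]]
]

# Stand-in for the Erlang pattern {user,{name,Name},{age,Age}}.
def user_record?(record)
  record.is_a?(Array) && record.size == 3 && record[0] == :user &&
    record[1].is_a?(Array) && record[1][0] == :name &&
    record[2].is_a?(Array) && record[2][0] == :age
end

# Keep matching users aged 30 or more, then pull out their names.
def qualified_user_names(records)
  records.select { |r| user_record?(r) && r[2][1] >= 30 }
         .map { |r| r[1][1] }
end

p qualified_user_names(records)
```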

Pretty cool, right? I think so. This capability is one of the many reasons why Erlang programs are usually shorter than programs written in other languages. Well, back to programming in Erlang some more.

The beauty of code

Posted by Aaron Feng Wed, 29 Aug 2007 01:15:00 GMT

A few days ago, Steve Eichert sent me a link to Marcel Molina's presentation at Ruby Hoedown 2007. Marcel explores what makes code beautiful. He lists the following three attributes:

  • Proportion
  • Integrity
  • Clarity

Proportion references the amount of code needed to make a feature work. You wouldn't expect 27 lines of code to multiply two numbers together. The code has integrity if it doesn't break down under non-trivial cases. Lastly, the code should be clear as to what it is trying to accomplish.

This is a very simple and elegant way to describe beautiful code. Watch the video if you want to see how Marcel came up with the assertions.

JRuby Dilemma

Posted by Aaron Feng Mon, 16 Jul 2007 22:44:00 GMT

I submitted the following code to the JRuby mailing list, and I haven't received any response yet. Can you spot the problem?

package my;
import java.util.Vector;

public class MyClassInJava {
    Vector vector;

    public MyClassInJava(java.util.Vector vector) {
        this.vector = vector;
    }

    public Object getVector() {
        return vector;
    }
}

Here is my Ruby code, which calls the Java code above:

include Java
require 'my.jar'

class MyVector < java.util.Vector
  def my_method
  end
end

class MyRuby
  def initialize
    my_vec = MyVector.new
    c = Java::my.MyClassInJava.new(my_vec)
    vec_from_java = c.getVector()

    if vec_from_java.respond_to?(:my_method)
      puts "found"
    else
      puts "not found"
      puts vec_from_java.java_class
    end

  end
end

r = MyRuby.new

The output from the Ruby code:

not found

org.jruby.javasupport.proxy.gen.Vector$Proxy0
