Developer with experience in a variety of different systems and technologies, with a customer focus and balance with business goals. Particularly interested in backend and large scale systems, and also interested in high level architecture, and API design. Always open to feedback in order to keep learning and improving as a professional. Rodrigo is a DZone MVB and is not an employee of DZone and has posted 37 posts at DZone. You can read more from them at their website.

An S3 File Bucket Downloader Written in Ruby

06.06.2012

Today I wanted to download files from a website that, as I happened to find out, stores all of its files in S3. When I accessed the website root, I realized that the response was simply the output of an S3 ListBucket API call. For instance:

    <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">  
       <Name>foo.com</Name>  
       <Prefix/>  
       <Marker/>  
       <MaxKeys>1000</MaxKeys>  
       <IsTruncated>true</IsTruncated>  
       <Contents>  
          <Key>file/1</Key>  
          <LastModified>2011-06-09T06:29:02.000Z</LastModified>  
          <ETag>"5cb3930839817ff4a5c1ddf08e3fea1e"</ETag>  
          <Size>1440231</Size>  
          <StorageClass>STANDARD</StorageClass>  
       </Contents>  
       <Contents>  
          <Key>file/2</Key>  
          <LastModified>2011-06-09T06:29:18.000Z</LastModified>  
          <ETag>"96fdc94d14b6d9817f80ac1e9e2049b4"</ETag>  
          <Size>1310</Size>  
          <StorageClass>STANDARD</StorageClass>  
       </Contents>  
    </ListBucketResult>  
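
As a rough sketch of how such a listing can be read in Ruby, the snippet below parses an abbreviated copy of the response above with REXML and pulls out each key, its size, and the `IsTruncated` flag. The names `SAMPLE_LISTING` and `parse_listing` are made up for this illustration and are not part of any library.

```ruby
require 'rexml/document'

# Abbreviated copy of the listing shown above (illustrative sample data).
SAMPLE_LISTING = <<~XML
  <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Name>foo.com</Name>
    <IsTruncated>true</IsTruncated>
    <Contents>
      <Key>file/1</Key>
      <Size>1440231</Size>
    </Contents>
    <Contents>
      <Key>file/2</Key>
      <Size>1310</Size>
    </Contents>
  </ListBucketResult>
XML

# Hypothetical helper: turns the listing XML into a plain Ruby hash.
def parse_listing(xml)
  doc = REXML::Document.new(xml)
  entries = []
  doc.elements.each('ListBucketResult/Contents') do |contents|
    entries << {
      key: contents.elements['Key'].text,
      size: contents.elements['Size'].text.to_i
    }
  end
  # IsTruncated is "true" when the bucket holds more keys than this page lists.
  truncated = doc.elements['ListBucketResult/IsTruncated'].text == 'true'
  { entries: entries, truncated: truncated }
end

listing = parse_listing(SAMPLE_LISTING)
listing[:entries].each { |entry| puts "#{entry[:key]} (#{entry[:size]} bytes)" }
puts "More keys available: #{listing[:truncated]}"
```

Note that `IsTruncated` is `true` in the example response, which means the listing contains only the first `MaxKeys` entries of the bucket.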

To download all the files quickly, I wrote the following Ruby program, which fetches every file listed in that response. I hope it can be useful for others:

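    require 'net/http'
    require 'rexml/document'
    require 'fileutils'

    baseurl = 'foo.com'

    # Get the bucket listing (XML) from the website root as a string
    xml_data = Net::HTTP.get_response(URI.parse("http://" + baseurl)).body

    # Parse the listing and make sure the output directory exists
    doc = REXML::Document.new(xml_data)
    FileUtils.mkdir_p("images")

    # Reuse one HTTP connection to download every key in the listing.
    # Note: only the keys on this first listing page (up to MaxKeys) are fetched.
    Net::HTTP.start(baseurl) do |http|
      doc.elements.each('ListBucketResult/Contents/Key') do |ele|
        puts "Downloading " + ele.text
        resp = http.get("/" + ele.text)
        # Flatten the key into a file name; ".jpg" assumes the files are JPEG images
        File.open("images/" + ele.text.gsub("/", "_") + ".jpg", "wb") do |file|
          file.write(resp.body)
        end
      end
    end
    puts "Done"
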
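
Because the example listing reports `IsTruncated` as `true`, a single request only returns the first page of keys. The following is a minimal sketch, not a definitive implementation, of how the remaining pages could be collected: it assumes the bucket answers the S3 `marker` query parameter and uses the last key of each page as the next marker. The names `all_keys` and `http_fetcher` are hypothetical, and `foo.com` is the placeholder host from the program above.

```ruby
require 'net/http'
require 'rexml/document'
require 'uri'

# Hypothetical helper: collects every key by following the listing page by page.
# fetch_page is any callable that takes a marker (nil for the first page)
# and returns the listing XML as a string.
def all_keys(fetch_page)
  keys = []
  marker = nil
  loop do
    doc = REXML::Document.new(fetch_page.call(marker))
    page = []
    doc.elements.each('ListBucketResult/Contents/Key') { |ele| page << ele.text }
    keys.concat(page)
    truncated = doc.elements['ListBucketResult/IsTruncated']
    # Stop when the listing is complete (or a page is unexpectedly empty).
    break unless truncated && truncated.text == 'true' && !page.empty?
    # Assumption: the next page starts after the last key of this page.
    marker = page.last
  end
  keys
end

# Fetcher that asks the website root for one page of the listing.
http_fetcher = lambda do |marker|
  uri = URI::HTTP.build(host: 'foo.com', path: '/')
  uri.query = URI.encode_www_form(marker: marker) if marker
  Net::HTTP.get_response(uri).body
end

# Usage (performs network requests, so it is left commented out here):
# all_keys(http_fetcher).each { |key| puts key }
```

Passing the fetcher in as a callable keeps the paging logic separate from the network code, so it can be tried out with canned XML strings before pointing it at a real bucket.
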
Published at DZone with permission of Rodrigo De Castro, author and DZone MVB. (source)