Cloud Zone is brought to you in partnership with:

I am currently working as a Principal Architect at 8KMiles heading the AWS practice. Heavily involved in Architecture consulting, I help companies build systems that Scale and deliver the value of Cloud Computing to Startups, SMBs and enterprises. I also love to evangelize about Cloud Computing and talk about problems at scale and designing solutions for them by regularly presenting at conferences, users groups, webinars and student workshops. Raghuraman is a DZone MVB and is not an employee of DZone and has posted 13 posts at DZone. You can read more from them at their website. View Full User Profile

Log archive & analysis with Amazon S3 and Glacier - Part II

  • submit to reddit

In the previous post, we saw how to configure logging in AWS CloudFront and start collecting access logs. Let's now move on to the next tier in the architecture - web/app. The following are the key considerations for  logging in web/app layer:

  • Local Log Storage - choosing the local storage for logging. Using a storage option that is sufficient, cost-effective and meets the performance requirement for logging
  • Central Log Storage - how do we centrally store log files for log analysis in future
  • Dynamic Infrastructure - how do we collect logs from multiple servers that are provisioned on demand
Local Log Storage Except very few cases, EBS-backed Instances are the most sought after Instance type. They launch quickly and easier to build Images out of them. But they come with couple of limitations from logging perspective
  • Limited storage - EBS-backed AMIs that are provided by AWS or third-party providers come with limited storage. For example, a typical RHEL AMI comes with around 6GB of EBS attached as the root partition. Similarly Windows AMI come with 30GB EBS attached as C:\
  • Growing EBS - log files tend to grow faster. And it becomes difficult to grow the root EBS (or an additional EBS) as the log file sizes grow
  • Performance - any I/O operation on an EBS Volume is over the network. And it tends to be slower than local disk writes. Specifically for logging, it is always better to remove the I/O bottleneck. Otherwise lot of system resources could be spent towards logging
Every EC2 Instance comes with ephemeral storage. These are local storage directly attached to the host on which the Instance is running. Ephemeral storage do not persist between stop-start cycles of an Instance (EBS-backed) but they are available when the Instance is running and persist during reboots. There are couple of advantages of Ephemeral storage:
  • They are locally attached on the physical host on which the Instance runs and hence have better I/O throughput when compared to EBS Volumes
  • They come in pretty good size - for example a m1.large Instance comes with 850GB of ephemeral storage
  • And it comes free of cost - you aren't charged per GB or for any I/O operations on the ephemeral storage unlinke EBS
This makes ephemeral storage the ideal candidate for storing log files. For an EBS-backed Instance, the ephemeral storage is not mounted and readily available. Hence one needs to follow the following steps to start using the ephemeral storage for storing log files
  • The logging framework usually comes with a configuration file to configure logging parameters. The log file path needs to be configured to point to the ephemeral storage mount directory that we create below
  • All application related files (such as binaries, configuration files, web/app server) will be installed on the root EBS. Before the final AMI is created, the ephemeral storage needs to be setup and configured
  • Run fdisk to list all the storage devices that are mounted
fdisk -l

Created a directory such as "/applogs". This is the directory where the ephemeral storage will be mounted

mkdir /var/log/applogs

Mount the storage device in this directory using the "mount" command

mount /dev/xvdj /var/log/applogs

Add "fstab" entries so that the ephemeral storage is mounted in the same directory after stop/start or when new Instances are launched out of this AMI

/dev/xvdj  /data xfs defaults,noatime,nobarrier,allocsize=64k,logbufs=8 0 2
/dev/xvdj /data    ext3    defaults        0   0

The last step is essential especially from AutoScaling point of view. When AutoScaling launches new Instances, the ephemeral storage needs to be automatically mounted in the directory so that the application can start logging. Now, we can go ahead create the final AMI and launch Instances from them. The new Instances will have the ephemeral storage automatically mounted in the "/var/log/applogs" directory and applications can start storing the log files in them.


Published at DZone with permission of Raghuraman Balachandran, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)