Logging in Practice
In the excerpt "Continuous movement of log and binary files..." from Hadoop in Practice by Alex Holmes, the broad class of things called log files is discussed. The excerpt hints both at how widely available log files are and at the overwhelming amount of data they represent. It is this "overwhelming" aspect of processing logs that Holmes takes up; naturally, since the book is about Hadoop, he goes on to discuss how Hadoop can be used to manage these potentially vast data streams.
While using Hadoop's HDFS and associated tools to move and manage vast amounts of data might seem like a magical solution, it is an approach that requires substantial resources, and not every company is there yet. Many companies are simply too small; others have no use for Hadoop yet. But every company with an online presence (and many without one) should keep good log data! Good management, good backups, and regular checking of log files: a systematic approach to log data management can be enough when data streams are not too large.
My favorite use for log data is debugging. Just getting the chance to crawl around in log files can deliver quite a few unique insights. Simply learning what a healthy application "looks" like in its logs can be helpful, and it is a good complement to measuring baselines regularly. The two techniques often work well together, with one acting as the "canary" that flags an anomaly before the other detects it.
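As a rough illustration of what learning the "look" of a healthy application could mean in practice, here is a minimal sketch that summarizes the share of lines at each log level and compares a new sample against a known-good one. The log format (a level keyword somewhere in each line), the function names, and the 5% tolerance are all assumptions made for the example, not anything taken from Holmes's book.

```python
from collections import Counter
import re

# Assumed log format: each line contains one standard level keyword.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")

def level_profile(lines):
    """Return the fraction of matched lines at each log level."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()} if total else {}

def deviations(baseline, current, tolerance=0.05):
    """List levels whose share differs from the baseline by more than `tolerance`."""
    levels = set(baseline) | set(current)
    return sorted(
        level for level in levels
        if abs(current.get(level, 0.0) - baseline.get(level, 0.0)) > tolerance
    )
```

In use, `level_profile` would be run once over logs from a period you trust, and then periodically over recent logs; any level returned by `deviations` is a hint to go and read the actual lines.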
When data is "sparse," with lots of small writes to logs spread out over time, it can be harder to know when something is happening, which is where notifications come in. Whether by pager, email, or text message, notifications can be key to handling logging properly, especially for events that can grow quickly if not handled rapidly. While there are always companies out there with better "solutions," the key is that there is no one-size-fits-all solution. Pay attention to how well your system is working and to where the common pain points seem to be. If you find a "solution" that matches well, both with where your pain points are and with how many you have, then by all means, go for it. Your Operations and Engineering teammates will thank you.
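For readers who want something concrete, the notification idea can be sketched as a small sliding-window counter: it stays quiet while events are sparse and fires once when they start to cluster. The class name `BurstAlert`, the default threshold and window, and the `notify` callback (which could wrap a pager, email, or text-message sender) are hypothetical choices for this example, not a recommendation of any particular product.

```python
from collections import deque

class BurstAlert:
    """Call `notify` once when `threshold` events fall within `window_seconds`."""

    def __init__(self, notify, threshold=5, window_seconds=60.0):
        self.notify = notify              # hypothetical callback: takes one message string
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.events = deque()             # timestamps of recent events, oldest first
        self.alerting = False             # True while a burst is in progress

    def record(self, timestamp):
        """Record one event (timestamp in seconds, non-decreasing)."""
        self.events.append(timestamp)
        # Drop events that have aged out of the window.
        while self.events and timestamp - self.events[0] > self.window_seconds:
            self.events.popleft()
        if len(self.events) >= self.threshold:
            if not self.alerting:         # notify only at the start of a burst
                self.alerting = True
                self.notify(
                    f"{len(self.events)} events in the last {self.window_seconds:.0f}s"
                )
        else:
            self.alerting = False
```

Notifying only at the start of a burst is a deliberate choice here: an alert that repeats on every event tends to be ignored, which defeats the purpose of sending it.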