Thanks for the deep dive. One approach I have seen systems take is to have the async logger write to a short-lived file on disk (rotated every 5 minutes or 1 hour) and let a daemon running on the same instance, independent of the application, take care of flushing those logs to the stream/queue system over the network. This way: (1) the company can have a centralized log-publishing system decoupled from the application's language etc.; (2) Kafka/Kinesis failures are handled with retries in the daemon without bothering the application; (3) application crashes won't affect logging (especially the info needed to debug the crash), since the daemon continues publishing as long as the instance is up; and (4) it allows batching to avoid too many network calls.
The logs are batched in memory and sent to Kafka. How is the partition key chosen here? And how will consumers know whether to persist a log in the hot, warm, or cold tier? Is it pushed to the hot tier first, with a separate daemon process transferring the logs to the warm and cold tiers? Also, Kafka's retention holds for 7 days by default (it can be extended), but increasing it also increases Kafka's storage footprint.
Those who do not know syslog are doomed to reinvent it.
Another key thing: if a log is not going to be emitted due to its log level, don't do the string operations to build a message that will never be sent.
Also, the most expensive floating-point operation is printf, so only log floating-point quantities when essential.
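A quick illustration of both points using Python's standard `logging` module (assuming Python here purely as an example; the same idea applies to any logging framework with level guards):

```python
import logging

logger = logging.getLogger("example")
logger.setLevel(logging.WARNING)  # DEBUG records will be dropped

values = [0.1, 0.2, 0.3]

# Wasteful: the f-string (including the float formatting) is built
# eagerly, even though the DEBUG record is discarded:
#   logger.debug(f"stats: mean={sum(values) / len(values):.6f}")

# Better: %-style formatting is deferred until the record is actually
# emitted (though the arguments themselves are still evaluated).
logger.debug("stats: mean=%.6f", sum(values) / len(values))

# Best for genuinely expensive work: guard it with an explicit level
# check so nothing is computed at all when the level is disabled.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("stats: sorted=%s", sorted(values))
```

The `isEnabledFor` guard is the one that fully avoids both the string work and the float formatting when the message would be dropped anyway.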
Really strong breakdown of how logging becomes its own infrastructure problem. The point about treating logs as a firehose instead of structured data really captures why most teams hit a wall. One thing that often gets underestimated is the query cost later on: even with hot/warm/cold storage, people don't realize they're still indexing way too much in the hot tier. Sampling or filtering earlier can save tons of compute.
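For instance, head-based sampling ahead of the indexer can be as simple as the sketch below (hypothetical function, and the levels/rate are assumptions): keep every error and warning, but let only a fraction of routine INFO records reach the hot tier.

```python
import random


def should_index_hot(record, info_sample_rate=0.05):
    """Decide whether a log record goes to the hot (indexed) tier.

    Errors and warnings are always kept; INFO is sampled at a fixed
    rate, so the hot tier only indexes ~5% of routine traffic.
    """
    if record["level"] in ("ERROR", "WARN"):
        return True
    return random.random() < info_sample_rate
```

Records that fail the check can still be written unindexed to the warm/cold tiers, so nothing is lost, only the expensive hot-tier indexing is reduced.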
Nice article.
Great solution. Looking forward to seeing the hot warm cold solution!