Infrastructure metrics

AWS  

If you are running on AWS, we recommend setting up CloudWatch alarms for the following metrics:

  • AWS Aurora

    • Alert when CPUUtilization is above 80%. Reference here.

    • Alert when FreeableMemory is below 80% of the total memory. Reference here.

  • AWS ElastiCache

    • Alert when DatabaseMemoryUsagePercentage is above 80%. Reference here.

    • Alert when EngineCPUUtilization is above 80%. Reference here.

  • AWS ALB

    • Alert when UnHealthyHostCount is above 50% of the desired host count for over 2 minutes. For example, if you have set the desired task count for tines-app to 2, set the threshold for UnHealthyHostCount to 1. Reference here.

    • Alert when HTTPCode_ELB_502_Count is above 5 requests. This metric indicates that your load balancer cannot successfully route requests to its backends and your traffic has been dropped. Reference here.

      • If you see frequent occurrences of this alert, increase your desired task count.

  • AWS ECS Fargate

    • Alert when CPUUtilization is consistently (5 minutes or more) above 80%. Note: this could also be a sign that you may need to increase the number of tasks on the service. In other words, scale horizontally. Reference here.

If an alert fires frequently and any of these metrics stays consistently above its threshold, it's best to scale up the instance type. For example, if you are on db.r7g.large, you should upgrade the Aurora cluster to db.r7g.xlarge.
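
As a rough illustration, the alarms listed above could be expressed as parameter sets for CloudWatch's PutMetricAlarm API. This is a minimal sketch, not an official configuration: the alarm names and resource identifiers are hypothetical placeholders, and the statistics and evaluation windows are assumptions wherever the list above does not state a duration. FreeableMemory is left out because CloudWatch reports it in bytes, so its threshold depends on the memory of your instance class.

```python
from __future__ import annotations

from typing import Any


def alarm(name: str, namespace: str, metric: str, threshold: float,
          dimensions: dict[str, str], statistic: str = "Average",
          period: int = 60, evaluation_periods: int = 5) -> dict[str, Any]:
    """Build keyword arguments for a CloudWatch PutMetricAlarm request.

    The defaults (1-minute period, 5 evaluation periods) are an assumption
    that models "consistently above the threshold for 5 minutes".
    """
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Statistic": statistic,
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
    }


def unhealthy_host_threshold(desired_task_count: int) -> float:
    """Return 50% of the desired task count (2 desired tasks -> 1)."""
    return desired_task_count * 0.5


# Hypothetical identifiers; replace them with your own resources.
ALB = {"LoadBalancer": "app/tines/0123456789abcdef"}
ALB_TARGETS = {**ALB, "TargetGroup": "targetgroup/tines-app/0123456789abcdef"}
REDIS = {"CacheClusterId": "tines-redis-001"}

ALARMS = [
    alarm("aurora-cpu-high", "AWS/RDS", "CPUUtilization", 80,
          {"DBClusterIdentifier": "tines-aurora"}),
    alarm("redis-memory-high", "AWS/ElastiCache",
          "DatabaseMemoryUsagePercentage", 80, REDIS),
    alarm("redis-engine-cpu-high", "AWS/ElastiCache",
          "EngineCPUUtilization", 80, REDIS),
    # "For over 2 minutes": two consecutive 1-minute periods.
    alarm("alb-unhealthy-hosts", "AWS/ApplicationELB", "UnHealthyHostCount",
          unhealthy_host_threshold(2), ALB_TARGETS,
          statistic="Maximum", evaluation_periods=2),
    # Counting 502s per minute is an assumption; the list gives no window.
    alarm("alb-502-count", "AWS/ApplicationELB", "HTTPCode_ELB_502_Count",
          5, ALB, statistic="Sum", evaluation_periods=1),
    alarm("ecs-cpu-high", "AWS/ECS", "CPUUtilization", 80,
          {"ClusterName": "tines", "ServiceName": "tines-app"}),
]
```

If you use boto3, each entry could be passed to the CloudWatch client as `put_metric_alarm(**params)`. You would normally also attach a notification target through `AlarmActions`, which is omitted here.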

Non-AWS setup  

For non-AWS setups, our recommendations are similar to those for AWS setups. For example:

  • You should set up monitoring that alerts when your storage system is using more than 80% of its total storage.

  • You should set up monitoring that alerts when the CPU utilization of your compute systems is consistently above 80%.
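
Outside AWS, the same two checks can be approximated with a small script. The sketch below uses only the Python standard library. It assumes you already collect CPU utilization readings (in percent) from your own monitoring agent, and it treats "consistently" as five consecutive readings above the threshold, which is an assumption rather than a fixed rule. The function names are illustrative.

```python
import shutil
from typing import Sequence

THRESHOLD_PERCENT = 80.0


def storage_used_percent(path: str = "/") -> float:
    """Return the percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def consistently_above(samples: Sequence[float],
                       threshold: float = THRESHOLD_PERCENT,
                       min_samples: int = 5) -> bool:
    """True when the latest `min_samples` readings all exceed `threshold`."""
    recent = samples[-min_samples:]
    return len(recent) == min_samples and all(s > threshold for s in recent)


if __name__ == "__main__":
    if storage_used_percent("/") > THRESHOLD_PERCENT:
        print("ALERT: storage usage is above 80%")

    # Hypothetical one-minute CPU readings from your monitoring agent.
    cpu_samples = [85.0, 91.2, 88.4, 83.0, 90.1]
    if consistently_above(cpu_samples):
        print("ALERT: CPU utilization is consistently above 80%")
```

Requiring several consecutive readings avoids alerting on a single short spike, which mirrors the "5 minutes or more" condition used for ECS Fargate above.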
