Everything monitored. Quickly gain a complete view of your IT infrastructure, no matter how complex. Checkmk provides powerful monitoring of networks, servers, clouds, containers and applications. Fast. Effective.

Checkmk provides features for every need

Checkmk is a comprehensive solution for monitoring applications, servers, and networks. Its vast set of features was designed in collaboration with our customers over many years. Checkmk is easy to learn and use, but powerful enough for the most complex IT environments.

Automate monitoring to save time

  • Add new components with less effort using automatic detection and configuration. You don't need to tell Checkmk that a firewall is a firewall: it will recognize it and provide monitoring for all relevant components, along with their metrics and thresholds.

  • Automate monitoring of dynamic, ephemeral infrastructure. Containers, pods, VMs and more can be added and removed from monitoring automatically.

  • Use modern rule-based 1-to-N configuration, which remains intuitive even in complex environments and results in lower configuration effort than other monitoring solutions.

  • Automate configuration and operation with the Checkmk REST API.

  • Centrally manage your agents and automate agent updating with the Agent Bakery.

  • Integrate other systems using powerful APIs to automate almost anything imaginable.

  • Integrate data from a wide range of data sources and formats for metrics (JSON, XML, SNMP data, and more).
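
The rule-based 1-to-N idea above can be sketched in a few lines of Python. This is only an illustration of the concept, not Checkmk's rule format or API: the `Host` and `Rule` classes, the label names, and the first-match-wins evaluation are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    labels: dict = field(default_factory=dict)


@dataclass
class Rule:
    # Hypothetical rule: label conditions plus the value it assigns.
    conditions: dict
    value: dict


def matches(rule, host):
    """A rule matches when every condition label has the required value."""
    return all(host.labels.get(k) == v for k, v in rule.conditions.items())


def effective_value(rules, host, default):
    """Return the value of the first matching rule, else the default."""
    for rule in rules:
        if matches(rule, host):
            return rule.value
    return default


# Two rules cover any number of hosts; no per-host configuration is needed.
rules = [
    Rule({"env": "prod", "role": "db"}, {"warn": 70, "crit": 80}),
    Rule({"env": "prod"}, {"warn": 80, "crit": 90}),
]
hosts = [
    Host("db01", {"env": "prod", "role": "db"}),
    Host("web01", {"env": "prod", "role": "web"}),
    Host("test01", {"env": "test"}),
]
default = {"warn": 90, "crit": 95}
levels = {h.name: effective_value(rules, h, default) for h in hosts}
```

Adding a new production database host with the same labels would pick up the first rule automatically, which is where the lower configuration effort comes from.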

Get monitoring up and running quickly

  • Fast installation from a single integrated package, available for many platforms and as a Docker container. No need to separately install and maintain databases and web servers.

  • Don't waste time deciding which metrics you need: Checkmk auto-discovers the relevant metrics for you, building on years of expertise.

  • Monitor using powerful agent-based monitoring, agentless monitoring via HTTP/SNMP, or direct API connection to many applications.

  • Quickly identify problems in your IT environment through an easy-to-identify 'state' (OK, WARN, CRIT) for each monitored component or system, then drill down with a single click.

  • Configure everything in a web interface. Fast, easy, and less prone to error.

  • Apply your existing role-based access controls (LDAP, AD) to a fine-grained permission model for user and group actions.
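
To make the 'state' idea concrete, here is a minimal sketch of how a measured value could be mapped to OK, WARN or CRIT using upper thresholds. The function name and the "greater than or equal" comparison are assumptions for illustration; the numeric values 0, 1 and 2 follow the convention commonly used by Nagios-style checks.

```python
from enum import IntEnum


class State(IntEnum):
    OK = 0
    WARN = 1
    CRIT = 2


def state_from_levels(value, warn, crit):
    """Map a value to a state using upper warn/crit thresholds (assumed semantics)."""
    if value >= crit:
        return State.CRIT
    if value >= warn:
        return State.WARN
    return State.OK


# Example: disk usage in percent with warn at 80 and crit at 90.
usage_states = {pct: state_from_levels(pct, 80, 90).name for pct in (42, 85, 97)}
```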

Monitor everything using a comprehensive collection of plug-ins

  • Checkmk's collection of plug-ins is unique: we directly maintain over 2,000 of them. In the unlikely case that one is missing, you can probably find it in the Checkmk Exchange.

  • Easily collect relevant metrics across heterogeneous infrastructure while avoiding third-party plug-ins from questionable sources.
  • Benefit from actively maintained, regularly updated plug-ins that keep up with software and hardware changes.

Scale monitoring with a performance-optimized, distributed architecture

  • Benefit from Checkmk's own high-performance core: the 'CMC'. The 'core' is the heart of every monitoring system, querying plug-ins, collecting results, providing information to the GUI, and more.

  • Monitor thousands of services from a single monitoring server.

  • Scale across hundreds of sites and millions of devices. Build a world-wide distributed monitoring network, achieving a scale that is hard to find in other monitoring systems.

  • Leverage highly efficient, self-contained agents with minimal CPU, RAM and storage utilization. They run on even the smallest servers, without the need for DLLs or libraries.

Modern monitoring for cloud-native and on-premises architectures

  • Ingest data with high enough granularity to handle IT architectures of all kinds — traditional environments and container orchestration platforms included.
  • Sample in real-time, with measurement intervals as short as 1 second.
  • Tag your data by hand or auto-discover tags and labels to provide relevant context to help you filter — labels offer full flexibility and tags ensure consistency.
  • Store metrics in disk-space-efficient long-term storage.

Get detailed insights into your network

  • Analyze your network traffic in depth by integrating network flows into Checkmk via ntop.
  • Use traffic dashboards for your network.
  • View alerts, characterized by duration, severity and alert type.
  • Filter flows in many dimensions to analyze your networks.
  • Open detailed views for your hosts: traffic, packets, ports, peers.

Easily customize or extend to meet your needs

  • Customize or extend the Checkmk source code, written in easy-to-read Python.
  • Rely on tribe29 and our broad network of partners to customize Checkmk or its plug-ins.
  • Use the Check-API, a set of common functions that make plug-in development much easier.
  • Learn from the extensive developer documentation.

Visualize your data with integrated, customizable dashboards, or view with Grafana

  • Leverage graphic maps and diagrams with live monitoring data to get a dynamic view of the health of your infrastructure and applications.
  • Analyze time-series metrics over long time horizons with interactive HTML5 graphs.
  • Customize dashboards and views to your specific needs with different dashboard elements to visualize your most important metrics.
  • Compare metrics across multiple graphs at a glance.
  • Provide custom dashboards and views for users or user groups, e.g. vSphere-specific views for VMware admins.
  • Customize side menus for the user's workload — e.g. monitoring admins need live statistics while network admins might only need reporting.
  • Alternatively, visualize your data in Grafana using the Grafana Checkmk datasource plugin or using Checkmk's Graphite exporter for InfluxDB.

Avoid notification overload with smart alerting

  • Notify the responsible team quickly — e.g. notify the storage admins for a failing disk, but not the email admins.
  • Notify via email, SMS and third-party tools such as ServiceNow, Jira, Slack, PagerDuty and VictorOps — use your established tools for handling incidents.
  • Leverage comprehensive, rule-based notifications to fulfill complex custom requirements regarding time periods, service levels, and more.
  • Configure additional alerts or cancel an alert in specific situations. Escalate problems if they are not handled in time.
  • Handle alerts centrally — even in distributed environments.
  • Use the alert handler to automatically trigger actions upon detection of new problems — e.g. remediation steps contained within a script.
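
Rule-based notification routing, as described above, can be pictured with a small sketch. Everything here is hypothetical (the `Event` and `NotificationRule` classes, the label names, the contact group names); it only illustrates how rules can send a failing disk to the storage admins and not to the email admins.

```python
from dataclasses import dataclass

SEVERITY = {"OK": 0, "WARN": 1, "CRIT": 2}


@dataclass
class Event:
    host: str
    service: str
    state: str   # "OK", "WARN" or "CRIT"
    labels: dict


@dataclass
class NotificationRule:
    # Hypothetical rule: notify contact_group when the labels match
    # and the event is at least as severe as min_state.
    match_labels: dict
    min_state: str
    contact_group: str


def recipients(event, rules):
    """Return the contact groups to notify, in rule order, without duplicates."""
    groups = []
    for rule in rules:
        labels_match = all(
            event.labels.get(k) == v for k, v in rule.match_labels.items()
        )
        severe_enough = SEVERITY[event.state] >= SEVERITY[rule.min_state]
        if labels_match and severe_enough and rule.contact_group not in groups:
            groups.append(rule.contact_group)
    return groups


rules = [
    NotificationRule({"component": "storage"}, "WARN", "storage-admins"),
    NotificationRule({"component": "mail"}, "WARN", "email-admins"),
    NotificationRule({}, "CRIT", "on-call"),
]
failing_disk = Event("nas01", "Disk /dev/sda", "CRIT", {"component": "storage"})
degraded_disk = Event("nas01", "Disk /dev/sdb", "WARN", {"component": "storage"})
```

In this sketch the rule with empty `match_labels` acts as a catch-all escalation for critical problems.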

Combine metrics and log data for fast problem identification and root cause analysis

  • Capture and analyze error messages via syslog, SNMP traps, or arbitrary log files.
  • Filter and forward events, triggering scripts or generating notifications.
  • Collapse duplicate entries into a single event (e.g. several failed user logins) to prevent operator overload.
  • Filter incoming messages to only show important events. No more manual filtering and information overload.
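
Collapsing duplicate entries into a single event can be sketched as follows. This is a simplified illustration, not the product's event handling: it assumes each message is a `(host, text)` pair and that two messages are duplicates when both fields are equal.

```python
def collapse(messages):
    """Collapse messages with the same (host, text) into one event with a count."""
    events = {}  # dicts keep insertion order, so first-seen order is preserved
    for host, text in messages:
        key = (host, text)
        if key in events:
            events[key]["count"] += 1
        else:
            events[key] = {"host": host, "text": text, "count": 1}
    return list(events.values())


messages = [
    ("web01", "Failed login for user admin"),
    ("web01", "Failed login for user admin"),
    ("db01", "Disk quota exceeded"),
    ("web01", "Failed login for user admin"),
]
events = collapse(messages)
```

The operator sees two events instead of four messages, with the repeat count kept as context.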

Study the past and plan for the future with advanced analytics

  • Use sophisticated predictive monitoring algorithms to dynamically adapt thresholds based on historical events.
  • Study data, predict trends, and forecast resource utilization easily and efficiently.
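
One simple way to derive thresholds from historical data is to place them a number of standard deviations above the historical mean. The sketch below shows that idea only; it is an assumption chosen for illustration and not a description of the algorithms Checkmk uses. The function name and the sigma factors are hypothetical.

```python
from statistics import mean, stdev


def dynamic_levels(history, warn_sigma=2.0, crit_sigma=3.0):
    """Derive (warn, crit) upper levels from past values of a metric.

    history must contain at least two values, because the sample
    standard deviation is undefined otherwise.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu + warn_sigma * sigma, mu + crit_sigma * sigma


# CPU load observed at the same hour on five previous days (made-up values).
history = [40, 42, 38, 41, 39]
warn, crit = dynamic_levels(history)
```

A value that is normal during a nightly backup window would then not raise an alert at night, while the same value at noon could.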

Bridge the gap between Dev and Ops with the Prometheus integration

  • Use Checkmk's powerful context monitoring to complement Prometheus' flexibility.
  • View structured information and helpful context for Prometheus metrics from key exporters.
  • Execute PromQL queries directly from within Checkmk, e.g. to monitor data captured via code instrumentation.
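
For readers unfamiliar with PromQL, the sketch below shows what an instant query against the Prometheus HTTP API looks like and how a vector result can be parsed. It illustrates the Prometheus side only and says nothing about how Checkmk executes queries internally. The host name is a placeholder, and the sample response is hand-written in the documented response shape.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(response_text):
    """Return a list of (labels, value) pairs from an instant-vector response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [
        (sample["metric"], float(sample["value"][1]))
        for sample in payload["data"]["result"]
    ]


url = instant_query_url("http://prometheus.example:9090/", "up")

# Hand-written sample response, not captured from a real server.
SAMPLE_RESPONSE = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"__name__":"up","job":"node"},"value":[1700000000,"1"]}]}}'
)
samples = parse_vector(SAMPLE_RESPONSE)
```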

Support business leaders by monitoring the health of key processes

  • Monitor business processes by mapping application dependencies into a single overview.
  • View the availability and performance of complex systems at a glance.
  • Aggregate various services and hosts into a single state.
  • Review historical states to determine the root cause for degraded performance.
  • Deliver more reliable services to customers by maintaining awareness through a completely transparent, easy-to-understand view.
  • Support all possible setups — high availability with two or more nodes, HPC, and more — with unprecedented freedom.
  • Simulate worst-case scenarios in real-time, studying the impact of failing components to determine areas of operational weakness.
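
Aggregating services and hosts into a single state, and simulating failures, can be sketched with a small tree. The tree layout, the service names, and the `worst` and `best` functions are assumptions for this example and do not reflect Checkmk's own configuration format.

```python
SEVERITY = {"OK": 0, "WARN": 1, "CRIT": 2}


def worst(states):
    """The aggregate is as bad as its worst member."""
    return max(states, key=SEVERITY.__getitem__)


def best(states):
    """The aggregate is as good as its best member (e.g. a redundant pair)."""
    return min(states, key=SEVERITY.__getitem__)


def aggregate(node, service_states):
    """node is a service name (leaf) or a (name, function, children) tuple."""
    if isinstance(node, str):
        return service_states[node]
    _name, function, children = node
    return function([aggregate(child, service_states) for child in children])


webshop = (
    "Webshop", worst, [
        "frontend",
        ("Database cluster", best, ["db-node-1", "db-node-2"]),
    ],
)
current = {"frontend": "OK", "db-node-1": "CRIT", "db-node-2": "OK"}
state_now = aggregate(webshop, current)

# What-if simulation: override states without touching live data.
what_if = {**current, "db-node-2": "CRIT"}
state_if_both_fail = aggregate(webshop, what_if)
```

With one database node down the shop is still OK because the cluster is redundant; the simulation shows that losing the second node takes the whole process to CRIT.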

Identify all assets in your IT environment

  • Identify and inventory all hardware and software, proactively monitoring changes.
  • Benefit from integration of dynamic parameters, such as disk space utilization, that are regularly updated via monitoring data.
  • Combine these dynamic parameters with data from your CMDB, comparing "static" and "dynamic" monitoring views of your assets' state.

Proactively keep the business informed with automatically generated reports

  • Generate branded PDF reports containing all of the views you build — either on-demand or automated at regular intervals.
  • Review the history of states over any desired timeframe with a single click, and compute availability metrics in real time.
  • Deaverage availability data. Exclude non-monitored times, change the resolution, and ignore short intervals.
  • Get notified before you break contracts by monitoring the compliance of complex SLAs — even if the SLA definition contains only working hours.
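
The availability options mentioned above (excluding non-monitored times and ignoring short intervals) can be illustrated with a short calculation. The timeline format and the parameter name are assumptions for this sketch: each entry is a `(duration_in_seconds, state)` pair, and `None` marks time during which the object was not monitored.

```python
def availability(timeline, ignore_shorter_than=0):
    """Return availability in percent, or None if nothing was monitored.

    Unmonitored intervals (state None) are excluded from the total.
    Non-OK intervals shorter than ignore_shorter_than seconds count as OK.
    """
    ok_time = 0.0
    monitored_time = 0.0
    for duration, state in timeline:
        if state is None:
            continue
        monitored_time += duration
        if state == "OK" or duration < ignore_shorter_than:
            ok_time += duration
    if monitored_time == 0:
        return None
    return 100.0 * ok_time / monitored_time


timeline = [(3000, "OK"), (30, "CRIT"), (600, None), (570, "OK")]
strict = availability(timeline)
lenient = availability(timeline, ignore_shorter_than=60)
```

Here the 30-second outage costs availability in the strict view but is ignored once intervals shorter than 60 seconds are filtered out.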

Integrate with major ITOM/ITSM tools to streamline workflows

  • Use powerful, well-documented APIs to build deep integrations.
  • Interface with standard, off-the-shelf Configuration Management Database (CMDB) software.
  • Configure monitoring using existing information from a Configuration Management Database (CMDB) via Checkmk's APIs.
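
As a rough sketch of configuring monitoring from CMDB data, the code below turns CMDB records into host-creation requests. The endpoint path, the payload fields, and the bearer authentication reflect our understanding of the Checkmk REST API version 1.0 and should be verified against the API documentation shipped with your installation. The server name, site name, credentials, and the CMDB record layout are placeholders. The requests are only built here, not sent.

```python
import json
from urllib.request import Request


def host_payload(record, folder="/"):
    """Map a (hypothetical) CMDB record to a host-creation payload."""
    return {
        "folder": folder,
        "host_name": record["name"],
        "attributes": {"ipaddress": record["ip"]},
    }


def create_host_request(base_url, site, user, secret, record):
    """Build, but do not send, a POST request that would create one host."""
    url = (
        f"{base_url.rstrip('/')}/{site}/check_mk/api/1.0"
        "/domain-types/host_config/collections/all"  # assumed endpoint path
    )
    return Request(
        url,
        data=json.dumps(host_payload(record)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {user} {secret}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


cmdb_records = [
    {"name": "db01", "ip": "192.0.2.10"},
    {"name": "web01", "ip": "192.0.2.20"},
]
pending = [
    create_host_request("https://monitoring.example", "mysite", "automation", "secret", r)
    for r in cmdb_records
]
```

Sending each request with `urllib.request.urlopen` and then activating the pending changes would complete the workflow; both steps are left out so the sketch runs without a server.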

Migrate from Nagios without skipping a beat

  • Continue using your Nagios checks for the (rare) cases where native Checkmk ones do not yet exist; the Checkmk Microcore supports both.
  • Take advantage of your familiarity with Nagios — avoid retraining your entire operations team. Checkmk and Nagios share many design concepts, making the switch easy.
  • Checkmk improves on several design deficiencies present in Nagios — and Checkmk Enterprise Edition operates 100% stand-alone.