Netdata Helm chart for Kubernetes deployments | Learn Netdata (2024)


Based on the work of varyumin (https://github.com/varyumin/netdata).

Introduction

This chart bootstraps a Netdata deployment on a Kubernetes cluster using the Helm package manager.

By default, the chart installs:

  • A Netdata child pod on each node of the cluster, using a DaemonSet
  • A Netdata k8s state monitoring pod on one node, using a Deployment. This virtual node is called netdata-k8s-state.
  • A Netdata parent pod on one node, using a Deployment. This virtual node is called netdata-parent.

Disabled by default:

  • A Netdata restarter CronJob. Its main purpose is to automatically update Netdata when using nightly releases.
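
To turn the restarter on, set restarter.enabled in your values file. A minimal sketch, using the chart's default schedule (see the configuration table for all restarter.* parameters):

```yaml
# values.yaml fragment: enable the nightly-update CronJob
restarter:
  enabled: true
  schedule: "00 06 * * *"   # Cron format; default is once a day at 06:00
```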

The child pods and the state pod function as headless collectors that collect and forward all the metrics to the parent pod. The parent pod uses persistent volumes to store metrics and alarms, handles alarm notifications, and provides the Netdata UI to view metrics using an ingress controller.

Please validate that the settings are suitable for your cluster before using them in production.

Prerequisites

Installing the Helm chart

You can install the Helm chart via our Helm repository, or by cloning this repository.

Installing via our Helm repository (recommended)

To use Netdata's Helm repository, run the following commands:

helm repo add netdata https://netdata.github.io/helmchart/
helm install netdata netdata/netdata

See our install Netdata on Kubernetes documentation for detailed installation and configuration instructions. The remainder of this document assumes you installed the Helm chart by cloning this repository, and thus uses slightly different helm install/helm upgrade commands.

Install by cloning the repository

Clone the repository locally.

git clone https://github.com/netdata/helmchart.git netdata-helmchart

To install the chart with the release name netdata:

helm install netdata ./netdata-helmchart/charts/netdata

The command deploys Netdata on the Kubernetes cluster in the default configuration. The configuration section lists the parameters that can be configured during installation.

Tip: List all releases using helm list.

Once the Netdata deployment is up and running, read our guide, Monitor a Kubernetes (k8s) cluster with Netdata, for a breakdown of all the collectors, metrics, and charts available for health monitoring and performance troubleshooting.

Uninstalling the Chart

To uninstall/delete the netdata deployment:

helm delete netdata

The command removes all the Kubernetes components associated with the chart and deletes the release.

Configuration

The following table lists the configurable parameters of the netdata chart and their default values.

| Parameter | Description | Default |
|-----------|-------------|---------|
| kubeVersion | Kubernetes version | Autodetected |
| replicaCount | Number of replicas for the parent netdata Deployment | 1 |
| imagePullSecrets | An optional list of references to secrets in the same namespace to use for pulling any of the images | [] |
| image.repository | Container image repo | netdata/netdata |
| image.tag | Container image tag | Latest stable netdata release |
| image.pullPolicy | Container image pull policy | Always |
| service.type | Parent service type | ClusterIP |
| service.port | Parent service port | 19999 |
| service.loadBalancerIP | Static LoadBalancer IP, only to be used with service type=LoadBalancer | "" |
| service.loadBalancerSourceRanges | List of allowed IPs for the LoadBalancer | [] |
| service.externalTrafficPolicy | Denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints | Cluster |
| service.healthCheckNodePort | Specifies the health check node port | Allocated a port from your cluster's NodePort range |
| service.clusterIP | Specific cluster IP when service type is ClusterIP. Use None for a headless service | Allocated an IP from your cluster's service IP range |
| service.annotations | Additional annotations to add to the service | {} |
| ingress.enabled | Create an Ingress to access the netdata web UI | true |
| ingress.apiVersion | apiVersion for the Ingress | Depends on Kubernetes version |
| ingress.annotations | Annotations to associate with the Ingress | kubernetes.io/ingress.class: nginx and kubernetes.io/tls-acme: "true" |
| ingress.path | URL path for the ingress. If changed, a proxy server needs to be configured in front of netdata to translate the custom path to / | / |
| ingress.pathType | pathType for your ingress controller. The default value is correct for nginx; if you use your own ingress controller, check the correct value | Prefix |
| ingress.hosts | URL hostnames for the ingress (they need to resolve to the external IP of the ingress controller) | netdata.k8s.local |
| ingress.spec | Spec section for the ingress object. Everything here is included in the object on deployment | {} |
| ingress.spec.ingressClassName | Ingress class declaration for Kubernetes 1.19+. The ingress.class annotation should be removed if this type of declaration is used | nginx |
| rbac.create | If true, create and use RBAC resources | true |
| rbac.pspEnabled | Specifies whether a PodSecurityPolicy should be created | true |
| serviceAccount.create | If true, create a service account | true |
| serviceAccount.name | The name of the service account to use. If not set and create is true, a name is generated using the fullname template | netdata |
| clusterrole.name | Name of the cluster role linked with the service account | netdata |
| APIKEY | The key shared between the parent and the child netdata for streaming | 11111111-2222-3333-4444-555555555555 |
| restarter.enabled | Install a CronJob to update Netdata pods | false |
| restarter.schedule | The schedule in Cron format | 00 06 * * * |
| restarter.image.repository | Container image repo | bitnami/kubectl |
| restarter.image.tag | Container image tag | 1.25 |
| restarter.image.pullPolicy | Container image pull policy | Always |
| restarter.image.restartPolicy | Container restart policy | Never |
| restarter.image.resources | Container resources | {} |
| restarter.concurrencyPolicy | Specifies how to treat concurrent executions of a job | Forbid |
| restarter.startingDeadlineSeconds | Optional deadline in seconds for starting the job if it misses its scheduled time for any reason | 60 |
| restarter.successfulJobsHistoryLimit | The number of successful finished jobs to retain | 3 |
| restarter.failedJobsHistoryLimit | The number of failed finished jobs to retain | 3 |
| parent.enabled | Install a parent Deployment to receive metrics from child nodes | true |
| parent.port | Parent's listen port | 19999 |
| parent.resources | Resources for the parent deployment | {} |
| parent.livenessProbe.initialDelaySeconds | Number of seconds after the container has started before liveness probes are initiated | 0 |
| parent.livenessProbe.failureThreshold | When a liveness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the liveness probe means restarting the container | 3 |
| parent.livenessProbe.periodSeconds | How often (in seconds) to perform the liveness probe | 30 |
| parent.livenessProbe.successThreshold | Minimum consecutive successes for the liveness probe to be considered successful after having failed | 1 |
| parent.livenessProbe.timeoutSeconds | Number of seconds after which the liveness probe times out | 1 |
| parent.readinessProbe.initialDelaySeconds | Number of seconds after the container has started before readiness probes are initiated | 0 |
| parent.readinessProbe.failureThreshold | When a readiness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the readiness probe means marking the Pod Unready | 3 |
| parent.readinessProbe.periodSeconds | How often (in seconds) to perform the readiness probe | 30 |
| parent.readinessProbe.successThreshold | Minimum consecutive successes for the readiness probe to be considered successful after having failed | 1 |
| parent.readinessProbe.timeoutSeconds | Number of seconds after which the readiness probe times out | 1 |
| parent.terminationGracePeriodSeconds | Duration in seconds the pod needs to terminate gracefully | 300 |
| parent.nodeSelector | Node selector for the parent deployment | {} |
| parent.tolerations | Tolerations settings for the parent deployment | [] |
| parent.affinity | Affinity settings for the parent deployment | {} |
| parent.priorityClassName | Pod priority class name for the parent deployment | "" |
| parent.database.persistence | Whether the parent should use a persistent volume for the DB | true |
| parent.database.storageclass | The storage class for the persistent volume claim of the parent's database store, mounted to /var/cache/netdata | the default storage class |
| parent.database.volumesize | The storage space for the PVC of the parent database | 2Gi |
| parent.alarms.persistence | Whether the parent should use a persistent volume for the alarms log | true |
| parent.alarms.storageclass | The storage class for the persistent volume claim of the parent's alarm log, mounted to /var/lib/netdata | the default storage class |
| parent.alarms.volumesize | The storage space for the PVC of the parent alarm log | 1Gi |
| parent.env | Set environment parameters for the parent deployment | {} |
| parent.envFrom | Set environment parameters for the parent deployment from a ConfigMap and/or Secrets | [] |
| parent.podLabels | Additional labels to add to the parent pods | {} |
| parent.podAnnotations | Additional annotations to add to the parent pods | {} |
| parent.dnsPolicy | DNS policy for the pod | Default |
| parent.configs | Manage custom parent configs | See Configuration files. |
| parent.claiming.enabled | Enable parent claiming for Netdata Cloud | false |
| parent.claiming.token | Claim token | "" |
| parent.claiming.room | Comma-separated list of claim room IDs | "" |
| parent.extraVolumeMounts | Additional volumeMounts to add to the parent pods | [] |
| parent.extraVolumes | Additional volumes to add to the parent pods | [] |
| k8sState.enabled | Install this Deployment to gather data from the K8s cluster | true |
| k8sState.port | Listen port | service.port (same as the parent's listen port) |
| k8sState.resources | Compute resources required by this Deployment | {} |
| k8sState.livenessProbe.initialDelaySeconds | Number of seconds after the container has started before liveness probes are initiated | 0 |
| k8sState.livenessProbe.failureThreshold | When a liveness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the liveness probe means restarting the container | 3 |
| k8sState.livenessProbe.periodSeconds | How often (in seconds) to perform the liveness probe | 30 |
| k8sState.livenessProbe.successThreshold | Minimum consecutive successes for the liveness probe to be considered successful after having failed | 1 |
| k8sState.livenessProbe.timeoutSeconds | Number of seconds after which the liveness probe times out | 1 |
| k8sState.readinessProbe.initialDelaySeconds | Number of seconds after the container has started before readiness probes are initiated | 0 |
| k8sState.readinessProbe.failureThreshold | When a readiness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the readiness probe means marking the Pod Unready | 3 |
| k8sState.readinessProbe.periodSeconds | How often (in seconds) to perform the readiness probe | 30 |
| k8sState.readinessProbe.successThreshold | Minimum consecutive successes for the readiness probe to be considered successful after having failed | 1 |
| k8sState.readinessProbe.timeoutSeconds | Number of seconds after which the readiness probe times out | 1 |
| k8sState.terminationGracePeriodSeconds | Duration in seconds the pod needs to terminate gracefully | 300 |
| k8sState.nodeSelector | Node selector | {} |
| k8sState.tolerations | Tolerations settings | [] |
| k8sState.affinity | Affinity settings | {} |
| k8sState.priorityClassName | Pod priority class name | "" |
| k8sState.podLabels | Additional labels | {} |
| k8sState.podAnnotations | Additional annotations | {} |
| k8sState.podAnnotationAppArmor.enabled | Whether or not to include the AppArmor security annotation | true |
| k8sState.dnsPolicy | DNS policy for the pod | ClusterFirstWithHostNet |
| k8sState.persistence.enabled | Whether to use a persistent volume for /var/lib/netdata | true |
| k8sState.persistence.storageclass | The storage class for the persistent volume claim of /var/lib/netdata | the default storage class |
| k8sState.persistence.volumesize | The storage space for the PVC of /var/lib/netdata | 1Gi |
| k8sState.env | Set environment parameters | {} |
| k8sState.envFrom | Set environment parameters from a ConfigMap and/or Secrets | [] |
| k8sState.configs | Manage custom configs | See Configuration files. |
| k8sState.claiming.enabled | Enable claiming for Netdata Cloud | false |
| k8sState.claiming.token | Claim token | "" |
| k8sState.claiming.room | Comma-separated list of claim room IDs | "" |
| k8sState.extraVolumeMounts | Additional volumeMounts to add to the k8sState pods | [] |
| k8sState.extraVolumes | Additional volumes to add to the k8sState pods | [] |
| child.enabled | Install a child DaemonSet to gather data from the nodes | true |
| child.port | Children's listen port | service.port (same as the parent's listen port) |
| child.updateStrategy | An update strategy to replace existing DaemonSet pods with new pods | {} |
| child.resources | Resources for the child DaemonSet | {} |
| child.livenessProbe.initialDelaySeconds | Number of seconds after the container has started before liveness probes are initiated | 0 |
| child.livenessProbe.failureThreshold | When a liveness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the liveness probe means restarting the container | 3 |
| child.livenessProbe.periodSeconds | How often (in seconds) to perform the liveness probe | 30 |
| child.livenessProbe.successThreshold | Minimum consecutive successes for the liveness probe to be considered successful after having failed | 1 |
| child.livenessProbe.timeoutSeconds | Number of seconds after which the liveness probe times out | 1 |
| child.readinessProbe.initialDelaySeconds | Number of seconds after the container has started before readiness probes are initiated | 0 |
| child.readinessProbe.failureThreshold | When a readiness probe fails, Kubernetes will try failureThreshold times before giving up. Giving up the readiness probe means marking the Pod Unready | 3 |
| child.readinessProbe.periodSeconds | How often (in seconds) to perform the readiness probe | 30 |
| child.readinessProbe.successThreshold | Minimum consecutive successes for the readiness probe to be considered successful after having failed | 1 |
| child.readinessProbe.timeoutSeconds | Number of seconds after which the readiness probe times out | 1 |
| child.terminationGracePeriodSeconds | Duration in seconds the pod needs to terminate gracefully | 30 |
| child.nodeSelector | Node selector for the child DaemonSet | {} |
| child.tolerations | Tolerations settings for the child DaemonSet | - operator: Exists with effect: NoSchedule |
| child.affinity | Affinity settings for the child DaemonSet | {} |
| child.priorityClassName | Pod priority class name for the child DaemonSet | "" |
| child.env | Set environment parameters for the child DaemonSet | {} |
| child.envFrom | Set environment parameters for the child DaemonSet from a ConfigMap and/or Secrets | [] |
| child.podLabels | Additional labels to add to the child pods | {} |
| child.podAnnotations | Additional annotations to add to the child pods | {} |
| child.hostNetwork | Usage of host networking and ports | true |
| child.dnsPolicy | DNS policy for the pod. Should be ClusterFirstWithHostNet if child.hostNetwork = true | ClusterFirstWithHostNet |
| child.podAnnotationAppArmor.enabled | Whether or not to include the AppArmor security annotation | true |
| child.persistence.hostPath | Host node directory for storing child instance data | /var/lib/netdata-k8s-child |
| child.persistence.enabled | Whether or not to persist /var/lib/netdata in child.persistence.hostPath | true |
| child.podsMetadata.useKubelet | Send requests to the kubelet /pods endpoint instead of the Kubernetes API server to get pod metadata | false |
| child.podsMetadata.kubeletUrl | Kubelet URL | https://localhost:10250 |
| child.configs | Manage custom child configs | See Configuration files. |
| child.claiming.enabled | Enable child claiming for Netdata Cloud | false |
| child.claiming.token | Claim token | "" |
| child.claiming.room | Comma-separated list of claim room IDs | "" |
| child.extraVolumeMounts | Additional volumeMounts to add to the child pods | [] |
| child.extraVolumes | Additional volumes to add to the child pods | [] |
| notifications.slack.webhook_url | Slack webhook URL | "" |
| notifications.slack.recipient | Slack recipient list | "" |
| initContainersImage.repository | Init containers' image repository | alpine |
| initContainersImage.tag | Init containers' image tag | latest |
| initContainersImage.pullPolicy | Init containers' image pull policy | Always |
| sysctlInitContainer.enabled | Enable an init container to modify kernel settings | false |
| sysctlInitContainer.command | sysctl init container command to execute | [] |
| sysctlInitContainer.resources | sysctl init container CPU/memory resource requests/limits | {} |
| sd.image.repository | Service-discovery image repo | netdata/agent-sd |
| sd.image.tag | Service-discovery image tag | Latest stable release (e.g. v0.2.2) |
| sd.image.pullPolicy | Service-discovery image pull policy | Always |
| sd.child.enabled | Add a service-discovery sidecar container to the netdata child pod definition | true |
| sd.child.resources | Child service-discovery container CPU/memory resource requests/limits | {resources: {limits: {cpu: 50m, memory: 150Mi}, requests: {cpu: 50m, memory: 100Mi}}} |
| sd.child.configmap.name | Child service-discovery ConfigMap name | netdata-child-sd-config-map |
| sd.child.configmap.key | Child service-discovery ConfigMap key | config.yml |
| sd.child.configmap.from.file | File to use for child service-discovery configuration generation | sdconfig/sd-child.yml |
| sd.child.configmap.from.value | Value to use for child service-discovery configuration generation | {} |
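
As a worked example of the parameters above, a values.yaml that overrides a handful of common settings could look like the following. The hostname, tag, and volume size here are placeholders for illustration, not recommendations:

```yaml
# values.yaml fragment: override a few common chart parameters
image:
  tag: stable                 # pin a tag instead of the default release tag
ingress:
  enabled: true
  hosts:
    - netdata.example.com     # must resolve to your ingress controller's external IP
parent:
  database:
    volumesize: 4Gi           # grow the metrics PVC from the 2Gi default
child:
  hostNetwork: true
```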

Example to set the parameters from the command line:

helm install netdata ./netdata-helmchart/charts/netdata \
  --set notifications.slack.webhook_url=MySlackAPIURL \
  --set notifications.slack.recipient="@MyUser MyChannel"

Another example sets a different ingress controller.

By default, the kubernetes.io/ingress.class annotation is set to use nginx as the ingress controller, but you can set Traefik as your ingress controller by overriding ingress.annotations (note the escaped dots in the annotation key):

helm install netdata ./netdata-helmchart/charts/netdata \
  --set 'ingress.annotations.kubernetes\.io/ingress\.class=traefik'

Alternatively to passing each variable on the command line, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example:

helm install netdata ./netdata-helmchart/charts/netdata -f values.yaml

Tip: You can use the default values.yaml

Note: To opt out of anonymous statistics, set the DO_NOT_TRACK environment variable to a non-zero or non-empty value in the parent.env / child.env configuration (e.g. DO_NOT_TRACK: 1), or uncomment the line in values.yaml.
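
For example, to opt out on both the parent and the children, the corresponding values fragment would be:

```yaml
# values.yaml fragment: disable anonymous statistics on parent and children
parent:
  env:
    DO_NOT_TRACK: 1
child:
  env:
    DO_NOT_TRACK: 1
```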

Configuration files

| Parameter | Description | Default |
|-----------|-------------|---------|
| parent.configs.netdata | Contents of the parent's netdata.conf | memory mode = dbengine |
| parent.configs.stream | Contents of the parent's stream.conf | Store child data, accept all connections, and issue alarms for child data. |
| parent.configs.health | Contents of health_alarm_notify.conf | Email disabled, a sample of the required settings for Slack notifications |
| parent.configs.exporting | Contents of exporting.conf | Disabled |
| k8sState.configs.netdata | Contents of netdata.conf | No persistent storage, no alarms |
| k8sState.configs.stream | Contents of stream.conf | Send metrics to the parent at netdata:{{ service.port }} |
| k8sState.configs.exporting | Contents of exporting.conf | Disabled |
| k8sState.configs.go.d | Contents of go.d.conf | Only k8s_state enabled |
| k8sState.configs.go.d-k8s_state | Contents of go.d/k8s_state.conf | k8s_state configuration |
| child.configs.netdata | Contents of the child's netdata.conf | No persistent storage, no alarms, no UI |
| child.configs.stream | Contents of the child's stream.conf | Send metrics to the parent at netdata:{{ service.port }} |
| child.configs.exporting | Contents of the child's exporting.conf | Disabled |
| child.configs.kubelet | Contents of the child's go.d/k8s_kubelet.conf that drives the kubelet collector | Update metrics every second, do not retry to detect the endpoint, look for the kubelet metrics at http://127.0.0.1:10255/metrics |
| child.configs.kubeproxy | Contents of the child's go.d/k8s_kubeproxy.conf that drives the kube-proxy collector | Update metrics every second, do not retry to detect the endpoint, look for the kube-proxy metrics at http://127.0.0.1:10249/metrics |

To deploy additional netdata user configuration files, add similar entries to either the parent.configs or the child.configs arrays. Regardless of whether you add config files that reside directly under /etc/netdata or in a subdirectory such as /etc/netdata/go.d, you can use the already provided configurations as a reference. For example, the parent.configs array includes an example alarm that would get triggered if the python.d example module was enabled. Whenever you pass sensitive data, such as database credentials, to your configuration, you can store it in a Kubernetes Secret by specifying storedType: secret in the selected configuration. By default, all configurations are placed in a Kubernetes ConfigMap.
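
A sketch of what such an entry could look like, assuming a hypothetical example.conf placed directly under /etc/netdata and stored as a Secret (the entry name, path, and contents here are illustrative, not part of the chart; follow the shape of the configurations already provided in values.yaml):

```yaml
# values.yaml fragment: add a custom parent config stored in a Secret
parent:
  configs:
    example:
      enabled: true
      path: /etc/netdata/example.conf
      storedType: secret      # place this file in a Secret instead of the ConfigMap
      data: |-
        # contents of example.conf go here
```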

Note that with the default configuration of this chart, the parent does the health checks and triggers alarms, but does not collect much data. As a result, the only other configuration files that might make sense to add to the parent are the alarm and alarm template definitions, under /etc/netdata/health.d.

Tip: Do pay attention to the indentation of the config file contents, as it matters for the parsing of the YAML file. Note that the first line under data: |- must be indented with two more spaces relative to the preceding line:

 data: |-
   config line 1   #Need those two spaces
     config line 2 #No problem indenting more here

Persistent volumes

There are two different persistent volumes on the parent by design (not counting any ConfigMap/Secret mounts). Both can be used, but they don't have to be. Keep in mind that whenever a persistent volume is not used on the parent, all the data on that volume is lost in case of pod removal.

  1. database (/var/cache/netdata) - all metrics data is stored here. The performance of this volume affects query timings.
  2. alarms (/var/lib/netdata) - the alarm log. If it is not persistent, pod recreation will result in the parent appearing as a new node in netdata.cloud (due to ./registry/ and ./cloud.d/ being removed).

The child instances are a bit simpler. By default, the hostPath /var/lib/netdata-k8s-child is mounted on the child at /var/lib/netdata. You can disable it, but this option is pretty much required in a real-life scenario, as without it each pod deletion will result in a new replication node for the parent.
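
For instance, to keep child data under a different host directory, you could override the defaults like this (the alternative path is hypothetical, chosen purely for illustration):

```yaml
# values.yaml fragment: relocate the child hostPath
child:
  persistence:
    enabled: true
    hostPath: /data/netdata-k8s-child   # hypothetical alternative to /var/lib/netdata-k8s-child
```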

Service discovery and supported services

Netdata's service discovery, which is installed as part of the Helm chart installation, finds which services are running on a cluster's pods, converts that into configuration files, and exports them so they can be monitored.

Applications

Service discovery currently supports the following applications via their associated collector:

  • ActiveMQ
  • Apache
  • Bind
  • CockroachDB
  • Consul
  • CoreDNS
  • Elasticsearch
  • Fluentd
  • FreeRADIUS
  • HDFS
  • Lighttpd
  • Logstash
  • MySQL
  • NGINX
  • OpenVPN
  • PHP-FPM
  • RabbitMQ
  • Solr
  • Tengine
  • Unbound
  • VerneMQ
  • ZooKeeper

Prometheus endpoints

Service discovery supports Prometheus endpoints via the Prometheus collector.

Annotations on pods allow a fine control of the scraping process:

  • prometheus.io/scrape: The default configuration will scrape all pods and, if set to false, this annotation excludes the pod from the scraping process.
  • prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
  • prometheus.io/port: Scrape the pod on the indicated port instead of the pod’s declared ports.
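
Putting those annotations together, a pod template that exposes Prometheus metrics on a non-default path and port might be annotated like this (the path and port values are illustrative):

```yaml
# Pod template fragment: fine-tune Prometheus endpoint scraping
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/admin/metrics"   # only needed when the path is not /metrics
    prometheus.io/port: "9100"             # scrape this port instead of the declared ports
```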

Configure service discovery

If your cluster runs services on non-default ports or uses non-default names, you may need to configure service discovery to start collecting metrics from your services. You have to edit the default ConfigMap that is shipped with the Helm chart and deploy that to your cluster.

First, copy netdata-helmchart/sdconfig/child.yml to a new location outside the netdata-helmchart directory. The destination can be anywhere you like, but the following examples assume it resides next to the netdata-helmchart directory.

cp netdata-helmchart/sdconfig/child.yml .

Edit the new child.yml file according to your needs. See the Helm chart configuration and the file itself for details. You can then run helm install/helm upgrade with the --set-file argument to use your configured child.yml file instead of the default, changing the path if you copied it elsewhere.

helm install --set-file sd.child.configmap.from.value=./child.yml netdata ./netdata-helmchart/charts/netdata
helm upgrade --set-file sd.child.configmap.from.value=./child.yml netdata ./netdata-helmchart/charts/netdata

Now that you've pushed an edited ConfigMap to your cluster, service discovery should find and set up metrics collection from your non-default services.

Custom pod labels and annotations

Occasionally, you will want to add specific labels and annotations to the parent and/or child pods. You might want to do this to tell other applications on the cluster how to treat your pods, or simply to categorize applications on your cluster. You can label and annotate the parent and child pods by using the podLabels and podAnnotations dictionaries under the parent and child objects, respectively.

For example, suppose you're installing Netdata on all your database nodes, and you'd like the child pods to be labeled with workload: database so that you're able to recognize this.

At the same time, say you've configured chaoskube to kill all pods annotated with chaoskube.io/enabled: true, and you'd like chaoskube to be enabled for the parent pod but not the children.

You would do this by installing as:

$ helm install \
--set child.podLabels.workload=database \
--set 'child.podAnnotations.chaoskube\.io/enabled=false' \
--set 'parent.podAnnotations.chaoskube\.io/enabled=true' \
netdata ./netdata-helmchart/charts/netdata

Contributing

If you want to contribute, we are humbled!

  • Take a look at our Contributing Guidelines.
  • This repository is under the Netdata Code Of Conduct.
  • Chat about your contribution and let us help you in our forum!