2018-02-21

Setting up Kafka on a Raspberry Pi cluster via Ansible




After installing Docker and Kubernetes on my RPi cluster, I wanted to move on to Kafka.


Prerequisites



First of all, I assume that you have an RPi cluster already configured to work with Ansible. If not, please refer to my previous article on the subject.

I could have pointed to online archives, but since you need Java 8 and Oracle forces you to acknowledge their license, I decided to start from archives downloaded beforehand.
You need:
  1. Kafka 1.0.0
  2. Java 8 (recent update) for ARM
You will then have to fill in the vars section of the playbook with the values matching your environment.


Ansible Playbook



The goal of this playbook is to provide a way to learn about using Kafka in a cluster without having to cope with the installation. You can decide whether that suits your needs or not.
And if it does, you can still learn about the installation process simply by reading the playbook.

The playbook is reproduced here for reference, but any future modifications will only appear on my GitHub.

Zookeeper and Kafka cluster configuration



One thing to keep in mind when reading the Ansible playbook is that my nodes have fixed, consecutive IPs.
[pi-cluster]
192.168.0.23[1:4]

[masters]
192.168.0.231

[slaves]
192.168.0.232
192.168.0.233
192.168.0.234
As you can see, the last digit of my first node's IP is 1, the second one's is 2, and so on. I took that as a given when I wrote my playbook. If you use a different addressing strategy, you'll have to modify the playbook accordingly.
This playbook is not that hard, but there is one tricky part: the configuration of the ZooKeeper and Kafka clusters. I had to compose the cluster configuration from the information available in the Ansible inventory (the Ansible hosts file).

For instance, the ZooKeeper cluster config looks like:
server.1=192.168.0.231:2888:3888
server.2=192.168.0.232:2888:3888
server.3=192.168.0.233:2888:3888
server.4=192.168.0.234:2888:3888
To reach that result, here is the task I used:
- name: (ZOOKEEPER CONFIG) Adding cluster nodes to the zookeeper configuration
  lineinfile:
    path: "{{ zookeper_config }}"
    line: "server.{{ item[-1] }}={{ item + ':2888:3888' }}"
    insertafter: EOF
  with_items:
    - "{{ groups['pi-cluster'] }}"
As for the Kafka config, the ZooKeeper connection string looks like:
192.168.0.231:2181,192.168.0.232:2181,192.168.0.233:2181,192.168.0.234:2181
That was done with a simple var declaration:
"{{ groups['pi-cluster'] | join(':2181,') }}:2181"
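Outside of Ansible, those two transformations are plain string manipulation. Here is a small shell sketch (illustrative only, using the IPs from my inventory) that reproduces both the server.N lines and the zookeeper.connect string. Note that picking the last character as the id only works because every last octet is a single digit:

```shell
# Cluster node IPs, as listed in the Ansible inventory.
ips="192.168.0.231 192.168.0.232 192.168.0.233 192.168.0.234"

# ZooKeeper server lines: the server id is the last digit of each IP.
for ip in $ips; do
  id=$(echo "$ip" | sed 's/.*\(.\)$/\1/')   # last character of the IP
  echo "server.$id=$ip:2888:3888"
done

# zookeeper.connect string: join the IPs with ':2181,' and close with ':2181'.
echo "$ips" | tr ' ' ',' | sed 's/,/:2181,/g; s/$/:2181/'
```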


Idempotent, yes, but how?



One of Ansible's great strengths is that most of its modules are idempotent. Yet, in certain cases, relying on lineinfile alone was not enough: I wanted to keep the original files intact, to be able to start the configuration all over again without going through the process of copying the archives and installing them from scratch. Maybe there's a better way to do it.
If so, leave a comment or, better, open a PR!
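For the curious, the backup/restore trick in the playbook boils down to the following pattern, sketched here with a throwaway file under /tmp (the playbook does the same thing with Ansible's copy module and remote_src):

```shell
config=/tmp/demo-server.properties
echo "broker.id=0" > "$config"                      # pretend this is the pristine config

# First run only: keep a pristine copy aside.
[ -f "$config.original" ] || cp "$config" "$config.original"

# Every run then modifies the live config (the playbook uses lineinfile here).
echo "broker.id=3" > "$config"

# Any later run: restore the pristine copy before re-applying the
# modifications, so edits never accumulate across runs.
cp "$config.original" "$config"

cat "$config"   # back to the pristine content
```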

The playbook




---
- name: Set Kafka up on the RPi cluster
  hosts: pi-cluster
  remote_user: pi

  tasks:
    - block:
      - name: (INSTALL) Checking if Java is already installed
        stat:
          path: "/opt/{{ java_version }}"
        register: javadir
      - name: (INSTALL) Checking if Kafka is already installed
        stat:
          path: "/opt/{{ kafka_version }}"
        register: kafkadir
      - name: (INSTALL) Unarchiving Java and Kafka
        unarchive:
          src: "{{ item }}"
          dest: /opt
          owner: pi
          group: pi
          mode: 0755
        with_items:
          - "{{ java_installer_path }}"
          - "{{ kafka_installer_path }}"
        when: javadir.stat.exists == false or kafkadir.stat.exists == false
      - name: (INSTALL) Fixing permissions for Java (unarchive user/group modification does not work with that one)
        file:
          path: /opt/{{ java_version }}
          owner: pi
          group: pi
          mode: 0755
          recurse: yes
        when: javadir.stat.exists == false
      - name: (INSTALL) Adding symbolic link for Java
        file:
          src: "/opt/{{ java_version }}/bin/java"
          dest: /usr/bin/java
          owner: pi
          group: pi
          state: link
        when: javadir.stat.exists == false
      - name: (INSTALL) Removing Kafka "windows" directory
        file:
          path: "/opt/{{ kafka_version }}/bin/windows"
          state: absent
        when: kafkadir.stat.exists == false
      - name: (BACKUP) Checking if previous config backups already exist
        stat:
          path: "{{ item }}"
        register: backup
        with_items:
          - "{{ zookeper_config }}.original"
          - "{{ kafka_config }}.original"
      - debug:
          var: backup
      - name: (BACKUP) Making backup copies of the zookeper and kafka config files, if never been backed up before
        copy:
          src: "{{ item }}"
          dest: "{{ item }}.original"
          owner: pi
          group: pi
          mode: 0755
          remote_src: yes
        with_items:
          - "{{ zookeper_config }}"
          - "{{ kafka_config }}"
        when: backup.results[0].stat.exists == false
      - name: (BACKUP) Restoring original file to be truly idempotent
        copy:
          src: "{{ item }}.original"
          dest: "{{ item }}"
          remote_src: true
        with_items:
          - "{{ zookeper_config }}"
          - "{{ kafka_config }}"
        when: backup.results[0].stat.exists == true
      - name: (ZOOKEEPER CONFIG) Creating zookeeper work directory
        file:
          path: /var/zookeeper
          owner: pi
          group: pi
          state: directory
          mode: 0755
      - name: (ZOOKEEPER CONFIG) Replacing the default config which sets the zookeeper workdir under var
        lineinfile:
          path: "{{ zookeper_config }}"
          regexp: '^dataDir=.*$'
          line: 'dataDir={{ zookeeper_workdir }}'
      - name: (ZOOKEEPER CONFIG) Adding useful configuration
        lineinfile:
          path: "{{ zookeper_config }}"
          line: "{{ item }}"
          insertafter: EOF
        with_items:
          - "tickTime=2000"
          - "initLimit=10"
          - "syncLimit=5"
      - name: (ZOOKEEPER CONFIG) Adding cluster nodes to the zookeeper configuration
        lineinfile:
          path: "{{ zookeper_config }}"
          line: "server.{{ item[-1] }}={{ item + ':2888:3888' }}"
          insertafter: EOF
        with_items:
          - "{{ groups['pi-cluster'] }}"
      - name: (ZOOKEEPER CONFIG) Removing a previous idFile
        file:
          path: "{{ zookeeper_workdir }}/myid"
          state: absent
      - name: (ZOOKEEPER CONFIG) Creating zookeeper id file
        file:
          path: "{{ zookeeper_workdir }}/myid"
          state: touch
          owner: pi
          group: pi
          mode: 0755
      - name: (ZOOKEEPER CONFIG) Filling id file with respecting id
        lineinfile:
          path: "{{ zookeeper_workdir }}/myid"
          line: "{{ inventory_hostname[-1] }}"
          insertafter: EOF
      - name: (KAFKA CONFIG) Defining the broker ID
        lineinfile:
          path: "{{ kafka_config }}"
          regexp: '^broker.id=.*$'
          line: 'broker.id={{ inventory_hostname[-1] }}'
      - name: (KAFKA CONFIG) Setting the listen address
        lineinfile:
          path: "{{ kafka_config }}"
          regexp: '^#listeners=.*$'
          line: 'listeners=PLAINTEXT://{{ inventory_hostname }}:9092'
      - name: (KAFKA CONFIG) Setting the zookeeper cluster address
        lineinfile:
          path: "{{ kafka_config }}"
          regexp: '^zookeeper.connect=.*$'
          line: 'zookeeper.connect={{ zookeeper_cluster_address }}'
      - name: (STARTUP) Starting ZooKeeper
        shell: "nohup /opt/{{ kafka_version }}/bin/zookeeper-server-start.sh {{ zookeper_config }} &"
        async: 10
        poll: 0
      - name: (STARTUP) Starting Kafka
        shell: "nohup /opt/{{ kafka_version }}/bin/kafka-server-start.sh {{ kafka_config }} &"
        async: 10
        poll: 0

      become: true
      vars:
        installer_dir: "YourPathToTheDownloadedArchives"
        java_version: "jdk1.8.0_162"
        kafka_version: "kafka_2.11-1.0.0"
        java_installer_path: "{{ installer_dir }}/jdk-8u162-linux-arm32-vfp-hflt.tar.gz"
        kafka_installer_path: "{{ installer_dir }}/{{ kafka_version }}.tgz"
        zookeper_config: "/opt/{{ kafka_version }}/config/zookeeper.properties"
        kafka_config: "/opt/{{ kafka_version }}/config/server.properties"
        zookeeper_workdir: "/var/zookeeper"
        zookeeper_cluster_address: "{{ groups['pi-cluster'] | join(':2181,') }}:2181"

Then, to run it, use the following command:
ansible-playbook nameOfYourFile.yml --ask-become-pass
You will then be prompted for the password (that's what the --ask-become-pass option does) used to issue commands as root.

Testing the cluster


Now it's time to check that everything went smoothly. To do so, I'm going to use the console tools shipped with the distribution.

First, start a producer from any of the nodes (or even from another machine, as long as Kafka is installed on it):
/opt/kafka_2.11-1.0.0/bin/kafka-console-producer.sh \
           --broker-list 192.168.0.231:9092,192.168.0.232:9092,192.168.0.233:9092,192.168.0.234:9092 --topic testopic
This will get you a command prompt in which anything you type will be sent across the cluster.

Then, start as many consumers as you want and watch what shows up in each terminal. (You can use something like Terminator to manage multiple terminals at once, though I never managed to get it working on my Mac.)
/opt/kafka_2.11-1.0.0/bin/kafka-console-consumer.sh \
           --zookeeper 192.168.0.231:2181 --topic testopic --from-beginning
Then type something in the producer prompt, and it should be displayed on all consumer terminals.


Troubleshooting


If something does not work, I strongly suggest you start with these commands:
echo dump | nc localhost 2181
/opt/kafka_2.11-1.0.0/bin/zookeeper-shell.sh localhost:2181 <<< "get /brokers/ids/1"
Source : https://stackoverflow.com/questions/46158296/kafka-broker-not-available-at-starting

2018-02-04

Deploying Docker and Kubernetes on a Raspberry Pi cluster using Ansible


Disclaimer


I would like to thank Alex Ellis for his wonderful work in the container world.
From the moment I first read his blog, I wanted to build a RPi cluster and test stuff :)

And what you are going to find in this article is nothing more, nothing less than his work, which I "translated" into a playbook.
Why Ansible? Simply because it is probably the tool that has impressed me most these past few years with its simplicity, efficiency and power.
Nothing new, nothing fancy: just a playbook I wrote for myself and thought could be interesting to others.

For the playbook to run smoothly, I'm assuming you already have a configured RPi cluster and Ansible set up to interact with it. If not, you can still check my previous article on the topic.
If you are just willing to install Ansible, go to the dedicated section.

The playbook


This playbook, along with some others to come (I'm really eager to play with Kafka and Pulsar), is available on my GitHub.

It's here to give you a quick overview, but if changes were to occur, I don't think I would reflect them here. You've been warned.

---
- name: Install Docker and K8s
  hosts: pi-cluster
  remote_user: pi

  tasks:
  - block:
      - name: Add encryption key for the Docker and K8s repository
        apt_key:
          url: "{{ item }}"
          state: present
        with_items:
          - https://download.docker.com/linux/raspbian/gpg
          - https://packages.cloud.google.com/apt/doc/apt-key.gpg
      - name: Clean Docker and K8s repository files to be idempotent
        file:
          name: "{{ item }}"
          state: absent
        with_items:
          - /etc/apt/sources.list.d/docker.list
          - /etc/apt/sources.list.d/kubernetes.list
      - name: Recreate Docker and K8s repository files
        file:
          name: "{{ item }}"
          state: touch
        with_items:
          - /etc/apt/sources.list.d/docker.list
          - /etc/apt/sources.list.d/kubernetes.list
      - name: Add Docker and K8s repository to the list of repositories
        lineinfile:
          path: /etc/apt/sources.list.d/{{ item.category }}.list
          line: "{{ item.url }}"
        with_items:
          - { url: 'deb [arch=armhf] https://download.docker.com/linux/raspbian stretch stable', category: 'docker'     }
          - { url: 'deb http://apt.kubernetes.io/ kubernetes-xenial main'                      , category: 'kubernetes' }
      - name: Install packages to allow apt to use HTTPS repositories
        apt:
          name: "{{ item }}"
          state: present
        with_items:
          - apt-transport-https
          - ca-certificates
          - software-properties-common
      - name: Update list of available repositories
        apt:
          update_cache: yes
      - name: Update all packages to the latest version
        apt:
          upgrade: dist
      - name: Install Docker and K8s binaries
        apt:
          name: "{{ item }}"
          state: present
        with_items:
          - docker-ce
          - kubelet
          - kubeadm
          - kubectl
          - kubernetes-cni
      - name: Turn off swap
        shell: dphys-swapfile swapoff && dphys-swapfile uninstall && update-rc.d dphys-swapfile remove
      - name: Activating cgroup
        lineinfile:
          path: /boot/cmdline.txt
          backrefs: true
          regexp: '^(.*rootwait)$'
          line: '\1 cgroup_enable=cpuset cgroup_memory=1'
      - name: Rebooting
        shell: reboot now
        ignore_errors: true
    become: true

Then, to run it, use the following command:
ansible-playbook nameOfYourFile.yml --ask-become-pass
You will then be prompted for the password (that's what the --ask-become-pass option does) used to execute commands as root.

Once you've run it (without any errors I hope), Docker and Kubernetes should be installed and ready to use.

Have fun!

2018-02-03

Building a small Raspberry Pi cluster and configure Ansible to administer it



Lately, everybody has been talking about microservices, Docker, Kubernetes, Kafka, Pulsar, ...
All those subjects are highly appealing, but to really learn them you eventually have to test them under real conditions. By "real" I mean that you can, for instance, learn K8s with minikube, but seeing how it reacts when a node goes down is something you have to experience first-hand to truly grasp it. VMs are cool, but RPis are fun!

For that article, I was inspired by Alex Ellis' blogpost.

Gathering the hardware


As you can see in the first picture, my cluster is composed of 4 units. I bought two Raspberry Pi 3s and recycled two Raspberry Pi 2s.
Four is the perfect number to me: with one RPi as the master, you are left with 3 units, which allows multiple use cases to test.
Two is too limited. More is better, but also more expensive.

As I already have too many cables running around, I also bought two WiFi dongles for my RPi2.



To power the whole thing, I found a cool power supply from RavPower.



And of course, a pack of USB cables


Once your cluster is ready from a hardware point of view, it's time to proceed to the software part.

Prepare Raspbian



For that part, it's pretty much the same as Alex's.

First, just get the Raspbian Stretch Lite image :
https://www.raspberrypi.org/downloads/raspbian/

Then flash it with Etcher.io

Don't put it in your RPi yet! The first thing to do, before booting, is to create an empty "ssh" file in the boot partition. This will allow you to log in remotely.
sudo touch /boot/ssh
Then put the card inside the RPi and boot it.

Configure Raspbian



Keyboard layout (optional) / hostname / WIFI



As I'm French, my keyboard layout is AZERTY. It's really easy to switch from the default QWERTY to the desired layout.
In fact, the next three steps are all done in the Raspbian configuration menu.
sudo raspi-config

The 4th item, "Localisation Options", is where you can select the keyboard layout you want.

Network 1/2


As you can guess from the screenshot above, the menu "Network Options" (second item) will allow you to configure the hostname, the WiFi network and change the user password.
Configuring the WiFi access will update the file /etc/wpa_supplicant/wpa_supplicant.conf with your configuration.

Once you're done, you'll be asked if you want to reboot; click "Yes".

Network 2/2


To make administering the cluster easier, we're going to set a fixed IP for each node.
To do so, just update the file /etc/dhcpcd.conf following the example below, making sure it reflects your own network configuration.
sudo vi /etc/dhcpcd.conf

interface wlan0
static ip_address=192.168.0.23X
static routers=192.168.0.1
static domain_name_servers=192.168.0.1
When complete, reboot once again (this is the last time).

SSH key-based authentication



To let Ansible interact with your nodes, you have to set up SSH key-based authentication on each one.
ssh-copy-id pi@192.168.0.23X
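Note that ssh-copy-id assumes a key pair already exists; if you don't have one, generate it first. A minimal sketch (using a throwaway /tmp path so it won't touch an existing ~/.ssh key):

```shell
# Generate a passphrase-less ed25519 key pair at a demo location.
rm -f /tmp/demo_key /tmp/demo_key.pub
ssh-keygen -t ed25519 -N "" -q -f /tmp/demo_key

# You would then point ssh-copy-id at it, e.g.:
#   ssh-copy-id -i /tmp/demo_key.pub pi@192.168.0.23X
ls /tmp/demo_key /tmp/demo_key.pub
```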
Try to connect to your node(s): you shouldn't be prompted for a password.
ssh pi@192.168.0.23X
Okay, we're ready to let Ansible join the party.
(I'm expecting you to have it installed already. If not, simply follow the official documentation.)

If you're using a Linux OS, your inventory file should be /etc/ansible/hosts and you may then skip the next part.
Since I'm working on a Mac, I configured it manually.

First, edit or create a file ~/.ansible.cfg
# config file for ansible -- https://ansible.com/
# ===============================================

# nearly all parameters can be overridden in ansible-playbook
# or with command line flags. ansible will read ANSIBLE_CONFIG,
# ansible.cfg in the current working directory, .ansible.cfg in
# the home directory or /etc/ansible/ansible.cfg, whichever it
# finds first

[defaults]

inventory=/your/path/to/ansible/hosts
Then, once you've defined where to find it, edit it, paste the following configuration and update the IP values.
[pi-cluster]
192.168.0.23[1:4]

[masters]
192.168.0.231

[slaves]
192.168.0.232
192.168.0.233
192.168.0.234
Create a playbook (a simple YAML file) test-cluster.yml with the following content.
---
- name: Test my raspberry cluster
  hosts: pi-cluster
  remote_user: pi

  tasks:
  - name: Check Rasp model
    command: cat /proc/device-tree/model
    register: piversion
  - name: Debug each entry of my hosts
    debug:
      msg: "System {{ inventory_hostname }} is a {{ piversion.stdout }}"
And then just execute it:
ansible-playbook test-cluster.yml 
You should have a result such as the one below.
PLAY [Test my raspberry cluster] ***************************************************

TASK [Gathering Facts] *************************************************************
ok: [192.168.0.233]
ok: [192.168.0.232]
ok: [192.168.0.234]
ok: [192.168.0.231]

TASK [Check Rasp model] ************************************************************
changed: [192.168.0.233]
changed: [192.168.0.232]
changed: [192.168.0.234]
changed: [192.168.0.231]

TASK [Debug each entry of my hosts] ************************************************
ok: [192.168.0.231] => {
    "msg": "System 192.168.0.231 is a Raspberry Pi 2 Model B Rev 1.1\u0000"
}
ok: [192.168.0.232] => {
    "msg": "System 192.168.0.232 is a Raspberry Pi 3 Model B Rev 1.2\u0000"
}
ok: [192.168.0.233] => {
    "msg": "System 192.168.0.233 is a Raspberry Pi 3 Model B Rev 1.2\u0000"
}
ok: [192.168.0.234] => {
    "msg": "System 192.168.0.234 is a Raspberry Pi 2 Model B Rev 1.1\u0000"
}

PLAY RECAP *************************************************************************
192.168.0.231              : ok=3    changed=1    unreachable=0    failed=0   
192.168.0.232              : ok=3    changed=1    unreachable=0    failed=0   
192.168.0.233              : ok=3    changed=1    unreachable=0    failed=0   
192.168.0.234              : ok=3    changed=1    unreachable=0    failed=0   
And you're done: your RPi cluster is all set up, and Ansible is ready to perform various actions on it.