Compile Ceph (master) on ARM (32-Bit)

I gave up on getting Ceph run on ARM 32 bit. It was a huge effort to fix the types, that diverge when switching from 64 to 32 bit. The development of Ceph is simply to fast to cope with. Since the developers decided to drop all tests for 32 bit builds, the needed fixes are too many for a single person to hunt after it.

TODO: Test this all on a virgin armhf system (raspberry, odroid hc1/2/XU4,…) and complete the TODOs for openssl and phantomjs (and the sass-dependency). Maybe with the new master tree, it is not needed to build it outside the ceph repo.

First install prerequisites:

sudo apt install python-pip build-essential libgmp-dev \
libmpfr-dev libmpc-dev reprepro

Install nodejs from nodejs.org

curl -sL https://deb.nodesource.com/setup_11.x | sudo -E bash - sudo apt-get install -y nodejs
sudo npm install -g npm

Then prepare a swap partition (you will need it 😉 )

dd if=/dev/zero of=/<some-hdd-path>/swapfile \
bs=1M count=8192 progress=status
mkswap /<some-hdd-path>/swapfile
swapon /<some-hdd-path>/swapfile

Then we should install some dependencies

sudo apt install libgmp-dev libmpfr-dev libmpc-dev ruby

Now install a new GCC that supports C++17.

wget https://ftp.gnu.org/gnu/gcc/gcc-8.2.0/gcc-8.2.0.tar.xz
tar xfJ gcc-8.2.0.tar.xz
cd gcc-8.2.0
./configure # for armhf
# ./configure --disable-multilib # for x86_64/arm64
make

Building ceph with do_cmake, building a debian package with make-debs.sh or simply build packages using another compiler than the debian default one (6.3.0) requires you to change the default compiler e.g. to gcc-8.2.0 for the whole system:

sudo update-alternatives --install /usr/bin/cc cc /usr/local/gcc-8.2/bin/gcc-8.2 50
sudo update-alternatives --install /usr/bin/c++ c++ /usr/local/gcc-8.2/bin/g++-8.2 50

Checkout OpenSSL-1.0.2-stable (seems also necessary for armhf), PhantomJS, compile and install it:

cd /opt/GIT
git clone git@github.com:openssl/openssl.git
cd openssl
git checkout OpenSSL-1_0_2-stable
...TODO...
# Following seems only necessary on arm
# (or all platforms wihtout precompiled binary)
cd /opt/GIT
git clone git@github.com:ariya/phantomjs.git
cd phantomjs
...TODO...
sudo LD_LIBRARY_PATH=/opt/openssl_build_stable/lib/ \
deploy/package.sh --bundle-libs

Add the following to build.py (at L:244, just after PlatformOptions.extend)

phantom_openssl = os.getenv("PHANTOM_OPENSSL_PATH", "")
if phantom_openssl != "":
openssl = os.putenv("OPENSSL_LIBS", "-L" + phantom_openssl + "/lib -lssl -lcrypto")
openssl_include = "-I" + phantom_openssl + "/include"
openssl_lib = "-L" + phantom_openssl + "/lib"
platformOptions.extend([openssl_include, openssl_lib])
print("Using OpenSSL at %s" % phantom_openssl)

Then install it to /opt

Build and compile Ceph

git clone git@github.com:the78mole/ceph.git
cd ceph
git checkout wip-32-bit-arm-fixes
./install-deps.sh
./do_cmake_arm32.sh # for armhf
# ./do_cmake.sh # for x86_64/amd64 or arm64
cd build
make -j4
# if it gets really slow due to swapping, break an do make -j1
# or use the scheduler-script from link below

Here you can find a rudimentary (but working) script that suspends compilers processes based on total compilers memory consumption. Running it through ‚watch‘-tool you can start e.g. 8 tasks and when memory limit is reached, it will suspend (kill -TSPT) the youngest tasks in sense of user space runtime.
https://github.com/the78mole/scripts/blob/master/linux/bash/schedule_compile.sh

Now do…

cd ..   # Back to ceph base dir
./make-debs-arm32.sh # for armhf
# ./make-debs.sh # fox x86_64/amd64 or arm64

If you encounter problems with setuptools (Exception –> TypeError: unsupported operand type(s) for -= ‚Retry‘ and ‚int‘) try to get a more recent version of python pip with the following commands and rerun make-debs-arm32.sh.

apt-get remove python-pip python3-pip
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py
python3 get-pip.py

If I forgot anything to make it work, feel free to write some comment…

How to Build A Private Storage Cluster (with Ceph)

Finally, I did not succeed in getting Ceph running on a 32 bit ARM. There have been to many issues in the code (especially incompatible datatypes) and issues with GCC and the 3 GB RAM limit for 32 bit platforms. I’m now focusing on using a 64 bit ARM and developing a dedicated HW for it. So, stay tuned…

Since my NAS (a QNAP TS-419P II) get more and more buggy, especially with non-working Windows shares and the painfully low processing power of the integrated ARM single core, wished something like a SAN for myself. But SAN is quite expensive, the peripherial hardware (Switches, UPS,…) not included. So I decided to skip a few levels and build up a NAS 2.0 storage cluster based on open source ceph using low-budget ODROID HC2 (Octa-Core 4 x Cortex-A15 + 4 x Cortex-A7) from Hardkernel as the work horse to create storage nodes. To make it even more dense, you can use the ODROID HC1 that is just the same but for 2.5″ disks (be aware of the power supply: HC2 = 12V, HC1 = 5V !!!).

If you don’t need a SATA drive (e.g. for the controlling nodes of the cluster: mgr, metadata, nfs, cifs,…), you can use the MC1, MC1 solo, XU4 or XU4Q.

If you want to go with x86 instead of ARM, the ODROID H2 looks like a great alternative, but it will also be a bit more expensive (e.g. RAM is not included).

In fact, installing ceph will be much less pain, going for 64-bit x86 than going with ARM 32 bit. I decided to go with ARM 32, because I want to build up the most energy efficient cluster, to maximize scale out capabilities also in sense of my private budget.

To build up the cluster, I currently use 4 x ODRID HC2 with WD Red 4 TB drives (WD40EFRX), also installing the ceph non-OSD-services distributed accross this little cluster. The BOM for my test cluster is as follows:

If powering up the cluster in sequence (not all at once), you could reduce the power requirements of the supply component a lot (currently 12V/2A per node, 5V/4A for HC1). I will dive into this topic a bit deeper in future. I think, it can be done in software by delaying the spin up through some bootarg. Nevertheless, an optimum solution would be to have a power distribution unit for switching and measuring the supply current and also providing some UPS capabilities on the low voltage path. Additionally, current measuments could give you the ability to regulate the power through e.g. cpufreq to optimize the efficiency of the cluster and the power supply.

To generate the debian packages for installing ceph on the nodes, follow the instructions here. When you have built the debian packages, move them over to some http(s) server, to be easily accessible by your nodes.

Your Own Debian Package Repository

To be accessible, a Debian Package Repository needs to be placed in a webserver’s directory accessible at least in your own network. It is best practice to secure this repository with SSL, since Debian APT more or less expects this… So first we start with creating a (self signed CA). Later, if needed, you can easily replace the certifciate by an official one or let an authorithy also sign your server certificate.

Generating a CA for SSL

This part is based on the tutorial here. First, we will simply use self signed certificates, since it is much easier and faster than using officially signed certificates. We will then place the CA in the cert storage of our linux OS, to make it trust ourself. 😉

mkdir ~/CA
cd ~/CA
# Generate the CA key
openssl genrsa -out ca.key 4096
# Generate the CA certificate, here, you can leave the CN empty
openssl req -new -x509 -key ca.key -days 366 -out ca.crt
# Make it unaccessible by other users
chmod 700 ca.key
# Generate a certificate configuration
wget https://raw.githubusercontent.com/the78mole/scripts/master/templates/configs/ssl/cert.conf -O example.org.conf
# Edit the configuration
vi example.org.conf
# Create a server certificate key and the signing request (not the yet cert)
openssl req -new -out example.org.csr -config example.org.conf
# Create the public key
openssl rsa -in example.org.key -pubout -out example.org.pubkey
# Sign the CSR with your CA and create the certificate
openssl x509 -req -in example.org.csr -CA ca.crt -CAkey ca.key -CAcreateserial -extensions my_extensions -extfile example.org.conf -days 366 -out example.org.crt

To get the alternative DNS names and IPs added to the certificate, you need to specify the config file as an extensions and point to the config section, where the extensions are located. This is because the extensions in the CSR get ignored by openssl when signing and you need to specify it explicitly.

After generating the certificate, you need to import it, where you need it to be accepted (Browser, APT). For testing, it is best to try with a browser. Some tutorial can be found here (it’s german, so use google translator, to read in english) and here. Use the shell of your desktop Debian system.

scp <CA_HOST>:/<PATH_TO_CA>/ca.crt example_ca.pem
sudo cp example_ca.pem /usr/local/share/ca-certificates/
sudo update-ca-certificates

To add the certificate to your browser, e.g. chromium

sudo apt install libnss3-tools
certutil -A -n "Example Company CA" -t "TCu,Cu,Tu" -i example_ca.pem -d ~/.pki/nssdb

Note: Maybe this does not work correctly… Then, in Chromium, use Settings –> Privacy and Security –> Manage Certificates –> Import –> Select the CA –> Check all boxes.

Now we need the CA’s and the server’s certificate along with the server key for securing webserver traffic.

Install and Configure the Webserver

Welp, we will use nginx as our webserver. Feel free to use any other, it does not really matter. In fact, every further step (e.g. the let’s encrypt tutorial) will be based on nginx.

sudo apt install nginx
cd /etc/nginx
cp snippets/snakeoil.conf snippets/ssl_example.org.conf
mkdir -p ssl/pub
mkdir -p ssl/priv
sudo chown -R root:www-data ssl
sudo chmod -R 0755 ssl/pub
sudo chmod -R 0750 ssl/priv
cp ~/CA/ca.crt ~/CA/example.org.crt
cp ~/CA/example.org.key
# Edit the ssl config file to your needs
vi snippets/ssl_example.org.conf
# Now adjust the nginx configuration to use SSL
vi sites-enabled/default
# Ensure following lines are added and not commented out
# listen 443 ssl default_server
# listen [::]:443 ssl default_server
# include snippets/ssl_example.org.conf
service nginx restart

When everything is OK, use your desktop web browser and point it to the location https://example.org. Your should get the page without an error. Thos means, you have setup a CA you can use to sign server certificates and they get trusted.

If you plan to use the Debian package repository on many of your linux hosts, then you should add your CA certificate to the certificate store on all the machines.

Generating GnuPG Key-Pair

To sign a file, email, hash, debian package, repository,… you often need GnuPG. To be able to sign something, you need to first generate your own key, that get trusted from at least the receiving party. All this works again with asymmetric encraption, like the signing of certificates does. An in depth tutorial with links to even deeper knowledge can be found here.

First we should install a tool to gather some entropy, otherwise gnupg may be not able to generate a key on a headless system (no real user input,… –> very few entropy sources).

apt install rng-tools

IF it can not find a hw-rng, you can still try to get randomsound working (if you have a soundcard…)

apt install randomsound

Run this in a seperate window, when gpg is collecting entropy for too long. It will abort after some time, if it can not generate the key.

arecord -l # Do you have any soundcard?
randomsound -v

If all fails, you can still pipe some data into /dev/random to feed the entropy pool, e.g. with (also in a seperate window when gpg gen-key is running.

sudo dd if=/dev/sda of=/dev/random status=progress

You can watch the entropy-pool with:

watch -n 0.5 cat /proc/sys/kernel/random/entropy_avail

To finally generate a GPG-key, simply follow the instructions below:

apt install gnupg
# Create the .gnupg directory easily and add a secure configuration
gpg --list-keys --fingerprint
wget https://raw.githubusercontent.com/the78mole/scripts/master/templates/configs/gnupg/gpg.conf -O ~/.gnupg/gpg.conf
gpg --full-gen-key
# Select:
# Key type : RSA and RSA
# Keysize : 4096
# Expiration: 1y
# Then enter your name and email, but don't include a comment
# Skipping the password makes CI much easier, but less secure...
# It will take some time (maybe minutes) to generate the key

Creating the debian repository (reprepro)

make-debs already created a debian repository, but we will create one, that is more general, also serving well for other software packages. make-deps

…. to be continued …

Coming soon: To add some real NAS features, we could use just another embedded board with e.g. FreeNAS or NextCloud installed to mount the cluster file system and using the cluster as the storage backend. We already have the nginx SSL configured, so we easily can add reverse proxy targets… (for HTTPS-HTTPS-proxy, see here)

Sorting Your Digital Mess – How to Easily Set-Up a Private Search Engine

Motivation

In my vacations, I kicked-off some new projects. One of it is an ARM64 based SBC with integrated SATA mostly like the Odroid-HC1/2. The main difference is the ARM (64 vs. 32 Bit) to be able to run Ceph on it.

If you read some of my previous posts, you will notice, that I already put in some effort to make Ceph run on 32-bit controllers. But the effort was way to high, to make and keep it running. But the new SBC is another story, I will tell, when the time is ripe for it.

When I thought of putting all my data on a Ceph cluster, I just stumbeld over the probem, how to manage all these bits and bytes and how to keep the overview. The should be something like a Google-Search for your private data. When you dig through the net, you will find many sites, blogs and books about elastic search. But there is always a problem, how to get your data in there without a deep knowledge on data providers, ETL, graphs,… But you could also find OpenSemanticSearch (OSS) and I gave it a try.

A Private Search Engine – Open Semantic Search

To be honest, OpenSemanticSearch is not the most beatiful engine you could think of, but it gives you a very deep insight on your data, assuming you configured it correctly. Unfortunately, the documentation is quite sparse and there have not been many bloggers writing on the topic, at least not for some current version of it.

After you finished Installation, first do some deeper configuration. If you choose to use a VM, I suggest, to usa a RAM centric configuration. Better take 12+ GB of RAM and only 4 cores. Every core will lead to an etl-task that eats up a significant amount of RAM, depending on the data it is advised to index.

To avoid links of the WebUI to point you to some unreachable destination, do some configuration ahead of adding local crawler paths. If you mount your data using mount to a path in your filesystem on the OSS server, the link will be provided unchanged (e.g. /mnt/myData) and the URL will just be the location it is on your server without http, hostame, location-suffix,… Not really helpful, if you have mounted NFS shares and want to access it on a remote windows machine.

OSS provides already some document server, that proxies your requests to HTTP, but you need to configure it correctly. But if you do not configure it in the beginning, your whole indexing run will create wrong links and it will be hard to stop the indexing and do a rerun (see the troubleshooting section below). The task will be running, until it has done it’s job. For terabytes of data, this will last at least some days.

Installation and Configuration of OSS

Installation of OSS is quite straight forward, as described on their web page.

Configuration is a bit more complex due to the fact, that the documentation is a bit sparse. Some of it can be done with the web UI, but others are a bit more covered. But first things first.

Mounting Your Data

The first to do is, to make your data accessible to the indexer.

[TODO]

Configuring Apache to Proxy the Documents

The quick hack is, to add your path to proxy in /etc/apache2/sites-available/000-default.conf

Insecure Proxy for documents

This allows the documents to be accessed throuch http://<IP_OF_YOUR_OSS>/documents/mnt but also through http://<IP_OF_YOUR_OSS>/mnt. We will secure this later on. But for debugging, this is quite helpful.

To get the links in OSS right, you need also to adjust /etc/opensemanticsearch/connector-files to have a line with the following content

config['mappings'] = { "/": "http://172.22.2.108/documents/" }

You can also add more mappings, but this helps to get it right for the links in OSS Web UI.

(Re-)Starting the OSS server

I would advise to do a simple reboot of your server, to also check, if the shares get mounted correcly. For my debian buster, this is not the case. It is failing to mount the NFS shares for some reason, I have not yet digged down. I manually mount it, after it has startet. I assume, the systemd start dependencies are a bit buggy.

[Much more to write on… Coming soon]

Troubleshooting

So, what to do, if something goes wrong? E.g. if your decided to index a ton of data, it is time to purge the queue. But how?

Purging the Queue

OpenSemanticSearch uses RabbitMQ to organize the indexing tasks. If OSS decides to index a path, it will simply list all files in this path and put it in the open_semantic_etl_tasks-Queue of your RabbitMQ server. But the user interface of OSS does not provide a means of purging or deleting the content of the queue. But therefore, you neet to activate the rabbitmq management web ui.

sudo rabbitmq-plugins enable rabbitmq_management

After this, you can check, if your server listens on the apropriate interface (0.0.0.0 for acces from another host) and port (15672)

netstat -nlpt output of a typical OSS host with enabled RabbitMQ Web UI

This checked, we need to add a user with administrative rights to the OSS worker.

rabbitmqctl cluster_status   # Check if everything is fine
rabbitmqctl list_users       # Check if your webuser doesn't exist
rabbitmqctl add_user webuser <PASSWORD>
rabbitmqctl set_user_tags webuser administrator
rabbitmqctl set_permissions -p / webuser "." "." "."

This done, you can simply acces the user interface. Type in your browser http://<IP_OF_OSS_HOST>:15672/ and the login will be presented to you.

RabbitMQ Web Login

After loging in, head over to the queues and enter your open_semantic_etl_task-queue.

Queues view of RabbitMQ web UI

You should be presented with the queue details

Queue details of open_semantic_etl_tasks

In my example, you can see a very limited count of ready tasks. This can go up to a few thousand tasks when indexing a large directory with many files. Each file will add a task to this queue and RabbitMQ will hand this out to the etl workers of OSS.

To delete all messages in the queue, you can simply hit the Purge Messages button.

Purging the queue

After this, you can re-enqueue the index files on your OSS search site through http://<IP_OF_OSS_HOST>/search-apps/files/

Cookie Banner von Real Cookie Banner