Backup of Cloudant databases to IBM Cloud Object Storage in Kubernetes

Ong Khai Wei
May 1, 2020

More than a year ago, I wrote a post about backing up JSON documents stored in Cloudant to IBM Cloud Object Storage. The backup task was scheduled using IBM Cloud Functions.

I was glad to learn that the post was useful to someone who is trying to incorporate the asset into their project. Each project has its own backup requirements, for example backing up multiple databases or backing up a large Cloudant database. IBM Cloud Functions is a serverless service with an execution time limit of 600 seconds (10 minutes), so in these cases the asset needs some modification to meet the project requirements.

Instead of using IBM Cloud Functions, we can use the Kubernetes Job feature, which allows a task to be run inside a Kubernetes cluster. A few steps are needed to modify the existing asset to achieve this:

Step 1: Modify the Node.js code

IBM Cloud Functions runs Node.js code by invoking the exported main function, as follows:

exports.main = couchBackupAction;

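The code is modified to read a JSON file and loop over its entries to back up multiple databases, as follows:

const fs = require('fs');

// Read the array of backup configurations mounted from the ConfigMap.
fs.readFile('/tmp_config/params.json', 'utf8', function(err, params) {
  if (err) throw err;
  const configs = JSON.parse(params);

  // Start one backup per configured database.
  configs.forEach(config => {
    couchBackupAction(config);
  });
});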
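
One thing to watch with the loop above: if couchBackupAction is asynchronous, forEach starts every backup at once and a failure is not reported back to Kubernetes. Below is a minimal sketch of a sequential variant, assuming couchBackupAction returns a Promise (an assumption, the actual code may differ). The runBackups helper and the stub action are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: runs one backup per config, one at a time,
// and collects the keys of the configs that failed.
async function runBackups(configs, backupAction) {
  const failed = [];
  for (const config of configs) {
    try {
      await backupAction(config); // wait before starting the next database
    } catch (err) {
      console.error(`Backup failed for key ${config.key}: ${err.message}`);
      failed.push(config.key);
    }
  }
  return failed;
}

// Stub standing in for couchBackupAction so the sketch runs on its own.
const stubAction = async (config) => {
  if (!config.cloudant_url) throw new Error('missing cloudant_url');
};

// Made-up sample configs: the second one is deliberately incomplete.
const demo = runBackups(
  [
    { key: 'db1.json', cloudant_url: 'https://example.invalid/db1' },
    { key: 'db2.json' }
  ],
  stubAction
);
demo.then((failed) => console.log('failed backups:', failed));

// In the real entry point, a non-zero exit code would let the Job
// record the run as failed, for example:
//   runBackups(configs, couchBackupAction)
//     .then((failed) => { if (failed.length > 0) process.exitCode = 1; });
```

Running the backups one after another is slower than starting them all together, but it keeps memory and temporary disk usage bounded to one database at a time.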

Step 2: Dockerfile

Although IBM Cloud Functions executes the code in a container, it does so on the fly using the Node.js runtime, without the need to build a container image up front (IBM Cloud Functions supports running container images as well). For Kubernetes, I wrote a simple Dockerfile to build the container image.

FROM node:10

# Create app directory
WORKDIR /usr/src/app

# Install app dependencies
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available (npm@5+)
COPY package*.json ./
RUN npm install

# Bundle app source
COPY . .

CMD [ "npm", "start" ]

Step 3: Kubernetes YAML files for ConfigMap and Job

IBM Cloud Functions injects “parameters” into each execution to avoid hardcoded values inside the application and to increase portability. In this case I store the parameters in a ConfigMap as a JSON array. If there is 1 database to back up, the array contains just 1 parameter object. If there are multiple, we just need to duplicate the object and modify it accordingly.

ConfigMap — configmap.yaml

kind: ConfigMap
apiVersion: v1
metadata:
  name: configmap
data:
  params.json: |
    [
      {
        "bucket": "<BUCKET_NAME>",
        "key": "<KEY>",
        "cloudant_url": "<CLOUDANT_DB_URL>",
        "config": {
          "endpoint": "<IBM_COS_ENDPOINT>",
          "apiKeyId": "<IBM_COS_APIKEY>",
          "ibmAuthEndpoint": "https://iam.ng.bluemix.net/oidc/token",
          "serviceInstanceId": "<RESOURCE_INSTANCE_ID>"
        }
      },
      {
        "bucket": "<BUCKET_NAME>",
        "key": "<KEY>",
        "cloudant_url": "<CLOUDANT_DB_URL>",
        "config": {
          "endpoint": "<IBM_COS_ENDPOINT>",
          "apiKeyId": "<IBM_COS_APIKEY>",
          "ibmAuthEndpoint": "https://iam.ng.bluemix.net/oidc/token",
          "serviceInstanceId": "<RESOURCE_INSTANCE_ID>"
        }
      }
    ]
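
Because every placeholder in the ConfigMap above has to be filled in before the Job runs, it can help to check the parsed array at startup. The following is a minimal sketch of such a check. validateConfigs is a hypothetical helper, not part of the original asset, and the required field names are simply the ones that appear in params.json above.

```javascript
// Field names taken from the params.json shown in the ConfigMap.
const REQUIRED = ['bucket', 'key', 'cloudant_url', 'config'];
const REQUIRED_COS = ['endpoint', 'apiKeyId', 'ibmAuthEndpoint', 'serviceInstanceId'];

// Hypothetical helper: returns a list of readable problems (empty if none).
function validateConfigs(configs) {
  if (!Array.isArray(configs)) {
    return ['params.json must contain a JSON array'];
  }
  const problems = [];
  configs.forEach((entry, i) => {
    if (entry === null || typeof entry !== 'object') {
      problems.push(`entry ${i}: must be a JSON object`);
      return;
    }
    for (const field of REQUIRED) {
      if (!entry[field]) problems.push(`entry ${i}: missing "${field}"`);
    }
    for (const field of REQUIRED_COS) {
      if (!entry.config || !entry.config[field]) {
        problems.push(`entry ${i}: missing "config.${field}"`);
      }
    }
  });
  return problems;
}

// Example with one incomplete entry (all values are made up).
const sample = JSON.parse(
  '[{"bucket": "my-bucket", "key": "db1.json", "config": {"endpoint": "s3.example.invalid"}}]'
);
console.log(validateConfigs(sample));
```

Calling this right after JSON.parse and stopping when the list is not empty would surface a misconfigured ConfigMap before any backup starts.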

Job — job.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: cloudantbackup
spec:
  template:
    spec:
      volumes:
        - name: config
          configMap:
            name: configmap
      containers:
        - name: cloudantbackup
          image: cloudantbackup:1
          volumeMounts:
            - name: config
              mountPath: /tmp_config
      restartPolicy: Never
  backoffLimit: 1

This job.yaml is a simple definition for a one-time execution. To run the backup on a schedule, refer to CronJob in the Kubernetes documentation: https://kubernetes.io/docs/tasks/job/
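
To illustrate how the Job definition above relates to a CronJob, here is a rough sketch that wraps the same Job spec in a CronJob manifest and prints it as JSON. The toCronJob helper is a hypothetical name, the daily 02:00 schedule is only an example, and the batch/v1 API version assumes Kubernetes 1.21 or later (older clusters use batch/v1beta1).

```javascript
// Job spec copied from job.yaml above.
const jobSpec = {
  template: {
    spec: {
      volumes: [{ name: 'config', configMap: { name: 'configmap' } }],
      containers: [{
        name: 'cloudantbackup',
        image: 'cloudantbackup:1',
        volumeMounts: [{ name: 'config', mountPath: '/tmp_config' }]
      }],
      restartPolicy: 'Never'
    }
  },
  backoffLimit: 1
};

// Hypothetical helper: wraps a Job spec in a CronJob manifest.
function toCronJob(name, schedule, spec) {
  return {
    apiVersion: 'batch/v1', // assumption: Kubernetes 1.21+; older clusters use batch/v1beta1
    kind: 'CronJob',
    metadata: { name },
    spec: {
      schedule,                // standard cron expression
      jobTemplate: { spec }    // the Job spec runs on every tick
    }
  };
}

// Example schedule (an assumption): every day at 02:00.
const cronJob = toCronJob('cloudantbackup', '0 2 * * *', jobSpec);
console.log(JSON.stringify(cronJob, null, 2));
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.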

Please note that the volume mount path cannot be /tmp, as the couchbackup package writes temporary files to that file system.

Step 4: Test

To deploy the ConfigMap and the Job

kubectl apply -f configmap.yaml
kubectl apply -f job.yaml

To check the status of the job

kubectl get jobs

To check the logs of the job

kubectl logs job.batch/cloudantbackup

Full source is available in my GitHub repo: https://github.com/ongkhaiwei/couchbackupk8sjob
