addie

Initial Docker Image Build

Conventionally, the ADDIE (ADvanced DIffraction Environment, available at addie.ornl.gov) service is deployed on a VPS (specifically, an instance on the ORNL-hosted research cloud). The service and all of its configuration are therefore locked to that machine, which makes transfer and maintenance painful. Maintenance is the bigger problem: whenever we add new functionality to ADDIE, we almost always need new dependencies. These can conflict with the modules and libraries already installed, so we often have to test and work out a set of mutually compatible versions (for both the existing modules and the new ones). Doing this directly on the VPS is risky, because once a newly installed module breaks others, it is very difficult, and sometimes impossible, to roll back.

In the long run, then, we need to containerize the service, e.g., with Docker, so that the whole service is self-contained and, more importantly, testing becomes much easier. We just pull the Docker image, fire up a container, and perform the installation and testing interactively. Once finished, we push the image as a new version and deploy it on the VPS via Docker. This blog keeps a record of how the Docker image for ADDIE was prepared. The procedure may not be optimal, as the final image is ~10 GB in size. But it at least gives us a working image from which we can easily fire up an ADDIE service instance on any supported VPS.

N.B. The Docker image is large because it includes all the required modules and configurations. We could instead do the installation and configuration in a Docker entrypoint or startup script, but I will not go that route here: the image is huge, but the startup script stays very simple.

Pull a base Docker image to start from, e.g.,
```
 docker pull ubuntu
```
Run the pulled docker image interactively,
```
 docker run -i -t ubuntu bash
```
where ubuntu is the name of the pulled Docker image.
Within the interactive docker container, install all the necessary packages,
```
 apt update
 apt install git
 apt install gfortran
 apt install build-essential
 apt install vim
 apt install wget
 apt install curl
 apt install sshfs
 wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -P /tmp
 bash /tmp/Miniconda3-latest-Linux-x86_64.sh
 apt install python-wxtools
 wget https://github.com/AdvancedPhotonSource/GSAS-II-buildtools/releases/download/v1.0.1/gsas2full-Latest-Linux-x86_64.sh -P /tmp
 bash /tmp/gsas2full-Latest-Linux-x86_64.sh -b -p ~/g2full
 apt install redis-server
```
GSAS-II needs quite a few compiled modules to run, and those modules are pre-compiled against a specific Python version. However, for the reason given further below, we have to stay with Python=3.7, which no longer works with the latest compiled GSAS-II modules. So, after installing the full GSAS-II package, we need to remove all the pre-compiled modules in the /root/g2full/GSAS-II/GSASII-bin directory and replace them with the unzipped files from this link. The files there were compiled against an older version of Python and work fine with Python=3.7.

curl and python-wxtools are needed for running GSAS-II. Without python-wxtools, GSAS-II will complain that wx cannot be found.

GSAS-II also needs libgfortran.so.4 to run, and this library is usually not available in the base image. We can download the file here and put it somewhere in the Docker container (e.g., /usr/lib64); we will use this location later on.
Clone the source code of ADDIE (assuming we are located at / in the docker container on the command line),
```
 git clone https://code.ornl.gov/general/tsitc.git
 cd tsitc
 git checkout -b docker_new remotes/origin/docker_new
```
Cloning requires an access token for login. To create one, go to Settings -> Access Tokens -> Add new token, select the proper permissions and an expiration date, and then copy the generated token.

Also, we need to let the system remember the Git credentials so that we don't have to type in the access token every time we pull updates. Git's credential helper can be configured for this.
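One common way to make Git remember the token is its built-in store credential helper. The snippet below is a minimal sketch, assuming that keeping the token in plain text in ~/.git-credentials inside the container is acceptable.

```shell
# Assumption: the built-in "store" helper is used; it saves credentials
# unencrypted in ~/.git-credentials.
git config --global credential.helper store

# Show the configured helper to confirm the setting.
git config --global --get credential.helper
```

After this, the next git pull asks for the username and token once and reuses them afterwards. Since the container is later committed to an image, the stored token ends up inside that image, which is one more reason to keep the image repository private.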
Create a conda environment,
```
 conda create -n py37 python=3.7
 conda activate py37
```
The Diffpy-CMI module is only compatible with Python=3.7, so we have to stay with it for the moment, even though Python=3.7 has reached the end of its support.
Go into the tsitc directory and install all the required python modules,
```
 pip install -r requirements.txt
```

Install Diffpy-CMI,

```
 conda config --add channels diffpy
 conda install diffpy-cmi
```

Install strumining,
```
 cd
 git clone https://code.ornl.gov/ly0/strumining.git
 cd strumining
 pip install pymatgen
 pip install pybtex
 pip install PyYAML
 python setup.py install
```
After the installation, we need to manually copy over two folders (data and utils) in the strumining source code tree to the corresponding location in the conda environment. In my case, the destination is,
```
 /root/miniconda3/envs/py37/lib/python3.7/site-packages/strumining-1.0.0-py3.7.egg/strumining
```
The specific location will depend on the conda installation location and the conda environment we created previously.
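Rather than typing the site-packages path by hand, the destination can be looked up from Python itself. This is a sketch under assumptions: pkg_dir is a hypothetical helper name, the py37 environment is assumed to be active, and the source paths in the commented copy step are placeholders.

```shell
# Hypothetical helper: print the directory a Python package was installed to.
# $1 = package name, $2 = interpreter to ask (defaults to "python").
pkg_dir() {
    (
        # Run from / so that a source checkout in the current directory
        # is not imported instead of the installed package.
        cd / || exit 1
        "${2:-python}" -c 'import importlib, os, sys
mod = importlib.import_module(sys.argv[1])
print(os.path.dirname(mod.__file__))' "$1"
    )
}

# Intended use (source paths are placeholders):
#   dest="$(pkg_dir strumining)"
#   cp -r path/to/data path/to/utils "$dest"/
```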
Finally, we need to copy the instance folder (download the zip file here; get in touch with Yuanpeng for the access passcode) to the /tsitc directory.

With the recent update that added LDAP authentication, there is another file, ldap_blueprint.py (download it here; again, get in touch with Yuanpeng for the access passcode), that needs to be copied manually into the pdfitc directory. This file contains sensitive authentication information and should not be included in the Git history.

Also with the LDAP implementation, we need to install the flask-session, flask-wtf, pyldap, and pyoncat modules,
```
 conda install conda-forge::flask-session
 conda install anaconda::flask-wtf
 pip install pyldap
 pip install https://oncat.ornl.gov/packages/pyoncat-1.5.1-py3-none-any.whl
```
When installing pyldap with pip, we might run into errors caused by a gcc build failure, in which case installing the following development libraries might help [12]. Inside the container we are already root, so sudo is not needed,
```
 apt install libsasl2-dev libldap2-dev libssl-dev
```

Install the VESTA and data2config software, by running,

```
mkdir /Applications
cd /Applications
wget https://jp-minerals.org/vesta/archives/testing/VESTA-gtk3-x86_64.tar.bz2
tar xvf VESTA-gtk3-x86_64.tar.bz2
mv VESTA-gtk3-x86_64 VESTA-gtk3
mkdir -p /Applications/RMCProfile_package_V6.7.9/exe
cd /Applications/RMCProfile_package_V6.7.9/exe
wget https://flv.iris-home.net/pkgs/data2config
```

Exit the docker container – just execute exit from the command line.
Commit the container to a new image. First, list all containers, including the one we just exited, using the command,
```
docker ps -a
```
Identify the container we were working on by its name and ID, then commit it,
```
docker commit [CONTAINER_ID] flask_addie_n
```
Use the command docker images to check the new committed image in the docker image list.

The flask_addie_n here is the name of the image created by the commit. An image with the same name may already exist on the local host. In that case, the existing image loses its name and shows up as <none> in the image list, while the new image takes over the flask_addie_n name.

N.B. It is good practice not to commit too often; otherwise, the Docker image accumulates layers until it hits the layer limit, after which we cannot commit anymore. Instead, once the initial commit is done, we can use the git pull command in the startup script (see the step below) to update the source code, so that the web service inside the Docker container stays up to date.
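To keep an eye on how much repeated commits have added, the image history can be counted. This is a minimal sketch, where history_count is a hypothetical helper name. Note that docker history also lists metadata-only entries, so the number is an upper bound on the real filesystem layers.

```shell
# Hypothetical helper: count the history entries of an image ($1 = image name).
# `docker history -q` prints one ID (or <missing>) per entry.
history_count() {
    docker history -q "$1" | wc -l
}
```

For the image in this walkthrough, the call would be history_count flask_addie_n.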
Prepare a Dockerfile, as below,
```
FROM flask_addie_n

WORKDIR /tsitc

COPY startup.sh /

CMD ["/bin/bash", "-c", "/startup.sh"]
```
where flask_addie_n is the name of the image committed in the previous step. The name can be whatever we prefer. We can keep reusing flask_addie_n: the new image takes over the name and the original one is kept as an untagged <none> image.
In the same folder as the Dockerfile (by now, we have already exited the Docker container and are on the host machine), create a startup.sh file as below,
```
#!/bin/bash

source /root/miniconda3/etc/profile.d/conda.sh
conda activate py37
export LD_LIBRARY_PATH='/usr/lib64'

sns_dir="/SNS"
if findmnt | grep -q "${sns_dir}"; then
    echo "The directory ${sns_dir} is mounted."
else
    echo "The directory ${sns_dir} is not mounted."
    sshfs sns:/SNS /SNS
fi

hfir_dir="/HFIR"
if findmnt | grep -q "${hfir_dir}"; then
    echo "The directory ${hfir_dir} is mounted."
else
    echo "The directory ${hfir_dir} is not mounted."
    sshfs sns:/HFIR /HFIR
fi

git pull
gunicorn -c gunicorn_config.py run:app &
redis-server --port 6379 &
celery -A pdfitc.app.celery worker --loglevel=info
```
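The two mount checks in the startup script follow the same pattern, and a grep over the full findmnt output also matches any line that merely contains the string /SNS or /HFIR. As a possible variation (not the script used above), the sketch below factors the check into a function and relies on the mountpoint utility instead. The names is_mounted and ensure_mounted are hypothetical.

```shell
# Hypothetical helper: succeed only if $1 is itself a mount point.
is_mounted() {
    mountpoint -q "$1"
}

# Hypothetical helper: mount remote $2 on directory $1 via sshfs if needed.
ensure_mounted() {
    if is_mounted "$1"; then
        echo "The directory $1 is mounted."
    else
        echo "The directory $1 is not mounted."
        sshfs "$2" "$1"
    fi
}
```

In the startup script, the two blocks would then reduce to ensure_mounted /SNS sns:/SNS and ensure_mounted /HFIR sns:/HFIR.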
N.B. The startup script runs when the container is launched, inside its Linux environment. So, if we prepare the startup.sh file on Windows, its CRLF line endings will cause problems. In that case, we need to edit or convert the file in a Linux environment, e.g., WSL on Windows.
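If the script was already saved with Windows line endings, it can be converted in place from a Linux shell. This is a minimal sketch, with to_unix as a hypothetical helper name. It also sets the executable bit, since the CMD in the Dockerfile runs /startup.sh directly and therefore needs the file to be executable.

```shell
# Hypothetical helper: strip carriage returns from file $1 and make it executable.
to_unix() {
    tmp="$(mktemp)" || return 1
    # Delete every CR character so that CRLF line endings become LF,
    # then write the result back into the original file.
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
    chmod +x "$1"
}
```

Running to_unix startup.sh before building the image is enough.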

The export command sets the library search path so that GSAS-II can find the libgfortran.so.4 that we previously put in the /usr/lib64 directory.

The last line of the startup script must not end with an & sign, so that the command runs as the foreground process and keeps the script alive. If we appended an & to it, the container would exit immediately after launching the last command: once the script has gone through all of its commands it exits, regardless of the jobs still running in the background, and the container stops with it.
Build the docker image,
```
docker image build -t flask_addie_n .
```
Again, flask_addie_n here is the name of the image created by the build. As noted above, if a local image of the same name already exists, it will be shown as <none>, and the new image takes the flask_addie_n name.
Fire up the container,
```
docker run --privileged -v /home/cloud/.ssh:/root/.ssh/keys -p 5000:5000 -d flask_addie_n
```
In the demo here, we map the host directory /home/cloud/.ssh to the /root/.ssh/keys directory inside the Docker container. The locations on both sides can be adjusted as needed, but we will keep them as is throughout this documentation.
The Flask server should now be accessible from the host machine at localhost:5000.
Now, we want to enter the running container in interactive mode and set up the passwordless connection to the Analysis cluster,
```
docker exec -it [CONTAINER_ID] /bin/bash
cd ~/.ssh
cp keys/* .
chown root:root config
```
Push the local docker image to the Docker Hub,
```
docker login --username=apw247
docker tag bb38976d03cf apw247/flask_addie_n:latest
docker push apw247/flask_addie_n
```
where bb38976d03cf is the docker image ID which we can obtain via the command,
```
docker images
```
which lists all the images on the host machine, so that we can find the image ID associated with flask_addie_n.
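Pushing only latest overwrites the previous image on Docker Hub. To keep older versions around for rollback, a version tag can be pushed alongside it. This is a minimal sketch, assuming a date-based tag scheme; versioned_tag is a hypothetical helper and the date format is an arbitrary choice.

```shell
# Hypothetical helper: build a date-stamped tag of the form repo:YYYYMMDD ($1 = repo).
versioned_tag() {
    echo "$1:$(date +%Y%m%d)"
}

# Intended use (bb38976d03cf is the image ID from the step above):
#   tag="$(versioned_tag apw247/flask_addie_n)"
#   docker tag bb38976d03cf "$tag"
#   docker push "$tag"
```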

For security purposes, the remote Docker repository is private. To contribute to the repo, please get in touch with Yuanpeng to request access.

Deployment

First, log in to the remote VPS and run,
```
 docker run --privileged -v /home/cloud/.ssh:/root/.ssh/keys -p 5000:5000 -d apw247/flask_addie_n
```
to start up the server. Then we can configure nginx to forward the traffic on port 443 to the local port 5000.
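As a rough sketch of what forwarding port 443 to port 5000 involves (not the actual ADDIE configuration), the helper below writes a minimal nginx server block. The certificate paths and the helper name write_addie_site are placeholders, and a production configuration needs additional security settings.

```shell
# Hypothetical helper: write a minimal reverse-proxy server block to file $1.
write_addie_site() {
    # The quoted 'EOF' keeps nginx variables such as $host unexpanded.
    cat > "$1" <<'EOF'
server {
    listen 443 ssl;
    server_name addie.ornl.gov;

    # Placeholder certificate paths (assumption).
    ssl_certificate     /etc/ssl/certs/addie.crt;
    ssl_certificate_key /etc/ssl/private/addie.key;

    location / {
        # Forward requests to the container port published on the host.
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF
}
```

On a typical installation this could be called as write_addie_site /etc/nginx/conf.d/addie.conf, followed by nginx -t to validate the file and nginx -s reload to apply it.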
The ADDIE service is hosted on an ORNL research cloud instance, which is an ORNL-internal cloud computing resource. For security reasons, ORNL performs systematic scans of the server and the ADDIE service. On the system side, the reported vulnerabilities can be patched easily by following the instructions from the cyber security team. On the nginx side, we need to add specific headers to mitigate potential vulnerabilities, and it turns out that besides patching the nginx configuration we also need to add some security headers in the Flask app. Here is the saved nginx configuration, Click Me, and here is the relevant chunk of code in the Flask app,
```
# Define a function that sets the security headers on every response
def add_security_headers(response):
    # Forbid rendering the pages inside frames (clickjacking protection)
    response.headers['X-Frame-Options'] = 'DENY'
    # Tell browsers to use HTTPS only for the next year, subdomains included
    response.headers['Strict-Transport-Security'] = 'max-age=31536000; includeSubDomains; preload;'
    return response


# Register the function as an after_request handler
app.after_request(add_security_headers)
```
Some other security patches are also needed in the Flask app, e.g., protection against Cross-Site Request Forgery (CSRF) and rigorous checking of input fields. We can refer to the source code of ADDIE (ORNL internal access only, not open source yet) here, Click Me.
Another domain, addie-dev.ornl.gov, has been set up to point to the local port 6000. So, to fire up a test service of ADDIE, we can execute the following command first,
```
 sudo docker run --privileged -v /home/cloud/.ssh:/root/.ssh/keys -it -p 6000:5000 apw247/flask_addie_n bash
```
The --privileged flag is to ensure that remote drives can be mounted using sshfs within the docker container.

Then, within the docker container, run the following commands to start up the server,
```
 sshfs sns:/SNS /SNS
 sshfs sns:/HFIR /HFIR
 conda activate py37
 gunicorn -c gunicorn_config.py run:app &
 redis-server --port 6379 &
 celery -A pdfitc.app.celery worker --loglevel=info &
```
Once the dev server is up and running, we can open another terminal and execute,
```
 sudo docker exec -it [CONTAINER_ID] /bin/bash
```
where [CONTAINER_ID] is the ID of the container started in the previous step (use sudo docker ps -a to list the containers). Within this interactive shell, we can change the code and the changes will show up directly on the addie-dev.ornl.gov server. If Python source code was changed, we may have to kill the gunicorn job from another shell and restart it. Changes to the template HTML files take effect without restarting the server, though.
While the Docker container is running, we can launch an interactive shell in it like this,
```
 sudo docker exec -it CONTAINER_ID /bin/bash
```
where CONTAINER_ID is the ID of the running container, which can be obtained by running sudo docker ps -a.
Sometimes, during the image commit and build process, we can end up with multiple repositories attached to a single image ID. In this case, if we want to remove certain tags, we can do,
```
 sudo docker rmi REPO_NAME
```
where REPO_NAME refers to the repository name associated with a certain image. It can be obtained via running sudo docker images and usually the first column would be the repository name.
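The commit and build steps described earlier leave untagged <none> images behind, and at roughly 10 GB each they are worth reviewing from time to time. This is a minimal sketch, with dangling_images as a hypothetical helper name. It assumes docker can be run directly; prefix the commands with sudo where the host requires it.

```shell
# Hypothetical helper: list the untagged (<none>) images left on the host.
dangling_images() {
    docker images --filter dangling=true
}

# To delete them after reviewing the list (asks for confirmation):
#   docker image prune
```

Only images that no tagged image depends on are removed by the prune step, so layers still used by flask_addie_n stay in place.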