Porting a humongous Perl script to Python

The Problem

The first project I was tasked with at my new job involved porting a large (>18k lines long!) Perl script to Python. I knew from experience that trying to do this in one ‘big bang’ step was sure to result in the new version having a bunch of bugs that had been squashed out of the Perl script over years of development. Instead I sought a more cautious approach which is described in this post.

The Perl script in question is run as a console app. It takes a document id along with various optional arguments. The script locates/generates a number of urls, writes them to a database and exits. It is invoked by another Perl script which reads the results from the database on completion, all in the lifecycle of a FastCGI API request.

Now, many years on from this script being created it seems an obvious fit for a ‘microservice’. Thus the goal is to both port the code to Python (as part of a company-wide push to consolidate languages) and to change it from a console app to a Flask API.

Interfacing Code

Going back to the cautious approach I mentioned above; fortunately the structure of the existing Perl script lent it to being gradually ported over piece by piece. I looked for a way to interface it with the new Flask API and the Python subprocess module looked like it would work nicely.

In terms of data transfer, I made a minor modification to the Perl script to write its output to stdout as JSON, rather than to the existing database (which I did not want the Python API to be coupled to). Writing data to stdout sounds fragile but I rationalised that this is exactly what Linux utilities piped to each other have been doing for years. It just means you have to be careful not to have any stray print statements floating around.

The interfacing Python code looks something like this:

def get_links_from_perl_script(start_process, process_info):
	input_list = [start_process, process_info]
	input_json = json.dumps(input_list)

	p = Popen(perl_script_path, stdin=PIPE, stdout=PIPE, stderr=PIPE, cwd=perl_script_dir)
	output, err = p.communicate(input_json.encode())
	rc = p.returncode

	if rc == 0:
		logger.info('Perl script finished successfully for start process: %s' % (start_process))
		if err:
			logger.warn('But the following errors were reported: %s' % err)
		links = []
		json_output = json.loads(output)
		return json_output 
	else:
		logger.error('Perl script exited with non-zero error code: %d for start process: %s. Error: %s' % (rc, start_process, err))

And on the Perl side:

use strict;
use JSON qw(encode_json decode_json);
my $str = do { local $/; <STDIN> };
my $decoded_json = decode_json($str);
# do stuff....
print encode_json(\@some_results);
exit(0);

One extra quirk is that I work on a Windows machine. Whilst options exist to install Perl on Windows, it definitely doesn’t seem to be a first class citizen. However we now have the WSL (Windows Subsystem for Linux)! My Ubuntu WSL already has Perl installed, so I wondered if I could get my Python Flask API to spin up a Perl subprocess on the WSL and pipe data to and from it. It turns out this is fairly easy. In the Python code above, the perl_script_path variable is declared as follows:

perl_script_path = 'wsl perl /mnt/c/Users/nware/Dev/this_project/humongous_script.pl'.split()

Note: a trick for young players is that this won’t work if you have a 32-bit version of Python installed. The WSL is 64-bit so Python won’t know how to find it. Ideally just install 64-bit Python, but you can work around it with this magical incantation:

perl_script_path = os.path.join(os.environ['SystemRoot'], 'SysNative', 'wsl perl /mnt/c/Users/nware/Dev/this_project/humongous_script.pl'.split()

Perl package management

A quick note on Perl package management. I was frustrated at the seemingly manual process of installing Perl packages with cspan. Coming from a Python/C#/Javascript background which all have good(ish) package management solutions, this seemed archaic. I went looking for something similar and found exactly what I was after: Carton.

Containerization

This all worked nicely for dev/test but I wanted the Flask API in a Docker container for production. The tricky thing here is that containers are meant (for good reason) to run a single workload. Thus there are official Python base containers and official Perl base containers but obviously none that have both.

I ended up creating an intermediary container, which is essentially the contents of the official Perl Dockerfile but with the based changed from buildpack-deps:stretch to python:3.6-stretchhttps://hub.docker.com/r/nwareing/perl-python/.

Note: The long term goal of this project is to gradually port all of the Perl code into Python. When this is done, the interfacing code can be removed and we will just use the offical Python docker image as per normal.

I could then create my actual application Dockerfile as follows:

FROM nwareing/perl-python:latest

RUN cpanm Carton && mkdir -p /perl_src    
WORKDIR /perl_src

COPY ./perl_src/cpanfile /perl_src/cpanfile
RUN carton install

WORKDIR /app

RUN set -ex && pip install pipenv uwsgi

COPY ./Pipfile* /app/

RUN pipenv install --system --deploy

COPY ./perl_src/humongous_script.pl /perl_src/humongous_script.pl

COPY ./src /app

# Start uwsgi web server
ENTRYPOINT [ "uwsgi", "--ini", "uwsgi.ini" ]

Summary

In summary, the work of art / monstrosity I’ve created looks something like this: