Porting a humongous Perl script to Python

The Problem

The first project I was tasked with at my new job involved porting a large (>18k lines long!) Perl script to Python. I knew from experience that trying to do this in one ‘big bang’ step was sure to produce a new version with a bunch of bugs that had been squashed out of the Perl script over years of development. Instead, I took a more cautious approach, which is described in this post.

The Perl script in question is run as a console app. It takes a document id along with various optional arguments. The script locates/generates a number of URLs, writes them to a database and exits. It is invoked by another Perl script, which reads the results from the database on completion, all within the lifecycle of a FastCGI API request.

Now, many years after this script was created, it seems an obvious fit for a ‘microservice’. The goal is therefore both to port the code to Python (as part of a company-wide push to consolidate languages) and to change it from a console app to a Flask API.

Interfacing Code

Going back to the cautious approach I mentioned above: fortunately, the structure of the existing Perl script lent itself to being ported gradually, piece by piece. I looked for a way to interface it with the new Flask API, and the Python subprocess module looked like it would work nicely.

In terms of data transfer, I made a minor modification to the Perl script to write its output to stdout as JSON, rather than to the existing database (which I did not want the Python API to be coupled to). Writing data to stdout sounds fragile but I rationalised that this is exactly what Linux utilities piped to each other have been doing for years. It just means you have to be careful not to have any stray print statements floating around.

The interfacing Python code looks something like this:

import json
import logging
from subprocess import Popen, PIPE

logger = logging.getLogger(__name__)

def get_links_from_perl_script(start_process, process_info):
    # Pass the arguments to the Perl script as a JSON array on stdin.
    # perl_script_path and perl_script_dir are defined elsewhere (see below).
    input_json = json.dumps([start_process, process_info])

    p = Popen(perl_script_path, stdin=PIPE, stdout=PIPE, stderr=PIPE, cwd=perl_script_dir)
    output, err = p.communicate(input_json.encode())
    rc = p.returncode

    if rc == 0:
        logger.info('Perl script finished successfully for start process: %s', start_process)
        if err:
            logger.warning('But the following errors were reported: %s', err)
        # The Perl script writes its results to stdout as JSON
        return json.loads(output)

    logger.error('Perl script exited with non-zero error code: %d for start process: %s. Error: %s',
                 rc, start_process, err)
    return None

And on the Perl side:

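use strict;
use warnings;
use JSON qw(encode_json decode_json);
# Slurp all of STDIN and decode the JSON array sent by the Python side
my $str = do { local $/; <STDIN> };
my $decoded_json = decode_json($str);
my @some_results;
# do stuff....
# Write the results to STDOUT as JSON (nothing else may be printed to STDOUT)
print encode_json(\@some_results);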
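The protocol is just "JSON in on stdin, JSON out on stdout", so you can exercise it without Perl at all. Below is a minimal sketch of that idea. A tiny Python child process stands in for humongous_script.pl, and the helper name `run_json_child`, the fake script and its made-up links are all hypothetical. It uses `subprocess.run` instead of the `Popen`/`communicate` pair above, but does the same thing.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for humongous_script.pl: reads a JSON array from
# stdin and writes a JSON array of made-up links to stdout.
FAKE_SCRIPT = r'''
import json, sys
start_process, process_info = json.load(sys.stdin)
print(json.dumps(["http://example.com/%s" % start_process]))
'''


def run_json_child(cmd, payload):
    """Send payload as JSON on stdin and parse the child's stdout as JSON."""
    result = subprocess.run(cmd, input=json.dumps(payload).encode(),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        # Surface the child's stderr when it exits with a non-zero code
        raise RuntimeError(result.stderr.decode(errors='replace'))
    # Any stray print in the child will make this raise json.JSONDecodeError
    return json.loads(result.stdout)


links = run_json_child([sys.executable, '-c', FAKE_SCRIPT], ['doc-42', {'verbose': True}])
print(links)  # ['http://example.com/doc-42']
```

A child that prints a debug line before its JSON makes `json.loads` fail. That is the "stray print statements" risk mentioned earlier, and it shows up as an exception instead of silently corrupting the results.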

One extra quirk is that I work on a Windows machine. Whilst options exist to install Perl on Windows, it definitely doesn’t seem to be a first-class citizen. However, we now have WSL (the Windows Subsystem for Linux)! My Ubuntu WSL already has Perl installed, so I wondered if I could get my Python Flask API to spin up a Perl subprocess on WSL and pipe data to and from it. It turns out this is fairly easy. In the Python code above, the perl_script_path variable is declared as follows:

perl_script_path = 'wsl perl /mnt/c/Users/nware/Dev/this_project/humongous_script.pl'.split()

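Note: a trap for young players is that this won’t work if you have a 32-bit version of Python installed. wsl.exe lives in System32, but Windows redirects a 32-bit process that looks in System32 to SysWOW64 instead, and there is no wsl.exe there, so Python can’t find it. Ideally just install 64-bit Python, but you can work around it with this magical incantation, which goes through the Sysnative alias to reach the real System32: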

import os
perl_script_path = [os.path.join(os.environ['SystemRoot'], 'SysNative', 'wsl.exe'), 'perl', '/mnt/c/Users/nware/Dev/this_project/humongous_script.pl']
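If the same code has to run under both 32-bit and 64-bit Python, one option is to pick the command at runtime. Below is a minimal sketch, with `wsl_perl_command` as a hypothetical helper name. It relies on two things: `sys.maxsize > 2**32` is true only on 64-bit Python, and Windows sets the `PROCESSOR_ARCHITEW6432` environment variable for 32-bit processes running on 64-bit Windows.

```python
import os
import sys

SCRIPT = '/mnt/c/Users/nware/Dev/this_project/humongous_script.pl'


def wsl_perl_command(script_path, environ=os.environ,
                     is_64bit_python=sys.maxsize > 2**32):
    """Build the argument list that runs a Perl script under WSL."""
    if not is_64bit_python and 'PROCESSOR_ARCHITEW6432' in environ:
        # 32-bit Python on 64-bit Windows: System32 is redirected, so reach
        # the real wsl.exe through the Sysnative alias instead.
        wsl = os.path.join(environ['SystemRoot'], 'Sysnative', 'wsl.exe')
    else:
        # 64-bit Python can find wsl on the PATH as normal
        wsl = 'wsl'
    return [wsl, 'perl', script_path]


perl_script_path = wsl_perl_command(SCRIPT)
```

The `environ` and `is_64bit_python` parameters exist only so that both branches can be checked on any machine. Normally you would call it with just the script path.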

Perl package management

A quick note on Perl package management. I was frustrated at the seemingly manual process of installing Perl packages with cpan. Coming from a Python/C#/JavaScript background, all of which have good(ish) package management solutions, this seemed archaic. I went looking for something similar and found exactly what I was after: Carton.


This all worked nicely for dev/test, but I wanted the Flask API in a Docker container for production. The tricky thing here is that containers are meant (for good reason) to run a single workload. So there are official Python base images and official Perl base images, but obviously none that have both.

I ended up creating an intermediary image, which is essentially the contents of the official Perl Dockerfile but with the base changed from buildpack-deps:stretch to python:3.6-stretch. It is on Docker Hub: https://hub.docker.com/r/nwareing/perl-python/.

Note: the long-term goal of this project is to gradually port all of the Perl code to Python. When this is done, the interfacing code can be removed and we will just use the official Python Docker image as normal.

I could then create my actual application Dockerfile as follows:

FROM nwareing/perl-python:latest

# Install Carton and the Perl dependencies listed in the cpanfile
RUN cpanm Carton && mkdir -p /perl_src
WORKDIR /perl_src

COPY ./perl_src/cpanfile /perl_src/cpanfile
RUN carton install

# Install pipenv and uWSGI, then the Python dependencies from /app
RUN set -ex && pip install pipenv uwsgi

WORKDIR /app
COPY ./Pipfile* /app/
RUN pipenv install --system --deploy

COPY ./perl_src/humongous_script.pl /perl_src/humongous_script.pl

COPY ./src /app

# Start uwsgi web server (uwsgi.ini is resolved relative to /app)
ENTRYPOINT [ "uwsgi", "--ini", "uwsgi.ini" ]


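In summary, the work of art / monstrosity I’ve created looks something like this: a Flask API served by uWSGI inside a Docker container, which pipes JSON to the Perl script on stdin and reads the Perl script’s JSON results back from stdout.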

Hosting Upsource with Docker – DNS Dilemmas

Currently at work we are using an open source source code management tool called Kallithea. Unfortunately, it no longer seems to be under active development, and in general it is a bit unstable and lacks the features we need as a growing development team. For me the biggest pain point was not having a nice web interface to browse and review code. We’re currently evaluating other options (Bitbucket, GitHub, VSO/TFS etc.) and trying to decide whether or not to self-host. This process is taking a while, so I went looking for something to tide us over until we came up with a more permanent solution. This led me to Upsource, one of JetBrains’ latest incarnations.

Upsource is a web-based tool for browsing and reviewing code. The handy thing about Upsource is that it tacks onto your existing source code hosting tool, instead of being an all-in-one like the systems we are looking at moving to. This allowed me to quietly install it without ruffling any feathers and let members of the team decide whether or not they wanted to use it. Luckily I had a spare Linux box running Ubuntu, on which I was quickly able to get it installed and hooked up to LDAP.

The interesting part came a month or so later, when the next version of Upsource was released (February 2017). As well as a bunch of handy new features (full-text search FTW), they also announced that new versions were being published as Docker images. This sounded like a good idea, and one that would make future updates easier, so I followed the instructions to migrate my Upsource instance to Docker. Unfortunately, I found that after starting up my new version of Upsource inside a Docker container, it could no longer resolve internal URLs: neither those pointing to the source code repositories nor the one pointing to the LDAP server.

A bit of Googling revealed that this was a known issue with Docker on recent versions of Ubuntu: https://github.com/docker/docker/issues/23910. It sounds like it’s resolved in the latest version of Docker, but I couldn’t work out whether that had been released yet.

Luckily someone had already written up a handy blogpost showing how to get around the issue: https://robinwinslow.uk/2016/06/23/fix-docker-networking-dns/#the-permanent-system-wide-fix.

I went with the ‘quick fix’ approach described there:

  1. Ran this command to find the IP address of the DNS server running inside my company’s network. It spat out two contiguous IPs for me, so I just chose the first one.
    $ nmcli dev show | grep 'IP4.DNS'
  2. Added a --dns argument, set to that IP, to the docker run command I used to start the Upsource container.

Problem solved!
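If you need to repeat the fix on another machine, the two steps above can be scripted. Below is a minimal sketch with hypothetical helper names. It extracts the `IP4.DNS` entries from `nmcli dev show` output and, like the steps above, uses only the first one to build the `--dns` arguments for `docker run`.

```python
def parse_nmcli_dns(nmcli_output):
    """Return the IPv4 DNS server addresses from `nmcli dev show` output, in order."""
    servers = []
    for line in nmcli_output.splitlines():
        # Relevant lines look like: "IP4.DNS[1]:    10.0.0.2"
        key, sep, value = line.partition(':')
        if sep and key.strip().startswith('IP4.DNS'):
            servers.append(value.strip())
    return servers


def dns_args(servers):
    """Build the docker run --dns arguments, using only the first server."""
    return ['--dns', servers[0]] if servers else []
```

To use it, you could feed `parse_nmcli_dns` the text from `subprocess.run(['nmcli', 'dev', 'show'], stdout=subprocess.PIPE, universal_newlines=True).stdout`. You would then splice `dns_args(...)` into your own `['docker', 'run', ...]` list, which carries the usual Upsource image and options.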

TradeMe to Trello Chrome Extension

It can be tricky keeping track of a bunch of listings when you’re looking to join a new flat. Have I already contacted that person? When am I viewing this place? Have I heard back from them yet? The built-in functionality on TradeMe (the auction site just about everyone in New Zealand uses to list and look for flats) is just not up to the task.

Trello is a web application which allows you to create a custom set of lists and to move ‘cards’ back and forth between them. Many developers and others working in the tech industry are likely to be familiar with it already.

I’ve created a Chrome extension to link TradeMe and Trello together, making moving flats that little bit easier. Using this extension, it’s as simple as clicking the icon when you are on a TradeMe listing to have a card automatically created in Trello.

Trello board with TradeMe listings

May your flat hunting be forever more organised!

Get the extension: https://chrome.google.com/webstore/detail/trademe-to-trello/eapogjcjbcgaoocipcfcnedibnfdmlng?hl=en&gl=NZ.

The code is open source, and up on GitHub: https://github.com/nick-nz/trademe-to-trello.

Is there more to Googling than you think?

As a developer I Google stuff. A lot. It almost happens automatically:

  1. Working on something.
  2. Unfamiliar error appears.
  3. Google it.
  4. Choose the first StackOverflow link in the results.
  5. Problem solved (usually).

That trivialises the process somewhat. Most decent developers will spend some time considering the error and trying a few things to fix it before resorting to searching. And of course we don’t just seek help when errors occur: looking for best practices during the design phase of a project, finding a more concise way of implementing some logic and learning from the mistakes of others are all common use cases.

So I was surprised when, recently tutoring at a ‘bootcamp’ style programming course, I noticed that many of the students struggled to construct useful search queries. They would do things like searching for an error verbatim, including their custom variable names and data. They struggled to abstract and generalise a problem. They also struggled to use the results of a first query to improve subsequent queries.

It turns out being a good Googler is a skill many of us have subconsciously built up over years of work. Is my problem language or framework specific? Do I need to widen or narrow my search? Is this even a problem that the wider developer community would be able to help with or is it an issue specific to my company’s codebase?

Wisdom of the Ancients


So, if you’re involved in teaching programming or mentoring junior developers, consider working with them to construct useful searches. You may already know the answer to the problem and be tempted to go straight to helping them with it, but teaching them to find the solution themselves may actually be more beneficial.

Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime.