For a while now, I’ve been building findatechjob.dev to help developers find their dream jobs without all the spam and distractions of the big job boards.

At 8am every morning, I’d be filled with dread as I checked my website to see whether the daily jobs load had succeeded.

Most of the time it worked, but when it didn’t, I’d have the dreaded task of a manual VM restart ahead of me. Shocking.

What Went Wrong

To get fresh jobs data into findatechjob, I had a simple Cloud Run Job that ran each morning. It hit various APIs for jobs data, transformed it into a standard format, then simply dumped it into a Meilisearch server I had running on a VM in GCP.
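For flavour, the loading step boiled down to something like the following minimal Python sketch using the meilisearch client. The URL, key, and example documents here are placeholders rather than my real setup.

    import meilisearch

    # A couple of documents in the standard format the job data gets
    # transformed into (field names here are illustrative).
    documents = [
        {"id": "1", "title": "Senior Go Engineer", "company": "Acme", "location": "Remote"},
        {"id": "2", "title": "Data Engineer", "company": "Globex", "location": "London"},
    ]

    # Push the dataset into the Meilisearch instance running on the VM.
    client = meilisearch.Client("http://MY_VM_IP:7700", "MASTER_KEY")
    index = client.index("jobs")
    task = index.add_documents(documents, primary_key="id")

    # Indexing is asynchronous; wait for the task so failures surface in the job.
    client.wait_for_task(task.task_uid, timeout_in_ms=600_000)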

This worked fine for the most part. It cost me around $8/month, the search latency was pretty good and it was relatively easy to set up.

The problem was the machine I had it on: it was an e2-micro instance with 2 shared vCPUs and 1 GB of memory.

This was enough at first, until it wasn’t. Over time, the number of jobs I was gathering increased, so Meilisearch had to store more data in memory. Job documents are quite small, so the data itself actually wasn’t a problem for its 1 GB of memory.

The problem was indexing. Each time I loaded a new dataset into Meilisearch, it had to re-index it all. This caused a memory spike on each load that sometimes the VM survived, and other times it completely bricked it.

The Core Insight

Eventually I grew tired of the sad mornings, and I also had plans to vastly increase the number of jobs on my site. So I had to solve this problem.

I came to a realisation: serving search requests is an inherently different task to building an index, yet I was expecting my tiny VM to handle both with the same resources.

So why not separate them?

The Solution

My solution was very simple: I started running two instances of Meilisearch. One to index, and one to serve.

Indexing

As mentioned before, I was loading jobs directly from my Cloud Run Job into a Meilisearch server running on a VM.

First, I replaced the VM with Meilisearch running as a second container in my Cloud Run Job, and loaded the jobs data there instead.

This ephemeral Meilisearch was used only to build an index. The index was then compressed and uploaded to GCS.
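In rough terms, the end of the job now does something like this once all indexing tasks have finished. It’s a sketch with placeholder paths and bucket names, using tarfile, zstandard and the google-cloud-storage client; my real code differs in the details.

    import tarfile

    import zstandard
    from google.cloud import storage

    DATA_DIR = "/meili_data/data.ms"     # Meilisearch's data directory (path assumed)
    ARCHIVE = "/tmp/data.ms.tar.zst"
    BUCKET = "my-search-indexes"         # hypothetical bucket name

    # Tar up the data.ms directory, compressing with zstandard as we go.
    cctx = zstandard.ZstdCompressor(level=3)
    with open(ARCHIVE, "wb") as out, cctx.stream_writer(out) as compressed:
        with tarfile.open(fileobj=compressed, mode="w|") as tar:
            tar.add(DATA_DIR, arcname="data.ms")

    # Upload the archive so the serving instance can fetch it on start-up.
    blob = storage.Client().bucket(BUCKET).blob("meilisearch/data.ms.tar.zst")
    blob.upload_from_filename(ARCHIVE)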

I recommend reading this very good blog post, which outlines a similar architecture in far more detail than I do here.

One issue I found with that blog post is that it uses a Meilisearch dump, but those require re-indexing on load. This is why I just uploaded the whole compressed data.ms directory instead - it comes out to around the same size.

Serving

With the index uploaded to GCS, I deployed a second Meilisearch instance as a Cloud Run Service. I built my own container image for this that wraps the official Meilisearch one and pulls the index from GCS on start-up.
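The start-up logic amounts to: download the archive, unpack it into Meilisearch’s data directory, then hand over to the meilisearch binary. Here’s a rough Python sketch with placeholder names; it uses the google-cloud-storage client for the download step, while the optimisation list below covers the lighter-weight download I eventually settled on.

    import os
    import tarfile

    import zstandard
    from google.cloud import storage

    BUCKET = "my-search-indexes"       # hypothetical bucket name
    OBJECT = "meilisearch/data.ms.tar.zst"
    ARCHIVE = "/tmp/data.ms.tar.zst"
    MEILI_DIR = "/meili_data"          # data directory used by the image (assumed)

    # 1. Pull the prebuilt index archive from GCS.
    storage.Client().bucket(BUCKET).blob(OBJECT).download_to_filename(ARCHIVE)

    # 2. Decompress and unpack it into the data directory.
    dctx = zstandard.ZstdDecompressor()
    with open(ARCHIVE, "rb") as fh, dctx.stream_reader(fh) as reader:
        with tarfile.open(fileobj=reader, mode="r|") as tar:
            tar.extractall(MEILI_DIR)

    # 3. Hand over to Meilisearch, which starts with a ready-made index.
    #    (Binding to Cloud Run's expected port is omitted here for brevity.)
    os.execvp("meilisearch", ["meilisearch", "--db-path", f"{MEILI_DIR}/data.ms"])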

To make this efficient and reduce cold starts, I iterated on a few key optimisations:

  • Compressed the data.ms directory before uploading. This reduced the download size by approximately 5x. Using zstandard gives fast decompression times too.
  • Replaced the gcloud CLI with curl for downloading my index. Initialising the gcloud CLI was taking 5-10s on every start-up! (A sketch of this token-plus-HTTP approach follows this list.)
  • Vastly reduced the size of each job stored in Meilisearch. Meilisearch now just stores and searches across the relevant metadata; full details live in Firestore. An additional 30x reduction in index size!
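For the curious, the gcloud-to-curl swap essentially amounts to grabbing an access token from the metadata server and hitting the GCS JSON API directly. Here’s the same idea sketched with Python’s standard library; bucket and object names are placeholders.

    import json
    import urllib.request

    BUCKET = "my-search-indexes"              # hypothetical bucket name
    OBJECT = "meilisearch%2Fdata.ms.tar.zst"  # object name, URL-encoded

    # 1. Fetch an access token for the default service account from the
    #    metadata server (available inside Cloud Run, no SDK required).
    token_req = urllib.request.Request(
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token",
        headers={"Metadata-Flavor": "Google"},
    )
    with urllib.request.urlopen(token_req) as resp:
        token = json.load(resp)["access_token"]

    # 2. Download the object straight from the GCS JSON API.
    download_req = urllib.request.Request(
        f"https://storage.googleapis.com/storage/v1/b/{BUCKET}/o/{OBJECT}?alt=media",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(download_req) as resp, open("/tmp/data.ms.tar.zst", "wb") as out:
        out.write(resp.read())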

Results

This solution has been working incredibly well for me.

Some stats to satisfy data-driven minds:

  • Meilisearch Resource Allocation: 1 vCPU, 1 GB memory. Actual usage barely scratches the surface; my site does not get high traffic.
  • Compressed Index Size: ~5MB, with ~12,000 unique jobs.
  • Cold Start: Varies between 500ms and 2000ms depending on the mood of the Google Cloud Gods - this is acceptable for me but it might not be for you.
  • Cost Savings: $8/month VM is gone. I’m now paying nothing at all because my usage fits in the Cloud Run free tier.

When This Works/Doesn’t

This solution will work great for you if:

  • Your data does not constantly change.
  • You care about performance but don’t want to pay through the roof for it.
  • You don’t have vast quantities of data, or you don’t care much about cold start time.

It may not work if:

  • You need live updates to your data. This is great for me because the data loads once a day and doesn’t change. If your data continuously changes, you’ll need Meilisearch serving requests and indexing new data all the time.
  • You have high traffic - it may work, but I haven’t load-tested it.

Closing Thoughts

The big takeaway here is that separation of concerns is a real thing and leads to far nicer solutions.

By implementing this solution, I haven’t had a sad morning in two months, and as a bonus I’ve saved 100% on costs (that looks better than $8/month).

I’ll definitely follow up with another post to see how this is holding up when my site starts getting more traffic.

Check out findatechjob.dev if you haven’t already!