Creating a Custom Post Recommendation System Using Transformers

Chris Vilches

When I migrated my blog from WordPress to Astro, I initially lost the ability to display related posts at the bottom of each article, as Astro doesn’t offer this functionality out of the box.

I decided to implement it myself from scratch, using Hugging Face’s Transformers library.

The goal was to display a snippet below each post’s main content, like this:

(current post content ends here)

---

Read Related Articles

* Recommended Post 1
* Recommended Post 2

In this article, I’ll explain how I implemented this feature. At the time of writing, it’s deployed and working as expected.

Algorithm Overview

  1. Convert each post into a vector: Using a transformer model and embeddings, we turn each article’s full text into a single vector.
  2. Find similar vectors using cosine similarity: Vectors close to each other represent texts with similar semantic content. We calculate the similarity score between two articles using cosine similarity, which suits this purpose well.
  3. Apply ad-hoc filtering: This involves custom logic, such as ignoring articles with low similarity scores or only including recommendations if the articles share at least one blog category.
  4. Precompute recommendations and generate the static site: Since Astro is a static site generator, recommendations for each article are computed during build time, rather than dynamically on every page request.
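
Putting the four steps together, the build-time driver might look roughly like the sketch below. This is not the author’s actual code: the helpers embedPost and cosineSimilarity are hypothetical names (sketched later in this post), and the 0.4 threshold is an arbitrary example value.

// Hypothetical helpers, sketched in later sections.
declare function embedPost(text: string): Promise<number[]>;
declare function cosineSimilarity(a: number[], b: number[]): number;

interface Post {
  slug: string;
  body: string;
  categories: string[];
}

// Build-time driver: compute one embedding per post, then score,
// filter, and rank candidates for every post.
async function buildRecommendations(posts: Post[], topN = 2): Promise<Map<string, Post[]>> {
  const embeddings = await Promise.all(posts.map((p) => embedPost(p.body)));
  const result = new Map<string, Post[]>();

  for (const [i, post] of posts.entries()) {
    const picks = posts
      .map((candidate, j) => ({ candidate, score: cosineSimilarity(embeddings[i], embeddings[j]) }))
      // Ad-hoc filtering: skip the post itself, weak matches, and
      // candidates that share no category with the current post.
      .filter(({ candidate, score }) =>
        candidate.slug !== post.slug &&
        score >= 0.4 &&
        candidate.categories.some((c) => post.categories.includes(c)))
      .sort((a, b) => b.score - a.score)
      .slice(0, topN)
      .map(({ candidate }) => candidate);

    result.set(post.slug, picks);
  }

  return result;
}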

Messing with Transformers

The first step is converting text into a vector. For this, I used the all-MiniLM-L6-v2 model, which transforms sentences and paragraphs into 384-dimensional vectors.

In the following example, we analyze what happens when we process the sentence “Today I solved some excellent algorithmic problems” using this model.

The text is first tokenized. Some tokens represent whole words, but words like “algorithmic” may be split into multiple tokens. Each token is assigned a unique non-negative integer ID:

[CLS]     --> Token ID 101
today     --> Token ID 2651
i         --> Token ID 1045
solved    --> Token ID 13332
some      --> Token ID 2070
excellent --> Token ID 6581
algorithm --> Token ID 9896
##ic      --> Token ID 2594
problems  --> Token ID 3471
[SEP]     --> Token ID 102
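
For illustration, here is how this tokenization step can be reproduced with Transformers.js (the @xenova/transformers package; the post doesn’t state which port of the library was used, so treat the package choice as an assumption):

import { AutoTokenizer } from '@xenova/transformers';

// Load the tokenizer that ships with the all-MiniLM-L6-v2 model.
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/all-MiniLM-L6-v2');

const ids = tokenizer.encode('Today I solved some excellent algorithmic problems');
console.log(ids);
// [101, 2651, 1045, 13332, 2070, 6581, 9896, 2594, 3471, 102]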

These token IDs are used to retrieve the initial vector embeddings for each token. These initial embeddings lack contextual information, since they haven’t been processed through the self-attention layers yet. Depending on the model, positional information may also be added at this stage (e.g., via positional embeddings), which slightly adjusts each vector.

Here are the initial vector embeddings for the tokens:

Token ID 101   -->  [-0.1766, -0.0482,  0.0377, -0.0157,  0.0063, ...,  0.0379,  0.1696,  0.0310,  0.1154, -0.2001]
Token ID 2651  -->  [ 0.1899,  0.4778, -0.6466, -0.0620,  0.4477, ..., -0.3027, -0.8269,  0.0499, -0.5361,  0.4469]
Token ID 1045  -->  [-0.2715, -0.5152,  0.1800,  0.1759,  0.1213, ..., -0.0806,  1.1190, -0.5518, -0.7233, -0.7180]
Token ID 13332 -->  [-0.6097, -0.4077, -0.3073,  0.0387, -0.2921, ..., -0.1415, -0.0782, -0.0844,  0.1180,  0.1815]
Token ID 2070  -->  [-0.1910, -0.3607,  0.8348,  0.0936,  0.3426, ..., -0.2924, -1.2255,  0.4598,  0.1018,  1.4967]
Token ID 6581  -->  [-0.0791,  0.8314,  0.6794,  0.6199,  0.1778, ..., -0.2922,  0.6357,  1.1416, -0.3689, -0.1879]
Token ID 9896  -->  [-0.6089, -0.2179,  0.0612,  0.7646, -0.4000, ...,  0.7202, -0.5597,  0.1309,  0.6490, -0.4538]
Token ID 2594  -->  [-0.7215, -0.0884,  0.2669, -0.2393, -0.1560, ...,  0.0523, -0.2557,  0.9165, -0.4877, -0.6912]
Token ID 3471  -->  [-0.5559, -0.1996, -0.2737,  0.4928, -0.2631, ..., -0.4326, -0.2903,  0.8535, -0.2942,  0.7360]
Token ID 102   -->  [ 0.2826,  0.1163, -0.2290,  0.0818,  0.1552, ...,  0.0954, -0.1195,  0.1418, -0.0491, -0.1000]

The model then applies self-attention, the core mechanism of transformers, which adds contextual information to the embeddings based on the other tokens in the sequence.

After self-attention, the embeddings are updated:

Token ID 101   -->  [-0.1642, 0.4660,  0.0489, -0.4458, -0.1046, ...,  0.1672, -0.1306,  0.2386, -0.0663, -0.0930]
Token ID 2651  -->  [-0.5260, 0.8277,  0.7240,  0.0294,  0.4266, ...,  0.0601, -0.0381, -0.7451, -1.1621,  1.1199]
Token ID 1045  -->  [ 0.1473, 0.2505,  0.1579, -0.7921,  0.1306, ...,  0.1652,  1.1664, -0.2087, -1.1386, -0.6368]
Token ID 13332 -->  [-0.4840, 1.1230,  0.6628, -0.0099, -0.5324, ...,  0.2173, -0.2530,  0.8590,  0.3046, -0.1156]
Token ID 2070  -->  [-0.5985, 0.2569,  0.3128, -0.2144,  0.1356, ...,  0.2941,  0.0237,  0.1431, -0.1129,  0.7390]
Token ID 6581  -->  [-0.4394, 0.5312,  0.4112, -0.3248, -0.4767, ...,  0.6671,  0.8386,  0.8715, -0.3522,  0.3678]
Token ID 9896  -->  [-0.9628, 0.4037,  0.0501, -0.8665, -0.3116, ..., -0.2301, -0.8126, -0.2742,  0.9110, -1.1934]
Token ID 2594  -->  [-0.5978, 0.6489, -0.2759, -0.3145,  0.0450, ...,  0.3996,  0.1469,  0.2102, -0.1217, -0.2887]
Token ID 3471  -->  [-0.8411, 0.7750,  0.2105, -0.3113, -0.8924, ...,  0.1799,  0.1066,  1.0205, -0.1044,  0.5411]
Token ID 102   -->  [-0.3763, 0.5162, -0.3255, -0.7712,  0.0012, ...,  0.5872,  0.3516,  0.2374, -0.0547, -0.1911]

At this point, we have ten 384-dimensional vectors (384 is the fixed embedding size specified by the model). To create a single vector for the document, I used mean pooling, which averages the vectors element-wise. The result is a single 384-dimensional vector representing the entire document:

[-0.4843, 0.5799, 0.1977, -0.4021, -0.1579, ..., 0.2508, 0.1400, 0.2352, -0.1897, 0.0249]
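
In practice, you rarely run these stages by hand. With Transformers.js, for instance, a feature-extraction pipeline performs the tokenization, self-attention, and mean pooling in a single call (again, assuming the JavaScript port of the library):

import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// pooling: 'mean' averages the token embeddings element-wise;
// normalize: true scales the result to unit length, which turns
// cosine similarity into a plain dot product later on.
const output = await extractor('Today I solved some excellent algorithmic problems', {
  pooling: 'mean',
  normalize: true,
});

const embedding = Array.from(output.data as Float32Array); // 384 values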

Now we need to apply this process to full articles. Since a blog article is usually much longer than the example sentence above (and the model truncates input beyond its maximum sequence length), you may also want to split the text into chunks before processing it.
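
One simple strategy, if you do chunk the text, is to embed each chunk separately and average the chunk vectors, mirroring the mean pooling used at the token level. Here is a sketch; embedPost and the paragraph-based splitting are my own illustrative choices, not necessarily what the author’s site does:

import { pipeline } from '@xenova/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Embed a long article by embedding each chunk and averaging the
// resulting vectors element-wise.
async function embedPost(text: string): Promise<number[]> {
  // Naive paragraph-based chunking; splitting by token count would
  // track the model's input limit more precisely.
  const chunks = text.split(/\n\s*\n/).filter((c) => c.trim().length > 0);
  const sum = new Array<number>(384).fill(0);

  for (const chunk of chunks) {
    const out = await extractor(chunk, { pooling: 'mean', normalize: true });
    (out.data as Float32Array).forEach((v, i) => { sum[i] += v; });
  }

  return sum.map((v) => v / chunks.length);
}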

Cosine Similarity to Find Similar Vectors

This step is straightforward. We use cosine similarity to measure how similar two vectors are: the dot product of the two vectors divided by the product of their magnitudes, yielding a score between -1 and 1. More details can be found in the Wikipedia article.
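
A direct implementation is only a few lines:

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
// If the vectors are already normalized to unit length, this reduces
// to a plain dot product.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;

  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}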

After calculating similarities, we apply filtering to exclude low-scoring matches and select the recommendations to display.

Handling Online Recommendation Queries

Since Astro is a static site generator, I compute all recommendations during the build process. This can be slow, but it only happens once before publishing the site. To keep the development server fast, I implemented ad-hoc caching so embeddings aren’t recalculated every time I update a page or post.
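
A minimal version of such a cache (my own sketch, not the author’s implementation) keys stored embeddings by a hash of the post’s content, so an unchanged post is never re-embedded across rebuilds:

import { createHash } from 'node:crypto';
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';

declare function embedPost(text: string): Promise<number[]>; // sketched earlier

// Wraps the embedPost helper with an on-disk cache keyed by content hash.
async function embedPostCached(text: string): Promise<number[]> {
  const key = createHash('sha256').update(text).digest('hex');
  const path = `.embedding-cache/${key}.json`;

  if (existsSync(path)) {
    return JSON.parse(readFileSync(path, 'utf8'));
  }

  const embedding = await embedPost(text);
  mkdirSync('.embedding-cache', { recursive: true });
  writeFileSync(path, JSON.stringify(embedding));
  return embedding;
}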

But what if we needed to serve recommendations dynamically on each page request, rather than precomputing them for static HTML?

Suppose we have an API with three methods: create post, update post, and fetch post (with recommendations included). One approach is to generate the embedding vector right after saving content to the database, possibly using CQRS (Command Query Responsibility Segregation) to offload the work to a separate process that runs with some delay. This keeps the API responsive, since requests never wait for vector creation.
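
As a sketch of that write path (the Express app, db, and embeddingJobs queue below are all hypothetical stand-ins):

import express from 'express';

// Hypothetical persistence layer and background job queue.
declare const db: { savePost(data: unknown): Promise<{ id: number }> };
declare const embeddingJobs: { enqueue(job: { postId: number }): void };

const app = express();
app.use(express.json());

// Persist the post, then hand embedding generation to a background
// worker so the request never waits for model inference.
app.post('/posts', async (req, res) => {
  const post = await db.savePost(req.body);
  embeddingJobs.enqueue({ postId: post.id }); // processed with some delay
  res.status(201).json({ id: post.id });
});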

We can store these vectors in a database that supports vector storage and nearest-neighbor queries, such as PostgreSQL (with pgvector) or Elasticsearch. These databases can efficiently find similar vectors in the vector store. Assuming posts are created or updated infrequently, the main bottleneck would likely be executing nearest-neighbor queries, but this approach should perform well in most common scenarios.
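
With pgvector, for example, the nearest-neighbor lookup becomes a single SQL query. In this sketch, the posts table and its embedding column are hypothetical; <=> is pgvector’s cosine-distance operator (1 minus cosine similarity), so ascending order puts the most similar posts first:

import pg from 'pg';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// Fetch the posts most similar to a given post, excluding the post itself.
async function fetchRecommendations(postId: number, limit = 2) {
  const { rows } = await pool.query(
    `SELECT p.id, p.title,
            1 - (p.embedding <=> q.embedding) AS similarity
       FROM posts p,
            (SELECT embedding FROM posts WHERE id = $1) q
      WHERE p.id <> $1
      ORDER BY p.embedding <=> q.embedding
      LIMIT $2`,
    [postId, limit]
  );
  return rows;
}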