Vector Embedding Targeting

Use this guide to understand how the platform uses vector embeddings to power content-aware link personalization.

When to use

  • You want to understand how the platform recommends or groups links by similarity.
  • You are building a page or profile and want to know how link ordering and relevance work.
  • You want to optimize your link metadata (titles, tags, categories) for better personalization.

What are link embeddings?

The platform generates lightweight vector embeddings for each link in a handle's corpus. These embeddings capture the semantic content of a link based on its metadata, enabling:

  • Similarity-based link grouping — Links with similar content cluster together.
  • Content-aware ordering — Pages can order links by relevance to a visitor's observed interests.
  • Personalization signals — Embeddings feed into the targeting pipeline alongside behavioral signals.

How it works

1. Source text extraction

For each link, the platform builds a source text string from:

Field            Example
title            "Shop Now"
tags             "fashion", "sale"
ogDescription    "Browse our latest collection"
categories       "e-commerce"

These fields are concatenated (space-separated) and truncated to 256 characters.
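
The assembly step above can be sketched as follows. This is a minimal illustration, not the platform's implementation; the field names mirror the table above, and the `build_source_text` helper is hypothetical.

```python
def build_source_text(link: dict, max_len: int = 256) -> str:
    """Concatenate link metadata fields (space-separated), then truncate."""
    parts = [
        link.get("title", ""),
        " ".join(link.get("tags", [])),
        link.get("ogDescription", ""),
        " ".join(link.get("categories", [])),
    ]
    # Skip empty fields so we don't emit doubled spaces.
    text = " ".join(p for p in parts if p)
    return text[:max_len]

link = {
    "title": "Shop Now",
    "tags": ["fashion", "sale"],
    "ogDescription": "Browse our latest collection",
    "categories": ["e-commerce"],
}
print(build_source_text(link))
# Shop Now fashion sale Browse our latest collection e-commerce
```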

2. Tokenization

The source text is:

  1. Lowercased.
  2. Split on non-alphanumeric characters.
  3. Filtered to remove single-character tokens.
  4. Filtered to remove common stopwords (a, the, is, in, for, etc.).
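
The four steps above can be sketched in a few lines. This is an illustrative approximation: the platform's full stopword list is not documented here, so the set below is only a small sample.

```python
import re

# Illustrative subset only; the platform's actual stopword list is larger.
STOPWORDS = {"a", "the", "is", "in", "for", "of", "and", "to"}

def tokenize(text: str) -> list[str]:
    # 1. Lowercase.  2. Split on non-alphanumeric characters.
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    # 3. Drop single-character tokens.  4. Drop stopwords.
    return [t for t in tokens if len(t) > 1 and t not in STOPWORDS]

print(tokenize("Shop Now: the best in fashion & sale!"))
# ['shop', 'now', 'best', 'fashion', 'sale']
```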

3. TF-IDF weighting

Each token is weighted using Term Frequency - Inverse Document Frequency (TF-IDF):

  • TF (Term Frequency) — How often the token appears in this link's text.
  • IDF (Inverse Document Frequency) — How rare the token is across all links in the handle. Rare terms are weighted higher.

This means distinctive terms (like a product name) contribute more to the embedding than common terms.
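
A minimal TF-IDF sketch makes the weighting concrete. The exact TF and IDF formulas the platform uses are not documented here; the smoothed variant below is one common choice, shown only to illustrate why rare terms dominate.

```python
import math
from collections import Counter

def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each token per document: tf * log(1 + N / df)."""
    n = len(docs)
    # df[t] = number of documents containing token t.
    df = Counter(t for doc in docs for t in set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({
            t: (c / len(doc)) * math.log(1 + n / df[t])
            for t, c in tf.items()
        })
    return out

docs = [["ruby", "ring", "sale"], ["gold", "ring", "sale"], ["shop", "now"]]
weights = tfidf(docs)
# "ruby" appears in one doc, "sale" in two, so "ruby" is weighted higher:
assert weights[0]["ruby"] > weights[0]["sale"]
```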

4. Random projection

The TF-IDF vector is compressed to 32 dimensions using sparse random projection (Achlioptas 2003). This technique:

  • Preserves relative distances between embeddings.
  • Uses a deterministic seed so embeddings are reproducible across renders.
  • Is extremely fast (<5 ms for 20 links, zero external dependencies).
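
The projection can be sketched with the Achlioptas (2003) construction: matrix entries are +1 or -1 with probability 1/6 each and 0 with probability 2/3, scaled by sqrt(3 / d). The helper names and the seed handling below are illustrative, not the platform's API; only the seed default of 42 comes from this page.

```python
import random

def sparse_projection_matrix(in_dim: int, out_dim: int = 32, seed: int = 42):
    # Achlioptas (2003): sparse entries preserve pairwise distances
    # in expectation; a fixed seed makes the matrix reproducible.
    rng = random.Random(seed)
    scale = (3 / out_dim) ** 0.5
    return [
        [scale * rng.choice([1, -1, 0, 0, 0, 0]) for _ in range(in_dim)]
        for _ in range(out_dim)
    ]

def project(vec: list[float], matrix: list[list[float]]) -> list[float]:
    return [sum(r * v for r, v in zip(row, vec)) for row in matrix]

m = sparse_projection_matrix(in_dim=1000)
compressed = project([0.0] * 999 + [1.0], m)
assert len(compressed) == 32
```

Because the seed is fixed, calling `sparse_projection_matrix` twice yields the same matrix, which is what makes embeddings reproducible across renders.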

5. L2 normalization

The final embedding is L2-normalized (unit length) and rounded to 4 decimal places for storage efficiency.
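
As a quick sketch of this final step (the function name is hypothetical):

```python
def l2_normalize(vec: list[float], places: int = 4) -> list[float]:
    """Scale vec to unit length, then round each component for storage."""
    norm = sum(x * x for x in vec) ** 0.5
    if norm == 0:
        return [0.0] * len(vec)
    return [round(x / norm, places) for x in vec]

print(l2_normalize([3.0, 4.0]))
# [0.6, 0.8]  — length 1
```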

Minimum corpus size

Embeddings require at least 3 links in a handle's corpus to be meaningful. With fewer than 3 links, the IDF component collapses (there is not enough contrast between documents), and the platform returns empty embeddings.

How embeddings are used

Page rendering

When a profile page or link collection is rendered, embeddings are computed for all links in the handle. The rendering pipeline can then:

  • Cluster similar links together in the UI.
  • Rank links by relevance to visitor signals or segments.
  • Power "related links" sections on individual link pages.

Targeting pipeline integration

Embeddings complement the chain signals system:

  1. Visitor signals capture behavioral intent (what the visitor clicked, viewed, or searched for).
  2. Link embeddings capture content semantics (what each link is about).
  3. The targeting pipeline can match visitor intent signals against link embeddings to surface the most relevant content.

This enables scenarios like: a visitor who has signaled interest in "rings" sees jewelry-related links ranked higher in their personalized view.
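
The matching step can be sketched as a cosine-similarity ranking. The targeting pipeline's internals are not documented here; this hypothetical example only shows why L2-normalized embeddings make the match cheap (cosine similarity reduces to a dot product).

```python
def cosine(a: list[float], b: list[float]) -> float:
    # For unit-length vectors, the dot product IS the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def rank_links(visitor_vec: list[float], links: list[tuple]) -> list[tuple]:
    """Sort (link_id, embedding) pairs by similarity to the visitor vector."""
    return sorted(links, key=lambda lk: cosine(visitor_vec, lk[1]), reverse=True)

links = [("rings", [1.0, 0.0]), ("shoes", [0.0, 1.0])]
visitor = [0.9, 0.1]  # intent vector closer to the "rings" embedding
assert rank_links(visitor, links)[0][0] == "rings"
```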

Optimizing for embeddings

To get the most out of embedding-based targeting:

Do                               Why
Write descriptive link titles    Titles are the primary source text
Add relevant tags                Tags add distinctive keywords the title may lack
Fill in ogDescription            Adds additional context for the embedding
Use categories consistently      Categories help links cluster into meaningful groups
Keep metadata specific           Generic terms ("best", "great") add noise, not signal

Avoid                            Why
Identical titles across links    Identical source text produces identical embeddings
Empty metadata fields            Less text means a weaker, less distinctive embedding
Excessive stopwords in titles    Stopwords are filtered out and contribute nothing

Technical details

Property                  Value
Embedding dimension       32
Algorithm                 TF-IDF + sparse random projection
Normalization             L2 (unit length)
Precision                 4 decimal places
Minimum corpus            3 links
Source text max length    256 characters
Projection seed           Deterministic (default 42)
Performance               <5 ms for 20 links
External dependencies     None

UI path

Embeddings are computed automatically during page rendering. There is no separate UI to configure them. To influence embedding quality:

  1. Open https://app.{PUBLIC_DOMAIN}.
  2. Navigate to your handle and open Links.
  3. For each link, fill in the title, tags, categories, and social preview description.
  4. Save changes — embeddings are recomputed on the next page render.

Required auth

Embeddings are computed internally during page rendering. No separate auth is needed. Link metadata updates require CREATOR-level JWT or PAT with links.write scope.
