How we find songs from fragments.
Most search engines fail at "what's that song that goes da-da-dum-dum about a girl in a yellow dress." Here's the four-stage pipeline we built to make it work.
1. Embedding match
Your query becomes a vector via OpenAI embeddings, then we run a nearest-neighbour search across millions of indexed lyric chunks. Paraphrases and misheard lines still find the right song.
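A minimal sketch of the matching step, for illustration only. It assumes toy three-dimensional vectors in place of real embeddings and a brute-force scan in place of whatever index runs in production; the names `cosine`, `best_songs` and `chunk_index` are hypothetical. Each lyric chunk is scored against the query, and every song keeps the score of its best chunk.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_songs(query_vec, chunk_index, top_k=3):
    """Score every chunk, keep each song's best chunk score, return the top_k songs."""
    best = {}
    for song_id, chunk_vec in chunk_index:
        score = cosine(query_vec, chunk_vec)
        if score > best.get(song_id, float("-inf")):
            best[song_id] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy vectors standing in for real embeddings (assumption, not real data).
chunk_index = [
    ("song_a", [0.9, 0.1, 0.0]),
    ("song_a", [0.2, 0.8, 0.1]),
    ("song_b", [0.1, 0.9, 0.2]),
    ("song_c", [0.0, 0.1, 0.9]),
]
ranked = best_songs([1.0, 0.2, 0.0], chunk_index, top_k=2)
print([song for song, _ in ranked])
```

A linear scan like this is exact but slow; at millions of chunks an approximate nearest-neighbour index would normally take its place.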
2. Web reasoning
For vibe queries like "90s song with whistling, sounds melancholy", we run a real-time search and feed the results to an LLM that pulls out candidate matches.
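One way to picture this stage, sketched under assumptions: search snippets are numbered and placed in a prompt, and the model's reply is parsed line by line. The prompt wording, the `Title | Artist` reply format, and the function names are all hypothetical, and the actual search and model calls are left out.

```python
def build_prompt(query, snippets):
    """Assemble a prompt asking a model to list candidate songs found in the snippets."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(snippets, start=1))
    return (
        "A user is trying to identify a song.\n"
        f"Description: {query}\n\n"
        f"Search results:\n{numbered}\n\n"
        "List up to five candidate songs, one per line, as 'Title | Artist'."
    )

def parse_candidates(reply):
    """Parse 'Title | Artist' lines from a model reply, skipping malformed lines."""
    candidates = []
    for line in reply.splitlines():
        title, sep, artist = line.partition("|")
        if sep and title.strip() and artist.strip():
            candidates.append((title.strip(), artist.strip()))
    return candidates

# Placeholder reply text; a real reply would come from the model.
reply = "Song One | Artist A\nnot a candidate\n | missing title\nSong Two | Artist B"
print(parse_candidates(reply))
```

Asking for a fixed line format keeps the parsing step simple and lets malformed lines be dropped instead of guessed at.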
3. Audio verification
Top candidates link to YouTube so you can confirm by ear in one click. The first 30 seconds is usually enough to know.
4. Neighbour discovery
Once you've confirmed a match, we surface five embedding-nearest songs (same era, same mood, same energy) for the rabbit hole.
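A small sketch of how such neighbours could be picked, assuming each song has a single summary vector (for example, an average of its chunk vectors; that detail is an assumption). The confirmed song is excluded and the `k` most similar remaining songs are returned. `neighbours` and `song_vectors` are hypothetical names, and the two-dimensional vectors are toy values.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def neighbours(confirmed_id, song_vectors, k=5):
    """Return the k songs most similar to the confirmed one, excluding the song itself."""
    target = song_vectors[confirmed_id]
    scored = (
        (cosine(target, vec), song_id)
        for song_id, vec in song_vectors.items()
        if song_id != confirmed_id
    )
    return [song_id for _, song_id in heapq.nlargest(k, scored)]

# Toy song-level vectors (assumption, not real data).
song_vectors = {
    "s1": [1.0, 0.0],
    "s2": [0.9, 0.1],
    "s3": [0.5, 0.5],
    "s4": [0.0, 1.0],
}
print(neighbours("s1", song_vectors, k=2))
```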
Why this is harder than it looks
Music search has three properties that break standard search engines:
- Memory is fuzzy. People rarely remember the exact words. They remember the cadence, a similar word, or the wrong word entirely.
- Lyrics are repetitive. A four-word fragment can match thousands of songs. Telling them apart takes a model that can judge which match is most likely the one you mean.
- Queries come in every form. Sometimes you have a phrase, sometimes a feeling, sometimes a description of the music video. A single retrieval method can't handle all three.
What makes our index different
Most lyric search tools index whole songs and rank by exact-phrase match. We index lyric chunks (overlapping windows of a few lines each) as embeddings, which means a fragment like "I'm walking on broken glass and dreaming" can match a song whose actual lyric is "walking through the broken parts, dreaming wide awake". Same meaning, different words.
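The chunking idea can be sketched as follows. The window of four lines and stride of two are assumed values chosen for illustration ("a few lines each" is all that is specified), and `chunk_lines` is a hypothetical name. Each window would then be embedded as one index entry.

```python
def chunk_lines(lines, window=4, stride=2):
    """Split lyric lines into overlapping windows.

    Each chunk holds up to `window` consecutive lines, and successive chunks
    start `stride` lines apart, so neighbouring chunks share lines.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    chunks = []
    for start in range(0, len(lines), stride):
        chunks.append(lines[start:start + window])
        if start + window >= len(lines):
            break  # this chunk already reaches the last line
    return chunks

# Placeholder lines standing in for a song's lyrics.
lyric_lines = ["line 1", "line 2", "line 3", "line 4", "line 5", "line 6"]
for chunk in chunk_lines(lyric_lines):
    print(" / ".join(chunk))
```

Because the windows overlap, a remembered fragment that straddles two lines still falls wholly inside at least one chunk.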
What gets fed back into the system
Every confirmed match (when a user clicks through and confirms the song) gets re-embedded and added to the index, alongside the query that found it. Over time the system learns the natural mapping between how people describe songs and the songs themselves. The more it's used, the better it gets at the long tail.
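A rough sketch of that feedback step, under stated assumptions: the index is a plain list, each entry stores the song, the query's vector, and the query text, and `toy_embed` is a deliberately crude stand-in for a real embedding call. All names here are hypothetical.

```python
def toy_embed(text):
    """Stand-in for a real embedding model: [character count, vowel count]."""
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text.lower()))]

def record_confirmation(index, query_text, song_id, embed=toy_embed):
    """Embed the query behind a confirmed match and store it under that song."""
    index.append((song_id, embed(query_text), query_text))

index = []
record_confirmation(index, "da da dum", "song_a")
print(index)
```

Once stored, the query vector is just another entry that future searches can match, which is how a later, similarly worded query can reach the same song.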
What we don't do
We don't store full lyrics on the site. We don't claim ownership of artwork or audio. We don't sell anonymized search data. The site is built to find songs, not to repackage them.