From ab59d472f76d35304adb9c5f5f8cb8de37a6cc36 Mon Sep 17 00:00:00 2001
From: metamuffin
Date: Thu, 10 Nov 2022 17:55:35 +0100
Subject: clarity, typos

---
 content/articles/2022-11-10-artist-correlation.md | 25 ++++++++++++-----------
 1 file changed, 13 insertions(+), 12 deletions(-)

(limited to 'content/articles/2022-11-10-artist-correlation.md')

diff --git a/content/articles/2022-11-10-artist-correlation.md b/content/articles/2022-11-10-artist-correlation.md
index c8eaa62..c62e790 100644
--- a/content/articles/2022-11-10-artist-correlation.md
+++ b/content/articles/2022-11-10-artist-correlation.md
@@ -2,23 +2,24 @@
 
 A hear a lot of music and so every few months my music collection gets boring
 again. So far I have asked friends to recommend me music but I am running out of
-friend too now. Therefore I came up with a new solution.
+friend too now. Therefore I came up with a new solution during a few days.
 
 I want to find new music that i might like too. After some research I found that
-there is [Musicbrainz](https://musichbrainz.org/) (a database of all artists and
+there is [Musicbrainz](https://musicbrainz.org/) (a database of all artists and
 recordings ever made) and [Listenbrainz](https://listenbrainz.org/) (a service
-to which you can submit what you are listening too). Both databases are useful
+to which you can submit what you are listening to). Both databases are useful
 for this project. The high-level goal is to know, what people that have a lot of
-listens in common, like to listen to.
+listens in common, like to listen to. For that the shared number of listeners
+for each artist is relevant.
 
 ## The Procedure
 
 ### Parse data & drop unnecessary detail
 
 I parse all of the JSON files of listenbrainz and only keep information about
-how often what user listens to which artists. The result is stored in a B-tree
-map on my disk (the [sled library](https://crates.io/crates/sledg) is great for
-that).
+how many listens each user has submitted for what artist. The result is stored
+in a B-tree map on my disk (the [sled library](https://crates.io/crates/sledg)
+is great for that).
 
 - First mapping created: `(user, artist) -> shared listens`.
 - (Also created a name lookup: `artist -> artist name`)
@@ -30,17 +31,17 @@ a user, by scanning the prefix `(user, …`.
 
 Next an undirected graph with weighted edges is generated where nodes are
 artists and edges are shared listens. For every user, each pair of artists they
-listen to, receives the sum of listens to either one's listens.
+listen to, receives the sum of listens to either one's listens. This means that
+artists that share listeners are connected.
 
-Mapping: `(artist, artist) -> weight`.
-
-Every key `(x, y)` is identical with `(y, x)` so that edges are undirectional.
+Mapping: `(artist, artist) -> weight`. (Every key `(x, y)` is identical with
+`(y, x)` so that edges are undirectional.)
 
 ### Query artists
 
 The graph tree can now be queried by scanning with a prefix of one artist
 (`("The Beatles", …`) and all correlated artists are returned with a weight. The
-top 16 results are kept and saved.
+top-weighted results are kept and saved.
 
 ## Results
 
-- 
cgit v1.2.3-70-g09d2
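
The procedure described in the patched article (a `(user, artist) -> listens` mapping, a weighted artist graph, and prefix queries) can be illustrated with a small sketch. This is not the author's code: it swaps sled for an in-memory std `BTreeMap`, the names `Listens`, `Graph`, `build_graph` and `query` are hypothetical, and storing every edge under both key orders is only an assumption about how `(x, y)` and `(y, x)` are kept identical.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the on-disk `(user, artist) -> listens` tree.
type Listens = BTreeMap<(String, String), u64>;
/// Hypothetical stand-in for the `(artist, artist) -> weight` tree.
type Graph = BTreeMap<(String, String), u64>;

/// For every user, each pair of artists they listen to receives
/// the sum of that user's listens to both artists.
fn build_graph(listens: &Listens) -> Graph {
    // Collect each user's artists together with their listen counts.
    let mut per_user: BTreeMap<&str, Vec<(&str, u64)>> = BTreeMap::new();
    for ((user, artist), count) in listens {
        per_user
            .entry(user.as_str())
            .or_default()
            .push((artist.as_str(), *count));
    }

    let mut graph = Graph::new();
    for artists in per_user.values() {
        for (i, &(a, la)) in artists.iter().enumerate() {
            for &(b, lb) in &artists[i + 1..] {
                // Assumption: write both key orders so that a prefix scan
                // on either artist finds the edge.
                *graph.entry((a.to_string(), b.to_string())).or_insert(0) += la + lb;
                *graph.entry((b.to_string(), a.to_string())).or_insert(0) += la + lb;
            }
        }
    }
    graph
}

/// Scan all keys starting with `artist` and keep the `top` heaviest edges.
fn query(graph: &Graph, artist: &str, top: usize) -> Vec<(String, u64)> {
    let mut hits: Vec<(String, u64)> = graph
        // The empty string sorts first, so this starts at the artist's first edge.
        .range((artist.to_string(), String::new())..)
        .take_while(|((a, _), _)| a == artist)
        .map(|((_, b), w)| (b.clone(), *w))
        .collect();
    // Heaviest first; ties are broken by artist name for a stable order.
    hits.sort_by(|x, y| y.1.cmp(&x.1).then_with(|| x.0.cmp(&y.0)));
    hits.truncate(top);
    hits
}
```

Because the keys are ordered tuples, the range scan plays the role of the prefix scan mentioned in the article; with sled the same idea would need a byte encoding of the key, which this sketch leaves out.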