author metamuffin <metamuffin@disroot.org> 2023-02-13 20:25:04 +0100
committer metamuffin <metamuffin@disroot.org> 2023-02-13 20:25:04 +0100
commit c19adca147d38562b3f4a06cb2205e043bc24856
tree 808ceebd163294cc66ed8882885348b914ab1125 /content/articles/2022-11-10-artist-correlation.md
parent 77eef59404acaed6faa636239bd18010e34a91de
restructure for embedding into my website
Diffstat (limited to 'content/articles/2022-11-10-artist-correlation.md')
-rw-r--r-- content/articles/2022-11-10-artist-correlation.md 78
1 files changed, 0 insertions, 78 deletions
diff --git a/content/articles/2022-11-10-artist-correlation.md b/content/articles/2022-11-10-artist-correlation.md
deleted file mode 100644
index d52e613..0000000
--- a/content/articles/2022-11-10-artist-correlation.md
+++ /dev/null
@@ -1,78 +0,0 @@
-# Correlating music artists
-
-I listen to a lot of music, so every few months my music collection gets boring
-again. So far I have asked friends to recommend me music, but I am running out
-of friends now too. So over the course of a few days I came up with a new
-solution.
-
-I want to find new music that I might like. After some research I found
-[Musicbrainz](https://musicbrainz.org/) (a database of all artists and
-recordings ever made) and [Listenbrainz](https://listenbrainz.org/) (a service
-to which you can submit what you are listening to). Both databases are useful
-for this project. The high-level goal is to know what people who have a lot of
-music in common with me like to listen to. For that, the number of listeners
-that two artists share is what matters. I use the term 'a listen' to refer to
-one playthrough of a track.
-
-## The Procedure
-
-### Parse data & drop unnecessary detail
-
-All of the JSON files from Listenbrainz are parsed, and only the information
-about how many listens each user has submitted for each artist is kept. The
-result is stored in a B-tree map on disk (the
-[sled library](https://crates.io/crates/sled) is great for that).
-
-- First mapping created: `(user, artist) -> listens`.
-- (Also created a name lookup: `artist -> artist name`)
-
-The B-tree keeps its keys in order, so I can iterate through all artists of a
-user by scanning the prefix `(user, …`.
-
-### Create a graph
-
-Next, an undirected graph with weighted edges is generated, where nodes are
-artists and edges are shared listens. For each user and each pair of artists
-they listen to, the weight of the connecting edge is incremented by the sum of
-the logarithms of that user's playthrough counts for the two artists. This means
-that artists that share listeners are connected and, because of the logarithms,
-a user who listens to an artist _a lot_ does not contribute proportionally more
-weight.
-
-Mapping: `(artist, artist) -> weight`. (Every key `(x, y)` has the same value as
-`(y, x)`, so that edges are undirected.)
-
-### Query artists
-
-The graph tree can now be queried by scanning with a prefix of one artist
-(`("The Beatles", …`) and all correlated artists are returned with a weight. The
-top-weighted results are kept and saved.
-
-### Notes
-
-Two issues appeared during this project that led to the following fixes:
-
-- Limit one identity to 32 artists at most, because the edge count grows
- quadratically (100 artists -> 10000 edges).
-- When parsing data, the user id is made dependent on the time, to separate
- artists when music tastes change over time. Every 10 Ms (~4 months) the user
- ids change.
-
-## Results
-
-In a couple of minutes I rendered about 2.2 million HTML documents with my
-results. They are available at `https://metamuffin.org/artist-correl/{name}`.
-Some example links:
-
-- [The Beatles](https://metamuffin.org/artist-correl/The%20Beatles)
-- [Aimer](https://metamuffin.org/artist-correl/Aimer)
-- [Rammstein](https://metamuffin.org/artist-correl/Rammstein)
-- [Mitski](https://metamuffin.org/artist-correl/Mitski)
-
-## Numbers
-
-- Musicbrainz: 15GB
-- Listenbrainz: 350GB
-- Extracted listening data: 23GB
-- Graph: 56GB
-- Rendered HTML: 2.3GB
-- Compressed HTML (squashfs with zstd): 172MB
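
The procedure described in the deleted article (per-user listen counts, log-weighted artist graph, prefix query) can be sketched roughly as follows. This is a minimal in-memory illustration, not the author's implementation: a plain dict with sorted keys stands in for the on-disk sled tree, and the function names, the natural logarithm, and the choice to keep a user's most-listened artists when applying the 32-artist cap are all assumptions.

```python
import math
from bisect import bisect_left
from collections import defaultdict
from itertools import combinations

MAX_ARTISTS_PER_IDENTITY = 32   # cap mentioned in the article's notes
BUCKET_SECONDS = 10_000_000     # 10 Ms, roughly 4 months


def identity(user, timestamp):
    """Make the user id time-dependent so changing tastes are kept apart."""
    return (user, timestamp // BUCKET_SECONDS)


def build_graph(listens):
    """listens maps (identity, artist) -> listen count; returns (a, b) -> weight."""
    # Group artists per identity (the article does this with a prefix scan).
    per_identity = defaultdict(dict)
    for (ident, artist), count in listens.items():
        per_identity[ident][artist] = count

    graph = defaultdict(float)
    for artists in per_identity.values():
        # Assumption: the cap keeps the most-listened artists.
        top = sorted(artists.items(), key=lambda kv: (-kv[1], kv[0]))
        top = top[:MAX_ARTISTS_PER_IDENTITY]
        for (a, count_a), (b, count_b) in combinations(top, 2):
            # Sum of logarithms dampens very heavy listeners.
            weight = math.log(count_a) + math.log(count_b)
            graph[(a, b)] += weight
            graph[(b, a)] += weight  # store both directions for prefix queries
    return dict(graph)


def query(graph, artist, limit=10):
    """Return up to `limit` (other_artist, weight) pairs, heaviest first."""
    keys = sorted(graph)  # an ordered tree would already keep keys sorted
    start = bisect_left(keys, (artist,))
    results = []
    for key in keys[start:]:
        if key[0] != artist:
            break  # left the `(artist, …` prefix range
        results.append((key[1], graph[key]))
    results.sort(key=lambda kv: -kv[1])
    return results[:limit]
```

Calling `query(build_graph(listens), "The Beatles")` on such a mapping would return the artists that share listeners with The Beatles, ordered by weight. Note that with this weighting an artist pair where the user has exactly one listen each adds a weight of zero, since `log(1) = 0`.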