author | metamuffin <metamuffin@disroot.org> | 2022-11-10 22:16:06 +0100 |
---|---|---|
committer | metamuffin <metamuffin@disroot.org> | 2022-11-10 22:16:06 +0100 |
commit | 0bef88a292b4c32992c8141b6d3db402f3015ef9 (patch) | |
tree | 3b0abdc5a8eb21eabb06a786585f3abb85d6afbf | |
parent | ab59d472f76d35304adb9c5f5f8cb8de37a6cc36 (diff) | |
include new features
-rw-r--r-- | content/articles/2022-11-07-programming-language-design.md | 62 |
-rw-r--r-- | content/articles/2022-11-10-artist-correlation.md | 46 |
2 files changed, 92 insertions, 16 deletions
diff --git a/content/articles/2022-11-07-programming-language-design.md b/content/articles/2022-11-07-programming-language-design.md
new file mode 100644
index 0000000..5a8b6b7
--- /dev/null
+++ b/content/articles/2022-11-07-programming-language-design.md
@@ -0,0 +1,62 @@
+# Some Thoughts on Programming Language Design
+
+This is a collection of ideas to look at when inventing new languages.
+
+## Type System
+
+```diff
+# Haskell
+data LinkedList a = Nil | Cons a (LinkedList a)
+data Test = Empty | Blub Int | State { x :: Int, y :: Int }
+# Rust
+enum LinkedList<T> { Nil, Cons(T, Box<LinkedList<T>>) }
+```
+
+## Memory Management
+
+- **Drop when out-of-scope**
+- Garbage collection
+- Reference counting
+
+## Compile-time logic
+
+- Annotation when calling function to be run as-far-as-possible at comptime
+
+```diff
+fn format(template: String, args: [String]) -> String {
+    template.replace("@", (match, i) => args[i])
+}
+
+fun add(x, y) x + y
+
+fun main() print(format!("@ ist @; @", ["1+1", 1+1, x]))
+# should expand to
+fun main() print("1+1 ist 2; " ~ x)
+```
+
+## Examples
+
+### Fizz-Buzz
+
+```diff
+for (n in 0..100) {
+    if (n % (3*5) == 0) print("FizzBuzz")
+    else if (n % 3 == 0) print("Fizz")
+    else if (n % 5 == 0) print("Buzz")
+    else print(n)
+}
+
+
+if (true) x = 1
+if (true) { x = 1 }
+
+```
+
+```
+f(x) = 10 + g(x)
+f x = 10 + g x
+
+main = {
+
+}
+```
diff --git a/content/articles/2022-11-10-artist-correlation.md b/content/articles/2022-11-10-artist-correlation.md
index c62e790..d52e613 100644
--- a/content/articles/2022-11-10-artist-correlation.md
+++ b/content/articles/2022-11-10-artist-correlation.md
@@ -1,25 +1,27 @@
 # Correlating music artists
 
-A hear a lot of music and so every few months my music collection gets boring
-again. So far I have asked friends to recommend me music but I am running out of
-friend too now. Therefore I came up with a new solution during a few days.
+I listen to a lot of music and so every few months my music collection gets
+boring again. So far I have asked friends to recommend me music but I am running
+out of friends too now. Therefore I came up with a new solution during a few
+days.
 
 I want to find new music that I might like too. After some research I found that
 there is [Musicbrainz](https://musicbrainz.org/) (a database of all artists and
 recordings ever made) and [Listenbrainz](https://listenbrainz.org/) (a service
 to which you can submit what you are listening to). Both databases are useful
 for this project. The high-level goal is to know, what people that have a lot of
-listens in common, like to listen to. For that the shared number of listeners
-for each artist is relevant.
+music in common with me, like to listen to. For that the shared number of
+listeners for each artist is relevant. I use the word 'a listen', to refer to
+one playthrough of a track.
 
 ## The Procedure
 
 ### Parse data & drop unnecessary detail
 
-I parse all of the JSON files of listenbrainz and only keep information about
-how many listens each user has submitted for what artist. The result is stored
-in a B-tree map on my disk (the [sled library](https://crates.io/crates/sledg)
-is great for that).
+All of the JSON files of listenbrainz are parsed and only information about how
+many listens each user has submitted for what artist is kept. The result is
+stored in a B-tree map on my disk (the
+[sled library](https://crates.io/crates/sledg) is great for that).
 
 - First mapping created: `(user, artist) -> shared listens`.
 - (Also created a name lookup: `artist -> artist name`)
@@ -30,9 +32,11 @@ a user, by scanning the prefix `(user, …`.
 ### Create a graph
 
 Next an undirected graph with weighted edges is generated where nodes are
-artists and edges are shared listens. For every user, each pair of artists they
-listen to, receives the sum of listens to either one's listens. This means that
-artists that share listeners are connected.
+artists and edges are shared listens. For each user, each edge connecting
+artists they listen to, the weight is incremented by the sum of the logarithms
+of either one's playthrough count for that user. This means that artists that
+share listeners are connected and because of the logarithms, users that listen
+to an artist _a lot_ won't be weighted proportionally.
 
 Mapping: `(artist, artist) -> weight`. (Every key `(x, y)` is identical with
 `(y, x)` so that edges are undirected.)
@@ -43,6 +47,16 @@ The graph tree can now be queried by scanning with a prefix of one artist
 (`("The Beatles", …`) and all correlated artists are returned with a weight. The
 top-weighted results are kept and saved.
 
+### Notes
+
+Two issues appeared during this project that lead to the following fixes:
+
+- Limit one identity to 32 artists at most because the edge count grows
+  quadratically (100 artists -> 10000 edges)
+- When parsing data the user id is made dependent on the time to separate artists
+  when music tastes change over time. Every 10Ms (~4 months) the user ids
+  change.
+
 ## Results
 
 In a couple of minutes I rendered about 2.2 million HTML documents with my
@@ -58,7 +72,7 @@ Some example links:
 
 - Musicbrainz: 15GB
 - Listenbrainz: 350GB
-- Extracted listening data: 11GB
-- Graph: 24GB
-- Rendered HTML: 8.4GB
-- Compressed HTML (squashfs with zstd): 105MB
+- Extracted listening data: 23GB
+- Graph: 56GB
+- Rendered HTML: 2.3GB
+- Compressed HTML (squashfs with zstd): 172MB
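The procedure that the artist-correlation article describes (count listens per user and artist, add log-weighted edges between artists, query by prefix) can be sketched roughly as follows. This is a minimal in-memory sketch, not the author's code: it uses `std::collections::BTreeMap` in place of the on-disk sled tree, and the `Listen` struct, the function names, the `user#window` identity format, the choice to keep the 32 most-listened artists, and storing each edge in both directions so a prefix scan finds it are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

// Values taken from the notes in the article: 10 Ms windows, 32 artists per identity.
const WINDOW_SECS: u64 = 10_000_000;
const MAX_ARTISTS: usize = 32;

/// One playthrough of a track (hypothetical field names).
pub struct Listen<'a> {
    pub user: &'a str,
    pub artist: &'a str,
    pub timestamp: u64, // Unix seconds
}

/// First mapping: (identity, artist) -> listen count.
/// An identity is a user within one time window, so a user whose taste
/// changes over time is treated as several separate listeners.
pub fn count_listens(listens: &[Listen<'_>]) -> BTreeMap<(String, String), u64> {
    let mut counts = BTreeMap::new();
    for l in listens {
        let identity = format!("{}#{}", l.user, l.timestamp / WINDOW_SECS);
        *counts.entry((identity, l.artist.to_string())).or_insert(0) += 1;
    }
    counts
}

/// Second mapping: (artist, artist) -> weight.
/// Each pair of artists of one identity adds ln(count_a) + ln(count_b).
pub fn build_graph(counts: &BTreeMap<(String, String), u64>) -> BTreeMap<(String, String), f64> {
    // Group the artists of every identity together.
    let mut per_identity: BTreeMap<&str, Vec<(&str, u64)>> = BTreeMap::new();
    for ((identity, artist), n) in counts {
        per_identity.entry(identity.as_str()).or_default().push((artist.as_str(), *n));
    }
    let mut graph = BTreeMap::new();
    for artists in per_identity.values_mut() {
        // Assumption: when capping, keep the most-listened artists.
        artists.sort_by(|a, b| b.1.cmp(&a.1));
        artists.truncate(MAX_ARTISTS);
        for (i, &(a, na)) in artists.iter().enumerate() {
            for &(b, nb) in &artists[i + 1..] {
                let w = (na as f64).ln() + (nb as f64).ln();
                // Store both (a, b) and (b, a) so either artist works as a prefix.
                *graph.entry((a.to_string(), b.to_string())).or_insert(0.0) += w;
                *graph.entry((b.to_string(), a.to_string())).or_insert(0.0) += w;
            }
        }
    }
    graph
}

/// Query: scan all keys whose first component is `artist`, keep the `top` heaviest.
pub fn correlated(graph: &BTreeMap<(String, String), f64>, artist: &str, top: usize) -> Vec<(String, f64)> {
    let start = (artist.to_string(), String::new());
    let mut hits: Vec<(String, f64)> = graph
        .range(start..)
        .take_while(|((a, _), _)| a.as_str() == artist)
        .map(|((_, b), w)| (b.clone(), *w))
        .collect();
    hits.sort_by(|x, y| y.1.total_cmp(&x.1));
    hits.truncate(top);
    hits
}
```

Note that with this weighting a single listen contributes `ln(1) = 0`, so an artist only adds weight to an edge once an identity has played it at least twice. Whether the original project handles that case differently is not stated in the article.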