author metamuffin <metamuffin@disroot.org> 2022-11-10 22:16:06 +0100
committer metamuffin <metamuffin@disroot.org> 2022-11-10 22:16:06 +0100
commit 0bef88a292b4c32992c8141b6d3db402f3015ef9 (patch)
tree 3b0abdc5a8eb21eabb06a786585f3abb85d6afbf
parent ab59d472f76d35304adb9c5f5f8cb8de37a6cc36 (diff)
include new features
-rw-r--r-- content/articles/2022-11-07-programming-language-design.md 62
-rw-r--r-- content/articles/2022-11-10-artist-correlation.md 46
2 files changed, 92 insertions, 16 deletions
diff --git a/content/articles/2022-11-07-programming-language-design.md b/content/articles/2022-11-07-programming-language-design.md
new file mode 100644
index 0000000..5a8b6b7
--- /dev/null
+++ b/content/articles/2022-11-07-programming-language-design.md
@@ -0,0 +1,62 @@
+# Some Thoughts on Programming Language Design
+
+This is a collection of ideas to look at when inventing new languages.
+
+## Type System
+
+```diff
+# Haskell
+data LinkedList a = Nil | Cons a (LinkedList a)
+data Test = Empty | Blub Int | State { x :: Int, y :: Int }
+# Rust
+enum LinkedList<T> { Nil, Cons(T, Box<LinkedList<T>>) }
+```
+
+## Memory Management
+
+- **Drop when out-of-scope**
+- Garbage collection
+- Reference counting
+
+## Compile-time logic
+
+- Annotation at the call site that makes a function run as far as possible at compile time
+
+```diff
+fn format(template: String, args: [String]) -> String {
+ template.replace("@", (match, i) => args[i])
+}
+
+fun add(x, y) x + y
+
+fun main() print(format!("@ is @; @", ["1+1", 1+1, x]))
+# should expand to
+fun main() print("1+1 is 2; " ~ x)
+```
+
+## Examples
+
+### Fizz-Buzz
+
+```diff
+for (n in 0..100) {
+ if (n % (3*5) == 0) print("FizzBuzz")
+ else if (n % 3 == 0) print("Fizz")
+ else if (n % 5 == 0) print("Buzz")
+ else print(n)
+}
+
+
+if (true) x = 1
+if (true) { x = 1 }
+
+```
+
+```
+f(x) = 10 + g(x)
+f x = 10 + g x
+
+main = {
+
+}
+```
diff --git a/content/articles/2022-11-10-artist-correlation.md b/content/articles/2022-11-10-artist-correlation.md
index c62e790..d52e613 100644
--- a/content/articles/2022-11-10-artist-correlation.md
+++ b/content/articles/2022-11-10-artist-correlation.md
@@ -1,25 +1,27 @@
# Correlating music artists
-A hear a lot of music and so every few months my music collection gets boring
-again. So far I have asked friends to recommend me music but I am running out of
-friend too now. Therefore I came up with a new solution during a few days.
+I listen to a lot of music, so every few months my music collection gets
+boring again. So far I have asked friends to recommend me music, but I am
+running out of friends now too. Therefore I came up with a new solution over a
+few days.
I want to find new music that i might like too. After some research I found that
there is [Musicbrainz](https://musicbrainz.org/) (a database of all artists and
recordings ever made) and [Listenbrainz](https://listenbrainz.org/) (a service
to which you can submit what you are listening to). Both databases are useful
for this project. The high-level goal is to know, what people that have a lot of
-listens in common, like to listen to. For that the shared number of listeners
-for each artist is relevant.
+music in common with me like to listen to. For that, the number of shared
+listeners for each artist is relevant. I use the term 'a listen' to refer to
+one playthrough of a track.
## The Procedure
### Parse data & drop unnecessary detail
-I parse all of the JSON files of listenbrainz and only keep information about
-how many listens each user has submitted for what artist. The result is stored
-in a B-tree map on my disk (the [sled library](https://crates.io/crates/sledg)
-is great for that).
+All of the JSON files of Listenbrainz are parsed and only the information about
+how many listens each user has submitted for each artist is kept. The result is
+stored in a B-tree map on my disk (the
+[sled library](https://crates.io/crates/sledg) is great for that).
- First mapping created: `(user, artist) -> shared listens`.
- (Also created a name lookup: `artist -> artist name`)
@@ -30,9 +32,11 @@ a user, by scanning the prefix `(user, …`.
### Create a graph
Next an undirected graph with weighted edges is generated where nodes are
-artists and edges are shared listens. For every user, each pair of artists they
-listen to, receives the sum of listens to either one's listens. This means that
-artists that share listeners are connected.
+artists and edges are shared listens. For each user and each pair of artists
+they listen to, the weight of the connecting edge is incremented by the sum of
+the logarithms of that user's playthrough counts for the two artists. This means
+that artists that share listeners are connected and, because of the logarithms,
+users that listen to an artist _a lot_ won't be weighted proportionally.
Mapping: `(artist, artist) -> weight`. (Every key `(x, y)` is identical with
`(y, x)` so that edges are undirectional.)
@@ -43,6 +47,16 @@ The graph tree can now be queried by scanning with a prefix of one artist
(`("The Beatles", …`) and all correlated artists are returned with a weight. The
top-weighted results are kept and saved.
+### Notes
+
+Two issues appeared during this project that led to the following fixes:
+
+- Limit one identity to 32 artists at most because the edge count grows
+ quadratically (100 artists -> 10000 edges)
+- When parsing data, the user id is made dependent on the time to separate
+ artists when music tastes change over time. Every 10Ms (~4 months) the user
+ ids change.
+
## Results
In a couple of minutes I rendered about 2.2 million HTML documents with my
@@ -58,7 +72,7 @@ Some example links:
- Musicbrainz: 15GB
- Listenbrainz: 350GB
-- Extracted listening data: 11GB
-- Graph: 24GB
-- Rendered HTML: 8.4GB
-- Compressed HTML (squashfs with zstd): 105MB
+- Extracted listening data: 23GB
+- Graph: 56GB
+- Rendered HTML: 2.3GB
+- Compressed HTML (squashfs with zstd): 172MB
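
Reviewer aside on the "Create a graph" and "Notes" hunks above: the weighting they describe could be sketched roughly as follows. This is only an illustration, not the author's code. It uses an in-memory `BTreeMap` from the standard library instead of sled, and the names `identity`, `build_graph`, `MAX_ARTISTS_PER_IDENTITY` and `BUCKET_SECONDS` are hypothetical. It also makes three assumptions the article does not state: the logarithm is the natural log, the 32 artists kept per identity are the most-listened ones, and timestamps are Unix seconds.

```rust
use std::collections::BTreeMap;

/// Cap from the Notes section; the constant name is hypothetical.
const MAX_ARTISTS_PER_IDENTITY: usize = 32;
/// 10 Ms (about 4 months); assumes timestamps are Unix seconds.
const BUCKET_SECONDS: u64 = 10_000_000;

/// A listening identity: user name plus time bucket.
type Identity = (String, u64);

/// Make the user id depend on time so that changing tastes are kept apart.
fn identity(user: &str, timestamp: u64) -> Identity {
    (user.to_string(), timestamp / BUCKET_SECONDS)
}

/// Build `(artist, artist) -> weight` from `identity -> artist -> listens`.
fn build_graph(
    listens: &BTreeMap<Identity, BTreeMap<String, u32>>,
) -> BTreeMap<(String, String), f64> {
    let mut graph = BTreeMap::new();
    for artists in listens.values() {
        // Assumption: keep the most-listened artists (ties broken by name).
        let mut top: Vec<(&String, &u32)> = artists.iter().collect();
        top.sort_by(|a, b| b.1.cmp(a.1).then(a.0.cmp(b.0)));
        top.truncate(MAX_ARTISTS_PER_IDENTITY);

        for (i, &(a, na)) in top.iter().enumerate() {
            for &(b, nb) in &top[i + 1..] {
                // Sum of the logarithms of both playthrough counts
                // (natural log is an assumption; one listen contributes 0).
                let w = (*na as f64).ln() + (*nb as f64).ln();
                // Order the key so that (x, y) and (y, x) are the same edge.
                let key = if a <= b {
                    (a.clone(), b.clone())
                } else {
                    (b.clone(), a.clone())
                };
                *graph.entry(key).or_insert(0.0) += w;
            }
        }
    }
    graph
}
```

With one identity that has 10 listens for artist `A` and 100 for artist `B`, the single edge `(A, B)` gets weight `ln(10) + ln(100)`. An identity with 40 artists is cut down to 32 and contributes 32 * 31 / 2 = 496 edges.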