7 files changed, 641 insertions, 0 deletions
diff --git a/articles/2022-08-29-blog-start.md b/articles/2022-08-29-blog-start.md
new file mode 100644
index 0000000..2f17e6c
--- /dev/null
+++ b/articles/2022-08-29-blog-start.md
@@ -0,0 +1,9 @@
+# Starting my blog
+
+This will be the place where I will post short articles about my thoughts and projects in the future, whenever I feel like it.
+
+A list of topics of which I already have an idea in mind are:
+
+- The tools I used to create this blog
+- My dream operating system (maybe even logs about me trying (aka failing) to implement it)
+- In search of a perfect text editor
diff --git a/articles/2022-08-29-blog-test.md b/articles/2022-08-29-blog-test.md
new file mode 100644
index 0000000..1c4a28a
--- /dev/null
+++ b/articles/2022-08-29-blog-test.md
@@ -0,0 +1,41 @@
+# Blog test
+
+Hello world!
+
+## heading
+
+_italic_ **bold** `code` [Link](https://metamuffin.org/)
+
+### sub heading
+
+Look at this code here.
+
+```rs
+#[derive(Default)]
+struct Impossible<T> {
+    something: HashMap<usize, T>,
+    //    ^- look here, a keyword! --the grammar regex
+    itself: Impossible,
+}
+
+pub async fn useless(s: Box<dyn Write>) -> impl Future<usize> {
+    todo!() // this panics
+}
+fn main() { 
+    1 + "blub"
+    for (x, y) in [(1,2),(3,4)] {
+        println!("{x} and {y}");
+        Impossible::default();
+    }
+}
+```
+
+As you can see, its pointless i.e. no points
+
+- list
+- list
+- list
+
+1. first this
+2. then that
+3. done
diff --git a/articles/2022-08-30-blog-tools.md b/articles/2022-08-30-blog-tools.md
new file mode 100644
index 0000000..81d1ff9
--- /dev/null
+++ b/articles/2022-08-30-blog-tools.md
@@ -0,0 +1,104 @@
+# Tools I use to create this blog
+
+As you might expect, this blog uses some overengineered tools, after all, it
+wouldn't be so much fun otherwise. This article is probably not too interesting
+on its own. It's just "documentation".
+
+// TODO upload source code and link it here.
+
+## "Build system"
+
+```tree
+.
+├── ... (readme 'n stuff)
+├── code
+│  ├── Cargo.toml
+│  └── src
+│     └── ... (rust source)
+└── content
+   ├── articles
+   │  ├── 2022-08-29-blog-start.md
+   │  ├── 2022-08-29-blog-test.md
+   │  └── 2022-08-30-blog-tools.md
+   ├── makefile
+   ├── out
+   │  ├── ... (generated files)
+   └── style.css
+```
+
+The entry point to this "build system" is `content/makefile`, it has rules for
+generating HTM from the markdown sources, the index and the atom feed. The
+"compiler" here is a small rust program. (`code`).
+
+```makefile
+# oversimplifed, doesnt work
+TOOL := blog-tool
+
+out/index: $(ALL_ARTICLES) $(TOOL)
+	$(TOOL) render-index > $@
+out/feed.atom: $(ALL_ARTICLES) $(TOOL)
+	$(TOOL) generate-atom > $@
+out/%: articles/%.md $(TOOL)
+	$(TOOL) render-article $< > $@
+```
+
+A small trick here is, to make everything depend on the compiler (`$(TOOL)`)
+too, so that when it changes, re-generation of all articles if triggered.
+
+## Rust tools
+
+I use the [laby](https://crates.io/crates/laby) crate for templating the HTM.
+
+// TODO what is important here?!
+
+### File server
+
+I wanted to serve all files without the file name extension (`.html`), but my
+previous http server ([http-server](https://github.com/http-party/http-server))
+exclusively inferred the MIME type from the extension, which makes it impossible
+to do that. The obvious solution was to reinvent the wheel.
+
+This had the great side-effect of making my website blazing fast 🚀🚀! :)
+
+#### http-server (node.js)
+
+```
+Running 10s test @ http://127.0.0.1:8080/test-file
+  2 threads and 100 connections
+Requests/sec:   2314.00
+Transfer/sec:    725.38KB
+```
+
+#### fileserver (rust, no caching)
+
+```
+Running 10s test @ http://127.0.0.1:8080/test-file
+  2 threads and 100 connections
+Requests/sec:  24464.69
+Transfer/sec:      3.10MB
+```
+
+// TODO also upload source code and link it
+
+### Syntax highlighing
+
+For that, I chose the crate [synoptic](https://crates.io/crates/synoptic). It
+provides a some functions for defining tokens with a regex, then returning
+start- and end-points of those for further processing. For example, this is how
+i defined comments and types:
+
+```rs
+let rust_grammar = &[
+    (&["(?m)(//.*)$"], "comment"),
+    (&["[A-Z][a-z]*", "bool", "usize", /* simplified */ ], "type"),
+    /* more rules */
+]
+```
+
+The library finds all the tokens and lets me serialize them to HTM.
+
+## End
+
+I still need to learn, how to write this blog well and what is the most
+interesting to read. Please give me some advice or commentary via matrix, mail
+or fedi. (see [contact](https://metamuffin.org/contact))
diff --git a/articles/2022-09-19-gctf-readysetaction.md b/articles/2022-09-19-gctf-readysetaction.md
new file mode 100644
index 0000000..0d5b719
--- /dev/null
+++ b/articles/2022-09-19-gctf-readysetaction.md
@@ -0,0 +1,84 @@
+# Google Beginner CTF: ReadySetAction
+
+**SPOILER WARNING: Go play the CTF at
+[capturetheflag.withgoogle.com](https://capturetheflag.withgoogle.com/) first**
+
+The challenge was a short python script with the following code:
+
+```py
+from Crypto.Util.number import *
+
+flag = b"REDACTED"
+
+p = getPrime(1024)
+q = getPrime(1024)
+n = p*q
+
+m = bytes_to_long(flag)
+
+
+c = pow(m,3,n)
+
+print(c)
+print(n)
+#154780 ... 6709
+#210348 ... 4477
+#(i removed both 617 digit long numbers here)
+```
+
+It was immediatly obvious to me that this has to do with RSA cryptography
+because it multiplies large primes and then does some powmod magic. Because I
+didn't remember exactly how RSA worked, I quickly read the Wikipedia article
+again - specifically section
+[4.1 Attacks against plain RSA](https://en.wikipedia.org/wiki/RSA_%29cryptosystem%29#Attacks_against_plain_RSA)
+
+The first item already seems like the solution:
+
+> When encrypting with low encryption exponents (e.g., e = 3) and small values
+> of the m (i.e., m < n1/e), the result of me is strictly less than the modulus
+> n. In this case, ciphertexts can be decrypted easily by taking the eth root of
+> the ciphertext over the integers.
+
+Which would mean that $\sqrt[3]{c}$ is a solution _if_ $m$ is small enough.
+However
+
+$c = m^3 \mod n \\$
+
+so the exponentiation could yield a much higher result that we dont get because
+it is wrapped at $n$. So we can try out how often it wrapped:
+
+$m = \sqrt[3]{c + x*n} \quad x \in \N\\$
+
+where $x$ can be brute-forced if $m$ is sufficiently small. Back in python it
+looks like this:
+
+```py
+c = ...
+n = ...
+
+# simple but bad binary search for n-th root
+def nth_root(c,e):
+    high = 1
+    while high**e < c:
+        high *= 2
+    low = 0
+    mid = 0
+    last_mid = 0
+    while mid**e != c:
+        mid = (low + high) // 2
+        if last_mid == mid:
+            return 0 # i use 0 to indicate a failure
+        if mid**e > c:
+            high = mid
+        else:
+            low = mid
+        last_mid = mid
+    return mid
+
+for x in range(100000):
+    m = nth_root(c + n*x, e)
+    # the probability of finding a perfect cube number is low, so any result is what we want
+    if m != 0: print(long_to_bytes(m))
+```
+
+Thats it! With $x=1831$ we get our flag: `CTF{34sy_RS4_1s_e4sy_us3}`
diff --git a/articles/2022-09-25-ductf-file-magic.md b/articles/2022-09-25-ductf-file-magic.md
new file mode 100644
index 0000000..f4b55c9
--- /dev/null
+++ b/articles/2022-09-25-ductf-file-magic.md
@@ -0,0 +1,255 @@
+# DownUnderCTF 2022: File Magic
+
+A short writeup about my favorite challenge from DUCTF. It took me approximatly
+12h to solve. I found it was the most creative and challenging one that i
+solved.
+
+## Task
+
+The challenge consists of a python script and an ip-port pair which appears to
+be running that script. Also the path of the flag is given: `./flag.txt`
+
+```py
+#!/usr/bin/env python3
+
+from Crypto.Cipher import AES
+from PIL import Image
+from tempfile import NamedTemporaryFile
+from io import BytesIO
+import subprocess, os
+
+KEY = b'downunderctf2022'
+
+iv = bytes.fromhex(input('iv (hex): '))
+assert len(iv) == 16 and b'DUCTF' in iv, 'Invalid IV'
+
+data = bytes.fromhex(input('file (hex): '))
+assert len(data) % 16 == 0, 'Misaligned file length'
+assert len(data) < 1337, 'Oversized file length'
+
+data_buf = BytesIO(data)
+img = Image.open(data_buf, formats=['jpeg'])
+assert img.width == 13 and img.height == 37, 'Invalid image size'
+assert img.getpixel((7, 7)) == (7, 7, 7), 'Invalid image contents'
+
+aes = AES.new(KEY, iv=iv, mode=AES.MODE_CBC)
+dec = aes.decrypt(data)
+assert dec.startswith(b'\x7fELF'), 'Not an ELF'
+
+f = NamedTemporaryFile(delete=False)
+f.write(dec)
+f.close()
+
+os.chmod(f.name, 0o777)
+pipes = subprocess.Popen([f.name], stdout=subprocess.PIPE)
+stdout, _ = pipes.communicate()
+print(stdout.decode())
+
+os.remove(f.name)
+```
+
+So, for anything to make it past these checks and be executed it must:
+
+1. be a valid 13x37 JPEG image with the pixel at 7,7 set to `#070707`
+2. be a valid ELF binary that reads `./flag.txt` after decrypting with AES CBC,
+   fixed key (`downunderctf2022`) and the provided IV
+3. The IV must contain `DUCTF`
+
+During the competition I discovered the information in the next three headings
+in parallel but internally in-order.
+
+## 1. AES CBC "flaw"
+
+We need to generate a file that is a sort-of polyglot with JPEG and ELF,
+converted with AES CBC (Cipher block chaining).
+
+AES itself operates on 16-byte (for 128-bit AES) blocks, so bigger files are
+split and then encrypted separately. To ensure that identical blocks don't
+result in identical blocks in ciphertext, each block is first xor'd with
+something that won't be identical. In the case of CBC, the last ciphertext block
+or the initialisation vector (IV) is used. Here is a diagram for encryption
+
+```
+  ___plaintext____|___plaintext____|___plaintext____|...
+       v                 v                v              
+IV--->XOR   ,---------->XOR   ,--------->XOR   ,---- ...
+       v    |            v    |           v    |         
+      AES   |           AES   |          AES   |         
+       v---'             v---'            v---'          
+  ___ciphertext___|___ciphertext___|___ciphertext___|...
+```
+
+For decryption we can just flip the diagram and replace AES with reverse AES.
+
+```
+  ___ciphertext___|___ciphertext___|___ciphertext___|...
+       v---,             v---,            v---,         
+      ∀EZ   |           ∀EZ   |          ∀EZ   |         
+       v    |            v    |           v    |         
+IV--->XOR   '---------->XOR   '--------->XOR   '---- ...
+       v                 v                v              
+  ___plaintext____|___plaintext____|___plaintext____|...
+```
+
+This does make sense, however i noticed that all but the first block do not
+depend on IV. This turns out useful since we can turn any block into any other
+block by applying a chosen value with XOR. So we can control the ciphertext with
+the IV directly as follows:
+
+- $m$: first plaintext block
+- $c$: first ciphertext block
+
+$$ c = AES(m \oplus IV) \\
+
+AES^{-1}(c) = m \oplus IV \\
+
+AES^{-1}(c) \oplus m = IV $$
+
+All blocks in ciphertext after the first are now "uncontrollable" because IV and
+plaintext are set.
+
+## 2. JPEG
+
+JPEG consists of a list of _segments_. Each starts with a marker byte (`ff`)
+followed by a identifier and the length of the segment (if non-zero).
+
+| Identifier | Name                                            |
+| ---------- | ----------------------------------------------- |
+| `d8`       | Start of Image                                  |
+| `fe`       | Comment                                         |
+| `d9`       | End of Image                                    |
+| ...        | _a bunch more that you dont need to know about_ |
+
+The comment segment is perfect for embedding our ELF binary into JPEG. We can
+first generate a JPEG image, then insert a _comment_ somewhere containing any
+data we want.
+
+## 3. ELF Payload
+
+The binary needs to be super small so creating it "manually" was required. I
+followed the guide
+[Creating a minimal ELF-64 file by tchajed](https://github.com/tchajed/minimal-elf/)
+and recreated it for my needs. Like in the guide i also wrote the assembly with
+a rust EDSL.
+
+```rs
+let mut str_end = a.create_label();
+let mut filename = a.create_label();
+a.jmp(str_end)?; // jump over the string
+a.set_label(&mut filename)?;
+a.db(b"flag.txt\0")?;
+a.set_label(&mut str_end)?;
+
+// open("flag.txt", O_RDONLY)
+a.mov(eax, 2)?;
+a.lea(rdi, ptr(filename))?;
+a.mov(rsi, 0u64)?;
+a.syscall()?; // fd -> rax
+
+// sendfile(1, rax, 0, 0xff)
+a.mov(rsi, rax)?;
+a.mov(eax, 40)?;
+a.mov(rdi, 1u64)?;
+a.mov(rdx, 0u64)?;
+a.mov(r10, 0xffu64)?;
+a.syscall()?;
+```
+
+I was able to produce a 207 byte long executable.
+
+## 4. _magic!_
+
+Here is how we align the files now (single quotes `'` indicate ASCII
+representation for clarity, question marks `?` represent data that is implicitly
+defined):
+
+```
+plaintext:  7f 'E 'L 'F 02 01 01 00 00 00 00 00 00 00 00 00 
+iv:         ?? ?? ?? ?? ?? ?? 'D 'U 'C 'T 'F 00 00 00 00 00
+ciphertext: ff d8 ff fe {len} ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
+```
+
+This scenario is a little more complicated because in some places we define
+ciphertext and plaintext and in some we define ciphertext and IV. This is not a
+problem because XOR operates on every byte individually.
+
+All-in-all, the file looks like this now:
+
+- First block:
+  - overlapping headers (6 bytes)
+  - `DUCTF` string (5 bytes)
+  - padding (5 bytes)
+- Rest of the ELF binary
+- _end of the jpeg comment_
+- Rest of the JPEG image
+
+All this information should be enough to solve this challenge!
+
+I here attach the implementation that i created _during_ the CTF here. I kept it
+as messy as it was, just removed not-so-interesting code and added comments.
+
+```rs
+let mut i = image::RgbImage::new(13, 37);
+// jpeg is lossy so filling guarantees the pixel actually has the *exact* color
+i.pixels_mut().for_each(|p| *p = Rgb([7, 7, 7]));
+
+let mut imgbuf = Cursor::new(vec![]);
+image::codecs::jpeg::JpegEncoder::new(&mut imgbuf)
+    .encode_image(&i)
+    .unwrap();
+
+let key = Aes128::new(&GenericArray::from(*b"downunderctf2022"));
+let mut payload = create_elf_payload();
+let mut buf = BytesMut::new();
+
+// pad payload so AES works
+while payload.len() % 16 != 0 {
+    payload.put_u8(0);
+}
+assert!(payload.len() % 16 == 0);
+
+let prefix_len = 6;
+// write the JPEG headers to start a comment
+buf.put_u16(0xffd8);
+buf.put_u16(0xfffe);
+buf.put_u16(payload.len() as u16 + 2 /*seg len*/ - prefix_len as u16);
+
+// find a good IV
+let iv = {
+    let mut fblock_target = vec![0u8; 16];
+    fblock_target[0..prefix_len].copy_from_slice(&buf[0..prefix_len]);
+    key.decrypt_block(GenericArray::from_mut_slice(&mut fblock_target));
+    let mut iv = xor_slice(&fblock_target, &payload[0..16]);
+    iv[prefix_len..prefix_len + 5].copy_from_slice(b"DUCTF");
+    iv
+};
+
+// fill the first block up with zeroes
+for _ in prefix_len..16 {
+    buf.put_u8(0);
+}
+
+// encrypt starting with the 2nd block
+buf.put(&payload[16..]);
+let block_range = |n: usize| (n) * 16..(n + 1) * 16;
+for i in 1..buf.len() / 16 {
+    let tb = Vec::from(&buf[block_range(i - 1)]);
+    xor_assign(&mut buf[block_range(i)], &tb);
+    key.encrypt_block(GenericArray::from_mut_slice(&mut buf[block_range(1)]));
+}
+
+// append the rest of the image, stripping the first segment
+buf.put(&imgbuf.into_inner()[2..]);
+
+// pad the final buffer again because the image might not be aligned
+while buf.len() % 16 != 0 {
+    buf.put_u8(0);
+}
+assert!(buf.len() < 1337);
+
+File::create("image").unwrap().write_all(&buf).unwrap();
+File::create("iv").unwrap().write_all(&iv).unwrap();
+```
+
+I am also still looking for team mates for upcoming CTF events and would be
+happy to hack together with you! Just [contact](https://metamuffin.org/contact) me.
diff --git a/articles/2022-11-07-programming-language-design.md b/articles/2022-11-07-programming-language-design.md
new file mode 100644
index 0000000..674e696
--- /dev/null
+++ b/articles/2022-11-07-programming-language-design.md
@@ -0,0 +1,70 @@
+# Some Thoughts on Programming Language Design
+
+This is a collection of ideas to look at when inventing new langauges.
+
+## Other Ideas
+
+- The Language pushes abstraction to the limit by not noting any
+  hardware-related issues like memory-allocations, parallelism, heterogenous
+  computer architecture (CPU, GPU, …)
+  - requires a very "intellegent" compiler and a way to describe unknowns like
+    possible inputs, etc. in the language itself
+- Start with assembly but add a very flexible macro system
+
+## Type System
+
+```diff
+# Haskell
+data LinkedList a = Nil | Cons a (Box (LinkedList a))
+data Test = Empty | Blub Int | State { x :: Int, y :: Int }
+# Rust
+enum LinkedList<T> { Nil, Cons(T, LinkedList<T>) }
+```
+
+## Memory Management
+
+- **Drop when out-of-scope**
+- Garbage collections
+- Reference counting
+
+## Compile-time logic
+
+- Annotation when calling function to be run as-far-as-possible at comptime
+
+```diff
+fn format(template: String, args: [String]) -> String {
+    template.replace("@", (match, i) => args[i])
+}
+
+fun add(x, y) x + y
+
+fun main() print(format!("@ ist @; @", ["1+1", 1+1, x]))
+# should expand to
+fun main() print("1+1 ist 2; " ~ x))
+```
+
+## Examples
+
+### Fizz-Buzz
+
+```diff
+for (n in 0..100) {
+    if (n % (3*5) == 0) print("FizzBuzz")
+    else if (n % 3 == 0) print("Fizz")
+    else if (n % 5 == 0) print("Buzz")
+    else print(n)
+}
+
+
+if (true) x = 1
+if (true) { x = 1 }
+```
+
+```
+f(x) = 10 + g(x)
+f x = 10 + g x
+
+main = {
+
+}
+```
diff --git a/articles/2022-11-10-artist-correlation.md b/articles/2022-11-10-artist-correlation.md
new file mode 100644
index 0000000..d52e613
--- /dev/null
+++ b/articles/2022-11-10-artist-correlation.md
@@ -0,0 +1,78 @@
+# Correlating music artists
+
+I listen to a lot of music and so every few months my music collection gets
+boring again. So far I have asked friends to recommend me music but I am running
+out of friend too now. Therefore I came up with a new solution during a few
+days.
+
+I want to find new music that i might like too. After some research I found that
+there is [Musicbrainz](https://musicbrainz.org/) (a database of all artists and
+recordings ever made) and [Listenbrainz](https://listenbrainz.org/) (a service
+to which you can submit what you are listening to). Both databases are useful
+for this project. The high-level goal is to know, what people that have a lot of
+music in common with me, like to listen to. For that the shared number of
+listeners for each artist is relevant. I use the word 'a listen', to refer to
+one playthrough of a track.
+
+## The Procedure
+
+### Parse data & drop unnecessary detail
+
+All of the JSON files of listenbrainz are parsed and only information about how
+many listens each user has submitted for what artist are kept. The result is
+stored in a B-tree map on my disk (the
+[sled library](https://crates.io/crates/sledg) is great for that).
+
+- First mapping created: `(user, artist) -> shared listens`.
+- (Also created a name lookup: `artist -> artist name`)
+
+The B-Tree stores values ordered, such that i can iterate through all artists of
+a user, by scanning the prefix `(user, …`.
+
+### Create a graph
+
+Next an undirected graph with weighted edges is generated where nodes are
+artists and edges are shared listens. For each user, each edge connecting
+artists they listen to, the weight is incremented by the sum of the logarhythms
+of either one's playthrough count for that user. This means that artists that
+share listeners are connected and because of the logarhythms, users that listen
+to an artist _a lot_ won't be weighted proportionally.
+
+Mapping: `(artist, artist) -> weight`. (Every key `(x, y)` is identical with
+`(y, x)` so that edges are undirectional.)
+
+### Query artists
+
+The graph tree can now be queried by scanning with a prefix of one artist
+(`("The Beatles", …`) and all correlated artists are returned with a weight. The
+top-weighted results are kept and saved.
+
+### Notes
+
+Two issues appeared during this project that lead to the following fixes:
+
+- Limit one identity to 32 artists at most because the edge count grows
+  quadratically (100 artists -> 10000 edges)
+- When parsing data the user id is made dependent of the time to seperate arists
+  when music tastes changing over time. Every 10Ms (~4 months) the user ids
+  change.
+
+## Results
+
+In a couple of minutes I rendered about 2.2 million HTML documents with my
+results. They are available at `https://metamuffin.org/artist-correl/{name}`.
+Some example links:
+
+- [The Beatles](https://metamuffin.org/artist-correl/The%20Beatles)
+- [Aimer](https://metamuffin.org/artist-correl/Aimer)
+- [Rammstein](https://metamuffin.org/artist-correl/Rammstein)
+- [Mitski](https://metamuffin.org/artist-correl/Mitski)
+
+## Numbers
+
+- Musicbrainz: 15GB
+- Listenbrainz: 350GB
+- Extracted listening data: 23GB
+- Graph: 56GB
+- Rendered HTML: 2.3GB
+- Compressed HTML (squashfs with zstd): 172MB