From c19adca147d38562b3f4a06cb2205e043bc24856 Mon Sep 17 00:00:00 2001 From: metamuffin Date: Mon, 13 Feb 2023 20:25:04 +0100 Subject: restructure for embedding into my website --- articles/2022-08-29-blog-start.md | 9 + articles/2022-08-29-blog-test.md | 41 ++++ articles/2022-08-30-blog-tools.md | 104 +++++++++ articles/2022-09-19-gctf-readysetaction.md | 84 +++++++ articles/2022-09-25-ductf-file-magic.md | 255 +++++++++++++++++++++ articles/2022-11-07-programming-language-design.md | 70 ++++++ articles/2022-11-10-artist-correlation.md | 78 +++++++ code/style.css | 44 ++++ content/.gitignore | 1 - content/articles/2022-08-29-blog-start.md | 9 - content/articles/2022-08-29-blog-test.md | 41 ---- content/articles/2022-08-30-blog-tools.md | 104 --------- content/articles/2022-09-19-gctf-readysetaction.md | 84 ------- content/articles/2022-09-25-ductf-file-magic.md | 255 --------------------- .../2022-11-07-programming-language-design.md | 70 ------ content/articles/2022-11-10-artist-correlation.md | 78 ------- content/makefile | 1 - content/style.css | 44 ---- 18 files changed, 685 insertions(+), 687 deletions(-) create mode 100644 articles/2022-08-29-blog-start.md create mode 100644 articles/2022-08-29-blog-test.md create mode 100644 articles/2022-08-30-blog-tools.md create mode 100644 articles/2022-09-19-gctf-readysetaction.md create mode 100644 articles/2022-09-25-ductf-file-magic.md create mode 100644 articles/2022-11-07-programming-language-design.md create mode 100644 articles/2022-11-10-artist-correlation.md create mode 100644 code/style.css delete mode 100644 content/.gitignore delete mode 100644 content/articles/2022-08-29-blog-start.md delete mode 100644 content/articles/2022-08-29-blog-test.md delete mode 100644 content/articles/2022-08-30-blog-tools.md delete mode 100644 content/articles/2022-09-19-gctf-readysetaction.md delete mode 100644 content/articles/2022-09-25-ductf-file-magic.md delete mode 100644 content/articles/2022-11-07-programming-language-design.md delete mode 100644 content/articles/2022-11-10-artist-correlation.md delete mode 100644 content/makefile delete mode 100644 content/style.css diff --git a/articles/2022-08-29-blog-start.md b/articles/2022-08-29-blog-start.md new file mode 100644 index 0000000..2f17e6c --- /dev/null +++ b/articles/2022-08-29-blog-start.md @@ -0,0 +1,9 @@ +# Starting my blog + +This will be the place where I will post short articles about my thoughts and projects in the future, whenever I feel like it. + +A list of topics of which I already have an idea in mind are: + +- The tools I used to create this blog +- My dream operating system (maybe even logs about me trying (aka failing) to implement it) +- In search of a perfect text editor diff --git a/articles/2022-08-29-blog-test.md b/articles/2022-08-29-blog-test.md new file mode 100644 index 0000000..1c4a28a --- /dev/null +++ b/articles/2022-08-29-blog-test.md @@ -0,0 +1,41 @@ +# Blog test + +Hello world! + +## heading + +_italic_ **bold** `code` [Link](https://metamuffin.org/) + +### sub heading + +Look at this code here. + +```rs +#[derive(Default)] +struct Impossible { + something: HashMap, + // ^- look here, a keyword! --the grammar regex + itself: Impossible, +} + +pub async fn useless(s: Box) -> impl Future { + todo!() // this panics +} +fn main() { + 1 + "blub" + for (x, y) in [(1,2),(3,4)] { + println!("{x} and {y}"); + Impossible::default(); + } +} +``` + +As you can see, its pointless i.e. no points + +- list +- list +- list + +1. first this +2. then that +3. done diff --git a/articles/2022-08-30-blog-tools.md b/articles/2022-08-30-blog-tools.md new file mode 100644 index 0000000..81d1ff9 --- /dev/null +++ b/articles/2022-08-30-blog-tools.md @@ -0,0 +1,104 @@ +# Tools I use to create this blog + +As you might expect, this blog uses some overengineered tools, after all, it +wouldn't be so much fun otherwise. This article is probably not too interesting +on its own. It's just "documentation". + +// TODO upload source code and link it here. + +## "Build system" + +```tree +. +├── ... (readme 'n stuff) +├── code +│ ├── Cargo.toml +│ └── src +│ └── ... (rust source) +└── content + ├── articles + │ ├── 2022-08-29-blog-start.md + │ ├── 2022-08-29-blog-test.md + │ └── 2022-08-30-blog-tools.md + ├── makefile + ├── out + │ ├── ... (generated files) + └── style.css +``` + +The entry point to this "build system" is `content/makefile`, it has rules for +generating HTM from the markdown sources, the index and the atom feed. The +"compiler" here is a small rust program. (`code`). + +```makefile +# oversimplifed, doesnt work +TOOL := blog-tool + +out/index: $(ALL_ARTICLES) $(TOOL) + $(TOOL) render-index > $@ +out/feed.atom: $(ALL_ARTICLES) $(TOOL) + $(TOOL) generate-atom > $@ +out/%: articles/%.md $(TOOL) + $(TOOL) render-article $< > $@ +``` + +A small trick here is, to make everything depend on the compiler (`$(TOOL)`) +too, so that when it changes, re-generation of all articles if triggered. + +## Rust tools + +I use the [laby](https://crates.io/crates/laby) crate for templating the HTM. + +// TODO what is important here?! + +### File server + +I wanted to serve all files without the file name extension (`.html`), but my +previous http server ([http-server](https://github.com/http-party/http-server)) +exclusively inferred the MIME type from the extension, which makes it impossible +to do that. The obvious solution was to reinvent the wheel. + +This had the great side-effect of making my website blazing fast 🚀🚀! :) + +#### http-server (node.js) + +``` +Running 10s test @ http://127.0.0.1:8080/test-file + 2 threads and 100 connections +Requests/sec: 2314.00 +Transfer/sec: 725.38KB +``` + +#### fileserver (rust, no caching) + +``` +Running 10s test @ http://127.0.0.1:8080/test-file + 2 threads and 100 connections +Requests/sec: 24464.69 +Transfer/sec: 3.10MB +``` + +// TODO also upload source code and link it + +### Syntax highlighing + +For that, I chose the crate [synoptic](https://crates.io/crates/synoptic). It +provides a some functions for defining tokens with a regex, then returning +start- and end-points of those for further processing. For example, this is how +i defined comments and types: + +```rs +let rust_grammar = &[ + (&["(?m)(//.*)$"], "comment"), + (&["[A-Z][a-z]*", "bool", "usize", /* simplified */ ], "type"), + /* more rules */ +] +``` + +The library finds all the tokens and lets me serialize them to HTM. + +## End + +I still need to learn, how to write this blog well and what is the most +interesting to read. Please give me some advice or commentary via matrix, mail +or fedi. (see [contact](https://metamuffin.org/contact)) diff --git a/articles/2022-09-19-gctf-readysetaction.md b/articles/2022-09-19-gctf-readysetaction.md new file mode 100644 index 0000000..0d5b719 --- /dev/null +++ b/articles/2022-09-19-gctf-readysetaction.md @@ -0,0 +1,84 @@ +# Google Beginner CTF: ReadySetAction + +**SPOILER WARNING: Go play the CTF at +[capturetheflag.withgoogle.com](https://capturetheflag.withgoogle.com/) first** + +The challenge was a short python script with the following code: + +```py +from Crypto.Util.number import * + +flag = b"REDACTED" + +p = getPrime(1024) +q = getPrime(1024) +n = p*q + +m = bytes_to_long(flag) + + +c = pow(m,3,n) + +print(c) +print(n) +#154780 ... 6709 +#210348 ... 4477 +#(i removed both 617 digit long numbers here) +``` + +It was immediatly obvious to me that this has to do with RSA cryptography +because it multiplies large primes and then does some powmod magic. Because I +didn't remember exactly how RSA worked, I quickly read the Wikipedia article +again - specifically section +[4.1 Attacks against plain RSA](https://en.wikipedia.org/wiki/RSA_%29cryptosystem%29#Attacks_against_plain_RSA) + +The first item already seems like the solution: + +> When encrypting with low encryption exponents (e.g., e = 3) and small values +> of the m (i.e., m < n1/e), the result of me is strictly less than the modulus +> n. In this case, ciphertexts can be decrypted easily by taking the eth root of +> the ciphertext over the integers. + +Which would mean that $\sqrt[3]{c}$ is a solution _if_ $m$ is small enough. +However + +$c = m^3 \mod n \\$ + +so the exponentiation could yield a much higher result that we dont get because +it is wrapped at $n$. So we can try out how often it wrapped: + +$m = \sqrt[3]{c + x*n} \quad x \in \N\\$ + +where $x$ can be brute-forced if $m$ is sufficiently small. Back in python it +looks like this: + +```py +c = ... +n = ... + +# simple but bad binary search for n-th root +def nth_root(c,e): + high = 1 + while high**e < c: + high *= 2 + low = 0 + mid = 0 + last_mid = 0 + while mid**e != c: + mid = (low + high) // 2 + if last_mid == mid: + return 0 # i use 0 to indicate a failure + if mid**e > c: + high = mid + else: + low = mid + last_mid = mid + return mid + +for x in range(100000): + m = nth_root(c + n*x, e) + # the probability of finding a perfect cube number is low, so any result is what we want + if m != 0: print(long_to_bytes(m)) +``` + +Thats it! With $x=1831$ we get our flag: `CTF{34sy_RS4_1s_e4sy_us3}` diff --git a/articles/2022-09-25-ductf-file-magic.md b/articles/2022-09-25-ductf-file-magic.md new file mode 100644 index 0000000..f4b55c9 --- /dev/null +++ b/articles/2022-09-25-ductf-file-magic.md @@ -0,0 +1,255 @@ +# DownUnderCTF 2022: File Magic + +A short writeup about my favorite challenge from DUCTF. It took me approximatly +12h to solve. I found it was the most creative and challenging one that i +solved. + +## Task + +The challenge consists of a python script and an ip-port pair which appears to +be running that script. Also the path of the flag is given: `./flag.txt` + +```py +#!/usr/bin/env python3 + +from Crypto.Cipher import AES +from PIL import Image +from tempfile import NamedTemporaryFile +from io import BytesIO +import subprocess, os + +KEY = b'downunderctf2022' + +iv = bytes.fromhex(input('iv (hex): ')) +assert len(iv) == 16 and b'DUCTF' in iv, 'Invalid IV' + +data = bytes.fromhex(input('file (hex): ')) +assert len(data) % 16 == 0, 'Misaligned file length' +assert len(data) < 1337, 'Oversized file length' + +data_buf = BytesIO(data) +img = Image.open(data_buf, formats=['jpeg']) +assert img.width == 13 and img.height == 37, 'Invalid image size' +assert img.getpixel((7, 7)) == (7, 7, 7), 'Invalid image contents' + +aes = AES.new(KEY, iv=iv, mode=AES.MODE_CBC) +dec = aes.decrypt(data) +assert dec.startswith(b'\x7fELF'), 'Not an ELF' + +f = NamedTemporaryFile(delete=False) +f.write(dec) +f.close() + +os.chmod(f.name, 0o777) +pipes = subprocess.Popen([f.name], stdout=subprocess.PIPE) +stdout, _ = pipes.communicate() +print(stdout.decode()) + +os.remove(f.name) +``` + +So, for anything to make it past these checks and be executed it must: + +1. be a valid 13x37 JPEG image with the pixel at 7,7 set to `#070707` +2. be a valid ELF binary that reads `./flag.txt` after decrypting with AES CBC, + fixed key (`downunderctf2022`) and the provided IV +3. The IV must contain `DUCTF` + +During the competition I discovered the information in the next three headings +in parallel but internally in-order. + +## 1. AES CBC "flaw" + +We need to generate a file that is a sort-of polyglot with JPEG and ELF, +converted with AES CBC (Cipher block chaining). + +AES itself operates on 16-byte (for 128-bit AES) blocks, so bigger files are +split and then encrypted separately. To ensure that identical blocks don't +result in identical blocks in ciphertext, each block is first xor'd with +something that won't be identical. In the case of CBC, the last ciphertext block +or the initialisation vector (IV) is used. Here is a diagram for encryption + +``` + ___plaintext____|___plaintext____|___plaintext____|... + v v v +IV--->XOR ,---------->XOR ,--------->XOR ,---- ... + v | v | v | + AES | AES | AES | + v---' v---' v---' + ___ciphertext___|___ciphertext___|___ciphertext___|... +``` + +For decryption we can just flip the diagram and replace AES with reverse AES. + +``` + ___ciphertext___|___ciphertext___|___ciphertext___|... + v---, v---, v---, + ∀EZ | ∀EZ | ∀EZ | + v | v | v | +IV--->XOR '---------->XOR '--------->XOR '---- ... + v v v + ___plaintext____|___plaintext____|___plaintext____|... +``` + +This does make sense, however i noticed that all but the first block do not +depend on IV. This turns out useful since we can turn any block into any other +block by applying a chosen value with XOR. So we can control the ciphertext with +the IV directly as follows: + +- $m$: first plaintext block +- $c$: first ciphertext block + +$$ c = AES(m \oplus IV) \\ + +AES^{-1}(c) = m \oplus IV \\ + +AES^{-1}(c) \oplus m = IV $$ + +All blocks in ciphertext after the first are now "uncontrollable" because IV and +plaintext are set. + +## 2. JPEG + +JPEG consists of a list of _segments_. Each starts with a marker byte (`ff`) +followed by a identifier and the length of the segment (if non-zero). + +| Identifier | Name | +| ---------- | ----------------------------------------------- | +| `d8` | Start of Image | +| `fe` | Comment | +| `d9` | End of Image | +| ... | _a bunch more that you dont need to know about_ | + +The comment segment is perfect for embedding our ELF binary into JPEG. We can +first generate a JPEG image, then insert a _comment_ somewhere containing any +data we want. + +## 3. ELF Payload + +The binary needs to be super small so creating it "manually" was required. I +followed the guide +[Creating a minimal ELF-64 file by tchajed](https://github.com/tchajed/minimal-elf/) +and recreated it for my needs. Like in the guide i also wrote the assembly with +a rust EDSL. + +```rs +let mut str_end = a.create_label(); +let mut filename = a.create_label(); +a.jmp(str_end)?; // jump over the string +a.set_label(&mut filename)?; +a.db(b"flag.txt\0")?; +a.set_label(&mut str_end)?; + +// open("flag.txt", O_RDONLY) +a.mov(eax, 2)?; +a.lea(rdi, ptr(filename))?; +a.mov(rsi, 0u64)?; +a.syscall()?; // fd -> rax + +// sendfile(1, rax, 0, 0xff) +a.mov(rsi, rax)?; +a.mov(eax, 40)?; +a.mov(rdi, 1u64)?; +a.mov(rdx, 0u64)?; +a.mov(r10, 0xffu64)?; +a.syscall()?; +``` + +I was able to produce a 207 byte long executable. + +## 4. _magic!_ + +Here is how we align the files now (single quotes `'` indicate ASCII +representation for clarity, question marks `?` represent data that is implicitly +defined): + +``` +plaintext: 7f 'E 'L 'F 02 01 01 00 00 00 00 00 00 00 00 00 +iv: ?? ?? ?? ?? ?? ?? 'D 'U 'C 'T 'F 00 00 00 00 00 +ciphertext: ff d8 ff fe {len} ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? +``` + +This scenario is a little more complicated because in some places we define +ciphertext and plaintext and in some we define ciphertext and IV. This is not a +problem because XOR operates on every byte individually. + +All-in-all, the file looks like this now: + +- First block: + - overlapping headers (6 bytes) + - `DUCTF` string (5 bytes) + - padding (5 bytes) +- Rest of the ELF binary +- _end of the jpeg comment_ +- Rest of the JPEG image + +All this information should be enough to solve this challenge! + +I here attach the implementation that i created _during_ the CTF here. I kept it +as messy as it was, just removed not-so-interesting code and added comments. + +```rs +let mut i = image::RgbImage::new(13, 37); +// jpeg is lossy so filling guarantees the pixel actually has the *exact* color +i.pixels_mut().for_each(|p| *p = Rgb([7, 7, 7])); + +let mut imgbuf = Cursor::new(vec![]); +image::codecs::jpeg::JpegEncoder::new(&mut imgbuf) + .encode_image(&i) + .unwrap(); + +let key = Aes128::new(&GenericArray::from(*b"downunderctf2022")); +let mut payload = create_elf_payload(); +let mut buf = BytesMut::new(); + +// pad payload so AES works +while payload.len() % 16 != 0 { + payload.put_u8(0); +} +assert!(payload.len() % 16 == 0); + +let prefix_len = 6; +// write the JPEG headers to start a comment +buf.put_u16(0xffd8); +buf.put_u16(0xfffe); +buf.put_u16(payload.len() as u16 + 2 /*seg len*/ - prefix_len as u16); + +// find a good IV +let iv = { + let mut fblock_target = vec![0u8; 16]; + fblock_target[0..prefix_len].copy_from_slice(&buf[0..prefix_len]); + key.decrypt_block(GenericArray::from_mut_slice(&mut fblock_target)); + let mut iv = xor_slice(&fblock_target, &payload[0..16]); + iv[prefix_len..prefix_len + 5].copy_from_slice(b"DUCTF"); + iv +}; + +// fill the first block up with zeroes +for _ in prefix_len..16 { + buf.put_u8(0); +} + +// encrypt starting with the 2nd block +buf.put(&payload[16..]); +let block_range = |n: usize| (n) * 16..(n + 1) * 16; +for i in 1..buf.len() / 16 { + let tb = Vec::from(&buf[block_range(i - 1)]); + xor_assign(&mut buf[block_range(i)], &tb); + key.encrypt_block(GenericArray::from_mut_slice(&mut buf[block_range(1)])); +} + +// append the rest of the image, stripping the first segment +buf.put(&imgbuf.into_inner()[2..]); + +// pad the final buffer again because the image might not be aligned +while buf.len() % 16 != 0 { + buf.put_u8(0); +} +assert!(buf.len() < 1337); + +File::create("image").unwrap().write_all(&buf).unwrap(); +File::create("iv").unwrap().write_all(&iv).unwrap(); +``` + +I am also still looking for team mates for upcoming CTF events and would be +happy to hack together with you! Just [contact](https://metamuffin.org/contact) me. diff --git a/articles/2022-11-07-programming-language-design.md b/articles/2022-11-07-programming-language-design.md new file mode 100644 index 0000000..674e696 --- /dev/null +++ b/articles/2022-11-07-programming-language-design.md @@ -0,0 +1,70 @@ +# Some Thoughts on Programming Language Design + +This is a collection of ideas to look at when inventing new langauges. + +## Other Ideas + +- The Language pushes abstraction to the limit by not noting any + hardware-related issues like memory-allocations, parallelism, heterogenous + computer architecture (CPU, GPU, …) + - requires a very "intellegent" compiler and a way to describe unknowns like + possible inputs, etc. in the language itself +- Start with assembly but add a very flexible macro system + +## Type System + +```diff +# Haskell +data LinkedList a = Nil | Cons a (Box (LinkedList a)) +data Test = Empty | Blub Int | State { x :: Int, y :: Int } +# Rust +enum LinkedList { Nil, Cons(T, LinkedList) } +``` + +## Memory Management + +- **Drop when out-of-scope** +- Garbage collections +- Reference counting + +## Compile-time logic + +- Annotation when calling function to be run as-far-as-possible at comptime + +```diff +fn format(template: String, args: [String]) -> String { + template.replace("@", (match, i) => args[i]) +} + +fun add(x, y) x + y + +fun main() print(format!("@ ist @; @", ["1+1", 1+1, x])) +# should expand to +fun main() print("1+1 ist 2; " ~ x)) +``` + +## Examples + +### Fizz-Buzz + +```diff +for (n in 0..100) { + if (n % (3*5) == 0) print("FizzBuzz") + else if (n % 3 == 0) print("Fizz") + else if (n % 5 == 0) print("Buzz") + else print(n) +} + + +if (true) x = 1 +if (true) { x = 1 } +``` + +``` +f(x) = 10 + g(x) +f x = 10 + g x + +main = { + +} +``` diff --git a/articles/2022-11-10-artist-correlation.md b/articles/2022-11-10-artist-correlation.md new file mode 100644 index 0000000..d52e613 --- /dev/null +++ b/articles/2022-11-10-artist-correlation.md @@ -0,0 +1,78 @@ +# Correlating music artists + +I listen to a lot of music and so every few months my music collection gets +boring again. So far I have asked friends to recommend me music but I am running +out of friend too now. Therefore I came up with a new solution during a few +days. + +I want to find new music that i might like too. After some research I found that +there is [Musicbrainz](https://musicbrainz.org/) (a database of all artists and +recordings ever made) and [Listenbrainz](https://listenbrainz.org/) (a service +to which you can submit what you are listening to). Both databases are useful +for this project. The high-level goal is to know, what people that have a lot of +music in common with me, like to listen to. For that the shared number of +listeners for each artist is relevant. I use the word 'a listen', to refer to +one playthrough of a track. + +## The Procedure + +### Parse data & drop unnecessary detail + +All of the JSON files of listenbrainz are parsed and only information about how +many listens each user has submitted for what artist are kept. The result is +stored in a B-tree map on my disk (the +[sled library](https://crates.io/crates/sledg) is great for that). + +- First mapping created: `(user, artist) -> shared listens`. +- (Also created a name lookup: `artist -> artist name`) + +The B-Tree stores values ordered, such that i can iterate through all artists of +a user, by scanning the prefix `(user, …`. + +### Create a graph + +Next an undirected graph with weighted edges is generated where nodes are +artists and edges are shared listens. For each user, each edge connecting +artists they listen to, the weight is incremented by the sum of the logarhythms +of either one's playthrough count for that user. This means that artists that +share listeners are connected and because of the logarhythms, users that listen +to an artist _a lot_ won't be weighted proportionally. + +Mapping: `(artist, artist) -> weight`. (Every key `(x, y)` is identical with +`(y, x)` so that edges are undirectional.) + +### Query artists + +The graph tree can now be queried by scanning with a prefix of one artist +(`("The Beatles", …`) and all correlated artists are returned with a weight. The +top-weighted results are kept and saved. + +### Notes + +Two issues appeared during this project that lead to the following fixes: + +- Limit one identity to 32 artists at most because the edge count grows + quadratically (100 artists -> 10000 edges) +- When parsing data the user id is made dependent of the time to seperate arists + when music tastes changing over time. Every 10Ms (~4 months) the user ids + change. + +## Results + +In a couple of minutes I rendered about 2.2 million HTML documents with my +results. They are available at `https://metamuffin.org/artist-correl/{name}`. +Some example links: + +- [The Beatles](https://metamuffin.org/artist-correl/The%20Beatles) +- [Aimer](https://metamuffin.org/artist-correl/Aimer) +- [Rammstein](https://metamuffin.org/artist-correl/Rammstein) +- [Mitski](https://metamuffin.org/artist-correl/Mitski) + +## Numbers + +- Musicbrainz: 15GB +- Listenbrainz: 350GB +- Extracted listening data: 23GB +- Graph: 56GB +- Rendered HTML: 2.3GB +- Compressed HTML (squashfs with zstd): 172MB diff --git a/code/style.css b/code/style.css new file mode 100644 index 0000000..3e19951 --- /dev/null +++ b/code/style.css @@ -0,0 +1,44 @@ +@import url("https://s.metamuffin.org/static/font-ubuntu/include.css"); + +:root { + --bg2: #141414; + --bg1: #060606; +} + +body { + margin: 3em; + background-color: var(--bg1); +} + +nav,article { + margin: 1em; + background-color: var(--bg2); + padding: 1em; + border: 0px solid transparent; + border-radius: 1em; +} + +h1 {font-size: xx-large } +h2 {font-size: x-large } +h3 {font-size: large } +h4 {font-size: medium } + +p,h1,h2,h3,h4,h5,h6,li { font-family: "Ubuntu", sans-serif; } +h1,h2 { color: #b575ff } +h3 { color: #ff83fd; margin-left: 1em } +h4 { color: #ff82b2; margin-left: 2em } +p,li { color: white; margin-left: 3em } +a { color: #82a8ff; font-style: italic; text-decoration: underline } +hr { border: 1px solid grey } + +math {color:white} + +pre,code { + color: #eeeeee; + font-family: monospace; + background-color: black; + border: 0px solid transparent; + border-radius: 5px; + padding: 0.2em; +} +pre { margin-left: 4em; margin-right: 4em; overflow-y: auto; } diff --git a/content/.gitignore b/content/.gitignore deleted file mode 100644 index e2e7327..0000000 --- a/content/.gitignore +++ /dev/null @@ -1 +0,0 @@ -/out diff --git a/content/articles/2022-08-29-blog-start.md b/content/articles/2022-08-29-blog-start.md deleted file mode 100644 index 2f17e6c..0000000 --- a/content/articles/2022-08-29-blog-start.md +++ /dev/null @@ -1,9 +0,0 @@ -# Starting my blog - -This will be the place where I will post short articles about my thoughts and projects in the future, whenever I feel like it. - -A list of topics of which I already have an idea in mind are: - -- The tools I used to create this blog -- My dream operating system (maybe even logs about me trying (aka failing) to implement it) -- In search of a perfect text editor diff --git a/content/articles/2022-08-29-blog-test.md b/content/articles/2022-08-29-blog-test.md deleted file mode 100644 index 1c4a28a..0000000 --- a/content/articles/2022-08-29-blog-test.md +++ /dev/null @@ -1,41 +0,0 @@ -# Blog test - -Hello world! - -## heading - -_italic_ **bold** `code` [Link](https://metamuffin.org/) - -### sub heading - -Look at this code here. - -```rs -#[derive(Default)] -struct Impossible { - something: HashMap, - // ^- look here, a keyword! --the grammar regex - itself: Impossible, -} - -pub async fn useless(s: Box) -> impl Future { - todo!() // this panics -} -fn main() { - 1 + "blub" - for (x, y) in [(1,2),(3,4)] { - println!("{x} and {y}"); - Impossible::default(); - } -} -``` - -As you can see, its pointless i.e. no points - -- list -- list -- list - -1. first this -2. then that -3. done diff --git a/content/articles/2022-08-30-blog-tools.md b/content/articles/2022-08-30-blog-tools.md deleted file mode 100644 index 81d1ff9..0000000 --- a/content/articles/2022-08-30-blog-tools.md +++ /dev/null @@ -1,104 +0,0 @@ -# Tools I use to create this blog - -As you might expect, this blog uses some overengineered tools, after all, it -wouldn't be so much fun otherwise. This article is probably not too interesting -on its own. It's just "documentation". - -// TODO upload source code and link it here. - -## "Build system" - -```tree -. -├── ... (readme 'n stuff) -├── code -│ ├── Cargo.toml -│ └── src -│ └── ... (rust source) -└── content - ├── articles - │ ├── 2022-08-29-blog-start.md - │ ├── 2022-08-29-blog-test.md - │ └── 2022-08-30-blog-tools.md - ├── makefile - ├── out - │ ├── ... (generated files) - └── style.css -``` - -The entry point to this "build system" is `content/makefile`, it has rules for -generating HTM from the markdown sources, the index and the atom feed. The -"compiler" here is a small rust program. (`code`). - -```makefile -# oversimplifed, doesnt work -TOOL := blog-tool - -out/index: $(ALL_ARTICLES) $(TOOL) - $(TOOL) render-index > $@ -out/feed.atom: $(ALL_ARTICLES) $(TOOL) - $(TOOL) generate-atom > $@ -out/%: articles/%.md $(TOOL) - $(TOOL) render-article $< > $@ -``` - -A small trick here is, to make everything depend on the compiler (`$(TOOL)`) -too, so that when it changes, re-generation of all articles if triggered. - -## Rust tools - -I use the [laby](https://crates.io/crates/laby) crate for templating the HTM. - -// TODO what is important here?! - -### File server - -I wanted to serve all files without the file name extension (`.html`), but my -previous http server ([http-server](https://github.com/http-party/http-server)) -exclusively inferred the MIME type from the extension, which makes it impossible -to do that. The obvious solution was to reinvent the wheel. - -This had the great side-effect of making my website blazing fast 🚀🚀! :) - -#### http-server (node.js) - -``` -Running 10s test @ http://127.0.0.1:8080/test-file - 2 threads and 100 connections -Requests/sec: 2314.00 -Transfer/sec: 725.38KB -``` - -#### fileserver (rust, no caching) - -``` -Running 10s test @ http://127.0.0.1:8080/test-file - 2 threads and 100 connections -Requests/sec: 24464.69 -Transfer/sec: 3.10MB -``` - -// TODO also upload source code and link it - -### Syntax highlighing - -For that, I chose the crate [synoptic](https://crates.io/crates/synoptic). It -provides a some functions for defining tokens with a regex, then returning -start- and end-points of those for further processing. For example, this is how -i defined comments and types: - -```rs -let rust_grammar = &[ - (&["(?m)(//.*)$"], "comment"), - (&["[A-Z][a-z]*", "bool", "usize", /* simplified */ ], "type"), - /* more rules */ -] -``` - -The library finds all the tokens and lets me serialize them to HTM. - -## End - -I still need to learn, how to write this blog well and what is the most -interesting to read. Please give me some advice or commentary via matrix, mail -or fedi. (see [contact](https://metamuffin.org/contact)) diff --git a/content/articles/2022-09-19-gctf-readysetaction.md b/content/articles/2022-09-19-gctf-readysetaction.md deleted file mode 100644 index 0d5b719..0000000 --- a/content/articles/2022-09-19-gctf-readysetaction.md +++ /dev/null @@ -1,84 +0,0 @@ -# Google Beginner CTF: ReadySetAction - -**SPOILER WARNING: Go play the CTF at -[capturetheflag.withgoogle.com](https://capturetheflag.withgoogle.com/) first** - -The challenge was a short python script with the following code: - -```py -from Crypto.Util.number import * - -flag = b"REDACTED" - -p = getPrime(1024) -q = getPrime(1024) -n = p*q - -m = bytes_to_long(flag) - - -c = pow(m,3,n) - -print(c) -print(n) -#154780 ... 6709 -#210348 ... 4477 -#(i removed both 617 digit long numbers here) -``` - -It was immediatly obvious to me that this has to do with RSA cryptography -because it multiplies large primes and then does some powmod magic. Because I -didn't remember exactly how RSA worked, I quickly read the Wikipedia article -again - specifically section -[4.1 Attacks against plain RSA](https://en.wikipedia.org/wiki/RSA_%29cryptosystem%29#Attacks_against_plain_RSA) - -The first item already seems like the solution: - -> When encrypting with low encryption exponents (e.g., e = 3) and small values -> of the m (i.e., m < n1/e), the result of me is strictly less than the modulus -> n. In this case, ciphertexts can be decrypted easily by taking the eth root of -> the ciphertext over the integers. - -Which would mean that $\sqrt[3]{c}$ is a solution _if_ $m$ is small enough. -However - -$c = m^3 \mod n \\$ - -so the exponentiation could yield a much higher result that we dont get because -it is wrapped at $n$. So we can try out how often it wrapped: - -$m = \sqrt[3]{c + x*n} \quad x \in \N\\$ - -where $x$ can be brute-forced if $m$ is sufficiently small. Back in python it -looks like this: - -```py -c = ... -n = ... - -# simple but bad binary search for n-th root -def nth_root(c,e): - high = 1 - while high**e < c: - high *= 2 - low = 0 - mid = 0 - last_mid = 0 - while mid**e != c: - mid = (low + high) // 2 - if last_mid == mid: - return 0 # i use 0 to indicate a failure - if mid**e > c: - high = mid - else: - low = mid - last_mid = mid - return mid - -for x in range(100000): - m = nth_root(c + n*x, e) - # the probability of finding a perfect cube number is low, so any result is what we want - if m != 0: print(long_to_bytes(m)) -``` - -Thats it! With $x=1831$ we get our flag: `CTF{34sy_RS4_1s_e4sy_us3}` diff --git a/content/articles/2022-09-25-ductf-file-magic.md b/content/articles/2022-09-25-ductf-file-magic.md deleted file mode 100644 index f4b55c9..0000000 --- a/content/articles/2022-09-25-ductf-file-magic.md +++ /dev/null @@ -1,255 +0,0 @@ -# DownUnderCTF 2022: File Magic - -A short writeup about my favorite challenge from DUCTF. It took me approximatly -12h to solve. I found it was the most creative and challenging one that i -solved. - -## Task - -The challenge consists of a python script and an ip-port pair which appears to -be running that script. Also the path of the flag is given: `./flag.txt` - -```py -#!/usr/bin/env python3 - -from Crypto.Cipher import AES -from PIL import Image -from tempfile import NamedTemporaryFile -from io import BytesIO -import subprocess, os - -KEY = b'downunderctf2022' - -iv = bytes.fromhex(input('iv (hex): ')) -assert len(iv) == 16 and b'DUCTF' in iv, 'Invalid IV' - -data = bytes.fromhex(input('file (hex): ')) -assert len(data) % 16 == 0, 'Misaligned file length' -assert len(data) < 1337, 'Oversized file length' - -data_buf = BytesIO(data) -img = Image.open(data_buf, formats=['jpeg']) -assert img.width == 13 and img.height == 37, 'Invalid image size' -assert img.getpixel((7, 7)) == (7, 7, 7), 'Invalid image contents' - -aes = AES.new(KEY, iv=iv, mode=AES.MODE_CBC) -dec = aes.decrypt(data) -assert dec.startswith(b'\x7fELF'), 'Not an ELF' - -f = NamedTemporaryFile(delete=False) -f.write(dec) -f.close() - -os.chmod(f.name, 0o777) -pipes = subprocess.Popen([f.name], stdout=subprocess.PIPE) -stdout, _ = pipes.communicate() -print(stdout.decode()) - -os.remove(f.name) -``` - -So, for anything to make it past these checks and be executed it must: - -1. be a valid 13x37 JPEG image with the pixel at 7,7 set to `#070707` -2. be a valid ELF binary that reads `./flag.txt` after decrypting with AES CBC, - fixed key (`downunderctf2022`) and the provided IV -3. The IV must contain `DUCTF` - -During the competition I discovered the information in the next three headings -in parallel but internally in-order. - -## 1. AES CBC "flaw" - -We need to generate a file that is a sort-of polyglot with JPEG and ELF, -converted with AES CBC (Cipher block chaining). - -AES itself operates on 16-byte (for 128-bit AES) blocks, so bigger files are -split and then encrypted separately. To ensure that identical blocks don't -result in identical blocks in ciphertext, each block is first xor'd with -something that won't be identical. In the case of CBC, the last ciphertext block -or the initialisation vector (IV) is used. Here is a diagram for encryption - -``` - ___plaintext____|___plaintext____|___plaintext____|... - v v v -IV--->XOR ,---------->XOR ,--------->XOR ,---- ... - v | v | v | - AES | AES | AES | - v---' v---' v---' - ___ciphertext___|___ciphertext___|___ciphertext___|... -``` - -For decryption we can just flip the diagram and replace AES with reverse AES. - -``` - ___ciphertext___|___ciphertext___|___ciphertext___|... - v---, v---, v---, - ∀EZ | ∀EZ | ∀EZ | - v | v | v | -IV--->XOR '---------->XOR '--------->XOR '---- ... - v v v - ___plaintext____|___plaintext____|___plaintext____|... -``` - -This does make sense, however i noticed that all but the first block do not -depend on IV. This turns out useful since we can turn any block into any other -block by applying a chosen value with XOR. So we can control the ciphertext with -the IV directly as follows: - -- $m$: first plaintext block -- $c$: first ciphertext block - -$$ c = AES(m \oplus IV) \\ - -AES^{-1}(c) = m \oplus IV \\ - -AES^{-1}(c) \oplus m = IV $$ - -All blocks in ciphertext after the first are now "uncontrollable" because IV and -plaintext are set. - -## 2. JPEG - -JPEG consists of a list of _segments_. Each starts with a marker byte (`ff`) -followed by a identifier and the length of the segment (if non-zero). - -| Identifier | Name | -| ---------- | ----------------------------------------------- | -| `d8` | Start of Image | -| `fe` | Comment | -| `d9` | End of Image | -| ... | _a bunch more that you dont need to know about_ | - -The comment segment is perfect for embedding our ELF binary into JPEG. We can -first generate a JPEG image, then insert a _comment_ somewhere containing any -data we want. - -## 3. ELF Payload - -The binary needs to be super small so creating it "manually" was required. I -followed the guide -[Creating a minimal ELF-64 file by tchajed](https://github.com/tchajed/minimal-elf/) -and recreated it for my needs. Like in the guide i also wrote the assembly with -a rust EDSL. - -```rs -let mut str_end = a.create_label(); -let mut filename = a.create_label(); -a.jmp(str_end)?; // jump over the string -a.set_label(&mut filename)?; -a.db(b"flag.txt\0")?; -a.set_label(&mut str_end)?; - -// open("flag.txt", O_RDONLY) -a.mov(eax, 2)?; -a.lea(rdi, ptr(filename))?; -a.mov(rsi, 0u64)?; -a.syscall()?; // fd -> rax - -// sendfile(1, rax, 0, 0xff) -a.mov(rsi, rax)?; -a.mov(eax, 40)?; -a.mov(rdi, 1u64)?; -a.mov(rdx, 0u64)?; -a.mov(r10, 0xffu64)?; -a.syscall()?; -``` - -I was able to produce a 207 byte long executable. - -## 4. _magic!_ - -Here is how we align the files now (single quotes `'` indicate ASCII -representation for clarity, question marks `?` represent data that is implicitly -defined): - -``` -plaintext: 7f 'E 'L 'F 02 01 01 00 00 00 00 00 00 00 00 00 -iv: ?? ?? ?? ?? ?? ?? 'D 'U 'C 'T 'F 00 00 00 00 00 -ciphertext: ff d8 ff fe {len} ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? -``` - -This scenario is a little more complicated because in some places we define -ciphertext and plaintext and in some we define ciphertext and IV. This is not a -problem because XOR operates on every byte individually. - -All-in-all, the file looks like this now: - -- First block: - - overlapping headers (6 bytes) - - `DUCTF` string (5 bytes) - - padding (5 bytes) -- Rest of the ELF binary -- _end of the jpeg comment_ -- Rest of the JPEG image - -All this information should be enough to solve this challenge! - -I here attach the implementation that i created _during_ the CTF here. I kept it -as messy as it was, just removed not-so-interesting code and added comments. - -```rs -let mut i = image::RgbImage::new(13, 37); -// jpeg is lossy so filling guarantees the pixel actually has the *exact* color -i.pixels_mut().for_each(|p| *p = Rgb([7, 7, 7])); - -let mut imgbuf = Cursor::new(vec![]); -image::codecs::jpeg::JpegEncoder::new(&mut imgbuf) - .encode_image(&i) - .unwrap(); - -let key = Aes128::new(&GenericArray::from(*b"downunderctf2022")); -let mut payload = create_elf_payload(); -let mut buf = BytesMut::new(); - -// pad payload so AES works -while payload.len() % 16 != 0 { - payload.put_u8(0); -} -assert!(payload.len() % 16 == 0); - -let prefix_len = 6; -// write the JPEG headers to start a comment -buf.put_u16(0xffd8); -buf.put_u16(0xfffe); -buf.put_u16(payload.len() as u16 + 2 /*seg len*/ - prefix_len as u16); - -// find a good IV -let iv = { - let mut fblock_target = vec![0u8; 16]; - fblock_target[0..prefix_len].copy_from_slice(&buf[0..prefix_len]); - key.decrypt_block(GenericArray::from_mut_slice(&mut fblock_target)); - let mut iv = xor_slice(&fblock_target, &payload[0..16]); - iv[prefix_len..prefix_len + 5].copy_from_slice(b"DUCTF"); - iv -}; - -// fill the first block up with zeroes -for _ in prefix_len..16 { - buf.put_u8(0); -} - -// encrypt starting with the 2nd block -buf.put(&payload[16..]); -let block_range = |n: usize| (n) * 16..(n + 1) * 16; -for i in 1..buf.len() / 16 { - let tb = Vec::from(&buf[block_range(i - 1)]); - xor_assign(&mut buf[block_range(i)], &tb); - key.encrypt_block(GenericArray::from_mut_slice(&mut buf[block_range(1)])); -} - -// append the rest of the image, stripping the first segment -buf.put(&imgbuf.into_inner()[2..]); - -// pad the final buffer again because the image might not be aligned -while buf.len() % 16 != 0 { - buf.put_u8(0); -} -assert!(buf.len() < 1337); - -File::create("image").unwrap().write_all(&buf).unwrap(); -File::create("iv").unwrap().write_all(&iv).unwrap(); -``` - -I am also still looking for team mates for upcoming CTF events and would be -happy to hack together with you! Just [contact](https://metamuffin.org/contact) me. diff --git a/content/articles/2022-11-07-programming-language-design.md b/content/articles/2022-11-07-programming-language-design.md deleted file mode 100644 index 674e696..0000000 --- a/content/articles/2022-11-07-programming-language-design.md +++ /dev/null @@ -1,70 +0,0 @@ -# Some Thoughts on Programming Language Design - -This is a collection of ideas to look at when inventing new langauges. - -## Other Ideas - -- The Language pushes abstraction to the limit by not noting any - hardware-related issues like memory-allocations, parallelism, heterogenous - computer architecture (CPU, GPU, …) - - requires a very "intellegent" compiler and a way to describe unknowns like - possible inputs, etc. in the language itself -- Start with assembly but add a very flexible macro system - -## Type System - -```diff -# Haskell -data LinkedList a = Nil | Cons a (Box (LinkedList a)) -data Test = Empty | Blub Int | State { x :: Int, y :: Int } -# Rust -enum LinkedList { Nil, Cons(T, LinkedList) } -``` - -## Memory Management - -- **Drop when out-of-scope** -- Garbage collections -- Reference counting - -## Compile-time logic - -- Annotation when calling function to be run as-far-as-possible at comptime - -```diff -fn format(template: String, args: [String]) -> String { - template.replace("@", (match, i) => args[i]) -} - -fun add(x, y) x + y - -fun main() print(format!("@ ist @; @", ["1+1", 1+1, x])) -# should expand to -fun main() print("1+1 ist 2; " ~ x)) -``` - -## Examples - -### Fizz-Buzz - -```diff -for (n in 0..100) { - if (n % (3*5) == 0) print("FizzBuzz") - else if (n % 3 == 0) print("Fizz") - else if (n % 5 == 0) print("Buzz") - else print(n) -} - - -if (true) x = 1 -if (true) { x = 1 } -``` - -``` -f(x) = 10 + g(x) -f x = 10 + g x - -main = { - -} -``` diff --git a/content/articles/2022-11-10-artist-correlation.md b/content/articles/2022-11-10-artist-correlation.md deleted file mode 100644 index d52e613..0000000 --- a/content/articles/2022-11-10-artist-correlation.md +++ /dev/null @@ -1,78 +0,0 @@ -# Correlating music artists - -I listen to a lot of music and so every few months my music collection gets -boring again. So far I have asked friends to recommend me music but I am running -out of friend too now. Therefore I came up with a new solution during a few -days. - -I want to find new music that i might like too. After some research I found that -there is [Musicbrainz](https://musicbrainz.org/) (a database of all artists and -recordings ever made) and [Listenbrainz](https://listenbrainz.org/) (a service -to which you can submit what you are listening to). Both databases are useful -for this project. The high-level goal is to know, what people that have a lot of -music in common with me, like to listen to. For that the shared number of -listeners for each artist is relevant. I use the word 'a listen', to refer to -one playthrough of a track. - -## The Procedure - -### Parse data & drop unnecessary detail - -All of the JSON files of listenbrainz are parsed and only information about how -many listens each user has submitted for what artist are kept. The result is -stored in a B-tree map on my disk (the -[sled library](https://crates.io/crates/sledg) is great for that). - -- First mapping created: `(user, artist) -> shared listens`. -- (Also created a name lookup: `artist -> artist name`) - -The B-Tree stores values ordered, such that i can iterate through all artists of -a user, by scanning the prefix `(user, …`. - -### Create a graph - -Next an undirected graph with weighted edges is generated where nodes are -artists and edges are shared listens. For each user, each edge connecting -artists they listen to, the weight is incremented by the sum of the logarhythms -of either one's playthrough count for that user. This means that artists that -share listeners are connected and because of the logarhythms, users that listen -to an artist _a lot_ won't be weighted proportionally. - -Mapping: `(artist, artist) -> weight`. (Every key `(x, y)` is identical with -`(y, x)` so that edges are undirectional.) - -### Query artists - -The graph tree can now be queried by scanning with a prefix of one artist -(`("The Beatles", …`) and all correlated artists are returned with a weight. The -top-weighted results are kept and saved. - -### Notes - -Two issues appeared during this project that lead to the following fixes: - -- Limit one identity to 32 artists at most because the edge count grows - quadratically (100 artists -> 10000 edges) -- When parsing data the user id is made dependent of the time to seperate arists - when music tastes changing over time. Every 10Ms (~4 months) the user ids - change. - -## Results - -In a couple of minutes I rendered about 2.2 million HTML documents with my -results. They are available at `https://metamuffin.org/artist-correl/{name}`. -Some example links: - -- [The Beatles](https://metamuffin.org/artist-correl/The%20Beatles) -- [Aimer](https://metamuffin.org/artist-correl/Aimer) -- [Rammstein](https://metamuffin.org/artist-correl/Rammstein) -- [Mitski](https://metamuffin.org/artist-correl/Mitski) - -## Numbers - -- Musicbrainz: 15GB -- Listenbrainz: 350GB -- Extracted listening data: 23GB -- Graph: 56GB -- Rendered HTML: 2.3GB -- Compressed HTML (squashfs with zstd): 172MB diff --git a/content/makefile b/content/makefile deleted file mode 100644 index 84e109f..0000000 --- a/content/makefile +++ /dev/null @@ -1 +0,0 @@ -include ../code/makefile \ No newline at end of file diff --git a/content/style.css b/content/style.css deleted file mode 100644 index 3e19951..0000000 --- a/content/style.css +++ /dev/null @@ -1,44 +0,0 @@ -@import url("https://s.metamuffin.org/static/font-ubuntu/include.css"); - -:root { - --bg2: #141414; - --bg1: #060606; -} - -body { - margin: 3em; - background-color: var(--bg1); -} - -nav,article { - margin: 1em; - background-color: var(--bg2); - padding: 1em; - border: 0px solid transparent; - border-radius: 1em; -} - -h1 {font-size: xx-large } -h2 {font-size: x-large } -h3 {font-size: large } -h4 {font-size: medium } - -p,h1,h2,h3,h4,h5,h6,li { font-family: "Ubuntu", sans-serif; } -h1,h2 { color: #b575ff } -h3 { color: #ff83fd; margin-left: 1em } -h4 { color: #ff82b2; margin-left: 2em } -p,li { color: white; margin-left: 3em } -a { color: #82a8ff; font-style: italic; text-decoration: underline } -hr { border: 1px solid grey } - -math {color:white} - -pre,code { - color: #eeeeee; - font-family: monospace; - background-color: black; - border: 0px solid transparent; - border-radius: 5px; - padding: 0.2em; -} -pre { margin-left: 4em; margin-right: 4em; overflow-y: auto; } -- cgit v1.2.3-70-g09d2