diff options
Diffstat (limited to 'articles')
-rw-r--r-- | articles/2022-08-29-blog-start.md | 9 | ||||
-rw-r--r-- | articles/2022-08-29-blog-test.md | 41 | ||||
-rw-r--r-- | articles/2022-08-30-blog-tools.md | 104 | ||||
-rw-r--r-- | articles/2022-09-19-gctf-readysetaction.md | 84 | ||||
-rw-r--r-- | articles/2022-09-25-ductf-file-magic.md | 255 | ||||
-rw-r--r-- | articles/2022-11-07-programming-language-design.md | 70 | ||||
-rw-r--r-- | articles/2022-11-10-artist-correlation.md | 78 |
7 files changed, 641 insertions, 0 deletions
diff --git a/articles/2022-08-29-blog-start.md b/articles/2022-08-29-blog-start.md new file mode 100644 index 0000000..2f17e6c --- /dev/null +++ b/articles/2022-08-29-blog-start.md @@ -0,0 +1,9 @@ +# Starting my blog + +This will be the place where I will post short articles about my thoughts and projects in the future, whenever I feel like it. + +A list of topics of which I already have an idea in mind are: + +- The tools I used to create this blog +- My dream operating system (maybe even logs about me trying (aka failing) to implement it) +- In search of a perfect text editor diff --git a/articles/2022-08-29-blog-test.md b/articles/2022-08-29-blog-test.md new file mode 100644 index 0000000..1c4a28a --- /dev/null +++ b/articles/2022-08-29-blog-test.md @@ -0,0 +1,41 @@ +# Blog test + +Hello world! + +## heading + +_italic_ **bold** `code` [Link](https://metamuffin.org/) + +### sub heading + +Look at this code here. + +```rs +#[derive(Default)] +struct Impossible<T> { + something: HashMap<usize, T>, + // ^- look here, a keyword! --the grammar regex + itself: Impossible, +} + +pub async fn useless(s: Box<dyn Write>) -> impl Future<usize> { + todo!() // this panics +} +fn main() { + 1 + "blub" + for (x, y) in [(1,2),(3,4)] { + println!("{x} and {y}"); + Impossible::default(); + } +} +``` + +As you can see, its pointless i.e. no points + +- list +- list +- list + +1. first this +2. then that +3. done diff --git a/articles/2022-08-30-blog-tools.md b/articles/2022-08-30-blog-tools.md new file mode 100644 index 0000000..81d1ff9 --- /dev/null +++ b/articles/2022-08-30-blog-tools.md @@ -0,0 +1,104 @@ +# Tools I use to create this blog + +As you might expect, this blog uses some overengineered tools, after all, it +wouldn't be so much fun otherwise. This article is probably not too interesting +on its own. It's just "documentation". + +// TODO upload source code and link it here. + +## "Build system" + +```tree +. +├── ... (readme 'n stuff) +├── code +│ ├── Cargo.toml +│ └── src +│ └── ... (rust source) +└── content + ├── articles + │ ├── 2022-08-29-blog-start.md + │ ├── 2022-08-29-blog-test.md + │ └── 2022-08-30-blog-tools.md + ├── makefile + ├── out + │ ├── ... (generated files) + └── style.css +``` + +The entry point to this "build system" is `content/makefile`, it has rules for +generating HTM from the markdown sources, the index and the atom feed. The +"compiler" here is a small rust program. (`code`). + +```makefile +# oversimplifed, doesnt work +TOOL := blog-tool + +out/index: $(ALL_ARTICLES) $(TOOL) + $(TOOL) render-index > $@ +out/feed.atom: $(ALL_ARTICLES) $(TOOL) + $(TOOL) generate-atom > $@ +out/%: articles/%.md $(TOOL) + $(TOOL) render-article $< > $@ +``` + +A small trick here is, to make everything depend on the compiler (`$(TOOL)`) +too, so that when it changes, re-generation of all articles if triggered. + +## Rust tools + +I use the [laby](https://crates.io/crates/laby) crate for templating the HTM. + +// TODO what is important here?! + +### File server + +I wanted to serve all files without the file name extension (`.html`), but my +previous http server ([http-server](https://github.com/http-party/http-server)) +exclusively inferred the MIME type from the extension, which makes it impossible +to do that. The obvious solution was to reinvent the wheel. + +This had the great side-effect of making my website blazing fast 🚀🚀! :) + +#### http-server (node.js) + +``` +Running 10s test @ http://127.0.0.1:8080/test-file + 2 threads and 100 connections +Requests/sec: 2314.00 +Transfer/sec: 725.38KB +``` + +#### fileserver (rust, no caching) + +``` +Running 10s test @ http://127.0.0.1:8080/test-file + 2 threads and 100 connections +Requests/sec: 24464.69 +Transfer/sec: 3.10MB +``` + +// TODO also upload source code and link it + +### Syntax highlighing + +For that, I chose the crate [synoptic](https://crates.io/crates/synoptic). It +provides a some functions for defining tokens with a regex, then returning +start- and end-points of those for further processing. For example, this is how +i defined comments and types: + +```rs +let rust_grammar = &[ + (&["(?m)(//.*)$"], "comment"), + (&["[A-Z][a-z]*", "bool", "usize", /* simplified */ ], "type"), + /* more rules */ +] +``` + +The library finds all the tokens and lets me serialize them to HTM. + +## End + +I still need to learn, how to write this blog well and what is the most +interesting to read. Please give me some advice or commentary via matrix, mail +or fedi. (see [contact](https://metamuffin.org/contact)) diff --git a/articles/2022-09-19-gctf-readysetaction.md b/articles/2022-09-19-gctf-readysetaction.md new file mode 100644 index 0000000..0d5b719 --- /dev/null +++ b/articles/2022-09-19-gctf-readysetaction.md @@ -0,0 +1,84 @@ +# Google Beginner CTF: ReadySetAction + +**SPOILER WARNING: Go play the CTF at +[capturetheflag.withgoogle.com](https://capturetheflag.withgoogle.com/) first** + +The challenge was a short python script with the following code: + +```py +from Crypto.Util.number import * + +flag = b"REDACTED" + +p = getPrime(1024) +q = getPrime(1024) +n = p*q + +m = bytes_to_long(flag) + + +c = pow(m,3,n) + +print(c) +print(n) +#154780 ... 6709 +#210348 ... 4477 +#(i removed both 617 digit long numbers here) +``` + +It was immediatly obvious to me that this has to do with RSA cryptography +because it multiplies large primes and then does some powmod magic. Because I +didn't remember exactly how RSA worked, I quickly read the Wikipedia article +again - specifically section +[4.1 Attacks against plain RSA](https://en.wikipedia.org/wiki/RSA_%29cryptosystem%29#Attacks_against_plain_RSA) + +The first item already seems like the solution: + +> When encrypting with low encryption exponents (e.g., e = 3) and small values +> of the m (i.e., m < n1/e), the result of me is strictly less than the modulus +> n. In this case, ciphertexts can be decrypted easily by taking the eth root of +> the ciphertext over the integers. + +Which would mean that $\sqrt[3]{c}$ is a solution _if_ $m$ is small enough. +However + +$c = m^3 \mod n \\$ + +so the exponentiation could yield a much higher result that we dont get because +it is wrapped at $n$. So we can try out how often it wrapped: + +$m = \sqrt[3]{c + x*n} \quad x \in \N\\$ + +where $x$ can be brute-forced if $m$ is sufficiently small. Back in python it +looks like this: + +```py +c = ... +n = ... + +# simple but bad binary search for n-th root +def nth_root(c,e): + high = 1 + while high**e < c: + high *= 2 + low = 0 + mid = 0 + last_mid = 0 + while mid**e != c: + mid = (low + high) // 2 + if last_mid == mid: + return 0 # i use 0 to indicate a failure + if mid**e > c: + high = mid + else: + low = mid + last_mid = mid + return mid + +for x in range(100000): + m = nth_root(c + n*x, e) + # the probability of finding a perfect cube number is low, so any result is what we want + if m != 0: print(long_to_bytes(m)) +``` + +Thats it! With $x=1831$ we get our flag: `CTF{34sy_RS4_1s_e4sy_us3}` diff --git a/articles/2022-09-25-ductf-file-magic.md b/articles/2022-09-25-ductf-file-magic.md new file mode 100644 index 0000000..f4b55c9 --- /dev/null +++ b/articles/2022-09-25-ductf-file-magic.md @@ -0,0 +1,255 @@ +# DownUnderCTF 2022: File Magic + +A short writeup about my favorite challenge from DUCTF. It took me approximatly +12h to solve. I found it was the most creative and challenging one that i +solved. + +## Task + +The challenge consists of a python script and an ip-port pair which appears to +be running that script. Also the path of the flag is given: `./flag.txt` + +```py +#!/usr/bin/env python3 + +from Crypto.Cipher import AES +from PIL import Image +from tempfile import NamedTemporaryFile +from io import BytesIO +import subprocess, os + +KEY = b'downunderctf2022' + +iv = bytes.fromhex(input('iv (hex): ')) +assert len(iv) == 16 and b'DUCTF' in iv, 'Invalid IV' + +data = bytes.fromhex(input('file (hex): ')) +assert len(data) % 16 == 0, 'Misaligned file length' +assert len(data) < 1337, 'Oversized file length' + +data_buf = BytesIO(data) +img = Image.open(data_buf, formats=['jpeg']) +assert img.width == 13 and img.height == 37, 'Invalid image size' +assert img.getpixel((7, 7)) == (7, 7, 7), 'Invalid image contents' + +aes = AES.new(KEY, iv=iv, mode=AES.MODE_CBC) +dec = aes.decrypt(data) +assert dec.startswith(b'\x7fELF'), 'Not an ELF' + +f = NamedTemporaryFile(delete=False) +f.write(dec) +f.close() + +os.chmod(f.name, 0o777) +pipes = subprocess.Popen([f.name], stdout=subprocess.PIPE) +stdout, _ = pipes.communicate() +print(stdout.decode()) + +os.remove(f.name) +``` + +So, for anything to make it past these checks and be executed it must: + +1. be a valid 13x37 JPEG image with the pixel at 7,7 set to `#070707` +2. be a valid ELF binary that reads `./flag.txt` after decrypting with AES CBC, + fixed key (`downunderctf2022`) and the provided IV +3. The IV must contain `DUCTF` + +During the competition I discovered the information in the next three headings +in parallel but internally in-order. + +## 1. AES CBC "flaw" + +We need to generate a file that is a sort-of polyglot with JPEG and ELF, +converted with AES CBC (Cipher block chaining). + +AES itself operates on 16-byte (for 128-bit AES) blocks, so bigger files are +split and then encrypted separately. To ensure that identical blocks don't +result in identical blocks in ciphertext, each block is first xor'd with +something that won't be identical. In the case of CBC, the last ciphertext block +or the initialisation vector (IV) is used. Here is a diagram for encryption + +``` + ___plaintext____|___plaintext____|___plaintext____|... + v v v +IV--->XOR ,---------->XOR ,--------->XOR ,---- ... + v | v | v | + AES | AES | AES | + v---' v---' v---' + ___ciphertext___|___ciphertext___|___ciphertext___|... +``` + +For decryption we can just flip the diagram and replace AES with reverse AES. + +``` + ___ciphertext___|___ciphertext___|___ciphertext___|... + v---, v---, v---, + ∀EZ | ∀EZ | ∀EZ | + v | v | v | +IV--->XOR '---------->XOR '--------->XOR '---- ... + v v v + ___plaintext____|___plaintext____|___plaintext____|... +``` + +This does make sense, however i noticed that all but the first block do not +depend on IV. This turns out useful since we can turn any block into any other +block by applying a chosen value with XOR. So we can control the ciphertext with +the IV directly as follows: + +- $m$: first plaintext block +- $c$: first ciphertext block + +$$ c = AES(m \oplus IV) \\ + +AES^{-1}(c) = m \oplus IV \\ + +AES^{-1}(c) \oplus m = IV $$ + +All blocks in ciphertext after the first are now "uncontrollable" because IV and +plaintext are set. + +## 2. JPEG + +JPEG consists of a list of _segments_. Each starts with a marker byte (`ff`) +followed by a identifier and the length of the segment (if non-zero). + +| Identifier | Name | +| ---------- | ----------------------------------------------- | +| `d8` | Start of Image | +| `fe` | Comment | +| `d9` | End of Image | +| ... | _a bunch more that you dont need to know about_ | + +The comment segment is perfect for embedding our ELF binary into JPEG. We can +first generate a JPEG image, then insert a _comment_ somewhere containing any +data we want. + +## 3. ELF Payload + +The binary needs to be super small so creating it "manually" was required. I +followed the guide +[Creating a minimal ELF-64 file by tchajed](https://github.com/tchajed/minimal-elf/) +and recreated it for my needs. Like in the guide i also wrote the assembly with +a rust EDSL. + +```rs +let mut str_end = a.create_label(); +let mut filename = a.create_label(); +a.jmp(str_end)?; // jump over the string +a.set_label(&mut filename)?; +a.db(b"flag.txt\0")?; +a.set_label(&mut str_end)?; + +// open("flag.txt", O_RDONLY) +a.mov(eax, 2)?; +a.lea(rdi, ptr(filename))?; +a.mov(rsi, 0u64)?; +a.syscall()?; // fd -> rax + +// sendfile(1, rax, 0, 0xff) +a.mov(rsi, rax)?; +a.mov(eax, 40)?; +a.mov(rdi, 1u64)?; +a.mov(rdx, 0u64)?; +a.mov(r10, 0xffu64)?; +a.syscall()?; +``` + +I was able to produce a 207 byte long executable. + +## 4. _magic!_ + +Here is how we align the files now (single quotes `'` indicate ASCII +representation for clarity, question marks `?` represent data that is implicitly +defined): + +``` +plaintext: 7f 'E 'L 'F 02 01 01 00 00 00 00 00 00 00 00 00 +iv: ?? ?? ?? ?? ?? ?? 'D 'U 'C 'T 'F 00 00 00 00 00 +ciphertext: ff d8 ff fe {len} ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? +``` + +This scenario is a little more complicated because in some places we define +ciphertext and plaintext and in some we define ciphertext and IV. This is not a +problem because XOR operates on every byte individually. + +All-in-all, the file looks like this now: + +- First block: + - overlapping headers (6 bytes) + - `DUCTF` string (5 bytes) + - padding (5 bytes) +- Rest of the ELF binary +- _end of the jpeg comment_ +- Rest of the JPEG image + +All this information should be enough to solve this challenge! + +I here attach the implementation that i created _during_ the CTF here. I kept it +as messy as it was, just removed not-so-interesting code and added comments. + +```rs +let mut i = image::RgbImage::new(13, 37); +// jpeg is lossy so filling guarantees the pixel actually has the *exact* color +i.pixels_mut().for_each(|p| *p = Rgb([7, 7, 7])); + +let mut imgbuf = Cursor::new(vec![]); +image::codecs::jpeg::JpegEncoder::new(&mut imgbuf) + .encode_image(&i) + .unwrap(); + +let key = Aes128::new(&GenericArray::from(*b"downunderctf2022")); +let mut payload = create_elf_payload(); +let mut buf = BytesMut::new(); + +// pad payload so AES works +while payload.len() % 16 != 0 { + payload.put_u8(0); +} +assert!(payload.len() % 16 == 0); + +let prefix_len = 6; +// write the JPEG headers to start a comment +buf.put_u16(0xffd8); +buf.put_u16(0xfffe); +buf.put_u16(payload.len() as u16 + 2 /*seg len*/ - prefix_len as u16); + +// find a good IV +let iv = { + let mut fblock_target = vec![0u8; 16]; + fblock_target[0..prefix_len].copy_from_slice(&buf[0..prefix_len]); + key.decrypt_block(GenericArray::from_mut_slice(&mut fblock_target)); + let mut iv = xor_slice(&fblock_target, &payload[0..16]); + iv[prefix_len..prefix_len + 5].copy_from_slice(b"DUCTF"); + iv +}; + +// fill the first block up with zeroes +for _ in prefix_len..16 { + buf.put_u8(0); +} + +// encrypt starting with the 2nd block +buf.put(&payload[16..]); +let block_range = |n: usize| (n) * 16..(n + 1) * 16; +for i in 1..buf.len() / 16 { + let tb = Vec::from(&buf[block_range(i - 1)]); + xor_assign(&mut buf[block_range(i)], &tb); + key.encrypt_block(GenericArray::from_mut_slice(&mut buf[block_range(1)])); +} + +// append the rest of the image, stripping the first segment +buf.put(&imgbuf.into_inner()[2..]); + +// pad the final buffer again because the image might not be aligned +while buf.len() % 16 != 0 { + buf.put_u8(0); +} +assert!(buf.len() < 1337); + +File::create("image").unwrap().write_all(&buf).unwrap(); +File::create("iv").unwrap().write_all(&iv).unwrap(); +``` + +I am also still looking for team mates for upcoming CTF events and would be +happy to hack together with you! Just [contact](https://metamuffin.org/contact) me. diff --git a/articles/2022-11-07-programming-language-design.md b/articles/2022-11-07-programming-language-design.md new file mode 100644 index 0000000..674e696 --- /dev/null +++ b/articles/2022-11-07-programming-language-design.md @@ -0,0 +1,70 @@ +# Some Thoughts on Programming Language Design + +This is a collection of ideas to look at when inventing new langauges. + +## Other Ideas + +- The Language pushes abstraction to the limit by not noting any + hardware-related issues like memory-allocations, parallelism, heterogenous + computer architecture (CPU, GPU, …) + - requires a very "intellegent" compiler and a way to describe unknowns like + possible inputs, etc. in the language itself +- Start with assembly but add a very flexible macro system + +## Type System + +```diff +# Haskell +data LinkedList a = Nil | Cons a (Box (LinkedList a)) +data Test = Empty | Blub Int | State { x :: Int, y :: Int } +# Rust +enum LinkedList<T> { Nil, Cons(T, LinkedList<T>) } +``` + +## Memory Management + +- **Drop when out-of-scope** +- Garbage collections +- Reference counting + +## Compile-time logic + +- Annotation when calling function to be run as-far-as-possible at comptime + +```diff +fn format(template: String, args: [String]) -> String { + template.replace("@", (match, i) => args[i]) +} + +fun add(x, y) x + y + +fun main() print(format!("@ ist @; @", ["1+1", 1+1, x])) +# should expand to +fun main() print("1+1 ist 2; " ~ x)) +``` + +## Examples + +### Fizz-Buzz + +```diff +for (n in 0..100) { + if (n % (3*5) == 0) print("FizzBuzz") + else if (n % 3 == 0) print("Fizz") + else if (n % 5 == 0) print("Buzz") + else print(n) +} + + +if (true) x = 1 +if (true) { x = 1 } +``` + +``` +f(x) = 10 + g(x) +f x = 10 + g x + +main = { + +} +``` diff --git a/articles/2022-11-10-artist-correlation.md b/articles/2022-11-10-artist-correlation.md new file mode 100644 index 0000000..d52e613 --- /dev/null +++ b/articles/2022-11-10-artist-correlation.md @@ -0,0 +1,78 @@ +# Correlating music artists + +I listen to a lot of music and so every few months my music collection gets +boring again. So far I have asked friends to recommend me music but I am running +out of friend too now. Therefore I came up with a new solution during a few +days. + +I want to find new music that i might like too. After some research I found that +there is [Musicbrainz](https://musicbrainz.org/) (a database of all artists and +recordings ever made) and [Listenbrainz](https://listenbrainz.org/) (a service +to which you can submit what you are listening to). Both databases are useful +for this project. The high-level goal is to know, what people that have a lot of +music in common with me, like to listen to. For that the shared number of +listeners for each artist is relevant. I use the word 'a listen', to refer to +one playthrough of a track. + +## The Procedure + +### Parse data & drop unnecessary detail + +All of the JSON files of listenbrainz are parsed and only information about how +many listens each user has submitted for what artist are kept. The result is +stored in a B-tree map on my disk (the +[sled library](https://crates.io/crates/sledg) is great for that). + +- First mapping created: `(user, artist) -> shared listens`. +- (Also created a name lookup: `artist -> artist name`) + +The B-Tree stores values ordered, such that i can iterate through all artists of +a user, by scanning the prefix `(user, …`. + +### Create a graph + +Next an undirected graph with weighted edges is generated where nodes are +artists and edges are shared listens. For each user, each edge connecting +artists they listen to, the weight is incremented by the sum of the logarhythms +of either one's playthrough count for that user. This means that artists that +share listeners are connected and because of the logarhythms, users that listen +to an artist _a lot_ won't be weighted proportionally. + +Mapping: `(artist, artist) -> weight`. (Every key `(x, y)` is identical with +`(y, x)` so that edges are undirectional.) + +### Query artists + +The graph tree can now be queried by scanning with a prefix of one artist +(`("The Beatles", …`) and all correlated artists are returned with a weight. The +top-weighted results are kept and saved. + +### Notes + +Two issues appeared during this project that lead to the following fixes: + +- Limit one identity to 32 artists at most because the edge count grows + quadratically (100 artists -> 10000 edges) +- When parsing data the user id is made dependent of the time to seperate arists + when music tastes changing over time. Every 10Ms (~4 months) the user ids + change. + +## Results + +In a couple of minutes I rendered about 2.2 million HTML documents with my +results. They are available at `https://metamuffin.org/artist-correl/{name}`. +Some example links: + +- [The Beatles](https://metamuffin.org/artist-correl/The%20Beatles) +- [Aimer](https://metamuffin.org/artist-correl/Aimer) +- [Rammstein](https://metamuffin.org/artist-correl/Rammstein) +- [Mitski](https://metamuffin.org/artist-correl/Mitski) + +## Numbers + +- Musicbrainz: 15GB +- Listenbrainz: 350GB +- Extracted listening data: 23GB +- Graph: 56GB +- Rendered HTML: 2.3GB +- Compressed HTML (squashfs with zstd): 172MB |