# My dream computer (DRAFT)

## Meta stuff

For multiple years now I have had a vision for how a perfect operating system should be designed. The specifics have shifted slightly over the years but have generally stayed the same. However, I have never actually gotten to implement the design I envisioned, because of one problem or another that was hard to solve at the time. I have still learned many things from my failed attempts and considered more radical ideas too. As part of my continuing effort to implement my ideas some day, I want to write down my most important thoughts on this topic over the next few days, because I realized that a problem of such complexity is hard to solve just within my head. These notes should be helpful for me but may also interest others, and so are published here.

During my time failing to implement the thing I found myself questioning lower-level things a lot, reevaluated those too, and came to the conclusion that they should be redesigned as well. For my overview I will therefore start from the bottom.

In the rest of this article I am writing what I think is true or close to the truth, even for topics where I know less. I expect some things to be wrong. In that case, please tell me.

## Content

My goal is to design a _Computer system_ (you should know what that means to you). The heart of the physical part of this machine is a _Processor_; the remaining parts store data or interface with the outside world or the user, and are of less interest to me. Software is also important and I will write more about it in the next posts.

Many design principles for performing the actual computations have been found in the past: you could use a mechanical contraption, the biochemical reactions of DNA and proteins, or anything really. Electric circuits, however, are the most useful and widely deployed today, and since I do not have the expertise, I am not reconsidering anything at this level.

With electronic semiconductor logic gates you can arrange circuits on a silicon die that perform most arithmetic and logic operations. The arrangement of those gates on such a die is immutable, however, meaning the circuit cannot be reprogrammed at that level. It is therefore necessary to build a circuit that is general-purpose and can be programmed with data ("software").

This problem has been approached from multiple sides before. In an FPGA, relatively small clusters of gates can be arranged into circuits by switching the gates connecting them (that's a simplification). [Write how "traditional" CPUs work] In the end there are many ways to achieve the semi-Turing-completeness required for a computer. Their performance does differ, though. From observing optimization in current CPUs it seems that **useful computation per space per time** matters most. The minimum time interval for computations (the CPU clock) primarily depends on the size of the die (which is constrained by the manufacturing technology, which I consider fixed) and the distance between components on the die. A major part of optimizing the _computational density_ of a processor is keeping all parts as busy with computation as possible.
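To make that metric a little more concrete, here is a tiny back-of-the-envelope sketch in Rust. The unit counts, die area and cycle time are completely made up and only illustrate the ratio I mean by computational density; they are not measurements of any real processor.

```rust
/// Toy "computational density" figure: useful operations per unit of die
/// area per unit of time. All numbers below are invented for illustration.
struct Design {
    name: &'static str,
    useful_ops_per_cycle: f64, // functional units doing useful work each cycle
    die_area_mm2: f64,
    cycle_time_ns: f64,
}

impl Design {
    fn density(&self) -> f64 {
        self.useful_ops_per_cycle / (self.die_area_mm2 * self.cycle_time_ns)
    }
}

fn main() {
    let designs = [
        // A naive scalar CPU: only one of its units is busy per cycle.
        Design { name: "naive scalar", useful_ops_per_cycle: 1.0, die_area_mm2: 10.0, cycle_time_ns: 0.5 },
        // The same die and clock, but every cycle keeps four units busy.
        Design { name: "fully utilized", useful_ops_per_cycle: 4.0, die_area_mm2: 10.0, cycle_time_ns: 0.5 },
    ];
    for d in &designs {
        println!("{:>15}: {:.2} ops / (mm^2 * ns)", d.name, d.density());
    }
}
```

Both hypothetical designs occupy the same area and run at the same clock; the second simply keeps more of its units doing useful work, which is exactly the knob I want to turn.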
I see a tradeoff between having small reconfigurable parts (like FPGAs), where the overhead of reconfiguration is large (as in: many "useless" gates just for programming), and big parts, where you lack the flexibility to optimize the software. In an unoptimized basic CPU, for example, only one part would be active at a time (the ADD instruction only performs an addition while other distinct components like the multiplier, logic unit or memory are idle). Modern CPU designs avoid this to some extent with pipelining, speculative execution and simultaneous multithreading, which however introduce new complexity and also overhead for the coordination. Also, program execution timing varies at runtime, which introduces additional problems. The concept of _Very Long Instruction Word_ architectures appeals to me. It describes architectures where an instruction specifies the configuration of most (usually all) components of the processor at once.

Next I am describing my design: Firstly, the processor is made up of a number of _modules_ that have _inputs_, _outputs_ and _parameters_. These modules' inputs are connected through data multiplexers to allow arbitrary cross-module data flow. Parameters are bits that configure how a module operates within one clock cycle of the processor. They also include, for each input of the module, the index of the output to obtain this cycle's data from. The processor operates in clock cycles where each cycle a new instruction is loaded and executed. This module system is very flexible; here are a few examples of useful modules for a processor (a small simulator sketch of this scheme follows below):

- Add/Sub unit: Two inputs and one output. A flag to switch to subtract mode.
- Literal loader: One output set to a value given by 32 or so parameter bits. Essentially providing a way to introduce new constants from program code.
- Memory loader: One input (address) and an output that takes on the value of the RAM at the provided address.
- Flow controller: Two inputs, a condition and a program counter value, that set the PC when the condition is met and otherwise increment the PC as normal.
- Memory storer: (left as an exercise to the reader)
- Logic unit
- Mul/Div unit
- Floating-point unit

To balance the load between modules, the busiest units can be contained multiple times within the processor, allowing for example two additions per cycle.

Additionally, since typical RAM is rather slow, it is also possible to introduce a _latency_ for modules like the memory loader that need main memory, with internal uninterrupted pipelining for memory fetches such that, for example, memory fetches arrive with a 4-cycle delay and jumps happen 4 cycles after being issued. This always assumes the worst-case memory latency and cannot utilise a cache. You could however reimplement a cache manually by adding a secondary "fast" memory that is local but smaller, into which you load data with software.

Still, current superscalar CPUs can be smarter about module load balancing and memory caching because of runtime knowledge that a compiler does not have. In general, writing a _good_ compiler for this architecture (assuming a traditional language) is relatively hard.
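To make the module idea more tangible, here is a rough single-cycle simulator sketch in Rust. It is not a finished design: the concrete module set, the parameter encoding and the one-output-per-module simplification are assumptions I made up purely for illustration.

```rust
/// Sketch of the module/VLIW idea: every cycle, one very long instruction
/// configures all modules at once. Each module input names the output (by
/// global index) whose value from the *previous* cycle it reads.

/// Per-cycle configuration ("parameters") of a single module.
#[derive(Clone, Copy)]
enum ModuleConfig {
    /// Add/Sub unit: two input selectors and a subtract flag.
    AddSub { a: usize, b: usize, subtract: bool },
    /// Literal loader: one output set from parameter bits.
    Literal { value: u32 },
    /// Module does nothing this cycle; its output is left unchanged.
    Idle,
}

/// One "very long instruction word": a configuration for every module.
type Instruction = Vec<ModuleConfig>;

struct Processor {
    /// Latched output of every module (one output per module in this sketch).
    outputs: Vec<u32>,
}

impl Processor {
    fn new(module_count: usize) -> Self {
        Processor { outputs: vec![0; module_count] }
    }

    /// Execute one clock cycle: every module reads outputs from the previous
    /// cycle (selected by its parameters) and produces a new output.
    fn step(&mut self, instr: &Instruction) {
        let prev = self.outputs.clone(); // all inputs see last cycle's values
        for (i, cfg) in instr.iter().enumerate() {
            self.outputs[i] = match *cfg {
                ModuleConfig::AddSub { a, b, subtract } => {
                    if subtract { prev[a].wrapping_sub(prev[b]) }
                    else { prev[a].wrapping_add(prev[b]) }
                }
                ModuleConfig::Literal { value } => value,
                ModuleConfig::Idle => prev[i],
            };
        }
    }
}

fn main() {
    // Module layout (made up): 0 = literal loader, 1 = literal loader, 2 = add/sub.
    let mut cpu = Processor::new(3);

    // Cycle 1: load two constants; the add/sub unit has nothing useful to do yet.
    cpu.step(&vec![
        ModuleConfig::Literal { value: 7 },
        ModuleConfig::Literal { value: 35 },
        ModuleConfig::Idle,
    ]);
    // Cycle 2: the add/sub unit reads the two literal outputs from cycle 1.
    cpu.step(&vec![
        ModuleConfig::Idle,
        ModuleConfig::Idle,
        ModuleConfig::AddSub { a: 0, b: 1, subtract: false },
    ]);

    assert_eq!(cpu.outputs[2], 42);
    println!("add/sub output: {}", cpu.outputs[2]);
}
```

A memory loader with a fixed 4-cycle latency would slot in as another variant holding a small delay queue, and the flow controller as a variant that selects the next instruction index; I left those out to keep the sketch short.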
## End

That's it for today. The writing of this article has been motivated by me "participating" in [Macoy Madson's Simple Useful System Challenge](https://macoy.me/blog/programming/SimpleUsefulSystem). As usual I am always happy to receive feedback about what I am writing here. [Send mail!](mailto:metamuffin@disroot.org)

The next article will be more about my ideas on software. The overview is:

- Compiler/Language design
  - Memory safety
  - Type systems
  - Representing hardware execution/performance characteristics
  - Memory management
  - Data flow oriented programming
- Software design
  - Permission checks at compile-time
  - Cooperative multithreading asserted by compiler
  - No driver-program boundary
  - Global Safety