# embeddings-sort

This program sorts images so that ones with similar motifs end up close together. It does this by using [AI](https://github.com/minimaxir/imgbeddings) to extract an embedding that captures the content of each image, and then approximating a travelling-salesperson tour through all of them.  
As a bonus feature, this program can also sort the images by hue, brightness or color, though the results for this could be improved by using a less generalized algorithm.
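To illustrate the "approximate a tour through the embeddings" idea, here is a minimal Python sketch of what the option name `mst-dfs` conventionally refers to: build a minimum spanning tree over all points, then visit them in depth-first preorder. This is an assumption based on the name only, not the program's actual code; `mst_dfs_order` and the toy 2D "embeddings" are made up for the example.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mst_dfs_order(points, dist=euclidean):
    """Order point indices by a preorder walk of a minimum spanning tree."""
    n = len(points)
    if n == 0:
        return []
    # Prim's algorithm on the complete graph, O(n^2) time.
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge connecting i to the tree
    parent = [-1] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    # Iterative depth-first preorder traversal starting at the root (index 0).
    order = []
    stack = [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order

# Toy data: two clusters, around (0, 0) and around (10, 10).
embeddings = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0)]
order = mst_dfs_order(embeddings)
```

The resulting `order` is a permutation of the input indices in which the two points of the far cluster (indices 1 and 3) sit next to each other.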

The sorted order can be output by having the program print the image paths in order, or by copying/symlinking the images into a new directory.

Detailed usage:

```
Usage: embeddings-sort [OPTIONS] [IMAGES]...

Arguments:
  [IMAGES]...  

Options:
  -e, --embedder <EMBEDDER>        Characteristic to sort by [default: content-euclidean] [possible values: brightness, hue, color, content-euclidean, content-angular-distance, content-manhatten]
  -s, --symlink-dir <SYMLINK_DIR>  Symlink the sorted images into this directory
  -o, --copy-dir <COPY_DIR>        Copy the sorted images into this directory. Uses COW when available
  -c, --stdout                     Write sorted paths into stdout, one per line
  -0, --stdout0                    Write sorted paths into stdout, null-separated. Overrides -c
  -b, --benchmark                  Output total tour length to stderr
      --tsp-approx <TSP_APPROX>    Algorithm for TSP approximation. Leave as default if unsure [default: christofides] [possible values: mst-dfs, christofides, christofides-refined]
  -h, --help                       Print help
```
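The three `content-*` embedders presumably differ only in the distance function applied to the embedding vectors. The Python sketch below shows the standard definitions those names suggest (Euclidean, angular, and Manhattan distance, the last spelled `content-manhatten` on the command line), plus a tour-length helper in the spirit of `--benchmark`. It is an assumption, not taken from the program: in particular, whether the reported length includes an edge back to the first image is not stated here, so the helper treats the tour as an open path.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def angular_distance(a, b):
    """Angle between two vectors in radians; assumes neither is the zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos)

def tour_length(points, order, dist):
    """Sum of distances between consecutive points (open path, assumed)."""
    return sum(dist(points[i], points[j]) for i, j in zip(order, order[1:]))

points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
length = tour_length(points, [0, 2, 1], euclidean)  # 3.0 + 4.0
```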

## Insides
The christofides implementation uses an approximate min-weight matching algorithm, which may be non-ideal. I haven't benchmarked how much of a difference it makes, mainly because an exact algorithm is complex to implement and would also raise the time complexity from O(n²) to O(n³), where n is the number of given images.
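Which approximation is used is not specified above. As a purely hypothetical illustration of why an approximate matching can be non-ideal, the Python sketch below uses a simple O(n²) heuristic: take any unmatched vertex and pair it with its nearest unmatched neighbour. The function name `greedy_matching` and the example data are invented for this sketch.

```python
def greedy_matching(vertices, dist):
    """Pair up an even number of vertices with a nearest-neighbour heuristic.

    This is NOT an exact min-weight perfect matching; it runs in O(n^2) time.
    """
    if len(vertices) % 2 != 0:
        raise ValueError("need an even number of vertices")
    unmatched = list(vertices)
    pairs = []
    while unmatched:
        u = unmatched.pop()
        v = min(unmatched, key=lambda w: dist(u, w))
        unmatched.remove(v)
        pairs.append((u, v))
    return pairs

# Four points on a line; vertex ids index into `pos`.
pos = [0.0, 2.0, 3.0, 5.0]
dist = lambda u, v: abs(pos[u] - pos[v])

pairs = greedy_matching([0, 3, 2, 1], dist)
cost = sum(dist(u, v) for u, v in pairs)
```

Here the heuristic starts at vertex 1, pairs it with vertex 2, and is then forced to pair the two outermost points, for a total cost of 6. The optimal matching (0 with 1, 2 with 3) costs 4.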

`christofides-refined` is planned to be christofides with an O(n²) 2-opt swapping step added after the main algorithm. Implementing this efficiently will require some algorithmic trickery, so it's not ready yet.
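For reference, a textbook 2-opt pass looks roughly like the Python sketch below. It is the naive variant: each pass over all segment pairs costs O(n²), and passes repeat until nothing improves, so it does not include the trickery mentioned above. It assumes a symmetric distance function and treats the tour as an open path with fixed endpoints; both are assumptions of this sketch, and `two_opt` is a made-up name.

```python
def two_opt(order, dist):
    """Improve a path by reversing segments while that shortens it.

    `order` is a list of vertex ids, `dist(u, v)` a symmetric distance.
    The first and last vertex stay in place.
    """
    order = list(order)
    n = len(order)
    improved = True
    while improved:
        improved = False
        for i in range(1, n - 1):
            for j in range(i + 1, n - 1):
                a, b = order[i - 1], order[i]
                c, d = order[j], order[j + 1]
                # Reversing order[i..j] replaces edges (a,b) and (c,d)
                # with (a,c) and (b,d); everything in between keeps its length.
                if dist(a, c) + dist(b, d) < dist(a, b) + dist(c, d) - 1e-12:
                    order[i:j + 1] = reversed(order[i:j + 1])
                    improved = True
    return order

# Five points on a line, visited in an order that doubles back on itself.
pos = [0.0, 1.0, 2.0, 3.0, 4.0]
dist = lambda u, v: abs(pos[u] - pos[v])

refined = two_opt([0, 3, 2, 1, 4], dist)
```

In this example a single segment reversal turns the path into `[0, 1, 2, 3, 4]`, reducing its length from 8 to 4.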