Viktor Lofgren 1d34224416 (refac) Remove src/main from all source code paths.
Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one.

While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules.  Which you'll do a lot, because it's *modul*ar.  The src/main/java convention makes a lot of sense for a non-modular project though.  This ain't that.
2024-02-23 16:13:40 +01:00
java/nu/marginalia/rwf (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
test/nu/marginalia/rwf (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
build.gradle (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
readme.md (doc) Update RandomWriteFunnel documentation 2024-02-06 12:35:24 +01:00

This micro-library provides strategies for solving the problem of write amplification when writing large files to disk out of order. It offers a simple API for writing data to a file in random order while localizing the writes.

Several strategies are available through the RandomFileAssembler interface:

  • Writing to a memory-mapped file (a non-solution, suitable only for small files)
  • Writing to a memory buffer (for systems with enough memory)
  • RandomWriteFunnel (not bound by memory)
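To make the memory-buffer strategy concrete, here is a minimal Java sketch. The interface LongFileAssembler and the class HeapBufferAssembler are hypothetical names invented for illustration; the real RandomFileAssembler methods may look different. The sketch assumes the output file is an array of longs addressed by slot index, small enough to fit in a Java array.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical interface for illustration; not the library's RandomFileAssembler API.
interface LongFileAssembler extends AutoCloseable {
    void put(long index, long value) throws IOException;
    void write(Path output) throws IOException;
    @Override void close() throws IOException;
}

// Memory-buffer strategy: random puts only touch a heap array, and the
// file is produced with a single sequential write at the end.
class HeapBufferAssembler implements LongFileAssembler {
    private final long[] values;

    HeapBufferAssembler(int size) {
        values = new long[size];
    }

    @Override
    public void put(long index, long value) {
        values[Math.toIntExact(index)] = value;
    }

    @Override
    public void write(Path output) throws IOException {
        // Lay the values out in native byte order, matching the format described in this readme
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Long.BYTES)
                .order(ByteOrder.nativeOrder());
        buffer.asLongBuffer().put(values);
        try (FileChannel ch = FileChannel.open(output, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buffer.hasRemaining()) ch.write(buffer);
        }
    }

    @Override
    public void close() {
        // Nothing to release: everything lives on the heap
    }
}
```

The disk only ever sees one sequential write here, which is exactly why this strategy requires the whole file to fit in memory.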

The data is written in native byte order.
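Native byte order matters mostly to whoever reads the file back: the reader has to decode with the same order the writer used. The sketch below shows this with plain java.nio.ByteBuffer; NativeOrderDemo and its methods are made-up names for illustration, not part of the library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative helpers (hypothetical names) for native-order encoding of 8-byte values.
class NativeOrderDemo {
    // Encode a long the way a native-byte-order writer lays it out on disk.
    static byte[] encodeNative(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.nativeOrder())
                .putLong(value)
                .array();
    }

    // Decode 8 bytes using an explicit byte order chosen by the reader.
    static long decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getLong();
    }
}
```

On the same machine, decode(encodeNative(v), ByteOrder.nativeOrder()) returns v, while decoding with the opposite order returns Long.reverseBytes(v). Files in native order are therefore not portable between little- and big-endian machines without conversion.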

RandomWriteFunnel

The RandomWriteFunnel solves the problem by bucketing the writes into several temporary files, which are then processed one at a time to construct the larger file with a more predictable write order.
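The bucketing idea can be sketched as follows. This is not the library's code: BucketingFunnel, its constructor parameters, and the bin-file layout (pairs of longs holding index and value) are assumptions made for illustration. Each put() appends to the temporary file that owns the target index, so writes during the random phase are sequential appends. write() then replays one bin at a time in memory and emits it as a single contiguous chunk.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch of write bucketing; not the library's RandomWriteFunnel code.
class BucketingFunnel implements AutoCloseable {
    private final Path tempDir;
    private final long size;   // number of long slots in the output file
    private final int binSize; // number of slots covered by each temporary bin file
    private final DataOutputStream[] bins;

    BucketingFunnel(Path tempDir, long size, int binSize) throws IOException {
        this.tempDir = tempDir;
        this.size = size;
        this.binSize = binSize;
        int binCount = Math.toIntExact((size + binSize - 1) / binSize);
        bins = new DataOutputStream[binCount];
        for (int i = 0; i < binCount; i++) {
            bins[i] = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(binPath(i))));
        }
    }

    private Path binPath(int bin) {
        return tempDir.resolve("bin-" + bin + ".dat");
    }

    // Random-order put: append an (index, value) record to the bin owning that index.
    // Each bin file only grows at its end, so these writes are sequential.
    void put(long index, long value) throws IOException {
        DataOutputStream bin = bins[Math.toIntExact(index / binSize)];
        bin.writeLong(index);
        bin.writeLong(value);
    }

    // Replay bins in address order: assemble one bin's slots in memory, then write
    // them to the output as a single contiguous chunk in native byte order.
    void write(FileChannel out) throws IOException {
        for (DataOutputStream bin : bins) bin.close(); // flush pending records
        long[] values = new long[binSize];
        ByteBuffer chunk = ByteBuffer.allocate(binSize * Long.BYTES).order(ByteOrder.nativeOrder());

        for (int b = 0; b < bins.length; b++) {
            long start = (long) b * binSize;
            int slots = (int) Math.min(binSize, size - start);
            Arrays.fill(values, 0L); // slots that were never written stay zero

            Path binFile = binPath(b);
            long records = Files.size(binFile) / (2 * Long.BYTES);
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(Files.newInputStream(binFile)))) {
                for (long r = 0; r < records; r++) {
                    long index = in.readLong();
                    values[(int) (index - start)] = in.readLong(); // later puts win
                }
            }
            Files.delete(binFile);

            chunk.clear();
            chunk.asLongBuffer().put(values, 0, slots);
            chunk.limit(slots * Long.BYTES);
            long pos = start * Long.BYTES;
            while (chunk.hasRemaining()) pos += out.write(chunk, pos);
        }
    }

    @Override
    public void close() throws IOException {
        for (int i = 0; i < bins.length; i++) {
            bins[i].close();
            Files.deleteIfExists(binPath(i));
        }
    }
}
```

Memory use in this sketch is bounded by one bin (binSize longs) rather than the whole file, at the cost of writing every value more than once. It also keeps one open stream per bin, so binSize should be chosen to keep the bin count modest.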

Even though it effectively writes 2.5x as much data to disk as constructing the file directly, it is much faster than thrashing an SSD with dozens of gigabytes of small random writes, which is what tends to happen if you naively mmap a file larger than system RAM and write to it in random order.

Demo

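try (var rwf = new RandomWriteFunnel(tmpPath, expectedSize);
     var out = Files.newByteChannel(outputFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE))
{
    // Values may be put at their addresses in any order
    rwf.put(addr1, data1);
    rwf.put(addr2, data2);
    // ...
    rwf.put(addr1e33, data1e33);

    // Assemble the bucketed writes into the output file
    rwf.write(out);
}
catch (IOException ex) {
    // handle the I/O error
}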

Central Classes