author Runxi Yu 2026-03-11 11:43:17 +0800
committer Runxi Yu 2026-03-11 11:43:17 +0800
commit 36a878e8a7736bebe852c66cc4e97e9711ee2124
tree 7ecc947c3a5f1a5a1fbc71fe9b75b5f477060e9b /research/dynamic_packfiles.txt
parent README: go-git perf improved
signature No signature
research: dpack update
Diffstat (limited to 'research/dynamic_packfiles.txt')
-rw-r--r-- research/dynamic_packfiles.txt 35
1 files changed, 25 insertions, 10 deletions
diff --git a/research/dynamic_packfiles.txt b/research/dynamic_packfiles.txt
index 66c5d5a1..be5c5fd9 100644
--- a/research/dynamic_packfiles.txt
+++ b/research/dynamic_packfiles.txt
@@ -9,19 +9,34 @@ then, if desired, the repack process removes all the punched holes
and anything surrounding from unwanted objects that are slightly out
of the page boundary
-.idx is not a bsearch because that would cause me to need to rewrite
-the entire pack every time i add objects; instead use an extendible
-hash table.
-
generational bloom filters
+idx design
+==========
+
+so, let's first get our invariants and patterns clear.
+
+* fixed-length cryptographic object IDs
+* essentially uniform key distribution
+* exact lookup only, no range scans, no ordered iteration requirements
+* reads are extremely important
+* writes are mostly append-like
+* deletes/tombstones may happen later but are secondary
+
+1st design
+----------
+
+* mutable front index
+* immutable base index
+* periodic merge/compaction into a new base generation
-research bitcask
+upload-pack/send-pack/repack
+============================
-fetch: take current pack, remove dead objects/holes, filter objects
-out, record offsets and adjust ofs_deltas since they always go
-backwards, write the pack back; then stream written pack to client.
-two-step necessary because pack header includes object count; could
-have a custom new protocol that doesn't do so.
+take the current pack, remove dead objects/holes, filter out unwanted
+objects, record the new offsets and adjust ofs_deltas (they always point
+backwards, to an earlier offset), write the pack back; then stream the
+written pack to the client. two steps are necessary because the pack header
+includes the object count; a custom new protocol could avoid this.
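
Not part of the patch: a minimal sketch of the offset bookkeeping described in the upload-pack/send-pack/repack note, assuming each pack entry can be modeled by its source offset, its size without the ofs-delta distance field, and an optional base offset. `Entry`, `ofs_len`, and `plan_rewrite` are hypothetical names for illustration only. Because an ofs_delta always refers to an earlier entry, a single forward pass already knows the base's new offset when it reaches the delta. A delta whose base was dropped would need re-encoding, which this sketch only reports as an error.

```python
from dataclasses import dataclass
from typing import Optional

HEADER_SIZE = 12  # "PACK" signature, version, object count: 4 bytes each


@dataclass
class Entry:
    offset: int                        # offset in the source pack
    size: int                          # entry bytes, excluding the ofs-delta distance field
    base_offset: Optional[int] = None  # source offset of the base, if this is an ofs_delta
    live: bool = True                  # False for dead objects, holes, or filtered-out objects


def ofs_len(distance: int) -> int:
    """Number of bytes git's ofs-delta distance encoding needs for `distance`."""
    n = 1
    distance >>= 7
    while distance:
        distance -= 1
        n += 1
        distance >>= 7
    return n


def plan_rewrite(entries):
    """Return (object_count, [(entry, new_offset, new_distance_or_None), ...])."""
    new_offset = {}  # source offset -> offset in the rewritten pack
    plan = []
    pos = HEADER_SIZE
    for e in sorted(entries, key=lambda e: e.offset):
        if not e.live:
            continue  # dropped: everything after it shifts towards the front
        distance = None
        size = e.size
        if e.base_offset is not None:
            if e.base_offset not in new_offset:
                raise ValueError(f"base at {e.base_offset} was dropped")
            # The base was placed earlier in this same pass, so the new
            # (possibly shorter) distance is known before this entry is sized.
            distance = pos - new_offset[e.base_offset]
            size += ofs_len(distance)
        new_offset[e.offset] = pos
        plan.append((e, pos, distance))
        pos += size
    return len(plan), plan
```

The object count returned here is only known once the whole pass has finished, which is the reason the note gives for needing two steps before anything can be streamed behind a standard pack header.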