# Furgit

[![builds.sr.ht status](https://builds.sr.ht/~runxiyu/furgit.svg)](https://builds.sr.ht/~runxiyu/furgit)
[![Go Reference](https://pkg.go.dev/badge/codeberg.org/lindenii/furgit.svg)](https://pkg.go.dev/codeberg.org/lindenii/furgit)

Furgit is a low-level Git library in Go.

## Project status

* Several years away from stable
* Do not use in production
* Mature alternative: [go-git](https://github.com/go-git/go-git)
* Will use [Semantic Versioning 2.0.0](https://semver.org/spec/v2.0.0.html) starting at 1.0.0

## Goals

* General-purpose Git plumbing library for UNIX-like systems
* Aim for clear architecture first, then high performance
* Expect familiarity with Git internals

## Finding your way around

If you are working with an on-disk repository, start with
`repository.Open(...)`. It opens the repository and wires together the refs storage, object
storage, and resolver.

That gives you a repository handle with a few different entry points, but they
serve different purposes:

* `repo.Refs()` is for branch names, tags, `HEAD`, and ref updates.
  * Use it when you are starting from names rather than object IDs.
  * A common pattern is to resolve a ref first, then pass the resulting object
    ID to the resolver.

* `repo.Resolver()` is the main object-facing API for most callers.
  * Use it when you want commits, trees, blobs, or tags as typed values.
  * It also handles peeling through annotated tags, resolving objects to the
    type you actually want, and walking paths inside trees.
  * It even allows you to access a tree as an `io/fs.FS`.
  * If your goal is "show me this commit", "read this tree", "follow this tag",
    or "get me the file at this path", this is usually the right layer.

* `repo.Objects()` is the storage layer underneath resolution.
  * Use it when you need to read object headers, read raw object contents,
    stream object data, or otherwise look up objects directly by ID.
  * Most callers who want to work with Git objects as commits, trees, blobs, or
    tags should prefer the resolver instead.
  * However, checking an object's size and type is a fairly common operation,
    and it should be done here.
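
Because a tree can be exposed as an `io/fs.FS`, standard-library helpers such as `fs.WalkDir` should work on it unchanged. The sketch below assumes nothing about Furgit's API: it uses `testing/fstest.MapFS` as a stand-in for a tree, and `fileSizes` and `demoTree` are hypothetical names introduced only for illustration. How the `fs.FS` is actually obtained from `repo.Resolver()` is not shown.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// fileSizes walks fsys and returns the size of every non-directory
// entry, keyed by its slash-separated path.
func fileSizes(fsys fs.FS) (map[string]int64, error) {
	sizes := make(map[string]int64)
	err := fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		sizes[path] = info.Size()
		return nil
	})
	return sizes, err
}

// demoTree is an in-memory stand-in for a Git tree. It is NOT
// produced by Furgit; any fs.FS would do here.
func demoTree() fs.FS {
	return fstest.MapFS{
		"README.md":   {Data: []byte("# demo\n")},
		"cmd/main.go": {Data: []byte("package main\n")},
	}
}

func main() {
	sizes, err := fileSizes(demoTree())
	if err != nil {
		panic(err)
	}
	fmt.Println(len(sizes), sizes["README.md"], sizes["cmd/main.go"])
}
```

Run as-is, this prints `2 7 13`. Since `fileSizes` only depends on `fs.FS`, the same function could be pointed at a tree-backed filesystem.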

Some object concepts are kept separate:

* `object` contains parsed Git object values such as blobs, trees, commits, and
  tags. These are the decoded contents of Git objects and do not tell you
  anything about the object's identity.

* `object/stored` wraps a parsed object together with the object ID it was
  loaded from. This is used when you need both the parsed value and the
  identity it was loaded under.
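
To make the distinction between value and identity concrete: in Git, an object ID is the hash of the object's canonical serialization, `<type> <size>\0<body>`, so the same blob contents have a different ID under SHA-1 and SHA-256. The sketch below uses only the Go standard library; `objectID`, `blobIDSHA1`, and `blobIDSHA256` are hypothetical helper names, not Furgit functions.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
)

// objectID hashes the canonical Git serialization of an object:
// "<type> <size>\x00" followed by the body.
func objectID(newHash func() hash.Hash, objType string, body []byte) string {
	h := newHash()
	fmt.Fprintf(h, "%s %d\x00", objType, len(body))
	h.Write(body)
	return hex.EncodeToString(h.Sum(nil))
}

// blobIDSHA1 returns the hex object ID of a blob in a SHA-1 repository.
func blobIDSHA1(body []byte) string { return objectID(sha1.New, "blob", body) }

// blobIDSHA256 returns the hex object ID of a blob in a SHA-256 repository.
func blobIDSHA256(body []byte) string { return objectID(sha256.New, "blob", body) }

func main() {
	body := []byte("hello\n")
	// Same value as `echo hello | git hash-object --stdin`:
	// ce013625030ba8dba906f756967f9e9ca394464a
	fmt.Println(blobIDSHA1(body))
	// Same contents, different identity under the other algorithm.
	fmt.Println(blobIDSHA256(body))
}
```

The parsed blob is the same byte slice in both cases; only the identity differs, which is why a wrapper that carries the ID alongside the value is useful.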

As a rule of thumb:

* If you have a ref name, start with `repo.Refs()`.
* If you want typed objects or path-based access, use `repo.Resolver()`.
* If you need raw object lookup by ID, object headers, or object streams, use
  `repo.Objects()`.

Some useful operations are built separately and are meant to be constructed
over the stores that `Repository` already exposes:

* To check whether one revision is an ancestor of another, or to compute merge
  bases, construct a `commitquery.Query` over `repo.Objects()`.
  * This is the tool to reach for when you already have object IDs and want to
    ask commit-history questions.
  * If you already have a commit-graph reader, pass it in as well for
    performance.

* To walk commits or all reachable objects from a set of starting points,
  construct a `reachability.Reachability` over `repo.Objects()`.
  * Use commit traversal when you only care about history, and full object
    traversal when you care about the complete reachable object set.
  * This is useful for tasks such as connectivity checks and computing the
    object set that a fetch or push needs to account for.

* To accept pushes on the server side, construct `receivepack` or
  `receivepack/service` with the repository's ref store, object store, and
  object ID algorithm.
  * Push handling also needs the repository's object storage root so incoming
    objects can be quarantined and later promoted.
  * `Repository` does not currently expose that root directly (we'll consider
    possible solutions sometime later), so a push server usually keeps the
    repository path or object root handle alongside the `Repository` value.
  * Hook-based checks are just Go functions. For example, a fast-forward check
    can use `commitquery` over the existing and quarantined object stores. Some
    hooks are provided.
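
As a rough illustration of the ancestry question behind a fast-forward check, the sketch below runs a breadth-first walk over a toy in-memory commit graph. It does not use `commitquery` or any other Furgit API; the `parents` type and the `isAncestor`, `isFastForward`, and `demoGraph` names are assumptions made for this example, and a real implementation would read parents from an object store and could use a commit-graph to prune the walk.

```go
package main

import "fmt"

// parents maps a commit ID to its parent IDs. It is a toy stand-in
// for an object store, not a Furgit type.
type parents map[string][]string

// isAncestor reports whether anc is reachable from desc by following
// parent links. A commit counts as its own ancestor.
func isAncestor(g parents, anc, desc string) bool {
	seen := map[string]bool{desc: true}
	queue := []string{desc}
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		if id == anc {
			return true
		}
		for _, p := range g[id] {
			if !seen[p] {
				seen[p] = true
				queue = append(queue, p)
			}
		}
	}
	return false
}

// isFastForward reports whether moving a ref from oldID to newID only
// adds history, i.e. oldID is an ancestor of newID.
func isFastForward(g parents, oldID, newID string) bool {
	return isAncestor(g, oldID, newID)
}

// demoGraph builds A <- B <- C plus a diverged commit D whose parent is A.
func demoGraph() parents {
	return parents{"A": nil, "B": {"A"}, "C": {"B"}, "D": {"A"}}
}

func main() {
	g := demoGraph()
	fmt.Println(isFastForward(g, "B", "C")) // true: C descends from B
	fmt.Println(isFastForward(g, "C", "D")) // false: D does not contain C
}
```

In a push hook the new commit usually lives in the quarantined object store while the old one lives in the existing store, which is why the query has to be able to see both.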

## Features

* Configuration
  * [X] Parsing
  * [ ] Includes
  * [ ] Writing
* [X] Object IDs
  * [X] SHA-256
  * [X] SHA-1
* [X] Object model (incl. parse, serialize)
  * [X] Blobs
  * [X] Trees
    * [X] File mode definitions
    * [X] Entry insertion ordering
    * [X] Traversal
    * [ ] Pathspec
  * [X] Commits
  * [X] Annotated tags
  * [X] Stored objects
* Further cryptography
  * [ ] OpenPGP signatures
  * [ ] SSH signatures
* [X] Reading object stores
  * [X] Pluggable interface
  * [X] Chain lookup store
  * [X] Bundle store
  * [X] MRU lookup store
  * [X] Reading loose objects
  * [ ] Promisor remotes
  * [ ] Alternates
  * [X] Reading packed objects
    * [X] Pack index lookups
    * [X] Delta caching
    * [X] Delta application
    * [ ] Pack-wide bloom filters
    * [ ] Multi-pack indexes
* [ ] Writing objects
  * [X] Loose object writing
* Misc bundle features
  * [ ] Writing bundles
* Misc packfile features
  * [X] Writing pack indexes
  * [X] Writing reverse pack indexes
  * [ ] Writing packfiles
    * [ ] Writing thin packs
    * [ ] Compressing deltas
      * [ ] Delta islands
  * [ ] Pack verification
* Compression
  * [ ] Pluggable compression algorithms
  * [X] ZLIB support
  * [ ] DEFLATE optimizations
  * [X] Adler-32 SIMD optimizations
* [X] References
  * [X] Detached references
  * [X] Symbolic references
  * [X] Name verification/resolution
  * [X] Annotated tag ref peeling
  * [ ] Describe
  * [ ] Revision syntax
  * [ ] Namespaces
  * [ ] Replace refs, grafts
* [X] Reference stores
  * [X] Chain lookup store
  * [X] Files reference store
    * [X] Reading loose refs
    * [X] Reading packed refs
    * [X] Atomic writes
    * [X] Batched writes
    * [ ] Packing refs
    * [ ] Reflogs
  * [ ] Reftable
* Reachability
  * [X] Have/wants walks
  * [X] Is ancestor
  * [X] Merge bases
  * [X] Commit graph
    * [X] Changed path bloom filters
    * [X] Chained graphs
    * [ ] Writing
  * [ ] Reachability bitmaps
    * [ ] For a single packfile
    * [ ] For multi-pack indexes
* Misc repository
  * [X] Opening relevant stores
  * [ ] Creating repositories
  * [ ] Filter branch/repo
  * [ ] Fast import/export
  * [ ] Git notes
  * [ ] Git attributes
  * [ ] Pseudorefs
  * Integrity and maintenance
    * [ ] Fsck
    * [ ] Repacking
    * [ ] Garbage collection
    * [ ] Cruft packing
    * [ ] Expiration
  * [ ] Grep
  * [ ] Submodules
  * [ ] Worktrees
  * [ ] Archive
  * [ ] LFS
  * [ ] Revision log walk
    * [ ] Topological ordering
    * [ ] Date ordering
    * [ ] Path-limited
* [ ] Diffing
  * [ ] Blame
  * [ ] Annotate
  * [X] Tree diffing
    * [ ] Similarity/rename/copy detection
  * [ ] Multi-way diffs
  * [ ] Patch-id
  * [ ] Range-diff
  * Blob diffing
    * [ ] Word diffs
    * [X] Myers
    * [ ] Patience
    * [ ] Histogram
    * [ ] Three-way
  * [ ] Format patch
  * [ ] Apply/amend patch
* Branch integration/rewrite/etc methods
  * [ ] Merge
    * [ ] Recursive
    * [ ] ORT
  * [ ] Rebase
  * [ ] Cherry pick
  * [ ] Revert
  * [ ] Rerere
* Network protocols and related features
  * [X] pkt-line
  * [X] side-band-64k
  * [X] Ingesting packfiles
    * [X] Quarantine areas
    * [X] Un-thinning thin packs
  * Version 0, version 1 protocols
    * [X] Server side
      * [X] Reference advertisement
      * [X] Capability negotiation
      * [X] Receive
      * [ ] "Upload"
    * [ ] Client side
      * [ ] Send
      * [ ] Fetch
  * Version 2 protocol
    * [ ] Server side
      * [ ] "Upload"
    * [ ] Client side
      * [ ] Fetch
  * Protocol-independent logic
    * Common
      * [X] Progress meters
    * Client side
      * [ ] Refspec
      * [ ] Fetch
        * [ ] Partial clones
          * [ ] Object filtering
        * [ ] Bundle URI
        * [ ] Packfile URI
        * [ ] Shallow clones
      * [ ] Send
    * Server side
      * [ ] Upload
        * [ ] Object filtering
      * [X] Receive
        * [ ] Signed push
        * Hooks
          * Slots
            * [ ] After ref negotiation
            * [X] After object unpacking
          * Provided samples
            * [X] Chain
            * [X] Force push rejection
* [ ] Working trees
  * [ ] Stashing
  * [ ] Ignore rules
  * [ ] Checkouts
    * [ ] Sparse checkouts
    * [ ] CR/LF conversions
    * [ ] File mode conversions
  * [ ] Indexes
    * [ ] Conflict resolution
    * [ ] Split index
    * [ ] Sparse index
    * [ ] Untracked cache
  * [ ] Status listing
  * [ ] Filesystem monitor
  * [ ] Worktree
    * [ ] Common directory
    * [ ] Worktree-specific references
* Research
  * [ ] Dynamic packfiles
    * [ ] Compaction; page-sized hole punching
    * [ ] Dynamic indexing
      * [ ] Linear/extendible/spiral hashing
    * [ ] Dynamic reachability bitmaps

## Not planned

* CLI tools
* Clone
* Anything reasonably considered "porcelain"
* Credential helper
* Transports
* Auth
* Remote management
* Bisect
* Any use of env vars
* Repository discovery walking

I might make a second project that supports these.
Furgit will probably not, and will remain sans-IO.

## Benchmarks

* See [gitbench](https://git.sr.ht/~runxiyu/gitbench).
* Furgit on the `legacy` branch is slightly faster due to buffer reuse and a
  custom ZLIB implementation. These will be re-added.
* Test environment: Alpine edge, i5-10210U, `performance` governor, `linux.git`.
* go-git may become much faster when
  [#1894](https://github.com/go-git/go-git/pull/1894)
  and such are fully in use.
* These lone tests do not represent all workloads. Test your usage
  pattern yourself (and contribute to gitbench).

### Traversing all trees in `HEAD` and fetching each file size

Mainly tests the packfile object reader.

| Implementation | Total  | User   | System |
| -              | -      | -      | -      |
| Git            | 337 ms | 226 ms | 108 ms |
| libgit2        | 391 ms | 269 ms | 120 ms |
| Furgit         | 487 ms | 457 ms | 49 ms  |
| go-git         | 37 s   | 35 s   | 2 s    |

## Repos and mirrors

* [Codeberg](https://codeberg.org/lindenii/furgit) (with the canonical issue tracker)
* [SourceHut mirror](https://git.sr.ht/~runxiyu/furgit)
* [tangled mirror](https://tangled.org/@runxiyu.tngl.sh/furgit)
* [GitHub mirror](https://github.com/runxiyu/furgit)

## Community

* [#lindenii](https://webirc.runxiyu.org/kiwiirc/#lindenii)
  on [irc.runxiyu.org](https://irc.runxiyu.org)
* [#lindenii](https://web.libera.chat/#lindenii)
  on [Libera.Chat](https://libera.chat)

## History and lineage

* Lindenii Forge
* [hare-git](https://codeberg.org/lindenii/hare-git)
* Faster Git library needed for
  [Lindenii Villosa](https://codeberg.org/lindenii/villosa),
  the next generation of Lindenii Forge
* Translated hare-git and put it into `internal/common/git` in Villosa
* Extracted it out into this general-purpose library
* "Fur" is "git" left-shifted by 1 on QWERTY
* Some architectural elements inspired by [upstream Git](https://git-scm.com),
  OpenBSD's [Game of Trees](https://gameoftrees.org), and
  [9front Git](https://git.9front.org/plan9front/9front/HEAD/sys/src/cmd/git/f.html).

## Reporting bugs

Bug reports ideally include a reproduction recipe: a Go program which starts
out with an empty repository and calls Furgit and/or Git commands to trigger
undesirable behavior.

Please ask for help with writing your regression test before asking for your
problem to be fixed. Time invested in writing a regression test saves time
wasted on back-and-forth discussion about how the problem can be reproduced. A
regression test will need to be written in any case to verify a fix and prevent
the problem from resurfacing.

If writing an automated test really turns out to be impossible, please explain
in very clear terms how the problem can be reproduced.

## License

This project is licensed under the GNU Affero General Public License,
Version 3.0 only.

Pursuant to Section 14 of the GNU Affero General Public License, Version 3.0,
[Runxi Yu](https://runxiyu.org) is hereby designated as the proxy who is
authorized to issue a public statement accepting any future version of the
GNU Affero General Public License for use with this Program.

Therefore, notwithstanding the specification that this Program is licensed
under the GNU Affero General Public License, Version 3.0 only, a public
acceptance by the Designated Proxy of any subsequent version of the GNU Affero
General Public License shall permanently authorize the use of that accepted
version for this Program.

For the purposes of the Developer Certificate of Origin, the "open source
license" refers to the GNU Affero General Public License, Version 3.0, with the
above proxy designation pursuant to Section 14.

All contributors are required to "sign-off" their commits (using `git commit
-s`) to indicate that they have agreed to the [Developer Certificate of
Origin](https://developercertificate.org), reproduced below.

```
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.
```