With Git 2.31 (Q1 2021), the "git rev-list https://github.com/git/git/blob/6fe12b5215f4ca597accc97ac5dce0f88e8483e9/Documentation/git-rev-list.txt" (man https://git-scm.com/docs/git-rev-list) command learned the --disk-usage option.
It comes with a lot of examples https://github.com/git/git/commit/a1db097e107749cf6f1c2dc878c615ca2d3fb314, but for the size of a branch, the command is now:
git rev-list --disk-usage=human --objects HEAD..<branch_name>
For all branches:
# Report the disk size of each branch, not including objects used by the
# current branch. This can find outliers that are contributing to a
# bloated repository size (e.g., because somebody accidentally committed
# large build artifacts).
git for-each-ref --format='%(refname)' |
while read -r branch
do
	size=$(git rev-list --disk-usage --objects "HEAD..$branch")
	echo "$size $branch"
done |
sort -n
See commit a1db097 https://github.com/git/git/commit/a1db097e107749cf6f1c2dc878c615ca2d3fb314, commit 669b458 https://github.com/git/git/commit/669b4587555597da7f2a875b95dc50f503b8187f (17 Feb 2021), and commit 16950f8 https://github.com/git/git/commit/16950f8384afa5106b1ce57da07a964c2aaef3f7, commit 3803a3a https://github.com/git/git/commit/3803a3a0993045605d7f3db363188ce377e917c8 (09 Feb 2021) by Jeff King (peff) https://github.com/peff.
(Merged by Junio C Hamano -- gitster -- https://github.com/gitster in commit 6fe12b5 https://github.com/git/git/commit/6fe12b5215f4ca597accc97ac5dce0f88e8483e9, 25 Feb 2021)
rev-list https://github.com/git/git/commit/16950f8384afa5106b1ce57da07a964c2aaef3f7: add --disk-usage option for calculating disk usage
Signed-off-by: Jeff King
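It can sometimes be useful to see which refs are contributing to the overall repository size (e.g., does some branch have a bunch of objects not found elsewhere in history, which indicates that deleting it would shrink the size of a clone).
You can find that out by generating a list of objects, getting their sizes from cat-file, and then summing them, like: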
git rev-list --objects --no-object-names main..branch |
git cat-file --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
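Though note that the caveats from git-cat-file(1) apply here.
We "blame" base objects more than their deltas, even though the relationship could easily be flipped.
Still, it can be a useful rough measure.
But one problem is that it's slow to run.
Teaching rev-list to sum up the sizes can be much faster for two reasons:
- It skips all of the piping of object names and sizes.
- If bitmaps are in use, for objects that are in the bitmapped packfile we can skip the oid_object_info() lookup entirely, and just ask the revindex for the on-disk size.
This patch implements a --disk-usage option which produces the same answer in a fraction of the time.
Here are some timings using a clone of torvalds/linux: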
[rev-list piped to cat-file, no bitmaps]
$ time git rev-list --objects --no-object-names --all |
git cat-file --buffer --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
1459938510
real 0m29.635s
user 0m38.003s
sys 0m1.093s
[internal, no bitmaps]
$ time git rev-list --disk-usage --objects --all
1459938510
real 0m31.262s
user 0m30.885s
sys 0m0.376s
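Even though the wall-clock time is slightly worse due to parallelism, notice the CPU savings between the two.
We saved 21% of the CPU just by avoiding the pipes.
But the real win is with bitmaps.
If we use them without the new option: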
[rev-list piped to cat-file, bitmaps]
$ time git rev-list --objects --no-object-names --all --use-bitmap-index |
git cat-file --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
1459938510
real 0m6.244s
user 0m8.452s
sys 0m0.311s
then we at least generate the list of objects faster, but we still spend a lot of time piping and looking things up.
But if we do both together:
[internal, bitmaps]
$ time git rev-list --disk-usage --objects --all --use-bitmap-index
1459938510
real 0m0.219s
user 0m0.169s
sys 0m0.049s
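then we get the same answer much faster.
For "--all", that answer will correspond closely to "du objects/pack", of course.
But we're actually checking reachability here, so we're still fast when we ask for more interesting things: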
$ time git rev-list --disk-usage --use-bitmap-index v5.0..v5.10
374798628
real 0m0.429s
user 0m0.356s
sys 0m0.072s
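rev-list-options now includes in its man page https://github.com/git/git/blob/16950f8384afa5106b1ce57da07a964c2aaef3f7/Documentation/rev-list-options.txt#L230-L238:
--disk-usage
Suppress normal output; instead, print the sum of the bytes used for on-disk storage by the selected commits or objects. This is equivalent to piping the output into git cat-file --batch-check='%(objectsize:disk)', except that it runs much faster (especially with --use-bitmap-index). See the CAVEATS section in git cat-file https://git-scm.com/docs/git-cat-file for the limitations of what "on-disk storage" means.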
With Git 2.38 (Q3 2022), "git rev-list --disk-usage https://github.com/git/git/blob/fddd8b4801b51234b2dd525c35d74e2a578638fd/Documentation/rev-list-options.txt#L244" (man https://git-scm.com/docs/git-rev-list#Documentation/git-rev-list.txt---disk-usage) learned to take an optional value human to show the reported value in human-readable format, like "3.40MiB".
See commit 9096451 https://github.com/git/git/commit/9096451acdf065c3dbcf609dcefe51cd68aa5d1e (11 Aug 2022) by Li Linchao (Cactusinhand) https://github.com/Cactusinhand.
(Merged by Junio C Hamano -- gitster -- https://github.com/gitster in commit fddd8b4 https://github.com/git/git/commit/fddd8b4801b51234b2dd525c35d74e2a578638fd, 18 Aug 2022)
rev-list https://github.com/git/git/commit/9096451acdf065c3dbcf609dcefe51cd68aa5d1e: support human-readable output for --disk-usage
Signed-off-by: Li Linchao
The '--disk-usage' option for git-rev-list https://github.com/git/git/blob/9096451acdf065c3dbcf609dcefe51cd68aa5d1e/Documentation/git-rev-list.txt (man https://git-scm.com/docs/git-rev-list) was introduced in 16950f8 https://github.com/git/git/commit/16950f8384afa5106b1ce57da07a964c2aaef3f7 ("rev-list: add --disk-usage option for calculating disk usage", 2021-02-09, Git v2.31.0-rc0 -- merge https://github.com/git/git/commit/6fe12b5215f4ca597accc97ac5dce0f88e8483e9).
This is very useful for people inspecting the object usage of their Git repository, but the resulting number is hard for a human to read.
Teach git rev-list to output a human-readable result when using '--disk-usage=human'.
rev-list-options now includes in its man page https://github.com/git/git/blob/9096451acdf065c3dbcf609dcefe51cd68aa5d1e/Documentation/rev-list-options.txt#L253-L254:
With the optional value human, on-disk storage size is shown in human-readable string (e.g. 12.24 Kib, 3.50 Mib).
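On a Git older than 2.38, where --disk-usage only prints a raw byte count, the number can be converted by hand. The following is a rough sketch: human_size is a hypothetical helper, not part of Git, and its output format is an assumption that may not match what --disk-usage=human prints.

```shell
#!/bin/sh
# human_size is a hypothetical helper (not a Git command).
# It converts a byte count into a string with a binary (1024-based) unit.
human_size() {
	awk -v bytes="$1" 'BEGIN {
		split("B KiB MiB GiB TiB", unit, " ")
		i = 1
		# Divide by 1024 until the value fits the current unit.
		while (bytes >= 1024 && i < 5) {
			bytes /= 1024
			i++
		}
		printf "%.2f %s\n", bytes, unit[i]
	}'
}

# Byte count taken from the v5.0..v5.10 timing run quoted earlier.
human_size 374798628    # prints 357.44 MiB
```

In practice the argument would come from a command substitution such as human_size "$(git rev-list --disk-usage --objects HEAD..<branch_name>)".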
With Git 2.43 (Q4 2023), "git for-each-ref --sort=contents:size https://github.com/git/git/blob/6a4e7440fb4b20822e1854925c0dcfae0c64402d/Documentation/git-for-each-ref.txt#L46" (man https://git-scm.com/docs/git-for-each-ref#Documentation/git-for-each-ref.txt---sortltkeygt) sorts the refs numerically by size, so a ref that points at a twelve-byte (12) blob is shown before one that points at a hundred-byte (100) blob.
See commit 6d79cd8 https://github.com/git/git/commit/6d79cd8474b7bb4979f2a7544fd736bed190261a (02 Sep 2023) by Kousik Sanagavarapu (five-sh) https://github.com/five-sh.
(Merged by Junio C Hamano -- gitster -- https://github.com/gitster in commit 6a4e744 https://github.com/git/git/commit/6a4e7440fb4b20822e1854925c0dcfae0c64402d, 14 Sep 2023)
ref-filter https://github.com/git/git/commit/6d79cd8474b7bb4979f2a7544fd736bed190261a: sort numerically when ":size" is used
Helped-by: Jeff King
Signed-off-by: Kousik Sanagavarapu
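Atoms like "raw" and "contents" have a ":size" option which can be used to know the size of the data.
Since these atoms have the cmp_type FIELD_STR, they are sorted alphabetically from 'a' to 'z' and '0' to '9'.
This means that even when the ":size" option is used and what we ultimately have is numbers, we still sort alphabetically.
For example, consider the following case in a repo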
refname             contents:size  raw:size
=======             =============  ========
refs/heads/branch1  1130           1210
refs/heads/master   300            410
refs/tags/v1.0      140            260
Sorting with --format="%(refname) %(contents:size)" --sort=contents:size would give
refs/heads/branch1 1130
refs/tags/v1.0.0 140
refs/heads/master 300
which is an alphabetic sort, while what one might really expect is
refs/tags/v1.0.0 140
refs/heads/master 300
refs/heads/branch1 1130
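which is a numeric sort (that is, "$ sort -n file" as opposed to "$ sort file", where "file" contains only the "contents:size" or "raw:size" info, each of which is on a newline).
The same is the case with "--sort=raw:size".
So, sort numerically whenever the sort is done with "contents:size" or "raw:size", and do it the normal alphabetic way when "contents" or "raw" are used with some other option (they are FIELD_STR anyways).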
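The difference between the two orderings can be reproduced outside Git with plain sort, as the commit message suggests. This is a small sketch using the three contents:size values from the example; the variable names are made up for illustration.

```shell
#!/bin/sh
# The three sizes from the contents:size column, one per line.
sizes=$(printf '%s\n' 1130 300 140)

# Plain sort compares character by character, which is the alphabetic
# ordering the commit message attributes to FIELD_STR.
alphabetic=$(printf '%s\n' "$sizes" | LC_ALL=C sort)

# sort -n compares the values as numbers.
numeric=$(printf '%s\n' "$sizes" | LC_ALL=C sort -n)

# Unquoted expansion joins the lines with spaces for compact display.
echo "alphabetic: $(echo $alphabetic)"    # 1130 140 300
echo "numeric:    $(echo $numeric)"       # 140 300 1130
```

The alphabetic result puts 1130 first because the character "1" sorts before "3", regardless of how many digits follow.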