I'm writing a program that stores files addressed by their hash digest, and now I'm wondering which hashing algorithm to use.
I was inclined to go with BLAKE2b because of its speed and security, but it isn't used much, and I'd rather use something more common so I can possibly deduplicate without having to hash the same file twice.
I'm guessing SHA256 is the most commonly used (8chan and Nanochan use it) and still secure enough (though not against length extension attacks).
But it is also twice as slow as BLAKE2b... So what should I do: use something almost no one uses but that is more secure and faster, or something less secure and slower but more commonly used, so its hashes can be reused for deduplication without hashing a file twice?
Or should I simply hash the file multiple times... (which kind of defeats the performance gains of BLAKE2b)
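For reference, the storage scheme described above fits in a few lines. This is a minimal sketch, assuming an on-disk directory of objects named by digest; the function name, directory layout, and choice of blake2b are mine, not a recommendation:

```python
import hashlib
import os

def store(data: bytes, root: str = "objects") -> str:
    """Store `data` under its BLAKE2b digest; deduplication falls out for free."""
    digest = hashlib.blake2b(data).hexdigest()
    os.makedirs(root, exist_ok=True)
    path = os.path.join(root, digest)
    if not os.path.exists(path):  # identical content -> same digest -> skip the write
        with open(path, "wb") as f:
            f.write(data)
    return digest
```

Swapping in sha256 is a one-line change, which is part of why the choice matters mostly for interop with other systems' hashes.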
>>1861 If you have a (((64-bit))) processor, you should use SHA-512 because it is faster on 64-bit. Otherwise, use SHA-256.
You don't need to worry about security at all, unless you are going to allow people to upload files. If you don't need others to have write access, just use MD5. The chance of accidental collisions is still extremely small.
Here are results of a random benchmark (lower is better):
>BLAKE2b = [0.24815844299882883, 0.24455917099840008, 0.24289618100010557]
>SHA512 = [0.4054747469999711, 0.400344555004267, 0.40310188900184585]
>SHA256 = [0.5319015080021927, 0.527558287998545, 0.5294597870015423]
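Numbers like these can be reproduced with nothing but the stdlib. A rough sketch; the buffer size and repeat count here are my guesses, not the original poster's setup:

```python
import hashlib
import timeit

data = bytes(32 * 1024 * 1024)  # 32 MiB test buffer (size is an assumption)

def bench(algo: str, repeat: int = 3) -> list:
    """Time one full hash of `data`, `repeat` times, like timeit.repeat output."""
    return timeit.repeat(lambda: hashlib.new(algo, data).digest(),
                         number=1, repeat=repeat)

for algo in ("blake2b", "sha512", "sha256"):
    print(algo, bench(algo))
```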
IPFS uses SHA256 by default as well, I think. Anyway, if I were to use BLAKE2b, would it be recommended to simply truncate the hash? I know BLAKE2b has an option to produce smaller digests, but those are entirely different from the larger hash.
>>1864 Whatever the fuck you do, stick with it. Don't be like IPFS and have multiple different hashing algos. That's total cancer and the main reason why I couldn't be bothered with the piece of shitware after using it for a while.
If you want future-proofing, just use blake2b and FUCKING STICK WITH IT.
>>1866 >FUCKING STICK WITH IT.
Agreed. I hate what IPFS does, it's retarded.
The reason I want to simply truncate BLAKE2b hashes is that, no matter how long a digest you want, you'll still be able to deduplicate files:
>da634c195b5050d8038ce1b83da6d382
>da634c195b5050d8038ce1b83da6d382882ce5914245ec866395a508ef972e39
>da634c195b5050d8038ce1b83da6d382882ce5914245ec866395a508ef972e39a6c39bcd60100e78311f401e9b8cdef7f0b920849e7f0201d83d44e3d76fee09
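The prefix property above is easy to check with hashlib; note that blake2b's built-in `digest_size` parameter is a different animal, because the digest length is mixed into BLAKE2's parameter block. The sample data here is made up, not the file behind the digests above:

```python
import hashlib

data = b"some file contents"  # placeholder input

full = hashlib.blake2b(data).hexdigest()  # 64-byte digest, 128 hex chars
print(full[:32])   # 16-byte ID: a prefix of...
print(full[:64])   # ...the 32-byte ID, which is a prefix of `full`

# digest_size changes the parameter block, so the output is unrelated:
short = hashlib.blake2b(data, digest_size=16).hexdigest()
assert short != full[:32]
```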
>>1868 On the topic of using outdated crypto: when hakase switched from a hashing function to a key derivation function, he chose bcrypt. While bcrypt is still strong, it's fairly outdated; in fact, bcrypt is almost as old as me. Currently the best KDF to use is argon2id.
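argon2id needs a third-party package in Python, but the shape of a modern KDF call is the same across them. As a stdlib stand-in, here's scrypt (also memory-hard, which bcrypt is not); the n/r/p parameters below are illustrative, not tuned recommendations:

```python
import hashlib
import os

def derive_key(password: bytes, salt: bytes) -> bytes:
    # n/r/p control CPU and memory cost; tune to your hardware and threat model.
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)

salt = os.urandom(16)
key = derive_key(b"correct horse battery staple", salt)
# Same password + same salt -> same key; that's what gets stored or compared.
assert derive_key(b"correct horse battery staple", salt) == key
```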
>>1870 I've been wanting to use Argon2 for some projects, but the thing that kept me from using it was the lack of libraries for Python (I really need to invest my time in other languages).
Anyway, I have a few encrypted disks set up with SHA-256. Should I re-encrypt? Can length extension attacks affect me? Does that mean that if it's breakable in 30 days, it's breakable in 60 at worst? I used several times the default iteration count.
>>1914 With LUKS/cryptsetup, AES-256, but you know what I meant.
It should compare the password I input with the hash, right? If the hash is known, I'm defenseless, right? That's the problem: I couldn't find out how exactly the whole process works.
>>1915 >It should compare password I input with the hash, right?
Not exactly. Here, read this research paper and maybe you'll understand how it works better.
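In case a concrete sketch helps: disk encryption schemes generally don't store a hash of your password at all. They run the passphrase through a KDF and use the result as a key (or to unwrap the real volume key). A toy version of the check, using stdlib PBKDF2; the function names and stored layout are mine, not LUKS's actual on-disk format:

```python
import hashlib
import hmac
import os

def enroll(passphrase: bytes):
    """Store a salt plus a digest of the *derived key*, never a passphrase hash."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100_000)
    return salt, hashlib.sha256(key).digest()

def unlock(passphrase: bytes, salt: bytes, key_digest: bytes) -> bool:
    key = hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100_000)
    return hmac.compare_digest(hashlib.sha256(key).digest(), key_digest)
```

So knowing the stored digest alone doesn't hand an attacker the key; they still have to grind the KDF once per guess.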
>>1869 How is that confusing? It's the least confusing naming standard computer science has: algorithm name and bit length.
>>1870 AFAIK, bcrypt is still secure, has been properly tested by the industry, and is widely adopted across platforms. Newer doesn't mean less likely to be broken.
>>1861 >still secure enough (though not against length extension attacks)
that's not a vuln nigger that's a retard not understanding how to use a cryptographic hashing algorithm. also length extension attacks don't apply to deduplication
>>1864 >(lower is better)
thank you for this clickbait
>>1867 >wants to truncate hashes
typical nigger bullshit. if you want guarantees you don't fuck with the hash. if you're engineering properly to Zooko's triangle, a hash being 'too big' to write down does not matter
>>1915 >With LUKS/cryptsetup, AES-256, but you know what I meant.
No, we fucking don't.
>>3912 Retard, he isn't hashing passwords. Different use cases favor different hashing algorithms. A purposefully slow hashing algorithm is not fit for what OP wants (and needs).