Hacker News

Hamming Distance for Hybrid Search in SQLite


12 min read Via notnotp.com

Mewayz Team

Editorial Team


Hamming distance is a fundamental similarity metric that counts the differing bits between two binary strings, which makes it one of the fastest and most efficient techniques for approximate nearest-neighbor search in databases. Applied to SQLite through a hybrid search architecture, Hamming distance brings semantic search capabilities within reach without the cost of a dedicated vector database.

What Is Hamming Distance and Why Does It Matter for Database Search?

Hamming distance measures the number of positions at which two strings of equal length differ. For example, the two strings 10101100 and 10001101 have a Hamming distance of 2, because they differ in exactly two bit positions. In database search, this seemingly simple calculation turns out to be remarkably powerful.
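To make the example concrete, here is a minimal sketch of the calculation in Python. The function name `hamming_distance` is chosen for illustration; it simply XORs the two values and counts the set bits.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    # XOR leaves a 1 wherever the inputs differ; count those 1 bits.
    return bin(int(a, 2) ^ int(b, 2)).count("1")


print(hamming_distance("10101100", "10001101"))  # 2
```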

Traditional SQL search relies on exact matches or full-text indexes, which struggle with semantic similarity: finding results that are conceptually alike rather than ones that share keywords. Hamming distance bridges this gap by operating on binary hash codes derived from content embeddings, so that a database such as SQLite can compare millions of records in milliseconds using bitwise XOR operations.

Richard Hamming introduced the metric in 1950 in the context of error-correcting codes. Decades later it has become a cornerstone of information retrieval, particularly in systems where speed matters more than perfect precision. For a fixed-width hash, each comparison is O(1) (an XOR followed by a CPU popcount instruction), which makes it especially well suited to embedded databases and resource-constrained environments.

How Does Hybrid Search Combine Hamming Distance with Traditional SQLite Queries?

Hybrid search in SQLite combines two complementary retrieval strategies: sparse keyword search (using SQLite's built-in FTS5 full-text search extension) and dense similarity search (using Hamming distance on binary-quantized embeddings). Neither approach alone is sufficient for modern search requirements.

A typical hybrid search pipeline works as follows:

  1. Embedding generation: Each document or record is converted into a high-dimensional floating-point vector using a language model or encoding function.
  2. Binary quantization: The float vector is compressed into a compact binary hash (e.g., 64 or 128 bits) using techniques such as SimHash or random projection, which sharply reduces storage requirements.
  3. Hamming index storage: The binary hash is stored as an INTEGER or BLOB column in SQLite, which allows fast bitwise operations at query time.
  4. Query-time scoring: When a user submits a query, SQLite computes the Hamming distance through a custom scalar function using XOR and popcount, and returns candidates ranked by bit similarity.
  5. Score fusion: Results from the Hamming-based semantic search and the FTS5 keyword search are merged using Reciprocal Rank Fusion (RRF) or a weighted score to produce the final ranked list.
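Steps 2 and 5 of the pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the article's implementation: quantization is done with sign random projection (one random-projection variant), the helper names `make_hyperplanes`, `quantize`, and `rrf` are hypothetical, and `k=60` is a commonly used RRF constant rather than a value from the text.

```python
from __future__ import annotations

import random


def make_hyperplanes(dim: int, bits: int = 64, seed: int = 0) -> list[list[float]]:
    """Draw one random Gaussian hyperplane per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def quantize(vector: list[float], planes: list[list[float]]) -> int:
    """Sign random projection: bit i is set when the vector lies on the
    positive side of hyperplane i."""
    code = 0
    for i, plane in enumerate(planes):
        if sum(v * p for v, p in zip(vector, plane)) > 0:
            code |= 1 << i
    return code


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


planes = make_hyperplanes(dim=4)
a = quantize([0.9, 0.1, -0.3, 0.5], planes)
b = quantize([0.8, 0.2, -0.2, 0.4], planes)  # a nearby vector
print(bin(a ^ b).count("1"))  # Hamming distance between the two codes

# Fuse a keyword ranking with a Hamming ranking (document ids are made up).
print(rrf([["d1", "d2", "d3"], ["d3", "d1", "d4"]]))  # ['d1', 'd3', 'd2', 'd4']
```

One practical detail: a 64-bit code with its top bit set is larger than SQLite's signed INTEGER range, so it would need to be reinterpreted as a signed value, limited to 63 bits, or stored as a BLOB.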

SQLite's extensibility, through loadable extensions or compiled-in functions, makes this architecture achievable without migrating to a heavier database system. The result is a self-contained search engine that runs wherever SQLite runs, including embedded devices, mobile applications, and edge deployments.


Key Insight: Binary Hamming search on 64-bit hashes is roughly 30–50x faster than cosine similarity on full float32 vectors of equivalent dimensionality. For applications that need sub-10ms search latency across millions of records without specialized hardware, Hamming distance in SQLite is often the best engineering trade-off between accuracy and performance.


What Are the Performance Characteristics of Hamming Search in SQLite?

SQLite is a single-file, serverless database, which creates both unique constraints and unique opportunities for Hamming search performance. Without native vector indexing structures such as HNSW or IVF (found in dedicated vector stores), SQLite relies on a linear scan for Hamming search, but this is less limiting than it sounds.

Computing the Hamming distance between two 64-bit integers takes a single XOR followed by a popcount (population count, the number of set bits). Modern CPUs execute the popcount in a single instruction. A full linear scan of 1 million 64-bit hashes completes in approximately 5–20 milliseconds on commodity hardware, which makes SQLite practical for datasets of up to several million records without additional indexing tricks.
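The linear scan itself is simple enough to sketch directly. The helper `top_k` below is a hypothetical name; it is written in pure Python for readability, so it shows the shape of the scan rather than the native-code timings quoted above.

```python
from __future__ import annotations

import heapq


def top_k(query: int, hashes: list[int], k: int = 5) -> list[tuple[int, int]]:
    """Scan every hash and return the k closest as (distance, index) pairs."""
    distances = ((bin(query ^ h).count("1"), i) for i, h in enumerate(hashes))
    # nsmallest keeps only k candidates in memory while scanning.
    return heapq.nsmallest(k, distances)


hashes = [0b0000, 0b1111, 0b0001, 0b0111]
print(top_k(0b0000, hashes, k=2))  # [(0, 0), (1, 2)]
```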

This is where hybrid search architectures truly shine: sparse keyword filtering acts as a fast pre-filter, and Hamming distance re-ranks the survivors.
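A minimal sketch of that pre-filter and re-rank pattern follows. It assumes the SQLite build behind Python's `sqlite3` module has FTS5 compiled in, and the table names (`docs`, `docs_fts`), the sample rows, and the `hamming` function are all made up for illustration.

```python
import sqlite3


def hamming(a: int, b: int) -> int:
    # Mask to 64 bits so signed (negative) SQLite integers compare correctly.
    return bin((a ^ b) & 0xFFFFFFFFFFFFFFFF).count("1")


conn = sqlite3.connect(":memory:")
conn.create_function("hamming", 2, hamming)
conn.executescript(
    """
    CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, hash INTEGER);
    CREATE VIRTUAL TABLE docs_fts USING fts5(body);
    """
)

rows = [
    (1, "sqlite full text search", 0b10101100),
    (2, "sqlite vector search", 0b10001101),
    (3, "postgres vector search", 0b10101101),
]
conn.executemany("INSERT INTO docs VALUES (?, ?, ?)", rows)
# Keep FTS rowids aligned with docs.id so the two tables can be joined.
conn.executemany(
    "INSERT INTO docs_fts (rowid, body) VALUES (?, ?)",
    [(doc_id, body) for doc_id, body, _ in rows],
)

query_hash = 0b10101100
results = conn.execute(
    """
    SELECT d.id, hamming(d.hash, ?) AS dist
    FROM docs_fts
    JOIN docs AS d ON d.id = docs_fts.rowid
    WHERE docs_fts MATCH ?          -- keyword pre-filter
    ORDER BY dist                   -- Hamming re-rank of the survivors
    """,
    (query_hash, "sqlite"),
).fetchall()
print(results)  # [(1, 0), (2, 2)]
```

Because the custom function only runs on rows that pass the keyword match, the per-row cost of the Python callback is paid for a small candidate set rather than the whole table.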

How Do You Implement a Hamming Distance Function in SQLite?

SQLite does not ship with a native Hamming distance function, but its C extension API makes it straightforward to register custom scalar functions. In Python, using the sqlite3 module, you can register a function that computes the Hamming distance between two integers.

The function takes two integer arguments representing binary hashes, computes their XOR, and then counts the set bits using Python's bin().count('1') or a faster bit-manipulation approach. Once registered, the function is available in SQL queries like any built-in function. That allows queries such as selecting the rows whose Hamming distance to a query hash falls below a threshold, ordered by ascending distance so that the closest matches come first.
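A registration along the lines described might look like the sketch below. The table `items`, its sample rows, and the threshold of 3 are assumptions for illustration; `Connection.create_function` is the standard `sqlite3` call for registering a scalar function.

```python
import sqlite3


def hamming(a: int, b: int) -> int:
    """Hamming distance between two 64-bit integer hashes."""
    # SQLite integers are signed, so a hash with the top bit set arrives as a
    # negative Python int; masking to 64 bits keeps the bit count correct.
    return bin((a ^ b) & 0xFFFFFFFFFFFFFFFF).count("1")


conn = sqlite3.connect(":memory:")
conn.create_function("hamming", 2, hamming)  # SQL name, arg count, callable

conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, hash INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 0b10101100), (2, 0b10001101), (3, 0b01010011)],
)

q = 0b10101100  # query hash
rows = conn.execute(
    """
    SELECT id, hamming(hash, ?) AS dist
    FROM items
    WHERE hamming(hash, ?) <= ?
    ORDER BY dist ASC
    """,
    (q, q, 3),
).fetchall()
print(rows)  # [(1, 0), (2, 2)]
```

A Python callback is convenient for prototyping; a compiled extension using a hardware popcount would be the natural choice if per-row overhead becomes the bottleneck.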


When Should Businesses Choose SQLite Hamming Search Over Dedicated Vector Databases?

The choice between SQLite-based Hamming search and dedicated vector stores or extensions such as Pinecone, Weaviate, or pgvector depends on scale, query complexity, and operational constraints. SQLite Hamming search is the better choice when simplicity, portability, and cost matter most, which is the case for most business applications.

Dedicated vector databases introduce significant operational overhead: separate infrastructure, network latency, synchronization complexity, and substantial cost at scale. For applications serving tens of thousands to a few million records, SQLite Hamming search delivers relevance comparable to what users expect, with zero additional infrastructure. It keeps your search index co-located with your application data, which eliminates an entire class of distributed-systems failure modes.

Frequently Asked Questions

Is Hamming distance search accurate enough for production search applications?

Hamming distance on binary-quantized embeddings trades a small amount of recall for large speed gains. In practice, binary quantization typically retains 90–95% of the recall quality of a full float32 cosine similarity search. For most business search use cases, such as product search, document retrieval, and customer support knowledge bases, this trade-off is entirely acceptable, and users are unlikely to notice a difference in result quality.

Can SQLite handle concurrent reads and writes during Hamming search queries?

SQLite supports concurrent reads through its WAL (Write-Ahead Logging) mode, which lets multiple readers query at the same time without blocking. Write concurrency is limited, because SQLite serializes writes, but this is rarely a bottleneck for search workloads in which writes are infrequent compared to reads. For read-heavy search applications, SQLite's WAL mode is entirely sufficient.
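As a small illustration, WAL mode is enabled with a single pragma on a file-backed database. The file name `search.db` and the `items` table are hypothetical; the sketch opens a second connection to show a reader working while a write transaction is still open.

```python
import os
import sqlite3
import tempfile

# WAL needs a real file; an in-memory database cannot use it.
path = os.path.join(tempfile.mkdtemp(), "search.db")  # hypothetical file name

writer = sqlite3.connect(path)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # "wal" when the switch succeeded

writer.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, hash INTEGER)")
writer.commit()

# Start a write and leave its transaction open.
writer.execute("INSERT INTO items VALUES (1, 172)")

# A second connection can still read; it sees the last committed snapshot.
reader = sqlite3.connect(path)
before = reader.execute("SELECT COUNT(*) FROM items").fetchone()[0]

writer.commit()
after = reader.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(before, after)  # 0 1
```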

How does binary quantization affect storage requirements compared to float vectors?

The storage savings are dramatic. A typical 768-dimensional float32 embedding requires 3,072 bytes (3 KB) per record. A 128-bit binary hash of the same embedding requires just 16 bytes, a 192x reduction. For a database of 1 million records, that is the difference between roughly 3 GB and 16 MB of embedding storage, which makes Hamming-based search feasible in memory-constrained environments where full float storage would be impractical.
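The figures above follow from simple arithmetic, reproduced here as a quick check. The constants come from the paragraph; only the variable names are introduced for illustration.

```python
DIMS = 768                # embedding dimensions
BYTES_PER_FLOAT32 = 4
HASH_BITS = 128
RECORDS = 1_000_000

float_bytes = DIMS * BYTES_PER_FLOAT32   # 3,072 bytes per record
hash_bytes = HASH_BITS // 8              # 16 bytes per record
ratio = float_bytes // hash_bytes        # 192x reduction

total_float_gb = RECORDS * float_bytes / 1e9
total_hash_mb = RECORDS * hash_bytes / 1e6
print(f"{total_float_gb:.2f} GB vs {total_hash_mb:.0f} MB")  # 3.07 GB vs 16 MB
```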



Building smart, searchable products is exactly the kind of capability that separates growing businesses from stagnant ones. Mewayz is an all-in-one business OS trusted by more than 138,000 users, offering 207 integrated modules, from CRM and analytics to content management and more, starting at just $19/month. Stop stitching together disconnected tools and start building on a platform designed for scale.

Start your Mewayz journey today at app.mewayz.com and see what a truly unified business platform can do for your team.