Hacker News

Hamming Distance ma Hybrid Hwehwɛ wɔ SQLite mu

Hamming Distance ma Hybrid Hwehwɛ wɔ SQLite mu Saa nhwehwɛmu yi hwehwɛ hamming mu kɔ akyiri, na ɛhwehwɛ nea ɛkyerɛ ne nkɛntɛnso a ebetumi anya mu. Nsusuwii Titiriw a Wɔakata So Saa nsɛm yi hwehwɛ: Nnyinasosɛm ne nsusuwii atitiriw Prac...

10 min read Via notnotp.com

Mewayz Team

Editorial Team

Hacker News

Hamming distance yɛ fapem nsɛsoɔ metric a ɛkan bit ahodoɔ a ɛwɔ binary strings mmienu ntam, na ɛma ɛyɛ akwan a ɛyɛ ntɛm na ɛyɛ adwuma yie a wɔfa so hwehwɛ bɛyɛ sɛ wɔbɛn afipamfoɔ wɔ databases mu no mu baako. Sɛ wɔde di dwuma wɔ SQLite so denam hybrid search architectures so a, Hamming distance bue enterprise-grade semantic search tumi a enni overhead of dedicated vector databases.

Dɛn Ne Hamming Distance na Dɛn Nti Na Ɛho Hia Ma Database Hwehwɛ?

Hamming distance susuw gyinabea dodow a nhama abien a ne tenten yɛ pɛ no yɛ soronko. Sɛ nhwɛsoɔ no, nhama mmienu 10101100 ne 10001101 wɔ Hamming kwan a ɛyɛ 2, ɛfiri sɛ ɛsono wɔ bit gyinabea mmienu pɛpɛɛpɛ. Wɔ database hwehwɛ nsɛm mu no, saa akontabuo a ɛte sɛ nea ɛnyɛ den yi bɛyɛ tumi soronko.

Atetesɛm SQL hwehwɛ gyina nhyiamu pɛpɛɛpɛ anaa nsɛm a wɔakyerɛw nyinaa indexing so, a ɛpere ne nkyerɛaseɛ nsɛsoɔ — hwehwɛ nea ɛfiri mu ba a kyerɛ adeɛ korɔ no ara sen sɛ ɛbɛkyɛ nsɛmfua titire a ɛyɛ pɛ. Hamming distance siw saa nsonsonoe yi ano denam binary hash codes a wonya fi content embeddings mu a ɛyɛ adwuma so, na ɛma database te sɛ SQLite tumi de kyerɛwtohɔ ɔpepem pii toto milisekɔn mu denam bitwise XOR dwumadie so.

Richard Hamming na ɔde metric no baeɛ wɔ afe 1950 mu wɔ nsɛm a ɛfa mfomsoɔ a wɔsiesie mmara ho. Mfe du du akyi no, ɛbɛyɛɛ ade titiriw wɔ nsɛm a wogye mu, titiriw wɔ nhyehyɛe ahorow a ahoɔhare ho hia sen pɛpɛɛpɛyɛ a edi mũ mu. Ne O(1) akontabuo wɔ ntotoho biara mu (a wɔde CPU popcount akwankyerɛ di dwuma) ma ɛfata soronko ma database engine a wɔde ahyɛ mu na emu yɛ hare.

Ɛbɛyɛ dɛn na Hybrid Search Ka Hamming Distance ne Amammerɛ SQLite Asɛmmisa Bom?

Hybrid hwehwɛ wɔ SQLite mu ka akwan abien a ɛka bom a wɔfa so gye nneɛma bom: sparse keyword search (a wɔde SQLite FTS5 full-text search ntrɛwmu a wɔasisi no di dwuma) ne dense similarity search (a wɔde Hamming distance di dwuma wɔ binary quantized embeddings so). Ɔkwan abien no mu biara nkutoo nnɔɔso mma nnɛyi nhwehwɛmu ahwehwɛde ahorow.

Hybrid search pipeline a wɔtaa de di dwuma no yɛ adwuma sɛnea edidi so yi:

  1. Embedding generation: Wɔdane krataa anaa kyerɛwtohɔ biara kɔ soro-dimensional floating-point vector denam kasa nhwɛso anaa encoding dwumadie so.
  2. Binary quantization: Wɔde akwan te sɛ SimHash anaa random projection na ɛhyɛ float vector no mu ma ɛyɛ compact binary hash (e.g., 64 anaa 128 bits) denam akwan te sɛ SimHash anaa random projection so, na ɛtew storage ahwehwɛde so kɛse.
  3. Hamming index storage: Wɔde binary hash no sie sɛ INTEGER anaa BLOB kɔla wɔ SQLite mu, ɛma wotumi yɛ bitwise dwumadie ntɛmntɛm wɔ asɛmmisa berɛ mu.
  4. Asɛmmisa-bere nkontabuo: Sɛ ɔdefoɔ bi de asɛmmisa bi kɔ a, SQLite nam amanneɛbɔ scalar dwumadie a wɔde XOR ne popcount di dwuma so bu Hamming kwansin ho akontaa, na ɛsan de kandifoɔ a wɔahyehyɛ wɔn sɛdeɛ bit nsɛsoɔ teɛ.
  5. Score fusion: Wɔde Reciprocal Rank Fusion (RRF) anaa weighted scoring na ɛka nea efi Hamming-based semantic search ne FTS5 keyword search mu ba no bom de yɛ ranked list a etwa to.

SQLite ntrɛwmu denam ntrɛwmu a wotumi de gu mu anaasɛ dwumadi ahorow a wɔaboaboa ano so no ma saa nhyehyɛe yi tumi yɛ a wontu nkɔ database nhyehyɛe a emu yɛ duru so. Nea afi mu aba ne nhwehwɛmu engine a ɛwɔ ne ho a ɛyɛ adwuma wɔ baabiara a SQLite tu — a mfiri a wɔde ahyɛ mu, mobile apps, ne edge deployments ka ho.

a wɔde ahyɛ mu

Key Insight: Binary Hamming hwehwɛ wɔ 64-bit hashes so no yɛ bɛyɛ 30–50x ntɛmntɛm sen cosine nsɛsoɔ wɔ full float32 vectors a ɛwɔ dimensionality a ɛyɛ pɛ so. Wɔ application ahorow a ɛhwehwɛ sɛ sub-10ms hwehwɛ latency wɔ kyerɛwtohɔ ɔpepem pii a enni hardware soronko mu no, Hamming distance wɔ SQLite mu taa yɛ engineering trade-off a eye sen biara a ɛda pɛpɛɛpɛ ne adwumayɛ ntam.

na ɛkyerɛ sɛ woayɛ

Dɛn ne Adwumayɛ Su a ɛwɔ Hamming Search mu wɔ SQLite mu?

SQLite yɛ fael baako, serverless database, a ɛma anohyeto soronko ne hokwan ahorow a wɔde bedi dwuma wɔ Hamming akyirikyiri hwehwɛ mu. Sɛ native vector indexing structures te sɛ HNSW anaa IVF (wohu wɔ dedicated vector stores) nni hɔ a, SQLite de ne ho to linear scan so ma Hamming hwehwɛ — nanso eyi nyɛ anohyeto kɛse sɛnea ɛte.

Hamming akyirikyiri akontabuo a ɛwɔ 64-bit hwehwɛ XOR a popcount (nnipa dodoɔ dodoɔ, set bits a wɔkan) di akyire nko ara. Nnɛyi CPU ahorow no di eyi ho dwuma wɔ akwankyerɛ biako mu. Linear scan a edi mũ a ɛyɛ 64-bit hashes ɔpepem 1 wie wɔ bɛyɛ milisekɔn 5–20 mu wɔ aguade hardware so, na ɛma SQLite yɛ nea mfaso wɔ so ma datasets a ɛkɔ kyerɛwtohɔ ɔpepem pii a enni indexing tricks foforo.

💡 DID YOU KNOW?

Mewayz replaces 8+ business tools in one platform

CRM · Invoicing · HR · Projects · Booking · eCommerce · POS · Analytics. Free forever plan available.

Start Free →

Wɔ dataset akɛseɛ ho no, adwumayɛ mu nkɔsoɔ firi candidate pre-filtering mu: SQLite WHERE clauses a wode bedi dwuma de ayi rows afiri metadata (date ranges, categories, user segments) ansa na wode Hamming distance adi dwuma, atew scan kɛseɛ a etu mpɔn no so denam orders of magnitude so. Eyi ne baabi a hybrid search architectures hyerɛn ampa — sparse keyword filter no yɛ adwuma sɛ pre-filter a ɛyɛ ntɛm, na Hamming distance san de candidates a wɔda so te ase no si hɔ.

Wobɛyɛ dɛn de Hamming Distance Function adi dwuma wɔ SQLite mu?

SQLite mfa Hamming akyirikyiri dwumadie a ɛyɛ kurom hɔ nka ho, nanso ne C ntrɛmu API ma amanne kwan so scalar dwumadie yɛ tẽẽ sɛ wobɛkyerɛw. Wɔ Python a wode sqlite3 module di dwuma mu no, wobɛtumi akyerɛw dwumadie bi a ɛbu Hamming kwan a ɛda integer mmienu ntam:

Adwuma no gye integer arguments mmienu a egyina hɔ ma binary hashes tom, ɛbu wɔn XOR, afei ɛka set bits no denam Python bin().count('1') anaa bit manipulation kwan a ɛyɛ ntɛm so. Sɛ wɔkyerɛw wo din wie a, saa dwumadie yi bɛyɛ adwuma wɔ SQL abisadeɛ mu te sɛ dwumadie biara a wɔde ahyɛ mu no, ɛma abisadeɛ te sɛ paw a wɔpaw row a Hamming kwan a ɛkɔ abisadeɛ hash so no hwe ase wɔ threshold bi ase, a wɔahyehyɛ no kwan a ɛkɔ soro sɛdeɛ ɛbɛyɛ a wɔbɛgye nsɛsoɔ a ɛbɛn paa no kane.

|

Bere bɛn na Ɛsɛ sɛ Nnwumakuw Paw SQLite Hamming Hwehwɛ Sene Vector Databases a Wɔahyira So?

Nneɛma a wobɛpaw wɔ SQLite-based Hamming search ne dedicated vector databases te sɛ Pinecone, Weaviate, anaa pgvector ntam no gyina scale, adwumayɛ mu nsɛnnennen, ne deployment anohyeto ahorow so. SQLite Hamming hwehwɛ yɛ paw a ɛfata bere a ɛnyɛ den, nea wotumi fa so, ne ɛka ho hia kɛse — a ɛte saa wɔ adwumayɛ aplikeshɔn dodow no ara ho.

Vector databases a wɔatu ho ama no de adwumayɛ ho ka a ɛho hia ba: infrastructure a ɛtetew mu, network latency, synchronization complexity, ne ɛka kɛse wɔ scale mu. Wɔ application ahorow a ɛsom mpempem du du kosi ɔpepem pii a ɛba fam kyerɛwtohɔ ahorow ho no, SQLite Hamming hwehwɛ de mfaso a ɛne nea ɔde di dwuma no hyia a ɛne no di nsɛ a enni infrastructure foforo biara nni mu ma. Ɛde wo hwehwɛ index no ne wo application data no bom, yiyi nhyehyɛe ahorow a wɔakyekyɛ no huammɔdi akwan nyinaa fi hɔ.

Nsɛmmisa a Wɔtaa Bisa

So Hamming akyirikyiri hwehwɛ no yɛ pɛpɛɛpɛ sɛnea ɛsɛ ma production search applications?

Hamming kwansin wɔ binary-quantized embeddings so di gua kakraa bi a wɔkae pɛpɛɛpɛ ma ahoɔhare mfaso kɛse. Wɔ nneyɛe mu no, binary quantization taa kura 90–95% wɔ nkae su a ɛwɔ full float32 cosine nsɛdi hwehwɛ mu. Wɔ adwumayɛ hwehwɛ aplikeshɔn dodow no ara fam — nneɛma a wohu, nkrataa a wogye, adetɔfo mmoa nimdeɛ nnyinaso — saa aguadi yi yɛ nea wogye tom koraa, na wɔn a wɔde di dwuma no ntumi nhu nsonsonoe a ɛwɔ nea efi mu ba no su mu.

So SQLite betumi adi akenkan ne akyerɛw a ɛkɔ so bere koro mu wɔ Hamming hwehwɛ nsɛmmisa mu?

SQLite boa akenkan a ɛkɔ so bere koro mu denam ne WAL (Write-Ahead Logging) mode so, ɛma akenkanfo pii tumi bisa bere koro mu a wonsiw kwan. Write concurrency yɛ anohyeto — SQLite serializes writes — nanso eyi ntaa nyɛ bottleneck ma search-heavy adwuma adesoa a akyerɛw ntaa mma sɛ wɔde toto akenkan ho. Wɔ akenkan-den-intensive hybrid search applications ho no, SQLite WAL mode no dɔɔso koraa.

Ɔkwan bɛn so na binary quantization nya storage ahwehwɛdeɛ so nkɛntɛnsoɔ sɛ wɔde toto float vectors ho a?

Sɛkora sika a wɔkora so no yɛ nwonwa. Float32 embedding a ɛyɛ 768-dimensional a wɔtaa de hyɛ mu no hwehwɛ sɛ wonya baiti 3,072 (3 KB) wɔ kyerɛwtohɔ biara mu. 128-bit binary hash a ɛwɔ embedding koro no ara hwehwɛ baiti 16 pɛ — 192x a wɔtew so. Wɔ dataset a ɛwɔ kyerɛwtohɔ ɔpepem 1 ho no, eyi kyerɛ nsonsonoe a ɛda 3 GB ne 16 MB a ɛwɔ embedding storage ntam, na ɛma Hamming-based search yɛ yiye wɔ memory-constrained mmeae a full float storage bɛyɛ nea ɛnyɛ nea mfaso wɔ so.


Nneɛma a nyansa wom, a wotumi hwehwɛ mu a wɔbɛkyekye no yɛ tumi a ɛtetew nnwuma a ɛrenya nkɔso ne nea egyina hɔ pintinn no mu pɛpɛɛpɛ. Mewayz yɛ adwuma OS a ɛwɔ ne nyinaa mu a nnipa bɛboro 138,000 gye di, a ɛde module 207 a wɔaka abom ma — efi CRM ne nhwehwɛmu so kosi nsɛm a ɛwɔ mu sohwɛ ne nea ɛboro saa — efi ase fi $19/ɔsram pɛ. Gyae sɛ wobɛpam nnwinnade a wɔatwa mu na fi ase si wɔ asɛnka agua a wɔayɛ ama nsenia so.

Fi ase wo Mewayz akwantuo nnɛ wɔ app.mewayz.com na nya deɛ adwumayɛ nhyehyɛeɛ a ɛyɛ biako ankasa betumi ayɛ ama wo kuo no.