Hacker News

Hamming Distance for Hybrid Search in SQLite

12 min read Via notnotp.com

Mewayz Team

Editorial Team

Hamming distance is a fundamental similarity metric that counts the differing bits between two binary strings, which makes it one of the fastest and most efficient techniques for approximate nearest-neighbor search in databases. Applied to SQLite through hybrid search patterns, Hamming distance lets developers unlock semantic search capabilities without the overhead of dedicated vector databases.

What Is Hamming Distance and Why Does It Matter for Data Search?

Hamming distance measures the number of positions at which two strings of equal length differ. For example, the binary strings 10101100 and 10001101 have a Hamming distance of 2, because they differ at exactly two bit positions. In the context of data search, this seemingly simple calculation becomes remarkably powerful.
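
As a quick check of the example above, here is a minimal Python sketch. The function name `hamming_distance` is chosen for illustration and does not come from any library.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))

# The example from the text: the strings differ at two positions.
print(hamming_distance("10101100", "10001101"))  # 2

# The same result via XOR and a bit count on the integer values.
print(bin(0b10101100 ^ 0b10001101).count("1"))  # 2
```

The second form (XOR, then count the set bits) is the one that matters for search, because it works on whole machine words instead of individual characters.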

Traditional SQL search relies on exact matches or full-text indexing, which struggles with semantic similarity: finding results that mean the same thing rather than merely sharing keywords. Hamming distance bridges this gap by operating on binary hash codes derived from content embeddings, allowing databases like SQLite to compare millions of records in milliseconds using bitwise XOR operations.

The metric was introduced by Richard Hamming in 1950 in the context of error-correcting codes. Decades later, it has become a cornerstone of information retrieval, particularly in settings where speed matters more than perfect precision. Its O(1) cost per comparison for fixed-width hashes (using CPU popcount instructions) makes it exceptionally well suited to embedded and lightweight database engines.

How Does Hybrid Search Combine Hamming Distance with Traditional SQLite Queries?

Neither approach alone (keyword matching or semantic similarity) is sufficient for modern search requirements.

A typical hybrid search pipeline works as follows:

  1. Embedding generation: Each document or record is converted into a high-dimensional vector using a language model or an encoding function.
  2. Binary quantization: The float vector is compressed into a compact binary hash (for example, 64 or 128 bits) using techniques such as SimHash or random projection, which drastically reduces storage requirements.
  3. Hamming index storage: The binary hash is stored as an INTEGER or BLOB column in SQLite, which enables fast bitwise operations at query time.
  4. Query-time computation: When a user submits a query, SQLite computes the Hamming distance through a custom scalar function built on XOR and popcount, and returns candidates ranked by bit similarity.
  5. Score fusion: Results from the Hamming-based semantic search and the FTS5 keyword search are merged using Reciprocal Rank Fusion (RRF) or weighted scoring to produce the final ranked list.
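
The binary quantization step can be sketched as a random-projection hash, where each bit records which side of a random hyperplane the vector falls on. This is only an illustrative sketch in pure Python: the helper names (`make_hyperplanes`, `binary_hash`, `to_sqlite_int`), the 8-dimensional toy vectors, and the fixed seed are all assumptions made for the example, not part of any particular library or of the article's own implementation.

```python
import random

def make_hyperplanes(dim: int, bits: int = 64, seed: int = 0) -> list[list[float]]:
    """Draw `bits` random Gaussian hyperplanes of dimension `dim` (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def binary_hash(vector: list[float], hyperplanes: list[list[float]]) -> int:
    """Set bit i when the vector lies on the positive side of hyperplane i."""
    h = 0
    for i, plane in enumerate(hyperplanes):
        if sum(v * p for v, p in zip(vector, plane)) > 0:
            h |= 1 << i
    return h

def to_sqlite_int(h: int) -> int:
    """Map an unsigned 64-bit value into SQLite's signed 64-bit INTEGER range."""
    return h - (1 << 64) if h >= (1 << 63) else h

planes = make_hyperplanes(dim=8)
a = binary_hash([0.9, 0.1, 0.3, 0.7, 0.2, 0.5, 0.4, 0.8], planes)
b = binary_hash([0.8, 0.2, 0.3, 0.6, 0.2, 0.5, 0.5, 0.8], planes)       # close to a
c = binary_hash([-0.9, 0.4, -0.3, -0.7, 0.9, -0.5, 0.1, -0.8], planes)  # far from a

# Similar vectors should usually disagree on fewer bits than dissimilar ones.
print(bin(a ^ b).count("1"), bin(a ^ c).count("1"))
```

The `to_sqlite_int` conversion matters because SQLite INTEGER columns hold signed 64-bit values, so a hash with the top bit set has to be stored as a negative number (or as a BLOB instead).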

SQLite's extensibility through loadable extensions and custom functions makes this architecture achievable without migrating to a heavier database system. The result is a self-contained search engine that runs anywhere SQLite runs, including embedded devices, mobile applications, and edge deployments.
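
For the score-fusion step of the pipeline, Reciprocal Rank Fusion can be sketched in a few lines. The constant `k = 60` is a commonly used default rather than a value given in this article, and the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

semantic = ["doc3", "doc1", "doc7"]  # hypothetical IDs ranked by Hamming distance
keyword = ["doc1", "doc9", "doc3"]   # hypothetical IDs ranked by FTS5

fused = reciprocal_rank_fusion([semantic, keyword])
print([doc_id for doc_id, _ in fused])  # ['doc1', 'doc3', 'doc9', 'doc7']
```

Because RRF uses only rank positions, it avoids having to normalize Hamming distances and FTS5 scores onto a common scale.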

Key insight: Binary Hamming search over 64-bit hashes can be 30–50x faster than cosine similarity over the corresponding full float32 vectors. For applications that need sub-10ms search latency across millions of records without dedicated infrastructure, Hamming distance in SQLite is often the right trade-off between precision and performance.

What Are the Performance Characteristics of Hamming Search in SQLite?

SQLite is a single-file, serverless database, which creates specific challenges and opportunities for implementing Hamming distance search. Without native vector indexing structures such as HNSW or IVF (found in dedicated vector stores), SQLite relies on a linear scan for Hamming search, but that is less limiting than it sounds.

Computing the Hamming distance between two 64-bit integers requires only an XOR followed by a popcount (population count, the number of set bits). Modern CPUs provide popcount as a single instruction. A full scan of 1 million 64-bit hashes completes in roughly 5–20 milliseconds on commodity hardware, which makes SQLite practical for datasets of up to several million records without additional indexing tricks.
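
The linear scan described above amounts to one XOR and one popcount per stored hash, keeping only the best few results. The sketch below shows the idea in pure Python with randomly generated hashes; the helper names are illustrative, and interpreted Python will be far slower than the native figures quoted above.

```python
import heapq
import random

def popcount64(x: int) -> int:
    """Number of set bits in the low 64 bits of x."""
    return bin(x & 0xFFFFFFFFFFFFFFFF).count("1")

def top_k_by_hamming(query: int, hashes: list[int], k: int = 5) -> list[tuple[int, int]]:
    """Brute-force scan returning the k smallest (distance, index) pairs."""
    return heapq.nsmallest(k, ((popcount64(query ^ h), i) for i, h in enumerate(hashes)))

rng = random.Random(42)
hashes = [rng.getrandbits(64) for _ in range(100_000)]  # synthetic stand-in data
query = hashes[123] ^ 0b101  # a stored hash with two bits flipped

print(top_k_by_hamming(query, hashes, k=3))
```

Random 64-bit hashes differ from the query in about 32 bits on average, so the entry at index 123 (distance 2) stands out clearly at the top of the result.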

For larger datasets, performance improvements come from candidate pre-filtering: using SQLite's WHERE clauses to eliminate rows by metadata (date ranges, categories, user segments) before applying Hamming distance, which can shrink the effective scan by orders of magnitude. This is where hybrid search truly shines: a selective keyword filter acts as a fast pre-filter, and Hamming distance semantically re-ranks the remaining candidates.
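
A minimal sketch of the pre-filter-then-rank pattern, using Python's `sqlite3` module. The `docs` table, its `category` column, and the sample hashes are hypothetical, and a plain metadata filter stands in for the keyword filter so the example does not depend on FTS5 being compiled in.

```python
import sqlite3

MASK64 = (1 << 64) - 1

def hamming(a: int, b: int) -> int:
    # Mask to 64 bits so negative (signed) SQLite integers are counted correctly.
    return bin((a ^ b) & MASK64).count("1")

conn = sqlite3.connect(":memory:")
conn.create_function("hamming", 2, hamming)

conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, category TEXT, hash INTEGER)")
conn.executemany(
    "INSERT INTO docs (id, category, hash) VALUES (?, ?, ?)",
    [
        (1, "billing", 0b10101100),
        (2, "billing", 0b10001101),
        (3, "support", 0b10101100),
        (4, "billing", 0b01010011),
    ],
)

rows = conn.execute(
    """
    SELECT id, hamming(hash, ?) AS dist
    FROM docs
    WHERE category = ?   -- only rows passing this filter are scored
    ORDER BY dist ASC    -- closest hashes first
    LIMIT 10
    """,
    (0b10101100, "billing"),
).fetchall()
print(rows)  # [(1, 0), (2, 2), (4, 8)]
```

Row 3 has an identical hash but is never scored, because the metadata filter removes it before the distance function is called.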

How Do You Implement a Hamming Distance Function in SQLite?

SQLite does not include a built-in Hamming distance function, but its C extension API makes it straightforward to register custom scalar functions. In Python, using the sqlite3 module, you can register a function that computes the Hamming distance between two integers.
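
A minimal sketch of such a registration, using `sqlite3.Connection.create_function`. The SQL function name `hamming`, the `items` table, and the sample values are assumptions made for illustration; the 64-bit mask is added because SQLite stores integers as signed 64-bit values.

```python
import sqlite3

MASK64 = (1 << 64) - 1  # SQLite INTEGERs are signed 64-bit; mask before counting

def hamming(a: int, b: int) -> int:
    """XOR the two hashes and count the set bits."""
    return bin((a ^ b) & MASK64).count("1")

conn = sqlite3.connect(":memory:")
conn.create_function("hamming", 2, hamming)  # name, number of arguments, callable

conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, hash INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO items (id, hash) VALUES (?, ?)",
    [(1, 0b10101100), (2, 0b10001101), (3, 0b01010011), (4, -1)],
)

rows = conn.execute(
    "SELECT id, hamming(hash, :q) AS dist FROM items "
    "WHERE hamming(hash, :q) <= :max_dist ORDER BY dist ASC",
    {"q": 0b10101100, "max_dist": 3},
).fetchall()
print(rows)  # [(1, 0), (2, 2)]
```

Row 4 stores -1, which is the signed form of a hash with all 64 bits set. Without the mask, `bin()` of a negative XOR result would produce a wrong count.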

The function accepts two integer arguments representing the two hashes, computes their XOR, then counts the set bits using Python's bin().count('1') or a faster bit-manipulation method. Once registered, the function is available in SQL queries like any built-in function. This enables queries such as selecting the rows whose Hamming distance to a query hash falls below a threshold, ordered by ascending distance so that the closest matches surface first.

For production deployments, compiling the popcount logic as a C extension registered through SQLite's sqlite3_create_function API delivers 10–100x better performance than the interpreted Python version, bringing SQLite Hamming search close to dedicated vector databases for many practical workloads.

When Should Businesses Choose SQLite Hamming Search over Dedicated Vector Databases?

The choice between SQLite-based Hamming search and dedicated vector stores such as Pinecone, Weaviate, or the pgvector extension for PostgreSQL depends on scale, operational complexity, and functional requirements. SQLite Hamming search is a good choice when simplicity, portability, and cost matter most, which is the case for most business applications.

Dedicated vector databases introduce significant operational overhead: separate infrastructure, network latency, synchronization complexity, and substantial costs at scale. For applications handling tens of thousands to a few million records, SQLite Hamming search delivers performance that is comparable at the user-interface level, with zero additional infrastructure. It keeps your search engine co-located with your application data, eliminating an entire class of distributed-systems failure modes.

Frequently Asked Questions

Is Hamming distance search accurate enough for production search applications?

Hamming distance on binary-quantized embeddings trades a small amount of recall for a large gain in speed. In practice, binary quantization typically retains 90–95% of the recall of full float32 cosine similarity search. For most business search applications (product discovery, document retrieval, customer support knowledge bases) this trade-off is entirely acceptable, and users cannot perceive a difference in result quality.

Can SQLite handle concurrent reads and writes during Hamming search queries?

SQLite supports concurrent reads through its WAL (Write-Ahead Logging) mode, which allows multiple readers to query simultaneously without blocking. Write concurrency is limited, because SQLite serializes writes, but this is rarely a bottleneck for search-heavy workloads where writes are infrequent relative to reads. For read-heavy search engines, SQLite's WAL mode is entirely adequate.
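
A small sketch of enabling WAL mode and reading while a write transaction is still open. It assumes a local filesystem that supports WAL (in-memory databases do not), and the file name `search.db` and the `docs` table are hypothetical.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "search.db")  # hypothetical file name

writer = sqlite3.connect(db_path)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, hash INTEGER)")
writer.execute("INSERT INTO docs VALUES (1, 42)")
writer.commit()

reader = sqlite3.connect(db_path)

# This insert opens a write transaction that is not committed yet.
writer.execute("INSERT INTO docs VALUES (2, 43)")

# The reader is not blocked and sees the last committed snapshot.
count_during_write = reader.execute("SELECT COUNT(*) FROM docs").fetchone()[0]

writer.commit()
count_after_commit = reader.execute("SELECT COUNT(*) FROM docs").fetchone()[0]

print(mode, count_during_write, count_after_commit)  # wal 1 2

reader.close()
writer.close()
```

The journal mode is persistent for a database file, so it only needs to be set once rather than on every connection.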

How does binary quantization affect storage requirements compared to float vectors?

The storage savings are dramatic. A typical 768-dimensional float32 embedding requires 3072 bytes (3 KB) per record. A 128-bit binary hash of the same embedding requires only 16 bytes, a 192x reduction. With 1 million records, that is the difference between 3 GB and 16 MB of embedding storage, which makes Hamming-based search feasible in memory-constrained environments where full float storage is impractical.
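
The arithmetic behind these figures can be checked directly. The sketch counts raw payload bytes only and ignores SQLite's per-row and per-page overhead.

```python
DIMENSIONS = 768
BYTES_PER_FLOAT32 = 4
HASH_BITS = 128
RECORDS = 1_000_000

float_bytes = DIMENSIONS * BYTES_PER_FLOAT32  # bytes per float32 embedding
hash_bytes = HASH_BITS // 8                   # bytes per binary hash
reduction = float_bytes // hash_bytes         # size ratio between the two

print(float_bytes, hash_bytes, reduction)     # 3072 16 192
print(float_bytes * RECORDS / 1e9, "GB")      # 3.072 GB
print(hash_bytes * RECORDS / 1e6, "MB")       # 16.0 MB
```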


Building intelligent, searchable products is exactly the kind of capability that separates growing businesses from stagnant ones. Mewayz is an all-in-one business OS trusted by more than 138,000 users, offering 207 integrated modules, from CRM and analytics to content management and beyond, starting at just $19 per month. Stop stitching together fragmented tools and start building on a platform designed for scale.

Start your Mewayz journey today at app.mewayz.com and discover what a truly unified business platform can do for your team.