Hacker News

Hamming Distance for Hybrid Search in SQLite

7 min read Via notnotp.com

Mewayz Team

Editorial Team

Hamming distance is a fundamental similarity metric that counts the differing bits between two binary strings, which makes it one of the fastest and most efficient methods for approximate nearest-neighbor search in databases. Applied to SQLite through hybrid search architectures, Hamming distance unlocks enterprise-grade semantic search capabilities without the overhead of a dedicated vector database.

What Is Hamming Distance and Why Does It Matter for Database Search?

Hamming distance measures the number of positions at which two binary strings of equal length differ. For example, the binary strings 10101100 and 10001101 have a Hamming distance of 2, because they differ in exactly two positions (the third and the eighth, counting from the left). In database search contexts, this seemingly simple calculation becomes surprisingly powerful.
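As a minimal sketch of the calculation just described, the snippet below XORs the two example bit strings and counts the set bits. The function name `hamming_distance` is chosen here for illustration, and the sketch assumes non-negative integers of equal bit width.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two non-negative integers differ."""
    # XOR leaves a 1 bit wherever the inputs disagree;
    # counting those 1 bits gives the Hamming distance.
    return bin(a ^ b).count("1")


x = 0b10101100
y = 0b10001101
distance = hamming_distance(x, y)
print(distance)  # 2
```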

Traditional SQL search relies on exact matching or full-text indexing, which struggles with semantic similarity: finding results that mean the same thing rather than sharing the same keywords. Hamming distance closes this gap by operating on binary hash codes derived from content embeddings, allowing databases such as SQLite to compare millions of records in milliseconds using bitwise XOR operations.

The metric was introduced by Richard Hamming in 1950 in the context of error-correcting codes. Decades later it became a staple of information retrieval, especially in systems where speed matters more than perfect accuracy. Its O(1) cost per comparison for fixed-width hashes (using CPU popcount instructions) makes it particularly well suited to embedded and lightweight database engines.

How Does Hybrid Search Combine Hamming Distance with Traditional SQLite Queries?

Hybrid search in SQLite combines two complementary retrieval strategies: sparse keyword search (using SQLite's built-in FTS5 full-text search extension) and dense similarity search (using Hamming distance on binary-quantized embeddings). Neither approach alone is enough for modern search needs.

A typical hybrid search pipeline works as follows:

  1. Embedding generation: Each document or record is converted into a high-dimensional floating-point vector using a language model or a text-encoding function.
  2. Binary quantization: The floating-point vector is compressed into a compact binary hash (e.g., 64 or 128 bits) using techniques such as SimHash or random projection, which sharply reduces storage requirements.
  3. Hamming index storage: The binary hash is stored as an INTEGER or BLOB column in SQLite (a 64-bit hash fits SQLite's signed 8-byte INTEGER; wider hashes need a BLOB), enabling fast bitwise operations at query time.
  4. Query-time scoring: When a user submits a query, SQLite computes the Hamming distance through a custom scalar function built on XOR and popcount, returning candidates sorted by bitwise similarity.
  5. Score fusion: Results from the Hamming-based semantic search and the FTS5 keyword search are merged using Reciprocal Rank Fusion (RRF) or weighted scoring to produce the final ranked list.
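Steps 2 and 3 can be sketched with random-hyperplane projection, one common way to turn a float vector into a binary hash. This is an illustrative sketch, not the article's own implementation: the helper names (`make_hyperplanes`, `binary_hash`, `to_signed64`), the 8-dimensional toy embedding, and the fixed seed are all assumptions. The signed conversion is included because SQLite's INTEGER type is a signed 64-bit value, so an unsigned hash with the top bit set would not fit as-is.

```python
import random


def make_hyperplanes(dim: int, bits: int = 64, seed: int = 0) -> list[list[float]]:
    """Draw one random Gaussian hyperplane per output bit (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def binary_hash(vector: list[float], hyperplanes: list[list[float]]) -> int:
    """Emit one bit per hyperplane: 1 if the vector lies on its non-negative side."""
    h = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        h = (h << 1) | (1 if dot >= 0 else 0)
    return h


def to_signed64(h: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed so it fits a SQLite INTEGER."""
    return h - (1 << 64) if h >= (1 << 63) else h


planes = make_hyperplanes(dim=8, bits=64)
embedding = [0.1, -0.4, 0.7, 0.2, -0.3, 0.5, -0.1, 0.9]  # toy values, not real model output
code = binary_hash(embedding, planes)
stored_value = to_signed64(code)  # value that would go into the INTEGER column
```

The same hyperplanes must be reused for every document and every query; hashes produced with different projections are not comparable.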

Extending SQLite with loadable extensions or application-defined functions makes this architecture achievable without migrating to a heavier database system. The result is a self-contained search engine that runs anywhere SQLite runs, including embedded devices, mobile apps, and edge deployments.
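The score fusion step of the pipeline (Reciprocal Rank Fusion) can run in application code once both result lists are available. The sketch below assumes each search returns document ids in best-first order; the constant `k = 60` is a commonly used default rather than something the article specifies, and the document ids are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several best-first rankings by summing 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_hits = ["doc3", "doc1", "doc7"]  # hypothetical FTS5 result order
hamming_hits = ["doc1", "doc4", "doc3"]  # hypothetical Hamming result order
fused = reciprocal_rank_fusion([keyword_hits, hamming_hits])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)  # ['doc1', 'doc3', 'doc4', 'doc7']
```

Documents that appear in both lists rise to the top, which is the behavior hybrid search is after.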

Key Insight: Binary Hamming search on 64-bit hashes is roughly 30–50x faster than cosine similarity on the full float32 vectors they were derived from. For applications that need sub-10ms search latency over millions of records without specialized hardware, Hamming distance in SQLite often strikes a practical engineering trade-off between accuracy and performance.

What Are the Performance Characteristics of Hamming Search in SQLite?

SQLite is a single-file, serverless database, which creates unique constraints and opportunities for implementing Hamming distance search. Lacking native vector index structures such as HNSW or IVF (found in dedicated vector stores), SQLite relies on linear scans for Hamming search, but this is less limiting than it sounds.

A 64-bit Hamming distance computation needs only an XOR followed by a popcount (population count, i.e. counting the set bits). Modern CPUs execute popcount as a single instruction. A full scan of 1 million 64-bit hashes completes in roughly 5–20 milliseconds on commodity hardware, which makes SQLite workable for datasets of up to a few million records without additional indexing tricks.

For larger datasets, the performance gains come from candidate pre-filtering: using SQLite's WHERE clauses to eliminate rows by metadata (date ranges, categories, user segments) before applying Hamming distance, which shrinks the effective scan size by orders of magnitude. This is where hybrid search architectures really shine: the sparse keyword filter acts as a fast pre-filter, and Hamming distance re-ranks the surviving candidates.
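A minimal sketch of this pre-filter-then-rank pattern, using Python's sqlite3 module and an in-memory database. The `docs` table, its `category` column, and the sample hashes are hypothetical; the point is only that the distance function is computed for rows that survive the WHERE clause.

```python
import sqlite3


def hamming(a: int, b: int) -> int:
    # Mask to 64 bits so hashes stored as signed SQLite integers compare correctly.
    return bin((a ^ b) & 0xFFFFFFFFFFFFFFFF).count("1")


conn = sqlite3.connect(":memory:")
conn.create_function("hamming", 2, hamming)
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, category TEXT, hash INTEGER)")
conn.executemany(
    "INSERT INTO docs (id, category, hash) VALUES (?, ?, ?)",
    [
        (1, "billing", 0b10101100),
        (2, "billing", 0b10001101),
        (3, "support", 0b10101100),  # removed by the metadata filter
        (4, "billing", 0b01010011),
    ],
)

query_hash = 0b10101100
rows = conn.execute(
    """
    SELECT id, hamming(hash, ?) AS dist
    FROM docs
    WHERE category = ?          -- cheap metadata pre-filter
    ORDER BY dist, id           -- closest hashes first, id breaks ties
    LIMIT 2
    """,
    (query_hash, "billing"),
).fetchall()
print(rows)  # [(1, 0), (2, 2)]
conn.close()
```

An ordinary index on the filter column (here `category`) lets SQLite skip non-matching rows entirely before any distance is computed.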

How Do You Implement a Hamming Distance Function in SQLite?

SQLite does not ship with a Hamming distance function, but its C extension API makes custom scalar functions straightforward to register. In Python, using the sqlite3 module, you can register a function that computes the Hamming distance between two integers.

The function accepts two integer arguments representing the binary hashes, computes their XOR, and then counts the set bits using Python's bin().count('1') or a faster bit-manipulation trick. Once registered, the function is available in SQL queries like any built-in function. This enables queries that select rows whose Hamming distance to the query hash falls below a threshold, ordered by ascending distance so the closest matches come first.
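The original code listing did not survive in this copy of the article, so the following is a reconstruction sketch under stated assumptions rather than the author's code. The SQL function name `hamming_distance`, the `items` table, and the threshold value are illustrative. The 64-bit mask is an addition made here so that hashes stored as negative (signed) SQLite integers still count correctly.

```python
import sqlite3

MASK64 = (1 << 64) - 1


def hamming_distance(a, b):
    """Scalar function for SQLite: number of differing bits in two 64-bit hashes."""
    if a is None or b is None:
        return None  # propagate SQL NULL
    return bin((a ^ b) & MASK64).count("1")


conn = sqlite3.connect(":memory:")
# Register under a SQL-visible name with two arguments.
conn.create_function("hamming_distance", 2, hamming_distance)

conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, hash INTEGER)")
conn.executemany(
    "INSERT INTO items (id, hash) VALUES (?, ?)",
    [(1, 0b1111), (2, 0b1100), (3, 0b0000), (4, -1)],  # -1 is all 64 bits set
)

query_hash = 0b1111
threshold = 2
matches = conn.execute(
    """
    SELECT id, hamming_distance(hash, ?) AS dist
    FROM items
    WHERE hamming_distance(hash, ?) <= ?
    ORDER BY dist ASC
    """,
    (query_hash, query_hash, threshold),
).fetchall()
print(matches)  # [(1, 0), (2, 2)]
conn.close()
```

On Python 3.10 and later, `((a ^ b) & MASK64).bit_count()` is a faster alternative to the `bin(...).count("1")` idiom.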

For production use, compiling the popcount logic as a C extension registered through SQLite's sqlite3_create_function API yields roughly 10–100x better performance than interpreted Python, bringing SQLite Hamming search within reach of specialized vector databases for many workloads.

When Should Businesses Choose SQLite Hamming Search over Dedicated Vector Databases?

The choice between SQLite-based Hamming search and a dedicated vector store such as Pinecone or Weaviate, or a PostgreSQL extension such as pgvector, depends on scale, operational complexity, and deployment constraints. SQLite Hamming search is the right choice when simplicity, portability, and cost matter most, which is the case for many business applications.

Dedicated vector databases introduce significant operational overhead: separate infrastructure, network latency, synchronization complexity, and substantial cost at scale. For applications serving tens of thousands to low millions of records, SQLite Hamming search delivers comparable user-facing relevance with zero additional infrastructure. It co-locates your search index with your application data, eliminating a whole class of distributed-systems failure modes.

Frequently Asked Questions

Is Hamming distance search accurate enough for production search applications?

Hamming distance on binary-quantized embeddings trades a small amount of recall for a large gain in speed. In practice, binary quantization typically retains 90–95% of the recall quality of full float32 cosine search. For most business search applications (product discovery, document retrieval, customer support knowledge bases) this trade-off is entirely acceptable, and users generally cannot tell the difference in result quality.

Can SQLite handle concurrent reads and writes during Hamming search queries?

SQLite supports concurrent reads through its WAL (Write-Ahead Logging) mode, which lets multiple readers query simultaneously without blocking. Write concurrency is limited, since SQLite serializes writes, but this is rarely a bottleneck for search-heavy workloads where writes are infrequent relative to reads. For read-heavy hybrid search applications, SQLite's WAL mode is entirely sufficient.
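As a small illustration of the WAL behavior described above, the sketch below enables WAL on a temporary on-disk database (WAL does not apply to in-memory databases) and shows a reader querying while a writer holds an open, uncommitted transaction. The file name `search.db` and the `docs` table are hypothetical.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "search.db")

writer = sqlite3.connect(db_path)
# The pragma returns the journal mode now in effect.
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, hash INTEGER)")
writer.commit()

reader = sqlite3.connect(db_path)

# The sqlite3 module opens a transaction implicitly before this INSERT,
# so the write stays uncommitted until commit() is called.
writer.execute("INSERT INTO docs (id, hash) VALUES (1, 42)")

# The reader is not blocked; it sees the last committed snapshot.
count_during_write = reader.execute("SELECT COUNT(*) FROM docs").fetchall()[0][0]

writer.commit()
count_after_commit = reader.execute("SELECT COUNT(*) FROM docs").fetchall()[0][0]

print(mode, count_during_write, count_after_commit)  # wal 0 1
reader.close()
writer.close()
```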

How does binary quantization affect storage requirements compared with float vectors?

The storage savings are dramatic. A typical 768-dimensional float32 embedding requires 3,072 bytes (3 KB) per record. A 128-bit binary hash of the same embedding requires only 16 bytes, a 192x reduction. For a dataset of 1 million records, this is the difference between roughly 3 GB and 16 MB of embedding storage, which makes Hamming-based search feasible in memory-constrained environments where full float storage is impractical.
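The arithmetic behind these figures can be checked directly. The variable names below are illustrative, and decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) are assumed.

```python
dims = 768
bytes_per_float32 = 4
hash_bits = 128
records = 1_000_000

float_bytes = dims * bytes_per_float32   # bytes per full embedding
hash_bytes = hash_bits // 8              # bytes per binary hash
ratio = float_bytes // hash_bytes        # storage reduction factor

float_total_gb = float_bytes * records / 1e9
hash_total_mb = hash_bytes * records / 1e6

print(float_bytes, hash_bytes, ratio)    # 3072 16 192
print(float_total_gb, hash_total_mb)     # 3.072 16.0
```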

