Do you have heavy LLM or model calls in your data transformation pipelines? CocoIndex can help. It is powered by a high-performance Rust engine and now supports adaptive batching, which improved throughput by ~5× (≈80% reduction in runtime) for AI workloads. Best of all, you don't need to change any code: batching happens automatically, adapting to your traffic and keeping the GPU fully utilized.

Here is what we learned while building adaptive batching support in CocoIndex. But first, let's answer a few questions that may be on your mind.

## Why does batching speed up processing?

Each call to a model or API does two kinds of work:

- **Fixed overhead per call:** All the preparatory and administrative work required before the actual computation can begin. Examples include GPU kernel launch setup, Python-to-C/C++ transitions, scheduling of tasks, memory allocation and management, and bookkeeping performed by the framework. These overhead tasks are largely independent of the input size but must be paid in full for each call.
- **Data-dependent work:** The portion of the computation that scales directly with the size and complexity of the input. It includes the floating-point operations (FLOPs) performed by the model, data movement across memory hierarchies, token processing, and other input-specific operations. Unlike the fixed overhead, this cost grows proportionally with the volume of data being processed.

When items are processed one at a time, the fixed overhead is incurred again for every single item, and it can quickly dominate total runtime, especially when the per-item computation is small. Processing many items together in a batch, by contrast, sharply reduces the per-item share of that overhead. Batching lets fixed costs be amortized across many items, while also unlocking hardware and software optimizations that make the data-dependent work itself more efficient.

Batching dramatically improves performance by making computation and resource usage more efficient. It delivers several compounding benefits:

- **Amortizing one-time overhead:** Each function or API call carries a fixed overhead: GPU kernel launches, Python-to-C/C++ transitions, task scheduling, memory management, and framework bookkeeping. By processing items in batches, this overhead is spread across many inputs, dramatically reducing the per-item cost and eliminating repeated setup work.
- **Maximizing GPU efficiency:** Larger batches allow the GPU to execute operations as dense, highly parallel matrix multiplications, commonly implemented as General Matrix-Matrix Multiplication (GEMM). This mapping keeps the hardware at higher utilization, fully leveraging parallel compute units, minimizing idle cycles, and approaching peak throughput. Small, unbatched operations leave much of the GPU underutilized, wasting expensive computational capacity.
- **Reducing data transfer overhead:** Batching minimizes the frequency of memory transfers between the CPU (host) and GPU (device). Fewer Host-to-Device (H2D) and Device-to-Host (D2H) operations mean less time spent moving data and more time devoted to actual computation. This is critical for high-throughput systems, where memory bandwidth, rather than raw compute power, often becomes the limiting factor.

Combined, these effects yield order-of-magnitude improvements in throughput.
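To make the amortization arithmetic concrete, here is a minimal back-of-the-envelope sketch. The overhead and per-item costs are made-up illustrative numbers, not measurements:

```python
# Toy model of why batching helps. All numbers are illustrative assumptions.

FIXED_OVERHEAD_MS = 5.0  # paid once per call (kernel launch, Python->C++, ...)
PER_ITEM_MS = 0.5        # data-dependent work per item

def unbatched_time(n_items: int) -> float:
    # One call per item: the fixed overhead is paid n times.
    return n_items * (FIXED_OVERHEAD_MS + PER_ITEM_MS)

def batched_time(n_items: int) -> float:
    # One call for the whole batch: the fixed overhead is paid once.
    return FIXED_OVERHEAD_MS + n_items * PER_ITEM_MS

n = 1000
print(unbatched_time(n))  # 5500.0 ms
print(batched_time(n))    # 505.0 ms, roughly 10x faster in this toy model
```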
Batching transforms many small, inefficient computations into large, highly optimized operations that fully exploit the capabilities of modern hardware. For AI workloads, including large language models, computer vision, and real-time data processing, batching is not merely an optimization: it is essential for achieving production-grade performance.

## What batching looks like in typical Python code

### Unbatched code: simple but less efficient

The most natural way to organize a pipeline is to process data piece by piece, for example with a two-level loop like this:

```python
import os

for filename in os.listdir(directory):
    with open(os.path.join(directory, filename)) as f:
        content = f.read()
    chunks = split_into_chunks(content)
    for chunk in chunks:
        vector = model.encode([chunk.text])  # one item at a time
        index.upsert(file_id=filename, chunk_offset=chunk.offset, vector=vector)
```

This is easy to read and reason about: each chunk flows straight through a few steps.

### Manual batching: more efficient but clunky

You can speed this up with batching, but even the simplest "batch everything at once" version makes the code noticeably more complex:

```python
# 1) Collect payloads and remember where each came from
batch_texts = []
metadata = []  # (file_id, chunk_offset)

for filename in os.listdir(directory):
    with open(os.path.join(directory, filename)) as f:
        content = f.read()
    chunks = split_into_chunks(content)
    for chunk in chunks:
        batch_texts.append(chunk.text)
        metadata.append((filename, chunk.offset))

# 2) One batched call (the library will still mini-batch internally)
vectors = model.encode(batch_texts)

# 3) Zip results back to their sources
for (file_name, chunk_offset), vector in zip(metadata, vectors):
    index.upsert(file_id=file_name, chunk_offset=chunk_offset, vector=vector)
```

Moreover, batching everything at once is not ideal anyway, because downstream steps can only start after this step has finished for all the data.

### CocoIndex's batching support

CocoIndex closes this gap and gives you the best of both worlds: your code keeps its simple, natural flow, while the CocoIndex runtime provides the efficiency of batching.

We have already enabled batching for built-in functions, including SentenceTransformerEmbed. There are no breaking changes: your existing code will just work without any change, still following the natural flow while enjoying the efficiency of batching.

For custom functions, enabling batching is as simple as:

- Set batching=True in the custom function decorator.
- Change the argument and return types to lists.

For example, to create a custom function that calls an API to build thumbnails for images:

```python
@cocoindex.op.function(batching=True)
def make_image_thumbnail(self, args: list[bytes]) -> list[bytes]:
    ...
```

See the batching documentation for more details.
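To make the list-in/list-out contract concrete, here is a fleshed-out sketch of such a function. It is illustrative only: Pillow, the 128×128 size, and the standalone signature (without the `self` parameter from the snippet above) are assumptions, not CocoIndex requirements.

```python
import io

from PIL import Image  # illustrative choice; any image library works

import cocoindex


@cocoindex.op.function(batching=True)
def make_image_thumbnail(args: list[bytes]) -> list[bytes]:
    # With batching enabled, CocoIndex hands the function a whole batch
    # window at once: every request queued while the previous batch ran.
    thumbnails = []
    for image_bytes in args:
        img = Image.open(io.BytesIO(image_bytes))
        img.thumbnail((128, 128))  # hypothetical thumbnail size
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        thumbnails.append(buf.getvalue())
    # Return exactly one result per input, in the same order.
    return thumbnails
```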
## How batching is typically done

Batching works by collecting incoming requests in a queue and deciding the right moment to flush them as a single batch. That timing is critical: get it right and you balance throughput, latency, and resource utilization all at once. Two widely used batching policies dominate the landscape:

- **Time-based batching (flush every W milliseconds):** The system flushes all requests that arrived within a fixed window of W milliseconds.
  - *Advantages:* The maximum wait time for any request is predictable, and implementation is straightforward. Even during low traffic, requests will not remain in the queue indefinitely.
  - *Drawbacks:* During periods of sparse traffic, requests trickle in slowly, adding latency for early arrivals. Additionally, the optimal window W often varies with workload characteristics, requiring careful tuning to strike the right balance between latency and throughput.
- **Size-based batching (flush when K items are queued):** A batch is triggered once the queue reaches a predefined number of items, K.
  - *Advantages:* The batch size is predictable, which simplifies memory management and system design. It is easy to reason about the resources each batch will consume.
  - *Drawbacks:* When traffic is light, requests may sit in the queue for an extended period, increasing latency for the first-arriving items. As with time-based batching, the optimal K depends on workload patterns and requires empirical tuning.

Many production systems adopt a hybrid approach: flush when either the time window W expires or the queue reaches size K, whichever comes first. This strategy combines the advantages of both methods, improving responsiveness during light traffic while maintaining efficient batch sizes under heavy load.

Even so, batching always involves tunable parameters and trade-offs. Traffic patterns, workload characteristics, and system constraints all influence the optimal settings, and achieving the best performance often requires monitoring, profiling, and manually adjusting these parameters to match real-world conditions.

## CocoIndex's approach

### Framework level: adaptive and knob-free

CocoIndex implements a simple and natural batching mechanism that adapts automatically to the incoming request load. It works as follows:

- **Continuous queuing:** While the current batch is running on the device (e.g., a GPU), new incoming requests are not processed immediately. Instead, they are queued. This lets the system accumulate work without disrupting the ongoing computation.
- **Automatic batch window:** When the current batch finishes, CocoIndex immediately takes all requests queued in the meantime and treats them as the next batch.
- **Adaptive batching:** There are no timers, no fixed batch sizes, and no preset thresholds. The size of each batch adapts naturally to the traffic that arrived while the previous batch was being served. Busy periods produce larger batches, increasing GPU utilization; quiet periods produce smaller batches, minimizing the wait for early requests.

In essence, CocoIndex's batching is self-tuning: it always processes requests in batches while letting batch sizes reflect real-time demand, achieving high throughput without manual tuning or complicated heuristics.

Why is this nice?

- **Low latency under light load:** With few requests in flight, batches are small (often size 1), so you effectively run at single-call latency.
- **High throughput under bursts:** When traffic spikes, more requests accumulate while a batch is in flight, so the next batch is larger and utilization rises automatically.
- **No tuning:** You never need to pick W or K. The system adapts to your traffic pattern by design.
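The mechanism is easy to picture in code. Below is a minimal Python sketch of the policy described above (CocoIndex's actual implementation lives in its Rust engine; the class and names here are hypothetical):

```python
import queue
import threading


class AdaptiveBatcher:
    """Sketch of adaptive batching: while one batch is being processed,
    new requests accumulate; when it finishes, everything queued becomes
    the next batch. No timers, no window W, no size threshold K."""

    def __init__(self, process_batch):
        self._process_batch = process_batch  # e.g., a batched model call
        self._queue = queue.SimpleQueue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item, on_done):
        self._queue.put((item, on_done))

    def _run(self):
        while True:
            # Block until at least one request arrives...
            pending = [self._queue.get()]
            # ...then drain whatever accumulated while we were busy.
            while True:
                try:
                    pending.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            results = self._process_batch([item for item, _ in pending])
            for (_, on_done), result in zip(pending, results):
                on_done(result)
```

Under light load the drain loop finds nothing, so batches have size 1 and latency stays close to a single call; under heavy load the queue fills while a batch runs, so the next batch grows automatically.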
### Function-level batching: packing the batch intelligently

At the framework level, CocoIndex delivers each function a batch window: all the requests that queued up while the previous batch was running. How it's processed is up to the function, so each one can handle the window in the way that is most efficient and safe for its specific model or library.

Take SentenceTransformerEmbed as an example. The underlying sentence-transformers library accepts batches of any size, but internally it splits them into micro-batches (default size 32) so that each one fits comfortably in device memory and keeps the GPU kernels in their sweet spot.

Batching is not just about fitting data into memory; it is also about minimizing wasted computation. Libraries typically pad every sequence in a batch to the length of the longest sequence, which lets the GPU run uniform, highly parallel kernels. The flip side is that shorter sequences pay the cost of the longest sequence in the batch: mixing 64-token items with 256-token items makes the 64-token items cost roughly 4× more than necessary. SentenceTransformerEmbed mitigates this by sorting requests by token count and forming micro-batches of nearly equal lengths, reducing padding overhead and keeping GPU utilization high (see the sketch after this section).

Other functions can apply their own strategies: some may simply forward the full batch to a backend, while others may implement custom packing schemes such as SIMD tiles or merged writes. CocoIndex stays agnostic about the method; its job is to deliver the batch window efficiently and without delay, giving each function full control over how to maximize throughput and minimize overhead.

This layered design balances simplicity, flexibility, and performance. The framework handles batching orchestration, while the function itself optimizes for memory, compute, and kernel efficiency, ensuring high throughput across diverse workloads without forcing a one-size-fits-all solution.
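A simplified sketch of that length-sorting idea is below. The whitespace token count and the micro-batch size of 32 are stand-ins (sentence-transformers uses its real tokenizer and handles this internally):

```python
def make_micro_batches(texts: list[str], micro_batch_size: int = 32) -> list[list[str]]:
    """Group texts of similar length so that padding each micro-batch to
    its longest member wastes as little compute as possible."""
    # Approximate token count by whitespace splitting; real code would
    # use the model's tokenizer.
    by_length = sorted(texts, key=lambda t: len(t.split()))
    return [
        by_length[i : i + micro_batch_size]
        for i in range(0, len(by_length), micro_batch_size)
    ]
```

With this grouping, a 64-token text lands next to other short texts instead of being padded up to a 256-token neighbor.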
## Conclusion

Batching is one of the most effective strategies for accelerating computational workloads. By amortizing fixed overhead across multiple items, enabling larger, more efficient GPU operations, and minimizing data transfer, batching turns what would otherwise be many small, inefficient computations into a few large, efficient ones.

CocoIndex makes batching effortless and automatic. Several built-in functions already batch under the hood, and custom functions can easily opt in with batching=True. This removes the complexity of managing manual batches, timers, or batch-size thresholds, letting developers focus on their models and applications.

The performance gains from batching are largest when fixed overhead represents a significant portion of total computation, such as with small models or lightweight operations. Batching is also most effective when the underlying API or library fully supports batched operations, since partial support limits the gains; for example, some libraries such as Ollama show only modest improvements from batching.

In short, batching is a high-leverage optimization: it unlocks substantial throughput, reduces latency where it matters, and lets hardware run near its full capacity, all while keeping the developer experience simple and predictable.

Support us by giving CocoIndex a star on GitHub and sharing it with your community if you find it helpful!