import React, { useState, useEffect, useRef } from 'react'
import * as d3 from 'd3'
import { useTheme } from '../context/themeContext'
import MODELS from '../utils/models'
import DEVICES from '../utils/devices'

// Quantization precisions the charts understand; used to scale memory estimates.
type Precision = 'fp32' | 'fp16' | 'int8' | 'int4'

/**
 * Props for the bar chart that compares one model's memory footprint
 * against the largest model (at fp32) and, optionally, a device's memory.
 */
interface ModelSizeBarChartProps {
  modelSize: number // in GB
  largestModelSize: number // largest model in full precision (fp32)
  modelPrecision: Precision // enum of fp32, fp16, int8, int4
  deviceMemorySet: boolean // true if device memory is set
}

/**
 * Props for the line chart plotting inference-time memory headroom:
 * how many inputs fit given per-precision available memory.
 */
interface InferenceRuntimeLineChartProps {
  availableMemory: AvailableMemory // in GB
  memoryPerInput: number // in GB
}

// One point on the runtime line chart: max batch size at a given sequence length.
interface LineChartData {
  seqLength: number
  batchSize: number
}

// Memory (GB) left over for inference at each supported precision.
interface AvailableMemory {
  int4: number
  int8: number
  fp16: number
  fp32: number
}

// Table containing the mapping of backends to precisions
// NOTE(review): the JSX body of this component appears garbled in this
// extraction — only its rendered text survives below; recover the original
// markup from version control before editing the component itself.
const BackendPrecisionTable = () => {
  return (
Backend | GPU | CPU | Accuracy |
---|---|---|---|
fast | 16 | 16 | ⭐⭐⭐ |
compress-fast | 4 | 8 | ⭐⭐ |
compress | 4 | 4 | ⭐ |
baseline | 16 | 16 | ⭐⭐⭐ |
`prompt_new_tokens` for input tokens and `max_new_tokens` for output tokens when making a request.
`TAKEOFF_MAX_BATCH_SIZE` to your desired value.