Gemma 3n includes the following key features:
Audio input: Process sound data for speech recognition, translation, and audio data analysis.
Visual and text input: Multimodal capabilities let you process vision, sound, and text input together to understand and analyze the world around you.
PLE caching: Per-Layer Embedding (PLE) parameters contained in these models can be cached to fast, local storage to reduce the memory cost of running the model.
MatFormer architecture: The Matryoshka Transformer architecture allows selective activation of the model's parameters per request to reduce compute cost and response times.
Conditional parameter loading: Skip loading the vision and audio parameters of the model to reduce the total number of loaded parameters and save memory resources.
Wide language support: Wide linguistic capabilities, trained in over 140 languages.
32K token context: Substantial input context for analyzing data and handling processing tasks.
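
The conditional parameter loading feature listed above can be illustrated with a minimal sketch. This is not Gemma 3n's loading code or any real library API: the component names, the `select_components` and `loaded_params` helpers, and the parameter sizes are all hypothetical placeholders (arbitrary units), chosen only to show how skipping the vision and audio parameters shrinks the loaded total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    modality: str     # "text", "vision", or "audio"
    param_count: int  # placeholder size in arbitrary units, not a real figure


# Hypothetical component table; names and sizes are illustrative assumptions.
COMPONENTS = [
    Component("text_decoder", "text", 100),
    Component("vision_encoder", "vision", 30),
    Component("audio_encoder", "audio", 20),
]


def select_components(enabled_modalities):
    """Return the components to load for the requested modalities.

    In this sketch the text parameters are always loaded, while vision
    and audio parameters are skipped unless explicitly requested.
    """
    wanted = {"text"} | set(enabled_modalities)
    return [c for c in COMPONENTS if c.modality in wanted]


def loaded_params(enabled_modalities):
    """Sum the placeholder parameter counts of the selected components."""
    return sum(c.param_count for c in select_components(enabled_modalities))


if __name__ == "__main__":
    for modalities in ([], ["vision"], ["vision", "audio"]):
        names = [c.name for c in select_components(modalities)]
        print(modalities, names, loaded_params(modalities))
```

With these placeholder sizes, a text-only request loads only the text component, and each extra modality adds its own parameters on top. The actual savings depend on the real model and runtime, which this sketch does not model.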