
Meta Unveils Open Source Llama 3.2: AI That Sees And Fits in Your Pocket

It’s been a good week for open-source AI.

On Wednesday, Meta announced an upgrade to its state-of-the-art large language model, Llama 3.2, and it doesn’t just talk—it sees.

More intriguing, some versions can squeeze into your smartphone without losing quality, which means you could potentially have private local AI interactions, apps and customizations without sending your data to third party servers.

Unveiled Wednesday during Meta Connect, Llama 3.2 comes in four flavors, each packing a different punch. The heavyweight contenders—11B and 90B parameter models—flex their muscles with both text and image processing capabilities.

They can tackle complex tasks such as analyzing charts, captioning images, and even pinpointing objects in pictures based on natural language descriptions.

Llama 3.2 arrived the same week as the Allen Institute's Molmo, which claimed to be the best open-source multimodal vision LLM on synthetic benchmarks and, in our tests, performed on par with GPT-4o, Claude 3.5 Sonnet, and Reka Core.

Zuckerberg’s company also introduced two new flyweight champions: a pair of 1B and 3B parameter models designed for efficiency, speed, and limited but repetitive tasks that don’t require too much computation.

These small models are multilingual text maestros with a knack for “tool-calling,” meaning they can invoke external tools and functions that developers define. Despite their diminutive size, they boast an impressive 128K token context window—the same as GPT-4o and other powerful models—making them ideal for on-device summarization, instruction following, and rewriting tasks.
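
For a sense of what on-device use looks like in practice, here is a minimal sketch of running one of the small instruct models for summarization with Hugging Face transformers. The model identifier and generation settings are our assumptions; the checkpoint is gated and requires accepting Meta's license on Hugging Face.

```python
# Minimal sketch: local summarization with a small Llama 3.2 text model.
# Assumes a recent transformers release and access to the gated checkpoint.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # assumed model ID; license acceptance required
    device_map="auto",
)

article = "Meta released Llama 3.2, adding vision models and small on-device text models..."
messages = [
    {"role": "user", "content": f"Summarize the following in two sentences:\n\n{article}"}
]

# Chat-style pipelines accept a list of messages and apply the model's chat template.
result = generator(messages, max_new_tokens=120)
print(result[0]["generated_text"][-1]["content"])  # last message is the model's reply
```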

Meta’s engineering team pulled off some serious digital gymnastics to make this happen. First, they used structured pruning to trim unnecessary parameters from the larger models, then employed knowledge distillation—transferring knowledge from large models to smaller ones—to squeeze extra smarts into the compact networks.
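
For readers unfamiliar with distillation, here is a generic, illustrative sketch of the logit-matching loss the technique typically uses; this is not Meta's training code, just the textbook idea of a student model mimicking a frozen teacher's softened outputs alongside the usual next-token objective.

```python
# Illustrative knowledge-distillation loss (standard recipe, not Meta's actual code).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the ground-truth next tokens.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft_loss + (1 - alpha) * hard_loss
```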

The result was a set of compact models that outperformed rivals in their weight class, besting models including Google’s Gemma 2 2.6B and Microsoft’s Phi-2 2.7B on various benchmarks.

Meta is also working hard to boost on-device AI. They’ve forged alliances with hardware titans Qualcomm, MediaTek, and Arm to ensure Llama 3.2 plays nice with mobile chips from day one. Cloud computing giants aren’t left out either—AWS, Google Cloud, Microsoft Azure, and a host of others are offering instant access to the new models on their platforms.

Under the hood, Llama 3.2’s vision capabilities come from clever architectural tweaking. Meta’s engineers baked adapter weights into the existing language model, creating a bridge between pre-trained image encoders and the text-processing core.
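
Meta hasn't published the full recipe here, but the general idea of an adapter that lets a frozen language model attend to image features can be sketched conceptually like this. The dimensions, names, and gating below are illustrative assumptions, not Meta's actual architecture.

```python
# Conceptual sketch of a vision adapter: cross-attention from frozen text hidden
# states onto projected image-encoder features. Illustrative only.
import torch
import torch.nn as nn

class VisionAdapterLayer(nn.Module):
    def __init__(self, text_dim=4096, vision_dim=1280, num_heads=32):
        super().__init__()
        self.vision_proj = nn.Linear(vision_dim, text_dim)   # map image features into text space
        self.cross_attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))              # gated so training starts as a no-op

    def forward(self, text_hidden, image_features):
        # text_hidden: (batch, seq_len, text_dim) from the frozen language model
        # image_features: (batch, num_patches, vision_dim) from the frozen image encoder
        img = self.vision_proj(image_features)
        attended, _ = self.cross_attn(query=text_hidden, key=img, value=img)
        # Residual connection keeps the original text pathway intact.
        return text_hidden + torch.tanh(self.gate) * attended
```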

In other words, the model’s vision capabilities don’t come at the expense of its text processing competence, so users can expect similar or better text results when compared to Llama 3.1.

The Llama 3.2 release is open source—at least by Meta’s standards. Meta is making the models available for download on Llama.com and Hugging Face, as well as through its extensive partner ecosystem.

Those interested in running it in the cloud can use their own Google Colab notebook or use Groq for text-based interactions, generating nearly 5,000 tokens in less than three seconds.
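
As a rough sketch, querying the hosted model through Groq's Python SDK looks something like the following. The model identifier is an assumption, so check Groq's current model list, and the client expects a GROQ_API_KEY environment variable.

```python
# Minimal sketch of a text-only Llama 3.2 query via Groq's cloud API.
import os
from groq import Groq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

completion = client.chat.completions.create(
    model="llama-3.2-90b-text-preview",  # assumed identifier; verify against Groq's model list
    messages=[
        {"role": "user", "content": "Explain what a 128K-token context window means."}
    ],
    max_tokens=300,
)
print(completion.choices[0].message.content)
```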

Riding the Llama

We put Llama 3.2 through its paces, quickly testing its capabilities across various tasks.

In text-based interactions, the model performs on par with its predecessors. However, its coding abilities yielded mixed results.

When tested on Groq’s platform, Llama 3.2 successfully generated code for popular games and simple programs. Yet the smaller model stumbled when asked to create functional code for a custom game we devised. The more powerful 90B, however, fared much better, generating a functional game on the first try.

You can see the full code generated by Llama 3.2 and all the other models we tested by clicking on this link.

Identifying styles and subjective elements in images

Llama 3.2 excels at identifying subjective elements in images. When presented with a futuristic, cyberpunk-style image and asked if it fit the steampunk aesthetic, the model accurately identified the style and its elements. It provided a satisfactory explanation, noting that the image didn’t align with steampunk due to the absence of key elements associated with that genre.
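
To reproduce this kind of subjective image Q&A locally, a sketch along the following lines should work, assuming a recent transformers release with Llama 3.2 vision support. The model ID, image file, and prompt here are illustrative.

```python
# Sketch of image question-answering with the 11B vision-instruct model via transformers.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # gated; accept Meta's license first
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("cyberpunk_scene.png")  # hypothetical local file
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Does this image fit the steampunk aesthetic? Explain."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))
```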

Chart Analysis (and SD image recognition)

Chart analysis is another strong suit for Llama 3.2, though it does require high-resolution images for optimal performance. When we input a screenshot containing a chart—one that other models like Molmo or Reka could interpret—Llama’s vision capabilities faltered. The model apologized, explaining that it couldn’t read the letters properly due to the image quality.

Text in Image Identification

While Llama 3.2 struggled with small text in our chart, it performed flawlessly when reading text in larger images. We showed it a presentation slide introducing a person, and the model successfully understood the context, distinguishing between the name and job role without any errors.

Verdict

Overall, Llama 3.2 is a big improvement over its previous generation and is a great addition to the open-source AI industry. Its strengths are in image interpretation and large-text recognition, with some areas for potential improvement, particularly in processing lower-quality images and tackling complex, custom coding tasks.

The promise of on-device compatibility also bodes well for the future of private, local AI and serves as a strong counterweight to closed offerings like Gemini Nano and Apple’s proprietary models.

Edited by Josh Quittner and Sebastian Sinclair


Source: Jose Antonio Lanz, Decrypt (2024-09-27): https://decrypt.co/283308/meta-unveils-open-source-llama-ai-fits-in-your-pocket
