Solidity Metadata Exposure
Theory
When you compile a Solidity contract, solc emits not only executable EVM bytecode but also a metadata JSON file containing rich information about the compilation and the contract's structure.
Crucially, a reference to this metadata is appended directly to the deployed runtime bytecode. This is deliberate: Solidity intends tools such as wallets, explorers, and verifiers to fetch interface and documentation data automatically. It also means that anyone who inspects a contract's bytecode can reach more information than its authors may have intended to expose (Solidity).
What’s in “Metadata”
The metadata file is a canonical JSON structure describing:
compiler version and settings (optimization, paths, options)
language and version attributes
ABI (Application Binary Interface) definitions
NatSpec documentation (developer and user comments)
source file references and cryptographic hashes
any libraries used, including their encoded addresses
That JSON is essential for reproducible builds and verification. However, because the deployed bytecode points to this rich, human-readable description, security-relevant information can be recovered off-chain simply by inspecting deployed code.
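To make the list above concrete, here is a small sketch that parses a hand-written, abbreviated metadata document and pulls out the fields an auditor usually checks first. The field names follow the solc metadata layout (compiler, language, output, settings, sources, version); every value, and the helper name `summarize`, is made up for illustration.

```python
import json

# Hand-written, abbreviated metadata for illustration only.
# Field names follow the solc metadata layout; all values are placeholders.
SAMPLE_METADATA = """
{
  "compiler": {"version": "0.8.24+commit.e11b9ed9"},
  "language": "Solidity",
  "output": {
    "abi": [{"type": "function", "name": "withdraw", "inputs": [],
             "outputs": [], "stateMutability": "nonpayable"}],
    "devdoc": {"kind": "dev", "methods": {"withdraw()": {"details": "Only callable by the owner."}}, "version": 1},
    "userdoc": {"kind": "user", "methods": {}, "version": 1}
  },
  "settings": {
    "compilationTarget": {"src/Vault.sol": "Vault"},
    "evmVersion": "paris",
    "libraries": {},
    "metadata": {"bytecodeHash": "ipfs"},
    "optimizer": {"enabled": true, "runs": 200},
    "remappings": []
  },
  "sources": {
    "src/Vault.sol": {"keccak256": "0x00", "license": "MIT", "urls": ["dweb:/ipfs/Qm..."]}
  },
  "version": 1
}
"""


def summarize(metadata: dict) -> dict:
    """Collect the security-relevant fields from a parsed metadata document."""
    settings = metadata.get("settings", {})
    output = metadata.get("output", {})
    return {
        "compiler": metadata["compiler"]["version"],
        "optimizer": settings.get("optimizer", {}),
        "evm_version": settings.get("evmVersion"),
        # Function names from the ABI.
        "functions": [e["name"] for e in output.get("abi", []) if e.get("type") == "function"],
        # NatSpec @dev notes, keyed by function signature.
        "dev_notes": {
            sig: m.get("details")
            for sig, m in output.get("devdoc", {}).get("methods", {}).items()
        },
        # Source unit names and the URLs where their contents may be fetched.
        "sources": {path: src.get("urls", []) for path, src in metadata.get("sources", {}).items()},
        "libraries": settings.get("libraries", {}),
    }


summary = summarize(json.loads(SAMPLE_METADATA))
for key, value in summary.items():
    print(f"{key}: {value}")
```

Note that even the source unit names (`src/Vault.sol` here) are published. If a project compiles with absolute paths, those paths end up in the metadata as well.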
How Metadata Is Embedded in Bytecode
The Solidity compiler does not embed the full metadata JSON in the bytecode. Instead, it appends a compact reference to that JSON at the very end of the runtime bytecode. This reference is encoded with CBOR (Concise Binary Object Representation), a binary serialization designed for compactness, not secrecy, and is followed by its length as a two-byte big-endian integer.
To extract it:
Read the last two bytes of the deployed bytecode.
Treat those two bytes as a big-endian integer, giving the length L of the CBOR payload.
Look at the L bytes immediately before those two bytes; that is the CBOR-encoded metadata map.
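The three steps translate almost directly into slicing. Below is a minimal sketch, assuming the runtime bytecode is available as a hex string. The function name `split_cbor_trailer` and the toy bytecode are hypothetical, and the toy trailer is simplified to a single `solc` entry.

```python
def split_cbor_trailer(runtime_hex: str) -> tuple[bytes, bytes]:
    """Return (code_without_trailer, cbor_payload) for hex runtime bytecode."""
    code = bytes.fromhex(runtime_hex.removeprefix("0x"))
    if len(code) < 2:
        raise ValueError("bytecode too short to contain a metadata trailer")
    # Steps 1-2: the last two bytes, read big-endian, give the CBOR length L.
    cbor_len = int.from_bytes(code[-2:], "big")
    if cbor_len + 2 > len(code):
        raise ValueError(f"declared CBOR length {cbor_len} exceeds bytecode size")
    # Step 3: the L bytes immediately before the length field are the CBOR map.
    start = len(code) - 2 - cbor_len
    return code[:start], code[start:-2]


# Toy input: 5 bytes of code, a 10-byte CBOR map {"solc": 0x000818}, length 0x000a.
runtime = "0x6080604052" + "a164736f6c6343000818" + "000a"
code, cbor = split_cbor_trailer(runtime)
print(code.hex(), cbor.hex())  # 6080604052 a164736f6c6343000818
```

If a contract was compiled without the trailer, its last two bytes are ordinary code and the computed length is meaningless. The extracted payload should therefore be validated, for example by checking that it decodes as a CBOR map.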
Inside that CBOR map, Solidity typically includes keys such as "ipfs" (an IPFS CID pointing to the full metadata JSON) and "solc" (the compiler version). It may also include additional optional keys (e.g., experimental flags or alternate hash types such as Swarm) if certain compiler settings were used.
A simplified conceptual structure of the end of a smart contract's runtime bytecode looks like this:
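The layout of the common case can be reproduced byte by byte: a two-entry CBOR map holding an `ipfs` multihash and a 3-byte `solc` version, followed by the 2-byte length. The sketch below assembles such a trailer following the example layout in the Solidity documentation. The helper name is hypothetical, and the digest is a placeholder, since a real one is derived by IPFS from the metadata file.

```python
def build_example_trailer(digest: bytes, version: tuple[int, int, int]) -> bytes:
    """Assemble a trailer shaped like the layout in the Solidity docs:
    a two-entry CBOR map {"ipfs": multihash, "solc": version bytes}
    followed by the map's length as a 2-byte big-endian integer."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte digest")
    # Multihash prefix: 0x12 = sha2-256, 0x20 = 32-byte digest, 34 bytes in total.
    multihash = bytes([0x12, 0x20]) + digest
    cbor = (
        bytes([0xA2])                      # map with 2 entries
        + bytes([0x64]) + b"ipfs"          # text string of length 4
        + bytes([0x58, 0x22]) + multihash  # byte string, 1-byte length 0x22 (34)
        + bytes([0x64]) + b"solc"          # text string of length 4
        + bytes([0x43]) + bytes(version)   # byte string of length 3: major, minor, patch
    )
    return cbor + len(cbor).to_bytes(2, "big")


# Placeholder digest; not a real metadata hash.
trailer = build_example_trailer(bytes(range(32)), (0, 8, 24))
print(len(trailer), trailer[-2:].hex())  # 53 0033
```

The CBOR map itself is 51 bytes (0x33), which is why so much solc output ends in `0033`.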
Since CBOR is schemaless and map entries can vary, the only reliable extraction method is:
read last two bytes for length,
then run a CBOR decoder on the preceding segment.
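A full CBOR library is not strictly required for this step. The sketch below is a deliberately small decoder that handles only the item types solc's trailer normally uses: maps, text strings, byte strings, unsigned integers, and booleans. It is a hand-rolled illustration of RFC 8949 decoding, not a complete implementation, and it rejects anything else. All function names are hypothetical.

```python
def decode_cbor(data: bytes):
    """Decode one CBOR item and require that it consumes all of `data`."""
    value, end = _decode_item(data, 0)
    if end != len(data):
        raise ValueError(f"{len(data) - end} trailing bytes after CBOR item")
    return value


def _read_argument(data: bytes, pos: int, info: int) -> tuple[int, int]:
    """Read the unsigned argument selected by the low 5 bits of the initial byte."""
    if info < 24:
        return info, pos  # small values are stored in the initial byte itself
    if info in (24, 25, 26, 27):
        size = 1 << (info - 24)  # 1, 2, 4 or 8 following bytes
        if pos + size > len(data):
            raise ValueError("truncated CBOR argument")
        return int.from_bytes(data[pos:pos + size], "big"), pos + size
    raise ValueError(f"unsupported additional info {info} (e.g. indefinite length)")


def _decode_item(data: bytes, pos: int):
    if pos >= len(data):
        raise ValueError("unexpected end of CBOR data")
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if major == 7:  # simple values; solc uses booleans for flags such as "experimental"
        simple = {20: False, 21: True, 22: None}
        if info in simple:
            return simple[info], pos
        raise ValueError(f"unsupported simple value {info}")
    arg, pos = _read_argument(data, pos, info)
    if major == 0:  # unsigned integer
        return arg, pos
    if major in (2, 3):  # byte string / UTF-8 text string of length `arg`
        if pos + arg > len(data):
            raise ValueError("truncated CBOR string")
        raw = data[pos:pos + arg]
        return (raw if major == 2 else raw.decode("utf-8")), pos + arg
    if major == 5:  # map with `arg` key/value pairs
        result = {}
        for _ in range(arg):
            key, pos = _decode_item(data, pos)
            value, pos = _decode_item(data, pos)
            result[key] = value
        return result, pos
    raise ValueError(f"major type {major} not handled by this sketch")


print(decode_cbor(bytes.fromhex("a164736f6c6343000818")))  # {'solc': b'\x00\x08\x18'}
```

In practice an established decoder is the safer choice. The point here is only that the format is simple enough that nothing about it is hidden.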
What CBOR Actually Encodes
CBOR encodes such maps compactly. The trailer solc appends typically decodes to a two-entry map of the form {"ipfs": <34-byte multihash>, "solc": <3-byte version>}.
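Where:
"ipfs" holds the raw 34-byte multihash that identifies the full metadata JSON on IPFS; base58-encoded, it is the CIDv0 (a string starting with "Qm") under which the file can be fetched.
"solc" holds the compiler version. For release builds this is 3 bytes representing the major, minor, and patch version numbers (for example, 0x000818 for 0.8.24); prerelease builds store the full version string instead.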
CBOR's type and length prefixes are not human-readable, but a CBOR decoder readily converts them into a JSON-like map. This mapping is exactly what you obtain by extracting the trailer from a contract's runtime bytecode and running it through a CBOR decoder. Tools like SolMeta automate this extraction.
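Once decoded, the two values still need interpreting. The version bytes become a dotted version string, and the multihash becomes a CIDv0 through base58btc encoding, which is the form IPFS gateways expect. The sketch below covers both steps. The function names are hypothetical, and the prerelease branch assumes the full version is stored as a string.

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Bitcoin-style base58, as used for IPFS CIDv0 strings."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = BASE58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading "1".
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def solc_version(raw) -> str:
    """Release builds: 3 bytes (major, minor, patch). Otherwise assume a full version string."""
    if isinstance(raw, bytes) and len(raw) == 3:
        return ".".join(str(b) for b in raw)
    return raw if isinstance(raw, str) else raw.decode("utf-8", "replace")


def ipfs_cid_v0(multihash: bytes) -> str:
    """Turn the raw "ipfs" value into a CIDv0 string (Qm...)."""
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError("expected a 34-byte sha2-256 multihash (0x1220 prefix)")
    return base58btc(multihash)


print(solc_version(bytes([0, 8, 24])))  # 0.8.24
print(ipfs_cid_v0(b"\x12\x20" + bytes(range(32))))
```

The resulting CID can be appended to any IPFS gateway URL to request the metadata JSON. Whether the request succeeds depends on someone having published that file.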
Why This Exposure Matters
For security review, the exposure of metadata references is significant:
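Exact Compiler Version & Settings: The precise compiler version (from the CBOR "solc" entry) and the compilation settings (from the full metadata JSON) reveal which known compiler bugs or optimizer issues may apply, turning benign-looking code into a vulnerability candidate.
Source & ABI Access: Once you resolve the IPFS CID from the CBOR map, you can fetch the full metadata JSON. It contains the ABI and NatSpec docstrings directly, and it lists every source file with its keccak256 hash and IPFS/Swarm URLs, from which the sources themselves can often be fetched. This undermines any "obfuscation" that relied on keeping source code non-public.
Automated Attack Planning: With full source and ABI data available, tools such as Slither (static analysis), Echidna (fuzzing), and hevm (symbolic execution) can analyze the contract far more deeply than bytecode-only tooling.
Sensitive Parameters: Metadata can reveal details the developers never meant to publish, such as local file paths in source unit names and remappings, internal notes in NatSpec developer comments, and the addresses of linked libraries.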
Disabling Metadata Append
Solidity provides a compile-time option (--no-cbor-metadata on the command line, or settings.metadata.appendCBOR: false in the Standard JSON interface) to omit the CBOR metadata trailer entirely. Omitting it saves a little deployment gas and removes the on-chain pointer to the metadata, but it can also break source-verification workflows on services such as Etherscan or Sourcify.
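For the Standard JSON route, the setting sits under `settings.metadata`. The sketch below builds such an input for a trivial contract and prints it. The file name `C.sol` and the contract are hypothetical, and the output selection is just one reasonable choice.

```python
import json

# Trivial placeholder contract; any source works the same way.
SOURCE = (
    "// SPDX-License-Identifier: MIT\n"
    "pragma solidity ^0.8.18;\n"
    "contract C { function f() external pure returns (uint256) { return 1; } }\n"
)

standard_input = {
    "language": "Solidity",
    "sources": {"C.sol": {"content": SOURCE}},
    "settings": {
        # Do not append the CBOR metadata trailer to the bytecode.
        "metadata": {"appendCBOR": False},
        # Still request the metadata JSON so the build stays verifiable off-chain.
        "outputSelection": {"*": {"*": ["evm.deployedBytecode.object", "metadata"]}},
    },
}

print(json.dumps(standard_input, indent=2))
```

Piping the output into `solc --standard-json` should produce bytecode without the trailer. The metadata JSON is still generated, though, so this setting only removes the on-chain pointer and does not make the metadata itself secret.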
Practice
In a practical audit or pentest, your first step when encountering unknown bytecode on-chain is to extract and decode the CBOR metadata.
SolMeta is a Python tool for extracting Solidity smart contract metadata from bytecode or from a contract address.
SolMeta automates the entire process:
Retrieves the runtime bytecode via RPC, or reads it from a provided file.
Extracts the last 2 bytes to determine CBOR length.
Reads and decodes the CBOR map containing IPFS/Swarm metadata references.
Fetches the full metadata JSON from IPFS or Swarm.
Outputs ABI, compiler version, and source file references.
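SolMeta's own code is not reproduced here. The network-facing steps (fetching the runtime bytecode and then the metadata) can be sketched with the standard library alone, using the standard `eth_getCode` JSON-RPC method and an IPFS HTTP gateway. The helper names, the default gateway `https://ipfs.io/ipfs/`, and the `dweb:/ipfs/` URL handling are assumptions for illustration.

```python
import json
import urllib.request

DEFAULT_GATEWAY = "https://ipfs.io/ipfs/"  # assumption: any IPFS HTTP gateway works


def get_code_request(address: str, block: str = "latest") -> dict:
    """JSON-RPC body for the standard eth_getCode method."""
    return {"jsonrpc": "2.0", "id": 1, "method": "eth_getCode", "params": [address, block]}


def fetch_runtime_code(rpc_url: str, address: str) -> str:
    """Return the deployed runtime bytecode as a hex string ("0x" if no code)."""
    body = json.dumps(get_code_request(address)).encode()
    req = urllib.request.Request(rpc_url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


def metadata_url(cid: str, gateway: str = DEFAULT_GATEWAY) -> str:
    return gateway.rstrip("/") + "/" + cid


def dweb_to_gateway(url: str, gateway: str = DEFAULT_GATEWAY):
    """Map a "dweb:/ipfs/<cid>" source URL from the metadata to a gateway URL."""
    prefix = "dweb:/ipfs/"
    return metadata_url(url[len(prefix):], gateway) if url.startswith(prefix) else None


def fetch_metadata(cid: str, gateway: str = DEFAULT_GATEWAY) -> dict:
    """Download and parse the metadata JSON; fails if nobody has published it."""
    with urllib.request.urlopen(metadata_url(cid, gateway), timeout=30) as resp:
        return json.load(resp)
```

Separating the pure helpers (request body and URL construction) from the functions that perform I/O keeps the parsing logic testable offline. Only `fetch_runtime_code` and `fetch_metadata` touch the network.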
For a browser-based workflow, the Sourcify Playground (https://playground.sourcify.dev/) can:
Extract CBOR metadata
Fetch the full metadata JSON
Reconstruct ABI and source references for quick inspection
Resources