Security

Hierarchical Deterministic Wallets in Practice: BIP-32/39/44 and Where Real Implementations Go Wrong

BIP-32/39/44 HD wallet derivation explained for engineers: hardened vs normal derivation, entropy mistakes, multi-chain paths, and common implementation bugs from production custody work.

10 min
#hd-wallets #bip32 #bip39 #bip44 #key-derivation #custody #cryptography

Implementing HD wallet derivation for a multi-chain custody platform is one of those tasks where the specification seems clear until you try to build it. BIP-32, BIP-39, and BIP-44 are each understandable on their own, but their interaction has corner cases that have caused real losses of funds - not from cryptographic flaws, but from engineering mistakes in derivation path logic, entropy sourcing, and backup procedures.

This post covers the full derivation chain from mnemonic entropy to leaf address, with specific attention to the mistakes I have seen in production implementations.

BIP-39: From Entropy to Mnemonic

BIP-39 defines the conversion from raw entropy to a human-readable mnemonic phrase. The flow:

Raw entropy (128 or 256 bits)
    ↓ SHA-256 hash
Checksum (4 or 8 bits from hash)
    ↓ append checksum to entropy
Final bit string (132 or 264 bits)
    ↓ split into 12 or 24 groups of 11 bits
Word indices (each 0-2047)
    ↓ map via wordlist (BIP-39 2048-word English list)
Mnemonic (12 or 24 words)
    ↓ PBKDF2(password = mnemonic, salt = "mnemonic" + optional passphrase, 2048 rounds, HMAC-SHA512)
512-bit seed
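The flow above can be reproduced with the standard library alone. The following is a minimal sketch for illustration, not a replacement for an audited BIP-39 library: the function names `entropy_to_word_indices` and `bip39_seed` are chosen here, and the wordlist lookup is left out, so the first function stops at the 11-bit word indices.

```python
import hashlib
import unicodedata
from typing import List


def entropy_to_word_indices(entropy: bytes) -> List[int]:
    """Entropy -> checksum -> 11-bit word indices (wordlist lookup omitted)."""
    if len(entropy) not in (16, 20, 24, 28, 32):
        raise ValueError("BIP-39 entropy must be 128-256 bits in 32-bit steps")
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32  # 4 bits for 128-bit entropy, 8 for 256-bit
    first_hash_byte = hashlib.sha256(entropy).digest()[0]
    checksum = first_hash_byte >> (8 - cs_bits)  # leading cs_bits of the hash
    combined = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    total_bits = ent_bits + cs_bits  # 132 or 264
    return [
        (combined >> (total_bits - 11 * (i + 1))) & 0x7FF
        for i in range(total_bits // 11)
    ]


def bip39_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """PBKDF2-HMAC-SHA512: password = mnemonic, salt = "mnemonic" + passphrase."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


# All-zero entropy is used here only because its result is easy to check by eye.
print(entropy_to_word_indices(bytes(16)))  # eleven 0s followed by 3
```

Index 0 is "abandon" and index 3 is "about" in the English wordlist, so the all-zero input corresponds to the familiar test mnemonic of eleven "abandon" followed by "about". Note that the passphrase only enters through the salt: the same mnemonic with a different passphrase yields an unrelated seed.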

Use 24 words, not 12. This is non-negotiable for institutional custody. 12 words = 128 bits of entropy. 24 words = 256 bits of entropy. While 128 bits is computationally infeasible to brute-force with today’s hardware, the argument for 24 words is not about today’s compute - it is about the 20-30 year holding period of institutional funds and the margin of safety against algorithmic advances. Use 256 bits.

Entropy sources matter. The BIP-39 entropy must be cryptographically random. Several mistakes I have seen:

# WRONG: Python's random module is not cryptographically secure
import random
entropy = random.randbytes(32)  # Do not use this

# WRONG: seeding with a timestamp
import time
random.seed(time.time())  # Predictable

# CORRECT: os.urandom() draws from the operating system CSPRNG
import os
entropy = os.urandom(32)  # 256 bits for 24-word mnemonic

# ALSO CORRECT: the secrets module wraps the same OS CSPRNG
import secrets
entropy = secrets.token_bytes(32)

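The VM startup entropy problem. /dev/urandom on a freshly-started virtual machine may be read before the kernel entropy pool has been seeded. On AWS, the Nitro hypervisor seeds the entropy pool from hardware RNG before the VM is visible. On other virtualization platforms, the entropy pool may not be seeded that early. Python 3.6+ reduces the exposure on Linux, because os.urandom() blocks until the kernel pool has been initialized, whereas a direct read of /dev/urandom does not. Check: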

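# Check entropy available (should reach 256 bits before key generation;
# newer kernels cap the pool at 256, so the value never exceeds that)
cat /proc/sys/kernel/random/entropy_avail
# If < 256 on a fresh VM, wait or use haveged/rng-tools

# Show the reseed interval (informational only; it does not verify the entropy source)
cat /proc/sys/kernel/random/urandom_min_reseed_secs
# On Linux 5.6+, /dev/random and /dev/urandom are equivalent once the pool is initialized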
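The same check can be made from inside the key generation process rather than from a shell. The sketch below assumes Linux and Python 3.6+, where `os.getrandom` and `os.GRND_NONBLOCK` are available. The helper names `kernel_rng_initialized` and `generate_entropy` are illustrative.

```python
import os


def kernel_rng_initialized() -> bool:
    """Return True once the Linux kernel CSPRNG has been seeded.

    getrandom(2) in non-blocking mode fails with EAGAIN (BlockingIOError in
    Python) while the pool is still uninitialized, instead of returning
    weak output.
    """
    if not hasattr(os, "getrandom"):
        raise NotImplementedError("os.getrandom is only available on Linux")
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    return True


def generate_entropy(num_bytes: int = 32) -> bytes:
    """Refuse to hand out key material from an unseeded pool."""
    if not kernel_rng_initialized():
        raise RuntimeError("kernel RNG not initialized yet; do not generate keys")
    return os.urandom(num_bytes)
```

Failing loudly is a deliberate choice here: a key ceremony that aborts can be rerun, while a key generated from a weak pool cannot be fixed after funds arrive.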

Generating and verifying a mnemonic:

from mnemonic import Mnemonic
import os

def generate_mnemonic_24word() -> str:
    """Generate a cryptographically secure 24-word BIP-39 mnemonic."""
    mnemo = Mnemonic("english")

    # 256 bits = 24 words
    entropy = os.urandom(32)
    mnemonic = mnemo.to_mnemonic(entropy)

    # Verify checksum (mnemo.check() validates the checksum bits)
    assert mnemo.check(mnemonic), "Mnemonic checksum failed - entropy is corrupt"

    word_count = len(mnemonic.split())
    assert word_count == 24, f"Expected 24 words, got {word_count}"

    return mnemonic

def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Convert BIP-39 mnemonic to 512-bit seed."""
    mnemo = Mnemonic("english")

    # Validate first
    if not mnemo.check(mnemonic):
        raise ValueError("Invalid BIP-39 mnemonic - checksum failed")

    # PBKDF2-HMAC-SHA512 with "mnemonic" + passphrase as salt
    # 2048 iterations (BIP-39 specification)
    seed = Mnemonic.to_seed(mnemonic, passphrase)
    assert len(seed) == 64, f"Expected 64-byte seed, got {len(seed)}"

    return seed

BIP-32: The Derivation Tree

BIP-32 defines how to derive a tree of key pairs from a single master seed. The core primitive is CKDpriv (Child Key Derivation, private):

CKDpriv(parent_key, parent_chain_code, index) → (child_key, child_chain_code)

Where:
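- If index >= 2^31: hardened derivation - I = HMAC-SHA512(parent_chain_code, 0x00 || ser256(parent_key) || ser32(index))
- If index < 2^31: normal derivation  - I = HMAC-SHA512(parent_chain_code, serP(parent_pubkey) || ser32(index))
- ser256 = 32-byte big-endian key, ser32 = 4-byte big-endian index, serP = 33-byte compressed public key
- Split I into left 32 bytes (IL) and right 32 bytes (IR = child chain code)
- child_private_key = (IL + parent_key) mod n
- If IL >= n or child_private_key = 0, the child is invalid: move on to the next index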
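The rules above translate into a short pure-Python sketch. Everything here is illustrative: the secp256k1 arithmetic is written out by hand so the block is self-contained, it is not constant-time, and the names (`point_add`, `point_mul`, `ser_pub`, `master_from_seed`, `ckd_priv`) are chosen for this example. It needs Python 3.8+ for `pow(x, -1, p)`. Use an audited library for real keys.

```python
import hashlib
import hmac
from typing import Optional, Tuple

# secp256k1 domain parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)
Point = Optional[Tuple[int, int]]  # None represents the point at infinity
HARDENED = 0x80000000


def point_add(a: Point, b: Point) -> Point:
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None  # a + (-a)
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)


def point_mul(k: int, pt: Point = G) -> Point:
    """Double-and-add scalar multiplication (not constant-time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result


def ser_pub(private_key: int) -> bytes:
    """serP: 33-byte compressed public key for a private key."""
    x, y = point_mul(private_key)
    return bytes([2 + (y & 1)]) + x.to_bytes(32, "big")


def master_from_seed(seed: bytes) -> Tuple[int, bytes]:
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return int.from_bytes(digest[:32], "big"), digest[32:]


def ckd_priv(parent_key: int, chain_code: bytes, index: int) -> Tuple[int, bytes]:
    if index >= HARDENED:
        data = b"\x00" + parent_key.to_bytes(32, "big")  # needs the private key
    else:
        data = ser_pub(parent_key)  # needs only the public key
    digest = hmac.new(chain_code, data + index.to_bytes(4, "big"), hashlib.sha512).digest()
    il = int.from_bytes(digest[:32], "big")
    child_key = (il + parent_key) % N
    if il >= N or child_key == 0:
        raise ValueError("invalid child key; derive the next index instead")
    return child_key, digest[32:]
```

The only difference between the two branches is the HMAC input, and that difference is the entire hardened/normal distinction discussed next.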

The critical distinction: hardened vs normal derivation.

Normal derivation (index 0 to 2^31-1): the child key is derived from the parent public key. This means you can derive all child public keys from a parent public key, without knowing any private keys - useful for watch-only wallets and address generation without private key exposure.

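The security risk: if an attacker knows a child private key and the parent extended public key (public key plus chain code), they can compute the parent private key, because parent_key = (child_key - IL) mod n and IL depends only on public data. This means a compromise of any normal child private key plus the parent xpub equals full compromise of the parent key.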
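The arithmetic behind that risk is short enough to show directly. The sketch below uses toy values chosen for illustration: the parent private key is 1 (whose compressed public key is the secp256k1 generator point), and the chain code is arbitrary. The function names are hypothetical.

```python
import hashlib
import hmac

# Order of the secp256k1 group
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def normal_child_tweak(parent_pub: bytes, chain_code: bytes, index: int) -> int:
    """IL for a normal child: computable from the parent xpub alone."""
    if index >= 0x80000000:
        raise ValueError("hardened indices require the parent private key")
    digest = hmac.new(chain_code, parent_pub + index.to_bytes(4, "big"), hashlib.sha512).digest()
    return int.from_bytes(digest[:32], "big")


def recover_parent_private_key(
    child_priv: int, parent_pub: bytes, chain_code: bytes, index: int
) -> int:
    """parent_key = (child_key - IL) mod n."""
    return (child_priv - normal_child_tweak(parent_pub, chain_code, index)) % N


# Toy setup: private key 1, whose public key is the generator point G.
parent_priv = 1
parent_pub = bytes.fromhex(
    "0279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798"
)
chain_code = bytes(range(32))  # arbitrary, for illustration only
index = 7

# What the wallet computes when deriving normal child 7 ...
child_priv = (normal_child_tweak(parent_pub, chain_code, index) + parent_priv) % N
# ... and what an attacker holding the xpub and that one child key computes.
recovered = recover_parent_private_key(child_priv, parent_pub, chain_code, index)
print(recovered == parent_priv)  # True
```

No elliptic-curve operation is needed on the attacker's side: one HMAC and one subtraction are enough.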

Hardened derivation (index 2^31 to 2^32-1, written as 0' to (2^31-1)'): the child key is derived from the parent private key. An attacker who knows a hardened child key cannot compute the parent private key, even with the parent xpub. The cost is that you cannot generate child public keys without the parent private key.

Rule for institutional custody: always use hardened derivation at the account level and above. The purpose (44'), coin type, and account levels should all be hardened.

BIP-44: The Standard Path Structure

BIP-44 defines a specific path structure for multi-account, multi-coin wallets:

m / purpose' / coin_type' / account' / change / address_index

Where:
m           = master key
purpose'    = 44' (BIP-44) or 84' (BIP-84 for native SegWit) or 86' (Taproot)
coin_type'  = 0' for Bitcoin, 60' for Ethereum, 501' for Solana (SLIP-0044)
account'    = 0', 1', 2'... (separate accounts per use case)
change      = 0 (receiving) or 1 (change/internal) - NORMAL derivation
address_idx = 0, 1, 2... - NORMAL derivation

The first three levels (purpose, coin type, account) are hardened - this is mandatory per BIP-44. The last two (change, address index) are normal - which allows generating receiving addresses from the xpub without exposing private keys.
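A path string only becomes meaningful once it is turned into 32-bit indices, and that is a good place to enforce the hardening rule. The following is a minimal sketch. `parse_path` and `require_bip44_shape` are names chosen for this example, and only the apostrophe notation for hardened levels is accepted.

```python
from typing import List

HARDENED_OFFSET = 0x80000000  # 2^31


def parse_path(path: str) -> List[int]:
    """Convert "m/44'/60'/0'/0/5" into the list of 32-bit child indices."""
    parts = path.split("/")
    if parts[0] != "m":
        raise ValueError("path must start with 'm'")
    indices = []
    for part in parts[1:]:
        hardened = part.endswith("'")
        number = part[:-1] if hardened else part
        if not (number.isascii() and number.isdigit()):
            raise ValueError(f"invalid path component: {part!r}")
        value = int(number)
        if value >= HARDENED_OFFSET:
            raise ValueError(f"index out of range: {part!r}")
        indices.append(value + HARDENED_OFFSET if hardened else value)
    return indices


def require_bip44_shape(path: str) -> List[int]:
    """Reject paths that do not follow the five-level BIP-44 layout."""
    indices = parse_path(path)
    if len(indices) != 5:
        raise ValueError("BIP-44 paths have exactly five levels below m")
    for level, index in zip(("purpose", "coin_type", "account"), indices[:3]):
        if index < HARDENED_OFFSET:
            raise ValueError(f"{level} level must be hardened")
    for level, index in zip(("change", "address_index"), indices[3:]):
        if index >= HARDENED_OFFSET:
            raise ValueError(f"{level} level must not be hardened")
    if indices[3] not in (0, 1):
        raise ValueError("change must be 0 (receiving) or 1 (internal)")
    return indices


print([hex(i) for i in parse_path("m/44'/60'/0'/0/5")])
```

Running every path through a check like this before derivation turns a missing apostrophe into an immediate error instead of a silent security downgrade. Chains that use a different shape need their own validator.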

from hdwallet import HDWallet
from hdwallet.symbols import ETH

def derive_eth_account(mnemonic: str, account_index: int = 0) -> dict:
    """
    Derive an Ethereum account following BIP-44.
    Returns the account's xpub and first 10 receiving addresses.
    """
    wallet = HDWallet(symbol=ETH)
    wallet.from_mnemonic(mnemonic=mnemonic, language="english", passphrase="")
    # Stop at the account level so the xpub below really is the account xpub
    wallet.from_path(path=f"m/44'/60'/{account_index}'")

    # Account extended public key (lets watch-only services derive /0/i addresses;
    # still sensitive, since an xpub plus any child private key reveals the account key)
    account_xpub = wallet.xpublic_key()

    addresses = []
    for i in range(10):
        # Derive m/44'/60'/{account}'/0/{i}
        child_wallet = HDWallet(symbol=ETH)
        child_wallet.from_mnemonic(mnemonic=mnemonic)
        child_wallet.from_path(path=f"m/44'/60'/{account_index}'/0/{i}")
        addresses.append({
            "index": i,
            "address": child_wallet.p2pkh_address(),
            "private_key": child_wallet.private_key(),  # Handle with care
        })

    return {"xpub": account_xpub, "addresses": addresses}

def derive_btc_taproot(mnemonic: str, account_index: int = 0) -> dict:
    """BIP-86 path for Bitcoin Taproot (P2TR) addresses."""
    from hdwallet import HDWallet
    from hdwallet.symbols import BTC

    wallet = HDWallet(symbol=BTC)
    wallet.from_mnemonic(mnemonic=mnemonic)
    wallet.from_path(path=f"m/86'/0'/{account_index}'/0/0")

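    return {
        "path": f"m/86'/0'/{account_index}'/0/0",
        # bech32m format. This assumes the installed hdwallet version provides
        # p2tr_address(); verify that before relying on it.
        "address": wallet.p2tr_address(),
        # Extended public key of this leaf node, not the account-level xpub
        "xpub": wallet.xpublic_key(),
    }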

Multi-chain coin type table (SLIP-0044):

Coin            Coin Type    Example Path
─────────────────────────────────────────────────────────────────
Bitcoin         0'           m/44'/0'/0'/0/0 (P2PKH, legacy)
                             m/84'/0'/0'/0/0 (P2WPKH, native SegWit)
                             m/86'/0'/0'/0/0 (P2TR, Taproot)
Ethereum        60'          m/44'/60'/0'/0/0
Solana          501'         m/44'/501'/0'/0'  (hardened all the way)
Cosmos          118'         m/44'/118'/0'/0/0
BNB Smart Chain 9006'        m/44'/9006'/0'/0/0
Polygon         60'          m/44'/60'/0'/0/0  (same as ETH - same key)

Note: Polygon and most EVM chains use the same derivation path as Ethereum (coin type 60'). The same private key, and therefore the same address, signs transactions on Ethereum and Polygon. This is a security consideration: compromise of your ETH key compromises the matching address on every EVM chain that shares the path.
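One way to make the coin-type decision explicit is to keep it in a single registry that every derivation call goes through. This is a hypothetical sketch: the chain names, the `EVM_SHARED_COIN_TYPE` policy switch, and the function names are assumptions for illustration. The coin types are the ones from the table above.

```python
from typing import Optional

# Coin types taken from the table above (SLIP-0044)
SLIP44_COIN_TYPES = {"bitcoin": 0, "ethereum": 60, "cosmos": 118, "bsc": 9006}

# Chains that share the Ethereum address format
EVM_CHAINS = frozenset({"ethereum", "polygon", "bsc"})

# Policy switch: 60 means "one key for all EVM chains";
# None means "use each chain's own registered coin type".
EVM_SHARED_COIN_TYPE: Optional[int] = 60


def coin_type_for(chain: str) -> int:
    if chain in EVM_CHAINS and EVM_SHARED_COIN_TYPE is not None:
        return EVM_SHARED_COIN_TYPE
    try:
        return SLIP44_COIN_TYPES[chain]
    except KeyError:
        raise ValueError(f"no coin type policy recorded for chain {chain!r}") from None


def bip44_path(chain: str, account: int, change: int, index: int) -> str:
    if change not in (0, 1):
        raise ValueError("change must be 0 (receiving) or 1 (internal)")
    return f"m/44'/{coin_type_for(chain)}'/{account}'/{change}/{index}"


print(bip44_path("polygon", 0, 0, 0))  # m/44'/60'/0'/0/0
print(bip44_path("bitcoin", 1, 1, 7))  # m/44'/0'/1'/1/7
```

With the policy in one place, changing it is a reviewed one-line diff, and an unsupported chain fails at path construction rather than at deposit time.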

Recovery Testing: The Most Important Practice

Most custody systems have a mnemonic backup. Most have never tested recovery from that mnemonic. This is a category error: you do not have a backup until you have successfully recovered from it.

Recovery test procedure to run quarterly:

#!/usr/bin/env python3
"""
mnemonic_recovery_test.py - Run quarterly to verify backup integrity
NEVER run this on a production machine - use an air-gapped offline machine
"""

import sys
from hdwallet import HDWallet
from hdwallet.symbols import ETH, BTC

def recovery_test(mnemonic_from_backup: str):
    """
    Derive known addresses from backup mnemonic and compare to known-good values.
    These known-good values should be stored separately from the mnemonic.
    """
    KNOWN_GOOD = {
        # Pre-recorded during initial key ceremony
        # m/44'/60'/0'/0/0
        "eth_account0_index0": "0xAbCd...1234",
        # m/44'/0'/0'/0/0 (legacy P2PKH, so the address starts with "1")
        "btc_account0_index0": "1...xyz",
    }

    # Test ETH derivation
    eth = HDWallet(symbol=ETH)
    eth.from_mnemonic(mnemonic=mnemonic_from_backup)
    eth.from_path("m/44'/60'/0'/0/0")
    derived_eth = eth.p2pkh_address()

    if derived_eth != KNOWN_GOOD["eth_account0_index0"]:
        print(f"RECOVERY FAILURE: ETH address mismatch")
        print(f"  Expected: {KNOWN_GOOD['eth_account0_index0']}")
        print(f"  Derived:  {derived_eth}")
        sys.exit(1)

    # Test BTC derivation
    btc = HDWallet(symbol=BTC)
    btc.from_mnemonic(mnemonic=mnemonic_from_backup)
    btc.from_path("m/44'/0'/0'/0/0")
    derived_btc = btc.p2pkh_address()

    if derived_btc != KNOWN_GOOD["btc_account0_index0"]:
        print(f"RECOVERY FAILURE: BTC address mismatch")
        sys.exit(1)

    print("RECOVERY SUCCESS: All derived addresses match known-good values")
    print("Backup mnemonic is intact and valid")

# Input mnemonic from backup (prompted without echo, never hardcoded)
if __name__ == "__main__":
    from getpass import getpass
    mnemonic = getpass("Enter recovery mnemonic: ").strip()
    recovery_test(mnemonic)

How This Breaks in Production

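Failure 1: Non-hardened account derivation exposing parent keys. An engineer uses m/44'/60'/0/0/0 (note: no apostrophe on the account level - normal, not hardened). A monitoring service receives the account xpub (from level m/44'/60'/0) to generate receiving addresses. The monitoring service is compromised, and the attacker has the xpub. Later, through a separate incident, one child private key leaks. The attacker combines the xpub (public key plus chain code) with the leaked child private key and computes the account private key, which controls every address under that account. Because the account node is itself a normal child, the same step repeats one level up: anyone who also obtains the xpub at m/44'/60' can recover that private key and derive every account for the coin. Fix: use hardened derivation (') for purpose, coin_type, and account levels. Always. Hardening confines the damage to one account; it does not make an xpub safe to combine with a leaked child key, so treat every xpub as sensitive.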

Failure 2: Passphrase not included in backup. BIP-39 supports an optional passphrase (“25th word”) that is mixed into the seed derivation. The passphrase is not part of the mnemonic - it is additional authentication. If a passphrase was used during key generation and is not included in the backup procedure, recovery from the 24-word mnemonic alone produces different keys than the originals. The funds are inaccessible. Fix: if using a passphrase, it must be backed up with the same rigor as the mnemonic but stored separately (defense against an attacker who finds only one).

Failure 3: Address index gap limit causes lost funds. You derive receiving addresses at indices 0-9. All active. A user requests a new address - you give them index 10. They send funds. Your wallet software, restoring from the mnemonic, scans only a fixed window of indices 0-9 instead of continuing until it has seen a full gap limit of consecutive unused addresses (BIP-44 specifies 20). The funds at index 10 are invisible until you widen the scan and rescan. The same loss occurs with a correct scanner if you hand out more than gap_limit consecutive addresses that never receive funds. Fix: use the standard gap limit of 20 for scanning on recovery, and monitor that your highest issued index never exceeds discovered_last_used + gap_limit / 2.

Failure 4: Different coin_type for EVM-equivalent chains causing address mismatch. An exchange generates ETH addresses using m/44'/60'. For BNB Chain (BSC), they use m/44'/9006' (the registered BSC coin type). A user deposits BSC tokens to an ETH address (same format, different derivation path). The funds arrive at a valid EVM address whose private key you do hold, since it was derived under the ETH path, but on a chain where your monitoring and sweep tooling only watch addresses derived under m/44'/9006'. The deposit goes uncredited until someone traces it by hand. Fix: for any EVM chain you support, document and enforce which coin type you use. Many production exchanges use coin_type=60 for all EVM chains (same key) - accept this as a business decision and document it explicitly.

Failure 5: Recovery phrase backup on cloud storage. An engineer creates a “secure backup” of the mnemonic in an encrypted note in their cloud password manager. The cloud account is compromised (phishing, credential reuse). The mnemonic is now in the attacker’s hands. Fix: mnemonics must be stored on physical media only - paper (laminated, multiple copies), metal backup plates, or hardware secure elements. Cloud storage of plaintext or weakly-encrypted mnemonics is never acceptable for institutional custody.

Failure 6: Testing recovery on a networked machine. An engineer tests the mnemonic recovery procedure using a script on their development laptop. The script outputs the derived private keys to stdout. The laptop has been compromised by a keylogger or screen recording tool (for example via the Contagious Interview campaign, which delivers malware through fake job interviews). The private keys appear in the attacker's logs. Fix: mnemonic recovery testing must happen on an air-gapped, freshly-imaged machine. The machine must never have network access during or after the recovery procedure. After testing, the machine's storage must be wiped.
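The gap-limit behaviour from Failure 3 can be sketched in a few lines. This is an illustration under stated assumptions: `is_used` stands in for whatever chain lookup your system performs, and both function names are hypothetical.

```python
from typing import Callable, List


def scan_used_indices(is_used: Callable[[int], bool], gap_limit: int = 20) -> List[int]:
    """Walk indices from 0 and stop after gap_limit consecutive unused ones."""
    used = []
    consecutive_unused = 0
    index = 0
    while consecutive_unused < gap_limit:
        if is_used(index):
            used.append(index)
            consecutive_unused = 0  # any activity resets the gap counter
        else:
            consecutive_unused += 1
        index += 1
    return used


def safe_to_issue(next_index: int, last_used_index: int, gap_limit: int = 20) -> bool:
    """Issuance guard from Failure 3: stay within half the gap limit."""
    return next_index <= last_used_index + gap_limit // 2


# Simulated chain state: indices 0-10 funded, then a lone deposit at index 25.
funded = set(range(11)) | {25}

print(scan_used_indices(funded.__contains__, gap_limit=20))  # finds index 25
print(scan_used_indices(funded.__contains__, gap_limit=10))  # stops before 25
```

With a gap limit of 20 the scanner crosses the 14 unused indices between 10 and 25. With a gap limit of 10 it gives up first, and the deposit at index 25 stays invisible. The issuance guard is what keeps the live system from creating gaps the recovery scan cannot cross.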


Related reading: Key Ceremonies and Quorum Approvals covers the physical process of generating these keys securely. MPC vs HSM vs Multisig covers alternatives to HD wallets for institutional custody. The DPRK Threat Model covers how attackers target recovery procedures.
