BitIodine: extracting intelligence from the Bitcoin
Transkript
BitIodine: extracting intelligence from the Bitcoin
BitIodine: extracting intelligence from the Bitcoin network Michele Spagnuolo http://miki.it [email protected] @mikispag Bitcoin BitIodine About Bitcoin Decentralized, global digital currency A global peer-to-peer network, across which transactions are broadcast Can support strong anonymity implementations not strongly anonymous Recent clarification to US regulations - to be considered as foreign currency and not legal tender “Open money” “Open money” trust-no-one currency/commodity that isn’t subject to manipulation by central banks or corporations designed for security, founded on distributed cryptography “crypto-currency” ” . . . s i mer u n n i 1 s 2 i 4 v 1 s, t i i r s e a um t N n e a D , u e v Q “ ru t S . T History Digital currency based in cryptography 1993, 1998. Bitcoin whitepaper - 2008, by "Satoshi Nakamoto" (pseudonym) First client in January 2009 Open Source No single point of failure Who uses Bitcoin? A lot of “legitimate” uses of Bitcoin http://weusecoins.org Why use Bitcoin? Speed and price No central authority No setup Better privacy No counterfeit, no chargebacks No account freezing Algorithmically known inflation Trading goods for Bitcoin Also accepting Bitcoin... Who also uses Bitcoin? Top 20 categories $1.2M rev/month 220 categories, ranging from digital goods to various kinds of narcotics or prescription medicines Some stats about Bitcoin As of June 1: 1 BTC is traded for 128.4 USD (max: 266 USD on April 11, min: 50 USD on April 16) $1.5B market capitalization 30,000 BTC / 3.8M$ sent per hour Approximately 60,000 transactions per day Key concepts Address Wallet Transaction Block Blockchain Mining / Generation Address and Wallet An address is a string like 1MikiSPbrhCFk7S4wzZP7gQqhwWH866DCb generated by a Bitcoin client together with the private key needed to spend and redeem the coins sent to it. It is public, and can be posted everywhere in order to receive payments. A wallet is a file which stores addresses and the private keys needed to use them. Transaction A “record” of coins moving from one or more addresses to one recipient address. It is saved in a block. How spending works Several cryptographic technologies public key cryptography - ECDSA hashing algorithms SHA-1, SHA-256, ... Future proof - designed to be upgraded in a forward compatible way Block A block is a list of transactions. Each block embeds new transactions that took place before it was created. A new block is generated, on average, every 10 minutes. The first generated block is called genesis block. Transactions are placed in blocks, which are linked by SHA-256 hashes. Blockchain The blockchain is the linked list of all the blocks that have been generated since day one. A full copy of the blockchain contains every transaction ever executed, and is distributed on every client. For any block on the chain, there is only one path to the genesis block. Coming from the genesis block, however, there can be forks. Mining / Generation Users on the Bitcoin network compete to generate a new valid block that satisfies a difficult requirement (block hash with N leading zero bytes) - this is done by brute force. The lucky solver is then rewarded a fixed number of bitcoins for cryptographically validating transactions on the network. The parameter N is automatically adjusted by the protocol (difficulty), making it scale with the number of miners and making sure that, statistically, one block is generated every 10 minutes. Mining hardware Nowadays mining with GPUs is no longer profitable - FPGAs or ASICs. Distributed consensus problem New blocks are linked to older blocks, forming a block chain that is constantly being extended. Miners may generate blocks at the same time or too close together. A participant choosing to extend an existing path in the block chain indicates a vote towards consensus on that path. The longer the path, the more computation was expended building it. in a h c d e t p e cc a = h t a p t s e g n Lo Bitcoin offers a unique solution to the consensus problem in distributed systems since voting power is directly proportional to computing power. BitIodine Is Bitcoin anonymous? Yes No • Users don't need to • Every transaction is set up accounts • No ID required • Cash-like • Tor / VPN help stored in the blockchain • Analysis of the blockchain can correlate addresses BitIodine a tool for analyzing and profiling the Bitcoin network Transaction graph User graph Techniques for creating the User Graph Two main techniques: • Multi-input transactions grouping • Shadow address guessing (change) Multi-input transactions When a transaction has multiple input addresses, we can safely assume that those addresses belong to the same wallet, thus to the same user. Assumption: owners don't share private keys. Caveat: web wallets - pools that would be "mistakenly" grouped as a single user. Shadow addresses • the entire value of an unspent output of a prior transaction must be spent and used as input for a new transaction • input is destroyed, and change should be sent back to the user • to improve anonymity, a "shadow" address is automatically created and used to collect back the "change" Shadow addresses When a Bitcoin transaction has exactly two output addresses, O1 and O2, such that O2 is a new address (i.e., an address that has never appeared before), and O1 corresponds to an old address (an address that has appeared previously), we can assume that O2 constitutes a shadow address for one of the input addresses of that transaction. Transactions with two outputs are the vast majority. Transac/on'count'per'number'of'outputs' 10000000" 9000000" Why? 8000000" One payee, one shadow Number'of'transac/ons' 7000000" address for change back. 6000000" 5000000" 4000000" 3000000" 2000000" 1000000" 0" 1" 2" 3" 4" 5" 6" 7" 8" 9" 10"11"12"13"14"15"16"17"18"19"20"21"22"23"24"25"26"27"28"29"30"31"32"33"34"35"36"37"38"39"40"41"42"43"44"45"46"47"48"49"50" Number'of'outputs' Transac7on(volume(per(number(of(outputs( 1,400,000,000" Volumes((BTC)( 1,200,000,000" 1,000,000,000" 800,000,000" 600,000,000" 400,000,000" 200,000,000" 0" 1" 2" 3" 4" 5" 6" 7" 8" 9" 10"11"12"13"14"15"16"17"18"19"20"21"22"23"24"25"26"27"28"29"30"31"32"33"34"35"36"37"38"39"40"41"42"43"44"45"46"47"48"49"50" Number(of(outputs( Shadow addresses The official bitcoin client tries to randomize the position of the change output, but code is flawed: File: wallet.cpp Number of payees // Insert change txn at random position: vector<CTxOut>::iterator position = wtxNew.vout.begin()+GetRandInt(wtxNew.vout.size()); wtxNew.vout.insert(position, CTxOut(nChange, scriptChange)); size()+1 If just two outputs (one payee), GetRandInt(1) always returns 0. The change ends up always in the first output. If multiple outputs, change is never the last output. Still not fixed! (Recently fixed!) Shadow addresses Given the bug in the official client, BitIodine checks transactions with exactly two outputs, and also checks that the first address was new at the time of the transaction. If it is, chances are it's a shadow address. It is a heuristic. Evaluating heuristics Number of clusters 6M 5M 4M 3M 2M 1M 0M Cluster size 15 14.32 5,331,657 10 2,175,621 660,494 Heur. 1 Heur. 1+2 det. Heur. 1+2 relaxed Coverage 100 5 4.30 67 33 0 87.1% 30% 54.5% Heur. 1 39% 50% 99.7% Heur. 1+2 det. Heur. 1+2 relaxed Transaction coverage Addresses clustered 5 1.74 0 1 Heur. 1 1 Heur. 1+2 det. Heur. 1+2 relaxed Average Median The Classifier: labels for addresses The Classifier: labels for users Example: classifying address and owner • Classify an address we found in IRC chat logs of channel #bitcoin-otc • Borrowed money from other Bitcoin-OTC users Zero-balance address, exhausted in 84 transactions, belonging to user xisalty on BitcoinTalk forum and user xisalty-otc on Bitcoin-OTC. The owner is a known scammer! Every address belonging to the user is empty, 4.5% are OneTime-Addresses, 2.3% are zombies. A real-world case: investigating the Silk Road • Identify the addresses owned by the Silk Road • Investigate the activity of the addresses A real-world case: investigating the Silk Road One of the bitcoin addresses that moved most funds on the network during 2012 is: 1DkyBEKt5S2GDtv7aQw6rQepAvnsRyHoYM A real-world case: investigating the Silk Road Using the Mt.Gox scraper, we find that: • On July 17, 2012 this address had a balance of 517,825 BTC • On the same day Bitcoin looked on the verge of breaking 10 USD/BTC • At 02:00 AM, someone sold 10,000 BTC at 9 USD/ BTC, driving the price down • At 02:29, two large withdrawals of respectively 20,000 BTC and 60,000 BTC were made from this address one after the other, and included in a block at 02:32. A real-world case: investigating the Silk Road Using the Mt.Gox scraper, we find that: • Mt.Gox, at the time, needed 6 confirmations in order to allow the user to spend deposited bitcoins. 6 confirmations matured with block at height 189421, relayed at 02:47:24 AM • At 02:52 and 02:53, someone sold approximately 15,000 BTC at market price in several batches, causing the price to drop below 7.5 USD/BTC. A real-world case: investigating the Silk Road A real-world case: investigating the Silk Road • Sign up to the Silk Road • Deposit 0.001 BTC to a one-time deposit address • The coins are mixed: the deposit address is provably in the same wallet as more than 25,000 other addresses A real-world case: investigating the Silk Road • Find a connection between the addresses in the mixer and the large 1Dky... address • The mixer is a cluster active since June 18, 2012 more than 80,000 inputs/outputs • Follow the flow of coins! A real-world case: investigating the Silk Road Multi-hop connection found: the 1Dky... address belongs to the Silk Road. Conclusions We present BitIodine We test it on real-world use cases we are able to get valuable information we get insights on the Silk Road Plans for the future add user-friendly front-end to the framework Thank you.
Benzer belgeler
BitIodine: Extracting Intelligence from the Bitcoin Network
analysis of the Bitcoin blockchain, which is easily expandable and future-proof.
• We automatically label clusters/users with limited to no supervision.
• We test our framework on real-world use ca...