
Busking on blockchain and dreams of distribution in the privacy and identity space

A post prompted by the article in the below tweet and published after sitting on it for nigh on 18 months. The delay was down to needing a sanity check and some feedback, but also down to a more general break from twitter and blogging where drafts began to build up.

This particular post came back to the front of my mind because of conversations about data trusts and attempts to ‘democratise’ personal data processing, many of which mention blockchains, though the points are mainly pertinent no matter the underlying technology.

So welcome to me thinking out loud, with the much appreciated help of Robin Wilton (@futureidentity), about my gut reactions to the next big startup thing in this data protection and privacy space.


He’s totally correct, it’s a great read and reminded me of time spent busking on the theme of blockchain in the context of data protection and user identification. The thing that always quickly resonated, like a giant movie gong: it unravels as soon as you factor in humans and all the human-run orgs you have to involve.

I’m not claiming any special blockchain or cryptocurrency expertise, but I did spend considerable time arming myself with the basics…and getting conclusions sanity checked by folk who REALLY know their stuff. The result of that got listed as a supplementary resource for a Coursera course, so I’m confident I grasped main themes.

Blockchains: Embedding Integrity (a.k.a blockchain for beginners with an InfoSec twist)

To give you a flavour for the discussion and gotchas here’s the basic idea:

The challenges (see the positive language thing I did there!):

Virtual identities are fundamentally insecure because of the myriad means to spoof, damage, or steal them. Then there’s the massive lake of reusable usernames and passwords souping round the dark and not so darknet, the 3rd place security almost invariably takes in the fast/cheap/secure development race, and the overwhelming lack of operational security awareness and capability among the lion’s share of folk online…especially those most personally vulnerable.

The most psychologically and financially damaging fallout from identity theft (ignoring for now the horror of things like control by domestic abusers and other cyber stalking) is difficulty proving that you are in fact the victim. If in doubt about that pain and the implications of it, check out Bennett Arron’s story. Sobering no matter how humorously told.

In exchange for ID management work done by those big organisations (Log on with LinkedIn, Facebook, Google, Twitter, Microsoft etc), our core and peripheral identities become currency shared to profile us and better commercially, politically, or otherwise control, manipulate, influence, nudge, or ‘serve’ us.

All of our various emails, usernames, and personal details are linked over time to device fingerprints, cell numbers, geographical locations and other means to track. Although lip service is paid to giving back some control via labyrinthine privacy settings and web services like AdChoices (provided by the Digital Advertising Alliance, which sometimes turns off tracking, if you allow third party cookies), you are still tracked in real-time, and the resultant profiling socially engineers (or ‘personalises’) everything you see and do online, and things you don’t get to see behind the digital scenes.

Solution concepts

The tamper-resistant nature of a blockchain (could have said ‘proof’, but I worked in infosec too long for absolutes) presents the possibility of having a reusable identity or identities that are not managed, ‘secured’, and shared by social media giants, retail behemoths, governments, and financial institutions.
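To make ‘tamper-resistant’ a little more concrete, here is a minimal sketch of the underlying idea: each block’s hash covers the previous block’s hash, so a quiet edit to an old record shows up when the chain is re-verified. The function names and block layout are my own illustrative assumptions, not any real blockchain’s format, and it ignores distribution and consensus entirely.

```python
import hashlib
import json


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps({"index": index, "payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Build a list of blocks, each one committing to its predecessor."""
    chain = []
    prev_hash = "0" * 64  # placeholder 'previous hash' for the first block
    for index, payload in enumerate(payloads):
        digest = block_hash(index, payload, prev_hash)
        chain.append({"index": index, "payload": payload, "prev": prev_hash, "hash": digest})
        prev_hash = digest
    return chain


def first_broken_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = "0" * 64
    for block in chain:
        expected = block_hash(block["index"], block["payload"], prev_hash)
        if block["prev"] != prev_hash or block["hash"] != expected:
            return block["index"]
        prev_hash = block["hash"]
    return None


chain = build_chain(["identity record 1", "identity record 2", "identity record 3"])
print(first_broken_block(chain))  # None: the untouched chain verifies

chain[1]["payload"] = "forged record"
print(first_broken_block(chain))  # 1: the edit is detected at the altered block
```

Note this only detects tampering by someone who cannot also recompute and replace every later hash, which is exactly why the ‘who runs it’ question matters so much.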

We could have new anonymised identities, derived from uniquely identifiable pre-verified master records and embedded in a trusted third party blockchain. Negating the need to provide personal verification information to everyone else for every transaction. Achieving what ‘Logon with Facebook/Twitter/LinkedIn’ was trying to do, but without associated data broking. The blockchain owner, or distributed verification network, will confirm that your unique ID matches the one generated from your verified personal record. That confirmation acts as both the identification and authentication step. Keeping it under an individual’s control. Expanding on that:

Use case A – Personal data limits for online transactions

Users create one or more class A, B, and/or C identities.

  • Class A: Multiple identifying data points e.g. usual name, address, DOB, social security number, plus the extras e.g. historical residence details, encrypted biometrics, employment verification, birth/marriage/death certificates, employment records, device fingerprints etc etc.
  • Class B: A lesser subset of the above
  • Class C: A basic subset of the above

Each identity is represented by a unique hash. The next time you want to shop, you don’t provide your email, password, name, address, etc, you provide your hash, your blockchain provider’s ID, and a one-time code generated by a hardware token. That serves as consent to query the blockchain and use data. The 3rd party will be required to explicitly list the data points necessary to complete the transaction, share their current privacy policy, list any additional third parties who will see information to enable the transaction, and state the retention time for the data.

The retailer queries the validity of the hash with the blockchain. A simple retail transaction would be a Class C transaction. A request for any Class B or Class A specific data points would be rejected. The data points provided to complete the transaction are recorded in the blockchain and cross referenced to your secure wallet where you locally log transactions.

This provides an historical record (also in the blockchain), proving Class C consent was provided for third-party A, and any explicitly named additional companies involved in completing the specific transaction.
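As a rough sketch of Use case A, assuming a made-up mapping of classes to data points and a simple salted hash as the identifier (both purely my illustration, and a salted hash of personal data is not anonymisation on its own), the class check might look something like this. The one-time code step and the blockchain itself are left out.

```python
import hashlib

# Hypothetical mapping of identity class to the data points it may release.
CLASS_DATA_POINTS = {
    "C": {"name", "delivery_address"},
    "B": {"name", "delivery_address", "date_of_birth"},
    "A": {"name", "delivery_address", "date_of_birth", "social_security_number"},
}


def identity_hash(master_record, identity_class, salt):
    """Derive a per-class identifier from a verified master record (illustrative only)."""
    fields = sorted(CLASS_DATA_POINTS[identity_class])
    material = "|".join(f"{field}={master_record[field]}" for field in fields)
    return hashlib.sha256(f"{salt}|{identity_class}|{material}".encode("utf-8")).hexdigest()


def check_request(registry, presented_hash, requested_points):
    """Accept a request only if the hash is known and stays within its class."""
    identity_class = registry.get(presented_hash)
    if identity_class is None:
        return False, "unknown identity hash"
    excess = set(requested_points) - CLASS_DATA_POINTS[identity_class]
    if excess:
        return False, "outside class " + identity_class + ": " + ", ".join(sorted(excess))
    return True, "class " + identity_class + " request accepted"


record = {
    "name": "Jane Bloggs",
    "delivery_address": "1 Example Street",
    "date_of_birth": "1980-01-01",
    "social_security_number": "000-00-0000",
}
class_c = identity_hash(record, "C", salt="per-user-secret")
registry = {class_c: "C"}  # stands in for the provider's record of verified hashes

print(check_request(registry, class_c, ["name", "delivery_address"]))
print(check_request(registry, class_c, ["name", "date_of_birth"]))
```

The second request is rejected because a date of birth sits outside the Class C subset, which is the behaviour described above for a simple retail transaction.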

Use case B – Tracking and profiling protection

Identity master hashes provide a centralised means to manage online tracking protection. In addition to pre-verified identities, the blockchain will hold tracking preferences and record verification requests. This will enable use of an open source tool to identify instances of your hash anywhere on the web and confirm validity of use against verification logs.

If there is no record of the tracking entity making a valid query with a user-generated one time code, the tool’s default will be to notify the third-party they are using an identity without consent, and send that notification and details to the secure wallet held by the user. Intelligent analysis of abuses will enable pre-transaction flags to be raised to warn users about all of the historical abuses logged about a given organisation, eventually potentially evolving into a trust rating.

The general public will also be able to query the number of reported abuses by a given third-party and publicise anonymised data about the extent of the abuse, inviting those affected to refer to data protection regulators, legal counsel, or otherwise seek redress.

This is imperfect, but it injects transparency about identity re-use, creates a timestamped record of abuse, and gives means for the public to unpick the millions of unseen third parties in order to address it. It also stores a record of the point in time privacy policy of the organisations in question as a reference point to judge legality of an interaction.
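For Use case B, the core of the imagined open source tool is little more than a comparison between where a hash has been seen and what the verification log says was consented to. A minimal sketch, with record shapes and function names that are entirely my own assumptions:

```python
def find_unconsented_uses(observed_uses, verification_log):
    """Return observed uses of an identity hash that have no matching logged query."""
    consented = {(entry["identity_hash"], entry["organisation"]) for entry in verification_log}
    return [
        use for use in observed_uses
        if (use["identity_hash"], use["organisation"]) not in consented
    ]


def abuse_counts(flagged_uses):
    """Count flagged uses per organisation, the raw input for any later trust rating."""
    counts = {}
    for use in flagged_uses:
        counts[use["organisation"]] = counts.get(use["organisation"], 0) + 1
    return counts


# Hypothetical data: one logged, consented query and two sightings of the hash.
verification_log = [{"identity_hash": "abc123", "organisation": "Retailer A"}]
observed_uses = [
    {"identity_hash": "abc123", "organisation": "Retailer A"},
    {"identity_hash": "abc123", "organisation": "Tracker B"},
]

flagged = find_unconsented_uses(observed_uses, verification_log)
print(flagged)                # only the Tracker B sighting is flagged
print(abuse_counts(flagged))  # {'Tracker B': 1}
```

The hard part, of course, is not this comparison but reliably spotting your hash ‘anywhere on the web’ in the first place.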

The CARDI (Constraints, Assumptions, Risks, Dependencies, and Issues)

Or, in startup parlance: “Negative energy we don’t want in the room”

Taking it from the top and no doubt missing many other things that can go wrong, or create friction:

1. Who verifies data points provided to initially create the proxy identities?

Who could act as the central service provider on-boarding people to the blockchain? The government? The banks? Facebook? Each negates, or even potentially worsens the data protection situation. Whoever it is, they need to have no financial, political, or other vested interest, other than providing the blockchain creation, maintenance, protection, and verification services.

Maybe government is the answer, but that robs it of the transparency and transfer of power to individuals that makes blockchain worthwhile in the first place. Perhaps it could be a pro-privacy non-profit, or at least an organisation without commercial or political skin in the data harvesting game. No matter who you choose it would take a huge amount to start up, and put them under huge commercial pressure to compromise fundamental principles to generate on-going funds to remain operationally reliable and secure, without which there is no service because we are asking people to depend on it to access most of the most essential things online. As for the pressure to disclose info to powers that be and the target painted for threat actors…well yeah.

Ask yourself which intermediary you would trust? Now ask yourself if Joe and Jane Bloggs would trust them? Could it work as a grass-roots service with evangelists courting people with high social media profiles to create a groundswell?

Does the available information about and upswell in government support for ‘data trusts’ answer those kinds of questions for you?

2. Frequent content changes and logging thousands of often concurrent queries are fundamentally at odds with the nature of a blockchain

A defining characteristic of a typical blockchain is the work needed to create a new block. You can remove that requirement and make each hash trivial to generate, but you still need to confirm a block is unique and compliant.

If you drop those checks and balances for block creation and content verification, why are you bothering with this technology in the first place? All of the old access and integrity issues will resurface. Especially with privileged access multiplied by all the inevitable 3rd parties involved in development and support.
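To show what ‘the work needed to create a new block’ means in practice, here is a toy proof-of-work sketch. It is a simplification under my own assumptions (a leading-zero-digits rule on a SHA-256 hex digest), not Bitcoin’s actual scheme. Setting the difficulty to zero is the ‘trivial to generate’ option; every extra digit multiplies the expected number of attempts by sixteen.

```python
import hashlib


def mine(block_data, difficulty):
    """Find a nonce so that the block hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def is_valid(block_data, nonce, digest, difficulty):
    """Checking a block is a single hash, however expensive the search was."""
    recomputed = hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).hexdigest()
    return recomputed == digest and digest.startswith("0" * difficulty)


for difficulty in (0, 2, 4):
    nonce, digest = mine("class C verification query", difficulty)
    print(difficulty, "->", nonce + 1, "hashes tried")
```

Multiply that search cost by thousands of concurrent identity queries and the mismatch with high-frequency logging becomes fairly obvious.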

3. Why on earth would organisations switch to using this?

And this, right there, is the core issue. There isn’t a single solitary reason why any business would volunteer to rewire online identification and authentication to do this. Why go to vast expense when not doing so allows one to continue collecting, aggregating, profiling and distributing personal data with no real pushback, as long as one doesn’t have too many breaches, or one handles breaches tolerably well?

Perhaps, to court mainstream backing, firms could find a lower frequency, higher value transaction class that is ripe for change? Where motivation and budget is already there, or thereabouts. Property purchases, or another high value product exchange. But there are already contract focused blockchain implementations arguably better positioned to take off because they have a user base, established infrastructure, and a history of more or less successful stress testing. Identity verification for transactees is done through traditional checks, but their integrity defending architecture is fundamentally the same as everything I propose to underpin this. Why would you bet on a new horse that reinvents a rolling wheel (messily mixing metaphors)?

4. Why in hell would consumers switch to this kind of thing?

Did I say that last one was the core issue? This one probably trumps it. Unless people who use these services have a very personal motivation for change, it won’t happen. But (I hear you cry) future protection against often indirect impacts on ability to securely and fairly live online…

…yeah, not likely to cut it, at least not at a take-off-point-creating scale. A take off that would resist effort by existing players to buy or break it (we don’t have everlasting lightbulbs for a reason). A hard-core of privacy and security pros may help spread the word, but those folk can also kill things off with ‘if it’s not perfect, it’s pointless’ attitudes. Given we are looking to change something that’s a daily occurrence for many, a brutal tyre kicking would be the least of our worries.

Did I say ‘quick post’?

Yeah, sorry, I write and talk to think. The aim here was to put privacy related blockchain use in some kind of real life context. Creating something to push against, so to speak.

While my back of a fag packet idea might not float your boat, your board has probably seen something blockchain-ish that kind of did. What’s harder to come by is means to unpick the sales rhetoric.

A killer life-changing concept with rock solid implementation could be close to the boil behind some super secret squirrel code name, but as the man said in the article at the top, it’s been 10 years since the Bitcoin genesis block, and blockchain concepts are well over a decade old. Where are all the super smart folk with hungry VCs on speed dial who have had more than enough time to make it happen?

When all’s said and done, blockchains are game-changing information processing and storage plumbing, but there’s no inherent value outside the world of cryptocurrency trading and purchasing things where traditional currency won’t do. Creating indirect value through changing other aspects of a traditional transaction model involves cost, time, and disruption, with no guaranteed return, because the things it can really change are things the establishment sees no real reason to mess with.

If you own a hotel chain, you’re not going to get very far without pipework, but it’s not why people come to stay. Working the analogy from another angle: even if some radiators rattle and you get the odd leak, you’re not going to rip it all out and replace it. That is the challenge unavoidably linked to the very nature of the opportunity. Blockchains have potential to rock the very foundations of our oldest, biggest, and most powerful institutions, but not enough to topple them, and we don’t even feel the tremors in our living rooms and kitchens. So beyond some digital currency dabbling, and some narrow local applications, I foresee folks sitting very tight and merrily paying plumbers for the foreseeable future.

And given the blockchain part very much isn’t my core area of expertise I backed away at that point and sought input from some peers. I will leave you with a super insightful follow up from Mr Robin Wilton (@FutureIdentity). More rich food for your future of identity and personal data processing thoughts.


ROBIN WILTON

I think you hit on a key problem to do with the question “do the characteristics that make blockchains a good mechanism for cryptocurrency make it a good mechanism for anything else?”…

The crucial things about, say, Bitcoin, are that (i) it is intended to work like cash, in that you don’t have to trust anything other than the currency (you don’t depend on a central bank, you don’t have to trust the payer, etc.); (ii) the fact that the blocks need to be hashed, and that the hashing mechanism can be used to introduce an arbitrary work factor, gives a degree of control over the rate at which bitcoins can reliably be mined. For applications like property registries, the latter is actively unhelpful. You don’t want there to be an artificially introduced work factor, because it increases the amount of time and energy needed to conduct a transaction. That’s OK – you can just take it out of the design. But in doing so, you’re accepting that one of the blockchain design options that makes it ideal for cryptocurrency is actually counterproductive for other use cases.

As far as the trust factor is concerned – Steve Wilson has written at length about how, once you move away from the cryptocurrency use-case, you inevitably end up having to trust something that is “off chain”. A register of title deeds, an identity provider, an attribute authority, etc. So, if your source of trust is off-chain in the first place, what does a blockchain buy you? It becomes a distributed ledger of the transactions concluded using the information from the trusted source. In effect just a database with a tamper resistant audit trail. But if you have a centralised trusted source anyway (the register of title deeds), why not simply maintain the audit trail on exchanges of title there? What do you gain by distributing it? Actually, you might introduce worse performance – since the more distributed nodes you have, the longer it will take for transactions to propagate through all of them.

Then there’s the question of consensus. That doesn’t mean the same in the blockchain context as it does in everyday usage… so what does it “mean” to say that the blockchain records a consensus that you and I have concluded an exchange of title… or that I have asserted that I am over 18? The fact that I am over 18 isn’t really open to debate, especially if we have both agreed on a mutually acceptable authoritative source for that attribute, so what’s the role of blockchain “consensus” in this context?

OK, up to now, I’ve assumed that the off-chain authoritative source is in fact trustworthy (HM Land Registry, for instance, which probably mostly is…). However, at a UNECA workshop in Addis Ababa last year, people kept saying “we want a blockchain solution for property transfers, because our land registry is unreliable” (!). It’s too easy, they say, for someone to get a fake deed of title registered in their systems – so the land registry is full of unreliable data, and even though you may think you have bought a particular house, someone else may show up with an apparently valid title deed to it, issued by the registry.

My question was… under those circumstances, how will a blockchain solution be any more reliable? If the source of trust is (a) off-chain and (b) unreliable, all you’ll end up with is a distributed ledger of transactions, some of which are based on untrustworthy data.

David Murakami Wood did some really good research a few years ago (pre-blockchain) about similar issues in the Brazilian eID system… apparently, although there was nominally a “national” eID, in practice it was all implemented at the regional level, and there was no centralisation of records. It meant that if you wanted a new identity, all you had to do was get registered in a different region, and the two records would never clash. Now, it might be that a distributed ledger would help, here, but if the problem is that you can easily re-register in a new region anyway, the ledger won’t entirely fix the issue.

Thinking a bit about “user centric” ID, I still think Mydex have the best model for how to change the economic status quo. As you say, a service provider who can currently insist on collecting and keeping data about you has no incentive to shift to a model that gives you more control at their expense. Mydex said “yes, but currently all you know about Robin is what you can glean from your interactions with him. We, as his trusted intermediary, can open up data feeds about other aspects of Robin’s online interactions, but in return, you’ll have to accept some constraints on what you do with data about him”. They get more data, but I get more bargaining power.

Lastly, for all the talk about sovereign identity, and the idea that I can go to a service provider with attribute assertions from my own Personal Data Store, they can still ask me for multiple attributes as a condition of doing business with me – and once I disclose those attributes, there’s a strict limit on what technical protections I can apply to them. (I call this the problem of maintaining privacy beyond first disclosure). Over time, can I actually ever stop them accumulating all the data they currently get about me?

There have been a couple of techno-centric attempts at selective, controlled attribute release. The first is U-Prove, from Microsoft, which they integrated into their Infocard architecture… Infocard was their Great New Hope for user-managed attribute disclosure, all to be beautifully integrated into Windows, but guess what… service providers could specify which attributes you had to disclose in order to have their service, so the users ended up with as little control as they had had in the first place. The second is Idemix, from IBM. High function, but complex. Probably hard to implement on-premises, so you might have to rely either on a services contract or outsourced PaaS deployment. It was used in the ABC4Trust H2020 project, but it’s hard to see it as much more than a proof of technology, much as I’d like the concepts to see widespread adoption.

Otherwise, if you haven’t already, I recommend having a look at Eve Maler’s UMA project, which is being integrated into ForgeRock’s identity management products. It’s based on OAuth, so essentially what the user is doing is demonstrating that they can authenticate to a trustworthy resource which does not belong to the relying party. UMA is the fruit of a decade of thought and work, and is well worth investigating.


Final words

Since drafting this nearly 18 months ago there has been nothing to challenge the conclusion that blockchain is only as useful, secure, resilient, and trustworthy as the more traditional technology, control, and oversight wrapper placed around it (see all of the cryptocurrency exchange and blockchain security hiccups for details). In many cases, if you get that traditional wrapper right, cutting out the blockchain middle man may actually save you time and money, but I’m open to being proved wrong.

As a last perspective I thoroughly recommend this article by Cory Doctorow on the behavioural economics and geopolitics surrounding disruptive technology. In his example it’s the ride-sharing world of Uber and Lyft. He states that benefits are there to be realised for social collectives if the real conditions for competition are enforced. Something that would take a significant teardown of existing players. Things like digital rights management and computer misuse, created to protect the non-tech status quo from tech interlopers, have ended up protecting the startups that grew into our data oligopolies.

In short (or rather far too long) a fascinating space to watch.


Featured Image Copyright : Roman Fedin
