Train a deep learning image classification model with ML.NET and TensorFlow

This sample may be downloaded and built directly. However, for a successful run, you must first unzip the dataset archive in the project directory and copy its subdirectories into the assets directory.


Understanding the problem

Image classification is a computer vision problem. Image classification takes an image as input and categorizes it into a prescribed class. This sample shows a .NET Core console application that trains a custom deep learning model using transfer learning, a pretrained image classification TensorFlow model and the ML.NET Image Classification API to classify images of concrete surfaces into one of two categories, cracked or uncracked.



The datasets for this tutorial are from Maguire, Marc; Dorafshan, Sattar; and Thomas, Robert J., "SDNET2018: A concrete crack image dataset for machine learning applications" (2018). Browse all Datasets. Paper 48.

SDNET2018 is an image dataset that contains annotations for cracked and non-cracked concrete structures (bridge decks, walls, and pavement).

The data is organized in three subdirectories:

  • D contains bridge deck images
  • P contains pavement images
  • W contains wall images

Each of these subdirectories contains two additional prefixed subdirectories:

  • C is the prefix used for cracked surfaces
  • U is the prefix used for uncracked surfaces

In this sample, only bridge deck images are used.

Prepare Data

  1. Unzip the directory in the project directory.
  2. Copy the subdirectories into the assets directory.
  3. Define the image data schema containing the image path and category the image belongs to. Create a class called ImageData.
class ImageData
{
    public string ImagePath { get; set; }

    public string Label { get; set; }
}
  4. Define the input schema by creating the ModelInput class. The only columns used for training and making predictions are Image and LabelAsKey. The ImagePath and Label columns are kept for convenience, to access the original file name and the text representation of its category, respectively.
class ModelInput
{
    public byte[] Image { get; set; }
    public UInt32 LabelAsKey { get; set; }

    public string ImagePath { get; set; }

    public string Label { get; set; }
}
  5. Define the output schema by creating the ModelOutput class.
class ModelOutput
{
    public string ImagePath { get; set; }

    public string Label { get; set; }

    public string PredictedLabel { get; set; }
}

Load the data

  1. Before loading the data, it needs to be formatted into a list of ImageData objects. To do so, create a data loading utility method LoadImagesFromDirectory.
public static IEnumerable<ImageData> LoadImagesFromDirectory(string folder, bool useFolderNameAsLabel = true)
{
    var files = Directory.GetFiles(folder, "*",
        searchOption: SearchOption.AllDirectories);

    foreach (var file in files)
    {
        // Skip anything that is not a .jpg or .png image.
        if ((Path.GetExtension(file) != ".jpg") && (Path.GetExtension(file) != ".png"))
            continue;

        var label = Path.GetFileName(file);

        if (useFolderNameAsLabel)
            label = Directory.GetParent(file).Name;
        else
        {
            // Otherwise keep only the leading letters of the file name.
            for (int index = 0; index < label.Length; index++)
            {
                if (!char.IsLetter(label[index]))
                {
                    label = label.Substring(0, index);
                    break;
                }
            }
        }

        yield return new ImageData()
        {
            ImagePath = file,
            Label = label
        };
    }
}
  2. Inside your application, use the LoadImagesFromDirectory method to load the data.
IEnumerable<ImageData> images = LoadImagesFromDirectory(folder: assetsRelativePath, useFolderNameAsLabel: true);
IDataView imageData = mlContext.Data.LoadFromEnumerable(images);
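
As a rough illustration of the two labelling modes in LoadImagesFromDirectory, the sketch below derives a label from a single path using only System.IO. The LabelDemo class and its method names are hypothetical, and the cracked_01.jpg file name is made up for the example. It assumes the folder layout described earlier, with bridge deck images stored under CD and UD folders.

```csharp
using System;
using System.IO;

public static class LabelDemo
{
    // Folder mode: the label is the name of the image's parent folder, e.g. "CD" or "UD".
    public static string LabelFromFolder(string path) =>
        Path.GetFileName(Path.GetDirectoryName(path)) ?? string.Empty;

    // File-name mode: the label is the run of letters at the start of the file name.
    public static string LabelFromFileName(string path)
    {
        string name = Path.GetFileName(path);
        int index = 0;
        while (index < name.Length && char.IsLetter(name[index]))
            index++;
        return name.Substring(0, index);
    }

    public static void Main()
    {
        Console.WriteLine(LabelFromFolder("assets/CD/7001-220.jpg"));          // CD
        Console.WriteLine(LabelFromFileName("assets/CD/7001-220.jpg").Length); // 0
        Console.WriteLine(LabelFromFileName("cracked_01.jpg"));                // cracked
    }
}
```

The file names in this dataset start with digits, so the file-name mode would yield an empty label here, which is presumably why the sample passes useFolderNameAsLabel: true.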

Preprocess the data

  1. Add variance to the data by shuffling it.
IDataView shuffledData = mlContext.Data.ShuffleRows(imageData);
  2. Machine learning models expect input to be in numerical format. Therefore, the data needs some preprocessing prior to training. First, the label, or value to predict, is converted into a numerical value. Then, the images are loaded as a byte[].
var preprocessingPipeline = mlContext.Transforms.Conversion.MapValueToKey(
        inputColumnName: "Label",
        outputColumnName: "LabelAsKey")
    .Append(mlContext.Transforms.LoadRawImageBytes(
        outputColumnName: "Image",
        imageFolder: assetsRelativePath,
        inputColumnName: "ImagePath"));
  3. Fit the preprocessing pipeline to the shuffled data and apply it.
IDataView preProcessedData = preprocessingPipeline
    .Fit(shuffledData).Transform(shuffledData);
  4. Create train/validation/test datasets to train and evaluate the model.
TrainTestData trainSplit = mlContext.Data.TrainTestSplit(data: preProcessedData, testFraction: 0.3);
TrainTestData validationTestSplit = mlContext.Data.TrainTestSplit(trainSplit.TestSet);

IDataView trainSet = trainSplit.TrainSet;
IDataView validationSet = validationTestSplit.TrainSet;
IDataView testSet = validationTestSplit.TestSet;
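
The two TrainTestSplit calls above do not produce equally sized validation and test sets. The first call holds out 30% of the rows. The second call passes no testFraction, so, assuming the ML.NET default of 0.1 applies, it keeps 90% of that held-out portion for validation and 10% for testing. The sketch below works through the arithmetic. SplitDemo and Shares are hypothetical names, and real row counts will only approximate these shares because rows are assigned at random.

```csharp
using System;

public static class SplitDemo
{
    // Returns the approximate share of the full dataset that ends up in each set
    // after two consecutive train/test splits.
    public static (double Train, double Validation, double Test) Shares(
        double firstTestFraction, double secondTestFraction)
    {
        double train = 1.0 - firstTestFraction;     // kept by the first split
        double heldOut = firstTestFraction;         // passed to the second split
        double test = heldOut * secondTestFraction; // test part of the held-out rows
        double validation = heldOut - test;         // remainder is used for validation
        return (train, validation, test);
    }

    public static void Main()
    {
        var (train, validation, test) = Shares(0.3, 0.1);
        // Prints: train=70% validation=27% test=3%
        Console.WriteLine($"train={train * 100:F0}% validation={validation * 100:F0}% test={test * 100:F0}%");
    }
}
```

If a larger test set is wanted, passing an explicit testFraction to the second call is one option.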

Define the training pipeline

var classifierOptions = new ImageClassificationTrainer.Options()
{
    FeatureColumnName = "Image",
    LabelColumnName = "LabelAsKey",
    ValidationSet = validationSet,
    Arch = ImageClassificationTrainer.Architecture.ResnetV2101,
    MetricsCallback = (metrics) => Console.WriteLine(metrics),
    TestOnTrainSet = false,
    ReuseTrainSetBottleneckCachedValues = true,
    ReuseValidationSetBottleneckCachedValues = true,
};

// Map the predicted key back to its string label so it fits ModelOutput.PredictedLabel.
var trainingPipeline = mlContext.MulticlassClassification.Trainers.ImageClassification(classifierOptions)
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

Train the model

Apply the data to the training pipeline.

ITransformer trainedModel = trainingPipeline.Fit(trainSet);

Use the model

  1. Create a utility method to display predictions.
private static void OutputPrediction(ModelOutput prediction)
{
    string imageName = Path.GetFileName(prediction.ImagePath);
    Console.WriteLine($"Image: {imageName} | Actual Value: {prediction.Label} | Predicted Value: {prediction.PredictedLabel}");
}

Classify a single image

  1. Make predictions on the test set using the trained model. Create a utility method called ClassifySingleImage.
public static void ClassifySingleImage(MLContext mlContext, IDataView data, ITransformer trainedModel)
{
    PredictionEngine<ModelInput, ModelOutput> predictionEngine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(trainedModel);

    // Take the first row of the data set as the image to classify.
    ModelInput image = mlContext.Data.CreateEnumerable<ModelInput>(data, reuseRowObject: true).First();

    ModelOutput prediction = predictionEngine.Predict(image);

    Console.WriteLine("Classifying single image");
    OutputPrediction(prediction);
}
  2. Use the ClassifySingleImage method inside your application.
ClassifySingleImage(mlContext, testSet, trainedModel);

Classify multiple images

  1. Make predictions on the test set using the trained model. Create a utility method called ClassifyImages.
public static void ClassifyImages(MLContext mlContext, IDataView data, ITransformer trainedModel)
{
    IDataView predictionData = trainedModel.Transform(data);

    // Only the first ten predictions are displayed.
    IEnumerable<ModelOutput> predictions = mlContext.Data.CreateEnumerable<ModelOutput>(predictionData, reuseRowObject: true).Take(10);

    Console.WriteLine("Classifying multiple images");
    foreach (var prediction in predictions)
    {
        OutputPrediction(prediction);
    }
}
  2. Use the ClassifyImages method inside your application.
ClassifyImages(mlContext, testSet, trainedModel);

Run the application

Run your console app. The output should be similar to that below. You may see warnings or processing messages, but these messages have been removed from the following results for clarity. For brevity, the output has been condensed.

Bottleneck phase

Phase: Bottleneck Computation, Dataset used:      Train, Image Index: 279
Phase: Bottleneck Computation, Dataset used:      Train, Image Index: 280
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:   1
Phase: Bottleneck Computation, Dataset used: Validation, Image Index:   2

Training phase

Phase: Training, Dataset used: Validation, Batch Processed Count:   6, Epoch:  21, Accuracy:  0.6797619
Phase: Training, Dataset used: Validation, Batch Processed Count:   6, Epoch:  22, Accuracy:  0.7642857
Phase: Training, Dataset used: Validation, Batch Processed Count:   6, Epoch:  23, Accuracy:  0.7916667

Classification Output

Classifying single image
Image: 7001-220.jpg | Actual Value: UD | Predicted Value: UD

Classifying multiple images
Image: 7001-220.jpg | Actual Value: UD | Predicted Value: UD
Image: 7001-163.jpg | Actual Value: UD | Predicted Value: UD
Image: 7001-210.jpg | Actual Value: UD | Predicted Value: UD
Image: 7004-125.jpg | Actual Value: CD | Predicted Value: UD
Image: 7001-170.jpg | Actual Value: UD | Predicted Value: UD
Image: 7001-77.jpg | Actual Value: UD | Predicted Value: UD
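
A quick way to read the classification output is to compute the share of rows where the predicted value matches the actual value. The sketch below does this for the six condensed rows shown above. AccuracyDemo is a hypothetical helper that is not part of the sample, and six rows are far too few to judge the model.

```csharp
using System;
using System.Linq;

public static class AccuracyDemo
{
    // Fraction of rows whose predicted label equals the actual label.
    public static double Accuracy((string Actual, string Predicted)[] rows) =>
        rows.Length == 0
            ? 0.0
            : (double)rows.Count(r => r.Actual == r.Predicted) / rows.Length;

    // The actual/predicted pairs from the "Classifying multiple images" output.
    public static readonly (string Actual, string Predicted)[] SampleRows =
    {
        ("UD", "UD"),
        ("UD", "UD"),
        ("UD", "UD"),
        ("CD", "UD"), // the one cracked image in the excerpt is misclassified
        ("UD", "UD"),
        ("UD", "UD"),
    };

    public static void Main()
    {
        int correct = SampleRows.Count(r => r.Actual == r.Predicted);
        Console.WriteLine($"{correct} of {SampleRows.Length} correct"); // 5 of 6 correct
    }
}
```

The only cracked image in this excerpt is predicted as uncracked, so overall accuracy on its own can hide poor performance on the less frequent class.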

Improve the model

  • More Data: The more examples a model learns from, the better it performs. Download the full SDNET2018 dataset and use it to train.
  • Augment the data: A common technique to add variety to the data is to augment the data by taking an image and applying different transforms (rotate, flip, shift, crop). This adds more varied examples for the model to learn from.
  • Train for a longer time: The longer you train, the more tuned the model will be. Increasing the number of epochs may improve the performance of your model.
  • Experiment with the hyper-parameters: In addition to the parameters used in this tutorial, other parameters can be tuned to potentially improve performance. Changing the learning rate, which determines the magnitude of updates made to the model after each epoch, may improve performance.
  • Use a different model architecture: Depending on what your data looks like, the model that can best learn its features may differ. If you're not satisfied with the performance of your model, try changing the architecture.


ML.NET and Model Builder

ML.NET is an open-source, cross-platform machine learning framework for .NET developers. It enables integrating machine learning into your .NET apps without requiring you to leave the .NET ecosystem or even have a background in ML or data science.

We are excited to announce new versions of ML.NET and Model Builder!

In this post, we’ll cover the following items:

  1. Model Builder Preview
  2. ML.NET v1.5.5
  3. Virtual ML.NET Community Conference
  4. Feedback
  5. Get started and resources

Model Builder Preview

This preview brings a lot of big changes to Model Builder, and we’re excited to get your feedback on all the new features which include:

  • Config-based training with generated code-behind files
  • Restructured Advanced Data Options
  • Redesigned Consume step

You can sign up for the Preview at

Config-based training with generated code-behind files

The Model Builder experience has been revamped! Now when you right-click on your project in Solution Explorer and Add > Machine Learning, the Add New Item Dialog opens, and you can add an ML.NET Model.

New Item Dialog in Visual Studio

After adding your model, the Model Builder UI opens, and a new item (an *.mbconfig file) shows up in the Solution Explorer.

Model Builder UI in Visual Studio

Close up of Solution Explorer in Visual Studio

At any point when using Model Builder, if you close out of the UI, you can double click on the *.mbconfig in Solution Explorer, and it will open the UI again to your last saved state.

After training, two files are generated under the *.mbconfig file:

Solution Explorer expanded with mbconfig in Visual Studio

  • Model.consumption.cs: This file contains the Model Input and Model Output schemas as well as the Predict function generated for consuming the model.
  • The training code file: This file contains the training pipeline (data transforms, algorithm, algorithm hyperparameters) chosen by Model Builder to train the model. You can use this pipeline for re-training your model.
  • The model file: This is a serialized zip file which represents your trained ML.NET model.

Previously, these files were added as two new projects (a class library for model consumption code and a console app for the training pipeline). The new experience is similar to adding a new form in a Windows Forms application, where there are code-behind files behind the form and double clicking the form opens the designer.

If you open the *.mbconfig file, you can see that it is simply a JSON file with state information:

{
  "TrainingConfigurationVersion": 0,
  "TrainingTime": 10,
  "Scenario": {
    "ScenarioType": "Classification"
  },
  "DataSource": {
    "DataSourceType": "TabularFile",
    "FileName": "C:\\Desktop\\Datasets\\yelp_labelled.txt",
    "Delimiter": "\t",
    "DecimalMarker": ".",
    "HasHeader": true,
    "ColumnProperties": [
      {
        "ColumnName": "Comment",
        "ColumnPurpose": "Feature",
        "ColumnDataFormat": "String",
        "IsCategorical": false
      },
      {
        "ColumnName": "Sentiment",
        "ColumnPurpose": "Label",
        "ColumnDataFormat": "String",
        "IsCategorical": true
      }
    ]
  },
  "Environment": {
    "EnvironmentType": "LocalCPU"
  },
  "Artifact": {
    "Type": "LocalArtifact",
    "MLNetModelPath": "C:\\source\\repos\\ConsoleApp8\\ConsoleApp8\\"
  },
  "RunHistory": {
    "Trials": [
      {
        "TrainerName": "AveragedPerceptronOva",
        "Score": 0.8059,
        "RuntimeInSeconds": 4.4
      }
    ],
    "Pipeline": "[{\"EstimatorType\":\"MapValueToKey\",\"Name\":null,\"Inputs\":[\"Sentiment\"],\"Outputs\":[\"Sentiment\"]},{\"EstimatorType\":\"FeaturizeText\",\"Name\":null,\"Inputs\":[\"Comment\"],\"Outputs\":[\"Comment_tf\"]},{\"EstimatorType\":\"CopyColumns\",\"Name\":null,\"Inputs\":[\"Comment_tf\"],\"Outputs\":[\"Features\"]},{\"EstimatorType\":\"NormalizeMinMax\",\"Name\":null,\"Inputs\":[\"Features\"],\"Outputs\":[\"Features\"]},{\"LabelColumnName\":\"Sentiment\",\"EstimatorType\":\"AveragedPerceptronOva\",\"Name\":null,\"Inputs\":null,\"Outputs\":null},{\"EstimatorType\":\"MapKeyToValue\",\"Name\":null,\"Inputs\":[\"PredictedLabel\"],\"Outputs\":[\"PredictedLabel\"]}]",
    "MetricName": "MicroAccuracy"
  }
}

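Because the file is JSON, it can be inspected with any JSON parser. As a minimal sketch, assuming the property names shown in the listing above and using System.Text.Json, the following reads the scenario, the metric, and the recorded trials from a trimmed copy of the config. MbConfigDemo and Summarize are hypothetical names, and the file format is a preview detail that may change.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class MbConfigDemo
{
    // Trimmed copy of the config shown above, embedded so the sketch runs on its own.
    public const string Sample = @"{
  ""TrainingTime"": 10,
  ""Scenario"": { ""ScenarioType"": ""Classification"" },
  ""RunHistory"": {
    ""Trials"": [
      { ""TrainerName"": ""AveragedPerceptronOva"", ""Score"": 0.8059, ""RuntimeInSeconds"": 4.4 }
    ],
    ""MetricName"": ""MicroAccuracy""
  }
}";

    // Returns one summary line per trial recorded in RunHistory.
    public static List<string> Summarize(string json)
    {
        var lines = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        string scenario = root.GetProperty("Scenario").GetProperty("ScenarioType").GetString();
        JsonElement history = root.GetProperty("RunHistory");
        string metric = history.GetProperty("MetricName").GetString();

        foreach (JsonElement trial in history.GetProperty("Trials").EnumerateArray())
        {
            string trainer = trial.GetProperty("TrainerName").GetString();
            // GetRawText keeps the number exactly as it is written in the file.
            string score = trial.GetProperty("Score").GetRawText();
            lines.Add($"{scenario}: {trainer} {metric}={score}");
        }

        return lines;
    }

    public static void Main()
    {
        foreach (string line in Summarize(Sample))
            Console.WriteLine(line); // Classification: AveragedPerceptronOva MicroAccuracy=0.8059
    }
}
```

To inspect a real file, the embedded string could be replaced with File.ReadAllText on the path of your own *.mbconfig.
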
This new Model Builder experience brings many benefits. You can:

  • Specify the name of your model and generated code.
  • Have more than one Model Builder-generated model in a solution.
  • Save your state and come back to the last saved state. If you spend an hour training and close out of Model Builder, now you don’t have to start over and can just pick up where you left off.
  • Share the *.mbconfig file and collaborate on the same Model Builder instance via source control.
  • Use the same *.mbconfig file in Model Builder and the ML.NET CLI (coming soon!).

Restructured Advanced Data Options

In the last Model Builder release, we added advanced data options for data loading which gave you more control over column settings and data formatting.

In this release, we added several more options and reorganized the options to make selecting your column settings even easier:

  • Purpose: Choose whether the column is a Feature column, a Label column, or a column to Ignore during training.
  • Data type: Choose whether the data in the column is a String, Single, or Boolean.
  • Categorical: Choose whether the column is categorical or not.

Advanced Data Options in Model Builder

Redesigned Consume Step

We have redesigned the consume step to make a smooth transition from training and evaluating a model to using that model to make predictions in an end-user application.

A code snippet has been provided in the UI which demonstrates how to set up the Model Input as well as how to use the generated Predict function to return the predicted output.

Each Model Input property is filled in with sample data from the first row of your dataset. You can use the copy button in the top right of the box to copy the entire code snippet; then once you paste this code into your end-user application, you can modify the Model Input fields to get real data to feed into your model.

Consume step in Model Builder

Additionally, there is a new Sample project section which generates an application that uses your model and adds the project to your solution. In previous versions of Model Builder, a sample console app was automatically added to your solution; now you can choose whether you want to add a new project to use your model.

Currently, there is only the option to add a console app, but in the future, we plan to add support for Web APIs, Azure Functions, and more.

ML.NET v1.5.5

This release of ML.NET brings numerous bug fixes and enhancements as well as the following new features:

  • New API that accepts double type for the confidence level which helps when you need to have higher precision than an int will allow for. Thank you @esso23 for your contributions!
  • Support for exporting the ValueMapping estimator to ONNX.
  • New API to specify if the output from TensorFlow is batched or not (previously ML.NET always assumed it was a batch amount which caused errors when that wasn’t true).

Check out the release notes for more details.

Virtual ML.NET Community Conference

On May 7th, the 2nd annual Virtual ML.NET Community Conference will kick off with 2 days of sessions on all things ML.NET, and we’re looking for speakers to talk about:

  • MLOps
  • Case studies and real-life use cases
  • Interactive computing with Jupyter
  • ML.NET interop (ONNX)
  • ML.NET and IoT devices
  • ML.NET in F#
  • Big Data and ML.NET
  • A journey from experimentation to production
  • Anything else ML.NET related you can think of!

This is a 100% free event, by the community, for the community.

Published By:

Bri Achtman - Program Manager, .NET

March 15th, 2021

4 inspirational examples of legacy systems modernisation

We all know how hard it is to get rid of that old car. You have many memories associated with it and it always got you from point A to point B. It still works, right? You know you'd love a new model that has new features, that can actually make it up a hill without belching smoke. But new cars are expensive, and the one you have works, right? However, your gas mileage is terrible. It breaks down frequently and, worse, it's getting harder and harder to find replacement parts.

You can't seem to find any 8-track tapes any more. You can't plug your phone into the AM radio. The crank on the window has fallen off.

You get the picture. It's time for a new car. With old legacy systems, much like your old wheels, it's often hard to give them up for all sorts of reasons. However, in this case modernising does not simply mean getting newer software, but entirely new systems with features that you never even realised you were missing. Upgrading to new systems is less like getting a newer model of car and more like trading your old Buick for a starship.

Let's take a look at some inspirational examples from a few vertical industries.

CRM Adaptation

For many B2B and B2C companies, adaptation to a Customer Relationship Management system may seem like something that is obvious these days. We are long past the era of the Rolodex (how many of you under the age of 35 even know what this is?) for keeping track of our business contacts.

We mostly think of these as a way of providing customised service to customers, but there are many more advantages. With CRM software, we are able to create more detailed intelligence about our users, develop new and improved marketing strategies, and make it possible to share information between different representatives within a company.

The data alone can provide the foundation for gaining better business intelligence, using machine learning and AI to develop clearer understanding of business topography and stronger marketing campaigns.

Core Banking Systems

Using Core Banking, or Centralised Online Real-time Exchange, has enabled banks and other financial institutions to completely revolutionise the way that they operate. With older legacy systems, even between branches of the same bank it used to take at least a day for a transaction to reflect in a real account. Each bank would have its own local server and data was only shared via a batch process at the end of the business day.

With Core Banking systems, all branches operate on one system. All information is updated instantly, so there is no wait for balances to be reconciled. Simple processes such as verification of transfers, or any other transaction, can happen instantaneously, and often between different banks. Whereas it used to take up to a week for a check to clear, it can now typically occur in less than a day, or in some cases instantaneously (assuming banks are using compatible systems).

With Core services, making/servicing loans, opening new accounts, processing cash transactions, interest calculations, and more are now handled instantaneously, with each party operating off of the same data.

ERP Solutions

Manufacturing and Pharmaceutical companies are now adapting to using Enterprise Resource Planning systems.

ERP Solutions help integrate all business processes into a seamless operation. They can cover everything from automation of back-office processes to product planning, development, manufacturing, sales, and marketing, often synchronising all of these processes in real time. As a result, they help streamline processes, reduce costs, and make companies more flexible. Companies become more competitive because they can adapt and respond to change.

Within the pharmaceutical sector, this has been particularly valuable, due to the need to adapt to major transformations associated with increased competition and stringent production rules. ERP enables consolidation and integration of pharmaceutical manufacturing processes across multiple units, and tracking of sensitive operations such as compliance, expiry management, quality, formulation, cost, and yield.

Cloud-based Applications

Across almost every vertical is the adaptation to working with cloud-based systems.

The advantages are multifold. License management is much easier to handle; instead of purchasing expensive software, many companies have moved to leasing their office software (such as Office365). Software as a Service (SaaS) is revolutionising the way that many companies operate, providing access to tools that were not previously considered feasible.

Data is able to be kept uniform and secure across an entire organisation, and as a result it is automatically backed up without having to go through normal batch processes, which always had the risk of lost information if something were to occur between batches.

On top of this, cloud computing improves communication between staff members and management, and makes it a lot easier for your employees to access data remotely (at least without having to log in via a VPN). This provides companies with a great deal more flexibility.


These are only a few examples, and the descriptions are exceedingly brief, but the idea is fairly clear. The advantages of trading that Edsel for a Tesla become obvious. By moving away from old legacy systems, while there is typically some up-front expense, the pay-off in terms of power for your company or organisation becomes patently clear.

Legacy Systems Modernisation explained


The argument for legacy systems modernisation

Technology is not like wine. It doesn’t get better with age. And yet, many companies continue to use systems that are long past their prime.   

There are many reasons for this. The simplest and most common one is that the software still works. It still copes with most of its tasks, still provides for the users’ needs. It’s the “if it ain’t broke, don’t fix it” argument. There are also economic reasons why businesses keep old systems running. These include uncertainty about return on investment (ROI), vendor lock-ins, change management challenges and concerns about system availability. For these reasons and others, owners deem the costs of redesigning or replacing the system prohibitive and/or unnecessary. 

However, it is now catching on in all industries that the benefits of modernising legacy systems with new technologies far outweigh the costs. This is no doubt why a 2018 survey by Whishworks revealed that 71% of IT leaders cite legacy systems modernisation as their top priority. 

Here are some of the key benefits. 

Improved efficiency and productivity

New technologies can eliminate paper, non-value-added work activities and procedural steps, dated and legacy business policies etc. This enables employees to be efficient and have more time to dedicate to billable work.   

Increased flexibility and agility

IT modernisation is based on agile methodologies designed to make it easier to adopt new technologies and solutions in the future. This enables businesses to evolve and expand more easily, and adapt more quickly to changing market conditions. 

Happier employees and customers

Streamlined workflows mean less paperwork, data entry and routine administrative tasks for employees. Faster, more effective processes result in improved service delivery and customer satisfaction. 

Lower support costs

Most modern software is hosted in the cloud on a SaaS basis, shifting the burden of maintaining it from the end-user organisation to the provider, eliminating the usual IT support costs. 

Improved security and compliance

The fact that most modern software is managed in-house by the vendor significantly reduces the risk of systems failures, security breaches and problems with compliance. 

Wide scope for integration

Modern software is integration-ready by default and third-party APIs make it possible to access the capabilities of other applications. This enables businesses to offer new and better solutions to their customers. Integrating the different platforms in use throughout a business allows all departments to communicate and collaborate.


Common legacy systems by vertical


Insurance

Many insurance companies are constrained by inflexible legacy systems for billing, claims, policy administration, underwriting and broker/agent management. Many of these functions are facilitated by workarounds, which is why managing and operating these systems is costly, and is becoming increasingly so.

In addition, many life insurers and property & casualty (P&C) insurers compete by constantly launching new products and services. However, legacy systems are hindering their speed to market, which is a critical market differentiator for those firms.

There is also the problem that many life and P&C insurers are using legacy systems that date back 40 years or more, which means the IT employees with knowledge of those systems are about to retire—if they haven’t already.


Healthcare

Healthcare is increasingly data-driven and hospital IT leaders are under pressure to keep pace with new initiatives whilst managing a growing pool of legacy data systems.

Many hospital IT leaders have chosen to keep certain legacy systems running indefinitely. This is because converting clinical data such as lab results, radiology reports and ambulatory data to new systems is expensive, time-consuming and sometimes not even possible.

However, since this is a risky and costly approach, increasing numbers of hospitals are looking at active archiving solutions that allow legacy applications to be decommissioned, while still enabling hospital staff to have real-time access to records.


Airlines

Most airlines still rely on IBM’s Transaction Processing Facility (TPF), introduced in the 1960s. This processes hundreds of thousands of transactions per second and is still very reliable despite its age.

However, TPF is good at processing high volumes of data but nothing else. As a result, airlines rely on increasing numbers of bolt-on solutions for managing flight operations and offering customers more options. A disconnect between old and new IT systems is being blamed for the major systems failures that airlines such as Delta have experienced in the last few years.

Airlines are now being encouraged to spend a lot more on their IT.


Banking

In the 1970s and 80s, banks were at the forefront of technological innovation. This was the period that brought us ATMs, BACS and international card payments. However, the sector put the brakes on innovation after that, which is why many core systems banks rely on today are ones that were built back then.

For example, a lot of banks are still using systems that are written primarily in COBOL, which was introduced in the 1960s. Not many people are learning COBOL anymore and many COBOL coders have already retired. Some are being persuaded to become part-time consultants just to keep banking systems up and running.

The banking industry is one that is in particular need of modernisation. However, it is also the best example of why businesses cannot simply rip out and replace their legacy systems. This is because every product banks sell is supported by a system made of thousands of interdependencies.

These are but a few... 

These are just some of the industries impacted by legacy technologies. Research has consistently shown that the vast majority of IT leaders across all sectors believe that legacy systems are holding back their business.   

Challenges posed by retaining outdated systems

We are living in an age of digital transformation and legacy systems are a barrier to that. The “if it ain’t broke, don’t fix it” attitude fails to take account of the following challenges.

• The cost of maintaining legacy systems is high. A 2018 report by EY revealed that banks spend 75% of their IT budgets on maintaining legacy systems, including technical patches, workarounds to augment old platforms’ limited capabilities, and maintaining old infrastructure. 

• Customers expect more from their providers. Today’s customers demand faster, easier and more secure access to the things they need. They want personalised solutions based on implicit assumptions that their providers know them well from past interactions. Legacy systems make it extremely difficult to meet these ever-increasing expectations.

• Scope for integration is limited. Modern software platforms are integration-ready by default, whereas legacy systems lack compatibility and large amounts of custom code are required to connect them. 

• Legacy systems are less secure. Cyber-attacks are becoming increasingly sophisticated and legacy systems are not equipped to handle them. Plus, the older a system is, the more time an attacker has had to learn the code and discover its vulnerabilities.

• Compliance is difficult. Old systems beset with process gaps and human error-prone manual interventions make it extremely difficult to comply with relevant industry laws, regulations and internal procedures. 

• Legacy systems inhibit productivity. Administrative work required to fill in process gaps and augment system limitations stops staff from doing billable work.

• It’s difficult to take advantage of new business opportunities. The more money and time you spend on maintaining legacy systems, the less you have for innovation.

• It’s difficult to respond to new market challenges. Businesses bound to legacy systems lack organisational agility to adapt quickly to changes in their market.

• There is often a lack of support. A lot of outdated software is no longer supported or maintained by the vendor, which means there is no one to provide patches or fixes when problems arise. There is also a shortage of engineers skilled in legacy programs because most are reaching retirement age.

• Legacy systems offer a fragmented view of the customer. Customer experiences are disjointed and unsatisfying when customer service representatives are unable to answer their questions due to data silos. 

So, in today’s world, the fact that your legacy systems work is not enough. They need to work better. They may not be broken, but they still need fixing. 

Cost of modernisation versus cost of inaction

Assessing how much it will cost to modernise your legacy systems versus how much it will cost if you do nothing is a great way of deciding whether modernisation is worth investing in.

Cost of modernisation 

The cost of modernisation depends on a variety of factors. The first is how much modernising you intend to do. Some companies will simply opt to improve their legacy infrastructure with minor additions and new code, extending its lifespan for a few more years. Others will take a sticky plaster approach, using new technology to address specific issues while the core architecture remains the same.   

Both of these approaches are only a stop-gap (and can cost more in the long run). Most firms eventually need to rebuild their systems in their entirety, either gradually or all at once. This approach requires the biggest investment, but it also delivers the biggest returns.   

If you opt for a total replacement with new, custom-built software, the next factor to have a bearing on the cost is the size of that software. The more screens/pages you have, the more expensive the application will be to deliver. The same goes for the level of complexity (which affects how much coding and testing will need to be done) and the level of creative design.   

If you intend to integrate your application with other systems, this will also affect the project cost. Integrations with payment providers like PayPal are typically very easy, but older or lesser-known systems could pose a challenge and drive up the cost. 

A major part of most legacy systems modernisation projects is data migration. This involves writing custom scripts that lift data from your old system and reshape it to fit the new one. Test runs of migration data will need to be performed once the software is finished to ensure that the new system is using the data properly, and adjustments made if something hasn’t translated correctly. These actions add time and cost to modernisation projects.   

Cost of inaction 

The most obvious and easiest-to-calculate cost of keeping your systems as they are is maintenance. According to a 2018 report by EY, banks spend 75% of their IT budgets on maintaining legacy systems, a potentially enormous sum that leaves little room for anything else.

The next cost to consider stems from the fact that most legacy systems are beset with manual processes. Find out how much time could be saved if those processes were automated and you will see how much they are costing your business. 

Then there are the costs that are incurred when legacy systems fail. When Delta Airlines’ systems crashed in 2016, its whole fleet was grounded. This cost the airline $150m.   

Legacy systems can also have a significant negative impact on competitiveness. Today’s customers are mobile, interface with IT systems constantly, and expect increasingly interactive and personalised experiences. New technologies are appearing all the time to meet and drive these expectations. If your business is using outdated systems, you may not be able to deliver the experience customers expect, and eventually they will turn to a competitor.

In most cases, however, the biggest overall cost is that the existing system is simply unable to support the growth of the business. It cannot handle the volume and frequency of data required for a growing customer base or support the direction the business wants to take. Being constrained from pursuing new customers and new growth opportunities is a cost that most businesses cannot afford, particularly ones in competitive markets.     

Modernisation options

Here are 7 possible approaches to modernising your legacy systems. As touched on in the previous section, how much modernising you decide to do will have the biggest bearing on the resources you will need to fund the project. 

• Total transformation

As it sounds, this involves replacing your old system with an entirely new one and migrating all your data to the new platform. Although risky, it is the most rewarding in the long run.

• Gradual replacement

The entire system is rebuilt in blocks, which is lower risk, easier on budgets and delivers results quicker. Although success rates are high, systems can become disjointed if the project isn’t monitored carefully.

• Sticky plaster approach

Your system is patched using new technology to address specific issues, but the core architecture remains the same. This can offer big returns when done properly, though it can lead to a jumbled and poorly designed system if used too much.

• Improvement

Your technology remains the same but is improved using new code. This extends the lifespan of the technology and doesn’t require procurement of new apps; however, it is only a stopgap.

• As-you-were approach

You decide that your firm does not need to modernise, or is not ready or able to—yet. You decide that you are going to ‘wait and watch’. 

• Blended

You forge your own path to modernisation, adapting one or more of the aforementioned approaches.

• Transformation-as-a-Service

You partner with an IT services provider like QBA, who will design and manage the entire modernisation process for you. The provider will shoulder the burden of deciding which modernisation approach will work best, taking the pressure off your IT teams. Firms like QBA will then give unadulterated advice and guidance from a technology-agnostic perspective and coordinate a modernisation project that is tailor-made to suit your needs, priorities and budgets.   

Proven steps for modernisation project success

Legacy systems modernisation projects are highly rewarding if they are carefully planned and monitored. Here is a checklist to help you engineer a successful and future-proof transformation. 

1. Build your business case as early as possible, involving your key stakeholders and looking carefully at the benefits, costs and risks.

2. Make sure all impacted parties are committed, particularly as modernisation projects can take years to complete.

3. Free up your experts to focus exclusively on the transformation, hiring temporary resources to cover their day-to-day activities.

4. Assess the current state of your legacy systems, looking at problems, process gaps and potential future issues.

5. Find partners and vendors you can rely on and take advantage of their expertise.

6. Choose the best-fit modernisation approach, one that will empower you to deliver results fast.

7. Appraise technology stacks, looking at the most optimal solutions for your purposes, both now and in the future. 

8. Make a plan that sets out everybody’s responsibilities, details a strong quality assurance and testing process, and maps out a retirement schedule for your legacy apps. 

The Blockchain Explained to Web Developers, Part 3: The Truth


After exploring the blockchain theory and using it for real, we now have a better understanding of its strengths and weaknesses. Surprisingly, most of our conclusions are very different from what you will read in the blogosphere. Maybe it’s because we don’t blindly relay the fascination caused by the huge valuations of BitCoin and others. Maybe it’s because the hard truth about the blockchain is that it’s not ready yet. Read on to understand our take on the blockchain, based on strong evidence.

The Technology Is Not Mature Enough

As explained in detail in the previous post in this series, developing Decentralized Apps over a blockchain is a pain. The development community is small, available code snippets don’t work, public tutorials are outdated, the libraries are riddled with bugs, developer tooling is lacking, bugs are silent, etc.

It’s not that the Ethereum developers and community are bad; they’re amazing, and they’re pouring a lot of time and expertise into their tools. But building a blockchain framework is a huge amount of work, and they’re only halfway through. Ethereum hasn’t reached the point of usability yet. I’m confident that this will change in the future, but I don’t know if it’s a matter of months or years.

Tip: We haven’t developed DApps for Bitcoin, but I’ve heard it’s worse. Instead of using a JavaScript-like language (Solidity) for smart contracts, you must use an assembly-like language, which isn’t even Turing-complete. Yikes.

The consequence is that developers don’t want to work on blockchain projects - they find it very frustrating. If you force them to work with a technology they hate, they will leave. Since it’s extremely hard to find skilled developers these days, you should think twice before taking a chance on the blockchain.

The second consequence is that it’s impossible to estimate the time it will take to build a project on the blockchain. If you can’t estimate your costs, good luck building a Business Model on the blockchain.

Smart Contracts Can’t Call APIs

In our blockchain experimentation, everything a bit “smart” in the contract had to be moved to a plain old web service running outside of the blockchain, in a trusted environment. For instance, a smart contract can’t figure out if the person asking for an ad placement is the author of a pull request, because a smart contract can’t call the GitHub API. As a consequence, our smart contract keeps only a very minimal amount of logic, becoming, in fact, a dumb contract. It’s not because we wanted to, it’s because we couldn’t do otherwise.

By design, a blockchain is deterministic. That means that if you take the entire history of blocks, and replay it locally, you should end up with the same state as every other node. This forbids the call to external APIs, where responses may change over time, or according to who calls them.
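To make the determinism argument concrete, here is a minimal Python sketch (not Ethereum code). The transaction format and the `apply_tx`, `replay` and `fingerprint` helpers are hypothetical names chosen for illustration. The only point is that replaying the same history with pure functions yields the same state on every node.

```python
import hashlib
import json

def apply_tx(state, tx):
    """Apply one transaction to a copy of the state (pure function, no I/O)."""
    new_state = dict(state)
    new_state[tx["account"]] = new_state.get(tx["account"], 0) + tx["amount"]
    return new_state

def replay(history):
    """Rebuild the state from scratch by replaying every transaction in order."""
    state = {}
    for tx in history:
        state = apply_tx(state, tx)
    return state

def fingerprint(state):
    """Hash a canonical serialization so two nodes can compare states cheaply."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

# Hypothetical history shared by every node.
history = [
    {"account": "alice", "amount": 5},
    {"account": "bob", "amount": 3},
    {"account": "alice", "amount": -2},
]

node_a = replay(history)
node_b = replay(list(history))  # a second node replaying its own copy
```

If `apply_tx` called a web API whose response changes over time or per caller, `node_a` and `node_b` could end up with different fingerprints, which is exactly what the design rules out.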

Blockchains are walled gardens. You can execute a contract from the outside world, but a contract itself can’t require data from a source outside of the blockchain. If a smart contract needs external data, someone must push the data to the blockchain first. There is an effort to ease this process through a concept called Oracles. But Oracles need a reputation system and governance. So much for fully-automated contracts and disintermediation.

In the real world, very few applications work in isolation. All the applications we’ve built for the past 3 years relied on external APIs - for identity management, payment, live data source, image processing, storage, etc. The limited capabilities of smart contracts make them useless in real world situations.

You Need A PhD to Understand It

If you read through the first blog post of this series, you probably think that you have a good basic understanding of the blockchain. Now, go and read this article. I’m an average engineer with only 20 years of experience in Web Development, and I couldn’t understand anything after the Jurassic Park reference. Terms like “two-way-pegged blockchains”, “pre-determined Host Oracle Contract”, and sentences like “The M-S result, combined with our inability to feed (non-BB) a revelation mechanism, means that Oracles are out” make me feel like a first grader.

The blockchain concept is complex. Existing implementations rely on rare design patterns that you don’t learn in college. The blockchain vocabulary is kabbalistic.

Developing decentralized apps on top of blockchains requires understanding too many complicated concepts to fit in an average developer’s brain. My opinion is that there are not enough highly skilled programmers to support the revolution promised by the blockchain. And there will never be, as long as it’s so hard to understand.

As a consequence, most Decentralized apps are very buggy. A recent article stated that smart contracts contain 1 bug every 10 lines of code, making Ethereum “candy for hackers”. It wouldn’t be such a big deal if fixing bugs was easy. Unfortunately, as we explained in the previous post, you can’t update a smart contract. You have to create a new contract, transfer all the data and pointers from the old contract to the new one, and wait for the blockchain to propagate the change. The old buggy contracts and transactions remain in the blockchain forever.

Developer Power

The blockchain authors suggest using “code as law”. This also means “bugs as law”, as all software contains bugs. These bugs can be used by smart developers (criminals, the NSA, etc.) to avoid playing by the rules. Bugs are very common, even in popular open-source projects. Bitcoin, for instance, suffered several critical bugs leading to “cybertheft”. So leaving the keys to developers also means giving extraordinary power to the mean developers.

I don’t want to go all FUD (Fear, Uncertainty and Doubt) on you, but the possible scenarios of a society governed by machines don’t all finish with a happy ending in my mind.

When machines control the world

And even if we don’t consider mean developers, giving the power to good developers is dangerous, too. The problem is that developers are irresponsible (no harm intended - I’m a developer myself). It’s not that they’re childish, it’s that nobody ever taught them to write the law.

Also, developers are not elected by the people. If you don’t agree with the direction that Bitcoin takes (favoring speculation rather than practical applications) too bad for you - there is nothing you can do to change that. This is currently happening: the Bitcoin network currently suffers a severe crisis, because of the disagreements between a few core developers.

The decisions of half a dozen developers may cause the collapse of a billion dollar market capitalization. But nobody will hold them accountable in case of failure.

Waste of Resources

A blockchain is not cost-effective at all. In fact, it’s a huge waste of resources.

Take data replication for instance. The blockchain replicates all transactions across all nodes. Engineers long ago invented replication strategies with better space efficiency. Compare the blockchain with RAID6 disk clustering for instance:

In a blockchain network, 10 nodes of 1GB each allow for a total replicated data volume of 1GB. You can lose up to 9 nodes in the network, and still be able to recover the entire data.

In a RAID6 pool, 10 hard disks of 1GB each allow for a total replicated data volume of 8GB. You can lose up to 2 HDDs in the pool, and still be able to recover the entire data.
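The two figures above follow from simple arithmetic. As a rough sketch, with hypothetical helper names `full_replication` and `raid6`:

```python
def full_replication(nodes, size_gb):
    """Every node stores a full copy: one node's worth of data, survives all but one loss."""
    return {"usable_gb": size_gb, "tolerated_losses": nodes - 1}

def raid6(disks, size_gb):
    """RAID 6 reserves two disks' worth of parity: survives any two disk losses."""
    return {"usable_gb": (disks - 2) * size_gb, "tolerated_losses": 2}

blockchain = full_replication(nodes=10, size_gb=1)
pool = raid6(disks=10, size_gb=1)

# Share of the raw 10GB that actually holds distinct data.
efficiency = {
    "blockchain": blockchain["usable_gb"] / (10 * 1),
    "raid6": pool["usable_gb"] / (10 * 1),
}
```

With these numbers the blockchain uses 10% of the raw capacity for distinct data, against 80% for the RAID6 pool. The price of the RAID6 efficiency is a much lower tolerance to failures.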

Mining nodes require very expensive hardware, with high end GPU cards and a huge amount of memory.

And it’s not just about buying expensive hardware. 99.99% of the computing is just wasted. All miners compete to mine a block by running expensive Math challenges. In Bitcoin, only one node every 10 minutes wins, and is actually useful to the chain by creating a block. The computation done by all the other nodes is thrown away.
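To give an idea of what such a challenge looks like, here is a deliberately simplified proof-of-work loop in Python. It is a toy, not Bitcoin's actual algorithm: the block format, the `mine` function and the "leading zeros in the hex digest" rule are assumptions made for illustration.

```python
import hashlib

def mine(block_data, difficulty):
    """Try nonces until the SHA-256 hex digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

# Hypothetical block content; difficulty 3 keeps the search short.
nonce, digest = mine("block 1: alice pays bob 5", difficulty=3)
attempts = nonce + 1  # every attempt before the last one produced nothing useful
```

Every miner runs a loop like this one on its own candidate block. Only the first miner to find a valid nonce gets its block accepted, so the hashing done by everybody else serves no purpose once the block is published.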

The Ethereum blockchain is trying to fix that: they plan to switch from a proof-of-work consensus algorithm to a proof-of-stake, which is much less resource intensive. But proof-of-stake also has drawbacks, such as giving more power to people or companies owning high amounts of cryptocurrency. Besides, it’s far from ready yet (expect it in at least a year from now).


This waste of storage, CPU and memory translates into a huge waste of energy. According to a bitcoin mining-farm operator, energy consumption totaled 240kWh per bitcoin in 2014 (the equivalent of 16 gallons of gasoline). Mining farms are a distributed engine turning electricity into heat. A blockchain is, in short, an expensive radiator. Energy efficiency is a big deal in a globally warming planet.

Very Expensive

Who pays for all the wasted energy? The companies that publish and use smart contracts. Yes, that’s you, if you intend to run a business on the blockchain. When you pay for a transaction on the blockchain, you also pay 99.99% of the network running at full speed for nothing. That makes blockchain transactions expensive.

A million dollars in bank notes

An average BitCoin transaction requires a fee of BTC 0.0002 ($0.11). This price is rising. It’s not really cheaper than a bank transaction fee (unless you consider a transfer across two countries with different currencies, of course).

For ZeroDollarHomepage, executing a 10-line method on Ethereum costs about one cent ($0.01). That’s insanely expensive. Amazon Lambda, for instance, costs $0.0000002 per request (after the first million requests each month).

It’s normal to pay for hosting costs when you use a platform, but the Blockchain costs are orders of magnitude higher than the most expensive PaaS.
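A quick back-of-the-envelope check of that claim, using only the two per-request prices quoted above (the variable names are ours, and nothing else about either platform's billing is modelled):

```python
from decimal import Decimal
import math

ethereum_call_usd = Decimal("0.01")        # price quoted above for a 10-line method
lambda_request_usd = Decimal("0.0000002")  # price quoted above for one Lambda request

# How many Lambda requests one Ethereum call would pay for.
ratio = ethereum_call_usd / lambda_request_usd
orders_of_magnitude = math.log10(float(ratio))
```

With these figures, one Ethereum call costs as much as 50,000 Lambda requests, which is between four and five orders of magnitude.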

Volatility and Speculation

You could say that the blockchain cost isn’t such a big deal, as long as people are willing to use the network and pay for transactions. It’s a question of supply and demand, and the demand for blockchain and cryptocurrencies is currently high enough to make it profitable. But this high demand leads to speculation, and therefore the price of computing and storage in a blockchain (any blockchain) is highly volatile.


Some analysts compare Bitcoin to a Ponzi Scheme, and predict that the market value will collapse once general interest disappears.

If we build a business based on Ethereum’s blockchain, most of our expenses will be in Ether. If we don’t mine it ourselves, we’ll have to pay for that Ether in real money. But since the USD value of Ether may vary tenfold within a year, our business can move from profitable to worthless in the same timeframe. And we can’t do anything about it. On the other hand, if we mine ourselves, what is currently affordable (running a small server to cover expenses in Ether) might become very expensive once very large mining farms move from Bitcoin to Ethereum.

The high volatility of cryptocurrencies forbids any long-term profitable business built on the blockchain - except speculation.

Slow As Hell

Compared to many other innovations based on computers and networks, the blockchain is very slow. Experts say that you should wait 6 blocks to make sure that a transaction is legit. This means more than 1 minute in Ethereum, or more than 1 hour in Bitcoin.
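The waiting time is simply the number of confirmations multiplied by the average block interval. Here is a small sketch: the 10-minute Bitcoin interval is the one mentioned earlier in this post, while the 15-second Ethereum interval is an assumption (an approximate average, not a figure from this article), and `confirmation_delay_seconds` is a hypothetical helper.

```python
def confirmation_delay_seconds(block_time_seconds, confirmations=6):
    """Average time to wait before a transaction has `confirmations` blocks."""
    return block_time_seconds * confirmations

bitcoin_delay = confirmation_delay_seconds(10 * 60)  # one block every 10 minutes
ethereum_delay = confirmation_delay_seconds(15)      # ASSUMPTION: about 15 s per block
```

This gives roughly 3,600 seconds (one hour) for Bitcoin and 90 seconds for Ethereum. Block intervals are averages, so the real delay for a given transaction can be noticeably longer.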

In a traditional ad server, scheduling an ad takes about 100ms. If you’ve used our ZeroDollarHomepage Ad Server, you probably had a very different experience: scheduling an ad takes about a minute. The network transport and replication account for a small share of that duration; most of the time is spent waiting for the network to mine the transaction, and add a few more blocks after that. But all in all, the Ethereum blockchain is several orders of magnitude slower than traditional computing.


For end users, every second counts. The Web Performance Optimization trend focuses on improving revenue by gaining one or two seconds in download time. Betting on a technology that requires a transaction to be acknowledged by the entire world isn’t the best way to do business.

Free Market and Anarchy

One of the promises of the blockchain is to liberate markets that still require an intermediary. No more lawyers, bankers, or bookmakers. A great opportunity for new businesses?

Except these intermediaries currently report criminal activities to the authorities (governments and law enforcement agencies). If you remove the intermediaries, you also remove the police, and you let criminals proliferate. The first bitcoin application at scale was called The Silk Road. It was an online marketplace for everything illegal: drugs, weapons, child pornography, etc. Not to mention the ability to use bitcoins for tax evasion.

Even the proponents of a free market economy recognize that a certain level of regulation is necessary to avoid total chaos. Running a business in a land full of criminals with no police isn’t profitable - unless you’re a criminal, too. For instance, the Mt. Gox bankruptcy in 2014 cost Bitcoin users about $450 million.

Just like it took a long time for governments to control the Internet (which was, and remains, a haven for criminals), it will take a long time for our lawmakers to control the anarchy unleashed by blockchains. The blockchain may carry the promise of a better future in the long term, but for the near future, you’d better be armed.

Do You Really Need A Blockchain?

A large share of the hype around the blockchain comes from people who don’t really understand its shortcomings. They would probably use another solution if they were better informed. Here are a few bad reasons to choose the blockchain technology.

You can use a private blockchain: Nearly 80% of the blockchain projects I hear about, especially in finance, are based on private blockchains. This completely defeats the main purpose, which is to get an agreement between non-trusted parties. If a project runs on a private blockchain, then only trusted parties can join it, and you don’t have a trust problem. In a trusted network, there are many, many other tools to share a ledger of facts - all much better optimized than the blockchain (for instance: a web service).

It offers a way to reach distributed consensus: It does, but only if this consensus can be written as code. For instance, a company working with music rights distribution recently contacted us to build an international platform for artist remuneration on the blockchain. Except that when two countries disagree on how to pay rights holders, they both have valid contracts. Only a court can decide which contract wins. No smart contract can replace that. You must have clear governance rules that already work before trying to automate them in a blockchain.

It’s secure: Asymmetric cryptography is one of the blockchain’s strengths. However, the blockchain technology, just like any other, is safe only until someone finds a vulnerability. It has already happened in the past. The computer science behind the blockchain is so complex that very few developers can contribute or review the code. Consider smart contracts and blockchains as relatively less secure than, say, TLS on the web (through HTTPS). Oh, and even if the software works perfectly, it doesn’t prevent fraud. Remember the double spend problem from our first post? It turns out people regularly try that in blockchains (see the latest 200 double spends in the Bitcoin blockchain).

It’s transparent: Granted, all transactions are public, and expose location and IP address. But no personal information ever transits - only anonymous hashes. Even the creator of Bitcoin is a mystery. So blockchain transparency doesn’t prevent crime or fraud. Also, transparency is usually an inconvenience for businesses. Are you willing to bet your business on a technology that lets everyone track all your transactions, and exposes your code to hackers?

Data is replicated and safe: Sure, but with the least cost-effective replication strategy. Amazon S3 replicates every bit of data at least 3 times with 100% uptime, for a fraction of the price. And if you actually need full transaction history, use an event store.

It connects anonymous peers: But if it’s only for shared storage (i.e. if you don’t need fact ordering), then regular peer-to-peer network protocols like BitTorrent are enough.

It’s hip: I can’t argue with that: yelling the word “blockchain” out loud is currently a great way to grab an innovation budget. However, many of the shining products that pretend to run on the blockchain are merely PowerPoint presentations. Besides, you’ll get better results with many other technologies. Not to mention that the word blockchain also evokes money laundering, tax fraud, and pornography.

If you want to build your business on the blockchain, be certain that you need it, and that it will be really useful for your use case.


Blockchains are a very smart idea, with huge possible implications. But are the current implementations ready to power the disruptive applications of the next decade?

On the technical side, some elementary features are simply not feasible. Blockchains are not efficient enough, not developer-friendly enough, and they give too much power to a small league of extraordinary developers without enough political and economic background.

On the business side, the blockchain is moving too fast, it’s expensive, and often overkill. Costs may vary tenfold for no reason. Building a business on such an unstable platform is incredibly risky.

My take is that we have to wait. The blockchain isn’t ready yet. It needs more maturity, a killer app other than a speculation engine, a larger developer community, and more ecological and economic responsibility. How long will it take? Maybe a year or two? Nobody can tell.

To be honest, this conclusion surprised me. Most of the publications about the blockchain suggest the opposite. They say “it’s time”, “don’t miss the train”, or “the giant businesses of the next decade are being built on the blockchain right now”. Maybe they are wrong, or maybe we are wrong. We’ve tried to support this analysis with strong evidence. If you have a different opinion, please voice it in a comment below.

We’ll be following the developments in the different blockchain projects closely. Make sure you follow this blog for related news!

The Blockchain Explained to Web Developers, Part 2: In Practice

Published on 20 May 2016 by Gildas Garcia and Kevin Maschtaler

Edmonton's first digital billboard?

Is the blockchain a revolution? The technology that powers Bitcoin sure has the potential to disrupt the entire Internet, as we explained in a previous post. But how can you, a developer, use the blockchain to build applications? Are the tools easy to use, despite the complexity of the underlying concepts? How good is the developer experience?

We wanted to find out, and there is no better tutorial than developing an app from scratch. So we’ve made a simple decentralized ad server called Zero Dollar Homepage, powered by blockchain. This is the story of our experience. Read on to learn how hard the blockchain is for developers today.

Application Concept

The blockchain shines when it replaces intermediaries. We chose to focus on Ad Platforms, which are intermediaries between advertisers (who buy visibility) and content providers (who sell screen real estate). Our project was to build a decentralized ad platform running on the blockchain.

Since the famous Million Dollar Homepage experiment, innovating in the field of paid ads can’t make you rich anymore.

Instead, we chose to build a tool that displays ads for free - a Zero Dollar Homepage. For free, but not for nothing: advertisers exchange ad visibility for open-source contributions. So we’ve built a decentralized app to manage how ads display on a particular page. Advertisers need to take up a coding challenge to be able to put their ads on this page.

User Workflow

In concrete terms, whenever we merge a Pull Request (PR) on one of marmelab’s open-source repositories, a GitHub bot comments on the PR, and invites the PR author to publish their ad on the ad platform admin panel.

Following the link contained in the comment, the PR author is asked to sign in with their GitHub credentials. Then, they can upload an ad - in fact, a simple image. This image is added to the list of images uploaded by other PR authors, in chronological order.

Each day at midnight, an automated script takes the next image in line (using FIFO ordering) and displays it for the next 24 hours.

Note: The entire process requires no intermediary, but in order to avoid the display of adult imagery on our site, we validate the uploaded images through the Google Vision API before putting them online.


Here is how we separated responsibilities in each of the 4 use cases of an ad platform:

  1. Open-source contributor notification: Whenever an open-source PR gets merged on one of our repositories, GitHub notifies the admin app with the PR details. The app publishes a comment on the PR to notify the contributor. The comment contains a link back to the admin app, with the PR details.
  2. Claim and image upload: Following the comment link, the contributor goes to the admin app. They must sign in with their GitHub credentials to be authenticated. The admin app then calls GitHub to grab the PR details, and to check that the contributor is actually the PR author. If it’s OK, the admin app displays an image upload form. When the contributor uploads an image, the admin app pushes the PR id to the blockchain, and uploads the image to a CDN (named after the PR id). The admin app displays the approximate date of publication of the image based on the number of valid PRs with an image still waiting in the blockchain.
  3. Ad placement: Every 24 hours, a cron job asks the blockchain for the next PR not yet displayed. The blockchain marks this PR as displayed and returns its id. The cron job renames the image named after the PR id to “current image”.
  4. Ad display: Each time a visitor loads ZeroDollarHomepage, the page asks the CDN for the current image. It happens to be the latest published ad from the blockchain, which remains displayed at least 1 day (and until another contributor claims a PR).

This might seem surprising, as the blockchain plays a very small part in the process. But we quickly realized that the entire code of the ad platform couldn’t live in the blockchain. In fact, blockchains have very limited capabilities in terms of connectivity to the Internet, and processing power. So we delegated only the crucial ad placement tasks to the blockchain:

  • Register a pull request by an authenticated contributor
  • Get the last non displayed pull request, and mark it as displayed
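These two operations boil down to a claim registry plus a FIFO queue. The following Python class is only an in-memory stand-in used to illustrate that logic. It is not the Solidity contract, and the names `AdQueue`, `register` and `next_to_display` are hypothetical.

```python
class AdQueue:
    """In-memory stand-in for the contract's storage (no blockchain involved)."""

    def __init__(self):
        self._requests = {}  # pull request id -> request details
        self._queue = []     # pull request ids, in claim order
        self._current = 0    # index of the next request to display

    def register(self, pull_request_id, author_name):
        """Register a claimed pull request; each PR can be claimed only once."""
        if pull_request_id <= 0:
            raise ValueError("invalid pull request id")
        if pull_request_id in self._requests:
            raise ValueError("pull request already claimed")
        self._requests[pull_request_id] = {"author": author_name, "displayed": False}
        self._queue.append(pull_request_id)
        return len(self._queue) - self._current  # position in the waiting line

    def next_to_display(self):
        """Return the oldest non-displayed PR id and mark it displayed, or None."""
        if self._current >= len(self._queue):
            return None
        pull_request_id = self._queue[self._current]
        self._requests[pull_request_id]["displayed"] = True
        self._current += 1
        return pull_request_id

ads = AdQueue()
first_position = ads.register(101, "alice")
second_position = ads.register(102, "bob")
shown = [ads.next_to_display(), ads.next_to_display(), ads.next_to_display()]
```

The position returned by `register` is what lets the admin app estimate a publication date: with one ad per day, position 2 means the ad goes live after the one currently waiting.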

Other tasks ended up in the admin app, outside of the blockchain, for various reasons:

  • Register a pull request from a webhook: Registering a pull request before it’s been claimed is useless, since the contributor may never claim it. Besides, storing data in the blockchain isn’t free, so we store only what we have to store. The downside is that any PR on our public repositories, including those created before this experiment, is eligible for the next step.
  • Notify the user by posting a comment to GitHub: A smart contract can’t call an external API, so it’s just not possible. Instead, we delegated this task to the admin app.
  • Verify a claimed PR’s author: Again, a smart contract can’t call the GitHub APIs. Instead, we moved this logic to the admin app, and made it a prerequisite before calling the blockchain.
  • Store the image: In theory, you can store pretty much anything in the blockchain, including images. In practice, images cost a lot to store, and we didn’t manage to store more than one “table” (array of data) in our smart contract.
  • Update the displayed ad to the next in line: A blockchain has no equivalent of the setTimeout function, or cron jobs. You can execute some code every x blocks, but that is not tied to wall-clock time. Instead, we used a cron-like library on our API.
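The "prerequisite before calling the blockchain" pattern can be sketched as follows. This is an illustration in Python, not the actual admin app: `claim_pull_request` is a hypothetical function, and the GitHub lookup and the contract call are injected as plain callables so the sketch runs without any network access.

```python
def claim_pull_request(pull_request_id, logged_in_user, fetch_pr_author, register_on_chain):
    """Off-chain gate: verify authorship first, then call the contract."""
    author = fetch_pr_author(pull_request_id)  # stand-in for a GitHub API lookup
    if author != logged_in_user:
        return False  # the claim never reaches the blockchain
    register_on_chain(pull_request_id, author)
    return True

# Fake collaborators: a dict instead of GitHub, a list instead of the contract.
fake_github = {101: "alice"}
chain_calls = []

def fake_register(pull_request_id, author):
    chain_calls.append((pull_request_id, author))

accepted = claim_pull_request(101, "alice", fake_github.get, fake_register)
rejected = claim_pull_request(101, "mallory", fake_github.get, fake_register)
```

The consequence is that the contract has to trust whoever calls it. All the checks that make the claim legitimate happen in a regular web service, outside of the blockchain.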

Research, documentation and first attempts

As we explained in a previous post, there aren’t many good choices when choosing a blockchain network. So we chose Ethereum.

We quickly hit our first wall. Until a few weeks ago, you couldn’t play with the Ethereum blockchain without buying Ether first, even for simple tests. Also, Ethereum didn’t really allow private blockchains in its former version (named Frontier), which made development very complicated. Anyone accessing the Ethereum network might call your test contracts. More importantly, the documentation is a volunteer initiative, and was not in sync with the development state.

Note: Ethereum bumped their version since we developed the application, switching from Frontier to Homestead. The Ethereum community improved the documentation quality for Homestead.

Despite these shortcomings, we managed to register three nodes on the Ethereum network across Nancy, Paris and Dijon, and to share a ping between those nodes.

In the course of our documentation search, we eventually found the Eris documentation. Eris did an excellent job of explaining blockchains and contracts. Moreover, they built a layer on top of Ethereum, and open-sourced a bunch of tools to ease the process of developing smart contracts.

eris is a command-line tool you can use to initialize any number of local blockchains you need.

Smart Contract Implementation

A smart contract is very similar to an API. It has a few public functions which might be called by anyone registered on the blockchain network. Unlike an API, a smart contract cannot call external web APIs (a blockchain is a closed ecosystem). A smart contract may however call other smart contracts, provided it knows their address.

As with an API, the public functions are only the tip of the iceberg. A contract might be in fact composed of many private functions, variables, etc.

Smart contracts are hosted in the blockchain in an Ethereum-specific binary format, executable by the Ethereum Virtual Machine. Several languages and compilers are available to write contracts:

At marmelab, we code a lot in JavaScript, so we chose to use Solidity. Solidity contracts are stored in .sol files.

The Zero Dollar Homepage Contract

The Zero Dollar Homepage contract stores the claimed pull-requests, and a queue of requests to display. The first version of the Solidity contract looked like this:

// in src/ethereum/ZeroDollarHomePage.sol
contract ZeroDollarHomePage {
    uint constant ShaLength = 40;

    // members reconstructed from the values used below; Ok comes first so that it maps to 0
    enum ResponseCodes {
        Ok,
        InvalidPullRequestId,
        PullRequestAlreadyClaimed,
        InvalidAuthorName,
        InvalidImageUrl,
        EmptyQueue
    }

    struct Request {
        uint id;
        string authorName;
        string imageUrl;
        uint createdAt;
        uint displayedAt;
    }

    // what the contract stores
    mapping (uint => Request) _requests; // key is the pull request id
    uint public numberOfRequests;
    uint[] _queue;
    uint public queueLength;
    uint _current;
    address owner;

    // constructor
    function ZeroDollarHomePage() {
        owner = msg.sender;
        numberOfRequests = 0;
        queueLength = 0;
        _current = 0;
    }

    // a contract must give a way to destroy itself once uploaded to the blockchain
    function remove() {
        if (msg.sender == owner){
            suicide(owner);
        }
    }

    // the following three methods are the contract's public entry points

    function newRequest(uint pullRequestId, string authorName, string imageUrl) returns (uint8 code, uint displayDate) {
        if (pullRequestId <= 0) {
            // Solidity is a strongly typed language. You get compilation errors when types mismatch
            code = uint8(ResponseCodes.InvalidPullRequestId);
            return;
        }

        if (_requests[pullRequestId].id == pullRequestId) {
            code = uint8(ResponseCodes.PullRequestAlreadyClaimed);
            return;
        }

        if (bytes(authorName).length <= 0) {
            code = uint8(ResponseCodes.InvalidAuthorName);
            return;
        }

        if (bytes(imageUrl).length <= 0) {
            code = uint8(ResponseCodes.InvalidImageUrl);
            return;
        }

        // store new pull request details
        numberOfRequests += 1;
        _requests[pullRequestId].id = pullRequestId;
        _requests[pullRequestId].authorName = authorName;
        _requests[pullRequestId].imageUrl = imageUrl;
        _requests[pullRequestId].createdAt = now;

        // add the pull request to the display queue
        _queue.push(pullRequestId);
        queueLength += 1;

        code = uint8(ResponseCodes.Ok);
        displayDate = now + (queueLength * 1 days);
        // no need to explicitly return code and displayDate as they are in the method signature
    }

    function closeRequest() returns (uint8) {
        if (queueLength == 0) {
            return uint8(ResponseCodes.EmptyQueue);
        }

        _requests[_queue[_current]].displayedAt = now;
        delete _queue[0];
        queueLength -= 1;
        _current = _current + 1;
        return uint8(ResponseCodes.Ok);
    }

    function getLastNonPublished() returns (uint8 code, uint id, string authorName, string imageUrl, uint createdAt) {
        if (queueLength == 0) {
            code = uint8(ResponseCodes.EmptyQueue);
            return;
        }

        var request = _requests[_queue[_current]];
        id = request.id;
        authorName = request.authorName;
        imageUrl = request.imageUrl;
        createdAt = request.createdAt;
        code = uint8(ResponseCodes.Ok);
    }
}

For this first attempt, we used the Eris JS libraries to communicate with our blockchain. Instantiating a contract from a Node.js file turned out to be as simple as:

import eris from 'eris';

function getContract(url, account) {
    const address = // Read address file stored on disk by the eris CLI;
    const abi = // Read abi file stored on disk by the eris CLI;
    const manager = eris.newContractManagerDev(url, account);
    return manager.newContractFactory(abi).at(address);
}

And calling it wasn’t difficult either:

function* newRequest(pullrequestId, authorName, imageUrl) {
    const contract = getContract(url, account);
    // First gotcha: when a function returns several named variables, they are returned as an array
    // Second gotcha: numbers are returned as instances of BigNumber, do not forget to convert them when standard numbers are expected
    const [codeAsBigNumber, displayDateAsBigNumber] = yield contract.newRequest(pullrequestId, authorName, imageUrl);
    const code = codeAsBigNumber.toNumber();

    if (code !== 0) {
        throw new Error(getErrorMessageFromCode(code));
    }

    // Return the displayDate for the UI confirmation screen
    return displayDateAsBigNumber.toNumber();
}

For more information about the Eris JS binding libraries, please refer to Eris documentation.

Unit Testing Contracts

We love Test Driven Development, and one of the first questions we had was: how can we test a Solidity smart contract?

The Eris guys made a tool for that, too: sol-unit. It runs a new local blockchain network for each test, in a docker container (which ensures each test runs in a clean environment), and executes the test. Tests are written as a contract, too. Neat!

Well, not so fast. sol-unit is an npm package, and to use the testing functions (assertions, etc.), we had to import the contract supplied by this package in our testing contracts. For that, there is a simple Solidity syntax:

import "../node_modules/sol-unit/.../Asserter.sol";

So far so good… or not. We hit a strange case when compiling our contracts. Apparently, you can’t import contracts with such a path. We ended up adding a command in our test makefile target to copy those sol-unit contracts into the same folder as ours. After that, running sol-unit was simple and we started coding.

copy-sol-unit:
	@cp -f ./node_modules/sol-unit/contracts/src/* ./src/ethereum/

compile-contract:
	solc --bin --abi -o ./src/ethereum ./src/ethereum/ZeroDollarHomePage.sol ./src/ethereum/ZeroDollarHomePageTest.sol

test-ethereum: copy-sol-unit compile-contract
	./node_modules/.bin/solunit --dir ./src/ethereum

Running a Test Blockchain

Running a blockchain and deploying our contract to it was as simple as following the Eris documentation. We managed to resolve the few troubles we met using a bunch of commands that we integrated in our makefile. The whole process of running a new blockchain with our contract looks like this:

  • Reset any running eris docker containers, and remove some temporary files
  • Start the eris key service
  • Generate our account key, and store its address in a convenient file to be loaded later by the JS API
  • Generate the genesis.json, which is the “block 0” of the blockchain
  • Create and start the new blockchain
  • Upload the contract to the blockchain and save its address in order to call it when we need it

After a few days of work, we were able to run the contracts on a local Eris blockchain.

From Eris to Ethereum

At this point, we wanted to try out our contracts on a local Ethereum blockchain.

To communicate with contracts inside the Ethereum blockchain, we had to use the Web3 libraries. We learned a lot while trying to use them. We realized that eris was hiding a lot of the underlying complexity.

First, our initial assumption that a contract is similar to an API was not correct. We had to distinguish functions that were only reading data from the blockchain, and functions that were writing data to it.

The first kind (read-only functions) would return the resulting data asynchronously, just like an API would do. The second kind (write functions) would only return a transaction hash. The expected side effects of a write function (changes inside the blockchain) wouldn’t be effective until the corresponding blocks were mined, which could take some time (from ten seconds to one minute in the worst case). Moreover, we weren’t able to make those write functions return values, so we had to change our Solidity code to call a write function first, then call a read function to get the results.
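To illustrate the write-then-wait flow, here is a minimal sketch of a polling helper in plain JavaScript. It is not the code of the project: waitForReceipt, fetchReceipt and makeStubNode are hypothetical names, and the stub stands in for a real node so that the example runs on its own. With Web3, fetchReceipt would typically wrap a receipt lookup such as web3.eth.getTransactionReceipt.

```javascript
// Poll until a receipt is available for the transaction, i.e. until it has been mined.
// fetchReceipt(txHash) is a hypothetical function returning the receipt (or a promise
// of it), or null while the transaction is still pending.
function waitForReceipt(fetchReceipt, txHash, { intervalMs = 1000, maxAttempts = 60 } = {}) {
    return new Promise((resolve, reject) => {
        let attempts = 0;
        const poll = () => {
            attempts += 1;
            Promise.resolve()
                .then(() => fetchReceipt(txHash))
                .then(receipt => {
                    if (receipt) {
                        resolve(receipt);
                    } else if (attempts >= maxAttempts) {
                        reject(new Error(`Transaction ${txHash} not mined after ${attempts} attempts`));
                    } else {
                        setTimeout(poll, intervalMs);
                    }
                })
                .catch(reject);
        };
        poll();
    });
}

// Stub standing in for a node: the receipt only appears on the n-th lookup.
function makeStubNode(minedAfter) {
    let calls = 0;
    return txHash => {
        calls += 1;
        return Promise.resolve(calls >= minedAfter ? { transactionHash: txHash, logs: [] } : null);
    };
}

waitForReceipt(makeStubNode(3), '0xabc', { intervalMs: 10 })
    .then(receipt => console.log('mined:', receipt.transactionHash));
```

The caller only reads results (for instance through a read function or the receipt logs) once the promise resolves, which mirrors the two-step process described above.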

We also discovered events, which can be used to be notified when something happens in a smart contract. The smart contract is responsible for triggering the events. They look like this in Solidity:

event PullRequestClaimed(uint pullRequestId, uint estimatedDisplayDate);

And they can be triggered from any of the smart contract functions, like this:

PullRequestClaimed(pullRequestId, estimatedDisplayDate);

Those events are stored permanently in the blockchain. That means we could use the blockchain as an event store. It might be the easiest way to determine if a call to a function has been successfully executed: the smart contract may trigger an event at the end of its process with failure reasons, results of computation, etc… It’s worth noting that some integration packages for Meteor are already available.

Eventually, we refactored our smart contracts to be a lot simpler in order to get almost the same features. We had to get rid of the mappings (which we haven’t been able to use - our transactions weren’t mined by the Ethereum network for some reason).

Although the Solidity language may be close to JavaScript, it is still very young and incomplete. Arrays don’t have the functions we’re used to working with in JavaScript (not even indexOf), and strings don’t have any functions. This might be addressed by some community efforts in the near future.

The Ethereum implementation looks like this:

// in src/ethereum/ZeroDollarHomePage.sol
contract ZeroDollarHomePage {
    event InvalidPullRequest(uint indexed pullRequestId);
    event PullRequestAlreadyClaimed(uint indexed pullRequestId, uint timeBeforeDisplay, bool past);
    event PullRequestClaimed(uint indexed pullRequestId, uint timeBeforeDisplay);
    event QueueIsEmpty();

    bool _handledFirst;
    uint[] _queue;
    uint _current;
    address owner;

    function ZeroDollarHomePage() {
        owner = msg.sender;
        _handledFirst = false;
        _current = 0;
    }

    function remove() {
        if (msg.sender == owner){
            suicide(owner);
        }
    }

    function newRequest(uint pullRequestId) {
        if (pullRequestId <= 0) {
            InvalidPullRequest(pullRequestId);
            return;
        }

        // Check that the pr hasn't already been claimed
        bool found = false;
        uint index = 0;

        while (!found && index < _queue.length) {
            if (_queue[index] == pullRequestId) {
                found = true;
            } else {
                index++;
            }
        }

        if (found) {
            PullRequestAlreadyClaimed(pullRequestId, (index - _current) * 1 days, _current > index);
            return;
        }

        // add the pull request to the display queue
        _queue.push(pullRequestId);
        PullRequestClaimed(pullRequestId, (_queue.length - _current) * 1 days);
    }

    function closeRequest() {
        if (_handledFirst && _current < _queue.length - 1) {
            _current += 1;
        }

        _handledFirst = true;
    }

    function getLastNonPublished() constant returns (uint pullRequestId) {
        if (_current >= _queue.length) {
            return 0;
        }

        return _queue[_current];
    }
}

The process for claiming a pull request and returning the estimated display date evolved to become:

// make a transaction call to our smart-contract write function
contract.newRequest.sendTransaction(pullrequestId, {
    to: client.eth.coinbase,
}, (err, tx) => {
    if (err) {
        throw err;
    }

    // wait for it to be mined, using code from @croqaz
    return waitForTransationToBeMined(client, tx)
        .then(txHash => {
            if (!txHash) throw new Error('Transaction failed (no transaction hash)');

            // get its receipt, which might contain information about the events triggered by the contract's code
            // this function might also check whether the transaction was successful by analyzing the receipt for ethereum specific error cases (insufficient funds, etc.)
            return getReceipt(client, txHash);
        })
        .then(receipt => {
            // parse those logs to extract only event data
            return parseReceiptLogs(receipt.logs, contractAbi);
        })
        .then(logs => {
            if (logs.length === 0) {
                throw new Error('Transaction failed (Invalid logs)');
            }

            const log = logs[0];

            if (log.event === 'PullRequestClaimed') {
                // timeBeforeDisplay is a BigNumber instance
                return log.args.timeBeforeDisplay.toNumber();
            }

            if (log.event === 'PullRequestAlreadyClaimed') {
                const number = log.args.timeBeforeDisplay;

                if (log.args.past) {
                    // timeBeforeDisplay is a BigNumber instance
                    return number.negated().toNumber();
                }

                // timeBeforeDisplay is a BigNumber instance
                return number.toNumber();
            }

            if (log.event === 'InvalidPullRequest') {
                throw new Error('Invalid pull request id');
            }
        });
});

And with this code, our decentralized app worked in a local Ethereum network.

Deployment to Production

Running our application in a local environment was a challenge, but deploying it to production, in the real Ethereum network, was a battle.

There are a few gotchas to be aware of. The most important one is that contracts are immutable in code. This means that:

  • A contract that you deploy to the blockchain stays there forever. If you find a bug in your contract, you can’t fix it - you have to deploy a new contract.
  • When you deploy a new version of an existing contract, any data stored in the previous contract isn’t automatically transferred - unless you voluntarily initialize the new contract with the past data. In our case, fixing a bug in the contract actually wipes away recorded PRs (whether already advertised, or waiting for ad display).
  • Every contract version has an id (for instance, the current ZeroDollarHomepage contract is 0xd18e21bb13d154a16793c6f89186a034a8116b74). Since past versions may contain data, keep track of past contract ids if you don’t want to lose the data (this happened to us, too).
  • As you can’t update a contract, you can’t rollback an update either. Make really sure that your contract works before redeploying it.
  • When you deploy a new version of an existing contract, the old (buggy) contract can still be called. Any system outside of the blockchain referencing the contract (like our Node admin app in Zero Dollar Homepage) must be updated to point to the new contract. We forgot to do it a few times, and scratched our head desperately to understand why our new code didn’t run.
  • Contract authors can kill their contract if they include a suicide call in the code. But all the existing transactions of the contract remain in the blockchain - forever. Also, make sure that the kill switch deals with the remaining ether in the contract if you don’t want it to disappear.

Another gotcha is that every contract deployment and write operation in the blockchain costs a variable amount of ether. We managed to get 5 ETH (more about getting ether below), but we had no idea how much we would need to deploy our contract, or to call a transaction. It’s harder to test when each failed test costs money.

For the Node.js part, we decided to run it on an AWS EC2 instance, like most of our projects. To do so, we had to:

  • Run an Ethereum node on the server
  • Download the entire blockchain to this server
  • Unlock an account with some Ether on the node
  • Deploy our application and link it to the node
  • Register our smart contract into the blockchain through the node

Make sure your blockchain node server has plenty of storage. The current size of the blockchain is about 15GB. The default volume size on an EC2 instance is 8GB (sigh). We had a lot of trouble because we hadn’t downloaded the entire chain (but we didn’t realize it immediately). For instance, we had an account with 5 ETH, but for a long time the system responded as if we hadn’t unlocked our account, or as if we had no ether. Downloading the rest of the chain fixed this issue.

Likewise, unlocking our precious account containing 5 ETH was not an easy task. We did not want to hardcode our passphrase in the application, and we wanted to run the node with supervisord to ease the deployment. We finally found a way that allowed us to change the configuration without exposing our passphrase, with the following supervisord configuration:

command=geth --ipcdisable --rpc --fast --unlock 0 --password /path/to/our/password/in/a/file

One final security note: The Remote Procedure Call (RPC) port of the blockchain is 8545. Do not open this port on your EC2 instance! Anyone knowing the instance IP could control your Ethereum node, and steal your ether.

Ether and Gas

As explained in our first post on the blockchain, deploying and calling a contract in the Ethereum blockchain isn’t free. Since a blockchain is expensive to run, any write operation must be paid for. In Ethereum, the price of calling a write contract method depends on the complexity of the method. Ethereum comes with a list of Gas Fees, which tells you how much Ether you should add to a contract call to have it executed.

In practice, that’s a very low amount of Ether, a fraction of a single Ether. The Ethereum blockchain introduced another currency for running contracts: Gas.

1 Gas = 0.00001 ETH
1 ETH = 100,000 Gas

The Gas to Ether conversion rate will vary in the future according to the supply of computing power, and the computation demand.

Charging a fee to process a transaction isn’t compulsory, but recommended. The Ethereum documentation says: “Miners are free to ignore transactions whose gas price is too low”. However, a mined block always gives 5 ETH to the successful miner.

To call our own contracts, the Ethereum blockchain requires between 0.00045 and 0.00098 Ether (the price depends on the gas price and the gas used by the transaction).
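As a rough illustration of how such a price is obtained, the fee of a transaction is the gas it uses multiplied by the gas price. The sketch below computes it in wei (1 ETH = 10^18 wei) using BigInt to avoid rounding errors. The function names and the figures (45,000 gas at 10 gwei) are assumptions chosen for the example, not measurements from our contract.

```javascript
// Fee of a transaction = gas used x gas price, expressed in wei (1 ETH = 10^18 wei).
const WEI_PER_ETHER = 10n ** 18n;

function transactionFeeInWei(gasUsed, gasPriceInWei) {
    return BigInt(gasUsed) * BigInt(gasPriceInWei);
}

// Format a wei amount as a decimal ether string, without floating point rounding.
function weiToEtherString(wei) {
    const whole = wei / WEI_PER_ETHER;
    const fraction = (wei % WEI_PER_ETHER).toString().padStart(18, '0').replace(/0+$/, '');
    return fraction ? `${whole}.${fraction}` : `${whole}`;
}

// Hypothetical figures: 45,000 gas at a gas price of 10 gwei (10 * 10^9 wei).
const fee = transactionFeeInWei(45000, 10n * 10n ** 9n);
console.log(weiToEtherString(fee)); // 0.00045
```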

How do you get Ether and Gas? You can buy Ether (mostly by exchanging Bitcoins), or you can mine it. In France, where we live, buying Bitcoins or Ether requires almost the same procedure as opening a bank account. It’s slow (a few days), painful, and depends on exchange rates fixed by supply and demand.

Mining Ether

So we decided to mine our Ether. That’s a good way to see if mining is profitable on Ethereum or not. We spawned a heavy Amazon EC2 instance, with strong GPU computing power (a g2.2xlarge instance). The price of this instance is $17 per day. We installed Ethminer, and started our node. We quickly had to beef up the instance even more, because of high memory and storage requirements. The first thing a node does when it joins a blockchain is to download the entire history of past transactions. That takes a huge amount of storage: over 14GB for the blockchain’s history, and about 3GB for the Ethash Proof of Work.

Once our Ethereum node started, we had to mine for 3 days to create a valid block:

As a reminder, the Ethereum blockchain mines one block every 10 seconds. Mining a block brings up 5 Ether, which sell for roughly $55. The running cost for our beefy EC2 instance for these 3 days was about $51. All in all, it was cheaper to mine Ether on AWS than to buy it. But we were very lucky: since we mined our block, the network’s difficulty was multiplied by three.
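The comparison between mining and buying boils down to a few multiplications. Here is a small sketch using only the figures quoted above; the variable names are ours.

```javascript
// Figures quoted in the text.
const instanceCostPerDay = 17; // USD, daily price of the EC2 instance
const miningDays = 3;          // days needed to mine one block
const blockReward = 5;         // ETH earned for the mined block
const rewardValue = 55;        // USD, approximate market value of 5 ETH

const miningCost = instanceCostPerDay * miningDays;
const costPerEther = miningCost / blockReward;
const marketPricePerEther = rewardValue / blockReward;

console.log(miningCost);          // 51
console.log(costPerEther);        // 10.2
console.log(marketPricePerEther); // 11
```

With these numbers, each mined Ether cost about $10.2, against a market price of about $11. The margin is thin, and it disappears as soon as mining takes longer than three days.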

How long can we run the ZeroDollarHomePage with 5 Ether? Let’s make the computation.

The Zero Dollar Homepage workflow implies one transaction per day, plus one transaction per claimed PR. Supposing contributors claim one PR per day, the yearly price in ether for running the platform would be at most 365 * 2 * 0.00098 = 0.72 ETH. With 5 ETH, we should normally be able to run the platform for almost seven years.
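The same computation, written as a short script so that the assumptions are easy to change. The figures come from the text; the variable names are ours.

```javascript
// Figures taken from the text: at most 0.00098 ETH per transaction,
// one daily transaction plus one claimed PR per day, and a budget of 5 ETH.
const maxFeePerTransaction = 0.00098;
const transactionsPerDay = 2;
const budget = 5;

const yearlyCost = 365 * transactionsPerDay * maxFeePerTransaction;
const yearsOfOperation = budget / yearlyCost;

console.log(yearlyCost.toFixed(2));       // 0.72
console.log(yearsOfOperation.toFixed(1)); // 7.0
```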

As you see, running a contract in Ethereum isn’t free, but at the current price of Ether, it’s still cheap. However, the Ether value varies a great deal. Since mining Bitcoin is becoming less and less profitable, some large Bitcoin mining farms switch to Ethereum. This makes mining harder, and makes Ether more expensive every day.

Final Surprise

Finally, our smart contract ended up working fine in our real world Ethereum node hosted on EC2.

But by the time we got there, Ethereum released their Homestead version, which brought a lot of new things and broke our code entirely. It took us about a week to understand why, through trial and error, and fix the code that wasn’t compatible anymore for obscure reasons.

Tip: The Homestead release documents a hidden Ethereum feature, private networks, to ease development. The lack of private networks was one of our reasons to use Eris in the first place.

The ZeroDollarHomePage platform is now up and running again. You can use it by opening a pull request on one of marmelab’s open-source repositories on GitHub, see the ads currently displayed on, or browse the code of the application on marmelab/ZeroDollarHomePage. Yes, we’re open-sourcing the entire ad platform, so you can see in detail how it works, and reproduce it locally.


The Ethereum developer experience is very bad. Imagine that you have no logs and no debug tools. Imagine that the only way to discover why a program fails is to echo “I’m here” strings every line to locate the problem. Imagine that sometimes (e.g. in Solidity contracts), you can’t even do that. Imagine that a program that works perfectly in the development environment (where you can add debug statements) fails silently in the production environment (where you can’t). That’s the developer experience in Ethereum.

If you store data in your smart contract, there is no built-in way to visualize the current state of this data after a transaction. That means you need to build your own visualisation tool to be able to troubleshoot errors.

The tools available to track Ethereum contracts and transactions are:

For instance, here is how our contract looks in etherscan:

Each transaction (call to a contract method) is logged there, together with a trace of the contract execution… in machine language. Apart from making sure your call actually gets to the contract, you can’t use it for debugging.

Also, these tools can only monitor the public Ethereum network. Unfortunately, you can’t use them to debug a local blockchain.

If you have ever seen Bitcoin transaction auditing sites, don’t expect the same level of sophistication for Ethereum. Besides, the bitcoin network only has one kind of transaction, so it’s easier to monitor than a network designed to run smart contracts.


And that’s not all: the Ethereum documentation is not in sync with the code (at least in the Frontier version), so most of the time we had to look at the libraries to try to understand how we’re expected to code. Since the libraries in question use a language that no one uses (Solidity), good luck figuring out how they work. Oh, and don’t expect help from Stack Overflow, either. There are too few people like us who dared to implement something serious to have a good community support.

Let’s be clear: we are not criticizing the Ethereum community for their lack of efforts. Once again, there is a tremendous momentum behind Ethereum, and things improve at a rapid pace. Kudos to all the documentation contributors for their work. But by the time we developed our application, the documentation state was clearly not good enough for a new Ethereum developer to start a project.

You can find a few tutorials here and there, but most of the time, copy-pasted code from these tutorials simply doesn’t work.

Here are a few resources worth reading if you want to start developing smart contracts yourself:


After 4 weeks of work by 2 experienced developers, we managed to make our code work in the public Ethereum network with lots of effort. Regressions and compatibility breaks in the Ethereum libraries between Frontier and Homestead versions didn’t help. Check the project source code at marmelab/ZeroDollarHomePage for a detailed understanding of the inner workings. Please forgive the potential bugs in the code, or the inaccuracies in this post - we have limited experience in the matter. Feel free to send us your corrections in GitHub, or in the comments.

We didn’t enjoy the party. Finding our way across bad documentation and young libraries isn’t exactly our cup of tea. Fighting to implement simple features (like string manipulation) with a half-baked language isn’t fun either. Realizing that, despite years of programming experience in many scripting languages, we are not able to write a simple Solidity contract is frustrating. Most importantly, the youth of the Ethereum ecosystem makes it completely impossible to forecast the time to implement a simple feature. Since time is money, it’s currently impossible to determine how much it will cost to develop a Decentralized App.

In time and resources, ZeroDollarHomepage represents a development cost of more than €20,000 - even if it’s a very simple system. As compared to the tools we use in other projects (Node.js, Koa, React.js, PostgreSQL, etc.), developing on the blockchain is very expensive. It’s a great disappointment for the dev team, and a strong signal that the ecosystem isn’t ready yet.

Is this bad experience sufficient to make up our mind about the blockchain? How come many startups showcase their blockchain services as successful innovations? What’s the real cost of building a DApp? Read the last post in this series to see what we really think about the blockchain phenomenon.

The Blockchain Explained to Web Developers, Part 1: The Theory

Published on 28 April 2016 by Francois Zaninotto

The blockchain is the new hot technology. If you haven’t heard about it, you probably know Bitcoin. Well, the blockchain is the underlying technology that powers Bitcoin. Experts say the blockchain will cause a revolution similar to what the Internet provoked. But what is it really, and how can it be used to build apps today? This post is the first in a series of three, explaining the blockchain phenomenon to web developers. We’ll discuss the theory, show actual code, and share our learnings, based on a real world project.

To begin, let’s try to understand what blockchains really are.

What Is A Blockchain, Take One

Although the blockchain was created to support Bitcoin, the blockchain concept can be defined regardless of the Bitcoin ecosystem. The literature usually defines a blockchain as follows:

A blockchain is a ledger of facts, replicated across several computers assembled in a peer-to-peer network. Facts can be anything from monetary transactions to content signature. Members of the network are anonymous individuals called nodes. All communication inside the network takes advantage of cryptography to securely identify the sender and the receiver. When a node wants to add a fact to the ledger, a consensus forms in the network to determine where this fact should appear in the ledger; this consensus is called a block.

The Thinker, by Rodin

I don’t know about you, but after reading these definitions, I still had trouble figuring out what this is all about. Let’s dig a bit deeper.

Ordering Facts

Decentralized peer-to-peer networks aren’t new. Napster and BitTorrent are P2P networks. Instead of exchanging movies, members of the blockchain network exchange facts. Then what’s the real deal about blockchains?

P2P networks, like other distributed systems, have to solve a very difficult computer science problem: the resolution of conflicts, or reconciliation. Relational databases offer referential integrity, but there is no such thing in distributed systems. If two incompatible facts arrive at the same time, the system must have rules to determine which fact is considered valid.

Take for instance the double spend problem: Alice has $10, and she sends $10 to both Bob and Charlie. Who will have the $10 eventually? To answer this question, the best way is to order the facts. If two incompatible facts arrive in the network, the first one to be recorded wins.

double spend
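To make the "first recorded wins" rule concrete, here is a minimal sketch of a ledger that applies facts in order and rejects any fact that would overdraw the sender. The applyFacts function and the shape of a fact are assumptions made for this example.

```javascript
// Apply transfer facts in the order they were recorded; a fact that would
// overdraw the sender's balance is rejected.
function applyFacts(initialBalances, facts) {
    const balances = { ...initialBalances };
    const rejected = [];
    for (const fact of facts) {
        const available = balances[fact.from] || 0;
        if (fact.amount > available) {
            rejected.push(fact);
            continue;
        }
        balances[fact.from] = available - fact.amount;
        balances[fact.to] = (balances[fact.to] || 0) + fact.amount;
    }
    return { balances, rejected };
}

const toBob = { from: 'Alice', to: 'Bob', amount: 10 };
const toCharlie = { from: 'Alice', to: 'Charlie', amount: 10 };

// The outcome depends only on which fact is ordered first.
const result = applyFacts({ Alice: 10 }, [toBob, toCharlie]);
console.log(result.balances); // { Alice: 0, Bob: 10 }
console.log(result.rejected); // [ { from: 'Alice', to: 'Charlie', amount: 10 } ]
```

Swapping the two facts gives the money to Charlie instead, which is why every node must agree on the order.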

In a P2P network, two facts sent roughly at the same time may arrive in different orders in distant nodes. Then how can the entire network agree on the first fact? To guarantee integrity over a P2P network, you need a way to make everyone agree on the ordering of facts. You need a consensus system.

Consensus algorithms for distributed systems are a very active research field. You may have heard of Paxos or Raft algorithms. The blockchain implements another algorithm, the proof-of-work consensus, using blocks.


Blocks are a smart trick to order facts in a network of non-trusted peers. The idea is simple: facts are grouped in blocks, and there is only a single chain of blocks, replicated in the entire network. Each block references the previous one. So if fact F is in block 21, and fact E is in block 22, then fact E is considered by the entire network to be posterior to fact F. Before being added to a block, facts are pending, i.e. unconfirmed.

How blocks group facts
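The "each block references the previous one" idea can be sketched in a few lines of Node.js. This is a simplified model, not the format of any real blockchain: createBlock and isValidChain are hypothetical helpers, and a plain SHA-256 hash of the block content serves as the block identifier.

```javascript
const crypto = require('crypto');

function sha256(text) {
    return crypto.createHash('sha256').update(text).digest('hex');
}

// Build a new block that references the previous one by its hash.
function createBlock(previousBlock, facts) {
    const index = previousBlock ? previousBlock.index + 1 : 0;
    const previousHash = previousBlock ? previousBlock.hash : null;
    const hash = sha256(JSON.stringify({ index, previousHash, facts }));
    return { index, previousHash, facts, hash };
}

// A chain is valid if every block points to the hash of the one before it
// and its own hash still matches its content.
function isValidChain(chain) {
    return chain.every((block, i) => {
        const expectedPrevious = i === 0 ? null : chain[i - 1].hash;
        const expectedHash = sha256(JSON.stringify({
            index: block.index, previousHash: block.previousHash, facts: block.facts,
        }));
        return block.previousHash === expectedPrevious && block.hash === expectedHash;
    });
}

const genesis = createBlock(null, ['genesis']);
const block1 = createBlock(genesis, ['fact F']);
const block2 = createBlock(block1, ['fact E']);
console.log(isValidChain([genesis, block1, block2])); // true
```

Because fact E sits in a block that comes after the block holding fact F, every node that holds this chain sees E as posterior to F. Changing a fact in an earlier block changes its hash and breaks the links that follow.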


Some nodes in the chain create a new local block with pending facts. They compete to see if their local block is going to become the next block in the chain for the entire network, by rolling dice. If a node makes a double six, then it earns the ability to publish their local block, and all facts in this block become confirmed. This block is sent to all other nodes in the network. All nodes check that the block is correct, add it to their copy of the chain, and try to build a new block with new pending facts.

Rolling dice

But nodes don’t just roll a couple of dice. Blockchain challenges imply rolling a huge number of dice. Finding the random key to validate a block is very unlikely, by design. This prevents fraud, and makes the network safe (unless a malicious user owns more than half of the nodes in the network). As a consequence, new blocks get published to the chain at a roughly fixed time interval. In Bitcoin, blocks are published every 10 minutes on average.

In Bitcoin, the challenge involves a double SHA-256 hash of a string made of the pending facts, the identifier of the previous block, and a random string. A node wins if their hash contains at least n leading zeroes.

// a losing hash for Bitcoin
// a winning hash for Bitcoin if n=10

Number n is adjusted every once in a while to keep block duration fixed despite variations in the number of nodes. This number is called the difficulty. Other blockchain implementations use special hashing techniques that discourage the usage of GPUs (e.g. by requiring large memory transfers).
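Here is a toy version of this challenge in Node.js, to give a feel for the "rolling dice" loop. It is a sketch under simplifying assumptions, not Bitcoin's actual block format: the mine function is hypothetical, an incrementing counter replaces the random string, and leading zeroes are counted in hexadecimal digits.

```javascript
const crypto = require('crypto');

// Double SHA-256, returned as a hexadecimal string.
function doubleSha256(text) {
    const first = crypto.createHash('sha256').update(text).digest();
    return crypto.createHash('sha256').update(first).digest('hex');
}

// Try successive nonces until the hash starts with `difficulty` zeroes.
// Simplification: zeroes are counted in hexadecimal digits here.
function mine(pendingFacts, previousBlockId, difficulty) {
    const prefix = '0'.repeat(difficulty);
    for (let nonce = 0; ; nonce += 1) {
        const hash = doubleSha256(JSON.stringify(pendingFacts) + previousBlockId + nonce);
        if (hash.startsWith(prefix)) {
            return { nonce, hash };
        }
    }
}

const result = mine(['Alice pays Bob 10'], 'previous-block-id', 3);
console.log(result.hash.startsWith('000')); // true
```

Each extra hexadecimal zero multiplies the average number of attempts by 16, which is how the difficulty controls the time between blocks. Checking a result takes a single hash computation, so other nodes can verify a block cheaply.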

The process of looking for blocks is called mining. This is because, just like gold mining, block mining brings an economical reward - some form of money. That’s the reason why people who run nodes in a blockchain are also called miners.

Note: By default, a node doesn’t mine - it just receives blocks mined by other nodes. It’s a voluntary process to turn a node into a miner node.

Money and Cryptocurrencies

Every second, each miner node in a blockchain tests thousands of random strings to try and form a new block. So running a miner in the blockchain consumes a huge amount of computer resources (storage and CPU). That’s why you must pay to store facts in a blockchain. Reading facts, on the other hand, is free: you just need to run your own node, and you’ll retrieve the entire history of facts issued by all the other nodes. So to summarize:

  • Reading data is free
  • Adding facts costs a small fee
  • Mining a block brings in the money of all the fees of the facts included in the block

We’re not talking about real money here. In fact, each blockchain has its own (crypto-)currency. It’s called Bitcoin (BTC) in the Bitcoin network, Ether (ETH) on the Ethereum network, etc. To make a payment in the Bitcoin network, you must pay a small fee in Bitcoins - just like you would pay a fee to a bank. But then, where do the first coins come from?

A pile of Bitcoins

Miners receive a gratification for keeping the network working and safe. Each time they successfully mine a block, they receive a fixed amount of cryptocurrency. In Bitcoin this gratification is 25 BTC per block, in Ethereum it’s 5 ETH per block. That way, the blockchain generates its own money.

Lastly, cryptocurrencies rapidly became convertible to real money. Their face value is only determined by supply and demand, so it’s subject to speculation. At the time of writing, mining Bitcoins still costs slightly less in energy and hardware than you can earn by selling the coins you discovered in the process. That’s why people add new miners every day, hoping to turn electricity into money. But fluctuations in the BTC value make it less and less profitable.



So far we’ve mostly mentioned fact storage, but a blockchain can also execute programs. Some blockchains allow each fact to contain a mini program. Such programs are replicated together with the facts, and every node executes them when receiving the facts. In Bitcoin, this can be used to make a transaction conditional: Bob will receive 100 BTC from Alice if and only if today is February 29th.
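
Here is a rough sketch of that idea in plain Python. It is not Bitcoin's actual scripting language, and the names `ConditionalTransfer` and `apply_transfer` are invented for the illustration. It only shows the principle: the fact carries a small program, and every node runs it before moving any money.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class ConditionalTransfer:
    sender: str
    recipient: str
    amount: int
    condition: Callable[[date], bool]  # the "mini program" attached to the fact


def apply_transfer(transfer, balances, today):
    """Run the attached program; move the money only if it returns True."""
    if not transfer.condition(today):
        return False
    if balances.get(transfer.sender, 0) < transfer.amount:
        return False
    balances[transfer.sender] -= transfer.amount
    balances[transfer.recipient] = balances.get(transfer.recipient, 0) + transfer.amount
    return True


def is_february_29th(today):
    return (today.month, today.day) == (2, 29)


transfer = ConditionalTransfer("Alice", "Bob", 100, is_february_29th)
balances = {"Alice": 150, "Bob": 0}
print(apply_transfer(transfer, balances, date(2016, 2, 28)), balances)
print(apply_transfer(transfer, balances, date(2016, 2, 29)), balances)
```

Because every node evaluates the same condition on the same fact, they all agree on whether Bob got paid.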

Other blockchains allow for more sophisticated contracts. In Ethereum for instance, each contract carries a mini-database, and exposes methods to modify the data. As contracts are replicated across all nodes, so are their databases. Each time a user calls a method on the contract and therefore updates the underlying data, this command is replicated and replayed by the entire network. This allows for a distributed consensus on the execution of a promise.
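
The toy model below tries to capture this "replicate the command, replay it everywhere" idea. It is plain Python with invented names (`VoteContract`, `Node`, `broadcast`), not Ethereum code, and it ignores mining, fees, and ordering conflicts entirely.

```python
class VoteContract:
    """A toy contract: a mini-database plus a method that modifies it."""

    def __init__(self):
        self.storage = {"votes": {}}

    def vote(self, caller, choice):
        self.storage["votes"][caller] = choice


class Node:
    """Each node holds its own copy of the contract and replays every command."""

    def __init__(self):
        self.contract = VoteContract()
        self.log = []

    def receive(self, command):
        method, caller, args = command
        self.log.append(command)
        getattr(self.contract, method)(caller, *args)


network = [Node() for _ in range(3)]


def broadcast(command):
    # In a real network the command travels inside a block; here we just loop.
    for node in network:
        node.receive(command)


broadcast(("vote", "alice", ("yes",)))
broadcast(("vote", "bob", ("no",)))
broadcast(("vote", "alice", ("no",)))

states = [node.contract.storage for node in network]
print(states[0])
print(all(state == states[0] for state in states))
```

Since every node replays the same commands in the same order, all the copies of the mini-database end up identical without any node having to trust another.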

This idea of pre-programmed conditions, interfaced with the real world, and broadcast to everyone, is called a smart contract. A contract is a promise that signing parties agree to make legally-enforceable. A smart contract is the same, except with the word “technically-” instead of “legally-”. This removes the need for a judge, or any authority acknowledged by both parties.

Public hearings of the Court presided over by H.E. Judge Rosalyn Higgins (February/March 2006)

Imagine that you want to rent out your house for a week for $1,000, with a 50% upfront payment. You and the tenant sign a contract, probably written by a lawyer. You also need a bank to receive the payment. At the beginning of the week, you ask for a $5,000 deposit; the tenant writes a check for it. At the end of the week, the tenant refuses to pay the remaining 50%. You also realize that they broke a window, and that the deposit check draws on an empty account. You’ll need a lawyer to help you enforce the rental contract in court.

Smart contracts in a blockchain allow you to get rid of the bank, the lawyer, and the court. Just write a program that defines how much money should be transferred in response to certain conditions:

  • two weeks before the beginning of the rental: transfer $500 from tenant to owner
  • cancellation by the owner: transfer $500 from owner to tenant
  • end of the rental period: transfer $500 from tenant to owner
  • proof of physical degradation after the rental period: transfer $5,000 from tenant to owner

Upload this smart contract to the blockchain, and you’re all set. At the times defined in the contract, the money transfers will occur. And if the owner can provide a predefined proof of physical degradation, they get the $5,000 automatically (without any need for a deposit).
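
The rental rules listed above could be sketched like this. It is only an illustration in plain Python, not code for any real blockchain: the event names, the `RentalContract` class, and the two parties "Olivia" and "Tom" are made up, `tenant` stands for the party renting the house, and the sketch never checks that the payer actually has the funds.

```python
# event -> (payer role, payee role, amount in dollars), taken from the rental rules
RULES = {
    "two_weeks_before_start": ("tenant", "owner", 500),
    "cancelled_by_owner": ("owner", "tenant", 500),
    "end_of_rental": ("tenant", "owner", 500),
    "proof_of_degradation": ("tenant", "owner", 5000),
}


class RentalContract:
    def __init__(self, owner, tenant):
        self.parties = {"owner": owner, "tenant": tenant}
        self.handled = set()

    def on_event(self, event, balances):
        """Apply the clause matching `event`; each clause fires at most once."""
        if event in self.handled:
            return None
        payer_role, payee_role, amount = RULES[event]
        payer = self.parties[payer_role]
        payee = self.parties[payee_role]
        balances[payer] -= amount
        balances[payee] += amount
        self.handled.add(event)
        return (payer, payee, amount)


balances = {"Olivia": 0, "Tom": 10000}
contract = RentalContract(owner="Olivia", tenant="Tom")
for event in ["two_weeks_before_start", "end_of_rental", "proof_of_degradation"]:
    print(event, contract.on_event(event, balances))
print(balances)
```

The hard part is not the program itself but feeding it trustworthy events, especially the last one.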

You might wonder how to build a proof of physical degradation. That’s where the Internet of Things (IoT) comes in. In order to interact with the real world, blockchains need sensors and actuators. The blockchain revolution won’t happen unless the IoT revolution comes first.

Such applications relying on smart contracts are called Decentralized Apps, or DApps.

Smart contracts naturally extend to smart property, and a lot more smart things. The thing to remember is that “smart” means “no intermediaries”, or “technically-enforced”. Blockchains are a new way to disintermediate businesses - just like the Internet disintermediated music distribution.


What Is A Blockchain, Take Two

In my opinion, the best way to understand the blockchain is to look at it from various angles.

What it does: A blockchain lets you securely share and/or process data between multiple parties over a network of non-trusted peers. Data can be anything, but the most interesting uses concern information that currently requires a trusted third party to exchange. Examples include money (requires a bank), a proof of property (requires a lawyer), a loan certificate, etc. In essence, the blockchain removes the need for a trusted third party.

How it works: From a technical point of view, the blockchain is an innovation relying on three concepts: peer-to-peer networks, public-key cryptography, and distributed consensus based on the resolution of a random mathematical challenge. None of these concepts is new. It’s their combination that allows a breakthrough in computing. If you don’t understand it all, don’t worry: very few people know enough to be able to develop a blockchain on their own (which is a problem). But not understanding the blockchain doesn’t prevent you from using it, just like you can build web apps without knowing about TCP slow start and Certificate Authorities.

What it compares to: See the blockchain as a database replicated as many times as there are nodes and (loosely) synchronized, or as a supercomputer formed by the combination of the CPUs/GPUs of all its nodes. You can use this supercomputer to store and process data, just like you would with a remote API. Except you don’t need to own the backend, and you can be sure the data is safe and processed properly by the network.

It's all a matter of perspective

Practical Implications

Facts stored in the blockchain can’t be lost. They are there forever, replicated as many times as there are nodes. Even more, the blockchain doesn’t simply store a final state, it stores the history of all past states, so that everyone can check the correctness of the final state by replaying the facts from the beginning.
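
Checking a final state by replaying the history can be pictured with a few lines of Python. This is a simplified sketch under assumed conventions: facts are `(sender, recipient, amount)` tuples, and a sender of `None` marks newly created coins (a mining reward, for instance).

```python
def replay(facts):
    """Rebuild the balances from scratch, rejecting any fact that overspends."""
    balances = {}
    for fact in facts:
        sender, recipient, amount = fact
        if sender is not None:  # None marks newly created coins
            if balances.get(sender, 0) < amount:
                raise ValueError(f"invalid fact: {fact}")
            balances[sender] -= amount
        balances[recipient] = balances.get(recipient, 0) + amount
    return balances


history = [
    (None, "alice", 25),
    ("alice", "bob", 10),
    ("bob", "carol", 4),
]
claimed_state = {"alice": 15, "bob": 6, "carol": 4}
print(replay(history) == claimed_state)
```

Anyone holding the full history can run this check on their own, without asking another node what the balances are.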

Facts in the blockchain can be trusted, as they are verified by a technically enforceable consensus. Even if the network contains black sheep, you can trust its judgement as a whole.

Storing data in the blockchain isn’t fast, as it requires a distributed consensus.


Why It’s a Big Deal

«The Blockchain is the most disruptive technology I have ever seen.» Salim Ismail

«The most interesting intellectual development on the Internet in the last five years.» Julian Assange

«I think the fact that within the Bitcoin universe an algorithm replaces the functions of [the government] … is actually pretty cool.» Al Gore

These smart people have seen a huge potential in the blockchain. It concerns disintermediation. The blockchain can potentially replace all the intermediaries required to build trust. Let’s see a few example applications, most of which are just proofs of concept for now:

  • Monegraph lets authors claim their work, and set their rules (and fees) for use
  • La Zooz is a decentralized Uber. Share your car, find a seat, without Uber taking a fee.
  • Augur is an online bookmaker. Bet on outcomes, and get paid.
  • is a peer-to-peer storage system. Rent your unused disk space, or find ultra cheap online storage.
  • Muse is a distributed, open, and transparent database tailored for the music industry
  • Ripple enables low cost cross-border payments for banks

Blockchain use cases

Many successful businesses on the Internet today are intermediaries. Think about Google for a minute: Google managed to become the intermediary between you and the entire Internet. Think about Amazon: they became the intermediary between sellers and buyers for any type of good. That’s why a technology that removes intermediaries can potentially disrupt the entire Internet.

Will it benefit end users, who won’t need third parties to exchange goods and services anymore? It’s far from certain. The Internet held the same promise of heavy disintermediation. Yet Google built the largest market capitalization worldwide as an intermediary. That’s why it’s crucial to invest in the blockchain quickly, because the winners and losers of the next decade are being born right now.

You Won’t Build Your Own Blockchain

The technology behind the blockchain uses advanced cryptography, custom network protocols, and performance optimizations. This is all too sophisticated to be redeveloped each time a project needs a blockchain. Fortunately, aside from Bitcoin, there are several open-source blockchain implementations. Here are the most advanced:

  • Ethereum: an open-source blockchain platform by the Ethereum Foundation
  • Hyperledger: another open-source implementation, this time by the Linux Foundation. The first proposal was published very recently.
  • Eris Industries: Tools helping to manipulate Ethereum, Bitcoin or totally independent blockchains, mostly to build private networks. Their tutorials and explainers are a great starting point for an overview of the blockchain technology.


The maturity of these implementations varies a lot. If you have to build an application now, we’d advise:

  • Eris for a closed Blockchain, or to discover and play with the technology
  • Ethereum for a shared Blockchain

Also, Bitcoin isn’t a good choice to build an application upon. It was designed for money transactions and nothing else, although you can program pseudo-smart contracts (but you have to love assembly). The network currently suffers a serious growth crisis: transactions wait in line for up to one hour to get inserted in a block. Miners often select transactions with the highest fees, so money transfers in Bitcoin become more expensive than they are in a bank. The developer community is at war, and the speculation on the cryptocurrency makes the face value move too much.


How big are blockchains today? Let’s see some numbers.



Ethereum stats


Blockchain technology is both intriguing and exciting. Could it be the revolution that gurus predict? Or is it just a speculative bubble based on an impractical idea? After reading a lot on the matter, we couldn’t form a definitive opinion.

When we face uncertainty, we know a great way to lift it: trying. That’s what we decided to do. Read the next post in this series to see what we’ve learned by building a real-world app running on the blockchain.