Scientist in the making (aka using Scientist.NET)

When we’re refactoring legacy code, we’ll often try to ensure the existing unit tests (if they exist) or new ones cover as much of the code as possible before we start. But there’s always a concern about turning off the old code completely until we have high confidence in the new code. Obviously the test coverage figures and the unit tests themselves should give us that confidence, but wouldn’t it be nice if we instead ran the old and new code in parallel and compared the behaviour, or at least the results, of the code? This is where the Scientist library comes in.

Note: This is very much (from my understanding) in an alpha/pre-release stage of development, so any code written here may differ from the way the library ends up working. So basically what I’m saying is this code works at the time of writing.

Getting started

So the elevator pitch for Scientist.NET is that it “allows us to run two different implementations of code side by side and compare the results”. Let’s expand on that with an example.

First off, we’ll set up our Visual Studio project.

Create a new console application (just because it’s simple to get started with)
From the Package Manager Console, execute Install-Package Scientist -Pre

Let’s start with a very simple example. Assume we have a method which returns a numeric value. We don’t really need to worry much about what this value means, but if you like a back story, let’s say we import data into an application and the method calculates the confidence that the data matches a known import pattern.

So the legacy code, or the code we wish to verify/test against looks like this

public class Import
{
   public float CalculateConfidenceLevel()
   {
       // do something clever and return a value
       return 0.9f;
   }
}

Now our new Import class looks like this

public class NewImport
{
   public float CalculateConfidenceLevel()
   {
      // do something clever and return a value
      return 0.4f;
   }
}

Okay, okay, I know the result is wrong, but this is meant to demonstrate the Scientist.NET library, not my Import code.

Right, so what we want to do is run the two versions of the code side-by-side and see whether they always give the same result. We’re going to simply run these in our console’s Main method for now, but of course the idea is that this code would be run from wherever you currently run the Import code. For now, just add the following to Main (we’ll discuss strategies for running the code briefly after this)

var import = new Import();
var newImport = new NewImport();

float confidence = 
   Scientist.Science<float>(
      "Confidence Experiment", experiment =>
   {
      experiment.Use(() => import.CalculateConfidenceLevel());
      experiment.Try(() => newImport.CalculateConfidenceLevel());
   });

Now, if you run this console application you’ll see the confidence variable will have the value 0.9 in it as it’s used the .Use code as the result, but the Science method (surely this should be named the Experiment method :)) will actually run both of our methods and compare the results.

Obviously as both the existing and new implementations are run side-by-side, performance might be a concern for complex methods, especially if running like this in production. See the RunIf method for turning on/off individual experiments if this is a concern.
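To make that behaviour concrete, here’s a minimal sketch of the idea: run the control and the candidate, compare the two values, report the outcome and always hand back the control’s value. This is not how Scientist.NET is implemented. MiniExperiment, Run and the report callback are names I’ve made up for illustration, and the sketch ignores timing, exceptions and anything else the real library does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only: NOT Scientist.NET's implementation.
public static class MiniExperiment
{
    // Runs both behaviours, reports whether they agreed,
    // and always returns the control's value to the caller.
    public static T Run<T>(
        Func<T> control,
        Func<T> candidate,
        Action<bool, T, T> report)
    {
        T controlValue = control();
        T candidateValue = candidate();

        // Default equality; a real experiment may need a custom comparison.
        bool matched =
            EqualityComparer<T>.Default.Equals(controlValue, candidateValue);

        // Arguments: matched, control value, candidate value.
        report(matched, controlValue, candidateValue);

        return controlValue;
    }
}
```

Calling MiniExperiment.Run<float>(() => 0.9f, () => 0.4f, ...) reports a mismatch but still returns 0.9f, which mirrors what we saw with .Use and .Try above.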

The “Confidence Experiment” string denotes the name of the comparison test and can be useful in reports, but if you ran this code you’ll have noticed everything just worked, i.e. no errors, no reports, nothing. That’s because at this point the default result publisher (which can be accessed via Scientist.ResultPublisher) is an InMemoryResultPublisher. We need to implement a publisher to output to the console (or maybe to a logger or some other mechanism).

So let’s pretty much take the MyResultPublisher from Scientist.net but output to console, so we have

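using System;
using System.Threading.Tasks;
using GitHub; // namespace of the Scientist types, as far as I can tell from the source

public class ConsoleResultPublisher : IResultPublisher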
{
   public Task Publish<T>(Result<T> result)
   {
      Console.WriteLine(
          $"Publishing results for experiment '{result.ExperimentName}'");
      Console.WriteLine($"Result: {(result.Matched ? "MATCH" : "MISMATCH")}");
      Console.WriteLine($"Control value: {result.Control.Value}");
      Console.WriteLine($"Control duration: {result.Control.Duration}");
      foreach (var observation in result.Candidates)
      {
         Console.WriteLine($"Candidate name: {observation.Name}");
         Console.WriteLine($"Candidate value: {observation.Value}");
         Console.WriteLine($"Candidate duration: {observation.Duration}");
      }

      if (result.Mismatched)
      {
         // save mismatched experiments to a DB
      }

      return Task.FromResult(0);
   }
}

Now insert the following before the float confidence = line in our Main method

Scientist.ResultPublisher = new ConsoleResultPublisher();

Now when you run the code you’ll get the following output in the console window

Publishing results for experiment 'Confidence Experiment'
Result: MISMATCH
Control value: 0.9
Control duration: 00:00:00.0005241
Candidate name: candidate
Candidate value: 0.4
Candidate duration: 00:00:03.9699432

So now you’ll see where the string in the Science method can be used.

More…

Check out the documentation on Scientist.net or the source itself for more information.

Real world usage?

First off, let’s revisit how we might actually design our code to use such a library. The example was created from scratch to demonstrate basic use of the library, but it’s more likely that we’d either create an abstraction layer which instantiates and executes the legacy and new code or, if possible, add the new method to the legacy implementation code. So in an ideal world our Import and NewImport classes might implement an IImport interface. It would then be best to implement a new version of this interface and call the Science code within its methods, for example

public interface IImport
{
   float CalculateConfidenceLevel();
}

public class ImportExperiment : IImport
{
   private readonly IImport import = new Import();
   private readonly IImport newImport = new NewImport();

   public float CalculateConfidenceLevel()
   {
      return Scientist.Science<float>(
         "Confidence Experiment", experiment =>
         {
            experiment.Use(() => import.CalculateConfidenceLevel());
            experiment.Try(() => newImport.CalculateConfidenceLevel());
         });
   }
}

I’ll leave the reader to put the : IImport after the Import and NewImport classes.

So now our Main method would have the following

Scientist.ResultPublisher = new ConsoleResultPublisher();

var import = new ImportExperiment();
var result = import.CalculateConfidenceLevel();

Using an interface like this now means it’s both easy to switch from the old Import to the experiment implementation and eventually to the new implementation, but then hopefully this is how we always code. I know those years of COM development make interfaces almost the first thing I write along with my love of IoC.

And more…

Comparison replacement

So the simple example above demonstrates the return of a primitive/standard type, but what if the return is one of our own more complex objects and therefore needs a more complex comparison? We can supply our own comparison via Compare, for example

experiment.Compare((a, b) => a.Name == b.Name);

Of course we could hand this comparison off to a more complex predicate.
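As a sketch of what such a predicate might look like, the following compares a made-up ImportResult type by name and by confidence within a small tolerance. ImportResult, ImportResultComparison and the tolerance value are all assumptions of mine, not part of Scientist.NET.

```csharp
using System;

// Hypothetical result type, invented purely for this example.
public sealed class ImportResult
{
    public ImportResult(string name, float confidence)
    {
        Name = name;
        Confidence = confidence;
    }

    public string Name { get; }
    public float Confidence { get; }
}

public static class ImportResultComparison
{
    // Assumed tolerance; pick whatever suits your own data.
    public const float Tolerance = 0.001f;

    // True when both results have the same name and their
    // confidence values differ by no more than the tolerance.
    public static bool AreEquivalent(ImportResult a, ImportResult b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;

        return string.Equals(a.Name, b.Name, StringComparison.Ordinal)
            && Math.Abs(a.Confidence - b.Confidence) <= Tolerance;
    }
}

// Inside an experiment returning ImportResult this could be passed as:
//    experiment.Compare(ImportResultComparison.AreEquivalent);
```

Keeping the predicate in its own static method means it can be unit tested without running an experiment at all.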

Unfortunately the Science method expects a return type and hence if your aim is to run two methods with a void return and maybe test some encapsulated data from the classes within the experiment, then you’ll have to do a lot more work.
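One possible way around this, and it is only a sketch, is to wrap each void call in a function that runs it and then returns whatever state we want to compare. IRowImporter, FixedRowImporter and VoidAdapter below are hypothetical names of my own, standing in for your real classes.

```csharp
using System;

// Hypothetical importer whose work happens in a void method.
public interface IRowImporter
{
    void Run();
    int RowsImported { get; }
}

// Stand-in implementation so the example is self-contained.
public sealed class FixedRowImporter : IRowImporter
{
    private readonly int rows;

    public FixedRowImporter(int rows)
    {
        this.rows = rows;
    }

    public int RowsImported { get; private set; }

    public void Run()
    {
        RowsImported = rows;
    }
}

public static class VoidAdapter
{
    // Turns "call Run(), then inspect state" into a Func<int>,
    // which gives an experiment a value it can compare.
    public static Func<int> Observe(IRowImporter importer)
    {
        return () =>
        {
            importer.Run();
            return importer.RowsImported;
        };
    }
}

// Inside an experiment returning int this could be used as:
//    experiment.Use(VoidAdapter.Observe(legacyImporter));
//    experiment.Try(VoidAdapter.Observe(newImporter));
```

Bear in mind that both methods really do run, so this only makes sense when their side effects are isolated from each other; two importers writing to the same table would do the work twice.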

Toggle on or off

The IExperiment interface which we used to call .Use and .Try also has the method RunIf, which I mentioned briefly earlier. We might wish to write our code in such a way that the dev environment runs the experiments but production does not, ensuring our end users do not suffer performance hits due to the experiment running. We can use RunIf in the following manner

experiment.RunIf(() => !environment.IsProduction);

for example.

If we needed to include this line in every experiment it might be quite painful, so it’s actually more likely we’d use this to block/run specific experiments. For example, we might run all experiments in all environments, except for one very slow experiment.
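The environment object used above isn’t something the library gives us, so here’s a minimal sketch of what it might be. RuntimeEnvironment, FromVariable and the idea of reading the environment name from an environment variable are my own assumptions; use whatever configuration mechanism your application already has.

```csharp
using System;

// Hypothetical helper: NOT part of Scientist.NET.
public sealed class RuntimeEnvironment
{
    private readonly string name;

    public RuntimeEnvironment(string name)
    {
        // A missing name is treated as "not production".
        this.name = name ?? string.Empty;
    }

    // Reads the environment name from the given environment variable.
    public static RuntimeEnvironment FromVariable(string variable)
    {
        return new RuntimeEnvironment(
            Environment.GetEnvironmentVariable(variable));
    }

    public bool IsProduction
    {
        get
        {
            return string.Equals(
                name, "Production", StringComparison.OrdinalIgnoreCase);
        }
    }
}

// Possible usage, where "APP_ENVIRONMENT" is a made-up variable name:
//    var environment = RuntimeEnvironment.FromVariable("APP_ENVIRONMENT");
//    experiment.RunIf(() => !environment.IsProduction);
```

Note the design choice here: if the variable isn’t set, IsProduction is false and the experiment will run. If you’d rather be safe in production, flip the default so that experiments only run when an environment is explicitly marked as non-production.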

To enable/disable all experiments, instead we can use

Scientist.Enabled(() => !environment.IsProduction);

Note: this method is not in the NuGet package I’m using but is in the current source on GitHub and in the documentation so hopefully it works as expected in a subsequent release of the NuGet package.

Running something before an experiment

We might need to run something before an experiment starts, but want that code to run within the context of the experiment, a little like a test setup method. For this we can use

experiment.BeforeRun(() => BeforeExperiment());

In the above, we’ll run some method BeforeExperiment() before the experiment continues.

Finally

I’ve not covered all the currently available methods here as the Scientist.net repository already does that, but hopefully I’ve given a peek into what you might do with this library.
