Monthly Archives: October 2022

Linked lists, “do I need them?”

I noticed a few months back, several posts on LinkedIn and Twitter where people are saying things like “what’s the point of a linked list?”, “do I need linked lists?” etc.

The answer, like in many cases, to these sorts of questions are “it depends”.

What is a linked list?

A singularly linked list is a container structure where you usually maintain a HEAD link and from that you have a pointer/reference to a NEXT link in the list, each link points to a NEXT link OR a NULL. The NULL simply indicates there are no more links.

Let’s create a really simple example of this in C#. This will act list a Queue, hence we’ll put to the end of the list and get from the front of the list, i.e. FIFO (First In First Out). It should be pretty obvious how it can be written in most other languages

class Link<T>
{
   public Link<T>(T data)
   {
      Data = data;
   }
  
   public T Data { get; }
   public Link<T> Next { get; set; }
}

class Queue<T>
{
   private Link<T> _head;
   private Link<T> _tail;
 
   public void Put(T data)
   {
      if(_head == null)
      {
         _head = _tail = new Link<T>(data);
      }
      else
      {
         _tail.Next = new Link<T>(data);
         _tail = _tail.Next;
      }
   }

   public T Get()
   {
      if(_head != null) 
      {
         var data = _head.Data;
         _head = _head.Next;
         return data;
      }
      return default(T);
   }
}

In the code above we’ve added a little optimization by including a TAIL link, which just allows us to add to the end of the linked list without having to iterate of the links each time, but otherwise it’s about as simple as it gets.

What are the problems with Linked Lists?

So immediately from the fairly trivial example we can see a few issues with this type of container. The most obvious is the lack of direct indexing, i.e. we cannot just peek or get the third item in a O(1) time complexity. Instead, we must go to the HEAD and iterate to the third item via the NEXT links. Ofcourse, an array would have far better performance than a Linked List in such situations.

With a linked list, we are (potentially) going to end up, with quite a bit of memory fragmentation as each link can be stored anywhere in memory. So, that’s another downside of using a linked list.

I suppose one might find the implementation complex compared to just creating an array, although once implemented and fully tested that’s probably not a concern. What might be more concerned with is that for each item in memory we have to store a link to the next item.

But let’s be honest the main issue people would have with a linked list is performance especially compared to an array.

So, what’s the point of a linked list when we could just use an array?

Way back, when I learned to use linked lists, I was using C++ on 8088 processors and we had very little memory available. Now, whilst each link in a linked list does have this overheard of a NEXT link, a link list can grow in a way that an array cannot.

Let’s assume we have an array of 1000 items and we want to add one more item. Then we need to create a new array buffer the size of the previous buffer + 1 and then copy all the items from the old to the new buffer then add our new item to the next extended buffer then free the original array buffer.

This means we have two arrays in memory, one with a size of 1000 items and another with 1001 items being held in memory whilst we do the copy. When memory is scarce or heavily fragmented, there’s the potential of an Out of Memory exception as an array would need a block of memory that’s contiguous and hence indexable. Along with the fact that this will end up being slow, creating a new array every time we add an item and copying.

The first thing the reader might say is – well instead of creating a new array with 1 extra item, meaning we have to create a new allocation for every item added, let’s instead create an array with more memory than we currently need, so we might then create an array with a delta (a value that says, when I create a new buffer it will always be old buffer length + delta) where delta might be a value such as 100. Now, we only need to allocate more memory every 100 times we add an item. In other words in our example, instead of creating a 1001 buffer we create a 1100 and simply track length as being 1001 but buffer size being 1100, hence we do not need to allocate more memory until the developer reaches the 1100 and wants to add an item.

This reduces memory allocation, but we still have the issues around maintaining two buffers in memory whilst we copy from one to another and, again, on systems with smaller amounts of memory such as microprocessors and still has the potential of Out of Memory exceptions due to this copy process and/or fragmentation.

Conclusion

So, in today’s modern desktop, laptops and even phones, we probably do not care about the extra issues around creating arrays and resizing them because we have plenty of memory and with some languages using garbage collection with heap defragmentation because we’ve memory to spare and the O(1) indexing is far more important to us, but if you’re using a microprocessor/microcontroller where memory is a premium linked lists still make a lot of sense. Ofcourse you will lose out on O(1) indexing but everything is a trade-off where memory is at a premium.

Unit testing with Swift

Swift comes with its own unit testing framework XCTest, let’s take a look at what we need to do to enable unit testing and how we use this library.

Test Package Dependencies

If you created your package via the Wwift package manager your project will be laid out with Sources and Tests folders.

Here’s an example of a library package, notice the .testTarget section

targets: [
   .target(
      name: "MyLib",
      dependencies: []),
   .testTarget(
      name: "MyLibTests",
      dependencies: ["MyLib"]),
]

Creating a test fixture

Next up we need to import the XCTest library and derive our test fixture/class from XCTestCase, for example

final class MyLibTests: XCTestCase {
   // tests go here
}

Creating a test

Tests are written in test methods within the test case subclass. Test methods have no parameters, no return value and the name should begin with lowercase prefix test, so for example

final class MyLibTests: XCTestCase {
   func testIsValid() {
     // test setup and asserts
   }
}

Asserting our tests

The XCTest library prefixes its assertion method with XCT. Hence, we have XCTAssertEqual for example to assert that two values are equal

final class MyLibTests: XCTestCase {
   func testIsValid() {
      let a = 1
      let b = 1
      XCTAssertEqual(a, b)
   }
}

try…catch, but no finally in Swift

My Swift tip of the day…

Swift handles errors (what we would call exceptions in many other languages) using try and catch, but there’s no finally keyword. Instead, we can wrap a closure and pass to the defer function instead. For example we open some resource (file or whatever – we’re assuming this works fine) we create a defer (a bit like using a Disposable), then we useResource which might exception, but defer will now call closeResource for us

let someResource = openResource()
defer { closeResource(someResource) }

do {
   try useResource(someResource)
} catch {
   throw MyError.resoureFailure()
}

Big O and time complexity

Introduction

The time complexity of a function (or algorithm) can be seen as the relative time the function takes to process data of different sizes.

Time complexity is sometimes referred to as simply “complexity”, or “algorithmic complexity” or “asymptotic complexity”. We might measure the time complexity of a function with respect to best case, the average case or more likely and more generally the worst case.

Note: We generally talk about the time complexity of a function in terms of its worst case, however it’s possible we might declare the best or average case but we should therefore explicitly state that this time complexity is best or average case when using it, otherwise worst case is assumed.

As we’ll see, and you probably worked out from the title of this post, we have notation for this called Big O.

When I say “relative time” we’re not talking about the actual performance characteristics of a function as this may depend on the language used as well and the hardware running the function as well as the actual algorithm used. We’re not really interested directly in the actual time to run an algorithm or function but we’re more interested in how long it takes relative to the data or input growth. In other words whilst a function might take milliseconds for 100 items, what it we increase the data presented to the function to 10,000 items.

Disclaimer: This post is based upon a computer science view/usage of time complexity and Big O. It is not as rigorously written (or intended to be) as one would require far a mathematics based analysis and frankly it’s primarily based upon what I recall from my learning this stuff way back in the early nineties along with some refresher reading.

A simple example

Let’s look at something equivalent of the “Hello World” app. but for time complexity…

We have function which counts the number of elements in an array

func count(array: [Int]) -> Int {
  var c = 0
  for item in array {
    c += 1
  }
  return c
}

The language is immaterial, although as I’m messing with Swift at the moment, that’s what I’m presenting here, but it’s so basic, hopefully it’s pretty obvious how this works.

Now let’s pretend I ran this code against an array of 100 items and it ran in 2ms then I ran the function again with 200 elements and measure the time to be 4ms. We can see that this might indicate a linear time complexity – in other words if we continually double the number of items we’d see a double of the time taken. Hence the time against size or growth of the data creates a linear graph (i.e. y = mx+c). Ofcourse we’d need to run a fair few more samples to prove this, but you hopefully get the idea.

But wait… I said at the start that we’re interested in relative time.

What we know about this simple function is that, as the array size grows the time complexity of the function grows in proportion to size, so now we need to figure out a way to represent this information.

Big O

Big O is used to represent our relative time using notation to represent time complexity. The notation starts with a capitalized (big) O (which represents order of) followed by parenthesis and within the parenthesis we have a constant or a formula using n usually, to represent the size of data.

In the count code (above), we’re having to iterate the array to count the number of items. Therefore, to find the length of the array we must iterate over N items, hence this function has a Big O of O(n). You’ll notice though that we have essentially three statements within this function, so let’s be a little more precise about how we calculate the Big O of this function.

func count(array: [Int]) -> Int {
  var c = 0            // O(1)  
  for item in array {  // O(n)
    c += 1             // O(1) 
  }
  return c             // O(1)
}

The line where we assign the initial value to c has a constant time complexity (denoted as O(1)), in other words, this line and for that matter the incrementing of the c variable takes the same time whatever the size of input. However, the for loop must loop through each item in the array, hence has a time complexity of O(n), so we really end up with a total time complexity of

O(1) + O(n) + O(1) + O(1)

We can remove the constants as these become immaterial and are unaffected by the growth of the array, hence removing the O(1) we’re left with the O(n) time complexity.

There are a bunch of (if you like) standard time complexities you’ll see a lot, these are (in order of dominance)

O(1) = constant time
O(log n) = logarithmic time
O(n) = linear time 
O(n log n) = linear log time
O(n^2) = quadratic time
O(2^n) = exponential time
O(n!) = factorial time 

We ignore lower order terms when the algorithm is dominated by higher order terms, in other words O(n) dominates the O(log n) and hence a function that has both terms would have an overall term of O(n).

With regards our sample code, hopefully all languages will store the length of the array alongside the array data. Hence to access the length of an array we simply return this length variable and this happens in constant time O(1). Therefore, we can see that regardless of the size of the array the time to get the length variable will never change in such implementations. But if you’re used to C# think of the difference of using a List Count property against a Linq Count(). The first will return the stored count whereas the second would try to iterate the list giving quite different time complexities.

Working out the Big O of a function

We had a quick look at how to work out the time complexity of the count function, whereby we look at each line and apply the Big O notation to it, then we look for the dominant term and essentially this becomes our time complexity for the function.

Let’s now look at another function

func display() {
   for x in 0..100 {    // O(n^2)
     for y in 0..100 {  
        print(x * y)    // O(1)
     }
   }
}

This gives us O(n^2) + O(1), we can discard the constant and hence this function has a time complexity of O(n^2)

Time complexity as a range

It was mentioned earlier that we could calculate the time complexity for best, average or worst case. Whilst best and worst case give us a range to define the efficiency of our function and one might say that average gives us a useful value, we tend to concern ourselves most with the worst case. A simple example of this would be if we now change our count function to search an array for the first occurrence of a value, for example

func find(array: [Int], search: Int) -> Int{
  var pos = 0            
  for item in array {    
    if item == search {
      return pos
    }
    pos += 1
  }
  return -1
}

In the above code we can see that the only real difference from the previous count function is there’s a break from the O(n) loop when a match is found, therefore we can determine if the array’s first item was a match to our search parameter then this function would have a best case time complexity of O(1), on the other hand if the match was the last item we’d have a time complexity of O(n). Whilst we know that a best case gives us a constant time, we always need to look at the worst case. So, this function would still have a time complexity of O(n) as a worst case.

Why not just time our runs?

Earlier I mentioned some hypothetical timings for the count algorithm. Why don’t we just use those timings to represent our algorithm’s efficiency?

Well as you probably ascertained, we’re looking at have a machine independent measurement. Obviously if we’re comparing timings on a powerful development machine with lots of CPU power and memory our timings will not be indicative of the same algorithm on a low powered machine. Hence, we’re aiming to get a representation of a machine independent value.

As per the start of this post “The time complexity of a function (or algorithm) can be seen as the relative time the function takes to process data of different sizes.
“.

So, one must also remember that we’re measuring how an algorithm’s time complexity changes as data of different sizes are applied to it. But there’s a catch to this. What if we have function countA and it does the count process as implemented above but has a sleep, or some other constant time delay and compare it’s time complexity against the same function countB without this constant time delay, the Big O will essentially be the same, but obviously measured performance on the same machine will differ (potentially vastly differ).

For example, if our implementations looked like this

func countA(array: [Int]) -> Int {
  var c = 0            // O(1)  
  for item in array {  // O(n)
    c += 1             // O(1) 
  }
  return c             // O(1)
}

func countB(array: [Int]) -> Int {
  var c = 0            // O(1)  
  for item in 1..1000 { // O(1)
  }
  for item in array {  // O(n)
    c += 1             // O(1) 
  }
  return c             // O(1)
}

So, in countB we essentially have a constant time loop but it’s obvious from these two functions that countA will be more performant than countB but both will have the same O(n) time complexity.

So how do we deal with this difference?

In the response to that question, at this time, I’m not wholly certain. I need to research this further – but what’s obvious here is that the two algorithms will both be O(n) regarding the changes in the amount of data but not in a true comparison with one another as it’s obvious that countB will always be slower than countA.

Check for vulnerabilities using audit

This is a reminder post as I occasionally forget this command in yarn and npm.

Third party packages within our applications occasionally need updating due to security issues/vulnerabilities. Github kindly supplies information via dependabot, but if you’re not using github or simply want to check the latest state of the packages used in your code, you can use

yarn audit

See yarn audit for more information.

For NPM it’s the same CLI option, i.e.

npm audit

See npm audit for more information.

Property based testing

Tools/libraries such as FsCheck (for .NET), fast-check (for JavaScript) and ofcourse QuickCheck (for Haskell) are aimed at allowing us to write property based tests. In truth we can write property based test with existing tools, what these tend to offer is frameworks with the ability to execute multiple tests and generate data for our tests. So with this in mind, what is property based testing?

What is property based testing?

When we use the word property, we’re not using it like you might in C#, F#, Delphi etc. where it’s a way of accessing fields on an object. Instead we’re really saying is “what are the properties that make this function what it is or do what it does”, when I say “do what it does” I don’t mean from an implementation point.

I’m going to use a canonical example (and I suggest you watch the youtube video The lazy programmer’s guide to writing thousands of tests – Scott Wlaschin for a far better explanation of property based testing using similar examples and a far more complete description of property based testing than I’m going to list here.

Let’s take a simple function, the Add function. Now we want to test our implementation of the add function so how do we do this? Well, we might write something like

[Test]
public void Add_TwoAndFour()
{
   var result = MyMath.Add(2, 4);
   Assert.That(result, Is.EqualTo(6));
}

This is fine, but what we write out Add implementation top allow this test to pass, theoretically (I know nobody, or I hope nobody would do this) we might have an implementation which looks like this

public class MyMath
{
   public static int Add(int a, int b) => 6;
}

Now our unit test will pass. Ofcourse this is a ridiculous implementation, but we’ve just demonstrated that the method passes the supplied unit test but certainly isn’t a valid implementation of an Add function. I mean what if we might add another unit test

[Test]
public void Add_FourAndFour()
{
   var result = MyMath.Add(4, 4);
   Assert.That(result, Is.EqualTo(8));
}

Our test now fails, but what if our mad implementation simply changes to look like this

public static int Add(int a, int b) => (a, b) switch
{
   (2, 4) => 6,
   (4, 4) => 8,
   _ => 0
};

You get the idea, our Add method is passing the test each time by us adding code to pass each test.

If we change our test to supply multiple random values, this will stop our implementation passing our tests by adding a case for every possible test scenario, but now we hit another interesting question. How do we define success?

If we assume that x and y in the following code are supplied via some random number generator. Then we could simply write the implementation, as per the Add function, within our test, like this

[TestRandomData(Max = 100)]
public void Add_XAndY(int x, int y)
{
   var result = MyMath.Add(x, y);
   var expected = x + y;
   Assert.That(result, Is.EqualTo(expected));
}

This would work (assuming we had a TestRandomDataAttribute which took a Max value to supply 100 iterations of random pairs of data.

But there’s a problem, we’re basically having to write the implementation of the method in the unit test, to prove the method worked. Ofcourse it’ll work, the unit test implementation matches what would be the implementation of Add, i.e.

public static int Add(int a, int b) => a + b;

This is not really helping us much!

Note: It’s not unusual to use alternate implementations of code to prove a function works, for example we define a distributed sorting algorithm, so we could test it against a standard built-in sort, but this has limitations of alternate and fully tested implementations do not exist.

What we really need to be doing is testing the properties of the Add method. In other words, what can we use to determine whether our Add method worked apart from using the same implementation code. What are the properties of addition?

What are the properties of addition?

What can we say about properties of addition, let’s look at the Khan Academy definitions for properties of addition.

One of the obvious features of addition is that we can swap the inputs and the output should remain the same, this is known as the Commutative property. Hence, we could now write a test like this

[TestRandomData(Max = 100)]
public void Add_Commutative(int x, int y)
{
   var a = MyMath.Add(x, y);
   var b = MyMath.Add(y, x);
   Assert.That(a, Is.EqualTo(b));
}

Now that we’re testing whether the method is commutative we also no longer care about the numbers being used or the results of those numbers used, we simply care that x + y = y + x.

Ofcourse you may say, “well that’s great but x * y = y * x also holds true and is therefore commutative”. You’d be correct just using this single property based test would also suggest a function Multiply is the same as Add. Hence we need to add more tests to prove more properties. In fact, it’s likely that property based testing will need multiple property based tests to truly define any uniqueness to a function in scenarios like this.

The next property of addition is the Associative property which says that (x + y) + z = x + (y + z). We can therefore write a property test that looks like

[TestRandomData(Max = 100)]
public void Add_Associative(int x, int y, int z)
{
   var a = MyMath.Add(add(x, y), z);
   var b = MyMath.Add(x, add(y, z));
   Assert.That(a, Is.EqualTo(b));
}

and finally from the Properties of addition we look at an identity property based test, i.e. 0 + x = x and we could write something like this

[TestRandomData(Max = 100)]
public void Add_Associative(int x)
{
   var y = MyMath.Add(0, x);
   Assert.That(x, Is.EqualTo(y));
}

Now these three property based tests tell us the Add method adheres to the expected properties of addition. If we compare the properties to multiplication, we know that the identity test fails on Multiply as 0 * x = 0, however the other two tests pass – so again you see why multiple property based tests are required.

What’s next?

Ofcourse this demonstrates property based testing on a pretty simple function, more complicated functions require the developer to try to view the function, not so much about its inputs and outputs and values or the likes that are expected, but instead think about what properties make that function unique. This is not always easy.

Let’s look at some other examples of how we might use property based testing…

If we have code that reverses a string we can actually reverse a string then reverse again and see if it results in the original list. Now you’d be correct in thinking that this is not a great test as, if the reverse function does nothing, then reversing the reversed string will be the same as just reversing the string – so property based testing does not exclude the need for standard unit tests with known and expected values.

We could also use the Test oracle whereby we have a different implementation to an existing method that we then check that both output the same value.

The with expression in C# 9.0 and later

I’ve not had cause to use this so far. Which seems strange (in a way) as I used similar language features with the spread operator and changing values in TypeScript a lot.

The with expression basically copies a record, struct or anonymous type and allows us to change some values, hence creating a new instance but overriding certain properties or fields.

Note: In C# 9.0 the with expression only worked with record types, in C# 10 it can be used, additionally, with struct and anonymous types.

Let’s look at a simple example. We have my old favourite demo type, a Person record which looks like this

public record Person(string Name, int Age);

Now we’ll create a Person like this

Person scooby = new("Scooby Doo", 7);

Console.WriteLine(scooby);

This would ofcouse output something like Person { Name = Scooby Doo, Age = 7 }.

Note: Obviously this is a rather simplistic record but it should give you the idea how things work. For more complex cases imagine there’s a first name, last name, lines of an address, national insurance/social security number etc. just to make it a little more work to create copies of a Person record instance.

Now let’s say we wanted to create a new record which has all the same properties/fields bar some properties/fields that will be changed. Ofcourse we could just create a new Person in the same way as above and change the properties. However, the with expression allows us to copy and change in a more elegant way (well more elegant if we had lots more properties/fields on the record than this example).

We can write

Person scooby = new("Scooby Doo", 7);
Person scrappy = scooby with { Name = "Scrappy Doo" };

Console.WriteLine(scooby);
Console.WriteLine(scrappy);

The scooby instance remains unchanged as we’ve essentially created a new instance of the Person record, copying the Age and explicitly changing the Name.

Now, what if we wanted to simply create a new instance of a Person with the same properties and fields as the scooby instance? We can just use the { } syntax like this

Person scoobyCopy = scooby with { };

Assert.IsFalse(Object.ReferenceEquals(scooby, scoobyCopy));

As the Assert states, scooby and scoobyCopy instances are not the same instance, they are copies.

Note: If using a struct (or if you’re explicitly supplying properties for a record) as the left-hand of the with expression and change values within the curly braces, you will need properties to have setters.