AI tools
January 20, 2024

GitHub Copilot vs. Stable Code 3B

How does Stability.ai’s new LLM stack up against the industry standard?

by John Paul Ada

For their first LLM release of the year, Stability.ai brought out the big guns.

Well, actually, no. They brought out the not-so-big guns. The small guns. The pocket-sized guns.

They released a code-generating LLM not for the cloud, but for edge devices: Stable Code 3B.

From the official Stable Code 3B release post, ft. sad low-poly robot.

What is Stable Code 3B?

From its name, we can surmise (wow big word) that it is an LLM with 3 billion parameters. It is indeed tiny:

  • bigger than Microsoft’s Phi-2 (2.7B) and its derivatives, but
  • smaller than most of the top-performing byte-sized (hehe) models like Mistral (7B) and friends.

It’s small enough that it can be run by people who can only afford MacBook Airs but not GPUs.

I know. Crazy times, right?

Here’s the kicker though: Stability.ai claims that it performs on par with models 2.5x its size, such as CodeLLaMA 7B.

I don’t use CodeLLaMA, though. I mainly use GitHub Copilot.

So let’s see how it holds up against the big bois, shall we?

Comparison

To compare Stable Code 3B and GitHub Copilot, I’ll pit them against each other in three rounds.

Before we begin, we have to set up Stable Code. I’ll be running it locally with Ollama on my 2020 M1 MacBook Air:

Setting up Stable Code using Ollama
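Once Ollama is serving the model, you can also ask it for completions from a script instead of the terminal. Here is a minimal sketch using only the standard library. It assumes Ollama is listening on its default local port and that the model was pulled under the tag `stable-code`; both are assumptions on my part, and the helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="stable-code"):
    """Build the HTTP request for a single, non-streaming completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, model="stable-code"):
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `complete("def pearson_r(x, y):")` should return whatever the model writes after that line.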

Pre-testing Notes

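So I went into testing Stable Code assuming it was an instruction-tuned model, but IT WAS NOT. It is apparently a completion model: it continues the text you give it rather than following instructions. I originally planned to use the same prompt for both, but that won’t work, so Stable Code gets code-style prompts to complete while Copilot gets plain instructions 😅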
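To make the difference concrete: a completion model only continues text, so the prompt has to look like the start of a source file. Here is a rough sketch of how an instruction can be reshaped that way. The helper names are hypothetical, and this is just one way to do it.

```python
def as_docstring_prompt(description, first_line):
    """Turn a description into a module docstring followed by the code to continue."""
    body = "\n".join(line.strip() for line in description.strip().splitlines())
    return f'"""{body}"""\n{first_line}\n'


def as_comment_prompt(description):
    """Turn a description into leading comment lines for the model to continue."""
    lines = description.strip().splitlines()
    return "".join(f"# {line.strip()}\n" for line in lines)
```

For example, `as_docstring_prompt("Adds two numbers.", "def add(a, b):")` gives the model a docstring and a function header, and it is left to write the body.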

So I present to you the results of my struggle:

Round 1: Copying a Login Page

For this round, I wanted them to make a copy of the Airbnb Login Page using Tailwind classes.

Stable Code Prompt:

Stable Code Round 1 Code

I saved the code to an HTML file and when opened, it looks like this:

Login page generated by Stable Code

GitHub Copilot Prompt:

Create a beautiful, Airbnb-inspired pure HTML login page that uses Tailwind classes with no scripting

GitHub Copilot Round 1 Code

Once again, I saved the result to an HTML file. When opened, it looks like this:

Login page generated by GitHub Copilot

Round 2: Optimizing a Data Store

For this round, I wanted them to implement an efficient in-memory key-value database with transaction support. For reference, here’s my take on a similar problem (without AI assistance, of course).

Stable Code Prompt:

"""This is a simple, but efficient In-Memory Key-Value Database implementation written in Python.
It has support for creating, committing, and rolling back transactions.
When a transaction is created, the size of the database does not double.
It can do basic CRUD operations and a count operation.
All operations are O(log n) or better."""
class InMemoryKeyValueDatabase():

Stable Code Output:

class InMemoryKeyValueDatabase():
    def __init__(self):
        # The following list will contain all the key-value pairs that make up your database.
        # Each of these items is a dictionary containing 'key' and 'value'.
        self.__listOfAllData__ = []
        """
        The InMemoryKeyValueDatabase has an internal variable called "databaseName". This holds the name of this particular database.
        It is set once when you create your database using one of the factory functions (InMemoryKeyValueFactory)).
        """
        self.__internalVariable__ = None
        # The following dictionary will contain all the key-value pairs that make up your database.
        # Each of these items is a dictionary containing 'key' and 'value'.
        self.__dictionaryOfAllDataByKeys__ = {}

    def addNewKeyValuePairToDatabase( self, newKey, newValue ):
        """
        This will take in the key as well as its value.
        If these items are already present within this InMemoryKeyValue database then no changes to anything would happen and nothing would be updated or changed.
        The only thing that could possibly change is if a new 'key' and/or 'value' pair was added into one of the dictionarys ((self.__dictionaryOfAllDataByKeys__) that are used within this InMemoryKeyValue database.
        """
        # If either of these items already exist then no changes to anything would happen and nothing would be updated or changed.
        if newKey in self.__dictionaryOfAllDataByKeys__.keys():
            return False
        # This will add the given 'key' and its associated 'value'.
        self.__dictionaryOfAllDataByKeys__[newKey] = newValue;
        """
        The InMemoryKeyValueDatabase has an internal variable called "databaseName". This holds the name of this particular database.
        It is set once when you create your database using one of the factory functions (InMemoryKeyValueFactory)).
        """
        return True

    def getAllData( self ):
        # The following list will contain all the key-value pairs that make up your database.
        # Each of these items is a dictionary containing 'key' and 'value'.
        self.__listOfAllData__ = []
        """
        The InMemoryKeyValueDatabase has an internal variable called "databaseName". This holds the name of this particular database.
        It is set once when you create your database using one of the factory functions (InMemoryKeyValueFactory)).
        """
        # The following dictionary will contain all the key-value pairs that make up your database.
        # Each of these items is a dictionary containing 'key' and 'value'.
        self.__dictionaryOfAllDataByKeys__ = {}

    def removeSingleItem( self, itemKeyToRemove ):
        """
        This will take in the key as well as its value.
        If these items are already present within this InMemoryKeyValue database then no changes to anything would happen and nothing would be updated or changed.
        The only thing that could possibly change is if a new 'key' and/or 'value' pair was added into one of the dictionarys ((self.__dictionaryOfAllDataByKeys__) that are used within this InMemoryKeyValue database.
        """
        # If either of these items already exist then no changes to anything would happen and nothing would be updated or changed.
        if itemKeyToRemove in self.__dictionaryOfAllDataByKeys__.keys():
            return False
        del self.__dictionaryOfAllDataByKeys__[itemKeyToRemove]
        # If either of these items already exist then no changes to anything would happen and nothing would be updated or changed.
        if itemKeyToRemove in self.__dictionaryOfAllDataByKeys__.keys():
            return False
        del self.__dictionaryOfAllDataByKeys__[itemKeyToRemove]
        """
        The InMemoryKeyValueDatabase has an internal variable called "databaseName". This holds the name of this particular database.
        It is set once when you create your database using one of the factory functions (InMemoryKeyValueFactory)).
        """
def clearAllData( self ) :
    self.__listOfAllData__ = []
    del self.__dictionaryOfAllDataByKeys__
    # If either of these items already exist then no changes to anything would happen and nothing would be updated or changed.
    if "databaseName" in self.__internalVariable__.keys():
        return False
    self.__internalVariable__ = {}

GitHub Copilot Prompt

I want you to create a simple, but efficient In-Memory Key-Value Database implementation written in Python.
It should support for creating, committing, and rolling back transactions.
When a transaction is created, the size of the database does not double.
It should be able to handle CRUD operations and a count operation.
All operations should be O(log n) or better.

GitHub Copilot Response

GitHub Copilot Round 2 Result

GitHub Copilot Code

class InMemoryDB:
    def __init__(self):
        self.data = {}
        self.transactions = []

    def begin(self):
        self.transactions.append({})

    def commit(self):
        if not self.transactions:
            raise Exception("No transaction")
        self.data.update(self.transactions.pop())

    def rollback(self):
        if not self.transactions:
            raise Exception("No transaction")
        self.transactions.pop()

    def set(self, key, value):
        if self.transactions:
            self.transactions[-1][key] = value
        else:
            self.data[key] = value

    def get(self, key):
        if self.transactions:
            for transaction in reversed(self.transactions):
                if key in transaction:
                    return transaction[key]
        return self.data.get(key)

    def delete(self, key):
        if self.transactions:
            self.transactions[-1][key] = None
        else:
            if key in self.data:
                del self.data[key]

    def count(self, value):
        count = sum(1 for v in self.data.values() if v == value)
        for transaction in self.transactions:
            count += sum(1 for v in transaction.values() if v == value)
        return count

Given that the count operation was still O(n), I asked it to go a bit further:

This doesn’t completely comply with the requirements (e.g. the constraint that creating a transaction must not double the size of the database), but it’s getting there!
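For the curious, here is one way the requirements could be met. This is my own rough sketch, not something either model produced. Writes go straight into the main dict, and each open transaction keeps only an undo log of the keys it touched, so starting a transaction copies nothing. A `Counter` of values keeps `count` at O(1) on average. The class name and the `_DELETED` sentinel are made up for illustration, and values are assumed to be hashable.

```python
from collections import Counter

_DELETED = object()  # sentinel: "this key did not exist"


class TransactionalKV:
    """Dict-backed store; each open transaction keeps only an undo log."""

    def __init__(self):
        self._data = {}           # key -> current value
        self._counts = Counter()  # value -> number of keys holding it
        self._undo = []           # one {key: previous value} log per open transaction

    def begin(self):
        self._undo.append({})     # O(1): nothing is copied

    def _record(self, key):
        # Remember the value a key had before the current transaction first touched it.
        if self._undo and key not in self._undo[-1]:
            self._undo[-1][key] = self._data.get(key, _DELETED)

    def _write(self, key, value):
        old = self._data.get(key, _DELETED)
        if old is not _DELETED:
            self._counts[old] -= 1
        if value is _DELETED:
            self._data.pop(key, None)
        else:
            self._data[key] = value
            self._counts[value] += 1

    def set(self, key, value):
        self._record(key)
        self._write(key, value)

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._record(key)
        self._write(key, _DELETED)

    def count(self, value):
        return self._counts[value]

    def commit(self):
        if not self._undo:
            raise RuntimeError("no open transaction")
        log = self._undo.pop()
        if self._undo:
            # Nested case: hand the undo entries to the enclosing transaction,
            # keeping its older entries where both touched the same key.
            for key, old in log.items():
                self._undo[-1].setdefault(key, old)

    def rollback(self):
        if not self._undo:
            raise RuntimeError("no open transaction")
        for key, old in self._undo.pop().items():
            self._write(key, old)
```

The trade-off is that `commit` and `rollback` take time proportional to the number of keys the transaction touched, which seems fair since that is also all the extra memory a transaction uses.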

Round 3: Writing Tests

For this round, I wanted them to create a property-based test suite for a Pearson r function.

Stable Code Prompt

# Property-based Test Suite for a Pearson-r correlation coefficient function

Stable Code Result

Stable Code Round 3 Result

GitHub Copilot Prompt

Create a property-based Test Suite for a Pearson-r correlation coefficient function

GitHub Copilot Result

GitHub Copilot Round 3 Result

GitHub Copilot Code

from hypothesis import given, strategies as st
import numpy as np
from scipy import stats

def pearson_r(x, y):
    """Calculate Pearson correlation coefficient."""
    mean_x = np.mean(x)
    mean_y = np.mean(y)
    num = np.sum((x - mean_x) * (y - mean_y))
    den = np.sqrt(np.sum((x - mean_x)**2) * np.sum((y - mean_y)**2))
    return num / den

@given(st.lists(st.floats(allow_nan=False, allow_infinity=False), min_size=2),
       st.lists(st.floats(allow_nan=False, allow_infinity=False), min_size=2))
def test_pearson_r(x, y):
    """Property-based test for the Pearson correlation coefficient function."""
    # Ensure the lists have the same size
    if len(x) > len(y):
        x = x[:len(y)]
    else:
        y = y[:len(x)]

    # Convert lists to numpy arrays
    x = np.array(x)
    y = np.array(y)

    # Calculate Pearson correlation coefficients
    our_result = pearson_r(x, y)
    scipy_result, _ = stats.pearsonr(x, y)

    # Assert that the results are approximately equal
    np.testing.assert_almost_equal(our_result, scipy_result, decimal=5)
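The suite above checks a single property: agreement with SciPy. A property-based suite could also assert invariants that hold for any valid input, such as the result staying within [-1, 1]. Here is a small sketch of that idea, written by me rather than generated. It uses only the standard library, so seeded random sampling stands in for Hypothesis, and the helper names are hypothetical. Inputs are kept bounded and non-constant, since the coefficient is undefined when either list has zero variance.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation is undefined for constant input")
    return num / den


def random_sample(rng, n):
    """Draw n bounded floats, redrawing in the unlikely case they are all equal."""
    while True:
        values = [rng.uniform(-100.0, 100.0) for _ in range(n)]
        if max(values) > min(values):
            return values


def close(a, b):
    return math.isclose(a, b, abs_tol=1e-9)


def check_properties(trials=200, seed=0):
    """Check a few invariants of pearson_r on random inputs; return trials run."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(3, 20)
        x, y = random_sample(rng, n), random_sample(rng, n)
        r = pearson_r(x, y)
        scale, shift = rng.uniform(0.5, 10.0), rng.uniform(-50.0, 50.0)

        assert -1.0 - 1e-9 <= r <= 1.0 + 1e-9             # always bounded
        assert close(r, pearson_r(y, x))                   # symmetric in its arguments
        assert close(pearson_r(x, x), 1.0)                 # perfect self-correlation
        assert close(pearson_r([-a for a in x], y), -r)    # negating x flips the sign
        # A positive rescale plus a shift of x leaves r unchanged.
        assert close(pearson_r([scale * a + shift for a in x], y), r)
    return trials
```

Calling `check_properties()` runs every invariant on 200 random pairs and raises an `AssertionError` on the first one that breaks.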

Bonus Round

Okay so it’s obvious who the winner is, but I refuse to give up!

We’ve already seen how GitHub Copilot wrote the Pearson r function. I wanna see how Stable Code performs with a simpler prompt:

Stable Code Prompt

def pearson_r(x, y):

Stable Code Result

It ends up creating a couple of functions besides pearson_r itself, but unfortunately the implementation is most likely incorrect as well.

Comparison Summary

Throughout all the rounds, we have a clear winner:

GitHub Copilot knocks Stable Code 3B out of the park. Like really, really out of the park.

While Stable Code 3B comes close with the simpler prompts, it still can’t hold a candle to the king/queen/best of the Code AIs.

My Thoughts

Okay, I admit it: this was an unfair comparison, but I wanted to see if there was hope for lightweight code-generating models for the edge. That way, maybe we could introduce them in places with fewer resources, like some remote areas of the Philippines.

GitHub Copilot is clearly better right now, but I hope Stable Code improves! I’ve tried other open-source, instruction-tuned code-generating models and had good results with them, so I’m pretty hopeful! ☀️
