Python Class vs Instance variables

Recently I had the pleasure of learning about Python class vs instance variables. Coming from other programming languages, such as Java, this was quite different for me. So what are they?

I was working on my Monero scraper, so I will just use that as the example, since that is where I had the fun as well.

Class variables

Monero is a blockchain. A blockchain consists of linked blocks, which contain transactions. Each transaction further contains various attributes, the most relevant here being the tx_in and tx_out type elements. These simply describe actual Monero coins being moved in / out of a wallet in a transaction.

So I made a Transaction class to contain this information. Like this:

from typing import List

class Transaction:
    fee = None
    block_height = None
    tx_ins: List[TxIn] = []
    tx_outs: List[TxOut] = []

    def __init__(self, fee, block_height):
        self.fee = fee
        self.block_height = block_height

I figured this should match a traditional Java class like this:

public class Transaction {
    int fee;
    int blockHeight;
    List<TxIn> txIns = new ArrayList<>();
    List<TxOut> txOuts = new ArrayList<>();

    public Transaction(int fee, int blockHeight) {
        this.fee = fee;
        this.blockHeight = blockHeight;
    }
}

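Of course, it turned out I was wrong. A class variable in Python is actually more like a static variable in Java. So, in the above Python code, all the variables declared in the Transaction class body are class variables, shared by all Transaction objects. Well, in practice only the lists end up shared in this case, because assigning self.fee and self.block_height in __init__ creates instance variables that hide the class-level ones. But more on that in a bit.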

Here is an example to illustrate the case:

t1 = Transaction(1, 1)
t1.tx_ins.append(TxIn(1, 1, 1, 1))
t2 = Transaction(1, 1)
t2.tx_ins.append(TxIn(1, 1, 1, 1))

print(t1.tx_ins)
print(t2.tx_ins)

I was expecting the above to print a list with a single item for each transaction, since I only added one to each. But it actually prints two items for both:

[<monero.transaction.TxIn object at 0x109ceee10>, <monero.transaction.TxIn object at 0x11141abd0>]
[<monero.transaction.TxIn object at 0x109ceee10>, <monero.transaction.TxIn object at 0x11141abd0>]

There was something missing for me here, which was understanding the instance variables in Python.
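
The sharing can also be checked directly, without reading object addresses from the printout. Below is a minimal sketch of that check, not the scraper's actual code: TxIn is an empty stand-in class, and the constructor is trimmed to a single fee parameter to keep it short.

```python
from typing import List


class TxIn:
    """Hypothetical stand-in for the scraper's real TxIn class."""


class Transaction:
    fee = None
    tx_ins: List[TxIn] = []  # one list object, created once with the class

    def __init__(self, fee):
        self.fee = fee  # assigning through self creates a per-instance attribute


t1 = Transaction(1)
t2 = Transaction(2)
t1.tx_ins.append(TxIn())  # no assignment here: this mutates the class-level list
t2.tx_ins.append(TxIn())

# Both instances and the class itself refer to the very same list object
shared = t1.tx_ins is t2.tx_ins is Transaction.tx_ins
print(shared)                            # True
print(len(Transaction.tx_ins))           # 2
print(t1.fee, t2.fee, Transaction.fee)   # 1 2 None
```

The fee values stay separate because `self.fee = fee` is an assignment, while `append` only modifies the one list that every lookup of tx_ins falls back to.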

Instance variables

So what makes an instance variable an instance variable?

My understanding is that the difference is assigning it through self, typically in the constructor (the __init__() method):

class Transaction:
    fee = None
    block_height = None
    tx_ins: List[TxIn] = []
    tx_outs: List[TxOut] = []

    def __init__(self, fee, block_height):
        self.fee = fee
        self.block_height = block_height
        self.tx_ins = []
        self.tx_outs = []

Compared to the previous example, the only difference in the above is that the list values are assigned (again) in the __init__ method. Here is the result of the previous test with this setup:

[<monero.transaction.TxIn object at 0x107447e50>]
[<monero.transaction.TxIn object at 0x108ea2d10>]

So now it works as I intended, each transaction holding its own set of tx_ins and tx_outs, since they became instance variables.
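
One way to see where each value lives is to look at the instance's own attribute dictionary with vars(). This is a small sketch under simplifying assumptions: the class is reduced to just tx_ins, and placeholder strings are used instead of real TxIn objects.

```python
class Transaction:
    tx_ins = []  # class attribute, still present but hidden by the instance one

    def __init__(self):
        self.tx_ins = []  # fresh list stored on this particular instance


t1 = Transaction()
t2 = Transaction()
t1.tx_ins.append("in-1")  # placeholder string instead of a real TxIn

print(vars(t1))            # {'tx_ins': ['in-1']}
print(vars(t2))            # {'tx_ins': []}
print(Transaction.tx_ins)  # []
```

Attribute lookup checks the instance first and only then falls back to the class, so once __init__ has assigned self.tx_ins, the class-level list is never reached through an instance.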

I used the above Transaction structure when scraping the Monero blockchain. Because I originally had the tx_ins and tx_outs initialized as lists at class variable level, adding new values to these lists just kept growing the shared (class variable) lists forever, which was not the intent.

I expected each Transaction object to have a new, empty set of lists. Of course, they didn’t; the values just accumulated in the shared (class variable) lists. As I inserted the transactions one at a time into a database, the number of tx_ins and tx_outs for later transactions in the blockchain kept growing and growing, as they now also contained all the values of previous transactions. Hundreds of millions of inserted rows later...

After fixing the variables to be instance variables, the results and counts make sense again.

Gotcha

Even with the above fix to use instance variables for the lists, I still managed to find myself an issue. I typoed the variable name in the constructor:

class Transaction:
    fee = None
    block_height = None
    tx_ins: List[TxIn] = []
    tx_outs: List[TxOut] = []

    def __init__(self, fee, block_height):
        self.fee = fee
        self.block_height = block_height
        self.tx_inx = []
        self.tx_outs = []

In the above I typoed self.tx_inx instead of self.tx_ins. Because the class level tx_ins is already initialized as an empty list, it gave no errors but the objects kept accumulating as before for the tx_ins part. Brilliant.
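
The silent fallback is easy to reproduce in isolation. A minimal sketch, again with a cut-down class and placeholder strings standing in for TxIn objects:

```python
class Transaction:
    tx_ins = []  # class-level list, acts as a silent fallback

    def __init__(self):
        self.tx_inx = []  # typo: creates an unrelated instance attribute


t1 = Transaction()
t2 = Transaction()
t1.tx_ins.append("in-1")  # resolves to Transaction.tx_ins, no error raised
t2.tx_ins.append("in-2")

print("tx_ins" in vars(t1))  # False
print(t2.tx_ins)             # ['in-1', 'in-2']
```

Nothing complains, because reading t1.tx_ins is a perfectly valid lookup that simply ends up at the class attribute.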

So I ended up with the following approach (for now):

class Transaction:
    fee = None
    block_height = None
    tx_ins: List[TxIn] = None
    tx_outs: List[TxOut] = None

    def __init__(self, fee, block_height):
        self.fee = fee
        self.block_height = block_height
        self.tx_ins = []
        self.tx_outs = []

This way, if I typo the instance variable in the __init__ method, the class variable stays None, and I will get a runtime error as soon as I try to use it as a list (for example, calling append on None raises an AttributeError).
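
To illustrate what that failure looks like, here is a small sketch with the typo deliberately left in. The class is trimmed down and the appended value is a placeholder string.

```python
class Transaction:
    tx_ins = None  # no shared list to fall back on any more

    def __init__(self):
        self.tx_inx = []  # deliberate typo, as in the earlier example


t = Transaction()
try:
    t.tx_ins.append("in-1")  # lookup falls back to the class-level None
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)  # 'NoneType' object has no attribute 'append'
```

Note that the error only shows up on first use of the attribute, not when the object is constructed, so the mistake is caught later than it would be in a compiled language.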

The main reason I am doing the variable initialization the way I am here is to get the variables defined for IDE autocompletion, and to be able to add type hints to them for further IDE assistance and checking. There might be other ways to do it, but this is what I have figured out so far.

When I was looking into this, I also found this Stack Overflow post on the topic. It points to other optional ways to specify type hints for instance variables (e.g., typing the parameters to the constructor). There is also a pointer to Python Enhancement Proposal PEP 526, which references other PEPs, but let’s not go into all of those.
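
One of those options, as far as I understand PEP 526, is to annotate the variables in the class body without assigning any value. Such a bare annotation is recorded for type checkers and IDEs but does not create a class variable at all. The sketch below is only my reading of it, with an empty stand-in TxIn class and type hints of my own choosing:

```python
from typing import List


class TxIn:
    """Hypothetical stand-in for the scraper's real TxIn class."""


class Transaction:
    # Bare annotations: stored in __annotations__, no class attribute is created
    fee: int
    block_height: int
    tx_ins: List[TxIn]

    def __init__(self, fee: int, block_height: int):
        self.fee = fee
        self.block_height = block_height
        self.tx_ins = []  # the only place the list actually gets created


print(hasattr(Transaction, "tx_ins"))        # False
print(sorted(Transaction.__annotations__))   # ['block_height', 'fee', 'tx_ins']

t = Transaction(1, 100)
t.tx_ins.append(TxIn())
print(len(t.tx_ins))                         # 1
```

With this layout there is no class-level value to fall back on, so a typo in __init__ would lead to an AttributeError when tx_ins is first read, instead of silently sharing a list.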

I cannot claim to be 100% sure of all the possibilities related to these annotations, and instance vs class variables, but I think I got a pretty good idea. If you have any pointers to what I missed or misinterpreted, please leave a comment 🙂