
Python Class vs Instance variables

Recently I had the pleasure of learning about Python class vs instance variables. Coming from other programming languages, such as Java, this was quite different for me. So what are they?

I was working on my Monero scraper, so I will just use that as the example, since that is where I had the fun as well.

Class variables

Monero is a blockchain. A blockchain consists of linked blocks, which contain transactions. Each transaction further contains various attributes, the most relevant here being the tx_in and tx_out type elements. These simply describe actual Monero coins being moved in / out of a wallet in a transaction.

So I made a Transaction class to contain this information. Like this:

from typing import List

# TxIn and TxOut are other classes from my scraper, not shown here.
class Transaction:
    fee = None
    block_height = None
    tx_ins: List[TxIn] = []
    tx_outs: List[TxOut] = []

    def __init__(self, fee, block_height):
        self.fee = fee
        self.block_height = block_height

I figured this should match a traditional Java class like this:

import java.util.ArrayList;
import java.util.List;

public class Transaction {
    int fee;
    int blockHeight;
    List<TxIn> txIns = new ArrayList<>();
    List<TxOut> txOuts = new ArrayList<>();

    public Transaction(int fee, int blockHeight) {
        this.fee = fee;
        this.blockHeight = blockHeight;
    }
}

Of course, it turned out I was wrong. A class variable in Python is actually more like a static variable in Java. So, in the above Python code, all the variables in the Transaction class are shared by all Transaction objects. Well, actually only the lists are in this case. But more on that in a bit.

Here is an example to illustrate the case:

t1 = Transaction(1, 1)
t1.tx_ins.append(TxIn(1, 1, 1, 1))
t2 = Transaction(1, 1)
t2.tx_ins.append(TxIn(1, 1, 1, 1))

print(t1.tx_ins)
print(t2.tx_ins)

I was expecting the above to print out a list with a single item for each transaction, since I only added one to each. But it actually prints two for both:

[<monero.transaction.TxIn object at 0x109ceee10>, <monero.transaction.TxIn object at 0x11141abd0>]
[<monero.transaction.TxIn object at 0x109ceee10>, <monero.transaction.TxIn object at 0x11141abd0>]

There was something missing for me here, which was understanding the instance variables in Python.
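To see the sharing in isolation, here is a minimal sketch. The Tx class and the plain strings standing in for TxIn objects are made up for illustration and are not part of my scraper:

```python
class Tx:
    # Class-level attributes: created once, when the class body runs.
    fee = None
    tx_ins = []

    def __init__(self, fee):
        # Assignment through self creates a new attribute on this instance,
        # which shadows the class-level fee.
        self.fee = fee


t1 = Tx(1)
t2 = Tx(2)
t1.tx_ins.append("in-1")  # mutates the one list stored on the class
t2.tx_ins.append("in-2")

print(t1.tx_ins is t2.tx_ins)  # True: both names resolve to Tx.tx_ins
print(Tx.tx_ins)               # ['in-1', 'in-2']
print(t1.fee, t2.fee, Tx.fee)  # 1 2 None
```

Appending never assigns anything on the object, so the lookup keeps falling through to the class. The fee looks "per object" only because the constructor assigns it through self.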

Instance variables

So what makes an instance variable an instance variable?

My understanding is that the difference is assigning it on self, typically in the constructor (the __init__() method):

class Transaction:
    fee = None
    block_height = None
    tx_ins: List[TxIn] = []
    tx_outs: List[TxOut] = []

    def __init__(self, fee, block_height):
        self.fee = fee
        self.block_height = block_height
        self.tx_ins = []
        self.tx_outs = []

Compared to the previous example, the only difference in the above is that the list values are assigned (again) in the __init__ method. Here is the result of the previous test with this setup:

[<monero.transaction.TxIn object at 0x107447e50>]
[<monero.transaction.TxIn object at 0x108ea2d10>]

So now it works as I intended, each transaction holding its own set of tx_ins and tx_outs, since they became instance variables.
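One way to check what actually lives on an object is vars(), which returns the instance's own attribute dictionary. A small sketch, again with a made-up Tx class and strings in place of TxIn objects:

```python
class Tx:
    tx_ins = []  # class attribute, still defined but no longer used per object

    def __init__(self):
        self.tx_ins = []  # instance attribute, a fresh list per object


t1 = Tx()
t2 = Tx()
t1.tx_ins.append("in-1")

print(vars(t1))   # {'tx_ins': ['in-1']}
print(vars(t2))   # {'tx_ins': []}
print(Tx.tx_ins)  # []
```

The class-level list is still there, but it stays empty because every object now has its own tx_ins that is found first.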

I used the above Transaction structure when scraping the Monero blockchain. Because I originally had the tx_ins and tx_outs initialized as lists at class variable level, adding new values to these lists actually just kept growing the shared (class variable) lists forever. Which was not the intent.

I expected each Transaction object to have a new, empty set of lists. Of course, they didn’t; the values just accumulated in the shared (class variable) lists. As I inserted the transactions one at a time into a database, the number of tx_ins and tx_outs for later transactions in the blockchain kept growing and growing, as they now also contained all the values of previous transactions. Hundreds of millions of inserted rows later...

After fixing the variables to be instance variables, the results and counts make sense again.

Gotcha

Even with the above fix to use instance variables for the lists, I still managed to create an issue for myself. I typoed the variable name in the constructor:

class Transaction:
    fee = None
    block_height = None
    tx_ins: List[TxIn] = []
    tx_outs: List[TxOut] = []

    def __init__(self, fee, block_height):
        self.fee = fee
        self.block_height = block_height
        self.tx_inx = []  # the typo: should be tx_ins
        self.tx_outs = []

In the above I typoed self.tx_inx instead of self.tx_ins. Because the class level tx_ins is already initialized as an empty list, it gave no errors but the objects kept accumulating as before for the tx_ins part. Brilliant.

So I ended up with the following approach (for now):

class Transaction:
    fee = None
    block_height = None
    tx_ins: List[TxIn] = None
    tx_outs: List[TxOut] = None

    def __init__(self, fee, block_height):
        self.fee = fee
        self.block_height = block_height
        self.tx_ins = []
        self.tx_outs = []

This way, if I typo the instance variable in the __init__ method, the class variable stays uninitialized, and I will get a runtime error in trying to use it (as the class variable value is None).
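A minimal sketch of that failure mode, with a made-up Tx class and the typo left in on purpose:

```python
class Tx:
    tx_ins = None  # placeholder only; real lists are meant to come from __init__

    def __init__(self):
        self.tx_inx = []  # deliberate typo, so tx_ins is never set on the instance


t = Tx()
try:
    t.tx_ins.append("in-1")  # falls back to the class-level None
    failed = False
except AttributeError as err:
    failed = True
    print("caught:", err)
```

The error only shows up on first use of the attribute, not at construction time, so it is still a runtime check rather than something the constructor enforces.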

The main reason I am doing the variable initialization like this is to get the variables defined for IDE autocompletion, and to be able to add type hints to them, for further IDE assistance and checking. There might be other ways to do it, but this is what I have figured out so far.

When I was looking into this, I also found this Stack Overflow post on the topic. It points to other optional ways to specify type hints for instance variables (e.g., typing the parameters to the constructor). There is also a pointer to Python Enhancement Proposal PEP 526, which references other PEPs, but let’s not go into all of those.
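One of those other ways, as far as I understand PEP 526, is to annotate the variable in the class body without giving it a value. A sketch under that assumption, with a made-up Tx class and List[str] standing in for List[TxIn]:

```python
from typing import List


class Tx:
    # Annotation without a value: recorded in __annotations__,
    # but no class attribute named tx_ins is created.
    tx_ins: List[str]

    def __init__(self):
        self.tx_inx = []  # deliberate typo


print("tx_ins" in Tx.__annotations__)  # True
print(hasattr(Tx, "tx_ins"))           # False

t = Tx()
try:
    t.tx_ins
    missing = False
except AttributeError:
    missing = True
print(missing)  # True
```

With no class-level value to fall back on, a typo in the constructor leads to an AttributeError on first access, while the annotation is still there for tools that read type hints.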

I cannot claim 100% assurance of all the possibilities related to these annotations, and instance vs class variables, but I think I got a pretty good idea. If you have any pointers to what I missed or misinterpreted, please leave a comment 🙂

Remote Execution in PyCharm

Editing and Running Python Code on a Remote Server in PyCharm

Recently I was looking at an option to run some code on a remote server, while editing it locally. This time on AWS, but in general the ability to do so on any remote server would be nice. I found that PyCharm has this nice option to use a Python SSH interpreter. Give it some SSH credentials, and point it to the Python interpreter on the remote machine, and you should be ready to go. Nice pic about it:

[Image: Overview]

Sounds cool, and actually works really well. Even supports debugging. A related issue I ran into for pipenv also mentions profiling, pip package management, etc. Great. No, I haven’t tried all the advanced stuff yet, but at least the basics worked great.

Basic Remote Use

I made this simple program to test this feature:

print("hello world")
with open("bob.txt", "w") as bob:
    bob.write("hello.txt")

print("oops")

The point is to print text to the console and create a file. I am looking to see that running this remotely will show me the prints locally, and create the file remotely. This would confirm to me that the execution happens remotely, while I edit, control execution, and see the results locally.

Running this locally prints "hello world" followed by "oops", and a file named "bob.txt" appears (containing the text "hello.txt"). Great.
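Another way to tell where the code runs, as an optional addition to the test program above, is to print the host name and platform. This is just a sketch using the standard library, not something PyCharm requires:

```python
import platform
import socket

# The host name shows which machine the interpreter is actually running on.
host = socket.gethostname()
# The platform string gives a second hint (OS and kernel of that machine).
system_info = platform.platform()

print("running on host:", host)
print("platform:", system_info)
```

Run locally, this should show my own machine; run through a remote interpreter, it should show the remote server instead.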

To try remotely, I need to set up a remote Python interpreter in PyCharm. This can be done via project preferences:

[Image: Add interpreter]

Or by clicking the interpreter in the status bar:

[Image: Statusbar interpreter]

On a local configuration this shows the Python interpreter (or pipenv etc.) on my computer. In remote configuration it asks for many options such as remote server IP and credentials. All the run/debugging traffic between local and remote machines is then automatically transferred over SSH tunnels by PyCharm. To start, select SSH interpreter as type when adding new interpreter:

[Image: SSH interpreter]

Just enter the remote IP/URL address and username. Click next to also enter the password/keyfile. PyCharm will try to connect and check that this all works. On the final page of the remote interpreter dialog, it asks for the interpreter path:

[Image: Remote Python config]

This is referring to the python executable on the remote machine. A simple which python3 does the trick. This works to run the code using the system python on the remote machine.
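If a Python prompt is handier than a shell on the remote machine, roughly the same lookup can be sketched with the standard library. The name "python3" is taken from the command above; whether it is on PATH depends on the machine:

```python
import shutil
import sys

# Rough equivalent of running "which python3" in a shell: search PATH.
found = shutil.which("python3")
print("python3 on PATH:", found)  # None if python3 is not on PATH

# The interpreter that is executing this script right now.
print("current interpreter:", sys.executable)
```

Either path can be pasted into the interpreter path field, as long as it is the path on the remote machine and not the local one.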

To run this remote configuration, I just press the run button as usual in PyCharm. With this, PyCharm uploads my project files to the remote server over SSH, starts the interpreter there for the given configuration, and transports back to my local host the console output of the execution. For me it looks exactly the same as running it locally. This is the output of running the above configuration:

ssh://ec2-user@18.195.211.65:22/usr/bin/python3 -u /tmp/pycharm_project_411/hello_world.py
hello world
oops

The first line shows some useful information. It shows that it is using the SSH interpreter with the given IP and username, with the configured Python path. It also shows the directory where it has uploaded my project files, in this case "/tmp/pycharm_project_411". This is the path defined in the Project Interpreter settings, in the Path Mappings part, as shown in the remote interpreter image earlier in this post (the one with too many red arrows). That image shows a different number, since I was playing with different projects, but the idea is the same. To see the files and output:

[ec2-user@ip-172-31-3-125 ~]$ cd /tmp/pycharm_project_411/
[ec2-user@ip-172-31-3-125 pycharm_project_411]$ ls
bob.txt  hello_world.py

This is the file listing from the remote server. PyCharm has uploaded the "hello_world.py" file, since this was the only file I had in my project (under the project root as configured for sync in path mappings). There is a separate tab in PyCharm to see these uploads:

[Image: Remote synch]

After syncing the files, PyCharm has executed the configuration on the remote host, which was defined to run the hello_world.py file. This execution has created the file "bob.txt" as it should (on the remote host). The output files go in this remote target directory, as it is the working directory for the running Python program.
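The working directory part can be checked from inside the program as well. A small sketch, using the same relative file name as the test program:

```python
from pathlib import Path

# A relative path is interpreted against the current working directory,
# which for a remote run is the uploaded project directory.
target = Path("bob.txt").absolute()

print("working directory:", Path.cwd())
print("bob.txt ends up at:", target)
```

For the remote run described above, I would expect both lines to point under the uploaded project directory on the server.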

Another direction to synchronize is from the remote host to local. Since PyCharm provides intelligent coding assistance and navigation on local system, it needs to know and install the libraries used by the executed code. For this reason, it installs all the packages installed in the remote host Python environment. Something to keep in mind. I suppose it must install some type of a local virtual environment for this. Haven’t needed to look deeper on that yet.

Using a Remote Pipenv

The above discusses the usage of the standard Python run configuration and interpreter. Something I have found useful for Python environments is pipenv.

So can we also do a remote execution of a remote pipenv configuration? The issue I linked earlier contains solutions and discussion on this. Basically, the answer is, yes we can. We just have to find the pipenv files on the remote host and configure the right one as the remote interpreter.

For more complex environments, such as those set up with pipenv, a bit more is required. The issue I linked before had some actual instructions on how to do this:

[Image: Remote pipenv config]

I made a directory "t" on the remote host, and initialized pipenv there. Installed a few dependencies. So:

  • mkdir t
  • cd t
  • pipenv install pandas

And there we have the basic pipenv setup on the remote host. To find the pipenv dir on remote host (t is the dir where pipenv was created above):

[ec2-user@ip-172-31-3-125 t]$ pipenv --venv
/home/ec2-user/.local/share/virtualenvs/t-x5qHNh_c

To see what it contains:

[ec2-user@ip-172-31-3-125 t]$ ls /home/ec2-user/.local/share/virtualenvs/t-x5qHNh_c
bin  include  lib  lib64  src
[ec2-user@ip-172-31-3-125 t]$ ls /home/ec2-user/.local/share/virtualenvs/t-x5qHNh_c/bin
activate       activate.ps1      chardetect        pip     python     python-config
activate.csh   activate_this.py  easy_install      pip3    python3    wheel
activate.fish  activate.xsh      easy_install-3.7  pip3.7  python3.7

To get python interpreter name:

[ec2-user@ip-172-31-3-125 t]$ pipenv --py
/home/ec2-user/.local/share/virtualenvs/t-x5qHNh_c/bin/python

This is just a link to python3:

[ec2-user@ip-172-31-3-125 t]$ ls -l /home/ec2-user/.local/share/virtualenvs/t-x5qHNh_c/bin/python
lrwxrwxrwx 1 ec2-user ec2-user 7 Nov  7 20:55 /home/ec2-user/.local/share/virtualenvs/t-x5qHNh_c/bin/python -> python3

Use that to configure this pipenv as remote executor, as shown above already:

[Image: Remote pipenv config]
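To confirm that a run really uses the pipenv environment and not the system Python, one option is to print a few values from sys. This is a generic check and not specific to pipenv or PyCharm. The real_prefix test is there because older virtualenv versions set that attribute instead of changing base_prefix:

```python
import sys

# Inside a virtual environment, sys.prefix points at the environment directory.
# Newer tools leave sys.base_prefix pointing at the original installation,
# while older virtualenv versions set sys.real_prefix instead.
in_virtualenv = hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix

print("interpreter:", sys.executable)
print("prefix:", sys.prefix)
print("inside a virtual environment:", in_virtualenv)
```

When the pipenv interpreter is configured, I would expect the interpreter and prefix lines to show the virtualenvs directory that pipenv --venv reported.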

UPDATE:

Besides automated sync, I found the PyCharm IDE has features for manual upload to / download from the remote server. Seems quite useful.

First of all, the root of the remote deployment dir is defined in Deployment Configuration / Root Path. Under Deployment / Options, you can also disable the automated remote sync. Just set "Update changed files automatically to the default server" to "never". Here I have set the root dir to "/home/ec2-user". Which means the temp directory I discussed above actually is created under /home/ec2-user/tmp/pycharm_project_703/…

[Image: Deployment config]

With the remote configuration defined, you can now view files on the remote server. First of all, enable View -> Tool Windows -> Remote Host. This opens up the Remote Host view on the right hand side of the IDE window. The following shows a screenshot of the PyCharm IDE with this window open. The popup window (as also shown) lets you download/upload files between the remote host and the localhost:

[Image: Deployment view]

In a similar way, we can also upload local files to the remote host using the context menu for the files:

[Image: Upload to remote]

One can also select entire folders for upload / download. The root path on the remote host used for all this is the one I discussed above (e.g., /home/ec2-user as defined higher above).

Conclusions

I haven’t used this feature on a large scale yet, but it seems very useful. The issue I keep linking discusses one option of using it to run data processing on a large desktop system from a laptop. I also find it interesting for just running experiments in parallel on a separate machine, or for using cloud infrastructure while developing.

The issue also has some discussion with potential pipenv management from PyCharm coming in 2020.1 or 2020.2 version. Just speculation, of course. But until then one can set up the virtualenv using pipenv on remote host and just use the interpreter path above to set up the SSH Interpreter. This works to run the code inside the pipenv environment.

Some issues I ran into included PyCharm apparently only keeping a single state mapping in memory for remote and local file diffs. PyCharm synchronizes files very well, and identifies changes to upload new files. But if I change the remote host address, it seems to still think it has the same delta. Not a big issue, but something to keep in mind as always.

UPDATE: The manual sync I added a description for is actually quite a nice way to bypass the issues with automated sync. Of course it is manual, and using it to upload everything all the time in a big project is not practical. But for me and my projects it has been nice so far.

That’s all.

nota..monkey?

So after a longish time I wanted to try refreshing my C programming a bit. In my quest for an interesting platform to try out on, I recently came across Network On Terminal Architecture (NOTA, http://www.notaworld.org). I figured giving this a try might be interesting so I installed it and started looking.. Wonder what will become of all this nonsense..

BTW, I wonder whoever came up with the NOTA acronym? All kinds of things come to mind, such as “not a”…