Movies – Recently Watched

Pi (1998) :

Darren Aronofsky has never disappointed you, has he? Requiem for a Dream, Black Swan, The Fountain, etc. are some of the movies he has directed. Pi is a nice movie to watch. Shot in black and white, it is about a geeky mathematician trying to figure out a pattern that has some deeper meaning to it. The movie is very intense, and the paranoia of the main character is portrayed really well. Overall, it’s a good movie.

Jacob’s Ladder (1990):

Sometimes you need a dose of mind-bending cinema, and this movie fits the bill. You get an intelligent, nail-biting plot with so many twists and turns that it makes you think about a lot of possibilities. The movie is weird, creepy and unique. A must watch!

Leaving Las Vegas (1995):

I had never thought that Nicolas Cage, an actor with so many pathetic movies to his name, would ever get an Oscar for his acting. But, boy, has he acted in this movie. What a fantastic flick! The chemistry between Cage and his co-star, Elisabeth Shue, is quite amazing. They seem so natural in their roles, and the emotions and vulnerability of the characters come across as genuine and flawless.

Like Father, Like Son (2013): 

These Japanese directors keep things very simple, and that is what sets them apart. The movie has a very unexpected plot which I should not spoil. Just watch this beautiful film and let the simplicity of its characters overwhelm you.

Django Update 1.6 to 1.9 – 1.8 to 1.9

Upgrading Django from 1.8 to 1.9 was relatively easy, as the main pain of upgrading DRF had already been dealt with while upgrading Django from 1.6 to 1.8, which has been discussed here. A lot of libraries had to be updated, which I discovered as and when I tried running the application.

One of the main issues I had during the whole upgrade was creating the migrations from scratch. The project had been using South for managing migrations, but since Django now has built-in migration support, I removed South and all the existing migration files. There were some circular dependency issues when creating the new migrations. You may get an error like the one below.

django.db.migrations.graph.CircularDependencyError: partner.0001_initial, address.0001_initial, users.0001_initial

The reason was some of the foreign keys. Django internally builds a graph of migration dependencies, and foreign keys that point across apps in both directions can make that graph circular. You can get rid of the error by removing the offending foreign keys temporarily, creating and running the migrations, and then adding the keys back.
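A minimal sketch of that workaround, assuming a hypothetical Partner model in the partner app that points to users.User:

# partner/models.py -- hypothetical model, only to illustrate the workaround
from django.db import models


class Partner(models.Model):
    name = models.CharField(max_length=255)
    # step 1: temporarily comment out the foreign key that closes the cycle
    # owner = models.ForeignKey('users.User', null=True)

# step 2: create and apply the initial migrations without the cycle
#     python manage.py makemigrations partner users address
#     python manage.py migrate
# step 3: restore the foreign key and generate a follow-up migration for it
#     python manage.py makemigrations partner
#     python manage.py migrate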

The following libraries had to be updated; the versions mentioned are the ones I am using with Django 1.9:

  • Django Reversion 1.10.0
  • Django Tables2 1.0.5
  • Django MPTT 0.8.0
  • Django Celery 3.2.1
  • Django Extensions 1.7.3
  • Django Haystack 2.5.1
  • Django Redis Cache 1.7.1
  • Django Redis Sessions 0.5.6

There are settings changes around the django-pipeline library. You can find the changes required on the library’s documentation page. The new settings look like this:

"STATICFILES_FINDERS - 'pipeline.finders.PipelineFinder', 
 PIPELINE = {
    'PIPELINE_ENABLED': True,
    'JAVASCRIPT': {
        'stats': {
            'source_filenames': (
              'js/jquery.js',
              'js/d3.js',
              'js/collections/*.js',
              'js/application.js',
            ),
            'output_filename': 'js/stats.js',
        }
    }
}"

Other than this, there were some changes introduced in Django 1.9 for which I had to update some import statements. Some frequent Django and third-party library issues:

Issue:

from django.db.models.loading import get_models
ImportError: No module named loading

Solution:

from django.apps import apps

Issue:

from django.utils.importlib import import_module
ImportError: No module named importlib

Solution:

from importlib import import_module

Issue:

class ProductAdmin(reversion.VersionAdmin):
AttributeError: 'module' object has no attribute 'VersionAdmin'

Solution:

from reversion.admin import VersionAdmin

Issue:

@reversion.register
NameError: name 'reversion' is not defined

Solution:

from reversion import revisions

Issue:

django.utils.log.NullHandler class not found

Solution:

Use logging.NullHandler
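In practice this is a one-line change in the LOGGING settings; a minimal sketch, assuming a handler named 'null':

LOGGING = {
    'version': 1,
    'handlers': {
        'null': {
            # 'class': 'django.utils.log.NullHandler',  # removed in Django 1.9
            'class': 'logging.NullHandler',
        },
    },
}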

This is pretty much it. A Django upgrade is a task that needs a lot of patience more than anything else. Happy upgrading.

Django Update 1.6 to 1.9 – 1.6 to 1.8 and DRF Upgrade

This is going to be a series of posts briefly describing something which turned my life upside down for a few weeks. I am still in the process, as of now. A Django upgrade teaches you a lot of things, and when you are updating from a very old version to a relatively new one, it kills you. You are in for an extremely bumpy ride that you will remember for a very, very long time.

The first and most difficult part was upgrading the Django Rest Framework. I did not think this one was going to be such a biggie. We were on DRF 2.3.13 and, without thinking much, I took it straight to 3.4.0, and that was the first mistake I made. All the test cases failed miserably and I started fixing them one by one. Some of them seemed self-explanatory, and there were some issues where even Google failed to help. I struggled with these errors and wasted two days. When nothing moved, I decided to move to 3.0.5 first and then take it up through the later versions.

LESSON LEARNT – When upgrading a library, take it just one major version up at a time, and read the changelog carefully.

Upgrading to DRF 3.0.5 was relatively easy. There were some serializer issues which were mostly straightforward, and a few deprecations as well. Then I took it to 3.1.1, which did not throw any extra errors, then to 3.2.1, 3.3.1 and finally 3.4.5. Meanwhile, I had upgraded Django to 1.8, which I shall discuss in the next post.

Some of the errors which frequently showed up are as below.

NotImplementedError: `request.QUERY_PARAMS` has been deprecated in favor of `request.query_params` since version 3.0, 
and has been fully removed as of version 3.2.

For the above, change QUERY_PARAMS to query_params. Simple googling can help here.
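For example, with a hypothetical page parameter:

page = request.QUERY_PARAMS.get('page')   # older, removed in DRF 3.2
page = request.query_params.get('page')   # newer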

NotImplementedError: Field.to_representation() must be implemented for field id. 
If you do not need to support write operations you probably want to subclass `ReadOnlyField` instead.

This is one of the basic changes that you have to make. It is basically asking you to change the field like this:

order_id = serializers.Field(source='pk') # older
order_id = serializers.ReadOnlyField(source='pk') # newer

AssertionError: It is redundant to specify `source='discount_amount'` on field 'Field' in serializer 'OrderListSerializer',
because it is the same as the field name. Remove the `source` keyword argument.

This error comes when one of the serializer fields has the same name as its source.

discount_amount = serializers.ReadOnlyField(source='discount_amount') # older
discounted_amount = serializers.ReadOnlyField(source='discount_amount') # newer

Also, please take serializers that serialize a list very seriously. The parameter many=True solves so many issues around them.
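For instance, when serializing a queryset (the names below are hypothetical, borrowed from the error messages in this post):

orders = Order.objects.all()
serializer = OrderListSerializer(orders, many=True)  # many=True for lists and querysets
data = serializer.data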

Some of the other issues which appeared are below; I am sorry to have forgotten how they were fixed. I am sure that if you have made it this far, you will find them easy to fix.

AttributeError: Got AttributeError when attempting to get a value for field `product` on serializer `ProductListSerializer`.
The serializer field might be named incorrectly and not match any attribute or key on the `RelatedManager` instance. 
Original exception text was: 'RelatedManager' object has no attribute 'product'
AttributeError: 'RelatedManager' object has no attribute
AttributeError: 'QuerySet' object has no attribute 'user'

Other changes include:

  • The earlier implementation, where you had to define a custom to_native method, has changed to to_representation:
class FinalRecordField(serializers.WritableField):

    def to_native(self, value):
        return value

to

class FinalStockRecordField(serializers.Serializer):

    def to_representation(self, instance):
        return instance
  • The library djangorestframework-xml had to be included, as the XML renderer is no longer bundled with DRF. I used version 1.3.0.

This is what I mostly remember. Try to be patient with the update, as it can be a frustrating affair. I am happily on DRF version 3.4.5 now. The next article will be on the Django upgrade from 1.6 to 1.8.

Django Cache Busting

Browsers cache images, stylesheets and JavaScript; it’s their default nature. They do not want to fetch the same file again and again from the server, because they are smart. But sometimes, when you have changed one of the JavaScript files in your app, this feature can bite you in the back. You have made some changes in your js file, but they are not reflected in the browser. Clear the browser cache and it works. Ask your clients to do this and they will be super furious. So, what are the options here?

Browsers are forced to fetch the latest file from the server when the name of the requested file changes. Versioning the file and tagging it with a new version every time a change is made works just fine. By adding a new version, you ask the browser to fetch the new file. Something like appending this:

?version=0.0.1 # current version

But there is one pain here – changing the version each time you make some changes in the js file. You can easily forget to bump the version, and then the changes in the js files are not reflected. There are ways to automate this. I tried implementing the same, taking help from one of the GitHub projects.

The idea is the following:

  • Write something which overrides the existing static tag, so that you serve the files with the name handling you want. Get an idea about custom template tags from here: Custom template tags and filters
  • The render function of the custom tag class is where you add the implementation. The most general approach is to append a timestamp to the filename, so that whenever the file is changed, it is tagged with the timestamp of its last edit.
  • Use the custom tag instead of the default staticfiles tag.

Here goes the implementation part. The custom tag class is as below.

from django import template
from django.conf import settings

import posixpath
import datetime
import urllib
import os

try:
    from django.contrib.staticfiles import finders
except ImportError:
    finders = None

register = template.Library()


@register.tag('static')
def do_static(parser, token):
    """
    overwriting the default tag here
    """
    return CacheBusterTag(token, False)


class CacheBusterTag(template.Node):
    def __init__(self, token, is_media):
        self.is_media = is_media

        try:
            tokens = token.split_contents()
        except ValueError:
            raise template.TemplateSyntaxError("'%r' tag must have one or two arguments" % token.contents.split()[0])

        self.path = tokens[1]
        self.force_timestamp = len(tokens) == 3 and tokens[2] or False

    def render(self, context):
        """
        rendering the url with modification
        """
        try:
            path = template.Variable(self.path).resolve(context)
        except template.VariableDoesNotExist:
            path = self.path

        path = posixpath.normpath(urllib.unquote(path)).lstrip('/')
        url_prepend = getattr(settings, "STATIC_URL", settings.MEDIA_URL)

        if settings.DEBUG and finders:
            absolute_path = finders.find(path)
        else:
            absolute_path = os.path.join(getattr(settings, 'STATIC_ROOT', settings.MEDIA_ROOT), path)

        unique_string = self.get_file_modified(absolute_path)
        return url_prepend + path + '?' + unique_string

    @staticmethod
    def get_file_modified(path):
        """
        get the last modified time of the file
        """
        try:
            return datetime.datetime.fromtimestamp(os.path.getmtime(os.path.abspath(path))).strftime('%S%M%H%d%m%y')
        except Exception:
            return '000000000000'

Now comes the view that serves the static files.

from django.http import Http404
from django.contrib.staticfiles.views import serve as django_staticfiles_serve

"""
Views and functions for serving static files.
"""

def static_serve(request, path, document_root=None):
    try:
        return django_staticfiles_serve(request, path, document_root)
    except Http404:
        # treat the first path segment as a cache-busting prefix, strip it and retry
        unique_string, new_path = path.split("/", 1)
        return django_staticfiles_serve(request, new_path, document_root)

In the base URLs file where you call Django’s default serve method, change it to the following:

urlpatterns = patterns('',
                       url(r'^static/(?P<path>.*)$', 'cachebuster.views.static_serve', {'document_root': STATIC_URL}),
                       .....
)

Now load the custom tag library in your templates so that static files go through the overridden tag.

{% load cachebuster %}
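Since the library registers its tag under the name static, the usual tag keeps working once the library is loaded; for example (js/application.js is a hypothetical file):

<script src="{% static 'js/application.js' %}"></script>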

That’s it. This works for me. So, cheers.

Django Api Throttling

There are cases when you do not want your clients to bombard some APIs. Django Rest Framework gives you out-of-the-box support for controlling how many times your APIs can be hit. It gives you options to control the number of hits per second, per minute, per hour and per day, beyond which the client gets a 429 status. For storing the counts, the framework uses the default cache configured for the application.

CACHES = {
    "default": {
        "BACKEND": "redis_cache.cache.RedisCache",
        "LOCATION": "redis.cache.amazonaws.com:6379",
        "OPTIONS": {
            "DB": 0,
            "CLIENT_CLASS": "redis_cache.client.DefaultClient",
        }
    }
}

Your MIDDLEWARE_CLASSES in settings.py looks like this:

MIDDLEWARE_CLASSES = (
    '.......'
    'custom.throttling.ThrottleMiddleWare', # the custom class to control throttling limits
)

In the REST_FRAMEWORK settings in settings.py, we need to mention the rates and the classes that handle throttling. DRF gives you default implementations, but you can write your own throttle classes as well; the defaults are left commented out below:

REST_FRAMEWORK = {
    'DEFAULT_THROTTLE_CLASSES': (
        'custom.throttling.PerMinuteThrottle', # custom throttle [implemented below]
        # 'rest_framework.throttling.AnonRateThrottle',
        # 'rest_framework.throttling.UserRateThrottle'
    ),
    'DEFAULT_THROTTLE_RATES': {
        'per_minute': '256/min',
    }
}

The throttle class implemented below does per-minute throttling. You can implement similar classes to fit your use case.

from rest_framework.settings import APISettings, USER_SETTINGS, DEFAULTS, IMPORT_STRINGS
from rest_framework.throttling import UserRateThrottle

api_settings = APISettings(USER_SETTINGS, DEFAULTS, IMPORT_STRINGS)

class ThrottleMiddleWare(object):
    def process_response(self, request, response):
        """
        Setting the standard rate limit headers
        :param request:
        :param response:
        :return:
        """
        response['X-RateLimit-Limit'] = api_settings.DEFAULT_THROTTLE_RATES.get('per_minute', "None")
        if 'HIT_COUNT' in request.META:
            response['X-RateLimit-Remaining'] = self.parse_rate(api_settings.DEFAULT_THROTTLE_RATES.get(
                'per_minute')) - request.META['HIT_COUNT']
        return response

    def parse_rate(self, rate):
        """
        Given the request rate string (e.g. '256/min'),
        return the allowed number of requests.
        """
        num_requests = 0
        try:
            if rate is None:
                return num_requests
            num, period = rate.split('/')
            num_requests = int(num)
        except Exception:
            pass
        return num_requests

REQUEST_METHOD_GET, REQUEST_METHOD_POST = 'GET', 'POST'

class PerMinuteThrottle(UserRateThrottle):
    scope = 'per_minute'

    def allow_request(self, request, view):
        """
        Custom implementation:
        Implement the check to see if the request should be throttled.
        On success calls `throttle_success`.
        On failure calls `throttle_failure`.
        """
        hit_count = 0

        try:
            if request.user.is_authenticated():
                user_id = request.user.pk
            else:
                user_id = self.get_ident(request)
            request.META['USER_ID'] = user_id

            if str(request.method).upper() == REQUEST_METHOD_POST:
                return True

            if self.rate is None:
                return True

            self.key = self.get_cache_key(request, view)
            if self.key is None:
                return True

            self.history = self.cache.get(self.key, [])
            self.now = self.timer()

            # Drop any requests from the history which have now passed the
            # throttle duration

            duration = self.now - self.duration
            while self.history and self.history[-1] <= duration:
                self.history.pop()
            
            hit_count = len(self.history)
            request.META['HIT_COUNT'] = hit_count + 1
            if len(self.history) >= self.num_requests:
                request.META['HIT_COUNT'] = hit_count
                return self.throttle_failure()
            return self.throttle_success()
        except Exception:
            pass

        # in case any exception occurs - we must allow the request to go through
        request.META['HIT_COUNT'] = hit_count
        return True

When you hit the limit, you get something like this:

INFO {'status': 429, 'path': '/api/order/history/', 'content': '{detail: Request was throttled.Expected available in 16 seconds.}\n', 'method': 'GET', 'user': 100}
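Since the class is listed in DEFAULT_THROTTLE_CLASSES, it applies to every DRF view. If you would rather throttle only specific endpoints, DRF also lets you set throttle_classes per view; a small sketch with a hypothetical order history view:

from rest_framework.views import APIView
from rest_framework.response import Response

from custom.throttling import PerMinuteThrottle


class OrderHistoryView(APIView):
    # hypothetical view; the per-minute throttle applies only to this endpoint
    throttle_classes = (PerMinuteThrottle,)

    def get(self, request):
        return Response({'orders': []})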

MySQL Database Engines – InnoDB and MyISAM

A database storage engine is the component that determines how data is structured in the database, and this structure in turn determines how CRUD operations are performed. MySQL has several engines, of which InnoDB and MyISAM are the best known.

mysql> show engines;
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                        | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+
| FEDERATED          | NO      | Federated MySQL storage engine                                 | NULL         | NULL | NULL       |
| MRG_MYISAM         | YES     | Collection of identical MyISAM tables                          | NO           | NO   | NO         |
| MyISAM             | YES     | MyISAM storage engine                                          | NO           | NO   | NO         |
| BLACKHOLE          | YES     | /dev/null storage engine (anything you write to it disappears) | NO           | NO   | NO         |
| CSV                | YES     | CSV storage engine                                             | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables      | NO           | NO   | NO         |
| ARCHIVE            | YES     | Archive storage engine                                         | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, and foreign keys     | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                             | NO           | NO   | NO         |
+--------------------+---------+----------------------------------------------------------------+--------------+------+------------+

InnoDB replaced MyISAM as MySQL’s default storage engine with the 5.5 release.
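If you want to check which engine an existing table uses, or move a MyISAM table over to InnoDB, something like the following works (orders is a hypothetical table name):

mysql> SHOW TABLE STATUS LIKE 'orders';

mysql> ALTER TABLE orders ENGINE = InnoDB;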

MyISAM and InnoDB:

When we have a scenario with very few writes but heavy reads and massive querying, MyISAM fits best. It uses table locking: any write operation locks the whole table before writing. So an application which is write-heavy will suffer a lag because of the sequential writes under a table lock. When a table is locked, only one query can run against it at a time. This is where robust hardware support would be required.

MyISAM also does not support atomic transactions, which is a key feature (the A in ACID) of a database. You cannot commit and roll back with MyISAM. Because of this, data can end up in an inconsistent state and you might face serious issues. Let’s say you issue an update query which has to update a lot of rows. The table is locked and the data is being updated; in the meantime a connection error interrupts the query, and you know you are in trouble. Imagine the situation when you have to update your table frequently. MyISAM does not guarantee any data integrity. InnoDB, on the other hand, is ACID-compliant: it uses transactions and validations to ensure data does not get corrupted, and in case of a crash the database recovers itself to the last consistent state. For maintenance, MyISAM has some commands which one should run at regular intervals – OPTIMIZE, ANALYZE, CHECK, etc.

Comparing the sizes the tables take in the two storage engines, MyISAM tables take less space because of lower overhead; the data can be compressed further if it is sure to be read-only. MyISAM also does not support foreign keys as a concept. InnoDB has to do a lot of housekeeping because of its relational nature, so the data size is relatively higher.

There are very limited options available for instrumenting and tweaking MyISAM. There are some nice aspects of using MyISAM, e.g. there are no deadlocks, but it has seen very limited development in recent years. MariaDB ships Aria, a crash-safe storage engine developed as a replacement for MyISAM.

Using psycopg2 with PostgreSQL

I had been using MySQL my whole life until recently, when I got my hands dirty with PostgreSQL in one of my projects. I must say, switching to PostgreSQL has been very easy. It has some very cool and robust features, but let’s not talk about that here. When using Python, psycopg2 is one of the most widely used database adapters. It is fairly stable and has good community support. We used aiopg, which is a library for accessing a PostgreSQL database with asyncio. In this post, I will try to mention a few important things which I came across.

1. DictCursor:

dict_cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

helps in fetching data from the database as a Python dictionary, where we can easily get columns by their names. A plain cursor gives values by their index, which can sometimes be painful. Say we have to fetch a row for id = 3 from the user table, and we only need a couple of fields – name, age and gender – and do not care about the 10 other fields. Using DictCursor we can get these as:

row.get('name'), row.get('age') and row.get('gender')

against:

row[2], row[4] and row[10], where 2, 4 and 10 are the positions of the required fields

Some code:

# inside an aiopg coroutine; conn is an already opened connection
cur = yield from conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
query = """SELECT * FROM {} where user_id = %s""".format(DBOperations.TABLE_NAME)
yield from cur.execute(query, (10, ))
row = yield from cur.fetchall()
return row

2. Single insert for multiple rows:

We might want to execute multiple inserts in one query:

INSERT INTO user_address VALUES ('name1', 'address1'), ('name1', 'address2'), ('name1', 'address3')

We have to construct the string and execute the query, which can be done as below:

def set_address(self, user_id, address_ids: list):
    # build "(user_id, address_id)" tuples and join them into one VALUES list
    tup = [(user_id, aid) for aid in address_ids]
    args_str = ",".join([str(s) for s in tup])
    yield from cur.execute("INSERT INTO user_address VALUES " + args_str)
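Joining raw str(tuple) values into the SQL only works safely when the values are trusted integers; for anything else it is better to let psycopg2 escape each row. A small sketch using cursor.mogrify with plain synchronous psycopg2 (connection details are placeholders):

import psycopg2

conn = psycopg2.connect(dbname='mydb', user='me')  # placeholder connection details
cur = conn.cursor()

rows = [(10, 1), (10, 2), (10, 3)]  # (user_id, address_id) pairs
# mogrify escapes and quotes each tuple, so the final statement stays parameterized
args_str = ",".join(cur.mogrify("(%s, %s)", row).decode('utf-8') for row in rows)
cur.execute("INSERT INTO user_address VALUES " + args_str)
conn.commit()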

3. Searching in a jsonb array:

One of the cool datatypes in PostgreSQL is the jsonb array, and Postgres has made sure that querying it is easy. Sometimes we may need to search for a particular key inside the JSON objects; say a user has addresses in various cities and we need to find all users who have an address in Mumbai.

def find_user_address_by_city(cls, cur, city: str):
    # the ?| operator checks whether any of the given strings exist in the jsonb array
    array_str = "%s"
    query = """SELECT * FROM user WHERE to_json(array(SELECT jsonb_array_elements(address) ->> 'city'))::jsonb ?|
     ARRAY[{}];""".format(array_str)
    yield from cur.execute(query, (city,))
    rows = yield from cur.fetchall()
    return rows

I will try to add other things as and when I get them.