How to create an ORM in Python?


I’m working on a Python project in which I am trying to implement a very simple ORM.

So I have an SQL database with two tables:

customer:
  id: string
  name: string

order:
  id: string
  customer_id: string (foreign key)
  message: string

I have the two corresponding Python classes in a Python file:

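class Customer:
  def __init__(self, id, name, orders=None):
    self.id = id
    self.name = name
    # Use None as the default: a mutable default ([]) would be shared by all instances
    self.orders = orders if orders is not None else []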

class Order:
  def __init__(self, id, customer_id, message):
    self.id = id
    self.customer_id = customer_id
    self.message = message

I want to implement a function: get_customer_with_orders(customer_id)

  • This function takes a customer id as input.
  • It should execute the SQL query that fetches the corresponding customer with its orders.
  • Then, from the result of the SQL query, it should create the Customer instance with its Order instances.
  • Finally, it should return the Customer instance.

The function I wrote:

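from sqlalchemy import text  # assumes ENGINE is a SQLAlchemy engine

def get_customer_with_orders(customer_id):
  # Bind customer_id as a parameter instead of formatting it into the SQL string.
  # Note: "order" is a reserved word in SQL and may need quoting, depending on the database.
  sql_query = text("""
  SELECT *
  FROM customer
  LEFT JOIN order ON order.customer_id = customer.id
  WHERE customer.id = :customer_id
  """)

  with ENGINE.connect() as con:
    # Fetch all rows up front so they can be both iterated and indexed
    sql_result = con.execute(sql_query, {"customer_id": customer_id}).fetchall()

  if not sql_result:
    return None  # no customer with this id

  orders = []
  for customer_with_order in sql_result:
    # customer has 2 columns (id, name); the remaining columns belong to order
    order_related_data = customer_with_order[2:]
    if order_related_data[0] is None: continue  # customer without any order
    orders.append(Order(*order_related_data))

  customer = Customer(*sql_result[0][:2], orders=orders)

  return customer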

The issues I’m facing are:

  1. The data structure I get in sql_result variable is not normalized. It looks like this (for a customer with 3 different orders):
[
  ('customer_0_id', 'customer_0_name', 'order_0_id', 'customer_0_id', 'order_0_message'),
  ('customer_0_id', 'customer_0_name', 'order_1_id', 'customer_0_id', 'order_1_message'),
  ('customer_0_id', 'customer_0_name', 'order_2_id', 'customer_0_id', 'order_2_message'),
]

Some data is duplicated: customer_0_id and customer_0_name appear in every row.
It would be better to have something like this:

(
  'customer_0_id', 
  'customer_0_name', 
  [
    ('order_0_id', 'customer_0_id', 'order_0_message'),
    ('order_1_id', 'customer_0_id', 'order_1_message'),
    ('order_2_id', 'customer_0_id', 'order_2_message'),
  ]
)
  2. Because the data is not normalized, I have to do dirty row splitting to separate the Customer-related data from the Order-related data.
    I need to know how many attributes Customer has, and if one day I add a new attribute and change the Customer constructor, the function will break.

  3. Because the data is duplicated, I end up using more memory than needed (the sql_result variable could be smaller).
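One possible way around the positional row splitting described above is to alias every selected column with a table prefix and regroup the rows by label in Python. The sketch below is only an illustration: the `customer__` / `order__` prefixes and the helper names `split_row` and `nest_rows` are invented for this example, and the rows are hard-coded dictionaries standing in for what a database driver would return.

```python
def split_row(row, prefix):
    """Return the columns of `row` whose label starts with `prefix`, prefix stripped."""
    return {key[len(prefix):]: value for key, value in row.items() if key.startswith(prefix)}


def nest_rows(rows):
    """Group flat JOIN rows into one entry per customer with a list of orders."""
    customers = {}
    for row in rows:
        customer_data = split_row(row, "customer__")
        order_data = split_row(row, "order__")
        # Keep a single copy of the customer columns, keyed by primary key.
        entry = customers.setdefault(customer_data["id"], {**customer_data, "orders": []})
        # A LEFT JOIN fills the order columns with NULL (None) when there is no order.
        if order_data.get("id") is not None:
            entry["orders"].append(order_data)
    return list(customers.values())


# Hypothetical rows, as if selected with "customer.id AS customer__id", etc.
rows = [
    {"customer__id": "c0", "customer__name": "Alice",
     "order__id": "o0", "order__customer_id": "c0", "order__message": "first"},
    {"customer__id": "c0", "customer__name": "Alice",
     "order__id": "o1", "order__customer_id": "c0", "order__message": "second"},
    {"customer__id": "c1", "customer__name": "Bob",
     "order__id": None, "order__customer_id": None, "order__message": None},
]

nested = nest_rows(rows)
```

Because the stripped keys match the constructor argument names, something like `Order(**order_data)` would keep working when a column is added to both the query and the constructor.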

  • Does anyone know how I could make the function cleaner?
  • Is there a way to write an SQL query and get a normalized data structure back?
  • How do ORMs such as SQLAlchemy implement this link between classes and the database (especially with joins)?
  • Is there a tutorial / doc / resource about how to create an ORM in Python?
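On the question of getting a less redundant structure from SQL: a plain JOIN returns flat rows, so one commonly used alternative is to run one query per table. Below is a minimal sketch using the standard-library `sqlite3` module with an in-memory database instead of the project's `ENGINE`; the table contents and the function name `fetch_customer_and_orders` are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # rows can be read by column name
con.executescript("""
CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE "order" (id TEXT PRIMARY KEY, customer_id TEXT REFERENCES customer(id), message TEXT);
INSERT INTO customer VALUES ('c0', 'Alice');
INSERT INTO "order" VALUES ('o0', 'c0', 'first'), ('o1', 'c0', 'second');
""")


def fetch_customer_and_orders(con, customer_id):
    """Run one query per table so no customer column is repeated per order."""
    customer_row = con.execute(
        "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    if customer_row is None:
        return None
    order_rows = con.execute(
        'SELECT id, customer_id, message FROM "order" WHERE customer_id = ? ORDER BY id',
        (customer_id,),
    ).fetchall()
    return dict(customer_row), [dict(row) for row in order_rows]


result = fetch_customer_and_orders(con, "c0")
```

The trade-off is a second round trip to the database in exchange for rows without repeated customer columns.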

Note:

I checked this link:
https://levelup.gitconnected.com/how-i-built-a-simple-orm-from-scratch-in-python-18b50108cfa3
but it does not handle the join case.

I checked the SQL query executed by SQLAlchemy when I use the ORM to get entities with joined entities. It does a JOIN, but I don’t know how it instantiates the corresponding classes.
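As a rough illustration of how joined rows can be turned into objects, the sketch below uses an identity map: a dictionary keyed by primary key, so each customer is instantiated once no matter how many joined rows mention it. This is a simplified illustration of the general pattern and not SQLAlchemy's actual implementation; the `load_customers` function and the labelled row format are assumptions made for the example.

```python
class Customer:
    def __init__(self, id, name, orders=None):
        self.id = id
        self.name = name
        self.orders = orders if orders is not None else []


class Order:
    def __init__(self, id, customer_id, message):
        self.id = id
        self.customer_id = customer_id
        self.message = message


def load_customers(rows):
    """Build Customer objects (with their Order objects) from labelled JOIN rows."""
    identity_map = {}  # primary key -> the single Customer instance for that key
    for row in rows:
        customer = identity_map.get(row["customer_id"])
        if customer is None:
            customer = Customer(id=row["customer_id"], name=row["customer_name"])
            identity_map[customer.id] = customer
        if row["order_id"] is not None:  # None means the LEFT JOIN found no order
            customer.orders.append(
                Order(id=row["order_id"], customer_id=customer.id, message=row["order_message"])
            )
    return list(identity_map.values())


# Hypothetical labelled rows, one per (customer, order) pair
rows = [
    {"customer_id": "c0", "customer_name": "Alice", "order_id": "o0", "order_message": "first"},
    {"customer_id": "c0", "customer_name": "Alice", "order_id": "o1", "order_message": "second"},
]
customers = load_customers(rows)
```

The duplicated customer columns are still transferred by the query, but only one `Customer` object is kept per primary key.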



Source: https://stackoverflow.com/questions/70536650/how-to-create-an-orm-in-python
