# AliExpress Order Parser This project extracts order information from an AliExpress HTML page and stores it in a MariaDB database. ## Features - Parses AliExpress order HTML page - Extracts order information: - Order date (converted from French format to US format YYYY-MM-DD) - Order number (16-digit identifier) - Order detail URL - Item description - Item price (in EUR) - Item quantity - Item image URL - Order total (in EUR) - Creates MariaDB table with proper structure - Inserts extracted data into database ## Requirements - Python 3.7+ - MariaDB or MySQL server - Python packages (see requirements.txt) ## Installation ### 1. Install Python dependencies ```bash pip install -r requirements.txt ``` ### 2. Setup MariaDB Database Option A: Using SQL file ```bash mysql -u root -p < create_database.sql ``` Option B: Using MySQL Workbench or phpMyAdmin - Open the `create_database.sql` file - Execute the SQL commands ### 3. Configure Database Connection Edit `parse_orders.py` and update the database configuration: ```python DB_CONFIG = { 'host': 'localhost', 'user': 'your_username', # Change this 'password': 'your_password', # Change this 'database': 'aliexpress', 'charset': 'utf8mb4' } ``` ## Usage ### Run the parser ```bash python parse_orders.py ``` The script will: 1. Parse the `Commandes.htm` file 2. Extract all order information 3. Create the database table if it doesn't exist 4. Insert all extracted orders into the database ## Database Structure ### Table: `items` | Column | Type | Description | |--------|------|-------------| | id | INT | Auto-increment primary key | | orderDate | DATE | Order date (YYYY-MM-DD) | | orderNumber | VARCHAR(20) | 16-digit order number | | orderURL | VARCHAR(500) | URL to order detail page | | itemDesc | TEXT | Item description | | itemPrice | DECIMAL(10,2) | Item unit price in EUR | | itemQuantity | INT | Quantity ordered | | itemImageURL | VARCHAR(500) | URL to item image | | orderTotal | DECIMAL(10,2) | Total order price in EUR | | created_at | TIMESTAMP | Record creation timestamp | | updated_at | TIMESTAMP | Record update timestamp | ## Example Queries ### View all orders ```sql SELECT * FROM items ORDER BY orderDate DESC; ``` ### View orders by date range ```sql SELECT * FROM items WHERE orderDate BETWEEN '2025-12-01' AND '2026-01-31'; ``` ### Get specific order details ```sql SELECT * FROM items WHERE orderNumber = '3066436169351201'; ``` ### Calculate total spending ```sql SELECT SUM(orderTotal) as total_spent FROM items; ``` ### Count orders by month ```sql SELECT DATE_FORMAT(orderDate, '%Y-%m') as month, COUNT(DISTINCT orderNumber) as order_count, SUM(orderTotal) as monthly_total FROM items GROUP BY month ORDER BY month DESC; ``` ### Get items with price above 10 EUR ```sql SELECT orderNumber, itemDesc, itemPrice, itemQuantity FROM items WHERE itemPrice > 10.00 ORDER BY itemPrice DESC; ``` ## Data Extraction Details ### Date Conversion French dates like "3 janv. 2026" are converted to US format "2026-01-03" Supported French month abbreviations: - janv. → 01, févr. → 02, mars → 03, avr. → 04 - mai → 05, juin → 06, juil. → 07, août → 08 - sept. → 09, oct. → 10, nov. → 11, déc. → 12 ### Price Conversion French prices like "1,29€" are converted to decimal 1.29 ### Quantity Extraction Quantity strings like "x1", "x2" are converted to integers 1, 2 ## Troubleshooting ### Database Connection Error - Verify MariaDB is running - Check username and password in DB_CONFIG - Ensure database 'aliexpress' exists ### Parsing Error - Verify the HTML file path is correct - Check that Commandes.htm is in the correct location - Ensure the HTML structure matches the expected format ### Character Encoding Issues - The script uses UTF-8 encoding for both file reading and database - Ensure your MariaDB database uses utf8mb4 charset ## Files - `parse_orders.py` - Main Python script to parse HTML and insert into database - `create_database.sql` - SQL script to create database and table - `requirements.txt` - Python package dependencies - `Commandes.htm` - Source HTML file (from AliExpress) - `README.md` - This file ## License This project is for personal use.