4.4 KiB
4.4 KiB
AliExpress Order Parser
This project extracts order information from an AliExpress HTML page and stores it in a MariaDB database.
Features
- Parses AliExpress order HTML page
- Extracts order information:
- Order date (converted from French format to US format YYYY-MM-DD)
- Order number (16-digit identifier)
- Order detail URL
- Item description
- Item price (in EUR)
- Item quantity
- Item image URL
- Order total (in EUR)
- Creates MariaDB table with proper structure
- Inserts extracted data into database
Requirements
- Python 3.7+
- MariaDB or MySQL server
- Python packages (see requirements.txt)
Installation
1. Install Python dependencies
pip install -r requirements.txt
2. Setup MariaDB Database
Option A: Using SQL file
mysql -u root -p < create_database.sql
Option B: Using MySQL Workbench or phpMyAdmin
- Open the
create_database.sqlfile - Execute the SQL commands
3. Configure Database Connection
Edit parse_orders.py and update the database configuration:
DB_CONFIG = {
'host': 'localhost',
'user': 'your_username', # Change this
'password': 'your_password', # Change this
'database': 'aliexpress',
'charset': 'utf8mb4'
}
Usage
Run the parser
python parse_orders.py
The script will:
- Parse the
Commandes.htmfile - Extract all order information
- Create the database table if it doesn't exist
- Insert all extracted orders into the database
Database Structure
Table: items
| Column | Type | Description |
|---|---|---|
| id | INT | Auto-increment primary key |
| orderDate | DATE | Order date (YYYY-MM-DD) |
| orderNumber | VARCHAR(20) | 16-digit order number |
| orderURL | VARCHAR(500) | URL to order detail page |
| itemDesc | TEXT | Item description |
| itemPrice | DECIMAL(10,2) | Item unit price in EUR |
| itemQuantity | INT | Quantity ordered |
| itemImageURL | VARCHAR(500) | URL to item image |
| orderTotal | DECIMAL(10,2) | Total order price in EUR |
| created_at | TIMESTAMP | Record creation timestamp |
| updated_at | TIMESTAMP | Record update timestamp |
Example Queries
View all orders
SELECT * FROM items ORDER BY orderDate DESC;
View orders by date range
SELECT * FROM items
WHERE orderDate BETWEEN '2025-12-01' AND '2026-01-31';
Get specific order details
SELECT * FROM items
WHERE orderNumber = '3066436169351201';
Calculate total spending
SELECT SUM(orderTotal) as total_spent
FROM items;
Count orders by month
SELECT
DATE_FORMAT(orderDate, '%Y-%m') as month,
COUNT(DISTINCT orderNumber) as order_count,
SUM(orderTotal) as monthly_total
FROM items
GROUP BY month
ORDER BY month DESC;
Get items with price above 10 EUR
SELECT orderNumber, itemDesc, itemPrice, itemQuantity
FROM items
WHERE itemPrice > 10.00
ORDER BY itemPrice DESC;
Data Extraction Details
Date Conversion
French dates like "3 janv. 2026" are converted to US format "2026-01-03"
Supported French month abbreviations:
- janv. → 01, févr. → 02, mars → 03, avr. → 04
- mai → 05, juin → 06, juil. → 07, août → 08
- sept. → 09, oct. → 10, nov. → 11, déc. → 12
Price Conversion
French prices like "1,29€" are converted to decimal 1.29
Quantity Extraction
Quantity strings like "x1", "x2" are converted to integers 1, 2
Troubleshooting
Database Connection Error
- Verify MariaDB is running
- Check username and password in DB_CONFIG
- Ensure database 'aliexpress' exists
Parsing Error
- Verify the HTML file path is correct
- Check that Commandes.htm is in the correct location
- Ensure the HTML structure matches the expected format
Character Encoding Issues
- The script uses UTF-8 encoding for both file reading and database
- Ensure your MariaDB database uses utf8mb4 charset
Files
parse_orders.py- Main Python script to parse HTML and insert into databasecreate_database.sql- SQL script to create database and tablerequirements.txt- Python package dependenciesCommandes.htm- Source HTML file (from AliExpress)README.md- This file
License
This project is for personal use.