# AliExpress Order Parser

This project extracts order information from an AliExpress HTML page and stores it in a MariaDB database.

## Features

- Parses an AliExpress order HTML page
- Extracts order information:
  - Order date (converted from the French format to ISO 8601, YYYY-MM-DD)
  - Order number (16-digit identifier)
  - Order detail URL
  - Item description
  - Item price (in EUR)
  - Item quantity
  - Item image URL
  - Order total (in EUR)
- Creates a MariaDB table with the proper structure
- Inserts the extracted data into the database

## Requirements

- Python 3.7+
- MariaDB or MySQL server
- Python packages (see `requirements.txt`)

## Installation

### 1. Install Python dependencies

```bash
pip install -r requirements.txt
```

### 2. Set Up the MariaDB Database

Option A: Using the SQL file

```bash
mysql -u root -p < create_database.sql
```

Option B: Using MySQL Workbench or phpMyAdmin

- Open the `create_database.sql` file
- Execute the SQL commands

### 3. Configure Database Connection

Edit `parse_orders.py` and update the database configuration:

```python
DB_CONFIG = {
    'host': 'localhost',
    'user': 'your_username',      # Change this
    'password': 'your_password',  # Change this
    'database': 'aliexpress',
    'charset': 'utf8mb4'
}
```

## Usage

### Run the parser

```bash
python parse_orders.py
```

The script will:

1. Parse the `Commandes.htm` file
2. Extract all order information
3. Create the database table if it doesn't exist
4. Insert all extracted orders into the database

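Step 4 can be sketched as one batched, parameterized `INSERT`. This is a minimal sketch, not the code in `parse_orders.py`: it assumes a DB-API 2.0 driver that uses `%s` placeholders (PyMySQL and mysql-connector-python both do) and that `id`, `created_at` and `updated_at` are filled in by the database. `insert_items` and `COLUMNS` are hypothetical names.

```python
from typing import Any, Iterable, Sequence

# Column order follows the `items` table in this README; id, created_at and
# updated_at are assumed to be generated by the database.
COLUMNS = (
    "orderDate", "orderNumber", "orderURL", "itemDesc",
    "itemPrice", "itemQuantity", "itemImageURL", "orderTotal",
)

# One %s placeholder per column; values are bound by the driver.
INSERT_SQL = (
    f"INSERT INTO items ({', '.join(COLUMNS)}) "
    f"VALUES ({', '.join(['%s'] * len(COLUMNS))})"
)


def insert_items(conn: Any, rows: Iterable[Sequence[Any]]) -> int:
    """Insert rows (in COLUMNS order) in one batch; return how many were sent.

    `conn` is any DB-API 2.0 connection (hypothetical usage, e.g. PyMySQL).
    The batch is committed on success and rolled back on any error.
    """
    batch = [tuple(row) for row in rows]
    cursor = conn.cursor()
    try:
        cursor.executemany(INSERT_SQL, batch)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()
    return len(batch)
```

Passing values as parameters, instead of formatting them into the SQL string, keeps quotes or apostrophes in item descriptions from breaking the statement.
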
## Database Structure

### Table: `items`

| Column | Type | Description |
|--------|------|-------------|
| id | INT | Auto-increment primary key |
| orderDate | DATE | Order date (YYYY-MM-DD) |
| orderNumber | VARCHAR(20) | 16-digit order number |
| orderURL | VARCHAR(500) | URL to order detail page |
| itemDesc | TEXT | Item description |
| itemPrice | DECIMAL(10,2) | Item unit price in EUR |
| itemQuantity | INT | Quantity ordered |
| itemImageURL | VARCHAR(500) | URL to item image |
| orderTotal | DECIMAL(10,2) | Total order price in EUR |
| created_at | TIMESTAMP | Record creation timestamp |
| updated_at | TIMESTAMP | Record update timestamp |

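For readers extending the script, the table's columns can be mirrored by a record type such as the hypothetical `ItemRow` below; it is an illustration, not a type taken from `parse_orders.py`. Field names deliberately keep the camelCase column names instead of Python's usual snake_case so that each field maps one-to-one onto a column, and `id`, `created_at` and `updated_at` are left out on the assumption that the database fills them. The 16-digit check follows the order-number format described in this README.

```python
import re
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class ItemRow:
    """One row of the `items` table (hypothetical helper type).

    id, created_at and updated_at are omitted: they are assumed to be
    generated by the database.
    """

    orderDate: date
    orderNumber: str      # a string, matching the VARCHAR(20) column
    orderURL: str
    itemDesc: str
    itemPrice: Decimal    # unit price in EUR
    itemQuantity: int
    itemImageURL: str
    orderTotal: Decimal   # total of the whole order in EUR

    def __post_init__(self) -> None:
        # Order numbers are described in this README as 16-digit identifiers.
        if not re.fullmatch(r"\d{16}", self.orderNumber):
            raise ValueError(f"unexpected order number: {self.orderNumber!r}")
        if self.itemQuantity < 1:
            raise ValueError(f"quantity must be positive: {self.itemQuantity}")
```

Using `Decimal` for the two price fields keeps values exact, matching the `DECIMAL(10,2)` column type.
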
## Example Queries

### View all orders

```sql
SELECT * FROM items ORDER BY orderDate DESC;
```

### View orders by date range

```sql
SELECT * FROM items
WHERE orderDate BETWEEN '2025-12-01' AND '2026-01-31';
```

### Get specific order details

```sql
SELECT * FROM items
WHERE orderNumber = '3066436169351201';
```

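### Calculate total spending

Because `items` holds one row per item and `orderTotal` is the total of the whole order, summing `orderTotal` directly would count an order with several items once per item. The subquery keeps one row per order first.

```sql
-- Deduplicate first: orderTotal is repeated on every item row of an order
SELECT SUM(orderTotal) AS total_spent
FROM (SELECT orderNumber, MAX(orderTotal) AS orderTotal
      FROM items GROUP BY orderNumber) AS orders;
```

### Count orders by month

```sql
SELECT
    DATE_FORMAT(orderDate, '%Y-%m') AS month,
    COUNT(*) AS order_count,
    SUM(orderTotal) AS monthly_total
FROM (
    -- One row per order, so each orderTotal is counted once
    SELECT orderNumber, MIN(orderDate) AS orderDate, MAX(orderTotal) AS orderTotal
    FROM items
    GROUP BY orderNumber
) AS orders
GROUP BY month
ORDER BY month DESC;
```
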
### Get items with price above 10 EUR

```sql
SELECT orderNumber, itemDesc, itemPrice, itemQuantity
FROM items
WHERE itemPrice > 10.00
ORDER BY itemPrice DESC;
```

## Data Extraction Details

### Date Conversion

French dates such as "3 janv. 2026" are converted to ISO 8601 format, "2026-01-03".

Supported French month abbreviations:

- janv. → 01, févr. → 02, mars → 03, avr. → 04
- mai → 05, juin → 06, juil. → 07, août → 08
- sept. → 09, oct. → 10, nov. → 11, déc. → 12

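A minimal sketch of the date conversion, assuming the day, month abbreviation and year are separated by whitespace (regular or non-breaking spaces both work, since `str.split()` with no argument splits on any Unicode whitespace). `parse_french_date` and `FRENCH_MONTHS` are hypothetical names and not necessarily what `parse_orders.py` uses.

```python
from datetime import date

# French month abbreviations from the list above, mapped to month numbers.
FRENCH_MONTHS = {
    "janv.": 1, "févr.": 2, "mars": 3, "avr.": 4,
    "mai": 5, "juin": 6, "juil.": 7, "août": 8,
    "sept.": 9, "oct.": 10, "nov.": 11, "déc.": 12,
}


def parse_french_date(text: str) -> str:
    """Convert a date such as '3 janv. 2026' to ISO 8601 ('2026-01-03')."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError(f"unexpected date format: {text!r}")
    day, month_abbr, year = parts
    month = FRENCH_MONTHS.get(month_abbr.lower())
    if month is None:
        raise ValueError(f"unknown French month: {month_abbr!r}")
    # date() validates the day, e.g. it rejects 31 February.
    return date(int(year), month, int(day)).isoformat()
```

Building a `datetime.date` before formatting means an impossible day such as "31 févr. 2026" raises `ValueError` instead of reaching the `DATE` column.
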
### Price Conversion

French prices such as "1,29€" are converted to the decimal value 1.29.

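A sketch of the price conversion, assuming prices use a decimal comma and may carry a euro sign and space-like thousands separators. The thousands-separator handling is an assumption: this README only shows "1,29€". Returning `Decimal` rather than `float` keeps the value exact for the `DECIMAL(10,2)` columns. `parse_french_price` is a hypothetical name.

```python
from decimal import Decimal, InvalidOperation


def parse_french_price(text: str) -> Decimal:
    """Convert a French-formatted price such as '1,29€' to Decimal('1.29')."""
    cleaned = text.replace("€", "")
    # Drop space-like thousands separators (assumed, e.g. "1 234,56 €"):
    # regular space, no-break space and narrow no-break space.
    for space in (" ", "\xa0", "\u202f"):
        cleaned = cleaned.replace(space, "")
    # The French decimal comma becomes a decimal point.
    cleaned = cleaned.replace(",", ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"unexpected price format: {text!r}") from None
```
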
### Quantity Extraction

Quantity strings such as "x1" and "x2" are converted to the integers 1 and 2.

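The quantity step can be sketched with a regular expression that accepts "x1", "x2" and so on. Tolerating surrounding whitespace and an uppercase "X" are assumptions beyond the examples given here, and `parse_quantity` is a hypothetical name.

```python
import re

# "x" followed by digits, optionally padded with whitespace; case-insensitive.
_QUANTITY_RE = re.compile(r"\s*x\s*(\d+)\s*", re.IGNORECASE)


def parse_quantity(text: str) -> int:
    """Convert a quantity string such as 'x2' to the integer 2."""
    match = _QUANTITY_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unexpected quantity format: {text!r}")
    return int(match.group(1))
```

Using `fullmatch` rather than `search` makes the function reject strings that merely contain an "x" and a number, so unexpected page content fails loudly instead of producing a wrong quantity.
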
## Troubleshooting

### Database Connection Error

- Verify that MariaDB is running
- Check the username and password in `DB_CONFIG`
- Ensure the `aliexpress` database exists

### Parsing Error

- Verify that the HTML file path is correct
- Check that `Commandes.htm` is in the expected location
- Ensure the HTML structure matches the expected format

### Character Encoding Issues

- The script uses UTF-8 encoding both for reading the HTML file and for the database connection
- Ensure your MariaDB database uses the `utf8mb4` charset

## Files

- `parse_orders.py` - Main Python script that parses the HTML and inserts the data into the database
- `create_database.sql` - SQL script that creates the database and table
- `requirements.txt` - Python package dependencies
- `Commandes.htm` - Source HTML file (from AliExpress)
- `README.md` - This file

## License

This project is for personal use.