BePro (SoccerTrackv2)
- class Phase_data(data_provider='bepro', event_path=None, bp_tracking_json_paths=None, bp_tracking_xml_path=None, meta_data=None)
Load and process event and tracking data for the BePro (SoccerTrackv2) dataset. This class supports both modern JSON-based tracking sequences and legacy XML tracking files.
- Parameters:
data_provider (str) – Name of the data provider (default is ‘bepro’).
event_path (str) – File path to the BePro event CSV file.
bp_tracking_json_paths (list) – (For Match ID >= 130000) A list of paths to JSON frame data files.
bp_tracking_xml_path (str) – (For Match ID < 130000) File path to the tracker box XML data.
meta_data (str) – File path to the match metadata (JSON or XML depending on the match ID).
- Returns:
A synchronized DataFrame containing processed phase data.
- Return type:
pandas.DataFrame
Example Usage:
Depending on the Match ID, the data structure differs. The following example shows the conditional loading logic.
import os
import glob
from preprocessing import Phase_data
match_id = "130001"
data_provider = 'bepro'
base_dir = '/path/to/data'
if int(match_id) >= 130000:
# New Format (JSON)
file_pattern = os.path.join(base_dir, 'tracking_data', data_provider, match_id, f"{match_id}_*_frame_data.json")
tracking_json_paths = sorted(glob.glob(file_pattern))
meta_data = os.path.join(base_dir, 'tracking_data', data_provider, match_id, f"{match_id}_metadata.json")
event_csv_path = glob.glob(os.path.join(base_dir, 'event_data', data_provider, match_id, '*.csv'))[0]
preprocessing_df = Phase_data(
data_provider=data_provider,
bp_tracking_json_paths=tracking_json_paths,
event_path=event_csv_path,
meta_data=meta_data
).load_data()
else:
# Old Format (XML)
tracking_path = os.path.join(base_dir, 'tracking_data', data_provider, match_id, f"{match_id}_tracker_box_data.xml")
meta_data = os.path.join(base_dir, 'tracking_data', data_provider, match_id, f"{match_id}_tracker_box_metadata.xml")
event_csv_path = glob.glob(os.path.join(base_dir, 'event_data', data_provider, match_id, '*.csv'))[0]
preprocessing_df = Phase_data(
data_provider=data_provider,
bp_tracking_xml_path=tracking_path,
event_path=event_csv_path,
meta_data=meta_data
).load_data()
# Save the standardized output
output_file_path = os.path.join(base_dir, 'phase_data', data_provider, match_id, f"{match_id}_main_data.csv")
preprocessing_df.to_csv(output_file_path, index=False)
Details:
The BePro processing pipeline unifies disparate data sources into a standardized coordinate system:
Format Detection: Automatically handles JSON sequences for newer matches and XML files for legacy data.
Metadata Integration: Extracts pitch dimensions, team rosters, and player details from metadata files.
Time Synchronization: Merges tracking frames with event data using the nearest time-match logic.
- Feature Extraction:
Normalizes position coordinates.
Adds ball-specific tracking columns.
Organizes player data (id, name, position, x, y) for both home and away teams.
Downsampling: Standardizes the output to 5fps (200ms) to ensure consistency across the OpenSTARLab ecosystem.