This is one of my favorites.
The Problem
Finding an efficient way to test 150+ functions.
This happened while I was working on an enhancement of the Open edX platform. Open edX currently uses a custom event-tracking library to track user-generated events. These events are simple JSON objects that are fed into the Open edX Insights app.
Each user-generated event in Open edX passes through a series of processors, which are essentially Python classes or functions. Each processor adds specific data to the event, and a router finally writes the event to a log file. However, there is currently no standardized schema for these events. An event can contain arbitrary data, and the Insights app can process it however it wants. That is not an ideal scenario for accurate, real-time analytics.
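The processor-and-router flow described above can be sketched roughly as follows. This is a minimal illustration, not the Open edX event-tracking API: `add_timestamp`, `add_user`, `LogRouter` and `emit` are hypothetical names, and the router hands each serialized event to any callable sink instead of a real log file.

```python
import json
from datetime import datetime, timezone


# Hypothetical processors: each takes the event dict, adds data, and returns it.
def add_timestamp(event):
    event["time"] = datetime.now(timezone.utc).isoformat()
    return event


def add_user(username):
    # Processor factory: returns a processor that attaches the acting user
    def processor(event):
        event["username"] = username
        return event
    return processor


class LogRouter:
    """Final stage: serialize the event as one JSON string and pass it to a sink."""

    def __init__(self, sink):
        self.sink = sink  # any callable that accepts a string, e.g. logger.info

    def send(self, event):
        self.sink(json.dumps(event, sort_keys=True))


def emit(event, processors, router):
    # Run the event through every processor in order, then hand it to the router
    for process in processors:
        event = process(event)
    router.send(event)
    return event
```

Nothing in a pipeline shaped like this constrains which keys a processor adds, which is one way events end up without a shared schema.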
Our task was to standardize these logs into a consistent format, namely the Caliper Analytics Specification by IMS Global. That would enable real-time analysis of the logs and enforce a fixed structure for every event.
For example, here is an `edx.bookmark.added` event generated by Open edX:
```json
{
  "accept_language": "en-US,en;q=0.9",
  "agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
  "context": {
    "course_id": "",
    "org_id": "",
    "path": "/api/bookmarks/v1/bookmarks/",
    "user_id": 17
  },
  "event": {
    "bookmark_id": "ali,block-v1:edX+DemoX+Demo_Course+type@vertical+block@vertical_0270f6de40fc",
    "component_type": "vertical",
    "component_usage_id": "block-v1:edX+DemoX+Demo_Course+type@vertical+block@vertical_0270f6de40fc",
    "course_id": "course-v1:edX+DemoX+Demo_Course"
  },
  "event_source": "server",
  "event_type": "edx.bookmark.added",
  "host": "9c680a5d44c8",
  "ip": "172.18.0.1",
  "name": "edx.bookmark.added",
  "page": null,
  "referer": "http://localhost:18000/courses/course-v1:edX+DemoX+Demo_Course/courseware/d8a6192ade314473a78242dfeedfbf5b/edx_introduction/1?activate_block_id=block-v1%3AedX%2BDemoX%2BDemo_Course%2Btype%40vertical%2Bblock%40vertical_0270f6de40fc",
  "session": "bf5ee12841d50f244b738e79ada5d1fc",
  "time": "2018-12-07T14:24:26.143437+00:00",
  "username": "ali"
}
```
And here is the same event in Caliper-compliant format:
```json
{
  "@context": "http://purl.imsglobal.org/ctx/caliper/v1p1",
  "action": "Bookmarked",
  "actor": {
    "id": "http://localhost:18000/u/ali",
    "name": "ali",
    "type": "Person"
  },
  "eventTime": "2018-12-07T14:24:26.143Z",
  "extensions": {
    "extra_fields": {
      "accept_language": "en-US,en;q=0.9",
      "agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
      "event_source": "server",
      "event_type": "edx.bookmark.added",
      "host": "9c680a5d44c8",
      "ip": "172.18.0.1",
      "org_id": "",
      "page": null,
      "path": "/api/bookmarks/v1/bookmarks/",
      "session": "bf5ee12841d50f244b738e79ada5d1fc",
      "user_id": 17
    }
  },
  "id": "urn:uuid:d4618c23-d612-4709-8d9a-478d87808067",
  "object": {
    "id": "http://localhost:18000/courses/course-v1:edX+DemoX+Demo_Course/courseware/d8a6192ade314473a78242dfeedfbf5b/edx_introduction/1?activate_block_id=block-v1%3AedX%2BDemoX%2BDemo_Course%2Btype%40vertical%2Bblock%40vertical_0270f6de40fc",
    "type": "Page",
    "extensions": {
      "course_id": "course-v1:edX+DemoX+Demo_Course",
      "bookmark_id": "ali,block-v1:edX+DemoX+Demo_Course+type@vertical+block@vertical_0270f6de40fc",
      "component_type": "vertical",
      "component_usage_id": "block-v1:edX+DemoX+Demo_Course+type@vertical+block@vertical_0270f6de40fc"
    }
  },
  "referrer": {
    "id": "http://localhost:18000/courses/course-v1:edX+DemoX+Demo_Course/courseware/d8a6192ade314473a78242dfeedfbf5b/edx_introduction/1?activate_block_id=block-v1%3AedX%2BDemoX%2BDemo_Course%2Btype%40vertical%2Bblock%40vertical_0270f6de40fc",
    "type": "WebPage"
  },
  "type": "AnnotationEvent"
}
```
That is only one of the roughly 150 events we had to transform, and each one needed its own transformer function. Writing those functions was not the hard part. The real challenge was testing all of them efficiently.
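To give a feel for what one transformer involves, here is a partial sketch for the bookmark event above. It is an illustration under stated assumptions, not the project's code. It follows a two-step pattern: a shared base step builds the fields every event has, then a per-event function is looked up by `event_type`. `BASE_URL`, `to_caliper_time`, `base_transform`, `bookmark_added`, `EVENT_TRANSFORMERS` and `transform` are all hypothetical names. Only a subset of the Caliper fields is filled in.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical base URL; a real transformer would take this from its settings.
BASE_URL = "http://localhost:18000"


def to_caliper_time(timestamp):
    """'2018-12-07T14:24:26.143437+00:00' -> '2018-12-07T14:24:26.143Z'."""
    parsed = datetime.fromisoformat(timestamp).astimezone(timezone.utc)
    # Truncate microseconds to milliseconds and use the 'Z' (UTC) suffix
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.") + "{:03d}Z".format(parsed.microsecond // 1000)


def base_transform(event):
    """Fields shared by every Caliper event (a partial, hypothetical subset)."""
    return {
        "@context": "http://purl.imsglobal.org/ctx/caliper/v1p1",
        "id": "urn:uuid:{}".format(uuid.uuid4()),  # fresh UUID on every call
        "eventTime": to_caliper_time(event["time"]),
        "actor": {
            "id": "{}/u/{}".format(BASE_URL, event["username"]),
            "name": event["username"],
            "type": "Person",
        },
    }


def bookmark_added(event, caliper_event):
    """Event-specific fields for edx.bookmark.added (partial)."""
    caliper_event.update({
        "type": "AnnotationEvent",
        "action": "Bookmarked",
        "object": {
            "id": event["referer"],
            "type": "Page",
            "extensions": dict(event["event"]),  # copy the original event payload
        },
    })
    return caliper_event


# One entry per event type; roughly 150 in a complete mapping.
EVENT_TRANSFORMERS = {
    "edx.bookmark.added": bookmark_added,
}


def transform(event):
    caliper_event = base_transform(event)
    return EVENT_TRANSFORMERS[event["event_type"]](event, caliper_event)
```

Because the `id` is a new UUID on every run, any test that compares transformer output against a stored expectation has to ignore that field.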
The Solution
One option was to make testing the developers' responsibility: whoever picked an event to transform would also write the test cases for it.
Since we were using Python, each developer would first have to construct a few input dicts, run them through the relevant transformer function, and then validate the output dict. Left to individual developers, this would have added a lot of tedious, repetitive work for every member of our team, so we figured there had to be a better way.
There was, and it’s called Data-Driven Testing (DDT).
Data-driven testing is a software testing methodology in which the test inputs (and often the expected outputs) are not hard-coded in the test cases but are supplied from an external source. That source can be a database table, HTTP requests from another system, or (as in our case) simple JSON files.
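Here is a minimal illustration of the idea using Python's standard `unittest`. It is not the project's code. The assertion is written once and every case comes from data. The JSON string stands in for an external file, and `split_event_type` is a hypothetical function under test. Wrapping each case in `subTest` makes a failing case get reported individually instead of stopping the loop.

```python
import json
import unittest

# Test data kept outside the test logic; in practice this would be read
# from a file, a database table, or another external source.
CASES_JSON = """
[
  {"input": "edx.bookmark.added", "expected": ["edx", "bookmark", "added"]},
  {"input": "edx.course.enrollment.activated",
   "expected": ["edx", "course", "enrollment", "activated"]}
]
"""


def split_event_type(event_type):
    # Hypothetical function under test
    return event_type.split(".")


class DataDrivenTest(unittest.TestCase):
    def test_cases_from_data(self):
        # One test method, many cases: adding a case means editing data, not code
        for case in json.loads(CASES_JSON):
            with self.subTest(input=case["input"]):
                self.assertEqual(split_event_type(case["input"]), case["expected"])
```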
Our solution was to create two directories, `current` and `expected`, and write a single test case that performs the following steps:
- Read a JSON file from the `current` directory
- Parse the file and run it through the corresponding transformer
- Read the JSON file with the same name from the `expected` directory
- Assert that both dicts are equal
- Repeat for every file in `current`
The original PR for that test case can be found here. The main test function is reproduced below.
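```python
import json
import os

# TEST_DIR_PATH, CaliperBaseTransformer and EVENT_MAPPING are defined elsewhere
# in the project; this method lives inside a unittest TestCase subclass.

def test_caliper_transformers(self):
    # Collect every JSON fixture in the "current" directory
    test_files = [file for file in os.listdir(
        '{}current/'.format(TEST_DIR_PATH)) if file.endswith(".json")]

    for file in test_files:
        # The input and its expected output share the same file name
        input_file = '{}current/{}'.format(TEST_DIR_PATH, file)
        output_file = '{}expected/{}'.format(TEST_DIR_PATH, file)

        with open(input_file) as current, open(output_file) as expected:
            event = json.loads(current.read())
            expected_event = json.loads(expected.read())
            # 'id' is a generated urn:uuid, so it is removed before comparing
            expected_event.pop('id')

            # Apply the common base transformation, then the event-specific one
            caliper_event = CaliperBaseTransformer(event).transform_event()
            related_function = EVENT_MAPPING[event.get('event_type')]
            caliper_event = related_function(event, caliper_event)
            caliper_event.pop('id')

            self.assertDictEqual(caliper_event, expected_event)
```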
Once that test case was in place, each developer only had to add the input and expected JSON files to the right directories, and the new transformer would be tested automatically on every CI build. That left more time for writing code and optimizing performance.
I’m hopeful that this code will be added to the main Open edX platform, and I would consider that a major contribution on our part towards Arbisoft, UC San Diego and Open edX.