Usage

To use Parameterize Jobs in a project:

In [1]: from parameterize_jobs import Component, ComponentSet, Constant

The Component class

In [2]: component = Component([0, 50, 100])
In [3]: component
Out[3]:
<Component [0, 50, 100]>

Components are essentially a thin wrapper around the data you provide, which should be an iterable.

In [4]: component[0]
Out[4]:
0
In [5]: len(component)
Out[5]:
3
In [6]: list(component)
Out[6]:
[0, 50, 100]

Q: Sweet. But why would we want that?
A: You don’t. Components are just a helper class. You want to use a ComponentSet!

The ComponentSet class

In [7]: cs = ComponentSet(a=range(5), b=['a', 'b', 'c', 'd'])
In [8]: cs
Out[8]:
<ComponentSet {a: 5, b: 4}>

A ComponentSet is sort of like itertools.product with some additional features:

  • ComponentSet objects have a length if the constituent Component objects have lengths:

    In [ 9]: cs = ComponentSet(a=range(5), b=['a', 'b', 'c', 'd'])
    In [10]: len(cs)
    Out[10]:
    20
    
  • ComponentSet objects can be positionally indexed:

    In [11]: cs[0]
    Out[11]:
    {'a': 0, 'b': 'a'}
    
    In [12]: cs[1]
    Out[12]:
    {'a': 0, 'b': 'b'}
    
    In [13]: cs[-1]
    Out[13]:
    {'a': 4, 'b': 'd'}
    

This is all done without computing the full set of combinations. ComponentSet objects can be iterated over to retrieve all combinations:

In [14]: for c in cs:
    ...:      print(c)
    ...:
Out[14]:
{'a': 0, 'b': 'a'}
{'a': 0, 'b': 'b'}
{'a': 0, 'b': 'c'}
{'a': 0, 'b': 'd'}
{'a': 1, 'b': 'a'}
...
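
One way to get a length and positional indexing without building the full product is to treat the index as a mixed-radix number, with one digit per dimension. The sketch below is a minimal illustration of that idea and is not the library's implementation; LazyProduct is a hypothetical name, and it assumes every dimension supports len() and integer indexing.

```python
from math import prod


class LazyProduct:
    """Hypothetical stand-in for a ComponentSet; not the library's code."""

    def __init__(self, **dims):
        # Keep each dimension as given; the full product is never built.
        self.dims = dims

    def __len__(self):
        # The number of combinations is the product of the dimension sizes.
        return prod(len(values) for values in self.dims.values())

    def __getitem__(self, index):
        size = len(self)
        if index < 0:
            index += size
        if not 0 <= index < size:
            raise IndexError(index)
        picked = {}
        # The last dimension varies fastest, as in itertools.product.
        for key in reversed(list(self.dims)):
            values = self.dims[key]
            index, position = divmod(index, len(values))
            picked[key] = values[position]
        # Restore the original dimension order.
        return {key: picked[key] for key in self.dims}


lazy = LazyProduct(a=range(5), b=['a', 'b', 'c', 'd'])
big = LazyProduct(a=range(100), b=range(100), c=range(100))
print(len(lazy), lazy[1], lazy[-1])
print(len(big), big[-1])
```

Each lookup costs one divmod per dimension, so len() and indexing stay cheap even when the product has a million entries.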

You can see the performance implications of not producing the full product by comparing len(cs) with len(list(cs)):

In [15]: %%timeit
    ...:
    ...: len(ComponentSet(a=range(100), b=range(100), c=range(100)))
Out[15]:
6.89 µs ± 265 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [16]: %%timeit
    ...:
    ...: len(list(ComponentSet(a=range(100), b=range(100), c=range(100))))
Out[16]:
1.35 s ± 41.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Q: That’s cool. Can we do anything else with these?
A: Yeah! You can do math!

In [17]: a = ComponentSet(a=range(5), b=list('abcd'))
In [18]: b = ComponentSet(c=range(0, 101, 50))
In [19]: c = a * b

Whoa. What is this?

In [20]: c
Out[20]:
<ComponentSet {a: 5, b: 4, c: 3}>

Multiplication: adding a new dimension

When you multiply two ComponentSet objects, the constituent Component objects are combined into a new ComponentSet with the outer product of the constituent components.

In [21]: a = ComponentSet(a=range(5), b=list('abcd'))
In [22]: b = ComponentSet(c=range(0, 101, 50))
In [23]: c = a * b
In [24]: len(c)
Out[24]:
60
In [25]: c[0]
Out[25]:
{'a': 0, 'b': 'a', 'c': 0}
In [26]: c[-1]
Out[26]:
{'a': 4, 'b': 'd', 'c': 100}
In [27]: list(c)
Out[27]:
[{'a': 0, 'b': 'a', 'c': 0},
 {'a': 0, 'b': 'a', 'c': 50},
 {'a': 0, 'b': 'a', 'c': 100},
 {'a': 0, 'b': 'b', 'c': 0},
 {'a': 0, 'b': 'b', 'c': 50},
 {'a': 0, 'b': 'b', 'c': 100},
 {'a': 0, 'b': 'c', 'c': 0},
 {'a': 0, 'b': 'c', 'c': 50},
 {'a': 0, 'b': 'c', 'c': 100},
 {'a': 0, 'b': 'd', 'c': 0},
 {'a': 0, 'b': 'd', 'c': 50},
 {'a': 0, 'b': 'd', 'c': 100},
 {'a': 1, 'b': 'a', 'c': 0},
 {'a': 1, 'b': 'a', 'c': 50},
 {'a': 1, 'b': 'a', 'c': 100},
 {'a': 1, 'b': 'b', 'c': 0},
 {'a': 1, 'b': 'b', 'c': 50},
 {'a': 1, 'b': 'b', 'c': 100},
 {'a': 1, 'b': 'c', 'c': 0},
 {'a': 1, 'b': 'c', 'c': 50},
 {'a': 1, 'b': 'c', 'c': 100},
 {'a': 1, 'b': 'd', 'c': 0},
 {'a': 1, 'b': 'd', 'c': 50},
 {'a': 1, 'b': 'd', 'c': 100},
 {'a': 2, 'b': 'a', 'c': 0},
 {'a': 2, 'b': 'a', 'c': 50},
 {'a': 2, 'b': 'a', 'c': 100},
 {'a': 2, 'b': 'b', 'c': 0},
 {'a': 2, 'b': 'b', 'c': 50},
 {'a': 2, 'b': 'b', 'c': 100},
 {'a': 2, 'b': 'c', 'c': 0},
 {'a': 2, 'b': 'c', 'c': 50},
 {'a': 2, 'b': 'c', 'c': 100},
 {'a': 2, 'b': 'd', 'c': 0},
 {'a': 2, 'b': 'd', 'c': 50},
 {'a': 2, 'b': 'd', 'c': 100},
 {'a': 3, 'b': 'a', 'c': 0},
 {'a': 3, 'b': 'a', 'c': 50},
 {'a': 3, 'b': 'a', 'c': 100},
 {'a': 3, 'b': 'b', 'c': 0},
 {'a': 3, 'b': 'b', 'c': 50},
 {'a': 3, 'b': 'b', 'c': 100},
 {'a': 3, 'b': 'c', 'c': 0},
 {'a': 3, 'b': 'c', 'c': 50},
 {'a': 3, 'b': 'c', 'c': 100},
 {'a': 3, 'b': 'd', 'c': 0},
 {'a': 3, 'b': 'd', 'c': 50},
 {'a': 3, 'b': 'd', 'c': 100},
 {'a': 4, 'b': 'a', 'c': 0},
 {'a': 4, 'b': 'a', 'c': 50},
 {'a': 4, 'b': 'a', 'c': 100},
 {'a': 4, 'b': 'b', 'c': 0},
 {'a': 4, 'b': 'b', 'c': 50},
 {'a': 4, 'b': 'b', 'c': 100},
 {'a': 4, 'b': 'c', 'c': 0},
 {'a': 4, 'b': 'c', 'c': 50},
 {'a': 4, 'b': 'c', 'c': 100},
 {'a': 4, 'b': 'd', 'c': 0},
 {'a': 4, 'b': 'd', 'c': 50},
 {'a': 4, 'b': 'd', 'c': 100}]
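
The ordering above matches what itertools.product yields when the dimensions of both operands are placed side by side. The following sketch reproduces it with plain dicts of axes; multiply and combinations are hypothetical helpers, and rejecting overlapping dimension names is an assumption made for the sketch, not documented library behavior.

```python
from itertools import product


def multiply(left, right):
    """Merge two dicts of axes into one (hypothetical helper)."""
    overlap = left.keys() & right.keys()
    if overlap:
        # Assumption for this sketch: shared dimension names are an error.
        raise ValueError('overlapping dimensions: {}'.format(sorted(overlap)))
    return {**left, **right}


def combinations(axes):
    """Yield one dict per point of the Cartesian product of the axes."""
    keys = list(axes)
    for values in product(*axes.values()):
        yield dict(zip(keys, values))


a_axes = {'a': range(5), 'b': list('abcd')}
b_axes = {'c': range(0, 101, 50)}
c_axes = multiply(a_axes, b_axes)
combos = list(combinations(c_axes))
print(len(combos), combos[0], combos[-1])
```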

Addition: creating a new MultiComponentSet

Adding two ComponentSet objects can be used when combining two objects with similar dimensions but different labels within those dimensions.

For example, the following ComponentSets are both indexed by a and b, but there is no overlap along these dimensions:

In [28]: a = ComponentSet(a=range(5), b=list('abcd'))
In [29]: b = ComponentSet(a=range(10, 15), b=list('wxyz'))

In [30]: ab = a + b
In [31]: ab
Out[31]:
<MultiComponentSet [{a: 5, b: 4}, {a: 5, b: 4}]>

Instead of adding a new dimension or extending each dimension, addition creates a new type of object, which is essentially a concatenated list of ComponentSet objects.

The MultiComponentSet has a length equal to the sum of the lengths of the constituent ComponentSet objects, and on iteration, the result simply proceeds through each of the constituent ComponentSets in order.

In [32]: len(a), len(b)
Out[32]:
(20, 20)
In [33]: len(ab)
Out[33]:
40
In [34]: list(ab)
Out[34]:
[{'a': 0, 'b': 'a'},
 {'a': 0, 'b': 'b'},
 {'a': 0, 'b': 'c'},
 {'a': 0, 'b': 'd'},
 {'a': 1, 'b': 'a'},
 {'a': 1, 'b': 'b'},
 {'a': 1, 'b': 'c'},
 {'a': 1, 'b': 'd'},
 {'a': 2, 'b': 'a'},
 {'a': 2, 'b': 'b'},
 {'a': 2, 'b': 'c'},
 {'a': 2, 'b': 'd'},
 {'a': 3, 'b': 'a'},
 {'a': 3, 'b': 'b'},
 {'a': 3, 'b': 'c'},
 {'a': 3, 'b': 'd'},
 {'a': 4, 'b': 'a'},
 {'a': 4, 'b': 'b'},
 {'a': 4, 'b': 'c'},
 {'a': 4, 'b': 'd'},
 {'a': 10, 'b': 'w'},
 {'a': 10, 'b': 'x'},
 {'a': 10, 'b': 'y'},
 {'a': 10, 'b': 'z'},
 {'a': 11, 'b': 'w'},
 {'a': 11, 'b': 'x'},
 {'a': 11, 'b': 'y'},
 {'a': 11, 'b': 'z'},
 {'a': 12, 'b': 'w'},
 {'a': 12, 'b': 'x'},
 {'a': 12, 'b': 'y'},
 {'a': 12, 'b': 'z'},
 {'a': 13, 'b': 'w'},
 {'a': 13, 'b': 'x'},
 {'a': 13, 'b': 'y'},
 {'a': 13, 'b': 'z'},
 {'a': 14, 'b': 'w'},
 {'a': 14, 'b': 'x'},
 {'a': 14, 'b': 'y'},
 {'a': 14, 'b': 'z'}]
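
Concatenation can also support len() and indexing lazily by keeping running totals of the part lengths and locating the right part with a binary search. This is a minimal sketch of that approach under the assumption that every part has a length and supports indexing; Concatenated is a hypothetical class, not the library's MultiComponentSet, and plain lists stand in for the two ComponentSets.

```python
from bisect import bisect_right
from itertools import accumulate


class Concatenated:
    """Hypothetical stand-in for a MultiComponentSet."""

    def __init__(self, *parts):
        self.parts = parts
        # Running totals of part lengths, e.g. [20, 40] for two parts of 20.
        self.offsets = list(accumulate(len(part) for part in parts))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, index):
        size = len(self)
        if index < 0:
            index += size
        if not 0 <= index < size:
            raise IndexError(index)
        # Find the part that contains this position.
        which = bisect_right(self.offsets, index)
        start = self.offsets[which - 1] if which else 0
        return self.parts[which][index - start]

    def __iter__(self):
        # Iteration walks through each part in turn.
        for part in self.parts:
            yield from part


first = [{'a': a, 'b': b} for a in range(5) for b in 'abcd']
second = [{'a': a, 'b': b} for a in range(10, 15) for b in 'wxyz']
joined = Concatenated(first, second)
print(len(joined), joined[19], joined[20])
```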

Math with MultiComponentSets

Works just like you’d expect! Multiplication applies to each constituent ComponentSet, and addition nests MultiComponentSets.

In [35]: d1 = ComponentSet(d=['first', 'second'])
In [36]: ab
Out[36]:
<MultiComponentSet [{a: 5, b: 4}, {a: 5, b: 4}]>
In [37]: ab*d1
Out[37]:
<MultiComponentSet [{a: 5, b: 4, d: 2}, {a: 5, b: 4, d: 2}]>
In [38]: d2 = ComponentSet(d=['third', 'fourth'])
In [39]: e = ComponentSet(e=['another'])
In [40]: abdde = ((ab * d1) + (ab * d2)) * e
In [41]: abdde
Out[41]:
<MultiComponentSet [[{a: 5, b: 4, d: 2, e: 1}, {a: 5, b: 4, d: 2, e: 1}], [{a: 5, b: 4, d: 2, e: 1}, {a: 5, b: 4, d: 2, e: 1}]]>
In [42]: len(abdde)
Out[42]:
160
In [43]: abdde[0]
Out[43]:
{'a': 0, 'b': 'a', 'd': 'first', 'e': 'another'}
In [44]: abdde[-1]
Out[44]:
{'a': 14, 'b': 'z', 'd': 'fourth', 'e': 'another'}

ComponentSets with exhaustible generators

ComponentSet objects can be used with generators, but the length and indexing features will not work:

In [45]: with_generator = ComponentSet(gen=(i for i in [1, 2, 3, 4]))
In [46]: with_generator
Out[46]:
<ComponentSet {gen: (...)}>

Calling len() on it raises a TypeError:

In [47]: len(with_generator)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-028f83238a52> in <module>
----> 1 len(with_generator)

<ipython-input-1-2d6ab0f3cd2e> in __len__(self)
     69
     70     def __len__(self):
---> 71         return product(map(len, self._sets.values()))
     72
     73     def __iter__(self):

<ipython-input-1-2d6ab0f3cd2e> in product(arr)
      4
      5 def product(arr):
----> 6     return reduce(lambda x, y: x * y, arr, 1)
      7
      8 def cumprod(arr):

<ipython-input-1-2d6ab0f3cd2e> in __len__(self)
     20
     21     def __len__(self):
---> 22         return len(self._values)
     23
     24     def __iter__(self):

TypeError: object of type 'generator' has no len()

But it can still be iterated over:

In [48]: list(with_generator)
Out[48]:
[{'gen': 1}, {'gen': 2}, {'gen': 3}, {'gen': 4}]

Because the underlying data is a generator, it is exhausted after the first pass, so a second pass yields nothing:

In [49]: list(with_generator)
Out[49]:
[]
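
The same behavior can be seen with a bare Python generator, independent of this library. The sketch below shows the failure of len(), the single usable pass, and one possible workaround: materializing the values into a list before handing them to a ComponentSet, at the cost of holding them all in memory.

```python
gen = (i for i in [1, 2, 3, 4])

# Generators have no length, which is what the traceback above reports.
try:
    len(gen)
    message = None
except TypeError as exc:
    message = str(exc)

first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # nothing is left to yield

# Workaround: build a list once; it can be measured, indexed, and reused.
values = list(i for i in [1, 2, 3, 4])
print(message, first_pass, second_pass, len(values))
```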

Use with dask

ComponentSet and MultiComponentSet objects can be used with many queueing libraries, including dask:

In [50]: import dask.distributed as dd
In [51]: client = dd.Client()
In [52]: client
Out[52]:
Client / Cluster: Workers: 4, Cores: 4, Memory: 8.59 GB

In [53]: def do_something(kwargs):
    ...:     import time
    ...:     import random
    ...:     time.sleep(random.random())
    ...:     return str(kwargs)
In [54]: futures = client.map(do_something, abdde)
In [55]: dd.progress(futures)
Out[55]:
VBox()
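
The same pattern works with any executor that maps a function over an iterable of job dicts. As a stand-in that needs no cluster, the sketch below uses the standard library's concurrent.futures; the hand-built jobs list is a hypothetical substitute for a ComponentSet.

```python
from concurrent.futures import ThreadPoolExecutor


def do_something(kwargs):
    # Each job arrives as a single dict of parameters.
    return str(kwargs)


# Hypothetical job list standing in for a ComponentSet.
jobs = [{'a': a, 'b': b} for a in range(3) for b in 'xy']

with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map returns results in the order the jobs were given.
    results = list(pool.map(do_something, jobs))

print(len(results), results[0])
```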

A real-world example

Parameterizing operations over multiple incompatible climate model, year, and scenario combinations

Global climate model outputs from CMIP5 simulations typically cover incompatible sets of historical and projection years, ensemble members, and even models, because some models are run for some scenario and ensemble combinations and others are not. At the same time, you may want to apply the same operation across all existing model years and manage the runs with a single job generator.

This can be handled by building a MultiComponentSet:

In [56]: hist = Constant(rcp='historical', model='obs')
In [57]: hist_years = ComponentSet(year=list(range(1950, 2006)))
In [58]: rcp45 = ComponentSet(
    ...:     rcp=['rcp45'],
    ...:     model=(
    ...:         ['ACCESS1-0', 'CCSM4']
    ...:         + ['pattern{}'.format(i) for i in [1, 2, 3, 5, 6, 27, 28, 29, 30, 31, 32]]))

In [59]: rcp85 = ComponentSet(
    ...:     rcp=['rcp85'],
    ...:     model=(
    ...:         ['ACCESS1-0', 'CCSM4']
    ...:         + ['pattern{}'.format(i) for i in [1, 2, 3, 4, 5, 6, 28, 29, 30, 31, 32, 33]]))

In [60]: proj_years = ComponentSet(year=list(range(2006, 2100)))

Job specifications can also be added to the parameterization:

In [61]: days_under = Constant(func = lambda x, thresh: x <= thresh, threshold=32)
In [62]: days_over = ComponentSet(func = [lambda x, thresh: x >= thresh], threshold=[90, 95])

The full job set is the sum of the valid (model * year) combinations, each of which is run once for every job specification:

In [63]: runs = ((hist * hist_years) + ((rcp45 + rcp85) * proj_years)) * (days_under + days_over)
In [64]: runs
Out[64]:
<MultiComponentSet [[{rcp: 1, model: 1, year: 56, func: 1, threshold: 1}, {rcp: 1, model: 1, year: 56, func: 1, threshold: 2}], [[{rcp: 1, model: 13, year: 94, func: 1, threshold: 1}, {rcp: 1, model: 13, year: 94, func: 1, threshold: 2}], [{rcp: 1, model: 14, year: 94, func: 1, threshold: 1}, {rcp: 1, model: 14, year: 94, func: 1, threshold: 2}]]]>
In [65]: len(runs)
Out[65]:
7782
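
The total can be checked by hand from the definitions above: lengths multiply under * and add under +. The short calculation below only redoes that arithmetic with the dimension sizes taken from the inputs; the variable names are illustrative.

```python
n_hist_years = len(range(1950, 2006))   # 56 historical years
n_proj_years = len(range(2006, 2100))   # 94 projection years
n_rcp45_models = 2 + 11                 # two named models plus 11 patterns
n_rcp85_models = 2 + 12                 # two named models plus 12 patterns

# (hist * hist_years) + ((rcp45 + rcp85) * proj_years)
n_model_years = (1 * n_hist_years
                 + (n_rcp45_models + n_rcp85_models) * n_proj_years)

# days_under has one threshold, days_over has two.
n_specs = 1 + 2

total = n_model_years * n_specs
first_rcp45_index = n_hist_years * n_specs
print(n_model_years, total, first_rcp45_index)
```

The last value is the position where the historical jobs end, which agrees with runs[168] being the first rcp45 entry.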

The different job specifications can be examined to make sure the job was built the way you expect:

In [66]: runs[0]
Out[66]:
{'rcp': 'historical',
 'model': 'obs',
 'year': 1950,
 'func': <function __main__.<lambda>(x, thresh)>,
 'threshold': 32}
In [67]: runs[55]
Out[67]:
{'rcp': 'historical',
 'model': 'obs',
 'year': 2005,
 'func': <function __main__.<lambda>(x, thresh)>,
 'threshold': 32}
In [68]: runs[56]
Out[68]:
{'rcp': 'historical',
 'model': 'obs',
 'year': 1950,
 'func': <function __main__.<lambda>(x, thresh)>,
 'threshold': 90}
In [69]: runs[167]
Out[69]:
{'rcp': 'historical',
 'model': 'obs',
 'year': 2005,
 'func': <function __main__.<lambda>(x, thresh)>,
 'threshold': 95}
In [70]: runs[168]
Out[70]:
{'rcp': 'rcp45',
 'model': 'ACCESS1-0',
 'year': 2006,
 'func': <function __main__.<lambda>(x, thresh)>,
 'threshold': 32}
In [71]: runs[261]
Out[71]:
{'rcp': 'rcp45',
 'model': 'ACCESS1-0',
 'year': 2099,
 'func': <function __main__.<lambda>(x, thresh)>,
 'threshold': 32}
In [72]: runs[262]
Out[72]:
{'rcp': 'rcp45',
 'model': 'CCSM4',
 'year': 2006,
 'func': <function __main__.<lambda>(x, thresh)>,
 'threshold': 32}
In [73]: runs[3833]
Out[73]:
{'rcp': 'rcp45',
 'model': 'pattern32',
 'year': 2099,
 'func': <function __main__.<lambda>(x, thresh)>,
 'threshold': 95}
In [74]: runs[3834]
Out[74]:
{'rcp': 'rcp85',
 'model': 'ACCESS1-0',
 'year': 2006,
 'func': <function __main__.<lambda>(x, thresh)>,
 'threshold': 32}
In [75]: runs[-1]
Out[75]:
{'rcp': 'rcp85',
 'model': 'pattern33',
 'year': 2099,
 'func': <function __main__.<lambda>(x, thresh)>,
 'threshold': 95}
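
Each job specification is a plain dict, so a worker function can unpack it and call the stored function with the stored threshold. The sketch below shows one way this might look; run_job and the temperature values are hypothetical and made up purely for illustration.

```python
def run_job(job, temperatures):
    """Count the values that satisfy the job's comparison (hypothetical)."""
    func = job['func']
    threshold = job['threshold']
    return sum(1 for value in temperatures if func(value, threshold))


# A job dict shaped like runs[-1] above.
job = {
    'rcp': 'rcp85',
    'model': 'pattern33',
    'year': 2099,
    'func': lambda x, thresh: x >= thresh,
    'threshold': 95,
}

# Made-up daily values, used only to exercise the function.
sample_temperatures = [88, 95, 101, 70]
count = run_job(job, sample_temperatures)
print(count)
```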

This entire set can be run with a single call:

In [76]: def do_something_fast(kwargs):
    ...:     return str(kwargs)
In [77]: futures = client.map(do_something_fast, runs)
In [78]: dd.progress(futures)
Out[78]:
VBox()
In [79]: client.gather(futures[-1])
Out[79]:
"{'rcp': 'rcp85', 'model': 'pattern33', 'year': 2099, 'func': <function <lambda> at 0x10f4c9400>, 'threshold': 95}"