Curious Efficiency

Efficiency (a virtue) is the child of laziness and greed (both vices), while much of our economic activity is devoted to preventing boredom in the idle time created by increases in efficiency. To be human is to be a strange creature indeed :)

What does "x = a + b" mean?

Nick Coghlan, 2019-03-16

Shorthand notations and shared context

Guido van Rossum recently put together an excellent post talking about the value of infix binary operators in making certain kinds of operations easier to reason about correctly.

The context inspiring that post is a python-ideas discussion regarding the possibility of adding a shorthand spelling (x = a + b) to Python for the operation:

    x = a.copy()
    x.update(b)

The PEP for that proposal is still in development, so I'm not going to link to it directly [1], but the paragraph above gives the gist of the idea. Guido's article came in response to the assertion that infix operators don't improve readability, when we have plenty of empirical evidence to show that they do.

Where this article comes from is a key point that Guido's article mentions, but doesn't emphasise: those readability benefits rely heavily on implicitly shared context between the author of an expression and the readers of that expression.

Without a previous agreement on the semantics, the only possible general answer to the question "What does x = a + b mean?" is "I need more information to answer that".

The original shared context: Algebra

If the additional information supplied is "This is an algebraic expression", then x = a + b is expressing a constraint on the permitted values of x, a, and b.

Specifying x = a - b as an additional constraint would then further allow the reader to infer that x = a and b = 0.

The corresponding Python context: Numbers

The use case for + in Python that most closely corresponds with algebra is using it with numbers - the key differences lie in the meaning of =, rather than the meaning of +.

So if the additional information supplied is "This is a Python assignment statement; a and b are both well-behaved finite numbers", then the reader will be able to infer that x will be the sum of the two numbers.

Inferring the exact numeric type of x would require yet more information about the types of a and b, as types implementing the numeric + operator are expected to participate in a type coercion protocol that gives both operands a chance to carry out the operation, and only raises TypeError if neither type understands the other.

The original algebraic meaning then gets expressed in Python as assert x == a + b, and successful execution of the assignment statement ensures that assertion will pass.

In this context, types implementing the + operator are expected to provide all the properties that would be expected of the corresponding mathematical concepts (a + b == b + a, a + (b + c) == (a + b) + c, etc), subject to the limitations of performing calculations on computers that actually exist.
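
As a concrete illustration of that coercion protocol, here's a minimal sketch using only builtin types, where int defers to float for mixed arithmetic:

    >>> (1).__add__(2.5)   # int doesn't know how to add a float...
    NotImplemented
    >>> (2.5).__radd__(1)  # ...so Python tries float's reflected method
    3.5
    >>> x = 1 + 2.5        # the full protocol runs behind the scenes
    >>> assert x == 1 + 2.5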

Another mathematical context: Matrix algebra

If the given expression used uppercase letters, as in X = A + B, then the additional information supplied may instead be "This is a matrix algebra expression". (It's a notational convention in mathematics that matrices are assigned uppercase letters, while lowercase letters indicate scalar values.)

For matrices, addition and subtraction are defined as only being valid between matrices of the same size and shape, so if X = A - B were to be supplied as an additional constraint, then the implications would be:

- X, A and B are all the same size and shape
- B consists entirely of zeroes
- X = A

The corresponding Python context: NumPy Arrays

The numpy.ndarray type, and other types implementing the same API, bring the semantics of matrix algebra to Python programming, similar to the way that the builtin numeric types bring the semantics of scalar algebra.

This means that if the additional information supplied is "This is a Python assignment statement; A and B are both matrices of the same size and shape containing well-behaved finite numbers", then the reader will be able to infer that X will be a new matrix of the same shape and size as matrices A and B, with each element in X being the sum of the corresponding elements in A and B.

As with scalar algebra, inferring the exact numeric type of the elements of X would require more information about the types of the elements in A and B, the original algebraic meaning gets expressed in Python as assert X == A + B, successful execution of the assignment statement ensures that assertion will pass, and types implementing + in this context are expected to provide the properties that would be expected of a matrix in mathematics.
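
For example, here's a small sketch (assuming NumPy is installed); note that == on arrays is itself element-wise, so checking the algebraic reading needs an explicit .all():

    >>> import numpy as np
    >>> A = np.array([[1, 2], [3, 4]])
    >>> B = np.array([[10, 20], [30, 40]])
    >>> X = A + B
    >>> X
    array([[11, 22],
           [33, 44]])
    >>> assert (X == A + B).all()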

Python's string concatenation context

Mathematics doesn't provide a convenient infix notation for concatenating two strings together (aside from writing their names directly next to each other), so programming language designers are forced to choose one. While this does vary across languages, the most common choice is the one that Python uses: the + operator.

This is formally a distinct operation from numeric addition, with different semantic expectations, and CPython's C API somewhat coincidentally ended up reflecting that distinction by offering two different ways of implementing + on a type: the tp_as_number->nb_add and tp_as_sequence->sq_concat slots. (This distinction is absent at the Python level: only __add__, __radd__ and __iadd__ are exposed, and they always populate the relevant tp_as_number slots in CPython.)

The key semantic difference between algebraic addition and string concatenation is that in algebraic addition, the order of the operands doesn't matter (a + b == b + a), while in string concatenation, the order of the operands determines which items appear first in the result (e.g. "Hello" + "World" == "HelloWorld" vs "World" + "Hello" == "WorldHello"). This means that a + b == b + a being true when concatenating strings indicates that one or both strings are empty, or else the two strings are identical.

Another less obvious semantic difference is that strings don't participate in the type coercion protocol that is defined for numbers: if the right hand operand isn't a string (or string subclass) instance, they'll raise TypeError immediately, rather than letting the other operand attempt the operation.
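
Both differences are easy to demonstrate at the interactive prompt (a minimal sketch; the exact TypeError wording varies across Python versions):

    >>> "Hello" + "World" == "World" + "Hello"
    False
    >>> "Hello" + 1
    Traceback (most recent call last):
      ...
    TypeError: can only concatenate str (not "int") to str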

Python's immutable sequence concatenation context

Python goes further than merely allowing + to be used for string concatenation: it allows it to be used for arbitrary sequence concatenation. For immutable container types like tuple, this closely parallels the way that string concatenation works: a new immutable instance of the same type is created containing references to the same items referenced by the original operands:

    >>> a = 1, 2, 3
    >>> b = 4, 5, 6
    >>> x = a + b
    >>> a
    (1, 2, 3)
    >>> b
    (4, 5, 6)
    >>> x
    (1, 2, 3, 4, 5, 6)

As for strings, immutable sequences will usually only interact with other instances of the same type (or subclasses), even when the x += b notation is used as an alternative to x = x + b. For example:

    >>> x = 1, 2, 3
    >>> x += [4, 5, 6]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can only concatenate tuple (not "list") to tuple
    >>> x += 4, 5, 6
    >>> x
    (1, 2, 3, 4, 5, 6)

In addition to str, the tuple and bytes types implement these concatenation semantics. range and memoryview, while otherwise implementing the Sequence API, don't support concatenation operations.

Python's mutable sequence concatenation context

Mutable sequence types add yet another variation to the possible meanings of + in Python. For the specific example of x = a + b, they're very similar to immutable sequences, creating a fresh instance that references the same items as the original operands:

    >>> a = [1, 2, 3]
    >>> b = [4, 5, 6]
    >>> x = a + b
    >>> a
    [1, 2, 3]
    >>> b
    [4, 5, 6]
    >>> x
    [1, 2, 3, 4, 5, 6]

Where they diverge is that the x += b operation will modify the target sequence directly, rather than creating a new container:

    >>> a = [1, 2, 3]
    >>> b = [4, 5, 6]
    >>> x = a; x = x + b
    >>> a
    [1, 2, 3]
    >>> x = a; x += b
    >>> a
    [1, 2, 3, 4, 5, 6]

The other difference is that where + remains restrictive as to the container types it will work with, += is typically generalised to work with arbitrary iterables on the right hand side, just like the MutableSequence.extend() method:

    >>> x = [1, 2, 3]
    >>> x = x + (4, 5, 6)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: can only concatenate list (not "tuple") to list
    >>> x += (4, 5, 6)
    >>> x
    [1, 2, 3, 4, 5, 6]

Amongst the builtins, list and bytearray implement these semantics (although bytearray limits even in-place concatenation to bytes-like types that support memoryview style access). Elsewhere in the standard library, collections.deque and array.array are other mutable sequence types that behave this way.

A brief digression back to mathematics: Multisets

Multisets are a concept in mathematics that allow for values to occur in a set more than once, with the multiset then being the mapping from the values themselves to the count of how many times each value occurs in the multiset (with a count of zero or less being the same as the value being omitted from the set entirely).

While they don't natively use the x = a + b notation the way that scalar algebra and matrix algebra do, the key point regarding multisets that's relevant to this article is the fact that they do have a "sum" operation defined, and the semantics of that operation are very similar to those used for matrix addition: element wise summation for each item in the multiset. If a particular value is only present in one of the multisets, that's handled the same way as if it were present with a count of zero.

And back to Python once more: collections.Counter

Since Python 2.7 and 3.1, Python has included an implementation of the mathematical multiset concept in the form of the collections.Counter class. It uses x = a + b to denote multiset summation:

    >>> a = collections.Counter(maths=2, python=2)
    >>> b = collections.Counter(python=4, maths=1)
    >>> x = a + b
    >>> x
    Counter({'python': 6, 'maths': 3})

As with sequences, counter instances define their own interoperability domain, so they won't accept arbitrary mappings for a binary + operation:

    >>> x = a + dict(python=4, maths=1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'Counter' and 'dict'

But they're more permissive for in-place operations, accepting arbitrary mapping objects:

    >>> x += dict(python=4, maths=1)
    >>> x
    Counter({'python': 10, 'maths': 4})

What does all this have to do with the idea of dictionary addition?

Python's dictionaries are quite interesting mathematically, as in mathematical terms, they're not actually a container. Instead, they're a function mapping between a domain defined by the set of keys, and a range defined by a multiset of values [2].

This means that the mathematical context that would most closely correspond to defining addition on dictionaries is the algebraic combination of functions. That's defined such that (f + g)(x) is equivalent to f(x) + g(x), so the only binary infix operator support for dictionaries that could be grounded in an existing mathematical shared context is one where d1 + d2 was shorthand for:

    x = d1.copy()
    for k, rhs in d2.items():
        try:
            lhs = x[k]
        except KeyError:
            x[k] = rhs
        else:
            x[k] = lhs + rhs
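
To make the link to function algebra concrete, here's a rough sketch (the helper name is illustrative, not part of any proposal) of how (f + g)(x) == f(x) + g(x) carries over to dictionaries viewed as finite functions of their keys:

    def add_functions(f, g):
        # Algebraic addition of functions: (f + g)(x) == f(x) + g(x)
        return lambda x: f(x) + g(x)

    # Applied pointwise across the keys, the same idea gives the
    # merge-with-addition semantics spelled out in the loop above
    # (this comprehension assumes a numeric zero default, which the
    # try/except version avoids relying on):
    d1 = {"python": 2, "maths": 2}
    d2 = {"python": 4}
    x = {k: d1.get(k, 0) + d2.get(k, 0) for k in d1.keys() | d2.keys()}
    assert x == {"python": 6, "maths": 2}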

That has the unfortunate implication that introducing a Python-specific binary operator shorthand for dictionary copy-and-update semantics would represent a hard conceptual break with mathematics, rather than a transfer of existing mathematical concepts into the language. Contrast that with the introduction of collections.Counter (which was grounded in the semantics of mathematical multisets and borrowed its Python notation from element-wise addition on matrices), or the matrix multiplication operator (which was grounded in the semantics of matrix algebra, and only needed a text-editor-friendly symbol assigned, similar to using * instead of × for scalar multiplication and / instead of ÷ for division).

At least to me, that seems like a big leap to take for something where the in-place form already has a perfectly acceptable spelling (d1.update(d2)), and a more expression-friendly variant could be provided as a new dictionary class method:

    @classmethod
    def from_merge(cls, *inputs):
        self = cls()
        for input in inputs:
            self.update(input)
        return self

With that defined, the exact equivalent of the proposed d1 + d2 would be type(d1).from_merge(d1, d2), and in practice, you would often give the desired result type explicitly rather than inferring it from the inputs (e.g. dict.from_merge(d1, d2)).

However, the PEP is still in the very first stage of the discussion and review process, so it's entirely possible that by the time it reaches python-dev it will be making a more modest proposal like a new dict class method, rather than the current proposal of operator syntax support.

[1] The whole point of the python-ideas phase of discussion is to get a PEP ready for a more critical review by the core development team, so it isn't fair to the PEP author to invite wider review before they're ready for it.

[2] The final section originally stated that arithmetic operations on mathematical functions didn't have a defined meaning, so proposing them for Python's dictionaries would be treading new conceptual ground. However, a reader pointed out that algebraic operations on functions are defined, and they translate to applying the functions independently to the inputs, and then performing the specified arithmetic operation on the results. The final section has been updated accordingly.

Considering Python's Target Audience

Nick Coghlan, 2017-10-09

Several years ago, I highlighted "CPython moves both too fast and too slowly" as one of the more common causes of conflict both within the python-dev mailing list, as well as between the active CPython core developers and folks that decide that participating in that process wouldn't be an effective use of their personal time and energy.

I still consider that to be the case, but it's also a point I've spent a lot of time reflecting on in the intervening years, as I wrote that original article while I was still working for Boeing Defence Australia. The following month, I left Boeing for Red Hat Asia-Pacific, and started gaining a redistributor level perspective on open source supply chain management in large enterprises.
Use cases for Python's reference interpreter

While it's a gross oversimplification, I tend to break down CPython's use cases as follows (note that these categories aren't fully distinct, they're just aimed at focusing my thinking on different factors influencing the rollout of new software features and versions):

- Education: educators' main interest is in teaching ways of modelling and manipulating the world computationally, not writing or maintaining production software. Examples: Australia's Digital Curriculum, Lorena A. Barba's AeroPython.
- Personal automation & hobby projects: software where the main, and often only, user is the individual that wrote it. Examples: my Digital Blasphemy image download notebook, Paul Fenwick's (Inter)National Rick Astley Hotline.
- Organisational process automation: software where the main, and often only, user is the organisation it was originally written to benefit. Examples: CPython's core workflow tools, development, build & release management tooling for Linux distros.
- Set-and-forget infrastructure: software where, for sometimes debatable reasons, in-life upgrades to the software itself are nigh impossible, but upgrades to the underlying platform may be feasible. Examples: most self-managed corporate and institutional infrastructure (where properly funded sustaining engineering plans are disturbingly rare), grant funded software (where maintenance typically ends when the initial grant runs out), software with strict certification requirements (where recertification is too expensive for routine updates to be economically viable unless absolutely essential), embedded software systems without auto-upgrade capabilities.
- Continuously upgraded infrastructure: software with a robust sustaining engineering model, where dependency and platform upgrades are considered routine, and no more concerning than any other code change. Examples: Facebook's Python service infrastructure, rolling release Linux distributions, most public PaaS and serverless environments (Heroku, OpenShift, AWS Lambda, Google Cloud Functions, Azure Cloud Functions, etc).
- Intermittently upgraded standard operating environments: environments that do carry out routine upgrades to their core components, but those upgrades occur on a cycle measured in years, rather than weeks or months. Examples: VFX Platform, LTS Linux distributions, CPython and the Python standard library, infrastructure management & orchestration tools (e.g. OpenStack, Ansible), hardware control systems.
- Ephemeral software: software that tends to be used once and then discarded or ignored, rather than being subsequently upgraded in place. Examples: ad hoc automation scripts, single-player games with a defined "end" (once you've finished them, even if you forget to uninstall them, you probably won't reinstall them on a new device), single-player games with little or no persistent state (if you uninstall and reinstall them, it doesn't change much about your play experience), event-specific applications (the application was tied to a specific physical event, and once the event is over, that app doesn't matter any more).
- Regular use applications: software that tends to be regularly upgraded after deployment. Examples: business management software, personal & professional productivity applications (e.g. Blender), developer tools & services (e.g. Mercurial, Buildbot, Roundup), multi-player games and other games with significant persistent state but no real defined "end", embedded software systems with auto-upgrade capabilities.
- Shared abstraction layers: software components that are designed to make it possible to work effectively in a particular problem domain even if you don't personally grasp all the intricacies of that domain yet. Examples: most runtime libraries and frameworks (e.g. Django, Flask, Pyramid, SQLAlchemy, NumPy, SciPy, requests), many testing and type inference tools (e.g. pytest, Hypothesis, vcrpy, behave, mypy), plugins for other applications (e.g. Blender plugins, OpenStack hardware adapters), and the standard library itself, which represents the baseline "world according to Python" (and that's an incredibly complex world view).
Which audience does CPython primarily serve?

Ultimately, the main audiences that CPython and the standard library specifically serve are those that, for whatever reason, aren't adequately served by the combination of a more limited standard library and the installation of explicitly declared third party dependencies from PyPI.

To oversimplify the above review of different usage and deployment models even further, it's possible to summarise the single largest split in Python's user base as the one between those that are using Python as a scripting language for some environment of interest, and those that are using it as an application development language, where the eventual artifact that will be distributed is something other than the script that they're working on.

Typical developer behaviours when using Python as a scripting language include:

- the main working unit consists of a single Python file (or Jupyter notebook!), rather than a directory of Python and metadata files
- there's no separate build step of any kind - the script is distributed as a script, similar to the way standalone shell scripts are distributed
- there's no separate install step (other than downloading the file to an appropriate location), as it is expected that the required runtime environment will be preconfigured on the destination system
- no explicit dependencies are stated, except perhaps a minimum Python version, or else a statement of the expected execution environment. If dependencies outside the standard library are needed, they're expected to be provided by the environment being scripted (whether that's an operating system, a data analysis platform, or an application that embeds a Python runtime)
- there's no separate test suite, with the main test of correctness being "Did the script do what you wanted it to do with the input that you gave it?"
- if testing prior to live execution is needed, it will be in the form of a "dry run" or "preview" mode that conveys to the user what the software would do if run that way
- if static code analysis tools are used at all, it's via integration into the user's software development environment, rather than being set up separately for each individual script

By contrast, typical developer behaviours when using Python as an application development language include:

- the main working unit consists of a directory of Python and metadata files, rather than a single Python file
- there is a separate build step to prepare the application for publication, even if it's just bundling the files together into a Python sdist, wheel or zipapp archive
- whether there's a separate install step to prepare the application for use will depend on how the application is packaged, and what the supported target environments are
- external dependencies are expressed in a metadata file, either directly in the project directory (e.g. pyproject.toml, requirements.txt, Pipfile), or as part of the generated publication archive (e.g. setup.py, flit.ini)
- a separate test suite exists, either as unit tests for the Python API, integration tests for the functional interfaces, or a combination of the two
- usage of static analysis tools is configured at the project level as part of its testing regime, rather than being left to the configuration of each individual developer's environment

As a result of that split, the main purpose that CPython and the standard library end up serving is to define the redistributor independent baseline of assumed functionality for educational and ad hoc Python scripting environments 3-5 years after the corresponding CPython feature release.

For ad hoc scripting use cases, that 3-5 year latency stems from a combination of delays in redistributors making new releases available to their users, and users of those redistributed versions taking time to revise their standard operating environments.

In the case of educational environments, educators need that kind of time to review the new features and decide whether or not to incorporate them into the courses they offer their students.

Why is this relevant to anything?

This post was largely inspired by the Twitter discussion following on from this comment of mine citing the Provisional API status defined in PEP 411 as an example of an open source project issuing a de facto invitation to users to participate more actively in the design & development process as co-creators, rather than only passively consuming already final designs.

The responses included several expressions of frustration regarding the difficulty of supporting provisional APIs in higher level libraries, without those libraries making the provisional status transitive, and hence limiting support for any related features to only the latest version of the provisional API, and not any of the earlier iterations.

My main reaction was to suggest that open source publishers should impose whatever support limitations they need to impose to make their ongoing maintenance efforts personally sustainable. That means that if supporting older iterations of provisional APIs is a pain, then they should only be supported if the project developers themselves need that, or if somebody is paying them for the inconvenience. This is similar to my view on whether or not volunteer-driven projects should support older commercial LTS Python releases for free when it's a hassle for them to do so: I don't think they should, as I expect most such demands to be stemming from poorly managed institutional inertia, rather than from genuine need (and if the need is genuine, then it should instead be possible to find some means of paying to have it addressed).

However, my second reaction was to realise that even though I've touched on this topic over the years (e.g. in the original 2011 article linked above, in several Python 3 Q & A answers, and to a lesser degree in last year's article on the Python Packaging Ecosystem), I've never really attempted to directly explain the impact it has on the standard library design process.

And without that background, some aspects of the design process, such as the introduction of provisional APIs, or the introduction of inspired-by-but-not-the-same-as APIs, seem completely nonsensical, as they appear to be an attempt to standardise APIs without actually standardising them.
Where does PyPI fit into the picture?

The first hurdle that any proposal sent to python-ideas or python-dev has to clear is answering the question "Why isn't a module on PyPI good enough?". The vast majority of proposals fail at this step, but there are several common themes for getting past it:

- rather than downloading a suitable third party library, novices may be prone to copying & pasting bad advice from the internet at large (e.g. this is why the secrets module now exists: to make it less likely people will use the random module, which is intended for games and statistical simulations, for security-sensitive purposes - see the sketch after this list)
- the module is intended to provide a reference implementation and to enable interoperability between otherwise competing implementations, rather than necessarily being all things to all people (e.g. asyncio, wsgiref, unittest, and logging all fall into this category)
- the module is intended for use in other parts of the standard library (e.g. enum falls into this category, as does unittest)
- the module is designed to support a syntactic addition to the language (e.g. the contextlib, asyncio and typing modules fall into this category)
- the module is just plain useful for ad hoc scripting purposes (e.g. pathlib and ipaddress fall into this category)
- the module is useful in an educational context (e.g. the statistics module allows for interactive exploration of statistical concepts, even if you wouldn't necessarily want to use it for full-fledged statistical analysis)
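
As a sketch of the distinction behind that first bullet: random is built for reproducible simulations, while secrets draws on the operating system's cryptographic randomness for security-sensitive values:

    import random
    import secrets

    # Statistical simulation: reproducibility is a feature
    random.seed(12345)
    rolls = [random.randint(1, 6) for _ in range(3)]  # deterministic dice rolls

    # Security-sensitive token: unpredictability is the whole point
    token = secrets.token_urlsafe(16)  # e.g. for a password reset link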

Passing this initial "Is PyPI obviously good enough?" check isn't enough to ensure that a module will be accepted for inclusion into the standard library, but it's enough to shift the question to become "Would including the proposed library result in a net improvement to the typical introductory Python software developer experience over the next few years?"

The introduction of the ensurepip and venv modules into the standard library also makes it clear to redistributors that we expect Python level packaging and installation tools to be supported in addition to any platform specific distribution mechanisms.

Why are some APIs changed when adding them to the standard library?

While existing third party modules are sometimes adopted wholesale into the standard library, in other cases, what actually gets added is a redesigned and reimplemented API that draws on the user experience of the existing API, but drops or revises some details based on the additional design considerations and privileges that go with being part of the language's reference implementation.

For example, unlike its popular third party predecessor, path.py, pathlib does not define string subclasses, but instead independent types. Solving the resulting interoperability challenges led to the definition of the filesystem path protocol, allowing a wider range of objects to be used with interfaces that work with filesystem paths.
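
A quick sketch of that protocol in action (os.fspath and __fspath__ support were added by PEP 519 in Python 3.6):

    import os
    import pathlib

    p = pathlib.Path("/tmp") / "example.txt"
    print(isinstance(p, str))  # False: pathlib deliberately avoids str subclassing
    print(os.fspath(p))        # '/tmp/example.txt': convertible via the protocol

    class MyPath:
        # Any object can opt in to the protocol by defining __fspath__
        def __fspath__(self):
            return "/tmp/somewhere-else.txt"

    print(os.fspath(MyPath()))  # '/tmp/somewhere-else.txt'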

The API design for the ipaddress module was adjusted to explicitly separate host interface definitions (IP addresses associated with particular IP networks) from the definitions of addresses and networks, in order to serve as a better tool for teaching IP addressing concepts, whereas the original ipaddr module is less strict in the way it uses networking terminology.
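
For instance, the module is strict about the difference between a network and a host interface on that network (a small sketch):

    >>> import ipaddress
    >>> ipaddress.ip_network('192.168.0.0/24')    # a network proper
    IPv4Network('192.168.0.0/24')
    >>> ipaddress.ip_network('192.168.0.5/24')    # host bits set -> rejected
    Traceback (most recent call last):
      ...
    ValueError: 192.168.0.5/24 has host bits set
    >>> ipaddress.ip_interface('192.168.0.5/24')  # host interfaces are a separate concept
    IPv4Interface('192.168.0.5/24')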

In other cases, standard library modules are constructed as a synthesis of multiple existing approaches, and may also rely on syntactic features that didn't exist when the APIs for pre-existing libraries were defined. Both of these considerations apply for the asyncio and typing modules, while the latter consideration applies for the dataclasses API being considered in PEP 557 (which can be summarised as "like attrs, but using variable annotations for field declarations").
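
To illustrate the "variable annotations for field declarations" phrasing, here's a minimal sketch of the style of API PEP 557 describes (dataclasses later shipped in Python 3.7):

    from dataclasses import dataclass

    @dataclass
    class InventoryItem:
        # Each annotated class attribute becomes a field, and __init__,
        # __repr__ and __eq__ are generated from the field definitions
        name: str
        unit_price: float
        quantity: int = 0

    item = InventoryItem("widget", 1.99, quantity=3)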

The working theory for these kinds of changes is that the existing libraries aren't going away, and their maintainers often aren't all that interested in putting up with the constraints associated with standard library maintenance (in particular, the relatively slow release cadence). In such cases, it's fairly common for the documentation of the standard library version to feature a "See Also" link pointing to the original module, especially if the third party version offers additional features and flexibility that were omitted from the standard library module.

Why are some APIs added in provisional form?

While CPython does maintain an API deprecation policy, we generally prefer not to use it without a compelling justification (this is especially the case while other projects are attempting to maintain compatibility with Python 2.7). However, when adding new APIs that are inspired by existing third party ones without being exact copies of them, there's a higher than usual risk that some of the design decisions may turn out to be problematic in practice.

When we consider the risk of such changes to be higher than usual, we'll mark the related APIs as provisional, indicating that conservative end users may want to avoid relying on them at all, and that developers of shared abstraction layers may want to consider imposing stricter than usual constraints on which versions of the provisional API they're prepared to support.

Why are only some standard library APIs upgraded?

The short answer here is that the main APIs that get upgraded are those where:

- there isn't likely to be a lot of external churn driving additional updates
- there are clear benefits for either ad hoc scripting use cases or else in encouraging future interoperability between multiple third party solutions
- a credible proposal is submitted by folks interested in doing the work

If the limitations of an existing module are mainly noticeable when using the module for application development purposes (e.g. datetime), if redistributors already tend to make an improved alternative third party option readily available (e.g. requests), or if there's a genuine conflict between the release cadence of the standard library and the needs of the package in question (e.g. certifi), then the incentives to propose a change to the standard library version tend to be significantly reduced.

This is essentially the inverse of the question about PyPI above: since PyPI usually is a sufficiently good distribution mechanism for application developer experience enhancements, it makes sense for such enhancements to be distributed that way, allowing redistributors and platform providers to make their own decisions about what they want to include as part of their default offering.

Changing CPython and the standard library only comes into play when there is perceived value in changing the capabilities that can be assumed to be present by default in 3-5 years' time.

Will any parts of the standard library ever be independently versioned?

Yes. It's likely the bundling model used for ensurepip (where CPython releases bundle a recent version of pip without actually making it part of the standard library) may be applied to other modules in the future. The most probable first candidate for that treatment would be the distutils build system, as switching to such a model would allow the build system to be more readily kept consistent across multiple releases.

Other potential candidates for this kind of treatment would be the Tcl/Tk graphics bindings and the IDLE editor, which are already unbundled and turned into optional addon installations by a number of redistributors.

Why do these considerations matter?

By the very nature of things, the folks that tend to be most actively involved in open source development are those folks working on open source applications and shared abstraction layers.

The folks writing ad hoc scripts or designing educational exercises for their students often won't even think of themselves as software developers - they're teachers, system administrators, data analysts, quants, epidemiologists, physicists, biologists, business analysts, market researchers, animators, graphical designers, etc.

When all we have to worry about for a language is the application developer experience, then we can make a lot of simplifying assumptions around what people know, the kinds of tools they're using, the kinds of development processes they're following, and the ways they're going to be building and deploying their software.

Things get significantly more complicated when an application runtime also enjoys broad popularity as a scripting engine. Doing either job well is already difficult, and balancing the needs of both audiences as part of a single project leads to frequent incomprehension and disbelief on both sides.

This post isn't intended to claim that we never make incorrect decisions as part of the CPython development process - it's merely pointing out that the most reasonable reaction to seemingly nonsensical feature additions to the Python standard library is going to be "I'm not part of the intended target audience for that addition", rather than "I have no interest in that, so it must be a useless and pointless addition of no value to anyone, added purely to annoy me".

The Python Packaging Ecosystem

Nick Coghlan, 2016-09-17

There have been a few recent articles reflecting on the current status of the Python packaging ecosystem from an end user perspective, so it seems worthwhile for me to write up my perspective as one of the lead architects for that ecosystem: how I characterise the overall problem space of software publication and distribution, where I think we are at the moment, and where I'd like to see us go in the future.

For context, the specific articles I'm replying to are:

- Python Packaging is Good Now (Glyph Lefkowitz)
- Conda: Myths and Misconceptions (Jake VanderPlas)
- Python Packaging at PayPal (Mahmoud Hashemi)

These are all excellent pieces considering the problem space from different perspectives, so if you'd like to learn more about the topics I cover here, I highly recommend reading them.

My core software ecosystem design philosophy

Since it heavily influences the way I think about packaging system design in general, it's worth stating my core design philosophy explicitly:

- As a software consumer, I should be able to consume libraries, frameworks, and applications in the binary format of my choice, regardless of whether or not the relevant software publishers directly publish in that format
- As a software publisher working in the Python ecosystem, I should be able to publish my software once, in a single source-based format, and have it be automatically consumable in any binary format my users care to use

This is emphatically not the way many software packaging systems work - for a great many systems, the publication format and the consumption format are tightly coupled, and the folks managing the publication format or the consumption format actively seek to use it as a lever of control over a commercial market (think operating system vendor controlled application stores, especially for mobile devices).

While we're unlikely to ever pursue the specific design documented in the rest of the PEP (hence the "Deferred" status), the "Development, Distribution, and Deployment of Python Software" section of PEP 426 provides additional details on how this philosophy applies in practice.

I'll also note that while I now work on software supply chain management tooling at Red Hat, that wasn't the case when I first started actively participating in the upstream Python packaging ecosystem design process. Back then I was working on Red Hat's main hardware integration testing system, Beaker, and growing increasingly frustrated with the level of effort involved in integrating new Python level dependencies into Beaker's RPM based development and deployment model. Getting actively involved in tackling these problems on the Python upstream side of things then led to also getting more actively involved in addressing them on the Red Hat downstream side.

The key conundrum

When talking about the design of software packaging ecosystems, it's very easy to fall into the trap of only considering the "direct to peer developers" use case, where the software consumer we're attempting to reach is another developer working in the same problem domain that we are, using a similar set of development tools. Common examples of this include:

- Linux distro developers publishing software for use by other contributors to the same Linux distro ecosystem
- web service developers publishing software for use by other web service developers
- data scientists publishing software for use by other data scientists

In these more constrained contexts, you can frequently get away with using a single toolchain for both publication and consumption:

- Linux: just use the system package manager for the relevant distro
- web services: just use the Python Packaging Authority's twine for publication and pip for consumption
- data science: just use conda for everything

For newer languages that start in one particular domain with a preferred package manager and expand outwards from there, the apparent simplicity arising from this homogeneity of use cases may frequently be attributed as an essential property of the design of the package manager, but that perception of inherent simplicity will typically fade if the language is able to successfully expand beyond the original niche its default package manager was designed to handle.

In the case of Python, for example, distutils was designed as a consistent build interface for Linux distro package management, setuptools for plugin management in the Open Source Applications Foundation's Chandler project, pip for dependency management in web service development, and conda for local language-independent environment management in data science.

distutils and setuptools haven't fared especially well from a usability perspective when pushed beyond their original design parameters (hence the current efforts to make it easier to use full-fledged build systems like Scons and Meson as an alternative when publishing Python packages), while pip and conda both seem to be doing a better job of accommodating increases in their scope of application.

This history helps illustrate that where things really have the potential to get complicated (even beyond the inherent challenges of domain-specific software distribution) is when you start needing to cross domain boundaries.

For example, as the lead maintainer of contextlib in the Python standard library, I'm also the maintainer of the contextlib2 backport project on PyPI. That's not a domain specific utility - folks may need it regardless of whether they're using a self-built Python runtime, a pre-built Windows or Mac OS X binary they downloaded from python.org, a pre-built binary from a Linux distribution, a CPython runtime from some other redistributor (homebrew, pyenv, Enthought Canopy, ActiveState, Continuum Analytics, AWS Lambda, Azure Machine Learning, etc), or perhaps even a different Python runtime entirely (PyPy, PyPy.js, Jython, IronPython, MicroPython, VOC, Batavia, etc).

Fortunately for me, I don't need to worry about all that complexity in the wider ecosystem when I'm specifically wearing my contextlib2 maintainer hat - I just publish an sdist and a universal wheel file to PyPI, and the rest of the ecosystem has everything it needs to take care of redistribution and end user consumption without any further input from me.

However, contextlib2 is a pure Python project that only depends on the standard library, so it's pretty much the simplest possible case from a tooling perspective (the only reason I needed to upgrade from distutils to setuptools was so I could publish my own wheel files, and the only reason I haven't switched to the much simpler pure-Python-only flit instead of either of them is that flit doesn't yet easily support publishing backwards compatible setup.py based sdists).

This means that things get significantly more complex once we start wanting to use and depend on components written in languages other than Python, so that's the broader context I'll consider next.

Platform management or plugin management?

When it comes to handling the software distribution problem in general, there are two main ways of approaching it:

- design a plugin management system that doesn't concern itself with the management of the application framework that runs the plugins
- design a platform component manager that not only manages the plugins themselves, but also the application frameworks that run them

This "plugin manager or platform component manager?" question shows up over and over again in software distribution architecture designs, but the case of most relevance to Python developers is in the contrasting approaches that pip and conda have adopted to handling the problem of external dependencies for Python projects:

- pip is a plugin manager for Python runtimes. Once you have a Python runtime (any Python runtime), pip can help you add pieces to it. However, by design, it won't help you manage the underlying Python runtime (just as it wouldn't make any sense to try to install Mozilla Firefox as a Firefox Add-On, or Google Chrome as a Chrome Extension)
- conda, by contrast, is a component manager for a cross-platform platform that provides its own Python runtimes (as well as runtimes for other languages). This means that you can get pre-integrated components, rather than having to do your own integration between plugins obtained via pip and language runtimes obtained via other means

What this means is that pip, on its own, is not in any way a direct alternative to conda. To get comparable capabilities to those offered by conda, you have to add in a mechanism for obtaining the underlying language runtimes, which means the alternatives are combinations like:

- apt-get + pip
- dnf + pip
- yum + pip
- pyenv + pip
- homebrew (Mac OS X) + pip
- python.org Windows installer + pip
- Enthought Canopy
- ActiveState's Python runtime + PyPM

This is the main reason why "just use conda" is excellent advice to any prospective Pythonista that isn't already using one of the platform component managers mentioned above: giving that answer replaces an otherwise operating system dependent or Python specific answer to the runtime management problem with a cross-platform and (at least somewhat) language neutral one.

It's an especially good answer for Windows users, as Chocolatey/OneGet/Windows Package Management isn't remotely comparable to pyenv or homebrew at this point in time, other runtime managers don't work on Windows, and getting folks bootstrapped with MinGW, Cygwin or the new (still experimental) Windows Subsystem for Linux is just another hurdle to place between them and whatever goal they're learning Python for in the first place.

However, conda's pre-integration based approach to tackling the external dependency problem is also why "just use conda for everything" isn't a sufficient answer for the Python software ecosystem as a whole.

If you're working on an operating system component for Fedora, Debian, or any other distro, you actually want to be using the system provided Python runtime, and hence need to be able to readily convert your upstream Python dependencies into policy compliant system dependencies.

Similarly, if you're wanting to support folks that deploy to a preconfigured Python environment in services like AWS Lambda, Azure Cloud Functions, Heroku, OpenShift or Cloud Foundry, or that use alternative Python runtimes like PyPy or MicroPython, then you need a publication technology that doesn't tightly couple your releases to a specific version of the underlying language runtime.

As a result, pip and conda end up existing at slightly different points in the system integration pipeline:

- Publishing and consuming Python software with pip is a matter of "bring your own Python runtime". This has the benefit that you can readily bring your own runtime (and manage it using whichever tools make sense for your use case), but also has the downside that you must supply your own runtime (which can sometimes prove to be a significant barrier to entry for new Python users, as well as being a pain for cross-platform environment management).
- Like Linux system package managers before it, conda takes away the requirement to supply your own Python runtime by providing one for you. This is great if you don't have any particular preference as to which runtime you want to use, but if you do need to use a different runtime for some reason, you're likely to end up fighting against the tooling, rather than having it help you. (If you're tempted to answer "Just add another interpreter to the pre-integrated set!" here, keep in mind that doing so without the aid of a runtime independent plugin manager like pip acts as a multiplier on the platform level integration testing needed, which can be a significant cost even when it's automated.)

Where do we go next?

In case it isn't already clear from the above, I'm largely happy with the respective niches that pip and conda are carving out for themselves as a plugin manager for Python runtimes and as a cross-platform platform focused on (but not limited to) data analysis use cases.

However, there's still plenty of scope to improve the effectiveness of the collaboration between the upstream Python Packaging Authority and downstream Python redistributors, as well as to reduce barriers to entry for participation in the ecosystem in general, so I'll go over some of the key areas I see for potential improvement.

Sustainability and the bystander effect

It's not a secret that the core PyPA infrastructure (PyPI, pip, twine, setuptools) is nowhere near as well-funded as you might expect given its criticality to the operations of some truly enormous organisations.

The biggest impact of this is that even when volunteers show up ready and willing to work, there may not be anybody in a position to effectively wrangle those volunteers, and help keep them collaborating effectively and moving in a productive direction.

To secure long term sustainability for the core Python packaging infrastructure, we're only talking amounts on the order of a few hundred thousand dollars a year - enough to cover some dedicated operations and publisher support staff for PyPI (freeing up the volunteers currently handling those tasks to help work on ecosystem improvements), as well as to fund targeted development directed at some of the other problems described below.

However, rather than being a true "tragedy of the commons", I personally chalk this situation up to a different human cognitive bias: the bystander effect. The reason I think that is that we have so many potential sources of the necessary funding that even folks that agree there's a problem that needs to be solved are assuming that someone else will take care of it, without actually checking whether or not that assumption is entirely valid.

The primary responsibility for correcting that oversight falls squarely on the Python Software Foundation, which is why the Packaging Working Group was formed in order to investigate possible sources of additional funding, as well as to determine how any such funding can be spent most effectively.

However, a secondary responsibility also falls on customers and staff of commercial Python redistributors, as this is exactly the kind of ecosystem level risk that commercial redistributors are being paid to manage on behalf of their customers, and they're currently not handling this particular situation very well. Accordingly, anyone that's actually paying for CPython, pip, and related tools (either directly or as a component of a larger offering), and expecting them to be supported properly as a result, really needs to be asking some very pointed questions of their suppliers right about now. (Here's a sample question: "We pay you X dollars a year, and the upstream Python ecosystem is one of the things we expect you to support with that revenue. How much of what we pay you goes towards maintenance of the upstream Python packaging infrastructure that we rely on every day?")

One key point to note about the current situation is that as a 501(c)(3) public interest charity, any work the PSF funds will be directed towards better fulfilling that public interest mission, and that means focusing primarily on the needs of educators and non-profit organisations, rather than those of private for-profit entities.

Commercial redistributors are thus far better positioned to properly represent their customers' interests in areas where their priorities may diverge from those of the wider community (closing the "insider threat" loophole in PyPI's current security model is a particular case that comes to mind - see "Making PyPI security independent of SSL/TLS" below).

Migrating PyPI to pypi.org

An instance of the new PyPI implementation (Warehouse) is up and running at https://pypi.org/ and connected directly to the production PyPI database, so folks can already explicitly opt in to using it over the legacy implementation if they prefer to do so. However, there's still a non-trivial amount of design, development and QA work needed on the new version before all existing traffic can be transparently switched over to using it.

Getting at least this step appropriately funded and a clear project management plan in place is the main current focus of the PSF's Packaging Working Group.

Making the presence of a compiler on end user systems optional

Between the wheel format and the manylinux1 usefully-distro-independent ABI definition, this is largely handled now, with conda available as an option to handle the relatively small number of cases that are still a problem for pip.

The main unsolved problem is to allow projects to properly express the constraints they place on target environments, so that issues can be detected at install time or repackaging time, rather than only being detected as runtime failures. Such a feature will also greatly expand the ability to correctly generate platform level dependencies when converting Python projects to downstream package formats like those used by conda and Linux system package managers.

Bootstrapping dependency management tools on end user systems

With pip being bundled with recent versions of CPython (including CPython 2.7 maintenance releases), and pip (or a variant like upip) also being bundled with most other Python runtimes, the ecosystem bootstrapping problem has largely been addressed for new Python users.

There are still a few usability challenges to be addressed (like defaulting to per-user installations when outside a virtual environment, interoperating more effectively with platform component managers like conda, and providing an officially supported installation interface that works at the Python prompt rather than via the operating system command line), but those don't require the same level of political coordination across multiple groups that was needed to establish pip as the lowest common denominator approach to dependency management for Python applications.
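
As a small sketch of what that baseline looks like in practice, the standard library alone can stand up an isolated environment with a working installer (the directory name here is arbitrary):

    import venv

    # Creates ./demo-env with its own site-packages and, thanks to the
    # ensurepip integration, a working pip - no separate download needed
    venv.create("demo-env", with_pip=True)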

Making the use of distutils and setuptools optional

As mentioned above, distutils was designed ~18 years ago as a common interface for Linux distributions to build Python projects, while setuptools was designed ~12 years ago as a plugin management system for an open source Microsoft Exchange replacement. While both projects have given admirable service in their original target niches, and quite a few more besides, their age and original purpose means they're significantly more complex than what a user needs if all they want to do is to publish their pure Python library or framework to the Python Package Index.

Their underlying complexity also makes it incredibly difficult to improve the problematic state of their documentation, which is split between the legacy distutils documentation in the CPython standard library and the additional setuptools specific documentation in the setuptools project.

Accordingly, what we want to do is to change the way build toolchains for Python projects are organised to have 3 clearly distinct tiers:

- toolchains for pure Python projects
- toolchains for Python projects with simple C extensions
- toolchains for C/C++/other projects with Python bindings

This allows folks to be introduced to simpler tools like flit first, better enables the development of potential alternatives to setuptools at the second tier, and supports the use of full-fledged pip-installable build systems like Scons and Meson at the third tier.

The first step in this project, defining the pyproject.toml format to allow declarative specification of the dependencies needed to launch setup.py, has been implemented, and Daniel Holth's enscons project demonstrates that that is already sufficient to bootstrap an external build system even without the later stages of the project.

Future steps include providing native support for pyproject.toml in pip and easy_install, as well as defining a declarative approach to invoking the build system rather than having to run setup.py with the relevant distutils & setuptools flags.

Making PyPI security independent of SSL/TLS

PyPI currently relies entirely on SSL/TLS to protect the integrity of the link between software publishers and PyPI, and between PyPI and software consumers. The only protections against insider threats from within the PyPI administration team are ad hoc usage of GPG artifact signing by some projects, personal vetting of new team members by existing team members, and third party checks against previously published artifact hashes unexpectedly changing.

A credible design for end-to-end package signing that adequately accounts for the significant usability issues that can arise around publisher and consumer key management has been available for almost 3 years at this point (see "Surviving a Compromise of PyPI" and "Surviving a Compromise of PyPI: the Maximum Security Edition").

However, implementing that solution has been gated not only on being able to first retire the legacy infrastructure, but also on the PyPI administrators being able to credibly commit to the key management obligations of operating the signing system, as well as to ensuring that the system-as-implemented actually provides the security guarantees of the system-as-designed.

Accordingly, this isn't a project that can realistically be pursued until the underlying sustainability problems have been suitably addressed.

Automating wheel creation

While redistributors will generally take care of converting upstream Python packages into their own preferred formats, the Python-specific wheel format is currently a case where it is left up to publishers to decide whether or not to create them, and if they do decide to create them, how to automate that process.

Having PyPI take care of this process automatically is an obviously desirable feature, but it's also an incredibly expensive one to build and operate. Thus, it currently makes sense to defer this cost to individual projects, as there are quite a few commercial continuous integration and continuous deployment service providers willing to offer free accounts to open source projects, and these can also be used for the task of producing release artifacts. Projects also remain free to only publish source artifacts, relying on pip's implicit wheel creation and caching, and the appropriate use of private PyPI mirrors and caches, to meet the needs of end users.

For downstream platform communities already offering shared build infrastructure to their members (such as Linux distributions and conda-forge), it may make sense to offer Python wheel generation as a supported output option for cross-platform development use cases, in addition to the platform's native binary packaging format.
It's tempting, and entirely understandable, to want to chalk this difference in ease of use up to requests being "Pythonic" (in 2016 terms), while urllib has now become un-Pythonic (despite being included in the standard library). While there are certainly a few elements of that (e.g. the property builtin was only added in Python 2.2, while urllib2 was included in the original Python 2.0 release and hence couldn't take that into account in its API design), the vast majority of the usability difference relates to an entirely different question we often forget to ask about the software we use: What problem does it solve?

That is, many otherwise surprising discrepancies between urllib/urllib2 and requests are best explained by the fact that they solve different problems, and the problems most HTTP client developers have today are closer to those Kenneth Reitz designed requests to solve in 2010/2011 than they are to the problems that Jeremy Hylton was aiming to solve more than a decade earlier.

It's all in the name

To quote the current Python 3 urllib package documentation: "urllib is a package that collects several modules for working with URLs". And the docstring from Jeremy's original commit message adding urllib2 to CPython: "An extensible library for opening URLs using a variety [of] protocols".

Wait, what? We're just trying to write a HTTP client, so why is the documentation talking about working with URLs in general?

While it may seem strange to developers accustomed to the modern HTTPS+JSON powered interactive web, it wasn't always clear that that was how things were going to turn out. At the turn of the century, the expectation was instead that we'd retain a rich variety of data transfer protocols with different characteristics optimised for different purposes, and that the most useful client to have in the standard library would be one that could be used to talk to multiple different kinds of servers (like HTTP, FTP, NFS, etc), without client developers needing to worry too much about the specific protocol used (as indicated by the URL schema).

In practice, things didn't work out that way (mostly due to restrictive institutional firewalls meaning HTTP servers were the only remote services that could be accessed reliably), so folks in 2016 are now regularly comparing the usability of a dedicated HTTP(S)-only client library with a general purpose URL handling library that needs to be configured to specifically be using HTTP(S) before you gain access to most HTTP(S) features.

When it was written, urllib2 was a square peg that was designed to fit into the square hole of "generic URL processing". By contrast, most modern client developers are looking for a round peg to fit into the round hole that is HTTPS+JSON processing - urllib/urllib2 will fit if you shave the corners off first, but requests comes pre-rounded.

So why not add requests to the standard library?

Answering the not-so-obvious question of "What problem does it solve?" then leads to a more obvious follow-up question: if the problems that urllib/urllib2 were designed to solve are no longer common, while the problems that requests solves are common, why not add requests to the standard library?

If I recall correctly, Guido gave in-principle approval to this idea at a language summit back in 2013 or so (after the requests 1.0 release), and it's a fairly common assumption amongst the core development team that either requests itself (perhaps as a bundled snapshot of an independently upgradable component) or a compatible subset of the API with a different implementation will eventually end up in the standard library. However, even putting aside the misgivings of the requests developers about the idea, there are still some non-trivial system integration problems to solve in getting requests to a point where it would be acceptable as a standard library component.

In particular, one of the things that requests does to more reliably handle SSL/TLS certificates in a cross-platform way is to bundle the Mozilla Certificate Bundle included in the certifi project. This is a sensible thing to do by default (due to the difficulties of obtaining reliable access to system security certificates in a cross-platform way), but it conflicts with the security policy of the standard library, which specifically aims to delegate certificate management to the underlying operating system. That policy aims to address two needs: allowing Python applications access to custom institutional certificates added to the system certificate store (most notably, private CA certificates for large organisations), and avoiding adding an additional certificate store to end user systems that needs to be updated when the root certificate bundle changes for any other reason.
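The contrast is easy to see in code (a sketch; certifi is the third party package that provides the Mozilla bundle snapshot requests relies on):

    import ssl
    import certifi

    # Standard library policy: trust whatever the operating system trusts
    system_context = ssl.create_default_context()

    # The requests/certifi approach: trust a bundled snapshot of Mozilla's roots
    bundled_context = ssl.create_default_context(cafile=certifi.where())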
These kinds of problems are technically solvable, but they're not fun to solve, and the folks in a position to help solve them already have a great many other demands on their time. This means we're not likely to see much in the way of progress in this area as long as most of the CPython and requests developers are pursuing their upstream contributions as a spare time activity, rather than as something they're specifically employed to do.

Propose a talk for the PyCon Australia Education Seminar! Nick Coghlan 2016-05-02 01:29 Comments

The pitch

Involved in Australian education, whether formally or informally? Making use of Python in your classes, workshops or other activities? Interested in sharing your efforts with other Australian educators, and with the developers that create the tools you use? Able to get to the Melbourne Convention & Exhibition Centre on Friday August 12th, 2016?

Then please consider submitting a proposal to speak at the Python in Australian Education seminar at PyCon Australia 2016! More information about the seminar can be found here, while details of the submission process are on the main Call for Proposals page. Submissions close on Sunday May 8th, but may be edited further after submission (including during the proposal review process based on feedback from reviewers).

PyCon Australia is a community-run conference, so everyone involved is a volunteer (organisers, reviewers, and speakers alike), but accepted speakers are eligible for discounted (or even free) registration, and assistance with other costs is also available to help ensure the conference doesn't miss out on excellent presentations due to financial need (for teachers needing to persuade skeptical school administrators, this assistance may extend to contributing towards the costs of engaging a substitute teacher for the day).

The background

At PyCon Australia 2014, James Curran presented an excellent keynote on "Python for Every Child in Australia", covering some of the history of the National Computer Science School, the development of Australia's National Digital Curriculum (finally approved in September 2015), and the opportunity this represented to introduce the next generation of students to computational thinking in general, and Python in particular.

Encouraged by both Dr Curran's keynote at PyCon Australia, and Professor Lorena Barba's "If There's Computational Thinking, There's Computational Learning" keynote at SciPy 2014, it was my honour and privilege in 2015 not only to invite Carrie Anne Philbin, Education Pioneer at the UK's Raspberry Pi Foundation, to speak at the main conference (on "Designed for Education: a Python Solution"), but also to invite her to keynote the inaugural Python in Australian Education seminar.
With the support of the Python Software Foundation and Code Club Australia, Carrie Anne joined QSITE's Peter Whitehouse, Code Club Australia's Kelly Tagalan, and several other local educators, authors and community workshop organisers to present an informative, inspirational and sometimes challenging series of talks.

For 2016, we have a new location in Melbourne (PyCon Australia has a two year rotation in each city, and the Education seminar was launched during the second year in Brisbane), a new co-organiser (Katie Bell of Grok Learning and the National Computer Science School), and a Call for Proposals and financial assistance program that are fully integrated with those for the main conference.

As with the main conference, however, the Python in Australian Education seminar is designed around the idea of real world practitioners sharing information with each other about their day to day experiences, what has worked well for them, and what hasn't, and creating personal connections that can help facilitate additional collaboration throughout the year.

So, in addition to encouraging people to submit their own proposals, I'd also encourage folks to talk to their friends and peers that they'd like to see presenting, and see if they're interested in participating.

27 languages to improve your Python Nick Coghlan 2015-10-11 02:54 Comments

27 Languages
Broadening our horizons
Procedural programming: C, Rust, Cython
Object-oriented data modelling: Java, C#, Eiffel
Object-oriented C derivatives: C++, D
Array-oriented data processing: MATLAB/Octave, Julia
Statistical data analysis: R
Computational pipeline modelling: Haskell, Scala, Clojure, F#
Event driven programming: JavaScript, Go, Erlang, Elixir
Gradual typing: TypeScript
Dynamic metaprogramming: Hy, Ruby
Pragmatic problem solving: Lua, PHP, Perl
Computational thinking: Scratch, Logo

As a co-designer of one of the world's most popular programming languages, one of the more frustrating behaviours I regularly see (both in the Python community and in others) is influential people trying to tap into fears of "losing" to other open source communities as a motivating force for community contributions. (I'm occasionally guilty of this misbehaviour myself, which makes it even easier to spot when others are falling into the same trap).

While learning from the experiences of other programming language communities is a good thing, fear based approaches to motivating action are seriously problematic, as they encourage community members to see members of those other communities as enemies in a competition for contributor attention, rather than as potential allies in the larger challenge of advancing the state of the art in software development. It also has the effect of telling folks that enjoy those other languages that they're not welcome in a community that views them and their peers as "hostile competitors".

In truth, we want there to be a rich smorgasbord of cross platform open source programming languages to choose from, as programming languages are first and foremost tools for thinking - they make it possible for us to convey our ideas in terms so explicit that even a computer can understand them. If someone has found a language to use that fits their brain and solves their immediate problems, that's great, regardless of the specific language (or languages) they choose.

So I have three specific requests for the Python community, and one broader suggestion.
First, the specific requests:

1. If we find it necessary to appeal to tribal instincts to motivate action, we should avoid using tribal fear, and instead aim to use tribal pride. When we use fear as a motivator, as in phrasings like "If we don't do X, we're going to lose developer mindshare to language Y", we're deliberately creating negative emotions in folks freely contributing the results of their work to the world at large. Relying on tribal pride instead leads to phrasings like "It's currently really unclear how to solve problem X in Python. If we look to ecosystem Y, we can see they have a really nice approach to solving problem X that we can potentially adapt to provide a similarly nice user experience in Python". Actively emphasising taking pride in our own efforts, rather than denigrating the efforts of others, helps promote a culture of continuous learning within the Python community and also encourages the development of ever improving collaborative relationships with other communities.

2. Refrain from adopting attitudes of contempt towards other open source programming language communities, especially if those communities have empowered people to solve their own problems rather than having to wait for commercial software vendors to deign to address them. Most of the important problems in the world aren't profitable to solve (as the folks afflicted by them aren't personally wealthy and don't control institutional funding decisions), so we should be encouraging and applauding the folks stepping up to try to solve them, regardless of what we may think of their technology choices.

3. If someone we know is learning to program for the first time, and they choose to learn a language we don't personally like, we should support them in their choice anyway. They know what fits their brain better than we do, so the right language for us may not be the right language for them. If they start getting frustrated with their original choice, to the point where it's demotivating them from learning to program at all, then it makes sense to start recommending alternatives. This advice applies even for those of us involved in improving the tragically bad state of network security: the way we solve the problem with inherently insecure languages is by improving operating system sandboxing capabilities, progressively knocking down barriers to adoption for languages with better native security properties, and improving the default behaviours of existing languages, not by confusing beginners with arguments about why their chosen language is a poor choice from an application security perspective. (If folks are deploying unaudited software written by beginners to handle security sensitive tasks, it isn't the folks writing the software that are the problem, it's the folks deploying it without performing appropriate due diligence on the provenance and security properties of that software)

My broader suggestion is aimed at folks that are starting to encounter the limits of the core procedural subset of Python and would hence like to start exploring more of Python's own available "tools for thinking".

Broadening our horizons

One of the things we do as part of the Python core development process is to look at features we appreciate having available in other languages we have experience with, and see whether or not there is a way to adapt them to be useful in making Python code easier to both read and write.
This means that learning another programming language that focuses more specifically on a given style of software development can help improve anyone's understanding of that style of programming in the context of Python.

To aid in such efforts, I've provided a list below of some possible areas for exploration, and other languages which may provide additional insight into those areas. Where possible, I've linked to Wikipedia pages rather than directly to the relevant home pages, as Wikipedia often provides interesting historical context that's worth exploring when picking up a new programming language as an educational exercise rather than for immediate practical use.

While I do know many of these languages personally (and have used several of them in developing production systems), the full list of recommendations includes additional languages that I only know indirectly (usually by either reading tutorials and design documentation, or by talking to folks that I trust to provide good insight into a language's strengths and weaknesses).

There are a lot of other languages that could have gone on this list, so the specific ones listed are a somewhat arbitrary subset based on my own interests (for example, I'm mainly interested in the dominant Linux, Android and Windows ecosystems, so I left out the niche-but-profitable Apple-centric Objective-C and Swift programming languages, and I'm not familiar enough with art-focused environments like Processing to even guess at what learning them might teach a Python developer). For a more complete list that takes into account factors beyond what a language might teach you as a developer, IEEE Spectrum's annual ranking of programming language popularity and growth is well worth a look.

Procedural programming: C, Rust, Cython

Python's default execution model is procedural: we start at the top of the main module and execute it statement by statement. All of Python's support for the other approaches to data and computational modelling covered below is built on this procedural foundation.

The C programming language is still the unchallenged ruler of low level procedural programming. It's the core implementation language for the reference Python interpreter, and also for the Linux operating system kernel. As a software developer, learning C is one of the best ways to start learning more about the underlying hardware that executes software applications - C is often described as "portable assembly language", and one of the first applications cross-compiled for any new CPU architecture will be a C compiler.

Rust, by contrast, is a relatively new programming language created by Mozilla. The reason it makes this list is that Rust aims to take all of the lessons we've learned as an industry regarding what not to do in C, and design a new language that is interoperable with C libraries, offers the same precise control over hardware usage that is needed in a low level systems programming language, but uses a different compile time approach to data modelling and memory management to structurally eliminate many of the common flaws afflicting C programs (such as buffer overflows, double free errors, null pointer access, and thread synchronisation problems).
I'm an embedded systems engineer by training and initial professional experience, and Rust is the first new language I've seen that looks like it may have the potential to scale down to all of the niches currently dominated by C and custom assembly code.

Cython is also a lower level procedural-by-default language, but unlike general purpose languages like C and Rust, Cython is aimed specifically at writing CPython extension modules. To support that goal, Cython is designed as a Python superset, allowing the programmer to choose when to favour the pure Python syntax for flexibility, and when to favour Cython's syntax extensions that make it possible to generate code that is equivalent to native C code in terms of speed and memory efficiency.

Learning one of these languages is likely to provide insight into memory management, algorithmic efficiency, binary interface compatibility, software portability, and other practical aspects of turning source code into running systems.

Object-oriented data modelling: Java, C#, Eiffel

One of the main things we need to do in programming is to model the state of the real world, and offering native syntactic support for object-oriented programming is one of the most popular approaches for doing that: structurally grouping data structures, and methods for operating on those data structures, into classes.

Python itself is deliberately designed so that it is possible to use the object-oriented features without first needing to learn to write your own classes. Not every language adopts that approach - those listed in this section are ones that consider learning object-oriented design to be a requirement for using the language at all.

After a major marketing push by Sun Microsystems in the mid-to-late 1990's, Java became the default language for teaching introductory computer science in many tertiary institutions. While it is now being displaced by Python for many educational use cases, it remains one of the most popular languages for the development of business applications. There are a range of other languages that target the common JVM (Java Virtual Machine) runtime, including the Jython implementation of Python. The Dalvik and ART environments for Android systems are based on a reimplementation of the Java programming APIs.

C# is similar in many ways to Java, and emerged as an alternative after Sun and Microsoft failed to work out their business differences around Microsoft's Java implementation, J++. Like Java, it's a popular language for the development of business applications, and there are a range of other languages that target the shared .NET CLR (Common Language Runtime), including the IronPython implementation of Python (the core components of the original IronPython 1.0 implementation were extracted to create the language neutral .NET Dynamic Language Runtime). For a long time, .NET was a proprietary Windows specific technology, with Mono as a cross-platform open source reimplementation, but Microsoft shifted to an open source ecosystem strategy in early 2015.

Unlike most of the languages in this list, Eiffel isn't one I'd recommend for practical day-to-day use. Rather, it's one I recommend because learning it taught me an incredible amount about good object-oriented design where "verifiably correct" is a design goal for the application. (Learning Eiffel also taught me a lot about why "verifiably correct" isn't actually a design goal in most software development, as verifiably correct software really doesn't cope well with ambiguity, and is entirely unsuitable for cases where you genuinely don't know the relevant constraints yet and need to leave yourself enough wiggle room to be able to figure out the finer details through iterative development.)

Learning one of these languages is likely to provide insight into inheritance models, design-by-contract, class invariants, pre-conditions, post-conditions, covariance, contravariance, method resolution order, generic programming, and various other notions that also apply to Python's type system. There are also a number of standard library modules and third party frameworks that use this "visibly object-oriented" design style, such as the unittest and logging modules, and class-based views in the Django web framework.
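As a toy sketch of what design-by-contract style checking looks like when emulated by hand in Python (Eiffel builds this kind of checking directly into the language, while the class here is purely illustrative):

    class BankAccount:
        """Class invariant: the balance never goes negative."""

        def __init__(self, balance=0):
            self._balance = balance
            self._check_invariant()

        def withdraw(self, amount):
            assert amount > 0, "precondition: amount must be positive"
            assert amount <= self._balance, "precondition: sufficient funds"
            self._balance -= amount
            self._check_invariant()  # re-establish the invariant on exit

        def _check_invariant(self):
            assert self._balance >= 0, "invariant violated"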
Object-oriented C derivatives: C++, D

One way of using the CPython runtime is as a "C with objects" programming environment - at its core, CPython is implemented using C's approach to object-oriented programming, which is to define C structs to hold the data of interest, and to pass in instances of the struct as the first argument to functions that then manipulate that data (these are the omnipresent PyObject* pointers in the CPython C API). This design pattern is deliberately mirrored at the Python level in the form of the explicit self and cls arguments to instance methods and class methods.

C++ is a programming language that aimed to retain full source compatibility with C, while adding higher level features like native object-oriented programming support and template based metaprogramming. It's notoriously verbose and hard to program in (although the 2011 update to the language standard addressed many of the worst problems), but it's also the language of choice in many contexts, including 3D modelling graphics engines and cross-platform application development frameworks like Qt.

The D programming language is also interesting, as it has a similar relationship to C++ as Rust has to C: it aims to keep most of the desirable characteristics of C++, while also avoiding many of its problems (like the lack of memory safety). Unlike Rust, D was not a ground up design of a new programming language from scratch - instead, D is a close derivative of C++, and while it isn't a strict C superset as C++ is, it does follow the design principle that any code that falls into the common subset of C and D must behave the same way in both languages.

Learning one of these languages is likely to provide insight into the complexities of combining higher level language features with the underlying C runtime model. Learning C++ is also likely to be useful when using Python to manipulate existing libraries and toolkits written in C++.

Array-oriented data processing: MATLAB/Octave, Julia

Array oriented programming is designed to support numerical programming models: those based on matrix algebra and related numerical methods. While Python's standard library doesn't support this directly, array oriented programming is taken into account in the language design, with a range of syntactic and semantic features being added specifically for the benefit of the third party NumPy library and similarly array-oriented tools.
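As a minimal sketch of the style (using NumPy itself), array-oriented code replaces explicit loops with whole-array operations:

    import numpy as np

    temps_c = np.array([21.5, 19.0, 23.2, 18.4])
    # A single vectorised expression converts every reading at once
    temps_f = temps_c * 9 / 5 + 32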
In many cases, the Scientific Python stack is adopted as an alternative to the proprietary MATLAB programming environment, which is used extensively for modelling, simulation and numerical data analysis in science and engineering. GNU Octave is an open source alternative that aims to be syntactically compatible with MATLAB code, allowing folks to compare and contrast the two approaches to array-oriented programming.

Julia is another relatively new language, which focuses heavily on array oriented programming and type-based function overloading.

Learning one of these languages is likely to provide insight into the capabilities of the Scientific Python stack, as well as providing opportunities to explore hardware level parallel execution through technologies like OpenCL and Nvidia's CUDA, and distributed data processing through ecosystems like Apache Spark and the Python-specific Blaze.

Statistical data analysis: R

As access to large data sets has grown, so has demand for capable freely available analytical tools for processing those data sets. One such tool is the R programming language, which focuses specifically on statistical data analysis and visualisation. Learning R is likely to provide insight into the statistical analysis capabilities of the Scientific Python stack, especially the pandas data manipulation library and the seaborn statistical visualisation library.

Computational pipeline modelling: Haskell, Scala, Clojure, F#

Object-oriented data modelling and array-oriented data processing focus a lot of attention on modelling data at rest, either in the form of collections of named attributes or as arrays of structured data. By contrast, functional programming languages emphasise the modelling of data in motion, in the form of computational flows. Learning at least the basics of functional programming can help greatly improve the structure of data transformation operations even in otherwise procedural, object-oriented or array-oriented applications.

Haskell is a functional programming language that has had a significant influence on the design of Python, most notably through the introduction of list comprehensions in Python 2.0.

Scala is an (arguably) functional programming language for the JVM that, together with Java, Python and R, is one of the four primary programming languages for the Apache Spark data analysis platform. While being designed to encourage functional programming approaches, Scala's syntax, data model, and execution model are also designed to minimise barriers to adoption for current Java programmers (hence the "arguably" - the case can be made that Scala is better categorised as an object-oriented programming language with strong functional programming support).

Clojure is another functional programming language for the JVM that is designed as a dialect of Lisp. It earns its place in this list by being the inspiration for the toolz functional programming toolkit for Python.

F# isn't a language I'm particularly familiar with myself, but seems worth noting as the preferred functional programming language for the .NET CLR.

Learning one of these languages is likely to provide insight into Python's own computational pipeline modelling tools, including container comprehensions, generators, generator expressions, the functools and itertools standard library modules, and third party functional Python toolkits like toolz.
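To make the "data in motion" framing concrete, here's a small sketch of a lazy transformation pipeline built from generator expressions and itertools:

    import itertools

    readings = [3, -1, 4, -1, 5, 9, -2, 6]

    # Each stage is lazy: nothing runs until the final result is consumed
    valid = (r for r in readings if r >= 0)
    scaled = (r * 10 for r in valid)
    first_three = itertools.islice(scaled, 3)

    print(list(first_three))  # [30, 40, 50]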
Event driven programming: JavaScript, Go, Erlang, Elixir

Computational pipelines are an excellent way to handle data transformation and analysis problems, but many problems require that an application run as a persistent service that waits for events to occur, and then handles those events. In these kinds of services, it is usually essential to be able to handle multiple events concurrently in order to be able to accommodate multiple users (or at least multiple actions) at the same time.

JavaScript was originally developed as an event handling language for web browsers, permitting website developers to respond locally to client side actions (such as mouse clicks and key presses) and events (such as the page rendering being completed). It is supported in all modern browsers, and together with the HTML5 Document Object Model, has become a de facto standard for defining the appearance and behaviour of user interfaces.

Go was designed by Google as a purpose built language for creating highly scalable web services, and has also proven to be a very capable language for developing command line applications. The most interesting aspect of Go from a programming language design perspective is its use of Communicating Sequential Processes concepts in its core concurrency model.

Erlang was designed by Ericsson as a purpose built language for creating highly reliable telephony switches and similar devices, and is the language powering the popular RabbitMQ message broker. Erlang uses the Actor model as its core concurrency primitive, passing messages between threads of execution, rather than allowing them to share data directly. While I've never programmed in Erlang myself, my first full-time job involved working with (and on) an Actor-based concurrency framework for C++ developed by an ex-Ericsson engineer, as well as developing such a framework myself based on the TSK (Task) and MBX (Mailbox) primitives in Texas Instruments' lightweight DSP/BIOS runtime (now known as TI-RTOS).

Elixir earns an entry on the list by being a language designed to run on the Erlang VM that exposes the same concurrency semantics as Erlang, while also providing a range of additional language level features to help provide a more well-rounded environment that is more likely to appeal to developers migrating from other languages like Python, Java, or Ruby.

Learning one of these languages is likely to provide insight into Python's own concurrency and parallelism support, including native coroutines, generator based coroutines, the concurrent.futures and asyncio standard library modules, third party network service development frameworks like Twisted and Tornado, the channels concept being introduced to Django, and the event handling loops in GUI frameworks.
Gradual typing: TypeScript

One of the more controversial features that landed in Python 3.5 was the new typing module, which brings a standard lexicon for gradual typing support to the Python ecosystem. For folks whose primary exposure to static typing is in languages like C, C++ and Java, this seems like an astoundingly terrible idea (hence the controversy).

Microsoft's TypeScript, which provides gradual typing for JavaScript applications, provides a better illustration of the concept. TypeScript code compiles to JavaScript code (which then doesn't include any runtime type checking), and TypeScript annotations for popular JavaScript libraries are maintained in the dedicated DefinitelyTyped repository. As Chris Neugebauer pointed out in his PyCon Australia presentation, this is very similar to the proposed relationship between Python, the typeshed type hint repository, and type inference and analysis tools like mypy.

In essence, both TypeScript and type hinting in Python are ways of writing particular kinds of tests, either as separate files (just like normal tests), or inline with the main body of the code (just like type declarations in statically typed languages). In either case, you run a separate command to actually check that the rest of the code is consistent with the available type assertions (this occurs implicitly as part of the compilation to JavaScript for TypeScript, and as an entirely optional static analysis task for Python's type hinting).
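On the Python side, the equivalent sketch looks like ordinary code plus annotations, checked by an external tool such as mypy rather than at runtime:

    from typing import List

    def mean(values: List[float]) -> float:
        return sum(values) / len(values)

    # The annotations change nothing at runtime; running a separate
    # checker is what flags a call like mean("oops") as a type error.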
Dynamic metaprogramming: Hy, Ruby

A feature folks coming to Python from languages like C, C++, C# and Java often find disconcerting is the notion that "code is data": the fact that things like functions and classes are runtime objects that can be manipulated like any other object.

Hy is a Lisp dialect that runs on both the CPython VM and the PyPy VM. Lisp dialects take the "code as data" concept to extremes, as Lisp code consists of nested lists describing the operations to be performed (the name of the language itself stands for "LISt Processor"). The great strength of Lisp-style languages is that they make it incredibly easy to write your own domain specific languages. The great weakness of Lisp-style languages is that they make it incredibly easy to write your own domain specific languages, which can sometimes make it difficult to read other people's code.

Ruby is a language that is similar to Python in many respects, but as a community is far more open to making use of dynamic metaprogramming features that are "supported, but not encouraged" in Python. This includes things like reopening class definitions to add additional methods, and using closures to implement core language constructs like iteration.

Learning one of these languages is likely to provide insight into Python's own dynamic metaprogramming support, including function and class decorators, monkeypatching, the unittest.mock standard library module, and third party object proxying modules like wrapt. (I'm not aware of any languages to learn that are likely to provide insight into Python's metaclass system, so if anyone has any suggestions on that front, please mention them in the comments. Metaclasses power features like the core type system, abstract base classes, enumeration types and runtime evaluation of gradual typing expressions.)

Pragmatic problem solving: Lua, PHP, Perl

Popular programming languages don't exist in isolation - they exist as part of larger ecosystems of redistributors (both commercial and community focused), end users, framework developers, tool developers, educators and more.

Lua is a popular programming language for embedding in larger applications as a scripting engine. Significant examples include its use as the add-on scripting language for the World of Warcraft game client, and its embedding in the RPM utility used by many Linux distributions. Compared to CPython, a Lua runtime will generally be a tenth of the size, and its weaker introspection capabilities generally make it easier to isolate from the rest of the application and the host operating system. A notable contribution from the Lua community to the Python ecosystem is the adoption of the LuaJIT FFI (Foreign Function Interface) as the basis of the JIT-friendly cffi interface library for CPython and PyPy.

PHP is another popular programming language that rose to prominence as the original "P" in the Linux-Apache-MySQL-PHP LAMP stack, due to its focus on producing HTML pages, and its broad availability on early Virtual Private Server hosting providers. For all the handwringing about conceptual flaws in various aspects of its design, it's now the basis of several widely popular open source web services, including the Drupal content management system, the WordPress blogging engine, and the MediaWiki engine that powers Wikipedia. PHP also powers important services like the Ushahidi platform for crowdsourced community reporting on distributed events.

Like PHP, Perl rose to popularity on the back of Linux. Unlike PHP, which grew specifically as a web development platform, Perl rose to prominence as a system administrator's tool, using regular expressions to string together and manipulate the output of text-based Linux operating system commands. When sh, awk and sed were no longer up to handling a task, Perl was there to take over.

Learning one of these languages isn't likely to provide any great insight into aesthetically beautiful or conceptually elegant programming language design. What it is likely to do is provide some insight into how programming language distribution and adoption works in practice, and how much that has to do with fortuitous opportunities, accidents of history, and lowering barriers to adoption by working with redistributors to be made available by default, rather than the inherent capabilities of the languages themselves.

In particular, it may provide insight into the significance of projects like CKAN, OpenStack NFV, Blender, SciPy, OpenMDAO, PyGMO, PyCUDA, the Raspberry Pi Foundation and Python's adoption by a wide range of commercial organisations, for securing ongoing institutional investment in the Python ecosystem.

Computational thinking: Scratch, Logo

Finally, I fairly regularly get into discussions with functional and object-oriented programming advocates claiming that those kinds of languages are just as easy to learn as procedural ones. I think the OOP folks have a point if we're talking about teaching through embodied computing (e.g. robotics), where the objects being modelled in software have direct real world counterparts the students can touch, like sensors, motors, and relays.

For everyone else though, I now have a standard challenge: pick up a cookbook, translate one of the recipes into the programming language you're claiming is easy to learn, and then get a student that understands the language the original cookbook was written in to follow the translated recipe. Most of the time folks don't need to actually follow through on this - just running it as a thought experiment is enough to help them realise how much prior knowledge their claim of "it's easy to learn" is assuming. (I'd love to see academic researchers perform this kind of study for real though - I'd be genuinely fascinated to read the results.)

Another way to tackle this problem though is to go learn the languages that are actually being used to start teaching computational thinking to children. One of the most popular of those is Scratch, which uses a drag-and-drop programming interface to let students manipulate a self-contained graphical environment, with sprites moving around and reacting to events in that environment. Graphical environments like Scratch are the programming equivalent of the picture books we use to help introduce children to reading and writing.

This idea of using a special purpose educational language to manipulate a graphical environment isn't new though, with one of the earliest incarnations being the Logo environment created back in the 1960's. In Logo (and similar environments like Python's own turtle module), the main thing you're interacting with is a "turtle", which you can instruct to move around and modify its environment by drawing lines. This way, concepts like command sequences, repetition, and state (e.g. "pen up", "pen down") can be introduced in a way that builds on people's natural intuitions ("imagine you're the turtle, what's going to happen if you turn right 90 degrees?").
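Python's own turtle module makes it easy to try that intuition out directly - a small sketch that draws a square:

    import turtle

    t = turtle.Turtle()
    for _ in range(4):
        t.forward(100)  # walk forward 100 steps...
        t.right(90)     # ...then turn right 90 degrees
    turtle.done()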
Going back and relearning one of these languages as an experienced programmer is most useful as a tool for unlearning: the concepts they introduce help remind us that these are concepts that we take for granted now, but needed to learn at some point as beginners. When we do that, we're better able to work effectively with students and other newcomers, as we're more likely to remember to unpack our chains of logic, including the steps we'd otherwise take for granted.

TCP echo client and server in Python 3.5 Nick Coghlan 2015-07-11 06:31 Comments

This is a follow-on from my previous post on Python 3.5's new async/await syntax. Rather than the simple background timers used in the original post, this one will look at the impact native coroutine support has on the TCP echo client and server examples from the asyncio documentation.

First, we'll recreate the run_in_foreground helper defined in the previous post. This helper function makes it easier to work with coroutines from otherwise synchronous code (like the interactive prompt):

import asyncio

def run_in_foreground(task, *, loop=None):
    """Runs event loop in current thread until the given task completes

    Returns the result of the task.
    For more complex conditions, combine with asyncio.wait()
    To include a timeout, combine with asyncio.wait_for()
    """
    if loop is None:
        loop = asyncio.get_event_loop()
    return loop.run_until_complete(asyncio.ensure_future(task, loop=loop))

Next we'll define the coroutine for our TCP echo server implementation, which simply waits to receive up to 100 bytes on each new client connection, and then sends that data back to the client:

async def handle_tcp_echo(reader, writer):
    data = await reader.read(100)
    message = data.decode()
    addr = writer.get_extra_info('peername')
    print("-> Server received %r from %r" % (message, addr))
    print("<- Server sending: %r" % message)
    writer.write(data)
    await writer.drain()
    print("-- Terminating connection on server")
    writer.close()

And then the client coroutine we'll use to send a message and wait for a response:

async def tcp_echo_client(message, port, loop=None):
    reader, writer = await asyncio.open_connection('127.0.0.1', port, loop=loop)
    print('-> Client sending: %r' % message)
    writer.write(message.encode())
    data = (await reader.read(100)).decode()
    print('<- Client received: %r' % data)
    print('-- Terminating connection on client')
    writer.close()
    return data

We then use our run_in_foreground helper to interact with these coroutines from the interactive prompt.
First, we start the echo server:

>>> make_server = asyncio.start_server(handle_tcp_echo, '127.0.0.1')
>>> server = run_in_foreground(make_server)

Conveniently, since this is a coroutine running in the current thread, rather than in a different thread, we can retrieve the details of the listening socket immediately, including the automatically assigned port number:

>>> server.sockets[0]
<socket.socket ...>
>>> port = server.sockets[0].getsockname()[1]

Since we haven't needed to hardcode the port number, if we want to define a second server, we can easily do that as well:

>>> make_server2 = asyncio.start_server(handle_tcp_echo, '127.0.0.1')
>>> server2 = run_in_foreground(make_server2)
>>> server2.sockets[0]
<socket.socket ...>
>>> port2 = server2.sockets[0].getsockname()[1]

Now, both of these servers are configured to run directly in the main thread's event loop, so trying to talk to them using a synchronous client wouldn't work. The client would block the main thread, and the servers wouldn't be able to process incoming connections. That's where our asynchronous client coroutine comes in: if we use that to send messages to the server, then it doesn't block the main thread either, and both the client and server coroutines can process incoming events of interest. That gives the following results:

>>> print(run_in_foreground(tcp_echo_client('Hello World!', port)))
-> Client sending: 'Hello World!'
-> Server received 'Hello World!' from ('127.0.0.1', 44386)
<- Server sending: 'Hello World!'
-- Terminating connection on server
<- Client received: 'Hello World!'
-- Terminating connection on client
Hello World!

Note something important here: you will get exactly that sequence of output messages, as this is all running in the interpreter's main thread, in a deterministic order. If the servers were running in their own threads, we wouldn't have that property (and reliably getting access to the port numbers the server components were assigned by the underlying operating system would also have been far more difficult).

And to demonstrate both servers are up and running:

>>> print(run_in_foreground(tcp_echo_client('Hello World!', port2)))
-> Client sending: 'Hello World!'
-> Server received 'Hello World!' from ('127.0.0.1', 44419)
<- Server sending: 'Hello World!'
-- Terminating connection on server
<- Client received: 'Hello World!'
-- Terminating connection on client
Hello World!

That then raises an interesting question: how would we send messages to the two servers in parallel, while still only using a single thread to manage the client and server coroutines? For that, we'll need another of our helper functions from the previous post, schedule_coroutine:

def schedule_coroutine(target, *, loop=None):
    """Schedules target coroutine in the given event loop

    If not given, *loop* defaults to the current thread's event loop

    Returns the scheduled task.
""" if asyncio.iscoroutine(target): return asyncio.ensure_future(target, loop=loop) raise TypeError("target must be a coroutine, " "not {!r}".format(type(target)))Update: As with the previous post, this post originally suggested acombined "run_in_background" helper function that handled both schedulingcoroutines and calling arbitrary callables in a background thread or process.On further reflection, I decided that was unhelpfully conflating two differentconcepts, so I replaced it with separate "schedule_coroutine" and"call_in_background" helpersFirst, we set up the two client operations we want to run in parallel:>>> echo1 = schedule_coroutine(tcp_echo_client('Hello World!', port))>>> echo2 = schedule_coroutine(tcp_echo_client('Hello World!', port2))Then we use the asyncio.wait function in combination with run_in_foregroundto run the event loop until both operations are complete:>>> run_in_foreground(asyncio.wait([echo1, echo2]))-> Client sending: 'Hello World!'-> Client sending: 'Hello World!'-> Server received 'Hello World!' from ('127.0.0.1', 44461)<- Server sending: 'Hello World!'-- Terminating connection on server-> Server received 'Hello World!' from ('127.0.0.1', 44462)<- Server sending: 'Hello World!'-- Terminating connection on server<- Client received: 'Hello World!'-- Terminating connection on client<- Client received: 'Hello World!'-- Terminating connection on client({:1> result='Hello World!'>, :1> result='Hello World!'>}, set())And finally, we retrieve our results using the result method of the taskobjects returned by schedule_coroutine:>>> echo1.result()'Hello World!'>>> echo2.result()'Hello World!'We can set up as many concurrent background tasks as we like, and then useasyncio.wait as the foreground task to wait for them all to complete.But what if we had an existing blocking client function that we wanted orneeded to use (e.g. we're using an asyncio server to test a synchronousclient API). To handle that case, we use our third helper function from theprevious post:def call_in_background(target, *, loop=None, executor=None): """Schedules and starts target callable as a background task If not given, *loop* defaults to the current thread's event loop If not given, *executor* defaults to the loop's default executor Returns the scheduled task. 
""" if loop is None: loop = asyncio.get_event_loop() if callable(target): return loop.run_in_executor(executor, target) raise TypeError("target must be a callable, " "not {!r}".format(type(target)))To explore this, we'll need a blocking client, which we can build based onPython's existingsocket programming HOWTO guide:import socketdef tcp_echo_client_sync(message, port): conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM) print('-> Client connecting to port: %r' % port) conn.connect(('127.0.0.1', port)) print('-> Client sending: %r' % message) conn.send(message.encode()) data = conn.recv(100).decode() print('<- Client received: %r' % data) print('-- Terminating connection on client') conn.close() return dataWe can then use functools.partial in combination with call_in_background tostart client requests in multiple operating system level threads:>>> query_server = partial(tcp_echo_client_sync, "Hello World!", port)>>> query_server2 = partial(tcp_echo_client_sync, "Hello World!", port2)>>> bg_call = call_in_background(query_server)-> Client connecting to port: 35876-> Client sending: 'Hello World!'>>> bg_call2 = call_in_background(query_server2)-> Client connecting to port: 41672-> Client sending: 'Hello World!'Here we see that, unlike our coroutine clients, the synchronous clients havestarted running immediately in a separate thread. However, because the eventloop isn't currently running in the main thread, they've blocked waiting fora response from the TCP echo servers. As with the coroutine clients, weaddress that by running the event loop in the main thread until our clientshave both received responses:>>> run_in_foreground(asyncio.wait([bg_call, bg_call2]))-> Server received 'Hello World!' from ('127.0.0.1', 52585)<- Server sending: 'Hello World!'-- Terminating connection on server-> Server received 'Hello World!' from ('127.0.0.1', 34399)<- Server sending: 'Hello World!'<- Client received: 'Hello World!'-- Terminating connection on server-- Terminating connection on client<- Client received: 'Hello World!'-- Terminating connection on client({, }, set())>>> bg_call.result()'Hello World!'>>> bg_call2.result()'Hello World!' Background tasks in Python 3.5 Nick Coghlan 2015-07-10 08:17 Comments One of the recurring questions with asyncio is "How do I execute one or twooperations asynchronously in an otherwise synchronous application?"Say, for example, I have the following code:>>> import itertools, time>>> def ticker():... for i in itertools.count():... print(i)... time.sleep(1)...>>> ticker()0123^CTraceback (most recent call last): File "", line 1, in File "", line 4, in tickerKeyboardInterruptWith the native coroutine syntax coming in Python 3.5, I can change thatsynchronous code into event-driven asynchronous code easily enough:import asyncio, itertoolsasync def ticker(): for i in itertools.count(): print(i) await asyncio.sleep(1)But how do I arrange for that ticker to start running in the background? What'sthe coroutine equivalent of appending & to a shell command?It turns out it looks something like this:import asynciodef schedule_coroutine(target, *, loop=None): """Schedules target coroutine in the given event loop If not given, *loop* defaults to the current thread's event loop Returns the scheduled task. 
""" if asyncio.iscoroutine(target): return asyncio.ensure_future(target, loop=loop) raise TypeError("target must be a coroutine, " "not {!r}".format(type(target)))Update: This post originally suggested a combined "run_in_background"helper function that handle both scheduling coroutines and calling arbitrarycallables in a background thread or process. On further reflection, I decidedthat was unhelpfully conflating two different concepts, so I replaced it withseparate "schedule_coroutine" and "call_in_background" helpersSo now I can do:>>> import itertools>>> async def ticker():... for i in itertools.count():... print(i)... await asyncio.sleep(1)...>>> ticker1 = schedule_coroutine(ticker())>>> ticker1:1>>But how do I run that for a while? The event loop won't run unless the currentthread starts it running and either stops when a particular event occurs, orwhen explicitly stopped. Another helper function covers that:def run_in_foreground(task, *, loop=None): """Runs event loop in current thread until the given task completes Returns the result of the task. For more complex conditions, combine with asyncio.wait() To include a timeout, combine with asyncio.wait_for() """ if loop is None: loop = asyncio.get_event_loop() return loop.run_until_complete(asyncio.ensure_future(task, loop=loop))And then I can do:>>> run_in_foreground(asyncio.sleep(5))01234Here we can see the background task running while we wait for the foregroundtask to complete. And if I do it again with a different timeout:>>> run_in_foreground(asyncio.sleep(3))567We see that the background task picked up again right where it left offthe first time.We can also single step the event loop with a zero second sleep (the ticksreflect the fact there was more than a second delay between running eachcommand):>>> run_in_foreground(asyncio.sleep(0))8>>> run_in_foreground(asyncio.sleep(0))9And start a second ticker to run concurrently with the first one:>>> ticker2 = schedule_coroutine(ticker())>>> ticker2:1>>>>> run_in_foreground(asyncio.sleep(0))010The asynchronous tickers will happily hang around in the background, ready toresume operation whenever I give them the opportunity. If I decide I want tostop one of them, I can cancel the corresponding task:>>> ticker1.cancel()True>>> run_in_foreground(asyncio.sleep(0))1>>> ticker2.cancel()True>>> run_in_foreground(asyncio.sleep(0))But what about our original synchronous ticker? Can I run that as abackground task? It turns out I can, with the aid of another helper function:def call_in_background(target, *, loop=None, executor=None): """Schedules and starts target callable as a background task If not given, *loop* defaults to the current thread's event loop If not given, *executor* defaults to the loop's default executor Returns the scheduled task. 
""" if loop is None: loop = asyncio.get_event_loop() if callable(target): return loop.run_in_executor(executor, target) raise TypeError("target must be a callable, " "not {!r}".format(type(target)))However, I haven't figured out how to reliably cancel a task running in aseparate thread or process, so for demonstration purposes, we'll define avariant of the synchronous version that stops automatically after 5 ticksrather than ticking indefinitely:import itertools, timedef tick_5_sync(): for i in range(5): print(i) time.sleep(1) print("Finishing")The key difference between scheduling a callable in a background thread andscheduling a coroutine in the current thread, is that the callable will startexecuting immediately, rather than waiting for the current threadto run the event loop:>>> threaded_ticker = call_in_background(tick_5_sync); print("Starts immediately!")0Starts immediately!>>> 1234FinishingThat's both a strength (as you can run multiple blocking IO operations inparallel), but also a significant weakness - one of the benefits of explicitcoroutines is their predictability, as you know none of them will startdoing anything until you start running the event loop. Inaugural PyCon Australia Education Miniconf Nick Coghlan 2015-04-27 10:43 Comments PyCon Australia launched its Call for Papers just over a month ago, and it closes in a little over a week on Friday the 8th of May.A new addition to PyCon Australia this year, and one I'm particularly excited about co-organising following Dr James Curran's "Python for Every Child in Australia" keynote last year, is the inaugural Python in Education miniconf as a 4th specialist track on the Friday of the conference, before we move into the main program over the weekend.From the CFP announcement: "The Python in Education Miniconf aims to bring together community workshop organisers, professional Python instructors and professional educators across primary, secondary and tertiary levels to share their experiences and requirements, and identify areas of potential collaboration with each other and also with the broader Python community."If that sounds like you, then I'd love to invite you to head over to the conference website and make your submission to the Call for Papers!This year, all 4 miniconfs (Education, Science & Data Analysis, OpenStack and DjangoCon AU) are running our calls for proposals as part of the main conference CFP - every proposal submitted will be considered for both the main conference and the miniconfs.I'm also pleased to announce two pre-arranged sessions at the Education Miniconf:Carrie Anne Philbin, author of "Adventures in Raspberry Pi" and Education Pioneer at the Raspberry Pi Foundation will be speaking on the Foundation's Picademy professional development program for primary and secondary teachers Peter Whitehouse, author of IPT - a virtual approach (online since 1992!) and a longstanding board member of the Queensland Society for Information Technology in Education will provide insight into some of the challenges and opportunities of integrating Python and other open source software into Australian IT educationI'm genuinely looking forward to chairing this event, as I see tremendous potential in forging stronger connections between Australian educators (both formal and informal) and the broader Python and open source communities. 
Accessing TrueCrypt Encrypted Files on Fedora 22 Nick Coghlan 2015-04-25 22:24 Comments

I recently got a new ultrabook (a HP Spectre x360), which means I finally have enough space to transfer my music files from the external drive where they've been stored for the past few years back to the laptop (there really wasn't enough space for them on my previous laptop, a first generation ASUS Zenbook, but even with the Windows partition still around, the extra storage space on the new device leaves plenty of room for my music collection).

Just one small problem: the bulk of the storage on that drive was in a TrueCrypt encrypted file, and the Dolphin file browser in KDE doesn't support mounting those as volumes through the GUI (at least, it doesn't as far as I could see). So, off to the command line we go.

While TrueCrypt itself isn't readily available for Fedora due to problems with its licensing terms, the standard cryptsetup utility supports accessing existing TrueCrypt volumes, and the tcplay package also supports creation of new volumes. In my case, I just wanted to read the music files, so it turns out that cryptsetup was all I needed, but I didn't figure that out until after I'd already installed tcplay as well.

For both cryptsetup and tcplay, one of the things you need to set up in order to access a TrueCrypt encrypted file (as opposed to a fully encrypted volume) is a loopback device - these let you map a filesystem block device back to a file living on another filesystem. The examples in the tcplay manual page (man tcplay) indicated the command I needed to set that up was losetup. However, the losetup instructions gave me trouble, as they appeared to be telling me I didn't have any loopback devices:

$ losetup -f
losetup: cannot find an unused loop device: No such file or directory

Searching on Google for "fedora create a loop device" brought me to this Unix & Linux Stack Exchange question as the first result, but the answer there struck me as being far too low level to be reasonable as a prerequisite for accessing encrypted files as volumes. So I scanned further down through the list of search results, with this Fedora bug report about difficulty accessing TrueCrypt volumes catching my eye. As with the Stack Exchange answer, most of the comments there seemed to be about reverting the effect of a change to Fedora's default behaviour - a change which meant that Fedora no longer came with any loop devices preconfigured.

However, looking more closely at Kay's original request to trim back the list of default devices revealed an interesting statement: "Loop devices can and should be created on-demand, and only when needed, losetup has been updated since Fedora 17 to do that just fine."

That didn't match my own experience with the losetup command, so I wondered what might be going on to explain the discrepancy, which is when it occurred to me that running losetup with root access might solve the problem. Generally speaking, ordinary users aren't going to have the permissions needed to create new devices, and I'd been running the losetup command using my normal user permissions rather than running it as root.

That was a fairly straightforward theory to test, and sure enough, that worked:

$ sudo losetup -f
/dev/loop0

Armed with my new loop device, I was then able to open the TrueCrypt encrypted file on the external GoFlex drive as a decrypted volume:

$ sudo cryptsetup open --type tcrypt /dev/loop0 flexdecrypted
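The opened volume shows up as a device mapper node named after the label passed to cryptsetup, so the remaining step (included here for completeness) is an ordinary mount - read-only in my case, since I only needed to copy files off the drive:

$ sudo mkdir -p /mnt/flexdecrypted
$ sudo mount -o ro /dev/mapper/flexdecrypted /mnt/flexdecrypted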
Actually supplying the password to decrypt the volume wasn't a problem, as I use a password manager to limit the number of passwords I need to actually remember, while still being able to use strong passwords for different services and devices.

However, even with my music files in the process of copying over to my laptop, this all still seemed a bit cryptic to me, even for the Linux command line. It would have saved me a lot of time if I'd been nudged in the direction of "sudo losetup -f" much sooner, rather than having to decide to ignore some bad advice I found on the internet and instead figure out a better answer by way of the Fedora issue tracker. So I took four additional steps:

- First, I filed a new issue against losetup, suggesting that it nudge the user in the direction of running it with root privileges if they first run it as a normal user and don't find any devices
- Secondly, I followed up on the previous issue I had found in order to explain my findings
- Thirdly, I added a new answer to the Stack Exchange question I had found, suggesting the use of the higher level losetup command over the lower level mknod command
- Finally, I wrote this post recounting the tale of figuring this out from a combination of local system manual pages and online searches

Adding a right-click option to Dolphin to be able to automatically mount TrueCrypt encrypted files as volumes and open them would be an even nicer solution, but also a whole lot more work. The only actual change suggested in my above set of additional steps is tweaking a particular error message in one particular situation, which should be far more attainable than a new Dolphin feature or addon.

Contents © 2019 Nick Coghlan - CC0, republish as you wish. - Powered by Nikola