PyBay Talks
Python 103: Memory Model & Best Practices
Wesley Chun
Language Internals, Intermediate
Description: There's a growing crowd of Python users who don't consider themselves beginners anymore. However, some users at this stage discover odd behavior that's hard to explain. Why doesn't code behave like it should? Why doesn't "correct" code execute correctly? We'll focus on Python's object & memory model, addressing these issues directly. Let's empower attendees to not create these bugs to begin with!
Abstract: In "Python 101," you learned basic Python syntax. In "Python 102" (or equivalent in experience), you went further, exploring Python more deeply -- creating/using classes, decorators, files, other standard library or 3rd-party modules/packages -- and graduated from being purely a beginner. Because Python has been around the block for quite a while now, there is a continuously growing number of "Python 103" programmers out there. Many are no longer new to the language, but they have run into various issues, bugs, or odd behavior in their code that is difficult to explain. It's time to take a closer look. This is an interactive best practices talk, focusing on how Python objects, references, and the memory model work, as well as thinking about performance. Knowing more about how the interpreter works under the covers, including the relationship between data objects and memory management, will make you a much more effective Python programmer. The main goal of this talk is to empower developers to not (inadvertently) create certain classes of bugs in their code to begin with! All you need to bring is the desire to learn more about the interpreter to take your Python skills to the next level.
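The abstract does not name specific bugs, so the following is only an illustrative sketch of the kind of surprise that object references can cause: a mutable default argument shared between calls, and two names bound to one list. The names `append_bad` and `append_good` are made up for this example and are not taken from the talk.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` mutates the same object.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Using None as a sentinel gives each call its own fresh list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_bad("a")
second = append_bad("b")
print(first is second)   # True: both names refer to one list object
print(second)            # ['a', 'b']

alias = original = [1, 2, 3]
alias.append(4)          # the mutation is visible through both names
clone = list(original)   # a shallow copy is a distinct object
print(original, clone is original)   # [1, 2, 3, 4] False

print(append_good("a"), append_good("b"))   # ['a'] ['b']
```

In both cases the "odd behavior" follows from the same rule: assignment and argument passing bind names to objects and never copy them.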
Bio: Wesley J Chun is the author of the bestselling Core Python titles and the Python Fundamentals Live Lessons companion video. He is coauthor of Python Web Development with Django (withdjango.com), and has written for Linux Journal, CNET, and InformIT. Wesley is an architect and Developer Advocate at Google.
___________________________________________________________________________________________
What to do when your data is large, but not big
Dillon Niederhut
Scalable Python, Intermediate
Description: This talk will present strategies in Python for handling data that is too large to fit in memory and/or too slow to process in one thread, but small enough to still fit in one machine.
Abstract: Unless you work at a large internet company, you probably don't have BIG data, but you might have LARGE data. Large data consume an unacceptable amount of time and memory when medium strategies are used, but also incur unnecessary financial and latency costs when big strategies are used. Two basic strategies for handling large data, chunking and parallelization, will be discussed with live coded examples in Python.
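As a rough sketch of the two strategies named in the abstract (not the speaker's live-coded examples), the code below splits an iterable into fixed-size chunks and reduces the chunks concurrently. The helper names, the chunk size, and the sum-of-squares workload are all assumptions chosen for illustration. Threads are used so the sketch runs anywhere; for CPU-bound pure-Python work, `ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def sum_of_squares(chunk):
    # Stand-in for real per-chunk work (parsing, aggregation, ...).
    return sum(x * x for x in chunk)


def chunked_parallel_sum(iterable, size=1000, workers=4):
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits its whole input eagerly, so feed it only
        # `workers` chunks at a time to keep memory use bounded.
        for batch in chunks(chunks(iterable, size), workers):
            total += sum(pool.map(sum_of_squares, batch))
    return total


print(chunked_parallel_sum(range(10_000)))
```

Only `workers` chunks are held in memory at once, which is the point of chunking when the full dataset does not fit.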
Bio: I'm a research scientist currently living in the Bay Area and working in neuroethology, human evolution, and natural language processing. I currently work at D-Lab, where I help researchers apply advances in computation to their research paradigms.
___________________________________________________________________________________________
Python tracing superpowers with systems tools
Eben Freeman
Performant Python, Intermediate
Description: Modern system tracers like SystemTap or DTrace are incredibly powerful. If they're not part of your arsenal of techniques for analyzing Python code, you might be missing out. In this talk, we'll explore how these tools work, and how they can be used for dynamic, low-overhead analysis of unmodified Python programs.
Abstract: Maybe you want to profile your program, but it's running lots of C extension code and conventional profilers can't help you. Or maybe you're tracking down an emergent problem in a production system, but the logs are barren.
Advanced tracing toolkits like SystemTap can help you analyze your program in real time, without modifying or restarting it. But they can also seem dauntingly unfriendly, especially when applied to interpreted languages like Python.
Fear not! We'll talk about how kernel tracing actually works, what tools are available, and what we need to know about the Python interpreter's internals to use them effectively. We'll see how to do mixed-mode profiling, and how to trace specific events, like memory allocations or network calls. We'll discuss some of the pros and cons of these techniques, and how they can be applied to debugging systems in other languages too.
Bio: Eben is a software engineer based in San Francisco. He's used Python to do math research and build email infrastructure, among other things. He likes pie, and rock climbing.
___________________________________________________________________________________________
Automating Your Browser and Desktop Apps
Al Sweigart
All things Web, Beginner
Description: There's a lot of data on the web and in your desktop apps, but accessing it can involve a lot of tedious typing and clicking. This talk is an introduction to the Selenium and PyAutoGUI modules, with live demos straight from the interactive shell. Al Sweigart explains web scraping techniques and programmatically controlling the keyboard and mouse to automate these tasks for you.
Abstract: The internet and personal computer are central tools in many jobs, including professions outside of engineering. This makes web scraping and GUI automation relevant to not just developers and QA testers, but academics, organizers, and office workers. This talk is an introduction to the Selenium and PyAutoGUI modules and to programmatically controlling your browser and desktop applications from Python.
Web scraping and GUI automation frameworks have an intimidating reputation for a steep learning curve. While they do have many sophisticated features, the basics that most folks will ever need can be covered in a single presentation.
This presentation has multiple live demos to showcase these modules straight from the interactive shell.
The content from this talk is derived from Automate the Boring Stuff with Python, a beginner's Python book freely available under a Creative Commons license at https://automatetheboringstuff.com
Bio: Al Sweigart is a software developer and the author of Automate the Boring Stuff with Python, Invent Your Own Computer Games with Python, Making Games with Python & Pygame, and Hacking Secret Ciphers with Python. These books are freely available under a Creative Commons license at https://inventwithpython.com. Al enjoys haunting coffee shops, writing educational materials, cat whispering, and making useful software. He lives in San Francisco.
___________________________________________________________________________________________
Django, Channels, and Distributed Systems
Andrew Godwin
All things Web, Intermediate
Description: Learn about the Django Channels project, how it makes WebSockets easy, how it's not just limited to Django, and the difficulty of building WebSocket and other stateful protocol handling at scale.
Abstract: Django Channels' headline feature is bringing WebSocket support to Django, but what it provides is far more useful than that. Underlying it is a robust, generic cross-process communication mechanism, built to support and scale with stateful protocols like WebSockets.
This talk will look at the design of this mechanism - codenamed ASGI - and the difficulties of building an entire system to support WebSockets and broadcast systems across a large number of servers, and how Django encapsulates this to provide you a simple but powerful interface with good performance characteristics.
We'll also take a brief look at how parts of Channels are useable outside of Django with other web frameworks or pure Python code, and how it lets us build better systems overall.
Bio: Andrew is a Python programmer, Django core developer and Senior Engineer at Eventbrite. He's behind Django's migration and channels systems, and in his spare time enjoys mountains, archery, and cheese.
___________________________________________________________________________________________
Interactive Data Visualization Applications for the Browser with Bokeh
Bryan Van de Ven
/etc, Intermediate
Description: Bokeh (http://bokeh.pydata.org/en/latest) provides a compelling open-source platform for creating interactive data visualization applications in the browser. This talk will demonstrate Bokeh's newest capabilities: the second generation Bokeh server, APIs for streaming data, new hooks for user-extensibility (e.g. to easily leverage JavaScript 3D plot libraries), new features for GIS, and more.
Abstract: With support from the DARPA XDATA Initiative, commercial engagements, and contributions from over 150 community members, the Bokeh visualization library (http://bokeh.pydata.org) has grown into a large, successful open source project with heavy interest and following on GitHub (https://github.com/bokeh/bokeh). The principal goals of Bokeh are to provide capability to developers and domain experts: easily create and share interactive, versatile, and powerful visualizations that extract insight from data sets that may be remote, large, or streaming. Bokeh provides a platform for anyone to create interactive data and visualization applications in the browser for themselves, their colleagues, or for a wider audience.
This talk will give a quick overview of recent developments, and demonstrate some of the newest capabilities of Bokeh including:
* Bokeh applications and the second generation Bokeh server (that is more performant, better documented, and much simpler to use and deploy)
* APIs for streaming data (both in the notebook and Bokeh applications)
* The ability to extend Bokeh with your own custom functionality (for example to create 3D plots or network graphs)
* Recent GIS features such as support for GeoJSON and tiled map data sources
* The new Datashader library that can be used together with Bokeh to visualize billions of data points.
Finally, the talk will discuss near-term plans for the project, its governance, and community development.
Bio: Bryan studied undergraduate CS and Math at UT Austin, and graduate Physics at UCLA. Currently he leads the technical effort for work done on the Bokeh project at Continuum Analytics. Previously, he has worked on feature detection and classification systems for submarine platforms, automated tools for financial risk modeling, and workflow optimization for fluid mixing simulations. He has also taught Basic, Advanced, and Scientific Python courses to more than 1500 students in the last four years.
___________________________________________________________________________________________
The Python Deployment Albatross
Cindy Sridharan
Fundamentals, Intermediate
Description: Python deployments can be notoriously tricky - a lot trickier than they need to be. This talk will briefly sketch out the history of Python deployments, explore in detail the current landscape, and run the gamut from the most popular to the most trendy/state-of-the-art to the most esoteric tools, along with my experiences with them.
Abstract: In order to understand the current state of packaging and deployments, it's important to understand how we got here in the first place. The talk will explain the architecture and internals of tools such as distutils, setuptools, eggs, pip, PyPI and virtualenv. Most importantly, the talk will explore wheels and building compiled extensions in depth.
In the last few years, an interesting development from Twitter has been PEX - P(ython) EX(ecutable). PEX is famously being used both at Twitter and at LinkedIn to deploy *all* Python applications. This talk will chart our history of using PEX (along with the Pants build system) at Imgix over the course of the last 2 years.
No talk on Python packaging and distribution would be complete without mentioning the D word - yes, you guessed it right - Docker. This talk will explore the current state of Python deployment using Docker as well as several antipatterns. The talk will highlight the challenges containerization brings and why it might not be the right solution for many use cases. I'll be drawing on my experience running a Dockerized Python application in production.
Lastly, the talk will explore Nix - a powerful open source package manager for Linux and other Unix systems that makes package management reliable and reproducible. The talk will detail some of our experience with Nix so far as well as some of the issues we ran into, and whether Nix is a viable alternative to the existing tools in the ecosystem.
Bio: I've been working with Python for over 4 years now and am currently employed at Imgix where I'm the lead Python engineer. I organize the San Francisco Python Twisted and Bay Area Lua Developers meetups.
___________________________________________________________________________________________
A Practical Introduction to Airflow
Matt Davis
Dealing with Data, Intermediate
Description: Moving data through transformations and from one place to another is a big part of data science/eng. We’ve been using Airflow for several months at Clover Health and have learned a lot about its strengths and weaknesses. We will use this talk to give a practical introduction to Airflow that gives people the information they need to decide whether Airflow is right for them and how to get started.
Abstract: Airflow is a popular pipeline orchestration tool for Python that allows users to configure complex (or simple!) multi-system workflows that are executed in parallel across any number of workers. A single pipeline might contain bash, Python, and SQL operations. With dependencies specified between tasks, Airflow knows which ones it can run in parallel and which ones must run after others. Airflow is written in Python and users can add their own operators with custom functionality, doing anything Python can do.
At Clover Health, we’ve been pushing Airflow’s limits, digging into the source code, and contributing patches upstream. In this talk, we’ll cover the basics of Airflow so you can use what we’ve learned to start your Airflow journey on the right foot. This talk aims to answer questions such as: What is Airflow useful for? How do I get started? What do I need to know that’s not in the docs?
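The dependency-driven scheduling described in the abstract can be illustrated without Airflow itself. The sketch below uses the standard library's `graphlib` (Python 3.9+) rather than Airflow's API, and the task names are hypothetical; it only shows how declared dependencies determine which tasks may run in parallel and which must wait.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_users": set(),
    "join": {"extract_orders", "extract_users"},
    "report": {"join"},
}


def parallel_batches(graph):
    """Return lists of tasks; tasks in one list have no unmet dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()          # raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # everything runnable right now
        batches.append(ready)
        sorter.done(*ready)                  # mark finished, unlocking dependents
    return batches


print(parallel_batches(pipeline))
# [['extract_orders', 'extract_users'], ['join'], ['report']]
```

A real orchestrator adds scheduling, retries, and distributed workers on top of this ordering; the sketch covers only the ordering step.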
Bio: I have been a scientific Python developer since 2008. I’ve worked in atmospheric science, astronomy, urban planning, web applications, and healthcare. I maintain several open source Python libraries and am currently a data engineer at Clover Health.
___________________________________________________________________________________________
Python Profiling and Performance: Elementary to Enterprise
Mahmoud Hashemi
Performant Python, Intermediate
Description: This talk provides an end-to-end introduction and overview of Python performance practices, from fundamentals to functional industry practices to the future of performant Python. If you've ever felt lost in or out of touch with the constant whirl of Python performance advancements, this practical talk will put it back into perspective.
Abstract: Performance is a complex topic. It means a lot of things to a lot of people. Python gives us a great starting point: strong primitives and the "good enough" philosophy. But is Python actually good enough for performance-critical applications?
This talk defines different kinds of performance, covers basic principles, and dives right into measurement. With those foundations laid, it outlines eight approaches to scaling Python, four of which are stack-agnostic and four of which are Python-specific. It outlines many examples from industry to promote a holistic view of performance as a practical process, not a large-scale benchmarking competition.
Bio: Mahmoud Hashemi is Lead Developer of Python Infrastructure at PayPal, where he focuses on distributed systems, API design, and application security. He presented O'Reilly's Enterprise Software with Python, as well as several guides to topics from DNS to software versioning to statistics. An avid Wikipedian, Mahmoud is half of Hatnote, creators of Listen to Wikipedia and other fine wiki-based software.
___________________________________________________________________________________________
Caffe + Jupyter + Pandas: It’s not rocket science, well sorta.
Katherine Scott
Dealing with Data, Intermediate
Description: In this talk I will walk the users through the entire process of building a convolutional neural network for image classification. The process starts with a Flask application to label your data, followed by characterizing, training, and evaluating the CNN using Pandas, Jupyter Notebooks, and Bokeh plots. Finally we show how the CNN can be deployed and used in real-world applications.
Abstract: Convolutional Neural Networks: they’re new, they’re big, they’re complex, they’re poorly documented and accordingly they are a little scary. At Planet we will image the entire earth every day, and to deliver that data to customers we need to analyze images without them ever being seen by human eyes. In this talk we’ll cover how to build, train, and characterize a neural net for image classification all from the comfort and safety of a Jupyter notebook. This talk will serve as a template for building and using your very own CNN.
Bio: Katherine Scott is a senior software engineer at Planet working on image classification. Prior to Planet, Ms. Scott was the co-founder and CTO of Tempo Automation and a co-founder at Sight Machine. Katherine is currently the Program Chair for the Open Source Hardware Association.
___________________________________________________________________________________________
Project Jupyter
Jamie Whitacre
Dealing with Data, Intermediate
Description: An overview of Project Jupyter.
Abstract: Jupyter is an open source, language agnostic, interactive computing platform used in scientific computing and data science that provides multiple tools tailored for different workflows, from traditional terminal-style control to the popular web-based Notebook. The Jupyter Notebook is a web application that allows users to create and share documents that contain live code, equations, visualizations and explanatory text. Jupyter is the evolution of the original ideas in the IPython interactive shell, as we generalized them into a language agnostic protocol that has now been implemented in over 50 separate languages.
One project within the Jupyter ecosystem, JupyterHub, is a multi-user environment for Jupyter Notebooks that runs off a central server and that can be used to serve Notebooks to classes of students, corporate workgroups, or scientific research groups. JupyterHub is the backbone for UC Berkeley’s new Undergraduate Data Science Education Program, an ambitious program that aims to provide every freshman with core knowledge and skills in data science.
In this talk we will discuss and demonstrate the many development activities underway at Project Jupyter, including IPython 5.0, JupyterHub, and JupyterLab, and how these tools are used in data science, industry, scientific research, and education.
Bio: Jamie Whitacre is the technical project manager for Project Jupyter, an open-source scientific computing and data science ecosystem used extensively in academia and industry. Project Jupyter operates out of the Berkeley Institute for Data Science (BIDS) at UC Berkeley. Matthias Bussonnier is a postdoctoral researcher at BIDS and a core developer for Jupyter and IPython.
___________________________________________________________________________________________
Explore Git internals using Python | Let's write `git log` in Python
Glen Jarvis
/etc, Intermediate
Description: Git is a powerful tool for source control. It's often misunderstood and abused. Under the surface Git is an elegant and simple data structure. When you don't understand that data structure, you don't really understand Git. It is flexible enough to give you all the rope that you need to hang yourself in Git hell. However, if you understand it, you are released from Git hell.
Abstract: In this talk, we start with a simple explanation of the Git data structure on disk. We discuss where the local Git repo is stored: `.git`. From there, we discuss the `config`, `HEAD`, `refs/heads`, and `objects`.
We use Python to read those data structures and reconstruct a `git log` command for any arbitrary git repository. When finished, we should have our own working command that does the same thing as `git log` for any arbitrary repository, on any branch. We'll simply start at `HEAD` and work our way down the data structure.
Although it is not *useful* to have a Python version of Git, it is *fun*. Also, this exploration helps you understand the Git tool on a much deeper level. When you can program something, you can understand it. And, understanding Git helps you be a better developer and collaborator.
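The walk from `HEAD` described in the abstract can be sketched roughly as follows. This is not the speaker's code: the function names are invented here, and the sketch assumes loose (unpacked) objects and refs stored as plain files under `refs/heads`, so it will not work on repositories whose objects or refs have been packed. It also follows only the first parent of each commit, whereas the real `git log` considers all parents.

```python
import zlib
from pathlib import Path


def read_object(git_dir, sha):
    """Return (type, body) for a loose object stored under objects/."""
    path = Path(git_dir) / "objects" / sha[:2] / sha[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, body = raw.partition(b"\x00")   # header looks like b"commit 231"
    obj_type, _size = header.split()
    return obj_type.decode(), body


def resolve_head(git_dir):
    """Follow HEAD to a commit hash (symbolic ref or detached HEAD)."""
    head = (Path(git_dir) / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        return (Path(git_dir) / head[5:]).read_text().strip()
    return head


def parse_commit(body):
    """Split a commit body into its header fields and its message."""
    text = body.decode(errors="replace")
    headers, _, message = text.partition("\n\n")
    fields = {"parent": []}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)
        else:
            fields[key] = value
    return fields, message.strip()


def log(git_dir):
    """Yield (sha, author, message) from HEAD back along first parents."""
    sha = resolve_head(git_dir)
    while sha:
        obj_type, body = read_object(git_dir, sha)
        if obj_type != "commit":
            raise ValueError(f"{sha} is a {obj_type}, not a commit")
        fields, message = parse_commit(body)
        yield sha, fields.get("author", ""), message
        sha = fields["parent"][0] if fields["parent"] else None
```

Iterating over `log(".git")` in a small, freshly created repository and printing each tuple gives a list of commits comparable to `git log`, newest first.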
Bio: Glen Jarvis has been programming Python for over 8 years and has been programming in different languages for longer. He has been certified in Linux/Unix administration by UC-Berkeley. He gained the highest certification available for Informix DBAs. He is also certified in MongoDB as Developer and Administrator. He has worked for companies such as IBM, UC-Berkeley, Sprint and Silicon Valley start-ups. He has worked in the fields of Databases, Data Science, Bioinformatics and Web Technologies.
___________________________________________________________________________________________
Beautiful Documentation Oriented Programming
Daniel Mizyrycki
Fundamentals, Beginner
Description: Have you ever wondered how to write beautiful documentation with minimal effort? Did the tools get in your way in the process? This talk offers practical examples of leveraging simple text and docstrings to create stunning browsable documentation while making sure your code works as designed.
Abstract: Documentation is a fundamental organizational tool. Not only does it help us understand our programs, it can also help us develop and test our code iteratively.
Formats and tools like reStructuredText and Sphinx have made a lasting positive impact on our Python community, as we can now easily write splendid documentation with little effort. In turn, the documentation can be auto-tested and taken straight from our source code, avoiding redundancy.
This talk highlights the benefits of using simple text for writing programs and documentation, teaching the basics of reStructuredText, Sphinx, docstrings and doctests. We will be modeling the early stages of developing an application, following best practices, verifying program correctness and learning how to create beautiful documentation.
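As a small illustration of documentation that doubles as a test, the sketch below puts usage examples in a docstring and runs them with the standard `doctest` module. The `slugify` function and the `check_docstring` helper are invented for this example and are not from the talk.

```python
import doctest


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated slug.

    >>> slugify("Beautiful Documentation")
    'beautiful-documentation'
    >>> slugify("  Extra   spaces ")
    'extra-spaces'
    """
    return "-".join(title.lower().split())


def check_docstring(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, func.__name__, module=False,
                            globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


print(check_docstring(slugify))
```

For a whole file, running `python -m doctest yourfile.py` performs the same check on every docstring without any helper code.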
Bio: Daniel Mizyrycki has been programming in Python for over a decade in industry (GreenBusinessCA, Amazon, Docker) and educational (CCSF, RCSD) environments. Previously, he used assembly, C, Perl, and Bash, founded the first Argentinean Linux User Group (1993), and consulted for early Argentinean ISPs. He loves Python's community, being a PSF Contributing Member at SFPython, PyLadies, Baypiggies, and PyCon, and authoring sphinxserve and loadconfig. Today, he teaches Python to hundreds of Cisco engineers.
___________________________________________________________________________________________
Next-generation Python Big Data Tooling, powered by Apache Arrow
Wes McKinney
Dealing with Data, Intermediate
Description: The Python data stack has struggled to interoperate well with big data systems. Apache Arrow provides standard in-memory columnar data structures that will enable Python programmers to participate in big data problems in a more natural and performant way. This talk will discuss the Apache Arrow project itself and the state of the new tools being created to help Python work better with Apache Hadoop and Apache Spark.
Bio: Wes McKinney is a software engineer at Cloudera. He is the creator of Python’s pandas library and the Ibis project, a committer to the Apache Parquet and Apache Arrow projects, and the author of the O'Reilly Media book, Python for Data Analysis. Previously, Wes was the founder and CEO of DataPad.
___________________________________________________________________________________________
"Good Enough" IS Good Enough!
Alex Martelli
Fundamentals, Intermediate
Description: Our culture's default assumption is that everybody should always be striving for perfection -- settling for anything less is seen as a regrettable compromise. This is wrong in most software development situations: focus instead on keeping the software simple, just "good enough", launch it early, and iteratively improve, enhance, and re-factor it. This is how software success is achieved!
Abstract: In 1989, Richard Gabriel caricatured two approaches to SW development: "worse is better" ("New Jersey approach") and "the right thing" ("MIT/Stanford approach"), reluctantly concluding NJ was more viable, for several reasons (speed of development, flexible designs, systems adaptable to a variety of uses [including changes in requirements], ease of gradual, incremental improvement, ...). And this debate hasn't died down since.
Debate rages, but reality has moved away from "right thing" ("Cathedral"-centralized "Big Design Up Front", focus on academia/large firms, unsuited to shifting real-world requirements), toward "NJ" ("Bazaar"-like, agile iterative enhancement, dynamic start-ups/independent developers, a world of always-shifting specs).
In this talk I support "the NJ approach", on both philosophical and pragmatic grounds, with examples from many areas. Winners of the "mind-share battles" focused on simplicity ("good enough"), not theoretical refinement/completeness: large ecosystems of developers, incremental improvement -- TCP/IP approach vs ISO/OSI, HTTP/HTML vs Xanadu, early Unix's simplistic (but OK) approach to interrupted system calls vs Multics'/ITS's perfectionism.
In Python, metaclasses often end up too complex (80% of their pluses can be had via class decorators, for 20% of the complexity); OTOH, incremental improvement worked just fine in sorting, generators, and guaranteed-finalization semantics.
The talk is not perfect, but I do think it's good enough.
Bio: Author of "Python in a Nutshell", co-author of "Python Cookbook", PSF Fellow, frequent speaker at Python conferences, prolific contributor to StackOverflow, and winner of the 2006 Frank Willison Memorial Award for contributions to Python, Alex currently leads "1:many tech support" for Google Cloud Platform. He's married to Anna Ravenscroft, his co-author in the "Cookbook" 2nd edition and "Nutshell" 3rd edition, also a PSF Fellow, and also a winner of the Frank Willison Memorial Award, in 2013.
___________________________________________________________________________________________
Self-Healing Systems: The Road to 99.99% Uptime
William Ting
Scalable Python, Intermediate
Description: Stop firefighting and start fireproofing! There are many tools that make oncall easier and increase availability, but we'll be mostly focusing on a few principles and design patterns that help make your systems more robust.
Abstract: Feature velocity is typically a higher priority early in a software's lifecycle, but as the system matures there is an effort to start fireproofing the system. On the Yelp Transactions Platform team we've used a combination of circuit breakers, queues, and idempotent operations to minimize downtime and waking up in the middle of the night.
We'll take a look at how these design patterns help us in a distributed system, when they should be used, and the common pitfalls associated with them.
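To make the circuit breaker pattern mentioned in the abstract concrete, here is a minimal sketch. It is not Yelp's implementation: the class name, thresholds, and injectable clock are assumptions for illustration, and the code is not thread-safe. After `max_failures` consecutive errors the breaker "opens" and rejects calls immediately; once `reset_after` seconds have passed it lets a single trial call through.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after      # seconds to wait before retrying
        self.clock = clock                  # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()   # (re)open the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping calls to a flaky dependency as `breaker.call(fetch, url)` means that, while the dependency is down, callers fail fast instead of piling up behind timeouts.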
Bio: William Ting is a longtime FOSS advocate with contributions in various projects (Pelican, autojump, pyramid_swagger, Rust, GNOME). He's currently an infrastructure engineer at Reddit, and previously on the Yelp Transaction Platform team.
___________________________________________________________________________________________
Exploring complex data with Elasticsearch and Python
Simon Willison
Dealing with Data, Intermediate
Description: Elasticsearch is a powerful open-source search and analytics engine with applications that stretch far beyond adding text-based search to a website. Learn how Elasticsearch can be used with Python and Django to crunch through complex datasets and quickly build powerful interfaces for exploring information.
Bio: Simon Willison is an engineering director at Eventbrite, a Bay Area ticketing company working to bring the world together through live experiences. Simon works as part of a small product research and prototyping lab helping develop new concepts for Eventbrite products and features. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator funded company he co-founded in 2010. He is a co-creator of the Django Web Framework.
___________________________________________________________________________________________
Image processing using Python
Ravi Chityala
Dealing with Data, Intermediate
Description: Image acquisition and processing have become a standard method for qualifying and quantifying experimental measurements in many fields of science and engineering. Python provides many computational tools that can be used to perform image processing. In this talk, we will walk through the most common workflow in image processing along with examples.
Abstract: Image acquisition and processing have become a standard method for qualifying and quantifying experimental measurements in many fields of science and engineering. Python offers the following advantages: simple syntax, powerful libraries and modules that focus on increasing productivity, and, most importantly, it is free and open source.
We will learn image processing through a simple and common workflow. We will read a high-resolution image of a mouse. We will filter the image to reduce noise and improve the quality of the image. We will then segment the image, so that we obtain only the bones. We will clean up the over-segmented regions using morphological operations. We will perform measurements on the segmented image. Finally, we will discuss the workflow with Python code.
Bio: Ravi Chityala is a Senior Engineer at Elekta Inc. He has more than 12 years of experience in image processing and scientific computing. He is also a part-time instructor at the UCSC Extension, San Jose, CA, where he teaches advanced Python to programmers. He uses Python for web development, scientific prototyping and computing, and as glue to automate processes. He is the co-author of the book, "Image Processing and Acquisition using Python" published by CRC Press.
___________________________________________________________________________________________
A/B Testing: Harder than just a color change
Or Weizman
All things Web, Intermediate
Description: Is your Product Manager asking you to test out different text or button colors? Not sure where to start? This talk will contain methodology and two case studies from Yelp’s Transaction Platform on how to properly run an experiment and get the best result. Learn about how to run a simple button color experiment, avoid pitfalls, test, and analyze the results with confidence. Statistical confidence!
Abstract: A/B testing is a common practice for websites...but where do you begin? This data-driven approach allows you to launch experiments and features with confidence. So how do you prepare, launch, and analyze an A/B experiment? How do you know for how long to keep it running? What about which metrics to track?
This talk will present a procedure developed to run an A/B experiment, from planning the task and understanding the key metrics to analyzing the results. We will cover both simple and more complex case study, which help us understand the challenges involved in running experiments.
This talk covers a topic that enables developers to make more data-driven decisions but has not been covered at PyCon. By providing case studies as motivation and a procedure for implementing A/B testing, this talk will excite the audience. Yelp runs many experiments on different aspects of its product, and the Transaction Platform team has gained unique experience creating experiments with limited traffic, which will be discussed in the talk.
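As a rough illustration of the "analyze the results with confidence" step, here is a minimal sketch of one common way to compare two variants: a two-proportion z-test. It is not taken from the talk or from Yelp's tooling. The function name and the sample counts are hypothetical, and the sketch assumes the sample size was fixed before the experiment launched.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert equally.

    conv_a / n_a: conversions and visitors for the control variant.
    conv_b / n_b: conversions and visitors for the test variant.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under H0 both variants share one conversion rate, so pool the data.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 4000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=4000, conv_b=250, n_b=4000)
print("z = %.2f, p = %.4f" % (z, p_value))
```

With limited traffic, the standard error stays large, which is one reason small experiments need to run longer before a difference of this size becomes detectable.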
Bio: Or Weizman is an engineer for Yelp's Transaction Platform team, which enables users to transact with Yelp's extensive set of businesses through many third party providers.
___________________________________________________________________________________________
Behind Closed Doors: Managing Passwords in a Dangerous World
Noah Kantrowitz
Security, Intermediate
Description: A modern application has a lot of passwords and keys floating around. Encryption keys, database passwords, and API credentials are often typed into text files and forgotten. Fortunately, a new wave of tools is emerging to help manage, update, and audit these secrets. Come learn how to avoid being the next TechCrunch headline.
Bio: Noah Kantrowitz is a web developer turned infrastructure automation enthusiast, and all around engineering rabble-rouser. By day he builds tools and teaches, and by night he works with the Python Software Foundation infrastructure team. He is an active member of the Chef community, and enjoys merge commits, cat pictures, and beards.
___________________________________________________________________________________________
Caravel - A data visualization, exploration and dashboarding platform
Maxime Beauchemin
Dealing with Data, Intermediate
Description: Airbnb developed Caravel to provide all employees with interactive access to data while minimizing friction. Caravel's main goal is to make it easy to slice, dice and visualize data. It empowers everyone to perform analytics at the speed of thought.
Abstract: Topics include:
* Intuitively visualizing datasets while filtering, pivoting, and changing views
* Creating and sharing simple dashboards
* Caravel's rich set of visualizations
* Caravel's extensible, high-granularity security/permission model allowing intricate rules on who can access individual features and the dataset
* Caravel's enterprise-ready authentication with integration with major authentication providers (database, OpenID, LDAP, OAuth, and REMOTE_USER through Flask AppBuilder)
* Caravel's simple semantic layer, allowing users to control how data sources are displayed in the UI by defining which fields should show up in which drop-down and which aggregation and function metrics are made available to the user
* Caravel’s deep integration with Druid
* Caravel’s integration with most RDBMS through SQLAlchemy
* How JavaScript/Node/D3/React can coexist and work well alongside Python/PyPI/Flask
Bio: Maxime Beauchemin works at Airbnb as part of the Data Tools team, developing open source products that reduce friction and help generate insight from data. He is the creator and a leading maintainer of Apache Airflow [incubating] (a workflow engine) and Caravel (a data visualization platform). Before Airbnb, Maxime worked at Facebook on computation frameworks around engagement and growth analytics, at Yahoo! on social properties analytics, and at Ubisoft as a data warehouse architect.
___________________________________________________________________________________________
Xonsh – put some Python in your Shell.
Matthias Bussonnier
/etc, Intermediate
Description: Xonsh is a Python-ish, BASHwards-looking shell language and command prompt. The language is a superset of Python 3.4+ with additional support for the best parts of shells that you are used to, such as Bash, zsh, fish, and IPython. It works on all major systems including Linux, Mac OSX, and Windows. Xonsh is meant for the daily use of experts and novices alike.
Abstract: Programmers spend much of their time at a command-line interface, often sticking to the default shell. A lot of progress has been made in the friendliness, usability, and extensibility of shells. We thus introduce Xonsh, which attempts to bring the command-line shell into the 21st century.
Xonsh is a general-purpose shell that combines Python and the best features of Bash, zsh, IPython and fish. Written in Python and relying only on the standard library and PLY, the xonsh language is a strict superset of Python that compiles to a Python AST. The shell provides exciting features: rich history, tab completion from bash and man pages, syntax highlighting, auto-suggestion, foreign-function aliases and more!
Whether you are a novice who is looking to use the command line or a Python expert, Xonsh is made for you.
Because xonsh is Python, it automatically has the entire Python ecosystem at your fingertips. Xonsh makes meshing and intertwining Python code with command-line interfaces as seamless as possible. Have you ever wanted to use regular expressions to glob files? No problem! Ever wanted to curl a remote resource right into `json.loads()`? Now you can. Do you not want to leave the command line to use pandas, NLTK or add two numbers together? No big deal.
The xonsh homepage is at https://xon.sh
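For a sense of what "regular expressions to glob files" and "a command's output right into `json.loads()`" involve, here is a minimal sketch in plain Python using only the standard library. It is not xonsh syntax and not taken from the talk; xonsh provides its own shell-level shortcuts for these tasks, which are not shown here. The helper names `regex_glob` and `json_from_command` are hypothetical.

```python
import json
import os
import re
import subprocess
import sys


def regex_glob(pattern, directory="."):
    """Return sorted file names in `directory` whose whole name matches `pattern`."""
    return sorted(
        name for name in os.listdir(directory) if re.fullmatch(pattern, name)
    )


def json_from_command(argv):
    """Run a command and parse its standard output as JSON."""
    completed = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    return json.loads(completed.stdout.decode("utf-8"))


if __name__ == "__main__":
    # List Python files in the current directory by regular expression.
    print(regex_glob(r".*\.py"))
    # A child Python process stands in for a command such as curl here,
    # so that the example runs without network access.
    print(json_from_command([sys.executable, "-c", "print('{\"answer\": 42}')"]))
```

The point of a Python-based shell is that this kind of glue can live directly at the prompt rather than in a separate script.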
Bio: I am a postdoc at the UC Berkeley Institute for Data Science, and have been a core developer of IPython and Jupyter for a couple of years. With a background in physics, I spend most of my time developing tools for the scientific community and for education, as well as promoting Python 3.
___________________________________________________________________________________________
Safe-ish By Default: The Django Security Model and How to Make it Better
Philip James
Security, Intermediate
Description: Come join us by the fire as we have Security Story Time with our friends, Frog and Toad. With them, you'll learn about all the things Django does to protect users and developers out of the box. We'll look at simplified code samples from the Django codebase to see what's happening under the hood, and cover how to make the Django security model even stronger in your application.
Abstract: Introduction. Philip James, how long I’ve worked with Python and Django, background at EB
Introduction to the story, and the characters. Safe-ish: Talk about Django’s Security Model and how it tries to provide sane defaults for developers
Run-through of the parts of the Django security model:
* XSS (brief definition). How do you turn it off? Mark Safe, | n, safe
* CSRF (brief definition). Django has middleware that checks POST requests for a token. Token is stored in cookie, also. Side-effect: harder to JS. Also, only an issue if you’re already owned, so maybe not an issue? How to get around it? csrf_exempt
* SQLi (brief definition). Django’s ORM makes clean SQL (even when given bad data?). How? How to get around it: extra()/RawSQL()
* Clickjacking protection (brief definition). Django has middleware that sets headers browsers are supposed to respect. How to get around it: xframe_options_exempt, xframe_options_deny, xframe_options_sameorigin
* HTTPS. This one is less "out of the box" than the others, so won’t be talked about here.
* Host Header Validation (brief definition). Django verifies against allowed hosts in settings. How? get_host()
* Session security. What are Django sessions? Cookie-based by design. How can we make this better?
* Overall: Vigilance. Be aware of uses of this within your product
* HTTPS: Use it! Set the correct settings. SECURE_SSL_REDIRECT: How does it work?
Other things
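To make the Host Header Validation item above concrete, here is a simplified sketch of the idea of checking a request's Host header against an allow-list. It is not Django's code and not from the talk. The function name `host_is_allowed` is hypothetical, and the matching rules follow the documented behavior of the `ALLOWED_HOSTS` setting only loosely: a leading dot matches a domain and its subdomains, and `"*"` matches anything. IPv6 literals are ignored in this sketch.

```python
def host_is_allowed(host, allowed_hosts):
    """Return True if `host` matches any pattern in `allowed_hosts`."""
    host = host.lower()
    # Drop a trailing ":port" if present (no IPv6 handling in this sketch).
    if ":" in host:
        host = host.rsplit(":", 1)[0]
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*":
            return True
        if pattern.startswith("."):
            # ".example.com" matches example.com and any subdomain of it.
            if host == pattern[1:] or host.endswith(pattern):
                return True
        elif host == pattern:
            return True
    return False


# Hypothetical allow-list for illustration.
allowed = [".example.com", "localhost"]
print(host_is_allowed("www.example.com:8000", allowed))
print(host_is_allowed("evil.test", allowed))
```

Rejecting unknown hosts matters because applications often build absolute URLs (for example in password reset emails) from the Host header, which an attacker can otherwise spoof.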
Bio: Philip is a Senior Software Engineer at Eventbrite. In his spare time, he writes novels, makes twitter bots, and gives technical talks. He used to run a webcomic, but there's just no money in it, you know? Philip is a refugee from the video games industry, and wishes anyone still there the best of luck. Philip has spoken at conferences about Python, Django, Node.js, and Linux. Philip believes in the web.
___________________________________________________________________________________________
Introduction to HTTPS: A Comedy of Errors
Ashwini Oruganti
Security, Beginner
Description: Given recent increases in hostile attacks on internet services and large scale surveillance operations by certain unnamed government organizations, security in our software is becoming ever more important. We'll give you an idea of how modern crypto works in web services and clients, look at some of the common flaws in these crypto implementations, and discuss recent developments in TLS.
Abstract: In this talk I'll explain what happens behind the scenes when we try to establish a secure connection to a web site.
I'll cover the common security flaws in popular TLS implementations like OpenSSL, and show how these issues can be avoided if we have a well-designed TLS implementation in a high-level language like Python.
Finally, I'll demonstrate and discuss how the API design of OpenSSL leads to application bugs, and a lack of abstract secure defaults leads to insecure applications.
Bio: Ashwini is a Software Engineer at Eventbrite, and an open source developer living in San Francisco. In the past, she has worked on a pure Python TLS implementation through the Stripe Open Source Retreat, an asynchronous event-driven networking framework - Twisted, and a PHP implementation in RPython called HippyVM. She also served as a Director of the Python Software Foundation last year.
___________________________________________________________________________________________
TensorFlow on the Web
Kendall Chuang
All things Web, Intermediate
Description: This talk will be about walking through the steps to put a TensorFlow project into production on the web with Flask and Heroku. The goal is to introduce the project and show how TensorFlow can be used online for real data tasks, and discuss other considerations for deployment of a TensorFlow project.
Abstract: TensorFlow is a deep learning library with Python and C++ bindings that was released in 2015. The talk will start with a brief intro to TensorFlow, and then dive into the specific steps to set up a simple project that can be served online.
Bio: Kendall is a lead software engineer at YesGraph, where he uses machine learning and Flask to power better invite flows for mobile and web apps. Previously he worked as an independent software consultant for four years, and before that he was a hardware designer at Qualcomm in San Diego for three years. Kendall was an organizer of the San Diego Python Users Group, where he helped plan six one-day workshops on various Python topics.
___________________________________________________________________________________________
Unspeakably Evil Hacks in Service of Marginally Improved Syntax: "Compile-Time" Python Programming
Scott Sanderson
Language Internals, Intermediate
Description: One of Python's strengths as a dynamic language is its suite of powerful metaprogramming tools. What happens, however, when you want to move beyond the tools provided by "traditional" metaprogramming techniques? This talk will take the audience on a brief tour of projects and techniques that stretch the boundaries of what's possible in Python.
Abstract: In this talk, we provide an introduction to several lesser-known techniques for hacking and extending the functionality of Python. Along the way, we consider the costs (in clarity, portability, or otherwise) of employing nonstandard tools to work around limitations of Python.
Topics may include:
- Runtime Bytecode Rewriting (https://github.com/llllllllll/codetransformer)
- Hooking the Lexer with Custom Encodings (https://github.com/dropbox/pyxl)
- Import Hooks (https://github.com/hylang/hy, http://cython.org/)
Bio: Scott Sanderson is an engineer at Quantopian, where he is responsible for the design of Quantopian's backtesting and research APIs. He is a core developer on the open source backtesting library Zipline, and he is a contributor to several projects in the PyData ecosystem, including IPython and the Jupyter Notebook. Scott graduated from Williams College in 2013 with bachelor's degrees in Mathematics and Philosophy.
___________________________________________________________________________________________
Building a Tic-Tac-Toe Two-Player Game using Tornado over Websockets
Ramesh Sampath
All things Web, Beginner
Description: Learn how to build a two-player game using the Python Tornado web framework. We will be using WebSockets to make the app real-time.
Abstract: We will live code and learn how to build a real-time game app using the Tornado web framework and WebSockets. Through this app, we will learn how to write a web app using the Tornado web framework (http://www.tornadoweb.org/), and how to communicate over WebSockets. We will be building a Tic-Tac-Toe two-player game to learn about these concepts. A player can start a new game or accept a challenge from another player.
When a player starts a new game, the app creates a new game channel and provides a handle to the channel that the player can send to a friend to join in and play. We will not be dealing with any authentication or logins to start a new game or to join an existing one. The goal is to show how easy it is to build a real-time app with Tornado.
Bio: Ramesh loves building data products that blend visualization and machine learning. He mostly uses Flask / Tornado to create web apps, Pandas / Scikit-learn to build machine learning models and D3 for visualizations.
___________________________________________________________________________________________
The Game is a Graph
Meghan Heintz
Dealing with Data, Beginner
Description: The Game is a Graph: An introduction to network theory and Networkx.
How slot machines can be modeled as nodes in a graph to create a recommendation system for slots players.
Abstract: This talk is an introduction to network theory and Networkx and how they can be applied to develop a recommendation system for slot machines within a slots game. The complex flow of users through a 60+ machine slots game can be modeled with machines as nodes and links weighted by the number of players moving between them in a session. With this framework, node and graph attributes can be calculated for segments of users with Networkx, and clusters of machines that may or may not have been obvious to the game developer can be discovered. These clusters can then be used in a recommendation engine to introduce players to new machines that similar players have a high affinity to.
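The modeling step described above (machines as nodes, links weighted by player movement within a session) can be sketched roughly as follows. This is a standard-library illustration, not the speaker's code: it does not use Networkx, the session data and function names are hypothetical, and it swaps the clustering step for a much simpler "most common next machine" ranking.

```python
from collections import Counter


def build_transition_graph(sessions):
    """Count moves between consecutive machines across all sessions.

    Returns a Counter mapping (source, destination) edges to their weight.
    """
    edges = Counter()
    for machines in sessions:
        for src, dst in zip(machines, machines[1:]):
            if src != dst:  # staying on the same machine is not a move
                edges[(src, dst)] += 1
    return edges


def recommend(edges, current, already_played=(), k=2):
    """Rank the machines players most often move to from `current`."""
    candidates = Counter()
    for (src, dst), weight in edges.items():
        if src == current and dst not in already_played:
            candidates[dst] += weight
    return [machine for machine, _ in candidates.most_common(k)]


# Hypothetical sessions: each list is the order of machines one player visited.
sessions = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C", "A"]]
edges = build_transition_graph(sessions)
print(recommend(edges, "A"))
```

The same edge weights could be loaded into a graph library to compute node attributes or find clusters, which is the direction the abstract describes.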
Bio: Data Scientist at Zynga. Former environmental engineer and river restoration engineer turned data maven and Pythonista.
___________________________________________________________________________________________
A Guide to Bad Programming
Paul Bailey
/etc, Advanced
Description: In a sea of talks and information about how to improve your coding skills, this talk will make a case for bad code in your everyday life. In this talk you'll learn how and why you should write bad code.
Abstract: Inspired by "the queen of sh*tty robots" and a talk I had recently with a friend about how often our code "optimizations" and best practices don't matter, this talk will point out some of the obsessions we have as professional programmers that don't matter and can even be harmful to the progress of a product. The talk will show how identifying as a "bad programmer" can improve your skills in the long run and help you become a better programmer. Lastly, the talk will showcase some "bad practices" that can be fine or even good when used appropriately.
Bio: I'm a web developer with a background in aerospace engineering. I'm obsessed with Web technology and created an award-winning Chrome application called Neutron Drive (https://super.neutrondrive.com/). I also run the PyWeb Meetup in Houston, TX and am a chairperson for the PyTexas annual conference. In addition to being a Web and aerospace geek, I'm a father of three and can cook a pretty mean pizza from scratch. I hold a BS in Aerospace Engineering from Embry-Riddle Aeronautical University.
___________________________________________________________________________________________
Log Visualization for dummies
Varang Amin
Dealing with Data, Beginner
Description: During this talk the attendees will have an opportunity to use the ELK (Elasticsearch, Logstash, Kibana) stack to visualize their complex log data.
Abstract: Data is the new bacon. For all industries, including health, security, entertainment, etc., it is impossible for anyone to store and analyze data without using an automated platform. A unified platform is needed to provide data visualization and extract intelligence.
Elasticsearch is a distributed, real-time search and analytics platform. With the help of a RESTful API, Elasticsearch saves data and automatically indexes the parsed data.
During our talk, we will walk attendees through configuring the ELK stack and visualizing datasets in Kibana.
Bio: Varang Amin is working as a Sr Staff Engineer at Palo Alto Networks. Darlene Wong is working as a Sr Staff Engineer at Palo Alto Networks.
___________________________________________________________________________________________
Pants, or How I Learned to Stop Worrying and Love Builds
Moshe Zadka
Scalable Python, Intermediate
Description: For integrated services, it makes sense to keep several logical Python projects in a single repository: a common library, a web front end and a back end service. For such repositories, Pants (built in Python for Python, Java, C++ and more) helps maintain dependencies and build (mostly) stand-alone executables which simplify deployment.
Abstract: Pants is a modern build system written in Python. It can build Python, Java, C++, Go and more. Twitter, Square and FourSquare use it internally, and contribute to it.
Bio: Moshe is a Twisted contributor, and has contributed to core Python. He loves infrastructure for building, monitoring and making services highly available.
___________________________________________________________________________________________
REST Websockets API with Django Channels
Sam Bolgert
All things Web, Intermediate
Description: Building REST APIs over HTTP has been discussed time and again. But could we do the same with WebSockets? What is the performance benefit? What learnings can we carry over from HTTP to WS? This talk will describe how engineers can build a REST API over WebSockets using Django and Channels. It is largely based on my experiences trying to build a REST WebSocket API.
Abstract: Intro
- Brief history of REST and HTTP
- Identify drawbacks of HTTP
- WebSockets
- Alright, you want to use WebSockets. Now what?
What is Django/Channels
- Architecture Overview
- Daphne/Interface servers
- Redis/Messaging Layer
- Worker layer
What is Channels-Api
- open source library I wrote
- based on Django Rest Framework
Setup Project
- Channel Layer
- Routing.py
Define Consumer class
- consumers.py
- serializer_class
- model
Benefits
- Less overhead from HTTP
- More application use cases
- Server push
- Retry messages
Conclusion
- async application without writing async code
- WebSockets are cool
- Doesn't have to replace HTTP but can augment it
Bio: Author of channels-api library. Former Lead engineer at a number of startups.
___________________________________________________________________________________________
Data in a dynamic system: Strategies for backwards compatibility
Trisha Kothari
Dealing with Data, Beginner
Description: There are several unanswered questions in deploying huge schema or logic changes: How do you modify systems with zero downtime or service interruption? How do you optimize online data migrations to allow for fallbacks? Any changes in schema or code in dynamic systems may cause existing users to experience downtime. The talk focuses on strategies to ensure backwards compatibility and prevent breaking data integrity.
Abstract: In an ideal scenario, feature development is easy. Just replace the old code with new code, and you’re done. This is, in fact, true for a system in a state of inertia. However, in a dynamic system with constantly moving pieces of business logic, this presents a hard problem. There are several unanswered questions when deploying huge schema or logic changes: How do you make code and schema changes with zero downtime or service interruption? How do you optimize online migrations of data to allow for fallbacks? Any modification in schema or code may cause users on the older system to experience downtime, which may have terrible implications for user loyalty and company economics. The talk will focus on the importance of backwards compatibility, the difficulties it presents, effective deployment strategies, and finally, how developers can pay it forward to make backwards compatibility less onerous.
Bio: Trisha works as a Software Engineer at Affirm, a take on modern banking started by Max Levchin. At Affirm, Trisha has worked on several projects including the creation of the underlying financial system, architecture of systems for underwriting data processing, and several other product features. She graduated from the University of Pennsylvania studying Computer Science.
___________________________________________________________________________________________
One Pykid at a time
Meenal Pant
/etc, Beginner
Description: If you are a Pythonista, an educator, a STEM supporter, a lover of free software and a parent, then you should attend this talk. This talk brings home the importance of bringing STEM and computing education to K-12 school children early and in a timely manner.
Abstract: Python is a language that makes learning programming easy and can set the foundation for our children to go on and take STEM coursework or use their knowledge of computing in other subject areas when it's time for graduate school. Our school system currently has a gap in its curriculum when it comes to computing and learning how to code. pykids is a voluntary organization that aims to fill that gap by providing easy-to-use learning resources. pykids also encourages classroom learning by creating a space for local meetups and volunteer classes that run through the curriculum.
The pykids set up today includes the following:
- A blog/website where students/instructors register and share ideas
- A jupyter server that allows running notebooks on the fly
- Downloadable Notebooks created for K3-High School students (WIP)
- Teaching material for K-3 students
- Volunteers
All a student needs is a laptop and an internet connection to start learning Python!
Bio: Meenal Pant is a mom, long-time programmer and yes, a Pythonista! She has worked in both industry and academic/research institutes and is therefore keen to “build a bond” between technology and education. She has been a poster presenter and speaker (education summit/lightning talks) at the past few PyCons, the most recent being PyCon 2016. She is actively involved in STEM education via her workplace and also personally in her kid’s schools.
___________________________________________________________________________________________
Make sense of Deep Neural Networks using TensorBoard
Arpan Chakraborty
Dealing with Data, Intermediate
Description: In this talk we look at some ways in which the TensorBoard utility can be used to better understand the structure of Deep Neural Networks and how they function. Best practices on how to use the TensorFlow Python API to make your models and results more interpretable are discussed.
Abstract: Deep Neural Networks are fast becoming the face of modern Machine Learning. But understanding how they work can be a real challenge, especially while you are trying to build a model. Google's recently published library, TensorFlow, includes a lesser-used utility called TensorBoard that can be used to visualize the structure of your neural network model and inspect how data flows through it. This talk will demonstrate some techniques which will help you use TensorBoard more effectively, and better understand how TensorFlow computations work. Code walkthroughs will be done in IPython notebooks, which will be made available to attendees.
Bio: Arpan likes to find computing solutions to everyday problems. He is interested in human-computer interaction, robotics and cognitive science. He obtained his PhD from North Carolina State University, focusing on biologically-inspired computer vision. Working at Udacity, he develops content for artificial intelligence and machine learning courses.
___________________________________________________________________________________________