Python 103: Memory Model & Best Practices
Language Internals, Intermediate
Description: There's a growing crowd of Python users who don't consider themselves beginners anymore. However some users at this stage discover odd behavior that's hard to explain. Why doesn't code behave like it should? Why doesn't "correct" code execute correctly? We'll focus on Python's object & memory model, addressing these issues directly. Let's empower attendees to not create these bugs to begin with!
Abstract: In "Python 101," you learned basic Python syntax. In "Python 102" (or equivalent in experience), you went further, exploring Python more deeply -- creating/using classes, decorators, files, other standard library or 3rd-party modules/packages -- and graduated from being purely a beginner. Because Python has been around the block for quite awhile now, there is a continuously growing number of "Python 103" programmers out there. Many are no longer new to the language, however, they have run into various issues, bugs, or odd behavior in their code that is difficult to explain. It's time to take a closer look. This is an interactive best practices talk, focusing on how Python objects, references, and the memory model work as well as thinking about performance. Knowing more about how the interpreter works under the covers, including the relationship between data objects and memory management, will make you a much more effective Python programmer, and the (main) goal with the knowledge imparted in this talk is to empower developers to not (inadvertently) create certain classes of bugs in their code to begin with! All you need to bring is the desire to learn more about the interpreter to take your Python skills to the next level.
Bio: Wesley J Chun is the author of the bestselling Core Python titles and the Python Fundamentals Live Lessons companion video. He is coauthor of Python Web Development with Django (withdjango.com), and has written for Linux Journal, CNET, and InformIT. Wesley is an architect and Developer Advocate at Google.
Flask for Fun and Profit
All things Web, Intermediate
Description: Learn about building small and large projects with Flask in ways you probably did not see yet.
Abstract: This talk explores how you can build applications and APIs with Flask step by step by being easy to test and scale to larger and more complex scenarios.
The talk will also go a bit into the history of some design decisions in Flask and what works well and in which areas you might want to mix it with other technologies for better results.
Bio: Armin Ronacher is a Python developer, creator of the Flask framework and many popular Python libraries like Jinja, Werkzeug, MarkupSafe and others. He loves to create and design large systems and APIs. He is currently an independent software developer who works on Lektor, computer game infrastructure and the Sentry project residing in Vienna, Austria together with his wife and little boy.
What to do when your data is large, but not big
Scalable Python, Intermediate
Description: This talk will present strategies in Python for handling data that is too large to fit in memory and/or too slow to process in one thread, but small enough to still fit in one machine.
Abstract: Unless you work at a large internet company, you probably don't have BIG data, but you might have LARGE data. Large data consume an unacceptable amount of time and memory when medium strategies are used, but also incur unnecessary financial and latency costs when big strategies are used. Two basic strategies for handling large data, chunking and parallelization, will be discussed with live coded examples in Python.
Bio: I'm a research scientist currently living in the Bay Area and working in neuroethology, human evolution, and natural language processing. I currently work at D-Lab, where I help researchers apply advances in computation to their research paradigms.
Python tracing superpowers with systems tools
Performant Python, Intermediate
Description: Modern system tracers like SystemTap or Dtrace are incredibly powerful. If they're not part of your arsenal of techniques for analyzing Python code, you might be missing out. In this talk, we'll explore how these tools work, and how they can be used for dynamic, low-overhead analysis of unmodified Python programs.
Abstract: Maybe you want to profile your program, but it's running lots of C extension code and conventional profilers can't help you. Or maybe you're tracking down an emergent problem in a production system, but the logs are barren.
Advanced tracing toolkits like SystemTap can help you analyze your program in real time, without modifying or restarting it. But they can also seem dauntingly unfriendly, especially when applied to interpreted languages like Python.
Fear not! We'll talk about how kernel tracing actually works, what tools are available, and what we need to know about the Python interpreter's internals to use them effectively. We'll see how to do mixed-mode profiling, and how to trace specific events, like memory allocations or network calls. We'll discuss some of the pros and cons of these techniques, and how they can be applied to debugging systems in other languages too.
Bio: Eben is a software engineer based in San Francisco. He's used Python to do math research and build email infrastructure, among other things. He likes pie, and rock climbing.
Automating Your Browser and Desktop Apps
All things Web, Beginner
Description: There's a lot of data on the web and in your desktop apps, but accessing it can involve a lot of tedious typing and clicking. This talk is an introduction to the Selenium and PyAutoGUI modules, with live demos straight from the interactive shell. Al Sweigart explains web scraping techniques and programmatically controlling the keyboard and mouse to automate these tasks for you.
Abstract: The internet and personal computer are central tools in many jobs, including professions outside of engineering. This makes web scraping and GUI automation are relevant to not just developers and QA testers, but academics, organizers, and office workers. This talk is an introduction to Selenium and PyAutoGUI modules. and programatically controlling your browser and desktop applications from Python.
Web scraping and GUI automation frameworks have an intimidating reputation for a steep learning curve. While they do have many sophisticated features, the basics that most folks will ever need can be covered in a single presentation.
This presentation has multiple live demos to showcase these modules straight from the interactive shell.
The content from this talk is derived from Automate the Boring Stuff with Python, a beginner's Python book freely available under a Creative Commons license at https://automatetheboringstuff.com
Bio: Al Sweigart is a software developer and the author of Automate the Boring Stuff with Python, Invent Your Own Computer Games with Python, Making Games with Python & Pygame, and Hacking Secret Ciphers with Python. These books are freely available under a Creative Commons license at https://inventwithpython.com. Al enjoys haunting coffee shops, writing educational materials, cat whispering, and making useful software. He lives in San Francisco.
Django, Channels, and Distributed Systems
All things Web, Intermediate
Description: Learn about the Django Channels project, how it makes WebSockets easy, how it's not just limited to Django, and the difficulty of building WebSocket and other stateful protocol handling at scale.
Abstract: Django Channels' headline feature is bringing WebSocket support to Django, but what it provides is far more useful than that. Underlying it is a robust, generic cross-process communication mechanism, built to support and scale with stateful protocols like WebSockets.
This talk will look at the design of this mechanism - codenamed ASGI - and the difficulties of building an entire system to support WebSockets and broadcast systems across a large number of servers, and how Django encapsulates this to provide you a simple but powerful interface with good performance characteristics.
We'll also take a brief look at how parts of Channels are useable outside of Django with other web frameworks or pure Python code, and how it lets us build better systems overall.
Bio: Andrew is a Python programmer, Django core developer and Senior Engineer at Eventbrite. He's behind Django's migration and channels systems, and in his spare time enjoys mountains, archery, and cheese.
Interactive Data Visualization Applications for the Browser with Bokeh
Bryan Van de Ven
Abstract: With support from the DARPA XDATA Initiative, commercial engagements, and contributions from over 150 community members, the Bokeh visualization library (http://bokeh.pydata.org) has grown into a large, successful open source project with heavy interest and following on GitHub (https://github.com/bokeh/bokeh). The principal goals of Bokeh are to provide capability to developers and domain experts: easily create and share interactive, versatile, and powerful visualizations that extract insight from data sets that may be remote, large, or streaming. Bokeh provides a platform for anyone to create interactive data and visualization applications in the browser for themselves, their colleagues, or for a wider audience.
This talk will give a quick overview of recent developments, and demonstrate some of the newest capabilities of Bokeh including:
* Bokeh applications and the second generation Bokeh server (that is more performant, better documented, and much simpler to use and deploy)
* APIs for streaming data (both in the notebook and Bokeh applications)
* The ability to extend Bokeh with your own custom functionality (for example to create 3D plots or network graphs)
* Recent GIS features such as support for GeoJSON and tiled map data sources
* The new Datashader library that can be used together with Bokeh to visualize billions of data points.
Finally the talk will discuss near-term plans for the project, it's governance, and community development.
Bio: Bryan studied undergraduate CS and Math at UT Austin, and graduate Physics at UCLA. Currently he leads the technical effort for work done on the Bokeh project at Continuum Analytics. Previously, he has worked on feature detection and classification systems for submarine platforms, automated tools for financial risk modeling, and workflow optimization for fluid mixing simulations. He has also taught Basic, Advanced, and Scientific Python courses to more than 1500 students in the last four years.
The Python Deployment Albatross
Description: Python deployments can be notoriously tricky - a lot more trickier than they need to be. This talk will briefly outline the history of Python deployments, explore in depth the current landscape and will shed light on the variety of tools and techniques available to deploy Python applications, from the popular to the trendy/state-of-the-art to the esoteric and my experiences with them.
Abstract: To understand the current state of packaging, distribution and deployment of Python artifacts, it's important to understand the architecture and internals of commonly used tools such as distutils, setuptools, eggs, pip, PyPI, virtualenv, wheels and building compiled extensions.
In the recent few years, an interesting development from Twitter has been PEX - P(ython) EX(ecutable). PEX is famously being used at companies like Twitter, Square and LinkedIn to deploy *all* applications. This talk will chart our history of using PEX (along with the Pants build system) at imgix over the course of the last 2 years.
No talk on python packaging and distribution will be complete without mentioning the D word - yes, you guessed it right - Docker. This talk will explore the current state of Python deployment using Docker/containers as well as several anti patterns. The talk will highlight the challenges containerization calls for and why it might not be the right solution for many use cases. I'll be drawing on my experience running a Dockerized Tornado application in production.
Lastly, the talk will explore Nix - a powerful open source package manager for Linux and other Unix systems that makes package management reliable and reproducible. The talk will detail some of our experience with Nix so far as well as some of the issues we ran into, and whether Nix is a viable alternative to the existing tools in the ecosystem.
Bio: I've been working with Python for over 5 years now and am currently employed at imgix where I work on Python, Go and other misc backend stuff. I organize the San Francisco Python Twisted and Bay Area Lua Developers meetups.
A Practical Introduction to Airflow
Dealing with Data, Intermediate
Description: Moving data through transformations and from one place to another is a big part of data science/eng. We’ve been using Airflow for several months at Clover Health and have learned a lot about its strengths and weaknesses. We will use this talk to give a practical introduction to Airflow that gives people the information they need to decide whether Airflow is right for them and how to get started.
Abstract: Airflow is a popular pipeline orchestration tool for Python that allows users to configure complex (or simple!) multi-system workflows that are executed in parallel across any number of workers. A single pipeline might contain bash, Python, and SQL operations. With dependencies specified between tasks, Airflow knows which ones it can run in parallel and which ones must run after others. Airflow is written in Python and users can add their own operators with custom functionality, doing anything Python can do.
At Clover Health, we’ve been pushing Airflow’s limits, digging into the source code, and contributing patches upstream. In this talk, we’ll cover the basics of Airflow so you can use what we’ve learned to start your Airflow journey on the right foot. This talk aims to answer questions such as: What is Airflow useful for? How do I get started? What do I need to know that’s not in the docs?
Bio: I have been a scientific Python developer since 2008. I’ve worked in atmospheric science, astronomy, urban planning, web applications, and healthcare. I maintain several open source Python libraries and am currently a data engineer at Clover Health.
Python Profiling and Performance: Elementary to Enterprise
Performant Python, Intermediate
Description: This talk provides an end-to-end introduction and overview of Python performance practices, from fundamentals to functional industry practices to the future of performant Python. If you've ever felt lost in or out of touch with the constant whirl of Python performance advancements, this practical talk will put it back into perspective.
Abstract: Performance is a complex topic. It means a lot of things to a lot of people. Python gives us a great starting point: strong primitives and the "good enough" philosophy. But is Python actually good enough for performance-critical applications?
This talk defines different kinds of performance, covers basic principles, and dives right into measurement. With those foundations laid, it outlines eight approaches to scaling Python, four of which are stack-agnostic and four of which are Python-specific. It outlines many examples from industry to promote a holistic view of performance as a practical process, not a large-scale benchmarking competition.
Bio: Mahmoud Hashemi is Lead Developer of Python Infrastructure at PayPal, where he focuses on distributed systems, API design, and application security. He presented O'Reilly's Enterprise Software with Python, as well as several guides to topics from DNS to software versioning to statistics. An avid Wikipedian, Mahmoud is half of Hatnote, creators of Listen to Wikipedia and other fine wiki-based software.
Caffe + Jupyter + Pandas It’s not rocket science, well sorta.
Dealing with Data, Intermediate
Description: In this talk I will walk the users through the entire process of building a convolutional neural network for image classification. The process starts with a flask application to label your data, followed by characterizing, training, and evaluating the CNN using Pandas, Jupyter Notebooks, and Bokeh plots. Finally we show how the CNN can be deployed and used in real-world applications.
Abstract: Convolutional Neural Networks: they’re new, they’re big, they’re complex, they’re poorly documented and accordingly they are a little scary. At Planet we will image the entire earth every day, and to deliver that data to customers we need to analyze images without it ever being seen by human eyes. In this talk we’ll cover how to build, train, and characterize a neural net for image classification all from the comfort and safety of a Jupyter notebook. This talk will serve as a template for building and using your very own CNN.
Bio: Katherine Scott is a senior software engineer at Planet working on image classification. Prior to planet Ms. Scott was the co-founder and CTO of Tempo Automation and a co-founder at Sight Machine. Katherine is currently the Program Chair for the Open Source Hardware Association.
Dealing with Data, Intermediate
Description: An overview of Project Jupyter.
Abstract: Jupyter is an open source, language agnostic, interactive computing platform used in scientific computing and data science that provides multiple tools tailored for different workflows, from traditional terminal-style control to the popular web-based Notebook. The Jupyter Notebook is a web application that allows users to create and share documents that contain live code, equations, visualizations and explanatory text. Jupyter is the evolution of the original ideas in the IPython interactive shell, as we generalized them into a language agnostic protocol that has now been implemented in over 50 separate languages.
One project within the Jupyter ecosystem, JupyterHub, is a multi-user environment for Jupyter Notebooks that runs off a central server and that can be used to serve Notebooks to classes of students, corporate workgroups, or scientific research groups. JupyterHub is the backbone for UC Berkeley’s new Undergraduate Data Science Education Program, an ambitious program that aims to provide every freshman with core knowledge and skills in data science.
In this talk we will discuss and demonstrate the many development activities underway at Project Jupyter, including IPython 5.0, JupyterHub, and JupyterLab, and how these tools are used in data science, industry, scientific research, and education.
Bio: Jamie Whitacre is the technical project manager for Project Jupyter, an open-source scientific computing and data science ecosystem used extensively in academia and industry. Project Jupyter operates out of the Berkeley Institute for Data Science (BIDS) at UC Berkeley. Matthias Bussonnier is a postdoctoral researcher at BIDS and a core developer for Jupyter and IPython.
Explore Git internals using Python | Let's write `git log` in Python
Description: Git is a powerful tool for source control. It's often misunderstood and abused. Under the surface Git is an elegant and simple data structure. When you don't understand that data structure, you don't really understand Git. It is flexible enough to give you all the rope that you need to hang yourself in Git hell. However, if you understand it, you are released from Git hell.
Abstract: In this talk, we start with a simple explanation of the Git data structure on disk. We discuss where the local Git repo is stored: `.git`. From there, we discuss the `config, `HEAD`, `refs/heads`, and `objects`.
We use Python to read those data structures and reconstruct a `git log` command for any arbitrary git repository. When finished, we should have our own working command that does the same thing as `git log` for any arbitrary repository, on any branch. We'll simply start at `HEAD` and work our way down the data structure.
Although it is not *useful* to have a Python version of Git, it is *fun*. Also, this exploration helps you understand the Git tool on a much deeper level. When you can program something, you can understand it. And, understanding Git helps you be a better developer and collaborator.
Bio: Glen Jarvis has been programming Python for over 8 years and has been programming in different languages for longer. He has been certified in Linux/Unix administration by UC-Berkeley. He gained the highest certification available for Informix DBAs. He is also certified in MongoDB as Developer and Administrator. He has worked for companies such as IBM, UC-Berkeley, Sprint and Silicon Valley Start-ups. He has worked in the fields of Databases, DataScience, Bioinformatics and Web Technologies.
"Good Enough" IS Good Enough!
Description: Our culture's default assumption is that everybody should always be striving for perfection -- settling for anything less is seen as a regrettable compromise. This is wrong in most software development situations: focus instead on keeping the software simple, just "good enough", launch it early, and iteratively improve, enhance, and re-factor it. This is how software success is achieved!
Abstract: In 1989, Richard Gabriel caricatured two approaches to SW development: "worse is better" ("New Jersey approach") and "the right thing" ("MIT/Stanford approach"), reluctantly concluding NJ was more viable, for several reasons (speed of development, flexible designs, systems adaptable to a variety of uses [including changes in requirements], ease of gradual, incremental improvement, ...). And this debate hasn't died down since.
Debate rages, but reality has moved away from "right thing" ("Cathedral"-centralized "Big Design Up Front", focus on academia/large firms, unsuited to shifting real-world requirements), toward "NJ" ("Bazaar"-like, agile iterative enhancement, dynamic start-ups/independent developers, a world of always-shifting specs).
In this talk I support "the NJ approach", on both philosophical and pragmatical grounds, with examples from many areas. Winners of the "mind-share battles" focused on simplicity ("good enough"), not theoretical refinement/completeness: large ecosystems of developers, incremental improvement -- TCP/IP approach vs ISO/OSI, HTTP/HTML vs Xanadu, early Unix's simplistic (but OK) approach to interrupted system calls vs Multic's/ITS's perfectionism.
In Python, metaclasses often end up too complex (80% of their pluses can be had via class decorators, for 20% of the complexity); OTOH, incremental improvement worked just fine in sorting, generators, and guaranteed-finalization semantics.
The talk is not perfect, but I do think it's good enough.
Bio: Author of "Python in a Nutshell", co-author of "Python Cookbook", PSF Fellow, frequent speaker at Python conferences, prolific contributor to StackOverflow, and winner of the 2006 Frank Willison Memorial Award for contributions to Python, Alex currently leads "1:many tech support" for Google Cloud Platform. He's married to Anna Ravenscroft, his co-author in the "Cookbook" 2nd edition and "Nutshell" 3rd edition, also a PSF Fellow, and also a winner of the Frank Willison Memorial Award, in 2013.
Beautiful Documentation Oriented Programming
Description: Have you ever wonder how to write beautiful documentation with minimal effort? Did the tools get in your way in the process? This talk offers practical examples of leveraging simple text and docstrings to create stunning browsable documentation while making sure your code works as designed.
Abstract: Documentation is a fundamental organizational tool. Not only it help us to understand our programs, documentation can help us to develop and test our code iteratively.
Formats and tools like reStructuredText and Sphinx had made a positive lasting impact in our Python community as we can now easily write splendid documentation with little effort. In turn, the documentation can be auto-tested and taken straight from our source code avoiding redundancy.
This talk highlights the benefits of using simple text for writing programs and documentation, teaching the basics of reStructuredText, Sphinx, docstrings and doctests. We will be modeling the early stages of developing an application, following best practices, verifying program correctness and learning how to create beautiful documentation.
Bio: Daniel Mizyrycki has been programming in Python for over a decade in industry (GreenBusinessCA, Amazon, Docker) and educational (CCSF, RCSD) environments. Previously, he used assembly, C, perl, bash, founded the first Argentinean Linux User Group (1993) and consulted for early Argentinean ISPs. He loves Python's community being a PSF Contributing Member at SFPython, PyLadies, Baypiggies, PyCon and authoring sphinxserve and loadconfig. Today, he teaches Python to hundreds of Cisco engineers.
Self-Healing Systems: The Road to 99.99% Uptime
Scalable Python, Intermediate
Description: Stop firefighting and start fireproofing! There are many tools that make oncall easier and increase availability, but we'll be mostly focusing on a few principles and design patterns that help make your systems more robust.
Abstract: Feature velocity is typically a higher priority early in a software's lifecycle, but as the system matures there is an effort to start fireproofing the system. On the Yelp Transactions Platform team we've used a combination of circuit breakers, queues, and idempotent operations to minimize downtime and waking up in the middle of the night.
We'll take a look at how these design patterns help us in a distributed system, when they should be used, and common pitfalls associated.
Bio: William Ting is a longtime FOSS advocate with contributions in various projects (Pelican, autojump, pyramid_swagger, Rust, GNOME). He's currently an infrastructure engineer at Reddit, and previously on the Yelp Transaction Platform team.
Exploring complex data with Elasticsearch and Python
Dealing with Data, Intermediate
Description: Elasticsearch is a powerful open-source search and analytics engine with applications that stretch far beyond adding text-based search to a website. Learn how Elasticsearch can be used with Python and Django to crunch through complex datasets and quickly build powerful interfaces for exploring information.
Bio: Simon Willison is an engineering director at Eventbrite, a Bay Area ticketing company working to bring the world together through live experiences. Simon works as part of a small product research and prototyping lab helping develop new concepts for Eventbrite products and features. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator funded company he co-founded in 2010. He is a co-creator of the Django Web Framework.
The Tower of Abstraction
Description: Abstraction is a powerful servant, but a dangerous master. We code, design, think, debug ... on a tower of abstractions. Spolsky's Law tells us that "All abstractions leak". This talk explores why they leak, why that's often a problem, what to do about it; I also cover why sometimes abstractions SHOULD "leak", and how best to produce and consume abstraction layers.
Image processing using Python
Dealing with Data, Intermediate
Description: Image acquisition and processing have become a standard method for qualifying and quantifying experimental measurements in many fields of science and engineering. Python provides many computational tools that can be used to perform image processing. In this talk, we will walk through the most common workflow in image processing along with examples.
Abstract: Image acquisition and processing have become a standard method for qualifying and quantifying experimental measurements in many fields of science and engineering. Python offers the following advantage: simpler syntax, powerful libraries and modules that focuses on increasing the productivity and most importantly it is free and open-source.
We will learn image processing through a simple and common workflow. We will read a high-resolution image of a mice. We will filter the image to reduce noise and improve the quality of the image. We will then segment the image, so that we obtain only the bones. We will clean up the over-segmented regions using morphological operations. We will perform measurements on the segmented image. Finally, we will discuss the workflow with a Python code.
Bio: Ravi Chityala is a Senior Engineer at Elekta Inc. He has more than 12 years of experience in image processing and scientific computing. He is also a part time instructor at the UCSC Extension, San Jose, CA, where he teaches advanced Python to programmers. He uses Python for web development, scientific prototyping and computing and as a glue to automate process. He is the co-author of the book, "Image Processing and Acquisition using Python" published by CRC Press.
A/B Testing: Harder than just a color change
All things Web, Intermediate
Description: Is your Product Manager asking you to test out different text or button colors? Not sure where to start? This talk will contain methodology and two case studies from Yelp’s Transaction Platform on how to properly run an experiment and get the best result. Learn about how to run a simple button color experiment, avoid pitfalls, test, and analyze the results with confidence. Statistical confidence!
Abstract: A/B testing is a common practice for websites...but where do you begin? This data-driven approach allows you to launch experiments and features with confidence. So how do you prepare, launch, and analyze an A/B experiment? How do you know for how long to keep it running? What about which metrics to track?
This talk will present a procedure developed to run an A/B experiment, from planning the task and understanding the key metrics to analyzing the results. We will cover both simple and more complex case study, which help us understand the challenges involved in running experiments.
This talk will cover a topic that will enable developers to make more data-driven decisions but has not been covered at Pycon. By providing case studies as motivation and a procedure to implement A/B testing this talk will excite the audience. Yelp runs multiple experiments on different aspects and the Transaction Platform team has gotten unique experience of needing to create experiments with limited traffic which will be discussed in the talk.
Bio: Or Weizman is an engineer for Yelp's Transaction Platform team, which enables users to transact with Yelp's extensive set of businesses through many third party providers.
Behind Closed Doors: Managing Passwords in a Dangerous World
Description: A modern application has a lot of passwords and keys floating around. Encryption keys, database passwords, and API credentials; often typed in to text files and forgotten. Fortunately a new wave of tools are emerging to help manage, update, and audit these secrets. Come learn how to avoid being the next TechCrunch headline.
Bio: Noah Kantrowitz is a web developer turned infrastructure automation enthusiast, and all around engineering rabble-rouser. By day he builds tools and teaches, and by night he works with the Python Software Foundation infrastructure team. He is an active member of the Chef community, and enjoys merge commits, cat pictures, and beards.
Caravel - A data visualization, exploration and dashboarding platform
Dealing with Data, Intermediate
Description: Airbnb developed Caravel to provide all employees with interactive access to data while minimizing friction. Caravel's main goal is to make it easy to slice, dice and visualize data. It empowers each and everyone to perform analytics at the speed of thought.
Abstract: Topics include:
* Intuitively visualizing datasets while filtering, pivoting, and changing views
* Creating and sharing simple dashboards
* Caravel's rich set of visualizations
* Caravel's extensible, high-granularity security/permission model allowing intricate rules on who can access individual features and the dataset
* Caravel's enterprise-ready authentication with integration with major authentication providers (database, OpenID, LDAP, OAuth, and REMOTE_USER through Flask AppBuilder)
* Caravel's simple semantic layer, allowing users to control how data sources are displayed in the UI by defining which fields should show up in which drop-down and which aggregation and function metrics are made available to the user
* Caravel’s deep integration with Druid
* Caravel’s integration with most RDBMS through SQLAlchemy
Bio: Maxime Beauchemin works at Airbnb as part of the Data Tools team, developing open source products that reduce friction that help generating insight from data. He is the creator and a leading maintainer of Apache Airflow [incubating] (a workflow engine) and Caravel (a data visualization platform). Before Airbnb, Maxime worked at Facebook on computation frameworks around engagement and growth analytics, at Yahoo! on social properties analytics, and at Ubisoft as a data warehouse architect.
Xonsh – put some Python in your Shell.
Description: Xonsh is a Python-ish, BASHwards-looking shell language and command prompt. The language is a superset of Python 3.4+ with additional support for the best parts of shells that you are used to, such as Bash, zsh, fish, and IPython. It works on all major systems including Linux, Mac OSX, and Windows. Xonsh is meant for the daily use of experts and novices alike.
Abstract: Programmers spend their time at a command line interface often sticking to
default shell. A lot of progress have been made for the friendliness,
usability, extensibility of shell. We thus introduce Xonsh which attempt to
bring the command line shell to the 21st century.
Xonsh is general purpose shell that combines Python and the best features of
Bash, zsh, IPython and fish. Written in Python and relying only the standard
library and PLY, the xonsh language is a strict superset of Python that
compiles to a Python AST. The shell can provides exciting features: rich
history, tab completion from bash and man pages, syntax highlighting,
auto-suggestion, foreign-function aliases and more!
Wether you are a novice who is looking to use use the command line, or an
Python expert Xonsh is made for you.
Because xonsh is Python, it automatically has all the available python
ecosystem at your fingertip. Xonsh makes meshing and intertwining python code
with command-line interfaces as seamless as possible. Have you ever wanted to
use regular expressions to glob files? No problem! Ever wanted to curl a remote
resource right into `json.loads()`? Now you can. Do you not want to leave the
command line to use pandas, NLTK or add two numbers together? No big deal.
The xonsh homepage is at https://xon.sh
Bio: I am a PostDoc at UC Berkeley Institute for Data science, and have been a core Developer of IPython and Jupyter for a couple of years. With a background in Physics I spend most of my time developing tools for the scientific community and for education as well as promoting Python 3.
Safe-ish By Default: The Django Security Model and How to Make it Better
Description: Come join us by the fire as we have Security Story Time with our friends, Frog and Toad. With them, you'll learn about all the things Django does to protect users and developers out of the box. We'll look at simplified code samples from the Django codebase to see what's happening under the hood, and cover how to make the Django security model even stronger in your application
Abstract: Introduction. Philip James, how long I’ve worked with Python and Django, background at EB
Introduction to the story, and the characters. Safe-ish: Talk about Django’s Security Model and how it tries to provide sane defaults for developers
Run-through of the parts of the django security model:
* XSS (brief definition). How do you turn it off? Mark Safe, | n, safe
* CSRF (brief definition). Django has middleware that checks POST requests for a token. Token is stored in cookie, also. Side-effect: harder to JS. Also, only an issue if you’re already owned, so maybe not an issue?. How to get around it? csrf_exempt
* SQLi (brief definition). Django’s ORM makes clean sql, (even when given bad data?). How? How to get around it: extra()/RawSQL()
* Clickjacking protection (brief definition). Django has middleware that sets headers browsers are supposed to respect. How to get around it: xframe_options_exempt, xframe_options_deny, xframe_options_sameorigin
* HTTPS. This one is less "out of the box" than the others, so won’t be talked about here.
* Host Header Validation (brief definition). Django verifies against allowed hosts in settings. How? get_host()
* Session security. What are django sessions?. Cookie-based by design. How can we make this better?
* Overall: Vigilance. Be aware of uses of this within your product
* HTTPS: Use it!. Set the correct settings. SECURE_SSL_REDIRECT: How does it work?
Bio: Philip is a Senior Software Engineer at Eventbrite. In his spare time, he writes novels, makes twitter bots, and gives technical talks. He used to run a webcomic, but there's just no money in it, you know? Philip is a refugee from the video games industry, and wishes anyone still there the best of luck. Philip has spoken at conferences about Python, Django, Node.js, and Linux. Philip believes in the web.
Introduction to HTTPS: A Comedy of Errors
Description: Given recent increases in hostile attacks on internet services and large scale surveillance operations by certain unnamed government organizations, security in our software is becoming ever more important. We'll give you an idea of how modern crypto works in web services and clients, look at some of the common flaws in these crypto implementations, and discuss recent developments in TLS.
Abstract: In this talk I'll explain what happens behind the scenes when we try to establish a secure connection to a web site.
I'll cover the common security flaws in popular TLS implementations like OpenSSL, and show how these issues can be avoided if we have a well-designed TLS implementation in a high level language like Python.
Finally, I'll demonstrate and discuss how the API design of OpenSSL leads to application bugs, and a lack of abstract secure defaults leads to insecure applications.
Bio: Ashwini is a Software Engineer at Eventbrite, and an open source developer living in San Francisco. In the past, she has worked on a pure Python TLS implementation through the Stripe Open Source Retreat, an asynchronous event-driven networking framework - Twisted, and a PHP implementation in RPython called HippyVM. She also served as a Director of the Python Software Foundation last year.
TensorFlow on the Web
Kendall Chuang, David Clark
All things Web, Intermediate
Description: This talk will be about walking through the steps to put a TensorFlow project into production on the web with Flask and Heroku. The goal is to introduce the project and show how TensorFlow can be used online for real data tasks, and discuss other considerations for deployment of a TensorFlow project.
Abstract: TensorFlow is a deep learning library with Python and C++ bindings that was released in 2015. The talk start with a brief intro to TensorFlow, and then dive into the specific steps to set up a simple project that can be served online.
Bio: Kendall is a lead software engineer at YesGraph, where he uses machine learning and Flask to power better invite flows for mobile and web apps. Previously he worked as an independent software consultant for four years, and before that he was a hardware designer at Qualcomm in San Diego for three years. Kendall was an an organizer of the San Diego Python Users Group, where he helped plan six one-day workshops on various Python topics.
Bio2: David Clark has a background in astrophysics, where he used Python extensively to analyze astronomical data. He recently transitioned careers to data science. Currently he is doing consulting for two startups. At Palo Alto Scientific, Inc., he uses the machine learning library TensorFlow to model sensor data from a wearable and infer a runner’s performance. He is also doing work for Quantea, Inc., making a dashboard using the Python libraries Bokeh and Pandas.
Unspeakably Evil Hacks in Service of Marginally Improved Syntax: "Compile-Time" Python Programming
Language Internals, Intermediate
Description: One of Python's strengths as a dynamic language is its suite of powerful metaprogramming tools. What happens, however, when you want to move beyond the tools provided by "traditional" metaprogramming techniques? This talk will take the audience on a brief tour of projects and techniques that stretch the boundaries of what's possible in Python.
Abstract: In this talk, we provide an introduction to several lesser-known techniques for hacking extending the functionality of Python. Along the way, we consider the costs (in clarity, portability, or otherwise) of employing nonstandard tools to work around limitations of Python.
Topics may include:
- Runtime Bytecode Rewriting (https://github.com/llllllllll/codetransformer)
- Hooking the Lexer with Custom Encodings (https://github.com/dropbox/pyxl)
- Import Hooks (https://github.com/hylang/hy, http://cython.org/)
Bio: Scott Sanderson is an engineer at Quantopian, where he is responsible for the design of Quantopian's backtesting and research APIs. He is a core developer on the open source backtesting library Zipline, and he is a contributor to several projects in the PyData ecosystem, including IPython and the Jupyter Notebook. Scott graduated from Williams College in 2013 with bachelor's degrees in Mathematics and Philosophy.
Building a Tic-Tac-Toe Two-Player Game using Tornado over Websockets
All things Web, Beginner
Description: Learn how to build a Two-player game using Python Tornado web framework. We will be using websockets to make the app realtime.
Abstract: We will live code and learn how to build a real-time game app using Tornado web framework and websockets. Through this app, we will learn how to write an web app using Tornado web framework (http://www.tornadoweb.org/), and how to communicate over websockets. We will be building a Tic-Tac-Toe two-player game to learn about these concepts. A player can start a new game or accept a challenge from another player.
When a player starts a new game, the app would create a new game channel, provide a handle to the channel that the player can send to his friend to join in and play. We will not be dealing with any authentication or logins to start a new game or to join an existing one. The goal is to show how easy it's to build an realtime app with Tornado.
Bio: Ramesh loves building data products that blend visualization and machine learning. He mostly uses Flask / Tornado to create web apps, Pandas / Scikit-learn for build machine learning models and D3 for visualizations.
Description: We talk about magic tongue-in-cheek and often with a negative connotation. This talk will look to clearly define magic and answer the question of "when is magic harmful?"
Abstract: This talk will look into some libraries we use (and love!) and the magic that powers them.
We will look at how things like namedtuple, Django models, and the Flask request object are implemented. We will look at how each of these libraries encapsulates complexity. At the end of the talk (hopefully) we will understand more about the internals of these libraries and have new tools to gauge if magic should be considered harmful for yourself.
A Guide to Bad Programming
Description: In a sea of talks and information about how to improve your coding skills, this talk will make a case for bad code in your everyday life. In this talk you'll learn how and why you should write bad code.
Abstract: Inspired by "the queen of sh*tty robots" and a talk I had recently with a friend about how often our code "optimizations" and best practices don't matter, this talk will point out some of the obsessions we have as professional programmers that don't matter and can even be harmful to the progress of a product. The talk will show how identifying as a "bad programmer" can improve your skills in the long run and help you become a better programmer. Lastly, the talk will showcase some "bad practices" that can be fine or even good when used appropriately.
Bio: I'm a web developer with a background in aerospace engineering. I'm obsessed with Web technology and created an award winning Chrome application called Neutron Drive (https://super.neutrondrive.com/). I also run the PyWeb Meetup in Houston TX and am a chair person for the PyTexas annual conference. In addition to being a Web and aerospace geek, I'm a father of three and can cook a pretty mean pizza from scratch. I hold a BS in Aerospace Engineering from Embry-Riddle Aeronautical University.
Log Visualization for dummies
Varang Amin, Darlene Wong
Dealing with Data, Beginner
Description: During this talk the attendees will have an opportunity to use the ELK(Elasticsearch, Logstash, Kibana) stack to visualize their complex log data.
Abstract: Data is the new bacon. For all industries, including health, security, entertainment, etc., it is impossible for anyone to store and analyze data without using an automated platform. A unified platform is needed to provide data visualization and extract intelligence.
Elasticsearch is a distributed, real-time, search and analytics platform. With the help of a restful API, Elasticsearch saves data and auto indexes the parsed data.
During our talk, we will walk attendees through configuring the ELK stack and visualize datasets on Kibana.
Bio: Varang Amin is working as a Sr Staff Engineer at Palo Alto Networks. Darlene Wong is working as a Sr Staff Engineer at Palo Alto Networks.
Bio2: Darlene Wong is working as a Sr Staff Engineer at Palo Alto Networks. Before PAN, she worked in development role at Juniper Networks & Cisco Systems.
Pants, or How I Learned to Stop Worrying and Love Builds
Scalable Python, Intermediate
Description: For integrated services, it makes sense to keep several logical Python projects in a single repository -- a common library, a web front end and a back end service. For such repositories, Pants (build in Python for Python, Java, C++ and more) helps maintain dependencies and build (mostly) stand-alone executables which simplify deployment.
Abstract: Pants is a modern build system written in Python. It can build Python, Java, C++, Go and more. Twitter, Square and FourSquare use it internally, and contribute to it.
Bio: Moshe is a Twisted contributor, and has contributed to core Python. He loves infrastructure for building, monitoring and making services highly available.
REST Websockets API with Django Channels
All things Web, Intermediate
Description: Building REST APIs over HTTP has been discussed time and again. But could we do the same with WebSockets? What is the performance benefit? What learnings can we carry over from HTTP to WS? This talk will describe how engineers can build a REST API over WebSockets using Django and Channels. It is largely based on my experiences trying to build a REST WebSocket API.
Abstract: Websockets have been around for a number of years but popular web frameworks have been slow to integrate because of their asynchronous nature. With Django Channels we’ve finally broken that frontier in a synchronous way. How will developers use this new territory? I will outline some of my explorations that I have serialized into a library I call channels-api. It takes familiar patterns from Django Rest Framework and applies them to websocket land. I will walk through a sample project to demonstrate the configuration and installation of the library. I will demonstrate using these patterns we can create a “REST like” API relatively quickly. We can also implement new features that HTTP doesn’t support like server side push.
Bio: Author of channels-api library. Former Lead engineer at a number of startups.
Data in a dynamic system: Strategies for backwards compatibility
Dealing with Data, Intermediate
Description: There are several unanswered questions in deploying huge schema or logic changes: How do you modify systems with zero downtime or service interruption? How do you optimize online data migrations to allow for fallbacks? Any changes in schema or code in dynamic systems may cause existing users to experience downtime. The talk focuses on strategies to ensure backwards compatibility and prevent breaking data integrity.
Abstract: In an ideal scenario, feature development is easy. Just replace the old code with new code, and you’re done. This is, in fact, true for a system in state of inertia. However, in a dynamic system, with constantly moving pieces of business logic, this presents a hard problem. There are several unanswered questions while deploying huge schema or logic changes: How do you make code and schema changes with zero downtime or service interruption? How do you optimize online migrations of data to allow for fallbacks? Any changes in schema or code in dynamic systems may cause existing users to experience downtime. The talk focuses on strategies to ensure backwards compatibility and prevent breaking data integrity.
Bio: Trisha works as a Software Engineer at Affirm, a take on modern banking started by Max Levchin. At Affirm, Trisha has worked on several projects including the creation of the underlying financial system, architecture of systems for underwriting data processing, and several other product features. She graduated from the University of Pennsylvania studying Computer Science.
One Pykid at a time
Description: If you are a Pythonista, an educator, STEM supporter, love free software and a parent then you should attend this talk. This talk brings home the importance of brining STEM and computing education to the K-12 school children early and in a timely manner.
Abstract: Python is a language that makes learning programming easy and can set the foundation for our children to go on and take STEM coursework or use their knowledge of computing in other subject areas when its time for graduate school. Our school system currently has a gap in their curriculum when it comes to computing and learning how to code. pykids is a voluntary organization that is aiming to fill that gap by providing easy to use learning resources. pykids is also encouraging classroom learning by creating a space for local meetups and volunteer classes that run through the curriculum.
The pykids set up today includes the following:
- A blog/website where students/instructors register and share ideas
- A jupyter server that allows running notebooks on the fly
- Downloadable Notebooks created for K3-High School students (WIP)
- Teaching material for K-3 students
All a student needs is a laptop and an internet connection to start learning Python!
Bio: Meenal Pant is a mom, long time programmer and yes a Pythonista!. She has worked in both the industry and academic /research institutes and therefore is keen to “build a bond” between technology and education. She is a poster presenter and speaker (education summit/lightening talks) in the past few PyCons - the most recent being PyCon2016. She is actively involved in the STEM education via her workplace and also personally in her kid’s schools.
Make sense of Deep Neural Networks using TensorBoard
Dealing with Data, Intermediate
Description: In this talk we look at some ways in which the TensorBoard utility can be used to better understand the structure of Deep Neural Networks and how they function. Best practices on how to use the TensorFlow Python API to make your models and results more interpretable are discussed.
Abstract: Deep Neural Networks are fast becoming the face of modern Machine Learning. But understanding how they work can be a real challenge, especially while you are trying to build a model. Google's recently published library, TensorFlow, includes a lesser-used utility called TensorBoard that can be used to visualize the structure of your neural network model and inspect how data flows through it. This talk will demonstrate some techniques which will help you use TensorBoard more effectively, and better understand how TensorFlow computations work. Code walkthroughs will be done in iPython notebooks, which will be made available to attendees.
Bio: Arpan likes to find computing solutions to everyday problems. He is interested in human-computer interaction, robotics and cognitive science. He obtained his PhD from North Carolina State University, focusing on biologically-inspired computer vision. Working at Udacity, he develops content for artificial intelligence and machine learning courses.
Deep Dive into Principal Components Analysis
Dealing with Data, Intermediate
Description: Commonly used in image recognition, speech to text and text analysis, Principal Components Analysis (or PCA) separates the signal from the noise in your data and reduces your dimensionality so that meaningful analyses can be performed.
Abstract: PCA is vital for reducing high dimensional models with sparsity issues, without sacrificing the information contributed by each feature. In this talk, I will be explaining what happens under the hood during PCA, making the code and math accessible and interpretable.
Bio: Rumman comes to data science from a quantitative social science background. Prior to joining Metis, she was a data scientist at Quotient Technology, where she used retailer transaction data to build an award-winning media targeting model. Her industry experience ranges from public policy, to economics, and consulting. Her prior clients include the World Bank, the Vera Institute of Justice, and the Los Angeles County Museum of the Arts. She holds two undergraduate degrees from MIT, a Masters in Quantitative Methods of the Social Sciences from Columbia, and she is currently finishing her Political Science PhD from the University of California, San Diego. Her dissertation uses machine learning techniques to determine whether single-industry towns have a broken political process. Her passion lies in teaching and learning from teaching. In her spare time, she teaches and practices yoga, reads comic books, and works on her podcast.
Turnkey Distributed Tracing: OpenTracing and Python
Performant Python, Intermediate
Description: This talk underlines the importance of distributed tracing in microservices, reveals the need for standardization of tracing instrumentation, and explains how the OpenTracing project addresses that need. We will showcase OpenTracing’s set of consistent, expressive, explicitly vendor-neutral tracing APIs with working examples using python gRPC, Flask, and possibly other frameworks if time allows.
Abstract: Microservice architectures offer many benefits but are notoriously difficult to observe or debug, especially as transactions inevitably cross process boundaries. I focused on this problem at Google and built Dapper, Google’s production distributed tracing system. I learned from Dapper’s design eccentricities that—in order to facilitate ubiquitous, transparent distributed tracing in OSS-heavy applications—as an industry we must find a way to make open-source software components interoperate from a tracing standpoint.
This need for standardized tracing instrumentation across open source brings us to OpenTracing. By offering consistent, expressive, vendor-neutral, open-source APIs for popular platforms, OpenTracing provides a semantic and syntactic standard for distributed tracing instrumentation, allowing developers to add (or switch) tracing implementations with an O(1) configuration change.
I will explain OpenTracing’s data model and the motivations behind it, describe how it generalizes across tracing systems and instrumentation languages (with a focus on Python), and show demos of how it integrates with frameworks (like Flask and gRPC) and applications. We will then cover the “hello world” of tracing: creating and linking traces that traverse process boundaries. Finally, I’ll share what’s in store for the rest of 2016 and 2017 and talk about ways to get involved.
Bio: Ben is a cofounder at LightStep, an early-stage company focused on reliability and performance management for modern distributed systems. He is a globally recognized expert in monitoring and tracing for highly concurrent, large-scale distributed systems. Previously, Ben was at Google where he built and led the engineering teams for Dapper — Google’s distributed systems tracing infrastructure — and Monarch, Google’s fleet-wide timeseries collection, storage, analysis, and alerting system.