Data + AI Summit 2022 HD Video Downloads

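Data + AI Summit 2022 was held in San Francisco from June 27 to June 30, 2022, and readers in China could follow it online. The event ran for four days: the first day was dedicated to training, and the remaining days made up the main conference. The conference featured more than 200 sessions, with speakers drawn from industry, research, and academia. The sessions were organized into six main tracks: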

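Analytics, BI and Visualization: the latest analytics, BI, and visualization technologies, along with solutions from customers and the community.
Data Engineering: an in-depth look at current data engineering practice, from building data pipelines and managing data quality, through ETL and data-quality frameworks, to DataOps.
Data Lakes, Data Warehouses and Data Lakehouses: the concepts and best practices behind the evolution of data lakes and data warehouses into data lakehouses.
Data Science, Machine Learning and MLOps: techniques and best practices for running data science and machine learning pipelines in production.
Data Security and Governance
Research: dedicated to academic and advanced industrial research, including large-scale schedulers, graphs, data analytics, and machine learning systems.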

For the full conference agenda, see: https://databricks.com/dataaisummit/agenda

To keep up with new articles on Spark, Hadoop, or HBase, follow the WeChat official account 过往记忆大数据.

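The Day 1 keynote made several major announcements: the roadmap for Apache Spark, a next-generation Structured Streaming solution, and the open-sourcing of all Delta Lake features. Besides the Day 1 keynote, the following sessions are also worth watching: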

Apache Spark SQL Aggregate Improvement at Meta (Facebook)
Recent Parquet Improvements in Apache Spark
Spark Data Source V2 Performance Improvement: Aggregate Push Down
Deep Dive into the New Features of Apache Spark 3.2 and 3.3
Managing Straggler Executors at Apache Spark 3.3
Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors
PySpark in Apache Spark 3.3 and Beyond
Delta Lake 2.0 Overview
Improving Interactive Querying Experience on Spark SQL
Moving from Apache Spark 2 to Apache Spark 3: Spark Version Upgrade at Scale in Pinterest
Radical Speed on the Lakehouse: Photon Under the Hood

How to Download the HD Videos

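Since readers are likely interested in different topics, we have compiled every video that can be downloaded, all in HD, so you can pick the ones that match your interests. The conference slides are not yet available for download; if you need them, keep following this account for updates.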

Follow the WeChat official account 过往记忆大数据 or Java与大数据架构 and reply 10187 to get the Data + AI Summit 2022 HD videos.

Sessions with Downloadable Videos

In total, 197 sessions have downloadable videos.

A Low-Code Approach to 10x Data Engineering
A Modern Approach to Big Data for Finance
A Practitioner's Guide to Unity Catalog—A Technical Deep Dive
Accelerating Hybrid Data Mesh Implementation
Accidentally Building a Petabyte-Scale Cybersecurity Data Mesh in Azure With Delta Lake at HSBC
Administrator Best Practices and Tips for Future-proofing your Databricks Account
Advanced Migrations From Hive to SparkSQL
Adversarial Drifts, Model Monitoring, and Feedback Loops: Building Human-in-the-Loop Machine Learning Systems for Content Moderation
An Advanced S3 Connector for Spark to Hunt for Cyber Attacks
Analyzing Population Health using Healthcare Claims
Apache Arrow Flight SQL: High Performance, Simplicity, and Interoperability for Data Transfers
Apache Spark SQL Aggregate Improvement at Meta (Facebook)
Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors
Automate Your Delta Lake or Practical Insights on Building Distributed Data Mesh
Automating Model Lifecycle Orchestration with Jenkins
Backfill Streaming Data Pipelines in Kappa Architecture
Batches, Streams, and Everything in between: Unifying Batch and Stream Storage with Apache Pulsar and Lakehouse Architectures
Beyond Daily Batch Processing: Operational Trade-Offs of Microbatch, Incremental, and Real-Time Processing for Your ETLs (and Your Team's Sanity)
Beyond Monitoring: The Rise of Data Observability
Build an Enterprise Lakehouse for Free with Trino and Delta Lake
Building Enterprise Scale Data and Analytics Platforms at Amgen
Building Metadata and Lineage Driven Pipelines on Kubernetes
Building Production-Ready Recommender Systems with Feature Stores
Building Scalable & Advanced AI based Language Solutions for R&D using Databricks
Building Spatial Applications with Apache Spark and CARTO
Building a Lakehouse for Data Science at DoorDash
Building a Lakehouse on AWS for Less with AWS Graviton and Photon
Building an Operational Machine Learning Organization from Zero and Leveraging ML for Crypto Security
Building and Scaling Machine Learning-Based Products in the World's Largest Brewery
Chaos Engineering in the World of Large-Scale Complex Data Flow
Cloud Native Geospatial Analytics at JLL
Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databricks
Complete Data Security and Governance Powered by Unity Catalog and Immuta
Connecting the Dots with DataHub: Lakehouse and Beyond
Constraints, Democratization, and the Modern Data Stack - Building a Data Platform At Red Ventures with Fivetran and Databricks
Coral and Transport: Portable SQL and UDFs for the Interoperability of Spark and Other Engines
Correlation Over Causation: Cracking the Relationship Between User Engagement and User Happiness
Cutting the Edge in Fighting Cybercrime: Reverse-Engineering a Search Language to Cross-Compile it to PySpark
DELETE, UPDATE, MERGE Operations in Data Source V2
Data Lakehouse and Data Mesh—Two Sides of the Same Coin
Data Mesh Implementation Patterns
Data Warehousing on the Lakehouse
DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine
Databricks Lakehouse Overview
Databricks SQL Under the Hood: What's New with Live Demos
Day 1 Afternoon Keynote
Day 2 Afternoon Keynote
Day 2 Opening Keynote
Deep Dive: How to Build Your Modern Data Stack on Databricks to Solve Modern Problems
Deep Dive into the New Features of Apache Spark 3.2 and 3.3
Deliver Faster Decision Intelligence From Your Lakehouse
Delta Lake, the Foundation of Your Lakehouse
Delta Live Tables: Modern software engineering and management for ETL
Delta Sharing - A New Paradigm for Secure Data Sharing and Data Collaboration on Lakehouse
Delta Sharing for Healthcare and Life Sciences
Democratizing Metrics at Airbnb
Designing Better MLOps Systems
Destination Lakehouse: All Your Data, Analytics, and AI on One Platform
Distributed Machine Learning at Lyft
Dive Deeper into Data Engineering on Databricks
Doubling the Capacity of the Data Platform Without Doubling the Cost
Driving Real-Time Data Capture and Transformation in Delta Lake with Change Data Capture
Efficient and Multi-Tenant Scheduling of Big Data and AI Workloads
Eliminating AI Risk—One Model Failure at a Time
Emerging Data Architectures & Approaches for Real-Time AI using Redis
Enable Production ML with Databricks Feature Store
Enabling Advanced Analytics at The Department of State using Databricks
Enabling BI in a Lakehouse Environment: How Spark and Delta Can Help With Automating a DWH Development
Enabling Business Users to Perform Interactive Ad-Hoc Analysis over Delta Lake with No Code
Ensuring Correct Distributed Writes to Delta Lake in Rust with Formal Verification
Entity Resolution
Evolution of Data Architectures and How to Build a Lakehouse
Financial Services Industry Forum: The Future of Financial Services is Open with Data and AI at Its Core
Fugue Tune: Distributed Hybrid Hyperparameter Tuning
FugueSQL—The Enhanced SQL Interface for Pandas and Spark DataFrames
FutureMetrics: Using Deep Learning to Create a Multivariate Time Series Forecasting Platform for Economic Strategic Planning
Gamer User Toxicity
Gazelle-Jni: A Middle Layer to Offload Spark SQL to Native Engines for Execution Acceleration
Government Industry Forum Lunch and Program
Hassle-Free Data Ingestion into the Lakehouse
Healthcare Data Interoperability
How AARP Services Inc. automated SAS transformation to Databricks using LeapLogic—A cloud accelerator for transformation of legacy analytics ETL DW & Hadoop
How AT&T Data Science Team Solved an Insurmountable Big Data Challenge on Databricks with Two Different Approaches using Photon and RAPIDS Accelerator for Apache Spark
How Databricks is driving disruptive digital transformation in the airline industry
How EPRI Uses Computer Vision to Mitigate Wildfire Risks for Electric Utilities
How McAfee Leverages Databricks on AWS at Scale
How Robinhood Built a Streaming Lakehouse to Bring Data Freshness from 24h to Less Than 15 Mins
How To Make Apache Spark on Kubernetes Run Reliably on Spot Instances
How To Use Databricks SQL for Analytics on Your Lakehouse
How to Implement a Semantic Layer for Your Lakehouse
How unsupervised machine learning can scale data quality monitoring in Databricks
Immuta - Unlocking sensitive use cases with automated data access
Implementing a Framework for Data Security and Policy at a Large Public Sector Agency
Implementing an End-to-End Demand Forecasting Solution Through Databricks and MLflow
Improving Apache Spark Structured Streaming Application Processing Time by Configurations, Code Optimizations, and Custom Data Source
Improving Interactive Querying Experience on Spark SQL
Introducing Zipline: An Open Source Feature Engineering Platform
Introduction to Flux and OSS Replication
Lakehouse with Delta Lake Deep Dive
Laying the Foundation for Claims Automation
Learn to Efficiently Test ETL Pipelines
Lessons Learned from Deidentifying 700 Million Patient Notes
Leveraging ML-Powered Analytics for Rapid Insights and Action: a demonstration
Live Analytics: The next user engagement frontier
Low-Code Machine Learning on Databricks with AutoML
ML on the Lakehouse: Bringing Data and ML Together to Accelerate AI Use Cases
MLOps at DoorDash
MLflow Pipelines: Accelerating MLOps from Development to Production
Managing Straggler Executors at Apache Spark 3.3
Meetup: Women in Data and AI
Meshing About with Databricks
Migrate Your Existing DAGs to Databricks Workflows
Migrate and Modernize your Data Platform with Confluent and Databricks
Migrating Complex SAS Processes to Databricks - Case Study
Migrating SAS to a Lakehouse on Databricks and S3
Monitoring and Quality Assurance of Complex ML Deployments via Assertions
More Context, Less Chaos: How Atlan and Unity Catalog Power Column-Level Lineage and Active Metadata
Mosaic: A Framework for Geospatial Analytics at Scale
Moving from Apache Spark 2 to Apache Spark 3: Spark Version Upgrade at Scale in Pinterest
Multi-Touch Attribution
Multimodal Deep Learning Applied to E-commerce Big Data
Near Real-Time Analytics with Event Streaming, Live Tables, and Delta Sharing
Nixtla: Deep Learning for Time Series Forecasting
Opening the Floodgates: Enabling Fast, Unmediated End User Access to Trillion-Row Datasets with SQL Data Warehouses
Operational Analytics: Expanding the Reach of Data in the Lakehouse Era
Optimizing Speed and Scale of User-Facing Analytics Using Apache Kafka and Pinot
Orchestration Made Easy with Databricks Workflows
OvalEdge End-To-End Data Governance
Patient Cohort Building with NLP and Knowledge Graphs
Powering Up the Business with a Lakehouse
Practical Data Governance in a Large Scale Databricks Environment
Predicting Repeat Admissions to Substance Abuse Treatment with Machine Learning
Predicting and Preventing Machine Downtime with AI and Expert Alerts
Propensity Scoring Demo
Protecting Personally Identifiable Information (PII)/PHI Data in Data Lake via Column Level Encryption
Pushing the limits of scale and performance for enterprise-wide analytics: A fire-side chat with Akamai
PySpark in Apache Spark 3.3 and Beyond
Radical Speed on the Lakehouse: Photon Under the Hood
Real Time Bidding
Real Time Retail Demo
Real World Evidence and Propensity Score Matching
Real-Time Search and Recommendation at Scale Using Embeddings and Hopsworks
Real-time Risk Management with Confluent & Databricks
Realize the Promise of Streaming with the Databricks Lakehouse Platform
Recent Parquet Improvements in Apache Spark
Regulatory Reporting: Automatically translate enterprise data models into efficient data pipelines
Retail Industry Forum
Rethinking Orchestration as Reconciliation: Software-Defined Assets in Dagster
Running a Low Cost Versatile Data Management Ecosystem with Apache Spark at Core
SAS Migration
Scaling AI Workloads with the Ray Ecosystem
Scaling Deep Learning on Databricks
Scaling ML at CashApp with Tecton
Scaling Salesforce In-Memory Streaming Analytics Platform for Trillion Events Per Day
Scaling Your Workloads with Databricks Serverless
Search and Aggregations Made Easy with OpenSearch and NodeJS
Serverless Kafka and Apache Spark in a Multi-Cloud Data Lakehouse Architecture
Serving Near Real-Time Features at Scale
Simplifying Migrations to Lakehouse—the Databricks Way
Sink Framework Evolution in Apache Flink
Smart Manufacturing Real-time Process Optimization with Databricks
So Fresh and So Clean: Learn How to Build Real-Time Warehouses on Lakehouse
Sound Data Engineering in Rust—From Bits to DataFrames
Spark Inception: Exploiting the Apache Spark REPL to Build Streaming Notebooks
Stadium Analytics
Streaming Data into Delta Lake with Rust and Kafka
Streaming ML Enrichment Framework Using Advanced Delta Table Features
Supercharge your SaaS applications with a modern cloud-native database
Survey of Production ML Tech Stacks
Tackling Challenges of Distributed Deep Learning with Open Source Solutions
Take Databricks Lakehouse to the Max with Informatica
Technical and Tactical Football Analysis Through Data
The Databricks Notebook: Front Door of the Lakehouse
The Future is Open - a Look at Google Cloud’s Open Data Ecosystem
The Future of Data - What’s Next with Google Cloud
The Road to a Robust Data Lake Utilizing Delta Lake and Databricks to Map 150 Million Miles of Roads a Month
Tools for Assisted Apache Spark Version Migrations From 2.1 to 3.2+
Towards Dynamic Microstructure: The Role of Machine Learning in the Next Generation of Exchanges
Tredence On Shelf Availability
Turbocharge your AI/ML Databricks workflows with Precisely
Turning Fan Data Into an Asset
Unifying Data Science and Business: Artificial Intelligence Augmentation and Integration into Production Business Applications
What to Do When Your Job Goes OOM in the Night (Flowcharts)
Why a Data Lakehouse is Critical During the Manufacturing Apocalypse
You Have BI. Now What? Activate Your Data
Your fastest path to Lakehouse and beyond
dbt + Machine Learning: What Makes a Great Baton Pass?
dbt and Databricks: Analytics Engineering on the Lakehouse