DeepETA: How Uber Predicts Arrival Times Using Deep Learning
Uberκ° λμ°©μκ°μ μμΈ‘νκΈ° μν΄ μ λ₯λ¬λ λͺ¨λΈλ‘ λμ΄κ°κ² λμλμ§ μ μ μλ κΈμ΄λ€.
μλ λ³Έλ¬Έμ μ΄λ₯Ό μ½κ³ μ 리ν λ΄μ©μ΄λ€.
μ°λ²λ μ¬μ©μμκ² μ°¨λ λμ°© μμΈ‘ μκ°(ETA)μ μ 곡νλ€.
ETAλ₯Ό μ¬μ©νμ¬ μκΈμ κ³μ°νκ³ , ν½μ μκ°μ μΆμ νκ³ , λΌμ΄λμ κΈ°μ¬λ₯Ό μ°κ²°νκ³ , λ°°μ‘μ κ³ννλ λ±μ μμ μ μννλ©°, μ΄λ₯Ό μ ννκ² μΆμ νλ λ₯λ ₯μ λ§€μ° μ€μνλ€. μ νν ETA μΆμ μ κ³ κ°μκ² κΈμ μ μΈ κ²½νμ μ 곡νκ³ , μλΉμ€ κ°κ²©κ³Ό μ΄μ κ²½λ‘ λ±μ μ€μ νλ λ°λ νμ©λλ€.
μ΄ κΈμμ μ°λ²κ° ETA μμΈ‘ κ°μ μ μν΄ μ λ₯λ¬λ λͺ¨λΈμ μ ννκ³ , μ΄λ€ κΈ°μ μ μ¬μ©νλμ§ λ€λ£¬λ€.
μ ν΅μ μΈ ETA μμ§μ λλ‘ λ€νΈμν¬μ μμ μΈκ·Έλ¨ΌνΈλ‘ λλ μ κ·Έλνμ κ°μ€μΉλ₯Ό λκ³ κ³μ°νλ€.
μ΅λ¨κ²½λ‘ μκ³ λ¦¬μ¦μΌλ‘ μ΅μ μ κ²½λ‘λ₯Ό μ°Ύκ³ , ETAλ₯Ό λμΆνκΈ° μν΄ κ°μ€μΉλ₯Ό ν©μ°νλ€.
νμ§λ§ μ§λλ μ§νμ΄ μλλ€. λλ‘ κ·Έλνλ λͺ¨λΈμΌ λΏμ΄λ©°, μ€μ μ§ν μν©μ λ°μνμ§ λͺ»νλ€. λν λΌμ΄λ/λλΌμ΄λ²λ€μ΄ λͺ©μ μ§κΉμ§ μ΄λ€ κ²½λ‘λ₯Ό μ νν μ§ μ μ μλ€.
κ³Όκ±° λ°μ΄ν°μ μ€μκ° μ νΈκ° κ²°ν©λ λ°μ΄ν°λ₯Ό μ¬μ©νμ¬ λλ‘ κ·Έλν μμΈ‘ μμ λ¨Έμ λ¬λ(ML) λͺ¨λΈμ νλ ¨μν΄μΌλ‘μETAλ₯Ό κ°μ μν¬ μ μμλ€.
By training machine learning (ML) models on top of the road graph prediction using historical data in combination with real-time signals, we can refine ETAs that better predict real-world outcomes.
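One simple way to train an ML model "on top of" the road graph prediction is to learn a correction (residual) that is added to the routing-engine ETA. The linear model, feature names, and numbers below are illustrative assumptions, not Uber's actual approach:

```python
# Hedged sketch: refine a routing-engine ETA with a learned correction.
# A real system would use a trained model over many historical and
# real-time signals; here a tiny linear model stands in for it.

def refine_eta(routing_eta_sec, features, weights, bias):
    """Predict a residual from the features and add it to the
    graph-based ETA."""
    residual = bias + sum(w * x for w, x in zip(weights, features))
    return routing_eta_sec + residual

# features: e.g. [rush_hour_flag, recent_speed_ratio] (hypothetical)
refined = refine_eta(600.0, [1.0, 0.8], weights=[120.0, -50.0], bias=10.0)
print(refined)  # 600 + 10 + 120 - 40 = 690.0
```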
λͺ λ κ° μ°λ²λ ETA κ°μ μ μν΄ Gradient-boosted decision tree ensemblesμ μ¬μ©νμ§λ§, μ΄μ Apache Spark + XGBoostλ‘λ Dataμ Modelμ λ λ릴 μ μλ νκ³μ λλ¬νλ€.
→ λͺ¨λΈμ κ³μ νμ₯νκ³ μ νλλ₯Ό κ°μ νκΈ° μν΄ λ°μ΄ν° λ³λ ¬ SGDλ₯Ό μ¬μ©νμ¬ λκ·λͺ¨ λ°μ΄ν° μΈνΈλ‘ νμ₯νλ κ²μ΄ μλμ μΌλ‘ μ¬μ΄ λ₯λ¬λμ μ ννλ€.
λ₯λ¬λμΌλ‘μ μ νμ μν΄ μΈ κ°μ§ μ£Όμ λ¬Έμ μ μ ν΄κ²°ν΄μΌ νλ€.
- Latency: compute an ETA within a few milliseconds
- Accuracy: improve MAE (mean absolute error) over the XGBoost model
- Generality: serve ETA predictions globally across all of Uber's lines of business
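The accuracy target above is stated in terms of MAE. A quick sketch of the metric, with made-up prediction numbers (the "xgb" and "deep" values below are hypothetical, not real results):

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [600, 720, 480]   # observed trip times (seconds)
xgb    = [630, 690, 510]   # hypothetical XGBoost predictions
deep   = [610, 710, 490]   # hypothetical DeepETA predictions
print(mae(actual, xgb), mae(actual, deep))  # 30.0 10.0
```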
μ΄λ¬ν λ¬Έμ λ₯Ό ν΄κ²°νκΈ° μν΄ Uber AIλ DeepETAλΌλ νλ‘μ νΈμμ Uberμ μ§λ νκ³Ό νλ ₯νμ¬ κΈλ‘λ² ETA μμΈ‘μ μν low-latency deep neural network architectureλ₯Ό κ°λ°νλ€.
7κ°μ§μ μ κ²½λ§ μν€ν μ²λ₯Ό ν μ€νΈ
→ μ΅μ’ μ μΌλ‘ Self-Attentionμ μ΄μ©ν Encoder-Decoder Architecture(Transformer-based)κ° κ°μ₯ μ ννλ€.
νΈλμ€ν¬λ¨Έ κΈ°λ° μΈμ½λκ° μ΅κ³ μ μ νλλ₯Ό μ 곡νμ§λ§ μ¨λΌμΈ μ€μκ° μλΉμ€λ₯Ό μν λκΈ° μκ° μꡬ μ¬νμ μΆ©μ‘±νκΈ°μλ λ무 λλ Έλ€.
→ κ³μ°μ λΉ λ₯΄κ² κ°μ ν Linear Transformerλ₯Ό μ ν
More Embeddings, Fewer Layers
→ DeepETAλ₯Ό λΉ λ₯΄κ² νκΈ° μν΄ μΆκ°μ μΌλ‘ feature sparsity νμ© > κΈ°λ₯ ν¬μμ±..? μ΄λΆλΆμ μ μ΄ν΄κ° λμ§ μλλ€.
- ν΄λΉ λΆλΆ λ°μ·
First of all, the model itself is relatively shallow with just a handful of layers. The vast majority of the parameters exist in embedding lookup tables. By discretizing the inputs and mapping them to embeddings, we avoid evaluating any of the unused embedding table parameters.
Discretizing the inputs gives us a clear speed advantage at serving time compared to alternative implementations. Take the geospatial embeddings pictured in Figure 5 as an example. To map a latitude and longitude to an embedding, DeepETA simply quantizes the coordinates and performs a hash lookup, which takes O(1) time. In comparison, storing embeddings in a tree data structure would require O(log N) lookup time, while using fully-connected layers to learn the same mapping would require O(N²) lookup time. Seen from this perspective, discretizing and embedding inputs is simply an instance of the classic space vs time tradeoff in computer science: by precomputing partial answers in the form of large embedding tables learned during training, we reduce the amount of computation needed at serving time.
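The quantize-and-hash lookup the excerpt describes can be sketched as follows. The grid size, table size, and hashing scheme here are illustrative assumptions, not Uber's actual values:

```python
import numpy as np

GRID = 0.01                    # cell size in degrees (assumed)
TABLE_ROWS, DIM = 2**16, 8     # embedding table shape (assumed)
table = np.random.default_rng(0).normal(size=(TABLE_ROWS, DIM))

def geo_embedding(lat, lng):
    """Quantize coordinates to a grid cell, hash the cell into the
    embedding table, and index it -- an O(1) lookup at serving time."""
    cell = (int(lat // GRID), int(lng // GRID))   # quantize
    row = hash(cell) % TABLE_ROWS                 # O(1) hash lookup
    return table[row]

e1 = geo_embedding(37.7749, -122.4194)   # San Francisco
e2 = geo_embedding(37.7749, -122.4194)   # same cell -> same vector
print(np.allclose(e1, e2), e1.shape)     # True (8,)
```

The large table is the "space" side of the tradeoff: the lookup itself does almost no computation because the answers were precomputed during training.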
μΆμ² : https://eng.uber.com/deepeta-how-uber-predicts-arrival-times/β