To minimize fuel consumption during driving, several methods have been proposed that calculate vehicles' optimal velocity profiles on a remote cloud. Because traffic conditions change dynamically, each vehicle must keep updating its velocity profile. This requires low latency for both information uploading and profile calculation. However, existing methods cannot satisfy this requirement for two reasons: (1) the high queuing delay of information uploading caused by a large number of vehicles, and (2) their neglect of traffic lights and the high computation delay of velocity profile calculation. For (1), we exploit the fact that nearby vehicles on a road share driving features, e.g., similar velocities and inter-vehicle distances. We propose to group vehicles within a certain range and let the leader vehicle of each group upload the group's information to the cloud, which then derives the velocity of each vehicle in the group. For (2), we propose spatial-temporal dynamic programming (ST-DP), which additionally considers traffic lights. We find that the DP process is well suited to running on Spark (a fast parallel cluster computing framework), and we present how to run ST-DP on Spark. Finally, we demonstrate the superiority of our method through both trace-driven simulation (the NS-2.33 simulator and MATLAB) and real-world experiments.
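The following Python sketch gives a rough idea of what a spatial-temporal DP over (position, time) states can look like. It discretizes a road into equal segments and, for each segment, picks a travel time from a small set of choices. A single fixed-time traffic light forbids arriving at its stop line during red. The fuel model `fuel_per_meter`, the signal timing in `is_green`, and the names `st_dp`, `step_choices`, `light_idx`, and `horizon` are hypothetical assumptions made for illustration. They are not the formulation used in this work, and the sketch runs sequentially rather than on Spark.

```python
import math


def fuel_per_meter(v, alpha=2.0, beta=0.0005):
    """Hypothetical convex fuel model (arbitrary units): an idle-like term
    that penalizes slow driving plus a drag-like term that penalizes speed."""
    return alpha / v + beta * v * v


def is_green(t_sec, cycle=60.0, green=30.0):
    """Hypothetical fixed-time signal: green for the first `green` seconds of each cycle."""
    return (t_sec % cycle) < green


def st_dp(n_segments, seg_len, dt, step_choices, light_idx, horizon):
    """Spatial-temporal DP over states (position index i, time step t).

    Moving from position i to i+1 takes k time steps (k in step_choices),
    i.e. a constant speed of seg_len / (k * dt) on that segment.
    Arriving at position `light_idx` is allowed only while the light is green;
    light_idx=None disables the light. Returns (total_fuel, per-segment speeds).
    """
    INF = math.inf
    cost = [[INF] * (horizon + 1) for _ in range(n_segments + 1)]
    parent = [[None] * (horizon + 1) for _ in range(n_segments + 1)]
    cost[0][0] = 0.0  # start at position 0 at time step 0

    for i in range(n_segments):          # spatial stages
        for t in range(horizon + 1):     # temporal states within a stage
            if cost[i][t] == INF:
                continue
            for k in step_choices:
                t2 = t + k
                if t2 > horizon:
                    continue
                if i + 1 == light_idx and not is_green(t2 * dt):
                    continue  # would reach the stop line on red
                v = seg_len / (k * dt)
                c = cost[i][t] + seg_len * fuel_per_meter(v)
                if c < cost[i + 1][t2]:
                    cost[i + 1][t2] = c
                    parent[i + 1][t2] = t

    # Cheapest arrival time at the end of the road.
    best_t = min(range(horizon + 1), key=lambda t: cost[n_segments][t])
    if cost[n_segments][best_t] == INF:
        return INF, []

    # Walk the parent pointers backward to recover per-segment speeds.
    speeds = []
    t = best_t
    for i in range(n_segments, 0, -1):
        t_prev = parent[i][t]
        speeds.append(seg_len / ((t - t_prev) * dt))
        t = t_prev
    speeds.reverse()
    return cost[n_segments][best_t], speeds
```

Consider 50 m segments with `step_choices=[2, 3, 4, 5]`, which correspond to speeds from 25 m/s down to 10 m/s. With no light (`light_idx=None`), the sketch drives every segment at 12.5 m/s. If the light is placed at `light_idx=8`, that all-12.5 m/s plan would reach the stop line at t = 32 s, during red. The DP therefore speeds up on some of the segments before the light so that the vehicle crosses before t = 30 s. Each stage i+1 reads only stage i, so the relaxations for different time steps t within a stage are independent of one another. A parallel framework could exploit this kind of structure.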