Computation offloading has been proposed to overcome a key obstacle to highly accurate, real-time deep inference on resource-constrained Internet of Things (IoT) devices. Cooperative deep inference was recently proposed to further mitigate the communication latency introduced by computation offloading: it partitions a Deep Neural Network (DNN) model into two parts and uses the IoT end device and the server to process the model cooperatively. We observe an important fact ignored in all previous works: DNN computation and communication can be conducted simultaneously in cooperative deep inference. As a result, the layer-wise processing schedule of the DNN affects inference latency, and finding the optimal schedule is non-trivial for State-Of-The-Art (SOTA) DNNs with Directed Acyclic Graph (DAG) computational architectures. The contributions of this paper are as follows. (1) The proposed Deep Inference Optimization with Layer-wise Schedule, Deep-Inference-L, is a unique pipeline-based DAG scheduling problem, which turns out to be NP-hard. (2) We categorize SOTA DNNs into three categories and derive the optimal processing schedule in special cases and efficient heuristic schedules in the general case. (3) The proposed solutions are extensively evaluated via a proof-of-concept prototype. (4) Results indicate that our algorithms achieve an 8x speedup over local inference in the best case.
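The overlap of computation and communication described above can be illustrated with a minimal latency model; this is a sketch only, and the branch names and per-branch timings below are hypothetical, not from the paper. Independent DNN branches finished on the device can be transmitted while the device keeps computing other branches, so the order in which branches are processed (the layer-wise schedule) changes the end-to-end makespan, much like a two-machine flow shop:

```python
def makespan(schedule, compute, transmit):
    """Total latency when branches are computed in `schedule` order and
    each branch's output is transmitted as soon as it is ready."""
    t_cpu = 0.0  # time at which the device finishes computing so far
    t_net = 0.0  # time at which the network finishes transmitting so far
    for b in schedule:
        t_cpu += compute[b]                       # device computes branch b
        t_net = max(t_net, t_cpu) + transmit[b]   # then its output is sent
    return t_net

# Hypothetical per-branch compute times and output transmission times (ms).
compute  = {"A": 4.0, "B": 1.0, "C": 2.0}
transmit = {"A": 1.0, "B": 5.0, "C": 2.0}

print(makespan(["A", "B", "C"], compute, transmit))  # 12.0
print(makespan(["B", "C", "A"], compute, transmit))  # 9.0
```

Scheduling branch B (short compute, long transmission) first hides its transmission behind the remaining computation, which is why the layer-wise schedule matters for inference latency.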