With the development of the Internet of Things (IoT) and the advance of deep learning, there is an urgent need to enable deep learning inference on IoT devices. To address the limited computation capability of IoT devices in processing complex Deep Neural Networks (DNNs), partial computation offloading has been developed to dynamically adjust the offloading assignment strategy under varying channel conditions for better performance. In this paper, we exploit intrinsic DNN computation characteristics and propose a novel Fused-Layer-based (FL-based) DNN model parallelism method to accelerate inference. The key idea is that a DNN layer can be converted into several smaller layers, which increases the flexibility of partial computation offloading and thus enables better offloading solutions. However, there is a trade-off between this offloading flexibility and the overhead introduced by model parallelism. We therefore analyze the optimal DNN model parallelism together with the corresponding scheduling and offloading strategies for partial computation offloading. In particular, we present a Minimizing Waiting (MW) method, which jointly optimizes the FL strategy, the path scheduling strategy, and the path offloading strategy at reduced time complexity. Finally, we validate the effectiveness of the proposed method on commonly used DNNs. The results show that the proposed method reduces DNN inference time by an average factor of 18.39 compared with the No FL (NFL) algorithm, and comes very close to the optimal Brute Force (BF) solution while greatly reducing time complexity.